mlprimitives.custom.timeseries_anomalies module
Time Series anomaly detection functions. Implementation inspired by the paper https://arxiv.org/pdf/1802.04431.pdf
mlprimitives.custom.timeseries_anomalies.count_above(errors, epsilon)
Count number of errors and continuous sequences above epsilon.
Continuous sequences are counted by shifting the above-epsilon mask by one position and counting the positions where the value changed and the original value was true. Each such position marks the start of a sequence.
- Parameters
errors (ndarray) – Array of errors.
epsilon (ndarray) – Threshold value.
- Returns
Number of errors above epsilon.
Number of continuous sequences above epsilon.
- Return type
int, int
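The shift-and-compare counting described for count_above can be illustrated with a minimal NumPy sketch. The name count_above_sketch is hypothetical and this is not the library's source; it only assumes that errors is a one-dimensional array and epsilon a scalar.

```python
import numpy as np


def count_above_sketch(errors, epsilon):
    """Hypothetical illustration: count errors and runs above epsilon."""
    above = np.asarray(errors) > epsilon
    total_above = int(above.sum())

    # Shift the mask one position to the right. The first element has no
    # predecessor, so its predecessor is treated as False.
    previous = np.concatenate(([False], above[:-1]))

    # A run starts where the value is True and differs from its predecessor.
    starts = above & (above != previous)
    return total_above, int(starts.sum())


errors = np.array([0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.95])
print(count_above_sketch(errors, 0.5))  # (4, 3)
```

In this example four values exceed 0.5 and they form three separate runs, because the second and third values are adjacent.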
mlprimitives.custom.timeseries_anomalies.deltas(errors, epsilon, mean, std)
Compute mean and std deltas.
delta_mean = mean(errors) - mean(all errors below epsilon)
delta_std = std(errors) - std(all errors below epsilon)
- Parameters
errors (ndarray) – Array of errors.
epsilon (ndarray) – Threshold value.
mean (float) – Mean of errors.
std (float) – Standard deviation of errors.
- Returns
delta_mean.
delta_std.
- Return type
float, float
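The two delta formulas can be sketched as follows. The name deltas_sketch is hypothetical. Treating "below epsilon" as "less than or equal to epsilon" and returning zeros when nothing lies below the threshold are assumptions of this sketch, not behavior stated above.

```python
import numpy as np


def deltas_sketch(errors, epsilon, mean, std):
    """Hypothetical illustration of delta_mean and delta_std."""
    errors = np.asarray(errors, dtype=float)
    below = errors[errors <= epsilon]  # assumption: inclusive comparison

    if below.size == 0:
        # Assumption: with nothing below the threshold, report no change.
        return 0.0, 0.0

    delta_mean = mean - below.mean()
    delta_std = std - below.std()
    return float(delta_mean), float(delta_std)


errors = np.array([1.0, 1.0, 1.0, 1.0, 6.0])
print(deltas_sketch(errors, 3.0, errors.mean(), errors.std()))  # (1.0, 2.0)
```

Here the full array has mean 2 and standard deviation 2. Dropping the single value above epsilon leaves four identical values with mean 1 and standard deviation 0, which gives deltas of 1 and 2.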
mlprimitives.custom.timeseries_anomalies.find_anomalies(errors, index, z_range=(0, 10), window_size=None, window_step_size=None, min_percent=0.1, anomaly_padding=50, lower_threshold=False)
Find sequences of error values that are anomalous.
We first define the window of errors that we want to analyze. We then find the anomalous sequences in that window and store the start/stop index pairs that correspond to each sequence, along with its score. Optionally, we can flip the error sequence around the mean and apply the same procedure, which lets us find unusually low error sequences. We then move the window and repeat the procedure. Lastly, we combine overlapping or consecutive sequences.
- Parameters
errors (ndarray) – Array of errors.
index (ndarray) – Array of indices of the errors.
z_range (list) – Optional. Two values denoting the range from which the start points for the scipy.optimize.fmin function are chosen. If not given, (0, 10) is used.
window_size (int) – Optional. Size of the window for which a threshold is calculated. If not given, None is used, which finds one threshold for the entire sequence of errors.
window_step_size (int) – Optional. Number of steps the window is moved before another threshold is calculated for the new window.
min_percent (float) – Optional. Percentage of separation the anomalies need to meet between themselves and the highest non-anomalous error in the window sequence. If not given, 0.1 is used.
anomaly_padding (int) – Optional. Number of errors before and after a found anomaly that are added to the anomalous sequence. If not given, 50 is used.
lower_threshold (bool) – Optional. Indicates whether to apply a lower threshold to find unusually low errors. If not given, False is used.
- Returns
Array containing start-index, end-index, score for each anomalous sequence that was found.
- Return type
ndarray
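Two steps of the procedure described for find_anomalies, flipping the errors around their mean and combining overlapping or consecutive sequences, can be sketched in isolation. Both helper names are hypothetical. Keeping the highest score when two sequences merge is an assumption of this sketch, since the description above does not say how scores are combined.

```python
import numpy as np


def flip_around_mean(errors):
    """Hypothetical helper: mirror errors around their mean, so that
    unusually low values become unusually high ones."""
    errors = np.asarray(errors, dtype=float)
    return 2 * errors.mean() - errors


def merge_sequences_sketch(sequences):
    """Hypothetical helper: merge (start, stop, score) triples that
    overlap or are consecutive."""
    if len(sequences) == 0:
        return np.empty((0, 3))

    ordered = sorted(sequences, key=lambda seq: seq[0])
    merged = [list(ordered[0])]
    for start, stop, score in ordered[1:]:
        last = merged[-1]
        if start <= last[1] + 1:
            # Overlapping or directly adjacent: extend the previous sequence.
            last[1] = max(last[1], stop)
            last[2] = max(last[2], score)  # assumption: keep the highest score
        else:
            merged.append([start, stop, score])
    return np.array(merged, dtype=float)


print(flip_around_mean([1.0, 2.0, 6.0]))  # [5. 4. 0.]

found = [(10, 20, 0.4), (18, 30, 0.9), (31, 35, 0.2), (50, 60, 0.5)]
print(merge_sequences_sketch(found))
```

In the example the first three sequences collapse into one spanning 10 to 35, because the second overlaps the first and the third starts directly after the second ends. The fourth stays separate.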
mlprimitives.custom.timeseries_anomalies.regression_errors(y, y_hat, smoothing_window=0.01, smooth=True)
Compute an array of absolute errors comparing predictions and expected output.
If smooth is True, apply EWMA to the resulting array of errors.
- Parameters
y (ndarray) – Ground truth.
y_hat (ndarray) – Predicted values.
smoothing_window (float) – Optional. Size of the smoothing window, expressed as a proportion of the total length of y. If not given, 0.01 is used.
smooth (bool) – Optional. Indicates whether the returned errors should be smoothed with EWMA. If not given, True is used.
- Returns
Array of errors.
- Return type
ndarray
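As a rough illustration of absolute errors followed by EWMA smoothing, the sketch below uses plain NumPy. The name regression_errors_sketch is hypothetical. The conversion of the proportional window into a span, and the recursive form of the EWMA with alpha = 2 / (span + 1), are assumptions of this sketch and may differ from what the library does.

```python
import numpy as np


def regression_errors_sketch(y, y_hat, smoothing_window=0.01, smooth=True):
    """Hypothetical illustration: absolute errors, optionally EWMA-smoothed."""
    errors = np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float))
    if not smooth or errors.size == 0:
        return errors

    # Assumption: the proportion becomes a span of at least one sample.
    span = max(1, int(smoothing_window * len(errors)))
    alpha = 2.0 / (span + 1)

    # Simple recursive EWMA: each value blends the new error with the
    # previous smoothed value.
    smoothed = np.empty_like(errors)
    smoothed[0] = errors[0]
    for i in range(1, len(errors)):
        smoothed[i] = alpha * errors[i] + (1 - alpha) * smoothed[i - 1]
    return smoothed


y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 1.0, 5.0, 4.0]
print(regression_errors_sketch(y, y_hat, smooth=False))          # [0. 1. 2. 0.]
print(regression_errors_sketch(y, y_hat, smoothing_window=0.75))  # [0. 0.5 1.25 0.625]
```

With four samples and a window of 0.75 the span is 3, so alpha is 0.5 and each smoothed value is the average of the current error and the previous smoothed value.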
mlprimitives.custom.timeseries_anomalies.z_cost(z, errors, mean, std)
Compute how bad a z value is.
The original formula is:
    (delta_mean/mean) + (delta_std/std)
    ------------------------------------------------------
    number of errors above + (number of sequences above)^2
which computes the “goodness” of z, meaning that the higher the value the better the z.
In this case, we return the negated value to convert it into a cost function, since later on we use scipy.optimize.fmin to minimize it.
- Parameters
z (ndarray) – Value for which a cost score is calculated.
errors (ndarray) – Array of errors.
mean (float) – Mean of errors.
std (float) – Standard deviation of errors.
- Returns
Cost of z.
- Return type
float
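The cost formula can be sketched end to end as below. The name z_cost_sketch is hypothetical. Setting the threshold to epsilon = mean + z * std follows the referenced paper and is an assumption here, as is returning infinity when the threshold separates nothing. To keep the sketch free of extra dependencies, it picks the best z from a small grid of candidates, a plain stand-in for the optimizer and not the approach described above.

```python
import numpy as np


def z_cost_sketch(z, errors, mean, std):
    """Hypothetical illustration of the negated goodness score for z."""
    errors = np.asarray(errors, dtype=float)
    epsilon = mean + z * std  # assumption: threshold definition from the paper

    above = errors > epsilon
    below = errors[~above]
    total_above = int(above.sum())

    # Count the starts of continuous runs above epsilon.
    previous = np.concatenate(([False], above[:-1]))
    total_sequences = int((above & ~previous).sum())

    if total_above == 0 or below.size == 0:
        # Assumption: a threshold that separates nothing gets the worst cost.
        return float("inf")

    delta_mean = mean - below.mean()
    delta_std = std - below.std()
    numerator = delta_mean / mean + delta_std / std
    denominator = total_above + total_sequences ** 2
    return -float(numerator / denominator)


errors = np.array([1.0, 1.0, 1.0, 1.0, 6.0])
mean, std = errors.mean(), errors.std()

# Grid search used here only as a dependency-free stand-in for an optimizer.
candidates = np.linspace(0.0, 3.0, 7)
costs = [z_cost_sketch(z, errors, mean, std) for z in candidates]
best_z = candidates[int(np.argmin(costs))]
print(best_z, min(costs))
```

A lower (more negative) cost means the threshold removes a large share of the mean and standard deviation while flagging few errors and few separate sequences.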