mlprimitives.custom.timeseries_anomalies module

Time Series anomaly detection functions. Implementation inspired by the paper https://arxiv.org/pdf/1802.04431.pdf

mlprimitives.custom.timeseries_anomalies.count_above(errors, epsilon)

Count number of errors and continuous sequences above epsilon.

Continuous sequences are counted by comparing the above-epsilon mask with a copy of itself shifted by one position. Each position where the value changed and the original value is true marks the start of a sequence, so counting those positions gives the number of sequences.

Parameters
  • errors (ndarray) – Array of errors.

  • epsilon (ndarray) – Threshold value.

Returns

  • Number of errors above epsilon.

  • Number of continuous sequences above epsilon.

Return type

int, int
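
The shift-and-compare counting described above can be illustrated with a short sketch. This is not the library's source: count_above_sketch is a hypothetical name, it assumes a strict "greater than" comparison against a scalar epsilon, and it treats the position before the first element as not above the threshold.

```python
import numpy as np


def count_above_sketch(errors, epsilon):
    """Illustrative counterpart of count_above (hypothetical, NumPy only)."""
    errors = np.asarray(errors, dtype=float)

    # Boolean mask of the positions whose error exceeds the threshold.
    above = errors > epsilon
    total_above = int(above.sum())

    # Shift the mask one position to the right. The first element has no
    # predecessor, so it is assumed to follow a False value.
    previous = np.concatenate(([False], above[:-1]))

    # A sequence starts where the value is True and differs from its predecessor.
    starts = above & (above != previous)
    return total_above, int(starts.sum())


errors = np.array([0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.95])
print(count_above_sketch(errors, 0.5))  # (4, 3)
```

In this example four errors exceed 0.5 and they form three separate runs, at positions 1-2, 4 and 6.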

mlprimitives.custom.timeseries_anomalies.deltas(errors, epsilon, mean, std)

Compute mean and std deltas.

delta_mean = mean(errors) - mean(all errors below epsilon)
delta_std = std(errors) - std(all errors below epsilon)

Parameters
  • errors (ndarray) – Array of errors.

  • epsilon (ndarray) – Threshold value.

  • mean (float) – Mean of errors.

  • std (float) – Standard deviation of errors.

Returns

  • delta_mean.

  • delta_std.

Return type

float, float
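
As a rough illustration of the two formulas above, the following sketch computes both deltas with NumPy. The name deltas_sketch is hypothetical. Treating "below epsilon" as an inclusive comparison and returning zeros when no error falls below the threshold are assumptions of this sketch, not documented behaviour.

```python
import numpy as np


def deltas_sketch(errors, epsilon, mean, std):
    """Illustrative counterpart of deltas (hypothetical, NumPy only)."""
    errors = np.asarray(errors, dtype=float)

    # Errors that would remain if everything above epsilon were removed.
    # Assumption: the comparison is inclusive.
    below = errors[errors <= epsilon]

    # Assumption: with nothing below the threshold, report no change.
    if below.size == 0:
        return 0.0, 0.0

    delta_mean = float(mean - below.mean())
    delta_std = float(std - below.std())
    return delta_mean, delta_std


errors = np.array([1.0, 1.0, 1.0, 1.0, 10.0])
print(deltas_sketch(errors, 5.0, errors.mean(), errors.std()))
```

Here the single large error accounts for all of the spread, so removing it lowers the mean from 2.8 to 1.0 and the standard deviation to 0.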

mlprimitives.custom.timeseries_anomalies.find_anomalies(errors, index, z_range=(0, 10), window_size=None, window_step_size=None, min_percent=0.1, anomaly_padding=50, lower_threshold=False)

Find sequences of error values that are anomalous.

We first define the window of errors that we want to analyze. We then find the anomalous sequences in that window and store the start/stop index pairs that correspond to each sequence, along with its score. Optionally, we can flip the error sequence around the mean and apply the same procedure, which allows us to find unusually low error sequences. We then move the window and repeat the procedure. Lastly, we combine overlapping or consecutive sequences.

Parameters
  • errors (ndarray) – Array of errors.

  • index (ndarray) – Array of indices of the errors.

  • z_range (list) – Optional. List of two values denoting the range from which the starting points for the scipy.optimize.fmin function are chosen. If not given, (0, 10) is used.

  • window_size (int) – Optional. Size of the window for which a threshold is calculated. If not given, None is used, which finds one threshold for the entire sequence of errors.

  • window_step_size (int) – Optional. Number of steps the window is moved before another threshold is calculated for the new window.

  • min_percent (float) – Optional. Percentage of separation the anomalies need to meet between themselves and the highest non-anomalous error in the window sequence. If not given, 0.1 is used.

  • anomaly_padding (int) – Optional. Number of errors before and after a found anomaly that are added to the anomalous sequence. If not given, 50 is used.

  • lower_threshold (bool) – Optional. Indicates whether to apply a lower threshold to find unusually low errors. If not given, False is used.

Returns

Array containing start-index, end-index, score for each anomalous sequence that was found.

Return type

ndarray
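
Two of the steps described for find_anomalies, flipping the errors around their mean and combining overlapping or consecutive sequences, can be sketched in isolation. Both helper names below are hypothetical and this is not the library's code. The sketch assumes integer indices, so "consecutive" means the next sequence starts at most one position after the previous one ends, and it assumes a merged sequence keeps the highest score of its parts.

```python
import numpy as np


def flip_around_mean(errors):
    """Mirror the errors around their mean so unusually low values become high."""
    errors = np.asarray(errors, dtype=float)
    return 2 * errors.mean() - errors


def merge_sequences(sequences):
    """Merge (start, stop, score) tuples that overlap or touch (hypothetical helper)."""
    merged = []
    for start, stop, score in sorted(sequences):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or consecutive: extend the previous sequence.
            # Assumption: the merged sequence keeps the highest score.
            prev_start, prev_stop, prev_score = merged[-1]
            merged[-1] = (prev_start, max(prev_stop, stop), max(prev_score, score))
        else:
            merged.append((start, stop, score))
    return merged


print(flip_around_mean([1, 2, 6]))
print(merge_sequences([(10, 20, 0.5), (18, 30, 0.9), (31, 35, 0.2), (50, 60, 0.4)]))
```

In the second call the first three sequences collapse into one spanning 10 to 35, while the sequence starting at 50 stays separate.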

mlprimitives.custom.timeseries_anomalies.regression_errors(y, y_hat, smoothing_window=0.01, smooth=True)

Compute an array of absolute errors comparing predictions and expected output.

If smooth is True, apply EWMA to the resulting array of errors.

Parameters
  • y (ndarray) – Ground truth.

  • y_hat (ndarray) – Predicted values.

  • smoothing_window (float) – Optional. Size of the smoothing window, expressed as a proportion of the total length of y. If not given, 0.01 is used.

  • smooth (bool) – Optional. Indicates whether the returned errors should be smoothed with EWMA. If not given, True is used.

Returns

Array of errors.

Return type

ndarray
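
The following sketch shows one way the absolute errors and the EWMA smoothing could be computed. It is an illustration under stated assumptions rather than the library's implementation: regression_errors_sketch is a hypothetical name, the inputs are flattened to one dimension, the smoothing window is converted to an EWMA span of at least 1, and the recursive form s[t] = alpha * x[t] + (1 - alpha) * s[t-1] with alpha = 2 / (span + 1) is used.

```python
import numpy as np


def regression_errors_sketch(y, y_hat, smoothing_window=0.01, smooth=True):
    """Illustrative counterpart of regression_errors (hypothetical, NumPy only)."""
    y = np.asarray(y, dtype=float).ravel()
    y_hat = np.asarray(y_hat, dtype=float).ravel()

    # Point-wise absolute error between ground truth and prediction.
    errors = np.abs(y - y_hat)
    if not smooth or errors.size == 0:
        return errors

    # Assumption: the window proportion is turned into an EWMA span (>= 1).
    span = max(1, int(smoothing_window * len(y)))
    alpha = 2.0 / (span + 1)

    # Recursive exponentially weighted moving average.
    smoothed = np.empty_like(errors)
    smoothed[0] = errors[0]
    for i in range(1, len(errors)):
        smoothed[i] = alpha * errors[i] + (1 - alpha) * smoothed[i - 1]
    return smoothed


y = [0, 0, 0, 0]
y_hat = [1, 1, 5, 1]
print(regression_errors_sketch(y, y_hat, smoothing_window=0.75))
print(regression_errors_sketch(y, y_hat, smooth=False))
```

With a span of 3 the spike of 5 is damped to 3 and decays over the following step, which is the effect smoothing is meant to have on isolated error peaks.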

mlprimitives.custom.timeseries_anomalies.z_cost(z, errors, mean, std)

Compute how bad a z value is.

The original formula is:

         (delta_mean/mean) + (delta_std/std)
------------------------------------------------------
number of errors above + (number of sequences above)^2

which computes the “goodness” of z, meaning that the higher the value the better the z.

In this case, we return the negated value to turn it into a cost function, because later on we use scipy.optimize.fmin to minimize it.

Parameters
  • z (ndarray) – Value for which a cost score is calculated.

  • errors (ndarray) – Array of errors.

  • mean (float) – Mean of errors.

  • std (float) – Standard deviation of errors.

Returns

Cost of z.

Return type

float
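
To make the formula concrete, here is a minimal sketch of such a cost function. It is not the library's code. The name z_cost_sketch is hypothetical, the threshold is assumed to be epsilon = mean + z * std as in the referenced paper, and returning infinity when no error lies above (or none lies below) the threshold is an assumption chosen so that a minimizer avoids those values of z. A non-zero mean and std are also assumed.

```python
import numpy as np


def z_cost_sketch(z, errors, mean, std):
    """Illustrative counterpart of z_cost (hypothetical, NumPy only)."""
    errors = np.asarray(errors, dtype=float)
    # An optimizer may pass z as a one-element array; reduce it to a scalar.
    z = float(np.asarray(z).ravel()[0])

    # Assumption: the threshold is derived from z as in the referenced paper.
    epsilon = mean + z * std

    above = errors > epsilon
    below = errors[~above]
    n_above = int(above.sum())

    # Assumption: a threshold that separates nothing gets the worst cost.
    if n_above == 0 or below.size == 0:
        return np.inf

    # Number of continuous sequences above epsilon (starts of runs of True).
    previous = np.concatenate(([False], above[:-1]))
    n_sequences = int((above & ~previous).sum())

    delta_mean = mean - below.mean()
    delta_std = std - below.std()

    goodness = (delta_mean / mean + delta_std / std) / (n_above + n_sequences ** 2)
    return float(-goodness)


errors = np.array([1.0, 1.0, 1.0, 1.0, 10.0])
mean, std = errors.mean(), errors.std()
for z in (1.0, 3.0):
    print(z, z_cost_sketch(z, errors, mean, std))
```

A lower cost means a better z. In this example z = 1 isolates the single outlier and gets a negative cost, while z = 3 puts the threshold above every error and is rejected.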