mlprimitives.custom.timeseries_preprocessing module

mlprimitives.custom.timeseries_preprocessing.cutoff_window_sequences(X, timeseries, window_size, cutoff_time=None, time_index=None)

Extract timeseries sequences based on cutoff times.

Parameters
  • X (pandas.DataFrame) – pandas.DataFrame containing the cutoff time alongside any other values that need to be used to filter the matching timeseries data. The cutoff time can either be set as the DataFrame index or as a column.

  • timeseries (pandas.DataFrame) – pandas.DataFrame containing the actual timeseries data. The time index can either be set as the DataFrame index or as a column.

  • window_size (int, str or Timedelta) – If an integer is passed, it is the number of elements to take before the cutoff time for each sequence. If a string or a Timedelta object is passed, it is the period of time before the cutoff time from which the elements are taken.

  • cutoff_time (str) – Optional. If given, the indicated column will be used as the cutoff time. Otherwise, the table index will be used.

  • time_index (str) – Optional. If given, the indicated column will be used as the timeseries index. Otherwise, the table index will be used.

Returns

NumPy array with three dimensions. The first dimension will have the same length as X, and each of the 2D matrices within it will correspond to one row in the X table.

Return type

numpy.ndarray
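The behaviour described above can be sketched in a few lines of pandas. This is an illustrative re-implementation, not the library source: the name cutoff_windows_sketch is hypothetical, the cutoff times are assumed to be the index of X, the timeseries is assumed to be indexed by timestamps, and rows are assumed to be taken strictly before the cutoff.

```python
import numpy as np
import pandas as pd


def cutoff_windows_sketch(X, timeseries, window_size):
    """Hypothetical sketch: one 2D window of `timeseries` per row of `X`."""
    timeseries = timeseries.sort_index()
    sequences = []
    for cutoff in X.index:
        # Assumption: only rows strictly before the cutoff are eligible.
        history = timeseries.loc[timeseries.index < cutoff]
        if isinstance(window_size, (int, np.integer)):
            # Integer window: the last `window_size` rows before the cutoff.
            window = history.iloc[-window_size:]
        else:
            # String or Timedelta window: rows within that period before the cutoff.
            start = cutoff - pd.Timedelta(window_size)
            window = history.loc[history.index >= start]
        sequences.append(window.to_numpy())
    # Stacking only yields a 3D array when every window has the same length.
    return np.asarray(sequences)


timeseries = pd.DataFrame(
    {"value": np.arange(10.0)},
    index=pd.date_range("2020-01-01", periods=10, freq="D"),
)
X = pd.DataFrame(
    {"label": [0, 1]},
    index=pd.to_datetime(["2020-01-05", "2020-01-09"]),
)

windows = cutoff_windows_sketch(X, timeseries, 3)
windows_by_period = cutoff_windows_sketch(X, timeseries, "3D")
print(windows.shape)
```

With daily data, an integer window of 3 and a period of "3D" select the same rows, which is why both calls are shown side by side.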

mlprimitives.custom.timeseries_preprocessing.intervals_to_mask(index, intervals)

Create boolean mask from given intervals.

The function creates a boolean array of the same size as the given index array. If an index value is within a given interval, the corresponding mask value is True.

Parameters
  • index (ndarray) – Array containing the index values.

  • intervals (list or ndarray) – List or array of intervals, consisting of start-index and end-index for each interval.

Returns

Array of boolean values, with one boolean value for each index value (True if the index value is contained in a given interval, otherwise False).

Return type

ndarray
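A minimal NumPy sketch of the masking logic described above follows. The name intervals_to_mask_sketch is hypothetical, and treating both interval endpoints as inclusive is an assumption, since the description does not say whether the end-index itself is covered.

```python
import numpy as np


def intervals_to_mask_sketch(index, intervals):
    """Hypothetical sketch: True wherever an index value falls in any interval."""
    index = np.asarray(index)
    mask = np.zeros(len(index), dtype=bool)
    for start, end in intervals:
        # Assumption: both the start and the end of an interval are inclusive.
        mask |= (index >= start) & (index <= end)
    return mask


index = np.array([0, 1, 2, 3, 4, 5, 6, 7])
mask = intervals_to_mask_sketch(index, [(1, 2), (5, 6)])
print(mask)
```

A mask of this shape could then be used as a boolean selector on any array aligned with index.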

mlprimitives.custom.timeseries_preprocessing.rolling_window_sequences(X, index, window_size, target_size, step_size, target_column, offset=0, drop=None, drop_windows=False)

Create rolling window sequences out of time series data.

The function creates an array of input sequences and an array of target sequences by rolling over the input sequence with a specified window. Optionally, certain values can be dropped from the sequences.

Parameters
  • X (ndarray) – N-dimensional sequence to iterate over.

  • index (ndarray) – Array containing the index values of X.

  • window_size (int) – Length of the input sequences.

  • target_size (int) – Length of the target sequences.

  • step_size (int) – Number of steps to move the window forward on each iteration.

  • target_column (int) – Index of the column of X that is used as the target.

  • offset (int) – Optional. Number of steps between the end of the input sequence and the start of the target sequence. If not given, 0 is used.

  • drop (ndarray or None or str or float or bool) – Optional. Array of boolean values indicating which values of X are invalid, or value indicating which value should be dropped. If not given, None is used.

  • drop_windows (bool) – Optional. Indicates whether the dropping functionality should be enabled. If not given, False is used.

Returns

  • input sequences.

  • target sequences.

  • first index value of each input sequence.

  • first index value of each target sequence.

Return type

ndarray, ndarray, ndarray, ndarray
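The windowing itself can be sketched as below. This is a simplified illustration under stated assumptions, not the library implementation: the name rolling_windows_sketch is hypothetical, X is assumed to be two-dimensional, and the drop and drop_windows options are left out entirely.

```python
import numpy as np


def rolling_windows_sketch(X, index, window_size, target_size, step_size,
                           target_column, offset=0):
    """Hypothetical sketch of rolling input/target windows (no dropping)."""
    X = np.asarray(X)
    index = np.asarray(index)
    inputs, targets, input_index, target_index = [], [], [], []

    # First start position that no longer leaves room for a full target.
    stop = len(X) - window_size - offset - target_size + 1
    for start in range(0, stop, step_size):
        end = start + window_size
        target_start = end + offset
        inputs.append(X[start:end])
        targets.append(X[target_start:target_start + target_size, target_column])
        input_index.append(index[start])
        target_index.append(index[target_start])

    return (np.asarray(inputs), np.asarray(targets),
            np.asarray(input_index), np.asarray(target_index))


X = np.arange(10, dtype=float).reshape(-1, 1)
index = np.arange(10)

seqs, targets, seq_index, target_index = rolling_windows_sketch(
    X, index, window_size=3, target_size=1, step_size=2, target_column=0
)
print(seqs.shape, targets.shape)
```

With ten rows, a window of 3, a target of 1 and a step of 2, the windows start at positions 0, 2, 4 and 6, and each target is the single value that follows its window.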

mlprimitives.custom.timeseries_preprocessing.time_segments_aggregate(X, interval, time_column, method=['mean'])

Aggregate values over given time span.

Parameters
  • X (ndarray or pandas.DataFrame) – N-dimensional sequence of values.

  • interval (int) – Integer denoting time span to compute aggregation of.

  • time_column (int) – Column of X that contains time values.

  • method (str or list) – Optional. String describing aggregation method or list of strings describing multiple aggregation methods. If not given, mean is used.

Returns

  • Sequence of aggregated values, one column for each aggregation method.

  • Sequence of index values (first index of each aggregated segment).

Return type

ndarray, ndarray
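One way to picture the aggregation is the following pandas sketch. It is an assumption-laden illustration rather than the library code: the name time_segments_aggregate_sketch is hypothetical, segments are assumed to be half-open (start included, end excluded), and the first segment is assumed to start at the smallest time value.

```python
import numpy as np
import pandas as pd


def time_segments_aggregate_sketch(X, interval, time_column, method=("mean",)):
    """Hypothetical sketch: aggregate values over fixed-length time segments."""
    if isinstance(method, str):
        method = [method]

    X = pd.DataFrame(X).sort_values(time_column).set_index(time_column)

    start_ts = X.index.values[0]
    max_ts = X.index.values[-1]

    values, index = [], []
    while start_ts <= max_ts:
        end_ts = start_ts + interval
        # Assumption: each segment covers [start_ts, end_ts).
        segment = X.loc[(X.index >= start_ts) & (X.index < end_ts)]
        # One block of aggregated values per requested method.
        row = [segment.agg(name).to_numpy() for name in method]
        values.append(np.concatenate(row))
        index.append(start_ts)
        start_ts = end_ts

    return np.asarray(values), np.asarray(index)


# Column 0 holds the time values, column 1 the measurements.
data = np.array([
    [0, 1.0], [1, 3.0], [2, 5.0],
    [3, 7.0], [4, 9.0], [5, 11.0],
])

agg_values, agg_index = time_segments_aggregate_sketch(
    data, interval=2, time_column=0, method=["mean", "max"]
)
print(agg_values)
print(agg_index)
```

Here each output row holds the mean followed by the maximum of one two-unit segment, and agg_index holds the first time value of each segment.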

mlprimitives.custom.timeseries_preprocessing.time_segments_average(X, interval, time_column)

Compute average of values over given time span.

Parameters
  • X (ndarray or pandas.DataFrame) – N-dimensional sequence of values.

  • interval (int) – Integer denoting time span to compute average of.

  • time_column (int) – Column of X that contains time values.

Returns

  • Sequence of averaged values.

  • Sequence of index values (first index of each averaged segment).

Return type

ndarray, ndarray
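For plain arrays, the averaging can also be sketched with NumPy alone. The name time_segments_average_sketch is hypothetical, the half-open segment boundaries are an assumption, and returning NaN for empty segments is a choice made for this illustration only.

```python
import numpy as np


def time_segments_average_sketch(X, interval, time_column):
    """Hypothetical sketch: mean of the value columns per time segment."""
    X = np.asarray(X, dtype=float)
    X = X[np.argsort(X[:, time_column])]

    times = X[:, time_column]
    values = np.delete(X, time_column, axis=1)

    averaged, index = [], []
    start = times[0]
    while start <= times[-1]:
        # Assumption: each segment covers [start, start + interval).
        in_segment = (times >= start) & (times < start + interval)
        if in_segment.any():
            averaged.append(np.nanmean(values[in_segment], axis=0))
        else:
            # Illustration-only choice: empty segments become NaN rows.
            averaged.append(np.full(values.shape[1], np.nan))
        index.append(start)
        start = start + interval

    return np.asarray(averaged), np.asarray(index)


data = np.array([
    [0, 1.0], [1, 3.0], [2, 5.0],
    [3, 7.0], [4, 9.0], [5, 11.0],
])

avg_values, avg_index = time_segments_average_sketch(data, interval=2, time_column=0)
print(avg_values)
print(avg_index)
```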