Cardea
Featurization.generate_feature_matrix
Calculates a feature matrix, along with the list of its feature definitions, from the given entityset.
es (featuretools.EntitySet) – An already initialized entityset.
target (str) – Name of the entity (entity id) on which to make predictions.
label_times (pandas.DataFrame) – A data frame that specifies the times at which to calculate the features for each instance. It contains three columns: instance_id, time, and label. The instance_id column identifies the instances over which features are calculated. The time column gives the cutoff time for each instance; data before the cutoff time (and, when include_cutoff_time is True, at the cutoff time) is used to calculate the feature matrix. The label column holds the ground-truth label (the value to predict) for each instance.
instance_ids (list) – List of instances on which to calculate features.
agg_primitives (list) – List of Aggregation Feature types to apply.
trans_primitives (list) – List of Transform Feature functions to apply.
max_depth (int) – Maximum allowed depth of features.
ignore_entities (list) – List of entities to blacklist when creating features.
ignore_variables (dict) – Dictionary mapping each entity to the list of specific variables within it to blacklist when creating features.
seed_features (list) – List of manually defined features to use.
drop_contains (list) – Drop features whose names contain any of these strings.
drop_exact (list) – Drop features whose names exactly match any of these strings.
max_features (int) – Cap the number of generated features to this number. If -1, no limit.
training_window (ft.Timedelta or str) – Window defining how far before the cutoff time data can be used when calculating features. If None, all data before the cutoff time is used. Defaults to None. Month and year units are not relative when pandas Timedeltas are used; relative units should be passed as a Featuretools Timedelta or a string.
n_jobs (int) – Number of parallel processes to use when calculating feature matrix.
verbose (bool) – Whether to enable verbose output.
include_cutoff_time (bool) – Include data at cutoff times in feature calculations. Defaults to True.
encode (bool) – Whether to one-hot encode categorical features.
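To make the label_times format and the cutoff semantics more concrete, the sketch below builds a label_times frame with the three documented columns and then, using plain pandas rather than Cardea or Featuretools, counts each instance's events that would be usable at its cutoff time. The events table, its event_time column, and the helper count_events_before_cutoff are hypothetical. The sketch only mimics how cutoff time, training_window, and include_cutoff_time are described above. Treating the window start as inclusive is an assumption, and pd.Timedelta does not support relative month or year units.

```python
import pandas as pd

# label_times with the three documented columns: instance_id, time, label.
label_times = pd.DataFrame({
    "instance_id": [1, 2],
    "time": pd.to_datetime(["2020-01-10", "2020-01-20"]),
    "label": [True, False],
})

# Hypothetical raw events belonging to each instance.
events = pd.DataFrame({
    "instance_id": [1, 1, 1, 2, 2],
    "event_time": pd.to_datetime([
        "2020-01-01", "2020-01-09", "2020-01-10",
        "2020-01-05", "2020-01-25",
    ]),
})


def count_events_before_cutoff(events, label_times, training_window=None,
                               include_cutoff_time=True):
    """Count each instance's events usable at its cutoff time.

    Hypothetical helper: an illustration of the documented semantics,
    not how Cardea or Featuretools computes features.
    """
    # Attach each instance's cutoff time to its events.
    merged = events.merge(label_times, on="instance_id")
    # Keep events before the cutoff, optionally including the cutoff itself.
    if include_cutoff_time:
        mask = merged["event_time"] <= merged["time"]
    else:
        mask = merged["event_time"] < merged["time"]
    # Assumption: the window start is inclusive.
    if training_window is not None:
        start = merged["time"] - pd.Timedelta(training_window)
        mask &= merged["event_time"] >= start
    counts = merged[mask].groupby("instance_id").size()
    # Instances with no usable events get a count of 0.
    return counts.reindex(label_times["instance_id"], fill_value=0)
```

With the data above, instance 1 has three usable events by default and two when include_cutoff_time is False, because its third event falls exactly on the cutoff. A training_window of "5 days" leaves instance 2 with none, since its only earlier event is 15 days before its cutoff.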
Returns: The generated feature matrix, and the list of feature definitions in the feature matrix.
Return type: pandas.DataFrame, list
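The drop_contains, drop_exact, and max_features parameters are described in terms of feature names. The following plain-Python sketch applies those three descriptions to a list of name strings. The helper filter_feature_names and the example names are hypothetical, and this is not the Featuretools implementation. In particular, applying the drop filters before the max_features cap is an assumption made for illustration.

```python
def filter_feature_names(names, drop_contains=None, drop_exact=None,
                         max_features=-1):
    """Apply drop_contains, drop_exact and max_features to feature names.

    Hypothetical helper: a plain-Python reading of the parameter
    descriptions, not the library's implementation.
    """
    drop_contains = drop_contains or []
    drop_exact = set(drop_exact or [])
    # Drop names that match exactly or contain any of the given substrings.
    kept = [
        name for name in names
        if name not in drop_exact
        and not any(s in name for s in drop_contains)
    ]
    # -1 means no limit on the number of features.
    if max_features != -1:
        kept = kept[:max_features]
    return kept
```

For example, with hypothetical names such as "age", "COUNT(encounters)", and "MAX(observations.value)", passing drop_contains=["MAX("] and drop_exact=["age"] keeps only "COUNT(encounters)".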