Quickstart

Below is a short tutorial that shows how to get started with MLBlocks.

In this tutorial we will learn how to:

  • Create a pipeline using multiple primitives

  • Obtain the list of tunable hyperparameters from the pipeline

  • Specify hyperparameters for each primitive in the pipeline

  • Fit the pipeline using training data

  • Use the pipeline to make predictions from new data

Note

Some additional dependencies are required in order to run this Quickstart. Make sure that you have already installed them.
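
In particular, the examples below use primitives and the census dataset provided by the mlprimitives package, so it needs to be available in your environment. Assuming a standard pip setup, it can typically be installed with:

    pip install mlprimitives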

Creating a pipeline

With MLBlocks, creating a pipeline is as simple as specifying a list of primitives and passing them to the MLPipeline class:

In [1]: from mlblocks import MLPipeline

In [2]: primitives = [
   ...:     'mlprimitives.custom.preprocessing.ClassEncoder',
   ...:     'mlprimitives.custom.feature_extraction.CategoricalEncoder',
   ...:     'sklearn.impute.SimpleImputer',
   ...:     'xgboost.XGBClassifier',
   ...:     'mlprimitives.custom.preprocessing.ClassDecoder'
   ...: ]
   ...: 

In [3]: pipeline = MLPipeline(primitives)

Optionally, specific hyperparameters can also be set by specifying them in a dictionary and passing them as the init_params argument:

In [4]: init_params = {
   ...:     'sklearn.impute.SimpleImputer': {
   ...:         'strategy': 'median'
   ...:     }
   ...: }
   ...: 

In [5]: pipeline = MLPipeline(primitives, init_params=init_params)

Once the pipeline has been instantiated, we can easily see which hyperparameters have been set for each block by calling the get_hyperparameters method.

The output of this method is a dictionary whose keys are the block names and whose values are dictionaries with the hyperparameters of the corresponding block. Note that each block name is the primitive name followed by a counter suffix such as #1, which allows the same primitive to appear more than once in a pipeline.

In [6]: pipeline.get_hyperparameters()
Out[6]: 
{'mlprimitives.custom.preprocessing.ClassEncoder#1': {},
 'mlprimitives.custom.feature_extraction.CategoricalEncoder#1': {'keep': False,
  'copy': True,
  'features': 'auto',
  'max_unique_ratio': 0,
  'max_labels': 0},
 'sklearn.impute.SimpleImputer#1': {'missing_values': nan,
  'fill_value': None,
  'verbose': False,
  'copy': True,
  'strategy': 'median'},
 'xgboost.XGBClassifier#1': {'n_jobs': -1,
  'n_estimators': 100,
  'max_depth': 3,
  'learning_rate': 0.1,
  'gamma': 0,
  'min_child_weight': 1},
 'mlprimitives.custom.preprocessing.ClassDecoder#1': {}}

Tunable Hyperparameters

One of the main features of MLBlocks JSON Annotations is that they can indicate the type and the range of values that each primitive hyperparameter accepts.

The list of tunable hyperparameters and their details can easily be obtained from the pipeline instance by calling its get_tunable_hyperparameters method.

The output of this method is a dictionary that contains the list of tunable hyperparameters for each block in the pipeline, ready to be passed to any hyperparameter tuning library such as BTB.

In [7]: pipeline.get_tunable_hyperparameters()
Out[7]: 
{'mlprimitives.custom.preprocessing.ClassEncoder#1': {},
 'mlprimitives.custom.feature_extraction.CategoricalEncoder#1': {'max_labels': {'type': 'int',
   'default': 0,
   'range': [0, 100]}},
 'sklearn.impute.SimpleImputer#1': {},
 'xgboost.XGBClassifier#1': {'n_estimators': {'type': 'int',
   'default': 100,
   'range': [10, 1000]},
  'max_depth': {'type': 'int', 'default': 3, 'range': [3, 10]},
  'learning_rate': {'type': 'float', 'default': 0.1, 'range': [0, 1]},
  'gamma': {'type': 'float', 'default': 0, 'range': [0, 1]},
  'min_child_weight': {'type': 'int', 'default': 1, 'range': [1, 10]}},
 'mlprimitives.custom.preprocessing.ClassDecoder#1': {}}
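
For illustration, the following is a minimal, library-agnostic sketch that draws one random candidate configuration from these specifications using only the Python standard library. The sample_candidate helper is hypothetical and only meant to show the shape of the data; a dedicated tuner such as BTB would explore the search space far more efficiently.

import random

def sample_candidate(tunables):
    """Draw one random value inside each tunable hyperparameter's declared range."""
    candidate = {}
    for block, hyperparams in tunables.items():
        candidate[block] = {}
        for name, spec in hyperparams.items():
            if spec['type'] == 'int':
                low, high = spec['range']
                candidate[block][name] = random.randint(low, high)
            elif spec['type'] == 'float':
                low, high = spec['range']
                candidate[block][name] = random.uniform(low, high)
            else:
                # Fall back to the default for any type not handled in this sketch
                candidate[block][name] = spec['default']
    return candidate

candidate = sample_candidate(pipeline.get_tunable_hyperparameters())

The resulting dictionary has the same structure as the one expected by the set_hyperparameters method, which is described in the next section.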

Setting Hyperparameters

Modifying the hyperparameters of an already instantiated pipeline can be done using the set_hyperparameters method, which expects a dictionary with the same format as the one returned by the get_hyperparameters method.

Note that if only a subset of the hyperparameters is passed, only those will be modified, and the rest will keep their current values.

In [8]: new_hyperparameters = {
   ...:     'xgboost.XGBClassifier#1': {
   ...:         'max_depth': 15
   ...:     }
   ...: }
   ...: 

In [9]: pipeline.set_hyperparameters(new_hyperparameters)

In [10]: hyperparameters = pipeline.get_hyperparameters()

In [11]: hyperparameters['xgboost.XGBClassifier#1']['max_depth']
Out[11]: 15

Making predictions

Once we have created the pipeline with the desired hyperparameters, we can fit it and then use it to make predictions on new data.

To do this, we first call the fit method, passing it the training data and the corresponding labels.

In [12]: from mlprimitives.datasets import load_census

In [13]: dataset = load_census()

In [14]: X_train, X_test, y_train, y_test = dataset.get_splits(1)

In [15]: pipeline.fit(X_train, y_train)

Once the pipeline has been fitted to the training data, we can call the predict method with new data to obtain predictions.

In [16]: predictions = pipeline.predict(X_test)

In [17]: predictions
Out[17]: 
array([' >50K', ' <=50K', ' >50K', ..., ' >50K', ' <=50K', ' <=50K'],
      dtype=object)

In [18]: dataset.score(y_test, predictions)
Out[18]: 0.8637759489006265
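
Putting it all together, the methods shown in this Quickstart are enough to run a very simple random search over the tunable hyperparameters. The sketch below is only an illustration that reuses the hypothetical sample_candidate helper from the Tunable Hyperparameters section; in practice a dedicated tuner such as BTB would drive the search, and candidates would be evaluated on a separate validation split rather than on the test set.

best_score = -float('inf')
best_params = None

for _ in range(10):
    # Draw a random configuration and apply it to the pipeline
    candidate = sample_candidate(pipeline.get_tunable_hyperparameters())
    pipeline.set_hyperparameters(candidate)

    # Re-fit and evaluate; the score is assumed higher-is-better,
    # as with the accuracy-style score shown above
    pipeline.fit(X_train, y_train)
    score = dataset.score(y_test, pipeline.predict(X_test))

    if score > best_score:
        best_score, best_params = score, candidate

# Keep the best configuration found and re-fit the pipeline with it
pipeline.set_hyperparameters(best_params)
pipeline.fit(X_train, y_train)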