Quickstart¶
Below is a short tutorial that will show you how to get started using MLBlocks.
In this tutorial we will learn how to:

- Create a pipeline using multiple primitives
- Obtain the list of tunable hyperparameters from the pipeline
- Specify hyperparameters for each primitive in the pipeline
- Fit the pipeline using training data
- Use the pipeline to make predictions from new data
Note
Running this Quickstart requires some additional dependencies; in particular, the primitives used below come from the mlprimitives library. Make sure that you have already installed them.
Creating a pipeline¶
With MLBlocks, creating a pipeline is as simple as specifying a list of primitives and passing them to the MLPipeline class:
In [1]: from mlblocks import MLPipeline
In [2]: primitives = [
...: 'mlprimitives.custom.preprocessing.ClassEncoder',
...: 'mlprimitives.custom.feature_extraction.CategoricalEncoder',
...: 'sklearn.impute.SimpleImputer',
...: 'xgboost.XGBClassifier',
...: 'mlprimitives.custom.preprocessing.ClassDecoder'
...: ]
...:
In [3]: pipeline = MLPipeline(primitives)
Optionally, specific hyperparameters can also be set by specifying them in a dictionary and passing it as the init_params argument:
In [4]: init_params = {
...: 'sklearn.impute.SimpleImputer': {
...: 'strategy': 'median'
...: }
...: }
...:
In [5]: pipeline = MLPipeline(primitives, init_params=init_params)
Once the pipeline has been instantiated, we can easily see which hyperparameters have been set for each block by calling the get_hyperparameters method.
The output of this method is a dictionary with the name of each block as keys and a dictionary of the corresponding block's hyperparameters as values.
In [6]: pipeline.get_hyperparameters()
Out[6]:
{'mlprimitives.custom.preprocessing.ClassEncoder#1': {},
'mlprimitives.custom.feature_extraction.CategoricalEncoder#1': {'keep': False,
'copy': True,
'features': 'auto',
'max_unique_ratio': 0,
'max_labels': 0},
'sklearn.impute.SimpleImputer#1': {'missing_values': nan,
'fill_value': None,
'verbose': False,
'copy': True,
'strategy': 'median'},
'xgboost.XGBClassifier#1': {'n_jobs': -1,
'n_estimators': 100,
'max_depth': 3,
'learning_rate': 0.1,
'gamma': 0,
'min_child_weight': 1},
'mlprimitives.custom.preprocessing.ClassDecoder#1': {}}
Tunable Hyperparameters¶
One of the main features of MLBlocks JSON Annotations is the ability to indicate the type and the possible values that each primitive hyperparameter accepts.
The list of tunable hyperparameters and their details can easily be obtained from the pipeline instance by calling its get_tunable_hyperparameters method.
The output of this method is a dictionary that contains the tunable hyperparameters of each block in the pipeline, ready to be passed to a hyperparameter tuning library such as BTB; a small sketch of how this specification could be consumed is shown after the output below.
In [7]: pipeline.get_tunable_hyperparameters()
Out[7]:
{'mlprimitives.custom.preprocessing.ClassEncoder#1': {},
'mlprimitives.custom.feature_extraction.CategoricalEncoder#1': {'max_labels': {'type': 'int',
'default': 0,
'range': [0, 100]}},
'sklearn.impute.SimpleImputer#1': {},
'xgboost.XGBClassifier#1': {'n_estimators': {'type': 'int',
'default': 100,
'range': [10, 1000]},
'max_depth': {'type': 'int', 'default': 3, 'range': [3, 10]},
'learning_rate': {'type': 'float', 'default': 0.1, 'range': [0, 1]},
'gamma': {'type': 'float', 'default': 0, 'range': [0, 1]},
'min_child_weight': {'type': 'int', 'default': 1, 'range': [1, 10]}},
'mlprimitives.custom.preprocessing.ClassDecoder#1': {}}
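As a rough illustration of how a tuner could consume this specification, the following sketch (not part of the recorded session above; it uses only the dictionary just shown and Python's standard random module) draws one random value for every int or float hyperparameter and applies the resulting candidate with set_hyperparameters:

import random

tunables = pipeline.get_tunable_hyperparameters()

candidate = {}
for block_name, block_tunables in tunables.items():
    block_params = {}
    for name, spec in block_tunables.items():
        if spec['type'] == 'int':
            # integer hyperparameters: pick a value inside the annotated range
            low, high = spec['range']
            block_params[name] = random.randint(low, high)
        elif spec['type'] == 'float':
            # float hyperparameters: sample uniformly from the annotated range
            low, high = spec['range']
            block_params[name] = random.uniform(low, high)
        else:
            # other hyperparameter types are left at their default in this simple sketch
            block_params[name] = spec['default']
    candidate[block_name] = block_params

# apply the sampled candidate to the pipeline
pipeline.set_hyperparameters(candidate)

A real tuning library such as BTB would replace the random sampling with a smarter search strategy, but the final set_hyperparameters call stays the same.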
Setting Hyperparameters¶
Modifying the hyperparameters of an already instantiated pipeline can be done using the set_hyperparameters method, which expects a dictionary in the same format as the one returned by the get_hyperparameters method.
Note that if only a subset of the hyperparameters is passed, only those will be modified; the rest will keep their current values.
In [8]: new_hyperparameters = {
...: 'xgboost.XGBClassifier#1': {
...: 'max_depth': 15
...: }
...: }
...:
In [9]: pipeline.set_hyperparameters(new_hyperparameters)
In [10]: hyperparameters = pipeline.get_hyperparameters()
In [11]: hyperparameters['xgboost.XGBClassifier#1']['max_depth']
Out[11]: 15
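To confirm that the remaining hyperparameters were left untouched, we can check one of them against the values seen earlier; this is a small sketch rather than part of the recorded session:

# max_depth was changed to 15, but n_estimators keeps its previous value of 100
hyperparameters = pipeline.get_hyperparameters()
assert hyperparameters['xgboost.XGBClassifier#1']['max_depth'] == 15
assert hyperparameters['xgboost.XGBClassifier#1']['n_estimators'] == 100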
Making predictions¶
Once we have created the pipeline with the desired hyperparameters, we can fit it and then use it to make predictions on new data.
To do this, we first call the fit method, passing the training data and the corresponding labels.
In [12]: from mlprimitives.datasets import load_census
In [13]: dataset = load_census()
In [14]: X_train, X_test, y_train, y_test = dataset.get_splits(1)
In [15]: pipeline.fit(X_train, y_train)
Once we have fitted our model to our data, we can call the predict method, passing it new data, to obtain predictions from the pipeline.
In [16]: predictions = pipeline.predict(X_test)
In [17]: predictions
Out[17]:
array([' >50K', ' <=50K', ' >50K', ..., ' >50K', ' <=50K', ' <=50K'],
dtype=object)
In [18]: dataset.score(y_test, predictions)
Out[18]: 0.8637759489006265