Single Table Pipelines
In this section we will go over a few pipeline examples to show MLBlocks working in different scenarios and with different types of data.
Each example uses a demo dataset that can be downloaded with one of the functions in the mlprimitives.datasets module.
Note
Even though the datasets are not especially big, some of the examples might use a considerable amount of resources, especially memory, and might take several minutes to run.
Regression Pipeline
In the simplest example, we will use a single RandomForestRegressor primitive over the numeric data from The Boston Dataset, which we will load using the mlprimitives.datasets.load_boston function.
from mlblocks import MLPipeline
from mlprimitives.datasets import load_boston

# Download the dataset and show its description
dataset = load_boston()
dataset.describe()

# Get a single train/test split
X_train, X_test, y_train, y_test = dataset.get_splits(1)

primitives = [
    'sklearn.ensemble.RandomForestRegressor'
]
pipeline = MLPipeline(primitives)
pipeline.fit(X_train, y_train)

# Predict on the held-out data and score the predictions
predictions = pipeline.predict(X_test)
dataset.score(y_test, predictions)
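The text does not say which metric dataset.score applies. Assuming a regression dataset is scored with the coefficient of determination (R²), which is an assumption and not something confirmed on this page, the computation can be sketched in plain Python as follows. The r_squared helper is hypothetical and is not part of MLBlocks or MLPrimitives.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration only; it assumes y_true
    is not constant, so that SS_tot is non-zero.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]

# Perfect predictions give the maximum score
perfect = r_squared(y_true, y_true)

# Always predicting the mean of y_true (6.0) gives a score of zero
baseline = r_squared(y_true, [6.0, 6.0, 6.0, 6.0])

print(perfect, baseline)
```

Under this assumption, a score close to 1 means the pipeline explains most of the variance in the target, while a score near 0 means it does no better than predicting the mean.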
Classification Pipeline
As a classification example, we will use The Iris Dataset, which we will load using the mlprimitives.datasets.load_iris function.
Here we will combine the StandardScaler from scikit-learn with an XGBClassifier primitive.
In this case, we will also be passing some initialization parameters for the XGBClassifier.
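from mlblocks import MLPipeline
from mlprimitives.datasets import load_iris

# Download the dataset and show its description
dataset = load_iris()
dataset.describe()

# Get a single train/test split
X_train, X_test, y_train, y_test = dataset.get_splits(1)

# Primitives are applied in the order in which they are listed
primitives = [
    'sklearn.preprocessing.StandardScaler',
    'xgboost.XGBClassifier'
]

# Initialization parameters, keyed by primitive name
init_params = {
    'xgboost.XGBClassifier': {
        'learning_rate': 0.1
    }
}
pipeline = MLPipeline(primitives, init_params)
pipeline.fit(X_train, y_train)

# Predict on the held-out data and score the predictions
predictions = pipeline.predict(X_test)
dataset.score(y_test, predictions)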
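To make the roles of the primitives list and the init_params dictionary more concrete, here is a minimal pure-Python sketch of the general idea: each name is looked up, instantiated with the parameters registered under that name, and the steps are chained so that the output of one feeds the next. Everything in it (ToyScaler, ToyCentroidClassifier, ToyPipeline, REGISTRY and the 'toy.*' names) is hypothetical and written only for illustration. It is not how MLBlocks is implemented.

```python
from collections import defaultdict


class ToyScaler:
    """Hypothetical stand-in for a scaling primitive."""

    def fit(self, X, y=None):
        columns = list(zip(*X))
        self.means = [sum(col) / len(col) for col in columns]
        # Fall back to 1.0 for constant columns to avoid dividing by zero
        self.stds = [
            (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
            for col, m in zip(columns, self.means)
        ]

    def transform(self, X):
        return [
            [(v - m) / s for v, m, s in zip(row, self.means, self.stds)]
            for row in X
        ]


class ToyCentroidClassifier:
    """Hypothetical nearest-centroid classifier with one init parameter."""

    def __init__(self, p=2):
        self.p = p  # order of the Minkowski distance

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }

    def _distance(self, a, b):
        # The p-th root is skipped: it does not change which centroid is nearest
        return sum(abs(u - v) ** self.p for u, v in zip(a, b))

    def predict(self, X):
        return [
            min(self.centroids,
                key=lambda label: self._distance(row, self.centroids[label]))
            for row in X
        ]


# Hypothetical name-to-class lookup, playing the role of a primitive catalog
REGISTRY = {
    'toy.Scaler': ToyScaler,
    'toy.CentroidClassifier': ToyCentroidClassifier,
}


class ToyPipeline:
    def __init__(self, primitives, init_params=None):
        init_params = init_params or {}
        # Each primitive receives only the parameters stored under its own name
        self.steps = [
            REGISTRY[name](**init_params.get(name, {}))
            for name in primitives
        ]

    def fit(self, X, y):
        for step in self.steps[:-1]:
            step.fit(X, y)
            X = step.transform(X)
        self.steps[-1].fit(X, y)

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


X_train = [[1.0, 10.0], [2.0, 12.0], [8.0, 30.0], [9.0, 33.0]]
y_train = [0, 0, 1, 1]

pipeline = ToyPipeline(
    ['toy.Scaler', 'toy.CentroidClassifier'],
    {'toy.CentroidClassifier': {'p': 1}},
)
pipeline.fit(X_train, y_train)
predictions = pipeline.predict([[1.5, 11.0], [8.5, 31.0]])
print(predictions)  # [0, 1]
```

Keying the parameters by primitive name keeps the list of steps readable and lets any step be configured without touching the others. Primitives without an entry simply fall back to their defaults.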