Single Table Pipelines

This section walks through a few example pipelines that show MLBlocks working in different scenarios and with different types of data.

Each example uses a dataset that can be downloaded with one of the loader functions found in the mlprimitives.datasets module.

Note

Even though the datasets are not especially big, some of the examples might use a considerable amount of resources, especially memory, and might take several minutes to run.
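To check how much time and memory an example takes on a particular machine, the call can be wrapped in a small measuring helper. The following is a minimal sketch that uses only the Python standard library. `measure` and `workload` are hypothetical names, not part of MLBlocks, and `tracemalloc` only sees allocations reported to Python's memory tracer, so memory used inside compiled libraries may not be fully counted.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds, peak_bytes).

    Hypothetical helper for illustration, not part of MLBlocks.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def workload(n):
    """Stand-in for an expensive call such as fitting a pipeline."""
    return sum([x * x for x in range(n)])


result, elapsed, peak = measure(workload, 100_000)
print(f"elapsed: {elapsed:.3f}s, peak memory: {peak / 1e6:.2f} MB")
```

In practice the stand-in workload would be replaced by the call of interest, for example the fitting step of one of the pipelines in this section.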

Regression Pipeline

In the simplest example, we will use a single RandomForestRegressor primitive over the numeric data from The Boston Dataset, which we will load using the mlprimitives.datasets.load_boston function.

from mlblocks import MLPipeline
from mlprimitives.datasets import load_boston

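# Download the dataset and print its description
dataset = load_boston()
dataset.describe()

# Get a single train/test split
X_train, X_test, y_train, y_test = dataset.get_splits(1)

# Build a pipeline made of a single primitive
primitives = [
    'sklearn.ensemble.RandomForestRegressor'
]
pipeline = MLPipeline(primitives)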

pipeline.fit(X_train, y_train)

predictions = pipeline.predict(X_test)

dataset.score(y_test, predictions)
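The value returned by `dataset.score` depends on the metric attached to the dataset. As an illustration only, and assuming that metric is the coefficient of determination (R²), which this section does not state, the computation can be sketched in plain Python. The `r2` function is a hypothetical stand-in, not the MLBlocks or MLPrimitives implementation.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy target values, not taken from any real dataset
y_true = [3.0, 5.0, 7.0, 9.0]

perfect = r2(y_true, y_true)         # predictions match the targets exactly
baseline = r2(y_true, [6.0] * 4)     # always predict the mean of the targets

print(perfect)   # 1.0
print(baseline)  # 0.0
```

Under this assumption, a score of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean. The sketch does not handle a constant target, where the denominator would be zero.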

Classification Pipeline

As a classification example, we will use The Iris Dataset, which we will load using the mlprimitives.datasets.load_iris function.

Here we will combine the StandardScaler from scikit-learn with an XGBClassifier primitive.

In this case, we will also be passing some initialization parameters for the XGBClassifier.

from mlblocks import MLPipeline
from mlprimitives.datasets import load_iris

dataset = load_iris()
dataset.describe()

X_train, X_test, y_train, y_test = dataset.get_splits(1)

primitives = [
    'sklearn.preprocessing.StandardScaler',
    'xgboost.XGBClassifier'
]
# Initialization parameters are keyed by the name of the primitive they apply to
init_params = {
    'xgboost.XGBClassifier': {
        'learning_rate': 0.1
    }
}
pipeline = MLPipeline(primitives, init_params=init_params)

pipeline.fit(X_train, y_train)

predictions = pipeline.predict(X_test)

dataset.score(y_test, predictions)
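The scaling step in this pipeline standardizes each feature column before the classifier sees it. As a rough sketch of the idea, assuming the usual definition (subtract the column mean and divide by the column's population standard deviation, both learned from the training data only), the transformation can be written in plain Python. The helper names `fit_scaler` and `transform` are hypothetical, and this is not how scikit-learn implements StandardScaler.

```python
import math


def fit_scaler(rows):
    """Learn the per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    # Avoid dividing by zero for constant columns
    stds = [s if s > 0 else 1.0 for s in stds]
    return means, stds


def transform(rows, means, stds):
    """Apply previously learned statistics to any set of rows."""
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy data, not taken from the Iris dataset
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[2.0, 40.0]]

# Statistics come from the training rows and are reused on the test rows
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)
```

The point of the design is that the test rows are scaled with statistics learned from the training rows, mirroring the way the pipeline is fitted on X_train and then used to predict on X_test.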