Quickstart¶
In this short tutorial we will guide you through the necessary steps to get started using BTB to select and tune the best model to solve a Machine Learning problem.
In particular, in this example we will be using BTBSession
to solve the Wine classification problem by selecting between the DecisionTreeClassifier
and the SGDClassifier
models from scikit-learn, while also searching for their best hyperparameter configuration.
Prepare a scoring function¶
The first step in order to use the BTBSession
class is to develop a scoring function.
This is a Python function that, given a model name and a hyperparameter configuration, evaluates the performance of the model on your data and returns a score.
[2]:
from sklearn.datasets import load_wine
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
dataset = load_wine()

models = {
    'DTC': DecisionTreeClassifier,
    'SGDC': SGDClassifier,
}

def scoring_function(model_name, hyperparameter_values):
    model_class = models[model_name]
    model_instance = model_class(**hyperparameter_values)
    scores = cross_val_score(
        estimator=model_instance,
        X=dataset.data,
        y=dataset.target,
        scoring=make_scorer(f1_score, average='macro')
    )
    return scores.mean()
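Before wiring it into a session, it can be worth calling the scoring function directly with an arbitrary configuration to check that it runs end to end. The following self-contained sanity check repeats the definitions above and evaluates a single configuration; the hyperparameter values used are arbitrary, not a tuned result:

```python
from sklearn.datasets import load_wine
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

dataset = load_wine()
models = {'DTC': DecisionTreeClassifier}

def scoring_function(model_name, hyperparameter_values):
    model_instance = models[model_name](**hyperparameter_values)
    scores = cross_val_score(
        estimator=model_instance,
        X=dataset.data,
        y=dataset.target,
        scoring=make_scorer(f1_score, average='macro')
    )
    return scores.mean()

# Arbitrary (untuned) values, just to exercise the function end to end
score = scoring_function('DTC', {'max_depth': 5, 'min_samples_split': 0.1})
```

Since the score is a mean macro F1 over the cross-validation folds, it should always be a float between 0 and 1.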
Define the tunable hyperparameters¶
The second step is to define the hyperparameters that we want to tune for each model as Tunables.
[3]:
from btb.tuning import Tunable
from btb.tuning import hyperparams as hp
tunables = {
    'DTC': Tunable({
        'max_depth': hp.IntHyperParam(min=3, max=200),
        'min_samples_split': hp.FloatHyperParam(min=0.01, max=1)
    }),
    'SGDC': Tunable({
        'max_iter': hp.IntHyperParam(min=1, max=5000, default=1000),
        'tol': hp.FloatHyperParam(min=1e-3, max=1, default=1e-3),
    })
}
Start the searching process¶
Once you have defined a scoring function and the tunable hyperparameters specification of your models, you can start searching for the best model and hyperparameter configuration by using the btb.BTBSession.
All you need to do is create an instance passing the tunable hyperparameters specification and the scoring function.
[4]:
from btb import BTBSession
session = BTBSession(
    tunables=tunables,
    scorer=scoring_function,
    verbose=True
)
And then call the run
method, indicating how many tuning iterations you want the Session to perform:
[5]:
best_proposal = session.run(20)
The result will be a dictionary indicating the name of the best model found, the hyperparameter configuration that achieved it, and the score it obtained:
[6]:
best_proposal
[6]:
{'id': 'e47c13afa40a10d55da91a13add6e142',
'name': 'DTC',
'config': {'max_depth': 3, 'min_samples_split': 0.1445639630277333},
'score': 0.9127678612465631}
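With the best proposal in hand, a natural last step is to fit a final model on the full dataset using the winning configuration. This is a minimal sketch that hardcodes the DTC proposal shown above for illustration; in practice you would use the dictionary returned by session.run, and the exact id, config, and score values will vary between runs:

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

# Hypothetical best proposal, mirroring the structure returned by session.run
best_proposal = {
    'name': 'DTC',
    'config': {'max_depth': 3, 'min_samples_split': 0.1445639630277333},
}

models = {'DTC': DecisionTreeClassifier}

# Instantiate the winning model with the tuned hyperparameters and fit it
dataset = load_wine()
model = models[best_proposal['name']](**best_proposal['config'])
model.fit(dataset.data, dataset.target)
predictions = model.predict(dataset.data)
```

Because the proposal stores the model name rather than the class itself, looking it up in the same models dictionary used by the scoring function keeps the final fit consistent with what was evaluated during tuning.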