BTBSession

What is BTBSession

BTBSession provides a simplified user interface for searching for the best solution to your tuning problem, combining tuners and selectors in as few steps as possible.

Creating a BTBSession

We will guide you through the necessary steps to get started using BTBSession to select and tune the best model to solve a Machine Learning problem.

In particular, in this example we will use BTBSession to solve the Wine classification problem by selecting between the DecisionTreeClassifier and the SGDClassifier models from scikit-learn while also searching for their best hyperparameter configuration.

Prepare a scoring function

The first step in order to use the BTBSession class is to develop a scoring function.

This is a Python function that, given a model name and a hyperparameter configuration, evaluates the performance of the model on your data and returns a score.

Next, we will load the dataset which we will use later on to evaluate the performance of our machine learning model:

[2]:
from sklearn.datasets import load_wine

dataset = load_wine()

Now we will create a dictionary of our models, using a short name as the key for each one. These names will be used later on when selecting which model to run:

[3]:
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier

models = {
    'DTC': DecisionTreeClassifier,
    'SGDC': SGDClassifier,
}

And finally we can create our scoring function, which will take as input the model name (the key that we used previously) and the hyperparameter values. We will use cross_val_score with f1_score as the scorer.

So our scoring_function will:

  1. Get the model class using the name that we gave.
  2. Create an instance of that model with the given hyperparameter values.
  3. Generate scores using cross_val_score.
  4. Return the average score.

[4]:
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import cross_val_score

def scoring_function(model_name, hyperparameter_values):
    # choose the model
    model_class = models[model_name]

    # instantiate the model
    model_instance = model_class(**hyperparameter_values)

    # perform fit-score
    scores = cross_val_score(
        estimator=model_instance,
        X=dataset.data,
        y=dataset.target,
        scoring=make_scorer(f1_score, average='macro')
    )

    return scores.mean()
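
Before moving on, we can sanity-check the scoring function by calling it directly with a hand-picked configuration. The hyperparameter values below are arbitrary, chosen only for illustration:

# evaluate a DecisionTreeClassifier with an arbitrary configuration
score = scoring_function('DTC', {'max_depth': 5, 'min_samples_split': 0.1})
print(score)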

Define the tunable hyperparameters

The second step is to define the hyperparameters that we want to tune for each model as Tunables.

[5]:
from btb.tuning import hyperparams as hp
from btb.tuning import Tunable


tunables = {
    'DTC': Tunable({
        'max_depth': hp.IntHyperParam(min=3, max=200),
        'min_samples_split': hp.FloatHyperParam(min=0.01, max=1)
    }),
    'SGDC': Tunable({
        'max_iter': hp.IntHyperParam(min=1, max=5000, default=1000),
        'tol': hp.FloatHyperParam(min=1e-3, max=1, default=1e-3),
    })
}
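
To get a feeling for what a Tunable provides, here is a minimal sketch of the propose/record loop that a single BTB tuner runs over one of these objects. This is the loop that BTBSession automates for us; it assumes that GPTuner can be imported from btb.tuning.tuners:

from btb.tuning.tuners import GPTuner

# tune only the DecisionTreeClassifier by hand
tuner = GPTuner(tunables['DTC'])

for _ in range(5):
    # ask the tuner for a new hyperparameter configuration
    proposal = tuner.propose()

    # evaluate the proposal with our scoring function
    score = scoring_function('DTC', proposal)

    # feed the result back so that future proposals improve
    tuner.record(proposal, score)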

Create BTBSession instance

Once you have defined a scoring function and the tunable hyperparameters specification of your models, you can create an instance of btb.BTBSession.

BTBSession accepts the following arguments:

  • tunables (dict): Python dictionary with the name of each tunable as key and, as value, either a dictionary with its tunable hyperparameters or a btb.tuning.tunable.Tunable instance.

  • scorer (callable object / function): A callable object or function with signature scorer(tunable_name, config) which should return a single value.

  • tuner_class (btb.tuning.tuner.BaseTuner): A tuner based on the BTB BaseTuner class. This tuner will manage the new proposals. Defaults to btb.tuning.tuners.gaussian_process.GPTuner.

  • selector_class (btb.selection.selector.Selector): A selector based on the BTB Selector class. This will determine which one of the tunables is performing better, and which one to try next. Defaults to btb.selection.selectors.ucb1.UCB1.

  • maximize (bool): If True, the scores are interpreted as bigger is better; if False, smaller is better. This should depend on the problem type (maximization or minimization). Defaults to True.

  • max_errors (int): Number of errors allowed for a tunable without generating a score. Once this amount of errors is reached, the tunable will be removed from the list. Defaults to 1.

  • verbose (bool): If True, a progress bar will be displayed during the run process.

For now all you need to do is pass the tunable hyperparameters specification and the scoring function.

[6]:
from btb import BTBSession

session = BTBSession(
    tunables=tunables,
    scorer=scoring_function,
    verbose=True
)
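
If you want more control over the search, the remaining arguments can be passed explicitly. The following sketch simply spells out the default behavior described above, assuming GPTuner and UCB1 are importable from btb.tuning.tuners and btb.selection respectively:

from btb.selection import UCB1
from btb.tuning.tuners import GPTuner

# equivalent to the session above, with the defaults made explicit
custom_session = BTBSession(
    tunables=tunables,
    scorer=scoring_function,
    tuner_class=GPTuner,
    selector_class=UCB1,
    maximize=True,
    max_errors=1,
    verbose=True
)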

Using BTBSession

Run

BTBSession works through its main method, run. This method accepts as an argument the number of tuning iterations to perform. By default this argument is None, which means that it will keep running until the user stops it or a StopTuning exception is raised.

For now, you can call the run method indicating how many tuning iterations you want the session to perform:

[7]:
best_proposal = session.run(5)

Exploring the result

The result will be a dictionary indicating the name of the best model that could be found and the hyperparameter configuration that was used:

[8]:
best_proposal
[8]:
{'id': '834d610fff74cae8a10e169c82346a0a',
 'name': 'DTC',
 'config': {'max_depth': 3, 'min_samples_split': 0.01},
 'score': 0.8897699044250768}

The session object also contains this best_proposal as an attribute:

[9]:
session.best_proposal
[9]:
{'id': '834d610fff74cae8a10e169c82346a0a',
 'name': 'DTC',
 'config': {'max_depth': 3, 'min_samples_split': 0.01},
 'score': 0.8897699044250768}

Resume session

The session allows us to resume tuning from the last iteration. We can run some more iterations and expect our score to improve by calling the run method again:

[10]:
best_proposal = session.run(20)

best_proposal

[10]:
{'id': '4a13e9f66e16e453fb9258ac59f27fa0',
 'name': 'DTC',
 'config': {'max_depth': 44, 'min_samples_split': 0.016971453207688683},
 'score': 0.9076991973543698}

As we can observe, the score has improved after continuing the tuning.

Fitting the best solution

Once we have found the best possible solution, we are ready to fit a model on our data in order to make predictions. To do this, we retrieve both the name and the configuration of the best solution from the best_proposal dict.

[11]:
best_model_name = best_proposal['name']
hyperparameters = best_proposal['config']
best_model_class = models[best_model_name]
model_instance = best_model_class(**hyperparameters)
[12]:
model_instance.fit(dataset.data, dataset.target)
[12]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=44,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1,
                       min_samples_split=0.016971453207688683,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')
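
With the model fitted, the standard scikit-learn API can be used to make predictions. Here we predict on the training data purely to demonstrate the call; in practice you would predict on new, held-out data:

# predict class labels with the fitted model
predictions = model_instance.predict(dataset.data)
predictions[:10]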