Quickstart¶
In this short tutorial we will guide you through the necessary steps to get started using BTB to select and tune the best model to solve a Machine Learning problem.
In particular, in this example we will be using BTBSession
to solve the Wine classification problem by selecting between the DecisionTreeClassifier
and the SGDClassifier
models from scikit-learn, while also searching for their best hyperparameter configuration.
Prepare a scoring function¶
The first step in order to use the BTBSession
class is to develop a scoring function.
This is a Python function that, given a model name and a hyperparameter configuration, evaluates the performance of the model on your data and returns a score.
[2]:
from sklearn.datasets import load_wine
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
dataset = load_wine()

models = {
    'DTC': DecisionTreeClassifier,
    'SGDC': SGDClassifier,
}

def scoring_function(model_name, hyperparameter_values):
    model_class = models[model_name]
    model_instance = model_class(**hyperparameter_values)
    scores = cross_val_score(
        estimator=model_instance,
        X=dataset.data,
        y=dataset.target,
        scoring=make_scorer(f1_score, average='macro')
    )
    return scores.mean()
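Before wiring it into a session, it can be worth calling the scoring function directly with an arbitrary configuration to check that it runs end to end. The following self-contained sanity check repeats the definitions above and evaluates a single configuration; the hyperparameter values used are arbitrary, not a tuned result:

```python
from sklearn.datasets import load_wine
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

dataset = load_wine()
models = {'DTC': DecisionTreeClassifier}

def scoring_function(model_name, hyperparameter_values):
    model_instance = models[model_name](**hyperparameter_values)
    scores = cross_val_score(
        estimator=model_instance,
        X=dataset.data,
        y=dataset.target,
        scoring=make_scorer(f1_score, average='macro')
    )
    return scores.mean()

# Arbitrary (untuned) values, just to exercise the function end to end
score = scoring_function('DTC', {'max_depth': 5, 'min_samples_split': 0.1})
```

Since the score is a mean macro F1 over the cross-validation folds, it should always be a float between 0 and 1.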
Define the tunable hyperparameters¶
The second step is to define the hyperparameters that we want to tune for each model as Tunables.
[3]:
from btb.tuning import Tunable
from btb.tuning import hyperparams as hp
tunables = {
    'DTC': Tunable({
        'max_depth': hp.IntHyperParam(min=3, max=200),
        'min_samples_split': hp.FloatHyperParam(min=0.01, max=1)
    }),
    'SGDC': Tunable({
        'max_iter': hp.IntHyperParam(min=1, max=5000, default=1000),
        'tol': hp.FloatHyperParam(min=1e-3, max=1, default=1e-3),
    })
}
Start the searching process¶
Once you have defined a scoring function and the tunable hyperparameters specification of your models, you can start searching for the best model and hyperparameter configuration by using the btb.BTBSession.
All you need to do is create an instance passing the tunable hyperparameters specification and the scoring function.
[4]:
from btb import BTBSession
session = BTBSession(
    tunables=tunables,
    scorer=scoring_function,
    verbose=True
)
And then call the run
method, indicating how many tuning iterations you want the Session to perform:
[5]:
best_proposal = session.run(20)
The result will be a dictionary indicating the name of the best model found, the hyperparameter configuration that achieved it, and the score it obtained:
[6]:
best_proposal
[6]:
{'id': 'e47c13afa40a10d55da91a13add6e142',
'name': 'DTC',
'config': {'max_depth': 3, 'min_samples_split': 0.1445639630277333},
'score': 0.9127678612465631}
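With the best proposal in hand, a natural last step is to fit a final model on the full dataset using the winning configuration. This is a minimal sketch that hardcodes the DTC proposal shown above for illustration; in practice you would use the dictionary returned by session.run, and the exact id, config, and score values will vary between runs:

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

# Hypothetical best proposal, mirroring the structure returned by session.run
best_proposal = {
    'name': 'DTC',
    'config': {'max_depth': 3, 'min_samples_split': 0.1445639630277333},
}

models = {'DTC': DecisionTreeClassifier}

# Instantiate the winning model with the tuned hyperparameters and fit it
dataset = load_wine()
model = models[best_proposal['name']](**best_proposal['config'])
model.fit(dataset.data, dataset.target)
predictions = model.predict(dataset.data)
```

Because the proposal stores the model name rather than the class itself, looking it up in the same models dictionary used by the scoring function keeps the final fit consistent with what was evaluated during tuning.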