BTBSession¶
What is BTBSession¶
BTBSession provides a simplified user interface for finding the best solution to your tuning problem by combining tuners and selectors in as few steps as possible.
Creating a BTBSession¶
We will guide you through the necessary steps to get started using BTBSession to select and tune the best model to solve a Machine Learning problem.
In particular, in this example we will use BTBSession to solve the Wine classification problem by selecting between the DecisionTreeClassifier and the SGDClassifier models from scikit-learn while also searching for their best hyperparameter configuration.
Prepare a scoring function¶
The first step in using the BTBSession class is to write a scoring function.
This is a Python function that, given a model name and a hyperparameter configuration, evaluates the performance of the model on your data and returns a score.
Next, we will load the dataset which we will use later on to evaluate the performance of our machine learning model:
[2]:
from sklearn.datasets import load_wine
dataset = load_wine()
Now we will create a dictionary of our “models”, using a short name as the key for each one. These names let us look up the model class once one has been selected:
[3]:
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
models = {
    'DTC': DecisionTreeClassifier,
    'SGDC': SGDClassifier,
}
And finally we can create our scoring function, which takes as input the model name (the key that we used previously) and the hyperparameter values. We will use cross_val_score with a macro-averaged f1_score as the scorer.
So our scoring_function will:
1. Get the model class using the name that we gave.
2. Create an instance of that model with the given hyperparameter values.
3. Generate scores using cross_val_score.
4. Return the average score.
[4]:
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import cross_val_score
def scoring_function(model_name, hyperparameter_values):
    # choose the model
    model_class = models[model_name]
    # instantiate the model
    model_instance = model_class(**hyperparameter_values)
    # perform fit-score
    scores = cross_val_score(
        estimator=model_instance,
        X=dataset.data,
        y=dataset.target,
        scoring=make_scorer(f1_score, average='macro')
    )
    return scores.mean()
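Before handing the function to BTBSession, it can help to call it once by hand and confirm that it returns a single number. The sketch below repeats the definitions above so that it runs on its own. The configuration passed in is an arbitrary value chosen for illustration, and the random_state entry is an addition for repeatability rather than part of the original example.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

dataset = load_wine()
models = {'DTC': DecisionTreeClassifier, 'SGDC': SGDClassifier}


def scoring_function(model_name, hyperparameter_values):
    # Same logic as the scoring function defined above.
    model_instance = models[model_name](**hyperparameter_values)
    scores = cross_val_score(
        estimator=model_instance,
        X=dataset.data,
        y=dataset.target,
        scoring=make_scorer(f1_score, average='macro'),
    )
    return scores.mean()


# Illustrative configuration, not one proposed by a tuner.
# random_state is an assumption added here to make the call repeatable.
config = {'max_depth': 3, 'min_samples_split': 0.01, 'random_state': 0}
score = scoring_function('DTC', config)
print(score)
```

The returned value is the mean macro F1 over the cross-validation folds, so it always lies between 0 and 1 and a larger value is better.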
Define the tunable hyperparameters¶
The second step is to define the hyperparameters that we want to tune for each model as Tunables.
[5]:
from btb.tuning import hyperparams as hp
from btb.tuning import Tunable
tunables = {
    'DTC': Tunable({
        'max_depth': hp.IntHyperParam(min=3, max=200),
        'min_samples_split': hp.FloatHyperParam(min=0.01, max=1)
    }),
    'SGDC': Tunable({
        'max_iter': hp.IntHyperParam(min=1, max=5000, default=1000),
        'tol': hp.FloatHyperParam(min=1e-3, max=1, default=1e-3),
    })
}
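Each Tunable describes a search space, and every configuration drawn from it reaches the scoring function as a dictionary of keyword arguments. The sketch below mirrors the ranges above in plain Python and draws uniform random configurations. It is only a stand-in to show the shape of a configuration, not how BTB's tuners generate proposals, and the names search_space and sample_config are hypothetical.

```python
import random

# Plain-Python mirror of the ranges declared in the Tunables above:
# (kind, minimum, maximum) for each hyperparameter.
search_space = {
    'DTC': {
        'max_depth': ('int', 3, 200),
        'min_samples_split': ('float', 0.01, 1.0),
    },
    'SGDC': {
        'max_iter': ('int', 1, 5000),
        'tol': ('float', 1e-3, 1.0),
    },
}


def sample_config(space, rng):
    """Draw one configuration uniformly at random from the given ranges."""
    config = {}
    for name, (kind, low, high) in space.items():
        if kind == 'int':
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            config[name] = rng.uniform(low, high)
    return config


rng = random.Random(0)
for model_name, space in search_space.items():
    print(model_name, sample_config(space, rng))
```

Each printed dictionary has the same keys as the corresponding Tunable, which is the form that the scoring function unpacks into the model constructor.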
Create BTBSession instance¶
Once you have defined a scoring function and the tunable hyperparameters specification of your models, you can create the instance of btb.BTBSession.
BTBSession accepts the following arguments:
- tunables (dict): Python dictionary that has the name of each tunable as a key and, as a value, either a dictionary with the tunable hyperparameters or a btb.tuning.tunable.Tunable instance.
- scorer (callable object / function): A callable object or function with signature scorer(tunable_name, config) which should return only a single value.
- tuner_class (btb.tuning.tuner.BaseTuner): A tuner based on the BTB BaseTuner class. This tuner will manage the new proposals. Defaults to btb.tuning.tuners.gaussian_process.GPTuner.
- selector_class (btb.selection.selector.Selector): A selector based on the BTB Selector class. This will determine which one of the tunables is performing better, and which one to test next. Defaults to btb.selection.selectors.ucb1.UCB1.
- maximize (bool): If True the scores are interpreted as bigger is better, if False then smaller is better. This should depend on the problem type (maximization or minimization). Defaults to True.
- max_errors (int): Number of times a tunable is allowed to fail to generate a score. Once this number of errors is reached, the tunable will be removed from the list. Defaults to 1.
- verbose (bool): If True a progress bar will be displayed for the run process.
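To give an intuition for what a selector does, here is a minimal sketch of the textbook UCB1 rule: each tunable gets its mean score plus an exploration bonus that shrinks as the tunable is tried more often. This is an illustration under the assumption that scores lie between 0 and 1 and that bigger is better. It is not BTB's implementation, which may differ in its details, and ucb1_select is a hypothetical helper name.

```python
import math


def ucb1_select(scores_by_tunable):
    """Return the tunable name with the highest UCB1 value.

    Textbook UCB1: mean score + sqrt(2 * ln(total trials) / trials).
    Tunables that have no score yet are tried first.
    """
    for name, scores in scores_by_tunable.items():
        if not scores:
            return name

    total = sum(len(scores) for scores in scores_by_tunable.values())

    def upper_bound(name):
        scores = scores_by_tunable[name]
        mean = sum(scores) / len(scores)
        return mean + math.sqrt(2 * math.log(total) / len(scores))

    return max(scores_by_tunable, key=upper_bound)


# Hypothetical score history: DTC tried four times, SGDC only once.
history = {'DTC': [0.88, 0.90, 0.89, 0.91], 'SGDC': [0.55]}
print(ucb1_select(history))
```

With this history the rule picks SGDC even though its mean score is lower, because a tunable that has been tried only once carries a large exploration bonus. Once both tunables have many trials, the bonus shrinks and the one with the better mean wins.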
For now all you need to do is pass the tunable hyperparameters specification and the scoring function.
[6]:
from btb import BTBSession
session = BTBSession(
    tunables=tunables,
    scorer=scoring_function,
    verbose=True
)
Using BTBSession¶
Run¶
BTBSession works through its main method, run. This method accepts as an argument the number of tuning iterations to perform. By default this argument is None, which means that it will run until it is stopped by the user or a StopTuning exception is raised.
For now you can call the run method indicating how many tuning iterations you want the session to perform:
[7]:
best_proposal = session.run(5)
Exploring the result¶
The result will be a dictionary indicating the name of the best model that could be found and the hyperparameter configuration that was used:
[8]:
best_proposal
[8]:
{'id': '834d610fff74cae8a10e169c82346a0a',
'name': 'DTC',
'config': {'max_depth': 3, 'min_samples_split': 0.01},
'score': 0.8897699044250768}
The session object also contains this best_proposal as an attribute:
[9]:
session.best_proposal
[9]:
{'id': '834d610fff74cae8a10e169c82346a0a',
'name': 'DTC',
'config': {'max_depth': 3, 'min_samples_split': 0.01},
'score': 0.8897699044250768}
Resume session¶
The session allows us to resume tuning from the last iteration that we ran. By calling the run method again we can run some more iterations, which may improve our score:
[10]:
best_proposal = session.run(20)
best_proposal
[10]:
{'id': '4a13e9f66e16e453fb9258ac59f27fa0',
'name': 'DTC',
'config': {'max_depth': 44, 'min_samples_split': 0.016971453207688683},
'score': 0.9076991973543698}
As we can observe, this time our score has improved after continuing the tuning.
Fitting the best solution¶
Once we have found the best solution, we are ready to fit a model on our data in order to make predictions. To do this, we retrieve both the name and the configuration of the best solution from the best_proposal dict.
[11]:
best_model_name = best_proposal['name']
hyperparameters = best_proposal['config']
best_model_class = models[best_model_name]
model_instance = best_model_class(**hyperparameters)
[12]:
model_instance.fit(dataset.data, dataset.target)
[12]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=44,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1,
min_samples_split=0.016971453207688683,
min_weight_fraction_leaf=0.0, presort=False,
random_state=None, splitter='best')
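The fitted model can then be used to make predictions. As a minimal sketch, the example below fits the proposed configuration on one part of the data and predicts on the rest. The stratified 75/25 split and the fixed random_state values are assumptions added for this illustration and are not part of the original example. The proposal is copied from the output shown earlier.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

dataset = load_wine()

# Proposal copied from the output shown earlier in this guide.
best_proposal = {
    'name': 'DTC',
    'config': {'max_depth': 44, 'min_samples_split': 0.016971453207688683},
}

# Assumption: a stratified 75/25 split with fixed seeds, for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data,
    dataset.target,
    test_size=0.25,
    stratify=dataset.target,
    random_state=0,
)

model = DecisionTreeClassifier(random_state=0, **best_proposal['config'])
model.fit(X_train, y_train)

# Predict the class of each held-out sample and score the predictions.
predictions = model.predict(X_test)
holdout_score = f1_score(y_test, predictions, average='macro')
print(holdout_score)
```

Because the tuning above cross-validated on the full dataset, these held-out samples were already seen during the search. For an unbiased estimate, the split would have to be made before tuning starts.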