Selection

What is a Selection problem?

There are some scenarios where we have a single goal which can be solved in multiple ways. Each one of these solutions represents an individual Tuning Problem, and we have no prior knowledge about how good each solution will be once it is tuned.

In these scenarios, one possibility would be to solve the problem using brute-force, which means tuning each solution candidate and then using the one that got the best score.

However, in most cases we will not have the time or resources to tune all the possible solutions beforehand, and we will want to solve what is called a Multi-armed Bandit Problem: start tuning all the solutions at once and optimally select which solutions to keep tuning depending on the scores that they obtain during the process.

The Multi-Armed Bandit approach tries to find the best candidate using as few trials as possible. This process is less time consuming than brute force because it allocates the trials based on the scores that the candidates obtain along the way.
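
To make the contrast concrete, here is a purely schematic sketch of the two strategies. The evaluate_candidate function and the budget are placeholders introduced only for this illustration, and the selector is assumed to expose a select method like the one described later in this document:

# Brute force: spend the same fixed tuning budget on every candidate,
# then compare their best scores.
def brute_force(candidates, budget, evaluate_candidate):
    scores = {name: [evaluate_candidate(name) for _ in range(budget)]
              for name in candidates}
    return max(scores, key=lambda name: max(scores[name]))

# Bandit-style: after one warm-up trial per candidate, spend each remaining
# trial on whichever candidate the selector picks from the scores seen so far.
def multi_armed_bandit(candidates, budget, evaluate_candidate, selector):
    scores = {name: [evaluate_candidate(name)] for name in candidates}
    for _ in range(budget - len(candidates)):
        choice = selector.select(scores)
        scores[choice].append(evaluate_candidate(choice))
    return max(scores, key=lambda name: max(scores[name]))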

What is a Selector?

In BTB, the selection problem is solved using the Selector family of classes.

These classes are used in combination with multiple tuning problems, represented as multiple Tuner instances, each one created with a different Tunable and the corresponding function or Machine Learning algorithm.

Currently, BTB implements the following Selectors:

  • UCB1: uses the Upper Confidence Bound 1 (UCB1) algorithm for bandit selection (a minimal sketch of this scoring rule follows this list).

  • BestKReward: computes the average reward from the past scores by using only the highest k scores.

  • BestKVelocity: computes the velocity of the best scores. The velocities are the \(k\) distances between the \(k+1\) best scores.

  • PureBestKVelocity: returns the choice with the best best-K velocity.

  • RecentKReward: recent \(k\) reward selector, where \(k\) is the number of best scores to consider.

  • RecentKVelocity: computes the velocity of the \(k+1\) most recent scores.

  • Uniform: selects a choice uniformly at random.
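
To give an intuition of how these strategies rank the candidates, below is a minimal, illustrative sketch of the textbook UCB1 score (average reward plus an exploration bonus) and of a best-K velocity computed from a list of scores. This is not BTB's internal implementation; the helper names and the small \(k\) used here are only for illustration.

import numpy as np

def ucb1_score(scores, total_trials):
    # Textbook UCB1: average reward plus an exploration bonus that shrinks
    # as a candidate accumulates more trials.
    return np.mean(scores) + np.sqrt(2 * np.log(total_trials) / len(scores))

def best_k_velocity(scores, k=1):
    # Velocities: the k distances between the k+1 best scores seen so far.
    top = sorted(scores, reverse=True)[:k + 1]
    return np.diff(top[::-1])

tuner_scores = {'foo': [0.1, 0.2], 'bar': [0.001, 0.002]}
total_trials = sum(len(s) for s in tuner_scores.values())

for name, scores in tuner_scores.items():
    print(name, ucb1_score(scores, total_trials), best_k_velocity(scores))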

Using a Selector

The selectors are intended to be used in combination with Tuners by using their select method.

Creating a Selector

In order to create a selector you need to define a list of candidates and then pass it as a positional argument.

[2]:
from btb.selection import UCB1

candidates = ['foo', 'bar']

selector = UCB1(candidates)

Select

Once we have evaluated some tuners and obtained scores, we have to build a dictionary that maps each candidate name to the list of scores that it has obtained.

This dictionary has to be passed to the select method, which will return the name of the next tuner to use.

[3]:
tuner_scores = {
    'foo': [0.1, 0.2],
    'bar': [0.001, 0.002]
}

next_choice = selector.select(tuner_scores)
next_choice
[3]:
'foo'

Selection loop example

Here is an example of how to use Selectors and Tuners together to solve a Machine Learning problem with two candidate algorithms.

For this example we are going to use the Iris dataset and tune two estimators:

  • DecisionTreeClassifier

  • SGDClassifier

Next, we will load the dataset and split it into two partitions, train and test, which we will use later on to evaluate the performance of our machine learning models:

[4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# load the dataset
dataset = load_iris()

# split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.3, random_state=0)

Now we will create a dictionary of our “candidates”, using a short name as the key for each model class. These keys are what the selector will return, so they will let us pick the right model when selecting:

[5]:
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier

candidates = {
    'DTC': DecisionTreeClassifier,
    'SGDC': SGDClassifier,
}

In the following step we will create the hyperparameters for each one of our models:

[6]:
from btb.tuning import hyperparams as hp

dtc_hyperparams = {
    'max_depth': hp.IntHyperParam(min=3, max=200),
    'min_samples_split': hp.FloatHyperParam(min=0.01, max=1)
}

sgdc_hyperparams = {
    'max_iter': hp.IntHyperParam(min=1, max=5000, default=1000),
    'tol': hp.FloatHyperParam(min=1e-3, max=1, default=1e-3),
}

Now let’s create the Tunable and the Tuner for each one of them, and store them in a dictionary like we did with the models, so that we can access them when selecting:

[7]:
from btb.tuning import GPTuner, Tunable

# Creating the tunables for our tuners
dtc_tunable = Tunable(dtc_hyperparams)
sgdc_tunable = Tunable(sgdc_hyperparams)

tuners = {
    'DTC': GPTuner(dtc_tunable),
    'SGDC': GPTuner(sgdc_tunable)
}

Now we can create a selector with the candidates “DTC” and “SGDC”. Since the selector will return one of these values, we can use it as the key to access the corresponding model and tuner:

[8]:
from btb.selection import UCB1

selector = UCB1(['DTC', 'SGDC'])

Finally, we will run a loop that repeats the following steps:

  1. select a candidate.

  2. propose a set of parameters.

  3. fit the model with those parameters.

  4. score the model.

  5. record the parameters and the score that we obtained.

  6. evaluate whether it is the best score found so far.

[9]:
best_score = 0
for _ in range(100):
    candidate = selector.select({
        'DTC': tuners['DTC'].scores,
        'SGDC': tuners['SGDC'].scores
    })
    parameters = tuners[candidate].propose()
    model = candidates[candidate](**parameters)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    tuners[candidate].record(parameters, score)

    if score > best_score:
        best_score = score
        best_model = candidate
        best_params = parameters

print('Best score: ', best_score)
print('Best model: ', best_model)
print('Best parameters: ', best_params)
Best score:  0.9777777777777777
Best model:  DTC
Best parameters:  {'max_depth': 26, 'min_samples_split': 0.2978860831816922}
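
As a possible follow-up step (not part of the example output above), best_model and best_params can be used to re-fit the winning estimator, for example on the full dataset:

# Re-fit the winning candidate using the best hyperparameters found
final_model = candidates[best_model](**best_params)
final_model.fit(dataset.data, dataset.target)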