btb.selection package

Module contents

class btb.selection.BestKReward(choices, k=2)[source]

Bases: btb.selection.ucb1.UCB1

Best K reward selector

Computes the average reward from past scores using only the k highest ones. Internally, the remaining scores are replaced with nans so that they still count toward the number of arm pulls.

Parameters

k (int) – number of best scores to consider

compute_rewards(scores)[source]

Retain the K best scores, and replace the rest with nans
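
A minimal sketch of the described reward computation, independent of the library's actual internals (the function name is illustrative):

def best_k_rewards(scores, k=2):
    # Keep the k highest scores; pad with nan so the list length,
    # and hence the apparent number of arm pulls, is unchanged.
    if len(scores) <= k:
        return list(scores)
    best = sorted(scores, reverse=True)[:k]
    return best + [float('nan')] * (len(scores) - k)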

select(choice_scores)[source]

Select a choice using the K best scores

Keeps the choice counts intact but only lets the bandit see the top k learners’ scores. If there is not enough score history for K-selection, falls back to the default UCB1 reward function.
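
For example, with two hypothetical choice labels (the labels are arbitrary objects):

from btb.selection import BestKReward

selector = BestKReward(['rf', 'svm'], k=2)

# Each choice maps to its chronological score history.
best_choice = selector.select({
    'rf': [0.72, 0.80, 0.78],
    'svm': [0.64, 0.85],
})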

class btb.selection.BestKVelocity(choices, k=2)[source]

Bases: btb.selection.best.BestKReward

Best K velocity selector

compute_rewards(scores)[source]

Compute the velocity of the best scores

The velocities are the k distances between the k+1 best scores.
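
A sketch of that computation (an illustrative helper, not the library code):

def best_k_velocities(scores, k=2):
    # Sort descending, take the k+1 best scores, and return the k
    # consecutive differences between them.
    best = sorted(scores, reverse=True)[:k + 1]
    return [best[i] - best[i + 1] for i in range(len(best) - 1)]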

class btb.selection.HierarchicalByAlgorithm(choices, by_algorithm)[source]

Bases: btb.selection.ucb1.UCB1

Hierarchical selector

Parameters

by_algorithm (Dict[str, List]) – mapping of ML algorithms to frozen set choices

select(choice_scores)[source]

Groups the frozen sets by algorithm and first chooses an algorithm according to the standard UCB1 criterion.

Next, from that algorithm’s frozen sets, makes the final set choice.
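
For example, assuming hypothetical frozen set identifiers grouped under two algorithms:

from btb.selection import HierarchicalByAlgorithm

selector = HierarchicalByAlgorithm(
    choices=['svm_rbf', 'svm_poly', 'rf_gini', 'rf_entropy'],
    by_algorithm={
        'svm': ['svm_rbf', 'svm_poly'],
        'rf': ['rf_gini', 'rf_entropy'],
    },
)

choice = selector.select({
    'svm_rbf': [0.81, 0.84],
    'svm_poly': [0.62, 0.66],
    'rf_gini': [0.77, 0.79],
    'rf_entropy': [0.80, 0.82],
})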

class btb.selection.PureBestKVelocity(choices, k=3)[source]

Bases: btb.selection.selector.Selector

Pure Best K Velocity Selector

Simply returns the choice with the highest best-K velocity.

compute_rewards(scores)[source]

Compute the “velocity” of (average distance between) the k+1 best scores. Return a list with those k velocities padded out with zeros so that the count remains the same.
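
A sketch of the described computation (illustrative, not the library's implementation):

def pure_best_k_velocities(scores, k=3):
    # Consecutive differences between the k+1 best scores,
    # zero-padded so the reward count matches the score count.
    best = sorted(scores, reverse=True)[:k + 1]
    velocities = [best[i] - best[i + 1] for i in range(len(best) - 1)]
    return velocities + [0.0] * (len(scores) - len(velocities))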

select(choice_scores)[source]

Select the choice with the highest best-K velocity. If any choices don’t have MIN_K scores yet, return the one with the fewest.

class btb.selection.RecentKReward(choices, k=2)[source]

Bases: btb.selection.ucb1.UCB1

Recent K reward selector

Parameters

k (int) – number of best scores to consider

compute_rewards(scores)[source]

Retain the K most recent scores, and replace the rest with zeros
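
A sketch, assuming scores are in ascending chronological order as documented for choice_scores below:

def recent_k_rewards(scores, k=2):
    # Zero out everything except the k most recent scores.
    if len(scores) <= k:
        return list(scores)
    return [0.0] * (len(scores) - k) + list(scores[-k:])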

select(choice_scores)[source]

Uses the k most recent scores as the rewards in the bandit calculation.

class btb.selection.RecentKVelocity(choices, k=2)[source]

Bases: btb.selection.recent.RecentKReward

Recent K velocity selector

compute_rewards(scores)[source]

Compute the velocity of the k+1 most recent scores.

The velocity is the average distance between scores. Return a list with those k velocities padded out with zeros so that the count remains the same.

class btb.selection.UCB1(choices)[source]

Bases: btb.selection.selector.Selector

UCB1 selector

Uses Upper Confidence Bound 1 algorithm (UCB1) for bandit selection.

See also:

Auer, Peter et al. "Finite-time Analysis of the Multiarmed Bandit Problem."
Machine Learning 47 (2002): 235-256.

bandit(choice_rewards)[source]

Multi-armed bandit method which chooses the arm for which the upper confidence bound (UCB) of expected reward is greatest.

If there are multiple arms with the same UCB1 index, then one is chosen at random.

An explanation is here: https://www.cs.bham.ac.uk/internal/courses/robotics/lectures/ucb1.pdf
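
The index maximized is the classic UCB1 quantity from Auer et al.: an arm's mean reward plus sqrt(2 ln T / n), where T is the total number of pulls across all arms and n is the number of pulls of that arm. A self-contained sketch of the method's behavior (not the library's exact code):

import math
import random

def ucb1_bandit(choice_rewards):
    # choice_rewards maps each choice to its list of rewards.
    total_pulls = sum(len(r) for r in choice_rewards.values())

    def ucb1(rewards):
        n = len(rewards)
        if n == 0:
            return float('inf')  # unpulled arms are tried first
        mean = sum(rewards) / n
        return mean + math.sqrt(2 * math.log(total_pulls) / n)

    best = max(ucb1(r) for r in choice_rewards.values())
    # Break ties between arms with equal UCB1 indices at random.
    tied = [c for c, r in choice_rewards.items() if ucb1(r) == best]
    return random.choice(tied)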

class btb.selection.Uniform(choices)[source]

Bases: btb.selection.selector.Selector

Uniform selector

Selects a choice uniformly at random.

select(choice_scores)[source]

Select the next best choice to make

Parameters

choice_scores (Dict[object, List[float]]) –

Mapping of choice to list of scores for each possible choice. The caller is responsible for making sure each choice that is possible at this juncture is represented in the dict, even those with no scores. Score lists should be in ascending chronological order, that is, the score from the earliest trial should be listed first.

For example:

{
    1: [0.56, 0.61, 0.33, 0.67],
    2: [0.25, 0.58],
    3: [0.60, 0.65, 0.68],
}
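
Using the example above with the Uniform selector, which ignores the scores entirely and picks a key at random:

from btb.selection import Uniform

selector = Uniform([1, 2, 3])
choice = selector.select({
    1: [0.56, 0.61, 0.33, 0.67],
    2: [0.25, 0.58],
    3: [0.60, 0.65, 0.68],
})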