btb.selection.best module

class btb.selection.best.BestKReward(choices, k=2)

Bases: btb.selection.ucb1.UCB1

Best K reward selector

Computes the average reward from past scores using only the k highest scores. In the implementation, the remaining scores are replaced with ``nan`` values so that they still count toward the number of arm pulls.

Parameters

k (int) – number of best scores to consider

compute_rewards(scores)

Retain the K best scores, and replace the rest with nans
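
The masking step described above can be sketched in plain Python. This is an illustrative sketch, not the library's source: the helper name keep_k_best is hypothetical, and it assumes k is at least 1 and that scores shorter than k are returned unchanged.

```python
import math


def keep_k_best(scores, k=2):
    """Return a copy of scores with all but the k highest replaced by nan.

    Hypothetical helper for illustration; assumes k >= 1.
    """
    scores = list(scores)
    if len(scores) <= k:
        # Nothing to mask: every score is already among the k best.
        return scores
    # Indices ordered by ascending score; all but the last k get masked.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    for i in order[:-k]:
        scores[i] = math.nan
    return scores


rewards = keep_k_best([0.2, 0.9, 0.5, 0.7], k=2)
# The list still has four entries (four arm pulls),
# but only the two highest scores remain as numbers.
```

Because the masked entries stay in the list, its length still reflects how many times the arm was pulled, which is the property the class description relies on.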

select(choice_scores)

Select a choice using the K best scores

Keeps the choice counts intact, but only lets the bandit see the top k learners’ scores. If there is not enough score history to do K-selection, falls back to the default UCB1 reward function.
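
The fallback behaviour can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's code: the names mask_all_but_k_best and build_choice_rewards are hypothetical, "enough score history" is assumed to mean that every choice has at least k scores, and the default reward function is assumed to pass scores through unchanged. The bandit step that picks a choice from these rewards is left out.

```python
import math


def mask_all_but_k_best(scores, k):
    """Hypothetical helper: replace all but the k highest scores with nan."""
    scores = list(scores)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    for i in order[:max(len(scores) - k, 0)]:
        scores[i] = math.nan
    return scores


def build_choice_rewards(choice_scores, k=2):
    """Map each choice to the rewards a bandit would be shown."""
    if not choice_scores:
        return {}
    # Assumption: K-selection needs at least k scores for every choice.
    enough_history = min(len(s) for s in choice_scores.values()) >= k
    rewards = {}
    for choice, scores in choice_scores.items():
        if enough_history:
            rewards[choice] = mask_all_but_k_best(scores, k)
        else:
            # Assumed default behaviour: scores are used as rewards as-is.
            rewards[choice] = list(scores)
    return rewards


full = build_choice_rewards({"svm": [0.1, 0.8, 0.6], "rf": [0.4, 0.5, 0.3]})
short = build_choice_rewards({"svm": [0.1, 0.8, 0.6], "rf": [0.4]})
```

In the first call both choices have enough history, so each list keeps its length but only its two best scores. In the second call "rf" has a single score, so every choice falls back to its unmodified scores.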

class btb.selection.best.BestKVelocity(choices, k=2)

Bases: btb.selection.best.BestKReward

Best K velocity selector

compute_rewards(scores)

Compute the velocity of the best scores

The velocities are the k distances between the k+1 best scores.
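
As an illustration of that definition, the velocities can be computed by sorting the scores, keeping the k+1 highest, and taking the differences between neighbours. The function name best_k_velocities is hypothetical, and padding the result with nan up to the number of scores is an assumption carried over from the parent class's description of preserving arm-pull counts.

```python
import math


def best_k_velocities(scores, k=2):
    """Hypothetical sketch: k gaps between the k+1 best scores, nan-padded."""
    # The k+1 highest scores, in ascending order.
    best = sorted(scores)[-(k + 1):]
    # Distance between each pair of neighbouring best scores.
    velocities = [b - a for a, b in zip(best, best[1:])]
    # Assumption: pad with nan so the length matches the number of scores.
    padding = [math.nan] * (len(scores) - len(velocities))
    return velocities + padding


velocities = best_k_velocities([0.2, 0.9, 0.5, 0.7], k=2)
# The three best scores are 0.5, 0.7 and 0.9, so both gaps are roughly 0.2;
# the remaining two entries are nan.
```

A choice whose best scores are still climbing quickly produces large gaps, while one that has plateaued produces gaps near zero.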