btb.selection.selector module
- class btb.selection.selector.Selector(choices)
Bases: object
Base selector
- Parameters
choices (list) – a list of discrete choices from which the selector must choose at every call to select.
- bandit(choice_rewards)
Return the choice to take next using a multi-armed bandit.
Multi-armed bandit method. Accepts a mapping of choices to rewards which indicate their historical performance, and returns the choice that we should make next in order to maximize expected reward in the long term.
The default implementation returns the arm with the highest average reward.
- Parameters
choice_rewards (Dict[object, List[float]]) – maps choice IDs to lists of rewards.
- Returns
the name of the choice to take next.
- Return type
str
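The "highest average" rule described for bandit can be illustrated with a short standalone sketch. This is not the library's code: the function name highest_mean_arm is hypothetical, and treating an arm with no rewards as having an average of 0.0 is an assumption made only so the example is well defined.

```python
from statistics import mean


def highest_mean_arm(choice_rewards):
    """Return the key whose reward list has the highest average.

    Hypothetical helper for illustration only.
    Assumption: an empty reward list counts as an average of 0.0.
    """
    def average(choice):
        rewards = choice_rewards[choice]
        return mean(rewards) if rewards else 0.0

    # max() iterates over the dict keys and ranks them by average reward.
    return max(choice_rewards, key=average)


choice_rewards = {
    "a": [0.56, 0.61, 0.33, 0.67],  # average 0.5425
    "b": [0.25, 0.58],              # average 0.415
    "c": [0.60, 0.65, 0.68],        # average about 0.643
}
best = highest_mean_arm(choice_rewards)  # "c"
```

A purely greedy rule like this one never explores arms that look weak early on, which is presumably why the method is meant to be overridden with other bandit strategies.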
- compute_rewards(scores)
Compute rewards from a choice's scores.
Convert a list of scores associated with one choice into a list of rewards. Normally, the returned list has the same length as the input list, even if some of the scores are discarded.
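One way to read "the length is preserved even if some scores are dropped" is that discarded scores are replaced by a neutral reward rather than removed from the list. The sketch below shows two hypothetical conversions under that assumption: identity_rewards passes scores through unchanged, and top_k_rewards keeps only the k highest scores and sets the rest to 0.0. Both names, and the choice of 0.0 as the placeholder, are illustrative and not taken from the library.

```python
def identity_rewards(scores):
    """Use every score directly as its reward (copy of the input list)."""
    return list(scores)


def top_k_rewards(scores, k=2):
    """Keep the k highest scores as rewards and zero out the others.

    Hypothetical example: the output always has the same length as
    the input, so positions still line up with the original trials.
    """
    # Indices of the k largest scores (ties broken by earlier position).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [score if i in keep else 0.0 for i, score in enumerate(scores)]


scores = [0.56, 0.61, 0.33, 0.67]
unchanged = identity_rewards(scores)   # [0.56, 0.61, 0.33, 0.67]
top_two = top_k_rewards(scores, k=2)   # [0.0, 0.61, 0.0, 0.67]
```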
- select(choice_scores)
Select the next best choice to make.
- Parameters
choice_scores (Dict[object, List[float]]) – Mapping from each possible choice to its list of scores. The caller is responsible for making sure that every choice that is possible at this juncture is represented in the dict, including those with no scores. Score lists should be in ascending chronological order; that is, the score from the earliest trial should be listed first.
For example:
{
    1: [0.56, 0.61, 0.33, 0.67],
    2: [0.25, 0.58],
    3: [0.60, 0.65, 0.68],
}
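To show how the three methods might fit together, here is a minimal stand-in class. It is a sketch under stated assumptions, not the btb implementation: SketchSelector is a hypothetical name, and the pipeline it uses (convert each choice's scores with compute_rewards, then pass the result to bandit) is inferred from the method descriptions rather than confirmed by them. Ignoring keys that are not in choices and scoring an empty list as 0.0 are likewise assumptions.

```python
from statistics import mean


class SketchSelector:
    """Hypothetical stand-in that mirrors the documented method names."""

    def __init__(self, choices):
        self.choices = list(choices)

    def compute_rewards(self, scores):
        # Assumed default: each score is used directly as its reward.
        return list(scores)

    def bandit(self, choice_rewards):
        # Pick the arm with the highest average reward.
        # Assumption: an arm with no rewards averages 0.0.
        def average(choice):
            rewards = choice_rewards[choice]
            return mean(rewards) if rewards else 0.0

        return max(choice_rewards, key=average)

    def select(self, choice_scores):
        # Assumed pipeline: scores -> rewards -> bandit decision.
        choice_rewards = {
            choice: self.compute_rewards(scores)
            for choice, scores in choice_scores.items()
            if choice in self.choices
        }
        return self.bandit(choice_rewards)


selector = SketchSelector(choices=[1, 2, 3])
choice_scores = {
    1: [0.56, 0.61, 0.33, 0.67],
    2: [0.25, 0.58],
    3: [0.60, 0.65, 0.68],
}
next_choice = selector.select(choice_scores)  # 3, the highest average
```

Splitting the work this way lets a subclass change how scores become rewards, or how an arm is picked, without touching select itself.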