btb.selection.ucb1 module
-
class btb.selection.ucb1.UCB1(choices)
Bases: btb.selection.selector.Selector
UCB1 selector.
Uses the Upper Confidence Bound 1 (UCB1) algorithm for bandit selection.
See also:
Auer, Peter et al. "Finite-time Analysis of the Multiarmed Bandit Problem." Machine Learning 47 (2002): 235-256.
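For reference, the UCB1 policy from the Auer et al. paper cited above can be written as follows. The notation is ours, not BTB's: after n total pulls, arm j has been pulled n_j times with average reward \bar{x}_j. Each arm is played once first, and after that the arm with the largest index is played.

```latex
\text{UCB1}_j = \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}},
\qquad
j^{*} = \arg\max_{j} \, \text{UCB1}_j
```

The first term favours arms that have paid off well so far (exploitation). The second term shrinks as n_j grows, so rarely tried arms keep a larger bonus (exploration).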
-
bandit(choice_rewards)
Multi-armed bandit method that chooses the arm whose upper confidence bound (UCB) on the expected reward is greatest.
If several arms share the same UCB1 index, one of them is chosen at random.
An explanation of UCB1 is available here: https://www.cs.bham.ac.uk/internal/courses/robotics/lectures/ucb1.pdf
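The selection rule described above can be sketched as a standalone function. This is not BTB's implementation. The name `ucb1_select` and its `rng` parameter are hypothetical, and the sketch assumes that `choice_rewards` maps each arm to the list of rewards observed for it so far, with rewards in [0, 1] as the UCB1 analysis assumes. An arm with no observations gets an infinite index, so every arm is tried once before the confidence bound takes over.

```python
import math
import random
from typing import Dict, Hashable, List, Optional


def ucb1_select(
    choice_rewards: Dict[Hashable, List[float]],
    rng: Optional[random.Random] = None,
) -> Hashable:
    """Return the arm with the largest UCB1 index, breaking ties at random.

    Hypothetical helper (not part of BTB): choice_rewards is assumed to map
    each arm to the list of rewards observed for it so far.
    """
    if not choice_rewards:
        raise ValueError("choice_rewards must contain at least one arm")
    rng = rng or random.Random()

    # n: total number of pulls across all arms.
    total_pulls = sum(len(rewards) for rewards in choice_rewards.values())

    def index(rewards: List[float]) -> float:
        # Never-pulled arms get an infinite index so each is tried once.
        if not rewards:
            return math.inf
        n_j = len(rewards)
        mean = sum(rewards) / n_j
        # Average reward plus the UCB1 exploration bonus sqrt(2 ln n / n_j).
        return mean + math.sqrt(2.0 * math.log(total_pulls) / n_j)

    scores = {arm: index(rewards) for arm, rewards in choice_rewards.items()}
    best = max(scores.values())
    # Collect every arm that attains the maximum and pick one uniformly.
    tied = [arm for arm, score in scores.items() if score == best]
    return rng.choice(tied)


if __name__ == "__main__":
    observed = {"svm": [0.7, 0.8], "rf": [0.9], "knn": []}
    print(ucb1_select(observed))  # knn: the unpulled arm has an infinite index
```

Ties are broken by gathering all maximal arms and sampling one uniformly. This is simpler than shuffling the arms before calling `max`, and it makes the random choice explicit.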
-