btb.selection.ucb1 module

class btb.selection.ucb1.UCB1(choices)

Bases: btb.selection.selector.Selector

UCB1 selector

Uses Upper Confidence Bound 1 algorithm (UCB1) for bandit selection.

See also:

Auer, Peter et al. "Finite-time Analysis of the Multiarmed Bandit Problem."
Machine Learning 47 (2002): 235-256.

bandit(choice_rewards)

Multi-armed bandit method that chooses the arm whose upper confidence bound (UCB) on expected reward is greatest.

If there are multiple arms with the same UCB1 index, then one is chosen at random.

For an explanation of the algorithm, see https://www.cs.bham.ac.uk/internal/courses/robotics/lectures/ucb1.pdf
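
The selection rule described above can be sketched in a few lines. This is a minimal illustration, not BTB's actual implementation: the function name ucb1_choose, the rng parameter, and the assumption that choice_rewards is a dict mapping each arm to a list of observed rewards are all hypothetical. The sketch uses the index from Auer et al., mean reward plus sqrt(2 ln n / n_i), where n is the total number of pulls and n_i is the number of pulls of arm i. It also assumes that arms with no observations are tried first.

```python
import math
import random


def ucb1_choose(choice_rewards, rng=random):
    """Return the arm with the largest UCB1 index.

    choice_rewards is assumed to be a dict mapping each arm to a list of
    observed rewards (a hypothetical input shape for this sketch).
    """
    if not choice_rewards:
        raise ValueError("at least one arm is required")

    # Arms that were never pulled have no estimate yet, so try them first.
    unplayed = [arm for arm, rewards in choice_rewards.items() if not rewards]
    if unplayed:
        return rng.choice(unplayed)

    total_pulls = sum(len(rewards) for rewards in choice_rewards.values())

    def ucb1_index(rewards):
        # Empirical mean plus an exploration bonus that shrinks as the
        # arm is pulled more often.
        mean = sum(rewards) / len(rewards)
        bonus = math.sqrt(2.0 * math.log(total_pulls) / len(rewards))
        return mean + bonus

    scores = {arm: ucb1_index(rewards) for arm, rewards in choice_rewards.items()}
    best = max(scores.values())

    # Several arms may share the highest index; pick one of them at random.
    tied = [arm for arm, score in scores.items() if score == best]
    return rng.choice(tied)
```

For example, ucb1_choose({"a": [0.5] * 100, "b": [0.4]}) returns "b": the rarely pulled arm receives a much larger exploration bonus even though its mean reward is lower.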