Adaptive treatment allocation and the multi-armed bandit problem (Q1102059)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Adaptive treatment allocation and the multi-armed bandit problem | scientific article | |
Statements
Adaptive treatment allocation and the multi-armed bandit problem (English)
1987
There are \(k\) distinct statistical populations, each specified by a univariate density function characterized by a parameter of unknown value. The question is how observations \(x_1, x_2, \ldots, x_N\) should be sampled sequentially from the \(k\) populations in order to maximize (in some sense) the mean value of their sum. A class of simple allocation rules based on upper confidence bounds for the population parameters is proposed. These rules are shown to be asymptotically optimal in both a Bayesian and a frequentist sense. A simulation study provides evidence that the rules perform well even for moderate values of \(N\).
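The abstract does not give the form of the confidence bounds, so the following is only a minimal sketch of the general idea of an upper-confidence-bound allocation rule, not the rules proposed in the article. It substitutes the widely known UCB1 index (sample mean plus \(\sqrt{2 \ln t / n_i}\)) and assumes Bernoulli populations; the function name `ucb1` and the parameters `arm_means`, `horizon`, and `seed` are hypothetical names chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Simulate a UCB1-style index rule on Bernoulli populations.

    arm_means: assumed success probabilities (unknown to the rule itself).
    horizon:   total number of observations N to allocate.
    Returns (pull counts per population, total reward).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # observations taken from each population
    sums = [0.0] * k      # sum of rewards observed from each population
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Sample every population once so each sample mean is defined.
            arm = t - 1
        else:
            # Index = sample mean + exploration bonus that shrinks as the
            # population is sampled more often (UCB1 stand-in, an assumption).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


# Deterministic toy case: one population never pays, the other always pays.
counts, total = ucb1([0.0, 1.0], horizon=100)
print(counts, total)
```

In this toy run the rule should concentrate its sampling on the second population while still returning to the first occasionally, which is the exploration and exploitation trade-off the abstract describes.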
adaptive treatment allocation
multi-armed bandit problem
boundary crossing
adaptive control
dynamic allocation
upper confidence bounds
asymptotic optimality
simulation study