Adaptive treatment allocation and the multi-armed bandit problem (Q1102059)

From MaRDI portal
scientific article

    Statements

    Adaptive treatment allocation and the multi-armed bandit problem (English)
    1987
    There are \(k\) distinct statistical populations, each specified by a univariate density function characterized by a parameter of unknown value. The question is how observations \(x_1, x_2, \ldots, x_N\) should be sampled sequentially from the \(k\) populations in order to maximize (in some sense) the mean value of their sum. A class of simple allocation rules based on upper confidence bounds for the population parameters is proposed. These rules are shown to be asymptotically optimal in both a Bayesian and a frequentist sense. A simulation study provides evidence that the rules perform well even for moderate values of \(N\).
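    To make the idea of an allocation rule based on upper confidence bounds concrete, the following is a minimal Python sketch. It is not the rule proposed in the article: it assumes Bernoulli populations and substitutes the well-known UCB1 index (sample mean plus sqrt(2 ln t / n_j)) as a stand-in. The function name `ucb_allocate` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def ucb_allocate(arm_means, horizon, seed=0):
    """Sample sequentially from k Bernoulli populations with a UCB1-style index.

    Assumptions (not from the article): rewards are Bernoulli with the given
    means, and the index is the UCB1 index rather than the article's bounds.
    Returns the number of samples drawn from each population and the sum of
    the observed rewards.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # n_j: samples drawn so far from population j
    sums = [0.0] * k   # running sum of rewards from population j
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Draw once from every population so each index is defined.
            arm = t - 1
        else:
            # Choose the population with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(t) / counts[j]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


if __name__ == "__main__":
    counts, total = ucb_allocate([0.1, 0.9], horizon=2000)
    print("samples per population:", counts)
    print("sum of observations:", total)
```

    In this sketch the exploration bonus shrinks as a population is sampled more often, so populations that look inferior are still revisited occasionally, but with decreasing frequency.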
    adaptive treatment allocation
    multi-armed bandit problem
    boundary crossing
    adaptive control
    dynamic allocation
    upper confidence bounds
    asymptotic optimality
    simulation study