Sample mean based index policies with <i>O</i>(log <i>n</i>) regret for the multi-armed bandit problem (Q4862097)

From MaRDI portal
scientific article; zbMATH DE number 837042

    Statements

    Sample mean based index policies with <i>O</i>(log <i>n</i>) regret for the multi-armed bandit problem (English)
    9 July 1996
    upper confidence bounds
    asymptotically efficient
    large deviations
    stochastic adaptive control
    non-Bayesian infinite horizon version
    multi-armed bandit problem
    Kullback-Leibler number