Sample mean based index policies by <i>O</i>(log <i>n</i>) regret for the multi-armed bandit problem (Q4862097)


Revision as of 05:08, 8 February 2024

scientific article; zbMATH DE number 837042
Language: English
Label: Sample mean based index policies by <i>O</i>(log <i>n</i>) regret for the multi-armed bandit problem
Description: scientific article; zbMATH DE number 837042

    Statements

    Sample mean based index policies by <i>O</i>(log <i>n</i>) regret for the multi-armed bandit problem (English)
    9 July 1996
    upper confidence bounds
    asymptotically efficient
    large deviations
    stochastic adaptive control
    non-Bayesian infinite horizon version
    multi-armed bandit problem
    Kullback-Leibler number