Sample mean based index policies with O(log n) regret for the multi-armed bandit problem (Q4862097)

From MaRDI portal

scientific article; zbMATH DE number 837042
Language: English
Label: Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
Description: scientific article; zbMATH DE number 837042

    Statements

    Sample mean based index policies with O(log n) regret for the multi-armed bandit problem (English)
    9 July 1996
    Keywords:
    upper confidence bounds
    asymptotically efficient
    large deviations
    stochastic adaptive control
    non-Bayesian infinite horizon version
    multi-armed bandit problem
    Kullback-Leibler number
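The record lists only keywords, so the following is a minimal sketch of what a "sample mean based index policy" with upper confidence bounds looks like in practice. It is an illustration under stated assumptions, not the index constructed in the paper: rewards are assumed to be Bernoulli, and the exploration bonus sqrt(2 ln t / n_i) is the later UCB1 form of Auer, Cesa-Bianchi and Fischer (2002), swapped in for simplicity. The function names `ucb_index`, `run_index_policy` and `pseudo_regret` are hypothetical. Regret here means the expected reward lost relative to always playing the best arm; an O(log n) policy lets that loss grow only logarithmically in the horizon n.

```python
import math
import random
from typing import List, Sequence


def ucb_index(mean: float, pulls: int, t: float) -> float:
    """Sample mean plus an exploration bonus (assumed UCB1 form, not the paper's index)."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_index_policy(probs: Sequence[float], horizon: int, seed: int = 0) -> List[int]:
    """Play Bernoulli arms with success probabilities `probs` for `horizon` rounds.

    At each round the arm with the largest index is pulled.
    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    k = len(probs)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once so each mean is defined
        else:
            arm = max(range(k), key=lambda i: ucb_index(sums[i] / pulls[i], pulls[i], t))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


def pseudo_regret(probs: Sequence[float], pulls: Sequence[int]) -> float:
    """Expected reward lost versus the best arm: sum of gap * pull count."""
    best = max(probs)
    return sum((best - p) * n for p, n in zip(probs, pulls))


if __name__ == "__main__":
    probs = [0.3, 0.5, 0.7]
    for horizon in (1_000, 10_000, 100_000):
        pulls = run_index_policy(probs, horizon, seed=1)
        print(horizon, pulls, round(pseudo_regret(probs, pulls), 1))
```

Running the script for increasing horizons should show the suboptimal arms being pulled far less often than the best one, with regret growing much more slowly than the horizon. The UCB1 bonus achieves O(log n) regret but with a larger constant than the Kullback-Leibler lower bound that an asymptotically efficient policy attains, so it should be read only as an illustration of the general idea.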
