Sample mean based index policies with <i>O</i>(log <i>n</i>) regret for the multi-armed bandit problem (Q4862097)

From MaRDI portal
Property / MaRDI profile type: MaRDI publication profile (normal rank)
Property / full work available at URL: https://doi.org/10.2307/1427934 (normal rank)
Property / OpenAlex ID: W2000080679 (normal rank)

Latest revision as of 23:01, 19 March 2024

scientific article; zbMATH DE number 837042
Language: English
Label: Sample mean based index policies with <i>O</i>(log <i>n</i>) regret for the multi-armed bandit problem
Description: scientific article; zbMATH DE number 837042
Also known as: (none)

    Statements

    Sample mean based index policies with <i>O</i>(log <i>n</i>) regret for the multi-armed bandit problem (English)
    9 July 1996
    upper confidence bounds
    asymptotically efficient
    large deviations
    stochastic adaptive control
    non-Bayesian infinite horizon version
    multi-armed bandit problem
    Kullback-Leibler number