Sample mean based index policies by O(log n) regret for the multi-armed bandit problem (Q4862097)

From MaRDI portal

scientific article; zbMATH DE number 837042

      Statements

      Sample mean based index policies by O(log n) regret for the multi-armed bandit problem (English)
      9 July 1996
      upper confidence bounds
      asymptotically efficient
      large deviations
      stochastic adaptive control
      non-Bayesian infinite horizon version
      multi-armed bandit problem
      Kullback-Leibler number
