Sample mean based index policies by O(log n) regret for the multi-armed bandit problem (Q4862097)

scientific article; zbMATH DE number 837042

Language	Label	Description	Also known as
English	Sample mean based index policies by O(log n) regret for the multi-armed bandit problem	scientific article; zbMATH DE number 837042

Statements

instance of

scholarly article

0 references

title

Sample mean based index policies by O(log n) regret for the multi-armed bandit problem (English)

0 references

author

Rajeev Agrawal

0 references

published in

Advances in Applied Probability

0 references

publication date

9 July 1996

0 references

zbMATH Keywords

upper confidence bounds

0 references

asymptotically efficient

0 references

large deviations

0 references

stochastic adaptive control

0 references

non-Bayesian infinite horizon version

0 references

multi-armed bandit problem

0 references

Kullback-Leibler number

0 references

Identifiers

zbMATH Open document ID

0840.90129

0 references

DOI

10.2307/1427934

0 references

Mathematics Subject Classification ID

0 references


Sitelinks

Mathematics (1 entry)

mardi Publication:4862097
