Sample mean based index policies by O(log n) regret for the multi-armed bandit problem (Q4862097): Difference between revisions

Latest revision as of 22:01, 19 March 2024

scientific article; zbMATH DE number 837042

Language	Label	Description	Also known as
English	Sample mean based index policies by O(log n) regret for the multi-armed bandit problem	scientific article; zbMATH DE number 837042

Statements

instance of

scholarly article

0 references

title

Sample mean based index policies by O(log n) regret for the multi-armed bandit problem (English)

0 references

author

Rajeev Agrawal

0 references

published in

Advances in Applied Probability

0 references

publication date

9 July 1996

0 references

zbMATH Keywords

upper confidence bounds

0 references

asymptotically efficient

0 references

large deviations

0 references

stochastic adaptive control

0 references

non-Bayesian infinite horizon version

0 references

multi-armed bandit problem

0 references

Kullback-Leibler number

0 references

MaRDI profile type

MaRDI publication profile

0 references

full work available at URL

https://doi.org/10.2307/1427934

0 references

Identifiers

zbMATH Open document ID

0840.90129

0 references

DOI

10.2307/1427934

0 references

Mathematics Subject Classification ID

0 references

0 references

0 references

0 references

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:4862097

@@ Property / full work available at URL @@
+https://doi.org/10.2307/1427934
@@ Property / full work available at URL: https://doi.org/10.2307/1427934 / rank @@
+Normal rank
@@ Property / OpenAlex ID @@
+W2000080679
@@ Property / OpenAlex ID: W2000080679 / rank @@
+Normal rank
