Sample mean based index policies by O(log n) regret for the multi-armed bandit problem (Q4862097): Difference between revisions

Revision as of 17:46, 5 March 2024

scientific article; zbMATH DE number 837042

Language	Label	Description	Also known as
English	Sample mean based index policies by O(log n) regret for the multi-armed bandit problem	scientific article; zbMATH DE number 837042

Statements

instance of

scholarly article

0 references

title

Sample mean based index policies by O(log n) regret for the multi-armed bandit problem (English)

0 references

author

Rajeev Agrawal

0 references

published in

Advances in Applied Probability

0 references

publication date

9 July 1996

0 references

zbMATH Keywords

upper confidence bounds

0 references

asymptotically efficient

0 references

large deviations

0 references

stochastic adaptive control

0 references

non-Bayesian infinite horizon version

0 references

multi-armed bandit problem

0 references

Kullback-Leibler number

0 references

MaRDI profile type

MaRDI publication profile

0 references

Identifiers

zbMATH Open document ID

0840.90129

0 references

DOI

10.2307/1427934

0 references

Mathematics Subject Classification ID

0 references

0 references

0 references

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:4862097

Revision as of 04:08, 8 February 2024 by Import240129110113: Added link to MaRDI item.	Revision as of 17:46, 5 March 2024 by Import240304020342: Set profile property.
	Property / MaRDI profile type
		MaRDI publication profile
	Property / MaRDI profile type: MaRDI publication profile / rank
		Normal rank
