Multi-armed bandits with discount factor near one: The Bernoulli case

Publication:1161450

DOI: 10.1214/aos/1176345578; zbMath: 0478.90073; OpenAlex: W2095160246; MaRDI QID: Q1161450

F. P. Kelly

Publication date: 1981

Published in: The Annals of Statistics

Full work available at URL: https://doi.org/10.1214/aos/1176345578

zbMATH Keywords

Gittins index; asymptotic bounds; multi-armed bandit; discount optimality; Bernoulli bandit process; expected average reward optimality; infinite sequence of Bernoulli random variables; least failures rule; limit rule; optimal arm pulling strategy; play-the-winner rule

Mathematics Subject Classification ID

Markov and semi-Markov decision processes (90C40); Sequential statistical design (62L05); Optimal stopping in statistics (62L15)

Related Items

Sequential allocation in clinical trials; On optimal search with unknown detection probabilities; Dynamic priority allocation via restless bandit marginal productivity indices; An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits; Gittins' theorem under uncertainty; Branching Bandit Processes; Finite state multi-armed bandit problems: Sensitive-discount, average-reward and average-overtaking optimality
