Multi-armed bandits with discount factor near one: The Bernoulli case
Publication:1161450
DOI: 10.1214/aos/1176345578, zbMath: 0478.90073, OpenAlex: W2095160246, MaRDI QID: Q1161450
Publication date: 1981
Published in: The Annals of Statistics
Full work available at URL: https://doi.org/10.1214/aos/1176345578
Gittins index, asymptotic bounds, multi-armed bandit, discount optimality, Bernoulli bandit process, expected average reward optimality, infinite sequence of Bernoulli random variables, least failures rule, limit rule, optimal arm pulling strategy, play-the-winner rule
Markov and semi-Markov decision processes (90C40), Sequential statistical design (62L05), Optimal stopping in statistics (62L15)
Related Items
Sequential allocation in clinical trials; On optimal search with unknown detection probabilities; Dynamic priority allocation via restless bandit marginal productivity indices; An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits; Gittins' theorem under uncertainty; Branching Bandit Processes; Finite state multi-armed bandit problems: Sensitive-discount, average-reward and average-overtaking optimality