Multi-armed bandits with discount factor near one: The Bernoulli case
Publication:1161450
DOI: 10.1214/aos/1176345578 ⋮ zbMath: 0478.90073 ⋮ OpenAlex: W2095160246 ⋮ MaRDI QID: Q1161450
Publication date: 1981
Published in: The Annals of Statistics
Full work available at URL: https://doi.org/10.1214/aos/1176345578
Gittins index ⋮ asymptotic bounds ⋮ multi-armed bandit ⋮ discount optimality ⋮ Bernoulli bandit process ⋮ expected average reward optimality ⋮ infinite sequence of Bernoulli random variables ⋮ least failures rule ⋮ limit rule ⋮ optimal arm pulling strategy ⋮ play-the-winner rule
Markov and semi-Markov decision processes (90C40) ⋮ Sequential statistical design (62L05) ⋮ Optimal stopping in statistics (62L15)
Related Items (7)
Sequential allocation in clinical trials ⋮ On optimal search with unknown detection probabilities ⋮ Dynamic priority allocation via restless bandit marginal productivity indices ⋮ An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits ⋮ Gittins' theorem under uncertainty ⋮ Branching Bandit Processes ⋮ Finite state multi-armed bandit problems: Sensitive-discount, average-reward and average-overtaking optimality