Regret bounds for restless Markov bandits
From MaRDI portal
Publication:465253
DOI: 10.1016/J.TCS.2014.09.026
zbMATH Open: 1360.60090
arXiv: 1209.2693
OpenAlex: W2178643644
MaRDI QID: Q465253
Authors: Ronald Ortner, Daniil Ryabko, Peter Auer, Rémi Munos
Publication date: 31 October 2014
Published in: Theoretical Computer Science
Abstract: We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that after $T$ steps achieves $\tilde{O}(\sqrt{T})$ regret with respect to the best policy that knows the distributions of all arms. No assumptions on the Markov chains are made except that they are irreducible. In addition, we show that index-based policies are necessarily suboptimal for the considered problem.
Full work available at URL: https://arxiv.org/abs/1209.2693
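To make the "restless" setting concrete, the following minimal Python sketch simulates the dynamics described in the abstract: every arm's Markov chain advances at every time step, whether or not that arm is played, and the learner only observes the reward of the arm it pulls. The two-state chains, transition probabilities, reward values, and the round-robin learner are all hypothetical illustrations, not the algorithm from the paper.

```python
import random

def restless_bandit_step(states, stay_prob, rewards, arm):
    """Advance every arm's two-state Markov chain one step (restless:
    all arms evolve regardless of which one is played), then return
    the reward of the pulled arm's new state."""
    for i in range(len(states)):
        if random.random() >= stay_prob[i][states[i]]:
            states[i] = 1 - states[i]   # flip to the other state
    return rewards[arm][states[arm]]

# Hypothetical two-arm instance; all numbers are made up for illustration.
random.seed(0)
states = [0, 0]
stay_prob = [[0.9, 0.9], [0.5, 0.5]]    # P(stay in current state), per arm/state
rewards = [[0.0, 1.0], [0.2, 0.8]]      # reward of each state, per arm

total = 0.0
for t in range(1000):
    arm = t % 2                          # naive round-robin learner (not the paper's algorithm)
    total += restless_bandit_step(states, stay_prob, rewards, arm)
```

Note that because the chains keep mixing between pulls, the reward an arm yields depends on how long ago it was last played; this is why, as the paper shows, policies that assign each arm an index in isolation can be suboptimal here.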
Mathematics Subject Classification:
- Stopping times; optimal stopping problems; gambling theory (60G40)
- Markov and semi-Markov decision processes (90C40)
- Probabilistic games; gambling (91A60)
Cites Work
- Asymptotically efficient adaptive allocation rules
- The Nonstochastic Multiarmed Bandit Problem
- Finite-time analysis of the multiarmed bandit problem
- Equivalence notions and model minimization in Markov decision processes
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Threshold limits for cover times
- Near-optimal regret bounds for reinforcement learning
- Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
- On Chebyshev-Type Inequalities for Primes
- Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards
- On the possibility of learning in reactive environments with arbitrary dependence
Cited In (12)
- Whittle index based Q-learning for restless bandits with average reward
- Approximations of the restless bandit problem
- An online algorithm for the risk-aware restless bandit
- Regret Bounds for Restless Markov Bandits
- Learning unknown service rates in queues: a multiarmed bandit approach
- Multi-armed bandit problem with online clustering as side information
- Learning in structured MDPs with convex cost functions: improved regret bounds for inventory management
- A new bandit setting balancing information from state evolution and corrupted context
- Approximation algorithms for restless bandit problems
- Optimal exploration-exploitation in a multi-armed bandit problem with non-stationary rewards
- Bounded Regret for Finitely Parameterized Multi-Armed Bandits
- Regret bounds for Narendra-Shapiro bandit algorithms
This page was built for publication: Regret bounds for restless Markov bandits