Whittle index based Q-learning for restless bandits with average reward
From MaRDI portal
(Redirected from Publication:2116660)
Abstract: A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme.
Recommendations
- Conditions for indexability of restless bandits and an algorithm to compute Whittle index
- On the computation of Whittle's index for Markovian restless bandits
- Whittle index for restless bandits with expanding state spaces
- Regret bounds for restless Markov bandits
- Multi-armed restless bandits, index policies, and dynamic priority allocation
Cites work
- scientific article; zbMATH DE number 4087408 (Why is no real title available?)
- scientific article; zbMATH DE number 49674 (Why is no real title available?)
- scientific article; zbMATH DE number 1321699 (Why is no real title available?)
- scientific article; zbMATH DE number 1348599 (Why is no real title available?)
- scientific article; zbMATH DE number 700091 (Why is no real title available?)
- scientific article; zbMATH DE number 7042556 (Why is no real title available?)
- A stability criterion for two timescale stochastic approximation schemes
- Admission and routing of soft real-time jobs to multiclusters: design and comparison of index policies
- An actor-critic algorithm for constrained Markov decision processes
- An index policy for dynamic pricing in cloud computing under price commitments
- Asymptotically optimal index policies for an abandonment queue with convex holding cost
- Asynchronous Stochastic Approximations
- Bandit algorithms
- Convergence results for single-step on-policy reinforcement-learning algorithms
- Index Policies for the Admission Control and Routing of Impatient Customers to Heterogeneous Service Stations
- Index policies for the maintenance of a collection of machines by a set of repairmen
- Indexability and index heuristics for a simple class of inventory routing problems
- Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access
- Learning algorithms for Markov decision processes with average cost
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges
- On an index policy for restless bandits
- Reinforcement learning. An introduction
- Simulation-based optimization of Markov reward processes
- The Borkar-Meyn theorem for asynchronous stochastic approximations
- The complexity of optimal queuing network control
- Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning
- Whittle Index Policy for Crawling Ephemeral Content
Cited in
(2)
This page was built for publication: Whittle index based Q-learning for restless bandits with average reward
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2116660)