Whittle index based Q-learning for restless bandits with average reward

Publication:2116660

DOI: 10.1016/J.AUTOMATICA.2022.110186; zbMATH Open: 1485.93341; arXiv: 2004.14427; OpenAlex: W3022217175; MaRDI QID: Q2116660; FDO: Q2116660

Konstantin Avrachenkov, Vivek Borkar

Publication date: 18 March 2022

Published in: Automatica

Abstract: A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme.


Full work available at URL: https://arxiv.org/abs/2004.14427










