Whittle index based Q-learning for restless bandits with average reward

DOI 10.1016/J.AUTOMATICA.2022.110186

MaRDI QID Q2116660

Authors Vivek Borkar, Konstantin Avrachenkov

Publication date 18 March 2022

Published in Automatica

Full work available at URL https://arxiv.org/abs/2004.14427

zbMATH Keywords

Whittle index; average reward; reinforcement learning; Q-learning; discrete event system; restless bandits

Mathematics Subject Classification ID

Artificial neural networks and deep learning (68T07); Discrete event control/observation systems (93C65)

Abstract: A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. A rigorous convergence analysis is provided, and numerical experiments show excellent empirical performance of the proposed scheme.
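To make the idea in the abstract concrete, the following is a minimal sketch of how Q-learning and a Whittle-index estimate might be coupled for a single arm under the average-reward criterion. It is an illustration under stated assumptions, not the authors' exact scheme: the function names, the uniform random exploration, the global step sizes t**-0.6 (fast) and 1/t (slow), and the toy arm in the usage example are all hypothetical choices. For each reference state x, a relative-value Q-table is learned with a subsidy lam[x] paid for the passive action, and lam[x] is nudged slowly toward the value at which both actions tie in state x.

```python
import random


def rvi_q_update(Q, lam, i, u, reward, j, step):
    """One relative-value Q-learning update for a single arm.

    Q is a list of [Q(s, passive), Q(s, active)] rows; lam is the subsidy
    added to the reward when the arm is passive (u == 0).
    """
    # The average of all Q-values serves as the reference offset that keeps
    # the average-reward iteration bounded (an assumed choice of offset).
    f_q = sum(sum(row) for row in Q) / (2 * len(Q))
    target = reward + (1 - u) * lam + max(Q[j]) - f_q
    Q[i][u] += step * (target - Q[i][u])


def subsidy_update(Q, lam, x, step):
    """Move lam toward the subsidy at which both actions tie in state x."""
    return lam + step * (Q[x][1] - Q[x][0])


def learn_whittle_indices(P, R, n_steps=20000, seed=0):
    """Estimate one index per state from simulated transitions.

    P[u][s] is the next-state distribution under action u in state s;
    R[s][u] is the reward. Both are hypothetical inputs for illustration.
    """
    rng = random.Random(seed)
    n = len(R)
    # One Q-table per reference state x: Q[x][s][u].
    Q = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    lam = [0.0] * n
    s = 0
    for t in range(1, n_steps + 1):
        u = rng.randrange(2)  # uniform exploration (assumption)
        j = rng.choices(range(n), weights=P[u][s])[0]
        fast, slow = t ** -0.6, 1.0 / t  # slow/fast -> 0 (two timescales)
        for x in range(n):
            rvi_q_update(Q[x], lam[x], s, u, R[s][u], j, fast)
            lam[x] = subsidy_update(Q[x], lam[x], x, slow)
        s = j
    return lam


if __name__ == "__main__":
    # Toy two-state arm, indexed as P[u][s][s'] and R[s][u].
    P = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.5, 0.5]]]
    R = [[0.0, 1.0], [0.0, 2.0]]
    print(learn_whittle_indices(P, R))
```

In a multi-arm setting, the learned estimates would typically be used to activate the arms whose current states have the largest indices; that control loop is omitted here to keep the sketch short.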

Cited in (2)
