Regret Analysis of a Markov Policy Gradient Algorithm for Multiarm Bandits
Publication:6121638
DOI: 10.1287/moor.2022.1311; arXiv: 2007.10229; OpenAlex: W3042983647; Wikidata: Q114967780; Scholia: Q114967780; MaRDI QID: Q6121638
Publication date: 27 February 2024
Published in: Mathematics of Operations Research
Full work available at URL: https://arxiv.org/abs/2007.10229
Inference from stochastic processes and prediction (62M20); Discrete-time Markov processes on general state spaces (60J05); Stochastic approximation (62L20)