Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning
From MaRDI portal
Publication:6148353
DOI: 10.1137/22m1499261
arXiv: 2203.02628
OpenAlex: W4389438905
MaRDI QID: Q6148353
Zaiwei Chen, John-Paul B. Clarke, Siva Theja Maguluri
Publication date: 11 January 2024
Published in: SIAM Journal on Mathematics of Data Science
Full work available at URL: https://arxiv.org/abs/2203.02628
Artificial neural networks and deep learning (68T07); Learning and adaptive systems in artificial intelligence (68T05); Stochastic approximation (62L20); Markov and semi-Markov decision processes (90C40); Computational aspects of data analysis and big data (68T09)
Cites Work
- Asynchronous stochastic approximation and Q-learning
- An upper bound on the loss from approximate optimal-value functions
- \({\mathcal Q}\)-learning
- A distribution-free theory of nonparametric regression
- Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
- A concentration bound for contractive stochastic approximation
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- An analysis of temporal-difference learning with function approximation
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- A Stochastic Approximation Method