Target Network and Truncation Overcome the Deadly Triad in \(\boldsymbol{Q}\)-Learning
From MaRDI portal
Publication:6148353
DOI: 10.1137/22m1499261
arXiv: 2203.02628
OpenAlex: W4389438905
MaRDI QID: Q6148353
Zaiwei Chen, John-Paul B. Clarke, Siva Theja Maguluri
Publication date: 11 January 2024
Published in: SIAM Journal on Mathematics of Data Science
Full work available at URL: https://arxiv.org/abs/2203.02628
Artificial neural networks and deep learning (68T07); Learning and adaptive systems in artificial intelligence (68T05); Stochastic approximation (62L20); Markov and semi-Markov decision processes (90C40); Computational aspects of data analysis and big data (68T09)
Cites Work
- Asynchronous stochastic approximation and Q-learning
- An upper bound on the loss from approximate optimal-value functions
- \({\mathcal Q}\)-learning
- A distribution-free theory of nonparametric regression
- Finite-sample analysis of nonlinear stochastic approximation with applications in reinforcement learning
- A concentration bound for contractive stochastic approximation
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- An analysis of temporal-difference learning with function approximation
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- A Stochastic Approximation Method