Neural Temporal Difference and Q Learning Provably Converge to Global Optima
Publication: 6149409
DOI: 10.1287/MOOR.2023.1370
arXiv: 1905.10027
OpenAlex: W4367298527
MaRDI QID: Q6149409
FDO: Q6149409
Authors: Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang
Publication date: 5 March 2024
Published in: Mathematics of Operations Research
Abstract: Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning. However, due to the nonlinearity in value function approximation, such a coupling leads to nonconvexity and even divergence in optimization. As a result, the global convergence of neural TD remains unclear. In this paper, we prove for the first time that neural TD converges at a sublinear rate to the global optimum of the mean-squared projected Bellman error for policy evaluation. In particular, we show how such global convergence is enabled by the overparametrization of neural networks, which also plays a vital role in the empirical success of neural TD. Beyond policy evaluation, we establish the global convergence of neural (soft) Q-learning, which is further connected to that of policy gradient algorithms.
Full work available at URL: https://arxiv.org/abs/1905.10027
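Illustration: the neural TD scheme summarized in the abstract can be sketched as a semi-gradient TD(0) update on a wide two-layer ReLU network whose weights are kept close to their random initialization. The sketch below is a minimal toy version under stated assumptions, not the authors' implementation: the function names (`init_network`, `value`, `grad`, `project`, `neural_td`), the three-state Markov reward process, and all hyperparameters (`m`, `eta`, `radius`, `steps`) are hypothetical choices made for this example. It also evaluates a state-value function rather than an action-value function and omits any averaging of iterates.

```python
import math
import random

def init_network(m, d, rng):
    """Width-m two-layer ReLU net: Gaussian first layer, fixed random signs b_r."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(m)]
    b = [rng.choice((-1.0, 1.0)) for _ in range(m)]
    return W, b

def value(W, b, x):
    """V(x; W) = (1/sqrt(m)) * sum_r b_r * relu(W_r . x)."""
    total = 0.0
    for w_r, b_r in zip(W, b):
        pre = sum(wi * xi for wi, xi in zip(w_r, x))
        total += b_r * max(pre, 0.0)
    return total / math.sqrt(len(W))

def grad(W, b, x):
    """Gradient of V(x; W) with respect to W, one row per hidden unit."""
    scale = 1.0 / math.sqrt(len(W))
    rows = []
    for w_r, b_r in zip(W, b):
        active = sum(wi * xi for wi, xi in zip(w_r, x)) > 0.0
        rows.append([scale * b_r * xi if active else 0.0 for xi in x])
    return rows

def distance(W, W0):
    """Frobenius distance between two weight matrices."""
    return math.sqrt(sum((a - c) ** 2
                         for ra, rc in zip(W, W0) for a, c in zip(ra, rc)))

def project(W, W0, radius):
    """Project W onto the Frobenius ball of the given radius centred at W0."""
    dist = distance(W, W0)
    if dist <= radius:
        return W
    t = radius / dist
    return [[c + t * (a - c) for a, c in zip(ra, rc)] for ra, rc in zip(W, W0)]

def neural_td(features, rewards, transitions, gamma=0.9, m=64,
              eta=0.05, radius=5.0, steps=2000, seed=0):
    """Projected semi-gradient TD(0) along one trajectory (hypothetical toy setup)."""
    rng = random.Random(seed)
    W0, b = init_network(m, len(features[0]), rng)
    W = [row[:] for row in W0]
    s = 0
    for _ in range(steps):
        # Sample the next state from row s of the transition matrix.
        s_next = rng.choices(range(len(features)), weights=transitions[s])[0]
        x, x_next = features[s], features[s_next]
        # TD error; the bootstrapped target is treated as a constant.
        delta = value(W, b, x) - rewards[s] - gamma * value(W, b, x_next)
        g = grad(W, b, x)
        W = [[w - eta * delta * gi for w, gi in zip(wr, gr)]
             for wr, gr in zip(W, g)]
        W = project(W, W0, radius)
        s = s_next
    return W, W0, b

# Hypothetical 3-state Markov reward process with unit-norm features.
features = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
rewards = [0.0, 1.0, 0.5]
transitions = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.4, 0.4, 0.2]]
```

Calling `neural_td(features, rewards, transitions)` returns the trained weights, the initial weights, and the fixed output signs; `value(W, b, features[i])` then gives the estimated value of state `i`. The projection step is what keeps the iterates in a neighbourhood of the initialization, which is the regime in which a wide network behaves approximately linearly in its weights.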
Recommendations
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- A finite time analysis of temporal difference learning with linear function approximation
- The convergence of \(TD(\lambda)\) for general \(\lambda\)
- Proximal gradient temporal difference learning: stable reinforcement learning with polynomial sample complexity
- On the convergence of temporal-difference learning with linear function approximation
Mathematics Subject Classification: Learning and adaptive systems in artificial intelligence (68T05); Markov and semi-Markov decision processes (90C40)