On Generalized Bellman Equations and Temporal-Difference Learning
DOI: 10.1007/978-3-319-57351-9_1 · zbMath: 1454.68135 · arXiv: 1704.04463 · OpenAlex: W2606786028 · MaRDI QID: Q3305109
Huizhen Yu, Ashique Rupam Mahmood, Richard S. Sutton
Publication date: 5 August 2020
Published in: Advances in Artificial Intelligence
Full work available at URL: https://arxiv.org/abs/1704.04463
Keywords: Markov chain; Markov decision process; reinforcement learning; policy evaluation; temporal differences; randomized stopping time; generalized Bellman equation; approximate policy evaluation; temporal-difference method
MSC classification:
- Learning and adaptive systems in artificial intelligence (68T05)
- Dynamic programming (90C39)
- Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20)
- Markov and semi-Markov decision processes (90C40)
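For orientation, here is a minimal sketch of what the keywords "generalized Bellman equation" and "randomized stopping time" refer to; it follows the standard discounted-MDP formulation and is not quoted from the record itself. The notation (v_pi, gamma, R_t, S_t, tau) is illustrative. The idea is that the one-step lookahead of the ordinary Bellman equation can be replaced by a randomized stopping time tau >= 1, and by the strong Markov property the value function still satisfies a fixed-point equation:

```latex
% A minimal sketch of a generalized Bellman equation via a randomized
% stopping time, assuming the standard discounted-MDP setting; the
% symbols below are illustrative, not quoted from the paper.
\[
  v_\pi(s) \;=\;
  \mathbb{E}_\pi\!\left[\,
    \sum_{t=1}^{\tau} \gamma^{\,t-1} R_t
    \;+\; \gamma^{\,\tau}\, v_\pi(S_\tau)
    \;\middle|\; S_0 = s
  \right].
\]
% With tau identically equal to 1 this reduces to the ordinary Bellman
% equation v_pi(s) = E[R_1 + gamma v_pi(S_1) | S_0 = s]; a geometrically
% distributed tau with parameter lambda recovers the TD(lambda) lambda-return.
```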
Cites Work
- Stationary policies and Markov policies in Borel dynamic programming
- Asynchronous stochastic approximation and Q-learning
- Technical update: Least-squares temporal difference learning
- Non-negative matrices and Markov chains
- Q(λ) with Off-Policy Corrections
- Importance Sampling for Stochastic Simulations
- Error Bounds for Approximations from Projected Linear Equations
- General Irreducible Markov Chains and Non-Negative Operators
- Markov Chains and Stochastic Stability
- Matrix Analysis
- An analysis of temporal-difference learning with function approximation
- On Actor-Critic Algorithms
- Combining importance sampling and temporal difference control variates to simulate Markov chains
- Ergodic Theorems for Discrete Time Stochastic Systems Using a Stochastic Lyapunov Function
- Real Analysis and Probability
- Least Squares Temporal Difference Methods: An Analysis under General Conditions
- Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning