On Generalized Bellman Equations and Temporal-Difference Learning
DOI: 10.1007/978-3-319-57351-9_1
zbMath: 1454.68135
arXiv: 1704.04463
OpenAlex: W2606786028
MaRDI QID: Q3305109
Authors: Richard S. Sutton, Huizhen Yu, Ashique Rupam Mahmood
Publication date: 5 August 2020
Published in: Advances in Artificial Intelligence
Full work available at URL: https://arxiv.org/abs/1704.04463
Keywords: Markov chain; Markov decision process; reinforcement learning; policy evaluation; temporal differences; randomized stopping time; generalized Bellman equation; approximate policy evaluation; temporal-difference method
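The keywords "generalized Bellman equation" and "randomized stopping time" refer to Bellman equations indexed by stopping times. A sketch in standard policy-evaluation notation (the paper's off-policy formulation is more general than this):

$$ v^\pi(s) \;=\; \mathbb{E}_\pi\!\left[\,\sum_{t=0}^{\tau-1} \gamma^{t} R_{t+1} \;+\; \gamma^{\tau}\, v^\pi(S_\tau) \,\middle|\, S_0 = s\right], $$

where $\tau \ge 1$ is a (possibly randomized) stopping time. Taking $\tau \equiv 1$ recovers the standard one-step Bellman equation, while a geometric $\tau$ with parameter $1-\lambda$ corresponds to TD($\lambda$).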
MSC classification: Learning and adaptive systems in artificial intelligence (68T05); Dynamic programming (90C39); Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) (60J20); Markov and semi-Markov decision processes (90C40)
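For the "temporal-difference method" keyword, a minimal, self-contained sketch of tabular TD($\lambda$) policy evaluation on the textbook random-walk chain. This is the classical algorithm, not the paper's generalized off-policy variant; all names and parameters below are illustrative.

```python
import numpy as np

def td_lambda_random_walk(n_states=5, alpha=0.1, gamma=1.0, lam=0.8,
                          n_episodes=2000, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces.

    Random-walk chain: states 1..n_states are non-terminal; 0 and
    n_states+1 are terminal. Reward +1 on reaching the right terminal.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(n_states + 2)             # value estimates; terminals stay 0
    for _ in range(n_episodes):
        e = np.zeros_like(v)               # eligibility traces
        s = (n_states + 1) // 2            # start near the middle
        while 0 < s < n_states + 1:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s_next == n_states + 1 else 0.0
            delta = r + gamma * v[s_next] - v[s]   # TD error
            e[s] += 1.0                            # accumulating trace
            v += alpha * delta * e                 # update all traced states
            e *= gamma * lam                       # decay traces
            s = s_next
    return v[1:-1]

print(td_lambda_random_walk())   # approaches the true values [1/6, ..., 5/6]
```

With a constant step size the estimates fluctuate around the true values rather than converge exactly; a decaying step size would give convergence under the usual stochastic-approximation conditions.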
Cites Work
- Stationary policies and Markov policies in Borel dynamic programming
- Asynchronous stochastic approximation and Q-learning
- Technical update: Least-squares temporal difference learning
- Non-negative matrices and Markov chains.
- Q(λ) with Off-Policy Corrections
- Importance Sampling for Stochastic Simulations
- Error Bounds for Approximations from Projected Linear Equations
- General Irreducible Markov Chains and Non-Negative Operators
- Markov Chains and Stochastic Stability
- Matrix Analysis
- An analysis of temporal-difference learning with function approximation
- On Actor-Critic Algorithms
- Combining importance sampling and temporal difference control variates to simulate Markov Chains
- Ergodic Theorems for Discrete Time Stochastic Systems Using a Stochastic Lyapunov Function
- Real Analysis and Probability
- Least Squares Temporal Difference Methods: An Analysis under General Conditions
- Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning