Reinforcement learning with replacing eligibility traces
Publication:1911343
DOI: 10.1007/BF00114726, zbMath: 0843.68094, MaRDI QID: Q1911343
Richard S. Sutton, Satinder Pal Singh
Publication date: 13 August 1996
Published in: Machine Learning
Monte Carlo methods, reinforcement learning, temporal difference learning, eligibility trace, replacing trace
Related Items
- Guiding exploration by pre-existing knowledge without modifying reward
- The optimal unbiased value estimator and its relation to LSTD, TD and MC
- Risk-averse policy optimization via risk-neutral policy optimization
- A Gentle Introduction to Reinforcement Learning
Cites Work
- Asynchronous stochastic approximation and Q-learning
- Practical issues in temporal difference learning
- The convergence of \(TD(\lambda)\) for general \(\lambda\)
- Temporal-difference methods and Markov models
- On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
- A Note on the Inversion of Matrices by Random Walks
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Unnamed Item