An emphatic approach to the problem of off-policy temporal-difference learning
Publication:2810885
zbMATH Open: 1360.68712
arXiv: 1503.04269
MaRDI QID: Q2810885
FDO: Q2810885
Authors: Richard S. Sutton, A. Rupam Mahmood, Martha White
Publication date: 6 June 2016
Published in: Journal of Machine Learning Research (JMLR)
Full work available at URL: https://arxiv.org/abs/1503.04269
Recommendations
- On Generalized Bellman Equations and Temporal-Difference Learning
- Weak convergence properties of constrained emphatic temporal-difference learning with constant and slowly diminishing stepsize
- \(\text{Q}(\lambda)\) with off-policy corrections
- A finite time analysis of temporal difference learning with linear function approximation
Cited In (12)
- Distributed consensus-based multi-agent temporal-difference learning
- Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
- Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
- Policy evaluation with temporal differences: a survey and comparison
- On Generalized Bellman Equations and Temporal-Difference Learning
- Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
- Adaptive importance sampling for value function approximation in off-policy reinforcement learning
- Off-policy temporal difference learning with distribution adaptation in fast mixing chains
- Statistical inference for online decision making via stochastic gradient descent
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- \(\text{Q}(\lambda)\) with off-policy corrections