Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
Recommendations
- An emphatic approach to the problem of off-policy temporal-difference learning
- Off-policy linear temporal difference learning algorithms with a generalized oblique projection
- \(\text{Q}(\lambda)\) with off-policy corrections
- Policy evaluation with temporal differences: a survey and comparison
- Off-policy temporal difference learning with distribution adaptation in fast mixing chains
Cites work
- An analysis of temporal-difference learning with function approximation
- An emphatic approach to the problem of off-policy temporal-difference learning
- Marginal Mean Models for Dynamic Regimes
- Policy evaluation with temporal differences: a survey and comparison
- Recruitment-imitation mechanism for evolutionary reinforcement learning
- Reinforcement learning. An introduction
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Weak convergence properties of constrained emphatic temporal-difference learning with constant and slowly diminishing stepsize
- \(\mathcal{Q}\)-learning
MaRDI item: Q6146179