Policy evaluation with temporal differences: a survey and comparison
zbMATH Open Zbl 1317.68150 · MaRDI QID: Q2934010
Authors: Christoph Dann, Gerhard Neumann, Jan Peters
Publication date: 8 December 2014
Full work available at URL: http://jmlr.csail.mit.edu/papers/v15/dann14a.html
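The surveyed methods estimate the value function of a fixed policy from sampled transitions using temporal-difference updates. As a minimal illustration (not from the survey itself), here is a tabular TD(0) sketch on a hypothetical two-state Markov reward process; the chain, rewards, and hyperparameters are invented for the example:

```python
import random

def td0_evaluate(n_states, transitions, rewards,
                 gamma=0.9, alpha=0.1, episodes=2000, seed=0):
    """Tabular TD(0) policy evaluation on a Markov reward process.

    transitions[s] is a list of (next_state, probability) pairs induced
    by the fixed policy; rewards[s] is the expected reward for leaving s.
    Each update moves V(s) toward the one-step bootstrap target
    r + gamma * V(s').
    """
    rng = random.Random(seed)
    V = [0.0] * n_states
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(50):  # truncated rollout
            next_states, probs = zip(*transitions[s])
            s2 = rng.choices(next_states, weights=probs)[0]
            # TD(0) update: shift V(s) toward the bootstrapped target
            V[s] += alpha * (rewards[s] + gamma * V[s2] - V[s])
            s = s2
    return V

# Hypothetical chain: state 0 pays reward 1 and loops or moves to
# state 1; state 1 pays nothing and returns to state 0.
transitions = [[(0, 0.5), (1, 0.5)], [(0, 1.0)]]
rewards = [1.0, 0.0]
V = td0_evaluate(2, transitions, rewards)
```

Solving the Bellman equations for this chain gives V(0) ≈ 6.90 and V(1) ≈ 6.21, which the TD(0) estimates approach up to stochastic-approximation noise from the constant step size.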
Recommendations
- Least squares temporal difference methods: An analysis under general conditions
- Technical update: Least-squares temporal difference learning
- Approximate policy iteration: a survey and some new methods
- Generalized TD learning
- An emphatic approach to the problem of off-policy temporal-difference learning
MSC Classifications:
- Point estimation (62F10)
- Markov processes: estimation; hidden Markov models (62M05)
- Learning and adaptive systems in artificial intelligence (68T05)
- Research exposition (monographs, survey articles) pertaining to computer science (68-02)
- Research exposition (monographs, survey articles) pertaining to statistics (62-02)
- Markov and semi-Markov decision processes (90C40)
Cited In (22)
- A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
- Approximated multi-agent fitted Q iteration
- Accelerating Stochastic Composition Optimization
- Gradient temporal-difference learning for off-policy evaluation using emphatic weightings
- Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics
- Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
- Multi-agent natural actor-critic reinforcement learning algorithms
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
- Hybrid SGD algorithms to solve stochastic composite optimization problems with application in sparse portfolio selection problems
- Simple and optimal methods for stochastic variational inequalities. II: Markovian noise and policy evaluation in reinforcement learning
- A functional model method for nonconvex nonsmooth conditional stochastic optimization
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- Accelerated and Instance-Optimal Policy Evaluation with Linear Function Approximation
- Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
- Off-policy temporal difference learning with distribution adaptation in fast mixing chains
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Stochastic composition optimization of functions without Lipschitz continuous gradient
- Multilevel composite stochastic optimization via nested variance reduction
- Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization
- A finite time analysis of temporal difference learning with linear function approximation
- Distributed entropy-regularized multi-agent reinforcement learning with policy consensus