Least Squares Temporal Difference Methods: An Analysis under General Conditions
Publication:4910565
DOI: 10.1137/100807879, zbMath: 1274.90478, OpenAlex: W2141022000, MaRDI QID: Q4910565
Publication date: 19 March 2013
Published in: SIAM Journal on Control and Optimization
Full work available at URL: http://hdl.handle.net/1721.1/77629
Monte Carlo methods (65C05), Dynamic programming (90C39), Markov and semi-Markov decision processes (90C40)
Related Items
Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
An incremental off-policy search in a model-free Markov decision process using a single sample path
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
On Generalized Bellman Equations and Temporal-Difference Learning
Proximal algorithms and temporal difference methods for solving fixed point problems
Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning