Least Squares Temporal Difference Methods: An Analysis under General Conditions

From MaRDI portal

Publication:4910565

DOI: 10.1137/100807879
zbMATH: 1274.90478
OpenAlex: W2141022000
MaRDI QID: Q4910565

Publication date: 19 March 2013

Published in: SIAM Journal on Control and Optimization

Full work available at URL: http://hdl.handle.net/1721.1/77629

zbMATH Keywords

Markov decision process; approximate dynamic programming; temporal difference method

Mathematics Subject Classification ID

Monte Carlo methods (65C05); Dynamic programming (90C39); Markov and semi-Markov decision processes (90C40)

Related Items

Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage
An incremental off-policy search in a model-free Markov decision process using a single sample path
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
On Generalized Bellman Equations and Temporal-Difference Learning
Proximal algorithms and temporal difference methods for solving fixed point problems
Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning