Least squares policy evaluation algorithms with linear function approximation
From MaRDI portal
Publication:1870310
DOI: 10.1023/A:1022192903948
zbMath: 1030.93061
MaRDI QID: Q1870310
Dimitri P. Bertsekas, Angelia Nedić
Publication date: 11 May 2003
Published in: Discrete Event Dynamic Systems
Keywords: simulation; martingale; convergence results; temporal difference; stepsize; linear function approximation; least-squares methods; \(\text{LSTD}(\lambda)\) algorithm; discrete-time stationary Markov chain; infinite-horizon dynamic programming; policy evaluation algorithms
Classifications: Discrete-time control/observation systems (93C55); Least squares and related methods for stochastic control systems (93E24); Markov chains (discrete-time Markov processes on discrete state spaces) (60J10); Linearizations (93B18); Optimal stochastic control (93E20)
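To illustrate the topic of the record, here is a minimal sketch of the LSTD(0) special case of the least-squares temporal-difference family referenced in the keywords. It is not the paper's own pseudocode: the function name, the affine feature map, and the toy two-state Markov chain are assumptions chosen for a self-contained example.

```python
import numpy as np

def lstd(transitions, phi, gamma=0.9):
    """LSTD(0): fit weights w so that phi(s)^T w approximates V(s).

    transitions: list of (state, reward, next_state) samples.
    phi: feature map, state -> 1-D numpy array.
    """
    k = len(phi(transitions[0][0]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        # Accumulate A = sum of phi(s) (phi(s) - gamma * phi(s'))^T
        A += np.outer(f, f - gamma * f_next)
        # Accumulate b = sum of phi(s) * r
        b += f * r
    # Solve the projected Bellman equation A w = b
    return np.linalg.solve(A, b)

# Toy deterministic two-state chain: 0 -> 1 with reward 1, 1 -> 0 with reward 0.
phi = lambda s: np.array([1.0, float(s)])  # affine features, exact for this chain
samples = [(0, 1.0, 1), (1, 0.0, 0)] * 50
w = lstd(samples, phi, gamma=0.9)
# V(0) = phi(0)^T w, V(1) = phi(1)^T w solve V(0) = 1 + 0.9 V(1), V(1) = 0.9 V(0).
```

Because the affine features can represent the value function of this chain exactly, the recovered estimates match the closed-form fixed point V(0) = 1/0.19 and V(1) = 0.9/0.19.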
Related Items
Approximate policy iteration: a survey and some new methods
Potential-based least-squares policy iteration for a parameterized feedback control system
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Batch mode reinforcement learning based on the synthesis of artificial trajectories
A concentration bound for \(\operatorname{LSPE}( \lambda )\)
Reinforcement learning algorithms with function approximation: recent advances and applications
Temporal difference-based policy iteration for optimal control of stochastic systems
Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
Dynamic modeling and control of supply chain systems: A review
Real-time reinforcement learning by sequential actor-critics and experience replay
Proximal algorithms and temporal difference methods for solving fixed point problems
Kernel dynamic policy programming: applicable reinforcement learning to robot systems with high dimensional states
A note on linear function approximation using random projections
A formal framework and extensions for function approximation in learning classifier systems
Projected equation methods for approximate solution of large linear systems
Variance Regularization in Sequential Bayesian Optimization
Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming
Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
Allocating resources via price management systems: a dynamic programming-based approach