Least squares policy evaluation algorithms with linear function approximation
From MaRDI portal
Publication:1870310
DOI: 10.1023/A:1022192903948
zbMath: 1030.93061
MaRDI QID: Q1870310
Dimitri P. Bertsekas, Angelia Nedić
Publication date: 11 May 2003
Published in: Discrete Event Dynamic Systems
Keywords: simulation; martingale; convergence results; temporal difference; stepsize; linear function approximation; least-squares methods; \(\text{LSTD}(\lambda)\) algorithm; discrete-time stationary Markov chain; infinite-horizon dynamic programming; policy evaluation algorithms
Classifications: Discrete-time control/observation systems (93C55); Least squares and related methods for stochastic control systems (93E24); Markov chains (discrete-time Markov processes on discrete state spaces) (60J10); Linearizations (93B18); Optimal stochastic control (93E20)
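To illustrate the topic of the record, here is a minimal sketch of the LSTD(0) special case of the least-squares temporal-difference family referenced in the keywords. It is not the paper's own pseudocode: the function name, the affine feature map, and the toy two-state Markov chain are assumptions chosen for a self-contained example.

```python
import numpy as np

def lstd(transitions, phi, gamma=0.9):
    """LSTD(0): fit weights w so that phi(s)^T w approximates V(s).

    transitions: list of (state, reward, next_state) samples.
    phi: feature map, state -> 1-D numpy array.
    """
    k = len(phi(transitions[0][0]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        # Accumulate A = sum of phi(s) (phi(s) - gamma * phi(s'))^T
        A += np.outer(f, f - gamma * f_next)
        # Accumulate b = sum of phi(s) * r
        b += f * r
    # Solve the projected Bellman equation A w = b
    return np.linalg.solve(A, b)

# Toy deterministic two-state chain: 0 -> 1 with reward 1, 1 -> 0 with reward 0.
phi = lambda s: np.array([1.0, float(s)])  # affine features, exact for this chain
samples = [(0, 1.0, 1), (1, 0.0, 0)] * 50
w = lstd(samples, phi, gamma=0.9)
# V(0) = phi(0)^T w, V(1) = phi(1)^T w solve V(0) = 1 + 0.9 V(1), V(1) = 0.9 V(0).
```

Because the affine features can represent the value function of this chain exactly, the recovered estimates match the closed-form fixed point V(0) = 1/0.19 and V(1) = 0.9/0.19.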
Related Items
Approximate policy iteration: a survey and some new methods
Potential-based least-squares policy iteration for a parameterized feedback control system
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Batch mode reinforcement learning based on the synthesis of artificial trajectories
A concentration bound for \(\operatorname{LSPE}( \lambda )\)
Reinforcement learning algorithms with function approximation: recent advances and applications
Temporal difference-based policy iteration for optimal control of stochastic systems
Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes
Dynamic modeling and control of supply chain systems: A review
Real-time reinforcement learning by sequential actor-critics and experience replay
Proximal algorithms and temporal difference methods for solving fixed point problems
Kernel dynamic policy programming: applicable reinforcement learning to robot systems with high dimensional states
A note on linear function approximation using random projections
A formal framework and extensions for function approximation in learning classifier systems
Projected equation methods for approximate solution of large linear systems
Variance Regularization in Sequential Bayesian Optimization
Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming
Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation
Allocating resources via price management systems: a dynamic programming-based approach