Least squares policy evaluation algorithms with linear function approximation

From MaRDI portal

Publication:1870310

DOI: 10.1023/A:1022192903948; zbMATH: 1030.93061; MaRDI QID: Q1870310

Dimitri P. Bertsekas, Angelia Nedić

Publication date: 11 May 2003

Published in: Discrete Event Dynamic Systems

zbMATH Keywords

simulation; martingale; convergence results; temporal difference; stepsize; linear function approximation; least-square methods; \(\text{LSTD}(\lambda)\) algorithm; discrete-time stationary Markov chain; infinite-horizon dynamic programming; policy evaluation algorithms

Mathematics Subject Classification ID

Discrete-time control/observation systems (93C55); Least squares and related methods for stochastic control systems (93E24); Markov chains (discrete-time Markov processes on discrete state spaces) (60J10); Linearizations (93B18); Optimal stochastic control (93E20)

Related Items (22)

Approximate policy iteration: a survey and some new methods ⋮ Potential-based least-squares policy iteration for a parameterized feedback control system ⋮ An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method ⋮ Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning ⋮ Batch mode reinforcement learning based on the synthesis of artificial trajectories ⋮ A concentration bound for \(\operatorname{LSPE}( \lambda )\) ⋮ Reinforcement learning algorithms with function approximation: recent advances and applications ⋮ Temporal difference-based policy iteration for optimal control of stochastic systems ⋮ Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes ⋮ Dynamic modeling and control of supply chain systems: A review ⋮ Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes ⋮ Real-time reinforcement learning by sequential actor-critics and experience replay ⋮ Proximal algorithms and temporal difference methods for solving fixed point problems ⋮ Kernel dynamic policy programming: applicable reinforcement learning to robot systems with high dimensional states ⋮ A note on linear function approximation using random projections ⋮ A formal framework and extensions for function approximation in learning classifier systems ⋮ Projected equation methods for approximate solution of large linear systems ⋮ Variance Regularization in Sequential Bayesian Optimization ⋮ Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming ⋮ Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation ⋮ Allocating resources via price management systems: a dynamic programming-based approach ⋮ Unnamed Item
