Convergence Results for Some Temporal Difference Methods Based on Least Squares
Publication:4974645
DOI: 10.1109/TAC.2009.2022097 ⋮ zbMath: 1367.93731 ⋮ OpenAlex: W2165418472 ⋮ MaRDI QID: Q4974645
Dimitri P. Bertsekas, Huizhen Yu
Publication date: 8 August 2017
Published in: IEEE Transactions on Automatic Control
Full work available at URL: https://doi.org/10.1109/tac.2009.2022097
Linear regression; mixed models (62J05) ⋮ Least squares and related methods for stochastic control systems (93E24) ⋮ Markov and semi-Markov decision processes (90C40)
Related Items (13)
Approximate policy iteration: a survey and some new methods ⋮ A concentration bound for \(\operatorname{LSPE}( \lambda )\) ⋮ A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system ⋮ Temporal difference-based policy iteration for optimal control of stochastic systems ⋮ Approximate dynamic programming via direct search in the space of value function approximations ⋮ Proximal algorithms and temporal difference methods for solving fixed point problems ⋮ Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling ⋮ Regularized feature selection in reinforcement learning ⋮ Finite-Time Performance of Distributed Temporal-Difference Learning with Linear Function Approximation ⋮ Properties of subgradient projection iteration when applying to linear imaging system ⋮ A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation ⋮ Allocating resources via price management systems: a dynamic programming-based approach ⋮ Approximation of average cost Markov decision processes using empirical distributions and concentration inequalities