Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling
From MaRDI portal
DOI: 10.1007/s10994-020-05912-5
OpenAlex ID: W3118861484
MaRDI QID: Q2051259
Nathaniel Korda, L. A. Prashanth, Rémi Munos
Publication date: 24 November 2021
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/1306.2557
Related Items
- Risk-Sensitive Reinforcement Learning via Policy Gradient Search
- A concentration bound for \(\operatorname{LSPE}(\lambda)\)
- Concentration of Contractive Stochastic Approximation and Reinforcement Learning
Cites Work
- Transport-entropy inequalities and deviation estimates for stochastic approximation schemes
- Concentration bounds for stochastic approximations
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Stochastic approximation. A dynamical systems viewpoint.
- Stochastic approximation methods for constrained and unconstrained systems
- Online Learning as Stochastic Approximation of Regularization Paths: Optimality and Almost-Sure Convergence
- Acceleration of Stochastic Approximation by Averaging
- An analysis of temporal-difference learning with function approximation
- On Actor-Critic Algorithms
- High-Dimensional Statistics
- Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
- DOI: 10.1162/1532443041827907
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- A Stochastic Approximation Method