A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
Publication: 5003727
DOI: 10.1287/opre.2020.2024
zbMath: 1472.90150
arXiv: 1806.02450
OpenAlex: W2963616027
MaRDI QID: Q5003727
Daniel J. Russo, Raghav Singal, Jalaj Bhandari
Publication date: 29 July 2021
Published in: Operations Research
Full work available at URL: https://arxiv.org/abs/1806.02450
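For context, the algorithm analyzed in the paper is TD(0) with linear value-function approximation, where the value of a state is approximated as an inner product of a fixed feature vector with a learned weight vector. Below is a minimal illustrative sketch of that update rule; the random-walk environment, feature map, and step size are assumptions chosen only for demonstration and are not taken from the paper.

```python
import numpy as np

# Sketch of TD(0) with linear value-function approximation:
# V(s) ~ phi(s)^T theta, updated online from a single trajectory.
# The MDP, features, and step size below are illustrative assumptions.

rng = np.random.default_rng(0)

n_states, dim, gamma, alpha = 10, 4, 0.95, 0.05
phi = rng.standard_normal((n_states, dim))         # fixed feature map
phi /= np.linalg.norm(phi, axis=1, keepdims=True)  # unit-norm feature vectors
theta = np.zeros(dim)                               # weight vector to learn

def step(s):
    """Random-walk transition with a reward on reaching the last state."""
    s_next = (s + rng.choice([-1, 1])) % n_states
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

s = 0
for t in range(50_000):
    s_next, r = step(s)
    # TD(0) update: theta <- theta + alpha * delta * phi(s),
    # where delta is the temporal-difference error.
    delta = r + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * delta * phi[s]
    s = s_next

print("learned state values:", np.round(phi @ theta, 3))
```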
Related Items
- Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms
- A concentration bound for \(\operatorname{LSPE}(\lambda)\)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
- Fundamental design principles for reinforcement learning algorithms
- Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation
- Convergence of Recursive Stochastic Algorithms Using Wasserstein Divergence
Cites Work
- General state space Markov chains and MCMC algorithms
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Stochastic optimal control. The discrete time case
- Linear least-squares algorithms for temporal difference learning
- On the worst-case analysis of temporal-difference learning algorithms
- Distributed Policy Evaluation Under Multiple Behavior Strategies
- The Linear Programming Approach to Approximate Dynamic Programming
- Pricing American Options: A Duality Approach
- Robust Stochastic Approximation Approach to Stochastic Programming
- Acceleration of Stochastic Approximation by Averaging
- An analysis of temporal-difference learning with function approximation
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- Optimization Methods for Large-Scale Machine Learning
- Non-convex Optimization for Machine Learning
- On the Averaged Stochastic Approximation for Linear Regression
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Convergence Results for Some Temporal Difference Methods Based on Least Squares
- How Much Does Your Data Exploration Overfit? Controlling Bias via Information Usage