A Small Gain Analysis of Single Timescale Actor Critic
Publication:6042800
DOI: 10.1137/22M1483335, arXiv: 2203.02591, OpenAlex: W4367311942, MaRDI QID: Q6042800, FDO: Q6042800
Authors: Alex Olshevsky, Bahman Gharesifard
Publication date: 4 May 2023
Published in: SIAM Journal on Control and Optimization
Abstract: We consider a version of actor-critic which uses proportional step-sizes and only one critic update with a single sample from the stationary distribution per actor step. We provide an analysis of this method using the small-gain theorem. Specifically, we prove that this method can be used to find a stationary point, and that the resulting sample complexity improves the state of the art for actor-critic methods from $O(\mu^{-4} \epsilon^{-2})$ to $O(\mu^{-2} \epsilon^{-2})$ to find an $\epsilon$-approximate stationary point, where $\mu$ is the condition number associated with the critic.
Full work available at URL: https://arxiv.org/abs/2203.02591
Recommendations
- A convergent online single time scale actor critic algorithm
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Actor Critic Learning: A Near Set Approach
- Reinforcement learning in finite MDPs: PAC analysis
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- TD-regularized actor-critic methods
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
Cites Work
- On Actor-Critic Algorithms
- Actor-Critic–Type Learning Algorithms for Markov Decision Processes
- An analysis of temporal-difference learning with function approximation
- Introduction to nonlinear optimization: theory, algorithms, and applications with MATLAB
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
- Taylor series expansions for stationary Markov chains
- Policy gradient in Lipschitz Markov decision processes
- Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs
- Fundamental design principles for reinforcement learning algorithms
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
- A convergent online single time scale actor critic algorithm
- Title not available
- Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
- Deep Reinforcement Learning: A State-of-the-Art Walkthrough
- A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic