A Small Gain Analysis of Single Timescale Actor Critic
Abstract: We consider a version of actor-critic which uses proportional step-sizes and only one critic update with a single sample from the stationary distribution per actor step. We provide an analysis of this method using the small-gain theorem. Specifically, we prove that this method can be used to find a stationary point, and that the resulting sample complexity improves the state of the art for actor-critic methods to $O(\mu^{-2}\epsilon^{-2})$ to find an $\epsilon$-approximate stationary point, where $\mu$ is the condition number associated with the critic.
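In words, the scheme pairs each policy-gradient (actor) step with exactly one temporal-difference (critic) update computed from a single sampled transition, and runs both updates with proportional step-sizes, so actor and critic share a single timescale. The sketch below illustrates that structure on a toy problem; the random MDP, tabular softmax policy, step-size constants, and on-trajectory sampling (the paper instead draws the critic's sample from the stationary distribution) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a single-timescale actor-critic loop: one TD(0) critic
# update from a single sampled transition per actor step, with actor and
# critic step-sizes kept proportional. Toy MDP and constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# --- hypothetical toy MDP: 3 states, 2 actions, random dynamics and rewards ---
n_states, n_actions = 3, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> dist over s'
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # reward r(s, a)
gamma = 0.95

def softmax_policy(theta, s):
    """Action distribution of a softmax (Gibbs) policy with tabular logits."""
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

theta = np.zeros((n_states, n_actions))  # actor parameters (policy logits)
w = np.zeros(n_states)                   # critic parameters (tabular values)

# Proportional step-sizes: the two rates differ only by a constant factor,
# which is what makes this a single-timescale method.
alpha_critic = 0.05
alpha_actor = 0.5 * alpha_critic

s = rng.integers(n_states)
for t in range(20_000):
    pi = softmax_policy(theta, s)
    a = rng.choice(n_actions, p=pi)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # One critic update per actor step, from this single transition (TD(0)).
    td_error = r + gamma * w[s_next] - w[s]
    w[s] += alpha_critic * td_error

    # Actor update: policy gradient, with the TD error as the advantage signal.
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    theta[s] += alpha_actor * td_error * grad_log_pi

    s = s_next

print("learned values:", np.round(w, 3))
```

The proportional coupling of `alpha_actor` and `alpha_critic` is the point of contrast with classical two-timescale actor-critic, where the critic's step-size must decay on a faster schedule than the actor's.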
Recommendations
- A convergent online single time scale actor critic algorithm
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Actor Critic Learning: A Near Set Approach
- Reinforcement learning in finite MDPs: PAC analysis
- A Simultaneous Perturbation Stochastic Approximation-Based Actor–Critic Algorithm for Markov Decision Processes
- TD-regularized actor-critic methods
- On the sample complexity of actor-critic method for reinforcement learning with function approximation
Cites work
- scientific article; zbMATH DE number 5833176
- A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
- A convergent online single time scale actor critic algorithm
- Achieving Geometric Convergence for Distributed Optimization Over Time-Varying Graphs
- Actor-Critic–Type Learning Algorithms for Markov Decision Processes
- An analysis of temporal-difference learning with function approximation
- Deep Reinforcement Learning: A State-of-the-Art Walkthrough
- Fundamental design principles for reinforcement learning algorithms
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Introduction to nonlinear optimization: theory, algorithms, and applications with MATLAB
- On Actor-Critic Algorithms
- Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
- Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
- Policy gradient in Lipschitz Markov decision processes
- Taylor series expansions for stationary Markov chains
- The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
Cited in (1)