Approximate Newton Policy Gradient Algorithms
From MaRDI portal
Publication:6074547
Abstract: Policy gradient algorithms have been widely applied to Markov decision processes and reinforcement learning problems in recent years. Regularization with various entropy functions is often used to encourage exploration and improve stability. This paper proposes an approximate Newton method for the policy gradient algorithm with entropy regularization. In the case of Shannon entropy, the resulting algorithm reproduces the natural policy gradient algorithm. For other entropy functions, the method yields new policy gradient algorithms. We prove that all of these algorithms enjoy Newton-type quadratic convergence and that the corresponding gradient flow converges globally to the optimal solution. Using synthetic and industrial-scale examples, we demonstrate that the proposed approximate Newton method typically converges in single-digit iterations, often orders of magnitude faster than other state-of-the-art algorithms.
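For the Shannon-entropy case mentioned in the abstract, the connection to natural policy gradient can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it applies the well-known closed-form tabular update of entropy-regularized natural policy gradient, pi_{t+1}(a) ∝ pi_t(a)^(1-eta*tau) * exp(eta*r(a)), to a hypothetical one-state problem (a bandit), where the regularized optimum is the softmax policy pi*(a) ∝ exp(r(a)/tau). The reward vector, step size, and temperature are illustrative choices, not values from the paper.

```python
import numpy as np

# Hypothetical problem data: per-action rewards for a one-state MDP (bandit).
r = np.array([1.0, -0.5, 0.3, 2.0, -1.2])
tau = 0.1   # entropy-regularization temperature
eta = 5.0   # step size; eta <= 1/tau keeps the update a contraction

# Regularized optimum: softmax of r / tau.
target = np.exp(r / tau - (r / tau).max())
target /= target.sum()

# Entropy-regularized natural policy gradient, tabular closed form:
#   log pi_{t+1} = (1 - eta*tau) * log pi_t + eta * r  (up to normalization),
# a contraction in log-space with rate |1 - eta*tau|.
pi = np.full(5, 0.2)  # uniform initial policy
for _ in range(20):
    logits = (1.0 - eta * tau) * np.log(pi) + eta * r
    pi = np.exp(logits - logits.max())  # subtract max for numerical stability
    pi /= pi.sum()

print(np.abs(pi - target).max())  # distance to the regularized optimum
```

With these illustrative values the log-space error contracts by a factor of 0.5 per iteration, so 20 iterations bring the policy very close to the regularized optimum; this linear rate is the Shannon-entropy baseline against which the paper's Newton-type quadratic convergence is an improvement.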
Recommendations
- Approximate Newton methods for policy search in Markov decision processes
- Fast global convergence of natural policy gradient methods with entropy regularization
- On linear and super-linear convergence of natural policy gradient algorithm
- Entropy Regularization for Mean Field Games with Learning
- Natural actor-critic algorithms
Cites work
- scientific article (zbMATH DE number 1667417)
- scientific article (zbMATH DE number 3128787)
- scientific article (zbMATH DE number 3173999)
- scientific article (zbMATH DE number 3790208)
- scientific article (zbMATH DE number 1953444)
- scientific article (zbMATH DE number 2107836)
- scientific article (zbMATH DE number 7306852)
- scientific article (zbMATH DE number 3322635)
- A Characterization of Superlinear Convergence and Its Application to Quasi-Newton Methods
- A comparison of iterative methods for solving nonsymmetric linear systems
- Bi-CGSTAB: A Fast and Smoothly Converging Variant of Bi-CG for the Solution of Nonsymmetric Linear Systems
- Fast global convergence of natural policy gradient methods with entropy regularization
- Hessian informed mirror descent
- Mirror descent algorithms for minimizing interacting free energy
- New results on superlinear convergence of classical quasi-Newton methods
- On Actor-Critic Algorithms
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
- Primal-dual subgradient methods for convex problems
- Rates of superlinear convergence for classical quasi-Newton methods
- Reinforcement learning. An introduction
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Softmax policy gradient methods can take exponential time to converge
- The Information Geometry of Mirror Descent
Cited in (14)
- Geometry and convergence of natural policy gradient methods
- Hessian matrix distribution for Bayesian policy gradient reinforcement learning
- Approximate Newton methods for policy search in Markov decision processes
- A stochastic trust-region framework for policy optimization
- A Class of Decision Processes Showing Policy-Improvement/Newton–Raphson Equivalence
- Entropy Regularization for Mean Field Games with Learning
- Fast global convergence of natural policy gradient methods with entropy regularization
- Global convergence of natural policy gradient with Hessian-aided momentum variance reduction
- On linear and super-linear convergence of natural policy gradient algorithm
- Accelerating Primal-Dual Methods for Regularized Markov Decision Processes
- Block Policy Mirror Descent
- Entropy regularization methods for parameter space exploration
- Compatible natural gradient policy search
- Global convergence of policy gradient methods to (almost) locally optimal policies