Geometry and convergence of natural policy gradient methods
Publication: 6138809
DOI: 10.1007/S41884-023-00106-Z
arXiv: 2211.02105
OpenAlex: W4379185386
MaRDI QID: Q6138809
FDO: Q6138809
Authors:
Publication date: 16 January 2024
Published in: Information Geometry
Abstract: We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions, we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and by Morimura and co-authors, by observing that these arise from the Hessian geometries of conditional entropy and entropy, respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions such as log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates for these methods when the step size equals the penalization strength.
Full work available at URL: https://arxiv.org/abs/2211.02105
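The abstract can be made concrete with a small, self-contained example. In parameter space, a natural policy gradient method updates the policy parameters by

\[ \theta_{k+1} = \theta_k + \eta \, G(\theta_k)^{+} \, \nabla_\theta R(\theta_k), \]

where \(R(\theta)\) is the expected discounted reward of the policy \(\pi_\theta\), \(G(\theta)\) is the Gram matrix of the chosen metric, and \(G^{+}\) is the Moore-Penrose pseudoinverse. For Kakade's metric, \(G(\theta)\) is the Fisher matrix \(\mathbb{E}_{s \sim d^{\pi_\theta}} \mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}[\nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top}]\); according to the abstract, this metric corresponds on state-action frequencies to the Hessian geometry of conditional entropy, while the metric of Morimura and co-authors corresponds to that of entropy. The following Python sketch performs this Kakade-NPG step for a tabular softmax policy on a made-up two-state, two-action MDP. It is a minimal illustration under stated assumptions, not the authors' code: the toy MDP, the step size, and all function names are chosen here for exposition only.

```python
# Illustrative sketch (not from the paper): one natural policy gradient step
# in the sense of Kakade for a tabular softmax policy on a tiny discounted MDP.
# The MDP, step size, and function names below are made up for illustration.
import numpy as np

def softmax_policy(theta):
    """Row-wise softmax: pi[s, a] is proportional to exp(theta[s, a])."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def npg_step(theta, P, r, rho, gamma, eta):
    """One discrete-time NPG update: theta <- theta + eta * pinv(F) @ grad R."""
    S, A = theta.shape
    pi = softmax_policy(theta)

    # Markov chain and expected reward induced by the current policy
    P_pi = np.einsum("sa,sap->sp", pi, P)      # P_pi[s, s'] = sum_a pi(a|s) P(s'|s,a)
    r_pi = (pi * r).sum(axis=1)

    # Policy evaluation: state values V, action values Q, advantages
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * np.einsum("sap,p->sa", P, V)
    adv = Q - V[:, None]

    # Discounted state visitation distribution, normalized by (1 - gamma)
    d = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)

    # Vanilla policy gradient for the softmax parametrization:
    # dR/dtheta[s, a] = d(s) * pi(a|s) * adv(s, a) / (1 - gamma)
    grad = (d[:, None] * pi * adv / (1 - gamma)).reshape(-1)

    # Kakade's Fisher information matrix
    F = np.zeros((S * A, S * A))
    for s in range(S):
        for a in range(A):
            g = np.zeros((S, A))               # gradient of log pi(a|s) w.r.t. theta
            g[s, :] = -pi[s, :]
            g[s, a] += 1.0
            g = g.reshape(-1)
            F += d[s] * pi[s, a] * np.outer(g, g)

    # Natural gradient ascent step using the Moore-Penrose pseudoinverse
    return theta + eta * (np.linalg.pinv(F) @ grad).reshape(S, A)

# Made-up two-state, two-action MDP: transitions P[s, a, s'], rewards r[s, a]
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.0], [0.5, 2.0]])
rho = np.array([0.5, 0.5])                     # initial state distribution

theta = np.zeros((2, 2))
for _ in range(50):
    theta = npg_step(theta, P, r, rho, gamma=0.9, eta=0.1)
print(softmax_policy(theta))                   # policy after 50 NPG steps
```

For this tabular softmax parametrization, the pseudoinverse direction computed above agrees with the advantage function scaled by \(1/(1-\gamma)\), up to per-state constants that leave the softmax policy unchanged; the continuous-time limits of such iterates are the gradient flows whose global convergence the paper analyzes under the different Hessian geometries named in the abstract.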
Recommendations
- Fast global convergence of natural policy gradient methods with entropy regularization
- On linear and super-linear convergence of natural policy gradient algorithm
- Global convergence of natural policy gradient with Hessian-aided momentum variance reduction
- Approximate Newton Policy Gradient Algorithms
- Global convergence of policy gradient methods to (almost) locally optimal policies
Keywords: Markov decision process; Hessian geometry; stochastic policy; natural policy gradient; state-action frequency
Cites Work
- Title not available
- Title not available
- The geometry of Hessian structures
- Inexact Newton Methods
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Finite state Markovian decision processes
- On Actor-Critic Algorithms
- Reinforcement learning. An introduction
- On the Fisher metric of conditional probability polytopes
- Axiomatic Geometry of Conditional Models
- Simulation-based optimization of Markov reward processes
- Hessian Riemannian Gradient Flows in Convex Programming
- Survey of linear programming for standard and nonstandard Markovian control problems. Part I: Theory
- Title not available
- Wasserstein Riemannian geometry of Gaussian densities
- An Extended Cencov Characterization of the Information Metric
- Information geometry and its applications
- Natural gradient via optimal transport
- Information geometry
- Title not available
- Title not available
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Hessian informed mirror descent
- Approximate Newton Policy Gradient Algorithms
- Fast global convergence of natural policy gradient methods with entropy regularization
- Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
- Invariance properties of the natural gradient in overparametrised systems
- Efficient Natural Gradient Descent Methods for Large-Scale PDE-Based Optimization Problems
Cited In (1)