Accelerating Primal-Dual Methods for Regularized Markov Decision Processes
From MaRDI portal
Publication:6202767
MSC classification
- Numerical optimization and variational techniques (65K10)
- Learning and adaptive systems in artificial intelligence (68T05)
- Minimax problems in mathematical programming (90C47)
- Markov and semi-Markov decision processes (90C40)
- Lyapunov and storage functions (93D30)
- Acceleration of convergence in numerical analysis (65B99)
Abstract: Entropy-regularized Markov decision processes have been widely used in reinforcement learning. This paper is concerned with the primal-dual formulation of the entropy-regularized problems. Standard first-order methods suffer from slow convergence due to the lack of strict convexity and concavity. To address this issue, we first introduce a new quadratically convexified primal-dual formulation. The natural gradient ascent-descent method for the new formulation enjoys a global convergence guarantee and an exponential convergence rate. We also propose a new interpolating metric that further accelerates convergence significantly. Numerical results demonstrate the performance of the proposed methods under multiple settings.
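The abstract's setting can be illustrated on a toy problem. The sketch below is not the paper's method (the quadratically convexified formulation and interpolating metric are not reproduced); it shows generic natural-gradient (mirror) ascent-descent on an entropy-regularized bilinear saddle-point problem over probability simplices, where the KL geometry turns the updates into multiplicative weights. The payoff matrix `A`, regularization strength `tau`, and step size `eta` are illustrative choices.

```python
import numpy as np

# Toy entropy-regularized saddle-point problem on the simplex:
#   min_p max_q  p^T A q + tau * sum(p log p) - tau * sum(q log q)
# (Illustrative only; A, tau, eta are not taken from the paper.)
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
tau = 1.0                  # entropy regularization strength
eta = 0.05                 # step size
p = np.full(n, 1.0 / n)    # primal iterate on the simplex
q = np.full(n, 1.0 / n)    # dual iterate on the simplex

for _ in range(1000):
    # Euclidean gradients of the regularized objective
    gp = A @ q + tau * (np.log(p) + 1.0)      # gradient in p (minimize)
    gq = A.T @ p - tau * (np.log(q) + 1.0)    # gradient in q (maximize)
    # Natural-gradient steps under the entropy (KL) geometry reduce to
    # multiplicative-weights updates, which keep iterates on the simplex.
    p = p * np.exp(-eta * gp); p /= p.sum()
    q = q * np.exp(+eta * gq); q /= q.sum()

# First-order optimality of the regularized saddle point gives
# p* ∝ exp(-A q*/tau) and q* ∝ exp(A^T p*/tau); check the fixed point.
p_star = np.exp(-(A @ q) / tau); p_star /= p_star.sum()
q_star = np.exp((A.T @ p) / tau); q_star /= q_star.sum()
print(np.max(np.abs(p - p_star)), np.max(np.abs(q - q_star)))
```

Without the entropy terms, simultaneous gradient ascent-descent on a bilinear game cycles; the regularization makes the problem strongly convex-concave in the KL geometry, which is what yields the linear (exponential-rate) convergence the abstract refers to.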
Recommendations
- Approximate Newton Policy Gradient Algorithms
- A note on optimization formulations of Markov decision processes
- Fast global convergence of natural policy gradient methods with entropy regularization
- Entropy Regularization for Mean Field Games with Learning
- A First-Order Approach to Accelerated Value Iteration
Cites work
- scientific article; zbMATH DE number 432503
- DOI: 10.1162/jmlr.2003.3.4-5.803
- A note on optimization formulations of Markov decision processes
- Approximate Newton Policy Gradient Algorithms
- Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes
- Randomized linear programming solves the Markov decision problem in nearly linear (sometimes sublinear) time
- Reinforcement learning. An introduction
- Simple statistical gradient-following algorithms for connectionist reinforcement learning