Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence
Publication: 6161312
DOI: 10.1137/21m1456789
arXiv: 2105.11066
MaRDI QID: Q6161312
Jason D. Lee, Yuxin Chen, Yuejie Chi, Shicong Cen, Wenhao Zhan, Baihe Huang
Publication date: 27 June 2023
Published in: SIAM Journal on Optimization
Full work available at URL: https://arxiv.org/abs/2105.11066
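The paper concerns policy mirror descent (PMD) for regularized Markov decision processes and its linear convergence. As a rough illustration only (not taken from this record), the sketch below runs the standard entropy-regularized PMD step in closed form, pi_{t+1}(a|s) ∝ pi_t(a|s)^{1/(1+ητ)} exp(η Q_τ(s,a)/(1+ητ)), on a small random tabular MDP; the MDP, step size eta, and regularization weight tau are made-up illustrative values, and exact step-size conventions differ across papers.

```python
import numpy as np

# Minimal sketch of entropy-regularized policy mirror descent on a random
# tabular MDP (illustrative only; not the paper's exact algorithm or constants).
rng = np.random.default_rng(0)
nS, nA, gamma, tau, eta = 5, 3, 0.9, 0.1, 1.0   # assumed illustrative values

P = rng.random((nS, nA, nS))
P /= P.sum(axis=2, keepdims=True)                # transition kernel P(s'|s,a)
r = rng.random((nS, nA))                         # rewards in [0, 1]

def soft_policy_eval(pi, iters=500):
    """Iterate the entropy-regularized Bellman equation to get Q_tau^pi."""
    V = np.zeros(nS)
    for _ in range(iters):
        Q = r + gamma * P @ V                                        # (nS, nA)
        V = (pi * (Q - tau * np.log(pi + 1e-12))).sum(axis=1)        # soft value
    return r + gamma * P @ V

pi = np.full((nS, nA), 1.0 / nA)                 # uniform initialization
for t in range(50):
    Q = soft_policy_eval(pi)
    # KL-proximal (PMD) step: pi_new ∝ pi^{1/(1+eta*tau)} * exp(eta*Q/(1+eta*tau))
    logits = (np.log(pi + 1e-12) + eta * Q) / (1.0 + eta * tau)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)

print("greedy actions of the (approximate) regularized optimum:", pi.argmax(axis=1))
```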
Mathematics Subject Classification: Computational learning theory (68Q32); Analysis of algorithms and problem complexity (68Q25); Nonconvex programming, global optimization (90C26)
Related Items (3)
Approximate Newton Policy Gradient Algorithms ⋮ Softmax policy gradient methods can take exponential time to converge ⋮ Geometry and convergence of natural policy gradient methods
Cites Work
- Primal-dual first-order methods with \(\mathcal{O}(1/\varepsilon)\) iteration-complexity for cone programming
- An optimal randomized incremental gradient method
- Mirror descent and nonlinear projected subgradient methods for convex optimization.
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Possible generalization of Boltzmann-Gibbs statistics.
- On linear and super-linear convergence of natural policy gradient algorithm
- Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes
- Proximal Minimization Methods with Generalized Bregman Functions
- First-Order Methods in Optimization
- Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization
- Policy Optimization for \(\mathcal{H}_2\) Linear Control with \(\mathcal{H}_\infty\) Robustness Guarantee: Implicit Regularization and Global Convergence
- Softmax policy gradient methods can take exponential time to converge