Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence (Q6161312)

scientific article; zbMATH DE number 7702810

Language	Label	Description	Also known as
English	Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence	scientific article; zbMATH DE number 7702810

Statements

instance of

scholarly article

title

Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence (English)

published in

SIAM Journal on Optimization

publication date

27 June 2023

full work available at URL

https://arxiv.org/abs/2105.11066

zbMATH Keywords

policy mirror descent

Bregman divergence

regularization

policy optimization

MaRDI profile type

MaRDI publication profile

cites work

First-Order Methods in Optimization

Mirror descent and nonlinear projected subgradient methods for convex optimization

Q4597712

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

On linear and super-linear convergence of natural policy gradient algorithm

Proximal Minimization Methods with Generalized Bregman Functions

Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes

Primal-dual first-order methods with $\mathcal{O}(1/\varepsilon)$ iteration-complexity for cone programming

An optimal randomized incremental gradient method

Softmax policy gradient methods can take exponential time to converge

Q3967358

Q4315289

Possible generalization of Boltzmann-Gibbs statistics

Simple statistical gradient-following algorithms for connectionist reinforcement learning

Policy Optimization for $\mathcal{H}_2$ Linear Control with $\mathcal{H}_\infty$ Robustness Guarantee: Implicit Regularization and Global Convergence

Sitelinks

Mathematics (1 entry)

mardi Publication:6161312