Policy gradient in Lipschitz Markov decision processes
From MaRDI portal
Recommendations
- An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
- On linear and super-linear convergence of natural policy gradient algorithm
- Lipschitz continuous policy functions for strongly concave optimization problems
- Policy gradient in continuous time
- Lipschitz continuity of value functions in Markovian decision processes
Cites work
- scientific article; zbMATH DE number 3137306
- scientific article; zbMATH DE number 1304480
- scientific article; zbMATH DE number 700091
- scientific article; zbMATH DE number 1507941
- scientific article; zbMATH DE number 1753152
- A Stochastic Approximation Method
- Collective motions of a shell structure
- Line search algorithms with guaranteed sufficient decrease
- Lipschitz continuity of value functions in Markovian decision processes
- Minimization of functions having Lipschitz continuous first partial derivatives
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- Policy search for motor primitives in robotics
- Solving connection and linearization problems within the Askey scheme and its \(q\)-analogue via inversion formulas
- Stochastic optimal control. The discrete time case
Cited in (15)
- On linear and super-linear convergence of natural policy gradient algorithm
- On high-order differentiability of the policy function
- Learning parametric policies and transition probability models of Markov decision processes from data
- A Small Gain Analysis of Single Timescale Actor Critic
- Expected policy gradients for reinforcement learning
- Importance sampling techniques for policy optimization
- scientific article; zbMATH DE number 6982305
- 10.1162/1532443041827907
- Risk-averse optimization of reward-based coherent risk measures
- scientific article; zbMATH DE number 5957492
- Nonconvex policy search using variational inequalities
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Lipschitz continuous policy functions for strongly concave optimization problems
- Smoothing policies and safe policy gradients
- On the sample complexity of actor-critic method for reinforcement learning with function approximation