A stochastic trust-region framework for policy optimization
DOI: 10.4208/JCM.2104-M2021-0007 · OpenAlex: W2990109857 · MaRDI QID: Q5096136
Authors: Mingming Zhao, Yongfeng Li, Zaiwen Wen
Publication date: 15 August 2022
Published in: Journal of Computational Mathematics
Full work available at URL: https://arxiv.org/abs/1911.11640
Recommendations
- Expected policy gradients for reinforcement learning
- Compatible natural gradient policy search
- Approximate Newton Policy Gradient Algorithms
- Stochastic trust-region methods with trust-region radius depending on probabilistic models
- A generalized path integral control approach to reinforcement learning
Keywords: global convergence; deep reinforcement learning; policy optimization; entropy control; stochastic trust region method
MSC classifications: Nonconvex programming, global optimization (90C26); Stochastic programming (90C15); Markov and semi-Markov decision processes (90C40); Optimal stochastic control (93E20)
Cites Work
- Title not available
- Optimization theory and methods. Nonlinear programming
- Stochastic optimization using a trust-region method and random models
- Convergence of trust-region methods based on probabilistic models
- Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies
- Title not available
Cited In (2)