Analysis and improvement of policy gradient estimation
From MaRDI portal
Publication: Q448295
Recommendations
- Efficient sample reuse in policy gradients with parameter-based exploration
- Variance reduction techniques for gradient estimates in reinforcement learning
- Expected policy gradients for reinforcement learning
- Using Gaussian processes for variance reduction in policy gradient algorithms
- Policy gradient in continuous time
Cites work
- Scientific article (zbMATH DE number 1702339; title not available)
- Scientific article (zbMATH DE number 1983522; title not available)
- Scientific article (zbMATH DE number 1753153; title not available)
- Scientific article (zbMATH DE number 194544; title not available)
- DOI 10.1162/1532443041827907 (title not available)
- Approximate gradient methods in policy-space optimization of Markov reward processes
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Using Expectation-Maximization for Reinforcement Learning
- Variance reduction techniques for gradient estimates in reinforcement learning
Cited in (15 documents)
- Estimation and approximation bounds for gradient-based reinforcement learning
- Model-based reinforcement learning with dimension reduction
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- Expected policy gradients for reinforcement learning
- Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning
- Importance sampling techniques for policy optimization
- Policy search for active fault diagnosis with partially observable state
- Scientific article (zbMATH DE number 1753153; title not available)
- A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
- Hessian matrix distribution for Bayesian policy gradient reinforcement learning
- An ODE method to prove the geometric convergence of adaptive stochastic algorithms
- Global convergence of policy gradient methods to (almost) locally optimal policies
- Gradient estimation with simultaneous perturbation and compressive sensing
- Efficient sample reuse in policy gradients with parameter-based exploration
- Smoothing policies and safe policy gradients