Analysis and improvement of policy gradient estimation
From MaRDI portal
DOI: 10.1016/J.NEUNET.2011.09.005
zbMATH Open: 1245.68165
DBLP: journals/nn/ZhaoHNS12
OpenAlex: W2148053762
Wikidata: Q51513131
Scholia: Q51513131
MaRDI QID: Q448295
FDO: Q448295
Authors: Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama
Publication date: 30 August 2012
Published in: Neural Networks
Full work available at URL: https://doi.org/10.1016/j.neunet.2011.09.005
Recommendations
- Efficient sample reuse in policy gradients with parameter-based exploration
- Variance reduction techniques for gradient estimates in reinforcement learning
- Expected policy gradients for reinforcement learning
- Using Gaussian processes for variance reduction in policy gradient algorithms
- Policy gradient in continuous time
Keywords
- variance reduction
- reinforcement learning
- policy gradients
- policy gradients with parameter-based exploration
Cites Work
- Title not available
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- 10.1162/1532443041827907
- Variance reduction techniques for gradient estimates in reinforcement learning
- Approximate gradient methods in policy-space optimization of Markov reward processes
- Using Expectation-Maximization for Reinforcement Learning
- Title not available
- Title not available
- Title not available
Cited In (15)
- A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
- Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning
- Estimation and approximation bounds for gradient-based reinforcement learning
- Hessian matrix distribution for Bayesian policy gradient reinforcement learning
- Gradient estimation with simultaneous perturbation and compressive sensing
- Expected policy gradients for reinforcement learning
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- Importance sampling techniques for policy optimization
- Title not available
- Model-based reinforcement learning with dimension reduction
- Efficient sample reuse in policy gradients with parameter-based exploration
- Smoothing policies and safe policy gradients
- Policy search for active fault diagnosis with partially observable state
- An ODE method to prove the geometric convergence of adaptive stochastic algorithms
- Global convergence of policy gradient methods to (almost) locally optimal policies