Analysis and improvement of policy gradient estimation
Publication: 448295
DOI: 10.1016/j.neunet.2011.09.005 · zbMath: 1245.68165 · OpenAlex: W2148053762 · Wikidata: Q51513131 · Scholia: Q51513131 · MaRDI QID: Q448295
Authors: Tingting Zhao, Gang Niu, Hirotaka Hachiya, Masashi Sugiyama
Publication date: 30 August 2012
Published in: Neural Networks
Full work available at URL: https://doi.org/10.1016/j.neunet.2011.09.005
Keywords: variance reduction; reinforcement learning; policy gradients; policy gradients with parameter-based exploration
Related Items
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
- Smoothing policies and safe policy gradients
- Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
- Model-based reinforcement learning with dimension reduction
- An ODE method to prove the geometric convergence of adaptive stochastic algorithms
Cites Work
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Approximate gradient methods in policy-space optimization of Markov reward processes
- Using Expectation-Maximization for Reinforcement Learning
- DOI: 10.1162/1532443041827907