Efficient sample reuse in policy gradients with parameter-based exploration
DOI: 10.1162/NECO_A_00452 · zbMATH Open: 1414.68090 · arXiv: 1301.3966 · OpenAlex: W2133224499 · Wikidata: Q47904761 · Scholia: Q47904761 · MaRDI QID: Q5378202 · FDO: Q5378202
Authors: Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama
Publication date: 12 June 2019
Published in: Neural Computation
Full work available at URL: https://arxiv.org/abs/1301.3966
Recommendations
- Analysis and improvement of policy gradient estimation
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
- Expected policy gradients for reinforcement learning
- Importance sampling techniques for policy optimization
- Variance reduction techniques for gradient estimates in reinforcement learning
Classification: Learning and adaptive systems in artificial intelligence (68T05); Artificial intelligence for robotics (68T40)
Cites Work
- Improving predictive inference under covariate shift by weighting the log-likelihood function
- Title not available
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Title not available
- Approximate dynamic programming with a fuzzy parameterization
- Reinforcement learning. An introduction
- Variance reduction techniques for gradient estimates in reinforcement learning
- Analysis and improvement of policy gradient estimation
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
Cited In (15)
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
- An Incremental Fast Policy Search Using a Single Sample Path
- Expected policy gradients for reinforcement learning
- Learning under nonstationarity: covariate shift and class-balance change
- Analysis and improvement of policy gradient estimation
- An active exploration method for data efficient reinforcement learning
- Adaptive importance sampling for value function approximation in off-policy reinforcement learning
- Efficient exploration through active learning for value function approximation in reinforcement learning
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- Importance sampling techniques for policy optimization
- Model-based reinforcement learning with dimension reduction
- Recurrent policy gradients
- Policy search for active fault diagnosis with partially observable state
- Reinforcement learning in sparse-reward environments with hindsight policy gradients
- A generalized path integral control approach to reinforcement learning