Efficient sample reuse in policy gradients with parameter-based exploration
DOI: 10.1162/NECO_A_00452
zbMATH Open: 1414.68090
arXiv: 1301.3966
OpenAlex: W2133224499
Wikidata: Q47904761 (Scholia: Q47904761)
MaRDI QID: Q5378202
FDO: Q5378202
Voot Tangkaratt, Tingting Zhao, Masashi Sugiyama, Jun Morimoto, Hirotaka Hachiya
Publication date: 12 June 2019
Published in: Neural Computation
Full work available at URL: https://arxiv.org/abs/1301.3966
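Note: the method studied in this publication is importance-weighted PGPE (policy gradients with parameter-based exploration), in which policy parameters are drawn from a hyperdistribution, deterministic rollouts are scored by their returns, and samples collected under old hyperparameters are reused for the current gradient estimate via importance weighting. The sketch below illustrates that estimator for a diagonal-Gaussian hyperdistribution; the function name, array shapes, and the omission of the paper's variance-reducing baseline are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def iw_pgpe_gradient(thetas, returns, rho_old, rho_new):
    """Importance-weighted PGPE gradient estimate (illustrative sketch).

    thetas  : (N, D) policy parameters sampled from N(eta_old, tau_old**2)
    returns : (N,) episodic returns R(theta_n) from rolling out each sample
    rho_*   : (eta, tau) mean and per-dimension std of the Gaussian
              hyperdistribution p(theta | rho)
    """
    eta_old, tau_old = rho_old
    eta_new, tau_new = rho_new

    # Log-density of a diagonal Gaussian, evaluated row-wise.
    def log_p(t, eta, tau):
        return -0.5 * np.sum(((t - eta) / tau) ** 2
                             + np.log(2.0 * np.pi * tau ** 2), axis=1)

    # Importance weights w_n = p(theta_n | rho_new) / p(theta_n | rho_old),
    # which let samples drawn under rho_old be reused for rho_new.
    w = np.exp(log_p(thetas, eta_new, tau_new) - log_p(thetas, eta_old, tau_old))

    # Score function of the Gaussian hyperdistribution at rho_new.
    diff = thetas - eta_new
    grad_log_eta = diff / tau_new ** 2                        # (N, D)
    grad_log_tau = (diff ** 2 - tau_new ** 2) / tau_new ** 3  # (N, D)

    # Weighted Monte Carlo estimate of grad_rho J(rho_new).
    wr = (w * returns)[:, None]
    return np.mean(wr * grad_log_eta, axis=0), np.mean(wr * grad_log_tau, axis=0)

# Example: reuse N = 100 samples drawn under the old hyperparameters.
rng = np.random.default_rng(0)
eta_old, tau_old = np.zeros(3), np.ones(3)
thetas = rng.normal(eta_old, tau_old, size=(100, 3))
returns = -np.sum(thetas ** 2, axis=1)  # stand-in for rollout returns
g_eta, g_tau = iw_pgpe_gradient(thetas, returns,
                                (eta_old, tau_old),
                                (eta_old + 0.1, tau_old))
```

Because exploration happens in parameter space rather than action space, a single deterministic rollout per sampled theta suffices, and the importance weights depend only on the (known) Gaussian hyperdistributions, not on the environment dynamics.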
Recommendations
- Analysis and improvement of policy gradient estimation
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
- Expected policy gradients for reinforcement learning
- Importance sampling techniques for policy optimization
- Variance reduction techniques for gradient estimates in reinforcement learning
MSC Classification
- Learning and adaptive systems in artificial intelligence (68T05)
- Artificial intelligence for robotics (68T40)
Cites Work
- Improving predictive inference under covariate shift by weighting the log-likelihood function
- Title not available
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Title not available
- Approximate dynamic programming with a fuzzy parameterization
- Title not available
- Title not available
- Analysis and improvement of policy gradient estimation
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
Cited In (6)
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
- An Incremental Fast Policy Search Using a Single Sample Path
- Learning under nonstationarity: covariate shift and class-balance change
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- Model-based reinforcement learning with dimension reduction
- Policy search for active fault diagnosis with partially observable state