Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
From MaRDI portal
Publication: 2887009
DOI: 10.1162/NECO_A_00199
zbMATH Open: 1237.68147
OpenAlex: W1971492381
Wikidata: Q51539172 (Scholia: Q51539172)
MaRDI QID: Q2887009 (FDO: Q2887009)
Hirotaka Hachiya, Masashi Sugiyama, Jan Peters
Publication date: 15 May 2012
Published in: Neural Computation
Full work available at URL: https://doi.org/10.1162/neco_a_00199
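The title refers to reward-weighted regression (RWR), an EM-style direct policy search method in which sampled actions are weighted by their rewards and the policy parameters are re-fit by weighted regression. Below is a minimal, self-contained sketch of this general idea on a toy one-dimensional bandit with a Gaussian policy; the reward function, sample sizes, and iteration count are illustrative assumptions, and the sketch does not reproduce the paper's sample-reuse (importance-weighting) scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(a):
    # Hypothetical toy reward, peaked at a = 2.0.
    return np.exp(-(a - 2.0) ** 2)

mu, sigma = 0.0, 1.0  # Gaussian policy N(mu, sigma^2); sigma kept fixed here
for _ in range(50):
    actions = rng.normal(mu, sigma, size=100)  # E-step: sample from current policy
    r = reward(actions)                        # evaluate rewards
    w = r / r.sum()                            # normalized reward weights
    mu = np.sum(w * actions)                   # M-step: reward-weighted regression update

print(mu)  # converges toward the reward peak near 2.0
```

Each iteration moves the policy mean toward the reward-weighted average of the sampled actions; plain RWR discards past samples after every update, which is the inefficiency the paper's sample-reuse variant addresses.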
Recommendations
- Efficient sample reuse in policy gradients with parameter-based exploration
- Reinforcement learning in sparse-reward environments with hindsight policy gradients
- Recurrent policy gradients
- Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning
- scientific article; zbMATH DE number 1753141
- Expected policy gradients for reinforcement learning
Cites Work
- Improving predictive inference under covariate shift by weighting the log-likelihood function
- Input-dependent estimation of generalization error under covariate shift
- Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression
- Using Expectation-Maximization for Reinforcement Learning
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Adaptive importance sampling for value function approximation in off-policy reinforcement learning
- Efficient exploration through active learning for value function approximation in reinforcement learning
Cited In (4)