Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
From MaRDI portal
Publication:2887009
Recommendations
- Efficient sample reuse in policy gradients with parameter-based exploration
- Reinforcement learning in sparse-reward environments with hindsight policy gradients
- Recurrent policy gradients
- Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning
- scientific article; zbMATH DE number 1753141
- Expected policy gradients for reinforcement learning
Cites work
- Adaptive importance sampling for value function approximation in off-policy reinforcement learning
- Efficient exploration through active learning for value function approximation in reinforcement learning
- Improving predictive inference under covariate shift by weighting the log-likelihood function
- Input-dependent estimation of generalization error under covariate shift
- Real-time reinforcement learning by sequential actor-critics and experience replay
- Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression
- Using Expectation-Maximization for Reinforcement Learning
Cited in (10)
- Learning under nonstationarity: covariate shift and class-balance change
- Efficient exploration through active learning for value function approximation in reinforcement learning
- Autonomous reinforcement learning with experience replay
- Importance sampling techniques for policy optimization
- Policy search for motor primitives in robotics
- Accelerating reinforcement learning with a directional-Gaussian-smoothing evolution strategy
- Efficient sample reuse in policy gradients with parameter-based exploration
- Bayesian policy reuse
- Reinforcement learning in sparse-reward environments with hindsight policy gradients
- 10.1162/jmlr.2003.3.4-5.921