Efficient exploration through active learning for value function approximation in reinforcement learning
DOI: 10.1016/j.neunet.2009.12.010
zbMATH Open: 1396.68086
DBLP: journals/nn/AkiyamaHS10
OpenAlex: W2160095661
Wikidata: Q48234774 (Scholia: Q48234774)
MaRDI QID: Q1784573
FDO: Q1784573
Authors: Takayuki Akiyama, Hirotaka Hachiya, Masashi Sugiyama
Publication date: 27 September 2018
Published in: Neural Networks
Full work available at URL: https://doi.org/10.1016/j.neunet.2009.12.010
Recommendations
- 10.1162/1532443041827907
- An active exploration method for data efficient reinforcement learning
- Rollout sampling approximate policy iteration
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
- Efficient sample reuse in policy gradients with parameter-based exploration
Keywords: Markov decision process; reinforcement learning; least-squares policy iteration; active learning; batting robot
MSC classification:
- Linear regression; mixed models (62J05)
- Learning and adaptive systems in artificial intelligence (68T05)
- Algorithms for approximation of functions (65D15)
- Artificial intelligence for robotics (68T40)
Cites Work
- 10.1162/153244303765208377
- Regularization algorithms for learning that are equivalent to multilayer networks
- Pattern recognition and machine learning
- Ridge Regression: Biased Estimation for Nonorthogonal Problems
- Title not available
- Title not available
- 10.1162/1532443041827907
- Near-optimal reinforcement learning in polynomial time
- Active learning algorithm using the maximum weighted log-likelihood estimator
- Title not available
- Adaptive importance sampling for value function approximation in off-policy reinforcement learning
- Pool-based active learning in approximate linear regression
- Robust weights and designs for biased regression models: Least squares and generalized \(M\)-estimation
Cited In (7)
- Direct density-ratio estimation with dimensionality reduction via least-squares hetero-distributional subspace search
- Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
- A parallel scheduling algorithm for reinforcement learning in large state space
- Learning under nonstationarity: covariate shift and class-balance change
- Model-based policy gradients with parameter-based exploration by least-squares conditional density estimation
- Improving importance estimation in pool-based batch active learning for approximate linear regression
- Active policy learning for robot planning and exploration under uncertainty