scientific article; zbMATH DE number 6982305
From MaRDI portal
Publication:4558153
Recommendations
- Using trajectory data to improve Bayesian optimization for reinforcement learning
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- A stochastic trust-region framework for policy optimization
- Policy gradient in continuous time
- On the policy improvement algorithm in continuous time
- Safe policy iteration: a monotonically improving approximate policy iteration approach
Cites work
- scientific article; zbMATH DE number 1095138
- scientific article; zbMATH DE number 6982305
- A generalized path integral control approach to reinforcement learning
- Algorithms for reinforcement learning
- Hierarchical relative entropy policy search
- Model-based contextual policy search for data-efficient generalization of robot skills
- Policy gradient in Lipschitz Markov decision processes
Cited in (7)
- TD-regularized actor-critic methods
- Experiments with Tractable Feedback in Robotic Planning Under Uncertainty: Insights over a Wide Range of Noise Regimes
- Expected policy gradients for reinforcement learning
- scientific article; zbMATH DE number 7307467
- scientific article; zbMATH DE number 6982305
- Optimistic reinforcement learning by forward Kullback-Leibler divergence optimization
- Compatible natural gradient policy search