Scientific article; zbMATH DE number 6982305
From MaRDI portal, Publication Q4558153
zbMATH Open: 1437.68147 · arXiv: 1606.09197 · MaRDI QID: Q4558153 · FDO: Q4558153
Authors: Riad Akrour, A. Abdolmaleki, Hany Abdulsamad, Jan Peters, Gerhard Neumann
Publication date: 21 November 2018
Full work available at URL: https://arxiv.org/abs/1606.09197
Title of this publication is not available
Recommendations
- Using trajectory data to improve Bayesian optimization for reinforcement learning
- An incremental off-policy search in a model-free Markov decision process using a single sample path
- A stochastic trust-region framework for policy optimization
- Policy gradient in continuous time
- On the policy improvement algorithm in continuous time
- Safe policy iteration: a monotonically improving approximate policy iteration approach
Classification (MSC)
- 62B10 Statistical aspects of information-theoretic topics
- 68T05 Learning and adaptive systems in artificial intelligence
- 90C40 Markov and semi-Markov decision processes
Cites Work
- Title not available
- A generalized path integral control approach to reinforcement learning
- Algorithms for reinforcement learning.
- Hierarchical relative entropy policy search
- Policy gradient in Lipschitz Markov decision processes
- Model-based contextual policy search for data-efficient generalization of robot skills
- Title not available
Cited In (7)
- TD-regularized actor-critic methods
- Experiments with Tractable Feedback in Robotic Planning Under Uncertainty: Insights over a Wide Range of Noise Regimes
- Expected policy gradients for reinforcement learning
- Title not available
- Title not available
- Optimistic reinforcement learning by forward Kullback-Leibler divergence optimization
- Compatible natural gradient policy search
Uses Software