Batch mode reinforcement learning based on the synthesis of artificial trajectories
DOI: 10.1007/S10479-012-1248-5, zbMATH Open: 1276.68134, OpenAlex: W2134689794, Wikidata: Q42258641, Scholia: Q42258641, MaRDI QID: Q378762, FDO: Q378762
Authors: R. Fonteneau, Louis Wehenkel, D. Ernst, Susan A. Murphy
Publication date: 12 November 2013
Published in: Annals of Operations Research
Full work available at URL: http://europepmc.org/articles/pmc3773886
Recommendations
- Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
- scientific article; zbMATH DE number 5957269
- Machine Learning: ECML 2004
- Iteratively extending time horizon reinforcement learning.
- Min max generalization for deterministic batch mode reinforcement learning: relaxation schemes
Learning and adaptive systems in artificial intelligence (68T05)
Stochastic learning and adaptive control (93E35)
Cites Work
- Title not available
- A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect
- Marginal Mean Models for Dynamic Regimes
- Optimal Dynamic Treatment Regimes
- Least squares policy evaluation algorithms with linear function approximation
- 10.1162/1532443041827907
- Linear least-squares algorithms for temporal difference learning
- Kernel-based reinforcement learning
- Finite-time bounds for fitted value iteration
- Title not available
- Technical update: Least-squares temporal difference learning
- Towards min max generalization in reinforcement learning
- Finite-sample analysis of least-squares policy iteration
- Iteratively extending time horizon reinforcement learning.
Cited In (4)