Batch mode reinforcement learning based on the synthesis of artificial trajectories
Recommendations
- Learning Near-Optimal Policies with Bellman-Residual Minimization Based Fitted Policy Iteration and a Single Sample Path
- scientific article; zbMATH DE number 5957269
- Machine Learning: ECML 2004
- Iteratively extending time horizon reinforcement learning.
- Min max generalization for deterministic batch mode reinforcement learning: relaxation schemes
Cites work
- scientific article; zbMATH DE number 5957269
- scientific article; zbMATH DE number 3126094
- 10.1162/1532443041827907
- A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect
- Finite-sample analysis of least-squares policy iteration
- Finite-time bounds for fitted value iteration
- Iteratively extending time horizon reinforcement learning.
- Kernel-based reinforcement learning
- Least squares policy evaluation algorithms with linear function approximation
- Linear least-squares algorithms for temporal difference learning
- Marginal Mean Models for Dynamic Regimes
- Optimal Dynamic Treatment Regimes
- Technical update: Least-squares temporal difference learning
- Towards min max generalization in reinforcement learning
Cited in (4)
MaRDI item Q378762