Batch mode reinforcement learning based on the synthesis of artificial trajectories (Q378762): Difference between revisions

@@ Property / cites work @@
+Q3241581
@@ Property / cites work: Q3241581 / rank @@
+Normal rank
@@ Property / cites work @@
+Technical update: Least-squares temporal difference learning
@@ Property / cites work: Technical update: Least-squares temporal difference learning / rank @@
+Normal rank
@@ Property / cites work @@
+Q5477859
@@ Property / cites work: Q5477859 / rank @@
+Normal rank
@@ Property / cites work @@
+Machine Learning: ECML 2003
@@ Property / cites work: Machine Learning: ECML 2003 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3093261
@@ Property / cites work: Q3093261 / rank @@
+Normal rank
@@ Property / cites work @@
+Towards Min Max Generalization in Reinforcement Learning
@@ Property / cites work: Towards Min Max Generalization in Reinforcement Learning / rank @@
+Normal rank
@@ Property / cites work @@
+10.1162/1532443041827907
@@ Property / cites work: 10.1162/1532443041827907 / rank @@
+Normal rank
@@ Property / cites work @@
+Q5405216
@@ Property / cites work: Q5405216 / rank @@
+Normal rank
@@ Property / cites work @@
+Q3096132
@@ Property / cites work: Q3096132 / rank @@
+Normal rank
@@ Property / cites work @@
+Optimal Dynamic Treatment Regimes
@@ Property / cites work: Optimal Dynamic Treatment Regimes / rank @@
+Normal rank
@@ Property / cites work @@
+Marginal Mean Models for Dynamic Regimes
@@ Property / cites work: Marginal Mean Models for Dynamic Regimes / rank @@
+Normal rank
@@ Property / cites work @@
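+Least squares policy evaluation algorithms with linear function approximation
@@ Property / cites work: Least squares policy evaluation algorithms with linear function approximation / rank @@
+Normal rank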
@@ Property / cites work @@
+Kernel-based reinforcement learning
@@ Property / cites work: Kernel-based reinforcement learning / rank @@
+Normal rank
@@ Property / cites work @@
+A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect
+Normal rank