Batch policy learning in average reward Markov decision processes (Q2112817): Difference between revisions

@@ Property / author @@
-Susan A. Murphy
@@ Property / author: Susan A. Murphy / rank @@
-Normal rank
@@ Property / author @@
+Susan A. Murphy
@@ Property / author: Susan A. Murphy / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+L-BFGS
@@ Property / describes a project that uses: L-BFGS / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+Spearmint
@@ Property / describes a project that uses: Spearmint / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / cites work @@
+Learning Algorithms for Markov Decision Processes with Average Cost
+Normal rank
@@ Property / cites work @@
+Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
+Normal rank
@@ Property / cites work @@
+Q4277836
@@ Property / cites work: Q4277836 / rank @@
+Normal rank
@@ Property / cites work @@
+Double/debiased machine learning for treatment and structural parameters
+Normal rank
@@ Property / cites work @@
+Doubly robust policy evaluation and optimization
@@ Property / cites work: Doubly robust policy evaluation and optimization / rank @@
+Normal rank
@@ Property / cites work @@
+Q3093261
@@ Property / cites work: Q3093261 / rank @@
+Normal rank
@@ Property / cites work @@
+Constructing dynamic treatment regimes over indefinite time horizons
+Normal rank
@@ Property / cites work @@
+Model selection in reinforcement learning
@@ Property / cites work: Model selection in reinforcement learning / rank @@
+Normal rank
@@ Property / cites work @@
+Q2834459
@@ Property / cites work: Q2834459 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4255598
@@ Property / cites work: Q4255598 / rank @@
+Normal rank
@@ Property / cites work @@
+Q5148951
@@ Property / cites work: Q5148951 / rank @@
+Normal rank
@@ Property / cites work @@
+Dynamic treatment regimes: technical challenges and applications
+Normal rank
@@ Property / cites work @@
+10.1162/1532443041827907
@@ Property / cites work: 10.1162/1532443041827907 / rank @@
+Normal rank
@@ Property / cites work @@
+Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health
+Normal rank
@@ Property / cites work @@
+On the limited memory BFGS method for large scale optimization
+Normal rank
@@ Property / cites work @@
+Statistical consistency and asymptotic normality for high-dimensional robust \(M\)-estimators
+Normal rank
@@ Property / cites work @@
+Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning
+Normal rank
@@ Property / cites work @@
+Q5477863
@@ Property / cites work: Q5477863 / rank @@
+Normal rank
@@ Property / cites work @@
+The landscape of empirical risk for nonconvex losses
+Normal rank
@@ Property / cites work @@
+Q3096132
@@ Property / cites work: Q3096132 / rank @@
+Normal rank
@@ Property / cites work @@
+Marginal Mean Models for Dynamic Regimes
@@ Property / cites work: Marginal Mean Models for Dynamic Regimes / rank @@
+Normal rank
@@ Property / cites work @@
+Semiparametric efficiency bounds
@@ Property / cites work: Semiparametric efficiency bounds / rank @@
+Normal rank
@@ Property / cites work @@
+Kernel-based reinforcement learning
@@ Property / cites work: Kernel-based reinforcement learning / rank @@
+Normal rank
@@ Property / cites work @@
+Q4315289
@@ Property / cites work: Q4315289 / rank @@
+Normal rank
@@ Property / cites work @@
+Estimation of Regression Coefficients When Some Regressors Are Not Always Observed
+Normal rank
@@ Property / cites work @@
+Support Vector Machines
@@ Property / cites work: Support Vector Machines / rank @@
+Normal rank
@@ Property / cites work @@
+Q4626283
@@ Property / cites work: Q4626283 / rank @@
+Normal rank
@@ Property / cites work @@
+Asymptotic Statistics
@@ Property / cites work: Asymptotic Statistics / rank @@
+Normal rank
@@ Property / cites work @@
+Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes
+Normal rank
@@ Property / cites work @@
+A Robust Method for Estimating Optimal Treatment Regimes
+Normal rank
@@ Property / cites work @@
+Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
+Normal rank
@@ Property / cites work @@
+New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes
+Normal rank
@@ Property / cites work @@
+Q4633064
@@ Property / cites work: Q4633064 / rank @@
+Normal rank