Batch policy learning in average reward Markov decision processes (Q2112817)

From MaRDI portal

scientific article

Language	Label	Description	Also known as
English	Batch policy learning in average reward Markov decision processes	scientific article

Statements

scholarly article

Batch policy learning in average reward Markov decision processes (English)

10.1214/22-AOS2231

Predrag Klasnja

Susan A. Murphy

The Annals of Statistics

publication date

12 January 2023

full work available at URL

https://arxiv.org/abs/2007.11771

https://projecteuclid.org/journals/annals-of-statistics/volume-50/issue-6/Batch-policy-learning-in-average-reward-Markov-decision-processes/10.1214/22-AOS2231.full

zbMATH Keywords

Markov decision process

average reward

policy optimization

doubly robust estimator

MaRDI profile type

MaRDI publication profile

Learning Algorithms for Markov Decision Processes with Average Cost

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Double/debiased machine learning for treatment and structural parameters

Doubly robust policy evaluation and optimization

Constructing dynamic treatment regimes over indefinite time horizons

Model selection in reinforcement learning

Dynamic treatment regimes: technical challenges and applications

10.1162/1532443041827907

Off-Policy Estimation of Long-Term Average Outcomes With Applications to Mobile Health

On the limited memory BFGS method for large scale optimization

Statistical consistency and asymptotic normality for high-dimensional robust \(M\)-estimators

Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning

The landscape of empirical risk for nonconvex losses

Marginal Mean Models for Dynamic Regimes

Semiparametric efficiency bounds

Kernel-based reinforcement learning

Estimation of Regression Coefficients When Some Regressors Are Not Always Observed

Support Vector Machines

Asymptotic Statistics

Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes

A Robust Method for Estimating Optimal Treatment Regimes

Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions

New Statistical Learning Methods for Estimating Optimal Dynamic Treatment Regimes
