Double reinforcement learning for efficient off-policy evaluation in Markov decision processes
From MaRDI portal
Publication:5148951
Authors: Nathan Kallus, Masatoshi Uehara
Publication date: 5 February 2021
Full work available at URL: https://arxiv.org/abs/1908.08526
Recommendations
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
- Proximal reinforcement learning: efficient off-policy evaluation in partially observed Markov decision processes
- Reliable off-policy evaluation for reinforcement learning
- Projected state-action balancing weights for offline reinforcement learning
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
Cites Work
- Large Sample Properties of Generalized Method of Moments Estimators
- Introduction to empirical processes and semiparametric inference
- Asymptotic Statistics
- Nonparametric econometrics. Theory and practice.
- Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models
- Double/debiased machine learning for treatment and structural parameters
- Efficient estimation of panel data models with sequential moment restrictions
- Unified methods for censored longitudinal data and causality
- Semiparametric theory and missing data.
- A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect
- Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score
- Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions
- Title not available
- On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects
- The semiparametric efficiency bound for models of sequential moment restrictions containing unknown functions
- On methods of sieves and penalization
- The use of polynomial splines and their tensor products in multivariate function estimation. (With discussion)
- Local Rademacher complexities
- Marginal Mean Models for Dynamic Regimes
- Optimal Dynamic Treatment Regimes
- Title not available
- Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
- Doubly robust policy evaluation and optimization
- Title not available
- On differentiable functionals
- Statistical methods for dynamic treatment regimes. Reinforcement learning, causal inference, and personalized medicine
- doi:10.1162/1532443041827907
- Reinforcement learning. An introduction
- A matrix extension of the Cauchy-Schwarz inequality
- Constructing dynamic treatment regimes over indefinite time horizons
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Bias and variance approximation in value function estimates
- Robust inference on the average treatment effect using the outcome highly adaptive Lasso
- Consistent estimation of the influence function of locally asymptotically linear estimators
- Estimating dynamic treatment regimes in mobile health using V-learning
- Title not available
- \(\text{Q}(\lambda)\) with off-policy corrections
Cited In (12)
- Batch policy learning in average reward Markov decision processes
- Toward theoretical understandings of robust Markov decision processes: sample complexity and asymptotics
- Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
- Reliable off-policy evaluation for reinforcement learning
- Proximal reinforcement learning: efficient off-policy evaluation in partially observed Markov decision processes
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
- Projected state-action balancing weights for offline reinforcement learning
- Title not available
- Off-policy evaluation for tabular reinforcement learning with synthetic trajectories
- Settling the sample complexity of model-based offline reinforcement learning
- Optimal policy evaluation using kernel-based temporal difference methods
- Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning