Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
From MaRDI portal
Publication: 5060503
DOI: 10.1287/opre.2021.2249 · OpenAlex: W2994709386 · MaRDI QID: Q5060503
Nathan Kallus, Masatoshi Uehara
Publication date: 10 January 2023
Published in: Operations Research
Full work available at URL: https://arxiv.org/abs/1909.05850
Related Items
- A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
- Projected state-action balancing weights for offline reinforcement learning
- Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
Uses Software
Cites Work
- Doubly robust policy evaluation and optimization
- Comment: Understanding OR, PS and DR
- On the Markov chain central limit theorem
- Basic properties of strong mixing conditions. A survey and some open questions
- Consistent estimation of the influence function of locally asymptotically linear estimators
- Efficient estimation of panel data models with sequential moment restrictions
- Least squares policy evaluation algorithms with linear function approximation
- Introduction to empirical processes and semiparametric inference
- Semiparametric theory and missing data
- Irregular Identification, Support Conditions, and Inverse Weight Estimation
- On Generalized Bellman Equations and Temporal-Difference Learning
- Semiparametric efficiency bounds
- Markov Chains and Stochastic Stability
- Asymptotic Statistics
- Estimation of Regression Coefficients When Some Regressors Are Not Always Observed
- Marginal Mean Models for Dynamic Regimes
- Sieve Extremum Estimates for Weakly Dependent Data
- Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models
- Optimal Dynamic Treatment Regimes
- DOI: 10.1162/1532443041827907
- Least Squares Temporal Difference Methods: An Analysis under General Conditions
- Double/debiased machine learning for treatment and structural parameters
- Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning
- Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score
- Characterization of parameters with a mixed bias property