Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Publication: 5060503
DOI: 10.1287/OPRE.2021.2249
OpenAlex: W2994709386
MaRDI QID: Q5060503
Authors: Nathan Kallus, Masatoshi Uehara
Publication date: 10 January 2023
Published in: Operations Research
Full work available at URL: https://arxiv.org/abs/1909.05850
Recommendations
- Double reinforcement learning for efficient off-policy evaluation in Markov decision processes
- Doubly robust policy evaluation and optimization
- scientific article; zbMATH DE number 1753153
- Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning
- Policy learning for time-bounded reachability in continuous-time Markov decision processes via doubly-stochastic gradient ascent
- An emphatic approach to the problem of off-policy temporal-difference learning
- scientific article; zbMATH DE number 1753152
- Off-policy linear temporal difference learning algorithms with a generalized oblique projection
- Reinforcement learning in sparse-reward environments with hindsight policy gradients
Cites Work
- Introduction to empirical processes and semiparametric inference
- Asymptotic Statistics
- Markov Chains and Stochastic Stability
- Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models
- Double/debiased machine learning for treatment and structural parameters
- Efficient estimation of panel data models with sequential moment restrictions
- Semiparametric theory and missing data
- Semiparametric efficiency bounds
- Estimation of Regression Coefficients When Some Regressors Are Not Always Observed
- Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score
- Title not available
- Title not available
- On the Markov chain central limit theorem
- Basic properties of strong mixing conditions. A survey and some open questions
- Irregular identification, support conditions, and inverse weight estimation
- Marginal Mean Models for Dynamic Regimes
- Comment: Understanding OR, PS and DR
- Sieve Extremum Estimates for Weakly Dependent Data
- Optimal Dynamic Treatment Regimes
- Doubly robust policy evaluation and optimization
- Least squares policy evaluation algorithms with linear function approximation
- DOI: 10.1162/1532443041827907
- Reinforcement learning. An introduction
- Dynamic programming and optimal control. Vol. 2
- Generalized TD learning
- Consistent estimation of the influence function of locally asymptotically linear estimators
- Least squares temporal difference methods: An analysis under general conditions
- Estimating dynamic treatment regimes in mobile health using V-learning
- Title not available
- Characterization of parameters with a mixed bias property
Cited In (9)
- Predicting and optimizing marketing performance in dynamic markets
- A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets
- Reliable off-policy evaluation for reinforcement learning
- Proximal reinforcement learning: efficient off-policy evaluation in partially observed Markov decision processes
- Off-policy evaluation in partially observed Markov decision processes under sequential ignorability
- Projected state-action balancing weights for offline reinforcement learning
- Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning
- Off-policy evaluation for tabular reinforcement learning with synthetic trajectories
- Deep spectral Q-learning with application to mobile health