Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
From MaRDI portal
Publication:6153987
Abstract: We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed for online settings, where data are easy to collect or simulate. How well they generalize to mobile health applications with a pre-collected offline dataset remains unknown. The aim of this paper is to develop a novel advantage learning framework that uses pre-collected data efficiently for policy optimization. The proposed method takes as input an optimal Q-estimator computed by any existing state-of-the-art RL algorithm and outputs a new policy whose value is guaranteed to converge at a faster rate than that of the policy derived from the initial Q-estimator. Extensive numerical experiments are conducted to back up our theoretical findings. A Python implementation of our proposed method is available at https://github.com/leyuanheart/SEAL.
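The input-output contract described in the abstract, a Q-estimator in and a greedy policy out, can be sketched for the tabular case as follows. This is a minimal illustration, not the paper's SEAL method; the function name, array shapes, and values are hypothetical:

```python
import numpy as np

def greedy_policy_from_q(q_table):
    """Derive a greedy policy from a tabular Q-estimator: pi(s) = argmax_a Q(s, a).

    q_table: array of shape (n_states, n_actions) holding estimated Q-values.
    Returns an array of shape (n_states,) with the greedy action per state.
    """
    return np.argmax(q_table, axis=1)

# Hypothetical Q-estimates for 3 states and 2 actions (illustrative values only).
q_hat = np.array([[0.2, 0.8],
                  [0.5, 0.1],
                  [0.3, 0.9]])

policy = greedy_policy_from_q(q_hat)
print(policy)  # greedy action in each state
```

The paper's contribution is a refinement of this step: rather than acting greedily on the initial Q-estimator directly, it re-estimates the advantage function from the offline data so that the resulting policy's value converges at a faster rate.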
Cites work
- scientific article; zbMATH DE number 700091
- scientific article; zbMATH DE number 7306868
- scientific article; zbMATH DE number 7307471
- A simple method for estimating interactions between a treatment and a large number of covariates
- Basic properties of strong mixing conditions. A survey and some open questions
- Constructing dynamic treatment regimes over indefinite time horizons
- Double/debiased machine learning for treatment and structural parameters
- Doubly Robust Estimation in Missing Data and Causal Inference Models
- Doubly-robust dynamic treatment regimen estimation via weighted least squares
- Estimating dynamic treatment regimes in mobile health using V-learning
- Fast learning rates for plug-in classifiers
- Greedy outcome weighted tree learning of optimal personalized treatment rules
- High-dimensional \(A\)-learning for optimal dynamic treatment regimes
- Inference for non-regular parameters in optimal dynamic treatment regimes
- Interpretable dynamic treatment regimes
- Learning when-to-treat policies
- Mathematical Foundations of Infinite-Dimensional Statistical Models
- Maximin projection learning for optimal treatment decision with heterogeneous individualized treatment effects
- Multi-Armed Angle-Based Direct Learning for Estimating Optimal Individualized Treatment Rules With Various Outcomes
- New statistical learning methods for estimating optimal dynamic treatment regimes
- Optimal Dynamic Treatment Regimes
- Optimal Structural Nested Models for Optimal Sequential Decisions
- Optimal aggregation of classifiers in statistical learning.
- Optimal global rates of convergence for nonparametric regression
- Penalized Q-learning for dynamic treatment regimens
- Performance guarantees for individualized treatment rules
- Personalized Policy Learning Using Longitudinal Mobile Health Data
- Quantile-optimal treatment regimes
- Reinforcement learning. An introduction
- Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
- Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy
- \(Q\)- and \(A\)-learning methods for estimating optimal dynamic treatment regimes
- \({\mathcal Q}\)-learning