Q-learning with censored data

DOI: 10.1214/12-AOS968; MaRDI QID: Q450048

Publication date 3 September 2012

Published in The Annals of Statistics

Full work available at URL https://arxiv.org/abs/1205.6659, https://projecteuclid.org/euclid.aos/1336396182

Keywords: survival analysis; generalization error; reinforcement learning

Mathematics Subject Classification: Censored data models (62N01), Applications of statistics to biology and medical sciences; meta analysis (62P10), Medical applications (general) (92C50)

Abstract: We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications for the design of personalized medicine trials in cancer and in other life-threatening diseases.

Cited in (35)
