Inference for the Case Probability in High-dimensional Logistic Regression
Publication:66196
DOI: 10.48550/ARXIV.2012.07133, arXiv: 2012.07133, MaRDI QID: Q66196, FDO: Q66196
Authors: Zijian Guo, Prabrisha Rakshit, Daniel S. Herman, Jinbo Chen
Publication date: 13 December 2020
Abstract: Labeling patients in electronic health records with respect to their status of having a disease or condition, i.e., case or control status, has increasingly relied on prediction models using high-dimensional variables derived from structured and unstructured electronic health record data. A major current hurdle is the lack of valid statistical inference methods for the case probability. In this paper, considering high-dimensional sparse logistic regression models for prediction, we propose a novel bias-corrected estimator for the case probability through the development of linearization and variance enhancement techniques. We establish asymptotic normality of the proposed estimator for any loading vector in high dimensions. We construct a confidence interval for the case probability and propose a hypothesis testing procedure for patient case-control labeling. We demonstrate the proposed method via extensive simulation studies and application to real-world electronic health record data.
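As a rough illustration of the last two steps named in the abstract (a confidence interval for the case probability and a case-control labeling test), the following minimal sketch shows how an asymptotically normal estimate of the linear predictor could be mapped to the probability scale. It does not implement the paper's bias-corrected estimator: the inputs `linear_estimate` and `std_error` are assumed to be supplied by some such procedure, and the function names and the 0.5 labeling threshold are hypothetical choices made here for illustration only.

```python
import math
from statistics import NormalDist


def expit(t):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def case_probability_interval(linear_estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) interval for the case probability.

    Assumes `linear_estimate` is an approximately normal estimate of the
    linear predictor with standard error `std_error` (both hypothetical
    inputs). Because expit is monotone, the interval endpoints on the
    linear scale can be transformed directly.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = expit(linear_estimate - z * std_error)
    upper = expit(linear_estimate + z * std_error)
    return lower, upper


def label_as_case(linear_estimate, std_error, alpha=0.05):
    """One-sided test of H0: case probability <= 0.5 (assumed threshold).

    A probability of 0.5 corresponds to a linear predictor of 0, so the
    patient is labeled a case when the lower one-sided bound exceeds 0.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    return linear_estimate - z * std_error > 0


if __name__ == "__main__":
    low, high = case_probability_interval(1.2, 0.4)
    print(f"95% interval for the case probability: ({low:.3f}, {high:.3f})")
    print("Label as case:", label_as_case(1.2, 0.4))
```

Transforming the endpoints rather than applying a delta-method interval on the probability scale keeps the interval inside (0, 1); whether this matches the construction used in the paper should be checked against the paper itself.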
Cited In (1)