Improving estimation efficiency for two-phase, outcome-dependent sampling studies
Publication:6158213
DOI: 10.1214/23-EJS2124
arXiv: 2212.09817
OpenAlex: W4362648691
MaRDI QID: Q6158213
FDO: Q6158213
Authors: Menglu Che, Peisong Han, Jerald F. Lawless
Publication date: 31 May 2023
Published in: Electronic Journal of Statistics
Abstract: Two-phase outcome dependent sampling (ODS) is widely used in many fields, especially when certain covariates are expensive and/or difficult to measure. For two-phase ODS, the conditional maximum likelihood (CML) method is very attractive because it can handle zero Phase 2 selection probabilities and avoids modeling the covariate distribution. However, most existing CML-based methods use only the Phase 2 sample and thus may be less efficient than other methods. We propose a general empirical likelihood method that uses CML augmented with additional information in the whole Phase 1 sample to improve estimation efficiency. The proposed method maintains the ability to handle zero selection probabilities and avoids modeling the covariate distribution, but can lead to substantial efficiency gains over CML in the inexpensive covariates, or in the influential covariate when a surrogate is available, because of an effective use of the Phase 1 data. Simulations and a real data illustration using NHANES data are presented.
Full work available at URL: https://arxiv.org/abs/2212.09817
Keywords: conditional likelihood; missing at random; empirical likelihood; two-phase study; surrogate covariate; expensive covariate
Cites Work
- A Generalization of Sampling Without Replacement From a Finite Universe
- Empirical likelihood and general estimating equations
- Empirical likelihood
- Estimation of Regression Coefficients When Some Regressors Are Not Always Observed
- A semiparametric empirical likelihood method for data from an outcome-dependent sampling scheme with a continuous outcome
- A Pseudoscore Estimator for Regression Problems With Two-Phase Sampling
- An Estimated Likelihood Method for Continuous Outcome Regression Models With Outcome-Dependent Sampling
- Improving the Efficiency of Relative-Risk Estimation in Case-Cohort Studies
- Logistic regression for two-stage case-control data
- Title not available
- Case-control studies
- More efficient estimators for case-cohort studies
- A mean score method for missing and auxiliary covariate data in regression models
- Fitting regression models to case-control data by maximum likelihood
- Empirical likelihood in missing data problems
- Miscellanea. Combining parametric and empirical likelihoods
- Semiparametric Methods for Response-Selective and Missing Data Problems in Regression
- Likelihood methods for regression models with expensive variables missing by design
- Fitting regression models with response-biased samples
- Empirical and conditional likelihoods for two‐phase studies
- Statistical Analysis with Missing Data, Third Edition
- Semiparametric maximum likelihood for missing covariates in parametric regression
- Score tests for association under response-dependent sampling designs for expensive covariates
- Empirical likelihood estimation using auxiliary summary information with different covariate distributions
Cited In (5)
- Statistical Inference for a Two-Stage Outcome-Dependent Sampling Design with a Continuous Outcome
- Efficient use of a two-stage randomized response procedure
- Causal Inference in Outcome-Dependent Two-Phase Sampling Designs
- A semiparametric method for risk prediction using integrated electronic health record data
- Novel two‐phase sampling designs for studying binary outcomes