A knockoff filter for high-dimensional selective inference
Publication:2328050
DOI: 10.1214/18-AOS1755; zbMATH Open: 1444.62034; arXiv: 1602.03574; OpenAlex: W2965676676; MaRDI QID: Q2328050; FDO: Q2328050
Authors: Rina Foygel Barber, Emmanuel J. Candès
Publication date: 9 October 2019
Published in: The Annals of Statistics
Abstract: This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of the data at the inference step for greater power. In our work, the inferential step is carried out by applying the recently introduced knockoff filter, which creates a knockoff copy (a fake variable serving as a control) for each screened variable. We prove that this procedure controls the directional false discovery rate (FDR) in the reduced model controlling for all screened variables; this says that our high-dimensional knockoff procedure 'discovers' important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level (thereby controlling a notion of Type S error averaged over the selected set). This result is non-asymptotic, and holds for any distribution of the original features and any values of the unknown regression coefficients, so that inference is not calibrated under hypothesized values of the effect sizes. We demonstrate the performance of our general and flexible approach through numerical studies, showing more power than existing alternatives. Finally, we apply our method to a genome-wide association study to find locations on the genome that are possibly associated with a continuous phenotype.
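The "expected proportion of wrongly chosen signs" in the abstract can be written as a formula. The notation below is an assumption made for illustration and is not taken from this record: \(\widehat{S}\) is the selected set, \(\widehat{\mathrm{sign}}_j\) the reported sign for variable \(j\), \(\beta_j\) its coefficient in the reduced model, and \(q\) the user-specified level.

```latex
\[
  \mathrm{FDR}_{\mathrm{dir}}
  \;=\;
  \mathbb{E}\!\left[
    \frac{\bigl|\{\, j \in \widehat{S} : \widehat{\mathrm{sign}}_j \neq \operatorname{sign}(\beta_j) \,\}\bigr|}
         {\max\bigl(1,\ |\widehat{S}|\bigr)}
  \right]
  \;\le\; q .
\]
```

Under the convention \(\operatorname{sign}(0) = 0\), a selected variable whose true coefficient is zero always counts as an error, so this quantity is at least as large as the usual FDR.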
Full work available at URL: https://arxiv.org/abs/1602.03574
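The screen-then-infer procedure described in the abstract can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' implementation: screening uses marginal correlations (a stand-in chosen for brevity), the importance statistic is a simple correlation difference, the model has no intercept, and the strategies for reusing the first part of the data at the inference step are omitted. All function names and parameters (`equicorrelated_knockoffs`, `knockoff_plus_threshold`, `split_knockoff_select`, `n_screen`, `k`, `q`) are hypothetical.

```python
import numpy as np

def equicorrelated_knockoffs(X):
    """Fixed-design equicorrelated knockoffs (assumes n >= 2p, full column rank)."""
    n, p = X.shape
    if n < 2 * p:
        raise ValueError("this construction needs n >= 2p")
    X = X / np.linalg.norm(X, axis=0)           # unit-norm columns
    Sigma = X.T @ X
    s = min(2.0 * np.linalg.eigvalsh(Sigma)[0], 1.0)
    Sigma_inv = np.linalg.inv(Sigma)
    # Find C with C^T C = 2 s I - s^2 Sigma^{-1} (positive semidefinite by choice of s).
    ev, V = np.linalg.eigh(2.0 * s * np.eye(p) - s**2 * Sigma_inv)
    C = np.sqrt(np.clip(ev, 0.0, None))[:, None] * V.T
    # p orthonormal directions orthogonal to the column span of X.
    U_perp = np.linalg.svd(X, full_matrices=True)[0][:, p:2 * p]
    X_knock = X @ (np.eye(p) - s * Sigma_inv) + U_perp @ C
    return X, X_knock

def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return t
    return np.inf

def split_knockoff_select(X, y, n_screen, k, q=0.2):
    """Screen k variables on the first n_screen rows, run knockoffs on the rest."""
    X0, y0 = X[:n_screen], y[:n_screen]
    X1, y1 = X[n_screen:], y[n_screen:]
    # Assumption: marginal-correlation screening (any screening rule could be used).
    score = np.abs(X0.T @ y0) / np.linalg.norm(X0, axis=0)
    S = np.sort(np.argsort(score)[-k:])
    Xs, Xk = equicorrelated_knockoffs(X1[:, S])
    Z, Zk = Xs.T @ y1, Xk.T @ y1
    W = np.abs(Z) - np.abs(Zk)                  # large positive W favors the real variable
    signs = np.sign(Z - Zk)                     # sign of (X_j - knockoff_j)^T y
    keep = W >= knockoff_plus_threshold(W, q)
    return S[keep], signs[keep]

# Synthetic example: p > n, ten true signals with random signs.
rng = np.random.default_rng(0)
n, p = 400, 1000
beta = np.zeros(p)
beta[:10] = 4.0 * rng.choice([-1.0, 1.0], size=10)
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)
selected, signs = split_knockoff_select(X, y, n_screen=200, k=40, q=0.2)
print("selected variables:", selected)
print("estimated signs:   ", signs)
```

The second half of the data must have at least twice as many rows as there are screened variables for this knockoff construction, which is why `k` is kept well below `(n - n_screen) / 2` in the example.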
Mathematics Subject Classification
- Parametric hypothesis testing (62F03)
- Linear regression; mixed models (62J05)
- Measures of association (correlation, canonical correlation, etc.) (62H20)
Cites Work
- EigenPrism: inference for high dimensional signal-to-noise ratios
- Type S error for classical and Bayesian single and multiple comparison procedures
- Title not available
- Confidence Intervals and Hypothesis Testing for High-Dimensional Regression
- Square-root lasso: pivotal recovery of sparse signals via conic programming
- Title not available
- Title not available
- Sure Independence Screening for Ultrahigh Dimensional Feature Space
- Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models
- A significance test for the lasso
- A knockoff filter for high-dimensional selective inference
- Exact post-selection inference, with application to the Lasso
- Valid post-selection inference
- Inference on treatment effects after selection among high-dimensional controls
- High-dimensional variable selection
- The sparsity and bias of the LASSO selection in high-dimensional linear regression
- False discoveries occur early on the Lasso path
- Title not available
- Panning for Gold: ‘Model-X’ Knockoffs for High Dimensional Controlled Variable Selection
- Controlling Variable Selection by the Addition of Pseudovariables
- Asymptotics of selective inference
- Can one estimate the conditional distribution of post-model-selection estimators?
- Controlling the false discovery rate via knockoffs
- Sequential selection procedures and false discovery rate control
- Graph estimation with joint additive models
- Familywise error rate control via knockoffs
- False Discovery Rate–Adjusted Multiple Confidence Intervals for Selected Parameters
- John W. Tukey's contributions to multiple comparisons
- Selective inference with unknown variance via the square-root Lasso
Cited In (61)
- Nonparametric augmented probability weighting with sparsity
- Large-Scale Two-Sample Comparison of Support Sets
- Sequential knockoffs for continuous and categorical predictors: with application to a large psoriatic arthritis clinical trial pool
- FDR control and power analysis for high-dimensional logistic regression via Stabkoff
- Reproducible learning for accelerated failure time models via deep knockoffs
- False discovery rate-controlled multiple testing for union null hypotheses: a knockoff-based approach
- Overview of research advance for knockoff methods
- Differential network knockoff filter with application to brain connectivity analysis
- Split Knockoffs for Multiple Comparisons: Controlling the Directional False Discovery Rate
- Stab-GKnock: controlled variable selection for partially linear models using generalized knockoffs
- CoxKnockoff: controlled feature selection for the Cox model using knockoffs
- False Discovery Rate Control via Data Splitting
- A power analysis for Model-X knockoffs with \(\ell_p\)-regularized statistics
- Nonparametric false discovery rate control for identifying simultaneous signals
- The revisited knockoffs method for variable selection in L1-penalized regressions
- Sufficient variable screening with high-dimensional controls
- Revisiting feature selection for linear models with FDR and power guarantees
- FANOK: knockoffs in linear time
- A generalized knockoff procedure for FDR control in structural change detection
- IPAD: stable interpretable forecasting with knockoffs inference
- Knockoff procedure for false discovery rate control in high-dimensional data streams
- Structure learning of exponential family graphical model with false discovery rate control
- Projection-based Inference for High-dimensional Linear Models
- A knockoff filter for high-dimensional selective inference
- Null-free false discovery rate control using decoy permutations
- Multilayer knockoff filter: controlled variable selection at multiple resolutions
- Robust inference with knockoffs
- Title not available
- Online rules for control of false discovery rate and false discovery exceedance
- Empirical Bayes cumulative \(\ell\)-value multiple testing procedure for sparse sequences
- Model-Free Conditional Feature Screening with FDR Control
- A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates
- Feature screening and FDR control with knockoff features for ultrahigh-dimensional right-censored data
- GGM Knockoff Filter: False Discovery Rate Control for Gaussian Graphical Models
- A powerful procedure that controls the false discovery rate with directional information
- Testing Mediation Effects Using Logic of Boolean Matrices
- RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs
- StarTrek: combinatorial variable selection with false discovery rate control
- Learning sparse conditional distribution: an efficient kernel-based approach
- Multicarving for high-dimensional post-selection inference
- Determine the number of clusters by data augmentation
- Adaptive procedures for directional false discovery rate control
- A stable and adaptive polygenic signal detection method based on repeated sample splitting
- Model-Free Feature Screening and FDR Control With Knockoff Features
- A prototype knockoff filter for group selection with FDR control
- Two-directional simultaneous inference for high-dimensional models
- Panning for Gold: ‘Model-X’ Knockoffs for High Dimensional Controlled Variable Selection
- Knockoffs with side information
- Reproducible feature selection in high-dimensional accelerated failure time models
- Kernel Knockoffs Selection for Nonparametric Additive Models
- Gene hunting with hidden Markov model knockoffs
- Compositional knockoff filter for high‐dimensional regression analysis of microbiome data
- Threshold Selection in Feature Screening for Error Rate Control
- False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation
- Reproducible learning in large-scale graphical models
- A robust knockoff filter for sparse regression analysis of microbiome compositional data
- Semi-supervised multiple testing
- An ensemble learning method for variable selection: application to high-dimensional data and missing values
- Selective inference via marginal screening for high dimensional classification
- Statistical proof? The problem of irreproducibility
- Derandomizing Knockoffs