p-values for high-dimensional regression
Publication: Q3069897
DOI: 10.1198/JASA.2009.TM08647
zbMATH Open: 1205.62089
arXiv: 0811.2177
OpenAlex: W2082213488
MaRDI QID: Q3069897
FDO: Q3069897
Authors: Nicolai Meinshausen, Lukas Meier, Peter Bühlmann
Publication date: 1 February 2011
Published in: Journal of the American Statistical Association
Abstract: Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008), which splits the data into two parts: the number of variables is reduced to a manageable size using the first part, and classical variable selection techniques are then applied to the remaining variables using the data from the second part. This yields asymptotic error control under minimal conditions. It involves, however, a one-time random split of the data, and results are sensitive to this arbitrary choice: it amounts to a `p-value lottery' and makes it difficult to reproduce results. Here, we show that inference across multiple random splits can be aggregated while keeping asymptotic control over the inclusion of noise variables. We show that the resulting p-values can be used to control both the family-wise error rate (FWER) and the false discovery rate (FDR). In addition, the proposed aggregation is shown to improve power while substantially reducing the number of falsely selected variables.
Full work available at URL: https://arxiv.org/abs/0811.2177
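The procedure described in the abstract (random sample split, variable screening on one half, classical p-values on the other, aggregation over many splits) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the screening method (`LassoCV`), the number of splits `B`, and the single fixed-quantile aggregation (omitting the paper's search over quantile levels with its extra correction factor) are all simplifying assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV

def multi_split_pvalues(X, y, B=50, gamma=0.5, rng=None):
    """Sketch of multi-split p-values for high-dimensional regression.

    For each of B random splits: screen variables on one half (here via
    lasso with cross-validated penalty), compute Bonferroni-adjusted OLS
    p-values for the selected variables on the other half, and assign
    p = 1 to unselected variables. Aggregate across splits by taking the
    gamma-quantile of the per-split p-values, scaled by 1/gamma.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    P = np.ones((B, p))  # per-split adjusted p-values; default 1
    for b in range(B):
        idx = rng.permutation(n)
        i1, i2 = idx[: n // 2], idx[n // 2:]
        # Screening step: lasso on the first half of the data.
        sel = np.flatnonzero(LassoCV(cv=5).fit(X[i1], y[i1]).coef_ != 0)
        if sel.size == 0 or sel.size >= i2.size - 1:
            continue  # nothing selected, or OLS on second half infeasible
        # Classical OLS t-test p-values on the second half.
        Xs = np.column_stack([np.ones(i2.size), X[np.ix_(i2, sel)]])
        beta, *_ = np.linalg.lstsq(Xs, y[i2], rcond=None)
        resid = y[i2] - Xs @ beta
        df = i2.size - Xs.shape[1]
        sigma2 = resid @ resid / df
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xs.T @ Xs)))[1:]
        pvals = 2 * stats.t.sf(np.abs(beta[1:] / se), df)
        # Bonferroni adjustment by the size of the selected model.
        P[b, sel] = np.minimum(pvals * sel.size, 1.0)
    # Aggregate: gamma-quantile of p/gamma across splits, capped at 1.
    return np.minimum(np.quantile(P, gamma, axis=0) / gamma, 1.0)
```

A variable with a strong true effect is selected in nearly every split and keeps a small aggregated p-value, while a noise variable, even if it survives screening in a few splits, has its p-value pulled toward 1 by the quantile over splits; this is the stabilization over the one-shot "p-value lottery" that the abstract describes.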
Recommendations
- Statistical significance in high-dimensional linear models
- Oracle P-values and variable screening
- Consistent variable selection in high dimensional regression via multiple testing
- High-dimensional linear model selection motivated by multiple testing
- Bootstrapping and sample splitting for high-dimensional, assumption-lean inference
Keywords: false discovery rate; family-wise error rate; multiple comparisons; data splitting; high-dimensional variable selection
Cited In (first 100 items shown)
- SLOPE-adaptive variable selection via convex optimization
- A regularization-based adaptive test for high-dimensional GLMs
- High-dimensional inference: confidence intervals, \(p\)-values and R-software \texttt{hdi}
- Discussion of big Bayes stories and BayesBag
- Hierarchical inference for genome-wide association studies: a view on methodology with software
- Discussion: ``A significance test for the lasso''
- Discussion: ``A significance test for the lasso''
- Discussion: ``A significance test for the lasso''
- Discussion: ``A significance test for the lasso''
- Variable selection for generalized odds rate mixture cure models with interval-censored failure time data
- Testing covariates in high dimension linear regression with latent factors
- The smooth-Lasso and other \(\ell _{1}+\ell _{2}\)-penalized methods
- A unified theory of confidence regions and testing for high-dimensional estimating equations
- Selective inference with a randomized response
- Robust error density estimation in ultrahigh dimensional sparse linear model
- CLT For U-statistics With Growing Dimension
- A significance test for the lasso
- Projection-based Inference for High-dimensional Linear Models
- Bootstrapping and sample splitting for high-dimensional, assumption-lean inference
- Inference for High-Dimensional Censored Quantile Regression
- Testing a single regression coefficient in high dimensional linear models
- Error density estimation in high-dimensional sparse linear model
- Exact tests via multiple data splitting
- Detection of gene-gene interactions using multistage sparse and low-rank regression
- New hard-thresholding rules based on data splitting in high-dimensional imbalanced classification
- Discussion: ``A significance test for the lasso''
- Confidence Intervals and Hypothesis Testing for High-Dimensional Regression
- High-dimensional variable screening and bias in subsequent inference, with an empirical comparison
- Pivotal Estimation in High-Dimensional Regression via Linear Programming
- An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis
- Estimation for high-dimensional linear mixed-effects models using \(\ell_1\)-penalization
- Penalized weighted composite quantile regression in the linear regression model with heavy-tailed autocorrelated errors
- The Holdout Randomization Test for Feature Selection in Black Box Models
- Wavelet-domain regression and predictive inference in psychiatric neuroimaging
- The predictive power of the business and bank sentiment of firms: a high-dimensional Granger causality approach
- Multicarving for high-dimensional post-selection inference
- Projection tests for high-dimensional spiked covariance matrices
- Predictor ranking and false discovery proportion control in high-dimensional regression
- Rejoinder: ``A significance test for the lasso''
- Confidence intervals for high-dimensional inverse covariance estimation
- A penalized approach to covariate selection through quantile regression coefficient models
- Group inference in high dimensions with applications to hierarchical testing
- Goodness-of-Fit Tests for High Dimensional Linear Models
- Variable selection after screening: with or without data splitting?
- Exact model comparisons in the plausibility framework
- On asymptotically optimal confidence regions and tests for high-dimensional models
- The benefit of group sparsity in group inference with de-biased scaled group Lasso
- Two-directional simultaneous inference for high-dimensional models
- Empirical likelihood test for high dimensional linear models
- High-dimensional inference in misspecified linear models
- Selecting massive variables using an iterated conditional modes/medians algorithm
- Two-stage procedures for high-dimensional data
- Statistical significance in high-dimensional linear models
- Likelihood ratio test in multivariate linear regression: from low to high dimension
- Multi split conformal prediction
- Discussion: ``A significance test for the lasso''
- Rejoinder on: ``High-dimensional simultaneous inference with the bootstrap''
- Exact adaptive confidence intervals for linear regression coefficients
- False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation
- Estimation of High Dimensional Mean Regression in the Absence of Symmetry and Light Tail Assumptions
- High-dimensional simultaneous inference with the bootstrap
- High-dimensional linear model selection motivated by multiple testing
- Consistent variable selection for functional regression models
- Beyond support in two-stage variable selection
- Selective inference via marginal screening for high dimensional classification
- Tests for high-dimensional single-index models
- Testing the differential network between two Gaussian graphical models with false discovery rate control
- False Discovery Rate Control via Data Splitting
- A Scale-Free Approach for False Discovery Rate Control in Generalized Linear Models
- A Two-Sample Conditional Distribution Test Using Conformal Prediction and Weighted Rank Sum
- Large-Scale Two-Sample Comparison of Support Sets
- A global homogeneity test for high-dimensional linear regression
- Distributionally robust and generalizable inference
- Markov Neighborhood Regression for High-Dimensional Inference
- Inference for high‐dimensional linear models with locally stationary error processes
- A post-screening diagnostic study for ultrahigh dimensional data
- Structure learning of exponential family graphical model with false discovery rate control
- Bayesian learners in gradient boosting for linear mixed models
- Marginal false discovery rate for a penalized transformation survival model
- Inference for sparse linear regression based on the leave-one-covariate-out solution path
- High-dimensional statistical inference via DATE
- Universal inference
- Support recovery of Gaussian graphical model with false discovery rate control
- A three-stage approach to identify biomarker signatures for cancer genetic data with survival endpoints
- Innovated scalable efficient inference for ultra-large graphical models
- Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide
- Post-model-selection inference in linear regression models: an integrated review
- False discovery control for penalized variable selections with high-dimensional covariates
- On the impact of model selection on predictor identification and parameter inference
- Thresholding tests based on affine Lasso to achieve non-asymptotic nominal level and high power under sparse and dense alternatives in high dimension
- Cellwise outlier detection with false discovery rate control
- Modeling Postoperative Mortality in Older Patients by Boosting Discrete-Time Competing Risks Models
- A nonlinear mixed–integer programming approach for variable selection in linear regression model
- Automatic bias correction for testing in high‐dimensional linear models
- Statistical inference for model parameters in stochastic gradient descent
- Feature-specific inference for penalized regression using local false discovery rates
- Markov neighborhood regression for statistical inference of high-dimensional generalized linear models
- Testing Mediation Effects Using Logic of Boolean Matrices
This page was built for publication: \(p\)-values for high-dimensional regression