p-values for high-dimensional regression
From MaRDI portal
Publication:3069897
Abstract: Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against the inclusion of noise variables, and asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008), which splits the data into two parts: the number of variables is reduced to a manageable size using the first part, and classical variable selection techniques are then applied to the remaining variables using the data from the second part. This yields asymptotic error control under minimal conditions. It involves, however, a one-time random split of the data, and results are sensitive to this arbitrary choice: it amounts to a ``p-value lottery'' and makes results difficult to reproduce. Here, we show that inference across multiple random splits can be aggregated while keeping asymptotic control over the inclusion of noise variables. We show that the resulting p-values can be used to control both the family-wise error rate (FWER) and the false discovery rate (FDR). In addition, the proposed aggregation is shown to improve power while substantially reducing the number of falsely selected variables.
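The procedure described in the abstract can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: correlation screening stands in for a real selector such as the lasso, a normal approximation replaces the exact OLS t-test, and aggregation uses a single fixed quantile level `gamma` rather than the paper's adaptive version.

```python
import numpy as np
from math import erfc, sqrt

def multi_split_pvalues(X, y, n_splits=50, n_screen=5, gamma=0.5, seed=0):
    """Multi-sample-split p-values (illustrative sketch).

    For each split: screen variables on one half, compute classical
    per-variable p-values on the other half (Bonferroni-corrected over
    the screened set), then aggregate across splits by a quantile rule.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    P = np.ones((n_splits, p))               # unselected variables get p = 1
    for b in range(n_splits):
        idx = rng.permutation(n)
        i1, i2 = idx[: n // 2], idx[n // 2:]
        # Screening on the first half: keep the variables most correlated
        # with y (a crude stand-in for lasso-based selection).
        score = np.abs(X[i1].T @ (y[i1] - y[i1].mean()))
        S = np.sort(np.argsort(score)[-n_screen:])
        # Classical OLS on the second half, restricted to the screened set.
        Xs = X[i2][:, S]
        beta, *_ = np.linalg.lstsq(Xs, y[i2], rcond=None)
        resid = y[i2] - Xs @ beta
        sigma2 = resid @ resid / (len(i2) - len(S))
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
        z = np.abs(beta) / se
        # Two-sided p-values via a normal approximation to the t-test.
        p_raw = np.array([erfc(zj / sqrt(2)) for zj in z])
        # Bonferroni correction over the screened set, as in the
        # single-split method of Wasserman and Roeder.
        P[b, S] = np.minimum(1.0, len(S) * p_raw)
    # Quantile aggregation across splits at level gamma (here the median).
    return np.minimum(1.0, np.quantile(P, gamma, axis=0) / gamma)
```

With strong signal variables, the aggregated p-values for the true predictors are small across splits, while variables that are rarely screened keep a p-value near 1, avoiding the single-split ``p-value lottery''.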
Recommendations
- Statistical significance in high-dimensional linear models
- Oracle P-values and variable screening
- Consistent variable selection in high dimensional regression via multiple testing
- High-dimensional linear model selection motivated by multiple testing
- Bootstrapping and sample splitting for high-dimensional, assumption-lean inference
Cited in
- Two-stage procedures for high-dimensional data
- Nonuniformity of \(p\)-values can occur early in diverging dimensions
- Rejoinder on: ``Hierarchical inference for genome-wide association studies: a view on methodology with software''
- Kernel meets sieve: post-regularization confidence bands for sparse additive model
- Multi split conformal prediction
- Confidence intervals for parameters in high-dimensional sparse vector autoregression
- Oracle P-values and variable screening
- Rejoinder on: ``High-dimensional simultaneous inference with the bootstrap''
- Discussion: ``A significance test for the lasso''
- Grouped penalization estimation of the osteoporosis data in the traditional Chinese medicine
- Likelihood ratio test in multivariate linear regression: from low to high dimension
- Exact adaptive confidence intervals for linear regression coefficients
- False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation
- Simultaneous test for linear model via projection
- On the Hauck–Donner Effect in Wald Tests: Detection, Tipping Points, and Parameter Space Characterization
- Network differential connectivity analysis
- Spatially relaxed inference on high-dimensional linear models
- Compositional knockoff filter for high‐dimensional regression analysis of microbiome data
- Conditional Test for Ultrahigh Dimensional Linear Regression Coefficients
- Threshold Selection in Feature Screening for Error Rate Control
- High-dimensional simultaneous inference with the bootstrap
- Estimation of High Dimensional Mean Regression in the Absence of Symmetry and Light Tail Assumptions
- Consistent variable selection for functional regression models
- Beyond support in two-stage variable selection
- Variable selection procedures from multiple testing
- High-dimensional linear model selection motivated by multiple testing
- Debiased Inference on Treatment Effect in a High-Dimensional Model
- Selective inference via marginal screening for high dimensional classification
- A spatially adaptive large-scale multiple-testing procedure
- Multi-split conformal prediction via Cauchy aggregation
- Tests for high-dimensional single-index models
- SLOPE-adaptive variable selection via convex optimization
- Discussion of big Bayes stories and BayesBag
- High-dimensional inference: confidence intervals, \(p\)-values and R-software \texttt{hdi}
- A regularization-based adaptive test for high-dimensional GLMs
- Testing the differential network between two Gaussian graphical models with false discovery rate control
- Hierarchical inference for genome-wide association studies: a view on methodology with software
- False Discovery Rate Control via Data Splitting
- A Scale-Free Approach for False Discovery Rate Control in Generalized Linear Models
- Variable selection for generalized odds rate mixture cure models with interval-censored failure time data
- Testing covariates in high dimension linear regression with latent factors
- The smooth-Lasso and other \(\ell _{1}+\ell _{2}\)-penalized methods
- Discussion: ``A significance test for the lasso''
- Discussion: ``A significance test for the lasso''
- Discussion: ``A significance test for the lasso''
- Discussion: ``A significance test for the lasso''
- A global homogeneity test for high-dimensional linear regression
- A unified theory of confidence regions and testing for high-dimensional estimating equations
- A Two-Sample Conditional Distribution Test Using Conformal Prediction and Weighted Rank Sum
- Large-Scale Two-Sample Comparison of Support Sets
- Distributionally robust and generalizable inference
- Selective inference with a randomized response
- Robust error density estimation in ultrahigh dimensional sparse linear model
- Markov Neighborhood Regression for High-Dimensional Inference
- A significance test for the lasso
- CLT For U-statistics With Growing Dimension
- Inference for high‐dimensional linear models with locally stationary error processes
- A post-screening diagnostic study for ultrahigh dimensional data
- Bootstrapping and sample splitting for high-dimensional, assumption-lean inference
- Testing a single regression coefficient in high dimensional linear models
- Projection-based Inference for High-dimensional Linear Models
- Structure learning of exponential family graphical model with false discovery rate control
- Inference for High-Dimensional Censored Quantile Regression
- Error density estimation in high-dimensional sparse linear model
- Exact tests via multiple data splitting
- New hard-thresholding rules based on data splitting in high-dimensional imbalanced classification
- Detection of gene-gene interactions using multistage sparse and low-rank regression
- Marginal false discovery rate for a penalized transformation survival model
- Bayesian learners in gradient boosting for linear mixed models
- Discussion: ``A significance test for the lasso''
- Confidence Intervals and Hypothesis Testing for High-Dimensional Regression
- High-dimensional variable screening and bias in subsequent inference, with an empirical comparison
- Inference for sparse linear regression based on the leave-one-covariate-out solution path
- High-dimensional statistical inference via DATE
- Universal inference
- Innovated scalable efficient inference for ultra-large graphical models
- Scalable penalized spatiotemporal land-use regression for ground-level nitrogen dioxide
- Post-model-selection inference in linear regression models: an integrated review
- Support recovery of Gaussian graphical model with false discovery rate control
- A three-stage approach to identify biomarker signatures for cancer genetic data with survival endpoints
- False discovery control for penalized variable selections with high-dimensional covariates
- On the impact of model selection on predictor identification and parameter inference
- Thresholding tests based on affine Lasso to achieve non-asymptotic nominal level and high power under sparse and dense alternatives in high dimension
- scientific article; zbMATH DE number 7370575
- Pivotal Estimation in High-Dimensional Regression via Linear Programming
- An improved variable selection procedure for adaptive Lasso in high-dimensional survival analysis
- Estimation for high-dimensional linear mixed-effects models using \(\ell_1\)-penalization
- Cellwise outlier detection with false discovery rate control
- Statistical inference for model parameters in stochastic gradient descent
- Modeling Postoperative Mortality in Older Patients by Boosting Discrete-Time Competing Risks Models
- Penalized weighted composite quantile regression in the linear regression model with heavy-tailed autocorrelated errors
- A nonlinear mixed–integer programming approach for variable selection in linear regression model
- Automatic bias correction for testing in high‐dimensional linear models
- Wavelet-domain regression and predictive inference in psychiatric neuroimaging
- Feature-specific inference for penalized regression using local false discovery rates
- Testing Mediation Effects Using Logic of Boolean Matrices
- Markov neighborhood regression for statistical inference of high-dimensional generalized linear models
- The predictive power of the business and bank sentiment of firms: a high-dimensional Granger causality approach
- The Holdout Randomization Test for Feature Selection in Black Box Models
- Multicarving for high-dimensional post-selection inference