Hypothesis Testing in High-Dimensional Regression Under the Gaussian Random Design Model: Asymptotic Theory
From MaRDI portal
Publication:2986116
Abstract: We consider linear regression in the high-dimensional regime where the number of observations \(n\) is smaller than the number of parameters \(p\). A very successful approach in this setting uses \(\ell_1\)-penalized least squares (a.k.a. the Lasso) to search for a subset of \(s_0 < n\) parameters that best explain the data, while setting the other parameters to zero. A considerable amount of work has been devoted to characterizing the estimation and model selection problems within this approach. In this paper we consider instead the fundamental, but far less understood, question of \emph{statistical significance}. More precisely, we address the problem of computing p-values for single regression coefficients. On one hand, we develop a general upper bound on the minimax power of tests with a given significance level. On the other, we prove that this upper bound is (nearly) achievable through a practical procedure in the case of random design matrices with independent entries. Our approach is based on a debiasing of the Lasso estimator. The analysis builds on a rigorous characterization of the asymptotic distribution of the Lasso estimator and its debiased version. Our result holds for optimal sample size, i.e., when \(n\) is at least on the order of \(s_0 \log(p/s_0)\). We generalize our approach to random design matrices with i.i.d. Gaussian rows with covariance \(\Sigma\). In this case we prove that a similar distributional characterization (termed 'standard distributional limit') holds for \(n\) much larger than \(s_0 (\log p)^2\). Finally, we show that for optimal sample size, \(n\) being at least of order \(s_0 \log p\), the standard distributional limit for general Gaussian designs can be derived from the replica heuristics in statistical physics.
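The debiasing step described in the abstract can be illustrated with a minimal sketch. Assumptions made here that go beyond the abstract: a standard Gaussian design (covariance \(\Sigma = I\), so the precision matrix entering the correction is the identity), a known noise level, and an illustrative penalty choice. Variable names are invented for illustration, and the per-coordinate variance approximation is a heuristic simplification, not the paper's exact construction.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s0, sigma = 600, 1000, 10, 1.0  # n < p: high-dimensional regime

# Standard Gaussian random design: i.i.d. N(0, 1) entries (Sigma = I).
X = rng.standard_normal((n, p))
theta = np.zeros(p)
theta[:s0] = 4.0                      # s0-sparse coefficient vector
y = X @ theta + sigma * rng.standard_normal(n)

# Lasso estimate; the penalty level here is an illustrative
# sigma * sqrt(2 log p / n) scaling, not the paper's tuning.
lam = sigma * np.sqrt(2 * np.log(p) / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(X, y).coef_

# Debiasing: for Sigma = I the correction matrix is the identity, so
# the one-step update just adds the scaled correlation of the residuals.
theta_d = theta_hat + X.T @ (y - X @ theta_hat) / n

# Heuristic z-scores and two-sided p-values per coefficient, treating
# each debiased coordinate as approximately N(theta_j, sigma^2 / n).
z = np.sqrt(n) * theta_d / sigma
pvals = 2 * stats.norm.sf(np.abs(z))
```

On this simulated instance the true support receives tiny p-values while the null coordinates look approximately uniform; the paper's distributional results make the Gaussian approximation behind these p-values rigorous.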
Cited in (56):
- One-step regularized estimator for high-dimensional regression models
- Enmsp: an elastic-net multi-step screening procedure for high-dimensional regression
- Online Debiasing for Adaptively Collected High-Dimensional Data With Applications to Time Series Analysis
- Large-Scale Two-Sample Comparison of Support Sets
- StarTrek: combinatorial variable selection with false discovery rate control
- SLOPE-adaptive variable selection via convex optimization
- On asymptotically optimal confidence regions and tests for high-dimensional models
- Discussion: "A significance test for the lasso"
- In defense of the indefensible: a very naïve approach to high-dimensional inference
- A unifying framework of high-dimensional sparse estimation with difference-of-convex (DC) regularizations
- Scalable inference for high-dimensional precision matrix
- Discussion: "A significance test for the lasso"
- Discussion: "A significance test for the lasso"
- Discussion: "A significance test for the lasso"
- Discussion: "A significance test for the lasso"
- Discussion: "A significance test for the lasso"
- Asymptotic normality and optimalities in estimation of large Gaussian graphical models
- Gaussian graphical model estimation with false discovery rate control
- Statistical Inference, Learning and Models in Big Data
- The Lasso with general Gaussian designs with applications to hypothesis testing
- Uniformly valid post-regularization confidence regions for many functional parameters in z-estimation framework
- On the asymptotic variance of the debiased Lasso
- The distribution of the Lasso: uniform control over sparse balls and adaptive parameter tuning
- Flexible and Interpretable Models for Survival Data
- Debiasing the Lasso: optimal sample size for Gaussian designs
- Worst possible sub-directions in high-dimensional models
- Ill-posed estimation in high-dimensional models with instrumental variables
- Linear hypothesis testing in dense high-dimensional linear models
- Constructing confidence intervals for the signals in sparse phase retrieval
- Post-model-selection inference in linear regression models: an integrated review
- Generalized M-estimators for high-dimensional Tobit I models
- Efficient estimation of smooth functionals in Gaussian shift models
- Significance testing in non-sparse high-dimensional linear models
- The benefit of group sparsity in group inference with de-biased scaled group Lasso
- A significance test for the lasso
- Lasso-driven inference in time and space
- Detangling robustness in high dimensions: composite versus model-averaged estimation
- Inference for high-dimensional varying-coefficient quantile regression
- Generalized matrix decomposition regression: estimation and inference for two-way structured data
- Additive model selection
- Asymptotically honest confidence regions for high dimensional parameters by the desparsified conservative Lasso
- Asymptotic risk and phase transition of \(l_1\)-penalized robust estimator
- Asymptotic normality of robust \(M\)-estimators with convex penalty
- LASSO risk and phase transition under dependence
- Semiparametric efficiency bounds for high-dimensional models
- De-biasing the Lasso with degrees-of-freedom adjustment
- Rejoinder: "A significance test for the lasso"
- Semi-analytic resampling in Lasso
- Online rules for control of false discovery rate and false discovery exceedance
- Universality of regularized regression estimators in high dimensions
- Statistical Inference for High-Dimensional Generalized Linear Models With Binary Outcomes
- Debiasing convex regularized estimators and interval estimation in linear models
- Asymptotically efficient estimation of smooth functionals of covariance operators
- Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models
- Estimating structured high-dimensional covariance and precision matrices: optimal rates and adaptive estimation
- Gene set priorization guided by regulatory networks with p-values through kernel mixed model