A modern maximum-likelihood theory for high-dimensional logistic regression
From MaRDI portal
Publication:5218552
Abstract: Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there are formulas to predict the variability of these estimates which are used for the purpose of statistical inference; for instance, to produce p-values for testing the significance of regression coefficients. Although these formulas come from large sample asymptotics, we are often told that we are on reasonably safe grounds when \(n\) is large in such a way that \(n \geq 5p\) or \(n \geq 10p\). This paper shows that this is far from the case, and consequently, inferences routinely produced by common software packages are often unreliable. Consider a logistic model with independent features in which \(n\) and \(p\) become increasingly large in a fixed ratio. Then we show that (1) the MLE is biased, (2) the variability of the MLE is far greater than classically predicted, and (3) the commonly used likelihood-ratio test (LRT) is not distributed as a chi-square. The bias of the MLE is extremely problematic as it yields completely wrong predictions for the probability of a case based on observed values of the covariates. We develop a new theory, which asymptotically predicts (1) the bias of the MLE, (2) the variability of the MLE, and (3) the distribution of the LRT. We empirically also demonstrate that these predictions are extremely accurate in finite samples. Further, an appealing feature is that these novel predictions depend on the unknown sequence of regression coefficients only through a single scalar, the overall strength of the signal. This suggests very concrete procedures to adjust inference; we describe one such procedure learning a single parameter from data and producing accurate inference.
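Claims (1) and (2) of the abstract can be probed with a small simulation. The following is a minimal sketch, not the authors' code: the dimensions (n = 2000, p = 400), the Gaussian design with variance 1/n, the pattern of coefficients equal to +10, -10 and 0, the random seed, and the helper names `log_lik` and `fit_logistic_mle` are all illustrative assumptions. It fits the unpenalized MLE with a damped Newton iteration, estimates the slope of the fitted coefficients against the true ones (a slope of 1 would indicate no bias), and compares the spread of the null coordinates with the standard deviation suggested by the inverse Fisher information.

```python
import numpy as np


def log_lik(X, y, beta):
    """Logistic log-likelihood, computed stably with logaddexp."""
    eta = X @ beta
    return float(y @ eta - np.sum(np.logaddexp(0.0, eta)))


def fit_logistic_mle(X, y, n_iter=100, tol=1e-8):
    """Unpenalized logistic MLE (no intercept) via Newton steps with step halving."""
    beta = np.zeros(X.shape[1])
    ll = log_lik(X, y, beta)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - mu)
        hess = (X * (mu * (1.0 - mu))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        t = 1.0
        # Halve the step until the log-likelihood does not decrease.
        while log_lik(X, y, beta + t * step) < ll and t > 1e-4:
            t *= 0.5
        beta = beta + t * step
        ll = log_lik(X, y, beta)
        if np.max(np.abs(t * step)) < tol:
            break
    return beta


# Illustrative setup (assumed, not taken from this page): p/n = 0.2.
rng = np.random.default_rng(0)
n, p = 2000, 400
X = rng.normal(scale=1.0 / np.sqrt(n), size=(n, p))

beta = np.zeros(p)
beta[: p // 8] = 10.0              # positive signals
beta[p // 8 : p // 4] = -10.0      # negative signals, remaining 3p/4 are null
# Signal strength Var(x' beta) = (p/4) * 100 / n = 5 in this setup.

prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = rng.binomial(1, prob).astype(float)

beta_hat = fit_logistic_mle(X, y)

# Least-squares slope of beta_hat on beta; 1.0 would mean "unbiased".
slope = float(beta_hat @ beta / (beta @ beta))

# Spread of the null coordinates versus the classical prediction from the
# inverse Fisher information evaluated at the true coefficients.
null = beta == 0
empirical_sd = float(beta_hat[null].std())
fisher = (X * (prob * (1.0 - prob))[:, None]).T @ X
classical_sd = float(np.sqrt(np.diag(np.linalg.inv(fisher))[null]).mean())

print(f"slope of MLE on truth: {slope:.3f}")
print(f"null-coordinate sd: empirical {empirical_sd:.3f}, classical {classical_sd:.3f}")
```

Under these assumptions the slope should come out noticeably above 1 and the empirical spread above the classical value, in line with the abstract's claims; the exact numbers depend on the seed and on the chosen dimensions.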
Recommendations
- Maximum likelihood estimation in logistic regression models with a diverging number of covariates
- The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square
- The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression
- Accuracies in the theory of logistic models
- Rate of convergence of the probability of non-existence of the MLE's in simple logistic regression
Cited in (64)
- Hierarchical inference for genome-wide association studies: a view on methodology with software
- Penalization-induced shrinking without rotation in high dimensional GLM regression: a cavity analysis
- The existence of maximum likelihood estimate in high-dimensional binary response generalized linear models
- A Unifying Tutorial on Approximate Message Passing
- Probabilistic learning inference of boundary value problem with uncertainties based on Kullback-Leibler divergence under implicit constraints
- Directional testing for high dimensional multivariate normal distributions
- Multicarving for high-dimensional post-selection inference
- Universality of approximate message passing with semirandom matrices
- Automatic bias correction for testing in high‐dimensional linear models
- Approximate message passing algorithms for rotationally invariant matrices
- Conjugate priors and bias reduction for logistic regression models
- Fundamental barriers to high-dimensional regression with convex penalties
- Debiased lasso for generalized linear models with a diverging number of covariates
- The Lasso with general Gaussian designs with applications to hypothesis testing
- Mallows criterion for heteroskedastic linear regressions with many regressors
- Optimal combination of linear and spectral estimators for generalized linear models
- Binary classification of Gaussian mixtures: abundance of support vectors, benign overfitting, and regularization
- The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression
- Online inference in high-dimensional generalized linear models with streaming data
- The distribution of the Lasso: uniform control over sparse balls and adaptive parameter tuning
- A regularization-based adaptive test for high-dimensional GLMs
- Finite-sample analysis of \(M\)-estimators using self-concordance
- Precise statistical analysis of classification accuracies for adversarial training
- Replica analysis of overfitting in generalized linear regression models
- The asymptotic distribution of the MLE in high-dimensional logistic models: arbitrary covariance
- Consistency of logistic regression coefficient estimates calculated from a training sample
- Analysis of overfitting in the regularized Cox model
- scientific article; zbMATH DE number 7306882
- scientific article; zbMATH DE number 7625184
- A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers
- Knockoffs with side information
- The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square
- Which bridge estimator is the best for variable selection?
- Approximate message passing with spectral initialization for generalized linear models
- Discussion on: “A Scale-Free Approach for False Discovery Rate Control in Generalized Linear Models” by Dai, Lin, Zing, Liu
- Maximum likelihood estimation in logistic regression models with a diverging number of covariates
- Some perspectives on inference in high dimensions
- scientific article; zbMATH DE number 7370623
- Assessing the Most Vulnerable Subgroup to Type II Diabetes Associated with Statin Usage: Evidence from Electronic Health Record Data
- Comment on “A Scale-Free Approach for False Discovery Rate Control in Generalized Linear Models” by Chengguang Dai, Buyu Lin, Xin Xing, and Jun S. Liu
- Comments on “A Scale-Free Approach for False Discovery Rate Control in Generalized Linear Models”
- A Friendly Tutorial on Mean-Field Spin Glass Techniques for Non-Physicists
- scientific article; zbMATH DE number 7306859
- scientific article; zbMATH DE number 7626769
- Universality of regularized regression estimators in high dimensions
- Statistical Inference for High-Dimensional Generalized Linear Models With Binary Outcomes
- Debiasing convex regularized estimators and interval estimation in linear models
- Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models
- Sharp global convergence guarantees for iterative nonconvex optimization with random data
- scientific article; zbMATH DE number 7370646
- A Scale-Free Approach for False Discovery Rate Control in Generalized Linear Models
- An adaptively resized parametric bootstrap for inference in high-dimensional generalized linear models
- Noisy linear inverse problems under convex constraints: exact risk asymptotics in high dimensions
- A discussion of “A note on universal inference” by Tse and Davison
- FDR control and power analysis for high-dimensional logistic regression via Stabkoff
- Bounded-memory adjusted scores estimation in generalized linear models with large data sets
- A tradeoff between false discovery and true positive proportions for sparse high-dimensional logistic regression
- Dimension-agnostic inference using cross U-statistics
- Tractability from overparametrization: the example of the negative perceptron
- A comprehensive review of bias reduction methods for logistic regression
- Exact convergence analysis for Metropolis–Hastings independence samplers in Wasserstein distances
- On the functional regression model and its finite-dimensional approximations
- Approximate message passing with rigorous guarantees for pooled data and quantitative group testing
- StarTrek: combinatorial variable selection with false discovery rate control