Local uncertainty sampling for large-scale multiclass logistic regression
Publication:2196246
DOI: 10.1214/19-AOS1867; zbMATH Open: 1452.62163; arXiv: 1604.08098; OpenAlex: W3043437699; MaRDI QID: Q2196246; FDO: Q2196246
Ting Yang, Lei Han, Tong Zhang, Kean Ming Tan
Publication date: 28 August 2020
Published in: The Annals of Statistics
Abstract: A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be handled by available computational resources. In this paper, we propose a general subsampling scheme for large-scale multi-class logistic regression and examine the variance of the resulting estimator. We show that asymptotically, the proposed method always achieves a smaller variance than that of the uniform random sampling. Moreover, when the classes are conditionally imbalanced, significant improvement over uniform sampling can be achieved. Empirical performance of the proposed method is compared to other methods on both simulated and real-world datasets, and these results match and confirm our theoretical analysis.
Full work available at URL: https://arxiv.org/abs/1604.08098
Recommendations
- Optimal subsampling for large sample logistic regression
- Optimal subsample selection for massive logistic regression with distributed data
- Multiclass-penalized logistic regression
- Probability estimation for large-margin classifiers
- Multi-output local Gaussian process regression: applications to uncertainty quantification
- Efficient posterior sampling for high-dimensional imbalanced logistic regression
- Logistic Discrimination Based on Regularized Local Likelihood Method
Point estimation (62F10); Sampling theory, sample surveys (62D05); Generalized linear models (logistic models) (62J12)
Cites Work
- A Generalization of Sampling Without Replacement From a Finite Universe
- Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)
- Title not available
- Separate sample logistic discrimination
- On the Robustness of Weighted Methods for Fitting Models to Case–Control Data
- Sample Selection Bias Correction Theory
- Local case-control sampling: efficient subsampling in imbalanced data sets
- Fitting Logistic Regression Models in Stratified Case-Control Studies
Cited In (11)
- Approximating Partial Likelihood Estimators via Optimal Subsampling
- Optimal subsampling for softmax regression
- Deterministic subsampling for logistic regression with massive data
- Optimal subsampling for large-scale quantile regression
- Optimal Poisson subsampling for softmax regression
- Optimal subsampling for multiplicative regression with massive data
- Title not available
- A two-stage optimal subsampling estimation for missing data problems with large-scale data
- A Subsampling Method for Regression Problems Based on Minimum Energy Criterion
- Model constraints independent optimal subsampling probabilities for softmax regression
- A review on design inspired subsampling for big data