Local uncertainty sampling for large-scale multiclass logistic regression
Publication:2196246
Abstract: A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be handled by available computational resources. In this paper, we propose a general subsampling scheme for large-scale multi-class logistic regression and examine the variance of the resulting estimator. We show that, asymptotically, the proposed method always achieves a smaller variance than uniform random sampling. Moreover, when the classes are conditionally imbalanced, significant improvement over uniform sampling can be achieved. The empirical performance of the proposed method is compared to other methods on both simulated and real-world datasets, and these results match and confirm our theoretical analysis.
Recommendations
- Optimal subsampling for large sample logistic regression
- Optimal subsample selection for massive logistic regression with distributed data
- Multiclass-penalized logistic regression
- Probability estimation for large-margin classifiers
- Multi-output local Gaussian process regression: applications to uncertainty quantification
- Efficient posterior sampling for high-dimensional imbalanced logistic regression
- Logistic Discrimination Based on Regularized Local Likelihood Method
Cites work
- scientific article; zbMATH DE number 3984372
- A Generalization of Sampling Without Replacement From a Finite Universe
- Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)
- Fitting Logistic Regression Models in Stratified Case-Control Studies
- Local case-control sampling: efficient subsampling in imbalanced data sets
- On the Robustness of Weighted Methods for Fitting Models to Case–Control Data
- Sample Selection Bias Correction Theory
- Separate sample logistic discrimination
Cited in (14)
- A two-stage optimal subsampling estimation for missing data problems with large-scale data
- Approximating Partial Likelihood Estimators via Optimal Subsampling
- Surprise sampling: improving and extending the local case-control sampling
- Information-based optimal subdata selection for big data logistic regression
- Optimal Poisson subsampling for softmax regression
- A review on design inspired subsampling for big data
- Local case-control sampling: efficient subsampling in imbalanced data sets
- Deterministic subsampling for logistic regression with massive data
- Optimal subsampling for multiplicative regression with massive data
- A Subsampling Method for Regression Problems Based on Minimum Energy Criterion
- Model constraints independent optimal subsampling probabilities for softmax regression
- Optimal subsampling for softmax regression
- Automated scalable Bayesian inference via Hilbert coresets
- Optimal subsampling for large-scale quantile regression