Local uncertainty sampling for large-scale multiclass logistic regression
From MaRDI portal
Abstract: A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be handled by available computational resources. In this paper, we propose a general subsampling scheme for large-scale multi-class logistic regression and examine the variance of the resulting estimator. We show that, asymptotically, the proposed method always achieves a smaller variance than uniform random sampling. Moreover, when the classes are conditionally imbalanced, significant improvement over uniform sampling can be achieved. The empirical performance of the proposed method is compared with that of other methods on both simulated and real-world datasets, and these results match and confirm our theoretical analysis.
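The abstract does not spell out the sampling rule, so the following is only a minimal Python sketch of the general idea behind uncertainty-based subsampling, under stated assumptions. It assumes a pilot model supplies predicted class probabilities, keeps each example independently with a hypothetical acceptance probability proportional to 1 - p̂_y(x) (the pilot's probability of missing the observed label), and reweights kept examples by 1/a to correct the sampling bias. The function names, the `scale` and `floor` parameters, and the acceptance formula are illustrative assumptions, not the acceptance rule or estimator derived in the paper. Uniform random sampling corresponds to giving every example the same acceptance probability.

```python
import math
import random
from typing import List, Sequence, Tuple


def uncertainty_acceptance(
    pilot_probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    scale: float = 1.0,
    floor: float = 1e-3,
) -> List[float]:
    """HYPOTHETICAL acceptance rule: a_i = scale * (1 - p_hat_i[y_i]), clipped to [floor, 1].

    Examples the pilot model already classifies confidently get a small a_i;
    uncertain or misclassified examples get a large a_i. The floor keeps every
    a_i > 0 so the inverse-probability weights below stay finite.
    """
    acceptance = []
    for p, y in zip(pilot_probs, labels):
        a = scale * (1.0 - p[y])
        acceptance.append(min(1.0, max(floor, a)))
    return acceptance


def poisson_subsample(acceptance: Sequence[float], seed: int = 0) -> List[Tuple[int, float]]:
    """Keep example i independently with probability a_i; return (index, weight 1/a_i)."""
    rng = random.Random(seed)
    return [(i, 1.0 / a) for i, a in enumerate(acceptance) if rng.random() < a]


def weighted_log_loss(
    model_probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    kept: Sequence[Tuple[int, float]],
) -> float:
    """Inverse-probability-weighted multiclass log loss over the subsample.

    Each example is kept with probability a_i and counted with weight 1/a_i,
    so the expected value of this sum equals the full-data log loss.
    """
    return sum(-w * math.log(model_probs[i][labels[i]]) for i, w in kept)


# Usage with made-up pilot probabilities for three examples and three classes.
pilot = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3], [0.2, 0.7, 0.1]]
y = [0, 0, 2]
acc = uncertainty_acceptance(pilot, y)  # approximately [0.1, 0.6, 0.9]
kept = poisson_subsample(acc, seed=1)
```

The design choice here is Poisson (independent per-example) sampling, which makes the inverse-probability correction a one-line weight; the paper's own estimator and its variance analysis may handle the bias correction differently.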
Recommendations
- Optimal subsampling for large sample logistic regression
- Optimal subsample selection for massive logistic regression with distributed data
- Multiclass-penalized logistic regression
- Probability estimation for large-margin classifiers
- Multi-output local Gaussian process regression: applications to uncertainty quantification
- Efficient posterior sampling for high-dimensional imbalanced logistic regression
- Logistic Discrimination Based on Regularized Local Likelihood Method
Cites work
- scientific article; zbMATH DE number 3984372
- A Generalization of Sampling Without Replacement From a Finite Universe
- Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)
- Fitting Logistic Regression Models in Stratified Case-Control Studies
- Local case-control sampling: efficient subsampling in imbalanced data sets
- On the Robustness of Weighted Methods for Fitting Models to Case–Control Data
- Sample Selection Bias Correction Theory
- Separate sample logistic discrimination
Cited in (24)
- Approximating Partial Likelihood Estimators via Optimal Subsampling
- Optimal subsampling for softmax regression
- Independence-Encouraging Subsampling for Nonparametric Additive Models
- Deterministic subsampling for logistic regression with massive data
- Optimal subsampling for large-scale quantile regression
- Optimal Poisson subsampling for softmax regression
- Optimal subsampling for multiplicative regression with massive data
- A two-stage optimal subsampling estimation for missing data problems with large-scale data
- Surprise sampling: improving and extending the local case-control sampling
- Information-based optimal subdata selection for big data logistic regression
- Optimal subsampling for estimation of dimension reduction directions
- Efficient subsampling for high-dimensional data
- Refitted cross-validation estimation for high-dimensional subsamples from low-dimension full data
- A Subsampling Method for Regression Problems Based on Minimum Energy Criterion
- Local case-control sampling: efficient subsampling in imbalanced data sets
- Automated scalable Bayesian inference via Hilbert coresets
- Optimal sample selection through uncertainty estimation and its application in deep learning
- Optimal Subsampling for Data Streams with Measurement Constrained Categorical Responses
- Model constraints independent optimal subsampling probabilities for softmax regression
- Subsampled one-step estimation for fast statistical inference
- DsubCox: a fast subsampling algorithm for Cox model with distributed and massive survival data
- A review on design inspired subsampling for big data
- Optimal subsampling for multinomial logistic models with big data
- A Subsampling Strategy for AIC-based Model Averaging with Generalized Linear Models
(MaRDI item Q2196246)