Local uncertainty sampling for large-scale multiclass logistic regression (Q2196246): Difference between revisions

When a data set is too large to fit a multiclass logistic regression with the available computing resources, a common remedy is to subsample it down to a size those resources can accommodate. Two types of class imbalance are relevant: marginal imbalance (MI), when some classes are much rarer than others, and conditional imbalance (CI), when the class labels are easy to predict for most of the observations. For binary classification under MI, case-control (CC) subsampling draws an equal number of observations uniformly from each class. In this paper, the authors first review an earlier subsampling scheme for binary logistic regression, local case-control (LCC) sampling. This scheme is shown to fare better than uniform random sampling in terms of the asymptotic variance of the resulting estimates. They then propose a general subsampling scheme for large-scale multiclass logistic regression, local uncertainty sampling (LUS). The method selects data points whose labels are conditionally uncertain given their local observations, as judged by a predicted probability distribution, and then fits a multiclass logistic model to the subsample to estimate the model parameter. Simulated data and real-world data sets, namely MNIST and Web-spam, are considered, and it is confirmed that LUS fares better than uniform sampling, CC sampling and LCC sampling under various settings. If the maximum likelihood estimator (MLE) based on the full sample of size $n$ has asymptotic variance $v$, then the LUS estimator, now based on a sample of size $n/e$ with $e>1$, has asymptotic variance less than $e v$.
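The binary LCC scheme mentioned in the review can be illustrated with a minimal sketch. It assumes the rule described in the cited local case-control sampling paper: a pilot estimate gives predicted probabilities $\tilde p(x)$, each observation is kept with probability $|y - \tilde p(x)|$, a logistic model is fitted to the kept observations, and the pilot coefficients are added back. The function names, the simulated data, and the pilot sample size are hypothetical choices for illustration only. The sketch does not implement the multiclass LUS acceptance rule of the paper under review.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton-Raphson MLE for binary logistic regression.
    X must already contain an intercept column; ridge only stabilises the solve."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ theta)
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        theta = theta + np.linalg.solve(hess, grad)
    return theta

def lcc_subsample_fit(X, y, theta_pilot, rng):
    """Keep each point with probability |y - p_pilot(x)|, fit on the kept
    points, then add the pilot coefficients back (assumed LCC correction)."""
    p_pilot = sigmoid(X @ theta_pilot)
    accept_prob = np.abs(y - p_pilot)      # large when the label is "surprising"
    keep = rng.random(len(y)) < accept_prob
    theta_sub = fit_logistic(X[keep], y[keep])
    return theta_sub + theta_pilot, keep

# Hypothetical simulated data with a rare positive class (marginal imbalance).
rng = np.random.default_rng(0)
n = 200_000
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
theta_true = np.array([-4.0, 1.5, -1.0])
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

# Pilot estimate from a small uniform subsample (size chosen arbitrarily).
pilot_idx = rng.choice(n, size=5_000, replace=False)
theta_pilot = fit_logistic(X[pilot_idx], y[pilot_idx])

theta_lcc, keep = lcc_subsample_fit(X, y, theta_pilot, rng)
print("fraction of data kept:", keep.mean())
print("LCC estimate:", theta_lcc)
```

In this sketch most of the easily predicted observations are discarded, so the second fit runs on a small fraction of the data while the corrected estimate still targets the full-sample parameter.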


reviewed by

T. J. Rao


zbMATH Keywords

binary and multiclass logistic regression


local case control sampling


local uncertainty sampling


MaRDI profile type

MaRDI publication profile


cites work

Separate sample logistic discrimination


Sample Selection Bias Correction Theory


Local case-control sampling: efficient subsampling in imbalanced data sets


Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)


A Generalization of Sampling Without Replacement From a Finite Universe


On the Robustness of Weighted Methods for Fitting Models to Case–Control Data


Q3747546


Fitting Logistic Regression Models in Stratified Case-Control Studies


Identifiers

zbMATH Open document ID

1452.62163


DOI

10.1214/19-AOS1867


Mathematics Subject Classification ID


Sitelinks

Mathematics (1 entry)

mardi Publication:2196246

@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / arXiv ID @@
+1604.08098
@@ Property / arXiv ID: 1604.08098 / rank @@
+Normal rank
@@ Property / cites work @@
+Separate sample logistic discrimination
@@ Property / cites work: Separate sample logistic discrimination / rank @@
+Normal rank
@@ Property / cites work @@
+Sample Selection Bias Correction Theory
@@ Property / cites work: Sample Selection Bias Correction Theory / rank @@
+Normal rank
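@@ Property / cites work @@
+Local case-control sampling: efficient subsampling in imbalanced data sets
@@ Property / cites work: Local case-control sampling: efficient subsampling in imbalanced data sets / rank @@
+Normal rank
@@ Property / cites work @@
+Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)
@@ Property / cites work: Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors) / rank @@
+Normal rank
@@ Property / cites work @@
+A Generalization of Sampling Without Replacement From a Finite Universe
@@ Property / cites work: A Generalization of Sampling Without Replacement From a Finite Universe / rank @@
+Normal rank
@@ Property / cites work @@
+On the Robustness of Weighted Methods for Fitting Models to Case–Control Data
@@ Property / cites work: On the Robustness of Weighted Methods for Fitting Models to Case–Control Data / rank @@
+Normal rank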
@@ Property / cites work @@
+Q3747546
@@ Property / cites work: Q3747546 / rank @@
+Normal rank
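@@ Property / cites work @@
+Fitting Logistic Regression Models in Stratified Case-Control Studies
@@ Property / cites work: Fitting Logistic Regression Models in Stratified Case-Control Studies / rank @@
+Normal rank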
@@ Property / OpenAlex ID @@
+W3043437699
@@ Property / OpenAlex ID: W3043437699 / rank @@
+Normal rank
@@ links / mardi / name / links / mardi / name @@
+Publication:2196246