Local uncertainty sampling for large-scale multiclass logistic regression (Q2196246)

From MaRDI portal
Property / MaRDI profile type: MaRDI publication profile
Property / arXiv ID: 1604.08098
Property / cites work: Separate sample logistic discrimination
Property / cites work: Sample Selection Bias Correction Theory
Property / cites work: Local case-control sampling: efficient subsampling in imbalanced data sets
Property / cites work: Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)
Property / cites work: A Generalization of Sampling Without Replacement From a Finite Universe
Property / cites work: On the Robustness of Weighted Methods for Fitting Models to Case-Control Data
Property / cites work: Q3747546
Property / cites work: Fitting Logistic Regression Models in Stratified Case-Control Studies
Property / OpenAlex ID: W3043437699
 

Latest revision as of 08:56, 30 July 2024

Language: English
Label: Local uncertainty sampling for large-scale multiclass logistic regression
Description: scientific article

    Statements

    Local uncertainty sampling for large-scale multiclass logistic regression (English)
    28 August 2020
    When a data set is too large for a multiclass logistic regression to be fitted with the available computing resources, a common remedy is to subsample it down to a size that those resources can accommodate. Two types of class imbalance are distinguished: marginal imbalance (MI), when some classes are much rarer than others, and conditional imbalance (CI), when the class labels are easy to predict for most of the observations. For binary classification under MI, case-control (CC) subsampling is used, which draws an equal number of observations uniformly from each class. In this paper, the authors review an earlier subsampling scheme for binary logistic regression, termed local case-control (LCC) sampling. This scheme is shown to fare better than uniform random sampling with respect to the asymptotic variance of the resulting estimates. Next, they propose a general subsampling scheme, local uncertainty sampling (LUS), for large-scale multiclass logistic regression problems. The method selects data points whose labels are conditionally uncertain given their local observations, as judged by the predicted probability distribution, and then fits a multiclass logistic model to the selected points to estimate the model parameter. Simulated data and two real-world data sets, MNIST and Web-spam, are considered, and it is confirmed that the LUS method fares better than uniform sampling, CC sampling and LCC sampling under various settings. If the MLE based on the full sample of size $n$ has asymptotic variance $v$, then the LUS estimator, based on a sample of size $n/e$, has asymptotic variance less than $ev$ for $e>1$.
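The LCC scheme mentioned in the review can be illustrated with a minimal sketch. It assumes the usual textbook description of local case-control sampling for the binary case: a pilot model gives a predicted probability for each point, a point is kept with probability equal to the absolute difference between its label and that prediction, a logistic model is fitted to the kept points, and the pilot coefficients are added back. The function names (`fit_logistic`, `lcc_subsample_fit`), the Newton solver and the simulated data are hypothetical choices made for this illustration. This is not the authors' code, and it is not the multiclass LUS acceptance rule, which the review does not spell out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Binary logistic regression with intercept, fitted by Newton's method."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ beta)
        grad = Xb.T @ (y - p)                      # score vector
        w = p * (1.0 - p)                          # per-point Bernoulli variance
        hess = (Xb * w[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)  # Newton update
    return beta

def lcc_subsample_fit(X, y, pilot_beta, rng):
    """Assumed LCC rule: keep (x, y) with probability |y - p_pilot(x)|."""
    Xb = np.column_stack([np.ones(len(X)), X])
    p_pilot = sigmoid(Xb @ pilot_beta)
    keep = rng.random(len(y)) < np.abs(y - p_pilot)
    beta_sub = fit_logistic(X[keep], y[keep])
    # The subsample fit estimates (true - pilot), so add the pilot back.
    return beta_sub + pilot_beta, keep

# Illustrative data with a rare positive class (all values are made up).
rng = np.random.default_rng(0)
n = 100_000
X = rng.normal(size=(n, 2))
true_beta = np.array([-3.0, 1.0, -0.5])
y = (rng.random(n) < sigmoid(true_beta[0] + X @ true_beta[1:])).astype(float)

# Pilot estimate from a small uniform subsample.
pilot_idx = rng.choice(n, size=5_000, replace=False)
pilot_beta = fit_logistic(X[pilot_idx], y[pilot_idx])

beta_lcc, keep = lcc_subsample_fit(X, y, pilot_beta, rng)
print("fraction of points kept:", keep.mean())
print("LCC estimate:", beta_lcc)
```

Under this rule, points that the pilot model already predicts confidently and correctly are rarely kept, so the subsample concentrates on conditionally uncertain observations, which is the idea the review attributes to the LUS generalisation as well.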
    binary and multiclass logistic regression
    local case control sampling
    local uncertainty sampling
