Classification with many classes: challenges and pluses
From MaRDI portal
Publication:2008227
DOI: 10.1016/J.JMVA.2019.104536; zbMATH Open: 1428.62264; arXiv: 1506.01567; OpenAlex: W2964910440; MaRDI QID: Q2008227; FDO: Q2008227
Authors: Marianna Pensky, Felix P. Abramovich
Publication date: 22 November 2019
Published in: Journal of Multivariate Analysis
Abstract: The objective of the paper is to study the accuracy of multi-class classification in a high-dimensional setting where the number of classes is also large (the "large L, large p, small n" model). While this problem arises in many practical applications and many techniques have recently been developed for its solution, to the best of our knowledge no one has provided a rigorous theoretical analysis of this important setup. The purpose of the present paper is to fill this gap. We consider one of the most common settings, classification of high-dimensional normal vectors where, unlike under standard assumptions, the number of classes may be large. We derive non-asymptotic conditions on the effects of significant features, as well as the lower and upper bounds on the distances between classes required for successful feature selection and classification with a given accuracy. Furthermore, we study an asymptotic setup where the number of classes diverges with the dimension of the feature space while the number of samples per class may be limited. We point out an interesting and, at first glance, somewhat counter-intuitive phenomenon: a large number of classes may be a "blessing" rather than a "curse", since in certain settings the precision of classification can improve as the number of classes grows. This is due to more accurate feature selection: even weaker significant features, which are not strong enough to manifest themselves in a coarse classification, have a stronger impact as the number of classes increases because they are shared across the classes. We supplement our theoretical investigation with a simulation study and a real data example where we again observe this phenomenon.
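The feature-selection mechanism behind the "blessing" can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes a hypothetical model with unit-variance Gaussian features, d significant features whose class means are +a or -a with a random sign per class, screening of each feature by the between-class chi-square statistic, and a threshold taken from the Laurent-Massart tail bound for chi-square variables. The function name `simulate_selection` and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def simulate_selection(num_classes, p=200, d=10, n=1, a=1.5, seed=0):
    """Toy chi-square feature screening for a Gaussian multi-class model.

    Hypothetical model (an illustrative assumption, not the paper's setup):
    every observation is N(mu_c, I_p); features 0..d-1 are significant with
    mu_{c,j} = +a or -a (random sign per class), all other means are 0.
    Assumes num_classes >= 2.
    Returns (recall over the d significant features, number of false positives).
    """
    rng = random.Random(seed)
    L = num_classes
    # Class mean vectors: a shared weak signal of size a on the first d features.
    mu = [[a * rng.choice((-1.0, 1.0)) if j < d else 0.0 for j in range(p)]
          for _ in range(L)]
    # The mean of n i.i.d. N(mu, 1) draws is N(mu, 1/n), so sample it directly.
    xbar = [[mu[c][j] + rng.gauss(0.0, 1.0 / math.sqrt(n)) for j in range(p)]
            for c in range(L)]

    # For an irrelevant feature, S_j = n * sum_c (xbar_cj - grand_j)^2 ~ chi2_{L-1}.
    k = L - 1
    x = 2.0 * math.log(p)
    # Laurent-Massart bound: P(chi2_k >= k + 2*sqrt(k*x) + 2*x) <= exp(-x) = p**-2,
    # so a union bound keeps the chance of any false positive below 1/p.
    threshold = k + 2.0 * math.sqrt(k * x) + 2.0 * x

    selected = []
    for j in range(p):
        grand = sum(xbar[c][j] for c in range(L)) / L
        s_j = n * sum((xbar[c][j] - grand) ** 2 for c in range(L))
        if s_j > threshold:
            selected.append(j)

    true_hits = sum(1 for j in selected if j < d)
    return true_hits / d, len(selected) - true_hits


if __name__ == "__main__":
    # Recall of the weak shared features as the number of classes grows.
    for L in (2, 10, 30, 100):
        recall, false_pos = simulate_selection(L)
        print(f"L={L:3d}  recall={recall:.2f}  false positives={false_pos}")
```

In this toy model the non-centrality of S_j for a significant feature is n·a²·L·(1 - m²), where m is the average of the random signs, so it grows roughly linearly in L, while the threshold grows only like L + O(sqrt(L log p)). Weak features that are invisible with two classes therefore become detectable once they are shared across many classes. The sketch covers only the feature-selection step, and note that the total sample size nL also grows with L here.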
Full work available at URL: https://arxiv.org/abs/1506.01567
Recommendations
- Model selection for classification with a large number of classes
- High-dimensional classification when useful information comes from many, perhaps all features
- Impossibility of successful classification when useful features are rare and weak
- Bayesian feature selection for classification with possibly large number of classes
- A procedure of linear discrimination analysis with detected sparsity structure for high-dimensional multi-class classification
MSC classification: Bayesian inference (62F15); Classification and discrimination; cluster analysis (statistical aspects) (62H30)
Cites Work
- Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism
- High-dimensional classification using features annealed independence rules
- Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations
- Impossibility of successful classification when useful features are rare and weak
- Feature selection by higher criticism thresholding achieves the optimal phase diagram
- Multicategory Support Vector Machines
- Sparse linear discriminant analysis by thresholding for high dimensional data
- Computing the distribution of quadratic forms in normal variables
- Classification of sparse high-dimensional vectors
- Title not available
- 10.1162/15324430260185628
- On the consistency of multiclass classification methods
- Theory of Classification: a Survey of Some Recent Advances
- An alternative point of view on Lepski's method
- A framework for kernel-based multi-category classification
- Title not available
- Bayesian feature selection for classification with possibly large number of classes
- Training highly multiclass classifiers
- Multi-class classification in image analysis via error-correcting output codes
Cited In (5)
- Model selection for classification with a large number of classes
- Impossibility of successful classification when useful features are rare and weak
- Extrapolating expected accuracies for large multi-class problems
- Training highly multiclass classifiers
- Optimal discriminant analysis in high-dimensional latent factor models