Optimal properties of centroid-based classifiers for very high-dimensional data
Publication:2380097
Abstract: We show that scale-adjusted versions of the centroid-based classifier enjoy optimal properties when used to discriminate between two very high-dimensional populations where the principal differences are in location. The scale adjustment removes the tendency of scale differences to confound differences in means. Certain other distance-based methods, for example, those founded on nearest-neighbor distance, do not have optimal performance in the sense that we propose. Our results permit varying degrees of sparsity and signal strength to be treated, and require only mild conditions on dependence of vector components. Additionally, we permit the marginal distributions of vector components to vary extensively. In addition to providing theory, we explore numerical properties of a centroid-based classifier, and show that these features reflect theoretical accounts of performance.
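The abstract does not spell out the form of the scale adjustment, so the following is only a minimal sketch under stated assumptions. It assumes the adjustment subtracts an estimate of tr(Sigma)/sample size from each squared distance to a centroid, since that term is what inflates the distance to the centroid of a small or highly variable training sample. The function name `scale_adjusted_centroid_classify` and the arguments `z`, `X`, `Y` are hypothetical and do not come from the paper.

```python
import numpy as np


def scale_adjusted_centroid_classify(z, X, Y):
    """Assign z to the X population (return 0) or the Y population (return 1).

    Assumed form of the adjustment (not confirmed by the abstract):
    for z independent of the training data, the expected squared distance
    from z to the sample centroid of X exceeds the expected squared distance
    to the true mean of X by tr(Sigma_X) / m. Subtracting an estimate of
    that term keeps differences in scale from masquerading as differences
    in location.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    z = np.asarray(z, dtype=float)
    m, n = X.shape[0], Y.shape[0]

    # Sample centroids of the two training sets.
    xbar = X.mean(axis=0)
    ybar = Y.mean(axis=0)

    # Raw squared Euclidean distances from z to each centroid.
    dx = np.sum((z - xbar) ** 2)
    dy = np.sum((z - ybar) ** 2)

    # Estimated tr(Sigma) / sample size, using unbiased per-component variances.
    adj_x = X.var(axis=0, ddof=1).sum() / m
    adj_y = Y.var(axis=0, ddof=1).sum() / n

    # Compare the adjusted distances; ties go to the X population.
    return 0 if (dx - adj_x) <= (dy - adj_y) else 1
```

With `X = [[0, 0], [2, 0]]` and `Y = [[3, 0], [3, 0]]`, the point `z = [2.1, 0]` is closer to the Y centroid in raw distance, but the adjustment for the spread of X moves it to the X population. That is the kind of scale effect the adjustment is meant to remove, in this assumed form.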
Recommendations
- Scale adjustments for classifiers in high-dimensional, low sample size settings
- A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data
- Median-based classifiers for high-dimensional data
- Robust centroid based classification with minimum error rates for high dimension, low sample size data
- Quantile-based classifiers
Cites work
- scientific article; zbMATH DE number 3806754
- Bandwidth choice for nonparametric classification
- Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data
- Consistent nonparametric regression. Discussion
- Geometric Representation of High Dimension, Low Sample Size Data
- Long- and short-range correlations in genome organization
- Nonlinear time series. Nonparametric and parametric methods
- Pattern classification.
- Regularized estimation of large covariance matrices
- Scale adjustments for classifiers in high-dimensional, low sample size settings
- The elements of statistical learning. Data mining, inference, and prediction
- Theoretical Measures of Relative Performance of Classifiers for High Dimensional Data with Small Sample Sizes
Cited in (5)