Optimal properties of centroid-based classifiers for very high-dimensional data

From MaRDI portal
Publication:2380097

DOI: 10.1214/09-AOS736; zbMATH Open: 1183.62104; arXiv: 1002.4781; MaRDI QID: Q2380097; FDO: Q2380097


Authors: Tung H. Pham, Peter Hall


Publication date: 24 March 2010

Published in: The Annals of Statistics

Abstract: We show that scale-adjusted versions of the centroid-based classifier enjoy optimal properties when used to discriminate between two very high-dimensional populations where the principal differences are in location. The scale adjustment removes the tendency of scale differences to confound differences in means. Certain other distance-based methods, for example, those founded on nearest-neighbor distance, do not have optimal performance in the sense that we propose. Our results permit varying degrees of sparsity and signal strength to be treated, and require only mild conditions on dependence of vector components. Additionally, we permit the marginal distributions of vector components to vary extensively. In addition to providing theory, we explore numerical properties of a centroid-based classifier and show that these features reflect theoretical accounts of performance.
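
As an illustration of the idea summarised in the abstract, the following is a minimal sketch of one common form of scale adjustment for a two-class centroid classifier. It is an assumption made for illustration, not the exact statistic defined in the paper: the raw centroid score ||z - Ybar||^2 - ||z - Xbar||^2 is corrected by subtracting tr(S_Y)/m - tr(S_X)/n, the estimated bias caused by unequal scales and sample sizes. All function names (mean_vector, total_variance, fit, classify) are hypothetical.

```python
# Sketch of a scale-adjusted centroid classifier (illustrative assumption,
# not the paper's exact definition). Pure standard-library Python.

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def total_variance(rows, centre):
    """Trace of the unbiased sample covariance matrix (sum of component variances)."""
    n = len(rows)
    return sum((v - c) ** 2 for row in rows for v, c in zip(row, centre)) / (n - 1)

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def fit(xs, ys):
    """Return both centroids and the assumed scale correction tr(S_Y)/m - tr(S_X)/n."""
    xbar, ybar = mean_vector(xs), mean_vector(ys)
    adj = total_variance(ys, ybar) / len(ys) - total_variance(xs, xbar) / len(xs)
    return xbar, ybar, adj

def classify(z, model):
    """Assign z to "X" if the corrected centroid score is positive, else to "Y"."""
    xbar, ybar, adj = model
    score = sq_dist(z, ybar) - sq_dist(z, xbar) - adj
    return "X" if score > 0 else "Y"

# Toy usage with made-up training data.
xs = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
ys = [[10.0, 10.0], [12.0, 10.0], [10.0, 12.0], [12.0, 12.0]]
model = fit(xs, ys)
print(classify([1.0, 2.0], model))
```

Without the correction term, the expected score is shifted by the same amount for observations from either population, so the class with the noisier centroid estimate is systematically penalised; subtracting the estimated shift is intended to leave a score centred on the difference in means.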


Full work available at URL: https://arxiv.org/abs/1002.4781





Cited In (5)
