Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions

From MaRDI portal
Publication:5281049

DOI10.1109/TIT.2011.2104630zbMATH Open1366.62117arXiv0909.2353MaRDI QIDQ5281049FDOQ5281049


Authors: Ery Arias-Castro Edit this on Wikidata


Publication date: 27 July 2017

Published in: IEEE Transactions on Information Theory (Search for Journal in Brave)

Abstract: In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guaranties for a few emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; the spectral clustering method of Ng, Jordan and Weiss; and hierarchical clustering with single linkage. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first two methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.


Full work available at URL: https://arxiv.org/abs/0909.2353







Cited In (17)





This page was built for publication: Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q5281049)