Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions
From MaRDI portal
Publication:5281049
Abstract: In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guarantees for a few emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; the spectral clustering method of Ng, Jordan and Weiss; and hierarchical clustering with single linkage. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first two methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.
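The first method named in the abstract, extracting connected components from a neighborhood graph, can be illustrated with a minimal sketch. This is not the paper's procedure: the function name `neighborhood_graph_clusters`, the fixed radius `eps`, and the sample points are assumptions made for illustration, and the scale here is picked by hand rather than by the local scaling rule the abstract mentions.

```python
import math


def neighborhood_graph_clusters(points, eps):
    """Label each point by its connected component in the eps-neighborhood graph.

    Two points are joined by an edge when their Euclidean distance is at most
    eps (a hand-picked scale, assumed for this sketch). Components are tracked
    with a small union-find structure.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only pairwise distances are used, never the coordinates directly.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    labels = []
    for i in range(n):
        r = find(i)
        labels.append(roots.setdefault(r, len(roots)))
    return labels


# Hypothetical data: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
labels = neighborhood_graph_clusters(points, eps=0.5)
print(labels)
```

With these sample points and `eps=0.5`, the two tight groups form separate components and the isolated point ends up alone in its own component, which is one simple way such a method can set outliers apart. The double loop is quadratic in the number of points, so this is only suitable for small inputs.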
Cited in (17)
- Robust subspace clustering
- On the quality of k-means clustering based on grouped data
- \(K\)-prototypes based clustering algorithm for data mixed with numeric and categorical values
- A new nonparametric pairwise clustering algorithm based on iterative estimation of distance profiles
- On clustering procedures and nonparametric mixture estimation
- Clustering the mixed panel dataset using Gower's distance and k-prototypes algorithms
- scientific article; zbMATH DE number 7255037
- Diffusion \(K\)-means clustering on manifolds: provable exact recovery via semidefinite relaxations
- scientific article; zbMATH DE number 5905610
- The coreness and H-index of random geometric graphs
- The shape of data and probability measures
- scientific article; zbMATH DE number 7626762
- Learning by unsupervised nonlinear diffusion
- A multiscale environment for learning by diffusion
- Statistical analysis of a hierarchical clustering algorithm with outliers
- Balancing geometry and density: path distances on high-dimensional data
- Spectral clustering based on local linear approximations