Recovery guarantees for exemplar-based clustering
From MaRDI portal
Publication:897656
DOI10.1016/J.IC.2015.09.002zbMATH Open1333.62165arXiv1309.3256OpenAlexW2167930540MaRDI QIDQ897656FDOQ897656
Publication date: 7 December 2015
Published in: Information and Computation (Search for Journal in Brave)
Abstract: For a certain class of distributions, we prove that the linear programming relaxation of -medoids clustering---a variant of -means clustering where means are replaced by exemplars from within the dataset---distinguishes points drawn from nonoverlapping balls with high probability once the number of points drawn and the separation distance between any two balls are sufficiently large. Our results hold in the nontrivial regime where the separation distance is small enough that points drawn from different balls may be closer to each other than points drawn from the same ball; in this case, clustering by thresholding pairwise distances between points can fail. We also exhibit numerical evidence of high-probability recovery in a substantially more permissive regime.
Full work available at URL: https://arxiv.org/abs/1309.3256
Recommendations
Classification and discrimination; cluster analysis (statistical aspects) (62H30) Linear programming (90C05) Boolean programming (90C09)
Cites Work
- Algorithm AS 136: A K-Means Clustering Algorithm
- Clustering by Passing Messages Between Data Points
- Least squares quantization in PCM
- NP-hardness of Euclidean sum-of-squares clustering
- Convex optimization for the planted \(k\)-disjoint-clique problem
- Guaranteed clustering and biclustering via semidefinite programming
- Approximating k-median via pseudo-approximation
- Correlation clustering
- Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP
- Improved Approximation Algorithms for the Uncapacitated Facility Location Problem
- Title not available (Why is that?)
- Title not available (Why is that?)
- Title not available (Why is that?)
- A 1.488 Approximation Algorithm for the Uncapacitated Facility Location Problem
- On the Complexity of Some Common Geometric Location Problems
- Algorithms for graph partitioning on the planted partition model
- Efficiently learning mixtures of two Gaussians
- A new greedy approach for facility location problems
- Title not available (Why is that?)
- Learning Theory
- Learning Theory
- An Optimal Bifactor Approximation Algorithm for the Metric Uncapacitated Facility Location Problem
- Approximation Algorithms for Metric Facility Location Problems
- Local Search Heuristics for k-Median and Facility Location Problems
- Worst-Case and Probabilistic Analysis of a Geometric Location Problem
- A spectral algorithm for learning mixture models
- Learning mixtures of arbitrary gaussians
- A new partitioning around medoids algorithm
- Learning Mixtures of Gaussians in High Dimensions
- Title not available (Why is that?)
- Title not available (Why is that?)
- Random Projection Trees for Vector Quantization
- PAC Learning Axis-Aligned Mixtures of Gaussians with No Separation Assumption
- A Binary Variable Model for Affinity Propagation
Cited In (9)
- Discrete facility location in machine learning
- Hidden Integrality and Semirandom Robustness of SDP Relaxation for Sub-Gaussian Mixture Model
- Probably certifiably correct \(k\)-means clustering
- Exact Clustering of Weighted Graphs via Semidefinite Programming
- Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization
- Clustering subgaussian mixtures by semidefinite programming
- Convex optimization for the densest subgraph and densest submatrix problems
- \(k\)-median: exact recovery in the extended stochastic ball model
- Local Versions of Sum-of-Norms Clustering
Uses Software
This page was built for publication: Recovery guarantees for exemplar-based clustering
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q897656)