Robust \(k\)-means clustering for distributions with two moments
Publication:2054489
Abstract: We consider robust algorithms for the \(k\)-means clustering problem, where a quantizer is constructed based on independent observations. Our main results are median-of-means based non-asymptotic excess distortion bounds that hold under a two bounded moments assumption in a general separable Hilbert space. In particular, our results extend the renowned asymptotic result of Pollard (1981), who showed that the existence of two moments is sufficient for strong consistency of an empirically optimal quantizer in \(\mathbb{R}^d\). In the special case of clustering in \(\mathbb{R}^d\), under two bounded moments, we prove matching (up to constant factors) non-asymptotic upper and lower bounds on the excess distortion, which depend on the probability mass of the lightest cluster of an optimal quantizer. Our bounds have sub-Gaussian form, and the proofs are based on versions of uniform bounds for robust mean estimators.
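The robust mean estimator underlying these bounds is the median-of-means construction: split the sample into disjoint blocks, average within each block, and take the median of the block means, which remains accurate under heavy tails as long as two moments exist. A minimal illustrative sketch (not the paper's clustering algorithm; function names, block count, and the test distribution are our own choices):

```python
import numpy as np

def median_of_means(x, n_blocks):
    """Median-of-means estimate of the mean of a (possibly heavy-tailed) sample.

    Splits the sample into n_blocks disjoint blocks, averages each block,
    and returns the coordinate-wise median of the block means.
    """
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x, n_blocks)          # disjoint blocks of near-equal size
    block_means = np.array([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

# Heavy-tailed example: Student-t with 2.5 degrees of freedom has a finite
# second moment (so the "two bounded moments" assumption holds) but heavy tails.
rng = np.random.default_rng(0)
sample = rng.standard_t(df=2.5, size=10_000)      # true mean is 0
est = median_of_means(sample, n_blocks=20)
print(float(est))
```

Unlike the empirical mean, whose deviations under heavy tails are only polynomial in the confidence level, the median-of-means estimator enjoys sub-Gaussian-type deviation bounds, which is what gives the excess distortion bounds in the paper their sub-Gaussian form.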
Recommendations
- Robust moment estimation and improved clustering via sum of squares
- Robust interval type-2 possibilistic \(C\)-means clustering
- Robust and sparse \(k\)-means clustering for high-dimensional data
- Robust hierarchical \(k\)-center clustering
- Robust estimation of the mean vector for high-dimensional data set using robust clustering
- A robust algorithm for cluster initialization using uniform effect of \(k\)-means
- A review of robust clustering methods
- Some refinements of rough \(k\)-means clustering
- Robust clustering by double bisection crossing minimization
Cites Work
- scientific article; zbMATH DE number 3790208
- scientific article; zbMATH DE number 3446442
- $K$-Dimensional Coding Schemes in Hilbert Spaces
- A Vector-Contraction Inequality for Rademacher Complexities
- Concentration inequalities. A nonasymptotic theory of independence
- Empirical risk minimization for heavy-tailed losses
- Extensions of Lipschitz mappings into a Hilbert space
- Fast rates for empirical vector quantization
- Foundations of quantization for probability distributions
- High-dimensional probability. An introduction with applications in data science
- Improved Minimax Bounds on the Test and Training Distortion of Empirically Designed Vector Quantizers
- Near-optimal mean estimators with respect to general norms
- Nonasymptotic bounds for vector quantization in Hilbert spaces
- On Hölder fields clustering
- On the Performance of Clustering in Hilbert Spaces
- Quantization and clustering with Bregman divergences
- Regularization, sparse recovery, and median-of-means tournaments
- Robust Bregman clustering
- Robust classification via MOM minimization
- Robust covariance estimation under \(L_4\)-\(L_2\) norm equivalence
- Robust machine learning by median-of-means: theory and practice
- Strong consistency of k-means clustering
- Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries
- Sub-Gaussian estimators of the mean of a random vector
- Testing the manifold hypothesis
- The minimax distortion redundancy in empirical quantizer design
- The space complexity of approximating the frequency moments
- Theory of Classification: a Survey of Some Recent Advances
- Upper and lower bounds for stochastic processes. Modern methods and classical problems
Cited In (12)
- Snipping for robust \(k\)-means clustering under component-wise contamination
- Robustifying Markowitz
- Robust moment estimation and improved clustering via sum of squares
- Quantization/clustering: when and why does \(k\)-means work?
- On the minimum of the mean-squared error in 2-means clustering
- Distribution-free robust linear regression
- Robustification of the \(k\)-means clustering problem and tailored decomposition methods: when more conservative means more accurate
- K-bMOM: A robust Lloyd-type clustering algorithm based on bootstrap median-of-means
- On Hölder fields clustering
- Topics in robust statistical learning
- Upper bound estimations of misclassification rate in the heteroscedastic clustering model with sub-Gaussian noises
- Also for \(k\)-means: more data does not imply better performance