Dimensionality reduction for k-means clustering and low rank approximation

From MaRDI portal
Publication:2941504

DOI10.1145/2746539.2746569zbMATH Open1321.68398arXiv1410.6801OpenAlexW2133157266MaRDI QIDQ2941504FDOQ2941504


Authors: Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, Madalina Persu Edit this on Wikidata


Publication date: 21 August 2015

Published in: Proceedings of the forty-seventh annual ACM symposium on Theory of Computing (Search for Journal in Brave)

Abstract: We show how to approximate a data matrix mathbfA with a much smaller sketch mathbfildeA that can be used to solve a general class of constrained k-rank approximation problems to within (1+epsilon) error. Importantly, this class of problems includes k-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just O(k) dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For k-means dimensionality reduction, we provide (1+epsilon) relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only `cover' a good subspace for , but can be used directly to compute this subspace. Finally, for k-means clustering, we show how to achieve a (9+epsilon) approximation by Johnson-Lindenstrauss projecting data points to just O(logk/epsilon2) dimensions. This gives the first result that leverages the specific structure of k-means to achieve dimension independent of input size and sublinear in k.


Full work available at URL: https://arxiv.org/abs/1410.6801




Recommendations




Cites Work


Cited In (41)

Uses Software





This page was built for publication: Dimensionality reduction for \(k\)-means clustering and low rank approximation

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2941504)