Model assisted variable clustering: minimax-optimal recovery and algorithms
From MaRDI portal
Abstract: Model-based clustering defines population level clusters relative to a model that embeds notions of similarity. Algorithms tailored to such models yield estimated clusters with a clear statistical interpretation. We take this view here and introduce the class of G-block covariance models as a background model for variable clustering. In such models, two variables in a cluster are deemed similar if they have similar associations will all other variables. This can arise, for instance, when groups of variables are noise corrupted versions of the same latent factor. We quantify the difficulty of clustering data generated from a G-block covariance model in terms of cluster proximity, measured with respect to two related, but different, cluster separation metrics. We derive minimax cluster separation thresholds, which are the metric values below which no algorithm can recover the model-defined clusters exactly, and show that they are different for the two metrics. We therefore develop two algorithms, COD and PECOK, tailored to G-block covariance models, and study their minimax-optimality with respect to each metric. Of independent interest is the fact that the analysis of the PECOK algorithm, which is based on a corrected convex relaxation of the popular K-means algorithm, provides the first statistical analysis of such algorithms for variable clustering. Additionally, we contrast our methods with another popular clustering method, spectral clustering, specialized to variable clustering, and show that ensuring exact cluster recovery via this method requires clusters to have a higher separation, relative to the minimax threshold. Extensive simulation studies, as well as our data analyses, confirm the applicability of our approach.
Recommendations
Cites work
- scientific article; zbMATH DE number 6381735 (Why is no real title available?)
- Adaptive estimation in structured factor models with applications to overlapping clustering
- Approximating K‐means‐type Clustering via Semidefinite Programming
- Clustering subgaussian mixtures by semidefinite programming
- Community detection in sparse networks via Grothendieck's inequality
- Concentration inequalities and moment bounds for sample covariance operators
- Consistency of spectral clustering in stochastic block models
- Entrywise eigenvector analysis of random matrices with low expected rank
- Exact recovery in the Ising blockmodel
- Least squares quantization in PCM
- Model assisted variable clustering: minimax-optimal recovery and algorithms
- Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data
- Model-based clustering of high-dimensional data: a review
- On semidefinite relaxations for the block model
- Optimization via low-rank approximation for community detection in networks
- Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices
- The hardness of approximation of Euclidean \(k\)-means
Cited in
(10)- Adaptive estimation in structured factor models with applications to overlapping clustering
- A Doubly Enhanced EM Algorithm for Model-Based Tensor Clustering
- High-dimensional inference for cluster-based graphical models
- Posterior consistency of factor dimensionality in high-dimensional sparse factor models
- Partial recovery bounds for clustering with the relaxed \(K\)-means
- Model assisted variable clustering: minimax-optimal recovery and algorithms
- scientific article; zbMATH DE number 7625163 (Why is no real title available?)
- Detecting approximate replicate components of a high-dimensional random vector with latent structure
- Paralinear distance and its algorithm for hierarchical clustering of high-dimensional discrete variables
- Efficient, certifiably optimal clustering with applications to latent variable graphical models
This page was built for publication: Model assisted variable clustering: minimax-optimal recovery and algorithms
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2176610)