CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality (Q2414086)

From MaRDI portal
Language: English
Label: CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality
Description: scientific article

    Statements

    CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality (English)
    10 May 2019
    This paper focuses on clustering and discriminant analysis for high-dimensional Gaussian mixtures; such high-dimensional clustering problems arise, for instance, in genetic data. A procedure called CHIME is proposed, which combines the EM algorithm with a direct estimation method for the discriminant vector. After a suitably chosen number of iterations, CHIME provides good estimates of the mixture parameters, and both the discriminant vector estimator and the excess misclustering error attain minimax optimal rates of convergence. Two conditions are required for CHIME to work well. First, the initial values of the mixture parameters must not be too far from their true values; the authors suggest the Hardt-Price algorithm to obtain a satisfactory initialization. Second, the discriminant vector must be sparse. The algorithm is then adapted to the low-dimensional Gaussian mixture clustering problem, and it is shown that the optimality properties are preserved. It is also noted that the estimators of the Gaussian mixture parameters given by CHIME achieve the same convergence rate as the maximum likelihood estimators in the model with known sample labels. The paper first treats the two-class Gaussian mixture and then extends the results to multi-class Gaussian mixtures. Simulation studies as well as an application to glioblastoma gene expression data are presented and discussed.
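
    To fix ideas, the following is a minimal numerical sketch, not the authors' implementation: a toy EM iteration for a two-component Gaussian mixture with identity covariance, in which the discriminant vector beta = mu2 - mu1 is soft-thresholded after each update to mimic the sparsity assumption. The names (toy_sparse_em, soft_threshold, lam) are illustrative, and the soft-thresholding step is only a crude surrogate for the penalised direct estimation of the discriminant vector used in CHIME.

    import numpy as np

    def soft_threshold(v, lam):
        """Entrywise soft-thresholding; enforces sparsity on the discriminant vector."""
        return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

    def toy_sparse_em(X, mu1, mu2, omega=0.5, lam=0.1, n_iter=20):
        """Toy EM for a two-component Gaussian mixture with identity covariance.

        With Sigma = I the discriminant vector is beta = mu2 - mu1; it is
        soft-thresholded here as a crude stand-in for the sparse, direct
        estimation of the discriminant vector described in the paper.
        """
        for _ in range(n_iter):
            # E-step: responsibilities from the (sparsified) linear discriminant rule.
            beta = soft_threshold(mu2 - mu1, lam)
            scores = (X - (mu1 + mu2) / 2.0) @ beta + np.log(omega / (1.0 - omega))
            gamma = 1.0 / (1.0 + np.exp(-scores))

            # M-step: weighted updates of the mixing proportion and the two means.
            omega = gamma.mean()
            mu1 = X.T @ (1.0 - gamma) / (1.0 - gamma).sum()
            mu2 = X.T @ gamma / gamma.sum()
        return omega, mu1, mu2, soft_threshold(mu2 - mu1, lam)

    # Usage on simulated data with a sparse mean difference; the starting means
    # are placed near the truth by hand here, whereas the paper relies on a
    # Hardt-Price-type initialization.
    rng = np.random.default_rng(0)
    n, p = 200, 50
    labels = rng.random(n) < 0.5
    true_mu = np.zeros(p)
    true_mu[:5] = 2.0
    X = rng.standard_normal((n, p)) + np.outer(labels, true_mu)
    omega_hat, mu1_hat, mu2_hat, beta_hat = toy_sparse_em(X, np.zeros(p), true_mu + 0.3)
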
    high-dimensional data
    unsupervised learning
    Gaussian mixture model
    EM algorithm
    misclustering error
    minimax optimality
