CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality (Q2414086): Difference between revisions
From MaRDI portal
Revision as of 07:06, 5 March 2024
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality | scientific article | |
Statements
CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality (English)
0 references
10 May 2019
0 references
Clustering and discriminant analysis for high-dimensional Gaussian mixtures are the focus of this paper. A procedure called CHIME is proposed, which is based on the EM algorithm and a direct estimation method for the discriminant vector. The high-dimensional clustering problem arises, for instance, in genetic data. After a suitably chosen number of iterations, CHIME provides good estimates of the parameters. The discriminant vector estimator and the excess misclustering error attain minimax optimal rates of convergence. The CHIME algorithm requires two conditions to work well. First, the initial values of the mixture parameters must not be too far from their true values. The authors suggest using the Hardt-Price algorithm to obtain a satisfactory initialization. Second, the discriminant vector must be sparse. Next, the algorithm is adapted to the low-dimensional Gaussian mixture clustering problem, and it is shown that the optimality properties are preserved. It is noted that the estimators of the Gaussian mixture parameters given by CHIME achieve the same convergence rate as the maximum likelihood estimators obtained in the model with known sample labels. The paper first considers two-class Gaussian mixtures; the results are then extended to multi-class Gaussian mixtures. Simulation studies as well as an application to glioblastoma gene expression data are presented and discussed.
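As an illustration of the kind of iteration the review describes, the following is a minimal sketch of a plain EM algorithm for a two-class Gaussian mixture with a shared covariance matrix, in which the E-step is written in terms of a discriminant vector. It is not the CHIME procedure itself: the discriminant vector is obtained here by an ordinary linear solve rather than by the sparse direct estimation method used in the paper, so the sketch only suits the low-dimensional case. The function names `em_two_class` and `posterior_class2`, the `ridge` parameter, and the initial mixing weight of 0.5 are assumptions made for this example.

```python
import numpy as np


def posterior_class2(X, omega, mu1, mu2, beta):
    """Posterior probability of class 2 under a shared-covariance mixture.

    With a common covariance, the log density ratio of class 2 to class 1
    at x equals (x - (mu1 + mu2) / 2)^T beta, where beta = Sigma^{-1}(mu2 - mu1).
    """
    logit = (X - (mu1 + mu2) / 2) @ beta + np.log(omega / (1 - omega))
    logit = np.clip(logit, -50, 50)  # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-logit))


def em_two_class(X, mu1, mu2, n_iter=50, ridge=1e-6):
    """Plain EM for a two-class Gaussian mixture (hypothetical helper).

    mu1, mu2 are initial mean guesses; they should be reasonably close
    to the true means, mirroring the initialization condition in the text.
    """
    n, p = X.shape
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    omega = 0.5  # assumed initial mixing weight of class 2
    sigma = np.cov(X, rowvar=False) + ridge * np.eye(p)

    for _ in range(n_iter):
        # Discriminant vector from the current parameters (dense solve,
        # standing in for the sparse estimator of the paper).
        beta = np.linalg.solve(sigma, mu2 - mu1)

        # E-step: soft class memberships.
        gamma = posterior_class2(X, omega, mu1, mu2, beta)

        # M-step: update weight, means, and pooled covariance.
        omega = float(np.clip(gamma.mean(), 1e-6, 1 - 1e-6))
        mu1 = ((1 - gamma)[:, None] * X).sum(axis=0) / (1 - gamma).sum()
        mu2 = (gamma[:, None] * X).sum(axis=0) / gamma.sum()
        r1, r2 = X - mu1, X - mu2
        sigma = (r1.T @ ((1 - gamma)[:, None] * r1)
                 + r2.T @ (gamma[:, None] * r2)) / n
        sigma += ridge * np.eye(p)

    beta = np.linalg.solve(sigma, mu2 - mu1)
    gamma = posterior_class2(X, omega, mu1, mu2, beta)
    labels = (gamma > 0.5).astype(int)  # 1 means class 2
    return omega, mu1, mu2, sigma, beta, labels
```

Calling `em_two_class(X, mu1_init, mu2_init)` on an `n`-by-`p` data matrix returns the estimated mixing weight, means, pooled covariance, discriminant vector, and hard cluster labels. In the high-dimensional regime the dense solve would be ill-posed, which is where a sparsity assumption on the discriminant vector becomes essential.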
0 references
high-dimensional data
0 references
unsupervised learning
0 references
Gaussian mixture model
0 references
EM algorithm
0 references
misclustering error
0 references
minimax optimality
0 references