CHIME: clustering of high-dimensional Gaussian mixtures with EM algorithm and its optimality (Q2414086)

From MaRDI portal
scientific article

    Statements

    10 May 2019
    This paper studies clustering and discriminant analysis for high-dimensional Gaussian mixtures. It proposes a procedure called CHIME, which is based on the EM algorithm and a direct estimation method for the discriminant vector. High-dimensional clustering problems arise, for instance, in genetic data. After an adequately chosen number of iterations, CHIME provides good estimates of the parameters. The discriminant vector estimator and the excess misclustering error attain minimax optimal rates of convergence. The CHIME algorithm requires two conditions to work well. First, the initial values of the mixture parameters must not be too far from their true values; the authors suggest using the Hardt-Price algorithm to obtain a satisfactory initialization. Second, the discriminant vector must be sparse. Next, the algorithm is adapted to the low-dimensional Gaussian mixture clustering problem, and it is shown that the optimality properties are preserved. The authors note that the estimators of the Gaussian mixture parameters given by CHIME achieve the same convergence rate as the maximum likelihood estimators obtained in the model with known sample labels. The paper first considers two-class Gaussian mixtures and later extends the results to multi-class Gaussian mixtures. Simulation studies and an application to glioblastoma gene expression data are presented and discussed.
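To make the role of the discriminant vector in the EM iterations concrete, the following is a minimal sketch of EM for a two-class Gaussian mixture with a shared covariance matrix. It is not the CHIME procedure: the sparse direct estimation of the discriminant vector is replaced here by a plain linear solve, which is only reasonable in low dimension. The function name `em_two_class`, its parameters, and the starting weight of 0.5 are assumptions made for illustration, and the initial means must be supplied by the caller (the review mentions the Hardt-Price algorithm for that purpose, which is not implemented here).

```python
import numpy as np


def em_two_class(X, mu1, mu2, n_iter=50):
    """Illustrative EM for a two-class Gaussian mixture with shared covariance.

    X        : (n, p) data matrix
    mu1, mu2 : initial guesses for the two class means, each of shape (p,)
    Returns the class-2 weight, both means, the covariance, the
    discriminant vector beta = Sigma^{-1} (mu2 - mu1), and hard labels.
    """
    n, p = X.shape
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    w = 0.5  # assumed starting value for the class-2 mixing weight
    sigma = np.cov(X, rowvar=False).reshape(p, p)

    def posterior(w, mu1, mu2, sigma):
        # Low-dimensional stand-in for a sparse discriminant vector estimate.
        beta = np.linalg.solve(sigma, mu2 - mu1)
        # Log-odds of class 2 versus class 1 under a shared covariance.
        logit = np.log(w / (1.0 - w)) + (X - (mu1 + mu2) / 2.0) @ beta
        logit = np.clip(logit, -30.0, 30.0)  # keep exp() well behaved
        return beta, 1.0 / (1.0 + np.exp(-logit))

    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to class 2.
        _, gamma = posterior(w, mu1, mu2, sigma)
        # M-step: weighted updates of the weight, the means, and the covariance.
        w = gamma.mean()
        mu1 = ((1.0 - gamma)[:, None] * X).sum(axis=0) / (1.0 - gamma).sum()
        mu2 = (gamma[:, None] * X).sum(axis=0) / gamma.sum()
        r1, r2 = X - mu1, X - mu2
        sigma = ((r1 * (1.0 - gamma)[:, None]).T @ r1
                 + (r2 * gamma[:, None]).T @ r2) / n

    beta, gamma = posterior(w, mu1, mu2, sigma)
    labels = (gamma > 0.5).astype(int)  # 1 means class 2, 0 means class 1
    return w, mu1, mu2, sigma, beta, labels
```

Under the shared-covariance assumption, the E-step depends on the parameters only through the mixing weight, the midpoint of the two means, and the discriminant vector, which is why a good estimate of that vector is the central quantity in this kind of clustering rule.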
    high-dimensional data
    unsupervised learning
    Gaussian mixture model
    EM algorithm
    misclustering error
    minimax optimality