Clustering by principal component analysis with Gaussian kernel in high-dimension, low-sample-size settings (Q2048123)

From MaRDI portal
scientific article

    Statements

    Clustering by principal component analysis with Gaussian kernel in high-dimension, low-sample-size settings (English)
    5 August 2021
    The authors considered a clustering method based on kernel principal component analysis (KPCA) for high-dimension, low-sample-size (HDLSS) data. First, they investigated asymptotic properties of KPCA with the linear and Gaussian kernels for the two-class \((k = 2)\) model. Their results seem to extend an important result given by \textit{K. Yata} and \textit{M. Aoshima} [Scand. J. Stat. 47, No. 3, 899--921 (2020; Zbl 1454.62188)]. Second, they showed that HDLSS data can be classified by the sign of the first principal component (PC) scores. They gave theoretical reasons why the Gaussian kernel is effective for clustering high-dimensional data. Third, they discussed the choice of the scale parameter, \(\gamma\), needed for KPCA with the Gaussian kernel to perform well. Then, they showed that the Gaussian kernel with this choice of \(\gamma\) performs favourably both in numerical simulations and in actual data analyses (they use three microarray data sets given in the supplemental material of \textit{M. Mramor} et al. [``Visualization-based cancer microarray data classification analysis'', Bioinformatics 23, 2147--2154 (2007)]).
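
    The idea of clustering by the sign of the first PC score can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `gaussian_kpca_sign_clusters`, the kernel parameterisation \(\exp(-\|x-y\|^2/\gamma)\), and the median-of-squared-distances default for \(\gamma\) are all assumptions made here for illustration, and the paper's own choice of \(\gamma\) may differ.

```python
import numpy as np


def gaussian_kpca_sign_clusters(X, gamma=None):
    """Two-class clustering by the sign of the first Gaussian-kernel PC score.

    X     : (n, d) data matrix, rows are samples (d may far exceed n).
    gamma : scale parameter of the assumed kernel exp(-||x - y||^2 / gamma).
            If None, the median pairwise squared distance is used; this
            heuristic is an assumption, not the rule derived in the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off

    if gamma is None:
        gamma = np.median(sq_dists[np.triu_indices(n, k=1)])

    # Gaussian kernel matrix, then double centering (centering in feature space).
    K = np.exp(-sq_dists / gamma)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J

    # eigh returns eigenvalues in ascending order, so the last pair is the
    # leading one.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    scores = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))

    # Assign each sample to a cluster by the sign of its first PC score.
    labels = (scores > 0).astype(int)
    return labels, scores


if __name__ == "__main__":
    # Synthetic HDLSS example: n = 20 samples in d = 500 dimensions.
    rng = np.random.default_rng(0)
    d, n_half = 500, 10
    X = np.vstack([
        rng.normal(0.0, 1.0, size=(n_half, d)),
        rng.normal(2.0, 1.0, size=(n_half, d)),
    ])
    labels, scores = gaussian_kpca_sign_clusters(X)
    print(labels)
```

    The sign of an eigenvector is arbitrary, so the labels 0 and 1 are identified only up to swapping. Working with the \(n \times n\) kernel matrix rather than a \(d \times d\) covariance matrix keeps the computation cheap when \(d\) is much larger than \(n\).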
    HDLSS
    nonlinear PCA
    PC score
    radial basis function kernel
    spherical data

    Identifiers