On some significance tests in cluster analysis (Q1072282)

From MaRDI portal
scientific article

    Statements

    On some significance tests in cluster analysis (English)
    1985
    The author investigates the properties of several significance tests for distinguishing between the hypothesis H of a "homogeneous" population and an alternative A involving "clustering" or "heterogeneity", with emphasis on the case of multidimensional observations \(x_1,\dots,x_n\in\mathbb{R}^p\). Four types of test statistics are considered: the (s-th) largest gap between observations, their mean distance (or similarity), the minimum within-cluster sum of squares resulting from a k-means algorithm, and the resulting maximum F statistic. If, for a given significance level (error probability) \(a\), such a test statistic exceeds the corresponding critical value \(c=c(a)\), the hypothesis H of homogeneity is rejected (e.g., in favor of a clustering structure A). The asymptotic distributions under H are given for \(n\to\infty\), and the asymptotic power of the tests is derived for neighboring alternatives \(A=A_n\) approaching H. In particular, the asymptotic distribution of the maximum F statistic is obtained. Moreover, the asymptotic power is characterized by a speed factor \((\log n)^{-1}\) for the gap test (for \(A_n\) converging to H) and by a factor \(n^{-1/4}\) for tests based on the mean similarity.
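
The four kinds of test statistic named in the review can be illustrated with a minimal Python sketch. Nothing below is taken from the article itself: the function names are hypothetical, the gap, within-cluster and F statistics are restricted to one-dimensional data with k = 2 clusters (where the optimal split can be found exactly by scanning the sorted sample), the F ratio is assumed to be the usual one-way ANOVA form, and the critical value c(a) is approximated by Monte Carlo simulation under an assumed null model instead of the asymptotic distributions derived by the author.

```python
import numpy as np

def largest_gap(x, s=1):
    """s-th largest gap between consecutive order statistics of a 1-D sample."""
    gaps = np.diff(np.sort(np.asarray(x, dtype=float)))
    return np.sort(gaps)[-s]

def mean_distance(X):
    """Mean Euclidean distance over all pairs of rows of an (n x p) array."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return d[np.triu_indices(len(X), k=1)].mean()

def min_within_ss_1d(x):
    """Minimum within-cluster sum of squares for k = 2 in one dimension.

    In 1-D the optimal clusters are contiguous in sorted order, so every
    split point is scanned (an exact search, not an iterative k-means run).
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    cs, cs2 = np.cumsum(x), np.cumsum(x ** 2)
    best = float("inf")
    for m in range(1, n):  # left cluster = first m sorted points
        left = cs2[m - 1] - cs[m - 1] ** 2 / m
        right = (cs2[-1] - cs2[m - 1]) - (cs[-1] - cs[m - 1]) ** 2 / (n - m)
        best = min(best, left + right)
    return best

def max_f_statistic_1d(x, k=2):
    """ANOVA-type F ratio at the best 2-cluster split (assumed form)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    total = ((x - x.mean()) ** 2).sum()
    within = min_within_ss_1d(x)
    return ((total - within) / (k - 1)) / (within / (n - k))

def monte_carlo_critical_value(statistic, sampler, n, a=0.05, reps=2000, seed=None):
    """Approximate c(a) as the (1 - a) quantile of the statistic under H.

    `sampler(rng, n)` draws one sample of size n from the assumed null model.
    """
    rng = np.random.default_rng(seed)
    sims = np.array([statistic(sampler(rng, n)) for _ in range(reps)])
    return np.quantile(sims, 1 - a)

if __name__ == "__main__":
    # Assumed null model: uniform distribution on [0, 1].
    uniform = lambda rng, n: rng.uniform(size=n)
    c = monte_carlo_critical_value(max_f_statistic_1d, uniform, n=30, seed=0)
    sample = np.concatenate([np.linspace(0, 1, 15), np.linspace(100, 101, 15)])
    print("reject H:", max_f_statistic_1d(sample) > c)
```

The rejection rule "statistic exceeds c(a)" is applied here to the maximum F statistic; for a statistic such as the minimum within-cluster sum of squares, where small values point towards clustering, the comparison would presumably be reversed.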
    cluster analysis
    asymptotic normality
    classification
    significance tests
    clustering
    heterogeneity
    mean distance
    similarity
    minimum within-cluster sum of squares
    k-means algorithm
    maximum F statistic
    homogeneity
    neighboring alternatives
    asymptotic power
    gap test