On some significance tests in cluster analysis (Q1072282)

From MaRDI portal
Author: Hans-Hermann Bock
 

Revision as of 10:54, 22 February 2024

Language: English
Label: On some significance tests in cluster analysis
Description: scientific article

    Statements

    Title: On some significance tests in cluster analysis (English)
    Publication date: 1985
    The author investigates the properties of several significance tests for distinguishing between the hypothesis H of a "homogeneous" population and an alternative A involving "clustering" or "heterogeneity", with emphasis on multidimensional observations \(x_1, \dots, x_n \in \mathbb{R}^p\). Four types of test statistics are considered: the (s-th) largest gap between observations, their mean distance (or similarity), the minimum within-cluster sum of squares resulting from a k-means algorithm, and the resulting maximum F statistic. If, for a given significance level (error probability) \(\alpha\), such a test statistic exceeds the corresponding critical value \(c = c(\alpha)\), the hypothesis H of homogeneity is rejected (e.g., in favor of a clustering structure A).

    The asymptotic distributions under H are given for \(n \to \infty\), and the asymptotic power of the tests is derived for neighboring alternatives \(A = A_n\) approaching H. In particular, the asymptotic distribution of the maximum F statistic is obtained. Moreover, the asymptotic power of the gap test is characterized by a speed factor \((\log n)^{-1}\) (for \(A_n\) converging to H), and that of the tests based on the mean similarity by a factor \(n^{-1/4}\).
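    The statistics and the decision rule "reject H when the statistic exceeds c(α)" can be illustrated with a minimal Python sketch. It is not the author's procedure, and several choices are assumptions: the gap is computed for one-dimensional data only; the minimum within-cluster sum of squares W is approximated by Lloyd's k-means with random restarts (which may stop at a local minimum); the F statistic uses the form (B/(k−1))/(W/(n−k)) with B = T − W; and c(α) is estimated by Monte Carlo simulation under a hypothetical null of points uniform on the unit square, not from the asymptotic distributions derived in the paper. All function names are hypothetical.

```python
import itertools
import math
import random


def largest_gap(xs, s=1):
    """s-th largest gap between consecutive ordered values (1-D data only)."""
    ordered = sorted(xs)
    gaps = sorted((b - a for a, b in zip(ordered, ordered[1:])), reverse=True)
    return gaps[s - 1]


def mean_distance(points):
    """Mean Euclidean distance over all unordered pairs of points."""
    pairs = list(itertools.combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def _sq(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _centroid(cluster):
    # Coordinate-wise mean of a non-empty list of points.
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def kmeans_wss(points, k, restarts=10, iters=100, rng=None):
    """Approximate the minimum within-cluster sum of squares W using Lloyd's
    k-means with random restarts (each run may stop at a local minimum)."""
    rng = random.Random(0) if rng is None else rng
    best = math.inf
    for _ in range(restarts):
        centers = rng.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                # Assign each point to its nearest current center.
                j = min(range(k), key=lambda i: _sq(p, centers[i]))
                clusters[j].append(p)
            # Keep the old center if a cluster becomes empty.
            new = [_centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
            if new == centers:
                break
            centers = new
        wss = sum(min(_sq(p, c) for c in centers) for p in points)
        best = min(best, wss)
    return best


def max_f(points, k, **kmeans_kwargs):
    """F = (B / (k - 1)) / (W / (n - k)), with B = T - W and W from k-means."""
    n = len(points)
    xbar = _centroid(points)
    total = sum(_sq(p, xbar) for p in points)  # total sum of squares T
    w = kmeans_wss(points, k, **kmeans_kwargs)
    return ((total - w) / (k - 1)) / (w / (n - k))


def uniform_square(n, rng):
    # Hypothetical null model H: n independent points uniform on [0, 1]^2.
    return [(rng.random(), rng.random()) for _ in range(n)]


def monte_carlo_test(points, statistic, sample_null, alpha=0.05, reps=200, rng=None):
    """Estimate c(alpha) as the simulated (1 - alpha) quantile of the statistic
    under the null model, and reject H if the observed value exceeds it."""
    rng = random.Random(1) if rng is None else rng
    observed = statistic(points)
    null = sorted(statistic(sample_null(len(points), rng)) for _ in range(reps))
    c = null[math.ceil((1 - alpha) * reps) - 1]
    return observed, c, observed > c


if __name__ == "__main__":
    rng = random.Random(42)
    # Two well-separated groups of 20 points each.
    data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(20)]
    data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(20)]
    obs, c, reject = monte_carlo_test(
        data, lambda pts: max_f(pts, 2, restarts=5), uniform_square, reps=100
    )
    print(f"max F = {obs:.1f}, c(0.05) = {c:.1f}, reject H: {reject}")
```

    Because the total sum of squares T does not depend on the partition, F is a decreasing function of W, so the partition that minimizes W also maximizes F; the sketch therefore derives the maximum F statistic directly from the k-means value. Since F is invariant to shifting and rescaling the data, comparing the clustered sample with a unit-square null is not distorted by the difference in scale.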
    Keywords: cluster analysis; asymptotic normality; classification; significance tests; clustering; heterogeneity; mean distance; similarity; minimum within-cluster sum of squares; k-means algorithm; maximum F statistic; homogeneity; neighboring alternatives; asymptotic power; gap test
