Asymptotic properties of univariate sample k-means clusters (Q1062391)

From MaRDI portal
scientific article

    Statements

    Asymptotic properties of univariate sample k-means clusters (English)
    1984
    The problem of optimally partitioning a univariate random sample of size N on [a,b] into k clusters is considered for the case where k approaches infinity with N (such that the length of each cluster interval approaches zero while the number of observations in each cluster approaches infinity). This k-means clustering method minimizes the within-cluster sum of squares, and it can potentially be used for constructing variable-cell histograms. It is shown that the locally optimal partition of the sample approaches the optimal partition of the population under certain regularity conditions (the population density is positive and has four bounded derivatives on [a,b]). Some large-sample properties of this method are obtained: the within-cluster sums of squares of the sample k-means clusters are asymptotically equal, and the sizes of the cluster intervals are inversely proportional to the one-third power of the underlying density at the midpoints of the intervals. The multivariate case requires further investigation, as the generalization of the univariate results to many dimensions is not straightforward.
    cluster lengths
    non-standard asymptotics
    optimal partition
    univariate random sample
    k-means clustering method
    within-cluster sums of squares
    variable-cell histograms
    large sample properties