Estimating the number of clusters via a corrected clustering instability

DOI: 10.1007/S00180-020-00981-5
zbMATH Open: 1505.62179
DBLP: journals/cstat/HaslbeckW20
arXiv: 1608.07494
OpenAlex: W3028113540
Wikidata: Q100762424
Scholia: Q100762424
MaRDI QID: Q2228237
FDO: Q2228237

Authors: Jonas M. B. Haslbeck, Dirk U. Wulff

Publication date: 17 February 2021

Published in: Computational Statistics

Abstract: We improve current instability-based methods for the selection of the number of clusters \(k\) in cluster analysis by developing a normalized cluster instability measure that corrects for the distribution of cluster sizes, a previously unaccounted driver of cluster instability. We show that our normalized instability measure outperforms current instability-based measures across the whole sequence of possible \(k\) and especially overcomes limitations in the context of large \(k\). We also compare, for the first time, model-based and model-free approaches to determine cluster-instability and find their performance to be comparable. We make our method available in the R-package cstab.

Full work available at URL: https://arxiv.org/abs/1608.07494

zbMATH Keywords

cluster analysis; resampling; stability; \(k\)-means

Mathematics Subject Classification ID

Computational methods for problems pertaining to statistics (62-08)
Classification and discrimination; cluster analysis (statistical aspects) (62H30)

Cited In (5)

Uses Software

Silhouettes
