Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters
From MaRDI portal
Publication:5963783
Abstract: There are two notoriously hard problems in cluster analysis: estimating the number of clusters, and checking whether the population to be clustered is not actually homogeneous. Given a dataset, a clustering method and a cluster validation index, this paper proposes to set up null models that capture structural features of the data that cannot be interpreted as indicating clustering. Artificial datasets are sampled from the null model with parameters estimated from the original dataset. This can be used for testing the null hypothesis of a homogeneous population against a clustering alternative. It can also be used to calibrate the validation index for estimating the number of clusters, by taking into account the expected distribution of the index under the null model for any given number of clusters. The approach is illustrated by three examples involving various clustering techniques (partitioning around medoids, hierarchical methods, a Gaussian mixture model), validation indexes (average silhouette width, prediction strength and BIC), and issues such as mixed-type data and temporal and spatial autocorrelation.
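The procedure outlined in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the null model is a single multivariate Gaussian fitted to the data (the paper tailors null models to the data at hand), plain k-means stands in for the clustering methods named in the abstract, the test statistic is the best average silhouette width over the candidate numbers of clusters, and the calibration standardizes the observed index by its null mean and standard deviation. The function names `kmeans`, `avg_silhouette` and `null_bootstrap` are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50, n_init=10):
    """Plain Lloyd k-means with random restarts; returns the best labels."""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        cost = d[np.arange(len(X)), labels].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

def avg_silhouette(X, labels):
    """Average silhouette width based on Euclidean distances."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # a singleton's silhouette is defined as 0
        a = D[i, own].sum() / (n_own - 1)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()

def null_bootstrap(X, ks=(2, 3, 4), n_boot=50, seed=0):
    """Parametric bootstrap under an assumed single-Gaussian null model."""
    rng = np.random.default_rng(seed)
    # Null parameters are estimated from the original data.
    mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    observed = {k: avg_silhouette(X, kmeans(X, k, rng)) for k in ks}
    null = {k: np.empty(n_boot) for k in ks}
    for b in range(n_boot):
        Xb = rng.multivariate_normal(mean, cov, size=len(X))
        for k in ks:
            null[k][b] = avg_silhouette(Xb, kmeans(Xb, k, rng))
    # Homogeneity test: compare the best observed index with its null distribution.
    obs_stat = max(observed.values())
    null_stat = np.column_stack([null[k] for k in ks]).max(axis=1)
    p_value = (1 + np.sum(null_stat >= obs_stat)) / (n_boot + 1)
    # Calibration (assumed form): standardize the index per k by its null distribution.
    calibrated = {k: (observed[k] - null[k].mean()) / null[k].std(ddof=1) for k in ks}
    return p_value, observed, calibrated

# Toy data: three well-separated groups of 20 points in the plane.
data_rng = np.random.default_rng(1)
group_centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X = np.vstack([c + 0.5 * data_rng.standard_normal((20, 2)) for c in group_centers])
p_value, observed, calibrated = null_bootstrap(X)
print("p-value:", p_value)
print("observed index:", observed)
print("calibrated index:", calibrated)
```

A small p-value suggests that the observed index is larger than a homogeneous population of this kind would typically produce. The calibrated values compare each candidate number of clusters against what the null model yields for that same number, rather than comparing raw index values across different numbers of clusters.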
Recommendations
- Significance testing in clustering
- Identifying genuine clusters in a classification
- How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis
- Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
Cites work
- scientific article; zbMATH DE number 41467
- scientific article; zbMATH DE number 708500
- scientific article; zbMATH DE number 3429948
- Consistent estimation of the order of mixture models.
- Distance-based parametric bootstrap tests for clustering of species ranges
- Finding Groups in Data
- Finding the Number of Clusters in a Dataset
- How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis
- Model-Based Clustering, Classification, and Density Estimation Using mclust in R
- Model-Based Clustering, Discriminant Analysis, and Density Estimation
- Multidimensional scaling.
- Probabilistic models in cluster analysis
Cited in (10)
- Distance-based parametric bootstrap tests for clustering of species ranges
- Clustering with the average silhouette width
- Bootstrapping for Significance of Compact Clusters in Multidimensional Datasets
- Significance testing in clustering
- Spatial variability clustering for spatially dependent functional data
- Distance Metrics and Clustering Methods for Mixed‐type Data
- E-ReMI: extended maximal interaction two-mode clustering
- An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture‐based clustering
- REMAXINT: a two-mode clustering-based method for statistical inference on two-way interaction
- Testing independence between two nonhomogeneous point processes in time