Flexible parametric bootstrap for testing homogeneity against clustering and assessing the number of clusters

DOI: 10.1007/S11222-015-9566-5; MaRDI QID: Q5963783

Publication date: 23 February 2016

Published in: Statistics and Computing

Copyright license: Creative Commons Attribution 4.0 International

Full work available at: https://arxiv.org/abs/1502.02574

Keywords: Markov chain; mixture model; mixed-type data; spatial autocorrelation; cluster validation; distance-based clustering; presence-absence data

Mathematics Subject Classification:
Parametric hypothesis testing (62F03)
Bootstrap, jackknife and other resampling methods (62F40)
Classification and discrimination; cluster analysis (statistical aspects) (62H30)

Abstract: There are two notoriously hard problems in cluster analysis: estimating the number of clusters, and checking whether the population to be clustered is not actually homogeneous. Given a dataset, a clustering method and a cluster validation index, this paper proposes to set up null models that capture structural features of the data that cannot be interpreted as indicating clustering. Artificial datasets are sampled from the null model with parameters estimated from the original dataset. This can be used for testing the null hypothesis of a homogeneous population against a clustering alternative. It can also be used to calibrate the validation index for estimating the number of clusters, by taking into account the expected distribution of the index under the null model for any given number of clusters. The approach is illustrated by three examples, involving various clustering techniques (partitioning around medoids, hierarchical methods, a Gaussian mixture model), validation indexes (average silhouette width, prediction strength and BIC), and issues such as mixed-type data and temporal and spatial autocorrelation.
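
A minimal Python sketch of the parametric bootstrap idea described in the abstract, under simplifying assumptions that are not taken from the paper: the data are continuous, the null model is a single Gaussian with the sample mean and covariance (the paper builds more flexible null models, for example for mixed-type or autocorrelated data), the clustering method is k-means rather than partitioning around medoids, and the validation index is the average silhouette width. The helper names (`kmeans_labels`, `average_silhouette`, `sample_gaussian_null`, `bootstrap_calibration`) are hypothetical. For each k, the observed index is compared with its distribution over datasets simulated from the fitted null model. The bootstrap p-value tests homogeneity, and standardizing the index by the null mean and standard deviation is one simple way (assumed here) to compare index values across different numbers of clusters.

```python
import numpy as np


def kmeans_labels(x, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with random restarts; returns the labels of the lowest-WSS run."""
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        centers = x[rng.choice(len(x), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Keep the previous centre if a cluster becomes empty.
            new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        wss = d2[np.arange(len(x)), labels].sum()
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels


def average_silhouette(x, labels):
    """Average silhouette width (Euclidean); singleton points count as 0."""
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    s = np.zeros(len(x))
    for i in range(len(x)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue
        # Self-distance is 0, so divide by n_own - 1 to average over the others.
        a = dist[i, own].sum() / (n_own - 1)
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return s.mean()


def sample_gaussian_null(x, rng):
    """Hypothetical null model (assumption): one Gaussian with the data's mean and covariance."""
    return rng.multivariate_normal(x.mean(axis=0), np.cov(x, rowvar=False), size=len(x))


def bootstrap_calibration(x, ks=(2, 3, 4), n_boot=19, seed=0):
    """For each k: observed index, bootstrap p-value, and index standardized by the null."""
    rng = np.random.default_rng(seed)
    # Simulate n_boot homogeneous datasets with parameters estimated from x.
    null_sets = [sample_gaussian_null(x, rng) for _ in range(n_boot)]
    results = {}
    for k in ks:
        t_obs = average_silhouette(x, kmeans_labels(x, k, rng))
        t_null = np.array([average_silhouette(xb, kmeans_labels(xb, k, rng))
                           for xb in null_sets])
        # Large index values indicate clustering, so count null values at least as large.
        p = (1 + np.sum(t_null >= t_obs)) / (n_boot + 1)
        z = (t_obs - t_null.mean()) / t_null.std(ddof=1)
        results[k] = {"t_obs": t_obs, "p": p, "z": z}
    return results


# Usage: two well-separated groups in two dimensions.
data_rng = np.random.default_rng(1)
x = np.vstack([data_rng.normal(0.0, 1.0, (60, 2)),
               data_rng.normal((10.0, 0.0), 1.0, (60, 2))])
res = bootstrap_calibration(x)
best_k = max(res, key=lambda k: res[k]["z"])
print(res[2]["p"], best_k)
```

With two well-separated groups, the k = 2 p-value should be at or near its minimum of 1/(n_boot + 1), and the standardized index should favour k = 2. With n_boot = 19 the smallest attainable p-value is 0.05, so a larger n_boot is needed for finer resolution.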

Cited in: 10 works

Uses software: mclust
