Statistical properties of the single linkage hierarchical clustering estimator

Publication:514178

DOI: 10.1016/J.JSPI.2016.12.002
zbMATH Open: 1360.62367
arXiv: 1511.07715
OpenAlex: W2963569116
MaRDI QID: Q514178
FDO: Q514178


Authors: Dekang Zhu, Dan P. Guralnik, Xuezhi Wang, Xiang Li, W. Moran


Publication date: 28 February 2017

Published in: Journal of Statistical Planning and Inference

Abstract: Distance-based hierarchical clustering (HC) methods are widely used in unsupervised data analysis but few authors take account of uncertainty in the distance data. We incorporate a statistical model of the uncertainty through corruption or noise in the pairwise distances and investigate the problem of estimating the HC as unknown parameters from measurements. Specifically, we focus on single linkage hierarchical clustering (SLHC) and study its geometry. We prove that under fairly reasonable conditions on the probability distribution governing measurements, SLHC is equivalent to maximum partial profile likelihood estimation (MPPLE) with some of the information contained in the data ignored. At the same time, we show that direct evaluation of SLHC on maximum likelihood estimation (MLE) of pairwise distances yields a consistent estimator. Consequently, a full MLE is expected to perform better than SLHC in getting the correct HC results for the ground truth metric.
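
As a rough illustration of the plug-in idea mentioned in the abstract (evaluating SLHC on an estimate of the pairwise distances), the following Python sketch averages repeated noisy distance measurements and then computes the single linkage ultrametric of the averaged matrix. Several things here are assumptions made for illustration and are not taken from the paper: the function names `mean_distance_matrix` and `single_linkage_ultrametric` are hypothetical, the entrywise sample mean is the MLE only under an assumed i.i.d. Gaussian noise model, and the minimax-path computation is one standard way to obtain single linkage merge heights, not necessarily the authors' formulation.

```python
def mean_distance_matrix(samples):
    """Entrywise average of repeated measurements of an n x n distance matrix.

    Assumption (not from the paper): under i.i.d. Gaussian noise on each
    entry, this sample mean is the maximum likelihood estimate of the entry.
    """
    m = len(samples)
    n = len(samples[0])
    return [[sum(s[i][j] for s in samples) / m for j in range(n)]
            for i in range(n)]


def single_linkage_ultrametric(d):
    """Single linkage merge heights for a symmetric distance matrix d.

    u[i][j] is the smallest threshold at which i and j share a single
    linkage cluster: the minimum, over all paths from i to j, of the
    largest edge length on the path.
    """
    n = len(d)
    u = [row[:] for row in d]  # copy so the input is left unchanged
    # Floyd-Warshall-style minimax update; k must be the outermost loop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


# Three points on a line at positions 0, 1 and 3.
d = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
u = single_linkage_ultrametric(d)  # [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
```

In this toy example, points 0 and 1 merge at height 1, and point 2 joins them at height 2 (through point 1) instead of at its direct distance 3 from point 0. The cubic-time loop is chosen for readability; it is not meant as an efficient implementation.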


Full work available at URL: https://arxiv.org/abs/1511.07715





Cited In (5)




