Statistical properties of the single linkage hierarchical clustering estimator
From MaRDI portal
Abstract: Distance-based hierarchical clustering (HC) methods are widely used in unsupervised data analysis, but few authors account for uncertainty in the distance data. We incorporate a statistical model of this uncertainty, in the form of corruption or noise in the pairwise distances, and investigate the problem of estimating the HC, treated as an unknown parameter, from measurements. Specifically, we focus on single linkage hierarchical clustering (SLHC) and study its geometry. We prove that, under fairly reasonable conditions on the probability distribution governing the measurements, SLHC is equivalent to maximum partial profile likelihood estimation (MPPLE) in which some of the information contained in the data is ignored. At the same time, we show that applying SLHC directly to the maximum likelihood estimate (MLE) of the pairwise distances yields a consistent estimator. Consequently, a full MLE is expected to outperform SLHC in recovering the correct HC of the ground-truth metric.
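To make the two estimators in the abstract concrete, here is a minimal sketch, not the paper's method. It computes SLHC as the subdominant (minimax-path) ultrametric using a Floyd-Warshall-style closure, and it stands in for "the MLE of pairwise distances" with the entrywise mean of repeated noisy distance matrices. That mean is the per-pair MLE only under an assumed i.i.d. Gaussian noise model and ignoring the metric (triangle-inequality) constraints; the paper's model and MLE may differ. All function names (`single_linkage_ultrametric`, `mean_distance_estimate`, `flat_clusters`) and the toy data are hypothetical.

```python
import itertools
import random


def single_linkage_ultrametric(d):
    """Single-linkage (subdominant) ultrametric of a symmetric distance matrix.

    u[i][j] is the height at which i and j first share a cluster, i.e. the
    minimum over all paths i -> j of the largest edge on the path.
    """
    n = len(d)
    u = [row[:] for row in d]
    # Floyd-Warshall over the (min, max) path algebra.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def mean_distance_estimate(samples):
    """Entrywise mean of m noisy n x n distance matrices.

    Assumption: i.i.d. Gaussian noise per pair, no metric constraints, so the
    per-pair MLE reduces to the sample mean.
    """
    m, n = len(samples), len(samples[0])
    return [[sum(s[i][j] for s in samples) / m for j in range(n)] for i in range(n)]


def flat_clusters(u, t):
    """Cut an ultrametric at height t; u[i][j] <= t is an equivalence relation."""
    n = len(u)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            for j in range(n):
                if u[i][j] <= t:
                    labels[j] = next_label
            next_label += 1
    return labels


# Hypothetical ground-truth metric: two tight pairs {0, 1} and {2, 3}.
true_d = [[0, 1, 4, 4], [1, 0, 4, 4], [4, 4, 0, 1], [4, 4, 1, 0]]

# Simulate 50 noisy measurements of every pairwise distance (noise is not clipped).
rng = random.Random(0)
samples = []
for _ in range(50):
    s = [[0.0] * 4 for _ in range(4)]
    for i, j in itertools.combinations(range(4), 2):
        s[i][j] = s[j][i] = true_d[i][j] + rng.gauss(0.0, 0.3)
    samples.append(s)

# "SLHC on the estimated distances": estimate first, then cluster.
u_hat = single_linkage_ultrametric(mean_distance_estimate(samples))
print(flat_clusters(u_hat, 2.0))  # expected: [0, 0, 1, 1]
```

The O(n^3) closure is chosen for readability; a minimum-spanning-tree construction produces the same single-linkage hierarchy more efficiently on larger inputs.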
Recommendations
- Statistical analysis of a hierarchical clustering algorithm with outliers
- Statistical theory in clustering
- On the properties of \(\alpha\)-unchaining single linkage hierarchical clustering
- Characterization, stability and convergence of hierarchical clustering methods
- Hierarchical clustering better than average-linkage
Cites work
- scientific article; zbMATH DE number 432498
- scientific article; zbMATH DE number 3479762
- scientific article; zbMATH DE number 3504320
- scientific article; zbMATH DE number 1786505
- scientific article; zbMATH DE number 3274494
- A Laplace transform algorithm for the volume of a convex polytope
- A time series illustration of approximate conditional likelihood
- Characterization, stability and convergence of hierarchical clustering methods
- Computing the volume, counting integral points, and exponential sums
- Consistent Estimates Based on Partially Consistent Observations
- Energy minimization methods in computer vision and pattern recognition. 4th international workshop, EMMCVPR 2003, Lisbon, Portugal, July 7--9, 2003. Proceedings.
- Integrated likelihood methods for eliminating nuisance parameters. (With comments and a rejoinder).
- Introduction to Information Retrieval
- Lectures on natural exponential families and their variance functions
- Likelihood Based Hierarchical Clustering
- Maximum entropy Gaussian approximations for the number of integer points and volumes of polytopes
- Metrics on spaces of finite trees
- On the Extreme Rays of the Metric Cone
- Partial likelihood
- Six theorems about injective metric spaces
- Wireless sensor network localization techniques
Cited in (5)
- Functorial hierarchical clustering with overlaps
- Shannon’s entropy of partitions determined by hierarchical clustering trees in asymmetry and dimension identification
- Edge erasures and chordal graphs
- Approximate single linkage cluster analysis of large data sets in high-dimensional spaces
- On the properties of \(\alpha\)-unchaining single linkage hierarchical clustering
MaRDI item Q514178