A fast quartet tree heuristic for hierarchical clustering
From MaRDI portal
(Redirected from Publication:622012)
Abstract: The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the weighted quartet topologies on objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic has been extensively used for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains and across domains with heterogeneous data. We also present a greatly improved heuristic, reducing the running time by a factor of order a thousand to ten thousand. All this is implemented and available, as part of the CompLearn package. We compare performance and running time of the original and improved versions with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package on genomic data for which the latter are optimized. Keywords: Data and knowledge visualization, Pattern matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering, Global optimization, Quartet tree, Randomized hill-climbing,
Recommendations
Cites work
- scientific article; zbMATH DE number 1405796 (Why is no real title available?)
- A discipline of evolutionary programming
- A polynomial time approximation scheme for inferring evolutionary trees from quartet topologies and its application
- An introduction to Kolmogorov complexity and its applications
- Calculating Sums of Infinite Series
- Clustered SplitsNetworks
- Clustering by Compression
- Equation of state calculations by fast computing machines
- Hierarchical clustering schemes
- Information distance
- Integer linear programming as a tool for constructing trees from quartet data
- Optimization by simulated annealing
- Pattern classification.
- Phylogenetic supertrees. Combining information to reveal the tree of life
- Shared Information and Program Plagiarism Detection
- The Similarity Metric
- The complexity of reconstructing trees from qualitative characters and subtrees
- Tree structures for proximity data
Cited in
(6)- Improved metaheuristics for the quartet method of hierarchical clustering
- An exact algorithm for the minimum quartet tree cost problem
- Hesitant fuzzy agglomerative hierarchical clustering algorithms
- Similarity and denoising
- Clique optimization: A method to construct parsimonious ultrametric trees from similarity data
- Combining attribute content and label information for categorical data ensemble clustering
This page was built for publication: A fast quartet tree heuristic for hierarchical clustering
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q622012)