Ckmeans.1d.dp

swMATH: 15783; CRAN: Ckmeans.1d.dp; MaRDI QID: Q27655

Optimal, Fast, and Reproducible Univariate Clustering

Mingzhou Song, Hua Zhong, Haizhou Wang

Last update: 19 August 2023

Software version identifier: 4.3.4, 1.0, 1.1, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 3.0, 3.01, 3.02, 3.3.0, 3.3.1, 3.3.3, 3.4.0-1, 3.4.0, 3.4.6-1, 3.4.6-2, 3.4.6-3, 3.4.6-4, 3.4.6-5, 3.4.6-6, 3.4.6, 4.0.0, 4.0.1, 4.2.0, 4.2.1, 4.2.2, 4.3.0, 4.3.2, 4.3.3, 4.3.5

Source code repository: https://github.com/cran/Ckmeans.1d.dp

Copyright license: GNU Lesser General Public License

Fast, optimal, and reproducible weighted univariate clustering by dynamic programming. The package solves four problems: univariate k-means (Wang & Song 2011, doi:10.32614/RJ-2011-015; Song & Zhong 2020, doi:10.1093/bioinformatics/btaa613), k-median, k-segments, and multi-channel weighted k-means. In each case, dynamic programming minimizes the sum of (weighted) within-cluster distances under the metric appropriate to that problem. The advantage over heuristic clustering in both efficiency and accuracy is most pronounced when there are many clusters. Multi-channel weighted k-means groups multiple univariate signals into k clusters. An auxiliary function generates histograms that adapt to patterns in the data. Together these tools support univariate data analysis with guaranteed optimality, efficiency, and reproducibility, and are useful for peak calling on temporal, spatial, and spectral data.
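To illustrate why dynamic programming yields an optimal and reproducible univariate clustering, here is a minimal Python sketch of the unweighted k-means case. It is not the package's implementation: the function name `kmeans_1d_dp` is hypothetical, and the sketch uses a plain O(k n^2) recurrence rather than any faster algorithm the package may use. It relies on the fact that, once the values are sorted, each cluster of an optimal solution is a contiguous run.

```python
from itertools import accumulate


def kmeans_1d_dp(values, k):
    """Optimal 1-D k-means via a simple O(k * n^2) dynamic program.

    Hypothetical helper for illustration only; returns the minimal total
    within-cluster sum of squares and the clusters as sorted lists.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give any within-cluster sum of squares in O(1).
    ps = [0.0] + list(accumulate(xs))
    ps2 = [0.0] + list(accumulate(x * x for x in xs))

    def ssq(i, j):
        """Sum of squared deviations from the mean of xs[i..j], inclusive."""
        m = j - i + 1
        s = ps[j + 1] - ps[i]
        return max(ps2[j + 1] - ps2[i] - s * s / m, 0.0)

    inf = float("inf")
    # cost[c][j]: minimal total SSQ when xs[0..j] is split into c + 1 clusters.
    cost = [[inf] * n for _ in range(k)]
    # back[c][j]: first index of the last cluster in that optimal split.
    back = [[0] * n for _ in range(k)]

    for j in range(n):
        cost[0][j] = ssq(0, j)
    for c in range(1, k):
        for j in range(c, n):
            for i in range(c, j + 1):
                cand = cost[c - 1][i - 1] + ssq(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    back[c][j] = i

    # Walk the back-pointers from the last cluster to the first.
    clusters = []
    j = n - 1
    for c in range(k - 1, -1, -1):
        i = back[c][j]
        clusters.append(xs[i:j + 1])
        j = i - 1
    clusters.reverse()
    return cost[k - 1][n - 1], clusters


if __name__ == "__main__":
    total, groups = kmeans_1d_dp([20, 1, 11, 3, 22, 10, 2, 21, 12], 3)
    print(total, groups)
```

Because the recurrence examines every contiguous split, the result does not depend on random initialization, which is the sense in which such a method is reproducible. Heuristic k-means restarts offer no such guarantee.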