Semi-supervised cross-entropy clustering with information bottleneck constraint
Publication:780993
DOI: 10.1016/J.INS.2017.07.016
zbMATH Open: 1436.62286
arXiv: 1705.01601
OpenAlex: W2611453196
MaRDI QID: Q780993
FDO: Q780993
Marek Śmieja, Bernhard C. Geiger
Publication date: 16 July 2020
Published in: Information Sciences
Abstract: In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and that retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining the ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades between three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with side information. Experiments demonstrate that CEC-IB has a performance comparable to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied in discovering natural subgroups if the partition-level side information is derived from the top levels of a hierarchical clustering.
Full work available at URL: https://arxiv.org/abs/1705.01601
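The trade-off described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes hard cluster assignments, one maximum-likelihood Gaussian per cluster, the standard cross-entropy clustering cost (sum over clusters of p_k * (-ln p_k + Gaussian entropy)), and a label-consistency penalty taken here to be the conditional entropy of the side-information label given the cluster. The penalty form, the weight `beta`, the regularizer `eps`, and all function names are illustrative assumptions; the exact objective is given in the paper.

```python
import numpy as np

def cec_cost(X, z, eps=1e-6):
    """Cross-entropy clustering cost of a hard partition z of the rows of X.

    Each cluster is modelled by its own maximum-likelihood Gaussian.
    eps is an assumed ridge term that keeps tiny clusters non-singular.
    """
    n, d = X.shape
    cost = 0.0
    for k in np.unique(z):
        Xk = X[z == k]
        p = len(Xk) / n  # cluster prior
        cov = np.atleast_2d(np.cov(Xk, rowvar=False, bias=True)) + eps * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        # Differential entropy of a d-dimensional Gaussian with covariance cov.
        h = 0.5 * d * np.log(2.0 * np.pi * np.e) + 0.5 * logdet
        # -ln p penalizes model complexity; h measures how well the cluster fits.
        cost += p * (-np.log(p) + h)
    return cost

def label_penalty(z, y):
    """Conditional entropy (in nats) of side-information labels given clusters.

    Unlabeled points carry y == -1 and are ignored (an assumed convention).
    """
    mask = y >= 0
    zl, yl = z[mask], y[mask]
    if len(zl) == 0:
        return 0.0
    h = 0.0
    for k in np.unique(zl):
        yk = yl[zl == k]
        _, counts = np.unique(yk, return_counts=True)
        q = counts / counts.sum()
        h += (len(yk) / len(zl)) * -(q * np.log(q)).sum()
    return h

def combined_cost(X, z, y, beta=1.0):
    """Illustrative objective: model fit and simplicity plus beta times label inconsistency."""
    return cec_cost(X, z) + beta * label_penalty(z, y)

# Usage sketch: two separated blobs with four labeled points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(10.0, 1.0, (50, 2))])
y = np.full(100, -1)
y[[0, 1]] = 0
y[[50, 51]] = 1
z_blobs = np.repeat([0, 1], 50)      # partition that follows the blobs
z_mixed = np.tile([0, 1], 50)        # partition that ignores them
print(combined_cost(X, z_blobs, y), combined_cost(X, z_mixed, y))
```

In this sketch, a larger `beta` weights agreement with the partial labeling more heavily relative to model fit and simplicity, which is the three-way trade the abstract refers to.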
Keywords: model-based clustering; cross-entropy; information bottleneck; semi-supervised clustering; partition-level side information
Cites Work
- Algorithm AS 136: A K-Means Clustering Algorithm
- Cross-entropy clustering
- Mixture Densities, Maximum Likelihood and the EM Algorithm
- Learning from partially supervised data using mixture models and belief functions
- Title not available
- Robust supervised classification with mixture models: learning from data with uncertain labels
- Introduction to Semi-Supervised Learning
- Semi-supervised concept factorization for document clustering
- A graph-based semi-supervised \(k\) nearest-neighbor method for nonlinear manifold distributed data classification
- Semi-supervised information-maximization clustering
Cited In (1)