Semi-supervised cross-entropy clustering with information bottleneck constraint

Publication:780993

DOI: 10.1016/J.INS.2017.07.016, zbMATH Open: 1436.62286, arXiv: 1705.01601, OpenAlex: W2611453196, MaRDI QID: Q780993, FDO: Q780993

Marek Śmieja, Bernhard C. Geiger

Publication date: 16 July 2020

Published in: Information Sciences

Abstract: In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and that retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining the ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades between three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with side information. Experiments demonstrate that CEC-IB has a performance comparable to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied in discovering natural subgroups if the partition-level side information is derived from the top levels of a hierarchical clustering.


Full work available at URL: https://arxiv.org/abs/1705.01601




