SMLSOM: the shrinking maximum likelihood self-organizing map
Publication:6168922
DOI: 10.1016/J.CSDA.2023.107714, arXiv: 2104.13971, MaRDI QID: Q6168922, FDO: Q6168922
Authors: Ryosuke Motegi, Yoichi Seki
Publication date: 11 July 2023
Published in: Computational Statistics and Data Analysis
Abstract: Determining the number of clusters in a dataset is a fundamental issue in data clustering. Many methods have been proposed for selecting the number of clusters, treating it as a model selection problem. This paper proposes an efficient algorithm that automatically selects a suitable number of clusters within a probability distribution model framework. The algorithm has two components. First, a generalization of Kohonen's self-organizing map (SOM) is introduced. In Kohonen's SOM, clusters are modeled as mean vectors. In the generalized SOM, each cluster is modeled as a probability distribution and is constructed from the samples assigned to it on the basis of likelihood. Second, a method for dynamically updating the SOM structure is introduced. In Kohonen's SOM, each cluster is tied to a node of a fixed two-dimensional lattice and is learned using neighborhood relations between nodes based on Euclidean distance. The extended SOM defines a graph with clusters as vertices and neighborhood relations as links, and updates the graph structure by cutting weak links and deleting unnecessary vertices. The weakness of a link is measured using the Kullback–Leibler divergence, and the redundancy of a vertex is measured using the minimum description length. These extensions make it efficient to determine the appropriate number of clusters. Compared with existing methods, the proposed method is computationally efficient and can accurately select the number of clusters.
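Illustration (not taken from the paper): the abstract states that link weakness is measured with the Kullback–Leibler divergence but gives no formula. The sketch below shows one way such a check could look, assuming each cluster is a multivariate Gaussian. The function names `gaussian_kl` and `weak_links`, the symmetrized divergence, and the fixed `threshold` are all assumptions made for this example and are not the authors' criterion.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate normal distributions."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    d = mu0.shape[0]
    diff = mu1 - mu0
    cov1_inv = np.linalg.inv(cov1)
    # slogdet returns (sign, log|det|); covariances are assumed positive definite
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - d + logdet1 - logdet0)

def weak_links(clusters, edges, threshold):
    """Return the edges whose symmetrized KL divergence exceeds `threshold`.

    `clusters` maps a vertex id to a (mean, covariance) pair. Both the
    symmetrization and the threshold rule are hypothetical choices.
    """
    weak = []
    for i, j in edges:
        (mu_i, cov_i), (mu_j, cov_j) = clusters[i], clusters[j]
        div = (gaussian_kl(mu_i, cov_i, mu_j, cov_j)
               + gaussian_kl(mu_j, cov_j, mu_i, cov_i))
        if div > threshold:
            weak.append((i, j))
    return weak
```

With three unit-covariance clusters centered at (0, 0), (1, 0), and (10, 0), a link between the first two has a small divergence while a link to the third has a large one, so only the latter would be flagged for cutting under this rule.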
Full work available at URL: https://arxiv.org/abs/2104.13971
Recommendations
- scientific article; zbMATH DE number 1860775
- Advances in Intelligent Data Analysis VI
- Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density.
- On the generative probability density model in the self-organizing map
- Kernel-based self-organizing map clustering
Cites Work
- A new R package for Bayesian estimation of multivariate normal mixtures allowing for selection of the number of components and interval-censored data
- Model-based clustering and classification for data science. With applications in R
- Estimating the dimension of a model
- Title not available
- Model-Based Clustering, Discriminant Analysis, and Density Estimation
- A new look at the statistical model identification
- Reversible jump Markov chain Monte Carlo computation and Bayesian model determination
- Title not available
- On Information and Sufficiency
- Self-organized formation of topologically correct feature maps
- A Stochastic Approximation Method
- 10.1162/153244303321897735
- The dip test of unimodality
- Model Selection and the Principle of Minimum Description Length
- Modeling by shortest data description
- Title not available
- An Information Measure for Classification
- Self-organizing maps.
- Clustering: a neural network approach
- Degrees of freedom and model selection for \(k\)-means clustering
- Dealing with overdispersion in multivariate count data
- Title not available
- Mixture models for standard \(p\)-dimensional Euclidean data