Variable selection for model-based clustering using the integrated complete-data likelihood
From MaRDI portal
Abstract: Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which trade off clustering accuracy against the number of selected variables by using a lasso-type penalty. However, the calibration of the penalty term is open to criticism. Model selection methods are an efficient alternative, yet they require a difficult optimization of an information criterion, which involves combinatorial problems. First, most of these optimization algorithms are based on a suboptimal procedure (e.g. a stepwise method). Second, the algorithms are often computationally expensive because they need multiple calls of EM algorithms. Here we propose to use a new information criterion based on the integrated complete-data likelihood. It does not require any parameter estimate, and its maximization is simple and computationally efficient. The original contribution of our approach is to perform the model selection without requiring any parameter estimation; parameter inference is then needed only for the single selected model. This approach is used for the variable selection of a Gaussian mixture model with a conditional independence assumption. The numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches for variable selection.
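The idea of selecting variables without estimating parameters can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a fixed partition `z` (the abstract's criterion is maximized, whereas here the partition is simply given), a Gaussian model with conditional independence, and a conjugate Normal-Inverse-Gamma prior whose hyperparameters `m0`, `k0`, `a0`, `b0` are arbitrary illustrative choices. Under these assumptions the integrated complete-data likelihood has a closed form and factorizes over variables, so each variable can be declared relevant or irrelevant on its own. The function names are hypothetical.

```python
import math
import numpy as np

def log_marginal_gaussian(x, m0=0.0, k0=1.0, a0=1.0, b0=1.0):
    """Closed-form log marginal likelihood of a univariate Gaussian sample
    under a Normal-Inverse-Gamma prior (hyperparameters are assumptions)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n == 0:
        return 0.0  # an empty cluster contributes nothing
    xbar = x.mean()
    ss = ((x - xbar) ** 2).sum()
    kn = k0 + n
    an = a0 + n / 2.0
    bn = b0 + 0.5 * ss + k0 * n * (xbar - m0) ** 2 / (2.0 * kn)
    return float(
        -0.5 * n * math.log(2.0 * math.pi)
        + 0.5 * math.log(k0 / kn)
        + a0 * math.log(b0) - an * math.log(bn)
        + math.lgamma(an) - math.lgamma(a0)
    )

def select_variables(X, z):
    """For a fixed partition z, flag variable j as relevant when cluster-specific
    parameters give a larger integrated likelihood than common parameters."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(z)
    labels = np.unique(z)
    relevant = []
    for j in range(X.shape[1]):
        common = log_marginal_gaussian(X[:, j])
        by_cluster = sum(log_marginal_gaussian(X[z == k, j]) for k in labels)
        relevant.append(bool(by_cluster > common))
    return relevant

# Deterministic toy data: variable 0 separates the two clusters,
# variable 1 takes identical values in both clusters.
base = np.linspace(-2.0, 2.0, 50)
z = np.repeat([0, 1], 50)
X = np.column_stack([
    np.concatenate([base - 3.0, base + 3.0]),
    np.concatenate([base, base]),
])
result = select_variables(X, z)
print(result)
```

On this toy data the first variable is flagged as relevant and the second is not, and no mixture parameter is ever estimated: only marginal likelihoods are compared. A full procedure in the spirit of the abstract would also search over partitions, which this sketch leaves out.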
Recommendations
- Variable Selection for Model-Based Clustering
- Variable selection methods for model-based clustering
- scientific article; zbMATH DE number 5251644
- scientific article; zbMATH DE number 6458364
- Variable selection in model-based clustering and discriminant analysis with a regularization approach
- Variable Selection for Clustering with Gaussian Mixture Models
- Variable selection in clustering via Dirichlet process mixture models
- Variable selection in model-based clustering: a general variable role modeling
- A mixed integer linear model for clustering with variable selection
- A simple model-based approach to variable selection in classification and clustering
Cites work
- scientific article; zbMATH DE number 5957252
- scientific article; zbMATH DE number 5593833
- scientific article; zbMATH DE number 4159863
- scientific article; zbMATH DE number 3942813
- scientific article; zbMATH DE number 3780417
- scientific article; zbMATH DE number 3567782
- scientific article; zbMATH DE number 2124691
- scientific article; zbMATH DE number 6458364
- A framework for feature selection in clustering
- Algorithm AS 136: A K-Means Clustering Algorithm
- Bayesian Variable Selection in Clustering High-Dimensional Data
- Bayesian variable selection for latent class analysis using a collapsed Gibbs sampler
- Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion
- Clustering Objects on Subsets of Attributes (with Discussion)
- Clustering criteria for discrete data and latent class models
- Consistent estimation of the order of mixture models.
- Estimating the dimension of a model
- Exact and Monte Carlo calculations of integrated likelihoods for the latent class model
- Latent class models for mixed variables with applications in archaeometry
- On the choice of a model to fit data from an exponential family
- Penalized model-based clustering with application to variable selection
- The Bayesian Choice
- Variable Selection for Clustering with Gaussian Mixture Models
- Variable Selection for Model-Based Clustering
- Variable selection in model-based clustering: a general variable role modeling
Cited in (25)
- A tractable multi-partitions clustering
- Variable Selection for Model-Based Clustering
- A mixed integer linear model for clustering with variable selection
- Variable selection in model-based clustering using multilocus genotype data
- Variable selection for mixed data clustering: application in human population genomics
- Data-driven penalty calibration: a case study for Gaussian mixture model selection
- Variable selection in model-based clustering: a general variable role modeling
- Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics
- SelvarClustMV: variable selection approach in model-based clustering allowing for missing values
- Bayesian bi-clustering methods with applications in computational biology
- Variable selection methods for model-based clustering
- Modelling the role of variables in model-based cluster analysis
- VarSelLCM
- Unifying data units and models in (co-)clustering
- On variable selection in matrix mixture modelling
- A non asymptotic penalized criterion for Gaussian mixture model selection
- A survey on model-based co-clustering: high dimension and estimation challenges
- Distance Metrics and Clustering Methods for Mixed‐type Data
- Bayesian inference for infinite asymmetric Gaussian mixture with feature selection
- scientific article; zbMATH DE number 6458364
- Variable selection in model-based clustering and discriminant analysis with a regularization approach
- Choosing the number of clusters in a finite mixture model using an exact integrated completed likelihood criterion
- A model selection criterion for model-based clustering of annotated gene expression data
- Enhancing the selection of a model-based clustering with external categorical variables
- Estimation and model selection for model-based clustering with the conditional classification likelihood