Identifiability of nonparametric mixture models and Bayes optimal clustering
Publication:2215737
DOI: 10.1214/19-AOS1887 | zbMATH Open: 1455.62068 | arXiv: 1802.04397 | MaRDI QID: Q2215737 | FDO: Q2215737
Authors: Bryon Aragam, Chen Dan, Eric P. Xing, Pradeep Ravikumar
Publication date: 14 December 2020
Published in: The Annals of Statistics
Abstract: Motivated by problems in data clustering, we establish general conditions under which families of nonparametric mixture models are identifiable, by introducing a novel framework involving clustering overfitted parametric (i.e., misspecified) mixture models. These identifiability conditions generalize existing conditions in the literature, and are flexible enough to include, for example, mixtures of Gaussian mixtures. In contrast to the recent literature on estimating nonparametric mixtures, we allow for general nonparametric mixture components, and instead impose regularity assumptions on the underlying mixing measure. As our primary application, we apply these results to partition-based clustering, generalizing the notion of a Bayes optimal partition from classical parametric model-based clustering to nonparametric settings. Furthermore, this framework is constructive so that it yields a practical algorithm for learning identified mixtures, which is illustrated through several examples on real data. The key conceptual device in the analysis is the convex, metric geometry of probability measures on metric spaces and its connection to the Wasserstein convergence of mixing measures. The result is a flexible framework for nonparametric clustering with formal consistency guarantees.
Full work available at URL: https://arxiv.org/abs/1802.04397
Recommendations
- On clustering procedures and nonparametric mixture estimation
- An operator theoretic approach to nonparametric mixture models
- Non-parametric identification and estimation of the number of components in multivariate mixtures
- Clustering via finite nonparametric ICA mixture models
- Nonparametric inference in multivariate mixtures
- Nonparametric estimation (62G05)
- Classification and discrimination; cluster analysis (statistical aspects) (62H30)
- Estimation in multivariate analysis (62H12)
Cites Work
- Statistical analysis of finite mixture distributions
- Title not available
- Model-Based Clustering, Discriminant Analysis, and Density Estimation
- Spectral clustering and the high-dimensional stochastic blockmodel
- Probabilistic models in cluster analysis
- Mixture models: theory, geometry and applications
- Least squares quantization in PCM
- Robust cluster analysis and variable selection
- On the Identifiability of Finite Mixtures
- Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities
- Consistency of Single Linkage for High-Density Clusters
- Title not available
- Fully adaptive density-based clustering
- Identifiability of parameters in latent structure models with many observed variables
- Nonparametric estimation of component distributions in a multivariate mixture
- Estimating multivariate latent-structure models
- Maximum smoothed likelihood for multivariate mixtures
- Title not available
- Identifiability of Finite Mixtures of Elliptical Distributions
- Title not available
- Semiparametric estimation of a two-component mixture model
- Title not available
- Generalized density clustering
- Inference for mixtures of symmetric distributions
- Semiparametric mixtures of regressions
- Convergence of latent mixing measures in finite and infinite mixture models
- Minimum Hellinger distance estimates for parametric models
- The geometry of kernelized spectral clustering
- Almost Nonparametric Inference for Repeated Measures in Mixture Models
- Identifiability of Finite Mixtures
- Optimal rate of convergence for finite mixture models
- Nonparametric inference in multivariate mixtures
- Learning mixtures of separated nonspherical Gaussians
- Learning Theory
- On a converse to Scheffé's theorem
- Title not available
- Identifiability of Mixtures
- Rates of convergence for the Gaussian mixture sieve
- Semiparametric estimation of a two-component mixture of linear regressions in which one component is known
- Towards an axiomatic approach to hierarchical clustering of measures
- Remarks on the non-identifiability of mixtures of distributions
- Data spectroscopy: eigenspaces of convolution operators and clustering
- Identifiability of Mixtures of Product Measures
- Identification of mixture models using support variations
- Adaptive Mixtures
- On the Mixture of Distributions
- Identifiability of mixtures of exponential families
- Title not available
- On strong identifiability and convergence rates of parameter estimation in finite mixtures
- Strong identifiability and optimal minimax rates for finite mixture estimation
- Identifiability of continuous mixtures of unknown Gaussian distributions
- Singularity structures and impacts on parameter estimation in finite mixtures of distributions
- Non-Parametric Estimation of Finite Mixtures from Repeated Measurements
- A comprehensive approach to mode clustering
- A population background for nonparametric density-based clustering
- Clustering subgaussian mixtures by semidefinite programming
- When are overcomplete topic models identifiable? Uniqueness of tensor Tucker decompositions with structured sparsity
- The Spectral Method for General Mixture Models
- Inference on two-component mixtures under tail restrictions
Cited In (10)
- Mixture modeling with normalizing flows for spherical density estimation
- Bayesian Tree-Structured Two-Level Clustering for Nested Data Analysis
- Skeleton Clustering: Dimension-Free Density-Aided Clustering
- Uniform consistency in nonparametric mixture models
- On clustering procedures and nonparametric mixture estimation
- A nonparametric mixed-effects mixture model for patterns of clinical measurements associated with COVID-19
- Title not available
- Title not available
- Distributionally robust optimization using optimal transport for Gaussian mixture models
- Optimal Bayesian estimators for latent variable cluster models