Fast Generation of Exchangeable Sequence of Clusters Data
Publication:6409912
DOI: 10.1007/S11222-024-10385-W · arXiv: 2209.02844 · OpenAlex: W4391593803 · Wikidata: Q128260752 · Scholia: Q128260752 · MaRDI QID: Q6409912 · FDO: Q6409912
Authors: Keith Levin, Brenda Betancourt
Publication date: 6 September 2022
Abstract: Recent advances in Bayesian models for random partitions have led to the formulation and exploration of Exchangeable Sequences of Clusters (ESC) models. Under ESC models, it is the cluster sizes that are exchangeable, rather than the observations themselves. This property is particularly useful for obtaining microclustering behavior, whereby cluster sizes grow sublinearly in the number of observations, as is common in applications such as record linkage, sparse networks and genomics. Unfortunately, the exchangeable clusters property comes at the cost of projectivity. As a consequence, in contrast to more traditional Dirichlet Process or Pitman-Yor process mixture models, samples a priori from ESC models cannot be easily obtained in a sequential fashion and instead require the use of rejection or importance sampling. In this work, drawing on connections between ESC models and discrete renewal theory, we obtain closed-form expressions for certain ESC models and develop faster methods for generating samples a priori from these models compared with the existing state of the art. In the process, we establish analytical expressions for the distribution of the number of clusters under ESC models, which was unknown prior to this work.
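The rejection-sampling baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual ESC construction, in which cluster sizes are drawn i.i.d. from a distribution on the positive integers and the draw is kept only if the partial sums hit the sample size n exactly. The function names, the shifted-geometric size distribution, and the `max_tries` cap are illustrative assumptions. This is not the faster method developed in the paper.

```python
import random


def sample_esc_sizes_rejection(n, draw_size, rng, max_tries=100_000):
    """Naive rejection sampler for cluster sizes that sum exactly to n.

    Assumption: sizes are i.i.d. draws from `draw_size`, and a proposal is
    accepted only when the running total lands on n without overshooting.
    """
    for _ in range(max_tries):
        sizes, total = [], 0
        while total < n:
            s = draw_size(rng)
            sizes.append(s)
            total += s
        if total == n:  # partial sums hit n exactly: accept
            return sizes
        # otherwise the total overshot n: reject and start over
    raise RuntimeError("no accepted sample within max_tries")


def shifted_geometric(p):
    """Hypothetical size distribution on {1, 2, ...} with success probability p."""
    def draw(rng):
        k = 1
        while rng.random() > p:
            k += 1
        return k
    return draw


rng = random.Random(0)
sizes = sample_esc_sizes_rejection(20, shifted_geometric(0.5), rng)
print(len(sizes), sum(sizes))
```

The acceptance probability of each proposal is the chance that the partial sums of the size sequence land on n. That quantity is a renewal probability, which is where the link to discrete renewal theory drawn in the abstract comes in.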
Full work available at URL: https://doi.org/10.1007/s11222-024-10385-w
Recommendations
- Nonexchangeable random partition models for microclustering
- Random Partition Models for Microclustering Tasks
- Generalized Ewens-Pitman model for Bayesian clustering
- Random partition models and exchangeability for Bayesian identification of population structure
- Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model
Mathematics Subject Classification: Computational methods for problems pertaining to statistics (62-08) · Bayesian inference (62F15) · Classification and discrimination; cluster analysis (statistical aspects) (62H30) · Exchangeability for stochastic processes (60G09)