Dirichlet-tree multinomial mixtures for clustering microbiome compositions
Publication:2170402
DOI: 10.1214/21-AOAS1552
zbMATH Open: 1498.62241
OpenAlex: W3046411508
MaRDI QID: Q2170402
FDO: Q2170402
Authors: Yanyan Li
Publication date: 5 September 2022
Published in: The Annals of Applied Statistics
Abstract: Studying the human microbiome has gained substantial interest in recent years, and a common task in the analysis of these data is to cluster microbiome compositions into subtypes. This subdivision of samples into subgroups serves as an intermediary step in achieving personalized diagnosis and treatment. In applying existing clustering methods to modern microbiome studies, including the American Gut Project (AGP) data, we found that this seemingly standard task is in fact very challenging in the microbiome composition context due to several key features of such data. Standard distance-based clustering algorithms generally do not produce reliable results because they do not take into account the heterogeneity of the cross-sample variability among the bacterial taxa, while existing model-based approaches do not allow sufficient flexibility to distinguish complex within-cluster variation from cross-cluster variation. Direct applications of such methods generally lead to overly dispersed clusters in the AGP data, and this phenomenon is common in other microbiome data. To overcome these challenges, we introduce Dirichlet-tree multinomial mixtures (DTMM) as a Bayesian generative model for clustering amplicon sequencing data in microbiome studies. DTMM models the microbiome population with a mixture of Dirichlet-tree kernels that utilizes the phylogenetic tree to offer a more flexible covariance structure in characterizing within-cluster variation, and it provides a means for identifying a subset of signature taxa that distinguish the clusters. We perform extensive simulation studies to evaluate the performance of DTMM and compare it to state-of-the-art model-based and distance-based clustering methods in the microbiome context. Finally, we report a case study on the fecal data from the AGP to identify compositional clusters among individuals with inflammatory bowel disease and diabetes.
Full work available at URL: https://arxiv.org/abs/2008.00400
Recommendations
- Microbiome Subcommunity Learning with Logistic-Tree Normal Latent Dirichlet Allocation
- Bayesian mixed effects models for zero-inflated compositions in microbiome data analysis
- A Dirichlet-Tree Multinomial Regression Model for Associating Dietary Nutrients with Gut Microorganisms
- High-dimensional count and compositional data analysis in microbiome studies
- A phylogenetic scan test on a Dirichlet-tree multinomial model for microbiome data
- Bayesian inference (62F15)
- Classification and discrimination; cluster analysis (statistical aspects) (62H30)
- Applications of statistics to biology and medical sciences; meta analysis (62P10)
Cites Work
- A phylogenetic scan test on a Dirichlet-tree multinomial model for microbiome data
- Title not available
- Least squares quantization in PCM
- Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem
- A Dirichlet-Tree Multinomial Regression Model for Associating Dietary Nutrients with Gut Microorganisms
- Title not available
- Gibbs Sampling Methods for Stick-Breaking Priors
- Title not available
- Bayesian graphical compositional regression for microbiome data
- On the hyper-dirichlet type 1 and hyper-liouville distributions
- Analysis of Distributional Variation Through Graphical Multi-Scale Beta-Binomial Models
Cited In (10)
- A mixture of logistic skew-normal multinomial models
- Classification Rules that Include Neutral Zones and Their Application to Microbial Community Profiling
- A hierarchical Bayesian approach for detecting global microbiome associations
- Microbiome Subcommunity Learning with Logistic-Tree Normal Latent Dirichlet Allocation
- Modeling association in microbial communities with clique loglinear models
- T-BAPS: A Bayesian Statistical Tool for Comparison of Microbial Communities Using Terminal-restriction Fragment Length Polymorphism (T-RFLP) Data
- A Zero-Inflated Logistic Normal Multinomial Model for Extracting Microbial Compositions
- Logistic Normal Multinomial Factor Analyzers for Clustering Microbiome Data
- A phylogenetic scan test on a Dirichlet-tree multinomial model for microbiome data
- A Dirichlet-Tree Multinomial Regression Model for Associating Dietary Nutrients with Gut Microorganisms