Classification of molecular sequence data using Bayesian phylogenetic mixture models
From MaRDI portal
Markov chain Monte Carlo; model selection; Bayesian mixture model; classification; phylogeny; among-site rate variation
- Computational methods for problems pertaining to statistics (62-08)
- Bayesian inference (62F15)
- Classification and discrimination; cluster analysis (statistical aspects) (62H30)
- Applications of statistics to biology and medical sciences; meta analysis (62P10)
- Monte Carlo methods (65C05)
- Numerical analysis or methods applied to Markov chains (65C40)
Abstract: Rate variation among the sites of a molecular sequence is commonly found in applications of phylogenetic inference. Several approaches exist to account for this feature but they do not usually enable the investigator to pinpoint the sites that evolve under one or another rate of evolution in a straightforward manner. The focus is on Bayesian phylogenetic mixture models, augmented with allocation variables, as tools for site classification and quantification of classification uncertainty. The method does not rely on prior knowledge of site membership to classes or even the number of classes. Furthermore, it does not require correlated sites to be next to one another in the sequence alignment, unlike some phylogenetic hidden Markov or change-point models. In the approach presented, model selection on the number and type of mixture components is conducted ahead of both model estimation and site classification; the steppingstone sampler (SS) is used to select amongst competing mixture models. Example applications of simulated data and mitochondrial DNA of primates illustrate site classification via 'augmented' Bayesian phylogenetic mixtures. In both examples, all mixtures outperform commonly-used models of among-site rate variation and models that do not account for rate heterogeneity. The examples further demonstrate how site classification is readily available from the analysis output. The method is directly relevant to the choice of partitions in Bayesian phylogenetics, and its application may lead to the discovery of structure not otherwise recognised in a molecular sequence alignment. Computational aspects of Bayesian phylogenetic model estimation are discussed, including the use of simple Markov chain Monte Carlo (MCMC) moves that mix efficiently without tempering the chains.
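The site classification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that, for a fixed set of parameter values, the log-likelihood of each site under each mixture component is already available, and it applies the standard mixture-model identity that the conditional probability of allocating site i to component k is proportional to the component weight times the site likelihood under that component. The function names, the toy log-likelihoods, and the equal weights are all hypothetical. In a full Bayesian analysis these quantities would be summarised over MCMC samples rather than computed at a single parameter value.

```python
import math


def allocation_probabilities(site_log_liks, weights):
    """Conditional component probabilities for each site.

    site_log_liks[i][k]: hypothetical log-likelihood of site i under component k.
    weights[k]: mixture weight of component k (weights sum to 1).
    """
    probs = []
    for row in site_log_liks:
        # Unnormalised log probability: log w_k + log L_ik
        log_post = [math.log(w) + ll for w, ll in zip(weights, row)]
        # Normalise with log-sum-exp for numerical stability
        m = max(log_post)
        norm = m + math.log(sum(math.exp(v - m) for v in log_post))
        probs.append([math.exp(v - norm) for v in log_post])
    return probs


def classify(probs):
    """Assign each site to its most probable component (ties go to the lowest index)."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


# Toy values, invented for illustration only: 3 sites, 2 components
site_log_liks = [[-2.0, -5.0], [-6.0, -3.0], [-4.0, -4.0]]
weights = [0.5, 0.5]

probs = allocation_probabilities(site_log_liks, weights)
labels = classify(probs)
```

Each row of `probs` sums to one, so a row close to uniform (such as the third toy site) signals high classification uncertainty, while a row dominated by one entry signals a confident allocation.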
Cites work
- scientific article; zbMATH DE number 3943611
- scientific article; zbMATH DE number 509150
- scientific article; zbMATH DE number 1085980
- A Bayesian Hierarchical Model for Photometric Red Shifts
- A comparative study of Monte Carlo methods for efficient evaluation of marginal likelihood
- Bayes Factors
- Comparison of labeled trees with valency three
- Computing Bayes Factors Using a Generalization of the Savage-Dickey Density Ratio
- Estimating Bayes factors via thermodynamic integration and population MCMC
- Marginal Likelihood Estimation via Power Posteriors
- Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling
- Reversible jump Markov chain Monte Carlo computation and Bayesian model determination
- Two-way Bayesian hierarchical phylogenetic models: an application to the co-evolution of gp120 and gp41 during and after enfuvirtide treatment
Cited in (7)
- Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model
- Inferring Spatial Phylogenetic Variation Along Nucleotide Sequences
- Bayesian factor models in characterizing molecular adaptation
- Bayesian phylogenetics. Methods, algorithms, and applications
- Mixture models in phylogenetic inference
- Bayesian modelling of compositional heterogeneity in molecular phylogenetics
- Classification and clustering of sequencing data using a Poisson model
MaRDI item Q1623476