A hierarchical Dirichlet process mixture model for haplotype reconstruction from multi-population data
From MaRDI portal
Publication:2270673
mixture models, population genetics, Dirichlet process, coalescence, haplotype inference, hierarchical Dirichlet process
- Bayesian inference (62F15)
- Classification and discrimination; cluster analysis (statistical aspects) (62H30)
- Applications of statistics to biology and medical sciences; meta analysis (62P10)
- Numerical analysis or methods applied to Markov chains (65C40)
- Genetics and epigenetics (92D10)
- Nonparametric inference (62G99)
Abstract: The perennial problem of "how many clusters?" remains an issue of substantial interest in the data mining and machine learning communities, and becomes particularly salient in large data sets such as population genomic data, where the number of clusters needs to be relatively large and open-ended. The problem is further complicated in a co-clustering scenario, in which one needs to solve multiple clustering problems simultaneously because of the presence of common centroids (e.g., ancestors) shared by clusters (e.g., possible descendants of a certain ancestor) from different multiple-cluster samples (e.g., different human subpopulations). In this paper we present a hierarchical nonparametric Bayesian model to address this problem in the context of multi-population haplotype inference. Uncovering the haplotypes of single nucleotide polymorphisms (SNPs) is essential for many biological and medical applications. While it is not uncommon for genotype data to be pooled from multiple ethnically distinct populations, few existing programs have explicitly leveraged individual ethnic information for haplotype inference. In this paper we present a new haplotype inference program, Haploi, which makes use of such information and is readily applicable to genotype sequences with thousands of SNPs from heterogeneous populations, with competitive and sometimes superior speed and accuracy compared to state-of-the-art programs. Underlying Haploi is a new haplotype distribution model based on a nonparametric Bayesian formalism known as the hierarchical Dirichlet process, which represents a tractable surrogate to the coalescent process. The proposed model is exchangeable, unbounded, and capable of coupling demographic information of different populations.
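The idea of ancestors shared across populations, with an open-ended number of them, can be illustrated with a two-level Pólya urn draw from a hierarchical Dirichlet process prior. The following is a minimal sketch and not the Haploi implementation: the function name `sample_hdp_urn` and the concentration parameters `tau` (population level) and `gamma` (shared level) are assumptions made for illustration, and the sketch covers only the clustering prior, not how haplotypes are inherited from ancestors or how genotypes are observed.

```python
import random
from collections import Counter


def sample_hdp_urn(pop_sizes, tau=1.0, gamma=1.0, seed=0):
    """Draw ancestor labels for each population from a two-level Polya urn.

    pop_sizes: number of haplotypes to draw in each population.
    tau:   hypothetical concentration of each population-level urn.
    gamma: hypothetical concentration of the shared top-level urn.
    """
    rng = random.Random(seed)
    master = Counter()  # m_k: how often ancestor k was drawn from the shared urn
    assignments = []
    for n_j in pop_sizes:
        local = Counter()  # n_jk: haplotypes in this population using ancestor k
        labels = []
        for i in range(n_j):
            # With probability i / (i + tau), reuse an ancestor already present
            # in this population, chosen in proportion to its local count.
            if rng.random() < i / (i + tau):
                k = rng.choices(list(local), weights=list(local.values()))[0]
            else:
                # Otherwise consult the shared urn: reuse a known ancestor in
                # proportion to m_k, or create a new one with weight gamma.
                m = sum(master.values())
                if rng.random() < m / (m + gamma):
                    k = rng.choices(list(master), weights=list(master.values()))[0]
                else:
                    k = len(master)  # label for a brand-new ancestor
                master[k] += 1
            local[k] += 1
            labels.append(k)
        assignments.append(labels)
    return assignments, master


if __name__ == "__main__":
    assignments, master = sample_hdp_urn([50, 50, 50], tau=2.0, gamma=1.5, seed=1)
    shared = set(assignments[0]) & set(assignments[1])
    print("ancestors in total:", len(master))
    print("ancestors shared by populations 0 and 1:", sorted(shared))
```

Because every population falls back on the same top-level urn, an ancestor that first appears in one population can reappear in another, and the total number of ancestors is not fixed in advance but grows with the data.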
Recommendations
- Modeling population structure under hierarchical Dirichlet processes
- Hidden Markov Dirichlet process: modeling genetic inference in open ancestral space
- Bayesian Inference on Population Structure: From Parametric to Nonparametric Modeling
- A nonparametric HMM for genetic imputation and coalescent inference
- Hierarchical Dirichlet Processes
Cites work
- scientific article; zbMATH DE number 3817476
- scientific article; zbMATH DE number 3357695
- 10.1162/jmlr.2003.3.4-5.993
- A Bayesian analysis of some nonparametric problems
- A Method for Combining Inference Across Related Nonparametric Bayesian Models
- Bayesian Density Estimation and Inference Using Mixtures
- Computational Methods for SNPs and Haplotype Inference
- Ferguson distributions via Pólya urn schemes
- Hidden Markov Dirichlet process: modeling genetic inference in open ancestral space
- Hierarchical Dirichlet Processes
- Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems
- Pólya-like urns and the Ewens' sampling formula
Cited in (5)
- Hierarchical species sampling models
- A nonparametric HMM for genetic imputation and coalescent inference
- Hidden Markov Dirichlet process: modeling genetic inference in open ancestral space
- PyPop: a software framework for population genomics: analyzing large-scale multi-locus genotype data
- Modeling population structure under hierarchical Dirichlet processes