Abstract: We wish to estimate the total number of classes in a population based on sample counts, especially in the presence of high latent diversity. Drawing on probability theory that characterizes distributions on the integers by ratios of consecutive probabilities, we construct a nonlinear regression model for the ratios of consecutive frequency counts. This allows us to predict the unobserved count and hence estimate the total diversity. We believe that this is the first approach to depart from the classical mixed Poisson model in this problem. Our method is geometrically intuitive and yields good fits to data with reasonable standard errors. It is especially well-suited to analyzing high diversity datasets derived from next-generation sequencing in microbial ecology. We demonstrate the method's performance in this context and via simulation, and we present a dataset for which our method outperforms all competitors.
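As a rough illustration of the ratio idea in the abstract, the following Python sketch regresses the ratios of consecutive frequency counts on the count index and extrapolates to index 0 to predict the unobserved count. It is a simplified stand-in, not the paper's method: it assumes an unweighted straight-line fit where the abstract describes a nonlinear regression model, and the function name `estimate_richness` and the input layout are hypothetical.

```python
def estimate_richness(freq_counts):
    """Sketch of a ratio-based richness estimate (simplified, assumed model).

    freq_counts[j - 1] is f_j, the number of classes observed exactly j times.
    Assumption: the ratios f_{j+1} / f_j follow a straight line in j. The
    abstract describes a nonlinear regression; a line is used here only to
    keep the example short.
    """
    if len(freq_counts) < 3 or any(f <= 0 for f in freq_counts):
        raise ValueError("need at least three positive frequency counts")

    # Ratios r_j = f_{j+1} / f_j for j = 1, ..., len(freq_counts) - 1.
    js = list(range(1, len(freq_counts)))
    ratios = [freq_counts[j] / freq_counts[j - 1] for j in js]

    # Ordinary least squares for r_j = intercept + slope * j.
    n = len(js)
    mean_j = sum(js) / n
    mean_r = sum(ratios) / n
    sxx = sum((j - mean_j) ** 2 for j in js)
    sxy = sum((j - mean_j) * (r - mean_r) for j, r in zip(js, ratios))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_j

    # Extrapolating to j = 0 gives the predicted ratio f_1 / f_0.
    if intercept <= 0:
        raise ValueError("extrapolated ratio is not positive; model unsuitable")
    f0_hat = freq_counts[0] / intercept

    # Total diversity = observed classes + predicted unobserved classes.
    return sum(freq_counts) + f0_hat, f0_hat


if __name__ == "__main__":
    # Made-up frequency counts, for illustration only.
    total, f0 = estimate_richness([200, 100, 60, 42])
    print(f"predicted unobserved classes: {f0:.1f}, estimated total: {total:.1f}")
```

The guard on the intercept is needed because the extrapolated ratio is the denominator of the predicted unobserved count: a fitted value at or below zero means the assumed line cannot describe the data.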
Recommendations
- A note on marginal count distributions for diversity estimation
- Parametric models for estimating the number of classes
- Bayesian model averaging for estimating the number of classes: applications to the total number of species in metagenomics
- Tuning parameter selection for a penalized estimator of species richness
- scientific article; zbMATH DE number 10134
Cites work
- scientific article; zbMATH DE number 4078473
- scientific article; zbMATH DE number 3587888
- scientific article; zbMATH DE number 2208164
- scientific article; zbMATH DE number 3297775
- A Penalized Nonparametric Maximum Likelihood Approach to Species Richness Estimation
- Equivalence of Truncated Count Mixture Distributions and Mixtures of Truncated Count Distributions
- Estimating the Population Size for Capture-Recapture Data with Unequal Catchability
- Estimating the number of classes
- Estimating the number of species in a stochastic abundance model
- Families of power series distributions, with particular reference to the Lerch family
- Metric Multivariate Poisson Approximation of the Generalized Multinomial Distribution
- Nonlinear Regression with Dependent Observations
- On the Poisson approximation to the multinomial distribution
- On the rate of multivariate Poisson convergence
- Parameter estimation by Hellinger type distance for multivariate distributions based upon probability generating functions
- Poisson mixtures and quasi-infinite divisibility of distributions
- Population size estimation based upon ratios of recapture probabilities
- Univariate Discrete Distributions
Cited in (10)
- Limit theorems for empirical Rényi entropy and divergence with applications to molecular diversity analysis
- A flexible ratio regression approach for zero-truncated capture-recapture counts
- A note on marginal count distributions for diversity estimation
- Clonality: point estimation
- Tuning parameter selection for a penalized estimator of species richness
- breakaway
- scientific article; zbMATH DE number 4072123
- Analyzing fractal property of species abundance distribution and diversity indexes
- Effective numbers in the partitioning of biological diversity
- A modification of Chao's lower bound estimator in the case of one-inflation
This page was built for publication: Estimating diversity via frequency ratios