Rediscovery of Good-Turing estimators via Bayesian nonparametrics
From MaRDI portal
Publication:2805189
Abstract: The problem of estimating discovery probabilities originated in statistical ecology, and in recent years it has become popular because it appears frequently in challenging applications in genetics, bioinformatics, linguistics, design of experiments, machine learning, and other fields. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this paper we investigate the relationship between the celebrated Good-Turing approach, a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two-parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library.
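To make the two approaches named in the abstract concrete, the following is a minimal Python sketch that computes both kinds of point estimate from a sample of species labels. The abstract itself gives no formulas, so the forms used here are the standard textbook ones and should be read as assumptions: the Good-Turing estimate (l + 1) m_{l+1} / n for the probability that the next observation is a species seen l times so far, and the two-parameter Poisson-Dirichlet (Pitman-Yor) posterior estimates (theta + sigma k) / (theta + n) for a new species and (l - sigma) m_l / (theta + n) for a species seen l >= 1 times. Here n is the sample size, k the number of distinct species, and m_l the number of species observed exactly l times. The function names and the parameter values in the usage note are hypothetical and chosen only for illustration; the smoothing and the credible intervals studied in the paper are not reproduced.

```python
from collections import Counter


def frequency_counts(sample):
    """Return (n, k, m): sample size, number of distinct species,
    and m[l] = number of species observed exactly l times."""
    species_counts = Counter(sample)
    n = sum(species_counts.values())
    k = len(species_counts)
    m = Counter(species_counts.values())
    return n, k, m


def good_turing(sample, l):
    """Unsmoothed Good-Turing estimate of the probability that the next
    observation is a species seen exactly l times (l = 0: a new species)."""
    n, _, m = frequency_counts(sample)
    return (l + 1) * m.get(l + 1, 0) / n


def pitman_yor(sample, l, sigma, theta):
    """Posterior estimate of the same probability under a two-parameter
    Poisson-Dirichlet prior with 0 <= sigma < 1 and theta > -sigma
    (standard predictive form, assumed here rather than taken from the page)."""
    n, k, m = frequency_counts(sample)
    if l == 0:
        return (theta + sigma * k) / (theta + n)
    return (l - sigma) * m.get(l, 0) / (theta + n)
```

For example, with `sample = list("aaabbcdde")` (n = 9, k = 5, two singletons), `good_turing(sample, 0)` evaluates to 2/9, while `pitman_yor(sample, 0, sigma=0.5, theta=1.0)` evaluates to 3.5/10. In both cases the estimates over all frequencies l sum to one for this sample.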
Recommendations
- Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics
- Bayesian Nonparametric Estimation of the Probability of Discovering New Species
- A new estimator of the discovery probability
- Predicting the Conditional Probability of Discovering a New Class
- Nonparametric bayes estimation of the probability of discovering a new species
Cites work
- A new estimator of the discovery probability
- A triptych of discrete distributions related to the stable law
- Bayesian Nonparametric Estimation of the Probability of Discovering New Species
- Conditional formulae for Gibbs-type exchangeable random partitions
- Efficiently sampling nested Archimedean copulas
- Exchangeable and partially exchangeable random partitions
- Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws
- Predicting the Conditional Probability of Discovering a New Class
- The Number of New Species, and the Increase in Population Coverage, When a Sample Is Increased
- The Population Frequencies of Species and the Estimation of Population Parameters
Cited in (8)
- Generalized Good-Turing Improves Missing Mass Estimation
- A Good-Turing estimator for feature allocation models
- Asymptotic properties of Turing's formula in relative error
- Good-Turing frequency estimation in a finite population
- A new estimator of the discovery probability
- Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics
- Perfect sampling of the posterior in the hierarchical Pitman-Yor process
- Bayesian Nonparametric Estimation of the Probability of Discovering New Species