On consistent and rate optimal estimation of the missing mass
From MaRDI portal
Abstract: Given n samples from a population of individuals belonging to different types with unknown proportions, how do we estimate the probability of discovering a new type at the (n+1)-th draw? This is a classical problem in statistics, commonly referred to as the missing mass estimation problem. Recent results by Ohannessian and Dahleh [Oha12] and by Mossel and Ohannessian [Mos15] showed: i) the impossibility of estimating (learning) the missing mass without imposing further structural assumptions on the type proportions; ii) the consistency of the Good-Turing estimator for the missing mass under the assumption that the tail of the type proportions decays to zero as a regularly varying function with parameter α ∈ (0,1). In this paper we rely on tools from Bayesian nonparametrics to provide an alternative, and simpler, proof of the impossibility of a distribution-free estimation of the missing mass. To our knowledge, the use of Bayesian ideas to study large-sample asymptotics for the missing mass is new, and it could be of independent interest. Still relying on Bayesian nonparametric tools, we then show that under regularly varying type proportions the convergence rate of the Good-Turing estimator is the best rate that any estimator can achieve, up to a slowly varying function, and that the minimax rate must be at least n^{-α/2}. We conclude with a discussion of our results and by conjecturing that the Good-Turing estimator is a rate-optimal minimax estimator under regularly varying type proportions.
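To make the quantities in the abstract concrete, here is a minimal simulation sketch, not the authors' method. It uses the standard Good-Turing estimate of the missing mass, N_1/n, where N_1 is the number of types seen exactly once among n draws, and compares it with the true missing mass (the total probability of unseen types). The type proportions p_j ∝ j^(-1/α), truncated to finitely many types, are an illustrative assumption chosen to mimic a regularly varying tail; the function names are hypothetical.

```python
import random
from collections import Counter


def good_turing_missing_mass(sample):
    """Good-Turing estimate of the missing mass: N_1 / n, where N_1 is the
    number of types that appear exactly once in a sample of size n."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


def true_missing_mass(sample, probs):
    """Total probability of the types (indices into probs) absent from sample."""
    seen = set(sample)
    return sum(p for j, p in enumerate(probs) if j not in seen)


def power_law_probs(alpha, n_types):
    """Hypothetical illustrative proportions p_j proportional to j^(-1/alpha),
    truncated to n_types types and normalized to sum to one."""
    weights = [j ** (-1.0 / alpha) for j in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    probs = power_law_probs(alpha=0.5, n_types=100_000)
    types = range(len(probs))
    for n in (1_000, 10_000, 100_000):
        sample = rng.choices(types, weights=probs, k=n)
        gt = good_turing_missing_mass(sample)
        mm = true_missing_mass(sample, probs)
        print(f"n={n:>7}  Good-Turing={gt:.5f}  true={mm:.5f}  ratio={gt / mm:.3f}")
```

Reporting the ratio of the estimate to the true missing mass reflects the relative-error sense of consistency discussed above; because the simulated distribution is truncated, it only approximates a genuinely regularly varying tail.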
Cites work
- scientific article; zbMATH DE number 3518091
- scientific article; zbMATH DE number 4000257
- scientific article; zbMATH DE number 3248623
- 10.1162/1532443041424292
- A Bayesian analysis of some nonparametric problems
- A Brief History of Generative Models for Power Law and Lognormal Distributions
- Always Good Turing: asymptotically optimal probability estimation
- Asymptotic normality of a nonparametric estimator of sample coverage
- Asymptotic properties of Turing's formula in relative error
- Bounds for the difference between median and mean of beta and negative binomial distributions
- Combinatorial stochastic processes. Ecole d'Eté de Probabilités de Saint-Flour XXXII -- 2002.
- Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications
- Critical phenomena in natural sciences. Chaos, fractals, selforganization and disorder: Concepts and tools.
- Distinct Values Estimators for Power Law Distributions
- Estimating the number of unseen variants in the human genome
- Fundamentals of nonparametric Bayesian inference
- Moderate deviations for a nonparametric estimator of sample coverage
- Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws
- On the impossibility of estimating densities in the extreme tail
- Optimal discovery with probabilistic expert advice: finite time analysis and macroscopic optimality
- Power-law distributions in empirical data
- The Population Frequencies of Species and the Estimation of Population Parameters
- The Structure and Function of Complex Networks
- The sample size required in importance sampling
- The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator
- Universal Coding and Order Identification by Model Selection Methods
- Universal Compression of Memoryless Sources Over Unknown Alphabets
Cited in (6)
- Always Good Turing: asymptotically optimal probability estimation
- On the sub-Gaussianity of the missing mass
- On the concentration of the missing mass
- Near-optimal estimation of the unseen under regularly varying tail populations
- Necessary and sufficient conditions for the asymptotic normality of higher order Turing estimators
- Generalized Good-Turing Improves Missing Mass Estimation
MaRDI item Q2077330