On the concentration of the missing mass
Publication:742962
Abstract: A random variable is sampled from a discrete distribution. The missing mass is the probability of the set of points not observed in the sample. We sharpen and simplify McAllester and Ortiz's results (JMLR, 2003) bounding the probability of large deviations of the missing mass. Along the way, we refine and rigorously prove a fundamental inequality of Kearns and Saul (UAI, 1998).
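To make the definition in the abstract concrete, here is a minimal Python sketch, not taken from the paper. It computes the missing mass of a sample drawn from a known discrete distribution and compares its simulated average with the standard identity E[missing mass] = sum_i p_i (1 - p_i)^n for n i.i.d. draws. The function names `missing_mass`, `expected_missing_mass`, and `simulate`, and the toy distribution in the usage note, are assumptions made for illustration only.

```python
import random


def missing_mass(probs, sample):
    """Total probability of the symbols that never appear in `sample`.

    `probs` maps each symbol to its probability (hypothetical input format).
    """
    seen = set(sample)
    return sum(p for symbol, p in probs.items() if symbol not in seen)


def expected_missing_mass(probs, n):
    """Exact expectation for n i.i.d. draws: sum_i p_i * (1 - p_i)^n."""
    return sum(p * (1.0 - p) ** n for p in probs.values())


def simulate(probs, n, trials, seed=0):
    """Draw `trials` independent samples of size n; return each missing mass."""
    rng = random.Random(seed)
    symbols = list(probs)
    weights = [probs[s] for s in symbols]
    return [
        missing_mass(probs, rng.choices(symbols, weights=weights, k=n))
        for _ in range(trials)
    ]
```

For example, with `probs = {"a": 0.5, "b": 0.3, "c": 0.2}` and the sample `["a", "a", "b"]`, only `"c"` is unobserved, so the missing mass is 0.2. The concentration results described in the abstract concern how far a single value returned by `simulate` can deviate from `expected_missing_mass`; this sketch only illustrates the quantities involved and does not reproduce any bound.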
Recommendations
- The missing mass problem
- 10.1162/1532443041424292
- Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications
- On consistent and rate optimal estimation of the missing mass
- An optimal uniform concentration inequality for discrete entropies on finite alphabets in the high-dimensional setting
Cited In (16)
- Fano's inequality for random variables
- Asymptotic properties of Turing's formula in relative error
- 10.1162/1532443041424292
- On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables
- The Kearns-Saul inequality for Bernoulli and Poisson-binomial distributions
- Uniform Hanson-Wright type concentration inequalities for unbounded entries via the entropy method
- On the sub-Gaussianity of the missing mass
- The missing mass problem
- Learning Theory
- On Sub-Gaussian Concentration of Missing Mass
- Non-asymptotic sub-Gaussian error bounds for hypothesis testing
- On density perturbations and missing mass
- Large-sample properties of unsupervised estimation of the linear discriminant using projection pursuit
- Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications
- On consistent and rate optimal estimation of the missing mass
- Concentration bounds for unigram language models