On the concentration of the missing mass
From MaRDI portal
Publication:742962
DOI: 10.1214/ECP.V18-2359
zbMATH Open: 1329.60050
arXiv: 1210.3248
MaRDI QID: Q742962 (FDOQ742962)
Authors: Daniel Berend, Aryeh Kontorovich
Publication date: 22 September 2014
Published in: Electronic Communications in Probability
Abstract: A random variable is sampled from a discrete distribution. The missing mass is the probability of the set of points not observed in the sample. We sharpen and simplify McAllester and Ortiz's results (JMLR, 2003) bounding the probability of large deviations of the missing mass. Along the way, we refine and rigorously prove a fundamental inequality of Kearns and Saul (UAI, 1998).
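As an illustration of the definition in the abstract, here is a minimal sketch that computes the missing mass of an i.i.d. sample of size n from a finite discrete distribution. It also computes the exact expectation, using the fact that point i is missed with probability (1 - p_i)^n. The function names and the toy distribution are hypothetical. The sketch does not implement the paper's deviation bounds.

```python
import random


def missing_mass(probs, sample):
    """Total probability of the support points that never appear in `sample`."""
    seen = set(sample)
    return sum(p for i, p in enumerate(probs) if i not in seen)


def expected_missing_mass(probs, n):
    """E[U_n] = sum_i p_i * (1 - p_i)^n, since point i is missed w.p. (1 - p_i)^n."""
    return sum(p * (1.0 - p) ** n for p in probs)


def draw_sample(probs, n, rng):
    """Draw n i.i.d. indices from the discrete distribution `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=n)


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.5, 0.25, 0.125, 0.125]  # hypothetical toy distribution
    n = 5                              # sample size
    trials = 20000                     # Monte Carlo repetitions
    avg = sum(missing_mass(probs, draw_sample(probs, n, rng))
              for _ in range(trials)) / trials
    print(f"exact E[U_n] = {expected_missing_mass(probs, n):.4f}, "
          f"Monte Carlo average = {avg:.4f}")
```

The missing mass is a random variable that depends on which points the sample happens to hit. The Monte Carlo average should therefore be close to the exact expectation, while individual samples can deviate from it. The paper's results bound the probability of such large deviations.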
Full work available at URL: https://arxiv.org/abs/1210.3248
Recommendations
- The missing mass problem
- 10.1162/1532443041424292
- Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications
- On consistent and rate optimal estimation of the missing mass
- An optimal uniform concentration inequality for discrete entropies on finite alphabets in the high-dimensional setting
Cited In (16)
- Fano's inequality for random variables
- Asymptotic properties of Turing's formula in relative error
- 10.1162/1532443041424292
- On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables
- The Kearns-Saul inequality for Bernoulli and Poisson-binomial distributions
- Uniform Hanson-Wright type concentration inequalities for unbounded entries via the entropy method
- On the sub-Gaussianity of the missing mass
- The missing mass problem
- Learning Theory
- On Sub-Gaussian Concentration of Missing Mass
- Non-asymptotic sub-Gaussian error bounds for hypothesis testing
- On density perturbations and missing mass
- Large-sample properties of unsupervised estimation of the linear discriminant using projection pursuit
- Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications
- On consistent and rate optimal estimation of the missing mass
- Concentration bounds for unigram language models