Sharp Frequency Bounds for Sample-Based Queries

From MaRDI portal
Publication:6407797

arXiv: 2208.06753; MaRDI QID: Q6407797; FDO: Q6407797


Authors: Eric Bax, John T. Donald


Publication date: 13 August 2022

Abstract: A data sketch algorithm scans a big data set and collects a small amount of data, the sketch, which can be used to statistically infer properties of the big data set. Some data sketch algorithms take a fixed-size random sample of a big data set and use that sample to infer the frequencies of items that meet various criteria in the big data set. This paper shows how to infer probably approximately correct (PAC) bounds on those frequencies efficiently and precisely enough that each frequency bound is either sharp or off by only one, which is the best possible result without exact computation.
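The abstract does not spell out the bounding procedure. As a point of reference only, here is a minimal sketch of a standard exact but brute-force baseline, assuming the sample of size n is drawn uniformly without replacement from N items. Under that assumption the number k of matching items in the sample follows a hypergeometric distribution with unknown population count K, and confidence bounds on K come from inverting its two tails. This is not the authors' algorithm, which the abstract describes as efficient; it scans every candidate K. The function names (`hypergeom_cdf_num`, `hypergeom_sf_num`, `frequency_bounds`) are hypothetical, and exact integer arithmetic is used so that very large binomial coefficients do not overflow floats.

```python
from fractions import Fraction
from math import comb


def hypergeom_cdf_num(N: int, K: int, n: int, k: int) -> int:
    """Numerator of P(X <= k) for X ~ Hypergeometric(N, K, n); the denominator is comb(N, n)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))


def hypergeom_sf_num(N: int, K: int, n: int, k: int) -> int:
    """Numerator of P(X >= k) for X ~ Hypergeometric(N, K, n); the denominator is comb(N, n)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, n + 1))


def frequency_bounds(N: int, n: int, k: int, delta: float) -> tuple[int, int]:
    """Hypothetical helper: integer bounds [lo, hi] on the population count K.

    Assumes 0 <= k <= n <= N, 0 < delta < 1, and sampling without replacement.
    Each one-sided bound fails with probability at most delta, so by the union
    bound [lo, hi] contains the true K with probability at least 1 - 2 * delta.
    """
    total = comb(N, n)
    # Exact threshold delta * comb(N, n), kept as a Fraction to avoid float overflow.
    threshold = Fraction(delta) * total
    # K values under which observing k or fewer matches is not too unlikely.
    plausible_hi = [K for K in range(N + 1) if hypergeom_cdf_num(N, K, n, k) > threshold]
    # K values under which observing k or more matches is not too unlikely.
    plausible_lo = [K for K in range(N + 1) if hypergeom_sf_num(N, K, n, k) > threshold]
    return min(plausible_lo), max(plausible_hi)


# Example: 3 of 10 sampled items match, out of a population of 100.
lo, hi = frequency_bounds(N=100, n=10, k=3, delta=0.025)
print(lo, hi)
```

Because P(X <= k) can only fall as K grows, and P(X >= k) can only rise, each plausible set is a contiguous range of K values. A binary search over K would therefore cut the scan from N + 1 candidates to roughly log2(N) tail evaluations; that is one obvious speedup, not necessarily the one the paper uses.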
