Optimal Random Sampling from Distributed Streams Revisited

From MaRDI portal
Publication:3095333

DOI10.1007/978-3-642-24100-0_27zbMATH Open1350.68051arXiv1903.12065OpenAlexW135476618WikidataQ60148584 ScholiaQ60148584MaRDI QIDQ3095333FDOQ3095333


Authors: Srikanta Tirthapura, David P. Woodruff Edit this on Wikidata


Publication date: 28 October 2011

Published in: Lecture Notes in Computer Science (Search for Journal in Brave)

Abstract: We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system as well as the computation required of the coordinator. We also present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. As a byproduct, we obtain an improved algorithm for finding the heavy hitters across multiple distributed sites.


Full work available at URL: https://arxiv.org/abs/1903.12065




Recommendations




Cites Work


Cited In (17)





This page was built for publication: Optimal Random Sampling from Distributed Streams Revisited

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q3095333)