Optimal Random Sampling from Distributed Streams Revisited
From MaRDI portal
Publication:3095333
Abstract: We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system as well as the computation required of the coordinator. We also present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. As a byproduct, we obtain an improved algorithm for finding the heavy hitters across multiple distributed sites.
Recommendations
- Continuous sampling from distributed streams
- Stream sampling for variance-optimal estimation of subset sums
- Efficient stream sampling for variance-optimal estimation of subset sums
- Boosting distinct random sampling for basic counting on the union of distributed streams
- Perfect \(L_p\) sampling in a data stream
- Streaming algorithms via precision sampling
- Efficient sampling of non-strict turnstile data streams
- Efficient sampling of non-strict turnstile data streams
- Sampling in dynamic data streams and applications
- SAMPLING IN DYNAMIC DATA STREAMS AND APPLICATIONS
Cites work
- scientific article; zbMATH DE number 5764835 (Why is no real title available?)
- scientific article; zbMATH DE number 3750146 (Why is no real title available?)
- scientific article; zbMATH DE number 2119720 (Why is no real title available?)
- Continuous sampling from distributed streams
- Data Streams: Algorithms and Applications
- Distributed streams algorithms for sliding windows
- Effective computations on sliding windows
- Functional Monitoring without Monotonicity
- Optimal sampling from sliding windows
- Random sampling with a reservoir
- Sketching asynchronous data streams over sliding windows
Cited in
(17)- Towards Optimal Moment Estimation in Streaming and Distributed Models
- Space-efficient estimation of statistics over sub-sampled streams
- A general result for selecting balanced unequal probability samples from a stream
- Continuous sampling from distributed streams
- On Local Distributed Sampling and Counting
- Towards Optimal Moment Estimation in Streaming and Distributed Models
- Perfect \(L_p\) sampling in a data stream
- New algorithms for distributed sliding windows
- Efficient sampling of non-strict turnstile data streams
- Sampling from Dense Streams without Penalty
- Model counting meets \(F_0\) estimation
- Distributed monitoring of election winners
- Optimal tracking of distributed heavy hitters and quantiles
- Boosting distinct random sampling for basic counting on the union of distributed streams
- Compact samples for data dissemination
- Improved algorithms for distributed entropy monitoring
- Randomized algorithms for tracking distributed count, frequencies, and ranks
This page was built for publication: Optimal Random Sampling from Distributed Streams Revisited
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q3095333)