An improved data stream algorithm for clustering (Q904105)

From MaRDI portal
scientific article
Language Label Description Also known as
English
An improved data stream algorithm for clustering
scientific article

    Statements

    An improved data stream algorithm for clustering (English)
    0 references
    0 references
    0 references
    15 January 2016
    0 references
    The paper deals with the problem of the \(k\)-center problem, i.e., given a set of points \(P\) in a metric space and an integer \(k\), we want to find \(k\) points \(c_1, c_2, \dots, c_k\) such that \(\max_{p \in P} \min_{1\leq i \leq k} |p c_i|\) is minimized, where \(|pq|\) denotes the distance between \(p\) and \(q\). The authors make some additional assumptions, namely they take the \(d\)-dimensional Euclidean metric space \(\mathbb R^d\). Moreover, they assume that the points arrive one by one in a stream and we are allowed to examine each point only once, and only a small amount of information can be stored in a device. The authors start with giving a single-pass data stream algorithm that for any \(d \geq 1\) uses the \(O(d / \varepsilon)\) space and returns two centers together with their radii that guarantee a \((1.8 + \varepsilon)\)-approximation for the Euclidean \(2\)-center problem. Each update in the algorithm takes \(O(d / \varepsilon)\) time. Later, they extend the algorithm to the Euclidean \(k\)-center problem. They design an approximation algorithm that for some constant \(k > 2\) and for any dimension \(d \geq 1\) uses \(O(2^k (k+3)! d / \varepsilon )\) space and returns \(k\) centers that guarantee a \((1.8 + \varepsilon)\)-approximation. Each update in the algorithm takes \(O(2^k (k+2)! d / \varepsilon)\) time.
    0 references
    0 references
    Euclidean \(k\)-clustering
    0 references
    Euclidean \(k\)-center problem
    0 references
    streaming algorithms
    0 references
    computational geometry
    0 references
    0 references