Document listing on repetitive collections with guaranteed performance

From MaRDI portal
Publication:2632016

DOI10.1016/J.TCS.2018.11.022zbMATH Open1422.68062arXiv1707.06374OpenAlexW2964108621WikidataQ128847431 ScholiaQ128847431MaRDI QIDQ2632016FDOQ2632016

Gonzalo Navarro

Publication date: 17 May 2019

Published in: Theoretical Computer Science (Search for Journal in Brave)

Abstract: We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size N over alphabet [1,sigma] is composed of D copies of a string of size n, and s edits are applied on ranges of copies. We introduce the first document listing index with size ildeO(n+s), precisely O((nlogsigma+slog2N)logD) bits, and with useful worst-case time guarantees: Given a pattern of length m, the index reports the doc>0 strings where it appears in time O(mlog1+epsilonNcdotdoc), for any constant epsilon>0 (and tells in time O(mlogN) if doc=0). Our technique is to augment a range data structure that is commonly used on grammar-based indexes, so that instead of retrieving all the pattern occurrences, it computes useful summaries on them. We show that the idea has independent interest: we introduce the first grammar-based index that, on a text T[1,N] with a grammar of size r, uses O(rlogN) bits and counts the number of occurrences of a pattern P[1,m] in time O(m2+mlog2+epsilonr), for any constant epsilon>0. We also give the first index using O(zlog(N/z)logN) bits, where T is parsed by Lempel-Ziv into z phrases, counting occurrences in time O(mlog2+epsilonN).


Full work available at URL: https://arxiv.org/abs/1707.06374





Cites Work


Cited In (5)

Uses Software


Recommendations





This page was built for publication: Document listing on repetitive collections with guaranteed performance

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2632016)