Sparse suffix tree construction in optimal time and space

From MaRDI portal
Publication:4575764

DOI10.1137/1.9781611974782.27zbMATH Open1410.68099arXiv1608.00865OpenAlexW2953348849WikidataQ56485243 ScholiaQ56485243MaRDI QIDQ4575764FDOQ4575764


Authors: Paweł Gawrychowski, Tomasz Kociumaka Edit this on Wikidata


Publication date: 16 July 2018

Published in: Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (Search for Journal in Brave)

Abstract: Suffix tree (and the closely related suffix array) are fundamental structures capturing all substrings of a given text essentially by storing all its suffixes in the lexicographical order. In some applications, we work with a subset of b interesting suffixes, which are stored in the so-called sparse suffix tree. Because the size of this structure is Theta(b), it is natural to seek a construction algorithm using only O(b) words of space assuming read-only random access to the text. We design a linear-time Monte Carlo algorithm for this problem, hence resolving an open question explicitly stated by Bille et al. [TALG 2016]. The best previously known algorithm by I et al. [STACS 2014] works in O(nlogb) time. Our solution proceeds in n/b rounds; in the r-th round, we consider all suffixes starting at positions congruent to r modulo n/b. By maintaining rolling hashes, we lexicographically sort all interesting suffixes starting at such positions, and then we merge them with the already considered suffixes. For efficient merging, we also need to answer LCE queries in small space. By plugging in the structure of Bille et al. [CPM 2015] we obtain O(n+blogb) time complexity. We improve this structure, which implies a linear-time sparse suffix tree construction algorithm. We complement our Monte Carlo algorithm with a deterministic verification procedure. The verification takes O(nsqrtlogb) time, which improves upon the bound of O(nlogb) obtained by I et al. [STACS 2014]. This is obtained by first observing that the pruning done inside the previous solution has a rather clean description using the notion of graph spanners with small multiplicative stretch. Then, we are able to decrease the verification time by applying difference covers twice. Combined with the Monte Carlo algorithm, this gives us an O(nsqrtlogb)-time and O(b)-space Las Vegas algorithm.


Full work available at URL: https://arxiv.org/abs/1608.00865




Recommendations




Cited In (14)





This page was built for publication: Sparse suffix tree construction in optimal time and space

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q4575764)