Speeding up q-gram mining on grammar-based compressed texts

From MaRDI portal
Publication:2904495

DOI10.1007/978-3-642-31265-6_18zbMATH Open1358.68339arXiv1202.3311OpenAlexW3098704848MaRDI QIDQ2904495FDOQ2904495


Authors: Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda Edit this on Wikidata


Publication date: 14 August 2012

Published in: Combinatorial Pattern Matching (Search for Journal in Brave)

Abstract: We present an efficient algorithm for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP mathcalT of size n that represents string T, the algorithm computes the occurrence frequencies of all q-grams in T, by reducing the problem to the weighted q-gram frequencies problem on a trie-like structure of size m=|T|mathitdup(q,mathcalT), where mathitdup(q,mathcalT) is a quantity that represents the amount of redundancy that the SLP captures with respect to q-grams. The reduced problem can be solved in linear time. Since m=O(qn), the running time of our algorithm is O(min|T|mathitdup(q,mathcalT),qn), improving our previous O(qn) algorithm when q=Omega(|T|/n).


Full work available at URL: https://arxiv.org/abs/1202.3311




Recommendations




Cited In (5)





This page was built for publication: Speeding up \(q\)-gram mining on grammar-based compressed texts

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2904495)