Speeding up q-gram mining on grammar-based compressed texts
From MaRDI portal
Publication:2904495
Abstract: We present an efficient algorithm for calculating -gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP of size that represents string , the algorithm computes the occurrence frequencies of all -grams in , by reducing the problem to the weighted -gram frequencies problem on a trie-like structure of size , where is a quantity that represents the amount of redundancy that the SLP captures with respect to -grams. The reduced problem can be solved in linear time. Since , the running time of our algorithm is , improving our previous algorithm when .
Recommendations
Cited in
(5)- Compact q-gram profiling of compressed strings
- Compact \(q\)-gram profiling of compressed strings
- LZD factorization: simple and practical online grammar compression with variable-to-fixed encoding
- Fast \(q\)-gram mining on SLP compressed strings
- Computing \(q\)-gram non-overlapping frequencies on SLP compressed texts
This page was built for publication: Speeding up \(q\)-gram mining on grammar-based compressed texts
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2904495)