Compressed indexing with signature grammars

DOI: 10.1007/978-3-319-77404-6_25 | MaRDI QID: Q2294697

Authors: Anders Roy Christiansen, Mikko Berggren Ettienne

Publication date: 12 February 2020

Full work available at URL: https://arxiv.org/abs/1711.08217

Data structures (68P05); Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science) (68P30); Grammars and rewriting systems (68Q42); Algorithms on strings (68W32)

Abstract: The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$ report all occurrences of $P$ in $S$. We present a data structure that supports pattern matching queries in $O(m + occ(\lg\lg n + \lg^{\epsilon} z))$ time using $O(z \lg(n/z))$ space where $z$ is the size of the LZ77 parse of $S$ and $\epsilon > 0$ is an arbitrarily small constant, when the alphabet is small or $z = O(n^{1-\delta})$ for any constant $\delta > 0$. We also present two data structures for the general case; one where the space is increased by $O(z \lg\lg z)$, and one where the query time changes from worst-case to expected. These results improve the previously best known solutions. Notably, this is the first data structure that decides if $P$ occurs in $S$ in $O(m)$ time using $O(z \lg(n/z))$ space. Our results are mainly obtained by a novel combination of a randomized grammar construction algorithm with well known techniques relating pattern matching to 2D-range reporting.
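To make the quantities in the abstract concrete, the following is a minimal Python sketch of the problem statement only; it is not the paper's data structure and does not achieve any of the stated bounds. The function names `find_occurrences` and `lz77_phrase_count` are hypothetical, and the parse shown is one common greedy LZ77 variant (longest earlier match, overlap allowed, followed by one explicit character), which is an assumption and may differ in detail from the definition used in the paper.

```python
def find_occurrences(S: str, P: str) -> list[int]:
    """Naive baseline: all 0-based start positions of P in S (overlaps included).

    Runs in O(n * m) time on the uncompressed text; shown only to pin down
    what "report all occurrences" means.
    """
    return [i for i in range(len(S) - len(P) + 1) if S.startswith(P, i)]


def lz77_phrase_count(S: str) -> int:
    """Number of phrases z in a greedy LZ77-style parse (assumed variant).

    Each phrase is the longest prefix of the remaining text that also starts
    at some earlier position (the copy may overlap the phrase itself),
    followed by one explicit character. Quadratic-time illustration only.
    """
    n = len(S)
    i = 0  # start of the current phrase
    z = 0  # phrases emitted so far
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier start positions
            k = 0
            while i + k < n and S[j + k] == S[i + k]:
                k += 1
            longest = max(longest, k)
        i += longest + 1  # copied part plus one explicit character
        z += 1
    return z
```

Highly repetitive strings yield a small `lz77_phrase_count` relative to their length, which is the regime where an index of size $O(z \lg(n/z))$ is much smaller than the text itself.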


Cited in (8)

