An encoding for order-preserving matching

From MaRDI portal
Publication:5111725

DOI10.4230/LIPICS.ESA.2017.38zbMATH Open1442.68038arXiv1610.02865OpenAlexW2595089934MaRDI QIDQ5111725FDOQ5111725

Rossano Venturini, Travis Gagie, Giovanni Manzini

Publication date: 27 May 2020

Abstract: Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the {em relative order} of their characters is the same: e.g., 4,1,3,2 and 10,3,7,5 are an order-preserving match. We show how, given a string S[1..n] over an arbitrary alphabet and a constant cgeq1, we can build an O(nloglogn)-bit encoding such that later, given a pattern P[1..m] with mleqlgcn, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order-preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal; our query time is optimal if logsigma=Omega(logn). Our space bound contrasts with the Omega(nlogn) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(lgcn) neighbouring characters.


Full work available at URL: https://arxiv.org/abs/1610.02865




Recommendations




Cites Work


Cited In (6)





This page was built for publication: An encoding for order-preserving matching

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q5111725)