Internal pattern matching queries in a text and applications

From MaRDI portal
Publication:5362985

DOI10.1137/1.9781611973730.36zbMATH Open1371.68340arXiv1311.6235OpenAlexW2949583843MaRDI QIDQ5362985FDOQ5362985


Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń Edit this on Wikidata


Publication date: 5 October 2017

Published in: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (Search for Journal in Brave)

Abstract: We consider several types of internal queries, that is, questions about fragments of a given text T specified in constant space by their locations in T. Our main result is an optimal data structure for Internal Pattern Matching (IPM) queries which, given two fragments x and y, ask for a representation of all fragments contained in y and matching x exactly; this problem can be viewed as an internal version of the Exact Pattern Matching problem. Our data structure answers IPM queries in time proportional to the quotient |y|/|x| of fragments' lengths, which is required due to the information content of the output. If T is a text of length n over an integer alphabet of size sigma, then our data structure occupies O(n/logsigman) machine words (that is, O(nlogsigma) bits) and admits an O(n/logsigman)-time construction algorithm. We show the applicability of IPM queries for answering internal queries corresponding to other classic string processing problems. Among others, we derive optimal data structures reporting the periods of a fragment and testing the cyclic equivalence of two fragments. IPM queries have already found numerous further applications, following the path paved by the classic Longest Common Extension (LCE) queries of Landau and Vishkin (JCSS, 1988). In particular, IPM queries have been implemented in grammar-compressed and dynamic settings and, along with LCE queries, constitute elementary operations of the PILLAR model, developed by Charalampopoulos, Kociumaka, and Wellnitz (FOCS 2020). On the way to our main result, we provide a novel construction of string synchronizing sets of Kempa and Kociumaka (STOC 2019). Our method, based on a new restricted version of the recompression technique of Je.z (J. ACM, 2016), yields a hierarchy of O(logn) string synchronizing sets covering the whole spectrum of fragments' lengths.


Full work available at URL: https://arxiv.org/abs/1311.6235




Recommendations




Cited In (32)





This page was built for publication: Internal pattern matching queries in a text and applications

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q5362985)