Permuted scaled matching (Q294926): Difference between revisions

From MaRDI portal
Importer (talk | contribs)
Created a new Item
 
ReferenceBot (talk | contribs)
Changed an Item
 
(4 intermediate revisions by 4 users not shown)
Property / review text
 
The authors show how, given a text \(T\) of length \(n\) over an alphabet \(\Sigma\) and a multiset \(P\) of \(m\) characters from the same alphabet, for all positive integers \(k\) simultaneously we can find all the substrings (called permuted \(k\)-scaled occurrences of \(P\)) of \(T\) that contain exactly \(k\) times as many copies of each distinct character as there are in \(P\). Specifically, they give an \(O (n \log |\Sigma|)\)-time deterministic algorithm and an \(O (n)\)-time Monte Carlo algorithm that return, for each distinct position \(i\), the smallest \(k\) (if one exists) such that \(T [i, i + k m - 1]\) is a permuted \(k\)-scaled occurrence of \(P\). This restricts the size of the possible output to \(O (n)\) without sacrificing any information since, if \(T [i, j]\) and \(T [j + 1, h]\) are permuted scaled occurrences of \(P\), then \(T [i, h]\) is as well. The first main idea behind the authors' algorithms is that \(T [i, j]\) is a permuted scaled occurrence of \(P\) if and only if the following conditions hold: 1) for each character \(\sigma \in \Sigma\), \(\#_\sigma (T [0, i - 1]) \equiv \#_\sigma (T [0, j]) \bmod \#_\sigma (P)\), where \(\#_\sigma\) returns the number of copies of \(\sigma\) in its argument; 2) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\), \[ \left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor = \left\lfloor \frac{\#_{\sigma_\ell} (T [0, j])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, j])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\;. \] The second main idea is to compute for each position \(i\) in \(T\) a vector of length \(2 |\Sigma|\) whose components are, to start with, \(\#_\sigma (T [0, i - 1])\) for each character \(\sigma \in \Sigma\); next, \(\left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\); and, finally, \(i\). The vectors for positions \(i\) and \(j\) with \(i < j\) agree on their prefixes of length \(2 |\Sigma| - 1\) if and only if \(T [i, j]\) is a permuted scaled occurrence of \(P\). This immediately yields an \(O (n |\Sigma|)\)-time algorithm based on computing the vectors naïvely and radix-sorting them. The third main idea is to speed up the computation of the vectors using deterministic renaming or randomized hashing and the fact that vectors for consecutive positions differ on \(O (1)\) components. Applying a result by \textit{T. Kociumaka} et al. [J. Comput. Syst. Sci. 84, 205--218 (2017; Zbl 1353.68225)] to group the vectors on the basis of their first \(2 |\Sigma| - 1\) components, then yields the deterministic and Monte Carlo time bounds stated above.
Property / review text: The authors show how, given a text \(T\) of length \(n\) over an alphabet \(\Sigma\) and a multiset \(P\) of \(m\) characters from the same alphabet, for all positive integers \(k\) simultaneously we can find all the substrings (called permuted \(k\)-scaled occurrences of \(P\)) of \(T\) that contain exactly \(k\) times as many copies of each distinct character as there are in \(P\). Specifically, they give an \(O (n \log |\Sigma|)\)-time deterministic algorithm and an \(O (n)\)-time Monte Carlo algorithm that return, for each distinct position \(i\), the smallest \(k\) (if one exists) such that \(T [i, i + k m - 1]\) is a permuted \(k\)-scaled occurrence of \(P\). This restricts the size of the possible output to \(O (n)\) without sacrificing any information since, if \(T [i, j]\) and \(T [j + 1, h]\) are permuted scaled occurrences of \(P\), then \(T [i, h]\) is as well. The first main idea behind the authors' algorithms is that \(T [i, j]\) is a permuted scaled occurrence of \(P\) if and only if the following conditions hold: 1) for each character \(\sigma \in \Sigma\), \(\#_\sigma (T [0, i - 1]) \equiv \#_\sigma (T [0, j]) \bmod \#_\sigma (P)\), where \(\#_\sigma\) returns the number of copies of \(\sigma\) in its argument; 2) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\), \[ \left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor = \left\lfloor \frac{\#_{\sigma_\ell} (T [0, j])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, j])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\;. \] The second main idea is to compute for each position \(i\) in \(T\) a vector of length \(2 |\Sigma|\) whose components are, to start with, \(\#_\sigma (T [0, i - 1])\) for each character \(\sigma \in \Sigma\); next, \(\left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\); and, finally, \(i\). The vectors for positions \(i\) and \(j\) with \(i < j\) agree on their prefixes of length \(2 |\Sigma| - 1\) if and only if \(T [i, j]\) is a permuted scaled occurrence of \(P\). This immediately yields an \(O (n |\Sigma|)\)-time algorithm based on computing the vectors naïvely and radix-sorting them. The third main idea is to speed up the computation of the vectors using deterministic renaming or randomized hashing and the fact that vectors for consecutive positions differ on \(O (1)\) components. Applying a result by \textit{T. Kociumaka} et al. [J. Comput. Syst. Sci. 84, 205--218 (2017; Zbl 1353.68225)] to group the vectors on the basis of their first \(2 |\Sigma| - 1\) components, then yields the deterministic and Monte Carlo time bounds stated above. / rank
 
Normal rank
Property / reviewed by
 
Property / reviewed by: Travis Gagie / rank
 
Normal rank
Property / Mathematics Subject Classification ID
 
Property / Mathematics Subject Classification ID: 68W32 / rank
 
Normal rank
Property / Mathematics Subject Classification ID
 
Property / Mathematics Subject Classification ID: 68R15 / rank
 
Normal rank
Property / zbMATH DE Number
 
Property / zbMATH DE Number: 6594178 / rank
 
Normal rank
Property / zbMATH Keywords
 
pattern matching
Property / zbMATH Keywords: pattern matching / rank
 
Normal rank
Property / zbMATH Keywords
 
scaled matching
Property / zbMATH Keywords: scaled matching / rank
 
Normal rank
Property / zbMATH Keywords
 
jumbled matching
Property / zbMATH Keywords: jumbled matching / rank
 
Normal rank
Property / MaRDI profile type
 
Property / MaRDI profile type: MaRDI publication profile / rank
 
Normal rank
Property / full work available at URL
 
Property / full work available at URL: https://doi.org/10.1016/j.tcs.2016.02.036 / rank
 
Normal rank
Property / OpenAlex ID
 
Property / OpenAlex ID: W2407207612 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Efficient text fingerprinting via Parikh mapping / rank
 
Normal rank
Property / cites work
 
Property / cites work: Real scaled matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: Real two dimensional scaled matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: Efficient one-dimensional real scaled matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: Alphabet-Independent and Scaled Dictionary Matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: On Hardness of Jumbled Indexing / rank
 
Normal rank
Property / cites work
 
Property / cites work: Faster two dimensional scaled matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: Efficient pattern matching with scaling / rank
 
Normal rank
Property / cites work
 
Property / cites work: A fast string searching algorithm / rank
 
Normal rank
Property / cites work
 
Property / cites work: ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS / rank
 
Normal rank
Property / cites work
 
Property / cites work: On approximate jumbled pattern matching in strings / rank
 
Normal rank
Property / cites work
 
Property / cites work: Scaled and permuted string matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: Permuted Scaled Matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: Clustered Integer 3SUM via Additive Combinatorics / rank
 
Normal rank
Property / cites work
 
Property / cites work: Near Linear Time Construction of an Approximate Index for All Maximum Consecutive Sub-sums of a Sequence / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4055156 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Time-space-optimal string matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4125823 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Efficient randomized pattern-matching algorithms / rank
 
Normal rank
Property / cites work
 
Property / cites work: Fast Pattern Matching in Strings / rank
 
Normal rank
Property / cites work
 
Property / cites work: Efficient Indexes for Jumbled Pattern Matching with Constant-Sized Alphabet / rank
 
Normal rank
Property / cites work
 
Property / cites work: Fast Algorithms for Abelian Periods in Words and Greatest Common Divisor Queries / rank
 
Normal rank
Property / cites work
 
Property / cites work: Sub-quadratic time and linear space data structures for permutation matching in binary strings / rank
 
Normal rank
Property / cites work
 
Property / cites work: Deterministic Sampling–A New Technique for Fast Pattern Matching / rank
 
Normal rank
links / mardi / namelinks / mardi / name
 

Latest revision as of 04:51, 12 July 2024

scientific article
Language Label Description Also known as
English
Permuted scaled matching
scientific article

    Statements

    Permuted scaled matching (English)
    0 references
    0 references
    0 references
    0 references
    16 June 2016
    0 references
    The authors show how, given a text \(T\) of length \(n\) over an alphabet \(\Sigma\) and a multiset \(P\) of \(m\) characters from the same alphabet, for all positive integers \(k\) simultaneously we can find all the substrings (called permuted \(k\)-scaled occurrences of \(P\)) of \(T\) that contain exactly \(k\) times as many copies of each distinct character as there are in \(P\). Specifically, they give an \(O (n \log |\Sigma|)\)-time deterministic algorithm and an \(O (n)\)-time Monte Carlo algorithm that return, for each distinct position \(i\), the smallest \(k\) (if one exists) such that \(T [i, i + k m - 1]\) is a permuted \(k\)-scaled occurrence of \(P\). This restricts the size of the possible output to \(O (n)\) without sacrificing any information since, if \(T [i, j]\) and \(T [j + 1, h]\) are permuted scaled occurrences of \(P\), then \(T [i, h]\) is as well. The first main idea behind the authors' algorithms is that \(T [i, j]\) is a permuted scaled occurrence of \(P\) if and only if the following conditions hold: 1) for each character \(\sigma \in \Sigma\), \(\#_\sigma (T [0, i - 1]) \equiv \#_\sigma (T [0, j]) \bmod \#_\sigma (P)\), where \(\#_\sigma\) returns the number of copies of \(\sigma\) in its argument; 2) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\), \[ \left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor = \left\lfloor \frac{\#_{\sigma_\ell} (T [0, j])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, j])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\;. \] The second main idea is to compute for each position \(i\) in \(T\) a vector of length \(2 |\Sigma|\) whose components are, to start with, \(\#_\sigma (T [0, i - 1])\) for each character \(\sigma \in \Sigma\); next, \(\left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\); and, finally, \(i\). The vectors for positions \(i\) and \(j\) with \(i < j\) agree on their prefixes of length \(2 |\Sigma| - 1\) if and only if \(T [i, j]\) is a permuted scaled occurrence of \(P\). This immediately yields an \(O (n |\Sigma|)\)-time algorithm based on computing the vectors naïvely and radix-sorting them. The third main idea is to speed up the computation of the vectors using deterministic renaming or randomized hashing and the fact that vectors for consecutive positions differ on \(O (1)\) components. Applying a result by \textit{T. Kociumaka} et al. [J. Comput. Syst. Sci. 84, 205--218 (2017; Zbl 1353.68225)] to group the vectors on the basis of their first \(2 |\Sigma| - 1\) components, then yields the deterministic and Monte Carlo time bounds stated above.
    0 references
    0 references
    0 references
    pattern matching
    0 references
    scaled matching
    0 references
    jumbled matching
    0 references
    0 references