Permuted scaled matching (Q294926): Difference between revisions

The authors show how, given a text \(T\) of length \(n\) over an alphabet \(\Sigma\) and a multiset \(P\) of \(m\) characters from the same alphabet, for all positive integers \(k\) simultaneously we can find all the substrings (called permuted \(k\)-scaled occurrences of \(P\)) of \(T\) that contain exactly \(k\) times as many copies of each distinct character as there are in \(P\). Specifically, they give an \(O (n \log |\Sigma|)\)-time deterministic algorithm and an \(O (n)\)-time Monte Carlo algorithm that return, for each distinct position \(i\), the smallest \(k\) (if one exists) such that \(T [i, i + k m - 1]\) is a permuted \(k\)-scaled occurrence of \(P\). This restricts the size of the possible output to \(O (n)\) without sacrificing any information since, if \(T [i, j]\) and \(T [j + 1, h]\) are permuted scaled occurrences of \(P\), then \(T [i, h]\) is as well. The first main idea behind the authors' algorithms is that \(T [i, j]\) is a permuted scaled occurrence of \(P\) if and only if the following conditions hold: 1) for each character \(\sigma \in \Sigma\), \(\#_\sigma (T [0, i - 1]) \equiv \#_\sigma (T [0, j]) \bmod \#_\sigma (P)\), where \(\#_\sigma\) returns the number of copies of \(\sigma\) in its argument; 2) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\), \[ \left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor = \left\lfloor \frac{\#_{\sigma_\ell} (T [0, j])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, j])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\;. \] The second main idea is to compute for each position \(i\) in \(T\) a vector of length \(2 |\Sigma|\) whose components are, to start with, \(\#_\sigma (T [0, i - 1])\) for each character \(\sigma \in \Sigma\); next, \(\left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\); and, finally, \(i\). The vectors for positions \(i\) and \(j\) with \(i < j\) agree on their prefixes of length \(2 |\Sigma| - 1\) if and only if \(T [i, j]\) is a permuted scaled occurrence of \(P\). This immediately yields an \(O (n |\Sigma|)\)-time algorithm based on computing the vectors naïvely and radix-sorting them. The third main idea is to speed up the computation of the vectors using deterministic renaming or randomized hashing and the fact that vectors for consecutive positions differ on \(O (1)\) components. Applying a result by \textit{T. Kociumaka} et al. [J. Comput. Syst. Sci. 84, 205--218 (2017; Zbl 1353.68225)] to group the vectors on the basis of their first \(2 |\Sigma| - 1\) components, then yields the deterministic and Monte Carlo time bounds stated above.

0 references

reviewed by

Travis Gagie

0 references

Mathematics Subject Classification ID

0 references

0 references

0 references

pattern matching

0 references

scaled matching

0 references

jumbled matching

0 references

MaRDI profile type

MaRDI publication profile

0 references

full work available at URL

https://doi.org/10.1016/j.tcs.2016.02.036

0 references

OpenAlex ID

W2407207612

0 references

cites work

Efficient text fingerprinting via Parikh mapping

0 references

Real scaled matching

0 references

Real two dimensional scaled matching

0 references

Efficient one-dimensional real scaled matching

0 references

Alphabet-Independent and Scaled Dictionary Matching

0 references

On Hardness of Jumbled Indexing

0 references

Faster two dimensional scaled matching

0 references

Efficient pattern matching with scaling

0 references

A fast string searching algorithm

0 references

ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS

0 references

On approximate jumbled pattern matching in strings

0 references

Scaled and permuted string matching

0 references

Permuted Scaled Matching

0 references

Clustered Integer 3SUM via Additive Combinatorics

0 references

Near Linear Time Construction of an Approximate Index for All Maximum Consecutive Sub-sums of a Sequence

0 references

Q4055156

0 references

Time-space-optimal string matching

0 references

Q4125823

0 references

Efficient randomized pattern-matching algorithms

0 references

Fast Pattern Matching in Strings

0 references

Efficient Indexes for Jumbled Pattern Matching with Constant-Sized Alphabet

0 references

Fast Algorithms for Abelian Periods in Words and Greatest Common Divisor Queries

0 references

Sub-quadratic time and linear space data structures for permutation matching in binary strings

0 references

Deterministic Sampling–A New Technique for Fast Pattern Matching

0 references

Sitelinks

Mathematics(1 entry)

mardi Publication:294926

@@ Property / review text @@
+The authors show how, given a text \(T\) of length \(n\) over an alphabet \(\Sigma\) and a multiset \(P\) of \(m\) characters from the same alphabet, for all positive integers \(k\) simultaneously we can find all the substrings (called permuted \(k\)-scaled occurrences of \(P\)) of \(T\) that contain exactly \(k\) times as many copies of each distinct character as there are in \(P\). Specifically, they give an \(O (n \log |\Sigma|)\)-time deterministic algorithm and an \(O (n)\)-time Monte Carlo algorithm that return, for each distinct position \(i\), the smallest \(k\) (if one exists) such that \(T [i, i + k m - 1]\) is a permuted \(k\)-scaled occurrence of \(P\). This restricts the size of the possible output to \(O (n)\) without sacrificing any information since, if \(T [i, j]\) and \(T [j + 1, h]\) are permuted scaled occurrences of \(P\), then \(T [i, h]\) is as well.  The first main idea behind the authors' algorithms is that \(T [i, j]\) is a permuted scaled occurrence of \(P\) if and only if the following conditions hold:  1) for each character \(\sigma \in \Sigma\), \(\#_\sigma (T [0, i - 1]) \equiv \#_\sigma (T [0, j]) \bmod \#_\sigma (P)\), where \(\#_\sigma\) returns the number of copies of \(\sigma\) in its argument;  2) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\),  \[ \left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor = \left\lfloor \frac{\#_{\sigma_\ell} (T [0, j])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, j])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\;. \]   The second main idea is to compute for each position \(i\) in \(T\) a vector of length \(2 |\Sigma|\) whose components are, to start with, \(\#_\sigma (T [0, i - 1])\) for each character \(\sigma \in \Sigma\); next, \(\left\lfloor \frac{\#_{\sigma_\ell} (T [0, i - 1])}{\#_{\sigma_\ell (P)}} \right\rfloor - \left\lfloor \frac{\#_{\sigma_{\ell + 1}} (T [0, i - 1])}{\#_{\sigma_{\ell + 1} (P)}} \right\rfloor\) for each consecutive pair of characters \(\sigma_\ell, \sigma_{\ell + 1} \in \Sigma\); and, finally, \(i\). The vectors for positions \(i\) and \(j\) with \(i < j\) agree on their prefixes of length \(2 |\Sigma| - 1\) if and only if \(T [i, j]\) is a permuted scaled occurrence of \(P\). This immediately yields an \(O (n |\Sigma|)\)-time algorithm based on computing the vectors naïvely and radix-sorting them. The third main idea is to speed up the computation of the vectors using deterministic renaming or randomized hashing and the fact that vectors for consecutive positions differ on \(O (1)\) components. Applying a result by \textit{T. Kociumaka} et al. [J. Comput. Syst. Sci. 84, 205--218 (2017; Zbl 1353.68225)] to group the vectors on the basis of their first \(2 |\Sigma| - 1\) components, then yields the deterministic and Monte Carlo time bounds stated above.
+Normal rank
@@ Property / reviewed by @@
+Travis Gagie
@@ Property / reviewed by: Travis Gagie / rank @@
+Normal rank
@@ Property / Mathematics Subject Classification ID @@
+W32
@@ Property / Mathematics Subject Classification ID: 68W32 / rank @@
+Normal rank
@@ Property / Mathematics Subject Classification ID @@
+R15
@@ Property / Mathematics Subject Classification ID: 68R15 / rank @@
+Normal rank
@@ Property / zbMATH DE Number @@
+6594178
@@ Property / zbMATH DE Number: 6594178 / rank @@
+Normal rank
@@ Property / zbMATH Keywords @@
+pattern matching
@@ Property / zbMATH Keywords: pattern matching / rank @@
+Normal rank
@@ Property / zbMATH Keywords @@
+scaled matching
@@ Property / zbMATH Keywords: scaled matching / rank @@
+Normal rank
@@ Property / zbMATH Keywords @@
+jumbled matching
@@ Property / zbMATH Keywords: jumbled matching / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1016/j.tcs.2016.02.036
+Normal rank
@@ Property / OpenAlex ID @@
+W2407207612
@@ Property / OpenAlex ID: W2407207612 / rank @@
+Normal rank
@@ Property / cites work @@
+Efficient text fingerprinting via Parikh mapping
@@ Property / cites work: Efficient text fingerprinting via Parikh mapping / rank @@
+Normal rank
@@ Property / cites work @@
+Real scaled matching
@@ Property / cites work: Real scaled matching / rank @@
+Normal rank
@@ Property / cites work @@
+Real two dimensional scaled matching
@@ Property / cites work: Real two dimensional scaled matching / rank @@
+Normal rank
@@ Property / cites work @@
+Efficient one-dimensional real scaled matching
@@ Property / cites work: Efficient one-dimensional real scaled matching / rank @@
+Normal rank
@@ Property / cites work @@
+Alphabet-Independent and Scaled Dictionary Matching
+Normal rank
@@ Property / cites work @@
+On Hardness of Jumbled Indexing
@@ Property / cites work: On Hardness of Jumbled Indexing / rank @@
+Normal rank
@@ Property / cites work @@
+Faster two dimensional scaled matching
@@ Property / cites work: Faster two dimensional scaled matching / rank @@
+Normal rank
@@ Property / cites work @@
+Efficient pattern matching with scaling
@@ Property / cites work: Efficient pattern matching with scaling / rank @@
+Normal rank
@@ Property / cites work @@
+A fast string searching algorithm
@@ Property / cites work: A fast string searching algorithm / rank @@
+Normal rank
@@ Property / cites work @@
+ALGORITHMS FOR JUMBLED PATTERN MATCHING IN STRINGS
+Normal rank
@@ Property / cites work @@
+On approximate jumbled pattern matching in strings
+Normal rank
@@ Property / cites work @@
+Scaled and permuted string matching
@@ Property / cites work: Scaled and permuted string matching / rank @@
+Normal rank
@@ Property / cites work @@
+Permuted Scaled Matching
@@ Property / cites work: Permuted Scaled Matching / rank @@
+Normal rank
@@ Property / cites work @@
+Clustered Integer 3SUM via Additive Combinatorics
@@ Property / cites work: Clustered Integer 3SUM via Additive Combinatorics / rank @@
+Normal rank
@@ Property / cites work @@
+Near Linear Time Construction of an Approximate Index for All Maximum Consecutive Sub-sums of a Sequence
+Normal rank
@@ Property / cites work @@
+Q4055156
@@ Property / cites work: Q4055156 / rank @@
+Normal rank
@@ Property / cites work @@
+Time-space-optimal string matching
@@ Property / cites work: Time-space-optimal string matching / rank @@
+Normal rank
@@ Property / cites work @@
+Q4125823
@@ Property / cites work: Q4125823 / rank @@
+Normal rank
@@ Property / cites work @@
+Efficient randomized pattern-matching algorithms
@@ Property / cites work: Efficient randomized pattern-matching algorithms / rank @@
+Normal rank
@@ Property / cites work @@
+Fast Pattern Matching in Strings
@@ Property / cites work: Fast Pattern Matching in Strings / rank @@
+Normal rank
@@ Property / cites work @@
+Efficient Indexes for Jumbled Pattern Matching with Constant-Sized Alphabet
+Normal rank
@@ Property / cites work @@
+Fast Algorithms for Abelian Periods in Words and Greatest Common Divisor Queries
+Normal rank
@@ Property / cites work @@
+Sub-quadratic time and linear space data structures for permutation matching in binary strings
+Normal rank
@@ Property / cites work @@
+Deterministic Sampling–A New Technique for Fast Pattern Matching
+Normal rank
@@ links / mardi / name / links / mardi / name @@
+Publication:294926