Unified compression-based acceleration of edit-distance computation
From MaRDI portal
Publication:1939664
Abstract: The edit distance problem is a classical fundamental problem in computer science in general, and in combinatorial pattern matching in particular. The standard dynamic programming solution for this problem computes the edit-distance between a pair of strings of total length O(N) in O(N^2) time. To this date, this quadratic upper-bound has never been substantially improved for general strings. However, there are known techniques for breaking this bound in case the strings are known to compress well under a particular compression scheme. The basic idea is to first compress the strings, and then to compute the edit distance between the compressed strings. As it turns out, practically all known o(N^2) edit-distance algorithms work, in some sense, under the same paradigm described above. It is therefore natural to ask whether there is a single edit-distance algorithm that works for strings which are compressed under any compression scheme. A rephrasing of this question is to ask whether a single algorithm can exploit the compressibility properties of strings under any compression method, even if each string is compressed using a different compression. In this paper we set out to answer this question by using straight line programs. These provide a generic platform for representing many popular compression schemes including the LZ-family, Run-Length Encoding, Byte-Pair Encoding, and dictionary methods. For two strings of total length N having straight-line program representations of total size n, we present an algorithm running in O(nN log(N/n)) time for computing the edit-distance of these two strings under any rational scoring function, and an O(n^{2/3}N^{4/3}) time algorithm for arbitrary scoring functions. Our new result, while providing a signi cant speed up for highly compressible strings, does not surpass the quadratic time bound even in the worst case scenario.
Recommendations
- A unified algorithm for accelerating edit-distance computation via text-compression
- A fully compressed algorithm for computing the edit distance of run-length encoded strings
- A fully compressed algorithm for computing the edit distance of run-length encoded strings
- Edit distance for a run-length-encoded string and an uncompressed string
- The string edit distance matching problem with moves
Cites work
- scientific article; zbMATH DE number 1615281 (Why is no real title available?)
- scientific article; zbMATH DE number 1629861 (Why is no real title available?)
- scientific article; zbMATH DE number 1045405 (Why is no real title available?)
- scientific article; zbMATH DE number 2079423 (Why is no real title available?)
- scientific article; zbMATH DE number 2119662 (Why is no real title available?)
- A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices
- A faster algorithm computing string edit distances
- A unified algorithm for accelerating edit-distance computation via text-compression
- A universal algorithm for sequential data compression
- Algorithms on Strings, Trees and Sequences
- All Highest Scoring Paths in Weighted Grid Graphs and Their Application to Finding All Approximate Repeats in Strings
- An improved algorithm for computing the edit distance of run-length coded strings
- Application of Lempel-Ziv factorization to the approximation of grammar-based compression.
- Collage system: A unifying framework for compressed pattern matching.
- Compression of individual sequences via variable-rate coding
- Edit distance of run-length encoded strings.
- Efficient Parallel Algorithms for String Editing and Related Problems
- Fast distance multiplication of unit-Monge matrices
- Faster subsequence recognition in compressed strings
- Geometric applications of a matrix-searching algorithm
- Let sleeping files lie: Pattern matching in Z-compressed files.
- Matching for run-length encoded strings
- On the Complexity of Finite Sequences
- Processing Compressed Texts: A Tractability Border
- Querying and Embedding Compressed Texts
- Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions
- The String-to-String Correction Problem
- Window Subsequence Problems for Compressed Texts
Cited in
(6)- Towards approximate matching in compressed strings: local subsequence recognition
- LZD factorization: simple and practical online grammar compression with variable-to-fixed encoding
- Near-optimal distance emulator for planar graphs
- Fast distance multiplication of unit-Monge matrices
- Hybridizing variable neighborhood search with ant colony optimization for solving the single row facility layout problem
- A unified algorithm for accelerating edit-distance computation via text-compression
This page was built for publication: Unified compression-based acceleration of edit-distance computation
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q1939664)