Approximate string matching with compressed indexes (Q1662494): Difference between revisions

Summary: A compressed full-text self-index for a text $T$ is a data structure that requires reduced space and can search for patterns $P$ in $T$. It can also reproduce any substring of $T$, and thus actually replaces $T$. Despite the recent explosion of interest in compressed indexes, little progress has been made on functionality beyond basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest in areas such as bioinformatics. We study ASM algorithms for Lempel-Ziv compressed indexes and for compressed suffix trees/arrays. Most compressed self-indexes belong to one of these classes. We start by adapting the classical method of partitioning into exact search to self-indexes, and optimize it over a representative of either class of self-index. Then, we show that a Lempel-Ziv index can be seen as an extension of the classical $q$-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to a Lempel-Ziv index. Finally, we improve hierarchical verification, a successful technique for sequential searching, so as to extend the matches of pattern pieces to the left or right. Most compressed suffix trees/arrays support the required bidirectionality, thus enabling the implementation of the improved technique. In turn, the improved verification greatly reduces the accesses to the text, which are expensive in self-indexes. We show experimentally that our algorithms are competitive and provide useful space-time tradeoffs compared to classical indexes.
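
The summary refers to the classical method of partitioning into exact search. As a rough illustration only, the Python sketch below shows the textbook form of that filter: a pattern matched with at most k errors is cut into k + 1 pieces, at least one of which must occur exactly, and each exact hit is then verified by dynamic programming on the surrounding text. It is not the paper's algorithm. The function names are hypothetical, and plain `str.find` stands in for the exact-search operation that a compressed self-index would provide.

```python
def exact_occurrences(text, piece):
    """Stand-in for an index lookup: all start positions of piece in text."""
    positions, start = [], text.find(piece)
    while start != -1:
        positions.append(start)
        start = text.find(piece, start + 1)
    return positions


def match_ends(pattern, window, k):
    """Sellers-style dynamic programming: exclusive end offsets in window
    where some substring ending there is within edit distance k of pattern."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty window prefix
    ends = []
    for j, ch in enumerate(window):
        prev_diag, col[0] = col[0], 0  # a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            prev_diag, col[i] = col[i], min(
                prev_diag + cost,  # match or substitution
                col[i] + 1,        # extra character in the text
                col[i - 1] + 1,    # missing pattern character
            )
        if col[m] <= k:
            ends.append(j + 1)
    return ends


def approximate_search(text, pattern, k):
    """Exclusive end positions in text of substrings within edit distance k
    of pattern, found by filtering on exact pieces and then verifying."""
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than the number of errors")
    pieces = k + 1
    bounds = [i * m // pieces for i in range(pieces + 1)]
    found = set()
    for a, b in zip(bounds, bounds[1:]):
        for pos in exact_occurrences(text, pattern[a:b]):
            # An occurrence containing this piece lies within this window.
            lo = max(0, pos - a - k)
            hi = min(len(text), pos + (m - a) + k)
            found.update(lo + e for e in match_ends(pattern, text[lo:hi], k))
    return sorted(found)
```

In a self-index the verification step is the costly part, because the window `text[lo:hi]` has to be extracted from the compressed representation; this is the text access that the improved verification described in the summary aims to reduce.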

zbMATH Keywords

compressed index

approximate string matching

Lempel-Ziv index

compressed suffix tree

compressed suffix array

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.3390/a2031105

cites work

Average-optimal single and multiple approximate string matching

Dictionary matching and indexing with errors and don't cares

Combinatorial Pattern Matching

A Linear Size Index for Approximate Pattern Matching

Suffix Arrays: A New Method for On-Line String Searches

Q4661875

A sublinear algorithm for approximate keyword searching

Indexing text with approximate $q$-grams

Compressed representations of sequences and full-text indexes

An analysis of the Burrows-Wheeler transform

Compression of individual sequences via variable-rate coding

Indexing compressed text

Indexing text using the Ziv-Lempel trie

Reducing the Space Requirement of LZ-Index

New text indexing functionalities of the compressed suffix arrays

Q4471381

Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching

Compressed suffix trees with full functionality

An(other) Entropy-Bounded Compressed Suffix Tree

Fully-Compressed Suffix Trees

Compressed text indexes

Combinatorial Pattern Matching

Algorithms and Computation

Improving an algorithm for approximate pattern matching

Q3690245

Algorithms on Strings, Trees and Sequences

On the Complexity of Finite Sequences

A universal algorithm for sequential data compression

Finding approximate patterns in strings

Incremental String Comparison

Linear bidirectional on-line construction of affix trees

Implementing the LZ-index

Dynamic Rank-Select Structures with Applications to Run-Length Encoded Texts

A fast bit-vector algorithm for approximate string matching based on dynamic programming

Very fast and simple approximate string matching

Succinct Indexable Dictionaries with Applications to Encoding $k$-ary Trees, Prefix Sums and Multisets

Dynamic entropy-compressed sequences and full-text indexes

Identifiers

zbMATH Open document ID

1461.68271

DOI

10.3390/a2031105

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:1662494

@@ Property / author @@
+Pedro Morales-Almazan
@@ Property / author: Pedro Morales-Almazan / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.3390/a2031105
@@ Property / full work available at URL: https://doi.org/10.3390/a2031105 / rank @@
+Normal rank
@@ Property / OpenAlex ID @@
+W2121806125
@@ Property / OpenAlex ID: W2121806125 / rank @@
+Normal rank
@@ Property / Wikidata QID @@
+Q58883988
@@ Property / Wikidata QID: Q58883988 / rank @@
+Normal rank
@@ Property / cites work @@
+Average-optimal single and multiple approximate string matching
@@ Property / cites work: Average-optimal single and multiple approximate string matching / rank @@
+Normal rank
@@ Property / cites work @@
+Dictionary matching and indexing with errors and don't cares
@@ Property / cites work: Dictionary matching and indexing with errors and don't cares / rank @@
+Normal rank
@@ Property / cites work @@
+Combinatorial Pattern Matching
@@ Property / cites work: Combinatorial Pattern Matching / rank @@
+Normal rank
@@ Property / cites work @@
+A Linear Size Index for Approximate Pattern Matching
@@ Property / cites work: A Linear Size Index for Approximate Pattern Matching / rank @@
+Normal rank
@@ Property / cites work @@
+Suffix Arrays: A New Method for On-Line String Searches
@@ Property / cites work: Suffix Arrays: A New Method for On-Line String Searches / rank @@
+Normal rank
@@ Property / cites work @@
+Q4661875
@@ Property / cites work: Q4661875 / rank @@
+Normal rank
@@ Property / cites work @@
+A sublinear algorithm for approximate keyword searching
@@ Property / cites work: A sublinear algorithm for approximate keyword searching / rank @@
+Normal rank
@@ Property / cites work @@
+Indexing text with approximate \(q\)-grams
@@ Property / cites work: Indexing text with approximate \(q\)-grams / rank @@
+Normal rank
@@ Property / cites work @@
+Compressed representations of sequences and full-text indexes
@@ Property / cites work: Compressed representations of sequences and full-text indexes / rank @@
+Normal rank
@@ Property / cites work @@
+An analysis of the Burrows-Wheeler transform
@@ Property / cites work: An analysis of the Burrows-Wheeler transform / rank @@
+Normal rank
@@ Property / cites work @@
+Compression of individual sequences via variable-rate coding
@@ Property / cites work: Compression of individual sequences via variable-rate coding / rank @@
+Normal rank
@@ Property / cites work @@
+Indexing compressed text
@@ Property / cites work: Indexing compressed text / rank @@
+Normal rank
@@ Property / cites work @@
+Indexing text using the Ziv-Lempel trie
@@ Property / cites work: Indexing text using the Ziv-Lempel trie / rank @@
+Normal rank
@@ Property / cites work @@
+Reducing the Space Requirement of LZ-Index
@@ Property / cites work: Reducing the Space Requirement of LZ-Index / rank @@
+Normal rank
@@ Property / cites work @@
+New text indexing functionalities of the compressed suffix arrays
@@ Property / cites work: New text indexing functionalities of the compressed suffix arrays / rank @@
+Normal rank
@@ Property / cites work @@
+Q4471381
@@ Property / cites work: Q4471381 / rank @@
+Normal rank
@@ Property / cites work @@
+Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
@@ Property / cites work: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching / rank @@
+Normal rank
@@ Property / cites work @@
+Compressed suffix trees with full functionality
@@ Property / cites work: Compressed suffix trees with full functionality / rank @@
+Normal rank
@@ Property / cites work @@
+An(other) Entropy-Bounded Compressed Suffix Tree
@@ Property / cites work: An(other) Entropy-Bounded Compressed Suffix Tree / rank @@
+Normal rank
@@ Property / cites work @@
+Fully-Compressed Suffix Trees
@@ Property / cites work: Fully-Compressed Suffix Trees / rank @@
+Normal rank
@@ Property / cites work @@
+Compressed text indexes
@@ Property / cites work: Compressed text indexes / rank @@
+Normal rank
@@ Property / cites work @@
+Combinatorial Pattern Matching
@@ Property / cites work: Combinatorial Pattern Matching / rank @@
+Normal rank
@@ Property / cites work @@
+Algorithms and Computation
@@ Property / cites work: Algorithms and Computation / rank @@
+Normal rank
@@ Property / cites work @@
+Improving an algorithm for approximate pattern matching
@@ Property / cites work: Improving an algorithm for approximate pattern matching / rank @@
+Normal rank
@@ Property / cites work @@
+Q3690245
@@ Property / cites work: Q3690245 / rank @@
+Normal rank
@@ Property / cites work @@
+Algorithms on Strings, Trees and Sequences
@@ Property / cites work: Algorithms on Strings, Trees and Sequences / rank @@
+Normal rank
@@ Property / cites work @@
+On the Complexity of Finite Sequences
@@ Property / cites work: On the Complexity of Finite Sequences / rank @@
+Normal rank
@@ Property / cites work @@
+A universal algorithm for sequential data compression
@@ Property / cites work: A universal algorithm for sequential data compression / rank @@
+Normal rank
@@ Property / cites work @@
+Finding approximate patterns in strings
@@ Property / cites work: Finding approximate patterns in strings / rank @@
+Normal rank
@@ Property / cites work @@
+Incremental String Comparison
@@ Property / cites work: Incremental String Comparison / rank @@
+Normal rank
@@ Property / cites work @@
+Linear bidirectional on-line construction of affix trees
@@ Property / cites work: Linear bidirectional on-line construction of affix trees / rank @@
+Normal rank
@@ Property / cites work @@
+Implementing the LZ-index
@@ Property / cites work: Implementing the LZ-index / rank @@
+Normal rank
@@ Property / cites work @@
+Dynamic Rank-Select Structures with Applications to Run-Length Encoded Texts
@@ Property / cites work: Dynamic Rank-Select Structures with Applications to Run-Length Encoded Texts / rank @@
+Normal rank
@@ Property / cites work @@
+A fast bit-vector algorithm for approximate string matching based on dynamic programming
@@ Property / cites work: A fast bit-vector algorithm for approximate string matching based on dynamic programming / rank @@
+Normal rank
@@ Property / cites work @@
+Very fast and simple approximate string matching
@@ Property / cites work: Very fast and simple approximate string matching / rank @@
+Normal rank
@@ Property / cites work @@
+Succinct Indexable Dictionaries with Applications to Encoding $k$-ary Trees, Prefix Sums and Multisets
@@ Property / cites work: Succinct Indexable Dictionaries with Applications to Encoding $k$-ary Trees, Prefix Sums and Multisets / rank @@
+Normal rank
@@ Property / cites work @@
+Dynamic entropy-compressed sequences and full-text indexes
@@ Property / cites work: Dynamic entropy-compressed sequences and full-text indexes / rank @@
+Normal rank