Document listing on repetitive collections with guaranteed performance
From MaRDI portal
Publication:2632016
Abstract: We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size over alphabet is composed of copies of a string of size , and edits are applied on ranges of copies. We introduce the first document listing index with size , precisely bits, and with useful worst-case time guarantees: Given a pattern of length , the index reports the strings where it appears in time , for any constant (and tells in time if ). Our technique is to augment a range data structure that is commonly used on grammar-based indexes, so that instead of retrieving all the pattern occurrences, it computes useful summaries on them. We show that the idea has independent interest: we introduce the first grammar-based index that, on a text with a grammar of size , uses bits and counts the number of occurrences of a pattern in time , for any constant . We also give the first index using bits, where is parsed by Lempel-Ziv into phrases, counting occurrences in time .
Recommendations
- Document Listing on Repetitive Collections with Guaranteed Performance
- Document listing on repetitive collections
- Document retrieval on repetitive collections
- Flexible indexing of repetitive collections
- Indexing highly repetitive collections
- scientific article; zbMATH DE number 2119724
- Publication:4886029
Cites work
- scientific article; zbMATH DE number 2079421 (Why is no real title available?)
- scientific article; zbMATH DE number 6850405 (Why is no real title available?)
- scientific article; zbMATH DE number 2119724 (Why is no real title available?)
- A Functional Approach to Data Structures and Its Use in Multidimensional Searching
- A \textit{really} simple approximation of smallest grammar
- A faster grammar-based self-index
- A fully linear-time approximation algorithm for grammar-based compression
- Algorithms for approximate string matching
- Alphabet-independent compressed text indexing
- Application of Lempel-Ziv factorization to the approximation of grammar-based compression.
- Approximation of grammar-based compression via recompression
- Composite repetition-aware data structures
- Compressed indexing with signature grammars
- Data structure lower bounds on random access to grammar-compressed strings
- Document listing on repetitive collections
- Efficient randomized pattern-matching algorithms
- Elements of Information Theory
- Fast relative Lempel-Ziv self-index for similar sequences
- Fully dynamic data structure for LCE queries in compressed space
- Indexing highly repetitive collections
- Indexing similar DNA sequences
- Information retrieval. Implementing and evaluating search engines.
- LRM-trees: compressed indices, adaptive sorting, and compressed permutations
- LZ77-based self-indexing with faster pattern matching
- New algorithms on wavelet trees and applications to information retrieval
- On compressing and indexing repetitive sequences
- On the Complexity of Finite Sequences
- Practical entropy-compressed rank/select dictionary
- Random access to grammar-compressed strings and trees
- Self-indexed grammar-based compression
- Space-Efficient Framework for Top-k String Retrieval Problems
- Space-efficient data-analysis queries on grids
- Space-efficient preprocessing schemes for range minimum queries on static arrays
- Spaces, trees, and colors: the algorithmic landscape of document retrieval on sequences
- Succinct data structures for flexible text retrieval systems
- Suffix tree of alignment: an efficient index for similar data
- The Smallest Grammar Problem
- The smallest grammar problem revisited
- Time-space trade-offs for Lempel-Ziv compressed indexing
- Universal compressed text indexing
- Wavelet trees for all
Cited in
(7)- Random access in persistent strings and segment selection
- Document retrieval on repetitive collections
- Document listing on repetitive collections
- Contextual Pattern Matching
- Sensitivity of string compressors and repetitiveness measures
- Document Listing on Repetitive Collections with Guaranteed Performance
- scientific article; zbMATH DE number 7765406 (Why is no real title available?)
This page was built for publication: Document listing on repetitive collections with guaranteed performance
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2632016)