Fast prefix search in little space, with applications
From MaRDI portal
Publication:3586483
Abstract: It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allows faster query resolution. Traditionally, prefix search is solved by data structures that are also dictionaries---they actually contain the strings in . For very large collections stored in slow-access memory, we propose much more compact data structures that support emph{weak} prefix searches---they return the ranks of matching strings provided that emph{some} string in starts with the given prefix. In fact, we show that our most space-efficient data structure is asymptotically space-optimal. Previously, data structures such as String B-trees (and more complicated cache-oblivious string data structures) have implicitly supported weak prefix queries, but they all have query time that grows logarithmically with the size of the string collection. In contrast, our data structures are simple, naturally cache-efficient, and have query time that depends only on the length of the prefix, all the way down to constant query time for strings that fit in one machine word. We give several applications of weak prefix searches, including exact prefix counting and approximate counting of tuples matching conjunctive prefix conditions.
Recommendations
Cited in
(23)- On the weak prefix-search problem
- On the weak prefix-search problem
- Fast compressed self-indexes with deterministic linear-time construction
- Space-efficient substring occurrence estimation
- Improved space-time tradeoffs for approximate full-text indexing with one edit error
- scientific article; zbMATH DE number 2185639 (Why is no real title available?)
- I/O-efficient data structures for colored range and prefix reporting
- Time-optimal top-\(k\) document retrieval
- Fast Compressed Self-Indexes with Deterministic Linear-Time Construction
- String Indexing with Compressed Patterns
- Fast scalable construction of ([compressed] static | minimal perfect hash) functions
- c-trie++: a dynamic trie tailored for fast prefix searches
- An encoding for order-preserving matching
- Minimal indices for predecessor search
- Streaming Dictionary Matching with Mismatches
- All-pairs suffix/prefix in optimal time using Aho-Corasick space
- Time-space trade-offs for Lempel-Ziv compressed indexing
- Linear-time computation of prefix table for weighted strings {\&} applications
- Fast prefix matching of bounded strings
- Dynamic relative compression, dynamic partial sums, and substring concatenation
- Universal compressed text indexing
- Minimal and monotone minimal perfect hash functions
- Space-efficient conversions from SLPs
This page was built for publication: Fast prefix search in little space, with applications
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q3586483)