Time-optimal top-k document retrieval
From MaRDI portal
Publication:2963583
Abstract: Let be a collection of documents, which are strings over an alphabet of size , of total length . We describe a data structure that uses linear space and and reports most relevant documents that contain a query pattern , which is a string of length , in time , which is optimal in the RAM model in the general case where , and involves a novel RAM-optimal suffix tree search. Our construction supports an ample set of important relevance measures... [clip] When , we show how to reduce the space of the data structure from to bits... [clip] We also consider the dynamic scenario, where documents can be inserted and deleted from the collection. We obtain linear space and query time , whereas insertions and deletions require time per symbol, for any constant . Finally, we consider an extended static scenario where an extra parameter is defined, and the query must retrieve only documents such that , where this range is specified at query time. We solve these queries using linear space and time, for any constant . Our technique is to translate these top- problems into multidimensional geometric search problems. As an additional bonus, we describe some improvements to those problems.
Recommendations
- Top-\(k\) document retrieval in optimal time and linear space
- Top-\(k\) document retrieval in compact space and near-optimal time
- New space/time tradeoffs for top-\(k\) document retrieval on sequences
- Towards an optimal space-and-query-time index for top-\(k\) document retrieval
- Top-\(k\) document retrieval in optimal space
Cites work
- scientific article; zbMATH DE number 3883638 (Why is no real title available?)
- scientific article; zbMATH DE number 3913711 (Why is no real title available?)
- scientific article; zbMATH DE number 3473265 (Why is no real title available?)
- scientific article; zbMATH DE number 1947389 (Why is no real title available?)
- scientific article; zbMATH DE number 2079421 (Why is no real title available?)
- scientific article; zbMATH DE number 1512678 (Why is no real title available?)
- scientific article; zbMATH DE number 2119724 (Why is no real title available?)
- A Functional Approach to Data Structures and Its Use in Multidimensional Searching
- A Generalized Suffix Tree and Its (Un)Expected Asymptotic Behaviors
- A LINEAR SPACE DATA STRUCTURE FOR ORTHOGONAL RANGE REPORTING AND EMPTINESS QUERIES
- A SIMPLE BALANCED SEARCH TREE WITH O(1) WORST-CASE UPDATE TIME
- A Space-Economical Suffix Tree Construction Algorithm
- Algorithms on Strings, Trees and Sequences
- Alphabet-dependent string searching with wexponential search trees
- Alphabet-independent compressed text indexing
- Colored range queries and document retrieval
- Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
- Compressed representations of sequences and full-text indexes
- Compressed suffix trees with full functionality
- Constructing Efficient Dictionaries in Close to Sorting Time
- Derandomization, witnesses for Boolean matrix multiplication and construction of perfect hash functions
- Dynamic LCA Queries on Trees
- Dynamic dictionary matching
- Dynamic ordered sets with exponential search trees
- Efficient index for retrieving top-\(k\) most frequent documents
- Fast prefix search in little space, with applications
- Fully compressed suffix trees
- General document retrieval in compact space
- Improved compressed indexes for full-text document retrieval
- Improved dynamic dictionary matching
- Improved single-term top-\(k\) document retrieval
- Integer priority queues with decrease key in constant time and the single source shortest paths problem
- Jewels of Stringology
- New algorithms on wavelet trees and applications to information retrieval
- New space/time tradeoffs for top-\(k\) document retrieval on sequences
- On Hardness of Several String Indexing Problems
- On-line construction of suffix trees
- Online sorted range reporting
- Optimal External Memory Interval Management
- Orthogonal range searching on the RAM, revisited
- Persistence, randomization and parallelization: On some combinatorial games and their applications (abstract)
- Priority Search Trees
- Rank and select revisited and extended
- Space-Efficient Algorithms for Document Retrieval
- Space-Efficient Framework for Top-k String Retrieval Problems
- Space-efficient data-analysis queries on grids
- Space-efficient frameworks for top-\(k\) string retrieval
- Space-efficient preprocessing schemes for range minimum queries on static arrays
- Spaces, trees, and colors: the algorithmic landscape of document retrieval on sequences
- Storing a Sparse Table with 0 (1) Worst Case Access Time
- Succinct data structures for flexible text retrieval systems
- The design of dynamic data structures
- Top-\(K\) color queries for document retrieval
- Top-\(k\) document retrieval in external memory
- Top-\(k\) document retrieval in optimal space
- Top-\(k\) document retrieval in optimal time and linear space
- Top-\(k\) ranked document search in general text databases
- Towards an optimal space-and-query-time index for top-\(k\) document retrieval
- Worst-case optimal insertion and deletion methods for decomposable searching problems
Cited in
(23)- Top-\(k\) document retrieval in optimal time and linear space
- String indexing for top-\(k\) close consecutive occurrences
- Top-\(k\) ranked document search in general text databases
- New space/time tradeoffs for top-\(k\) document retrieval on sequences
- Fast Compressed Self-Indexes with Deterministic Linear-Time Construction
- Ranked document selection
- A top-k retrieval algorithm based on a decomposition of ranking functions
- Ranked document selection
- Top-\(k\) document retrieval in optimal space
- Towards an optimal space-and-query-time index for top-\(k\) document retrieval
- Fast compressed self-indexes with deterministic linear-time construction
- Path queries on functions
- $$Top$$ - $$K$$ Query Retrieval of Combinations with Sum-of-Subsets Ranking
- scientific article; zbMATH DE number 2119724 (Why is no real title available?)
- Space-efficient data-analysis queries on grids
- Gapped indexing for consecutive occurrences
- Top-\(k\) document retrieval in compact space and near-optimal time
- Top-\(K\) color queries for document retrieval
- Efficient index for retrieving top-\(k\) most frequent documents
- Indexing weighted sequences: neat and efficient
- Top-\(k\) term-proximity in succinct space
- Ranked Document Retrieval in External Memory
- Top-\(k\) term-proximity in succinct space
This page was built for publication: Time-optimal top-\(k\) document retrieval
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2963583)