Time-optimal top-k document retrieval
From MaRDI portal
Publication:2963583
DOI10.1137/140998949zbMATH Open1359.68053arXiv1307.6789OpenAlexW2587741488MaRDI QIDQ2963583FDOQ2963583
Gonzalo Navarro, Yakov Nekrich
Publication date: 15 February 2017
Published in: SIAM Journal on Computing (Search for Journal in Brave)
Abstract: Let be a collection of documents, which are strings over an alphabet of size , of total length . We describe a data structure that uses linear space and and reports most relevant documents that contain a query pattern , which is a string of length , in time , which is optimal in the RAM model in the general case where , and involves a novel RAM-optimal suffix tree search. Our construction supports an ample set of important relevance measures... [clip] When , we show how to reduce the space of the data structure from to bits... [clip] We also consider the dynamic scenario, where documents can be inserted and deleted from the collection. We obtain linear space and query time , whereas insertions and deletions require time per symbol, for any constant . Finally, we consider an extended static scenario where an extra parameter is defined, and the query must retrieve only documents such that , where this range is specified at query time. We solve these queries using linear space and time, for any constant . Our technique is to translate these top- problems into multidimensional geometric search problems. As an additional bonus, we describe some improvements to those problems.
Full work available at URL: https://arxiv.org/abs/1307.6789
Recommendations
- Top-\(k\) document retrieval in optimal time and linear space
- Top-\(k\) document retrieval in compact space and near-optimal time
- New space/time tradeoffs for top-\(k\) document retrieval on sequences
- Towards an optimal space-and-query-time index for top-\(k\) document retrieval
- Top-\(k\) document retrieval in optimal space
Cites Work
- Title not available (Why is that?)
- Algorithms on Strings, Trees and Sequences
- Improved compressed indexes for full-text document retrieval
- Compressed representations of sequences and full-text indexes
- Fast Prefix Search in Little Space, with Applications
- Title not available (Why is that?)
- Storing a Sparse Table with 0 (1) Worst Case Access Time
- Colored range queries and document retrieval
- Title not available (Why is that?)
- Title not available (Why is that?)
- Space-Efficient Framework for Top-k String Retrieval Problems
- Title not available (Why is that?)
- Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching
- Title not available (Why is that?)
- The design of dynamic data structures
- Compressed suffix trees with full functionality
- Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays
- Fully compressed suffix trees
- Constructing Efficient Dictionaries in Close to Sorting Time
- Online sorted range reporting
- Jewels of Stringology
- Top-k Document Retrieval in External Memory
- Alphabet-Independent Compressed Text Indexing
- Space-Efficient Frameworks for Top- k String Retrieval
- Priority Search Trees
- A Space-Economical Suffix Tree Construction Algorithm
- Spaces, Trees, and Colors
- Orthogonal range searching on the RAM, revisited
- Succinct data structures for flexible text retrieval systems
- Worst-case optimal insertion and deletion methods for decomposable searching problems
- On-line construction of suffix trees
- Space-Efficient Algorithms for Document Retrieval
- Dynamic ordered sets with exponential search trees
- Space-efficient data-analysis queries on grids
- New algorithms on wavelet trees and applications to information retrieval
- Title not available (Why is that?)
- A Functional Approach to Data Structures and Its Use in Multidimensional Searching
- Optimal External Memory Interval Management
- Dynamic LCA Queries on Trees
- Persistence, randomization and parallelization: On some combinatorial games and their applications (abstract)
- Derandomization, witnesses for Boolean matrix multiplication and construction of perfect hash functions
- Rank and select revisited and extended
- Towards an Optimal Space-and-Query-Time Index for Top-k Document Retrieval
- Top-k Ranked Document Search in General Text Databases
- New space/time tradeoffs for top-\(k\) document retrieval on sequences
- Top-\(k\) document retrieval in optimal space
- Efficient index for retrieving top-\(k\) most frequent documents
- Integer priority queues with decrease key in constant time and the single source shortest paths problem
- A SIMPLE BALANCED SEARCH TREE WITH O(1) WORST-CASE UPDATE TIME
- Dynamic dictionary matching
- Improved dynamic dictionary matching
- Title not available (Why is that?)
- Title not available (Why is that?)
- A Generalized Suffix Tree and Its (Un)Expected Asymptotic Behaviors
- Alphabet-Dependent String Searching with Wexponential Search Trees
- On Hardness of Several String Indexing Problems
- A LINEAR SPACE DATA STRUCTURE FOR ORTHOGONAL RANGE REPORTING AND EMPTINESS QUERIES
- General document retrieval in compact space
- Improved Single-Term Top-k Document Retrieval
Cited In (14)
- Fast compressed self-indexes with deterministic linear-time construction
- Top-\(k\) document retrieval in optimal space
- Title not available (Why is that?)
- Efficient index for retrieving top-\(k\) most frequent documents
- $$Top$$ - $$K$$ Query Retrieval of Combinations with Sum-of-Subsets Ranking
- Ranked Document Retrieval in External Memory
- Space-efficient data-analysis queries on grids
- Fast Compressed Self-Indexes with Deterministic Linear-Time Construction
- String indexing for top-\(k\) close consecutive occurrences
- Ranked document selection
- Indexing weighted sequences: neat and efficient
- Gapped indexing for consecutive occurrences
- A top-k retrieval algorithm based on a decomposition of ranking functions
- Path queries on functions
This page was built for publication: Time-optimal top-\(k\) document retrieval
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2963583)