Towards an optimal space-and-query-time index for top-k document retrieval

Publication:2904490

DOI: 10.1007/978-3-642-31265-6_14 · zbMATH Open: 1358.68092 · arXiv: 1108.0554 · OpenAlex: W1504477191 · MaRDI QID: Q2904490 · FDO: Q2904490


Authors: Wing-Kai Hon, Rahul Shah, Sharma V. Thankachan


Publication date: 14 August 2012

Published in: Combinatorial Pattern Matching

Abstract: Let \(\mathcal{D}=\{d_1,d_2,\ldots,d_D\}\) be a given set of \(D\) string documents of total length \(n\), our task is to index \(\mathcal{D}\), such that the \(k\) most relevant documents for an online query pattern \(P\) of length \(p\) can be retrieved efficiently. We propose an index of size \(|CSA|+n\log D\,(2+o(1))\) bits and \(O(t_s(p)+k\log\log n+\operatorname{poly}\log\log n)\) query time for the basic relevance metric term-frequency, where \(|CSA|\) is the size (in bits) of a compressed full text index of \(\mathcal{D}\), with \(O(t_s(p))\) time for searching a pattern of length \(p\). We further reduce the space to \(|CSA|+n\log D\,(1+o(1))\) bits, however the query time will be \(O(t_s(p)+k(\log\sigma\log\log n)^{1+\epsilon}+\operatorname{poly}\log\log n)\), where \(\sigma\) is the alphabet size and \(\epsilon>0\) is any constant.
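
To make the problem statement concrete, the following is a minimal brute-force sketch of top-\(k\) retrieval under the term-frequency metric. It is not the index proposed in the paper: it scans every document at query time and uses no compressed suffix array. The function names `count_occurrences` and `top_k_by_term_frequency`, the choice to count overlapping occurrences, and the tie-breaking rule (lower document index first) are assumptions made for illustration only.

```python
import heapq
from typing import List, Tuple


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlapping ones included."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are counted.
        start = text.find(pattern, start + 1)
    return count


def top_k_by_term_frequency(
    documents: List[str], pattern: str, k: int
) -> List[Tuple[int, int]]:
    """Return up to k (document_index, term_frequency) pairs, highest frequency first.

    Documents that do not contain the pattern are never reported.
    Ties are broken in favour of the lower document index (an assumption).
    """
    hits = []
    for index, doc in enumerate(documents):
        tf = count_occurrences(doc, pattern)
        if tf > 0:
            hits.append((tf, index))
    # Sort key: larger frequency first, then smaller index first.
    best = heapq.nlargest(k, hits, key=lambda item: (item[0], -item[1]))
    return [(index, tf) for tf, index in best]
```

For example, with the documents `["banana", "ananas", "bandana", "cherry"]` and the pattern `"ana"`, a query with `k = 2` returns `[(0, 2), (1, 2)]`. This baseline spends time proportional to the total length \(n\) on every query, which is the cost that an index with query time depending only on \(p\) and \(k\) is meant to avoid.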


Full work available at URL: https://arxiv.org/abs/1108.0554








Cited In (17)




