Time-optimal top-k document retrieval

From MaRDI portal
Publication:2963583

DOI10.1137/140998949zbMATH Open1359.68053arXiv1307.6789OpenAlexW2587741488MaRDI QIDQ2963583FDOQ2963583

Gonzalo Navarro, Yakov Nekrich

Publication date: 15 February 2017

Published in: SIAM Journal on Computing (Search for Journal in Brave)

Abstract: Let mathcalD be a collection of D documents, which are strings over an alphabet of size sigma, of total length n. We describe a data structure that uses linear space and and reports k most relevant documents that contain a query pattern P, which is a string of length p, in time O(p/logsigman+k), which is optimal in the RAM model in the general case where lgD=Theta(logn), and involves a novel RAM-optimal suffix tree search. Our construction supports an ample set of important relevance measures... [clip] When lgD=o(logn), we show how to reduce the space of the data structure from O(nlogn) to O(n(logsigma+logD+loglogn)) bits... [clip] We also consider the dynamic scenario, where documents can be inserted and deleted from the collection. We obtain linear space and query time O(p(loglogn)2/logsigman+logn+kloglogk), whereas insertions and deletions require O(log1+epsilonn) time per symbol, for any constant epsilon>0. Finally, we consider an extended static scenario where an extra parameter par(P,d) is defined, and the query must retrieve only documents d such that par(P,d)in[au1,au2], where this range is specified at query time. We solve these queries using linear space and O(p/logsigman+log1+epsilonn+klogepsilonn) time, for any constant epsilon>0. Our technique is to translate these top-k problems into multidimensional geometric search problems. As an additional bonus, we describe some improvements to those problems.


Full work available at URL: https://arxiv.org/abs/1307.6789




Recommendations




Cites Work


Cited In (14)





This page was built for publication: Time-optimal top-\(k\) document retrieval

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2963583)