Multiscale matrix sampling and sublinear-time PageRank computation

From MaRDI portal
Publication:4985347

DOI10.1080/15427951.2013.802752zbMATH Open1462.68008arXiv1202.2771OpenAlexW2148662050MaRDI QIDQ4985347FDOQ4985347


Authors: Michael Brautbar, Christian Borgs, Jennifer T. Chayes, Shang-Hua Teng Edit this on Wikidata


Publication date: 23 April 2021

Published in: Internet Mathematics (Search for Journal in Brave)

Abstract: A fundamental problem arising in many applications in Web science and social network analysis is, given an arbitrary approximation factor c>1, to output a set S of nodes that with high probability contains all nodes of PageRank at least Delta, and no node of PageRank smaller than Delta/c. We call this problem {sc SignificantPageRanks}. We develop a nearly optimal, local algorithm for the problem with runtime complexity ildeO(n/Delta) on networks with n nodes. We show that any algorithm for solving this problem must have runtime of Omega(n/Delta), rendering our algorithm optimal up to logarithmic factors. Our algorithm comes with two main technical contributions. The first is a multi-scale sampling scheme for a basic matrix problem that could be of interest on its own. In the abstract matrix problem it is assumed that one can access an unknown {em right-stochastic matrix} by querying its rows, where the cost of a query and the accuracy of the answers depend on a precision parameter epsilon. At a cost propositional to 1/epsilon, the query will return a list of O(1/epsilon) entries and their indices that provide an epsilon-precision approximation of the row. Our task is to find a set that contains all columns whose sum is at least Delta, and omits any column whose sum is less than Delta/c. Our multi-scale sampling scheme solves this problem with cost ildeO(n/Delta), while traditional sampling algorithms would take time Theta((n/Delta)2). Our second main technical contribution is a new local algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed in cite{JehW03,AndersenCL06} and is highly efficient particularly for networks with large in-degrees or out-degrees. Together with our multiscale sampling scheme we are able to optimally solve the {sc SignificantPageRanks} problem.


Full work available at URL: https://arxiv.org/abs/1202.2771




Recommendations



Cites Work


Cited In (7)

Uses Software





This page was built for publication: Multiscale matrix sampling and sublinear-time PageRank computation

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q4985347)