Randomized near-neighbor graphs, giant components and applications in data science

Statistical aspects of big data and data science (62R07)
Random graphs (graph-theoretic aspects) (05C80)
Geometric probability and stochastic geometry (60D05)
Interacting random processes; statistical mechanics type models; percolation theory (60K35)
Connectivity (05C40)

Abstract: If we pick $n$ random points uniformly in $[0, 1]^{d}$ and connect each point to its $k$-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0, 1]^{d}$ it suffices to connect every point to $c_{d,1} \log\log n$ points chosen randomly among its $c_{d,2} \log n$-nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log n$ instead of $\sim n \log n$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k$-nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$-nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
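The construction in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the constants $c_{d,1} = c_{d,2} = 2$ are placeholders (the abstract does not give their values), the helper names `sparse_knn_graph` and `largest_component_size` are hypothetical, and neighbors are found by brute force, which is only practical for small $n$.

```python
import math
import random


def sparse_knn_graph(points, k, k_prime, rng):
    """Connect each point to k_prime points sampled uniformly from its k nearest neighbors."""
    n = len(points)
    edges = set()
    for i, p in enumerate(points):
        # Brute-force ranking of all other points by Euclidean distance to point i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        # Keep only k_prime of the k nearest neighbors, chosen at random.
        for j in rng.sample(order[:k], k_prime):
            edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return edges


def largest_component_size(n, edges):
    """Size of the largest connected component, computed with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


rng = random.Random(0)
n, d = 400, 2
points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]

# Placeholder constants: c_{d,2} = 2 and c_{d,1} = 2 are assumptions for this sketch.
k = max(1, round(2 * math.log(n)))
k_prime = min(k, max(1, round(2 * math.log(math.log(n)))))

edges = sparse_knn_graph(points, k, k_prime, rng)
giant = largest_component_size(n, edges)
print(f"k={k}, k'={k_prime}, edges={len(edges)}, largest component={giant} of {n}")
```

The graph has at most $n k'$ edges, compared with up to $n k$ for the full $k$-nearest-neighbor graph. Comparing `giant` with `n` for growing `n` gives a rough empirical check of the $n - o(n)$ claim; the placeholder constants may need adjusting for that comparison to be meaningful.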
