Balancing geometry and density: path distances on high-dimensional data

From MaRDI portal
Publication:5037564

DOI10.1137/20M1386657zbMATH Open1493.62391arXiv2012.09385OpenAlexW4226462355MaRDI QIDQ5037564FDOQ5037564


Authors: Anna V. Little, Daniel McKenzie, James M. Murphy Edit this on Wikidata


Publication date: 1 March 2022

Published in: SIAM Journal on Mathematics of Data Science (Search for Journal in Brave)

Abstract: New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance density and geometry in the underlying data, we clarify their key parameters and discuss how they may be chosen in practice. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results require only that the underlying data is sampled from a low-dimensional manifold, and depend crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.


Full work available at URL: https://arxiv.org/abs/2012.09385




Recommendations




Cites Work


Cited In (5)

Uses Software





This page was built for publication: Balancing geometry and density: path distances on high-dimensional data

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q5037564)