Balancing geometry and density: path distances on high-dimensional data
From MaRDI portal
Publication:5037564
Abstract: New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance density and geometry in the underlying data, we clarify their key parameters and discuss how they may be chosen in practice. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results require only that the underlying data is sampled from a low-dimensional manifold, and depend crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.
Recommendations
- Diffusion state distances: multitemporal analysis, fast algorithms, and applications to biological networks
- Empirical geodesic graphs and CAT\((k)\) metrics for data analysis
- Intrinsic dimension of geometric data sets
- Shape dimension and intrinsic metric from samples of manifolds with high co-dimension (extended abstract)
- Bounds on the mean power-weighted nearest neighbour distance
Cites work
- scientific article; zbMATH DE number 1844606 (Why is no real title available?)
- scientific article; zbMATH DE number 7255037 (Why is no real title available?)
- scientific article; zbMATH DE number 3023295 (Why is no real title available?)
- 50 Years of First-Passage Percolation
- A Nonparametric Estimate of a Multivariate Density Function
- A distribution-free theory of nonparametric regression
- A note on some rates of convergence in first-passage percolation
- Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions
- Connecting dots: from local covariance to empirical intrinsic geometry and locally linear embedding
- Curvature Measures
- Density-sensitive semisupervised inference
- Diffusion maps
- Efficient Algorithms for Shortest Paths in Sparse Networks
- Entropy reduction in Euclidean first-passage percolation
- Estimating the reach of a manifold
- Euclidean models of first-passage percolation
- Exact computation of a manifold metric, via Lipschitz Embeddings and Shortest Paths on a Graph
- Finding the homology of submanifolds with high confidence from random samples
- Fractional diffusion maps
- Generalized density clustering
- Geodesics and spanning trees for Euclidean first-passage percolation.
- Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps
- Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data
- Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
- Local kernels and the geometric structure of data
- Local regularization of noisy point clouds: improved global geometric estimates and data analysis
- Manifold learning with arbitrary norms
- Nonhomogeneous Euclidean first-passage percolation and distance learning
- On energy, discrepancy and group invariant measures on measurable subsets of Euclidean space
- Robust path-based spectral clustering
- Shortest path through random points
- The geometry of kernelized spectral clustering
- The reach, metric distortion, geodesic convexity and the variation of tangent spaces
- The strong uniform consistency of nearest neighbor density estimates
- Variable bandwidth diffusion kernels
- Visualizing data using t-SNE
Cited in
(5)- People mover's distance: class level geometry using fast pairwise data adaptive transportation costs
- Comments on: ``Distance geometry and data science
- Ratio convergence rates for Euclidean first-passage percolation: applications to the graph infinity Laplacian
- Clustering Dynamics on Graphs: From Spectral Clustering to Mean Shift Through Fokker–Planck Interpolation
- Geometric scattering on measure spaces
This page was built for publication: Balancing geometry and density: path distances on high-dimensional data
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q5037564)