Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling

From MaRDI portal
Publication:6136229

DOI: 10.1137/22M1516968 | arXiv: 2209.08004 | OpenAlex: W4384821870 | MaRDI QID: Q6136229 | FDO: Q6136229


Authors: Boris Landa, Xiuyuan Cheng


Publication date: 29 August 2023

Published in: SIAM Journal on Mathematics of Data Science

Abstract: The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points. Yet, they can be inaccurate under high-dimensional noise, especially if the noise magnitude varies considerably across the data, e.g., under heteroskedasticity or outliers. In this work, we investigate a more robust alternative: the doubly stochastic normalization of the Gaussian kernel. We consider a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish that the doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, and provide corresponding finite-sample probabilistic error bounds. We then utilize these results to develop several tools for robust inference under general high-dimensional noise. First, we derive a robust density estimator that reliably infers the underlying sampling density and can substantially outperform the standard kernel density estimator under heteroskedasticity and outliers. Second, we obtain estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that accurately approximate various manifold Laplacians, including the Laplace-Beltrami operator, improving over traditional normalizations in noisy settings. We exemplify our results in simulations and on real single-cell RNA-sequencing data. For the latter, we show that in contrast to traditional methods, our approach is robust to variability in technical noise levels across cell types.
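
Illustration (not part of the portal record or the paper): the following is a minimal sketch of what a doubly stochastic normalization of a Gaussian kernel can look like in practice. It assumes a kernel K_ij = exp(-||x_i - x_j||^2 / eps) with the diagonal set to zero, and seeks positive scaling factors d such that W = diag(d) K diag(d) has unit row sums. The function names, the bandwidth parameter `eps`, the zero-diagonal choice, and the damped fixed-point iteration are all assumptions of this sketch; the paper's own conventions and algorithm may differ.

```python
import math


def gaussian_kernel(points, eps):
    """Gaussian kernel matrix with a zero diagonal (an assumption of this sketch)."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            K[i][j] = K[j][i] = math.exp(-sq_dist / eps)
    return K


def doubly_stochastic_scale(K, tol=1e-10, max_iter=10_000):
    """Find d > 0 such that W = diag(d) K diag(d) has row sums close to 1.

    Uses the damped symmetric update d_i <- sqrt(d_i / (K d)_i), whose
    fixed points satisfy d_i * (K d)_i = 1 for every i.
    """
    n = len(K)
    d = [1.0] * n
    for _ in range(max_iter):
        Kd = [sum(K[i][j] * d[j] for j in range(n)) for i in range(n)]
        # Row sums of the current scaled matrix are d_i * (K d)_i.
        err = max(abs(d[i] * Kd[i] - 1.0) for i in range(n))
        if err < tol:
            break
        d = [math.sqrt(d[i] / Kd[i]) for i in range(n)]
    W = [[d[i] * K[i][j] * d[j] for j in range(n)] for i in range(n)]
    return W, d


# Hypothetical toy data: five nearby points and one more distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (2.0, 2.0)]
K = gaussian_kernel(points, eps=2.0)
W, d = doubly_stochastic_scale(K)
row_sums = [sum(row) for row in W]
```

Because K is symmetric and the same factors scale rows and columns, W is symmetric, so unit row sums imply unit column sums. The abstract indicates that the scaling factors themselves carry information (for example about density and noise magnitude); this sketch only computes them and does not reproduce those estimators.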


Full work available at URL: https://arxiv.org/abs/2209.08004









