Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling
From MaRDI portal
Publication: Q6136229
DOI: 10.1137/22M1516968
OpenAlex: W4384821870
MaRDI QID: Q6136229
FDO: Q6136229
Authors: Boris Landa, Xiuyuan Cheng
Publication date: 29 August 2023
Published in: SIAM Journal on Mathematics of Data Science
Abstract: The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points. Yet, they can be inaccurate under high-dimensional noise, especially if the noise magnitude varies considerably across the data, e.g., under heteroskedasticity or outliers. In this work, we investigate a more robust alternative: the doubly stochastic normalization of the Gaussian kernel. We consider a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish that the doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, and provide corresponding finite-sample probabilistic error bounds. We then utilize these results to develop several tools for robust inference under general high-dimensional noise. First, we derive a robust density estimator that reliably infers the underlying sampling density and can substantially outperform the standard kernel density estimator under heteroskedasticity and outliers. Second, we obtain estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that accurately approximate various manifold Laplacians, including the Laplace-Beltrami operator, improving over traditional normalizations in noisy settings. We exemplify our results in simulations and on real single-cell RNA-sequencing data. For the latter, we show that in contrast to traditional methods, our approach is robust to variability in technical noise levels across cell types.
Full work available at URL: https://arxiv.org/abs/2209.08004
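The doubly stochastic normalization discussed in the abstract is obtained by Sinkhorn (matrix) scaling: finding a positive vector d such that diag(d) K diag(d) has unit row and column sums, where K is the Gaussian affinity matrix. The sketch below is a minimal NumPy illustration of this idea, assuming a standard damped symmetric Sinkhorn iteration; the function names and stopping rule are illustrative and are not taken from the paper's algorithm.

```python
import numpy as np

def gaussian_affinity(X, eps):
    # Gaussian kernel K_ij = exp(-||x_i - x_j||^2 / eps) from pairwise
    # squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(D2, 0.0, out=D2)  # clip tiny negative values from round-off
    return np.exp(-D2 / eps)

def sinkhorn_doubly_stochastic(K, n_iter=500, tol=1e-10):
    # Symmetric Sinkhorn scaling: find d > 0 with d_i * (K d)_i = 1 for all i,
    # so that W = diag(d) K diag(d) is symmetric and doubly stochastic.
    d = np.ones(K.shape[0])
    for _ in range(n_iter):
        # Damped update toward the fixed point d = 1 / (K d).
        d_new = np.sqrt(d / (K @ d))
        if np.max(np.abs(d_new - d)) < tol:
            d = d_new
            break
        d = d_new
    W = d[:, None] * K * d[None, :]
    return W, d
```

For a symmetric, strictly positive kernel such as the Gaussian, this iteration converges to the unique positive scaling. The scaling factors d are the quantities whose concentration around population forms the paper analyzes, and whose behavior underlies the robust density and noise-magnitude estimators described in the abstract.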
Recommendations
- Adaptive manifold density estimation
- Density estimation on manifolds with boundary
- A unified probabilistic framework for robust manifold learning and embedding
- Nonparametric Bayesian density estimation on manifolds with applications to planar shapes
- Robustness of statistical manifolds
- scientific article; zbMATH DE number 7625193
- A stable manifold MCMC method for high dimensions
- Nonparametric inference on manifolds. With applications to shape spaces
Keywords: manifold learning; diffusion maps; doubly stochastic; graph Laplacian; noise robustness; affinity matrix; Sinkhorn scaling
Cites Work
- scientific article; zbMATH DE number 1033392
- scientific article; zbMATH DE number 7370580
- A Note Concerning Simultaneous Integral Equations
- A Review of Image Denoising Algorithms, with a New One
- A Symmetry Preserving Algorithm for Matrix Scaling
- A common variable minimax theorem for graphs
- Analysis of call centre arrival data using singular value decomposition
- Clipped noisy images: heteroskedastic modeling and practical denoising
- Computational optimal transport. With applications to data sciences
- Concerning nonnegative matrices and doubly stochastic matrices
- Diffusion Interpretation of Nonlocal Neighborhood Filters for Signal Denoising
- Diffusion maps
- Diffusion maps, spectral clustering and reaction coordinates of dynamical systems
- Diffusion wavelets
- Doubly stochastic normalization of the Gaussian kernel is robust to heteroskedastic noise
- Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation
- Entropy minimization, \(DAD\) problems, and doubly stochastic kernels
- Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator
- From graph to manifold Laplacian: the convergence rate
- Graph Laplacian Regularization for Image Denoising: Analysis in the Continuous Domain
- Graph connection Laplacian methods can be made robust to noise
- Heat kernels on weighted manifolds and applications
- High-dimensional probability. An introduction with applications in data science
- Improved spectral convergence rates for graph Laplacians on \(\varepsilon \)-graphs and \(k\)-NN graphs
- Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
- Learning Theory
- Learning doubly stochastic and nearly idempotent affinity matrix for graph-based clustering
- Manifold learning with bi-stochastic kernels
- On Estimation of a Probability Density Function and Mode
- On information plus noise kernel random matrices
- Perturbation of the eigenvectors of the graph Laplacian: application to image denoising
- Poisson noise reduction with non-local PCA
- Remarks on Some Nonparametric Estimates of a Density Function
- Role of normalization in spectral clustering for stochastic blockmodels
- Spectral Convergence of Diffusion Maps: Improved Error Bounds and an Alternative Normalization
- Spectral convergence of graph Laplacian and heat kernel reconstruction in \(L^\infty\) from random samples
- Strong uniform consistency with rates for kernel density estimators with general kernels on manifolds
- Symmetrizing smoothing filters
- The Steerable Graph Laplacian and its Application to Filtering Image Datasets
- Variable bandwidth diffusion kernels
- Visualizing data using t-SNE
- Wavelets on graphs via spectral graph theory
Cited In (2)