Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling
Publication: 6136229
DOI: 10.1137/22M1516968
arXiv: 2209.08004
OpenAlex: W4384821870
MaRDI QID: Q6136229
FDO: Q6136229
Authors: Boris Landa, Xiuyuan Cheng
Publication date: 29 August 2023
Published in: SIAM Journal on Mathematics of Data Science
Abstract: The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points. Yet, they can be inaccurate under high-dimensional noise, especially if the noise magnitude varies considerably across the data, e.g., under heteroskedasticity or outliers. In this work, we investigate a more robust alternative -- the doubly stochastic normalization of the Gaussian kernel. We consider a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish that the doubly stochastic affinity matrix and its scaling factors concentrate around certain population forms, and provide corresponding finite-sample probabilistic error bounds. We then utilize these results to develop several tools for robust inference under general high-dimensional noise. First, we derive a robust density estimator that reliably infers the underlying sampling density and can substantially outperform the standard kernel density estimator under heteroskedasticity and outliers. Second, we obtain estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that accurately approximate various manifold Laplacians, including the Laplace-Beltrami operator, improving over traditional normalizations in noisy settings. We exemplify our results in simulations and on real single-cell RNA-sequencing data. For the latter, we show that in contrast to traditional methods, our approach is robust to variability in technical noise levels across cell types.
Full work available at URL: https://arxiv.org/abs/2209.08004
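The core object in the abstract, the doubly stochastic normalization of the Gaussian kernel, can be computed by Sinkhorn-type scaling of the kernel matrix. Below is a minimal Python sketch of one standard symmetric variant; the function name, bandwidth choice, iteration count, and stopping rule are illustrative assumptions and not the authors' reference implementation (the paper additionally derives density and noise estimators from the scaling factors).

```python
import numpy as np

def doubly_stochastic_gaussian(X, eps, n_iter=500, tol=1e-10):
    """Scale a Gaussian kernel to doubly stochastic form via symmetric Sinkhorn.

    Returns (W, d) with W = diag(d) @ K @ diag(d) having (approximately)
    unit row and column sums. Hypothetical sketch, not the paper's code.
    """
    # Pairwise squared Euclidean distances between the rows of X.
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian kernel with bandwidth parameter eps.
    K = np.exp(-D2 / eps)
    # Symmetric Sinkhorn iteration: at the fixed point, d_i * (K @ d)_i = 1
    # for every i, so diag(d) @ K @ diag(d) has unit row sums.
    d = np.ones(X.shape[0])
    for _ in range(n_iter):
        d_new = np.sqrt(d / (K @ d))
        if np.max(np.abs(d_new - d)) < tol:
            d = d_new
            break
        d = d_new
    W = d[:, None] * K * d[None, :]
    return W, d

# Usage sketch: 200 noisy points near a circle embedded in R^10
# (illustrative data only).
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
X = np.zeros((200, 10))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.1 * rng.standard_normal(X.shape)
W, d = doubly_stochastic_gaussian(X, eps=0.5)
print(W.sum(axis=1)[:5])  # each row sums to approximately 1
```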
Recommendations
- Adaptive manifold density estimation
- Density estimation on manifolds with boundary
- A unified probabilistic framework for robust manifold learning and embedding
- Nonparametric Bayesian density estimation on manifolds with applications to planar shapes
- Robustness of statistical manifolds
- Title not available (zbMATH DE number 7625193)
- A stable manifold MCMC method for high dimensions
- Nonparametric inference on manifolds. With applications to shape spaces
Keywords: manifold learning; diffusion maps; doubly stochastic; graph Laplacian; noise robustness; affinity matrix; Sinkhorn scaling
Cites Work
- Visualizing data using t-SNE
- Diffusion maps
- Diffusion wavelets
- Concerning nonnegative matrices and doubly stochastic matrices
- A Symmetry Preserving Algorithm for Matrix Scaling
- Wavelets on graphs via spectral graph theory
- Computational Optimal Transport: With Applications to Data Science
- Remarks on Some Nonparametric Estimates of a Density Function
- High-Dimensional Probability
- On Estimation of a Probability Density Function and Mode
- Perturbation of the eigenvectors of the graph Laplacian: application to image denoising
- Symmetrizing smoothing filters
- A Review of Image Denoising Algorithms, with a New One
- Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
- Diffusion maps, spectral clustering and reaction coordinates of dynamical systems
- From graph to manifold Laplacian: the convergence rate
- Learning Theory
- Variable bandwidth diffusion kernels
- Entropy minimization, \(DAD\) problems, and doubly stochastic kernels
- Title not available
- Diffusion Interpretation of Nonlocal Neighborhood Filters for Signal Denoising
- Role of normalization in spectral clustering for stochastic blockmodels
- Heat kernels on weighted manifolds and applications
- Graph connection Laplacian methods can be made robust to noise
- On information plus noise kernel random matrices
- Poisson noise reduction with non-local PCA
- Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator
- Graph Laplacian Regularization for Image Denoising: Analysis in the Continuous Domain
- Analysis of call centre arrival data using singular value decomposition
- Title not available
- Clipped noisy images: heteroskedastic modeling and practical denoising
- Spectral convergence of graph Laplacian and heat kernel reconstruction in \(L^\infty\) from random samples
- Improved spectral convergence rates for graph Laplacians on \(\varepsilon \)-graphs and \(k\)-NN graphs
- Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation
- Spectral Convergence of Diffusion Maps: Improved Error Bounds and an Alternative Normalization
- Doubly Stochastic Normalization of the Gaussian Kernel Is Robust to Heteroskedastic Noise
- Manifold learning with bi-stochastic kernels
- A common variable minimax theorem for graphs
- Learning doubly stochastic and nearly idempotent affinity matrix for graph-based clustering
- A Note Concerning Simultaneous Integral Equations
- The Steerable Graph Laplacian and its Application to Filtering Image Datasets
- Strong uniform consistency with rates for kernel density estimators with general kernels on manifolds
Cited In (2)