Learning low-dimensional nonlinear structures from high-dimensional noisy data: an integral operator approach
MaRDI QID: Q6183757
DOI: 10.1214/23-AOS2306
arXiv: 2203.00126
OpenAlex: W4387828537
Authors: Xiucai Ding, Rong Ma
Publication date: 4 January 2024
Published in: The Annals of Statistics
Abstract: We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.
Full work available at URL: https://arxiv.org/abs/2203.00126
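To make the abstract's description concrete, the sketch below illustrates the general idea of a kernel-spectral embedding: build a Gaussian kernel matrix with a bandwidth chosen from the empirical pairwise distances, normalize it, and use its leading eigenvectors as the low-dimensional embedding. This is a minimal illustration under simplifying assumptions, not the paper's algorithm; in particular, the percentile-based bandwidth rule and the symmetric normalization here stand in for the adaptive bandwidth selection and theoretical analysis developed in the paper.

```python
# Minimal sketch of a kernel-spectral embedding with a data-driven bandwidth.
# Illustration only: the percentile bandwidth rule and symmetric normalization
# are simplifying assumptions, not the procedure proposed in the paper.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def kernel_spectral_embedding(X, n_components=2, bandwidth_percentile=50):
    """Embed rows of X (n_samples x n_features) into n_components dimensions."""
    # Pairwise Euclidean distances between observations.
    dists = squareform(pdist(X))
    # Data-driven bandwidth: a percentile of the nonzero pairwise distances
    # (a common heuristic; the paper's adaptive selection rule differs).
    h = np.percentile(dists[dists > 0], bandwidth_percentile)
    # Gaussian kernel matrix.
    K = np.exp(-dists**2 / (2 * h**2))
    # Symmetric normalization D^{-1/2} K D^{-1/2}, as in spectral clustering.
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    # Leading eigenvectors (dropping the trivial top one) give the embedding.
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[1:n_components + 1]]

# Toy example: noisy samples of a circle (a 1-D manifold) embedded in R^50.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 300)
signal = np.zeros((300, 50))
signal[:, 0], signal[:, 1] = np.cos(t), np.sin(t)
X = signal + 0.1 * rng.standard_normal((300, 50))
Y = kernel_spectral_embedding(X, n_components=2)
print(Y.shape)  # (300, 2)
```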
MSC classification:
- Statistical aspects of big data and data science (62R07)
- Statistics on manifolds (62R30)
- Integral operators (47G10)
Cites Work
- Visualizing data using t-SNE
- Principal component analysis
- Diffusion maps
- Modern multidimensional scaling: theory and applications
- Gaussian processes for machine learning
- Pattern recognition and machine learning
- On the distribution of the largest eigenvalue in principal components analysis
- An introduction to support vector machines and other kernel-based learning methods
- High-Dimensional Probability
- Title not available
- Distribution of eigenvalues for some sets of random matrices
- Kernel methods in machine learning
- Consistency of spectral clustering
- Title not available
- Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
- From graph to manifold Laplacian: the convergence rate
- Vector diffusion maps and the connection Laplacian
- Empirical graph Laplacian approximation of Laplace–Beltrami operators: Large sample results
- Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment
- Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data
- The imbedding problem for Riemannian manifolds
- On learning with integral operators
- Learning theory estimates via integral operators and their approximations
- The spectrum of kernel random matrices
- Nonlinear Dimensionality Reduction
- Graph connection Laplacian methods can be made robust to noise
- On information plus noise kernel random matrices
- The Dynamics of Message Passing on Dense Graphs, with Applications to Compressed Sensing
- Accurate error bounds for the eigenvalues of the kernel matrix
- Geometry on probability spaces
- Statistical properties of kernel principal component analysis
- Data spectroscopy: eigenspaces of convolution operators and clustering
- Kernel methods and machine learning
- Title not available
- Title not available
- Spectral convergence of the connection Laplacian from random samples
- On Euclidean random matrices in high dimension
- Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator
- The spectral norm of random inner-product kernel matrices
- The spectrum of random kernel matrices: universality results for rough and varying kernels
- The spectrum of random inner-product kernel matrices
- Local linear regression on manifolds and its geometric interpretation
- Spectral ranking using seriation
- Think globally, fit locally under the manifold setup: asymptotic analysis of locally linear embedding
- Spectral convergence of graph Laplacian and heat kernel reconstruction in \(L^\infty\) from random samples
- Spectral Convergence of Diffusion Maps: Improved Error Bounds and an Alternative Normalization
- On the Spectral Property of Kernel-Based Sensor Fusion Algorithms of High Dimensional Data
- An $\ell_{\infty}$ Eigenvector Perturbation Bound and Its Application to Robust Covariance Estimation
- Analysis of spectral clustering algorithms for community detection: the general bipartite setting
- Statistical inference for principal components of spiked covariance matrices
- Optimality of spectral clustering in the Gaussian mixture model
- Singular vector and singular subspace distribution for the matrix denoising model
- Title not available
- An \({\ell_p}\) theory of PCA and spectral clustering
- Concentration of kernel matrices with application to kernel spectral clustering
- A Riemann-Stein kernel method
- Lipschitz Regularity of Graph Laplacians on Random Data Clouds
- Scalability and robustness of spectral embedding: landmark diffusion is all you need
- Improving Spectral Clustering Using the Asymptotic Value of the Normalized Cut
- Clustering with t-SNE, Provably
- Graph Based Gaussian Processes on Restricted Domains
- Spectral Methods for Data Science: A Statistical Perspective
Cited In (1)