Learning low-dimensional nonlinear structures from high-dimensional noisy data: an integral operator approach
Publication: Q6183757
DOI: 10.1214/23-AOS2306 · OpenAlex: W4387828537 · MaRDI QID: Q6183757
Authors: Xiucai Ding, Rong Ma
Publication date: 4 January 2024
Published in: The Annals of Statistics
Abstract: We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations, where the datasets are assumed to be sampled from an intrinsically low-dimensional manifold and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension and size of the samples are comparably large, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Numerical simulations and analysis of three real datasets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various manifolds in diverse applications.
Full work available at URL: https://arxiv.org/abs/2203.00126
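The abstract describes a kernel-spectral embedding pipeline: build a kernel matrix from the noisy high-dimensional samples with a data-driven bandwidth, normalize it, and use its leading eigenvectors as the low-dimensional embedding. The sketch below illustrates this idea on a noisy circle; the median-distance heuristic stands in for the paper's adaptive bandwidth selection, which is not reproduced here.

```python
import numpy as np

def kernel_spectral_embedding(X, n_components=2):
    """Minimal kernel-spectral embedding sketch.

    The bandwidth uses the median-distance heuristic as a simple
    stand-in for the adaptive procedure proposed in the paper.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    # Median heuristic bandwidth (an assumption, not the paper's rule).
    h = np.median(np.sqrt(D2[np.triu_indices(n, k=1)]))
    K = np.exp(-D2 / (2 * h**2))
    # Symmetric normalization, as in standard spectral embedding.
    d = K.sum(axis=1)
    A = K / np.sqrt(np.outer(d, d))
    # Leading eigenvectors, skipping the trivial top one.
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[1:n_components + 1]]

# Noisy samples from a circle embedded in 50 dimensions.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
signal = np.stack([np.cos(t), np.sin(t)], axis=1)
X = np.concatenate([signal, np.zeros((200, 48))], axis=1)
X += 0.05 * rng.standard_normal(X.shape)
Y = kernel_spectral_embedding(X, n_components=2)
print(Y.shape)  # (200, 2)
```

The paper's contribution lies in choosing the bandwidth adaptively and in proving that such embeddings converge to eigenfunctions of an RKHS integral operator under comparable dimension and sample size; this sketch only shows the mechanical eigendecomposition step.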
Recommendations
- Learning Eigenfunctions Links Spectral Embedding and Kernel PCA
- Nonlinear dimensionality reduction by topologically constrained isometric embedding
- A study of the classification of low-dimensional data with supervised manifold learning
- Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment
- Learning gradients on manifolds
Statistical aspects of big data and data science (62R07) · Statistics on manifolds (62R30) · Integral operators (47G10)
Cites Work
- A Riemann-Stein kernel method
- Accurate error bounds for the eigenvalues of the kernel matrix
- An \(\ell_{\infty}\) eigenvector perturbation bound and its application
- An \({\ell_p}\) theory of PCA and spectral clustering
- An introduction to support vector machines and other kernel-based learning methods.
- Analysis of spectral clustering algorithms for community detection: the general bipartite setting
- Clustering with t-SNE, provably
- Concentration of kernel matrices with application to kernel spectral clustering
- Consistency of spectral clustering
- Distribution of eigenvalues for some sets of random matrices
- Data spectroscopy: eigenspaces of convolution operators and clustering
- Diffusion maps
- Empirical graph Laplacian approximation of Laplace–Beltrami operators: Large sample results
- Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator
- From graph to manifold Laplacian: the convergence rate
- Gaussian processes for machine learning.
- Geometry on probability spaces
- Graph Based Gaussian Processes on Restricted Domains
- Graph connection Laplacian methods can be made robust to noise
- Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data
- High-dimensional probability. An introduction with applications in data science
- Improving Spectral Clustering Using the Asymptotic Value of the Normalized Cut
- Kernel methods and machine learning
- Kernel methods in machine learning
- Laplacian Eigenmaps for Dimensionality Reduction and Data Representation
- Learning theory estimates via integral operators and their approximations
- Lipschitz regularity of graph Laplacians on random data clouds
- Local linear regression on manifolds and its geometric interpretation
- Modern multidimensional scaling. Theory and applications.
- Nonlinear Dimensionality Reduction
- On Euclidean random matrices in high dimension
- On information plus noise kernel random matrices
- On learning with integral operators
- On the Spectral Property of Kernel-Based Sensor Fusion Algorithms of High Dimensional Data
- On the distribution of the largest eigenvalue in principal components analysis
- Optimality of spectral clustering in the Gaussian mixture model
- Pattern recognition and machine learning.
- Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment
- Principal component analysis.
- Scalability and robustness of spectral embedding: landmark diffusion is all you need
- Singular vector and singular subspace distribution for the matrix denoising model
- Spectral Convergence of Diffusion Maps: Improved Error Bounds and an Alternative Normalization
- Spectral Methods for Data Science: A Statistical Perspective
- Spectral convergence of graph Laplacian and heat kernel reconstruction in \(L^\infty\) from random samples
- Spectral convergence of the connection Laplacian from random samples
- Spectral ranking using seriation
- Statistical inference for principal components of spiked covariance matrices
- Statistical properties of kernel principal component analysis
- The Dynamics of Message Passing on Dense Graphs, with Applications to Compressed Sensing
- The imbedding problem for Riemannian manifolds
- The spectral norm of random inner-product kernel matrices
- The spectrum of kernel random matrices
- The spectrum of random inner-product kernel matrices
- The spectrum of random kernel matrices: universality results for rough and varying kernels
- Think globally, fit locally under the manifold setup: asymptotic analysis of locally linear embedding
- Vector diffusion maps and the connection Laplacian
- Visualizing data using t-SNE
Cited In (2)