Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations
Keywords: consistency; discriminant analysis; principal components analysis; eigenvalue distribution; HDLSS; inverse matrix; noise reduction
MSC classification
- Factor analysis and principal components; correspondence analysis (62H25)
- Classification and discrimination; cluster analysis (statistical aspects) (62H30)
- Asymptotic distribution theory in statistics (62E20)
- Estimation in multivariate analysis (62H12)
- Eigenvalues, singular values, and eigenvectors (15A18)
- Set-valued maps in general topology (54C60)
Abstract: In this paper, we consider clustering based on principal component analysis (PCA) for high-dimension, low-sample-size (HDLSS) data. We give theoretical reasons why PCA is effective for clustering HDLSS data. First, we derive a geometric representation of HDLSS data taken from a two-class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We then extend the geometric representation and the geometric consistency properties to multiclass mixture models. We show that, under certain conditions, PCA can classify HDLSS data in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering on microarray data sets.
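The clustering idea in the abstract can be illustrated with a small numerical sketch. The setup below is hypothetical and not the paper's exact model: a two-class Gaussian mixture with a mean shift, dimension d far larger than sample size n. Since d >> n, the sample PCA is computed from the n x n Gram (dual) matrix rather than the d x d covariance, and the sign of the first sample PC score recovers the class labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical HDLSS setting (illustration only): d >> n.
d, n_per_class = 2000, 10
n = 2 * n_per_class

# Two-class Gaussian mixture: the classes differ by a mean shift
# along one coordinate, large enough to dominate the noise level d/n.
shift = np.zeros(d)
shift[0] = 30.0
X = rng.standard_normal((n, d))
labels = np.array([0] * n_per_class + [1] * n_per_class)
X[labels == 1] += shift

# Dual PCA: eigendecompose the n x n Gram matrix of the centered data
# instead of the d x d sample covariance (they share nonzero eigenvalues).
Xc = X - X.mean(axis=0)
G = Xc @ Xc.T / n
eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order

# First sample PC scores: entries of the leading eigenvector of G,
# rescaled by sqrt(n * leading eigenvalue).
scores = eigvecs[:, -1] * np.sqrt(n * eigvals[-1])

# Cluster by the sign of the first PC score (label assignment is
# arbitrary up to a global sign flip of the eigenvector).
clusters = (scores > 0).astype(int)
```

With a sufficiently large mean shift, the spike eigenvalue separates from the bulk of noise eigenvalues (which concentrate near d/n), so the leading eigenvector aligns with the class-membership vector and the sign split matches the true labels up to relabeling.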
Recommendations
- Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix
- Statistical inference for high-dimension, low-sample-size data
- PCA consistency in high dimension, low sample size context
- Effective methodologies for high-dimensional data
- Boundary behavior in high dimension, low sample size asymptotics of PCA
Cites work
- Asymptotics of sample eigenstructure for a large dimensional spiked covariance model
- Basic properties of strong mixing conditions. A survey and some open questions
- Comparison of Discrimination Methods for High Dimensional Data
- Convergence and prediction of principal component scores in high-dimensional settings
- Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix
- Eigenvalues of large sample covariance matrices of spiked population models
- Geometric Representation of High Dimension, Low Sample Size Data
- Intrinsic dimensionality estimation of high-dimension, low sample size data with \(D\)-asymptotics
- Minimum distance classification rules for high dimensional data
- Multivariate Theory for Analyzing High Dimensional Data
- On Strong Mixing Conditions for Stationary Gaussian Processes
- On the distribution of the largest eigenvalue in principal components analysis
- PCA Consistency for Non-Gaussian Data in High Dimension, Low Sample Size Context
- PCA consistency in high dimension, low sample size context
- Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices
- The high-dimension, low-sample-size geometric representation holds under mild conditions
Cited in (49)
- Robust PCA for high-dimensional data based on characteristic transformation
- Reconstruction of a low-rank matrix in the presence of Gaussian noise
- Using visual statistical inference to better understand random class separations in high dimension, low sample size data
- Geometric classifiers for high-dimensional noisy data
- Reconstruction of a high-dimensional low-rank matrix
- Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models
- Hypothesis tests for high-dimensional covariance structures
- On asymptotic normality of cross data matrix-based PCA in high dimension low sample size
- Location-invariant tests of homogeneity of large-dimensional covariance matrices
- Limiting laws for divergent spiked eigenvalues and largest nonspiked eigenvalue of sample covariance matrices
- Overview of object oriented data analysis
- CORRELATION MATRIX OF EQUI-CORRELATED NORMAL POPULATION: FLUCTUATION OF THE LARGEST EIGENVALUE, SCALING OF THE BULK EIGENVALUES, AND STOCK MARKET
- High-dimensional hypothesis testing for allometric extension model
- On estimation of the noise variance in high dimensional probabilistic principal component analysis
- Asymptotics of hierarchical clustering for growing dimension
- Correlation tests for high-dimensional data using extended cross-data-matrix methodology
- Effective methodologies for high-dimensional data
- Statistical inference under the strongly spiked eigenvalue model
- Authors' response
- Discussion on "Two-stage procedures for high-dimensional data" by Makoto Aoshima and Kazuyoshi Yata
- scientific article (untitled; zbMATH DE number 7387552)
- Estimation of linear functional of large spectral density matrix and application to Whittle's approach
- Consistency of the objective general index in high-dimensional settings
- More about asymptotic properties of some binary classification methods for high dimensional data
- A survey of high dimension low sample size asymptotics
- Inference on high-dimensional mean vectors under the strongly spiked eigenvalue model
- A test of sphericity for high-dimensional data and its application for detection of divergently spiked noise
- Two-stage procedures for high-dimensional data
- Equality tests of high-dimensional covariance matrices under the strongly spiked eigenvalue model
- Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix
- A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data
- Binary discrimination methods for high-dimensional data with a geometric representation
- Analysis of high-dimensional one group repeated measures designs
- Asymptotic properties of the first principal component and equality tests of covariance matrices in high-dimension, low-sample-size context
- Double data piling leads to perfect classification
- Clustering by principal component analysis with Gaussian kernel in high-dimension, low-sample-size settings
- Polynomial whitening for high-dimensional data
- Statistical inference for high-dimension, low-sample-size data
- Perturbation theory for cross data matrix-based PCA
- Two-stage dimension reduction for noisy high-dimensional images and application to cryogenic electron microscopy
- Inference on high-dimensional mean vectors with fewer observations than the dimension
- Intrinsic dimensionality estimation of high-dimension, low sample size data with \(D\)-asymptotics
- A classifier under the strongly spiked eigenvalue model in high-dimension, low-sample-size context
- scientific article (untitled; zbMATH DE number 7376764)
- Semiparametric estimation of the high-dimensional elliptical distribution
- A High-Dimensional Two-Sample Test for Non-Gaussian Data under a Strongly Spiked Eigenvalue Model
- Equality tests of covariance matrices under a low-dimensional factor structure
- Asymptotic independence of spiked eigenvalues and linear spectral statistics for large sample covariance matrices
- Test for high-dimensional outliers with principal component analysis