Distributed estimation of principal eigenspaces (Q2284361)

scientific article

Language	Label	Description	Also known as
English	Distributed estimation of principal eigenspaces	scientific article

Statements

scholarly article

Distributed estimation of principal eigenspaces (English)

The Annals of Statistics

publication date

15 January 2020

full work available at URL

https://arxiv.org/abs/1702.06488

https://projecteuclid.org/euclid.aos/1572487381

The authors address the problem of large data sets that are scattered across distant locations. Massive datasets are now ubiquitous, and the authors are motivated by examples such as data recorded by IT companies around the world (which cannot be stored in a single data center) and health records scattered across many hospitals or countries. Fusing or aggregating such data sets is extremely difficult because of communication cost, privacy, data security, ownership and other factors. A typical approach relies on distributed statistical or regression methods that first compute local statistics on each subdataset and then combine these subsample-based statistics into an aggregated statistic. In the recent literature on principal component analysis (PCA), a tool of statistical machine learning, a certain sparsity is imposed on the top eigenvectors to overcome noise accumulation. Distributed PCA must handle data that are partitioned and stored across multiple servers. Assuming that the data are sub-Gaussian (i.e., their tails are dominated by those of a Gaussian distribution), the paper presents a distributed algorithm for estimating the top eigenvectors, derives statistical error rates for the aggregated estimator, and reports simulation results that validate the theory. Further research directions are mentioned, for instance the extension to heavy-tailed distributions (with tails heavier than sub-Gaussian ones), where establishing statistical rates with exponential deviation bounds requires shrinking the data and controlling the induced bias.
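
The two-step scheme described in the review (compute a statistic locally on each subdataset, then combine) can be illustrated for PCA with a short Python/NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' code: it is assumed that each server sends the top-K eigenvectors of its local sample covariance, and that the central server averages the corresponding projection matrices and returns the top-K eigenvectors of that average. Projections are averaged rather than the eigenvectors themselves because eigenvectors are only determined up to sign (and, for repeated eigenvalues, up to rotation). The function names (`local_top_eigvecs`, `aggregate`, `distributed_pca`, `subspace_distance`), the spiked-covariance simulation and all parameter values are hypothetical.

```python
# Hypothetical sketch: the aggregation rule below is an assumption for illustration.
import numpy as np


def local_top_eigvecs(X: np.ndarray, k: int) -> np.ndarray:
    """Top-k eigenvectors (d x k) of the local sample covariance of X (n_l x d).

    Treats the data as mean zero: no centering is performed.
    """
    cov = X.T @ X / X.shape[0]
    _, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]    # reorder to descending and keep the top k


def aggregate(local_bases: list[np.ndarray], k: int) -> np.ndarray:
    """Central step (assumed): average the local projections V V^T and
    return the top-k eigenvectors of that average."""
    avg_proj = sum(V @ V.T for V in local_bases) / len(local_bases)
    _, eigvecs = np.linalg.eigh(avg_proj)
    return eigvecs[:, ::-1][:, :k]


def distributed_pca(subsets: list[np.ndarray], k: int) -> np.ndarray:
    # Each server ships only a d x k matrix to the center, never its raw data.
    return aggregate([local_top_eigvecs(X, k) for X in subsets], k)


def subspace_distance(U: np.ndarray, V: np.ndarray) -> float:
    """Spectral-norm distance between the projections onto span(U) and span(V)."""
    return float(np.linalg.norm(U @ U.T - V @ V.T, ord=2))


# Hypothetical simulation: m servers, each holding n_per mean-zero Gaussian samples
# whose covariance has k spiked directions (variance 25) along the first k coordinates.
rng = np.random.default_rng(0)
d, k, m, n_per = 50, 3, 10, 200
scales = np.ones(d)
scales[:k] = 5.0
subsets = [rng.standard_normal((n_per, d)) * scales for _ in range(m)]
true_basis = np.eye(d)[:, :k]

V_hat = distributed_pca(subsets, k)
print(subspace_distance(V_hat, true_basis))
```

Each server transmits a d × K matrix instead of its raw data, which is the kind of saving that matters when communication cost or privacy prevents pooling the data. Comparing `subspace_distance` for `V_hat` with that of a single server's `local_top_eigvecs(subsets[0], k)` is one way to explore the effect of aggregation in this toy setting.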

zbMATH Keywords

scattered data sets

machine learning, regression method

distributed algorithm

Flavia-Corina Mitroi-Symeonidis

MaRDI profile type

MaRDI publication profile

cites work

Asymptotic Theory for Principal Component Analysis

Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices

Optimal principal component analysis in distributed and streaming models

Sparse PCA: optimal rates and adaptive estimation

A split-and-conquer approach for analysis of

Second order accurate distributed eigenvector computation for extremely large matrices

Distributed estimation of principal eigenspaces

Learning theory of distributed spectral algorithms

Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

On the distribution of the largest eigenvalue in principal components analysis

On Consistency and Sparsity for Principal Components Analysis in High Dimensions

PCA consistency in high dimension, low sample size context

Distributed clustering using collective principal component analysis

Inference for Density Families Using Functional Principal Component Analysis

Asymptotics and concentration bounds for bilinear forms of spectral projectors of sample covariance

Concentration inequalities and moment bounds for sample covariance operators

Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries

Finite sample approximation results for principal component analysis: A matrix perturbation approach

Asymptotics of the principal components estimator of large factor models with weakly influential factors

Nonasymptotic upper bounds for the reconstruction error of PCA

A Distributed Framework for Dimensionality Reduction and Denoising

Consistency of sparse PCA in high dimension, low sample size contexts

The statistics and mathematics of high dimension low sample size asymptotics

Practical Sketching Algorithms for Low-Rank Matrix Approximation

A Second-Order Perturbation Expansion for the SVD

Minimax sparse principal subspace estimation in high dimensions

Singular Vector Perturbation Under Gaussian Noise

Asymptotics of empirical eigenstructure for high dimensional spiked covariance

A useful variant of the Davis–Kahan theorem for statisticians

Identifiers

DOI

10.1214/18-AOS1713
