Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis (Q1650069)

scientific article

Language	Label	Description	Also known as
English	Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis	scientific article

Statements

instance of

scholarly article

0 references

title

Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis (English)

0 references

published in

The Annals of Statistics

0 references

publication date

29 June 2018

0 references

full work available at URL

https://projecteuclid.org/euclid.aos/1525313075

0 references

review text

This paper deals with the problem of estimating the number of significant components in principal component analysis (PCA), known as the dimensionality in PCA. Specifically, let \(y_{1},\dots,y_{n}\) be a random sample of size \(n\) from a \(p\)-dimensional population with mean \(\mu\) and covariance matrix \(\Sigma\). Estimating the dimensionality is treated as the problem of selecting an appropriate model from the set \(\{M_{0}, M_{1},\dots,M_{p-1}\}\), where \[ M_{k}:\ \lambda_{k}>\lambda_{k+1}=\dots=\lambda_{p}=\lambda, \] with \(\lambda_{1}\geq\dots\geq\lambda_{p}\) the population eigenvalues of the covariance matrix \(\Sigma\). In this context, the authors consider two estimation criteria, AIC [\textit{H. Akaike}, in: 2nd International Symposium on Information Theory, Tsahkadsor 1971, 267--281 (1973; Zbl 0283.62006)] and BIC [\textit{G. Schwarz}, Ann. Stat. 6, 461--464 (1978; Zbl 0379.62005)], and examine their consistency under a high-dimensional framework where \(p,n\rightarrow \infty\) such that \(p/n\rightarrow c>0\). It is assumed that the number of significant components, say \(k\), is fixed, that the number of candidate models is greater than \(k\), and that the fourth population moment is finite. Both cases, \(p<n\) (\(0<c<1\)) and \(p>n\) (\(c>1\)), are discussed. In the latter case, the modified AIC and BIC criteria given on p.~1060 are considered. The main results of the paper are obtained by techniques from random matrix theory and are summarized as follows: {\parindent=6mm \begin{itemize}\item[a)] For \(0<c<1\), if \(\lambda_1\) is bounded, then under the so-called gap condition (C3) given on p.~1057 of the paper, AIC is strongly consistent but BIC is not. Furthermore, if \(\lambda_k\rightarrow \infty\), then AIC is strongly consistent regardless of whether the gap condition holds, while BIC is strongly consistent if \(\lambda_k/\log n\rightarrow \infty\).
\item[b)] For \(c>1\), if \(\lambda_1\) is bounded, then under the so-called modified gap condition (C5) given on p.~1060 of the paper, the modified AIC is strongly consistent but the modified BIC is not. Furthermore, if \(\lambda_k\rightarrow \infty\), then the modified AIC is strongly consistent regardless of whether the modified gap condition holds, while the modified BIC is strongly consistent if \(\lambda_k/\log n\rightarrow \infty\). \end{itemize}} Finally, simulation studies show that the sufficient conditions given are essential.
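As a rough illustration of the model-selection setup described in the review for the case \(p<n\), the following Python sketch scores each candidate model \(M_{j}\) from the sample eigenvalues and picks the minimizer. It rests on assumptions not stated in the review: the criteria are written in the textbook Gaussian-likelihood form, as differences from the full model \(M_{p-1}\), with penalty \((p-j-1)(p-j+2)\) for AIC and \(\tfrac12(p-j-1)(p-j+2)\log n\) for BIC. The paper's exact normalization may differ, and the modified criteria for \(p>n\) are not covered. The function names are hypothetical.

```python
import numpy as np


def sample_eigenvalues(Y):
    """Eigenvalues of the sample covariance of the rows of Y (n x p), descending."""
    S = np.cov(np.asarray(Y, dtype=float), rowvar=False)
    return np.linalg.eigvalsh(S)[::-1]


def criteria_differences(eigvals, n):
    """Return arrays (aic, bic) with entry j scoring model M_j, j = 0..p-1.

    Assumed form (not taken from the paper): each score is the criterion
    minus its value for the full model M_{p-1}, under a Gaussian likelihood.
    Requires strictly positive eigenvalues, hence p < n for sample data.
    """
    d = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = d.size
    aic = np.empty(p)
    bic = np.empty(p)
    for j in range(p):
        tail = d[j:]  # the p - j smallest eigenvalues, equal under M_j
        q = p - j
        # -2 log likelihood ratio of M_j against the full model
        lr = n * (q * np.log(tail.mean()) - np.log(tail).sum())
        # twice the number of parameters saved by M_j relative to M_{p-1}
        saved = (q - 1) * (q + 2)
        aic[j] = lr - saved
        bic[j] = lr - 0.5 * saved * np.log(n)
    return aic, bic


def select_num_components(eigvals, n):
    """Number of significant components chosen by (AIC, BIC)."""
    aic, bic = criteria_differences(eigvals, n)
    return int(np.argmin(aic)), int(np.argmin(bic))


if __name__ == "__main__":
    # Idealized eigenvalues: one weak spike (2.0) above 19 equal noise levels.
    print(select_num_components([2.0] + [1.0] * 19, n=200))  # (1, 0)
```

With these idealized eigenvalues (\(p=20\), \(n=200\)), the AIC-type score detects the single weak component while the heavier \(\log n\) penalty makes the BIC-type score select none. This toy input only mirrors the qualitative contrast in item a) and is not a reproduction of the paper's simulations.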

0 references

reviewed by

Apostolos Batsidis

0 references

zbMATH Keywords

principal component analysis

0 references

dimensionality

0 references

AIC

0 references

BIC