Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis (Q1650069)

From MaRDI portal
scientific article

    Statements

    Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis (English)
    29 June 2018
    This paper deals with the problem of estimating the number of significant components in principal component analysis (PCA), known as the dimensionality in PCA. Specifically, let \(y_{1},\dots,y_{n}\) be a random sample of size \(n\) from a \(p\)-dimensional population with mean \(\mu\) and covariance matrix \(\Sigma\). Estimating the dimensionality is treated as the problem of selecting an appropriate model from the set \(\{M_{0}, M_{1},\dots,M_{p-1}\}\), where \[ M_{k}:\ \lambda_{k}>\lambda_{k+1}=\dots=\lambda_{p}=\lambda, \] with \(\lambda_{1}\geqq\dots\geqq\lambda_{p}\) the population eigenvalues of the covariance matrix \(\Sigma\). In this context, the authors consider two estimation criteria, AIC [\textit{H. Akaike}, in: 2nd International Symposium on Information Theory, Tsahkadsor 1971, 267--281 (1973; Zbl 0283.62006)] and BIC [\textit{G. Schwarz}, Ann. Stat. 6, 461--464 (1978; Zbl 0379.62005)], and examine their consistency under a high-dimensional framework where \(p,n\rightarrow \infty\) such that \(p/n\rightarrow c>0\). It is assumed that the number of significant components, say \(k\), is fixed, that the number of candidate models is greater than \(k\), and that the fourth population moment is finite. Both cases, \(p<n\) (\(0<c<1\)) and \(p>n\) (\(c>1\)), are discussed. In the latter case, the modified AIC and BIC criteria given on p.~1060 are considered. The main results of the paper are obtained by techniques from random matrix theory and are summarized as follows: {\parindent=6mm \begin{itemize}\item[a)] For \(0<c<1\), if \(\lambda_1\) is bounded, then under the so-called gap condition (C3) given on p.~1057 of the paper, AIC is strongly consistent, but BIC is not. Furthermore, if \(\lambda_k\rightarrow \infty\), AIC is always strongly consistent regardless of whether the gap condition holds, while if \(\lambda_k/\log n\rightarrow \infty\), then BIC is strongly consistent. 
\item[b)] For \(c>1\), if \(\lambda_1\) is bounded, then under the so-called modified gap condition (C5) given on p.~1060 of the paper, the modified AIC is strongly consistent, but the modified BIC is not. Furthermore, if \(\lambda_k\rightarrow \infty\), the modified AIC is always strongly consistent regardless of whether the modified gap condition holds, while if \(\lambda_k/\log n\rightarrow \infty\), then the modified BIC is strongly consistent. \end{itemize}} Finally, simulation studies show that the sufficient conditions given are essential.
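To make the model-selection setup concrete, the following is a minimal sketch of how AIC and BIC could be evaluated over the candidate models \(M_0,\dots,M_{p-1}\) in the case \(p<n\). It is not taken from the review or the paper: it assumes the standard Gaussian-likelihood form of the criteria, in which the fit term for \(M_j\) is \(n\{\sum_{i\le j}\log l_i+(p-j)\log \bar{l}_j\}\) (with \(l_1\geqq\dots\geqq l_p\) the sample eigenvalues, \(\bar{l}_j\) the mean of the last \(p-j\) of them, and an additive constant dropped) and the parameter count is \(d_j=(j+1)(p+1-j/2)\). The function names are hypothetical, and the modified criteria for \(p>n\) are not reproduced here.

```python
import numpy as np


def pca_information_criteria(eigvals, n):
    """Return (aic, bic) arrays indexed by candidate model M_0, ..., M_{p-1}.

    Assumes the standard Gaussian-likelihood form of the criteria (an
    assumption of this sketch, not quoted from the review) and strictly
    positive sample eigenvalues, i.e. the p < n case.
    """
    l = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # l_1 >= ... >= l_p
    p = l.size
    log_l = np.log(l)
    aic = np.empty(p)
    bic = np.empty(p)
    for j in range(p):
        # Under M_j the last p - j population eigenvalues are equal, so their
        # estimate is the mean of the last p - j sample eigenvalues.
        tail_mean = l[j:].mean()
        # -2 * (maximised log-likelihood), up to a constant common to all j.
        fit = n * (log_l[:j].sum() + (p - j) * np.log(tail_mean))
        # Number of free parameters under M_j (assumed standard count).
        d_j = (j + 1) * (p + 1 - j / 2)
        aic[j] = fit + 2.0 * d_j
        bic[j] = fit + np.log(n) * d_j
    return aic, bic


def estimate_dimensionality(eigvals, n):
    """Return the minimisers (k_aic, k_bic) of the two criteria."""
    aic, bic = pca_information_criteria(eigvals, n)
    return int(np.argmin(aic)), int(np.argmin(bic))
```

Given an \(n\times p\) data matrix `Y`, the sample eigenvalues could be obtained with `np.linalg.eigvalsh(np.cov(Y, rowvar=False, bias=True))` and passed to `estimate_dimensionality` together with `n`. Because the BIC penalty \(\log n\cdot d_j\) exceeds the AIC penalty \(2d_j\) once \(n\geq 8\), the BIC estimate never exceeds the AIC estimate, which is in line with the summary above: BIC needs \(\lambda_k/\log n\rightarrow\infty\) to detect the significant components.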
    principal component analysis
    dimensionality
    AIC
    BIC
    consistency
    high-dimensional asymptotic framework