Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis (Q1650069)

From MaRDI portal
scientific article

    Statements

    Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis (English)
    29 June 2018
    This paper deals with the problem of estimating the number of significant components in principal component analysis (PCA), known as the dimensionality in PCA. Specifically, let \(y_{1},\dots,y_{n}\) be a random sample of size \(n\) from a \(p\)-dimensional population with mean \(\mu\) and covariance matrix \(\Sigma\). Estimating the dimensionality is treated as the problem of selecting an appropriate model from the set \(\{M_{0}, M_{1},\dots,M_{p-1}\}\), where \[ M_{k}:\ \lambda_{k}>\lambda_{k+1}=\dots=\lambda_{p}=\lambda, \] with \(\lambda_{1}\geqq\dots\geqq\lambda_{p}\) the population eigenvalues of \(\Sigma\). In this context, the authors consider two estimation criteria, AIC [\textit{H. Akaike}, in: 2nd International Symposium on Information Theory, Tsahkadsor 1971, 267--281 (1973; Zbl 0283.62006)] and BIC [\textit{G. Schwarz}, Ann. Stat. 6, 461--464 (1978; Zbl 0379.62005)], and examine their consistency under a high-dimensional framework in which \(p,n\rightarrow \infty\) such that \(p/n\rightarrow c>0\). It is assumed that the number of significant components, say \(k\), is fixed, that the number of candidate models is greater than \(k\), and that the fourth population moment is finite. Both the case \(p<n\) (\(0<c<1\)) and the case \(p>n\) (\(c>1\)) are discussed. In the latter case, the modified AIC and BIC criteria given on p.~1060 are considered. The main results of the paper are obtained by techniques from random matrix theory and are summarized as follows:
    {\parindent=6mm \begin{itemize}
    \item[a)] For \(0<c<1\), if \(\lambda_1\) is bounded, then under the so-called gap condition (C3) given on p.~1057 of the paper, AIC is strongly consistent, but BIC is not. Furthermore, if \(\lambda_k\rightarrow \infty\), then AIC is strongly consistent regardless of whether the gap condition holds, while BIC is strongly consistent if \(\lambda_k/\log n\rightarrow \infty\).
    \item[b)] For \(c>1\), if \(\lambda_1\) is bounded, then under the so-called modified gap condition (C5) given on p.~1060 of the paper, the modified AIC is strongly consistent, but the modified BIC is not. Furthermore, if \(\lambda_k\rightarrow \infty\), then the modified AIC is strongly consistent regardless of whether the modified gap condition holds, while the modified BIC is strongly consistent if \(\lambda_k/\log n\rightarrow \infty\).
    \end{itemize}}
    Finally, simulation studies show that the sufficient conditions given are essential.
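    To make the selection problem concrete, the following is a minimal Python sketch of choosing \(k\) by AIC and BIC from sample eigenvalues in the case \(p<n\). It rests on assumptions not stated in the review: the criteria are written in the classical Gaussian-likelihood form, with the penalty based on the usual count of free parameters under \(M_k\), namely \(d_k=(k+1)(p+1-k/2)\); the function names are hypothetical; and the modified criteria for \(p>n\) are not reproduced here because the review does not state them.

```python
import numpy as np


def sample_eigenvalues(Y):
    """Eigenvalues of the sample covariance of Y (n rows, p columns), descending.

    Assumption: the unbiased (n - 1) divisor used by np.cov; the paper may
    normalize differently.
    """
    S = np.cov(Y, rowvar=False)
    return np.linalg.eigvalsh(S)[::-1]


def information_criteria(eigvals, n):
    """Return (aic, bic), arrays indexed by k = 0, ..., p-1 for the models M_k.

    Assumed classical Gaussian form, up to a constant common to all k:
        fit_k = n * ( sum_{i<=k} log l_i + (p - k) * log(mean of l_{k+1..p}) )
        AIC_k = fit_k + 2 * d_k,   BIC_k = fit_k + log(n) * d_k,
    with d_k = (k + 1) * (p + 1 - k / 2) free parameters under M_k.
    Requires strictly positive eigenvalues, hence p < n.
    """
    l = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    if np.any(l <= 0):
        raise ValueError("all sample eigenvalues must be positive (p < n case)")
    p = l.size
    aic = np.empty(p)
    bic = np.empty(p)
    for k in range(p):
        tail_mean = l[k:].mean()  # pooled estimate of the common eigenvalue
        fit = n * (np.log(l[:k]).sum() + (p - k) * np.log(tail_mean))
        d_k = (k + 1) * (p + 1 - k / 2)
        aic[k] = fit + 2 * d_k
        bic[k] = fit + np.log(n) * d_k
    return aic, bic


def select_dimension(eigvals, n):
    """Return the pair (k chosen by AIC, k chosen by BIC)."""
    aic, bic = information_criteria(eigvals, n)
    return int(np.argmin(aic)), int(np.argmin(bic))


# Two clearly separated leading eigenvalues, the remaining eight all equal.
k_aic, k_bic = select_dimension([50, 20, 1, 1, 1, 1, 1, 1, 1, 1], n=200)
print(k_aic, k_bic)
```

    With real data one would pass `sample_eigenvalues(Y)` and the sample size `n`. Because the BIC penalty grows like \(\log n\) per parameter, it tends to drop weakly separated components that AIC retains, which is the behaviour the bounded-\(\lambda_1\) results above describe.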
    principal component analysis
    dimensionality
    AIC
    BIC
    consistency
    high-dimensional asymptotic framework