Selection of variables in two-group discriminant analysis by error rate and Akaike's information criteria (Q1074986)
From MaRDI portal
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Selection of variables in two-group discriminant analysis by error rate and Akaike's information criteria | scientific article |
Statements
Selection of variables in two-group discriminant analysis by error rate and Akaike's information criteria (English)
1985
The author considers two criteria for selecting the ''best'' subset of variables for the linear discriminant function in the case of two \(p\)-variate normal populations \(\Pi_1\), \(\Pi_2\) with different means and a common covariance matrix, both unknown and estimated from random samples of unequal sizes \(N_1\), \(N_2\). One criterion is based on minimizing \textit{G. J. McLachlan's} asymptotically unbiased estimate [Biometrics 36, 501-510 (1980; Zbl 0442.62046)] of the error rate of misclassification \[ M(j)=\Phi\bigl[-2^{-1}D_j+2^{-1}(k_j-1)(N_1^{-1}+N_2^{-1})/D_j+\{32(N_1+N_2-2)\}^{-1}D_j\{4(4k_j-1)-D_j^2\}\bigr], \] where \(D_j\) is the sample Mahalanobis distance between \(\Pi_1\) and \(\Pi_2\) based on the variable subset \(j\), and \(k_j\) is the dimension of this subset. The other selection criterion is based on a ''no additional information'' model and minimizes Akaike's information criterion \[ A(j)=(N_1+N_2)\log\{1+(p-k_j)F(j)/(N_1+N_2-p-1)\}+2(k_j-p), \] where \[ F(j)=\frac{N_1+N_2-p-1}{p-k_j}\cdot\frac{D^2-D_j^2}{(N_1+N_2-2)(N_1^{-1}+N_2^{-1})+D_j^2}, \] \(D\) being the \(p\)-variate sample Mahalanobis distance. It is shown that the expected error rate is closely related to the no additional information model. The asymptotic distributions and error rate risks of both criteria are obtained and shown to be identical, so in this sense the two criteria are asymptotically equivalent.
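To make the two criteria concrete, the following is a minimal illustrative sketch (not the paper's own code): it computes the subset Mahalanobis distance \(D_j\) from two samples with a pooled covariance estimate, evaluates \(M(j)\) and \(A(j)\) as displayed above, and picks the minimizing subset under each criterion. The toy data, variable names, and the exhaustive subset search are assumptions made for illustration.

```python
import math
from itertools import combinations

import numpy as np


def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mahalanobis_sq(x1, x2, idx):
    """Squared sample Mahalanobis distance D_j^2 between the two group
    means, restricted to the variables in idx, using the pooled covariance."""
    a, b = x1[:, idx], x2[:, idx]
    n1, n2 = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    # pooled covariance: (S1 + S2) / (N1 + N2 - 2), S_i = scatter matrices
    s = ((n1 - 1) * np.cov(a, rowvar=False) +
         (n2 - 1) * np.cov(b, rowvar=False)) / (n1 + n2 - 2)
    s = np.atleast_2d(s)  # np.cov squeezes to 0-d for a single variable
    return float(diff @ np.linalg.solve(s, diff))


def M(x1, x2, idx):
    """McLachlan-type asymptotically unbiased error-rate estimate M(j)."""
    n1, n2, k = len(x1), len(x2), len(idx)
    d = math.sqrt(mahalanobis_sq(x1, x2, idx))
    return phi(-d / 2
               + (k - 1) * (1 / n1 + 1 / n2) / (2 * d)
               + d * (4 * (4 * k - 1) - d ** 2) / (32 * (n1 + n2 - 2)))


def A(x1, x2, idx, p):
    """AIC A(j) for the 'no additional information' model of subset idx."""
    n1, n2, k = len(x1), len(x2), len(idx)
    if k == p:
        return 0.0  # (p - k_j) F(j) -> 0, so A(j) = 0 for the full set
    d2_full = mahalanobis_sq(x1, x2, list(range(p)))
    d2_j = mahalanobis_sq(x1, x2, idx)
    f = ((n1 + n2 - p - 1) / (p - k)) * (d2_full - d2_j) / (
        (n1 + n2 - 2) * (1 / n1 + 1 / n2) + d2_j)
    return ((n1 + n2) * math.log(1 + (p - k) * f / (n1 + n2 - p - 1))
            + 2 * (k - p))


# toy data: two trivariate normal samples with different means (assumed)
rng = np.random.default_rng(0)
p, n1, n2 = 3, 40, 60
x1 = rng.normal([1.0, 0.5, 0.0], 1.0, size=(n1, p))
x2 = rng.normal([0.0, 0.0, 0.0], 1.0, size=(n2, p))

subsets = [list(c) for r in range(1, p + 1) for c in combinations(range(p), r)]
best_m = min(subsets, key=lambda j: M(x1, x2, j))
best_a = min(subsets, key=lambda j: A(x1, x2, j, p))
print("error-rate choice:", best_m, "AIC choice:", best_a)
```

For small \(p\) an exhaustive search over all \(2^p-1\) subsets, as here, is feasible; the review's asymptotic-equivalence result suggests the two minimizers should often agree for large samples.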
two-group discriminant analysis
selection of variables
linear discriminant function
p-variate normal populations
different means
common covariance matrix
asymptotic unbiased estimate
error rate of misclassification
Mahalanobis distance
selection
Akaike's information criterion
no additional information model