Selection of variables in two-group discriminant analysis by error rate and Akaike's information criteria (Q1074986)

From MaRDI portal
scientific article

    Statements

    Selection of variables in two-group discriminant analysis by error rate and Akaike's information criteria (English)
    1985
    The author considers two criteria for selecting the "best" subset of variables for the linear discriminant function in the case of two \(p\)-variate normal populations \(\Pi_1\), \(\Pi_2\) with different means and a common covariance matrix; the means and the covariance matrix are unknown and are estimated from random samples of unequal sizes \(N_1\), \(N_2\). One criterion is based on minimizing \textit{G. J. McLachlan's} asymptotically unbiased estimate [Biometrics 36, 501-510 (1980; Zbl 0442.62046)] of the error rate of misclassification \[ M(j)=\Phi\bigl[-2^{-1}D_j+2^{-1}(k_j-1)(N_1^{-1}+N_2^{-1})/D_j+\{32(N_1+N_2-2)\}^{-1}D_j\{4(4k_j-1)-D_j^2\}\bigr], \] where \(\Phi\) is the standard normal distribution function, \(D_j\) is the sample Mahalanobis distance between \(\Pi_1\) and \(\Pi_2\) based on the \(j\)-th subset of variables, and \(k_j\) is the dimension of this subset. The other selection criterion is based on a "no additional information" model and minimizes Akaike's information criterion \[ A(j)=(N_1+N_2)\log\{1+(p-k_j)F(j)/(N_1+N_2-p-1)\}+2(k_j-p), \] where \[ F(j)=\{(N_1+N_2-p-1)/(p-k_j)\}(D^2-D_j^2)/\{(N_1+N_2-2)(N_1^{-1}+N_2^{-1})+D_j^2\}, \] \(D\) being the Mahalanobis distance based on all \(p\) variables. It is shown that the expected error rate is closely related to the no additional information model. The asymptotic distributions and error rate risks of the two criteria are obtained and shown to be identical, so in this sense the two criteria are asymptotically equivalent.
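    The two criteria above can be evaluated directly once the subset distance \(D_j\), the full distance \(D\), and the sample sizes are known. The following is a minimal Python sketch of the formulas as written in this review, not code from the paper; the function names `normal_cdf`, `error_rate_criterion` and `aic_criterion` are hypothetical, and the distances in the usage lines are made-up illustrative values.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def error_rate_criterion(d_j: float, k_j: int, n1: int, n2: int) -> float:
    """M(j) as written in the review; d_j is the subset distance D_j (> 0)."""
    if d_j <= 0:
        raise ValueError("D_j must be positive")
    arg = (
        -0.5 * d_j
        + 0.5 * (k_j - 1) * (1.0 / n1 + 1.0 / n2) / d_j
        + d_j * (4 * (4 * k_j - 1) - d_j**2) / (32.0 * (n1 + n2 - 2))
    )
    return normal_cdf(arg)


def aic_criterion(d_full: float, d_j: float, k_j: int, p: int,
                  n1: int, n2: int) -> float:
    """A(j) as written in the review.

    The quantity (p - k_j) F(j) / (N1 + N2 - p - 1) is computed directly as
    (D^2 - D_j^2) / {(N1 + N2 - 2)(1/N1 + 1/N2) + D_j^2}. This is the same
    quantity after cancelling factors, and it avoids dividing by p - k_j
    when j is the full set of variables.
    """
    c = (n1 + n2 - 2) * (1.0 / n1 + 1.0 / n2)
    ratio = (d_full**2 - d_j**2) / (c + d_j**2)
    return (n1 + n2) * math.log(1.0 + ratio) + 2 * (k_j - p)


# Illustrative (made-up) subset sizes and distances, not data from the paper.
candidates = {"{1}": (1, 1.2), "{1,2}": (2, 1.9), "{1,2,3}": (3, 2.0)}
for name, (k_j, d_j) in candidates.items():
    m = error_rate_criterion(d_j, k_j, n1=20, n2=30)
    a = aic_criterion(2.0, d_j, k_j, p=3, n1=20, n2=30)
    print(name, m, a)
```

    Under either criterion the subset with the smallest value would be selected. Written this way, \(A(j)\) equals zero for the full set of variables, so negative values indicate subsets that the criterion prefers to the full set.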
    two-group discriminant analysis
    selection of variables
    linear discriminant function
    p-variate normal populations
    different means
    common covariance matrix
    asymptotic unbiased estimate
    error rate of misclassification
    Mahalanobis distance
    selection
    Akaike's information criterion
    no additional information model