Selection of variables in two-group discriminant analysis by error rate and Akaike's information criteria (Q1074986): Difference between revisions

The author considers two criteria for selecting the "best" subset of variables for the linear discriminant function in the case of two \(p\)-variate normal populations \(\Pi_1\), \(\Pi_2\) with different means and a common covariance matrix; the means and the covariance matrix are unknown and are estimated from random samples of unequal sizes \(N_1\), \(N_2\). One criterion is based on minimizing \textit{G. J. McLachlan's} asymptotic unbiased estimate [Biometrics 36, 501-510 (1980; Zbl 0442.62046)] of the misclassification error rate, \[ M(j)=\Phi\bigl[-2^{-1}D_j+2^{-1}(k_j-1)(N_1^{-1}+N_2^{-1})/D_j+\{32(N_1+N_2-2)\}^{-1}D_j\{4(4k_j-1)-D_j^2\}\bigr], \] where \(D_j\) is the sample Mahalanobis distance between \(\Pi_1\) and \(\Pi_2\) for the \(j\)-th subset of variables, and \(k_j\) is the dimension of this subset. The other selection criterion is based on a "no additional information" model and minimizes Akaike's information criterion \[ A(j)=(N_1+N_2)\log\{1+(p-k_j)F(j)/(N_1+N_2-p-1)\}+2(k_j-p), \] where \[ F(j)=\{(N_1+N_2-p-1)/(p-k_j)\}(D^2-D_j^2)/\{(N_1+N_2-2)(N_1^{-1}+N_2^{-1})+D_j^2\}, \] \(D\) being the \(p\)-variate Mahalanobis distance. It is shown that the expected error rate is closely related to the no additional information model. The asymptotic distributions and error rate risks of both criteria are obtained and shown to be identical, so in this sense the two criteria are asymptotically equivalent.
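As a rough illustration of the two criteria in the review, the following Python sketch evaluates \(M(j)\) and \(A(j)\) for given distances and sample sizes. It is not the author's code. The function and argument names are hypothetical, \(\Phi\) is assumed to be the standard normal distribution function, and \(A(j)\) is evaluated through the algebraically equivalent ratio \((p-k_j)F(j)/(N_1+N_2-p-1)=(D^2-D_j^2)/\{(N_1+N_2-2)(N_1^{-1}+N_2^{-1})+D_j^2\}\), which avoids dividing by \(p-k_j\) when the full set of variables is used.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function (assumed meaning of Phi)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def error_rate_criterion(d_j, k_j, n1, n2):
    """M(j): estimated error rate for a subset with distance d_j > 0 and size k_j."""
    arg = (
        -0.5 * d_j
        + 0.5 * (k_j - 1) * (1.0 / n1 + 1.0 / n2) / d_j
        + d_j * (4 * (4 * k_j - 1) - d_j ** 2) / (32.0 * (n1 + n2 - 2))
    )
    return normal_cdf(arg)


def aic_criterion(d, d_j, k_j, p, n1, n2):
    """A(j): AIC-type criterion relative to the full set of p variables."""
    n = n1 + n2
    # Equals (p - k_j) * F(j) / (n - p - 1) after cancelling common factors.
    ratio = (d ** 2 - d_j ** 2) / ((n - 2) * (1.0 / n1 + 1.0 / n2) + d_j ** 2)
    return n * math.log(1.0 + ratio) + 2 * (k_j - p)


if __name__ == "__main__":
    # Hypothetical numbers: full set p = 3 with D = 2, subset of size 1 with D_j = 1.
    print(error_rate_criterion(d_j=1.0, k_j=1, n1=20, n2=20))
    print(aic_criterion(d=2.0, d_j=1.0, k_j=1, p=3, n1=20, n2=20))
```

Under either criterion, the subset with the smallest value would be selected; for the full set (\(k_j=p\), \(D_j=D\)) the sketch gives \(A(j)=0\).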

zbMATH Keywords

two-group discriminant analysis

selection of variables

linear discriminant function

p-variate normal populations

different means

common covariance matrix

asymptotic unbiased estimate

error rate of misclassification

Mahalanobis distance

selection

Akaike's information criterion

no additional information model

MaRDI profile type

MaRDI publication profile

cites work

A new look at the statistical model identification

Investigating the relative importance of individual variables and variable subsets in discriminant analysis

A criterion for variable selection in multiple discriminant analysis

Selection of Variables in Discriminant Analysis by F-Statistic and Error Rate

Q4743596

Q4125596

An Asymptotic Unbiased Technique for Estimating the Error Rates in Discriminant Analysis

A Criterion for Selecting Variables for the Linear Discriminant Function

On the Relationship between the F Test and the Overall Error Rate for Variable Selection in Two-Group Discriminant Analysis

An Asymptotic Expansion for the Distribution of the Linear Discriminant Function

Q5644917

Linear Statistical Inference and its Applications

Selection of the order of an autoregressive model by Akaike's information criterion

A Combinatorial Lemma and Its Application to Probability Theory

full work available at URL

https://doi.org/10.1016/0047-259x(85)90092-2

Identifiers

zbMATH Open document ID

0591.62053

DOI

10.1016/0047-259X(85)90092-2

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:1074986

@@ Property / full work available at URL @@
+https://doi.org/10.1016/0047-259x(85)90092-2
+Normal rank
@@ Property / OpenAlex ID @@
+W2061038390
@@ Property / OpenAlex ID: W2061038390 / rank @@
+Normal rank