Divergence-based estimation and testing of statistical models of classification (Q1898412)

From MaRDI portal
scientific article

    Statements

    Divergence-based estimation and testing of statistical models of classification (English)
    1 September 1996
    A frequent problem of categorical data analysis is that a fixed number \(n\) of samples \(X = (X_1, \dots, X_n) \in {\mathcal X}^n\) is taken from each of \(N\) different populations (families of individuals, clusters of objects). The sample space \(\mathcal X\) is classified into \(r\) categories by a rule \(\rho : {\mathcal X} \to \{1, \dots, r\}\). Let \(Y = (Y_1,\dots, Y_r)\) be the classification vector whose components count the respective categories in the sample vector \(X\); i.e., let \[ Y_j = \#\{1 \leq k \leq n: \rho(X_k) = j\},\quad 1 \leq j\leq r. \] The sample space of the vector \(Y\) is denoted by \(S_{n,r}\); i.e., \[ S_{n,r} = \{y = (y_1,\dots, y_r) \in \{0,1,\dots, n\}^r : y_1 + \cdots + y_r = n\}. \] Populations \(i = 1,\dots, N\) generate different sample vectors \(X^{(i)}\) and corresponding classification vectors \(Y^{(i)}\). The sampled populations are assumed to be independent and homogeneous in the sense that the \(X^{(i)}\), and consequently the \(Y^{(i)}\), are independent realizations of the \(X\) and \(Y\) considered above. The i.i.d. property of the components \(X_1,\dots, X_n\) is included as a special case. The aim of this paper is to present an extended class of methods for estimating parameters of statistical models of the vectors \(Y\) and for testing statistical hypotheses about these models. The methods are based on the so-called \(\phi\)-divergences of probability distributions; they include as particular cases the well-known maximum likelihood method of estimation and Pearson's \(X^2\) method of testing. Asymptotic properties of estimators minimizing the \(\phi\)-divergence between theoretical and empirical vectors of means are established. Asymptotic distributions of \(\phi\)-divergences between empirical and estimated vectors of means are explicitly evaluated, and tests based on these statistics are studied.
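    The central quantity of the abstract can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it computes \(D_\phi(p, q) = \sum_j q_j\, \phi(p_j/q_j)\) for an empirical frequency vector \(p\) (derived from a classification vector \(Y\)) and a hypothesized model \(q\), and shows how the choices \(\phi(t) = (t-1)^2\) and \(\phi(t) = t \log t\) recover Pearson's \(X^2\) and the likelihood-ratio statistic \(G^2\) as special cases. The function and variable names are ours, chosen for the example.

```python
import numpy as np

def phi_divergence(p, q, phi):
    """D_phi(p, q) = sum_j q_j * phi(p_j / q_j) for probability vectors p, q.

    Assumes all components of q are strictly positive.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(q * phi(p / q)))

# phi(t) = (t-1)^2 yields Pearson's X^2 statistic as n * D_phi(p_hat, q)
pearson_phi = lambda t: (t - 1.0) ** 2
# phi(t) = t log t yields the likelihood-ratio statistic as 2n * D_phi(p_hat, q)
kl_phi = lambda t: t * np.log(t)

n = 100                                   # sample size per population
counts = np.array([28, 52, 20])           # classification vector Y over r = 3 categories
p_hat = counts / n                        # empirical cell frequencies
q = np.array([0.25, 0.50, 0.25])          # hypothesized model of cell probabilities

X2 = n * phi_divergence(p_hat, q, pearson_phi)      # Pearson chi-square statistic
G2 = 2 * n * phi_divergence(p_hat, q, kl_phi)       # likelihood-ratio statistic
```

    Minimizing \(D_\phi\) over a parametric family of model vectors \(q_\theta\) gives the minimum \(\phi\)-divergence estimators studied in the paper; the two choices of \(\phi\) above correspond to its maximum likelihood and Pearson \(X^2\) special cases.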
    phi-divergence
    classification
    clustered data
    minimum divergence estimation
    minimum divergence testing
    optimality of testing
    categorical data analysis
    maximum likelihood
