Multinomial interpoint distances (Q1706476)

From MaRDI portal
scientific article

    Statements

    Multinomial interpoint distances (English)
    22 March 2018
    Consider the following model. Let $r$ and $n$ be integers greater than or equal to $2$. Define $\overline{(0,n)}$ as \[ \overline{(0,n)}:=\left\{ 0,1,\dots ,n\right\}. \] Let $E_{r,n}$ be given by \[ E_{r,n}:=\left\{(n_1,\dots,n_{r})^{\prime}\in \left(\overline{(0,n)}\right)^{r}\mid \sum_{i=1}^{r}n_{i}=n\right\}. \] $\mathbf{Z}=(Z_{1},\dots ,Z_{r})$ is a random field assuming values in $E_{r,n}$. Let $\mathbf{P}$ be given by \[ \mathbf{P}:=\left\{ \mathbf{p}=(p_{1},\dots ,p_{r})\in \left[ 0,1\right]^{r}\mid \sum_{i=1}^{r}p_{i}=1\right\}. \] For each $\mathbf{p}=(p_{1},\dots ,p_{r})\in \mathbf{P}$, define \[ MT(r,n,\mathbf{p}):=\left\{ P\text{ is a probability on }E_{r,n}\mid P(\left\{ (n_{1},\dots ,n_{r})\right\} )=n!\prod_{t=1}^{r}\frac{(p_{t})^{n_{t}}}{n_{t}!}\text{ for all }(n_{1},\dots ,n_{r})\in E_{r,n}\right\}. \]

    Let $N_{1}$ and $N_{2}$ be positive integers, and let $\mathbf{p}_{1}=(p_{11},\dots ,p_{1r})\in \mathbf{P}$ and $\mathbf{p}_{2}=(p_{21},\dots ,p_{2r})\in \mathbf{P}$. $\mathbf{X}_{1},\dots ,\mathbf{X}_{N_{1}}$ are independent and identically $MT(r,n,\mathbf{p}_{1})$-distributed random vectors. $\mathbf{Y}_{1},\dots ,\mathbf{Y}_{N_{2}}$ are independent and identically $MT(r,n,\mathbf{p}_{2})$-distributed random vectors. For $1\leq i<j\leq N_{1}$, the squared interpoint distance (ID) between $\mathbf{X}_{i}$ and $\mathbf{X}_{j}$ is the random variable defined by \[ d_{(x)ij}^{2}:=(\mathbf{X}_{i}-\mathbf{X}_{j})^{\prime}(\mathbf{X}_{i}-\mathbf{X}_{j})=\sum_{t=1}^{r}(X_{it}-X_{jt})^{2}. \] For $1\leq k<h\leq N_{2}$, the squared ID between $\mathbf{Y}_{k}$ and $\mathbf{Y}_{h}$ is the random variable defined by \[ d_{(y)kh}^{2}:=(\mathbf{Y}_{k}-\mathbf{Y}_{h})^{\prime}(\mathbf{Y}_{k}-\mathbf{Y}_{h})=\sum_{t=1}^{r}(Y_{kt}-Y_{ht})^{2}. \] For $1\leq i\leq N_{1}$ and $1\leq k\leq N_{2}$, the squared ID between $\mathbf{X}_{i}$ and $\mathbf{Y}_{k}$ is the random variable defined by \[ d_{(xy)ik}^{2}:=(\mathbf{X}_{i}-\mathbf{Y}_{k})^{\prime}(\mathbf{X}_{i}-\mathbf{Y}_{k})=\sum_{t=1}^{r}(X_{it}-Y_{kt})^{2}. \]

    In this paper, the probability distribution, mean and variance of $d_{(x)ij}^{2}$ and of $d_{(xy)ik}^{2}$, as well as the joint probability distribution of $d_{(x)ij}^{2}$ and $d_{(y)kh}^{2}$, are derived rigorously under the hypotheses stated above. Three applications of statistics based on these random variables are also provided: testing $\mathbf{p}_{1}=\mathbf{p}_{2}$ against $\mathbf{p}_{1}\neq \mathbf{p}_{2}$, through a simulation study comparing five statistical tests; classification of data from $MT(r,n,\mathbf{p})$ distributions with large $r$; and detection of multidimensional outliers. The paper is well written and organized. It is recommended to readers interested in applied problems where the number of variables is much larger than the sample size, as in medical imaging.
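The definitions of the squared interpoint distances can be illustrated with a small simulation. The sketch below is not taken from the paper: the parameter values, the function names and the closed-form means are assumptions made for illustration. The means follow from the standard multinomial moments $E(X_{it})=np_{t}$ and $\operatorname{Var}(X_{it})=np_{t}(1-p_{t})$, which give $E(d_{(x)ij}^{2})=2n\left(1-\sum_{t}p_{1t}^{2}\right)$ and $E(d_{(xy)ik}^{2})=\sum_{t}\left[np_{1t}(1-p_{1t})+np_{2t}(1-p_{2t})+n^{2}(p_{1t}-p_{2t})^{2}\right]$. The code only compares empirical averages with these values.

```python
import random
from itertools import combinations


def sample_multinomial(n, p, rng):
    """Draw one count vector (n_1, ..., n_r): n trials, cell probabilities p."""
    counts = [0] * len(p)
    for cell in rng.choices(range(len(p)), weights=p, k=n):
        counts[cell] += 1
    return counts


def squared_distance(u, v):
    """Squared Euclidean interpoint distance between two count vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def expected_within(n, p):
    """E[d^2] for two independent draws from the same multinomial (standard moments)."""
    return 2 * n * (1 - sum(q * q for q in p))


def expected_between(n, p1, p2):
    """E[d^2] for independent draws from multinomials with cell probabilities p1 and p2."""
    return sum(
        n * a * (1 - a) + n * b * (1 - b) + n * n * (a - b) ** 2
        for a, b in zip(p1, p2)
    )


# Hypothetical parameters, chosen only for this illustration.
rng = random.Random(0)
n = 10
p1 = [0.2, 0.3, 0.5]
p2 = [0.5, 0.3, 0.2]
N1, N2 = 200, 200

xs = [sample_multinomial(n, p1, rng) for _ in range(N1)]
ys = [sample_multinomial(n, p2, rng) for _ in range(N2)]

# d_(x)ij^2 over all pairs i < j, and d_(xy)ik^2 over all pairs (i, k).
within = [squared_distance(a, b) for a, b in combinations(xs, 2)]
between = [squared_distance(a, b) for a in xs for b in ys]

mean_within = sum(within) / len(within)
mean_between = sum(between) / len(between)

print("within :", mean_within, "expected", expected_within(n, p1))
print("between:", mean_between, "expected", expected_between(n, p1, p2))
```

When $\mathbf{p}_{1}\neq \mathbf{p}_{2}$, the between-sample distances tend to exceed the within-sample ones because of the extra term $n^{2}\sum_{t}(p_{1t}-p_{2t})^{2}$; this is the kind of contrast that distance-based two-sample tests rely on. The five tests compared in the paper are not reproduced here.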
    interpoint distance
    multinomial
    high dimension
    testing