Multinomial interpoint distances (Q1706476)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Multinomial interpoint distances | scientific article | |
Statements
Multinomial interpoint distances (English)
22 March 2018
Consider the following model. Let $r$ and $n$ be integers greater than or equal to $2$. Define $\overline{(0,n)}$ as \[ \overline{(0,n)}:=\left\{ 0,1,\dots ,n\right\}. \] Let $E_{r,n}$ be given by \[ E_{r,n}:=\left\{(n_{1},\dots ,n_{r})^{\prime}\in \left(\overline{(0,n)}\right)^{r}\mid \sum_{i=1}^{r}n_{i}=n\right\}. \] $\mathbf{Z}=(Z_{1},\dots ,Z_{r})$ is a random field assuming values in $E_{r,n}$. Let $\mathbf{P}$ be given by \[ \mathbf{P}:=\left\{ \mathbf{p}=(p_{1},\dots ,p_{r})\in \left[ 0,1\right]^{r}\mid \sum_{i=1}^{r}p_{i}=1\right\}. \] For each $\mathbf{p}=(p_{1},\dots ,p_{r})\in \mathbf{P}$, define \[ MT(r,n,\mathbf{p}):=\left\{ P\text{ is a probability on }E_{r,n}\mid P(\left\{ (n_{1},\dots ,n_{r})\right\} )=n!\prod_{t=1}^{r}\frac{(p_{t})^{n_{t}}}{n_{t}!}\text{ for every }(n_{1},\dots ,n_{r})\in E_{r,n}\right\}. \]

Let $N_{1}$ and $N_{2}$ be positive integers, and let $\mathbf{p}_{1}=(p_{11},\dots ,p_{1r})\in \mathbf{P}$ and $\mathbf{p}_{2}=(p_{21},\dots ,p_{2r})\in \mathbf{P}$. $\mathbf{X}_{1},\dots ,\mathbf{X}_{N_{1}}$ are independent and identically $MT(r,n,\mathbf{p}_{1})$-distributed random vectors. $\mathbf{Y}_{1},\dots ,\mathbf{Y}_{N_{2}}$ are independent and identically $MT(r,n,\mathbf{p}_{2})$-distributed random vectors.

For $1\leq i<j\leq N_{1}$ the squared interpoint distance (ID) between $\mathbf{X}_{i}$ and $\mathbf{X}_{j}$ is the random variable defined by \[ d_{(x)ij}^{2}:=(\mathbf{X}_{i}-\mathbf{X}_{j})^{\prime}(\mathbf{X}_{i}-\mathbf{X}_{j})=\sum_{t=1}^{r}(X_{it}-X_{jt})^{2}. \] For $1\leq k<h\leq N_{2}$ the squared interpoint distance between $\mathbf{Y}_{k}$ and $\mathbf{Y}_{h}$ is the random variable defined by \[ d_{(y)kh}^{2}:=(\mathbf{Y}_{k}-\mathbf{Y}_{h})^{\prime}(\mathbf{Y}_{k}-\mathbf{Y}_{h})=\sum_{t=1}^{r}(Y_{kt}-Y_{ht})^{2}. \] For $1\leq i\leq N_{1}$ and $1\leq k\leq N_{2}$ the squared interpoint distance between $\mathbf{X}_{i}$ and $\mathbf{Y}_{k}$ is the random variable defined by \[ d_{(xy)ik}^{2}:=(\mathbf{X}_{i}-\mathbf{Y}_{k})^{\prime}(\mathbf{X}_{i}-\mathbf{Y}_{k})=\sum_{t=1}^{r}(X_{it}-Y_{kt})^{2}. \]

In this paper the probability distribution, the mean, and the variance of $d_{(x)ij}^{2}$ and of $d_{(xy)ik}^{2}$, as well as the joint probability distribution of $d_{(x)ij}^{2}$ and $d_{(y)kh}^{2}$, are derived rigorously under the hypotheses stated above. Three applications of statistics based on these random variables are also provided: testing $\mathbf{p}_{1}=\mathbf{p}_{2}$ against $\mathbf{p}_{1}\neq \mathbf{p}_{2}$, through a simulation study comparing five statistical tests; classification of data from $MT(r,n,\mathbf{p})$ distributions with large $r$; and detection of multidimensional outliers. The paper is well written and organized. It is recommended to readers interested in applied problems where the number of variables is much larger than the sample size, for example in medical imaging.
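The definitions above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method: it draws two multinomial samples, computes the within-sample distances $d_{(x)ij}^{2}$ and the between-sample distances $d_{(xy)ik}^{2}$, and compares their empirical means with reference values. The reference values used here, $2n(1-\sum_{t}p_{1t}^{2})$ and $n(1-\sum_{t}p_{1t}^{2})+n(1-\sum_{t}p_{2t}^{2})+n^{2}\sum_{t}(p_{1t}-p_{2t})^{2}$, follow from the standard multinomial mean and variance and are not quoted from the paper. All function names, the parameter values, and the seed are hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def sample_multinomial(n, p, rng):
    """Draw one count vector (n_1, ..., n_r) from n trials with cell probabilities p."""
    counts = [0] * len(p)
    for cell in rng.choices(range(len(p)), weights=p, k=n):
        counts[cell] += 1
    return counts


def squared_distance(u, v):
    """Squared Euclidean distance: sum over t of (u_t - v_t)^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def within_distances(sample):
    """Squared distances for all pairs i < j inside one sample."""
    return [squared_distance(u, v) for u, v in combinations(sample, 2)]


def between_distances(xs, ys):
    """Squared distances for all pairs (X_i, Y_k) across two samples."""
    return [squared_distance(x, y) for x in xs for y in ys]


def expected_within(n, p):
    """Mean of the within-sample squared distance (from multinomial variances)."""
    return 2 * n * (1 - sum(q * q for q in p))


def expected_between(n, p1, p2):
    """Mean of the between-sample squared distance (variances plus squared mean shift)."""
    spread = n * (1 - sum(q * q for q in p1)) + n * (1 - sum(q * q for q in p2))
    shift = n * n * sum((a - b) ** 2 for a, b in zip(p1, p2))
    return spread + shift


# Illustrative parameters (assumptions, not taken from the paper).
rng = random.Random(0)
n = 10
p1 = [0.2, 0.3, 0.5]
p2 = [0.4, 0.4, 0.2]
N1, N2 = 400, 400

xs = [sample_multinomial(n, p1, rng) for _ in range(N1)]
ys = [sample_multinomial(n, p2, rng) for _ in range(N2)]

d_x = within_distances(xs)
d_xy = between_distances(xs, ys)
mean_d_x = sum(d_x) / len(d_x)
mean_d_xy = sum(d_xy) / len(d_xy)

print("within :", mean_d_x, "reference", expected_within(n, p1))
print("between:", mean_d_xy, "reference", expected_between(n, p1, p2))
```

When $\mathbf{p}_{1}\neq \mathbf{p}_{2}$, the between-sample mean exceeds the average of the two within-sample means by the shift term $n^{2}\sum_{t}(p_{1t}-p_{2t})^{2}$, which gives an intuition for why statistics built on interpoint distances can separate the two hypotheses.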
interpoint distance
multinomial
high dimension
testing