Conditional formulae for Gibbs-type exchangeable random partitions (Q373830)

Let \((X_{n})_{n\geq 1}\) be an \({\mathcal X}\)-valued exchangeable sequence and \(\operatorname{P}\) the random probability measure on \({\mathcal X}\) in its de Finetti representation. \(\operatorname{P}\) is assumed to be concentrated on the set of discrete probability measures and to admit the representation \(\operatorname{P}=\sum_{i\in I}p_{i}\varepsilon_{Y_{i}}\), where \((p_{i})\) and \((Y_{i})\) are independent. For every \(n\), consider the random partition \(\Pi_{n}\) of \(\{1,\dots,n\}\) induced by the exchangeable equivalence relation \(i\sim j\) if \(X_{i}=X_{j}\). Its law is characterized by the probabilities \[ p_{k}^{(n)}(n_{1},\dots,n_{k}), \quad \sum_{i=1}^{k}n_{i}=n, \] that \(\Pi_{n}\) coincides with a given partition into \(k\) sets of sizes \(n_{1},\dots,n_{k}\). The random number of sets of \(\Pi_{n}\) is denoted \(K_{n}\), and the number of sets of cardinality \(i\) in \(\Pi_{n}\) is denoted \(M_{i,n}\). The partition is called of Gibbs type if \[ p_{k}^{(n)}(n_{1},\dots,n_{k})=V_{n,k}\prod_{i=1}^{k}(1-\sigma )_{n_{i}-1},\qquad \sigma \in (-\infty ,1), \] where, in general, \(a_{n}= a(a+1)\cdots (a+n-1)\) denotes the rising factorial, and the weights satisfy \[ V_{n,k}=V_{n+1,k+1}+(n-\sigma k) V_{n+1,k},\quad k\leq n, \qquad V_{1,1}=1. \] Let \(O_{i,m}^{(n)}\) be the number of sets of size \(i\) in \(\Pi_{n+m}\) that intersect \(\{1,\dots,n\}\), \(N_{i,m}^{(n)}\) the number of sets of size \(i\) in \(\Pi_{n+m}\) that do not intersect \(\{1,\dots,n\}\), and \(M_{i,m}^{(n)}=O_{i,m}^{(n)}+N_{i,m}^{(n)}\). The authors establish formulas for the factorial moments \(\operatorname{E}((M_{i,n})_{[q]})\), where \(a_{[q]} =a(a-1)\cdots (a-q+1)\), and for the factorial moments \[ \operatorname{E}((O_{i,m}^{(n)})_{[q]}), \quad \operatorname{E}((N_{i,m}^{(n)})_{[q]}), \quad \operatorname{E}((M_{i,m}^{(n)})_{[q]}), \] each taken conditionally on \((K_{n},M_{1,n},\dots,M_{K_{n},n})\).
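As a small sanity check on the Gibbs-type definition, the following Python sketch (an illustration only, not code from the paper) evaluates \(p_{k}^{(n)}(n_{1},\dots,n_{k})=V_{n,k}\prod_{i}(1-\sigma)_{n_{i}-1}\) with exact rational arithmetic and sums it over all set partitions of \(\{1,\dots,n\}\); for admissible weights the total must equal 1. The function names (`rising`, `set_partitions`, `gibbs_eppf`, `total_probability`, `dirichlet_V`, `two_parameter_pd_V`) and the parameter values are hypothetical. The weights are the D and PD weights from the examples of this review, with the PD weights written as \(\prod_{i=1}^{k-1}(\theta+i\sigma)/(\theta+1)_{n-1}\), which agrees with \(\prod_{i=0}^{k-1}(\theta+i\sigma)/\theta_{n}\) whenever \(\theta\neq 0\).

```python
from fractions import Fraction
from math import prod


def rising(a, n):
    """Rising factorial a_n = a(a+1)...(a+n-1), with a_0 = 1 (exact arithmetic)."""
    return prod((a + j for j in range(n)), start=Fraction(1))


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # `first` opens a new block ...
        yield [[first]] + part
        # ... or joins one of the existing blocks.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def gibbs_eppf(block_sizes, sigma, V):
    """p_k^(n)(n_1,...,n_k) = V_{n,k} * prod_i (1 - sigma)_{n_i - 1}."""
    n, k = sum(block_sizes), len(block_sizes)
    weights = prod((rising(1 - sigma, m - 1) for m in block_sizes), start=Fraction(1))
    return V(n, k) * weights


def total_probability(n, sigma, V):
    """Sum the probabilities over all set partitions of {1, ..., n}."""
    return sum(gibbs_eppf([len(b) for b in p], sigma, V)
               for p in set_partitions(list(range(1, n + 1))))


# Hypothetical weight constructors (D and PD cases); V(n, k) returns V_{n,k}.
def dirichlet_V(theta):
    return lambda n, k: theta ** k / rising(theta, n)


def two_parameter_pd_V(sigma, theta):
    return lambda n, k: (prod((theta + i * sigma for i in range(1, k)), start=Fraction(1))
                         / rising(theta + 1, n - 1))


print(total_probability(5, 0, dirichlet_V(Fraction(3, 2))))
print(total_probability(5, Fraction(1, 2), two_parameter_pd_V(Fraction(1, 2), Fraction(1))))
```

Both calls should print `1`. Enumerating all set partitions is only feasible for small \(n\) (there are 52 partitions of a 5-element set), so this is a check of the definition rather than a way to compute with it.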
The results are applied to three examples: the Dirichlet case D, with \(\sigma =0\) and \(V_{n,k}=\theta^{k}/\theta_{n}\), \(\theta >0\); the two-parameter Poisson–Dirichlet case PD, with \[ \sigma \in (0,1),\quad V_{n,k}=\prod_{i=1}^{k-1}(\theta +i\sigma )/(\theta+1)_{n-1},\quad \theta > -\sigma; \] and the Gnedin case, with \[ \sigma =-1,\quad V_{n,k}=\gamma_{n-k}\prod_{i=1}^{k-1}(i^{2}-\gamma i)\prod_{i=1}^{n-1}(i^{2}+\gamma i)^{-1},\quad \gamma \in [0,1). \] Explicit formulas are obtained for the distributions of \(O_{i,m}^{(n)}\), \(N_{i,m}^{(n)}\), \(M_{i,m}^{(n)}\) and for their means.

Convergence in distribution results: for D, \(M_{i,n}\rightarrow \pi_{\theta /i}\), where \(\pi_{\lambda}\) denotes a Poisson random variable with mean \(\lambda\), and \[ M_{i,m}^{(n)},\ N_{i,m}^{(n)}\rightarrow \pi_{(\theta +n)/i} \quad (m\rightarrow \infty). \] For PD, as \(m\rightarrow \infty\), \[ \frac{N_{i,m}^{(n)}}{m^{\sigma }},\ \frac{M_{i,m}^{(n)}}{m^{\sigma }}\rightarrow \frac{\sigma (1-\sigma )_{i-1}}{i!}\,B Y, \qquad \frac{K_{m}^{(n)}}{m^{\sigma }}\rightarrow BY, \] where \(K_{m}^{(n)}\) is the number of sets of \(\Pi_{n+m}\) not intersecting \(\{1,\dots,n\}\), \(B\) and \(Y\) are independent, \(B\) has the beta distribution \(\beta(j+\theta /\sigma ,\,n/\sigma -j)\) with \(j=K_{n}\), and \(Y\) has density \[ \frac{\Gamma (q\sigma +1)}{\sigma \Gamma (q+1)}\, y^{q-1/\sigma -1}f_{\sigma }(y^{-1/\sigma }), \qquad q=(\theta +n)/\sigma, \] with \(f_{\sigma }\) the density of a positive \(\sigma\)-stable random variable. For the Gnedin model, \(M_{i,m}^{(n)}, N_{i,m}^{(n)}\rightarrow 0\).

In the section "Genomic applications", the authors analyze a sample of 2586 observations under the PD model, estimating \((\sigma,\theta)\) by maximizing the corresponding \(p_{k}^{(n)}(n_{1},\dots,n_{k})\). They study \(O_{\tau }^{(n)}= O_{1,m}^{(n)}+\dots+O_{\tau ,m}^{(n)}\) (the number of genes already observed in the first \(n\) experiments that appear at most \(\tau\) times in all \(n+m\) experiments) for \(\tau =3,4,5\), and the analogous quantities for \(N\) and \(M\). They first split the data into \(n=1000\) and \(m=1586\) and compare the observed values of \(O\), \(N\), \(M\) with those predicted by the conditional expectations; they then use the full sample \(n=2586\) to predict these quantities for \(m= 250,500,750,1000\).
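The three weight families can be checked against the defining backward recursion of Gibbs-type weights. The Python sketch below (an illustration with hypothetical function names and parameter values, not the authors' code) verifies \(V_{1,1}=1\) and \(V_{n,k}=V_{n+1,k+1}+(n-\sigma k)V_{n+1,k}\) exactly for all \(1\leq k\leq n\leq n_{\max}\). The PD weights are coded as \(\prod_{i=1}^{k-1}(\theta+i\sigma)/(\theta+1)_{n-1}\), which coincides with \(\prod_{i=0}^{k-1}(\theta+i\sigma)/\theta_{n}\) for \(\theta\neq 0\) and also covers \(\theta=0\).

```python
from fractions import Fraction
from math import prod


def rising(a, n):
    """Rising factorial a_n = a(a+1)...(a+n-1), with a_0 = 1."""
    return prod((a + j for j in range(n)), start=Fraction(1))


# Each constructor returns (sigma, V), where V(n, k) is the Gibbs weight V_{n,k}.
def dirichlet(theta):
    return 0, lambda n, k: theta ** k / rising(theta, n)


def two_parameter_pd(sigma, theta):
    # prod_{i=1}^{k-1}(theta + i*sigma) / (theta + 1)_{n-1}
    return sigma, lambda n, k: (
        prod((theta + i * sigma for i in range(1, k)), start=Fraction(1))
        / rising(theta + 1, n - 1))


def gnedin(gamma):
    # gamma_{n-k} * prod_{i=1}^{k-1}(i^2 - gamma i) / prod_{i=1}^{n-1}(i^2 + gamma i)
    def V(n, k):
        num = rising(gamma, n - k) * prod((i * i - gamma * i for i in range(1, k)),
                                          start=Fraction(1))
        den = prod((i * i + gamma * i for i in range(1, n)), start=Fraction(1))
        return num / den
    return -1, V


def satisfies_recursion(sigma, V, n_max):
    """Check V_{1,1} = 1 and the backward recursion for 1 <= k <= n <= n_max."""
    if V(1, 1) != 1:
        return False
    return all(V(n, k) == V(n + 1, k + 1) + (n - sigma * k) * V(n + 1, k)
               for n in range(1, n_max + 1) for k in range(1, n + 1))


print(satisfies_recursion(*dirichlet(Fraction(2)), n_max=8))
print(satisfies_recursion(*two_parameter_pd(Fraction(1, 2), Fraction(1)), n_max=8))
print(satisfies_recursion(*gnedin(Fraction(1, 3)), n_max=8))
```

Each line should print `True`. Exact `Fraction` arithmetic is used so the recursion can be tested for equality rather than up to rounding; with floating-point weights one would compare with a tolerance instead. Pairing a family's weights with the wrong \(\sigma\) (e.g. the PD weights with \(\sigma=0\)) makes the check fail, which is a quick way to see that \(\sigma\) and \(V_{n,k}\) must be chosen together.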

Mathematics Subject Classification ID

60G09