Asymptotically efficient adaptive allocation rules (Q1060517)

\(\Pi_ j\) \((j=1,...,k)\) denote statistical populations specified respectively by univariate density functions \(f(x;\theta_ j)\) with respect to some measure \(\nu\), where the form of f is known but the parameters \(\theta_ 1,...,\theta_ k\) are unknown. It is assumed that \(\int^{\infty}_{-\infty}| x| f(x;\theta)d\nu (x)<\infty\) for all possible values of \(\theta\). The problem is to sample \(x_ 1,x_ 2,..\). sequentially from the k populations in order to achieve the greatest possible expected value of the sum \(S_ n=x_ 1+...+x_ n\) as n approaches infinity. At each stage, we can use all the past observations to decide from which population to sample. Define \(\mu(\theta)\) as \(\int^{\infty}_{-\infty}xf(x;\theta)d\nu (x)\), and \(\mu^*\) as \(\max\{\mu (\theta_ 1),...,\mu (\theta_ k)\}\). Define \(R_ n(\theta_ 1,...,\theta_ k)\) as \(n\mu^*-E(S_ n)\). Then our problem is equivalent to minimizing \(R_ n(\theta_ 1,...,\theta_ k)\) as n approaches infinity. Let \(I(\theta,\lambda)\) denote \(\int^{\infty}_{-\infty}[\log (f(x;\theta)/f(x;\lambda))]f(x;\theta) d\nu(x).\) It is assumed that f is such that \(0<I(\theta,\lambda)<\infty\) whenever \(\mu (\lambda)>\mu (\theta)\), and for all \(\epsilon >0\) and all \(\theta,\lambda\) such that \(\mu(\lambda)> \mu(\theta)\), there exists \(\delta(\epsilon,\theta,\lambda)\) greater than zero for which \(| I(\theta,\lambda)-I(\theta,\lambda')| <\epsilon\) whenever \(\mu (\lambda)\leq \mu (\lambda')\leq \mu (\lambda)+\delta (\epsilon,\theta,\lambda).\) A sampling rule with the property that for any fixed values \(\theta_ 1,...,\theta_ k\) for which the \(\mu (\theta_ j)\) are not all equal, \(R_ n(\theta_ 1,...,\theta_ k)/(\log n)\) approaches \(\sum_{j:\mu (\theta_ j)<\mu^*}(\mu^*-\mu (\theta_ j))/I(\theta_ j,\theta^*)\) as n approaches infinity, where \(\theta^*\) is defined by \(\mu^*=\mu (\theta^*)\), will be called ''asymptotically efficient.'' (It is shown that this asymptotic value for \(R_ n(\theta_ 1,...,\theta_ k)\) is the smallest possible.) An asymptotically efficient sampling rule is constructed as follows. Suppose \(\{h_ i(Y_ 1,...,Y_ i)\}\) and \(\{g_{ni}(Y_ 1,...,Y_ i)\}\) are sequences of real-valued functions \((n=1,2,...\); \(i=1,...,n)\) with the following properties: \(g_{ni}\) is nondecreasing in \(n\geq i\) for every fixed \(i=1,2,..\). ; \(h_ i\leq g_{ni}\) for all \(n\geq i\). Assuming that \(Y_ 1,...,Y_ i\) are i.i.d. each with density \(f(y;\theta)\), then for all \(\theta\), \(P_{\theta}(r\leq g_{ni}(Y_ 1,...,Y_ n)\quad for\quad all\quad i\leq n)=1-o(n^{-1})\) for every \(r<\mu (\theta)\), \[ \lim_{\epsilon \downarrow 0}( \limsup_{n\to \infty}\sum^{n}_{i=1}[P_{\theta}\{g_{ni}(Y_ 1,...,Y_ i)\geq \mu(\lambda)-\epsilon \}]/(\log n))\leq 1/I(\theta,\lambda) \] whenever \[ \mu(\lambda)>\mu (\theta); \] \[ P_{\theta}\{\max_{\delta n\leq i\leq n}| h_ i(Y_ 1,...,Y_ i)-\mu (\theta)| >\epsilon \}=o(n^{- 1})\text{ for all }\epsilon >0\quad and\quad 0<\delta <1. \] For \(j=1,...,k\), let \(T_ n(j)\) denote the number of times that the rule samples from \(\Pi_ j\) up to stage n, and let \(Y_{j1},...,Y_{j,T_ n(j)}\) denote the successive observations. Define \({\hat \mu}_ n(j)=h_{T_ n(j)}(Y_{j1},...,Y_{j,T_ n(j)}),\) \(U_ n(j)=g_{n,T_ n(j)}(Y_{j1},...,Y_{j,T_ n(j)}),\) and let \(0<\delta <1/k.\) At stage \(j=1,...,k\), the rule takes one observation from \(\Pi_ j.\) Now suppose the rule has taken \(n\geq k\) observations. We choose \(j_ n\) such that \({\hat \mu}_ n(j_ n)=\max \{{\hat \mu}_ n(j):\quad T_ n(j)\geq \delta_ n\}.\) At stage \(n+1\), writing \(n+1=km+j,\) where m is a positive integer and j is one of the integers 1,...,k, we take an observation from \(\Pi_ j\) only if \({\hat \mu}_ n(j_ n)\leq U_ n(j),\) and sample from \(\Pi_{j_ n}\) otherwise. The sampling rule just described is asymptotically efficient. It is illustrated for the special cases of sampling from normal, Bernoulli, exponential, and Poisson distributions.

0 references

reviewed by

Lionel Weiss

0 references

zbMATH Keywords

adaptive allocation rules

0 references

asymptotically efficient sampling rule

0 references

normal

0 references

Bernoulli