Bayesian nonparametric bandits (Q1072315)

From MaRDI portal
scientific article

    Statements

    Bayesian nonparametric bandits (English)
    1985
    A finite number \(n\) of selections are to be made from two independent stochastic processes, or "arms". The choice of arm at each stage may depend on the full history of the process, and the aim is to maximise the expected sum of the observations. One arm produces i.i.d. observations with a known probability measure, whereas observations from the other arm have a probability measure which is a Dirichlet process with parameter \(\alpha\); call this latter process arm 1. Various monotonicity properties of optimal strategies are established, yielding corollaries of a "stay-with-a-winner" kind. For example, there exists a quantity \(b_n(\alpha)\) such that if arm 1 is optimal initially and \(X_1 = x\) is observed, then arm 1 is optimal again provided \(x \geq b_n(\alpha)\). Determining \(b_n(\alpha)\) itself requires finding an optimal strategy; however, a simple upper bound is available which can be used to approximate an optimal rule. Calculations give some insight into the quality of such an approximation.
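    To make the setup concrete, the sketch below simulates the selection problem under simplifying assumptions: the base measure of the Dirichlet process is taken to concentrate on \(\{0,1\}\), in which case the prior on the arm-1 success probability reduces to a Beta distribution, and the decision rule is a hypothetical myopic threshold rule (pull arm 1 while its posterior predictive mean is at least the known arm's mean), not the optimal strategy or the threshold \(b_n(\alpha)\) studied in the paper. The function name and parameter values (simulate_dirichlet_bandit, alpha_mass, alpha_mean, known_mean) are illustrative choices, not taken from the paper.

import numpy as np

def simulate_dirichlet_bandit(n=50, alpha_mass=2.0, alpha_mean=0.5,
                              known_mean=0.45, n_reps=2000, seed=0):
    """Monte Carlo sketch of the selection problem described above.

    Arm 0 yields rewards with known mean `known_mean`.  Arm 1 has an
    unknown distribution with a Dirichlet-process prior of total mass
    `alpha_mass` and prior mean `alpha_mean`; with the base measure on
    {0, 1} this is a Beta(alpha_mass*alpha_mean, alpha_mass*(1 - alpha_mean))
    prior on the success probability, and the posterior predictive mean
    after k observations summing to s is
    (alpha_mass*alpha_mean + s) / (alpha_mass + k).

    The rule used here is a myopic stand-in, not the paper's optimal
    strategy: pull arm 1 whenever its posterior predictive mean is at
    least `known_mean`, otherwise take the known arm.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_reps):
        # Draw the true arm-1 success probability from the prior.
        p1 = rng.beta(alpha_mass * alpha_mean, alpha_mass * (1.0 - alpha_mean))
        s, k = 0.0, 0          # running sum and count of arm-1 observations
        reward = 0.0
        for _ in range(n):
            pred_mean = (alpha_mass * alpha_mean + s) / (alpha_mass + k)
            if pred_mean >= known_mean:
                x = rng.binomial(1, p1)   # pull arm 1 and observe
                s += x
                k += 1
                reward += x
            else:
                reward += known_mean      # pull the known arm (credit its mean)
        total += reward
    return total / n_reps

if __name__ == "__main__":
    print("average total reward over n pulls:", simulate_dirichlet_bandit())

    Note that under this stand-in rule the posterior predictive mean stops changing once the known arm is chosen, so the known arm is used for all remaining pulls; that is a consequence of the myopic rule, not a property of the optimal strategy established in the paper.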
    sequential selections
    sequential decisions
    nonparametric decisions
    one-armed bandits
    two-armed bandits
    Dirichlet process
    monotonicity properties of optimal strategies
    stay-with-a-winner
    optimal strategy
    upper bound
    approximation