Bayesian nonparametric bandits (Q1072315)
From MaRDI portal
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | Bayesian nonparametric bandits | scientific article |
Statements
Bayesian nonparametric bandits (English)
0 references
1985
0 references
A finite number (n) of selections are to be made from two independent stochastic processes, or ''arms''. The choice of arm at each stage may depend on the full history of the process, and the aim is to maximise the expectation of the sum of the observations. One arm produces i.i.d. observations with a known probability measure, whereas observations from the other have a probability measure which is a Dirichlet process with parameter \(\alpha\). Call this latter process arm 1. Various monotonicity properties of optimal strategies are established, yielding corollaries of a ''stay-with-a-winner'' kind. For example, there exists a quantity \(b_n(\alpha)\) such that if arm 1 is optimal initially and \(X_1=x\) is observed, then arm 1 is optimal again provided \(x\geq b_n(\alpha)\). The determination of \(b_n(\alpha)\) requires finding an optimal strategy! However, a simple upper bound is available which can be used to approximate an optimal rule. Calculations give some insight into the quality of such an approximation.
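The model above can be sketched in code. This is a minimal illustrative simulation, not the paper's exact algorithm: it plays a myopic ''stay-with-a-winner''-style rule that keeps choosing the Dirichlet-process arm while its posterior predictive mean is at least the known arm's mean. The names (`M`, `base_mean`, `sample_arm1`) and the Uniform(0,1) choice for arm 1's true law are hypothetical, and the known arm is credited its mean reward for simplicity.

```python
import random

def predictive_mean(base_mean, M, observations):
    """Posterior predictive mean under a Dirichlet process prior:
    a convex combination of the base-measure mean and the empirical
    mean of the k observations, with weights M/(M+k) and k/(M+k).
    Here M denotes the total mass of the parameter alpha."""
    k = len(observations)
    if k == 0:
        return base_mean
    return (M * base_mean + sum(observations)) / (M + k)

def run_bandit(n, known_mean, base_mean, M, sample_arm1, rng):
    """Make n selections with a myopic rule (a heuristic, not the
    optimal strategy): play arm 1 while its predictive mean is at
    least the known arm's mean, otherwise take the known arm."""
    total = 0.0
    obs = []  # history of arm-1 observations
    for _ in range(n):
        if predictive_mean(base_mean, M, obs) >= known_mean:
            x = sample_arm1(rng)   # draw from arm 1's (unknown) law
            obs.append(x)
            total += x
        else:
            total += known_mean    # known arm: credit its mean reward
    return total

rng = random.Random(0)
reward = run_bandit(
    n=50, known_mean=0.5, base_mean=0.6, M=2.0,
    sample_arm1=lambda r: r.random(),  # hypothetical true law: Uniform(0,1)
    rng=rng,
)
print(round(reward, 3))
```

The myopic rule here only approximates optimal play; the paper's point is that computing the exact threshold \(b_n(\alpha)\) is as hard as solving the full problem, which motivates the simple upper bound.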
0 references
sequential selections
0 references
sequential decisions
0 references
nonparametric decisions
0 references
one-armed bandits
0 references
two-armed bandits
0 references
Dirichlet process
0 references
monotonicity properties of optimal strategies
0 references
stay-with-a-winner
0 references
optimal strategy
0 references
upper bound
0 references
approximation
0 references