Bayesian nonparametric bandits (Q1072315)

From MaRDI portal
scientific article

    Statements

    Bayesian nonparametric bandits (English)
    1985
    A finite number \(n\) of selections are to be made from two independent stochastic processes, or "arms". The choice of arm at each stage may depend on the full history of the process, and the aim is to maximise the expected sum of the observations. One arm produces i.i.d. observations with a known probability measure, whereas observations from the other arm have a probability measure which is a Dirichlet process with parameter \(\alpha\); call this latter process arm 1. Various monotonicity properties of optimal strategies are established, yielding corollaries of a "stay-with-a-winner" kind. For example, there exists a quantity \(b_n(\alpha)\) such that if arm 1 is optimal initially and \(X_1 = x\) is observed, then arm 1 is optimal again provided \(x \geq b_n(\alpha)\). Determining \(b_n(\alpha)\) itself requires finding an optimal strategy; however, a simple upper bound is available which can be used to approximate an optimal rule. Calculations give some insight into the quality of such an approximation.
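    To make the setup concrete, the sketch below simulates the selection problem under simplifying assumptions: the base measure of the Dirichlet process is taken to concentrate on \(\{0,1\}\), in which case the prior on the arm-1 success probability reduces to a Beta distribution, and the decision rule is a hypothetical myopic threshold rule (pull arm 1 while its posterior predictive mean is at least the known arm's mean), not the optimal strategy or the threshold \(b_n(\alpha)\) studied in the paper. The function name and parameter values (simulate_dirichlet_bandit, alpha_mass, alpha_mean, known_mean) are illustrative choices, not taken from the paper.

import numpy as np

def simulate_dirichlet_bandit(n=50, alpha_mass=2.0, alpha_mean=0.5,
                              known_mean=0.45, n_reps=2000, seed=0):
    """Monte Carlo sketch of the selection problem described above.

    Arm 0 yields rewards with known mean `known_mean`.  Arm 1 has an
    unknown distribution with a Dirichlet-process prior of total mass
    `alpha_mass` and prior mean `alpha_mean`; with the base measure on
    {0, 1} this is a Beta(alpha_mass*alpha_mean, alpha_mass*(1 - alpha_mean))
    prior on the success probability, and the posterior predictive mean
    after k observations summing to s is
    (alpha_mass*alpha_mean + s) / (alpha_mass + k).

    The rule used here is a myopic stand-in, not the paper's optimal
    strategy: pull arm 1 whenever its posterior predictive mean is at
    least `known_mean`, otherwise take the known arm.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_reps):
        # Draw the true arm-1 success probability from the prior.
        p1 = rng.beta(alpha_mass * alpha_mean, alpha_mass * (1.0 - alpha_mean))
        s, k = 0.0, 0          # running sum and count of arm-1 observations
        reward = 0.0
        for _ in range(n):
            pred_mean = (alpha_mass * alpha_mean + s) / (alpha_mass + k)
            if pred_mean >= known_mean:
                x = rng.binomial(1, p1)   # pull arm 1 and observe
                s += x
                k += 1
                reward += x
            else:
                reward += known_mean      # pull the known arm (credit its mean)
        total += reward
    return total / n_reps

if __name__ == "__main__":
    print("average total reward over n pulls:", simulate_dirichlet_bandit())

    Note that under this stand-in rule the posterior predictive mean stops changing once the known arm is chosen, so the known arm is used for all remaining pulls; that is a consequence of the myopic rule, not a property of the optimal strategy established in the paper.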
    sequential selections
    sequential decisions
    nonparametric decisions
    one-armed bandits
    two-armed bandits
    Dirichlet process
    monotonicity properties of optimal strategies
    stay-with-a-winner
    optimal strategy
    upper bound
    approximation