On Bayesian index policies for sequential resource allocation (Q1750289)

From MaRDI portal
scientific article
Language Label Description Also known as
English
On Bayesian index policies for sequential resource allocation
scientific article

    Statements

    On Bayesian index policies for sequential resource allocation (English)
    0 references
    0 references
    0 references
    18 May 2018
    0 references
    Consider the stochastic multi-armed bandit problem. So, it is assumed that there are \(K\) (\(K \geq 2\)) arms with which an agent interacts in a sequential way on the control horizon \(T\). The choice of the \(a\)-th arm is accompanied with a random reward the distribution of which depends only on the chosen arm and is unknown to the agent. The goal is to maximize (in some sense) the total expected reward. If the agent knew distributions of the rewards, he should always use the oracle policy and choose the arm corresponding to the largest mathematical expectation of the reward. However, for the actually used policy \(A\) the total expected reward is less than corresponding to the oracle policy by a value of the regret \(R(T,A)\). In the article, asymptotically optimal Bayesian index policies for the multi-armed bandit problem are proposed for a large class of distributions of the rewards belonging to a one-dimensional exponential family. For these strategies \(R(T,A)\), considered in the frequentist view (i.e. for some fixed unknown distributions of the rewards), has the order \(\log T\). The proposed policies are the Bayes-UCB algorithms which rely on quantiles of posterior distributions. Bayesian insight on alternative exploration rates is also presented. To this end, proposed algorithms are compared with finite-horizon Gittins indices, kl-UCB\(^+\) and kl-UCB-H\(^{+}\) algorithms. Numerical results are given.
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    multi-armed bandit problems
    0 references
    Bayesian methods
    0 references
    upper-confidence bounds
    0 references
    Gittins indices
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references