The multi-armed bandit problem: an efficient nonparametric solution (Q2176624)

From MaRDI portal
scientific article
Language Label Description Also known as
English
The multi-armed bandit problem: an efficient nonparametric solution
scientific article

    Statements

    The multi-armed bandit problem: an efficient nonparametric solution (English)
    0 references
    0 references
    5 May 2020
    0 references
    The author treats the multi-armed bandit problem in the formulation which can be found in [\textit{T. L. Lai} and \textit{H. Robbins}, Adv. Appl. Math. 6, 4--22 (1985; Zbl 0568.62074)]. \textit{T. L. Lai} [Ann. Stat. 15, 1091--1114 (1987; Zbl 0643.62054)] provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated from specified parametric families. The subject of this paper is a new nonparametric an arm allocation procedure subsample-mean comparison (SSMC) which is efficient when the reward distributions are from an unspecified one-dimensional exponential family. It achieves this by comparing subsample means of the leading arm with the sample means of its competitors. It is empirical in its approach, using more informative subsample means rather than full-sample means alone, for better decision-making.
    0 references
    efficiency
    0 references
    KL-UCB
    0 references
    subsampling
    0 references
    Thompson sampling
    0 references
    upper confidence bound (UCB)
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references