The multi-armed bandit problem: an efficient nonparametric solution (Q2176624)
From MaRDI portal
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | The multi-armed bandit problem: an efficient nonparametric solution |
scientific article |
Statements
The multi-armed bandit problem: an efficient nonparametric solution (English)
0 references
5 May 2020
0 references
The author treats the multi-armed bandit problem in the formulation which can be found in [\textit{T. L. Lai} and \textit{H. Robbins}, Adv. Appl. Math. 6, 4--22 (1985; Zbl 0568.62074)]. \textit{T. L. Lai} [Ann. Stat. 15, 1091--1114 (1987; Zbl 0643.62054)] provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated from specified parametric families. The subject of this paper is a new nonparametric an arm allocation procedure subsample-mean comparison (SSMC) which is efficient when the reward distributions are from an unspecified one-dimensional exponential family. It achieves this by comparing subsample means of the leading arm with the sample means of its competitors. It is empirical in its approach, using more informative subsample means rather than full-sample means alone, for better decision-making.
0 references
efficiency
0 references
KL-UCB
0 references
subsampling
0 references
Thompson sampling
0 references
upper confidence bound (UCB)
0 references
0 references
0 references