The multi-armed bandit problem: an efficient nonparametric solution (Q2176624)

scientific article

Language	Label	Description	Also known as
English	The multi-armed bandit problem: an efficient nonparametric solution	scientific article

Statements

scholarly article

The multi-armed bandit problem: an efficient nonparametric solution (English)

The Annals of Statistics

publication date

5 May 2020

full work available at URL

https://arxiv.org/abs/1703.08285

https://projecteuclid.org/euclid.aos/1581930138

The author treats the multi-armed bandit problem in the formulation of [\textit{T. L. Lai} and \textit{H. Robbins}, Adv. Appl. Math. 6, 4--22 (1985; Zbl 0568.62074)]. \textit{T. L. Lai} [Ann. Stat. 15, 1091--1114 (1987; Zbl 0643.62054)] provided efficient parametric solutions to this problem, showing that allocating arms via upper confidence bounds (UCB) achieves the asymptotically minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated within specified parametric families. The subject of this paper is a new nonparametric arm allocation procedure, subsample-mean comparison (SSMC), which is efficient when the reward distributions come from an unspecified one-dimensional exponential family. SSMC achieves this by comparing subsample means of the leading arm with the sample means of its competitors. The approach is empirical: it uses subsample means, which carry more information than full-sample means alone, to make better allocation decisions.
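To make the subsample-mean comparison concrete, here is a minimal Python sketch for Bernoulli rewards (a one-dimensional exponential family). It is a simplified reading of the idea summarized above, not the author's published procedure. The forced-exploration threshold `min_obs` (defaulting to sqrt(log r)), the rule that every winning challenger is played in the same round, and the helper names `ssmc_choose` and `run_ssmc` are all assumptions made for illustration.

```python
import math
import random


def ssmc_choose(rewards, r, min_obs=None):
    """Return the list of arms to play at round r (simplified SSMC sketch).

    rewards[k] holds the rewards observed so far from arm k; every arm must
    already have at least one observation. min_obs is a hypothetical
    forced-exploration threshold, defaulting to sqrt(log r) (an assumption).
    """
    if min_obs is None:
        min_obs = math.sqrt(math.log(r))
    counts = [len(x) for x in rewards]
    means = [sum(x) / len(x) for x in rewards]
    # Leader: the arm with the most observations, ties broken by larger mean.
    leader = max(range(len(rewards)), key=lambda k: (counts[k], means[k]))
    n_lead = counts[leader]
    # Prefix sums give each subsample (window) mean of the leader in O(1).
    prefix = [0.0]
    for y in rewards[leader]:
        prefix.append(prefix[-1] + y)
    winners = []
    for k in range(len(rewards)):
        n_k = counts[k]
        # Only arms with strictly fewer observations than the leader challenge it.
        if k == leader or n_k >= n_lead:
            continue
        if n_k < min_obs:
            winners.append(k)  # too few observations: force exploration
            continue
        # Smallest mean over all windows of n_k consecutive leader observations.
        min_sub = min((prefix[t + n_k] - prefix[t]) / n_k
                      for t in range(n_lead - n_k + 1))
        # Arm k wins the challenge if its full-sample mean reaches that value.
        if means[k] >= min_sub:
            winners.append(k)
    # Play every winning challenger; if nobody wins, play the leader.
    return winners if winners else [leader]


def run_ssmc(probs, horizon, seed=0):
    """Simulate the sketch on Bernoulli arms and return per-arm pull counts."""
    rng = random.Random(seed)
    # Initialisation: play each arm once.
    rewards = [[float(rng.random() < p)] for p in probs]
    for r in range(len(probs) + 1, horizon + 1):
        for k in ssmc_choose(rewards, r):
            rewards[k].append(float(rng.random() < probs[k]))
    return [len(x) for x in rewards]


if __name__ == "__main__":
    print(run_ssmc([0.2, 0.5, 0.8], horizon=1000))
```

The point of comparing against the minimum window mean is recovery: an arm whose early sample mean is unluckily low still gets played once some stretch of the leader's history looks equally poor, so no parametric model of the rewards is needed to keep exploring.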

zbMATH Keywords

efficiency

KL-UCB

subsampling

Thompson sampling

upper confidence bound (UCB)

Krzysztof J. Szajowski

MaRDI profile type

MaRDI publication profile

Sample mean based index policies by O(log n) regret for the multi-armed bandit problem

Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space

Finite-time analysis of the multiarmed bandit problem

A Bernoulli Two-armed Bandit

Optimal learning and experimentation in bandit problems

Optimal adaptive policies for sequential allocation problems

Kullback-Leibler upper confidence bounds for optimal sequential allocation

Optimal stopping and dynamic allocation

Some Remarks on the Two-Armed Bandit

Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains

Adaptive treatment allocation and the multi-armed bandit problem

Asymptotically efficient adaptive allocation rules

A new approach to the design of reinforcement schemes for learning automata

Nonparametric bandit methods

Identifiers

DOI

10.1214/19-AOS1809
