Combining multiple strategies for multiarmed bandit problems and asymptotic optimality (Q892592): Difference between revisions

Summary: This brief paper provides a simple algorithm that, at each time step, selects one strategy from a given set of strategies for stochastic multiarmed bandit problems and plays the arm chosen by that strategy. The algorithm follows the idea of probabilistic \(\epsilon_t\)-switching from the \(\epsilon_t\)-greedy strategy and is asymptotically optimal in the sense that the selected strategy converges to the best one in the set, under some conditions on the strategies in the set and on the sequence \(\epsilon_t\).
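
The switching idea in the summary can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the schedule \(\epsilon_t = \min(1, c/t)\), the use of the empirical mean reward to rank strategies, and the names epsilon_schedule, combine_strategies and pull_arm are all assumptions made for illustration. Each strategy is modelled as a plain callable that maps the time step to an arm.

```python
import random


def epsilon_schedule(t, c=5.0):
    """Hypothetical decaying schedule: epsilon_t = min(1, c / t)."""
    return min(1.0, c / t)


def combine_strategies(strategies, pull_arm, horizon, seed=0):
    """Epsilon_t-switching over a set of strategies (illustrative only).

    strategies: list of callables, each mapping the time step t to an arm.
    pull_arm:   callable mapping an arm to an observed reward.
    Returns the index of the strategy chosen at each step and the
    empirical mean reward recorded for every strategy.
    """
    rng = random.Random(seed)
    n = len(strategies)
    counts = [0] * n    # how often each strategy was selected
    means = [0.0] * n   # running mean reward per strategy
    chosen = []
    for t in range(1, horizon + 1):
        if rng.random() < epsilon_schedule(t):
            # Explore: pick a strategy uniformly at random.
            s = rng.randrange(n)
        else:
            # Exploit: pick the strategy with the best empirical mean.
            s = max(range(n), key=lambda i: means[i])
        arm = strategies[s](t)      # the selected strategy chooses the arm
        reward = pull_arm(arm)
        counts[s] += 1
        means[s] += (reward - means[s]) / counts[s]  # incremental mean
        chosen.append(s)
    return chosen, means
```

With two constant strategies and arm rewards that favour the second one, the selection is expected to settle on the second strategy as \(\epsilon_t\) decays, which is the behaviour the summary calls convergence to the best strategy in the set.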

zbMATH Keywords

multiarmed bandit problems

asymptotic optimality

multiple strategies

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.1155/2015/264953

cites work

Online Learning Methods for Networking

Prediction, Learning, and Games

Some aspects of the sequential design of experiments

Q3329417

Randomised allocation of treatments in sequential trials

Finite-time analysis of the multiarmed bandit problem

Q4057976

The Nonstochastic Multiarmed Bandit Problem

Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

Combining expert advice in reactive environments

Identifiers

zbMATH Open document ID

1326.93115

DOI

10.1155/2015/264953

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:892592

@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1155/2015/264953
+Normal rank
@@ Property / OpenAlex ID @@
+W2010356817
@@ Property / OpenAlex ID: W2010356817 / rank @@
+Normal rank
@@ Property / cites work @@
+Online Learning Methods for Networking
@@ Property / cites work: Online Learning Methods for Networking / rank @@
+Normal rank
@@ Property / cites work @@
+Prediction, Learning, and Games
@@ Property / cites work: Prediction, Learning, and Games / rank @@
+Normal rank
@@ Property / cites work @@
+Some aspects of the sequential design of experiments
+Normal rank
@@ Property / cites work @@
+Q3329417
@@ Property / cites work: Q3329417 / rank @@
+Normal rank
@@ Property / cites work @@
+Randomised allocation of treatments in sequential trials
+Normal rank
@@ Property / cites work @@
+Finite-time analysis of the multiarmed bandit problem
+Normal rank
@@ Property / cites work @@
+Q4057976
@@ Property / cites work: Q4057976 / rank @@
+Normal rank
@@ Property / cites work @@
+The Nonstochastic Multiarmed Bandit Problem
@@ Property / cites work: The Nonstochastic Multiarmed Bandit Problem / rank @@
+Normal rank
@@ Property / cites work @@
+Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
+Normal rank
@@ Property / cites work @@
+Combining expert advice in reactive environments
@@ Property / cites work: Combining expert advice in reactive environments / rank @@
+Normal rank
@@ links / mardi / name / links / mardi / name @@
+Publication:892592