On monotone optimal decision rules and the stay-on-a-winner rule for the two-armed bandit (Q1821704)

Consider the following optimization problem: find a decision rule $\delta$ such that $w(x,\delta(x))=\max_{a}w(x,a)$ for all $x$, subject to the constraint $\delta(x)\in D(x)$. We give conditions for the existence of monotone optimal decision rules $\delta$. The term 'monotone' is used in a general sense. The well-known stay-on-a-winner rules for the two-armed bandit can be characterized as monotone decision rules by including the stage number in $x$ and using a special ordering on $x$. This enables us to give simple conditions for the existence of optimal rules that are stay-on-a-winner rules. We extend results of \textit{D. A. Berry} [Ann. Math. Stat. 43, 871-897 (1972; Zbl 0258.62013)] and \textit{D. Kalin} and \textit{R. Theodorescu} [Math. Operations-Forsch. Stat., Ser. Optimization 13, 469-472 (1982; Zbl 0505.90080)] to the case of dependent arms.
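
As a rough illustration of the optimization problem stated above (not taken from the paper), the following Python sketch computes an optimal decision rule on a small finite example and checks whether it is monotone. Everything specific here is an assumption made for the example: the integer state and action sets, the feasible sets $D(x)=\{0,\dots,x+1\}$, the reward $w(x,a)=xa-a^2$, and the helper names `optimal_rule` and `is_monotone`. The example relies on a standard sufficient condition (increasing differences of $w$ together with feasible sets that ascend in $x$), which may differ from the conditions given in the paper, and it reads $\max_a$ as a maximum over $a\in D(x)$.

```python
from typing import Callable, Dict, Iterable


def optimal_rule(
    states: Iterable[int],
    feasible: Callable[[int], Iterable[int]],
    w: Callable[[int, int], int],
) -> Dict[int, int]:
    """Return delta with w(x, delta(x)) = max of w(x, a) over a in D(x).

    Ties are broken by taking the largest maximizing action.
    """
    rule = {}
    for x in states:
        actions = list(feasible(x))
        best = max(w(x, a) for a in actions)
        rule[x] = max(a for a in actions if w(x, a) == best)
    return rule


def is_monotone(rule: Dict[int, int]) -> bool:
    """Check that delta is nondecreasing in the usual order on the states."""
    ordered = sorted(rule)
    return all(rule[x] <= rule[y] for x, y in zip(ordered, ordered[1:]))


# Hypothetical example data (assumptions, not from the paper).
states = range(6)


def feasible(x: int) -> range:
    # D(x) = {0, ..., x + 1}; these sets ascend with x.
    return range(0, x + 2)


def w(x: int, a: int) -> int:
    # w(x, a + 1) - w(x, a) = x - 2a - 1 is increasing in x,
    # so w has increasing differences.
    return x * a - a * a


delta = optimal_rule(states, feasible, w)
print(delta)
print(is_monotone(delta))
```

Breaking ties toward the largest maximizer is a deliberate choice in this sketch: under the assumed condition the largest maximizer is nondecreasing in $x$, whereas an arbitrary selection among tied actions need not be.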

zbMATH Keywords

existence of monotone optimal decision rules

stay-on-a-winner rules

two-armed bandit

MaRDI profile type

MaRDI publication profile

cites work

On the k-armed Bernoulli bandit: monotonicity of the total reward under an arbitrary prior distribution

A Bernoulli Two-armed Bandit

On Sequential Designs for Maximizing the Sum of $n$ Observations

On the Bernoulli two-armed bandit problem

Q3725880

A note on ‘monotone optimal policies for Markov decision processes’

Q4198358

A note on structural properties of the Bernoulli two-armed bandit problem

Some Concepts of Dependence

Minimizing a Submodular Function on a Lattice

Identifiers

zbMATH Open document ID

0616.90093

DOI

10.1007/BF01897828

Mathematics Subject Classification ID

90C40

zbMATH DE Number

3999703

Sitelinks

Mathematics (1 entry)

mardi Publication:1821704
