Two-stage bandits (Q1115072): Difference between revisions

Two stochastic processes, or ``arms'', that yield dichotomous responses are available for use in a two-stage decision problem. During the first stage, arms are chosen sequentially; the resulting observations are discounted by a fixed value \(\beta\). A single arm must be used in the second stage, in which observations are not discounted. The decision to end the first stage is based on the data obtained. Optimal strategies are considered in the presence of the random discount sequence that arises in this setting. This extends the work of \textit{D. A. Berry} and \textit{B. Fristedt} [Ann. Stat. 7, 1086-1105 (1979; Zbl 0415.62056)].
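The setup described in the review can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the strategy studied in the paper: it reads "discounted by a fixed value \(\beta\)" as giving every first-stage observation the same weight \(\beta\), it assumes a finite second stage of known length, and it uses uniform priors, an alternating sampling rule, and a threshold stopping rule. The names `simulate_two_stage`, `stage2_len`, `max_stage1`, and `gap` are hypothetical and chosen for illustration only.

```python
import random


def posterior_mean(successes, failures):
    # Mean of a Beta(1 + successes, 1 + failures) posterior (uniform prior assumed).
    return (successes + 1) / (successes + failures + 2)


def simulate_two_stage(p, beta, stage2_len, max_stage1, gap, rng):
    """Simulate one run and return the total weighted number of successes.

    p          -- true success probabilities of the two arms (hidden from the rule)
    beta       -- weight applied to each first-stage observation (assumed reading)
    stage2_len -- number of undiscounted second-stage observations (assumption)
    max_stage1 -- hypothetical cap on the length of the first stage
    gap        -- hypothetical stopping threshold on the posterior-mean difference
    """
    counts = [[0, 0], [0, 0]]  # counts[arm] = [successes, failures]
    total = 0.0

    # Stage 1: sample sequentially; the data decide when to stop.
    for _ in range(max_stage1):
        means = [posterior_mean(*counts[arm]) for arm in (0, 1)]
        if abs(means[0] - means[1]) >= gap:
            break  # heuristic stopping rule, not the optimal one
        # Heuristic: pull the arm observed less often so far (ties go to arm 0).
        arm = 0 if sum(counts[0]) <= sum(counts[1]) else 1
        success = rng.random() < p[arm]
        counts[arm][0 if success else 1] += 1
        total += beta * success  # first-stage observations are discounted

    # Stage 2: commit to a single arm; observations are not discounted.
    means = [posterior_mean(*counts[arm]) for arm in (0, 1)]
    best = 0 if means[0] >= means[1] else 1
    for _ in range(stage2_len):
        total += rng.random() < p[best]
    return total


# Example run with illustrative parameter values.
rng = random.Random(0)
payoff = simulate_two_stage((0.7, 0.4), beta=0.5, stage2_len=50,
                            max_stage1=30, gap=0.25, rng=rng)
```

Because the length of the first stage depends on the observed data, the sequence of weights (\(\beta\) repeated a data-dependent number of times, followed by ones) is itself random, which is one way to picture the random discount sequence mentioned in the review.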

zbMATH Keywords

two-stage bandit

sequential decisions

regular discounting

dichotomous responses

two-stage decision problem

Optimal strategies

random discount sequence

MaRDI profile type

MaRDI publication profile

full work available at URL

https://doi.org/10.1214/aos/1176350841

Identifiers

zbMATH Open document ID

0664.62081

DOI

10.1214/aos/1176350841

Mathematics Subject Classification ID

Sitelinks

Mathematics (1 entry)

mardi Publication:1115072

@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / full work available at URL @@
+https://doi.org/10.1214/aos/1176350841
+Normal rank
@@ Property / OpenAlex ID @@
+W1992595769
@@ Property / OpenAlex ID: W1992595769 / rank @@
+Normal rank