On monotone optimal decision rules and the stay-on-a-winner rule for the two-armed bandit (Q1821704)

From MaRDI portal
Property / MaRDI profile type: MaRDI publication profile
Property / cites work: On the k-armed Bernoulli bandit: monotonicity of the total reward under an arbitrary prior distribution
Property / cites work: A Bernoulli Two-armed Bandit
Property / cites work: On Sequential Designs for Maximizing the Sum of $n$ Observations
Property / cites work: On the Bernoulli two-armed bandit problem
Property / cites work: Q3725880
Property / cites work: A note on ‘monotone optimal policies for markov decision processes’
Property / cites work: Q4198358
Property / cites work: A note on structural properties of the Bernoulli two-armed bandit problem
Property / cites work: Some Concepts of Dependence
Property / cites work: Minimizing a Submodular Function on a Lattice

Latest revision as of 18:31, 17 June 2024

Language: English
Label: On monotone optimal decision rules and the stay-on-a-winner rule for the two-armed bandit
Description: scientific article
Also known as: (none listed)

    Statements

    On monotone optimal decision rules and the stay-on-a-winner rule for the two-armed bandit (English)
    1985
    Consider the following optimization problem: find a decision rule \(\delta\) such that \(w(x,\delta(x))=\max_{a}w(x,a)\) for all \(x\), subject to the constraint \(\delta(x)\in D(x)\). We give conditions for the existence of monotone optimal decision rules \(\delta\). The term 'monotone' is used in a general sense. The well-known stay-on-a-winner rules for the two-armed bandit can be characterized as monotone decision rules by including the stage number in \(x\) and using a special ordering on \(x\). This enables us to give simple conditions for the existence of optimal rules that are stay-on-a-winner rules. We extend results of \textit{D. A. Berry} [Ann. Math. Stat. 43, 871-897 (1972; Zbl 0258.62013)] and \textit{D. Kalin} and \textit{R. Theodorescu} [Math. Operations-Forsch. Stat., Ser. Optimization 13, 469-472 (1982; Zbl 0505.90080)] to the case of dependent arms.
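The optimization problem in the abstract can be illustrated with a small finite example. The sketch below is not taken from the article: the integer states and actions, the usual order on integers, the reward `w(x, a) = x*a - a*a`, the constraint sets `D(x)`, and the helper names `optimal_rule` and `is_monotone` are all assumptions chosen for illustration. The reward has increasing differences in `(x, a)` and the constraint sets grow with `x`, a standard sufficient condition under which the largest maximizer is nondecreasing in `x`. The article itself uses a more general notion of 'monotone'.

```python
from typing import Callable, Dict, List, Set


def optimal_rule(
    states: List[int],
    actions: List[int],
    w: Callable[[int, int], int],
    D: Callable[[int], Set[int]],
) -> Dict[int, int]:
    """Return delta with w(x, delta(x)) = max over a in D(x) of w(x, a).

    Ties are broken in favour of the largest maximizing action.
    """
    delta = {}
    for x in states:
        feasible = [a for a in actions if a in D(x)]
        best = max(w(x, a) for a in feasible)
        delta[x] = max(a for a in feasible if w(x, a) == best)
    return delta


def is_monotone(delta: Dict[int, int], states: List[int]) -> bool:
    """Check that delta is nondecreasing along the (sorted) list of states."""
    return all(delta[x] <= delta[y] for x, y in zip(states, states[1:]))


# Hypothetical toy instance (an assumption, not data from the article).
states = [0, 1, 2, 3, 4]
actions = [0, 1, 2, 3]


def w(x: int, a: int) -> int:
    # Increasing differences: w(x, a + 1) - w(x, a) grows with x.
    return x * a - a * a


def D(x: int) -> Set[int]:
    # Admissible actions; the sets expand as x increases.
    return set(range(0, min(x, 3) + 1))


delta = optimal_rule(states, actions, w, D)
print(delta)
print(is_monotone(delta, states))
```

Breaking ties toward the largest action matters here: when several actions are optimal, an arbitrary selection of maximizers need not be monotone even if a monotone optimal rule exists.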
    existence of monotone optimal decision rules
    stay-on-a-winner rules
    two-armed bandit

    Identifiers