Abstract: The early sections of this paper present an analysis of a Markov decision model that is known as the multi-armed bandit under the assumption that the utility function of the decision maker is either linear or exponential. The analysis includes efficient procedures for computing the expected utility associated with the use of a priority policy and for identifying a priority policy that is optimal. The methodology in these sections is novel, building on the use of elementary row operations. In the later sections of this paper, the analysis is adapted to accommodate constraints that link the bandits.
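The abstract mentions that the expected utility of a fixed priority policy can be computed efficiently with elementary row operations. As a hedged illustration (not the paper's actual procedure), the standard linear-algebra fact it builds on is that the expected discounted reward v of a fixed policy satisfies v = r + βPv, i.e. (I − βP)v = r, which Gaussian elimination, a sequence of elementary row operations, solves directly. The transition matrix, rewards, and discount factor below are made-up example data:

```python
# Illustrative sketch, not the paper's algorithm: the value vector v of a
# fixed (priority) policy in a discounted Markov chain satisfies
# (I - beta * P) v = r, solvable by elementary row operations.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):  # eliminate below the pivot
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def policy_value(P, r, beta):
    """Expected discounted reward of a fixed policy: (I - beta P) v = r."""
    n = len(r)
    A = [[(1.0 if i == j else 0.0) - beta * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, r)

# Hypothetical two-state chain: reward 1 in state 0, symmetric transitions.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
v = policy_value(P, r, beta=0.9)  # v = [5.5, 4.5]
```

This corresponds to the linear-utility case; the exponential-utility analysis in the paper requires a different recursion, which is not sketched here.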
Recommendations
Cites work
- scientific article; zbMATH DE number 4131489 (title unavailable)
- scientific article; zbMATH DE number 3687126 (title unavailable)
- scientific article; zbMATH DE number 3474804 (title unavailable)
- scientific article; zbMATH DE number 3638998 (title unavailable)
- scientific article; zbMATH DE number 1348599 (title unavailable)
- scientific article; zbMATH DE number 194374 (title unavailable)
- A Turnpike Theorem For A Risk-Sensitive Markov Decision Process with Stopping
- A \((2/3)n^{3}\) fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain
- A generalized Gittins index for a Markov chain and its recursive calculation
- A short proof of the Gittins index theorem
- Branching Bandit Processes
- Conservation Laws, Extended Polymatroids and Multiarmed Bandit Problems; A Polyhedral Approach to Indexable Systems
- Contraction Mappings in the Theory Underlying Dynamic Programming
- Discrete Dynamic Programming with Sensitive Discount Optimality Criteria
- Dynamic allocation problems in continuous time
- Extensions of the multiarmed bandit problem: The discounted case
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- Multi-armed bandits in discrete and continuous time
- On the Gittins index for multiarmed bandits
- Risk-Sensitive and Risk-Neutral Multiarmed Bandits
- Splitting randomized stationary policies in total-reward Markov decision processes
- The Multi-Armed Bandit Problem: Decomposition and Computation
- Why imitate, and if so, how? A boundedly rational approach to multi-armed bandits
Cited in (14)
- An asymptotically optimal strategy for constrained multi-armed bandit problems
- Four proofs of Gittins' multiarmed bandit theorem
- On the reduction of total-cost and average-cost MDPs to discounted mdps
- Robust control of the multi-armed bandit problem
- Constrained regret minimization for multi-criterion multi-armed bandits
- Multi-armed bandits with episode context
- Risk-Sensitive and Risk-Neutral Multiarmed Bandits
- Percentile optimization in multi-armed bandit problems
- Index policy for multiarmed bandit problem with dynamic risk measures
- Bandits with global convex constraints and objective
- Multi-armed bandits with censored consumption of resources
- Coordinating Pricing and Inventory Replenishment with Nonparametric Demand Learning
- Multi-armed bandits under general depreciation and commitment
- The Irrevocable Multiarmed Bandit Problem
This page was built for publication: The multi-armed bandit, with constraints
MaRDI item Q378726