The multi-armed bandit, with constraints
From MaRDI portal
Publication: 378726
DOI: 10.1007/s10479-012-1250-y · zbMATH Open: 1274.90470 · arXiv: 1203.4640 · OpenAlex: W1965106111 · MaRDI QID: Q378726
Authors: Eric V. Denardo, Eugene A. Feinberg, Uriel G. Rothblum
Publication date: 12 November 2013
Published in: Annals of Operations Research
Abstract: The early sections of this paper present an analysis of a Markov decision model that is known as the multi-armed bandit under the assumption that the utility function of the decision maker is either linear or exponential. The analysis includes efficient procedures for computing the expected utility associated with the use of a priority policy and for identifying a priority policy that is optimal. The methodology in these sections is novel, building on the use of elementary row operations. In the later sections of this paper, the analysis is adapted to accommodate constraints that link the bandits.
Full work available at URL: https://arxiv.org/abs/1203.4640
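The kind of computation the abstract refers to can be illustrated with the classical Gittins index for a discounted Markov reward chain, computed by the largest-remaining-index elimination scheme. This is a sketch of that well-known procedure, not the paper's own row-operation method; the function name `gittins_indices` and the toy chain are illustrative assumptions.

```python
import numpy as np

def gittins_indices(P, r, beta):
    """Gittins indices of a discounted Markov reward chain, computed by
    repeatedly indexing the best remaining state (illustrative sketch).

    P    : (n, n) transition matrix of the single arm
    r    : (n,)  reward vector
    beta : discount factor in (0, 1)
    """
    n = len(r)
    nu = np.zeros(n)
    cont = []                       # states already indexed (continuation set)
    remaining = list(range(n))
    while remaining:
        best_i, best_val = None, -np.inf
        for i in remaining:
            # Continue while in cont + {i}; stop on leaving this set.
            S = cont + [i]
            A = np.eye(len(S)) - beta * P[np.ix_(S, S)]
            V = np.linalg.solve(A, r[S])              # discounted reward until exit
            W = np.linalg.solve(A, np.ones(len(S)))   # discounted time until exit
            val = V[-1] / W[-1]                       # candidate i is last in S
            if val > best_val:
                best_i, best_val = i, val
        nu[best_i] = best_val       # next-largest index is the best candidate ratio
        cont.append(best_i)
        remaining.remove(best_i)
    return nu

# Toy two-state arm: state 1 feeds into the rewarding state 0.
P = np.array([[1.0, 0.0],
              [1.0, 0.0]])
r = np.array([1.0, 0.0])
print(gittins_indices(P, r, beta=0.5))   # -> [1.  0.5]
```

An optimal priority policy for independent arms then plays, at each step, the arm whose current state has the largest index; the paper extends such analyses to exponential utilities and to constraints linking the bandits.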
Recommendations
Cites Work
- On the Gittins index for multiarmed bandits
- A \((2/3)n^{3}\) fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain
- Title not available
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- Extensions of the multiarmed bandit problem: The discounted case
- Title not available
- Title not available
- Conservation Laws, Extended Polymatroids and Multiarmed Bandit Problems; A Polyhedral Approach to Indexable Systems
- Why imitate, and if so, how? A boundedly rational approach to multi-armed bandits
- Title not available
- Discrete Dynamic Programming with Sensitive Discount Optimality Criteria
- Splitting randomized stationary policies in total-reward Markov decision processes
- The Multi-Armed Bandit Problem: Decomposition and Computation
- Title not available
- Multi-armed bandits in discrete and continuous time
- A short proof of the Gittins index theorem
- Branching Bandit Processes
- Title not available
- Risk-Sensitive and Risk-Neutral Multiarmed Bandits
- A generalized Gittins index for a Markov chain and its recursive calculation
- Contraction Mappings in the Theory Underlying Dynamic Programming
- A Turnpike Theorem For A Risk-Sensitive Markov Decision Process with Stopping
- Dynamic allocation problems in continuous time
Cited In (14)
- An asymptotically optimal strategy for constrained multi-armed bandit problems
- Four proofs of Gittins' multiarmed bandit theorem
- On the reduction of total-cost and average-cost MDPs to discounted MDPs
- Robust control of the multi-armed bandit problem
- Constrained regret minimization for multi-criterion multi-armed bandits
- Multi-armed bandits with episode context
- Risk-Sensitive and Risk-Neutral Multiarmed Bandits
- Percentile optimization in multi-armed bandit problems
- Index policy for multiarmed bandit problem with dynamic risk measures
- Bandits with global convex constraints and objective
- Multi-armed bandits with censored consumption of resources
- Coordinating Pricing and Inventory Replenishment with Nonparametric Demand Learning
- Multi-armed bandits under general depreciation and commitment
- The Irrevocable Multiarmed Bandit Problem
This page was built for publication: The multi-armed bandit, with constraints