Multi‐Armed Bandit Allocation Indices

DOI: 10.1002/9780470980033 ⋮ zbMath: 1401.90257 ⋮ OpenAlex: W2499002200 ⋮ MaRDI QID: Q3083924

Richard R. Weber, Kevin D. Glazebrook, J. C. Gittins

Publication date: 24 March 2011

Full work available at URL: https://doi.org/10.1002/9780470980033

Search theory (90B40) ⋮ Deterministic scheduling theory in operations research (90B35) ⋮ Dynamic programming (90C39) ⋮ Markov and semi-Markov decision processes (90C40) ⋮ Performance evaluation, queueing, and scheduling in the context of computer systems (68M20) ⋮ Research exposition (monographs, survey articles) pertaining to operations research and mathematical programming (90-02) ⋮ Sequential statistical design (62L05)

Related Items (91)

Optimal stopping problems with restricted stopping times ⋮ Bayesian adaptive bandit-based designs using the Gittins index for multi-armed trials with normally distributed endpoints ⋮ A forwards induction approach to candidate drug selection ⋮ Conditions for indexability of restless bandits and an algorithm to compute Whittle index ⋮ Infomax strategies for an optimal balance between exploration and exploitation ⋮ Open Bandit Processes with Uncountable States and Time-Backward Effects ⋮ Minimizing the mean slowdown in a single-server queue ⋮ Incentivizing Exploration with Heterogeneous Value of Money ⋮ Control problems in online advertising and benefits of randomized bidding strategies ⋮ Four proofs of Gittins' multiarmed bandit theorem ⋮ Perspectives of approximate dynamic programming ⋮ Whittle index approach to size-aware scheduling for time-varying channels with multiple states ⋮ Bayesian Exploration: Incentivizing Exploration in Bayesian Games ⋮ Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs ⋮ Ameso optimization: a relaxation of discrete midpoint convexity ⋮ Kullback-Leibler upper confidence bounds for optimal sequential allocation ⋮ On the computation of Whittle's index for Markovian restless bandits ⋮ The multi-armed bandit, with constraints ⋮ Multi-machine preventive maintenance scheduling with imperfect interventions: a restless bandit approach ⋮ Optimal Learning for Stochastic Optimization with Nonlinear Parametric Belief Models ⋮ Multi-round cooperative search games with multiple players ⋮ Optimal activation of halting multi‐armed bandit models ⋮ Unnamed Item ⋮ Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems ⋮ On competitive analysis for polling systems ⋮ A novel statistical test for treatment differences in clinical trials using a response‐adaptive forward‐looking Gittins Index Rule ⋮ Topp-Leone distribution with an application to binomial sampling ⋮ On Submodular Search and Machine Scheduling ⋮ Unnamed Item ⋮ A foreground-background queueing model with speed or capacity modulation ⋮ Optimal dynamic resource allocation to prevent defaults ⋮ MULTI-ARMED BANDITS UNDER GENERAL DEPRECIATION AND COMMITMENT ⋮ ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS ⋮ ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT ⋮ Exponential asymptotic optimality of Whittle index policy ⋮ Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems ⋮ Consumer strategy, vendor strategy and equilibrium in duopoly markets with production costs ⋮ Open Problem—M/G/1 Scheduling with Preemption Delays ⋮ A confirmation of a conjecture on Feldman’s two-armed bandit problem ⋮ Optimal schedule of elective surgery operations subject to disruptions by emergencies ⋮ An adversarial model for scheduling with testing ⋮ r-extreme signalling for congestion control ⋮ A unified framework for stochastic optimization ⋮ A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model ⋮ A reinforcement learning approach to personalized learning recommendation systems ⋮ MYOPIC POLICIES FOR NON-PREEMPTIVE SCHEDULING OF JOBS WITH DECAYING VALUE ⋮ BANDIT STRATEGIES EVALUATED IN THE CONTEXT OF CLINICAL TRIALS IN RARE LIFE-THREATENING DISEASES ⋮ Optimal learning before choice ⋮ Adaptive Matching for Expert Systems with Uncertain Task Types ⋮ Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors ⋮ Algorithms for recursive delegation ⋮ Stochastic scheduling: a short history of index policies and new approaches to index generation for dynamic resource allocation ⋮ Locks, Bombs and Testing: The Case of Independent Locks ⋮ On Bayesian index policies for sequential resource allocation ⋮ Optimal switching between cash-flow streams ⋮ A linear-quadratic Gaussian approach to dynamic information acquisition ⋮ Unnamed Item ⋮ Unnamed Item ⋮ An online algorithm for the risk-aware restless bandit ⋮ Complete expected improvement converges to an optimal budget allocation ⋮ Adaptive policies for perimeter surveillance problems ⋮ An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits ⋮ Optimal learning with non-Gaussian rewards ⋮ On index policies for stochastic minsum scheduling ⋮ Learning to Optimize via Information-Directed Sampling ⋮ Asymptotically optimal index policies for an abandonment queue with convex holding cost ⋮ On the dynamic allocation of assets subject to failure ⋮ Open problems in queueing theory inspired by datacenter computing ⋮ Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and Indexability ⋮ Improvements and Generalizations of Stochastic Knapsack and Markovian Bandits Approximation Algorithms ⋮ Bayesian Exploration for Approximate Dynamic Programming ⋮ Bayesian Incentive-Compatible Bandit Exploration ⋮ Unnamed Item ⋮ Gittins Index for Simple Family of Markov Bandit Processes with Switching Cost and No Discounting ⋮ Technical Note—A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents ⋮ Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach ⋮ Matching While Learning ⋮ Unnamed Item ⋮ An asymptotically optimal strategy for constrained multi-armed bandit problems ⋮ Uncertainty in learning, choice, and visual fixation ⋮ From reinforcement learning to optimal control: a unified framework for sequential decisions ⋮ Reinforcement learning: an industrial perspective ⋮ On the Gittins index for multistage jobs ⋮ Unnamed Item ⋮ The pure exploration problem with general reward functions depending on full distributions ⋮ A General Theory of MultiArmed Bandit Processes with Constrained Arm Switches ⋮ Approximately optimal scheduling of an \(\mathrm{M}/\mathrm{G}/1\) queue with heavy tails ⋮ Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges ⋮ Optimal discrete search with technological choice ⋮ Whittle index based Q-learning for restless bandits with average reward ⋮ A Restless Bandit Model for Resource Allocation, Competition, and Reservation
