Extensions of the multiarmed bandit problem: The discounted case
From MaRDI portal
Publication:3682272
DOI: 10.1109/TAC.1985.1103989, zbMath: 0566.90096, MaRDI QID: Q3682272
Cagatay Buyukkoc, Jean Walrand, Pravin P. Varaiya
Publication date: 1985
Published in: IEEE Transactions on Automatic Control
optimal strategies, sequential design of experiments, semi-Markov models, stochastic resource allocation, Gittins indices, arrivals, bandit problems, complex decision structures
Deterministic scheduling theory in operations research (90B35) Adaptive control/observation systems (93C40) Dynamic programming (90C39) Markov and semi-Markov decision processes (90C40)
Related Items
A bisection/successive approximation method for computing Gittins indices ⋮ Optimal control of single-server queueing networks ⋮ Multi-armed bandit problem revisited ⋮ Stochastic scheduling and forwards induction ⋮ Open Bandit Processes with Uncountable States and Time-Backward Effects ⋮ Bandit and covariate processes, with finite or non-denumerable set of arms ⋮ Optimistic Gittins Indices ⋮ Simultaneous optimization of flow control and scheduling in a single server queue with two job classes ⋮ Simultaneous optimization of flow-control and scheduling in a single server queue with two job classes: Numerical results and approximation ⋮ Competing Markov decision processes ⋮ On Gittins' index theorem in continuous time ⋮ Resource capacity allocation to stochastic dynamic competitors: knapsack problem for perishable items and index-knapsack heuristic ⋮ Four proofs of Gittins' multiarmed bandit theorem ⋮ Stochastic scheduling of parallel queues with set-up costs ⋮ A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning ⋮ On an Optimal Stopping Problem for Multi-Parameter Diffusion Processes ⋮ The multi-armed bandit, with constraints ⋮ Derman's book as inspiration: some results on LP for MDPs ⋮ Sample path methods in the control of queues ⋮ The achievable region method in the optimal control of queueing systems; formulations, bounds and policies ⋮ Performance evaluation of scheduling control of queueing networks: Fluid model heuristics ⋮ Index policy for multiarmed bandit problem with dynamic risk measures ⋮ A perpetual search for talents across overlapping generations: a learning process ⋮ MULTI-ARMED BANDITS UNDER GENERAL DEPRECIATION AND COMMITMENT ⋮ Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems ⋮ Dynamic priority allocation via restless bandit marginal productivity indices ⋮ Flow time distributions in a \(K\) class \(M/G/1\) priority feedback queue ⋮ On the evaluation of strategies for branching bandit processes ⋮ Stochastic scheduling: a short history of index policies and new approaches to index generation for dynamic resource allocation ⋮ A generalized Gittins index for a Markov chain and its recursive calculation ⋮ Reading policies for joins: an asymptotic analysis ⋮ A comparative study of ad hoc techniques and evolutionary methods for multi-armed bandit problems ⋮ Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems ⋮ Optimal stopping for Brownian motion with applications to sequential analysis and option pricing ⋮ On the Gittins index in the M/G/1 queue ⋮ Stationary multi-choice bandit problems. ⋮ Independently Expiring Multiarmed Bandits ⋮ New results for generalized bandit problems ⋮ Tax problems in the undiscounted case ⋮ Survey of linear programming for standard and nonstandard Markovian control problems. Part II: Applications ⋮ A survey of Markov decision models for control of networks of queues ⋮ Optimal intensity control of a multi-class queue ⋮ Branching Bandit Processes ⋮ Dynamic allocation policies for the finite horizon one armed bandit problem ⋮ Multi-armed bandits in discrete and continuous time ⋮ A General Theory of MultiArmed Bandit Processes with Constrained Arm Switches ⋮ Discrete multiarmed bandits and multiparameter processes ⋮ Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges ⋮ Optimal stopping problems for multiarmed bandit processes with arms' independence