A General Theory of MultiArmed Bandit Processes with Constrained Arm Switches
Publication:5020738
DOI: 10.1137/19M1282386, zbMath: 1483.90092, arXiv: 1808.06314, MaRDI QID: Q5020738
Xianyi Wu, Wenqing Bao, Xiaoqiang Cai
Publication date: 7 January 2022
Published in: SIAM Journal on Control and Optimization
Full work available at URL: https://arxiv.org/abs/1808.06314
Gittins index; stochastic adaptive control; multiarmed bandit processes; machine learning/reinforcement learning; restricted stopping time
Related Items
Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
Cites Work
- Gittins indices in the dynamic allocation problem for diffusion processes
- Continuous multi-armed bandits and multiparameter processes
- Arm-acquiring bandits
- Multi-armed bandits in discrete and continuous time
- Discrete multiarmed bandits and multiparameter processes
- Dynamic allocation problems in continuous time
- Multi-armed bandit problem revisited
- Optimal stopping problems with restricted stopping times
- Optimal stochastic scheduling
- Consistency of Sequential Bayesian Sampling Policies
- Multi-Armed Bandit Allocation Indices
- Stochastic Scheduling Subject to Preemptive-Repeat Breakdowns with Incomplete Information
- On the Optimal Reward Function of the Continuous Time Multiarmed Bandit Problem
- Branching Bandit Processes
- Index policies for discounted bandit problems with availability constraints
- Extensions of the multiarmed bandit problem: The discounted case
- A Survey of Some Results in Stochastic Adaptive Control
- Continuous-time allocation indices and their discrete-time approximation
- Switching Costs and the Gittins Index
- Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic
- General Gittins index processes in discrete time
- The Continuum-Armed Bandit Problem
- Open Bandit Processes with Uncountable States and Time-Backward Effects
- Multi-Armed Bandits Under General Depreciation and Commitment
- Asymptotically Optimal Multi-Armed Bandit Policies Under a Cost Constraint
- Applications of Martingale System Theorems