An online algorithm for the risk-aware restless bandit

DOI: 10.1016/j.ejor.2020.08.028
zbMath: 1487.90634
OpenAlex: W3081812122
MaRDI QID: Q2029383

Publication date: 3 June 2021

Published in: European Journal of Operational Research

Full work available at URL: https://doi.org/10.1016/j.ejor.2020.08.028

zbMATH Keywords

Markov process; risk measure; online optimization; multi-armed bandit; risk-aware

Mathematics Subject Classification ID

Statistical methods; risk measures (91G70)
Markov and semi-Markov decision processes (90C40)

Related Items (1)

A tractable online learning algorithm for the multinomial logit contextual bandit

Cites Work

Unnamed Item (4 entries)
Regret bounds for restless Markov bandits
Risk-averse dynamic programming for Markov decision processes
Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process
General state space Markov chains and MCMC algorithms
Asymptotically efficient adaptive allocation rules
Adaptive treatment allocation and the multi-armed bandit problem
Optimal selection of obsolescence mitigation strategies using a restless bandit model
Concentration inequalities for Markov chains by Marton couplings and spectral methods
Risk Measures on P(R) and Value at Risk with Probability/Loss Function
Optimal Control of Markov Decision Processes With Linear Temporal Logic Constraints
Online Learning of Rested and Restless Bandits
Multi-Armed Bandit Allocation Indices
Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards
A Central Limit Theorem and Hypotheses Testing for Risk-averse Stochastic Programs
Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
Optimality of Myopic Sensing in Multichannel Opportunistic Access
Sequential Decision Making With Coherent Risk
Optimization of Convex Risk Functions
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Distributionally Robust Reward-Risk Ratio Optimization with Moment Constraints
Some aspects of the sequential design of experiments
On consistency of stochastic dominance and mean-semideviation models
Finite-time analysis of the multiarmed bandit problem
