Bandits with switching costs

From MaRDI portal
Publication:5259581

DOI: 10.1145/2591796.2591868
zbMATH Open: 1315.68207
arXiv: 1310.2997
OpenAlex: W2109690147
MaRDI QID: Q5259581
FDO: Q5259581

Jian Ding, Ofer Dekel, Yuval Peres, Tomer Koren

Publication date: 26 June 2015

Published in: Proceedings of the forty-sixth annual ACM symposium on Theory of computing

Abstract: We study the adversarial multi-armed bandit problem in a setting where the player incurs a unit cost each time he switches actions. We prove that the player's $T$-round minimax regret in this setting is $\widetilde{\Theta}(T^{2/3})$, thereby closing a fundamental gap in our understanding of learning with bandit feedback. In the corresponding full-information version of the problem, the minimax regret is known to grow at a much slower rate of $\Theta(\sqrt{T})$. The difference between these two rates provides the *first* indication that learning with bandit feedback can be significantly harder than learning with full-information feedback (previous results only showed a different dependence on the number of actions, but not on $T$). In addition to characterizing the inherent difficulty of the multi-armed bandit problem with switching costs, our results also resolve several other open problems in online learning. One direct implication is that learning with bandit feedback against bounded-memory adaptive adversaries has a minimax regret of $\widetilde{\Theta}(T^{2/3})$. Another implication is that the minimax regret of online learning in adversarial Markov decision processes (MDPs) is $\widetilde{\Theta}(T^{2/3})$. The key to all of our results is a new randomized construction of a multi-scale random walk, which is of independent interest and likely to prove useful in additional settings.
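The regret notion studied in the abstract can be made concrete with a short sketch: the player's cumulative loss plus a unit cost per action switch, minus the loss of the best fixed action in hindsight. This is an illustrative helper (the function name and loss encoding are assumptions, not from the paper):

```python
def regret_with_switching(losses, actions, switch_cost=1.0):
    """Switching-cost regret of a played action sequence.

    losses:  list of per-round loss vectors, losses[t][a] in [0, 1]
    actions: list of actions played, actions[t] in {0, ..., k-1}
    Returns cumulative played loss + switch_cost per action change,
    minus the cumulative loss of the best fixed action in hindsight.
    """
    T = len(actions)
    k = len(losses[0])
    played = sum(losses[t][actions[t]] for t in range(T))
    # A unit cost is incurred each time the action changes between rounds.
    switches = sum(1 for t in range(1, T) if actions[t] != actions[t - 1])
    best_fixed = min(sum(losses[t][a] for t in range(T)) for a in range(k))
    return played + switch_cost * switches - best_fixed
```

For example, with losses `[[0, 1], [1, 0], [0, 1]]` and the play `[0, 1, 0]`, the played loss is 0 but two switches are paid, while the best fixed arm (arm 0) suffers loss 1, giving regret 1. The paper's lower bound shows that against an adversarial loss sequence this quantity must grow as $\widetilde{\Theta}(T^{2/3})$ in the worst case.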


Full work available at URL: https://arxiv.org/abs/1310.2997






Cited In (4)

This page was built for publication: Bandits with switching costs