Bandits with switching costs, \(T^{2/3}\) regret
From MaRDI portal
Publication:5259581
- Learning and adaptive systems in artificial intelligence (68T05)
- Online algorithms; streaming algorithms (68W27)
- Randomized algorithms (68W20)
- Sums of independent random variables; random walks (60G50)
- Computational difficulty of problems (lower bounds, completeness, difficulty of approximation, etc.) (68Q17)
- Rationality and learning in game theory (91A26)
Abstract: We study the adversarial multi-armed bandit problem in a setting where the player incurs a unit cost each time he switches actions. We prove that the player's \(T\)-round minimax regret in this setting is \(\widetilde{\Theta}(T^{2/3})\), thereby closing a fundamental gap in our understanding of learning with bandit feedback. In the corresponding full-information version of the problem, the minimax regret is known to grow at a much slower rate of \(\Theta(\sqrt{T})\). The difference between these two rates provides the \emph{first} indication that learning with bandit feedback can be significantly harder than learning with full-information feedback (previous results only showed a different dependence on the number of actions, but not on \(T\)). In addition to characterizing the inherent difficulty of the multi-armed bandit problem with switching costs, our results also resolve several other open problems in online learning. One direct implication is that learning with bandit feedback against bounded-memory adaptive adversaries has a minimax regret of \(\widetilde{\Theta}(T^{2/3})\). Another implication is that the minimax regret of online learning in adversarial Markov decision processes (MDPs) is \(\widetilde{\Theta}(T^{2/3})\). The key to all of our results is a new randomized construction of a multi-scale random walk, which is of independent interest and likely to prove useful in additional settings.
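The matching \(\widetilde{O}(T^{2/3})\) upper bound comes from a standard mini-batching idea: run Exp3 over batches of roughly \(T^{1/3}\) rounds, playing one fixed arm per batch, so that the number of switches is bounded by the number of batches. The sketch below illustrates this batching strategy under stated assumptions; the function name `batched_exp3` and the loss-oracle interface `loss_fn(t, arm)` returning a value in \([0,1]\) are illustrative choices, not from the paper.

```python
import math
import random

def batched_exp3(T, k, loss_fn, batch=None):
    """Mini-batched Exp3 sketch: one arm per batch limits switches.

    With batch length ~ T**(1/3), the number of switches is at most
    T**(2/3), which is the regime where the paper's tight
    Theta~(T**(2/3)) bound lives.  `loss_fn(t, arm)` is an assumed
    oracle returning a loss in [0, 1].
    """
    if batch is None:
        batch = max(1, round(T ** (1 / 3)))
    n_batches = (T + batch - 1) // batch
    eta = math.sqrt(math.log(k) / (n_batches * k))  # Exp3 learning rate
    weights = [1.0] * k
    total_loss, switches, prev_arm, t = 0.0, 0, None, 0
    for _ in range(n_batches):
        z = sum(weights)
        probs = [w / z for w in weights]
        arm = random.choices(range(k), weights=probs)[0]
        if prev_arm is not None and arm != prev_arm:
            switches += 1  # unit switching cost incurred here
        prev_arm = arm
        # Play the same arm for the whole batch; only the averaged
        # loss of the chosen arm is observed (bandit feedback).
        batch_loss = 0.0
        for _ in range(batch):
            if t >= T:
                break
            batch_loss += loss_fn(t, arm)
            t += 1
        # Importance-weighted update on the batch-averaged loss.
        est = (batch_loss / batch) / probs[arm]
        weights[arm] *= math.exp(-eta * est)
        total_loss += batch_loss
    return total_loss, switches
```

Against an oblivious adversary this batching pays \(\widetilde{O}(T^{2/3})\) total (regret plus switching costs); the paper's contribution is the matching lower bound, built from the multi-scale random walk, showing no strategy can do better.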
Cited in (6)
- Average optimality in a Poissonian bandit with switching arms
- Chasing Ghosts: Competing with Stateful Policies
- Online learning over a finite action set with limited switching
- Multi-armed Bandits with Metric Switching Costs
- Constrained no-regret learning
- Sharp dichotomies for regret minimization in metric spaces