Tsallis-INF: an optimal algorithm for stochastic and adversarial bandits
From MaRDI portal
Publication:4998901
Authors: Julian Zimmert, Yevgeny Seldin
Publication date: 9 July 2021
Full work available at URL: https://arxiv.org/abs/1807.07623
Keywords
- stochastic
- online learning
- Tsallis entropy
- multi-armed bandits
- bandits
- adversarial
- best of both worlds
- i.i.d.
- online mirror descent
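As context for the keywords above: the algorithm named in the title combines online mirror descent over the probability simplex with a Tsallis-entropy regularizer. The following is a schematic sketch only, with notation assumed rather than quoted from this record ($K$ arms, cumulative importance-weighted loss estimates $\hat L_{t-1}$, learning rate $\eta_t$, Tsallis power $\alpha \in (0,1)$):

```latex
% Sketch of an online-mirror-descent step with Tsallis-entropy
% regularization (assumed notation, not quoted from the paper):
w_{t} = \arg\min_{w \in \Delta_{K-1}}
        \left\{ \langle w, \hat{L}_{t-1} \rangle + \Psi_t(w) \right\},
\qquad
\Psi_t(w) = -\frac{1}{\eta_t} \sum_{i=1}^{K}
            \frac{w_i^{\alpha}}{\alpha\,(1-\alpha)} .
```

For $\alpha = 1/2$ the regularizer is proportional to $-\sum_i \sqrt{w_i}$, the choice the "best of both worlds" keyword refers to: a single algorithm with near-optimal regret in both the stochastic (i.i.d.) and adversarial regimes.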
Cites Work
- Elements of Information Theory
- Prediction, Learning, and Games
- Asymptotically efficient adaptive allocation rules
- Possible generalization of Boltzmann-Gibbs statistics
- The Nonstochastic Multiarmed Bandit Problem
- Some aspects of the sequential design of experiments
- Finite-time analysis of the multiarmed bandit problem
- Regret bounds and minimax policies under partial monitoring
- Online learning and online convex optimization
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Thompson sampling: an asymptotically optimal finite-time analysis
- Stochastic bandits robust to adversarial corruptions
- A generalized online mirror descent with applications to classification and regression
- Perturbation techniques in online learning and optimization
- Multi-player bandits revisited
Cited In (6)
- Interior-Point Methods for Full-Information and Bandit Online Learning
- Title not available
- Relaxing the i.i.d. assumption: adaptively minimax optimal regret via root-entropic regularization
- Improved regret for zeroth-order adversarial bandit convex optimisation
- Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits
- Online team formation under different synergies