A note on the price of bandit feedback for mistake-bounded online learning
From MaRDI portal
Publication:2034409
Abstract: The standard model and the bandit model are two generalizations of the mistake-bound model to online multiclass classification. In both models the learner guesses a classification in each round, but in the standard model the learner receives the correct classification after each guess, while in the bandit model the learner is only told whether or not their guess is correct in each round. For any set $F$ of multiclass classifiers, define $\mathrm{opt}_{std}(F)$ and $\mathrm{opt}_{bandit}(F)$ to be the optimal worst-case number of prediction mistakes in the standard and bandit models respectively. Long (Theoretical Computer Science, 2020) claimed that for all $\delta \in (0,1)$ and infinitely many $k$, there exists a set $F$ of functions from a set $X$ to a set $Y$ of size $k$ such that $\mathrm{opt}_{std}(F) = 1$ and $\mathrm{opt}_{bandit}(F) \ge (1-\delta)\, k \ln k$. The proof of this result depended on the following lemma, which is false e.g. for all prime $p \ge 5$, $s = \mathbf{1}$ (the all-$1$s vector), $t = \mathbf{2}$ (the all-$2$s vector), and all $z$. Lemma: Fix $n \ge 2$ and prime $p$, and let $u$ be chosen uniformly at random from $\{0, \dots, p-1\}^n$. For any $s, t \in \{1, \dots, p-1\}^n$ with $s \ne t$ and for any $z \in \{0, \dots, p-1\}$, we have $\Pr(t \cdot u \equiv z \pmod{p} \mid s \cdot u \equiv z \pmod{p}) = \frac{1}{p}$. We show that this lemma is false precisely when $s$ and $t$ are multiples of each other mod $p$. Then, using a new lemma, we fix Long's proof.
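The counterexample in the abstract can be checked exhaustively for a small prime. The sketch below (not from the paper; parameter choices are illustrative) enumerates all $u \in \{0,\dots,p-1\}^n$ and computes the conditional probability directly: when $t = 2s$ the probability is $1$ for $z = 0$ and $0$ otherwise, never $1/p$, while a pair that are not multiples of each other mod $p$ gives exactly $1/p$.

```python
from itertools import product
from fractions import Fraction

def cond_prob(p, n, s, t, z):
    """Exact Pr(t·u ≡ z (mod p) | s·u ≡ z (mod p)) for u uniform on {0,...,p-1}^n."""
    cond = both = 0
    for u in product(range(p), repeat=n):
        if sum(si * ui for si, ui in zip(s, u)) % p == z:
            cond += 1  # u satisfies the conditioning event s·u ≡ z
            if sum(ti * ui for ti, ui in zip(t, u)) % p == z:
                both += 1  # u also satisfies t·u ≡ z
    return Fraction(both, cond)

p, n = 5, 2
s = (1,) * n  # the all-1s vector
t = (2,) * n  # the all-2s vector: t = 2s, a multiple of s mod p
for z in range(p):
    # prints 1 for z = 0 and 0 for z != 0, never 1/p
    print(z, cond_prob(p, n, s, t, z))

# a pair that are not multiples of each other mod p does give 1/p:
print(cond_prob(p, n, (1, 1), (1, 2), 0))  # 1/5
```

The `Fraction` return value keeps the probabilities exact, so the comparison against $1/p$ involves no floating-point tolerance.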
Recommendations
- New bounds on the price of bandit feedback for mistake-bounded online multiclass learning
- Sharp bounds on the price of bandit feedback for several models of mistake-bounded online learning
- Mistake bounds on the noise-free multi-armed bandit game
- On-line learning with linear loss constraints.
Cites work
- scientific article; zbMATH DE number 3048068
- scientific article; zbMATH DE number 3106671
- New bounds on the price of bandit feedback for mistake-bounded online multiclass learning
- On the complexity of function learning
- Pairwise independence and derandomization.
- Structural results about on-line learning models with and without queries
Cited in (3)