A note on the price of bandit feedback for mistake-bounded online learning

From MaRDI portal
Publication:2034409

DOI10.1016/J.TCS.2021.05.009zbMATH Open1504.68085arXiv2101.06891OpenAlexW3160023654MaRDI QIDQ2034409FDOQ2034409


Authors: Jesse Geneson Edit this on Wikidata


Publication date: 22 June 2021

Published in: Theoretical Computer Science (Search for Journal in Brave)

Abstract: The standard model and the bandit model are two generalizations of the mistake-bound model to online multiclass classification. In both models the learner guesses a classification in each round, but in the standard model the learner recieves the correct classification after each guess, while in the bandit model the learner is only told whether or not their guess is correct in each round. For any set F of multiclass classifiers, define optstd(F) and optbandit(F) to be the optimal worst-case number of prediction mistakes in the standard and bandit models respectively. Long (Theoretical Computer Science, 2020) claimed that for all M>2 and infinitely many k, there exists a set F of functions from a set X to a set Y of size k such that optstd(F)=M and optbandit(F)ge(1o(1))(|Y|ln|Y|)optstd(F). The proof of this result depended on the following lemma, which is false e.g. for all prime pge5, s=mathbf1 (the all 1 vector), t=mathbf2 (the all 2 vector), and all z. Lemma: Fix nge2 and prime p, and let u be chosen uniformly at random from left0,dots,p1ightn. For any s,tinleft1,dots,p1ightn with seqt and for any zinleft0,dots,p1ight, we have Pr(tcdotu=zmodpext|extscdotu=zmodp)=frac1p. We show that this lemma is false precisely when s and t are multiples of each other mod p. Then using a new lemma, we fix Long's proof.


Full work available at URL: https://arxiv.org/abs/2101.06891




Recommendations




Cites Work


Cited In (3)





This page was built for publication: A note on the price of bandit feedback for mistake-bounded online learning

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2034409)