Good arm identification via bandit feedback

DOI: 10.1007/S10994-019-05784-4
zbMATH Open: 1491.68160
arXiv: 1710.06360
OpenAlex: W2962902250
Wikidata: Q128264264
Scholia: Q128264264
MaRDI QID: Q2425222
FDO: Q2425222

Atsuyoshi Nakamura, Masashi Sugiyama, Kentaro Matsuura, Kentaro Sakamaki, Hideaki Kano, Junya Honda

Publication date: 26 June 2019

Published in: Machine Learning

Abstract: We consider a novel stochastic multi-armed bandit problem called \textit{good arm identification} (GAI), where a good arm is defined as an arm whose expected reward is greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeats a process of outputting an arm as soon as it is identified as a good one, before confirming that the other arms are actually not good. The objective of GAI is to minimize the number of samples for each process. We find that GAI faces a new kind of dilemma, the \textit{exploration-exploitation dilemma of confidence}, a difficulty different from that of best arm identification. As a result, an efficient design of algorithms for GAI is quite different from that for best arm identification. We derive a lower bound on the sample complexity of GAI that is tight up to the logarithmic factor $\mathrm{O}(\log\frac{1}{\delta})$ for acceptance error rate $\delta$. We also develop an algorithm whose sample complexity almost matches the lower bound. We further confirm experimentally that our proposed algorithm outperforms naive algorithms in synthetic settings based on a conventional bandit problem and on clinical trial research for rheumatoid arthritis.
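The setting described in the abstract is easy to simulate. Below is a minimal sketch of a GAI-style loop under assumptions of our own: rewards in [0, 1], a Hoeffding-style anytime confidence radius, and a sampling rule that pulls the undetermined arm with the largest upper confidence bound. An arm is output as good the moment its lower confidence bound clears the threshold, mirroring the "output as soon as identified" protocol above; this is an illustrative sketch, not the algorithm analyzed in the paper, and all names and bound constants are hypothetical choices.

```python
import math
import random

def good_arm_identification(arms, threshold, delta, max_pulls=100_000):
    """Classify each arm as good (mean >= threshold) or bad via
    confidence bounds, outputting good arms as soon as identified.
    Hypothetical sketch; not the paper's algorithm."""
    k = len(arms)
    n = [0] * k          # pull counts per arm
    s = [0.0] * k        # reward sums per arm
    undetermined = set(range(k))
    good = []
    t = 0

    def radius(i):
        # Hoeffding-style anytime confidence radius; the exact constants
        # are an illustrative assumption, not taken from the paper.
        return math.sqrt(math.log(4 * k * n[i] ** 2 / delta) / (2 * n[i]))

    for i in range(k):   # pull every arm once so the bounds are defined
        s[i] += arms[i]()
        n[i] += 1
        t += 1

    while undetermined and t < max_pulls:
        # Sample the undetermined arm with the largest upper bound.
        i = max(undetermined, key=lambda j: s[j] / n[j] + radius(j))
        s[i] += arms[i]()
        n[i] += 1
        t += 1
        mean, rad = s[i] / n[i], radius(i)
        if mean - rad >= threshold:      # confidently good: output immediately
            good.append(i)
            undetermined.discard(i)
        elif mean + rad < threshold:     # confidently bad: stop sampling it
            undetermined.discard(i)
    return good, t

# Example: three Bernoulli arms with means 0.8, 0.6, 0.4 and threshold 0.5.
arms = [lambda p=p: float(random.random() < p) for p in (0.8, 0.6, 0.4)]
print(good_arm_identification(arms, threshold=0.5, delta=0.05))
```

The early-output behavior is what distinguishes GAI from best arm identification here: a good arm is reported immediately, while sampling continues only on the arms whose status is still uncertain.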


Full work available at URL: https://arxiv.org/abs/1710.06360





Cites Work


Cited In (3)





