Good arm identification via bandit feedback
From MaRDI portal
Publication:2425222
Abstract: We consider a novel stochastic multi-armed bandit problem called good arm identification (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeatedly outputs an arm as soon as it is identified as good, before confirming that the remaining arms are not good. The objective of GAI is to minimize the number of samples for each output. We find that GAI faces a new kind of dilemma, the exploration-exploitation dilemma of confidence, whose difficulty differs from that of best arm identification. As a result, an efficient algorithm design for GAI is quite different from that for best arm identification. We derive a lower bound on the sample complexity of GAI that is tight up to a logarithmic factor in the acceptance error rate, and we develop an algorithm whose sample complexity almost matches this lower bound. We also confirm experimentally that our proposed algorithm outperforms naive algorithms in synthetic settings based on a conventional bandit problem and on clinical trial research for rheumatoid arthritis.
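The GAI setting described in the abstract can be illustrated with a short sketch: sample arms adaptively, output an arm as soon as its lower confidence bound clears the threshold, and stop sampling an arm once its upper confidence bound falls below it. This is a simplified, hedged illustration using anytime Hoeffding-style confidence bounds for rewards in [0, 1]; the function name, the particular confidence radius, and the UCB-based sampling rule are assumptions for exposition, not the paper's exact algorithm.

```python
import math
import random


def identify_good_arms(pull, n_arms, threshold, delta, max_samples=100_000):
    """Sketch of threshold-based good arm identification.

    pull(a) returns a reward in [0, 1] for arm a.  An arm is output as
    "good" once its lower confidence bound reaches the threshold, and
    retired once its upper confidence bound drops below the threshold.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    active = set(range(n_arms))
    good, total = [], 0

    def radius(n):
        # anytime Hoeffding-style confidence radius (assumed form)
        return math.sqrt(math.log(4 * n_arms * n * n / delta) / (2 * n))

    def ucb(i):
        # unsampled arms get an infinite upper bound so each is tried once
        if counts[i] == 0:
            return float("inf")
        return sums[i] / counts[i] + radius(counts[i])

    while active and total < max_samples:
        a = max(active, key=ucb)  # sample the most promising active arm
        counts[a] += 1
        sums[a] += pull(a)
        total += 1
        mean, rad = sums[a] / counts[a], radius(counts[a])
        if mean - rad >= threshold:    # confidently good: output now
            good.append(a)
            active.discard(a)
        elif mean + rad < threshold:   # confidently bad: stop sampling
            active.discard(a)
    return good, total
```

For example, with Bernoulli arms of means 0.8, 0.5, 0.2 and threshold 0.6, the sketch outputs the first arm as good and retires the other two, without waiting to certify all arms before the first output — the point of the GAI objective.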
Recommendations
- On the complexity of best-arm identification in multi-armed bandit models
- Lower bounds on the sample complexity of exploration in the multi-armed bandit problem
- Pure exploration in infinitely-armed bandit models with fixed-confidence
- The sample complexity of exploration in the multi-armed bandit problem
- Multi-armed bandits with simple arms
Cites work
- A procedure for selecting a subset of size m containing the l best of k independent normal populations, with applications to simulation
- Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems
- Asymptotically efficient adaptive allocation rules
- Finite-time analysis of the multiarmed bandit problem
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- On the complexity of best-arm identification in multi-armed bandit models
- Reinforcement learning. An introduction
Cited in (8)
- On the complexity of best-arm identification in multi-armed bandit models
- Pure exploration in infinitely-armed bandit models with fixed-confidence
- Best arm identification for contaminated bandits
- Best arm identification in generalized linear bandits
- Secure best arm identification in multi-armed bandits
- A bad arm existence checking problem: how to utilize asymmetric problem structure?
- A PAC algorithm in relative precision for bandit problem with costly sampling
- Finding a good normal population
This page was built for publication: Good arm identification via bandit feedback