An asymptotically optimal strategy for constrained multi-armed bandit problems

From MaRDI portal
Publication: 784789

DOI: 10.1007/S00186-019-00697-3 · zbMATH Open: 1447.90022 · arXiv: 1805.01237 · OpenAlex: W2997070617 · Wikidata: Q126414170 · MaRDI QID: Q784789


Author: Hyeong Soo Chang


Publication date: 3 August 2020

Published in: Mathematical Methods of Operations Research

Abstract: For a stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the ε_t-greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as the time t goes to infinity. A particular example sequence of {ε_t}, having an asymptotic convergence rate on the order of (1 − 1/t)^4 that holds from a sufficiently large t, is also discussed.
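The abstract builds on the classical ε_t-greedy strategy: at each step t, explore a uniformly random arm with a decaying probability ε_t and otherwise exploit the empirically best arm. The sketch below shows only this classical, unconstrained version; the paper's actual strategy additionally tracks feasibility estimates for the constrained model, which is omitted here. The schedule constants `c` and `d` and all function names are illustrative assumptions, not taken from the paper.

```python
import random

def epsilon_t(t, n_arms, c=5.0, d=0.5):
    """Decaying exploration schedule eps_t = min(1, c*K / (d^2 * t)).
    c and d are tuning constants of the classical schedule; assumed here."""
    return min(1.0, c * n_arms / (d * d * t))

def eps_greedy_run(arm_means, horizon=5000, seed=1):
    """Classical (unconstrained) eps_t-greedy on Bernoulli arms.

    Illustrative only: the paper extends this strategy to a constrained
    MAB model; this sketch keeps just the reward side."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k       # pulls per arm
    means = [0.0] * k      # running empirical mean reward per arm
    for t in range(1, horizon + 1):
        if rng.random() < epsilon_t(t, k):
            arm = rng.randrange(k)                        # explore uniformly
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit best estimate
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means
```

With a suitably decaying ε_t the exploration probability vanishes while every arm is still sampled infinitely often, which is what makes asymptotic optimality arguments of the kind stated in the abstract possible.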


Full work available at URL: https://arxiv.org/abs/1805.01237





Cited In (17)





This page was built for publication: An asymptotically optimal strategy for constrained multi-armed bandit problems
