An asymptotically optimal strategy for constrained multi-armed bandit problems

From MaRDI portal

Abstract: For the stochastic multi-armed bandit (MAB) problem in a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the ε_t-greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as the time t goes to infinity. A particular example sequence of ε_t achieving an asymptotic convergence rate on the order of (1 − 1/t)^4 for all sufficiently large t is also discussed.
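The abstracted strategy can be illustrated with a short, hedged sketch: at each step t, explore a uniformly random arm with probability ε_t and otherwise exploit the empirically best arm among those whose estimated cost satisfies the constraint. Everything below is an illustrative assumption, not the paper's exact construction: Bernoulli rewards and costs, a feasibility constraint of the form "mean cost ≤ cost_limit", the schedule ε_t = min(1, c/t), and the fallback to the least-cost arm when no arm looks feasible are all choices made for this sketch (the paper's example ε_t sequence, which yields the (1 − 1/t)^4 rate, is not reproduced here).

```python
import random

def run_constrained_eps_greedy(reward_means, cost_means, cost_limit,
                               horizon=5000, c=5.0, seed=0):
    """Simulate an epsilon_t-greedy strategy on a constrained bandit.

    Hypothetical setup for illustration: each arm i yields Bernoulli
    rewards/costs with the given means.  An arm is treated as feasible
    when its empirical mean cost is <= cost_limit.  At step t the
    strategy explores uniformly with probability eps_t = min(1, c/t)
    (an assumed schedule, not the paper's example sequence) and
    otherwise plays the empirically best feasible arm, falling back to
    the arm of least empirical cost if none looks feasible.
    Returns the pull count of each arm.
    """
    rng = random.Random(seed)
    k = len(reward_means)
    counts = [0] * k
    reward_sums = [0.0] * k
    cost_sums = [0.0] * k

    for t in range(1, horizon + 1):
        eps_t = min(1.0, c / t)
        if rng.random() < eps_t:
            arm = rng.randrange(k)          # explore: uniform random arm
        else:
            feasible = [i for i in range(k) if counts[i] > 0
                        and cost_sums[i] / counts[i] <= cost_limit]
            if feasible:                    # exploit: best near-feasible arm
                arm = max(feasible,
                          key=lambda i: reward_sums[i] / counts[i])
            else:                           # fallback: least empirical cost
                arm = min(range(k),
                          key=lambda i: (cost_sums[i] / counts[i]
                                         if counts[i] else 0.0))
        counts[arm] += 1
        reward_sums[arm] += 1.0 if rng.random() < reward_means[arm] else 0.0
        cost_sums[arm] += 1.0 if rng.random() < cost_means[arm] else 0.0
    return counts

# Example: arm 1 has the highest reward but violates the cost
# constraint; the strategy should concentrate on a feasible arm.
counts = run_constrained_eps_greedy(reward_means=[0.6, 0.9, 0.4],
                                    cost_means=[0.2, 0.8, 0.3],
                                    cost_limit=0.5)
```

As exploration decays with ε_t, the pull counts concentrate on the empirically best feasible arm, which is the mechanism behind the probability-of-correct-selection bound described in the abstract.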


MaRDI item: Q784789