An asymptotically optimal strategy for constrained multi-armed bandit problems

From MaRDI portal

Abstract: For the stochastic multi-armed bandit (MAB) problem in a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the ε_t-greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as the time t goes to infinity. A particular example sequence of ε_t achieving an asymptotic convergence rate on the order of (1 − 1/t)^4 for all sufficiently large t is also discussed.
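The abstracted strategy can be illustrated with a short, hedged sketch: at each step t, explore a uniformly random arm with probability ε_t and otherwise exploit the empirically best arm among those whose estimated cost satisfies the constraint. Everything below is an illustrative assumption, not the paper's exact construction: Bernoulli rewards and costs, a feasibility constraint of the form "mean cost ≤ cost_limit", the schedule ε_t = min(1, c/t), and the fallback to the least-cost arm when no arm looks feasible are all choices made for this sketch (the paper's example ε_t sequence, which yields the (1 − 1/t)^4 rate, is not reproduced here).

```python
import random

def run_constrained_eps_greedy(reward_means, cost_means, cost_limit,
                               horizon=5000, c=5.0, seed=0):
    """Simulate an epsilon_t-greedy strategy on a constrained bandit.

    Hypothetical setup for illustration: each arm i yields Bernoulli
    rewards/costs with the given means.  An arm is treated as feasible
    when its empirical mean cost is <= cost_limit.  At step t the
    strategy explores uniformly with probability eps_t = min(1, c/t)
    (an assumed schedule, not the paper's example sequence) and
    otherwise plays the empirically best feasible arm, falling back to
    the arm of least empirical cost if none looks feasible.
    Returns the pull count of each arm.
    """
    rng = random.Random(seed)
    k = len(reward_means)
    counts = [0] * k
    reward_sums = [0.0] * k
    cost_sums = [0.0] * k

    for t in range(1, horizon + 1):
        eps_t = min(1.0, c / t)
        if rng.random() < eps_t:
            arm = rng.randrange(k)          # explore: uniform random arm
        else:
            feasible = [i for i in range(k) if counts[i] > 0
                        and cost_sums[i] / counts[i] <= cost_limit]
            if feasible:                    # exploit: best near-feasible arm
                arm = max(feasible,
                          key=lambda i: reward_sums[i] / counts[i])
            else:                           # fallback: least empirical cost
                arm = min(range(k),
                          key=lambda i: (cost_sums[i] / counts[i]
                                         if counts[i] else 0.0))
        counts[arm] += 1
        reward_sums[arm] += 1.0 if rng.random() < reward_means[arm] else 0.0
        cost_sums[arm] += 1.0 if rng.random() < cost_means[arm] else 0.0
    return counts

# Example: arm 1 has the highest reward but violates the cost
# constraint; the strategy should concentrate on a feasible arm.
counts = run_constrained_eps_greedy(reward_means=[0.6, 0.9, 0.4],
                                    cost_means=[0.2, 0.8, 0.3],
                                    cost_limit=0.5)
```

As exploration decays with ε_t, the pull counts concentrate on the empirically best feasible arm, which is the mechanism behind the probability-of-correct-selection bound described in the abstract.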


MaRDI item: Q784789