An asymptotically optimal strategy for constrained multi-armed bandit problems

From MaRDI portal
Publication: 784789

DOI: 10.1007/S00186-019-00697-3 · zbMATH Open: 1447.90022 · arXiv: 1805.01237 · OpenAlex: W2997070617 · Wikidata: Q126414170 · MaRDI QID: Q784789


Author: Hyeong Soo Chang


Publication date: 3 August 2020

Published in: Mathematical Methods of Operations Research

Abstract: For a stochastic multi-armed bandit (MAB) problem from a constrained model that generalizes the classical one, we show that asymptotic optimality is achievable by a simple strategy extended from the ε_t-greedy strategy. We provide a finite-time lower bound on the probability of correct selection of an optimal near-feasible arm that holds for all time steps. Under some conditions, the bound approaches one as the time t goes to infinity. A particular example sequence of {ε_t}, having an asymptotic convergence rate on the order of (1 − 1/t)^4 that holds from a sufficiently large t, is also discussed.
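The abstract builds on the classical ε_t-greedy strategy: at each step t, explore a uniformly random arm with a decaying probability ε_t and otherwise exploit the empirically best arm. The sketch below shows only this classical, unconstrained version; the paper's actual strategy additionally tracks feasibility estimates for the constrained model, which is omitted here. The schedule constants `c` and `d` and all function names are illustrative assumptions, not taken from the paper.

```python
import random

def epsilon_t(t, n_arms, c=5.0, d=0.5):
    """Decaying exploration schedule eps_t = min(1, c*K / (d^2 * t)).
    c and d are tuning constants of the classical schedule; assumed here."""
    return min(1.0, c * n_arms / (d * d * t))

def eps_greedy_run(arm_means, horizon=5000, seed=1):
    """Classical (unconstrained) eps_t-greedy on Bernoulli arms.

    Illustrative only: the paper extends this strategy to a constrained
    MAB model; this sketch keeps just the reward side."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k       # pulls per arm
    means = [0.0] * k      # running empirical mean reward per arm
    for t in range(1, horizon + 1):
        if rng.random() < epsilon_t(t, k):
            arm = rng.randrange(k)                        # explore uniformly
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit best estimate
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means
```

With a suitably decaying ε_t the exploration probability vanishes while every arm is still sampled infinitely often, which is what makes asymptotic optimality arguments of the kind stated in the abstract possible.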


Full work available at URL: https://arxiv.org/abs/1805.01237





Cited In (17)





This page was built for publication: An asymptotically optimal strategy for constrained multi-armed bandit problems
