An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem
From MaRDI portal
Publication:1785430
DOI: 10.1016/j.orl.2015.08.008
zbMATH: 1408.91052
OpenAlex: W1640961991
MaRDI QID: Q1785430
Ryo Watanabe, Mineichi Kudo, Atsuyoshi Nakamura
Publication date: 28 September 2018
Published in: Operations Research Letters
Full work available at URL: https://doi.org/10.1016/j.orl.2015.08.008
Performance evaluation, queueing, and scheduling in the context of computer systems (68M20)
Probabilistic games; gambling (91A60)
Matching models (91B68)
Cites Work