An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem

From MaRDI portal

Publication:1785430


DOI: 10.1016/j.orl.2015.08.008, zbMath: 1408.91052, OpenAlex: W1640961991, MaRDI QID: Q1785430

Ryo Watanabe, Mineichi Kudo, Atsuyoshi Nakamura

Publication date: 28 September 2018

Published in: Operations Research Letters

Full work available at URL: https://doi.org/10.1016/j.orl.2015.08.008

zbMATH Keywords

matching, online learning, multi-armed bandit problem, combinatorial bandit, regret analysis

Mathematics Subject Classification ID

Performance evaluation, queueing, and scheduling in the context of computer systems (68M20)
Probabilistic games; gambling (91A60)
Matching models (91B68)

