An asymptotically optimal policy for finite support models in the multiarmed bandit problem

Publication:415624

DOI: 10.1007/s10994-011-5257-4; zbMATH: 1237.91037; arXiv: 0905.2776; OpenAlex: W2131958277; Wikidata: Q56675674; Scholia: Q56675674; MaRDI QID: Q415624

Junya Honda, Akimichi Takemura

Publication date: 8 May 2012

Published in: Machine Learning

Full work available at URL: https://arxiv.org/abs/0905.2776

zbMATH Keywords

convex optimization; bandit problems; finite-time regret; MED policy

Mathematics Subject Classification ID

Convex programming (90C25); Stochastic games, stochastic differential games (91A15); Rationality and learning in game theory (91A26)

Related Items (9)

Infomax strategies for an optimal balance between exploration and exploitation
EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Unnamed Item
A perpetual search for talents across overlapping generations: a learning process
ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT
On Bayesian index policies for sequential resource allocation
Unnamed Item
Unnamed Item
