An asymptotically optimal policy for finite support models in the multiarmed bandit problem
From MaRDI portal
Publication:415624
DOI: 10.1007/s10994-011-5257-4 ⋮ zbMath: 1237.91037 ⋮ arXiv: 0905.2776 ⋮ OpenAlex: W2131958277 ⋮ Wikidata: Q56675674 ⋮ Scholia: Q56675674 ⋮ MaRDI QID: Q415624
Junya Honda, Akimichi Takemura
Publication date: 8 May 2012
Published in: Machine Learning
Full work available at URL: https://arxiv.org/abs/0905.2776
Convex programming (90C25) ⋮ Stochastic games, stochastic differential games (91A15) ⋮ Rationality and learning in game theory (91A26)
Related Items (9)
Infomax strategies for an optimal balance between exploration and exploitation ⋮ EXPLORATION–EXPLOITATION POLICIES WITH ALMOST SURE, ARBITRARILY SLOW GROWING ASYMPTOTIC REGRET ⋮ Kullback-Leibler upper confidence bounds for optimal sequential allocation ⋮ Unnamed Item ⋮ A perpetual search for talents across overlapping generations: a learning process ⋮ ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT ⋮ On Bayesian index policies for sequential resource allocation ⋮ Unnamed Item ⋮ Unnamed Item
Cites Work
- Unnamed Item
- Unnamed Item
- Unnamed Item
- Introduction to sensitivity and stability analysis in nonlinear programming
- Nonparametric bandit methods
- Asymptotically efficient adaptive allocation rules
- Multi-armed bandit problem revisited
- Optimal adaptive policies for sequential allocation problems
- Exploration of multi-state environments: Local measures and back-propagation of uncertainty
- The Multi-Armed Bandit Problem: Decomposition and Computation
- Non-overlapping domain decomposition for evolution operators
- The Nonstochastic Multiarmed Bandit Problem
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- The Continuum-Armed Bandit Problem
- Elements of Information Theory
- Some aspects of the sequential design of experiments
- Convergence of stochastic processes
- Finite-time analysis of the multiarmed bandit problem