Pages that link to "Item:Q4862097"
From MaRDI portal
The following pages link to Sample mean based index policies by O(log n) regret for the multi-armed bandit problem (Q4862097):
- Geiringer theorems: from population genetics to computational intelligence, memory evolutive systems and Hebbian learning (Q269771)
- Wisdom of crowds versus groupthink: learning in groups and in isolation (Q361811)
- Kullback-Leibler upper confidence bounds for optimal sequential allocation (Q366995)
- Exploration and exploitation of scratch games (Q374139)
- Robustness of stochastic bandit policies (Q391739)
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem (Q415624)
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem (Q653803)
- Boundary crossing probabilities for general exponential families (Q722599)
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits (Q1017665)
- On Bayesian index policies for sequential resource allocation (Q1750289)
- Efficient crowdsourcing of unknown experts using bounded multi-armed bandits (Q2014933)
- Tuning Bandit Algorithms in Stochastic Environments (Q3520056)
- (Q4558161)
- Learning the distribution with largest mean: two bandit frameworks (Q4606431)
- Finite-Time Analysis for the Knowledge-Gradient Policy (Q4610155)