Sample mean based index policies with O(log n) regret for the multi-armed bandit problem

From MaRDI portal
Publication:4862097


DOI: 10.2307/1427934
zbMath: 0840.90129
MaRDI QID: Q4862097

Rajeev Agrawal

Publication date: 9 July 1996

Published in: Advances in Applied Probability

Full work available at URL: https://doi.org/10.2307/1427934


93E20: Optimal stochastic control

93E35: Stochastic learning and adaptive control

90C40: Markov and semi-Markov decision processes


Related Items

Unnamed Item
Learning the distribution with largest mean: two bandit frameworks
Finite-Time Analysis for the Knowledge-Gradient Policy
Infinite Arms Bandit: Optimality via Confidence Bounds
Nonasymptotic Analysis of Monte Carlo Tree Search
Optimistic Gittins Indices
Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information
Derivative-free optimization methods
Functional Sequential Treatment Allocation
Dealing with expert bias in collective decision-making
Convergence rate analysis for optimal computing budget allocation algorithms
Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
A confirmation of a conjecture on Feldman’s two-armed bandit problem
Geiringer theorems: from population genetics to computational intelligence, memory evolutive systems and Hebbian learning
Wisdom of crowds versus groupthink: learning in groups and in isolation
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Exploration and exploitation of scratch games
Robustness of stochastic bandit policies
An asymptotically optimal policy for finite support models in the multiarmed bandit problem
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
Boundary crossing probabilities for general exponential families
A non-parametric solution to the multi-armed bandit problem with covariates
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
On Bayesian index policies for sequential resource allocation
Efficient crowdsourcing of unknown experts using bounded multi-armed bandits
An online algorithm for the risk-aware restless bandit
A revised approach for risk-averse multi-armed bandits under CVaR criterion
Gittins' theorem under uncertainty
Multi-agent reinforcement learning: a selective overview of theories and algorithms
The multi-armed bandit problem: an efficient nonparametric solution
How fragile are information cascades?
Tuning Bandit Algorithms in Stochastic Environments