Sample mean based index policies with O(log n) regret for the multi-armed bandit problem

From MaRDI portal

Publication:4862097

DOI: 10.2307/1427934
zbMath: 0840.90129
OpenAlex: W2000080679
MaRDI QID: Q4862097

Rajeev Agrawal

Publication date: 9 July 1996

Published in: Advances in Applied Probability

Full work available at URL: https://doi.org/10.2307/1427934
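The paper concerns index policies that add a confidence-style exploration bonus to each arm's sample mean and thereby achieve O(log n) regret. As an illustration of the general idea only, the sketch below implements a sample-mean-based index policy in Python; the particular bonus, sqrt(2 log n / T_i), follows the later UCB1 formulation of Auer, Cesa-Bianchi and Fischer rather than Agrawal's exact index, and the Bernoulli arms and parameters are assumptions chosen for demonstration.

```python
import math
import random


def sample_mean_index_policy(arms, horizon, seed=0):
    """Run a sample-mean-based (UCB-type) index policy.

    `arms` is a list of callables; each takes an RNG and returns a reward in [0, 1].
    At round n, each arm's index is its sample mean plus an exploration bonus
    that shrinks as the arm is pulled more often. NOTE: the bonus below is the
    UCB1 form, used here as a stand-in for the paper's index (an assumption).
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k      # number of pulls per arm (T_i)
    means = [0.0] * k     # sample mean reward per arm

    # Pull each arm once so every sample mean is defined.
    for i in range(k):
        reward = arms[i](rng)
        counts[i] = 1
        means[i] = reward

    total_reward = sum(means)
    for n in range(k + 1, horizon + 1):
        # Index = sample mean + sqrt(2 log n / T_i)  (UCB1-style bonus).
        indices = [means[i] + math.sqrt(2.0 * math.log(n) / counts[i])
                   for i in range(k)]
        i = max(range(k), key=lambda j: indices[j])
        reward = arms[i](rng)
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # incremental mean update
        total_reward += reward
    return total_reward, counts


if __name__ == "__main__":
    # Two Bernoulli arms with success probabilities 0.5 and 0.6 (illustrative only).
    def bernoulli(p):
        return lambda rng: 1.0 if rng.random() < p else 0.0

    reward, pulls = sample_mean_index_policy(
        [bernoulli(0.5), bernoulli(0.6)], horizon=10_000
    )
    print("total reward:", reward, "pulls per arm:", pulls)
```

With this kind of index, the suboptimal arm is pulled only O(log n) times in expectation, which is the source of the O(log n) regret bound in the paper's title.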



Related Items

Functional Sequential Treatment Allocation
Geiringer theorems: from population genetics to computational intelligence, memory evolutive systems and Hebbian learning
A non-parametric solution to the multi-armed bandit problem with covariates
Nonasymptotic Analysis of Monte Carlo Tree Search
Optimistic Gittins Indices
Wisdom of crowds versus groupthink: learning in groups and in isolation
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Exploration and exploitation of scratch games
The multi-armed bandit problem: an efficient nonparametric solution
Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information
Robustness of stochastic bandit policies
Dealing with expert bias in collective decision-making
Convergence rate analysis for optimal computing budget allocation algorithms
An asymptotically optimal policy for finite support models in the multiarmed bandit problem
Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
A confirmation of a conjecture on Feldman’s two-armed bandit problem
Learning the distribution with largest mean: two bandit frameworks
Tuning Bandit Algorithms in Stochastic Environments
Finite-Time Analysis for the Knowledge-Gradient Policy
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
How fragile are information cascades?
On Bayesian index policies for sequential resource allocation
Efficient crowdsourcing of unknown experts using bounded multi-armed bandits
Infinite Arms Bandit: Optimality via Confidence Bounds
Boundary crossing probabilities for general exponential families
An online algorithm for the risk-aware restless bandit
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
A revised approach for risk-averse multi-armed bandits under CVaR criterion
Derivative-free optimization methods
Gittins' theorem under uncertainty
Multi-agent reinforcement learning: a selective overview of theories and algorithms