Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
DOI: 10.2307/1427934 · zbMATH Open: 0840.90129 · OpenAlex: W2000080679 · MaRDI QID: Q4862097
Authors: Rajeev Agrawal
Publication date: 9 July 1996
Published in: Advances in Applied Probability
Full work available at URL: https://doi.org/10.2307/1427934
Recommendations
- On the bias, risk, and consistency of sample means in multi-armed bandits
- An index-based deterministic convergent optimal algorithm for constrained multi-armed bandit problems
- On an index policy for restless bandits
- The sample complexity of exploration in the multi-armed bandit problem
- Lower bounds on the sample complexity of exploration in the multi-armed bandit problem
- Index-based policies for discounted multi-armed bandits on parallel machines
Keywords: large deviations; asymptotically efficient; multi-armed bandit problem; upper confidence bounds; Kullback-Leibler number; stochastic adaptive control; non-Bayesian infinite horizon version
MSC: Markov and semi-Markov decision processes (90C40); Optimal stochastic control (93E20); Stochastic learning and adaptive control (93E35)
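The paper studies index policies whose index is the sample mean of an arm plus a confidence term, achieving O(log n) regret. As a rough illustration of this family of policies, the sketch below uses the later UCB1 index of Auer, Cesa-Bianchi and Fischer (an assumption for concreteness — it is not Agrawal's exact index, though it is of the same sample-mean-plus-confidence form); arm probabilities and the horizon are made up for the demo.

```python
import math
import random

def ucb_index(mean, pulls, t):
    # Sample-mean-based index: empirical mean plus a confidence radius
    # that shrinks as the arm is sampled and grows only logarithmically
    # with total time t -- the source of the O(log n) regret bound.
    return mean + math.sqrt(2.0 * math.log(t) / pulls)

def run_ucb(probs, horizon, seed=0):
    # Bernoulli bandit with success probabilities `probs` (hypothetical values).
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k
    sums = [0.0] * k
    # Initialization: pull each arm once so every sample mean is defined.
    for a in range(k):
        sums[a] += 1.0 if rng.random() < probs[a] else 0.0
        pulls[a] += 1
    # Thereafter, always play the arm with the largest index.
    for t in range(k + 1, horizon + 1):
        a = max(range(k), key=lambda i: ucb_index(sums[i] / pulls[i], pulls[i], t))
        sums[a] += 1.0 if rng.random() < probs[a] else 0.0
        pulls[a] += 1
    return pulls

pulls = run_ucb([0.9, 0.5, 0.4], horizon=5000)
```

Under such a policy each suboptimal arm is sampled only O(log n) times, so the best arm (index 0 here) receives the overwhelming majority of the 5000 pulls.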
Cited In (39)
- Robustness of stochastic bandit policies
- Learning the distribution with largest mean: two bandit frameworks
- Finite-Time Analysis for the Knowledge-Gradient Policy
- Geiringer theorems: from population genetics to computational intelligence, memory evolutive systems and Hebbian learning
- The multi-armed bandit problem: an efficient nonparametric solution
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- How fragile are information cascades?
- On Bayesian index policies for sequential resource allocation
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- Boundary crossing probabilities for general exponential families
- Functional Sequential Treatment Allocation
- Dealing with expert bias in collective decision-making
- An online algorithm for the risk-aware restless bandit
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- A confirmation of a conjecture on Feldman’s two-armed bandit problem
- Factorial Designs for Online Experiments
- The sample complexity of exploration in the multi-armed bandit problem
- Nonasymptotic Analysis of Monte Carlo Tree Search
- Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information
- Tuning Bandit Algorithms in Stochastic Environments
- A revised approach for risk-averse multi-armed bandits under CVaR criterion
- Convergence rate analysis for optimal computing budget allocation algorithms
- Infinite Arms Bandit: Optimality via Confidence Bounds
- Wisdom of crowds versus groupthink: learning in groups and in isolation
- Optimistic Gittins Indices
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
- Efficient crowdsourcing of unknown experts using bounded multi-armed bandits
- Some memoryless bandit policies
- Linearly parameterized bandits
- Gittins' theorem under uncertainty
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Lower bounds on the sample complexity of exploration in the multi-armed bandit problem
- Exploration and exploitation of scratch games
- Derivative-free optimization methods
- Deviations of stochastic bandit regret
- A non-parametric solution to the multi-armed bandit problem with covariates
- Finite-time lower bounds for the two-armed bandit problem