Sample mean based index policies by O(log n) regret for the multi-armed bandit problem

DOI10.2307/1427934zbMath0840.90129OpenAlexW2000080679MaRDI QIDQ4862097

Publication date: 9 July 1996

Published in: Unnamed Author (Search for Journal in Brave)

Full work available at URL: https://doi.org/10.2307/1427934

zbMATH Keywords

large deviations upper confidence bounds multi-armed bandit problem asymptotically efficient Kullback-Leibler number stochastic adaptive control non-Bayesian infinite horizon version

Mathematics Subject Classification ID

Optimal stochastic control (93E20) Stochastic learning and adaptive control (93E35) Markov and semi-Markov decision processes (90C40)

Related Items (32)

Functional Sequential Treatment Allocation ⋮ Geiringer theorems: from population genetics to computational intelligence, memory evolutive systems and Hebbian learning ⋮ A non-parametric solution to the multi-armed bandit problem with covariates ⋮ Nonasymptotic Analysis of Monte Carlo Tree Search ⋮ Optimistic Gittins Indices ⋮ Wisdom of crowds versus groupthink: learning in groups and in isolation ⋮ Kullback-Leibler upper confidence bounds for optimal sequential allocation ⋮ Exploration and exploitation of scratch games ⋮ The multi-armed bandit problem: an efficient nonparametric solution ⋮ Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information ⋮ Robustness of stochastic bandit policies ⋮ Dealing with expert bias in collective decision-making ⋮ Convergence rate analysis for optimal computing budget allocation algorithms ⋮ An asymptotically optimal policy for finite support models in the multiarmed bandit problem ⋮ Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems ⋮ A confirmation of a conjecture on Feldman’s two-armed bandit problem ⋮ Learning the distribution with largest mean: two bandit frameworks ⋮ Tuning Bandit Algorithms in Stochastic Environments ⋮ Finite-Time Analysis for the Knowledge-Gradient Policy ⋮ UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem ⋮ How fragile are information cascades? ⋮ On Bayesian index policies for sequential resource allocation ⋮ Efficient crowdsourcing of unknown experts using bounded multi-armed bandits ⋮ Unnamed Item ⋮ Infinite Arms Bandit: Optimality via Confidence Bounds ⋮ Boundary crossing probabilities for general exponential families ⋮ An online algorithm for the risk-aware restless bandit ⋮ Exploration-exploitation tradeoff using variance estimates in multi-armed bandits ⋮ A revised approach for risk-averse multi-armed bandits under CVaR criterion ⋮ Derivative-free optimization methods ⋮ Gittins' theorem under uncertainty ⋮ Multi-agent reinforcement learning: a selective overview of theories and algorithms

This page was built for publication: Sample mean based index policies by O(log n) regret for the multi-armed bandit problem