Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
From MaRDI portal
Publication:4862097
DOI10.2307/1427934zbMath0840.90129OpenAlexW2000080679MaRDI QIDQ4862097
Publication date: 9 July 1996
Published in: Advances in Applied Probability (Search for Journal in Brave)
Full work available at URL: https://doi.org/10.2307/1427934
large deviationsupper confidence boundsmulti-armed bandit problemasymptotically efficientKullback-Leibler numberstochastic adaptive controlnon-Bayesian infinite horizon version
Optimal stochastic control (93E20) Stochastic learning and adaptive control (93E35) Markov and semi-Markov decision processes (90C40)
Related Items
Functional Sequential Treatment Allocation, Geiringer theorems: from population genetics to computational intelligence, memory evolutive systems and Hebbian learning, A non-parametric solution to the multi-armed bandit problem with covariates, Nonasymptotic Analysis of Monte Carlo Tree Search, Optimistic Gittins Indices, Wisdom of crowds versus groupthink: learning in groups and in isolation, Kullback-Leibler upper confidence bounds for optimal sequential allocation, Exploration and exploitation of scratch games, The multi-armed bandit problem: an efficient nonparametric solution, Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information, Robustness of stochastic bandit policies, Dealing with expert bias in collective decision-making, Convergence rate analysis for optimal computing budget allocation algorithms, An asymptotically optimal policy for finite support models in the multiarmed bandit problem, Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems, A confirmation of a conjecture on Feldman’s two-armed bandit problem, Learning the distribution with largest mean: two bandit frameworks, Tuning Bandit Algorithms in Stochastic Environments, Finite-Time Analysis for the Knowledge-Gradient Policy, UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem, How fragile are information cascades?, On Bayesian index policies for sequential resource allocation, Efficient crowdsourcing of unknown experts using bounded multi-armed bandits, Unnamed Item, Infinite Arms Bandit: Optimality via Confidence Bounds, Boundary crossing probabilities for general exponential families, An online algorithm for the risk-aware restless bandit, Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, A revised approach for risk-averse multi-armed bandits under CVaR criterion, Derivative-free optimization methods, Gittins' theorem under uncertainty, Multi-agent reinforcement learning: a selective overview of theories and algorithms