Sample mean based index policies by O(log n) regret for the multi-armed bandit problem


Publication:4862097

DOI: 10.2307/1427934
zbMath: 0840.90129
OpenAlex: W2000080679
MaRDI QID: Q4862097

Rajeev Agrawal

Publication date: 9 July 1996

Published in: Unnamed Author

Full work available at URL: https://doi.org/10.2307/1427934




Related Items (32)

Functional Sequential Treatment Allocation
Geiringer theorems: from population genetics to computational intelligence, memory evolutive systems and Hebbian learning
A non-parametric solution to the multi-armed bandit problem with covariates
Nonasymptotic Analysis of Monte Carlo Tree Search
Optimistic Gittins Indices
Wisdom of crowds versus groupthink: learning in groups and in isolation
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Exploration and exploitation of scratch games
The multi-armed bandit problem: an efficient nonparametric solution
Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information
Robustness of stochastic bandit policies
Dealing with expert bias in collective decision-making
Convergence rate analysis for optimal computing budget allocation algorithms
An asymptotically optimal policy for finite support models in the multiarmed bandit problem
Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
A confirmation of a conjecture on Feldman’s two-armed bandit problem
Learning the distribution with largest mean: two bandit frameworks
Tuning Bandit Algorithms in Stochastic Environments
Finite-Time Analysis for the Knowledge-Gradient Policy
UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
How fragile are information cascades?
On Bayesian index policies for sequential resource allocation
Efficient crowdsourcing of unknown experts using bounded multi-armed bandits
Unnamed Item
Infinite Arms Bandit: Optimality via Confidence Bounds
Boundary crossing probabilities for general exponential families
An online algorithm for the risk-aware restless bandit
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
A revised approach for risk-averse multi-armed bandits under CVaR criterion
Derivative-free optimization methods
Gittins' theorem under uncertainty
Multi-agent reinforcement learning: a selective overview of theories and algorithms






