Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
From MaRDI portal
Publication:4862097
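The paper referenced here studies index policies that rank arms using only their sample means plus a confidence padding, achieving O(log n) regret. As a hedged illustration of this class of policies (not the paper's exact index), the sketch below uses a UCB1-style bonus of order sqrt(log t / pulls); the function names and the two-armed Bernoulli setup are illustrative assumptions:

```python
import math
import random

def sample_mean_index(mean, pulls, t):
    # Sample-mean based index: empirical mean plus a confidence
    # padding of order sqrt(log t / pulls). This is the UCB1-style
    # bonus; the paper's own policies use padding of the same order.
    return mean + math.sqrt(2.0 * math.log(t) / pulls)

def run_bandit(arm_probs, horizon, seed=0):
    # Simulate Bernoulli arms; always pull the arm with the largest index.
    rng = random.Random(seed)
    k = len(arm_probs)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialize: pull each arm once
        else:
            arm = max(range(k),
                      key=lambda i: sample_mean_index(sums[i] / pulls[i],
                                                     pulls[i], t))
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls

pulls = run_bandit([0.3, 0.7], 2000, seed=1)
```

With a fixed seed, the policy concentrates its pulls on the better arm (index 1) while still sampling the worse arm at a logarithmic rate, which is the behavior behind the O(log n) regret bound.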
Recommendations
- On the bias, risk, and consistency of sample means in multi-armed bandits
- An index-based deterministic convergent optimal algorithm for constrained multi-armed bandit problems
- On an index policy for restless bandits
- The sample complexity of exploration in the multi-armed bandit problem
- Lower bounds on the sample complexity of exploration in the multi-armed bandit problem
- Index-based policies for discounted multi-armed bandits on parallel machines
Cited in (48)
- A revised approach for risk-averse multi-armed bandits under CVaR criterion
- Geiringer theorems: from population genetics to computational intelligence, memory evolutive systems and Hebbian learning
- Functional Sequential Treatment Allocation
- Deviations of stochastic bandit regret
- Learning the distribution with largest mean: two bandit frameworks
- Robustness of stochastic bandit policies
- Lower bounds on the sample complexity of exploration in the multi-armed bandit problem
- Multi-armed bandits based on a variant of simulated annealing
- Approximate indexability and bandit problems with concave rewards and delayed feedback
- Derivative-free optimization methods
- Exploration-exploitation policies with almost sure, arbitrarily slow growing asymptotic regret
- Dealing with expert bias in collective decision-making
- Nonasymptotic Analysis of Monte Carlo Tree Search
- Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem
- Explore first, exploit next: the true shape of regret in bandit problems
- Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards
- Scientific article (zbMATH DE number 6982311; no title available)
- An online algorithm for the risk-aware restless bandit
- Some memoryless bandit policies
- Gittins' theorem under uncertainty
- Boundary crossing for general exponential families
- Exploration and exploitation of scratch games
- Convergence rate analysis for optimal computing budget allocation algorithms
- Optimistic Gittins Indices
- Wisdom of crowds versus groupthink: learning in groups and in isolation
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Infinite Arms Bandit: Optimality via Confidence Bounds
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Multi-agent reinforcement learning: a selective overview of theories and algorithms
- A non-parametric solution to the multi-armed bandit problem with covariates
- Boundary crossing probabilities for general exponential families
- Linearly parameterized bandits
- A confirmation of a conjecture on Feldman’s two-armed bandit problem
- Empirical Gittins index strategies with \(\varepsilon\)-explorations for multi-armed bandit problems
- How fragile are information cascades?
- Normal bandits of unknown means and variances
- Finite-time analysis for the knowledge-gradient policy
- The sample complexity of exploration in the multi-armed bandit problem
- Finite-time lower bounds for the two-armed bandit problem
- Factorial Designs for Online Experiments
- An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem
- On Bayesian index policies for sequential resource allocation
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Continuous Assortment Optimization with Logit Choice Probabilities and Incomplete Information
- Tuning Bandit Algorithms in Stochastic Environments
- Efficient crowdsourcing of unknown experts using bounded multi-armed bandits
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
- The multi-armed bandit problem: an efficient nonparametric solution