scientific article; zbMATH DE number 6982311
From MaRDI portal
Publication:4558161
Recommendations
- Infinite Arms Bandit: Optimality via Confidence Bounds
- 10.1162/153244303321897663
- scientific article; zbMATH DE number 6276176
- Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits
- On upper-confidence bound policies for switching bandit problems
- Pure exploration in infinitely-armed bandit models with fixed-confidence
- Confidence sets for discrete stochastic optimization
- Optimal learning and experimentation in bandit problems.
Cites work
- scientific article; zbMATH DE number 3658788
- scientific article; zbMATH DE number 1306865
- scientific article; zbMATH DE number 6982311
- scientific article; zbMATH DE number 1390271
- Adaptive treatment allocation and the multi-armed bandit problem
- An Asymptotic Minimax Theorem for the Two Armed Bandit Problem
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Asymptotically efficient adaptive allocation rules
- Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: finite parameter space
- Boundary crossing of Brownian motion. Its relation to the law of the iterated logarithm and to sequential analysis
- Concentration inequalities. A nonasymptotic theory of independence
- Finite-time analysis of the multiarmed bandit problem
- Finite-time lower bounds for the two-armed bandit problem
- Introduction to nonparametric estimation
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Lemma 1
- Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- Near-optimal regret bounds for Thompson sampling
- Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards
- Normal bandits of unknown means and variances
- On Bayesian index policies for sequential resource allocation
- Optimal adaptive policies for sequential allocation problems
- Pure exploration in multi-armed bandits problems
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Sample mean based index policies by O(log n) regret for the multi-armed bandit problem
- The multi-armed bandit problem with covariates
- Tuning Bandit Algorithms in Stochastic Environments
- UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem
Cited in (4)
- A minimax and asymptotically optimal algorithm for stochastic bandits
- Exploration-exploitation policies with almost sure, arbitrarily slow growing asymptotic regret
- scientific article; zbMATH DE number 6276176
- scientific article; zbMATH DE number 6982311