On Bayesian index policies for sequential resource allocation (Q1750289)
From MaRDI portal
scientific article
Language | Label | Description | Also known as
---|---|---|---
English | On Bayesian index policies for sequential resource allocation | scientific article |
Statements
On Bayesian index policies for sequential resource allocation (English)
18 May 2018
Consider the stochastic multi-armed bandit problem: an agent interacts sequentially with \(K\) (\(K \geq 2\)) arms over a control horizon \(T\). Choosing arm \(a\) yields a random reward whose distribution depends only on the chosen arm and is unknown to the agent. The goal is to maximize (in an appropriate sense) the total expected reward. If the agent knew the reward distributions, it would follow the oracle policy and always choose the arm with the largest expected reward. For any policy \(A\) actually used, the total expected reward falls short of the oracle policy's by the regret \(R(T,A)\). The article proposes asymptotically optimal Bayesian index policies for the multi-armed bandit problem for a large class of reward distributions belonging to a one-dimensional exponential family. For these strategies, \(R(T,A)\), considered in the frequentist view (i.e., for fixed unknown reward distributions), is of order \(\log T\). The proposed policies are Bayes-UCB algorithms, which rely on quantiles of posterior distributions. Bayesian insight into alternative exploration rates is also presented; to this end, the proposed algorithms are compared with finite-horizon Gittins indices and with the kl-UCB\(^+\) and kl-UCB-H\(^{+}\) algorithms. Numerical results are given.
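To illustrate the quantile-based index described above, here is a minimal sketch of a Bayes-UCB-style policy for Bernoulli arms. It is not the paper's implementation: the function name, the quantile level \(1 - 1/(t(\log T)^c)\) for the Beta posterior, and the Monte Carlo approximation of the posterior quantile (used here instead of an exact inverse-Beta computation, to stay within the standard library) are all assumptions made for this example.

```python
import math
import random

def bayes_ucb_bernoulli(arm_means, T, c=0, n_samples=500, seed=0):
    """Illustrative Bayes-UCB-style simulation for Bernoulli bandits.

    At round t, each arm's index is an (approximate) quantile of order
    1 - 1/(t * (log T)^c) of its Beta posterior; the quantile is
    estimated here by Monte Carlo sampling from the posterior.
    """
    rng = random.Random(seed)
    K = len(arm_means)
    successes = [0] * K   # observed rewards per arm
    pulls = [0] * K       # number of times each arm was chosen
    total_reward = 0.0
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1     # initialisation: pull each arm once
        else:
            level = 1.0 - 1.0 / (t * max(math.log(T), 1.0) ** c)
            indices = []
            for k in range(K):
                alpha = successes[k] + 1            # Beta(1,1) prior
                beta = pulls[k] - successes[k] + 1
                samples = sorted(rng.betavariate(alpha, beta)
                                 for _ in range(n_samples))
                # empirical quantile of the posterior samples
                q = samples[min(int(level * n_samples), n_samples - 1)]
                indices.append(q)
            a = max(range(K), key=indices.__getitem__)
        r = 1 if rng.random() < arm_means[a] else 0
        successes[a] += r
        pulls[a] += 1
        total_reward += r
    # empirical regret against always playing the best arm
    regret = T * max(arm_means) - total_reward
    return regret, pulls

# Example run: with a clear gap between the arms, the better arm
# should accumulate the vast majority of the pulls.
regret, pulls = bayes_ucb_bernoulli([0.3, 0.6], T=500)
```

The quantile level rising toward 1 as \(t\) grows is what drives exploration: an arm pulled rarely has a wide posterior, hence a high upper quantile, and keeps being tried until its posterior concentrates.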
multi-armed bandit problems
Bayesian methods
upper-confidence bounds
Gittins indices