Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis

Publication:3164821

DOI: 10.1007/978-3-642-34106-9_18
zbMATH: 1386.91055
arXiv: 1205.4217
OpenAlex: W2158319693
MaRDI QID: Q3164821

Rémi Munos, Emilie Kaufmann, Nathaniel Korda

Publication date: 16 October 2012

Published in: Lecture Notes in Computer Science

Full work available at URL: https://arxiv.org/abs/1205.4217

Related Items (33)

Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search
Infomax strategies for an optimal balance between exploration and exploitation
Optimistic Gittins Indices
Improving multi-armed bandit algorithms in online pricing settings
Practical Bayesian support vector regression for financial time series prediction and market condition change detection
Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Maximizing revenue for publishers using header bidding and ad exchange auctions
Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems
Online learning of network bottlenecks via minimax paths
Multi-armed bandit problem with online clustering as side information
Unnamed Item
Unnamed Item
Unnamed Item
Online learning of energy consumption for navigation of electric vehicles
Response-adaptive randomization in clinical trials: from myths to practical considerations
Learning the distribution with largest mean: two bandit frameworks
Online Network Revenue Management Using Thompson Sampling
Simple Bayesian Algorithms for Best-Arm Identification
Ballooning multi-armed bandits
Unnamed Item
On Bayesian index policies for sequential resource allocation
Unnamed Item
Efficient multiobjective optimization employing Gaussian processes, spectral sampling and a genetic algorithm
Multi-Armed Bandit for Species Discovery: A Bayesian Nonparametric Approach
Adaptive policies for perimeter surveillance problems
Mechanisms with learning for stochastic multi-armed bandit problems
Learning to Optimize via Information-Directed Sampling
On the Prior Sensitivity of Thompson Sampling
Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit
Asymptotically optimal algorithms for budgeted multiple play bandits