Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
Publication:3164821
DOI: 10.1007/978-3-642-34106-9_18
zbMath: 1386.91055
arXiv: 1205.4217
OpenAlex: W2158319693
MaRDI QID: Q3164821
Rémi Munos, Emilie Kaufmann, Nathaniel Korda
Publication date: 16 October 2012
Published in: Lecture Notes in Computer Science
Full work available at URL: https://arxiv.org/abs/1205.4217
Bayesian problems; characterization of Bayes procedures (62C10)
Decision theory (91B06)
Probabilistic games; gambling (91A60)
Related Items (33)
Robust sequential design for piecewise-stationary multi-armed bandit problem in the presence of outliers
Modification of improved upper confidence bounds for regulating exploration in Monte-Carlo tree search
Infomax strategies for an optimal balance between exploration and exploitation
Optimistic Gittins Indices
Improving multi-armed bandit algorithms in online pricing settings
Practical Bayesian support vector regression for financial time series prediction and market condition change detection
Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials
Kullback-Leibler upper confidence bounds for optimal sequential allocation
Maximizing revenue for publishers using header bidding and ad exchange auctions
Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems
Online learning of network bottlenecks via minimax paths
Multi-armed bandit problem with online clustering as side information
Unnamed Item
Unnamed Item
Unnamed Item
Online learning of energy consumption for navigation of electric vehicles
Response-adaptive randomization in clinical trials: from myths to practical considerations
Learning the distribution with largest mean: two bandit frameworks
Online Network Revenue Management Using Thompson Sampling
Simple Bayesian Algorithms for Best-Arm Identification
Ballooning multi-armed bandits
Unnamed Item
On Bayesian index policies for sequential resource allocation
Unnamed Item
Efficient multiobjective optimization employing Gaussian processes, spectral sampling and a genetic algorithm
Multi-Armed Bandit for Species Discovery: A Bayesian Nonparametric Approach
Adaptive policies for perimeter surveillance problems
Mechanisms with learning for stochastic multi-armed bandit problems
Learning to Optimize via Information-Directed Sampling
On the Prior Sensitivity of Thompson Sampling
Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit
Asymptotically optimal algorithms for budgeted multiple play bandits