scientific article

From MaRDI portal

Publication:2810878

zbMath: 1360.62030, arXiv: 1403.5341, MaRDI QID: Q2810878

Daniel J. Russo, Benjamin van Roy

Publication date: 6 June 2016

Full work available at URL: https://arxiv.org/abs/1403.5341

Title: zbMATH Open Web Interface contents unavailable due to conflicting licenses.

zbMATH Keywords

information theory; online optimization; regret bounds; Thompson sampling; multi-armed bandit

Mathematics Subject Classification ID

Sequential statistical design (62L05); Statistical aspects of information-theoretic topics (62B10); Optimal stopping in statistics (62L15); General considerations in statistical decision theory (62C05)

Related Items (23)

Generalizations of maximal inequalities to arbitrary selection rules ⋮ Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials ⋮ Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning ⋮ Probabilistic bisection with spatial metamodels ⋮ A Bayesian approach to (online) transfer learning: theory and algorithms ⋮ Information theory for ranking and selection ⋮ Reward Maximization Through Discrete Active Inference ⋮ Reinforcement Learning, Bit by Bit ⋮ Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection ⋮ Foraging decisions as multi-armed bandit problems: applying reinforcement learning algorithms to foraging data ⋮ Nonstationary Bandits with Habituation and Recovery Dynamics ⋮ Exploratory distributions for convex functions ⋮ Multi-Armed Bandit for Species Discovery: A Bayesian Nonparametric Approach ⋮ Improved regret for zeroth-order adversarial bandit convex optimisation ⋮ Adaptive policies for perimeter surveillance problems ⋮ Learning to Optimize via Information-Directed Sampling ⋮ Efficient Simulation of High Dimensional Gaussian Vectors ⋮ Derivative-free optimization methods ⋮ On the Prior Sensitivity of Thompson Sampling ⋮ Matching While Learning ⋮ Dismemberment and design for controlling the replication variance of regret for the multi-armed bandit ⋮ Satisficing in Time-Sensitive Bandit Learning ⋮ Entropy Regularization for Mean Field Games with Learning
