Learning to Optimize via Posterior Sampling

From MaRDI portal
Publication:5247618

DOI: 10.1287/moor.2014.0650
zbMath: 1310.93091
arXiv: 1301.2609
OpenAlex: W2149721706
MaRDI QID: Q5247618

Daniel J. Russo, Benjamin van Roy

Publication date: 24 April 2015

Published in: Mathematics of Operations Research

Full work available at URL: https://arxiv.org/abs/1301.2609




Related Items (45)

Optimal Learning for Nonlinear Parametric Belief Models Over Multidimensional Continuous Spaces
Unnamed Item
Practical Bayesian support vector regression for financial time series prediction and market condition change detection
Optimal Information Blending with Measurements in the L2 Sphere
Bandit Theory: Applications to Learning Healthcare Systems and Clinical Trials
Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning
Bayesian optimization with partially specified queries
Multi-armed bandit-based hyper-heuristics for combinatorial optimization problems
Online Resource Allocation with Personalized Learning
Online learning of network bottlenecks via minimax paths
On the Convergence Rates of Expected Improvement Methods
Decomposable Markov Decision Processes: A Fluid Optimization Approach
Reward Maximization Through Discrete Active Inference
Reinforcement Learning, Bit by Bit
ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS
Optimal Learning in Linear Regression with Combinatorial Feature Selection
Online learning of energy consumption for navigation of electric vehicles
Multi-fidelity cost-aware Bayesian optimization
Gaussian process bandits with adaptive discretization
Online Decision Making with High-Dimensional Covariates
Technical Note—Consistency Analysis of Sequential Learning Under Approximate Bayesian Inference
Online Network Revenue Management Using Thompson Sampling
A unified framework for stochastic optimization
Nonstationary Bandits with Habituation and Recovery Dynamics
Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors
Optimal Learning with Local Nonlinear Parametric Models over Continuous Designs
Unnamed Item
On Bayesian index policies for sequential resource allocation
Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks
Infinite Arms Bandit: Optimality via Confidence Bounds
Multi-Armed Bandit for Species Discovery: A Bayesian Nonparametric Approach
Improved regret for zeroth-order adversarial bandit convex optimisation
Complete expected improvement converges to an optimal budget allocation
Learning to Optimize via Information-Directed Sampling
Efficient Simulation of High Dimensional Gaussian Vectors
The Local Time Method for Targeting and Selection
Bayesian Exploration for Approximate Dynamic Programming
Variance Regularization in Sequential Bayesian Optimization
Multi-armed bandit with sub-exponential rewards
Best arm identification in generalized linear bandits
On the Prior Sensitivity of Thompson Sampling
IntelligentPooling: practical Thompson sampling for mHealth
Game of Thrones: Fully Distributed Learning for Multiplayer Bandits
Unnamed Item
Satisficing in Time-Sensitive Bandit Learning




