Learning to optimize via information-directed sampling
From MaRDI portal
Publication:4969321
Abstract: We propose information-directed sampling -- a new approach to online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback. Each action is sampled in a manner that minimizes the ratio between squared expected single-period regret and a measure of information gain: the mutual information between the optimal action and the next observation. We establish an expected regret bound for information-directed sampling that applies across a very general class of models and scales with the entropy of the optimal action distribution. We illustrate through simple analytic examples how information-directed sampling accounts for kinds of information that alternative approaches do not adequately address and that this can lead to dramatic performance gains. For the widely studied Bernoulli, Gaussian, and linear bandit problems, we demonstrate state-of-the-art simulation performance.
Recommendations
Cites work
- scientific article; zbMATH DE number 1804105 (Why is no real title available?)
- scientific article; zbMATH DE number 3612796 (Why is no real title available?)
- scientific article; zbMATH DE number 2107836 (Why is no real title available?)
- scientific article; zbMATH DE number 5485582 (Why is no real title available?)
- scientific article; zbMATH DE number 6276166 (Why is no real title available?)
- A Knowledge-Gradient Policy for Sequential Information Collection
- Adaptive submodularity: theory and applications in active learning and stochastic optimization
- Adaptive treatment allocation and the multi-armed bandit problem
- An information-theoretic analysis of Thompson sampling
- An informational approach to the global optimization of expensive-to-evaluate functions
- Asymptotically Efficient Adaptive Choice of Control Laws inControlled Markov Chains
- Asymptotically efficient adaptive allocation rules
- Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space
- Asymptotically efficient adaptive allocation schemes for controlled i.i.d. processes: finite parameter space
- Bayesian experimental design: A review
- Bisection search with noisy responses
- Computing a classic index for finite-horizon bandits
- Dynamic assortment optimization with a multinomial logit choice model and capacity constraint
- Dynamic pricing under a general parametric choice model
- Entropy and information theory.
- Finite-time analysis of the multiarmed bandit problem
- Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting
- Kullback-Leibler upper confidence bounds for optimal sequential allocation
- Learning to optimize via posterior sampling
- Linearly parameterized bandits
- Multi-armed bandit allocation indices. With a foreword by Peter Whittle.
- Near-optimal regret bounds for reinforcement learning
- On a Measure of the Information Provided by an Experiment
- Partial monitoring -- classification, regret bounds, and algorithms
- Refined knowledge-gradient policy for learning probabilities
- Regret analysis of stochastic and nonstochastic multi-armed bandit problems
- Regret in online combinatorial optimization
- The knowledge gradient algorithm for a general class of online learning problems
- Thompson sampling: an asymptotically optimal finite-time analysis
- Twenty questions with noise: Bayes optimal policies for entropy loss
- \(X\)-armed bandits
Cited in
(16)- Reinforcement Learning, Bit by Bit
- Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization
- Learning in combinatorial optimization: what and how to explore
- Dynamic programs with shared resources and signals: dynamic fluid policies and asymptotic optimality
- Efficient learning for clustering and optimizing context-dependent designs
- Optimal learning in linear regression with combinatorial feature selection
- Infomax strategies for an optimal balance between exploration and exploitation
- No-regret Bayesian optimization with unknown hyperparameters
- Approximating the operating characteristics of Bayesian uncertainty directed trial designs
- Exploratory distributions for convex functions
- scientific article; zbMATH DE number 7306856 (Why is no real title available?)
- Simple Bayesian algorithms for best-arm identification
- Occupancy information ratio: infinite-horizon, information-directed, parameterized policy search
- Optimistic Gittins Indices
- Online team formation under different synergies
- Deep exploration via randomized value functions
This page was built for publication: Learning to optimize via information-directed sampling
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q4969321)