Stochastic convex optimization with bandit feedback
From MaRDI portal
Publication:5300524
DOI: 10.1137/110850827 · zbMATH Open: 1270.90107 · arXiv: 1107.1744 · OpenAlex: W2567882221 · MaRDI QID: Q5300524
Alekh Agarwal, Alexander Rakhlin, Dean P. Foster, Sham M. Kakade, Daniel Hsu
Publication date: 27 June 2013
Published in: SIAM Journal on Optimization
Abstract: This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs Õ(poly(d)√T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.
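The feedback model and regret measure from the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's ellipsoid-based algorithm: the oracle, the quadratic test function, and the query sequence below are all hypothetical choices for the example.

```python
import random

def make_noisy_oracle(f, noise_std=0.1, seed=0):
    """Bandit feedback: the learner only sees f(x) plus zero-mean noise,
    never f itself and never a gradient."""
    rng = random.Random(seed)
    return lambda x: f(x) + rng.gauss(0.0, noise_std)

def regret(f, f_star, query_points):
    """Regret after T queries: sum of the true function values at the
    query points minus T times the optimal value f(x*)."""
    return sum(f(x) for x in query_points) - len(query_points) * f_star

# Toy instance: f(x) = x^2 on [-1, 1], minimized at x* = 0 with f(x*) = 0.
f = lambda x: x * x
oracle = make_noisy_oracle(f)
queries = [1.0, 0.5, 0.25, 0.0]           # a hypothetical query sequence
observations = [oracle(x) for x in queries]  # what the algorithm actually sees
print(regret(f, 0.0, queries))            # 1.3125
```

Note that regret is computed from the true function values at the queried points, even though the algorithm itself only ever observes the noisy realizations.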
Full work available at URL: https://arxiv.org/abs/1107.1744
Recommendations
- An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
- Online convex optimization in the bandit setting: gradient descent without a gradient
- Kernel-based Methods for Bandit Convex Optimization
- Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case
- Kernel-based methods for bandit convex optimization
Learning and adaptive systems in artificial intelligence (68T05); Convex programming (90C25); Derivative-free methods and methods using generalized derivatives (90C56)
Cited In (21)
- An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback
- Optimal Policy for Dynamic Assortment Planning Under Multinomial Logit Models
- Interior-Point Methods for Full-Information and Bandit Online Learning
- Kernel-based Methods for Bandit Convex Optimization
- Online bandit convex optimisation with stochastic constraints via two-point feedback
- Technical Note—Nonstationary Stochastic Optimization Under Lp,q-Variation Measures
- Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case
- Technical note: Finite-time regret analysis of Kiefer-Wolfowitz stochastic approximation algorithm and nonparametric multi-product dynamic pricing with unknown demand
- Improved regret for zeroth-order adversarial bandit convex optimisation
- Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
- Exploratory distributions for convex functions
- Random gradient-free minimization of convex functions
- Non-stationary stochastic optimization
- Operations research applications of dichotomous search
- Title not available
- An Efficient Algorithm for Learning with Semi-bandit Feedback
- Kernel-based methods for bandit convex optimization
- Zeroth-order feedback optimization for cooperative multi-agent systems
- Derivative-free optimization methods
- Stochastic continuum-armed bandits with additive models: minimax regrets and adaptive algorithm
- An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization