Stochastic convex optimization with bandit feedback

DOI: 10.1137/110850827
zbMATH Open: 1270.90107
arXiv: 1107.1744
OpenAlex: W2567882221
MaRDI QID: Q5300524
FDO: Q5300524

Authors: Alekh Agarwal, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Alexander Rakhlin

Publication date: 27 June 2013

Published in: SIAM Journal on Optimization

Abstract: This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \mathcal{X}$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\tilde{O}(\mathrm{poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
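The feedback model and the regret described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's ellipsoid-based algorithm; it only shows how noisy function values are observed and how regret is tallied, using a naive uniform-sampling baseline. The Gaussian noise model, the test function `f`, the interval [0, 1], and all function names are assumptions made for illustration, and regret is read here as the sum over queries of f(x_t) minus the optimal value.

```python
import random


def f(x):
    """Illustrative objective (assumed): convex, 1-Lipschitz, minimum 0 at x = 0.3."""
    return abs(x - 0.3)


def noisy_oracle(func, x, sigma, rng):
    """Bandit feedback: return func(x) plus zero-mean Gaussian noise (assumed noise model)."""
    return func(x) + rng.gauss(0.0, sigma)


def cumulative_regret(func, queries, f_opt):
    """Sum over all query points of func(x_t) - f_opt."""
    return sum(func(x) - f_opt for x in queries)


def uniform_baseline(lo, hi, horizon, rng):
    """Naive baseline (not the paper's method): query uniformly random points in [lo, hi]."""
    return [rng.uniform(lo, hi) for _ in range(horizon)]


rng = random.Random(0)
T = 1000
queries = uniform_baseline(0.0, 1.0, T, rng)
# The learner only ever sees these noisy values, never f itself.
observations = [noisy_oracle(f, x, 0.1, rng) for x in queries]
# Regret is computed with the true f and its minimum value, for evaluation only.
regret = cumulative_regret(f, queries, 0.0)
```

Because this baseline never adapts to its observations, its regret grows linearly in T, which is the behavior the abstract's $\sqrt{T}$-type bounds improve upon.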

Full work available at URL: https://arxiv.org/abs/1107.1744

zbMATH Keywords

derivative-free optimization; ellipsoid method; bandit optimization

Mathematics Subject Classification ID

Learning and adaptive systems in artificial intelligence (68T05)
Convex programming (90C25)
Derivative-free methods and methods using generalized derivatives (90C56)

Cited In (24)
