Kernel-based methods for bandit convex optimization

From MaRDI portal
Publication:4977962

DOI: 10.1145/3055399.3055403 · zbMATH Open: 1370.90175 · arXiv: 1607.03084 · OpenAlex: W2963831922 · MaRDI QID: Q4977962


Authors: Sébastien Bubeck, Yin Tat Lee, Ronen Eldan


Publication date: 17 August 2017

Published in: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing

Abstract: We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5}\sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost of an additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11}\sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors, and the $\log(T)^{\mathrm{poly}(n)}\sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve $\tilde{O}(n^{1.5}\sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $\Omega(n\sqrt{T})$, and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization, this corresponds to the conjecture that the optimal query complexity is of order $n^3/\epsilon^2$.


Full work available at URL: https://arxiv.org/abs/1607.03084
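To illustrate ingredient (iii) of the abstract, the following is a minimal sketch of exponential weights with an increasing learning-rate schedule. It is not the paper's bandit algorithm: the action set, the loss function, the schedule constants, and the full-information feedback used here are all simplifying assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: a fixed convex loss (x - 0.3)^2 minimized over a 1-D grid
# via exponential weights. The increasing learning rate eta_t is the point
# being illustrated; the schedule itself is an arbitrary choice.
grid = np.linspace(-1.0, 1.0, 101)   # discretized action set
T = 500
log_w = np.zeros_like(grid)          # log-weights, for numerical stability
total_loss = 0.0

for t in range(1, T + 1):
    eta = 0.5 * np.sqrt(t / T)       # increasing learning rate (assumed schedule)
    p = np.exp(log_w - log_w.max())  # normalize weights into a distribution
    p /= p.sum()
    x = rng.choice(grid, p=p)        # play a point drawn from the weights
    losses = (grid - 0.3) ** 2       # convex loss evaluated on the whole grid
    total_loss += (x - 0.3) ** 2
    log_w -= eta * losses            # exponential-weights update

# Regret against the best fixed point on the grid
best_fixed = ((grid - 0.3) ** 2).min() * T
regret = total_loss - best_fixed
```

As the weights concentrate on the minimizer, the per-round loss shrinks, so the cumulative regret grows sublinearly in $T$; the paper's actual algorithm replaces the full-information loss vector with kernel-smoothed bandit estimates.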




Cited In (15)





This page was built for publication: Kernel-based methods for bandit convex optimization
