Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations
DOI: 10.1109/TIT.2015.2409256
zbMath: 1359.90155
arXiv: 1312.2139
OpenAlex: W2171830216
MaRDI QID: Q2978646
Andre Wibisono, Michael I. Jordan, Martin J. Wainwright, John C. Duchi
Publication date: 28 April 2017
Published in: IEEE Transactions on Information Theory
Abstract: We consider derivative-free algorithms for stochastic and non-stochastic convex optimization problems that use only function values rather than gradients. Focusing on non-asymptotic bounds on convergence rates, we show that if pairs of function values are available, algorithms for $d$-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most $\sqrt{d}$ in convergence rate over traditional stochastic gradient methods. We establish such results for both smooth and non-smooth cases, sharpening previous analyses that suggested a worse dimension dependence, and extend our results to the case of multiple ($m \ge 2$) evaluations. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, establishing the sharpness of our achievable results up to constant (sometimes logarithmic) factors.
Full work available at URL: https://arxiv.org/abs/1312.2139
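The abstract's core idea, a gradient estimate built from a pair of function evaluations along a random direction and plugged into a stochastic gradient method, can be sketched as follows. This is a minimal Python/NumPy illustration under assumed conventions (symmetric differences, directions drawn uniformly from the unit sphere, $1/\sqrt{t}$ step sizes), not the paper's exact algorithm; the names `two_point_gradient_estimate` and `zeroth_order_sgd` and all parameter values are illustrative.

```python
import numpy as np


def two_point_gradient_estimate(f, x, delta=1e-4, rng=None):
    """Estimate the gradient of f at x from two function evaluations.

    Draws a direction u uniformly from the unit sphere and forms
    d * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u, an unbiased
    estimate of the gradient of a smoothed version of f.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # normalize to the unit sphere
    diff = (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta)
    return d * diff * u


def zeroth_order_sgd(f, x0, steps=1000, step_size=0.1, delta=1e-4, rng=None):
    """Stochastic gradient descent driven only by function evaluations."""
    x = np.asarray(x0, dtype=float).copy()
    rng = np.random.default_rng() if rng is None else rng
    for t in range(1, steps + 1):
        g = two_point_gradient_estimate(f, x, delta=delta, rng=rng)
        x = x - (step_size / np.sqrt(t)) * g  # decaying steps for convex objectives
        # for constrained problems, project x back onto the feasible set here
    return x


if __name__ == "__main__":
    # toy check: minimize a convex quadratic using only function values
    A = np.diag([1.0, 2.0, 3.0])
    f = lambda z: 0.5 * z @ A @ z
    x_hat = zeroth_order_sgd(f, x0=np.ones(3), steps=5000, step_size=0.5)
    print(x_hat)  # should be close to the minimizer at the origin
```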
Convex programming (90C25)
Derivative-free methods and methods using generalized derivatives (90C56)
Related Items (58)
Full-low evaluation methods for derivative-free optimization ⋮ Gradient-free two-point methods for solving stochastic nonsmooth convex optimization problems with small non-random noises ⋮ Stochastic zeroth-order discretizations of Langevin diffusions for Bayesian inference ⋮ New First-Order Algorithms for Stochastic Variational Inequalities ⋮ Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems ⋮ A theoretical and empirical comparison of gradient approximations in derivative-free optimization ⋮ Finite Difference Gradient Approximation: To Randomize or Not? ⋮ Zeroth-order algorithms for stochastic distributed nonconvex optimization ⋮ Zeroth-Order Stochastic Compositional Algorithms for Risk-Aware Learning ⋮ Stochastic online optimization. Single-point and multi-point non-linear multi-armed bandits. Convex and strongly-convex case ⋮ Gradient-free distributed optimization with exact convergence ⋮ An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization ⋮ Decentralized online convex optimization based on signs of relative states ⋮ Personalized optimization with user's feedback ⋮ Improved complexities for stochastic conditional gradient methods under interpolation-like conditions ⋮ On the convergence rate issues of general Markov search for global minimum ⋮ Zeroth-order optimization with orthogonal random directions ⋮ Gradient-free federated learning methods with \(l_1\) and \(l_2\)-randomization for non-smooth convex stochastic optimization problems ⋮ Gradient-free methods for non-smooth convex stochastic optimization with heavy-tailed noise on convex compact ⋮ Non-smooth setting of stochastic decentralized convex optimization problem over time-varying graphs ⋮ A Zeroth-Order Proximal Stochastic Gradient Method for Weakly Convex Stochastic Optimization ⋮ A gradient-free distributed optimization method for convex sum of nonconvex cost functions ⋮ Direct Search Based on Probabilistic Descent in Reduced Spaces ⋮ Distributed Nash equilibrium learning: A second-order proximal algorithm ⋮ Optimistic optimisation of composite objective with exponentiated update ⋮ Zeroth-order feedback optimization for cooperative multi-agent systems ⋮ Federated learning for minimizing nonsmooth convex loss functions ⋮
Re-thinking high-dimensional mathematical statistics. Abstracts from the workshop held May 15--21, 2022 ⋮ Unnamed Item ⋮ Exact optimization: Part I ⋮ Adaptive sampling quasi-Newton methods for zeroth-order stochastic optimization ⋮ Online distributed dual averaging algorithm for multi-agent bandit optimization over time-varying general directed networks ⋮ Unifying framework for accelerated randomized methods in convex optimization ⋮ Zeroth-order nonconvex stochastic optimization: handling constraints, high dimensionality, and saddle points ⋮ Adaptive Catalyst for Smooth Convex Optimization ⋮ Online Statistical Inference for Stochastic Optimization via Kiefer-Wolfowitz Methods ⋮ Convergence guarantees for forward gradient descent in the linear regression model ⋮ Expected decrease for derivative-free algorithms using random subspaces ⋮ Unnamed Item ⋮ An accelerated directional derivative method for smooth stochastic convex optimization ⋮ Small errors in random zeroth-order optimization are imaginary ⋮ On the upper bound for the expectation of the norm of a vector uniformly distributed on the sphere and the phenomenon of concentration of uniform measure on the sphere ⋮ Worst case complexity of direct search under convexity ⋮ Distributed zeroth-order optimization: convergence rates that match centralized counterpart ⋮ Stochastic adversarial noise in the ``black box'' optimization problem ⋮ Accelerated zero-order SGD method for solving the black box optimization problem under ``overparametrization'' condition ⋮ A stochastic subspace approach to gradient-free optimization in high dimensions ⋮ Stochastic zeroth order descent with structured directions ⋮ Gradient-Free Methods with Inexact Oracle for Convex-Concave Stochastic Saddle-Point Problem ⋮ A zeroth order method for stochastic weakly convex optimization ⋮ A new one-point residual-feedback oracle for black-box learning and control ⋮ Derivative-free optimization methods ⋮ Distributed Subgradient-Free Stochastic Optimization Algorithm for Nonsmooth Convex Functions over Time-Varying Networks ⋮ Model-free linear quadratic regulator ⋮ Unnamed Item ⋮ Distributed online bandit optimization under random quantization ⋮ Noisy zeroth-order optimization for non-smooth saddle point problems ⋮ One-point gradient-free methods for smooth and non-smooth saddle-point problems