Breaking the Span Assumption Yields Fast Finite-Sum Minimization

From MaRDI portal
Publication: 6301845

arXiv: 1805.07786
MaRDI QID: Q6301845
FDO: Q6301845


Authors: Robert Hannah, Yanli Liu, Daniel O'Connor, Wotao Yin


Publication date: 20 May 2018

Abstract: In this paper, we show that SVRG and SARAH can be modified to be fundamentally faster than all of the other standard algorithms that minimize the sum of $n$ smooth functions, such as SAGA, SAG, SDCA, and SDCA without duality. Most finite-sum algorithms follow what we call the "span assumption": their updates are in the span of a sequence of component gradients chosen in a random IID fashion. In the big data regime, where the condition number $\kappa = \mathcal{O}(n)$, the span assumption prevents algorithms from converging to an approximate solution of accuracy $\epsilon$ in fewer than $n\ln(1/\epsilon)$ iterations. SVRG and SARAH do not follow the span assumption since they are updated with a hybrid of full-gradient and component-gradient information. We show that because of this, they can be up to $\Omega(1 + (\ln(n/\kappa))_+)$ times faster. In particular, to obtain an accuracy $\epsilon = 1/n^{\alpha}$ for $\kappa = n^{\beta}$ and $\alpha, \beta \in (0,1)$, modified SVRG requires $\mathcal{O}(n)$ iterations, whereas algorithms that follow the span assumption require $\mathcal{O}(n\ln(n))$ iterations. Moreover, we present lower-bound results that show this speedup is optimal, and provide analysis to help explain why it exists. With the understanding that the span assumption is a point of weakness of finite-sum algorithms, future work may purposefully exploit this to yield even faster algorithms in the big data regime.
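
The hybrid update that lets SVRG step outside the span assumption is easy to see in code. Below is a minimal sketch of standard SVRG (not the paper's modified variant) for minimizing $\frac{1}{n}\sum_{i=1}^{n} f_i(x)$: the full gradient computed at each reference point anchors the inner stochastic steps, which is exactly the ingredient a span-restricted method such as SAGA or SAG lacks. The function names, step size, epoch counts, and toy least-squares problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def svrg(grad_i, x0, n, lr=0.05, epochs=20, inner_steps=None):
    """Minimal SVRG sketch (illustrative, not the paper's modified variant).

    Each epoch computes a full gradient at a reference point x_ref and uses
    it to correct the inner stochastic steps, so the iterates leave the span
    of the sampled component gradients.
    """
    rng = np.random.default_rng(0)
    x = x0.copy()
    m = inner_steps if inner_steps is not None else n  # O(n) inner steps per epoch is a common choice
    for _ in range(epochs):
        x_ref = x.copy()
        # Full-gradient anchor: the hybrid ingredient that breaks the span assumption.
        full_grad = np.mean([grad_i(i, x_ref) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced direction: component gradient at x, corrected by
            # the reference component gradient and the full gradient.
            x = x - lr * (grad_i(i, x) - grad_i(i, x_ref) + full_grad)
    return x

# Toy least-squares instance: f_i(x) = 0.5 * (A[i] @ x - b[i])**2.
rng = np.random.default_rng(1)
n, d = 200, 10
A, b = rng.normal(size=(n, d)), rng.normal(size=n)
x_hat = svrg(lambda i, x: (A[i] @ x - b[i]) * A[i], np.zeros(d), n)
```

A purely span-restricted method would build its iterates only from the sampled `grad_i(i, x)` terms; it is the periodically refreshed `full_grad` term that underlies the $\Omega(1 + (\ln(n/\kappa))_+)$ speedup analyzed in the paper.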