An Improved Analysis of Stochastic Gradient Descent with Momentum

Publication:6345192

arXiv: 2007.07989
MaRDI QID: Q6345192
FDO: Q6345192

Yanli Liu, Yuan Gao, Wotao Yin

Publication date: 15 July 2020

Abstract: SGD with momentum (SGDM) has been widely applied in many machine learning tasks, and it is often applied with dynamic stepsizes and momentum weights tuned in a stagewise manner. Despite its empirical advantage over SGD, the role of momentum is still unclear in general, since previous analyses of SGDM either provide worse convergence bounds than those of SGD, or assume Lipschitz or quadratic objectives, which fail to hold in practice. Furthermore, the role of dynamic parameters has not been addressed. In this work, we show that SGDM converges as fast as SGD for smooth objectives under both strongly convex and nonconvex settings. We also establish the first convergence guarantee for the multistage setting, and show that the multistage strategy is beneficial for SGDM compared to using fixed parameters. Finally, we verify these theoretical claims by numerical experiments.
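The abstract refers to SGDM run in stages, where each stage uses a fixed stepsize and momentum weight and these parameters change between stages. The following is a minimal sketch of that idea only; it is not the authors' implementation, and the function name multistage_sgdm, the heavy-ball-style momentum buffer, and the particular schedule in the usage example are assumptions made for illustration. The actual experiment code is in the companion repository linked below.

```python
import numpy as np

def multistage_sgdm(grad_fn, x0, stepsizes, momentums, stage_lengths, seed=0):
    """Sketch of multistage SGD with momentum (SGDM).

    grad_fn(x, rng) returns a stochastic gradient at x.
    Stage i runs for stage_lengths[i] iterations with fixed
    stepsize stepsizes[i] and momentum weight momentums[i].
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                      # momentum buffer
    for alpha, beta, n_iters in zip(stepsizes, momentums, stage_lengths):
        for _ in range(n_iters):
            g = grad_fn(x, rng)               # stochastic gradient sample
            v = beta * v + g                  # heavy-ball-style momentum update
            x = x - alpha * v                 # parameter update
    return x

# Illustrative use: noisy quadratic f(x) = 0.5 * ||x||^2, with the stepsize
# halved and the stage length doubled at each stage (an assumed schedule).
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_final = multistage_sgdm(
    noisy_grad,
    x0=np.ones(10),
    stepsizes=[0.5, 0.25, 0.125],
    momentums=[0.9, 0.9, 0.9],
    stage_lengths=[100, 200, 400],
)
```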




Has companion code repository: https://github.com/gao-yuan-hangzhou/improved-analysis-sgdm








