Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods


DOI: 10.1007/S11590-023-02026-4 · arXiv: 2208.14318 · MaRDI QID: Q6409258


Authors: Jintao Xu, Chenglong Bao, Wenxun Xing


Publication date: 30 August 2022

Abstract: Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning due to its non-convexity and non-separable structure. Alternating minimization (AM) approaches split the composition structure of DNNs and have drawn great interest in the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on non-monotone j-step sufficient decrease conditions and the Kurdyka-Łojasiewicz (KL) property, which relaxes the requirement of designing descent algorithms. We give the detailed local convergence rate when the KL exponent θ varies in [0,1). Moreover, local R-linear convergence is discussed under a stronger j-step sufficient decrease condition.
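
For orientation, the following is a minimal sketch of the two standard ingredients named in the abstract, written in their generic textbook form; the precise conditions used in the paper may differ in detail. A proper lower semicontinuous function f satisfies the Kurdyka-Łojasiewicz (KL) property at a critical point x* with exponent θ ∈ [0,1) if, for all x near x* with f(x*) < f(x) < f(x*) + η,

\[ |f(x) - f(x^*)|^{\theta} \;\le\; C \,\operatorname{dist}\bigl(0, \partial f(x)\bigr), \qquad C > 0, \]

while a standard (monotone, one-step) sufficient decrease condition for iterates x^k reads

\[ f(x^{k+1}) \;\le\; f(x^k) - a \,\| x^{k+1} - x^k \|^2, \qquad a > 0. \]

The non-monotone j-step conditions studied in the paper relax the latter, roughly by requiring such a decrease only over windows of j consecutive iterations rather than at every step, which is consistent with the abstract's remark that the requirement of designing descent algorithms is relaxed.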
