Make \(\ell_1\) regularization effective in training sparse CNN (Q782914)
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | Make \(\ell_1\) regularization effective in training sparse CNN | scientific article | |
Statements
Make \(\ell_1\) regularization effective in training sparse CNN (English)
29 July 2020
This paper considers the training of sparse deep neural networks. Its aim is to study sparse training algorithms for a special class of deep neural networks, namely convolutional neural networks (CNNs). The paper focuses on network pruning, the most popular compression method owing to its good compatibility and competitive performance. The authors report that the simple dual averaging (SDA) method, with appropriate modifications, can be made highly effective with an \(\ell_1\) regularization for obtaining sparse convolutional neural networks. In particular, by combining it with an \(\ell_1\) regularization, the authors develop the corresponding regularized dual averaging (RDA) method. \textit{L. Xiao} [J. Mach. Learn. Res. 11, 2543--2596 (2010; Zbl 1242.62011)] originally developed the RDA method specifically for convex problems. The RDA method proposed by the authors in this paper (using proper initialization and adaptivity) with an \(\ell_1\) regularization achieves state-of-the-art sparsity for highly non-convex CNNs compared to other weight pruning methods, without compromising generalization accuracy.
sparse optimization
\(\ell_1\) regularization
dual averaging