Convergence of a Relaxed Variable Splitting Method for Learning Sparse Neural Networks via $\ell_1$, $\ell_0$, and transformed-$\ell_1$ Penalties

From MaRDI portal
Publication: 6311145

arXiv: 1812.05719
MaRDI QID: Q6311145
FDO: Q6311145


Authors: Thu Dinh, Jack Xin


Publication date: 13 December 2018

Abstract: Sparsification of neural networks is one of the effective complexity reduction methods to improve efficiency and generalizability. We consider the problem of learning a one-hidden-layer convolutional neural network with ReLU activation function via gradient descent under sparsity-promoting penalties. It is known that when the input data is Gaussian distributed, no-overlap networks (without penalties) in regression problems with ground truth can be learned in polynomial time with high probability. We propose a relaxed variable splitting method integrating thresholding and gradient descent to overcome the non-smoothness in the loss function. The sparsity in the network weights is realized during the optimization (training) process. We prove that under $\ell_1$, $\ell_0$, and transformed-$\ell_1$ penalties, no-overlap networks can be learned with high probability, and the iterative weights converge to a global limit which is a transformation of the true weight under a novel thresholding operation. Numerical experiments confirm the theoretical findings and compare the accuracy and sparsity trade-offs among the penalties.
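The relaxed variable splitting method named in the abstract alternates a thresholding step with a gradient-descent step on a split objective. As a concrete illustration, here is a minimal NumPy sketch for the $\ell_1$ case, assuming the split objective $f(w) + \lambda\|u\|_1 + (\beta/2)\|w - u\|^2$ with soft thresholding as the proximal step; the function and parameter names (`rvsm_l1`, `lam`, `beta`, `eta`) and the quadratic toy loss are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rvsm_l1(w0, grad_f, lam=1e-2, beta=1.0, eta=1e-2, iters=1000):
    """Sketch of a relaxed variable splitting iteration for an l1 penalty.

    Approximately minimizes f(w) + lam*||u||_1 + (beta/2)*||w - u||^2
    by alternating a thresholding update in u and a gradient step in w.
    """
    w = w0.copy()
    for _ in range(iters):
        # u-update: thresholding (proximal step on the penalty)
        u = soft_threshold(w, lam / beta)
        # w-update: gradient step on the smooth loss plus the coupling term
        w = w - eta * (grad_f(w) + beta * (w - u))
    return w, u

# Toy usage: sparse recovery with a quadratic loss f(w) = 0.5*||A w - b||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
w_true = np.zeros(20)
w_true[:3] = [1.5, -2.0, 1.0]
b = A @ w_true
w, u = rvsm_l1(np.zeros(20), lambda w: A.T @ (A @ w - b),
               lam=0.1, beta=5.0, eta=1e-3, iters=5000)
```

For the $\ell_0$ penalty the proximal step would be hard thresholding rather than soft thresholding, and the transformed-$\ell_1$ penalty likewise admits its own closed-form thresholding operator; the overall alternating structure stays the same.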