On the benefit of width for neural networks: disappearance of basins

From MaRDI portal
Publication: 5097010

DOI: 10.1137/21M1394205
zbMATH Open: 1493.68331
arXiv: 1812.11039
OpenAlex: W4289334798
MaRDI QID: Q5097010
FDO: Q5097010


Authors: Dawei Li, Tian Ding, Ruoyu Sun


Publication date: 19 August 2022

Published in: SIAM Journal on Optimization

Abstract: Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove? To understand the benefit of width, it is important to identify the difference between wide and narrow networks. In this work, we prove that, going from narrow to wide networks, there is a phase transition from having sub-optimal basins to having no sub-optimal basins. Specifically, we prove two results: on the positive side, for any continuous activation function, the loss surface of a class of wide networks has no sub-optimal basin, where a "basin" is defined as a set-wise strict local minimum; on the negative side, for a large class of networks with width below a threshold, we construct strict local minima that are not global. Together, these two results establish the phase transition from narrow to wide networks.
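For reference, the following LaTeX sketch formalizes the "basin" notion invoked in the abstract. It is a minimal formalization assuming the standard set-wise definition; the notation (loss $L$, parameter space $\Theta$, set $S$, neighborhood $U$, level $c$) is ours and does not appear in this record.

% A sketch of the set-wise strict local minimum ("basin") notion.
% Assumes the standard definition; L, \Theta, S, U, c are our notation.
\documentclass{article}
\usepackage{amsmath, amssymb, amsthm}
\newtheorem{definition}{Definition}
\begin{document}
\begin{definition}[Set-wise strict local minimum]
Let $L \colon \Theta \to \mathbb{R}$ be a loss function. A nonempty compact
connected set $S \subseteq \Theta$ on which $L$ is constant, say $L \equiv c$
on $S$, is a \emph{set-wise strict local minimum} (a ``basin'') if there is an
open neighborhood $U \supseteq S$ such that $L(\theta) > c$ for every
$\theta \in U \setminus S$. The basin is \emph{sub-optimal} if
$c > \inf_{\theta \in \Theta} L(\theta)$.
\end{definition}
\end{document}

Under this reading, the paper's positive result says that for wide networks no sub-optimal set $S$ of this kind exists, while the negative result exhibits such an $S$ (in fact, a strict local minimizer) for sufficiently narrow networks.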


Full work available at URL: https://arxiv.org/abs/1812.11039






Cited in: 4 documents




