Robust and resource-efficient identification of two hidden layer neural networks

DOI 10.1007/S00365-021-09550-5

MaRDI QID Q2117339

Authors Massimo Fornasier, Timo Klock, Michael Rauchensteiner

Publication date 21 March 2022

Published in Constructive Approximation

Copyright license Creative Commons Attribution 4.0 International

Full work available at URL https://arxiv.org/abs/1907.00485

Keywords: frames; deep neural networks; active sampling; deparametrization; nonconvex optimization on matrix spaces; exact identifiability

Mathematics Subject Classification

Artificial neural networks and deep learning (68T07); Nonconvex programming, global optimization (90C26); Algorithms for approximation of functions (65D15)

Abstract: We address the structure identification and the uniform approximation of fully nonlinear neural networks with two hidden layers, of the type $f(x) = \mathbf{1}^{T} h(B^{T} g(A^{T} x))$ on $\mathbb{R}^{d}$, from a small number of query samples. We approach the problem by actively sampling finite-difference approximations to Hessians of the network. Gathering several approximate Hessians allows one to reliably approximate the matrix subspace $\mathcal{W}$ spanned by the symmetric tensors $a_{1} \otimes a_{1}, \dots, a_{m_{0}} \otimes a_{m_{0}}$ formed by the weights of the first layer, together with the entangled symmetric tensors $v_{1} \otimes v_{1}, \dots, v_{m_{1}} \otimes v_{m_{1}}$ formed by suitable combinations of the weights of the first and second layer as $v_{\ell} = A G_{0} b_{\ell} / \| A G_{0} b_{\ell} \|_{2}$, $\ell \in [m_{1}]$, for a diagonal matrix $G_{0}$ depending on the activation functions of the first layer. The identification of the rank-1 symmetric tensors within $\mathcal{W}$ is then performed by solving a robust nonlinear program. We provide guarantees of stable recovery under a posteriori verifiable conditions. We further address the correct attribution of approximate weights to the first or second layer. By using a suitably adapted gradient descent iteration, it is then possible to estimate, up to intrinsic symmetries, the shifts of the activation functions of the first layer and to compute the matrix $G_{0}$ exactly. Our method of identifying the weights of the network is fully constructive, with quantifiable sample complexity, and therefore helps to reduce the black-box nature of the network training phase. We corroborate our theoretical results by extensive numerical experiments.
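
A short chain-rule computation shows why Hessians carry the subspace $\mathcal{W}$. Writing $f(x) = \sum_{\ell} h_{\ell}(z_{\ell})$ with $z_{\ell} = b_{\ell}^{T} g(A^{T} x)$, entrywise activations, and $G(x) = \mathrm{diag}(g_{1}'(a_{1}^{T} x), \dots, g_{m_{0}}'(a_{m_{0}}^{T} x))$, one gets $\nabla^{2} f(x) = \sum_{i} \big(\sum_{\ell} h_{\ell}'(z_{\ell}) b_{i\ell}\big) g_{i}''(a_{i}^{T} x)\, a_{i} \otimes a_{i} + \sum_{\ell} h_{\ell}''(z_{\ell})\, (A G(x) b_{\ell}) \otimes (A G(x) b_{\ell})$. Assuming $G_{0} = G(0) = \mathrm{diag}(g_{1}'(0), \dots, g_{m_{0}}'(0))$ (our reading of the abstract), $\nabla^{2} f(0)$ lies exactly in $\mathcal{W}$, while Hessians at other points lie in it only approximately. The minimal Python sketch below checks this numerically with a central finite-difference Hessian. The toy network (shifted tanh activations, random unit-norm weights) and the helper names `make_network`, `fd_hessian`, `w_basis` and `relative_residual` are hypothetical illustrations, not the authors' code.

```python
import numpy as np

# Hypothetical toy instance of f(x) = 1^T h(B^T g(A^T x)) with entrywise shifted
# tanh activations g_i(t) = tanh(t + theta_i), h_l(t) = tanh(t + tau_l).
def make_network(d, m0, m1, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((d, m0))
    A /= np.linalg.norm(A, axis=0)               # unit-norm first-layer weights a_i
    B = rng.standard_normal((m0, m1))
    B /= np.linalg.norm(B, axis=0)               # unit-norm second-layer weights b_l
    theta = rng.uniform(0.5, 1.5, size=m0)       # first-layer shifts (assumed)
    tau = rng.uniform(0.5, 1.5, size=m1)         # second-layer shifts (assumed)

    def f(x):
        return float(np.sum(np.tanh(B.T @ np.tanh(A.T @ x + theta) + tau)))

    return f, A, B, theta

def fd_hessian(f, x, eps=1e-3):
    """Central finite-difference Hessian of f at x (2 d (d + 1) queries of f)."""
    d = x.size
    E = eps * np.eye(d)
    H = np.empty((d, d))
    for j in range(d):
        for k in range(j, d):
            H[j, k] = (f(x + E[j] + E[k]) - f(x + E[j] - E[k])
                       - f(x - E[j] + E[k]) + f(x - E[j] - E[k])) / (4 * eps**2)
            H[k, j] = H[j, k]                    # Hessian is symmetric
    return H

def w_basis(A, B, theta):
    """Orthonormal basis (columns) of vec(W), W = span{a_i a_i^T, v_l v_l^T}."""
    G0 = 1.0 - np.tanh(theta) ** 2               # diagonal of G_0: g_i'(0) for tanh(t + theta_i)
    V = A @ (G0[:, None] * B)                    # columns A G_0 b_l
    V /= np.linalg.norm(V, axis=0)               # v_l = A G_0 b_l / ||A G_0 b_l||_2
    gens = np.hstack([A, V])
    M = np.column_stack([np.outer(w, w).ravel() for w in gens.T])
    Q, _ = np.linalg.qr(M)                       # reduced QR: orthonormal columns
    return Q

def relative_residual(H, Q):
    """Relative Frobenius distance of H from the span of Q's columns."""
    h = H.ravel()
    return float(np.linalg.norm(h - Q @ (Q.T @ h)) / np.linalg.norm(h))

f, A, B, theta = make_network(d=8, m0=4, m1=2)
Q = w_basis(A, B, theta)
H0 = fd_hessian(f, np.zeros(8))
res0 = relative_residual(H0, Q)                  # small: only finite-difference error remains
```

At a point $x$ away from the origin, $G(x) \neq G_{0}$, so the same residual is generally no longer at the level of the finite-difference error. This sketch spends $O(d^{2})$ queries per Hessian, so it is a correctness check rather than a resource-efficient sampling scheme.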

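For the identification step, one nonlinear program of the kind used in this line of work (assumed here; the abstract does not state the program) maximizes $\| P_{\mathcal{W}}(u \otimes u) \|_{F}^{2}$ over unit vectors $u$. For unit $u$ its value is at most 1, with equality exactly when $u \otimes u \in \mathcal{W}$. The minimal Python sketch below runs projected gradient ascent on the sphere from several random starts and keeps the best candidate. To stay self-contained, it builds $\mathcal{W}$ from hypothetical generic unit vectors $w_{1}, \dots, w_{m}$ with $m \le d$ instead of from estimated Hessians. The entangled case of the abstract, where each $v_{\ell}$ is a combination of the $a_{i}$, is harder and is not covered by this toy setup. The step size `gamma`, the iteration count and the number of restarts are illustrative choices.

```python
import numpy as np

def subspace_basis(vectors):
    """Orthonormal basis of vec(span{w_j w_j^T}) for the rows w_j of `vectors`."""
    M = np.column_stack([np.outer(w, w).ravel() for w in vectors])
    Q, _ = np.linalg.qr(M)
    return Q

def objective(u, Q):
    """||P_W(u u^T)||_F^2; equals 1 for a unit u exactly when u u^T lies in W."""
    return float(np.sum((Q.T @ np.outer(u, u).ravel()) ** 2))

def rank_one_ascent(Q, d, rng, gamma=2.0, iters=500):
    """Projected gradient ascent of ||P_W(u u^T)||_F^2 over the unit sphere."""
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    for _ in range(iters):
        P = (Q @ (Q.T @ np.outer(u, u).ravel())).reshape(d, d)  # P_W(u u^T)
        u = u + gamma * P @ u       # ascent step; gradient is 4 P_W(u u^T) u
        u /= np.linalg.norm(u)      # project back onto the unit sphere
    return u

rng = np.random.default_rng(1)
d, m = 30, 5
W_vecs = rng.standard_normal((m, d))
W_vecs /= np.linalg.norm(W_vecs, axis=1, keepdims=True)  # hypothetical generators w_j
Q = subspace_basis(W_vecs)

# Multi-start: keep the candidate with the largest objective value.
candidates = [rank_one_ascent(Q, d, rng) for _ in range(20)]
best = max(candidates, key=lambda u: objective(u, Q))
best_value = objective(best, Q)
best_match = float(np.max(np.abs(W_vecs @ best)))  # |<w_j, u>| for the closest generator
```

The constant 4 in the gradient is absorbed into `gamma`. Each generator $\pm w_{j}$ is a fixed point of the update, since $P_{\mathcal{W}}(w_{j} \otimes w_{j}) = w_{j} \otimes w_{j}$, which is why the best restart typically returns one of them up to sign.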
