Reconstructing a neural net from its output (Q1344572)

scientific article
    Statements

    Reconstructing a neural net from its output (English)
    8 December 1997
    Summary: Neural nets were originally introduced as highly simplified models of the nervous system. Today they are widely used in technology and studied theoretically by scientists from several disciplines. However, they remain little understood.

    Mathematically, a (feed-forward) neural net consists of (1) a finite sequence of positive integers \((D_0,D_1,\dots,D_L)\), (2) a family of real numbers \((\omega^\ell_{jk})\) defined for \(1\leq\ell\leq L\), \(1\leq j\leq D_\ell\), \(1\leq k\leq D_{\ell-1}\), and (3) a family of real numbers \((\theta^\ell_j)\) defined for \(1\leq\ell\leq L\), \(1\leq j\leq D_\ell\). The sequence \((D_0,D_1,\dots,D_L)\) is called the architecture of the neural net, while the \(\omega^\ell_{jk}\) are called weights and the \(\theta^\ell_j\) thresholds.

    Neural nets are used to compute nonlinear maps from \(\mathbb{R}^N\) to \(\mathbb{R}^M\) by the following construction. We begin by fixing a nonlinear function \(\sigma\) of one variable. Analogy with the nervous system suggests that we take \(\sigma(t)\) asymptotic to constants as \(t\) tends to \(\pm\infty\); a standard choice, which we adopt throughout this paper, is \(\sigma(t)=\tanh(t/2)\). Given an ``input'' \((t_1,\dots,t_{D_0})\in\mathbb{R}^{D_0}\), we define real numbers \(x^\ell_j\) for \(0\leq\ell\leq L\), \(1\leq j\leq D_\ell\) by induction on \(\ell\). If \(\ell=0\), then \(x^\ell_j=t_j\). If the \(x^{\ell-1}_k\) are known for some \(1\leq\ell\leq L\), then we set \[ x^\ell_j=\sigma\Biggl( \sum_{1\leq k\leq D_{\ell-1}}\omega^\ell_{jk} x^{\ell-1}_k+ \theta^\ell_j\Biggr),\quad\text{for } 1\leq j\leq D_\ell. \] Here \(x^\ell_1,\dots, x^\ell_{D_\ell}\) are interpreted as the output of the \(D_\ell\) ``neurons'' in the \(\ell\)th ``layer'' of the net. The output map of the net is defined as the map \[ \Phi: (t_1,\dots, t_{D_0})\mapsto (x^L_1,\dots, x^L_{D_L}). \]

    In practical applications, one tries to pick the neural net \[ [(D_0,D_1,\dots, D_L),\;(\omega^\ell_{jk}),\;(\theta^\ell_j)] \] so that the output map \(\Phi\) approximates a given map about which we have only imperfect information. The main result of this paper is that under generic conditions, perfect knowledge of the output map \(\Phi\) uniquely specifies the architecture, the weights and the thresholds of a neural net, up to obvious symmetries (such as permuting the neurons within a hidden layer, or, since \(\tanh\) is odd, flipping the signs of the weights and threshold feeding into a hidden neuron together with the signs of the weights leading out of it).
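    To make the construction above concrete, here is a minimal sketch of the output map \(\Phi\) in Python with NumPy. The names sigma and output_map and the list-of-matrices representation of the weights are illustrative conventions, not notation from the paper; the activation \(\sigma(t)=\tanh(t/2)\) is the one the summary fixes.

        import numpy as np

        def sigma(t):
            # The activation adopted in the paper: sigma(t) = tanh(t/2).
            return np.tanh(t / 2.0)

        def output_map(t, weights, thresholds):
            # Forward pass computing Phi(t_1, ..., t_{D_0}).
            # weights[l-1] is the D_l x D_{l-1} matrix (omega^l_{jk});
            # thresholds[l-1] is the length-D_l vector (theta^l_j).
            x = np.asarray(t, dtype=float)    # layer 0: x^0_j = t_j
            for W, theta in zip(weights, thresholds):
                # x^l_j = sigma( sum_k omega^l_{jk} x^{l-1}_k + theta^l_j )
                x = sigma(W @ x + theta)
            return x                          # (x^L_1, ..., x^L_{D_L})

        # Example with architecture (D_0, D_1, D_2) = (2, 3, 1):
        rng = np.random.default_rng(seed=0)
        weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
        thresholds = [rng.standard_normal(3), rng.standard_normal(1)]
        print(output_map([0.5, -1.0], weights, thresholds))

    The sign-flip symmetry mentioned above can be checked numerically with this sketch: negating weights[0][j, :] and thresholds[0][j] for some hidden neuron j, while also negating weights[1][:, j], leaves the output of output_map unchanged.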
    neural nets

    Identifiers