Łukasiewicz logic and artificial neural networks (Q2215899)

Language: English
Label: Łukasiewicz logic and artificial neural networks
Description: scientific article

    Statements

    Łukasiewicz logic and artificial neural networks (English)
    15 December 2020
    While deep neural networks have been very successful in solving many complex problems, they have a serious limitation: they are a black box, and their results do not come with any explanation. This paper provides an important step towards generating such explanations.

    It starts with the fact that deep learning approximates any function by a superposition of linear functions and the rectified linear activation function \(s(x)=\max(0,x)\), which ``truncates'' all inputs into the set of non-negative numbers. It is possible to prove that if we re-scale all inputs and the output to the interval \([0,1]\), then we get the same class of approximating functions if we use the slightly different activation function \(s(x)=\min(\max(0,x),1)\), which truncates all its inputs into the interval \([0,1]\). Since we are talking about approximations anyway, we can always safely assume that the coefficients of the corresponding linear functions are rational. Interestingly, the resulting functions \(f(x_1,\ldots,x_m)\) are exactly those that can be obtained from ``fuzzy'' inputs \(x_i\in[0,1]\) by applying the negation \(1-x\), the ``or''-operation \(\min(x+y,1)\), and the additional ``decrease-in-confidence'' operations \(\delta_n(x)\stackrel{\mathrm{def}}{=}x/n\); thus we get a natural logical interpretation of deep-learning results in this ``Łukasiewicz logic''.

    From this viewpoint, the known result that 3-layer neural networks are universal approximators turns out to be related to the similar-sounding fact that any logical formula can be represented in CNF or DNF form. The resulting functions are piecewise linear. The authors generalize the known simple result that linear functions are uniquely determined by their values at finitely many points into a similar result about piecewise linear functions: namely, for each such function \(f\) there exists a finite set of points such that \(f\) is the only function that takes the given values at these points and has the smallest possible number of linear parts.

    Similar results are also proven for the case when, instead of directly applying a deep neural network to the original values \(x_1,\ldots,x_m\), we first perform some nonlinear transformations \(x_i\mapsto x'_i=h_i(x_i)\).

    For the entire collection see [Zbl 1448.68033].
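    To make this correspondence concrete, here is a minimal sketch (not taken from the paper; all function names are illustrative) that checks numerically, on random points of the unit square, how the clipped activation \(s(x)=\min(\max(0,x),1)\) applied to rational affine combinations of \([0,1]\)-valued inputs can be rewritten using the Łukasiewicz negation, the ``or''-operation, and the operations \(\delta_n\):

```python
import random

def clip(x):
    """Clipped ReLU s(x) = min(max(0, x), 1): truncates into [0, 1]."""
    return min(max(0.0, x), 1.0)

def neg(x):
    """Lukasiewicz negation 1 - x."""
    return 1.0 - x

def lor(x, y):
    """Lukasiewicz 'or'-operation: min(x + y, 1)."""
    return min(x + y, 1.0)

def land(x, y):
    """Strong conjunction max(x + y - 1, 0), derived from neg and lor."""
    return neg(lor(neg(x), neg(y)))

def delta(x, n):
    """'Decrease-in-confidence' operation delta_n(x) = x / n."""
    return x / n

random.seed(0)
for _ in range(1000):
    x, y = random.random(), random.random()
    # s(x + y) is exactly the Lukasiewicz "or"
    assert abs(clip(x + y) - lor(x, y)) < 1e-12
    # s(x - y) = max(x - y, 0) coincides with "x and (not y)"
    assert abs(clip(x - y) - land(x, neg(y))) < 1e-12
    # the rational coefficient 1/2 is expressed by delta_2
    assert abs(clip(x / 2 + y / 2) - lor(delta(x, 2), delta(y, 2))) < 1e-12
print("all identities hold on 1000 random points of the unit square")
```

    For instance, \(s(x+y)=\min(x+y,1)\) is literally the Łukasiewicz ``or'', and the rational coefficient \(1/2\) is absorbed by \(\delta_2\); this is the mechanism behind the logical reading of clipped-ReLU networks described above.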
    neural networks
    Łukasiewicz logic