Over-parametrized deep neural networks minimizing the empirical risk do not generalize well (Q1983625)

The authors contribute to the theoretical understanding of the convergence and generalization properties of neural networks. They focus on fully connected neural networks with the sigmoidal squasher activation function in a regression setting. They show that over-parametrization makes it possible to find the global minimum of the empirical risk on the training data. They then give a lower bound for estimates that achieve a minimal error on the training data with high probability, and prove that such networks do not generalize well to new data. Specifically, they demonstrate that these networks, although they minimize the empirical risk, do not achieve the optimal rate of convergence for the estimation of smooth regression functions. Their Theorem 2 shows that any estimate (such as those stated explicitly in Theorem 1) that minimizes the error on the training data with high probability does not, in general, generalize well to new data. (They assume that the distributions of \(X\) concentrate on finite sets.) The main takeaway from this paper is a somewhat negative result for this kind of fully connected neural network architecture with this type of sigmoidal activation function: how an over-parametrized neural network is trained does not matter, because it cannot reach the optimal minimax rate of convergence when it achieves a minimal empirical risk. In conclusion, it is not clear whether an over-parametrized neural network that minimizes the empirical \(L_{2}\) risk generalizes well to new data.
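
For orientation, the two quantities contrasted in the review can be written in standard nonparametric-regression notation. This is only a sketch: the symbols \(m\), \(m_n\), \(\mathcal{F}_n\), \(p\), \(C\) and \(d\) are assumptions introduced here for illustration and need not match the paper's notation. Given data \((X_1,Y_1),\dots,(X_n,Y_n)\), regression function \(m(x)=\mathbf{E}[Y \mid X=x]\) and a class \(\mathcal{F}_n\) of networks, an empirical \(L_{2}\) risk minimizer and its \(L_{2}\) error are

\[
m_n \in \operatorname*{arg\,min}_{f \in \mathcal{F}_n} \frac{1}{n} \sum_{i=1}^{n} |f(X_i) - Y_i|^2,
\qquad
\mathbf{E} \int |m_n(x) - m(x)|^2 \, \mathbf{P}_X(dx).
\]

For a \((p,C)\)-smooth regression function on \(\mathbb{R}^d\), the classical optimal minimax rate for the second quantity is \(n^{-2p/(2p+d)}\). The negative result summarized above concerns estimates that attain the first minimum, at least with high probability, without attaining this rate.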

reviewed by

Pablo Suárez-Serrato

zbMATH Keywords

neural networks

nonparametric regression

over-parametrization

rate of convergence