Why does deep and cheap learning work so well?
Abstract: We show how the success of deep learning could depend not only on mathematics but also on physics: although well-known mathematical theorems guarantee that neural networks can approximate arbitrary functions well, the class of functions of practical interest can frequently be approximated through "cheap learning" with exponentially fewer parameters than generic ones. We explore how properties frequently encountered in physics such as symmetry, locality, compositionality, and polynomial log-probability translate into exceptionally simple neural networks. We further argue that when the statistical process generating the data is of a certain hierarchical form prevalent in physics and machine learning, a deep neural network can be more efficient than a shallow one. We formalize these claims using information theory and discuss the relation to the renormalization group. We prove various "no-flattening theorems" showing when efficient linear deep networks cannot be accurately approximated by shallow ones without efficiency loss; for example, we show that n variables cannot be multiplied using fewer than 2^n neurons in a single hidden layer.
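The 2^n lower bound has a companion upper bound in the paper: for a smooth activation whose second derivative at the origin is nonzero, a second-order Taylor expansion shows that four hidden neurons suffice to approximate the product of two inputs arbitrarily well. The snippet below is a minimal numerical sketch of that Taylor-expansion multiplication gadget; the softplus activation, the scaling parameter lam, and the function names are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

# Softplus activation: sigma(u) = ln(1 + e^u), chosen because sigma''(0) = 1/4 != 0,
# which the Taylor-expansion argument requires (an illustrative choice).
def softplus(u):
    return np.log1p(np.exp(u))

def approx_product(x, y, lam=1e-3, sigma=softplus, sigma2_at_0=0.25):
    """Approximate x*y with four hidden neurons.

    Expanding sigma around 0, the constant and linear terms cancel and the
    quadratic terms leave 4 * lam^2 * sigma''(0) * x * y, so dividing by
    that factor recovers the product as lam -> 0.
    """
    s = (sigma(lam * (x + y)) + sigma(-lam * (x + y))
         - sigma(lam * (x - y)) - sigma(-lam * (x - y)))
    return s / (4 * lam**2 * sigma2_at_0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.uniform(-2, 2, size=2)
    print(f"x*y = {x * y:.6f}, 4-neuron approximation = {approx_product(x, y):.6f}")
```

A single hidden layer needs on the order of 2^n such neurons to multiply n inputs, whereas a deep network can chain pairwise gadgets like this one, which is the efficiency gap the no-flattening theorems formalize.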
Recommendations
- The unreasonable effectiveness of deep learning in artificial intelligence
- Why does unsupervised pre-training help deep learning?
- The Science of Deep Learning
- Why do deep convolutional networks generalize so poorly to small image transformations?
- Deep vs. shallow networks: an approximation theory perspective
Cites work
- scientific article (zbMATH DE number 1064082; title unavailable)
- scientific article (zbMATH DE number 1161568; title unavailable)
- scientific article (zbMATH DE number 1405266; title unavailable)
- scientific article (zbMATH DE number 3090543; title unavailable)
- doi:10.1162/153244303765208368
- Approximation by superpositions of a sigmoidal function
- Causal structure of the entanglement renormalization ansatz
- Deep vs. shallow networks: an approximation theory perspective
- Elements of Information Theory
- Gaussian elimination is not optimal
- Hierarchical model of natural images and the origin of scale invariance
- Information Theory and Statistical Mechanics
- Learning deep architectures for AI
- Multilayer feedforward networks are universal approximators
- On Information and Sufficiency
- On the expressive power of deep architectures
- Powers of tensors and fast matrix multiplication
- Solving the quantum many-body problem with artificial neural networks
- Statistical Physics of Fields
- Structural risk minimization over data-dependent hierarchies
Cited in (39)
- Deep learning acceleration of total Lagrangian explicit dynamics for soft tissue mechanics
- On PDE characterization of smooth hierarchical functions computed by neural networks
- Compositional sparsity of learnable functions
- scientific article (zbMATH DE number 7626714; title unavailable)
- Optimization under uncertainty explains empirical success of deep learning heuristics
- Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
- Localized learning: a possible alternative to current deep learning techniques
- On decision regions of narrow deep neural networks
- Explicitly antisymmetrized neural network layers for variational Monte Carlo simulation
- Exact maximum-entropy estimation with Feynman diagrams
- Solving second-order nonlinear evolution partial differential equations using deep learning
- Enforcing constraints for interpolation and extrapolation in generative adversarial networks
- Theoretical issues in deep networks
- On the approximation of functions by tanh neural networks
- Features of the spectral density of a spin system
- Machine learning algorithms based on generalized Gibbs ensembles
- Linearly recurrent autoencoder networks for learning dynamics
- Hierarchical deep learning neural network (HiDeNN): an artificial intelligence (AI) framework for computational science and engineering
- Topology optimization based on deep representation learning (DRL) for compliance and stress-constrained design
- ReLU networks are universal approximators via piecewise linear or constant functions
- Quantifying the separability of data classes in neural networks
- Universal approximation with quadratic deep networks
- A physics-constrained deep residual network for solving the sine-Gordon equation
- On functions computed on trees
- A selective overview of deep learning
- A computational perspective of the role of the thalamus in cognition
- Dunkl analogue of Szász Schurer Beta bivariate operators
- Deep neural networks for rotation-invariance approximation and learning
- Provably scale-covariant continuous hierarchical networks based on scale-normalized differential expressions coupled in cascade
- Deep learning the Ising model near criticality
- Measurement error models: from nonparametric methods to deep neural networks
- Transport analysis of infinitely deep neural network
- Optimal adaptive control of partially uncertain linear continuous-time systems with state delay
- Free dynamics of feature learning processes
- Understanding autoencoders with information theoretic concepts
- Constructive expansion for vector field theories I. Quartic models in low dimensions
- The Modern Mathematics of Deep Learning
- Resolution and relevance trade-offs in deep learning
- Deep distributed convolutional neural networks: universality