Wide neural networks of any depth evolve as linear models under gradient descent
Publication: 5857449
DOI: 10.1088/1742-5468/abc62b
OpenAlex: W2970217468
MaRDI QID: Q5857449
Jeffrey Pennington, Samuel S. Schoenholz, Roman Novak, Yasaman Bahri, Lechao Xiao, Jascha Sohl-Dickstein, Jaehoon Lee
Publication date: 1 April 2021
Published in: Journal of Statistical Mechanics: Theory and Experiment
Full work available at URL: https://arxiv.org/abs/1902.06720
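For orientation, the result named in the title can be summarized by the first-order linearization developed in the linked preprint; the sketch below paraphrases that result, with the notation ($f_t$, $\theta_t$, $\hat\Theta_0$, $\mathcal{X}$, $\mathcal{Y}$, $\eta$) chosen here for illustration rather than taken from this record.

\[
  f^{\mathrm{lin}}_t(x) \;=\; f_0(x) \;+\; \nabla_\theta f_0(x)\big|_{\theta_0}\,(\theta_t - \theta_0),
  \qquad
  \hat\Theta_0 \;=\; \nabla_\theta f_0(\mathcal{X})\,\nabla_\theta f_0(\mathcal{X})^{\top}.
\]

Under gradient flow on the squared loss with learning rate $\eta$, the linearized outputs on the training inputs $\mathcal{X}$ with targets $\mathcal{Y}$ admit the closed form
\[
  f^{\mathrm{lin}}_t(\mathcal{X}) \;=\; \big(I - e^{-\eta \hat\Theta_0 t}\big)\,\mathcal{Y} \;+\; e^{-\eta \hat\Theta_0 t}\, f_0(\mathcal{X}),
\]
i.e. kernel regression under the empirical neural tangent kernel $\hat\Theta_0$, which the paper shows describes sufficiently wide networks of any depth.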
Related Items (37)
Training a Neural-Network-Based Surrogate Model for Aerodynamic Optimisation Using a Gaussian Process
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
Surprises in high-dimensional ridgeless least squares interpolation
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
When and why PINNs fail to train: a neural tangent kernel perspective
Locality defeats the curse of dimensionality in convolutional teacher–student scenarios
Adaptive and Implicit Regularization for Matrix Completion
Improved architectures and training algorithms for deep operator networks
Drop-activation: implicit parameter reduction and harmonious regularization
Data science applications to string theory
Adversarial examples in random neural networks with general activations
A rigorous framework for the mean field limit of multilayer neural networks
Priors in Bayesian Deep Learning: A Review
Free dynamics of feature learning processes
Deep stable neural networks: large-width asymptotics and convergence rates
Graph-based sparse Bayesian broad learning system for semi-supervised learning
Weighted neural tangent kernel: a generalized and improved network-induced kernel
Deep Q-learning: A robust control approach
An asynchronous parallel high-throughput model calibration framework for crystal plasticity finite element constitutive models
On the spectral bias of coupled frequency predictor-corrector triangular DNN: the convergence analysis
Some models are useful, but how do we know which ones? Towards a unified Bayesian model taxonomy
Affine-invariant ensemble transform methods for logistic regression
Landscape and training regimes in deep learning
On the eigenvector bias of Fourier feature networks: from regression to solving multi-scale PDEs with physics-informed neural networks
A statistician teaches deep learning
Unnamed Item
Plateau Phenomenon in Gradient Descent Training of ReLU Networks: Explanation, Quantification, and Avoidance
Linearized two-layers neural networks in high dimension
On the Effect of the Activation Function on the Distribution of Hidden Nodes in a Deep Network
Unnamed Item
Unnamed Item
Multilevel Fine-Tuning: Closing Generalization Gaps in Approximation of Solution Maps under a Limited Budget for Training Data
Machine unlearning: linear filtration for logit-based classifiers
The interpolation phase transition in neural networks: memorization and generalization under lazy training
Provably training overparameterized neural network classifiers with non-convex constraints
Understanding approximate Fisher information for fast convergence of natural gradient descent in wide neural networks
Discriminative clustering with representation learning with any ratio of labeled to unlabeled data