Wide neural networks of any depth evolve as linear models under gradient descent *

From MaRDI portal
Publication:5857449

DOI: 10.1088/1742-5468/abc62b
OpenAlex: W2970217468
MaRDI QID: Q5857449

Jeffrey Pennington, Samuel S. Schoenholz, Roman Novak, Yasaman Bahri, Lechao Xiao, Jascha Sohl-Dickstein, Jaehoon Lee

Publication date: 1 April 2021

Published in: Journal of Statistical Mechanics: Theory and Experiment

Full work available at URL: https://arxiv.org/abs/1902.06720
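The linearization named in the title is the first-order Taylor expansion of the network output in its parameters around initialization; the sketch below states it in notation chosen here for illustration (the symbols f, theta_0, and Theta are assumptions of this summary, not taken from this record). In the wide-network limit, gradient-descent training of f is described by the dynamics of the linearized model, with the neural tangent kernel Theta fixed at its value at initialization.

\[
  f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^{\top}(\theta - \theta_0),
  \qquad
  \Theta(x,x') = \nabla_\theta f(x;\theta_0)^{\top}\,\nabla_\theta f(x';\theta_0).
\]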




Related Items (37)

Training a Neural-Network-Based Surrogate Model for Aerodynamic Optimisation Using a Gaussian Process
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
Surprises in high-dimensional ridgeless least squares interpolation
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
When and why PINNs fail to train: a neural tangent kernel perspective
Locality defeats the curse of dimensionality in convolutional teacher–student scenarios*
Adaptive and Implicit Regularization for Matrix Completion
Improved architectures and training algorithms for deep operator networks
Drop-activation: implicit parameter reduction and harmonious regularization
Data science applications to string theory
Adversarial examples in random neural networks with general activations
A rigorous framework for the mean field limit of multilayer neural networks
Priors in Bayesian Deep Learning: A Review
Free dynamics of feature learning processes
Deep stable neural networks: large-width asymptotics and convergence rates
Graph-based sparse Bayesian broad learning system for semi-supervised learning
Weighted neural tangent kernel: a generalized and improved network-induced kernel
Deep Q-learning: A robust control approach
An asynchronous parallel high-throughput model calibration framework for crystal plasticity finite element constitutive models
On the spectral bias of coupled frequency predictor-corrector triangular DNN: the convergence analysis
Some models are useful, but how do we know which ones? Towards a unified Bayesian model taxonomy
Affine-invariant ensemble transform methods for logistic regression
Landscape and training regimes in deep learning
On the eigenvector bias of Fourier feature networks: from regression to solving multi-scale PDEs with physics-informed neural networks
A statistician teaches deep learning
Unnamed Item
Plateau Phenomenon in Gradient Descent Training of RELU Networks: Explanation, Quantification, and Avoidance
Linearized two-layers neural networks in high dimension
On the Effect of the Activation Function on the Distribution of Hidden Nodes in a Deep Network
Unnamed Item
Unnamed Item
Multilevel Fine-Tuning: Closing Generalization Gaps in Approximation of Solution Maps under a Limited Budget for Training Data
Machine unlearning: linear filtration for logit-based classifiers
The interpolation phase transition in neural networks: memorization and generalization under lazy training
Provably training overparameterized neural network classifiers with non-convex constraints
Understanding approximate Fisher information for fast convergence of natural gradient descent in wide neural networks*
Discriminative clustering with representation learning with any ratio of labeled to unlabeled data


Uses Software


Cites Work


This page was built for publication: Wide neural networks of any depth evolve as linear models under gradient descent *