DOI: 10.1073/pnas.1903070116
zbMATH: 1433.68325
arXiv: 1812.11118
OpenAlex: W2963518130
Wikidata: Q92153099 (Scholia: Q92153099)
MaRDI QID: Q5218544
Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal
Publication date: 4 March 2020
Published in: Proceedings of the National Academy of Sciences
Full work available at URL: https://arxiv.org/abs/1812.11118
Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks ⋮
Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks ⋮
Deep learning: a statistical viewpoint ⋮
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation ⋮
Machine learning from a continuous viewpoint. I ⋮
Deep learning for inverse problems. Abstracts from the workshop held March 7--13, 2021 (hybrid meeting) ⋮
Surprises in high-dimensional ridgeless least squares interpolation ⋮
Counterfactual inference with latent variable and its application in mental health care ⋮
Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration ⋮
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks ⋮
Neural network training using \(\ell_1\)-regularization and bi-fidelity data ⋮
Learning curves of generic features maps for realistic datasets with a teacher-student model* ⋮
Deep networks on toroids: removing symmetries reveals the structure of flat regions in the landscape geometry* ⋮
A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers ⋮
Dimensionality Reduction, Regularization, and Generalization in Overparameterized Regressions ⋮
Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting, and Regularization ⋮
Prevalence of neural collapse during the terminal phase of deep learning training ⋮
Overparameterized neural networks implement associative memory ⋮
Benign overfitting in linear regression ⋮
The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima ⋮
On Transversality of Bent Hyperplane Arrangements and the Topological Expressiveness of ReLU Neural Networks ⋮
Scientific machine learning through physics-informed neural networks: where we are and what's next ⋮
Overparameterization and Generalization Error: Weighted Trigonometric Interpolation ⋮
Benefit of Interpolation in Nearest Neighbor Algorithms ⋮
On the Benefit of Width for Neural Networks: Disappearance of Basins ⋮
Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits ⋮
HARFE: hard-ridge random feature expansion ⋮
SCORE: approximating curvature information under self-concordant regularization ⋮
Deep empirical risk minimization in finance: Looking into the future ⋮
High dimensional binary classification under label shift: phase transition and regularization ⋮
Large-dimensional random matrix theory and its applications in deep learning and wireless communications ⋮
On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions ⋮
Free dynamics of feature learning processes ⋮
A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors ⋮
Reliable extrapolation of deep neural operators informed by physics or sparse observations ⋮
Learning algebraic models of quantum entanglement ⋮
Re-thinking high-dimensional mathematical statistics. Abstracts from the workshop held May 15--21, 2022 ⋮
Is deep learning a useful tool for the pure mathematician? ⋮
Also for \(k\)-means: more data does not imply better performance ⋮
Random neural networks in the infinite width limit as Gaussian processes ⋮
Stability of the scattering transform for deformations with minimal regularity ⋮
High-Dimensional Analysis of Double Descent for Linear Regression with Random Projections ⋮
On the robustness of sparse counterfactual explanations to adverse perturbations ⋮
On the influence of over-parameterization in manifold based surrogates and deep neural operators ⋮
An instance-dependent simulation framework for learning with label noise ⋮
On lower bounds for the bias-variance trade-off ⋮
Benign Overfitting and Noisy Features ⋮
Learning ability of interpolating deep convolutional neural networks ⋮
The mathematics of artificial intelligence ⋮
On the properties of bias-variance decomposition for kNN regression ⋮
Discussion of: ``Nonparametric regression using deep neural networks with ReLU activation function'' ⋮
Optimization for deep learning: an overview ⋮
Landscape and training regimes in deep learning ⋮
Over-parametrized deep neural networks minimizing the empirical risk do not generalize well ⋮
A statistician teaches deep learning ⋮
Shallow neural networks for fluid flow reconstruction with limited sensors ⋮
A generic physics-informed neural network-based constitutive model for soft biological tissues ⋮
A selective overview of deep learning ⋮
Linearized two-layers neural networks in high dimension ⋮
The Random Feature Model for Input-Output Maps between Banach Spaces ⋮
High-dimensional dynamics of generalization error in neural networks ⋮
Generalization Error of Minimum Weighted Norm and Kernel Interpolation ⋮
Dimension independent excess risk by stochastic gradient descent ⋮
Implicit Regularization and Momentum Algorithms in Nonlinearly Parameterized Adaptive Control and Prediction ⋮
Precise statistical analysis of classification accuracies for adversarial training ⋮
On the robustness of minimum norm interpolators and regularized empirical risk minimizers ⋮
Scaling description of generalization with number of parameters in deep learning ⋮
A Multi-resolution Theory for Approximating Infinite-p-Zero-n: Transitional Inference, Individualized Predictions, and a World Without Bias-Variance Tradeoff ⋮
Large scale analysis of generalization error in learning using margin based classification methods ⋮
AdaBoost and robust one-bit compressed sensing ⋮
A Unifying Tutorial on Approximate Message Passing ⋮
Bayesian learning via neural Schrödinger-Föllmer flows ⋮
Understanding neural networks with reproducing kernel Banach spaces ⋮
The interpolation phase transition in neural networks: memorization and generalization under lazy training ⋮
A sieve stochastic gradient descent estimator for online nonparametric regression in Sobolev ellipsoids ⋮
A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent* ⋮
Generalisation error in learning with random features and the hidden manifold model* ⋮
For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability ⋮
Two Models of Double Descent for Weak Features ⋮
Prediction errors for penalized regressions based on generalized approximate message passing
This page was built for publication: Reconciling modern machine-learning practice and the classical bias–variance trade-off