DOI: 10.1073/pnas.1903070116
zbMATH: 1433.68325
arXiv: 1812.11118
OpenAlex: W2963518130
Wikidata: Q92153099 (Scholia: Q92153099)
MaRDI QID: Q5218544
Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal
Publication date: 4 March 2020
Published in: Proceedings of the National Academy of Sciences
Full work available at URL: https://arxiv.org/abs/1812.11118
Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks ⋮
Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks ⋮
Deep learning: a statistical viewpoint ⋮
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation ⋮
Machine learning from a continuous viewpoint. I ⋮
Deep learning for inverse problems. Abstracts from the workshop held March 7--13, 2021 (hybrid meeting) ⋮
Surprises in high-dimensional ridgeless least squares interpolation ⋮
Counterfactual inference with latent variable and its application in mental health care ⋮
Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration ⋮
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks ⋮
Neural network training using \(\ell_1\)-regularization and bi-fidelity data ⋮
Learning curves of generic features maps for realistic datasets with a teacher-student model* ⋮
Deep networks on toroids: removing symmetries reveals the structure of flat regions in the landscape geometry* ⋮
A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers ⋮
Dimensionality Reduction, Regularization, and Generalization in Overparameterized Regressions ⋮
Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting, and Regularization ⋮
Prevalence of neural collapse during the terminal phase of deep learning training ⋮
Overparameterized neural networks implement associative memory ⋮
Benign overfitting in linear regression ⋮
The inverse variance–flatness relation in stochastic gradient descent is critical for finding flat minima ⋮
On Transversality of Bent Hyperplane Arrangements and the Topological Expressiveness of ReLU Neural Networks ⋮
Scientific machine learning through physics-informed neural networks: where we are and what's next ⋮
Overparameterization and Generalization Error: Weighted Trigonometric Interpolation ⋮
Benefit of Interpolation in Nearest Neighbor Algorithms ⋮
On the Benefit of Width for Neural Networks: Disappearance of Basins ⋮
Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits ⋮
HARFE: hard-ridge random feature expansion ⋮
SCORE: approximating curvature information under self-concordant regularization ⋮
Deep empirical risk minimization in finance: Looking into the future ⋮
High dimensional binary classification under label shift: phase transition and regularization ⋮
Large-dimensional random matrix theory and its applications in deep learning and wireless communications ⋮
On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions ⋮
Free dynamics of feature learning processes ⋮
A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors ⋮
Reliable extrapolation of deep neural operators informed by physics or sparse observations ⋮
Learning algebraic models of quantum entanglement ⋮
Re-thinking high-dimensional mathematical statistics. Abstracts from the workshop held May 15--21, 2022 ⋮
Is deep learning a useful tool for the pure mathematician? ⋮
Also for \(k\)-means: more data does not imply better performance ⋮
Random neural networks in the infinite width limit as Gaussian processes ⋮
Stability of the scattering transform for deformations with minimal regularity ⋮
High-Dimensional Analysis of Double Descent for Linear Regression with Random Projections ⋮
On the robustness of sparse counterfactual explanations to adverse perturbations ⋮
On the influence of over-parameterization in manifold based surrogates and deep neural operators ⋮
An instance-dependent simulation framework for learning with label noise ⋮
On lower bounds for the bias-variance trade-off ⋮
Benign Overfitting and Noisy Features ⋮
Learning ability of interpolating deep convolutional neural networks ⋮
The mathematics of artificial intelligence ⋮
On the properties of bias-variance decomposition for kNN regression ⋮
Discussion of: ``Nonparametric regression using deep neural networks with ReLU activation function'' ⋮
Optimization for deep learning: an overview ⋮
Landscape and training regimes in deep learning ⋮
Over-parametrized deep neural networks minimizing the empirical risk do not generalize well ⋮
A statistician teaches deep learning ⋮
Shallow neural networks for fluid flow reconstruction with limited sensors ⋮
A generic physics-informed neural network-based constitutive model for soft biological tissues ⋮
A selective overview of deep learning ⋮
Linearized two-layers neural networks in high dimension ⋮
The Random Feature Model for Input-Output Maps between Banach Spaces ⋮
High-dimensional dynamics of generalization error in neural networks ⋮
Generalization Error of Minimum Weighted Norm and Kernel Interpolation ⋮
Dimension independent excess risk by stochastic gradient descent ⋮
Implicit Regularization and Momentum Algorithms in Nonlinearly Parameterized Adaptive Control and Prediction ⋮
Precise statistical analysis of classification accuracies for adversarial training ⋮
On the robustness of minimum norm interpolators and regularized empirical risk minimizers ⋮
Scaling description of generalization with number of parameters in deep learning ⋮
A Multi-resolution Theory for Approximating Infinite-p-Zero-n: Transitional Inference, Individualized Predictions, and a World Without Bias-Variance Tradeoff ⋮
Large scale analysis of generalization error in learning using margin based classification methods ⋮
AdaBoost and robust one-bit compressed sensing ⋮
A Unifying Tutorial on Approximate Message Passing ⋮
Bayesian learning via neural Schrödinger-Föllmer flows ⋮
Understanding neural networks with reproducing kernel Banach spaces ⋮
The interpolation phase transition in neural networks: memorization and generalization under lazy training ⋮
A sieve stochastic gradient descent estimator for online nonparametric regression in Sobolev ellipsoids ⋮
A random matrix analysis of random Fourier features: beyond the Gaussian kernel, a precise phase transition, and the corresponding double descent* ⋮
Generalisation error in learning with random features and the hidden manifold model* ⋮
For interpolating kernel machines, minimizing the norm of the ERM solution maximizes stability ⋮
Two Models of Double Descent for Weak Features ⋮
Prediction errors for penalized regressions based on generalized approximate message passing
This page was built for publication: Reconciling modern machine-learning practice and the classical bias–variance trade-off