On early stopping in gradient descent learning



DOI: 10.1007/s00365-006-0663-2
zbMath: 1125.62035
OpenAlex: W2034978228
Wikidata: Q56169177
Scholia: Q56169177
MaRDI QID: Q2642922

Yuan Yao, Lorenzo Rosasco, Andrea Caponnetto

Publication date: 6 September 2007

Published in: Constructive Approximation

Full work available at URL: https://doi.org/10.1007/s00365-006-0663-2



Related Items

Early stopping for statistical inverse problems via truncated SVD estimation
Synchronization and Redundancy: Implications for Robustness of Neural Learning and Decision Making
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
Non-intrusive model reduction of large-scale, nonlinear dynamical systems using deep learning
Construction and Monte Carlo estimation of wavelet frames generated by a reproducing kernel
LEAST SQUARE REGRESSION WITH COEFFICIENT REGULARIZATION BY GRADIENT DESCENT
A robust framework for identification of PDEs from noisy data
Multidimensional item response theory in the style of collaborative filtering
Geometry on probability spaces
Nonparametric stochastic approximation with large step-sizes
Hermite learning with gradient data
Toward Efficient Ensemble Learning with Structure Constraints: Convergent Algorithms and Applications
Distributed regression learning with coefficient regularization
Optimal learning rates for kernel partial least squares
On regularization algorithms in learning theory
Gradient descent for robust kernel-based regression
Distributed kernel gradient descent algorithm for minimum error entropy principle
Learning the mapping \(\mathbf{x}\mapsto \sum\limits_{i=1}^d x_i^2\): the cost of finding the needle in a haystack
Kernel-based online gradient descent using distributed approach
Learning theory of distributed spectral algorithms
Data science applications to string theory
Capacity dependent analysis for functional online learning algorithms
Learning gradients via an early stopping gradient descent method
Large-dimensional random matrix theory and its applications in deep learning and wireless communications
Nonparametric Functional Graphical Modeling Through Functional Additive Regression Operator
Convergence rates of gradient methods for convex optimization in the space of measures
Convex regularization in statistical inverse learning problems
Effective stabilized self-training on few-labeled graph data
Deep learning for natural language processing: a survey
Just interpolate: kernel ``ridgeless'' regression can generalize
Convergence of the forward-backward algorithm: beyond the worst-case with the help of geometry
Spectral Algorithms for Supervised Learning
Side effects of learning from low-dimensional data embedded in a Euclidean space
Gradient descent for deep matrix factorization: dynamics and implicit bias towards low rank
The regularized least squares algorithm and the problem of learning halfspaces
Mercer's theorem on general domains: on the interaction between measures, kernels, and RKHSs
Consistency analysis of spectral regularization algorithms
Smoothed residual stopping for statistical inverse problems via truncated SVD estimation
Boosting algorithms: regularization, prediction and model fitting
Online learning for quantile regression and support vector regression
Adaptive kernel methods using the balancing principle
Online Pairwise Learning Algorithms
Kernel methods in system identification, machine learning and function estimation: a survey
Kernel gradient descent algorithm for information theoretic learning
Optimal rates for regularization of statistical inverse learning problems
Distributed kernel-based gradient descent algorithms
Theoretical investigation of generalization bounds for adversarial learning of deep neural networks
Variational networks: an optimal control approach to early stopping variational methods for image restoration
Parzen windows for multi-class classification
Learning gradients by a gradient descent algorithm
Learning from non-identical sampling for classification
Bi-cross-validation for factor analysis
Boosting with structural sparsity: a differential inclusion approach
Balancing principle in supervised learning for a general regularization scheme
Some properties of Gaussian reproducing kernel Hilbert spaces and their implications for function approximation and learning theory
Distributed linear regression by averaging
Estimation of local degree distributions via local weighted averaging and Monte Carlo cross-validation
Sparse recovery via differential inclusions
Learning rates of gradient descent algorithm for classification
Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces
Deep unfolding of a proximal interior point method for image restoration
Convergence Rates of Spectral Regularization Methods: A Comparison between Ill-Posed Inverse Problems and Statistical Kernel Learning
Fast and strong convergence of online learning algorithms
High-dimensional dynamics of generalization error in neural networks
Gradient-based Regularization Parameter Selection for Problems With Nonsmooth Penalty Functions
Semi-supervised learning with summary statistics
Distributed learning with indefinite kernels
Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression
Optimal Rates for Multi-pass Stochastic Gradient Methods
An elementary analysis of ridge regression with random design
A boosting inspired personalized threshold method for sepsis screening
Thresholded spectral algorithms for sparse approximations
From inexact optimization to learning via gradient concentration
Implicit regularization with strongly convex bias: Stability and acceleration