Boosting with early stopping: convergence and consistency
MaRDI portal publication: 2583412
Abstract: Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to greedily minimize an empirical loss function. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples that depend on the previous iterations. An unusual regularization technique, early stopping, is employed, based on cross-validation or a test set. This paper studies the numerical convergence, consistency, and statistical rates of convergence of boosting with early stopping when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insight into practical aspects of boosting. As a by-product, these results also reveal the importance of restricting the greedy search step-sizes, as known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that, for a linearly separable problem, AdaBoost with step-size \(\varepsilon \to 0\) becomes an \(L^1\)-margin maximizer when left to run to convergence.
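The procedure the abstract describes, greedy minimization of an empirical loss over the linear span of basis functions with a restricted step-size and a stopping time chosen on held-out data, can be sketched concretely. Below is a minimal illustration of componentwise \(L_2\)-boosting with shrinkage and validation-based early stopping; the function and parameter names (boost_early_stopping, step, patience) are illustrative assumptions, not the paper's notation or exact procedure.

```python
import numpy as np

def boost_early_stopping(X_tr, y_tr, X_val, y_val,
                         step=0.1, max_iter=1000, patience=25):
    """Componentwise L2-boosting over the span of the coordinate basis
    functions h_j(x) = x_j, with a shrunken greedy step and early stopping
    on a validation set. A sketch, not the paper's exact procedure."""
    n, p = X_tr.shape
    fit_tr = np.zeros(n)              # current additive fit on training data
    fit_val = np.zeros(len(y_val))    # current fit on validation data
    coefs = np.zeros(p)
    best = (np.inf, 0, coefs.copy())  # (val loss, round, coefficients)

    for t in range(1, max_iter + 1):
        resid = y_tr - fit_tr             # negative gradient of squared loss
        corr = X_tr.T @ resid             # greedy step: pick the basis
        j = int(np.argmax(np.abs(corr)))  # function most correlated with it
        # restricted step-size: shrink the least-squares coefficient by `step`
        gamma = step * corr[j] / (X_tr[:, j] @ X_tr[:, j])
        coefs[j] += gamma
        fit_tr += gamma * X_tr[:, j]
        fit_val += gamma * X_val[:, j]

        val_loss = np.mean((y_val - fit_val) ** 2)
        if val_loss < best[0]:
            best = (val_loss, t, coefs.copy())
        elif t - best[1] >= patience:     # no improvement for `patience` rounds
            break
    return best                           # fit frozen at the best validation round
```

The shrinkage factor `step` plays the role of the restricted greedy step-size whose importance the abstract emphasizes, and returning the fit at the best validation round implements early-stopping regularization based on a test set.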
Cites work
- scientific article; zbMATH DE number 2089354
- scientific article; zbMATH DE number 1804107
- scientific article; zbMATH DE number 49190
- scientific article; zbMATH DE number 47282
- scientific article; zbMATH DE number 1332320
- scientific article; zbMATH DE number 1869443
- doi:10.1162/153244303321897690
- doi:10.1162/1532443041424300
- doi:10.1162/1532443041424319
- doi:10.1162/153244304773936108
- A decision-theoretic generalization of on-line learning and an application to boosting
- A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training
- Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)
- Arcing classifiers. (With discussion)
- Boosting with the \(L_2\) loss
- Boosting the margin: a new explanation for the effectiveness of voting methods
- Complexities of convex combinations and bounding the generalization error in classification
- Convexity, Classification, and Risk Bounds
- Efficient agnostic learning of neural networks with bounded fan-in
- Empirical margin distributions and bounding the generalization error of combined classifiers
- Greedy function approximation: A gradient boosting machine
- Improved boosting algorithms using confidence-rated predictions
- Local Rademacher complexities
- Logistic regression, AdaBoost and Bregman distances
- Matching pursuits with time-frequency dictionaries
- On the Bayes-risk consistency of regularized boosting methods
- Population theory for boosting ensembles
- Process consistency for AdaBoost
- Sequential greedy approximation for certain convex optimization problems
- Statistical behavior and consistency of classification methods based on convex risk minimization
- The elements of statistical learning. Data mining, inference, and prediction
- Universal approximation bounds for superpositions of a sigmoidal function
- Weak convergence and empirical processes. With applications to statistics
Cited in (74)
- Early stopping in \(L_{2}\)Boosting
- Adaptive step-length selection in gradient boosting for Gaussian location and scale models
- Stochastic boosting algorithms
- Accelerated gradient boosting
- Nonparametric stochastic approximation with large step-sizes
- Toward Efficient Ensemble Learning with Structure Constraints: Convergent Algorithms and Applications
- Coupling the reduced-order model and the generative model for an importance sampling estimator
- Double machine learning with gradient boosting and its application to the Big \(N\) audit quality effect
- Boosting algorithms: regularization, prediction and model fitting
- AdaBoost and robust one-bit compressed sensing
- Random gradient boosting for predicting conditional quantiles
- Deep learning for natural language processing: a survey
- Consistency and generalization bounds for maximum entropy density estimation
- Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces
- AdaBoost is consistent
- Complexities of convex combinations and bounding the generalization error in classification
- Boosted nonparametric hazards with time-dependent covariates
- Bi-cross-validation for factor analysis
- Mathematical foundations of machine learning. Abstracts from the workshop held March 21--27, 2021 (hybrid meeting)
- A boosting method for maximization of the area under the ROC curve
- The implicit bias of gradient descent on separable data
- Variational networks: an optimal control approach to early stopping variational methods for image restoration
- A stochastic approximation view of boosting
- A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers
- A new perspective on boosting in linear regression via subgradient optimization and relatives
- Deep learning: a statistical viewpoint
- Supervised projection approach for boosting classifiers
- scientific article; zbMATH DE number 7370593
- Regularization in statistics
- SVM-boosting based on Markov resampling: theory and algorithm
- Analysis of boosting algorithms using the smooth margin function
- Aggregation of estimators and stochastic optimization
- Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies
- Interpreting initial offset boosting via reconstitution in integral domain
- Deformation of log-likelihood loss function for multiclass boosting
- scientific article; zbMATH DE number 2089371
- Random classification noise defeats all convex potential boosters
- Boosting in the presence of outliers: adaptive classification with nonconvex loss functions
- Early stopping and non-parametric regression: an optimal data-dependent stopping rule
- Boosting for high-dimensional linear models
- scientific article; zbMATH DE number 5251641
- A boosting inspired personalized threshold method for sepsis screening
- Randomized Gradient Boosting Machine
- A new accelerated proximal boosting machine with convergence rate \(O(1/t^2)\)
- Population theory for boosting ensembles
- Optimization by Gradient Boosting
- Boosting as a regularized path to a maximum margin classifier
- On boosting kernel regression
- doi:10.1162/jmlr.2003.3.4-5.863
- Small area estimation of the homeless in Los Angeles: an application of cost-sensitive stochastic gradient boosting
- Fast and strong convergence of online learning algorithms
- scientific article; zbMATH DE number 2089370
- scientific article; zbMATH DE number 7306853
- Rejoinder: One-step sparse estimates in nonconcave penalized likelihood models
- Dimension reduction boosting
- Fully corrective boosting with arbitrary loss and regularization
- Convergence and Consistency of Regularized Boosting With Weakly Dependent Observations
- Pinball boosting of regression quantiles
- AdaBoost Semiparametric Model Averaging Prediction for Multiple Categories
- Unbiased Boosting Estimation for Censored Survival Data
- Implicit regularization with strongly convex bias: Stability and acceleration
- Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models
- Use of majority votes in statistical learning
- Tweedie gradient boosting for extremely unbalanced zero-inflated data
- The vanishing learning rate asymptotic for linear \(L^2\)-boosting
- Early stopping for \(L^2\)-boosting in high-dimensional linear models
- Explainable subgradient tree boosting for prescriptive analytics in operations management
- Fast iterative regularization by reusing data
- Estimation and inference of treatment effects with \(L_2\)-boosting in high-dimensional settings
- A large-sample theory for infinitesimal gradient boosting
- Tree-based boosting with functional data
- Infinitesimal gradient boosting
- Fully corrective gradient boosting with squared hinge: fast learning rates and early stopping