Boosting with early stopping: convergence and consistency
Abstract: Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to empirically minimize a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples that depend on the previous iterations. An unusual regularization technique, early stopping, is employed and is based on cross-validation or a test set. This paper studies the numerical convergence, consistency, and statistical rates of convergence of boosting with early stopping when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using this numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are presented to illustrate the relevance of our theoretical results for practical aspects of boosting. As a by-product, these results also reveal the importance of restricting the greedy search step-sizes, as known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that, for a linearly separable problem, AdaBoost with \(\varepsilon \to 0\) step-size becomes an \(L^1\)-margin maximizer when left to run to convergence.
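The following is a minimal illustrative sketch, not code from the paper, of the kind of procedure the abstract describes: greedy least-squares boosting over a finite dictionary of basis functions, with a restricted (shrunken) step-size and an early-stopping point chosen on a held-out validation set. Names such as boost_with_early_stopping, step_size, and max_iter are assumptions made for this example; the paper's analysis covers general loss functions and early-stopping rules.

import numpy as np

def boost_with_early_stopping(X_train, y_train, X_val, y_val,
                              step_size=0.1, max_iter=500):
    # Greedy stagewise least-squares boosting over the columns of X
    # (the "basis functions").  At each step the current residual is
    # regressed on every basis function, the best-fitting one is chosen,
    # and only a small multiple (step_size) of that fit is added, in the
    # spirit of the restricted step-sizes discussed in the abstract.
    # The iterate with the smallest validation loss is returned.
    n, p = X_train.shape
    coef = np.zeros(p)                    # coefficients in the linear span
    fit_train = np.zeros(n)               # current fit on the training data
    fit_val = np.zeros(X_val.shape[0])    # current fit on the validation data
    best_coef, best_val_loss = coef.copy(), np.inf

    for _ in range(max_iter):
        resid = y_train - fit_train
        norms = np.sum(X_train ** 2, axis=0) + 1e-12
        gamma = X_train.T @ resid / norms          # per-basis least-squares fit
        j = int(np.argmax(gamma ** 2 * norms))     # largest reduction in squared error
        coef[j] += step_size * gamma[j]            # restricted greedy step
        fit_train += step_size * gamma[j] * X_train[:, j]
        fit_val += step_size * gamma[j] * X_val[:, j]
        val_loss = np.mean((y_val - fit_val) ** 2)
        if val_loss < best_val_loss:               # keep the best stopping point
            best_val_loss, best_coef = val_loss, coef.copy()

    return best_coef, best_val_loss

# Usage on simulated data: a sparse linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
beta = np.zeros(50)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ beta + rng.normal(size=300)
coef, loss = boost_with_early_stopping(X[:200], y[:200], X[200:], y[200:])
print("validation MSE at the early-stopped iterate:", round(loss, 3))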
Cites work
- scientific article; zbMATH DE number 2089354
- scientific article; zbMATH DE number 1804107
- scientific article; zbMATH DE number 49190
- scientific article; zbMATH DE number 47282
- scientific article; zbMATH DE number 1332320
- scientific article; zbMATH DE number 1869443
- doi:10.1162/153244303321897690
- doi:10.1162/1532443041424300
- doi:10.1162/1532443041424319
- doi:10.1162/153244304773936108
- A decision-theoretic generalization of on-line learning and an application to boosting
- A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training
- Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)
- Arcing classifiers. (With discussion)
- Boosting with the \(L_2\) loss
- Boosting the margin: a new explanation for the effectiveness of voting methods
- Complexities of convex combinations and bounding the generalization error in classification
- Convexity, Classification, and Risk Bounds
- Efficient agnostic learning of neural networks with bounded fan-in
- Empirical margin distributions and bounding the generalization error of combined classifiers
- Greedy function approximation: A gradient boosting machine.
- Improved boosting algorithms using confidence-rated predictions
- Local Rademacher complexities
- Logistic regression, AdaBoost and Bregman distances
- Matching pursuits with time-frequency dictionaries
- On the Bayes-risk consistency of regularized boosting methods.
- Population theory for boosting ensembles.
- Process consistency for AdaBoost.
- Sequential greedy approximation for certain convex optimization problems
- Statistical behavior and consistency of classification methods based on convex risk minimization.
- The elements of statistical learning. Data mining, inference, and prediction
- Universal approximation bounds for superpositions of a sigmoidal function
- Weak convergence and empirical processes. With applications to statistics
Cited in (74)
- Variational networks: an optimal control approach to early stopping variational methods for image restoration
- A new perspective on boosting in linear regression via subgradient optimization and relatives
- Mathematical foundations of machine learning. Abstracts from the workshop held March 21--27, 2021 (hybrid meeting)
- A new accelerated proximal boosting machine with convergence rate \(O(1/t^2)\)
- Aggregation of estimators and stochastic optimization
- Boosting in the presence of outliers: adaptive classification with nonconvex loss functions
- A large-sample theory for infinitesimal gradient boosting
- Interpreting initial offset boosting via reconstitution in integral domain
- Tree-based boosting with functional data
- Boosting as a regularized path to a maximum margin classifier
- scientific article; zbMATH DE number 7370593
- Dimension reduction boosting
- SVM-boosting based on Markov resampling: theory and algorithm
- Fast and strong convergence of online learning algorithms
- Random gradient boosting for predicting conditional quantiles
- Consistency and generalization bounds for maximum entropy density estimation
- Fast iterative regularization by reusing data
- Coupling the reduced-order model and the generative model for an importance sampling estimator
- Boosting for high-dimensional linear models
- Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces
- Implicit regularization with strongly convex bias: Stability and acceleration
- Accelerated gradient boosting
- doi:10.1162/jmlr.2003.3.4-5.863
- scientific article; zbMATH DE number 7306853
- AdaBoost Semiparametric Model Averaging Prediction for Multiple Categories
- The implicit bias of gradient descent on separable data
- Use of majority votes in statistical learning
- scientific article; zbMATH DE number 2089370
- A boosting inspired personalized threshold method for sepsis screening
- Early stopping for \(L^2\)-boosting in high-dimensional linear models
- AdaBoost is consistent
- Fully corrective boosting with arbitrary loss and regularization
- Unbiased Boosting Estimation for Censored Survival Data
- The vanishing learning rate asymptotic for linear \(L^2\)-boosting
- Stochastic boosting algorithms
- On boosting kernel regression
- Fully corrective gradient boosting with squared hinge: fast learning rates and early stopping
- Pinball boosting of regression quantiles
- Small area estimation of the homeless in Los Angeles: an application of cost-sensitive stochastic gradient boosting
- Population theory for boosting ensembles.
- Boosted nonparametric hazards with time-dependent covariates
- Explainable subgradient tree boosting for prescriptive analytics in operations management
- Tweedie gradient boosting for extremely unbalanced zero-inflated data
- Early stopping and non-parametric regression: an optimal data-dependent stopping rule
- Early stopping in \(L_{2}\)Boosting
- Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies
- Estimation and inference of treatment effects with \(L_2\)-boosting in high-dimensional settings
- Nonparametric stochastic approximation with large step-sizes
- Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models
- Deformation of log-likelihood loss function for multiclass boosting
- Boosting algorithms: regularization, prediction and model fitting
- Bi-cross-validation for factor analysis
- Deep learning: a statistical viewpoint
- Random classification noise defeats all convex potential boosters
- Deep learning for natural language processing: a survey
- A stochastic approximation view of boosting
- Randomized Gradient Boosting Machine
- Supervised projection approach for boosting classifiers
- Analysis of boosting algorithms using the smooth margin function
- Double machine learning with gradient boosting and its application to the Big \(N\) audit quality effect
- Complexities of convex combinations and bounding the generalization error in classification
- scientific article; zbMATH DE number 5251641
- Convergence and Consistency of Regularized Boosting With Weakly Dependent Observations
- Toward Efficient Ensemble Learning with Structure Constraints: Convergent Algorithms and Applications
- scientific article; zbMATH DE number 2089371
- A boosting method for maximization of the area under the ROC curve
- Regularization in statistics
- Infinitesimal gradient boosting
- Rejoinder: One-step sparse estimates in nonconcave penalized likelihood models
- Adaptive step-length selection in gradient boosting for Gaussian location and scale models
- AdaBoost and robust one-bit compressed sensing
- Optimization by Gradient Boosting
- A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers