Boosting with early stopping: convergence and consistency
Abstract: Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to empirically minimize a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples that depend on the previous iterations. An unusual regularization technique, early stopping, is employed and is based on cross-validation or a test set. This paper studies the numerical convergence, consistency, and statistical rates of convergence of boosting with early stopping when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using this numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are presented to illustrate the relevance of our theoretical results for practical aspects of boosting. As a by-product, these results also reveal the importance of restricting the greedy search step-sizes, as known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that, for a linearly separable problem, AdaBoost with \(\varepsilon \to 0\) step-size becomes an \(L^1\)-margin maximizer when left to run to convergence.
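The following is a minimal illustrative sketch, not code from the paper, of the kind of procedure the abstract describes: greedy least-squares boosting over a finite dictionary of basis functions, with a restricted (shrunken) step-size and an early-stopping point chosen on a held-out validation set. Names such as boost_with_early_stopping, step_size, and max_iter are assumptions made for this example; the paper's analysis covers general loss functions and early-stopping rules.

import numpy as np

def boost_with_early_stopping(X_train, y_train, X_val, y_val,
                              step_size=0.1, max_iter=500):
    # Greedy stagewise least-squares boosting over the columns of X
    # (the "basis functions").  At each step the current residual is
    # regressed on every basis function, the best-fitting one is chosen,
    # and only a small multiple (step_size) of that fit is added, in the
    # spirit of the restricted step-sizes discussed in the abstract.
    # The iterate with the smallest validation loss is returned.
    n, p = X_train.shape
    coef = np.zeros(p)                    # coefficients in the linear span
    fit_train = np.zeros(n)               # current fit on the training data
    fit_val = np.zeros(X_val.shape[0])    # current fit on the validation data
    best_coef, best_val_loss = coef.copy(), np.inf

    for _ in range(max_iter):
        resid = y_train - fit_train
        norms = np.sum(X_train ** 2, axis=0) + 1e-12
        gamma = X_train.T @ resid / norms          # per-basis least-squares fit
        j = int(np.argmax(gamma ** 2 * norms))     # largest reduction in squared error
        coef[j] += step_size * gamma[j]            # restricted greedy step
        fit_train += step_size * gamma[j] * X_train[:, j]
        fit_val += step_size * gamma[j] * X_val[:, j]
        val_loss = np.mean((y_val - fit_val) ** 2)
        if val_loss < best_val_loss:               # keep the best stopping point
            best_val_loss, best_coef = val_loss, coef.copy()

    return best_coef, best_val_loss

# Usage on simulated data: a sparse linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
beta = np.zeros(50)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ beta + rng.normal(size=300)
coef, loss = boost_with_early_stopping(X[:200], y[:200], X[200:], y[200:])
print("validation MSE at the early-stopped iterate:", round(loss, 3))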
Cites work
- scientific article; zbMATH DE number 2089354
- scientific article; zbMATH DE number 1804107
- scientific article; zbMATH DE number 49190
- scientific article; zbMATH DE number 47282
- scientific article; zbMATH DE number 1332320
- scientific article; zbMATH DE number 1869443
- doi:10.1162/153244303321897690
- doi:10.1162/1532443041424300
- doi:10.1162/1532443041424319
- doi:10.1162/153244304773936108
- A decision-theoretic generalization of on-line learning and an application to boosting
- A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training
- Additive logistic regression: a statistical view of boosting. (With discussion and a rejoinder by the authors)
- Arcing classifiers. (With discussion)
- Boosting with the \(L_2\) loss
- Boosting the margin: a new explanation for the effectiveness of voting methods
- Complexities of convex combinations and bounding the generalization error in classification
- Convexity, Classification, and Risk Bounds
- Efficient agnostic learning of neural networks with bounded fan-in
- Empirical margin distributions and bounding the generalization error of combined classifiers
- Greedy function approximation: A gradient boosting machine.
- Improved boosting algorithms using confidence-rated predictions
- Local Rademacher complexities
- Logistic regression, AdaBoost and Bregman distances
- Matching pursuits with time-frequency dictionaries
- On the Bayes-risk consistency of regularized boosting methods.
- Population theory for boosting ensembles.
- Process consistency for AdaBoost.
- Sequential greedy approximation for certain convex optimization problems
- Statistical behavior and consistency of classification methods based on convex risk minimization.
- The elements of statistical learning. Data mining, inference, and prediction
- Universal approximation bounds for superpositions of a sigmoidal function
- Weak convergence and empirical processes. With applications to statistics
Cited in (74)
- Variational networks: an optimal control approach to early stopping variational methods for image restoration
- A new perspective on boosting in linear regression via subgradient optimization and relatives
- Mathematical foundations of machine learning. Abstracts from the workshop held March 21--27, 2021 (hybrid meeting)
- A new accelerated proximal boosting machine with convergence rate \(O(1/t^2)\)
- Aggregation of estimators and stochastic optimization
- Boosting in the presence of outliers: adaptive classification with nonconvex loss functions
- A large-sample theory for infinitesimal gradient boosting
- Interpreting initial offset boosting via reconstitution in integral domain
- Tree-based boosting with functional data
- Boosting as a regularized path to a maximum margin classifier
- scientific article; zbMATH DE number 7370593
- Dimension reduction boosting
- SVM-boosting based on Markov resampling: theory and algorithm
- Fast and strong convergence of online learning algorithms
- Random gradient boosting for predicting conditional quantiles
- Consistency and generalization bounds for maximum entropy density estimation
- Fast iterative regularization by reusing data
- Coupling the reduced-order model and the generative model for an importance sampling estimator
- Boosting for high-dimensional linear models
- Optimal rates for spectral algorithms with least-squares regression over Hilbert spaces
- Implicit regularization with strongly convex bias: Stability and acceleration
- Accelerated gradient boosting
- doi:10.1162/jmlr.2003.3.4-5.863
- scientific article; zbMATH DE number 7306853
- AdaBoost Semiparametric Model Averaging Prediction for Multiple Categories
- The implicit bias of gradient descent on separable data
- Use of majority votes in statistical learning
- scientific article; zbMATH DE number 2089370
- A boosting inspired personalized threshold method for sepsis screening
- Early stopping for \(L^2\)-boosting in high-dimensional linear models
- AdaBoost is consistent
- Fully corrective boosting with arbitrary loss and regularization
- Unbiased Boosting Estimation for Censored Survival Data
- The vanishing learning rate asymptotic for linear \(L^2\)-boosting
- Stochastic boosting algorithms
- On boosting kernel regression
- Fully corrective gradient boosting with squared hinge: fast learning rates and early stopping
- Pinball boosting of regression quantiles
- Small area estimation of the homeless in Los Angeles: an application of cost-sensitive stochastic gradient boosting
- Population theory for boosting ensembles.
- Boosted nonparametric hazards with time-dependent covariates
- Explainable subgradient tree boosting for prescriptive analytics in operations management
- Tweedie gradient boosting for extremely unbalanced zero-inflated data
- Early stopping and non-parametric regression: an optimal data-dependent stopping rule
- Early stopping in \(L_{2}\)Boosting
- Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies
- Estimation and inference of treatment effects with \(L_2\)-boosting in high-dimensional settings
- Nonparametric stochastic approximation with large step-sizes
- Insurance Premium Prediction via Gradient Tree-Boosted Tweedie Compound Poisson Models
- Deformation of log-likelihood loss function for multiclass boosting
- Boosting algorithms: regularization, prediction and model fitting
- Bi-cross-validation for factor analysis
- Deep learning: a statistical viewpoint
- Random classification noise defeats all convex potential boosters
- Deep learning for natural language processing: a survey
- A stochastic approximation view of boosting
- Randomized Gradient Boosting Machine
- Supervised projection approach for boosting classifiers
- Analysis of boosting algorithms using the smooth margin function
- Double machine learning with gradient boosting and its application to the Big \(N\) audit quality effect
- Complexities of convex combinations and bounding the generalization error in classification
- scientific article; zbMATH DE number 5251641
- Convergence and Consistency of Regularized Boosting With Weakly Dependent Observations
- Toward Efficient Ensemble Learning with Structure Constraints: Convergent Algorithms and Applications
- scientific article; zbMATH DE number 2089371
- A boosting method for maximization of the area under the ROC curve
- Regularization in statistics
- Infinitesimal gradient boosting
- Rejoinder: One-step sparse estimates in nonconcave penalized likelihood models
- Adaptive step-length selection in gradient boosting for Gaussian location and scale models
- AdaBoost and robust one-bit compressed sensing
- Optimization by Gradient Boosting
- A precise high-dimensional asymptotic theory for boosting and minimum-\(\ell_1\)-norm interpolated classifiers