Estimating the algorithmic variance of randomized ensembles via the bootstrap
From MaRDI portal
Abstract: Although bagging and random forests are among the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are few theoretical guarantees for deciding when an ensemble is "large enough", meaning that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of "algorithmic variance" (i.e. the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests, and related methods in the context of classification. To be specific, suppose the training dataset is fixed, and let the random variable $\mathrm{Err}_t$ denote the prediction error of a randomized ensemble of size $t$. Working under a "first-order model" for randomized ensembles, we prove that the centered law of $\mathrm{Err}_t$ can be consistently approximated via the proposed method as $t \to \infty$. Meanwhile, the computational cost of the method is quite modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of $\mathrm{Err}_t$ are negligible.
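The bootstrap idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's exact algorithm; it assumes (illustrative choices, not taken from this record) that the trained ensemble's votes on a labeled hold-out set are stored as an integer matrix `votes` of shape (n_test, t), that the ensemble predicts by plurality vote, and that the standard deviation of $\mathrm{Err}_t$ shrinks like $1/\sqrt{t}$, which is the scaling used for the extrapolation step. All function names are hypothetical. Since bagging and random forests draw their members independently given the fixed training data, resampling the $t$ columns with replacement is meant to mimic drawing a fresh ensemble of the same size.

```python
import numpy as np


def majority_vote_error(votes: np.ndarray, y: np.ndarray) -> float:
    """Hold-out error of a plurality-vote ensemble.

    votes: (n_test, t) non-negative integer labels, one column per classifier.
    y:     (n_test,) true integer labels.
    """
    n_classes = int(max(votes.max(), y.max())) + 1
    # counts[i, k] = number of classifiers voting for class k on test point i.
    counts = (votes[:, :, None] == np.arange(n_classes)).sum(axis=1)
    # argmax breaks ties in favour of the smallest class index.
    predictions = counts.argmax(axis=1)
    return float(np.mean(predictions != y))


def bootstrap_algorithmic_sd(votes, y, n_boot=500, rng=None):
    """Bootstrap estimate of the standard deviation of Err_t at t = votes.shape[1].

    The training data stay fixed; only the ensemble members are resampled.
    """
    rng = np.random.default_rng(rng)
    t = votes.shape[1]
    errors = np.empty(n_boot)
    for b in range(n_boot):
        # Draw t classifiers with replacement from the t already trained.
        idx = rng.integers(0, t, size=t)
        errors[b] = majority_vote_error(votes[:, idx], y)
    return float(errors.std(ddof=1))


def extrapolate_sd(sd_t0: float, t0: int, t: int) -> float:
    """ASSUMPTION: sd(Err_t) scales like 1/sqrt(t), so sd_t ~ sd_t0 * sqrt(t0 / t)."""
    return sd_t0 * float(np.sqrt(t0 / t))


if __name__ == "__main__":
    # Synthetic stand-in for a trained ensemble (hypothetical data, binary labels).
    rng = np.random.default_rng(0)
    n_test, t0 = 300, 50
    y = rng.integers(0, 2, size=n_test)
    # Per-point probability that a randomly drawn classifier is correct.
    p_correct = rng.uniform(0.3, 0.95, size=n_test)
    correct = rng.random((n_test, t0)) < p_correct[:, None]
    votes = np.where(correct, y[:, None], 1 - y[:, None])

    # Bootstrap only the small initial ensemble, then extrapolate to larger t.
    sd0 = bootstrap_algorithmic_sd(votes, y, n_boot=300, rng=1)
    for t in (50, 200, 1000):
        print(f"t={t:5d}  estimated sd(Err_t) = {extrapolate_sd(sd0, t0, t):.4f}")
```

In this sketch the bootstrap is run only once, on the initial ensemble of size $t_0$; the extrapolated values can then be compared against a chosen tolerance to pick a larger $t$ without training and resampling ensembles of that size.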
Recommendations
- Estimating a sharp convergence bound for randomized ensembles
- Bootstrap bias corrections for ensemble methods
- Measuring the algorithmic convergence of randomized ensembles: the regression setting
- Computationally efficient double bootstrap variance estimation
- Computation of Exact Bootstrap Confidence Intervals: Complexity and Deterministic Algorithms
- scientific article; zbMATH DE number 5233718
Cites work
- scientific article; zbMATH DE number 1726664
- scientific article; zbMATH DE number 6378123
- scientific article; zbMATH DE number 3860199
- scientific article; zbMATH DE number 3870089
- scientific article; zbMATH DE number 5056239
- scientific article; zbMATH DE number 6665017
- A bootstrap method for error estimation in randomized matrix multiplication
- Analysis of a random forests model
- Analyzing bagging
- Bagging predictors
- Boosting. Foundations and algorithms.
- Comments on: "A random forest guided tour"
- Condition. The geometry of numerical algorithms
- Consistency of random forests
- Consistency of random forests and other averaging classifiers
- Estimating the algorithmic variance of randomized ensembles via the bootstrap
- Estimation and accuracy after model selection
- Extrapolation methods theory and practice
- Foundations of Modern Probability
- How large should ensembles of classifiers be?
- On the asymptotics of random forests
- Practical Extrapolation Methods
- Properties of Bagged Nearest Neighbour Classifiers
- Quantifying uncertainty in random forests via confidence intervals and hypothesis tests
- Random Forests and Adaptive Nearest Neighbors
- Random Forests and Kernel Methods
- Random forests
- Random-projection ensemble classification. (With discussion).
- Richardson Extrapolation and the Bootstrap
- Sample size selection in optimization methods for machine learning
- Standard errors for bagged and random forest estimators
- The elements of statistical learning. Data mining, inference, and prediction
- Variance reduction in purely random forests
- Weak convergence and empirical processes. With applications to statistics
Cited in (11)
- How large should ensembles of classifiers be?
- scientific article; zbMATH DE number 7307469
- Estimating a sharp convergence bound for randomized ensembles
- scientific article; zbMATH DE number 7370562
- Bootstrapping the operator norm in high dimensions: error estimation for covariance matrices and sketching
- Estimating the algorithmic variance of randomized ensembles via the bootstrap
- Standard errors for bagged and random forest estimators
- A bootstrap method for error estimation in randomized matrix multiplication
- Randomized numerical linear algebra: Foundations and algorithms
- scientific article; zbMATH DE number 7370612
- Measuring the algorithmic convergence of randomized ensembles: the regression setting
This page was built for publication: Estimating the algorithmic variance of randomized ensembles via the bootstrap (MaRDI item Q666594)