Confidence sets for split points in decision trees
From MaRDI portal
Abstract: We investigate the problem of finding confidence sets for split points in decision trees (CART). Our main results establish the asymptotic distribution of the least squares estimators and some associated residual sum of squares statistics in a binary decision tree approximation to a smooth regression curve. Cube-root asymptotics with nonnormal limit distributions are involved. We study various confidence sets for the split point, one calibrated using the subsampling bootstrap, and others calibrated using plug-in estimates of some nuisance parameters. The performance of the confidence sets is assessed in a simulation study. A motivation for developing such confidence sets comes from the problem of phosphorus pollution in the Everglades. Ecologists have suggested that split points provide a phosphorus threshold at which biological imbalance occurs, and the lower endpoint of the confidence set may be interpreted as a level that is protective of the ecosystem. This is illustrated using data from a Duke University Wetlands Center phosphorus dosing study in the Everglades.
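The subsampling calibration mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single covariate, a one-split (stump) least squares fit, and the cube-root rate n^(1/3) for the split point estimator; the function names `fit_stump` and `subsampling_ci`, the subsample size `b`, and the number of subsamples `n_sub` are hypothetical choices made for illustration.

```python
import numpy as np


def fit_stump(x, y):
    """Least squares split point of a one-split fit to (x, y)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    csum = np.cumsum(ys)
    k = np.arange(1, n)              # candidate left-group sizes 1..n-1
    left_sum = csum[:-1]
    right_sum = csum[-1] - left_sum
    # Minimising the residual sum of squares is equivalent to
    # maximising the between-group sum of squares computed here.
    gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
    j = int(np.argmax(gain))
    return 0.5 * (xs[j] + xs[j + 1])  # midpoint between neighbouring x values


def subsampling_ci(x, y, b, level=0.95, n_sub=500, seed=0):
    """Subsampling interval for the split point (assumes an n^(1/3) rate)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    d_hat = fit_stump(x, y)
    roots = np.empty(n_sub)
    for i in range(n_sub):
        # Subsamples are drawn WITHOUT replacement, unlike the usual bootstrap.
        idx = rng.choice(n, size=b, replace=False)
        roots[i] = b ** (1 / 3) * (fit_stump(x[idx], y[idx]) - d_hat)
    alpha = 1 - level
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    scale = n ** (-1 / 3)
    return d_hat, d_hat - scale * q_hi, d_hat - scale * q_lo


# Simulated example: a smooth regression curve approximated by a stump.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 400)
y = 1 / (1 + np.exp(-10 * (x - 0.5))) + rng.normal(0, 0.2, 400)
d_hat, lower, upper = subsampling_ci(x, y, b=60, n_sub=200)
print(f"split point {d_hat:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

The subsample size `b` has to grow with n while b/n tends to zero for subsampling to be valid; the value used here is arbitrary. In the ecological reading given in the abstract, the lower endpoint of the interval is the quantity of interest.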
Cites work
- scientific article; zbMATH DE number 991833
- A continuous mapping theorem for the argmax‐functional in the non‐unique case
- Analyzing bagging
- Confidence sets for nonparametric wavelet regression
- Cube root asymptotics
- Detecting Abrupt Changes by Wavelet Methods
- Empirical-Bias Bandwidths for Local Polynomial Nonparametric Regression and Density Estimation
- Estimation of the mean of a multivariate normal distribution
- Large sample confidence regions based on subsamples under minimal assumptions
- Likelihood ratio tests for monotone functions
- Nonparametric estimation of a discontinuity in regression
- On the estimation of jump points in smooth curves
- Subsampling
- Subsampling inference in cube root asymptotics with an application to Manski's maximum score estimator
- Weak convergence and empirical processes. With applications to statistics
Cited in (17)
- Rates of convergence for random forests via generalized U-statistics
- Asymptotics for p-value based threshold estimation under repeated measurements
- Limit distribution theory for block estimators in multiple isotonic regression
- Transformation-Invariant Learning of Optimal Individualized Decision Rules with Time-to-Event Outcomes
- Asymptotics for p-value based threshold estimation in regression settings
- Local M-estimation with discontinuous criterion for dependent and limited observations
- Model-robust inference for continuous threshold regression models
- Berry-Esseen bounds for Chernoff-type nonstandard asymptotics in isotonic regression
- A random forest guided tour
- Divide and conquer in nonstandard problems and the super-efficiency phenomenon
- Streaming motion in Leo I
- Inference after estimation of breaks
- Confidence intervals for multiple isotonic regression and other monotone models
- Quantile regression models for current status data
- A Tree-Based Semi-Varying Coefficient Model for the COM-Poisson Distribution
- Extending the scope of empirical likelihood
- Asymptotics for change-point models under varying degrees of mis-specification