Abstract: We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Gy"{o}rfi, Kohler, Krzy.{z}ak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order with an optimal sub-exponential tail. While existing approaches to linear regression for heavy-tailed distributions focus on proper estimators that return linear functions, we highlight that the improperness of our procedure is necessary for attaining nontrivial guarantees in the distribution-free setting.
Recommendations
- Robust linear least squares regression
- Loss minimization and parameter estimation with heavy tails
- Mean estimation and regression under heavy-tailed distributions: A survey
- Efficient algorithms and lower bounds for robust linear regression
- Iteratively reweighted \(\ell_1\)-penalized robust regression
Cites work
- {{#invoke:WikidataIB|getLink|Q3723487}} scientific article; zbMATH DE number 3954047 (Why is no real title available?)
- {{#invoke:WikidataIB|getLink|Q3967358}} scientific article; zbMATH DE number 3790208 (Why is no real title available?)
- {{#invoke:WikidataIB|getLink|Q4511091}} scientific article; zbMATH DE number 1522808 (Why is no real title available?)
- {{#invoke:WikidataIB|getLink|Q4692741}} scientific article; zbMATH DE number 194744 (Why is no real title available?)
- {{#invoke:WikidataIB|getLink|Q4770512}} scientific article; zbMATH DE number 3446442 (Why is no real title available?)
- {{#invoke:WikidataIB|getLink|Q4944751}} scientific article; zbMATH DE number 1420699 (Why is no real title available?)
- {{#invoke:WikidataIB|getLink|Q5600636}} scientific article; zbMATH DE number 3320125 (Why is no real title available?)
- {{#invoke:WikidataIB|getLink|Q3714486}} A theory of the learnable
- {{#invoke:WikidataIB|getLink|Q516010}} Adaptive importance sampling in least-squares Monte Carlo algorithms for backward stochastic differential equations
- {{#invoke:WikidataIB|getLink|Q842390}} Aggregation via empirical risk minimization
- {{#invoke:WikidataIB|getLink|Q5215471}} An unrestricted learning procedure
- {{#invoke:WikidataIB|getLink|Q1930659}} Challenging the empirical mean and empirical variance: a deviation study
- {{#invoke:WikidataIB|getLink|Q1582634}} Combining different procedures for adaptive regression
- {{#invoke:WikidataIB|getLink|Q4831997}} Competitive On-line Statistics
- {{#invoke:WikidataIB|getLink|Q1076400}} Conditions d'intégrabilité pour les multiplicateurs dans le TLC banachique. (Integrability conditions for multiplicators in the central limit theorem in Banach spaces)
- {{#invoke:WikidataIB|getLink|Q1824970}} Convergence of the Robbins-Monro method for linear problems in a Banach space
- {{#invoke:WikidataIB|getLink|Q2313287}} Convergence rates of least squares regression estimators with heavy-tailed errors
- {{#invoke:WikidataIB|getLink|Q520672}} Empirical entropy, minimax regret and minimax risk
- {{#invoke:WikidataIB|getLink|Q892246}} Empirical risk minimization for heavy-tailed losses
- {{#invoke:WikidataIB|getLink|Q5139054}} Extending the scope of the small-ball method
- {{#invoke:WikidataIB|getLink|Q2388975}} Fast learning rates in statistical inference through aggregation
- {{#invoke:WikidataIB|getLink|Q122792}} Geometric median and robust estimation in Banach spaces
- {{#invoke:WikidataIB|getLink|Q3658954}} How Many Variables Should be Entered in a Regression Equation?
- {{#invoke:WikidataIB|getLink|Q4674555}} Improving the sample complexity using global data
- {{#invoke:WikidataIB|getLink|Q4958226}} Learning Bounded Subsets of Lₚ
- {{#invoke:WikidataIB|getLink|Q5305859}} Learning Theory and Kernel Machines
- {{#invoke:WikidataIB|getLink|Q955138}} Learning by mirror averaging
- {{#invoke:WikidataIB|getLink|Q2796408}} Learning without concentration
- {{#invoke:WikidataIB|getLink|Q2583411}} Local Rademacher complexities
- {{#invoke:WikidataIB|getLink|Q2810786}} Loss minimization and parameter estimation with heavy tails
- {{#invoke:WikidataIB|getLink|Q2329044}} Mean estimation and regression under heavy-tailed distributions: A survey
- {{#invoke:WikidataIB|getLink|Q1848770}} Mixing strategies for density estimation.
- {{#invoke:WikidataIB|getLink|Q2334371}} Near-optimal mean estimators with respect to general norms
- {{#invoke:WikidataIB|getLink|Q309706}} Nonparametric stochastic approximation with large step-sizes
- {{#invoke:WikidataIB|getLink|Q2363649}} On aggregation for heavy-tailed classes
- {{#invoke:WikidataIB|getLink|Q1697051}} On optimality of empirical risk minimization in linear aggregation
- {{#invoke:WikidataIB|getLink|Q385457}} On the stability and accuracy of least squares approximations
- {{#invoke:WikidataIB|getLink|Q1360379}} On the strong universal consistency of a series type regression estimate
- {{#invoke:WikidataIB|getLink|Q2385535}} Optimal rates for the regularized least-squares algorithm
- {{#invoke:WikidataIB|getLink|Q282546}} Performance of empirical risk minimization in linear aggregation
- {{#invoke:WikidataIB|getLink|Q1342520}} Predicting \(\{ 0,1\}\)-functions on randomly drawn points
- {{#invoke:WikidataIB|getLink|Q354190}} Quantitative error estimates for a least-squares Monte Carlo algorithm for American option pricing
- {{#invoke:WikidataIB|getLink|Q1079379}} Random generation of combinatorial structures from a uniform distribution
- {{#invoke:WikidataIB|getLink|Q2303034}} Regression function estimation on non compact support in an heteroscesdastic model
- {{#invoke:WikidataIB|getLink|Q1604219}} Relative expected instantaneous loss bounds
- {{#invoke:WikidataIB|getLink|Q5945695}} Relative loss bounds for on-line density estimation with the exponential family of distributions
- {{#invoke:WikidataIB|getLink|Q2302852}} Risk minimization by median-of-means tournaments
- {{#invoke:WikidataIB|getLink|Q5896300}} Robust Statistics
- {{#invoke:WikidataIB|getLink|Q2054489}} Robust \(k\)-means clustering for distributions with two moments
- {{#invoke:WikidataIB|getLink|Q2196239}} Robust covariance estimation under \(L_4\)-\(L_2\) norm equivalence
- {{#invoke:WikidataIB|getLink|Q661182}} Robust linear least squares regression
- {{#invoke:WikidataIB|getLink|Q2196199}} Robust machine learning by median-of-means: theory and practice
- {{#invoke:WikidataIB|getLink|Q2656601}} Robust multivariate mean estimation: the optimality of trimmed mean
- {{#invoke:WikidataIB|getLink|Q2174664}} Robust statistical learning with Lipschitz and convex loss functions
- {{#invoke:WikidataIB|getLink|Q1731055}} Sub-Gaussian estimators of the mean of a random vector
- {{#invoke:WikidataIB|getLink|Q510694}} Sub-Gaussian mean estimators
- {{#invoke:WikidataIB|getLink|Q343803}} The lower tail of random quadratic forms with applications to ordinary least squares
- {{#invoke:WikidataIB|getLink|Q2788420}} The sample complexity of learning linear predictors with the squared loss
- {{#invoke:WikidataIB|getLink|Q1305928}} The space complexity of approximating the frequency moments
- {{#invoke:WikidataIB|getLink|Q1915050}} Weak convergence and empirical processes. With applications to statistics
Cited in
(6)- Robust linear least squares regression
- Nonexact oracle inequalities, \(r\)-learnability, and fast rates
- Robust linear regression with broad distributions of errors
- Suboptimality of constrained least squares and improvements via non-linear predictors
- An elementary analysis of ridge regression with random design
- Least squares regression under weak moment conditions
This page was built for publication: Distribution-free robust linear regression
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2113267)