Distribution-free robust linear regression
Publication:2113267
DOI: 10.4171/MSL/27; zbMATH Open: 1493.62429; arXiv: 2102.12919; OpenAlex: W4205966363; MaRDI QID: Q2113267; FDO: Q2113267
Authors: Jaouad Mourtada, Tomas Vaškevičius, Nikita Zhivotovskiy
Publication date: 11 March 2022
Published in: Mathematical Statistics and Learning
Abstract: We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order \(d/n\) with an optimal sub-exponential tail. While existing approaches to linear regression for heavy-tailed distributions focus on proper estimators that return linear functions, we highlight that the improperness of our procedure is necessary for attaining nontrivial guarantees in the distribution-free setting.
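As a rough illustration of two ingredients named in the abstract, the sketch below shows a one-dimensional truncated least squares predictor and a basic median-of-means estimate of a mean. It is not the estimator constructed in the paper, which additionally relies on aggregation. The function names, the restriction to one covariate with no intercept, and the `threshold` and `num_blocks` parameters are assumptions made here for illustration only.

```python
from statistics import mean, median


def truncated_least_squares(xs, ys, threshold):
    """Fit 1-D least squares through the origin, then clip predictions.

    The returned predictor is x -> clip(w * x, -threshold, threshold).
    Because of the clipping it is not a linear function of x, which is
    what "improper" refers to. `threshold` is an assumed, user-chosen level.
    """
    sxx = sum(x * x for x in xs)
    # If every covariate is zero, any slope fits equally well; use 0.
    w = sum(x * y for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0

    def predict(x):
        return max(-threshold, min(threshold, w * x))

    return predict


def median_of_means(values, num_blocks):
    """Split values into contiguous blocks of equal size, average each
    block, and return the median of the block averages."""
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    size = len(values) // num_blocks  # leftover points at the end are dropped
    block_means = [
        mean(values[i * size:(i + 1) * size]) for i in range(num_blocks)
    ]
    return median(block_means)


# Toy data lying exactly on the line y = 2x, with truncation level 5.
predict = truncated_least_squares([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], threshold=5.0)
```

On this toy data the fitted slope is 2, so `predict(1.0)` gives 2.0 while `predict(10.0)` is clipped to 5.0. In `median_of_means`, a single outlier can distort at most one block average, so the median of the block averages is affected far less than the plain mean would be.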
Full work available at URL: https://arxiv.org/abs/2102.12919
Recommendations
- Robust linear least squares regression
- Loss minimization and parameter estimation with heavy tails
- Mean estimation and regression under heavy-tailed distributions: A survey
- Efficient algorithms and lower bounds for robust linear regression
- Iteratively reweighted \(\ell_1\)-penalized robust regression
Keywords: least squares; robust estimation; improper learning; median-of-means tournaments; random design linear regression
Cites Work
- Weak convergence and empirical processes. With applications to statistics
- Fast learning rates in statistical inference through aggregation
- Title not available
- Learning Theory and Kernel Machines
- Robust Statistics
- Aggregation via empirical risk minimization
- Learning by mirror averaging
- Nonparametric stochastic approximation with large step-sizes
- Title not available
- Title not available
- Title not available
- Convergence rates of least squares regression estimators with heavy-tailed errors
- Geometric median and robust estimation in Banach spaces
- Challenging the empirical mean and empirical variance: a deviation study
- Loss minimization and parameter estimation with heavy tails
- Title not available
- Robust linear least squares regression
- The space complexity of approximating the frequency moments
- Combining different procedures for adaptive regression
- Mixing strategies for density estimation.
- Local Rademacher complexities
- Improving the sample complexity using global data
- Title not available
- Optimal rates for the regularized least-squares algorithm
- Random generation of combinatorial structures from a uniform distribution
- Robust \(k\)-means clustering for distributions with two moments
- Adaptive importance sampling in least-squares Monte Carlo algorithms for backward stochastic differential equations
- Learning without concentration
- Performance of empirical risk minimization in linear aggregation
- A theory of the learnable
- On the stability and accuracy of least squares approximations
- Title not available
- Conditions d'intégrabilité pour les multiplicateurs dans le TLC banachique. (Integrability conditions for multiplicators in the central limit theorem in Banach spaces)
- The lower tail of random quadratic forms with applications to ordinary least squares
- Quantitative error estimates for a least-squares Monte Carlo algorithm for American option pricing
- How Many Variables Should be Entered in a Regression Equation?
- Competitive On-line Statistics
- Relative loss bounds for on-line density estimation with the exponential family of distributions
- Convergence of the Robbins-Monro method for linear problems in a Banach space
- Robust multivariate mean estimation: the optimality of trimmed mean
- Sub-Gaussian mean estimators
- Empirical entropy, minimax regret and minimax risk
- Relative expected instantaneous loss bounds
- On the strong universal consistency of a series type regression estimate
- Predicting \(\{ 0,1\}\)-functions on randomly drawn points
- Empirical risk minimization for heavy-tailed losses
- On optimality of empirical risk minimization in linear aggregation
- Robust machine learning by median-of-means: theory and practice
- Learning Bounded Subsets of Lₚ
- An unrestricted learning procedure
- On aggregation for heavy-tailed classes
- Risk minimization by median-of-means tournaments
- Sub-Gaussian estimators of the mean of a random vector
- Robust covariance estimation under \(L_4\)-\(L_2\) norm equivalence
- Near-optimal mean estimators with respect to general norms
- Regression function estimation on non compact support in an heteroscesdastic model
- Robust statistical learning with Lipschitz and convex loss functions
- Mean estimation and regression under heavy-tailed distributions: A survey
- Extending the scope of the small-ball method
- The sample complexity of learning linear predictors with the squared loss
Cited In (6)
- Robust linear regression with broad distributions of errors
- Suboptimality of constrained least squares and improvements via non-linear predictors
- Nonexact oracle inequalities, \(r\)-learnability, and fast rates
- An elementary analysis of ridge regression with random design
- Least squares regression under weak moment conditions
- Robust linear least squares regression