Distribution-free robust linear regression

Publication:2113267

DOI: 10.4171/MSL/27; zbMATH Open: 1493.62429; arXiv: 2102.12919; OpenAlex: W4205966363; MaRDI QID: Q2113267; FDO: Q2113267


Authors: Jaouad Mourtada, Tomas Vaškevičius, Nikita Zhivotovskiy


Publication date: 11 March 2022

Published in: Mathematical Statistics and Learning

Abstract: We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order d/n with an optimal sub-exponential tail. While existing approaches to linear regression for heavy-tailed distributions focus on proper estimators that return linear functions, we highlight that the improperness of our procedure is necessary for attaining nontrivial guarantees in the distribution-free setting.


Full work available at URL: https://arxiv.org/abs/2102.12919










