Partial least squares prediction in high-dimensional regression (Q1731062)

The linear regression model \[ y=\mu+\beta^\top(X-\mathbb{E}(X))+\varepsilon \] is considered, where $y$ is a univariate response, $X\in \mathbb{R}^p$ is a random predictor vector, $\mu$ and $\beta$ are unknown coefficients, and the centered error $\varepsilon$ is independent of $X$. It is assumed that $(y, X)$ follows a nonsingular multivariate normal distribution and that the data $(y_i, X_i)$, $i=1,\dots, n$, are independent copies of $(y, X)$. The partial least squares (PLS) estimator $\hat{\beta}$ developed in [\textit{R. D. Cook} et al., J. R. Stat. Soc., Ser. B, Stat. Methodol. 75, No. 5, 851--877 (2013; Zbl 1411.62137)] is used. The asymptotic behavior of PLS prediction is studied as $n$ and $p$ diverge in various alignments. It is shown that there is a range of regression scenarios in which PLS predictors attain the $\sqrt{n}$ convergence rate even when $n$ is much smaller than $p$, and an even wider range in which the rate is slower but the predictions may still be practically useful. It is also shown that PLS predictions achieve their best asymptotic behavior in abundant regressions, where many predictors contribute information about the response.
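To make the setting concrete, the following is a minimal sketch of one-component PLS prediction with $n<p$. It is not taken from the reviewed paper. It uses the textbook one-component form $\hat{\beta}=s\,(s^\top s)/(s^\top S s)$, where $s$ is the sample covariance between $X$ and $y$ and $S$ is the sample covariance matrix of $X$; this may differ in detail from the estimator of Cook et al. The simulated design (a single latent factor `h` shared by all predictors, $n=100$, $p=500$), the function names `pls1_fit`, `pls1_predict` and `simulate`, and all numeric settings are illustrative assumptions chosen to mimic an abundant regression.

```python
import random


def pls1_fit(X, y):
    """One-component PLS for a univariate response (illustrative sketch)."""
    n, p = len(X), len(X[0])
    x_bar = [sum(row[j] for row in X) / n for j in range(p)]
    y_bar = sum(y) / n
    Xc = [[row[j] - x_bar[j] for j in range(p)] for row in X]
    yc = [v - y_bar for v in y]
    # s: sample covariance between each predictor and the response.
    s = [sum(Xc[i][j] * yc[i] for i in range(n)) / n for j in range(p)]
    # Scores t_i = s^T (X_i - x_bar); then s^T s = t^T yc / n and
    # s^T S s = t^T t / n, so the p x p matrix S is never formed.
    t = [sum(a * b for a, b in zip(row, s)) for row in Xc]
    scale = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)
    beta = [scale * sj for sj in s]
    return y_bar, x_bar, beta


def pls1_predict(model, x_new):
    """Predict y at x_new as mu_hat + beta_hat^T (x_new - x_bar)."""
    mu, x_bar, beta = model
    return mu + sum(b * (x - m) for b, x, m in zip(beta, x_new, x_bar))


def simulate(n, p, rng):
    """Assumed abundant design: every predictor carries the latent signal h."""
    X, y = [], []
    for _ in range(n):
        h = rng.gauss(0, 1)
        X.append([h + rng.gauss(0, 1) for _ in range(p)])
        y.append(h + 0.5 * rng.gauss(0, 1))
    return X, y


rng = random.Random(0)
n, p = 100, 500  # fewer observations than predictors
X_train, y_train = simulate(n, p, rng)
X_test, y_test = simulate(200, p, rng)

model = pls1_fit(X_train, y_train)
pred = [pls1_predict(model, x) for x in X_test]

mse = sum((a - b) ** 2 for a, b in zip(pred, y_test)) / len(y_test)
y_mean = sum(y_test) / len(y_test)
var_y = sum((v - y_mean) ** 2 for v in y_test) / len(y_test)
print("test MSE:", mse, "variance of test responses:", var_y)
```

Comparing the printed test error with the variance of the test responses gives a rough sense of how much signal the PLS predictor recovers in this simulated design. It illustrates the setting only and does not reproduce the asymptotic results of the paper.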

Mathematics Subject Classification ID

62J05
