PLS for Big Data: a unified parallel algorithm for regularised group PLS

From MaRDI portal
Publication:2323935

DOI: 10.1214/19-SS125
zbMATH Open: 1431.62249
arXiv: 1702.07066
OpenAlex: W2971816392
MaRDI QID: Q2323935
FDO: Q2323935

Pierre Lafaye de Micheaux, Benoit Liquet, Matthew William Sutton

Publication date: 13 September 2019

Published in: Statistics Surveys

Abstract: Partial Least Squares (PLS) methods have been widely used to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations, and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modelling is a key factor in obtaining better estimators and in identifying associations between multiple data sets. The cornerstone of the sparse versions of PLS methods is the link between the SVD of a matrix (constructed from deflated versions of the original data matrices) and least squares minimisation in linear regression. We present here an accurate description of the most popular PLS methods, together with their mathematical proofs. A unified algorithm is proposed to perform all four types of PLS, including their regularised versions. Various approaches to decrease the computation time are offered, and we show how the whole procedure can be scaled to big data sets.
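The SVD–least-squares link mentioned in the abstract can be illustrated with a minimal NumPy sketch (not the authors' implementation; all variable names and data here are illustrative): the first pair of PLS weight vectors are the leading singular vectors of M = XᵀY, which solve the rank-one least-squares problem min‖M − d·uvᵀ‖²_F and, equivalently, maximise the covariance uᵀXᵀYv — the criterion that sparse PLS variants then penalise.

```python
import numpy as np

# Synthetic two-block data (illustrative only): n observations,
# p variables in block X, q variables in block Y.
rng = np.random.default_rng(0)
n, p, q = 50, 10, 4
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))

# Cross-product matrix whose SVD yields the PLS weight vectors.
M = X.T @ Y
U, d, Vt = np.linalg.svd(M, full_matrices=False)
u, v = U[:, 0], Vt[0, :]  # first pair of weight vectors (unit norm)

# Variational view: (u, v) maximise the covariance u^T X^T Y v,
# and the maximum equals the leading singular value d[0].
cov = u @ M @ v
assert np.isclose(cov, d[0])
```

Regularised (e.g. lasso-penalised) versions modify this rank-one least-squares subproblem, which is why the SVD formulation is described in the paper as the cornerstone of the sparse PLS methods.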


Full work available at URL: https://arxiv.org/abs/1702.07066




Cited In (1)


