Covariance regression with random forests

Publication:71739

DOI: 10.48550/ARXIV.2209.08173, arXiv: 2209.08173, MaRDI QID: Q71739, FDO: Q71739


Authors: Cansu Alakus, Denis Larocque, Aurelie Labbe


Publication date: 16 September 2022

Abstract: Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.
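
The splitting rule described in the abstract can be illustrated with a minimal sketch. This is not the CovRegRF implementation: the names `covariance_split_score` and `best_split` are hypothetical, and both the distance (Euclidean distance over the upper-triangular entries) and the sqrt(n_left * n_right) weighting are assumptions chosen for illustration. The abstract only states that splits maximize the difference between the child nodes' sample covariance matrices.

```python
import numpy as np


def covariance_split_score(Y_left, Y_right):
    """Score a candidate split by how far apart the child covariance matrices are.

    Assumption: Euclidean distance over upper-triangular entries, weighted by
    sqrt(n_left * n_right) to discourage very unbalanced splits.
    """
    n_left, n_right = len(Y_left), len(Y_right)
    cov_left = np.atleast_2d(np.cov(Y_left, rowvar=False))
    cov_right = np.atleast_2d(np.cov(Y_right, rowvar=False))
    upper = np.triu_indices(cov_left.shape[0])
    distance = np.sqrt(np.sum((cov_left[upper] - cov_right[upper]) ** 2))
    return np.sqrt(n_left * n_right) * distance


def best_split(x, Y, min_node_size=5):
    """Scan thresholds on one covariate x and return the best (threshold, score)."""
    order = np.argsort(x)
    x_sorted, Y_sorted = x[order], Y[order]
    best_threshold, best_score = None, -np.inf
    for i in range(min_node_size, len(x) - min_node_size + 1):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold separates tied covariate values
        score = covariance_split_score(Y_sorted[:i], Y_sorted[i:])
        if score > best_score:
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2
            best_score = score
    return best_threshold, best_score


# Simulated data: the correlation between the two responses flips sign at x = 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(size=400)
z = rng.normal(size=(400, 2))
rho = np.where(x > 0.5, 0.9, -0.9)
Y = np.column_stack([z[:, 0], rho * z[:, 0] + np.sqrt(1 - rho**2) * z[:, 1]])

threshold, score = best_split(x, Y)
print(threshold, score)
```

In a full forest this search would be repeated over a random subset of covariates at every node of every tree; the sketch covers a single covariate at a single node.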

Summary: This article presents a method for estimating the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Trees are grown with a splitting rule that maximizes the difference between the sample covariance matrices of the child nodes, and out-of-bag (OOB) data are used to estimate the conditional covariance matrix for a new observation. The article covers the model setup, including assumptions about the error terms and covariate relationships; estimation based on nearest neighbors identified from OOB data; hypothesis testing for covariate effects; variable importance assessment; validation through a simulation study; and a real data application. The method is presented as flexible and computationally efficient, able to capture complex relationships among variables, and a competitive alternative to existing models in the literature.
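
The nearest-neighbor estimation step mentioned in the summary can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the function names, the `leaf_train` / `leaf_new` / `oob_mask` inputs, and the normalization of the weighted covariance (weights summing to one, no degrees-of-freedom correction) are all hypothetical choices. The idea shown is that training observations which are OOB for a tree and fall in the same terminal node as the new observation act as its neighbors, and their pooled counts weight the covariance estimate.

```python
import numpy as np


def neighbor_weights(leaf_train, leaf_new, oob_mask):
    """Forest-based weights for one new observation.

    leaf_train: (n_trees, n_train) terminal-node id of each training point per tree
    leaf_new:   (n_trees,) terminal-node id of the new point per tree
    oob_mask:   (n_trees, n_train) True where the training point is OOB for the tree
    """
    same_leaf = leaf_train == leaf_new[:, None]
    counts = (same_leaf & oob_mask).sum(axis=0).astype(float)
    total = counts.sum()
    if total == 0:
        raise ValueError("the new observation has no OOB neighbors in any tree")
    return counts / total


def weighted_covariance(Y, weights):
    """Weighted sample covariance of the rows of Y (weights must sum to one)."""
    mean = weights @ Y
    centered = Y - mean
    return (centered * weights[:, None]).T @ centered


# Toy example: 2 trees, 4 training observations, 2 responses.
leaf_train = np.array([[0, 0, 1, 1],
                       [0, 1, 1, 1]])
leaf_new = np.array([0, 1])
oob_mask = np.array([[True, False, True, True],
                     [True, True, False, True]])
Y_train = np.array([[1.0, 2.0],
                    [2.0, 1.0],
                    [3.0, 5.0],
                    [4.0, 3.0]])

w = neighbor_weights(leaf_train, leaf_new, oob_mask)
cov_hat = weighted_covariance(Y_train, w)
print(w)
print(cov_hat)
```

With uniform weights the estimate reduces to the ordinary (biased) sample covariance, which gives a quick sanity check on the weighting.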

Summary_simple: The article describes a method for estimating the covariance matrix of a multivariate response using a random forest framework. Trees are grown so that each split on the covariates maximizes the difference between the sample covariance matrices of the child nodes, and out-of-bag (OOB) data are then used to estimate conditional covariances for new observations. Key steps include setting up the model with assumptions about the response vector and error terms, building the trees, estimating covariance matrices via nearest neighbors from OOB data, conducting hypothesis tests, assessing variable importance, and evaluating performance through simulations and a real data example. The method is presented as flexible, able to capture complex relationships, and efficient in estimation compared with other models.

