A partially linear framework for massive heterogeneous data

From MaRDI portal
Publication:309709

DOI: 10.1214/15-AOS1410
zbMATH Open: 1358.62050
arXiv: 1410.8570
Wikidata: Q36352871
Scholia: Q36352871
MaRDI QID: Q309709
FDO: Q309709

Han Liu, Tianqi Zhao, Guang Cheng

Publication date: 7 September 2016

Published in: The Annals of Statistics

Abstract: We consider a partially linear framework for modelling massive heterogeneous data. The major goal is to extract common features across all sub-populations while exploring the heterogeneity of each sub-population. In particular, we propose an aggregation-type estimator for the commonality parameter that possesses the (non-asymptotic) minimax optimal bound and the asymptotic distribution it would have if there were no heterogeneity. This oracle result holds when the number of sub-populations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed and shown to possess the asymptotic distribution it would have if the commonality information were available. We also test for heterogeneity among a large number of sub-populations. All of the above results require regularizing each sub-estimation as though it had the entire sample size. Our general theory applies to the divide-and-conquer approach that is often used to deal with massive homogeneous data. A technical by-product of this paper is statistical inference for general kernel ridge regression. Thorough numerical results are also provided to support our theory.
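The abstract's recipe (fit each sub-population separately, regularize as if the full sample were available, average the shared part, then plug the average back in) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's estimator. It assumes, for illustration only, that sub-population j follows Y = X^T beta_j + f(Z) + eps, with the nonparametric f shared (the commonality) and the linear coefficients beta_j sub-population-specific (the heterogeneity). The Gaussian kernel, its bandwidth, the penalty lam = 1/N, the OLS plug-in step, and all function names (`fit_partial_linear_krr`, `f_bar`, and so on) are hypothetical choices. The one feature taken from the abstract is that lam depends on the total sample size N rather than on the sub-sample size n.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=0.2):
    # Gaussian kernel matrix between 1-D point sets a and b (bandwidth is a hypothetical choice)
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2.0 * bandwidth**2))

def fit_partial_linear_krr(X, z, y, lam):
    """Fit y = X @ beta + f(z) + eps on one sub-population by kernel ridge regression.

    Minimizes (1/n)||y - X beta - K alpha||^2 + lam * alpha' K alpha.
    Profiling out alpha gives alpha = A (y - X beta) with A = (K + n*lam*I)^{-1},
    and beta solves the weighted least-squares problem min (y - X beta)' A (y - X beta).
    """
    n = len(y)
    M = rbf_kernel(z, z) + n * lam * np.eye(n)
    AX = np.linalg.solve(M, X)                   # A @ X
    Ay = np.linalg.solve(M, y)                   # A @ y
    beta = np.linalg.solve(X.T @ AX, X.T @ Ay)   # (X' A X)^{-1} X' A y
    alpha = Ay - AX @ beta                       # A @ (y - X beta)
    return beta, z, alpha

def predict_f(z_train, alpha, z_new):
    # Evaluate f_hat(z_new) = sum_i alpha_i K(z_i, z_new)
    return rbf_kernel(z_new, z_train) @ alpha

def f_true(z):
    # Hypothetical shared (commonality) function used to simulate data
    return np.sin(2 * np.pi * z)

rng = np.random.default_rng(0)
s, n = 5, 200        # hypothetical: s sub-populations of n observations each
N = s * n
lam = 1.0 / N        # hypothetical penalty, tuned for the FULL sample size N, not for n
betas_true = [np.array([1.0 + 0.5 * j, -1.0]) for j in range(s)]  # heterogeneous linear parts

data, fits = [], []
for j in range(s):
    X = rng.standard_normal((n, 2))
    z = rng.uniform(0.0, 1.0, n)
    y = X @ betas_true[j] + f_true(z) + 0.3 * rng.standard_normal(n)
    data.append((X, z, y))
    fits.append(fit_partial_linear_krr(X, z, y, lam))

def f_bar(z_new):
    # Aggregated commonality estimate: average of the s sub-population estimates of f
    return np.mean([predict_f(zt, a, z_new) for _, zt, a in fits], axis=0)

# Simple plug-in heterogeneity estimates: OLS of y_j - f_bar(z_j) on X_j in each sub-population
beta_plugin = [np.linalg.lstsq(X, y - f_bar(z), rcond=None)[0] for X, z, y in data]

grid = np.linspace(0.05, 0.95, 50)
rmse = np.sqrt(np.mean((f_bar(grid) - f_true(grid)) ** 2))
print(f"RMSE of f_bar on grid: {rmse:.3f}")
for j, b in enumerate(beta_plugin):
    print(j, np.round(b, 2), betas_true[j])
```

Setting every beta_j equal turns the same loop into a plain divide-and-conquer fit on homogeneous data. Replacing lam = 1/N with a penalty tuned to the sub-sample size n would over-smooth each piece, and averaging does not remove that bias. This is the practical point behind the abstract's requirement to regularize each sub-estimation as though it had the entire sample size.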


Full work available at URL: https://arxiv.org/abs/1410.8570

Cited In (71)
