Additive partially linear models for massive heterogeneous data (Q1722060)

The authors generalize the partially linear model (PLM) presented in [\textit{T. Zhao} et al., Ann. Stat. 44, No. 4, 1400--1437 (2016; Zbl 1358.62050)] and propose an additive partially linear model (APLM) for modeling massive heterogeneous data. Let $\{(Y_i,\mathbf{X}_i, \mathbf{Z}_i)\}$, $i=1,2,\ldots, N$, be the observations from a sample consisting of $N$ subjects. According to the APLM, there exist $s$ independent sub-populations, and the data from the $j$-th sub-population satisfy \[ Y^{(j)}=\mathbf{X}^T\,\overrightarrow{\beta}_0^{\,(j)}+\sum_{k=1}^{K}g_{0k}(Z_k)+\varepsilon,\qquad j\in\{1,2,\dots,s\}, \] where $\mathbf{X} = (X_1,\dots, X_d)^T$, $\mathbf{Z} = (Z_1, \dots, Z_K)$, $\overrightarrow{\beta}_0^{\,(j)}=(\beta_{01}^{(j)},\dots, \beta_{0d}^{(j)})^T$ is the vector of unknown parameters for the $j$-th sub-population, $g_{01}, \dots, g_{0K}$ are unknown smooth functions, and the random variable $\varepsilon$ has zero mean and finite variance. Under the proposed model, $Y^{(j)}$ depends on $\mathbf{X}$ linearly, with coefficients that vary across sub-populations, whereas $Y^{(j)}$ depends on $\mathbf{Z}$ through additive non-linear functions that are common to all sub-populations. \par The main assumptions on the data structure and on the unknown parameters are described. Hypothesis testing procedures are presented, and the asymptotic properties of the estimators are derived. The performance of the proposed methods is evaluated via simulation studies and a real data example.
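The model equation above can be illustrated with a small simulation. The following is a minimal sketch, not the estimator of the reviewed paper: it draws data from $s$ sub-populations with heterogeneous coefficient vectors and common additive functions, and then recovers each $\overrightarrow{\beta}_0^{\,(j)}$ by ordinary least squares on a polynomial basis in $\mathbf{Z}$. The function names (`simulate_aplm`, `poly_basis`, `fit_subpopulation`), the polynomial basis, the covariate distributions, and the example functions $g_{0k}$ are all assumptions made for illustration. Fitting each sub-population separately also ignores that the $g_{0k}$ are shared, which a pooled procedure would exploit.

```python
import numpy as np


def simulate_aplm(n_per_group, betas, g_funcs, noise_sd=0.1, seed=0):
    """Draw data from Y^(j) = X^T beta^(j) + sum_k g_k(Z_k) + eps (illustrative)."""
    rng = np.random.default_rng(seed)
    s, d = betas.shape                       # s sub-populations, d linear covariates
    K = len(g_funcs)                         # K additive nonparametric components
    X = rng.normal(size=(s, n_per_group, d))             # assumed Gaussian covariates
    Z = rng.uniform(0.0, 1.0, size=(s, n_per_group, K))  # assumed uniform on [0, 1]
    g_sum = sum(g(Z[..., k]) for k, g in enumerate(g_funcs))  # common to all groups
    eps = rng.normal(scale=noise_sd, size=(s, n_per_group))   # zero mean, finite variance
    Y = np.einsum("jnd,jd->jn", X, betas) + g_sum + eps
    return X, Z, Y


def poly_basis(Z, degree):
    """Powers 1..degree of each column of Z (a stand-in for a spline basis)."""
    cols = [Z[:, k] ** p for k in range(Z.shape[1]) for p in range(1, degree + 1)]
    return np.column_stack(cols)


def fit_subpopulation(X, Z, Y, degree=5):
    """Least-squares estimate of beta^(j) for one sub-population."""
    design = np.column_stack([np.ones(len(Y)), X, poly_basis(Z, degree)])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[1:1 + X.shape[1]]            # skip intercept, keep the linear part


# Hypothetical setup: s = 3 sub-populations, d = 2, K = 2.
true_betas = np.array([[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]])
g_funcs = [lambda z: np.sin(2 * np.pi * z), lambda z: (z - 0.5) ** 2]

X, Z, Y = simulate_aplm(2000, true_betas, g_funcs, noise_sd=0.1, seed=0)
est_betas = np.vstack([fit_subpopulation(X[j], Z[j], Y[j]) for j in range(3)])
```

Here `est_betas[j]` estimates the coefficient vector of the $j$-th sub-population; comparing its rows shows the heterogeneity in the linear part, while the nonparametric part is generated identically in every group.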

reviewed by

Jonas Šiaulys

zbMATH Keywords

divide-and-conquer

homogeneity

heterogeneity

oracle property