Copula-based regression models with data missing at random (Q2201548)

From MaRDI portal
scientific article
Language | Label | Description | Also known as
English | Copula-based regression models with data missing at random | scientific article |

    Statements

    Copula-based regression models with data missing at random (English)
    29 September 2020
    Given a regressand \(Y\) and a \(d\)-dimensional vector of regressors \(\mathbf{W}=(W_1,\dots,W_d)^\top\), the article considers the generalized regression model \(a_0(\mathbf{w})=\arg\min_{a \in\mathbb{R}}\mathbf{E}[L\{g(Y)-a\}\mid\mathbf{W}=\mathbf{w}]\), where \(\mathbf{E}\) denotes mathematical expectation, \(g(Y)\) is a known function of \(Y\), and \(L(v)\) is a loss function whose derivative exists almost everywhere. The model covers many prominent cases. For example, if \(L(v)=v^2\) and \(g(Y)=Y\), then \(a_0(\mathbf{w})=\mathbf{E}(Y\mid\mathbf{W}=\mathbf{w})\) is the conditional mean regression; if \(L(v)=v\{\tau-\mathbf{1}(v \leq 0)\}\) and \(g(Y)=Y\), then \(a_0(\mathbf{w})\) is the \(\tau\)-th conditional quantile. The conditional distribution function and asymmetric least squares can also be represented by this model.

    A conditional mean regression function is estimated by exploiting copulas. This approach allows the conditional expectation of the loss function given \(\mathbf{W}\) to be rewritten as an unconditional expectation involving a semiparametric copula and nonparametric marginal distributions. The semiparametric copula and the target regression curve are estimated via the calibration approach.

    The benchmark approach assumes that complete data are available. The article generalizes it to data missing at random (MAR): the data \([Y_i,\mathbf{W}_i]\) and the indicators of their missing status \(\mathbf{T}_i=(T_{0i},T_{1i},\dots,T_{di})\), where \(T_{ji}=1\) if the \(j\)-th component of \([Y_i,\mathbf{W}_i]\) is observed and \(T_{ji}=0\) otherwise (\(j=0\) for \(Y_i\), \(j=1,\dots,d\) for \(W_{ji}\)), are conditionally independent of each other given covariates that are observed for all \(i \in \{1,\dots,N\}\). Consistency and asymptotic normality of the estimated regression curve are proved. Simulation results show that the proposed approach performs well in finite samples, while the benchmark approach suffers from substantial bias under MAR.
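    The copula rewriting of the conditional expectation can be sketched as follows. This assumes continuous margins \(F_Y, F_1, \dots, F_d\) with densities, a joint copula density \(c\) of \((Y, \mathbf{W})\) whose first argument belongs to \(Y\), and \(c_{\mathbf{W}}\) the copula density of \(\mathbf{W}\) alone. It is the standard Sklar-density argument, not a transcription of the article's own derivation.

```latex
\begin{aligned}
f(y \mid \mathbf{w}) &= \frac{f_Y(y)\, c\{F_Y(y), \mathbf{u}\}}{c_{\mathbf{W}}(\mathbf{u})},
  \qquad \mathbf{u} = (F_1(w_1), \dots, F_d(w_d)), \\
\mathbf{E}[L\{g(Y)-a\} \mid \mathbf{W}=\mathbf{w}]
  &= \frac{\mathbf{E}[L\{g(Y)-a\}\, c\{F_Y(Y), \mathbf{u}\}]}{c_{\mathbf{W}}(\mathbf{u})}, \\
a_0(\mathbf{w}) &= \arg\min_{a \in \mathbb{R}}
  \mathbf{E}[L\{g(Y)-a\}\, c\{F_Y(Y), \mathbf{u}\}].
\end{aligned}
```

    The expectation on the right-hand side is taken over the marginal distribution of \(Y\). Because \(c_{\mathbf{W}}(\mathbf{u})\) does not depend on \(a\), it drops out of the minimization; for \(d=1\) it is identically 1.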
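    A minimal Python sketch, assuming NumPy, of two ideas in the review. First, one empirical minimizer of \(L\{g(Y)-a\}\) yields the mean, quantile, expectile (asymmetric least squares) and distribution-function cases. Second, copula weights turn the conditional problem into a weighted unconditional one. For illustration only, it assumes \(d=1\), complete data (the benchmark setting), and a bivariate Gaussian copula with known correlation `rho`. This stands in for the semiparametric copula that the article estimates by calibration. The MAR calibration step is not implemented. All function names are hypothetical, and the grid search is a simple stand-in for a proper optimizer.

```python
import numpy as np
from statistics import NormalDist


def squared_loss(v):
    # L(v) = v^2: the minimizer is the (weighted) mean of g(Y)
    return v ** 2


def check_loss(tau):
    # L(v) = v * {tau - 1(v <= 0)}: the minimizer is the tau-th quantile
    return lambda v: v * (tau - (v <= 0).astype(float))


def asymmetric_squared_loss(tau):
    # L(v) = |tau - 1(v <= 0)| * v^2: asymmetric least squares (tau-th expectile)
    return lambda v: np.abs(tau - (v <= 0).astype(float)) * v ** 2


def weighted_argmin(z, loss, grid, weights=None):
    # Grid search for argmin_a mean_i[weights_i * L(z_i - a)]; weights default to 1
    z = np.asarray(z, dtype=float)
    weights = np.ones_like(z) if weights is None else np.asarray(weights, dtype=float)
    risks = [np.mean(weights * loss(z - a)) for a in grid]
    return float(grid[int(np.argmin(risks))])


def gaussian_copula_density(u, v, rho):
    # Bivariate Gaussian copula density c(u, v) with correlation rho
    inv = NormalDist().inv_cdf
    x = np.array([inv(p) for p in np.atleast_1d(u)])
    y = np.array([inv(p) for p in np.atleast_1d(v)])
    q = (rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y) / (2 * (1 - rho ** 2))
    return np.exp(-q) / np.sqrt(1 - rho ** 2)


def rescaled_ecdf(sample, points):
    # Empirical CDF scaled by n/(n+1) and clipped so inv_cdf never sees 0 or 1
    sample = np.sort(np.asarray(sample, dtype=float))
    n = len(sample)
    counts = np.searchsorted(sample, np.atleast_1d(points), side="right")
    return np.clip(counts / (n + 1), 1 / (n + 1), n / (n + 1))


def copula_regression(y, w, w0, rho, loss, grid, g=lambda t: t):
    # Complete-data estimate of a_0(w0) for d = 1 with a known Gaussian copula:
    # argmin_a mean_i[ L(g(Y_i) - a) * c(F_Y(Y_i), F_W(w0)) ]
    u_y = rescaled_ecdf(y, y)        # pseudo-observations of Y
    u_w0 = rescaled_ecdf(w, w0)[0]   # estimated F_W at the query point
    weights = gaussian_copula_density(u_y, np.full(len(u_y), u_w0), rho)
    return weighted_argmin(g(np.asarray(y, dtype=float)), loss, grid, weights)


if __name__ == "__main__":
    # Simulated bivariate normal data with standard normal margins, so the
    # copula is Gaussian with correlation rho and E(Y | W = 1) = rho = 0.6
    rng = np.random.default_rng(0)
    n, rho = 2000, 0.6
    w = rng.standard_normal(n)
    y = rho * w + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
    grid = np.linspace(-3.0, 3.0, 601)
    print("mean:", copula_regression(y, w, 1.0, rho, squared_loss, grid))
    print("median:", copula_regression(y, w, 1.0, rho, check_loss(0.5), grid))
```

    The distribution-function case uses the same code with `g = lambda t: (t <= y0).astype(float)` and `squared_loss`, where `y0` is the evaluation point. Under MAR, the article replaces these complete-data averages with calibration-based estimates, which this sketch does not attempt.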
    calibration estimation
    generalized regression model
    missing at random (MAR)
    semiparametric copula