Nonparametric estimation of a regression function (Q5916348)

From MaRDI portal
Property / DOI: 10.1214/AOS/1176347382

Latest revision as of 09:49, 16 December 2024

Language: English
Label: Nonparametric estimation of a regression function
Description: scientific article; zbMATH DE number 4612

    Statements

    Nonparametric estimation of a regression function (English)
    25 June 1992
    Let \((X_1,Y_1),\dots,(X_n,Y_n)\) be i.i.d. on \(I\times\mathbb{R}\), where \(I\) is a compact set in \(\mathbb{R}^p\), and let \(m(x)=E(Y\mid X=x)\). Let \(F\) be the marginal distribution function (d.f.) of \(X\) and let \(F_n\) be its empirical d.f. It is assumed that \(\sup_x E(Y^{4s}\mid X=x)<\infty\) for some integer \(s\geq 2\). Let \(\{W_{nk}:\;k=(k_1,\dots,k_p)\in D_n\}\) be a sequence of weight functions (depending on \(F\)) on \(I\times I\), written \(W_k\) below, where \(D_n\) is an index set and \(\operatorname{card}(D_n)=K_n\) with \(K_n/n^s\to 0\). From this sequence of weight functions the authors construct a sequence of estimates \[ \hat m_k(x)=\frac{1}{n}\sum_{j=1}^n Y_j W_k(x,X_j,F_n),\qquad k\in D_n. \]

    The authors propose a data-dependent method of choosing the (smoothness) index \(k\). The index \(\tilde k\) that minimizes the prediction square error depends on the unknown distribution of \((X,Y)\), so the authors heuristically motivate using instead \(\hat k\), the minimizer of \[ \hat T_n(k)=n^{-2}\sum_{j=1}^n \hat\varepsilon_{kj}^2\,[1+2n^{-1}W_k(X_j,X_j,F_n)], \] where \(\hat\varepsilon_{kj}=Y_j-\hat m_k(X_j)\). They then use \(\hat m_{\hat k}\) as an estimate of the unknown regression function \(m(x)\).

    This estimate can be specialized to piecewise polynomial, spline, orthogonal series, kernel and nearest neighbor methods. The main optimality result is that for all of these methods \(L_n(\hat k)/L_n(\tilde k)\to 1\) in probability, where \[ L_n(k)=\int(\hat m_k(x)-m(x))^2\,dF(x). \] Further results of this kind and a numerical example are also given.
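
    The selection rule above can be illustrated with a minimal sketch for the kernel case with \(p=1\). Everything specific here is an assumption made for illustration and is not taken from the reviewed paper: the Gaussian kernel, the Nadaraya-Watson form of the weights (scaled so that they average to 1 over \(j\)), the simulated data, the bandwidth grid, and the helper names `nw_weights` and `criterion`. The index \(k\) is played by a bandwidth `h`, and the criterion is \(\hat T_n\) exactly as written in the review.

```python
import numpy as np

def nw_weights(x_eval, x_obs, h):
    """Kernel weights W_h(x, X_j, F_n), scaled so that their mean over j is 1.

    With this scaling, m_hat(x) = (1/n) * sum_j Y_j * W_h(x, X_j, F_n)
    is the Nadaraya-Watson estimate (an assumed choice of weight function).
    """
    u = (x_eval[:, None] - x_obs[None, :]) / h
    k = np.exp(-0.5 * u**2)  # Gaussian kernel; its constant cancels below
    return k / k.mean(axis=1, keepdims=True)

def criterion(x, y, h):
    """T_hat_n(h) = n^-2 * sum_j resid_j^2 * (1 + 2 * W_h(X_j, X_j, F_n) / n)."""
    n = len(y)
    w = nw_weights(x, x, h)        # w[i, j] = W_h(X_i, X_j, F_n)
    fitted = w @ y / n             # m_hat_h evaluated at each X_i
    resid = y - fitted             # eps_hat_{hj}
    penalty = 1.0 + 2.0 * np.diag(w) / n
    return np.sum(resid**2 * penalty) / n**2

# Simulated data (hypothetical): m(x) = sin(2*pi*x) on I = [0, 1].
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(n)

# Finite index set D_n, here a small grid of bandwidths.
grid = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5]
scores = {h: criterion(x, y, h) for h in grid}
h_hat = min(scores, key=scores.get)  # data-driven choice, the analogue of k_hat
print("selected bandwidth:", h_hat)
```

    The factor \(1+2n^{-1}W_k(X_j,X_j,F_n)\) inflates each squared residual by the weight an observation places on itself, which penalizes very small bandwidths that would otherwise nearly interpolate the data.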
    kernel method
    weight functions
    data-dependent method
    prediction square error
    piecewise polynomial
    spline
    orthogonal series
    nearest neighbor methods

    Identifiers