On the amount of noise inherent in bandwidth selection for a kernel density estimator (Q1117628)

From MaRDI portal
scientific article

    Statements

    On the amount of noise inherent in bandwidth selection for a kernel density estimator (English)
    1987
    In the kernel density estimator \[ \hat f_h(x) = n^{-1}\sum_{i=1}^{n} h^{-1} K((x-X_i)/h), \] where \(\{X_1,\dots,X_n\}\) is a random sample from a distribution with density function \(f\), the choice of the bandwidth sequence \(h \equiv h_n\) is crucial to the performance of the estimator. The choice of \(h\) which minimizes the mean integrated square error, \[ \mathrm{MISE}(\hat f_h,f) = E\int_{-\infty}^{\infty}(\hat f_h(x)-f(x))^2\,dx = \int_{-\infty}^{\infty}E(\hat f_h(x)-f(x))^2\,dx, \] is \(h_f = n^{-1/5}\alpha(K)\beta(f)\), where \(\alpha(K)\) and \(\beta(f)\) are constants which depend on the kernel \(K\) and the density \(f\), respectively. Since the choice of \(K\) is available to the user, \(\alpha(K)\) is known. However, \[ \beta(f) = \Bigl[\int_{-\infty}^{\infty} |f^{(2)}(y)|^2\,dy\Bigr]^{-1/5} \] varies greatly even over well-known statistical distributions, \(f\) is unknown, and \(\hat f_h\) is not robust to poor choices of \(h\). Thus, any practical method of choosing a bandwidth must depend only on the sample.
    Let \(\hat h_f\) be the choice of \(h\) which minimizes the integrated square error, \[ \Delta(h,f) = \int (\hat f_h(x)-f(x))^2\,dx. \] Let \(\hat h_c\) be the least squares cross-validation choice of \(h\), which minimizes \[ CV(h) = \int \hat f_h(x)^2\,dx - 2n^{-1}\sum_{i=1}^{n}\hat f_{h,i}(X_i), \] where \(\hat f_{h,i}\) denotes the kernel density estimator with the \(i\)-th observation deleted from the sample. Clearly, \(\hat h_f\) and \(\hat h_c\) are functions of the sample, and \(\Delta(\hat h_f,f) \leq \Delta(\hat h_c,f)\). However, when \(K\) is a smooth symmetric density and \(f\) is twice differentiable, \[ \hat h_c/\hat h_f - 1 = {\mathcal O}_p(n^{-1/10}) \quad\text{and}\quad \Delta(\hat h_c,f)/\Delta(\hat h_f,f) - 1 = {\mathcal O}_p(n^{-1/5}), \] where \({\mathcal O}_p\) denotes order in probability, that is, the left-hand side divided by the stated rate is bounded in probability.
The major results of this paper show that these upper bounds are the best possible, in the sense that for any measurable function \(\hat h\) of \(X_1,\dots,X_n\), \[ \lim_{\epsilon\to 0}\liminf_{n\to\infty}\sup_{f\in F} P_f\bigl[\,|\hat h/\hat h_f - 1| > \epsilon n^{-1/10}\bigr] = 1 \] and \[ \lim_{\epsilon\to 0}\liminf_{n\to\infty}\sup_{f\in F} P_f\bigl[\,|\Delta(\hat h,f)/\Delta(\hat h_f,f) - 1| > \epsilon n^{-1/5}\bigr] = 1, \] where \(F\) is the class of all densities whose second derivatives exist and are uniformly bounded by a constant \(B>0\).
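To make the two bandwidths in the review concrete, the following is a minimal Python sketch and not code from the paper. It assumes a Gaussian kernel and a standard normal true density, both chosen only because they give closed forms for the integrals, and it uses the usual least squares cross-validation criterion with a factor 2 on the leave-one-out term. The function names (`normal_pdf`, `int_fhat_sq`, `cv_score`, `ise_std_normal`, `argmin_on_grid`), the sample size, and the bandwidth grid are all hypothetical choices for illustration.

```python
import numpy as np


def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u (s may be a scalar)."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def int_fhat_sq(x, h):
    """Exact integral of fhat_h(x)^2 for a Gaussian kernel (assumption).

    The convolution of two N(0, h^2) kernels is N(0, 2 h^2), so the
    integral is the average of that density over all pairs (i, j).
    """
    d = x[:, None] - x[None, :]
    return normal_pdf(d, h * np.sqrt(2.0)).mean()


def cv_score(x, h):
    """Least squares cross-validation score CV(h)."""
    n = len(x)
    k = normal_pdf(x[:, None] - x[None, :], h)
    np.fill_diagonal(k, 0.0)              # delete the i-th observation
    loo = k.sum(axis=1) / (n - 1)         # fhat_{h,i}(X_i)
    return int_fhat_sq(x, h) - 2.0 * loo.mean()


def ise_std_normal(x, h):
    """Integrated square error Delta(h, f), assuming f is N(0, 1)."""
    cross = normal_pdf(x, np.sqrt(1.0 + h * h)).mean()   # int fhat_h * f
    int_f_sq = 1.0 / (2.0 * np.sqrt(np.pi))              # int f^2
    return int_fhat_sq(x, h) - 2.0 * cross + int_f_sq


def argmin_on_grid(score, x, grid):
    """Return the grid value of h with the smallest score."""
    vals = np.array([score(x, h) for h in grid])
    return grid[int(np.argmin(vals))]


rng = np.random.default_rng(0)
x = rng.standard_normal(200)              # sample from the assumed f
grid = np.linspace(0.05, 1.5, 146)        # candidate bandwidths

h_c = argmin_on_grid(cv_score, x, grid)         # data-driven choice
h_f = argmin_on_grid(ise_std_normal, x, grid)   # needs the unknown f
ise_c = ise_std_normal(x, h_c)
ise_f = ise_std_normal(x, h_f)

print("h_c / h_f - 1 =", h_c / h_f - 1.0)
print("ISE ratio - 1 =", ise_c / ise_f - 1.0)
```

Repeating the last block over many seeds and sample sizes gives an empirical view of how slowly the relative error of the cross-validated bandwidth shrinks; a grid search is used here for simplicity, and a numerical optimizer could replace it.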
    data-driven estimate
    noise
    smoothing parameter selection
    window width
    kernel density estimator
    bandwidth
    mean integrated square error
    least squares
    cross-validation
    upper bounds