On the amount of noise inherent in bandwidth selection for a kernel density estimator (Q1117628)

In the kernel density estimator \[ \hat f_h(x)=n^{-1}\sum_{i=1}^{n}h^{-1}K((x-X_i)/h), \] where \(\{X_1,\dots,X_n\}\) is a random sample from a distribution with density function \(f\), the choice of the bandwidth sequence \(h\equiv h_n\) is crucial to the performance of the estimator. The choice of \(h\) which minimizes the mean integrated square error, \[ \mathrm{MISE}(\hat f_h,f)=E\int_{-\infty}^{\infty}(\hat f_h(x)-f(x))^2\,dx=\int_{-\infty}^{\infty}E(\hat f_h(x)-f(x))^2\,dx, \] is asymptotically \(h_f=n^{-1/5}\alpha(K)\beta(f)\), where \(\alpha(K)\) and \(\beta(f)\) are constants which depend on the kernel \(K\) and the density \(f\), respectively. Since the choice of \(K\) is available to the user, \(\alpha(K)\) is known. However, \[ \beta(f)=\Bigl[\int_{-\infty}^{\infty}|f^{(2)}(y)|^2\,dy\Bigr]^{-1/5} \] varies greatly even over well-known statistical distributions, \(f\) is unknown, and \(\hat f_h\) is not robust against poor choices of \(h\). Thus, any practical method of choosing a bandwidth must depend only on the sample. Let \(\hat h_f\) be the choice of \(h\) which minimizes the integrated square error, \[ \Delta(h,f)=\int(\hat f_h(x)-f(x))^2\,dx. \] Note that \(\hat h_f\) depends on the unknown \(f\) as well as on the sample, so it serves as a benchmark rather than a usable selector. Let \(\hat h_c\) be the least squares cross-validation choice of \(h\), which minimizes \[ CV(h)=\int\hat f_h(x)^2\,dx-2n^{-1}\sum_{i=1}^{n}\hat f_{h,i}(X_i), \] where \(\hat f_{h,i}\) denotes the kernel density estimator with the \(i\)th observation deleted from the sample. Clearly, \(\hat h_f\) and \(\hat h_c\) are functions of the sample, and \(\Delta(\hat h_f,f)\leq\Delta(\hat h_c,f)\). However, when \(K\) is a smooth symmetric density and \(f\) is twice differentiable, \[ \hat h_c/\hat h_f-1={\mathcal O}_p(n^{-1/10})\quad\text{and}\quad\Delta(\hat h_c,f)/\Delta(\hat h_f,f)-1={\mathcal O}_p(n^{-1/5}), \] where \({\mathcal O}_p\) denotes bounded in probability.
The major results of this paper show that these upper bounds are the best possible, in the sense that for \(\hat h\) any measurable function of \(X_1,\dots,X_n\), \[ \lim_{\epsilon\to 0}\liminf_{n\to\infty}\sup_{f\in F}P_f\bigl[|\hat h/\hat h_f-1|>\epsilon n^{-1/10}\bigr]=1\quad\text{and} \] \[ \lim_{\epsilon\to 0}\liminf_{n\to\infty}\sup_{f\in F}P_f\bigl[|\Delta(\hat h,f)/\Delta(\hat h_f,f)-1|>\epsilon n^{-1/5}\bigr]=1, \] where \(F\) is the class of all densities whose second derivatives exist and are uniformly bounded by a constant \(B>0\).
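
To make the two bandwidths concrete, the following is a minimal numerical sketch, not taken from the paper, that compares \(\hat h_c\) with \(\hat h_f\) on one simulated sample. It assumes a Gaussian kernel \(K\) and a standard normal \(f\), for which \(\int\hat f_h^2\), the leave-one-out sum and \(\Delta(h,f)\) all have closed forms; the grid search and the function names (`normal_pdf`, `lscv`, `ise`, `compare_bandwidths`) are illustrative choices only.

```python
import numpy as np


def normal_pdf(u, s):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def lscv(h, x):
    """Least squares cross-validation score CV(h) for a Gaussian kernel."""
    n = len(x)
    d = x[:, None] - x[None, :]  # all pairwise differences X_i - X_j
    # Integral of fhat_h^2: two N(., h^2) kernels convolve to N(., 2 h^2).
    int_fhat_sq = normal_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    # Leave-one-out estimates fhat_{h,i}(X_i): drop the diagonal term j = i.
    k = normal_pdf(d, h)
    loo = (k.sum(axis=1) - normal_pdf(0.0, h)) / (n - 1)
    return int_fhat_sq - 2.0 * loo.mean()


def ise(h, x):
    """Integrated square error Delta(h, f), assuming f is standard normal."""
    n = len(x)
    d = x[:, None] - x[None, :]
    int_fhat_sq = normal_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    # Integral of fhat_h * f: N(X_i, h^2) against N(0, 1) gives N(0, 1 + h^2) at X_i.
    cross = normal_pdf(x, np.sqrt(1.0 + h**2)).mean()
    int_f_sq = 1.0 / (2.0 * np.sqrt(np.pi))  # integral of f^2 for N(0, 1)
    return int_fhat_sq - 2.0 * cross + int_f_sq


def compare_bandwidths(n=200, seed=0):
    """Grid-search h_c (minimizes CV) and h_f (minimizes ISE) on one sample."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    grid = np.linspace(0.05, 1.5, 300)  # assumed search range for h
    cv = np.array([lscv(h, x) for h in grid])
    err = np.array([ise(h, x) for h in grid])
    h_c = grid[cv.argmin()]
    h_f = grid[err.argmin()]
    return h_c, h_f, ise(h_c, x), ise(h_f, x)


if __name__ == "__main__":
    h_c, h_f, d_c, d_f = compare_bandwidths()
    print(f"h_c = {h_c:.3f}, h_f = {h_f:.3f}")
    print(f"h_c/h_f - 1 = {h_c / h_f - 1:+.3f}")
    print(f"Delta(h_c)/Delta(h_f) - 1 = {d_c / d_f - 1:+.3f}")
```

Repeating the comparison over many seeds and several sample sizes gives an informal feel for how slowly the relative error \(\hat h_c/\hat h_f-1\) shrinks; a single run illustrates the definitions only and says nothing about the rates themselves.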


Mathematics Subject Classification ID

62G05

data-driven estimate
