On the amount of noise inherent in bandwidth selection for a kernel density estimator (Q1117628)

scientific article

Language	Label	Description	Also known as
English	On the amount of noise inherent in bandwidth selection for a kernel density estimator	scientific article

Statements

instance of

scholarly article

0 references

title

On the amount of noise inherent in bandwidth selection for a kernel density estimator (English)

0 references

published in

The Annals of Statistics

0 references

publication date

1987

0 references

review text

In the kernel density estimator \[ \hat f_h(x)=n^{-1}\sum_{i=1}^{n}h^{-1}K((x-X_i)/h), \] where \(\{X_1,\dots,X_n\}\) is a random sample from a distribution with density function \(f\), the choice of the bandwidth sequence \(h\equiv h_n\) is crucial to the performance of the estimator. The choice of \(h\) which asymptotically minimizes the mean integrated square error, \[ \mathrm{MISE}(\hat f_h,f)=E\int_{-\infty}^{\infty}(\hat f_h(x)-f(x))^2\,dx=\int_{-\infty}^{\infty}E(\hat f_h(x)-f(x))^2\,dx, \] is \(h_f=n^{-1/5}\alpha(K)\beta(f)\), where \(\alpha(K)\) and \(\beta(f)\) are constants which depend on the kernel \(K\) and the density \(f\), respectively. Since the choice of \(K\) is available to the user, \(\alpha(K)\) is known. However, \[ \beta(f)=\Bigl[\int_{-\infty}^{\infty}|f^{(2)}(y)|^2\,dy\Bigr]^{-1/5} \] varies greatly even over well-known statistical distributions, \(f\) is unknown, and \(\hat f_h\) is not robust to poor choices of \(h\). Thus, any practical method of choosing a bandwidth must depend only on the sample. Let \(\hat h_f\) be the choice of \(h\) which minimizes the integrated square error, \[ \Delta(h,f)=\int(\hat f_h(x)-f(x))^2\,dx. \] Let \(\hat h_c\) be the least squares cross-validation choice of \(h\), which minimizes \[ CV(h)=\int\hat f_h(x)^2\,dx-2n^{-1}\sum_{i=1}^{n}\hat f_{h,i}(X_i), \] where \(\hat f_{h,i}\) denotes the kernel density estimator with the \(i\)th observation deleted from the sample. Clearly, \(\hat h_f\) and \(\hat h_c\) are functions of the sample (although \(\hat h_f\) also depends on the unknown \(f\)), and \(\Delta(\hat h_f,f)\leq\Delta(\hat h_c,f)\). However, when \(K\) is a smooth symmetric density and \(f\) is twice differentiable, then \[ \hat h_c/\hat h_f-1={\mathcal O}_p(n^{-1/10})\quad\text{and}\quad\Delta(\hat h_c,f)/\Delta(\hat h_f,f)-1={\mathcal O}_p(n^{-1/5}), \] where \({\mathcal O}_p\) denotes bounded in probability.
The major results of this paper show that these upper bounds are the best possible in the sense that, for \(\hat h\) any measurable function of \(X_1,\dots,X_n\), \[ \lim_{\epsilon\to 0}\liminf_{n\to\infty}\sup_{f\in F}P_f\bigl[|\hat h/\hat h_f-1|>\epsilon n^{-1/10}\bigr]=1 \] and \[ \lim_{\epsilon\to 0}\liminf_{n\to\infty}\sup_{f\in F}P_f\bigl[|\Delta(\hat h,f)/\Delta(\hat h_f,f)-1|>\epsilon n^{-1/5}\bigr]=1, \] where \(F\) is the class of all densities whose second derivatives exist and are uniformly bounded by a constant \(B>0\).
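
The two bandwidths compared in the review can be illustrated numerically. The following is a minimal sketch, not taken from the paper, under assumptions chosen for convenience: a Gaussian kernel \(K\), a standard normal \(f\) (so that \(\Delta(h,f)\) has a closed form), a grid search in place of exact minimization, and the usual factor 2 on the leave-one-out sum in \(CV(h)\). The function names `lscv_score` and `ise_normal` and the grid limits are hypothetical.

```python
import numpy as np


def gaussian_pdf(u, scale):
    """Density of N(0, scale^2) evaluated at u."""
    return np.exp(-0.5 * (u / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))


def int_fhat_squared(h, x):
    """Integral of fhat_h(x)^2 for a Gaussian kernel.

    The convolution of two N(0, h^2) kernels is N(0, 2 h^2), so the
    integral reduces to a double sum over pairwise differences.
    """
    d = x[:, None] - x[None, :]
    return gaussian_pdf(d, np.sqrt(2.0) * h).sum() / x.size**2


def lscv_score(h, x):
    """Least squares cross-validation criterion CV(h)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = gaussian_pdf(x[:, None] - x[None, :], h)
    # Leave-one-out estimates fhat_{h,i}(X_i): drop the diagonal term.
    loo = (k.sum(axis=1) - np.diag(k)) / (n - 1)
    return int_fhat_squared(h, x) - 2.0 * loo.mean()


def ise_normal(h, x):
    """Integrated square error Delta(h, f), assuming f is standard normal."""
    x = np.asarray(x, dtype=float)
    # Integral of fhat_h * f: each kernel convolved with N(0, 1).
    cross = gaussian_pdf(x, np.sqrt(1.0 + h**2)).mean()
    int_f_sq = 1.0 / (2.0 * np.sqrt(np.pi))
    return int_fhat_squared(h, x) - 2.0 * cross + int_f_sq


rng = np.random.default_rng(0)
sample = rng.standard_normal(200)          # assumed true density: N(0, 1)
grid = np.linspace(0.05, 1.5, 60)          # hypothetical search grid

cv_scores = np.array([lscv_score(h, sample) for h in grid])
ise_values = np.array([ise_normal(h, sample) for h in grid])

h_c = grid[int(np.argmin(cv_scores))]      # cross-validation bandwidth
h_f = grid[int(np.argmin(ise_values))]     # ISE-optimal bandwidth (needs f)

print("h_c / h_f - 1 =", h_c / h_f - 1.0)
print("ISE ratio - 1 =", ise_normal(h_c, sample) / ise_normal(h_f, sample) - 1.0)
```

Repeating the last part over many samples and several values of \(n\) gives a rough picture of how the relative errors shrink, although a simulation of this kind only illustrates the rates and does not verify them.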

0 references

zbMATH Keywords

data-driven estimate

0 references

noise

0 references

smoothing parameter selection

0 references

window width

0 references

kernel density estimator

0 references