On the amount of noise inherent in bandwidth selection for a kernel density estimator (Q1117628): Difference between revisions

Revision as of 02:15, 5 March 2024

scientific article

Language	Label	Description	Also known as
English	On the amount of noise inherent in bandwidth selection for a kernel density estimator	scientific article

Statements

instance of

scholarly article

0 references

title

On the amount of noise inherent in bandwidth selection for a kernel density estimator (English)

0 references

published in

The Annals of Statistics

0 references

publication date

1987

0 references

review text

In the kernel density estimator \[ \hat f_ h=n^{- 1}\sum^{n}_{i=1}h^{-1} K((x-X_ i)/h) \] where \(\{X_ 1,...,X_ n\}\) is a random sample from a distribution which has density function f, the choice of the bandwidth sequence \(h\equiv h_ n\) is crucial to the performance of this estimator. The choice of h which minimizes the mean integrated square error, \[ MISE(\hat f_ h,f)=E \int^{\infty}_{- \infty}(\hat f_ h(x)-f(x))^ 2dx=\int^{\infty}_{-\infty}E(\hat f_ h(x)-f(x))^ 2dx, \] is \(h_ f=n^{-1/5}\alpha (K)\beta (f)\) where \(\alpha\) (K) and \(\beta\) (f) are constants which depend on the kernel K and density f, respectively. Since the choice of K is available to the user, \(\alpha\) (K) is known. However, \[ \beta (f)=[\int^{\infty}_{-\infty}| f^{(2)}(y)|^ 2dy]^{-1/5} \] varies greatly even over well-known statistical distributions, f is unknown, and \(\hat f_ h\) is not robust for poor choices of h. Thus, any practical method of choosing a bandwidth must depend only on the sample. Let \(\hat h_ f\) be the choice of h which minimizes the integrated square error, \[ \Delta (h,f)=\int (\hat f_ h(x)-f(x))^ 2dx. \] Let \(\hat h_ c\) be the least squares, cross-validation choice of h which minimizes \[ CV(h)=\int \hat f_ h(x)^ 2dx-n^{- 1}\sum^{n}_{i=1}\hat f_{h,i}(X_ i) \] where \(\hat f_{h,i}\) denotes the kernel density estimator with the i th observation deleted from the sample. Clearly, \(\hat h_ f\) and \(\hat h_ c\) are functions of the sample, and \(\Delta\) (ĥ\({}_ f,f)\leq \Delta (\hat h_ c,f)\). However, when K is a smooth symmetric density and f is twice differentiable, then \[ \hat h_ c/\hat h_ f-1={\mathcal O}(n^{- 1/10})\quad and\quad \Delta (\hat h_ c,f)/\Delta (\hat h_ f,f)- 1={\mathcal O}(n^{-1/5}), \] where \({\mathcal O}_ p\) denotes bounded in probability. The major results of this paper show that these upper bounds are the best possible in the sense that for \(\hat h\) any measurable function of \(X_ 1,...,X_ n\), \[ \lim_{\epsilon \to 0}\liminf_{n\to \infty}\sup_{f\in F}P_ f[| \hat h/\hat h_ f- 1| >\epsilon n^{-1/10}]=1,\quad and \] \[ \lim_{\epsilon \to 0}\liminf_{n\to \infty}\sup_{f\in F}P_ f[| \Delta (\hat h,f)/\Delta (\hat h_ f,f)-1| >\epsilon n^{-1/5}]=1 \] where F is the class of all densities whose second derivatives exist and are uniformly bounded by a constant \(B>0\).

0 references

zbMATH Keywords

data-driven estimate

0 references

noise

0 references

smoothing parameter selection

0 references

window width

0 references

kernel density estimator

0 references

bandwidth

0 references

mean integrated square error

0 references

least squares

0 references

cross-validation

0 references

upper bounds

0 references

0 references

0 references

MaRDI publication profile

0 references

Identifiers

zbMATH Open document ID

0667.62022

0 references

DOI

10.1214/aos/1176350259

0 references

Mathematics Subject Classification ID

0 references

0 references

0 references

Sitelinks

Mathematics(1 entry)

mardi Publication:1117628

Revision as of 21:04, 9 February 2024 RedirectionBot (talk \| contribs) Bots 2,880,369 edits ‎Changed an Item ← Older edit	Revision as of 02:15, 5 March 2024 Import240304020342 (talk \| contribs) 4,416,906 edits Set profile property. Newer edit →
	Property / MaRDI profile type
		MaRDI publication profile
	Property / MaRDI profile type: MaRDI publication profile / rank
		Normal rank