STATISTICA, anno LXVIII, nn. 3-4, 2008

EDGEWORTH AND CORNISH FISHER EXPANSIONS AND CONFIDENCE INTERVALS FOR THE DISTRIBUTION, DENSITY AND QUANTILES OF KERNEL DENSITY ESTIMATES

C.S. Withers, S. Nadarajah

1. INTRODUCTION

Our aim here is to present nonparametric confidence intervals (CIs) for densities on ℜ based on kernel estimates. Suppose we have a random sample X_1, ..., X_n of size n on ℜ with empirical distribution Fˆ(x), drawn from a distribution X ∼ F(x) with density f(x) whose derivatives f^{(r)}(x_0), r = 1, 2, ..., are finite at some point x_0. Let K(z) be a function on ℜ with the finite values

B_{ri} = B_{ri}(K) = ∫ z^r K(z)^i dz for r = 0, 1, ..., i = 1, 2, ..., and B_{01} = 1.

The kernel estimate for f(x_0) of kernel K and bandwidth (or smoothing) parameter h is

fˆ(x_0) = ∫ K_h(x_0 − y) dFˆ(y) = n^{−1} Σ_{i=1}^n U_{1i},   (1)

where K_h(x) = h^{−1} K(x/h) and U_{1i} = K_h(x_0 − X_i). The bandwidth h = h(n) is assumed to satisfy h → 0 as n → ∞. This estimate has mean

T_h(F, K) = ∫ K_h(x_0 − y) dF(y) = ∫ K(z) f(x_0 − hz) dz = Σ_{r=0}^∞ f^{(r)}(x_0) (−h)^r B_{r1}/r!.   (2)

Note that the existence of the full Taylor series in (2) is not needed for the results in this paper; boundedness of the required number of derivatives of f is sufficient to validate the results presented. Now suppose that the kernel K is of order p ≥ 1, that is, B_{r1} = 0 for 1 ≤ r < p and B_{p1} ≠ 0. Then the asymptotic mean squared error (AMSE) satisfies E|fˆ(x_0) − f(x_0)|² = O(n^{−γ}), γ = 2p/(2p + 1), when the bandwidth is chosen as

h = cn^{−α}   (3)

for c > 0 and α = 1/(2p + 1). This rate γ holds of course for any c > 0, not just the AMSE-optimal c = c(f, x_0) say. Compare Parzen (1962). (Allowing c to depend on x_0 means that fˆ(x_0) no longer integrates to one.) In fact, this choice of α also achieves the minimum asymptotic integrated MSE (AIMSE)

∫ E|fˆ(x) − f(x)|² dx = O(n^{−γ})   (4)

under suitable regularity conditions for γ = 2p/(2p + 1) and any c > 0, not just the AIMSE-optimal c = c(f) say. See Theorem 2.1.7 of Prakasa Rao (1983) and (3.4) of Terrell and Scott (1992). Using either AMSE or AIMSE, the optimal c can be estimated using an initial estimate for f; see, for example, Scott et al. (1977). However, the use of such an estimate may destroy its optimality property.

Kernel estimates were introduced by Rosenblatt (1956). Since then there has been a huge literature on the subject; we mention only a few works here. Silverman (1986) and Terrell and Scott (1992) show that the gain in optimizing c is generally not great compared with fixed-c estimators. For example, the latter have asymptotic efficiency about 0.9 or 0.8 for f normal or Cauchy. (Terrell and Scott (1992) scale K so that B_{p1} = ±1.) Prakasa Rao (1983) has many related results; for example, some give convergence rates for sup_x |fˆ(x) − f(x)|. See Theorems 2.1.10, 2.1.12, 2.1.15 and 2.1.20 and Theorems 2.2.3 and 2.2.4 for similar results. Silverman (1986) and Terrell and Scott (1992) also consider estimates of the form fˆ(x_0) = ∫ K_{h(y)}(x_0 − y) dFˆ(y) for K_h of (1). When p = 2 one can choose h(y) to achieve (4) as if p = 4, that is, with γ = 8/9.

Optimizing c is one way of deciding how to choose h, or equivalently how to scale K. Another way is to replace h by h(F_n) = cn^{−α} µ_2(F_n)^{1/2}, where µ_2(F) is the variance and F_n the empirical distribution. One can show that this does not affect the optimal rate γ. Silverman (1986, page 45) suggests this with α = 1/(2p + 1) for p = 2 and c = 1.06.

Turning now to CIs for f(x_0), the AMSE and AIMSE criteria are no longer relevant. The obvious criterion is to minimise the coverage error, that is, the difference between the actual coverage probability and the nominal level of the CI, or rather the asymptotic coverage error (ACE). Theorem 2.1.18 of Prakasa Rao (1983) gives a consistent CI for f(x_0), but its ACE is not given. The first order CIs are based on Studentizing. Their optimal bandwidth h again has the form (3), giving ACE = O(n^{−β}). We shall refer to β as the ACE rate; its possible values are β > 0.

The contents of this paper are organized as follows. Section 2 provides Edgeworth and Cornish Fisher expansions for Studentized versions of the kernel density estimate fˆ(x_0) and its extensions. These expansions are used to derive various CIs for f(x_0). Section 3 derives first order one- and two-sided CIs using asymptotic Studentization (ε = 1) and empirical Studentization (ε = p), where ε is an indicator variable. Section 4 derives second order one- and two-sided CIs. Section 5 considers choosing the ACE-optimal constant c in (3) for first order CIs based on empirical Studentization. Finally, some conclusions are noted in Section 6.

2. CORNISH FISHER EXPANSIONS

Here, we lay the groundwork by showing that the kernel density estimate has Cornish-Fisher expansions with parameter m = nh. Moments and cumulants of the kernel estimate fˆ(x_0) in (1) are derived in Section 2.1. These quantities are used in Sections 2.2-2.5 to provide Cornish Fisher expansions for Studentized versions of fˆ(x_0).

Section 2.2 derives Cornish Fisher expansions for

Y_n = m^{1/2} (ŵ_1 − w_1) k_{2h}^{−1/2},   (5)

where w_1 = f(x_0), ŵ_1 = fˆ(x_0), U_1 = K_h(x_0 − X), κ_{rh} = κ_r(U_1) and k_{rh} = h^{r−1} κ_{rh}.

Section 2.3 derives Cornish Fisher expansions for

Y_n = m^{1/2} (Ŵ − W),   (6)

where W = (w_1, ..., w_q)′, Ŵ = (ŵ_1, ..., ŵ_q)′, q ≥ 1, w_{ih} = T_h(F, K_i), K_i(z) = K(z)^i and ŵ_{ih} = T_h(Fˆ, K_i). Note that we have suppressed the dependency of w_{ih} on h by simply writing w_i for w_{ih}.

Section 2.4 derives Cornish Fisher expansions for

Y_{n0} = m^{1/2} (t(Ŵ) − t(W)) a_{21}^{−1/2},   (7)

where t(W) is a real function with finite derivatives that satisfies the asymptotic expansion (see (19) and Withers (1982))

κ_r(t(Ŵ)) ≈ Σ_{i=r−1}^∞ a_{ri} m^{−i}   (8)

for r ≥ 1 and certain bounded functions a_{ri} = a_{rih}(w) given by Withers (1982).

Finally, Section 2.5 derives Cornish Fisher expansions for

Y_{n1} = m^{1/2} t(Ŵ)   (9)

for t(W) = (w_1 − f(x_0)) k̂_{2h}^{−1/2} and k̂_{2h} an estimate of k_{2h}.
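To fix ideas before deriving the expansions, here is a minimal Python sketch of the estimate (1) with the bandwidth rule (3). All function names are ours, not the paper's; we use the order p = 2 Gaussian kernel, and the demo constants (seed, sample size) are illustrative assumptions only.

```python
import math
import random

def gauss_kernel(z):
    # standard normal density: an order p = 2 kernel (B01 = 1, B11 = 0, B21 = 1)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def bandwidth(n, c=1.06, p=2):
    # h = c * n^(-alpha) of (3), with the AMSE-optimal alpha = 1/(2p + 1)
    return c * n ** (-1.0 / (2 * p + 1))

def kde(sample, x0, h):
    # f_hat(x0) = n^(-1) * sum_i K_h(x0 - X_i), where K_h(x) = h^(-1) K(x/h), as in (1)
    n = len(sample)
    return sum(gauss_kernel((x0 - x) / h) for x in sample) / (n * h)

# small demo on simulated N(0, 1) data; the true density at 0 is 0.3989...
random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]
estimate = kde(xs, 0.0, bandwidth(len(xs)))
```

As n grows, `estimate` settles near φ(0) ≈ 0.399 at the rate n^{−p/(2p+1)} discussed above.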


The expansions given are "formal". We have given the expansions in their fullest possible forms, rather than say to the first or the second order, because they could lead to better approximations and have wider applicability; often expansions give better approximations even if they diverge. We have not attempted to check the existence or the validity of the expansions. Some Cramér-type conditions and conditions on differentiability may be required to ensure existence and validity; see, for example, Hall (1992) and Garcia-Soidan (1998). This task is beyond the purpose of the present paper.

2.1 Moments and cumulants

The estimate fˆ(x_0) of (1) can be viewed as the mean of a random sample of size n, where each observation of the sample is distributed as U_1. Its ith moment is

m_{ih} = E U_1^i = ∫ K_h(x_0 − y)^i dF(y) = h^{1−i} w_{ih}.

By (2),

w_{ih} = Σ_{r=0}^∞ f^{(r)}(x_0) B_{ri} (−h)^r / r!   (10)
      = w_{i0} + O(h)   (11)

for w_{i0} = f(x_0) ∫ K(z)^i dz. So, w_{ih} is of magnitude 1 and m_{ih} = O(h^{1−i}). Typically K is symmetric, so that (10) is a power series in h². Both parametric and nonparametric approaches, using ŵ and Fˆ, have their merits. We can view ŵ_i as the mean of a random sample of size n, where each observation of the sample is distributed as

U_i = K_{h,i}(x_0 − X) = h^{−1} K((x_0 − X)/h)^i.   (12)

The rth cumulant of ŵ_1 is given by

κ_r(ŵ_1) = n^{1−r} κ_{rh} = m^{1−r} k_{rh}.   (13)

The moments and cumulants of ŵ are polynomials in h and w. An expression for the general k_{rh} = k_{rh}(w) as a polynomial in h and w is given in Appendix B. One can now use (10) to expand k_{rh} as a formal power series in h with coefficients that are functions of the derivatives of f at x_0. Note that k_{rh} is bounded as h → 0 and

k_{r0} = lim_{h→0} h^{r−1} m_{rh} = f(x_0) B_{0r}.   (14)
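The quantities w_{ih} can be estimated directly from the sample by replacing F with Fˆ. A hedged sketch (our helper names, Gaussian kernel assumed): ŵ_{ih} = n^{−1} Σ_j h^{−1} K((x_0 − X_j)/h)^i, together with the plug-in variance cumulant k_{2h}(ŵ) = ŵ_2 − h ŵ_1² that Studentization uses later.

```python
import math

def gauss_kernel(z):
    # standard normal density used as the kernel K
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def w_hat(sample, x0, h, i):
    # estimate of w_ih = T_h(F, K_i): (1/n) * sum_j h^(-1) K((x0 - X_j)/h)^i
    n = len(sample)
    return sum(gauss_kernel((x0 - x) / h) ** i for x in sample) / (n * h)

def k2_hat(sample, x0, h):
    # empirical k_2h(w) = w_2 - h * w_1^2, the h-scaled variance of U_1
    w1 = w_hat(sample, x0, h, 1)
    w2 = w_hat(sample, x0, h, 2)
    return w2 - h * w1 * w1
```

Note that `w_hat(sample, x0, h, 1)` is exactly the kernel estimate fˆ(x_0) of (1).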

2.2 Cornish Fisher expansions for (5)

By (13), fˆ(x_0) behaves like the mean of a random sample of size m = nh with rth cumulant k_{rh}. (Hall (1992, page 211) gave a heuristic justification for this.) By (13), the cumulants of ŵ_1 = fˆ(x_0) satisfy the Cornish-Fisher assumption with respect to m = nh, that is, κ_r(ŵ_1) = O(m^{1−r}) as m → ∞. So, the distribution of Y_n in (5) satisfies the Cornish-Fisher expansions

P_n(x) = P(Y_n ≤ x) ≈ Φ(x) − φ(x) Σ_{r=1}^∞ m^{−r/2} h_r(x, λ_{rh}),   (15)

dP_n(x)/dx ≈ φ(x) [1 + Σ_{r=1}^∞ m^{−r/2} h̄_r(x, λ_{rh})],   (16)

Φ^{−1}(P_n(x)) ≈ x − Σ_{r=1}^∞ m^{−r/2} f_r(x, λ_{rh}),   (17)

P_n^{−1}(Φ(x)) ≈ x + Σ_{r=1}^∞ m^{−r/2} g_r(x, λ_{rh}),   (18)

where Φ and φ are the unit normal distribution and density. Note that h_r, h̄_r, f_r, g_r are functions given in Cornish and Fisher (1960) for r ≤ 4 or in Withers (1984), and that λ_{rh} are the coefficients of the expansions. For example, h_1(x, λ_{rh}) = f_1(x, λ_{rh}) = g_1(x, λ_{rh}) = λ_{3h}(x² − 1)/6 and h̄_1(x, λ_{rh}) = λ_{3h}(x³ − 3x)/6.

(An alternative derivation is to apply Example 1 of Withers (1983) with T(F) = T_h(F) = ∫ K_h(x_0 − y) dF(y), a_{ri} = 0 if i ≠ r − 1, and a_{ri} = κ_{rh} if i = r − 1. So, κ_r(fˆ(x_0)) = κ_{rh} n^{1−r} = k_{rh} m^{1−r}, and Y_n = n^{1/2}(fˆ(x_0) − κ_{1h}) κ_{2h}^{−1/2} = m^{1/2}(ŵ_1 − k_{1h}) k_{2h}^{−1/2} has an Edgeworth-Cornish-Fisher expansion with respect to m in terms of the coefficients A_{r,r−1} = κ_{rh} κ_{2h}^{−r/2} = λ_{rh} h^{1−r/2} for λ_{rh} of (17). By (18), one can replace n and {A_{r,r−1}} in the Cornish-Fisher expansions by m and {λ_{rh}}.)

The expansion (15) was noted in (4.83) of Hall (1992). (There is a slip there: µ_{40} in p_2(y) in the second equation of page 212 should be µ_{40} − 3hµ_{20}². This slip is also on pages 269 and 271.)
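The leading correction terms are easy to evaluate numerically. A minimal sketch (function names are ours): the r = 1 truncations of (15) and (18), using h_1 = g_1 = λ_{3h}(x² − 1)/6, give a skewness-corrected tail probability and quantile.

```python
import math

def std_normal_cdf(x):
    # Phi(x) via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def edgeworth_cdf(x, lam3, m):
    # r = 1 truncation of (15): Phi(x) - phi(x) * m^(-1/2) * lam3 * (x^2 - 1)/6
    return std_normal_cdf(x) - std_normal_pdf(x) * lam3 * (x * x - 1.0) / (6.0 * math.sqrt(m))

def cf_quantile(x, lam3, m):
    # r = 1 truncation of (18): x + m^(-1/2) * g_1(x), with g_1 = lam3 * (x^2 - 1)/6
    return x + lam3 * (x * x - 1.0) / (6.0 * math.sqrt(m))
```

With λ_{3h} = 0 both reduce to the plain normal approximation, as they should.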


2.3 Cornish Fisher expansions for (6)

The joint rth order cumulants of ŵ have magnitude m^{1−r} for the same reason that k_{rh} is bounded:

κ(ŵ_{i_1}, ..., ŵ_{i_r}) = m^{1−r} k_h(i_1 ... i_r)   (19)

for k_h(i_1 ... i_r) = h^{r−1} κ(U_{i_1}, ..., U_{i_r}) = O(1) and U_i of (12). This expresses k_h(i_1 ... i_r) in the form k_{i_1...i_r h}(F). We can also write k_h(i_1 ... i_r) = k_{i_1...i_r h}(w) as a polynomial in h and w: see Appendix A. For example, k_h(ij) = w_{i+j} − h w_i w_j. Now fix q and set V = (k_h(ij) : 1 ≤ i, j ≤ q). So, V is q × q. For x in ℜ^q, let Φ_V(x) and φ_V(x) be the distribution and density of a normal q-vector with mean 0 and covariance V. Then for Y_n in (6) the Edgeworth expansion (15) has an extension of the form

P(Y_n ≤ x) ≈ Φ_V(x) + Σ_{r=1}^∞ m^{−r/2} b_r(−∂/∂x, k_h) Φ_V(x)/r!,   (20)

where, for s in ℜ^q, b_r(s, k_h) = B̄_r(c(s)), B̄_r(c) = Σ_{i=0}^r B_{ri}(c), B_r(c) and B_{ri}(c) are the complete and partial exponential Bell polynomials tabled on page 307 of Comtet (1974), c_r(s) = k_{r+2,r+1}(s)/{(r + 2)(r + 1)} for k_{r,r−1}(s) = κ(U_{i_1}, ..., U_{i_r}) s_{i_1} ... s_{i_r}, and we use the tensor sum convention of implicitly summing repeated pairs of indices i_1, ..., i_r over their range 1, ..., q. (See Appendix B for a definition of these Bell polynomials.) Equation (5.67) of Hall (1992) gives a double expansion version of this for q = 2.

2.4 Cornish Fisher expansions for (7)

By Withers (1982), the Cornish-Fisher expansions (15)-(18) also hold with Y_n replaced by Y_{n0} in (7) and λ_{rh} replaced by

A_h = {A_{ri} = a_{ri} a_{21}^{−r/2}},   (21)

provided that a_{21} is bounded away from 0 and that some regularity conditions hold ensuring

sup_{|t_1|+...+|t_q|>ς} |∫_{−∞}^∞ exp{Σ_{j=1}^q √(−1) t_j K_j(u)} dF(x − hu)| < 1 − C(ς)h,   (22)

where C(·) is some bounded real valued function. Also by Withers (1982), the first few coefficients in (8) are:

a_{21} = t_{·i} t_{·j} k_h(ij), a_{11} = t_{·ij} k_h(ij)/2, a_{32} = c_{21} + 3c_{23}   (23)

for t_{·i_1...i_r} = ∂^r t(W)/∂w_{i_1} ... ∂w_{i_r}, c_{21} = t_{·i} t_{·j} t_{·l} k_h(ijl), and c_{23} = t_{·i} k_h(ij) t_{·jl} k_h(lq) t_{·q}, where we use the tensor summation convention. For example,

P(Y_{n0} ≤ x) ≈ Φ(x) − φ(x) Σ_{r=1}^∞ m^{−r/2} h_r(x)   (24)

for h_r(x, A_h) = h_r(x) = Σ_{i=0}^{3r−1} P_{rih} He_i(x), where He_i(x) = φ(x)^{−1} (−d/dx)^i φ(x) is the ith Hermite polynomial, P_{rih} is a polynomial in A_h, and the summation is restricted to r − i odd. For example, h_1(x, A_h) = Σ_{i=0,2} P_{1ih} He_i(x) and h_2(x, A_h) = Σ_{i=1,3,5} P_{2ih} He_i(x) for P_{10h} = A_{11h}, P_{12h} = A_{32h}/6, P_{21h} = (A_{11h}² + A_{22h})/2, P_{23h} = (4A_{11h}A_{32h} + A_{43h})/24, and P_{25h} = A_{32h}²/72. By Withers (2000), He_i(x) = E(x + √(−1) N)^i = ψ_i say, where N ∼ N(0,1), ψ_0 = 1, ψ_1 = x, ψ_2 = x² − 1, ψ_3 = x³ − 3x, ψ_4 = x⁴ − 6x² + 3, ψ_5 = x⁵ − 10x³ + 15x, ....
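The polynomials ψ_i = He_i listed above can be generated mechanically from the standard recurrence He_{i+1}(x) = x He_i(x) − i He_{i−1}(x); a short sketch (our helper, pure Python) reproducing the list:

```python
def hermite(i, x):
    # He_i(x) via He_{i+1}(x) = x * He_i(x) - i * He_{i-1}(x), He_0 = 1, He_1 = x
    prev, cur = 1.0, x
    if i == 0:
        return prev
    for k in range(1, i):
        prev, cur = cur, x * cur - k * prev
    return cur
```

For example, hermite(3, x) evaluates x³ − 3x and hermite(5, x) evaluates x⁵ − 10x³ + 15x, matching ψ_3 and ψ_5.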

2.5 Cornish Fisher expansions for (9)

Now consider the Studentized estimate of f(x_0) given by (9). We shall consider two types of estimates with a_{21} = 1 + O(h^ε) for ε = 1 or p. We can apply (24) by noting that Y_{n1} ≤ z if and only if Y_{n0} ≤ x = a_{21}^{−1/2}(z − m^{1/2} t(W)). So, x = z + δ for

δ = δ_0 + δ_1 z, δ_0 = −a_{21}^{−1/2} m^{1/2} t(W) = O_p(m^{1/2} h^p), and δ_1 = a_{21}^{−1/2} − 1 = O(h^ε).   (25)

Also at x = z + δ,

φ(x) He_i(x) = φ(z) He_i(z) − δ φ(z) He_{i+1}(z) + O(δ²),
φ(x) h_1(x) = φ(z) h_1(z) − δ φ(z) Σ_{i=0,2} P_{1ih} He_{i+1}(z) + O(δ²),

P(Y_{n1} ≤ z) = P(Y_{n0} ≤ x) = Φ(x) − φ(x) Σ_{r=1}^3 m^{−r/2} h_r(x) + O(m^{−2})
= Φ(z) − φ(z){−δ + zδ²/2 + Σ_{r=1}^3 m^{−r/2} h_r(z) − δ m^{−1/2} Σ_{i=0,2} P_{1ih} He_{i+1}(z)}
  + O(δ³ + m^{−1/2}δ² + m^{−1}δ + m^{−2}),   (26)

giving

P(|Y_{n1}| ≤ z) = 2Φ(z) − 1 − 2φ(z){z(−δ_1 + δ_0²/2) + m^{−1} h_2(z) − δ_0 m^{−1/2} Σ_{i=0,2} P_{1ih} He_{i+1}(z)}
  + O(δ_0³ + m^{−1/2}δ_0² + m^{−1}δ_0 + m^{−2}).   (27)

3. FIRST ORDER CIs

The first order CIs for f(x_0) are obtained as usual by Studentizing. This can be done using the empirical estimate of m var fˆ(x_0) = k_{2h}(w), or by estimating its asymptotic approximation of (14), k_{2h}(w) ≈ f(x_0) ∫ K(z)² dz.

3.1 First order one-sided CIs based on asymptotic studentization

Choose

t(w_1) = (w_1 − f(x_0))(w_1 B_{02})^{−1/2}.   (28)

Then a_{21} = t_{·1}² k_{2h} = 1 + O(h). So, ε = 1 in (25) and, by (26),

P(Y_{n1} ≤ z) = Φ(z) + O(e_{n1})   (29)

for e_{n1} = m^{−1/2} + h + m^{1/2} h^p. Taking h as in (3), e_{n1} = O(n^{−β}) with β = min((1 − α)/2, α, (p + 1/2)α − 1/2).

The case p ≥ 2: β has its maximum 1/3 at α = 1/3. So, a one-sided lower CI for f(x_0) of level Φ(z) + O(n^{−1/3}) and "half-width" O(m^{−1/2}) = O(n^{−1/3}) is given by Y_{n1} ≤ z, that is,

f(x_0) ≥ L = ŵ_1 − (B_{02} ŵ_1/m)^{1/2} z.   (30)
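A sketch of the bound (30) for the Gaussian kernel, for which B_{02} = ∫ K² = 1/(2√π) ≈ 0.2821 (helper names are ours; w1_hat is the kernel estimate at x_0):

```python
import math

B02 = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of K(z)^2 dz for the Gaussian kernel

def lower_ci_asymptotic(w1_hat, n, h, z=1.645):
    # eq. (30): f(x0) >= w1_hat - (B02 * w1_hat / m)^(1/2) * z, with m = n * h
    m = n * h
    return w1_hat - math.sqrt(B02 * w1_hat / m) * z
```

Here z = 1.645 targets the nominal one-sided level Φ(z) ≈ 0.95.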


Replacing z by −z, a one-sided upper CI for f(x_0) of level Φ(z) + O(n^{−1/3}) and "half-width" O(n^{−1/3}) is given by Y_{n1} ≥ −z, that is,

f(x_0) ≤ U = ŵ_1 + (B_{02} ŵ_1/m)^{1/2} z.   (31)

The case p = 1: β has its maximum 1/4 at α = 1/4. So, both the error and width of these one-sided CIs of nominal level Φ(z) are O(n^{−1/4}).

Note 1. Note that t(w_1) is strictly increasing. So, Y_{n1} ≤ z if and only if ŵ_1 ≤ t^{−1}(m^{−1/2}z), if and only if Y_n ≤ z_0 = m^{1/2}{t^{−1}(m^{−1/2}z) − k_{1h}} k_{2h}^{−1/2}. So, an asymptotic expansion for the left hand side of (29) is given by the right hand side of (15) at x = z_0.

Alternatively, we can "stabilize the variance" by choosing s(ŵ_1) with variance ≈ s_{·1}(w_1)² w_1 B_{02}/m = 1/m, that is, s(w_1) = 2(w_1/B_{02})^{1/2}. That is, choosing t(w_1) = s(w_1) − s(f(x_0)), a CI with the same properties as (30) is given by Y_{n1} = m^{1/2} t(ŵ_1) ≤ z, that is,

f(x_0) ≥ L̃ = {ŵ_1^{1/2} − (B_{02}/m)^{1/2} z/2}².   (32)

Replacing z by −z, a one-sided upper CI for f(x_0) of level Φ(z) + O(n^{−1/3}) and "half-width" O(n^{−1/3}) is given by

f(x_0) ≤ Ũ = {ŵ_1^{1/2} + (B_{02}/m)^{1/2} z/2}².   (33)

A similar idea was used in Hall (1992, page 210).

Note that (11) suggests estimating θ = w_1 by θˆ = Σ_{i=1}^∞ τ_i θˆ_i for θˆ_i = ŵ_i/B_{0i} and {τ_i} constants adding to 1. Its bias is O(h), or O(h²) if K is symmetric (about 0). Also, m var(θˆ) = Σ_{i,j=1}^∞ τ_i τ_j k_h(ij), so that E(θˆ − θ)² = m^{−1} θ τ′Bτ + O(h² + h/m) for B = (B_{0,i+j}). To this degree of accuracy this MSE is minimized by τ = B^{−1}1/1′B^{−1}1, where 1 is an infinite vector of 1's, giving E(θˆ − θ)² = m^{−1}θ/1′B^{−1}1 + O(h² + h/m). Apart from the question of if and how these infinite sums need to be truncated, clearly this estimate can be used to provide an alternative CI to that of (30). However, we shall see that its asymptotic efficiency relative to the following method is 0 for p ≥ 3.
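Returning to the variance-stabilized bounds (32) and (33): these are equally simple to compute. A sketch under the same Gaussian-kernel assumption (B_{02} = 1/(2√π); names ours). One practical attraction, visible in the code, is that the lower bound is a square and hence nonnegative, unlike (30):

```python
import math

B02 = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of K(z)^2 dz, Gaussian kernel

def stabilized_ci(w1_hat, n, h, z=1.645):
    # (32)-(33): { w1_hat^(1/2) -/+ (B02/m)^(1/2) * z/2 }^2, with m = n * h
    d = math.sqrt(B02 / (n * h)) * z / 2.0
    root = math.sqrt(w1_hat)
    return (root - d) ** 2, (root + d) ** 2
```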

3.2 First order one-sided CIs based on empirical studentization

Choose

t(W) = (w_1 − f(x_0)) k_{2h}(w)^{−1/2},   (34)

where k_{2h}(w) = k_{2h} = w_2 − h w_1². Then

a_{21} = t_{·1}² k_h(11) + 2 t_{·1} t_{·2} k_h(12) + t_{·2}² k_h(22) = 1 + O(h^p)

since t_{·1} = k_{2h}^{−1/2} + O(h^{p+1}) and t_{·2} = O(h^p). So, ε = p in (25). So, by (26),

P(Y_{n1} ≤ z) = Φ(z) + O(e′_{n1}), e′_{n1} = m^{−1/2} + m^{1/2} h^p = O(n^{−β}), β = min((1 − α)/2, (p + 1/2)α − 1/2),   (35)

with the maximum β = p/(2p + 2) at α = 1/(p + 1). So, a one-sided lower CI for f(x_0) of level Φ(z) + O(n^{−p/(2p+2)}) and "half-width" O(m^{−1/2}) = O(n^{−p/(2p+2)}) is given by Y_{n1} ≤ z, that is,

f(x_0) ≥ L′ = ŵ_1 − (k_{2h}(ŵ)/m)^{1/2} z.   (36)

Replacing z by −z gives the corresponding one-sided upper CI

f(x_0) ≤ U′ = ŵ_1 + (k_{2h}(ŵ)/m)^{1/2} z.   (37)
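A sketch combining the pieces: the empirical k_{2h}(ŵ) = ŵ_2 − hŵ_1² plugged into (36)-(37) (Gaussian kernel; helper names and the demo's seed and sample size are our assumptions):

```python
import math
import random

def gauss_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def empirical_ci(sample, x0, h, z=1.645):
    # (36)-(37): w1_hat -/+ (k2h(w_hat)/m)^(1/2) * z, with k2h(w) = w2 - h*w1^2, m = n*h
    n = len(sample)
    u = [gauss_kernel((x0 - x) / h) / h for x in sample]  # U_{1j} = K_h(x0 - X_j)
    w1 = sum(u) / n                                       # = f_hat(x0)
    w2 = h * sum(v * v for v in u) / n                    # w2_hat = h * mean(U^2)
    k2 = max(w2 - h * w1 * w1, 0.0)
    half = math.sqrt(k2 / (n * h)) * z
    return w1 - half, w1 + half

# demo on simulated N(0, 1) data, interval around f(0) = 0.3989...
random.seed(2)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
lo, hi = empirical_ci(xs, 0.0, 0.25)
```

Unlike Section 3.1, no kernel constant B_{02} is needed: the variance is estimated from the data themselves.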

So, for p = 1 or 2, empirical and asymptotic Studentization achieve the same β, but for p ≥ 3 empirical Studentization is superior.

3.3 First order two-sided CIs

By (27), P(|Y_{n1}| ≤ z) = 2Φ(z) − 1 + O(e_{n2}) with, for h of (3),

e_{n2} = δ_1 + δ_0² + m^{−1} ∼ h^ε + m h^{2p} + m^{−1} ∼ n^{−β}

for β = min(εα, 1 − α, (2p + 1)α − 1).

If ε = 1, the corresponding two-sided CI of level 2Φ(z) − 1 + O(n^{−β}) is given by L ≤ f(x_0) ≤ U for L and U of (30) and (31). An alternative with the same asymptotic properties is L̃ ≤ f(x_0) ≤ Ũ for L̃ and Ũ of (32) and (33). If ε = p (i.e. using empirical Studentization), the corresponding two-sided CI of level 2Φ(z) − 1 + O(n^{−β}) is given by L′ ≤ f(x_0) ≤ U′ for L′ and U′ of (36) and (37).

4. SECOND ORDER CIs

Here, we give second order one- and two-sided CIs for f(x_0).


4.1 Second order one-sided CIs using asymptotic studentization

Taking t of (28), by (26), P(Y_{n1} ≤ z) = Φ(z) − m^{−1/2} h_1(z) φ(z) + O(δ_0 + m^{−1}), so P(Y_{n1} − m^{−1/2} h_1(z) ≤ z) = Φ(z) + O(δ_0 + m^{−1}). So, one would expect that

P(Y_{n1} − m^{−1/2} ĥ_1(z) ≤ z) = Φ(z) + O(δ_0 + m^{−1})   (38)

if ĥ_1(z) = h_1(z) + O_p(m^{−1/2}). Let us take ĥ_1(z) = h_{1h}(z, ŵ). We now confirm (38). Given z, set t_m(W) = t(w_1) − m^{−1} h_{1h}(z, w) for t(w_1) of (28). By Lemma 5.1 of Withers (1983),

κ_r(t_m(Ŵ)) ≈ Σ_{i=r−1}^∞ a″_{ri} m^{−i}

for certain functions a″_{ri} = a″_{ri}(w). Also, a″_{10} = a_{10}, a″_{21} = a_{21}, a″_{11} = a_{11} − h_1(z), and a″_{32} = a_{32}. Set Y″_{n0} = m^{1/2}(t_m(Ŵ) − t(W)) a_{21}^{−1/2} and Y″_{n1} = m^{1/2} t_m(Ŵ). Then Y″_{n0} ≤ x if and only if m^{1/2}(t_m(ŵ) − t(w)) ≤ y = a_{21}^{1/2} x, if and only if Y″_{n1} = m^{1/2} t_m(ŵ) ≤ z = y + m^{1/2} t(w), and this occurs with probability

Φ(x) − m^{−1/2} h″_1(x) φ(x) + O(m^{−1}) = Φ(z) − m^{−1/2} h″_1(z) φ(z) + O(e_{n2})

since x = z + O(h + m^{1/2} h^p). Here, h″_1 is h_1 for the coefficients a″_{ri}. So, h″_1(z) = A″_{11} + A″_{32}(z² − 1)/6 = h_1(z) − a_{21}^{−1/2} h_1(z) = O(h), since A″_{11} = A_{11} − a_{21}^{−1/2} h_1(z). So, P(Y″_{n1} ≤ z) = Φ(z) + O(e_{n2}). This confirms (38). So, a second order lower one-sided CI is

f(x_0) ≥ L_2 = ŵ_1 − (B_{02} ŵ_1/m)^{1/2}(z + m^{−1/2} ĥ_1(z)).

For h of (3), its error behaves as ∼ δ + m^{−1} ∼ h^ε + m^{1/2} h^p + m^{−1} ∼ n^{−β} for β = min(1 − α, εα, (p + 1/2)α − 1/2). (For p ≥ 2 this rate improves on the β obtained using the method of Section 3.2.) Replacing z by −z, a second order upper one-sided CI of level Φ(z) + O(n^{−β}) is

f(x_0) ≤ U_2 = ŵ_1 + (B_{02} ŵ_1/m)^{1/2}(z − m^{−1/2} ĥ_1(z)).

Recall that h_1(z) = A_{11} + A_{32}(z² − 1)/6 is given by (21)-(23) and (20). Now suppose we take instead ĥ_{10}(z) = lim_{h→0} h_{1h}(z, ŵ). This introduces another error of magnitude O_p(h), since h_{1h}(z, w) has a power series expansion in h. The effect is to add a term of magnitude m^{−1/2} h into e_{n2} in (38). But m^{−1/2} h = O(e_{n2}), so there is no change to the above α and β. Using λ_{30} = lim_{h→0} λ_{3h} = w_{20}^{−3/2} w_{30} = f(x_0)^{−1/2} B_{02}^{−3/2} B_{03}, we obtain

a_{210} = 1, a_{110} = −λ_{30}/2, a_{320} = −2λ_{30}, h_{10}(z) = −λ_{30}(2z² + 1)/6.   (39)
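Using (39) with the Gaussian kernel, for which B_{02} = 1/(2√π) and B_{03} = 1/(2π√3), and estimating λ_{30} by plugging ŵ_1 into f(x_0)^{−1/2} B_{02}^{−3/2} B_{03}, gives a computable second order bound. A hedged sketch (helper names ours):

```python
import math

B02 = 1.0 / (2.0 * math.sqrt(math.pi))        # integral of K^2 for the Gaussian kernel
B03 = 1.0 / (2.0 * math.pi * math.sqrt(3.0))  # integral of K^3 for the Gaussian kernel

def h10(z, lam30):
    # eq. (39): h_10(z) = -lambda_30 * (2 z^2 + 1)/6
    return -lam30 * (2.0 * z * z + 1.0) / 6.0

def second_order_ci(w1_hat, n, h, z=1.645):
    # Section 4.1 bounds: w1_hat -/+ (B02 * w1_hat/m)^(1/2) * (z +/- m^(-1/2) h10_hat(z))
    m = n * h
    lam30_hat = B03 / (math.sqrt(w1_hat) * B02 ** 1.5)  # plug-in lambda_30
    corr = h10(z, lam30_hat) / math.sqrt(m)
    scale = math.sqrt(B02 * w1_hat / m)
    return w1_hat - scale * (z + corr), w1_hat + scale * (z - corr)
```

Since h_{10}(z) < 0 here, the correction shortens the lower tail and lengthens the upper tail relative to the first order interval.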

4.2 Second order one-sided CIs using empirical studentization

Now consider the second type of Studentizing. By (26), P(Y_{n1} − m^{−1/2} h_1(z) ≤ z) = Φ(z) + O(e′_{n2}) for e′_{n2} = m^{−1} + m^{1/2} h^p ∼ n^{−β} and β = min(1 − α, pα, (p + 1/2)α − 1/2). As before, one can show that P(Y_{n1} − m^{−1/2} ĥ_1(z) ≤ z) = Φ(z) + O(e′_{n2}). So, a second order lower one-sided CI of level Φ(z) + O(n^{−β}) for β = 2p/(2p + 3), achieved at α = 3/(2p + 3), is given by

f(x_0) ≥ L′_2 = ŵ_1 − k_{2h}(ŵ)^{1/2}{m^{−1/2} z + m^{−1} h_{1h}(z, ŵ)}.

The corresponding upper one-sided CI is

f(x_0) ≤ U′_2 = ŵ_1 + k_{2h}(ŵ)^{1/2}{m^{−1/2} z − m^{−1} h_{1h}(z, ŵ)}.

So, a two-sided CI of level 2Φ(z) − 1 + O(n^{−β}) is given by L′_2 ≤ f(x_0) ≤ U′_2.

4.3 An alternative approach

For j ≥ 1, Example 1 of Withers (1983) gives a one-sided CI of level Φ(x) + O(n^{−j/2}) of the form V_{jn}(Fˆ, x) ≤ µ(F) for a mean µ(F) = E Y, Y = h(X) and h(x) a given function, where, for example,

V_{jn}(Fˆ, x) = µ(Fˆ) + Σ_{r=1}^j n^{−r/2} q_r(Fˆ, x), q_1(F, x) = −µ_2^{1/2} x, q_2(F, x) = µ_2^{−1} µ_3 (1 + 2x²)/6,

and µ_r = µ_r(Y). Taking Y = K_h(x_0 − X), we have n^{−r/2} q_r(F, z) = m^{−r/2} q_r(w, z), where q_1(w, z) = −ν_{2h}^{1/2} z, q_2(w, z) = ν_{2h}^{−1} ν_{3h}(1 + 2z²)/6 and ν_{rh} = h^{r−1} µ_r(Y). See Appendix B for these as polynomials in h and w. Replacing x by z, this suggests evaluating the coverage probability of the following jth order CI for f(x_0):

L″_j = V_{jm}(ŵ, z) ≤ f(x_0),   (40)

where V_{jm}(ŵ, z) = ŵ_1 + Σ_{r=1}^j m^{−r/2} q_r(ŵ, z). We now evaluate the probability of (40) for the case j = 2. Note that (40) holds if and only if m^{1/2} t_m(Ŵ) ≤ z for t_m(W) = t(W) + m^{−1} t_1(W), t_1(W) = w_2^{−3/2} w_3(1 + 2z²)/6, and t of (34). By Lemma 5.1 of Withers (1983),

κ_r(t_m(Ŵ)) ≈ Σ_{i=r−1}^∞ a′_{ri} m^{−i},   (41)

where the leading a′_{ri} are given in terms of the a_{ri} for t = t_0 by a′_{r,r−1} = a_{r,r−1} and a′_{11} = a_{11} + t_1(W). So, the Cornish-Fisher expansions hold for Y_{n0} = m^{1/2}(t_m(Ŵ) − t(W)) a_{21}^{−1/2} with polynomials h′_i(x) say. By an argument similar to that proving (38), one obtains the probability of (40) as Φ(z) − m^{−1/2} h′_1(z) φ(z) + O(e′_{n2}) for h′_1(z) = h_1(z) + a_{21}^{−1/2} t_1(W) = O(h), so that the ACE has magnitude m^{−1/2} h + e′_{n2}. If p ≥ 3 this gives ACE rate β = 2/3 at α = 1/3. If p ≤ 3 this gives ACE rate β = 2p/(2p + 3) at α = 3/(2p + 3), the same as achieved by a one-sided CI by empirical Studentization.

4.4 An improvement

One can show that we can improve the rate given above for p ≥ 3 to that given for p ≤ 3 if we choose t_1(w) so that h′_1(z) = O(h^ε) with 3ε ≥ p. (This follows since the ACE then has magnitude m^{−1/2} h^ε + e′_{n2}.) We now show how to achieve this with ε = p. Write t of (34) as N D^{−1/2}. So, N = O(h^p) and its derivatives in w are constant, while D and its derivatives are functions of w. For example, D_{·1} = −2hw_1. Also,

t_{·1} = t̄_1 − N D^{−3/2} D_{·1}/2 = t̄_1 + O(h^{p+1}),
t_{·2} = −N D^{−3/2}/2 = O(h^p),
t_{·11} = t̄_{11} + 3N D^{−5/2} D_{·1}²/4 − N D^{−3/2} D_{·11}/2 = t̄_{11} + O(h^{p+2}),
t_{·12} = t̄_{12} + 3N D^{−5/2} D_{·1}/4 = t̄_{12} + O(h^{p+1}),
t_{·22} = 3N D^{−5/2}/4 = O(h^p)

for t̄_1 = D^{−1/2}, t̄_{11} = −D^{−3/2} D_{·1} and t̄_{12} = −D^{−3/2}/2. In this way we can approximate the derivatives of t to O(h^p), and so, using (21)-(23), approximate the coefficients a_{ri} and h_r(z) to O(h^p). Call these approximations ā_{ri} = ā_{ri}(w) and h̄_r(z) = h̄_{rh}(z, w). For example,

ā_{11} = −λ_{3h}/2, ā_{32} = c̄_{21} + 3c̄_{23}, c̄_{21} = D^{−3/2} k_{3h},
c̄_{23} = −D^{−3/2} D_{·1} k_{2h}² − D^{−5/2} k_{2h} k_h(12) = 2h w_1 k_{2h}^{1/2} − (w_3 − h w_1 w_2) k_{2h}^{−3/2},
h̄_1(z) = h̄_{1h}(z, w) = ā_{11} + ā_{32}(z² − 1)/6.

Now choose t_m(w) = t(w) − m^{−1} h̄_{1h}(z, w). By (41), h′_1(z) = h_1(z) + a_{21}^{−1/2} t_1(w) = O(h^p), so that the ACE of the CI m^{1/2} t_m(ŵ) ≤ z has magnitude e′_{n2}, giving ACE rate β = 2p/(2p + 3) at α = 3/(2p + 3), the same as achieved for a one-sided CI by empirical Studentization. This gives the second order lower one-sided CI

f(x_0) ≥ L̄_2 = ŵ_1 − k_{2h}(ŵ)^{1/2}(m^{−1/2} z + m^{−1} h̄_{1h}(z, ŵ))

of level Φ(z) + O(n^{−β}). The corresponding upper CI is

f(x_0) ≤ Ū_2 = ŵ_1 + k_{2h}(ŵ)^{1/2}(m^{−1/2} z − m^{−1} h̄_{1h}(z, ŵ)).

4.5 Second order two-sided CIs using empirical studentization

By (27),

P(|Y_{n1}| ≤ z) = 2Φ(z) − 1 − 2φ(z) m^{−1} h_2(z) + O(δ_0 + m^{−2})
             = 2Φ(z) − 1 − 2φ(z) m^{−1} h_{2h}(z, w) + O(e_{n3})

for e_{n3} = h^p + m h^{2p} + m^{−2}, so that

P(|Y_{n1}| ≤ z + m^{−1} h_2(z, w)) = 2Φ(z) − 1 + O(e_{n3}),

and this also holds with w replaced by ŵ. Also, e_{n3} ∼ n^{−β} for β = min(pα, (2p + 1)α − 1, 2 − 2α) = 2p/(p + 2) at α = 2/(p + 2).

This gives the two-sided CI |f(x_0) − ŵ_1| ≤ (k_{2h}(ŵ)/m)^{1/2}(z + m^{−1} h_2(z, ŵ)) of asymptotic level 2Φ(z) − 1 and ACE rate β = 2p/(p + 2).

5. OPTIMIZING THE CONSTANT c IN THE SMOOTHING PARAMETER (3)

5.1 One-sided first order CI based on empirical studentization

Consider the one-sided first order CI (36) with α = 1/(p + 1). We noted that, for any c > 0, its coverage error is O(n^{−β}), where β = p/(2p + 2). We now show how to choose c = c(f, x_0) to minimize the asymptotic coverage error (ACE). Set ρ_r = f^{(r)}(x_0) B_{r1} (−1)^r/r!. Since K is of order p, ρ_p ≠ 0. We now prove the following in terms of h_{10}(z) of (39). For ρ_p > 0, c(f, x_0) = [h_{10}(z)/{(2p + 1)ρ_p}]^{1/(p+1)}. For ρ_p < 0,

c(f, x_0) = [h_{10}(z)/ρ_p]^{1/(p+1)}   (42)

and the ACE is improved to O(n^{−β_1}), where β_1 = (p + 2)/(2p + 2) > β. This surprising result is analogous to the result of Silverman (1986) quoted in the introduction. (As for that result, in practice one needs to study the effect of replacing ρ_p by an estimate. We shall not consider that here.)

We now prove these results. By (25), x = z − m^{1/2} h^p ρ_p k_{2h}^{−1/2} + O(δ_2 + h^p) for δ_2 = m^{1/2} h^{p+1}, and P(Y_{n1} ≤ z) = Φ(z) − φ(z) e_{n3} + O(δ_2 + m^{−1} + h^p) for

e_{n3} = m^{1/2} h^p ρ_p k_{2h}^{−1/2} + m^{−1/2} h_1(z) = e_{n30} + O(m^{−1/2} h),
e_{n30} = m^{1/2} h^p ρ_p w_{20}^{−1/2} + m^{−1/2} h_{10}(z) = n^{−β} a(c),
a(c) = ρ_p w_{20}^{−1/2} c^{p+1/2} + c^{−1/2} h_{10}(z),

and α, β as in (35). We want to minimize the asymptotic value of the error |e_{n30}|, that is, minimize |a(c)|. We assume that ∫K³ > 0, so that w_{30} > 0. The minimizing c is as given. For ρ_p < 0, |a(c)| = 0 at c of (42), and the ACE then behaves as δ_2 + m^{−1/2} h ∼ n^{−β_1}, since m^{−1} + h^p ∼ n^{−p/(p+1)} and p/(p + 1) > β_1 for p > 2. For ρ_p < 0, one can show that we can improve this ACE rate β_1 to β_2 = (p/2 + 1)/(p + 1) by replacing h = cn^{−α} by h = cn^{−α}(1 + c_3 n^{−α}) for c of (42) and c_3 = N/D for

N = c^{p+3/2} ρ_{p+1} + c h_{11}(z), D = −(p + 1/2) c^{p+1/2} ρ_p + c^{−1/2} h_{10}(z)/2,

where h_{1r}(z) is defined by the expansion h_1(z) = h_{1h}(z) = Σ_{r=0}^∞ h_{1r}(z) h^r. In fact, this process may be repeated using a third term in h to further increase the ACE rate above β_2.

5.2 Two-sided first order CI based on empirical studentization

First note that a_{21} = 1 − 2w_{20}^{−2} ρ_p h^p + O(h^{p+1}) and δ_1 = w_{20}^{−2} ρ_p h^p + O(h^{p+1}). So, by (27), P(|Y_{n1}| ≤ z) = 2Φ(z) − 1 − 2φ(z) n^{−β_1} a(c) + O(n^{−β_2}) for α_1 = 1/(p + 1), β_1 = p/(p + 1), β_2 = β_1 + α_1 = 1,

a(c) = γ_0 c^{−1} − γ_1 c^p + γ_2 c^{2p+1}, γ_0 = h_{20}(z),


γ_1/ρ_p = z w_{20}^{−2} + w_{20}^{−1/2} Σ_{i=0,2} P_{1i0} He_{i+1}(z), and γ_2 = z w_{20}^{−1} ρ_p². The optimal c minimizes |a(c)|. Again there are two cases. If the minimum is positive, the CI has ACE rate β_1 = p/(p + 1). But if there exists c such that a(c) = 0, then the CI has ACE rate β_2 = 1. Since γ_2 > 0, one such c exists if γ_0 < 0, while if γ_1 > 0 there will be two such c if γ_1 is sufficiently large but otherwise no such c; that is, two such c exist if γ_1 ≥ γ_10, where γ_10 is given by eliminating c_0 from a(c_0) = ȧ(c_0) = 0, for ȧ(c) the derivative of a(c).

6. CONCLUSIONS

In this paper, we have given first order and second order one- and two-sided CIs for f(x_0). These CIs are based on Edgeworth and Cornish Fisher expansions for Studentized versions of the kernel density estimate fˆ(x_0). Tables 1 and 2 summarize the main CIs we give in terms of their ACE rate β, achieved at their optimal choice of α, and their subsection.

TABLE 1

ACE rate β and α for (3) for one-sided CIs (ε = 1 for asymptotic Studentization, ε = p for empirical Studentization, p* = best ACE rate achievable using the c in (3), p = order of the kernel)

ε    first order CIs            §     second order CIs          §
1    1/3 at 1/3 if p ≥ 2        3.1   1/2 at 1/2 if p ≥ 2       4.1.1, Hall
1    1/4 at 1/4 if p = 1        3.1   2/5 at 3/5 if p = 1       4.1.1, Hall
p    p/(2p+2) at 1/(p+1)        3.2   2p/(2p+3) at 3/(2p+3)     4.1.2, 4.1.4, Hall, page 222
p*   (p+2)/(2p+2) at 1/(p+1)    5.1

TABLE 2

ACE rate β and α for (3) for two-sided CIs (ε = 1 for asymptotic Studentization, ε = p for empirical Studentization, p* = best ACE rate achievable using the c in (3), p = order of the kernel)

ε    first order CIs        §     second order CIs       §
1    1/2 at 1/2             3.3   2/3 at 2/3             4.5
p    p/(p+1) at 1/(p+1)     3.3   2p/(p+2) at 2/(p+2)    4.5
p*   1 at 1/(p+1)           5.2

Hall (1992, Section 4.4.3, page 222) gives one- and two-sided CIs based on bootstrapping with β = 2p/(2p + 3) and β = p/(p + 1), using α = 3/(2p + 3) and α = 1/(p + 1), respectively. (This is for the case where bias is not estimated; otherwise the formulas for β become too complicated.)

We have used two types of Studentization for the first order CIs given in Section 3. Using asymptotic Studentization, we obtain β = 1/3 and 1/2 for one- and two-sided CIs using α = 1/3 and 1/2 for p ≥ 2. Using the usual empirical Studentization, these values of β are improved to β = p/(2p + 2) for one-sided CIs and β = p/(p + 1) for two-sided CIs, both using α = 1/(p + 1). The second order one- and two-sided CIs given in Section 4 have β = 2p/(2p + 3) for one-sided CIs and β = 2p/(p + 2) for two-sided CIs, using α = 3/(2p + 3) and 2/(p + 2), respectively.

We have also considered two cases for choosing the ACE-optimal constant c in (3) for first order CIs based on empirical Studentization. In one case, using this c increases the ACE rate to (p + 2)/(2p + 2) for one-sided CIs and to 1 for two-sided CIs. This result is analogous to that of Silverman (1986).

ACKNOWLEDGMENTS

The authors would like to thank the Editor-in-Chief and the referee for carefully reading the paper and for their comments, which greatly improved the paper.

Applied Mathematics Group, Industrial Research Limited          CHRISTOPHER S. WITHERS

School of Mathematics, University of Manchester                 SARALEES NADARAJAH

REFERENCES

L. COMTET, (1974), Advanced Combinatorics, Reidel, Dordrecht, Holland.
R. A. FISHER, E. A. CORNISH, (1960), The percentile points of distributions having known cumulants, "Technometrics", 2, pp. 209-225.
P. GARCIA-SOIDAN, (1998), Edgeworth expansions for triangular arrays, "Communications in Statistics - Theory and Methods", 27, pp. 705-722.
P. HALL, (1992), The Bootstrap and Edgeworth Expansion, Springer-Verlag, New York.
E. PARZEN, (1962), On estimation of a probability density function and mode, "Annals of Mathematical Statistics", 33, pp. 1065-1076.
B. L. S. PRAKASA RAO, (1983), Nonparametric Functional Estimation, Academic Press, Orlando.
M. ROSENBLATT, (1956), Remarks on some nonparametric estimates of a density function, "Annals of Mathematical Statistics", 27, pp. 832-837.
D. W. SCOTT, R. A. TAPIA, J. R. THOMPSON, (1977), Kernel density estimation by discrete maximum penalized-likelihood criteria, "Annals of Statistics", 8, pp. 820-832.
B. W. SILVERMAN, (1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York.
G. R. TERRELL, D. W. SCOTT, (1992), Variable kernel density estimation, "Annals of Statistics", 20, pp. 1236-1265.
C. S. WITHERS, (1982), The distribution and quantiles of a function of parameter estimates, "Annals of the Institute of Statistical Mathematics, Series A", 34, pp. 55-68.
C. S. WITHERS, (1983), Expansions for the distribution and quantiles of a regular functional of the empirical distribution with applications to nonparametric confidence intervals, "Annals of Statistics", 11, pp. 577-587.
C. S. WITHERS, (1984), Asymptotic expansions for distributions and quantiles with power series cumulants, "Journal of the Royal Statistical Society, Series B", 46, pp. 389-396.
C. S. WITHERS, (2000), A simple expression for the multivariate Hermite polynomial, "Statistics and Probability Letters", 47, pp. 165-169.

APPENDIX A: WAYS TO CONSTRUCT KERNELS OF HIGHER ORDER

We saw in Section 1 that fˆ(x_0), the kernel estimate with kernel K of order p, has bias O(h^p). One can reduce the bias to O(h^{p+1}) by using f̃(x_0) = fˆ(x_0) − fˆ_{·p}(x_0) B_{p1} (−h)^p / p!, where fˆ_{·p}(x) = (d/dx)^p fˆ(x). But this amounts to replacing the kernel K by a kernel of order p + 1, giving

K_0(z) = K(z) − K_{·p}(z) B_{p1} (−1)^p / p! = (1 − B_{p1} (−1)^p D^p / p!) K(z) for D = ∂/∂z.

If K is symmetric then p is even, so K_0 is of order p + 2.

Example 1. Take K(z) = φ(z), so p = 2. Then

K_0(z) = (1 − D^2/2) φ(z) = (1 − He_2(z)/2) φ(z) = (3 − z^2) φ(z)/2

has order 4, with B_{41}(K_0) = −3, so that the next correction factor is 1 + 3D^4/4! = 1 + D^4/8. Thus

K_1(z) = (1 + D^4/8)(1 − D^2/2) φ(z) = (1 − He_2(z)/2 + He_4(z)/8 − He_6(z)/16) φ(z) = (45 − 65z^2 + 17z^4 − z^6) φ(z)/16

has order 6, with B_{61}(K_1) = −30, giving the factor 1 + 30D^6/6! = 1 + D^6/24. Thus

K_2(z) = (1 + D^6/24)(1 + D^4/8)(1 − D^2/2) φ(z) = (1 − He_2(z)/2 + He_4(z)/8 − He_6(z)/48 − He_8(z)/48 + He_10(z)/192 − He_12(z)/384) φ(z)

has order 8, and so on.
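The order of K_0 can be checked numerically; the sketch below (the quadrature setup is ours) verifies that K_0(z) = (3 − z^2)φ(z)/2 integrates to one, has vanishing moments of orders 1-3, and has fourth moment B_{41}(K_0) = −3, which is exactly the value absorbed by the factor 1 + D^4/8 at the next step:

```python
import math

def phi(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def K0(z):
    # the order-4 kernel of Example 1: K0(z) = (3 - z^2) phi(z) / 2
    return 0.5 * (3.0 - z * z) * phi(z)

def moment(r, kernel, a=-12.0, b=12.0, n=48000):
    # trapezoidal rule; the integrand decays like exp(-z^2/2), so [-12, 12] is ample
    step = (b - a) / n
    total = 0.5 * (a ** r * kernel(a) + b ** r * kernel(b))
    for i in range(1, n):
        z = a + i * step
        total += z ** r * kernel(z)
    return step * total

# orders 0..3 give 1, 0, 0, 0; order 4 gives B_41(K0) = -3
print("moments 0..4:", [round(moment(r, K0), 6) for r in range(5)])
```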

APPENDIX B: CUMULANTS IN TERMS OF MOMENTS

Here we show how to express k_{rh} = m^{r−1} κ_r(ŵ_1) and k_h(i_1 ... i_r) = m^{r−1} κ(ŵ_{i_1}, ..., ŵ_{i_r}) of (13) and (19) as polynomials in h and w = (w_1, w_2, ...) of (10). Let U be a real random variable with finite moments m_r = EU^r and cumulants κ_r defined as usual by

∑_{r=1}^∞ κ_r t^r / r! ≡ ln(1 + S(t))

for all t in C for which the moment generating function 1 + S(t) = Ee^{Ut} exists. By equation (2), page 160 of Comtet (1974),

κ_r = ∑_{i=1}^r (−1)^{i−1} (i − 1)! B_{ri}(m),        (43)

where for m = (m_1, m_2, ...), B_{ri}(m) is the partial exponential Bell polynomial defined by

( ∑_{r=1}^∞ m_r t^r / r! )^i / i! = ∑_{r=i}^∞ B_{ri}(m) t^r / r!

for i = 0, 1, ... . These polynomials B_{ri}(m) are tabled by Comtet on page 307. For example, B_{r1}(m) = m_r, B_{rr}(m) = m_1^r, B_{32}(m) = 3m_1 m_2, B_{42}(m) = 4m_1 m_3 + 3m_2^2, and B_{43}(m) = 6m_1^2 m_2. An explicit formula for B_{ri}(m) is given by

B_{ri}(m) = ∑ { π(n) m_1^{n_1} ··· m_r^{n_r} : n ∈ N^r, n_1 + ... + n_r = i, n_1 + 2n_2 + ... + r n_r = r }

for π(n) the partition function defined by π(n) = r! / ∏_{j=1}^r (j!^{n_j} n_j!) for r = n_1 + 2n_2 + ... + r n_r. Now take U = U_1 = K_h(x_0 − X), where X has distribution F on ℜ. In the notation of Section 2, m_r = m_{rh} = h^{1−r} w_r and κ_r = κ_{rh} = h^{1−r} k_{rh}. It follows that

k_{rh} = ∑_{i=1}^r (−1)^{i−1} (i − 1)! h^{i−1} B_{ri}(w),

a polynomial in h of degree r − 1. For example,

k_{1h} = w_1,  k_{2h} = w_2 − h w_1^2,  k_{3h} = w_3 − 3h w_2 w_1 + 2! h^2 w_1^3,        (44)
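The explicit partition formula for B_{ri}(m) above is easy to verify mechanically; the sketch below (function names are ours) enumerates the vectors n with n_1 + ... + n_r = i and n_1 + 2n_2 + ... + r n_r = r and reproduces, for instance, B_{42}(m) = 4m_1 m_3 + 3m_2^2:

```python
import math
from itertools import product

def bell_partial(r, i, m):
    """Partial exponential Bell polynomial B_{ri}(m), m = (m1, m2, ...),
    via the partition formula with pi(n) = r! / prod_j (j!^{n_j} n_j!)."""
    total = 0.0
    # enumerate n = (n_1, ..., n_r) with sum n_j = i and sum j * n_j = r
    for n in product(range(r + 1), repeat=r):
        if sum(n) != i or sum((j + 1) * nj for j, nj in enumerate(n)) != r:
            continue
        pi_n = math.factorial(r)
        coef_m = 1.0
        for j, nj in enumerate(n, start=1):
            pi_n //= math.factorial(j) ** nj * math.factorial(nj)
            coef_m *= m[j - 1] ** nj
        total += pi_n * coef_m
    return total

m = (1.5, 0.7, -0.2, 2.0)  # arbitrary test moments m1..m4
# B_42(m) should equal 4 m1 m3 + 3 m2^2
print(bell_partial(4, 2, m), 4 * m[0] * m[2] + 3 * m[1] ** 2)
```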

k_{4h} = w_4 − h(4w_1 w_3 + 3w_2^2) + 2! h^2 (6w_1^2 w_2) − 3! h^3 w_1^4,

k_{5h} = w_5 − h(5w_1 w_4 + 10w_2 w_3) + 2! h^2 (10w_1^2 w_3 + 15w_1 w_2^2) − 3! h^3 (10w_1^3 w_2) + 4! h^4 w_1^5.
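These expressions can be checked numerically against the classical moment-to-cumulant formula, i.e. (43) with r = 4; the sketch below (the values of h and w_r are arbitrary test inputs of ours) confirms that k_{4h} = h^3 κ_4 under the substitution m_r = h^{1−r} w_r:

```python
h = 0.05
w = {1: 0.9, 2: 1.3, 3: -0.4, 4: 2.2}  # arbitrary test values of w_r

# moments m_r = h^(1-r) w_r  (notation of Section 2)
m = {r: h ** (1 - r) * w[r] for r in w}

# classical fourth cumulant in terms of raw moments, i.e. (43) with r = 4
kappa4 = (m[4] - 4 * m[1] * m[3] - 3 * m[2] ** 2
          + 12 * m[1] ** 2 * m[2] - 6 * m[1] ** 4)

# k_{4h} as the degree-3 polynomial in h displayed above
k4h = (w[4] - h * (4 * w[1] * w[3] + 3 * w[2] ** 2)
       + 2 * h ** 2 * 6 * w[1] ** 2 * w[2] - 6 * h ** 3 * w[1] ** 4)

assert abs(k4h - h ** 3 * kappa4) < 1e-9
print("k4h =", k4h)
```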

Analogous to w_r = w_{rh} = h^{r−1} EY^r and k_{rh} = h^{r−1} κ_r(Y), define ν_{rh} = h^{r−1} μ_r(Y). Expanding in terms of non-central moments gives

ν_{rh} = ∑_{j=0}^r C(r, j) (−h w_1)^j w_{r−j} = ∑_{j=0}^{r−2} C(r, j) (−h w_1)^j w_{r−j} + (r − 1)(−h)^{r−1} w_1^r

for w_0 = h^{−1}, where C(r, j) = r!/(j!(r − j)!). For example, ν_{2h} = k_{2h} = w_2 − h w_1^2 and ν_{4h} = k_{4h} + 3h k_{2h}^2. The multivariate forms of (43) and (44) can now be written as

κ_{i_1 ... i_r} = ∑_{i=1}^r (−1)^{i−1} (i − 1)! B_i^{i_1 ... i_r}(m)

and

k_h(i_1 ... i_r) = ∑_{i=1}^r (−1)^{i−1} (i − 1)! h^{i−1} B_i^{i_1 ... i_r}(w),

where B_i^{i_1 ... i_r}(m) is the multivariate form of B_{ri}(m). These may be written down on sight. For example, B_{42}(m) = 4m_1 m_3 + 3m_2^2 is replaced by

B_2^{i_1 ... i_4}(m) = ∑^4 m_{i_1} m_{i_2 i_3 i_4} + ∑^3 m_{i_1 i_2} m_{i_3 i_4},

where ∑^N f(i_1 ... i_r) is f(j_1 ... j_r) summed over all N permutations j_1 ... j_r of i_1 ... i_r giving different terms. So B_2^{i_1 ... i_4}(m) is the sum of

∑^4 m_{i_1} m_{i_2 i_3 i_4} = m_{i_1} m_{i_2 i_3 i_4} + m_{i_2} m_{i_3 i_4 i_1} + m_{i_3} m_{i_4 i_1 i_2} + m_{i_4} m_{i_1 i_2 i_3}

and

∑^3 m_{i_1 i_2} m_{i_3 i_4} = m_{i_1 i_2} m_{i_3 i_4} + m_{i_1 i_3} m_{i_2 i_4} + m_{i_1 i_4} m_{i_2 i_3}.

So

k_h(i_1) = w_{i_1},

k_h(i_1 i_2) = w_{i_1+i_2} − h w_{i_1} w_{i_2},

k_h(i_1 i_2 i_3) = w_{i_1+i_2+i_3} − h ∑^3 w_{i_1+i_2} w_{i_3} + 2! h^2 w_{i_1} w_{i_2} w_{i_3},

k_h(i_1 i_2 i_3 i_4) = w_{i_1+i_2+i_3+i_4} − h { ∑^4 w_{i_1+i_2+i_3} w_{i_4} + ∑^3 w_{i_1+i_2} w_{i_3+i_4} } + 2! h^2 ∑^6 w_{i_1+i_2} w_{i_3} w_{i_4} − 3! h^3 w_{i_1} w_{i_2} w_{i_3} w_{i_4}.

For example, k_h(112) = w_4 − h(2w_1 w_3 + w_2^2) + 2h^2 w_1^2 w_2.
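The multivariate identities can likewise be checked numerically. The sketch below (helper names are ours) computes a joint cumulant by summing over set partitions, taking the joint moment of a block of j variables to be h^{1−j} times w at the summed indices, an assumption consistent with the displays above, and reproduces k_h(112):

```python
from itertools import count  # (unused placeholder removed below)
from math import factorial

h = 0.07
w = {1: 0.8, 2: 1.1, 3: -0.3, 4: 1.9, 5: 0.4, 6: 2.5}  # arbitrary w_r values

def block_moment(block, idx):
    # joint moment of the variables in `block`: h^(1-j) * w_(sum of their indices)
    j = len(block)
    return h ** (1 - j) * w[sum(idx[b] for b in block)]

def partitions(items):
    # generate all set partitions of the list `items`
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for k in range(len(part)):
            yield part[:k] + [part[k] + [first]] + part[k + 1:]
        yield [[first]] + part

def joint_cumulant(idx):
    # kappa = sum over partitions pi of (-1)^(|pi|-1) (|pi|-1)! prod of block moments
    total = 0.0
    for part in partitions(list(range(len(idx)))):
        term = (-1) ** (len(part) - 1) * factorial(len(part) - 1)
        for block in part:
            term *= block_moment(block, idx)
        total += term
    return total

idx = (1, 1, 2)
kh_112 = h ** (len(idx) - 1) * joint_cumulant(idx)
direct = w[4] - h * (2 * w[1] * w[3] + w[2] ** 2) + 2 * h ** 2 * w[1] ** 2 * w[2]
assert abs(kh_112 - direct) < 1e-9
print("kh(112) =", kh_112)
```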

SUMMARY

Edgeworth and Cornish Fisher expansions and confidence intervals for the distribution, density and quantiles of kernel density estimates

We show that kernel density estimates fˆ(x_0) of bandwidth h = h(n) → 0 satisfy the Cornish-Fisher assumption with parameter m = nh. This allows Cornish-Fisher expansions about the normal for standardized and Studentized kernel density estimates in powers of m^{−1/2} for smooth functions t. The expansions given are formal; the conditions for their existence and validity are not explored. The expansions lead to first order confidence intervals (CIs) for f(x_0) of level 1 − ω + O(n^{−β}), where β = p/(2p+2) for one-sided CIs and β = p/(p+1) for two-sided CIs, p being the order of the kernel used. Second order one- and two-sided CIs are given with β = 2p/(2p+3) and β = 2p/(p+2). We show how to choose the bandwidth for asymptotic optimality.