Skewing and Generalized Jackknifing in Kernel Density Estimation

MARCEL DEKKER, INC. • 270 MADISON AVENUE • NEW YORK, NY 10016 ©2003 Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc.

COMMUNICATIONS IN STATISTICS Theory and Methods Vol. 32, No. 11, pp. 2153–2162, 2003

Skewing and Generalized Jackknifing in Kernel Density Estimation

Choongrak Kim,1 Woochul Kim,2 and Byeong U. Park2,*

1 Department of Statistics, Pusan National University, Pusan, Korea
2 Department of Statistics, Seoul National University, Seoul, Korea

ABSTRACT Kernel methods are very popular in nonparametric density estimation. In this article we suggest a simple estimator which reduces the bias to the fourth power of the bandwidth, while the variance of the estimator increases only by at most a moderate constant factor. Our proposal turns out to be a fourth order kernel estimator and may be regarded as a new version of the generalized jackknifing approach (Schucany, W. R., Sommers, J. P. (1977). Improvement of kernel type estimators. Journal of the American Statistical Association 72:420-423.) applied to kernel density estimation.

*Correspondence: Byeong U. Park, Department of Statistics, Seoul National University, Seoul 151-747, Korea; E-mail: [email protected]. DOI: 10.1081/STA-120024473. Copyright © 2003 by Marcel Dekker, Inc.

0361-0926 (Print); 1532-415X (Online) www.dekker.com



1. INTRODUCTION

Nonparametric methods have received a great deal of attention in density estimation, and kernel density estimation is very popular among them. Standard references in this area are Silverman (1986) and Wand and Jones (1995). There have been many proposals for reducing the bias of the classical kernel density estimator. Among them, the use of a higher-order kernel is the simplest one with the longest history. Other proposals include the transformation kernel density estimator of Ruppert and Cline (1994), the multiplicative bias correction of Jones et al. (1995), the variable bandwidth approach of Abramson (1982), and the varying location kernel estimator of Samiuddin and El-Sayyad (1990). See Jones and Signorini (1997) for a useful compendium of these methods.

Recently, Choi and Hall (1998) gave a proposal, termed the ``skewing'' method, for reducing the order of bias in the regression setting. The basic idea is to shift the center of the local fit to the left and right of the point at which one wishes to estimate the curve, and to take a convex combination of three estimators: the two oppositely shifted estimators and the symmetric one. It is therefore another version of the generalized jackknifing approach (Schucany and Sommers, 1977; Jones and Foster, 1993). Choi and Hall (1998) showed that skewing with local linear regression reduces the order of bias to the fourth power of the bandwidth at the expense of a slight increase in variance by a constant factor. Cheng et al. (2000) applied the same idea in locally parametric density estimation.

In this article we apply the skewing method to the classical kernel density estimator instead of the rather complicated locally parametric density estimators. The idea is to consider tangent lines of the classical kernel estimator placed at points a little to the left and right of the point x where we wish to estimate the density.
We show that, by evaluating the two tangent lines and the classical kernel estimator at x and taking a convex combination of these three values, one can reduce the order of bias to the fourth power of the bandwidth. The proposed estimator, with the proper amount of skewing, turns out to be a kernel estimator using a specific fourth order kernel, as is expected from the fact that the skewing method can be regarded as a new version of generalized jackknifing. In Sec. 2, the new density estimator is introduced and its asymptotic properties are derived; it will be seen that our proposal is a fourth order kernel estimator. Numerical properties of our estimator are illustrated in Sec. 3.
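The tangent-line idea can be previewed with any smooth curve. The following minimal sketch, in Python, illustrates why evaluating oppositely shifted tangent lines lifts the estimate at a peak; the function g here is only a stand-in for the kernel estimate of Sec. 2, and all names are ours:

```python
import numpy as np

def tangent_value(g, gprime, x, shift):
    """Value at x of the tangent line touching the curve g at x + shift."""
    return g(x + shift) - shift * gprime(x + shift)

# stand-in smooth curve with a peak at 0: g(x) = exp(-x^2)
g = lambda x: np.exp(-x**2)
gp = lambda x: -2.0 * x * np.exp(-x**2)

left = tangent_value(g, gp, 0.0, -0.5)    # tangent touching to the left of 0
right = tangent_value(g, gp, 0.0, +0.5)   # tangent touching to the right of 0
# at a peak, both tangent values exceed g itself, so a convex
# combination with g(0) pulls the estimate upward at the peak
```

By concavity at the peak, both tangent values lie above the curve there; symmetrically, at a trough they lie below it, which is the bias correction the skewing method exploits.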



2. THE NEW GENERALIZED JACKKNIFE ESTIMATOR

Let $X_1, \ldots, X_n$ be a random sample from a distribution with an unknown density $f(\cdot)$, which we wish to estimate. The kernel estimator of $f$ at $x$ is

$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \qquad (1)$$

where $h$ is the bandwidth and $K$ is the kernel function. One typical feature of nonparametric estimators, including the kernel estimator given at Eq. (1), is that they under-estimate at peaks and over-estimate at troughs. Motivated by this phenomenon, we suggest the estimator at $x$

$$\tilde f(x) = \frac{\alpha_1 \hat f_1(x) + \hat f(x) + \alpha_2 \hat f_2(x)}{\alpha_1 + 1 + \alpha_2}, \qquad (2)$$

where $\alpha_1, \alpha_2 > 0$, $l_1 < 0$ and $l_2 > 0$ are constants to be determined,

$$\hat f_j(x) = \hat f(x + l_j h) - l_j h\, \hat f{}'(x + l_j h) \qquad (3)$$

for $j = 1, 2$, and $\hat f{}'(x)$ is the first derivative of $\hat f(x)$. The suggested estimator $\tilde f(x)$ is a convex combination of $\hat f_1(x)$, $\hat f(x)$, and $\hat f_2(x)$. The estimate $\hat f_j(x)$ represents the value, at $x$, of the tangent line which meets $\hat f(\cdot)$ at $x + l_j h$, as depicted in Fig. 1. Therefore, $\tilde f(x)$ will be larger than $\hat f(x)$ when the point of interest $x$ is located in a peak area. Similarly, $\tilde f(x)$ will be smaller than $\hat f(x)$ when $x$ is located in a trough area. Thus, we may expect the bias of $\tilde f(x)$ to be smaller than that of $\hat f(x)$.

Suppose we choose $\alpha_1 = \alpha_2 = \alpha$. Then, it may be seen that taking $-l_1 = l_2 = l(\alpha)$ with

$$l(\alpha) = \{(1 + 2\alpha)\mu_2 / (2\alpha)\}^{1/2} \qquad (4)$$

and $\mu_l = \int u^l K(u)\, du$ cancels the $O(h^2)$ terms in the bias expansion for the estimator $\tilde f(x)$. In fact, with these choices of the constants, the estimator of $f$ at $x$ can be written as

$$\tilde f(x) = \frac{1}{nh} \sum_{i=1}^{n} \tilde K\left(\frac{x - X_i}{h}\right), \qquad (5)$$



Figure 1. Convex combination of three kernel estimators.

where, with $l \equiv l(\alpha)$, the ``jackknifed'' kernel $\tilde K(x)$ equals

$$\tilde K(x) = (2\alpha + 1)^{-1}\big[K(x) + \alpha\{K(x + l) + K(x - l)\} - \alpha l\{K'(x + l) - K'(x - l)\}\big]. \qquad (6)$$

It may be proved that $\tilde K(x)$ is a fourth order kernel, and thus the bias and variance properties of $\tilde f(x)$ are immediate, as demonstrated in the following theorem (see the Appendix for details of the bias calculations).

Theorem 1. Assume that $f$ has four bounded and continuous derivatives in a neighborhood of $x$; that the kernel $K$ is nonnegative, bounded, and symmetric with $\int K = 1$; and that $h \to 0$ and $nh \to \infty$. If one takes $\alpha_1 = \alpha_2 = \alpha > 0$ and $-l_1 = l_2 = l(\alpha)$, then

$$E\{\tilde f(x)\} = f(x) + \frac{f^{(4)}(x)}{24}\left\{\mu_4 - \frac{3(1 + 6\alpha)}{2\alpha}\,\mu_2^2\right\} h^4 + o(h^4),$$

$$\mathrm{var}\{\tilde f(x)\} = \frac{1}{nh}\, f(x)\, V(\alpha) + o\left(\frac{1}{nh}\right),$$

where, with $l \equiv l(\alpha)$,


$$V(\alpha) = \int \tilde K(t)^2\, dt$$

$$= (2\alpha + 1)^{-2}\Big[(2\alpha^2 + 1)\int K^2(t)\, dt + 4\alpha \int K(t) K(t + l)\, dt + 2\alpha^2 \int K(t - l) K(t + l)\, dt$$
$$\qquad - \{8\alpha(1 + 2\alpha)\mu_2\}^{1/2} \int K(t) K'(t + l)\, dt - \{8\alpha^3 (1 + 2\alpha)\mu_2\}^{1/2} \int K(t - l) K'(t + l)\, dt$$
$$\qquad + \alpha(2\alpha + 1)\mu_2 \int \big\{K'(t)^2 - K'(t - l) K'(t + l)\big\}\, dt\Big].$$

Our proposal given at Eqs. (5) and (6) is a new version of the generalized jackknifing approach applied to density estimation. We note that Schucany and Sommers (1977) applied the idea of the generalized jackknife (Schucany et al., 1971) to reduce the bias of the ordinary kernel density estimator. It amounts to the use of kernels of the form

$$(\mu_2' - \mu_2)^{-1}\{\mu_2' K(x) - \mu_2 L(x)\}, \qquad (7)$$

with $\mu_2 \neq \mu_2'$, in place of $K$ in Eq. (1), where $L$ is another kernel different from $K$, $\mu_j = \int x^j K(x)\, dx$ and $\mu_j' = \int x^j L(x)\, dx$. Jones and Foster (1993) explored various forms of fourth order kernels based on Eq. (7). The jackknifed kernel $\tilde K(x)$ defined at Eq. (6) has a similar flavor to these kernels in that it is a consequence of applying the notion of generalized jackknifing and is a fourth order kernel which yields $O(h^4)$ bias for the resulting estimator. Our proposal, however, differs from any in Jones and Foster (1993), even though the latter covers many special cases.

Remark 1. As suggested by Choi and Hall (1998), one may use the $\alpha$ which minimizes $V(\alpha)$. Another possibility is to use the $\alpha$ which minimizes the MISE; however, the latter is computationally difficult. The minimizer of $V(\alpha)$ varies from 0.1 to 1.0 depending on the kernel $K$. We found that the estimator $\tilde f$ is, in general, not very sensitive to the choice of $\alpha$.

Remark 2. One simple choice of $\alpha$ is to take $\alpha = \infty$. In this case, $\tilde f(x)$ has the corresponding kernel

$$\tilde K_\infty(x) = \frac{1}{2}\big[K(x + l) + K(x - l) - l\{K'(x + l) - K'(x - l)\}\big],$$

with $l = \mu_2^{1/2}$ from Eq. (4). This estimator still achieves $O(h^4)$ bias. However, the constant factors in the leading bias and variance are usually larger than those of $\tilde f(x)$ with the optimal $\alpha$, which is usually finite.



Remark 3. By a Taylor expansion, $\tilde f(x)$ using $\tilde K_\infty$ is very similar to the nonnegative estimator $\hat f_\infty(x)$ suggested by Cheng et al. (2000). In fact,

$$\hat f_\infty(x) = \frac{1}{2}\big(\hat f_+(x) + \hat f_-(x)\big), \qquad (8)$$

where $\hat f_\pm(x) = \hat f(x \mp lh) \exp\big[\tfrac{1}{2} l^2 - \tfrac{1}{2}\{l \mp h \hat f{}'(x \mp lh)/\hat f(x \mp lh)\}^2\big]$.

Remark 4. One may consider an alternative, slightly simpler, method using the one-sided version of $\tilde f(x)$, i.e.,

$$\hat f_{\mathrm{os}}(x) = \frac{1}{\alpha + 1}\big(\alpha \hat f_2(x) + \hat f(x)\big),$$

which has the corresponding kernel

$$\tilde K_{\mathrm{os}}(x) = (\alpha + 1)^{-1}\big[K(x) + \alpha\{K(x + l) - l K'(x + l)\}\big].$$

It may be shown that, for an appropriate choice of $l(\alpha)$, $\hat f_{\mathrm{os}}(x)$ has $O(h^3)$ bias, which is larger than the $O(h^4)$ bias of $\tilde f(x)$. This estimator may be useful at boundary points where data to one side of $x$ are not available.

Remark 5. Since $\tilde K(x)$ is a fourth order kernel, the nonnegativity of $\tilde f(x)$ is not guaranteed. There are standard ways around this problem; see Gajek (1986), for example. In our experience, $\tilde f(x)$ is usually nonnegative except in some extreme cases such as well-separated bimodal densities.
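The moment behavior behind Remark 4 can also be checked numerically. Remark 4 does not state the appropriate $l(\alpha)$; the choice $l^2 = (1 + \alpha)\mu_2/\alpha$ used below is our own calculation (it makes the second moment of $\tilde K_{\mathrm{os}}$ vanish), labeled as an assumption. The sketch, again with a standard normal $K$, shows that the third moment does not vanish, which is what limits $\hat f_{\mathrm{os}}$ to $O(h^3)$ bias:

```python
import numpy as np

def K(x):                        # standard normal kernel
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def Kp(x):                       # its derivative, K'(x) = -x K(x)
    return -x * K(x)

def K_os(x, alpha):
    """One-sided kernel of Remark 4; l chosen (our calculation) so that
    the second moment vanishes: l^2 = (1 + alpha) * mu_2 / alpha, mu_2 = 1."""
    l = np.sqrt((1.0 + alpha) / alpha)
    return (K(x) + alpha * (K(x + l) - l * Kp(x + l))) / (alpha + 1.0)

def moment(p, alpha):
    t = np.linspace(-20.0, 20.0, 80001)
    y = t**p * K_os(t, alpha)
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(t)))
```

Under this choice the kernel integrates to one and its first two moments vanish, but the third moment equals $2\alpha l^3/(\alpha + 1) \neq 0$ (our calculation), so the bias is of order $h^3$, as the remark states.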

3. NUMERICAL RESULTS

We considered three densities: the standard Gaussian, a skewed unimodal, and a bimodal, as described in Marron and Wand (1992). For each density, we compared $\tilde f$ with the classical second-order kernel estimator $\hat f$ as at Eq. (1), the two-parameter locally parametric estimator $\hat f_\infty$ of Cheng et al. (2000) as at Eq. (8), and the fourth-order kernel estimator $\hat f_4$ with the kernel $2K(x) - (K * K)(x)$. The standard normal density function was used as the kernel function $K$. In computing $\tilde f$, we used $\alpha = 0.307$, which minimizes $V(\alpha)$ in Theorem 1; as stated in Remark 1, the results are not sensitive to the choice of $\alpha$. The sample size was $n = 100$. Figure 2 shows the mean integrated squared error (MISE) performance of the four estimators based on 100 replications. As expected, $\hat f$



Figure 2. MISE of the four estimators: the ordinary kernel estimator $\hat f(x)$ (· · · · ·), the new estimator $\tilde f(x)$ (- - - - -), the locally-parametric estimator $\hat f_\infty(x)$ (– – –), and the fourth-order kernel estimator $\hat f_4(x)$ (— —), when the true density is (a) $N(0, 1)$; (b) the skewed unimodal $(1/5)N(0, 1) + (1/5)N(1/2, (2/3)^2) + (3/5)N(13/12, (5/9)^2)$; (c) the bimodal $(1/2)N(-1, (2/3)^2) + (1/2)N(1, (2/3)^2)$.

performs worst. The estimators $\tilde f$, $\hat f_\infty$, and $\hat f_4$ are comparable to each other. Our proposal $\tilde f$ exhibits the lowest optimal MISE in the skewed unimodal case, while $\hat f_\infty$ wins in the other two cases. In all cases, $\tilde f$ outperforms $\hat f_4$. Figure 3 depicts the average curves based on 100 values



Figure 3. Plots of the four estimates for the true density (a) $N(0, 1)$; (b) the skewed unimodal $(1/5)N(0, 1) + (1/5)N(1/2, (2/3)^2) + (3/5)N(13/12, (5/9)^2)$; (c) the bimodal $(1/2)N(-1, (2/3)^2) + (1/2)N(1, (2/3)^2)$. Line types are as in Fig. 2, and solid curves correspond to the true densities. Each curve (except solid) represents the average of 100 simulated values of the corresponding estimates.

of the estimators as well as the true density curves. It gives an indication of biases for the four estimators. The bandwidth used for each estimator was the one which minimizes the MISE. We see that f~ has the smallest bias at the peaks.
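A small Monte Carlo along the lines of this section is easy to reproduce. The Python sketch below compares the integrated squared error (ISE) of $\hat f$ and $\tilde f$ on $N(0,1)$ samples of size $n = 100$; the fixed bandwidths and the replication count are illustrative choices of ours, not the MISE-optimal values used for Figs. 2 and 3:

```python
import numpy as np

SQRT2PI = np.sqrt(2.0 * np.pi)

def f_hat(x, data, h):
    """Classical kernel estimator, Eq. (1), with standard normal kernel."""
    u = (x[:, None] - data) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * SQRT2PI)

def f_hat_prime(x, data, h):
    """First derivative of the kernel estimate (K'(u) = -u K(u))."""
    u = (x[:, None] - data) / h
    return (-u * np.exp(-0.5 * u**2)).mean(axis=1) / (h**2 * SQRT2PI)

def f_tilde(x, data, h, alpha=0.307):
    """Skewed estimator of Eqs. (2)-(4), normal kernel (mu_2 = 1)."""
    l = np.sqrt((1.0 + 2.0 * alpha) / (2.0 * alpha))   # Eq. (4)
    f0 = f_hat(x, data, h)
    # tangent-line values f_hat_j(x) of Eq. (3), with l_1 = -l, l_2 = +l
    f1 = f_hat(x - l*h, data, h) + l*h * f_hat_prime(x - l*h, data, h)
    f2 = f_hat(x + l*h, data, h) - l*h * f_hat_prime(x + l*h, data, h)
    return (alpha * f1 + f0 + alpha * f2) / (2.0 * alpha + 1.0)

def ise(est, truth, grid):
    """Trapezoid-rule integral of the squared error over the grid."""
    d = (est - truth)**2
    return float(np.sum((d[1:] + d[:-1]) * 0.5 * np.diff(grid)))

rng = np.random.default_rng(1)
grid = np.linspace(-5.0, 5.0, 501)
truth = np.exp(-0.5 * grid**2) / SQRT2PI
ise_hat, ise_tilde = [], []
for _ in range(50):
    data = rng.standard_normal(100)
    ise_hat.append(ise(f_hat(grid, data, 0.42), truth, grid))      # h of order n^(-1/5)
    ise_tilde.append(ise(f_tilde(grid, data, 0.70), truth, grid))  # h of order n^(-1/9)
```

Averaging the two lists gives an empirical MISE comparison analogous to one panel of Fig. 2; sweeping over a grid of bandwidths and taking the minimum per estimator reproduces the "optimal MISE" comparison.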



APPENDIX: ASYMPTOTIC BIAS CALCULATIONS

The bias formula can be obtained from calculation of

$$\int x^p \tilde K(x)\, dx = \frac{1}{2\alpha + 1}\left[\int \big\{x^p + \alpha (x - l)^p + \alpha (x + l)^p\big\} K(x)\, dx - \alpha l \int \big\{(x - l)^p - (x + l)^p\big\} K'(x)\, dx\right]. \qquad (\mathrm{A.1})$$

It may be seen that

$$\int \big\{(x - l)^p + (x + l)^p\big\} K(x)\, dx = 2 \sum_{j=0}^{[p/2]} \binom{p}{2j} \mu_{2j}\, l^{p - 2j}, \qquad (\mathrm{A.2})$$

where $[p/2]$ denotes the greatest integer which does not exceed $p/2$. The integral in Eq. (A.2) equals zero for $p = 1, 3$, and it is $2$, $2(\mu_2 + l^2)$, $2(\mu_4 + 6 l^2 \mu_2 + l^4)$ for $p = 0, 2, 4$, respectively. Similarly,

$$\int \big\{(x - l)^p - (x + l)^p\big\} K'(x)\, dx = 2 \sum_{j=0}^{[(p-1)/2]} \binom{p}{2j + 1} (p - 2j - 1)\, l^{2j + 1} \mu_{p - 2j - 2} \qquad (\mathrm{A.3})$$

for $p \geq 1$, and it is zero for $p = 0$. The integral in Eq. (A.3) also equals zero for $p = 1, 3$, and it is $4l$, $8l(3\mu_2 + l^2)$ for $p = 2, 4$, respectively. Plugging these values of the integrals for $p = 0, 1$, and $3$ into Eq. (A.1) yields that $\int x^p \tilde K(x)\, dx$ equals $1$ for $p = 0$, and is $0$ for $p = 1, 3$. The proof of the theorem is now completed by observing that

$$\int x^2 \tilde K(x)\, dx = \frac{1}{2\alpha + 1}\big\{\mu_2 + 2\alpha(\mu_2 + l^2) - \alpha l (4l)\big\} = 0$$

by Eq. (4), and that

$$\int x^4 \tilde K(x)\, dx = \frac{1}{2\alpha + 1}\big\{\mu_4 + 2\alpha(\mu_4 + 6 l^2 \mu_2 + l^4) - 8\alpha l^2 (3\mu_2 + l^2)\big\} = \mu_4 - \frac{3(1 + 6\alpha)}{2\alpha}\,\mu_2^2.$$
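The second- and fourth-moment identities above can be verified in exact arithmetic. A sketch using Python's fractions module, working with $l(\alpha)^2$ directly so that no square roots appear; the standard normal values $\mu_2 = 1$, $\mu_4 = 3$ are our illustrative defaults:

```python
from fractions import Fraction as F

def moments_24(alpha, mu2=F(1), mu4=F(3)):
    """Second and fourth moments of K_tilde via Eqs. (A.1)-(A.3)."""
    a = F(alpha)
    l2 = (1 + 2 * a) * mu2 / (2 * a)          # l(alpha)^2, from Eq. (4)
    # p = 2: (A.2) gives 2(mu2 + l^2), (A.3) gives 4l, so the K' term is alpha*l*(4l)
    m2 = (mu2 + 2 * a * (mu2 + l2) - 4 * a * l2) / (2 * a + 1)
    # p = 4: (A.2) gives 2(mu4 + 6 l^2 mu2 + l^4), (A.3) gives 8l(3 mu2 + l^2)
    m4 = (mu4 + 2 * a * (mu4 + 6 * l2 * mu2 + l2**2)
          - 8 * a * l2 * (3 * mu2 + l2)) / (2 * a + 1)
    return m2, m4
```

For every rational $\alpha > 0$ this returns exactly $0$ and $\mu_4 - 3(1 + 6\alpha)\mu_2^2/(2\alpha)$, matching the two displays above.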



ACKNOWLEDGMENTS The first author was supported by Korea Research Foundation Grant KRF-2000-041-D00050. Research of the second and the third authors was supported by the Korea Research Foundation Grant KRF-2002-070-C00017.

REFERENCES

Abramson, I. S. (1982). On bandwidth variation in kernel estimates—a square root law. The Annals of Statistics 10:1217-1223.
Cheng, M.-Y., Choi, E., Fan, J., Hall, P. (2000). Skewing-methods for two-parameter locally-parametric density estimation. Bernoulli 6:169-182.
Choi, E., Hall, P. (1998). On bias reduction in local linear smoothing. Biometrika 85:333-345.
Gajek, L. (1986). On improving density estimators which are not bona fide functions. The Annals of Statistics 14:1612-1618.
Jones, M. C., Foster, P. J. (1993). Generalized jackknifing and higher order kernels. Journal of Nonparametric Statistics 3:81-94.
Jones, M. C., Signorini, D. F. (1997). A comparison of higher-order bias kernel density estimators. Journal of the American Statistical Association 92:1063-1073.
Jones, M. C., Linton, O., Nielsen, J. P. (1995). A simple bias reduction method for density estimation. Biometrika 82:327-338.
Marron, J. S., Wand, M. P. (1992). Exact mean integrated squared error. The Annals of Statistics 20:712-736.
Ruppert, D., Cline, D. B. H. (1994). Bias reduction in kernel density estimation by smoothed empirical transformations. The Annals of Statistics 22:185-210.
Samiuddin, M., El-Sayyad, G. M. (1990). On nonparametric kernel density estimates. Biometrika 77:865-874.
Schucany, W. R., Sommers, J. P. (1977). Improvement of kernel type estimators. Journal of the American Statistical Association 72:420-423.
Schucany, W. R., Gray, H. L., Owen, D. B. (1971). On bias reduction in estimation. Journal of the American Statistical Association 66:524-533.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Wand, M. P., Jones, M. C. (1995). Kernel Smoothing. London: Chapman and Hall.