
NONPARAMETRIC ESTIMATION OF MULTIVARIATE CDF WITH CATEGORICAL AND CONTINUOUS DATA

Gaosheng Ju, Rui Li and Zhongwen Liang

ABSTRACT

In this paper we construct a nonparametric kernel estimator to estimate the joint multivariate cumulative distribution function (CDF) of mixed discrete and continuous variables. We use a data-driven cross-validation method to choose optimal smoothing parameters which asymptotically minimize the mean integrated squared error (MISE). The asymptotic theory of the proposed estimator is derived, and the validity of the cross-validation method is proved. We provide sufficient and necessary conditions for the uniqueness of optimal smoothing parameters when the estimation of CDF degenerates to the case with only continuous variables, and provide a sufficient condition for the general mixed variables case.

Nonparametric Econometric Methods. Advances in Econometrics, Volume 25, 291–318. Copyright © 2009 by Emerald Group Publishing Limited. All rights of reproduction in any form reserved. ISSN: 0731-9053/doi:10.1108/S0731-9053(2009)0000025012


GAOSHENG JU ET AL.

1. INTRODUCTION

As the rapid advancement of modern computer technology makes the computation of complicated problems feasible, nonparametric statistical methods have become increasingly popular. Nonparametric methods have been applied in many economic contexts. Their most striking advantage over parametric methods is that they impose no prior assumptions, which often turn out to be inappropriate, on the unknown true distributions. The joint distribution of multiple economic variables gives a direct illustration of the relationship among these variables and helps researchers infer the underlying causality. Consequently, the estimation of joint distributions is an important and fundamental issue in the nonparametric econometrics/statistics literature.

Traditionally, nonparametric methods focus on the estimation of either continuous variables or discrete variables (see, e.g., Grund, 1993; Grund & Hall, 1993; Hall, 1981). However, estimation and testing methods able to handle mixed data are quite desirable because most data sets contain both continuous and discrete variables. For instance, labor economists are usually interested in the relationships between continuous income and discrete explanatory variables such as gender, race, education level, location, etc. Recently, Li and Racine (2003), Racine and Li (2004), and Li and Racine (2008) discussed nonparametric smoothing estimation of probability density functions, regression functions, and conditional cumulative distribution functions (CDF) and quantile functions with mixed discrete and continuous variables. Their work is of great importance for extending the scope of nonparametric methods to contexts with both continuous and discrete variables. This paper contributes to this literature by investigating nonparametric estimation of the unconditional joint CDF of mixed data types.
One difficulty in dealing with the estimation of discrete and continuous variables simultaneously is a lack of joint observations. Conventional approaches to handle the estimation of CDF of discrete variables are frequency based. Although we can directly combine it with the kernel estimator of continuous variables, the approach suffers because the number of observations for estimation of discrete variables by a frequency-based approach may be insufficient to ensure an accurate nonparametric estimation of marginal CDF for the remaining continuous variables. Aitchison and Aitken (1976) proposed a novel nonparametric smoothing method to estimate distribution functions defined over binary data. Their method can mitigate the problem of data insufficiency for finite-sample applications.


Their proposed smoothing method can reduce the estimation variance significantly, though it incurs some mild estimation bias. Li and Racine (2003) extended Aitchison and Aitken's method to a context with mixed discrete and continuous variables. In this paper, we adopt their idea of smoothing both discrete and continuous variables to estimate an unconditional CDF which contains both discrete and continuous components.

It is well known that the selection of smoothing parameters is of crucial importance in nonparametric estimation. There exist several popular methods of smoothing parameter selection. Among them, the most popular are the plug-in method and the cross-validation method. There are many discussions of these methods (e.g., Härdle & Marron, 1985; Loader, 1999). However, there is no clear consensus on which method is better. In practice, cross-validation may be the preferred choice, especially in multivariate settings, because it is fully data driven. Rudemo (1982) and Bowman (1984) introduced the cross-validation selection of smoothing parameters for density estimation (see Wand & Jones, 1995, Chapter 3; Li & Racine, 2007, for a thorough discussion). Bowman, Hall, and Prvan (1998) presented a cross-validation bandwidth selection for the smoothing estimation of continuous distribution functions. In this paper, we propose to use the least squares cross-validation method to choose the smoothing parameters. We will show that the resultant smoothing parameters are optimal in the sense of minimizing the mean integrated squared error (MISE).

Another interesting problem is the uniqueness of the smoothing parameter vector in cross-validation methods. This was first tackled in Li and Zhou (2005) for the nonparametric kernel estimation of the PDF and regression function of continuous variables. We also discuss this problem in the paper.
We give a sufficient and necessary condition for uniqueness when the estimation of CDF degenerates to a case with only continuous variables. For the case of mixed variables, we provide a sufficient condition.

The estimation of CDF is quite useful in econometrics and economics, especially for the econometric theory and economic applications of tests of stochastic dominance. Recently, several theories and applications of nonparametric tests of stochastic dominance have appeared. Among them are Barrett and Donald (2003), which provided consistent tests of stochastic dominance for any pre-specified order; Anderson (1996), which gave a nonparametric test of stochastic dominance applied to income distributions; and Davidson and Duclos (2000), which developed statistical inference and applications in poverty, inequality, and social welfare. Our estimator can be readily used in tests of stochastic dominance with mixed data.

The paper is organized as follows. In Section 2, we propose an estimator of the distribution function that admits mixed discrete and continuous variables. We derive the rates of convergence and establish the asymptotic normality of our estimator. In Section 3, we show that the smoothing parameters selected by the cross-validation method are optimal in the sense that they converge to the minimizer of MISE in probability. In Section 4, we give a sufficient and necessary condition for the uniqueness of the smoothing parameter vector when the estimation contains continuous variables only, and we give a sufficient condition for the mixed case. Section 5 provides an empirical application to examine the relationship between city size and unemployment rate. Section 6 concludes the paper.

2. ESTIMATION OF CDF WITH MIXED DISCRETE AND CONTINUOUS VARIABLES

We consider the case in which $x$ is a vector containing a mix of discrete and continuous variables. Let $x = (x^c, x^d)$, where $x^c \in \mathbb{R}^q$ is a $q$-dimensional continuous random vector, and where $x^d$ is an $r$-dimensional discrete random vector. Let $X_{is}^d$ ($x_s^d$) denote the $s$th component of $X_i^d$ ($x^d$), $s = 1, \ldots, r$, $i = 1, \ldots, n$, where $n$ is the sample size. We restrict the discrete components to a finite support. Without loss of generality, assume that the support of $X_{is}^d$ is $\{0, 1, \ldots, c_s - 1\}$; hence the support of $X_i^d$ is $S^d = \prod_{s=1}^r \{0, 1, \ldots, c_s - 1\}$. For discrete variables, we use the following kernel:

\[
l(X_{is}^d, x_s^d, \lambda_s) =
\begin{cases}
1 - \lambda_s, & \text{if } X_{is}^d = x_s^d, \\
\lambda_s/(c_s - 1), & \text{if } X_{is}^d \neq x_s^d.
\end{cases}
\]

Note that $\lambda_s$ is a bandwidth having the following properties: when $\lambda_s = 0$, $l(X_{is}^d, x_s^d, 0)$ becomes an indicator function, and when $\lambda_s = (c_s - 1)/c_s$, $l(X_{is}^d, x_s^d, (c_s - 1)/c_s) = 1/c_s$ becomes a uniform weight function. Thus, the range of $\lambda_s$ is $[0, (c_s - 1)/c_s]$. The product kernel function is given by

\[
L(X_i^d, x^d, \lambda) = \prod_{s=1}^r l(X_{is}^d, x_s^d, \lambda_s).
\]
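As an illustration of the discrete kernel just defined, the following is a minimal Python sketch. The function names `discrete_kernel` and `product_discrete_kernel` and the toy data are assumptions made here for illustration; they are not taken from the paper or from any package.

```python
import numpy as np

def discrete_kernel(X_s, x_s, lam, c_s):
    """l(X_s, x_s, lam): 1 - lam on a match, lam / (c_s - 1) otherwise."""
    X_s = np.asarray(X_s)
    return np.where(X_s == x_s, 1.0 - lam, lam / (c_s - 1))

def product_discrete_kernel(Xd, xd, lam, c):
    """L(X_i^d, x^d, lam) for every row of the (n, r) integer array Xd."""
    Xd = np.asarray(Xd)
    weights = np.ones(Xd.shape[0])
    for s in range(Xd.shape[1]):
        weights *= discrete_kernel(Xd[:, s], xd[s], lam[s], c[s])
    return weights

# Hypothetical toy data: two binary variables, so c = (2, 2)
# and each lam_s must lie in [0, 1/2].
Xd = np.array([[0, 0], [0, 1], [1, 1]])
print(product_discrete_kernel(Xd, xd=(0, 1), lam=(0.1, 0.2), c=(2, 2)))
```

Setting `lam` to 0 reproduces the indicator (frequency) weights, and setting it to `(c_s - 1) / c_s` gives the uniform weight `1 / c_s`, matching the two boundary cases described above.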

We use $k(\cdot)$ to denote a univariate kernel function for a continuous variable. The product kernel function used for the continuous variables is given by

\[
K\!\left(\frac{X_i^c - x^c}{h}\right) = \prod_{j=1}^q k\!\left(\frac{X_{ij}^c - x_j^c}{h_j}\right),
\]

where $X_{ij}^c$ ($x_j^c$) denotes the $j$th component of $X_i^c$ ($x^c$), $j = 1, \ldots, q$, $i = 1, \ldots, n$, and $h_j$ is the bandwidth associated with $x_j^c$. We use $f(x)$ and $F(x)$ to denote the density function and CDF of $X$, respectively. Following Li and Racine (2003), the kernel estimator of the density function $f(x)$ is given by

\[
\hat f(x) = \hat f(x^c, x^d) = \frac{1}{n h_1 h_2 \cdots h_q} \sum_{i=1}^n K\!\left(\frac{X_i^c - x^c}{h}\right) L(X_i^d, x^d, \lambda).
\]

Naturally, one can obtain a kernel estimator of $F(x)$ by integrating $\hat f(x)$, which is expressed as

\[
\hat F(x) = \hat F(x^c, x^d) = \frac{1}{n} \sum_{i=1}^n \left[ G\!\left(\frac{x^c - X_i^c}{h}\right) \sum_{u \le x^d} L(X_i^d, u, \lambda) \right], \tag{1}
\]

where $G(x) = \int_{-\infty}^{x} k(v)\,dv$, and $G((x^c - X_i^c)/h) = \prod_{j=1}^q G((x_j^c - X_{ij}^c)/h_j)$.

We introduce some notation before we state the main theorem of this section. Let $1(A)$ denote an indicator function that takes the value 1 if $A$ occurs and 0 otherwise. Define an indicator function $1_s(\cdot, \cdot)$ by

\[
1_s(z^d, u) = 1(z_s^d \neq u_s) \prod_{t \neq s}^{r} 1(z_t^d = u_t). \tag{2}
\]

We can see that $1_s(\cdot, \cdot)$ equals one if and only if $z^d$ and $u$ differ only in the $s$th component. The following assumptions will be used in studying the asymptotic behavior of cross-validated smoothing parameters and in deriving the asymptotic distribution of our CDF estimator.

Condition (C1). The data $\{(X_i^c, X_i^d)\}_{i=1}^n$ are independent and identically distributed as $(X^c, X^d)$. $F(x^c, x^d)$ has continuous third-order partial derivatives with respect to $x^c$.

Condition (C2). $k(\cdot)$ is a bounded and symmetric kernel density function with a compact support. $\int k(v)\,dv = 1$, $\int v^2 k(v)\,dv = \kappa_2 < \infty$.

Condition (C3). As $n \to \infty$, $h_j \to 0$, $n h_j^6 \to 0$, for $j = 1, \ldots, q$, and $\lambda_s \to 0$, $n \lambda_s^4 \to 0$, for $s = 1, \ldots, r$.
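The estimator in Eq. (1) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: a Gaussian kernel is used for $k$ (as in the empirical application of Section 5, although it does not have the compact support required by Condition (C2)), and the names `gaussian_cdf`, `cumulative_discrete_weight` and `cdf_estimate` are hypothetical. The sketch uses the fact that the set $\{u \le x^d\}$ is a product set, so the inner sum over $u$ factorises across the $r$ discrete components.

```python
import math
import numpy as np

def gaussian_cdf(v):
    """G(v): integral of a standard Gaussian kernel up to v, elementwise."""
    v = np.asarray(v, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(v / math.sqrt(2.0)))

def cumulative_discrete_weight(Xd, xd, lam, c):
    """sum_{u <= x^d} L(X_i^d, u, lam) for every row of the (n, r) array Xd."""
    Xd = np.asarray(Xd)
    out = np.ones(Xd.shape[0])
    for s in range(Xd.shape[1]):
        n_below = xd[s] + 1            # support points u_s in {0, ..., x_s^d}
        hit = Xd[:, s] <= xd[s]
        # If X_is <= x_s, exactly one u_s matches X_is (weight 1 - lam) and
        # the other n_below - 1 points do not; otherwise all n_below points miss.
        out *= np.where(hit,
                        (1.0 - lam[s]) + (n_below - 1) * lam[s] / (c[s] - 1),
                        n_below * lam[s] / (c[s] - 1))
    return out

def cdf_estimate(xc, xd, Xc, Xd, h, lam, c):
    """Smoothed joint CDF estimate of Eq. (1) at the point (xc, xd)."""
    Xc = np.asarray(Xc, dtype=float)   # shape (n, q)
    scaled = (np.asarray(xc, dtype=float) - Xc) / np.asarray(h, dtype=float)
    G = np.prod(gaussian_cdf(scaled), axis=1)
    return float(np.mean(G * cumulative_discrete_weight(Xd, xd, lam, c)))
```

With `lam` set to zero and very small `h`, the function reduces to the empirical distribution function, which gives a simple sanity check on any implementation.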

Let $F_j^{(1)}(x^c, x^d) = \partial F(x)/\partial x_j^c$ and $F_{jj}^{(2)}(x^c, x^d) = \partial^2 F(x)/(\partial x_j^c\, \partial x_j^c)$. The next theorem shows the rate of convergence in terms of MSE and MISE and the asymptotic normality of our estimator.

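Theorem 1. Under conditions (C1), (C2), and (C3), we have

(i)
\[
\begin{aligned}
\mathrm{MSE}(\hat F(x^c, x^d)) ={}& \frac{1}{n} F(x^c, x^d)\bigl(1 - F(x^c, x^d)\bigr) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + \left( \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s \right)^{2} \\
&+ O\!\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right),
\end{aligned}
\]

where $a_0 = 2 \int v\, G(v) k(v)\, dv$, $A_{1j} = a_0 F_j^{(1)}(x^c, x^d)$,

\[
A_{2s} = \frac{2}{c_s - 1} \sum_{u \le x^d} \sum_{v \le x^d,\, v \neq u} 1_s(u, v) F(x^c \mid u) p(u) - 2 F(x^c, x^d) - 2 F(x^c, x^d) B_{2s},
\]

$B_{1j} = (1/2) \kappa_2 F_{jj}^{(2)}(x^c, x^d)$, and

\[
B_{2s} = \frac{1}{c_s - 1} \sum_{z^d \in S^d} \sum_{u \le x^d} 1_s(z^d, u) F(x^c \mid z^d) p(z^d) - F(x^c, x^d).
\]

(ii)
\[
\begin{aligned}
\mathrm{MISE}(\hat F(x^c, x^d)) ={}& Z^T \left( \sum_{x^d \in S^d} \int B B^T dx^c \right) Z + \frac{1}{n} A^T \tilde Z + \frac{1}{n} \sum_{x^d \in S^d} \int F(x^c, x^d)\bigl(1 - F(x^c, x^d)\bigr)\, dx^c \\
&+ O\!\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right),
\end{aligned} \tag{3}
\]

where $Z = (h_1^2, \ldots, h_q^2, \lambda_1, \ldots, \lambda_r)^T$, $\tilde Z = (h_1, \ldots, h_q, \lambda_1, \ldots, \lambda_r)^T$, $B = (B_{11}, \ldots, B_{1q}, B_{21}, \ldots, B_{2r})^T$, and $A = \sum_{x^d \in S^d} \int (-A_{11}, \ldots, -A_{1q}, A_{21}, \ldots, A_{2r})^T dx^c$, so that (3) is the sum over $x^d$ and integral over $x^c$ of the expansion in (i).

(iii)
\[
\sqrt{n} \left( \hat F(x^c, x^d) - F(x^c, x^d) - \sum_{j=1}^q B_{1j} h_j^2 - \sum_{s=1}^r B_{2s} \lambda_s \right) \xrightarrow{d} N\bigl(0, F(x^c, x^d)(1 - F(x^c, x^d))\bigr).
\]

The proof of Theorem 1 is given in Appendix A.

We can see that the convergence rate of our CDF estimator is $\sqrt{n}$. Under the optimal convergence rates for $h_j$ and $\lambda_s$, $j = 1, \ldots, q$, $s = 1, \ldots, r$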


(i.e., $h_j \sim n^{-1/3}$ and $\lambda_s \sim n^{-2/3}$), statement (iii) in Theorem 1 simplifies to

\[
\sqrt{n} \bigl( \hat F(x^c, x^d) - F(x^c, x^d) \bigr) \xrightarrow{d} N\bigl(0, F(x^c, x^d)(1 - F(x^c, x^d))\bigr).
\]
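The simplified limit above suggests a pointwise confidence interval for $F(x^c, x^d)$. The sketch below is a hedged illustration: replacing the unknown $F$ in the variance by its estimate is an assumption made here (a standard plug-in step, not something stated in the paper), and the function name `asymptotic_ci` is hypothetical.

```python
import math

def asymptotic_ci(F_hat, n, z=1.959963984540054):
    """Approximate 95% interval F_hat +/- z * sqrt(F_hat (1 - F_hat) / n).

    Assumes the bandwidths shrink fast enough for the bias terms in
    Theorem 1 (iii) to vanish, and plugs F_hat into the variance.
    """
    se = math.sqrt(max(F_hat * (1.0 - F_hat), 0.0) / n)
    return max(0.0, F_hat - z * se), min(1.0, F_hat + z * se)

# Hypothetical values: an estimate of 0.5 from a sample of size 295.
print(asymptotic_ci(0.5, 295))
```

The interval is clipped to $[0, 1]$ because a CDF value cannot lie outside that range.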

3. CROSS-VALIDATION BANDWIDTH SELECTION

In this section, we focus on how to choose the smoothing parameters when computing $\hat F(\cdot)$. Theoretically, we may choose the optimal bandwidths by minimizing the leading term of MISE given by Eq. (3) in Theorem 1. Taking derivatives with respect to $h_j$ and $\lambda_s$, one can easily see that optimal smoothing requires that $h_j \sim n^{-1/3}$, $j = 1, \ldots, q$, and $\lambda_s \sim n^{-2/3}$, $s = 1, \ldots, r$, when $q \ge 1$. However, the coefficients of these orders involve unknown functions, so this method is infeasible in practice. One can compute plug-in bandwidths based on Eq. (3) by choosing some initial "pilot" bandwidths, but the results may be sensitive to the choice of these pilots. Therefore, it is highly desirable to construct an automatic data-driven bandwidth selection procedure, which does not rely on ad hoc pilot bandwidth values to estimate unknown functions.

Following Bowman et al. (1998), we suggest choosing the smoothing parameters $(h, \lambda) = (h_1, \ldots, h_q, \lambda_1, \ldots, \lambda_r)$ by minimizing the following cross-validation function:

\[
\mathrm{CV}(h, \lambda) = \frac{1}{n} \sum_{i=1}^n \sum_{x^d \in S^d} \int \bigl[ I(x^c, X_i^c) I(x^d, X_i^d) - \hat F_{-i}(x^c, x^d) \bigr]^2 dx^c,
\]

where $\hat F_{-i}(x^c, x^d) = (1/(n-1)) \sum_{j \neq i} G((x^c - X_j^c)/h) \sum_{u \le x^d} L(X_j^d, u, \lambda)$, $I(x^c, X_i^c) = 1(X_i^c \le x^c)$, and $I(x^d, X_i^d) = 1(X_i^d \le x^d)$. Define $I_i \equiv I(x, X_i) = I(x^c, X_i^c) I(x^d, X_i^d)$ and a term unrelated to the smoothing parameters,

\[
J_n = \sum_{x^d \in S^d} \int \bigl\{ (F_n - F)^2 - E[(F_n - F)^2] \bigr\}\, dx^c - \frac{1}{n} \sum_{i=1}^n \sum_{x^d \in S^d} \int \bigl[ I(x, X_i) - F(x^c, x^d) \bigr]^2 dx^c,
\]

where $F_n(x^c, x^d) = (1/n) \sum_{i=1}^n I(x^c, X_i^c) I(x^d, X_i^d)$ is the empirical distribution function. In Theorem 2 below, we show that $H(h, \lambda) = \mathrm{CV}(h, \lambda) + J_n$ is a good approximation to $\mathrm{MISE}(h, \lambda)$.
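To make the cross-validation criterion concrete, here is a minimal Python sketch for the special case $q = r = 1$. Several choices are assumptions made here rather than the authors' implementation: a Gaussian kernel, a Riemann sum over a finite grid in place of the integral over $x^c$, a crude grid search instead of a numerical optimizer, and synthetic data. The name `cv_objective` is hypothetical.

```python
import math
import numpy as np

def _G(v):
    """Standard normal CDF, elementwise (Gaussian kernel assumed)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(v / math.sqrt(2.0)))

def cv_objective(h, lam, Xc, Xd, c, grid):
    """Riemann-sum approximation of CV(h, lam) for q = r = 1 (h > 0)."""
    Xc = np.asarray(Xc, dtype=float)
    Xd = np.asarray(Xd)
    n = Xc.size
    dx = grid[1] - grid[0]
    G = _G((grid[None, :] - Xc[:, None]) / h)            # shape (n, m)
    Ic = (Xc[:, None] <= grid[None, :]).astype(float)    # I(x^c, X_i^c)
    total = 0.0
    for xd in range(c):                                  # sum over x^d in S^d
        n_below = xd + 1
        w = np.where(Xd <= xd,
                     (1.0 - lam) + (n_below - 1) * lam / (c - 1),
                     n_below * lam / (c - 1))            # sum_{u <= xd} L
        W = G * w[:, None]
        loo = (W.sum(axis=0, keepdims=True) - W) / (n - 1)   # leave-one-out
        I = Ic * (Xd <= xd)[:, None]
        total += np.sum((I - loo) ** 2) * dx
    return total / n

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
Xd = rng.integers(0, 2, size=200)
Xc = rng.normal(loc=Xd, scale=1.0)
grid = np.linspace(Xc.min() - 1.0, Xc.max() + 1.0, 400)
candidates = [(h, lam) for h in (0.05, 0.2, 0.5, 1.0) for lam in (0.0, 0.05, 0.2, 0.5)]
best = min(candidates, key=lambda p: cv_objective(p[0], p[1], Xc, Xd, 2, grid))
print(best)
```

Truncating the integral to the data range is a practical device in this sketch; the contribution from outside the grid depends on the smoothing parameters only weakly when the grid extends well beyond the data.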

Theorem 2. Define $H(h, \lambda) = \mathrm{CV}(h, \lambda) + J_n$. Then under conditions (C1) and (C2), we have for each $\delta, \varepsilon, C > 0$,

\[
H(h, \lambda) = \mathrm{MISE}(h, \lambda) + O_p\!\left( \left( n^{-3/2} + n^{-1} \sum_{j=1}^q h_j^{q} + n^{-1/2} \sum_{j=1}^q h_j^{q+2} + n^{-1/2} \sum_{j=1}^q h_j^4 + n^{-1/2} \sum_{s=1}^r \lambda_s^2 \right) n^{\delta} \right)
\]

with probability 1, uniformly in $0 \le h_j, \lambda_s \le C n^{-\varepsilon}$ for $j = 1, \ldots, q$, $s = 1, \ldots, r$, as $n \to \infty$.

Essentially, Theorem 2 says that $\mathrm{CV}(h, \lambda)$ = (leading terms of $\mathrm{MISE}(h, \lambda)$) + (terms unrelated to $h, \lambda$) + (small order terms). Therefore, minimizing the cross-validation function is asymptotically equivalent to minimizing $\mathrm{MISE}(h, \lambda)$, and we immediately have the following corollary.

Corollary 1. Under conditions (C1) and (C2), let $\hat h_j, \hat\lambda_s$, $j = 1, \ldots, q$, $s = 1, \ldots, r$, denote the smoothing parameters that minimize $\mathrm{CV}(h, \lambda)$ over the set $[0, C n^{-\varepsilon}]^{q+r}$ for any $C > 0$ and any $0 < \varepsilon < 1/3$, and let $h_j^0, \lambda_s^0$, $j = 1, \ldots, q$, $s = 1, \ldots, r$, denote the smoothing parameters that minimize $\mathrm{MISE}(h, \lambda)$. Then we have

\[
\frac{\hat h_j}{h_j^0} \to 1 \quad \text{and} \quad \frac{\hat\lambda_s}{\lambda_s^0} \to 1 \ \ (\text{if } \lambda_s^0 \neq 0) \quad \text{or} \quad \hat\lambda_s \to 0 \ \ (\text{if } \lambda_s^0 = 0)
\]

in probability, for all $j = 1, \ldots, q$ and $s = 1, \ldots, r$.

The proof of Theorem 2 is given in Appendix B.

4. UNIQUENESS OF SMOOTHING PARAMETER VECTOR

Section 3 established that minimizing the cross-validation function is asymptotically equivalent to minimizing MISE. Hence, to investigate the asymptotic uniqueness of the cross-validated smoothing parameters, we only need to examine the uniqueness of the parameters minimizing the leading terms of MISE. When there are no discrete variables, our objective function is

\[
\inf_{Z \in \mathbb{R}^q_+,\ \|Z\| = 1} \; Z^T M Z + \frac{1}{n} A^T Z^{1/2}, \tag{4}
\]

where $Z = (h_1^2, \ldots, h_q^2)^T$, $Z^{1/2} = (h_1, \ldots, h_q)^T$, $M = \sum_{x^d \in S^d} \int B B^T dx^c$, and both $A$ and $B$ are of dimension $q \times 1$ (they are the first $q$ elements of the general mixed variable case). Based on the previous discussion, the optimal rates for $h_j$ and $\lambda_s$ are $n^{-1/3}$ and $n^{-2/3}$, respectively. Let $h_j = a_j n^{-1/3}$, for $j = 1, \ldots, q$. Substituting these parameters into Eq. (4), minimizing $Z^T M Z + (1/n) A^T Z^{1/2}$ is equivalent to minimizing $Z^T M Z + A^T Z^{1/2}$, where, with a slight abuse of notation, $Z = (a_1^2, \ldots, a_q^2)^T$ and $Z^{1/2} = (a_1, \ldots, a_q)^T$. When the estimation of CDF degenerates to the case with only continuous variables, we give the necessary and sufficient condition in the following theorem.

Theorem 3. Assume that $r = 0$, let $Z = (h_1^2, \ldots, h_q^2)^T$, and define $m = \inf_{Z \in \mathbb{R}^q_+,\ \|Z\| = 1} Z^T M Z$. Then $w(Z) = Z^T M Z + A^T Z^{1/2}$ has a unique minimizer $Z \in \mathbb{R}^q_+$ if and only if $m > 0$.

Proof. Our proof follows similar arguments as in Li and Zhou (2005). First we prove the "only if" part. Suppose $m = 0$ is attained at some $Z^{(0)} \in \mathbb{R}^q_+$ with $\|Z^{(0)}\| = 1$. Then there exists at least one component $Z_i^{(0)} \neq 0$, that is, $Z_i^{(0)} > 0$. So

\[
w(t Z^{(0)}) = t^2 (Z^{(0)})^T M Z^{(0)} + A^T \sqrt{t}\, (Z^{(0)})^{1/2} = A^T \sqrt{t}\, (Z^{(0)})^{1/2} \to -\infty, \quad \text{as } t \to +\infty.
\]

Note that the components of $A$ are negative, and $t Z^{(0)} \in \mathbb{R}^q_+$. This implies that $w$ has no minimizer.

Next we prove the "if" part. If $m > 0$, for any $Z \in \mathbb{R}^q_+$ with $\|Z\| = 1$, we have that $tZ \in \mathbb{R}^q_+$, $t > 0$. Then

\[
w(tZ) = t^2 Z^T M Z + \sqrt{t}\, A^T Z^{1/2} = \left( t \sqrt{Z^T M Z} - \frac{1}{2 \sqrt{Z^T M Z}} \right)^{2} + \left( \sqrt{t} + \frac{A^T Z^{1/2}}{2} \right)^{2} - \frac{1}{4 Z^T M Z} - \frac{(A^T Z^{1/2})^2}{4} \to +\infty,
\]

as $t \to +\infty$. For $R > 0$, denote $B_R = \{ Z \in \mathbb{R}^q_+ : \|Z\| \le R \}$. Since $w$ is a continuous function on $\mathbb{R}^q_+$, $B_R$ is a compact set, and $w(tZ) \to +\infty$ as $t \to +\infty$, there exists $R > 0$ such that minimizing $w(Z)$ over $Z \in \mathbb{R}^q_+$ is equivalent to minimizing $w(Z)$ over $Z \in B_R$.

From $w(tZ) = t^2 Z^T M Z + t^{1/2} A^T Z^{1/2}$, we know that $w(tZ)$ attains its minimum at $t = \bigl( -A^T Z^{1/2} / (4 Z^T M Z) \bigr)^{2/3} > 0$. So 0 is not the minimizer of $w$. Similarly, for $Z$ whose $i$th component is zero, we get that

\[
w\bigl(Z + t (0, \ldots, 1, \ldots, 0)^T\bigr) = Z^T M Z + 2 t Z^T M (0, \ldots, 1, \ldots, 0)^T + A^T Z^{1/2} + t^2 m_{ii} + A_i t^{1/2}
\]

cannot attain its minimum at $t = 0$. So $Z$ with $h_i^2 = 0$ cannot be the minimizer of $w$, which means that $w$ can only attain its minimum in the interior of $B_R$.

The Hessian matrix $H$ of $w$ is $H = \partial^2 w / (\partial Z\, \partial Z^T) = 2M + G$, where $G = -(1/4)\, \mathrm{diag}(c_1 z_1^{-3/2}, c_2 z_2^{-3/2}, \ldots, c_q z_q^{-3/2})$ is a diagonal matrix, with $c_i$ and $z_i$ denoting the $i$th components of $A$ and $Z$. Since $c_i < 0$, $G$ is positive definite in the interior of $B_R$. Also, $M$ is symmetric and positive semi-definite. So $H$ is positive definite in the interior of $B_R$. Therefore, $w$ has a unique minimizer in the interior of $B_R$. This completes the proof.

In general, our objective function is

\[
\inf_{Z \in \mathbb{R}^{q+r}_+,\ \|Z\| = 1} \; Z^T M Z + \frac{1}{n} A^T \tilde Z, \tag{5}
\]

where $Z = (h_1^2, \ldots, h_q^2, \lambda_1, \ldots, \lambda_r)^T$, $\tilde Z = (h_1, \ldots, h_q, \lambda_1, \ldots, \lambda_r)^T$, and $M = \sum_{x^d \in S^d} \int B B^T dx^c$, $B = (B_{11}, \ldots, B_{1q}, B_{21}, \ldots, B_{2r})^T$, and $A = \sum_{x^d \in S^d} \int (-A_{11}, \ldots, -A_{1q}, A_{21}, \ldots, A_{2r})^T dx^c$ are defined in Theorem 1. Substituting $h_j = a_j n^{-1/3}$, for $j = 1, \ldots, q$, and $\lambda_s = b_s n^{-2/3}$, for $s = 1, \ldots, r$, into Eq. (5), we have that Eq. (5) is equivalent to minimizing $Z^T M Z + A^T \tilde Z$ with respect to $Z = (a_1^2, \ldots, a_q^2, b_1, \ldots, b_r)^T$ and $\tilde Z = (a_1, \ldots, a_q, b_1, \ldots, b_r)^T$. A sufficient condition for uniqueness in the estimation of the CDF of mixed discrete and continuous variables is given as follows.
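The ray argument in the proof of Theorem 3 can be checked numerically. The sketch below compares the closed-form minimizer $t^* = (-A^T Z^{1/2} / (4 Z^T M Z))^{2/3}$ of $t \mapsto w(tZ)$ with a brute-force search. The matrix `M`, the vector `A` and the direction `Z` are hypothetical values chosen only for illustration; they are not estimated from any data.

```python
import numpy as np

def w(Z, M, A):
    """Objective w(Z) = Z'MZ + A'Z^{1/2} from Theorem 3."""
    return float(Z @ M @ Z + A @ np.sqrt(Z))

def ray_minimizer(Z, M, A):
    """Closed-form minimizer of t -> w(tZ), valid when A'Z^{1/2} < 0 < Z'MZ."""
    return float((-(A @ np.sqrt(Z)) / (4.0 * (Z @ M @ Z))) ** (2.0 / 3.0))

M = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical positive definite M
A = np.array([-1.0, -2.0])               # hypothetical A with negative components
Z = np.array([0.6, 0.8])                 # a direction with ||Z|| = 1

t_star = ray_minimizer(Z, M, A)
ts = np.linspace(1e-4, 5.0, 20001)
t_grid = ts[int(np.argmin([w(t * Z, M, A) for t in ts]))]
print(t_star, t_grid)
```

The two values agree up to the grid resolution, and the minimizing $t$ is strictly positive, which is the step used to rule out the origin as a minimizer.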

Theorem 4. Let $m = \inf_{Z \in \mathbb{R}^{q+r}_+,\ \|Z\| = 1} Z^T M Z$. If $m > 0$, then $w$ has a minimizer $Z \in \mathbb{R}^{q+r}_+$. If $M$ is positive definite, then the Hessian matrix $H$ of $w$ is positive definite at every point of $\mathbb{R}^{q+r}_+$. Thus, $w$ has a unique minimizer $Z \in \mathbb{R}^{q+r}_+$.

Proof. If $m > 0$, for any $Z \in \mathbb{R}^{q+r}_+$ with $\|Z\| = 1$, we have that $tZ \in \mathbb{R}^{q+r}_+$, $t > 0$. Using the notation $Z_{(1)} = (a_1^2, \ldots, a_q^2)^T$, $Z_{(1)}^{1/2} = (a_1, \ldots, a_q)^T$ and $Z_{(2)} = (b_1, \ldots, b_r)^T$, we have

\[
\begin{aligned}
w(tZ) &= t^2 Z^T M Z + \sqrt{t}\, A_1^T Z_{(1)}^{1/2} + t A_2^T Z_{(2)} \\
&= \left( t \sqrt{Z^T M Z} + \frac{A_2^T Z_{(2)} - 1}{2 \sqrt{Z^T M Z}} \right)^{2} + \left( \sqrt{t} + \frac{A_1^T Z_{(1)}^{1/2}}{2} \right)^{2} - \frac{(A_2^T Z_{(2)} - 1)^2}{4 Z^T M Z} - \frac{(A_1^T Z_{(1)}^{1/2})^2}{4} \to +\infty,
\end{aligned}
\]

as $t \to +\infty$, where $A_1 = (c_1, \ldots, c_q)^T$ and $A_2 = (c_{q+1}, \ldots, c_{q+r})^T$. For $R > 0$, denote $B_R = \{ Z \in \mathbb{R}^{q+r}_+ : \|Z\| \le R \}$. Since $w$ is a continuous function on $\mathbb{R}^{q+r}_+$, $B_R$ is a compact set, and $w(tZ) \to +\infty$ as $t \to +\infty$, there exists $R > 0$ such that minimizing $w(Z)$ over $Z \in \mathbb{R}^{q+r}_+$ is equivalent to minimizing $w(Z)$ over $Z \in B_R$. Therefore, $w$ has a minimizer $Z \in \mathbb{R}^{q+r}_+$.

The Hessian matrix $H$ of $w$ is

\[
H = \frac{\partial^2 w}{\partial Z\, \partial Z^T} = 2M + \begin{pmatrix} G & 0 \\ 0 & 0 \end{pmatrix}.
\]

If $M$ is positive definite, then $m > 0$, since $Z^T M Z > 0$ on the compact set $\{ Z : Z \in \mathbb{R}^{q+r}_+,\ \|Z\| = 1 \}$. Also, $H$ is positive definite at every point $Z \in \mathbb{R}^{q+r}_+$. Thus, $w$ has a unique minimizer $Z \in \mathbb{R}^{q+r}_+$. This completes the proof.

5. AN EMPIRICAL APPLICATION

Gan and Zhang (2006) presented a theory predicting that a large city tends to have a lower unemployment rate. Their empirical study applied US data on city population and average unemployment rate based upon a sample of 295 cities. The average unemployment rate, which is continuous, ranges from 2.4% to 19.6%. To get a categorical variable, we artificially stipulate that cities with a population of more than 200,000 are large cities, and the others are small cities. This classification gives 112 large cities and 183 small cities.

In Fig. 1, we plot the conditional CDF of the unemployment rate, which is calculated from our estimate of the joint CDF, for large and small cities. We use a Gaussian kernel for the unemployment rate. The cross-validated bandwidths for the continuous variable and categorical variable are 0.3470 and 0.0289, respectively (see Note 1). The conditional CDF estimate is consistent with the theory that large cities tend to have lower unemployment rates than small cities.

[Fig. 1. CDF Estimate of Unemployment Rate of Large and Small Cities. Horizontal axis: unemployment rate (0 to 20); vertical axis: estimated CDF (0 to 1); curves: small city, large city.]

The conditional CDF curve for large cities is above that of the small cities for the most part. Fig. 1 shows that, for most of the unemployment range, the distribution of unemployment rate for large cities stochastically dominates that of small cities.
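The paper does not spell out how the conditional CDF is obtained from the joint estimate. One natural route, sketched below purely as an assumption, is to difference the joint CDF across adjacent categories and normalise by the value at $x^c = +\infty$. The data here are synthetic (the city data are not reproduced), the bandwidth values 0.35 and 0.03 are illustrative round numbers, and the function names `joint_cdf` and `conditional_cdf` are hypothetical.

```python
import math
import numpy as np

def joint_cdf(xc, xd, Xc, Xd, h, lam, c):
    """Smoothed joint CDF estimate for one continuous and one discrete variable."""
    G = np.array([0.5 * (1.0 + math.erf((xc - v) / (h * math.sqrt(2.0)))) for v in Xc])
    n_below = xd + 1
    w = np.where(Xd <= xd,
                 (1.0 - lam) + (n_below - 1) * lam / (c - 1),
                 n_below * lam / (c - 1))
    return float(np.mean(G * w))

def conditional_cdf(xc, d, Xc, Xd, h, lam, c):
    """Estimate of P(X^c <= xc | X^d = d) by differencing the joint CDF in x^d."""
    def cell(x):
        lower = joint_cdf(x, d - 1, Xc, Xd, h, lam, c) if d > 0 else 0.0
        return joint_cdf(x, d, Xc, Xd, h, lam, c) - lower
    return cell(xc) / cell(math.inf)

# Synthetic illustration: category 1 has a lower mean than category 0.
rng = np.random.default_rng(1)
Xd = rng.integers(0, 2, size=300)
Xc = rng.normal(loc=6.0 - Xd, scale=2.0)
for d in (0, 1):
    print(d, conditional_cdf(6.0, d, Xc, Xd, h=0.35, lam=0.03, c=2))
```

With `lam` equal to zero and a very small `h`, the function reduces to the within-category empirical CDF; with `lam` greater than zero each category borrows a small amount of weight from the other, which is the smoothing the discrete kernel is designed to provide.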

6. CONCLUSION

We propose a consistent nonparametric kernel estimator of the joint unconditional CDF defined over a mix of discrete and continuous variables. A data-driven cross-validation method for selecting the smoothing parameters is examined. We show that it is asymptotically equivalent to minimizing the integrated MSE. The uniqueness condition of the cross-validation procedure is discussed. In view of the fact that many economic data sets involve both continuous and discrete variables, our proposed estimator should prove useful to applied researchers.

NOTE

1. For practical implementations of nonparametric econometrics, refer to Scott (1992) and Hayfield and Racine (2008).

ACKNOWLEDGMENTS

We thank the editors and two referees for their insightful comments, which helped us to improve our paper substantially. We also thank Dr. Qi Li, who led us to this fruitful field, for the intense discussion of this paper.

REFERENCES

Aitchison, J., & Aitken, C. G. G. (1976). Multivariate binary discrimination by the kernel method. Biometrika, 63(3), 413–420.
Anderson, G. (1996). Nonparametric tests of stochastic dominance in income distributions. Econometrica, 64(5), 1183–1193.
Barrett, G. F., & Donald, S. G. (2003). Consistent tests for stochastic dominance. Econometrica, 71(1), 71–104.
Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71(2), 353–360.
Bowman, A., Hall, P., & Prvan, T. (1998). Bandwidth selection for the smoothing of distribution functions. Biometrika, 85(4), 799–808.
Davidson, R., & Duclos, J.-Y. (2000). Statistical inference for stochastic dominance and for the measurement of poverty and inequality. Econometrica, 68(6), 1435–1464.
Gan, L., & Zhang, Q. (2006). The thick market effect on local unemployment rate fluctuations. Journal of Econometrics, 133(1), 127–152.
Grund, B. (1993). Kernel estimators for cell probabilities. Journal of Multivariate Analysis, 46(2), 283–308.
Grund, B., & Hall, P. (1993). On the performance of kernel estimators for high-dimensional, sparse binary data. Journal of Multivariate Analysis, 44(2), 321–344.
Hall, P. (1981). On nonparametric multivariate binary discrimination. Biometrika, 68(1), 287–294.
Hall, P., & Heyde, C. C. (1980). Martingale limit theory and its applications. New York, NY: Academic Press.
Härdle, W., & Marron, J. S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. The Annals of Statistics, 13(4), 1465–1481.
Hayfield, T., & Racine, J. S. (2008). Nonparametric econometrics: The np package. Journal of Statistical Software, 27(5), 1–32.
Li, Q., & Racine, J. S. (2003). Nonparametric estimation of distributions with categorical and continuous data. Journal of Multivariate Analysis, 86(2), 266–292.
Li, Q., & Racine, J. S. (2007). Nonparametric econometrics: Theory and practice. Princeton, NJ: Princeton University Press.
Li, Q., & Racine, J. S. (2008). Nonparametric estimation of conditional CDF and quantile functions with mixed categorical and continuous data. Journal of Business and Economic Statistics, 26(4), 423–434.
Li, Q., & Zhou, J. (2005). The uniqueness of cross-validation selected smoothing parameters in kernel estimation of nonparametric models. Econometric Theory, 21(5), 1017–1025.
Loader, C. R. (1999). Bandwidth selection: classical or plug-in? The Annals of Statistics, 27(2), 415–438.
Racine, J. S., & Li, Q. (2004). Nonparametric estimation of regression functions with both categorical and continuous data. Journal of Econometrics, 119(1), 99–130.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9(2), 65–78.
Scott, D. W. (1992). Multivariate density estimation: Theory, practice, and visualization. New York, NY: Wiley.
Wand, M. P., & Jones, M. C. (1995). Kernel smoothing. London: Chapman & Hall.

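APPENDIX A. PROOF OF THEOREM 1

Proof of Theorem 1. As is well known, $\mathrm{MSE}(\hat F(x)) = E[\hat F(x) - F(x)]^2 = [\mathrm{bias}(\hat F(x))]^2 + \mathrm{var}(\hat F(x))$. We will evaluate the terms $\mathrm{bias}(\hat F(x))$ and $\mathrm{var}(\hat F(x))$ separately. For simplicity, we use $dz$ and $dv$ to denote $dz_1 \cdots dz_q$ and $dv_1 \cdots dv_q$, respectively, throughout the appendices.

For the continuous variables, using the change of variables, integration by parts, and Taylor expansion, we have

\[
\begin{aligned}
E\left[ G\!\left( \frac{x^c - X_i^c}{h} \right) \right]
&= E\left[ \prod_{j=1}^q G\!\left( \frac{x_j^c - X_{ij}^c}{h_j} \right) \right]
= \int \left[ \prod_{j=1}^q G\!\left( \frac{x_j^c - z_j}{h_j} \right) \right] f(z_1, z_2, \ldots, z_q)\, dz \\
&= h_1 h_2 \cdots h_q \int \left[ \prod_{j=1}^q G(v_j) \right] f(x_1^c - h_1 v_1, x_2^c - h_2 v_2, \ldots, x_q^c - h_q v_q)\, dv \\
&= \int \left[ \prod_{j=1}^q k(v_j) \right] F(x_1^c - h_1 v_1, x_2^c - h_2 v_2, \ldots, x_q^c - h_q v_q)\, dv \\
&= \int \left[ \prod_{j=1}^q k(v_j) \right] \left\{ F(x^c) - \sum_{j=1}^q F_j^{(1)}(x^c) h_j v_j + \frac{1}{2} \sum_{i,j=1}^q F_{ij}^{(2)}(x^c) h_i h_j v_i v_j \right\} dv + O\!\left( \sum_{j=1}^q h_j^3 \right) \\
&= F(x^c) + \frac{\kappa_2}{2} \sum_{j=1}^q F_{jj}^{(2)}(x^c) h_j^2 + O\!\left( \sum_{j=1}^q h_j^3 \right),
\end{aligned} \tag{A.1}
\]

where $\kappa_2 = \int v^2 k(v)\, dv$, and

\[
\begin{aligned}
E\left[ G^2\!\left( \frac{x^c - X_i^c}{h} \right) \right]
&= E\left[ \prod_{j=1}^q G^2\!\left( \frac{x_j^c - X_{ij}^c}{h_j} \right) \right]
= \int \left[ \prod_{j=1}^q G^2\!\left( \frac{x_j^c - z_j}{h_j} \right) \right] f(z_1, z_2, \ldots, z_q)\, dz \\
&= h_1 h_2 \cdots h_q \int \left[ \prod_{j=1}^q G^2(v_j) \right] f(x_1^c - h_1 v_1, x_2^c - h_2 v_2, \ldots, x_q^c - h_q v_q)\, dv \\
&= 2^q \int \left[ \prod_{j=1}^q G(v_j) \right] \left[ \prod_{j=1}^q k(v_j) \right] F(x_1^c - h_1 v_1, x_2^c - h_2 v_2, \ldots, x_q^c - h_q v_q)\, dv \\
&= 2^q \int \left[ \prod_{j=1}^q G(v_j) \right] \left[ \prod_{j=1}^q k(v_j) \right] \left\{ F(x^c) - \sum_{j=1}^q F_j^{(1)}(x^c) h_j v_j \right\} dv + O\!\left( \sum_{j=1}^q h_j^2 \right) \\
&= F(x^c) - a_0 \sum_{j=1}^q F_j^{(1)}(x^c) h_j + O\!\left( \sum_{j=1}^q h_j^2 \right),
\end{aligned} \tag{A.2}
\]

where $a_0 = 2 \int v\, G(v) k(v)\, dv$.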

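For the discrete variables, we have

\[
\begin{aligned}
L(z^d, u, \lambda) &= \prod_{s=1}^r l(z_s^d, u_s, \lambda_s) = \prod_{s=1}^r (1 - \lambda_s)^{1(z_s^d = u_s)} \left( \frac{\lambda_s}{c_s - 1} \right)^{1(z_s^d \neq u_s)} \\
&= \left( \prod_{s=1}^r (1 - \lambda_s) \right) 1(z^d = u) + \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \left( \prod_{t \neq s}^{r} (1 - \lambda_t) \right) 1_s(z^d, u) + O\!\left( \sum_{s=1}^r \lambda_s^2 \right) \\
&= \left( 1 - \sum_{s=1}^r \lambda_s \right) 1(z^d = u) + \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} 1_s(z^d, u) + O\!\left( \sum_{s=1}^r \lambda_s^2 \right),
\end{aligned} \tag{A.3}
\]

where $1(z^d = u)$ and $1_s(z^d, u)$ are indicator functions; $1_s(z^d, u)$ indicates that $z^d$ and $u$ differ only in the $s$th component. Note that if $z^d$ and $u$ differ in more than one component, $L(z^d, u, \lambda) = O(\sum_{s=1}^r \lambda_s^2)$. From (A.3), it is easy to obtain:

\[
\begin{aligned}
\left[ \sum_{u \le x^d} L(z^d, u, \lambda) \right]^2
&= \left[ \left( 1 - \sum_{s=1}^r \lambda_s \right) \sum_{u \le x^d} 1(z^d = u) + \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \sum_{u \le x^d} 1_s(z^d, u) + O\!\left( \sum_{s=1}^r \lambda_s^2 \right) \right]^2 \\
&= \left( 1 - \sum_{s=1}^r \lambda_s \right)^{2} \left[ \sum_{u \le x^d} 1(z^d = u) \right] + 2 \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \left[ \sum_{u, v \le x^d} 1(z^d = u)\, 1_s(z^d, v) \right] + O\!\left( \sum_{s=1}^r \lambda_s^2 \right) \\
&= \left( 1 - 2 \sum_{s=1}^r \lambda_s \right) \left[ \sum_{u \le x^d} 1(z^d = u) \right] + 2 \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \left[ \sum_{u \le x^d} \sum_{v \le x^d,\, v \neq u} 1(z^d = u)\, 1_s(u, v) \right] + O\!\left( \sum_{s=1}^r \lambda_s^2 \right).
\end{aligned} \tag{A.4}
\]

Here and in the following, for any two vectors $x, y \in \mathbb{R}^r$, $x \le y$ denotes $x_i \le y_i$ for all $i = 1, \ldots, r$, where $x_i$ and $y_i$ are the $i$th components of $x$ and $y$, respectively.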

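We use $f(x^c \mid x^d)$ and $F(x^c \mid x^d)$ to denote the conditional density function and conditional CDF of $X$, respectively. Then,

\[
f(x^c, x^d) = f(x^c \mid x^d)\, p(x^d), \tag{A.5}
\]

\[
F(x^c, x^d) = \sum_{z^d \in S^d,\ z^d \le x^d} F(x^c \mid z^d)\, p(z^d). \tag{A.6}
\]

With (A.5) and (A.6), we can calculate $E[G(\cdot) \sum L(\cdot)]$ in two steps. First, integrate the integrand with respect to $x^c$ conditional on $x^d$ and then take the summation with respect to $x^d$. Thus,

\[
\begin{aligned}
E[\hat F(x^c, x^d)] &= E\left[ G\!\left( \frac{x^c - X_i^c}{h} \right) \sum_{u \le x^d} L(X_i^d, u, \lambda) \right] \\
&= \sum_{z^d \in S^d} \int G\!\left( \frac{x^c - z^c}{h} \right) \sum_{u \le x^d} L(z^d, u, \lambda)\, f(z^c \mid z^d)\, p(z^d)\, dz^c \\
&= \sum_{z^d \in S^d} \left( \sum_{u \le x^d} L(z^d, u, \lambda) \right) p(z^d) \int G\!\left( \frac{x^c - z^c}{h} \right) f(z^c \mid z^d)\, dz^c \\
&= \sum_{z^d \in S^d} \left[ \left( F(x^c \mid z^d) + \frac{\kappa_2}{2} \sum_{j=1}^q F_{jj}^{(2)}(x^c \mid z^d) h_j^2 + O\!\left( \sum_{j=1}^q h_j^3 \right) \right) \right. \\
&\qquad \left. \times \left( \left( 1 - \sum_{s=1}^r \lambda_s \right) \sum_{u \le x^d} 1(z^d = u) + \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \sum_{u \le x^d} 1_s(z^d, u) + O\!\left( \sum_{s=1}^r \lambda_s^2 \right) \right) \right] p(z^d) \\
&= F(x^c, x^d) + \sum_{s=1}^r \left[ \frac{1}{c_s - 1} \sum_{z^d \in S^d} \sum_{u \le x^d} 1_s(z^d, u) F(x^c \mid z^d) p(z^d) - F(x^c, x^d) \right] \lambda_s \\
&\qquad + \sum_{j=1}^q \left[ \frac{\kappa_2}{2} F_{jj}^{(2)}(x^c, x^d) \right] h_j^2 + O\!\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right) \\
&= F(x^c, x^d) + \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s + O\!\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right),
\end{aligned} \tag{A.7}
\]

where $B_{1j} = (\kappa_2/2) F_{jj}^{(2)}(x^c, x^d)$ and $B_{2s} = (1/(c_s - 1)) \sum_{z^d \in S^d} \sum_{u \le x^d} 1_s(z^d, u) F(x^c \mid z^d) p(z^d) - F(x^c, x^d)$.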
So we obtain

\[
\mathrm{bias}(\hat F(x^c, x^d)) = \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s + O\!\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right). \tag{A.8}
\]

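Similarly, combining (A.2) and (A.4), we have

\[
\begin{aligned}
E&\left[ G^2\!\left( \frac{x^c - X_i^c}{h} \right) \left[ \sum_{u \le x^d} L(X_i^d, u, \lambda) \right]^2 \right] \\
&= \sum_{z^d \in S^d} \int G^2\!\left( \frac{x^c - z^c}{h} \right) \left[ \sum_{u \le x^d} L(z^d, u, \lambda) \right]^2 f(z^c \mid z^d)\, p(z^d)\, dz^c \\
&= \sum_{z^d \in S^d} \left[ F(x^c \mid z^d) - a_0 \sum_{j=1}^q F_j^{(1)}(x^c \mid z^d) h_j + O\!\left( \sum_{j=1}^q h_j^2 \right) \right] \\
&\qquad \times \left[ \left( 1 - 2 \sum_{s=1}^r \lambda_s \right) \sum_{u \le x^d} 1(z^d = u) + 2 \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \sum_{u \le x^d} \sum_{v \le x^d,\, v \neq u} 1(z^d = u)\, 1_s(u, v) + O\!\left( \sum_{s=1}^r \lambda_s^2 \right) \right] p(z^d) \\
&= F(x^c, x^d) - a_0 \sum_{j=1}^q F_j^{(1)}(x^c, x^d) h_j + \sum_{s=1}^r \left[ \frac{2}{c_s - 1} \sum_{u \le x^d} \sum_{v \le x^d,\, v \neq u} 1_s(u, v) F(x^c \mid u) p(u) - 2 F(x^c, x^d) \right] \lambda_s \\
&\qquad + O\!\left( \sum_{j=1}^q h_j^2 + \sum_{s=1}^r \lambda_s^2 \right) \\
&= F(x^c, x^d) - \sum_{j=1}^q A_{1j} h_j + \sum_{s=1}^r C_{2s} \lambda_s + O\!\left( \sum_{j=1}^q h_j^2 + \sum_{s=1}^r \lambda_s^2 \right),
\end{aligned}
\]

where $A_{1j} = a_0 F_j^{(1)}(x^c, x^d)$ and $C_{2s} = (2/(c_s - 1)) \sum_{u \le x^d} \sum_{v \le x^d,\, v \neq u} 1_s(u, v) F(x^c \mid u) p(u) - 2 F(x^c, x^d)$.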

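Hence,

\[
\begin{aligned}
\mathrm{var}[\hat F(x^c, x^d)] &= \frac{1}{n} \mathrm{var}\left[ G\!\left( \frac{x^c - X_i^c}{h} \right) \sum_{u \le x^d} L(X_i^d, u, \lambda) \right] \\
&= \frac{1}{n} \left\{ E\left[ G^2\!\left( \frac{x^c - X_i^c}{h} \right) \left( \sum_{u \le x^d} L(X_i^d, u, \lambda) \right)^{2} \right] - \left[ E\left[ G\!\left( \frac{x^c - X_i^c}{h} \right) \sum_{u \le x^d} L(X_i^d, u, \lambda) \right] \right]^2 \right\} \\
&= \frac{1}{n} \left[ F(x^c, x^d) - \sum_{j=1}^q A_{1j} h_j + \sum_{s=1}^r C_{2s} \lambda_s + O\!\left( \sum_{j=1}^q h_j^2 + \sum_{s=1}^r \lambda_s^2 \right) \right. \\
&\qquad \left. - \left( F(x^c, x^d) + \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s + O\!\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right) \right)^{2} \right] \\
&= \frac{1}{n} F(x^c, x^d)\bigl(1 - F(x^c, x^d)\bigr) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r \bigl( C_{2s} - 2 F(x^c, x^d) B_{2s} \bigr) \frac{\lambda_s}{n} + O\!\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 \right) \\
&= \frac{1}{n} F(x^c, x^d)\bigl(1 - F(x^c, x^d)\bigr) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + O\!\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 \right),
\end{aligned} \tag{A.9}
\]

where $A_{2s} = C_{2s} - 2 F(x^c, x^d) B_{2s}$.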

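Using (A.8) and (A.9), we have

\[
\begin{aligned}
\mathrm{MSE}(\hat F(x^c, x^d)) &= [\mathrm{bias}(\hat F(x^c, x^d))]^2 + \mathrm{var}(\hat F(x^c, x^d)) \\
&= \left( \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s + O\!\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right) \right)^{2} \\
&\qquad + \frac{1}{n} F(x^c, x^d)\bigl(1 - F(x^c, x^d)\bigr) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + O\!\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 \right) \\
&= \frac{1}{n} F(x^c, x^d)\bigl(1 - F(x^c, x^d)\bigr) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + \left( \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s \right)^{2} \\
&\qquad + O\!\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right).
\end{aligned}
\]

Thus, we obtain

\[
\begin{aligned}
\mathrm{MISE}(\hat F(x^c, x^d)) &= \sum_{x^d \in S^d} \int \mathrm{MSE}(\hat F(x^c, x^d))\, dx^c \\
&= \sum_{x^d \in S^d} \int \left[ \frac{1}{n} F(x^c, x^d)\bigl(1 - F(x^c, x^d)\bigr) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + \left( \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s \right)^{2} \right] dx^c \\
&\qquad + O\!\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right) \\
&= Z^T \left( \sum_{x^d \in S^d} \int B B^T dx^c \right) Z + \frac{1}{n} A^T \tilde Z + \frac{1}{n} \sum_{x^d \in S^d} \int F(x^c, x^d)\bigl(1 - F(x^c, x^d)\bigr)\, dx^c \\
&\qquad + O\!\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right),
\end{aligned}
\]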

310

GAOSHENG JU ET AL.

T where Z ¼ ðh21 ; . . . ; h2q ; l1 ; . . . ; lr ÞTP ; Z~ ¼ ðh R 1 ; . . . ; hq ; l1 ; . . . ; lr Þ ; B ¼ T ðB11 ; . . . ; B1q ; B21 ; . . . ; B2r Þ and A ¼ xd 2Sd ðA11 ; . . . ; A1q ; A21 ; . . . ; A2r ÞT dxc . P Let W i ¼ Gððxc X ci Þ=hÞ uxd LðX di ; u; lÞ. From (A.8), (A.9), and condition (C3), we have

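$$
\begin{aligned}
\sqrt{n} \Big( \hat F(x^c, x^d) - F(x^c, x^d) - \sum_{j=1}^{q} B_{1j} h_j^2 - \sum_{s=1}^{r} B_{2s} \lambda_s \Big)
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big[ W_i - F(x^c, x^d) - \sum_{j=1}^{q} B_{1j} h_j^2 - \sum_{s=1}^{r} B_{2s} \lambda_s \Big] \\
&= \frac{1}{\sqrt{n}} \sum_{i=1}^{n} [W_i - E(W_i)] + \sqrt{n}\, O_p\Big( \sum_{j=1}^{q} h_j^3 + \sum_{s=1}^{r} \lambda_s^2 \Big) \\
&\xrightarrow{d} N\big(0, F(x^c, x^d)(1 - F(x^c, x^d))\big)
\end{aligned}
$$

by Lyapunov's central limit theorem and $\operatorname{var}\big( (1/\sqrt{n}) \sum_{i=1}^{n} [W_i - E(W_i)] \big) \to F(x^c, x^d)(1 - F(x^c, x^d))$. This completes the proof of Theorem 1.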

APPENDIX B. PROOF OF THEOREM 2

Proof of Theorem 2. Recall that

$$
CV(h, \lambda) = \frac{1}{n} \sum_{i=1}^{n} \Big[ \sum_{x^d \in S^d} \int \big( I(x^c, X_i^c) I(x^d, X_i^d) - \hat F_{-i}(x^c, x^d) \big)^2 dx^c \Big],
$$

where $\hat F_{-i}(x^c, x^d) = (1/(n-1)) \sum_{j \neq i} G((x^c - X_j^c)/h) \sum_{u \le x^d} L(X_j^d, u, \lambda)$, $I(x^c, X_i^c) = \mathbf{1}(X_i^c \le x^c)$, and $I(x^d, X_i^d) = \mathbf{1}(X_i^d \le x^d)$.

Let $I_i \equiv I(x, X_i) = I(x^c, X_i^c) I(x^d, X_i^d)$ and

$$
H = CV(h, \lambda) - \frac{1}{n} \sum_{i=1}^{n} \sum_{x^d \in S^d} \int [I(x, X_i) - F(x^c, x^d)]^2 dx^c.
$$

For simplicity, we use $\hat F_{-i}$ and $F$ to denote $\hat F_{-i}(x^c, x^d)$ and $F(x^c, x^d)$, respectively, throughout this appendix.
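The leave-one-out criterion CV(h, λ) can be evaluated numerically along the following lines. This is only a minimal sketch for one continuous and one discrete variable, under assumptions that are not taken from this section: G is the standard normal CDF, the discrete kernel is L = 1 − λ when u equals the observation and λ/(c − 1) otherwise on the support {0, ..., c − 1}, and the integral over x^c is replaced by a trapezoidal rule on a finite grid. The names `G`, `L_sum`, `cv`, and `grid` are hypothetical.

```python
import math
import numpy as np

def G(z):
    # Integrated kernel: standard normal CDF (an assumed choice of G).
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def L_sum(Xd, xd, lam, c):
    # sum over u <= xd of L(X_i^d, u, lam) for one discrete variable on {0,...,c-1},
    # assuming L = 1 - lam if u == X_i^d and lam / (c - 1) otherwise.
    hit = (Xd <= xd).astype(float)   # 1 if X_i^d itself lies in {u <= xd}
    n_le = xd + 1                    # number of support points u <= xd
    return hit * (1.0 - lam) + (n_le - hit) * lam / (c - 1)

def cv(h, lam, Xc, Xd, c, grid):
    # Leave-one-out CV(h, lam); the integral over x^c is approximated by a
    # trapezoidal rule on `grid`, so the value depends on the grid chosen.
    n = len(Xc)
    dx = np.diff(grid)
    total = 0.0
    for xd in range(c):
        # W[i, k] = G((grid_k - Xc_i) / h) * sum_{u <= xd} L(Xd_i, u, lam)
        W = G((grid[None, :] - Xc[:, None]) / h) * L_sum(Xd, xd, lam, c)[:, None]
        I = ((Xc[:, None] <= grid[None, :]) & (Xd[:, None] <= xd)).astype(float)
        F_loo = (W.sum(axis=0, keepdims=True) - W) / (n - 1)   # leave-one-out estimate
        sq = (I - F_loo) ** 2
        total += (0.5 * (sq[:, 1:] + sq[:, :-1]) * dx).sum()
    return total / n

# Simulated data, purely for illustration.
rng = np.random.default_rng(0)
Xd = rng.integers(0, 3, size=40)
Xc = rng.normal(size=40) + Xd
grid = np.linspace(-5.0, 8.0, 261)
value = cv(0.3, 0.1, Xc, Xd, 3, grid)
```

In practice one would minimize `cv` over (h, λ) with a numerical optimizer; the grid range and resolution are tuning choices of this sketch, not part of the criterion itself.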

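Then we have

$$
\begin{aligned}
nH &= \sum_i \sum_{x^d \in S^d} \int \big[ (I_i - \hat F_{-i})^2 - (I_i - F)^2 \big] dx^c \\
&= \sum_i \sum_{x^d \in S^d} \int \big\{ (\hat F_{-i} - F)^2 - 2 (I_i - F)(\hat F_{-i} - F) \big\} dx^c \\
&= \sum_i \sum_{x^d \in S^d} \int (\hat F_{-i} - F)^2 dx^c - 2 \sum_i \sum_{x^d \in S^d} \int (I_i - F)(\hat F_{-i} - F)\, dx^c \\
&\equiv S_1 - 2 S_2.
\end{aligned}
\tag{B.1}
$$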

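Let $D_i = G((x^c - X_i^c)/h) \sum_{u \le x^d} L(X_i^d, u, \lambda) - F(x^c, x^d)$ and $D_i^0 = I_i - F$. Then

$$
\begin{aligned}
S_1 &= \sum_i \sum_{x^d \in S^d} \int (\hat F_{-i} - F)^2 dx^c
= \sum_i \sum_{x^d \in S^d} \int \Big[ \frac{n}{n-1} (\hat F - F) - \frac{1}{n-1} D_i \Big]^2 dx^c \\
&= \frac{n^3}{(n-1)^2} \sum_{x^d \in S^d} \int (\hat F - F)^2 dx^c - \frac{2n}{(n-1)^2} \sum_i \sum_{x^d \in S^d} \int (\hat F - F) D_i\, dx^c + \frac{1}{(n-1)^2} \sum_i \sum_{x^d \in S^d} \int D_i^2\, dx^c \\
&= \frac{n^3 - 2n^2}{(n-1)^2} \sum_{x^d \in S^d} \int (\hat F - F)^2 dx^c + \frac{1}{(n-1)^2} \sum_i \sum_{x^d \in S^d} \int D_i^2\, dx^c
\end{aligned}
\tag{B.2}
$$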
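The algebra behind (B.2) rests on two pointwise facts: the leave-one-out identity $\hat F_{-i} - F = \frac{n}{n-1}(\hat F - F) - \frac{1}{n-1} D_i$ and $\sum_i D_i = n(\hat F - F)$. The following sketch checks both numerically at a single fixed point, using arbitrary stand-in values for $W_i = D_i + F$ and a hypothetical value of $F$; it verifies the algebra only, not the statistical claims.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 7
F = 0.4                        # hypothetical value of F at a fixed point
W = rng.uniform(size=n)        # stand-ins for W_i at that point
F_hat = W.mean()               # \hat F = (1/n) sum_i W_i
D = W - F                      # D_i = W_i - F
F_loo = (W.sum() - W) / (n - 1)   # leave-one-out estimates \hat F_{-i}

# Leave-one-out identity used in the first line of (B.2).
lhs = F_loo - F
rhs = n / (n - 1) * (F_hat - F) - D / (n - 1)
identity_holds = np.allclose(lhs, rhs)

# Pointwise version of the last line of (B.2), before summing over x^d and integrating.
s1_direct = np.sum(lhs ** 2)
s1_closed = (n**3 - 2 * n**2) / (n - 1)**2 * (F_hat - F)**2 + np.sum(D**2) / (n - 1)**2
```

Both `identity_holds` and the agreement of `s1_direct` with `s1_closed` follow from expanding the square and using the sum of the $D_i$.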

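and

$$
\begin{aligned}
S_2 &= \sum_i \sum_{x^d \in S^d} \int (I_i - F)(\hat F_{-i} - F)\, dx^c
= \sum_i \sum_{x^d \in S^d} \int (I_i - F) \Big[ \frac{n}{n-1} (\hat F - F) - \frac{1}{n-1} D_i \Big] dx^c \\
&= \frac{n^2}{n-1} \sum_{x^d \in S^d} \int \Big[ \frac{1}{n} \sum_i I_i - F \Big] (\hat F - F)\, dx^c - \frac{1}{n-1} \sum_i \sum_{x^d \in S^d} \int (I_i - F) D_i\, dx^c \\
&= \frac{n^2}{n-1} \sum_{x^d \in S^d} \int (F_n - F)(\hat F - F)\, dx^c - \frac{1}{n-1} \sum_i \sum_{x^d \in S^d} \int D_i D_i^0\, dx^c
\end{aligned}
\tag{B.3}
$$

by noting that $F_n \equiv F_n(x^c, x^d) = (1/n) \sum_{i=1}^{n} I(x^c, X_i^c) I(x^d, X_i^d) \equiv (1/n) \sum_i I_i$.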


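Combining (B.1), (B.2), and (B.3), we have

$$
\begin{aligned}
H &= \frac{1}{n} [S_1 - 2 S_2] \\
&= \Big( 1 - \frac{1}{(n-1)^2} \Big) \sum_{x^d \in S^d} \int (\hat F - F)^2 dx^c + \frac{1}{n (n-1)^2} \sum_i \sum_{x^d \in S^d} \int D_i^2\, dx^c \\
&\quad - 2 \Big( 1 + \frac{1}{n-1} \Big) \sum_{x^d \in S^d} \int (F_n - F)(\hat F - F)\, dx^c + \frac{2}{n (n-1)} \sum_i \sum_{x^d \in S^d} \int D_i D_i^0\, dx^c.
\end{aligned}
\tag{B.4}
$$

Let $m(h, \lambda) = \sum_{x^d \in S^d} \int E(D_i D_i^0)\, dx^c$. Using Lemma B.1 and (B.4), we have that

$$
\begin{aligned}
H + \sum_{x^d \in S^d} \int (F_n - F)^2 dx^c
&= \sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c - \frac{1}{(n-1)^2} \sum_{x^d \in S^d} \int (\hat F - F)^2 dx^c - \frac{2}{n-1} \sum_{x^d \in S^d} \int (F_n - F)(\hat F - F)\, dx^c \\
&\quad + \frac{1}{n (n-1)^2} \sum_i \sum_{x^d \in S^d} \int D_i^2\, dx^c + \frac{2}{n (n-1)} \sum_i \sum_{x^d \in S^d} \int D_i D_i^0\, dx^c \\
&= \sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c + \frac{2}{n-1} m(h, \lambda) + O_p\Big( n^{-3/2} + n^{-1} \sum_{j=1}^{q} h_j^4 + n^{-1} \sum_{s=1}^{r} \lambda_s^2 \Big).
\end{aligned}
\tag{B.5}
$$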

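Recall that $W_i = G((x^c - X_i^c)/h) \sum_{u \le x^d} L(X_i^d, u, \lambda)$. We have that

$$
\begin{aligned}
\sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c
&= \sum_{x^d \in S^d} \int \Big( \frac{1}{n} \sum_{i=1}^{n} W_i - \frac{1}{n} \sum_{i=1}^{n} I_i \Big)^2 dx^c \\
&= \frac{1}{n^2} \sum\sum_{i \neq j} \sum_{x^d \in S^d} \int (W_i - I_i)(W_j - I_j)\, dx^c + \frac{1}{n^2} \sum_i \sum_{x^d \in S^d} \int (W_i - I_i)^2 dx^c \\
&= \frac{1}{n^2} \sum\sum_{i \neq j} g(X_i, X_j) + \frac{1}{n^2} \sum_{i=1}^{n} g(X_i, X_i) \equiv S + T,
\end{aligned}
\tag{B.6}
$$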

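where the definitions of $S$ and $T$ are obvious, and $g(X_i, X_j) = \sum_{x^d \in S^d} \int (W_i - I_i)(W_j - I_j)\, dx^c$. We can see that $S$ is a second-order U-statistic. Define $g_1(x) = E[g(x, X_1)]$ and $g_0 = E[g_1(X_1)]$; then $g_1(X_i) = E[g(X_i, X_j) \mid X_i]$ and $g_1(X_j) = E[g(X_i, X_j) \mid X_j]$ if $i \neq j$. Using the Hoeffding decomposition, we have

$$
\begin{aligned}
S &= n^{-2} \sum\sum_{i \neq j} g(X_i, X_j) \\
&= n^{-2} \sum\sum_{i \neq j} \{ g(X_i, X_j) - g_1(X_i) - g_1(X_j) + g_0 \} + 2 \cdot \frac{1}{n} \Big( 1 - \frac{1}{n} \Big) \sum_{i=1}^{n} \{ g_1(X_i) - g_0 \} + \Big( 1 - \frac{1}{n} \Big) g_0 \\
&\equiv S^{(1)} + S^{(2)} + \Big( 1 - \frac{1}{n} \Big) g_0,
\end{aligned}
\tag{B.7}
$$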
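The decomposition (B.7) is an algebraic identity, which can be checked numerically on a toy example. The sketch below does not use the paper's $g$; it assumes the hypothetical symmetric kernel $g(x, y) = xy$ with $X_i$ uniform on $(0, 1)$, for which $g_1(x) = x/2$ and $g_0 = 1/4$ are known exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = rng.uniform(size=n)           # toy sample, X_i ~ U(0, 1)
mu = 0.5                          # E[X_1]

g = np.outer(X, X)                # g(X_i, X_j) = X_i * X_j (toy kernel)
g1 = mu * X                       # g_1(x) = E[g(x, X_1)] = x * mu
g0 = mu ** 2                      # g_0 = E[g_1(X_1)]

off_diag = ~np.eye(n, dtype=bool)   # selects the pairs with i != j

S_total = g[off_diag].sum() / n**2
S_1 = (g - g1[:, None] - g1[None, :] + g0)[off_diag].sum() / n**2   # degenerate part
S_2 = 2.0 / n * (1.0 - 1.0 / n) * (g1 - g0).sum()                   # projection part
remainder = (1.0 - 1.0 / n) * g0
```

Here `S_total` equals `S_1 + S_2 + remainder` up to floating-point error, mirroring the three terms on the right-hand side of (B.7).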

where the definitions of $S^{(1)}$ and $S^{(2)}$ are obvious. Then by the law of iterated expectations, we have

$$
E(S^{(1)}) = n^{-2} \sum\sum_{i \neq j} \big[ E(g(X_i, X_j)) - E(g_1(X_i)) - E(g_1(X_j)) + g_0 \big] = 0,
\tag{B.8}
$$

$$
E(S^{(2)}) = 2 n^{-1} (1 - n^{-1}) \sum_{i=1}^{n} \big( E(g_1(X_i)) - g_0 \big) = 0.
\tag{B.9}
$$

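Also, it is easy to see that $E[S^{(1)} \mid X_i] = 0$ for all $i = 1, \ldots, n$ and $E[S^{(2)} \mid X_j] = 0$ for $j \neq i$, since $X_i$ and $X_j$ are independent. Thus, we have

$$
\begin{aligned}
E(S^{(1)})^2 &= E\Big[ n^{-2} \sum\sum_{i \neq j} \big( g(X_i, X_j) - g_1(X_i) - g_1(X_j) + g_0 \big) \Big]^2 \\
&= n^{-4} \sum\sum_{i \neq j} E\big( g(X_i, X_j) - g_1(X_i) - g_1(X_j) + g_0 \big)^2
\end{aligned}
\tag{B.10}
$$

and

$$
\begin{aligned}
E(S^{(2)})^2 &= E\Big[ 2 n^{-1} (1 - n^{-1}) \sum_{i=1}^{n} (g_1(X_i) - g_0) \Big]^2 \\
&= \frac{4 (n-1)^2}{n^4} \sum_{i=1}^{n} E[g_1(X_i) - g_0]^2.
\end{aligned}
\tag{B.11}
$$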


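From Lemma B.2 and (B.10), (B.11), we have

$$
\begin{aligned}
E(S^{(1)})^2 &= O\Big( \frac{1}{n^4} (n^2 - n) \big( E(g(X_i, X_j))^2 + E(g_1(X_1))^2 + g_0^2 \big) \Big) \\
&= O\Big( n^{-2} \Big( \sum_{j=1}^{q} h_j^{3q} + \sum_{j=1}^{q} h_j^{2q+4} + \sum_{j=1}^{q} h_j^8 + \sum_{s=1}^{r} \lambda_s^4 \Big) \Big)
\end{aligned}
$$

and

$$
\begin{aligned}
E(S^{(2)})^2 &= \frac{4 (n-1)^2}{n^4} \sum_{i=1}^{n} E[g_1(X_i) - g_0]^2 \\
&= O\Big( n^{-1} \Big( \sum_{j=1}^{q} h_j^{2q+4} + \sum_{j=1}^{q} h_j^8 + \sum_{s=1}^{r} \lambda_s^4 \Big) \Big).
\end{aligned}
$$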

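Also, $E[g(X_1, X_1)]^2 = E\big[ \sum_{x^d \in S^d} \int (W_1 - I_1)^2 dx^c \big]^2 = O(1)$ implies $\operatorname{Var}(T) = \operatorname{Var}\big( n^{-2} \sum_i g(X_i, X_i) \big) = (1/n^3) \operatorname{Var}(g(X_i, X_i)) = O(n^{-3})$.

Combining (B.6) and (B.7), we have $\sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c = S + T = S^{(1)} + S^{(2)} + (1 - n^{-1}) g_0 + T$. With (B.8) and (B.9), we have

$$
E\Big[ \sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c \Big] = E(S^{(1)}) + E(S^{(2)}) + (1 - n^{-1}) g_0 + E(T) = (1 - n^{-1}) g_0 + E(T).
\tag{B.12}
$$

Using (B.10), (B.11), and (B.12), we can see that

$$
\begin{aligned}
E\Big( \sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c - (1 - n^{-1}) g_0 - E(T) \Big)^2
&= E\big[ S + T - (1 - n^{-1}) g_0 - E(T) \big]^2 \\
&= E\big[ S^{(1)} + S^{(2)} + (T - E(T)) \big]^2 \\
&= O\big( E(S^{(1)})^2 + E(S^{(2)})^2 + \operatorname{Var}(T) \big) \\
&= O\Big( n^{-3} + n^{-2} \sum_{j=1}^{q} h_j^{3q} + n^{-1} \sum_{j=1}^{q} h_j^{2q+4} + n^{-1} \sum_{j=1}^{q} h_j^8 + n^{-1} \sum_{s=1}^{r} \lambda_s^4 \Big).
\end{aligned}
$$

Hence,

$$
\begin{aligned}
\sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c &= E(T) + (1 - n^{-1}) g_0 \\
&\quad + O_p\Big( n^{-3/2} + n^{-1} \sum_{j=1}^{q} h_j^{3q/2} + n^{-1/2} \sum_{j=1}^{q} h_j^{q+2} + n^{-1/2} \sum_{j=1}^{q} h_j^4 + n^{-1/2} \sum_{s=1}^{r} \lambda_s^2 \Big).
\end{aligned}
\tag{B.13}
$$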

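Combining (B.5), (B.12), and (B.13), we have

$$
\begin{aligned}
H + \sum_{x^d \in S^d} \int (F_n - F)^2 dx^c
&= E(T) + (1 - n^{-1}) g_0 + 2 (n-1)^{-1} m(h, \lambda) \\
&\quad + O_p\Big( n^{-3/2} + n^{-1} \sum_{j=1}^{q} h_j^{3q/2} + n^{-1/2} \sum_{j=1}^{q} h_j^{q+2} + n^{-1/2} \sum_{j=1}^{q} h_j^4 + n^{-1/2} \sum_{s=1}^{r} \lambda_s^2 \Big) \\
&= E\Big[ \sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c \Big] + 2 (n-1)^{-1} m(h, \lambda) \\
&\quad + O_p\Big( n^{-3/2} + n^{-1} \sum_{j=1}^{q} h_j^{3q/2} + n^{-1/2} \sum_{j=1}^{q} h_j^{q+2} + n^{-1/2} \sum_{j=1}^{q} h_j^4 + n^{-1/2} \sum_{s=1}^{r} \lambda_s^2 \Big).
\end{aligned}
\tag{B.14}
$$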

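It is easy to see that

$$
\begin{aligned}
E\Big[ \sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c \Big]
&= \sum_{x^d \in S^d} \int E[(\hat F - F)^2]\, dx^c + E \sum_{x^d \in S^d} \int (F_n - F)^2 dx^c - 2 E\Big[ \sum_{x^d \in S^d} \int (\hat F - F)(F_n - F)\, dx^c \Big] \\
&= \sum_{x^d \in S^d} \int E[(\hat F - F)^2]\, dx^c + \sum_{x^d \in S^d} \int E[(F_n - F)^2]\, dx^c - \frac{2}{n} \sum_{x^d \in S^d} \int E[(W_i - F)(I_i - F)]\, dx^c \\
&= \sum_{x^d \in S^d} \int E[(\hat F - F)^2]\, dx^c + \sum_{x^d \in S^d} \int E[(F_n - F)^2]\, dx^c - \frac{2}{n} m(h, \lambda).
\end{aligned}
$$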
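The cross term in the display above reduces to $m(h, \lambda)/n$ by independence. A short worked step, using only $\hat F = n^{-1} \sum_i W_i$, $F_n = n^{-1} \sum_j I_j$, $E(I_j) = F$, and the independence of $X_i$ and $X_j$ for $i \neq j$:

```latex
\begin{aligned}
E\big[(\hat F - F)(F_n - F)\big]
&= \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} E\big[(W_i - F)(I_j - F)\big] \\
&= \frac{1}{n^2} \sum_{i=1}^{n} E\big[(W_i - F)(I_i - F)\big]
 + \frac{1}{n^2} \sum\sum_{i \neq j} E(W_i - F)\, \underbrace{E(I_j - F)}_{=\,0} \\
&= \frac{1}{n} E\big[D_i D_i^0\big].
\end{aligned}
```

Summing over $x^d \in S^d$ and integrating over $x^c$ then gives $n^{-1} m(h, \lambda)$, which is where the factor $2/n$ comes from.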

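Also, we have

$$
\begin{aligned}
m(h, \lambda) &= \sum_{x^d \in S^d} \int E[D_i D_i^0]\, dx^c \\
&= \sum_{x_1^d \in S^d} \int \Big\{ \sum_{x^d \in S^d} \int \Big( G(v) \sum_{u \le x^d} L(x_1^d, u, \lambda) - F(x_1^c + hv, x^d) \Big) \big( I(x_1^c + hv, x_1^c) I(x^d, x_1^d) - F(x_1^c + hv, x^d) \big)\, h\, dv \Big\} f(x_1^c, x_1^d)\, dx_1^c \\
&= O\Big( \sum_{j=1}^{q} h_j^q \Big).
\end{aligned}
$$

Thus, we have

$$
E\Big[ \sum_{x^d \in S^d} \int (\hat F - F_n)^2 dx^c \Big] = \sum_{x^d \in S^d} \int E[(\hat F - F)^2]\, dx^c + \sum_{x^d \in S^d} \int E[(F_n - F)^2]\, dx^c + O\Big( n^{-1} \sum_{j=1}^{q} h_j^q \Big).
\tag{B.15}
$$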


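Combining (B.14) and (B.15), we obtain that

$$
\begin{aligned}
H + \sum_{x^d \in S^d} \int (F_n - F)^2 dx^c - \sum_{x^d \in S^d} \int E[(F_n - F)^2]\, dx^c
&= \sum_{x^d \in S^d} \int E[(\hat F - F)^2]\, dx^c \\
&\quad + O_p\Big( n^{-3/2} + n^{-1} \sum_{j=1}^{q} h_j^q + n^{-1/2} \sum_{j=1}^{q} h_j^{q+2} + n^{-1/2} \sum_{j=1}^{q} h_j^4 + n^{-1/2} \sum_{s=1}^{r} \lambda_s^2 \Big).
\end{aligned}
$$

That is,

$$
CV(h, \lambda) + J_n = \operatorname{MISE}(h, \lambda) + O_p\Big( n^{-3/2} + n^{-1} \sum_{j=1}^{q} h_j^q + n^{-1/2} \sum_{j=1}^{q} h_j^{q+2} + n^{-1/2} \sum_{j=1}^{q} h_j^4 + n^{-1/2} \sum_{s=1}^{r} \lambda_s^2 \Big).
$$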

In effect, we have bounded the second moment of $CV(h, \lambda) + J_n - \operatorname{MISE}(h, \lambda)$. Applying Markov's inequality to the left-hand side of (B.16), applying Rosenthal's inequality (see Hall & Heyde, 1980, p. 23) to $S^{(1)}$ in (B.7), and repeating the previous argument, we can bound the moments of every order of $CV(h, \lambda) + J_n - \operatorname{MISE}(h, \lambda)$. With the aid of the factor $n^{\delta}$ and the differentiability of the kernel function, we obtain

$$
P\Big\{ \sup \big| CV(h, \lambda) + J_n - \operatorname{MISE}(h, \lambda) \big| > \Big( n^{-3/2} + n^{-1} \sum_{j=1}^{q} h_j^q + n^{-1/2} \sum_{j=1}^{q} h_j^{q+2} + n^{-1/2} \sum_{j=1}^{q} h_j^4 + n^{-1/2} \sum_{s=1}^{r} \lambda_s^2 \Big) n^{\delta} \Big\} = O(n^{-\gamma})
\tag{B.16}
$$

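for arbitrarily large $\gamma$. Then by the Borel–Cantelli lemma, we obtain the uniform strong convergence. This completes the proof of Theorem 2.

Lemma B.1.

(i) $$\sum_{x^d \in S^d} \int (\hat F - F)^2 dx^c + \sum_{x^d \in S^d} \int (F_n - F)(\hat F - F)\, dx^c = O_p\Big( n^{-1} + \sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r} \lambda_s^2 \Big);$$

(ii) $$n^{-3} \sum_i \sum_{x^d \in S^d} \int D_i^2\, dx^c = O_p(n^{-2}) \quad \text{and} \quad n^{-2} \sum_i \sum_{x^d \in S^d} \int D_i D_i^0\, dx^c = n^{-1} \sum_{x^d \in S^d} \int E(D_i D_i^0)\, dx^c + O_p(n^{-3/2}).$$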

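Proof. From (A.8) and (A.9), we have $\hat F - F = O_p\big( n^{-1/2} + \sum_{j=1}^{q} h_j^2 + \sum_{s=1}^{r} \lambda_s \big)$. So we have $\sum_{x^d \in S^d} \int (\hat F - F)^2 dx^c = O_p\big( n^{-1} + \sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r} \lambda_s^2 \big)$.

It is easy to see that $E[F_n(x^c, x^d)] = E[I(x, X_i)] = F(x^c, x^d)$ and $\operatorname{Var}(F_n(x^c, x^d)) = n^{-1} \{ E[I(x, X_i)]^2 - (E[I(x, X_i)])^2 \} = n^{-1} F(x^c, x^d)[1 - F(x^c, x^d)]$. Thus, we have $E[F_n(x^c, x^d) - F(x^c, x^d)]^2 = \operatorname{Var}[F_n(x^c, x^d)] = O(1/n)$, which implies $F_n(x^c, x^d) - F(x^c, x^d) = O_p(n^{-1/2})$ and $\sum_{x^d \in S^d} \int (F_n - F)(\hat F - F)\, dx^c = O_p\big( n^{-1} + \sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r} \lambda_s^2 \big)$.

From the law of large numbers and the central limit theorem, we get that $n^{-1} \sum_i D_i^2 = O_p(1)$ and $n^{-1} \sum_i D_i D_i^0 = E(D_i D_i^0) + O_p(n^{-1/2})$. Therefore, $n^{-3} \sum_i \sum_{x^d \in S^d} \int D_i^2\, dx^c = n^{-2} (1/n) \sum_i \sum_{x^d \in S^d} \int D_i^2\, dx^c = O_p(n^{-2})$ and $n^{-2} \sum_i \sum_{x^d \in S^d} \int D_i D_i^0\, dx^c = n^{-1} \sum_{x^d \in S^d} \int E(D_i D_i^0)\, dx^c + O_p(n^{-3/2})$. This completes the proof of this lemma.

Lemma B.2. (i) $E[g(X_1, X_2)]^2 = O\big( \sum_{j=1}^{q} h_j^{3q} \big)$; (ii) $E(g_1(X_1))^2 = O\big( \sum_{j=1}^{q} h_j^{2q+4} + \sum_{s=1}^{r} \lambda_s^4 \big)$; (iii) $g_0 = O\big( \sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r} \lambda_s^2 \big)$.

Proof. Using the change of variables, we have

$$
\begin{aligned}
E[g(X_1, X_2)]^2
&= \sum_{x_1^d \in S^d} \sum_{x_2^d \in S^d} \int \Big\{ \sum_{x^d \in S^d} \int \Big[ G\Big( \frac{x^c - x_1^c}{h} \Big) \sum_{u \le x^d} L(x_1^d, u, \lambda) - I(x^c, x_1^c) I(x^d, x_1^d) \Big] \\
&\qquad \times \Big[ G\Big( \frac{x^c - x_2^c}{h} \Big) \sum_{u \le x^d} L(x_2^d, u, \lambda) - I(x^c, x_2^c) I(x^d, x_2^d) \Big] dx^c \Big\}^2 f(x_1^c, x_1^d, x_2^c, x_2^d)\, dx_1^c\, dx_2^c \\
&= \sum_{x_1^d \in S^d} \sum_{x_2^d \in S^d} \int \Big\{ \sum_{x^d \in S^d} \int \Big[ G(v) \sum_{u \le x^d} L(x_1^d, u, \lambda) - I(x_1^c + hv, x_1^c) I(x^d, x_1^d) \Big] \\
&\qquad \times \Big[ G\Big( v + \frac{x_1^c - x_2^c}{h} \Big) \sum_{u \le x^d} L(x_2^d, u, \lambda) - I(x_1^c + hv, x_2^c) I(x^d, x_2^d) \Big] h\, dv \Big\}^2 f(x_1^c, x_1^d, x_2^c, x_2^d)\, dx_1^c\, dx_2^c \\
&= \sum_{x_1^d \in S^d} \sum_{x_2^d \in S^d} \int \Big\{ \sum_{x^d \in S^d} \int \Big[ G(v) \sum_{u \le x^d} L(x_1^d, u, \lambda) - I(hv, 0) I(x^d, x_1^d) \Big] \\
&\qquad \times \Big[ G(v + w) \sum_{u \le x^d} L(x_2^d, u, \lambda) - I(h(v + w), 0) I(x^d, x_2^d) \Big] h\, dv \Big\}^2 f(x_2^c + hw, x_1^d, x_2^c, x_2^d)\, h\, dw\, dx_2^c \\
&= O\Big( \sum_{j=1}^{q} h_j^{3q} \Big).
\end{aligned}
\tag{B.17}
$$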


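From (A.7) and $E(I_i) = F(x^c, x^d)$, we obtain $E[W_1 - I_1] = O\big( \sum_{j=1}^{q} h_j^2 + \sum_{s=1}^{r} \lambda_s \big)$. Then we have

$$
\begin{aligned}
E(g_1(X_1))^2
&= E\Big\{ \sum_{x^d \in S^d} \int (W_1 - I_1)\, E[W_1 - I_1]\, dx^c \Big\}^2 \\
&= (E[W_1 - I_1])^2 \sum_{x_1^d \in S^d} \int \Big\{ \sum_{x^d \in S^d} \int \Big[ G\Big( \frac{x^c - x_1^c}{h} \Big) \sum_{u \le x^d} L(x_1^d, u, \lambda) - I(x^c, x_1^c) I(x^d, x_1^d) \Big] dx^c \Big\}^2 dx_1^c \\
&= O\Big( \sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r} \lambda_s^2 \Big) \sum_{x_1^d \in S^d} \int \Big\{ \sum_{x^d \in S^d} \int \Big[ G(v) \sum_{u \le x^d} L(x_1^d, u, \lambda) - I(x_1^c + hv, x_1^c) I(x^d, x_1^d) \Big] h\, dv \Big\}^2 dx_1^c \\
&= O\Big( \sum_{j=1}^{q} h_j^{2q+4} + \sum_{s=1}^{r} \lambda_s^4 \Big).
\end{aligned}
\tag{B.18}
$$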

It is easy to see that $g_0 = E[g_1(X_1)] = (E[W_1 - I_1])^2 = O\big( \sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r} \lambda_s^2 \big)$, which completes the proof.