NONPARAMETRIC ESTIMATION OF MULTIVARIATE CDF WITH CATEGORICAL AND CONTINUOUS DATA

Gaosheng Ju, Rui Li and Zhongwen Liang

ABSTRACT

In this paper we construct a nonparametric kernel estimator of the joint multivariate cumulative distribution function (CDF) of mixed discrete and continuous variables. We use a data-driven cross-validation method to choose optimal smoothing parameters, which asymptotically minimize the mean integrated squared error (MISE). The asymptotic theory of the proposed estimator is derived, and the validity of the cross-validation method is proved. We provide sufficient and necessary conditions for the uniqueness of the optimal smoothing parameters when the estimation of the CDF degenerates to the case with only continuous variables, and provide a sufficient condition for the general mixed variables case.

1. INTRODUCTION

As rapid advances in computer technology make complicated computations feasible, nonparametric statistical methods have become increasingly popular and have been applied in many economic contexts. The most striking advantage of nonparametric methods over parametric ones is that they require no prior assumptions, which often turn out to be inappropriate, about the unknown true distributions. The joint distribution of multiple economic variables gives a direct illustration of the relationships among these variables and helps researchers infer the underlying causality. Consequently, the estimation of joint distributions is an important and fundamental issue in the nonparametric econometrics/statistics literature. Traditionally, nonparametric methods have focused on estimation with either continuous variables or discrete variables (see, e.g., Grund, 1993; Grund & Hall, 1993; Hall, 1981). However, estimation and testing methods able to handle mixed data are highly desirable because most data sets contain both continuous and discrete variables. For instance, labor economists are usually interested in the relationships between continuous income and discrete explanatory variables such as gender, race, education level, and location. Recently, Li and Racine (2003), Racine and Li (2004), and Li and Racine (2008) discussed nonparametric smoothing estimation of probability density functions, regression functions, and conditional cumulative distribution functions (CDF) and quantile functions with mixed discrete and continuous variables. Their work is of great importance for extending the application of nonparametric methods to settings with both continuous and discrete variables. This paper contributes to this literature by investigating nonparametric estimation of the unconditional joint CDF of mixed data types.
One difficulty in dealing with the estimation of discrete and continuous variables simultaneously is a lack of joint observations. Conventional approaches to handle the estimation of CDF of discrete variables are frequency based. Although we can directly combine it with the kernel estimator of continuous variables, the approach suffers because the number of observations for estimation of discrete variables by a frequency-based approach may be insufficient to ensure an accurate nonparametric estimation of marginal CDF for the remaining continuous variables. Aitchison and Aitken (1976) proposed a novel nonparametric smoothing method to estimate distribution functions defined over binary data. Their method can mitigate the problem of data insufficiency for finite-sample applications.

Nonparametric Estimation of Multivariate CDF


Their proposed smoothing method can reduce the estimation variance significantly, though it incurs some mild estimation bias. Li and Racine (2003) extended Aitchison and Aitken's method to a context with mixed discrete and continuous variables. In this paper, we adopt their idea of smoothing both discrete and continuous variables to estimate an unconditional CDF which contains both discrete and continuous components. It is well known that the selection of smoothing parameters is of crucial importance in nonparametric estimation. Among the several methods available for smoothing parameter selection, the most popular are the plug-in method and the cross-validation method. There are many discussions of these methods (e.g., Härdle & Marron, 1985; Loader, 1999). However, there is no clear conclusion as to which method is better. In practice, cross-validation may be the preferred choice, especially in multivariate settings, because it is fully data driven. Rudemo (1982) and Bowman (1984) introduced the cross-validation selection of smoothing parameters for density estimation (see Wand & Jones, 1995, Chapter 3; Li & Racine, 2007, for a thorough discussion). Bowman, Hall, and Prvan (1998) presented a cross-validation bandwidth selection for the smoothing estimation of continuous distribution functions. In this paper, we propose to use the least squares cross-validation method to choose the smoothing parameters. We will show that the resulting smoothing parameters are optimal in the sense of minimizing the mean integrated squared error (MISE). Another interesting problem is the uniqueness of the smoothing parameter vector in cross-validation methods. This was first tackled in Li and Zhou (2005) for the nonparametric kernel estimation of the PDF and regression function of continuous variables. We also discuss this problem in the paper.
We give a sufficient and necessary condition for uniqueness when the estimation of the CDF degenerates to a case with only continuous variables. For the case of mixed variables, we provide a sufficient condition. The estimation of the CDF is quite useful in econometrics and economics, especially for the econometric theory and economic applications of tests of stochastic dominance. Recently, there have been several theoretical and applied contributions on nonparametric tests of stochastic dominance. Among them are Barrett and Donald (2003), who provided consistent tests of stochastic dominance for any pre-specified order; Anderson (1996), who gave a nonparametric test of stochastic dominance applied to income distributions; and Davidson and Duclos (2000), who presented statistical inference and applications in poverty, inequality, and social welfare. Our estimator can be readily used in tests of stochastic dominance with mixed data.

The paper is organized as follows. In Section 2, we propose an estimator of the distribution function that admits mixed discrete and continuous variables. We derive the rates of convergence and establish the asymptotic normality of our estimator. In Section 3, we show that the smoothing parameters selected by the cross-validation method are optimal in the sense that they converge to the minimizer of MISE in probability. In Section 4, we give a sufficient and necessary condition for the uniqueness of the smoothing parameter vector when the estimation contains continuous variables only, and we give a sufficient condition for the mixed case. Section 5 provides an empirical application that examines the relationship between city size and unemployment rate. Section 6 concludes the paper.

2. ESTIMATION OF CDF WITH MIXED DISCRETE AND CONTINUOUS VARIABLES

We consider the case in which $x$ is a vector containing a mix of discrete and continuous variables. Let $x = (x^c, x^d)$, where $x^c \in \mathbb{R}^q$ is a $q$-dimensional continuous random vector and $x^d$ is an $r$-dimensional discrete random vector. Let $X_{is}^d$ ($x_s^d$) denote the $s$th component of $X_i^d$ ($x^d$), $s = 1, \ldots, r$, $i = 1, \ldots, n$, where $n$ is the sample size. We restrict the discrete components to a finite support. Without loss of generality, assume that the support of $X_{is}^d$ is $\{0, 1, \ldots, c_s - 1\}$; hence the support of $X_i^d$ is $S^d = \prod_{s=1}^r \{0, 1, \ldots, c_s - 1\}$. For discrete variables, we use the following kernel:

\[
l(X_{is}^d, x_s^d, \lambda_s) =
\begin{cases}
1 - \lambda_s, & \text{if } X_{is}^d = x_s^d \\
\lambda_s/(c_s - 1), & \text{if } X_{is}^d \neq x_s^d
\end{cases}
\]

Note that $\lambda_s$ is a bandwidth with the following properties: when $\lambda_s = 0$, $l(X_{is}^d, x_s^d, 0)$ becomes an indicator function, and when $\lambda_s = (c_s - 1)/c_s$, $l(X_{is}^d, x_s^d, (c_s - 1)/c_s) = 1/c_s$ becomes a uniform weight function. Thus, the range of $\lambda_s$ is $[0, (c_s - 1)/c_s]$. The product kernel function is given by

\[
L(X_i^d, x^d, \lambda) = \prod_{s=1}^r l(X_{is}^d, x_s^d, \lambda_s)
\]
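The discrete kernel and its product form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `aa_kernel` and `product_kernel`, and the toy support sizes and bandwidths, are hypothetical and do not come from the paper.

```python
from itertools import product

def aa_kernel(X_s, x_s, lam, c):
    """Univariate discrete kernel: 1 - lam if X_s == x_s, else lam / (c - 1)."""
    if not 0.0 <= lam <= (c - 1) / c:
        raise ValueError("lam must lie in [0, (c - 1) / c]")
    return 1.0 - lam if X_s == x_s else lam / (c - 1)

def product_kernel(X_d, x_d, lams, cs):
    """Product kernel over the r discrete components."""
    out = 1.0
    for X_s, x_s, lam, c in zip(X_d, x_d, lams, cs):
        out *= aa_kernel(X_s, x_s, lam, c)
    return out

# Toy setting (assumed): r = 2 components with c_1 = 2 and c_2 = 3 categories.
cs = (2, 3)
lams = (0.1, 0.2)
support = list(product(*(range(c) for c in cs)))

# For a fixed evaluation point the kernel weights over the support sum to one.
total = sum(product_kernel(z, (1, 2), lams, cs) for z in support)
```

With `lam = 0` the kernel reduces to an indicator, and with `lam = (c - 1) / c` it assigns the uniform weight `1 / c`, matching the two boundary cases described in the text.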


We use $k(\cdot)$ to denote a univariate kernel function for a continuous variable. The product kernel function used for the continuous variables is given by

\[
K\left(\frac{X_i^c - x^c}{h}\right) = \prod_{j=1}^q k\left(\frac{X_{ij}^c - x_j^c}{h_j}\right)
\]

where $X_{ij}^c$ ($x_j^c$) denotes the $j$th component of $X_i^c$ ($x^c$), $j = 1, \ldots, q$, $i = 1, \ldots, n$, and $h_j$ is the bandwidth associated with $x_j^c$. We use $f(x)$ and $F(x)$ to denote the density function and CDF of $X$, respectively. Following Li and Racine (2003), the kernel estimator of the density function $f(x)$ is given by

\[
\hat f(x) = \hat f(x^c, x^d) = \frac{1}{n h_1 h_2 \cdots h_q} \sum_{i=1}^n K\left(\frac{X_i^c - x^c}{h}\right) L(X_i^d, x^d, \lambda)
\]

Naturally, one can obtain a kernel estimator of $F(x)$ by integrating $\hat f(x)$, which is expressed as

\[
\hat F(x) = \hat F(x^c, x^d) = \frac{1}{n} \sum_{i=1}^n \left[ G\left(\frac{x^c - X_i^c}{h}\right) \sum_{u \le x^d} L(X_i^d, u, \lambda) \right] \tag{1}
\]

where $G(x) = \int_{-\infty}^{x} k(v)\,dv$ and $G((x^c - X_i^c)/h) = \prod_{j=1}^q G((x_j^c - X_{ij}^c)/h_j)$.

We introduce some notation before we state the main theorem of this section. Let $\mathbf{1}(A)$ denote an indicator function that takes the value 1 if $A$ occurs and 0 otherwise. Define an indicator function $\mathbf{1}_s(\cdot, \cdot)$ by

\[
\mathbf{1}_s(z^d, u) = \mathbf{1}(z_s^d \neq u_s) \prod_{t \neq s} \mathbf{1}(z_t^d = u_t)
\]



We can see that $\mathbf{1}_s(\cdot, \cdot)$ equals one if and only if $z^d$ and $u$ differ only in the $s$th component. The following assumptions will be used in studying the asymptotic behavior of the cross-validated smoothing parameters and in deriving the asymptotic distribution of our CDF estimator.

Condition (C1). The data $\{(X_i^c, X_i^d)\}_{i=1}^n$ are independent and identically distributed as $(X^c, X^d)$. $F(x^c, x^d)$ has continuous third-order partial derivatives with respect to $x^c$.

Condition (C2). $k(\cdot)$ is a bounded and symmetric kernel density function with compact support, $\int k(v)\,dv = 1$, and $\int v^2 k(v)\,dv = \kappa_2 < \infty$.

Condition (C3). As $n \to \infty$, $h_j \to 0$ and $n h_j^6 \to 0$ for $j = 1, \ldots, q$, and $\lambda_s \to 0$ and $n \lambda_s^4 \to 0$ for $s = 1, \ldots, r$.

Let $F_j^{(1)}(x^c, x^d) = \partial F(x)/\partial x_j^c$ and $F_{jj}^{(2)}(x^c, x^d) = \partial^2 F(x)/(\partial x_j^c \partial x_j^c)$. The next theorem gives the rate of convergence in terms of MSE and MISE and the asymptotic normality of our estimator.

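Theorem 1. Under conditions (C1), (C2), and (C3), we have

(i)
\[
\begin{aligned}
\mathrm{MSE}(\hat F(x^c, x^d)) ={}& \frac{1}{n} F(x^c, x^d)(1 - F(x^c, x^d)) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + \left( \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s \right)^2 \\
&+ O\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right)
\end{aligned}
\]

where $\alpha_0 = 2 \int v G(v) k(v)\,dv$, $A_{1j} = \alpha_0 F_j^{(1)}(x^c, x^d)$, $A_{2s} = \frac{2}{c_s - 1} \sum_{u \le x^d} \sum_{v \le x^d,\, v \neq u} \mathbf{1}_s(u, v) F(x^c \mid u) p(u) - 2F(x^c, x^d) - 2F(x^c, x^d) B_{2s}$, $B_{1j} = \frac{1}{2} \kappa_2 F_{jj}^{(2)}(x^c, x^d)$, and $B_{2s} = \frac{1}{c_s - 1} \sum_{z^d \in S^d} \sum_{u \le x^d} \mathbf{1}_s(z^d, u) F(x^c \mid x^d) p(x^d) - F(x^c, x^d)$.

(ii)
\[
\begin{aligned}
\mathrm{MISE}(\hat F(x^c, x^d)) ={}& Z^T \left( \sum_{x^d \in S^d} \int B B^T\,dx^c \right) Z + \frac{1}{n} A^T \tilde Z + \frac{1}{n} \sum_{x^d \in S^d} \int F(x^c, x^d)(1 - F(x^c, x^d))\,dx^c \\
&+ O\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right)
\end{aligned}
\tag{3}
\]

where $Z = (h_1^2, \ldots, h_q^2, \lambda_1, \ldots, \lambda_r)^T$, $\tilde Z = (h_1, \ldots, h_q, \lambda_1, \ldots, \lambda_r)^T$, $B = (B_{11}, \ldots, B_{1q}, B_{21}, \ldots, B_{2r})^T$, and $A = \sum_{x^d \in S^d} \int (A_{11}, \ldots, A_{1q}, A_{21}, \ldots, A_{2r})^T\,dx^c$.

(iii)
\[
\sqrt{n} \left( \hat F(x^c, x^d) - F(x^c, x^d) - \sum_{j=1}^q B_{1j} h_j^2 - \sum_{s=1}^r B_{2s} \lambda_s \right) \xrightarrow{d} N\big(0,\ F(x^c, x^d)(1 - F(x^c, x^d))\big)
\]

The proof of Theorem 1 is given in Appendix A.

We can see that the convergence rate of our CDF estimator is $\sqrt{n}$. Under the optimal convergence rates for $h_j$ and $\lambda_s$, $j = 1, \ldots, q$, $s = 1, \ldots, r$ (i.e., $h_j \sim n^{-1/3}$ and $\lambda_s \sim n^{-2/3}$), statement (iii) in Theorem 1 simplifies to

\[
\sqrt{n} \left( \hat F(x^c, x^d) - F(x^c, x^d) \right) \xrightarrow{d} N\big(0,\ F(x^c, x^d)(1 - F(x^c, x^d))\big)
\]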
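To make the estimator in Eq. (1) concrete, the following Python sketch evaluates it at a single point. It is a minimal illustration, not the authors' implementation: the function names and toy data are hypothetical, and a Gaussian kernel is assumed for convenience (as in the empirical application of Section 5), even though condition (C2) asks for a compactly supported kernel.

```python
import math
from itertools import product

def G(v):
    """Standard normal CDF, i.e. the integral of an assumed Gaussian kernel k."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def disc_kernel(X_s, u_s, lam, c):
    """Discrete kernel: 1 - lam on a match, lam / (c - 1) otherwise."""
    return 1.0 - lam if X_s == u_s else lam / (c - 1)

def cdf_hat(xc, xd, Xc, Xd, h, lam, cs):
    """Smoothed joint CDF estimate at (xc, xd), following Eq. (1)."""
    # All support points u with u <= xd componentwise.
    lower = list(product(*(range(x + 1) for x in xd)))
    total = 0.0
    for Xc_i, Xd_i in zip(Xc, Xd):
        g = 1.0
        for x_j, X_ij, h_j in zip(xc, Xc_i, h):
            g *= G((x_j - X_ij) / h_j)          # product of integrated kernels
        L_sum = 0.0
        for u in lower:
            w = 1.0
            for X_s, u_s, l_s, c_s in zip(Xd_i, u, lam, cs):
                w *= disc_kernel(X_s, u_s, l_s, c_s)
            L_sum += w                           # sum of product kernels over u <= xd
        total += g * L_sum
    return total / len(Xc)

# Toy data (assumed): one continuous and one binary variable.
Xc = [(0.2,), (1.5,), (0.9,), (2.1,)]
Xd = [(0,), (1,), (0,), (1,)]

upper = cdf_hat((1e6,), (1,), Xc, Xd, (0.5,), (0.1,), (2,))
unsmoothed = cdf_hat((1.0,), (0,), Xc, Xd, (1e-6,), (0.0,), (2,))
```

At the top of the discrete support and for very large `xc` the estimate approaches one, and with `lam = 0` and a tiny `h` it collapses to the empirical distribution function, which is a quick sanity check on any implementation.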

3. CROSS-VALIDATION BANDWIDTH SELECTION

In this section, we focus on how to choose the smoothing parameters when estimating $\hat F(\cdot)$. Theoretically, we may choose the optimal bandwidths by minimizing the leading term of MISE given by Eq. (3) in Theorem 1. Taking derivatives with respect to $h_j$ and $\lambda_s$, one can easily see that optimal smoothing requires $h_j \sim n^{-1/3}$, $j = 1, \ldots, q$, and $\lambda_s \sim n^{-2/3}$, $s = 1, \ldots, r$, for $q \ge 1$. However, the coefficients of these orders involve unknown functions, so this method is infeasible in practice. Although one can compute plug-in bandwidths based on Eq. (3) by choosing some initial "pilot" bandwidths, the results may be sensitive to the choice of these pilots. Therefore, it is highly desirable to construct an automatic data-driven bandwidth selection procedure that does not rely on ad hoc pilot bandwidth values to estimate unknown functions.

Following Bowman et al. (1998), we suggest choosing the smoothing parameters $(h, \lambda) = (h_1, \ldots, h_q, \lambda_1, \ldots, \lambda_r)$ by minimizing the following cross-validation function:

\[
CV(h, \lambda) = \frac{1}{n} \sum_{i=1}^n \sum_{x^d \in S^d} \int \left( I(x^c, X_i^c) I(x^d, X_i^d) - \hat F_{-i}(x^c, x^d) \right)^2 dx^c
\]

where $\hat F_{-i}(x^c, x^d) = \frac{1}{n-1} \sum_{j \neq i} G((x^c - X_j^c)/h) \sum_{u \le x^d} L(X_j^d, u, \lambda)$, $I(x^c, X_i^c) = \mathbf{1}(X_i^c \le x^c)$, and $I(x^d, X_i^d) = \mathbf{1}(X_i^d \le x^d)$. Define $I_i \equiv I(x, X_i) = I(x^c, X_i^c) I(x^d, X_i^d)$ and a term unrelated to the smoothing parameters,

\[
J_n = \sum_{x^d \in S^d} \int \left\{ (F_n - F)^2 - E[(F_n - F)^2] \right\} dx^c - \frac{1}{n} \sum_{i=1}^n \sum_{x^d \in S^d} \int [I(x, X_i) - F(x^c, x^d)]^2\,dx^c
\]

where $F_n(x^c, x^d) = \frac{1}{n} \sum_{i=1}^n I(x^c, X_i^c) I(x^d, X_i^d)$ is the empirical distribution function. In Theorem 2 below, we show that $H(h, \lambda) = CV(h, \lambda) + J_n$ is a good approximation to $\mathrm{MISE}(h, \lambda)$.
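The cross-validation criterion can be approximated numerically. The sketch below handles the simplest mixed case, one continuous and one discrete variable, and is an illustration under several assumptions that are not in the paper: a Gaussian kernel, a trapezoid rule on a finite grid in place of the integral over $x^c$, strictly positive `h`, toy data, and a coarse grid search instead of a proper optimizer. All function and variable names are hypothetical.

```python
import math

def G(v):
    """Standard normal CDF (integral of an assumed Gaussian kernel)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def smoothed_term(xc, xd, Xc_j, Xd_j, h, lam, c):
    """W_j(x) = G((xc - Xc_j) / h) * sum over u <= xd of l(Xd_j, u, lam)."""
    L_sum = sum(1.0 - lam if Xd_j == u else lam / (c - 1) for u in range(xd + 1))
    return G((xc - Xc_j) / h) * L_sum

def cv(h, lam, Xc, Xd, c, grid):
    """Least-squares CV criterion for q = r = 1, integrated over `grid`."""
    n = len(Xc)
    total = 0.0
    for xd in range(c):                      # sum over the discrete support
        vals = []
        for xc in grid:
            W = [smoothed_term(xc, xd, Xc[j], Xd[j], h, lam, c) for j in range(n)]
            W_sum = sum(W)
            sq = 0.0
            for i in range(n):
                indicator = 1.0 if (Xc[i] <= xc and Xd[i] <= xd) else 0.0
                loo = (W_sum - W[i]) / (n - 1)   # leave-one-out estimate
                sq += (indicator - loo) ** 2
            vals.append(sq / n)
        for k in range(len(grid) - 1):       # trapezoid rule in xc
            total += 0.5 * (vals[k] + vals[k + 1]) * (grid[k + 1] - grid[k])
    return total

# Toy data (assumed, not the paper's data set).
Xc = [2.4, 3.1, 4.0, 5.2, 6.3, 7.7, 8.1, 9.5]
Xd = [1, 1, 0, 1, 0, 0, 1, 0]
grid = [0.1 * k for k in range(121)]

candidates = [(h, lam) for h in (0.2, 0.5, 1.0) for lam in (0.0, 0.1, 0.3)]
best = min(candidates, key=lambda p: cv(p[0], p[1], Xc, Xd, 2, grid))
```

Computing the sum of all smoothed terms once per grid point and subtracting the $i$th term keeps the leave-one-out step cheap; a real application would replace the grid search by a numerical optimizer over the admissible range of $(h, \lambda)$.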



Theorem 2. Define $H(h, \lambda) = CV(h, \lambda) + J_n$. Then under conditions (C1) and (C2), we have for each $\delta, \varepsilon, C > 0$,

\[
H(h, \lambda) = \mathrm{MISE}(h, \lambda) + O_p\left( \left( n^{-3/2} + n^{-1} \sum_{j=1}^q h_j^q + n^{-1/2} \sum_{j=1}^q h_j^{q+2} + n^{-1/2} \sum_{j=1}^q h_j^4 + n^{-1/2} \sum_{s=1}^r \lambda_s^2 \right) n^{\delta} \right)
\]

with probability 1, uniformly in $0 \le h_j, \lambda_s \le C n^{-\varepsilon}$ for $j = 1, \ldots, q$, $s = 1, \ldots, r$, as $n \to \infty$.

Essentially, Theorem 2 says that $CV(h, \lambda)$ = (leading terms of $\mathrm{MISE}(h, \lambda)$) + (terms unrelated to $h$, $\lambda$) + (small order terms). Therefore, minimizing the cross-validation function is asymptotically equivalent to minimizing $\mathrm{MISE}(h, \lambda)$, and we immediately have the following corollary.

Corollary 1. Under conditions (C1) and (C2), let $\hat h_j$, $\hat \lambda_s$, $j = 1, \ldots, q$, $s = 1, \ldots, r$, denote the smoothing parameters that minimize $CV(h, \lambda)$ over the set $[0, C n^{-\varepsilon}]^{q+r}$ for any $C > 0$ and any $0 < \varepsilon < 1/3$, and let $h_j^0$, $\lambda_s^0$, $j = 1, \ldots, q$, $s = 1, \ldots, r$, denote the smoothing parameters that minimize $\mathrm{MISE}(h, \lambda)$. Then we have

\[
\frac{\hat h_j}{h_j^0} \to 1 \quad \text{and} \quad \frac{\hat \lambda_s}{\lambda_s^0} \to 1 \ \ (\text{if } \lambda_s^0 \neq 0), \qquad \hat \lambda_s \to 0 \ \ (\text{if } \lambda_s^0 = 0)
\]

in probability, for all $j = 1, \ldots, q$ and $s = 1, \ldots, r$.

The proof of Theorem 2 is given in Appendix B.
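The rate $h_j \sim n^{-1/3}$ quoted at the start of this section can be checked by hand in the simplest case $q = 1$, $r = 0$. The derivation below is a sketch under the assumptions that $M = \int B_{11}^2\,dx^c > 0$ and that the scalar $A$ is negative (Section 4 notes that the components of $A$ are negative); it keeps only the leading terms of Eq. (3) that depend on $h$.

```latex
% Leading h-dependent terms of Eq. (3) with q = 1, r = 0, Z = h^2, \tilde Z = h:
\[
  \psi(h) = M h^{4} + \frac{1}{n} A h , \qquad M > 0,\ A < 0 .
\]
% First-order condition:
\[
  \psi'(h) = 4 M h^{3} + \frac{A}{n} = 0
  \quad\Longrightarrow\quad
  h_{0} = \left( \frac{-A}{4M} \right)^{1/3} n^{-1/3} .
\]
% Second-order condition: \psi''(h_0) = 12 M h_0^{2} > 0, so h_0 is a minimizer.
```

The same balancing argument applied to a term of the form $B^2 \lambda^2 + A\lambda/n$ is consistent with the rate $\lambda_s \sim n^{-2/3}$ stated in the text.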

4. UNIQUENESS OF SMOOTHING PARAMETER VECTOR

Section 3 established that minimizing the cross-validation function is asymptotically equivalent to minimizing MISE. Hence, to investigate the asymptotic uniqueness of the cross-validated smoothing parameters, we only need to examine the uniqueness of the parameters minimizing the leading terms of MISE. When there are no discrete variables, our objective function is

\[
\inf_{Z \in \mathbb{R}_+^q,\ \|Z\| = 1} \; Z^T M Z + \frac{1}{n} A^T Z^{1/2} \tag{4}
\]

where $Z = (h_1^2, \ldots, h_q^2)^T$, $Z^{1/2} = (h_1, \ldots, h_q)^T$, $M = \sum_{x^d \in S^d} \int B B^T\,dx^c$, and both $A$ and $B$ are of dimension $q \times 1$ (they are the first $q$ elements of the general mixed variable case). Based on the previous discussion, the optimal rates for $h_j$ and $\lambda_s$ are $n^{-1/3}$ and $n^{-2/3}$, respectively. Let $h_j = a_j n^{-1/3}$ for $j = 1, \ldots, q$. Substituting these parameters into Eq. (4), minimizing $Z^T M Z + (1/n) A^T Z^{1/2}$ is equivalent to minimizing $Z^T M Z + A^T Z^{1/2}$, where, with a slight abuse of notation, $Z = (a_1^2, \ldots, a_q^2)^T$ and $Z^{1/2} = (a_1, \ldots, a_q)^T$. When the estimation of the CDF degenerates to the case with only continuous variables, we give the necessary and sufficient condition in the following theorem.

Theorem 3. Assume that $r = 0$, let $Z = (h_1^2, \ldots, h_q^2)^T$, and define $m = \inf_{Z \in \mathbb{R}_+^q,\ \|Z\| = 1} Z^T M Z$. Then $w(Z) = Z^T M Z + A^T Z^{1/2}$ has a unique minimizer $Z^* \in \mathbb{R}_+^q$ if and only if $m > 0$.

Proof. Our proof follows arguments similar to those in Li and Zhou (2005). First we prove the "only if" part. Suppose $m = 0$ is attained at some $Z^{(0)} \in \mathbb{R}_+^q$ with $\|Z^{(0)}\| = 1$. Then there exists at least one component $Z_i^{(0)} \neq 0$, that is, $Z_i^{(0)} > 0$. So $w(tZ^{(0)}) = t^2 (Z^{(0)})^T M Z^{(0)} + A^T \sqrt{t}\,(Z^{(0)})^{1/2} = A^T \sqrt{t}\,(Z^{(0)})^{1/2} \to -\infty$ as $t \to +\infty$. Note that the components of $A$ are negative and $tZ^{(0)} \in \mathbb{R}_+^q$. This implies that $w$ has no minimizer.

Next we prove the "if" part. If $m > 0$, for any $Z \in \mathbb{R}_+^q$ with $\|Z\| = 1$, we have $tZ \in \mathbb{R}_+^q$, $t > 0$. Then

\[
w(tZ) = t^2 Z^T M Z + \sqrt{t}\, A^T Z^{1/2} = \left( t \sqrt{Z^T M Z} - \frac{1}{2\sqrt{Z^T M Z}} \right)^2 + \left( \sqrt{t} + \frac{A^T Z^{1/2}}{2} \right)^2 - \frac{1}{4 Z^T M Z} - \frac{(A^T Z^{1/2})^2}{4} \to +\infty
\]

as $t \to +\infty$. For $R > 0$, denote $B_R = \{Z \in \mathbb{R}_+^q : \|Z\| \le R\}$. Since $w$ is a continuous function on $\mathbb{R}_+^q$, $B_R$ is a compact set, and $w(tZ) \to +\infty$ as $t \to +\infty$, there exists $R > 0$ such that $\min_{Z \in \mathbb{R}_+^q} w(Z) \Leftrightarrow \min_{Z \in B_R} w(Z)$.


From $w(tZ) = t^2 Z^T M Z + t^{1/2} A^T Z^{1/2}$, we know that $w(tZ)$ attains its minimum at $t = \left( -A^T Z^{1/2} / (4 Z^T M Z) \right)^{2/3} > 0$. So 0 is not the minimizer of $w$. Similarly, for $Z$ with $i$th component equal to zero, $w(Z + t(0, \ldots, 1, \ldots, 0)^T) = Z^T M Z + 2t Z^T M (0, \ldots, 1, \ldots, 0)^T + A^T Z^{1/2} + t^2 m_{ii} + A_i t^{1/2}$ cannot attain its minimum at $t = 0$. So $Z$ with $h_i^2 = 0$ cannot be the minimizer of $w$, which means that $w$ can only attain its minimum in the interior of $B_R$. The Hessian matrix $H$ of $w$ is $H = \partial^2 w / (\partial Z\, \partial Z^T) = 2M + G$, where $G = -\frac{1}{4}\, \mathrm{diag}(c_1 z_1^{-3/2}, c_2 z_2^{-3/2}, \ldots, c_q z_q^{-3/2})$ is a diagonal matrix, and $c_i$ and $z_i$ denote the $i$th components of $A$ and $Z$. Since $c_i < 0$, $G$ is positive definite in the interior of $B_R$. Also, $M$ is symmetric and positive semi-definite. So $H$ is positive definite in the interior of $B_R$. Therefore, $w$ has a unique minimizer in the interior of $B_R$. This completes the proof.

In general, our objective function is

\[
\inf_{Z \in \mathbb{R}_+^{q+r},\ \|Z\| = 1} \; Z^T M Z + \frac{1}{n} A^T \tilde Z \tag{5}
\]

where $Z = (h_1^2, \ldots, h_q^2, \lambda_1, \ldots, \lambda_r)^T$, $\tilde Z = (h_1, \ldots, h_q, \lambda_1, \ldots, \lambda_r)^T$, and $M = \sum_{x^d \in S^d} \int B B^T\,dx^c$, $B = (B_{11}, \ldots, B_{1q}, B_{21}, \ldots, B_{2r})^T$, $A = \sum_{x^d \in S^d} \int (A_{11}, \ldots, A_{1q}, A_{21}, \ldots, A_{2r})^T\,dx^c$ are defined in Theorem 1. Substituting $h_j = a_j n^{-1/3}$ for $j = 1, \ldots, q$ and $\lambda_s = b_s n^{-2/3}$ for $s = 1, \ldots, r$ into Eq. (5), we have that Eq. (5) is equivalent to minimizing $Z^T M Z + A^T \tilde Z$ with respect to $Z = (a_1^2, \ldots, a_q^2, b_1, \ldots, b_r)^T$ and $\tilde Z = (a_1, \ldots, a_q, b_1, \ldots, b_r)^T$. A sufficient condition for uniqueness in the case of mixed discrete and continuous variables is given as follows.

Theorem 4. Let $m = \inf_{Z \in \mathbb{R}_+^{q+r},\ \|Z\| = 1} Z^T M Z$. If $m > 0$, then $w$ has a minimizer $Z^* \in \mathbb{R}_+^{q+r}$. If $M$ is positive definite, then the Hessian matrix $H$ of $w$ is positive definite at every point of $\mathbb{R}_+^{q+r}$. Thus, $w$ has a unique minimizer $Z^* \in \mathbb{R}_+^{q+r}$.

Proof. If $m > 0$, for any $Z \in \mathbb{R}_+^{q+r}$ with $\|Z\| = 1$, we have $tZ \in \mathbb{R}_+^{q+r}$, $t > 0$. Using the notation $Z_{(1)} = (a_1^2, \ldots, a_q^2)^T$, $Z_{(1)}^{1/2} = (a_1, \ldots, a_q)^T$, and $Z_{(2)} = (b_1, \ldots, b_r)^T$, we have

\[
\begin{aligned}
w(tZ) &= t^2 Z^T M Z + \sqrt{t}\, A_1^T Z_{(1)}^{1/2} + t\, A_2^T Z_{(2)} \\
&= \left( t \sqrt{Z^T M Z} + \frac{A_2^T Z_{(2)} - 1}{2\sqrt{Z^T M Z}} \right)^2 + \left( \sqrt{t} + \frac{A_1^T Z_{(1)}^{1/2}}{2} \right)^2 - \frac{(A_2^T Z_{(2)} - 1)^2}{4 Z^T M Z} - \frac{(A_1^T Z_{(1)}^{1/2})^2}{4} \to +\infty
\end{aligned}
\]

as $t \to +\infty$, where $A_1 = (c_1, \ldots, c_q)^T$ and $A_2 = (c_{q+1}, \ldots, c_{q+r})^T$. For $R > 0$, denote $B_R = \{Z \in \mathbb{R}_+^{q+r} : \|Z\| \le R\}$. Since $w$ is a continuous function on $\mathbb{R}_+^{q+r}$, $B_R$ is a compact set, and $w(tZ) \to +\infty$ as $t \to +\infty$, there exists $R > 0$ such that $\min_{Z \in \mathbb{R}_+^{q+r}} w(Z) \Leftrightarrow \min_{Z \in B_R} w(Z)$. Therefore, $w$ has a minimizer $Z^* \in \mathbb{R}_+^{q+r}$.

The Hessian matrix $H$ of $w$ is

\[
H = \frac{\partial^2 w}{\partial Z\, \partial Z^T} = 2M + \begin{pmatrix} G & 0 \\ 0 & 0 \end{pmatrix}
\]

If $M$ is positive definite, then $m > 0$, since $Z^T M Z > 0$ on the compact set $\{Z : Z \in \mathbb{R}_+^{q+r},\ \|Z\| = 1\}$. Also, $H$ is positive definite at every point $Z \in \mathbb{R}_+^{q+r}$. Thus, $w$ has a unique minimizer $Z^* \in \mathbb{R}_+^{q+r}$. This completes the proof.
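The role of the condition $m > 0$ in Theorem 3 can be illustrated numerically in the scalar case $q = 1$, where $w(z) = m z^2 + a \sqrt{z}$ with $a < 0$. The Python sketch below is only a sanity check under assumed toy values of `m` and `a`; the helper names and the golden-section search are illustrative choices, not part of the paper.

```python
def w(z, m, a):
    """Scalar objective w(z) = m * z**2 + a * sqrt(z) for z >= 0."""
    return m * z ** 2 + a * z ** 0.5

def closed_form_minimizer(m, a):
    """Stationary point of w when m > 0 and a < 0: z* = (-a / (4 m))**(2/3)."""
    return (-a / (4.0 * m)) ** (2.0 / 3.0)

def golden_section(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

# Assumed toy values: m > 0 gives a unique interior minimizer.
m, a = 2.0, -1.0
z_closed = closed_form_minimizer(m, a)
z_numeric = golden_section(lambda z: w(z, m, a), 0.0, 5.0)

# With m = 0 the objective decreases without bound, so no minimizer exists.
unbounded = [w(t, 0.0, a) for t in (1.0, 1e2, 1e4, 1e6)]
```

For `m = 2` and `a = -1` the closed form gives `z* = 0.25`, and the numerical search agrees to within its tolerance; for `m = 0` the values in `unbounded` keep falling as `t` grows, mirroring the "only if" part of the proof.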

5. AN EMPIRICAL APPLICATION

Gan and Zhang (2006) presented a theory predicting that a large city tends to have a lower unemployment rate. Their empirical study used US data on city population and average unemployment rate for a sample of 295 cities. The average unemployment rate, which is continuous, ranges from 2.4% to 19.6%. To obtain a categorical variable, we classify cities with a population of more than 200,000 as large cities and the others as small cities. This classification gives 112 large cities and 183 small cities. In Fig. 1, we plot the conditional CDF of the unemployment rate, which is calculated from our estimate of the joint CDF, for large and small cities. We use a Gaussian kernel for the unemployment rate. The cross-validated bandwidths for the continuous variable and the categorical variable are 0.3470 and 0.0289, respectively (see Note 1). The conditional CDF estimate is consistent with the theory that large cities tend to have lower unemployment rates than small cities.

[Fig. 1. CDF Estimate of Unemployment Rate of Large and Small Cities. Horizontal axis: unemployment rate; vertical axis ticks from 0 to 1; curves: small city, large city.]

The conditional CDF curve for large cities lies above that of small cities for the most part. Fig. 1 shows that, for most of the unemployment range, the distribution of unemployment rate for large cities stochastically dominates that of small cities.
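The text does not spell out how the conditional CDFs are obtained from the joint estimate, so the following Python sketch shows one natural way to do it for a single continuous variable and a binary category. It rests on assumptions that are not in the paper: the coding 0 = small city and 1 = large city, a Gaussian kernel, toy data in place of the 295-city sample, and hypothetical function names. The bandwidth values in the usage lines are the cross-validated values reported above.

```python
import math

def G(v):
    """Standard normal CDF (integral of a Gaussian kernel)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def joint_cdf(xc, xd, Xc, Xd, h, lam):
    """Eq. (1) for one continuous and one binary variable (c = 2)."""
    total = 0.0
    for Xc_i, Xd_i in zip(Xc, Xd):
        # For c = 2 the off-diagonal kernel weight lam / (c - 1) equals lam.
        L_sum = sum(1.0 - lam if Xd_i == u else lam for u in range(xd + 1))
        total += G((xc - Xc_i) / h) * L_sum
    return total / len(Xc)

def conditional_cdfs(xc, Xc, Xd, h, lam):
    """Return (CDF given category 0, CDF given category 1) at xc."""
    inf = float("inf")
    f0 = joint_cdf(xc, 0, Xc, Xd, h, lam)    # P(X^c <= xc, X^d = 0)
    f1 = joint_cdf(xc, 1, Xc, Xd, h, lam)    # P(X^c <= xc, X^d <= 1)
    p0 = joint_cdf(inf, 0, Xc, Xd, h, lam)   # smoothed P(X^d = 0)
    p1 = joint_cdf(inf, 1, Xc, Xd, h, lam)   # equals 1
    return f0 / p0, (f1 - f0) / (p1 - p0)

# Toy data (assumed): unemployment rates in percent and a city-size dummy.
Xc = [3.0, 4.0, 5.0, 6.0]
Xd = [1, 1, 0, 0]

small, large = conditional_cdfs(4.5, Xc, Xd, 0.3470, 0.0289)
small0, large0 = conditional_cdfs(4.5, Xc, Xd, 1e-6, 0.0)
```

With `lam = 0` and a very small `h` the two conditional curves reduce to the within-group empirical distribution functions, which gives a simple check before plotting the smoothed versions over a grid of unemployment rates.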

6. CONCLUSION

We propose a consistent nonparametric kernel estimator of the joint unconditional CDF defined over a mix of discrete and continuous variables. A data-driven cross-validation method for selecting the smoothing parameters is examined. We show that it is asymptotically equivalent to minimizing the integrated MSE. The uniqueness condition of the cross-validation procedure is discussed. Because many economic data sets involve both continuous and discrete variables, our proposed estimator should prove useful to applied researchers.

NOTE 1. For practical implementations of nonparametric econometrics, refer to Scott (1992) and Hayfield and Racine (2008).

ACKNOWLEDGMENTS

We thank the editors and two referees for their insightful comments, which helped us improve the paper substantially. We also thank Dr. Qi Li, who led us to this fruitful field, for intense discussions of this paper.

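APPENDIX A. PROOF OF THEOREM 1

Proof of Theorem 1. As is well known, $\mathrm{MSE}(\hat F(x)) = E[\hat F(x) - F(x)]^2 = [\mathrm{bias}(\hat F(x))]^2 + \mathrm{var}(\hat F(x))$. We evaluate the terms $\mathrm{bias}(\hat F(x))$ and $\mathrm{var}(\hat F(x))$ separately. For simplicity, we use $dz$ and $dv$ to denote $dz_1 \cdots dz_q$ and $dv_1 \cdots dv_q$, respectively, throughout the appendices. For the continuous variables, using a change of variables, integration by parts, and a Taylor expansion, we have

\[
\begin{aligned}
E\left[ G\left( \frac{x^c - X_i^c}{h} \right) \right] &= E\left[ \prod_{j=1}^q G\left( \frac{x_j^c - X_{ij}^c}{h_j} \right) \right] = \int \left[ \prod_{j=1}^q G\left( \frac{x_j^c - z_j}{h_j} \right) \right] f(z_1, z_2, \ldots, z_q)\,dz \\
&= h_1 h_2 \cdots h_q \int \left[ \prod_{j=1}^q G(v_j) \right] f(x_1^c - h_1 v_1, x_2^c - h_2 v_2, \ldots, x_q^c - h_q v_q)\,dv \\
&= \int \left[ \prod_{j=1}^q k(v_j) \right] F(x_1^c - h_1 v_1, x_2^c - h_2 v_2, \ldots, x_q^c - h_q v_q)\,dv \\
&= \int \left[ \prod_{j=1}^q k(v_j) \right] \left\{ F(x^c) - \sum_{j=1}^q F_j^{(1)}(x^c) h_j v_j + \frac{1}{2} \sum_{i,j=1}^q F_{ij}^{(2)}(x^c) h_i h_j v_i v_j \right\} dv + O\left( \sum_{j=1}^q h_j^3 \right) \\
&= F(x^c) + \frac{\kappa_2}{2} \sum_{j=1}^q F_{jj}^{(2)}(x^c) h_j^2 + O\left( \sum_{j=1}^q h_j^3 \right)
\end{aligned}
\tag{A.1}
\]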

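where $\kappa_2 = \int v^2 k(v)\,dv$, and

\[
\begin{aligned}
E\left[ G^2\left( \frac{x^c - X_i^c}{h} \right) \right] &= E\left[ \prod_{j=1}^q G^2\left( \frac{x_j^c - X_{ij}^c}{h_j} \right) \right] = \int \left[ \prod_{j=1}^q G^2\left( \frac{x_j^c - z_j}{h_j} \right) \right] f(z_1, z_2, \ldots, z_q)\,dz \\
&= h_1 h_2 \cdots h_q \int \left[ \prod_{j=1}^q G^2(v_j) \right] f(x_1^c - h_1 v_1, x_2^c - h_2 v_2, \ldots, x_q^c - h_q v_q)\,dv \\
&= 2 \int \left[ \prod_{j=1}^q G(v_j) \right] \left[ \prod_{j=1}^q k(v_j) \right] F(x_1^c - h_1 v_1, x_2^c - h_2 v_2, \ldots, x_q^c - h_q v_q)\,dv \\
&= 2 \int \left[ \prod_{j=1}^q G(v_j) \right] \left[ \prod_{j=1}^q k(v_j) \right] \left\{ F(x^c) - \sum_{j=1}^q F_j^{(1)}(x^c) h_j v_j \right\} dv + O\left( \sum_{j=1}^q h_j^2 \right) \\
&= F(x^c) - \alpha_0 \sum_{j=1}^q F_j^{(1)}(x^c) h_j + O\left( \sum_{j=1}^q h_j^2 \right)
\end{aligned}
\tag{A.2}
\]

where $\alpha_0 = 2 \int v G(v) k(v)\,dv$.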


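For the discrete variables, we have

\[
\begin{aligned}
L(z^d, u, \lambda) &= \prod_{s=1}^r l(z_s^d, u_s, \lambda_s) = \prod_{s=1}^r (1 - \lambda_s)^{\mathbf{1}(z_s^d = u_s)} \left( \frac{\lambda_s}{c_s - 1} \right)^{\mathbf{1}(z_s^d \neq u_s)} \\
&= \left( \prod_{s=1}^r (1 - \lambda_s) \right) \mathbf{1}(z^d = u) + \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \left( \prod_{t \neq s} (1 - \lambda_t) \right) \mathbf{1}_s(z^d, u) + O\left( \sum_{s=1}^r \lambda_s^2 \right) \\
&= \left( 1 - \sum_{s=1}^r \lambda_s \right) \mathbf{1}(z^d = u) + \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \mathbf{1}_s(z^d, u) + O\left( \sum_{s=1}^r \lambda_s^2 \right)
\end{aligned}
\tag{A.3}
\]

where $\mathbf{1}(z^d = u)$ and $\mathbf{1}_s(z^d, u)$ are indicator functions; $\mathbf{1}_s(z^d, u)$ indicates that $z^d$ and $u$ differ only in the $s$th component. Note that if $z^d$ and $u$ differ in more than one component, $L(z^d, u, \lambda) = O(\sum_{s=1}^r \lambda_s^2)$. From (A.3), it is easy to obtain

\[
\begin{aligned}
\left[ \sum_{u \le x^d} L(z^d, u, \lambda) \right]^2 &= \left[ \left( 1 - \sum_{s=1}^r \lambda_s \right) \sum_{u \le x^d} \mathbf{1}(z^d = u) + \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \sum_{u \le x^d} \mathbf{1}_s(z^d, u) + O\left( \sum_{s=1}^r \lambda_s^2 \right) \right]^2 \\
&= \left( 1 - \sum_{s=1}^r \lambda_s \right)^2 \left[ \sum_{u \le x^d} \mathbf{1}(z^d = u) \right]^2 + 2 \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \sum_{u, v \le x^d} \mathbf{1}(z^d = u) \mathbf{1}_s(z^d, v) + O\left( \sum_{s=1}^r \lambda_s^2 \right) \\
&= \left( 1 - 2 \sum_{s=1}^r \lambda_s \right) \left[ \sum_{u \le x^d} \mathbf{1}(z^d = u) \right] + 2 \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \sum_{u, v \le x^d,\, u \neq v} \mathbf{1}(z^d = u) \mathbf{1}_s(u, v) + O\left( \sum_{s=1}^r \lambda_s^2 \right)
\end{aligned}
\tag{A.4}
\]

Here and in the following, for any two vectors $x, y \in \mathbb{R}^r$, $x \le y$ denotes $x_i \le y_i$ for all $i = 1, \ldots, r$, where $x_i$ and $y_i$ are the $i$th components of $x$ and $y$, respectively.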



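We use $f(x^c \mid x^d)$ and $F(x^c \mid x^d)$ to denote the conditional density function and conditional CDF of $X$, respectively. Then,

\[
f(x^c, x^d) = f(x^c \mid x^d)\, p(x^d) \tag{A.5}
\]

\[
F(x^c, x^d) = \sum_{z^d \in S^d,\ z^d \le x^d} F(x^c \mid z^d)\, p(z^d) \tag{A.6}
\]

With (A.5) and (A.6), we can calculate $E[G(\cdot) \sum L(\cdot)]$ in two steps: first, integrate the integrand with respect to $x^c$ conditional on $x^d$, and then take the summation with respect to $x^d$. Thus,

\[
\begin{aligned}
E[\hat F(x^c, x^d)] &= E\left[ G\left( \frac{x^c - X_i^c}{h} \right) \sum_{u \le x^d} L(X_i^d, u, \lambda) \right] \\
&= \sum_{z^d \in S^d} \int G\left( \frac{x^c - z^c}{h} \right) \sum_{u \le x^d} L(z^d, u, \lambda)\, f(z^c \mid z^d)\, p(z^d)\,dz^c \\
&= \sum_{z^d \in S^d} \left( \int G\left( \frac{x^c - z^c}{h} \right) f(z^c \mid z^d)\,dz^c \right) \left( \sum_{u \le x^d} L(z^d, u, \lambda) \right) p(z^d) \\
&= \sum_{z^d \in S^d} \left[ \left( F(x^c \mid z^d) + \frac{\kappa_2}{2} \sum_{j=1}^q F_{jj}^{(2)}(x^c \mid z^d) h_j^2 + O\left( \sum_{j=1}^q h_j^3 \right) \right) \right. \\
&\qquad \left. \times \left( \left( 1 - \sum_{s=1}^r \lambda_s \right) \sum_{u \le x^d} \mathbf{1}(z^d = u) + \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \sum_{u \le x^d} \mathbf{1}_s(z^d, u) + O\left( \sum_{s=1}^r \lambda_s^2 \right) \right) \right] p(z^d) \\
&= F(x^c, x^d) + \sum_{s=1}^r \left[ \frac{1}{c_s - 1} \sum_{z^d \in S^d} \sum_{u \le x^d} \mathbf{1}_s(z^d, u) F(x^c \mid x^d) p(x^d) - F(x^c, x^d) \right] \lambda_s \\
&\qquad + \sum_{j=1}^q \left[ \frac{\kappa_2}{2} F_{jj}^{(2)}(x^c, x^d) \right] h_j^2 + O\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right) \\
&= F(x^c, x^d) + \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s + O\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right)
\end{aligned}
\tag{A.7}
\]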

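where $B_{1j} = (\kappa_2/2) F_{jj}^{(2)}(x^c, x^d)$ and $B_{2s} = \frac{1}{c_s - 1} \sum_{z^d \in S^d} \sum_{u \le x^d} \mathbf{1}_s(z^d, u) F(x^c \mid x^d) p(x^d) - F(x^c, x^d)$.

So we obtain

\[
\mathrm{bias}(\hat F(x^c, x^d)) = \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s + O\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right) \tag{A.8}
\]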

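Similarly, combining (A.2) and (A.4), we have

\[
\begin{aligned}
&E\left[ G^2\left( \frac{x^c - X_i^c}{h} \right) \left[ \sum_{u \le x^d} L(X_i^d, u, \lambda) \right]^2 \right] \\
&\quad = \sum_{z^d \in S^d} \int G^2\left( \frac{x^c - z^c}{h} \right) \left[ \sum_{u \le x^d} L(z^d, u, \lambda) \right]^2 f(z^c \mid z^d)\, p(z^d)\,dz^c \\
&\quad = \sum_{z^d \in S^d} \left[ F(x^c \mid z^d) - \alpha_0 \sum_{j=1}^q F_j^{(1)}(x^c \mid z^d) h_j + O\left( \sum_{j=1}^q h_j^2 \right) \right] \\
&\qquad \times \left[ \left( 1 - 2 \sum_{s=1}^r \lambda_s \right) \sum_{u \le x^d} \mathbf{1}(z^d = u) + 2 \sum_{s=1}^r \frac{\lambda_s}{c_s - 1} \sum_{u, v \le x^d,\, u \neq v} \mathbf{1}(z^d = u) \mathbf{1}_s(u, v) + O\left( \sum_{s=1}^r \lambda_s^2 \right) \right] p(z^d) \\
&\quad = F(x^c, x^d) - \alpha_0 \sum_{j=1}^q F_j^{(1)}(x^c, x^d) h_j + \sum_{s=1}^r \left[ \frac{2}{c_s - 1} \sum_{u, v \le x^d,\, u \neq v} \mathbf{1}_s(u, v) F(x^c \mid u) p(u) - 2F(x^c, x^d) \right] \lambda_s + O\left( \sum_{j=1}^q h_j^2 + \sum_{s=1}^r \lambda_s^2 \right) \\
&\quad = F(x^c, x^d) - \sum_{j=1}^q A_{1j} h_j + \sum_{s=1}^r C_{2s} \lambda_s + O\left( \sum_{j=1}^q h_j^2 + \sum_{s=1}^r \lambda_s^2 \right)
\end{aligned}
\]

where $A_{1j} = \alpha_0 F_j^{(1)}(x^c, x^d)$ and $C_{2s} = \frac{2}{c_s - 1} \sum_{u, v \le x^d,\, u \neq v} \mathbf{1}_s(u, v) F(x^c \mid u) p(u) - 2F(x^c, x^d)$.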



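Hence,

\[
\begin{aligned}
\mathrm{var}[\hat F(x^c, x^d)] &= \frac{1}{n} \mathrm{var}\left[ G\left( \frac{x^c - X_i^c}{h} \right) \sum_{u \le x^d} L(X_i^d, u, \lambda) \right] \\
&= \frac{1}{n} \left\{ E\left[ G^2\left( \frac{x^c - X_i^c}{h} \right) \left( \sum_{u \le x^d} L(X_i^d, u, \lambda) \right)^2 \right] - \left[ E\left[ G\left( \frac{x^c - X_i^c}{h} \right) \sum_{u \le x^d} L(X_i^d, u, \lambda) \right] \right]^2 \right\} \\
&= \frac{1}{n} \left\{ F(x^c, x^d) - \sum_{j=1}^q A_{1j} h_j + \sum_{s=1}^r C_{2s} \lambda_s + O\left( \sum_{j=1}^q h_j^2 + \sum_{s=1}^r \lambda_s^2 \right) \right. \\
&\qquad \left. - \left( F(x^c, x^d) + \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s + O\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right) \right)^2 \right\} \\
&= \frac{1}{n} F(x^c, x^d)(1 - F(x^c, x^d)) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r \left( C_{2s} - 2F(x^c, x^d) B_{2s} \right) \frac{\lambda_s}{n} + O\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 \right) \\
&= \frac{1}{n} F(x^c, x^d)(1 - F(x^c, x^d)) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + O\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 \right)
\end{aligned}
\tag{A.9}
\]

where $A_{2s} = C_{2s} - 2F(x^c, x^d) B_{2s}$.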



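Using (A.8) and (A.9), we have

\[
\begin{aligned}
\mathrm{MSE}(\hat F(x^c, x^d)) &= [\mathrm{bias}(\hat F(x^c, x^d))]^2 + \mathrm{var}(\hat F(x^c, x^d)) \\
&= \left( \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s + O\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right) \right)^2 \\
&\qquad + \frac{1}{n} F(x^c, x^d)(1 - F(x^c, x^d)) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + O\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 \right) \\
&= \frac{1}{n} F(x^c, x^d)(1 - F(x^c, x^d)) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + \left( \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s \right)^2 \\
&\qquad + O\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right)
\end{aligned}
\]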

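Thus, we obtain

\[
\begin{aligned}
\mathrm{MISE}(\hat F(x^c, x^d)) &= \sum_{x^d \in S^d} \int \mathrm{MSE}(\hat F(x^c, x^d))\,dx^c \\
&= \sum_{x^d \in S^d} \int \left\{ \frac{1}{n} F(x^c, x^d)(1 - F(x^c, x^d)) - \sum_{j=1}^q A_{1j} \frac{h_j}{n} + \sum_{s=1}^r A_{2s} \frac{\lambda_s}{n} + \left( \sum_{j=1}^q B_{1j} h_j^2 + \sum_{s=1}^r B_{2s} \lambda_s \right)^2 \right\} dx^c \\
&\qquad + O\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right) \\
&= Z^T \left( \sum_{x^d \in S^d} \int B B^T\,dx^c \right) Z + \frac{1}{n} A^T \tilde Z + \frac{1}{n} \sum_{x^d \in S^d} \int F(x^c, x^d)(1 - F(x^c, x^d))\,dx^c \\
&\qquad + O\left( \frac{1}{n} \sum_{j=1}^q h_j^2 + \frac{1}{n} \sum_{s=1}^r \lambda_s^2 + \sum_{j=1}^q h_j^6 + \sum_{s=1}^r \lambda_s^4 \right)
\end{aligned}
\]

where $Z = (h_1^2, \ldots, h_q^2, \lambda_1, \ldots, \lambda_r)^T$, $\tilde Z = (h_1, \ldots, h_q, \lambda_1, \ldots, \lambda_r)^T$, $B = (B_{11}, \ldots, B_{1q}, B_{21}, \ldots, B_{2r})^T$, and $A = \sum_{x^d \in S^d} \int (A_{11}, \ldots, A_{1q}, A_{21}, \ldots, A_{2r})^T\,dx^c$.

Let $W_i = G((x^c - X_i^c)/h) \sum_{u \le x^d} L(X_i^d, u, \lambda)$. From (A.8), (A.9), and condition (C3), we have

\[
\begin{aligned}
\sqrt{n} \left( \hat F(x^c, x^d) - F(x^c, x^d) - \sum_{j=1}^q B_{1j} h_j^2 - \sum_{s=1}^r B_{2s} \lambda_s \right) &= \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[ W_i - F(x^c, x^d) - \sum_{j=1}^q B_{1j} h_j^2 - \sum_{s=1}^r B_{2s} \lambda_s \right] \\
&= \frac{1}{\sqrt{n}} \sum_{i=1}^n [W_i - E(W_i)] + \sqrt{n}\, O_p\left( \sum_{j=1}^q h_j^3 + \sum_{s=1}^r \lambda_s^2 \right) \\
&\xrightarrow{d} N\big(0,\ F(x^c, x^d)(1 - F(x^c, x^d))\big)
\end{aligned}
\]

by Lyapunov's central limit theorem and $\mathrm{var}\big( (1/\sqrt{n}) \sum_{i=1}^n [W_i - E(W_i)] \big) \to F(x^c, x^d)(1 - F(x^c, x^d))$. This completes the proof of Theorem 1.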

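APPENDIX B. PROOF OF THEOREM 2

Proof of Theorem 2. Recall that

\[
CV(h, \lambda) = \frac{1}{n} \sum_{i=1}^n \sum_{x^d \in S^d} \int \left( I(x^c, X_i^c) I(x^d, X_i^d) - \hat F_{-i}(x^c, x^d) \right)^2 dx^c
\]

where $\hat F_{-i}(x^c, x^d) = \frac{1}{n-1} \sum_{j \neq i} G((x^c - X_j^c)/h) \sum_{u \le x^d} L(X_j^d, u, \lambda)$, $I(x^c, X_i^c) = \mathbf{1}(X_i^c \le x^c)$, and $I(x^d, X_i^d) = \mathbf{1}(X_i^d \le x^d)$.

Let $I_i \equiv I(x, X_i) = I(x^c, X_i^c) I(x^d, X_i^d)$ and $H = CV(h, \lambda) - \frac{1}{n} \sum_{i=1}^n \sum_{x^d \in S^d} \int [I(x, X_i) - F(x^c, x^d)]^2\,dx^c$. For simplicity, we use $\hat F_{-i}$ and $F$ to denote $\hat F_{-i}(x^c, x^d)$ and $F(x^c, x^d)$, respectively, throughout this appendix. Then we have

\[
\begin{aligned}
nH &= \sum_i \sum_{x^d \in S^d} \int \left[ (I_i - \hat F_{-i})^2 - (I_i - F)^2 \right] dx^c \\
&= \sum_i \sum_{x^d \in S^d} \int \left\{ (\hat F_{-i} - F)^2 - 2(I_i - F)(\hat F_{-i} - F) \right\} dx^c \\
&= \sum_i \sum_{x^d \in S^d} \int (\hat F_{-i} - F)^2\,dx^c - 2 \sum_i \sum_{x^d \in S^d} \int (I_i - F)(\hat F_{-i} - F)\,dx^c \\
&\equiv S_1 - 2S_2
\end{aligned}
\tag{B.1}
\]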


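Let $D_i = G((x^c - X_i^c)/h) \sum_{u \le x^d} L(X_i^d, u, \lambda) - F(x^c, x^d)$ and $D_{0i} = I_i - F$. Then

\[
\begin{aligned}
S_1 &= \sum_i \sum_{x^d \in S^d} \int (\hat F_{-i} - F)^2\,dx^c = \sum_i \sum_{x^d \in S^d} \int \left[ \frac{n}{n-1} (\hat F - F) - \frac{1}{n-1} D_i \right]^2 dx^c \\
&= \frac{n^3}{(n-1)^2} \sum_{x^d \in S^d} \int (\hat F - F)^2\,dx^c - \frac{2n}{(n-1)^2} \sum_i \sum_{x^d \in S^d} \int (\hat F - F) D_i\,dx^c + \frac{1}{(n-1)^2} \sum_i \sum_{x^d \in S^d} \int D_i^2\,dx^c \\
&= \frac{n^3 - 2n^2}{(n-1)^2} \sum_{x^d \in S^d} \int (\hat F - F)^2\,dx^c + \frac{1}{(n-1)^2} \sum_i \sum_{x^d \in S^d} \int D_i^2\,dx^c
\end{aligned}
\tag{B.2}
\]

and

\[
\begin{aligned}
S_2 &= \sum_i \sum_{x^d \in S^d} \int (I_i - F)(\hat F_{-i} - F)\,dx^c = \sum_i \sum_{x^d \in S^d} \int (I_i - F) \left[ \frac{n}{n-1} (\hat F - F) - \frac{1}{n-1} D_i \right] dx^c \\
&= \frac{n^2}{n-1} \sum_{x^d \in S^d} \int \left[ \frac{1}{n} \sum_i I_i - F \right] (\hat F - F)\,dx^c - \frac{1}{n-1} \sum_i \sum_{x^d \in S^d} \int (I_i - F) D_i\,dx^c \\
&= \frac{n^2}{n-1} \sum_{x^d \in S^d} \int (F_n - F)(\hat F - F)\,dx^c - \frac{1}{n-1} \sum_i \sum_{x^d \in S^d} \int D_i D_{0i}\,dx^c
\end{aligned}
\tag{B.3}
\]

by noting that $F_n \equiv F_n(x^c, x^d) = \frac{1}{n} \sum_{i=1}^n I(x^c, X_i^c) I(x^d, X_i^d) = \frac{1}{n} \sum_i I_i$.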



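Combining (B.1), (B.2), and (B.3), we have

\[
\begin{aligned}
H = \frac{1}{n} [S_1 - 2S_2] ={}& \left( 1 - \frac{1}{(n-1)^2} \right) \sum_{x^d \in S^d} \int (\hat F - F)^2\,dx^c + \frac{1}{n(n-1)^2} \sum_i \sum_{x^d \in S^d} \int D_i^2\,dx^c \\
&- 2 \left( 1 + \frac{1}{n-1} \right) \sum_{x^d \in S^d} \int (F_n - F)(\hat F - F)\,dx^c + \frac{2}{n(n-1)} \sum_i \sum_{x^d \in S^d} \int D_i D_{0i}\,dx^c
\end{aligned}
\tag{B.4}
\]

Let $m(h, \lambda) = \sum_{x^d \in S^d} \int E(D_i D_{0i})\,dx^c$. Using Lemma B.1 and (B.4), we have that

\[
\begin{aligned}
H + \sum_{x^d \in S^d} \int (F_n - F)^2\,dx^c ={}& \sum_{x^d \in S^d} \int (\hat F - F_n)^2\,dx^c - \frac{1}{(n-1)^2} \sum_{x^d \in S^d} \int (\hat F - F)^2\,dx^c - \frac{2}{n-1} \sum_{x^d \in S^d} \int (F_n - F)(\hat F - F)\,dx^c \\
&+ \frac{1}{n(n-1)^2} \sum_i \sum_{x^d \in S^d} \int D_i^2\,dx^c + \frac{2}{n(n-1)} \sum_i \sum_{x^d \in S^d} \int D_i D_{0i}\,dx^c \\
={}& \sum_{x^d \in S^d} \int (\hat F - F_n)^2\,dx^c + \frac{2}{n-1} m(h, \lambda) + O_p\left( n^{-3/2} + n^{-1} \sum_{j=1}^q h_j^4 + n^{-1} \sum_{s=1}^r \lambda_s^2 \right)
\end{aligned}
\tag{B.5}
\]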

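Recall that $W_i = G((x^c - X_i^c)/h) \sum_{u \le x^d} L(X_i^d, u, \lambda)$. We have that

\[
\begin{aligned}
\sum_{x^d \in S^d} \int (\hat F - F_n)^2\,dx^c &= \sum_{x^d \in S^d} \int \left[ \frac{1}{n} \sum_{i=1}^n W_i - \frac{1}{n} \sum_{i=1}^n I_i \right]^2 dx^c \\
&= \frac{1}{n^2} \sum\sum_{i \neq j} \sum_{x^d \in S^d} \int (W_i - I_i)(W_j - I_j)\,dx^c + \frac{1}{n^2} \sum_i \sum_{x^d \in S^d} \int (W_i - I_i)^2\,dx^c \\
&= \frac{1}{n^2} \sum\sum_{i \neq j} g(X_i, X_j) + \frac{1}{n^2} \sum_{i=1}^n g(X_i, X_i) \equiv S + T
\end{aligned}
\tag{B.6}
\]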



where the definitions of $S$ and $T$ are obvious, and $g(X_i, X_j) = \sum_{x^d \in S^d} \int (W_i - I_i)(W_j - I_j)\,dx^c$. We can see that $S$ is a second-order U-statistic. Define $g_1(x) = E[g(x, X_1)]$ and $g_0 = E[g_1(X_1)]$; then we have $g_1(X_i) = E[g(X_i, X_j) \mid X_i]$ and $g_1(X_j) = E[g(X_i, X_j) \mid X_j]$ if $i \neq j$. Using the Hoeffding decomposition, we have

\[
\begin{aligned}
S &= n^{-2} \sum\sum_{i \neq j} g(X_i, X_j) \\
&= n^{-2} \sum\sum_{i \neq j} \left\{ g(X_i, X_j) - g_1(X_i) - g_1(X_j) + g_0 \right\} + 2\, \frac{1}{n} \left( 1 - \frac{1}{n} \right) \sum_{i=1}^n \left\{ g_1(X_i) - g_0 \right\} + \left( 1 - \frac{1}{n} \right) g_0 \\
&\equiv S^{(1)} + S^{(2)} + \left( 1 - \frac{1}{n} \right) g_0
\end{aligned}
\tag{B.7}
\]

where the definitions of $S^{(1)}$ and $S^{(2)}$ are obvious. Then by the law of iterated expectations, we have

$$
E(S^{(1)}) = n^{-2}\sum_{i\ne j}\big[E(g(X_i,X_j)) - E(g_1(X_i)) - E(g_1(X_j)) + g_0\big] = 0,
\tag{B.8}
$$

$$
E(S^{(2)}) = 2n^{-1}(1-n^{-1})\sum_{i=1}^{n}\big(E(g_1(X_i)) - g_0\big) = 0.
\tag{B.9}
$$
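The Hoeffding decomposition of $S$ is an exact algebraic identity once $g_1$ and $g_0$ are fixed, which makes it easy to sanity-check. The sketch below uses a toy symmetric kernel $g(x, y) = xy$ with Uniform(0, 1) data, for which $g_1(x) = x/2$ and $g_0 = 1/4$; the kernel, the data distribution and the variable names are assumptions chosen for illustration and are not the paper's $g$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
X = rng.uniform(size=n)                 # i.i.d. Uniform(0, 1) draws


def g(x, y):
    """Toy symmetric kernel (an assumption; not the paper's g)."""
    return x * y


def g1(x):
    """g1(x) = E[g(x, X_1)] = x / 2 for Uniform(0, 1) data."""
    return x / 2.0


g0 = 0.25                               # g0 = E[g1(X_1)] = 1/4

gm = g(X[:, None], X[None, :])          # matrix of g(X_i, X_j)
off = ~np.eye(n, dtype=bool)            # mask selecting pairs with i != j

S = gm[off].sum() / n**2
centered = gm - g1(X)[:, None] - g1(X)[None, :] + g0
S1 = centered[off].sum() / n**2                        # degenerate part S^(1)
S2 = 2.0 / n * (1.0 - 1.0 / n) * (g1(X) - g0).sum()    # linear part S^(2)

hoeffding_gap = S - (S1 + S2 + (1.0 - 1.0 / n) * g0)
print(abs(hoeffding_gap))
```

The gap is zero up to floating-point rounding for any sample, since the identity does not rely on the expectations being taken under the true distribution; that property only matters for the zero-mean claims on $S^{(1)}$ and $S^{(2)}$.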



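Also, it is easy to see that each summand of $S^{(1)}$ has zero conditional mean given $X_i$ (or given $X_j$), so that $E[S^{(1)}\mid X_i] = 0$ for all $i = 1, \ldots, n$, and that $E[g_1(X_i) - g_0 \mid X_j] = 0$ for $j \ne i$, since $X_i$ and $X_j$ are independent. Thus, all cross-product terms vanish and we have

$$
\begin{aligned}
E\big(S^{(1)}\big)^2 &= E\Big[n^{-2}\sum_{i\ne j}\big(g(X_i,X_j)-g_1(X_i)-g_1(X_j)+g_0\big)\Big]^2 \\
&= \frac{2}{n^4}\sum_{i\ne j} E\big(g(X_i,X_j)-g_1(X_i)-g_1(X_j)+g_0\big)^2
\end{aligned}
\tag{B.10}
$$

(the factor 2 arises because $g$ is symmetric, so the ordered pairs $(i,j)$ and $(j,i)$ contribute the same term), and

$$
\begin{aligned}
E\big(S^{(2)}\big)^2 &= E\Big[2n^{-1}(1-n^{-1})\sum_{i=1}^{n}\big(g_1(X_i)-g_0\big)\Big]^2 \\
&= \frac{4(n-1)^2}{n^4}\sum_{i=1}^{n} E\big[g_1(X_i)-g_0\big]^2 .
\end{aligned}
\tag{B.11}
$$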
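The second-moment expressions for $S^{(1)}$ and $S^{(2)}$ can be verified exactly on a small discrete problem by listing every possible sample. The sketch below is only an illustration under stated assumptions: a toy variable taking the values 0, 1, 2 with equal probability, $n = 3$, and a made-up symmetric kernel. For a symmetric kernel the exact constant in front of the double sum for $S^{(1)}$ is $2/n^4$, which is what the check uses.

```python
import itertools
import numpy as np

vals = np.array([0.0, 1.0, 2.0])        # support of a toy discrete X, equal probabilities
n = 3                                   # tiny sample so all 3**n samples can be listed


def g(x, y):
    """Toy symmetric kernel (an assumption; not the paper's g)."""
    return (x + 1.0) * (y + 1.0) + abs(x - y)


Gm = np.array([[g(a, b) for b in vals] for a in vals])
g1 = Gm.mean(axis=1)                    # g1(x) = E[g(x, X_1)]
g0 = g1.mean()                          # g0 = E[g1(X_1)]
Hc = Gm - g1[:, None] - g1[None, :] + g0    # degenerate (centered) kernel

s1, s2 = [], []
for idx in itertools.product(range(len(vals)), repeat=n):   # equally likely samples
    s1.append(sum(Hc[idx[i], idx[j]]
                  for i in range(n) for j in range(n) if i != j) / n**2)
    s2.append(2.0 / n * (1.0 - 1.0 / n) * sum(g1[k] - g0 for k in idx))
s1, s2 = np.array(s1), np.array(s2)

mean_s1, mean_s2 = s1.mean(), s2.mean()             # both should be zero
second_s1, second_s2 = (s1**2).mean(), (s2**2).mean()

# Right-hand sides: sums over i != j (n(n-1) terms) and over i (n terms)
rhs_s1 = 2.0 / n**4 * n * (n - 1) * (Hc**2).mean()
rhs_s2 = 4.0 * (n - 1) ** 2 / n**4 * n * ((g1 - g0) ** 2).mean()
```

Comparing `second_s1` with `rhs_s1` and `second_s2` with `rhs_s2` confirms that every cross-product term drops out, which is the step the conditional-mean argument justifies.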




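From Lemma B.2 and (B.10), (B.11), we have

$$
\begin{aligned}
E\big(S^{(1)}\big)^2 &= O\Big(\frac{1}{n^4}(n^2-n)\big(E[g(X_i,X_j)^2] + E[g_1(X_1)^2] + g_0^2\big)\Big) \\
&= O\Big(n^{-2}\Big(\sum_{j=1}^{q} h_j^{3q} + \sum_{j=1}^{q} h_j^{2q+4} + \sum_{j=1}^{q} h_j^{8} + \sum_{s=1}^{r}\lambda_s^{4}\Big)\Big)
\end{aligned}
$$

and

$$
E\big(S^{(2)}\big)^2 = \frac{4(n-1)^2}{n^4}\sum_{i=1}^{n} E\big[g_1(X_i)-g_0\big]^2
= O\Big(n^{-1}\Big(\sum_{j=1}^{q} h_j^{2q+4} + \sum_{j=1}^{q} h_j^{8} + \sum_{s=1}^{r}\lambda_s^{4}\Big)\Big).
$$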


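Also, $E[g(X_1,X_1)^2] = E\big[\sum_{x^d\in S^d}\int (W_1-I_1)^2\,dx^c\big]^2 = O(1)$ implies $\mathrm{Var}(T) = \mathrm{Var}\big(n^{-2}\sum_i g(X_i,X_i)\big) = (1/n^3)\,\mathrm{Var}(g(X_i,X_i)) = O(n^{-3})$. Combining (B.6) and (B.7), we have $\sum_{x^d\in S^d}\int (\hat F-F_n)^2\,dx^c = S + T = S^{(1)} + S^{(2)} + (1-n^{-1})g_0 + T$. With (B.8) and (B.9), we have

$$
E\Big[\sum_{x^d\in S^d}\int (\hat F-F_n)^2\,dx^c\Big]
= E(S^{(1)}) + E(S^{(2)}) + (1-n^{-1})g_0 + E(T)
= (1-n^{-1})g_0 + E(T).
\tag{B.12}
$$

Using (B.10), (B.11), and (B.12), we can see that

$$
\begin{aligned}
E\Big(\sum_{x^d\in S^d}\int (\hat F-F_n)^2\,dx^c - (1-n^{-1})g_0 - E(T)\Big)^2
&= E\big[S + T - (1-n^{-1})g_0 - E(T)\big]^2 \\
&= E\big[S^{(1)} + S^{(2)} + (T - E(T))\big]^2 \\
&= O\big(E(S^{(1)})^2 + E(S^{(2)})^2 + \mathrm{Var}(T)\big) \\
&= O\Big(n^{-3} + n^{-2}\sum_{j=1}^{q} h_j^{3q} + n^{-1}\sum_{j=1}^{q} h_j^{2q+4} + n^{-1}\sum_{j=1}^{q} h_j^{8} + n^{-1}\sum_{s=1}^{r}\lambda_s^{4}\Big).
\end{aligned}
$$

Hence,

$$
\begin{aligned}
\sum_{x^d\in S^d}\int (\hat F-F_n)^2\,dx^c
&= E(T) + (1-n^{-1})g_0 \\
&\quad + O_p\Big(n^{-3/2} + n^{-1}\sum_{j=1}^{q} h_j^{3q/2} + n^{-1/2}\sum_{j=1}^{q} h_j^{q+2} + n^{-1/2}\sum_{j=1}^{q} h_j^{4} + n^{-1/2}\sum_{s=1}^{r}\lambda_s^{2}\Big).
\end{aligned}
\tag{B.13}
$$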





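Combining (B.5), (B.12), and (B.13), we have

$$
\begin{aligned}
H + \sum_{x^d\in S^d}\int (F_n-F)^2\,dx^c
&= E(T) + (1-n^{-1})g_0 + 2(n-1)^{-1}m(h,\lambda) \\
&\quad + O_p\Big(n^{-3/2} + n^{-1}\sum_{j=1}^{q} h_j^{3q/2} + n^{-1/2}\sum_{j=1}^{q} h_j^{q+2} + n^{-1/2}\sum_{j=1}^{q} h_j^{4} + n^{-1/2}\sum_{s=1}^{r}\lambda_s^{2}\Big) \\
&= E\Big[\sum_{x^d\in S^d}\int (\hat F-F_n)^2\,dx^c\Big] + 2(n-1)^{-1}m(h,\lambda) \\
&\quad + O_p\Big(n^{-3/2} + n^{-1}\sum_{j=1}^{q} h_j^{3q/2} + n^{-1/2}\sum_{j=1}^{q} h_j^{q+2} + n^{-1/2}\sum_{j=1}^{q} h_j^{4} + n^{-1/2}\sum_{s=1}^{r}\lambda_s^{2}\Big).
\end{aligned}
\tag{B.14}
$$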




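It is easy to see that

$$
\begin{aligned}
E\Big[\sum_{x^d\in S^d}\int (\hat F-F_n)^2\,dx^c\Big]
&= \sum_{x^d\in S^d}\int E[(\hat F-F)^2]\,dx^c
 + E\Big[\sum_{x^d\in S^d}\int (F_n-F)^2\,dx^c\Big]
 - 2E\Big[\sum_{x^d\in S^d}\int (\hat F-F)(F_n-F)\,dx^c\Big] \\
&= \sum_{x^d\in S^d}\int E[(\hat F-F)^2]\,dx^c
 + \sum_{x^d\in S^d}\int E[(F_n-F)^2]\,dx^c
 - 2\,\frac{1}{n}\sum_{x^d\in S^d}\int E[(W_i-F)(I_i-F)]\,dx^c \\
&= \sum_{x^d\in S^d}\int E[(\hat F-F)^2]\,dx^c
 + \sum_{x^d\in S^d}\int E[(F_n-F)^2]\,dx^c
 - \frac{2}{n}\,m(h,\lambda).
\end{aligned}
$$

The second equality holds because $E[D_iD_{0j}] = E[D_i]\,E[D_{0j}] = 0$ for $i \ne j$, by independence and $E[D_{0j}] = E[I_j] - F = 0$, so only the $n$ terms with $i = j$ survive in $E[(\hat F-F)(F_n-F)] = n^{-2}\sum_i\sum_j E[D_iD_{0j}]$.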
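The cross term in the display above reduces to $m(h,\lambda)/n$ because only the pairs with $i = j$ contribute. A pointwise version of this fact, $E[(\hat F - F)(F_n - F)] = n^{-1}E[D_iD_{0i}]$ at a fixed evaluation point, can be confirmed exactly by enumeration. The sketch below assumes a toy ordered discrete variable on $\{0, 1, 2\}$ with equal probabilities and a hypothetical discrete kernel that puts weight 1 on a match and $\lambda$ otherwise; neither is taken from the paper.

```python
import itertools
import numpy as np

support = [0, 1, 2]                      # toy ordered discrete support, equal probabilities
n, lam, x = 3, 0.1, 1                    # sample size, smoothing parameter, evaluation point


def L(xi, u, lam):
    """Hypothetical discrete kernel: weight 1 on a match, lam otherwise."""
    return 1.0 if xi == u else lam


def W(xi):
    """Smoothed indicator: sum of L(xi, u, lam) over support points u <= x."""
    return sum(L(xi, u, lam) for u in support if u <= x)


def I(xi):
    """Unsmoothed indicator 1(xi <= x)."""
    return 1.0 if xi <= x else 0.0


F = np.mean([I(v) for v in support])                            # true CDF at x
m_point = np.mean([(W(v) - F) * (I(v) - F) for v in support])   # E[D_i D_0i] at x

cross = []
for sample in itertools.product(support, repeat=n):             # equally likely samples
    F_hat = np.mean([W(v) for v in sample])
    F_n = np.mean([I(v) for v in sample])
    cross.append((F_hat - F) * (F_n - F))
cross_mean = np.mean(cross)                                     # exact expectation

print(cross_mean, m_point / n)
```

The identity needs only independence across observations and $E[I_i] = F$; it does not require $W_i$ to be unbiased for $F$, which is why it holds for any smoothing parameters.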

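Also, we have

$$
\begin{aligned}
m(h,\lambda) &= \sum_{x^d\in S^d}\int E[D_iD_{0i}]\,dx^c \\
&= \sum_{x_1^d\in S^d}\int \Big\{\sum_{x^d\in S^d}\int \Big(G(v)\sum_{u\le x^d} L(x_1^d,u,\lambda) - F(x_1^c+hv,\,x^d)\Big) \\
&\qquad\qquad \times \Big(I(x_1^c+hv,\,x_1^c)\,I(x^d,x_1^d) - F(x_1^c+hv,\,x^d)\Big)\,h\,dv\Big\}\, f(x_1^c,x_1^d)\,dx_1^c \\
&= O\Big(\sum_{j=1}^{q} h_j^{q}\Big).
\end{aligned}
$$

Thus, we have

$$
E\Big[\sum_{x^d\in S^d}\int (\hat F-F_n)^2\,dx^c\Big]
= \sum_{x^d\in S^d}\int E[(\hat F-F)^2]\,dx^c
+ \sum_{x^d\in S^d}\int E[(F_n-F)^2]\,dx^c
+ O\Big(n^{-1}\sum_{j=1}^{q} h_j^{q}\Big).
\tag{B.15}
$$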




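Combining (B.14) and (B.15), we obtain that

$$
\begin{aligned}
H + \sum_{x^d\in S^d}\int (F_n-F)^2\,dx^c - \sum_{x^d\in S^d}\int E[(F_n-F)^2]\,dx^c
&= \sum_{x^d\in S^d}\int E[(\hat F-F)^2]\,dx^c \\
&\quad + O_p\Big(n^{-3/2} + n^{-1}\sum_{j=1}^{q} h_j^{q} + n^{-1/2}\sum_{j=1}^{q} h_j^{q+2} + n^{-1/2}\sum_{j=1}^{q} h_j^{4} + n^{-1/2}\sum_{s=1}^{r}\lambda_s^{2}\Big).
\end{aligned}
$$

That is,

$$
CV(h,\lambda) + J_n = MISE(h,\lambda)
+ O_p\Big(n^{-3/2} + n^{-1}\sum_{j=1}^{q} h_j^{q} + n^{-1/2}\sum_{j=1}^{q} h_j^{q+2} + n^{-1/2}\sum_{j=1}^{q} h_j^{4} + n^{-1/2}\sum_{s=1}^{r}\lambda_s^{2}\Big).
$$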
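The relation between $CV(h,\lambda) + J_n$ and $MISE(h,\lambda)$ is what motivates choosing the smoothing parameters by minimizing the cross-validation criterion. As a rough illustration, the sketch below evaluates a leave-one-out criterion of the form $n^{-1}\sum_i \int (I_i - \hat F_{-i})^2\,dx^c$ on a bandwidth grid for a single continuous variable. That form is an assumption consistent with the terms $S_1$ and $S_2$ used in this proof; the logistic kernel CDF, the Riemann-sum integration grid, the simulated data and all names are likewise illustrative choices rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
X = rng.normal(size=n)                       # toy continuous sample
grid = np.linspace(-4.0, 4.0, 401)           # integration grid for x^c
dx = grid[1] - grid[0]


def G(z):
    """Logistic kernel CDF (illustrative choice of G)."""
    return 0.5 * (1.0 + np.tanh(z / 2.0))


def cv(h):
    """Riemann-sum approximation of n^{-1} sum_i int (I_i - F_hat_{-i})^2 dx."""
    W = G((grid[None, :] - X[:, None]) / h)          # smoothed indicators W_i(x)
    I = (X[:, None] <= grid[None, :]).astype(float)  # indicators 1(X_i <= x)
    F_loo = (W.sum(axis=0) - W) / (n - 1)            # leave-one-out estimators
    return ((I - F_loo) ** 2).sum() * dx / n


h_grid = np.geomspace(0.01, 1.0, 25)         # candidate bandwidths
cv_vals = np.array([cv(h) for h in h_grid])
h_best = h_grid[np.argmin(cv_vals)]
print(f"bandwidth minimizing the approximate CV criterion: {h_best:.3f}")
```

A grid search is used here only for transparency; with several continuous and discrete variables one would minimize over $(h_1, \ldots, h_q, \lambda_1, \ldots, \lambda_r)$ jointly with a numerical optimizer.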


Essentially, we have bounded the second moment of $CV(h,\lambda) + J_n - MISE(h,\lambda)$. Applying Markov's inequality to the left-hand side of (B.16) and Rosenthal's inequality (see Hall & Heyde, 1980, p. 23) to $S^{(1)}$ in (B.7), and repeating the previous proof, we can bound the moments of every order of $CV(h,\lambda) + J_n - MISE(h,\lambda)$. With the aid of $n^{\delta}$ and the differentiability of the kernel function, we can get

$$
P\Big\{\sup_{h,\lambda}\,\big|CV(h,\lambda) + J_n - MISE(h,\lambda)\big| >
\Big(n^{-3/2} + n^{-1}\sum_{j=1}^{q} h_j^{q} + n^{-1/2}\sum_{j=1}^{q} h_j^{q+2} + n^{-1/2}\sum_{j=1}^{q} h_j^{4} + n^{-1/2}\sum_{s=1}^{r}\lambda_s^{2}\Big)\,n^{\delta}\Big\}
= O(n^{-\gamma})
\tag{B.16}
$$

for arbitrarily large $\gamma$. Then by the Borel–Cantelli lemma, we obtain the uniform strong convergence. This completes the proof of Theorem 2.

Lemma B.1.

(i) $\displaystyle \sum_{x^d\in S^d}\int (\hat F-F)^2\,dx^c + \sum_{x^d\in S^d}\int (F_n-F)(\hat F-F)\,dx^c = O_p\Big(n^{-1} + \sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r}\lambda_s^2\Big)$.

(ii) $\displaystyle n^{-3}\sum_{i}\sum_{x^d\in S^d}\int D_i^2\,dx^c = O_p(n^{-2})$ and $\displaystyle n^{-2}\sum_{i}\sum_{x^d\in S^d}\int D_iD_{0i}\,dx^c = n^{-1}\sum_{x^d\in S^d}\int E(D_iD_{0i})\,dx^c + O_p(n^{-3/2})$.



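Proof. From (A.8) and (A.9), we have $\hat F - F = O_p\big(n^{-1/2} + \sum_{j=1}^{q} h_j^2 + \sum_{s=1}^{r}\lambda_s\big)$. So we have $\sum_{x^d\in S^d}\int (\hat F-F)^2\,dx^c = O_p\big(n^{-1} + \sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r}\lambda_s^2\big)$. It is easy to see that $E[F_n(x^c,x^d)] = E[I(x,X_i)] = F(x^c,x^d)$ and $\mathrm{Var}(F_n(x^c,x^d)) = n^{-1}\{E[I(x,X_i)^2] - (E[I(x,X_i)])^2\} = n^{-1}F(x^c,x^d)[1-F(x^c,x^d)]$. Thus, we have $E[F_n(x^c,x^d) - F(x^c,x^d)]^2 = \mathrm{Var}[F_n(x^c,x^d)] = O(1/n)$, which implies $F_n(x^c,x^d) - F(x^c,x^d) = O_p(n^{-1/2})$. Since $|ab| \le (a^2+b^2)/2$, the product of this term with $\hat F - F$ gives $\sum_{x^d\in S^d}\int (F_n-F)(\hat F-F)\,dx^c = O_p\big(n^{-1} + \sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r}\lambda_s^2\big)$, which proves (i).

From the law of large numbers and the central limit theorem, we get that $n^{-1}\sum_i D_i^2 = O_p(1)$ and $n^{-1}\sum_i D_iD_{0i} = E(D_iD_{0i}) + O_p(n^{-1/2})$. Therefore, $n^{-3}\sum_i\sum_{x^d\in S^d}\int D_i^2\,dx^c = n^{-2}(1/n)\sum_i\sum_{x^d\in S^d}\int D_i^2\,dx^c = O_p(n^{-2})$ and $n^{-2}\sum_i\sum_{x^d\in S^d}\int D_iD_{0i}\,dx^c = n^{-1}\sum_{x^d\in S^d}\int E(D_iD_{0i})\,dx^c + O_p(n^{-3/2})$. This completes the proof of this lemma.

Lemma B.2. (i) $E[g(X_1,X_2)^2] = O\big(\sum_{j=1}^{q} h_j^{3q}\big)$; (ii) $E[g_1(X_1)^2] = O\big(\sum_{j=1}^{q} h_j^{2q+4} + \sum_{s=1}^{r}\lambda_s^4\big)$; (iii) $g_0 = O\big(\sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r}\lambda_s^2\big)$.

Proof. Using the change of variables, we have

$$
\begin{aligned}
E[g(X_1,X_2)^2]
&= \sum_{x_1^d\in S^d}\sum_{x_2^d\in S^d}\iint \Big\{\sum_{x^d\in S^d}\int
 \Big[G\Big(\frac{x^c-x_1^c}{h}\Big)\sum_{u\le x^d} L(x_1^d,u,\lambda) - I(x^c,x_1^c)\,I(x^d,x_1^d)\Big] \\
&\qquad \times \Big[G\Big(\frac{x^c-x_2^c}{h}\Big)\sum_{u\le x^d} L(x_2^d,u,\lambda) - I(x^c,x_2^c)\,I(x^d,x_2^d)\Big] dx^c\Big\}^2
 f(x_1^c,x_1^d,x_2^c,x_2^d)\,dx_1^c\,dx_2^c \\
&= \sum_{x_1^d\in S^d}\sum_{x_2^d\in S^d}\iint \Big\{\sum_{x^d\in S^d}\int
 \Big[G(v)\sum_{u\le x^d} L(x_1^d,u,\lambda) - I(x_1^c+hv,\,x_1^c)\,I(x^d,x_1^d)\Big] \\
&\qquad \times \Big[G\Big(v+\frac{x_1^c-x_2^c}{h}\Big)\sum_{u\le x^d} L(x_2^d,u,\lambda) - I(x_1^c+hv,\,x_2^c)\,I(x^d,x_2^d)\Big] h\,dv\Big\}^2
 f(x_1^c,x_1^d,x_2^c,x_2^d)\,dx_1^c\,dx_2^c \\
&= \sum_{x_1^d\in S^d}\sum_{x_2^d\in S^d}\iint \Big\{\sum_{x^d\in S^d}\int
 \Big[G(v)\sum_{u\le x^d} L(x_1^d,u,\lambda) - I(hv,0)\,I(x^d,x_1^d)\Big] \\
&\qquad \times \Big[G(v+w)\sum_{u\le x^d} L(x_2^d,u,\lambda) - I(h(v+w),0)\,I(x^d,x_2^d)\Big] h\,dv\Big\}^2
 f(x_2^c+hw,\,x_1^d,\,x_2^c,\,x_2^d)\,h\,dw\,dx_2^c \\
&= O\Big(\sum_{j=1}^{q} h_j^{3q}\Big).
\end{aligned}
$$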




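From (A.7) and $E(I_i) = F(x^c, x^d)$, we obtain $E[W_1 - I_1] = O\big(\sum_{j=1}^{q} h_j^2 + \sum_{s=1}^{r}\lambda_s\big)$. Then we have

$$
\begin{aligned}
E[g_1(X_1)^2]
&= E\Big\{\sum_{x^d\in S^d}\int (W_1-I_1)\,E[W_1-I_1]\,dx^c\Big\}^2 \\
&= \big(E[W_1-I_1]\big)^2 \sum_{x_1^d\in S^d}\int \Big\{\sum_{x^d\in S^d}\int
 \Big[G\Big(\frac{x^c-x_1^c}{h}\Big)\sum_{u\le x^d} L(x_1^d,u,\lambda) - I(x^c,x_1^c)\,I(x^d,x_1^d)\Big] dx^c\Big\}^2 f(x_1^c,x_1^d)\,dx_1^c \\
&= O\Big(\sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r}\lambda_s^2\Big)
 \sum_{x_1^d\in S^d}\int \Big\{\sum_{x^d\in S^d}\int
 \Big[G(v)\sum_{u\le x^d} L(x_1^d,u,\lambda) - I(x_1^c+hv,\,x_1^c)\,I(x^d,x_1^d)\Big] h\,dv\Big\}^2 f(x_1^c,x_1^d)\,dx_1^c \\
&= O\Big(\sum_{j=1}^{q} h_j^{2q+4} + \sum_{s=1}^{r}\lambda_s^4\Big).
\end{aligned}
$$

It is easy to see that $g_0 = E[g_1(X_1)] = (E[W_1 - I_1])^2 = O\big(\sum_{j=1}^{q} h_j^4 + \sum_{s=1}^{r}\lambda_s^2\big)$, which completes the proof.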