Fields Institute Communications Volume 44, 2004

Empirical processes based on pseudo-observations II: the multivariate case

Kilani Ghoudi, United Arab Emirates University, P.O. Box 17555, Al Ain, United Arab Emirates. [email protected]

Bruno Rémillard, HEC Montréal, 3000, chemin de la Côte-Sainte-Catherine, Montréal (Québec) Canada H3T 2A7. [email protected]

This paper is dedicated to Miklós Csörgő on his 70th birthday.

Abstract. One often needs to estimate the distribution function of a random vector $\varepsilon = H(X)$, where $H$ is unknown and might depend on the law of $X$. When $H$ is estimated by some $H_n$, using a sample $X_1, \ldots, X_n$, the $H_n(X_i)$'s are termed pseudo-observations. In a semiparametric context, one often wants to estimate parameters related to the law of the non-observable $\varepsilon$. The transformed data $H_n(X_1), \ldots, H_n(X_n)$ are then naturally used, introducing dependence. Classical techniques do not apply, and hard work is needed to get the asymptotic behaviour of estimators and empirical processes. The aim of this paper is to give a unified treatment of inference procedures based on pseudo-observations in the multivariate setting. Examples of applications are given.

1 Introduction

There is always a price to pay for not being able to observe a random variable. To illustrate this statement, consider the following well-known model:
$$X_i = \mu + \varepsilon_i,$$

2000 Mathematics Subject Classification. Primary 60F05; Secondary 62E20.
Key words and phrases. Empirical process, pseudo-observations, semi-parametric models.
This work is supported in part by the Fonds Institutionnel de Recherche, Université du Québec à Trois-Rivières, the Fonds pour la formation de chercheurs et l'aide à la recherche du Gouvernement du Québec and by the Natural Sciences and Engineering Research Council of Canada.

© 2004 American Mathematical Society


where the observable $X_i$'s are independent and identically distributed and $\mu$ is the unknown mean. If one is interested in estimating the common distribution function $K(t) = F(t + \mu)$ of the $\varepsilon_i$'s, where $F$ is the distribution function of the $X_i$'s, it is reasonable to estimate it by $K_n(t) = F_n(t + \bar X_n)$, where $F_n$ is the empirical distribution function of the $X_i$'s. Note that $K_n$ is the empirical distribution function of the residuals $e_{i,n} = X_i - \bar X_n$. It is easy to see that, under additional conditions, the process $\mathbb{K}_n = \sqrt{n}\,(K_n - K)$ converges in law to a continuous Gaussian process having representation $B \circ K(t) + Z\, k(t)$, where $B$ is a Brownian bridge and $Z$ is a Gaussian random variable depending on $B$. The limit differs by the term $Z\, k(t)$ from the usual limit $B \circ K(t)$. So the price to pay here for using residuals instead of the real random variables is to assume the existence of the density $k$ and to accept a limit which is more complicated than the usual one. In what follows, we will see that this is typical of pseudo-observations: there is always a price to pay in additional assumptions, and the limit is more complicated.

One of the first appearances of pseudo-observations was in the regression setting, including the study of time series. Residuals are used instead of the non-observable error terms to construct empirical distribution functions for goodness-of-fit tests, prediction intervals, and so on. Empirical processes built from residuals have received considerable attention lately; see for example Loynes (1980), Shorack (1984), Meester and Lockhart (1988), Koul and Ossiander (1994), Koul (1996), Kulperger (1996) and Mammen (1996). Only recently have pseudo-observations which are not residuals appeared in the statistical literature. In estimation problems in the context of multivariate extreme value theory or in the context of dependence structures, e.g.
Abdous et al. (1999), Capéraà et al. (1997), Genest et al. (1995) and Tawn (1988), univariate marginal distributions are often considered as nuisance parameters; one would prefer to work with specified marginals. Specifically, let $X$ be an $\mathbb{R}^d$-valued random variable with continuous distribution function $F$ and univariate marginal distributions $F_1, \ldots, F_d$. Then estimation procedures are related to a non-observable random vector of the form
$$\varepsilon = H(X) = \left( G^{-1} \circ F_1(X^{(1)}), \ldots, G^{-1} \circ F_d(X^{(d)}) \right),$$
where $G$, the objective marginal distribution, is given and the marginals $F_1, \ldots, F_d$ are not known. The so-called copula is just the distribution function of $\varepsilon$ when $G$ is the identity function. It is well known that the copula characterizes the dependence structure; see for example Nelsen (1999). To estimate parameters of the law of $\varepsilon$ or its distribution function, it is natural to replace the unknown marginals by their empirical counterparts, obtaining the pseudo-observations $e_{i,n} = H_n(X_i)$, where
$$H_n(x) = \left( G^{-1} \circ F_{1n}(x^{(1)}), \ldots, G^{-1} \circ F_{dn}(x^{(d)}) \right),$$
and where the $F_{jn}$'s are the empirical marginal distribution functions. In all cases, transforming the observations introduces dependence and classical techniques do not work. So far, only case-by-case solutions have been proposed to tackle the asymptotic behaviour of estimators and empirical processes. The aim of this paper is to give a unified treatment of inference procedures based on pseudo-observations in the multivariate setting. The first step in that direction was taken in Ghoudi and Rémillard (1998), where the univariate case was treated.
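The rank transform just described is easy to sketch numerically. The following illustrative code (not from the paper; all names are ours) computes the pseudo-observations $e_{i,n}$ when $G$ is the identity, so that $G^{-1} \circ F_{jn}$ reduces to the normalized ranks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative bivariate sample with dependent components
n = 1000
z = rng.normal(size=n)
x = np.column_stack([z + rng.normal(size=n), z + rng.normal(size=n)])

def pseudo_observations(x):
    """Return e_{i,n} = (F_{1n}(x_i^{(1)}), ..., F_{dn}(x_i^{(d)})), where
    F_{jn} is the empirical marginal distribution function (ranks / n)."""
    n, d = x.shape
    e = np.empty_like(x, dtype=float)
    for j in range(d):
        # F_{jn}(x_i^{(j)}) = #{k : x_k^{(j)} <= x_i^{(j)}} / n
        e[:, j] = (np.argsort(np.argsort(x[:, j])) + 1) / n
    return e

e = pseudo_observations(x)
# Each transformed margin takes exactly the values 1/n, 2/n, ..., 1
```

These $e_{i,n}$ are the data actually fed to the empirical distribution function $K_n$; the dependence they introduce across observations is precisely what the limit theory below accounts for.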


Notations and sufficient conditions for the convergence of the empirical processes based on pseudo-observations are stated in Section 2. Examples of pseudo-observations are given in Section 3, with applications to the serial copula process (Subsection 3.1), copulas and semiparametric estimation (Subsection 3.2), multivariate regression (Subsection 3.3) and serial independence tests for time series (Subsection 3.4). A technique to extend some classical statistics is given in Subsection 3.5. A sketch of the proofs of the main result and its corollary is then given in Section 4. The details of the argument can be found in Sections 5 and 6. In Section 7, more tractable conditions are given for some linear models in order to satisfy the hypotheses of Section 2.

Note that all results could have been proven using the so-called modern theory of empirical processes (e.g. van der Vaart and Wellner 1996). However, the proofs would not have been shortened, nor would the reader have benefited from such an approach. For the applications we have in mind, the classical treatment of empirical processes is well suited and the conditions of application are easy to verify.

2 Notations and results

Consider an observable $\mathcal{X}$-valued random variable $X$ and let $\{X_i\}_{i \ge 1}$ be observations of $X$ so that the series is stationary and ergodic and so that the (non-observable) $\{\varepsilon_i = H(X_i)\}_{i \ge 1}$ are random variables with values in a rectangle $T \subset \mathbb{R}^d$, which is a product of intervals of $\mathbb{R}$, that is $T = T_1 \times T_2 \times \cdots \times T_d$. Given an estimate $H_n$ of $H$, the pseudo-observations $\{e_{i,n}\}_{1 \le i \le n}$ are defined by $e_{i,n} = H_n(X_i)$, $1 \le i \le n$. Suppose that the pseudo-observations also have values in $T$. Then the empirical distribution function $K_n$ based on these pseudo-observations is defined by
$$K_n(t) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}\{e_{i,n} \le t\}, \qquad t \in \mathbb{R}^d,$$
where for $a, b \in \mathbb{R}^d$, the notation $a \le b$ stands for $a^{(j)} \le b^{(j)}$ for all $j = 1, \ldots, d$. Let $K$ be the distribution function of the random vector $\varepsilon = H(X)$. In what follows, $K^{(j)}$ denotes the distribution function of the $j$-th component $\varepsilon^{(j)}$ of $\varepsilon$, $1 \le j \le d$.

In order to establish the weak convergence of the empirical process $\mathbb{K}_n(t) = \sqrt{n}\,\{K_n(t) - K(t)\}$, the following additional assumptions will be made. These are similar to those used for univariate pseudo-observations in Ghoudi and Rémillard (1998). Suppose that $\mathcal{X}$ is a complete and separable metric space and let $\mathcal{C}$ be a subset of the continuous functions $f$ from $\mathcal{X}$ to $\mathbb{R}^d$ containing the function $f \equiv 0$. Set $J = \{1 \le j \le d;\ f^{(j)} \not\equiv 0 \text{ for at least one } f \in \mathcal{C}\}$. Assume that there exists a nonnegative continuous function $r$ from $\mathcal{X}$ to $\mathbb{R}^d$ such that $\min_{j \in J} \inf_{x \in \mathcal{X}} r^{(j)}(x) > 0$, $r^{(j)} \equiv 0$ for all $j \notin J$, and such that $\|\cdot\|_r$, defined by
$$\|f\|_r = \inf\{a \ge 0;\ |f^{(j)}(x)| \le a\, r^{(j)}(x), \text{ for all } x \in \mathcal{X} \text{ and } 1 \le j \le d\},$$
is finite for all $f \in \mathcal{C}$. Also assume that $\mathcal{C} = \mathcal{C}_r$ is closed with respect to the norm induced by $r$. Further let $\tilde{\mathcal{C}}_r = \{f = f_0 + \theta r;\ f_0 \in \mathcal{C}_r \text{ and } \theta \in \mathbb{R}\}$.


Finally, let $\pi_j$ be the canonical projection from $\mathbb{R}^d$ to $\mathbb{R}$ such that $\pi_j(t) = t^{(j)}$, $1 \le j \le d$.

Hypothesis I. Suppose that $\max_{1 \le j \le d} E\{r^{(j)}(X)\}$ is finite. Suppose also that for any fixed $j \in J$, the following conditions hold:
• the law of $\varepsilon^{(j)} = H^{(j)}(X)$ admits a density $k_j(\cdot)$ which is bounded on every compact subset of $T_j$;
• there exists a version of the conditional distribution of $X$ given $\varepsilon^{(j)} = t^{(j)}$, such that
(i) for any $f \in \tilde{\mathcal{C}}_r$ and for any continuous and bounded $\psi$ from $\mathbb{R}$ to $\mathbb{R}$, the mapping
$$t \mapsto \mu_j(t, \psi \circ f^{(j)}) := k_j(t^{(j)})\, E\left\{ \psi \circ f^{(j)}(X)\, \mathbb{I}\{\varepsilon \le t\} \,\middle|\, \varepsilon^{(j)} = t^{(j)} \right\} \qquad (2.1)$$
is continuous on $\pi_j^{-1}(T_j) = \{t \in \mathbb{R}^d;\ t^{(j)} \in T_j\}$;
(ii) for any compact subset $C$ of $\mathbb{R}^d$ such that $\pi_j(C) \subset T_j$,
$$\sup_{t \in C} \int_M^\infty P\left( r^{(j)}(X) > u,\ \varepsilon \le t \,\middle|\, \varepsilon^{(j)} = t^{(j)} \right) du \qquad (2.2)$$
converges to zero as $M$ tends to infinity.

Remark 2.1 Note that condition (2.1) is implied by the following condition: for any continuous and bounded $\phi_1$ and $\phi_2$ on $\mathbb{R}$, the mapping
$$t \mapsto \mu_j\{t, (\phi_1 \circ g^{(j)})(\phi_2 \circ r^{(j)})\} \qquad (2.3)$$
is continuous on $\pi_j^{-1}(T_j)$, where $f \in \tilde{\mathcal{C}}_r$ and $g^{(j)} = f^{(j)}/r^{(j)}$, for all $j \in J$.

Before stating the next hypothesis, which basically requires that $H_n$ be a good estimate of $H$, one has to define the following processes for all $1 \le j \le d$ and any continuous $\Psi$ on $\mathcal{X}$ such that $0 \le \Psi \le 1$. For $(a, t) \in \mathbb{R} \times \mathbb{R}^d$, set
$$A_{n,j,\Psi}(a, t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \Psi(X_i)\, \mathbb{I}\{\varepsilon_i^{(j)} \le t^{(j)} + a\, r^{(j)}(X_i)\}\, W_j(\varepsilon_i, t),$$
where $W_j(\varepsilon, t) = \prod_{l=1;\, l \ne j}^d \mathbb{I}\{\varepsilon^{(l)} \le t^{(l)}\}$, and set
$$\alpha_{n,j,\Psi}(a, t) = A_{n,j,\Psi}(a, t) - E\{A_{n,j,\Psi}(a, t)\}.$$
In particular, the classical empirical process which one would obtain if the $\varepsilon_i$'s were observable is given by
$$\alpha_n(t) = \alpha_{n,1,1}(0, t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[ \mathbb{I}\{\varepsilon_i \le t\} - K(t) \right], \qquad t \in \mathbb{R}^d.$$

Hypothesis II. Suppose that
• there exists a version $\tilde{H}_n$ of $H_n$ such that $\mathbb{H}_n = \sqrt{n}\,(\tilde{H}_n - H) \in \mathcal{C}_r$ and such that $\sqrt{n}\, \|\tilde{H}_n - H_n\|_r$ converges in probability to zero;
• $(\alpha_n, \mathbb{H}_n)$ converges in $C(\mathbb{R}^d) \times \mathcal{C}_r$ to a process $(\alpha, \mathbb{H})$;
• for any $f \in \mathcal{C}_r$ with $g^{(j)} = f^{(j)}/r^{(j)}$, $j \in J$, and for any continuous $\psi$ on $\mathbb{R}$ such that $0 \le \psi \le 1$,
$$\sup_{t \in C} \left| \alpha_{n,j,\psi \circ g^{(j)}}(a/\sqrt{n}, t) - \alpha_{n,j,\psi \circ g^{(j)}}(0, t) \right| \qquad (2.4)$$


converges in probability to zero as $n$ tends to infinity, for every compact subset $C$ of $T$ and for every $a \in \mathbb{R}$.

Remark 2.2 The first condition ensures that the estimator $H_n$ behaves nicely and does not introduce any measurability problems. Condition (2.4) follows from the weak convergence of $\alpha_n$ when $\bar T$ is compact, $r^{(j)} \equiv 1$, and $\mathcal{C}_r$ is of the form $\mathcal{C}_r \subset \{f;\ f^{(j)} = b^{(j)} \circ H^{(j)},\ b^{(j)} \in C(\bar T_j)\}$. To see that, note that on $\{t^{(j)} < \varepsilon_i^{(j)} \le t^{(j)} + a/\sqrt{n}\}$, $\psi \circ g^{(j)}(X_i) = \psi \circ b^{(j)}(\varepsilon_i^{(j)}) \approx \psi \circ b^{(j)}(t^{(j)})$. Hence (2.4) is bounded by the sum of the following two terms: $\sup_{u,v \in T_j,\, |u-v| \le a/\sqrt{n}} |\psi \circ b^{(j)}(u) - \psi \circ b^{(j)}(v)|$, which tends to zero by continuity, and
$$\sup_{u \in T_j} \psi \circ b^{(j)}(u) \times \sup_{s,t \in C,\, |t^{(j)} - s^{(j)}| \le a/\sqrt{n}} |\alpha_n(t) - \alpha_n(s)|,$$
which converges to zero in probability by the tightness of $\alpha_n$. Further note that, under the same conditions, (2.1) can be written as
$$\mu_j(t, \psi \circ f^{(j)}) = k_j(t^{(j)})\, \psi \circ b^{(j)}(t^{(j)})\, E\left\{ \mathbb{I}(\varepsilon \le t) \,\middle|\, \varepsilon^{(j)} = t^{(j)} \right\} = \psi \circ b^{(j)}(t^{(j)})\, \frac{\partial K(t)}{\partial t^{(j)}}. \qquad (2.5)$$
Therefore the mapping $t \mapsto \mu_j(t, \psi \circ f^{(j)})$ is continuous if one assumes that $\partial K(t)/\partial t^{(j)}$ is continuous on $\pi_j^{-1}(T_j)$.

The next hypothesis is needed when $T$ is not closed. This is the case, for example, when the densities $k_j$ are not bounded. Recall that Hypothesis I only requires that these densities be bounded on compact subsets of $T$. Before stating the hypothesis, let $\mathcal{Q}$ be the set of all functions $q$ defined in a positive neighborhood of zero, such that $q$ is positive and increasing, $q(u)/u$ is decreasing, and $q(2u)/q(u)$ is bounded above.

Hypothesis III. For all $j \in J$ such that $t_* = \inf T_j$ is finite and does not belong to $T_j$, there exist $q_* \in \mathcal{Q}$ and a sequence $\{t_n\}$ of positive numbers decreasing to zero, both depending on $j$, such that $\lim_{n \to \infty} \sqrt{n}\, K^{(j)}(t_* + t_n) = 0$, $\lim_{n \to \infty} q_*(t_n)/(t_n \sqrt{n}) = 0$, $\lim_{u \downarrow 0} k_j(t_* + u)\, q_*(u) = 0$, and the sequence
$$\left\{ \sup_{x :\, H^{(j)}(x) - t_* > t_n} \left| \mathbb{H}_n^{(j)}(x) \right| / q_*\!\left( H^{(j)}(x) - t_* \right) \right\}_{n \ge 1}$$
is tight.

For all $j \in J$ such that $t^* = \sup T_j$ is finite and does not belong to $T_j$, there exist $q^* \in \mathcal{Q}$ and a sequence $\{s_n\}$ of positive numbers decreasing to zero, both depending on $j$, such that $\lim_{n \to \infty} \sqrt{n}\, \{1 - K^{(j)}(t^* - s_n)\} = 0$, $\lim_{n \to \infty} q^*(s_n)/(s_n \sqrt{n}) = 0$, $\lim_{u \downarrow 0} k_j(t^* - u)\, q^*(u) = 0$, and the sequence
$$\left\{ \sup_{x :\, t^* - H^{(j)}(x) > s_n} \left| \mathbb{H}_n^{(j)}(x) \right| / q^*\!\left( t^* - H^{(j)}(x) \right) \right\}_{n \ge 1}$$
is tight.


Remark 2.3 When $H_n$ and $H$ are distribution functions, the tightness of the sequences in Hypothesis III can possibly be proven by using strong approximation techniques (e.g. Csörgő and Révész 1981) or by using Theorem 2.4 in Alexander (1987).

Recall that when $C$ is a compact set, $D(C)$ is the space of right-continuous functions on $C$ with left limits, equipped with the Skorohod topology. Moreover, $D(T)$ is the projective limit of the spaces $\{D(C);\ C \text{ a compact subset of } T\}$; that is, a sequence of processes converges in $D(T)$ if it converges in $D(C)$ for every compact subset $C$ of $T$. With these notations, the main result of this paper is stated in the following way.

Theorem 2.4 Under Hypotheses I and II above, the empirical process $\mathbb{K}_n$ converges in $D(T)$ to a continuous process $\mathbb{IK}$ with representation
$$\mathbb{IK}(t) = \alpha(t) - \mu(t, \mathbb{H}), \qquad (2.6)$$
where $\mu$ is defined by
$$\mu(t, f) = \sum_{j \in J} \mu_j(t, f^{(j)}), \qquad (2.7)$$
for any $f \in \tilde{\mathcal{C}}_r$. If in addition Hypothesis III holds true, then $\mathbb{K}_n$ converges in $D(\mathbb{R}^d)$ to a continuous process having representation (2.6) on $T$, and vanishing outside $T$.

One easy consequence of Theorem 2.4 and of its proof is the following result, which looks like an integration by parts formula.

Corollary 2.5 Suppose that there exists a version $\tilde H_n$ of $H_n$ such that $\mathbb{H}_n = \sqrt{n}\,(\tilde H_n - H) \in \mathcal{C}_r$ and such that $\sqrt{n}\, \|\tilde H_n - H_n\|_r$ converges in probability to zero. Setting $Z_n = \sqrt{n}\, \left\{ \frac{1}{n} \sum_{i=1}^n H(X_i) - \mu \right\}$, suppose that $(Z_n, \mathbb{H}_n)$ converges in $\mathbb{R}^d \times \mathcal{C}_r$ to $(Z, \mathbb{H})$. Then $\sqrt{n}\, \left\{ \frac{1}{n} \sum_{i=1}^n H_n(X_i) - \mu \right\}$ converges in law to $Z + \int \mathbb{H}(x)\, P(dx)$.

When applying Theorem 2.4, it is often desirable to estimate $\mu(t, f)$. The following corollary provides a uniformly consistent estimate of $\mu(t, f)$.

Corollary 2.6 Suppose $f \in \mathcal{C}_r$ and set
$$\hat\mu_n(t, f) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[ \mathbb{I}\{e_{i,n} \le t + f(X_i)/\sqrt{n}\} - \mathbb{I}\{e_{i,n} \le t\} \right].$$
Under Hypotheses I and II above, $\sup_{t \in C} |\hat\mu_n(t, f) - \mu(t, f)|$ converges in probability to zero, for any compact subset $C$ of $T$.

3 Examples of Application

In this section, examples of application of the above results are presented. The first subsection presents a new result dealing with the serial copula process. The second outlines the application of the main result to dependence or copula functions. The third subsection is devoted to the residuals of multivariate regression models. The fourth subsection presents applications to empirical processes used to test serial independence in time series, and the last subsection deals with the dilation of classical statistics.


3.1 Serial copulas. Suppose $\zeta_1, \zeta_2, \ldots$ is a stationary and ergodic time series with continuous marginal distribution $F$. For $u \in [0,1]$, set
$$\beta_n(u) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[ \mathbb{I}\{F(\zeta_i) \le u\} - u \right],$$
and for $t \in [0,1]^d$, define
$$\alpha_n(t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \left[ \mathbb{I}\{F(\zeta_i) \le t^{(1)}, \ldots, F(\zeta_{i+d-1}) \le t^{(d)}\} - C(t) \right],$$
where $C$ is the copula associated with $X_i^{\top} = (\zeta_i, \ldots, \zeta_{i+d-1})$; that is, $C$ is the common law of the vectors $\varepsilon_i = H(X_i) = \{F(\zeta_i), \ldots, F(\zeta_{i+d-1})\}$, $i \ge 1$.

The goal is to estimate $C$. To this end, define the pseudo-observations $e_{i,n} = H_n(X_i)$, where $H_n^{(j)}(x) = F_n(x^{(j)})$ and
$$F_n(x^{(j)}) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}(\zeta_i \le x^{(j)}).$$
Note that $\sqrt{n}\,(F_n - F) = \beta_n \circ F$, that is,
$$\mathbb{H}_n(x) = \sqrt{n}\,(H_n - H)(x) = \left\{ \beta_n \circ F(x^{(1)}), \ldots, \beta_n \circ F(x^{(d)}) \right\}.$$
Then
$$C_n(t) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}\{H_n(X_i) \le t\} = \frac{1}{n} \sum_{i=1}^n \mathbb{I}\left\{ F_n(\zeta_i) \le t^{(1)}, \ldots, F_n(\zeta_{i+d-1}) \le t^{(d)} \right\}$$
is the empirical copula, and $\mathbb{K}_n = \sqrt{n}\,(C_n - C)$ is called the serial copula process.

Suppose that the following condition holds for the sequence of processes:
(C1) $\alpha_n$ converges in $C(\mathbb{R}^d)$ to a continuous process $\alpha$.
It follows that $\beta_n$ converges in $C(\mathbb{R})$ to a continuous process $\beta$ such that $\beta(0) = \beta(1) = 0$, having representation
$$\beta(u) = \alpha(u, 1, \ldots, 1) = \cdots = \alpha(1, \ldots, 1, u), \qquad u \in [0,1].$$
Further set $X = X_1$ and $\varepsilon = \varepsilon_1$. It follows that $\mathbb{H}_n$ converges in $D(\mathbb{R}^d)$ to the continuous process $\mathbb{H}$ having representation
$$\mathbb{H}(x) = \left\{ \beta \circ F(x^{(1)}), \ldots, \beta \circ F(x^{(d)}) \right\}.$$
The interval $T$ is such that $(0,1)^d \subset T \subset [0,1]^d$. Let $\mathcal{C}$ be defined as
$$\mathcal{C} = \{f;\ f^{(j)}(x^{(j)}) = b \circ F(x^{(j)}),\ b(0) = b(1) = 0,\ b \in C[0,1],\ 1 \le j \le d\},$$
and $r^{(j)} \equiv 1$ for all $1 \le j \le d$. Then from condition (C1), $(\alpha_n, \mathbb{H}_n)$ converges in $C(\mathbb{R}^d) \times \mathcal{C}_r$ to $(\alpha, \mathbb{H})$. It follows that Hypothesis II holds, using the remark following Hypothesis II. Because $k_j \equiv 1$ on $[0,1]$, Hypothesis I follows if one assumes that $\partial C(t)/\partial t^{(j)}$ is continuous on $T$. Finally, if $T \ne [0,1]^d$, one needs to assume the tightness of $\sup_{1 \ge u > t_n} |\beta_n(u)|/q(u)$ when $0 \notin T_j$, or the tightness of $\sup_{1 \ge u > t_n} |\beta_n(1-u)|/q(u)$ when $1 \notin T_j$, where $\sqrt{n}\, t_n \to 0$ and $q(t_n)/(\sqrt{n}\, t_n) \to 0$ for some $q \in \mathcal{Q}$.
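For $d = 2$, the empirical serial copula $C_n$ just defined can be computed directly from the normalized ranks of the series. The following is an illustrative sketch (not from the paper; the AR(1) series is only an example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stationary series: a Gaussian AR(1)
n = 500
zeta = np.empty(n + 1)
zeta[0] = rng.normal()
for i in range(n):
    zeta[i + 1] = 0.5 * zeta[i] + rng.normal()

# F_n evaluated at the series itself, i.e. normalized ranks
u = (np.argsort(np.argsort(zeta)) + 1) / len(zeta)

def serial_copula(u, t1, t2):
    """C_n(t1, t2) = (1/n) sum_i II{F_n(zeta_i) <= t1, F_n(zeta_{i+1}) <= t2}."""
    return np.mean((u[:-1] <= t1) & (u[1:] <= t2))

# Under serial independence C(t1, t2) = t1 * t2 = 0.25 here; the positive
# serial dependence of the AR(1) tends to inflate this value.
val = serial_copula(u, 0.5, 0.5)
```

The value `val` estimates $C(1/2, 1/2)$; its fluctuations around the true copula are exactly what the serial copula process $\mathbb{K}_n$ describes.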


Finally, using (2.5), it is easy to check that
$$\mu_j(t, \mathbb{H}^{(j)}) = \beta(t^{(j)})\, \frac{\partial C}{\partial t^{(j)}}(t).$$
One can conclude from Theorem 2.4 that the serial copula process converges in $C(\mathbb{R}^d)$ to
$$\alpha(t) - \sum_{j=1}^d \beta(t^{(j)})\, \frac{\partial C}{\partial t^{(j)}}(t).$$

Remark 3.1 Condition (C1) is a rather natural condition to impose, for it means that if one could observe $F(\zeta_1), F(\zeta_2), \ldots$, then the empirical process $\alpha_n$ would converge.

3.2 Empirical Copula Processes and Semiparametric Estimation. Let $X$ be an $\mathbb{R}^d$-valued random variable with continuous distribution function $F$ and marginal distributions $F_1, \ldots, F_d$. Specification of a dependence model is equivalent to the choice of the copula (dependence function) through the relation
$$F(x) = C\{F_1(x^{(1)}), \ldots, F_d(x^{(d)})\},$$
where the copula $C$ is a distribution function concentrated on the unit cube $[0,1]^d$ with uniform marginals. $C$ is also the distribution function of the random vector $\varepsilon = \{F_1(X^{(1)}), \ldots, F_d(X^{(d)})\}$. If the marginal distributions are not known, then $\varepsilon$ is not observable. Estimating the copula function is a challenging inference problem that can be tackled from a parametric (e.g. Genest et al., 1995) or a nonparametric (e.g. Genest and Rivest 1993) point of view.

First the nonparametric approach is described. Let $X_1, \ldots, X_n$ be independent and identically distributed random vectors with common distribution $F$. Let $F_n$ be the empirical distribution function of the $X_i$'s and let the $F_{jn}$'s be the associated empirical marginal distribution functions. Then a nonparametric estimate of the copula function is given by
$$C_n(t) = F_n\!\left( F_{1n}^{-1}(t^{(1)}), \ldots, F_{dn}^{-1}(t^{(d)}) \right) = \frac{1}{n} \sum_{i=1}^n \prod_{j=1}^d \mathbb{I}\{F_{jn}(X_i^{(j)}) \le t^{(j)}\}.$$
The asymptotic behaviour of $\sqrt{n}\,(C_n - C)$ was studied in Stute (1984) and Gaenssler and Stute (1987), using a different approach than the one presented here. Using the pseudo-observations approach, note that $C_n$ is nothing but the empirical distribution function $K_n$ based on the pseudo-observations $e_{i,n} = H_n(X_i)$, where
$$e_{i,n}^{(j)} = F_{jn}(X_i^{(j)}) = H_n^{(j)}(X_i), \qquad 1 \le j \le d.$$
The interval $T$ is such that $(0,1)^d \subset T \subset [0,1]^d$. The empirical process $\mathbb{K}_n = \sqrt{n}\,(C_n - C)$ is called the copula process. It is easy to see that $\alpha_n$ converges weakly to a continuous Gaussian process $\alpha$ with $\alpha^{(j)}(t^{(j)}) = \alpha(1, \ldots, 1, t^{(j)}, 1, \ldots, 1)$ distributed as a Brownian bridge. Moreover, $\mathbb{H}_n$ converges weakly to a random vector $\mathbb{H}$ with $j$-th component $\mathbb{H}^{(j)} = \alpha^{(j)} \circ H^{(j)}$. If $\mathcal{C}$ is defined by
$$\mathcal{C} = \{f;\ f^{(j)} = a^{(j)} \circ H^{(j)},\ a^{(j)}(0) = a^{(j)}(1) = 0,\ a^{(j)} \in C[0,1],\ 1 \le j \le d\},$$


and $r^{(j)} \equiv 1$ for all $1 \le j \le d$, then $(\alpha_n, \mathbb{H}_n)$ converges in $C(\mathbb{R}^d) \times \mathcal{C}_r$ to $(\alpha, \mathbb{H})$. Also, using Lemma 7.1, one can check that condition (2.4) is verified and Hypothesis II holds. Finally, it is easy to check that
$$\mu_j(t, a^{(j)} \circ H^{(j)}) = a^{(j)}(t^{(j)})\, \frac{\partial C}{\partial t^{(j)}}(t),$$
provided the derivative exists. Under smoothness assumptions on $C$ (e.g. continuity of $\partial C(t)/\partial t^{(j)}$ on $[0,1]^d$), Hypothesis I is also verified and one can conclude from Theorem 2.4 that the copula process converges in $C(\mathbb{R}^d)$ to
$$\alpha(t) - \sum_{j=1}^d \alpha^{(j)}(t^{(j)})\, \frac{\partial C}{\partial t^{(j)}}(t),$$
as expressed in Stute (1984). If one only assumes that $\partial C(t)/\partial t^{(j)}$ is continuous on $(0,1)^d$, then the copula process converges in $C((0,1)^d)$, and one needs additional conditions on $\partial C(t)/\partial t^{(j)}$ in the neighborhoods of $t^{(j)} = 0$ and $t^{(j)} = 1$ to extend the convergence to $C(\mathbb{R}^d)$.

Next, choosing a parametric model for the dependence structure is equivalent to assuming that the copula $C$ belongs to a class of copulas $C_\theta$ indexed by some parameter $\theta$. A natural way to estimate the parameter $\theta$ is to treat the marginals as nuisance parameters and to replace them by their empirical counterparts. This was the technique adopted by Genest et al. (1995), where the properties of the pseudo-maximum likelihood estimator of $\theta$ were considered. In fact, only Hypothesis II is needed to show that
$$\sqrt{n}\, \left[ \frac{1}{n} \sum_{i=1}^n \phi(e_{i,n}) - E\{\phi(\varepsilon)\} \right]$$
converges in distribution to $Z + \int \mathbb{H}(x)^{\top} \nabla\phi\{H(x)\}\, dF(x)$, where $Z$ is a centered Gaussian random variable with variance-covariance matrix given by $E\{\phi(\varepsilon)\phi(\varepsilon)^{\top}\}$. This leads to the more general question of estimating $\theta$ when $K$ belongs to some parametric family $K_\theta$.

3.2.1 Estimation of $\theta$ when $K = K_\theta$. In general, assume that $K = K_\theta$ for some parameter $\theta$. If the density $k_\theta$ of the $\varepsilon_i$'s is regular and Hypothesis II is verified, then the pseudo-maximum likelihood estimator $\hat\theta_n$ of $\theta$ is such that $\sqrt{n}\,(\hat\theta_n - \theta)$ converges in law to a Gaussian random vector having representation $-I^{-1}(\nu + Z)$, where $Z$ is the usual limit, $I$ is the Fisher information, and
$$\nu = \int \left[ \frac{\nabla_\theta \nabla_t k\{H(x), \theta\}}{k\{H(x), \theta\}} - \frac{\nabla_\theta k\{H(x), \theta\}\, \nabla_t k^{\top}\{H(x), \theta\}}{k^2\{H(x), \theta\}} \right] \mathbb{H}(x)\, dF(x).$$
It is also well known that the copula associated with a multivariate extreme value distribution has a special form and is characterized by some functional parameter $\theta$; that is, $C$ belongs to some class $C_\theta$ where $\theta$ is a function, with special properties, from $[0,1]^{d-1}$ to $[0,1]$. Replacing the marginals with their empirical equivalents, Abdous et al. (1999) studied the asymptotic properties of kernel estimates of the function $\theta$. A unifying approach dealing with the use of pseudo-observations in function estimation will be considered in a forthcoming paper.
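The pseudo-maximum likelihood idea admits a minimal numerical sketch (illustrative only, not from the paper: the Clayton family, the rank normalization by $n+1$, and the crude grid search are all our choices, made to keep the example self-contained):

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true, n = 2.0, 2000

# Sample from a Clayton copula by the conditional distribution method
u = rng.uniform(size=n)
w = rng.uniform(size=n)
v = (u ** -theta_true * (w ** (-theta_true / (1 + theta_true)) - 1) + 1) ** (-1 / theta_true)

# Treat the marginals as unknown nuisance parameters: replace them by
# normalized ranks (pseudo-observations), divided by n + 1 to stay in (0, 1)
eu = (np.argsort(np.argsort(u)) + 1) / (n + 1)
ev = (np.argsort(np.argsort(v)) + 1) / (n + 1)

def clayton_neg_loglik(theta, a, b):
    """Negative pseudo-log-likelihood for the Clayton copula density
    c_theta(a, b) = (1 + theta) (a b)^(-theta - 1) (a^-theta + b^-theta - 1)^(-2 - 1/theta)."""
    s = a ** -theta + b ** -theta - 1
    return -np.sum(np.log1p(theta) - (theta + 1) * np.log(a * b)
                   - (2 + 1 / theta) * np.log(s))

# Crude grid search; a real application would use a proper optimizer
grid = np.linspace(0.1, 6.0, 60)
theta_hat = grid[np.argmin([clayton_neg_loglik(t, eu, ev) for t in grid])]
```

The limiting law of $\sqrt{n}(\hat\theta_n - \theta)$ then carries the extra term $\nu$ displayed above, produced by the rank transformation, on top of the usual maximum likelihood limit.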


3.3 Multivariate Regression. This section presents a new result dealing with the asymptotics of the empirical process of the residuals of multivariate regression models. So far, only the univariate case had been considered in the literature. Set $\mathcal{X} = \mathbb{R}^{d+p}$, $x = (y, z)$, $y \in \mathbb{R}^d$ and $z \in \mathbb{R}^p$. Consider the regression model
$$Y_i = a + b^{\top} Z_i + \varepsilon_i,$$
where $\top$ stands for transposition, and where $Z_i$ and $(\varepsilon_l)_{l \ge i}$ are independent, $(Z_i)_{i \ge 1}$ is a stationary and ergodic series, the $\varepsilon_i$'s are independent, $\varepsilon_i, a \in \mathbb{R}^d$ and $b \in \mathbb{R}^{p \times d}$. Hence one can define $H(x) = H(y, z) = y - a - b^{\top} z$ and $H_n(x) = H_n(y, z) = y - a_n - b_n^{\top} z$, where $a_n$ and $b_n$ are estimators of $a$ and $b$ respectively. In that case, the residuals $H_n(X_i)$ are the pseudo-observations.

If $\{\sqrt{n}\,(a_n - a), \sqrt{n}\,(b_n - b), \alpha_n\}$ converges in $\mathbb{R}^d \times \mathbb{R}^{p \times d} \times C(\mathbb{R}^d)$ to $(A, B, \alpha)$, then $(\alpha_n, \mathbb{H}_n)$ converges in $C(\mathbb{R}^d) \times \mathcal{C}_r$ to $(\alpha, \mathbb{H})$, where $\mathbb{H}(y, z) = -A - B^{\top} z$, $r^{(j)}(x) = r^{(j)}(y, z) = 1 + \sum_{l=1}^p |z^{(l)}|$ and $\mathcal{C}_r = \{u + v^{\top} z;\ u \in \mathbb{R}^d \text{ and } v \in \mathbb{R}^{p \times d}\}$. If in addition $E\{|Z|^2\}$ is finite, then by Lemma 7.2, condition (2.4) is verified and Hypothesis II holds. Note that the joint weak convergence of $(\sqrt{n}\,(a_n - a), \sqrt{n}\,(b_n - b), \alpha_n)$ holds for the usual ordinary least squares estimates of $(a, b)$ or for Hodges-Lehmann estimates based on the multivariate Kendall's tau statistic. In fact, it will hold for any estimator of $(a, b)$ that can be written in terms of a "nice" functional of the empirical process $\alpha_n$.

If one supposes that the density $k_j$ of $\varepsilon^{(j)} = H^{(j)}(Y, Z)$ is continuous on the support $T_j$, then Hypothesis I is also satisfied, with
$$\mu(t, f) = \sum_{j=1}^d \{a + b^{\top} E(Z)\}^{(j)}\, \partial_j K(t),$$
where $f(y, z) = a + b^{\top} z \in \mathcal{C}_r$ and where $\partial_j K(t) = \frac{\partial}{\partial t^{(j)}} K(t)$. Therefore Theorem 2.4 applies and $\mathbb{K}_n$ converges in $D(T)$ to a continuous process having representation
$$\mathbb{IK}(t) = \alpha(t) + \sum_{j=1}^d \{A + B^{\top} E(Z)\}^{(j)}\, \partial_j K(t).$$
Note that the above model includes univariate autoregressive models of order $p$ and, more generally, multivariate autoregressive models of the form $\zeta_k - \mu = \Theta(\zeta_{k-1} - \mu) + \varepsilon_k$. We note that when the $Z_i$'s are independent, the weaker condition $E(|Z|) < \infty$ together with Lemma 7.1 yields the conclusion. Similar results hold for general ARMA series (e.g. see Bai 1994 for univariate ARMA series), but their unified treatment in terms of pseudo-observations will be done in a forthcoming paper.

3.4 Serial Independence in Time Series. This section introduces an application of pseudo-observations to tests of serial independence. The asymptotic results obtained are new and constitute a good application of the unified approach introduced in this paper. Define
$$K_n(t_1, \ldots, t_d) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}\{e_{i,n} \le t_1, \ldots, e_{i+d-1,n} \le t_d\},$$


where the $e_{i,n}$'s are the residuals of a univariate regression or a univariate autoregressive model of the form $Y_i = a + b^{\top} Z_i + e_i$, where $Z_i$ and $(e_l)_{l \ge i}$ are independent, $(Z_i)_{i \ge 1}$ is a stationary and ergodic series, the $e_i$'s are independent, $e_i, a \in \mathbb{R}$ and $b \in \mathbb{R}^p$. Set
$$K(t) = \prod_{j=1}^d F(t^{(j)}),$$
where $F$ is the continuous distribution function (with continuous derivative on its support) of the non-observable error.

Consider first the regression case, that is, the $Z_i$'s are independent and $E(|Z|) < \infty$. In that case, with the same notations as in the previous section, the limiting process $\mathbb{IK}$ of $\mathbb{K}_n$ has representation
$$\mathbb{IK}(t) = \alpha(t) + \{A + B^{\top} E(Z)\} \sum_{j=1}^d F'(t^{(j)}) \prod_{l \ne j} F(t^{(l)}).$$
In the case of autoregressive models, the deterministic part is much more complicated. Suppose that
$$\zeta_i - \mu = e_i + \sum_{l=1}^p \phi_l(\zeta_{i-l} - \mu) = R_i + \sum_{l=0}^{i-1} \theta_l\, e_{i-l},$$
where the $e_i$'s are independent and identically distributed with mean zero and finite variance, and where the rest $R_i$ has mean zero and is independent of the $e_i$'s, $i \ge 1$. Set $X_i^{\top} = (\zeta_i, \ldots, \zeta_{i+1-p-d})$, $\varepsilon_i^{\top} = (e_i, \ldots, e_{i+1-d})$, $i \ge p + d$. Further set $X = X_{p+d}$ and $\varepsilon = \varepsilon_{p+d}$. Next define
$$H^{(j)}(x) = x^{(j)} - \mu - \sum_{l=1}^p \phi_l(x^{(j+l)} - \mu),$$

and
$$H_n^{(j)}(x) = x^{(j)} - \hat\mu_n - \sum_{l=1}^p \hat\phi_{l,n}(x^{(j+l)} - \hat\mu_n),$$
where $\hat\mu_n, \hat\phi_{1,n}, \ldots, \hat\phi_{p,n}$ are respectively estimates of $\mu$ and $\phi_1, \ldots, \phi_p$. Set $r^{(j)}(x) = 1 + \sum_{l=1}^p |x^{(j+l)}|$ and
$$\mathcal{C}_r = \left\{ f;\ f^{(j)}(x) = m^{(j)} + \sum_{l=1}^p b_l^{(j)}(x^{(j+l)} - \mu),\ m^{(j)}, b_1^{(j)}, \ldots, b_p^{(j)} \in \mathbb{R},\ 1 \le j \le d \right\}.$$
If $\left\{ \sqrt{n}\,(\hat\mu_n - \mu), \sqrt{n}\,(\hat\phi_{1,n} - \phi_1), \ldots, \sqrt{n}\,(\hat\phi_{p,n} - \phi_p), \alpha_n \right\}$ converges in $\mathbb{R}^{p+1} \times C(\mathbb{R}^d)$ to $(M, \Phi_1, \ldots, \Phi_p, \alpha)$, then $(\alpha_n, \mathbb{H}_n)$ converges in $C(\mathbb{R}^d) \times \mathcal{C}_r$ to $(\alpha, \mathbb{H})$, where
$$\mathbb{H}^{(j)}(x) = -\left(1 - \sum_{l=1}^p \phi_l\right) M - \sum_{l=1}^p \Phi_l\,(x^{(j+l)} - \mu), \qquad 1 \le j \le d.$$


It follows from the above and the hypothesis $E(e_i^2) < \infty$, together with Lemma 7.2, that Hypothesis II is verified. Next, for $f \in \mathcal{C}_r$ such that $f^{(j)}(x) = m + \sum_{l=1}^p b_l(x^{(j+l)} - \mu)$, $1 \le j \le d$, it follows that $\mu_j(t, f^{(j)})$ is given by
$$\mu_j(t, f^{(j)}) = F'(t^{(j)}) \left[ m \prod_{l \ne j} F(t^{(l)}) + \sum_{l=1}^p \sum_{q=j+l}^d \theta_{q-j-l}\, b_l\, G(t^{(q)}) \prod_{i \ne j, q} F(t^{(i)}) \right],$$
where $G(s) = E\left( e_1\, \mathbb{I}\{e_1 \le s\} \right)$. So Hypothesis I is also verified and $\mathbb{K}_n$ converges in $D(T_1^d)$ to a process $\mathbb{IK}$ having representation
$$\mathbb{IK}(t) = \alpha(t) + \left(1 - \sum_{l=1}^p \phi_l\right) M \sum_{j=1}^d F'(t^{(j)}) \prod_{l \ne j} F(t^{(l)}) + \sum_{j=1}^d F'(t^{(j)}) \sum_{l=1}^p \sum_{q=j+l}^d \theta_{q-j-l}\, \Phi_l\, G(t^{(q)}) \prod_{i \ne j, q} F(t^{(i)}), \qquad (3.1)$$

Cram´er-von Mises type statistics be more efficient in general ( n based on Kn should ) X √ 1 Hn (Xi ) − E(V ) . than a test based on n n i=1


As an illustration, let us take a closer look at the Kendall's tau statistic used as a test of independence. Let $\{X_i\}_{i\ge 1}$ be a stationary and ergodic sequence of bivariate random vectors with continuous distribution function $H$, and let $H_n$ denote the empirical distribution function. If $\tau_n$ denotes the empirical Kendall's tau, an estimator of $\tau = 4E\{H(X_i)\} - 1 = 4\mu - 1$, then it is well known, e.g. Barbe et al. (1996), that
$$\frac{3}{2}\sqrt{n}\,(\tau_n - \tau) = 6\,\frac{1}{\sqrt{n}}\sum_{i=1}^n \{H_n(X_i) - \mu\}.$$
It follows from Corollary 2.5 that if $\sqrt{n}\,(H_n - H)$ is tight, then
$$\frac{3}{2}\sqrt{n}\,(\tau_n - \tau) \approx 6\,\frac{1}{\sqrt{n}}\sum_{i=1}^n \{H(X_i) - \mu\} + 6\sqrt{n}\int \{H_n(x) - \mu\}\,dH(x) = 6\,\frac{1}{\sqrt{n}}\sum_{i=1}^n \left\{2H(X_i) - F_1(X_i^{(1)}) - F_2(X_i^{(2)}) - 2\mu + 1\right\},$$
where $F_1$ and $F_2$ are respectively the distribution functions of the first and second components of $X_i$.

The first case of interest is the classical independence setting, where the $X_i$'s are i.i.d. with $X_i^{(1)}$ and $X_i^{(2)}$ independent. In that case $H(y,z) = F_1(y)F_2(z)$ and $\mu = 1/4$, so the summand reduces to $2\{F_1(X_i^{(1)}) - 1/2\}\{F_2(X_i^{(2)}) - 1/2\}$ and one obtains
$$\frac{3}{2}\sqrt{n}\,\tau_n \approx 12\,\frac{1}{\sqrt{n}}\sum_{i=1}^n \{F_1(X_i^{(1)}) - 1/2\}\{F_2(X_i^{(2)}) - 1/2\},$$
which converges to a standard Gaussian distribution by the central limit theorem.

Another interesting case is the serial Kendall's tau studied by Ferguson et al. (2000) and Genest et al. (2002). The $X_i$'s are given by $X_i = (Y_i, Y_{i+1})$, where $Y_1, Y_2, \ldots$ is a stationary and ergodic time series with common continuous distribution function $F$. Under the white noise hypothesis, one also has
$$\frac{3}{2}\sqrt{n}\,\tau_n \approx 12\,\frac{1}{\sqrt{n}}\sum_{i=1}^n \{F(Y_i) - 1/2\}\{F(Y_{i+1}) - 1/2\},$$
which also converges to a standard Gaussian distribution.

However, even for strongly dependent time series, $\frac{3}{2}\sqrt{n}\,\tau_n$ can converge to a centered Gaussian distribution. Take for example the well-known Tent Map series, defined as follows: $Y_1$ is uniformly distributed on $(0,1)$ and $Y_{i+1} = 1 - |2Y_i - 1|$, $i \ge 1$. One can then check that $H(Y_i, Y_{i+1}) = Y_{i+1}/2$, so
$$\frac{3}{2}\sqrt{n}\,\tau_n \approx -6\,\frac{1}{\sqrt{n}}\sum_{i=1}^n (Y_i - 1/2),$$
which converges to a centered Gaussian distribution with variance 3 by the central limit theorem for (reverse) martingales (e.g. Durrett 1996). It follows that a nominal 5% test of serial dependence based on Kendall's tau will reject about 26% of the time when the series is the Tent Map series, which is consistent with the simulations reported in Genest et al. (2002). Other recent applications to tests of serial dependence in time series can be found in Ghoudi et al. (2001) and Quessy (2000); see also Abdous et al. (2003) for an application to tests of an extended notion of symmetry (e.g. Abdous and Rémillard 1995).
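The Tent Map calculation above is easy to check by simulation. The sketch below is an illustration, not from the paper; the helper names are ad hoc. It generates the series through the standard conjugacy y = (2/π) arcsin(√x) with the logistic map x ↦ 4x(1−x), because iterating y ↦ 1 − |2y − 1| directly in floating point collapses to 0, and then applies the nominal 5% serial test that treats (3/2)√n τn as standard normal:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def tent_series(length, rng):
    """Tent Map trajectory y -> 1 - |2y - 1|, started uniformly on (0, 1).

    Iterating the map directly in double precision collapses to 0 after
    roughly 50 steps (each step shifts the binary expansion of y), so we
    iterate the conjugate logistic map x -> 4x(1 - x) instead and pull the
    values back through y = (2/pi) * arcsin(sqrt(x)).
    """
    y0 = rng.uniform()
    x = math.sin(math.pi * y0 / 2.0) ** 2
    ys = np.empty(length)
    for i in range(length):
        ys[i] = (2.0 / math.pi) * math.asin(math.sqrt(x))
        x = min(4.0 * x * (1.0 - x), 1.0)  # guard against rounding above 1
    return ys

def kendall_tau(u, v):
    """Empirical Kendall's tau of the pairs (u_i, v_i)."""
    m = len(u)
    s = 0.0
    for i in range(m - 1):
        s += np.sum(np.sign(u[i + 1:] - u[i]) * np.sign(v[i + 1:] - v[i]))
    return 2.0 * s / (m * (m - 1))

n, reps, reject = 200, 500, 0
for _ in range(reps):
    y = tent_series(n + 1, rng)
    stat = 1.5 * math.sqrt(n) * kendall_tau(y[:-1], y[1:])
    if abs(stat) > 1.96:  # nominal 5% test, treating stat as N(0, 1)
        reject += 1
rate = reject / reps
print(rate)  # far above the nominal 0.05
```

Since the asymptotic variance is 3 rather than 1, the expected rejection rate of the nominal 5% test is 2Φ(−1.96/√3) ≈ 0.26, which is what the simulation reproduces.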


Kilani Ghoudi and Bruno Rémillard

3.6 Other applications. Pseudo-observations can also occur when Hn is not random but is a perturbation of H, as in asymptotic relative efficiency calculations or in the proof of convergence of Hodges–Lehmann estimators. In these cases, the pseudo-observations Hn(Xi) can be identified with the observations under contiguous alternatives.

4 Sketch of Proof of Theorem 2.4. The empirical process Kn may be expressed in terms of ℍn and the εi's as
$$K_n(t) = \sqrt{n}\left[\frac{1}{n}\sum_{i=1}^n \mathbb{I}\{\varepsilon_i \le t - \mathbb{H}_n(X_i)/\sqrt{n}\} - K(t)\right],$$
which may also be written as the sum of two subsidiary processes, namely αn(t) = αn,1,1(0, t), as defined in the statement of Hypothesis II, and βn(t), defined by
$$\beta_n(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left[\mathbb{I}\{\varepsilon_i \le t - \mathbb{H}_n(X_i)/\sqrt{n}\} - \mathbb{I}\{\varepsilon_i \le t\}\right],$$
where ℍn = √n(Hn − H).
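For intuition, the processes above can be computed explicitly in the simple location model Xi = μ + εi of the Introduction, where the pseudo-observations are the centered residuals Xi − X̄n. The following sketch is an illustration, not from the paper; it assumes standard normal errors and identifies Hn(x) = x − X̄n and H(x) = x − μ, so that ℍn is the constant −√n(X̄n − μ) and βn(t) should be close to k(t)√n(X̄n − μ), with k the error density:

```python
import numpy as np

rng = np.random.default_rng(42)
n, mu = 100_000, 2.0
eps = rng.normal(size=n)   # unobservable errors; K = standard normal cdf
x = mu + eps               # observed sample
resid = x - x.mean()       # pseudo-observations: centered residuals

t = np.linspace(-2.0, 2.0, 81)
beta_n = np.sqrt(n) * ((resid[:, None] <= t).mean(axis=0)
                       - (eps[:, None] <= t).mean(axis=0))

# Hn(x) = x - Xbar estimates H(x) = x - mu, so sqrt(n)(Hn - H) is the
# constant -sqrt(n)(Xbar - mu); the approximation formalized by Lemma 4.1
# then reads beta_n(t) ~ k(t) * sqrt(n) * (Xbar - mu).
shift = np.sqrt(n) * (x.mean() - mu)
k = np.exp(-t ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
print(np.max(np.abs(beta_n - k * shift)))  # small: the two curves agree
```

The printed gap is small compared with √n(X̄n − μ), illustrating in this toy case the uniform approximation that Lemma 4.1 establishes in general.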

The convergence of the process βn will be studied in Sections 5 and 6. First, it will be shown that βn converges in D(T) to a continuous process, by proving that βn differs from a continuous function of the empirical process ℍn by a quantity that tends to zero in probability, uniformly on any compact subset C of T. More precisely:

Lemma 4.1 Under Hypotheses I and II,
$$\lim_{n\to\infty} P\left\{\sup_{t\in C}\left|\beta_n(t) + \mu(t, \mathbb{H}_n)\right| > \eta\right\} = 0,$$
for any η > 0 and any compact subset C of T.

If Lemma 4.1 holds true, then the first part of Theorem 2.4 is proven, since D(T) is the projective limit of the spaces {D(C); C a compact subset of T} and since the continuity of the mapping t ↦ μ(t, f), for all f ∈ Cr, implies that βn converges in D(T) to −μ(·, ℍ), where ℍ denotes the weak limit of ℍn. Therefore, when T is closed, Lemma 4.1 and representation (2.6) yield Theorem 2.4. When T is not closed, in order to extend the convergence to all of R^d it is sufficient to prove that, under the additional Hypothesis III, for any compact subset C′ the restriction of βn to C′ ∩ T \ C can be made arbitrarily small for some compact subset C of T. More precisely:

Lemma 4.2 Under Hypotheses I, II and III, for any compact subset C′ of R^d and any η > 0, one can find a compact subset C of T, depending on C′ and η, such that
$$\limsup_{n\to\infty} P\left\{\sup_{t\in C'\cap T\setminus C}|\beta_n(t)| > \eta\right\} < \eta.$$

The proof of Lemma 4.2 is given in Section 6. Finally, to prove Corollary 2.6, set
$$\tilde\beta_n(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left[\mathbb{I}\{\varepsilon_i \le t - (\mathbb{H}_n(X_i) - f(X_i))/\sqrt{n}\} - \mathbb{I}\{\varepsilon_i \le t\}\right].$$
Then, because μ(t, ·) is linear,
$$\hat\mu_n(t, f) - \mu(t, f) = -\{\beta_n(t) + \mu(t, \mathbb{H}_n)\} + \{\tilde\beta_n(t) + \mu(t, \mathbb{H}_n - f)\}.$$


It follows from Lemma 4.1 that both terms on the right-hand side converge in probability to zero, uniformly on compact subsets of T. Hence the result.

5 Convergence of βn on compact subsets of T. As argued in Section 5 of Ghoudi and Rémillard (1998), one may restrict the proof of Lemma 4.1 to the case where √n(Hn − H) ∈ Cr. That is why the rest of the paper assumes that ℍn = √n(Hn − H) ∈ Cr. The next two lemmas are required for the proof of Lemma 4.1, but they are also of independent interest. The main idea of the proof of Lemma 4.1 is that, by the tightness of ℍn, ℍn is arbitrarily close to some non-random element of Cr with high probability.

5.1 Auxiliary results. The first lemma is related to the well-known delta method (e.g. van der Vaart and Wellner 1996); however, verifying the conditions of the delta method would take longer than the direct proof.

Lemma 5.1 For any f ∈ C̃r and any j ∈ J, set
$$\mu_{n,j}(t, f^{(j)}) = \sqrt{n}\,E\left[\mathbb{I}\{\varepsilon^{(j)} \le t^{(j)} + f^{(j)}(X)/\sqrt{n}\}\,W_j(\varepsilon, t) - \mathbb{I}\{\varepsilon \le t\}\right] = \sqrt{n}\,P\left\{\varepsilon^{(j)} \le t^{(j)} + f^{(j)}(X)/\sqrt{n},\ \varepsilon^{(l)} \le t^{(l)}\ \forall\, l \ne j\right\} - \sqrt{n}\,P(\varepsilon \le t).$$
Under Hypothesis I, μn,j(t, f^{(j)}) converges uniformly to μj(t, f^{(j)}) on compact subsets C of R^d such that πj(C) ⊂ Tj.

Proof Let j ∈ J be fixed and let C be a compact subset of R^d such that πj(C) ⊂ Tj. Define Y = max{0, f^{(j)}(X)} and Z = max{0, −f^{(j)}(X)}. Then μn,j(t, f^{(j)}) = μn,j(t, Y) − μn,j(t, Z), where
$$\mu_{n,j}(t, Y) = \sqrt{n}\,E\left[\mathbb{I}\{t^{(j)} < \varepsilon^{(j)} \le t^{(j)} + Y/\sqrt{n}\}\,W_j(\varepsilon, t)\right],$$
and where
$$\mu_{n,j}(t, Z) = \sqrt{n}\,E\left[\mathbb{I}\{t^{(j)} - Z/\sqrt{n} < \varepsilon^{(j)} \le t^{(j)}\}\,W_j(\varepsilon, t)\right].$$
It is sufficient to show that μn,j(t, Y) and μn,j(t, Z) converge respectively to μj(t, Y) and μj(t, Z), uniformly on C. Since both proofs are similar, only the convergence of μn,j(t, Y) to μj(t, Y) will be established. Choose δ > 0 so that Cδ = {s ∈ πj^{-1}(Tj); |s^{(j)} − t^{(j)}| ≤ δ for some t ∈ C} is compact. The rest of the proof mimics the argument on pages 183–184 of Ghoudi and Rémillard (1998), replacing Pt by $P_{j,t}(Y > u) = E\left[\mathbb{I}\{Y > u\}\,W_j(\varepsilon, t)\,\middle|\,\varepsilon^{(j)} = t^{(j)}\right]$ and $P_{t+u/\sqrt{n}}$ by $P_{j,\,t + (u/\sqrt{n})\zeta_j}$, where ζj is the vector of R^d whose components are all zero except ζj^{(j)} = 1.

Lemma 5.2 Suppose f ∈ C̃r and set
$$\gamma_n(t, f) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left[\mathbb{I}\{\varepsilon_i \le t + f(X_i)/\sqrt{n}\} - \mathbb{I}\{\varepsilon_i \le t\}\right]. \tag{5.1}$$


Then, under Hypotheses I and II, $\sup_{t\in C}|\gamma_n(t, f) - \mu(t, f)|$ converges in probability to 0 as n tends to infinity, for any compact subset C of R^d such that πj(C) ⊂ Tj whenever f^{(j)} ≢ 0, 1 ≤ j ≤ d.

Proof First, for any A ⊂ Id = {1, ..., d}, define
$$\gamma_n(t, A, f) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \prod_{j\in A}\left[\mathbb{I}\left\{\varepsilon_i^{(j)} \le t^{(j)} + \frac{f^{(j)}(X_i)}{\sqrt{n}}\right\} - \mathbb{I}\{\varepsilon_i^{(j)} \le t^{(j)}\}\right] \times \prod_{l\in A^c}\mathbb{I}\{\varepsilon_i^{(l)} \le t^{(l)}\}.$$
Clearly γn(t, A, f) ≡ 0 if Card(A \ Jf) > 0, where Jf = {j ∈ J; f^{(j)} ≢ 0}. Note that Jf ⊂ J since r^{(j)} ≡ 0 whenever j ∉ J. Using the multivariate analogue of the binomial formula, namely
$$\prod_{i=1}^d (a_i + b_i) = \sum_{A\subset I_d}\left(\prod_{i\in A} a_i\right)\left(\prod_{j\in A^c} b_j\right),$$
one finds that
$$\gamma_n(t, f) = \sum_{A\subset J_f,\,A\ne\emptyset} \gamma_n(t, A, f).$$
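Expansions like the multivariate binomial formula above are easy to misindex; here is a quick numerical sanity check (purely illustrative, with arbitrary vectors a and b, using only the Python standard library):

```python
import itertools
import math

a = [0.3, -1.2, 2.0]   # arbitrary illustrative numbers
b = [0.7, 0.5, -0.25]
d = len(a)

lhs = math.prod(ai + bi for ai, bi in zip(a, b))
# sum over every subset A of {0, ..., d-1}: product of a over A, of b over A^c
rhs = sum(
    math.prod(a[i] for i in A) * math.prod(b[j] for j in range(d) if j not in A)
    for size in range(d + 1)
    for A in itertools.combinations(range(d), size)
)
print(abs(lhs - rhs) < 1e-12)  # True
```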

Let C be a given compact subset of T . To prove the Lemma, one first prove that for any subset A of Jf with Card(A) ≥ 2, sup |γn (t, A, f )| converges in probability to t∈C zero. Then one proves that for any j ∈ Jf , sup γn (t, {j}, f ) − µj (t, f (j) ) converges t∈C

in probability to 0, as n tends to infinity. Let δ > 0 be given and set γn (t, A, δ, f )

=

    n 1 XY f (j) (Xi ) (j) (j) (j) (j) √ II εi ≤ t + √ − II{εi ≤ t } n i=1 n j∈A " #   Y √ (l) (l) (j) × II{εi ≤ t } II max |f (Xi )| ≤ δ n . l∈Ac

1≤j≤d

Then for any δ > 0,   n √ 1 X (j) II max |f (Xi )| > δ n . |γn (t, A, f ) − γn (t, A, δ, f )| ≤ √ 1≤j≤d n i=1 The righthand side of the "last inequality converges in probability to zero as n #  n X √ 1 II max |f (j) (Xi )| > δ n is bounded by tends to infinity, because E √ 1≤j≤d n i=1 d n √ X √ o n P kf kr r(j) (X) > δ n , which in turn converges to zero as n tends to j=1  infinity because E r(j) (X) is finite by Hypothesis I. It follows that for any δ > 0, γn (t, A, f ) and γn (t, A, δ, f ) have the same behaviour as n tends to infinity, so it is enough to consider γn (t, A, δ, f ).


Next, choose δ > 0 so that
$$C_\delta = \bigcap_{j\in J_f}\left\{s \in \pi_j^{-1}(T_j);\ \max_{1\le l\le d}|s^{(l)} - t^{(l)}| \le \delta \text{ for some } t \in C\right\}$$
is compact. Suppose that j ∈ A ⊂ Jf and Card(A) > 1, and set $a = \|f^{(j)}\|_{r^{(j)}}$. Then a crude upper bound for |γn(t, A, δ, f)| is given by the sum of the following two terms:
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \mathbb{I}\left\{t^{(j)} < \varepsilon_i^{(j)} \le t^{(j)} + a\,\frac{r^{(j)}(X_i)}{\sqrt{n}}\right\}\left[\prod_{l\in A\setminus\{j\}}\mathbb{I}\{t^{(l)} - \delta < \varepsilon_i^{(l)} \le t^{(l)} + \delta\}\right]\prod_{l\in A^c}\mathbb{I}\{\varepsilon_i^{(l)} \le t^{(l)}\},$$
and
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \mathbb{I}\left\{t^{(j)} - a\,\frac{r^{(j)}(X_i)}{\sqrt{n}} < \varepsilon_i^{(j)} \le t^{(j)}\right\}\left[\prod_{l\in A\setminus\{j\}}\mathbb{I}\{t^{(l)} - \delta < \varepsilon_i^{(l)} \le t^{(l)} + \delta\}\right]\prod_{l\in A^c}\mathbb{I}\{\varepsilon_i^{(l)} \le t^{(l)}\}.$$
Set
$$D_{n,j,b}(t) = \alpha_{n,j,1}(b/\sqrt{n},\, t) - \alpha_{n,j,1}(0, t).$$
It is easy to see that the last two displays are respectively sums of finitely many terms of the form
$$\{D_{n,j,a}(s) - D_{n,j,a}(s')\} + \left\{\mu_{n,j}(s, a r^{(j)}) - \mu_{n,j}(s', a r^{(j)})\right\},$$
and
$$\{D_{n,j,-a}(s) - D_{n,j,-a}(s')\} - \left\{\mu_{n,j}(s, -a r^{(j)}) - \mu_{n,j}(s', -a r^{(j)})\right\},$$
with s, s′ ∈ Cδ and ‖s − s′‖ ≤ 2δ. It follows from Hypothesis II that $\sup_{t\in C_\delta}|D_{n,j,\pm a}(t)|$ converges in probability to zero as n tends to infinity. Next, using Lemma 5.1, one concludes that $\sup_{t\in C_\delta}\left|\pm a\,\mu_j(t, r^{(j)}) - \mu_{n,j}(t, \pm a r^{(j)})\right|$ converges to zero as n tends to infinity. Finally, from the continuity of the mapping t ↦ μj(t, r^{(j)}) on Cδ, one obtains that
$$\sup_{s,s'\in C_\delta,\ \|s-s'\|\le \delta}\left|\mu_j(s, r^{(j)}) - \mu_j(s', r^{(j)})\right|$$
tends to zero as δ decreases to zero. Therefore $\sup_{t\in C}|\gamma_n(t, A, f)|$ converges in probability to zero whenever Card(A) ≥ 2.

It remains to show that for any j ∈ Jf, $\sup_{t\in C}\left|\gamma_n(t, \{j\}, f) - \mu_j(t, f^{(j)})\right|$ converges in probability to 0 as n tends to infinity. From now on, let j be fixed and set g = f^{(j)}/r^{(j)}. Then g is bounded and continuous, so g(X) is compact. It follows that for any λ > 0 one can find s1, ..., sm ∈ R so that g(X) is covered by balls Bk centered at sk of radius λ, 1 ≤ k ≤ m. Moreover g(X) is a normal space, so it follows from Theorem 5.1 in Chapter 4 of Munkres (1975) that there exists a partition of unity dominated by


the covering; that is, there exist continuous positive functions ψk, 1 ≤ k ≤ m, such that the support of ψk is contained in Bk and $\sum_{k=1}^m \psi_k \circ g(x) = 1$ for all x ∈ X.

Recall that $W_j(\varepsilon, t) = \prod_{l\ne j}\mathbb{I}\{\varepsilon^{(l)} \le t^{(l)}\}$. Set
$$\gamma_{n,k}(t) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_k\circ g(X_i)\,W_j(\varepsilon_i, t)\left[\mathbb{I}\left\{\varepsilon_i^{(j)} \le t^{(j)} + f^{(j)}(X_i)/\sqrt{n}\right\} - \mathbb{I}\{\varepsilon_i^{(j)} \le t^{(j)}\}\right].$$
Then $\gamma_n(t, \{j\}, f) = \sum_{k=1}^m \gamma_{n,k}(t)$. The rest of the argument follows exactly the same steps as the proof of Lemma 5.2 of Ghoudi and Rémillard (1998) and is therefore omitted.

Remark 5.3 It follows from Lemma 5.1 and the proof of Lemma 5.2 that for any f ∈ C̃r,
$$\sqrt{n}\left[P\{\varepsilon \le t + f(X)/\sqrt{n}\} - P(\varepsilon \le t)\right] \tag{5.2}$$
converges to μ(t, f), uniformly on compact subsets of T, as n tends to infinity.

Proof of Lemma 4.1 Given the above lemmas, the proof is exactly the same as in the univariate case; see pages 186–187 of Ghoudi and Rémillard (1998).

6 Behaviour of βn near the boundary. The proof of Lemma 4.2 is similar to the proofs of Lemma 3 in Barbe et al. (1996) and Lemma 4.2 in Ghoudi and Rémillard (1998). Let C′ = C′_1 × ··· × C′_d be a compact subset of R^d; clearly the proof of the lemma can be restricted to compact sets of this form. If T is closed there is nothing to prove, since one can choose the compact subset C = C′ ∩ T. So suppose that T is not closed. Let A ⊂ Id be the set of indices j such that Tj is not closed, and let C be a compact subset of T′ = C′ ∩ T of the form C = C1 × ··· × Cd, where Cj = Tj ∩ C′_j if j ∉ A. Then
$$T' \setminus C = \bigcup_{B\subset A,\,B\ne\emptyset} C_B,$$
where $C_B = \{t;\ t^{(j)} \in C'_j \setminus C_j\ \forall\, j \in B,\ t^{(j)} \in C_j\ \forall\, j \notin B\}$. Note that these sets are disjoint. Let B ⊂ A, B ≠ ∅ be fixed and choose j ∈ B. Then three non-trivial cases can occur:
• Tj is of the form Tj = (t_{∗,j}, b];
• Tj is of the form Tj = [a, t^{∗,j});
• Tj is of the form Tj = (t_{∗,j}, t^{∗,j}).
One can concentrate on the first case, since the other two cases can be treated similarly.


For any j ∈ B, let tn,j be a sequence whose existence is guaranteed by Hypothesis III, and define the following quantities:
$$U_{n,j}(x, s) = \prod_{l\ne j}\mathbb{I}\{H_n^{(l)}(x) \le s^{(l)}\},\qquad U_j(x, s) = \prod_{l\ne j}\mathbb{I}\{H^{(l)}(x) \le s^{(l)}\},$$
$$\beta_{n,j,+}(u) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \mathbb{I}\left\{u < \varepsilon_i^{(j)} \le u + \{(H_n - H)^{(j)}(X_i)\}^-\right\}\mathbb{I}\{\varepsilon_i^{(j)} > t_{*,j} + t_{n,j}\},$$
$$\beta_{n,j,-}(u) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \mathbb{I}\left\{u - \{(H_n - H)^{(j)}(X_i)\}^+ \le \varepsilon_i^{(j)} < u\right\}\mathbb{I}\{\varepsilon_i^{(j)} > t_{*,j} + t_{n,j}\},$$
and
$$G_{n,j}(u) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left[\mathbb{I}\{\varepsilon_i^{(j)} \le u\} - K^{(j)}(u)\right].$$
Then
$$\mathbb{I}\{H_n(x) \le t\} - \mathbb{I}\{H(x) \le t\} = \mathbb{I}\{H_n^{(j)}(x) \le t^{(j)}\}\,U_{n,j}(x, t) - \mathbb{I}\{H^{(j)}(x) \le t^{(j)}\}\,U_j(x, t)$$
$$= \left[\mathbb{I}\{H_n^{(j)}(x) \le t^{(j)}\} - \mathbb{I}\{H^{(j)}(x) \le t^{(j)}\}\right]U_{n,j}(x, t) + \mathbb{I}\{H^{(j)}(x) \le t^{(j)}\}\left\{U_{n,j}(x, t) - U_j(x, t)\right\}$$
$$= \left[\mathbb{I}\{H_n^{(j)}(x) \le t^{(j)}\} - \mathbb{I}\{H^{(j)}(x) \le t^{(j)}\}\right]U_{n,j}(x, t)\,\mathbb{I}\{H^{(j)}(x) > t_{*,j} + t_{n,j}\}$$
$$\quad + \left[\mathbb{I}\{H_n^{(j)}(x) \le t^{(j)}\} - \mathbb{I}\{H^{(j)}(x) \le t^{(j)}\}\right]U_{n,j}(x, t)\,\mathbb{I}\{H^{(j)}(x) \le t_{*,j} + t_{n,j}\}$$
$$\quad + \mathbb{I}\{H^{(j)}(x) \le t^{(j)}\}\left\{U_{n,j}(x, t) - U_j(x, t)\right\}.$$
Replacing x by Xi, summing over i and rearranging terms yields the inequality
$$|\beta_n(t)| \le \beta_{n,j,+}(t^{(j)}) + \beta_{n,j,-}(t^{(j)}) + |G_{n,j}(t_{*,j} + t_{n,j})| + \sqrt{n}\,K^{(j)}(t_{*,j} + t_{n,j}) + |\beta_n(t, \{j\})|,$$
where βn(t, L) is equal to
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n\left[\prod_{l\in L^c}\mathbb{I}\{H_n^{(l)}(X_i) \le t^{(l)}\} - \prod_{l\in L^c}\mathbb{I}\{\varepsilon_i^{(l)} \le t^{(l)}\}\right]\prod_{l\in L}\mathbb{I}\{\varepsilon_i^{(l)} \le t^{(l)}\}.$$
Repeating the same argument for the other indices in B, one obtains
$$|\beta_n(t)| \le \sum_{j\in B}\left\{\beta_{n,j,+}(t^{(j)}) + \beta_{n,j,-}(t^{(j)}) + |G_{n,j}(t_{*,j} + t_{n,j})|\right\} + \sqrt{n}\sum_{j\in B} K^{(j)}(t_{*,j} + t_{n,j}) + |\beta_n(t, B)|.$$


Therefore Lemma 4.2 will be proven if one can show that, for any η > 0 and any j ∈ B,
$$\lim_{u_0\downarrow 0}\limsup_{n\to\infty} P\left\{\sup_{t_{*,j} < u \le t_{*,j}+u_0} \beta_{n,j,+}(u) > \eta\right\} = 0, \tag{6.1}$$
$$\lim_{u_0\downarrow 0}\limsup_{n\to\infty} P\left\{\sup_{t_{*,j} < u \le t_{*,j}+u_0} \beta_{n,j,-}(u) > \eta\right\} = 0, \tag{6.2}$$
$$\lim_{n\to\infty} P\left\{|G_{n,j}(t_{*,j} + t_{n,j})| > \eta\right\} = 0, \tag{6.3}$$
$$\limsup_{n\to\infty} \sqrt{n}\,K^{(j)}(t_{*,j} + t_{n,j}) = 0, \tag{6.4}$$
and
$$\limsup_{n\to\infty} P\left\{\sup_{t\in C_B}|\beta_n(t, B)| > \eta\right\} = 0, \tag{6.5}$$
for an appropriate C.

Let $F_n^{(j)}$ be defined by $F_n^{(j)} = H^{(j)}$ if j ∈ B, and $F_n^{(j)} = H_n^{(j)}$ otherwise. Set $S = \bigcap_{j\in I_d\setminus B}\pi_j^{-1}(T_j)$. Then βn(·, B) corresponds to the process βn constructed from the pseudo-observations Fn(Xi), and all the hypotheses are satisfied with S replacing T, with C′ = {f ∈ Cr; f^{(j)} ≡ 0 for all j ∈ B} replacing Cr, and with Fn replacing Hn. It follows that Lemma 4.1 holds true; hence
$$\sup_{t\in C_B}\left|\beta_n(t, B) + \mu(t, \mathbb{F}_n)\right| \xrightarrow{\ \Pr\ } 0,$$
where $\mathbb{F}_n = \sqrt{n}(F_n - H)$. Next, for any f ∈ C′, $\sup_{t\in C_B}|\mu(t, f)|$ can be made arbitrarily small by choosing C adequately, because of the continuity of the mapping $t \mapsto \mu(t, f) = \sum_{j\in J\setminus B}\mu_j(t, f^{(j)})$ on S. Therefore (6.5) holds true for some C.

The proof will be complete if one can show that for any j ∈ B, (6.1), (6.2), (6.3) and (6.4) also hold true. In fact, (6.3) follows from the fact that Gn,j(t_{∗,j}) = 0 and the tightness of αn, while (6.4) follows directly from Hypothesis III. Therefore it only remains to prove (6.1) and (6.2), and this follows directly from the proof of Lemma 4.2 in Ghoudi and Rémillard (1998).

7 Verification of condition (2.4). In this section it is shown that condition (2.4) of Hypothesis II holds for most linear models. Its validity can be verified by checking the conditions of either of the following two lemmas.

Lemma 7.1 Let j ∈ J be given. If Hypothesis I holds, if the εi's are independent, and if for any g = f^{(j)}/r^{(j)}, f ∈ Cr, the conditional law of $\left\{g(X_i), r^{(j)}(X_i), (\varepsilon_i^{(l)})_{l\ne j}\right\}$ given
$$\mathcal{A}_{i-1} = \sigma\left\{g(X_1), r^{(j)}(X_1), \ldots, g(X_{i-1}), r^{(j)}(X_{i-1}), \varepsilon_1, \ldots, \varepsilon_{i-1}, \varepsilon_i^{(j)}\right\}$$
is the same as the conditional law given $\sigma\{\varepsilon_i^{(j)}\}$, then
$$\sup_{t\in C}\left|\alpha_{n,j,\psi\circ g}(s/\sqrt{n},\, t) - \alpha_{n,j,\psi\circ g}(0, t)\right|$$


converges to zero in probability as n tends to infinity, for any compact subset C of T.

Lemma 7.2 Let j ∈ J be given. Suppose that Hypothesis I is satisfied, that αn is tight, that $E[\{r^{(j)}(X)\}^2] < \infty$, and that for any g = f^{(j)}/r^{(j)}, f ∈ Cr, the conditional law of $\varepsilon_i^{(j)}$ given
$$\mathcal{B}_{i-1} = \sigma\left\{g(X_1), r^{(j)}(X_1), \ldots, g(X_i), r^{(j)}(X_i), \varepsilon_1, \ldots, \varepsilon_{i-1}, (\varepsilon_i^{(l)})_{l\ne j}\right\}$$
is the same as the conditional law given
$$\sigma\left\{Z_i = \left(g(X_{i+1-q}), r^{(j)}(X_{i+1-q}), (\varepsilon_{i+1-q}^{(l)})_{l\ne j}\right)_{1\le q\le p}\right\},$$
for some p ≥ 1. Let Z be the support of the random vector Zi ∈ R^{p(d+1)}. If the law of $\varepsilon_i^{(j)}$ given Zi = z admits a density kj,f(·, z) which is bounded and such that {kj,f(·, z); z ∈ Z} is equicontinuous on C for any compact subset C of T, then
$$\sup_{t\in C}\left|\alpha_{n,j,\psi\circ g}(s/\sqrt{n},\, t) - \alpha_{n,j,\psi\circ g}(0, t)\right|$$

converges to zero in probability as n tends to infinity.

Remark 7.3 The existence of the density kj,f is a strong assumption. However, it becomes trivial if $\varepsilon_i^{(j)}$ is independent of Zi; in such a case, kj,f = kj. This happens to be the case for most linear models; in particular it holds for linear regression models (e.g. Subsection 3.3) and for autoregressive models (e.g. Subsection 3.4) of the form
$$Y_i - \mu = \varepsilon_i + \sum_{k=1}^p \rho_k (Y_{i-k} - \mu),$$
where the εi's are independent and identically distributed. In that case, Xi = (Yi, Zi) = (Yi, Yi−1, ..., Yi−p), $r(x) = r(y, z) = 1 + \sum_{k=1}^p |z^{(k)}|$ and Cr = {c + d′z; c ∈ R and d ∈ R^p}. It follows that εi is independent of
$$\sigma\{g(X_1), r(X_1), \ldots, g(X_i), r(X_i), \varepsilon_1, \ldots, \varepsilon_{i-1}\} \subset \sigma\{Y_1, \ldots, Y_{i-1}, \varepsilon_1, \ldots, \varepsilon_{i-1}\}.$$

Proof of Lemmas 7.1 and 7.2 Let j ∈ J be given and assume that s > 0; the case s < 0 is similar and is therefore omitted. Further, let sn = s/√n, Yi = r^{(j)}(Xi) and g = f^{(j)}/r^{(j)}. Set
$$U_{i,n}(t) = \psi\circ g(X_i)\,\mathbb{I}\{t^{(j)} < \varepsilon_i^{(j)} \le t^{(j)} + s_n Y_i\} \times \prod_{l\ne j}\mathbb{I}\{\varepsilon_i^{(l)} \le t^{(l)}\},$$

and set Vi,n(t) = E{Ui,n(t)}. Since C is compact, it can be covered by a finite number mn of closed intervals [aq,n, bq,n] such that
$$C \subset \bigcup_{q=1}^{m_n} [a_{q,n}, b_{q,n}] \subset C' \subset T,$$
for some compact set C′, and such that $b_{q,n}^{(j)} - a_{q,n}^{(j)} \le \Delta_n$ and $m_n = O(\Delta_n^{-1})$, where Δn is chosen so that
$$\lim_{n\to\infty}\left(\Delta_n\sqrt{n} + \frac{1}{n\Delta_n}\right) = 0. \tag{7.1}$$
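Condition (7.1) only pins Δn strictly between the rates 1/n and 1/√n. Concretely, the choice Δn = n^{−3/4} (an arbitrary illustrative value, not from the paper) makes both terms equal to n^{−1/4}:

```python
# Delta_n = n**(-3/4) satisfies (7.1): Delta_n * sqrt(n) and 1/(n * Delta_n)
# both equal n**(-1/4), which tends to zero as n grows.
for n in (10**2, 10**4, 10**6, 10**8):
    delta = n ** -0.75
    print(n, delta * n ** 0.5, 1.0 / (n * delta))
```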


(l)

Note that since it is not required that bq,n − aq,n tends to zero as n tends to infinity, for l 6= j, the condition mn = O(∆−1 n ) makes sense. For t ∈ [aq,n , bq,n ], observe that Y (j) (l) (j) Ui,n (t) ≤ Ui,n (bq,n ) + II{a(j) II{εi ≤ b(l) (7.2) q,n < εi ≤ bq,n } × q,n }, l6=j

and (j)

Ui,n (t) ≥ Ui,n (aq,n ) − II{a(j) q,n < εi

≤ b(j) q,n } ×

Y

(l)

II{εi ≤ a(l) q,n }.

(7.3)

l6=j

Taking expectations and summing (7.2) and (7.3) over i yield the following bound: √ αn,j,ψ◦g (s/ n , t) − αn,j,ψ◦g (0, t) ≤ sup t∈[aq,n ,bq,n ]

√ √ max |S q,n | + n pq,n + n {K(bq,n ) − K(b0q,n )} + |αn (bq,n ) − αn (b0q,n )|, i √ √ |S q,n | + n pq,n + n {K(aq,n ) − K(a0q,n )} + |αn (aq,n ) − αn (a0q,n )| , h

where

n 1 X S q,n = √ {Ui,n (aq,n ) − Vi,n (aq,n )}, n i=1 n 1 X {Ui,n (bq,n ) − Vi,n (bq,n )}, S q,n = √ n i=1

pq,n = V1,n (bq,n ) − V1,n (aq,n ), 0(l) aq,n

0(j)

(l) aq,n

(j)

0(l)

(l)

if l 6= j and aq,n = bq,n , and bq,n = bq,n if l 6= j and = and where (j) 0(j) bq,n = aq,n . It follows from Hypothesis I that √ √ max n {K(bq,n ) − K(b0q,n )} ≤ ∆n n sup kj (t(j) ), 1≤q≤mn

t∈C 0

which tends to zero as n tends to infinity since ∆n satisfies (7.1). √ max n {K(aq,n ) − K(a0q,n )} tends to zero as n goes to infinity.

Similarly

1≤q≤mn

Next, it follows from the tightness of αn (Hypothesis II) and (7.1) that  max max |αn (bq,n ) − αn (b0q,n )|, |αn (aq,n ) − αn (a0q,n )| 1≤q≤mn

tends to zero in probability as n goes to infinity. It follows from the proof of Lemma 5.1, in the case of Lemma 7.1, or from the (j) hypotheses on √ the conditional density of εi given Zi , in the case of Lemma 7.2, n pq,n tends to zero as n goes to infinity. Moreover in either cases, that max 1≤q≤mn

one has sup Vi,n (t) = O(n−1/2 ).

(7.4)

t∈C 0

Therefore to show that (2.4) goes to zero in probability, as n tends to infinity, it suffices to prove that for any λ > 0,   lim P max |S q,n | > λ = 0, (7.5) n→∞

1≤q≤mn


and
$$\lim_{n\to\infty} P\left\{\max_{1\le q\le m_n}|\underline S_{q,n}| > \lambda\right\} = 0. \tag{7.6}$$
Only (7.5) will be proved, since the proof of (7.6) is similar. Let (Gi)i≥1 be a filtration such that Ui,n(bq,n) is Gi-measurable, and let
$$\mathcal{F}_{i-1,n} = \sigma\{g(X_1), r(X_1), \ldots, g(X_{i-1}), r(X_{i-1}), \varepsilon_1, \ldots, \varepsilon_i\}.$$
Set
$$d_{i,q,n} = U_{i,n}(b_{q,n}) - E\{U_{i,n}(b_{q,n})\,|\,\mathcal{G}_{i-1}\},\qquad \kappa_{i,q,n} = E\{U_{i,n}(b_{q,n})\,|\,\mathcal{G}_{i-1}\} - V_{i,n}(b_{q,n}),$$
and let
$$\zeta_{q,n} = \frac{1}{\sqrt{n}}\sum_{i=1}^n d_{i,q,n},\qquad \xi_{q,n} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \kappa_{i,q,n}.$$
Since $|\overline S_{q,n}| \le |\zeta_{q,n}| + |\xi_{q,n}|$, (7.5) will be proved if for any λ > 0,
$$\lim_{n\to\infty} P\left\{\max_{1\le q\le m_n}|\zeta_{q,n}| > \lambda\right\} = 0 \tag{7.7}$$
and
$$\lim_{n\to\infty} P\left\{\max_{1\le q\le m_n}|\xi_{q,n}| > \lambda\right\} = 0. \tag{7.8}$$
Now
$$P\left\{\max_{1\le q\le m_n}|\zeta_{q,n}| > \lambda\right\} \le m_n\max_{1\le q\le m_n}P\{|\zeta_{q,n}| > \lambda\} \le \lambda^{-4} m_n\max_{1\le q\le m_n}E\{\zeta_{q,n}^4\},$$
and
$$P\left\{\max_{1\le q\le m_n}|\xi_{q,n}| > \lambda\right\} \le \lambda^{-4} m_n\max_{1\le q\le m_n}E\{\xi_{q,n}^4\}.$$
Therefore (7.7) and (7.8) hold if
$$\lim_{n\to\infty} m_n\max_{1\le q\le m_n}E\{\zeta_{q,n}^4\} = 0 \tag{7.9}$$
and
$$\lim_{n\to\infty} m_n\max_{1\le q\le m_n}E\{\xi_{q,n}^4\} = 0 \tag{7.10}$$
hold, respectively. Now $(d_{i,q,n}, \mathcal{G}_i)_{1\le i\le n}$ is a martingale difference sequence, and it follows from Rosenthal's inequality (e.g. Hall and Heyde 1980) that
$$E\{\zeta_{q,n}^4\} \le c\,n^{-2}\left[E\left\{\left(\sum_{i=1}^n E\{d_{i,q,n}^2\,|\,\mathcal{G}_{i-1}\}\right)^2\right\} + \sum_{i=1}^n E\{d_{i,q,n}^4\}\right], \tag{7.11}$$
for some positive constant c.


End of the proof of Lemma 7.1 Set Gi = Ai. Then ξq,n is a sum of i.i.d. mean-zero random variables, and one easily gets
$$E\{\xi_{q,n}^4\} \le 3\left[E\{\kappa_{1,q,n}^2\}\right]^2 + E\{\kappa_{1,q,n}^4\}/n.$$
Going back to the definition of κi,q,n, one sees that $E\{\kappa_{1,q,n}^2\} \le V_{1,n}(b_{q,n}) = O(n^{-1/2})$, from the argument in the proof of Lemma 5.1. Moreover $E\{\kappa_{i,q,n}^4\} \le 1$, so $E\{\xi_{q,n}^4\} = O(n^{-1})$. Since $m_n = O(\Delta_n^{-1})$, so that $m_n E\{\xi_{q,n}^4\} = O\{1/(n\Delta_n)\} \to 0$ by (7.1), it follows that (7.10) holds. Next, observe that $|d_{i,q,n}| \le 1$ and $E\{d_{i,q,n}^2\,|\,\mathcal{A}_{i-1}\} \le E\{U_{i,n}(b_{q,n})\,|\,\varepsilon_i^{(j)}\}$. Substituting into (7.11) and using (7.4) and the independence of the εi's yields $E\{\zeta_{q,n}^4\} = O(n^{-1})$, so that (7.9) holds as well. This completes the proof of Lemma 7.1.

End of the proof of Lemma 7.2 Suppose now that the hypotheses of Lemma 7.2 hold. Set Gi = Bi. Fix η > 0 and choose δ such that $b_{q,n}^{(j)} + \delta \in T_j$ whenever $\sup T_j \notin T_j$, and such that
$$\sup_{|s_1 - s_2| < \delta}\sup_{z\in\mathcal{Z}}|k_{j,f}(s_1, z) - k_{j,f}(s_2, z)| \le \eta.$$

Suggest Documents