A central limit theorem for triangular arrays of weakly dependent random variables

Michael H. Neumann
Friedrich-Schiller-Universität Jena
Institut für Stochastik
Ernst-Abbe-Platz 2, 07743 Jena, Germany
E-mail:
[email protected]
Abstract We derive a central limit theorem for triangular arrays of possibly nonstationary random variables satisfying a condition of weak dependence in the sense of Doukhan and Louhichi (1999). The proof uses an apparently new variant of the Lindeberg method where the behavior of the partial sums is compared to that of partial sums of dependent Gaussian random variables. We also discuss a few applications in statistics which show that the formulation of our central limit theorem is actually tailor-made for statistics of different type.
May 9, 2010 2000 Mathematics Subject Classification. Primary 60F05; secondary 62F40, 62G07, 62M15. Keywords and Phrases. Central limit theorem, Lindeberg method, weak dependence, bootstrap. Running title. Central limit theorem for dependent rv’s.
1. Introduction

For a long time, mixing conditions have been the dominant way of imposing a restriction on the dependence between time series data. The most popular notions of this type are strong mixing (α-mixing), introduced by Rosenblatt (1956), absolute regularity (β-mixing), introduced by Volkonski and Rozanov (1959), and uniform mixing (φ-mixing), proposed by Ibragimov (1962). These concepts have been used extensively by statisticians since many processes of interest fulfill such conditions under natural restrictions on the process parameters; see Doukhan (1994) as one of the standard references for examples and available tools. On the other hand, it turned out that some classes of processes statisticians like to work with are not mixing, although the influence of past states declines as time evolves. The simplest example of such a process is a stationary AR(1) process, X_t = θX_{t−1} + ε_t, where the innovations are independent with P(ε_t = 1) = P(ε_t = −1) = 1/2 and 0 < |θ| ≤ 1/2. This process has a stationary distribution supported on [−2, 2], and it follows from the above model equation that X_t always has the same sign as ε_t. Hence, we could perfectly recover X_{t−1}, X_{t−2}, . . . from X_t, which clearly excludes any of the common mixing properties. (Rosenblatt (1980) mentions the fact that a process similar to (X_t)_{t∈Z} is purely deterministic going backwards in time. A rigorous proof that it is not strong mixing is given by Andrews (1984).) On the other hand, it can be seen from the equation X_t = ε_t + θε_{t−1} + · · · + θ^{t−s−1}ε_{s+1} + θ^{t−s}X_s that the impact of X_s on X_t declines as t − s tends to infinity. Besides this simple example, there are several other processes of this type which are of considerable interest in statistics and for which mixing properties cannot be proved.
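To make the non-mixing example above concrete, here is a minimal simulation (our own illustration, not part of the paper; the function names are ours) of X_t = θX_{t−1} + ε_t with Rademacher innovations and θ = 1/2. It checks the two facts used in the text: the stationary law lives on [−2, 2], and X_t always has the same sign as ε_t, so ε_t, and hence the entire past X_{t−1} = (X_t − ε_t)/θ, X_{t−2}, . . . , can be recovered exactly from X_t.

```python
import random

def simulate_ar1(n, theta=0.5, seed=0, burn_in=1000):
    """Simulate X_t = theta * X_{t-1} + eps_t with eps_t = +/-1 (fair coin)."""
    rng = random.Random(seed)
    x, path, eps_path = 0.0, [], []
    for _ in range(burn_in + n):
        eps = rng.choice([-1.0, 1.0])
        x = theta * x + eps
        path.append(x)
        eps_path.append(eps)
    return path[burn_in:], eps_path[burn_in:]

xs, eps = simulate_ar1(10_000)

# The stationary distribution is supported on [-2, 2] ...
assert all(-2.0 <= v <= 2.0 for v in xs)
# ... and X_t has the same sign as eps_t, so eps_t (and therefore
# X_{t-1} = (X_t - eps_t)/theta) is an exact function of X_t:
assert all((v > 0) == (e > 0) for v, e in zip(xs, eps))
```

The second assertion holds because |θX_{t−1}| ≤ 1 strictly, so the unit innovation always dominates the sign; this deterministic backward recovery is precisely what rules out strong mixing.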
To give a more relevant example, for bootstrapping a linear autoregressive process of finite order it is most natural to first estimate the distribution of the innovations by the empirical distribution of the (possibly re-centered) residuals, and then to generate a bootstrap process iteratively by drawing independent bootstrap innovations from this distribution. It turns out that commonly used techniques for proving mixing of autoregressive processes fail; because of the discreteness of the bootstrap innovations it is in general impossible to construct a (successful) coupling of two versions of the process with different initial values. Inspired by such problems, Doukhan and Louhichi (1999) and Bickel and Bühlmann (1999) proposed alternative notions called weak dependence and ν-dependence, respectively. Basically, they imposed a restriction on covariances of smooth functions of blocks of random variables separated by a certain time gap and called a process weakly dependent (or ν-dependent, respectively) if these covariances tend to zero as the time gap increases. It has been shown that many classes of processes for which mixing is problematic satisfy appropriate versions of these weaker conditions; see e.g. Dedecker et al. (2007) for numerous examples. The main result in this paper is a central limit theorem for triangular arrays of weakly dependent random variables. Under conditions of this type, first central limit theorems were given by Corollary A in Doukhan and Louhichi (1999) for sequences of stationary random variables and by Theorem 1 in Coulon-Prieur and Doukhan (2000) for triangular arrays of asymptotically sparse random variables as they appear with nonparametric kernel density estimators. Using their notion of ν-mixing,
Bickel and Bühlmann (1999) proved a CLT for linear processes of infinite order and their (smoothed) bootstrap counterparts. CLTs for triangular arrays of weakly dependent, row-wise stationary random variables were also given by Theorem 6.1 in Neumann and Paparoditis (2008) and by Theorem 1 in Bardet, Doukhan, Lang and Ragache (2008). The latter is applicable in situations where the dependence between the summands declines as n → ∞. This is a common situation in nonparametric curve estimation, where the so-called “whitening-by-windowing” principle applies. Under projective conditions, versions of a CLT were derived by Dedecker (1998), Dedecker and Rio (2000) and Dedecker and Merlevède (2002). In the present paper we generalize previous versions of a CLT under weak dependence. Having applications in statistics, such as to bootstrap processes, in mind, it is important to formulate the theorem for triangular arrays. Moreover, motivated by a particular application discussed in Section 3, we also allow the random variables in each row to be nonstationary. The versatility of our CLT is indicated by the applications discussed in Section 3. To prove this result, we adapt the approach introduced by Lindeberg (1922) in the case of independent random variables. Actually, we use a variant of this method where the asymptotic behavior of the partial sums is explicitly compared to the behavior of partial sums of Gaussian random variables. Since the increments of the variances of the partial sum process are not necessarily nonnegative, we take dependent Gaussian random variables, having the same covariance structure as the original ones, as an appropriate counterpart.
In retrospect, this apparently new modification of the classical Lindeberg method seems to be quite natural in the case of dependent random variables, and it turns out that it does not create any essential additional problems compared to situations where the partial sum process can be compared to a process with independent Gaussian summands. This paper is organized as follows. In Section 2 we state and discuss the main result. Section 3 is devoted to a discussion of some possible applications in statistics. The proof of the CLT is contained in Section 4.
2. Main result

Theorem 2.1. Suppose that (X_{n,k})_{k=1,...,n}, n ∈ N, is a triangular scheme of random variables with EX_{n,k} = 0 for all n, k, and Σ_{k=1}^{n} EX²_{n,k} ≤ v_0 for all n and some v_0 < ∞. We assume that

\[ \sigma_n^2 := \operatorname{var}(X_{n,1} + \dots + X_{n,n}) \xrightarrow[n\to\infty]{} \sigma^2 \in [0, \infty) \tag{2.1} \]

and that

\[ \sum_{k=1}^{n} E\big\{ X_{n,k}^2 \, I(|X_{n,k}| > \epsilon) \big\} \xrightarrow[n\to\infty]{} 0 \tag{2.2} \]

holds for all ε > 0. Furthermore, we assume that there exists a summable sequence (θ_r)_{r∈N} such that, for all u ∈ N and all indices 1 ≤ s_1 < s_2 < · · · < s_u < s_u + r = t_1 ≤ t_2 ≤ n, the following upper bounds for covariances hold true: for all measurable and square-integrable functions f : R^u → R,

\[ \big| \operatorname{cov}\big( f(X_{n,s_1}, \dots, X_{n,s_u}), X_{n,t_1} \big) \big| \le \Big( \sqrt{E f^2(X_{n,s_1}, \dots, X_{n,s_u})} \, \sqrt{E X_{n,t_1}^2} \vee n^{-1/2} \Big) \theta_r, \tag{2.3} \]

and for all measurable and bounded functions f : R^u → R,

\[ \big| \operatorname{cov}\big( f(X_{n,s_1}, \dots, X_{n,s_u}), X_{n,t_1} X_{n,t_2} \big) \big| \le \|f\|_\infty \big( E X_{n,t_1}^2 + E X_{n,t_2}^2 + n^{-1} \big) \theta_r, \tag{2.4} \]

where ‖f‖_∞ = sup_{x∈R^u} |f(x)|. Then
\[ X_{n,1} + \dots + X_{n,n} \xrightarrow{d} N(0, \sigma^2). \]

Remark 1. To prove this theorem, we use the approach introduced by Lindeberg (1922) in the case of independent random variables. Later this method was adapted by Billingsley (1961) and Ibragimov (1963) to the case of stationary and ergodic martingale difference sequences, by Rio (1995) to the case of triangular arrays of strongly mixing and possibly nonstationary processes, by Dedecker (1998) to stationary random fields, by Dedecker and Rio (2000) for deriving a functional central limit theorem for stationary processes, and by Dedecker and Merlevède (2002) and Neumann and Paparoditis (2008) for triangular arrays of weakly dependent random variables. The core of Lindeberg's method is that, for any sufficiently regular test function h, E{h(X_{n,1} + · · · + X_{n,k}) − h(X_{n,1} + · · · + X_{n,k−1})} is approximated by E{h″(X_{n,1} + · · · + X_{n,k−1}) v_k/2}, where v_k = var(X_{n,1} + · · · + X_{n,k}) − var(X_{n,1} + · · · + X_{n,k−1}) is the increment of the variance of the partial sum process. Since an analogous approximation also holds for the partial sum process of related Gaussian random variables, one obtains the transition from the non-Gaussian to the Gaussian case. In most of these papers, this idea is put into practice by a comparison of Eh(X_{n,1} + · · · + X_{n,n}) with Eh(Z_{n,1} + · · · + Z_{n,n}), where Z_{n,1}, . . . , Z_{n,n} are independent Gaussian random variables with EZ_{n,k} = 0 and EZ²_{n,k} = var(X_{n,1} + · · · + X_{n,k}) − var(X_{n,1} + · · · + X_{n,k−1}). To simplify computations, sometimes blocks of random variables rather than single random variables are considered; see e.g. Dedecker and Rio (2000) and Dedecker and Merlevède (2002). In the case considered here, however, we have to deviate from this common practice. Since X_{n,1}, . . . , X_{n,n} are not necessarily stationary, we cannot guarantee that the increments of the variances, var(X_{n,1} + · · · + X_{n,k}) − var(X_{n,1} + · · · + X_{n,k−1}), are always nonnegative.
Moreover, it is also not clear if blocks of fixed length would help here. Therefore, we compare the behavior of Xn,1 + · · · + Xn,n with that of Zn,1 + · · · + Zn,n , where Zn,1 , . . . , Zn,n are no longer independent. Rather, we choose them as centered and jointly Gaussian random variables with the same covariance structure as Xn,1 , . . . , Xn,n . Nevertheless, it turns out that the proof goes through without essential complications compared to the common situation with independent Zn,k ’s. Note that Rio (1995) also proved a central limit theorem for triangular arrays of not necessarily stationary, mixing random variables. This author managed the problem with possibly negative increments of the variances of the partial sums by
omitting an explicit use of Gaussian random variables; rather, he derived an approximation to the characteristic function on the Gaussian side.

3. Applications in statistics

The purpose of this section is twofold. First, we intend to demonstrate the versatility of our CLT by a few applications of different type. And second, we also want to show how this can be accomplished in an effective manner.

3.1. Asymptotics of quadratic forms. For the analysis of nonstationary time series, Dahlhaus (1997) introduced the notion of locally stationary processes as a suitable framework for a rigorous asymptotic theory. Basically, one has to deal with a triangular scheme of random variables (X_{n,t})_{t=1,...,n}, n ∈ N, where the covariance behavior of . . . , X_{n,[nu]−1}, X_{n,[nu]}, X_{n,[nu]+1}, . . . is asymptotically constant for any fixed u ∈ [0, 1]. In this sense, the time points 1, . . . , n are renormalized to the unit interval. As an empirical version of the corresponding local spectral density, Neumann and von Sachs (1997) proposed the so-called preperiodogram,

\[ I_n(v, \omega) = \frac{1}{2\pi} \sum_{k:\ 1 \le [nv+1/2-k/2],\, [nv+1/2+k/2] \le n} X_{n,[nv+1/2-k/2]} \, X_{n,[nv+1/2+k/2]} \cos(\omega k). \]

The integrated and normalized preperiodogram is defined as

\[ J_n(u, \lambda) = \sqrt{n} \int_0^\lambda \int_0^u \big\{ I_n(v, \omega) - E I_n(v, \omega) \big\} \, dv \, d\omega. \]
The process J_n = (J_n(u, λ))_{u∈[0,1], λ∈[0,π]} appears in many contexts, ranging from testing the hypothesis of stationarity to estimation of model parameters; see Dahlhaus (2009) for several examples. Neumann and Paparoditis (2010) develop details for a bootstrap-based test for stationarity derived from J_n. Below we will sketch how Theorem 2.1 can be employed for proving asymptotic normality of its finite-dimensional distributions, which, in conjunction with stochastic equicontinuity, yields the convergence to a certain Gaussian process. The advantage of this somewhat unusual approach is that finiteness of fourth moments suffices, whereas traditionally used cumulant methods require finiteness of all moments. Note that the preperiodogram can be rewritten as

\[ I_n(v, \omega) = X_n' A_n X_n, \]

where X_n = (X_{n,1}, . . . , X_{n,n})′ and (A_n)_{i,j} = (2π)^{−1} cos(ω(i−j)) I((i+j−1)/(2n) ≤ v < (i+j+1)/(2n)). Accordingly, we have that

\[ J_n(u, \lambda) = X_n' B_n X_n - E X_n' B_n X_n, \]

where (B_n)_{i,j} = √n (2π)^{−1} (sin(λ(i−j))/(i−j)) λ¹([0, u] ∩ [(i+j−1)/(2n), (i+j+1)/(2n))) for i ≠ j, (B_n)_{i,i} = √n (2π)^{−1} λ λ¹([0, u] ∩ [(i−1/2)/n, (i+1/2)/n)), and λ¹ denotes the Lebesgue measure on (R, B). For proving asymptotic normality of the finite-dimensional distributions of J_n, it suffices to show that, for arbitrary λ_1, . . . , λ_d ∈ [0, π], u_1, . . . , u_d ∈ [0, 1] and d ∈ N,

\[ (J_n(\lambda_1, u_1), \dots, J_n(\lambda_d, u_d))' \xrightarrow{d} J = (J_1, \dots, J_d)', \tag{3.1} \]
where J is a Gaussian random vector with zero mean. By the Cramér–Wold device, (3.1) will follow if

\[ Z_n = \sum_{i=1}^{d} c_i J_n(\lambda_i, u_i) \xrightarrow{d} Z \sim N(0, v), \]

where v = Σ_{j,k=1}^{d} c_j c_k cov(J_j, J_k), for any c_1, . . . , c_d ∈ R. Z_n can be rewritten as

\[ Z_n = X_n' C_n X_n - E X_n' C_n X_n, \]

where C_n is some (n × n)-matrix with

\[ (C_n)_{i,j} = O\!\left( \frac{1}{\sqrt{n}} \Big( 1 \wedge \frac{1}{|i-j|} \Big) \right). \]
Since the off-diagonal terms decay fast enough as |i − j| grows, we can approximate Z_n by

\[ Z_n^{(K)} = X_n' C_n^{(K)} X_n - E X_n' C_n^{(K)} X_n, \]

where (C_n^{(K)})_{i,j} = (C_n)_{i,j} I(|i − j| ≤ K). It follows under suitable conditions that

\[ E\big\{ (X_n' C_n X_n - E X_n' C_n X_n) - (X_n' C_n^{(K)} X_n - E X_n' C_n^{(K)} X_n) \big\}^2 = \operatorname{var}\big( X_n' (C_n - C_n^{(K)}) X_n \big) = O(1/K). \tag{3.2} \]

According to Theorem 4.2 in Billingsley (1968), it follows from (3.2) that it suffices to prove that, for fixed K,

\[ Z_n^{(K)} \xrightarrow{d} N(0, v^{(K)}), \tag{3.3} \]

with v^{(K)} → v as K → ∞. The matrix C_n^{(K)}, however, has almost a diagonal form, which will allow us to prove (3.3) by Theorem 2.1. To this end, we rewrite Z_n^{(K)} as

\[ Z_n^{(K)} = \sum_{i=1}^{n} Y_{n,i}, \]

where Y_{n,i} = Σ_{j=i−K}^{i+K} (C_n)_{i,j} (X_{n,i} X_{n,j} − E X_{n,i} X_{n,j}). Under appropriate conditions, the triangular scheme (Y_{n,i})_{i=1,...,n} satisfies the assumptions of Theorem 2.1. Here we note that the Y_{n,i} are in general not stationary, even if the X_{n,i} are. Therefore, it is essential to have a CLT which allows nonstationary random variables in each row. Moreover, it also follows from (3.2) that

\[ \sup_n \big| \operatorname{var}(X_n' C_n^{(K)} X_n) - \operatorname{var}(X_n' C_n X_n) \big| \xrightarrow[K\to\infty]{} 0. \]

This, together with var(X_n' C_n X_n) → v as n → ∞, yields (3.1).
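The passage from Z_n^{(K)} to the row sums Σ_i Y_{n,i} is pure bookkeeping: restricting the quadratic form to the band |i − j| ≤ K and regrouping its terms row by row give the same number. The following self-contained check illustrates this (our own sketch; the matrix below merely mimics the stated order (C_n)_{i,j} = O(n^{−1/2}(1 ∧ |i−j|^{−1})) and is not the actual C_n of the test statistic, and centering is omitted since it is deterministic):

```python
import random

def banded_truncation(C, K):
    """C^(K): keep entries with |i - j| <= K, zero out the rest."""
    n = len(C)
    return [[C[i][j] if abs(i - j) <= K else 0.0 for j in range(n)] for i in range(n)]

def quad_form(C, x):
    """x' C x as a double sum."""
    n = len(x)
    return sum(C[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def row_sums(C, x, K):
    """Y_i = sum over the band |i - j| <= K of C[i][j] * x_i * x_j."""
    n = len(x)
    return [sum(C[i][j] * x[i] * x[j]
                for j in range(max(0, i - K), min(n, i + K + 1)))
            for i in range(n)]

rng = random.Random(1)
n, K = 50, 3
# Toy matrix of the stated order of magnitude (illustrative only):
C = [[1.0 / (n ** 0.5 * max(1, abs(i - j))) for j in range(n)] for i in range(n)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
CK = banded_truncation(C, K)

# The banded quadratic form equals the sum of the row variables Y_i:
assert abs(quad_form(CK, x) - sum(row_sums(C, x, K))) < 1e-9
```

Each Y_{n,i} mixes products X_{n,i}X_{n,j} with position-dependent weights, which is why the rows are generally nonstationary even for stationary X_{n,i}.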
3.2. Consistency of bootstrap methods. Since our central limit theorem is formulated for triangular schemes, it is tailor-made for applications to bootstrap processes. We demonstrate this for the simplest possible example, the sample mean. Assume that we have a strictly stationary process (X_t)_{t∈Z} with EX_0 = µ and EX_0² < ∞. Additionally, we assume that there exists a summable sequence (θ_r)_{r∈N} such that, for all u ∈ N and all indices 1 ≤ s_1 < s_2 < · · · < s_u < s_u + r = t_1 ≤ t_2 ≤ n, the following upper bounds for covariances hold true: for all measurable and square-integrable functions f : R^u → R,

\[ \big| \operatorname{cov}\big( f(X_{s_1}, \dots, X_{s_u}), X_{t_1} \big) \big| \le \Big( \sqrt{E f^2(X_{s_1}, \dots, X_{s_u})} \, \sqrt{E X_{t_1}^2} \vee 1 \Big) \theta_r, \tag{3.4} \]

and for all measurable and bounded functions f : R^u → R,

\[ \big| \operatorname{cov}\big( f(X_{s_1}, \dots, X_{s_u}), X_{t_1} X_{t_2} \big) \big| \le \|f\|_\infty \big( E X_{t_1}^2 + E X_{t_2}^2 + 1 \big) \theta_r. \tag{3.5} \]

Then the triangular scheme (X_{n,k})_{k=1,...,n}, n ∈ N, given by X_{n,k} = (X_k − µ)/√n satisfies the conditions of Theorem 2.1. In particular, it follows from (3.4) and dominated convergence that

\[ \sigma_n^2 := \operatorname{var}(X_{n,1} + \dots + X_{n,n}) = \operatorname{var}(X_0) + 2 \sum_{k=1}^{n-1} \frac{n-k}{n} \operatorname{cov}(X_0, X_k) \xrightarrow[n\to\infty]{} \sigma^2 := \operatorname{var}(X_0) + 2 \sum_{k=1}^{\infty} \operatorname{cov}(X_0, X_k) \in [0, \infty). \]

Now we obtain from Theorem 2.1, for X̄_n = n^{−1} Σ_{t=1}^{n} X_t, that

\[ \sqrt{n} (\bar{X}_n - \mu) = X_{n,1} + \dots + X_{n,n} \xrightarrow{d} Z \sim N(0, \sigma^2). \tag{3.6} \]
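For a concrete instance of (3.6), the following Monte-Carlo sketch (ours, not from the paper) estimates var(√n X̄_n) for a Gaussian AR(1) process with θ = 1/2. Here var(X_0) = 1/(1 − θ²) and cov(X_0, X_k) = θ^k var(X_0), so the long-run variance works out to σ² = var(X_0)(1 + θ)/(1 − θ) = 1/(1 − θ)² = 4.

```python
import random

def ar1_path(n, theta, rng):
    """One path of X_t = theta * X_{t-1} + e_t, e_t ~ N(0, 1)."""
    x, out = 0.0, []
    for _ in range(200):            # burn-in toward stationarity
        x = theta * x + rng.gauss(0.0, 1.0)
    for _ in range(n):
        x = theta * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def long_run_variance_mc(n=1000, reps=1500, theta=0.5, seed=2):
    """Monte-Carlo estimate of var(sqrt(n) * sample mean)."""
    rng = random.Random(seed)
    sums = [sum(ar1_path(n, theta, rng)) / n ** 0.5 for _ in range(reps)]
    m = sum(sums) / reps
    return sum((s - m) ** 2 for s in sums) / (reps - 1)

v_hat = long_run_variance_mc()
# For theta = 0.5: sigma^2 = 1 / (1 - 0.5)^2 = 4; v_hat should be close.
assert abs(v_hat - 4.0) < 0.8
```

The i.i.d.-variance var(X_0) = 4/3 would badly underestimate σ² = 4 here; the factor 3 between them is exactly the covariance sum in the display above.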
Assume now that, conditioned on X_1, . . . , X_n, a stationary bootstrap version (X*_t)_{t∈Z} is given. We assume that

\[ P^{*\,X_0^*, X_k^*} \Longrightarrow P^{X_0, X_k} \quad \text{in probability} \tag{3.7} \]

holds for all k ∈ N and that

\[ E^*\big[ (X_0^*)^2 \big] \xrightarrow{P} E\big[ X_0^2 \big]. \tag{3.8} \]

(Starred symbols such as E* and P* refer as usual to the conditional distribution, given X_1, . . . , X_n.) (3.7) and (3.8) imply that

\[ \operatorname{cov}^*(X_0^*, X_k^*) \xrightarrow{P} \operatorname{cov}(X_0, X_k) \quad \forall k = 0, 1, \dots. \tag{3.9} \]
Assume further that there exist sequences (θ_{n,r})_{r∈N}, n ∈ N, which may depend on the original sample X_1, . . . , X_n, such that, for all u ∈ N and all indices 1 ≤ s_1 < s_2 < · · · < s_u < s_u + r = t_1 ≤ t_2 ≤ n, the following upper bounds for covariances hold true: for all measurable and square-integrable functions f : R^u → R,

\[ \big| \operatorname{cov}^*\big( f(X_{s_1}^*, \dots, X_{s_u}^*), X_{t_1}^* \big) \big| \le \Big( \sqrt{E^* f^2(X_{s_1}^*, \dots, X_{s_u}^*)} \, \sqrt{E^*(X_{t_1}^*)^2} \vee 1 \Big) \theta_{n,r}, \tag{3.10} \]

for all measurable and bounded functions f : R^u → R,

\[ \big| \operatorname{cov}^*\big( f(X_{s_1}^*, \dots, X_{s_u}^*), X_{t_1}^* X_{t_2}^* \big) \big| \le \|f\|_\infty \big( E^*(X_{t_1}^*)^2 + E^*(X_{t_2}^*)^2 + 1 \big) \theta_{n,r}, \tag{3.11} \]

and that

\[ P\big( \theta_{n,r} \le \bar{\theta}_r \ \ \forall r \in N \big) \xrightarrow[n\to\infty]{} 1, \tag{3.12} \]

for some summable sequence (θ̄_r)_{r∈N}. Actually, (3.10) to (3.12) can often be verified for model-based bootstrap schemes if the underlying model fulfills (3.4) and (3.5); see Neumann and Paparoditis (2008) for a few examples. We define X*_{n,k} = (X*_k − µ_n)/√n, where µ_n = E* X*_0. It follows, again from (3.7) and (3.8), that

\[ \sum_{k=1}^{n} E^*\big\{ (X_{n,k}^*)^2 \, I(|X_{n,k}^*| > \epsilon) \big\} = E^*\big\{ (X_0^* - \mu_n)^2 \, I(|X_0^* - \mu_n| > \epsilon \sqrt{n}) \big\} \xrightarrow{P} 0.
\]
Hence, the triangular scheme (X*_{n,k})_{k=1,...,n}, n ∈ N, satisfies the conditions of Theorem 2.1 “in probability”. The latter statement means that there exist measurable sets Ω_n with P((X_1, . . . , X_n)′ ∈ Ω_n) → 1 as n → ∞ such that, for any sequence (ω_n)_{n∈N} with ω_n ∈ Ω_n, conditioned on (X_1, . . . , X_n)′ = ω_n, the triangular scheme (X*_{n,k})_{k=1,...,n} satisfies the conditions of our CLT. This, however, implies that

\[ \sqrt{n} (\bar{X}_n^* - \mu_n) = X_{n,1}^* + \dots + X_{n,n}^* \xrightarrow{d} Z, \quad \text{in probability}. \tag{3.13} \]
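As a concrete, deliberately schematic instance of such a model-based bootstrap (our own sketch; all function names are ours), the code below fits an AR(1) by least squares, re-centres the residuals, and rebuilds the process with i.i.d. draws from them. The discreteness of these bootstrap innovations is exactly what obstructs mixing arguments, as discussed in the introduction, while weak-dependence conditions of the type (3.10)–(3.12) can still be verified.

```python
import random

def ar1_fit(x):
    """Least-squares estimate of theta in X_t = theta * X_{t-1} + eps_t."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def residual_bootstrap_means(x, n_boot=500, seed=3):
    """Bootstrap versions of the sample mean: draw i.i.d. innovations from
    the re-centred residuals and rebuild the process iteratively."""
    rng = random.Random(seed)
    theta = ar1_fit(x)
    res = [x[t] - theta * x[t - 1] for t in range(1, len(x))]
    mbar = sum(res) / len(res)
    res = [r - mbar for r in res]        # re-centred (mean-zero) residuals
    n, means = len(x), []
    for _ in range(n_boot):
        cur, xb = x[0], []               # start from an observed value
        for _ in range(n):
            cur = theta * cur + rng.choice(res)
            xb.append(cur)
        means.append(sum(xb) / n)
    return means

# A toy original sample from the same model class:
rng = random.Random(4)
x, cur = [], 0.0
for _ in range(500):
    cur = 0.5 * cur + rng.gauss(0.0, 1.0)
    x.append(cur)
boot = residual_bootstrap_means(x)
assert len(boot) == 500
```

Quantiles of √n(mean(boot) − X̄_n) would then calibrate the confidence interval for µ, in line with the display that follows.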
(3.6) and (3.13) yield that

\[ \sup_x \Big| P\big( \sqrt{n} (\bar{X}_n - \mu) \le x \big) - P^*\big( \sqrt{n} (\bar{X}_n^* - \mu_n) \le x \big) \Big| \xrightarrow{P} 0, \]

which means that the bootstrap produces an asymptotically correct confidence interval for µ.

3.3. Nonparametric density estimation under uniform mixing. As a further test case for the Lindeberg nature of our CLT, we consider asymptotics for a nonparametric density estimator under a uniform (φ-) mixing condition. Here we have the typical aspect of sparsity, that is, in the case of a kernel function with compact support, there is a vanishing share of nonzero summands. It turns out that a classical result by Billingsley (1968), derived by standard uniform mixing technology, can also be proved by our Theorem 2.1. Assume that observations X_1, . . . , X_n from a strictly stationary process (X_t)_{t∈Z} with values in R^d are available. We assume that X_0 has a density which we denote by f. The commonly used kernel density estimator f̂_n of f is given by

\[ \hat{f}_n(x) = \frac{1}{n h_n^d} \sum_{k=1}^{n} K\Big( \frac{x - X_k}{h_n} \Big), \]

where K : R^d → R is a measurable “kernel” function and (h_n)_{n∈N} is a sequence of (nonrandom) bandwidths. Favorable asymptotic properties of f̂_n(x) are usually
guaranteed if f is continuous at x, K possesses certain regularity properties and, as a minimal condition for consistency, if h_n → 0 and n h_n^d → ∞ as n → ∞. Here we will make the following assumptions:

• (X_t)_{t∈Z} is uniform (φ-) mixing with coefficients φ(r) satisfying Σ_{r=1}^{∞} (φ(r))^{1/2} < ∞.
• f is bounded and continuous at x.
• For each k ∈ N, the joint density of X_0 and X_k, f_{X_0,X_k}, is bounded.
• ∫_{R^d} |K(u)| λ^d(du) < ∞ and ∫_{R^d} K(u)² λ^d(du) < ∞.
• h_n → 0 and n h_n^d → ∞ as n → ∞.
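For concreteness, here is a minimal version of f̂_n for d = 1 (our own sketch: i.i.d. Gaussian draws are used only to keep the illustration short — the estimator itself is unchanged for dependent data — and the Epanechnikov kernel is one bounded, compactly supported choice satisfying the integrability assumptions above):

```python
import math, random

def epanechnikov(u):
    """Compactly supported kernel K(u) = 0.75 * (1 - u^2) on |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kde(xs, x, h):
    """f_hat(x) = (n h)^(-1) * sum_k K((x - X_k) / h), here with d = 1."""
    n = len(xs)
    return sum(epanechnikov((x - xk) / h) for xk in xs) / (n * h)

rng = random.Random(5)
xs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
h = len(xs) ** (-1 / 5)                  # the usual n^(-1/5) order for d = 1
f_hat = kde(xs, 0.0, h)
f_true = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at 0
assert abs(f_hat - f_true) < 0.05
```

Note the sparsity aspect mentioned above: with this compactly supported kernel, only the observations within distance h of x contribute nonzero summands.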
We will show how Theorem 2.1 can be employed for proving asymptotic normality of √(n h_n^d) (f̂_n(x) − E f̂_n(x)). To verify the conditions of our central limit theorem, we need appropriate covariance inequalities for φ-mixing processes. For s_1 < s_2 < . . . < s_u < t_1 < t_2 < . . . < t_v, it follows from Lemma 20.1 in Billingsley (1968) that

\[ \big| \operatorname{cov}\big( g(X_{s_1}, \dots, X_{s_u}), h(X_{t_1}, \dots, X_{t_v}) \big) \big| \le 2 \sqrt{\phi(t_1 - s_u)} \, \sqrt{E\{ g^2(X_{s_1}, \dots, X_{s_u}) \}} \, \sqrt{E\{ h^2(X_{t_1}, \dots, X_{t_v}) \}}. \tag{3.14} \]

Furthermore, Equation (20.28) in Billingsley (1968) yields that

\[ \big| \operatorname{cov}\big( g(X_{s_1}, \dots, X_{s_u}), h(X_{t_1}, \dots, X_{t_v}) \big) \big| \le 2 \, \phi(t_1 - s_u) \, E\{ |g(X_{s_1}, \dots, X_{s_u})| \} \, \|h\|_\infty. \tag{3.15} \]
In order to get condition (2.4) from our CLT satisfied, we will consider the process (X_t)_{t∈Z} in time-reversed order. We define, for k = 1, . . . , n and n ∈ N,

\[ X_{n,k} = (n h_n^d)^{-1/2} \big\{ K((x - X_{n-k+1})/h_n) - E K((x - X_{n-k+1})/h_n) \big\}. \]
Now let 1 ≤ s_1 < s_2 < . . . < s_u < t_1 ≤ t_2 ≤ n. We obtain from (3.14) that

\[ \big| \operatorname{cov}\big( f(X_{n,s_1}, \dots, X_{n,s_u}), X_{n,t_1} \big) \big| \le 2 \sqrt{\phi(t_1 - s_u)} \, \sqrt{E\{ f^2(X_{n,s_1}, \dots, X_{n,s_u}) \}} \, \sqrt{E X_{n,t_1}^2}, \tag{3.16} \]

and from (3.15) that

\[ \big| \operatorname{cov}\big( f(X_{n,s_1}, \dots, X_{n,s_u}), X_{n,t_1} X_{n,t_2} \big) \big| \le 2 \, \phi(t_1 - s_u) \, \|f\|_\infty \, E X_{n,t_1}^2. \tag{3.17} \]
Therefore, the weak dependence conditions (2.3) and (2.4) are fulfilled for θ_r = 2√φ(r). Furthermore, it is easy to see that the Lindeberg condition (2.2) is fulfilled by the triangular scheme. It remains to identify the limiting variance of X_{n,1} + · · · + X_{n,n}. Since EX²_{n,k} ≤ n^{−1} E{h_n^{−d} K²((x − X_{n−k+1})/h_n)} ≤ C n^{−1}, for some C < ∞, we obtain from (3.16) that

\[ \big| \operatorname{cov}(X_{n,k}, X_{n,k+l}) \big| \le 2 \sqrt{\phi(|l|)} \, C n^{-1}. \]

On the other hand, it follows from the boundedness of the joint densities that there exist finite constants C_1, C_2, . . . such that

\[ \big| \operatorname{cov}(X_{n,k}, X_{n,k+l}) \big| \le h_n^d \, C_l \, n^{-1} \]

holds for all l ∈ N. Both inequalities together yield, by dominated convergence, that

\[ \sum_{k=1}^{n} \sum_{l \ne k} \big| \operatorname{cov}(X_{n,k}, X_{n,l}) \big| \xrightarrow[n\to\infty]{} 0. \]

Finally, we have, as in the case of independent random variables, that

\[ n \operatorname{var}(X_{n,k}) \xrightarrow[n\to\infty]{} \sigma_0^2 := f(x) \int_{R^d} K^2(u) \, \lambda^d(du). \]

Hence, we obtain by Theorem 2.1 that

\[ \sqrt{n h_n^d} \big( \hat{f}_n(x) - E \hat{f}_n(x) \big) \xrightarrow{d} N(0, \sigma_0^2). \]
Remark 2. In the case of a constant bandwidth, h_n = h > 0 for all n ∈ N, we can apply the functional CLT (Theorem 20.1) in Billingsley (1968) to prove that

\[ \sqrt{n h^d} \big( \hat{f}_n(x) - E \hat{f}_n(x) \big) \xrightarrow{d} N(0, \sigma^2), \tag{3.18} \]

where σ² = var(h^{−d/2} K((x − X_0)/h)) + 2 Σ_{k=1}^{∞} cov(h^{−d/2} K((x − X_0)/h), h^{−d/2} K((x − X_k)/h)). Note that the proof of that functional CLT was heavily based on empirical process theory and did not use any of the typical methods of proving asymptotic normality. It can easily be seen that the time-reversed triangular scheme also satisfies the conditions of our Theorem 2.1. Therefore, our CLT provides an alternative proof of (3.18) which is closer to traditional paths.
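The device that drives the proof in the next section — comparing Eh(X_{n,1} + · · · + X_{n,n}) with Eh(Z_{n,1} + · · · + Z_{n,n}) for jointly Gaussian Z's with the same covariance structure — can be illustrated numerically (our own sketch, not from the paper). For an MA(1) row X_t = η_t + a η_{t−1}, a Gaussian process with matched covariances is obtained simply by replacing the Rademacher innovations by standard normal ones; h = cos plays the role of the smooth test function.

```python
import math, random

def ma1_sum(n, a, innov, rng):
    """S = (X_1 + ... + X_n) / sqrt(n) for X_t = eta_t + a * eta_{t-1}."""
    prev = innov(rng)
    s = 0.0
    for _ in range(n):
        cur = innov(rng)
        s += cur + a * prev
        prev = cur
    return s / math.sqrt(n)

def mc_mean_h(innov, reps=6000, n=300, a=0.5, seed=6):
    """Monte-Carlo estimate of E h(S) with the test function h = cos."""
    rng = random.Random(seed)
    return sum(math.cos(ma1_sum(n, a, innov, rng)) for _ in range(reps)) / reps

rademacher = lambda rng: rng.choice([-1.0, 1.0])   # non-Gaussian innovations
gaussian = lambda rng: rng.gauss(0.0, 1.0)         # matched-covariance side

# Both rows have the same covariance structure, so E h(sum X) and
# E h(sum Z) should be close for smooth h, as in (4.1) below:
diff = abs(mc_mean_h(rademacher) - mc_mean_h(gaussian))
assert diff < 0.05
```

Both estimates approximate E cos(N(0, (1 + a)²)) = exp(−(1 + a)²/2); the point of the comparison is that the two sides agree long before either row is "close to normal" coordinatewise.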
4. Proof of the main result

Proof of Theorem 2.1. Let h : R → R be an arbitrary bounded and three times continuously differentiable function with max{‖h′‖_∞, ‖h″‖_∞, ‖h^{(3)}‖_∞} ≤ 1. Furthermore, let Z_{n,1}, . . . , Z_{n,n} be centered and jointly Gaussian random variables which are independent of X_{n,1}, . . . , X_{n,n} and have the same covariance structure as X_{n,1}, . . . , X_{n,n}, that is, cov(Z_{n,j}, Z_{n,k}) = cov(X_{n,j}, X_{n,k}) for 1 ≤ j, k ≤ n. Since Z_{n,1} + · · · + Z_{n,n} ∼ N(0, σ_n²) ⇒ N(0, σ²), it follows from Theorem 7.1 in Billingsley (1968) that it suffices to show that

\[ E h(X_{n,1} + \dots + X_{n,n}) - E h(Z_{n,1} + \dots + Z_{n,n}) \xrightarrow[n\to\infty]{} 0. \tag{4.1} \]

We define S_{n,k} = Σ_{j=1}^{k−1} X_{n,j} and T_{n,k} = Σ_{j=k+1}^{n} Z_{n,j}. Then,

\[ E h(X_{n,1} + \dots + X_{n,n}) - E h(Z_{n,1} + \dots + Z_{n,n}) = \sum_{k=1}^{n} \Delta_{n,k}, \]

where ∆_{n,k} = E{h(S_{n,k} + X_{n,k} + T_{n,k}) − h(S_{n,k} + Z_{n,k} + T_{n,k})}. We set v_{n,k} = ½ EX²_{n,k} + Σ_{j=1}^{k−1} EX_{n,k}X_{n,j} and v̄_{n,k} = ½ EZ²_{n,k} + Σ_{j=k+1}^{n} EZ_{n,k}Z_{n,j}. Note that 2v_{n,k} = var(S_{n,k} + X_{n,k}) − var(S_{n,k}) are the commonly used increments of the variance of the partial sum process while 2v̄_{n,k} = var(Z_{n,k} + T_{n,k}) − var(T_{n,k}) are
the increments of the variance of the reversed partial sum process. Now we decompose ∆_{n,k} further as ∆_{n,k} = ∆^{(1)}_{n,k} − ∆^{(2)}_{n,k} + ∆^{(3)}_{n,k}, where

\[ \begin{aligned} \Delta^{(1)}_{n,k} &= E h(S_{n,k} + X_{n,k} + T_{n,k}) - E h(S_{n,k} + T_{n,k}) - v_{n,k} \, E h''(S_{n,k} + T_{n,k}), \\ \Delta^{(2)}_{n,k} &= E h(S_{n,k} + Z_{n,k} + T_{n,k}) - E h(S_{n,k} + T_{n,k}) - \bar{v}_{n,k} \, E h''(S_{n,k} + T_{n,k}), \\ \Delta^{(3)}_{n,k} &= (v_{n,k} - \bar{v}_{n,k}) \, E h''(S_{n,k} + T_{n,k}). \end{aligned} \]

We will show that

\[ \sum_{k=1}^{n} \Delta^{(i)}_{n,k} \xrightarrow[n\to\infty]{} 0, \quad \text{for } i = 1, 2, 3. \tag{4.2} \]

(i) Upper bound for |Σ_{k=1}^{n} ∆^{(1)}_{n,k}|
Let ε > 0 be arbitrary. We will actually show that

\[ \Big| \sum_{k=1}^{n} \Delta^{(1)}_{n,k} \Big| \le \epsilon \quad \text{for all } n \ge n(\epsilon) \tag{4.3} \]

and n(ε) sufficiently large. It follows from a Taylor series expansion that

\[ \Delta^{(1)}_{n,k} = E X_{n,k} h'(S_{n,k} + T_{n,k}) + E\Big\{ \frac{X_{n,k}^2}{2} \, h''(S_{n,k} + \tau_{n,k} X_{n,k} + T_{n,k}) \Big\} - v_{n,k} \, E h''(S_{n,k} + T_{n,k}), \]

for some appropriate random τ_{n,k} ∈ (0, 1). (It is not clear that τ_{n,k} is measurable; however, X²_{n,k} h″(S_{n,k} + τ_{n,k} X_{n,k} + T_{n,k}) is. Therefore, the above expectation is correctly defined.) Since S_{n,1} = 0 and, therefore, EX_{n,k} h′(S_{n,1} + T_{n,k}) = 0, we obtain that

\[ E X_{n,k} h'(S_{n,k} + T_{n,k}) = \sum_{j=1}^{k-1} E X_{n,k} \big\{ h'(S_{n,j+1} + T_{n,k}) - h'(S_{n,j} + T_{n,k}) \big\} = \sum_{j=1}^{k-1} E X_{n,k} X_{n,j} \, h''(S_{n,j} + \mu_{n,k,j} X_{n,j} + T_{n,k}), \]

again for some random µ_{n,k,j} ∈ (0, 1). This allows us to decompose ∆^{(1)}_{n,k} as follows:

\[ \begin{aligned} \Delta^{(1)}_{n,k} = {} & \sum_{j=1}^{k-d-1} E\big[ X_{n,k} X_{n,j} \big\{ h''(S_{n,j} + \mu_{n,k,j} X_{n,j} + T_{n,k}) - E h''(S_{n,k} + T_{n,k}) \big\} \big] \\ & + \sum_{j=k-d}^{k-1} E\big[ X_{n,k} X_{n,j} \big\{ h''(S_{n,j} + \mu_{n,k,j} X_{n,j} + T_{n,k}) - E h''(S_{n,k} + T_{n,k}) \big\} \big] \\ & + \frac{1}{2} E\big[ X_{n,k}^2 \big\{ h''(S_{n,k} + \tau_{n,k} X_{n,k} + T_{n,k}) - E h''(S_{n,k} + T_{n,k}) \big\} \big] \\ = {} & \Delta^{(1,1)}_{n,k} + \Delta^{(1,2)}_{n,k} + \Delta^{(1,3)}_{n,k}, \end{aligned} \tag{4.4} \]

say.
By choosing d sufficiently large, we obtain from (2.3) that

\[ \sum_{k=1}^{n} \big| \Delta^{(1,1)}_{n,k} \big| \le 2 v_0 \sum_{j=d+1}^{\infty} \theta_j \le \frac{\epsilon}{3} \quad \text{for all } n. \tag{4.5} \]

The term ∆^{(1,2)}_{n,k} will be split up as

\[ \begin{aligned} \Delta^{(1,2)}_{n,k} = {} & \sum_{j=k-d}^{k-1} E\big[ X_{n,k} X_{n,j} \big\{ h''(S_{n,j} + \mu_{n,k,j} X_{n,j} + T_{n,k}) - h''(S_{n,j-d'} + T_{n,k}) \big\} \big] \\ & + \sum_{j=k-d}^{k-1} E\big[ X_{n,k} X_{n,j} \big\{ h''(S_{n,j-d'} + T_{n,k}) - E h''(S_{n,j-d'} + T_{n,k}) \big\} \big] \\ & + \sum_{j=k-d}^{k-1} E X_{n,k} X_{n,j} \big\{ E h''(S_{n,j-d'} + T_{n,k}) - E h''(S_{n,k} + T_{n,k}) \big\} \\ = {} & \Delta^{(1,2,1)}_{n,k} + \Delta^{(1,2,2)}_{n,k} + \Delta^{(1,2,3)}_{n,k}, \end{aligned} \tag{4.6} \]
say. (The proper choice of d′ will become clear from what follows.) Using the decomposition X_{n,k} = X_{n,k} I(|X_{n,k}| > ε′) + X_{n,k} I(|X_{n,k}| ≤ ε′), for some ε′ > 0, we obtain by the Cauchy–Schwarz inequality and the Lindeberg condition (2.2) that

\[ \begin{aligned} \sum_{k=1}^{n} \big| \Delta^{(1,2,1)}_{n,k} \big| \le {} & \sqrt{2d \sum_{j=1}^{n} E X_{n,j}^2} \, \sqrt{\sum_{k=1}^{n} E X_{n,k}^2 \, I(|X_{n,k}| > \epsilon')} \\ & + \epsilon' \sqrt{\sum_{k=1}^{n} \sum_{j=k-d}^{k-1} E X_{n,j}^2} \, \sqrt{\sum_{k=1}^{n} \sum_{j=k-d}^{k-1} E\big\{ h''(S_{n,j} + \mu_{n,k,j} X_{n,j} + T_{n,k}) - h''(S_{n,j-d'} + T_{n,k}) \big\}^2} \\ = {} & o(1) + O(\epsilon'). \end{aligned} \tag{4.7} \]

Using condition (2.4) we obtain that

\[ \sum_{k=1}^{n} \big| \Delta^{(1,2,2)}_{n,k} \big| \le 2 d \, \theta_{d'} \Big( \sum_{k=1}^{n} \big( E X_{n,k}^2 \vee n^{-1} \big) \Big). \tag{4.8} \]

Furthermore, it follows from the Lindeberg condition (2.2) that max_{1≤k≤n} {var(X_{n,k})} → 0 as n → ∞, which implies that

\[ \sum_{k=1}^{n} \big| \Delta^{(1,2,3)}_{n,k} \big| \xrightarrow[n\to\infty]{} 0. \]

This implies, in conjunction with (4.7) and (4.8), that

\[ \Big| \sum_{k=1}^{n} \Delta^{(1,2)}_{n,k} \Big| \le \frac{\epsilon}{3} \quad \text{for all } n \ge n^{(1)}(\epsilon), \tag{4.9} \]
provided that d′ and n^{(1)}(ε) are chosen sufficiently large. Finally, we obtain in complete analogy to the computations above that

\[ \Big| \sum_{k=1}^{n} \Delta^{(1,3)}_{n,k} \Big| \le \epsilon/3 \quad \text{for all } n \ge n^{(2)}(\epsilon), \tag{4.10} \]

which completes, in conjunction with (4.5) and (4.9), the proof of (4.3).

(ii) Upper bound for |Σ_{k=1}^{n} ∆^{(2)}_{n,k}|

Similarly to (4.4), we decompose ∆^{(2)}_{n,k} as

\[ \begin{aligned} \Delta^{(2)}_{n,k} = {} & E\big[ Z_{n,k} h'(S_{n,k} + T_{n,k+d}) \big] + \sum_{j=k+1}^{k+d} E\big[ Z_{n,k} Z_{n,j} \big\{ h''(S_{n,k} + \bar{\mu}_{n,k,j} Z_{n,j} + T_{n,j}) - E h''(S_{n,k} + T_{n,k}) \big\} \big] \\ & + \frac{1}{2} E\big[ Z_{n,k}^2 \big\{ h''(S_{n,k} + \bar{\tau}_{n,k} Z_{n,k} + T_{n,k}) - E h''(S_{n,k} + T_{n,k}) \big\} \big] \\ = {} & \Delta^{(2,1)}_{n,k} + \Delta^{(2,2)}_{n,k} + \Delta^{(2,3)}_{n,k}, \end{aligned} \]

say, where µ̄_{n,k,j} and τ̄_{n,k} are appropriate random quantities with values in (0, 1). It follows from (2.3) that

\[ \big| \operatorname{cov}(Z_{n,j}, Z_{n,k}) \big| = \big| \operatorname{cov}(X_{n,j}, X_{n,k}) \big| \le \Big( \sqrt{E X_{n,j}^2} \vee n^{-1/2} \Big) \Big( \sqrt{E X_{n,k}^2} \vee n^{-1/2} \Big) \theta_{|j-k|}. \tag{4.11} \]

For our following estimates we make use of Lemma 1 from Liu (1994), which states that, for X := (X_1, . . . , X_k)′ ∼ N(µ, Σ) and any function h : R^k → R such that ∂h/∂x_i exists almost everywhere and E|∂h(X)/∂x_i| < ∞, i = 1, . . . , k, the following identity holds true:

\[ \operatorname{cov}(X, h(X)) = \Sigma \left( E\Big[ \frac{\partial}{\partial x_1} h(X) \Big], \dots, E\Big[ \frac{\partial}{\partial x_k} h(X) \Big] \right)'. \tag{4.12} \]

Since S_{n,k} is by construction independent of Z_{n,1}, . . . , Z_{n,n}, we obtain from (4.12) that

\[ \Delta^{(2,1)}_{n,k} = \operatorname{cov}\big( Z_{n,k}, h'(S_{n,k} + Z_{n,k+d+1} + \dots + Z_{n,n}) \big) = \sum_{j=k+d+1}^{n} \operatorname{cov}(Z_{n,k}, Z_{n,j}) \, E h''(S_{n,k} + T_{n,k+d}), \]

which implies that

\[ \sum_{k=1}^{n} \big| \Delta^{(2,1)}_{n,k} \big| \le v_0 \sum_{j=d+1}^{\infty} \theta_j. \tag{4.13} \]
Similarly to (4.6), we decompose ∆^{(2,2)}_{n,k} as

\[ \begin{aligned} \Delta^{(2,2)}_{n,k} = {} & \sum_{j=k+1}^{k+d} E\big[ Z_{n,k} Z_{n,j} \big\{ h''(S_{n,k} + \bar{\mu}_{n,k,j} Z_{n,j} + T_{n,j}) - h''(S_{n,k} + T_{n,j+d'}) \big\} \big] \\ & + \sum_{j=k+1}^{k+d} E\big[ Z_{n,k} Z_{n,j} \big\{ h''(S_{n,k} + T_{n,j+d'}) - E h''(S_{n,k} + T_{n,j+d'}) \big\} \big] \\ & + \sum_{j=k+1}^{k+d} E\big[ Z_{n,k} Z_{n,j} \big] \big\{ E h''(S_{n,k} + T_{n,j+d'}) - E h''(S_{n,k} + T_{n,k}) \big\} \\ = {} & \Delta^{(2,2,1)}_{n,k} + \Delta^{(2,2,2)}_{n,k} + \Delta^{(2,2,3)}_{n,k}, \end{aligned} \tag{4.14} \]

say. It follows from the Lindeberg condition (2.2) that Σ_{k=1}^{n} E|Z_{n,k}|³ → 0 as n → ∞, which implies that

\[ \sum_{k=1}^{n} \big| \Delta^{(2,2,1)}_{n,k} \big| \xrightarrow[n\to\infty]{} 0. \tag{4.15} \]
We obtain from (4.12) that

\[ \begin{aligned} E\big[ Z_{n,k} Z_{n,j} \big\{ h''(S_{n,k} + T_{n,j+d'}) - E h''(S_{n,k} + T_{n,j+d'}) \big\} \big] &= \operatorname{cov}\big( Z_{n,k} Z_{n,j}, h''(S_{n,k} + T_{n,j+d'}) \big) \\ &= \operatorname{cov}\big( Z_{n,k}, Z_{n,j} h''(S_{n,k} + T_{n,j+d'}) \big) - E[Z_{n,k} Z_{n,j}] \, E h''(S_{n,k} + T_{n,j+d'}) \\ &= \sum_{l \ge d'+1} \operatorname{cov}(Z_{n,k}, Z_{n,j+l}) \, E\big[ Z_{n,j} h^{(3)}(S_{n,k} + T_{n,j+d'}) \big], \end{aligned} \]

which provides that

\[ \sum_{k=1}^{n} \Delta^{(2,2,2)}_{n,k} = O\Big( \sum_{j=1}^{d} \sum_{l=d'+1}^{\infty} \theta_{j+l} \Big). \tag{4.16} \]

Finally, Σ_{k=1}^{n} ∆^{(2,2,3)}_{n,k} = O( Σ_{k=1}^{n} E|Z_{n,k}|³ ) = o(1) is obvious. This yields, in conjunction with (4.15) and (4.16), that

\[ \sum_{k=1}^{n} \Delta^{(2,2)}_{n,k} \xrightarrow[n\to\infty]{} 0. \tag{4.17} \]
The term Σ_{k=1}^{n} ∆^{(2,3)}_{n,k} can be estimated analogously, which gives, together with (4.13) and (4.17), that

\[ \sum_{k=1}^{n} \Delta^{(2)}_{n,k} \xrightarrow[n\to\infty]{} 0. \tag{4.18} \]

(iii) Upper bound for |Σ_{k=1}^{n} ∆^{(3)}_{n,k}|
We split up:

\[ \begin{aligned} \sum_{k=1}^{n} \Delta^{(3)}_{n,k} = {} & \sum_{k=1}^{n} \Big( \sum_{j=k-d}^{k-1} E X_{n,k} X_{n,j} - \sum_{j=k+1}^{k+d} E X_{n,k} X_{n,j} \Big) E h''(S_{n,k} + T_{n,k}) \\ & + \sum_{k=1}^{n} \Big( \sum_{j=1}^{k-d-1} E X_{n,k} X_{n,j} - \sum_{j=k+d+1}^{n} E X_{n,k} X_{n,j} \Big) E h''(S_{n,k} + T_{n,k}) \\ = {} & \sum_{k=1}^{n} \sum_{j=1}^{d} E\big[ X_{n,k} X_{n,k-j} \big] \big\{ E h''(S_{n,k} + T_{n,k}) - E h''(S_{n,k-j} + T_{n,k-j}) \big\} + O\Big( \sum_{r=d+1}^{\infty} \theta_r \Big). \end{aligned} \tag{4.19} \]
We obtain, again from the Lindeberg condition (2.2), that

\[ \max_{1 \le k \le n, \ 1 \le j \le d} \big| E h''(S_{n,k} + T_{n,k}) - E h''(S_{n,k-j} + T_{n,k-j}) \big| \xrightarrow[n\to\infty]{} 0. \]
Therefore, the first term on the right-hand side of (4.19) converges to 0 as n → ∞. The second one can be made arbitrarily small if d is chosen large enough. This completes the proof of the theorem.

Acknowledgment. This research was supported by the German Research Foundation DFG (project: NE 606/2-1).

References

Andrews, D. W. K. (1984). Non-strong mixing autoregressive processes. J. Appl. Probab. 21, 930–934.
Bardet, J. M., Doukhan, P., Lang, G. and Ragache, N. (2008). Dependent Lindeberg central limit theorem and some applications. ESAIM: Probability and Statistics 12, 154–172.
Bickel, P. J. and Bühlmann, P. (1999). A new mixing notion and functional central limit theorems for a sieve bootstrap in time series. Bernoulli 5, 413–446.
Billingsley, P. (1961). The Lindeberg–Lévy theorem for martingales. Proc. Amer. Math. Soc. 12, 788–792.
Billingsley, P. (1968). Convergence of Probability Measures. New York: Wiley.
Coulon-Prieur, C. and Doukhan, P. (2000). A triangular central limit theorem under a new weak dependence condition. Statist. Probab. Lett. 47, 61–68.
Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. Ann. Statist. 25, 1–37.
Dahlhaus, R. (2009). Local inference for locally stationary time series based on the empirical spectral measure. Journal of Econometrics 151, 101–112.
Dedecker, J. (1998). A central limit theorem for stationary random fields. Probab. Theory Related Fields 110, 397–426.
Dedecker, J., Doukhan, P., Lang, G., León, J. R., Louhichi, S. and Prieur, C. (2007). Weak Dependence: With Examples and Applications. Lecture Notes in Statistics 190, Springer-Verlag.
Dedecker, J. and Merlevède, F. (2002). Necessary and sufficient conditions for the conditional central limit theorem. Ann. Probab. 30, 1044–1081.
Dedecker, J. and Rio, E. (2000). On the functional central limit theorem for stationary processes. Annales de l'Institut Henri Poincaré, série B 36, 1–34.
Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics 85, Springer-Verlag.
Doukhan, P. and Louhichi, S. (1999). A new weak dependence condition and application to moment inequalities. Stoch. Proc. Appl. 84, 313–342.
Ibragimov, I. A. (1962). Some limit theorems for stationary processes. Teor. Veroyatn. Primen. 7, 361–392. (in Russian). [English translation: Theory Probab. Appl. 7, 349–382.]
Ibragimov, I. A. (1963). A central limit theorem for a class of dependent random variables. Teor. Veroyatnost. i Primenen. 8, 89–94. (in Russian). [English translation: Theory Probab. Appl. 8, 83–89.]
Lindeberg, J. W. (1922). Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Mathematische Zeitschrift 15, 211–225.
Liu, J. S. (1994). Siegel's formula via Stein's identities. Statist. Probab. Lett. 21, 247–251.
Neumann, M. H. and Paparoditis, E. (2008). Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations. Bernoulli 14, 14–46.
Neumann, M. H. and Paparoditis, E. (2010). A test for stationarity. Manuscript.
Neumann, M. H. and von Sachs, R. (1997). Wavelet thresholding in anisotropic function classes and application to adaptive estimation of evolutionary spectra. Ann. Statist. 25, 38–76.
Rio, E. (1995). About the Lindeberg method for strongly mixing sequences. ESAIM, Probab. Statist. 1, 35–61.
Rosenblatt, M. (1956). A central limit theorem and a strong mixing condition. Proceedings of the National Academy of Sciences USA 42, 43–47.
Rosenblatt, M. (1980). Linear processes and bispectra. J. Appl. Probab. 17, 265–270.
Volkonski, V. A. and Rozanov, Y. A. (1959). Some limit theorems for random functions, Part I. Teor. Veroyatn. Primen. 4, 186–207. (in Russian). [English translation: Theory Probab. Appl. 4, 178–197.]