N. N. LEONENKO (*) – EMANUELE TAUFER (**)

Asymptotic properties of LSE in multivariate continuous regression with long memory stationary errors

Contents: 1. Introduction. — 2. The model. — 3. Variance and asymptotic distributions. — 4. Some simulated results. Appendix. References. Summary. Riassunto. Key words.

1. Introduction

The presence of data showing long range dependence is a well recognized problem in many important fields such as astronomy, agriculture, chemistry, economics, finance, geophysics, hydrology, meteorology and telecommunications. The study of random processes with long range dependence presents interesting and challenging probabilistic and statistical problems. Recent literature has seen an increasing number of papers developing models for the description and analysis of this phenomenon; for a review of models and applications the reader is referred to Beran (1992, 1994) and Baillie (1996). In the following we will consider a continuous multivariate regression model

$$\nu(t) = g(t)'A + \eta(t), \qquad t \in \mathbb{R}, \qquad (1.1)$$

(*) School of Mathematics, Cardiff University, Senghennydd Road, Cardiff CF2 4YH, Wales, UK E-mail: [email protected] and Department of Mathematics, Kyiv University (National), Volodymyrska, 64, 252601, Kyiv, Ukraine E-mail: [email protected] (**) Department of Computer and Management Sciences, University of Trento, Via Inama 5, 38100 Trento, Italy E-mail: [email protected] The research was partially supported by the Australian Research Council Grants A 69804041, A 89601825, C 19600199 for the first author

where ν(t) = (ν_1(t),...,ν_p(t))' is a p-vector of observed responses over a time interval [0,T] ⊂ R, g(t) = (g_1(t),...,g_q(t))' is a known q-vector of functions of time t forming a linearly independent set of real functions, positive and square integrable over the finite interval [0,T] ⊂ R; A = [a_{ij}], i = 1,2,...,q, j = 1,2,...,p, is a matrix of unknown regression coefficients, and η(t) = (η_1(t),...,η_p(t))' is a p-vector of errors.

Our paper focuses on the asymptotic properties of the Least Squares Estimators (LSE) of the parameter matrix A when the error process has a long range dependence structure. In particular we consider a quite general setting where the error process η(t) is a non-linear function of stationary Gaussian processes, thus encompassing both Gaussian and non-Gaussian situations. The line of investigation follows the path laid down by Leonenko and Benšić (1996, 1998), who studied model (1.1) for the case p = 1; their results were obtained by the methods presented in the works of Dobrushin and Major (1979) and Taqqu (1979) and are based on the asymptotic analysis of orthogonal expansions of non-linear functionals of Gaussian processes.

Our findings differ significantly from the corresponding results for regression models with weakly dependent errors (see Hannan, 1970). It will be shown that the limiting distributions of the estimators may be non-Gaussian and that the proper normalizing factor depends on g(t) and on αm < 1; here α ∈ (0,1) is the so-called dependence index and m ∈ N is the (Hermite) rank of the first non-null term in an expansion of the error process by means of Hermite polynomials. In particular, for polynomial regression, i.e. g_i(t) = t^{μ_i}, μ_i > 0, i = 1,...,q, the estimators can achieve convergence rates better than T^{1/2}. These results indicate that regression may still perform very well in settings where long range dependence is present. We do not consider here the problem of estimation of the parameter α; for this, see Giraitis and Koul (1997) and their references.

Note that the paper considers regression for t ∈ R; it could be pointed out that even if many of today's data sets are available virtually in continuous time, practical applications require considering the observed phenomena at fixed time points. Nevertheless, discretization procedures sometimes lead to a loss of information on important parameters; see, for example, the discretized version of the spectral density presented in Leonenko (1999, p. 42), which does not depend on the intermittency parameter β, an important feature of both turbulence and finance processes. Our results then extend and complement the results obtained for regression models with long memory in discrete time; in particular see Yajima (1988, 1991), Künsch, Beran and Hampel (1993), Dahlhaus (1995), Robinson and Hidalgo (1997), Deo (1997), Deo and Hurvich (1998), Beran and Ghosh (1998). One can also consult Koul and Mukherjee (1993, 1994), and their references, who considered the asymptotic properties of various robust estimates of regression coefficients. For other continuous time models and applications see Comte and Renault (1996).

2. The model

In order to study model (1.1) we rewrite it in vector form. For this purpose, let a^{(j)} denote the j-th column of A; hence

$$\nu(t) = G(t)'a + \eta(t), \qquad t \in \mathbb{R}, \qquad (2.1)$$

where a = (a^{(1)'},...,a^{(p)'})' is the vector of unknown regression coefficients obtained by stacking the columns of A one over the other, G(t)' = I_p ⊗ g(t)', and I_p denotes the p × p identity matrix.

Next, we need to state the exact structure of the error vector η(t). Let us begin with the following set of assumptions.

Assumption 2.1. Let ξ(t) = (ξ_1(t),...,ξ_p(t))', t ∈ R, be a p-vector stationary Gaussian process with

$$E\xi(t) = 0, \qquad E\xi(0)\xi(t)' = B(t),$$

where B(t) = [B_{ij}(t)]_{1\le i,j\le p}, with

$$B_{ij}(t) = \frac{1}{[1+t^2]^{\alpha/2}}, \quad i = j, \qquad B_{ij}(t) = \frac{\rho_0}{[1+t^2]^{\alpha/2}}, \quad i \ne j, \qquad (2.2)$$

and α ∈ (0,1), ρ_0 ∈ [0,1).
A consequence of Assumption 2.1 is that the covariance term 1/(1+t²)^{α/2} has a spectral density f(λ) = f(|λ|), λ ∈ R, which has the following exact form (Leonenko and Anh, 2000):

$$f_\alpha(\lambda) = f_\alpha(|\lambda|) = \frac{2^{\frac{1-\alpha}{2}}}{\Gamma(\frac{\alpha}{2})\sqrt{\pi}}\, K_{\frac{\alpha-1}{2}}(|\lambda|)\, |\lambda|^{\frac{\alpha-1}{2}}, \qquad \lambda \in \mathbb{R},$$

where

$$K_\nu(z) = \frac{1}{2}\int_0^{\infty} s^{\nu-1}\exp\left\{-\frac{1}{2}\,z\left(s+\frac{1}{s}\right)\right\}ds, \qquad z > 0,$$

is the modified Bessel function of the third kind of order ν (see, for example, Watson, 1944). We need to note that

$$K_\nu(z) \sim \Gamma(\nu)\,2^{\nu-1} z^{-\nu}, \qquad z \downarrow 0, \quad \nu > 0.$$

Using this last fact it is possible to obtain the following representation for the spectral density (see Donoghue, 1969, p. 295):

$$f_\alpha(\lambda) = c_1(\alpha)\,|\lambda|^{\alpha-1}\big(1-\theta(|\lambda|)\big), \qquad \alpha \in (0,1), \quad \lambda \in \mathbb{R}, \qquad (2.3)$$

where c_1(α) = [2Γ(α) cos(απ/2)]^{-1} and θ(|λ|) → 0 as |λ| → 0.
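As a hedged numerical check (added here, not in the paper), the sketch below evaluates the exact Bessel form of f_α and compares it with the low-frequency approximation c_1(α)|λ|^{α−1} of (2.3); the ratio should approach 1 as |λ| → 0. It only assumes numpy and scipy.special.

```python
import numpy as np
from scipy.special import gamma, kv   # kv = modified Bessel function K_nu

def f_exact(lam, alpha):
    """Exact spectral density of the covariance (1+t^2)^(-alpha/2)."""
    lam = np.abs(lam)
    const = 2.0 ** ((1.0 - alpha) / 2.0) / (gamma(alpha / 2.0) * np.sqrt(np.pi))
    return const * kv((alpha - 1.0) / 2.0, lam) * lam ** ((alpha - 1.0) / 2.0)

def f_approx(lam, alpha):
    """Low-frequency form c_1(alpha)|lam|^(alpha-1) of (2.3)."""
    c1 = 1.0 / (2.0 * gamma(alpha) * np.cos(alpha * np.pi / 2.0))
    return c1 * np.abs(lam) ** (alpha - 1.0)

alpha = 0.5
for lam in [1.0, 0.1, 0.01, 0.001]:
    print(lam, f_exact(lam, alpha) / f_approx(lam, alpha))   # ratio -> 1 as lam -> 0
```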

Assumption 2.2. The p-vector of errors η(t), t ∈ R, admits a representation η(t) = U(ξ(t)), where U(ξ(t)) = (U_1(ξ(t)),...,U_p(ξ(t)))' and U_i : R^p → R, 1 ≤ i ≤ p, are real Borel functions such that EU_i(ξ(t)) = 0 and E(U_i(ξ(t)))² < ∞ for any 1 ≤ i ≤ p.

The requirement that α ∈ (0,1) introduces long range dependence in the process ξ(t). Also, we assume a compound symmetry of B(t) by setting the same α for all terms B_{ij}(t), the degree of interaction within the vector ξ(t) being determined by the parameter ρ_0. This will enable us to keep the development of the asymptotic theory for the LSE relatively simple; note however that by an appropriate choice of the function U we can introduce any desired structure in the correlation matrix of the error process η(t).

On the basis of our assumptions, the next step is to find an expansion of η(t) in terms of orthogonal polynomials. In order to do this, let R = [R_{ij}]_{1\le i,j\le p} where

$$R_{1j} = [p(1+(p-1)\rho_0)]^{-1/2}, \quad 1 \le j \le p,$$
$$R_{ij} = [i(i-1)(1-\rho_0)]^{-1/2}, \quad 1 \le j \le i-1, \; 2 \le i \le p,$$
$$R_{ii} = -(i-1)^{1/2}[i(1-\rho_0)]^{-1/2}, \quad 2 \le i \le p,$$
$$R_{ij} = 0 \quad \text{otherwise}.$$

We also point out that R^{-1} exists; for details see Leonenko (1999, pp. 171-172). Then the process

$$\tilde{\xi}(t) = R\,\xi(t)$$

is a stationary Gaussian process with

$$E\tilde{\xi}(t) = 0, \qquad E\tilde{\xi}(0)\tilde{\xi}(t)' = \tilde{B}(t),$$

where $\tilde{B}(t) = [\tilde{B}_{ij}(t)]_{1\le i,j\le p}$ with

$$\tilde{B}_{ij}(t) = B_{ij}(t), \quad i = j, \qquad \tilde{B}_{ij}(t) = 0, \quad i \ne j. \qquad (2.4)$$
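The matrix R acts as a whitening transformation for the compound-symmetric correlation of ξ(t). The following sketch (an added illustration, not taken from the paper) builds R for given p and ρ_0 and checks numerically that R B(0) R' = I_p, which is what makes the components of ξ̃(t) = Rξ(t) uncorrelated.

```python
import numpy as np

def make_R(p, rho0):
    """Whitening matrix R of Section 2 for the compound-symmetry correlation."""
    R = np.zeros((p, p))
    R[0, :] = (p * (1.0 + (p - 1) * rho0)) ** (-0.5)           # first row
    for i in range(2, p + 1):                                   # rows i = 2,...,p
        R[i - 1, : i - 1] = (i * (i - 1) * (1.0 - rho0)) ** (-0.5)
        R[i - 1, i - 1] = -np.sqrt(i - 1) * (i * (1.0 - rho0)) ** (-0.5)
    return R

p, rho0 = 4, 0.3
B0 = (1.0 - rho0) * np.eye(p) + rho0 * np.ones((p, p))          # B(0) from (2.2)
R = make_R(p, rho0)
print(np.allclose(R @ B0 @ R.T, np.eye(p)))                     # True: R B(0) R' = I_p
```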

It follows that we can express the error process η(t) as a non-linear functional of the process ξ̃(t) with independent components. Let

$$\eta(t) = U(R^{-1}R\,\xi(t)) = \tilde{U}(\tilde{\xi}(t)), \qquad (2.5)$$

where Ũ(ξ̃(t)) = (Ũ_1(ξ̃(t)),...,Ũ_p(ξ̃(t)))' and Ũ_i(ξ̃(t)) = U_i(R^{-1}R\,ξ(t)), 1 ≤ i ≤ p.

At this point we are ready to provide an orthogonal expansion for the process η(t). For this purpose, let

$$H_m(u) = (-1)^m e^{u^2/2}\,\frac{d^m}{du^m}\,e^{-u^2/2}, \qquad u \in \mathbb{R}, \; m \in \mathbb{N},$$

denote the Chebyshev-Hermite polynomials with leading coefficient equal to 1, and define

$$e_m(u) = \prod_{j=1}^{p} H_{m_j}(u_j),$$

where u = (u_1,...,u_p)' and m = (m_1,...,m_p)', m_j ∈ N. The set of polynomials {e_m(u)} forms a complete orthogonal system in the Hilbert space

$$L_2\Big(\mathbb{R}^p, \prod_{j=1}^{p}\phi(u_j)\,du_j\Big) = \Big\{\tilde{U}: \int_{\mathbb{R}^p}\tilde{U}^2\prod_{j=1}^{p}\phi(u_j)\,du_j < \infty\Big\},$$

where φ(u) denotes the standard Gaussian density. The functions Ũ_i ∈ L_2(R^p, ∏_{j=1}^p φ(u_j)du_j), 1 ≤ i ≤ p, admit an expansion in a mean square convergent series of the following form:

$$\tilde{U}_i(u) = \sum_{m\ge 0}\;\sum_{m\in S_m}\frac{1}{m!}\,C_m(i)\,e_m(u),$$

where m! = ∏_{j=1}^{p} m_j!, S_m = {m : m_j ∈ N, 1 ≤ j ≤ p, ∑_{j=1}^{p} m_j = m}, and

$$C_m(i) = \int_{\mathbb{R}^p}\tilde{U}_i(u)\,e_m(u)\prod_{j=1}^{p}\phi(u_j)\,du_j. \qquad (2.6)$$
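To make the coefficients (2.6) concrete, the sketch below (an added illustration, not from the paper) computes C_m(i) by Gauss-Hermite quadrature for the quadratic error function η_1 = ξ_1² + 0.2ξ_2² − 1.2 used in Section 4 (setting S2 with ρ_0 = 0, so Ũ_1 = U_1); it recovers C_{(1,0)}(1) = 0, C_{(2,0)}(1) = 2 and C_{(0,2)}(1) = 0.4, i.e. a Hermite rank of 2.

```python
import numpy as np
from numpy.polynomial import hermite_e as He   # probabilists' Hermite polynomials He_m

def hermite_coeff(U, m, nodes=40):
    """C_m(i) of (2.6): E[ U(xi) * prod_j He_{m_j}(xi_j) ] for xi ~ N(0, I_p),
    computed with a tensor Gauss-Hermite(E) quadrature rule."""
    x, w = He.hermegauss(nodes)                 # weight exp(-u^2/2); sum(w) = sqrt(2*pi)
    w = w / np.sqrt(2.0 * np.pi)                # normalize to the standard Gaussian density
    grids = np.meshgrid(*([x] * len(m)), indexing="ij")
    wgrids = np.meshgrid(*([w] * len(m)), indexing="ij")
    weight = np.prod(np.stack(wgrids), axis=0)
    poly = np.ones_like(grids[0])
    for g, mj in zip(grids, m):
        poly = poly * He.hermeval(g, [0.0] * mj + [1.0])   # He_{m_j}(u_j), leading coefficient 1
    return float(np.sum(U(*grids) * poly * weight))

U1 = lambda u1, u2: u1**2 + 0.2 * u2**2 - 1.2              # eta_1 in setting S2 of Section 4
for m in [(1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]:
    print(m, round(hermite_coeff(U1, m), 6))               # (2,0) -> 2.0, (0,2) -> 0.4, others -> 0
```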

Hence, given the series expansion for Ũ_i(u), we can express the error process in the following way:

$$\eta(t) = \sum_{m\ge 1}\;\sum_{m\in S_m}\frac{1}{m!}\,C_m\,e_m(\tilde{\xi}(t)), \qquad (2.7)$$

where C_m = (C_m(1),...,C_m(p))'. Note that from Assumption 2.2 and formula (2.5), C_{(0,...,0)}(i) = 0, 1 ≤ i ≤ p. Let us define the Hermite rank of the function U as the smallest integer m such that m ∈ S_m and C_m(i) ≠ 0 for at least one i. Also, recalling (2.4) and applying Lemma 3.4.1 in Leonenko (1999, p. 172), we have that

$$E\,e_m(\tilde{\xi}(t))\,e_n(\tilde{\xi}(s)) = \delta_m^n\,\prod_{j=1}^{p} B_{jj}(|t-s|)^{m_j}\, m_j!,$$

where δ_m^n indicates Kronecker's delta. Hence, we can expand the error covariance function as

$$E\eta(t)\eta(s)' = \sum_{m\ge 1}\sum_{n\ge 1}\sum_{m\in S_m}\sum_{n\in S_n}\frac{1}{m!\,n!}\,C_m C_n'\,E\,e_m(\tilde{\xi}(t))\,e_n(\tilde{\xi}(s)) = \sum_{m\ge 1}\sum_{m\in S_m}\frac{1}{m!}\,C_m C_m'\prod_{j=1}^{p} B_{jj}(|t-s|)^{m_j}. \qquad (2.8)$$

Using representations (2.7) and (2.8) we are now ready to study the asymptotic properties of the LSE; a small numerical check of (2.8) is sketched below.
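The following sketch (added here for illustration; not part of the paper) verifies formula (2.8) in a special case: for η_1 = ξ_1² + 0.2ξ_2² − 1.2 with ρ_0 = 0, expansion (2.8) gives Eη_1(t)η_1(s) = (1/2)C_{(2,0)}(1)²B(τ)² + (1/2)C_{(0,2)}(1)²B(τ)² = 2.08 B(τ)², with τ = |t − s|, which is compared with a direct Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, tau = 0.2, 3.0
r = (1.0 + tau**2) ** (-alpha / 2.0)            # B(tau) from (2.2), rho0 = 0

# Monte Carlo: draw (xi_j(t), xi_j(s)) with correlation r, independently for j = 1, 2
n = 1_000_000
z = rng.standard_normal((4, n))
x1t, x2t = z[0], z[1]
x1s = r * x1t + np.sqrt(1 - r**2) * z[2]
x2s = r * x2t + np.sqrt(1 - r**2) * z[3]
eta_t = x1t**2 + 0.2 * x2t**2 - 1.2
eta_s = x1s**2 + 0.2 * x2s**2 - 1.2

print(np.mean(eta_t * eta_s))                   # Monte Carlo estimate of E eta_1(t) eta_1(s)
print(2.08 * r**2)                              # value predicted by expansion (2.8)
```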

3. Variance and asymptotic distributions

The least squares estimator of a can be found by minimizing

$$\int_0^T [\nu(t) - G(t)'a]'\,[\nu(t) - G(t)'a]\,dt$$

with respect to a. The final form of the LSE is given by (the integral is taken with respect to every element of the matrices)

$$\hat{a}(T) = Q(T)^{-1}\int_0^T G(t)\,\nu(t)\,dt,$$

where

$$Q(T) = \int_0^T G(t)\,G(t)'\,dt.$$

The existence of Q(T)^{-1} follows from the fact that

$$\int_0^T G(t)G(t)'\,dt = \int_0^T [I_p \otimes g(t)][I_p \otimes g(t)']\,dt = I_p \otimes \int_0^T g(t)g(t)'\,dt$$

and from the linear independence of g(t). It is straightforward to verify that E(\hat{a}(T)) = a and that

$$V(\hat{a}(T)) = E\big[(\hat{a}(T)-a)(\hat{a}(T)-a)'\big] = Q(T)^{-1}\int_0^T\!\!\int_0^T G(t)\,E\eta(t)\eta(s)'\,G(s)'\,dt\,ds\;Q(T)^{-1}.$$
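For concreteness, here is a short sketch (an illustration added to this edition, not code from the paper) of how â(T) can be computed from observations on a fine grid, with the integrals defining Q(T) and ∫_0^T G(t)ν(t)dt replaced by Riemann sums; the function names, the step h and the data array are placeholders matching the design used in Section 4.

```python
import numpy as np

def lse(nu, g_funcs, h):
    """Least squares estimate a_hat(T) = Q(T)^{-1} int_0^T G(t) nu(t) dt,
    with the integrals approximated by Riemann sums on the grid t_k = k*h.
    nu: array of shape (n, p); g_funcs: list of q scalar functions of t."""
    n, p = nu.shape
    t = np.arange(n) * h
    g = np.column_stack([gf(t) for gf in g_funcs])              # n x q matrix of g(t_k)'
    Qg = h * (g.T @ g)                                          # int_0^T g(t) g(t)' dt  (q x q)
    Q = np.kron(np.eye(p), Qg)                                  # Q(T) = I_p (x) int g g' dt
    b = h * np.concatenate([g.T @ nu[:, j] for j in range(p)])  # int G(t) nu(t) dt, stacked by column
    return np.linalg.solve(Q, b)                                # a_hat(T): stacked columns of A

# hypothetical usage with the Section 4 design g(t) = (t^0.6, t^0.8)':
# a_hat = lse(nu, [lambda t: t**0.6, lambda t: t**0.8], h=0.05)
```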

In order to study the asymptotic behaviour of the estimators we express the variance in terms of the expansion (2.8) and study its asymptotic behaviour. We state the following theorem.

Theorem 3.1. Let the regression model (2.1) satisfy Assumptions 2.1 and 2.2; let m be the Hermite rank of Ũ and let αm < 1. Suppose the following limits exist and are finite:

$$l = \lim_{T\to\infty} d(T)^{-1}\int_0^1\!\!\int_0^1 g(Tu)\,g(Tv)'\,\frac{du\,dv}{|u-v|^{\alpha m}}\;d(T)^{-1},$$

$$L = \lim_{T\to\infty} d(T)^{-1}\int_0^1\!\!\int_0^1 g(Tu)\,g(Tv)'\,du\,dv\;d(T)^{-1},$$

where d(T) = diag(g_1(T),...,g_q(T)), g_i(T) > 0, i = 1,...,q, for all T > 0, and l is a regular matrix. Then, for the covariance matrix V(\hat{a}(T)) it holds that

$$\lim_{T\to\infty}\Big|\,D(T)^{-1}\,Q(T)\,V(\hat{a}(T))\,Q(T)\,D(T)^{-1}\,T^{\alpha m-2} - \sum_{m\in S_m}\frac{1}{m!}\,C_m C_m' \otimes l\,\Big| = 0,$$

where D(T) = I_p ⊗ d(T).
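As an added illustration (not in the paper), for the polynomial design used in the simulations of Section 4, g(t) = (t^{0.6}, t^{0.8})', one has d(T)^{-1}g(uT) = (u^{0.6}, u^{0.8})', so l_{ij} = ∫_0^1∫_0^1 u^{μ_i} v^{μ_j} |u−v|^{-αm} du dv. The sketch below estimates this matrix by plain Monte Carlo integration for α = 0.2 and m = 1; the singularity at u = v is integrable since αm < 1.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.6, 0.8])          # exponents of the polynomial regressors
alpha, m = 0.2, 1                  # dependence index and Hermite rank (alpha*m < 1)

u = rng.random(1_000_000)
v = rng.random(1_000_000)
kernel = np.abs(u - v) ** (-alpha * m)
l = np.array([[np.mean(u**mi * v**mj * kernel) for mj in mu] for mi in mu])
print(l)                           # Monte Carlo estimate of the limit matrix l of Theorem 3.1
```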

The proof of Theorem 3.1 is given in the Appendix.

Remark 1. In the case ρ_0 = 0 the result of Theorem 3.1 still holds; one only needs to consider expansion (2.7) in terms of U and ξ(t) instead of Ũ and ξ̃(t).

Remark 2. The conditions of Theorem 3.1 hold for polynomial regression.

Theorem 3.1 gives us an appropriate normalizing factor for studying the asymptotic behaviour of the LSE. This is done in the following theorem, where we derive the asymptotic distribution of â(T) by using a non-central limit theorem for non-linear transformations of Gaussian processes.

Theorem 3.2. Let the regression model (2.1) satisfy Assumptions 2.1 and 2.2, let λ = (λ_{11},...,λ_{1m_1},...,λ_{pm_p})', D(T) = I_p ⊗ d(T), d(T) = diag(g_1(T),...,g_q(T)), g_i(T) > 0, i = 1,...,q, for all T > 0, and suppose there exists a function ḡ(u), u ∈ [0,1], such that

$$|d(T)^{-1}g(uT) - \bar{g}(u)| \to 0 \quad \text{as } T \to \infty \qquad (3.1)$$

uniformly for u ∈ [0,1], and

$$\int_{\mathbb{R}^m}\Big|\int_0^1\!\!\int_0^1 \bar{g}(u)\,\bar{g}(v)'\,e^{i(u-v)\mathbf{1}'\lambda}\,du\,dv\Big|\,d\lambda < \infty. \qquad (3.2)$$

Then, as T → ∞, the vector

$$T^{\frac{\alpha m}{2}-1}\,D(T)^{-1}\,Q(T)\big(\hat{a}(T) - a\big)$$

converges in distribution to the vector of multiple Wiener-Itô integrals

$$\kappa = \sum_{m\in S_m}\frac{1}{m!}\,C_m \otimes c_1(\alpha)^{\frac{m}{2}}\int_{\mathbb{R}^m}\int_0^1 \bar{g}(u)\,e^{iu\mathbf{1}'\lambda}\,du\,\prod_{i=1}^{p}\prod_{j=1}^{m_i}|\lambda_{ij}|^{\frac{\alpha-1}{2}}\,W_i(d\lambda_{ij}),$$

where the W_i(·) are the independent Gaussian white noise measures appearing in the Appendix.

From Theorem 3.1 we see that, for m = 1, the process

$$T^{\frac{\alpha}{2}-1}\,D(T)^{-1}\,Q(T)\big(\hat{a}(T) - a\big)$$

has in the limit a normal distribution with null mean vector and covariance matrix $\sum_{m\in S_1} C_m C_m' \otimes l$.

4. Some simulated results

We have performed some simulations by the Monte Carlo method for a model with p = q = 2 and for which g(t) = (t^{0.6}, t^{0.8})' and a = (1, 0.4, 0.4, 1)'. For simulating a continuous time long memory Gaussian process, following Comte (1996), we consider the process at specific time points ξ(h), ξ(2h),...,ξ(nh) with h small. We use Davies and Harte's (1987) method to simulate a Gaussian process with the desired correlation structure (a sketch of this construction in code is given below):

$$\xi(jh) = \frac{1}{2\sqrt{n}}\sum_{k=0}^{2n-1}\sqrt{g_k}\,Z_k\,e^{2\pi i jk/2n}, \qquad j = 0,...,n,$$

where

$$g_k = \sum_{j=0}^{n-1} B(jh)\,e^{2\pi i jk/2n} + \sum_{j=n}^{2n-1} B((2n-j)h)\,e^{2\pi i jk/2n}, \qquad k = 0,...,2n-1,$$

and the B(jh) are the autocorrelations defined in (2.2). Z_k is a sequence of independent complex normal random variables with independent real and imaginary parts for 0 ≤ k ≤ n and with Z_k = \bar{Z}_{2n-k} for n < k < 2n; Z_0 and Z_n are real N(0,2) and otherwise the real and imaginary parts of Z_k are N(0,1).
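Below is a minimal sketch of this simulation step (added for illustration; it uses a standard circulant-embedding variant of the Davies-Harte construction with unit-variance normals and an equivalent normalization, rather than the exact Z_k convention written above). The covariance is B(t) = (1+t²)^{-α/2} from (2.2).

```python
import numpy as np

def davies_harte(n, h, alpha, rng):
    """Simulate xi(0), xi(h), ..., xi((n-1)h), a stationary Gaussian sequence
    with covariance B(jh) = (1 + (jh)^2)^(-alpha/2), by circulant embedding."""
    lags = np.arange(n + 1) * h
    c = (1.0 + lags**2) ** (-alpha / 2.0)            # B(0), B(h), ..., B(nh)
    circ = np.concatenate([c, c[-2:0:-1]])           # circular covariance of length 2n
    g = np.fft.fft(circ).real                        # eigenvalues g_k of the circulant
    g = np.clip(g, 0.0, None)                        # set any negative eigenvalues to zero (approximation)
    m = 2 * n
    a = np.sqrt(g / m) * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
    x = np.fft.fft(a)
    return x.real[:n]                                # the real part has the target covariance

rng = np.random.default_rng(0)
xi = davies_harte(n=2000, h=0.05, alpha=0.2, rng=rng)   # one sample path on a grid with T = n*h = 100
```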

As far as the error process η(t) is concerned, we consider two cases where η(t) is a function of two independent random sequences ξ_1(t) and ξ_2(t) with correlation structure (2.2) with α = 0.2 (ρ_0 = 0). The two settings examined are the following:

S1) η_1(t) = ξ_1(t) + 0.2 ξ_2(t), η_2(t) = 0.1 ξ_1(t) + ξ_2(t);

S2) η_1(t) = ξ_1(t)² + 0.2 ξ_2(t)² − 1.2, η_2(t) = 0.1 ξ_1(t)² + ξ_2(t)² − 1.1.

In setting S1, by applying formula (2.6) we have C_{1,0}(1) = E[(ξ_1(t) + 0.2 ξ_2(t))ξ_1(t)] = 1; similarly, we obtain C_{0,1}(1) = 0.2, C_{1,0}(2) = 0.1 and C_{0,1}(2) = 1. Hence the rank of U is m = 1 in this case, with C_{1,0} = (1, 0.1)' and C_{0,1} = (0.2, 1)'. One then expects asymptotic normality of the LSE. Doing the same type of computations in S2 we obtain that all coefficients C_m(i), i = 1, 2, are null for m = 1; passing to the case m = 2 we obtain C_{2,0}(1) = 2, C_{1,1}(1) = 0, C_{0,2}(1) = 0.4, and C_{2,0}(2) = 0.2, C_{1,1}(2) = 0, C_{0,2}(2) = 2. Hence the rank of U is m = 2, with C_{2,0} = (2, 0.2)', C_{1,1} = (0, 0)' and C_{0,2} = (0.4, 2)'.

For selected values of T the continuous process has been approximated by generating correlated Gaussian sequences of length nh, with h = 0.05 and n = 20, 200, 2000, 4000, which then represent T = 1, 10, 100, 200. Each experiment has been repeated M = 5000 times (M = 3000 for n = 4000). The mean and variance of the Monte Carlo estimates are shown in Table 1 and Table 2 respectively.

Table 1: Monte Carlo results for error structure S1.

            a1 (Var)            a2 (Var)            a3 (Var)            a4 (Var)
  T = 1     -0.3534 (37.11)      1.8488 (46.55)     -0.7983 (35.61)      2.2804 (44.69)
  T = 10     0.6480 (2.837)      0.6384 (1.431)      0.0996 (2.819)      1.2026 (1.422)
  T = 100    0.9640 (0.075)      0.4155 (0.015)      0.3712 (0.072)      1.0124 (0.014)
  T = 200    0.9826 (0.024)      0.4065 (0.003)      0.3871 (0.023)      1.0048 (0.003)

Table 2: Monte Carlo results for error structure S2.

            a1 (Var)            a2 (Var)            a3 (Var)            a4 (Var)
  T = 1      1.0748 (99.35)      0.3166 (124.3)      0.3772 (96.68)      1.0268 (121.0)
  T = 10     1.0304 (5.787)      0.3783 (2.918)      0.4287 (5.552)      0.9793 (2.798)
  T = 100    1.0047 (0.469)      0.3975 (0.124)      0.4058 (0.441)      0.9969 (0.117)
  T = 200    1.0208 (0.154)      0.3906 (0.031)      0.4036 (0.153)      0.9984 (0.031)

Note that for setting S1 the mean values of the estimates are very close to the true values in all cases, while in setting S2 this is true only for T = 100 and T = 200. In order to investigate the distribution of the LSE in the two settings we have computed the empirical distribution function over sections of the 4-variate distribution identified by the hyperplane â_1 + â_2 + â_3 + â_4 = t, where â_i, i = 1,...,4, are the normalized LSE such that E(â_i) = 0, V(â_i) = 1, E(â_i â_j) = 0, i ≠ j. Tables 3 and 4, for selected values of t, show the estimated probability P(â_1 + â_2 + â_3 + â_4 ≤ t) obtained via the formula

$$\frac{1}{n_h}\sum_{i=1}^{n_h} 1\{Y_i \le t\}, \qquad (4.1)$$

where Y_i = â_1 + â_2 + â_3 + â_4. For comparison, Tables 3 and 4 show in the second column the estimated probabilities P(X_1 + X_2 + X_3 + X_4 ≤ t), where (X_1, X_2, X_3, X_4) is a 4-variate standard Normal random vector. These estimates have been obtained by (4.1) from a 4-variate sequence of 50000 pseudo-random Normal numbers; a sketch of the normalization and of formula (4.1) in code follows.
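The sketch below is illustrative only (the paper does not specify how the normalization was imposed; a Cholesky whitening is one natural choice, and the array names are placeholders). It standardizes the M × 4 matrix of Monte Carlo estimates to zero mean, unit variance and zero cross-correlation, and then evaluates (4.1) for the sum Y.

```python
import numpy as np

def empirical_cdf_of_sum(a_hat, t_grid):
    """a_hat: (M, 4) array of Monte Carlo LSE replicates.
    Normalizes to zero mean, unit variance and zero cross-correlation,
    then returns the empirical distribution (4.1) of Y = sum of the 4 components."""
    centered = a_hat - a_hat.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    Linv = np.linalg.inv(np.linalg.cholesky(cov))       # whitening transform
    z = centered @ Linv.T                                # mean 0, variance 1, uncorrelated components
    y = z.sum(axis=1)
    return np.array([np.mean(y <= t) for t in t_grid])

# hypothetical usage: a_hat collects the 5000 replicated estimates for one value of T
# probs = empirical_cdf_of_sum(a_hat, t_grid=np.arange(-6.0, 6.5, 0.5))
```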

Table 3: Empirical distribution function for setting S1.

     t       Normal      T = 1       T = 10      T = 100     T = 200
   -6.0      0.0013      0.0001      0.0001      0.0001      0.0001
   -5.0      0.0064      0.0004      0.0008      0.0004      0.0004
   -4.0      0.0239      0.0190      0.0164      0.0290      0.0180
   -3.0      0.0672      0.0730      0.0754      0.0770      0.0675
   -2.5      0.1072      0.1244      0.1218      0.1170      0.1105
   -2.0      0.1615      0.1746      0.1802      0.1800      0.1605
   -1.5      0.2299      0.2318      0.2438      0.2490      0.2330
   -1.0      0.3111      0.3102      0.3242      0.3300      0.3130
   -0.5      0.4047      0.4004      0.4136      0.4130      0.4075
    0.0      0.5027      0.4928      0.5080      0.5170      0.5005
    0.5      0.5982      0.5852      0.6040      0.6170      0.5920
    1.0      0.6917      0.6820      0.6894      0.7120      0.6870
    1.5      0.7739      0.7628      0.7668      0.7850      0.7630
    2.0      0.8410      0.8264      0.8316      0.8400      0.8310
    2.5      0.8940      0.8854      0.8846      0.8980      0.8885
    3.0      0.9334      0.9252      0.9306      0.9410      0.9275
    4.0      0.9773      0.9844      0.9802      0.9820      0.9760
    5.0      0.9936      0.9996      0.9986      0.9980      0.9965
    6.0      0.9997      1.0000      1.0000      1.0000      1.0000

Table 4: Empirical distribution function for setting S2.

     t       Normal      T = 1       T = 10      T = 100     T = 200
   -8.0      0.0000      0.0024      0.0012      0.0022      0.0033
   -7.0      0.0003      0.0080      0.0064      0.0058      0.0053
   -6.0      0.0013      0.0244      0.0188      0.0128      0.0130
   -5.0      0.0064      0.0556      0.0430      0.0292      0.0273
   -4.0      0.0239      0.1212      0.0960      0.0578      0.0580
   -3.0      0.0672      0.2254      0.1770      0.1140      0.1077
   -2.5      0.1072      0.2890      0.2348      0.1558      0.1447
   -2.0      0.1615      0.3702      0.3018      0.2072      0.1980
   -1.5      0.2299      0.4620      0.3776      0.2700      0.2667
   -1.0      0.3111      0.5656      0.4682      0.3462      0.3433
   -0.5      0.4047      0.6638      0.5670      0.4380      0.4377
    0.0      0.5027      0.7610      0.6714      0.5464      0.5373
    0.5      0.5982      0.8430      0.7720      0.6552      0.6533
    1.0      0.6917      0.9092      0.8666      0.7604      0.7620
    1.5      0.7739      0.9600      0.9366      0.8584      0.8547
    2.0      0.8410      0.9874      0.9802      0.9252      0.9233
    2.5      0.8940      0.9984      0.9972      0.9682      0.9707
    3.0      0.9334      0.9998      1.0000      0.9916      0.9900
    4.0      0.9773      1.0000      1.0000      1.0000      0.9993
    5.0      0.9936      1.0000      1.0000      1.0000      1.0000
    6.0      0.9997      1.0000      1.0000      1.0000      1.0000

Note that for setting S1 there is much closer agreement between the standard Normal distribution and the distribution of the normalized LSE than in setting S2. The differences between the two distributions are of order 1/100 for T = 200 in setting S1, while they remain considerably larger in setting S2.

Appendix

Proof of Theorem 3.1. Using representation (2.8) for Eη(t)η(s)' and the properties of the Kronecker product we get

$$Q(T)\,V(\hat{a}(T))\,Q(T) = S_m + S_0, \qquad (A.1)$$

where

$$S_m = \sum_{m\in S_m}\frac{1}{m!}\,C_m C_m' \otimes \int_0^T\!\!\int_0^T g(t)\,g(s)'\prod_{j=1}^{p} B_{jj}(|t-s|)^{m_j}\,dt\,ds,$$

$$S_0 = \sum_{r\ge m+1}\;\sum_{m\in S_r}\frac{1}{m!}\,C_m C_m' \otimes \int_0^T\!\!\int_0^T g(t)\,g(s)'\prod_{j=1}^{p} B_{jj}(|t-s|)^{m_j}\,dt\,ds.$$

Consider S_m and let u = t/T and v = s/T. The existence and finiteness of l and representation (2.2) of B_{ij}(t) allow us to write

$$S_m = \sum_{m\in S_m}\frac{1}{m!}\,C_m C_m'\,T^{2-\alpha m} \otimes \int_0^1\!\!\int_0^1 g(uT)\,g(vT)'\,[T^{-2}+(u-v)^2]^{-\frac{\alpha m}{2}}\,du\,dv.$$

At this point, let D(T) = I_p ⊗ d(T); then

$$D(T)^{-1}S_m D(T)^{-1} = \sum_{m\in S_m}\frac{1}{m!}\,C_m C_m'\,T^{2-\alpha m} \otimes d(T)^{-1}\int_0^1\!\!\int_0^1 g(uT)\,g(vT)'\,[T^{-2}+(u-v)^2]^{-\frac{\alpha m}{2}}\,du\,dv\;d(T)^{-1},$$

and hence, as T → ∞, by pre- and post-multiplying (A.1) by D(T)^{-1} and reordering terms, we get

$$D(T)^{-1}Q(T)\,V(\hat{a}(T))\,Q(T)\,D(T)^{-1}\,T^{\alpha m-2} - \sum_{m\in S_m}\frac{1}{m!}\,C_m C_m' \otimes l = o(1) + D(T)^{-1}S_0 D(T)^{-1}\,T^{\alpha m-2},$$

where o(1) is a matrix function such that lim_{T→∞}|o(1)(T)| = 0. To prove Theorem 3.1 it remains to show that S_0 = o(S_m) as T → ∞. This can be achieved by the devices shown in Leonenko (1999, pp. 174-176 and 280-281); details are omitted to save space.

Proof of Theorem 3.2. The proof of Theorem 3.2 rests on the following reduction lemma.

Lemma A.1. Let model (2.1) satisfy Assumptions 2.1 and 2.2, let m be the rank of Ũ and αm < 1, D(T) = I_p ⊗ d(T), d(T) = diag(g_1(T),...,g_q(T)), g_i(T) > 0, i = 1,...,q, for all T > 0. Then if one of the asymptotic distributions of

$$T^{\frac{\alpha m}{2}-1}\,D(T)^{-1}\,Q(T)\big(\hat{a}(T)-a\big)$$

or

$$x_m(T) = T^{\frac{\alpha m}{2}-1}\,D(T)^{-1}\int_0^T G(t)\sum_{m\in S_m}\frac{1}{m!}\,C_m\,e_m(\tilde{\xi}(t))\,dt \qquad (A.2)$$

exists, then the other exists too and they are the same.

Reduction lemmas have been considered by many authors; for a proof similar to the one needed in Lemma A.1 see Leonenko and Šilac-Benšić (1996b). Using Lemma A.1 we search for the asymptotic distribution of x_m(T); in order to do this, in (A.2) we consider the following representation for e_m(ξ̃(t)):

$$e_m(\tilde{\xi}(t)) = \int_{\mathbb{R}^m} e^{it\mathbf{1}'\lambda}\prod_{i=1}^{p}\prod_{j=1}^{m_i}\sqrt{f_i(\lambda_{ij})}\;W_i(d\lambda_{ij}), \qquad (A.3)$$

where W_i(·), i = 1,...,p, are independent complex Gaussian white noise measures on (R, B(R)). For the definition of multiple stochastic integrals like (A.3) see Fox and Taqqu (1987). Next, interchanging the order of integration, changing the variables u = t/T and λ* = Tλ, using representation (2.3) for the covariance function and the self-similarity of the random measure W(·), we arrive at

$$x_m(T) - \kappa = \sum_{m\in S_m}\frac{1}{m!}\,C_m \otimes c_1(\alpha)^{\frac{m}{2}}\int_{\mathbb{R}^m}\int_0^1\big[d(T)^{-1}g(uT)\,K_T(p,m_i,\lambda^*_{ij}) - \bar{g}(u)\big]\,e^{iu\mathbf{1}'\lambda^*}\,du\prod_{i=1}^{p}\prod_{j=1}^{m_i}|\lambda^*_{ij}|^{\frac{\alpha-1}{2}}\,W_i(d\lambda^*_{ij}), \qquad (A.4)$$

where

$$K_T(p,m_i,\lambda_{ij}) = \prod_{i=1}^{p}\prod_{j=1}^{m_i}\Big[1-\theta\Big(\frac{\lambda_{ij}}{T}\Big)\Big]^{\frac{1}{2}};$$

2T = E(xm (T ) − κ) (xm (T ) − κ) → 0 as T → ∞, by adding and subtracting d (T )−1 g (uT ) inside the integral (A.4), write  c1 (α)m

=

2T

2

m!

m ∈Sm

Cm Cm





× [ d (T ) g (vT ) − g¯ (v)] 

  1

1

0 0

+2





0





R1

[ d (T )−1 g (uT ) − g¯ (u)] ×

e(u−v)λ dλ |λ|1−α

!m

dudv+

−1

d (T )

1

0 0

−1

d (T )

 c1 (α)m m!2

m ∈Sm



g (uT ) g (vT ) d (T )

−1 i(u−v)1λ

e



du dv d λ+

(K T ( p, m i , λi j ) − 1) × |λ11 · · · λ pm p |1−α

Rm

  1

×

1

(K T ( p, m i , λi j ) − 1)2 × |λ11 · · · λ pm p |1−α

Rm

×

=





0

−1

+

1



g (uT ) [ d (T )

−1

g (vT )− g¯ (v)]e

i(u−v)1λ





du dv d λ =

Cm Cm ⊗ (I1T + I2T + I3T )

It is known that (see, for example, Zygmund, 1977)

$$\int_{\mathbb{R}^1}\frac{e^{i(u-v)\lambda}}{|\lambda|^{1-\alpha}}\,d\lambda = \frac{1}{|u-v|^{\alpha}}\int_{\mathbb{R}^1}|t|^{\alpha-1}e^{it}\,dt = \frac{1}{c_1(\alpha)\,|u-v|^{\alpha}}. \qquad (A.5)$$

Then, by condition (3.1) and (A.5) we can make the following estimate:

$$I_{1T} \le \Big[\sup_{0\le u,v\le 1}\big(d(T)^{-1}g(uT)-\bar{g}(u)\big)'\big(d(T)^{-1}g(vT)-\bar{g}(v)\big)\Big]\,c_1(\alpha)^{-m}\int_0^1\!\!\int_0^1\frac{du\,dv}{|u-v|^{\alpha m}} \to 0$$

as T → ∞. By conditions (3.1) and (3.2), together with (2.3) and the dominated convergence theorem, we have that lim_{T→∞} I_{2T} = 0 and, combining the previous arguments, it can be shown that lim_{T→∞} I_{3T} = 0.

REFERENCES

Baillie, T. (1996) Long memory processes and fractional integration in econometrics, Journal of Econometrics, 73, 5-60.
Beran, J. (1992) Statistical methods for data with long-range dependence, Statistical Science, 7, 404-427.
Beran, J. (1994) Statistics for Long-Memory Processes, Chapman & Hall, New York.
Beran, J. and Ghosh, S. (1998) Root-n-consistent estimation in partial linear models with long-memory errors, Scandinavian Journal of Statistics, 25, 345-357.
Comte, F. (1996) Simulation and estimation of long memory continuous time models, J. Time Series Analysis, 17, 19-36.
Comte, F. and Renault, E. (1996) Long memory continuous time models, Journal of Econometrics, 73, 101-150.
Dahlhaus, R. (1995) Efficient location and regression estimation for long range dependent regression models, Ann. Stat., 23, 1029-1047.
Davies, R. B. and Harte, D. S. (1987) Test for the Hurst effect, Biometrika, 74, 95-101.
Deo, R. S. (1997) Asymptotic theory for certain regression models with long memory errors, J. Time Series Anal., 18, 385-393.
Deo, R. S. and Hurvich, C. M. (1998) Linear trend with fractionally integrated errors, J. Time Series Anal., 19, 379-397.
Dobrushin, R. L. and Major, P. (1979) Non-central limit theorems for non-linear functionals of Gaussian fields, Z. Wahrsch. verw. Gebiete, 50, 27-52.
Donoghue, W. J. (1969) Distributions and Fourier Transforms, Academic Press, New York.
Fox, R. and Taqqu, M. S. (1987) Multiple stochastic integrals with dependent integrators, J. Multiv. Anal., 21, 105-127.
Giraitis, L. and Koul, H. (1997) Estimation of the dependence parameter in linear regression with long range dependent errors, Stoch. Proc. Applic., 71, 207-224.
Hannan, E. (1970) Multiple Time Series, Wiley, New York.
Koul, H. L. and Mukherjee, K. (1993) Asymptotics of R-, MD- and LAD-estimators in linear regression models with long range dependence, Probab. Theory and Related Fields, 95, 535-553.
Koul, H. L. and Mukherjee, K. (1994) Regression quantiles and related processes under long range dependent errors, J. Multiv. Anal., 51, 318-337.
Künsch, H. R., Beran, J. and Hampel, F. (1993) Contrasts under long range correlations, Ann. Stat., 21, 943-964.
Leonenko, N. N. (1999) Limit Theorems for Random Fields with Singular Spectrum, Kluwer Academic Publishers, Dordrecht.
Leonenko, N. N. and Anh, V. V. (2000) On the rate of convergence to the Rosenblatt distribution for additive functionals of stochastic processes with long range dependence, to appear in J. Applied Math and Stochastics.
Leonenko, N. N. and Šilac-Benšić, M. (1996a) Asymptotic properties of the LSE in a regression model with long-memory Gaussian and non-Gaussian stationary errors, Random Oper. and Stoch. Equ., 4, 17-32.
Leonenko, N. N. and Šilac-Benšić, M. (1996b) On estimation of regression coefficients in the case of long-memory noise, Theory of Stoch. Processes, 18, 3-4.
Leonenko, N. N. and Šilac-Benšić, M. (1998) On estimation of regression of long-memory random fields observed on the arrays, Random Oper. and Stoch. Equ., 6, 61-76.
Robinson, P. M. and Hidalgo, F. J. (1997) Time series regression with long range dependence, Ann. Stat., 25, 77-104.
Taqqu, M. S. (1979) Convergence of integrated processes of arbitrary Hermite rank, Z. Wahrsch. verw. Gebiete, 50, 53-58.
Watson, G. N. (1944) A Treatise on the Theory of Bessel Functions, University Press, Cambridge.
Yajima, Y. (1988) On estimation of a regression model with long-memory stationary errors, Ann. Stat., 16, 791-807.
Yajima, Y. (1991) Asymptotic properties of the LSE in a regression model with long-memory stationary errors, Ann. Stat., 19, 158-177.
Zygmund, A. (1977) Trigonometric Series, Cambridge University Press, Cambridge.

Asymptotic properties of LSE in multivariate continuous regression with long memory stationary errors

Summary

This paper focuses on a multivariate continuous regression model with long memory stationary errors. Expressions for the asymptotic variances and the asymptotic distributions of the least squares estimators are obtained. The method of investigation is based on the asymptotic analysis of orthogonal expansions of non-linear functionals of stationary Gaussian processes.

Asymptotic properties of least squares estimators in multivariate continuous regression with stationary long memory errors

Riassunto

This paper considers a continuous multivariate regression model in which the errors form a stationary long memory process. Once the estimators have been obtained by the method of least squares, their asymptotic variance and asymptotic distributions are derived. The method of study is based on the asymptotic analysis of orthogonal expansions of non-linear functionals of stationary Gaussian processes.

Key words

Least Squares Estimators; Long memory errors; Multivariate regression; Hermite polynomials; Multiple Wiener-Itô integrals.

[Manuscript received April 2000; final version received October 2000.]