Least absolute deviation estimation for fractionally integrated autoregressive moving average time series models with conditional heteroscedasticity∗

Guodong Li and Wai Keung Li

Department of Statistics and Actuarial Science, The University of Hong Kong

Abstract

In order to model time series exhibiting the features of long memory, conditional heteroscedasticity and heavy tails, a least absolute deviation approach is considered for estimating fractionally integrated autoregressive moving average models with conditional heteroscedasticity. The time series generated by this model is short memory or long memory according as the fractional differencing parameter d ∈ (−1/2, 0) or (0, ∞), and stationary or nonstationary according as d ∈ (−1/2, 1/2) or (1/2, ∞). Using a unified approach, the asymptotic properties of the least absolute deviation estimation are established. This article also derives the large sample distributions of the residual autocorrelations and the absolute residual autocorrelations, and these results lead to two useful diagnostic tools for checking the adequacy of fitted models. Some Monte Carlo experiments were conducted to examine the performance of the theoretical results in finite samples. As an illustration, the modeling of the absolute return of the daily closing Dow Jones Industrial Average Index (1995-2004) is also reported.

∗ W.K. Li thanks the Croucher Foundation for awarding a Senior Research Fellowship and the Hong Kong Research Grant Council grant (HKU7031/04P) for partial support. The authors thank Dr. S. Ling for useful comments and discussion.


Short Title: LAD estimation for ARFIMA-GARCH models

Key words and phrases: ARFIMA-GARCH, asymptotic distribution, conditional heteroscedasticity, diagnostic checking, heavy tails, least absolute deviation, long memory.

AMS 2000 Subject classifications: Primary 62M10; Secondary 62F35.

1 Introduction

The phenomenon of long range dependence was observed in diverse fields of statistical application long before suitable stochastic models became available. Developing appropriate statistical models to explain the long memory property in time series has attracted the attention of many statisticians since the pioneering work of Mandelbrot (1977, 1983), and several long memory models have been established since then, see Beran (1994) and the references therein. As one of the most popular long memory models and a natural extension of the classical ARIMA models, the fractionally integrated autoregressive moving average (ARFIMA) process has played an important role in this literature and has been widely applied in the fields of hydrology, economics and finance. This model was proposed by McLeod and Hipel (1978), Granger and Joyeux (1980) and Hosking (1981) and has the form

(1.1)   φ(B)(1 − B)^d Y_t = ψ(B)ε_t,

where φ(B) = 1 − φ_1 B − ··· − φ_p B^p, ψ(B) = 1 + ψ_1 B + ··· + ψ_q B^q, (1 − B)^d = Σ_{k=0}^∞ a_k B^k with a_k = (k − d − 1)!/{k!(−d − 1)!}, B is the backward shift operator, and {ε_t} is a sequence of independent and identically distributed random variables with mean zero and finite variance. When d ∈ (−1/2, 1/2), the process {Y_t} generated by model (1.1) is stationary and exhibits the short memory property for d ∈ (−1/2, 0) or the long memory property for d ∈ (0, 1/2). Many estimation procedures have been developed for this case, e.g. the regression approach based on the periodogram in Geweke and Porter-Hudak (1983), and maximum likelihood estimation in Fox and Taqqu (1986), Yajima (1988), Sowell (1992), Li and McLeod (1986) and Chan and Palma (1998). When d ∈ (1/2, ∞), the stationarity condition of the process {Y_t} is

not satisfied, but model (1.1) is still invertible. Beran (1995) proposed a conditional maximum likelihood estimation for model (1.1) and obtained the asymptotic properties for not only the stationary case but also the nonstationary case.

The above procedures are all based on the condition that ε_t is normally distributed or at least has a finite fourth moment. However, more and more empirical evidence has recently suggested that financial time series can be very heavy-tailed (Mittnik et al., 1998 and Rachev and Mittnik, 2000). The least absolute deviation, known as a robust method, is not sensitive to outliers and therefore may be useful in estimating heavy-tailed time series. Assuming only a finite second-order moment of the error sequence, Davis and Dunsmuir (1997) proposed a least absolute deviation estimation for ARMA models. However, how to perform robust estimation for fractional ARIMA models is still an open problem.

Since Engle's seminal work, the fact that many financial and economic time series have a time-varying conditional variance has been widely accepted by most economists and statisticians. In the process of attempting to model this feature, Bollerslev (1986) extended the autoregressive conditional heteroscedasticity (ARCH) model (Engle, 1982) and proposed the generalized autoregressive conditional heteroscedasticity (GARCH) model,

(1.2)
ε_t = u_t σ_t,
σ_t^2 = c + Σ_{i=1}^r a_i ε_{t−i}^2 + Σ_{j=1}^s b_j σ_{t−j}^2,

where c > 0, a_i > 0 and b_j > 0 are unknown parameters, and {u_t} are independent and identically distributed with mean zero and variance one. Given the recent

interest in time series with long memory and changing conditional variance, it is natural to combine the fractional ARIMA and conditional heteroscedastic models. Granger et al. (2000) applied ARFIMA models directly to the absolute returns of several stock indices, which obviously have a time-varying conditional variance. Baillie et al. (1995) considered an ARFIMA(0, d, 1)-GARCH(1, 1) model to analyze the monthly post-World War II consumer price index (CPI) inflation series of 10 different countries. These provided direct empirical evidence of long memory with conditional heteroscedasticity, and a complete statistical inference methodology was first developed by Ling and Li (1997). Under normality of ε_t, they studied the statistical properties of the general ARFIMA-GARCH model and proposed a maximum likelihood estimation. Two portmanteau tests for checking the adequacy of the fitted model were also constructed. As an extension of Ling and Li (1997), Beran and Feng (2001) considered the local polynomial estimation of semiparametric models with an

ARFIMA-GARCH error, but the condition Eε_t^4 < ∞ was also required. For pure GARCH models, Hall and Yao (2003) showed that the asymptotic distribution of the quasi maximum likelihood estimation may not be normal when the error sequence is heavy-tailed with an infinite fourth moment. It is natural to expect that the same results also hold for ARFIMA-GARCH models. Furthermore, as is well known, the quasi maximum likelihood estimation is sensitive to heavy-tailed time series. In this case, in order to obtain a good estimate, outliers will generally be removed before estimation, see Ling and Li (1997) and Granger et al. (2000). This gives rise to another problem: how to estimate an ARFIMA-GARCH model for very heavy-tailed time series without losing the information contained in the outliers. Peng and Yao (2003) considered a least absolute deviation estimation for pure GARCH models under a finite second-order moment of u_t only. Motivated by Ling and Li (1997) and Peng and Yao (2003), this article considers a least absolute deviation estimation for general ARFIMA-GARCH models, and the asymptotic behavior of the estimators is also derived. The result depends only on the existence of the second-order moment of u_t and is therefore robust under heavy-tailed distributions.

Model estimation is only one of the three stages in the Box-Jenkins approach to time series modelling, and the next stage is to check whether or not the fitted model is adequate. Based on the asymptotic distribution of the residual autocorrelations, Box and Pierce (1970) first derived tests for individual residual autocorrelations and also an overall portmanteau statistic for model diagnostic checking. For the asymptotic distribution of the residual autocorrelations for general time series models, see Li (1992). When performing diagnostic checking for ARMA models, McLeod and Li (1983) first considered the squared residual autocorrelations instead of the residual autocorrelations.
Li and Mak (1994) used the squared residual autocorrelations to devise some useful diagnostic tools for time series models with changing conditional variance. Li and Li (2005) first considered a portmanteau test with absolute residual autocorrelations for pure GARCH models fitted by a least absolute deviation approach. Note that a finite fourth moment of the errors is required for the existence of the squared residual autocorrelations. This article derives the asymptotic distributions of the autocorrelations of the residuals and absolute residuals from an ARFIMA-GARCH model fitted by the least absolute deviation approach. These results allow us to construct two portmanteau tests that are useful in checking whether or not the fitted model is adequate.

This article is organized as follows. Section 2 gives the definition of the least absolute deviation estimation and discusses the asymptotic properties of the estimators. Section 3 derives the asymptotic distributions of the residual autocorrelations and the absolute residual autocorrelations, and hence two portmanteau tests. Several simulation experiments are presented in Sections 4 and 5. Section 6 reports the modeling of the absolute return of the daily closing Dow Jones Industrial Average Index (1995-2004). Proofs of the theorems and lemmas in Sections 2 and 3 are given in the Appendix.

2 Least absolute deviation estimation of ARFIMA-GARCH models

This section considers a least absolute deviation estimator for the ARFIMA-GARCH model which combines (1.1) and (1.2). The asymptotic properties of this estimator are also derived. Suppose that Y_1, ..., Y_n are generated by the following ARFIMA(p, d, q)-GARCH(r, s) process,

(2.1)
φ(B)(1 − B)^d Y_t = ψ(B)ε_t,
ε_t = u_t h_t^{1/2},
h_t = α_0 + Σ_{i=1}^r α_i ε_{t−i}^2 + Σ_{j=1}^s β_j h_{t−j},

where {u_t} is an independent and identically distributed sequence with mean zero and a finite variance σ^2. In order to pursue the theoretical properties of the least absolute deviation estimator in this section, as in Peng and Yao (2003), further conditions on the distribution of u_t are required.

Assumption 1: The median of u_t is equal to zero, E|u_t| = 1 and the probability density function f(x) of u_t is continuous at the origin.

Note that, from Assumption 1, we have σ^2 = E|u_t|^2 > 1, which is different from the requirement for model (1.2). For this case, it is not difficult to show that σ^2 Σ_{i=1}^r α_i + Σ_{j=1}^s β_j < 1 is the necessary and sufficient condition for {ε_t} in (2.1) to exist as a unique strictly stationary sequence with a finite second-order moment.

Let l = p + q + r + s + 2 and denote the parameter vector of model (2.1) by λ = (γ^T, δ^T)^T, where γ = (d, φ_1, ..., φ_p, ψ_1, ..., ψ_q)^T and δ = (α_0, α_1, ..., α_r, β_1, ..., β_s)^T, so that λ is an l-dimensional vector. Assume that the parameter space Θ is a compact subset of R^l, the true parameter vector λ_0 is an interior point of Θ, and each λ in Θ satisfies the following assumptions.

Assumption 2: d ∈ ∪_{J=0}^∞ (J − 1/2, J + 1/2), where J is a non-negative integer. All roots of the polynomials φ(z) = 1 − Σ_{i=1}^p φ_i z^i and ψ(z) = 1 + Σ_{j=1}^q ψ_j z^j lie outside the unit circle, φ_p ≠ 0, ψ_q ≠ 0, and there is no common factor between φ(z) and ψ(z).

Assumption 3: α_i > 0, i = 0, 1, ..., r, β_j > 0, j = 1, ..., s, σ^2 Σ_{i=1}^r α_i + Σ_{j=1}^s β_j < 1, and the polynomials Σ_{i=1}^r α_i z^i and Σ_{j=1}^s β_j z^j have no common root.

Note that the restrictions on φ(z) and ψ(z) in Assumption 2 are typical for the estimation of ARFIMA models. Furthermore, the cases d = J + 1/2 with a non-negative integer J are excluded by Assumption 2; whether or not the conclusions of this article apply to these cases remains an interesting topic for future research. In Assumption 3, we assume that α_1, ..., α_r, β_1, ..., β_s are all nonzero; in fact this is necessary for obtaining the asymptotic normality of the estimators, see Remark 2.7 of Francq and Zakoian (2004). Finally, because of the compactness of Θ, there exists a lower bound a_0 > 0 for the parameter α_0.

Under Assumptions 2 and 3, the process {Y_t} is invertible, that is, ε_t can be

written as

(2.2)   ε_t = φ(B)ψ^{−1}(B) Σ_{k=0}^∞ {(k − d − 1)!/(k!(−d − 1)!)} Y_{t−k},

see Ling and Li (1997). When the true parameters in (2.2) are replaced by λ ∈ Θ, ε_t can be considered as a function on Θ and hence can be denoted as ε_t(λ). Similarly, the function h_t(λ) can be defined iteratively as α_0 + Σ_{i=1}^r α_i ε_{t−i}^2(λ) + Σ_{j=1}^s β_j h_{t−j}(λ)

by (2.1). Based on these two functions, we can consider the least absolute deviation estimator for model (2.1). First recall the definition of quasi maximum likelihood estimation for ARFIMA-

GARCH models. The standard normal distribution is temporarily assumed for the model error u_t and then, by maximizing the joint conditional density function of the random variables Y_1, ..., Y_n, we can obtain the following quasi maximum likelihood

estimator,

λ̂_MLE = argmin_λ Σ_{t=1}^n { ε_t^2(λ)/h_t(λ) + log h_t(λ) }.

Under a finite fourth moment of the errors, all theoretical results of Ling and Li (1997) also hold for λ̂_MLE, since the normality of u_t can be removed from the proof. Similarly, the least absolute deviation approach can usually be viewed as maximum likelihood estimation when the model error follows a Laplace distribution, f(x) = (a/2)e^{−a|x|} with a > 0. This idea can be found in the linear regression models of Bassett and Koenker (1978), the autoregressive models of Bloomfield and Steiger (1983, Ch. 3) and the ARMA models of Davis and Dunsmuir (1997). Now we use the same idea to define a least absolute deviation estimator for ARFIMA-GARCH models. Assuming that the error of model (2.1) follows a Laplace distribution and then maximizing the joint conditional density function of Y_1, ..., Y_n, we can define a least absolute deviation estimator as follows,

λ̂_LAD = argmin_λ Σ_{t=1}^n { |ε_t(λ)|/√h_t(λ) + (1/2) log h_t(λ) }.

Note that the score function for λ̂_LAD can be differentiated to any order when model (2.1) degenerates to the pure GARCH case. This special case has been mentioned by Peng and Yao (2003) and discussed by Berkes and Horvath (2003).

For pure GARCH models, Peng and Yao (2003) proposed three least absolute deviation estimators by rewriting the model in regression form. We can consider the best of these three for the ARFIMA-GARCH case, namely,

λ̂_PY = argmin_λ Σ_{t=1}^n | log ε_t^2(λ) − log h_t(λ) |.

Unfortunately, λ̂_PY has an asymptotic bias, which can be shown to be equal to E{1/|u_t|} E{(1/√h_t)(∂ε_t/∂λ)}, and simulation results also showed that λ̂_PY has poor performance. For the other two least absolute deviation estimators, a finite fourth moment seems unavoidable in obtaining asymptotic normality under the ARFIMA-GARCH models. Hence, this article does not pursue this direction and focuses only on the estimator λ̂_LAD. Without confusion, we denote λ̂_LAD by λ̂_n, and the corresponding score function by

(2.3)   Q_n(λ) = Σ_{t=1}^n { |ε_t(λ)|/√h_t(λ) + (1/2) log h_t(λ) }.
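A direct numerical sketch of the score function (2.3) for the ARFIMA(0, d, 0)-GARCH(1, 1) special case is given below. It uses the initialization described in this section (pre-sample ε_t^2 and h_t set to the sample mean of ε_t^2) and minimizes Q_n by the derivative-free Nelder-Mead search, in the spirit of the paper's use of the IMSL subroutine BCPOL. The function names and starting values are ours, and SciPy is assumed to be available only for the fitting step.

```python
import numpy as np


def lad_objective(params, y):
    """Q_n(lambda) of (2.3) for ARFIMA(0, d, 0)-GARCH(1, 1):
    sum_t { |eps_t(lambda)| / h_t(lambda)^{1/2} + 0.5 log h_t(lambda) }.
    The AR(infinity) expansion of (1 - B)^d is truncated at the sample,
    and pre-sample eps_t^2 and h_t are set to the sample mean of eps_t^2."""
    d, a0, a1, b1 = params
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.empty(n)                      # weights of (1 - B)^d
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    eps = np.array([w[:t + 1][::-1] @ y[:t + 1] for t in range(n)])
    s2 = np.mean(eps ** 2)               # initial value for h_t and eps_t^2
    h = np.empty(n)
    prev_h = prev_e2 = s2
    for t in range(n):
        h[t] = a0 + a1 * prev_e2 + b1 * prev_h
        prev_h, prev_e2 = h[t], eps[t] ** 2
    if not np.all(h > 0):                # guard against invalid parameters
        return np.inf
    return float(np.sum(np.abs(eps) / np.sqrt(h) + 0.5 * np.log(h)))


def lad_fit(y, start=(0.0, 0.1, 0.1, 0.1)):
    """Minimize Q_n by Nelder-Mead, a derivative-free search analogous
    to the BCPOL routine used in the paper (SciPy assumed available)."""
    from scipy.optimize import minimize
    return minimize(lad_objective, start, args=(y,), method="Nelder-Mead").x
```

With a1 = b1 = 0 and a0 = 1 the objective reduces to the plain ARMA-type least absolute deviation criterion Σ|ε_t(λ)|.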

Next, we study the asymptotic properties of the least absolute deviation estimator λ̂_n for the stationary and the nonstationary cases respectively.

When |d| < 1/2, under Assumptions 2 and 3, the process {Y_t} generated by (2.1) is stationary, see Ling and Li (1997). To state the asymptotic behavior of λ̂_n for this case, the following first-order derivatives of the functions ε_t(λ) and h_t(λ) are first considered,

∂ε_t(λ)/∂d = − Σ_{k=1}^∞ (1/k) ε_{t−k}(λ),
∂ε_t(λ)/∂φ_j = −φ^{−1}(B) ε_{t−j}(λ),
∂ε_t(λ)/∂ψ_j = −ψ^{−1}(B) ε_{t−j}(λ),
∂h_t(λ)/∂δ = ε̃_t + Σ_{i=1}^s β_i ∂h_{t−i}(λ)/∂δ,
∂h_t(λ)/∂γ = 2 Σ_{i=1}^r α_i ε_{t−i}(λ) ∂ε_{t−i}(λ)/∂γ + Σ_{i=1}^s β_i ∂h_{t−i}(λ)/∂γ,

where ε̃_t = (1, ε_{t−1}^2(λ), ..., ε_{t−r}^2(λ), h_{t−1}(λ), ..., h_{t−s}(λ))^T. Note that in application only n observations are available; however, ε_t(λ), h_t(λ), ∂ε_t(λ)/∂γ, ∂h_t(λ)/∂γ and ∂h_t(λ)/∂δ all depend on the theoretically infinite past history of {Y_t} or {ε_t}. For

simplicity, we set the initial values of {Y_t} and {ε_t} to zero and replace h_t and ε_t^2 for t ≤ 0 by (1/n) Σ_{i=1}^n ε_i^2. This will not affect asymptotic efficiency and other asymptotic properties, see Bollerslev (1986) and Weiss (1986). Denote the matrices

Ω_ε = E{ (1/h_t(λ_0)) (∂ε_t(λ_0)/∂γ)(∂ε_t(λ_0)/∂γ^T) },

Ω_1 = ( Ω_ε  0
         0   0 ),

Ω_2 = E{ (1/(4h_t^2(λ_0))) (∂h_t(λ_0)/∂λ)(∂h_t(λ_0)/∂λ^T) },

where Ω_1 and Ω_2 are l × l and Ω_ε is (p + q + 1) × (p + q + 1). The existence of these three matrices follows from Lemma B.1 of Ling (2003). Similar to the proof of Lemmas 3.1-3.3 in Weiss (1986), we can show that Ω_ε and Ω_2 are positive definite matrices, and then the matrix c_1 Ω_1 + c_2 Ω_2 is also positive definite, where c_1 and c_2 are two arbitrary but fixed positive numbers. Hence, we can state the asymptotic properties of λ̂_n for the stationary ARFIMA-GARCH models as follows.

Theorem 2.1. Suppose that {Y_t} and {ε_t} are generated by (2.1). Under Assumptions 1-3, if |d| < 1/2, there exists a sequence of local minimizers {λ̂_n} of Q_n(λ) such that

√n (λ̂_n − λ_0) −→ N(0, Σ)

in distribution, where the covariance matrix

Σ = (1/4) (f(0)Ω_1 + (1/2)Ω_2)^{−1} (Ω_1 + σ_{|u|}^2 Ω_2) (f(0)Ω_1 + (1/2)Ω_2)^{−1},

σ_{|u|}^2 = var(|u_t|) and f(0) is the value of the probability density function f(·) at zero.

Denote the asymptotic covariance of γ̂_n and δ̂_n by Σ_{12}, where λ̂_n = (γ̂_n^T, δ̂_n^T)^T. In fact Σ_{12} is the (p + q + 1) × (r + s + 1) upper right-hand block of the matrix Σ. Note that, unlike Ling and Li (1997), this submatrix of Σ may not be equal to zero. Hence, it is not suitable to perform the above least absolute deviation estimation for γ and δ separately.

When model (2.1) degenerates to a pure GARCH process, the covariance matrix Σ is equal to σ_{|u|}^2 Ω_2^{−1}. Berkes and Horvath (2003) considered this special case and obtained the corresponding asymptotic distribution.

For the stationary case, the process {Y_t} generated by model (2.1) may include an unknown mean µ, that is, instead of the first equation in model (2.1), the model

φ(B)(1 − B)^d (Y_t − µ) = ψ(B)ε_t

is involved, see Li and McLeod (1986), Beran (1995) and Ling and Li (1997). For this case, we can first center the observed series by the sample mean and then perform the least absolute deviation estimation procedure above. The simulation results in section 4 imply that the estimators obtained by this method are very close to those obtained when the mean is known.

When d > 1/2, the process {Y_t} generated by model (1.1) is nonstationary, and so

is the model (2.1). It is often difficult to perform estimation for nonstationary time series directly or discuss its asymptotic properties in the field of time series analysis.

This problem can be overcome by following Beran (1995) and Ling and Li (1997). Suppose that d = m + d_f, where d_f ∈ (−1/2, 1/2) and m is a positive integer. The first equation in (2.1) can be rewritten as

(2.4)   φ(B)(1 − B)^{d_f} (1 − B)^m Y_t = ψ(B)ε_t.

Denote U_t = (1 − B)^m Y_t; then U_t follows the model

(2.5)   φ(B)(1 − B)^{d_f} U_t = ψ(B)ε_t,

where ε_t is the same as in model (2.1). This means that, after taking mth-order differencing of the nonstationary process {Y_t} generated by model (2.1), the resulting series follows a stationary ARFIMA(p, d_f, q)-GARCH(r, s) model. Denote γ* = (d_f, φ_1, ..., φ_p, ψ_1, ..., ψ_q)^T and λ* = (γ*^T, δ^T)^T, where the first element d in λ and γ is replaced by d_f. For the process {U_t}, by Theorem 2.1, there exists a local least absolute deviation estimator λ̂*_n with asymptotic normality. Let λ̂_n = λ̂*_n + (m, 0, ..., 0)^T. Then λ̂_n is a local least absolute deviation estimator for the original sequence {Y_t} and has the same asymptotic distribution as λ̂*_n.
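The reduction just described is easy to operationalize: difference the observed series m times, fit the stationary model to the result, and add m back to the fitted fractional parameter. A minimal sketch (names ours):

```python
import numpy as np


def integer_difference(y, m):
    """Apply (1 - B)^m, reducing a nonstationary series with
    d = m + d_f to a stationary ARFIMA(p, d_f, q)-GARCH series;
    after estimation, m is added back to the fitted d_f."""
    u = np.asarray(y, dtype=float)
    for _ in range(m):
        u = u[1:] - u[:-1]
    return u
```

Each pass shortens the series by one observation, which is immaterial asymptotically.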

According to the definition of the function ε_t(γ), since d > −1/2, we have

ε_t(γ) = Σ_{k=0}^∞ a_k(γ) Y_{t−k},

where the a_k(γ) are continuously differentiable with respect to γ. On the other hand, by (2.5), ε_t also has the representation

ε_t(γ*) = Σ_{k=0}^∞ ã_k(γ*) U_{t−k},

where the ã_k(γ*) are continuously differentiable with respect to γ*. Note that ε_t(γ) = ε_t(γ*); hence the functions

h_t(λ) = α_0 + Σ_{i=1}^r α_i ε_{t−i}^2(γ) + Σ_{j=1}^s β_j h_{t−j}(λ)

and

h_t(λ*) = α_0 + Σ_{i=1}^r α_i ε_{t−i}^2(γ*) + Σ_{j=1}^s β_j h_{t−j}(λ*)

are also equal. It is easy to show that

∂ε_t(λ_0)/∂γ = ∂ε_t(λ*_0)/∂γ* and ∂h_t(λ_0)/∂λ = ∂h_t(λ*_0)/∂λ*;

furthermore, Ω*_ε = Ω_ε, Ω*_1 = Ω_1 and Ω*_2 = Ω_2, where the matrices Ω*_ε, Ω*_1 and Ω*_2 are defined by replacing Y_t and λ by U_t and λ* respectively. Based on the above argument, we can state the asymptotic properties of λ̂_n for the nonstationary ARFIMA-GARCH models as follows.


Theorem 2.2. Suppose that {Y_t} and {ε_t} are generated by (2.1). Under Assumptions 1-3, if d ∈ ∪_{J=1}^∞ (J − 1/2, J + 1/2), there exists a sequence of local minimizers {λ̂_n} of Q_n(λ) such that

√n (λ̂_n − λ_0) −→ N(0, Σ)

in distribution, where the covariance matrix Σ is defined in Theorem 2.1.

In the proof of Theorem 2.2, we need to know the exact value of m for the differencing parameter d. However, in practice, it is unnecessary to specify m before estimation, and this is supported by the simulation results in section 4. Similar to the stationary case, when d = m + d_f, m ≥ 1, the following process may be considered,

φ(B)(1 − B)^{d_f} {(1 − B)^m Y_t − µ} = ψ(B)ε_t,

where µ is unknown. We can deal with this case by following Beran (1995) and Ling and Li (1997). Let U_t = (1 − B)^m Y_t and estimate µ by the sample mean of U_t. After removing the sample mean by centering the sequence {U_t}, Theorem 2.1 can be applied. Obviously, for this method, we need to know the value of m before estimation.

When h_t is a constant, that is, when model (2.1) reduces to a pure ARFIMA model, Theorems 2.1 and 2.2 still hold. Hence, this article also provides a robust estimation procedure for pure ARFIMA models.

3 Statistics for diagnostic checking

There are three stages in the Box-Jenkins approach to time series modelling: identification, estimation and diagnostic checking. In this section, two portmanteau statistics, based on the residual autocorrelations and the absolute residual autocorrelations respectively, are constructed for the third stage and can be used to check whether or not an ARFIMA-GARCH model fitted by the least absolute deviation method is adequate.

The asymptotic distribution of the residual autocorrelations is considered first. Let ε̂_t and ĥ_t be the corresponding values when the parameter vector λ in the functions ε_t(λ) and h_t(λ) is replaced by λ̂_n, the local minimizer obtained in Theorem 2.1. Hence we can define the lag-k standardized residual autocorrelation as

r̃_k = [Σ_{t=k+1}^n (ε̂_t/ĥ_t^{1/2} − ε̃)(ε̂_{t−k}/ĥ_{t−k}^{1/2} − ε̃)] / [Σ_{t=1}^n (ε̂_t/ĥ_t^{1/2} − ε̃)^2],   k = 1, 2, ...,

where ε̃ = n^{−1} Σ ε̂_t/ĥ_t^{1/2} and n is the sample size. Generally it is rather difficult to discuss the asymptotic behavior of r̃_k. However, if the model is correct, it can be shown that ε̃ converges to zero in probability. Hence we consider r̂_k instead of r̃_k, where

r̂_k = [Σ_{t=k+1}^n (ε̂_t ε̂_{t−k})/(ĥ_t ĥ_{t−k})^{1/2}] / [Σ_{t=1}^n ε̂_t^2/ĥ_t],   k = 1, 2, ...,

and ε̃ in r̃_k is replaced by zero.

Let R̂ = (r̂_1, ..., r̂_M)^T, where M is a fixed positive integer. Denote the matrices X = (X_1, ..., X_M)^T and Z = (Z_1, ..., Z_M)^T, where, for k = 1, ..., M,

X_k = E{ (1/√h_t(λ_0)) (∂ε_t(λ_0)/∂λ) (ε_{t−k}(λ_0)/√h_{t−k}(λ_0)) }

and

Z_k = E{ (1/h_t(λ_0)) (∂h_t(λ_0)/∂λ) (ε_{t−k}(λ_0)/√h_{t−k}(λ_0)) }.

Under the assumptions of Theorems 2.1 and 2.2, following Li (1992), it is easy to show that

(3.1)   √n R̂ −→ N(0, V_1)

in distribution, where the variance matrix

V_1 = I − (1/σ^4) X {(f(0)Ω_1 + (1/2)Ω_2)^{−1} − Σ} X^T + (κ/(4σ^4)) Ω_3,

the matrix Ω_3 = X(f(0)Ω_1 + 0.5Ω_2)^{−1}Z^T + Z(f(0)Ω_1 + 0.5Ω_2)^{−1}X^T and κ = E{u_t^2 [I(u_t > 0) − I(u_t < 0)]}.

From the above, we can obtain the correct asymptotic standard errors for the residual autocorrelations. Note that the matrix (f(0)Ω_1 + 0.5Ω_2)^{−1} − Σ is equal to

(f(0)Ω_1 + (1/2)Ω_2)^{−1} [(f(0) − 1/4)Ω_1 + (1/2 − σ_{|u|}^2/4)Ω_2] (f(0)Ω_1 + (1/2)Ω_2)^{−1}.

In particular, when u_t is symmetrically distributed, we have κ = 0. Furthermore, the matrix (f(0) − 1/4)Ω_1 + (1/2 − σ_{|u|}^2/4)Ω_2 is equal to 0.1553Ω_1 + 0.1332Ω_2, 0.1103Ω_1 + 0.2874Ω_2 or 0.0683Ω_1 + 0.3573Ω_2 when u_t follows a t-distribution with 3 or 5 degrees of freedom, or a normal distribution, respectively. Hence, the asymptotic standard errors are generally less than 1/√n, which is usually regarded as a crude standard error in diagnostic checking. Our result implies that a test simply using 1.96/√n could be too conservative. These results are typical and consistent with the classical results, see Box and Pierce (1970) and Li and Mak (1994). Furthermore, when model (2.1) degenerates to a pure GARCH model, the matrix X is equal to zero and the asymptotic variance matrix of √n R̂ is the identity matrix, that is, 1/√n is exactly the standard error of r̂_k, k = 1, ..., M.

Next we consider the asymptotic distribution of the absolute residual autocorrelations. The lag-k standardized absolute residual autocorrelation is defined as

(3.2)   ρ̃_k = [Σ_{t=k+1}^n (|ε̂_t|/ĥ_t^{1/2} − ε̄)(|ε̂_{t−k}|/ĥ_{t−k}^{1/2} − ε̄)] / [Σ_{t=1}^n (|ε̂_t|/ĥ_t^{1/2} − ε̄)^2],   k = 1, 2, ...,

where ε̄ = n^{−1} Σ |ε̂_t|/ĥ_t^{1/2}. Note that ε̄ converges to E|u_t| = 1 in probability if the model is correct. Hence, for the same reason as for the residual autocorrelations, we consider ρ̂_k instead, where

(3.3)   ρ̂_k = [Σ_{t=k+1}^n (|ε̂_t|/ĥ_t^{1/2} − 1)(|ε̂_{t−k}|/ĥ_{t−k}^{1/2} − 1)] / [Σ_{t=1}^n (|ε̂_t|/ĥ_t^{1/2} − 1)^2],   k = 1, 2, ...,

and ε̄ in (3.2) is replaced by one.

We note that, if the model is correct, the term n^{−1} Σ (|ε̂_t|/ĥ_t^{1/2} − 1)^2 in (3.3) converges in probability to the constant σ_{|u|}^2 = var(|u_t|). Hence, for ρ̂_k, we need only consider the asymptotic distribution of

Ĉ_k = (1/n) Σ_{t=k+1}^n (|ε̂_t|/ĥ_t^{1/2} − 1)(|ε̂_{t−k}|/ĥ_{t−k}^{1/2} − 1).

Let Ĉ = (Ĉ_1, Ĉ_2, ..., Ĉ_M)^T and C = (C_1, C_2, ..., C_M)^T, where C_k is the corresponding value when λ̂_n in Ĉ_k is replaced by the true parameter vector λ_0. Similarly define ρ̂ = (ρ̂_1, ..., ρ̂_M)^T and ρ = (ρ_1, ..., ρ_M)^T. Then we have the following relationship between the vectors Ĉ and C.

Lemma 3.1. If Assumptions 1-3 are satisfied, then

(3.4)   Ĉ = C − H(λ̂_n − λ_0) + o_p(1/√n),

where the matrix H = (H_1, ..., H_M)^T and

H_k = E{ (1/(2h_t(λ_0))) (∂h_t(λ_0)/∂λ) (|ε_{t−k}(λ_0)|/√h_{t−k}(λ_0) − 1) },   k = 1, ..., M.

For the vector C on the right-hand side of (3.4), applying Theorem 2.8.1 of Lehmann (1998) directly, we can obtain that

(3.5)   √n C −→ N(0, σ_{|u|}^4 I)

in distribution, where I is the M × M identity matrix. Furthermore, based on the proof of Theorem 2.1, as in Li and Li (2005), it can be shown that

(3.6)   √n (λ̂_n − λ_0) = (f(0)Ω_1 + (1/2)Ω_2)^{−1} (1/(2√n)) Σ_{t=1}^n ζ_t + o_p(1),

where

ζ_t = ((|u_t| − 1)/(2h_t(λ_0))) ∂h_t(λ_0)/∂λ − (1/√h_t(λ_0)) (∂ε_t(λ_0)/∂λ) [I(u_t > 0) − I(u_t < 0)].

Based on the approximation (3.6), we can obtain the asymptotic covariance between √n(λ̂_n − λ_0) and √n C as follows,

(3.7)   cov(√n(λ̂_n − λ_0), √n C_k)
        ≈ E[{(f(0)Ω_1 + (1/2)Ω_2)^{−1} (1/(2√n)) Σ_t ζ_t}{(1/√n) Σ_s (|ε_s(λ_0)|/√h_s(λ_0) − 1)(|ε_{s−k}(λ_0)|/√h_{s−k}(λ_0) − 1)}]
        = (σ_{|u|}^2/2) (f(0)Ω_1 + (1/2)Ω_2)^{−1} H_k,

where H_k is defined in Lemma 3.1.

Hence, by applying the Mann-Wald device, the martingale central limit theorem and (3.4) to (3.7), we know that √n Ĉ is asymptotically normally distributed with mean zero and covariance matrix σ_{|u|}^4 V_2. Furthermore,

(3.8)   √n ρ̂ −→ N(0, V_2)

in distribution, where

V_2 = I − (1/σ_{|u|}^4) H {σ_{|u|}^2 (f(0)Ω_1 + (1/2)Ω_2)^{−1} − Σ} H^T.

The above provides the correct asymptotic standard errors for the absolute residual autocorrelations. Similarly, the matrix σ_{|u|}^2 (f(0)Ω_1 + 0.5Ω_2)^{−1} − Σ can be rewritten as

(f(0)Ω_1 + (1/2)Ω_2)^{−1} [(σ_{|u|}^2 f(0) − 1/4)Ω_1 + (σ_{|u|}^2/4)Ω_2] (f(0)Ω_1 + (1/2)Ω_2)^{−1}.

The quantity σ_{|u|}^2 f(0) − 1/4 is equal to 0.3447 or 0.0565 when u_t follows a t-distribution with 3 or 5 degrees of freedom respectively. Hence, for heavy-tailed error sequences in model (2.1), the asymptotic standard errors of the absolute residual autocorrelations are also smaller than 1/√n, as in the case of the residual autocorrelations. Note that the quantity σ_{|u|}^2 f(0) − 1/4 is negative, −0.0683, when u_t is normally distributed. However, even in this case, the simulation results in Table 2 imply that the asymptotic standard errors are less than 1/√n.

In general, the matrices V_1 and V_2 are not idempotent, so nR̂^T R̂ and nρ̂^T ρ̂ do not follow an asymptotic χ^2 distribution. However, by (3.1) and (3.8), the statistics

Q_r(M) = n R̂^T V_1^{−1} R̂

and

Q_a(M) = n ρ̂^T V_2^{−1} ρ̂

will be asymptotically distributed as χ^2(M) if the model is correct. These two quantities should be useful as portmanteau statistics for checking the adequacy of ARFIMA-GARCH models estimated by the least absolute deviation approach. In practice, we can obtain the exact values of σ^2, σ_{|u|}^2 and f(0) in the definitions of the matrices V_1 and V_2 if the distribution of u_t is known. Otherwise, we can use n^{−1} Σ ε̂_t^2/ĥ_t to estimate σ^2 and n^{−1} Σ (|ε̂_t|/ĥ_t^{1/2} − 1)^2 to estimate σ_{|u|}^2. For f(0), the sequence {ε̂_t/ĥ_t^{1/2}} is first treated as independent and identically distributed, and then a nonparametric method, such as kernel estimation, can be applied to fit the density function f̂(x); finally, f̂(0) is used in place of f(0). The entries of X, Z, H, Ω_ε and Ω_2 can be replaced by the corresponding sample averages as in Li and Mak (1994). Tse and Zuo (1997) considered the optimal choice of M for portmanteau tests proposed
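The plug-in steps just described can be sketched as follows. This is illustrative only: the Gaussian kernel and Silverman bandwidth for f(0) are our choices, as is the naming, and V_1^{−1}, V_2^{−1} are assumed to have been assembled separately from the sample analogues of X, Z, H, Ω_ε and Ω_2.

```python
import numpy as np


def nuisance_estimates(eps_hat, h_hat):
    """Estimate sigma^2 by the sample second moment of the standardized
    residuals e_t, sigma_{|u|}^2 by the sample variance of |e_t| about 1,
    and f(0) by a Gaussian kernel density estimate of e_t at zero
    (Silverman's rule-of-thumb bandwidth, an illustrative choice)."""
    e = np.asarray(eps_hat) / np.sqrt(np.asarray(h_hat))
    n = len(e)
    sigma2 = np.mean(e ** 2)
    sigma2_abs = np.mean((np.abs(e) - 1.0) ** 2)
    bw = 1.06 * np.std(e) * n ** (-0.2)
    f0 = np.mean(np.exp(-0.5 * (e / bw) ** 2)) / (bw * np.sqrt(2.0 * np.pi))
    return sigma2, sigma2_abs, f0


def portmanteau(acf, v_inv, n):
    """Q(M) = n * acf' V^{-1} acf, to be referred to chi-square(M)."""
    acf = np.asarray(acf)
    return float(n * acf @ v_inv @ acf)
```

With `v_inv` set to the identity, `portmanteau` reduces to the familiar Box-Pierce form n Σ r̂_k^2.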

in Li and Mak (1994).

The simulation results in section 5 imply that the combination of Q_r(M) and Q_a(M) can be used to check whether or not a fitted ARFIMA-GARCH model is adequate. It can be seen that the portmanteau test Q_r(M) is not sensitive to misspecification in the conditional variance, while Q_a(M) is not sensitive to misspecification in the conditional mean. In fact, Wong and Ling (2005) observed the same phenomenon for the residual autocorrelations and squared residual autocorrelations and proposed a combined portmanteau test based on both. Hence, it would also be interesting to construct a combined portmanteau test, based on the asymptotic joint distribution of the residual autocorrelations and absolute residual autocorrelations, for checking the adequacy of ARFIMA-GARCH models fitted by the least absolute deviation approach. However, this method is more complex and we leave it for future research.

4 Simulation results for the least absolute deviation estimation

The first experiment compares numerically the least absolute deviation estimate of section 2 with the conditional quasi maximum likelihood estimate of Ling and Li (1997); the following ARFIMA(0, d, 0)-GARCH(1, 1) process was involved,

(4.1)
(1 − B)^d Y_t = ε_t,
ε_t = u_t h_t^{1/2},
h_t = 0.5 + 0.2 ε_{t−1}^2 + 0.7 h_{t−1},

where u_t follows a standard normal distribution or a Student's t-distribution with 3 or 5 degrees of freedom. The two t-distributions were first standardized to have mean zero and variance one; this step was performed in all our simulation experiments. The differencing parameter d was chosen to be −0.3 or 0.3 for the stationary case and d = 0.7 or 1.3 for the nonstationary case. For each combination of error distribution and differencing parameter, we considered the sample size n = 600 and drew 400 independent replications. The iterative algorithm of Nelder and Mead (1965), which is available in the International Mathematical and Statistical Library (IMSL) subroutine BCPOL, was employed to perform the search for the quasi maximum likelihood estimator λ̂_MLE and the least absolute deviation estimator λ̂_LAD at the same time. We set the initial value of the parameter d to zero and the parameters in the conditional variance, α_0, α_1 and β_1, to 0.1. The subroutine

BCPOL was also used in the following experiment and the real example. Since the values of parameters α0 and α1 fitted by the least absolute deviation approach are different from 0.5 and 0.2 by a common factor, as in Peng and Yao (2003), we define the average absolute errors as 1 (|b α1 /b α0 − 0.4| + |βb1 − 0.2| + |db − d|), 3

bM LE with λ bLAD . which can be used to compare the performance of λ
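The generating scheme in (4.1) and the error measure above can be sketched in a few lines of Python. This is our own illustrative sketch, not the authors' code: the function names are ours, the fractional integration uses the standard truncated MA($\infty$) weights $\psi_0 = 1$, $\psi_j = \psi_{j-1}(j-1+d)/j$ (valid for $|d| < 1/2$), and the error measure follows the definition above with the identified quantities $\alpha_1/\alpha_0 = 0.4$ and $\beta_1 = 0.7$ of model (4.1):

```python
import numpy as np

def simulate_arfima_garch(n, d, omega=0.5, alpha=0.2, beta=0.7, burn=500, seed=0):
    """Simulate (1 - B)^d Y_t = eps_t with GARCH(1,1) errors, as in model (4.1).

    Fractional integration uses the truncated MA(inf) expansion of (1 - B)^{-d}:
    psi_0 = 1 and psi_j = psi_{j-1} * (j - 1 + d) / j, valid for |d| < 1/2.
    """
    rng = np.random.default_rng(seed)
    m = n + burn
    eps = np.empty(m)
    h = omega / (1.0 - alpha - beta)  # unconditional variance as starting value
    for t in range(m):
        u = rng.standard_normal()
        eps[t] = u * np.sqrt(h)
        h = omega + alpha * eps[t] ** 2 + beta * h
    psi = np.empty(m)
    psi[0] = 1.0
    for j in range(1, m):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    # Y_t = sum_{j=0}^{t} psi_j * eps_{t-j}
    y = np.convolve(eps, psi)[:m]
    return y[burn:]

def average_absolute_error(a0_hat, a1_hat, b1_hat, d_hat, d_true):
    """Average absolute error; alpha_0 and alpha_1 enter only through their ratio."""
    return (abs(a1_hat / a0_hat - 0.4) + abs(b1_hat - 0.7) + abs(d_hat - d_true)) / 3.0
```

Because only the ratio $\hat\alpha_1/\hat\alpha_0$ appears, rescaling both ARCH parameters by a common factor leaves the error measure unchanged, which is exactly the identification issue noted above.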

Figures 1 and 2 display boxplots of the average absolute errors. There are a few very large values of the average absolute error for the quasi maximum likelihood estimation when the errors are $t$-distributed with 3 degrees of freedom; for convenience of presentation, these outliers have been removed from the figures. The least absolute deviation estimation performs much better when $u_t$ follows a $t$-distribution with 3 degrees of freedom. This may reflect the fact that the heavier the tails, the slower the convergence rate of the maximum likelihood estimation; see Hall and Yao (2003). Note that the $t$-distribution with 5 degrees of freedom has a finite fourth moment, so the quasi maximum likelihood estimator $\hat\lambda_{MLE}$ enjoys the $\sqrt{n}$ convergence rate (Ling and Li, 1997). In this case the least absolute deviation estimator $\hat\lambda_{LAD}$ also performs better. When the error is normally distributed, the quasi maximum likelihood estimation is of course the better choice; however, the performance of the least absolute deviation estimation remains comparable.

The next experiment examines the performance of the least absolute deviation estimation in finite samples. The generating process (4.1) with $d = 0.3$ was employed again, and the errors follow a normal distribution or a $t$-distribution with 3 or 5 degrees of freedom. Note that the series $\{Y_t\}$ has the long memory property. The sample size is set to 300 or 400. We drew 500 independent replications for each combination of sample size and error distribution, and the least absolute deviation estimation in section 2 was applied to each replication. Table 1 presents the estimated biases, the empirical root mean squared errors ($\sqrt{\mathrm{MSE}}$) of the estimates and the root mean asymptotic variances ($\sqrt{\mathrm{MAV}}$). From this table, we see that the biases are all small and the root mean asymptotic variances are very close to the empirical root mean squared errors. All biases, empirical root mean squared errors and root mean asymptotic variances change little when the series is centered by the sample mean, and they decrease as the sample size increases. The empirical root mean squared errors and the root mean asymptotic variances become much closer when the sample size is larger, $n = 400$. We also considered different differencing parameters and found very similar results.
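The bias and $\sqrt{\mathrm{MSE}}$ columns of Table 1 can be reproduced from replicated estimates with a small helper. This is an illustrative sketch of our own; the $\sqrt{\mathrm{MAV}}$ column additionally requires the estimated asymptotic covariance matrix from Theorem 2.1, which is omitted here:

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Bias and empirical root mean squared error over Monte Carlo replications."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    root_mse = np.sqrt(np.mean((est - true_value) ** 2))
    return bias, root_mse
```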

5

Numerical performance of the goodness-of-fit tests

In this section, we performed two simulation experiments to demonstrate the usefulness of the two portmanteau tests obtained in section 3. In the first experiment, we considered the ARFIMA(0,d,0)-GARCH(1,1) model
$$(1-B)^{0.7} Y_t = \varepsilon_t, \qquad \varepsilon_t = u_t h_t^{1/2}, \qquad h_t = 0.5 + 0.3\varepsilon_{t-1}^2 + 0.5 h_{t-1},$$
where $\{u_t\}$ follows a standard normal distribution or a standardized $t$-distribution with 3 or 5 degrees of freedom. For these three models, we considered two sample sizes, $n = 300$ and 500, with 500 independent replications for each combination of model and sample size. The asymptotic standard deviations $A_i$, $i = 1, \ldots, 6$, of the absolute residual autocorrelations, $\hat\rho = (\hat\rho_1, \ldots, \hat\rho_6)^T$, and the residual autocorrelations, $\hat r = (\hat r_1, \ldots, \hat r_6)^T$, were computed according to the results in section 3. The empirical standard deviations $S_i$ of $\hat\rho$ and $\hat r$ were also obtained and were taken to be the 'true' standard deviations. Table 2 presents the empirical standard deviations and the averages of the asymptotic standard deviations. It can be seen that the

asymptotic results for both the absolute residual and residual autocorrelations match the 'true' values satisfactorily for $n$ as small as 300 and quite well for $n = 500$. Note that the generating process in this experiment is nonstationary, since the differencing parameter is $d = 0.7$. The results for different parameter vectors, including some stationary models, are similar to those in Table 2 and hence are not presented here.

In the second experiment, we considered the empirical size and power of the test statistics $Q_a(M)$ and $Q_r(M)$. Three different generating processes were involved. The ARFIMA(0,d,0)-GARCH(1,1) process,
$$(1-B)^d Y_t = \varepsilon_t, \qquad \varepsilon_t = u_t h_t^{1/2}, \qquad h_t = 0.3 + 0.3\varepsilon_{t-1}^2 + 0.3 h_{t-1},$$
was used to check the empirical sizes. The ARFIMA(0,d,0)-GARCH(3,1) process,
$$(1-B)^d Y_t = \varepsilon_t, \qquad \varepsilon_t = u_t h_t^{1/2}, \qquad h_t = 0.3 + 0.3\varepsilon_{t-1}^2 + 0.3\varepsilon_{t-3}^2 + 0.3 h_{t-1},$$
was used to check the sensitivity to misspecification of the conditional variance (we call this Type I power), and the ARFIMA(2,d,0)-GARCH(1,1) process,
$$(1 - 0.2B^2)(1-B)^d Y_t = \varepsilon_t, \qquad \varepsilon_t = u_t h_t^{1/2}, \qquad h_t = 0.3 + 0.3\varepsilon_{t-1}^2 + 0.3 h_{t-1},$$
was used to check the sensitivity to misspecification of the conditional mean (we call this Type II power). The sequence $\{u_t\}$ in the above three generating processes came from a $t$-distribution with 3 or 5 degrees of freedom or a normal distribution

respectively, and was standardized to have mean zero and variance one. The differencing parameter $d$ was taken to be $-0.3$, 0.3 or 0.7, resulting in series with the short memory, long memory or nonstationary property respectively. Two sample sizes, $n = 400$ and $n = 600$, were considered, with 1000 replications for each combination of the differencing parameter $d$, the sample size $n$ and the error distribution. We fitted the ARFIMA(0,d,0)-GARCH(1,1) model to all the simulated data by the least absolute deviation approach and computed the values of $Q_a(M)$ and $Q_r(M)$ with the methods in section 3. Table 3 displays the proportions of rejections based on the upper 5th percentile of the corresponding asymptotic $\chi^2_6$ distribution. All the sizes of $Q_a(M)$ and $Q_r(M)$ in Table 3 are very close to 0.05, especially for the cases with $n = 600$. The Type I powers of $Q_a(M)$ and the Type II powers of $Q_r(M)$ are all greater than 0.5 when the sample size $n$ is as large as 600. This means that the two goodness-of-fit tests, $Q_a(M)$ and $Q_r(M)$, can be used to check, respectively, whether the conditional variance part and the conditional mean part of the fitted model are misspecified. Note that, as expected, the Type I powers of $Q_r(M)$ and the Type II powers of $Q_a(M)$ are no more than 0.15, so these statistics cannot be used in this way in real applications. Hence, only the combination of $Q_a(M)$ and $Q_r(M)$ forms a complete test of whether an ARFIMA-GARCH model fitted by the least absolute deviation approach is adequate.
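As a rough illustration of how statistics of this type are formed from a fitted model's standardized residuals, the sketch below (our own code, not the paper's) computes residual and absolute-residual autocorrelations and the simple Box-Pierce-type quantity $n\sum_k \hat\rho_k^2$; the paper's actual $Q_r(M)$ and $Q_a(M)$ standardize each autocorrelation by its estimated asymptotic variance from section 3, which we do not reproduce here:

```python
import numpy as np

def autocorrelations(x, M):
    """Sample autocorrelations at lags 1..M of a mean-corrected series."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, M + 1)])

def portmanteau(residuals, M=6):
    """Box-Pierce-type statistics on residuals (Qr) and absolute residuals (Qa)."""
    res = np.asarray(residuals, dtype=float)
    n = len(res)
    r = autocorrelations(res, M)             # residual autocorrelations
    rho = autocorrelations(np.abs(res), M)   # absolute residual autocorrelations
    return n * np.sum(r ** 2), n * np.sum(rho ** 2)

CHI2_6_95 = 12.592  # upper 5% point of the chi-square distribution with 6 df
```

With $M = 6$, each statistic would be compared with the upper 5% point of $\chi^2_6$, approximately 12.59.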


6

An illustrative example

Suppose $r_t$ is the return on a speculative asset such as a stock. From an axiomatic argument, Luce (1980) showed that $|r_t|^\theta$ is an appropriate class of risk measures, where the value of $\theta$ is chosen by the individual investor. In particular, Granger and Ding (1995) and Granger and Sin (2000) treated the observed absolute return $|r_t|$ as a measure of risk, in contrast with the unobserved variance or standard deviation of the returns. It is well known that there is little serial correlation in the returns, which is consistent with the efficient market theory. However, Taylor (1986) found that the absolute return $|r_t|$ has significant autocorrelations over long lags. This property is characterized as 'long memory'; see Ding et al. (1993), Granger et al. (2000) and Tsay (2002). In fact, Granger et al. (2000) applied fractional ARIMA models to several price indices and obtained their long-memory properties. Many absolute return series appear to have the properties of both long memory and conditional heteroscedasticity. Hence, the ARFIMA-GARCH model should be a good choice for modeling absolute returns. Furthermore, many financial time series are heavy-tailed, and existing estimation procedures for ARFIMA or ARFIMA-GARCH models are not robust. Consequently, in order to obtain a good estimate, outliers are often removed before estimation; see Ling and Li (1997) and Granger et al. (2000). However, these outliers may contain much useful information; see Embrechts et al. (1997). Hence, a more robust method, such as the least absolute deviation estimation in this article, should be employed to obtain good estimates.

The data set we analyze in this section is the absolute return, as a percentage, of the daily closing Dow Jones Industrial Average Index. There is a total of 2519 observations from January 3, 1995 to December 31, 2004. The mean and standard deviation of the absolute returns are 0.352 and 0.335 respectively. Denote the centered absolute return by $y_t$. The phenomenon of conditional heteroscedasticity in the time series $\{y_t\}$ is obvious in Figure 3, which gives the time series plot of $y_t$, $t = 1, \ldots, 2519$. Figure 4 shows the sample autocorrelation function (ACF) of the absolute return up to 200 lags. The ACFs are relatively small in magnitude but decay very slowly; they appear to be significant at the 5% level even after 200 lags, suggesting the presence of long memory (Tsay, 2002). ARFIMA-GARCH models were considered for the observed series $\{y_t\}$, and the least absolute deviation method

was used to find the estimates. We considered four models to fit the absolute return series $\{y_t\}$:

ARFIMA(4, d, 0)-ARCH(4),
ARFIMA(4, d, 0)-GARCH(1, 1),
ARFIMA(2, d, 1)-ARCH(4),
ARFIMA(2, d, 1)-GARCH(1, 1).

The methodology for least absolute deviation estimation and diagnostic checking in sections 2 and 3 was applied to these four models. We set $M$ to 15, and the values of $\sigma^2$, $\sigma^2_{|u|}$ and $f(0)$ were estimated with the methods mentioned in section 3. The bandwidth was set to 0.05. The modeling results are as follows:

Model 1:
$$(1-B)^{0.4349}(1 + 0.4573B + 0.2611B^2 + 0.1794B^3 + 0.0707B^4)\,y_t = \varepsilon_t, \qquad \varepsilon_t = u_t h_t^{1/2},$$
$$h_t = 0.0242 + 0.0973\varepsilon_{t-1}^2 + 0.1045\varepsilon_{t-2}^2 + 0.0746\varepsilon_{t-3}^2 + 0.0876\varepsilon_{t-4}^2,$$
where the estimated standard errors of $d$, $\phi_1$, $\phi_2$, $\phi_3$, $\phi_4$, $\alpha_0$, $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are $5.672 \times 10^{-3}$, $6.818 \times 10^{-3}$, $6.526 \times 10^{-3}$, $5.543 \times 10^{-3}$, $4.280 \times 10^{-3}$, $1.982 \times 10^{-3}$, 0.0215, 0.0220, 0.0195 and 0.0206, with $Q_r(15) = 30.43$ and $Q_a(15) = 109.40$.

Model 2:
$$(1-B)^{0.4651}(1 + 0.4786B + 0.2827B^2 + 0.2052B^3 + 0.1004B^4)\,y_t = \varepsilon_t, \qquad \varepsilon_t = u_t h_t^{1/2},$$
$$h_t = 9.273 \times 10^{-4} + 0.0465\varepsilon_{t-1}^2 + 0.9084 h_{t-1},$$
where the estimated standard errors of $d$, $\phi_1$, $\phi_2$, $\phi_3$, $\phi_4$, $\alpha_0$, $\alpha_1$ and $\beta_1$ are $5.541 \times 10^{-3}$, $6.570 \times 10^{-3}$, $6.393 \times 10^{-3}$, $5.591 \times 10^{-3}$, $4.499 \times 10^{-3}$, $2.420 \times 10^{-4}$, $7.264 \times 10^{-3}$ and 0.0127, with $Q_r(15) = 33.12$ and $Q_a(15) = 10.79$.

Model 3:
$$(1-B)^{0.6594}(1 - 0.0784B - 0.0170B^2)\,y_t = \varepsilon_t - 0.7699\varepsilon_{t-1}, \qquad \varepsilon_t = u_t h_t^{1/2},$$
$$h_t = 0.0229 + 0.1011\varepsilon_{t-1}^2 + 0.1086\varepsilon_{t-2}^2 + 0.0815\varepsilon_{t-3}^2 + 0.0923\varepsilon_{t-4}^2,$$
where the estimated standard errors of $d$, $\phi_1$, $\phi_2$, $\psi_1$, $\alpha_0$, $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are 0.0124, $7.118 \times 10^{-3}$, $4.334 \times 10^{-3}$, $7.925 \times 10^{-3}$, $1.925 \times 10^{-3}$, 0.0216, 0.0223, 0.0201 and 0.0209, with $Q_r(15) = 17.69$ and $Q_a(15) = 97.83$.

Model 4:
$$(1-B)^{0.7117}(1 - 0.0618B - 0.0193B^2)\,y_t = \varepsilon_t - 0.8004\varepsilon_{t-1}, \qquad \varepsilon_t = u_t h_t^{1/2},$$
$$h_t = 8.162 \times 10^{-4} + 0.0469\varepsilon_{t-1}^2 + 0.9100 h_{t-1},$$
where the estimated standard errors of $d$, $\phi_1$, $\phi_2$, $\psi_1$, $\alpha_0$, $\alpha_1$ and $\beta_1$ are 0.0135, $7.815 \times 10^{-3}$, $3.912 \times 10^{-3}$, $7.673 \times 10^{-3}$, $2.226 \times 10^{-4}$, $7.201 \times 10^{-3}$ and 0.0122, with $Q_r(15) = 15.96$ and $Q_a(15) = 10.88$.

From the above results, we see that model 4, ARFIMA(2,d,1)-GARCH(1,1), is accepted by the test statistics $Q_a(M)$ and $Q_r(M)$ with $M = 15$ at significance level 0.05 and hence is adequate. Consistent with the results in section 5, at significance level 0.05 the portmanteau test $Q_a(M)$ rejects models 1 and 3, suggesting that their conditional variances are misspecified, while $Q_r(M)$ rejects models 1 and 2, suggesting that their conditional means are misspecified. We also tried several other values of $M$ and obtained similar results. Figure 5 presents the sample autocorrelation functions (ACF) of the residuals and absolute residuals from the four fitted models; the 95% asymptotic confidence intervals of these ACFs are also displayed in Figure 5. Only the ACFs of both the residuals and the absolute residuals from model 4 are insignificant at level 0.05, which is consistent with the findings of the two portmanteau tests.
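The accept/reject decisions above can be checked directly: with $M = 15$, each statistic is compared with the upper 5% point of the $\chi^2_{15}$ distribution, approximately 24.996. The short script below (our own sketch) applies this rule to the reported values of $Q_r(15)$ and $Q_a(15)$:

```python
# Reported portmanteau statistics (Qr(15), Qa(15)) for the four fitted models
models = {
    "Model 1": (30.43, 109.40),
    "Model 2": (33.12, 10.79),
    "Model 3": (17.69, 97.83),
    "Model 4": (15.96, 10.88),
}
CHI2_15_95 = 24.996  # upper 5% point of the chi-square distribution with 15 df

for name, (Qr, Qa) in models.items():
    mean_ok = Qr < CHI2_15_95  # Qr checks the conditional mean specification
    var_ok = Qa < CHI2_15_95   # Qa checks the conditional variance specification
    print(f"{name}: mean {'ok' if mean_ok else 'rejected'}, "
          f"variance {'ok' if var_ok else 'rejected'}")
```

Only model 4 passes both checks, matching the conclusion drawn from the two portmanteau tests.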

7

Conclusions

In this article we proposed a least absolute deviation estimator for ARFIMA-GARCH processes and established its asymptotic properties. Two portmanteau tests were also designed, based on the asymptotic distributions of the residual autocorrelations and absolute residual autocorrelations. Simulation results show that the proposed least absolute deviation method behaves well for series with a heavy-tailed noise component, and that the diagnostic tool consisting of the two portmanteau tests can be useful in checking the adequacy of ARFIMA-GARCH models estimated by the least absolute deviation method. The modeling of the absolute returns of the daily closing Dow Jones Industrial Average Index (1995-2004) illustrates that the properties of long memory and conditional heteroscedasticity may exist simultaneously in the absolute returns of a financial series, which is also known to be heavy-tailed. The robust methodology proposed in this article should be useful in modeling time series that simultaneously exhibit the features of long memory, conditional heteroscedasticity and heavy tails.


Appendix: Proofs of Theorem 2.1 and Lemma 3.1

Proofs of Theorem 2.1 and Lemma 3.1 are presented in this section. To complete the proof of Theorem 2.1, we first state four lemmas, Lemmas A.1-A.4.

Lemma A.1. Suppose the stochastic process $\{X_t, t = 1, \ldots, n\}$ has an identical marginal distribution with $E|X_t|^2 < \infty$. Then
$$n^{-1/2}\max\{|X_1|, \ldots, |X_n|\} = o_p(1).$$

Proof. The result is obvious, since for any $\varepsilon > 0$,
$$P\big(n^{-1/2}\max\{|X_1|, \ldots, |X_n|\} > \varepsilon\big) \leq nP(|X_1| > n^{1/2}\varepsilon) \leq \frac{1}{\varepsilon^2}E\{|X_1|^2 I(|X_1| > n^{1/2}\varepsilon)\} = o(1).$$

Lemma A.2. If Assumptions 2 and 3 hold, then there exists a constant $c_0$ small enough such that
$$\text{(C1)}\ E\Big\{\sup_{\lambda\in V_{c_0}}\Big\|\frac{\partial\varepsilon_t(\lambda)}{\partial\gamma}\Big\|^2\Big\} < \infty, \qquad \text{(C2)}\ E\Big\{\sup_{\lambda\in V_{c_0}}\Big\|\frac{1}{\sqrt{h_t(\lambda)}}\frac{\partial\varepsilon_t(\lambda)}{\partial\gamma}\Big\|^2\Big\} < \infty,$$
$$\text{(C3)}\ E\Big\{\sup_{\lambda\in V_{c_0}}\Big\|\frac{\partial^2\varepsilon_t(\lambda)}{\partial\gamma\,\partial\gamma^T}\Big\|^2\Big\} < \infty, \qquad \text{(C4)}\ E\Big\{\sup_{\lambda\in V_{c_0}}\Big\|\frac{1}{\sqrt{h_t(\lambda)}}\frac{\partial h_t(\lambda)}{\partial\lambda}\Big\|^2\Big\} < \infty,$$
$$\text{(C5)}\ E\Big\{\sup_{\lambda\in V_{c_0}}\Big\|\frac{1}{h_t(\lambda)}\frac{\partial h_t(\lambda)}{\partial\lambda}\Big\|^2\Big\} < \infty, \qquad \text{(C6)}\ E\Big\{\sup_{\lambda\in V_{c_0}}\Big\|\frac{1}{h_t(\lambda)}\frac{\partial^2 h_t(\lambda)}{\partial\lambda\,\partial\lambda^T}\Big\|^2\Big\} < \infty,$$
where $V_{c_0} = \{\lambda \in \Theta : \|\lambda - \lambda_0\| < c_0\}$.

Proof. It is not difficult to complete the proof from Lemma B.1 of Ling (2003).
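Lemma A.1 is easy to illustrate numerically: for i.i.d. draws with a finite second moment, the scaled maximum $n^{-1/2}\max_t |X_t|$ shrinks as $n$ grows. A quick Monte Carlo sketch of our own, with standard normal $X_t$:

```python
import numpy as np

def max_ratio(n, seed=1):
    """n^{-1/2} * max(|X_1|, ..., |X_n|) for i.i.d. standard normal X_t (E X^2 < inf)."""
    x = np.random.default_rng(seed).standard_normal(n)
    return np.abs(x).max() / np.sqrt(n)

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, max_ratio(n))
```

For the normal case the maximum grows only like $\sqrt{2\log n}$, so the ratio tends to zero, in line with the lemma.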

Lemma A.3. Under Assumptions 1-3, if $|d| < 1/2$, then
$$\sum_{t=1}^{n}\Big\{\Big[\frac{1}{\sqrt{h_t(\lambda_0+n^{-1/2}v)}}-\frac{1}{\sqrt{h_t(\lambda_0)}}\Big]\Big[\Big|\varepsilon_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)\Big|-|\varepsilon_t(\lambda_0)|\Big]\Big\}=o_p(1),\tag{A.1}$$
where $v$ is an arbitrary but fixed vector in the space $R^l$.

Proof. By Taylor expansion, we can rewrite the left-hand side of (A.1) as
$$\begin{aligned}
&\sum_{t=1}^{n}\Big\{\Big[\frac{1}{\sqrt{h_t(\lambda_0+n^{-1/2}v)}}-\frac{1}{\sqrt{h_t(\lambda_0)}}\Big]\Big[\Big|\varepsilon_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)\Big|-|\varepsilon_t(\lambda_0)|\Big]\Big\}\\
&\quad=-\frac{1}{2\sqrt n}\sum_{t=1}^{n}h_t^{-3/2}(\lambda_0)\,v^T\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big\{\Big|\varepsilon_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)\Big|-\Big|\varepsilon_t(\lambda_0)+\frac{1}{\sqrt n}v_1^T\frac{\partial\varepsilon_t(\lambda_0)}{\partial\gamma}\Big|\Big\}\\
&\qquad-\frac{1}{2\sqrt n}\sum_{t=1}^{n}h_t^{-3/2}(\lambda_0)\,v^T\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big\{\Big|\varepsilon_t(\lambda_0)+\frac{1}{\sqrt n}v_1^T\frac{\partial\varepsilon_t(\lambda_0)}{\partial\gamma}\Big|-|\varepsilon_t(\lambda_0)|\Big\}\\
&\qquad+\frac{3}{8n}\sum_{t=1}^{n}h_t^{-5/2}(\lambda^*_{nt})\Big[v^T\frac{\partial h_t(\lambda^*_{nt})}{\partial\lambda}\Big]^2\Big\{\Big|\varepsilon_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)\Big|-|\varepsilon_t(\lambda_0)|\Big\}\\
&\qquad-\frac{1}{4n}\sum_{t=1}^{n}h_t^{-3/2}(\lambda^*_{nt})\,v^T\frac{\partial^2 h_t(\lambda^*_{nt})}{\partial\lambda\,\partial\lambda^T}v\Big\{\Big|\varepsilon_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)\Big|-|\varepsilon_t(\lambda_0)|\Big\}\\
&\quad:=A1+A2+A3+A4,
\end{aligned}$$
where $\{\lambda^*_{nt}\}$ is a sequence of vectors between $\lambda_0$ and $\lambda_0+n^{-1/2}v$, and the vector $v_1 \in R^{p+q+1}$ consists of the first $p+q+1$ elements of $v$. From the inequalities
$$|A1|\leq\frac{1}{\sqrt n}\max_{1\leq t\leq n}\Big\{\frac{1}{\sqrt{a_0}}\Big|\frac{v^T}{h_t(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big|\Big\}\cdot\frac{1}{4n}\sum_{t=1}^{n}\sup_{\lambda\in V_{c_0}}\Big|v_1^T\frac{\partial^2\varepsilon_t(\lambda)}{\partial\gamma\,\partial\gamma^T}v_1\Big|,$$
$$|A3|\leq\frac{3}{8n}\sum_{t=1}^{n}\sup_{\lambda\in V_{c_0}}\Big|\frac{v^T}{h_t(\lambda)}\frac{\partial h_t(\lambda)}{\partial\lambda}\Big|^2\cdot\frac{1}{\sqrt n}\max_{1\leq t\leq n}\Big\{\frac{1}{\sqrt{a_0}}\sup_{\lambda\in V_{c_0}}\Big|v_1^T\frac{\partial\varepsilon_t(\lambda)}{\partial\gamma}\Big|\Big\}$$
and
$$|A4|\leq\frac{1}{4n}\sum_{t=1}^{n}\sup_{\lambda\in V_{c_0}}\Big|\frac{v^T}{h_t(\lambda)}\frac{\partial^2 h_t(\lambda)}{\partial\lambda\,\partial\lambda^T}v\Big|\cdot\frac{1}{\sqrt n}\max_{1\leq t\leq n}\Big\{\frac{1}{\sqrt{a_0}}\sup_{\lambda\in V_{c_0}}\Big|v_1^T\frac{\partial\varepsilon_t(\lambda)}{\partial\gamma}\Big|\Big\},$$
where the set $V_{c_0}$ is defined in Lemma A.2, together with (C1), (C3), (C5), (C6), Lemma A.1 and the ergodic theorem, we have $A1 = o_p(1)$, $A3 = o_p(1)$ and $A4 = o_p(1)$. Hence we only need to consider $A2$.

It holds that, for $x \neq 0$,
$$|x+y|-|x| = y[I(x>0)-I(x<0)] + 2\int_0^{-y}[I(x\leq s)-I(x\leq 0)]\,ds;\tag{A.2}$$
see Knight (1998). Note that the inequality
$$\int_0^{-y}[I(x\leq s)-I(x\leq 0)]\,ds \geq 0\tag{A.3}$$
is always satisfied. Then the term $A2$ can be rewritten, omitting the constant $-1/2$, as
$$\frac{1}{n}\sum_{t=1}^{n}\frac{v^T}{h_t(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\,\xi_t(v_1)[I(u_t>0)-I(u_t<0)] + \frac{2}{\sqrt n}\sum_{t=1}^{n}\frac{v^T}{h_t(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\,Z_{nt}(v_1),$$
where
$$\xi_t(v_1) = \frac{1}{\sqrt{h_t(\lambda_0)}}\,v_1^T\frac{\partial\varepsilon_t(\lambda_0)}{\partial\gamma},\tag{A.4}$$
$$Z_{nt}(v_1) = \int_0^{-n^{-1/2}\xi_t(v_1)}[I(u_t\leq s)-I(u_t\leq 0)]\,ds,\tag{A.5}$$
and $u_t = \varepsilon_t(\lambda_0)h_t^{-1/2}(\lambda_0)$ is independent and identically distributed. By (C2), (C5), the Schwarz inequality, the ergodic theorem and the fact that $E[I(u_t>0)-I(u_t<0)] = 0$, we have
$$\frac{1}{n}\sum_{t=1}^{n}\frac{v^T}{h_t(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\,\xi_t(v_1)[I(u_t>0)-I(u_t<0)] = o_p(1).$$
Furthermore, by (A.3), we have the inequality
$$\Big|\frac{1}{\sqrt n}\sum_{t=1}^{n}\frac{v^T}{h_t(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\,Z_{nt}(v_1)\Big| \leq \frac{1}{\sqrt n}\max_{1\leq t\leq n}\Big\{\Big|\frac{v^T}{h_t(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big|\Big\}\cdot\sum_{t=1}^{n}Z_{nt}(v_1),$$
where
$$\frac{1}{\sqrt n}\max_{1\leq t\leq n}\Big\{\Big|\frac{v^T}{h_t(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big|\Big\} = o_p(1),$$
and $A2 = o_p(1)$ will be implied by $\sum_{t=1}^{n}Z_{nt}(v_1) = O_p(1)$.

Next we consider the asymptotic behavior of the summation $\sum_{t=1}^{n}Z_{nt}(v_1)$. Because of the continuity of the density function $f$ at zero, there exist an $\eta_1 > 0$ and a $0 < M_0 < \infty$ such that $\sup_{|x|<\eta_1}f(x) \leq f(0) + M_0$. For any $0 < \eta < \eta_1$,
$$nE\{|Z_{nt}(v_1)|^2 I(|n^{-1/2}\xi_t(v_1)| > \eta)\} \leq nE\Big\{\Big(\int_0^{n^{-1/2}|\xi_t(v_1)|}2\,ds\Big)^2 I(|n^{-1/2}\xi_t(v_1)| > \eta)\Big\} = 4E\{|\xi_t(v_1)|^2 I(|\xi_t(v_1)| > n^{1/2}\eta)\} = o(1)$$
and
$$\begin{aligned}
nE\{|Z_{nt}(v_1)|^2 I(|n^{-1/2}\xi_t(v_1)| < \eta)\}
&\leq 2nE\Big\{n^{-1/2}|\xi_t(v_1)|\int_0^{n^{-1/2}|\xi_t(v_1)|}|I(u_t\leq s)-I(u_t\leq 0)|\,ds\;I(|n^{-1/2}\xi_t(v_1)| < \eta)\Big\}\\
&\leq 2n\eta E\Big\{\int_0^{n^{-1/2}|\xi_t(v_1)|}|F(s)-F(0)|\,ds\;I(|n^{-1/2}\xi_t(v_1)| < \eta)\Big\}\\
&\leq 2n\eta(f(0)+M_0)E\Big\{\int_0^{n^{-1/2}|\xi_t(v_1)|}s\,ds\Big\}\\
&= \eta(f(0)+M_0)v_1^T\Omega_\varepsilon v_1,
\end{aligned}$$
where $F$ is the cumulative distribution function of $u_t$, and the last line above converges to zero as $\eta \to 0$. Hence
$$E\Big(\sum_{t=1}^{n}\{Z_{nt}(v_1)-E[Z_{nt}(v_1)|\mathcal{F}_{t-1}]\}\Big)^2 = \sum_{t=1}^{n}E\{Z_{nt}(v_1)-E[Z_{nt}(v_1)|\mathcal{F}_{t-1}]\}^2 \leq nE|Z_{nt}(v_1)|^2 = o(1),$$
which implies that
$$\sum_{t=1}^{n}Z_{nt}(v_1) = \sum_{t=1}^{n}E[Z_{nt}(v_1)|\mathcal{F}_{t-1}] + o_p(1).$$
However,
$$E[Z_{nt}(v_1)|\mathcal{F}_{t-1}] = \int_0^{-n^{-1/2}\xi_t(v_1)}[F(s)-F(0)]\,ds \approx \int_0^{-n^{-1/2}\xi_t(v_1)}sf(0)\,ds = \frac{f(0)}{2n}|\xi_t(v_1)|^2,$$
where the approximation holds on the set $\{|n^{-1/2}\xi_t(v_1)| < \eta^*\}$, for $\eta^*$ small enough. Note that, from Lemma A.1 and by the ergodic theorem respectively,
$$P\{n^{-1/2}\max(|\xi_1(v_1)|, \ldots, |\xi_n(v_1)|) > \eta^*\} = o(1)$$
and
$$\frac{f(0)}{2n}\sum_{t=1}^{n}|\xi_t(v_1)|^2 = \frac{f(0)}{2n}\sum_{t=1}^{n}\Big\{\frac{1}{\sqrt{h_t(\lambda_0)}}v_1^T\frac{\partial\varepsilon_t(\lambda_0)}{\partial\gamma}\Big\}^2 \longrightarrow \frac{f(0)}{2}v_1^T\Omega_\varepsilon v_1$$
in probability. Hence we have
$$\sum_{t=1}^{n}Z_{nt}(v_1) = \frac{1}{2}f(0)v_1^T\Omega_\varepsilon v_1 + o_p(1).\tag{A.6}$$
This completes the proof.
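The identity (A.2) of Knight (1998), which drives the analysis of the $A2$ term, can be verified numerically. The sketch below (our own, hypothetical helper names) evaluates the integral on the right-hand side by the trapezoidal rule:

```python
import numpy as np

def knight_lhs(x, y):
    """Left-hand side of identity (A.2)."""
    return abs(x + y) - abs(x)

def knight_rhs(x, y, grid=200001):
    """Right-hand side of (A.2): y*sgn(x) + 2 * int_0^{-y} [I(x<=s) - I(x<=0)] ds."""
    sgn = 1.0 if x > 0 else -1.0  # the identity requires x != 0
    s = np.linspace(0.0, -y, grid)
    f = (x <= s).astype(float) - float(x <= 0)
    integral = np.sum((f[:-1] + f[1:]) / 2.0 * np.diff(s))  # signed trapezoidal rule
    return y * sgn + 2.0 * integral
```

For instance, with $x = 1$ and $y = -3$, both sides equal 1; the integral term (which is always nonnegative, as in (A.3)) contributes 4 on top of $y\,\mathrm{sgn}(x) = -3$.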

Lemma A.4. Under Assumptions 1-3, if $|d| < 1/2$, then
$$\begin{aligned}
&\sum_{t=1}^{n}\Big\{\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0+n^{-1/2}v)}} + \frac{1}{2}\log h_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big) - \frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0)}} - \frac{1}{2}\log h_t(\lambda_0)\Big\}\\
&\quad= -\frac{1}{2\sqrt n}\sum_{t=1}^{n}\frac{|u_t|-1}{h_t(\lambda_0)}v^T\frac{\partial h_t(\lambda_0)}{\partial\lambda} + \frac{1}{2}v^T\Omega_2 v + o_p(1),
\end{aligned}\tag{A.7}$$
where $v$ is an arbitrary but fixed vector in the space $R^l$.

Proof. We employ Taylor expansion to rewrite the summation in (A.7) as follows:
$$\begin{aligned}
&\sum_{t=1}^{n}\Big\{\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0+n^{-1/2}v)}} + \frac{1}{2}\log h_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)\Big\} - \sum_{t=1}^{n}\Big\{\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0)}} + \frac{1}{2}\log h_t(\lambda_0)\Big\}\\
&\quad= -\frac{1}{2\sqrt n}\sum_{t=1}^{n}\frac{|u_t|-1}{h_t(\lambda_0)}v^T\frac{\partial h_t(\lambda_0)}{\partial\lambda}\\
&\qquad+\frac{1}{4n}\sum_{t=1}^{n}\Big\{\frac{3|\varepsilon_t(\lambda_0)|}{2\sqrt{h_t(\lambda^*)}}-1\Big\}\Big\{\frac{v^T}{h_t(\lambda^*)}\frac{\partial h_t(\lambda^*)}{\partial\lambda}\Big\}^2\\
&\qquad-\frac{1}{4n}\sum_{t=1}^{n}\Big\{\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda^*)}}-1\Big\}\frac{v^T}{h_t(\lambda^*)}\frac{\partial^2 h_t(\lambda^*)}{\partial\lambda\,\partial\lambda^T}v\\
&\quad:=B1+B2+B3,
\end{aligned}$$
where $\lambda^*$ is a vector between $\lambda_0$ and $\lambda_0+n^{-1/2}v$. Note that, by (C4) and Lemma A.1,
$$\max_{1\leq t\leq n}\Big|\frac{\sqrt{h_t(\lambda_0)}}{\sqrt{h_t(\lambda^*)}}-1\Big| \leq \frac{1}{\sqrt{a_0}}\max_{1\leq t\leq n}\big|\sqrt{h_t(\lambda^*)}-\sqrt{h_t(\lambda_0)}\big| \leq \frac{1}{2\sqrt{a_0}\sqrt n}\max_{1\leq t\leq n}\Big\{\sup_{\lambda\in V_{c_0}}\Big|\frac{v^T}{\sqrt{h_t(\lambda)}}\frac{\partial h_t(\lambda)}{\partial\lambda}\Big|\Big\} = o_p(1)\tag{A.8}$$
is satisfied. Combining (A.8) and (C5), we have
$$B2 = \frac{1}{4n}\sum_{t=1}^{n}(1.5|u_t|-1)\Big\{\frac{v^T}{h_t(\lambda^*)}\frac{\partial h_t(\lambda^*)}{\partial\lambda}\Big\}^2 + o_p(1).$$
The term $\{(v^T/h_t(\lambda^*))(\partial h_t(\lambda^*)/\partial\lambda)\}^2$ in the above equation has the following Taylor expansion:
$$\begin{aligned}
&\Big|\Big\{\frac{v^T}{h_t(\lambda^*)}\frac{\partial h_t(\lambda^*)}{\partial\lambda}\Big\}^2 - \Big\{\frac{v^T}{h_t(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big\}^2\Big|\\
&\quad= 2\Big|\Big\{\frac{v^T}{h_t(\lambda^{**})}\frac{\partial h_t(\lambda^{**})}{\partial\lambda}\Big\}\Big\{\frac{v^T}{h_t(\lambda^{**})}\frac{\partial^2 h_t(\lambda^{**})}{\partial\lambda\,\partial\lambda^T} - \frac{v^T}{h_t^2(\lambda^{**})}\frac{\partial h_t(\lambda^{**})}{\partial\lambda}\frac{\partial h_t(\lambda^{**})}{\partial\lambda^T}\Big\}(\lambda^*-\lambda_0)\Big|\\
&\quad\leq \frac{2}{\sqrt n}\max_{1\leq t\leq n}\Big\{\sup_{\lambda\in V_{c_0}}\Big|\frac{v^T}{h_t(\lambda)}\frac{\partial h_t(\lambda)}{\partial\lambda}\Big|\Big\}\cdot\Big\{\sup_{\lambda\in V_{c_0}}\Big|\frac{v_{abs}^T}{h_t(\lambda)}\frac{\partial^2 h_t(\lambda)}{\partial\lambda\,\partial\lambda^T}v_{abs}\Big| + \sup_{\lambda\in V_{c_0}}\Big|\frac{v_{abs}^T}{h_t(\lambda)}\frac{\partial h_t(\lambda)}{\partial\lambda}\Big|^2\Big\},
\end{aligned}$$
where each element of $v_{abs}$ is the absolute value of the corresponding element of $v$, and $\lambda^{**}$ is a vector between $\lambda^*$ and $\lambda_0$. Hence, by (C5), (C6), Lemma A.1 and the ergodic theorem,
$$B2 = \frac{1}{2}E\Big\{\frac{v^T}{2h_t(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big\}^2 + o_p(1) = \frac{1}{2}v^T\Omega_2 v + o_p(1)$$
is satisfied. Similarly, by (C6), (A.8) and Assumption 3, we have
$$B3 = -\frac{1}{4n}\sum_{t=1}^{n}(|u_t|-1)\frac{v^T}{h_t(\lambda^*)}\frac{\partial^2 h_t(\lambda^*)}{\partial\lambda\,\partial\lambda^T}v + o_p(1) = o_p(1).$$
This completes the proof.

Proof of Theorem 2.1. For any $v = (v_1^T, v_2^T)^T \in R^l$, where $v_1 \in R^{p+q+1}$ and $v_2 \in R^{r+s+1}$, let
$$\begin{aligned}
S_n(v) &= Q_n\Big(\lambda_0+\frac{v}{\sqrt n}\Big) - Q_n(\lambda_0)\\
&= \sum_{t=1}^{n}\Big\{\frac{1}{\sqrt{h_t(\lambda_0)}}\Big[\Big|\varepsilon_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)\Big|-|\varepsilon_t(\lambda_0)|\Big]\Big\}\\
&\quad+\sum_{t=1}^{n}\Big\{\Big[\frac{1}{\sqrt{h_t(\lambda_0+n^{-1/2}v)}}-\frac{1}{\sqrt{h_t(\lambda_0)}}\Big]\Big[\Big|\varepsilon_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)\Big|-|\varepsilon_t(\lambda_0)|\Big]\Big\}\\
&\quad+\sum_{t=1}^{n}\Big\{\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0+n^{-1/2}v)}}+\frac{1}{2}\log h_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)-\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0)}}-\frac{1}{2}\log h_t(\lambda_0)\Big\}\\
&:= S_n^{(1)}(v) + S_n^{(2)}(v) + S_n^{(3)}(v).
\end{aligned}$$
By Lemmas A.3 and A.4, we know that
$$S_n^{(2)}(v) + S_n^{(3)}(v) = -\frac{1}{2\sqrt n}\sum_{t=1}^{n}\frac{|u_t|-1}{h_t(\lambda_0)}v^T\frac{\partial h_t(\lambda_0)}{\partial\lambda} + \frac{1}{2}v^T\Omega_2 v + o_p(1).$$
Denote
$$L_n(v) = \sum_{t=1}^{n}\Big\{\frac{1}{\sqrt{h_t(\lambda_0)}}\Big[\Big|\varepsilon_t(\lambda_0)+\frac{1}{\sqrt n}v_1^T\frac{\partial\varepsilon_t(\lambda_0)}{\partial\gamma}\Big|-|\varepsilon_t(\lambda_0)|\Big]\Big\} = \frac{1}{\sqrt n}\sum_{t=1}^{n}\xi_t(v_1)[I(u_t>0)-I(u_t<0)] + 2\sum_{t=1}^{n}Z_{nt}(v_1),$$
where
$$\xi_t(v_1) = \frac{1}{\sqrt{h_t(\lambda_0)}}v_1^T\frac{\partial\varepsilon_t(\lambda_0)}{\partial\gamma} \quad\text{and}\quad Z_{nt}(v_1) = \int_0^{-n^{-1/2}\xi_t(v_1)}[I(u_t\leq s)-I(u_t\leq 0)]\,ds;$$
see also (A.4) and (A.5). By (A.6), we have
$$L_n(v) = \frac{1}{\sqrt n}\sum_{t=1}^{n}\xi_t(v_1)[I(u_t>0)-I(u_t<0)] + f(0)v_1^T\Omega_\varepsilon v_1 + o_p(1).$$
As in Davis and Dunsmuir (1997), by (C2), the sequence $\{S_n^{(1)}(v)\}$ has the same limit as $\{L_n(v)\}$. Let
$$s_n = \sum_{t=1}^{n}\Big\{\xi_t(v_1)[I(u_t>0)-I(u_t<0)] - \frac{|u_t|-1}{2h_t(\lambda_0)}v^T\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big\},$$
where $v \neq 0$. Then it is easy to show that $s_n$ is a martingale with $(1/n)Es_n^2 = v^T(\Omega_1 + \sigma_{|u|}^2\Omega_2)v > 0$. From the strict stationarity and ergodicity of $\{Y_t\}$ and $\{\varepsilon_t\}$, $[(1/n)Es_n^2]^{-1}[(1/n)E(s_n^2|\mathcal{F}_{n-1})] \to 1$ a.s. Using the central limit theorem of Stout (1974),
$$\frac{1}{\sqrt n}\sum_{t=1}^{n}\Big\{\xi_t(v_1)[I(u_t>0)-I(u_t<0)] - \frac{|u_t|-1}{2h_t(\lambda_0)}v^T\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big\} \longrightarrow v^T W$$
in distribution, where $W$ is a multivariate normal vector with mean 0 and covariance matrix $\Omega_1 + \sigma_{|u|}^2\Omega_2$. Hence,
$$S_n(v) \longrightarrow v^T W + v^T\Big(f(0)\Omega_1 + \frac{1}{2}\Omega_2\Big)v$$
in distribution. Following Lemma 2.2 and Remark 1 of Davis et al. (1992), we complete the proof of Theorem 2.1.

Proof of Lemma 3.1. For any $v \in R^l$, denote

$$D_{nk} = \frac{1}{n}\sum_{t=k+1}^{n}\frac{|\varepsilon_t(\lambda_0)|}{2h_t^{3/2}(\lambda_0)}\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big(\frac{|\varepsilon_{t-k}(\lambda_0)|}{\sqrt{h_{t-k}(\lambda_0)}}-1\Big)$$
and
$$C_k(v) = \frac{1}{n}\sum_{t=k+1}^{n}\Big(\frac{|\varepsilon_t(\lambda_0+\frac{v}{\sqrt n})|}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-1\Big)\Big(\frac{|\varepsilon_{t-k}(\lambda_0+\frac{v}{\sqrt n})|}{\sqrt{h_{t-k}(\lambda_0+\frac{v}{\sqrt n})}}-1\Big),$$
where $k = 1, \ldots, M$. We first show that, for any fixed positive number $N_0$,
$$\sup_{\|v\|\leq N_0}\big|\sqrt n\,C_k(v)-\sqrt n\,C_k(0)+v^T D_{nk}\big| = o_p(1).\tag{A.9}$$
In fact, the quantity $\sqrt n\,C_k(v)-\sqrt n\,C_k(0)+v^T D_{nk}$ can be rewritten as
$$\frac{1}{\sqrt n}\sum_{t=k+1}^{n}\{E1+E2+E3\},$$
where
$$E1 = \Big(\frac{|\varepsilon_{t-k}(\lambda_0)|}{\sqrt{h_{t-k}(\lambda_0)}}-1\Big)\Big\{\frac{|\varepsilon_t(\lambda_0+\frac{v}{\sqrt n})|}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0)}}+\frac{1}{\sqrt n}\frac{|\varepsilon_t(\lambda_0)|}{2h_t^{3/2}(\lambda_0)}v^T\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big\},$$
$$E2 = \Big(\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0)}}-1\Big)\Big\{\frac{|\varepsilon_{t-k}(\lambda_0+\frac{v}{\sqrt n})|}{\sqrt{h_{t-k}(\lambda_0+\frac{v}{\sqrt n})}}-\frac{|\varepsilon_{t-k}(\lambda_0)|}{\sqrt{h_{t-k}(\lambda_0)}}\Big\}$$
and
$$E3 = \Big\{\frac{|\varepsilon_t(\lambda_0+\frac{v}{\sqrt n})|}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0)}}\Big\}\Big\{\frac{|\varepsilon_{t-k}(\lambda_0+\frac{v}{\sqrt n})|}{\sqrt{h_{t-k}(\lambda_0+\frac{v}{\sqrt n})}}-\frac{|\varepsilon_{t-k}(\lambda_0)|}{\sqrt{h_{t-k}(\lambda_0)}}\Big\}.$$
Note that, for $\|v\| \leq N_0$, the vector $\lambda_0+n^{-1/2}v$ is an interior point of the set $V_{c_0}$ for large enough $n$; hence, in the remainder of the proof, we assume this to be the case. By Taylor expansion, (A.8), (C5), (C6) and Lemma A.1, we have
$$\begin{aligned}
&\frac{1}{\sqrt n}\sum_{t=k+1}^{n}\Big(\frac{|\varepsilon_{t-k}(\lambda_0)|}{\sqrt{h_{t-k}(\lambda_0)}}-1\Big)\Big\{\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0)}}+\frac{1}{\sqrt n}\frac{|\varepsilon_t(\lambda_0)|}{2h_t^{3/2}(\lambda_0)}v^T\frac{\partial h_t(\lambda_0)}{\partial\lambda}\Big\}\\
&\quad= \frac{1}{\sqrt n}\sum_{t=k+1}^{n}(|u_{t-k}|-1)|u_t|\frac{\sqrt{h_t(\lambda_0)}}{\sqrt{h_t(\lambda^*)}}\Big\{\frac{3}{8n}\Big[\frac{v^T}{h_t(\lambda^*)}\frac{\partial h_t(\lambda^*)}{\partial\lambda}\Big]^2-\frac{1}{4n}\frac{v^T}{h_t(\lambda^*)}\frac{\partial^2 h_t(\lambda^*)}{\partial\lambda\,\partial\lambda^T}v\Big\}\\
&\quad\leq o_p(1)+\frac{1}{\sqrt n}\max_{1\leq t\leq n}\{||u_t|-1|\}\cdot\frac{1}{4n}\sum_{t=1}^{n}\Big\{|u_t|\sup_{\lambda\in V_{c_0}}\Big|\frac{v^T}{h_t(\lambda)}\frac{\partial^2 h_t(\lambda)}{\partial\lambda\,\partial\lambda^T}v\Big|\Big\}\\
&\qquad+\frac{1}{\sqrt n}\max_{1\leq t\leq n}\{||u_t|-1|\}\cdot\frac{3}{8n}\sum_{t=1}^{n}\Big\{|u_t|\sup_{\lambda\in V_{c_0}}\Big|\frac{v^T}{h_t(\lambda)}\frac{\partial h_t(\lambda)}{\partial\lambda}\Big|^2\Big\}\\
&\quad= o_p(1),
\end{aligned}\tag{A.10}$$
where $u_t = \varepsilon_t(\lambda_0)/\sqrt{h_t(\lambda_0)}$ as before. By Taylor expansion, (C3), Lemma A.1 and the ergodic theorem, we also have
$$\begin{aligned}
&\frac{1}{\sqrt n}\sum_{t=k+1}^{n}\Big(\frac{|\varepsilon_{t-k}(\lambda_0)|}{\sqrt{h_{t-k}(\lambda_0)}}-1\Big)\frac{1}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}\Big\{\Big|\varepsilon_t\Big(\lambda_0+\frac{v}{\sqrt n}\Big)\Big|-\Big|\varepsilon_t(\lambda_0)+\frac{1}{\sqrt n}v_1^T\frac{\partial\varepsilon_t(\lambda_0)}{\partial\gamma}\Big|\Big\}\\
&\quad\leq \frac{1}{\sqrt n\,\sqrt{a_0}}\max_{1\leq t\leq n}\{||u_t|-1|\}\cdot\frac{1}{n}\sum_{t=1}^{n}\sup_{\lambda\in V_{c_0}}\Big|v_1^T\frac{\partial^2\varepsilon_t(\lambda)}{\partial\gamma\,\partial\gamma^T}v_1\Big| = o_p(1).
\end{aligned}\tag{A.11}$$
Furthermore, it holds that, by (C2), (A.2), (A.6), (A.8), Lemma A.1 and the ergodic theorem,
$$\begin{aligned}
&\frac{1}{\sqrt n}\sum_{t=k+1}^{n}\Big(\frac{|\varepsilon_{t-k}(\lambda_0)|}{\sqrt{h_{t-k}(\lambda_0)}}-1\Big)\frac{1}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}\Big\{\Big|\varepsilon_t(\lambda_0)+\frac{1}{\sqrt n}v_1^T\frac{\partial\varepsilon_t(\lambda_0)}{\partial\gamma}\Big|-|\varepsilon_t(\lambda_0)|\Big\}\\
&\quad= \frac{1}{\sqrt n}\sum_{t=k+1}^{n}(|u_{t-k}|-1)\frac{\sqrt{h_t(\lambda_0)}}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}\big\{|u_t+n^{-1/2}\xi_t(v_1)|-|u_t|\big\}\\
&\quad= \frac{1}{\sqrt n}\sum_{t=k+1}^{n}(|u_{t-k}|-1)\frac{\sqrt{h_t(\lambda_0)}}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}\Big\{\frac{1}{\sqrt n}\xi_t(v_1)\mathrm{sgn}(u_t)+2Z_{nt}(v_1)\Big\}\\
&\quad= \frac{1}{n}\sum_{t=k+1}^{n}(|u_{t-k}|-1)\xi_t(v_1)\mathrm{sgn}(u_t)+\frac{2}{\sqrt n}\sum_{t=k+1}^{n}(|u_{t-k}|-1)Z_{nt}(v_1)\\
&\qquad+\frac{1}{n}\sum_{t=k+1}^{n}(|u_{t-k}|-1)\xi_t(v_1)\mathrm{sgn}(u_t)\Big\{\frac{\sqrt{h_t(\lambda_0)}}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-1\Big\}\\
&\qquad+\frac{2}{\sqrt n}\sum_{t=k+1}^{n}(|u_{t-k}|-1)Z_{nt}(v_1)\Big\{\frac{\sqrt{h_t(\lambda_0)}}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-1\Big\}\\
&\quad\leq o_p(1)+\frac{2}{\sqrt n}\max_{1\leq t\leq n}\{||u_t|-1|\}\cdot\sum_{t=1}^{n}Z_{nt}(v_1)\\
&\qquad+\max_{1\leq t\leq n}\Big\{\Big|\frac{\sqrt{h_t(\lambda_0)}}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-1\Big|\Big\}\cdot\frac{1}{n}\sum_{t=1}^{n}\big|(|u_{t-k}|-1)\xi_t(v_1)\big|\\
&\qquad+\frac{2}{\sqrt n}\max_{1\leq t\leq n}\{||u_t|-1|\}\cdot\max_{1\leq t\leq n}\Big\{\Big|\frac{\sqrt{h_t(\lambda_0)}}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-1\Big|\Big\}\cdot\sum_{t=1}^{n}Z_{nt}(v_1)\\
&\quad= o_p(1),
\end{aligned}\tag{A.12}$$
where $\mathrm{sgn}(x) = I(x>0)-I(x<0)$. Hence, by (A.11) and (A.12), the $\varepsilon_t(\lambda_0+\frac{v}{\sqrt n})$ term in $E1$ can be replaced by $\varepsilon_t(\lambda_0)$, up to $o_p(1)$, and by (A.10) we have shown that
$$\sup_{\|v\|\leq N_0}\Big|\frac{1}{\sqrt n}\sum_{t=k+1}^{n}E1\Big| = o_p(1).\tag{A.13}$$
Note that $E\{|\varepsilon_t(\lambda_0)|/\sqrt{h_t(\lambda_0)}-1\} = 0$ and
$$\frac{1}{n}\sum_{t=k+1}^{n}\Big\{\Big(\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0)}}-1\Big)\frac{|\varepsilon_{t-k}(\lambda_0)|}{2h_{t-k}^{3/2}(\lambda_0)}v^T\frac{\partial h_{t-k}(\lambda_0)}{\partial\lambda}\Big\} = o_p(1).$$
By a proof similar to that for $E1$, we can obtain
$$\sup_{\|v\|\leq N_0}\Big|\frac{1}{\sqrt n}\sum_{t=k+1}^{n}E2\Big| = o_p(1).\tag{A.14}$$
Since
$$\begin{aligned}
\Big|\frac{|\varepsilon_t(\lambda_0+\frac{v}{\sqrt n})|}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda_0)}}\Big|
&\leq \frac{\big||\varepsilon_t(\lambda_0+\frac{v}{\sqrt n})|-|\varepsilon_t(\lambda_0)|\big|}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}+|\varepsilon_t(\lambda_0)|\Big|\frac{1}{\sqrt{h_t(\lambda_0+\frac{v}{\sqrt n})}}-\frac{1}{\sqrt{h_t(\lambda_0)}}\Big|\\
&\leq \frac{1}{\sqrt n\,\sqrt{a_0}}\Big|v_1^T\frac{\partial\varepsilon_t(\lambda^*)}{\partial\gamma}\Big|+\frac{1}{2\sqrt n}|\varepsilon_t(\lambda_0)|h_t^{-3/2}(\lambda^{**})\Big|v^T\frac{\partial h_t(\lambda^{**})}{\partial\lambda}\Big|\\
&\leq \frac{1}{\sqrt n\,\sqrt{a_0}}\sup_{\lambda\in V_{c_0}}\Big|v_1^T\frac{\partial\varepsilon_t(\lambda)}{\partial\gamma}\Big|+\frac{1}{2\sqrt n}\frac{|\varepsilon_t(\lambda_0)|}{\sqrt{h_t(\lambda^{**})}}\sup_{\lambda\in V_{c_0}}\Big|\frac{v^T}{h_t(\lambda)}\frac{\partial h_t(\lambda)}{\partial\lambda}\Big|,
\end{aligned}$$
where $\lambda^*$ and $\lambda^{**}$ are two vectors between $\lambda_0$ and $\lambda_0+n^{-1/2}v$, and a similar result holds when $t$ is replaced by $t-k$, it is not difficult to show that
$$\sup_{\|v\|\leq N_0}\Big|\frac{1}{\sqrt n}\sum_{t=k+1}^{n}E3\Big| = O_p(n^{-1/2}).\tag{A.15}$$
Hence, by (A.13), (A.14) and (A.15), equation (A.9) is satisfied. Furthermore, by the central limit theorem, Theorem 2.1 and Theorem 2.2,
$$D_{nk} = H_k + o_p(1) \quad\text{and}\quad \sqrt n(\hat\lambda_n-\lambda_0) = O_p(1)\tag{A.16}$$
are satisfied. Combining (A.9) and (A.16), we complete the proof of Lemma 3.1.

References Baillie, R.T., Chung, C.F. and Tiles, M.A. (1995), Analyzing inflation by the fractionally integrated ARFIMA-GARCH model, J. Appl. Econometrics, 11, 23-40. Beran, J. (1994), Statistics for Long-Memory Processes, New York: Chapman & Hall. Beran, J. (1995), Maximum likelihood estimation of the differencing parameter for invertible short and long memory autoregressive integrated moving average models, J. R. Stat. Soc. Ser. B, 57, 659-672. Beran, J. and Feng, Y.H. (2001), Local polynomial estimation with a ARFIMAGARCH error process, Bernoulli, 7, 733-750. Berkes, I. and Horvath, L. (2003), The efficiency of the estimators of the parameters in GARCH processes, Ann. Statist., 32, 633-655. Bassett, G.J.R. and Koenker, R. (1978), Asymptotic theory of least absolute error regression, J. Amer. Statist. Assoc., 73, 618-622. Bloomfield, P. and Steiger, W.L. (1983), Least Absolute Deviation: Theory, Applications and Algorithms, Bukhausen. Bollerslev, T. (1986), Generalized autoregression conditional heteroscedasticity, J. Econometrics, 31, 307-327. Box, G.E.P. and Pierce, D.A. (1970), Distribution of the residual autocorrelations in autoregressive integrated moving average time series models, J. Amer. Statist. Assoc., 65, 1509-1526. Chan, N.H. and Palma, W. (1998), State space modeling of long-memory processes, Ann. Statist., 26, 719-740. Davis, R.A. and Dunsmuir, W.T.M. (1997), Least absolute deviation estimation for regression with ARMA errors, J. Theoret. Probab., 10, 481-497. Davis, R.A., Knight, K. and Liu, J. (1992), M-estimation for autoregressions with infinite variances, Stochastic Process. Appl., 40, 145-180. 35

Ding, Z., Granger, C.W.J. and Engle, R.F. (1993), A long memory property of stock market returns and a new model, Journal of Empirical Finance, 1, 83-106.

Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997), Modelling Extremal Events, Berlin: Springer.

Engle, R.F. (1982), Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation, Econometrica, 50, 987-1008.

Fox, R. and Taqqu, M.S. (1986), Large-sample properties of parameter estimates for strongly dependent stationary Gaussian time series, Ann. Statist., 14, 517-532.

Francq, C. and Zakoian, J.M. (2004), Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes, Bernoulli, 10, 605-637.

Geweke, J. and Porter-Hudak, S. (1983), The estimation and application of long-memory time series models, J. Time Ser. Anal., 4, 229-238.

Granger, C.W.J. and Joyeux, R. (1980), An introduction to long-memory time series models and fractional differencing, J. Time Ser. Anal., 1, 15-39.

Granger, C.W.J. and Ding, Z. (1995), Some properties of absolute return: an alternative measure of risk, Annales d'Économie et de Statistique, 40, 67-92.

Granger, C.W.J. and Sin, C.Y. (2000), Modelling the absolute returns of different stock indices: exploring the forecastability of an alternative measure of risk, J. Forecast., 19, 277-298.

Granger, C.W.J., Spear, S. and Ding, Z.X. (2000), Stylized facts on the temporal and distributional properties of absolute returns: an update, in Statistics and Finance: An Interface, ed. W.S. Chan, W.K. Li and H. Tong, pp. 97-120. London: Imperial College Press.

Hall, P. and Yao, Q. (2003), Inference in ARCH and GARCH models with heavy-tailed errors, Econometrica, 71, 285-317.

Hosking, J.R.M. (1981), Fractional differencing, Biometrika, 68, 165-176.

Knight, K. (1998), Limiting distributions for L1 regression estimators under general conditions, Ann. Statist., 26, 755-770.

Lehmann, E.L. (1998), Elements of Large-Sample Theory, New York: Springer.

Li, G. and Li, W.K. (2005), Diagnostic checking for time series models with conditional heteroscedasticity estimated by the least absolute deviation approach, Biometrika, 92, 691-701.

Li, W.K. (1992), On the asymptotic distribution of residual autocorrelations in nonlinear time series modelling, Biometrika, 79, 435-437.

Li, W.K. and Mak, T.K. (1994), On the squared residual autocorrelations in nonlinear time series with conditional heteroskedasticity, J. Time Ser. Anal., 15, 627-636.

Li, W.K. and McLeod, A.I. (1986), Fractional time series modeling, Biometrika, 73, 217-221.

Ling, S. (2003), Adaptive estimators and tests of stationary and nonstationary short- and long-memory ARFIMA-GARCH models, J. Amer. Statist. Assoc., 98, 955-967.

Ling, S. and Li, W.K. (1997), On fractionally integrated autoregressive moving average time series models with conditional heteroscedasticity, J. Amer. Statist. Assoc., 92, 1184-1193.

Luce, R.D. (1980), Several possible measures of risk, Theory and Decision, 12, 217-228.

Mandelbrot, B.B. (1977), Fractals: Form, Chance and Dimension, San Francisco: Freeman.

Mandelbrot, B.B. (1983), The Fractal Geometry of Nature, San Francisco: Freeman.

McLeod, A.I. and Hipel, K.W. (1978), Preservation of the rescaled adjusted range 1. A reassessment of the Hurst phenomenon, Water Resources Research, 14, 491-508.

McLeod, A.I. and Li, W.K. (1983), Diagnostic checking ARMA time series models using squared residual autocorrelations, J. Time Ser. Anal., 4, 269-273.

Mittnik, S., Rachev, S.T. and Paolella, M.S. (1998), Stable Paretian models in finance: some empirical and theoretical aspects, in A Practical Guide to Heavy Tails, ed. R.J. Adler, R.E. Feldman and M.S. Taqqu, pp. 79-110. Boston: Birkhäuser.

Nelder, J.A. and Mead, R. (1965), A simplex method for function minimization, Computer Journal, 7, 308-313.

Peng, L. and Yao, Q. (2003), Least absolute deviations estimation for ARCH and GARCH models, Biometrika, 90, 967-975.

Rachev, S.T. and Mittnik, S. (2000), Stable Paretian Models in Finance, New York: Wiley.

Sowell, F. (1992), Maximum likelihood estimation of stationary univariate fractionally integrated time series models, J. Econometrics, 53, 165-188.

Stout, W.F. (1974), Almost Sure Convergence, New York: Academic Press.

Taylor, S. (1986), Modelling Financial Time Series, New York: John Wiley & Sons.

Tsay, R.S. (2002), Analysis of Financial Time Series, New York: John Wiley & Sons.

Tse, Y.K. and Zuo, X.L. (1997), Testing for conditional heteroscedasticity: some Monte Carlo results, J. Stat. Comput. Simul., 58, 237-253.

Weiss, A. (1986), Asymptotic theory for ARCH models: estimation and testing, Econometric Theory, 2, 107-131.

Wong, H. and Ling, S. (2005), Mixed portmanteau tests for time-series models, J. Time Ser. Anal., 26, 569-579.

Yajima, Y. (1988), On estimation of regression models with long-memory stationary errors, Ann. Statist., 18, 791-807.


Figure 1: Boxplots of the average absolute errors of the least absolute deviation (LAD) and maximum likelihood (MLE) estimators for six stationary ARFIMA(0,d,0)-GARCH(1,1) models (panels: d = −0.3 and d = 0.3). Labels t(3), t(5) and nor indicate that the error ut has a t-distribution with 3 or 5 degrees of freedom, or a normal distribution, respectively.


Figure 2: Boxplots of the average absolute errors of the least absolute deviation (LAD) and maximum likelihood (MLE) estimators for six nonstationary ARFIMA(0,d,0)-GARCH(1,1) models (panels: d = 0.7 and d = 1.3). Labels t(3), t(5) and nor indicate that the error ut has a t-distribution with 3 or 5 degrees of freedom, or a normal distribution, respectively.

Figure 3: Time plot of absolute returns of the daily closing Dow Jones Industrial Average Index (1995-2004).


Table 1: Estimated parameter bias and square root of the mean squared error for the ARFIMA(0, 0.3, 0)-GARCH(1, 1) model (500 replications).

[Table 1 body: the cell values were scrambled in text extraction and cannot be reliably realigned. Layout: columns d̂, β̂1, α̂0 and α̂1 under "Known mean" and under "Unknown mean"; rows BIAS, √MSE and √MAV for each combination of error distribution (t(3), t(5), normal) and sample size (n = 300, 400).]

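The BIAS and √MSE rows of Table 1 are the usual Monte Carlo summaries of an estimator over replications. A minimal sketch of the computation for a single parameter (the function name and sample data are illustrative, not from the paper):

```python
import numpy as np

def summarize(estimates, true_value):
    """Monte Carlo BIAS and root mean squared error of a parameter estimator."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value                     # BIAS row of Table 1
    rmse = np.sqrt(np.mean((est - true_value) ** 2))   # sqrt(MSE) row of Table 1
    return bias, rmse

# e.g. the 500 replicated estimates of d would give one column of the table
bias_d, rmse_d = summarize([0.28, 0.31, 0.33], 0.3)
```

In the paper these summaries are computed over 500 replications for each of d, β1, α0 and α1, under both known and unknown mean.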

Table 2: The empirical (Si ) and the large sample (Ai ) standard errors of absolute residual autocorrelations and residual autocorrelations.

                      Absolute residual autocorrelations              Residual autocorrelations
lag i             1      2      3      4      5      6           1      2      3      4      5      6
t(3)
n=300    Ai   0.0375 0.0500 0.0532 0.0543 0.0550 0.0555      0.0535 0.0564 0.0571 0.0573 0.0574 0.0575
         Si   0.0305 0.0496 0.0536 0.0524 0.0525 0.0556      0.0510 0.0560 0.0572 0.0560 0.0563 0.0584
n=500    Ai   0.0292 0.0394 0.0411 0.0418 0.0424 0.0430      0.0414 0.0437 0.0442 0.0444 0.0445 0.0445
         Si   0.0242 0.0363 0.0398 0.0403 0.0398 0.0409      0.0400 0.0434 0.0438 0.0448 0.0450 0.0432
t(5)
n=300    Ai   0.0315 0.0495 0.0524 0.0536 0.0545 0.0553      0.0513 0.0559 0.0568 0.0571 0.0573 0.0574
         Si   0.0289 0.0494 0.0512 0.0533 0.0523 0.0532      0.0478 0.0548 0.0591 0.0594 0.0580 0.0579
n=500    Ai   0.0242 0.0388 0.0406 0.0414 0.0421 0.0427      0.0397 0.0433 0.0440 0.0443 0.0444 0.0445
         Si   0.0223 0.0377 0.0406 0.0412 0.0430 0.0421      0.0385 0.0423 0.0440 0.0444 0.0451 0.0437
Normal
n=300    Ai   0.0276 0.0493 0.0520 0.0534 0.0544 0.0553      0.0504 0.0557 0.0567 0.0571 0.0573 0.0574
         Si   0.0278 0.0495 0.0498 0.0533 0.0552 0.0540      0.0495 0.0537 0.0575 0.0579 0.0577 0.0587
n=500    Ai   0.0214 0.0387 0.0403 0.0411 0.0419 0.0427      0.0390 0.0432 0.0440 0.0443 0.0444 0.0445
         Si   0.0212 0.0387 0.0404 0.0411 0.0416 0.0422      0.0387 0.0415 0.0447 0.0433 0.0442 0.0440

Table 3: The empirical size and power of Qa(M) and Qr(M) (1000 replications, M = 6).

                           Size              Type I Power       Type II Power
                      Qa(M)   Qr(M)        Qa(M)   Qr(M)       Qa(M)   Qr(M)
Differencing parameter d = −0.3
t(3)     n=400        0.043   0.043        0.328   0.078       0.039   0.374
         n=600        0.046   0.044        0.500   0.088       0.037   0.620
t(5)     n=400        0.055   0.063        0.733   0.099       0.052   0.620
         n=600        0.053   0.047        0.901   0.104       0.045   0.801
Normal   n=400        0.058   0.047        0.919   0.105       0.048   0.691
         n=600        0.045   0.053        0.984   0.099       0.038   0.990
Differencing parameter d = 0.3
t(3)     n=400        0.051   0.038        0.338   0.087       0.039   0.388
         n=600        0.053   0.043        0.462   0.086       0.034   0.589
t(5)     n=400        0.061   0.043        0.747   0.108       0.047   0.621
         n=600        0.051   0.052        0.860   0.094       0.039   0.824
Normal   n=400        0.055   0.056        0.913   0.104       0.063   0.698
         n=600        0.048   0.053        0.981   0.092       0.049   0.886
Differencing parameter d = 0.7
t(3)     n=400        0.052   0.048        0.319   0.082       0.038   0.346
         n=600        0.049   0.043        0.475   0.073       0.041   0.588
t(5)     n=400        0.052   0.049        0.742   0.105       0.050   0.599
         n=600        0.048   0.048        0.900   0.092       0.038   0.815
Normal   n=400        0.047   0.043        0.923   0.124       0.050   0.701
         n=600        0.051   0.043        0.987   0.114       0.053   0.875
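Qa(M) and Qr(M) are portmanteau statistics built from the first M autocorrelations of the absolute residuals and of the residuals, respectively. The paper studentizes them using the asymptotic covariances of the LAD residual autocorrelations; the sketch below shows only the simpler unstudentized Box-Pierce-type form n∑r_i², to convey the structure (it is not the exact statistic behind Table 3):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_{max_lag} of a series x."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, max_lag + 1)])

def box_pierce(resid, M, absolute=False):
    """n times the sum of the first M squared (absolute-)residual autocorrelations."""
    e = np.abs(resid) if absolute else np.asarray(resid, dtype=float)
    r = acf(e, M)
    return len(e) * np.sum(r ** 2)

rng = np.random.default_rng(1)
e = rng.standard_t(df=5, size=400)        # stand-in for fitted-model residuals
q_r = box_pierce(e, M=6)                  # analogue of Qr(6), on residuals
q_a = box_pierce(e, M=6, absolute=True)   # analogue of Qa(6), on absolute residuals
```

Under a correctly specified model such a statistic is compared with a chi-square reference with M degrees of freedom; the studentized versions in the paper correct for estimation effects in the LAD fit.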

Figure 4: Autocorrelation function of the absolute return of the daily closing Dow Jones Industrial Average Index (1995-2004). The dotted lines indicate ±2/√n.
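Figure 4 plots the sample ACF of the absolute returns up to lag 200 against ±2/√n reference lines. A sketch of the computation behind such a plot, with simulated data standing in for the actual index series:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_{max_lag} of a series x."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
r = np.abs(rng.standard_t(df=5, size=2500))   # stand-in for the |return| series
acf_vals = sample_acf(r, max_lag=200)
band = 2.0 / np.sqrt(len(r))                  # the dotted ±2/sqrt(n) lines in Figure 4
n_outside = int(np.sum(np.abs(acf_vals) > band))
```

For a long-memory series such as the absolute returns studied here, the sample ACF stays above the band over many lags, decaying slowly rather than cutting off.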

Figure 5: Autocorrelation functions of residuals and absolute residuals from Models 1-4. The dotted lines show ±1.96Ai, the 95% asymptotic confidence limits, where Ai, i = 1, ..., 15, is the asymptotic standard error.
