Estimating ARMA Models Efficiently†
Rómulo A. Chumacero‡
ABSTRACT This paper discusses the asymptotic and finite sample properties of the Efficient Method of Moments (EMM) when applied to estimating stationary ARMA models. Issues such as identification, model selection, and testing are also discussed. The properties of these estimators are compared to those of Maximum Likelihood (ML) by means of Monte Carlo experiments for both invertible and non-invertible ARMA models.
Keywords: Monte Carlo, Efficient Method of Moments, ARMA, Identification, Maximum Likelihood, Model Selection.
JEL Classification: C15, C22, C52.
Santiago, February 1998
† This research was partially supported by the Research Fund of the Department of Economics of the University of Chile. I would like to thank Miguel Basch, George Tauchen and the participants of the Economics Workshop at the University of Chile for helpful comments. The usual disclaimer applies.
‡ Department of Economics, University of Chile. E-mail: [email protected].
1 Introduction
There is a long-standing tradition of estimating stationary ARMA models by means of likelihood-based methods. Other types of estimators rely on the estimation of a sequence of long autoregressions or on non-linear least squares.¹ Gallant and Tauchen (1996) develop a class of minimum chi-square estimators (called the Efficient Method of Moments, or EMM for short) that is particularly suitable, among other things, for estimating stationary ARMA models. As a matter of fact, Gouriéroux et al. (1993) estimate an invertible MA(1) model using a method that is similar in spirit to EMM (Indirect Inference). The usage of simulation-based methods such as the one discussed here is not new. Gouriéroux et al. (1993), Ghysels et al. (1994), Chumacero (1997) and Michaelides and Ng (1997) use these methods to evaluate their finite sample properties.

This paper presents a generalization of this approach for estimating any stationary ARMA model (invertible or non-invertible) by using EMM. Issues such as identification, testing, and model selection are also discussed. The finite sample properties of several specifications are compared to likelihood-based estimators.

The paper is organized as follows. Section 2 presents a brief description of EMM. Section 3 describes the class of estimators that will be considered, discusses the issue of identification, and derives their asymptotic properties. Results for two types of processes (MA(1) and ARMA(1,1) models) are also derived. Section 4 presents the results of Monte Carlo experiments that assess the finite sample properties of the estimators described in the previous section. Finally, Section 5 summarizes the main findings.
¹ See Ghysels et al. (1994) for references.
2 The Efficient Method of Moments
Consider a stationary stochastic process $p\left(y_t \mid x_t, \rho\right)$ that describes $y_t$ in terms of exogenous variables ($x_t$) and structural parameters ($\rho$) which the researcher is interested in estimating. Consider an auxiliary model $f\left(y_t \mid x_t, \theta\right)$ that has an analytical expression, whereas the density $p\left(y_t \mid x_t, \rho\right)$ may not. Gallant and Tauchen (1996) propose to use the scores of the auxiliary model:

$$\left(\partial/\partial\theta\right) \ln f\left(y_t \mid x_t, \theta\right) \qquad (1)$$
in order to generate the GMM moment conditions:

$$m_T\left(\rho, \theta_T\right) = \int\!\!\int \left(\partial/\partial\theta\right) \ln f\left(y \mid x, \theta_T\right)\, p\left(y \mid x, \rho\right) dy\; p\left(x \mid \rho\right) dx \qquad (2)$$
where $\theta_T$ is defined as the maximum likelihood estimator of $f\left(\cdot\right)$ for a sample of size T:

$$\theta_T = \arg\max_{\theta \in \Theta} \sum_{t=1}^{T} \ln f\left(y_t \mid x_t, \theta\right) \qquad (3)$$
For cases in which analytical expressions for (2) are not available, simulations may be required to compute them; in that case, we approximate the moments by:

$$m_T\left(\rho, \theta_T\right) \cong m_N\left(\rho, \theta_T\right) = \frac{1}{N} \sum_{n=1}^{N} \left(\partial/\partial\theta\right) \ln f\left(\tilde{y}_n \mid \tilde{x}_n, \theta_T\right) \qquad (4)$$
where N is the size of the simulated sample of y and x generated for a given value of ρ in the structural model and used to compute the Monte Carlo integral. When (4) is used to approximate the moments, the GMM estimator of ρ with an efficient weighting matrix is given by:
$$\hat{\rho} = \arg\min_{\rho \in R} \; m_N'\left(\rho, \theta_T\right) I_T^{-1}\, m_N\left(\rho, \theta_T\right) \qquad (5)$$
If the auxiliary model constitutes a good statistical description of the data generating process of y, the outer product of the gradients (OPG) can be used to form the weighting matrix; that is:
$$I_T = \frac{1}{T} \sum_{t=1}^{T} \left[\left(\partial/\partial\theta\right) \ln f\left(y_t \mid x_t, \theta_T\right)\right] \left[\left(\partial/\partial\theta\right) \ln f\left(y_t \mid x_t, \theta_T\right)\right]' \qquad (6)$$
Of course, in cases in which a Monte Carlo approximation is not needed, (2) should be used in (5) in place of (4). Gallant and Tauchen (1996) demonstrate the strong convergence and asymptotic normality of the estimator presented in (5):²
$$\sqrt{T}\left(\hat{\rho} - \rho_0\right) \stackrel{D}{\to} N\left(0, \left(d_\rho' I^{-1} d_\rho\right)^{-1}\right) \qquad (7)$$

where $d_\rho = \partial m\left(\rho_0, \theta_0\right)/\partial\rho'$, $\theta_T \stackrel{a.s.}{\to} \theta_0$, and $I_T \stackrel{a.s.}{\to} I$.
By standard arguments, it can also be shown that the asymptotic distribution of the objective function that $\hat{\rho}$ minimizes is given by:
$$T J_T = T\, m_N'\left(\hat{\rho}, \theta_T\right) I_T^{-1}\, m_N\left(\hat{\rho}, \theta_T\right) \stackrel{D}{\to} \chi^2_{h-r} \qquad (8)$$
where r and h denote the dimensions of ρ and θ respectively. Equation (8) corresponds to the familiar over-identifying restrictions test described by Hansen (1982). As in GMM, the order condition for identification requires that h ≥ r. The rank condition is more involved, given that in this setup we require the existence of a unique function that links ρ and θ, in a sense that will be made precise shortly. Given these results, and if the identification conditions are met, statistical inference may be carried out in the same fashion as in GMM. However, depending on the complexity of the auxiliary model, Wald type tests based on the variance-covariance matrix obtained by differentiating the moments may be difficult to construct (see Chumacero, 1997).

² See Tauchen (1996) for a step-by-step derivation of these results.
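To fix ideas, the following minimal sketch implements the EMM loop in (3)-(6) for a Gaussian AR(j) auxiliary model fitted by OLS. It assumes a user-supplied structural simulator `simulate(rho, n)`; the names `ar_scores`, `emm`, and `simulate` are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def ar_scores(y, beta, s2, j):
    """Scores of the Gaussian AR(j) auxiliary model, one row per observation."""
    X = np.column_stack([y[j - i:len(y) - i] for i in range(1, j + 1)])
    v = y[j:] - X @ beta                        # auxiliary-model residuals
    d_beta = (v[:, None] * X) / s2              # d ln f / d beta
    d_s2 = -0.5 / s2 + v ** 2 / (2 * s2 ** 2)   # d ln f / d sigma_v^2
    return np.column_stack([d_beta, d_s2])

def emm(y, j, simulate, rho0, N=2500):
    # Step 1 (eq. 3): quasi-ML fit of the AR(j) auxiliary model by OLS.
    X = np.column_stack([y[j - i:len(y) - i] for i in range(1, j + 1)])
    beta = np.linalg.lstsq(X, y[j:], rcond=None)[0]
    s2 = np.mean((y[j:] - X @ beta) ** 2)
    # Step 2 (eq. 6): outer-product-of-gradients weighting matrix.
    S = ar_scores(y, beta, s2, j)
    W = np.linalg.inv(S.T @ S / S.shape[0])
    # Step 3 (eqs. 4-5): minimize the quadratic form in the simulated scores.
    def objective(rho):
        m = ar_scores(simulate(rho, N + j), beta, s2, j).mean(axis=0)
        return m @ W @ m
    return minimize(objective, rho0, method="Nelder-Mead").x
```

For the MA(1) model of Section 3.2, for instance, `simulate(rho, n)` would draw n + 1 normal innovations with variance ρ[1] and return ε_t + ρ[0]·ε_{t−1}.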
3 Estimating ARMA Models with EMM

3.1 The General Case
Consider a gaussian stationary ARMA(p,q) model with q ≠ 0 that we are interested in estimating, and denote by $\gamma_i$ ($i = 0, 1, \dots$) its autocovariances:

$$y_t = \sum_{i=1}^{p} \delta_i y_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}, \quad \varepsilon_t \sim N\left(0, \sigma^2\right) \qquad (9)$$
The auxiliary model used for estimating this process is given by:

$$y_t = \sum_{i=1}^{j} \beta_i y_{t-i} + v_t, \quad v_t \sim N\left(0, \sigma_v^2\right) \qquad (10)$$
To make the correspondence with the notation introduced in the previous section, the structural parameters of this model are $\rho = \left(\delta_1, \dots, \delta_p, \alpha_1, \dots, \alpha_q, \sigma^2\right)$, and the parameters of the auxiliary model are $\theta = \left(\beta_1, \beta_2, \dots, \beta_j, \sigma_v^2\right)$. The order condition for identification requires $j \geq p + q$. The rank condition can be studied by evaluating the asymptotic properties of the estimators of the AR(j) auxiliary model. In this case, it is trivial to verify that:
β "# γ γ γ "# γ "# β "# γ γ γ γ β β # ## = β # # β = → = # # # β ## !γ γ γ #$ !γ #$ !β ##$ ! $ ∑ y − ∑ β y σ = → γ 1 + ∑ β + 2∑ γ ∑ β T−j −1
1 2
a.s.
0
1
j −1
1
0 ,1
1
0
j −2
2
0,2
0
j −1
j
t = j +1
t
i =1
j
0
(11)
0, j
2
j
T
2 v
j−2
i
t−i
0
i =1
j −i
j
j
a.s.
2 0 ,i
i =1
i
l=1
0 ,i
β 0 ,i + l − β 0 ,i
Thus, all the parameters of the auxiliary model are (asymptotically) non-linear functions of the first j autocovariances of y. Identification requires the existence of a unique set of parameters in the structural model that, together with the consistent estimators of the auxiliary model, makes (2) equal to 0. As will be shown shortly, this condition is equivalent to the identifiability condition that must also be imposed when estimating ARMA models with ML (see Hamilton, 1994 or Tanaka, 1996). As is well documented, for every invertible stationary ARMA model there is an observationally equivalent non-invertible ARMA model. Thus, the same requirements that are imposed on any stationary ARMA model that is to be estimated by ML ought to be made when estimating it by EMM. Given that the structural model is assumed to be gaussian, it can be shown that the scores of the auxiliary model are:
$$m_T\left(\rho, \theta_T\right) = \begin{pmatrix} \dfrac{1}{\sigma_v^2}\left(\gamma_i - \displaystyle\sum_{l=1}^{i} \beta_l \gamma_{i-l} - \sum_{l=i+1}^{j} \beta_l \gamma_{l-i}\right), \quad i = 1, \dots, j \\[2ex] -\dfrac{1}{2\sigma_v^2} + \dfrac{1}{2\sigma_v^4}\left[\gamma_0\left(1 + \displaystyle\sum_{i=1}^{j} \beta_i^2\right) + 2\sum_{i=1}^{j} \gamma_i \left(\sum_{l=1}^{j-i} \beta_l \beta_{l+i} - \beta_i\right)\right] \end{pmatrix} \qquad (12)$$
where the dimension of (12) is (j+1)×1 and corresponds to the unconditional expectation of the scores of the auxiliary model. It is trivial to verify that the last moment condition is equal to zero when β is replaced by β₀. This is also the case for the first j moments (up to a reparameterization discussed later). One important point to be mentioned is that for gaussian ARMA processes there is no need to approximate the moment conditions by Monte Carlo simulation (as in equation 4), given that the exact conditions can be derived analytically. Next, we apply these results to two particular processes, namely an MA(1) and an ARMA(1,1). Later on, we conduct Monte Carlo experiments to assess the finite sample properties of different EMM estimators of these processes and compare their properties with those of ML.
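As a concrete sketch (Python; `exact_emm_moments` is an illustrative name, and the code follows the reconstruction of (12) given above), the exact moment conditions can be evaluated directly from the autocovariances γ₀,…,γ_j of the structural model and the fitted auxiliary parameters, with no simulation at all:

```python
import numpy as np

def exact_emm_moments(gamma, beta, s2):
    """Exact EMM moments (12): gamma holds gamma[0..j], beta the j AR
    coefficients of the auxiliary model, s2 the residual variance."""
    j = len(beta)
    m = np.empty(j + 1)
    for i in range(1, j + 1):  # scores with respect to beta_i
        m[i - 1] = (gamma[i]
                    - sum(beta[l - 1] * gamma[i - l] for l in range(1, i + 1))
                    - sum(beta[l - 1] * gamma[l - i] for l in range(i + 1, j + 1))) / s2
    # score with respect to sigma_v^2 uses E[v_t^2] implied by the gammas
    Ev2 = gamma[0] * (1 + np.sum(beta ** 2)) + 2 * sum(
        gamma[i] * (sum(beta[l - 1] * beta[l + i - 1] for l in range(1, j - i + 1))
                    - beta[i - 1])
        for i in range(1, j + 1))
    m[j] = -0.5 / s2 + Ev2 / (2 * s2 ** 2)
    return m
```

Passing the limiting values in (11) for `beta` and `s2` returns a vector of zeros, which is precisely the identification argument made above.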
3.2 The MA(1) Model

The model that we are interested in estimating takes the form:
$$y_t = \varepsilon_t + \alpha \varepsilon_{t-1}, \quad \varepsilon_t \sim N\left(0, \sigma^2\right) \qquad (13)$$
while the auxiliary model is again an AR(j) process. In this case the autocovariances are given by:

$$\gamma_0 = \sigma^2\left(1 + \alpha^2\right), \quad \gamma_1 = \sigma^2 \alpha, \quad \gamma_i = 0 \text{ for } i > 1.$$

It is trivial to verify that for any AR(j) auxiliary model the estimates converge to:
β "# → 1 0−15 α 41 − α 1 1 6 1−α 1 6 !σ #$ ! σ 41 − α 9 i −1
i 2 v
2 j + 1− i
i
a.s.
2 j +1
2 j+2
2
6
9"# #$
for i = 1,.., j
(14)
when α ≠ 1, and to:
β "# → 1 0−15 α 1 j + 1 − i6"# j +1! σ 1 j + 26 #$ !σ #$ i −1
i 2 v
i
for i = 1,.. , j
a.s.
2
(15)
in the unit root case. Once analytical expressions for the limiting values of the estimated parameters of the auxiliary model are found, it is easy to verify (by direct substitution) that (12) is equal to zero. Notice, however, that given that the estimates of the auxiliary model are functions of the autocovariances, there is always an invertible MA model that is observationally equivalent to a non-invertible MA model. In this case, it is easy to verify that the following MA(1) model reproduces the same autocovariances and the same estimates of the auxiliary model, thus also satisfying (12). That process is:
$$y_t = \varepsilon_t + \alpha^* \varepsilon_{t-1}, \quad \varepsilon_t \sim N\left(0, \sigma^{*2}\right) \qquad (16)$$

with $\alpha^* = 1/\alpha$ and $\sigma^{*2} = \sigma^2 \alpha^2$. Thus, the only cases in which EMM can satisfy the rank condition (exactly) are when α = ±1.
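A two-line check (Python; the numerical values are illustrative) confirms the observational equivalence of (13) and (16):

```python
alpha, sigma2 = 1.5, 1.0                         # a non-invertible MA(1)
a_star, s_star = 1 / alpha, sigma2 * alpha ** 2  # its invertible counterpart
assert abs(sigma2 * (1 + alpha ** 2) - s_star * (1 + a_star ** 2)) < 1e-12  # gamma_0
assert abs(sigma2 * alpha - s_star * a_star) < 1e-12                        # gamma_1
```

Since all higher-order autocovariances are zero for both processes, the two parameterizations imply the same AR(j) limits in (14) and hence the same value of (12).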
Thus, just as in ML, EMM requires for identification that the econometrician restricts attention either to the invertible or to the non-invertible case. It is important to notice, however, that for the estimation of either an invertible or a non-invertible process, EMM delivers exactly the same moment conditions. In contrast, ML requires the specification of the unconditional (exact) likelihood function when dealing with non-invertible processes, which makes it computationally more expensive than EMM. We will discuss this point later.
3.3 The ARMA(1,1) Model

The other model that we are interested in estimating is:
$$y_t = \delta y_{t-1} + \varepsilon_t + \alpha \varepsilon_{t-1}, \quad \varepsilon_t \sim N\left(0, \sigma^2\right) \qquad (17)$$
while the auxiliary model is again an AR(j) process. In this case the autocovariances are given by:

$$\gamma_0 = \frac{\sigma^2\left(1 + \alpha^2 + 2\alpha\delta\right)}{1 - \delta^2}, \quad \gamma_1 = \gamma_0 \delta + \sigma^2 \alpha, \quad \gamma_i = \delta \gamma_{i-1} \text{ for } i > 1.$$
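These autocovariances are straightforward to compute recursively; a minimal sketch (Python, assuming |δ| < 1 and the function name is illustrative) that can feed the exact moment conditions above is:

```python
def arma11_autocov(delta, alpha, sigma2, j):
    """Autocovariances gamma_0,...,gamma_j of a stationary ARMA(1,1)."""
    g = [sigma2 * (1 + alpha ** 2 + 2 * alpha * delta) / (1 - delta ** 2)]
    g.append(g[0] * delta + sigma2 * alpha)   # gamma_1
    for _ in range(2, j + 1):
        g.append(delta * g[-1])               # gamma_i = delta * gamma_{i-1}
    return g
```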
This process renders estimates of the auxiliary model and moment conditions that are given by expressions (11) and (12). Just as in the case of the MA(1) process discussed previously, when $\alpha^* = 1/\alpha$ and $\sigma^{*2} = \sigma^2 \alpha^2$ are replaced in (17), the resulting process is observationally equivalent. Again, identification requires a stance regarding the invertibility of the process. The next section develops several Monte Carlo experiments to assess the finite sample properties of several EMM estimators for these two special cases, discusses the issue of the choice of the auxiliary model and testing, and compares the results with those obtained with ML.
4 The Monte Carlo Experiments
There have been at least four Monte Carlo studies of the finite sample properties of simulation-based methods for invertible MA(1) processes. Gouriéroux et al. (1993) use Indirect Inference with AR(1), AR(2), and AR(3) auxiliary models, setting α = −0.5. Ghysels et al. (1994) compare several simulation-based estimators for different specifications of the invertible MA(1) model using, as in Gouriéroux et al. (1993), an AR(3) auxiliary model. For the EMM estimators they approximate the moment conditions by simulation. Michaelides and Ng (1997) also perform Monte Carlo experiments for different sample sizes and different values of N for approximating the moment conditions in (4). Nevertheless, as their histograms show, even though they are estimating an invertible MA(1) model, they allow for estimates of α greater than 1 (in absolute value), thus not imposing the necessary identification conditions. In their study the auxiliary model is also fixed to an AR(3). Chumacero (1997) also studies the finite sample properties of the invertible MA(1) model with fixed AR auxiliary models (of orders 2 and 3). Two other Monte Carlo experiments for more complex setups are also studied there, showing the superior performance of EMM with respect to GMM in several dimensions. In a different setup, Gallant and Tauchen (1998) also present Monte Carlo evidence of the efficiency gains of EMM over conventional method of moments estimators.

There are, however, several questions that are left unanswered in the papers just mentioned. This section attempts to answer some of them. In particular:

• From a practical standpoint, how should the auxiliary model be chosen? Is there any model selection criterion that is superior in terms of efficiency?

• Chumacero (1997) and Michaelides and Ng (1997) present evidence that the over-identifying restrictions test described in (8) presents size problems in finite samples. In particular, there is strong evidence of over-rejections. Is there a simple way to correct this problem and provide a better approximation to the asymptotic distribution of this test?

• How well do Wald type tests perform under different specifications of ARMA models?

• How well does EMM perform when estimating non-invertible ARMA models?

• Is there any gain from estimating stationary ARMA models with EMM instead of ML?

The results of the Monte Carlo experiments reported shortly intend to provide answers to these questions.
4.1 Design of the Experiments

In order to answer the questions previously posed, two types of experiments were implemented: one for different specifications of an MA(1) model (both invertible and non-invertible), and the other for different specifications of an ARMA(1,1) model. In each case the results are compared to the properties of ML estimators. The ML estimators were computed using the conditional likelihood when the true process was invertible and the unconditional (exact) likelihood when it was not. For each setup, 1,000 samples, each of sizes T=100 and T=200, were used. These lengths were chosen to make these experiments comparable with previous studies (particularly Ghysels et al., 1994 and Michaelides and Ng, 1997).
In the case of EMM, two different types of estimators were considered. One uses the exact moment conditions (equation (12)) and the other approximates them using equation (4) with N=2,500. Finally, three selection criteria for the choice of the lag length of the auxiliary model were used: the Akaike Information Criterion (AIC), the Schwarz Criterion (BIC), and the Hannan-Quinn criterion (HQ). Given that the auxiliary model assumes normality, they are defined as:
$$\begin{aligned} AIC\left(j\right) &= \ln \sigma_v^2 + 2j/T \\ BIC\left(j\right) &= \ln \sigma_v^2 + j \ln T / T \\ HQ\left(j\right) &= \ln \sigma_v^2 + 2j \ln\left(\ln T\right) / T \end{aligned} \qquad (18)$$
where in all cases j is chosen so as to minimize (18). Chumacero (1997) showed that the choice of weighting matrix is not as crucial in EMM as in GMM and that (6) provides basically identical results to other procedures requiring the computation of HAC matrices. This is the case when the auxiliary model is chosen correctly, in the sense of providing a good statistical description of all the features of the data. Thus, here we concentrate on the comparison of the model selection criteria described in (18) and their implications for the properties of the coefficients and the statistics commonly used for inference. In all the samples the maximum and minimum lags for the auxiliary model were set to 20 and 2 respectively. The lower bound is justified because we want the model to be over-identified. For each sample, each expression in (18) was minimized and j was chosen accordingly.
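A sketch of this lag-length search (Python; `select_lag` is an illustrative name) that fits each AR(j) by OLS and evaluates the three criteria in (18) might look as follows:

```python
import numpy as np

def select_lag(y, jmin=2, jmax=20):
    """Return the lag length minimizing each criterion in (18)."""
    T = len(y)
    best = {"AIC": (np.inf, None), "BIC": (np.inf, None), "HQ": (np.inf, None)}
    for j in range(jmin, jmax + 1):
        X = np.column_stack([y[j - i:T - i] for i in range(1, j + 1)])
        b = np.linalg.lstsq(X, y[j:], rcond=None)[0]
        s2 = np.mean((y[j:] - X @ b) ** 2)
        crit = {"AIC": np.log(s2) + 2 * j / T,
                "BIC": np.log(s2) + j * np.log(T) / T,
                "HQ":  np.log(s2) + 2 * j * np.log(np.log(T)) / T}
        for name, value in crit.items():
            if value < best[name][0]:
                best[name] = (value, j)
    return {name: j for name, (value, j) in best.items()}
```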
4.2 Results of the MA(1) Model

Tables 1 to 5 and Figures 1 and 2 report the results for different specifications of the MA(1) model. In all, seven different specifications were estimated and compared to the results obtained by ML. The main findings are:
• As is well known, BIC tends to select more parsimonious models, followed by HQ, while AIC always chooses larger auxiliary models.

• The three selection criteria show no discernible differences in terms of bias. Nevertheless, AIC tends to provide more efficient estimates, followed by HQ, and finally BIC. The reason is that, particularly in the neighborhood of a unit root, BIC is too conservative in the choice of lag length (Tables 1 and 2).

• When we compare the results of EMM with ML, we find that EMM corrects the bias that ML displays when the parameter is close to the unit root and the unconditional likelihood is used for estimation.

• In this simple example, a choice of N=2,500 approximates the exact moment conditions well. Nevertheless, given that for any stationary ARMA model the moments can be computed exactly, there is no need to use these approximations.³

• Besides the fact that using the exact moment conditions (when available) renders better finite sample results, there is also a practical reason to do so: the gains in computational time are dramatic. As a matter of fact, even in this simple set-up, obtaining EMM estimates using the exact moment conditions is 10 times faster than using conditional ML, 20 times faster than using exact ML, and 15 times faster than using EMM with approximate moment conditions. There is then an important gain in efficiency (measured in computational time) from using EMM.⁴

• As documented in Chumacero (1997) and Michaelides and Ng (1997), in this set-up EMM tends to over-reject the null when using the standard over-identifying restrictions test. Furthermore, the less parsimonious the auxiliary model, the greater the size problem. Thus, AIC tends to over-reject more than HQ, which in turn over-rejects more than BIC. The magnitude of the over-rejections is rather important; thus a simple correction is suggested: instead of using the actual sample size, use T−q as the scaling in (8) (see the sketch after this list). As Tables 3 and 4 show, this simple correction allows for an almost complete rectification of the size distortion around standard levels of significance. This is particularly true when AIC is chosen as selection criterion, as Figure 1 makes evident. At any rate, size distortions are still important for levels above 15%.

• With regard to Wald type tests, when the parameter is relatively far away from the unit circle, they behave well (Table 5 and Figure 2). Nevertheless, they tend to become unstable near the unit root, thus making the inversion of chi-square tests preferable. Notice, however, that in all these cases a correction such as the one indicated previously is also needed.

• Concluding, EMM tends to perform well when estimating MA(1) models, particularly in cases near or at a unit root, where the bias usually associated with conditional ML is diminished. At any rate, ML is still more efficient in terms of Root Mean Square Error. There is, however, one important gain in efficiency when using EMM, and it comes from the time needed to estimate these models: EMM is at least 10 times faster than ML. When a suitable transformation of the over-identifying restrictions test is performed, AIC should be chosen as selection criterion because it usually provides more efficient estimates. Nevertheless, AIC should not be chosen for inference if the transformation is not performed.

³ Curiously, all the Monte Carlo experiments conducted in this set-up use the approximation to the moment conditions instead of the exact moments.
⁴ Of course, these gains are even greater when dealing with more complex ARMA models.
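The degrees-of-freedom correction mentioned above amounts to rescaling the minimized objective; a minimal sketch (Python, where J is the minimized quadratic form, q the number of MA parameters, and h and r the dimensions of θ and ρ) is:

```python
from scipy.stats import chi2

def j_test_pvalue(J, T, q, h, r):
    """P-value of the corrected over-identifying restrictions test:
    (T - q) * J compared with a chi-square(h - r) distribution."""
    return chi2.sf((T - q) * J, h - r)
```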
4.3 Results of the ARMA(1,1) Model
As in the previous exercise, several specifications of the ARMA(1,1) model were estimated. In all cases, the autoregressive coefficient was set equal to −0.8 and the coefficient associated with the MA component was allowed to vary. Once again, 1,000 samples each of sizes 100 and 200 were artificially generated and estimated by ML and three EMM estimators. In the latter cases, the exact moment conditions were used and the auxiliary models were chosen using the three selection criteria discussed previously. The results of these experiments are reported in Table 6 and can be summarized as follows:

• Once again, AIC tends to select larger models, followed by HQ, while BIC selects, on average, smaller lag lengths.

• In terms of the RMSE of the autoregressive coefficient, it is interesting to notice that the selection criterion that renders the smallest one is BIC, followed by HQ, and finally AIC. This order is reversed when we compare the RMSE of the MA coefficient. The intuition here is clear: particularly when the process is invertible, the auxiliary model that better captures the dynamics of the MA coefficient is one that requires several lags; on the other hand, to better capture the characteristics of the AR component, parsimony is preferred (BIC).

• Nevertheless, all the models do equally well in terms of bias. As discussed previously, particularly when the MA coefficient is close to the unit circle, EMM reduces the bias associated with ML.

• The gains in computational time increase substantially when using EMM instead of ML. In particular, the average estimation for a sample size of 200 is 20 times faster with EMM than with ML. These gains increase to a factor of 35 if the exact likelihood is used when estimating by ML.
5 Conclusions

In this paper we develop a methodology for estimating stationary gaussian ARMA models (invertible or not) by means of EMM. We show that, in contrast to the prevailing practice, simulation is not required in order to compute the moment conditions that EMM uses. The gains in terms of efficiency and computational time required for estimating these models are substantial. In cases in which the ARMA process is close to the unit circle, EMM may be preferred because of bias reduction. In any case, ML still renders substantially lower RMSE than EMM.

Another issue addressed in this paper is the way in which the auxiliary model ought to be chosen. Three automatic selection criteria were examined. It is shown that AIC tends to perform better than BIC and HQ (in terms of RMSE) when estimating pure MA models. Another result that comes from the experiments performed is that there is a simple way to correct the problem of over-rejections that typically occurs when using Hansen's (1982) test. If this correction is not performed, large-scale auxiliary models will present important size distortions. When dealing with ARMA models that are relatively close to the unit circle, Wald type tests will not perform adequately. Thus, inverting chi-square tests is preferable once the (T−q) correction is performed.
Table 1
Properties of the Estimators: MA(1) Model

α=0.5              T=100                    T=200
Estimator     Mean    RMSE    j        Mean    RMSE    j
ML            0.493   0.094   -        0.496   0.064   -
AIC           0.489   0.138   3.049    0.497   0.085   3.419
AICE          0.489   0.137   3.049    0.498   0.082   3.419
BIC           0.489   0.139   2.078    0.494   0.088   2.148
BICE          0.489   0.137   2.078    0.495   0.086   2.148
HQ            0.493   0.138   2.311    0.498   0.087   2.436
HQE           0.493   0.136   2.311    0.499   0.085   2.436

α=1.0              T=100                    T=200
Estimator     Mean    RMSE    j        Mean    RMSE    j
MLE           1.002   0.037   -        0.999   0.016   -
AIC           0.911   0.144   7.803    0.947   0.087   10.648
AICE          0.912   0.143   7.803    0.948   0.085   10.648
BIC           0.927   0.191   3.594    0.948   0.122   5.216
BICE          0.924   0.192   3.594    0.951   0.118   5.216
HQ            0.920   0.161   5.244    0.949   0.099   7.475
HQE           0.919   0.159   5.244    0.950   0.096   7.475

α=1.5              T=100                    T=200
Estimator     Mean    RMSE    j        Mean    RMSE    j
MLE           1.526   0.204   -        1.514   0.131   -
AIC           1.564   0.410   3.926    1.516   0.202   4.798
AICE          1.560   0.393   3.926    1.512   0.195   4.798
BIC           1.564   0.412   2.405    1.498   0.253   2.823
BICE          1.560   0.407   2.405    1.492   0.243   2.823
HQ            1.544   0.383   2.864    1.497   0.223   3.478
HQE           1.541   0.381   2.864    1.493   0.216   3.478
Notes: The results were obtained by estimating 1,000 samples, each of size T. RMSE = Root Mean Square Error. j = Average lag length of the auxiliary model. ML = Results obtained with the conditional Maximum Likelihood estimator. MLE = Results obtained with the unconditional (exact) Maximum Likelihood estimator. AIC = Results from EMM using AIC as information criterion and a numerical approximation for the moments (N=2,500). AICE = Results from EMM using AIC as information criterion and the exact moment conditions. BIC = Results from EMM using BIC as information criterion and a numerical approximation for the moments (N=2,500). BICE = Results from EMM using BIC as information criterion and the exact moment conditions. HQ = Results from EMM using HQ as information criterion and a numerical approximation for the moments (N=2,500). HQE = Results from EMM using HQ as information criterion and the exact moment conditions.
Table 2
Properties of the Estimators: MA(1) Model

α=-0.5             T=100                     T=200
Estimator     Mean     RMSE    j        Mean     RMSE    j
ML            -0.508   0.098   -        -0.505   0.065   -
AICE          -0.506   0.136   3.140    -0.505   0.081   3.455
BICE          -0.505   0.137   2.096    -0.505   0.085   2.159
HQE           -0.508   0.137   2.352    -0.509   0.081   2.456

α=-0.95            T=100                     T=200
ML            -0.911   0.073   -        -0.926   0.046   -
AICE          -0.908   0.113   7.598    -0.936   0.069   10.313
BICE          -0.899   0.145   3.635    -0.930   0.092   5.114
HQE           -0.907   0.125   5.259    -0.938   0.075   7.398

α=-1.0             T=100                     T=200
MLE           -1.001   0.031   -        -0.999   0.016   -
AICE          -0.921   0.131   7.884    -0.950   0.082   10.755
BICE          -0.935   0.176   3.708    -0.954   0.110   5.288
HQE           -0.927   0.145   5.479    -0.954   0.091   7.696

α=-1.25            T=100                     T=200
MLE           -1.241   0.114   -        -1.246   0.074   -
AICE          -1.258   0.220   5.484    -1.243   0.131   6.869
BICE          -1.258   0.291   2.996    -1.215   0.180   3.853
HQE           -1.244   0.250   3.960    -1.225   0.152   5.001
Notes: The results were obtained by estimating 1,000 samples, each of size T. RMSE = Root Mean Square Error. j = Average lag length of the auxiliary model. ML = Results obtained with the conditional Maximum Likelihood estimator. MLE = Results obtained with the unconditional (exact) Maximum Likelihood estimator. AICE = Results from EMM using AIC as information criterion and the exact moment conditions. BICE = Results from EMM using BIC as information criterion and the exact moment conditions. HQE = Results from EMM using HQ as information criterion and the exact moment conditions.
Table 3
Properties of the Over-Identifying Restrictions Test: MA(1) Model

                   TJT                 (T-q)JT-q
Estimator     M      5%     10%     M      5%     10%

α=0.5, T=100
AIC           0.3    7.2    13.5    1.8    4.9    10.3
AICE          0.4    6.2    13.1    2.3    4.7    10.1
BIC           13.9   5.1    10.7    20.4   4.4    9.9
BICE          17.3   4.7    10.0    24.5   4.6    9.5
HQ            5.0    5.1    11.0    9.8    4.0    9.8
HQE           6.6    4.6    10.4    12.2   4.2    9.2

α=0.5, T=200
AIC           0.1    10.6   18.6    0.1    8.0    16.8
AICE          0.1    7.9    16.4    0.3    7.1    14.3
BIC           2.8    7.4    13.4    3.8    6.8    12.9
BICE          6.2    5.7    10.9    8.1    5.6    10.4
HQ            0.8    8.1    14.4    1.4    7.4    13.8
HQE           2.4    6.5    12.5    3.7    6.1    11.8

α=1.0, T=100
AIC           0.1    13.1   20.7    3.1    5.7    12.2
AICE          0.1    12.8   20.3    6.7    5.4    11.2
BIC           0.1    8.9    18.1    0.1    6.2    14.9
BICE          0.1    7.7    16.3    0.1    5.4    13.1
HQ            0.1    10.4   18.4    0.3    5.6    13.1
HQE           0.1    9.6    17.6    0.7    5.4    11.4

α=1.0, T=200
AIC           0.1    13.5   21.9    0.2    8.1    15.4
AICE          0.1    11.0   18.9    4.7    6.4    12.8
BIC           0.1    10.6   19.3    0.1    8.3    16.1
BICE          0.1    8.6    16.8    0.2    6.4    13.7
HQ            0.1    12.4   21.1    0.1    8.2    16.7
HQE           0.1    9.5    18.4    1.0    6.4    13.7

α=1.5, T=100
AIC           0.2    8.0    14.9    3.3    5.4    10.9
AICE          0.6    7.1    13.1    6.5    5.1    9.7
BIC           1.2    5.9    12.5    3.1    5.1    11.2
BICE          2.1    5.3    10.6    5.1    4.1    9.5
HQ            0.7    5.5    12.0    3.1    4.8    10.3
HQE           1.6    5.0    10.0    5.9    3.9    8.3

α=1.5, T=200
AIC           0.2    7.9    17.7    1.3    6.0    13.2
AICE          1.4    7.0    14.6    6.4    5.4    11.7
BIC           3.7    7.9    15.3    6.2    7.0    14.0
BICE          11.3   6.8    12.4    16.7   5.8    11.7
HQ            4.0    6.8    15.2    8.3    6.2    12.8
HQE           14.3   6.4    11.9    24.3   5.3    10.6
Notes: The results were obtained by estimating 1,000 samples, each of size T. TJT = Over-identifying restrictions test. (T-q)JT-q = Over-identifying restrictions test adjusted by degrees of freedom. M = P-value of the Mann-Whitney-Wilcoxon Test. 5%, 10% = Size of the test for the nominal counterpart. AIC = Results from EMM using AIC as information criterion and a numerical approximation for the moments (N=2,500). AICE = Results from EMM using AIC as information criterion and the exact moment conditions. BIC = Results from EMM using BIC as information criterion and a numerical approximation for the moments (N=2,500). BICE = Results from EMM using BIC as information criterion and the exact moment conditions. HQ = Results from EMM using HQ as information criterion and a numerical approximation for the moments (N=2,500). HQE = Results from EMM using HQ as information criterion and the exact moment conditions.
Table 4
Properties of the Over-Identifying Restrictions Test: MA(1) Model

                   TJT                 (T-q)JT-q
Estimator     M      5%     10%     M      5%     10%

α=-0.5, T=100
AICE          0.4    6.0    12.7    2.7    4.2    10.2
BICE          17.0   3.8    9.4     24.5   3.3    8.6
HQE           6.7    4.5    10.8    12.7   3.8    9.5

α=-0.5, T=200
AICE          0.1    9.4    16.3    0.3    7.4    14.3
BICE          6.3    6.1    12.5    8.3    6.0    12.0
HQE           3.7    7.1    13.4    5.5    6.7    12.7

α=-0.95, T=100
AICE          0.1    9.8    18.9    5.9    4.5    9.4
BICE          0.1    7.2    14.7    0.1    5.7    11.9
HQE           0.1    6.9    15.0    0.6    4.2    9.2

α=-0.95, T=200
AICE          0.1    11.3   17.9    8.7    6.3    12.1
BICE          0.1    6.2    14.0    0.3    4.4    11.8
HQE           0.1    8.1    14.5    1.8    5.7    11.2

α=-1.0, T=100
AICE          0.1    11.1   20.6    3.9    4.8    9.2
BICE          0.1    7.8    15.6    0.1    6.0    12.0
HQE           0.1    7.4    15.8    0.4    4.4    9.4

α=-1.0, T=200
AICE          0.1    11.3   19.3    4.7    6.9    12.2
BICE          0.1    6.8    14.9    0.1    5.0    12.4
HQE           0.1    8.7    15.9    1.0    5.9    12.0

α=-1.25, T=100
AICE          0.2    6.3    12.2    14.2   2.5    7.2
BICE          0.1    5.2    11.8    0.9    4.2    10.0
HQE           0.5    4.7    12.0    6.2    3.0    8.6

α=-1.25, T=200
AICE          2.8    5.9    12.4    23.9   4.3    8.6
BICE          6.5    4.7    10.2    14.4   4.4    8.9
HQE           9.1    4.1    10.8    27.0   3.4    8.4
Notes: The results were obtained by estimating 1,000 samples, each of size T. TJT = Over-identifying restrictions test. (T-q)JT-q = Over-identifying restrictions test adjusted by degrees of freedom. M = P-value of the Mann-Whitney-Wilcoxon Test. 5%, 10% = Size of the test for the nominal counterpart. AICE = Results from EMM using AIC as information criterion and the exact moment conditions. BICE = Results from EMM using BIC as information criterion and the exact moment conditions. HQE = Results from EMM using HQ as information criterion and the exact moment conditions.
Table 5
Properties of the Wald Test: MA(1) Model

                   WT                  WT-q
Estimator     M      5%     10%     M      5%     10%

α=0.5, T=100
ML            20.0   7.2    12.6    20.0   7.2    12.6
AIC           15.2   8.4    13.3    19.4   8.1    13.0
AICE          17.7   9.1    13.0    22.4   8.3    12.7
BIC           47.4   6.3    11.0    48.1   6.0    10.9
BICE          49.1   6.8    10.6    44.5   6.6    10.6
HQ            33.4   6.4    11.5    38.1   6.2    11.5
HQE           36.0   7.3    11.3    40.9   7.0    11.3

α=0.5, T=200
ML            36.6   5.2    12.0    36.6   5.2    12.0
AIC           13.0   7.9    13.3    15.1   7.6    12.5
AICE          19.5   6.7    11.5    22.3   6.5    11.2
BIC           25.5   6.5    10.8    27.5   6.1    10.7
BICE          39.2   5.0    8.8     41.7   4.8    8.8
HQ            17.9   6.7    11.4    19.7   6.5    11.3
HQE           26.9   5.3    9.6     29.4   5.2    9.6

α=1.5, T=100
MLE           38.7   8.6    11.1    38.7   8.6    11.1
AIC           8.0    1.6    4.8     5.9    1.2    3.5
AICE          7.5    1.8    5.0     5.6    1.2    4.4
BIC           0.2    0.3    1.9     0.2    0.3    1.7
BICE          0.2    0.3    2.0     0.2    0.3    1.8
HQ            0.6    0.5    2.4     0.4    0.4    1.9
HQE           0.6    0.5    2.3     0.5    0.5    2.0

α=1.5, T=200
MLE           47.2   7.5    11.3    47.2   7.5    11.3
AIC           48.3   4.9    10.7    48.1   4.8    9.4
AICE          15.4   4.2    9.0     41.8   3.9    8.8
BIC           5.8    1.5    4.5     5.2    1.2    4.0
BICE          4.6    0.9    3.9     4.0    0.9    3.8
HQ            23.1   2.8    6.3     21.0   2.4    5.5
HQE           15.9   2.3    6.1     14.3   2.1    5.9

α=-0.5, T=100
ML            13.7   8.5    13.4    13.7   8.5    13.4
AICE          13.4   8.2    12.7    17.4   8.0    11.8
BICE          45.5   5.8    8.7     40.9   5.7    8.4
HQE           34.0   6.7    10.2    38.9   6.5    9.8

α=-0.5, T=200
ML            35.2   6.8    13.2    35.2   6.8    13.2
AICE          14.9   7.1    12.4    17.2   6.9    11.8
BICE          45.2   4.9    8.2     43.0   4.7    8.1
HQE           45.3   5.0    9.0     47.9   4.9    8.8
Notes: The results were obtained by estimating 1,000 samples, each of size T. WT = Wald test for the null. WT-q = Wald test for the null adjusted by degrees of freedom. M = P-value of the MannWhitney-Wilcoxon Test. 5%, 10% = Size of the test for the nominal counterpart. ML = Results obtained using the conditional ML estimator. MLE = Results obtained using the unconditional (exact) ML estimator. AIC = Results from EMM using AIC as information criterion and a numerical approximation for the moments (N=2,500). AICE = Results from EMM using AIC as information criterion and the exact moment conditions. BIC = Results from EMM using BIC as information criterion and a numerical approximation for the moments (N=2,500). BICE = Results from EMM using BIC as information criterion and the exact moment conditions. HQ = Results from EMM using HQ as information criterion and a numerical approximation for the moments (N=2,500). HQE = Results from EMM using HQ as information criterion and the exact moment conditions.
Figure 1
Empirical and Theoretical Distributions of the Over-Identifying Restrictions Test: MA(1) Model (α=0.5, T=100)

[Four panels: AIC, BIC, HQ, and a combined AIC/BIC/HQ panel; empirical versus theoretical p-values on the unit interval.]
Notes: The results were obtained by estimating 1,000 samples, each of size T. The x-axis corresponds to the theoretical p-value. The y-axis corresponds to the empirical p-value. In the first three panels, the continuous line corresponds to the test using T, and the dashed line corresponds to the test using (T−q). The fourth panel plots the empirical distributions of the tests using (T−q).
Figure 2
Empirical and Theoretical Distributions of the Wald Test: MA(1) Model (α=0.5, T=100)

[Four panels: ML, AIC, BIC, HQ; each plots an empirical density against the N(0,1) density.]
Notes: The results were obtained by estimating 1,000 samples, each of size T. The continuous line corresponds to the empirical pdf of the tests. The dashed line corresponds to the pdf of a N(0,1). The empirical pdfs were estimated with the Epanechnikov kernel using the bandwidth proposed by Silverman (1986).
Table 6
Properties of the Estimators: ARMA(1,1) Model (δ=-0.8)

                   δ                 α
Estimator     Mean     RMSE     Mean     RMSE     j

α=-0.7, T=100
ML            -0.793   0.064    -0.704   0.085    -
AICE          -0.771   0.125    -0.742   0.170    5.40
BICE          -0.774   0.096    -0.755   0.185    3.48
HQE           -0.775   0.102    -0.754   0.175    4.06

α=-0.7, T=200
ML            -0.797   0.044    -0.702   0.057    -
AICE          -0.784   0.064    -0.724   0.097    6.13
BICE          -0.786   0.058    -0.746   0.132    3.94
HQE           -0.785   0.060    -0.737   0.113    4.72

α=-0.8, T=100
ML            -0.795   0.063    -0.797   0.075    -
AICE          -0.775   0.127    -0.829   0.155    6.42
BICE          -0.779   0.097    -0.843   0.167    3.88
HQE           -0.777   0.105    -0.842   0.155    4.83

α=-0.8, T=200
ML            -0.798   0.043    -0.799   0.049    -
AICE          -0.785   0.068    -0.823   0.093    7.93
BICE          -0.788   0.057    -0.853   0.129    4.69
HQE           -0.787   0.060    -0.840   0.113    5.84

α=-0.9, T=100
ML            -0.797   0.061    -0.880   0.067    -
AICE          -0.777   0.135    -0.893   0.148    7.89
BICE          -0.783   0.097    -0.897   0.139    4.28
HQE           -0.782   0.110    -0.901   0.127    5.67

α=-0.9, T=200
ML            -0.799   0.043    -0.889   0.042    -
AICE          -0.788   0.070    -0.911   0.076    10.42
BICE          -0.791   0.064    -0.924   0.100    5.59
HQE           -0.790   0.062    -0.921   0.086    7.45

α=-1.0, T=100
ML            -0.801   0.061    -0.926   0.099    -
AICE          -0.786   0.135    -0.926   0.237    9.08
BICE          -0.785   0.101    -0.955   0.207    4.52
HQE           -0.786   0.108    -0.941   0.196    6.18

α=-1.0, T=200
ML            -0.803   0.043    -0.946   0.069    -
AICE          -0.791   0.072    -0.960   0.076    12.70
BICE          -0.795   0.059    -0.964   0.105    6.12
HQE           -0.794   0.061    -0.960   0.086    8.50
Notes: The results were obtained by estimating 1,000 samples, each of size T. RMSE = Root Mean Square Error. j = Average lag length of the auxiliary model. ML = Results obtained with the conditional Maximum Likelihood estimator. AICE = Results from EMM using AIC as information criterion and the exact moment conditions. BICE = Results from EMM using BIC as information criterion and the exact moment conditions. HQE = Results from EMM using HQ as information criterion and the exact moment conditions.
References

Chumacero, R. (1997). "Finite Sample Properties of the Efficient Method of Moments." Studies in Nonlinear Dynamics and Econometrics 2, 35-51.

Gallant, R. and G. Tauchen (1996). "Which Moments to Match?" Econometric Theory 12, 657-681.

Gallant, R. and G. Tauchen (1998). "The Relative Efficiency of Method of Moments Estimators." Manuscript. Duke University.

Ghysels, E., L. Khalaf, and C. Vodounou (1994). "Simulation Based Inference in Moving Average Models." Manuscript. CIRANO.

Gouriéroux, C., A. Monfort, and E. Renault (1993). "Indirect Inference." Journal of Applied Econometrics 8, S85-S118.

Hamilton, J. (1994). Time Series Analysis. Princeton University Press.

Hansen, L. P. (1982). "Large Sample Properties of Generalized Method of Moments Estimators." Econometrica 50, 1029-1053.

Michaelides, A. and S. Ng (1997). "Estimating the Rational Expectations Model of Speculative Storage: A Monte Carlo Comparison of Three Simulation Estimators." Manuscript. Boston College.

Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. Chapman & Hall.

Tanaka, K. (1996). Time Series Analysis: Nonstationary and Noninvertible Distribution Theory. John Wiley & Sons.

Tauchen, G. (1996). "New Minimum Chi-Square Methods in Empirical Finance." Manuscript. Duke University.