J. Japan Statist. Soc. Vol. 39 No. 2 2009 239–255

SHRINKAGE GMM ESTIMATION IN CONDITIONAL MOMENT RESTRICTION MODELS

Ryo Okui*

This paper proposes the shrinkage generalized method of moments (GMM) estimator to address the "many moment conditions" problem in the estimation of conditional moment restriction models. This estimator is obtained as the minimizer of a function constructed by modifying the GMM objective function, such that we shrink the effect of a subset of moment conditions that are less important and used only for efficiency. We provide the closed form of the shrinkage parameter that minimizes the asymptotic mean squared error. A simulation study shows encouraging results.

Key words and phrases: GMM, Higher order expansion, Instrumental variables estimation, Many instruments problem, Shrinkage method.

1. Introduction

This paper considers the estimation of conditional moment restriction models. An important practical problem is that a conditional moment restriction model can yield a large number of potential unconditional moment conditions. Even though using many moment conditions is desirable according to conventional first-order asymptotic theory, it has been found that the two-step generalized method of moments (GMM) estimator (Hansen (1982)) has a considerable bias in finite samples in the presence of many moment restrictions. This paper addresses this "many moment conditions" problem by extending the shrinkage method proposed by Okui (2008) to conditional moment restriction models.

The consequences of having many moment conditions have been studied intensively in linear homoskedastic instrumental variable regression models (see, for example, Kunitomo (1980), Morimune (1983), and Bekker (1994)). Donald et al. (2003), Newey and Smith (2004), Han and Phillips (2006) and Newey and Windmeijer (2009) study this problem for GMM estimation in more general moment restriction models. One important finding from the existing literature is that there is a bias-variance trade-off in the number of moment conditions used for estimation. Newey and Smith (2004) show that the bias of the GMM estimator is proportional to the number of moment conditions. To address this "many moment conditions" problem, Donald et al. (2009) propose a procedure to choose the number of moment conditions for the estimation of conditional moment restriction models. This procedure is an extension of Donald and Newey's (2001) procedure. As an alternative, Okui (2008) puts forward shrinkage methods to alleviate the problem of having many moment conditions for linear homoskedastic simultaneous equation models.

We propose the shrinkage generalized method of moments (SGMM) estimator, which extends Okui's (2008) shrinkage method to possibly nonlinear and heteroskedastic models. The SGMM estimator is obtained as the minimizer of the objective function of the GMM estimator modified by: 1) dividing it into two parts, where the first part corresponds to the "main" moment conditions, which guarantee identification, and the second part corresponds to the "supplemental" moment conditions, which are used for efficiency; and 2) shrinking the effect of the second set of moment conditions. The shrinkage parameter is chosen to minimize the asymptotic mean squared error (MSE). We use the Nagar (1959) approximation of the MSE (see also Donald and Newey (2001) and Donald et al. (2009)); it is the MSE of the leading terms in a stochastic expansion of the estimator. We use "many moment conditions asymptotics," under which both the sample size and the number of moment conditions are taken to infinity. It is found that the optimal shrinkage parameter has a closed form, which makes it easy to implement.

It has been observed that shrinkage methods perform better than variable selection approaches in many circumstances (see, for example, Hastie et al. (2001) and Okui (2008)). The key decision involved in selection methods is which instruments to discard. Even though we alleviate the many-instruments problem by doing so, we also ignore the information that the discarded instruments might reveal. On the other hand, shrinkage methods not only mitigate the many-instruments problem, but also enable the use of the information that would be lost by discarding variables. It can then be expected that the SGMM estimator is a good alternative to the moment selection method.

Received August 26, 2009. Revised November 16, 2009. Accepted December 10, 2009.
*Institute of Economic Research, Kyoto University, Yoshida-Hommachi, Sakyo, Kyoto, Kyoto 606-8501, Japan. Email: [email protected]
A limitation of the shrinkage method proposed here is that it requires us to specify the set of "main" moment conditions which are a priori known to be strong. While this requirement may be restrictive in some applications, there are situations in which it is possible to specify the set of "main" moment conditions. For example, Angrist and Krueger (1991) use quarter-of-birth variables and their interactions with year-of-birth or state-of-birth variables as instruments. In this case, the quarter-of-birth variables may be considered as "main" instruments and the interactions may be considered as other instruments. Another example arises in the situation considered by West et al. (2009). Suppose that we consider a (possibly misspecified) linear (in parameters) model for the relationship between the endogenous regressors and instruments. The main moment conditions in this case would be given by the model we specify, and the other moment conditions are generated by other functions of the instruments. We may also consider the situation in which the instruments are some polynomial series. The first few polynomial terms might be the main instruments. We note that selection methods such as those of Donald et al. (2009) typically require a different assumption, that an ordering of instruments is pre-specified, to make them computationally feasible and to justify the method theoretically.

Shrinkage methods have been studied intensively in the statistical literature.


There are a few articles that consider the application of shrinkage methods in moment restriction models, in addition to Okui (2008) on which the shrinkage estimator proposed here is based. Chamberlain and Imbens (2004) propose a procedure, called random effect quasi-maximum likelihood, which could be categorized as a shrinkage method. It is hard to extend their idea into general conditional moment restriction models because it may be difficult to specify the likelihood of these models, particularly in the presence of heteroskedasticity of unknown form. The kernel-weighted GMM in ARMA models by Kuersteiner (2002) is also related to the ideas explored here. This idea is extended by Canay (2009) such that the “main” instruments are fully used but the effects of the other instruments are shrunk according to the ordering of the instruments. We note that Canay (2009) considers linear instrumental variables regression models, but his specification allows heteroskedasticity. Another related paper is Carrasco (2008). Her idea is different from the one considered here. Her approach involves regularization of the inverse of the covariance matrix of the sample moment conditions while our approach is to shrink the effect of additional moment conditions. The shrinkage method considered here may be regarded as a special case of the model averaging method. It turns out that the SGMM estimator minimizes a weighted average of the objective function of the GMM estimator with “main” moment conditions and that of the estimator with all moment conditions. Kuersteiner and Okui (2009) propose the model averaging two-stage least squares estimator for linear instrumental variables regression models based on the model averaging method of Hansen (2007). This paper considers conditional moment restriction models which are more general than those discussed in Kuersteiner and Okui (2009). 
On the other hand, the SGMM estimator combines only two models while Kuersteiner and Okui (2009) allow the number of models to be large.

The remainder of the paper is organized as follows. Section 2 defines the SGMM estimator and explains its implementation. The theoretical results on which the procedure is based are stated in Section 3. Section 4 contains the results of small Monte Carlo experiments conducted to investigate the small sample properties of the SGMM estimator and to compare the SGMM estimator and the moment selection method proposed by Donald et al. (2009). Section 5 concludes the paper and discusses possible future areas of research. All mathematical proofs are given in the supplementary material that is available at the author's webpage. Throughout the paper, $\sum_i$ signifies $\sum_{i=1}^{N}$.

2. Model and procedure

2.1. Model

This paper considers the following conditional moment restriction models:

$$ E(\rho_i(\beta_0) \,|\, x_i) = 0, $$

where ρi(β) ≡ ρ(zi, β) is a scalar-valued function, known up to the p × 1 parameter vector β, zi is a single observation of an i.i.d. sequence (z1, z2, . . .), and


xi is the vector of exogenous variables, which is a part of zi. The moment restriction holds only at the true value β0. Our goal is to estimate β0 with the data {zi}_{i=1}^N. Conditional moment restriction models are widely used in economics and related fields. For example, a linear instrumental variable regression model belongs to this class with ρi(β) = z1i − z2i′β, where z1i is the dependent variable, z2i is a vector of endogenous regressors, and xi is the vector of instrumental variables. An important difference from the linear instrumental variable regression model considered in Okui (2008) is that the residual, ρi(β), may be conditionally heteroskedastic in the current model.

A widely used approach to estimating the parameter β is GMM estimation using unconditional moment restrictions implied by the conditional moment restriction. We write those unconditional moment restrictions as E(qi ρi(β0)) = 0, where qi is a K̄ × 1 vector of functions of xi. The vector qi is often considered a vector of instrumental variables. The GMM estimator proposed by Hansen (1982) is defined as follows:

$$ \hat g(\beta) \equiv \frac{1}{N}\sum_i q_i\,\rho_i(\beta), \qquad \hat\Upsilon(\tilde\beta) \equiv \frac{1}{N}\sum_i \rho_i(\tilde\beta)^2\, q_i q_i', $$

where β̃ is some preliminary consistent estimator of β. The vector ĝ(β) is the sample analog of the vector of the moment conditions and the matrix Υ̂(β̃) is an estimator of the covariance matrix of the moment conditions. Then, the GMM estimator of β is:

$$ \hat\beta = \arg\min_\beta\; \hat g(\beta)'\,\hat\Upsilon(\tilde\beta)^{-1}\,\hat g(\beta). $$
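For the linear instrumental variable case ρi(β) = z1i − z2i′β, the two-step construction above can be sketched as follows (a minimal illustration; the function and variable names, and the simple design used to exercise it, are assumptions for this sketch, not from the paper):

```python
import numpy as np

def two_step_gmm(y, X, Q):
    """Two-step GMM for rho_i(beta) = y_i - X_i'beta with instruments q_i.

    Step 1 uses the weighting matrix (Q'Q/N)^{-1} (the IV/2SLS weight);
    step 2 plugs the step-1 residuals into
    Upsilon-hat(beta-tilde) = (1/N) sum_i rho_i(beta-tilde)^2 q_i q_i'
    and minimizes g-hat(beta)' Upsilon-hat^{-1} g-hat(beta).
    """
    N = len(y)
    QX, Qy = Q.T @ X, Q.T @ y
    W1 = np.linalg.inv(Q.T @ Q / N)                 # first-step weight
    beta_tilde = np.linalg.solve(QX.T @ W1 @ QX, QX.T @ W1 @ Qy)
    rho = y - X @ beta_tilde                        # step-1 residuals
    Ups = (Q * rho[:, None] ** 2).T @ Q / N         # Upsilon-hat(beta-tilde)
    W2 = np.linalg.inv(Ups)
    beta_hat = np.linalg.solve(QX.T @ W2 @ QX, QX.T @ W2 @ Qy)
    return beta_tilde, beta_hat
```

The 1/N factors cancel in the minimizer, so only the residual-weighted cross-products matter.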

The GMM estimator is the value of β which makes the vector ĝ(β) closest to 0 in the quadratic distance with the weighting matrix Υ̂(β̃). Choosing Υ̂(β̃) as a weighting matrix gives the smallest (first-order) asymptotic variance, provided that the number of moment conditions is fixed.

We consider the situation where the dimension of qi, K̄, is large. Let σi² = E(ρi(β0)²|xi) and di = E(∂ρi(β0)/∂β|xi). In the current model, the asymptotic variance of a √N-consistent regular estimator cannot be smaller than E(σi^{−2} di di′)^{−1} (Chamberlain (1987)). It can be achieved by the GMM estimator if di/σi² can be written as a linear combination of the instruments qi. Likewise, if there is a linear combination of the instruments which is close to di/σi², then the asymptotic variance of the GMM estimator is small. This observation implies that conventional first-order asymptotic theory favors using all moment conditions to estimate the parameter, as the asymptotic variance of the estimator decreases with the number of moment conditions. However, even though this approach results in a small standard error, it has been observed that the GMM estimator is very sensitive to the number of moment conditions and has


considerable bias in the presence of many moment conditions (Newey and Smith (2004)). Furthermore, if a set of instruments can approximate di/σi² well, then adding more instruments is not helpful to reduce the asymptotic variance since it cannot be smaller than E(σi^{−2} di di′)^{−1}. It is, therefore, important to consider how to handle a large number of instruments. This paper proposes the SGMM estimator to overcome this problem. This is an extension of the shrinkage estimator proposed in Okui (2008) to more general situations that allow nonlinearity and heteroskedasticity.

2.2. The SGMM estimator

This section introduces the shrinkage generalized method of moments (SGMM) estimator. Suppose we decide that a small subset of the available moment conditions is more important for estimation than the remaining moment conditions. We may call the former subset the "main moment conditions" and the latter the "supplemental moment conditions." That is, we write qi = (q1,i′, q2,i′)′, where q1,i is the m × 1 vector of "main instruments" and q2,i is the K × 1 vector of supplemental instruments used for efficiency. Note that K̄ = m + K. This situation is similar to those of Chamberlain and Imbens (2004) and Okui (2008). We believe that the division of moment conditions is clear in many economic applications, as discussed in Section 1. We note that dividing the set of moment conditions arbitrarily does not cause a problem for the theoretical results presented in the next section to hold. However, it is better to choose instruments that are reasonably strong as main instruments. In the Monte Carlo simulations reported in Section 4, we discuss the effect of an inappropriate choice of main moment conditions.

To describe the SGMM estimator, we first divide the objective function according to the division of the instruments. To this end, define

$$ \hat g_1(\beta) \equiv \frac{1}{N}\sum_i q_{1,i}\,\rho_i(\beta), \qquad \hat g_2(\beta) \equiv \frac{1}{N}\sum_i q_{2,i}\,\rho_i(\beta), $$

and

$$ \hat\Upsilon_{11}(\tilde\beta) \equiv \frac{1}{N}\sum_i \rho_i(\tilde\beta)^2\, q_{1,i}q_{1,i}', \qquad \hat\Upsilon_{12}(\tilde\beta) \equiv \frac{1}{N}\sum_i \rho_i(\tilde\beta)^2\, q_{1,i}q_{2,i}', $$
$$ \hat\Upsilon_{21}(\tilde\beta) \equiv \frac{1}{N}\sum_i \rho_i(\tilde\beta)^2\, q_{2,i}q_{1,i}', \qquad \hat\Upsilon_{22}(\tilde\beta) \equiv \frac{1}{N}\sum_i \rho_i(\tilde\beta)^2\, q_{2,i}q_{2,i}'. $$

Also, define

$$ \tilde g_2(\beta) \equiv \hat g_2(\beta) - \hat\Upsilon_{21}(\tilde\beta)\,\hat\Upsilon_{11}(\tilde\beta)^{-1}\,\hat g_1(\beta), \qquad \tilde\Upsilon_{22}(\tilde\beta) \equiv \hat\Upsilon_{22}(\tilde\beta) - \hat\Upsilon_{21}(\tilde\beta)\,\hat\Upsilon_{11}(\tilde\beta)^{-1}\,\hat\Upsilon_{12}(\tilde\beta). $$

With these new notations, the GMM estimator can be rewritten as:

$$ \hat\beta = \arg\min_\beta\; \Big(\hat g_1(\beta)'\,\hat\Upsilon_{11}(\tilde\beta)^{-1}\,\hat g_1(\beta) + \tilde g_2(\beta)'\,\tilde\Upsilon_{22}(\tilde\beta)^{-1}\,\tilde g_2(\beta)\Big). $$
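The rewriting above can be checked numerically: the sketch below builds ĝ1, g̃2, Υ̂11 and Υ̃22 from a residual vector, and the block-inverse identity guarantees that the two quadratic forms add up to the original GMM objective (function and variable names are illustrative assumptions):

```python
import numpy as np

def split_objective(rho, Q1, Q2):
    """Return (g1, g2_tilde, U11, U22_tilde) so that the GMM objective
    g' U^{-1} g equals  g1' U11^{-1} g1 + g2_tilde' U22_tilde^{-1} g2_tilde.
    Here rho holds rho_i evaluated at a preliminary estimate."""
    N = len(rho)
    r2 = rho[:, None] ** 2
    g1, g2 = Q1.T @ rho / N, Q2.T @ rho / N
    U11 = (Q1 * r2).T @ Q1 / N
    U12 = (Q1 * r2).T @ Q2 / N
    U22 = (Q2 * r2).T @ Q2 / N
    U11_inv = np.linalg.inv(U11)
    g2_t = g2 - U12.T @ U11_inv @ g1      # residualized supplemental moments
    U22_t = U22 - U12.T @ U11_inv @ U12   # Schur complement of U11
    return g1, g2_t, U11, U22_t
```

Stacking Q = (Q1, Q2) and forming the full Υ̂ accordingly, ĝ′Υ̂^{-1}ĝ reproduces the sum of the two terms, which is exactly the identity used in the text.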


The first and second terms in the objective function correspond to the main moment conditions (those created by q1,i) and the supplemental moment conditions (those created by q2,i), respectively. Even though using only q1,i as instruments would give a consistent estimate, we lose efficiency since we use only a small subset of the moment conditions. However, adding q2,i to the set of instruments might result in a large bias. Our procedure introduces a shrinkage parameter to weigh down the effect of having supplemental moment conditions. This can be achieved by putting the shrinkage parameter in front of the second term in the objective function. The SGMM estimator, β̂s, is defined as:

$$ \hat\beta_s = \arg\min_\beta\; \Big(\hat g_1(\beta)'\,\hat\Upsilon_{11}(\tilde\beta)^{-1}\,\hat g_1(\beta) + s\,\tilde g_2(\beta)'\,\tilde\Upsilon_{22}(\tilde\beta)^{-1}\,\tilde g_2(\beta)\Big), \tag{2.1} $$

for some shrinkage parameter s, 0 ≤ s ≤ 1. Note that the choice s = 1 leads to the GMM estimator using all the moment conditions. Likewise, setting s = 0 yields the GMM estimator using only the main moment conditions. The SGMM estimator is also understood to be an extension of the model averaging estimator of Kuersteiner and Okui (2009). The SGMM estimator can be written as:

$$ \hat\beta_s = \arg\min_\beta\; \Big((1-s)\,\hat g_1(\beta)'\,\hat\Upsilon_{11}(\tilde\beta)^{-1}\,\hat g_1(\beta) + s\,\hat g(\beta)'\,\hat\Upsilon(\tilde\beta)^{-1}\,\hat g(\beta)\Big). $$

The SGMM estimator minimizes a weighted average of the objective function of the GMM estimator with main moment conditions and that of the GMM estimator with all moment conditions.

2.3. Choice of shrinkage parameter

To implement the SGMM estimator in practice, a method is required to choose the shrinkage parameter. We propose to choose the shrinkage parameter by minimizing the Nagar (1959) approximation of the MSE. The definition of the approximation and the theoretical results are stated in the next section. This section focuses on the procedure itself. The choice of the shrinkage parameter this paper recommends is the following:

$$ \hat s^* = \frac{\hat\tau'\hat\Omega_2\hat\tau}{\hat\Pi_2^2/N + \hat\tau'\hat\Omega_2\hat\tau}, \tag{2.2} $$

where t is a (possibly estimated) weighting vector, and

$$ \hat\tau \equiv \hat\Omega^{-1}t, \qquad \hat\Omega \equiv \hat\Gamma_0'\,\hat\Upsilon(\tilde\beta)^{-1}\,\hat\Gamma_0, \qquad \hat\Gamma_0 \equiv \frac{1}{N}\sum_i q_i\,\frac{\partial}{\partial\beta}\rho_i(\tilde\beta), $$
$$ \hat\Omega_2 \equiv \hat\Gamma_{0,2}'\,\tilde\Upsilon_{22}(\tilde\beta)^{-1}\,\hat\Gamma_{0,2}, \qquad \hat\Gamma_{0,2} \equiv \frac{1}{N}\sum_i \hat q_{i,2}\,\frac{\partial}{\partial\beta}\rho_i(\tilde\beta), \qquad \hat q_{i,2} \equiv q_{i,2} - \hat\Upsilon_{21}(\tilde\beta)\,\hat\Upsilon_{11}(\tilde\beta)^{-1}\,q_{i,1}, $$
$$ \hat\Pi_2 \equiv \sum_i \rho_i(\tilde\beta)\,\hat\tau'\hat\eta_i\,\hat\xi_{ii,2}, \qquad \hat\eta_i \equiv \frac{\partial}{\partial\beta}\rho_i(\tilde\beta) - \Big(\sum_i \frac{\partial}{\partial\beta}\rho_i(\tilde\beta)\,q_i'\Big)\Big(\sum_i q_i q_i'\Big)^{-1} q_i, \qquad \hat\xi_{ii,2} \equiv \frac{1}{N}\,\hat q_{i,2}'\,\tilde\Upsilon_{22}(\tilde\beta)^{-1}\,\hat q_{i,2}. $$
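For the linear IV case with scalar β (so ∂ρi/∂β = −Xi and t = 1), the quantities entering (2.2) can be assembled as follows. This is a sketch under those simplifying assumptions, not a general implementation, and all names are illustrative:

```python
import numpy as np

def shrinkage_parameter(y, X, Q1, Q2, beta_tilde):
    """Estimate s-hat* of (2.2) for rho_i(b) = y_i - x_i b (scalar b, t = 1).

    Pi2-hat estimates the bias from adding Q2; tau'Omega2-hat tau the
    variance reduction. A larger bias pushes s-hat* toward 0, a larger
    variance gain pushes it toward 1.
    """
    N = len(y)
    x = X.ravel()
    Q = np.hstack([Q1, Q2])
    rho = y - x * beta_tilde
    r2 = rho[:, None] ** 2
    U = (Q * r2).T @ Q / N
    U11 = (Q1 * r2).T @ Q1 / N
    U12 = (Q1 * r2).T @ Q2 / N
    U22 = (Q2 * r2).T @ Q2 / N
    U11_inv = np.linalg.inv(U11)
    Q2t = Q2 - Q1 @ U11_inv @ U12                 # q-hat_{i,2} rows
    U22t = U22 - U12.T @ U11_inv @ U12            # Upsilon22-tilde
    drho = -x                                     # d rho_i / d beta
    G0 = Q.T @ drho / N                           # Gamma0-hat
    Omega = G0 @ np.linalg.solve(U, G0)
    tau = 1.0 / Omega                             # Omega-hat^{-1} t with t = 1
    G02 = Q2t.T @ drho / N                        # Gamma0,2-hat
    Omega2 = G02 @ np.linalg.solve(U22t, G02)
    # eta-hat_i: residual of d rho_i / d beta from its projection on q_i
    eta = drho - Q @ np.linalg.lstsq(Q, drho, rcond=None)[0]
    xi = np.einsum('ij,ij->i', Q2t @ np.linalg.inv(U22t), Q2t) / N
    Pi2 = np.sum(rho * tau * eta * xi)            # Pi2-hat
    var_gain = tau * Omega2 * tau
    return var_gain / (Pi2 ** 2 / N + var_gain)
```

Because both the numerator and the bias term are non-negative, the returned value always lies in [0, 1], matching the natural range of the shrinkage parameter.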

The term Π̂2 is an estimate of the bias of the GMM estimator caused by adding q2,i to the set of instruments. The matrix Ω̂2 is an estimate of the amount of the variance reduction as a result of including q2,i in the instrument set. We recommend ŝ* because it estimates the shrinkage parameter that minimizes the Nagar approximation of the MSE of a linear combination of the estimator, t′β̂s, as explained in the following section.

Let us summarize the procedure. The first step is a preliminary estimation and can be done with any estimator satisfying Assumption 4 (stated in the next section), such as the GMM estimator with a fixed number of instruments or the IV estimator of Donald et al. (2003). (The IV estimator is a GMM estimator using the matrix Σi qi qi′/N as a weighting matrix. In linear models, it is equivalent to the two-stage least squares estimator.) The second step is to compute the weighting matrix and to estimate the shrinkage parameter by the formula (2.2). The SGMM estimator is obtained by minimizing the objective function in (2.1) with the estimated shrinkage parameter.

3. Theoretical results

This section provides the theoretical results that justify the use of the SGMM estimator and the choice of the shrinkage parameter explained in the previous section. We make several assumptions that are similar to those imposed in Donald et al. (2009).

Assumption 1. (a) z is i.i.d. (b) β0 satisfies E[ρ(z, β)|x] = 0 uniquely in Θ (a compact subset of R^p). (c) Σi σi^{−2} di di′/N is uniformly positive definite over N and finite with probability one. (d) σi² is bounded and bounded away from zero. (e) Let ηi ≡ (∂/∂β)ρi(β0) and ρi ≡ ρi(β0). E(ηi^k ρi^l | xi) = 0 for any non-negative integers k and l such that k + l = 3. (f) E(‖ηi‖^k ρi^l | xi) is bounded for any non-negative integers k and l such that k + l = 6.

Assumptions 1(b) and 1(c) identify the parameter β. The third moment conditions imposed in Assumption 1(e) are used to simplify the MSE formula.
They are often employed in the literature on higher order expansions (see, for example, Donald and Newey (2001), Hahn et al. (2001), and Donald et al. (2009)).

Assumption 2. ρ(·) is at least four times continuously differentiable in a neighborhood of β0, with derivatives that are all dominated in absolute value by a random variable bi with E(bi²) < ∞.

Taylor series expansions are used in order to calculate the asymptotic MSE. This assumption is needed in order to guarantee the existence of the expansion


and to control the remainder terms. The following assumption imposes restrictions on the instrumental variables.

Assumption 3. (a) There is ζ(K) such that, for each K, there is a nonsingular constant matrix B such that q̃(x) = Bq(x) for all x in the support of x, sup_{x∈X} ‖q̃(x)‖ < ζ(K), the smallest eigenvalue of E(q̃(x)q̃(x)′) is bounded away from zero, and √K ≤ ζ(K) ≤ CK for some finite constant C. (b) For each K there exist π and π* such that E(‖d(xi) − q(xi)′π‖²) → 0 and ζ(K)² E(‖d(xi)/σi² − q(xi)′π*‖²) → 0 as K → ∞.

We note that while the dimension of q is K̄ = m + K, the assumption is given in terms of K because m is assumed to be fixed. Assumption 3(a) is important in order to bound remainder terms in the stochastic expansion. With Assumption 3(b), the asymptotic variance of the SGMM estimator (and the GMM estimator) is E(σi^{−2} di di′)^{−1}. The next set of assumptions is about the preliminary estimate β̃.

Assumption 4. (a) β̃ →p β0. (b) We can write β̃ = β0 + Σi φi ρi /N + op(1/√N), where E(φi ρi | xi) = φi E(ρi | xi) = 0 and E(ρi² φi φi′) < ∞.

The estimator β̃ is required to be √N-consistent. However, it is not required to be efficient. This assumption is satisfied for many common choices of preliminary estimates, such as the IV estimator of Donald et al. (2003). The first theorem derives the consistency and asymptotic normality of the SGMM estimator.

Theorem 1. Suppose that Assumptions 1-4 are satisfied, and that N → ∞, K → ∞, ζ(K)²K/N → 0 and s →p 1. Then, β̂s →p β0 and √N(β̂s − β0) →d N(0, E(σi^{−2} di di′)^{−1}).

The proof of this theorem is essentially the same as that of Theorems 5.3 and 5.4 in Donald et al. (2003). This result justifies the use of the SGMM estimator. The SGMM estimator has the same properties as those of the GMM estimator in the first-order asymptotics. It is therefore semi-parametrically efficient. Unfortunately, this result also indicates that the first-order asymptotic theory does not provide useful information to choose the shrinkage parameter.

As in Okui (2008), we propose to choose the shrinkage parameter to minimize the Nagar approximation of the MSE (c.f. Nagar (1959) and Donald and Newey (2001)). We approximate the MSE of a linear combination of the estimator, E(N t′(β̂ − β0)(β̂ − β0)′t), by t′Ω*^{−1}t + S(s), where

$$ N\,t'(\hat\beta - \beta_0)(\hat\beta - \beta_0)'t = \hat Q(s) + \hat R(s), \qquad E(\hat Q(s)\,|\,X) = t'\Omega^{*-1}t + S(s) + T(s), \tag{3.1} $$

[R̂(s) + T(s)]/S(s) = op(1) as K → ∞, N → ∞, and Ω* ≡ Σ_{i=1}^N σi^{−2} di di′/N. The Nagar approximation involves two layers of approximation. The first layer concerns the stochastic expansion, i.e., we divide N t′(β̂ − β0)(β̂ − β0)′t into two


parts, Q̂(s) and R̂(s), and discard R̂(s), which goes to zero faster than S(s). The second layer is the approximation of the mean of the leading term, i.e., we ignore the term T(s), which goes to zero faster than S(s) when we evaluate the conditional mean of Q̂(s) given the exogenous variable x. The term t′Ω*^{−1}t corresponds to the first-order asymptotic variance. Hence, S(s) is the nontrivial and dominant term of the approximate MSE of the estimator β̂. Our goal is to find S(s).

While a common approach to choosing the shrinkage parameter in the statistical literature is an exact finite sample approach, this paper uses the Nagar approximation because there are several disadvantages in following an exact finite sample approach for the current model, as argued in Okui (2008). First, finite sample approaches have to rely on some distributional assumption. Second, they usually provide a complicated result, which makes the choice of the shrinkage parameter difficult in practice. We believe that the Nagar approximation of the MSE is the most practical way to choose the shrinkage parameter among those with theoretical justification. Moreover, since the moment selection method by Donald et al. (2009) also uses the Nagar approximation of the MSE to choose the number of moment conditions, the comparison between the selection method and the SGMM estimation should be easier.

The next theorem derives the form of S(s). To state the theorem, we introduce the following notation. Let Υ ≡ (1/N)Σi σi² qi qi′ with blocks

$$ \Upsilon_{11} \equiv \frac{1}{N}\sum_i \sigma_i^2\, q_{i,1}q_{i,1}', \qquad \Upsilon_{12} \equiv \frac{1}{N}\sum_i \sigma_i^2\, q_{i,1}q_{i,2}', \qquad \Upsilon_{22} \equiv \frac{1}{N}\sum_i \sigma_i^2\, q_{i,2}q_{i,2}', $$

and define

$$ q_{i,2}^{*} \equiv q_{i,2} - \Upsilon_{21}\Upsilon_{11}^{-1}q_{i,1}, \qquad \Upsilon_{22}^{*} \equiv \frac{1}{N}\sum_i \sigma_i^2\, q_{i,2}^{*}q_{i,2}^{*\prime}, \qquad \xi_{ii,2} \equiv \frac{1}{N}\, q_{i,2}^{*\prime}\,\Upsilon_{22}^{*-1}\, q_{i,2}^{*}, $$
$$ \Gamma_0 \equiv \frac{1}{N}\sum_i q_i\, E\Big(\frac{\partial}{\partial\beta}\rho_i(\beta_0)\,\Big|\,x_i\Big), \qquad \Gamma_{0,2} \equiv \frac{1}{N}\sum_i q_{i,2}^{*}\, E\Big(\frac{\partial}{\partial\beta}\rho_i(\beta_0)\,\Big|\,x_i\Big), $$
$$ \tau \equiv \Omega^{*-1}t, \qquad \Omega \equiv \Gamma_0'\,\Upsilon^{-1}\,\Gamma_0, \qquad \Omega_2 \equiv \Gamma_{0,2}'\,\Upsilon_{22}^{*-1}\,\Gamma_{0,2}, \qquad \Pi_2 \equiv \sum_i E(\rho_i\,\tau'\eta_i\,|\,x_i)\,\xi_{ii,2}. $$

We also impose an additional assumption: |Π2| > cK for some constant c. It requires that E(ηiρi|xi) ≠ 0. It guarantees that the MSE formula has the term for the square of the bias, the order of which is K²/N. If E(ηiρi|xi) = 0, then the bias term disappears and the formula of the MSE presented later is not appropriate. An example where this assumption is violated is the standard linear regression model with conditionally heteroskedastic errors studied by Cragg (1983). On the other hand, this assumption is satisfied in typical instrumental variables regression models.

Theorem 2. Suppose that Assumptions 1-4 are satisfied, N → ∞, K → ∞, ζ(K)²K/N → 0, and 1 − s = Op(K²/N). Assume that |Π2| > cK for some constant c with probability one. Then, for β̂s, the decomposition given by (3.1)


holds with:

$$ S(s) = \frac{s^2\Pi_2^2}{N} + \tau'\big(\Omega^{*} - \Omega + (1-s)^2\,\Omega_2\big)\tau. $$

The proof is similar to that of Proposition 1 in Donald et al. (2009). The first term on the right-hand side represents the square of the bias, whose order is proportional to the number of moment conditions. The bias is large if E(ηiρi|xi) is large. In linear models, E(ηiρi|xi) corresponds to the correlation between the endogenous variables and the error term. It can be seen that introducing the shrinkage parameter mitigates the bias of the GMM estimator. The second term on the right-hand side corresponds to the second-order variance term, which is a decreasing function of the shrinkage parameter. The term Ω2 corresponds to the amount of the variance reduction from including q2,i in the set of instruments. This theorem is a generalization of the results in Okui (2008). In linear homoskedastic instrumental variable regression models, the formula is the same as in Theorem 2 of Okui (2008). Note that the result of Proposition 1 in Donald et al. (2009) can be obtained by setting s = 1. The optimal shrinkage parameter is obtained by minimizing the asymptotic MSE with respect to s and has the closed form:

$$ s^* = \frac{\tau'\Omega_2\tau}{\Pi_2^2/N + \tau'\Omega_2\tau}. $$
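The closed form can be checked against a direct minimization of S(s) with made-up scalar values for Π2, N and the two τ-quadratic forms (these numbers are illustrative assumptions only):

```python
import numpy as np

# Illustrative values: Pi2 = 4, N = 200, tau'Omega2 tau = 0.05,
# and tau'(Omega* - Omega)tau = 1.3, a constant that does not involve s.
Pi2, N = 4.0, 200.0
t_O2_t = 0.05
const = 1.3

def S(s):
    """S(s) = s^2 Pi2^2 / N + tau'(Omega* - Omega + (1-s)^2 Omega2) tau."""
    return s ** 2 * Pi2 ** 2 / N + const + (1.0 - s) ** 2 * t_O2_t

s_star = t_O2_t / (Pi2 ** 2 / N + t_O2_t)   # closed-form minimizer
grid = np.linspace(0.0, 1.0, 100001)
# the grid minimizer agrees with the closed form
assert abs(grid[np.argmin(S(grid))] - s_star) < 1e-4
```

Setting the derivative 2sΠ2²/N − 2(1 − s)τ′Ω2τ to zero reproduces the formula, so the check is exact up to the grid spacing.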

We observe that 0 ≤ s* ≤ 1, which means that the optimal shrinkage parameter always lies in the natural parameter region of the shrinkage parameter. The form of the optimal shrinkage parameter has an interesting interpretation. Adding q2,i to the set of instruments increases the squared bias by Π2²/N and reduces the variance by τ′Ω2τ. Therefore, if the bias caused by adding q2,i is large, which happens when there is strong endogeneity, then the optimal shrinkage parameter is close to 0, and, if the variance gain is large, which happens when q2,i is a set of strong instruments, then it is close to 1. Note that, if β is unidimensional, the choice of t does not affect the optimal shrinkage parameter.

4. Monte Carlo experiments

This section reports the results of the Monte Carlo experiments. The Monte Carlo simulations are conducted with Ox 5.1 (Doornik (2007)) for Linux. The aim of these experiments is to show how the SGMM estimator behaves in finite samples and to compare the SGMM estimator with the moment selection procedure proposed by Donald et al. (2009).

4.1. Design

Our data generating process is the following model:

$$ y_i = \delta Y_i + \epsilon_i, \qquad Y_i = \pi Z_i + u_i, $$


for i = 1, . . . , n, where δ is a scalar parameter of interest, π is a scalar parameter that represents the strength of the instruments, Zi ~ i.i.d. N(0, 1), and

$$ \begin{pmatrix}\epsilon_i \\ u_i\end{pmatrix} = v_i\,\sigma(Z_i), \qquad v_i \sim \text{i.i.d. } N\left(\begin{pmatrix}0\\0\end{pmatrix},\ \begin{pmatrix}1 & c\\ c & 1\end{pmatrix}\right). $$
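One draw from this design can be sketched as follows (shown for the homoskedastic specification σ(Z) = 1; the function name and defaults are illustrative assumptions):

```python
import numpy as np

def draw_design(n, c, pi=0.2, seed=None):
    """One sample from: y = delta*Y + eps, Y = pi*Z + u, with delta = 1 and
    (eps_i, u_i)' = v_i * sigma(Z_i), v_i ~ N(0, [[1, c], [c, 1]]).
    sigma(Z) = 1 here (the homoskedastic specification).
    Returns y, Y and the instrument matrix (1, Z, Z^2, ..., Z^9)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=n)
    v = rng.multivariate_normal([0.0, 0.0], [[1.0, c], [c, 1.0]], size=n)
    eps, u = v[:, 0] * 1.0, v[:, 1] * 1.0      # sigma(Z) = 1
    Y = pi * Z + u
    y = 1.0 * Y + eps                          # true delta = 1
    Q = np.vander(Z, 10, increasing=True)      # columns 1, Z, ..., Z^9
    return y, Y, Q
```

Swapping in the heteroskedastic σ(·) only changes the line that scales `v` by `sigma(Z)`.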

The function σ(·) is specified below. The vector of instruments is (1, Zi, Zi², . . . , Zi⁹); there are ten instrumental variables. We fix the true value of δ at δ = 1, and we examine how well each estimator estimates δ. The strength of the instrument Zi is determined by the value of π, and we set π = 0.2. (We also consider π = 0.5, but the results are qualitatively similar.) In this framework, each experiment is characterized by a vector (n, c, σ(Z)). We use n = 100 and 500. The degree of endogeneity is summarized in c, and we set c = 0.1, 0.5, and 0.9. The number of replications is 5000. Two different functional forms of σ(·) are considered. The first specification is σ(Z) = 1 (homoskedasticity). The second is σ(Z) = √(0.7 + 0.2Z + 0.3Z²) (heteroskedasticity). This functional form of σ(Z) is used in Cragg (1983). The functional form of σ(·) affects the strength of each instrument. For example, if the errors are homoskedastic, then using only Zi as an instrument achieves efficiency; other variables are not useful to reduce the asymptotic variance of the estimator. On the other hand, if the errors are highly heteroskedastic, employing many instrumental variables may be needed to achieve a small asymptotic variance. We compare the following four estimators: the instrumental variable estimator using only Zi (IV); the GMM estimator with all the instruments (GMM); the GMM estimator with the number of instruments selected by the procedure of Donald et al. (2009) (DIN); and the SGMM estimator proposed in the current paper (SGMM). For DIN, we select a set of instruments among ten sets: (Zi), (1, Zi), (1, Zi, Zi²), . . . , (1, Zi, Zi², . . . , Zi⁹). Among the instruments, the variable Zi is treated as the "main" instrument (q1,i in the previous section) for SGMM. We also examine the effect of a "wrong" order of instruments or a "wrong" choice of main moment conditions. DIN-W is the GMM estimator with the number of instruments selected by the procedure of Donald et al. 
(2009), but the order of instruments is reversed so that it is Zi9 , Zi8 , . . . , 1, Zi . SGMM-W is the SGMM estimator whose choice of the main instrument is Zi9 . To implement GMM, DIN, SGMM, DIN-W and SGMM-W, a preliminary estimate is needed in order to compute the weighting matrix, the selection criteria, and/or the shrinkage parameter. We use the estimator “IV” for this purpose. 4.2. Results For each estimator, we report the median bias (Median bias) and the median absolute deviation (MAD). We use these “robust” measures because of concerns about the existence of moments of estimators. In fact, we often obtain very large MSEs for IV and DIN. A disadvantage of using these robust measures is that the relationship between the theoretical results and the simulation results becomes less clear. To overcome this issue at least partially, we also compute the root

mean truncated squared error (RMTSE), √(E[min((δ̂ − δ)², 2)]), for each estimator δ̂. We note that a similar measure is used in Chamberlain and Imbens (2004). This measure is always finite and should be closely related to the MSE. The coverage rate of the (Wald) 95% confidence interval based on the estimate (Cov. Rate) is also computed.

Table 1. Monte Carlo simulations: σ(Z) = 1.

                    IV      GMM     DIN     SGMM    DIN-W   SGMM-W
c = 0.1, N = 100
  Median bias     0.006   0.0538  0.0388  0.0406  0.0605  0.0612
  RMTSE           0.644   0.415   0.587   0.466   0.52    0.437
  MAD             0.366   0.242   0.299   0.273   0.275   0.249
  Cov. Rate       0.992   0.613   0.696   0.81    0.572   0.804
c = 0.1, N = 500
  Median bias     0.004   0.0253  0.0238  0.0176  0.0323  0.0259
  RMTSE           0.248   0.212   0.239   0.225   0.246   0.215
  MAD             0.151   0.137   0.149   0.142   0.147   0.137
  Cov. Rate       0.966   0.81    0.805   0.874   0.788   0.882
c = 0.5, N = 100
  Median bias     0.0247  0.247   0.169   0.164   0.271   0.24
  RMTSE           0.636   0.447   0.623   0.468   0.565   0.475
  MAD             0.352   0.313   0.378   0.298   0.362   0.317
  Cov. Rate       0.955   0.482   0.597   0.729   0.467   0.715
c = 0.5, N = 500
  Median bias     0.004   0.126   0.0638  0.0631  0.142   0.129
  RMTSE           0.257   0.228   0.274   0.234   0.301   0.237
  MAD             0.15    0.162   0.176   0.155   0.189   0.169
  Cov. Rate       0.952   0.714   0.757   0.846   0.676   0.828
c = 0.9, N = 100
  Median bias     0.0312  0.49    0.166   0.261   0.459   0.446
  RMTSE           0.63    0.556   0.644   0.445   0.638   0.562
  MAD             0.324   0.498   0.4     0.316   0.493   0.467
  Cov. Rate       0.885   0.182   0.654   0.734   0.313   0.709
c = 0.9, N = 500
  Median bias     0.004   0.235   0.0348  0.0754  0.238   0.233
  RMTSE           0.278   0.272   0.284   0.231   0.368   0.281
  MAD             0.148   0.24    0.164   0.154   0.261   0.245
  Cov. Rate       0.933   0.477   0.829   0.861   0.546   0.819

Tables 1-2 summarize the results of the experiments. The results in terms of

SHRINKAGE GMM ESTIMATION Table 2. Monte Carlo simulations: σ(Z) =

IV

GMM

DIN

median bias RMTSE. MAD Cov. Rate

0.008 0.758 0.454 0.992

0.0592 0.458 0.26 0.598

median bias RMTSE. MAD Cov. Rate

0.010 0.334 0.195 0.976

median bias RMTSE. MAD Cov. Rate



251

0.7 + 0.2Z + 0.3Z 2 .

SGMM

DIN-W

SGMM-W

0.0501 0.681 0.339 0.709

0.048 0.517 0.303 0.815

0.0644 0.584 0.307 0.56

0.0623 0.486 0.27 0.804

0.0304 0.243 0.153 0.799

0.0288 0.293 0.167 0.804

0.0197 0.267 0.161 0.873

0.0427 0.328 0.17 0.773

0.0296 0.247 0.153 0.876

0.0574 0.744 0.441 0.955

0.29 0.497 0.343 0.45

0.206 0.698 0.425 0.601

0.213 0.52 0.334 0.721

0.325 0.629 0.409 0.441

0.299 0.53 0.363 0.712

median bias RMTSE. MAD Cov. Rate

0.010 0.347 0.193 0.95

0.152 0.262 0.188 0.689

0.0987 0.339 0.213 0.741

0.0889 0.274 0.178 0.831

0.195 0.42 0.243 0.635

0.163 0.273 0.196 0.812

median bias RMTSE. MAD Cov. Rate

0.0767 0.721 0.395 0.867

0.558 0.625 0.564 0.141

0.252 0.723 0.488 0.626

0.355 0.523 0.388 0.711

0.574 0.736 0.601 0.248

0.558 0.654 0.568 0.681

median bias RMTSE. MAD Cov. Rate

0.0102 0.377 0.186 0.919

0.303 0.328 0.305 0.401

0.0765 0.37 0.215 0.801

0.115 0.265 0.186 0.848

0.375 0.568 0.399 0.408

0.327 0.361 0.331 0.805

c = 0.1 N = 100

N = 500

c = 0.5 N = 100

N = 500

c = 0.9 N = 100

N = 500

the relative performances of the different estimators are very similar across the different specifications of the functional form of σ(·). The relative performance of GMM, DIN and SGMM based on RMTSE is similar to that based on MAD. However, IV tends to have a large RMTSE even when its MAD is small. This result seems to be caused by the fact that IV does not have any moment. We first examine the performances of GMM and IV. If endogeneity is small (c = 0.1), GMM performs very well and often is best among the estimators. However if c = 0.5 or 0.9, then GMM works poorly because of its large bias, and


it is outperformed by IV in terms of MAD. This result illustrates the "many moment conditions" problem. The confidence intervals based on IV give good coverage rates; on the other hand, those based on GMM perform poorly. This phenomenon is also a consequence of using many moment conditions.

Next, we examine the performances of DIN and SGMM. The performance of the SGMM estimator is remarkable: SGMM has a small MAD in the cases with c = 0.5 and c = 0.9. If c = 0.1, GMM outperforms SGMM, yet SGMM still works better than DIN by any measure. The performance of DIN is not satisfactory in the current design. Even though selecting the number of moments improves the precision of the estimator in the cases with c = 0.9, DIN is outperformed by IV and SGMM in those cases. If c = 0.1 or 0.5, GMM has a smaller MAD than DIN, while the bias of GMM is often larger than that of DIN. In all cases, applying the selection method or the shrinkage method leads to improvements in coverage probability. Between the two, the confidence intervals based on SGMM have the better coverage rates.

Lastly, we examine the performance of DIN-W and SGMM-W. When c = 0.1, their performances are comparable to those of the other estimators, probably because the strength of the instruments is not important when the degree of endogeneity is small. However, when c = 0.5 or c = 0.9, DIN-W is outperformed by DIN and SGMM-W is outperformed by SGMM. There are also cases in which they perform worse than GMM does. This result illustrates the importance of choosing the main moment conditions, or the order of the instruments, appropriately when implementing the shrinkage or selection methods. We observe that SGMM-W performs better than DIN-W does, which indicates that the SGMM estimator is relatively robust to the a priori information about the strength of the instruments.
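The robust performance measures reported in Tables 1–2 are straightforward to compute from Monte Carlo output. The following Python sketch is illustrative only: the function name and inputs are ours, not the paper's, and crit = 1.96 is assumed to be the critical value behind the Wald 95% interval.

```python
import math
from statistics import median

def performance_measures(est, se, delta0, trunc=2.0, crit=1.96):
    """Median bias, MAD, RMTSE and coverage rate from Monte Carlo output.

    est    -- point estimates over the replications
    se     -- the corresponding standard errors
    delta0 -- the true parameter value
    trunc  -- truncation point of the squared error (2 in the paper)
    crit   -- critical value of the Wald 95% confidence interval
    """
    err = [e - delta0 for e in est]
    return {
        "Median bias": median(err),
        # median absolute deviation from the true value
        "MAD": median(abs(e) for e in err),
        # RMTSE = sqrt(E[min((estimate - truth)^2, trunc)])
        "RMTSE": math.sqrt(sum(min(e * e, trunc) for e in err) / len(err)),
        # fraction of replications whose Wald interval covers the truth
        "Cov. Rate": sum(abs(e) <= crit * s for e, s in zip(err, se)) / len(err),
    }
```

Because the squared error is truncated at 2, the RMTSE is finite even for estimators, such as IV, that may lack moments.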
The simulation results also show that the performances of the estimators improve as the sample size increases, which indicates that the conclusions of the asymptotic analysis hold even when we fail to see which moment conditions are strong. In summary, the shrinkage GMM estimator performs well and often outperforms the other estimation methods, at least in the current simulation design. However, we should be careful in choosing the set of main instruments because it affects the performance.

5. Conclusion
This paper proposes the shrinkage GMM estimator to address the "many moment conditions" problem in the estimation of conditional moment restriction models. The shrinkage GMM estimator shrinks the effect of a subset of moment conditions. It may also be regarded as a model averaging estimator that averages the objective function of the GMM estimator with a small number of moment conditions and that with a large number of moment conditions. The Monte Carlo simulation shows that the shrinkage GMM estimator often performs better than conventional estimators. It would be beneficial if future applied research utilized


the shrinkage method as an alternative approach to moment selection.

There are several directions in which this paper can be extended. An interesting extension is to consider shrinkage versions of other estimation methods. As observed in the literature, the generalized empirical likelihood (GEL) estimators (see, e.g., Qin and Lawless (1994), Hansen et al. (1996), Kitamura and Stutzer (1997), Smith (1997) and Imbens et al. (1998)) have superior finite sample properties in the presence of many moment conditions (see, e.g., Newey and Smith (2004)). It would therefore be beneficial to consider shrinkage versions of the GEL estimators in order to improve their finite sample properties further. One approach may be to shrink the value of the Lagrange multiplier associated with the supplemental moment conditions toward zero.

Another possible extension concerns the assumption that all the instruments are orthogonal to the error term. It is important to examine the validity of instruments in practice. As considered by Hall and Peixe (2003), we may first apply the method of Andrews (1999) in order to eliminate invalid instruments and then apply the shrinkage method. Investigating the properties of such a procedure is also an interesting topic for future research.

The higher-order efficiency properties of selection and shrinkage estimators would also be an interesting theoretical question. For linear instrumental variables regression models with Gaussian errors, Takeuchi and Morimune (1985) establish the higher-order efficiency of LIML-type estimators whose asymptotic bias is adjusted. Anderson et al. (2005) and Anderson et al. (2009) obtain a similar result in the presence of many instruments. For general moment restriction models, Newey and Smith (2004) provide a discussion of the higher-order efficiency of bias-corrected empirical likelihood estimators when the distribution is multinomial. Their framework excludes the possibility of instrument selection or shrinkage estimation.
It is an important but challenging topic for future research to explore the higher-order efficiency properties of instrument selection and shrinkage estimators in conditional moment restriction models.

Acknowledgements
This study is part of the author's dissertation project at the University of Pennsylvania. The author gratefully acknowledges the hospitality of Yale University, where part of this paper was written. The author would like to thank his adviser, Yuichi Kitamura, for his help, patience and encouragement, and Gregory Kordas, Petra Todd and Frank Schorfheide for their helpful supervision. The author obtained valuable comments from two anonymous referees and participants of the Hitotsubashi Conference on Econometrics. The author acknowledges financial support from the Hong Kong Research Grants Council under Project No. HKUST643907 and from Kyoto University.

References
Anderson, T. W., Kunitomo, N. and Matsushita, Y. (2005). A new light from old wisdoms: Alternative estimation methods of simultaneous equations and microeconometric models, unpublished manuscript.


Anderson, T. W., Kunitomo, N. and Matsushita, Y. (2009). On the asymptotic optimality of the LIML estimator with possibly many instruments, J. Econ., forthcoming.
Andrews, D. W. K. (1999). Consistent moment selection procedures for generalized method of moments estimation, Econometrica, 67(3), 543–564.
Angrist, J. D. and Krueger, A. B. (1991). Does compulsory school attendance affect schooling and earnings?, Q. J. Econ., 106(4), 979–1014.
Bekker, P. A. (1994). Alternative approximations to the distributions of instrumental variable estimators, Econometrica, 62(3), 657–681.
Canay, I. A. (2009). Simultaneous selection and weighting of moments in GMM using a trapezoidal kernel, J. Econ., forthcoming.
Carrasco, M. (2008). A regularization approach to the many instruments problem, unpublished manuscript.
Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment restrictions, J. Econ., 34, 305–334.
Chamberlain, G. and Imbens, G. (2004). Random effects estimators with many instrumental variables, Econometrica, 72(1), 295–306.
Cragg, J. (1983). More efficient estimation in the presence of heteroscedasticity of unknown form, Econometrica, 51(3), 751–764.
Donald, S. G. and Newey, W. K. (2001). Choosing the number of instruments, Econometrica, 69(5), 1161–1191.
Donald, S. G., Imbens, G. W. and Newey, W. K. (2003). Empirical likelihood estimation and consistent tests with conditional moment restrictions, J. Econ., 117, 55–93.
Donald, S. G., Imbens, G. W. and Newey, W. K. (2009). Choosing the number of moments in conditional moment restriction models, J. Econ., 152, 28–36.
Doornik, J. A. (2007). Ox 5 - An Object-oriented Matrix Programming Language, Timberlake Consultants Ltd.
Hahn, J., Hausman, J. and Kuersteiner, G. (2001). Bias corrected instrumental variables estimation for dynamic panel models with fixed effects, unpublished manuscript.
Hall, A. R. and Peixe, F. P. M. (2003). A consistent method for the selection of relevant instruments, Econ. Rev., 22(3), 269–287.
Han, C. and Phillips, P. C. B. (2006). GMM with many moment conditions, Econometrica, 74(1), 147–192.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators, Econometrica, 50(4), 1029–1053.
Hansen, B. E. (2007). Least squares model averaging, Econometrica, 75(4), 1175–1189.
Hansen, L. P., Heaton, J. and Yaron, A. (1996). Finite-sample properties of some alternative GMM estimators, J. Bus. Econ. Stat., 14(3), 262–280.
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York.
Imbens, G. W., Spady, R. H. and Johnson, P. (1998). Information theoretic approaches to inference in moment condition models, Econometrica, 66, 333–357.
Kitamura, Y. and Stutzer, M. (1997). An information-theoretic alternative to generalized method of moments estimation, Econometrica, 65, 861–874.
Kuersteiner, G. M. (2002). Mean squared error reduction for GMM estimators of linear time series models, unpublished manuscript.
Kuersteiner, G. and Okui, R. (2009). Constructing optimal instruments by first stage prediction averaging, Econometrica, forthcoming.
Kunitomo, N. (1980). Asymptotic expansions of the distributions of estimators in a linear functional relationship and simultaneous equations, J. Am. Stat. Assoc., 75, 693–700.
Morimune, K. (1983). Approximate distribution of the k-class estimators when the degree of overidentifiability is large compared with the sample size, Econometrica, 51(3), 821–841.
Nagar, A. L. (1959). The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations, Econometrica, 27(4), 575–595.
Newey, W. K. and Smith, R. (2004). Higher order properties of GMM and generalized empirical likelihood estimators, Econometrica, 72(1), 219–255.


Newey, W. K. and Windmeijer, F. (2009). Generalized method of moments with many weak moment conditions, Econometrica, 77(3), 687–719.
Okui, R. (2008). Instrumental variable estimation in the presence of many moment conditions, J. Econ., forthcoming.
Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations, Ann. Stat., 22(1), 300–325.
Smith, R. J. (1997). Alternative semi-parametric likelihood approaches to generalized method of moments estimation, Econ. J., 107, 503–519.
Takeuchi, K. and Morimune, K. (1985). Third-order efficiency of the extended maximum likelihood estimators in a simultaneous equation system, Econometrica, 53(1), 177–200.
West, K., Wong, K. and Anatolyev, S. (2009). Instrumental variables estimation of heteroskedastic linear models using all lags of instruments, Econ. Rev., 28(5), 441–467.
