Abstract. It is common practice in econometrics to correct for heteroskedasticity. This paper corrects instrumental variables estimators with many instruments for ...
Instrumental Variable Estimation with Heteroskedasticity and Many Instruments∗ Jerry A. Hausman Department of Economics M.I.T.
Whitney K. Newey Department of Economics M.I.T.
Tiemen Woutersen Department of Economics John Hopkins University
John Chao Department of Economics University of Maryland
Norman Swanson Department of Economics Rutgers University August 2006 Revised September 2007
Abstract It is common practice in econometrics to correct for heteroskedasticity. This paper corrects instrumental variables estimators with many instruments for heteroskedasticity. We give heteroskedasticity robust versions of the limited information maximum likelihood (LIML) and Fuller (1977, FULL) estimators; as well as heteroskedasticity consistent standard errors thereof. The estimators are based on removing the own observation terms in the numerator of the LIML variance ratio. We derive asymptotic properties of the estimators under many and many weak instruments setups. Based on a series of Monte Carlo experiments, we find that the estimators perform as well as LIML or FULL under homoskedasticity, and have much lower bias and dispersion under heteroskedasticity, in nearly all cases considered.
JEL Classification: C12, C13, C23 Keywords: Instrumental Variables, Heteroskedasticity, Many Instruments, Jackknife ∗
The NSF provided financial support for this paper under Grant No. 0136869. Helpful comments were provided by A. Chesher and participants in seminars at CalTech, CEMMAP, Harvard, MIT, Pittsburgh, UC Berkeley, UCL, and USC. Capable research assistance was provided by A. Kowalski, R. Lewis, and K. Menzel.
1
Introduction
It is common practice in econometrics to correct standard errors for heteroskedasticity. A leading example of such correction is least squares with heteroskedasticity consistent standard errors, which is ubiquitous. Additionally, two-stage least squares (2SLS) with heteroskedasticity consistent standard errors is often used, in exactly identified models. However, such corrections seem not to be available for the Fuller (1977, FULL) and limited information maximum likelihood (LIML) estimators, in overidentified models. This perhaps surprising, given that FULL and LIML have better properties than 2SLS (see e.g. Hahn and Inoue (2002), Hahn and Hausman (2002), and Hansen, Hausman, and Newey, (2007)). The purpose of this paper is to correct these methods for heteroskedasticity under many instruments, and we shall see that it is necessary to correct both the estimators and the standard errors. LIML and FULL are inconsistent with many instruments and heteroskedasticity, as pointed out for the case of dummy instruments and LIML by Bekker and van der Ploeg (2005), and more generally by Chao and Swanson (2004).1 Here we give a general characterization of this inconsistency. More importantly, we propose heteroskedasticity robust versions of FULL and LIML, namely HFUL and HLIM, respectively. HLIM is a jackknife version of LIML that deletes own observation terms in the numerator of the variance ratio; and like LIML, HLIM is invariant to normalization. Also, HLIM can be interpreted as a linear combination of forward and reverse jackknife instrumental variable (JIV) estimators, analogous to Hahn and Hausman’s (2002) interpretation of LIML as a linear combination of forward and reverse two-stage least squares estimators. For each estimator we also give heteroskedasticity consistent standard errors that adjust for the presence of many instruments. We show that HLIM and HFUL are as efficient as FULL and LIML under homoskedasticity and the many weak instruments sequence of Chao and Swanson (2005). Under the many instruments sequence of Kunitomo (1980) and Bekker (1994) we show that HLIM 1
See also Ackerberg and Devereux (2003).
[1]
may be more or less efficient than LIML. We argue that these efficiency differences will tend to be small in most applications, where the number of instrumental variables is small relative to the sample size. The HFUL and HLIM estimators and their associated standard errors are quite simple to compute. However, similarly to least squares not being efficient under heteroskedasticity, HFUL and HLIM are also not efficient under heteroskedasticity and many instruments. Recent results of Newey and Windmeijer (2007) suggest that the continuous updating estimator (CUE) of Hansen, Heaton, and Yaron (1996) and other generalized empirical likelihood estimators (see e.g. Smith (2004)) are efficient. These estimators are quite difficult to compute, though. To address this problem, we give a linearized, jackknife version of the continuous updating estimator that is easier to compute, and for which HLIM provides simple starting values. In Monte Carlo work we do not find much advantage to using the CUE, and no advantage to using its linearized version, relative to HFUL and HLIM. One important precedent to the research discussed in this paper is Hahn and Hausman (2002), who considered combining forward and reverse IV estimators. JIV estimators were proposed by Phillips and Hale (1977), Blomquist and Dahlberg (1999), Angrist and Imbens and Krueger (1999), and Ackerberg and Deveraux (2003). Chao and Swanson (2004) have previously given heteroskedasticity consistent standard errors and shown asymptotic normality for JIV, under many weak instruments. Newey and Windmeijer (2007) considered efficiency of IV estimators with heteroskedasticity and many weak instruments. In a series of Monte Carlo experiments, we show that the HFUL and HLIM are approximately as efficient as LIML under homoskedasticity, unlike the JIV estimator, that was shown to perform poorly relative to LIML by Davidson and MacKinnon (2006). Also, HFUL has less bias and dispersion than FULL in most of the cases that we consider, under heteroskedasticity. These results suggest that the new estimators are promising heteroskedasticity robust and efficient alternatives to FULL, LIML, and other estimators, under many instruments. [2]
The rest of the paper is organized as follows. In the next section, the model is outlined, and previous estimators are summarized. In Section 3, heteroskedasticity robust LIML and FULL estimators are presented; while Section 4 discusses efficiency of these estimators. Section 5 outlines how to use the same jackknifing approach used in the construction of HLIM and HFUL in order to construct a robust CUE. Asymptotic theory is gathered in Section 6, and Monte Carlo findings are presented in Section 7. All proofs are gathered in Section 8.
2
The Model and Previous Estimators
The model we consider is given by y n×1
=
X δ0 n×GG×1
+ ε , n×1
X = Υ + U, where n is the number of observations, G is the number of right-hand side variables, Υ is a matrix of observations on the reduced form, and U is the matrix of reduced form disturbances. For our asymptotic approximations, the elements of Υ will be implicitly allowed to depend on n, although we suppress dependence of Υ on n for notational convenience. Estimation of δ0 will be based on an n × K matrix, Z, of instrumental variable observations with rank(Z) = K. We will assume that Z is nonrandom and that observations (εi , Ui ) are independent across i and have mean zero. This model allows for Υ to be a linear combination of Z, i.e. Υ = Zπ for some K × G matrix π.
Furthermore, some columns of X may be exogenous, with the correspond-
ing column of U being zero. The model also allows for Z to approximate the reduced form. For example, let Xi0 , Υ0i , and Zi0 denote the ith row (observation) of X, Υ, and Z respectively. We could define Υi = f0 (wi ) to be a vector of unknown functions of a vector wi of underlying instruments, and Zi = (p1K (wi ), ..., pKK (wi ))0 for approximating functions pkK (w), such as power series or splines. In this case, linear combinations of Zi may approximate the unknown reduced form (e.g. as in Donald and Newey (2001)).
[3]
To describe estimators in the extant literature, let P = Z(Z 0 Z)−1 Z 0 . The LIML estimator, δ˜∗ , is given by (y − Xδ)0 P (y − Xδ) ∗ ∗ ∗ ˜ ˆ ˆ . δ = arg min Q (δ), Q (δ) = δ (y − Xδ)0 (y − Xδ) FULL is obtained as δ˘∗ = (X 0 P X − α ˘ ∗ X 0 X)−1 (X 0 P y − α ˘ ∗ X 0 y), for α ˘ ∗ = [˜ α∗ − (1 − α ˜ ∗ )C/T ]/[1 − (1 − α ˜ ∗ )C/T ], α ˜ ∗ = Q(δ˜∗ ), and C > 0. FULL has moments of all orders, is approximately mean unbiased for C = 1, and is second order admissible for C ≥ 4, under homoskedasticity and standard large sample asymptotics. Both LIML and FULL are members of a class of estimators of the form δˆ∗ = (X 0 P X − α ˆ ∗ X 0 X)−1 (X 0 P y − α ˆ ∗ X 0 y). For example, LIML has this form for α ˆ∗ = α ˜ ∗ , FULL for α ˆ∗ = α ˘ ∗ , and 2SLS for α ˆ ∗ = 0. We can use the objective functions that these estimators minimize in order to characterize the problem with heteroskedasticity and many instruments. If the limit of the objective function is not minimized at the true parameter, then the estimator will not be consistent. For expository purposes, first consider 2SLS, which has the following objective function ˆ 2SLS (δ) = (y − Xδ)0 P (y − Xδ)/n = Q
n X X 0 0 (yi − Xi δ)Pij (yj − Xj δ)/n + Pii (yi − Xi0 δ)2 /n. i=1
i6=j
This objective function is a quadratic form that, like a sample average, will be close to its expectation in large samples. Its expectation is n h i X X 0 0 ˆ E Q2SLS (δ) = (δ − δ0 ) Υi Pij Υj (δ − δ0 )/n + Pii E[(yi − Xi0 δ)2 ]/n i=1
i6=j
Asymptotically, the first term following the above equality will be minimized at δ0 , under certain regularity conditions. The second term is an expected squared residual that will not be minimized at δ0 due to endogeneity. With many instruments Pii 9 0, [4]
so that the second term does not vanish asymptotically. Hence, with many instruments, 2SLS is not consistent, even under homoskedasticity, as pointed out by Bekker (1994). ˆ ∗ (δ), with a For LIML, we can (asymptotically) replace the objective function, Q corresponding ratio of expectations giving Pn Pn 0 2 E[(y − Xδ)0 P (y − Xδ)] (δ − δ0 )0 i6=j Pij Υi Υ0j (δ − δ0 ) i=1 Pii E[(yi − Xi δ) ] P P = + . n n 0 2 0 2 E[(y − Xδ)0 (y − Xδ)] i=1 E[(yi − Xi δ) ] i=1 E[(yi − Xi δ) ]
Here, we again see that the first term following the equality will be minimized at δ0
asymptotically. Under heteroskedasticity, the second term may not have a critical value at δ0 , and so the objective function will not be minimized at δ0 . To see this let σi2 = E[ε2i ], P P P P γi = E[Xi εi ]/σi2 , and γ¯ = ni=1 E[Xi εi ]/ ni=1 σi2 = i γi σi2 / i σi2 . Then " n # ¯ Pn n 2 ¯ X X P E[(y − X δ) ] ∂ −2 i i i=1 ii ¯ P = Pn Pii E[Xi εi ] − Pii σi2 γ¯ n 2 2] ¯ ∂δ E[(y − X δ) σ i i i=1 i=1 i δ=δ0 i=1 i=1 P −2 ni=1 Pii (γi − γ¯)σi2 Pn = = −2Cov\ σ 2 (Pii , γi ), 2 i=1 σi
where Cov\ σ2 (Pii , γi ) is the covariance between Pii and γi , for the distribution with probP ability weight σi2 / ni=1 σi2 for the ith observation. When lim Cov\ σ2 (Pii , γi ) 6= 0,
n−→∞
the objective function will not have zero derivative at δ0 asymptotically so that it is not minimized at δ0 . When this covariance does have a zero limit then it can be shown that the ratio of expectations will be minimized at δ0 as long as for Ωi = E[Ui Ui0 ] the matrix Pn Pn 2 ¶ X µ n 2 X X σi Pii 0 i=1 i=1 σi Pii P P Υi Υi /n + Pii Ωi /n − Ωi /n 1− n n 2 2 i=1 σi i=1 σi i i=1
has a positive definite limit. For the homoskedastic case it is known that LIML is consistent under many or many weak instruments (see e.g. Bekker (1994) and Chao and Swanson (2004)). Note that Cov\ σ2 (Pii , γi ) = 0, when either γi or Pii does not depend on i. Thus, it
is variation in γi = E[Xi εi ]/σi2 , the coefficients from the projection of Xi on εi , that leads to inconsistency of LIML, and not just any heteroskedasticity. Also, the case where [5]
Pii is constant occurs with dummy instruments and equal group sizes. It was pointed out by Bekker and van der Ploeg (2005) that LIML is consistent in this case, under heteroskedasticity. LIML is inconsistent when Pii = Zi0 (Z 0 Z)−1 Zi (roughly speaking this is the size of the ith instrument observation) is correlated with γi . This can easily happen when (say) there is more heteroskedasticity in σi2 than E[Xi εi ]. Bekker and van der Ploeg (2005) and Chao and Swanson (2004) pointed out that LIML can be inconsistent with heteroskedasticity; but this appears to be the first statement of the critical condition that Cov\ σ 2 (Pii , γi ) = 0 for consistency of LIML. The lack of consistency of these estimators under many instruments and heteroskedasticity can be attributed to the presence of the i = j terms in their objective functions. The estimators can be made robust to heteroskedasticity by dropping these terms. Doing this for 2SLS gives δ¯ = arg min δ
Solving for δ¯ gives δ¯ =
X (yi − Xi0 δ)Pij (yj − Xj0 δ)/n i6=j
à X i6=j
Xi Pij Xj0
!−1
X
Xi Pij yj .
i6=j
This is the JIV2 estimator of Angrist, Imbens, and Krueger (1994). Because the normal equations remove the i = j terms, this estimator is consistent. It was pointed out by Ackerberg and Devereux (2003) and Chao and Swanson (2004) that this estimator is consistent under many weak instruments and heteroskedasticity. However, under homoskedasticity and many weak instruments, this estimator is not efficient; and Davidson and MacKinnon (2005) argued that it additionally has inferior small sample properties under homoskedasticity, when compared with LIML. The estimators that we give overcome these problems.
[6]
3
Heteroskedasticity Robust LIML and FULL
The heteroskedasticity robust LIML estimator (HLIM) is obtained by dropping the i = j terms from the numerator of the LIML objective function, so that P 0 0 i6=j (yi − Xi δ)Pij (yj − Xj δ) ˜ ˆ ˆ . δ = arg min Q(δ), Q(δ) = δ (y − Xδ)0 (y − Xδ) Like the jackknife IV estimator, δ˜ will be consistent under heteroskedasticity because the i = j terms have been removed from the numerator. In the sequel, we will show that this estimator (and an asymptotic variance estimator thereof) is consistent and asymptotically normal. ¯ = [y, X]. As is the case for LIML, this estimator is invariant to normalization. Let X Then d˜ = (1, −δ˜0 )0 solves
d0 min
d:d1 =1
³P
´ ¯ i Pij X ¯ j0 d X i6=j . ¯ 0 Xd ¯ d0 X
Another normalization, such as imposing that another d is equal to 1 would produce the same estimator, up to the normalization. ˜ ˆ δ) Also, computation of this estimator is straightforward. Similarly to LIML, α ˜ = Q( ¯ 0 X) ¯ −1 P X ¯ i Pij X ¯ j0 . Also, first order conditions for δ˜ are is the smallest eigenvalue of (X i6=j 0=
X i6=j
´ ³ X ˜ ˜ Xi Pij yj − Xj0 δ˜ − α Xi (yi − Xi0 δ). i
Solving these conditions gives !−1 Ã ! Ã X X Xi Pij Xj0 − α ˜X 0X Xi Pij yj − α ˜X 0y . δ˜ = i6=j
i6=j
This estimator has a similar form to LIML except that the i = j terms have been deleted from the double sums. It is interesting to note that LIML and HLIM coincide when Pii is constant. In that case, ˆ ∗ (δ) = Q(δ) ˆ Q +
P
Pii (yi − Xi0 δ)2 ˆ = Q(δ) + P11 , (y − Xδ)0 (y − Xδ) i
[7]
so that the LIML objective function equals the HLIM objective function plus a constant. This explains why constant Pii will lead to LIML being consistent under heteroskedasticity. HLIM is a member of a class of jackknife estimators having the form !−1 Ã ! Ã X X Xi Pij X 0 − α ˆX 0X Xi Pij yj − α ˆX 0y . δˆ = j
i6=j
i6=j
The JIV estimator is obtained by setting α ˆ = 0. A heteroskedasticity consistent version of FULL, namely HFUL, is obtained by replacing α ˜ with α ˆ = [˜ α − (1 − α ˜ )C/T ]/[1 − (1 − α ˜ )C/T ] for some C > 0. The small sample properties of this estimator are unknown, but we expect its performance relative to HLIM to be similar to that of FULL relative to LIML. As pointed out by Hahn, Hausman, and Kuersteiner (2004), FULL has much smaller dispersion than LIML with weak instruments, so we expect the same for HFUL. Monte Carlo results given below confirm these properties. An asymptotic variance estimator is useful for constructing large sample confidence ˆ γˆ = X 0 εˆ/ˆ ˆ = X − εˆγˆ 0 , intervals and tests. To describe it, let εˆi = yi − Xi0 δ, ε0 εˆ, X ˆ = H
X i6=j
ˆ= Xi Pij Xj0 − α ˆ X 0 X, Σ
n X X
ˆ i Pik εˆ2k Pkj X ˆ j0 + X
i,j=1 k∈{i,j} /
X
ˆ i εˆi εˆj X ˆ j0 . Pij2 X
i6=j
The variance estimator is ˆH ˆ −1 . ˆ −1 Σ Vˆ = H ˜ as a combination of forward and reverse We can interpret the HLIM estimator, δ, jackknife IV (JIV) estimators. For simplicity, we give this interpretation in the scalar δ P P case. Let ε˜i = yi − Xi0 δ˜ and γ˜ = i Xi ε˜i / i ε˜2i . First-order conditions for δ˜ are 0=−
X X ˜ X ˆ δ) ∂ Q( ˜ = ˜ ˜ i − γ˜ yi ]Pij (yj − X 0 δ). ε˜2i /2 = (Xi − γ˜ ε˜i )Pij (yj − Xj0 δ) [(1 + γ˜ δ)X j ∂δ i i6=j i6=j
The forward JIV estimator δ¯ is à !−1 X X δ¯ = Xi Pij Xj Xi Pij yj . i6=j
i6=j
[8]
The reverse JIV is obtained as follows. Dividing the structural equation by δ0 gives Xi = yi /δ0 − εi /δ0 . Applying JIV to this equation in order to estimate 1/δ0 , and then inverting, gives the reverse JIV estimator δ¯r =
à X
yi Pij Xj
i6=j
!−1
X
yi Pij yj .
i6=j
Then, collecting terms in the first-order conditions for HLIM gives
˜ 0 = (1 + γ˜ δ)
X i6=j
˜ = (1 + γ˜ δ)
X i6=j
Dividing through by
P
i6=j
Finally, solving for δ˜ gives
˜ − γ˜ Xi Pij (yj − Xj0 δ) ˜ − γ˜ Xi Pij Xj (δ¯ − δ)
X i6=j
X i6=j
˜ yi Pij (yj − Xj0 δ)
˜ yi Pij Xj (δ¯r − δ).
Xi Pij Xj gives ˜ δ¯ − δ) ˜ − γ˜ δ( ¯ δ¯r − δ). ˜ 0 = (1 + γ˜ δ)( ¡ ¢ ˜ δ¯ − γ˜ δ¯ δ¯r (1 + γ ˜ δ) δ˜ = . ¯ 1 + γ˜(δ˜ − δ)
As usual, the asymptotic variance of a linear combination of coefficients is unaffected by how the coefficients are estimated, so that a feasible version of this estimator is n n X X ¡ ¢ r ∗ 0¯ ¯ ¯ ¯ ¯ ¯ 2. ¯ Xi (yi − Xi δ)/ (yi − Xi0 δ) δ = (1 + γ¯ δ)δ − γ¯δ δ , γ¯ = i=1
i=1
Because HLIM and HFUL perform so well in our Monte Carlo experiments, we do not pursue this particular estimator, however. The above result is analogous to that of Hahn and Hausman (2002), in the sense that under homoskedasticity, LIML is an optimal combination of forward and reverse bias corrected two stage least squares estimators. We find a similar result, as HLIM is a function of forward and reverse heteroskedasticity robust JIV estimators.
[9]
4
Optimal Estimation with Heteroskedasticity
HLIM is not asymptotically efficient under heteroskedasticity and many weak instruments. In GMM terminology, it uses a nonoptimal weighting matrix, one that is not heteroskedasticity consistent for the inverse of the variance of the moments. In addition, it does not use a heteroskedasticity consistent projection of the endogenous variables on the disturbance, which leads to inefficiency in the many instruments correction term. Efficiency can be obtained by modifying the estimator so that the weight matrix and the projection are heteroskedasticity consistent. Let à ! n X X ˆ ˆk (δ) = ˆ −1 Ω(δ) = Zi Zi0 εi (δ)2 /n, B Zi Zi0 εi (δ)Xik /n Ω(δ) i=1
and
i
h i ˆk (δ)Zi εi (δ), D ˆ i (δ) = D ˆ i1 (δ), ..., D ˆ iG (δ) . ˆ ik (δ) = Zi Xik − B D
Also, let δ¯ be a preliminary estimator (such as HLIM). An IV estimator that is efficient under heteroskedasticity of unknown form and many weak instruments is !−1 Ã X X ¯ 0 Ω( ¯ 0 Ω( ¯ −1 Zj X 0 ¯ −1 Zj yj . ˆ i (δ) ˆ δ) ˆ i (δ) ˆ δ) δˆ = D D j i6=j
i6=j
¯ −1 , and where ˆ δ) This is a jackknife IV estimator with an optimal weighting matrix, Ω( ¯ replaces Xi Z 0 . The use of D ¯ makes the estimator as efficient as the CUE under ˆ i (δ) ˆ i (δ) D i many weak instruments. The asymptotic variance can be estimated by ˆ −1 Σ ˆH ˆ −1 , H ˆ = U =H
X
˜ −1 Zj X 0 , Σ ˆ δ) ˆ Xi Zi0 Ω( j
=
n X
˜ −1 D ˜ 0 Ω( ˜ ˆ i (δ) ˆ δ) ˆ j (δ). D
i,j=1
i6=j
This estimator has a sandwich form similar to that given in Newey and Windmeijer (2007).
5
The Robust, Restricted CUE
As discussed above, HLIM has been made robust to heteroskedasticity by jackknifing, where own observation terms are removed. In general this same approach can be used to [10]
make the continuous updating estimator robust to restrictions on the weighting matrix, such as homoskedasticity. For example, LIML is a CUE, where homoskedasticity is imposed on the weighting matrix; and HLIM is its robust version. For expository purposes, consider a general GMM setup where δ denotes a G × 1 parameter vector and gi (δ) is a K × 1 vector of functions of the data and parameters satisfying E[gi (δ0 )] = 0. For example, in the linear IV environment, gi (δ) = Zi (yi − Xi0 δ). P ˜ Let Ω(δ) denote an estimator of Ω(δ) = ni=1 E[gi (δ)gi (δ)0 ]/n, where an n subscript on Ω(δ) is suppressed for notational convenience. A CUE is given by ˜ −1 gˆ(δ). δˆ = arg min gˆ(δ)0 Ω(δ) δ
P ˜ When Ω(δ) = ni=1 gi (δ)gi (δ)0 /n this estimator is the CUE given by Hansen, Heaton,
and Yaron (1996), that places no restrictions on the estimator of the second moment matrices. In general, restrictions may be imposed on the second moment matrix. For ˜ (δ) to be only example, in the IV setting where gi (δ) = Zi (yi − Xi0 δ), we may specify Ω
consistent under homoskedasticity, ˜ Ω(δ) = (y − Xδ)0 (y − Xδ) Z 0 Z/n2 . In this case the CUE objective function is (y − Xδ)0 P (y − Xδ) gˆ(δ) Ω(δ) gˆ(δ) = , (y − Xδ)0 (y − Xδ) 0˜
−1
which is the LIML objective function, as is well known (see Hansen, Heaton, and Yaron, (1996)). ˜ A CUE will tend to have low bias when the restrictions imposed on Ω(δ) are satisfied, but may be more biased otherwise. A simple calculation can be used to explain this ˜ ¯ ˜ bias. Consider a CUE where Ω(δ) is replaced by its expectation, Ω(δ) = E[Ω(δ)]. This replacement is justified under many weak moment asymptotics. The expectation of the CUE objective function is then ¯ −1 gˆ(δ)] = (1 − n−1 )¯ ¯ −1 g¯(δ) + tr(Ω(δ) ¯ −1 Ω(δ))/n, E[ˆ g(δ)0 Ω(δ) g(δ)0 Ω(δ) [11]
where g¯(δ) = E[gi (δ)] and Ω(δ) = E[gi (δ)gi (δ)0 ]. The first term in the above expression ¯ is minimized at δ0 , where g¯(δ0 ) = 0. When Ω(δ) = Ω (δ) , then ¯ −1 Ω(δ))/n = K/n, tr(Ω(δ) so that the second term does not depend on δ. In this case the expected value of the CUE ¯ objective function is minimized at δ0 . When Ω(δ) 6= Ω(δ), the second term will depend on δ, and so the expected value of the CUE objective function will not be minimized at δ0 . This effect will lead to bias in the CUE, because the estimator will be minimizing an objective function with expectation that is not minimized at the truth. It is also interesting to note that this bias effect will tend to increase with K. This bias was noted by Han and Phillips (2005) for two-stage GMM, who referred to the bias term as a “noise” term, and to the other term as a “signal” term. We robustify the CUE by jackknifing (i.e. by deleting the own observation terms in the CUE quadratic form). Note that X ¯ −1 gj (δ)/n2 ] = (1 − n−1 )¯ ¯ −1 g¯(δ), E[ gi (δ)0 Ω(δ) g(δ)0 Ω(δ) i6=j
¯ is. A corresponding estimator is which is always minimized at δ0 , no matter what Ω(δ) ¯ ˜ obtained by replacing Ω(δ) by Ω(δ) and minimizing. Namely, δˆ = arg min δ
X
˜ −1 gj (δ)/n2 . gi (δ)0 Ω(δ)
i6=j
This is a robust CUE (RCUE), that should have small bias by virtue of the jackknife ˜ form of the objective function. The HLIM estimator is precisely of this form, for Ω(δ) = (y − Xδ)0 (y − Xδ) Z 0 Z/n2 .
6
Asymptotic Theory
Theoretical justification for the estimators proposed here is provided by asymptotic theory where the number of instruments grows with the sample size. Some regularity conditions are important for the results. Let Zi0 , εi , Ui0 , and Υ0i denote the ith row of Z, ε, U, [12]
and Υ respectively. Here, we will consider the case where Z is constant, which can be viewed as conditioning on Z (see e.g. Chao, Swanson, Hausman, Newey, and Woutersen (2007)). Assumption 1: Z includes among its columns a vector of ones, rank(Z) = K, and there is a constant C such that Pii ≤ C < 1, (i = 1, ..., n), K −→ ∞. The restriction that rank(Z) = K is a normalization that requires excluding redundant columns from Z. It can be verified in particular cases. For instance, when wi is a continuously distributed scalar, Zi = pK (wi ), and pkK (w) = wk−1 , it can be shown that Z 0 Z is nonsingular with probability one for K < n.2 The condition Pii ≤ C < 1 implies P that K/n ≤ C, because K/n = ni=1 Pii /n ≤ C.
Assumption 2: There is a G × G matrix, Sn = S˜n diag (μ1n , ..., μGn ), and zi such √ that Υi = Sn zi / n, S˜n is bounded and the smallest eigenvalue of S˜n S˜n0 is bounded away √ √ from zero, for each j either μjn = n or μjn / n −→ 0, μn = min μjn −→ ∞, and 1≤j≤G √ P P K/μ2n −→ 0. Also, ni=1 kzi k4 /n2 −→ 0, and ni=1 zi zi0 /n is bounded and uniformly nonsingular.
Setting μjn =
√ n leads to asymptotic theory like that in Kunitomo (1980), Morimune
(1984), and Bekker (1994), where the number of instruments K can grow as fast as the √ sample size. In that case, the condition K/μ2n −→ 0 would be automatically satisfied. √ Allowing for K to grow, and for μn to grow more slowly than n, allows for many instruments without strong identification. This condition then allows for some components √ of the reduced form to give only weak identification (corresponding to μjn / n −→ 0), √ and other components (corresponding to μjn = n) to give strong identification. In particular, this condition allows for fixed constant coefficients in the reduced form. Assumption 3: (ε1 , U1 ), ..., (εn , Un ) are independent with E[εi ] = 0, E[Ui ] = 0, E[ε4i ] 2
The observations w1 , ..., wT are distinct with probability one and therefore, by K < n, cannot all be roots of a K th degree polynomial. It follows that for any nonzero a there must be some t with a0 Zt = a0 pK (wi ) 6= 0, implying that a0 Z 0 Za > 0.
[13]
and E[kUi k4 ] are bounded in i, V ar((εi , Ui0 )0 ) = diag(Ω∗i , 0), and nonsingular.
Pn
i=1
Ω∗i /n is uniformly
This condition includes moment existence assumptions. It also requires the average variance of the nonzero reduced form disturbances to be nonsingular, and is useful for the proof of consistency contained in the appendix. Assumption 4: There is a πKn such that
Pn
i=1
kzi − πKn Zi k2 /n −→ 0.
This condition allows for an unknown reduced form that is approximated by a linear combination of the instrumental variables. It is possible to replace this assumption with P the condition that i6=j zi Pij zj0 /n is uniformly nonsingular. We can easily interpret all of these conditions within the context of the important
example of a linear model with exogenous covariates and a possibly unknown reduced form. This example is given by ¶ ¶ µ µ √ ¶ µ vi π11 Z1i + μn f0 (wi )/ n Z1i , Zi = , + Xi = Z1i 0 pK (wi ) where Z1i is a G2 × 1 vector of included exogenous variables, f0 (w) is a G − G2 dimensional vector function of a fixed dimensional vector of exogenous variables, w, and def
pK (w) = (p1K (w), ..., pK−G2 ,K (w))0 . The variables in Xi other than Z1i are endogenous √ with reduced form π11 Z1i + μn f0 (wi )/ n. The function f0 (w) may be a linear combination of a subvector of pK (w), in which case zi = πKn Zi , for some πKn in Assumption 4; or it may be an unknown function that can be approximated by a linear combination of √ pK (w). For μn = n, this example is like the model in Donald and Newey (2001), where Zi includes approximating functions for the optimal (asymptotic variance minimizing) instruments Υi , but the number of instruments can grow as fast as the sample size. When μ2n /n −→ 0, it is a modified version where the model is more weakly identified. To see precise conditions under which the assumptions are satisfied, let ¶ µ µ ¶ ¡ √ √ ¢ f0 (wi ) I π 11 , Sn = S˜n diag μn , ..., μn , n, ..., n , and S˜n = zi = . Z1i 0 I [14]
√ By construction we have that Υi = Sn zi / n. Assumption 2 imposes the requirements that
and that
n X i=1
Pn
0 i=1 zi zi /n
kzi k4 /n2 −→ 0,
is bounded and uniformly nonsingular. The other requirements of
Assumption 2 are satisfied by construction. Turning to Assumption 3, we require that Pn 0 0 πKn , [IG2 , 0]0 ]0 . i=1 V ar(εi , Ui )/n is uniformly nonsingular. For Assumption 4, let πKn = [˜
Then Assumption 4 will be satisfied if, for each n, there exists a π ˜Kn with n X i=1
kzi −
0 πKn Zi k2 /n
=
n X i=1
0 kf0 (wi ) − π ˜Kn Zi k2 /n −→ 0.
Theorem 1: If Assumptions 1-4 are satisfied and α ˆ = op (μ2n /n) or δˆ is HLIM or p 0 ˆ ˆ p HFUL then μ−1 n Sn (δ − δ0 ) −→ 0 and δ −→ δ0 .
ˆ For instance, in the This result gives convergence rates for linear combinations of δ. √ 0 ˆ above example, it implies that δˆ1 is consistent and that π11 δ1 + δˆ2 = op (μn / n). The asymptotic variance of the estimator will depend on the growth rate of K relative to μ2n . The following condition allows for two cases. Assumption 5: Either I) K/μ2n is bounded and and μn Sn−1 −→ S¯0 .
√ KSn−1 −→ S0 or; II) K/μ2n −→ ∞
To state a limiting distribution result it is helpful to also assume that certain objects P P ˜i0 ; converge. Let σi2 = E[ε2i ], γn = ni=1 E[Ui εi ]/ ni=1 σi2 , U˜ = U − εγn0 , having ith row U ˜ i = E[U ˜i U ˜i0 ]. and let Ω
Pn Pn 2 0 2 (1 − Pii )zi zi0 /n, Σp = lim Assumption 6: HP = lim i=1 i=1 (1 − Pii ) zi zi σi /n n−→∞ n−→∞ ³ ´ P ˜j0 ] /K. ˜j U˜j0 ] + E[U ˜i εi ]E[εj U and Ψ = limn−→∞ i6=j Pij2 σi2 E[U
This convergence condition can be replaced by an assumption that certain matrices
are uniformly positive definite without affecting the limiting distribution result for tratios given in Theorem 3 below (see Chao, Swanson, Hausman, Newey, and Woutersen (2007)). [15]
We can now state the asymptotic normality results. In Case I we have that d Sn0 (δˆ − δ0 ) −→ N(0, ΛI ),
(6.1)
where ΛI = HP−1 ΣP HP−1 + HP−1 S0 ΨS00 HP−1 . In Case II, we have that √ d (μn / K)Sn0 (δˆ − δ0 ) −→ N(0, ΛII ),
(6.2)
where ΛII = HP−1 S¯0 ΨS¯00 HP−1 . The asymptotic variance expressions allow for the many instrument sequence of Kunitomo (1980), Morimune (1983), and Bekker (1994) and the many weak instrument sequence of Chao and Swanson (2004, 2005). In Case I, the first term in the asymptotic variance, ΛI , corresponds to the usual asymptotic variance, and the second is an adjustment for the presence of many instruments. In Case II, the asymptotic variance, ΛII , only contains the adjustment for many instruments. This is because K is growing faster than μ2n . Also, ΛII will be singular when included exogenous variables are present. We can now state an asymptotic normality result. Theorem 2: If Assumptions 1-6 are satisfied, α ˆ =α ˜ + Op (1/T ) or δˆ is HLIM or HFUL, then in Case I, equation (6.1) is satisfied, and in Case II, equation (6.2) is satisfied. It is interesting to compare the asymptotic variance of the HLIM estimator with that of LIML when the disturbances are homoskedastic. Under homoskedasticity the variance of V ar((εi , Ui0 )) will not depend on i (e.g. so that σi2 = σ 2 ). Then, γn = E[Xi εi ]/σ 2 = γ ˜i εi ] = E[Ui εi ] − γσ 2 = 0, so that and E[U ˜ P = lim ˜ p, H Σp = σ H 2
n−→∞
n n X X 2 0 2 0 ˜ ˜ (1 − Pii ) zi zi /n, Ψ = σ E[Uj Uj ](1 − lim Pii2 /K). n−→∞
i=1
[16]
i=1
˜i0 ]S00 , the asymptotic variance of HLIM is then ˜i U Focusing on Case I, letting Γ = σ 2 S0 E[U V =σ
2
˜ P H −1 HP−1 H P
+ lim (1 − n−→∞
n X
Pii2 /K)Hp−1 ΓHP−1 .
i=1
For the variance of LIML, assume that third and fourth moments obey the same restrictions that they do under normality. Then from Hansen, Hausman, and Newey (2006), P for H = limn−→∞ ni=1 zi zi0 /n and τ = limn−→∞ K/n, the asymptotic variance of LIML is
V ∗ = σ 2 H −1 + (1 − τ )−1 H −1 ΓH −1 . With many weak instruments, where τ = 0 and maxi≤n Pii −→ 0, we will have ˜ P = H and limn−→∞ P Pii2 /K −→ 0, so that the asymptotic variances of HLIM HP = H i
and LIML are the same and equal to σ 2 H −1 + H −1 ΓH −1 . This case is most important in practical applications, where K is usually very small relative to n. In such cases we would expect from the asymptotic approximation to find that the variance of LIML and HLIM are very similar. Also, the JIV estimators will be inefficient relative to LIML and HLIM. As shown in Chao and Swanson (2004), under many weak instruments the asymptotic variance of JIV is VJIV = σ 2 H −1 + H −1 S0 (σ 2 E[Ui Ui0 ] + E[Ui εi ]E[εi Ui0 ])S00 H −1 , ˜i0 ]. ˜i U which is larger than the asymptotic variance of HLIM because E[Ui Ui0 ] ≥ E[U In the many instruments case, where K and μ2n grow as fast as n, it turns out that we cannot rank the asymptotic variances of LIML and HLIM. To show this, consider √ an example where p = 1, zi alternates between −¯ z and z¯ for z¯ 6= 0, Sn = n (so
˜ = E[U ˜i2 ] and that Υi = zi ), and zi is included among the elements of Zi . Then, for Ω P κ = limn−→∞ ni=1 Pii2 /K we find that ! Ã 2 ˜ σ Ω V −V∗ = 2 (τ κ − τ 2 ) 1 − 2 . z¯ (1 − τ )2 z¯ Since τ κ − τ 2 is the limit of the sample variance of Pii , which we assume to be positive, ˜ Here, z¯2 is the limit of the sample variance of zi . Thus, V ≥ V ∗ if and only if z¯2 ≥ Ω. [17]
the asymptotic variance ranking can go either way depending on whether the sample ˜i . In applications where the sample size is variance of zi is bigger than the variance of U large relative to the number of instruments, these efficiency differences will tend to be quite small, because Pii is small. For homoskedastic, non-Gaussian disturbances, it is also interesting to note that the asymptotic variance of HLIM does not depend on third and fourth moments of the disturbances, while that of LIML does (see Bekker and van der Ploeg (2005) and Haslett (2000)). This makes estimation of the asymptotic variance under homoskedasticity simpler for HLIM than for LIML. It remains to establish the consistency of the asymptotic variance estimator, and to show that confidence intervals can be formed for linear combinations of the coefficients in the usual way. The following theorem accomplishes this, under additional conditions on zi . Theorem 3: If Assumptions 1-6 are satisfied, and α ˆ=α ˜ + Op (1/T ) or δˆ is HLIM or HFUL, there exists a C with kzi k ≤ C for all i, and there exists a πn , such that p
p
maxi≤n kzi − πn Zi k −→ 0, then in Case I, Sn0 Vˆ Sn −→ ΛI and in Case II, μ2n Sn0 Vˆ Sn /K −→
ΛII. . Also, if c0 S00 ΛI S0 c 6= 0 in Case I or c0 S¯00 ΛII S¯0 c 6= 0 in Case II, then c0 (δˆ − δ0 ) d p −→ N(0, 1). c0 Vˆ c
This result allows us to form confidence intervals and test statistics for a single linear combination of parameters in the usual way.
7
Monte Carlo Results
In this Monte Carlo simulation, we provide evidence concerning the finite sample behavior of HLIM and HFUL. The model that we consider is y = δ10 + δ20 x2 + ε, x2 = πz1 + U2
[18]
where zi1 ∼ N(0, 1) and U2i ∼ N(0, 1). The ith instrument observation is 2 3 4 , z1i , z1i , z1i Di1 , ..., z1i Di,K−5 ), Zi0 = (1, z1i , z1i
where Dik ∈ {0, 1}, Pr(Dik = 1) = 1/2, and zi1 ∼ N(0, 1). Thus, the instruments consist of powers of a standard normal up to the fourth power plus interactions with dummy variables. Only z1 affects the reduced form, so that adding the other instruments does not improve asymptotic efficiency of the LIML or FULL estimators, though the powers of zi1 do help with asymptotic efficiency of the CUE. The structural disturbance, ε, is allowed to be heteroskedastic, being given by s 1 − ρ2 ε = ρU2 + (φv1 + 0.86v2 ), v1 ∼ N(0, z12 ), v2 ∼ N(0, (0.86)2 ), φ2 + (0.86)4 where vi1 and vi2 are independent of U2 . This is a design that will lead to LIML being inconsistent with many instruments. Here, E[Xi εi ] is constant and σi2 is quadratic in zi1 , 2 −1 so that γi = (C1 + C2 zi1 + C3 zi1 ) A, for a constant vector, A, and constants C1 , C2 , C3 .
In this case, Pii will be correlated with γi = E[Xi εi ]/σi2 . We report properties of estimators and t-ratios for δ2 . We set n = 800 and ρ = 0.3 throughout and choose K = 2, 10, 30. We choose π so that the concentration parameter is nπ2 = μ2 = 8, 16, 32. We also choose φ so that the R-squared for the regression of ε2 on the instruments is 0, 0.1, or 0.2. Below, we report results on median bias and the range between the .05 and .95 quantiles for LIML, HLIM, the jackknife CUE, JIV, HFUL, CUE, and FULL. Interquartile range results were similar. We find that under homoskedasticity, LIML and HFUL have quite similar properties, though LIML is slightly less biased. Under heteroskedasticity, HFUL is much less biased and also much less dispersed than LIML. Thus, we find that heteroskedasticity can bias LIML. We also find that the dispersion of LIML is substantially larger than HFUL. Thus we find a lower bias for HFUL under heteroskedasticity and many instruments, as predicted by the theory, as well as substantially lower dispersion, which though not predicted by the theory may turn out to be important in practice. In additional tables following the references, we also find that coverage proba[19]
bilities using the heteroskedasticity and many instrument consistent standard errors are quite accurate. Median Bias R2ε2 |z2 = 0.00 1
HF U L k1 0.025 0.027 0.067 0.007 0.002 0.003
μ2
K LIM L HLIM F U LL1 HF U L 8 0 0.005 0.005 0.042 0.043 8 8 0.024 0.023 0.057 0.057 8 28 0.065 0.065 0.086 0.091 32 0 0.002 0.002 0.011 0.011 32 8 0.002 0.001 0.011 0.011 32 28 0.003 0.002 0.013 0.013 ***Results based on 20,000 simulations.
JIV E −0.034 0.053 0.164 −0.018 −0.019 −0.014
CU E 0.005 0.025 0.071 0.002 0.002 0.006
JCU E 0.005 0.032 0.092 0.002 0.002 0.006
CU E 1.470 3.101 6.336 0.616 0.770 1.156
JCU E 1.487 3.511 6.240 0.616 0.767 1.133
Nine Decile Range: .05 to .95 R2ε2 |z2 = 0.00 1
HF U L k1 1.202 2.579 4.793 0.602 0.713 0.983
μ2 K LIM L HLIM F U LL1 HF U L 8 0 1.470 1.466 1.072 1.073 8 8 2.852 2.934 1.657 1.644 8 28 5.036 5.179 2.421 2.364 32 0 0.616 0.616 0.590 0.589 32 8 0.715 0.716 0.679 0.680 32 28 0.961 0.985 0.901 0.913 ***Results based on 20,000 simulations.
JIV E 3.114 5.098 6.787 0.679 0.816 1.200
Median Bias R2ε2 |z2 = 0.20 1
μ2
K LIM L HLIM F U LL1 HF U L 8 0 −0.001 0.050 0.041 0.078 8 8 −0.623 0.094 −0.349 0.113 8 28 −1.871 0.134 −0.937 0.146 32 0 −0.001 0.011 0.008 0.020 32 8 −0.220 0.015 −0.192 0.024 32 28 −1.038 0.016 −0.846 0.027 ***Results based on 20,000 simulations.
[20]
HF U L k1 0.065 0.096 0.134 0.016 0.016 0.017
JIV E −0.031 0.039 0.148 −0.021 −0.021 −0.016
CU E −0.001 0.003 −0.034 −0.001 0.000 −0.017
JCU E 0.012 −0.005 0.076 −0.003 −0.019 −0.021
Nine Decile Range: .05 to .95 R2ε2 |z2 = 0.20 1
μ2
K LIM L HLIM F U LL1 HF U L 8 0 2.219 1.868 1.675 1.494 8 8 26.169 5.611 4.776 2.664 8 28 60.512 8.191 7.145 3.332 32 0 0.941 0.901 0.903 0.868 32 8 3.365 1.226 2.429 1.134 32 28 18.357 1.815 5.424 1.571 ***Results based on 20,000 simulations.
8
HF U L k1 1.653 4.738 7.510 0.884 1.217 1.808
JIV E 4.381 7.781 9.975 1.029 1.206 1.678
CU E 2.219 16.218 1.5E+012 0.941 1.011 3.563
JCU E 2.582 8.586 12.281 0.946 1.086 1.873
Appendix: Proofs of Consistency and Asymptotic Normality
Throughout, let C denote a generic positive constant that may be different in different uses and let M, CS, and T denote the conditional Markov inequality, the Cauchy-Schwartz inequality, and the Triangle inequality respectively. The first Lemma is proved in Hansen, Hausman, and Newey (2006). ° °2 µ ° °2 ¶ p ° 0 ˆ ° ° ° Lemma A0: If Assumption 2 is satisfied and °Sn (δ − δ0 )/μn ° / 1 + °δˆ° −→ ° ° ° p ° 0 ˆ 0 then °Sn (δ − δ0 )/μn ° −→ 0.
We next give a result from Chao et al. (2007) that is used in the proof of consistency. Lemma A1 (Lemma A1 of Chao et al., 2007): If (Wi , Yi ), (i = 1, ..., n) are in-
¯= dependent, Wi and Yi are scalars, and P is symmetric, idempotent of rank K then for w E[(W1 , ..., Wn )0 ], y¯ = E[(Y1 , ..., Yn )0 ], σ ¯W n = maxi≤n V ar(Wi )1/2 , σ ¯Y n = maxi≤n V ar(Yi )1/2 , X i6=j
Pij Wi Yj =
X
Pij w ¯i y¯j + Op (K 1/2 σ ¯W n σ ¯Y n + σ ¯W n
i6=j
p √ y¯0 y¯ + σ ¯Y n w ¯ 0 w). ¯
˜ = [ε, X]S¯n−10 , and Hn = Pn (1 − For the next result let S¯n = diag(μn , Sn ), X i=1
Pii )zi zi0 /n.
[21]
Lemma A2: If Assumptions 1-4 are satisfied and X
√ K/μ2n −→ 0 then
˜ i Pij X ˜ j0 = diag(0, Hn ) + op (1). X
i6=j
Proof: Note that ˜i = X
µ
μ−1 n εi Sn−1 Xi
¶
=
µ
0√ zi / n
¶
+
µ
μ−1 n εi Sn−1 Ui
¶
.
−2 ˜ ˜ ˜ Since kSn−1 k ≤ Cμ−1 n we have V ar(Xik ) ≤ Cμn for any element Xik of Xi . Then applying P ˜ i Pij X ˜ j0 gives Lemma A1 to each element of i6=j X
X
˜ i Pij X ˜ j0 = diag(0, X
i6=j
X i6=j
= diag(0,
X
X zi Pij zj0 /n) + Op (K 1/2 /μ2n + μ−1 ( kzi k2 /n)1/2 ) n i
zi Pij zj0 /n)
+ op (1).
i6=j
Also, note that Hn −
X
zi Pij zj0 /n =
X i
i6=j
zi zi0 /n − 0 0 ) ZπKn
X i
Pii zi zi0 /n −
X i6=j
zi Pij zj0 /n = z 0 (I − P )z/n 0
0 0 0 (I − P ) (z − ZπKn ) /n ≤ (z − ZπKn ) (z − ZπKn ) /n = (z − X kzi − πKn Zi k2 /n −→ 0, ≤ IG i
where the third equality follows by P Z = Z, the first inequality by I − P idempotent, and the last inequality by A ≤ tr(A)I for any positive semi-definite (p.s.d.) matrix A. P Since this equation shows that Hn − i6=j zi Pij zj0 /n is p.s.d. and is less than or equal to P another p.s.d. matrix that converges to zero it follows that i6=j zi Pij zj0 /n = Hn + op (1). The conclusion follows by T . Q.E.D.
In what follows it is useful to prove directly that the HLIM estimator δ˜ satisfies p Sn0 (δ˜ − δ0 )/μn −→ 0. p
Lemma A3: If Assumptions 1-4 are satisfied then Sn0 (δ˜ − δ0 )/μn −→ 0.
¯ = [0, Υ], U¯ = [ε, U], X ¯ = [y, X], so that X ¯ = (Υ ¯ + U)D ¯ Proof: Let Υ for ∙ ¸ 1 0 D= . δ0 I [22]
√ p ¯ ˆ =X ¯ 0 X/n. Note that kSn / nk ≤ C and by standard calculations z 0 U/n −→ 0. Let B Then
° 0 ° ° °¡ √ ¢ p °Υ ¯ U¯ /n° = ° Sn / n z 0 U/n° ≤ C kz 0 U/nk −→ 0.
¯ n = Pn E[U ¯i U¯i0 ]/n = diag(Pn Ω∗i /n, 0) ≥ Cdiag(IG−G2 +1 , 0) by Assumption 3. Let Ω i=1 i=1 p ¯ −Ω ¯ n −→ ¯ 0 U/n 0, so it follows that w.p.a.1. By M we have U
¯ 0 Υ/n ˆ = (U ¯ 0U ¯ +Υ ¯ 0U ¯ + U¯ 0 Υ ¯ +Υ ¯ 0 Υ)/n ¯ ¯n + Υ ¯ + op (1) ≥ Cdiag(IG−G2 +1 , 0). B =Ω ¯n + Υ ¯ ¯ 0 Υ/n Since Ω is bounded, it follows that w.p.a.1, ˆ −δ 0 )0 = (y − Xδ)0 (y − Xδ)/n ≤ C k(1, −δ 0 )k2 = C(1 + kδk2 ). C ≤ (1, −δ 0 )B(1, ˜ = [ε, X]S¯n−10 . Next, as defined preceding Lemma A2 let S¯n = diag(μn , Sn ) and X P Note that by Pii ≤ C < 1 and uniform nonsingularity of ni=1 zi zi0 /n we have Hn ≥ P (1 − C) ni=1 zi zi0 /n ≥ CIG . Then by Lemma A2, w.p.a.1. def Aˆ =
X i6=j
˜iX ˜ j0 ≥ Cdiag(0, IG ), Pij X
˜ i . Then w.p.a.1 for all δ ¯ i = D0 S¯n X Note that S¯n0 D(1, −δ 0 )0 = (μn , (δ0 − δ)0 Sn )0 and X ! Ã X X 0 ¯iX ¯ j0 (1, −δ 0 )0 μ−2 Pij (yi − Xi0 δ)(yj − Xj0 δ) = μ−2 Pij X n n (1, −δ ) i6=j
i6=j
=
ˆ Let Q(δ) = (n/μ2n ) left element of the
0 0 ¯ ˆ ¯0 0 0 μ−2 n (1, −δ )D Sn ASn D(1, −δ )
P
0 0 0 i6=j (yi −Xi δ)Pij (yj −Xj δ)/(y −Xδ) (y −Xδ). Then by the upper P p conclusion of Lemma A2, μ−2 n i6=j εi Pij εj −→ 0. Then w.p.a.1
¯ ¯ n ¯ ¯ ¯ ¯ X X ¯ ¯ ¯ p ¯ˆ 2 ε P ε / ε /n ¯ −→ 0. ¯Q(δ0 )¯ = ¯μ−2 i ij j n i ¯ ¯ i=1
i6=j
ˆ ≤ Q(δ ˆ ˆ δ) ˆ 0 ).Therefore w.p.a.1, by (y − Xδ)0 (y − we have Q( Since δˆ = arg minδ Q(δ), Xδ)/n ≤ C(1 + kδk2 ), it follows that ° °2 ° 0 ˆ ° °Sn (δ − δ0 )/μn ° p ˆ ≤ C Q(δ ˆ δ) ˆ 0 ) −→ 0≤ ≤ C Q( 0, ° °2 ° ˆ° 1 + °δ ° [23]
2
≥ C kSn0 (δ − δ0 )/μn k .
°2 µ ° ° °2 ¶ p ° ° 0 ˆ ° ° implying °Sn (δ − δ0 )/μn ° / 1 + °δˆ° −→ 0. Lemma A0 gives the conclusion. Q.E.D. p
Lemma A4: If Assumptions 1-4 are satisfied, α ˆ = op (μ2n /n), and Sn0 (δˆ−δ0 )/μn −→ 0 P then for Hn = ni=1 (1 − Pii )zi zi0 /n, Ã ! X X p Xi Pij Xj0 − α ˆ X 0 X Sn−10 = Hn + op (1), Sn−1 ( Xi Pij εˆj − α ˆ X 0 εˆ)/μn −→ 0. Sn−1 i6=j
i6=j
Proof: By M and standard arguments X 0 X = Op (n) and X 0 εˆ = Op (n). Therefore, by kSn−1 k = O(μ−1 n ), p
p
ˆ Sn−1 X 0 εˆ/μn = op (μ2n /n)Op (n/μ2n ) −→ 0. α ˆ Sn−1 X 0 XSn−10 = op (μ2n /n)Op (n/μ2n ) −→ 0, α Lemma A2 (lower right hand block) and T then give the first conclusion. By Lemma A2 P p (off diagonal) we have Sn−1 i6=j Xi Pij εj /μn −→ 0, so that ! Ã X X p Xi Pij εˆj /μn = op (1) − S −1 Xi Pij X 0 S −10 S 0 (δˆ − δ0 )/μn −→ 0.Q.E.D. S −1 n
n
i6=j
j
n
n
i6=j
P p ˆ 0 )/μn −→ 0 then i6=j εˆi Pij εˆj /ˆ ε0 εˆ = Lemma A5: If Assumptions 1 - 4 are satisfied and Sn0 (δ−δ
op (μ2n /n).
Proof:
P Let βˆ = Sn0 (δˆ − δ0 )/μn and α ˘ = i6=j εi Pij εj /ε0 ε = op (μ2n /n). Note that
˜n = σε2 = Op (1) by M. By Lemma A4 with α ˆ = α ˘ we have H σ ˆε2 = εˆ0 εˆ/n satisfies 1/ˆ P p Sn−1 ( i6=j Xi Pij Xj0 − α ˘ X 0 X)Sn−10 = Op (1) and Wn = Sn−1 (X 0 P ε − α ˘ X 0 ε)/μn −→ 0, so à ! P X X ε ˆ P ε ˆ 1 i ij j i6=j −α ˘ = 0 εˆi Pij εˆj − εi Pij εj − α ˘ (ˆ ε0 εˆ − ε0 ε) εˆ0 εˆ εˆ εˆ i6=j i6=j ³ ´ 2 μn 1 ˆ0 ˜ ˆ ˆ0 Wn = op (μ2 /n), β − 2 β β = H n n n σ ˆε2 so the conclusion follows by T. Q.E.D. p ˆ 0 )/μn −→ Proof of Theorem 1: First, note that if Sn0 (δ−δ 0 then by λmin (Sn Sn0 /μ2n ) ≥ ³ ´ λmin S˜n S˜n0 ≥ C we have
° ° ° ° ° ° ° ° °ˆ ° ° 0 ˆ 0 2 1/2 ° ˆ °Sn (δ − δ0 )/μn ° ≥ λmin (Sn Sn /μn ) °δ − δ0 ° ≥ C °δ − δ0 ° , [24]
p p implying δˆ −→ δ0 . Therefore, it suffices to show that Sn0 (δˆ−δ0 )/μn −→ 0. For HLIM this ˜ = P ε˜i Pij ε˜j /˜ ˆ δ) follows from Lemma A3. For HFUL, note that α ˜ = Q( ε0 ε˜ = op (μ2n /n) i6=j
by Lemma A5, so by the formula for HFUL, α ˆ =α ˜ + Op (1/n) = op (μ2n /n). Thus, the
result for HFUL will follow from the most general result for any α ˆ with α ˆ = op (μ2n /n). For any such α ˆ , by Lemma A4 we have X X Sn0 (δˆ − δ0 )/μn = Sn0 ( Xi Pij Xj0 − α ˆ X 0 X)−1 (Xi Pij εj − α ˆ X 0 ε) /μn i6=j
i6=j
X X = [Sn−1 ( Xi Pij Xj0 − α ˆ X 0 X)Sn−10 ]−1 Sn−1 (Xi Pij εj − α ˆ X 0 ε) /μn i6=j
i6=j
p
= (Hn + op (1))−1 op (1) −→ 0.Q.E.D.
Now we move on to asymptotic normality results. The next result is a central limit theorem that is proven in Chao et. al. (2007). Lemma A6 (Lemma A2 of Chao et al., 2007): If i) P is a symmetric, idempotent matrix with rank(P ) = K, Pii ≤ C < 1; ii) (W1n , U1 , ε1 ), ..., (Wnn , Un , εn ) are indepenP 0 0 dent and Dn = ni=1 E[Win Win ] is bounded; iii) E [Win ] = 0, E[Ui ] = 0, E[εi ] = 0 and P there exists a constant C such that E[kUi k4 ] ≤ C, E[ε4i ] ≤ C; iv) ni=1 E[kWin k4 ] −→ 0; ¢ ¡ P 2 0 2 0 ¯ n def = v) K −→ ∞; then for Σ i6=j Pij E[Ui Ui ]E[εj ] + E[Ui εi ]E[εj Uj ] /K and for any ¯ n c2n > C, sequence of bounded nonzero vectors c1n and c2n such that Ξn = c01n Dn c1n +c02n Σ
it follows that Yn =
n X
Ξ−1/2 ( n
i=1
Let α ˜ (δ) =
P
i6=j
ˆ D(δ) = ∂[
c01n Win + c02n
X i6=j
√ d Ui Pij εj / K) −→ N (0, 1) .
εi (δ)Pij εj (δ)/ε(δ)0 ε(δ) and
X
εi (δ)Pij εj (δ)/2ε(δ)0 ε(δ)]/∂δ =
i6=j
X i6=j
Xi Pij εj (δ) − α ˜ (δ)X 0 ε(δ).
A couple of other intermediate results are also useful. p Lemma A7: If Assumptions 1 - 4 are satisfied and Sn0 (δ¯ − δ0 )/μn −→ 0 then −10 ¯ ˆ δ)/∂δ]S −Sn−1 [∂ D( = Hn + op (1). n
[25]
¯ = y − X δ, ¯ γ¯ = X 0 ε¯/¯ ¯ Then differentiating gives Proof: Let ε¯ = ε(δ) ε0 ε¯, and α ¯=α ˜ (δ). −
X X X ˆ ∂D ¯ = Xi Pij Xj0 − α ¯ X 0 X − γ¯ ε¯i Pij Xj0 − Xi Pij ε¯j γ¯ 0 + 2(¯ ε0 ε¯)¯ αγ¯γ¯0 (δ) ∂δ i6=j i6=j i6=j X 0 0 0 0 ¯ + D( ¯γ, ˆ δ) ˆ δ)¯ = Xi Pij Xj − α ¯ X X + γ¯ D( i6=j
¯ = P Xi Pij ε¯j − (¯ ˆ δ) where the second equality follows by D( ε0 ε¯)¯ αγ¯ . By Lemma A5 we i6=j
have α ¯ = op (μ2n /n). By standard arguments, γ¯ = Op (1) so that Sn−1 γ¯ = Op (1/μn ). Then ¯ = P Xi Pij ε¯j − α ˆ δ) by Lemma A4 and D( ¯ X 0 ε¯ i6=j Sn−1
à X i6=j
Xi Pij Xj0 − α ¯X 0X
!
p
¯ γ 0 S −10 −→ 0, ˆ δ)¯ Sn−10 = Hn + op (1), Sn−1 D( n
The conclusion then follows by T. Q.E.D. Lemma A8: If Assumptions 1-4 are satisfied then for γn = ˜i = Ui − γn εi U ˆ 0) = Sn−1 D(δ
P
i
E[Ui εi ]/
P
i
E[ε2i ] and
n X X √ (1 − Pii )zi εi / n + Sn−1 U˜i Pij εj + op (1). i=1
i6=j
√ Proof: Note that for W = z 0 (P − I)ε/ n by I − P idempotent and E[εε0 ] ≤ CIn we have 0 0 )0 (I − P )(z − ZπKn )/n E[W W 0 ] ≤ Cz 0 (I − P )z/n = C(z − ZπKn n X kzi − πKn Zi k2 /n −→ 0, ≤ CIG i=1
√ so z 0 (P − I)ε/ n = op (1). Also, by M 0
X ε/n =
n X i=1
n X √ √ 0 E[Xi εi ]/n + Op (1/ n), ε ε/n = σi2 /n + Op (1/ n). i=1
Pn
Also, by Assumption 3 i=1 σi2 /n ≥ C > 0. The delta method then gives γ˜ = X 0 ε/ε0 ε = √ ˆ 0 ) = P Xi Pij εj −ε0 ε˜ α(δ0 )˜ γ γn +Op (1/ n). Therefore, it follows by Lemma A1 and D(δ i6=j
[26]
that ˆ 0) = Sn−1 D(δ
X i6=j
X √ ˜i Pij εi − Sn−1 (˜ U zi Pij εj / n + Sn−1 γ − γn )ε0 ε˜ α(δ0 ) i6=j
X X √ √ √ ˜i Pij εj + Op (1/ nμn )op (μ2n /n) Pii zi εi / n + Sn−1 = z P ε/ n − U 0
i
=
i6=j
n X i=1
X √ U˜i Pij εj + op (1).Q.E.D. (1 − Pii )zi εi / n + Sn−1 i6=j
Proof of Theorem 2: Consider first the case where δˆ is HLIM. Then by Theorem p ˆ = 0. Expanding gives ˆ δ) 1, δˆ −→ δ0 . The first-order conditions for LIML are D(
ˆ¡ ¢ ˆ 0 ) + ∂ D δ¯ (δˆ − δ0 ), 0 = D(δ ∂δ
p 0 ¯ where δ¯ lies on the line joining δˆ and δ0 and hence β¯ = μ−1 n Sn (δ − δ0 ) −→ 0. Then by
−10 ¯ ¯ ¯ n = Sn−1 [∂ D( ˆ δ)/∂δ]S ˆ δ)/∂δ Lemma A7, H = HP + op (1). Then ∂ D( is nonsingular n
w.p.a.1 and solving gives −1 ˆ ¯ ˆ δ)/∂δ] ¯ n−1 Sn−1 D(δ ˆ 0 ). Sn0 (δˆ − δ) = −Sn0 [∂ D( D(δ0 ) = −H
Next, apply Lemma A6 with Ui = Ui and √ Win = (1 − Pii )zi εi / n, By εi having bounded fourth moment, and Pii ≤ 1, n X i=1
By Assumption 6, we have
n X £ 4¤ E kWin k ≤ C kzi k4 /n2 −→ 0. i=1
Pn
0 E[Win Win ] −→ ΣP . Let Γ = diag (ΣP , Ψ) and ¶ µ Pn i=1 Win √ P An = . ˜ i6=j Ui Pij εj / K i=1
d
Consider c such that c0 Γc > 0. Then by the conclusion of Lemma A6 we have c0 An −→ p
N(0, c0 Γc). Also, if c0 Γc = 0 then it is straightforward to show that c0 An −→ 0. Then it [27]
follows by the Cramer-Wold device that ¶ µ Pn Win d i=1 √ −→ N(0, Γ), Γ = diag (ΣP , Ψ) . An = P ˜ P ε / K U i6=j i ij j
Next, we consider the two cases. Case I) has K/μ2n bounded. In this case
√ KSn−1 −→ S0 ,
so that √ def Fn = [I, KSn−1 ] −→ F0 = [I, S0 ], F0 ΓF00 = ΣP + S0 ΨS00 . Then by Lemma A8, d
ˆ 0 ) = Fn An + op (1) −→ N(0, ΣP + S0 ΨS00 ), Sn−1 D(δ d ¯ n−1 Sn−1 D(δ ˆ 0 ) −→ N(0, ΛI ). Sn0 (δˆ − δ0 ) = −H
In case II we have K/μ2n −→ ∞. Here √ (μn / K)Fn −→ F¯0 = [0, S¯0 ], F¯0 ΓF¯00 = S¯0 ΨS¯00 √ and (μn / K)op (1) = op (1). Then by Lemma A8, √ √ d ˆ 0 ) = (μn / K)Fn An + op (1) −→ N(0, S¯0 ΨS¯00 ), (μn / K)Sn−1 D(δ √ √ d ¯ n−1 (μn / K)Sn−1 D(δ ˆ 0 ) −→ N(0, ΛII ).Q.E.D. (μn / K)Sn0 (δˆ − δ0 ) = −H The next two results are useful for the proof of consistency of the variance estimator are taken from Chao et. al. (2007). Let μ ¯W n = maxi≤n |E[Wi ]| and μ ¯Y n = maxi≤n |E[Yi ]|. Lemma A9 (Lemma A3 of Chao et al., 2007): If (Wi , Yi ), (i = 1, ..., n) are independent, Wi and Yi are scalars then X i6=j
X √ Pij2 Wi Yj = E[ Pij2 Wi Yj ] + Op ( K(¯ σW n σ ¯Y n + σ ¯W n μ ¯Y n + μ ¯W n σ ¯Y n )). i6=j
Lemma A10 (Lemma A4 of Chao et al., 2007): If Wi , Yi , ηi , are indepen√ √ dent across i with E[Wi ] = ai / n, E[Yi ] = bi / n, |ai | ≤ C, |bi | ≤ C, E[ηi2 ] ≤ C,
[28]
−2 0 V ar(Wi ) ≤ Cμ−2 n , V ar(Yi ) ≤ Cμn , there exists πn such that maxi≤n |ai − Zi πn | −→ 0, √ and K/μ2n −→ 0 then
An = E[
X
Wi Pik ηk Pkj Yj ] = O(1),
i6=j6=k
X
i6=j6=k
p
Wi Pik ηk Pkj Yj − An −→ 0.
ˆ γˆ = X 0 εˆ/ˆ Next, recall that εˆi = Yi − Xi0 δ, ε0 εˆ, γn =
P
i
E[Xi εi ]/
P
i
σi2 and let
˘ i = Sn−1 (Xi − γˆεˆi ), X˙ i = Sn−1 (Xi − γn εi ), X ³ ´ X X ˘iX ˘2 = ˘ i εˆi εˆj X ˘1 = ˘ i Pik εˆ2k Pkj X ˘ j0 , Σ ˘ i0 εˆ2j + X ˘ j0 , Pij2 X Σ X i6=j6=k
Σ˙ 1 =
X
i6=j
X˙ i Pik ε2k Pkj X˙ j0 , Σ˙ 2 =
i6=j6=k
X i6=j
Pij2
³ ´ 0 2 0 ˙ ˙ ˙ ˙ Xi Xi εj + Xi εi εj Xj .
ˆ = Sn0 (δˆ − δ0 ) we have Note that for ∆ ˆ εˆi − εi = −Xi0 (δˆ − δ0 ) = −Xi0 Sn−10 ∆, h i2 2 2 0 ˆ 0 ˆ εˆi − εi = −2εi Xi (δ − δ0 ) + Xi (δ − δ0 ) ,
˘ i − X˙ i = −Sn−1 γˆ (ˆ X εi − εi ) − Sn−1 (ˆ γ − γn )εi ,
ˆ − Sn−1 μn (ˆ γ − γn )(εi /μn ), = Sn−1 γˆ Xi0 Sn−10 ∆ ˘ i εˆi − X˙ i εi = Xi εˆi − γˆ εˆ2i − Xi εi + γn ε2i , X n h io 0 ˆ 0 ˆ 0 ˆ 2 = −Xi Xi (δ − δ0 ) − γˆ −2εi Xi (δ − δ0 ) + Xi (δ − δ0 )
−(ˆ γ − γn )ε2i . ° ° °2 ° ° °° ° °˘ ° ° ° ° ˙ °° ˘ ° ˘ ˘0 ˙ ˙ X − X + 2 X − X ° ° ° ° °Xi Xi − X˙ i X˙ i0 ° ≤ °X i i i i i°
˘ 2 − Σ˙ 2 = op (K/μ2n ). Lemma A11: If the hypotheses of Theorem 3 are satisfied then Σ √ Proof: Note first that Sn / n is bounded so by the Cauchy-Schwartz inequality, √ p kΥi k = kSn zi / nk ≤ C. Let di = C + |εi | + kUi k . Note that γˆ − γn −→ 0 by standard ar° ° ° ° ° ° ° p ° ˆ = kˆ guments. Then for Aˆ = (1 + kˆ γ k)(1 + °δˆ°) = Op (1), and B γ − γn k + °δˆ − δ0 ° −→ 0,
[29]
we have ˆ + εi | ≤ Cdi A, ˆ εi | ≤ |Xi0 (δ0 − δ) kXi k ≤ C + kUi k ≤ di , |ˆ ° ° ° ° ° ° ° ° °˙ ° ° ˘ ° ° −1 ˆ d , ˆεˆi )° ≤ Cμ−1 °Xi ° = °Sn−1 (Xi − γn εi )° ≤ Cμ−1 i °Xi ° = Sn (Xi − γ n n di A, ° ° ° ³° ° ° °´ ° °˘° ° ˙ ° °˘ ° ˘ ˘0 ° ° −2 ˆ γ k kˆ ˙ X X + εi − εi k + kˆ − X γ − γn k |εi | ° ° °Xi Xi − X˙ i X˙ i0 ° ≤ °X ° ° i i i i ° ≤ Cμn di A kˆ 2 ˆ2 ˆ ≤ Cμ−2 n di A B,
¯ ¯ 2 ˆ ¯εˆi − ε2i ¯ ≤ (|εi | + |ˆ εi |) |ˆ εi − εi | ≤ Cd2i AˆB, ° ° ° ¡ ¢° ° °˘ °Xi εˆi − X˙ i εi ° = °Sn−1 Xi εˆi − γˆ εˆ2i − Xi εi + γn ε2i ° ¯ ¯ ¡ ¢ ≤ Cμ−1 kXi k |ˆ εi − εi | + kˆ γ k |ˆ ε2i − ε2i | + ¯ε2i ¯ kˆ γ − γn k n 2 ˆ 2 ˆ2 ˆ ˆ2 ˆ ˆ ≤ Cμ−1 n di (B + A B + B) ≤ Cdi A B, ° ° ° ° ° °˘ ° −1 2 ˆ2 ° ˙ 2 °Xi εˆi ° ≤ Cμn di A , °Xi εi ° ≤ Cμ−1 n di .
Also note that
E
" X i6=j
#
≤ Cμ−2 Pij2 d2i d2j μ−2 n n
X
Pij2 = Cμ−2 n
i,j
X
Pii = Cμ−2 n K.
i
P 2 so that i6=j Pij2 d2i d2j μ−2 n = Op (K/μn ) by the Markov inequality. Then it follows that ° ° ¶ µ °X ° ° °2 ¯ ³ ´° X ¯ 2¯ ° ¯ °˙ ° ¯ 2 ° ° ° ˘ ˘0 2 0 2 0 2 2 0° 2¯ ¯ ˘ ˙ ˙ ¯ ˘ ˙ ˙ Pij Xi Xi εˆj − Xi Xi εj ° ≤ Pij εˆj °Xi Xi − Xi Xi ° + °Xi ° εˆj − εj ° ° ° i6=j i6=j X ¡ ¢ ˆ + AˆB) ˆ = op K/μ2n . ≤ Cμ−2 Pij2 d2i d2j (Aˆ4 B n i6=j
We also have ° ° °X °° ° ° °° °´ ³ ´° ³° X ° °° ˘ ° ° ˙ °° ˘ ° ° 2 0 2 ° ˘ ˙ ˙ ˙ ˘ ˘ ˙ X + X X ≤ P ε ˆ ε ˆ − X ε ε P ε ˆ ε ˆ − X ε ε ε ˆ − X ε X X X X ° j j° ° i i ° i i j j i i j j ° j j° i i° ij ij ° i i ° ° j j ° ° i6=j i6=j µ ¶ X K −2 2 2 2 2 ˆ2 ˆ ˆ ≤ Cμn . Pij di dj (1 + A )A B = op μ2n i6=j The conclusion then follows by the triangle inequality. Q.E.D. ˘ 1 − Σ˙ 1 = op (K/μ2n ). Lemma A12: If the hypotheses of Theorem 3 are satisfied then Σ Proof: Note first that ¡ √ ¢0 ˆ = −Di0 ∆, ˆ εˆi − εi = −Xi0 (δˆ − δ0 ) = −Xi0 Sn−10 Sn0 (δˆ − δ0 ) = − zi / n + Sn−1 Ui ∆ [30]
√ ˆ = Sn0 (δˆ − δ0 ). Also where Di = zi / n + Sn−1 Ui and ∆ h i2 εˆ2i − ε2i = −2εi Xi0 (δˆ − δ0 ) + Xi0 (δˆ − δ0 ) ,
˘ i − X˙ i = −ˆ ˆ − Sn−1 μn (ˆ γ εˆi + γn εi = Sn−1 γˆDi0 ∆ γ − γn ) εi /μn . X
˘ 1 − Σ˙ 1 = P7 Tr where We now have Σ r=1 ³ ´ ³ ´0 ³ ´0 X X ¡ ¢ ¡ ¢ ˘ i − X˙ i Pik εˆ2k − ε2k Pkj X ˘ j − X˙ j , T2 = ˘ j − X˙ j X˙ i Pik εˆ2k − ε2k Pkj X X T1 = i6=j6=k
T3 =
i6=j6=k
T6 =
i6=j6=k
´ ³ ´0 ´ X ³ X ³ ˘ i − X˙ i Pik ε2k Pkj X ˘ j − X˙ j , T4 = T20 , T5 = ˘ i − X˙ i Pik ε2k Pkj X˙ j0 , X X X
i6=j6=k
i6=j6=k
¡ ¢ X˙ i Pik εˆ2k − ε2k Pkj X˙ j0 , T7 = T50 .
From the above expression for εˆ2i − ε2i we see that T6 is a sum of terms of the form 0 ˙ ˙0 ˆP ˆ p B i6=j6=k Xi Pik ηi Pkj Xj where B −→ 0 and ηi is either a component of −2εi Xi or of Xi Xi . P By Lemma A10 we have i6=j6=k X˙ i Pik ηi Pkj X˙ j0 = Op (1), so by the triangle inequality p
T6 −→ 0. Also, note that X X ˆ0 Di Pik ε2k Pkj X˙ j0 + Sn−1 μn (ˆ γ − γn ) (εi /μn )Pik ε2k Pkj X˙ j0 . T5 = Sn−1 γˆ ∆ i6=j6=k
i6=j6=k
√ √ p ˆ 0 −→ ˙ Note that Sn−1 γˆ∆ 0, E [Di ] = zi / n, V ar(Di ) = O(μ−2 n ), E[Xi ] = zi / n, and P 2 ˙ = O(μ−2 ˙0 V ar(X) n ). Then by Lemma A10 it follows that i6=j6=k Di Pik εk Pkj Xj = Op (1) 2 ˙0 p ˆ0 P so that the Sn−1 γˆ∆ i6=j6=k Di Pik εk Pkj Xj −→ 0. A similar argument applied to the second p
p
term and the triangle inequality then give T5 −→ 0. Also T7 = T50 −→ 0.
Next, analogous arguments apply to T2 and T3 , except that there are four terms in each of them rather than two, and also to T1 except there are eight terms in T1 . For brevity we omit details. Q.E.D. Lemma A13: If the hypotheses of Theorem 3 are satisfied then ³ ´ X X ˜i U˜i0 ]σj2 + E[U ˜i εi ]E[εj U˜j0 ] Sn−10 + op (K/μ2n ). Σ˙ 2 = Pij2 zi zi0 σj2 /n + Sn−1 Pij2 E[U i6=j
Proof: Note that V
i6=j
ar(ε2i )
≤ C and μ2n ≤ Cn, so that for uki = e0k Sn−1 Ui ,
© 4 2 ª 4 + X˙ i4 ] ≤ C zik /n + E[u4k ] + zi4 /n2 + E[u4 ] ≤ Cμ−4 E[(X˙ ik X˙ i )2 ] ≤ CE[X˙ ik n , 2 2 −2 εi /n + u2ki ε2i )] ≤ Cn−1 + Cμ−2 E[(X˙ ik εi )2 ] ≤ CE[(zik n ≤ Cμn .
[31]
˜ i = E[U˜i U˜i0 ], Also, we have, for Ω ˜ i Sn−10 , E[X˙ i εi ] = Sn−1 E[U ˜i εi ]. E[X˙ i X˙ i0 ] = zi zi0 /n + Sn−1 Ω Next let Wi be e0j X˙ i X˙ i0 ek for some j and k, so that ˜i0 ]Sn−10 ek + zij zik /n, |E[Wi ]| ≤ Cμ−2 ˜i U E[Wi ] = e0j Sn−1 E[U n . ©¡ √ ¢¡ √ ¢ª V ar(Wi ) = V ar e0j Sn−1 Ui + zij / n e0k Sn−1 Ui + zik / n ≤ C/μ4n + C/nμ2n ≤ C/μ4n .
Also let Yi = ε2i . Then
√ K(¯ σW n σ ¯Y n + σ ¯W n μ ¯Y n + μ ¯W n σ ¯Y n ) ≤ CK 1/2 /μ2n , so applying
Lemma A9 for this Wi and Yi gives X
Pij2 X˙ i X˙ i0 ε2j =
i6=j
X i6=j
³ ´ √ ˜ i Sn−10 σj2 + Op ( K/μ2n ). Pij2 zi zi0 /n + Sn−1 Ω
It follows similarly from Lemma A9 with Wi and Yi equal to elements of X˙ i εi that X
Pij2 X˙ i εi εj X˙ j0 = Sn−1
i6=j
X
√ ˜j0 ]Sn−10 + Op ( K/μ2n ). ˜i εi ]E[εj U Pij2 E[U
i6=j
√ Also, by K −→ ∞ we have Op ( K/μ2n ) = op (K/μ2n ). The conclusion then follows by T. Q.E.D. Lemma A14: If the hypotheses of Theorem 3 are satisfied then Σ˙ 1 =
X
zi Pik σk2 Pkj zj0 /n + op (1).
i6=j6=k
Proof: Apply Lemma A10 with Wi equal to an element of X˙ i , Yj equal to an element of X˙ j , and ηk = ε2k . Q.E.D. Proof of Theorem 3: Note that ˘1 + Σ ˘ 2 )(Sn−1 HS ˆ n−10 )−1 (Σ ˆ n−10 )−1 . Sn0 Vˆ Sn = (Sn−1 HS p
By Lemma A4 we have $S_n^{-1}\hat HS_n^{-1\prime}\stackrel{p}{\longrightarrow}H_P$. Also, note that for $\bar z_i=\sum_jP_{ij}z_j=e_i'Pz$,
$$
\sum_{i\neq j\neq k}z_iP_{ik}\sigma_k^2P_{kj}z_j'/n
=\sum_i\sum_{j\neq i}\sum_{k\notin\{i,j\}}z_iP_{ik}\sigma_k^2P_{kj}z_j'/n
=\Bigl(\sum_i\sum_{j\neq i}\Bigl[\sum_kz_iP_{ik}\sigma_k^2P_{kj}z_j'-z_iP_{ii}\sigma_i^2P_{ij}z_j'-z_iP_{ij}\sigma_j^2P_{jj}z_j'\Bigr]\Bigr)/n
$$
$$
=\Bigl(\sum_k\bar z_k\sigma_k^2\bar z_k'-\sum_{i,k}P_{ik}^2z_iz_i'\sigma_k^2-\sum_iz_iP_{ii}\sigma_i^2\bar z_i'+\sum_iz_iP_{ii}\sigma_i^2P_{ii}z_i'-\sum_j\bar z_j\sigma_j^2P_{jj}z_j'+\sum_jz_jP_{jj}\sigma_j^2P_{jj}z_j'\Bigr)/n
$$
$$
=\sum_i\sigma_i^2\bigl(\bar z_i\bar z_i'-P_{ii}z_i\bar z_i'-P_{ii}\bar z_iz_i'+P_{ii}^2z_iz_i'\bigr)/n-\sum_{i\neq j}P_{ij}^2z_iz_i'\sigma_j^2/n.
$$
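Since the term collection above is easy to get wrong, here is a short numerical check (our own illustration; `n`, `K`, `d`, and the random design are arbitrary) that the first and last displayed expressions agree exactly; the four $\sigma_i^2$ terms collect as $(\bar z_i-P_{ii}z_i)(\bar z_i-P_{ii}z_i)'$, and the common factor $1/n$ cancels from both sides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, d = 30, 6, 2

Z0 = rng.standard_normal((n, K))
P = Z0 @ np.linalg.solve(Z0.T @ Z0, Z0.T)   # projection matrix
z = rng.standard_normal((n, d))             # rows play the role of z_i'
s2 = rng.uniform(0.5, 2.0, size=n)          # sigma_i^2

# Left side: brute-force sum over pairwise distinct (i, j, k).
lhs = np.zeros((d, d))
for i in range(n):
    for j in range(n):
        for k in range(n):
            if i != j and j != k and i != k:
                lhs += P[i, k] * s2[k] * P[k, j] * np.outer(z[i], z[j])

# Right side: the collected form, using
# zbar_i zbar_i' - P_ii z_i zbar_i' - P_ii zbar_i z_i' + P_ii^2 z_i z_i'
#   = (zbar_i - P_ii z_i)(zbar_i - P_ii z_i)'.
zbar = P @ z
rhs = np.zeros((d, d))
for i in range(n):
    v = zbar[i] - P[i, i] * z[i]
    rhs += s2[i] * np.outer(v, v)
for i in range(n):
    for j in range(n):
        if i != j:
            rhs -= P[i, j] ** 2 * s2[j] * np.outer(z[i], z[i])

print(np.max(np.abs(lhs - rhs)))   # ~ 1e-13: the identity is exact
```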
Also, it follows similarly to the proof of Lemma A8 that $\sum_i\|z_i-\bar z_i\|^2/n\le z'(I-P)z/n\longrightarrow 0$. Then by $\sigma_i^2$ and $P_{ii}$ bounded we have
$$\Bigl\|\sum_i\sigma_i^2(\bar z_i\bar z_i'-z_iz_i')/n\Bigr\|\le\sum_i\sigma_i^2\bigl(2\|z_i\|\,\|z_i-\bar z_i\|+\|z_i-\bar z_i\|^2\bigr)/n
\le C\Bigl(\sum_i\|z_i\|^2/n\Bigr)^{1/2}\Bigl(\sum_i\|z_i-\bar z_i\|^2/n\Bigr)^{1/2}+C\sum_i\|z_i-\bar z_i\|^2/n\longrightarrow 0,$$
$$\Bigl\|\sum_i\sigma_i^2P_{ii}(z_i\bar z_i'-z_iz_i')/n\Bigr\|\le\Bigl(\sum_i\sigma_i^4P_{ii}^2\|z_i\|^2/n\Bigr)^{1/2}\Bigl(\sum_i\|z_i-\bar z_i\|^2/n\Bigr)^{1/2}\longrightarrow 0.$$
It follows that
$$\sum_{i\neq j\neq k}z_iP_{ik}\sigma_k^2P_{kj}z_j'/n=\sum_i\sigma_i^2(1-P_{ii})^2z_iz_i'/n-\sum_{i\neq j}P_{ij}^2z_iz_i'\sigma_j^2/n+o(1)=\Sigma_P-\sum_{i\neq j}P_{ij}^2z_iz_i'\sigma_j^2/n+o(1).$$
It then follows by Lemmas A11-A14 and the triangle inequality that
$$\breve\Sigma_1+\breve\Sigma_2=\sum_{i\neq j\neq k}z_iP_{ik}\sigma_k^2P_{kj}z_j'/n+\sum_{i\neq j}P_{ij}^2z_iz_i'\sigma_j^2/n+S_n^{-1}\sum_{i\neq j}P_{ij}^2\Bigl(E[\tilde U_i\tilde U_i']\sigma_j^2+E[\tilde U_i\varepsilon_i]E[\varepsilon_j\tilde U_j']\Bigr)S_n^{-1\prime}+o_p(1)+o_p(K/\mu_n^2)$$
$$=\Sigma_P+KS_n^{-1}\bigl(\Psi+o(1)\bigr)S_n^{-1\prime}+o_p(1)+o_p(K/\mu_n^2)=\Sigma_P+KS_n^{-1}\Psi S_n^{-1\prime}+o_p(1)+o_p(K/\mu_n^2).$$
Then in case I) we have $o_p(K/\mu_n^2)=o_p(1)$, so that
$$S_n'\hat VS_n=H_P^{-1}\bigl(\Sigma_P+KS_n^{-1}\Psi S_n^{-1\prime}\bigr)H_P^{-1}+o_p(1)=\Lambda_I+o_p(1).$$
In case II) we have $(\mu_n^2/K)o_p(1)\stackrel{p}{\longrightarrow}0$, so that
$$\bigl(\mu_n^2/K\bigr)S_n'\hat VS_n=H_P^{-1}\bigl((\mu_n^2/K)\Sigma_P+\mu_n^2S_n^{-1}\Psi S_n^{-1\prime}\bigr)H_P^{-1}+o_p(1)=\Lambda_{II}+o_p(1).$$
Next, consider case I) and note that $S_n'(\hat\delta-\delta_0)\stackrel{d}{\longrightarrow}Y\sim N(0,\Lambda_I)$, $S_n'\hat VS_n\stackrel{p}{\longrightarrow}\Lambda_I$, $c'\sqrt KS_n^{-1\prime}\longrightarrow c'S_0'$, and $c'S_0'\Lambda_IS_0c\neq 0$. Then by the continuous mapping and Slutzky theorems,
$$\frac{c'(\hat\delta-\delta_0)}{\sqrt{c'\hat Vc}}=\frac{c'S_n^{-1\prime}S_n'(\hat\delta-\delta_0)}{\sqrt{c'S_n^{-1\prime}\,S_n'\hat VS_n\,S_n^{-1}c}}=\frac{c'\sqrt KS_n^{-1\prime}S_n'(\hat\delta-\delta_0)}{\sqrt{c'\sqrt KS_n^{-1\prime}\,S_n'\hat VS_n\,S_n^{-1}\sqrt Kc}}\stackrel{d}{\longrightarrow}\frac{c'S_0'Y}{\sqrt{c'S_0'\Lambda_IS_0c}}\sim N(0,1).$$
For case II), $\bigl(\mu_n/\sqrt K\bigr)S_n'(\hat\delta-\delta_0)\stackrel{d}{\longrightarrow}\bar Y\sim N(0,\Lambda_{II})$, $(\mu_n^2/K)S_n'\hat VS_n\stackrel{p}{\longrightarrow}\Lambda_{II}$, $c'\mu_nS_n^{-1\prime}\longrightarrow c'\bar S_0'$, and $c'\bar S_0'\Lambda_{II}\bar S_0c\neq 0$. Then
$$\frac{c'(\hat\delta-\delta_0)}{\sqrt{c'\hat Vc}}=\frac{c'\mu_nS_n^{-1\prime}\bigl(\mu_n/\sqrt K\bigr)S_n'(\hat\delta-\delta_0)}{\sqrt{c'\mu_nS_n^{-1\prime}\,(\mu_n^2/K)S_n'\hat VS_n\,S_n^{-1}\mu_nc}}\stackrel{d}{\longrightarrow}\frac{c'\bar S_0'\bar Y}{\sqrt{c'\bar S_0'\Lambda_{II}\bar S_0c}}\sim N(0,1).\quad\text{Q.E.D.}$$
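The practical content of Theorem 3 is that the studentized statistic $c'(\hat\delta-\delta_0)/\sqrt{c'\hat Vc}$ can be compared with standard normal critical values without knowing which growth regime applies. As a minimal sketch of the same studentization logic in the simplest possible setting (a toy, exactly identified, single-instrument design of our own, not the HFUL/HLIM setting of the theorem), the following simulation checks that a heteroskedasticity-robust IV t-ratio has approximately correct size:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, beta = 500, 2000, 1.0

rejections = 0
for _ in range(reps):
    z = rng.standard_normal(n)                       # single instrument
    u = rng.standard_normal(n)
    # Structural error: correlated with u and heteroskedastic in z.
    e = 0.5 * u + (1.0 + 0.5 * np.abs(z)) * rng.standard_normal(n)
    x = z + u                                        # endogenous regressor
    y = beta * x + e
    b = (z @ y) / (z @ x)                            # exactly identified IV estimator
    ehat = y - b * x
    se = np.sqrt(np.sum((z * ehat) ** 2)) / abs(z @ x)   # robust sandwich SE
    t = (b - beta) / se
    rejections += (abs(t) > 1.96)

print(rejections / reps)   # close to the nominal 0.05
```

With many instruments, the simple sandwich sum above is replaced by the own-observation-corrected terms $\breve\Sigma_1$ and $\breve\Sigma_2$ analyzed in the proof.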
References

Ackerberg, D.A. and P. Devereux (2003). "Improved JIVE Estimators for Overidentified Models with and without Heteroskedasticity," Working Paper, UCLA.

Angrist, J.D., G.W. Imbens, and A. Krueger (1999). "Jackknife Instrumental Variables Estimation," Journal of Applied Econometrics 14, 57-67.

Angrist, J. and A. Krueger (1991). "Does Compulsory School Attendance Affect Schooling and Earnings?" Quarterly Journal of Economics 106, 979-1014.

Bekker, P.A. (1994). "Alternative Approximations to the Distributions of Instrumental Variables Estimators," Econometrica 62, 657-681.

Bekker, P.A. and J. van der Ploeg (2005). "Instrumental Variable Estimation Based on Grouped Data," Statistica Neerlandica 59, 239-267.

Blomquist, S. and M. Dahlberg (1999). "Small Sample Properties of LIML and Jackknife IV Estimators: Experiments with Weak Instruments," Journal of Applied Econometrics 14, 69-88.

Chao, J. and N. Swanson (2004). "Estimation and Testing Using Jackknife IV in Heteroskedastic Regressions with Many Weak Instruments," Working Paper, Rutgers University.

Chao, J. and N. Swanson (2005). "Consistent Estimation with a Large Number of Weak Instruments," Econometrica 73, 1673-1692.

Chao, J., N. Swanson, J. Hausman, W. Newey, and T. Woutersen (2007). "Asymptotic Distribution of JIVE in a Heteroskedastic IV Regression with Many Instruments," Working Paper, Rutgers University.

Davidson, R. and J.G. MacKinnon (2006). "The Case Against JIVE" (with discussion and reply), Journal of Applied Econometrics 21, 827-833.

Donald, S.G. and W.K. Newey (2001). "Choosing the Number of Instruments," Econometrica 69, 1161-1191.

Dufour, J.M. (1997). "Some Impossibility Theorems in Econometrics, with Applications to Structural and Dynamic Models," Econometrica 65, 1365-1388.

Fuller, W.A. (1977). "Some Properties of a Modification of the Limited Information Estimator," Econometrica 45, 939-954.

Hahn, J. and J. Hausman (2002). "A New Specification Test for the Validity of Instrumental Variables," Econometrica 70, 163-189.

Hahn, J., J.A. Hausman, and G.M. Kuersteiner (2004). "Estimation with Weak Instruments: Accuracy of Higher-Order Bias and MSE Approximations," Econometrics Journal 7.

Hahn, J. and A. Inoue (2002). "A Monte Carlo Comparison of Various Asymptotic Approximations to the Distribution of Instrumental Variables Estimators," Econometric Reviews 21, 309-336.

Han, C. and P.C.B. Phillips (2006). "GMM with Many Moment Conditions," Econometrica 74, 147-192.

Hansen, C., J.A. Hausman, and W.K. Newey (2007). "Estimation with Many Instrumental Variables," Journal of Business and Economic Statistics, forthcoming.

Hansen, L.P., J. Heaton, and A. Yaron (1996). "Finite-Sample Properties of Some Alternative GMM Estimators," Journal of Business and Economic Statistics 14, 262-280.

Kunitomo, N. (1980). "Asymptotic Expansions of Distributions of Estimators in a Linear Functional Relationship and Simultaneous Equations," Journal of the American Statistical Association 75, 693-700.

Morimune, K. (1983). "Approximate Distributions of k-Class Estimators When the Degree of Overidentifiability Is Large Compared with the Sample Size," Econometrica 51, 821-841.

Newey, W.K. and F. Windmeijer (2007). "GMM with Many Weak Moment Conditions," Working Paper, MIT.

Phillips, G.D.A. and C. Hale (1977). "The Bias of Instrumental Variable Estimators of Simultaneous Equation Systems," International Economic Review 18, 219-228.

Smith, R.J. (2004). "GEL Criteria for Moment Condition Models," Working Paper, CEMMAP, UCL.

Staiger, D. and J. Stock (1997). "Instrumental Variables Regression with Weak Instruments," Econometrica 65, 557-586.

Stock, J. and M. Yogo (2005). "Asymptotic Distributions of Instrumental Variables Statistics with Many Instruments," Chapter 6 in D.W.K. Andrews and J.H. Stock, eds., Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, Cambridge: Cambridge University Press.

White, H. (1982). "Instrumental Variables Regression with Independent Observations," Econometrica 50, 483-499.
Table 1: Mean Bias

[Monte Carlo comparison of LIML, HLIM, FULL1, HFUL, JIVE, CUE, and JCUE. Rows vary a heteroskedasticity design parameter in {0.00, 0.10, 0.20}, the concentration parameter CP in {8, 16, 32}, and K in {0, 8, 28}; k1 = 0.3, T = 800. Simulation results based on 20,000 replications. The extracted numerical entries are not recoverable and are omitted.]
Table 2: Variance of Estimates

[Same estimators and designs as Table 1; 20,000 replications. Entries omitted.]
Table 3: Mean Square Error

[Same estimators and designs as Table 1; 20,000 replications. Entries omitted.]
Table 4: Median Bias

[Same estimators and designs as Table 1; 20,000 replications. Entries omitted.]
Table 5: Nine Decile Range, .05 to .95

[Same estimators and designs as Table 1; 20,000 replications. Entries omitted.]
Table 6: Interquartile Range, .25 to .75

[Same estimators and designs as Table 1; 20,000 replications. Entries omitted.]
Table 7: Rejection Probabilities

[Same estimators and designs as Table 1; 20,000 replications. Entries omitted.]