Measurement errors and simultaneity - UiO

91 downloads 0 Views 331KB Size Report
Sep 3, 2013 - In this chapter we are more concerned with a non-zero correlation ... simultaneity problem that arises when the regression equation is one of ...
Chapter 9

Measurement errors and simultaneity Erik Biørn and Jayalakshmi Krishnakumar Chapter in:

THE ECONOMETRICS OF PANEL DATA. Fundamentals and recent developments, Ed. by L. M´aty´ as and P. Sevestre. Springer Science, forthcoming

9.1

Introduction

This chapter is concerned with the problem of endogeneity of certain explanatory variables in a regression equation. There are two potential sources of endogeneity in a panel data model with individual and time specific effects : (i) correlation between explanatory variables and specific effects (when treated random) and (ii) correlation between explanatory variables and the residual/idiosyncratic error term. The first case was extensively dealt with in Chapter 4 of this book and hence we will not go into it here. In this chapter we are more concerned with a non-zero correlation between the explanatory variables and the overall error consisting of both the specific effect and the genuine disturbance term. One might call it double endogeneity as opposed to the single endogeneity in the former situation. In this chapter we consider two major causes of this double endogeneity encountered in practical situations. One of them is the presence of measurement errors in the explanatory variables. This will be the object of study of Section 9.2. Another major source is the simultaneity problem that arises when the regression equation is one of several structural equations of a simultaneous model and hence contains current endogenous explanatory variables. Section 9.3 will look into this problem in detail and Section 9.4 concludes the 1

chapter.

9.2

Measurement errors and panel data

A familiar and notable property of the Ordinary Least Squares (OLS) when there are random measurement errors (errors-in-variables, EIV) in the regressors is that the coefficient estimators are inconsistent. In the one regressor case (or the multiple regressor case with uncorrelated regressors) under standard assumptions, the slope coefficient estimator is biased towards zero, often denoted as attenuation. More seriously, unless some ‘extraneous’ information is available, e.g. the existence of valid parameter restrictions or valid instruments for the error-ridden regressors, slope coefficients cannot (in general) be identified from standard data [see Fuller (1987, section 1.1.3)].1 This lack of identification in EIV models, however, relates to uni-dimensional data, i.e., pure (single or repeated) cross-sections or pure time-series. If the variables are observed as panel data, exhibiting two-dimensional variation, it may be possible to handle the EIV identification problem and estimate slope coefficients consistently without extraneous information, provided that the distribution of the latent regressors and the measurement errors satisfy certain weak conditions. Briefly, the reason why the existence of variables observed along two dimensions makes the EIV identification problem easier to solve, is partly (i) the repeated measurement property of panel data, so that the measurement error problem can be reduced by taking averages, which, in turn, may show sufficient variation to permit consistent estimation, and partly (ii) the larger set of other linear data transformations available for estimation. Such transformations, involving several individuals or several periods, may be needed to take account of uni-dimensional ‘nuisance variables’ like unobserved individual or period specific heterogeneity, which are potentially correlated with the regressor. Our focus is on the estimation of linear, static regression equations from balanced panel data with additive, random measurement errors in the regressors by means of methods utilizing instrumental variables (IVs). The panel data available to an econometrician are frequently from individuals, firms, or other kinds of micro units, where not only observation errors in the narrow sense, but also departures between theoretical variable definitions and their observable counterparts in a wider sense may be present. From the panel data literature which disregards the EIV problem we know that the effect of, for example, additive (fixed or random) individual heterogeneity within a linear model can be eliminated by deducting individual means, taking differences over periods, etc. [see Baltagi (2001, Chapter 2) and Hsiao (2003, Section 1.1)]. Such transformations, 1

Identification under non-normality of the true regressor is possible, by utilizing moments of the distribution of the observable variables of order higher than the second, [see Reiersøl (1950)]. Even under non-identification, bounds on the parameters can be established from the distribution of the observable variables [see Fuller (1987, p. 11)]. These bounds may be wide or narrow, depending on the covariance structure of the variables; see Klepper and Leamer (1984), Bekker et al. (1987), and Erickson (1993).

2

however, may magnify the variation in the measurement error component of the observations relative to the variation in the true structural component, i.e., they may increase the ‘noise/signal ratio’. Hence, data transformations intended to ‘solve’ the unobserved heterogeneity problem in estimating slope coefficients may aggravate the EIV problem. Several familiar estimators for panel data models, including the fixed effects within-group and between-group estimators, and the random effects Generalised Least Squares (GLS) estimators will then be inconsistent, the bias depending, inter alia, on the way in which the number of individuals and/or periods tend to infinity and on the heterogeneity of the measurement error process; see Griliches and Hausman (1986) and Biørn (1992, 1996). Such inconsistency problems will not be dealt with here. Neither will we consider the idea of constructing consistent estimators by combining two or more inconsistent ones with different probability limits. Several examples are given in Griliches and Hausman (1986), Biørn (1996), and Wansbeek and Meijer (2000, section 6.9). The procedures to be considered in this section have two basic characteristics: First, a mixture of level and difference variables are involved. Second, the orthogonality conditions derived from the EIV structure – involving levels and differences over one or more than one periods – are not all essential, some are redundant. Our estimation procedures are of two kinds: (A) Transform the equation to differences and estimate it by IV or GMM, using as IVs level values of the regressors and/or regressands for other periods. (B) Keep the equation in level form and estimate it by IV or GMM, using as IVs differenced values of the regressors and/or regressands for other periods. In both cases, the differencing serves to eliminate individual heterogeneity, which is a potential nuisance since it may be correlated with the latent regressor vector. These procedures resemble, to some extent, procedures for autoregressive (AR) models for panel data without measurement errors (mostly AR(1) equations with individual heterogeneity and often with exogenous regressors added) discussed, inter alia, by Anderson and Hsiao (1981, 1982), Holtz-Eakin et al. (1988), Arellano and Bond (1991), Arellano and Bover (1995), Ahn and Schmidt (1995), Blundell and Bond (1998), and Sevestre and Trognon (2007). If the distribution of the latent regressor vector is not time invariant and the second order moments of the measurement errors and disturbances are structured to some extent, a large number of consistent IV estimators of the coefficient of the latent regressor vector exist. Their consistency is robust to potential correlation between the individual heterogeneity and the latent regressor. Serial correlation or non-stationarity of the latent regressor is favourable from the point of view of identification and estimability of the coefficient vector. The literature dealing specifically with panel data with measurement errors is not large. The (A) procedures above extend and modify procedures described in Griliches and Hausman (1986), which is the seminal article on measurement errors in panel data, at least in econometrics. Extensions are discussed in Wansbeek and Koning (1991), Biørn (1992, 1996, 2000, 2003), and Biørn and Klette (1998, 1999), and Wansbeek (2001). Paterno et al. (1996) consider Maximum Likelihood analysis of panel data with measurement errors

3

and is not related to the (A) and (B) procedures to be discussed here.

9.2.1

Model and orthogonality conditions

Consider a panel data set with N (≥ 2) individuals observed in T (≥ 2) periods and a relationship between y (observable scalar) and a (K × 1) vector ξ (latent), yit = c + ξ0it β + αi + uit ,

i = 1, . . . , N ; t = 1, . . . , T,

(9.2.1)

where (yit , ξit ) is the value of (y, ξ) for individual i in period t, c is a scalar constant, β is a (K × 1) vector and αi is a zero (marginal) mean individual effect, which we consider as random and potentially correlated with ξit , and uit is a zero mean disturbance, which may also contain a measurement error in yit . We observe xit = ξit + v it ,

i = 1, . . . , N ; t = 1, . . . , T,

(9.2.2)

where v it is a zero mean (K × 1) vector of measurement errors. Hence, yit = c + x0it β + ²it ,

²it = αi + uit − v 0it β.

(9.2.3)

We can eliminate αi from (9.2.3) by taking arbitrary backward differences ∆yitθ = yit −yiθ , ∆xitθ = xit − xiθ , etc., giving ∆yitθ = ∆x0itθ β + ∆²itθ ,

∆²it = ∆uit − ∆v 0it β.

(9.2.4)

We assume that (ξit , uit , v it , αi ) are independent across individuals [which excludes random period specific components in (ξ it , uit , v it )], and make the following basic orthogonality assumptions: E(v it uiθ ) = E(ξ it uiθ ) = E(αi v it ) = 0K1 , Assumption (A):

E(ξiθ ⊗ v 0it ) = 0KK ,

i = 1, . . . , N, t, θ = 1, . . . , T,

E(αi uit ) = 0, where 0mn denotes the (m × n) zero matrix and ⊗ is the Kronecker product operator. Regarding the temporal structure of the measurement errors and disturbances, we assume either that Assumption (B1):

E(v it v 0iθ ) = 0KK ,

|t − θ| > τ,

Assumption (C1):

E(uit uiθ ) = 0,

|t − θ| > τ,

where τ is a non-negative integer, indicating the order of the serial correlation, or Assumption (B2):

E(v it v 0iθ ) is invariant to t, θ,

t 6= θ,

Assumption (C2):

E(uit uiθ ) is invariant to t, θ,

t 6= θ,

4

which allows for time invariance of the autocovariances. The latter will, for instance, be satisfied if the measurement errors and the disturbances have individual specific components, say v it = v 1i + v 2it , uit = u1i + u2it , where v 1i , v 2it , u1i , and u2it are independent IID processes. The final set of assumptions relate to the distribution of the latent regressor vector ξ it : Assumption (D1):

E(ξit )

is invariant to t,

Assumption (D2):

E(αi ξ it )

is invariant to t,

Assumption (E):

rank(E[ξ ip (∆ξ0itθ )]) = K for some p, t, θ different,

Assumptions (D1) and (D2) hold when ξ it is stationary for all i [(D1) alone imposing mean stationarity]. Assumption (E) imposes non-IID and some form of autocorrelation or non-stationarity on ξit . It excludes, for example, the case where ξ it has an individual specific component, so that ξit = ξ1i + ξ2it , where ξ1i and ξ2it are independent (vector) IID processes. Assumptions (A)–(E) do not impose much structure on the first and second order moments of the uit s, v it s, ξit s and αi s. This has both its pros and cons. It is possible to structure this distribution more strongly, for instance assuming homoskedasticity and normality of uit , vit , and αi , and normality of ξit . Exploiting this stronger structure, e.g., by taking a LISREL of LIML approach, we might obtain more efficient (but potentially less robust) estimators by operating on the full covariance matrix of the yit s and the xit s rather than eliminating the αi s by differencing. Other extensions are elaborated in Section 9.2.7.

9.2.2

Identification and the structure of the second order moments

The distribution of (ξit , uit , v it , αi ) must satisfy some conditions to make identification of β possible. The nature of these conditions can be illustrated as follows. Assume, for simplicity, that this distribution is the same for all individuals and that (A) holds, and let C(ξit , ξiθ ) = Σξξ tθ , E(v it v 0iθ )

=

Σvv tθ ,

E(ξit αi ) = Σξα t , E(uit uiθ ) =

E(αi2 ) = σ αα ,

uu , σtθ

i = 1, . . . , N, t, θ = 1, . . . , T,

where C denotes the covariance matrix operator. It then follows from (9.2.1) and (9.2.2) that the second order moments of the observable variables can be expressed as vv C(xit , xiθ ) = Σξξ tθ + Σtθ , ξα C(xit , yiθ ) = Σξξ tθ β + Σt , ξα 0 0 ξα uu αα C(yit , yiθ ) = β 0 Σξξ tθ β + (Σt ) β + β Σθ + σtθ + σ ,

i = 1, . . . , N, (9.2.5) t, θ = 1, . . . , T.

The identifiability of β from second order moments in general depends on whether or not knowledge of C(xit , xiθ ), C(xit , yiθ ), and C(yit , yiθ ) for all available t and θ is sufficient 5

for obtaining a unique solution for β from (9.2.5), given the restrictions imposed on the ξα uu αα Σξξ tθ s, Σt s, σtθ s, and σ . The answer, in general, depends on T and K. With no further information, the number of elements in C(xit , xiθ ), and C(yit , yiθ ) (all of which can be estimated consistently from corresponding sample moments under weak conditions) equal uu 1 1 the number of unknown elements in Σvv tθ and σtθ , which is 2 KT (KT + 1) and 2 T (T + 1), respectively. Then σ αα cannot be identified, and C(xit , yiθ ) contains the only additional ξα information available for identifying β, Σξξ tθ , and Σt , given the restrictions imposed on the latter two matrices. Consider two extreme cases. First, if T = 1, i.e., if we only have cross-section data, and no additional restrictions are imposed, there is an identification problem for any K. Second, if T > 2 and ξit ∼ IID(µξ , Σξξ ), v it ∼ IID(01,K , Σvv ), uit ∼ IID(0, σ uu ), αi ∼ IID(0, σ αα ), we also have lack of identification in general. We get an essentially similar conclusion when the autocovariances of ξ it are time invariant and it is IID across i. From (9.2.5) we then get C(xit , xiθ ) = δtθ (Σξξ + Σvv ), C(xit , yiθ ) = δtθ Σξξ β,

(9.2.6)

C(yit , yiθ ) = δtθ (β 0 Σξξ β + σ uu ) + σ αα , where δtθ = 1 for t = θ and = 0 for t 6= θ, and so we are essentially in the same situation with regard to identifiability of β as when T = 1. The ‘cross-period’ equations (t 6= θ) then serve no other purpose than identification of σ αα , and whether T = 1 or T > 1 realizations of C(xit , xit ), C(xit , yit ), and C(yit , yit ) are available in (9.2.6) is immaterial to the identifiability of β, Σξξ , Σvv , and σ uu . In intermediate situations, identification may be ensured when T ≥ 2. These examples illustrate that in order to ensure identification of the slope coefficient vector from panel data, there should not be ‘too much structure’ on the second order moments of the latent exogenous regressors along the time dimension, and not ‘too little structure’ on the second order moments of the errors and disturbances along the time dimension.

9.2.3

Moment conditions

A substantial number of (linear and non-linear) moment conditions involving yit , xit , and ²it can be derived from Assumptions (A)–(E). Since (9.2.1)–(9.2.3) and Assumption (A) imply E(xit x0iθ ) = E(ξ it ξ 0iθ ) + E(v it v 0iθ ), E(xit yiθ ) = E(ξ it ξ 0iθ )β + E[ξit (αi + c)], E(yit yiθ ) = c2 + E(αi2 ) + β 0 E(ξit ξ0iθ )β + β 0 E[ξit (αi + c)] + E[(αi + c)ξ0iθ ]β + E(uit uiθ ), E(xit ²iθ ) = E(ξ it αi ) − E(v it v 0iθ )β, E(yit ²iθ ) = β 0 E(ξ it αi ) + E(αi2 ) + E(uit uiθ ),

6

we can derive moment equations involving observable variables in levels and differences: E[xip (∆x0itθ )] = E[ξ ip (∆ξ0itθ )] + E[v ip (∆v 0itθ )],

(9.2.7)

0 E[xip (∆yitθ )]

(9.2.8)

=

E[(∆xipq )yit ] =

E[ξ ip (∆ξ0itθ )]β, E[(∆ξ ipq )ξ 0it ]β +

E[(∆ξ ipq )(αi + c)],

(9.2.9)

as well as moment equations involving observable variables and errors/disturbances: E[xip (∆²itθ )] = − E[v ip (∆v 0itθ )]β,

(9.2.10)

E[yip (∆²itθ )] = E[uip (∆uitθ )],

(9.2.11)

E[(∆xipq )²it ] = E[(∆ξipq )αi ] − E[(∆v ipq )v 0it ]β,

(9.2.12)

0

E[(∆yipq )²it ] = β E[(∆ξipq )αi ] + E[(∆uipq )uit ],

t, θ, p, q = 1, . . . , T. (9.2.13)

Not all of the equations in (9.2.7)–(9.2.13), whose number is substantial even for small T , are, of course, independent. Depending on which (B), (C), and (D) assumptions are valid, some terms on the right hand side of (9.2.9)–(9.2.13) will vanish. Precisely, if T > 2, then (9.2.3), (9.2.5), and (9.2.10)–(9.2.13) imply the following moment conditions, or orthogonality conditions (OC), on the observable variables and the errors and disturbances (B2), or (B1) with |t − p|, |θ − p| > τ, t 6= θ =⇒ E[xip (∆²itθ )] = E[xip (∆yitθ )] − E[xip (∆x0itθ )]β = 0K1 .

(9.2.14)

(C2), or (C1) with |t − p|, |θ − p| > τ, t 6= θ =⇒ E[yip (∆²itθ )] = E[yip (∆yitθ )] − E[yip (∆x0itθ )]β = 0.

(9.2.15)

(D1), (D2) and (B2), or (B1) with |t − p|, |t − q| > τ, p 6= q =⇒ E[(∆xipq )²it ] = E[(∆xipq )yit ] − E[(∆xipq )xit ]β = 0K1 .

(9.2.16)

(D1), (D2), and (C2), or (C1) with |t − p|, |t − q| > τ, p 6= q =⇒ E[(∆yipq )²it ] = E[(∆yipq )yit ] − E[(∆yipq )x0it ]β = 0.

(9.2.17)

The treatment of the intercept term c in constructing (9.2.16) and (9.2.17) needs a comment. When the mean stationarity assumption (D1) holds, using IVs in differences annihilates c in the moment equations, since then E(∆xipq ) = 0K1 and E(∆yipq ) = 0. If, however, we relax (D1), which is unlikely to hold in many practical situations, we get E[(∆xipq )²it ] = E[(∆xipq )yit ] − E[∆xipq ]c − E[(∆xipq )x0it ]β = 0K1 , E[(∆yipq )²it ] = E[(∆yipq )yit ] − E[∆yipq ]c − E[(∆yipq )x0it ]β = 0. Using E(²it ) = E(yit ) − c − E(x0it )β = 0 to eliminate c leads to the following modifications of (9.2.16) and (9.2.17): (D1), (D2) and (B2), or (B1) with |t − p|, |t − q| > τ, p 6= q, =⇒ E[(∆xipq )²it ] = E[(∆xipq )(yit − E(yit ))] − E[(∆xipq )(x0it − E(x0it ))]β = 0K1 . (D1), (D2), and (C2), or (C1) with |t − p|, |t − q| > τ, p 6= q, =⇒ E[(∆yipq )²it ] = E[(∆yipq )(yit − E(yit ))] − E[(∆yipq )(x0it − E(x0it ))]β = 0. 7

To implement these modified OCs in the GMM procedures to be described below for the level equation, we could replace E(yit ) and E(xit ) by corresponding global or period specific sample means. The conditions in (9.2.14)–(9.2.17) are not all independent. Some are redundant, since they can be derived as linear combinations of other conditions.2 We confine attention to (9.2.14) and (9.2.16), since (9.2.15) and (9.2.17) can be treated similarly. When τ = 0, the total number of OCs in both (9.2.14) and (9.2.16) is 12 KT (T −1)(T −2). Below, we prove that (a) When (B2) and (C2), or (B1) and (C1) with τ = 0, are satisfied, all OCs in (9.2.14) can be constructed from all admissible OCs relating to equations differenced over one period and a subset of OCs relating to differences over two periods. When (B1) and (C1) are satisfied with an arbitrary τ , all OCs in (9.2.14) can be constructed from all admissible OCs relating to equations differenced over one period and a subset of OCs relating to differences over 2(τ +1) periods. (b) When (B2) and (C2), or (B1) and (C1) with τ = 0, are satisfied all OCs in (9.2.16) can be constructed from all admissible OCs relating to IVs differenced over one period and a subset of IVs differenced over two periods. When (B1) and (C1) are satisfied with an arbitrary τ , all OCs in (9.2.16) can be constructed from all admissible OCs relating to IVs differenced over one period and a subset of IVs differenced over 2(τ +1) periods. We denote the non-redundant conditions defined by (a) and (b) as essential OCs. Since (9.2.14) and (9.2.16) are symmetric, we prove only (a) and derive (b) by way of analogy. P Since xip ∆²itθ = xip ( tj=θ+1 ∆²ij,j−1 ), we see that if (hypothetically) all p = 1, . . . , T combined with all t > θ would have given admissible OCs, (9.2.14) for differences over 2, 3, . . . , T − 1 periods could have been constructed from the conditions relating to oneperiod differences only. However, since (t, θ) = (p, p − 1), (p + 1, p) are inadmissible, and [when (B2) holds] (t, θ) = (p + 1, p − 1) is admissible, we have to distinguish between the cases where p is strictly outside and strictly inside the interval (θ, t). From the identities Pt

xip ∆²itθ = xip (

j=θ+1 ∆²ij,j−1 )

xip ∆²itθ = xip (

j=θ+1 ∆²ij,j−1

Pp−1

for p = 1, . . . , θ−1, t + 1, . . . , T,

+ ∆²i,p+1,p−1 +

Pt

j=p+2 ∆²ij,j−1 )for

p = θ+1, . . . , t−1,

when taking expectations, we then obtain Proposition 1: A. When (B2) and (C2) are satisfied, then 2

This redundancy problem is discussed in Biørn (2000). Essential and redundant moment conditions in AR models for panel data are discussed in Ahn and Schmidt (1995), Arellano and Bover (1995), and Blundell and Bond (1995). A general treatment of redundancy of moment conditions in GMM estimation is found in Breusch et al. (1999).

8

(a) E[xip (∆²it,t−1 )] = 0K1 for p = 1, . . . , t−2, t+1, . . . , T ; t = 2, . . . , T are K(T −1)(T −2) essential OCs for equations differenced over one period. (b) E[xit (∆²it+1,t−1 )] = 0K1 for t = 2, . . . , T − 1 are K(T − 2) essential OCs for equations differenced over two periods. (c) The other OCs are redundant: among the 12 KT (T − 1)(T − 2) conditions in (9.2.14), only a fraction 2/(T −1), are essential. B. When (B1) and (C1) are satisfied for an arbitrary τ , then (a) E[xip (∆²it,t−1 )] = 0K1 for p = 1, . . . , t−τ −2, t+τ +1, . . . , T ; t = 2, . . . , T are essential OCs for equations in one-period differences. (b) E[xit (∆²it+τ +1,t−τ −1 )] = 0K1 for t = τ +2, . . . , T −τ −1 are essential OCs for equations in 2(τ +1) period differences. (c) The other OCs in (9.2.14) are redundant. Symmetrically, from (9.2.16) we have Proposition 2: A. When (B2) and (C2) are satisfied, then (a) E[(∆xip,p−1 )²it ] = 0K1 for t = 1, . . . , p−2, p+1, . . . , T ; p = 2, . . . , T are K(T − 1)(T − 2) essential OCs for equations in levels, with IVs differenced over one period. (b) E[(∆xit+1,t−1 )²it ] = 0K1 for t = 2, . . . , T − 1 are K(T − 2) essential OCs for equations in levels, with IVs differenced over two periods. (c) The other OCs are redundant: among the 12 KT (T − 1)(T − 2) conditions in (9.2.16), only a fraction 2/(T −1), are essential. B. When (B1) and (C1) are satisfied for an arbitrary τ , then (a) E[(∆xip,p−1 )²it ] = 0K1 for t = 1, . . . , p−τ −2, p+τ +1, . . . , T ; p = 2, . . . , T are essential OCs for equations in levels, with IVs differenced over one period. (b) E[(∆xit+τ +1,t−τ −1 )²it ] = 0K1 for t = τ +2, . . . , T −τ −1 are essential OCs for equations in levels, with IVs differenced over 2(τ +1) periods. (c) The other OCs in (9.2.16) are redundant. These propositions can be (trivially) modified to include also the essential and redundant OCs in the ys or the ∆ys, given in (9.2.15) and (9.2.17).

9.2.4

Estimators constructed from period means

Several consistent estimators of β can be constructed from differenced period means. These estimators exploit the repeated measurement property of panel data, while the differencing removes the latent heterogeneity. From (9.2.3) we obtain ∆s y¯·t = ∆s x ¯0·t β + ∆s ²¯·t ,

s = 1, . . . , T −1; t = s+1, . . . , T,

(¯ y·t − y¯) = (¯ x·t − x ¯)0 β + (¯ ²·t − ²¯), 9

t = 1, . . . , T,

(9.2.18) (9.2.19)

P

P P

P

P P

where y¯·t = N1 i yit , y¯ = N1T i t yit , x ¯·t = N1 i xit , x ¯ = N1T i t xit , etc. and ∆s denotes differencing over s periods. When (A) is satisfied, the (weak) law of large numbers implies, under weak conditions [confer McCabe and Tremayne (1993, section 3.5)],3 that plim(¯ ²·t ) = 0, plim(¯ x·t − ξ¯·t ) = 0K1 , so that plim[¯ x·t ²¯·t ] = 0K1 even if 1 PN plim[ N i=1 xit ²it ] 6= 0K1 . From (9.2.18) and (9.2.19) we therefore get plim[(∆s x ¯·t )(∆s y¯·t )] = plim[(∆s x ¯·t )(∆s x ¯0·t )]β,

(9.2.20)

0

(9.2.21)

plim[(¯ x·t − x ¯)(¯ y·t − y¯)] = plim[(¯ x·t − x ¯)(¯ x·t − x ¯) ]β.

¯ ξ¯ − ξ) ¯ 0 ] have rank K, which is Hence, provided that E[(∆s ξ¯·t )(∆s ξ¯·t )0 ] and E[(ξ¯·t − ξ)( ·t ensured by Assumption (E), consistent estimators of β can be obtained by applying OLS on (9.2.18) and (9.2.19), which give, respectively,  b  β ∆s =

T X

−1  0

(∆s x ¯·t ) (∆s x ¯·t )

t=s+1

b β BP =



T X

(∆s x ¯·t ) (∆s y¯·t ) , s = 1, . . . , T −1, (9.2.22)

t=s+1

" T X

#−1 " T X

t=1

t=1

(¯ x·t − x ¯)(¯ x·t − x ¯)0



#

(¯ x·t − x ¯)(¯ y·t − y¯) .

(9.2.23)

The latter is the ‘between period’ (BP) estimator. The consistency of these estimators simply relies on the fact that averages of a large number of repeated measurements of an error-ridden variable give, under weak conditions, an error-free measure of the true average at the limit, provided that this average shows variation along the remaining dimension, i.e., across periods. Shalabh (2003) also discusses consistent coefficient estimation in measurement error models with replicated observations. The latter property is ensured by Assumption (E). A major problem with these estimators is their low potential efficiency, as none of them exploits the between individual variation in the data, which often is the main source of variation. Basic to these conclusions is the assumption that the measurement error has no period specific component, which, roughly speaking, means that it is ‘equally difficult’ to measure ξ correctly in all periods. If such a component is present, it will not vanish when taking plims of period means, i.e., plim(¯ v ·t ) will no longer be zero, (9.2.20) and (9.2.21) will no b b longer hold, and so β ∆s and β BP will be inconsistent.

9.2.5

GMM estimation and testing in the general case

We first consider the GMM principle in general, without reference to panel data and measurement error situations. Assume that we want to estimate the (K × 1) coefficient vector β in the equation4 y = xβ + ², (9.2.24) 3

Throughout plim denotes probability limits when N goes to infinity and T is finite. We here, unlike in Sections 9.2.1–9.2.4, let the column number denote the regressor and the row number the observation. Following this convention, we can express the following IV and GMM estimators in the more common format when going from vector to matrix notation. 4

10

where y and ² are scalars and x is a (1 × K) regressor vector. There exists an instrument vector z, of dimension (1 × G), for x (G ≥ K), satisfying the OCs E(z 0 ²) = E[z 0 (y − xβ)] = 0G1 .

(9.2.25)

We have n observations on (y, x, z), denoted as (yj , xj , z j ), j = 1, . . . , n, and define the vector valued (G × 1) function of corresponding empirical means, g n (y, x, z; β) =

1 n

Pn

0 j=1 z j (yj

− xj β).

(9.2.26)

It may be considered the empirical counterpart to E[z 0 (y −xβ)] based on the sample. The essence of GMM is to choose as an estimator for β the value which brings the value of g n (y, x, z; β) as close to its theoretical counterpart, 0G1 , as possible. If G = K, an exact solution to g n (y, x, z; β) = 0G1 exists and is the simple IV estimator β∗ = [

P

P j

z 0j xj ]−1 [

j

z 0j yj ].

If G > K, which is the most common situation, GMM solves the estimation problem by minimizing a distance measure represented by a quadratic form in gn (y, x, z; β) for a suitably chosen positive definit (G × G) weighting matrix W n , i.e., β ∗GM M = argminβ [g n (y, x, z; β)0 W n g n (y, x, z; β)].

(9.2.27)

All estimators obtained in this way are consistent. A choice which leads to an asymptotically efficient estimator of β, is to set this weighting matrix equal (or proportional) P to the inverse of (an estimate of) the (asymptotic) covariance matrix of n1 nj=1 z 0j ²j ; see, e.g., Davidson and MacKinnon (1993, Theorem 17.3) and Harris and M´aty´ as (1999, section 1.3.3). If ² is serially uncorrelated and homoskedastic, with variance σ²2 , the appropriate choice P is simply W n = [n−2 σ²2 nj=1 z 0j z j ]−1 . The estimator obtained from (9.2.27) is then P

b β GM M = [(

P

j

0 −1 P 0 −1 j z j z j ) ( j z j xj )] P 0 P 0 P [( j xj z j )( j z j z j )−1 ( j z 0j yj )],

x0j z j )( ×

(9.2.28)

which is the standard Two-Stage Least Squares (2SLS) estimator. If ²j has an unspecified heteroskedasticity or has a more or less strictly specified autocorrelation, we can reformulate the OCs in an appropriate way, as will be exemplified below. Both of these properties are essential for the application of GMM to panel data. To operationalize the latter method in the presence of unknown heteroskedasticity, we first construct consistent residuals ²bj , usually from (9.2.28), which we consider as a first step GMM estimator, c n = [n−2 P z 0 ²b2 z ]−1 ; see White (1984, sections IV.3 and VI.2). and estimate W n by W j j j j Inserting this into (9.2.27) gives P

e β GM M = [(

j

P

0 b2 z )−1 (P z 0 x )]−1 j zj ² j j j j j P 0 P 0 2 P [( j xj z j )( j z j ²bj z j )−1 ( j z 0j yj )].

x0j z j )( ×

11

(9.2.29)

This second step GMM estimator is in a sense an optimal GMM estimator in the presence of unspecified error/disturbance heteroskedasticity. The validity of the orthogonality condition (9.2.25) can be tested by the Sargan-Hansen statistic [confer Hansen (1982), Newey (1985), and Arellano and Bond (1991)], corresponde ing to the asymptotically efficient estimator β GM M : P

J = [(

P b0j z j )( j j²

P

z 0j ²b2j z j )−1 (

j

z 0j ²bj )]−1 .

Under the null, J is asymptotically distributed as χ2 with a number of degrees of freedom equal to the number of overidentifying restrictions, i.e., the number of orthogonality conditions less the number of coefficients estimated under the null. b e The procedures for estimating standard errors of β GM M and β GM M can be explained as follows. Express (9.2.24) and (9.2.25) as y = Xβ + ²,

E(Z 0 ²) = 0,

E(²) = 0,

E(²²0 ) = Ω,

where y, X, Z, and ² correspond to y, x, z and ², and the n observations are placed along the rows. The two generic GMM estimators (9.2.28) and (9.2.29) have the form b = [X 0 P X]−1 [X 0 P y], β Z Z e = [X 0 P (Ω)X]−1 [X 0 P (Ω)y], β Z Z

P Z = Z(Z 0 Z)−1 Z 0 , P Z (Ω) = Z(Z 0 ΩZ)−1 Z 0 .

b and Let the residual vector obtained from the former be b² = y − X β

S XZ = S 0ZX = S Z ΩZ =

X 0Z , n

Z 0 ΩZ , n

Z 0Z , n 0 0 Z ²² Z S Z ²²Z = , n S ZZ =

²0 Z , n 0 bb0 Z ²² Z S Z b²b²Z = . n

S ²Z = S 0Z² =

Inserting for y in the expressions for the two estimators gives "

#

0 √ b √ −1 Z ² −1 √ n(β−β) = n[X 0 P Z X]−1 [X 0 P Z ²] = [S XZ S −1 , S ] S S XZ ZZ ZZ ZX n # " 0 √ e √ −1 −1 Z ² 0 −1 0 −1 n(β−β) = n[X P Z (Ω)X] [X P Z (Ω)²] = [S XZ S ZΩZ S ZX ] S XZ S ZΩZ √ , n

and hence −1 −1 −1 −1 −1 −1 b − β)(β b − β)0 = [S n(β XZ S ZZ S ZX ] [S XZ S ZZ S Z ²²Z S ZZ S ZX ][S XZ S ZZ S ZX ] , −1 −1 −1 −1 −1 −1 e − β)(β e − β)0 = [S n(β XZ S ZΩZ S ZX ] [S XZ S ZΩZ S Z ²²Z S ZΩZ S ZX ][S XZ S ZΩZ S ZX ] .

√ b √ e The asymptotic covariance matrices of nβ and nβ can then, under suitable regularity conditions, be written as [see Bowden and Turkington (1984, pp. 26, 69)] √ b b − β)(β b − β)0 ] = plim[n(β b − β)(β b − β)0 ], aV( n β) = lim E[n(β √ e e − β)(β e − β)0 ] = plim[n(β e − β)(β e − β)0 ]. aV( n β) = lim E[n(β 12

Since S Z ²²Z and S ZΩZ coincide asymptotically, we get, letting bars denote plims, √ b −1 −1 −1 −1 aV( n β) = [S XZ S ZZ S ZX ]−1 [S XZ S ZZ S ZΩZ S ZZ S ZX ][S XZ S ZZ S ZX ]−1 , √ e −1 aV( n β) = [S XZ S ZΩZ S ZX ]−1 . Replacing the plims S XZ , S ZX , S ZZ and S ZΩZ by their sample counterparts, S XZ , S ZX , S ZZ and S Zˆ²²ˆZ and dividing by n, we get the following estimators of the asymptotic b and β: e covariance matrices of β 1 −1 −1 −1 [S S −1 S ]−1 [S XZ S −1 ˆ² ˆ Z S ZZ S ZX ][S XZ S ZZ S ZX ] ZZ S Z ² n XZ ZZ ZX = [X 0 P Z X]−1 [X 0 P Z b²b²0 P Z X][X 0 P Z X]−1 , d e = 1 [S V(β) S −1 S ]−1 = [X 0 Z(Z 0 b²b²0 Z)−1 Z 0 X]−1 = [X 0 P Z (b²b²0 )X]−1 . n XZ Z ²ˆ ²ˆ Z ZX d

b = V(β)

These are the generic expressions for estimating variances and covariances of the GMM e in practice, we replace P (Ω) by estimators (9.2.28) and (9.2.29). When calculating β Z 0 0 0 0 P Z (b²b² ) = Z(Z b²b² Z)−1 Z [see White (1982, 1984)].

9.2.6

Estimation by GMM, combining differences and levels

Following this general description of the GMM, we can construct estimators of β by replacing the expectations in (9.2.14)–(9.2.17) by sample means taken over i and minimizing their distances from the zero vector. There are several ways in which this idea can be operationalized. We can (i) Estimate equations in differences, with instruments in levels, using (9.2.14) and/or (9.2.15) for (a) one (t, θ) and one p, (b) one (t, θ) and several p, or (c) several (t, θ) and several p jointly. (ii) Estimate equations in levels, with instruments in differences, using (9.2.16) and/or (9.2.17) for (a) one t and one (p, q), (b) one t and several (p, q), or (c) several t and several (p, q) jointly. In cases (i.a) and (ii.a), we obtain an empirical distance equal to the zero vector, so no minimization is needed. This corresponds, formally, to the situation with ‘exact identification’ (exactly as many OCs as needed) in classical IV estimation. In cases (i.b), (i.c), (ii.b), and (ii.c), we have, in a formal sense, ‘overidentification’ (more than the necessary number of OCs), and therefore construct ‘compromise estimators’ by minimizing appropriate quadratic forms in the corresponding empirical distances. We now consider cases (a), (b), and (c) for the differenced equation and the level equation.

(a) Simple period specific IV estimators Equation in differences, IVs in levels. The sample mean counterpart to (9.2.14) and (9.2.15) for one (t, θ, p) gives the estimator b β p(tθ) = [

0 −1 PN i=1 z ip (∆xitθ )] [ i=1 z ip (∆yitθ )],

PN

13

(9.2.30)

where z ip = xip or equal to xip with one element replaced by yip . Equation in levels, IVs in differences. The sample mean counterpart to (9.2.16) and (9.2.17) for one (t, p, q) gives the estimator b β (pq)t = [

PN

0 −1 PN i=1 (∆z ipq )xit ] [ i=1 (∆z ipq )yit ],

(9.2.31)

where ∆z ipq = ∆xipq or equal to ∆xipq with one element replaced by ∆yipq . Using (9.2.14)–(9.2.17) we note that • When z ip = xip (p 6= θ, t) and ∆z ipq = ∆xipq (t 6= p, q), Assumption (B2) is b b necessary for consistency of β p(tθ) and β (pq)t . If yip is included in z ip (p 6= θ, t), and ∆ypq is included in ∆z ipq (t 6= p, q), Assumption (C2) is also necessary for b b consistency of β p(tθ) and β (pq)t . b • Assumptions (D1) and (D2) are necessary for consistency of β (pq)t , but they are not b necessary for consistency of β p(tθ) .

Since the correlation between the regressors and the instruments, say between z ip and ∆xitθ , may be low, (9.2.30) and (9.2.31) may suffer from the ‘weak instrument problem’, discussed in Nelson and Startz (1990), Davidson and MacKinnon (1993, pp. 217–224), and Staiger and Stock (1997). The following estimators may be an answer to this problem.

(b) Period specific GMM estimators We next consider estimation of β in (9.2.4) for one pair of periods (t, θ), utilizing as IVs for ∆xitθ all admissible xip s, and estimation of β in (9.2.3), for one period (t), utilizing as IVs for xit all admissible ∆xipq s. To formalize this, we define the selection and differencing matrices  



((T −2) × T ) matrix

 obtained by deleting from  P tθ =  the T -dimensional   identity matrix

d.21 ..

   d  t−1,t−2 Dt =   dt+1,t−1  dt+2,t+1  ..  .

  ,  

rows t and θ

     ,    

t, θ = 1, . . . , T,

dT,T −1

where dtθ is the (1 × T ) vector with element t equal to 1, element θ equal to −1 and zero otherwise, so that D t is the are one-period [(T −2) × T )] differencing matrix, except that dt,t−1 and dt+1,t are replaced by their sum, dt+1,t−1 .5 We use the notation y i· = (yi1 , . . . , yiT )0 , X i· = (xi1 , . . . , xiT )0 , y i(tθ) = P tθ y i· ,

X i(tθ) = P tθ X i· ,

xi(tθ) = vec(X i(tθ) )0 ,

∆y i(t) = D t y i· ,

∆X i(t) = D t X i· ,

∆xi(t) = vec(∆X i(t) )0 ,

etc. Here X i(tθ) denotes the [(T −2) × K] matrix of x levels obtained by deleting rows t and θ from X i· , and ∆X i(t) denotes the [(T −2) × K] matrix of x differences obtained 5

The two-period difference is effective only for t = 2, . . . , T −1.

14

by stacking all one-period differences between rows of X i· not including period t and the single two-period difference between the columns for periods t + 1 and t − 1. The vectors y i(tθ) and ∆y i(t) are constructed from y i· in a similar way. Stacking y 0i(tθ) , ∆y 0i(t) , xi(tθ) , and ∆xi(t) , by individuals, we get 

Y (tθ)







0 0  y 1(tθ)   ∆y 1(t) .. ..    =  , ∆Y (t) =  . .   

y 0N (tθ)

∆y 0N (t)



   , X (tθ) 







 x1(tθ)   ∆x1(t) .. ..    =  , ∆X (t) =  . .   

xN (tθ)

  , 

∆xN (t)

which have dimensions (N × (T −2)), (N × (T −2)), (N × (T −2)K), and (N × (T −2)K), respectively. These four matrices contain the IVs to be considered below. Equation in differences, IVs in levels. Write (9.2.4) as ∆y tθ = ∆X tθ β + ∆²tθ , where ∆y tθ = (∆y1tθ , . . . , ∆yN tθ )0 , ∆X tθ = (∆x1tθ , . . . , ∆xN tθ )0 , etc. Using X (tθ) as IV matrix for ∆X tθ , we obtain the following estimator of β, specific to period (t, θ) differences and utilizing all admissible x level IVs, ·

³

0 0 b β x(tθ) = (∆X tθ ) X (tθ) X (tθ) X (tθ)

·

´−1

¸−1

X 0(tθ) (∆X tθ )

³

× (∆X tθ )0 X (tθ) X 0(tθ) X (tθ) =

·h P

0 i (∆xitθ )xi(tθ)

×

·h P

ihP

0 i xi(tθ) xi(tθ)

0 i (∆xitθ )xi(tθ)

ihP

´−1

i−1 hP

0 i xi(tθ) xi(tθ)

¸

X 0(tθ) (∆y tθ ) i¸−1

0 i xi(tθ) (∆xitθ )

i−1 hP



i xi(tθ) (∆yitθ )

.

(9.2.32)

It exists if X 0(tθ) X (tθ) has rank (T − 2)K, which requires N ≥ (T − 2)K. This GMM estimator, which exemplifies (9.2.28), minimizes the quadratic form: µ

1 0 X ∆² N (tθ) tθ

¶0 µ

1 X0 X N 2 (tθ) (tθ)

¶−1 µ



1 0 . X ∆² N (tθ) tθ

The weight matrix (N −2 X 0(tθ) X (tθ) )−1 is proportional to the inverse of the (asymptotic) covariance matrix of N −1 X 0(tθ) ∆²tθ when ∆²itθ is IID across i, possibly with a variance b depending on (t, θ). The consistency of β x(tθ) relies on Assumptions (B2) and (E). b Interesting modifications of β x(tθ) are:

(1) If var(∆²itθ ) = ωitθ varies with i and is known, we can increase the efficiency of (9.2.32) by replacing x0i(tθ) xi(tθ) by x0i(tθ) ωitθ xi(tθ) , which gives an asymptotically P optimal GMM estimator.6 Estimation of i x0i(tθ) ωitθ xi(tθ) for unknown ωitθ proceeds as in (9.2.29). 6

For a more general treatment of asymptotic efficiency in estimation with moment conditions, see Chamberlain (1987) and Newey and McFadden (1994).

15

. (2) Instead of using X (tθ) as IV matrix for ∆X tθ , as in (9.2.32), we may use (X (tθ) .. Y (tθ) ). Equation in levels, IVs in differences. Write (9.2.3) as y t = ceN + X t β + ²tθ , where eN is the N -vector of ones, y t = (y1t , . . . , yN t )0 , X t = (x1t , . . . , xN t )0 , etc. Using ∆X (t) as IV matrix for X t , we get the following estimator of β, specific to period t levels, utilizing all admissible x difference IVs, ·

b β

x(t)

=

X 0t (∆X 0(t) )

³

´−1

(∆X 0(t) )(∆X (t) )

·

X 0t (∆X (t) )

× =

·h P

0 i xit (∆xi(t) )

×

³

(∆X (t) ) X t ´−1

0

(∆X (t) ) (∆X (t) )

ihP

·h P

¸

ihP

0 i (∆xi(t) )(∆xi(t) )

0

(∆X (t) ) y t

i−1 hP

0 i (∆xi(t) )(∆xi(t) )

0 i xit (∆xi(t) )

¸−1

0

0 i (∆xi(t) )xit

i¸−1

i−1 hP

i (∆xi(t) )yit



.

(9.2.33)

It exists if (∆X (t) )0 (∆X (t) ) has rank (T −2)K, which again requires N ≥ (T −2)K. This GMM estimator, which also exemplifies (9.2.28), minimizes the quadratic form: µ

1 (∆X (t) )0 ²t N

¶0 ·

¸−1 µ

1 (∆X (t) )0 (∆X (t) ) N2



1 (∆X (t) )0 ²t . N

The weight matrix [N −2 (∆X (t) )0 (∆X (t) )]−1 is proportional to the inverse of the (asymptotic) covariance matrix of N −1 (∆X (t) )0 ²t when ²it is IID across i, possibly with a variance b depending on t. The consistency of β x(t) relies on (B3), (D1), (D2), and the validity of (E3) for all (p, q). b Interesting modifications of β x(t) are: (1) If var(²it ) = ωit varies with i and is known, we can increase the efficiency of (9.2.33) by replacing (∆xi(t) )0 (∆xi(t) ) by (∆xi(t) )0 ωit (∆xi(t) ), which gives an asymptotically P optimal GMM estimator. Estimation of i (∆xi(t) )0 ωit (∆xi(t) ) for unknown ωit proceeds as in (9.2.29). . (2) Instead of using ∆X as IV matrix for X , as in (9.2.33), we may use (∆X .. ∆Y ). (t)

(t)

t

(t)

If we replace assumptions (B2) and (C2) by (B1) or (C1) with arbitrary τ , we must ensure that the IVs have a lead or lag of at least τ +1 periods to the regressor, to ‘get clear of’ the τ period memory of the MA(τ ) process. Formally, we then replace P tθ and D t by7 

    P tθ(τ ) =      7

matrix obtained by deleting from the T -dimensional identity matrix rows θ − τ, . . . , θ + τ and t − τ, . . . , t + τ



    ,    



d.21 ..

   d  t−τ −1,t−τ −2 D t(τ ) =   dt+τ +1,t−τ −1  dt+τ +2,t+τ +1  ..  .

dT,T −1

The dimension of these matrices depends in general on τ .

16



     , t, θ = 1, . . . , T,    

and otherwise proceed as above.

(c) Composite GMM estimators We finally consider GMM estimation of β when we combine all essential OCs delimited by Propositions 1 and 2. We here assume that either (B1) and (C1) with τ = 0 or (B2) and (B2) are satisfied. If τ > 0, we can proceed as above, but must ensure that the variables in the IV matrix have a lead or lag of at least τ +1 periods to the regressor, to ‘get clear of’ the τ period memory of the MA(τ ) process, confer Part B of Propositions 1 and 2. Equation in differences, IVs in levels. Consider (9.2.5) for all θ = t − 1 and all θ = t − 2. These (T −1) + (T −2) equations stacked for individual i read               





∆yi21 ∆yi32 .. . ∆yi,T,T −1 ∆yi31 ∆yi42 .. . ∆yi,T,T −2

              =            





∆x0i21 ∆x0i32 .. . ∆x0i,T,T −1 ∆x0i31 ∆x0i42 .. . ∆x0i,T,T −2



∆²i21 ∆²i32 .. . ∆²i,T,T −1 ∆²i31 ∆²i42 .. . ∆²i,T,T −2

              β +             

       ,      

(9.2.34)

or, compactly, ∆y i = (∆X i )β + ∆²i . The IV matrix, according to Proposition 1, is the ((2T −3) × KT (T −2)) matrix8 



 xi(21)  0  ..   .   0 Zi =   0   0  ..   .

0

0 xi(32) .. . 0 0 0 .. . 0

··· 0 0 0 ··· 0 0 0 .. .. .. .. . . . . · · · xi(T,T −1) 0 0 ··· 0 xi2 0 ··· 0 0 xi3 .. .. .. .. . . . . ··· 0 0 0

··· 0 ··· 0 .. .. . . ··· 0 ··· 0 ··· 0 .. .. . . · · · xi,T −1

       .      

(9.2.35)

Let ∆y = [(∆y 1 )0 , . . . , (∆y N )0 ]0 ,

∆² = [(∆²1 )0 , . . . , (∆²N )0 ]0 ,

∆X = [(∆X 1 )0 , . . . , (∆X N )0 ]0 , Z = [Z 01 , . . . , Z 0N ]0 . The GMM estimator corresponding to E[Z 0i (∆²i )] = 0T (T −2)K,1 , which minimizes [N −1 (∆²)0 Z](N −2 V )−1 [N −1 Z 0 (∆²)] for V = Z 0 Z, can be written as h

i−1 h

0 0 −1 0 b β Dx = (∆X) Z(Z Z) Z (∆X)

hP

= [

P

0 i (∆X i ) Z i ] [

hP

× [

−1

0 i Z iZ i]

P

0 i (∆X i ) Z i ] [

8

i

(∆X)0 Z(Z 0 Z)−1 Z 0 (∆y) i−1

P

[

0 i Z i (∆X i )] −1

0 i Z iZ i]

P

[

i

0 i Z i (∆y i )] .

(9.2.36)

Formally, we here use different IVs for the (T −1) + (T −2) different equations in (9.2.4), with β as a common slope coefficient.

17

It is possible to include not only the essential OCs, but also the redundant OCs when constructing this GMM estimator. The singularity of Z 0 Z when including all OCs, due to the linear dependence between the redundant and the essential OCs, may be treated by replacing standard inverses in the estimation formulae by generalised (Moore-Penrose) inverses. b , which is shown formally in Biørn and Klette (1998). The resulting estimator is β Dx If ∆² has a non-scalar covariance matrix, a more efficient GMM estimator is obtained for V = V Z(∆²) = E[Z 0 (∆²)(∆²)0 Z], which gives h

i−1 h

−1 0 0 e β Dx = (∆X) ZV Z(∆²) Z (∆X)

We can estimate

1 N V Z(∆²)

i

0 (∆X)0 ZV −1 Z(∆²) Z (∆y) .

(9.2.37)

c = consistently from the residuals obtained from (9.2.37), ∆² i

b , by means of [see White (1984, sections IV.3 and VI.2) and (1986, ∆y i − (∆X i )β Dx section 3)] N c V 1 X Z(∆²) c )(∆² c )0 Z . = Z 0 (∆² (9.2.38) i i i N N i=1 i

Inserting (9.2.38) in (9.2.37), we get the asymptotically optimal (feasible) GMM estimator9 hP

e β Dx = [

i−1 P 0 c c0 0 0 −1 P i (∆X i ) Z i ][ i Z i ∆²i ∆²i Z i ] [ i Z i (∆X i )] hP i P c ∆² c 0 Z ]−1 [P Z 0 (∆y )] . × [ i (∆X i )0 Z i ][ i Z 0i ∆² i i i i i i

(9.2.39)

. These estimators can be modified by extending in (9.2.37) all xi(t,t−1) to (xi(t,t−1) .. y 0i(t,t−1) ) . and all xit to (xit ..y it ), which also exploit the OCs in the ys. Equation in levels, IVs in differences. Consider next the T stacked level equations for individual i [confer (9.2.3)] 





y  .i1    ..  =  yiT









c x  .i1   ..  .  +  ..  β +  c xiT



²i1 ..  . , ²iT

(9.2.40)

or, compactly, y i = eT c + X i β + ²i . The IV matrix, according to Proposition 2, is the (T × T (T −2)K) matrix10 



∆xi(1) · · · 0   .. .. .. ∆Z i =  . . . . 0 · · · ∆xi(T ) 9

(9.2.41)

It is possible to include the redundant OCs also when constructing this GMM estimator. Using (Moore-Penrose) inverses, the estimator remains the same. 10 Again, we formally use different IVs for different equations, considering (9.2.40) as T different equations with β as a common slope coefficient.

18

Let y = [y 01 , . . . , y 0N ]0 ,

² = [²01 , . . . , ²0N ]0 ,

X = [X 01 , . . . , X 0N ]0 , ∆Z = [(∆Z 1 )0 , . . . , (∆Z N )0 ]0 . The GMM estimator corresponding to E[(∆Z i )0 ²i ] = 0T (T−2)K,1 , which minimizes [N −1 ²0 (∆Z)](N −2 V ∆ )−1 [N −1 (∆Z)0 ²] for V ∆ = (∆Z)0 (∆Z), can be written as h

0 0 −1 0 b β Lx = X (∆Z)[(∆Z) (∆Z)] (∆Z) X

i−1

h

× X 0 (∆Z)[(∆Z)0 (∆Z)]−1 (∆Z)0 y

i

hP

= [

i−1 P −1 P 0 0 [ i (∆Z i )0 X i ] i X i (∆Z i )] [ i (∆Z i ) (∆Z i )] hP

× [

P

0 i X i (∆Z i )] [

i

P

−1 0 [ i (∆Z i ) (∆Z i )]

0 i (∆Z i ) y i ] .

(9.2.42)

If ² has a non-scalar covariance matrix, a more efficient GMM estimator is obtained for V ∆ = V (∆Z)² = E[(∆Z)0 ²²0 (∆Z)], which gives h

−1 0 0 e β Lx = X (∆Z)V (∆Z)² (∆Z) X

We can estimate

1 N V (∆Z)²

i−1 h

i

0 X 0 (∆Z)V −1 (∆Z)² (∆Z) y .

(9.2.43)

consistently from the residuals obtained from (9.2.43), by N c V 1 X (∆Z)² = (∆Z i )0 b²i b²0i (∆Z i ). N N i=1

(9.2.44)

Inserting (9.2.44) in (9.2.44), we get the asymptotically optimal GMM estimator hP

e β Lx = [

0 i X i (∆Z i )]

hP

× [

£P

0 0² b i ²i (∆Z i ) i (∆Z i ) b

¤−1 P

[

i−1

0 i (∆Z i ) X i ]

i

£P ¤−1 P 0 0 0² b [ i (∆Z i )0 y i ] i ²i (∆Z i ) i X i (∆Z i )] i (∆Z i ) b

.

(9.2.45)

. These estimators can be modified by extending all ∆xi(t) to (∆xi(t) .. ∆y 0i(t) ) in (9.2.41), which also exploit the OCs in the ∆ys. Other moment estimators, which will not be discussed specifically in the present EIV context, are considered for situations with predetermined IVs in Ziliak (1997), with the purpose of reducing the finite sample bias of asymptotically optimal GMM estimators.

9.2.7

Extensions. Modifications

All the methods presented so far rely on differencing as a way of eliminating the individual effects, either in the equation or in the instruments. This is convenient for the case where the individual heterogeneity has an unspecified correlation with the latent regressor vector and for the fixed effects case. Other ways of eliminating this effect in such situations are discussed in Wansbeek (2001). Their essence is to stack the matrix of covariances between the regressand and the regressors and eliminating these nuisance parameters by suitable 19

projections. Exploiting a possible structure, suggested by our theory, on the covariance matrix of the ξit s and αi across individuals and periods, may lead to further extensions. Additional exploitable structure may be found in the covariance matrix of the yit s. The latter will, however, lead to moment restrictions that are quadratic in the coefficient vector β. Under non-normality, higher order moments may also, in principle, be exploited to improve efficiency, but again at the cost of a mathematically less tractable problem. In a random effects situation, with zero correlation between ξit and αi , and hence between xit and αi , differencing or projecting out the αi s will not be efficient, since they will not exploit this zero correlation. The GLS estimator, which would have been the minimum variance linear unbiased estimator in the absence of measurement errors, will no longer, in general, be consistent [see Biørn (1996, Section 10.4.3)], so it has to be modified. Finally, if the equation contains strongly exogenous regressors in addition to the errorcontaminated ones, further moment conditions exist, which can lead to improved small sample efficiency of the GMM estimators. An improvement of small sample efficiency may also be obtained by replacing IV or GMM by LIML estimation; see Wansbeek and Meijer (2000, Section 6.6).

9.2.8

Concluding remarks

Above we have demonstrated that several, rather simple, GMM estimators which may handle jointly the heterogeneity problem and the measurement error problem in panel data, exist. These problems may be ‘intractable’ when only pure (single or repeated) cross section data or pure time series data are available. Estimators using either equations in differences with level values as instruments, or equations in levels with differenced values as instruments are useful. In both cases, the differences may be taken over one period or more. Even for the static model considered here, instruments constructed from the regressors (xs) as well as from the regressands (ys) may be of interest. GMM estimators combining both instrument sets in an optimal way, are usually more precise than those using either of them. Although a substantial number of orthogonality conditions constructed from differences taken over two periods or more are redundant, adding the essential two-period difference orthogonality conditions to the one-period conditions in the GMM algorithm may significantly affect the result [confer the examples in Biørn (2000)]. Using levels as instruments for differences, or vice versa, as a general estimation strategy within a GMM framework, however, may raise problems related to ‘weak instruments’. Finding operational ways of identifying such instruments among those utilizing essential orthogonality conditions in order to reduce their potential damage with respect to inefficiency, is a challenge for future research.

20

9.3

Simultaneity and panel data

Simultaneous equation models (SEM) or structural models as they are also sometimes called, have been around in the economic literature for a long time dating back to the period when the Econometric Society itself was formed. In spite of this long history, their relevance in modelling economic phenomena has not diminished; if at all it is only growing over time with the realisation that there is a high degree of interdependence among the different variables involved in the explanation of any socio-economic phenomenon. The theory of simultaneous equations has become a must in any econometric course whatever level it may be. This is due to the fact any researcher needs to be made attentive to the potential endogenous regressor problem, be it in a single equation model or in a system of equations and this is the problem that the SEM theory precisely deals with. At this stage it may be useful to distinguish between interdependent systems i.e simultaneous equations and what are called systems of regression equations or seemingly unrelated regressions (SUR) in which there are no endogenous variables on the right hand side but non-zero correlations are assumed between error terms of different equations. We will see later in the section that the reduced form of a SEM is a special case of SUR. In a panel data setting, in addition to the simultaneous nature of the model which invariably leads to non-zero correlation between the right hand side variables and the residual disturbance term, there is also the possibility of the same variables being correlated with the specific effects. However unlike in the correlated regressors case of Chapter 4 eliminating the specific effect alone does not solve the problem here and we need a more comprehensive approach to tackle it. We will develop generalizations of the two stage least squares (2SLS) and three stage least squares (3SLS) methods that are available in the classical SEM case. These generalizations can also be presented in a GMM framework, giving the corresponding optimal estimation in this context. The most commonly encountered panel data SEM is the SEM with error component (EC) structure. Thus a major part of this chapter will the devoted to this extension and all its variants. Other generalizations will be briefly discussed at the end.

9.3.1

SEM with EC

The Model This model proposes to account for the temporal and cross–sectional heterogeneity of panel data by means of an error components structure in the structural equations of a simultaneous equation system. In other words, the specific effects associated with pooled data are incorporated in an additive manner in the random element of each equation. Let us consider a complete linear system of M equations in M current endogenous variables and K exogenous variables. We do not consider the presence of lagged endogenous variables in the system. The reader is referred to the separate chapter of this book dealing with dynamic panel data models for treatment of such cases.

21

By a “complete” system, we assume that there are as many equations as there are endogenous variables and hence the system can be solved to obtain the reduced form. Further, we also assume that the data set is balanced i.e. observations are available for all the variables for all the units at all dates. Once again, the case of unbalanced panel data sets is dealt with in a separate chapter of the book. We write the M –th structural equation of the system as follows11 : 0 ∗ ∗ yit γm + x0it βm + umit = 0,

m = 1, ...M

(9.3.1)

0 is the (1 × M ) vector of observations on all the M endogenous variables for the where yit i–th individual at the t–th time period; x0it is the (1 × K) vector of observations on all ∗ and β ∗ are the K exogenous variables for the i–th individual at the t–th time period; γm m 0 and x0 ; and u respectively the coefficient vectors of yit is the disturbance term of the mit it m–th equation for the i–th individual and the t–th time period. More explicitly, 0 yit = [y1it . . . yM it ]; x0it = [x1it . . . xKit ]; 0

0

∗ ∗ ∗ βm = [β1m . . . βM m ];

∗ ∗ ∗ γm = [γ1m . . . γKm ].

By piling up all the observations in the following way:       Y =     



0 y11 .. .



      ;     

0 y1T .. .

     X=     

0 yN T

x011 .. . x01T .. .





      ;     

     um =      

x0N T

um11 .. . um1T .. .

       ,     

umN T

equation (9.3.1) can be written as: ∗ ∗ Y γm + Xβm + um = 0,

m = 1, ...M

(9.3.2)

Defining ∗ Γ = [γ1∗ . . . γM ];

∗ B = [β1∗ . . . βM ];

U = [u1 . . . uM ] ,

we can write the whole system of M equations as: Y Γ + XB + U = 0 .

(9.3.3)

Before turning to the error structure, we add that the elements of Γ and B satisfy certain a priori restrictions, crucial for identification, in particular the normalisation rule ( γii∗ = −1 ) and the exclusion restrictions (some elements of Γ and B are identically zero). 11

Note that the constant term is included in the β vector, contrary to the introductory chapters, and hence xit contains 1 as its first element.

22

Following an error components pattern, it is assumed that each structural equation error umit is composed of three components: an individual effect µmi , a time effect εmt and a residual error νmit . Formally, we have: Assumption 1: umit = µmi + εmt + νmit ,

m = 1, . . . , M i = 1, . . . , N t = 1, . . . , T .

(9.3.4)

By denoting  

lT0 (1

  × T ) = [1 . . . 1]; µm =   



µm1 .. .



     ; εm =     

µmN

        ; νm =        εmT  

εm1 .. .

νm11 .. . νm1T .. .

       ,     

νmN T the above decomposition (9.3.4) can be written for all the observations, as: um = (IN ⊗ lT )µm + (lN ⊗ IT )εm + νm ,

m = 1, . . . , M .

Assumption 2:
$$E(\mu_m) = 0; \qquad E(\varepsilon_m) = 0; \qquad E(\nu_m) = 0, \qquad m = 1, \dots, M.$$

Assumption 3:
$$E(\mu_m\mu_{m'}') = \sigma_{\mu mm'}I_N; \qquad E(\varepsilon_m\varepsilon_{m'}') = \sigma_{\varepsilon mm'}I_T; \qquad E(\nu_m\nu_{m'}') = \sigma_{\nu mm'}I_{NT}, \qquad m, m' = 1, \dots, M.$$

Assumption 4:
$$E(\mu_m\varepsilon_{m'}') = 0; \qquad E(\mu_m\nu_{m'}') = 0; \qquad E(\varepsilon_m\nu_{m'}') = 0, \qquad \forall m, m'.$$

We will also assume pairwise independence among the different components whenever required, and normality of their distributions for ML estimation.

Assumption 5: The error components are independent of the exogenous variables.

From these assumptions, the covariance matrix between $u_m$ and $u_{m'}$, denoted $\Sigma_{mm'}$, can be derived as:

$$\Sigma_{mm'} = E(u_mu_{m'}') = \sigma_{\mu mm'}(I_N \otimes l_Tl_T') + \sigma_{\varepsilon mm'}(l_Nl_N' \otimes I_T) + \sigma_{\nu mm'}I_{NT}. \qquad (9.3.5)$$

The spectral decomposition of $\Sigma_{mm'}$ is given by (see Nerlove (1971)):

$$\Sigma_{mm'} = \sigma_{1mm'}M_1 + \sigma_{2mm'}M_2 + \sigma_{3mm'}M_3 + \sigma_{4mm'}M_4 \qquad (9.3.6)$$

where

$$\sigma_{1mm'} = \sigma_{\nu mm'}; \qquad \sigma_{2mm'} = \sigma_{\nu mm'} + T\sigma_{\mu mm'}; \qquad \sigma_{3mm'} = \sigma_{\nu mm'} + N\sigma_{\varepsilon mm'}; \qquad \sigma_{4mm'} = \sigma_{\nu mm'} + T\sigma_{\mu mm'} + N\sigma_{\varepsilon mm'} \qquad (9.3.7)$$

and

$$M_1 = I_{NT} - \frac{1}{T}(I_N \otimes l_Tl_T') - \frac{1}{N}(l_Nl_N' \otimes I_T) + \frac{1}{NT}l_{NT}l_{NT}', \quad \text{of rank } m_1 = (N-1)(T-1);$$

$$M_2 = \frac{1}{T}(I_N \otimes l_Tl_T') - \frac{1}{NT}l_{NT}l_{NT}', \quad \text{of rank } m_2 = N-1;$$

$$M_3 = \frac{1}{N}(l_Nl_N' \otimes I_T) - \frac{1}{NT}l_{NT}l_{NT}', \quad \text{of rank } m_3 = T-1;$$

$$M_4 = \frac{1}{NT}l_{NT}l_{NT}', \quad \text{of rank } m_4 = 1,$$

with

$$\sum_{i=1}^{4}M_i = I_{NT}; \qquad M_iM_j = \delta_{ij}M_i.$$

Further, we note that

$$l_{NT}'M_i = 0, \quad i = 1, 2, 3; \qquad l_{NT}'M_4 = l_{NT}'.$$

By denoting $\Sigma_\mu = [\sigma_{\mu mm'}]$, $\Sigma_\varepsilon = [\sigma_{\varepsilon mm'}]$, $\Sigma_\nu = [\sigma_{\nu mm'}]$, $m, m' = 1, \dots, M$, relations (9.3.7) can be written in matrix form as:

$$\Sigma_1 = \Sigma_\nu; \qquad \Sigma_2 = \Sigma_\nu + T\Sigma_\mu; \qquad \Sigma_3 = \Sigma_\nu + N\Sigma_\varepsilon; \qquad \Sigma_4 = \Sigma_\nu + T\Sigma_\mu + N\Sigma_\varepsilon.$$

Note that $\Sigma_\mu$, $\Sigma_\varepsilon$ and $\Sigma_\nu$ are uniquely determined from $\Sigma_1, \Sigma_2, \Sigma_3, \Sigma_4$ and vice versa. Finally, the variance-covariance matrix of the structural form can be verified to be:

$$\Sigma = E((\text{vec}\,U)(\text{vec}\,U)') = \sum_{i=1}^{4}\Sigma_i \otimes M_i \qquad (9.3.8)$$

with $\Sigma_i = [\sigma_{imm'}]$, $m, m' = 1, \dots, M$, for $i = 1, 2, 3, 4$.

The inverse and determinant of $\Sigma$ (useful for the estimation procedures of later sections) are given by (see Baltagi (1981) or Balestra and Krishnakumar (1987)):

$$\Sigma^{-1} = \sum_{i=1}^{4}\Sigma_i^{-1} \otimes M_i; \qquad |\Sigma| = \prod_{i=1}^{4}|\Sigma_i|^{m_i}. \qquad (9.3.9)$$
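The decomposition (9.3.8)-(9.3.9) is easy to verify numerically, and it shows why estimation remains tractable: inverting the $M(NT) \times M(NT)$ matrix $\Sigma$ only requires the four $M \times M$ inverses $\Sigma_i^{-1}$. The following minimal sketch checks this, with purely illustrative dimensions and randomly generated $\Sigma_i$'s (none of these values come from the text):

```python
# Numerical check of (9.3.8)-(9.3.9); N, T, M and the Sigma_i are illustrative.
import numpy as np

N, T, M = 4, 3, 2
rng = np.random.default_rng(0)

JN = np.ones((N, N)) / N                       # (1/N) l_N l_N'
JT = np.ones((T, T)) / T                       # (1/T) l_T l_T'
M2 = np.kron(np.eye(N), JT) - np.kron(JN, JT)  # rank N-1
M3 = np.kron(JN, np.eye(T)) - np.kron(JN, JT)  # rank T-1
M4 = np.kron(JN, JT)                           # rank 1
M1 = np.eye(N * T) - M2 - M3 - M4              # rank (N-1)(T-1)

def rand_spd(m):
    """A random symmetric positive definite m x m matrix."""
    A = rng.standard_normal((m, m))
    return A @ A.T + m * np.eye(m)

Sigmas = [rand_spd(M) for _ in range(4)]
Ms = [M1, M2, M3, M4]

Sigma = sum(np.kron(S, Mi) for S, Mi in zip(Sigmas, Ms))                     # (9.3.8)
Sigma_inv = sum(np.kron(np.linalg.inv(S), Mi) for S, Mi in zip(Sigmas, Ms))  # (9.3.9)

assert np.allclose(Sigma @ Sigma_inv, np.eye(M * N * T))
```

The assertion holds because the $M_i$'s are orthogonal idempotents summing to $I_{NT}$, so the cross terms in the product vanish.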

The Reduced Form and the Identification Problem

By definition, the reduced form of a system of simultaneous equations is the solution of the system for the endogenous variables in terms of the exogenous variables and the disturbances. For our model, it is given by:

$$Y = X\Pi + V$$

where $\Pi = -B\Gamma^{-1}$ and $V = -U\Gamma^{-1}$. By using the properties of vec, we can write $\text{vec}\,V = (-\Gamma^{-1\prime} \otimes I)\,\text{vec}\,U$ and thus we have $E(\text{vec}\,V) = 0$ and

$$\Omega = E((\text{vec}\,V)(\text{vec}\,V)') = (-\Gamma^{-1\prime} \otimes I)\,\Sigma\,(-\Gamma^{-1} \otimes I) = \sum_{i=1}^{4}\Gamma^{-1\prime}\Sigma_i\Gamma^{-1} \otimes M_i = \sum_{i=1}^{4}\Omega_i \otimes M_i$$

where

$$\Omega_i = \Gamma^{-1\prime}\Sigma_i\Gamma^{-1}, \qquad i = 1, 2, 3, 4.$$

It can easily be verified that each reduced form equation has a three-component error structure like any structural equation, and that the covariances across different reduced form equations are of the same nature as those across different structural equations. An important point in which the reduced form differs from the structural form, however, is that the right hand side variables of the former are uncorrelated with the errors, whereas this is not the case in the latter, due to simultaneity.

Thus the reduced form is a seemingly unrelated regression (SUR) model with error components. This model was originally proposed by Avery (1977) and is an important extension of panel data specifications to systems of equations. Our reduced form is in fact a special case of such a model, as the explanatory variables are the same in each equation; Avery (1977) treated the more general case in which each equation has its own set of explanatory variables. This interpretation of our reduced form provides an interesting application of Avery's model combining SUR with error components (EC). For want of space, we do not go into the details of the inference procedures for the reduced form. In general, both ML and feasible GLS can be applied; both are consistent, asymptotically normal and asymptotically equivalent. The reader is referred to Krishnakumar (1988) for detailed derivations.

In the context of any simultaneous equation model, it is important to consider the problem of identification prior to estimation. In the case of the classical simultaneous equation model (with homoscedastic and non-autocorrelated errors), there is an abundant literature on identification (see, for instance, Koopmans (1953), Fisher (1966), Rothenberg (1971) and Hausman and Taylor (1983)). In our case of a SEM with EC, as long as there are no a priori restrictions on the structural variances and covariances (i.e. no "covariance restrictions" in the terminology of Hausman and Taylor), the identification problem is exactly the same as that of the classical model. In other words, in such a situation we can separate the discussion of the identification of $\Gamma$ and $B$ from that of the $\Sigma_i$'s, $i = \mu, \varepsilon, \nu$. Thus we have the same rank and order conditions for the identifiability of the elements of $\Gamma$ and $B$, and the same definitions of under-identified, just-identified and over-identified equations. Once the structural coefficients are identified, the identification of the structural variance-covariance matrices is immediate, through the equations relating them to the reduced form covariance matrices.

Now, if we impose additional a priori restrictions on the structural variances and covariances, it is no longer possible to separate the equations relating $(\Gamma, B)$ to $\Pi$ from those relating the $\Sigma_i$'s to the $\Omega_i$'s, $i = \mu, \varepsilon, \nu$, and one has to study the existence and uniqueness of solutions of the full system of identifying equations, given the prior restrictions. This has been done for the classical simultaneous equation model by Hausman and Taylor (1983). One can follow the same approach for our model, keeping in mind that, whereas in the classical case there is only one $\Sigma$, in our case there are three such sets of relations: $\Omega_i\Gamma = \Gamma^{-1\prime}\Sigma_i$, $i = \mu, \varepsilon, \nu$.

One type of a priori covariance restriction that does not need any particular analysis is that either $\Sigma_\mu$ or $\Sigma_\varepsilon$ is identically equal to zero (i.e. only one specific effect is present in the model) and hence is identified. Note that, in this case, the corresponding $\Omega$ matrix ($\Omega_\mu$ or $\Omega_\varepsilon$) is also zero, and the spectral decomposition of $\Sigma$ (and $\Omega$) reduces to two terms only.


Structural Form Estimation

Generalised Two Stage Least Squares

Let us consider a structural equation, say the $m$-th one, and write it as:

$$y_m = Y_m\gamma_m + X_m\beta_m + u_m, \qquad (9.3.10)$$

in which the normalisation rule ($\gamma_{mm}^* = -1$) and the exclusion restrictions have already been substituted. We assume that these are the only a priori information available. Note that $Y_m$ and $X_m$ denote the matrices of observations on the $M_m^*$ included endogenous and $K_m$ included exogenous right hand side variables respectively, and $\gamma_m$ and $\beta_m$ denote their respective coefficient vectors. By defining

$$Z_m = [Y_m \;\; X_m]; \qquad \alpha_m = \begin{bmatrix} \gamma_m \\ \beta_m \end{bmatrix},$$

we can rewrite (9.3.10) as

$$y_m = Z_m\alpha_m + u_m \qquad (9.3.11)$$

and we recall (see (9.3.6)) that

$$E(u_mu_{m'}') = \Sigma_{mm'} = \sum_{i=1}^{4}\sigma_{imm'}M_i.$$

The endogenous right hand side variables of equation (9.3.10) are correlated with both the individual effects and the residual error term. Hence classical methods like OLS, GLS, or within will all yield inconsistent estimators, and an appropriate procedure is given by the IV method, which typically consists in premultiplying the equation in question by a matrix of valid instruments and then applying GLS to the transformed equation. In the classical case, the instrument for $Z_m$ is taken to be $X$ (see, for instance, Theil (1971)). In our case, it can be shown that, of all the transformations of $X$, say $FX$, the one which minimises the asymptotic variance-covariance matrix of the resulting estimator of $\alpha_m$ is given by $F = \Sigma_{mm}^{-1}$. In other words, any other transformation would lead to an estimator with an asymptotic variance-covariance matrix "greater" than the one obtained using $\Sigma_{mm}^{-1}$ ("greater" meaning that the difference is positive definite). This result is based on Theorem 5 of Balestra (1983); its application to our model can be found in Krishnakumar (1988).

Therefore the optimal instrument for $Z_m$ is given by $\Sigma_{mm}^{-1}X$ and, premultiplying (9.3.11) by $X'\Sigma_{mm}^{-1}$, we get:

$$X'\Sigma_{mm}^{-1}y_m = X'\Sigma_{mm}^{-1}Z_m\alpha_m + X'\Sigma_{mm}^{-1}u_m. \qquad (9.3.12)$$

Applying GLS to (9.3.12), we obtain what we call the generalised two stage least squares (G2SLS) estimator of $\alpha_m$:

$$\hat{\alpha}_{m,G2SLS} = [Z_m'\Sigma_{mm}^{-1}X(X'\Sigma_{mm}^{-1}X)^{-1}X'\Sigma_{mm}^{-1}Z_m]^{-1}\,Z_m'\Sigma_{mm}^{-1}X(X'\Sigma_{mm}^{-1}X)^{-1}X'\Sigma_{mm}^{-1}y_m. \qquad (9.3.13)$$
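In matrix terms, (9.3.13) is just a GLS step on the instrumented equation. A minimal sketch of the formula (a hypothetical helper, not a library routine) could look as follows, assuming $y_m$, $Z_m$, $X$ and $\Sigma_{mm}^{-1}$ are available as dense numpy arrays:

```python
# Sketch of the G2SLS formula (9.3.13); all inputs are assumed given.
import numpy as np

def g2sls(y_m, Z_m, X, Sigma_mm_inv):
    SX = Sigma_mm_inv @ X                  # optimal instruments: Sigma_mm^{-1} X
    A = X.T @ SX                           # X' Sigma^{-1} X
    P = SX @ np.linalg.solve(A, SX.T)      # Sigma^{-1}X (X'Sigma^{-1}X)^{-1} X'Sigma^{-1}
    return np.linalg.solve(Z_m.T @ P @ Z_m, Z_m.T @ P @ y_m)
```

In practice one would exploit the spectral form of $\Sigma_{mm}^{-1}$ from (9.3.9) rather than storing an $NT \times NT$ inverse; the dense version is shown only to mirror the formula.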

Now, the above estimator is not feasible, as $\Sigma_{mm}$ is unknown; hence we need a prior estimate of the variance components. By analysis of variance of the errors of the $m$-th structural equation, the following estimators of the $\sigma_{imm}$'s are obtained:

$$\tilde{\sigma}_{1mm} = \frac{1}{(N-1)(T-1)}u_m'M_1u_m; \quad \tilde{\sigma}_{2mm} = \frac{1}{N-1}u_m'M_2u_m; \quad \tilde{\sigma}_{3mm} = \frac{1}{T-1}u_m'M_3u_m; \quad \tilde{\sigma}_{4mm} = \tilde{\sigma}_{2mm} + \tilde{\sigma}_{3mm} - \tilde{\sigma}_{1mm}. \qquad (9.3.14)$$
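These moments are cheap to compute, since the $M_i$'s are built from group means. A sketch (the function name and the dense construction of the $M_i$'s are illustrative assumptions):

```python
# Analysis-of-variance estimators (9.3.14) for one equation's disturbances;
# u is the NT-vector stacked by individual, then by period, as in the text.
import numpy as np

def anova_variance_components(u, N, T):
    JN = np.ones((N, N)) / N
    JT = np.ones((T, T)) / T
    M2 = np.kron(np.eye(N), JT) - np.kron(JN, JT)
    M3 = np.kron(JN, np.eye(T)) - np.kron(JN, JT)
    M1 = np.eye(N * T) - M2 - M3 - np.kron(JN, JT)
    s1 = u @ M1 @ u / ((N - 1) * (T - 1))
    s2 = u @ M2 @ u / (N - 1)
    s3 = u @ M3 @ u / (T - 1)
    return s1, s2, s3, s2 + s3 - s1        # (sigma_1mm, ..., sigma_4mm)
```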

These formulae contain $u_m$, which is also unknown; it can, however, be estimated as follows. Premultiplying equation (9.3.11) by the instrument $M_1X$, we get:

$$X'M_1y_m = X'M_1Z_m\alpha_m + X'M_1u_m.$$

Note that, if the equation has an intercept, it is eliminated by this transformation and we are left with:

$$X'M_1y_m = X'M_1Z_m^*\alpha_m^* + X'M_1u_m \qquad (9.3.15)$$

where $Z_m^*$ denotes the matrix of right hand side variables excluding the vector of ones, and $\alpha_m^*$ the respective coefficients. That is, we have split $Z_m$ and $\alpha_m$ as

$$Z_m = [Y_m \;\; l_{NT} \;\; X_m^*]; \qquad \alpha_m = \begin{bmatrix} \gamma_m \\ a_m \\ b_m \end{bmatrix}$$

and redefined $Z_m^*$ and $\alpha_m^*$ as

$$Z_m^* = [Y_m \;\; X_m^*]; \qquad \alpha_m^* = \begin{bmatrix} \gamma_m \\ b_m \end{bmatrix}.$$

Performing GLS on (9.3.15), we obtain a consistent estimator of $\alpha_m^*$, called the covariance or within 2SLS estimator:

$$\hat{\alpha}_{m,cov2SLS}^* = [Z_m^{*\prime}M_1X(X'M_1X)^{-1}X'M_1Z_m^*]^{-1}\,Z_m^{*\prime}M_1X(X'M_1X)^{-1}X'M_1y_m. \qquad (9.3.16)$$

The intercept is estimated as:

$$\hat{a}_{m,cov2SLS} = \frac{1}{NT}l_{NT}'(y_m - Z_m^*\hat{\alpha}_{m,cov2SLS}^*).$$

From these estimators, we can predict $u_m$ as:

$$\hat{u}_{m,cov2SLS} = y_m - Z_m^*\hat{\alpha}_{m,cov2SLS}^* - l_{NT}\,\hat{a}_{m,cov2SLS}.$$

Substituting $\hat{u}_{m,cov2SLS}$ for $u_m$ in (9.3.14), we obtain $\hat{\sigma}_{imm}$, $i = 1, 2, 3, 4$, and $\hat{\Sigma}_{mm} = \sum_{i=1}^{4}\hat{\sigma}_{imm}M_i$, leading to the following feasible G2SLS estimator of $\alpha_m$:

$$\hat{\alpha}_{m,fG2SLS} = [Z_m'\hat{\Sigma}_{mm}^{-1}X(X'\hat{\Sigma}_{mm}^{-1}X)^{-1}X'\hat{\Sigma}_{mm}^{-1}Z_m]^{-1}\,Z_m'\hat{\Sigma}_{mm}^{-1}X(X'\hat{\Sigma}_{mm}^{-1}X)^{-1}X'\hat{\Sigma}_{mm}^{-1}y_m. \qquad (9.3.17)$$

Before giving the limiting distribution of the above estimators, we mention that all our estimators are consistent. Another interesting point is that all three estimators (cov2SLS, G2SLS and fG2SLS) have the same limiting distribution. It is given by (see Krishnakumar (1988) for the derivation):

$$\begin{bmatrix} \sqrt{N}\,(\hat{a}_m - a_m) \\ \sqrt{NT}\,(\hat{\alpha}_m^* - \alpha_m^*) \end{bmatrix} \sim N\!\left(0,\; (\tilde{P}_m'\tilde{R}_m\tilde{P}_m)^{-1}\right) = N\!\left(0,\; \begin{bmatrix} \sigma_{\mu mm} + \sigma_{\varepsilon mm} & 0 \\ 0 & \sigma_{\nu mm}\,([\Pi_m^* \; H_m^*]'R\,[\Pi_m^* \; H_m^*])^{-1} \end{bmatrix}\right)$$

where

$$\tilde{P}_m = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \Pi_m^* & H_m^* \end{bmatrix}; \qquad \tilde{R}_m = \begin{bmatrix} 1/(\sigma_{\mu mm} + \sigma_{\varepsilon mm}) & 0 \\ 0 & (1/\sigma_{\nu mm})\,R \end{bmatrix}$$

with $\Pi_m^*$ being the coefficient matrix of $X$ in the reduced form equations for $Y_m$, except for the column of ones, and $H_m^*$ a selection matrix such that $X_m^* = XH_m^*$.

Generalised Three Stage Least Squares

The extension from G2SLS to generalised 3SLS (G3SLS) can be done in two ways. In what follows, we present both ways and show that they yield asymptotically equivalent estimators.

The reader will recall that the G2SLS method uses the instrument matrix $\Sigma_{mm}^{-1}X$ for the $m$-th equation. Applying to each structural equation of the system its corresponding transformation $\Sigma_{mm}^{-1}X$, $m = 1, \dots, M$, we obtain:

$$X'\Sigma_{11}^{-1}y_1 = X'\Sigma_{11}^{-1}Z_1\alpha_1 + X'\Sigma_{11}^{-1}u_1$$
$$\vdots$$
$$X'\Sigma_{MM}^{-1}y_M = X'\Sigma_{MM}^{-1}Z_M\alpha_M + X'\Sigma_{MM}^{-1}u_M$$

or

$$\tilde{X}'D^{-1}y = \tilde{X}'D^{-1}Z\alpha + \tilde{X}'D^{-1}u \qquad (9.3.18)$$

where (note that this particular notation for $D$ is valid only for this chapter):

$$\tilde{X} = I \otimes X; \quad D = \text{diag}[\Sigma_{11} \dots \Sigma_{MM}]; \quad Z = \text{diag}[Z_1 \dots Z_M]; \quad \alpha' = [\alpha_1' \dots \alpha_M']; \quad u' = [u_1' \dots u_M']; \quad y' = [y_1' \dots y_M'].$$

Now, applying GLS to the transformed system (9.3.18), we obtain our first generalised 3SLS (G3SLS-I) estimator:

$$\hat{\alpha}_{G3SLS-I} = [Z'D^{-1}\tilde{X}(\tilde{X}'D^{-1}\Sigma D^{-1}\tilde{X})^{-1}\tilde{X}'D^{-1}Z]^{-1}\,Z'D^{-1}\tilde{X}(\tilde{X}'D^{-1}\Sigma D^{-1}\tilde{X})^{-1}\tilde{X}'D^{-1}y. \qquad (9.3.19)$$

Note that this way of generalising is analogous to the way classical 2SLS is extended to 3SLS by Zellner and Theil (1962). However, there is also a second way of approaching the problem, which we briefly present below. Recall that our reason for choosing $\Sigma_{mm}^{-1}X$ as the instrument for $Z_m$ in the G2SLS procedure was that it minimised the asymptotic covariance matrix of the resulting coefficient estimator. Now, let us write the whole system as

$$y = Z\alpha + u \qquad \text{with} \qquad E(u) = 0, \quad E(uu') = \Sigma,$$

and find the best transformation $F$ of $(I \otimes X)$ for choosing the instruments. By the same reasoning as for G2SLS, we get $F = \Sigma^{-1}$. Using $\Sigma^{-1}(I \otimes X)$ as instruments and estimating $\alpha$ by GLS on the transformed system yields our second G3SLS (G3SLS-II) estimator:

$$\hat{\alpha}_{G3SLS-II} = [Z'\Sigma^{-1}\tilde{X}(\tilde{X}'\Sigma^{-1}\tilde{X})^{-1}\tilde{X}'\Sigma^{-1}Z]^{-1}\,Z'\Sigma^{-1}\tilde{X}(\tilde{X}'\Sigma^{-1}\tilde{X})^{-1}\tilde{X}'\Sigma^{-1}y. \qquad (9.3.20)$$
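As a sketch, G3SLS-II has exactly the same algebraic shape as G2SLS, with the system-wide quantities in place of the single-equation ones (again a hypothetical helper, with all inputs assumed pre-built as dense arrays):

```python
# Sketch of the G3SLS-II formula (9.3.20): y stacks y_1..y_M, Z is block-diagonal,
# Xt = I_M (x) X, and Sigma_inv is assembled as in (9.3.9).
import numpy as np

def g3sls_ii(y, Z, Xt, Sigma_inv):
    SX = Sigma_inv @ Xt                    # optimal instruments: Sigma^{-1} Xt
    A = Xt.T @ SX                          # Xt' Sigma^{-1} Xt
    P = SX @ np.linalg.solve(A, SX.T)      # Sigma^{-1}Xt (Xt'Sigma^{-1}Xt)^{-1} Xt'Sigma^{-1}
    return np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)
```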

Both these G3SLS estimators can be made feasible by replacing the variance components in $\Sigma$ by their corresponding analysis-of-variance estimates:

$$\hat{\sigma}_{1mm'} = \frac{1}{(N-1)(T-1)}\hat{u}_m'M_1\hat{u}_{m'}; \quad \hat{\sigma}_{2mm'} = \frac{1}{N-1}\hat{u}_m'M_2\hat{u}_{m'}; \quad \hat{\sigma}_{3mm'} = \frac{1}{T-1}\hat{u}_m'M_3\hat{u}_{m'}; \quad \hat{\sigma}_{4mm'} = \hat{\sigma}_{2mm'} + \hat{\sigma}_{3mm'} - \hat{\sigma}_{1mm'} \qquad (9.3.21)$$

for $m, m' = 1, \dots, M$. Note that for the $\hat{u}_m$'s we can take

$$\hat{u}_{m,cov2SLS} = y_m - Z_m\hat{\alpha}_{m,cov2SLS} \quad \text{or} \quad \hat{u}_{m,fG2SLS} = y_m - Z_m\hat{\alpha}_{m,fG2SLS} \quad \text{or even} \quad \hat{u}_{m,cov3SLS} = y_m - Z_m\hat{\alpha}_{m,cov3SLS}$$

where $\hat{\alpha}_{m,cov3SLS}$ is yet another 3SLS estimator, obtained by using the instrument matrix $(I \otimes M_1X)$ for the system and estimating by GLS.

From the estimates $\hat{\sigma}_{imm'}$ given by (9.3.21), we form $\hat{\Sigma} = \sum_{i=1}^{4}\hat{\Sigma}_i \otimes M_i$ with $\hat{\Sigma}_i = [\hat{\sigma}_{imm'}]$, $m, m' = 1, \dots, M$, and use it in (9.3.19) and (9.3.20) to obtain the feasible G3SLS estimators. It is remarkable that, due to the special structure of the error components covariance matrix, all these 3SLS estimators, namely the pure G3SLS-I, pure G3SLS-II, cov3SLS, feasible G3SLS-I and feasible G3SLS-II, have the same limiting distribution, given by:

N  √ NT

 b − a) (a b ∗ − α∗ ) (α







∼ N 0, 



Σµ + Σ ε

0

0

¯ 0 (Σ−1 ¯ −1 [Π ν ⊗ R)Π]



where a is a (M × 1) column vector containing the intercepts of each equation i.e. a0 = [a1 . . . aM ] and α∗ is ((M − 1)M × 1) containing the other non–zero coefficients of each 0 0 ∗0 ] and where Π ∗ ]), m = 1, . . . , M . ¯ = diag ([Π∗m Hm equation i.e. α∗ = [α1∗ . . . αM Finally, let us note that, though we assume the presence of an intercept in each equation, the above results can be easily generalised to the case in which some equations have an intercept and others do not. Error Components Two Stage Least Squares This is an alternative method of estimating the parameters of a single structural equation. This method is proposed by Baltagi (1981) and inspired from the feasible Aitken procedure developed by Maddala (1971) for a single equation error components model. In this method, the structural equation in question say the m–th one, is successively transformed by the matrices of eigenvectors associated with the distinct characteristic roots of Σmm and GLS is performed on a system comprising all the three transformed equations. Before going further, let us introduce some more notations. From Section 7.1.1.2 we know that the distinct eigenvalues of Σmm0 are σ1mm0 , σ2mm0 , σ3mm0 and σ4mm0 . The matrices whose columns are the eigenvectors associated with these roots are Q1 , Q2 , Q3 √ √ √ and lN T / N T respectively where Q1 = C2 ⊗ C1 , Q2 = C2 ⊗ lT / T , Q3 = lN / N ⊗ C1 √ √ 0 = [l0 / N such that OT0 = [lT0 / T C10 ] and ON C20 ] are orthogonal. Note that Qj Q0j N are unique for j = 1, 2, 3 and Q0j Qj = Mj , j = 1, 2, 3. Now, let us apply the transformations Qj , j = 1, 2, 3 to our structural equation (9.3.11):


$$Q_jy_m = Q_jZ_m\alpha_m + Q_ju_m, \qquad j = 1, 2, 3. \qquad (9.3.22)$$

It is easily verified that

$$E(Q_ju_mu_m'Q_{j'}') = \begin{cases} \sigma_{jmm}I_{m_j} & \text{for } j = j' \\ 0 & \text{for } j \neq j'. \end{cases}$$

Thus the transformed errors have a scalar variance-covariance matrix, but they are still correlated with the right hand side variables. Hence an IV technique is used, with $Q_jX$ as instruments for $Q_jZ_m$. This gives:

$$\hat{\alpha}_{m,2SLS}^{(j)} = [Z_m'Q_j'Q_jX(X'Q_j'Q_jX)^{-1}X'Q_j'Q_jZ_m]^{-1}\,Z_m'Q_j'Q_jX(X'Q_j'Q_jX)^{-1}X'Q_j'Q_jy_m, \qquad j = 1, 2, 3. \qquad (9.3.23)$$

These 2SLS estimators are in turn used to estimate the variance components:

$$\hat{\sigma}_{jmm} = \frac{1}{m_j}(Q_jy_m - Q_jZ_m\hat{\alpha}_{m,2SLS}^{(j)})'(Q_jy_m - Q_jZ_m\hat{\alpha}_{m,2SLS}^{(j)}), \quad j = 1, 2, 3; \qquad \hat{\sigma}_{4mm} = \hat{\sigma}_{2mm} + \hat{\sigma}_{3mm} - \hat{\sigma}_{1mm}. \qquad (9.3.24)$$

This is a generalisation of the Swamy and Arora (1972) method.
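Since $Q_j'Q_j = M_j$, each of the three 2SLS estimators in (9.3.23) can be computed from the projections $M_j$ alone. A sketch follows; the pseudo-inverse is a defensive assumption, since $X'M_1X$ is singular whenever $X$ contains the intercept column, which $M_1$ annihilates (in that case $Z_m$ must exclude the intercept, as in (9.3.15)):

```python
# Component-wise 2SLS (9.3.23), written with M_j = Q_j'Q_j.
import numpy as np

def component_2sls(y_m, Z_m, X, Mj):
    XM = X.T @ Mj                               # X' M_j
    W = XM.T @ np.linalg.pinv(XM @ X) @ XM      # M_j X (X'M_jX)^+ X' M_j
    return np.linalg.solve(Z_m.T @ W @ Z_m, Z_m.T @ W @ y_m)
```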

The above procedure gives three different estimators of the same $\alpha_m$. Therefore, we can combine the three transformed equations of (9.3.22) and estimate the whole system by GLS. We have:

$$\begin{bmatrix} X'Q_1'Q_1y_m \\ X'Q_2'Q_2y_m \\ X'Q_3'Q_3y_m \end{bmatrix} = \begin{bmatrix} X'Q_1'Q_1Z_m \\ X'Q_2'Q_2Z_m \\ X'Q_3'Q_3Z_m \end{bmatrix}\alpha_m + \begin{bmatrix} X'Q_1'Q_1u_m \\ X'Q_2'Q_2u_m \\ X'Q_3'Q_3u_m \end{bmatrix}. \qquad (9.3.25)$$

Using the Swamy and Arora estimates (9.3.24) of the variance components and performing feasible GLS on (9.3.25), we get the error components two stage least squares (EC2SLS) estimator:

$$\hat{\alpha}_{m,EC2SLS} = \Big[\sum_{j=1}^{3}\frac{1}{\hat{\sigma}_{jmm}}Z_m'Q_j'Q_jX(X'Q_j'Q_jX)^{-1}X'Q_j'Q_jZ_m\Big]^{-1}\Big[\sum_{j=1}^{3}\frac{1}{\hat{\sigma}_{jmm}}Z_m'Q_j'Q_jX(X'Q_j'Q_jX)^{-1}X'Q_j'Q_jy_m\Big]. \qquad (9.3.26)$$

It can be shown that this estimator is a weighted average of the three 2SLS estimators given in (9.3.23). The limiting distribution of the EC2SLS estimator is the same as that of the feasible G2SLS estimator.
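A sketch of (9.3.26), reusing the component-wise moment matrices and weighting them by the estimated variances (the function name and the pseudo-inverse safeguard are assumptions, as before):

```python
# EC2SLS (9.3.26): variance-weighted combination over the three components.
import numpy as np

def ec2sls(y_m, Z_m, X, Ms, sigmas):        # Ms = (M1, M2, M3); sigmas from (9.3.24)
    k = Z_m.shape[1]
    A, b = np.zeros((k, k)), np.zeros(k)
    for Mj, sj in zip(Ms, sigmas):
        XM = X.T @ Mj
        W = XM.T @ np.linalg.pinv(XM @ X) @ XM   # M_jX(X'M_jX)^+X'M_j
        A += (Z_m.T @ W @ Z_m) / sj
        b += (Z_m.T @ W @ y_m) / sj
    return np.linalg.solve(A, b)
```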

Error Components Three Stage Least Squares

In this section, we present the extension of the EC2SLS method to the whole system. We start with $y = Z\alpha + u$ and transform it successively by $(I_M \otimes Q_j)$, $j = 1, 2, 3$, to give:

$$y^{(j)} = Z^{(j)}\alpha + u^{(j)}, \qquad j = 1, 2, 3 \qquad (9.3.28)$$

where $y^{(j)} = (I_M \otimes Q_j)y$; $Z^{(j)} = (I_M \otimes Q_j)Z$; $u^{(j)} = (I_M \otimes Q_j)u$; and

$$E(u^{(j)}u^{(j)\prime}) = \Sigma_j \otimes I_{m_j}, \qquad j = 1, 2, 3.$$

Using $X^{(j)} = (I_M \otimes Q_jX)$ as instruments for $Z^{(j)}$ and applying GLS, we get:

$$\hat{\alpha}_{IVGLS}^{(j)} = [Z^{(j)\prime}\{\Sigma_j^{-1} \otimes P_{Q_jX}\}Z^{(j)}]^{-1}[Z^{(j)\prime}\{\Sigma_j^{-1} \otimes P_{Q_jX}\}y^{(j)}], \qquad j = 1, 2, 3 \qquad (9.3.29)$$

where, for any matrix $A$, $P_A$ denotes the projection matrix $A(A'A)^{-1}A'$. The unknown variance components are estimated by

$$\hat{\sigma}_{jmm'} = \frac{1}{m_j}[Q_jy_m - Q_jZ_m\hat{\alpha}_{m,2SLS}^{(j)}]'[Q_jy_{m'} - Q_jZ_{m'}\hat{\alpha}_{m',2SLS}^{(j)}], \quad j = 1, 2, 3; \qquad \hat{\sigma}_{4mm'} = \hat{\sigma}_{2mm'} + \hat{\sigma}_{3mm'} - \hat{\sigma}_{1mm'}.$$

Now, recognising once again that the same $\alpha$ is being estimated three times separately, we can combine the three transformed systems of (9.3.28) and estimate the global system by (feasible) IVGLS. The resulting estimator is called the error components 3SLS (EC3SLS) estimator of $\alpha$:

$$\hat{\alpha}_{EC3SLS} = \Big[\sum_{j=1}^{3}Z^{(j)\prime}(\hat{\Sigma}_j^{-1} \otimes P_{Q_jX})Z^{(j)}\Big]^{-1}\Big[\sum_{j=1}^{3}Z^{(j)\prime}(\hat{\Sigma}_j^{-1} \otimes P_{Q_jX})y^{(j)}\Big]. \qquad (9.3.30)$$

The above estimator has the same interpretation as the EC2SLS one, in that it is a weighted average of the three 3SLS estimators of (9.3.29) (see Baltagi (1981) for further details). Finally, the limiting distribution of the EC3SLS estimator can be shown to be the same as that of the G3SLS estimators of the previous section, and hence it is asymptotically equivalent to them.

Full Information Maximum Likelihood

The full information maximum likelihood (FIML) procedure consists in maximising the log-likelihood function of the model with respect to the structural parameters, given the a priori restrictions. As in all constrained maximisation problems, there are two ways of tackling it: (i) by maximising the corresponding Lagrangian function with

respect to the same parameters and a set of multipliers associated with the constraints; (ii) by substituting the constraints into the objective function and performing the maximisation without constraints. In this section, we briefly review both approaches. The reader will note that neither of them yields explicit analytical solutions, and hence both require iterative numerical procedures to arrive at the solution. Moreover, in the first approach, adopted by Balestra and Krishnakumar (1987) and Krishnakumar (1988), the a priori restrictions on the structural coefficients may be any linear restrictions, whereas in the second approach, followed by Prucha (1985), only the normalisation and exclusion restrictions are considered.

Recalling our structural model

$$Y\Gamma + XB + U = 0$$

and separating the intercept of each equation from the other terms, we can write:

$$Y\Gamma + l_{NT}a' + X_*B_* + U = 0 \qquad \text{or} \qquad l_{NT}a' + Z_*\Theta_* + U = 0$$

where $Z_* = [Y \;\; X_*]$ and

$$\Theta_* = \begin{bmatrix} \Gamma \\ B_* \end{bmatrix}.$$

Note that in case only some equations have an intercept and others do not, the following procedure can easily be modified accordingly. Now, the a priori restrictions on the coefficients can be written as (say we have $p$ of them):

$$\begin{bmatrix} S_0 & 0 \\ 0 & S_* \end{bmatrix}\begin{bmatrix} a \\ \text{vec}\,\Theta_* \end{bmatrix} = \begin{bmatrix} s_0 \\ s_* \end{bmatrix}. \qquad (9.3.32)$$

These include the normalisation rule, the exclusion restrictions and any other linear constraints. To these, we add the symmetry conditions for the $\Sigma_j$'s, written as:

$$C\,\text{vec}\,\Sigma_j = 0, \qquad j = \mu, \varepsilon, \nu. \qquad (9.3.33)$$

The log-likelihood function of the model can be written as follows, after a few simplifications and rearrangements:

$$\ln L = \text{const} - \frac{1}{2}\sum_{i=1}^{4}m_i\ln|\Sigma_i| + \frac{NT}{2}\ln|L\Theta_*|^2 - \frac{1}{2}\text{tr}\big[(NT\,aa' + \Theta_*'Z_*'l_{NT}a' + a\,l_{NT}'Z_*\Theta_*)\Sigma_4^{-1}\big] - \frac{1}{2}\text{tr}\sum_{i=1}^{4}\Theta_*'Z_*'M_iZ_*\Theta_*\Sigma_i^{-1} \qquad (9.3.34)$$

with $L$ such that $\Gamma = L\Theta_*$.

Thus we have to maximise (9.3.34) with respect to $a$, $\Theta_*$, $\Sigma_\mu$, $\Sigma_\varepsilon$ and $\Sigma_\nu$ under the constraints (9.3.32) and (9.3.33). Here again, for brevity's sake, we do not describe the procedure in detail; the reader is invited to consult Balestra and Krishnakumar (1987) for more information on the algorithm to be implemented in order to obtain a numerical solution. The limiting distribution of the FIML estimator is:

$$\begin{bmatrix} \sqrt{N}\,(\hat{a}_{ML} - a) \\ \sqrt{NT}\,\text{vec}(\hat{\Theta}_{*,ML} - \Theta_*) \end{bmatrix} \sim N\!\left(0, \begin{bmatrix} \Sigma_\mu + \Sigma_\varepsilon & 0 \\ 0 & F[F'(\Sigma_\nu^{-1} \otimes P_*)F]^{-1}F' \end{bmatrix}\right)$$

where

$$P_* = \begin{bmatrix} \Pi_*' \\ I \end{bmatrix} R\,[\Pi_* \;\; I]$$

and $F$ parametrises the coefficient restrictions (9.3.32) on $\text{vec}\,\Theta_*$.

When the a priori restrictions are only the normalisation and exclusions, the FIML has the same limiting distribution as the (feasible) G3SLS; hence, in this case, the FIML and the fG3SLS have the same asymptotic efficiency.

As mentioned at the beginning of this section, there is a second approach to the constrained maximisation problem, which consists in substituting the constraints into the objective function and then maximising the latter with no constraints. This has been done for our model by Prucha (1985), for the case of the usual restrictions only, yielding the so-called normal FIML (NFIML) estimator. The normal equations of this maximisation programme lead to an IV interpretation of the ML estimator, which can be used as an estimator-generating equation to form a general class of estimators, the NFIML$_A$ class (the subscript $A$ indicates that the estimator can be viewed as an approximation of the NFIML estimator). Prucha further shows that, under certain conditions, all members of the NFIML$_A$ class are asymptotically equivalent among themselves and to the NFIML estimator.

Asymptotic Comparisons of the Various Structural Estimators

In this section, we summarise the different asymptotic equivalences mentioned earlier and state a few more results regarding the just-identified case. First, let us briefly recall the results that we already know in the case of the usual restrictions. We have the asymptotic equivalence of the various 2SLS estimators, namely cov2SLS, fG2SLS and EC2SLS. Among the system methods, we have the asymptotic equivalence of the cov3SLS, fG3SLS-I, fG3SLS-II, EC3SLS and FIML estimators.

Regarding the just-identified case, we mention the important results without deriving them; the reader is referred to the original works by Krishnakumar (1988) and Baltagi (1981) for the proofs. When a single equation, say the $m$-th one, is just-identified:


(i) the indirect least squares estimator obtained using the covariance estimator of $\Pi$ is exactly equal to the cov2SLS estimator;
(ii) the indirect least squares estimator obtained using the feasible GLS estimator of $\Pi$ has the same limiting distribution as the feasible G2SLS estimator;
(iii) the EC2SLS estimator can be expressed as a weighted combination of three indirect estimators of $\alpha_m$;
(iv) the three 2SLS estimators of equation (9.3.23) are respectively equal to the indirect least squares estimators based on the between-groups, between-time-periods and within variation estimators of the reduced form;
(v) all these estimators (feasible G2SLS, cov2SLS, the indirect estimators based on $\hat{\Pi}_{cov}$ or $\hat{\Pi}_{fGLS}$, and EC2SLS) are asymptotically equivalent.

When the whole system is just-identified:

(i) fG3SLS-I reduces to fG2SLS, whereas fG3SLS-II does not;
(ii) fG3SLS-I, fG3SLS-II, fG2SLS and the indirect estimators are all asymptotically equivalent;
(iii) EC3SLS does not reduce to EC2SLS;
(iv) EC3SLS and EC2SLS have the same limiting distribution; and
(v) all these estimators (fG3SLS-I, fG3SLS-II, cov3SLS, fG2SLS, cov2SLS, EC3SLS, EC2SLS) are asymptotically equivalent.

Small Sample Properties

There are essentially two ways of arriving at the small sample behaviour of econometric estimators. One is to derive analytically the exact distribution or an approximation to it; the other is to "construct" the distribution through simulations (also called Monte Carlo experiments). In the case of the reduced form, the unbiasedness of the various coefficient and variance component estimators is proved without great difficulty (see Krishnakumar (1988)); however, exact efficiency properties are yet to be established. In the case of the structural form estimators, things get very complicated. In the classical simultaneous model, several authors have dealt with the problem of finding the exact distributions of the two stage and three stage estimators; the reader is invited to consult Phillips (1982) for more information on the classical case. In the SEM with EC case, we have no results on the exact density functions of the various structural estimators. We do, however, have results on approximations to finite sample moments using series expansion methods. These methods are useful even when the analytical expression of the density function is available, since they yield much less complicated expressions. In these methods, the estimator is expanded around its true value in a series of terms of decreasing order in powers of the sample size; the series is then truncated at a desired order, and the expectation of the truncated series is calculated to obtain the bias up to that order. This procedure has been applied to our model by Krishnakumar (1988), following the approach of Nagar (1959), to obtain approximations for the bias of the cov2SLS and fG2SLS estimators.

We will not go deeper into this aspect here; the results and derivations can be found in Krishnakumar (1988).

We now turn to the second approach, the Monte Carlo study. This method consists in specifying a true model, giving values to all the parameters, generating the random elements and the observations on the exogenous variables, calculating the endogenous variables, and estimating the parameters using only the observations. By running the procedure a number of times with different sets of observations (keeping the true values unchanged), one can "construct" the distribution of an estimator and derive its mean, variance, mean square error and so on; a skeletal code illustration is given at the end of this subsection. These criteria can then be used to compare the performance of different estimation methods, and the whole exercise can be repeated for different sets of true values.

Baltagi (1984) carried out such a Monte Carlo experiment for the SEM with EC, comparing various least squares and IV estimators of a two-equation structural model, keeping the same true values for the coefficients and changing only the values of the variance components. In what follows, we briefly review the main results concerning the structural form and the reduced form; for results regarding the variance components estimators, the reader is referred to Baltagi (1984).

First, the structural form results. The classical 2SLS has a smaller bias than the EC2SLS, but the EC2SLS has a lower root mean square error (RMSE) than the classical 2SLS. Better estimates of the structural variance components do not necessarily imply better estimates of the structural coefficients. In general, 3SLS dominates 2SLS and EC3SLS dominates EC2SLS in RMSE, though the superiority of EC3SLS over EC2SLS does not hold for all the structural parameters. There is a gain in performing EC3SLS rather than classical 3SLS according to RMSE. Similar results are obtained with global criteria like the normalised mean square deviation and the normalised mean absolute deviation, which give a single indicator for the combined performance of all parameter estimators.

Now, the reduced form results. Performing feasible GLS on each reduced form equation is better than performing OLS or LSDV, according to RMSE. But, according to the same criterion, feasible GLS on the entire system does not necessarily produce better results than feasible GLS on each equation separately; Baltagi notes that this could be due to the fact that there are only two equations in the model, and it may not hold in larger models. Once again, better estimates of the variance components do not necessarily imply better feasible GLS estimates of the coefficients. The same conclusions hold according to the global criteria.

Mátyás and Lovrics (1990) investigate the small sample properties of five limited information estimators for SEM with EC models by means of a Monte Carlo study. They compare the OLS estimator, the within estimator, the pure G2SLS, and two feasible G2SLS estimators (one with OLS as the first step and the other with within). Their findings are as follows. The OLS estimator remains biased in all cases, but it is still recommended for very small N and T (N < 10, T < 20) due to its stability, as the G2SLS/within-2SLS estimators are


unstable and have a large dispersion. For N < 10 and T > 20 they favour the G2SLS/within-2SLS estimators, as they do for sufficiently large N (N > 15-20) as long as T > 5. There is practically no difference between the three G2SLS estimators (the pure one and the two feasible ones).

Baltagi and Chang (2000) study the relative performance of several estimators of a two-equation SEM with unbalanced panel data. Among the single equation methods they compare 2SLS, W2SLS and EC2SLS, and among the system estimators they look at 3SLS, W3SLS and EC3SLS. They observe that most of the results obtained for the balanced case carry over to the unbalanced one.
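The Monte Carlo methodology described above is straightforward to sketch in code. The following skeleton uses a hypothetical two-equation design with illustrative parameter values (not the design of Baltagi (1984)), and traces out only the naive OLS benchmark; any of the IV estimators sketched earlier would be plugged into the same loop:

```python
# Bare-bones Monte Carlo skeleton; DGP and sample sizes are illustrative only.
import numpy as np

rng = np.random.default_rng(42)
N, T, REPS = 25, 10, 500
gamma, beta = 0.5, 1.0                          # true coefficients of equation 1

draws = []
for _ in range(REPS):
    x = rng.standard_normal(N * T)              # exogenous variable
    mu = np.repeat(rng.standard_normal(N), T)   # individual effects, equation 1
    e1 = mu + rng.standard_normal(N * T)        # equation-1 disturbance
    e2 = 0.5 * e1 + rng.standard_normal(N * T)  # correlated with e1: y2 endogenous
    y2 = x + e2                                 # equation 2, already in reduced form
    y1 = gamma * y2 + beta * x + e1             # structural equation 1
    Z = np.column_stack([y2, x])
    draws.append(np.linalg.lstsq(Z, y1, rcond=None)[0])  # naive OLS (inconsistent)

est = np.asarray(draws)
true = np.array([gamma, beta])
print("bias:", est.mean(axis=0) - true)
print("RMSE:", np.sqrt(((est - true) ** 2).mean(axis=0)))
```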

9.3.2 Extensions

Simultaneous Equation Models with Correlated Specific Effects

In the SEM with EC discussed in the previous subsections, it was assumed that the error components were uncorrelated with the exogenous variables. Cornwell, Schmidt and Wyhowski (1992) extend our model to the case in which this assumption is dropped. They allow for the possibility that only some variables are correlated with the error components (singly exogenous), while the others are independent of them (doubly exogenous). Their model is specified as follows:

$$y_m = Y_m\delta_m + X_m\beta_m + Z_m\gamma_m + \alpha_m + \varepsilon_m, \qquad m = 1, \dots, M. \qquad (9.3.35)$$

A distinction is also made between time-varying exogenous variables ($X$) and time-invariant exogenous variables ($Z$), and only individual effects are present in the model (i.e. we have a two-component error term). Denoting

$$R_m = [Y_m \;\; X_m \;\; Z_m]; \qquad \xi_m' = [\delta_m' \;\; \beta_m' \;\; \gamma_m'],$$

we can write (9.3.35) as

$$y_m = R_m\xi_m + (\alpha_m + \varepsilon_m), \qquad m = 1, \dots, M.$$

The proposed 2SLS transforms the equation, say the first one, by $\Sigma_{11}^{-1/2}$ to get

$$\Sigma_{11}^{-1/2}y_1 = \Sigma_{11}^{-1/2}R_1\xi_1 + \Sigma_{11}^{-1/2}(\alpha_1 + \varepsilon_1)$$

and uses instruments of the form $A = [Q_vX \;\; P_vB]$, with different choices for $B$. Three different choices are proposed: the first corresponds to the instrument set of Hausman and Taylor (1981), the second is inspired by Amemiya and McCurdy (1986), and the third is based on Breusch et al. (1987). The three stage least squares generalises the procedure to the whole model.

The authors also derive estimators for the case in which the nature of the correlation between the exogenous variables and the specific effects varies from equation to equation. In other words, we may have an exogenous variable correlated with the specific effect


in one equation but uncorrelated with it in another. In this case, the instrument set also varies across equations. When the specific effects are assumed to be fixed, the authors show that the model can be estimated by OLS after a within transformation.

Simultaneous Error Component Models with Censored Endogenous Variables

Another recent extension is the inclusion of censored endogenous variables in a simultaneous EC model, by Vella and Verbeek (1999). Their model is a two-equation system in which the first equation is the primary focus and the second is already in reduced form. For $i = 1, \dots, N$; $t = 1, \dots, T$ we have:

$$y_{it}^* = m_1(x_{it}, z_{it}; \theta_1) + \mu_i + \eta_{it} \qquad (9.3.36)$$

$$z_{it}^* = m_2(x_{it}, z_{it}; \theta_2) + \alpha_i + \nu_{it} \qquad (9.3.37)$$

$$z_{it} = h(z_{it}^*, \theta_3); \qquad y_{it} = k(y_{it}^*)$$

where $i$ indexes individuals ($i = 1, \dots, N$) and $t$ time periods ($t = 1, \dots, T$); $y_{it}^*$ and $z_{it}^*$ are latent endogenous variables with observed counterparts $y_{it}$ and $z_{it}$; $m_1$ and $m_2$ denote general functions characterised by the unknown parameters in $\theta_1$ and $\theta_2$, respectively. The mapping from the latent to the observed variables is through the censoring functions $h$ and $k$, $h$ depending on another unknown parameter vector $\theta_3$. An error component structure is specified for the disturbance term of each equation ($\mu_i$ and $\eta_{it}$ for equation (9.3.36), $\alpha_i$ and $\nu_{it}$ for equation (9.3.37)), with the components being independent across individuals. Denoting $\varepsilon_{it} = \mu_i + \eta_{it}$ and $u_{it} = \alpha_i + \nu_{it}$, it is assumed that

$$u_i \mid X_i \sim NID(0, \sigma_\alpha^2\iota\iota' + \sigma_\nu^2 I); \qquad E(\varepsilon_{it} \mid X_i, u_i) = \tau_1 u_{it} + \tau_2\bar{u}_i \qquad (9.3.38)$$

where $\iota$ is a vector of ones, $u_i$ is the $T$-vector of the $u_{it}$'s for individual $i$, $X_i = [x_{i1}, \dots, x_{iT}]'$ and $\bar{u}_i = T^{-1}\sum_{t=1}^{T}u_{it}$; $\tau_1$ and $\tau_2$ are unknown constants. Equation (9.3.38) reflects the endogenous character of $z_{it}^*$.

Two variants are considered for the censoring mechanisms:

(1) $z_{it}^*$ is censored through $h(\cdot)$, and $y_{it}^*$ is observed only for certain values of $z_{i1}, \dots, z_{iT}$, i.e.

$$y_{it} = y_{it}^* \quad \text{if } g_t(z_{i1}, \dots, z_{iT}) = 1; \qquad y_{it} = 0 \;\text{(unobserved)} \quad \text{if } g_t(z_{i1}, \dots, z_{iT}) = 0;$$

and

(2) $z_{it}^*$ is observed, and only $y_{it}^*$ is censored, through $k(\cdot)$.


The first variant allows for conditional moment estimation: equation (9.3.37) is first estimated by ML, and equation (9.3.36) is then estimated by the conditional moment method, after adding to its right hand side the conditional expectation of its errors given the exogenous variables and the errors of equation (9.3.37), in order to take into account the endogeneity of $z_{it}$. For the second variant, a two-step conditional ML approach is proposed: the second equation is first estimated by ML, as $z_{it}$ is observed, and the first equation is then estimated by conditional ML, i.e. by maximising the conditional likelihood given $z_i$. Generalisations to multiple endogenous variables are briefly mentioned. The first method is applied to a model analysing the influence of the number of hours worked on the hourly wage rate, keeping in mind the potential endogeneity of the former. Through this application, the authors point out the usefulness of two-step methods in a context where the maximum likelihood procedure is impractical.

9.4 Conclusion

To conclude, we would like to make a few general remarks.

First, let us add a word on the different uses of the same terminology and the possible confusion arising from it, especially for students. As mentioned before, regressors correlated with the error term (whatever component of it) result in inconsistent/biased OLS/GLS estimates, and one has to resort to IV/GMM methods. When data are one-dimensional, there is no room for confusion. In a panel data setting, however, the same terminology of 'endogeneity of regressors' may be used whether it concerns correlation with the specific effects or with the residual disturbance term. Though it is correct to use the same name in both cases, the researcher has to check what type of endogeneity she is faced with before adopting a solution: some methods or transformations that are valid for one may not be valid for the other, and vice versa.

Again with students in mind, we would like to point out that the terms IV and GMM can rightly be used interchangeably, as all IV estimators can also be interpreted as GMM estimators using the corresponding moment conditions. But one should understand how the same estimator can be obtained both ways, especially for implementing the estimation methods in a software package which may not explicitly have one or the other term in its commands.

We now turn to areas where research could be continued on this topic. First of all, the reader will have noticed that we have not specifically dealt with hypothesis testing in this chapter. This is because the tests on the various coefficients and variance components are only asymptotic, based on the limiting distributions of the respective estimators, and can be derived relatively easily as straightforward extensions of their counterparts in the single-equation model. No exact results on the distributions are available so far. This leads us precisely to one possible area for further theoretical research, namely the derivation of the exact distributions of the various estimators developed above, or of better approximations to the exact distributions than the asymptotic ones, especially for small samples, using recent


techniques like the bootstrap or saddlepoint approximations.

Finally, regarding the practical implementation of the various IV methods, we are happy to note that many of the above procedures have been included in the econometric software available on the market. G2SLS, within-2SLS and EC2SLS are easily implemented in STATA, which offers many estimation and inference possibilities for panel data in general. Matrix manipulations are also convenient in this programme, allowing easy and quick transformations of variables before entering them in a regression. Other packages like TSP, LIMDEP and RATS have also included panel data estimation possibilities. The reader is invited to go through the chapter devoted to this topic in this volume for an excellent review of the different options available.

References

Ahn, S.C., and P. Schmidt [1995]: Efficient Estimation of Models for Dynamic Panel Data. Journal of Econometrics, 68, 5-27.
Amemiya, T. [1971]: The Estimation of Variances in a Variance Components Model. International Economic Review, 12, 1-13.
Amemiya, T., and T.E. McCurdy [1986]: Instrumental Variable Estimation of an Error Components Model. Econometrica, 54, 869-881.
Anderson, T.W., and C. Hsiao [1981]: Estimation of Dynamic Models with Error Components. Journal of the American Statistical Association, 76, 598-606.
Anderson, T.W., and C. Hsiao [1982]: Formulation and Estimation of Dynamic Models Using Panel Data. Journal of Econometrics, 18, 47-82.
Arellano, M., and S. Bond [1991]: Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. Review of Economic Studies, 58, 277-297.
Arellano, M., and O. Bover [1995]: Another Look at the Instrumental Variable Estimation of Error-Components Models. Journal of Econometrics, 68, 29-51.
Avery, R.B. [1977]: Error Component Models and Seemingly Unrelated Regressions. Econometrica, 45, 199-209.
Balestra, P. [1978]: Determinant and Inverse of a Sum of Matrices with Applications in Economics and Statistics. Document de travail 24, Institut de Mathématiques Economiques de Dijon, France.
Balestra, P. [1983]: La Dérivation Matricielle. Collection de l'Institut de Mathématiques Economiques de Dijon, 12, Sirey, Paris.
Balestra, P., and J. (Varadharajan-)Krishnakumar [1987]: Full Information Estimations of a System of Simultaneous Equations with Error Component Structure. Econometric Theory, 3, 223-246.
Baltagi, B.H. [1980]: On Seemingly Unrelated Regressions with Error Components. Econometrica, 48, 1547-1551.
Baltagi, B.H. [1981]: Simultaneous Equations with Error Components. Journal of Econometrics, 17, 189-200.
Baltagi, B.H. [1984]: A Monte Carlo Study for Pooling Time Series of Cross-Section Data in the Simultaneous Equations Model. International Economic Review, 25, 603-624.
Baltagi, B.H. [2001]: Econometric Analysis of Panel Data, second edition. Chichester: Wiley.


Baltagi, B.H., and Y-J. Chang [2000]: Simultaneous Equations with Incomplete Panels. Econometric Theory, 16, 269-279.
Bekker, P.A. [1994]: Alternative Approximations to the Distributions of Instrumental Variable Estimators. Econometrica, 62, 657-681.
Bekker, P., A. Kapteyn, and T. Wansbeek [1987]: Consistent Sets of Estimates for Regressions with Correlated or Uncorrelated Measurement Errors in Arbitrary Subsets of All Variables. Econometrica, 55, 1223-1230.
Biørn, E. [1992]: The Bias of Some Estimators for Panel Data Models with Measurement Errors. Empirical Economics, 17, 51-66.
Biørn, E. [1996]: Panel Data with Measurement Errors. Chapter 10 in The Econometrics of Panel Data. Handbook of the Theory with Applications, ed. by L. Mátyás and P. Sevestre. Dordrecht: Kluwer.
Biørn, E. [2000]: Panel Data with Measurement Errors. Instrumental Variables and GMM Estimators Combining Levels and Differences. Econometric Reviews, 19, 391-424.
Biørn, E. [2003]: Handling the Measurement Error Problem by Means of Panel Data: Moment Methods Applied on Firm Data. Chapter 24 in Econometrics and the Philosophy of Economics, ed. by B. Stigum. Princeton: Princeton University Press.
Biørn, E., and T.J. Klette [1998]: Panel Data with Errors-in-Variables: Essential and Redundant Orthogonality Conditions in GMM-Estimation. Economics Letters, 59, 275-282.
Biørn, E., and T.J. Klette [1999]: The Labour Input Response to Permanent Changes in Output: An Errors in Variables Analysis Based on Panel Data. Scandinavian Journal of Economics, 101, 379-404.
Blundell, R., and S. Bond [1998]: Initial Conditions and Moment Restrictions in Dynamic Panel Data Models. Journal of Econometrics, 87, 115-143.
Bowden, R.J., and D.A. Turkington [1984]: Instrumental Variables. Econometric Society Publication, No. 8, Cambridge University Press, Cambridge.
Breusch, T.S., G.E. Mizon, and P. Schmidt [1987]: Efficient Estimation Using Panel Data. Michigan State University Econometrics Workshop Paper 8608.
Breusch, T., H. Qian, P. Schmidt, and D. Wyhowski [1999]: Redundancy of Moment Conditions. Journal of Econometrics, 91, 89-111.
Chamberlain, G. [1987]: Asymptotic Efficiency in Estimation With Conditional Moment Restrictions. Journal of Econometrics, 34, 305-334.
Cornwell, C., P. Schmidt, and D. Wyhowski [1992]: Simultaneous Equations and Panel Data. Journal of Econometrics, 51, 151-181.
Davidson, R., and J.G. MacKinnon [1993]: Estimation and Inference in Econometrics. Oxford: Oxford University Press.
Don, F.J.H. [1985]: The Use of Generalized Inverses in Restricted Maximum Likelihood. Linear Algebra and its Applications, 70.
Erickson, T. [1993]: Restricting Regression Slopes in the Errors-in-Variables Model by Bounding the Error Correlation. Econometrica, 91, 959-969.
Frisch, R. [1934]: Statistical Confluence Analysis by Means of Complete Regression Systems. Oslo: Universitetets Økonomiske Institutt.
Fisher, F.M. [1966]: The Identification Problem in Econometrics. New York: McGraw-Hill.


Fuller, W.A. [1987]: Measurement Error Models. New York: Wiley.
Gouriéroux, C., A. Monfort, and E. Renault [1990]: Two Stage GMM with Application to Regression with Heteroscedasticity of Unknown Form. CEPREMAP, N.9110, August.
Griliches, Z., and J.A. Hausman [1986]: Errors in Variables in Panel Data. Journal of Econometrics, 31, 93-118.
Griliches, Z., and M.D. Intriligator (eds.) [1983]: Handbook of Econometrics. North-Holland Publishing Company, Amsterdam.
Hansen, L.P. [1982]: Large Sample Properties of Generalized Method of Moments Estimators. Econometrica, 50, 1029-1054.
Harris, D., and L. Mátyás [1999]: Introduction to the Generalized Method of Moments Estimation. Chapter 1 in Generalized Method of Moments Estimation, ed. by L. Mátyás. Cambridge: Cambridge University Press.
Hausman, J.A., and W.E. Taylor [1983]: Identification in Linear Simultaneous Equations Models with Covariance Restrictions: An Instrumental Variables Interpretation. Econometrica, 51, 1527-1549.
Holtz-Eakin, D., W. Newey, and H.S. Rosen [1988]: Estimating Vector Autoregressions with Panel Data. Econometrica, 56, 1371-1395.
Hsiao, C. [2003]: Analysis of Panel Data, 2nd edition. Cambridge: Cambridge University Press.
Klepper, S., and E. Leamer [1984]: Consistent Sets of Estimates for Regressions with Errors in All Variables. Econometrica, 52, 163-183.
Koopmans, T.C. [1953]: Identification Problems in Economic Model Construction. In Studies in Econometric Method (Cowles Commission Monograph 14), edited by W.C. Hood and T.C. Koopmans, New York: John Wiley and Sons.
Krishnakumar, J. [1988]: Estimation of Simultaneous Equation Models with Error Components Structure. Springer-Verlag, Berlin-Heidelberg.
Maddala, G.S. [1971]: The Use of Variance Components Models in Pooling Cross Section and Time Series Data. Econometrica, 39, 341-358.
Magnus, J.R. [1982]: Multivariate Error Components Analysis of Linear and Non-Linear Regression Models by Maximum Likelihood. Journal of Econometrics, 19, 239-285.
Mátyás, L., and L. Lovrics [1990]: Small Sample Properties of Simultaneous Error Components Models. Economics Letters, 32, 25-34.
McCabe, B., and A. Tremayne [1993]: Elements of Modern Asymptotic Theory with Statistical Applications. Manchester: Manchester University Press.
Nagar, A.L. [1959]: The Bias and Moment Matrix of the General k-class Estimators of the Parameters in Simultaneous Equations. Econometrica, 27, 575-595.
Nelson, C.R., and R. Startz [1990]: Some Further Results on the Exact Small Sample Properties of the Instrumental Variable Estimator. Econometrica, 58, 967-976.
Nerlove, M. [1971]: A Note on Error Components Models. Econometrica, 39, 383-396.
Newey, W.K. [1985]: Generalized Method of Moments Specification Testing. Journal of Econometrics, 29, 229-256.
Newey, W.K., and D. McFadden [1994]: Large Sample Estimation and Hypothesis Testing. Chapter 36 in Handbook of Econometrics, Vol. IV, ed. by R.F. Engle and D.L. McFadden. Amsterdam: North-Holland.


Pagan, A. [1979]: Some Consequences of Viewing LIML as an Iterated Aitken Estimator. Economics Letters, 3, 369-372.
Paterno, E.M., Y. Amemiya, and Y. Amemiya [1996]: Random Effect and Random Coefficient Analysis with Errors-in-Variables. 1996 Proceedings of the Business and Economic Statistics Section, pp. 76-79.
Phillips, P.C.B. [1982]: Small Sample Distribution Theory in Econometric Models of Simultaneous Equations. Cowles Foundation Discussion Paper No. 617, Yale University.
Prucha, I.R. [1984]: On the Asymptotic Efficiency of Feasible Aitken Estimator for Seemingly Unrelated Regression Models with Error Components. Econometrica, 52, 203-207.
Prucha, I.R. [1985]: Maximum Likelihood and Instrumental Variable Estimation in Simultaneous Equation Systems with Error Components. International Economic Review, 26, 491-506.
Reiersøl, O. [1950]: Identifiability of a Linear Relation Between Variables which are Subject to Error. Econometrica, 18, 375-389.
Rothenberg, T.J. [1971]: Identification in Parametric Models. Econometrica, 39, 577-592.
Schmidt, P. [1990]: Three-Stage Least Squares with Different Instruments for Different Equations. Journal of Econometrics, 43, 389-394.
Sevestre, P., and A. Trognon [2007]: Dynamic Linear Models. Chapter 7 in The Econometrics of Panel Data. Handbook of the Theory with Applications, ed. by L. Mátyás and P. Sevestre. Dordrecht: Kluwer.
Shalabh [2003]: Consistent Estimation of Coefficients in Measurement Error Models with Replicated Observations. Journal of Multivariate Analysis, 86, 227-241.
Staiger, D., and J.H. Stock [1997]: Instrumental Variables Regression With Weak Instruments. Econometrica, 65, 557-586.
Swamy, P.A.V.B., and S.S. Arora [1972]: The Exact Finite Sample Properties of the Estimators of Coefficients in the Error Components Regression Models. Econometrica, 40, 261-275.
Theil, H. [1971]: Principles of Econometrics. North-Holland Publishing Company, Amsterdam.
Vella, F., and M. Verbeek [1999]: Two-step Estimation of Panel Data Models with Censored Endogenous Variables and Selection Bias. Journal of Econometrics, 90, 239-263.
Wansbeek, T.J. [2001]: GMM Estimation in Panel Data Models with Measurement Error. Journal of Econometrics, 104, 259-268.
Wansbeek, T.J., and R.H. Koning [1991]: Measurement Error and Panel Data. Statistica Neerlandica, 45, 85-92.
Wansbeek, T.J., and E. Meijer [2000]: Measurement Error and Latent Variables in Econometrics. Amsterdam: Elsevier.
White, H. [1982]: Instrumental Variables Regression with Independent Observations. Econometrica, 50, 483-499.
White, H. [1984]: Asymptotic Theory for Econometricians. Orlando: Academic Press.
White, H. [1986]: Instrumental Variables Analogs of Generalized Least Squares Estimators. In Advances in Statistical Analysis and Statistical Computing. Theory and Applications, vol. 1, ed. by R.S. Mariano, JAI Press, pp. 173-227.
Zellner, A., and H. Theil [1962]: Three Stage Least Squares: Simultaneous Estimation of Simultaneous Equations. Econometrica, 30, 54-78.
Ziliak, J.P. [1997]: Efficient Estimation With Panel Data When Instruments Are Predetermined: An Empirical Comparison of Moment-Condition Estimators. Journal of Business and Economic Statistics, 15, 419-431.

