GMM Estimation of Spatial Autoregressive Models in a System of ...

2 downloads 0 Views 505KB Size Report
2 Model and Moment Conditions. 2.1 Model. The model considered in this paper is given by a system of A simultaneous equations for B cross sectional units,.
GMM Estimation of Spatial Autoregressive Models in a System of Simultaneous Equations with Heteroskedasticity Xiaodong Liu

Paulo Saraiva

Department of Economics,

Department of Economics,

University of Colorado Boulder

University of Colorado Boulder

Email: [email protected]

Email: [email protected]

October, 2016

Abstract This paper proposes a GMM estimation framework for the SAR model in a system of simultaneous equations with heteroskedastic disturbances. Besides linear moment conditions, the proposed GMM estimator also utilizes quadratic moment conditions based on the covariance structure of model disturbances within and across equations. Compared with the QML approach, the GMM estimator is easier to implement and robust under heteroskedasticity of unknown form. We derive the heteroskedasticity-robust standard error for the GMM estimator. Monte Carlo experiments show that the proposed GMM estimator performs well in …nite samples. JEL classi…cation: C31, C36 Key words: simultaneous equations, spatial autoregressive models, quadratic moment conditions, unknown heteroskedasticity

1

Introduction

The spatial autoregressive (SAR) model introduced by Cli¤ and Ord (1973, 1981) has received considerable attention in various …elds of economics as it provides a convenient framework to model the interaction between economic agents. However, with a few exceptions (e.g., Kelejian We thank the editor, the associate editor, and two anonymous referees for valuable comments and suggestions. The remaining errors are our own.

1

and Prucha, 2004; Baltagi and Pirotte, 2011; Yang and Lee, 2017), most theoretical works in the spatial econometrics literature focus on the single-equation SAR model, which assumes that an economic agent’s choice (or outcome) in a certain activity is isolated from her and other agents’choices (or outcomes) in related activities. This restrictive assumption potentially limits the usefulness of the SAR model in many contexts. To incorporate the interdependence of economic agents’ choices and outcomes across di¤erent activities, Kelejian and Prucha (2004) extends the single-equation SAR model to the simultaneousequation SAR model. They propose both limited information two stage least squares (2SLS) and full information three stage least squares (3SLS) estimators for the estimation of model parameters and establish the asymptotic properties of the estimators. In a recent paper, Yang and Lee (2017) study the identi…cation and estimation of the simultaneous-equation SAR model by the full information quasi-maximum likelihood (QML) approach. They give identi…cation conditions for the simultaneous-equation SAR model that are analogous to the rank and order conditions for the classical simultaneous-equation model and derive asymptotic properties of the QML estimator. The QML estimator is asymptotically more e¢ cient than the 3SLS estimator under normality but can be computationally di¢ cult to implement. In this paper, we propose a generalized method of moments (GMM) estimator for the identi…cation and estimation of simultaneous-equation SAR models with heteroskedastic disturbances. Similar to the GMM estimator proposed by Lee (2007) and Lin and Lee (2010) for single-equation SAR models, the GMM estimator utilizes both linear moment conditions based on the orthogonality condition between the instrumental variable (IV) and model disturbances, and quadratic moment conditions based on the covariance structure of model disturbances. While the single-equation GMM estimator can be considered as an equation-by-equation limited information estimator for a system of simultaneous equations, the simultaneous-equation GMM estimator proposed in this paper exploits the correlation structure of disturbances within and across equations and thus is a full information estimator. We study the identi…cation of model parameters under the GMM framework and derive asymptotic properties of the GMM estimator under heteroskedasticity of unknown form. Furthermore, we propose a heteroskedasticity-robust estimator for the asymptotic variance-covariance matrix of the GMM estimator in the spirit of White (1980). The GMM estimator is asymptotically more e¢ cient than the 3SLS estimator. Compared with the QML estimator considered in Yang and Lee (2017), the GMM estimator is easier to implement and robust under heteroskedasticity. Monte 2

Carlo experiments show that the proposed GMM estimator performs well in …nite samples. The remaining of this paper is organized as follows. In Section 2, we describe the model and give the moment conditions used to construct the GMM estimator. In Section 3, we establish the identi…cation for the model under the GMM framework. We derive the asymptotic properties of the GMM estimator in Section 4. Results of Monte Carlo simulation experiments are reported in Section 5. Section 6 brie‡y concludes. Proofs are collected in the Appendix. Throughout the paper, we adopt the following notation. For an n diag(A) denote an n diag(a11 ;

n diagonal matrix with the i-th diagonal element being aii , i.e., diag(A) =

; ann ). Let (A) denotes the spectral radius of the square matrix A. For an n

matrix B = [bij ], the vectorization of B is denoted by vec(B) = (b11 ; In denote the n

2

n matrix A = [aij ], let

; bn1 ; b12 ;

m

; bnm )0 .1 Let

n identity matrix and in;k denote the k-th column of In .

Model and Moment Conditions

2.1

Model

The model considered in this paper is given by a system of m simultaneous equations for n cross sectional units, Y=Y where Y = [y1 ;

; ym ] is an n

exogenous variables, and U = [u1 ; n

0

+ WY

0

+ XB0 + U;

(1)

m matrix of endogenous variables, X is an n

KX matrix of

m matrix of disturbances.2 W = [wij ] is an

; um ] is an n

n nonstochastic matrix of spatial weights, with wij representing the proximity between cross

sectional units i and j.3 The diagonal elements of W are normalized to be zeros. In the literature, WY is usually referred to as the spatial lag. KX of

0,

0

and B0 are, respectively, m

m, m

m and

m matrices of true parameters in the data generating process (DGP). The diagonal elements 0

are normalized to be zeros.

In general, the identi…cation of simultaneous-equation models needs exclusion restrictions. Let k;0 ,

k;0

and

k;0

denote vectors of nonzero elements of the k-th columns of

0,

0

and B0 re-

spectively under some exclusion restrictions. Let Yk , Yk and Xk denote the corresponding matrices containing columns of Y, Y = WY and X that appear in the right hand side of the k-th equation. A, B, C are conformable matrices, then vec(ABC) = (C0 A)vec(B), where denotes the Kronecker product. this paper, all variables are allowed to depend on the sample size, i.e., are allowed to formulate triangular arrays as in Kelejian and Prucha (2010). Nevertheless, we suppress the subscript n to simplify the notation. 3 For SAR models, the notion of proximity is not limited to the geographical sense. It can be economic proximity, technology proximity, or social proximity. Hence the SAR model has a broad range of applications. 1 If

2 In

3

Then, the k-th equation of model (1) is

yk = Yk

+ Yk

k;0

k;0

+ Xk

k;0

+ uk :

(2)

We maintain the following assumptions regarding the DGP. Assumption 1 Let uik denote the (i; k)-th element of U and u denote the vectorization of U, i.e., u = vec(U). (i) (ui1 ;

; uim ) are independently distributed across i with zero mean. (ii) 2

11

6 6 E(uu ) = 6 6 4 0

is nonsingular, with i = 1;

kl

=

lk

; n and k; l; s; t = 1;

= diag(

1;kl ;

.. .

1m

..

.. .

.

m1

;

n;kl ).

mm

3 7 7 7 7 5

(iii) Ejuik uil uis uit j1+ is bounded for any

; m, for some positive constant .

Assumption 2 The elements of X are uniformly bounded constants. X has full column rank KX . limn!1 n

1

X0 X exists and is nonsingular.

Assumption 3

0

is nonsingular with a zero diagonal. (

0 (Im

0)

1

) < 1= (W).

Assumption 4 W has a zero diagonal. The row and column sums of W and (Imn 0 0

W)

1

In

are uniformly bounded in absolute value.

Assumption 5 for k = 1;

0 0

k;0

=(

0 k;0 ;

0 k;0 ;

0 0 k;0 )

is in the interior of a compact and convex parameter space

; m.

The above assumptions are based on some standard assumptions in the literature of SAR models (see, e.g., Kelejian and Prucha, 2004; Lee, 2007; Lin and Lee, 2010). In particular, Assumption 3 is 0 0

from Yang and Lee (2017). Under this assumption, S = Imn

In

0 0

W is nonsingular,

and hence the simultaneous-equation SAR model (1) has a well de…ned reduced form

1

y=S

(B00

In )x + S

1

where y = vec(Y), x = vec(X), and u = vec(U). Note that, when m = 1, we have and j

11;0 j

0

=

11;0 .

Then, (

0 (Im

0)

1

(3)

u;

0

= 0

) < 1= (W) becomes the familiar parameter constraint

< 1= (W) for the single-equation SAR model. 4

2.2

Motivating examples

To illustrate the empirical relevance of the proposed model, we give two motivating examples in this subsection. The …rst example models the spillover e¤ect of …scal policies between local jurisdictions. The second example characterizes the conformity e¤ect in social networks. 2.2.1

Fiscal policy interaction

Consider a set of n local jurisdictions. Each jurisdiction chooses expenditures in m publicly provided services to maximize the social welfare. The social welfare function of jurisdiction i is given by

Vi =

m X

(

ik

+

k=1

where

ik

m X l=1

lk

n X

m

wij yjl )yik

j=1

m

1 XX 2

lk yik yil ;

(4)

k=1 l=1

captures the existing condition of publicly provided service k in jurisdiction i (and other

exogenous characteristics of jurisdiction i), yik represents the spending by jurisdiction i on publicly provided service k, and wij measures the geographical proximity between jurisdictions i and j (e.g., one could let wij = 1 if jurisdictions i and j share a common border and wij = 0 otherwise). The …rst term of the social welfare function (4) captures the social welfare gain from publicly provided services. The marginal social welfare gain of the expenditure yik is given by ik + Pm Pn Pn l=1 lk j=1 wij yjl , where j=1 wij yjl represents the spending by neighboring jurisdictions on

publicly provided services and its coe¢ cient

lk

captures the spillover e¤ect of publicly provided

services (see, e.g., Allers and Elhorst, 2011). For example, people may use libraries, parks and other recreation facilities in neighboring jurisdictions. In this case, the spending by a jurisdiction on such facilities might be considered as a substitute for the spending by a neighboring jurisdiction on similar or related facilities and thus generate a negative spillover e¤ect (i.e.

lk

< 0). On the other hand,

if the services provided by neighboring jurisdictions are complements (e.g. road networks and business parks, or hospitals and nursing homes), then the spillover e¤ect is expected to be positive (i.e. lk

> 0). The second term of the social welfare function (4) captures the cost of publicly provided

services.4 The coe¢ cient

lk

(

lk

=

kl )

represents the substitution e¤ect between expenditures by

a jurisdiction on publicly provided services k and l. 4 See Hauptneier et al. (2012) for some discussion on including the cost of publicly provided services instead of imposing a budget constraint in the social welfare function.

5

Maximizing the social welfare function (4) yields the best response function m X

yik =

lk yik

+

l=1;l6=k

where

lk

=

lk = kk ,

lk

=

m X

lk

and

=

ik

wij yjl +

ik

j=1

l=1

lk = kk ,

n X

ik = kk .

Suppose

ik

= x0ik

+ uik , where

k

xik is a vector of the observable characteristics of jurisdiction i and uik captures the unobservable heterogeneity of jurisdiction i. Then the best response function implies the simultaneous-equation SAR model (2). 2.2.2

Social conformity

Patacchini and Zenou (2012) consider a social conformity model where the social norm is given by the average behavior of peers in a certain activity. We generalize their model by de…ning the social norm based on a portfolio of di¤erence activities. Suppose a set of n individuals interact in a social network. The network topology is captured by the matrix W = [wij ]. Let di denote the number of friends of individual i. A possible speci…cation of W is such that wij = 1=di if individuals i and j are friends and wij = 0 otherwise. Individual i choose e¤ort levels yi1 ;

; yim simultaneously in m

activities to maximize her utility function

Ui =

m X

k=1

m

ik yik

m

1 XX 2

lk yik yil

m X 1

k=1

k=1 l=1

2

k (yik

m X

%lk

n X

wij yjl )2 :

(5)

j=1

l=1

The …rst term of the utility function (5) captures the payo¤ from the e¤orts with the productivity of individual i in activity k given by

ik .

The second term is the cost from the e¤orts with the

substitution e¤ect between e¤orts in di¤erent activities captured by

lk

(

lk

=

kl ).

The last term

re‡ects the in‡uence of an individual’s friends on her own behavior. It is such that each individual wants to minimize the social distance between her own behavior yik to the social norm of that activity. The social norm for activity k is given by the weighted average behavior of her friends in Pm Pm Pn m activities l=1 %lk j=1 wij yjl with the weights %lk such that l=1 %lk = 1. The coe¢ cient k

captures the taste for conformity.

Maximizing the utility function (5) yields the best response function

yik =

m X

l=1;l6=k

lk yik +

m X l=1

lk

n X

wij yjl +

ik

j=1

6

where ik

lk

= x0ik

2.3

= k

lk =( kk

+

k ),

lk

=

k %lk =( kk

+

k ),

and

ik

=

ik =( kk

+

5 k ).

Suppose

+ uik . Then the best response function leads to the econometric model (2).

Moment conditions

Following Lee (2007) and Lin and Lee (2010), for the estimation of the simultaneous-equation SAR model (1), we consider both linear moment conditions

E(Q0 uk ) = 0;

where Q is an n

KQ matrix of IVs, and quadratic moment conditions E(u0k

where

r ’s

are n

(6)

r ul )

= tr(

r

kl );

for r = 1;

; p;

n constant matrices. Note that, if the diagonal elements of

r ’s

are zeros, then

the quadratic moment conditions become

E(u0k

r ul )

As an example, we could use Q = [X; WX;

= 0;

for r = 1;

; Wp X] and

1

; p:

(7)

= W;

2

= W2 diag(W2 );

;

p

=

Wp diag(Wp ), where p is some predetermined positive integer, to construct the linear and quadratic moment conditions. The quadratic moment conditions (7) exploit the covariance structure of model disturbances both within and across equations, and hence are more general than the quadratic moment conditions considered in Lee (2007) and Lin and Lee (2010). Let the residual function for the k-th equation be

uk (

where

k

=(

0 k;

0 k;

0 0 k) .

k)

= yk

Yk

k

Yk

k

Xk

k;

The empirical linear moment functions based on (6) can be written as

g1;k

g1;k (

k)

= Q0 uk (

k)

(8)

5 Suppose the parameters in the best response function can be identi…ed, then parameters in the utility function (5) can be identi…ed up to a scale factor. This issue is P common for simultaneous-equation models (Schmidt, 1976). Pm In this example, if lk can be identi…ed, then k = kk m l=1 lk =(1 l=1 lk ), which is identi…able up to a scale factor kk .

7

and the empirical quadratic moment functions based on (7) can be written as

g2;kl (

g2;kl

for k; l = 1;

k;

l)

0 1 uk ( k );

=[

;

0 0 p uk ( k )] ul ( l )

(9)

; m. Combining both linear and quadratic moment functions by de…ning 2

3 g ( ) 6 1 7 g( ) = 4 5; g2 ( ) where

=(

0 1;

;

0 0 m) ,

0 g1 ( ) = (g1;1 ;

(10)

0 0 ; g1;m )0 , and g2 ( ) = (g2;11 ;

0 0 ; g2;1m ; g2;21 ;

0 ; g2;mm )0 .

The identi…cation and estimation of the simultaneous-equation SAR model (1) is then based on the moment conditions E[g(

0 )]

= 0. We maintain the following assumption regarding the moment

conditions. Assumption 6 (i) The elements of Q are uniformly bounded. (ii) The diagonal elements of are zeros, and the row and column sums of limn!1 n

3

1

are uniformly bounded, for r = 1;

r

exists and is nonsingular, where

= Var[g(

r

; p. (iii)

0 )].

Identi…cation

Following Yang and Lee (2017), we establish the identi…cation of the simultaneous-equation SAR model in two steps. In the …rst step, we consider the identi…cation of the “pseudo” reduced form parameters in equation (11). In the second step, we recover the structural parameters from the “pseudo” reduced form parameters. 3.1 When

Identi…cation of the “pseudo” reduced form parameters 0

is nonsingular, the simultaneous-equation SAR model (1) has a “pseudo” reduced form

Y = WY

where

0

=

0 (Im

0)

1

,

0

= B0 (Im

0)

0

+X

1

0

+ V;

(11)

, and V = U(Im

0)

1

. Equation (11) has the

speci…cation of a multivariate SAR model (see, Yang and Lee, 2017; Liu, 2015). First, we consider the identi…cation of the “pseudo” reduced form parameters

0

=[

lk;0 ]

and

0

=[

1;0 ;

;

m;0 ]

under the GMM framework.

8

The k-th equation in model (11) is given by

yk =

m X

lk;0 Wyl

+X

+ vk ;

k;0

l=1

where 0 0

Wyl = Hl ( with Hl = (i0m;l

(

W)[Imn

0 0

1

W)]

In )x + Hl v

(12)

, x = vec(X), and v = vec(V). Hence, the residual

function for the k-th equation can be written as

vk (

k ) = yk

m X

lk Wyl

X

k

= dk (

k ) + vk +

l=1

where

=(

k

1k ;

;

mk ;

0 0 k)

m X

(

lk )Hl v;

lk;0

(13)

l=1

and dk (

k)

= [E(Wy1 );

; E(Wym ); X](

k;0

k ).

The “pseudo”reduced form parameters in model (11) can be identi…ed by the moment conditions described in the previous section. Similar to (8) and (9), the linear moment functions can be written as f1;k

f1;k (

k)

= Q0 vk (

k)

and the quadratic moment functions can be written as

f2;kl

0 and f2 ( ) = (f2;11 ; 0 )]

=

0

k;

l)

=[

0 1 vk ( k );

; m. Let f ( ) = [f1 ( )0 ; f2 ( )0 ]0 , where

for k; l = 1;

E[f (

f2;kl (

0 0 ; f2;1m ; f2;21 ;

0 ; f2;mm )0 . For

= 0, the moment equations limn!1 n

1

0 0 p vk ( k )] vl ( l )

;

=(

0

0 1;

;

0 0 m) ,

0 f1 ( ) = (f1;1 ;

0 ; f1;m )0 ,

to be identi…ed by the moment conditions

E[f ( )] = 0 need to have a unique solution at

(Hansen, 1982).

It follows from (13) that

lim n

n!1

for k = 1; at

k

=

1

E[f1;k (

k )]

= lim n n!1

1

Q0 dk (

k)

= lim n n!1

1

Q0 [E(Wy1 );

; m. The linear moment equations, limn!1 n k;0 ,

if Q0 [E(Wy1 );

1

E[f1;k (

; E(Wym ); X](

k )]

k;0

k)

= 0, have a unique solution

; E(Wym ); X] has full column rank for n su¢ ciently large. A

necessary condition for this rank condition is that [E(Wy1 );

; E(Wym ); X] has full column rank

9

of m + KX and rank(Q)

m + KX for n su¢ ciently large.

If, however, [E(Wy1 );

; E(Wym ); X] does not have full column rank, then the model may

still be identi…able via the quadratic moment conditions. Suppose for some m 2 f0; 1;

;m

1g,

E(Wyl ) and the columns of [E(Wy1 ); ; E(Wym ); X] are linearly dependent,6 i.e., E(Wyl ) = Pm ; c1;ml ; c02;l ) 2 Rm+KX , for l = k=1 c1;kl E(Wyk ) + Xc2;l for some vector of constants (c1;1l ; m + 1;

dk (

; m. In this case,

k)

=

m X

E(Wyj )[

jk;0

jk

+

j=1

m X

(

lk )c1;jl ] + X[

lk;0

k;0

k

l=m+1

and hence limn!1 n

1

E[f1;k (

k )]

+

m X

(

lk )c2;l ];

lk;0

l=m+1

= 0 implies that

=

jk

jk;0

=

k

k;0

m X

+

+

(

l=m+1 m X

(

lk )c1;jl

lk;0

(14)

lk )c2;l ;

lk;0

l=m+1

for j = 1;

; m, provided that Q0 [E(Wy1 );

; m and k = 1;

rank for n su¢ ciently large. Therefore, ( m + 1;

1k;0 ;

;

0 k;0 )

mk;0 ;

; E(Wym ); X] has full column can be identi…ed if

lk;0

(for l =

; m) can be identi…ed from the quadratic moment conditions.

With

k

given by (14), we have

E[vk (

0 k)

r vl ( l )]

=

m X

(

0 ik )tr[Hi

ik;0

i=1 m X m X

+

r E(vl v

0

)] +

m X

(

jl;0

jl )tr[

0 r Hj E(vvk )]

j=1

(

0 jl )tr[Hi

ik )( jl;0

ik;0

r Hj E(vv

0

)];

i=1 j=1

where E(vv0 ) = [(Im

0 1 0)

In ] [(Im

1

0)

In ] and E(vk v0 ) = E(vv0k )0 = (i0m;k

Therefore, the quadratic moment equations, limn!1 n a unique solution at

lim n

n!1

1

f

m X

0,

(

1

E[f2;kl (

k;

l )]

= 0 for k; l = 1;

In )E(vv0 ). ; m, have

if the equations

ik;0

0 ik )tr[Hi

r E(vl v

i=1

0

)] +

m X

(

jl;0

jl )tr[

0 r Hj E(vvk )]

(15)

j=1

+

m X m X

(

ik;0

ik )( jl;0

0 jl )tr[Hi

r Hj E(vv

0

)]g =

0;

i=1 j=1

6 We

adopt the convention that [E(Wy1 );

; E(Wym ); X] = X for m = 0.

10

for r = 1;

; p and k; l = 1;

7 0.

; m, have a unique solution at

To wrap up, su¢ cient conditions

for the identi…cation of the “pseudo” reduced form parameters are summarized in the following assumption. Assumption 7 At least one of the following conditions holds. (i) limn!1 n (ii) limn!1 n m

1

Q0 [E(Wy1 );

; E(Wym ); X] exists and has full column rank.

Q0 [E(Wy1 );

; E(Wym ); X] exists and has full column rank for some 0

1

1. The equations (15), for r = 1;

; p and k; l = 1;

; m, have a unique solution at

m 0.

Example 1 To better understand the identi…cation conditions in Assumption 7, consider the “pseudo” reduced form equations (11) with m = 2 and X = [x; Wx], where x is n

1 vector of individual-

speci…c exogenous characteristics. In this case, equations (11) can be written as

y1

=

11;0 Wy1

+

21;0 Wy2

+

11;0 x

+

21;0 Wx

+ u1

y2

=

12;0 Wy1

+

22;0 Wy2

+

12;0 x

+

22;0 Wx

+ u2 :

(16)

The “pseudo” reduced form equations (16) have the speci…cation of a multivariate SAR model (Yang and Lee, 2017). In (16), Wx is a spatial lag of the exogenous variable and its coe¢ cient captures the “contextual e¤ ect” (Manski, 1993). How to identify the endogenous peer e¤ ect captured by

kk;0

from the contextual e¤ ect has been a major interest in the literature of social interaction models. In the multivariate SAR model (16), the identi…cation problem is even more challenging because of the presence of the “cross-equation peer e¤ ect” captured by

lk;0

(l 6= k).

Identi…cation of model (16) can be achieved via linear moment conditions if Assumption 7 (i) holds, or via linear and quadratic moment conditions if Assumption 7 (ii) holds. A necessary condition for Assumption 7 (i) is that [E(Wy1 ); E(Wy2 ); x; Wx] has full column rank for n su¢ ciently large. It follows by a similar argument as in Cohen-Cole et al. (2016) that [E(Wy1 ); E(Wy2 ); x; Wx] has full column rank if and only if the matrices In ; W; W2 ; W3 are linearly independent8 and the 7 A weaker identi…cation condition can be derived based on (14) and (15) if the constants c 1;1l ; known to the researcher. 8 For example, the weights matrix W for a star network is given by 2 3 0 (n 1) 1 (n 1) 1 6 1 7 0 0 6 7 W=6 . 7: . . . .. .. .. 5 4 .. 1 0 0

; c1;ml ; c2;l are

As W3 = W, the linear independence of In ; W; W2 ; W3 does not hold.

11

parameter matrix 2 6 6 6 6 4

11;0 21;0 12;0 21;0 22;0

1

12;0

22;0 11;0

+

21;0

12;0 11;0

22;0 21;0

11;0 12;0

12;0 21;0

3

+

(

22;0

11;0 22;0

11;0

11;0

+

7 7 7 (17) 7 5

22;0 )

22;0

12;0

21;0

has full column rank. Suppose

11;0

=

=

21;0

=

12;0

22;0

= 0 in the DGP. Then the parameter matrix (17)

does not have full column rank. Therefore, [E(Wy1 ); E(Wy2 ); x; Wx] does not have full column rank and Assumption 7 (i) fails to hold. In this case, model (16) can be identi…ed if Assump0 0

tion 7 (ii) holds. From (12), E(Wyk ) = Hk ( dk (

k)

= [E(Wy1 ); E(Wym ); x; Wx](

k;0

k)

=(

1

Q0 dk (

k)

the linear moment equations limn!1 n limn!1 n

1

In )x = 0 as 1k )x

1k;0

= 0, only

= 0, which implies that

0

+(

and

1k;0

2k )Wx.

2k;0

2k;0

Q0 [x; Wx] exists and has full column rank. Identi…cation of

lk;0

Hence, from

can be identi…ed if

has to be achieved via

the quadratic moment equations (15). It follows from (15) that, for k = l = 1,

(

11;0

11 )

+(

11;0

2 11 )

+(

11;0

11 )( 21;0

for r = 1;

1

lim n

n!1

tr[H01 (

lim n

n!1

1

r

tr[H01

21 )

0 0 r )E(v1 v )]

+

r H1 E(vv

lim n

n!1

1

0

+(

)] + (

tr[H01 (

r

+

21 ) 2 21 )

21;0

lim n

1

n!1

lim n

n!1

0 0 r )H2 E(vv )]

1

tr[H02 (

tr[H02

r

+

0 0 r )E(v1 v )]

r H2 E(vv

= 0;

0

)] (18)

; p. If the matrix 2

tr[H01 (

1+

0 0 1 )E(v1 v )]

6 6 6 tr[H02 ( 1 + 01 )E(v1 v0 )] 6 6 16 lim n 6 tr[H01 1 H1 E(vv0 )] n!1 6 6 6 tr[H02 1 H2 E(vv0 )] 4 tr[H01 ( 1 + 01 )H2 E(vv0 )] has full row rank, (18) has a unique solution at ( Similarly, (

21;0

12;0 ;

22;0 )

11;0 ;

tr[H01 (

p+

0 0 p )E(v1 v )]

tr[H02 (

p+

0 0 p )E(v1 v )]

tr[H01

p H1 E(vv

0

)]

tr[H02

p H2 E(vv

0

)]

tr[H01 ( 21;0 )

p

+

and hence (

0 0 p )H2 E(vv )]

11;0 ;

21;0 )

3 7 7 7 7 7 7 7 7 7 7 5

can be identi…ed.

can be identi…ed from (15) with k = l = 2.

12

3.2

Identi…cation of the structural parameters

Provided that the “pseudo”reduced form parameters

and

0

0

can be identi…ed from the linear and

quadratic moment conditions as discussed above. Then, the identi…cation problem of the structural parameters in and

0

0

= B0 (Im

0 0) ;

= [(Im 0)

1

0 0;

B00 ]0 through the linear restrictions

0

=

0 (Im

0)

1

is essentially the same one as in the classical linear simultaneous-equation

model (see, e.g., Schmidt, 1976). Let #k;0 denote the k-th column of

0.

Suppose there are Rk

restrictions on #k;0 of the form Rk #k;0 = 0 where Rk is a Rk (2m+KX ) matrix of known constants. Following a similar argument as in Yang and Lee (2017), the su¢ cient and necessary rank condition for identi…cation is rank(Rk k = 1;

0)

= m

1, and the necessary order condition is Rk

m

1, for

; m.

Assumption 8 For k = 1;

; m, Rk #k;0 = 0 for some Rk

rank(Rk

0)

=m

(2m + KX ) constant matrix Rk with

1:

Example 2 To better understand the rank condition in Assumption 8, consider the model

y1

=

21;0 y2

+

11;0 Wy1

+X

1;0

+ u1

y2

=

12;0 y1

+

22;0 Wy2

+X

2;0

+ u2

(19)

with “pseudo” reduced-form equations

where

and

2 6 4 [

y1

=

11;0 Wy1

+

21;0 Wy2

+X

1;0

+ v1

y2

=

12;0 Wy1

+

22;0 Wy2

+X

2;0

+ v2 ;

11;0 21;0

1;0 ;

2;0 ]

12;0 22;0

= (1

3

7 5 = (1

12;0 21;0 )

12;0 21;0 )

1

[

1;0

1

+

2 6 4

(20)

3

11;0

12;0 11;0

21;0 22;0

22;0

7 5

12;0

1;0 ]:

21;0

2;0 ;

2;0

+

(21)

(22)

Suppose Assumption 7 holds for (20) and thus the “pseudo” reduced-form parameters can be identi-

13

…ed. Then, the structural parameters

0

0 0) ;

= [(Im

2

1

6 B00 ]0 = 4

0 0;

11;0

21;0

1

12;0

0

0 1;0

22;0

0 2;0

0

30 7 5

can be identi…ed from (21) and (22) if the rank condition holds. The exclusion restriction for the …rst equation of model (19) can be represented by R1 = [0; 0; 0; 1; 01 which has rank 1 if

22;0

11;0

=

22;0

Then R1

KX ].

Then R2

= [

0

11;0 ; 0]

= [0;

which has rank 1 if

22;0 ],

11;0

6= 0.

= 0 in the DGP, then (19) becomes a classical linear simultaneous equations

model, which cannot be identi…ed without imposing exclusion restrictions on

4

0

6= 0. Similarly, the exclusion restriction for the second equation can be

represented by R2 = [0; 0; 1; 0; 01 Indeed, if

KX ].

k;0 .

GMM Estimation

4.1

Consistency and asymptotic normality

Based on the moment conditions E[g(

0 )]

= 0, the GMM estimator for the simultaneous-equation

SAR model (1) is given by egmm = arg min g( )0 F0 Fg( )

(23)

where F is some conformable matrix such that limn!1 F exists with full row rank greater than or equal to dim ( ). Usually, F0 F is referred to as the GMM weighting matrix. To characterize the asymptotic distribution of the GMM estimator, …rst we need to derive Var[g(

0 )]

and D =

E[ @@ 0 g(

0 )].

As

r ’s

have zero diagonals for r = 1;

=

; p, it follows by

Lemmas A.1 and A.2 in the Appendix that

= Var[g(

0 )]

where

11

= Var[g1 (

0 )]

= (Im

0

Q)

(Im

2

6 =4

11

0

0 22

2

3

7 5;

0 6 Q 11 Q 6 .. Q) = 6 . 6 4 Q0 1m Q

(24)

Q0 ..

1m Q

.. .

. Q0

mm Q

3 7 7 7 7 5

14

and

22

= Var[g2 (

0 E(g2;ij g2;kl )j =

0 )]

with a typical block matrix in

2

0

6 tr( 6 =6 6 4 tr(

il

il

1

p

jk

jk

1)

1)

+ tr( .. . + tr(

ik

1

given by

22

jl

0 1)

.. ik

p

jl

tr(

il

1

jk

1)

+ tr( .. .

tr(

il

p

jk

p)

+ tr(

.

0 1)

ik

p

jl

0 p)

ik

p

jl

0 p)

The explicit expression for D depends on the speci…c restrictions imposed on the model parameters. Let Zk = [Yk ; Yk ; Xk ]. Then,

D=

E[

@ g( @ 0

where D1 =

and

with

1;kl

0 6 Q E(Z1 ) 6 @ E[ 0 g1 ( 0 )] = 6 6 @ 4

6 6 6 6 6 6 6 6 6 @ E[ 0 g2 ( 0 )] = 6 6 @ 6 6 6 6 6 6 6 4

= [E(Z0k

1 ul );

= [D01 ; D02 ]0 ;

2

2

D2 =

0 )]

; E(Z0k

.. . 1;1m

..

. 1;m1

.. . 1;mm

and

2;kl

3

. Q0 E(Zm )

3

1;11

0 p ul )]

..

(25)

2

7 6 7 6 7 6 7 6 7 6 7 6 7 6 7 6 7 6 7+6 7 6 7 6 7 6 7 6 7 6 7 6 7 6 7 6 5 4

= [E(Z0l

7 7 7 7 5 3

2;11

..

. 2;1m

.. . 2;m1

..

. 2;mm

0 1 uk );

; E(Z0l

7 7 7 7 7 7 7 7 7 7; 7 7 7 7 7 7 7 7 5

0 0 p uk )] .

In the fol-

lowing proposition we establish consistency and asymptotic normality of the GMM estimator egmm de…ned in (23).

Proposition 1 Suppose Assumptions 1-8 hold, Then, p

n(egmm

0)

d

! N(0; AsyVar(egmm ))

15

3

7 7 7: 7 5

where AsyVar(egmm ) = lim [(n n!1

with

1

D)0 F0 F(n

1

1

D)]

(n

1

D)0 F0 F(n

1

)F0 F(n

1

D)[(n

1

D)0 F0 F(n

1

1

D)]

and D de…ned in (24) and (25) respectively.

With F0 F in (23) replaced by (n

1

1

)

, AsyVar(egmm ) reduces to (limn!1 n 1

Therefore, by the generalized Schwarz inequality, (n However, since matrix (n

1

depends on the unknown matrix 1

)

)

1

1

D0

1

1

D)

.

is the optimal GMM weighting matrix.

, the GMM estimator with the optimal weighting

is infeasible. The following proposition extends the result in Lin and Lee (2010) 1

to the simultaneous-equation SAR model by suggesting consistent estimators for n under heteroskedasticity of unknown form. With consistently estimated n

1

and n

1

1

and n

D

D, the feasible

optimal GMM estimator and its heteroskedasticity-robust standard error can be obtained. Proposition 2 Suppose Assumptions 1-8 hold. Let e be a consistent estimator of diag(e u1k u e1l ;

estimators of n Then, n

1

e D

e k = uk (ek ). Let n ;u enk u enl ) where u eik is the i-th element of u

n

1

and n

1

1

D, with

D = op (1) and n

0

and

1e

n

kl

1

1

0

and e kl =

e and n D

1e

be

and D replaced by e and e kl respectively.

in

= op (1).

Finally Proposition 3 establishes asymptotic normality of the feasible optimal GMM estimator. Proposition 3 Suppose Assumptions 1-8 hold. The optimal GMM estimator is given by

where n

1e

bgmm = arg min g( )0 e g( ); is a consistent estimator of n

1

. Then,

p

n(bgmm

(26)

0)

d

! N(0; (limn!1 n

1

D0 D)

1

Note that, the 3SLS estimator can be treated as a special case of the optimal GMM estimator using only linear moment conditions, i.e., b3SLS = arg min g1 ( )0 e 1 g1 ( ) = (Z0 PZ) e 11 where

2

6 Z1 6 Z=6 6 4

..

1

e Z0 Py;

3

. Zm

7 7 7 7 5 16

).

e = (Im and P

Since D0

1

Q0 ) e (Im

Q)[(Im

p

D

D01

n(b3SLS

1 11 D1

= D02

1

Q)]

0)

(Im

Q0 ). Similar to Proposition 3, we can show that

d

! N(0; ( lim n n!1

1 22 D2 ,

1

1 1 11 D1 )):

D01

which is positive semi-de…nite, the proposed GMM esti-

mator is asymptotically more e¢ cient than the 3SLS estimator. 4.2

Best moment conditions under homoskedasticity

The above optimal GMM estimator is only “optimal” given the chosen moment conditions. The asymptotic e¢ ciency of the optimal GMM estimator can be improved by choosing the “best”moment conditions. As discussed in Lin and Lee (2010), under heteroskedasticity of unknown form, the best moment conditions may not be available. However, under homoskedasticity, it is possible to …nd the best linear and quadratic moment conditions with Q and

r ’s

satisfying Assumption 6. In

general, the best moment conditions depend on the model speci…cation. For expositional purpose, we consider a two-equation SAR model given by

y1

=

21;0 y2

+

11;0 Wy1

+

21;0 Wy2

+ X1

1;0

+ u1

y2

=

12;0 y1

+

12;0 Wy1

+

22;0 Wy2

+ X2

2;0

+ u2

where X1 and X2 are respectively n n

K1 and n

K2 sub-matrices of X. Suppose u1 and u2 are

1 vectors of i.i.d. random variables with zero mean and E(u1 u01 ) =

E(u1 u02 ) =

12 In .

(27)

11 In ,

E(u2 u02 ) =

22 In

and

The reduced form of (27) is 2

3 y 6 1 7 4 5=S y2

2

X1 16 4 X2

1;0 2;0

3

2

3 u 1 7 7 16 5+S 4 5; u2

(28)

where

S = I2n

0 0

In

0 0

2

6 W=4

1

12;0

21;0

1

30 7 5

In

2 6 4

11;0

12;0

21;0

22;0

30 7 5

W:

17

Let (S

1

)kl denote the (k; l)-th block matrix of S

1

, i.e., (S

1

)kl = (i02;k

In )S

1

(i2;l

In ), for

k; l = 1; 2. Then, from (28),

y1

=

(S

1

y2

=

(S

1

)11 X1

+ (S

1

1;0

)21 X1

+ (S

1

1;0

)12 X2

+ (S

1

)11 u1 + (S

1

2;0

)12 u2

)22 X2

+ (S

1

)21 u1 + (S

1

2;0

)22 u2 :

(29)

With the residual functions

u1 (

1)

= y1

21 y2

11 Wy1

21 Wy2

X1

1

u2 (

2)

= y2

12 y1

12 Wy1

22 Wy2

X2

2;

the moment functions are given by g( ) = [g1 ( )0 ; g2 ( )0 ]0 , where 2

3 ) 1 7 5 2)

0 6 Q u1 ( g1 ( ) = 4 Q0 u2 (

and

with g2;kl (

2

k;

l)

= [

0 1 uk ( k );

g2;11 (

6 6 6 g2;12 ( 6 g2 ( ) = 6 6 6 g2;21 ( 4 g2;22 ( ;

1;

1)

3

7 7 7 1; 2) 7 7; 7 2; 1) 7 5 2; 2)

0 0 p uk ( k )] ul ( l ).

Then, the asymptotic variance-covariance

matrix for the optimal GMM estimator de…ned in (26) is AsyVar(bgmm ) = (limn!1 n Under homoskedasticity,

1

D0 D)

1

.

de…ned in (24) can be simpli…ed so that

11

= Var[g1 (

0 )]

2

6 =4

0

11 Q

Q

0 12 Q Q

0

12 Q

3

Q 7 5 0 22 Q Q

18

and

2

22

6 6 6 6 = 6 6 6 4

= Var[g2 (

0 )]

2 11

11 12

2 12

11 12

2 12

11 22

22 12

11 12

11 22

2 12

22 12

2 12

22 12

22 12

2 22

11 12

2

3

2 11

6 6 6 6 1+6 6 6 4

7 7 7 7 7 7 7 5

11 12

11 12

2 12

11 12

11 22

2 12

22 12

11 12

2 12

11 22

22 12

2 12

22 12

22 12

2 22

3 7 7 7 7 7 7 7 5

2;

with

1

2

6 tr( 1 6 .. =6 . 6 4 tr( p

1)

tr( ..

1

.. .

.

1)

tr(

p

3 ) p 7 7 7 7 5 ) p

and

2

2

6 tr( 1 6 .. =6 . 6 4 tr( p

0 1)

tr( ..

1

0 p)

.. .

.

0 1)

tr(

p

0 p)

3

7 7 7: 7 5

Furthermore, D de…ned in (25) can be simpli…ed so that

D1 =

E[

@ g1 ( @ 0

and

2

D2 =

with W(S

1;kl 1

= [E(Z0k

6 6 6 @ 6 E[ 0 g2 ( 0 )] = 6 6 @ 6 4

1 ul );

; E(Z0k

0 p ul )]

0 )]

2

0

6 Q E(Z1 ) =4 3

1;11 1;12 1;21 1;22

and

2;kl

0

Q E(Z2 ) 2

7 6 7 6 7 6 7 6 7+6 7 6 7 6 5 4

= [E(Z0l

3 7 5

(30)

2;11 2;12 2;21 2;22 0 1 uk );

; E(Z0l

3

7 7 7 7 7; 7 7 5

0 0 p uk )] .

(31)

Let Gkl =

)kl for k; l = 1; 2. From the reduced form (29),

E(Zk ) = [(S

1

)3

k;1 X1

1;0 +(S

1

)3

k;2 X2

2;0 ; G11 X1

1;0 +G12 X2

2;0 ; G21 X1

1;0 +G22 X2

2;0 ; Xk ]

19

and

2

6 6 6 6 0 E(Zk Aul ) = 6 6 6 4

0 1l tr[A (S

1

)3

1l tr(A

0

G11 ) +

1l tr(A

0

G21 ) +

1

)3

l2 tr(A

0

G12 )

l2 tr(A

0

G22 )

k;2 ]

0

for k; l = 1; 2, where A can be replaced by either As we can see now, both

0 l2 tr[A (S

k;1 ] +

r

0 r

or

for r = 1;

and D depend on Q and

r ’s.

3 7 7 7 7 7 7 7 5

; p.

The best Q and

r ’s

that minimize the asymptotic variance-covariance matrix of bgmm given by (limn!1 n

1

are those

D0 D)

1

.

Under homoskedasticity, we can use the criterion for the redundancy of moment conditions (Breusch et al., 1999) to show that the best Q and

1

)12 X2 ; (S

)11

diag[(S

1

)11 ]

5

= G11

diag(G11 )

)12

diag[(S

1

)12 ]

6

= G12

diag(G12 )

)21

diag[(S

1

)21 ]

7

= G21

diag(G21 )

)22

diag[(S

1

)22 ]

8

= G22

diag(G22 ):

[X; (S

=

(S

1

1

=

(S

1

2

=

(S

1

3

=

(S

1

4

1

1

satisfying Assumption 6 are

)11 X1 ; (S

=

Q

r ’s

)21 X1 ; (S

1

)22 X2 ; G11 X1 ; G12 X2 ; G21 X1 ; G22 X2 ]

(32)

Proposition 4 For model (27) with homoskedastic disturbances, the best Q and

r ’s

that satisfy

Assumption 6 are given in (32). The best moment conditions with Q and involve unknown parameters

0

and

0.

r ’s

given in (32) are not feasible as Q and

With some consistent preliminary estimators for

0

r ’s

and

0,

the feasible best moment conditions can be obtained in a similar manner as in Lee (2007). Note that, P1 (S 1 )kl = j=0 cj Wj for some constants c0 ; c1 ; , for k; l = 1; 2. Thus, we can use the leading order terms of the series expansion, i.e., In ; W; Gkl = W(S

1

)kl in Q and

r ’s.

1

)kl and

Since the approximation error goes to zero very fast as p increases,

in practice, we could use Q = [X; WX; Wp

; Wp to approximate the unknown (S

; Wp X] and

1

= W;

2

= W2

diag(W2 );

;

p

=

diag(Wp ) for some small p to construct the moment conditions.

20

5

Monte Carlo

To study the …nite sample performance of the proposed GMM estimator, we conduct a small Monte Carlo simulation experiment. The model considered in the experiment is given by

In the DGP, we set

y1

=

21;0 y2

+

11;0 Wy1

+

21;0 Wy2

+

1;0 x1

+ u1

y2

=

12;0 y1

+

12;0 Wy1

+

22;0 Wy2

+

2;0 x2

+ u2 :

21;0

=

12;0

= 0:2,

=

11;0

22;0

= 0:4, and

21;0

=

12;0

= 0:2. For the

spatial weights matrix W = [wij ], we take guidance from the speci…cation in Kelejian and Prucha (2010). More speci…cally, we divide the sample into two halves, with each half formulating a circle. In the …rst circle (containing n=2 cross sectional units), every unit is connected with the unit ahead of it and the unit behind. In the second circle, every unit is connected with the 9 units ahead of it and the 9 units behind. There are no connections across those two circles. Let di be the number of connections of the i-th unit. We set wij = 1=di if units i and j are connected and wij = 0 otherwise. We generate x1 and x2 from independent standard normal distributions. We generate p p u1 = (u11 ; ; un1 )0 and u2 = (u12 ; ; un2 )0 such that ui1 = & i i1 and ui2 = & i i2 , where i1 and

i2

are respectively the i-th elements of 2 6 4

B 6 N @0; 4

7 5

1 2

2

0

3

and

1

For the heteroskedastic case, we set & i = homoskedastic case, we set & i = 2.

2

with

In

12 In

12 In

In

31

7C 5A :

p di =2. The average variances of u1 and u2 are 2. For the

We conduct 1000 replications in the simulation experiment for di¤erent speci…cations with n 2 f250; 500g,

where and

1

=(

12

2 f0:3; 0:5; 0:7g, and (

21 ;

11 ;

1;0 ;

2;0 )

2 f(1; 1); (0:5; 0:5)g. Let

u1 (

1)

= y1

21 y2

11 Wy1

21 Wy2

1 x1

u2 (

2)

= y2

12 y1

12 Wy1

22 Wy2

2 x2 ;

21 ;

1)

and

2

=(

12 ;

12 ;

22 ;

2 ).

Let Q = [x1 ; x2 ; Wx1 ; Wx2 ; W2 x1 ; W2 x2 ]

= W. We consider the following estimators:

21

(a) The 2SLS estimator of

1

(b) The 3SLS estimator of u( ) = [u1 (

based on the linear moment function Q0 u1 (

=(

0 1;

0 0 2)

based on the linear moment function (I2

1)

Q)0 u( ), where

0 0 0 1 ) ; u2 ( 2 ) ] .

(c) The single-equation GMM (GMM-1) estimator of Q0 u1 (

1 ).

and quadratic moment function u1 (

(d) The system GMM (GMM-2) estimator of and the quadratic moment functions u1 ( (e) The QML estimator of

0 1)

1

based on the linear moment function

u1 (

1 ).

based on the linear moment function (I2 0 1)

u1 (

1 ),

u1 (

0 1)

u2 (

2 ),

and u2 (

0 2)

Q)0 u( ) u2 (

2 ).

described in Yang and Lee (2017).

Among the above estimators, the 2SLS and GMM-1 are equation-by-equation “limited information” estimators, while 3SLS, GMM-2 and QML are “full information” estimators. To obtain the heteroskedasticity-robust optimal weighting matrix for the 3SLS and GMM estimators, we consider preliminary estimators of e1

e2

1

and

2

given by

=

arg min u1 (

0 0 1 0 Q u1 ( 1 ) 1 ) Q(Q Q)

+ [u1 (

0 1)

u1 (

2 1 )]

=

arg min u2 (

0 0 1 0 Q u2 ( 2 ) 2 ) Q(Q Q)

+ [u2 (

0 2)

u2 (

2 2 )] :

The estimation results are reported in Tables 1-4. We report the mean and standard deviation (SD) of the empirical distributions of the estimates. To facilitate the comparison of di¤erent estimators, we also report their root mean square errors (RMSE). The main observations from the experiment are summarized as follows. [Insert Tables 1-4 here] With homoskedastic disturbances, all the estimators are essentially unbiased when n = 500. The QML estimator for (

11;0 ;

21;0 )

has the smallest SD with the GMM-2 estimator being a

close runner up. As reported in Table 1, when n = 500 and of

11;0

12

= 0:5, the QML estimator

reduces the SDs of the 2SLS, 3SLS, GMM-1 and GMM-2 estimators by, respectively,

33.9%, 34.5%, 15.9% and 5.1%; and the QML estimator of

21;0

reduces the SDs of the 2SLS,

3SLS, GMM-1 and GMM-2 estimators by, respectively, 24.1%, 24.7%, 22.1% and 7.4%. On the other hand, the SDs of all the estimators for

21;0

and

1;0

are largely the same.

22

With heteroskedastic disturbances, the QML estimator for (

11;0 ;

21;0 )

is downwards biased.

The bias remains as sample size increases. The GMM-2 estimator for ( smallest SD. As reported in Table 2, when n = 500 and of

12

11;0 ;

21;0 )

has the

= 0:5, the GMM-2 estimator

reduces the SDs of the 2SLS, 3SLS, GMM-1 and QML estimators by, respectively,

11;0

22.5%, 21.6%, 5.5% and 13.8%; and the GMM-2 estimator of

21;0

reduces the SDs of the

2SLS, 3SLS, GMM-1 and QML estimators by, respectively, 15.6%, 13.8%, 11.0% and 19.0%. When

1;0

=

2;0

= 0:5, the IV matrix Q is less informative and the e¢ ciency improvement of

the GMM-2 estimator relative to the other estimators is more prominent. As reported in Table 4, when n = 500 and

12

= 0:5, the GMM-2 estimator of

11;0

reduces the SDs of the 2SLS,

3SLS, GMM-1 and QML estimators by, respectively, 35.4%, 35.0%, 12.0% and 67.4%; and the GMM-2 estimator of

21;0

reduces the SDs of the 2SLS, 3SLS, GMM-1 and QML estimators

by, respectively, 24.9%, 24.9%, 17.3% and 69.0%. In this case, the GMM-2 estimators for and

1;0

21;0

also reduce the SDs of the QML estimators by, respectively, 9.6% and 7.8%.

The computational cost of the GMM estimator is much lower than that of the QML estimator. For example, when n = 250,

1;0

=

2;0

= 1,

12

= 0:5 and disturbances are homoskedastic,

the average computation time for GMM-1, GMM-2 and QML estimators are, respectively, 0.06, 0.10 and 9.38 seconds.9

6

Summary

In this paper, we propose a general GMM framework for the estimation of SAR models in a system of simultaneous equations with unknown heteroskedasticity. We introduce a new set of quadratic moment conditions to construct the GMM estimator based on the correlation structure of model disturbances within and across equations. We establish the consistency and asymptotic normality of the proposed the GMM estimator and discuss the optimal choice of moment conditions. We also provide heteroskedasticity-robust estimators for the optimal GMM weighting matrix and the asymptotic variance-covariance matrix of the GMM estimator. The Monte Carlo experiments show that the proposed GMM estimator perform well in …nite samples. In particular, the optimal GMM estimator with some simple moment functions is robust under heteroskedasticity with no apparent loss in e¢ ciency under homoskedasticity, whereas the 9 The Monte Carlo experiments are performed on a computer with Intel (R) Xeon (R) CPU X5450 @ 3.00 GHz and 32.0 GB RAM.

23

QML estimator for the spatial lag coe¢ cient is biased with a large standard deviation in the presence of heteroskedasticity. Furthermore, the computational cost of the GMM approach is drastically smaller than that of the QML.

References Allers, M. A. and Elhorst, J. P. (2011). A simultaneous equations model of …scal policy interactions, Journal of Regional Science 51: 271–291. Baltagi, B. H. and Pirotte, A. (2011). Seemingly unrelated regressions with spatial error components, Empirical Economics 40: 5–49. Breusch, T., Qian, H., Schmidt, P. and Wyhowski, D. (1999). Redundancy of moment conditions, Journal of Econometrics 91: 89–111. Cli¤, A. and Ord, J. (1973). Spatial Autecorrelation, Pion, London. Cli¤, A. and Ord, J. (1981). Spatial Processes, Models and Applications, Pion, London. Cohen-Cole, E., Liu, X. and Zenou, Y. (2016). Multivariate choices and identi…cation of social interactions. Working paper, University of Colorado Boulder. Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators, Econometrica 50: 1029–1054. Hauptneier, S., Mittermaier, F. and Rincke, J. (2012). Fiscal competition over taxes and public inputs, Regional Science and Urban Economics 42: 407–419. Kelejian, H. H. and Prucha, I. R. (2004). Estimation of simultaneous systems of spatially interrelated cross sectional equations, Journal of Econometrics 118: 27–50. Kelejian, H. H. and Prucha, I. R. (2010). Speci…cation and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances, Journal of Econometrics 157: 53–67. Lee, L. F. (2007). GMM and 2SLS estimation of mixed regressive, spatial autoregressive models, Journal of Econometrics 137: 489–514. Lin, X. and Lee, L. F. (2010). GMM estimation of spatial autoregressive models with unknown heteroskedasticity, Journal of Econometrics 157: 34–52.

24

Liu, X. (2015). GMM identi…cation and estimation of peer e¤ects in a system of simultaneous equations. Working paper, Univeristy of Colorado Boulder. Manski, C. F. (1993). Identi…cation of endogenous social e¤ects: the re‡ection problem, The Review of Economic Studies 60: 531–542. Patacchini, E. and Zenou, Y. (2012). Juvenile delinquency and conformism, Journal of Law, Economics, and Organization 28: 1–31. Schmidt, P. (1976). Econometrics, Marcel Dekker, New York. White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica 48: 817–838. White, H. (1994). Estimation, Inference and Speci…cation Analysis, Cambridge Univsersity Press, New York. Yang, K. and Lee, L. F. (2017). Identi…cation and QML estimation of multivariate and simultaneous equations spatial autoregressive models, Journal of Econometrics 196: 196–214.

25

A

Lemmas

In the following, we list some lemmas useful for proving the main results in this paper. Lemma A.1 Let A = [aij ] and B = [bij ] be n 1; 2; 3; 4

be n

n nonstochastic matrices with zero diagonals. Let

1 vectors of independent random variables with zero mean. Let

kl

= E(

0 k l)

for k; l = 1; 2; 3; 4. Then,

E( 01 A

0 2 3B 4)

= tr(

13 A

24 B

0

) + tr(

14 A

23 B):

Proof: As aii = bii = 0 for all i, E( 01 A 2 03 B 4 ) n X n X n X n X aij bkl = E(

1;i 2;j 3;k 4;l )

i=1 j=1 k=1 l=1

=

n X

aii bii E(

1;i 2;i 3;i

4;i ) +

aii bjj E(

n n X X

aij bij E(

1;i 3;i )E( 2;j 4;j )

+

tr(

13 A

24 B

0

) + tr(

14 A

Lemma A.2 Let A = [aij ] be an n (c1 ;

; cn ) be an n

n X n X

aij bji E(

1;i 4;i )E( 2;j 3;j )

i=1 j6=i

i=1 j6=i

=

1;i 2;i )E( 3;j 4;j )

i=1 j6=i

i=1

+

n X n X

23 B):

n nonstochastic matrix with a zero diagonal and c =

1 nonstochastic vector. Let

1; 2; 3

be n

1 vectors of independent ran-

dom variables with zero mean. Then,

E( 01 A

0 2 3 c)

= 0:

Proof: As aii = 0 for all i,

E(

0 0 1 A 2 3 c)

n X n X n X = E( aij ck i=1 j=1 k=1

1;i 2;j 3;k )

=

n X

aii ci E(

1;i 2;i 3;i )

= 0:

i=1

26

Lemma A.3 Let A be an mn

mn nonstochastic matrix with row and column sums uniformly 1 0

bounded in absolute value. Suppose u satis…es Assumption 1. Then (i) n n

1

[u0 Au

u Au = Op (1) and (ii)

E(u0 Au)] = op (1).

Proof: A trivial extension of Lemma A.3 in Lin and Lee (2010). Lemma A.4 Let A be an mn

mn nonstochastic matrix with row and column sums uniformly

bounded in absolute value. Let c be an mn 1 nonstochastic vector with uniformly bounded elements. Suppose u satis…es Assumption 1. Then n limn!1 n

1=2 0

c Au = Op (1) and n

1 0

c A A0 c exists and is positive de…nite, then n

1 0

c Au = op (1). Furthermore, if

d

1=2 0

1 0

c A A0 c).

c Au ! N(0; limn!1 n

Proof: A trivial extension of Lemma A.4 in Lin and Lee (2010). Lemma A.5 Let Akl be an n

n nonstochastic matrix with row and column sums uniformly

bounded in absolute value and ck an n

1 nonstochastic vector with uniformly bounded elements Pm ; m. Suppose u satis…es Assumption 1. Let 2 = Var( ), where = k=1 c0k uk +

for k; l = 1; Pm Pm 0 k=1 l=1 [uk Akl ul

tr(Akl

kl )].

If n

1 2

is bounded away from zero, then

1

d

! N(0; 1).

Proof: A trivial extension of Lemma 3 in Yang and Lee (2017).

Lemma A.6 Let c1 and c2 be mn S = Imn

0 0

(

estimators of

In ) 0

and

0 0

( 0

1 nonstochastic vectors with uniformly bounded elements. Let

e = Imn W) and S

respectively. Then, n

(e0

In )

1 0 e 1 c1 (S

(e0

S

1

W), where e and e are consistent

)c2 = op (1).

Proof: A trivial extension of Lemma A.9 in Lee (2007). Lemma A.7 Let f ( ) = [f1 ( )0 ; f2 ( )0 ]0 with E[f (

0 )]

E[ @@ 0 fi ( )] and

= 0. De…ne Di =

ij

=

E[fi ( )fj ( )0 ] for i; j = 1; 2. The following statements are equivalent (i) f2 is redundant given f1 ; (ii) D2 =

21

1 11 D1

and (iii) there exists a matrix A such that D2 =

21 A

and D1 =

11 A.

Proof: See Breusch et al. (1999).

B

Proofs

Proof of Proposition 1: For consistency, we …rst need to show that n

1

Fg( )

n

1

E[Fg( )] =

op (1) uniformly in . Suppose the i-th row of F can be written as

Fi = [fi;1 ;

; fi;m ; fi;11;1 ;

; fi;11;p ;

; fi;mm;1 ;

; fi;mm;p ]: 27

Then, Fi g( ) =

m X

fi;k Q0 uk (

k) +

of

k;0 0

=(

and

uk (

1k;0 ;

0,

;

0 mk;0 )

0 k)

fi;kl;r uk (

r ul ( l ):

k=1 l=1 r=1

k=1

Let

p m X m X X

and

k;0

=(

1k;0 ;

;

0 mk;0 )

denote, respectively, the k-th column

including the restricted parameters. From the reduced form (3),

k)

= yk

Yk

= Yk (

k;0

= Y(

Yk

k

k)

k

Xk

+ Yk (

k;0

+ WY( k;0 m X = dk ( k ) + uk + [( lk;0

(33)

k k)

k)

k;0

+ Xk (

k)

+ Xk (

0 lk )(im;l

k)

k;0

k)

k;0

In ) + (

+ uk + uk

lk;0

0 lk )(im;l

W)]S

1

u

l=1

where

dk (

k) =

m X

[(

lk;0

0 lk )(im;l

In ) + (

0 lk )(im;l

lk;0

W)]S

1

(B00

In )x + Xk (

k;0

k)

l=1

and S = Imn

0 0

In

0 0

W. This implies that

E[Q0 uk (

k )]

= Q0 dk (

k)

28

and

E[uk ( k )0 r ul ( l )] = dk ( k )0 r dl ( l ) m m X X 0 1 0 + ( jl;0 )tr[ (i I )S E(uu )] + ( r m;j n jl k +

j=1 m X

(

ik )tr[

ik;0

0 0 r (im;i

In )S

1

E(uu0l )] +

i=1

+

+

+

+

j=1 m X

(

jl;0

jl )tr[

0 r (im;j

W)S

1

E(uu0k )]

ik;0

ik )tr[

0 0 r (im;i

W)S

1

E(uu0l )]

i=1

m X m X

i=1 j=1 m X m X

i=1 j=1 m X m X

i=1 j=1 m X m X

(

ik;0

ik )( jl;0

0 1 0 (im;i jl )tr[S

In )0

(

ik;0

ik )( jl;0

0 1 0 (im;i jl )tr[S

W)0

(

ik;0

ik )( jl;0

0 1 0 (im;i jl )tr[S

In )0

(

ik;0

ik )( jl;0

0 1 0 (im;i jl )tr[S

W)0

0 r (im;j

1

In )S

0 r (im;j

0 r (im;j

]

In )S

1

]

W)S

1

]

0 r (im;j

W)S

1

]:

i=1 j=1

As Fi g( ) is a quadratic function of A.3 and A.4 that n

1

Fi g( )

n

is uniformly equicontinuous in n

1

1

and the parameter space of

is bounded, it follows by Lemmas

E[Fi g( )] = op (1) uniformly in . Furthermore, n

1

E[Fg( )]

. The identi…cation condition and the uniform equicontinuity of

E[Fg( )] imply that the identi…cation uniqueness condition for n

Therefore, egmm is a consistent estimator of

0

2

E[g( )0 ]F0 FE[g( )] holds.

(White, 1994).

For the asymptotic normality, we use the mean value theorem to write p

where device,

n(egmm

0)

1 @ e 1 @ g( gmm )0 F0 F 0 g( ) n@ n @

is as convex combination of egmm and

p1 Fg( 0 ) n

1 @ @ 0 g(

)

0.

1

limn!1 n

also converges in probability to 1

1 @ e 1 g( gmm )0 F0 p Fg( n@ n

0)

By Lemma A.5 together with the Cramer Wald

converges in distribution to N(0; limn!1 n

of egmm implies that n

=

0.

1

F F0 ). Furthermore, consistency

Therefore it su¢ ces to show that

E[ @@ 0 g( )] = op (1) uniformly in . We divide the remainder of the proof

into two parts focusing respectively on the partial derivatives of g1 ( ) and g2 ( ). First, note that

2

0 6 Q Z1 6 @ 6 0 g1 ( ) = 6 @ 4

..

3

. Q0 Zm

7 7 7: 7 5 29

From the reduced form (3), we have

yl = (i0m;l 1

By Lemma A.4, we have n which implies that that n

Q0 yl

1 @ @ 0 g1 (

In )S

1

[(B00

In )x + u]:

E(Q0 yl ) = op (1) and n

n

1

)

n

1

1

Q0 Wyl = n

1

E(Q0 Wyl ) + op (1),

E[ @@ 0 g1 ( )] = op (1).

Next, note that 2

@ g2 ( ) = @ 0

where

1;kl ( l )

As uk (

k)

n

1

E[Z0k

fore, n

1;11 ( .. .

6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4

1;1m ( m )

..

= [Z0k

2

3

1)

.

; Z0k

1 ul ( l );

6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4

7 7 7 7 7 7 7 7 7 7 7 7 7 1;m1 ( 1 ) 7 7 7 .. 7 . 7 5 1;mm ( m ) 0 p ul ( l )]

and

2;11 (

1)

..

.

.. . 2;m1 ( m )

2;kl ( k )

..

= [Z0l

.

3

7 7 7 7 7 7 7 ( ) 2;1m 1 7 7 7; 7 7 7 7 7 7 7 7 5 2;mm ( m )

0 1 uk ( k );

; Z0l

can be expanded as (33), it follows by Lemmas A.3 and A.4 that n

r ul ( l )]

1 @ @ 0 g2 (

)

= op (1) and n n

1

1

Z0l

0 r uk ( k )

n

1

E[Z0l

0 r uk ( k )]

1

Z0k

0 0 p uk ( k )] . r ul ( l )

= op (1) uniformly in . There-

E[ @@ 0 g2 ( )] = op (1) and the desired result follows.

Proof of Proposition 2: We divide the proof into two parts. First we prove the consistency of n

1e

. Then we prove the consistency of n

(1) To show n

and (1b) n

1

1e

n

1

1

e D.

= op (1), we need to show that (1a) n

tr( e kl A1 e st A2 ) n

1

tr(

kl A1

st A2 )

1

Q0 e kl Q n

= op (1), for k; l; s; t = 1;

1

Q0

kl Q

= op (1)

; m, for n n zero-

diagonal matrices A1 = [a1;ij ] and A2 = [a2;ij ] that are uniformly bounded in row and column sums. Q0 e kl Q in (1a) can be shown by a similar argument as in White (1980). Pn eik u eil u ejs u ejt . It Thus, we focus on the consistency of n 1 tr( e kl A1 e st A2 ) = n 1 i;j=1 a1;ij a2;ji u P n follows by a similar argument as in Lin and Lee (2010) that n 1 i;j=1 a1;ij a2;ji uik uil ujs ujt Pn n 1 i;j=1 a1;ij a2;ji i;kl j;st = op (1). Therefore, to show (1b) holds, we only need to show that Pn Pn n 1 i;j=1 a1;ij a2;ji u eik u eil u ejs u ejt n 1 i;j=1 a1;ij a2;ji uik uil ujs ujt = op (1). The consistency of n

1

30

Note that

n

= n

n X

1

1

+n

i;j=1 n X

a1;ij a2;ji u eik u eil u ejs u ejt

a1;ij a2;ji (e uik u eil

i;j=1 n X 1

i;j=1

n

1

n X

a1;ij a2;ji uik uil ujs ujt

i;j=1

uik uil )ujs ujt + n

1

n X

i;j=1

a1;ij a2;ji (e uik u eil

uik uil )(e ujs u ejt

a1;ij a2;ji uik uil (e ujs u ejt

ujs ujt )

ujs ujt ):

From (33), we have e k = uk (ek ) = yk u

Yk e k

Yk e k

Xk e k = dk (ek ) + uk + ek (ek )

where dk (ek ) ek ( e k )

m X

=

[(

lk;0

[(

lk;0

l=1

m X

=

l=1

and S = Imn

(

0 0

In )

elk )(i0m;l

elk )(i0m;l (

0 0

respectively. Then,

In ) + (

lk;0

In ) + (

lk;0

e )(i0 lk m;l

e )(i0 lk m;l

W)]S

1

W)]S

1

(B00

In )x + Xk (

e ) k

k;0

u

W). Let dik and eik denote the i-th element of dk (ek ) and ek (ek )

u eik u eil = uik uil + dik dil + eik eil + (uik dil + dik uil ) + (uik eil + eik uil ) + (dik eil + eik dil ):

To show n

1

Pn

i;j=1

a1;ij a2;ji (e uik u eil

uik uil )ujs ujt = op (1), we focus on terms that are of higher

orders in uil . One of such terms is

n

1

n X

i;j=1

a1;ij a2;ji eik uil ujs ujt

=

m X

(

r=1 m X

+

r=1

rk;0

(

rk;0

erk )n

1

e )n rk

n X

a1;ij a2;ji uil ujs ujt (i0m;r

i;j=1 n X 1

a1;ij a2;ji uil ujs ujt (i0m;r

i0n;i )S

1

wi )S

u 1

u;

i;j=1

where wi denotes the i-th row of W. By Assumption 1, we can show Ejuhk uil ujs ujt j c for some Pn Pm constant c using Cauchy’s inequality, which implies Ejn 1 i;j=1 r=1 a1;ij a2;ji uil ujs ujt (i0m;r Pn Pm i0n;i )S 1 uj = O(1) and Ejn 1 i;j=1 r=1 a1;ij a2;ji uil ujs ujt (i0m;r wi )S 1 uj = O(1) because A1 , 31

Pn are uniformly bounded in row and column sums. Hence, n 1 i;j=1 a1;ij a2;ji eik uil ujs ujt = Pn op (1). Similarly, we can show other terms in n 1 i;j=1 a1;ij a2;ji (e uik u eil uik uil )ujs ujt are of order Pn op (1). With a similar argument as above or as in Lin and Lee (2010), n 1 i;j=1 a1;ij a2;ji uik uil (e ujs u ejt Pn ujs ujt ) = op (1) and n 1 i;j=1 a1;ij a2;ji (e uik u eil uik uil )(e ujs u ejt ujs ujt ) = op (1). Therefore, the 1

A2 , W and S

consistency of n

1

tr( e kl A1 e st A2 ) in (1b) follows.

(2) Some typical entries of $D$ that involve unknown parameters are $Q'\mathrm{E}(y_k)=Q'(i_{m,k}'\otimes I_n)S^{-1}(B_0'\otimes I_n)x$, $Q'\mathrm{E}(Wy_k)=Q'(i_{m,k}'\otimes W)S^{-1}(B_0'\otimes I_n)x$, $\mathrm{E}(u_l'Ay_k)=\mathrm{tr}[(i_{m,l}\otimes A)(i_{m,k}'\otimes I_n)S^{-1}\Sigma]$ and $\mathrm{E}(u_l'AWy_k)=\mathrm{tr}[(i_{m,l}\otimes A)(i_{m,k}'\otimes W)S^{-1}\Sigma]$, where $A=[a_{ij}]$ is an $n\times n$ zero-diagonal matrix that is uniformly bounded in row and column sums. To show that $n^{-1}\tilde{D}-n^{-1}D=o_p(1)$, we need to show that (2a)
$$n^{-1}Q'(i_{m,k}'\otimes I_n)\tilde{S}^{-1}(\tilde{B}'\otimes I_n)x-n^{-1}Q'(i_{m,k}'\otimes I_n)S^{-1}(B_0'\otimes I_n)x=o_p(1)$$
and
$$n^{-1}Q'(i_{m,k}'\otimes W)\tilde{S}^{-1}(\tilde{B}'\otimes I_n)x-n^{-1}Q'(i_{m,k}'\otimes W)S^{-1}(B_0'\otimes I_n)x=o_p(1);$$
and (2b)
$$n^{-1}\mathrm{tr}[(i_{m,l}\otimes A)(i_{m,k}'\otimes I_n)\tilde{S}^{-1}\tilde{\Sigma}]-n^{-1}\mathrm{tr}[(i_{m,l}\otimes A)(i_{m,k}'\otimes I_n)S^{-1}\Sigma]=o_p(1)$$
and
$$n^{-1}\mathrm{tr}[(i_{m,l}\otimes A)(i_{m,k}'\otimes W)\tilde{S}^{-1}\tilde{\Sigma}]-n^{-1}\mathrm{tr}[(i_{m,l}\otimes A)(i_{m,k}'\otimes W)S^{-1}\Sigma]=o_p(1),$$
where $\tilde{S}=I_{mn}-(\tilde{\Phi}'\otimes I_n)-(\tilde{\Lambda}'\otimes W)$ and
$$\tilde{\Sigma}=\begin{bmatrix}\tilde{\Sigma}_{11}&\cdots&\tilde{\Sigma}_{1m}\\ \vdots&\ddots&\vdots\\ \tilde{\Sigma}_{m1}&\cdots&\tilde{\Sigma}_{mm}\end{bmatrix}.$$
As (2a) follows by Lemma A.6 and (2b) follows by a similar argument as in Lin and Lee (2010), we conclude that $n^{-1}\tilde{D}-n^{-1}D=o_p(1)$.
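For concreteness, the sketch below (Python) constructs $S=I_{mn}-(\Phi_0'\otimes I_n)-(\Lambda_0'\otimes W)$ for $m=2$ and evaluates reduced-form objects of the type entering $D$; the weights matrix, the parameter matrices, and the stand-in for $(B_0'\otimes I_n)x$ are illustrative assumptions, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 2

# Illustrative row-normalized weights matrix with a zero diagonal
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

Phi = np.array([[0.0, 0.3],         # simultaneity coefficients (zero diagonal)
                [0.2, 0.0]])
Lam = np.array([[0.4, 0.1],         # spatial-lag coefficients
                [0.2, 0.3]])

S = np.eye(m * n) - np.kron(Phi.T, np.eye(n)) - np.kron(Lam.T, W)

xb = rng.normal(size=m * n)         # stand-in for (B_0' otimes I_n) x
Ey = np.linalg.solve(S, xb)         # reduced-form mean S^{-1}(B_0' otimes I_n) x

def sel(k):
    # (i_{m,k}' otimes I_n): selects the n-vector block of equation k (0-indexed)
    e = np.zeros(m)
    e[k] = 1.0
    return np.kron(e, np.eye(n))

Ey1 = sel(0) @ Ey                   # E(y_1)
EWy1 = W @ Ey1                      # E(W y_1)
```

Premultiplying these vectors by an IV matrix $Q'$ then gives the corresponding entries $Q'\mathrm{E}(y_k)$ and $Q'\mathrm{E}(Wy_k)$ of $D$.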

Proof of Proposition 3: For consistency, note that
$$g(\theta)'\tilde{\Omega}^{-1}g(\theta)=g(\theta)'\Omega^{-1}g(\theta)+g(\theta)'(\tilde{\Omega}^{-1}-\Omega^{-1})g(\theta).$$
From the proof of Proposition 1, $n^{-1}g(\theta)'\Omega^{-1}g(\theta)-n^{-1}\mathrm{E}[g(\theta)'\Omega^{-1}g(\theta)]=o_p(1)$ uniformly in $\theta$. Hence, it suffices to show that $n^{-1}g(\theta)'(\tilde{\Omega}^{-1}-\Omega^{-1})g(\theta)=o_p(1)$ uniformly in $\theta$. Let $\|\cdot\|$ denote the Euclidean norm for vectors and matrices. Then,
$$\Big|\frac{1}{n}g(\theta)'(\tilde{\Omega}^{-1}-\Omega^{-1})g(\theta)\Big|\leq\Big(\frac{1}{n}\|g(\theta)\|\Big)^2\Big\|\Big(\frac{1}{n}\tilde{\Omega}\Big)^{-1}-\Big(\frac{1}{n}\Omega\Big)^{-1}\Big\|,$$
where $\|(n^{-1}\tilde{\Omega})^{-1}-(n^{-1}\Omega)^{-1}\|=o_p(1)$ by the consistency of $\tilde{\Omega}$ established in Proposition 2. From the proof of Proposition 1, $n^{-1}g(\theta)-n^{-1}\mathrm{E}[g(\theta)]=o_p(1)$ uniformly in $\theta$. As $n^{-1}\mathrm{E}[Q'u_k(\theta_k)]=n^{-1}Q'd_k(\theta_k)=O(1)$ and $n^{-1}\mathrm{E}[u_k(\theta_k)'P_ru_l(\theta_l)]=n^{-1}d_k(\theta_k)'P_rd_l(\theta_l)+n^{-1}\mathrm{tr}[G_k(\theta_k)'P_rG_l(\theta_l)\Sigma]=O(1)$ uniformly in $\theta$, where
$$d_k(\theta_k)=\sum_{l=1}^{m}[(\phi_{lk,0}-\phi_{lk})(i_{m,l}'\otimes I_n)+(\lambda_{lk,0}-\lambda_{lk})(i_{m,l}'\otimes W)]S^{-1}(B_0'\otimes I_n)x+X_k(\beta_{k,0}-\beta_k)$$
and
$$G_k(\theta_k)=\sum_{l=1}^{m}[(\phi_{lk,0}-\phi_{lk})(i_{m,l}'\otimes I_n)+(\lambda_{lk,0}-\lambda_{lk})(i_{m,l}'\otimes W)]S^{-1},$$
it follows that $n^{-1}\|g(\theta)\|=O_p(1)$ uniformly in $\theta$. Therefore, $n^{-1}g(\theta)'(\tilde{\Omega}^{-1}-\Omega^{-1})g(\theta)=o_p(1)$ uniformly in $\theta$.

For the asymptotic distribution, by the mean value theorem, for some convex combination of $\hat{\theta}_{gmm}$ and $\theta_0$, denoted by $\bar{\theta}$,
\begin{align*}
\sqrt{n}(\hat{\theta}_{gmm}-\theta_0)&=-\Big[\frac{1}{n}\frac{\partial}{\partial\theta}g(\hat{\theta}_{gmm})'\Big(\frac{1}{n}\tilde{\Omega}\Big)^{-1}\frac{1}{n}\frac{\partial}{\partial\theta'}g(\bar{\theta})\Big]^{-1}\frac{1}{n}\frac{\partial}{\partial\theta}g(\hat{\theta}_{gmm})'\Big(\frac{1}{n}\tilde{\Omega}\Big)^{-1}\frac{1}{\sqrt{n}}g(\theta_0)\\
&=-\Big[\frac{1}{n}D'\Big(\frac{1}{n}\Omega\Big)^{-1}\frac{1}{n}D\Big]^{-1}\frac{1}{n}D'\Big(\frac{1}{n}\Omega\Big)^{-1}\frac{1}{\sqrt{n}}g(\theta_0)+o_p(1)\\
&\xrightarrow{d}N\Big(0,\lim_{n\to\infty}\Big[\frac{1}{n}D'\Big(\frac{1}{n}\Omega\Big)^{-1}\frac{1}{n}D\Big]^{-1}\Big),
\end{align*}
where the asymptotic distribution statement is implied by Lemma A.5.
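As an illustration of the estimator analyzed in Proposition 3, the following minimal sketch (Python, using scipy) estimates one equation of a two-equation system by combining linear moments $Q'u(\theta)$ with a single quadratic moment $u(\theta)'Pu(\theta)$ under identity first-step weighting. The circular weights matrix, the instrument set, the choice $P=W$, and the data-generating process are illustrative assumptions; in particular, $y_2$ is generated exogenously here so that it can be used as its own instrument, which would not be valid in the actual model.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 200

W = np.zeros((n, n))                    # circular-neighbor weights (zero diagonal)
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y2 = rng.normal(size=n)                 # toy "other equation" outcome
u1 = rng.uniform(0.5, 1.5, n) * rng.normal(size=n)   # heteroskedastic disturbance
theta0 = np.array([0.2, 0.4, 0.2, 1.0, 1.0])         # (phi21, lam11, lam21, beta1')
y1 = np.linalg.solve(np.eye(n) - theta0[1] * W,
                     theta0[0] * y2 + theta0[2] * (W @ y2) + X @ theta0[3:] + u1)

def resid(t):
    return y1 - t[0] * y2 - t[1] * (W @ y1) - t[2] * (W @ y2) - X @ t[3:]

Q = np.column_stack([X, W @ X[:, 1], W @ W @ X[:, 1], y2, W @ y2])  # IV matrix
P = W                                   # zero-diagonal quadratic-moment matrix

def g(t):                               # stacked linear and quadratic moments
    u = resid(t)
    return np.concatenate([Q.T @ u, [u @ (P @ u)]]) / n

obj = lambda t: g(t) @ g(t)             # first step: identity weighting
t_hat = minimize(obj, np.full(5, 0.1), method="Nelder-Mead",
                 options={"maxiter": 50000, "fatol": 1e-12, "xatol": 1e-10}).x
print(t_hat)                            # should lie near theta0
```

A second step would replace the identity weighting with $(\tilde{\Omega}/n)^{-1}$, the heteroskedasticity-robust estimate of the variance of the moments.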

Proof of Proposition 4: To show that the moment functions $g^*(\theta)$, with $Q^*$ and the $P_r^*$'s given in (32), are the most efficient, it suffices to show that any additional moment conditions $g(\theta)$ of the form (10) are redundant. By Lemma A.7, it suffices to find a matrix $A$ such that $\mathrm{E}[\frac{\partial}{\partial\theta'}g(\theta_0)]=\mathrm{E}[g(\theta_0)g^*(\theta_0)']A$ and $\mathrm{E}[\frac{\partial}{\partial\theta'}g^*(\theta_0)]=\mathrm{E}[g^*(\theta_0)g^*(\theta_0)']A$. Let $J_1$ and $J_2$ be selection matrices such that $X_1=XJ_1$ and $X_2=XJ_2$.

First, for any linear moment function
$$g_1(\theta)=\begin{bmatrix}Q'u_1(\theta_1)\\ Q'u_2(\theta_2)\end{bmatrix},$$
it follows from (29) and (30) that
$$\mathrm{E}\Big[\frac{\partial}{\partial\theta'}g_1(\theta_0)\Big]=-\begin{bmatrix}Q'[\mathrm{E}(y_2),\mathrm{E}(Wy_1),\mathrm{E}(Wy_2),XJ_1]&0\\ 0&Q'[\mathrm{E}(y_1),\mathrm{E}(Wy_1),\mathrm{E}(Wy_2),XJ_2]\end{bmatrix},$$
where $\mathrm{E}(y_1)=(S^{-1})_{11}X_1\beta_{1,0}+(S^{-1})_{12}X_2\beta_{2,0}$, $\mathrm{E}(y_2)=(S^{-1})_{21}X_1\beta_{1,0}+(S^{-1})_{22}X_2\beta_{2,0}$, $\mathrm{E}(Wy_1)=G_{11}X_1\beta_{1,0}+G_{12}X_2\beta_{2,0}$ and $\mathrm{E}(Wy_2)=G_{21}X_1\beta_{1,0}+G_{22}X_2\beta_{2,0}$. On the other hand,
$$\mathrm{E}[g_1(\theta_0)g_1^*(\theta_0)']=\begin{bmatrix}\sigma_{11}Q'Q^*&\sigma_{12}Q'Q^*\\ \sigma_{12}Q'Q^*&\sigma_{22}Q'Q^*\end{bmatrix}.$$
Hence, $\mathrm{E}[\frac{\partial}{\partial\theta'}g_1(\theta_0)]=\mathrm{E}[g_1(\theta_0)g_1^*(\theta_0)']A_1$, where
$$A_1=\frac{-1}{\sigma_{11}\sigma_{22}-\sigma_{12}^2}\begin{bmatrix}\sigma_{22}F_1&-\sigma_{12}F_2\\ -\sigma_{12}F_1&\sigma_{11}F_2\end{bmatrix},$$
and $F_1$ and $F_2$ are matrices of constants, whose nonzero entries are given by $\beta_{1,0}$, $\beta_{2,0}$ and the elements of $J_1$ and $J_2$, such that $Q^*F_1=[\mathrm{E}(y_2),\mathrm{E}(Wy_1),\mathrm{E}(Wy_2),XJ_1]$ and $Q^*F_2=[\mathrm{E}(y_1),\mathrm{E}(Wy_1),\mathrm{E}(Wy_2),XJ_2]$.

Next, for any quadratic moment function
$$g_2(\theta)=\begin{bmatrix}u_1(\theta_1)'Pu_1(\theta_1)\\ u_1(\theta_1)'Pu_2(\theta_2)\\ u_2(\theta_2)'Pu_1(\theta_1)\\ u_2(\theta_2)'Pu_2(\theta_2)\end{bmatrix}$$
with a zero-diagonal matrix $P$, write $H_1=(S^{-1})_{11}$, $H_2=(S^{-1})_{12}$, $H_3=(S^{-1})_{21}$, $H_4=(S^{-1})_{22}$, $H_5=G_{11}$, $H_6=G_{12}$, $H_7=G_{21}$ and $H_8=G_{22}$. It follows from (29) and (31) that $\mathrm{E}[\frac{\partial}{\partial\theta'}g_2(\theta_0)]=[D_{2,1},D_{2,2}]$, where
$$D_{2,1}=-\begin{bmatrix}\sigma_{11}\mathrm{tr}(P'H_3)+\sigma_{12}\mathrm{tr}(P'H_4)&\sigma_{11}\mathrm{tr}(P'H_5)+\sigma_{12}\mathrm{tr}(P'H_6)&\sigma_{11}\mathrm{tr}(P'H_7)+\sigma_{12}\mathrm{tr}(P'H_8)&0\\ \sigma_{12}\mathrm{tr}(P'H_3)+\sigma_{22}\mathrm{tr}(P'H_4)&\sigma_{12}\mathrm{tr}(P'H_5)+\sigma_{22}\mathrm{tr}(P'H_6)&\sigma_{12}\mathrm{tr}(P'H_7)+\sigma_{22}\mathrm{tr}(P'H_8)&0\\ 0&0&0&0\\ 0&0&0&0\end{bmatrix}-\begin{bmatrix}\sigma_{11}\mathrm{tr}(PH_3)+\sigma_{12}\mathrm{tr}(PH_4)&\sigma_{11}\mathrm{tr}(PH_5)+\sigma_{12}\mathrm{tr}(PH_6)&\sigma_{11}\mathrm{tr}(PH_7)+\sigma_{12}\mathrm{tr}(PH_8)&0\\ 0&0&0&0\\ \sigma_{12}\mathrm{tr}(PH_3)+\sigma_{22}\mathrm{tr}(PH_4)&\sigma_{12}\mathrm{tr}(PH_5)+\sigma_{22}\mathrm{tr}(PH_6)&\sigma_{12}\mathrm{tr}(PH_7)+\sigma_{22}\mathrm{tr}(PH_8)&0\\ 0&0&0&0\end{bmatrix}$$
and
$$D_{2,2}=-\begin{bmatrix}0&0&0&0\\ 0&0&0&0\\ \sigma_{11}\mathrm{tr}(P'H_1)+\sigma_{12}\mathrm{tr}(P'H_2)&\sigma_{11}\mathrm{tr}(P'H_5)+\sigma_{12}\mathrm{tr}(P'H_6)&\sigma_{11}\mathrm{tr}(P'H_7)+\sigma_{12}\mathrm{tr}(P'H_8)&0\\ \sigma_{12}\mathrm{tr}(P'H_1)+\sigma_{22}\mathrm{tr}(P'H_2)&\sigma_{12}\mathrm{tr}(P'H_5)+\sigma_{22}\mathrm{tr}(P'H_6)&\sigma_{12}\mathrm{tr}(P'H_7)+\sigma_{22}\mathrm{tr}(P'H_8)&0\end{bmatrix}-\begin{bmatrix}0&0&0&0\\ \sigma_{11}\mathrm{tr}(PH_1)+\sigma_{12}\mathrm{tr}(PH_2)&\sigma_{11}\mathrm{tr}(PH_5)+\sigma_{12}\mathrm{tr}(PH_6)&\sigma_{11}\mathrm{tr}(PH_7)+\sigma_{12}\mathrm{tr}(PH_8)&0\\ 0&0&0&0\\ \sigma_{12}\mathrm{tr}(PH_1)+\sigma_{22}\mathrm{tr}(PH_2)&\sigma_{12}\mathrm{tr}(PH_5)+\sigma_{22}\mathrm{tr}(PH_6)&\sigma_{12}\mathrm{tr}(PH_7)+\sigma_{22}\mathrm{tr}(PH_8)&0\end{bmatrix}.$$
Furthermore,
$$\mathrm{E}[g_2(\theta_0)g_2^*(\theta_0)']=\Omega_1\otimes[\mathrm{tr}(PP_1^{*\prime}),\ldots,\mathrm{tr}(PP_8^{*\prime})]+\Omega_2\otimes[\mathrm{tr}(PP_1^{*}),\ldots,\mathrm{tr}(PP_8^{*})],$$
where
$$\Omega_1=\begin{bmatrix}\sigma_{11}^2&\sigma_{11}\sigma_{12}&\sigma_{11}\sigma_{12}&\sigma_{12}^2\\ \sigma_{11}\sigma_{12}&\sigma_{11}\sigma_{22}&\sigma_{12}^2&\sigma_{12}\sigma_{22}\\ \sigma_{11}\sigma_{12}&\sigma_{12}^2&\sigma_{11}\sigma_{22}&\sigma_{12}\sigma_{22}\\ \sigma_{12}^2&\sigma_{12}\sigma_{22}&\sigma_{12}\sigma_{22}&\sigma_{22}^2\end{bmatrix}\quad\text{and}\quad\Omega_2=\begin{bmatrix}\sigma_{11}^2&\sigma_{11}\sigma_{12}&\sigma_{11}\sigma_{12}&\sigma_{12}^2\\ \sigma_{11}\sigma_{12}&\sigma_{12}^2&\sigma_{11}\sigma_{22}&\sigma_{12}\sigma_{22}\\ \sigma_{11}\sigma_{12}&\sigma_{11}\sigma_{22}&\sigma_{12}^2&\sigma_{12}\sigma_{22}\\ \sigma_{12}^2&\sigma_{12}\sigma_{22}&\sigma_{12}\sigma_{22}&\sigma_{22}^2\end{bmatrix}.$$
Since each $P_r^*$ given in (32) differs from $H_r$ only by a diagonal matrix and $P$ has a zero diagonal, $\mathrm{tr}(PP_r^{*\prime})=\mathrm{tr}(P'H_r)$ and $\mathrm{tr}(PP_r^{*})=\mathrm{tr}(PH_r)$. Hence, $\mathrm{E}[\frac{\partial}{\partial\theta'}g_2(\theta_0)]=\mathrm{E}[g_2(\theta_0)g_2^*(\theta_0)']A_2$, where
$$A_2=[\partial_1\otimes i_{8,3},\ \partial_1\otimes i_{8,5},\ \partial_1\otimes i_{8,7},\ 0,\ \partial_3\otimes i_{8,1},\ \partial_3\otimes i_{8,5},\ \partial_3\otimes i_{8,7},\ 0]+[\partial_2\otimes i_{8,4},\ \partial_2\otimes i_{8,6},\ \partial_2\otimes i_{8,8},\ 0,\ \partial_4\otimes i_{8,2},\ \partial_4\otimes i_{8,6},\ \partial_4\otimes i_{8,8},\ 0],$$
with $\Delta=\sigma_{11}\sigma_{22}-\sigma_{12}^2$,
$$\partial_1=\frac{1}{\Delta}\begin{bmatrix}-\sigma_{22}\\ 0\\ \sigma_{12}\\ 0\end{bmatrix},\quad\partial_2=\frac{1}{\Delta}\begin{bmatrix}0\\ -\sigma_{22}\\ 0\\ \sigma_{12}\end{bmatrix},\quad\partial_3=\frac{1}{\Delta}\begin{bmatrix}\sigma_{12}\\ 0\\ -\sigma_{11}\\ 0\end{bmatrix},\quad\partial_4=\frac{1}{\Delta}\begin{bmatrix}0\\ \sigma_{12}\\ 0\\ -\sigma_{11}\end{bmatrix},$$
and $i_{8,r}$ denoting the $r$-th column of $I_8$. Indeed, a direct calculation gives $\Omega_1\partial_1=-(\sigma_{11},\sigma_{12},0,0)'$, $\Omega_2\partial_1=-(\sigma_{11},0,\sigma_{12},0)'$, $\Omega_1\partial_2=-(\sigma_{12},\sigma_{22},0,0)'$, $\Omega_2\partial_2=-(\sigma_{12},0,\sigma_{22},0)'$, $\Omega_1\partial_3=-(0,0,\sigma_{11},\sigma_{12})'$, $\Omega_2\partial_3=-(0,\sigma_{11},0,\sigma_{12})'$, $\Omega_1\partial_4=-(0,0,\sigma_{12},\sigma_{22})'$ and $\Omega_2\partial_4=-(0,\sigma_{12},0,\sigma_{22})'$, which reproduces $[D_{2,1},D_{2,2}]$ column by column.

In summary, as $A_1$ and $A_2$ do not depend on $Q$ and $P$, and the components of $g^*(\theta)$ are themselves moment functions of the above two forms, the desired result follows since $\mathrm{E}[\frac{\partial}{\partial\theta'}g(\theta_0)]=\mathrm{E}[g(\theta_0)g^*(\theta_0)']A$ and $\mathrm{E}[\frac{\partial}{\partial\theta'}g^*(\theta_0)]=\mathrm{E}[g^*(\theta_0)g^*(\theta_0)']A$ for $A=[A_1',A_2']'$.

Table 1: Estimation under Homoskedasticity (β1,0 = β2,0 = 1)

                    φ21,0 = 0.2           λ11,0 = 0.4           λ21,0 = 0.2           β1,0 = 1
n = 250, σ12 = 0.3
  2SLS      0.203(0.093)[0.093]   0.391(0.153)[0.154]   0.193(0.166)[0.166]   0.995(0.094)[0.094]
  3SLS      0.199(0.097)[0.097]   0.392(0.153)[0.153]   0.197(0.168)[0.168]   0.996(0.096)[0.096]
  GMM-1     0.205(0.092)[0.092]   0.405(0.113)[0.114]   0.194(0.161)[0.161]   0.994(0.093)[0.093]
  GMM-2     0.199(0.095)[0.095]   0.408(0.097)[0.098]   0.195(0.128)[0.128]   0.997(0.093)[0.094]
  QML       0.194(0.092)[0.092]   0.404(0.096)[0.096]   0.198(0.124)[0.124]   1.003(0.093)[0.093]
n = 250, σ12 = 0.5
  2SLS      0.204(0.093)[0.093]   0.393(0.154)[0.154]   0.187(0.167)[0.167]   0.995(0.094)[0.094]
  3SLS      0.197(0.097)[0.097]   0.395(0.154)[0.154]   0.194(0.170)[0.170]   0.998(0.095)[0.095]
  GMM-1     0.208(0.091)[0.092]   0.405(0.125)[0.125]   0.191(0.161)[0.161]   0.994(0.093)[0.093]
  GMM-2     0.196(0.095)[0.095]   0.409(0.109)[0.110]   0.195(0.135)[0.135]   0.998(0.093)[0.093]
  QML       0.193(0.093)[0.093]   0.405(0.107)[0.107]   0.198(0.129)[0.129]   1.003(0.093)[0.093]
n = 250, σ12 = 0.7
  2SLS      0.205(0.093)[0.093]   0.395(0.154)[0.154]   0.183(0.168)[0.169]   0.995(0.094)[0.094]
  3SLS      0.194(0.098)[0.098]   0.397(0.155)[0.155]   0.191(0.173)[0.173]   1.000(0.096)[0.096]
  GMM-1     0.212(0.090)[0.091]   0.405(0.133)[0.133]   0.187(0.157)[0.157]   0.994(0.093)[0.093]
  GMM-2     0.194(0.095)[0.096]   0.411(0.123)[0.123]   0.196(0.143)[0.143]   0.999(0.094)[0.094]
  QML       0.191(0.093)[0.094]   0.406(0.118)[0.118]   0.197(0.135)[0.135]   1.002(0.094)[0.094]
n = 500, σ12 = 0.3
  2SLS      0.204(0.063)[0.063]   0.390(0.111)[0.112]   0.199(0.115)[0.115]   0.998(0.065)[0.065]
  3SLS      0.201(0.064)[0.064]   0.391(0.112)[0.112]   0.201(0.116)[0.116]   0.999(0.065)[0.065]
  GMM-1     0.205(0.063)[0.063]   0.398(0.079)[0.079]   0.198(0.111)[0.111]   0.997(0.064)[0.064]
  GMM-2     0.202(0.064)[0.064]   0.399(0.070)[0.070]   0.199(0.090)[0.090]   0.999(0.064)[0.064]
  QML       0.200(0.062)[0.062]   0.397(0.067)[0.067]   0.199(0.084)[0.084]   1.002(0.063)[0.063]
n = 500, σ12 = 0.5
  2SLS      0.204(0.063)[0.063]   0.392(0.112)[0.112]   0.196(0.116)[0.116]   0.998(0.065)[0.065]
  3SLS      0.200(0.065)[0.065]   0.392(0.113)[0.113]   0.199(0.117)[0.117]   1.000(0.065)[0.065]
  GMM-1     0.206(0.062)[0.063]   0.398(0.088)[0.088]   0.197(0.113)[0.113]   0.997(0.064)[0.064]
  GMM-2     0.200(0.064)[0.064]   0.400(0.078)[0.078]   0.199(0.095)[0.095]   1.000(0.063)[0.063]
  QML       0.200(0.062)[0.062]   0.398(0.074)[0.074]   0.199(0.088)[0.088]   1.002(0.063)[0.063]
n = 500, σ12 = 0.7
  2SLS      0.205(0.063)[0.063]   0.392(0.112)[0.113]   0.194(0.116)[0.116]   0.998(0.064)[0.064]
  3SLS      0.199(0.065)[0.065]   0.393(0.113)[0.114]   0.198(0.118)[0.118]   1.000(0.065)[0.065]
  GMM-1     0.208(0.062)[0.062]   0.397(0.095)[0.095]   0.196(0.111)[0.111]   0.998(0.063)[0.064]
  GMM-2     0.199(0.064)[0.064]   0.400(0.087)[0.087]   0.200(0.100)[0.100]   1.000(0.063)[0.063]
  QML       0.199(0.062)[0.062]   0.398(0.082)[0.082]   0.199(0.093)[0.093]   1.002(0.063)[0.063]
Mean(SD)[RMSE]

Table 2: Estimation under Heteroskedasticity (β1,0 = β2,0 = 1)

                    φ21,0 = 0.2           λ11,0 = 0.4           λ21,0 = 0.2           β1,0 = 1
n = 250, σ12 = 0.3
  2SLS      0.200(0.096)[0.096]   0.398(0.123)[0.123]   0.194(0.135)[0.135]   0.996(0.097)[0.097]
  3SLS      0.199(0.093)[0.093]   0.398(0.122)[0.122]   0.195(0.132)[0.132]   0.998(0.093)[0.093]
  GMM-1     0.206(0.087)[0.088]   0.405(0.096)[0.096]   0.191(0.129)[0.130]   0.995(0.090)[0.090]
  GMM-2     0.199(0.090)[0.090]   0.406(0.088)[0.088]   0.196(0.110)[0.110]   0.996(0.090)[0.090]
  QML       0.203(0.094)[0.094]   0.366(0.110)[0.116]   0.185(0.154)[0.154]   1.013(0.095)[0.096]
n = 250, σ12 = 0.5
  2SLS      0.200(0.096)[0.096]   0.399(0.124)[0.124]   0.191(0.136)[0.136]   0.997(0.096)[0.097]
  3SLS      0.197(0.093)[0.094]   0.399(0.123)[0.123]   0.194(0.133)[0.133]   0.999(0.093)[0.093]
  GMM-1     0.208(0.086)[0.086]   0.405(0.103)[0.103]   0.189(0.127)[0.128]   0.995(0.089)[0.089]
  GMM-2     0.196(0.090)[0.090]   0.408(0.095)[0.095]   0.196(0.113)[0.113]   0.998(0.090)[0.090]
  QML       0.202(0.094)[0.094]   0.368(0.118)[0.122]   0.182(0.148)[0.149]   1.013(0.095)[0.096]
n = 250, σ12 = 0.7
  2SLS      0.200(0.096)[0.096]   0.400(0.124)[0.124]   0.188(0.137)[0.138]   0.997(0.097)[0.097]
  3SLS      0.194(0.094)[0.095]   0.401(0.124)[0.124]   0.193(0.135)[0.135]   1.001(0.093)[0.093]
  GMM-1     0.210(0.085)[0.086]   0.404(0.108)[0.108]   0.188(0.124)[0.124]   0.995(0.089)[0.089]
  GMM-2     0.193(0.091)[0.091]   0.409(0.103)[0.103]   0.197(0.117)[0.117]   1.000(0.090)[0.090]
  QML       0.201(0.094)[0.094]   0.370(0.122)[0.125]   0.179(0.140)[0.141]   1.013(0.096)[0.097]
n = 500, σ12 = 0.3
  2SLS      0.202(0.064)[0.064]   0.396(0.089)[0.089]   0.197(0.096)[0.096]   1.000(0.066)[0.066]
  3SLS      0.200(0.061)[0.061]   0.398(0.088)[0.088]   0.198(0.093)[0.094]   0.999(0.063)[0.063]
  GMM-1     0.204(0.059)[0.059]   0.399(0.068)[0.068]   0.197(0.091)[0.092]   0.998(0.061)[0.061]
  GMM-2     0.201(0.060)[0.060]   0.400(0.063)[0.063]   0.199(0.078)[0.078]   0.999(0.061)[0.061]
  QML       0.210(0.062)[0.063]   0.358(0.075)[0.086]   0.187(0.104)[0.104]   1.013(0.064)[0.065]
n = 500, σ12 = 0.5
  2SLS      0.202(0.064)[0.064]   0.397(0.089)[0.089]   0.196(0.096)[0.096]   1.000(0.066)[0.066]
  3SLS      0.199(0.061)[0.061]   0.399(0.088)[0.089]   0.198(0.094)[0.094]   1.000(0.063)[0.063]
  GMM-1     0.205(0.059)[0.059]   0.399(0.073)[0.073]   0.196(0.091)[0.091]   0.998(0.061)[0.061]
  GMM-2     0.200(0.060)[0.060]   0.401(0.069)[0.069]   0.199(0.081)[0.081]   0.999(0.061)[0.061]
  QML       0.209(0.062)[0.063]   0.359(0.080)[0.090]   0.185(0.100)[0.102]   1.013(0.064)[0.065]
n = 500, σ12 = 0.7
  2SLS      0.202(0.064)[0.064]   0.397(0.089)[0.089]   0.195(0.096)[0.096]   1.000(0.066)[0.066]
  3SLS      0.198(0.061)[0.061]   0.399(0.089)[0.089]   0.197(0.094)[0.094]   1.000(0.063)[0.063]
  GMM-1     0.206(0.058)[0.058]   0.399(0.077)[0.077]   0.195(0.089)[0.090]   0.998(0.061)[0.061]
  GMM-2     0.198(0.060)[0.060]   0.402(0.074)[0.074]   0.199(0.084)[0.084]   1.000(0.061)[0.061]
  QML       0.209(0.062)[0.062]   0.362(0.083)[0.091]   0.181(0.095)[0.097]   1.013(0.064)[0.065]
Mean(SD)[RMSE]

Table 3: Estimation under Homoskedasticity (β1,0 = β2,0 = 0.5)

                    φ21,0 = 0.2           λ11,0 = 0.4           λ21,0 = 0.2           β1,0 = 0.5
n = 250, σ12 = 0.3
  2SLS      0.221(0.185)[0.186]   0.366(0.315)[0.317]   0.197(0.346)[0.346]   0.489(0.095)[0.095]
  3SLS      0.207(0.207)[0.208]   0.370(0.322)[0.323]   0.204(0.357)[0.357]   0.488(0.099)[0.099]
  GMM-1     0.227(0.177)[0.179]   0.405(0.186)[0.186]   0.192(0.299)[0.299]   0.492(0.092)[0.093]
  GMM-2     0.216(0.203)[0.203]   0.413(0.150)[0.151]   0.180(0.217)[0.218]   0.495(0.095)[0.095]
  QML       0.182(0.190)[0.191]   0.411(0.153)[0.153]   0.203(0.216)[0.216]   0.504(0.093)[0.093]
n = 250, σ12 = 0.5
  2SLS      0.225(0.186)[0.188]   0.368(0.334)[0.336]   0.184(0.352)[0.352]   0.489(0.096)[0.097]
  3SLS      0.199(0.211)[0.211]   0.376(0.328)[0.329]   0.200(0.374)[0.374]   0.492(0.099)[0.099]
  GMM-1     0.242(0.173)[0.178]   0.399(0.219)[0.219]   0.182(0.306)[0.307]   0.491(0.091)[0.091]
  GMM-2     0.208(0.204)[0.204]   0.412(0.181)[0.181]   0.185(0.242)[0.243]   0.499(0.093)[0.093]
  QML       0.173(0.197)[0.199]   0.412(0.180)[0.180]   0.207(0.232)[0.232]   0.505(0.094)[0.094]
n = 250, σ12 = 0.7
  2SLS      0.232(0.187)[0.190]   0.381(0.323)[0.324]   0.180(0.365)[0.365]   0.490(0.091)[0.092]
  3SLS      0.194(0.215)[0.215]   0.394(0.331)[0.331]   0.199(0.391)[0.391]   0.499(0.094)[0.094]
  GMM-1     0.258(0.166)[0.176]   0.393(0.238)[0.238]   0.172(0.296)[0.298]   0.490(0.087)[0.087]
  GMM-2     0.206(0.202)[0.203]   0.415(0.213)[0.213]   0.185(0.258)[0.259]   0.503(0.089)[0.089]
  QML       0.167(0.204)[0.207]   0.416(0.217)[0.217]   0.208(0.252)[0.252]   0.505(0.094)[0.095]
n = 500, σ12 = 0.3
  2SLS      0.215(0.127)[0.128]   0.367(0.215)[0.218]   0.199(0.230)[0.230]   0.496(0.063)[0.063]
  3SLS      0.207(0.134)[0.134]   0.370(0.217)[0.219]   0.204(0.235)[0.235]   0.497(0.064)[0.064]
  GMM-1     0.221(0.123)[0.125]   0.391(0.129)[0.129]   0.201(0.204)[0.204]   0.497(0.061)[0.061]
  GMM-2     0.213(0.132)[0.133]   0.399(0.106)[0.106]   0.194(0.154)[0.154]   0.501(0.062)[0.062]
  QML       0.199(0.127)[0.127]   0.398(0.097)[0.097]   0.201(0.138)[0.138]   0.504(0.061)[0.061]
n = 500, σ12 = 0.5
  2SLS      0.217(0.125)[0.126]   0.378(0.212)[0.213]   0.190(0.234)[0.234]   0.496(0.065)[0.065]
  3SLS      0.201(0.134)[0.134]   0.383(0.215)[0.216]   0.198(0.243)[0.243]   0.498(0.065)[0.065]
  GMM-1     0.228(0.120)[0.123]   0.392(0.151)[0.151]   0.191(0.210)[0.210]   0.497(0.063)[0.063]
  GMM-2     0.209(0.132)[0.132]   0.404(0.128)[0.128]   0.191(0.168)[0.168]   0.502(0.063)[0.063]
  QML       0.196(0.128)[0.128]   0.402(0.116)[0.116]   0.199(0.151)[0.151]   0.503(0.063)[0.063]
n = 500, σ12 = 0.7
  2SLS      0.217(0.125)[0.126]   0.375(0.215)[0.217]   0.193(0.236)[0.236]   0.495(0.064)[0.064]
  3SLS      0.197(0.133)[0.133]   0.381(0.220)[0.220]   0.205(0.249)[0.249]   0.500(0.064)[0.064]
  GMM-1     0.233(0.116)[0.121]   0.385(0.167)[0.168]   0.192(0.206)[0.206]   0.496(0.061)[0.061]
  GMM-2     0.203(0.129)[0.129]   0.398(0.152)[0.152]   0.202(0.184)[0.184]   0.502(0.061)[0.061]
  QML       0.191(0.128)[0.129]   0.399(0.136)[0.136]   0.205(0.163)[0.163]   0.502(0.063)[0.063]
Mean(SD)[RMSE]

Table 4: Estimation under Heteroskedasticity (β1,0 = β2,0 = 0.5)

                    φ21,0 = 0.2           λ11,0 = 0.4           λ21,0 = 0.2           β1,0 = 0.5
n = 250, σ12 = 0.3
  2SLS      0.209(0.194)[0.194]   0.379(0.253)[0.254]   0.196(0.280)[0.280]   0.494(0.098)[0.098]
  3SLS      0.203(0.206)[0.206]   0.379(0.253)[0.254]   0.197(0.281)[0.281]   0.492(0.095)[0.095]
  GMM-1     0.228(0.172)[0.174]   0.400(0.162)[0.162]   0.190(0.252)[0.253]   0.494(0.089)[0.089]
  GMM-2     0.211(0.199)[0.199]   0.407(0.140)[0.140]   0.187(0.200)[0.201]   0.499(0.090)[0.090]
  QML       0.183(0.210)[0.211]   0.410(0.400)[0.400]   0.180(0.580)[0.580]   0.513(0.099)[0.100]
n = 250, σ12 = 0.5
  2SLS      0.213(0.192)[0.193]   0.382(0.256)[0.256]   0.187(0.282)[0.282]   0.493(0.097)[0.097]
  3SLS      0.197(0.210)[0.210]   0.386(0.254)[0.254]   0.193(0.290)[0.290]   0.496(0.094)[0.094]
  GMM-1     0.240(0.169)[0.174]   0.395(0.184)[0.184]   0.183(0.254)[0.254]   0.493(0.087)[0.087]
  GMM-2     0.204(0.201)[0.201]   0.409(0.161)[0.161]   0.189(0.210)[0.210]   0.501(0.086)[0.086]
  QML       0.178(0.210)[0.212]   0.405(0.470)[0.470]   0.178(0.602)[0.603]   0.513(0.099)[0.100]
n = 250, σ12 = 0.7
  2SLS      0.211(0.195)[0.195]   0.386(0.251)[0.252]   0.186(0.282)[0.282]   0.492(0.097)[0.097]
  3SLS      0.186(0.213)[0.214]   0.394(0.250)[0.250]   0.198(0.296)[0.296]   0.500(0.093)[0.093]
  GMM-1     0.249(0.165)[0.172]   0.393(0.196)[0.197]   0.175(0.240)[0.241]   0.492(0.086)[0.086]
  GMM-2     0.195(0.200)[0.200]   0.411(0.184)[0.184]   0.195(0.221)[0.221]   0.504(0.086)[0.086]
  QML       0.169(0.214)[0.216]   0.391(0.469)[0.469]   0.186(0.543)[0.543]   0.512(0.101)[0.102]
n = 500, σ12 = 0.3
  2SLS      0.209(0.127)[0.128]   0.386(0.176)[0.177]   0.189(0.194)[0.194]   0.500(0.065)[0.065]
  3SLS      0.202(0.126)[0.126]   0.391(0.175)[0.175]   0.192(0.193)[0.193]   0.499(0.062)[0.062]
  GMM-1     0.218(0.115)[0.116]   0.398(0.118)[0.118]   0.190(0.180)[0.181]   0.499(0.060)[0.060]
  GMM-2     0.208(0.122)[0.123]   0.403(0.100)[0.100]   0.190(0.138)[0.139]   0.502(0.059)[0.059]
  QML       0.206(0.138)[0.138]   0.397(0.320)[0.320]   0.160(0.488)[0.490]   0.513(0.065)[0.066]
n = 500, σ12 = 0.5
  2SLS      0.210(0.128)[0.128]   0.384(0.181)[0.181]   0.189(0.197)[0.197]   0.499(0.065)[0.065]
  3SLS      0.199(0.128)[0.128]   0.389(0.180)[0.180]   0.195(0.197)[0.197]   0.500(0.063)[0.063]
  GMM-1     0.224(0.113)[0.116]   0.393(0.133)[0.134]   0.189(0.179)[0.180]   0.498(0.059)[0.059]
  GMM-2     0.205(0.123)[0.123]   0.401(0.117)[0.117]   0.194(0.148)[0.148]   0.503(0.059)[0.059]
  QML       0.203(0.136)[0.136]   0.377(0.359)[0.360]   0.178(0.477)[0.478]   0.512(0.064)[0.065]
n = 500, σ12 = 0.7
  2SLS      0.209(0.128)[0.128]   0.383(0.179)[0.180]   0.192(0.196)[0.196]   0.498(0.066)[0.066]
  3SLS      0.194(0.130)[0.130]   0.390(0.178)[0.178]   0.199(0.198)[0.198]   0.500(0.063)[0.063]
  GMM-1     0.229(0.111)[0.114]   0.390(0.141)[0.141]   0.187(0.171)[0.171]   0.497(0.059)[0.059]
  GMM-2     0.200(0.123)[0.123]   0.400(0.132)[0.132]   0.199(0.157)[0.157]   0.502(0.060)[0.060]
  QML       0.201(0.132)[0.132]   0.355(0.318)[0.321]   0.193(0.381)[0.381]   0.511(0.065)[0.066]
Mean(SD)[RMSE]
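Finally, the designs in Tables 1-4 can be mimicked as follows (Python). Only the parameter values are taken from the table headers; the weights matrix, the regressors, the specification of equation 2, and the error distribution below are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 250

phi21, lam11, lam21 = 0.2, 0.4, 0.2     # values from the table headers
beta1, beta2, sig12 = 1.0, 1.0, 0.5

W = np.zeros((n, n))                    # assumed circular-neighbor weights
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

x1, x2 = rng.normal(size=n), rng.normal(size=n)
Sigma = np.array([[1.0, sig12],
                  [sig12, 1.0]])
U = rng.normal(size=(n, 2)) @ np.linalg.cholesky(Sigma).T   # homoskedastic case

y2 = x2 * beta2 + U[:, 1]               # equation 2, kept non-spatial for illustration
# Equation 1: y1 = phi21*y2 + lam11*W y1 + lam21*W y2 + x1*beta1 + u1
y1 = np.linalg.solve(np.eye(n) - lam11 * W,
                     phi21 * y2 + lam21 * (W @ y2) + x1 * beta1 + U[:, 0])
```

For the heteroskedastic designs (Tables 2 and 4), the rows of U would be scaled by unit-specific standard deviations before solving for the outcomes.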