QML Estimation of Dynamic Panel Data Models with Spatial Errors∗

Liangjun Su† and Zhenlin Yang‡

February 2007

Abstract We propose quasi maximum likelihood (QML) estimation of dynamic panel models with spatial errors when the cross-sectional dimension (n) is large and the time dimension (T ) is fixed. We consider both the random effects and fixed effects models and derive the limiting distributions of the QML estimators under different assumptions on the individual effects and on the initial observations. Monte Carlo simulation shows that the estimators perform well in finite samples.

JEL classifications: C12, C14, C22, C5

Key Words: Dynamic Panel, Fixed Effects, Random Effects, Spatial Dependence, Quasi Maximum Likelihood

∗ Liangjun Su gratefully acknowledges the financial support from the NSFC (70501001). He also thanks the School of Economics and Social Sciences, Singapore Management University (SMU), for the hospitality during his two-month visit, and the Wharton-SMU research center, SMU, for supporting his visit. Zhenlin Yang gratefully acknowledges the research support from the Wharton-SMU research center, Singapore Management University.
† Guanghua School of Management, Peking University, Beijing 100871, China. Telephone: +86-10-6276-7444. Email address: [email protected].
‡ School of Economics and Social Sciences, Singapore Management University, 90 Stamford Road, Singapore 178903. Telephone: +65-0828-0852. Fax: +65-6828-0853. Email address: [email protected].

1 Introduction

Recently there has been a growing interest in the estimation of panel data models with cross-sectional or spatial dependence. See Baltagi and Li (2004), Baltagi, Song and Koh (2003), Chen and Conley (2001), Elhorst (2003, 2005), Huang (2004), Kapoor, Kelejian and Prucha (2006), Pesaran (2003, 2004), Phillips and Sul (2003), and Yang, Li and Tse (2006), among others, for an overview. In this paper we focus on the quasi-maximum likelihood estimation of dynamic panel data models with spatial errors.

The history of spatial econometrics can be traced back at least to Cliff and Ord (1973). Since then, various methods have been proposed to estimate spatial dependence models, including maximum likelihood (ML) (Ord, 1975; Smirnov and Anselin, 2001), the method of moments (MM) (Kelejian and Prucha, 1998, 1999, 2006; Lin and Lee, 2006), and quasi-maximum likelihood (QML) (Lee, 2004). A common feature of these methods is that they are all developed for the estimation of a cross-sectional model with no time dimension. Recently, Elhorst (2003, 2005) studied the ML estimation of (dynamic) panel data models with certain spatial dependence structures, but the asymptotic properties of the estimators are not given. For over thirty years of spatial econometrics history, the asymptotic theory for the (Q)ML estimation of spatial models was taken for granted until the influential paper by Lee (2004), which systematically establishes consistency and asymptotic normality for the Gaussian QML estimates of a spatial autoregressive model. He demonstrates that the rate of convergence of the QML estimates may depend on some general features of the spatial weights matrix. More recently, Yu, de Jong, and Lee (2006) extend the work of Lee (2004) to spatial dynamic panel data models with fixed effects, allowing both the time dimension (T) and the cross-sectional dimension (n) to be large.

This paper concerns the more traditional panel data model where n is allowed to grow but T is held fixed (usually small). As Binder, Hsiao and Pesaran (2005) remarked, this model remains the prevalent setting for the majority of empirical microeconometric research. Our work is distinct from that of Yu, de Jong, and Lee (2006) in several respects. First, unlike Yu, de Jong, and Lee (2006), who consider only a fixed effects model, we consider both random and fixed effects specifications of the individual effects and highlight the differences between them and the implications these differences have for estimation and inference. Second, because we keep T fixed, our estimation strategy is quite different from that in the large-n and large-T setting. In the case of the fixed effects model, we have to difference out the fixed effects, whereas Yu, de Jong, and Lee (2006) need not do so. Third, spatial dependence is present only in the error term in our model, whereas Yu, de Jong, and Lee (2006) consider a spatial lag model. Consequently the two approaches complement each other. We conjecture that our work can be extended to the general spatial autoregressive model with spatial errors (SARAR).

The rest of the paper is organized as follows. In Section 2 we introduce our model specification. We propose the quasi-maximum likelihood estimators in Section 3 and study their asymptotic properties in Section 4. In Section 5 we provide a small set of Monte Carlo experiments to evaluate the finite sample performance of our estimators. All proofs are relegated to the appendix.

To proceed, we introduce some notation and conventions. Let I_n denote an n × n identity matrix. Let ι_T denote a T × 1 vector of ones and J_T = ι_T ι_T′, where a prime denotes transposition throughout this paper. ⊗ denotes the Kronecker product. |·| denotes the absolute value of a scalar or the determinant of a matrix.

2 Model Specification

We consider the model

    y_it = ρ y_{i,t-1} + x_it′ β + z_i′ γ + u_it,    (2.1)

for i = 1, ..., n, t = 1, ..., T, where the scalar parameter ρ with |ρ| < 1 characterizes the dynamic effect, x_it is a p × 1 vector of time-varying exogenous variables, z_i is a q × 1 vector of time-invariant exogenous variables such as the constant term or dummies representing individuals' gender or race, and the disturbance vector u_t = (u_1t, ..., u_nt)′ is assumed to exhibit both non-observable individual effects and a spatially autocorrelated structure, i.e.,

    u_t = μ + ε_t,    (2.2)
    ε_t = λ W_n ε_t + v_t,    (2.3)

where μ = (μ_1, ..., μ_n)′, ε_t = (ε_1t, ..., ε_nt)′, and v_t = (v_1t, ..., v_nt)′, with μ representing the unobservable individual effects, which can be either random or fixed, ε_t representing the spatially correlated errors, and v_t representing the random innovations, which are assumed to be independent and identically distributed (i.i.d.) with mean zero and variance σ_v^2. In the case where μ is random, its elements are assumed to be i.i.d. (0, σ_μ^2) and independent of v_t. In the case where μ is fixed, the time-invariant regressors must be removed from the model due to multicollinearity between the

observed and unobserved individual-specific effects. The parameter λ is the spatial autoregressive coefficient and W_n is a known n × n spatial weight matrix whose diagonal elements are zero. Following the literature in spatial econometrics, we assume that I_n − λW_n is nonsingular. We also assume that the observations on (y_it, x_it′, z_i′) are available at the initial period t = 0.

Let B_n = B_n(λ) = I_n − λW_n. Frequently we will suppress the dependence of B_n and W_n on n and write B and W instead. We have ε_t = B^{-1} v_t. Let y_t = (y_1t, ..., y_nt)′ and x_t = (x_1t, ..., x_nt)′. Define Y = (y_1′, ..., y_T′)′, Y_{-1} = (y_0′, ..., y_{T-1}′)′, X = (x_1′, ..., x_T′)′, and Z = ι_T ⊗ z, where z = (z_1, ..., z_n)′. Using matrix notation, we can write the model specified by Eqs. (2.1)-(2.3) as

    Y = ρ Y_{-1} + Xβ + Zγ + u,  with  u = (ι_T ⊗ I_n)μ + (I_T ⊗ B^{-1})v.    (2.4)

It is worth mentioning that Eqs. (2.1)-(2.3) allow spatial dependence to be present in the random disturbance term ε_t but not in the individual effect μ. See Baltagi, Song and Koh (2003) and Baltagi and Li (2004) for applications of this type of model. Alternatively, we can allow both ε_t and μ to follow a spatial autoregressive process, as is done by Kapoor, Kelejian and Prucha (2006), who consider GMM estimation of a static spatial panel model with random effects. Our theory can readily be modified to accommodate the latter case, and we conjecture that a specification test can be developed to distinguish between the two specifications.
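The Kronecker structure in (2.4) maps directly into code. The following minimal sketch (Python/NumPy, with illustrative variable and function names that are not taken from the paper) builds the stacked disturbance u = (ι_T ⊗ I_n)μ + (I_T ⊗ B^{-1})v for a given W, λ, μ and matrix of innovations v.

```python
import numpy as np

def stack_disturbances(W, lam, mu, v):
    """Build u = (iota_T kron I_n) mu + (I_T kron B^{-1}) v as in (2.4).

    W   : (n, n) spatial weight matrix with zero diagonal
    lam : spatial autoregressive coefficient; I_n - lam*W is assumed nonsingular
    mu  : (n,) individual effects
    v   : (T, n) innovations, one row per period
    """
    T, n = v.shape
    B = np.eye(n) - lam * W                     # B = I_n - lambda W
    eps = np.linalg.solve(B, v.T).T             # eps_t = B^{-1} v_t, period by period
    u = np.tile(mu, T) + eps.reshape(T * n)     # (iota_T kron I_n) mu + stacked eps
    return u
```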

3 Quasi Maximum Likelihood Estimation

In this section we develop quasi maximum likelihood estimates (QMLEs) based on Gaussian likelihood for the models specified above.

3.1 QMLE for the Random Effects Model

For the random effects model, the covariance matrix of u has the familiar form E(uu′) = σ_v^2 Ω, with

    Ω = Ω(φ_μ, λ) = φ_μ (J_T ⊗ I_n) + I_T ⊗ (B′B)^{-1},    (3.1)

where φ_μ = σ_μ^2/σ_v^2, J_T = ι_T ι_T′, and we suppress the dependence of Ω on n. We frequently suppress the argument of Ω when no confusion can arise. It is well known that the likelihood function for a dynamic panel model depends on the assumptions made on the initial observations (Hsiao, 2003). If |ρ| ≥ 1 or the processes generating the x_it are not stationary, it does not make sense to assume that the process generating the y_it is the same prior to the period of observation as for t = 1, ..., T. For this reason, we consider two sets of assumptions about the initial observations {y_i0}.

Case I: y_i0 is exogenous

If y_0 is taken as exogenous, it rules out the plausible assumption that it is generated by the same process that generates y_t, t = 1, ..., T. In this case, we can easily derive the likelihood function for the model (2.4) conditional on y_0. Let θ = (β′, γ′, ρ)′, δ = (λ, φ_μ)′, and ς = (θ′, σ_v^2, δ′)′. The log-likelihood function of (2.4) is

    L^r(ς) = − (nT/2) log(2π) − (nT/2) log(σ_v^2) − (1/2) log|Ω| − 1/(2σ_v^2) u(θ)′ Ω^{-1} u(θ),    (3.2)

where u(θ) = Y − ρY_{-1} − Xβ − Zγ. Maximizing (3.2) gives the QMLE of the model parameters based on the Gaussian likelihood.

Computationally it is convenient to work with the concentrated log-likelihood obtained by concentrating out the parameters θ and σ_v^2. From (3.2), given δ, the QMLE of θ is

    θ̂(δ) = [X̃′ Ω^{-1} X̃]^{-1} X̃′ Ω^{-1} Y,    (3.3)

and the QMLE of σ_v^2 is

    σ̂_v^2(δ) = 1/(nT) ũ(δ)′ Ω^{-1} ũ(δ),    (3.4)

where X̃ = (X, Z, Y_{-1}) and ũ(δ) = Y − X̃ θ̂(δ). Substituting (3.3) and (3.4) back into (3.2) for θ and σ_v^2, we obtain the concentrated log-likelihood function of δ:

    L_c^r(δ) = − (nT/2)(log(2π) + 1) − (nT/2) log[σ̂_v^2(δ)] − (1/2) log|Ω|.    (3.5)

The QMLE δ̂ = (φ̂_μ, λ̂)′ of δ maximizes the concentrated log-likelihood (3.5). The QMLEs of θ and σ_v^2 are given by θ̂(δ̂) and σ̂_v^2(δ̂), respectively. Further, the QMLE of σ_μ^2 is given by σ̂_μ^2 = φ̂_μ σ̂_v^2.
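For a given δ = (λ, φ_μ), the quantities in (3.3)-(3.5) can be computed directly and δ̂ found by a numerical search. The sketch below (Python/NumPy/SciPy; function and variable names are illustrative assumptions, not the authors' code) evaluates the negative concentrated log-likelihood (3.5) by forming Ω explicitly; the simplifications of Section 3.3 would normally replace the brute-force inversion.

```python
import numpy as np
from scipy.optimize import minimize

def neg_concentrated_loglik(delta, Y, Xtil, W):
    """Negative L_c^r(delta) of (3.5); delta = (lambda, phi_mu).

    Y    : (n*T,) stacked dependent variable
    Xtil : (n*T, p+q+1) matrix (X, Z, Y_{-1})
    W    : (n, n) spatial weight matrix
    """
    lam, phi_mu = delta
    n = W.shape[0]
    T = Y.shape[0] // n
    B = np.eye(n) - lam * W
    Omega = phi_mu * np.kron(np.ones((T, T)), np.eye(n)) \
            + np.kron(np.eye(T), np.linalg.inv(B.T @ B))                  # (3.1)
    Oinv = np.linalg.inv(Omega)
    theta = np.linalg.solve(Xtil.T @ Oinv @ Xtil, Xtil.T @ Oinv @ Y)      # (3.3)
    resid = Y - Xtil @ theta
    sig2 = resid @ Oinv @ resid / (n * T)                                 # (3.4)
    _, logdet = np.linalg.slogdet(Omega)
    Lc = -0.5 * n * T * (np.log(2 * np.pi) + 1) \
         - 0.5 * n * T * np.log(sig2) - 0.5 * logdet                      # (3.5)
    return -Lc

# usage sketch:
# delta_hat = minimize(neg_concentrated_loglik, x0=np.array([0.2, 1.0]),
#                      args=(Y, Xtil, W), method="Nelder-Mead").x
```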

Case II: y_i0 is endogenous

If y_0 is taken as endogenous, there are several approaches to treating the initial observations. Assume that, possibly after some differencing, both y_it and x_it are stationary. In this case, the initial observations are determined by

    y_0 = Σ_{j=0}^∞ ρ^j x_{-j} β + zγ/(1 − ρ) + μ/(1 − ρ) + Σ_{j=0}^∞ ρ^j B^{-1} v_{-j}.    (3.6)

Since x_{-j}, j = 1, 2, ..., are not observable, we cannot use x_{-j} in our estimation procedure. In this paper we follow Bhargava and Sargan (1983) (see also Hsiao, 2003, p.76) and assume that the initial observation y_0 can be approximated by

    y_0 = π_0 ι_n + x π_1 + z π_2 + ϵ ≡ x̃ π + ϵ,    (3.7)

where x = (x_0, x_1, ..., x_T), x̃ = (ι_n, x, z), π = (π_0, π_1′, π_2′)′, E(ϵ | x, z) = 0, and the covariance structure of ϵ is affected by the spatial weight matrix W. Note that if z contains the constant term, then ι_n should vanish in (3.7).

Under the stationarity assumption, (3.6) implies that y_0 = ỹ_0 + ζ_0, where ỹ_0 is the systematic or exogenous part of y_0 and ζ_0 is the endogenous part, namely,

    ỹ_0 = Σ_{j=0}^∞ ρ^j x_{-j} β + zγ/(1 − ρ)  and  ζ_0 = μ/(1 − ρ) + Σ_{j=0}^∞ ρ^j B^{-1} v_{-j}.    (3.8)

(3.7) then follows by assuming that the optimal predictor of ỹ_0 conditional on the observables x and z is x̃π: ỹ_0 = x̃π + ζ, where ζ = (ζ_1, ..., ζ_n)′, and the ζ_i's are i.i.d. (0, σ_ζ^2) and independent of x_it, z_i, μ_i and ε_it.

By construction, ϵ = ζ + ζ_0 and E(ϵ_i) = 0. We can verify that under strict exogeneity of x_it and z_i,

    E(ϵϵ′) = σ_ζ^2 I_n + σ_μ^2/(1 − ρ)^2 I_n + σ_v^2/(1 − ρ^2) (B′B)^{-1},    (3.9)

and

    E(ϵu′) = σ_μ^2/(1 − ρ) ι_T′ ⊗ I_n.    (3.10)

We require that n > p(T + 1) + q + 1 for the identification of the parameters in (3.7). This is impossible if T is relatively large and p ≠ 0. For this reason, the regressor x in (3.7) is frequently replaced by x̄ ≡ (T + 1)^{-1} Σ_{t=0}^T x_t, so that we have

    y_0 = π_0 ι_n + x̄ π_1 + z π_2 + ϵ ≡ x̃ π + ϵ,    (3.11)

where now x̃ = (ι_n, x̄, z), π = (π_0, π_1′, π_2′)′, and the variance-covariance structure of ϵ and u is the same as before except for the definition of σ_ζ^2.

In contrast, Hsiao (2003, p.76, Case II) simply assumes that the y_i0's are random with a common mean π and writes y_0 as

    y_0 = ι_n π + ϵ ≡ x̃ π + ϵ,    (3.12)

where now x̃ = ι_n, ϵ = (ϵ_1, ..., ϵ_n)′, ϵ_i represents the deviation of the initial individual endowment from the mean, and the variance-covariance structure of ϵ and u is the same as before except for the definition of σ_ζ^2.

In what follows, x̃ can be any one of those defined in (3.7), (3.11) or (3.12). We will simply refer to its dimension as n × k, where k varies from case to case.

Because the likelihood function (3.2) assumes that the initial observations are exogenously given, it generally produces biased estimators when this assumption is not satisfied (see Bhargava and Sargan, 1983). Under the assumption that the presample values are generated by the same process as the within-sample observations, we need to derive the joint distribution of y_T, ..., y_1, y_0 from (2.4) and (3.7), (3.11) or (3.12). Denoting by σ_v^2 Ω* the n(T+1) × n(T+1) symmetric covariance matrix of u* = (ϵ′, u′)′, we see that Ω* has the form

    σ_v^2 Ω* = [ σ_ζ^2 I_n + σ_μ^2/(1−ρ)^2 I_n + σ_v^2/(1−ρ^2) (B′B)^{-1}    σ_μ^2/(1−ρ) ι_T′ ⊗ I_n ]
               [ σ_μ^2/(1−ρ) ι_T ⊗ I_n                                       σ_v^2 Ω               ]

             = σ_v^2 [ φ_ζ I_n + φ_μ/(1−ρ)^2 I_n + 1/(1−ρ^2) (B′B)^{-1}    φ_μ/(1−ρ) ι_T′ ⊗ I_n ]
                      [ φ_μ/(1−ρ) ι_T ⊗ I_n                                 Ω                    ]

             ≡ σ_v^2 [ ω_11    ω_12 ]
                      [ ω_21    Ω    ],

where φ_ζ = σ_ζ^2/σ_v^2, and we frequently suppress the argument of Ω* when no confusion can arise.

Now let θ = (β′, γ′, π′)′, δ = (ρ, λ, φ_μ, φ_ζ)′, and ς = (θ′, σ_v^2, δ′)′. Note that ς is a (p + q + k + 5) × 1 vector of unknown parameters. Based on (2.4) and (3.7), (3.11) or (3.12), and assuming a Gaussian likelihood, the random effects QML estimator of ς is derived by maximizing the following log-likelihood function:

    L^rr(ς) = − n(T+1)/2 log(2π) − n(T+1)/2 log(σ_v^2) − (1/2) log|Ω*| − 1/(2σ_v^2) u*(θ)′ Ω*^{-1} u*(θ),    (3.13)

where u*(θ) = ((y_0 − x̃π)′, u(β, γ, ρ)′)′ and u(β, γ, ρ) = Y − ρY_{-1} − Xβ − Zγ.

Maximizing (3.13) gives the quasi maximum likelihood estimates (QMLEs) of the model parameters based on the Gaussian likelihood. We will work with the concentrated log-likelihood obtained by concentrating out the parameters θ and σ_v^2. From (3.13), given δ = (ρ, λ, φ_μ, φ_ζ)′, the QMLE of θ is

    θ̂(δ) = [X*′ Ω*^{-1} X*]^{-1} X*′ Ω*^{-1} Y*,    (3.14)

and the QMLE of σ_v^2 is

    σ̂_v^2(δ) = 1/(n(T+1)) ũ*(δ)′ Ω*^{-1} ũ*(δ),    (3.15)

where

    Y* = [ y_0          ]     X* = [ 0_{n×p}   0_{n×q}   x̃        ]     ũ*(δ) = [ y_0 − x̃ π̂(δ)                      ]
         [ Y − ρY_{-1} ],          [ X          Z          0_{nT×k} ],            [ Y − X β̂(δ) − Z γ̂(δ) − ρ Y_{-1}  ],

and θ̂(δ) = (β̂(δ)′, γ̂(δ)′, π̂(δ)′)′. Substituting (3.14) and (3.15) back into (3.13) for θ and σ_v^2, we obtain the concentrated log-likelihood function of δ:

    L_c^rr(δ) = − n(T+1)/2 (log(2π) + 1) − n(T+1)/2 log[σ̂_v^2(δ)] − (1/2) log|Ω*|.    (3.16)

The QMLE δ̂ = (ρ̂, λ̂, φ̂_μ, φ̂_ζ)′ of δ maximizes the concentrated log-likelihood (3.16). The QMLEs of θ and σ_v^2 are given by θ̂(δ̂) and σ̂_v^2(δ̂), respectively. Further, the QMLEs of σ_μ^2 and σ_ζ^2 are given by σ̂_μ^2 = φ̂_μ σ̂_v^2 and σ̂_ζ^2 = φ̂_ζ σ̂_v^2, respectively.

3.2 QMLE for the Fixed Effects Model

In this section, we consider the dynamic panel data model with fixed effects. In this case, we write the model in vector notation as

    y_t = ρ y_{t-1} + x_t β + zγ + μ + B^{-1} v_t,    (3.17)

where, for example, μ = (μ_1, ..., μ_n)′ denotes the fixed effects that may be correlated with the regressors x_t and z, and the specification of the other variables is the same as in the random effects case. Following standard practice, we eliminate μ by first-differencing (3.17), namely,

    Δy_t = ρ Δy_{t-1} + Δx_t β + B^{-1} Δv_t.    (3.18)

(3.18) is well defined for t = 2, 3, ..., T but not for t = 1 because observations on y_{i,-1} are not available. By continuous substitution, we can write Δy_1 as

    Δy_1 = ρ^m Δy_{-m+1} + Σ_{j=0}^{m-1} ρ^j Δx_{1-j} β + Σ_{j=0}^{m-1} ρ^j B^{-1} Δv_{1-j}.    (3.19)

As in Hsiao et al. (2002), since the observations Δx_{1-j}, j = 1, 2, ..., are not available, the conditional mean of Δy_1 given Δy_{-m+1} and Δx_{1-j}, j = 0, 1, 2, ..., defined by

    η_1 = E(Δy_1 | Δy_{-m+1}, Δx_1, Δx_0, ...) = ρ^m Δy_{-m+1} + Σ_{j=0}^{m-1} ρ^j Δx_{1-j} β,    (3.20)

is unknown even if we assume that m is sufficiently large. Noting that η_1 is an n × 1 vector, we confront the incidental parameters problem if we treat η_1 as a free parameter to be estimated. As Hsiao et al. (2002) remark, to get around this problem, the expected value of η_1, conditional on the observables, has to be a function of a finite number of parameters, and such a condition holds provided that {x_it} are trend-stationary (with a common deterministic linear trend) or first-difference stationary processes. In this case, the expected value of Δx_{i,1-j}, conditional on the pT × 1 vector Δx_i = (Δx_i1′, ..., Δx_iT′)′, is linear in Δx_i, i.e.,

    E(Δx_{i,1-j} | Δx_i) = π_0j + π_1j′ Δx_i,    (3.21)

where π_0j and π_1j do not depend on i. Denote Δx = (Δx_1, ..., Δx_n)′, an n × pT matrix. Then, under the assumption that E(Δy_{i,-m+1} | Δx_i1, Δx_i2, ..., Δx_iT) is the same across all individuals, we have

    Δy_1 = π_0 ι_n + Δx π_1 + e ≡ Δx̃ π + e,    (3.22)

where e = (η_1 − E(η_1 | Δx)) + Σ_{j=0}^{m-1} ρ^j B^{-1} Δv_{1-j} is an n × 1 random vector with typical element e_i (i = 1, ..., n); π = (π_0, π_1′)′ is a (pT + 1) × 1 vector of parameters associated with the conditional mean of Δy_1; and Δx̃ = (ι_n, Δx). (3.21) and (3.22) are associated with Bhargava and Sargan's (1983) approximation for the dynamic random effects model with endogenous initial observations. See Ridder and Wansbeek (1990) and Blundell and Smith (1991) for a similar approach.

By construction, we can verify that under strict exogeneity of x_it, E(e_i | Δx_i) = 0,

    E(ee′) = σ_e^2 I_n + σ_v^2 c_m (B′B)^{-1} ≡ σ_v^2 B^{-1} (φ_e BB′ + c_m I_n) B′^{-1},    (3.23)

and

    E(e Δu_2′) = −σ_v^2 (B′B)^{-1},  E(e Δu_t′) = 0 for t = 3, ..., T,    (3.24)

where Δu_t = B^{-1} Δv_t, σ_e^2 = Var(η_1i) is identical across i, c_m = 2(1 + ρ^{2m-1})/(1 + ρ), and φ_e = σ_e^2/σ_v^2. Clearly, when m → ∞, c_∞ = 2/(1 + ρ), which is not a free parameter. We assume that m is unknown, finite, and identical across individuals, so that we estimate c_m below. See Hsiao et al. (2002, p.110).

We require that n > pT + 1 for the identification of the parameters in (3.22). This becomes impossible if T is relatively large in applications and p > 1. As in the random effects model, Δx̃ in (3.22) can be chosen to be other variables, so that we have

    Δy_1 = π_0 ι_n + Δx̄ π_1 + e ≡ Δx̃ π + e,    (3.25)

or

    Δy_1 = π ι_n + e ≡ Δx̃ π + e,    (3.26)

where Δx̃ = (ι_n, Δx̄) with Δx̄ = T^{-1} Σ_{t=1}^T Δx_t in (3.25), Δx̃ = ι_n in (3.26), and in each case the

variance-covariance structure of e and (Δv_2, ..., Δv_T) is the same as above. In what follows, we simply refer to the dimension of π as k.

Let E = φ_e BB′ + c_m I_n. Then the covariance matrix of Δu ≡ (e′, Δu_2′, ..., Δu_T′)′ is given by

    Var(Δu) = σ_v^2 (I_T ⊗ B^{-1}) H_E (I_T ⊗ B′^{-1}) ≡ σ_v^2 Ω†,    (3.27)

where the nT × nT matrix H_E is defined by

    H_E = [  E      −I_n    0      ···    0      0      0    ]
          [ −I_n    2I_n   −I_n    ···    0      0      0    ]
          [  0     −I_n    2I_n    ···    0      0      0    ]
          [  ⋮      ⋮       ⋮      ⋱     ⋮      ⋮      ⋮    ]
          [  0      0       0      ···    2I_n  −I_n    0    ]
          [  0      0       0      ···   −I_n    2I_n  −I_n  ]
          [  0      0       0      ···    0     −I_n    2I_n ].    (3.28)
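A minimal sketch of how the covariance structure (3.27)-(3.28) can be assembled numerically is given below (Python/NumPy; the function and variable names are illustrative assumptions, not the authors' code).

```python
import numpy as np

def fe_error_covariance(W, lam, c_m, phi_e, T):
    """Build H_E of (3.28) and Omega_dagger = (I_T kron B^{-1}) H_E (I_T kron B'^{-1}) of (3.27)."""
    n = W.shape[0]
    B = np.eye(n) - lam * W
    E = phi_e * (B @ B.T) + c_m * np.eye(n)              # E = phi_e B B' + c_m I_n
    # block tridiagonal skeleton: 2 I_n on the diagonal, -I_n on the off-diagonals
    H = np.kron(2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1), np.eye(n))
    H[:n, :n] = E                                        # (1,1) block replaced by E
    Binv = np.linalg.inv(B)
    Omega_dag = np.kron(np.eye(T), Binv) @ H @ np.kron(np.eye(T), Binv.T)
    return H, Omega_dag
```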

Now let θ = (β′, ρ, π′)′, δ = (λ, c_m, φ_e)′, and ς = (θ′, σ_v^2, δ′)′. Note that ς is a (p + k + 5) × 1 vector of unknown parameters. Based on (3.18) and (3.22) and using a Gaussian likelihood, the fixed effects QML estimator of ς is derived by maximizing the following log-likelihood function:

    L^f(ς) = − (nT/2) log(2π) − (nT/2) log(σ_v^2) − (1/2) log|Ω†| − 1/(2σ_v^2) Δu(θ)′ Ω†^{-1} Δu(θ),    (3.29)

where

    Δu(θ) = [ Δy_1 − Δx̃ π                ]
            [ Δy_2 − ρΔy_1 − Δx_2 β       ]
            [ ⋮                            ]
            [ Δy_T − ρΔy_{T-1} − Δx_T β   ].    (3.30)

Maximizing (3.29) gives the Gaussian QMLE of the model parameters. We will work with the concentrated log-likelihood obtained by concentrating out the parameters θ and σ_v^2. From (3.29), given δ = (λ, c_m, φ_e)′, the QMLE of θ is

    θ̂(δ) = [ΔX′ Ω†^{-1} ΔX]^{-1} ΔX′ Ω†^{-1} ΔY,    (3.31)

and the QMLE of σ_v^2 is

    σ̂_v^2(δ) = 1/(nT) Δũ(δ)′ Ω†^{-1} Δũ(δ),    (3.32)

where

    ΔY = [ Δy_1 ]        ΔX = [ 0_{n×p}   0_{n×1}     Δx̃       ]
         [ Δy_2 ]              [ Δx_2      Δy_1        0_{n×k}  ]
         [  ⋮   ]              [  ⋮         ⋮           ⋮       ]
         [ Δy_T ],             [ Δx_T      Δy_{T-1}    0_{n×k}  ],

and Δũ(δ) equals Δu(θ) with θ replaced by θ̂(δ).

Substituting (3.31) and (3.32) back into (3.29) for θ and σ_v^2, we obtain the concentrated log-likelihood function of δ:

    L_c^f(δ) = − (nT/2)(log(2π) + 1) − (nT/2) log[σ̂_v^2(δ)] − (1/2) log|Ω†|.    (3.33)

The QMLE δ̂ = (λ̂, ĉ_m, φ̂_e)′ of δ maximizes the concentrated log-likelihood (3.33). The QMLEs of θ and σ_v^2 are given by θ̂(δ̂) and σ̂_v^2(δ̂), respectively. Further, the QMLE of σ_e^2 is given by σ̂_e^2 = φ̂_e σ̂_v^2.

3.3 Computational Issues

Maximization of L_c^r(δ), L_c^rr(δ) and L_c^f(δ) involves repeated evaluation of the inverses and determinants of the nT × nT matrices Ω and Ω†, and of the n(T+1) × n(T+1) matrix Ω*. This can be a great burden when n or T or both are large. By Magnus (1982, p.242), the following identities can be used to simplify the calculations involving Ω:

    |Ω| = |(B′B)^{-1} + φ_μ T I_n| · |B|^{-2(T-1)},    (3.34)
    Ω^{-1} = T^{-1} J_T ⊗ [(B′B)^{-1} + φ_μ T I_n]^{-1} + (I_T − T^{-1} J_T) ⊗ (B′B).    (3.35)

The above formulae reduce the calculation of the inverse and determinant of an nT × nT matrix to calculations involving several n × n matrices. By Griffith (1988), the calculation of the determinants can be further simplified:

    |B| = ∏_{i=1}^n (1 − λw_i),  and  |(B′B)^{-1} + φ_μ T I_n| = ∏_{i=1}^n [(1 − λw_i)^{-1} + φ_μ T],    (3.36)

where the w_i's are the eigenvalues of W. The above simplifications are also used in Yang et al. (2006).

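The identities (3.34)-(3.35) translate into the following sketch (Python/NumPy; names are illustrative). The determinant of B is obtained from the eigenvalues of W as in (3.36), while the n × n matrix (B′B)^{-1} + φ_μ T I_n is handled by a direct n × n determinant and inverse, which already avoids any nT × nT computation.

```python
import numpy as np

def omega_logdet_inv(W, lam, phi_mu, T):
    """log|Omega| and Omega^{-1} via the n x n reductions (3.34)-(3.35)."""
    n = W.shape[0]
    B = np.eye(n) - lam * W
    BtB_inv = np.linalg.inv(B.T @ B)
    M = BtB_inv + phi_mu * T * np.eye(n)                  # the n x n matrix in (3.34)
    w = np.linalg.eigvals(W)                               # eigenvalues w_i of W
    log_det_B = np.sum(np.log(1.0 - lam * w)).real         # |B| = prod_i (1 - lambda w_i), cf. (3.36)
    logdet_Omega = np.linalg.slogdet(M)[1] - 2 * (T - 1) * log_det_B      # (3.34)
    JT_over_T = np.ones((T, T)) / T
    Omega_inv = np.kron(JT_over_T, np.linalg.inv(M)) \
                + np.kron(np.eye(T) - JT_over_T, B.T @ B)                  # (3.35)
    return logdet_Omega, Omega_inv
```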

For the inverse of Ω*, using the formula for the inverse of a partitioned matrix (e.g., Magnus and Neudecker, 2002), we have

    Ω*^{-1} = [ D^{-1}                   −D^{-1} ω_12 Ω^{-1}                       ]  ≡  [ ω^{11}   ω^{12} ]
              [ −Ω^{-1} ω_21 D^{-1}      Ω^{-1} + Ω^{-1} ω_21 D^{-1} ω_12 Ω^{-1}   ]      [ ω^{21}   ω^{22} ],    (3.37)

where D = ω_11 − ω_12 Ω^{-1} ω_21. For |Ω*|, we have

    |Ω*| = |Ω| · |ω_11 − ω_12 Ω^{-1} ω_21|,    (3.38)

which involves the inverses and determinants of several n × n matrices by using (3.34) and (3.35).

For Ω†, by the properties of the matrix operations,

    |Ω†| = |I_T ⊗ B^{-1}| · |H_E| · |I_T ⊗ B′^{-1}| = |B|^{-2T} |H_E|,
    Ω†^{-1} = (I_T ⊗ B′^{-1})^{-1} H_E^{-1} (I_T ⊗ B^{-1})^{-1} = (I_T ⊗ B′) H_E^{-1} (I_T ⊗ B),

where |H_E| = |E*|,

    H_E^{-1} = (1 − T) h_0^{-1} ⊗ E*^{-1} + [h_1^{-1} − (1 − T) h_0^{-1}] ⊗ E*^{-1} E,
    E* = T E − (T − 1) I_n,

and the T × T matrix h_ϑ (ϑ = 0, 1) is defined as

    h_ϑ = [  ϑ   −1    0   ···    0    0  ]
          [ −1    2   −1   ···    0    0  ]
          [  0   −1    2   ···    0    0  ]
          [  ⋮    ⋮    ⋮   ⋱     ⋮    ⋮  ]
          [  0    0    0   ···    2   −1  ]
          [  0    0    0   ···   −1    2  ].    (3.39)

Hsiao et al. (2002) give the determinant of h_ϑ as |h_ϑ| = 1 + T(ϑ − 1), and the inverse of h_ϑ as

    h_ϑ^{-1} = [1 + T(ϑ − 1)]^{-1} ×
        [ T      T−1        T−2            ···   3                  2                  1               ]
        [ T−1    (T−1)ϑ     (T−2)ϑ         ···   3ϑ                 2ϑ                 ϑ               ]
        [ T−2    (T−2)ϑ     (T−2)(2ϑ−1)    ···   3(2ϑ−1)            2(2ϑ−1)            2ϑ−1            ]
        [ ⋮      ⋮          ⋮              ⋱    ⋮                  ⋮                  ⋮               ]
        [ 3      3ϑ         3(2ϑ−1)        ···   3[(T−3)ϑ−(T−4)]    2[(T−3)ϑ−(T−4)]    (T−3)ϑ−(T−4)    ]
        [ 2      2ϑ         2(2ϑ−1)        ···   2[(T−3)ϑ−(T−4)]    2[(T−2)ϑ−(T−3)]    (T−2)ϑ−(T−3)    ]
        [ 1      ϑ          2ϑ−1           ···   (T−3)ϑ−(T−4)       (T−2)ϑ−(T−3)       (T−1)ϑ−(T−2)    ].

4 Asymptotic Distribution of the QMLEs

In this section we first state a set of assumptions that are used to establish the asymptotic properties of the estimators proposed in the previous sections. We then study the consistency and asymptotic normality of the proposed estimators for the dynamic panel data models with spatial errors. Recall the notation for the parameters: θ is the parameter vector that can be concentrated out of the likelihood function (exclusive of σ_v^2), δ is the parameter vector in the concentrated likelihood, and ς = (θ′, σ_v^2, δ′)′. Let ς_0 = (θ_0′, σ_v0^2, δ_0′)′ be the true parameter vector. Let Ξ be the parameter space of ς, and Λ the space of δ.

4.1 Generic Assumptions

To provide a rigorous analysis of the QMLEs, we need to assume different sets of conditions for the different model specifications. Nevertheless, for both the random and fixed effects specifications we first make the following generic assumptions.

Assumption G. (i) The available observations are (y_it, x_it, z_i) for i = 1, ..., n, t = 0, 1, ..., T, with T ≥ 2 fixed and n → ∞. (ii) The disturbance vector u_t = (u_1t, ..., u_nt)′ exhibits both individual effects and the spatially autocorrelated structure defined in (2.2) and (2.3). The v_it are i.i.d. for all i and t with E(v_it) = 0, Var(v_it) = σ_v^2, and E|v_it|^{4+ϵ_0} < ∞ for some ϵ_0 > 0. (iii) {x_it, t = ..., −1, 0, 1, ...} and {z_i} are strictly exogenous. (iv) |ρ| < 1 in (2.1). (v) The true parameter ς_0 lies in the interior of a convex compact set Ξ.




Assumption G(i) corresponds to traditional panel data models with large n and small T. One can consider extending the QMLE procedure to panels with large n and large T; see, for example, Phillips and Sul (2003). Assumption G(ii) is standard in the literature. Assumption G(iii) is not as strong as it appears in the spatial econometrics literature, since in most spatial analyses the regressors are treated as nonstochastic fixed constants (e.g., Anselin, 1988; Kelejian and Prucha, 1998, 1999, 2006; Lee, 2002, 2004; Lin and Lee, 2006; Robinson, 2006). One can relax the strict exogeneity condition in Assumption G(iii) as in Hsiao et al. (2002), but this would complicate our analysis in the case of spatially correlated errors. Assumption G(iv) can be relaxed for the case of random effects with exogenous initial observations without any change in the derivation. It can also be relaxed for the fixed effects model with some modification of the derivation, as in Hsiao et al. (2002). Assumption G(v) is commonly assumed in the literature but deserves some further discussion: if we assume that the spatial weight matrix W is row-normalized, it is well known that the largest eigenvalue (in absolute value) of W is 1, so that only λ ∈ (−1, 1) is required for the good behavior of (I_n − λW)^{-1}. In this case Assumption G(v) requires that λ lie in a compact subset of (−1, 1). Further, once we relax the assumption that W is row-normalized, the parameter space usually varies with the number of spatial units n. For more detailed and excellent discussions, see Kelejian and Prucha (2006).

For the spatial weight matrix, we make the following assumptions.

Assumption W. (i) The elements w_n,ij of W_n are at most of order h_n^{-1}, denoted by O(1/h_n), uniformly in all i, j. As a normalization, w_n,ii = 0 for all i. (ii) The ratio h_n/n → 0 as n goes to infinity. (iii) The matrix B_n is nonsingular. (iv) The sequences of matrices {W_n} and {B_n^{-1}} are uniformly bounded in both row and column sums. (v) {B_n^{-1}(λ)} are uniformly bounded in either row or column sums, uniformly in λ in a compact parameter space Λ.

Assumptions W(i)-(iv) parallel Assumptions 2-4 of Lee (2004). Like Lee (2004), Assumptions W(i)-(iv) provide the essential features of the weight matrix for the model. Assumption W(i) is always satisfied if {h_n} is a bounded sequence. We allow {h_n} to be divergent but at a rate smaller than n, as specified in Assumption W(ii). Assumption W(iii) guarantees that the disturbance term is well defined. Kelejian and Prucha (1998, 1999, 2001) and Lee (2004) also assume Assumption W(iv), which limits the spatial correlation to some degree but facilitates the study of the asymptotic properties of the spatial parameter estimators. By Horn and Johnson (1985, p. 301), limsup_n ||λ_0 W_n|| < 1 is sufficient to guarantee that B_n^{-1} is uniformly bounded in both row and column sums. By Lee (2002, Lemma A.3), Assumption W(iv) implies that {B_n^{-1}(λ)} are uniformly bounded in both row and column sums uniformly in a neighborhood of λ_0, as explicitly stated in Assumption W(v).

To proceed, let Ω_0, Ω*_0 and Ω†_0 be, respectively, Ω, Ω* and Ω† evaluated at δ_0.

4.2 Asymptotic Distributions for the Random Effects Model

For the random effects model, we need to supplement the generic Assumption G with additional assumptions on the individual-specific effects μ, and the regressors. In particular, we make the following assumptions.

Assumption R. (i) The individual effects μ_i are i.i.d. with E(μ_i) = 0, Var(μ_i) = σ_μ^2, and E|μ_i|^{4+ϵ_0} < ∞ for some ϵ_0 > 0. (ii) μ_i and v_jt are mutually independent, and they are independent of x_ks and z_k for all i, j, k, t, s. (iii) All elements in (x_it, z_i) have 4 + ϵ_0 moments for some ϵ_0 > 0.

Assumption R(i) and the first part of Assumption R(ii) are standard in the random effects panel data literature. The second part of Assumption R(ii) is for convenience. Alternatively, we can treat the regressors as nonstochastic matrices.

Case I: y_i0 is exogenous

To derive the consistency of the QML estimator, we need the parameters of interest to be identifiable. For this purpose, define L_c^{r*}(δ) = max_{θ,σ_v^2} E[L^r(θ, σ_v^2, δ)], where we suppress the dependence of L_c^{r*}(δ) on n. The optimal solution to max_{θ,σ_v^2} E[L^r(θ, σ_v^2, δ)] is given by

    θ̃(δ) = [E(X̃′ Ω^{-1} X̃)]^{-1} E(X̃′ Ω^{-1} Y) = θ_0    (4.1)

and

    σ̃_v^2(δ) = 1/(nT) E[u(θ_0)′ Ω^{-1} u(θ_0)] = σ_v0^2/(nT) tr(Ω^{-1} Ω_0),    (4.2)

where the second equality in (4.1) follows from the fact that Y = X̃θ_0 + u and E(X̃′ Ω^{-1} u) = 0 by Lemma B.9. Consequently, we have

    L_c^{r*}(δ) = − (nT/2)(log(2π) + 1) − (nT/2) log[σ̃_v^2(δ)] − (1/2) log|Ω|.    (4.3)

We impose the following identification condition.

Assumption R (iv). L̃^r(δ) ≡ lim_{n→∞} (nT)^{-1} L_c^{r*}(δ) is uniquely maximized at δ_0.

The following theorem establishes the consistency of the QMLE for the random effects model with exogenous initial observations.

Theorem 4.1 Under Assumptions G, W, and R(i)-(iv), if the initial observations y_i0 are exogenously given, then ς̂ →_p ς_0.

To derive the asymptotic distribution of ς̂, we make a Taylor expansion of ∂L^r(ς̂)/∂ς = 0 at ς_0. The first order derivatives of L^r(ς) are

    ∂L^r(ς)/∂θ      = (1/σ_v^2) X̃′ Ω^{-1} u(θ),
    ∂L^r(ς)/∂σ_v^2  = 1/(2σ_v^4) u(θ)′ Ω^{-1} u(θ) − nT/(2σ_v^2),
    ∂L^r(ς)/∂λ      = 1/(2σ_v^2) u(θ)′ Ω^{-1} (I_T ⊗ A) Ω^{-1} u(θ) − (1/2) tr[Ω^{-1} (I_T ⊗ A)],
    ∂L^r(ς)/∂φ_μ    = 1/(2σ_v^2) u(θ)′ Ω^{-1} (J_T ⊗ I_n) Ω^{-1} u(θ) − (1/2) tr[Ω^{-1} (J_T ⊗ I_n)],

where A = ∂(B′B)^{-1}/∂λ = (B′B)^{-1} (W′B + B′W) (B′B)^{-1}. At ς = ς_0, these are linear and quadratic functions of u ≡ u(θ_0). Note that the elements of u are not independent and the components of X̃ are random, so the central limit theorem (CLT) for linear and quadratic forms in Kelejian and Prucha (2001) cannot be directly applied to u. We need to plug u = (ι_T ⊗ I_n)μ + (I_T ⊗ B^{-1})v into ∂L^r(ς_0)/∂ς and apply the CLT to linear and quadratic functions of μ and v separately.

Let H_n^r(ς) = ∂^2 L^r(ς)/(∂ς∂ς′) and Γ_n^r(ς) = E[{∂L^r(ς)/∂ς}{∂L^r(ς)/∂ς′}]. Then H_n^r(ς) is the Hessian matrix and Γ_n^r(ς_0) is the variance matrix of ∂L^r(ς_0)/∂ς, both of which are reported in Appendix A. Using some of the lemmas in Appendix B (Lemmas B.3-B.6 in particular), we can verify that for normally distributed individual-specific effects μ_i and error terms v_it, Γ_n^r(ς_0) = −E[H_n^r(ς_0)].

Theorem 4.2 Under Assumptions G, W, and R(i)-(iv), if the initial observations y_i0 are exogenously given, then √(nT) (ς̂ − ς_0) →_d N(0, H_r^{-1} Γ_r H_r^{-1}), where H_r = lim_{n→∞} (nT)^{-1} H_n^r(ς_0) and Γ_r = lim_{n→∞} (nT)^{-1} Γ_n^r(ς_0).


As in Lee (2004), the asymptotic result in Theorem 4.2 is valid regardless of whether the sequence {h_n} is bounded or divergent. Even though not reported here, the matrices Γ_r and H_r can be simplified if h_n → ∞ as n → ∞. When both μ_i and v_it are normally distributed, the asymptotic variance-covariance matrix reduces to H_r^{-1}. In practice, we need to estimate the variance-covariance matrix. By the lemmas in Appendix B, we can show that Ĥ_{r,n} ≡ (nT)^{-1} H_n^r(ς̂) is a consistent estimate of H_r. A consistent estimate of Γ_r can be obtained from its sample analogue. To conserve space, we omit the details here. Similar remarks hold for the other cases.
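A sketch of how the sandwich form of Theorem 4.2 can be turned into standard errors is given below (Python/NumPy; this is an illustrative sketch, with the Hessian obtained by numerical differentiation of the log-likelihood and the estimate of Γ_r supplied by the user, for instance from the sample analogue mentioned above).

```python
import numpy as np

def sandwich_se(loglik, param_hat, Gamma_hat, nT, eps=1e-5):
    """Standard errors from H_r^{-1} Gamma_r H_r^{-1} as in Theorem 4.2.

    loglik    : callable returning L^r(varsigma) for a full parameter vector varsigma
    param_hat : QML estimate of varsigma, the point where the Hessian is evaluated
    Gamma_hat : consistent estimate of Gamma_r = lim (nT)^{-1} Var(score), user supplied
    """
    k = param_hat.size
    H = np.zeros((k, k))
    for i in range(k):                       # numerical Hessian by central differences
        for j in range(k):
            ei, ej = np.eye(k)[i] * eps, np.eye(k)[j] * eps
            H[i, j] = (loglik(param_hat + ei + ej) - loglik(param_hat + ei - ej)
                       - loglik(param_hat - ei + ej) + loglik(param_hat - ei - ej)) / (4 * eps**2)
    H_bar = H / nT                            # (nT)^{-1} H_n^r(param_hat)
    avar = np.linalg.inv(H_bar) @ Gamma_hat @ np.linalg.inv(H_bar) / nT
    return np.sqrt(np.diag(avar))
```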

Case II: y_i0 is endogenous

In this case, define L_c^{rr*}(δ) = max_{θ,σ_v^2} E[L^rr(θ, σ_v^2, δ)], where we suppress the dependence of L_c^{rr*}(δ) on n. The optimal solution to max_{θ,σ_v^2} E[L^rr(θ, σ_v^2, δ)] is now given by

    θ̃(δ) = [E(X*′ Ω*^{-1}(δ) X*)]^{-1} E(X*′ Ω*^{-1}(δ) Y*(ρ)) = θ_0    (4.4)

and

    σ̃_v^2(δ) = 1/(n(T+1)) E[u*(θ_0)′ Ω*^{-1}(δ) u*(θ_0)] = σ_v0^2/(n(T+1)) tr(Ω*^{-1}(δ) Ω*_0).    (4.5)

Let Y*_{-1} = (0_{1×n}, Y_{-1}′)′. The second equality in (4.4) follows from the fact that Y*(ρ) = X*θ_0 + u*(θ_0) + (ρ_0 − ρ)Y*_{-1}, and E[X*′ Ω*^{-1}(δ) {u*(θ_0) + (ρ_0 − ρ)Y*_{-1}}] = 0 by arguments similar to those used in the proof of Lemma B.13. Consequently, we have

    L_c^{rr*}(δ) = − n(T+1)/2 (log(2π) + 1) − n(T+1)/2 log σ̃_v^2(δ) − (1/2) log|Ω*|.    (4.6)

We make the following identification assumption.

Assumption R (iv*). L̃^rr(δ) ≡ lim_{n→∞} (nT)^{-1} L_c^{rr*}(δ) is uniquely maximized at δ_0.

The following theorem establishes the consistency of the QMLE for the random effects model with endogenous initial observations.

Theorem 4.3 Under Assumptions G, W, R(i)-(iii) and R(iv*), if the initial observations y_i0 are endogenously given, then ς̂ →_p ς_0.

The score functions are given below:

    ∂L^rr(ς)/∂θ      = (1/σ_v^2) X*′ Ω*^{-1} u*(θ),
    ∂L^rr(ς)/∂σ_v^2  = 1/(2σ_v^4) u*(θ)′ Ω*^{-1} u*(θ) − n(T+1)/(2σ_v^2),
    ∂L^rr(ς)/∂ρ      = (1/σ_v^2) (0_{1×n}, Y_{-1}′) Ω*^{-1} u*(θ) + 1/(2σ_v^2) u*(θ)′ Ω*^{-1} Ω*_ρ Ω*^{-1} u*(θ) − (1/2) tr(Ω*^{-1} Ω*_ρ),
    ∂L^rr(ς)/∂λ      = 1/(2σ_v^2) u*(θ)′ Ω*^{-1} (I_T* ⊗ A) Ω*^{-1} u*(θ) − (1/2) tr[Ω*^{-1} (I_T* ⊗ A)],
    ∂L^rr(ς)/∂φ_μ    = 1/(2σ_v^2) u*(θ)′ Ω*^{-1} (J_T* ⊗ I_n) Ω*^{-1} u*(θ) − (1/2) tr[Ω*^{-1} (J_T* ⊗ I_n)],
    ∂L^rr(ς)/∂φ_ζ    = 1/(2σ_v^2) u*(θ)′ Ω*^{-1} (K_T* ⊗ I_n) Ω*^{-1} u*(θ) − (1/2) tr[Ω*^{-1} (K_T* ⊗ I_n)],

where Ω*_ρ = ∂Ω*/∂ρ,

    I_T* = [ 1/(1−ρ)   0_{1×T} ]     J_T* = [ 0     ι_T′ ]     K_T* = [ 1         0_{1×T} ]
           [ 0_{T×1}    I_T    ],            [ ι_T   J_T  ],            [ 0_{T×1}   0_{T×T} ].

Let H_n^rr(ς) = ∂^2 L^rr(ς)/(∂ς∂ς′) and Γ_n^rr(ς) = E[{∂L^rr(ς)/∂ς}{∂L^rr(ς)/∂ς′}]. We report the Hessian matrix H_n^rr(ς) and Γ_n^rr(ς_0) in Appendix A. We now state the asymptotic normality result.

Theorem 4.4 Under Assumptions G, W, R(i)-(iii) and R(iv*), if the initial observations y_i0 satisfy (3.7), (3.11) or (3.12), then √(nT) (ς̂ − ς_0) →_d N(0, H_rr^{-1} Γ_rr H_rr^{-1}), where H_rr = lim_{n→∞} (nT)^{-1} H_n^rr(ς_0) and Γ_rr = lim_{n→∞} (nT)^{-1} Γ_n^rr(ς_0).

4.3 Asymptotic Distribution for the Fixed Effects Model

For the fixed effects model, we need to supplement the generic Assumption G with the following assumption on the regressors.

Assumption F. (i) The processes {x_it, t = ..., −1, 0, 1, ...} are trend-stationary or first-difference stationary for all i = 1, ..., n. (ii) All elements in (Δv_it, Δx_it) have 4 + ϵ_0 moments for some ϵ_0 > 0.

Define L_c^{f*}(δ) = max_{θ,σ_v^2} E[L^f(θ, σ_v^2, δ)], where we suppress the dependence of L_c^{f*}(δ) on n. The optimal solution to max_{θ,σ_v^2} E[L^f(θ, σ_v^2, δ)] is now given by

    θ̃(δ) = [E(ΔX′ Ω†^{-1} ΔX)]^{-1} E(ΔX′ Ω†^{-1} ΔY) = θ_0    (4.7)

and

    σ̃_v^2(δ) = 1/(nT) E[Δu(θ_0)′ Ω†^{-1} Δu(θ_0)] = σ_v0^2/(nT) tr(Ω†^{-1} Ω†_0),    (4.8)

where the second equality in (4.7) follows from the fact that ΔY = ΔXθ_0 + Δu(θ_0) and E(ΔX′ Ω†^{-1} Δu(θ_0)) = 0 by Lemma B.16. Consequently, we have

    L_c^{f*}(δ) = − (nT/2)(log(2π) + 1) − (nT/2) log[σ̃_v^2(δ)] − (1/2) log|Ω†|.    (4.9)

The following identification condition is needed for our consistency result.

Assumption F (iii). L̃^f(δ) ≡ lim_{n→∞} (nT)^{-1} L_c^{f*}(δ) is uniquely maximized at δ_0.

Theorem 4.5 Under Assumptions G, F, and W, if the initial observations satisfy (3.22), (3.25), or (3.26), then ς̂ →_p ς_0.

The first order derivatives of L^f(ς) are

    ∂L^f(ς)/∂θ      = (1/σ_v^2) ΔX′ Ω†^{-1} Δu(θ),
    ∂L^f(ς)/∂σ_v^2  = 1/(2σ_v^4) Δu(θ)′ Ω†^{-1} Δu(θ) − nT/(2σ_v^2),
    ∂L^f(ς)/∂λ      = 1/(2σ_v^2) Δu(θ)′ Ω†^{-1} Ω†_λ Ω†^{-1} Δu(θ) − (1/2) tr(Ω†^{-1} Ω†_λ),
    ∂L^f(ς)/∂c_m    = 1/(2σ_v^2) Δu(θ)′ Ω†^{-1} Ω†_{c_m} Ω†^{-1} Δu(θ) − (1/2) tr(Ω†^{-1} Ω†_{c_m}),
    ∂L^f(ς)/∂φ_e    = 1/(2σ_v^2) Δu(θ)′ Ω†^{-1} Ω†_{φ_e} Ω†^{-1} Δu(θ) − (1/2) tr(Ω†^{-1} Ω†_{φ_e}),

where

    Ω†_λ     ≡ ∂Ω†/∂λ   = (I_T ⊗ B^{-1}WB^{-1}) H_E (I_T ⊗ B′^{-1}) + (I_T ⊗ B^{-1}) H_E (I_T ⊗ (B^{-1}WB^{-1})′) + (I_T ⊗ B^{-1}) H_{E,λ} (I_T ⊗ B′^{-1}),
    Ω†_{c_m} ≡ ∂Ω†/∂c_m = (I_T ⊗ B^{-1}) H_{E,c_m} (I_T ⊗ B′^{-1}),
    Ω†_{φ_e} ≡ ∂Ω†/∂φ_e = (I_T ⊗ B^{-1}) H_{E,φ_e} (I_T ⊗ B′^{-1}),

and H_{E,λ}, H_{E,c_m} and H_{E,φ_e} are the derivatives of H_E with respect to λ, c_m and φ_e, respectively.

Let H_n^f(ς) = ∂^2 L^f(ς)/(∂ς∂ς′) and Γ_n^f(ς) = E[{∂L^f(ς)/∂ς}{∂L^f(ς)/∂ς′}]. As before, we report the Hessian matrix H_n^f(ς) and Γ_n^f(ς_0) in Appendix A. We now state the asymptotic normality result.

Theorem 4.6 Under Assumptions G, F and W, if the initial observations satisfy (3.22), (3.25), or (3.26), then √(nT) (ς̂ − ς_0) →_d N(0, H_f^{-1} Γ_f H_f^{-1}), where H_f = lim_{n→∞} (nT)^{-1} H_n^f(ς_0) and Γ_f = lim_{n→∞} (nT)^{-1} Γ_n^f(ς_0).


5 Finite Sample Properties of the QMLEs

In this section, we investigate the performance of the QMLEs when sample sizes are finite. In particular, we investigate, under the random effects model, the consequences of treating the initial observations as endogenous when they are in fact exogenous, and vice versa. In the case of the fixed effects model, we pay attention to whether there is a difference in parameter estimates when the fixed effects are uncorrelated with the regressor(s) versus when they are correlated with the regressor(s). We use the following data generating process (DGP):

    y_t = ρ y_{t-1} + β_0 ι_n + x_t β_1 + z γ + u_t,
    u_t = μ + ε_t,
    ε_t = λ W_n ε_t + v_t,

where y_t, y_{t-1}, x_t, and z are all n × 1 vectors. The elements of x_t are randomly generated from N(0, 4), and the elements of z are randomly generated from Bernoulli(0.5). The spatial weight matrix is generated according to Rook contiguity, by randomly allocating the n spatial units on a lattice of k × m (≥ n) squares, finding the neighbors of each unit, and then row-normalizing. We choose β_0 = 5, β_1 = 1, γ = 1, σ_μ = 0.5, σ_v = 0.5, ρ ∈ {0.25, 0.50, 0.75}, and λ ∈ {0.25, 0.50, 0.75}. Each set of Monte Carlo results (corresponding to a combination of the ρ and λ values) is based on 1000 samples. Table 1 summarizes the results for the random effects model and Table 2 reports the results for the fixed effects model.

From the upper panel of Table 1, we see that when y_0 is exogenous and is correctly treated as exogenous, the QMLEs perform quite well. However, when y_0 is exogenous but incorrectly treated as endogenous, the QMLEs can be quite biased with large RMSEs, in particular when ρ and λ are large. When y_0 is endogenous and is correctly treated as endogenous, the QMLEs also perform quite well. When y_0 is endogenous but incorrectly treated as exogenous (lower panel of Table 1), the QMLEs perform reasonably well, except for γ and σ_μ, for which the QMLEs are slightly biased. From Table 2, we see that the QMLEs under the fixed effects model perform generally well, whether or not the fixed effects are correlated with the regressor.
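For concreteness, the following sketch (Python/NumPy) reproduces the main ingredients of this DGP: a row-normalized Rook-contiguity weight matrix on a randomly filled k × m lattice and one simulated panel. Details not spelled out in the paper (how y_0 is initialized, that a fresh x_t is drawn each period, and the handling of units without neighbors) are assumptions of the sketch.

```python
import numpy as np

def rook_weight_matrix(k, m, n, rng):
    """Row-normalized Rook-contiguity weights: n units placed at random on a k x m lattice."""
    cells = rng.choice(k * m, size=n, replace=False)       # random allocation of units to squares
    rows, cols = cells // m, cells % m
    Wmat = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and abs(rows[i] - rows[j]) + abs(cols[i] - cols[j]) == 1:
                Wmat[i, j] = 1.0                            # Rook neighbors share an edge
    row_sums = Wmat.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0                           # isolated units keep a zero row
    return Wmat / row_sums

def simulate_panel(Wmat, rho, lam, T, rng,
                   b0=5.0, b1=1.0, gam=1.0, sig_mu=0.5, sig_v=0.5):
    """One draw from the Section 5 DGP (y_0 started near its stationary mean, an assumption)."""
    n = Wmat.shape[0]
    B = np.eye(n) - lam * Wmat
    z = rng.binomial(1, 0.5, n)
    mu = sig_mu * rng.standard_normal(n)
    y = (b0 + gam * z + mu) / (1 - rho)                     # assumed starting value
    Y = []
    for _ in range(T + 1):                                  # periods t = 0, 1, ..., T
        x = 2.0 * rng.standard_normal(n)                    # x_it ~ N(0, 4)
        v = sig_v * rng.standard_normal(n)
        u = mu + np.linalg.solve(B, v)                      # u_t = mu + B^{-1} v_t
        y = rho * y + b0 + b1 * x + gam * z + u
        Y.append(y)
    return np.array(Y), z
```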

REFERENCES


Anselin, L. (1988) Spatial Econometrics: Methods and Models. The Netherlands: Kluwer Academic Press.
Baltagi, B. H., and D. Li (2004) Prediction in the panel data models with spatial correlation. In L. Anselin, R. Florax, and S. J. Rey (Eds.), Advances in Spatial Econometrics. Springer-Verlag: Berlin.
Baltagi, B. H., S. H. Song, and W. Koh (2003) Testing panel data regression models with spatial error correlation. Journal of Econometrics 117, 123-150.
Bhargava, A., and J. D. Sargan (1983) Estimating dynamic random effects models from panel data covering short time periods. Econometrica 51, 1635-1659.
Binder, M., C. Hsiao, and M. H. Pesaran (2005) Estimation and inference in short panel vector autoregressions with unit roots and cointegration. Econometric Theory 21, 795-837.
Blundell, R., and R. J. Smith (1991) Conditions initiales et estimation efficace dans les modèles dynamiques sur données de panel. Annales d'Économie et de Statistique 20/21, 109-123.
Chen, X., and T. G. Conley (2001) A new semiparametric spatial model for panel time series. Journal of Econometrics 105, 59-83.
Cliff, A. D., and J. K. Ord (1973) Spatial Autocorrelation. London: Pion Ltd.
Elhorst, J. P. (2003) Specification and estimation of spatial panel data models. International Regional Science Review 26, 244-268.
Elhorst, J. P. (2005) Unconditional maximum likelihood estimation of linear and log-linear dynamic models for spatial panels. Geographical Analysis 37, 85-106.
Griffith, D. A. (1988) Advanced Spatial Statistics. Kluwer, Dordrecht, the Netherlands.
Horn, R., and C. Johnson (1985) Matrix Analysis. New York: John Wiley & Sons.
Hsiao, C. (2003) Analysis of Panel Data. 2nd edition. Cambridge University Press.
Huang, X. (2004) Panel vector autoregression under cross sectional dependence. Working paper, University of California, Riverside.
Kapoor, M., H. H. Kelejian, and I. R. Prucha (2006) Panel data models with spatially correlated error components. Dept. of Economics, University of Maryland, College Park. Forthcoming in Journal of Econometrics.
Kelejian, H. H., and I. R. Prucha (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and Economics 17, 377-398.
Kelejian, H. H., and I. R. Prucha (1999) A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review 40, 509-533.
Kelejian, H. H., and I. R. Prucha (2001) On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics 104, 219-257.
Kelejian, H. H., and I. R. Prucha (2006) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Working paper, Dept. of Economics, University of Maryland, College Park.
Lee, L.-F. (2002) Asymptotic distribution of quasi-maximum likelihood estimators for spatial autoregressive models. Mimeo, Dept. of Economics, Ohio State University.
Lee, L.-F. (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 1899-1925.
Lin, X., and L.-F. Lee (2006) GMM estimation of spatial autoregressive models with unknown heteroskedasticity. Working paper, The Ohio State University, Columbus.
Magnus, J. R. (1982) Multivariate error components analysis of linear and nonlinear regression models by maximum likelihood. Journal of Econometrics 19, 239-285.
Magnus, J. R., and H. Neudecker (2002) Matrix Differential Calculus with Applications in Statistics and Econometrics. John Wiley and Sons.
Nerlove, M. (2002) Likelihood inference for dynamic panel models. In Essays in Panel Data Econometrics, M. Nerlove (Ed.), pp. 307-348. Cambridge University Press.
Ord, J. K. (1975) Estimation methods for models of spatial interaction. Journal of the American Statistical Association 70, 120-126.
Pesaran, M. H. (2003) Estimation and inference in large heterogeneous panels with cross section dependence. Working paper no. 0305, University of Cambridge.
Pesaran, M. H. (2004) General diagnostic tests for cross section dependence in panels. Working paper no. 1229, University of Cambridge.
Phillips, P. C. B., and D. Sul (2003) Dynamic panel estimation and homogeneity testing under cross-section dependence. Econometrics Journal 6, 217-259.
Ridder, G., and T. Wansbeek (1990) Dynamic models for panel data. In Advanced Lectures in Quantitative Economics, F. van der Ploeg (Ed.). London: Academic Press.
Robinson, P. M. (2006) Efficient estimation of the semiparametric spatial autoregressive model. Mimeo, Dept. of Economics, London School of Economics.
Smirnov, O., and L. Anselin (2001) Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Computational Statistics and Data Analysis 35, 301-319.
White, H. (1994) Estimation, Inference and Specification Analysis. Econometric Society Monographs No. 22, Cambridge University Press.
Yang, Z., C. Li, and Y. K. Tse (2006) Functional form and spatial dependence in dynamic panels. Economics Letters 91, 138-145.

Appendix A: The Hessian and Information Matrices

In this appendix, we give the Hessian and information matrices for each scenario. For any random variable a with zero mean and finite fourth moment, let κ_a = E(a^4) − 3[E(a^2)]^2.

A.1 Random Effects Model with Fixed Initial Observations

Recall u(θ) = Y − X̃θ = Y − ρY_{-1} − Xβ − Zγ, A = (B′B)^{-1} (W′B + B′W) (B′B)^{-1}, and Ȧ ≡ ∂A/∂λ = 2(B′B)^{-1} [(W′B + B′W)A − W′W (B′B)^{-1}]. To conserve space, let P_1 = Ω^{-1} (I_T ⊗ A) Ω^{-1}, P_2 = Ω^{-1} (J_T ⊗ I_n) Ω^{-1}, P_3 = Ω^{-1} (I_T ⊗ Ȧ) Ω^{-1}, and P_4 = P_1 Ω P_1 − P_3. Then the Hessian matrix H_{r,n}(ς) = ∂^2 L^r(ς)/(∂ς∂ς′) has elements

    ∂^2 L^r(ς)/∂θ∂θ′       = −(1/σ_v^2) X̃′ Ω^{-1} X̃,
    ∂^2 L^r(ς)/∂θ∂σ_v^2    = −(1/σ_v^4) X̃′ Ω^{-1} u(θ),
    ∂^2 L^r(ς)/∂θ∂λ        = −(1/σ_v^2) X̃′ P_1 u(θ),
    ∂^2 L^r(ς)/∂θ∂φ_μ      = −(1/σ_v^2) X̃′ P_2 u(θ),
    ∂^2 L^r(ς)/∂σ_v^2∂σ_v^2 = −(1/σ_v^6) u(θ)′ Ω^{-1} u(θ) + nT/(2σ_v^4),
    ∂^2 L^r(ς)/∂σ_v^2∂λ    = −1/(2σ_v^4) u(θ)′ P_1 u(θ),
    ∂^2 L^r(ς)/∂σ_v^2∂φ_μ  = −1/(2σ_v^4) u(θ)′ P_2 u(θ),
    ∂^2 L^r(ς)/∂λ∂λ        = (1/2) tr(P_4 Ω) − (1/σ_v^2) u(θ)′ (P_4 + (1/2)P_3) u(θ),
    ∂^2 L^r(ς)/∂λ∂φ_μ      = (1/2) tr(P_1 Ω P_2 Ω) − (1/σ_v^2) u(θ)′ P_1 Ω P_2 u(θ),
    ∂^2 L^r(ς)/∂φ_μ∂φ_μ    = (1/2) tr(P_2 Ω P_2 Ω) − (1/σ_v^2) u(θ)′ P_2 Ω P_2 u(θ).

For an nT × nT matrix q_n, let G_{q_n,1} = (ι_T′ ⊗ I_n) q_n (ι_T ⊗ I_n), G_{q_n,2} = (I_T ⊗ B′^{-1}) q_n (I_T ⊗ B^{-1}), and G_{q_n,3} = (ι_T′ ⊗ I_n) q_n (I_T ⊗ B^{-1}). The elements of Γ_{r,n}(ς_0) ≡ E[{∂L^r(ς_0)/∂ς}{∂L^r(ς_0)/∂ς′}] are given by:

    Γ_{r,θθ}        = (1/σ_v^2) E(X̃′ Ω^{-1} X̃),
    Γ_{r,θσ_v^2}    = 1/(2σ_v^6) E(X̃′ Ω^{-1} u u′ Ω^{-1} u),
    Γ_{r,θλ}        = 1/(2σ_v^4) E(X̃′ Ω^{-1} u u′ Ω^{-1} (I_T ⊗ A) Ω^{-1} u),
    Γ_{r,θφ_μ}      = 1/(2σ_v^4) E(X̃′ Ω^{-1} u u′ Ω^{-1} (J_T ⊗ I_n) Ω^{-1} u),
    Γ_{r,σ_v^2σ_v^2} = nT/(2σ_v^4) + 1/(4σ_v^8) {κ_μ Σ_{i=1}^n G_{Ω^{-1},1ii}^2 + κ_v Σ_{i=1}^n G_{Ω^{-1},2ii}^2},
    Γ_{r,σ_v^2λ}    = 1/(2σ_v^2) tr(P_1 Ω) + 1/(4σ_v^6) {κ_μ Σ_{i=1}^n G_{Ω^{-1},1ii} G_{P_1,1ii} + κ_v Σ_{i=1}^n G_{Ω^{-1},2ii} G_{P_1,2ii}},
    Γ_{r,σ_v^2φ_μ}  = 1/(2σ_v^2) tr(P_2 Ω) + 1/(4σ_v^6) {κ_μ Σ_{i=1}^n G_{Ω^{-1},1ii} G_{P_2,1ii} + κ_v Σ_{i=1}^n G_{Ω^{-1},2ii} G_{P_2,2ii}},
    Γ_{r,λλ}        = (1/2) tr(P_1 Ω P_1 Ω) + 1/(4σ_v^4) {κ_μ Σ_{i=1}^n G_{P_1,1ii}^2 + κ_v Σ_{i=1}^n G_{P_1,2ii}^2},
    Γ_{r,λφ_μ}      = (1/2) tr(P_1 Ω P_2 Ω) + 1/(4σ_v^4) {κ_μ Σ_{i=1}^n G_{P_1,1ii} G_{P_2,1ii} + κ_v Σ_{i=1}^n G_{P_1,2ii} G_{P_2,2ii}},
    Γ_{r,φ_μφ_μ}    = (1/2) tr(P_2 Ω P_2 Ω) + 1/(4σ_v^4) {κ_μ Σ_{i=1}^n G_{P_2,1ii}^2 + κ_v Σ_{i=1}^n G_{P_2,2ii}^2}.

The expressions for Γ_{r,θσ_v^2}, Γ_{r,θλ} and Γ_{r,θφ_μ} can be worked out further by using Lemma B.18.

A.2 Random Effects Model with Endogenous Initial Observations

¡ ¢0 ∗ 0 To conserve space, let Y−1 = 01×n , Y−1 , P0∗ = Ω∗−1 Ω∗ρ Ω∗−1 , P1∗ = Ω∗−1 (IT∗ ⊗ A) Ω∗−1 , P2∗ = ³ .´ Ω∗−1 (JT∗ ⊗ In ) Ω∗−1 , P3∗ = Ω∗−1 (KT∗ ⊗ In ) Ω∗−1 , P4∗ = Ω−1 IT ⊗ A Ω−1 , and P5∗ = P1∗ Ω∗ P1∗ −P4∗ . Write u∗ = u∗ (θ) . Then the Hessian matrix Hrr,n (ς) = ∂ 2 Lrr (ς) / (∂ς∂ς 0 ) has elements ∂ 2 Lrr (ς) ∂θ∂θ 0 2

v

rr

∂ L (ς) ∂θ∂ρ

∗ = − σ12 X ∗0 Ω∗−1 Y−1 - σ12 X ∗0 P0∗ u∗ , v

∂ 2 Lrr (ς) ∂θ∂φμ

v

= − σ12 X ∗0 P2∗ u∗ , v

∂ 2 Lrr (ς) ∂σ 2v ∂σ 2v 2

∂ 2 Lrr (ς) ∂θ∂σ 2v

= − σ12 X ∗0 Ω∗−1 X ∗ ,

=

rr

∂ L (ς) ∂σ 2v ∂λ

− σ16 u∗0 Ω∗−1 u∗ v

+

n(T +1) 2σ 4v ,

rr

∂ L (ς) ∂θ∂λ ∂ 2 Lrr (ς) ∂θ∂φζ ∂ 2 Lrr (ς) ∂σ 2v ∂ρ 2

rr

∂ L (ς) ∂σ 2v ∂φμ

= − 2σ14 u∗0 P1∗ u∗ , v

∂ 2 Lrr (ς) 1 ∗0 ∗ ∗ ∂σ 2v ∂φζ = − 2σ 4v u P3 u , 2 rr ∂ L (ς) ∗0 ∗-1 ∗ = − σ12 Y−1 Ω Y−1 ∂ρ∂ρ v

2



2 ∗0 ∗-1 ∗ ∗-1 ∗ Ωρ Ω u σ 2v Y−1 Ω

= − σ14 X ∗0 Ω∗−1 u∗ , v

= − σ12 X ∗0 P1∗ u∗ , v

= − σ12 X ∗0 P3∗ u∗ , v

=

∗ − σ14 Y−1 Ω∗−1 u∗ - 2σ14 u∗0 P0∗ u∗ , v v

= − 2σ14 u∗0 P2∗ u∗ , v

£ ¡ ¢¤ + 12 tr Ω∗-1 Ω∗ρ Ω∗-1 Ω∗ρ − Ω∗ρρ

¡ ¢ − 2σ12 u∗0 Ω∗-1 2Ω∗ρ Ω∗-1 Ω∗ρ − Ω∗ρρ Ω∗-1 u∗ , v ³ ´i ³ ´ h ∂ 2 Lrr (ς) 1 1 ∗0 ∗-1 ∗ ∗-1 1 ∗-1 ∗ ∗-1 ∗ ∗ ∗0 ∗-1 ∗ ∗-1 ∗ ∗ Ω 2Ω Ω∗-1 u∗ , = − Y Ω Ω Ω + tr Ω Ω Ω -Ω u Ω Ω Ω -Ω 2 2 −1 ρ ρ λ λ ρλ λ ρλ ∂ρ∂λ σv 2 2σ v h ³ ´i ³ ´ ∂ 2 Lrr (ς) 1 ∗0 ∗-1 ∗ Ωφμ Ω∗-1 + 12 tr Ω∗-1 Ω∗φμ Ω∗-1 Ω∗ρ -Ω∗ρφμ - 2σ12 u∗0 Ω∗-1 2Ω∗φμ Ω∗-1 Ω∗ρ -Ω∗ρφμ Ω∗-1 u∗ , ∂ρ∂φμ = − σ 2v Y−1 Ω v h ³ ´i ³ ´ ∂ 2 Lrr (ς) 1 1 ∗0 ∗-1 ∗ ∗-1 1 ∗-1 ∗ ∗-1 ∗ ∗ ∗0 ∗-1 = − Y Ω Ω Ω + tr Ω Ω Ω -Ω Ω 2Ω∗φζ Ω∗-1 Ω∗ρ -Ω∗ρφζ Ω∗-1 u∗ , ρ φζ φζ ρφζ ∂ρ∂φζ σ 2v −1 2 2σ 2v u Ω ¡ ∗ 1 ∗¢ ∗ ∂ 2 Lrr (ς) 1 1 ∗0 ∗ ∗ P5 + 2 P4 u , ∂λ∂λ = 2 tr (P5 Ω ) − σ 2 u v

∂ 2 Lrr (ς) ∂λ∂φμ

=

1 ∗0 ∗ ∗ ∗ ∗ σ 2v u P1 Ω P2 u ,

∂ 2 Lrr (ς) ∂λ∂φζ



= 12 tr (P1∗ Ω∗ P3∗ Ω∗ ) −

1 ∗0 ∗ ∗ ∗ ∗ σ 2v u P1 Ω P2 u ,

= 12 tr (P2∗ Ω∗ P2∗ Ω∗ ) −

1 ∗0 ∗ ∗ ∗ ∗ σ 2v u P2 Ω P2 u ,

= 12 tr (P2∗ Ω∗ P3∗ Ω∗ ) −

1 ∗0 ∗ ∗ ∗ ∗ σ 2v u P2 Ω P3 u ,

= 12 tr (P3∗ Ω∗ P3∗ Ω∗ ) −

1 ∗0 ∗ ∗ ∗ ∗ σ 2v u P3 Ω P3 u ,

∂ 2 Lrr (ς) ∂φμ ∂φμ 2

rr

∂ L (ς) ∂φμ ∂φζ ∂ 2 Lrr (ς) ∂φζ ∂φζ

1 ∗ ∗ ∗ ∗ 2 tr (P1 Ω P2 Ω )

where e.g., Ω∗ρ = ∂Ω∗ /∂ρ, and Ω∗ρλ = ∂ 2 Ω∗ /∂ρ∂λ. Γrr,n (ς) = E [{∂Lrr (ς) /∂ς} {∂Lrr (ς) /∂ς 0 }] has elements Γrr,θθ =

1 ∗0 ∗−1 ∗ X ), σ 2v E(X Ω

Γrr,θσ2v =

1 ∗0 ∗−1 ∗ ∗0 ∗−1 ∗ u u Ω u ), 2σ 6v E(X Ω

£ ∗0 ∗−1 ∗ © ∗0 ∗−1 ∗ 1 ∗0 ∗ ∗ ª¤ X Ω u Y−1 Ω u + 2 u P0 u ,

Γrr,θρ =

1 σ 4v E

Γrr,θλ =

1 ∗0 ∗−1 ∗ ∗0 ∗ ∗ u u P1 u ), 2σ 4v E(X Ω

Γrr,θφu =

1 ∗0 ∗−1 ∗ ∗0 ∗ ∗ u u P2 u ), 2σ 4v E(X Ω

1 ∗0 ∗−1 ∗ ∗0 ∗ ∗ u u P3 u ), 2σ 4v E(X Ω ¡ ¢2 2 +1)2 = 4σ18 E u∗0 Ω∗−1 u∗ − n (T , 4σ 4v v

Γrr,θφζ = Γrr,σ2v σ2v

Γrr,σ2v ρ =

1 2σ 4v E

hn

1 ∗0 ∗−1 ∗ u σ 2v Y−1 Ω

+

1 ∗0 ∗ ∗ 2σ 2v u P0 u

i ¡ ¢o − 12 tr Ω∗−1 Ω∗ρ u∗0 Ω∗−1 u∗ ,


¡ ∗0 ∗−1 ∗ ∗0 ∗ ∗ ¢ n(T +1) u Ω u u P1 u − 4σ2 tr (P1∗ Ω∗ ) , v ¡ ¢ +1) ∗ ∗ Γrr,σ2v φu = 4σ16 E u∗0 Ω∗−1 u∗ u∗0 P2∗ u∗ − n(T 4σ 2v tr (P2 Ω ) , v ¡ ¢ +1) ∗ ∗ Γrr,σ2v φζ = 4σ16 E u∗0 Ω∗−1 u∗ u∗0 P3∗ u∗ − n(T 4σ 2v tr (P3 Ω ) , v hn o© ªi ∗0 ∗−1 ∗ ∗0 ∗−1 ∗ Ω u + 2σ12 u∗0 P0∗ u∗ − 12 tr (P0∗ Ω∗ ) Y−1 Ω u + 12 u∗0 P0∗ u∗ , Γrr,ρρ = σ12 E σ12 Y−1 v v hn v o i ∗0 ∗−1 ∗ Ω u + 2σ12 u∗0 P0∗ u∗ − 12 tr (P0∗ Ω∗ ) u∗0 P1∗ u∗ , Γrr,ρλ = 2σ12 E σ12 Y−1 v v hn v o i ∗0 ∗−1 ∗ Ω u + 2σ1 2 u∗0 P0∗ u∗ − 12 tr (P0∗ Ω∗ ) u∗0 P2∗ u∗ , Γrr,ρφu = 2σ12 E σ12 Y−1 v v hn v o i ∗0 ∗−1 ∗ Ω u + 2σ12 u∗0 P0∗ u∗ − 12 tr (P0∗ Ω∗ ) u∗0 P3∗ u∗ , Γrr,ρφζ = 2σ14 E σ12 Y−1 1 4σ 6v E

Γrr,σ2v λ =

v

Γrr,λλ =

1 4σ 4v E

v

∗0

(u

2 P1∗ u∗ )

v



1 2 4 tr

(P1∗ Ω∗ ) ,

Γrr,λφu =

1 4σ 4v E

(u∗0 P1∗ u∗ u∗0 P2∗ u∗ ) − 14 tr (P1∗ Ω∗ ) tr (P2∗ Ω∗ ) ,

Γrr,λφζ =

1 4σ 4v E

(u∗0 P1∗ u∗ u∗0 P3∗ u∗ ) − 14 tr (P1∗ Ω∗ ) tr (P3∗ Ω∗ ) , 2

Γrr,φu φu =

1 4σ 2v E

(u∗0 P2∗ u∗ ) − 14 tr2 (P2∗ Ω∗ ) ,

Γrr,φu φζ =

1 4σ 2v E

(u∗0 P2∗ u∗ u∗0 P3∗ u∗ ) − 14 tr (P2∗ Ω∗ ) tr (P3∗ Ω∗ ) ,

Γrr,φζ φζ =

1 4σ 2v E

(u∗0 P3∗ u∗ ) − 14 tr2 (P3∗ Ω∗ ) .

A.3 Fixed Effects Model

To conserve space, let Pa† = Ω†−1 Ω†a Ω†−1 , for a = λ, cm , φe , where Ω†a = ∂Ω† /∂a. Let Ω†aa = ∂ 2 Ω† / (∂a∂a) . Write ∆u = ∆u (θ) . Then the Hessian matrix Hf,n (ς) = ∂ 2 Lf (ς) / (∂ς∂ς 0 ) has elements ∂ 2 Lf (ς) ∂θ∂θ 0 ∂ 2 Lf (ς) ∂θ∂λ ∂ 2 Lf (ς) ∂θ∂φe ∂ 2 Lf (ς) ∂σ 2v ∂λ 2

f

= − σ12 ∆X 0 Ω†−1 ∆X, v

= − σ12 ∆X 0 Pλ† ∆u, v

=

− σ12 ∆X 0 Pφ†e ∆u, v

= − 2σ14 ∆u0 Pλ† ∆u, v

∂ L (ς) 1 0 † ∂σ 2v ∂φe = − 2σ 4v ∆u Pφe ∆u, h i ∂ 2 Lf (ς) † † 1 †−1 † = tr P Ω − Ω Ω λ λ λλ ∂λ∂λ 2 ∂ 2 Lf (ς) ∂λ∂cm ∂ 2 Lf (ς) ∂λ∂φe ∂ 2 Lf (ς) ∂cm ∂cm ∂ 2 Lf (ς) ∂cm ∂φe ∂ 2 Lf (ς) ∂φe ∂φe

∂ 2 Lf (ς) ∂θ∂σ 2v ∂ 2 Lf (ς) ∂θ∂cm ∂ 2 Lf (ς) ∂σ 2v ∂σ 2v ∂ 2 Lf (ς) ∂σ 2v ∂cm

= − σ14 ∆X 0 Ω†−1 ∆, v

= − σ12 ∆X 0 Pc†m ∆u, v

=

− σ16 ∆u0 Ω†−1 ∆u v

nT 2σ 4v ,

= − 2σ14 ∆u0 Pc†m ∆u, v

− σ12 ∆u0 Pλ† Ω†λ Ω†−1 ∆u + 2σ12 ∆u0 Ω†−1 Ω†λλ Ω†−1 ∆u, v h i v † † 1 1 †−1 † 0 † † †−1 = 2 tr Pλ Ωcm − Ω Ωλcm − σ2 ∆u Pλ Ωcm Ω ∆u + 2σ1 2 ∆u0 Ω†−1 Ω†λcm Ω†−1 ∆u, v v h i = 12 tr Pλ† Ω†φe − Ω†−1 Ω†λφe − σ12 ∆u0 Pλ† Ω†φe Ω†−1 ∆u + 2σ12 ∆u0 Ω†−1 Ω†λφe Ω†−1 ∆u, v v £ ¤ = 12 tr Pc†m Ω†cm − σ12 ∆u0 Pc†m Ω†cm Ω†−1 ∆u, v h i † †−1 = 12 tr Pc†m Ω†φe − Ω†−1 Ω†cm φe − σ12 ∆u0† ∆u + 2σ12 ∆u0 Ω†−1 Ω†cm φe Ω†−1 ∆u, cm Ωφe Ω v v h i † † 1 1 0 † † † = 2 tr Pφe Ωφe − σ2 ∆u Pφe Ω Pφe ∆u. v

Γf,n (ς 0 ) = E [{∂L (ς 0 ) /∂ς} {∂Lrr (ς 0 ) /∂ς 0 }] has elements rr

Γf,θθ =

+

1 0 †−1 ∆X), σ 2v E(∆X Ω


Γf,θσ2v = Γf,θλ =

1 0 †−1 ∆u∆u0 Ω†−1 ∆u), 2σ 6v E(∆X Ω

1 0 †−1 ∆u∆u0 Ω†−1 Ω†λ Ω†−1 ∆u), 2σ 4v E(∆X Ω

Γf,θcm =

1 0 †−1 ∆u∆u0 Ω†−1 Ω†cm Ω†−1 ∆u), 2σ 4v E(∆X Ω

1 0 †−1 ∆u∆u0 Ω†−1 Ω†φe Ω†−1 ∆u), 2σ 4v E(∆X Ω ¡ ¢2 2 2 = 4σ1 8 E ∆u0 Ω†−1 ∆u − n4σT4 , v v

Γf,θφe = Γf,σ2v σ2v

³ ´ ³ ´ nT ∆u0 Ω†−1 ∆u∆u0 Ω†−1 Ω†λ Ω†−1 ∆u − 4σ Ω†−1 Ω†λ , 2 tr v ¡ 0 †−1 ¢ nT ¡ †−1 † ¢ 1 0 †−1 † †−1 Γf,σ2v cm = 4σ6 E ∆u Ω ∆u∆u Ω Ωcm Ω ∆u − 4σ2 tr Ω Ωcm , v v ³ ´ ³ ´ † 1 nT 0 †−1 0 †−1 †−1 Γf,σ2v φe = 4σ6 E ∆u Ω ∆u∆u Ω Ωφe Ω ∆u − 4σ2 tr Ω†−1 Ω†φe , v ³ ´2 ³ ´v 1 1 2 0 †−1 † †−1 †−1 † Γf,λλ = 4σ4 E ∆u Ω Ωλ Ω ∆u − 4 tr Ω Ωλ , v ³ ´ ³ ´ ¡ ¢ Γf,λcm = 4σ14 E ∆u0 Ω†−1 Ω†λ Ω†−1 ∆u∆u0 Ω†−1 Ω†cm Ω†−1 ∆u − 14 tr Ω†−1 Ω†λ tr Ω†−1 Ω†cm , v ³ ´ ³ ´ ³ ´ Γf,λφe = 4σ14 E ∆u0 Ω†−1 Ω†λ Ω†−1 ∆u∆u0 Ω†−1 Ω†φe Ω†−1 ∆u − 14 tr Ω†−1 Ω†λ tr Ω†−1 Ω†φe , v ¡ ¢2 ¡ ¢ Γf,cm cm = 4σ12 E ∆u0 Ω†−1 Ω†cm Ω†−1 ∆u − 14 tr2 Ω†−1 Ω†cm , v ³ ´ ³ ´ Γf,cm φe = 4σ12 E ∆u0 Ω†−1 Ω†cm Ω†−1 ∆u∆u0 Ω†−1 Ω†φe Ω†−1 ∆u − 14 tr Ω†−1 Ω†cm Ω†−1 Ω†φe , v ³ ´2 ³ ´ 1 Γf,φe φe = 4σ2 E ∆u0 Ω†−1 Ω†φe Ω†−1 ∆u − 14 tr2 Ω†−1 Ω†φe .

Γf,σ2v λ =

1 4σ 6v E

v

Appendix B: Some Useful Lemmas

We prove some lemmas that are used in the proofs of the main theorems in the text. For ease of notation, we assume that both x_it and z_i are scalar random variables (p = 1, q = 1) in this Appendix. Let P_n and Q_n be two n × n matrices that are uniformly bounded in both row and column sums. Let R_n be a conformable matrix whose elements are uniformly O(o_n) for a certain sequence o_n. Frequently we will use three evident facts (see, e.g., Kelejian and Prucha, 1999; Lee, 2002): (1) P_n Q_n is also uniformly bounded in both row and column sums; (2) the (i, j) elements P_{n,ij} of P_n are uniformly bounded in i and j and tr(P_n) = O(n); (3) the elements of P_n R_n and R_n P_n are uniformly O(o_n). Note that both W and B^{-1} are uniformly bounded in both row and column sums under our assumptions. It is then easy to apply the first evident fact to prove the following lemma.

Lemma B.1 B 0 B, (B 0 B)

.

, Ω, Ω−1 , Ω∗ , Ω∗−1 , Ω† , Ω†−1 , A, and A are all uniformly bounded in

both row and column sums, where recall A = (B 0 B)

−1

−1

(W 0 B + B 0 W ) (B 0 B)

[(W 0 B + B 0 W ) A −W 0 W ]. By the three evident facts and Lemma B.1, we have the following lemma. 27

.

−1

and A = 2 (B 0 B)

Lemma B.2 1) tr (D1 ΩD2 ) /n = O (1) for D1 , D2 = Ω−1 , Ω−1 (IT ⊗ A) Ω−1 , Ω−1 (JT ⊗ In ) Ω−1 , ³ .´ Ω−1 IT ⊗ A . The same conclusion holds when Ω is replaced by Ω∗ or Ω† , and D1 and D2 are replaced by their analogs corresponding to the case of Ω∗ or Ω† . ¢ ¡ 2) tr B 0−1 RB −1 /n = O (1) where R is an n × n nonstochastic matrix that is uniformly bounded in both row and column sums.

Lemma B.3 Let {ai }ni=1 and {bi }ni=1 be two independent i.i.d.. sequences with zero means and ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ fourth moments. Let σ 2a = E a21 , σ 2b = E b21 . Let κa = E a41 − 3σ 4a , and κb = E b41 − 3σ 4b . Let qn , pn be n × n nonstochastic matrices. Then Pn 1) E (a0 qn a) (a0 pn a) = κa i=1 qn,ii pn,ii + σ 4a [tr (qn ) tr (pn ) + tr (qn (pn + p0n ))] . 2) E (a0 qn a) (b0 pn b) = σ 2a σ 2b tr (qn ) tr (pn ) .

3) E (a0 qn b) (a0 pn b) = σ 2a σ 2b tr (qn p0n ) . Proof. To show 1), write E (a0 qn a) (a0 qn a) = E

³P

n i=1

Pn

j=1

Pn

k=1

´ a a q a a q . i j n,ij k l n,kl l=1

Pn

Noting that E (ai aj ak al ) will not vanish only when i = j = k = l, (i = j) 6= (k = l) , (i = k) 6= (j = l) , and (i = l) 6= (j = k) , we have E (a0 qn a) (a0 pn a) n X n n X ¡ ¢X qn,ii pn,ii + σ 4a {qn,ii pn,jj + qn,ij pn,ij + qn,ij pn,ji } = E a41 i=1

= κa = κa

n X i=1 n X

qn,ii pn,ii + σ 4a

i=1 j6=i

n n X X i=1 j=1

{qn,ii pn,jj + qn,ij pn,ij + qn,ij pn,ji }

qn,ii pn,ii + σ 4a [tr (qn ) tr (pn ) + tr (qn (pn + p0n ))] .

i=1

The result in 2) is trivial since E (a0 qn a) (b0 pn b) = E (a0 qn a) E (b0 pn b) = σ2a σ 2b tr (qn ) tr (pn ) . Now, ³P P ´ ³P ´ Pn Pn n n n Pn 2 2 a b q a b p a b q p = E E (a0 qn b) (a0 pn b) = E i=1 j=1 k=1 l=1 i j n,ij k l n,kl i=1 j=1 i j n,ij n,ij = σ 2a σ 2b tr (qn p0n ) .

¡ ¢ P j −1 Lemma B.4 Recall u = (ιT ⊗ In )μ + IT ⊗ B −1 v. Let a = ξ + μ/ (1 − ρ) + ∞ v−j , where j=0 ρ B ξ, μ, and v are defined in the text. In particular, ξ 0i s are i.i.d.. and independent of μ and v. Let

qn , pn , rn , sn , tn be nT × nT, nT × nT, n × n, n × nT and n × nT nonstochastic matrices, respectively. Further, qn , pn , and rn are symmetric. Then Pn Pn 1) E (u0 qn u) (u0 pn u) = κμ i=1 Gqn ,1ii Gpn ,1ii + κv i=1 Gqn ,2ii Gpn ,2ii +σ 4v [tr (qn Ω) tr (pn Ω) + 2tr (qn Ωpn Ω)] . 28

2) E (u0 qn u) (a0 rn a) = 3) E (a0 sn u) (a0 tn u) =

κμ (1−ρ)2 κμ (1−ρ)2

Pn

i=1

Pn

i=1

Gqn ,1ii rn,ii + σ 4v [tr (rn ω 11 ) tr (qn Ω) + 2tr (ω 12 qn ω 21 pn )] . (sn (ιT ⊗ In ))ii (tn (ιT ⊗ In ))ii

+σ 4v [tr (sn ω 21 ) tr (tn ω 21 ) + tr (sn ω 21 tn ω 21 ) + tr (sn Ωt0n ω 11 )] . κμ Pn 0 0 4 0 0 4) E (u0 qn u) (u0 s0n a) = 1−ρ i=1 Gqn ,1ii ((ιT ⊗ In ) sn )ii +σ v [tr (qn Ω) tr (sn ω 12 ) +2tr (Ωsn ω 12 qn )] . P κμ n 4 5) E (a0 rn a) (a0 sn u) = (1−ρ) 3 i=1 rn,ii (sn (ιT ⊗ In ))ii +σ v [tr (rn ω 11 ) tr (sn ω 21 ) +2tr (rn ω 11 sn ω 21 )] , ¡ ¢ ¡ ¢ where, e.g., Gqn ,1 = (ι0T ⊗ In ) qn (ιT ⊗ In ) , and Gqn ,2 = IT ⊗ B 0−1 qn IT ⊗ B −1 . Proof. We only sketch the proof of 1) and 2) since it mainly follows from Lemma B.3 and the ¡ ¢ other proofs are similar. First, let Gqn ,3 = (ι0T ⊗ In ) qn IT ⊗ B −1 . Then by the independence of μ and v and Lemma B.3, we have E (u0 qn u) (u0 pn u) = E{μ0 Gqn ,1 μμ0 Gpn ,1 μ + v 0 Gqn ,2 vv 0 Gpn ,2 v + μ0 Gqn ,1 μv 0 Gpn ,2 v + v 0 Gqn ,2 vμ0 Gpn ,1 μ +2μ0 Gqn ,3 vμ0 Gpn ,3 v + 2v 0 G0qn ,3 μv 0 Gpn ,3 μ} n n X X Gqn ,1ii Gpn ,1ii + κv Gqn ,2ii Gpn ,2ii + σ 4v [tr (qn Ω) tr (pn Ω) +2tr (qn Ωpn Ω)] . = κμ i=1

i=1

Next, write a = b + B −1 c, where b = ξ + μ/ (1 − ρ) and c = mutually independent. So

P∞

j=0

ρj v−j . Then b and c are i.i.d.. and

E (u0 qn u) (a0 rn a) = E{μ0 Gqn ,1 μb0 rn b + v 0 Gqn ,2 vc0 B 0−1 rn B −1 c + μ0 Gqn ,1 μc0 B 0−1 rn B −1 c + v 0 Gqn ,2 vb0 rn b} n X κμ Gqn ,1ii rn,ii + σ 4v [tr (rn ω 11 ) tr (qn Ω) + 2tr (ω 12 qn ω 21 pn )] . = (1 − ρ)2 i=1 Similarly, we can prove the other claims. To proceed, we define some notation. By continuous back substitution, we have for t = 0, 1, 2, ..., yt = Xt β 00 + cρ,t zγ 0 + cρ,t μ + Vt + Y0,t , Pt−1

Pt−1

ρj0 B −1 vt−j , Y0,t = ρt0 y0 and cρ,t = (1 − ρt0 ) / (1 − ρ0 ) for P∞ j the case of random effect with fixed initial observations (Case I), and Xt = j=0 ρ0 xt−j , Vt = P∞ j −1 vt−j , Y0,t = 0, and cρ,t = 1/ (1 − ρ0 ) ≡ cρ for the case of random effect with endogenous j=0 ρ0 B ¡ ¢0 initial observations (Case II). Now, define Y0 = Y00,0 , Y00,1 , ..., Y00,T −1 . Then where Xt =

j=0

ρj0 xt−j , Vt =

(B.1)

j=0

Y−1 = X(−1) β 0 + (lρ ⊗ In ) zγ 0 + (lρ ⊗ In ) μ + V(−1) + Y0 , 29

(B.2)

where



X(−1)

for Case I and

⎜ ⎜ ⎜ ⎜ =⎜ ⎜ ⎜ ⎝

X(−1)



0 X1 .. . XT −1 ⎛

⎜ ⎜ ⎜ ⎜ =⎜ ⎜ ⎜ ⎝



⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ , V(−1) = ⎜ ⎟ ⎜ ⎟ ⎜ ⎠ ⎝

VT −1

X1 .. . XT −1



⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ , V(−1) = ⎜ ⎟ ⎜ ⎟ ⎜ ⎠ ⎝



⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ , and lρ = ⎜ ⎟ ⎜ ⎟ ⎜ ⎠ ⎝

V1 .. .



X0



0

V0 V1 .. . VT −1

0 cρ,1 .. . cρ,T −1



⎟ ⎟ ⎟ ⎟ ⎟ , and lρ = cρ ιT ⎟ ⎟ ⎠

⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠

(B.3)

(B.4)

for Case II. Notice that in Case I, Y−1 can also be expressed as

Y−1 = Ax X 0 β 0 + (lρ ⊗ In ) zγ 0 + (lρ ⊗ In ) μ + Av v + Y0 , where Ax = Cρ ⊗ In and Av = Cρ ⊗ B −1 with ⎛

⎜ ⎜ ⎜ ⎜ ⎜ Cρ = ⎜ ⎜ ⎜ ⎜ ⎜ ⎝

0

0

0

···

0 0

1

0

0

···

0 0

ρ0 .. .

1 .. .

0 .. .

··· .. .

0 0 .. .. . .

ρT0 −2

ρT0 −3

ρT0 −4

1 0

(B.5)



⎟ ⎟ ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎟ ⎟ ⎠

The expression in (B.5) will facilitate our analysis in several places. The following four lemmas (Lemmas B.5-B.8) are used in the proof of Theorem 4.2. Lemma B.5 For D1 , D2 = Ω−1 , Ω−1 (IT ⊗ A) Ω−1 , or Ω−1 (JT ⊗ In ) Ω−1 , ¤ £ 1) n−1 u0 D1 ΩD2 u − σ 2v tr (D1 ΩD2 Ω) = op (1) , ³ ´i h e −E X e 0 D1 ΩD2 X e = op (1) . e 0 D1 ΩD2 X 2) n−1 X

Proof. Let R = D1 ΩD2 . Note that R is uniformly bounded in both row and column sums. To

show 1), first note that E (u0 Ru) = σ 2v tr (RΩ) . By Lemma B.4, h i ¢ ¡ 2 Var n−1 u0 Ru = n−2 E (u0 Ruu0 Ru) − (E (u0 Ru)) = n−2 κμ

n X

G2R,1ii + n−2 κv

i=1

n X i=1

30

¡ ¢ G2R,2ii + 2n−2 σ 4v tr (RΩRΩ) = O n−1 ,

where the last equality follows from the fact that G2R,1 , G2R,2 , and RΩRΩ are all uniformly bounded in both row and column sums. Then 1) follows by the Chebyshev inequality. e = (X, Z, Y−1 ) , it is easy to show that terms like n−1 X 0 RX, n−1 X 0 RZ, Next, noticing that X

and n−1 Z 0 RZ converge to their expectation. For terms involving Y−1 , we need to use (B.5) to obtain 0 n−1 Y−1 RY−1

0

= n−1 [Ax X 0 β 0 + (lρ ⊗ In ) zγ 0 ] R [Ax X 0 β 0 + (lρ ⊗ In ) zγ 0 ] 0

+n−1 [(lρ ⊗ In ) μ + Av v] R [(lρ ⊗ In ) μ + Av v] +n−1 Y00 RY0 0

+2n−1 [Ax X 0 β 0 + (lρ ⊗ In ) zγ 0 ] R [(lρ ⊗ In ) μ + Av v] 0

+2n−1 [Ax X 0 β 0 + (lρ ⊗ In ) zγ 0 ] RY0 +2n−1 [(lρ ⊗ In ) μ + Av v]0 RY0 ≡

6 X

Ani .

i=1

It suffices to show that each Ani (i = 1, ..., 6) converges to its expectation. Take An6 as an example. E (An6 ) = 0 because Y0 is kept fixed here. For the second moment, ¢ ª © ¡ ¡ ¢ Var (An6 ) = 4n−2 E μ0 lρ0 ⊗ In RY0 Y00 R0 (lρ ⊗ In ) μ + E (v 0 Av RY0 Y00 R0 Av v) ¡ ª ¡ © ¡ ¢¢ ¢ = 4n−2 σ 2μ tr RY0 Y00 R0 lρ lρ0 ⊗ In + σ 2v tr (Av RY0 Y00 R0 Av ) = O n−1 ,

where the last equality follows from the fact that both matrices in the trace operator are uniformly bounded in both row and column sums. Similarly, we can show that n−1 X 0 RY−1 and n−1 Z 0 RY−1 converge to their expectation in probability. Lemma B.6 Let R be an nT × nT nonstochastic matrix that is uniformly bounded in both row

p e 0 RΩ−1 u → (01×p , 01×q , and column sums, e.g., InT , Ω−1 (IT ⊗ A) , or Ω−1 (JT ⊗ In ) . Then n−1 X

limn→∞ σ 2v tr (R (Jρ ⊗ In )) /n)0 , where



⎜ ⎜ ⎜ ⎜ ⎜ Jρ = ⎜ ⎜ ⎜ ⎜ ⎜ ⎝

0 1 ρ ···

ρT −2

0 0 1 .. .. .. . . .

··· .. .

ρT −3 .. .

0 0 0

···

1

0 0 0

···

0



⎟ ⎟ ⎟ ⎟ ⎟ ⎟. ⎟ ⎟ ⎟ ⎟ ⎠

p e 0 Ω−1 u) = 0 and n−1 X e 0 Ω−1 u → In particular, when R = InT , E(n−1 X 0(p+q+1)×1 .

31

e = (X, Z, Y−1 ) . By the strict exogeneity of X and Z, we can readily show Proof. Note that X p

p

that both X 0 RΩ−1 u and Z 0 RΩ−1 u have expectation zero, n−1 X 0 RΩ−1 u → 0 and n−1 Z 0 RΩ−1 u → 0. We are left to show ª p ¡ 0 ¢ © 0 RΩ−1 u − σ 2v tr (R (Jρ ⊗ In )) → 0, and E Y−1 RΩ−1 u = σ 2v tr (R (Jρ ⊗ In )) . n−1 Y−1

Using (B.5), we have

¡ ¢ 0 Y−1 RΩ−1 u = μ0 lρ0 ⊗ In RΩ−1 u + v 0 A0v RΩ−1 u + β 00 XA0x RΩ−1 u ¡ ¢ + γ 00 z lρ0 ⊗ In RΩ−1 u + Y00 RΩ−1 u 5 X



Anj .

j=1

It is easy to show that n−1 Anj = op (1) for j = 3, 4, 5. So we will show that An1 + An2 = ¡ ¢ σ 2v tr (R (Jρ ⊗ In )) + op (n) . Noting that u = (ιT ⊗ In ) μ + IT ⊗ B −1 v, we have £ ¡ ¤ £ ¢ ¡¡ ¢ ¢¤ E (An1 ) = E μ0 lρ0 ⊗ In RΩ−1 (ιT ⊗ In ) μ = φμ σ 2υ tr RΩ−1 ιT lρ0 ⊗ In ∙ µ h i−1 ¶¸ −1 , = φμ σ 2υ tr R (JT Jρ ) ⊗ (B 0 B) + φμ T In

and £ £ ¡ ¢ ¤ ¡ ¢¡ ¢¤ E (An2 ) = E v 0 A0v RΩ−1 IT ⊗ B −1 v = σ 2υ tr RΩ−1 IT ⊗ B −1 Jρ ⊗ B 0−1 h ³ ´i −1 = σ2υ tr RΩ−1 Jρ ⊗ (B 0 B) ∙ µ ¶¸ i−1 ¡ −1 ¢ h 0 −1 −1 2 0 = συ tr R T JT Jρ ⊗ (B B) + φμ T In (B B) £ ¡¡ ¢ ¢¤ +σ2υ tr R Jρ − T −1 JT Jρ ⊗ In ,

where we have used (3.35) and the fact that E (vv 0 A0v ) = Jρ ⊗ B 0−1 . Hence E (An1 + An2 ) = σ 2v tr (R (Jρ ⊗ In )) .

¡ ¢ We can show that E A2nj = O (n) for j = 1, 2. It follows from the Chebyshev inequality that

n−1 (An1 + An2 ) = σ 2v tr (R (Jρ ⊗ In )) /n+op (1) . When R = InT , tr (R (Jρ ⊗ In )) = tr (Jρ ) tr (In ) = 0 and we can also verify that E (Ani ) = 0 for i = 3, 4, 5. This completes the proof. Lemma B.7 Suppose that {P1n } and {P2n } are sequences of matrices with row and column sums 0

2+

uniformly bounded. Let a = (a1 , ..., an ) , where ai is a random variable and supi E |ai | some

0

0

0

0

< ∞ for

> 0. Let b = (b1 , ..., bn ) , where bi s are i.i.d.. with mean zero and (4 + 2 0 )th finite moments, 32

and {bi } is independent of {ai } . Let σ 2Qn be the variance of Qn = a0 P1n b + b0 P2n b − σ 2v tr (P2n ) . ¡ √ ¢ Assume that the elements of P1n , P2n are of uniform order O 1/ hn and O (1/hn ) , respectively. 1+2/

If limn→∞ hn

0

d

/n = 0, then Qn /σQn → N (0, 1) .

Proof. Note that Qn is a linear-quadratic form of b like that in Theorem 1 of Kelejian and Prucha (2001). The only difference is that the coefficient (a0 P1n ) of the linear part of Qn is random here. We can prove the lemma by modifying the proof of Theorem 1 in Kelejian and Prucha (2001) or Lemma A.13 of Lee (2002). Lemma B.8 Suppose that the conditions in Theorem 4.2 are satisfied. Then d

1 e 0 −1 X Ω u → N (0, Γr,11 ) , where Γr,11 = p limn→∞ (nT ) 1) √nT 1 2) √nT

r

∂L (ς 0 ) d → ∂ς

N (0, Γr ) .

−1

e 0 Ω−1 X. e X 0

Proof. For 1), by the Cramer-Wold device, it suffices to show that for any c = (c01 , c02 , c3 ) ∈ d

e 0 Ω−1 u → N (0, c0 Γr,11 c) . Using (B.5) and the expression Rp × Rq × R such that kck = 1, (nT )−1/2 c0 X ¡ ¢ u = (ιT ⊗ In ) μ + IT ⊗ B −1 v, we have e 0 Ω−1 u = c01 XΩ−1 u + c02 ZΩ−1 u + c3 Y−1 Ω−1 u c0 X ¡ ¢ = c01 XΩ−1 (ιT ⊗ In ) μ + c01 XΩ−1 IT ⊗ B −1 v ¡ ¢ +c02 ZΩ−1 (ιT ⊗ In ) μ + c02 ZΩ−1 IT ⊗ B −1 v ¡ ¢ +c3 β 00 XA0x Ω−1 (ιT ⊗ In ) μ + c3 β 00 XA0x Ω−1 IT ⊗ B −1 v ¡ ¡ ¢ ¢ ¡ ¢ +c3 γ 00 z lρ0 ⊗ In Ω−1 (ιT ⊗ In ) μ + c3 γ 00 z lρ0 ⊗ In Ω−1 IT ⊗ B −1 v ¡ ¢ +c3 Y00 Ω−1 (ιT ⊗ In ) μ + c3 Y00 Ω−1 IT ⊗ B −1 v ¡ ¢ ¡ ¢ ¡ ¢ +c3 μ0 lρ0 ⊗ In Ω−1 (ιT ⊗ In ) μ + c3 μ0 lρ0 ⊗ In Ω−1 IT ⊗ B −1 v ¡ ¢ +c3 v 0 A0v Ω−1 (ιT ⊗ In ) μ + c3 v 0 A0v Ω−1 IT ⊗ B −1 v =

3 X

Tni ,

i=1

where Tn1

=

£ 0 ¡ ¢ ¤ c1 X + c02 Z + c3 β 00 XA0x + c3 γ 00 z lρ0 ⊗ In + c3 Y00 Ω−1 (ιT ⊗ In ) μ ¡ ¢ +c3 μ0 lρ0 ⊗ In Ω−1 (ιT ⊗ In ) μ,

33

£ 0 ¡ ¢ ¤ ¡ ¢ c1 X + c02 Z + c3 β 00 XA0x + c3 γ 00 z lρ0 ⊗ In + c3 Y00 Ω−1 IT ⊗ B −1 v ¡ ¢ +c3 v 0 A0v Ω−1 IT ⊗ B −1 v, and ¡ ¢ ¡ ¢ = c3 μ0 lρ0 ⊗ In Ω−1 IT ⊗ B −1 v + c3 v 0 A0v Ω−1 (ιT ⊗ In ) μ £¡ ¢ ¡ ¢ ¤ = c3 μ0 lρ0 ⊗ In Ω−1 IT ⊗ B −1 + (ι0T ⊗ In ) Ω−1 Av v.

Tn2

=

Tn3

¡ ¡ ¢¢ It is easy to verify that E (Tn3 ) = 0, E (Tn1 ) = c3 φμ σ 2v tr Ω−1 ιT lρ0 ⊗ In , and thus E (Tn2 ) =

−E (Tn1 ) by Lemma B.6. Also, we can verify that Cov(Tni , Tnj ) = 0 for i 6= j. It suffices to show

that each Tni (after appropriately centered for Tn1 and Tn2 ) is asymptotically normally distributed with mean zero. Note that Tn1 and Tn2 are linear and quadratic functions of μ and v, respectively. For Tn3 , it is a special case of Lemma B.7 since it can be regarded as a linear function of either μ or υ and μ and υ are independent of each other by assumption. So we can apply Lemma B.10 to Tni (i = 1, 2, , 3) to obtain p d {Tni − E (Tni )} / Var (Tni ) → N (0, 1) .

Now by the independence of Tn1 and Tn2 , and the asymptotic independence of Tn3 with Tn1 and Tn2 , we have 3 X 1 d e 0 Ω−1 u = √1 √ c0 X Tni → N nT nT i=1

implying that

√1 nT

Ã

−1

0, lim (nT ) n→∞

3 X

!

Var (Tni ) ,

i=1

d e 0 Ω−1 X e 0 Ω−1 u → e N (0, Γr,11 ) because we can readily show that (nT )−1 (X X

e 0 Ω−1 u)) = op (1) . −Var(X

For 2), noticing that each component of ∂Lr (ς 0 ) /∂ς can be written as linear and quadratic

functions of μ or v, the proof can be done by following that of 1) closely. The next four lemmas are used in the proof of Theorem 4.4 for the random effects model with endogenous initial observations (Case II). Let Rts be an n × n symmetric and positive semidefinite (psd) nonstochastic square matrix for t, s = 0, 1, ..., T − 1. Assume that Rts are uniformly bounded P∞ P∞ in both row and column sums. Recall for Case II Xt = j=0 ρj0 xt−j , and Vt = j=0 ρj0 B −1 vt−j .

Lemma B.9 Suppose that the conditions in Theorem 4.4 are satisfied. Then ¡ ¢ P∞ s−t+2i , 1) E (V0t Rts Vs ) = σ 2v tr B 0−1 Rts B −1 i=max(0,t−s) ρ ³P ´ ¡ ¢ P ∞ ∞ j+k Rts E xs−j x0t−k , 2) E (X0t Rts Xs ) = tr j=0 k=0 ρ 3) E (X0t Rts Vs ) = 0.

34

Proof. Let Pj = ρj B −1 . Then Vt = and 0 otherwise, we have E (V0t Rts Vs ) =

∞ ∞ X X i=0 j=0



∞ X

i=max(0,t−s)

E (X0t Rts Xs ) =

P∞

j=0

Pj vt−j . Noting that E (vt0 Dvs ) = σ 2v tr (D) if t = s

∞ X

i=max(0,t−s)



¡ 0 ¢ E vt−i Pi0 Rts Ps−t+i vt−i

¡ ¢ Pi0 Rts Ps−t+i ⎠ = σ 2v tr B 0−1 Rts B −1

∞ X

ρs−t+2i .

i=max(0,t−s)

ρj xt−j , we have

∞ X ∞ X j=0 k=0

Now,

j=0

¡ 0 ¢ E vt−i Pi0 Rts Pj vs−j =

= σ 2v tr ⎝

Next, noting that Xt =

P∞

⎛ ⎞ ∞ X ∞ X ¡ ¡ ¢ ¢ ρj+k E x0t−k Rts xs−j = tr ⎝ ρj+k Rts E xs−j x0t−k ⎠ .

E (X0t Rts Vs ) =

j=0 k=0

∞ ∞ X X

j=0 k=0

¡ ¢ ρj+k E x0t−k Rts B −1 vs−j = 0.

Lemma B.10 Suppose that the conditions in Theorem 4.4 are satisfied. Then ¢ © Pn ¡ ¢ ¡ ¢ ¡ 1) Cov V0t Rts Vs , V0g Rgh Vh = ρtsgh,1 κv i=1 B 0−1 Rts B −1 ii B 0−1 Rgh B −1 ii ³ ³ ´´o 0 +2σ 4v tr B 0−1 Rts B −1 B 0−1 Rgh B −1 + B 0−1 Rgh B −1 ³ ³ ´ ´ −1 −1 0 B −1 . +ρtsgh,2 σ4v tr B 0−1 Rts (B 0 B) Rgh B −1 + ρtsgh,3 σ 4v tr B 0−1 Rts (B 0 B) Rgh ¢ ¡ 2) Cov X0t Rts Vs , X0g Rgh Vh hP P ³ ´i m m Pmin(m,m+s−h) i+k+h−s+2j −1 0 Rts (B 0 B) Rgh E x0g−k xt−i . = σ 2v tr i=0 k=0 j=max(0,s−h) ρ ¢ ¡ 3) Cov X0t Rts Xt , X0g Rgh Xh = O (n) , P∞ P P (s+g+h−3t+4j) g−t+2i , ρtsgh,2 = ∞ where ρtsgh,1 = ∞ j=max(0,t−s,t−g,t−h) ρ i=max(0,t−g) ρ j=max(0,s−h) P∞ P∞ h−s+2j h−t+2i g−s+2j ρ 1 (j 6= i + s − t) , and ρtsgh,3 = i=max(0,t−h) ρ 1 (j 6= i + s − t) . j=max(0,s−g) ρ Proof. Let R1 and R2 be arbitrary n × n nonstochastic matrices. We can show that £ ¡ ¢¤ E (vt0 R1 vs ) vg0 R2 vh ⎧ P ⎪ ⎪ κv ni=1 R1,ii R2,ii + σ 4v [tr (R1 ) tr (R2 ) +tr (R1 (R2 +R20 ))] ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ σ4 tr (R1 ) tr (R2 ) ⎪ ⎪ ⎨ v = σ4v tr (R1 R2 ) ⎪ ⎪ ⎪ ⎪ 4 ⎪ σv tr (R1 R20 ) ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ 0 35

if t = s = g = h if t = s 6= g = h if t = g 6= s = h . if t = h 6= s = g otherwise

Consequently, £ ¤ E V0t Rts Vs V0g Rgh Vh ⎤ ⎡ ∞ X ∞ ∞ X ∞ X X 0 0 ρi+j+k+l vt−i B 0−1 Rts B −1 vs−j vg−k B 0−1 Rgh B −1 vh−l ⎦ = E⎣ i=0 j=0 k=0 l=0 ∞ X

=

ρ(s+g+h−3t+4j)

j=max(0,t−s,t−g,t−h)

+σ 4v +σ 4v

(

κv

n X ¡ 0−1 ¢ ¡ ¢ B Rts B −1 ii B 0−1 Rgh B −1 ii i=1

£ ¡ 0−1 ¢ ¡ ¢ ¡ ¡ ¢¢¤o 0 B −1 tr B Rts B −1 tr B 0−1 Rgh B −1 + 2tr B 0−1 Rts B −1 B 0−1 Rgh B −1 + B 0−1 Rgh ∞ X

i=max(0,t−s) ∞ X

+

¡ ¢ ρs−t+2i tr B 0−1 Rts B −1

ρg−t+2i

i=max(0,t−g) ∞ X

+

∞ X

j=max(0,s−h)

ρh−t+2i

i=max(0,t−h)

∞ X

j=max(0,s−g)

∞ X

j=max(0,g−h)

¡ ¢ ρh−g+2j tr B 0−1 Rgh B −1 1 (j 6= i + g − t)

³ ´ −1 ρh−s+2j σ 4v tr B 0−1 Rts (B 0 B) Rgh B −1 1 (j 6= i + s − t) ³ ´ −1 0 ρg−s+2j σ 4v tr B 0−1 Rts (B 0 B) Rgh B −1 1 (j 6= i + s − t)

Then 1) follows by Lemma B.9. For 2), we have ´ ³ ¡ ¢ 0 Cov X0t Rts Vs , X0g Rgh Vh = E X0t Rts Vs (X0t Rgh Vs ) ⎡ ⎤ ∞ X ∞ ∞ X ∞ X X ¡ ¢ 0 = E⎣ ρi+j+k+l x0t−i Rts B −1 vs−j x0g−k Rgh B −1 vh−l ⎦ i=0 j=0 k=0 l=0

⎡ ∞ X ∞ X = σ 2v tr ⎣

∞ X

i=0 k=0 j=max(0,s−h)

⎤ ¡ ¢ −1 0 ρi+k+h−s+2j Rts (B 0 B) Rgh E x0g−k xt−i ⎦ .

¢ ¡ The expression for Cov X0t Rts Xt , X0g Rgh Xh is quite complicated, but we can use the three evident facts to show it is of order O (n) , which suffices for our purpose.

i p h P −1 PT −1 0 −1 PT −1 PT −1 0 Lemma B.11 1) (nT )−1 Tt=0 s=0 Vt Rts Vs − E (nT ) t=0 s=0 Vt Rts Vs → 0, P −1 PT −1 0 p 2) (nT )−1 Tt=0 s=0 Xt Rts Vs → 0, h i p −1 PT −1 PT −1 0 −1 PT −1 PT −1 0 X R X − E (nT ) X R X 3) (nT ) t ts t t ts t → 0. t=0 s=0 t=0 s=0

Proof. By the three evident facts and Lemmas B.2, B.9, and B.10, we can show that E[(nT )−1 PT −1 PT −1 0 t=0 s=0 Vt Rts Vs ] = O (1) , and Ã

−1

Var n

−1 T −1 T X X t=0 s=0

V0t Rts Vs

!

= n−2

−1 T −1 T −1 T −1 T X X X X t=0 s=0 g=0 h=0

36

¡ ¢ ¡ ¢ Cov V0t Rts Vs , V0g Rgh Vh = O n−1 .

Then 1) follows by the Chebyshev’s inequality. For 2), we have E{(nT )−1 0, and Ã

−1

Var n = n−2

−1 T −1 T X X

X0t Rts Vs

t=0 s=0

T −1 T −1 T −1 T −1 X X X X t=0 s=0 g=0 h=0

= n−2

−1 T −1 T −1 T −1 T X X X X t=0 s=0 g=0 h=0

¡ ¢ = O n−1 ,

PT −1 PT −1 t=0

s=0

X0t Rts Vs } =

!

¡ ¢ Cov X0t Rts Vs , X0g Rts Vh ⎡ ∞ X ∞ X σ 2v tr ⎣

∞ X

i=0 k=0 j=max(0,s−h)

⎤ ¡ ¢ −1 0 ρi+k+h−s+2j Rts (B 0 B) Rgh E xg−k x0t−i ⎦

where the last equality follows because (i) xit are i.i.d.. across i with second moments uniformly bounded in i, (ii) Rts (B 0 B)

−1

0 Rgh are uniformly bounded in both row and column sums by B.1, ¡ ¢ −1 0 E xg−k x0t−i are uniformly bounded by the third evident fact. and (iii) elements of Rts (B 0 B) Rgh

Hence the conclusion follows by the Chebyshev inequality.

3) follows by Lemma B.10 and the Chebyshev inequality. Lemma B.12 For D1 , D2 = Ω∗−1 , Ω∗−1 (IT∗ ⊗ A) Ω∗−1 , Ω∗−1 (JT∗ ⊗ In ) Ω∗−1 or Ω∗−1 (KT∗ ⊗ In ) Ω∗−1 , ¤ £ 1) n−1 u∗0 D1 Ω∗ D2 u∗ − σ 2v tr (D1 Ω∗ D2 Ω∗ ) = op (1) , 2) n−1 [X ∗0 D1 Ω∗ D2 X ∗ − E (X ∗0 D1 Ω∗ D2 X ∗ )] = op (1) .

Proof. Let R = D1 Ω∗ D2 . Note that R is uniformly bounded in both row and column sums. ⎞ ⎛ R11 R12 ⎜ n×nT ⎟ To show 1), first note that E (u∗0 Ru∗ ) = σ 2v tr (RΩ∗ ) . Now write R = ⎝ n×n ⎠ . Let R21 R22

u0 = y0 − x eπ 0 and u = Y − ρ0 Y−1 − Xβ 0 − Zγ 0 . Then,

nT ×n

nT ×nT

0 ) u) Var (u∗0 Ru∗ ) = Var (u00 R11 u0 + u0 R22 u + u00 (R12 + R21 0 ) u) + 2Cov (u00 R11 u0 , u0 R22 u) = Var (u00 R11 u0 ) + Var (u0 R22 u) + Var (u00 (R12 + R21 0 0 ) u) + 2Cov (u0 R22 u, u00 (R12 + R21 ) u) . +2Cov (u00 R11 u0 , u00 (R12 + R21

By Lemmas B.2-B.4, we can show that each term on the right hand side is O (n) . Consequently, 1) follows by the Chebyshev ⎛ inequality. ⎞ e 0 R22 X e X e 0 R21 x X e ⎠ . It is easy to show n−1 x e0 R11 x e converges in probability Next, X ∗0 RX ∗ = ⎝ e x e e0 R11 x x e0 R12 X e 0 R22 X, e n−1 X e 0 R21 x e converge in probability to to its expectation. To show n−1 X e, and n−1 x e0 R12 X 37

e (recall X e = (X, Z, Y−1 )). their expectations, the major difficulty lies in the appearance of Y−1 in X ¡ ¢ p 0 0 R22 Y−1 − E Y−1 R22 Y−1 ) → 0 and the other cases can be proved We show below that n−1 (Y−1

similarly. To this goal, we need to use (B.2) (with Y0 = 0nT ×1 ) to obtain 0 R22 Y−1 n−1 Y−1

¡ ¢0 = n−1 X(−1) β 0 + (lρ ⊗ In ) zγ 0 + (lρ ⊗ In ) μ + V(−1) R22 ¡ ¢ × X(−1) β 0 + (lρ ⊗ In ) zγ 0 + (lρ ⊗ In ) μ + V(−1) .

After expressing out the right hand side of the last expression, it has 16 terms, most of which can easily be shown to converge to their respective expectations. The exceptions are terms involving X(−1) or V(−1) , namely: n−1 β 00 X0(−1) R22 X(−1) β 0 , n−1 β 00 V0(−1) R22 V(−1) , n−1 β 00 X0(−1) R22 V(−1) , n−1 β 00 X0(−1) R22 (lρ ⊗ In ) zγ 0 , n−1 β 00 X0(−1) R22 (lρ ⊗ In ) μ, n−1 V0(−1) R22 (lρ ⊗ In ) zγ 0 , and n−1 V0(−1) R22 (lρ ⊗ In ) μ. The first three terms converge in probability to their expectations by Lemma B.11. We can show the other terms converge in probability to their expectations by similar arguments to those used in proving Lemmas B.9-B.11. Lemma B.13 Let R be an n (T + 1) × n (T + 1) nonstochastic matrix that is uniformly bounded in both row and column sums, e.g., In(T +1) , Ω∗−1 (IT∗ ⊗ A) , Ω∗−1 (JT∗ ⊗ In ) , or Ω∗−1 (KT∗ ⊗ In ) . Then ³ ³¡ ´´ ¢ p −1 /n, 01×k )0 , n−1 X ∗0 RΩ∗−1 u∗ → (00p×1 , 00q×1 , limn→∞ σ 2v tr RΩ∗−1 ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B 0 B) where



Jρ∗ = ⎝

0

01×T





⎠ , lρ∗ = ⎝

01×1





⎠ , ι∗T = ⎝

1/ (1 − ρ)

⎞ ⎠

ιT jρ Cρ lρ ¡ ¢ ¡ ¢ p and jρ0 = 1, ρ, ..., ρT −1 / 1 − ρ2 . In particular, when R = In(T +1) , we have: n−1 X ∗0 Ω∗−1 u∗ → 0(p+q+k+1)×1 and E(X ∗0 Ω∗−1 u∗ ) = 0. Proof. Recall ⎛ X∗ = ⎝

0n×p

0n×q

0n×1

X

Z

Y−1

x e

0nT ×k





⎠, u e∗ = ⎝

y0 − x eπ 0

Y − Xβ 0 − Zγ 0 − Y−1 ρ0



⎠.

By the strict exogeneity of X and Z, we can readily show that [0p×n X 0 ] RΩ∗−1 u∗ , [0q×n Z 0 ] RΩ∗−1 u∗ p

p

and [e x0 0k×nT ] RΩ∗−1 u∗ have expectation zero, n−1 [0p×n X 0 ] RΩ∗−1 u∗ → 0, n−1 [0q×n Z 0 ] RΩ∗−1 u∗ → p

x0 0k×nT ] RΩ∗−1 u∗ → 0. Let 0, and n−1 [e ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 0 0 0 n×1 n×p n×1 ∗ ⎠ , X∗ = ⎝ ⎠ , and V∗ = ⎝ ⎠. Y−1 =⎝ Y−1 X(−1) V(−1) 38

Then we are left to show ³ ³¡ n ´´o p ¢ −1 ∗0 RΩ∗−1 u∗ − σ2v tr R ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B 0 B) → 0, and n−1 Y−1 ³ ³¡ ´´ ¢ ¢ ¡ ∗ −1 RΩ−1 u = σ 2v tr R ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B 0 B) . E Y−1

Using (B.2), Y−1 = X(−1) β 0 + (lρ ⊗ In ) zγ 0 + (lρ ⊗ In ) μ + V(−1) , we have ∗0 RΩ∗−1 u∗ Y−1

¡ ¢ = μ0 lρ∗0 ⊗ In RΩ∗−1 u∗ + V∗0 RΩ∗−1 u∗

4 X ¡ ¢ +β 00 X∗ RΩ∗−1 u∗ + γ 00 z lρ∗0 ⊗ In RΩ∗−1 u∗ ≡ Anj . j=1

It is easy to show that E (Anj ) = 0 and n−1 Anj = op (1) for j = 3, 4. So we will show that ³ ³¡ ´´ ¢ −1 + op (n) . Let An1 + An2 = σ2v tr R ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B 0 B) ⎛ P ⎞ ∞ j −1 ρ B v −j j=0 ⎠. v∗ = ⎝ ¡ ¢ IT ⊗ B −1 v

Noting that u∗ = (u00 , u0 )0 , where u0 = ξ + μ/ (1 − ρ) + ¢ ¡ IT ⊗ B −1 v, we have

P∞

j=0

ρj B −1 v−j , and u = (ιT ⊗ In ) μ +

£ ¡ ¤ £ ¢ ¡¡ ¢ ¢¤ E (An1 ) = E μ0 lρ∗0 ⊗ In RΩ∗−1 (ι∗T ⊗ In ) μ = φμ σ 2υ tr RΩ∗−1 ι∗T lρ∗0 ⊗ In

and

h ³ ´i £ ¤ −1 , E (An2 ) = E V∗0 RΩ∗−1 v ∗ = σ2υ tr RΩ∗−1 Jρ∗ ⊗ (B 0 B)

where we have used (3.35) and the fact that E (v ∗ V∗0 ) = Jρ∗ ⊗ (B 0 B)

−1

. Hence

³¡ ´´ ³ ¢ −1 . E (An1 + An2 ) = σ 2v tr RΩ∗−1 ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B 0 B)

¡ ¢ We can show that E A2nj = O (n) for j = 1, 2. The first conclusion then follows from the Chebyshev ³ ³¡ ´´ ¢ −1 inequality. When R = In(T +1) , tedious calculation shows that tr Ω∗−1 ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B 0 B) = 0 and we can also verify that E (Ani ) = 0 for i = 3, 4. This completes the proof. Lemma B.14 Suppose that the conditions in Theorem 4.4 are satisfied. Then d

1 X ∗0 Ω∗−1 u∗ → N (0, Γrr,1 ) , where Γrr,11 = p limn→∞ (nT )−1 X ∗0 Ω∗−1 X ∗ . 1) √nT 1 2) √nT

∂Lrr (ς 0 ) d → ∂ς

N (0, Γrr ) .

39

Proof. For 1), by the Cramer-Wold device, it suffices to show that for any c = (c01 , c02 )0 ∈ d

Rp+q+1 × Rk such that kck = 1, (nT )−1/2 c0 X ∗0 Ω∗−1 u∗ → N (0, c0 Γrr,11 c) . Write −1/2 0

−1/2

c X ∗0 Ω∗−1 u∗ = (nT )

Tn ≡ (nT )

h

i e 0 ω 21 u0 + c01 X e 0 ω 22 u + c02 x e0 ω 11 u0 + c02 x e0 ω 12 u . c01 X

Analogous to the proof of Lemma B.8, we can write Tn as the summation of six asymptotically P6 0 independent terms, namely, Tn = i=1 Tni , where Tni s are linear and quadratic functions of μ,

linear and quadratic functions of v, linear function of μ and v, linear function of μ and e ξ, linear function of v and e ξ, and linear function of e ξ, respectively. Further

and E

p d {Tni − E (Tni )} / Var (Tni ) → N (0, 1) ,

³P 6

´ 0 T s, we have = 0. Now by the asymptotic independence of Tni ni i=1 6 1 X d √ Tni → N Tn = nT i=1

implying that (nT )

−1/2

Ã

0, lim (nT ) n→∞

−1

6 X

!

Var (Tni ) ,

i=1

d

−1

X ∗0 Ω∗−1 u∗ → N (0, Γrr,11 ) because we can show that (nT )

(X ∗0 Ω∗−1 X ∗

−Var(X ∗0 Ω∗−1 u∗ )) = op (1) . The proof of 2) is similar and thus omitted. The next three lemmas are used in the proof of Theorem 4.6 for the fixed effects model. Lemma B.15 For D1 , D2 = Ω†−1 , Ω†−1 Ω†λ Ω†−1 , Ω†−1 Ω†cm Ω†−1 or Ω†−1 Ω†φe Ω†−1 , ¡ £ ¢¤ 1) n−1 ∆u0 D1 Ω† D2 ∆u − σ 2v tr D1 Ω† D2 Ω† = op (1) , ¡ ¢¤ £ 2) n−1 ∆X 0 D1 Ω† D2 ∆X − E ∆X 0 D1 Ω† D2 ∆X = op (1) .

Proof. Let R = D1 Ω† D2 . Note that R is uniformly bounded in both row and column sums. Since ¡ ¢ ¢ ¡ E (∆u0 R∆u) = σ 2v tr RΩ† , by the Chebyshev inequality 1) follows provided Var n−1 ∆u0 R∆u = ´0 ³ Pm−1 0 0 0 , ∆v(1) . Then o (1) . Let ∆v(0) = Bζ + j=0 ρj ∆v1−j , ∆v(1) = (∆v20 , ....∆vT0 ) , and ∆v = ∆v(0) ³ ³ ´ ¡ ´ ¡ ¢ ¢ 0 0 e e ≡ In ⊗ B −1 R In ⊗ B −1 . where R ∆u0 R∆u = ∆v 0 In ⊗ B −1 R In ⊗ B −1 ∆v = ∆v 0 R∆v, ⎞ ⎛ R00 R01 ⎟ ⎜ n×n n×n(T −1) e similarly. Let C be a (T − 1) × T ⎟ and partition R Now write R = ⎜ ⎠ ⎝ R10 R11 n(T −1)×n

n(T −1)×n(T −1)

matrix with Cij = −1 if i = j, Cij = 1 if j = i + 1, and Cij = 0 otherwise. Then ∆v(1) = (C ⊗ In ) v, where v = (v10 , ..., vT0 )0 . So e ∆v 0 R∆v

0 e 0 e 0 0 (R01 + R10 ) ∆v(1) = ∆v(0) R00 ∆v(0) + ∆v(1) R11 ∆v(1) + ∆v(0)

0 e 0 e11 (C ⊗ In ) v + ∆v 0 (R01 + R10 = ∆v(0) ) (C ⊗ In ) v R00 ∆v(0) + v 0 (C 0 ⊗ In ) R (0)

40

Then, Var (∆u0 R∆u) ´ ³ ´ ´ ³ ³ 0 e 0 e11 (C ⊗ In ) v + Var ∆v 0 (R01 + R10 ) (C ⊗ I ) v R00 ∆v(0) + Var v 0 (C 0 ⊗ In ) R = Var ∆v(0) n (0) ´ ³ 0 e e11 (C ⊗ In ) v R00 ∆v(0) , v 0 (C 0 ⊗ In ) R +2Cov ∆v(0) ´ ³ 0 e 0 0 (R01 + R10 ) (C ⊗ In ) v R00 ∆v(0) , ∆v(0) +2Cov ∆v(0) ´ ³ 0 e11 (C ⊗ In ) v, ∆v 0 (R01 + R10 ) (C ⊗ In ) v . +2Cov v 0 (C 0 ⊗ In ) R (0)

By the Cauchy-Schwartz inequality it suffices to show that the first three terms on the right hand side ¢ ¡ ¢ ¡ are O (n) since then Var n−1 ∆u0 R∆u = O n−1 = o (1) . Write ∆v(0) = Bζ + v1 − ρm−1 v−m+1 + Pm−2 j 0e j=0 ρ (ρ − 1) v−j . Since B R00 B is uniformly bounded in both row and column sums, n h³ ´ ´ i2 ³ ³ ´ ´ ³ X 0 e00 e00 Bζ = κζ e00 B e00 BB 0 R e00 + R + σ 4ζ tr B 0 R B0R B = O (n) . Var ζ 0 B 0 R ii

i=1

´ ³¡ ³ ¢ ¡ ¢´ e00 v1 = O (n) , Var ρm−1 v−m+1 0 R e00 ρm−1 v−m+1 = O (n) , and Var(Pm−2 Similarly Var v10 R j=0 Pm−2 j j 0 e ρ (ρ − 1) v−j R00 j=0 ρ (ρ − 1) v−j ) = O (n) . It follows by the Cauchy-Schwartz inequality that ´ ´ ³ ³ 0 e e11 (C ⊗ In ) v R00 ∆v(0) = O (n) . By the same token, we can show that Var v 0 (C 0 ⊗ In ) R Var ∆v(0) ´ ³ 0 0 (R01 + R10 ) (C ⊗ In ) v = O (n) . This completes the proof of 1). The proof = O (n) , and Var ∆v(0) of 2) is similar and thus omitted.

Lemma B.16 Let R be an nT × nT nonstochastic matrix that is uniformly bounded in both row p

and column sums, e.g., In(T +1) , Ω†−1 , Ω†−1 Ω†λ , Ω†−1 Ω†cm or Ω†−1 Ω†φe . Then n−1 ∆X 0 RΩ†−1 ∆u → n h io −1 (00p×1 , 00q×1 , limn→∞ σ 2υ tr RΩ†−1 C1 ⊗ (B 0 B) + C2 ⊗ In /n, 0)0 , where C1 is a T × T matrix defined by



⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ C1 = ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝

2ρ 1+ρ

³

2ρ 1+ρ

´ −1 ···

³ ´ ³ ´ 2ρ 2ρ ρT −4 ρ 1+ρ −1 ρT −3 ρ 1+ρ −1 ¡ ¢ ¡ ¢ ρT −5 2ρ − 1 − ρ2 ρT −4 2ρ − 1 − ρ2 ¡ ¢ ¡ ¢ ρT −6 2ρ − 1 − ρ2 ρT −5 2ρ − 1 − ρ2 .. .. . .

0

2 1+ρ

0

−1

2−ρ

2ρ − 1 − ρ2

···

0 .. .

0 .. .

−1 .. .

2−ρ .. .

··· .. .

0

0

0

0

···

−1

2−ρ

0

0

0

0

···

0

−1

−1 ρ

⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠

¡ ¢ and C2 is a T ×T matrix whose first row is 0, φe , ρφe , ..., ρT −2 φe and all other row elements are zero. p

In particular, when R = InT , we have n−1 ∆X 0 Ω†−1 ∆u → 0(p+k+1)×1 and E(∆X 0 Ω†−1 ∆u) = 0. 41

³ ´0 ¡ 0 ¢0 0 0 Proof. Let ∆X ∗ = (0p×n , ∆x02 , ..., ∆x0T ) , ∆Y ∗ = 01×n , ∆y1 , ..., ∆yT0 −1 , ∆e x∗ = ∆e x , 0k×n(T −1) . ¡ ¢ x∗ ) . By the notation in the proof of Lemma B.15, ∆u = In ⊗ B −1 ∆v. Then ∆X = (∆X ∗ , ∆Y ∗ , ∆e

x∗ RΩ†−1 ∆u By the strict exogeneity of X and Z, we can readily show that ∆X ∗ RΩ†−1 ∆u, and ∆e p

p

x∗ RΩ†−1 ∆u → 0. We are left to show have expectation zero, n−1 ∆X ∗ RΩ†−1 ∆u → 0, and n−1 ∆e n n h ioo p −1 → 0, n−1 ∆Y ∗0 RΩ†−1 ∆u − σ 2υ tr RΩ†−1 C1 ⊗ (B 0 B) + C2 ⊗ In

n h io ¢ ¡ −1 and E ∆Y ∗0 RΩ†−1 ∆u = σ 2υ tr RΩ†−1 C1 ⊗ (B 0 B) + C2 ⊗ In . ³ ´ ¡ ¢0 PT −3 0 0 Let kρ = 0, 1, ρ, ..., ρT −2 , X = 01×n , 01×n , (∆x2 β 0 ) , ..., j=0 ρj (∆xT −1−j β 0 ) , and V = ´ ³ P −3 j ρ (∆vT −1−j )0 . Since ∆y1 = ∆e xπ + e and 01×n , 01×n , (∆v2 )0 , ..., Tj=0 ∆yt = ρt−1 ∆y1 +

t−2 X

ρj ∆xt−j β +

j=0

t−2 X

ρj B −1 ∆vt−j for t = 2, 3, ...,

(B.6)

j=0

we have ¡ ¢ ∆Y ∗ = kρ ⊗ ∆y1 + X + IT ⊗ B −1 V.

So ∗0

†−1

∆Y RΩ

0

†−1

∆u = X RΩ

3 X ¡ 0 ¢ ¡ ¢ 0 †−1 0 0−1 †−1 ∆u + kρ ⊗ ∆y1 RΩ ∆u + V IT ⊗ B Anj . RΩ ∆u ≡ j=1

It is easy to show that E (An1 ) = 0 and n−1 An1 = op (1) . Now after some tedious calculations we can show that © £¡ ¢ ¡ ¢¤ª E (An2 + An3 ) = E RΩ†−1 ∆u kρ0 ⊗ ∆y10 + V 0 IT ⊗ B 0−1 n h io −1 = σ 2υ tr RΩ†−1 C1 ⊗ (B 0 B) + C2 ⊗ In .

¢ ¡ and that E A2nj = O (n) for j = 2, 3. The first conclusion then follows from the Chebyshev inequal-

ity.

When R = InT , write ³ ³ ´´ ¡ ¢ −1 E (An2 + An3 ) = σ 2υ tr Ω†−1 C1 ⊗ (B 0 B) + σ 2υ tr Ω†−1 (C2 ⊗ In ) ≡ An2 + An3 .

−1 Using (3.39) and the explicit expressions for h−1 0 and h1 , we can show that

where c =

PT −1 j=1

¡ ¡ ¢ ¢ An2 = −cφe σ2υ tr E ∗−1 BB 0 and An3 = −cφe σ 2υ tr E ∗−1 BB 0 ,

(T − j) ρj−1 . Consequently, E (An2 + An3 ) = 0. This completes the proof. 42

Lemma B.17 Suppose that the conditions in Theorem 4.6 are satisfied. Then d

1 ∆X 0 Ω†−1 ∆u → N (0, Γf,11 ) , where Γf,11 = p limn→∞ (nT )−1 ∆X 0 Ω†−1 ∆X. 1) √nT 1 2) √nT

∂Lf (ς 0 ) d → ∂ς

N (0, Γf ) . 0

Proof. For 1), by the Cramer-Wold device, it suffices to show that for any c = (c01 , c2 , c03 ) ∈ d

−1/2

∆X 0 Ω†−1 ∆u → N (0, c0 Γf,11 c) . As in the proof of Lemma ³ ´0 ¡ 0 ¢0 0 0 x∗ = ∆e x , 0k×n(T −1) . B.16, let ∆X ∗ = (0p×n , ∆x02 , ..., ∆x0T ) , ∆Y ∗ = 01×n , ∆y1 , ..., ∆yT0 −1 , ∆e Rp × R × Rk such that kck = 1, (nT )

Then

Tn ≡ ∆X 0 Ω†−1 ∆u = (c01 ∆X ∗ + c02 ∆Y ∗ + c03 ∆x∗ )Ω†−1 ∆u. By the proof of Lemma B.16. ¡ ¢ ∆Y ∗ = X + (kρ ⊗ ∆y1 ) + IT ⊗ B −1 V.

¡ 0 ¢ P 0 j −1 Let V ∗ = v−m+1 , v−m+2 , ..., vT0 . We can write ∆y1 = ∆e xπ + ζ + m−1 ∆v1−j = ∆e xπ + ζ + j=0 ρ B

C1 V ∗ , V =C2 V ∗ and ∆u = C4 ζ + C3 V ∗ for some matrices Ci , i = 1, 2, 3, 4. Consequently, ∆Y ∗ = X † + kρ ⊗ ζ + C5 V ∗ ,

¡ ¢ 0 where X † = X +kρ ⊗ (∆e xπ) , and C5 = (kρ,1 C10 , ..., kρ,T C10 ) + IT ⊗ B −1 C2 . As a result, we can write

Tn

¡ ¢ = (c01 ∆X ∗ + c02 X † + kρ ⊗ ζ + C5 V ∗ + c03 ∆x∗ )Ω†−1 (C4 ζ + C3 V ∗ ) ¢ ¡ = a01 ζ + ζ 0 A1 ζ + (a02 V ∗ + V ∗0 A2 V ∗ ) + ζ 0 A3 V ∗ , ≡ Tn1 + Tn2 + Tn3 ,

where ai and Ai are vectors or matrices that involve ∆X ∗ , X † , ∆x∗ , and the nonstochastic matrices Cj (j = 3, 4, 5). By Lemma B.7,

By the facts that E have

³P 3

i=1

p d {Tni − E (Tni )} / Var (Tni ) → N (0, 1) . Tni

´

= 0 and that Tni , i = 1, 2, 3, are asymptotically independent, we

3 1 X 1 d √ Tn = √ Tni → N nT nT i=1 −1/2

implying that (nT )

Ã

0, lim (nT )

d

n→∞

−1

3 X

!

Var (Tni ) ,

i=1

−1

∆X 0 Ω†−1 ∆u → N (0, Γf,11 ) because we can show that (nT )

−Var(∆X 0 Ω†−1 ∆u)) = op (1) . The proof of 2) is similar and thus omitted. 43

(∆X 0 Ω†−1 ∆X

The next three Lemmas are used in simplifying the covariance-covariance matrix of the score function. Lemma B.18 Suppose that the conditions in Theorem 4.2 are satisfied. Then ⎛ ¡ ¢ ⎞ ¡ 3 ¢ 0 E μ31 X 0 gqn ,1 diag (Gpn ,1 ) + E v11 X gqn ,2 diag (Gpn ,2 ) ⎟ ´ ⎜ ³ ¡ 3 ¢ 0 ¡ ¢ ⎟ e 0 qn uu0 pn u = ⎜ E X ⎜ E μ31 Z 0 gqn ,1 diag (Gpn ,1 ) + E v11 Z gqn ,2 diag (Gpn ,2 ) ⎟ , ⎝ ⎠ ¢ ¡ 0 qn uu0 pn u E Y−1 ¢ ¡ 0 ¢ ¡ ¢ ¡ 3 ¢ 0 ¡ 0 qn uu0 pn u = σ 2v E Y−1 qn Ωpn u +E μ31 Q0n gqn ,1 diag (Gpn ,1 )+E v11 Qn gqn ,2 diag (Gpn ,2 ) where E Y−1 ¢ ¢ ¡ ¢¢ Pn ¡¡ 0 PnT ¡ +κμ i=1 lρ ⊗ In qn (ιT ⊗ In ) ii Gpn ,1ii + κv i=1 Av qn IT ⊗ B −1 ii Gpn ,2ii , Qn = E[Ax X 0 β 0 ¡ ¢ + (lρ ⊗ In ) zγ 0 + Y0 ], gqn ,1 = qn (ιT ⊗ In ) , gqn ,2 = qn IT ⊗ B −1 , and Ax , Av , Y0 and lρ are defined in (B.2) and (B.5).

Proof. The proof is similar to that of Lemma B.4. Lemma B.19 Let qn and pn be n (T + 1) × n (T + 1) symmetric nonstochastic matrix. Suppose that the conditions in Theorem 4.4 are satisfied. Then ¡ ¢ ¡ ¢ ¡ ¢ 1) E (X ∗0 qn u∗ u∗0 pn u∗ ) = E μ31 X ∗0 qn d1 diag (d01 pn d1 ) + E ζ 31 X ∗0 qn d2 diag (d02 pn d2 ) + E a32 ¡ 3 ¢ ∗0 X qn d4 diag (d04 pn d4 ) , X ∗0 qn d3 diag (d03 pn d3 ) + E v11 P P 2) E (u∗0 qn u∗ u∗0 pn u∗ ) = σ 4v tr (Ω∗ qn ) tr (Ω∗ pn )+κμ ni=1 (d01 qn d1 )ii (d01 pn d1 )ii +κζ ni=1 (d02 qn d2 )ii P P × (d02 pn d2 )ii + κa2 ni=1 (d03 qn d3 )ii (d03 pn d3 )ii + κv ni=1 (d04 qn d4 )ii (d04 pn d4 )ii , where d0i s and a2 are defined in (B.8). Proof. Write ⎛

⎞ −1 + B a 2 u∗ = ⎝ ¡ ¢ ⎠ = d1 μ + d2 ζ + d3 a2 + d4 v, (ιT ⊗ In ) μ + IT ⊗ B −1 v

where d1

d4



= ⎝ ⎛

= ⎝

ζ+

1 1−ρ

ιT 01×1 0T ×1

μ 1−ρ





⎠ ⊗ In , d2 = ⎝ 01×T IT



1 0T ×1





⎠ ⊗ In , d3 = ⎝

⎠ ⊗ B −1 , and a2 =

∞ X j=0

ρj v−j .

1 0T ×1

(B.7)



⎠ ⊗ B −1 , (B.8)

Using the fact that μ, ζ, a2 , and v are mutually independent, we can readily show 1) and 3). For

44

example, for 3), we can apply Lemma B.3 to each term in (B.7) to get E (u∗0 qn u∗ u∗0 pn u∗ ) = σ 4μ [tr (d01 qn d1 ) tr (d01 pn d1 ) + 2tr {d01 qn d1 d01 pn d1 }] + κμ

n X

i=1 n X

+σ 4ζ [tr (d02 qn d2 ) tr (d02 pn d2 ) + 2tr {d02 qn d2 d02 pn d2 }] + κζ +

σ 4v (1 −

ρ2 )2

= σ 4v tr (Ω∗ qn ) tr (Ω∗ pn ) + κμ

n X

i=1

i=1

n X

(d01 qn d1 )ii (d01 pn d1 )ii + κζ

i=1

+κa2

(d02 qn d2 )ii (d02 pn d2 )ii

[tr (d03 qn d3 ) tr (d03 pn d3 ) + 2tr {d03 qn d3 d03 pn d3 }] + κa2

+σ 4v [tr (d04 qn d4 ) tr (d04 pn d4 ) + 2tr {d04 qn d4 d04 pn d4 }] + κv

n X

(d01 qn d1 )ii (d01 pn d1 )ii

(d03 qn d3 )ii (d03 pn d3 )ii + κv

n X

(d03 qn d3 )ii (d03 pn d3 )ii

i=1

(d04 qn d4 )ii (d04 pn d4 )ii

i=1 n X

(d02 qn d2 )ii (d02 pn d2 )ii

i=1

n X

(d04 qn d4 )ii (d04 pn d4 )ii .

i=1

Lemma B.20 Let qn and pn be nT × nT symmetric nonstochastic matrix. Suppose that the condix∗ ) as in the proof of Lemma B.16. tions in Theorem 4.6 are satisfied. Write ∆X = (∆X ∗ , ∆Y ∗ , ∆e Then 1) E (∆X 0 qn ∆u∆u0 pn ∆u) ⎛ ¡ ¢ ¡ ¢ ¡ ¢ ¡ 3 ¢ (1) (2) E ζ 31 ∆X ∗0 qn diag (p11 ) +E ve13 ∆X ∗0 qn diag B 0−1 p11 B −1 +E v11 ∆X ∗0 qn d diag (d0 pn d) ⎜ ⎜ =⎜ E (∆Y ∗0 qn ∆u∆u0 pn ∆u) ⎝ ¡ ¢ ¡ ¢ ¡ 3 ¢ ¡ ¢ (1) (2) x∗0 qn diag (p11 ) +E ve13 ∆e x∗0 qn diag B 0−1 p11 B −1 +E v11 ∆e x∗0 qn d diag (d0 pn d) E ζ 31 ∆e ¡ ¢ ¡ ¢ ¢ P P n n ¡ 2) E (∆u0 qn ∆u∆u0 pn ∆u) = σ 4v tr Ω† qn tr Ω† pn +κζ i=1 q11,ii p11,ii +κvh i=1 B 0−1 q11 B −1 ii ¡ 0−1 ¢ Pn B p11 B −1 ii + κv i=1 (d0 qn d)ii (d0 pn d)ii . ¡ ¢ ¡ ¢ ¡ 3 ¢ 0 Qn gqn ,2 where E (∆Y ∗0 qn ∆u∆u0 pn ∆u) = σ 2v E ∆Y ∗0 qn Ω† pn ∆u +E μ31 Q0n gqn ,1 diag (Gpn ,1 )+E v11 ¢ ¢ ¡ ¢¢ Pn ¡¡ 0 PnT ¡ −1 diag (Gpn ,2 ) +κμ i=1 lρ ⊗ In qn (ιT ⊗ In ) ii Gpn ,1ii +κv i=1 Av qn IT ⊗ B G , Qn = ii pn ,2ii

E[Ax X 0 β 0 + (lρ ⊗ In ) zγ 0 + Y0 ], and Ax , Av , Y0 and lρ are defined in (B.2) and (B.5). Pm (1) (2) 0 , 0n×n(T −1) )0 , qn = ((q11 B −1 )0 , (q21 B −1 )0 )0 , q11 (p11 ) is the vei = j=0 ρj ∆vi,1−j , qn = (q11

upper left n × n submatrix of qn (pn ), q21 is the lower left n (T − 1) × n submatrix of qn , d =

(0T ×1 , C 0 )0 ⊗ B −1 , and C is defined in the proof of Lemma B.15.

Proof. The proof is similar to that of Lemma B.19 and thus omitted.

45



⎟ ⎟ ⎟, ⎠

C

Proofs of the Theorems

Proof of Theorem 4.1 By Theorem 3.4 of White (1994), it suffices to show that p

r (nT )−1 [Lr∗ c (δ) − Lc (δ)] → 0 uniformly in δ ∈ Λ,

(C.1)

and (nT ) lim sup max c n→∞ ρ∈N (ρ0 )

−1

r∗ [Lr∗ c (δ) − Lc (δ 0 )] < 0 for any

> 0,

(C.2)

where N c (δ 0 ) is the complement of an open neighborhood of δ 0 on Λ of diameter . By (3.5) and 1 r e2v (δ) − ln σ b2v (δ)). To show (C.1) , it is sufficient to show (4.3) , (nT )−1 [Lr∗ c (δ) − Lc (δ)] = − 2 (ln σ

By (3.4) , we can write

where M = InT

e2v (δ) = op (1) uniformly on Λ. σ b2v (δ) − σ

(C.3)

1 1 0 −1/2 0 e (δ) = M Ω−1/2 Y, u e (δ) Ω−1 u Y Ω nT nT ³ ´ e X e 0 Ω−1 X e X e 0 Ω−1/2 . Noting that M Ω−1/2 X e = 0, we have σ − Ω−1/2 X b2v (δ) = σ b2v (δ) =

(nT )−1 u0 Ω−1/2 M Ω−1/2 u. Then by (4.2) , we have e2v (δ) = σ b2v (δ) − σ

³ ´i 1 h 0 −1/2 M Ω−1/2 u − σ2v0 tr Ω−1/2 M Ω−1/2 Ω0 uΩ nT ´ ¡ ¢i σ2 h ³ + v0 tr Ω−1/2 M Ω−1/2 Ω0 − tr Ω−1 Ω0 nT

≡ Tn1 + Tn2 .

We can readily show that both Tn1 and Tn2 are op (1) uniformly on Λ. To show (C.2) , we follow Lee (2002) and define an auxiliary process: YnT = UnT , where UnT ∼ ¢ ¡ ¡ ¢ N 0, σ 2v Ω with Ω = Ω (δ) . The true parameter value is given by σ 2v0 , δ 0 . Let Ω0 = Ω (δ 0 ) . The ¡ ¢ nT 1 2 log likelihood function of the auxiliary process is Lra δ, σ 2v = − nT 2 log(2π) − 2 log(σ v ) − 2 log |Ω| − ¡ ¢ 1 0 −1 r 2 UnT . We can verify that Lr∗ c (δ) = maxσ 2v Ea La δ, σ v , where Ea is the expectation 2σ 2v UnT Ω ¡ ¢ r 2 r∗ under the auxiliary process. By the Jensen inequality, Lr∗ c (δ) ≤ Ea La δ 0 , σ v0 = Lc (δ 0 ) for all δ. Suppose the identification uniqueness condition in (C.2) is not satisfied, then there would exist a −1

sequence δ n ∈ Λ such that δ n → δ ∗ 6= δ 0 , and limn→∞ (nT ) would contradict to Assumption R(iv). This completes the proof of the theorem. ¥ 46

∗ r∗ [Lr∗ c (δ ) − Lc (δ 0 )] = 0. The latter

Proof of Theorem 4.2 By the Taylor series expansion 1 ∂Lr (bς ) 1 ∂Lr (ς 0 ) 1 ∂ 2 Lr (eς ) √ nT (b ς − ς 0) , 0= √ =√ + ∂ς nT ∂ς∂ς 0 nT ∂ς nT where elements of e ς lie in the segment joining the corresponding elements of b ς and ς 0 . Thus ∙ ¸−1 √ 1 ∂ 2 Lr (eς ) 1 ∂Lr (ς 0 ) √ nT (b ς − ς 0) = − . nT ∂ς∂ς 0 ∂ς nT p

p

ς → ς 0 , and it suffices to show that By Theorem 4.1, bς → ς 0 . Consequently, e 1 ∂ 2 Lr (ς 0 ) 1 ∂ 2 Lr (e ς) − = op (1) , 0 nT ∂ς∂ς nT ∂ς∂ς 0

(C.4)

1 ∂ 2 Lr (ς 0 ) p → Σr , nT ∂ς∂ς 0

(C.5)

and 1 ∂Lr (ς 0 ) d √ → N (0, Σr + Λr ) . (C.6) ∂ς nT ³ ´ p p e ,λ e → e≡Ω φ Ω0 . By Lemmas B.2, B.5 and B.6, and the fact that As e ς → ς 0 , it follows that Ω μ ´ ³ ´ ³ ee e θ0 − e θ , (C.4) holds for each of its component, e.g., ∂ 2 Lr (eς ) /∂θ∂θ0 . The u e θ = Y −X θ = u+X

result in (C.5) follows from Lemmas B.5 and B.6. (C.6) is proved in Lemma B.8. ¥

Proof of Theorem 4.3 The proof is almost identical to that of Theorem 4.1 and thus omitted. ¥ Proof of Theorem 4.4 The proof is analogous to that of Theorem 4.2 and now follows mainly from Lemmas B.12-B.14.¥ Proof of Theorem 4.5 The proof is almost identical to that of Theorem 4.1 and thus omitted. ¥ Proof of Theorem 4.6 The proof is analogous to that of Theorem 4.2 and now follows mainly from Lemmas B.15-B.17.¥

47

ρ Mean

Table 1. Monte Carlo Mean and RMSE for the QMLEs Random Effects Model with Normal Errors, n = 50, T = 3 λ = .25 λ = .50 λ = 0.75 Rmse Mean Rmse Mean Rmse Mean Rmse Mean Rmse Mean

Initial observations are exogenous .25 5.000 0.147 4.989 0.148 1.000 0.024 1.000 0.026 0.998 0.166 0.995 0.166 0.250 0.016 0.252 0.016 0.224 0.130 0.151 0.157 0.484 0.144 0.287 0.275 0.492 0.036 0.608 0.114 .50 4.999 0.144 4.982 0.147 1.000 0.026 0.999 0.030 1.003 0.170 1.000 0.170 0.500 0.014 0.503 0.014 0.230 0.125 0.132 0.177 0.485 0.147 0.174 0.354 0.493 0.037 0.657 0.163 .75 4.992 0.142 4.997 0.143 0.999 0.024 0.999 0.028 1.007 0.167 1.008 0.169 0.750 0.013 0.749 0.012 0.213 0.123 0.101 0.197 0.491 0.158 0.069 0.436 0.491 0.038 0.686 0.192

5.005 1.001 1.000 0.250 0.470 0.497 0.494 5.000 1.000 1.000 0.499 0.466 0.494 0.492 5.007 1.000 0.999 0.750 0.472 0.496 0.492

0.171 0.024 0.176 0.020 0.103 0.146 0.036 0.171 0.024 0.168 0.018 0.114 0.149 0.037 0.162 0.023 0.160 0.018 0.108 0.157 0.038

4.989 1.001 0.997 0.254 0.351 0.309 0.621 4.978 1.000 0.995 0.504 0.307 0.181 0.668 5.018 1.000 1.002 0.747 0.276 0.071 0.704

0.173 0.026 0.177 0.021 0.183 0.266 0.127 0.175 0.029 0.168 0.019 0.229 0.350 0.174 0.161 0.029 0.163 0.017 0.254 0.434 0.210

4.998 1.000 1.004 0.250 0.723 0.504 0.494 4.993 1.000 1.004 0.500 0.722 0.491 0.495 4.997 1.000 1.008 0.750 0.723 0.498 0.493

0.224 0.024 0.173 0.025 0.075 0.160 0.038 0.229 0.023 0.167 0.024 0.078 0.159 0.039 0.232 0.023 0.173 0.024 0.077 0.160 0.038

4.974 1.000 0.999 0.256 0.605 0.317 0.635 4.965 0.999 0.998 0.507 0.563 0.189 0.687 5.016 1.000 1.013 0.746 0.523 0.072 0.730

Rmse 0.230 0.026 0.176 0.029 0.171 0.268 0.141 0.240 0.029 0.172 0.029 0.213 0.346 0.193 0.236 0.030 0.175 0.024 0.253 0.433 0.237

Initial observations are endogenous .25 4.922 0.220 4.945 0.211 4.932 0.213 4.948 0.207 4.944 0.246 4.950 0.244 1.004 0.024 1.009 0.025 1.002 0.022 1.007 0.023 1.002 0.023 1.006 0.023 0.976 0.166 0.986 0.165 0.983 0.178 0.998 0.178 0.972 0.173 0.992 0.171 0.262 0.027 0.258 0.025 0.259 0.024 0.256 0.023 0.259 0.023 0.256 0.022 0.239 0.125 0.223 0.128 0.481 0.101 0.468 0.104 0.731 0.070 0.720 0.074 0.464 0.157 0.546 0.182 0.480 0.152 0.562 0.183 0.473 0.157 0.553 0.185 0.495 0.038 0.570 0.082 0.493 0.038 0.567 0.079 0.495 0.039 0.570 0.083 .50 4.774 0.349 4.910 0.275 4.804 0.324 4.922 0.266 4.830 0.339 4.925 0.296 1.006 0.024 1.009 0.025 1.006 0.024 1.009 0.024 1.006 0.023 1.008 0.024 0.937 0.184 0.982 0.177 0.933 0.180 0.983 0.170 0.944 0.183 0.999 0.177 0.523 0.033 0.509 0.025 0.520 0.030 0.507 0.023 0.518 0.028 0.506 0.022 0.237 0.122 0.227 0.125 0.479 0.102 0.471 0.103 0.728 0.071 0.721 0.072 0.433 0.162 0.540 0.181 0.431 0.159 0.535 0.177 0.448 0.168 0.550 0.197 0.497 0.038 0.569 0.081 0.498 0.038 0.570 0.082 0.499 0.040 0.571 0.084 .75 4.147 0.936 4.869 0.455 4.183 0.900 4.884 0.435 4.254 0.858 4.900 0.457 1.011 0.025 1.010 0.025 1.012 0.025 1.009 0.024 1.011 0.025 1.007 0.023 0.765 0.297 0.976 0.209 0.770 0.289 0.985 0.199 0.806 0.272 1.018 0.209 0.793 0.047 0.756 0.022 0.792 0.046 0.756 0.021 0.788 0.042 0.754 0.020 0.229 0.123 0.223 0.126 0.467 0.107 0.466 0.107 0.719 0.073 0.718 0.072 0.409 0.194 0.526 0.212 0.414 0.185 0.540 0.205 0.419 0.189 0.534 0.211 0.510 0.040 0.573 0.085 0.506 0.040 0.568 0.080 0.509 0.041 0.572 0.085 Note: Under each λ value, first two columns correspond to estimates treating y0 as exogenous, whereas the last two columns correspond to estimates treating y0 as endogenous. Under each ρ value, the seven rows correspond to, respectively, β0 (= 5), β1 (= 1), γ(= 1), ρ, λ, σµ (= .5) and σv (= .5).

48

Table 2. Monte Carlo Mean and RMSE for the QMLEs Fixed Effects with Normal Errors, n = 50, T = 3 ρ λ = .25 λ = .50 λ = 0.75 Mean Rmse Mean Rmse Mean Rmse The fixed effects are randomly generated .25 β1 0.980 0.031 0.982 0.031 ρ 0.236 0.028 0.238 0.027 λ 0.248 0.124 0.484 0.104 σv 0.489 0.038 0.489 0.037 .50 β1 0.975 0.036 0.977 0.035 ρ 0.481 0.033 0.484 0.030 λ 0.246 0.123 0.488 0.105 σv 0.490 0.037 0.491 0.038 .75 β1 0.964 0.044 0.967 0.043 ρ 0.719 0.042 0.722 0.039 λ 0.239 0.125 0.490 0.102 σv 0.488 0.038 0.489 0.040

0.983 0.239 0.733 0.491 0.977 0.485 0.729 0.494 0.968 0.723 0.725 0.493

0.030 0.025 0.072 0.039 0.033 0.029 0.075 0.037 0.041 0.039 0.074 0.037

The fixed effects are average (over T ) of the X values .25 β1 0.976 0.037 0.976 0.037 0.979 0.035 ρ 0.231 0.034 0.230 0.035 0.233 0.034 λ 0.245 0.124 0.485 0.101 0.727 0.075 σv 0.490 0.036 0.488 0.039 0.492 0.036 .50 β1 0.965 0.046 0.966 0.045 0.969 0.042 ρ 0.469 0.044 0.470 0.045 0.471 0.042 λ 0.245 0.121 0.485 0.104 0.729 0.073 σv 0.488 0.040 0.488 0.039 0.491 0.038 .75 β1 0.949 0.060 0.949 0.060 0.953 0.055 ρ 0.700 0.061 0.699 0.062 0.700 0.060 λ 0.243 0.123 0.474 0.113 0.731 0.072 σv 0.487 0.038 0.489 0.039 0.490 0.039 Note: β0 = 5, β1 = 1, γ = 1, σµ = .5 and σv = .5.

49

Suggest Documents