Unified M-Estimation of Fixed-Effects Spatial Dynamic Models with Short Panels∗ Zhenlin Yang School of Economics, Singapore Management University, 90 Stamford Road, Singapore 178903 Email:
[email protected]
July 27, 2017 Abstract It is well known that quasi maximum likelihood (QML) estimation of dynamic panel data (DPD) models with short panels depends on the assumptions on the initial values, and a wrong treatment of them will result in inconsistency and serious bias. The same issues apply to spatial DPD (SDPD) models with short panels. In this paper, a unified M estimation method is proposed for estimating the fixed-effects SDPD models containing three major types of spatial effects, namely spatial lag, spatial error and space-time lag. The method is free from the specification of the distribution of the initial observations and robust against nonnormality of the errors. Consistency and asymptotic normality of the proposed M -estimator are established. A martingale difference representation of the underlying estimating functions is developed, which leads to an initial-condition free estimate of the variance of the M -estimators. Monte Carlo results show that the proposed methods have excellent finite sample performance.
Key Words: Adjusted quasi score; Dynamic panels; Fixed effects; Initial-condition free estimation; Martingale difference; OPMD; Spatial effects; Short panels. JEL classifications: C10, C13, C21, C23, C15
1
Introduction
In the majority of empirical microeconometric research involving panel data, a panel with a large number of cross-sectional units and a small number of time periods, called a short panel, remains the prevalent setting (Hsiao et al., 2002; Binder et al., 2005), and evidence from the standard dynamic panel data models shows that maximum likelihood (ML) estimators are more efficient than GMM estimators (Hsiao et al., 2002; Binder et al., 2005; Bun and Caree, 2005; Gouri´eroux, et al., 2010; Kruiniger, 2013). Hsiao (2003, Ch. 4) gives an excellent summary on dynamic panel data (DPD) models with random or fixed effects. ∗
I would like to thank Anil Bera, Ingmar Prucha, Alain Pirotte, a referee, an associate editor, and Co-Editor Oliver Linton; and the participants of the VIII World Conference of the Spatial Econometrics Association, ETH Zurich, June 2014, the seminar at Shanghai University of Finance and Economics, May 2015, the seminar at Universit´e Paris II Panth´eon-Assas (CRED), June 2016, and the seminar at Northeastern University, China, December 2016, for their helpful comments. The financial support from Singapore Management University under Grant C244/MSS12E007 is gratefully acknowledged.
1
In recent years, there has been a growing interest in the DPD models with cross-sectional or spatial dependence, arising from economic processes such as housing decisions, technology adoption, unemployment, welfare participation, price decisions, etc. The resulted models are referred to as spatial DPD (SDPD) models (Anselin, 2001; Anselin et al., 2008), where the spatial effects may appear in the model either in the form of spatial lag(s) of the response variable (Yu et al., 2008; Yu and Lee, 2010; Lee and Yu, 2010a; Korniotis, 2010; Elhorst, 2010), or in the form of spatial errors (Elhorst, 2005; Yang et al., 2006; Mutl, 2006; Su and Yang, 2015). Lee and Yu (2010b, 2015) provide an excellent survey on the SDPD models. Most of the studies on the SDPD modes are either based on the GMM-type method or under a large panel set-up, except Elhorst (2010) and Su and Yang (2015) who consider the quasi-ML (QML) estimation of the SDPD model with short panels. As ML estimators are more efficient than the GMM estimators, and the latter can perform poorly (Gouri´eroux, et al., 2010), it is natural to expect that any ML-type estimation be more efficient than the corresponding GMM estimation. The main difficulty in using ML or QML method to estimate the DPD or SDPD models with short panels is the modeling of the initial observations of the response vector, say y0 , for the random effects model, or the initial differences, say Δy1 , for the fixed effects model. This is because y0 may be exogenous in the sense that it varies autonomously, independent of other variables in the model; or endogenous in the sense that it is generated in the same way as the other values of the response vector y in the latter time periods. In case that y0 is endogenous, it depends on the processes starting values and the past values of time-varying regressors, both of which are not observable, leading to incidental parameters. In the case of fixed effects model, Δy1 is endogenous whether y0 is exogenous or endogenous and this incidental parameters problem always exists. The traditional way of handling this problem is to predict these quantities using the observed values of the regressors (Anderson and Hisao, 1981, 1982; Bhargava and Sargan, 1983; Hsiao et al. 2002; Elhorst, 2010; Su and Yang, 2015).1 However, the model for the initial differences involves the unknown process starting time. Also, its predictability typically requires that the time-varying regressors be trend or first-difference stationary. And, when there are many time-varying regressors in the model, modelling the initial difference may introduce ‘too many’ additional parameters, causing a efficiency decline. Most importantly, this linear projection method may not be applicable to an SDPD model with spatial lags (see the footnote at the end of Section 2 for some details). It is therefore highly desirable to have a general method that is free from the specification of the initial conditions. In this paper, we propose a unified, initial-condition free approach to estimate the SDPD models with fixed effects, allowing all three major types of spatial dependence to be present in the model, namely, the spatial lag, space-time lag, and spatial error. The approach starts from the ‘conditional’ quasi-likelihood, with the initial differences being treated as if they are exogenous,2 and then makes corrections on the conditional quasi-score functions to give a set 1
In case of fixed effects models, the incidental parameters problem also occurs in the model itself (the fixed effects), but this problem can be resolved by first-differencing or some kind of orthogonal transformations. 2 Clearly, this ‘conditional’ quasi likelihood can never be a correct likelihood as Δy1 is always endogenous,
2
of unbiased estimating equations. Solving these unbiased estimating equations (EFs) leads to estimators that are consistent and asymptotically normal. It turns out that the corrections or the adjustments on the conditional quasi scores are totally free from the specification of the distribution of the initial differences, resulting initial-condition free estimators for the SDPS model. The proposed estimator is simply referred to in this paper as the M -estimator according to Huber (1981) or van der Vaart (1998).3 For initial-condition free inferences, a martingale difference (M.D.) representation of the EFs (being the adjusted quasi scores) is developed, and the average of the outer products of the M.D.s (OPMD) is shown to give a consistent and initial-condition free estimate of the variance covariance (VC) matrix of the EFs. This and the estimated Hessian matrix together give a consistent estimate of the VC matrix of the M -estimators, referred to as the OPMD-estimator in this paper. Monte Carlo results show that the proposed M -estimators of the model parameters and the OPMD-estimator of their VC matrix have excellent finite sample performance – robust against the way the initial observations being generated and nonnormality of the error distributions. Under a special submodel where only spatial error is present, the proposed methods are compared with the traditional method where the initial observations are modeled and the full quasi likelihood is used (Su and Yang, 2015). The results show that the two methods are comparable when the initial observations are correctly specified, but the proposed methods are more robust against misspecifications of the initial conditions. Our Monte Carlo results show that the proposed unified M -estimation method, for the FE-SDPD model with all three types of spatial effects, is not only valid when T is small, but also provides better estimators when T is not small, compared with the conditional quasi likelihood approach. The proposed OPMD method for the VC matrix estimation is valid only when T is small, but when T is large, a plug-in method based on the conditional variance of the adjusted quasi scores, treating the initial differences as exogenous, can be used. The rest of the paper is organized as follows. Section 2 describes the general SDPD model, its submodels, and discusses the limitations of the method of modelling the initial conditions with the SDPD models. Section 3 introduces the unified M -estimation method for the general SDPD model with fixed effects, presents the asymptotic properties of the proposed M -estimators, and introduces the OPMD method for VC matrix estimation and proves its consistency. Section 4 presents Monte Carlo results. Section 5 concludes the paper. A supplement available at the author’s web site: http://www.mysmu.edu/faculty/zlyang/ collects details on several important submodels, detailed proofs of the theoretical results, additional Monte Carlo results, and an empirical illustration with matlab codes. and thus maximizing it may produce inconsistent estimators when the time dimension T is fixed and small. This is intuitively clear as the conditional likelihood ignores the information contained in Δy1 which is a fixed proportion of the whole data Δy1 , Δy2 , . . . , ΔyT . Beside, Δy1 may contain some additional information about the model parameters that is accumulated from the past. In this sense, some form of modifications is necessary before this conditional likelihood approach can be followed for model estimation. 
3 The term ‘M -estimator’ was coined by Huber (1964) to mean maximum-likelihood type. It can be defined in either of the two ways: (a) as the solution of a maximization problem and (b) as the root of an estimating equation. Clearly, our estimation strategy falls into the category (b). van der Vaart (1998) named the M estimator defined in (b) as the zero estimator. See also Newey and MacFadden (1994).
3
2
Spatial Dynamic Panel Data Models
Consider the spatial dynamic panel data (SDPD) model where the spatial effects appear in the model in the forms of spatial lag (SL), space-time lag (STL), and spatial error (SE): yt = ρyt−1 + λ1 W1 yt + λ2 W2 yt−1 + Xtβ + Zγ + μ + αt 1n + ut ,
(2.1)
ut = λ3 W3 ut + vt , t = 1, 2, . . ., T, where yt = (y1t , y2t, . . . , ynt) and vt = (v1t, v2t, . . . , vnt) are n × 1 vectors of response values and idiosyncratic errors at time t, and {vit} are independent and identically distributed (iid) across i and t with mean zero and variance σv2 ; the scalar parameter ρ characterizes the dynamic effect, λ1 the spatial lag effect, λ2 the space-time effect, and λ3 the spatial error effect; {Xt} are n × p matrices containing values of p time-varying exogenous variables, Z is an n×q matrix containing the values of q time-invariant exogenous variables that may include the constant term, dummy variables (e.g., individuals’ gender and race), etc.; β and γ are the usual regression coefficients; Wr , r = 1, 2, 3, are the given n × n spatial weight matrices; and μ is an n × 1 vector of unobserved individual-specific effects, {αt } are the time-specific effects, and 1n is an n × 1 vector of ones. Model (2.1) is fairly general. It embeds several important submodels popular in the literature. Thus, it is highly desirable to have a unified method of inference for this general model so that the method can easily be simplified to suit each special model of interest to a particular applied problem. On the other hand, Model (2.1) can be further extended to contain higher-order spatial lags in yt , in yt−1 , as well as in ut ; and to allow for serial correlation among {vt} in moving average form. See the end of Section 3.1 and Section 5 for further discussions. In this paper, we focus on Model (2.1) because it is general enough and further generalizations can be done at the expense of more tedious algebra. Each submodel has its own features and merits, and thus deserves some specific attention. First, setting λ1 and λ2 to zero, Model (2.1) reduces to an SDPD model with only SE,4 in the form of a spatial autoregressive (SAR) error process, yt = ρyt−1 + Xt β + Zγ + μ + αt 1n + ut , ut = λ3 W3 ut + vt , t = 1, 2, . . ., T.
(2.2)
Su and Yang (2015) provide formal asymptotic results for the quasi maximum likelihood (QML) estimation of Model (2.2) with short panels (T small), and random or fixed effects. To give a full quasi likelihood function, the initial observations y0 or the initial differences Δy0 are modeled under some fundamental assumptions adapted from Hsiao et al. (2002). These assumptions may not hold and thus the model for the initial observations or differences is subject to model misspecification. See the end of this section for a detailed discussion. 4 This SE dependence structure was introduced by Anselin (1988). Subsequently, alternative or extended SE structures have been suggested, e.g., to replace the SAR process by a spatial moving average (SMA) process, to allow μ to be spatially correlated as well, etc. See Kapoor et al. (2007), Anselin et al. (2008), Lee and Yu (2012) and Baltagi et al. (2013). However, with short panels and fixed effects, μ must be eliminated by a data transformation to avoid incidental parameters problem, and after that only the SE structure in ut is kept.
4
Setting λ2 and λ3 in (2.1) to zero gives an SDPD model with only SL, yt = ρyt−1 + λ1 W1 yt + Xt β + Zγ + μ + αt 1n + vt , t = 1, 2, . . ., T.
(2.3)
A spatial model (not necessarily dynamic panel) with only SL effect may be more popular than that with only SE effect, as in the former, both the mean and variance of a spatial unit are directly affected by some other spatial units, whereas in the latter only the variance is so, resulting a model with so-called cross-section dependence.5 Setting λ2 to zero, Model (2.1) reduces to an SDPD model with both SL and SE, also referred to as the SDPD model with SARAR effect in the literature, yt = ρyt−1 + λ1 W1 yt + Xt β + Zγ + μ + αt n + ut , ut = λ3 W3 ut + vt , t = 1, 2, . . ., T. (2.4) This model encompasses Models (2.2) and (2.3), and has not been formally treated under ˇ ıˇ z k et al. (2014) considered GMM estimation of this model by the QML-type approach.6 C´ extending the three steps approach of Kapoor et al. (2007) with large n and fixed T . Setting λ3 in (2.1) to zero gives an SDPD model with SL and STL, yt = ρyt−1 + λ1 W1 yt + λ2 W2 yt−1 + Xt β + Zγ + μ + αt n + vt , t = 1, 2, . . ., T.
(2.5)
Under fixed effects, Yu et al. (2008) presented formal asymptotic results for the QML estimation of Model (2.5) under large n and large T set-up, and Lee and Yu (2014) studied this model based on GMM approach where n is large and T can be large but small relative to n. The case of large n and fixed T for Model (2.5) has not been formally treated in the literature, in particular under the QML approach. Setting ρ, λ1 and λ3 to zero in Model (2.1), we have a panel data model with only STL dependence, referred to as pure space recursive model in Anselin et al. (2008). Finally, when all the spatial parameters are set to zero, Model (2.1) reduces to the regular dynamic panel data (DPD) model, which has been extensively treated in the literature. The current study focuses on the general Model (2.1) with fixed effects, large n and small T . Under this scenario, Z must be excluded from the model, and αt 1n can be merged into Xtβ. Thus, the model under study takes the form: yt = ρyt−1 + λ1 W1 yt + λ2 W2 yt−1 + Xtβ + μ + ut , ut = λ3 W3 ut + vt ,
(2.6)
t = 1, 2, . . ., T . The QML approach requires the initial observations (differences) be specified (modeled) in order to construct the full likelihood function. The standard approach of modelling the initial observations or initial differences is through a linear projection onto the space 5
An alternative way of modelling the cross-section dependence may be the factor model; see, e.g., Andrews (2005), Pesaran (2006), Bai (2009), and Pesaran and Tosetti (2011). 6 Anselin et al. (2008, p. 647) point out that combinations of both spatially lagged dependent variables and spatially lagged error terms may lead to identification problems unless the covariate effects are non-zero. For detailed discussions on the identification of the SDPD models, see Elhorst (2012) and Lee and Yu (2016).
5
of observed regressors (Hsiao et al., 2002). This approach has been successfully adapted by Su and Yang (2015) to give consistent estimations of Model (2.2) under both random effects and fixed-effects. However, this approach requires the following initial conditions: (i) data collection starts from the 0th period; the processes start from the −mth period, i.e., m periods before the start of data collection, where m = 0, 1, . . ., and then evolve according to the prescribed processes, i.e., one of the models described above; (ii) starting positions of the process y−m are treated as exogenous; hence the exogenous variables Xt and the errors ut start to have impact on the response from the period −m + 1 onwards; (iii) all the exogenous quantities (y−m , Xt) are considered as random and inferences proceed by conditioning on them, (iv) {xit , t = . . . , −1, 0, 1, . . .} are trend stationary or first-difference stationary for all i = 1, . . . , n, and (v) the variances of y−m are constant. Evidently, what happened in the ‘past’ is not observed, the process starting time or m is unknown, the processes movements may not be the same before and after the start of data collection, and the processes are observed only for a few periods. Hence, the above assumptions, in particular the later part of (i) and (iv), may not hold and the linear projection model for the initial observations may be misspecified.7 Furthermore, even if these assumptions do hold, the linear projection method for modelling the initial observations/differences may not have a straightforward extension to SDPD models that contain SL and/or STL structures.8 Alternative methods that are free from the initial conditions, or the methods without the need of explicitly modelling the initial differences, are therefore highly desirable.
Unified M -Estimation of Fixed-Effects SDPD Models
3
In this section, we present a unified framework for estimating the fixed effects (FE) SDPD models, where all three types of spatial dependence are allowed to be present in the model and the time dimension T is allowed to be small and fixed. The basic idea of this unified approach is to first formulate the Gaussian likelihood function conditional on the initial differences Δy1 as if they are exogenous, and then modify the resulted quasi score function to account for the ignorance of Δy1 in this ‘conditional’ quasi Gaussian score. We shall start with short panels, i.e., panels with large n and small T , to demonstrate the exact cause of inconsistency of the estimators based on this conditional likelihood, and to show how one can adjust the quasi 7
Consider Model (2.2) after first-difference. Under the initial conditions, we obtain, P Pm−1 j −(j+1) j Δv−j+1 , Δy1 = ρm Δy−m+1 + m−1 j=0 ρ Δx−j+1 β + j=0 ρ B3
by successive backward substitutions, where B3 = In − λ3 W3 and In is an n × n identity matrix. Clearly, the exogenous part Δη1 of Δy1 enjoys an approximately linear structure, which makes the linear project of Δη1 onto the space of the observed Δxt , t = 1, . . . , T , valid. However, if the specified initial conditions do not hold, this linear projection is in doubt. 8 Consider simply the first-differenced Model (2.3). Under the initial conditions, we obtain, P P j −(j+1) j −(j+1) Δx−j+1 β + m−1 Δv−j+1 , Δy1 = ρm B1−m Δy−m+1 + m−1 j=0 ρ B1 j=0 ρ B1 by successive backward substitutions. Now, different from the SDPD model with SE only, the exogenous part Δη1 of Δy1 also contains spatial effect through B1 = In − λ1 W1 , and the linear structure is no longer there.
6
scores to give consistent estimators. Then we argue that when T grows with n the proposed estimation strategy remains valid, and in fact it gives better estimators than those based on the conditional likelihood, which is the usual QML estimators under the large n and large T set up (see, e.g., Yu et al., 2008). We present here the results for the most general model, and then in the supplement specialize them to each of several submodels to facilitate the practical applications and to compare with the existing QML estimation if available. Proofs of the lemmas and theorems are lengthy, in particular the the latter, and are sketched in Appendices. Detailed proofs are given in the supplement.
3.1
The M-estimation
To facilitate the introduction of the general theory and method, we differentiate the true value of a parameter vector from its general value by adding a subscript ‘0’, e.g., β0 is the true value of β, and emphasize that Model (2.6) holds only under the true parameter values. Following the standard practice, we eliminate μ by first-differencing (2.6) to give, Δyt = ρ0 Δyt−1 + λ10W1 Δyt + λ20 W2 Δyt−1 + ΔXt β0 + Δut , Δut = λ30 W3 Δut + Δvt , (3.1) 2 , ρ , λ } where for t = 2, 3, · · · , T . The parameters left in Model (3.1) are ψ0 = {β0 , σv0 0 0 λ0 = (λ10, λ20, λ30 ) . Note that Δy1 depends on both the initial observations y0 and the first period observations y1 . Thus, even if y0 is exogenous, y1 and thus Δy1 is not. However, we still formulate a likelihood function as if Δy1 is exogenous, and then make corrections on the relevant elements of the score function. Let ΔY = {Δy2 , . . . , ΔyT } , ΔY−1 = {Δy1 , . . ., ΔyT −1 } , ΔX = {ΔX2 , . . . , ΔXT } , and Δv = {Δv2 , . . ., ΔvT } . Let Wr = IT −1 ⊗ Wr , r = 1, 2, 3, where ⊗ denotes the Kronecker product and Ik a k × k identity matrix. Define Br (λr ) = In − λr Wr , r = 1, 3, and B2 (ρ, λ2) = ρIn + λ2 W2 . Model (3.1) can be written as:
ΔY = ρ0 ΔY−1 + λ10 W1 ΔY + λ20W2 ΔY−1 + ΔXβ0 + Δu, Δu = λ30 W3 Δu + Δv. (3.2) It is easy to see that 2 2 Ω(λ30 ), C ⊗ [B3 (λ30)B3 (λ30 )]−1 ≡ σv0 Var(Δu) = σv0 where C is a (T − 1) × (T − 1) constant matrix, ⎛ ⎜ ⎜ ⎜ C =⎜ ⎜ ⎝
2 −1 0 −1 2 −1 .. .. .. . . . 0 0 0 0 0 0
··· ··· .. . ··· ···
0 0 .. .
0 0 .. .
0 0 .. .
⎞
⎟ ⎟ ⎟ ⎟. ⎟ −1 2 −1 ⎠ 0 −1 2
Under normality of vt , the joint distribution of Δu can be easily obtained, which translates directly to the conditional joint distribution of ΔY . The quasi Gaussian loglikelihood of ψ
7
in terms of Δy2 , . . . , ΔyT , as if Δy1 is exogenous, has the form, ignoring the constant term: STLE (ψ) = − n(T2−1) log(σv2 ) −
1 2
log |Ω(λ3)| + log |B1 (λ1 )| −
1 Δu(θ) Ω(λ3 )−1 Δu(θ), 2σv2
(3.3)
where θ = (β , ρ, λ1, λ2 ) , Δu(θ) = B1 (λ1 )ΔY − B2 (ρ, λ2)ΔY−1 − ΔXβ, B1 (λ1 ) = IT −1 ⊗ B1 (λ1 ), and B2 (ρ, λ2) = IT −1 ⊗ B2 (ρ, λ2). Let θ1 = (β , ρ, λ2) . Given λ1 and λ3 , (3.3) is maximized at θ˜1 (λ1 , λ3) = (ΔX Ω(λ3 )−1 ΔX)−1 ΔX Ω(λ3 )B1 (λ1 )ΔY,
(3.4)
σ ˜v2 (λ1 , λ3)
(3.5)
1 u (λ1 , λ3 )Ω(λ3)−1 Δ˜ u(λ1 , λ3), n(T −1) Δ˜
=
˜ 1 , λ3 ) and ΔX = (ΔX, ΔY−1, W2 ΔY−1 ). Substituting where u ˜(λ1 , λ3) = B1 (λ1 )ΔY − ΔXθ(λ ˜v2 (λ1 , λ3) back into (3.3) gives the concentrated conditional quasi loglikelihood θ˜1 (λ1 , λ3) and σ of (λ1 , λ3), ignoring the constant term, σv2 (λ1 , λ3)] − 12 log |Ω(λ3)| + log |B1 (λ1 )|. cSTLE(λ1 , λ3 ) = − n(T2−1) log[˜
(3.6)
˜ 1 and λ ˜ 3 of λ1 and Maximizing cSTLE(λ1 , λ3 ) gives the conditional QML (CQML) estimators λ ˜ 1, λ ˜ 3) and σ ˜ 1, λ ˜ 3 ). ˜v2 ≡ σ ˜v2 (λ λ3 , and thus the CQML estimators of θ1 and σv2 as θ˜1 ≡ θ˜1 (λ Note that STLE (ψ) is a quasi Gaussian loglikelihood both in the traditional sense that {vit } are not exactly Gaussian but Gaussian likelihood is used, and the sense that Δy1 is not exogenous but is treated as exogenous. The latter causes inconsistency of the CQML estimators when T is small and fixed. We see from the results presented below that even if T increases with n, the CQML estimators may encounter an asymptotic bias. We now introduce a method that not only gives a consistent estimator of the model parameters when T is small, but also eliminates the asymptotic bias when T is large. To simplify the notation, a parametric quantity (scalar, vector or matrix) evaluated at the general values of the parameters is denoted by dropping its arguments, e.g., B1 ≡ B1 (λ1 ), B1 ≡ B1 (λ1 ), Ω ≡ Ω(λ3), and similarly for Br and Br , r = 2, 3; and that evaluated at the true values of the parameters is denoted by dropping its argument and then adding a subscript 0, e.g., B10 ≡ B1 (λ10 ), Ω0 ≡ Ω(λ30). Let C = C ⊗ In . Denote Δu ≡ Δu(θ0 ). The usual expectation, variance and covariance operators, ‘E’, ‘Var’ and ‘Cov’, correspond to the true parameter values. ∂ STLE (ψ) be the conditional quasi score (CQS) function. We have Let SSTLE (ψ) = ∂ψ ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ SSTLE(ψ) =
⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
1 ΔX Ω−1 Δu(θ), σv2 −1) 1 Δu(θ) Ω−1 Δu(θ) − n(T , 2σv4 2σv2 1 Δu(θ) Ω−1 ΔY−1 , σv2 1 Δu(θ) Ω−1 W1 ΔY − tr(B−1 1 W1 ), σv2 1 −1 Δu(θ) Ω W2 ΔY−1 , σ2 v
1 Δu(θ) (C −1 2σv2
⊗ A3 )Δu(θ) − (T − 1)tr(G3 ),
8
(3.7)
where A3 = W3 B3 + B3 W3 and G3 = W3 B3−1 . Under mild conditions, maximizing the conditional loglikelihood STLE (ψ) is equivalent to solving the estimating equation SSTLE (ψ) = 0. It is well known that the QML type estimation or an extremum type estimation is a special case of M -estimation and that for a regular M -estimation problem, a necessary condition for the M -estimators to be consistent is that the probability limit of the estimating function (in this case, the averaged conditional quasi score) at the true parameter value is zero, i.e., limn→∞
1 nT SSTLE (ψ0 )
p
−→ 0,
see, e.g., van der Vaart (1998). However, as shown below this is not the case unless T also goes to infinity. Thus, the CQML estimators are not consistent unless T → ∞. Further, even if T goes to infinity with n (proportionally), the CQML estimators encounter a bias of order O(T −1 ), giving the so-called the asymptotic bias.9 To overcome this major problem, and to avoid the stringent initial conditions and the difficulty in modelling the initial differences under the FE-SDPD models with SL and/or STL effects, we first derive E[SSTLE(ψ0 )], and then ∗ (ψ), adjust the quasi scores SSTLE(ψ) so that the adjusted quasi score (AQS) vector, say SSTLE 1 ∗ is such that plimn→∞ nT SSTLE(ψ0 ) = 0. In contrast with Hsiao et al . (2002), Elhorst (2010), and Su and Yang (2015), we only need to have very minimum knowledge about the processes in the past. Assumption A: Under Model (2.1), (i) the processes started m periods before the start of data collection, the 0th period, and (ii) if m ≥ 1, Δy0 is independent of future errors {vt , t ≥ 1}; if m = 0, y0 is independent of future errors {vt, t ≥ 1}. Assumption A implies that the proposed method does not impose the conditions that {ys , s = −m, . . . , −1} follow the same processes as {yt, t = 0, 1, . . . , T }, and {xit } are trendstationary or first-difference stationary. It has a much weaker requirement on the processes starting positions ym . We have an important lemma based on the reduced form of (3.1): −1 −1 −1 ΔXtβ0 + B10 B30 Δvt , t = 2, . . . , T, Δyt = B0 Δyt−1 + B10
(3.8)
where B ≡ B(ρ, λ1, λ2 ) = B1−1 (λ1 )B2 (ρ, λ2). Lemma 3.1 Suppose Assumption A holds. Assume further that, for i = 1, . . ., n and t = 0, 1, . . ., T , (i) the idiosyncratic errors {vit} in Model (2.1) are iid across i and t with 2 , (ii) the time-varying regressors Xt are exogenous, and (iii) both mean 0 and variance σv0 −1 −1 B10 and B30 exist. We have 2 D−10 B−1 E(ΔY−1 Δv ) = −σv0 30 ,
(3.9)
2 D0 B−1 E(ΔY Δv ) = −σv0 30 , 9
(3.10) 1 2
1 E[SSTLE (ψ0 )] = O( T1 ), then √nT E[SSTLE (ψ0 )] = O(( Tn ) ), implying E[ nT (ψ˜ − ψ0 )] = √ O(( Tn ) ). The latter says that nT (ψ˜ − ψ0 ) would converge to a non-centered normal if Tn → c > 0. If Tn → 0 (large T case), the asymptotic bias vanishes, but this would not be a case of interest to a spatial panel model.
To be exact, if 1 2
1 nT
√
9
where D−1 ≡ D−1 (ρ, λ1, λ2 ) and D ≡ D(ρ, λ1, λ2 ), having the following expressions, ⎛ D−1
⎜ ⎜ =⎜ ⎜ ⎝
In , B − 2In , .. .
0, In , .. .
... ... .. .
0, 0, .. .
0 0 .. .
BT −4 (In − B)2 , BT −5 (In − B)2 , . . . B − 2In , In
⎛ ⎜ ⎜ D=⎜ ⎜ ⎝
B − 2In , (In − B)2 , .. .
In , B − 2In , .. .
... ... .. .
0 0 .. .
BT −3 (In − B)2 , BT −4 (In − B)2 , . . . B − 2In
⎞ ⎟ ⎟ −1 ⎟B , ⎟ 1 ⎠
⎞ ⎟ ⎟ −1 ⎟B . ⎟ 1 ⎠
The results of Lemma 3.1 lead immediately to 2 −1 E(Δu Ω−1 0 ΔY−1 ) = −σv0 tr(C D−10 ),
E(Δu Ω−1 0 W1 ΔY ) E(Δu Ω−1 0 W2 ΔY−1 )
= =
2 −σv0 tr(C−1 D0 W1 ), 2 −σv0 tr(C−1 D−10 W2 ),
(3.11) (3.12) (3.13)
showing that the (ρ, λ1, λ2 ) elements of E[SSTLE (ψ0 )] are not zero, typically of order O(n).10 1 ∂ 1 ∂ 1 ∂ Hence, plimn→∞ nT ∂ρ STLE(ψ0 ), plimn→∞ nT ∂λ1 STLE(ψ0 ), and plimn→∞ nT ∂λ2 STLE (ψ0 ) are all non-zero, suggesting that the conditional QMLEs of (ρ, λ1, λ2 ), treating Δy1 as exogenous, cannot be consistent in general. Very interestingly, these results are derived under only a very minimum set of conditions given in Assumption A and in Lemma 3.1, they are free from the initial conditions usually set for short dynamic panels, and are independent of the way the past observations being generated, i.e., when the processes started and how the process evolved before the start of data collection. These provide a simple way to adjust the quasi scores so as to give a set of unbiased estimating functions. The adjusted quasi score (AQS) functions are:
∗ (ψ) = SSTLE
⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩
1 ΔX Ω−1 Δu(θ), σv2 n(T −1) 1 Δu(θ) Ω−1 Δu(θ) − 2σ2 , 2σv4 v 1 Ω−1 ΔY −1 D ), Δu(θ) + tr(C −1 −1 σv2 1 Δu(θ) Ω−1 W1 ΔY + tr(C−1 DW1 ), σv2 1 Δu(θ) Ω−1 W2 ΔY−1 + tr(C−1 D−1 W2 ), σv2 1 −1 ⊗ A3 )Δu(θ) − (T − 1)tr(G3 ), 2σ2 Δu(θ) (C
(3.14)
v
which lead to, as shown in Theorems 3.1 and 3.2, an estimator of ψ that not only is consistent but also has a centered asymptotic distribution, whether T is fixed or grows with n. The latter 10
1−ρT
1 ∂ 1 When λ1 = λ2 = 0, plim nT (ψ0 ) = T 2 (1−ρ00 )2 − T (1−ρ (see the supplement, ). Thus, ρ˜ has a bias ∂ρ STLE 0) 1 of order O( T ). As all the matrices involved in (3.11)-(3.13) are uniformly bounded in row and column sums, this result would hold for the general model, and the bias in ρ˜ would spill over to the other CQMLEs.
10
implies that when T grows with n, the estimation based on the AQS functions eliminates the asymptotic bias incurred in the conditional QML approach. The ‘adjustments’ in the AQS functions have another interesting feature: they are independent of the SE structure, i.e., free from B3 . This means that the adjustments to the conditional quasi scores of (3.7) remain the same if the SAR error is replaced by SMA error, or the spatial errors are of higher order.11 Comparing (3.14) with (3.7), we see that after the adjustments, both ρ and λ2 parameters become non-linear in the sense that their estimation has to be done through a nonlinear rootfinding process. Evidently, the adjustments recovered the ‘neglected’ information contained in the initial observations (by the conditional likelihood) about these parameters. ∗ (ψ) = 0 leads to the M -estimator ψˆM of ψ. This root-finding process can Solving SSTLE be simplified by first solving the equations for β and σv2 , given δ = (ρ, λ) , resulting in the constrained M -estimators of β and σv2 as βˆM (δ) = (ΔX Ω−1 ΔX)−1 ΔX Ω−1 (B1 ΔY − B2 ΔY−1 ), 2 (δ) σ ˆv,M
=
1 u(δ) Ω−1 Δˆ u(δ), n(T −1) Δˆ
(3.15) (3.16)
2 (δ) back into the last four ˆ ˆv,M where Δˆ u(δ) = Δu(β(δ), ρ, λ1, λ2 ). Substituting βˆM (δ) and σ components of the AQS function in (3.14) gives the concentrated AQS functions: ⎧ 1 ⎪ ⎪ u(δ) Ω−1 ΔY−1 + tr(C−1 D−1 ), 2 (δ) Δˆ ⎪ σ ˆv,M ⎪ ⎪ ⎪ ⎪ ⎨ 2 1 Δˆ u(δ) Ω−1 W1 ΔY + tr(C−1 DW1 ), σ ˆv,M (δ) ∗c (3.17) SSTLE(δ) = 1 Ω−1 W ΔY −1 D W ), ⎪ Δˆ u (δ) + tr(C ⎪ 2 −1 −1 2 2 ⎪ σˆv,M (δ) ⎪ ⎪ ⎪ ⎪ ⎩ 21 Δˆ u(δ) (C −1 ⊗ A3 )Δˆ u(δ) − (T − 1)tr(G3 ). 2ˆ σv,M (δ)
∗c (δ) = 0, we obtain the unSolving the resulted concentrated estimating equations, SSTLE ˆ constrained M -estimators δM of δ. The unconstrained M -estimators of β and σv2 are thus 2 2 ˆ 2 ˆ ) . ˆv,M ≡σ ˆv,M (δM ). Denote ψˆM = (βˆM , σ ˆ v,M , ρˆM, λ βˆM ≡ βˆM (δˆM ) and σ M We end this subsection by noting that Model (3.1) and its M -estimation strategy can even be further extended to allow for higher-order spatial lags: 1 2 λ1j W1j Δyt + kj=1 λ2j W2j Δyt−1 + ΔXtβ + Δut , Δyt = ρΔyt−1 + kj=1 k3 Δut = j=1 λ3j W3j Δut + Δvt , for t = 2, 3, · · · , T ;
and to allow for serial correlation among {vt} in moving average form. The former extension r λrj Wrj , proceeds by simply letting λr = (λrj , j = 1, . . ., kr ) , r = 1, 2, 3; Br (λr ) = In − kj=1 k2 −1 r = 1, 3; and B(ρ, λ1, λ2) = B1 (λ1)(ρIn + j=1 λ2j W2j ), but the latter requires reworks on Lemma 3.1 and the proofs of subsequent theorems. See Section 5 for more discusses. In this paper, we focus on Model (3.1) as it is general enough for most of the empirical applications and it leads to a set of inference theories that are fairly simple and yet easily extendable. 11 This feature may also hold if the SE structure is replaced by the other forms of cross-section dependence induced by common factors; see, e.g., Andrews (2005), Pesaran (2006), Bai (2009), Pesaran and Tosetti (2011).
11
3.2
Asymptotic properties of the M-estimators
In this section we study the consistency and asymptotic normality of the proposed M estimators for the FE-SDPD model with the general spatial dependence structure. To facilitate the discussions of the asymptotic properties of the proposed M -estimators, first recall, ψ0 denotes the true value of the parameter vector ψ; and a parametric function at the true parameter value is differentiated from that at a general parameter value by adding a subscript ‘0’, e.g., B1 ≡ B1 (λ1 ) and B10 ≡ B1 (λ10), Ω ≡ Ω(λ3 ) and Ω0 ≡ Ω(λ30 ), etc.; Δu ≡ Δu(θ0 ); and the common expectation, variance and covariance operators ‘E’ ‘Var’ and ‘Cov’ correspond to the true parameter vector ψ0 . Second, some general notation and convention are as follows: (i) δ denotes the vector of parameters in the concentrated AQS functions (of the full model or a submodel), and Δ the space from which δ takes values; (ii) tr(·), | · | and · denote, respectively, the trace, determinant, and Frobenius norm of a matrix; (iii) γmax (A) and γmin (A) denote, respectively, the largest and smallest eigenvalues of a real symmetric matrix A; and (iv) diag(ak ) forms a diagonal matrix using the elements {ak } and blkdiag(Ak ) forms a block-diagonal matrix using the matrices {Ak }. Assumption B: The innovations vit are iid for all i and t with E(vit ) = 0, Var(vit) = σv2 , and E|vit |4+0 < ∞ for some 0 > 0. Assumption C: The space Δ is compact, and the true parameter δ0 lies in its interior. Assumption D: The time-varying regressors {Xt, t = 0, 1, . . ., T } are exogenous, their 1 ΔX ΔX exists and is nonsingular. values are uniformly bounded, and limn→∞ nT Assumption E: (i) For r = 1, 2, 3, the elements wr,ij of Wr are at most of order h−1 n , uniformly in all i and j, and wr,ii = 0 for all i; (ii) hn /n → 0 as n → ∞; (iii) {Wr , r = 1, 2, 3} −1 , r = 1, 3} are uniformly bounded in both row and column sums; (iv) For r = 1, 3, and {Br0 −1 {Br } are uniformly bounded in either row or column sums, uniformly in λr in a compact parameter space Λr , and 0 < cr ≤ inf λr ∈Λr γmin (Br Br ) ≤ supλr ∈Λr γmax (Br Br ) ≤ c¯r < ∞. Assumption F: For an n × n matrix Φ uniformly bounded in either row or column sums, with elements of uniform order h−1 n , and an n × 1 vector φ with elements of uniform order −1/2 hn hn , (i) n Δy1 ΦΔy1 = Op(1) and hnn Δy1 ΦΔv2 = Op (1); (ii) hnn (Δy1 − E(Δy1 )) φ = op (1); (iii) hnn [Δy1 ΦΔy1 − E(Δy1 ΦΔy1 )] = op (1), and (iv) hnn [Δy1 ΦΔv2 − E(Δy1 ΦΔv2 )] = op (1). Assumptions B-E are standard in the spatial econometrics literature (see, e.g., Lee (2004), Lee and Yu (2010), Su and Yang (2015)). Assumption F imposes some fairly mild conditions on the initial differences Δy1 . These conditions clearly hold if the process starting position y−m is exogenous with m = 0 or 1. They can also be proved for a general m if, in addition to Assumptions A, B, and D, it is further assumed that the processes start at exogenous positions y−m at time −m, and then evolves according to (2.6). Now, the consistency of the proposed M -estimators ψˆM lies with the consistency of δˆM , as 2 ˆv,M follows almost immediately that under Assumptions D and E, the consistency of βˆM and σ ∗c ˆ of δM . The concentrated estimating function (CEF) SSTLE (δ) and its population counter part play a major role for the consistency of δˆM for δ. 12
∗ ∗ Define S¯STLE (ψ) = E[SSTLE (ψ)], the population counter part of the joint estimating function ∗ (ψ) = 0 (JEF) given in (3.14). Given δ, the population joint estimation equation (JEE) S¯STLE is partially solved, by working with its first two components, at
β¯M (δ) = (ΔX Ω−1 ΔX)−1 ΔX Ω−1 (B1 EΔY − B2 EΔY−1 ), 2 (δ) σ ¯v,M
=
1 E[Δ¯ u(δ) Ω−1 Δ¯ u(δ)], n(T −1)
(3.18) (3.19)
¯ = B1 ΔY − B2 ΔY−1 − ΔX β(δ). These lead to the population where Δ¯ u(δ) = Δu(θ)|β=β(δ) ¯ 2 ¯v,M (δ) back into the counter part of the CEF given in (3.17), upon substituting β¯M (δ) and σ ∗ δ-component of S¯STLE (ψ), as ⎧ 1 ⎪ ⎪ u(δ) Ω−1 ΔY−1 ] + tr(C−1 D−1 ), 2 (δ) E[Δ¯ ⎪ σ ¯v,M ⎪ ⎪ ⎪ ⎪ 1 ⎨ u(δ) Ω−1 W1 ΔY ] + tr(C−1 DW1 ), 2 (δ) E[Δ¯ σ ¯v,M ∗c ¯ (3.20) SSTLE (δ) = 1 ⎪ ⎪ u(δ) Ω−1 W1 ΔY−1 ] + tr(C−1 D−1 W1 ), 2 (δ) E[Δ¯ ⎪ σ ¯ ⎪ v,M ⎪ ⎪ ⎪ ⎩ 21 E[Δ¯ u(δ) (C −1 ⊗ A3 )Δ¯ u(δ)] − (T − 1)tr(G3 ). 2¯ σv,M (δ)
2 ∗c (δ) and thus S¯STLE (δ) can be obtained through Note that more detailed expressions for σ ¯v,M the following very useful identity: ◦ ), Δ¯ u∗ (δ) = M(B∗1 ΔY − B∗2 ΔY−1 ) + P(B∗1 ΔY ◦ − B∗2 ΔY−1 1
(3.21)
1
◦ u(δ), B∗r = Ω− 2 Br , ΔY ◦ = ΔY − E(ΔY ), ΔY−1 = ΔY−1 − E(ΔY−1 ), where Δ¯ u∗ (δ) = Ω− 2 Δ¯ 1 1 1 − −1 Ω 2 is the square-root matrix of Ω, M = In(T −1) − Ω 2 ΔX(ΔX Ω ΔX)−1 ΔX Ω− 2 , and P = In(T −1) −M. Also note that the quantities E(ΔY ), E(ΔY−1 ), Var(ΔY ), Cov(ΔY, ΔY−1 ), etc., involved in (3.18)-(3.20) are functions of ψ0 , but not ψ. ∗c (δ). It is easy to see that S ∗c (δ ) = 0 ¯STLE Clearly, the M -estimator δˆM of δ0 is a zero of SSTLE 0 2 2 ∗c ¯ ¯ ¯v (δ0 ) = σv0 , i.e., δ0 is a zero of SSTLE(δ). Thus, by Theorem 5.9 of through β(δ0 ) = β0 and σ p ∗c (δ) − S ∗c (δ) −→ ˆ ¯STLE 0, van der Vaart (1998), δM will be consistent for δ0 if supδ∈Δ n(T1−1) SSTLE and the following identification condition holds. Assumption G: inf δ: d(δ,δ )≥ε S¯∗c (δ) > 0 for every ε > 0, where d(δ, δ0) is a measure 0
STLE
of distance between δ0 and δ. For the simpler models discussed in Section 2, the corresponding expressions for S¯M∗c(δ) can be easily obtained by dropping the relevant terms. The identification condition becomes simpler, allowing us to gain insights on the nature of such a condition. See Lee and Yu (2016) and references therein for a detailed discussion on the identification of the SDPD models. Theorem 3.1 Suppose Assumptions A-G hold. Assume further that (i) γmax [Var(ΔY )] and γmax [Var(ΔY−1 )] are bounded, and (ii) inf δ∈Δ γmin Var(B1 ΔY − B2 ΔY−1 ) ≥ cy > 0. p We have, as n → ∞, ψˆM −→ ψ0 . To derive the asymptotic distribution of ψˆM , we start with a Taylor expansion of the JEE ∗ (ψˆM ) = 0 at ψ0 , and then verify that the AQS function S ∗ (ψ0 ) is asymptotically normal S STLE
STLE
13
∂ ∗ ¯ ¯ and that the adjusted Hessian ∂ψ SSTLE (ψ) has proper asymptotic behavior, for some ψ lying between ψˆM and ψ0 elementwise. To most of the static models or dynamic models where the initial conditions are specified, both problems are fairly standard in that the regular law of large numbers (LLN) and central limit theorem (CLT) for linear-quadratic forms (e.g., Kelejian and Prucha, 2001) would be sufficient. In our approach, the ‘unspecified’ ∗ (ψ0 ), and thus extended LLN and CLT for bilinear-quadratic forms Δy1 is involved in SSTLE (Lemmas A.4 and A.5) are required for establishing the asymptotic properties. The following representations for ΔY and ΔY−1 in terms of Δy1 = 1T −1 ⊗ Δy1 and Δv are crucial.
Lemma 3.2 Under the assumptions of Lemma 3.1, we have, ΔY
= R Δy1 + η + SΔv,
(3.22)
ΔY−1 = R−1 Δy1 + η−1 + S−1 Δv,
(3.23)
where R = blkdiag(B0 , B02 , . . . , B0T −1 ), R−1 = blkdiag(In , B0 , . . . , B0T −2 ), η = BB−1 10 ΔXβ0, −1 −1 −1 −1 −1 η−1 = B−1 B10 ΔXβ0, S = BB10 B30 , S−1 = B−1 B10 B30 , ⎛ ⎜ ⎜ B=⎜ ⎝
In B0 .. .
0 In .. .
... ... .. .
0 0 .. .
⎛
⎞
0 0 .. .
⎜ ⎟ ⎜ ⎟ ⎟ , and B−1 = ⎜ ⎝ ⎠
B0T −2 B0T −3 . . . B0 In
0 In .. .
0 0 .. .
... ... .. .
0 0 .. .
0 0 .. .
⎞ ⎟ ⎟ ⎟. ⎠
B0T −3 B0T −4 . . . In 0
The representations for ΔY and ΔY−1 given in Lemma 3.2 turn out to be very useful. They lead to a simple way for establishing the asymptotic normality of the AQS vector, and a simple way for estimating the variance-covariance (VC) matrix of it. Using these representations and Δu = B−1 30 Δv, the AQS function at ψ0 can be written as ⎧ Π1 Δv, ⎪ ⎪ ⎪ n(T −1) ⎪ ⎪ Δv Φ1 Δv − 2σ2 , ⎪ ⎪ v0 ⎪ ⎨ Δv Ψ1 Δy1 + Π2 Δv + Δv Φ2 Δv + tr(C−1 D−10 ), ∗ SSTLE(ψ0 ) = ⎪ Δv Ψ2 Δy1 + Π3 Δv + Δv Φ3 Δv + tr(C−1 D0 W1 ), ⎪ ⎪ ⎪ ⎪ ⎪ Δv Ψ3 Δy1 + Π4 Δv + Δv Φ4 Δv + tr(C−1 D−10 W2 ), ⎪ ⎪ ⎩ Δv Φ5 Δv − (T − 1)tr(G30),
(3.24)
where Π1 = σ12 Cb ΔX, Π2 = σ12 Cb η−1 , Π3 = σ12 Cb W1 η, Π4 = σ12 Cb W2 η−1 , Φ1 = 2σ14 (C −1 ⊗In ), v0
v0
v0
v0
v0
Φ2 = σ12 Cb S−1 , Φ3 = σ12 Cb W1 S, Φ4 = σ12 Cb W2 S−1 , Φ5 = σ12 [C −1 ⊗(G30 +G30 )], Ψ1 = σ12 Cb R−1 , v0
v0
v0
v0
Ψ2 = σ12 Cb W1 R, Ψ3 = σ12 Cb W2 R−1 , and Cb =C −1 ⊗ B30 . v0
v0
v0
Theorem 3.2 Under the assumptions of Theorem 3.1, we have, as n → ∞, D ∗−1 ∗ n(T − 1) ψˆM − ψ0 −→ N 0, lim Σ∗−1 STLE(ψ0 )ΓSTLE (ψ0 )ΣSTLE(ψ0 ) , n→∞
∂ 1 ∗ ∗ ∗ where Σ∗STLE (ψ0 ) = − n(T1−1) E[ ∂ψ SSTLE (ψ0 )] and ΓSTLE (ψ0 ) = n(T −1) Var[SSTLE(ψ0 )], both as∗ sumed to exist and ΣSTLE(ψ0 ) to be positive definite, for sufficiently large n.
14
3.3
The OPMD estimation of robust VC matrix
The practical applications of the M -estimation of the FE-SDPD models, i.e., model in∗−1 ∗ ferences, depend on the availability of a consistent estimate of Σ∗−1 STLE(ψ0 )ΓSTLE (ψ0 )ΣSTLE(ψ0 ), the VC matrix of ψˆM . As Σ∗STLE (ψ0 ) is the expected negative modified Hessian, its observed counter part immediately offers a consistent estimate of it, i.e., ∂ ∗ ˆ. (3.25) Σ∗STLE(ψˆM ) = − n(T1−1) ∂ψ SSTLE (ψ) ψ=ψ M ∂ ∗ The detailed expression of ∂ψ SSTLE (ψ) for the most general model is given in the proof of Theorem 3.2 in Appendix C, which can easily be simplified to various submodels by deleting relevant terms. The consistency of Σ∗STLE(ψˆM ) is also proved in the proof of Theorem 3.2. However, the traditional plug-in method of estimation of Γ∗STLE (ψ0 ), the VC matrix of the ∗ (ψ0 ), runs into a similar problems as the QML estimation of the joint AQS function SSTLE model – initial differences Δy1 need to be specified or modeled when T is fixed and small as seen from (3.24). To make the estimation of Γ∗STLE (ψ0 ) also free from the specification of initial conditions so that the inferences for the general FE-SDPD model is fully free from the initial conditions. We propose a martingale difference (M.D.) method, i.e., decompose the AQS function into the sum of a vector M.D. sequence so that the ‘average’ of the outer products of the elements of the M.D. sequence gives a consistent estimate of Γ∗STLE (ψ0 ). ∗ (ψ0 ) contains three types of elements: From (3.24) we see that the AQS function SSTLE
Π Δv, Δv ΦΔv,
and Δv ΨΔy1 ,
where Π, Φ and Ψ are nonstochastic matrices (depending on ψ0 ) with Π being n(T − 1) × p or n(T − 1) × 1, and Φ and Ψ being n(T − 1) × n(T − 1). Clearly, the closed form expressions for variances of Π Δv and Δv ΦΔv, and their covariance can readily be derived, so that a plugin method may be used to estimate these variances and covariances. However, closed-form expression for the variance of Δv ΨΔy1 and its covariances with Π Δv and Δv ΦΔv depend on the knowledge of the distribution of Δy1 , which is unavailable. To give a unified method of estimating the variance of AQS function so that it is also free from the specifications of the initial conditions, we first write the AQS function as a sum of a vector M.D. array, taking the advantage that Δvit are independent across i for each t and that T is small. We show in the following lemma that Π Δv can be written as a sum of n independent terms, and Δv ΦΔv − E(Δv ΦΔ v) can be written as the sum of a M.D. array. Under the assumption that Δy0 depends only on the current and past errors (vs , s ≤ 0) but not the future errors (vt, t ≥ 1), we show that Δv ΦΔy1 can also be written ∗ (ψ0 ) is a function of ψ0 , Δv and as the sum of a M.D. sequence. Note from (3.24) that SSTLE and Δy1 Δy1 , where ψ0 is consistently estimated by ψˆM , Δv is consistently estimated by Δv, itself is observed. A new method for estimating the variance of the AQS function, namely the outer-product-of-martingale-difference (OPMD) method, arises. For a square matrix A, let Au , Al and Ad be, respectively, its upper-triangular, lowertriangular, and diagonal matrix such that A = Au + Al + Ad . Denote by Πt , Φts and Ψts the 15
submatrices of Π, Φ and Ψ partitioned according to t, s = 2, . . . , T . Define Ψt+ = Ts=2 Ψts , ∗ = Ψt+ Δy1 . Let {Gn,i } t = 2, . . ., T , Θ = Ψ2+ (B30 B10 )−1 , Δy1◦ = B30 B10 Δy1 , and Δy1t be the increasing sequence of σ-fields generated by (vj1 , . . . , vjT , j = 1, . . . , i), i = 1, . . ., n, n ≥ 1. Let Fn,0 be the σ-field generated by (v0 , Δy0 ), and define Fn,i = Fn,0 ⊗ Gn,i . Clearly, Fn,i−1 ⊆ Fn,i , i.e., {Fn,i }ni=1 is an increasing sequence of σ-fields, for each n ≥ 1. Lemma 3.3 Consider Model (3.1), and suppose the assumptions of Lemma 3.1 hold. Consider the general Π which is n(T − 1) × p, and let Πit be the ith row of Πt . Define g1i = g2i = g3i =
T
t=2 Πit Δvit ,
(3.26)
T
∗ 2 t=2 (Δvit Δξit + Δvit Δvit − σv0 dit ), ◦ 2 ∗ Δv2i Δζi + Θii (Δv2iΔy1i + σv0 ) + Tt=3 Δvit Δy1it ,
(3.27) (3.28)
T l ∗ d where {Δξit} = Δξt = Ts=2 (Φu st + Φts )Δvs , Δvt = s=2 Φts Δvs , {dit } = diagonal elements of CΦ, {Δζi } = Δζ = (Θu + Θl )Δy1◦ , and diag{Θii } = Θd . Then, n = g1i , Π Δv i=1 n = g2i , Δv ΦΔv − E(Δv ΦΔv) i=1 n Δv ΨΔy1 − E(Δv ΨΔy1 ) = i=1 g3i , , g2i, g3i ) , Fn,i}ni=1 form a vector M.D. sequence. and {(g1i
Now, following Lemma 3.3, for each Πr , r = 1, 2, 3, 4, defined in (3.24), define g1ri according to (3.26); for each Φr , r = 1, . . . , 5, define g2ri according to (3.27); and for each Ψr , r = 1, 2, 3, define g3ri according to (3.28). Define , g21i, g31i + g12i + g22i, g32i + g13i + g23i , g33i + g14i + g24i, g25i) . gi = (g11i
n ∗ (ψ0 ) = gi , and {gi , Fn,i} form a vector M.D. sequence. It follows that Then, SSTLE n i=1 ∗ Var[SSTLE(ψ0 )] = i=1 E(gigi ). The ‘average’ of the outer products of the estimated gi s, i.e., ∗ Γ STLE =
1 n(T −1)
n
ˆigˆi , i=1 g
(3.29)
thus gives a consistent estimator of the variance of Γ∗STLE (ψ0 ), where gˆi is obtained by replacing Noting that Δy1 is observed, ψ0 in gi by ψˆM and Δv in gi by its observed counterpart Δv. we have the following theorem. Theorem 3.3 Under the assumptions of Theorem 3.1, we have, as n → ∞, ∗ − Γ∗ (ψ0 ) = Γ STLE STLE
1 n(T −1)
n i=1
p gˆi gˆi − E(gi gi ) −→ 0,
p ∗−1 ˆ ∗−1 ∗−1 ∗ ˆ ∗ and hence, Σ∗−1 STLE(ψM )ΓSTLE ΣSTLE(ψM ) − ΣSTLE(ψ0 )ΓSTLE (ψ0 )ΣSTLE(ψ0 ) −→ 0. ∗−1 ˆ ˆ ∗ ˆ The estimator Σ∗−1 STLE(ψM )ΓSTLE ΣSTLE (ψM ) of the VC matrix of ψM is referred to as the OPMD ∗ estimator in this paper, to reflect the fact that Γ STLE is obtained from the outer products of
16
the elements of a martingale difference (OPMD) sequence.12 The general theories and methods introduced in this section can easily be simplified to suit various submodels discussed in Section 2, from which important insights can be gained on the properties of the CQMLE and the proposed M -estimator (see, e.g., Footnote 10). The simplified methods are also useful for practical applications, and allow easy comparisons of our approach with the standard small T or large T approaches if available. To conserve space, the detailed results for these submodels are collected in the supplement.
4
Monte Carlo Results
Monte Carlo experiments are carried out to investigate (i) the finite sample performance of the proposed M -estimator of the FE-SDPD model, and (ii) the finite sample performance of the proposed OPMD-based standard error (the OPMD estimate of the standard deviation of the M -estimator). We use the following five models in our Monte Carlo experiments: SE : SL : SLE : STL : STLE :
yt yt yt yt yt
= ρyt−1 + β0 ιn + Xtβ1 + Zγ + μ + ut , ut = λ3 W3 ut + vt , = ρyt−1 + λ1 W1 yt + β0 ιn + Xt β1 + Zγ + μ + vt , = ρyt−1 + λ1 W1 yt + β0 ιn + Xt β1 + Zγ + μ + ut , ut = λ3 W3 ut + vt , = ρyt−1 + λ1 W1 yt + λ2 W2 yt−1 + β0 ιn + Xtβ1 + Zγ + μ + vt , = ρyt−1 + λ1 W1 yt + λ2 W2 yt−1 + β0 ιn + Xtβ1 + Zγ + μ + ut , ut = λ2 W3 ut + vt .
The elements of Xt are generated in a similar fashion as in Hsiao et al. (2002),13 and the elements of Z are randomly generated from Bernoulli(0.5). The spatial weight matrices are generated according to Rook or Queen contiguity, or group interaction schemes.14 We choose β0 = 5, β1 = 1, γ = 1, σμ = 1, σv = 1 or 2, a set of values for ρ ranging from −0.9 to 0.9, a set of values for λr , r = 1, 2, 3 in a similar range, T = 3 or 7, and n = 50, 100, and 200. Each set of Monte Carlo results, corresponding to a combination of the values of (n, T, m, ρ, λs, σv ), is based on 2000 samples. The error (vt ) distributions (err) can be (i) normal, (ii) normal mixture (10% N (0, 42) and 90% N (0, 1)), or (iii) chi-square with 3 degrees of freedom.15 The fixed effects μ are generated according to either T1 Tt=1 Xt + e or e, where e ∼ (0, IN ), resulting in the fixed effects that are either correlated or uncorrelated with the regressors. 12 Practical implementations of the OPMD estimator of the VC matrix of AQS vector can be greatly facilitated by the vector and matrix representation of the quantities defined in Lemma 3.3. For example, to compute g1 = {g1i }, let πk be the kth column of Π. Reshape πk and Δv into n × (T − 1) matrices πk and Δv. Then g1 equals the vector of row sums of πk Δv, where denotes the Hadamard product. Moreover, the partial derivatives of D and D−1 defined in Lemma 3.1 are needed for the evaluation of the Hessian matrix, which can be algebraically tedious if T is not so small. In this case, these partial derivatives can be replaced by the numerical partial derivatives without losing much of the accuracy of the OPMD estimator. PT 13 1 The detail is: Xt = μx + gt1n + ζt, (1 − φ1 L)ζt = εt + φ2 εt−1 , εt ∼ N (0, σ12 In ), μx = e + T +m+1 t=−m εt , and e ∼ N (0, σ22 In ). Let θx = (g, φ1 , φ2 , σ1 , σ2 ). Alternatively, Xt can be randomly generated from N (0, σ12 In ). The σ12 is the key parameter that controls the variability of the regressors, and thus the signal-to-noise ratio. 14 The Rook and Queen schemes are standard. For group interaction, we first generate k = P nα groups of n, 1.5¯ n), g = 1, · · · , k, where 0 < α < 1 and n ¯ = n/k, and then adjust ng so that kg=1 ng = n. sizes ng ∼ U (.5¯ The reported results correspond to α = 0.5. See Yang (2015) for details in generating these spatial layouts. 15 In both (ii) and (iii), the generated errors are standardized to have mean zero and variance σv2 .
17
Monte Carlo (empirical) means and standard deviations (sds) are reported for the CQML estimator (CQMLE) and the proposed M -estimator. Empirical averages of the standard ˆ ∗−1 , se based on Σ∗−1 based on the robust VC matrix errors (ses): se based on Γ M M (ψM ), and rse ∗−1 ˆ ∗ ∗−1 ˆ estimate ΣM (ψM )ΓM ΣM (ψM ), are also reported for the proposed M -estimator. Due to the space constraints, only a small subset of the results, corresponding to the case of correlated fixed effects, is reported, and the full set of results can be found in the supplement. Table 1a presents the empirical means and sds of the estimators of SE model and Table 1b presents the corresponding averages of ses for the M -estimator. For this model, the full QMLE (FQMLE) is available from Su and Yang (2015) and thus is included in the Monte Carlo experiments for comparison purpose. The results (reported and unreported) show an excellent performance of the proposed M -estimators of the model parameters, and the OPMD-based ses. The M -estimator of the dynamic parameter is nearly unbiased, as is the FQMLE, whereas the CQMLE can be quite biased and as n increases it does not show a sign of convergence. As expected, the M -estimator is slightly less efficient than the FQMLE, but when T is increased from 3 to 7, the difference becomes negligible. However, the FQMLE depends on the m value and a wrong specification of it may result in poor estimate when ρ is negative and large (see Su and Yang, 2015). The FQMLE also depends on the choice of ‘predictors’ in the predictive model for the initial differences. In contrast, the proposed M -estimator (also CQMLE) depends on neither. The OPMD-based ses (rse) are on average very close to the corresponding Monte Carlo sds in general, showing the robustness of rse. 2 The non-robust ses (se and se) of σ ˆv can be quite different from the corresponding Monte Carlo sds when the errors are nonnormal, whether n is small or large. When T is increased from 3 to 7, the CQMLE of ρ improves significantly. Both FQMLE and M -estimator of the spatial parameter λ3 show some bias (the CQMLE is more biased). This is perhaps due to the √ intrinsic nature of the QML-type estimation of the spatial effects.16 The n-consistency of the FQMLE and M -estimator is clearly demonstrated by the Monte Carlo sds. To summarize, our approach performs equally well as or better than the full QML approach depending on whether the initial model is correctly specified or misspecified, and clearly outperforms the CQML approach whether T is small or large. Tables 2a and 2b present partial results for the SL model. The results (reported and unreported) again show an excellent performance of the proposed M -estimation strategy, which offers dramatic improvements over the conditional QML estimation method when T is small. The results also show that the proposed OPMD estimate of the standard deviation of M -estimator also performs excellently. As discussed in Section 2, a spatial model with SL effects may be more popular due to the fact that it is able to capture the neighborhood or spatial effects on both the mean and variance levels, and hence it is important to have simple and reliable estimation and inference methods for the FE-SDPD models with SL effects. Tables 3a and 3b present partial results for the SLE model with both SL and SE effects. Similar observations as in the two simpler models can be made, except that the estimators for 16
See Yang (2015) for a detailed examination on the bias of estimating a spatial lag model and the general methodology on finite sample bias corrections for nonlinear estimators.
18
the spatial parameters, in particular the spatial error parameter, are more biased, and that the OPMD-based ses of them are slightly less accurate. However, with W3 replaced by Queen contiguity (which results in a weaker spatial error dependence) or T increased to 7, the bias becomes much smaller, and OPMD estimate becomes very accurate (see the supplement). Tables 4a and 4b present partial results for the STL model that incorporates both SL and STL effects into the FE-SDPD model. The proposed estimators again perform excellently in general. The CQMLE of ρ again exhibits a large and persistent bias. Furthermore, the CQMLE of λ2 may also exhibit a large and persistent bias, depending on the true values of ρ, T , the signal-to-noise ratio (SNR), which is σ1 /σv given at the bottom of Table 4a, etc. It performs more poorly with a larger ρ, or a smaller T , or a smaller SNR. Tables 5a and 5b present partial results for the STLE model with all the three spatial effects. The results show that the proposed M -estimators of the model parameters, and the proposed OPMD-based standard errors of the M -estimators perform very well. In contrast, the CQMLEs may perform poorly as in the STL model. Furthermore, the estimation of models containing the STL effect may require a larger signal-to-noise ratio for numerical stability. Finally, for all the models, the nonnormality can have a significant effect on the se of the error variance σv2 , and hence it is important to use the OPMD-based robust standard errors in statistical inferences when the normality of the errors is in doubt.
5
Conclusion and Discussion
We introduce a general strategy (M -estimation) for estimating a fixed-effects dynamic panel data model with three major forms of spatial effects: the spatial lag, space-time lag, and spatial error, based on short panels. The proposed M -estimation method is simple as it is based on the adjusted quasi score functions, and is robust in the sense that it is free from the specification of the initial conditions and allowing errors to be nonnormal. A initial condition free method for estimating the robust standard errors of the M -estimators is also given. These together lead to a complete set of inference methods for the fixed-effects SDPD models that are free from the specification of the initial conditions, and robust against error distributions. The simplicity and generality of the proposed methods render them to be very attractive to the practitioners. An empirical illustration is given in the supplement, which, together with matlab codes, can be downloaded from author’s website. Kuersteiner and Prucha (2015) considered GMM estimation of a more general FE-SDPD model, allowing for endogenous spatial weights, sequentially exogenous regressors, interactive fixed effects, and heteroskedasticity in time and cross-section. GMM estimation requires the initial estimates of some auxiliary parameters, and needs to select instruments and moment weights. Its finite sample performance is not assessed. It would be interesting to compare our M -estimation method with GMM when both applied to our FE-SDPD model. It would also be interesting to extend our methods to allow for endogenous spatial weights, interactive fixed effects, heteroskedasticity in time and cross-section, serial correlation, etc. However, these are clearly beyond the scope of this paper and will be dealt with in future works. 19
Appendix A: Some Basic Lemmas

The following lemmas are essential for the proofs of the main results in this paper.

Lemma A.1 (Kelejian and Prucha, 1999; Lee, 2002): Let {A_n} and {B_n} be two sequences of n × n matrices that are uniformly bounded in both row and column sums. Let C_n be a sequence of conformable matrices whose elements are uniformly O(h_n^{-1}). Then (i) the sequence {A_nB_n} are uniformly bounded in both row and column sums, (ii) the elements of A_n are uniformly bounded and tr(A_n) = O(n), and (iii) the elements of A_nC_n and C_nA_n are uniformly O(h_n^{-1}).

Lemma A.2 (Lee, 2004, p. 1918): For W_1 and B_1 defined in Model (3.1), if ||W_1|| and ||B_{10}^{-1}|| are uniformly bounded, where ||·|| is a matrix norm, then ||B_1^{-1}|| is uniformly bounded in a neighborhood of λ_{10}.

Lemma A.3 (Lee, 2004, p. 1918): Let X_n be an n × p matrix. If the elements of X_n are uniformly bounded and lim_{n→∞} (1/n) X_n'X_n exists and is nonsingular, then P_n = X_n(X_n'X_n)^{-1}X_n' and M_n = I_n − P_n are uniformly bounded in both row and column sums.

Lemma A.4 (Lemma B.4, Yang, 2015, extended): Let {A_n} be a sequence of n × n matrices that are uniformly bounded in either row or column sums. Suppose that the elements a_{n,ij} of A_n are O(h_n^{-1}) uniformly in all i and j. Let v_n be a random n-vector of iid elements with mean zero, variance σ² and finite 4th moment, and b_n a constant n-vector of elements of uniform order O(h_n^{-1/2}). Then
(i) E(v_n'A_nv_n) = O(n/h_n),
(ii) Var(v_n'A_nv_n) = O(n/h_n),
(iii) Var(v_n'A_nv_n + b_n'v_n) = O(n/h_n),
(iv) v_n'A_nv_n = O_p(n/h_n),
(v) v_n'A_nv_n − E(v_n'A_nv_n) = O_p((n/h_n)^{1/2}),
(vi) v_n'A_nb_n = O_p((n/h_n)^{1/2}),
and (vii) the results (iii) and (vi) remain valid if b_n is a random n-vector independent of v_n such that {E(b_{ni}²)} are of uniform order O(h_n^{-1}).

Proof of Lemma A.4: The results in (vii) extend the results (iii) and (vi) of Lemma B.5 of Yang (2015) by allowing b_n to be a random vector. The proofs are done similarly.

Lemma A.5 (Central Limit Theorem for bilinear quadratic forms): Let {Φ_n} be a sequence of n × n matrices with row and column sums uniformly bounded, and elements of uniform order O(h_n^{-1}). Let v_n = (v_1, ..., v_n)' be a random vector of iid elements with mean zero, variance σ_v², and finite (4 + 2ε_0)th moment for some ε_0 > 0. Let b_n = {b_{ni}} be an n × 1 random vector, independent of v_n, such that (i) {E(b_{ni}²)} are of uniform order O(h_n^{-1}), (ii) sup_i E|b_{ni}|^{2+ε_0} < ∞, (iii) (h_n/n) Σ_{i=1}^n [φ_{n,ii}(b_{ni} − Eb_{ni})] = o_p(1), where {φ_{n,ii}} are the diagonal elements of Φ_n, and (iv) (h_n/n) Σ_{i=1}^n [b_{ni}² − E(b_{ni}²)] = o_p(1). Define the bilinear-quadratic form:

Q_n = b_n'v_n + v_n'Φ_nv_n − σ_v² tr(Φ_n),

and let σ²_{Q_n} be the variance of Q_n. If lim_{n→∞} h_n^{1+2/ε_0}/n = 0 and {(h_n/n) σ²_{Q_n}} are bounded away from zero, then Q_n/σ_{Q_n} →^d N(0, 1).

Proof of Lemma A.5: The proof is lengthy and is given in the supplement to this paper, available at the author's website: http://www.mysmu.edu/faculty/zlyang/.
Appendix B: Proofs of Lemmas 3.1-3.3

To conserve space, only sketches of the proofs are given, for the understanding of the M-estimation and OPMD methodologies. Detailed proofs are given in the supplement.

Proof of Lemma 3.1: Under (3.8) and Assumption A, we have, for t ≥ 2, E(Δy_{t−1}Δv_t') = −σ²_{v0} B_{10}^{-1}B_{30}^{-1}, E(Δy_tΔv_t') = σ²_{v0}(2I_n − B_0)B_{10}^{-1}B_{30}^{-1}, and E(Δy_{t+1}Δv_t') = −σ²_{v0}(I_n − B_0)²B_{10}^{-1}B_{30}^{-1}; for t ≥ s + 1 and s ≥ 2, E(Δy_tΔv_s') = −σ²_{v0} B_0^{t−(s+1)}(I_n − B_0)²B_{10}^{-1}B_{30}^{-1}; and all the terms E(Δy_tΔv_s'), s ≥ t + 2, are zero. Summarizing these, we obtain the results of Lemma 3.1.

Proof of Lemma 3.2: By (3.8), continuous substitution gives

Δy_t = B_0^{t−1}Δy_1 + [B_0^{t−2}, B_0^{t−3}, ..., I_n, 0, ..., 0] B_{10}^{-1}ΔXβ_0 + [B_0^{t−2}, B_0^{t−3}, ..., I_n, 0, ..., 0] B_{10}^{-1}B_{30}^{-1}Δv.

Proof of Lemma 3.3: First, Π'Δv = Σ_{t=2}^T Π_t'Δv_t = Σ_{i=1}^n Σ_{t=2}^T Π_{it}'Δv_{it} ≡ Σ_{i=1}^n g_{1i}. Now, for the terms quadratic in Δv, E(Δv'ΦΔv) = σ²_{v0} tr[(C ⊗ I_n)Φ] ≡ σ²_{v0} Σ_{i=1}^n Σ_{t=2}^T d_{it}, where d_{it} = Σ_{s=2}^T (c_{ts}Φ_{ii,st}), {c_{ts}} = C, and {Φ_{ii,ts}} = diag(Φ_{ts}); and

Δv'ΦΔv − E(Δv'ΦΔv)
 = Σ_{t=2}^T Σ_{s=2}^T Δv_t'(Φ^u_{ts} + Φ^l_{ts} + Φ^d_{ts})Δv_s − σ²_{v0} Σ_{i=1}^n Σ_{t=2}^T d_{it}
 = Σ_{t=2}^T Σ_{s=2}^T [Δv_s'Φ^u_{ts}Δv_t + Δv_t'(Φ^l_{ts} + Φ^d_{ts})Δv_s] − σ²_{v0} Σ_{i=1}^n Σ_{t=2}^T d_{it}
 = Σ_{i=1}^n Σ_{t=2}^T (Δv_{it}Δξ_{it} + Δv_{it}Δv*_{it} − σ²_{v0} d_{it}) ≡ Σ_{i=1}^n g_{2i},

where Δξ_t = Σ_{s=2}^T (Φ^u_{st} + Φ^l_{ts})Δv_s, and Δv*_t = Σ_{s=2}^T Φ^d_{ts}Δv_s. Finally,

Δv'ΨΔy_1 = Σ_{t=2}^T Δv_t'Ψ_{t+}Δy_1 = Δv_2'ΘΔy_1° + Σ_{t=3}^T Δv_t'Δy*_{1t},

where Δy_1° = B_{30}B_{10}Δy_1 and Δy*_{1t} = Ψ_{t+}Δy_1. Noting that

Δy_1° = B_{30}B_{10}Δy_1 = B_{30}B_{20}Δy_0 + B_{30}Δx_1β_0 + Δv_1,    (B.1)

and as Δy_0 is independent of v_t, t ≥ 1, by Assumption A, E(Δv_2'ΘΔy_1°) = −σ²_{v0} tr(Θ), and

Δv_2'ΘΔy_1° − E(Δv_2'ΘΔy_1°) = Δv_2'(Θ^u + Θ^l + Θ^d)Δy_1° + σ²_{v0} tr(Θ)
 = Σ_{i=1}^n Δv_{2i}Δζ_i + Σ_{i=1}^n Θ_{ii}(Δv_{2i}Δy°_{1i} + σ²_{v0}),

where {Δζ_i} = Δζ = (Θ^u + Θ^l)Δy_1°. Thus, Δv'ΨΔy_1 − E(Δv'ΨΔy_1) = Σ_{i=1}^n g_{3i}, where

g_{3i} = Δv_{2i}Δζ_i + Θ_{ii}(Δv_{2i}Δy°_{1i} + σ²_{v0}) + Σ_{t=3}^T Δv_{it}Δy*_{1it}.

As E[(g_{1i}, g_{2i}, g_{3i})'|F_{n,i−1}] = 0, {(g_{1i}, g_{2i}, g_{3i})', F_{n,i}} form a vector M.D. sequence.
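To make the role of this martingale-difference representation concrete, the sketch below (illustrative only; the function name, inputs and numbers are hypothetical and not taken from the paper) shows how per-unit components such as g_i = (g_{1i}, g_{2i}, g_{3i})', evaluated at the M-estimates, are combined into the OPMD quantity Σ̂ = Σ_i ĝ_iĝ_i' and then into a sandwich-type variance estimate for the M-estimator.

import numpy as np

def opmd_variance(G, H):
    """G: (n, p) array whose i-th row is g_i' evaluated at the M-estimate.
       H: (p, p) estimated Jacobian (Hessian) of the estimating function.
       Returns an OPMD-based sandwich variance estimate of the M-estimator."""
    Sigma = G.T @ G                 # outer product of martingale differences: sum_i g_i g_i'
    Hinv = np.linalg.inv(H)
    return Hinv @ Sigma @ Hinv.T    # sandwich form H^{-1} Sigma H^{-1'}

# Illustrative use with made-up numbers (p = 3 parameters, n = 500 units):
rng = np.random.default_rng(1)
G = rng.standard_normal((500, 3))   # hypothetical per-unit contributions
H = -500.0 * np.eye(3)              # hypothetical Jacobian estimate
se = np.sqrt(np.diag(opmd_variance(G, H)))
print("OPMD-based standard errors:", se)

The appeal of this construction, as stressed in the paper, is that it requires only the per-unit components of the adjusted score and the estimated Jacobian, and hence is free from the specification of the initial conditions.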
Appendix C: Proofs of Theorems 3.1-3.3

In proving the theorems, the following matrix results are used: (i) the eigenvalues of a projection matrix are either 0 or 1; (ii) the eigenvalues of a positive definite (p.d.) matrix are strictly positive; (iii) γ_min(A)tr(B) ≤ tr(AB) ≤ γ_max(A)tr(B) for a symmetric matrix A and a positive semidefinite (p.s.d.) matrix B; (iv) γ_max(A + B) ≤ γ_max(A) + γ_max(B) for symmetric matrices A and B; and (v) γ_max(AB) ≤ γ_max(A)γ_max(B) for p.s.d. matrices A and B. See, e.g., Bernstein (2009).
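These matrix facts are easy to check numerically; the short sketch below (illustrative only, with randomly generated matrices) verifies fact (iii) for a symmetric A and a p.s.d. B.

import numpy as np

# Check gamma_min(A) tr(B) <= tr(AB) <= gamma_max(A) tr(B)
# for a symmetric A and a positive semidefinite B.
rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric A
C = rng.standard_normal((n, n)); B = C @ C.T         # p.s.d. B
eigA = np.linalg.eigvalsh(A)
print(eigA[0] * np.trace(B), "<=", np.trace(A @ B), "<=", eigA[-1] * np.trace(B))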
Proof of Theorem 3.1: From (3.17) and (3.20), we have

S*c_STLE(δ) − S̄*c_STLE(δ) =
 { (1/σ̂²_{v,M}(δ)) Δû(δ)'Ω^{-1}ΔY_{−1} − (1/σ̄²_{v,M}(δ)) E[Δū(δ)'Ω^{-1}ΔY_{−1}],
   (1/σ̂²_{v,M}(δ)) Δû(δ)'Ω^{-1}W_1ΔY − (1/σ̄²_{v,M}(δ)) E[Δū(δ)'Ω^{-1}W_1ΔY],
   (1/σ̂²_{v,M}(δ)) Δû(δ)'Ω^{-1}W_2ΔY_{−1} − (1/σ̄²_{v,M}(δ)) E[Δū(δ)'Ω^{-1}W_2ΔY_{−1}],
   (1/σ̂²_{v,M}(δ)) Δû(δ)'ΥΔû(δ) − (1/σ̄²_{v,M}(δ)) E[Δū(δ)'ΥΔū(δ)] },

where Υ = ½(C^{-1} ⊗ A_3). With Assumption G, consistency of δ̂_M follows from:
(a) inf_{δ∈Δ} σ̄²_{v,M}(δ) is bounded away from zero,
(b) sup_{δ∈Δ} |σ̂²_{v,M}(δ) − σ̄²_{v,M}(δ)| = o_p(1),
(c) sup_{δ∈Δ} (1/(n(T−1))) |Δû(δ)'Ω^{-1}ΔY_{−1} − E[Δū(δ)'Ω^{-1}ΔY_{−1}]| = o_p(1),
(d) sup_{δ∈Δ} (1/(n(T−1))) |Δû(δ)'Ω^{-1}W_1ΔY − E[Δū(δ)'Ω^{-1}W_1ΔY]| = o_p(1),
(e) sup_{δ∈Δ} (1/(n(T−1))) |Δû(δ)'Ω^{-1}W_2ΔY_{−1} − E[Δū(δ)'Ω^{-1}W_2ΔY_{−1}]| = o_p(1),
(f) sup_{δ∈Δ} (1/(n(T−1))) |Δû(δ)'ΥΔû(δ) − E[Δū(δ)'ΥΔū(δ)]| = o_p(1).
Proof of (a). By Δū*(δ) = M(B*_1ΔY − B*_2ΔY_{−1}) + P(B*_1ΔY° − B*_2ΔY°_{−1}) given in (3.21), and the orthogonality between the two projection matrices M and P, we have,

σ̄²_{v,M}(δ) = (1/(n(T−1))) E[Δū*(δ)'Δū*(δ)]
 = (1/(n(T−1))) tr[Var(B*_1ΔY − B*_2ΔY_{−1})] + (1/(n(T−1))) (B*_1EΔY − B*_2EΔY_{−1})'M(B*_1EΔY − B*_2EΔY_{−1}).

As M is p.s.d., the second term is nonnegative uniformly in δ ∈ Δ. The first term is (1/(n(T−1))) tr[Ω^{-1}Var(B_1ΔY − B_2ΔY_{−1})] ≥ (1/(n(T−1))) γ_min(C^{-1})γ_min(B_3'B_3) tr[Var(B_1ΔY − B_2ΔY_{−1})] > c > 0, uniformly in δ ∈ Δ, by the definition of the matrix C, Assumption E(iv) and the assumption given in the theorem. It follows that inf_{δ∈Δ} σ̄²_{v,M}(δ) > c > 0.
Proof of (b). Noting that Δû*(δ) = Ω^{-1/2}Δû(δ) = M(B*_1ΔY − B*_2ΔY_{−1}), we have,

σ̂²_{v,M}(δ) = (1/(n(T−1))) Δû*(δ)'Δû*(δ) = (1/(n(T−1))) (B*_1ΔY − B*_2ΔY_{−1})'M(B*_1ΔY − B*_2ΔY_{−1}).

It follows that, by denoting Q_1 = (1/(n(T−1))) ΔY'B*_1'MB*_1ΔY, Q_2 = (1/(n(T−1))) ΔY_{−1}'B*_2'MB*_2ΔY_{−1}, Q_3 = (1/(n(T−1))) ΔY'B*_1'MB*_2ΔY_{−1}, and Q_4 = (1/(n(T−1))) (B*_1ΔY° − B*_2ΔY°_{−1})'P(B*_1ΔY° − B*_2ΔY°_{−1}),

σ̂²_{v,M}(δ) − σ̄²_{v,M}(δ) = (Q_1 − EQ_1) + (Q_2 − EQ_2) − 2(Q_3 − EQ_3) − EQ_4.    (C.1)

The result follows if Q_j − EQ_j →^p 0, j = 1, 2, 3, and EQ_4 → 0, uniformly in δ ∈ Δ. The uniform convergence of Q_j − EQ_j, j = 1, 2, 3, to zero in probability follows from the pointwise convergence for each δ ∈ Δ and the stochastic equicontinuity of Q_j, according to Theorem 1 of Andrews (1992). Letting M* = Ω^{-1/2}MΩ^{-1/2}, we have, by Lemma 3.2,

Q_1 = (1/(n(T−1))) [Δy_1'R'B_1'M*B_1RΔy_1 + η'B_1'M*B_1η + Δv'S'B_1'M*B_1SΔv + 2Δy_1'R'B_1'M*B_1η + 2Δy_1'R'B_1'M*B_1SΔv + 2η'B_1'M*B_1SΔv],

which gives Q_1 − EQ_1 = Σ_{ℓ=1}^5 (Q_{1,ℓ} − EQ_{1,ℓ}), where Q_{1,ℓ}, ℓ = 1, ..., 5, denote the five stochastic terms of Q_1, and EQ_{1,5} = 0;
Q_2 = (1/(n(T−1))) [Δy_1'R_{−1}'B_2'M*B_2R_{−1}Δy_1 + η_{−1}'B_2'M*B_2η_{−1} + Δv'S_{−1}'B_2'M*B_2S_{−1}Δv + 2Δy_1'R_{−1}'B_2'M*B_2η_{−1} + 2Δy_1'R_{−1}'B_2'M*B_2S_{−1}Δv + 2η_{−1}'B_2'M*B_2S_{−1}Δv],

leading to Q_2 − EQ_2 = Σ_{ℓ=1}^5 (Q_{2,ℓ} − EQ_{2,ℓ}), where Q_{2,ℓ}, ℓ = 1, ..., 5, denote the five stochastic terms of Q_2, and EQ_{2,5} = 0; and

Q_3 = (1/(n(T−1))) [Δy_1'R'B_1'M*B_2R_{−1}Δy_1 + η'B_1'M*B_2η_{−1} + Δv'S'B_1'M*B_2S_{−1}Δv + Δy_1'R'B_1'M*B_2η_{−1} + η'B_1'M*B_2R_{−1}Δy_1 + Δy_1'R'B_1'M*B_2S_{−1}Δv + Δv'S'B_1'M*B_2R_{−1}Δy_1 + η'B_1'M*B_2S_{−1}Δv + Δv'S'B_1'M*B_2η_{−1}],

leading to Q_3 − EQ_3 = Σ_{ℓ=1}^8 (Q_{3,ℓ} − EQ_{3,ℓ}), where Q_{3,ℓ}, ℓ = 1, ..., 8, denote the eight random terms in Q_3, and the last two terms have expectations zero.

Thus, Q_k, k = 1, 2, 3, are decomposed into sums of terms of the forms (1/(n(T−1))) Δy_1'ΦΔy_1, (1/(n(T−1))) Δv'ΠΔv, (1/(n(T−1))) Δy_1'ΨΔv, (1/(n(T−1))) Δy_1'φ, and (1/(n(T−1))) Δv'ξ, where the matrices Φ, Π and Ψ, and the vectors φ and ξ, are defined in terms of R, R_{−1}, S, S_{−1}, η, η_{−1}, B_1, B_2 and M*. Note that R, R_{−1}, S, S_{−1}, η and η_{−1} depend on true parameter values, whereas B_1 depends on λ_1, B_2 depends on ρ and λ_2, and M* depends on λ_3.

For the terms quadratic in Δy_1, they can be written as (1/n) Δy_1'Φ_{++}(δ)Δy_1, where Φ_{++}(δ) = (1/(T−1)) Σ_t Σ_s Φ_{t,s}(δ). It can easily be seen by Lemma A.1 and Lemma A.3 that, for each δ ∈ Δ, Φ_{t,s}(δ) are uniformly bounded in either row or column sums. The pointwise convergence of (1/n)[Δy_1'Φ_{++}(δ)Δy_1 − E(Δy_1'Φ_{++}(δ)Δy_1)] thus follows from Assumption F(iii). For the terms quadratic in Δv, they can be written as (1/(n(T−1))) Σ_{t=1}^T Σ_{s=1}^T v_t'Π_{ts}v_s. The pointwise convergence of (1/n)[v_t'Π_{ts}v_s − E(v_t'Π_{ts}v_s)] follows from Lemma A.4(v), for each t, s = 1, ..., T. The pointwise convergence of (1/(n(T−1)))[Δy_1'ΨΔv − E(Δy_1'ΨΔv)] follows by writing Δy_1'ΨΔv = Σ_s Δy_1'Ψ_{+s}Δv_s and then applying Lemma A.4(vii) and Assumption F(iv). The pointwise convergence of (1/(n(T−1)))[Δy_1'φ − E(Δy_1'φ)] follows from Assumption F(ii), and that of (1/(n(T−1))) Δv'ξ from the Chebyshev inequality. Thus, Q_{k,ℓ}(δ) − EQ_{k,ℓ}(δ) →^p 0, for each δ ∈ Δ, and for all k and ℓ.

Now, for all the Q_{k,ℓ}(δ) terms, let δ_1 and δ_2 be in Δ. We have by the mean value theorem:

Q_{k,ℓ}(δ_2) − Q_{k,ℓ}(δ_1) = (∂/∂δ') Q_{k,ℓ}(δ̄) (δ_2 − δ_1),

where δ̄ lies between δ_1 and δ_2 elementwise. Note that Q_{k,ℓ}(δ) is linear or quadratic in ρ, λ_1 and λ_2, and thus the corresponding partial derivatives take simple forms. It is easy to show that sup_{δ∈Δ} |(∂/∂ω) Q_{k,ℓ}(δ)| = O_p(1), for ω = ρ, λ_1, λ_2. For (∂/∂λ_3) Q_{k,ℓ}(δ), note that only the matrix M* involves λ_3. Some algebra leads to the following simple expression for its derivative:

(d/dλ_3) M* = M*ΩΩ̇^{-1}ΩM*,

where Ω̇^{-1} = (d/dλ_3) Ω^{-1} = C^{-1} ⊗ A_3. Thus, the results sup_{δ∈Δ} |(∂/∂λ_3) Q_{k,ℓ}(δ)| = O_p(1) can be easily proved for all the Q_{k,ℓ}(δ) quantities. For example, for Q_{1,1}(δ), noting that γ_max(M) = 1,

sup_{δ∈Δ} |(∂/∂λ_3) Q_{1,1}(δ)| = sup_{δ∈Δ} (1/(n(T−1))) |(∂/∂λ_3) Δy_1'R'B_1'M*B_1RΔy_1|
 = sup_{δ∈Δ} (1/(n(T−1))) |Δy_1'R'B_1'M*ΩΩ̇^{-1}ΩM*B_1RΔy_1|
 ≤ sup_{δ∈Δ} (1/(n(T−1))) |Δy_1'R'B_1'Ω̇^{-1}B_1RΔy_1|
 ≤ γ_max(Ω̇^{-1})γ_max(B_1'B_1) (1/(n(T−1))) |Δy_1'R'RΔy_1| = O_p(1),
by Assumption F(i). It follows that the Q_{k,ℓ}(δ) are stochastically equicontinuous. Hence, by Theorem 1 of Andrews (1992), Q_{k,ℓ}(δ) − EQ_{k,ℓ}(δ) →^p 0, uniformly in δ ∈ Δ, for all k and ℓ. It follows that Q_k(δ) − EQ_k(δ) →^p 0, uniformly in δ ∈ Δ, k = 1, 2, 3. It remains to show that EQ_4(δ) → 0, uniformly in δ ∈ Δ. We have,

EQ_4 = (1/(n(T−1))) tr[Ω^{-1}ΔX(ΔX'Ω^{-1}ΔX)^{-1}ΔX'Ω^{-1}Var(B_1ΔY − B_2ΔY_{−1})]
 ≤ (1/(n(T−1))) γ_max(Ω^{-2}) γ_min^{-1}(ΔX'Ω^{-1}ΔX) tr[ΔX'Var(B_1ΔY − B_2ΔY_{−1})ΔX]
 = (1/(n(T−1))) γ_max(Ω^{-2}) γ_min^{-1}(ΔX'Ω^{-1}ΔX/(n(T−1))) (1/(n(T−1))) tr[ΔX'Var(B_1ΔY − B_2ΔY_{−1})ΔX].

As Ω^{-1} = C^{-1} ⊗ B_3'B_3, we have, by the matrix C defined at the beginning of Section 3.1 and Assumption E(iv), 0 < c_w ≤ inf_{λ_3∈Λ_3} γ_min(Ω^{-1}) ≤ sup_{λ_3∈Λ_3} γ_max(Ω^{-1}) ≤ c̄_w < ∞. By Assumption D, we have 0 < c_x ≤ inf_{λ_3∈Λ_3} γ_min(Ω^{-1}) γ_min(ΔX'ΔX/(n(T−1))) ≤ γ_min(ΔX'Ω^{-1}ΔX/(n(T−1))) ≤ γ_max(ΔX'Ω^{-1}ΔX/(n(T−1))) ≤ sup_{λ_3∈Λ_3} γ_max(Ω^{-1}) γ_max(ΔX'ΔX/(n(T−1))) ≤ c̄_x < ∞. It follows that

EQ_4 ≤ (1/(n(T−1))) c̄²_w c_x^{-1} (1/(n(T−1))) tr[ΔX'Var(B_1ΔY − B_2ΔY_{−1})ΔX]
 ≤ (1/(n(T−1))) c̄²_w c_x^{-1} c̄_y (1/(n(T−1))) tr[ΔX'ΔX], by the assumption in Theorem 3.1,
 = O(n^{-1}), by Assumption D.

Hence, σ̂²_{v,M}(δ) − σ̄²_{v,M}(δ) →^p 0, uniformly in δ ∈ Δ, completing the proof of (b).
Proofs of (c)-(f). By the expressions of Δû(δ) and Δū(δ) (or Δû*(δ) and Δū*(δ)) given earlier, and Lemma 3.2, the quantities inside |·| in (c)-(f) can all be expressed in forms similar to (C.1). Thus, the proofs of (c)-(f) follow the proof of (b).

Proof of Theorem 3.2: We have, by the mean value theorem,

0 = (1/√(n(T−1))) S*_STLE(ψ̂_M) = (1/√(n(T−1))) S*_STLE(ψ_0) + [(1/(n(T−1))) (∂/∂ψ') S*_STLE(ψ̄)] √(n(T−1)) (ψ̂_M − ψ_0),
where ψ̄ lies elementwise between ψ̂_M and ψ_0. The result of the theorem follows if
(a) (1/√(n(T−1))) S*_STLE(ψ_0) →^D N(0, lim_{n→∞} Γ*_STLE(ψ_0)),
(b) (1/(n(T−1))) [(∂/∂ψ') S*_STLE(ψ̄) − (∂/∂ψ') S*_STLE(ψ_0)] →^p 0, and
(c) (1/(n(T−1))) [(∂/∂ψ') S*_STLE(ψ_0) − E((∂/∂ψ') S*_STLE(ψ_0))] →^p 0.

Proof of (a). From (3.24), we see that S*_STLE(ψ_0) consists of three types of elements: Π'Δv, Δv'ΦΔv and Δv'ΨΔy_1, which can be written as Π'Δv = Σ_{t=1}^T Π*_t'v_t, Δv'ΦΔv = Σ_{t=1}^T Σ_{s=1}^T v_t'Φ*_{ts}v_s, and Δv'ΨΔy_1 = Σ_{t=1}^T v_t'Ψ*_tΔy_1, where Π*_t, Φ*_{ts} and Ψ*_t are formed by the elements of the partitioned Π, Φ and Ψ, respectively. By (2.1), y_1 = B_{10}^{-1}B_{20}y_0 + η_1 + B_{10}^{-1}B_{30}^{-1}v_1, leading to Σ_{t=1}^T v_t'Ψ*_tΔy_1 = Σ_{t=1}^T v_t'Ψ**_t y_0 + Σ_{t=1}^T v_t'Ψ*+_t v_1 + Σ_{t=1}^T v_t'Ψ*_tη_1, for suitably defined non-stochastic quantities η_1, Ψ**_t and Ψ*+_t. These show that, for every non-zero (p+5)×1 vector of constants c, c'S*_STLE(ψ_0) can be expressed as

c'S*_STLE(ψ_0) = Σ_{t=1}^T Σ_{s=1}^T v_t'A_{ts}v_s + Σ_{t=1}^T v_t'B_t v_1 + Σ_{t=1}^T v_t'g(y_0) − c'μ*,

for suitably defined non-stochastic matrices A_{ts} and B_t, vector μ*, and a function g(y_0) linear in y_0. As {y_0, v_1, ..., v_T} are independent, the asymptotic normality of (1/√(n(T−1))) c'S*_STLE(ψ_0) follows from Lemma A.5. Finally, the Cramér-Wold device leads to the joint asymptotic normality.

Proof of (b). The Hessian matrix, H*_STLE(ψ) = (∂/∂ψ') S*_STLE(ψ), has the elements:

H*_ββ = −(1/σ_v²) ΔX'Ω^{-1}ΔX,
H*_βσ² = −(1/σ_v⁴) ΔX'Ω^{-1}Δu(θ),
H*_βρ = −(1/σ_v²) ΔX'Ω^{-1}ΔY_{−1},
H*_βλ1 = −(1/σ_v²) ΔX'Ω^{-1}W_1ΔY,
H*_βλ2 = −(1/σ_v²) ΔX'Ω^{-1}W_2ΔY_{−1},
H*_βλ3 = (1/σ_v²) ΔX'Ω̇^{-1}Δu(θ),
H*_σ²σ² = −(1/σ_v⁶) Δu(θ)'Ω^{-1}Δu(θ) + n(T−1)/(2σ_v⁴),
H*_σ²ρ = −(1/σ_v⁴) ΔY_{−1}'Ω^{-1}Δu(θ),
H*_σ²λ1 = −(1/σ_v⁴) ΔY'W_1'Ω^{-1}Δu(θ),
H*_σ²λ2 = −(1/σ_v⁴) ΔY_{−1}'W_2'Ω^{-1}Δu(θ),
H*_σ²λ3 = (1/(2σ_v⁴)) Δu(θ)'Ω̇^{-1}Δu(θ),
H*_ρρ = −(1/σ_v²) ΔY_{−1}'Ω^{-1}ΔY_{−1} + tr(C^{-1}D_{−1,ρ}),
H*_ρλ1 = −(1/σ_v²) ΔY_{−1}'Ω^{-1}W_1ΔY + tr(C^{-1}D_{−1,λ1}),
H*_ρλ2 = −(1/σ_v²) ΔY_{−1}'Ω^{-1}W_2ΔY_{−1} + tr(C^{-1}D_{−1,λ2}),
H*_ρλ3 = (1/σ_v²) ΔY_{−1}'Ω̇^{-1}Δu(θ),
H*_λ1λ1 = −(1/σ_v²) ΔY'W_1'Ω^{-1}W_1ΔY + tr(C^{-1}D_{λ1}W_1),
H*_λ1λ2 = −(1/σ_v²) ΔY'W_1'Ω^{-1}W_2ΔY_{−1} + tr(C^{-1}D_{λ2}W_1),
H*_λ1λ3 = (1/σ_v²) ΔY'W_1'Ω̇^{-1}Δu(θ),
H*_λ2λ2 = −(1/σ_v²) ΔY_{−1}'W_2'Ω^{-1}W_2ΔY_{−1} + tr(C^{-1}D_{−1,λ2}W_2),
H*_λ2λ3 = (1/σ_v²) ΔY_{−1}'W_2'Ω̇^{-1}Δu(θ),
H*_λ3λ3 = −(1/σ_v²) Δu(θ)'[C^{-1}⊗(W_3'W_3)]Δu(θ) − tr(G_3²),

where Ω̇^{-1} = (∂/∂λ_3) Ω^{-1}, D_{−1,ω} = (∂/∂ω) D_{−1} and D_ω = (∂/∂ω) D, ω = ρ, λ_1, λ_2, and G_3 = W_3B_3^{-1}.

It is easy to show that (1/(n(T−1))) H*_STLE(ψ_0) = O_p(1), by Lemma A.1 and the model assumptions. Thus, (1/(n(T−1))) H*_STLE(ψ̄) = O_p(1), because ψ̄ − ψ_0 = o_p(1), which is implied by ψ̂_M →^p ψ_0. As σ̄² →^p σ²_{v0}, σ̄^{-r} = σ_{v0}^{-r} + o_p(1), r = 2, 4, 6. Noting that σ^{-r} appears in H*_STLE(ψ̄) multiplicatively, (1/(n(T−1))) H*_STLE(ψ̄) = (1/(n(T−1))) H*_STLE(β̄, σ²_{v0}, ρ̄, λ̄) + o_p(1), i.e., replacing σ̄² by σ²_{v0} results in an asymptotically negligible error. The proof of (b) is thus equivalent to the proof of (1/(n(T−1))) [H*_STLE(β̄, σ²_{v0}, ρ̄, λ̄) − H*_STLE(ψ_0)] →^p 0.

From Δu(θ) = Δu − (λ_1 − λ_{10})W_1ΔY − (ρ − ρ_0)ΔY_{−1} − (λ_2 − λ_{20})W_2ΔY_{−1} − ΔX(β − β_0), Ω^{-1}(λ_3) − Ω^{-1}(λ_{30}) = (λ_3² − λ_{30}²) C^{-1}⊗(W_3'W_3) − (λ_3 − λ_{30}) C^{-1}⊗(W_3 + W_3'), and Ω̇^{-1} = −C^{-1}⊗(W_3'B_3 + B_3'W_3), we see that all the random elements of H*_STLE(ψ) are linear, bilinear, or quadratic in ΔY, ΔY_{−1} or Δu, and linear or quadratic in β, ρ, and λ. This means that all the corresponding elements in (1/(n(T−1))) [H*_STLE(β̄, σ²_{v0}, ρ̄, λ̄) − H*_STLE(ψ_0)] are linear, bilinear, or quadratic in ΔY, ΔY_{−1} or Δu, and linear, bilinear or quadratic in β̄ − β_0, ρ̄ − ρ_0, and λ̄ − λ_0, and thus are all o_p(1) by the consistency of ψ̂_M, Lemma 3.2, Lemma A.1 and Assumption F.

It remains to show that all the 'trace' terms in (1/(n(T−1))) [H*_STLE(β̄, σ²_{v0}, ρ̄, λ̄) − H*_STLE(ψ_0)] are o_p(1), e.g., (1/(n(T−1))) [tr(C^{-1}D_{λ1}(ρ̄, λ̄_1, λ̄_2)W_1) − tr(C^{-1}D_{λ1}(ρ_0, λ_{10}, λ_{20})W_1)] = o_p(1) for H*_λ1λ1. Let (ρ*, λ*_1, λ*_2) be between (ρ̄, λ̄_1, λ̄_2) and (ρ_0, λ_{10}, λ_{20}). By the mean value theorem,
(1/(n(T−1))) [tr(C^{-1}D_{λ1}(ρ̄, λ̄_1, λ̄_2)W_1) − tr(C^{-1}D_{λ1}(ρ_0, λ_{10}, λ_{20})W_1)]
 = ((ρ̄ − ρ_0)/(n(T−1))) tr(C^{-1}D_{λ1}^{ρ*}W_1) + ((λ̄_1 − λ_{10})/(n(T−1))) tr(C^{-1}D_{λ1}^{λ1*}W_1) + ((λ̄_2 − λ_{20})/(n(T−1))) tr(C^{-1}D_{λ1}^{λ2*}W_1),

where D_{λ1}^{ρ*}, D_{λ1}^{λ1*} and D_{λ1}^{λ2*} are the partial derivatives of D_{λ1} evaluated at (ρ*, λ*_1, λ*_2). Consider, W.L.O.G., T = 3. Recall B_1 = I_n − λ_1W_1, B_2 = ρI_n + λ_2W_2 and B = B_1^{-1}B_2. We have
D(ρ, λ_1, λ_2) = [B_1^{-1}B_1^{-1}B_2B_1^{-1}, (I_n − B_1^{-1}B_2)²B_1^{-1}, B_1^{-1}B_2B_1^{-1}].
This shows that the elements of D_{λ1} are products of the matrices W_1, B_1^{-1} and B_2. Subsequently, D_{λ1}^{ρ*}, D_{λ1}^{λ1*} and D_{λ1}^{λ2*} have elements that are products of the matrices W_1, W_2, B_1^{-1}(λ_1) and B_2(ρ, λ_2), and hence are uniformly bounded in a matrix norm in a neighborhood of (ρ_0, λ_{10}, λ_{20}), by Lemmas A.1 and A.2. Therefore, (1/(n(T−1))) tr(C^{-1}D_{λ1}^{ρ*}W_1) = O_p(1), (1/(n(T−1))) tr(C^{-1}D_{λ1}^{λ1*}W_1) = O_p(1), and (1/(n(T−1))) tr(C^{-1}D_{λ1}^{λ2*}W_1) = O_p(1), leading to (b).
Proof of (c). First, for the terms involving only Δu (linear or quadratic), the results follow from Lemma A.4(v)-(vi), noticing that Δu = B_{30}^{-1}Fv, where Fv = Δv. For example,

H*_σ²λ3(ψ_0) − E[H*_σ²λ3(ψ_0)] = (1/(2σ_{v0}⁴)) [Δu'Ω̇_0^{-1}Δu − E(Δu'Ω̇_0^{-1}Δu)] = (1/(2σ_{v0}⁴)) [v'Av − E(v'Av)],

where A = F'B_{30}^{-1}'Ω̇_0^{-1}B_{30}^{-1}F, which is easily seen to be uniformly bounded in both row and column sums. Thus, Lemma A.4(v) leads to (1/(n(T−1))) {H*_σ²λ3(ψ_0) − E[H*_σ²λ3(ψ_0)]} = o_p(1). Second, by Lemma 3.2, all the terms involving ΔY and ΔY_{−1} can be written as sums of terms linear in Δy_1, quadratic in Δy_1, bilinear in Δy_1 and Δv, or quadratic in Δv. Thus, the results follow by repeatedly applying Lemma A.1, Lemma A.4, and Assumption F.

Proof of Theorem 3.3: First, the result Σ*_STLE(ψ̂_M) − Σ*_STLE(ψ_0) →^p 0 is implied by result (b) in the proof of Theorem 3.2. The result (1/(n(T−1))) Σ_{i=1}^n [ĝ_iĝ_i' − E(g_ig_i')] →^p 0 follows from (1/(n(T−1))) Σ_{i=1}^n (ĝ_iĝ_i' − g_ig_i') →^p 0 and (1/(n(T−1))) Σ_{i=1}^n [g_ig_i' − E(g_ig_i')] →^p 0. The proof of the former is straightforward by applying the mean value theorem. We focus on the proof of the latter result. As the elements of S*_STLE(ψ_0) are mixtures of terms of the forms Π'Δv = Σ_{i=1}^n g_{1i}, Δv'ΦΔv − E(Δv'ΦΔv) = Σ_{i=1}^n g_{2i}, and Δv'ΨΔy_1 − E(Δv'ΨΔy_1) = Σ_{i=1}^n g_{3i}, it suffices to show that

(1/(n(T−1))) Σ_{i=1}^n [g_{ki}g_{ri} − E(g_{ki}g_{ri})] = o_p(1), k, r = 1, 2, 3.
To facilitate the proof, the following dot notation is convenient: (a) for an n(T−1)×1 vector Δv with elements {Δv_{it}} double indexed by i = 1, ..., n for each t = 2, ..., T, Δv_{·t} is the subvector that contains all the elements with the same t, and Δv_{i·} is the subvector that picks up the elements with the same i; (b) for an n(T−1)×n(T−1) matrix Φ with elements {Φ_{it,js}, i, j = 1, ..., n; t, s = 2, ..., T}, where it is the double index for the rows and js the double index for the columns, Φ_{·t,·s} is the n×n submatrix corresponding to the (t, s) periods, Φ_{i·,j·} the (T−1)×(T−1) submatrix corresponding to the (i, j) units, and Φ_{it,j·} the (T−1)×1 subvector that picks up the elements of the it-th row corresponding to s = 2, ..., T.

With the vector dot notation, the g_{ri}, r = 1, 2, 3, defined in Lemma 3.3 can be written as g_{1i} = Π_{i·}'Δv_{i·}, g_{2i} = Δv_{i·}'Δξ_{i·} + Δv_{i·}'Δv*_{i·} − 1_{T−1}'d_{i·}, and g_{3i} = Δv_{2i}Δζ_i + Θ_{ii}(Δv_{2i}Δy°_{1i} + σ²_{v0}) + Δv_{i−}'Δy*_{1i−}, where '−' plays the same role as '·' but corresponds to t = 3, ..., T. Note that under Assumptions D and E, one can easily see by Lemma A.1 that the elements of all the Π's, Φ's and Ψ's defined in (3.24) are uniformly bounded. The proofs proceed by applying the weak law of large numbers (WLLN) for M.D. arrays; see, e.g., Davidson (1994, p. 299).

First, with g_{1i} = Π_{i·}'Δv_{i·}, (1/(n(T−1))) Σ_{i=1}^n [g_{1i}g_{1i}' − E(g_{1i}g_{1i}')] = (1/(n(T−1))) Σ_{i=1}^n Π_{i·}'(Δv_{i·}Δv_{i·}' − σ²_{v0}C)Π_{i·} ≡ (1/(n(T−1))) Σ_{i=1}^n U_{n,i}, where C is defined below (3.2). Without loss of generality, assume U_{n,i} is a scalar, as if not we can work on each element of it. Clearly, {U_{n,i}} are independent, and thus form an M.D. array. By Assumption B and using the fact that the elements of Π_{i·} are uniformly bounded, it is easy to show that E|U_{n,i}|^{1+ε} ≤ K_u < ∞, for some ε > 0. Thus, {U_{n,i}} are uniformly integrable. The other two conditions of the WLLN for M.D. arrays of Davidson are satisfied with the constant coefficients 1/(n(T−1)). Thus, (1/(n(T−1))) Σ_{i=1}^n U_{n,i} →^p 0.

Second, with g_{2i} = Δv_{i·}'Δξ_{i·} + Δv_{i·}'Δv*_{i·} − 1_{T−1}'d_{i·}, we have,

(1/(n(T−1))) Σ_{i=1}^n [g²_{2i} − E(g²_{2i})]
 = (1/(n(T−1))) Σ_{i=1}^n [(Δv_{i·}'Δξ_{i·})² − E((Δv_{i·}'Δξ_{i·})²)] + (1/(n(T−1))) Σ_{i=1}^n [(Δv_{i·}'Δv*_{i·})² − E((Δv_{i·}'Δv*_{i·})²)]
 + (2/(n(T−1))) Σ_{i=1}^n (Δv_{i·}'Δξ_{i·})(Δv_{i·}'Δv*_{i·}) − (2/(n(T−1))) Σ_{i=1}^n (1_{T−1}'d_{i·})(Δv_{i·}'Δξ_{i·})
 − (2/(n(T−1))) Σ_{i=1}^n [(1_{T−1}'d_{i·})(Δv_{i·}'Δv*_{i·} − E(Δv_{i·}'Δv*_{i·}))]
 ≡ Σ_{r=1}^5 H_r.

Now, H_1 = (1/(n(T−1))) Σ_{i=1}^n [Δξ_{i·}'(Δv_{i·}Δv_{i·}' − σ²_{v0}C)Δξ_{i·}] + (σ²_{v0}/(n(T−1))) Σ_{i=1}^n [Δξ_{i·}'CΔξ_{i·} − E(Δξ_{i·}'CΔξ_{i·})]. For the first term, let V_{n,i} = Δξ_{i·}'(Δv_{i·}Δv_{i·}' − σ²_{v0}C)Δξ_{i·}. As Δξ_{i·} is G_{n,i−1}-measurable, E(V_{n,i}|G_{n,i−1}) = 0. Thus, {V_{n,i}, G_{n,i}} form an M.D. array. It is easy to see that E|V_{n,i}|^{1+ε} ≤ K_v < ∞, for some ε > 0. Thus, {V_{n,i}} is uniformly integrable. The other two conditions of the WLLN for M.D. arrays of Davidson are satisfied. Thus, (1/(n(T−1))) Σ_{i=1}^n V_{n,i} →^p 0.

For the second term of H_1, note that Δξ_{i·}'CΔξ_{i·} = Σ_t Σ_s Δξ_{it}c_{ts}Δξ_{is}, where {c_{ts}} = C is the constant matrix defined above (3.3), and recall Δξ_t = Σ_{s=2}^T (Φ^u_{st} + Φ^l_{ts})Δv_s. We have

Δξ_{it} = Σ_{s=2}^T Σ_{j=1}^{i−1} (Φ_{js,it} + Φ_{it,js})Δv_{js} = Σ_{j=1}^{i−1} Σ_{s=2}^T (Φ_{js,it} + Φ_{it,js})Δv_{js} = Σ_{j=1}^{i−1} φ_{ijt}'Δv_{j·},

where φ_{ijt} = (Φ_{j·,it} + Φ_{it,j·}). Thus, (Δξ_{it})² − E[(Δξ_{it})²] = Σ_{j=1}^{i−1} [φ_{ijt}'(Δv_{j·}Δv_{j·}' − σ²_{v0}C)φ_{ijt}] + 2 Σ_{j=1}^{i−1} Σ_{k=1}^{j−1} Δv_{j·}'φ_{ijt}φ_{ikt}'Δv_{k·}. It follows that

(1/(n(T−1))) Σ_{i=1}^n {(Δξ_{it})² − E[(Δξ_{it})²]}
 = (1/(n(T−1))) Σ_{j=1}^{n−1} Σ_{i=j+1}^n [φ_{ijt}'(Δv_{j·}Δv_{j·}' − σ²_{v0}C)φ_{ijt}]
 + 2 (1/(n(T−1))) Σ_{j=1}^{n−1} Δv_{j·}'{Σ_{i=j+1}^n Σ_{k=1}^{j−1} φ_{ijt}φ_{ikt}'Δv_{k·}}.

Clearly, the first term is the 'average' of n−1 independent terms, and the second is the 'average' of an M.D. array, as the term in the curly brackets is G_{n,j−1}-measurable. The conditions of Theorem 19.7 of Davidson (1994) are easily verified, and hence (1/(n(T−1))) Σ_{i=1}^n {(Δξ_{it})² − E[(Δξ_{it})²]} = o_p(1). Similarly, one shows that (1/(n(T−1))) Σ_{i=1}^n {Δξ_{it}Δξ_{is} − E[Δξ_{it}Δξ_{is}]} = o_p(1) for s ≠ t. Thus, (σ²_{v0}/(n(T−1))) Σ_{i=1}^n [Δξ_{i·}'CΔξ_{i·} − E(Δξ_{i·}'CΔξ_{i·})] = o_p(1), and H_1 = o_p(1).

The proofs for H_3 and H_4 can be done in a similar manner as the proof for the second term of H_1. The proofs for H_2 and H_5 are similar to the proof of the first part of H_1, as each involves a sum of n independent terms.

Third, with g_{3i} = Δv_{2i}Δζ_i + Θ_{ii}(Δv_{2i}Δy°_{1i} + σ²_{v0}) + Δv_{i−}'Δy*_{1i−}, we obtain (1/(n(T−1))) Σ_{i=1}^n [g²_{3i} − E(g²_{3i})] ≡ Σ_{r=1}^{10} Q_r, where

Q_1 = (1/(n(T−1))) Σ_{i=1}^n [(Δv²_{2i} − 2σ²_{v0})Δζ²_i],
Q_2 = (2σ²_{v0}/(n(T−1))) Σ_{i=1}^n [Δζ²_i − E(Δζ²_i)],
Q_3 = (1/(n(T−1))) Σ_{i=1}^n Θ²_{ii}[(Δv_{2i}Δy°_{1i})² − E((Δv_{2i}Δy°_{1i})²)],
Q_4 = (2σ²_{v0}/(n(T−1))) Σ_{i=1}^n Θ²_{ii}[Δv_{2i}Δy°_{1i} − E(Δv_{2i}Δy°_{1i})],
Q_5 = (1/(n(T−1))) Σ_{i=1}^n [(Δv_{i−}'Δy*_{1i−})² − E((Δv_{i−}'Δy*_{1i−})²)],
Q_6 = (2/(n(T−1))) Σ_{i=1}^n Θ_{ii}[Δv²_{2i}Δζ_iΔy°_{1i} − E(Δv²_{2i}Δζ_iΔy°_{1i})],
Q_7 = (2σ²_{v0}/(n(T−1))) Σ_{i=1}^n Θ_{ii}Δv_{2i}Δζ_i,
Q_8 = (2/(n(T−1))) Σ_{i=1}^n [Δv_{2i}Δζ_i(Δv_{i−}'Δy*_{1i−}) − E(Δv_{2i}Δζ_i(Δv_{i−}'Δy*_{1i−}))],
Q_9 = (2/(n(T−1))) Σ_{i=1}^n Θ_{ii}[(Δv_{2i}Δy°_{1i})(Δv_{i−}'Δy*_{1i−}) − E((Δv_{2i}Δy°_{1i})(Δv_{i−}'Δy*_{1i−}))],
Q_10 = (2σ²_{v0}/(n(T−1))) Σ_{i=1}^n Θ_{ii}[Δv_{i−}'Δy*_{1i−} − E(Δv_{i−}'Δy*_{1i−})].

As Δζ²_i is F_{n,i−1}-measurable, Q_1 is the average of an M.D. array and its convergence follows from the WLLN for M.D. arrays; the convergence of Q_7 then follows immediately. For Q_2, note that Δζ = (Θ^u + Θ^l)Δy_1° = (Θ^u + Θ^l)B_{30}B_{10}Δy_1. It follows that Q_2 = (2σ²_{v0}/(n(T−1))) [Δy_1'AΔy_1 − E(Δy_1'AΔy_1)] = o_p(1) by Assumption F, where A = ((Θ^u + Θ^l)B_{30}B_{10})'(Θ^u + Θ^l)B_{30}B_{10} is easily seen to be uniformly bounded in both row and column sums. Writing Δy_1° = B_{30}B_{20}Δy_0 + B_{30}Δx_1β_0 + Δv_1 ≡ g(y_0, v_0) + v_1, the convergence of Q_3, Q_4 and Q_6 can be proved easily, though tediously. The results for Q_5 and Q_10 are proved by the independence between Δv_{i−} and Δy*_{1i−} and Assumption F. Finally, the results for Q_8 and Q_9 can be proved by further writing Δy*_{1t} = Ψ_{t+}Δy_1 = Ψ_{t+}(B_{30}B_{10})^{-1}Δy_1° ≡ q(Δy_0, v_0) + Ψ_{t+}(B_{30}B_{10})^{-1}v_1.

Subsequently, the cross-product terms (1/(n(T−1))) Σ_{i=1}^n [g_{1i}g_{2i} − E(g_{1i}g_{2i})], (1/(n(T−1))) Σ_{i=1}^n [g_{1i}g_{3i} − E(g_{1i}g_{3i})], and (1/(n(T−1))) Σ_{i=1}^n [g_{2i}g_{3i} − E(g_{2i}g_{3i})] can all be decomposed in a similar manner, and the convergence of each of the decomposed terms can be proved in a similar way.
References

[1] Anderson, T. W., Hsiao, C., 1981. Estimation of dynamic models with error components. Journal of the American Statistical Association 76, 598-606.
[2] Anderson, T. W., Hsiao, C., 1982. Formulation and estimation of dynamic models using panel data. Journal of Econometrics 18, 47-82.
[3] Andrews, D. W. K., 1992. Generic uniform convergence. Econometric Theory 8, 241-257.
[4] Andrews, D. W. K., 2005. Cross-section regression with common shocks. Econometrica 73, 1551-1585.
[5] Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic Press, The Netherlands.
[6] Anselin, L., 2001. Spatial econometrics. In: Baltagi, B. H. (Ed.), A Companion to Theoretical Econometrics. Blackwell Publishers Ltd., Massachusetts, pp. 310-330.
[7] Anselin, L., Le Gallo, J., Jayet, J., 2008. Spatial panel econometrics. In: Mátyás, L., Sevestre, P. (Eds.), The Econometrics of Panel Data: Fundamentals and Recent Developments in Theory and Practice. Springer-Verlag, Berlin Heidelberg, pp. 625-660.
[8] Bai, J., 2009. Panel data models with interactive fixed effects. Econometrica 77, 1229-1279.
[9] Baltagi, B. H., Egger, P., Pfaffermayr, M., 2013. A generalized spatial panel model with random effects. Econometric Reviews 32, 650-685.
[10] Bernstein, D. S., 2009. Matrix Mathematics: Theory, Facts, and Formulas. 2nd edition. Princeton University Press, Princeton.
[11] Bhargava, A., Sargan, J. D., 1983. Estimating dynamic random effects models from panel data covering short time periods. Econometrica 51, 1635-1659.
[12] Binder, M., Hsiao, C., Pesaran, M. H., 2005. Estimation and inference in short panel vector autoregressions with unit roots and cointegration. Econometric Theory 21, 795-837.
[13] Bun, M. J., Carree, M. A., 2005. Bias-corrected estimation in dynamic panel data models. Journal of Business and Economic Statistics 23, 200-210.
[14] Čížek, P., Jacobs, J. P. A. M., Ligthart, J., Vrijburg, H., 2014. GMM estimation of fixed effects dynamic panel data models with spatial lag and spatial errors. CentER Discussion Paper No. 2014-003, Tilburg University.
[15] Davidson, J., 1994. Stochastic Limit Theory. Oxford University Press, Oxford.
[16] Elhorst, J. P., 2005. Unconditional maximum likelihood estimation of linear and log-linear dynamic models for spatial panels. Geographical Analysis 37, 85-106.
[17] Elhorst, J. P., 2010. Dynamic panels with endogenous interaction effects when T is small. Regional Science and Urban Economics 40, 272-282.
[18] Elhorst, J. P., 2012. Dynamic spatial panels: models, methods, and applications. Journal of Geographical Systems 14, 5-28.
[19] Gouriéroux, C., Phillips, P. C. B., Yu, J., 2010. Indirect inference for dynamic panel models. Journal of Econometrics 157, 68-77.
[20] Hsiao, C., Pesaran, M. H., Tahmiscioglu, A. K., 2002. Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods. Journal of Econometrics 109, 107-150.
[21] Hsiao, C., 2003. Analysis of Panel Data. 2nd edition. Cambridge University Press.
[22] Huber, P. J., 1964. Robust estimation of a location parameter. Annals of Mathematical Statistics 35, 73-101.
[23] Huber, P. J., 1981. Robust Statistics. Wiley, New York.
[24] Kapoor, M., Kelejian, H. H., Prucha, I. R., 2007. Panel data models with spatially correlated error components. Journal of Econometrics 140, 97-130.
[25] Kelejian, H. H., Prucha, I. R., 1999. A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review 40, 509-533.
[26] Kelejian, H. H., Prucha, I. R., 2001. On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics 104, 219-257.
[27] Korniotis, G. M., 2010. Estimating panel models with internal and external habit formation. Journal of Business and Economic Statistics 28, 145-158.
[28] Kruiniger, H., 2013. Quasi ML estimation of the panel AR(1) model with arbitrary initial conditions. Journal of Econometrics 173, 175-188.
[29] Kuersteiner, G. M., Prucha, I. R., 2015. Dynamic panel data models: networks, common shocks, and sequential exogeneity. CESifo Working Paper, University of Maryland.
[30] Lee, L. F., 2002. Consistency and efficiency of least squares estimation for mixed regressive spatial autoregressive models. Econometric Theory 18, 252-277.
[31] Lee, L. F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 1899-1925.
[32] Lee, L. F., Yu, J., 2010a. A spatial dynamic panel data model with both time and individual fixed effects. Econometric Theory 26, 564-597.
[33] Lee, L. F., Yu, J., 2010b. Some recent developments in spatial panel data models. Regional Science and Urban Economics 40, 255-271.
[34] Lee, L. F., Yu, J., 2012. Spatial panels: random components vs. fixed effects. International Economic Review 53, 1369-1412.
[35] Lee, L. F., Yu, J., 2014. Efficient GMM estimation of spatial dynamic panel data models with fixed effects. Journal of Econometrics 180, 174-197.
[36] Lee, L. F., Yu, J., 2015. Spatial panel data models. In: Baltagi, B. H. (Ed.), The Oxford Handbook of Panel Data. Oxford University Press, Oxford, pp. 363-401.
[37] Lee, L. F., Yu, J., 2016. Identification of spatial Durbin panel models. Journal of Applied Econometrics 31, 133-162.
[38] Mutl, J., 2006. Dynamic panel data models with spatially correlated disturbances. PhD Thesis, University of Maryland, College Park.
[39] Newey, W., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: McFadden, D., Engle, R. F. (Eds.), Handbook of Econometrics, Vol. IV. Elsevier, North Holland, pp. 2111-2245.
[40] Pesaran, M. H., 2006. Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74, 967-1012.
[41] Pesaran, M. H., Tosetti, E., 2011. Large panels with common factors and spatial correlation. Journal of Econometrics 161, 182-202.
[42] Su, L., Yang, Z., 2015. QML estimation of dynamic panel data models with spatial errors. Journal of Econometrics 185, 230-258.
[43] van der Vaart, A. W., 1998. Asymptotic Statistics. Cambridge University Press.
[44] Yang, Z., Li, C., Tse, Y. K., 2006. Functional form and spatial dependence in dynamic panels. Economics Letters 91, 138-145.
[45] Yang, Z., 2015. A general method for third-order bias and variance correction on a nonlinear estimator. Journal of Econometrics 186, 178-200.
[46] Yu, J., de Jong, R., Lee, L. F., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Journal of Econometrics 146, 118-134.
[47] Yu, J., Lee, L. F., 2010. Estimation of unit root spatial dynamic panel data models. Econometric Theory 26, 1332-1362.
Table 1a. Empirical Mean(sd) of CQMLE, FQMLE and M-Estimator, SE Model, m = 5 err ψ CQMLE FQMLE M-Est CQMLE FQMLE M-Est n = 50, T = 3 n = 200, T = 3 1 1 1.0152(.096) 1.0017(.100) 1.0015(.100) 1.0109(.050) 1.0021(.052) 1.0020(.053) 1 .9154(.135) .9678(.148) .9719(.154) .9080(.065) .9960(.079) .9962(.080) .5 .3605(.055) .4995(.065) .5015(.066) .2869(.033) .5009(.043) .5013(.044) .5 .4702(.107) .4761(.093) .4793(.105) .4775(.073) .4877(.060) .4907(.070) 2
1 1 .5 .5
1.0142(.098) .9176(.266) .3610(.066) .4701(.106)
1.0007(.102) .9662(.284) .4975(.069) .4770(.092)
1.0002(.102) .9785(.307) .5023(.078) .4803(.104)
1.0099(.050) .9045(.128) .2876(.041) .4741(.075)
1.0015(.053) .9920(.152) .5002(.047) .4844(.063)
1.0014(.053) .9935(.155) .5018(.052) .4883(.072)
3
1 1 .5 .5
1.0133(.099) .9192(.198) .3585(.059) .4681(.110)
1.0001(.103) .9678(.212) .4953(.066) .4736(.093)
.9997(.103) .9771(.227) .4992(.071) .4786(.106)
1.0090(.047) .9060(.099) .2881(.036) .4741(.075)
1.0003(.049) .9938(.119) .5018(.046) .4852(.062)
1.0003(.049) .9947(.121) .5029(.048) .4884(.073)
1
1 1 0 .5
1.0525(.100) .9204(.138) -.1524(.065) .4731(.106)
1.0035(.103) .9255(.126) -.0036(.074) .4848(.085)
1.0012(.104) .9702(.154) .0032(.078) .4807(.105)
1.0517(.052) .9313(.069) -.1825(.035) .4820(.072)
1.0009(.053) .9712(.066) -.0032(.042) .4897(.059)
.9999(.053) .9915(.078) .0005(.043) .4881(.070)
2
1 1 0 .5
1.0528(.099) .9230(.265) -.1529(.071) .4741(.103)
1.0042(.102) .9032(.241) -.0091(.076) .4880(.086)
1.0006(.104) .9764(.299) .0022(.086) .4805(.102)
1.0479(.053) .9327(.133) -.1821(.039) .4806(.073)
.9979(.055) .9596(.129) -.0042(.043) .4917(.059)
.9962(.055) .9940(.150) .0018(.047) .4873(.072)
3
1 1 0 .5
1.0515(.100) .9250(.200) -.1543(.068) .4740(.107)
1.0021(.103) .9194(.185) -.0077(.076) .4855(.088)
.9990(.104) .9767(.224) .0014(.083) .4811(.105)
1.0497(.053) .9319(.102) -.1831(.037) .4834(.072)
.9998(.054) .9661(.100) -.0045(.043) .4929(.057)
.9985(.054) .9924(.115) .0001(.045) .4906(.070)
1
1 1 .5 .5
1.0248(.044) .9771(.081) .4456(.028) .4928(.057)
2
1 1 .5 .5
1.0247(.045) .9776(.183) .4461(.028) .4919(.058)
n = 50, T = 7 1.0015(.044) 1.0013(.044) .9888(.083) .9893(.083) .4987(.029) .4990(.029) .4920(.055) .4947(.056) 1.0012(.045) .9887(.186) .4988(.029) .4914(.055)
1.0010(.045) .9899(.187) .4992(.029) .4941(.057)
1.0231(.033) .9821(.059) .4407(.021) .4931(.047) 1.0232(.033) .9806(.129) .4412(.022) .4906(.048)
n = 100, T = 7 1.0018(.033) 1.0017(.033) .9949(.060) .9956(.061) .4990(.022) .4994(.022) .4904(.044) .4953(.046) 1.0020(.033) .9931(.132) .4991(.022) .4882(.046)
1.0019(.033) .9942(.133) .4996(.022) .4928(.047)
1 1.0250(.044) 1.0015(.044) 1.0013(.044) 1.0214(.033) 1.0003(.033) 1.0002(.033) 1 .9751(.130) .9863(.133) .9872(.134) .9779(.095) .9908(.097) .9915(.097) .5 .4458(.028) .4986(.029) .4990(.029) .4413(.020) .4996(.021) .5000(.021) .5 .4903(.057) .4896(.056) .4923(.057) .4919(.048) .4898(.045) .4940(.047) Note: Par = ψ = (β, σv2 , ρ, λ3 ) ; err = 1 (normal), 2 (normal mixture), 3 (chi-square). Xt values are generated with θx = (g, φ1 , φ2 , σ1 , σ2) = (.01, .5, .5, 1, .5) as in Footnote 13. W3 is generated according to Group Interaction scheme as in Footnote 14. 3
err ψ
Table 1b. Empirical sd and average of standard errors of M-Estimator SE Model, m = 5, Parameter configurations as in Table 1a. sd se se rse sd se se rse sd se se rse n = 50, T = 3 n = 100, T = 3 n = 200, T = 3 .100 .112 .099 .096 .071 .073 .070 .069 .053 .053 .051 .051 .154 .165 .150 .146 .113 .114 .110 .109 .080 .081 .079 .080 .066 .068 .064 .065 .059 .054 .054 .056 .044 .040 .042 .044 .105 .111 .099 .096 .083 .086 .081 .080 .070 .070 .068 .068
1
1 1 .5 .5
2
1 1 .5 .5
.102 .307 .078 .104
.124 .117 .071 .126
.099 .152 .064 .099
.093 .263 .070 .090
.071 .209 .065 .089
.078 .076 .053 .095
.069 .110 .054 .082
.068 .198 .063 .078
.053 .155 .052 .072
.055 .050 .037 .074
.051 .079 .042 .068
.050 .147 .051 .067
3
1 1 .5 .5
.103 .227 .071 .106
.117 .133 .070 .118
.099 .151 .064 .099
.095 .203 .066 .093
.070 .162 .062 .088
.075 .089 .053 .091
.069 .110 .054 .082
.069 .153 .060 .079
.049 .121 .048 .073
.053 .061 .039 .072
.051 .079 .042 .068
.051 .113 .047 .067
1
1 1 .0 .5
.104 .154 .078 .105
.113 .165 .081 .111
.102 .149 .075 .099
.100 .144 .075 .094
.072 .111 .056 .087
.074 .112 .057 .086
.071 .107 .055 .082
.071 .106 .055 .081
.053 .078 .043 .070
.054 .078 .042 .071
.052 .076 .042 .069
.052 .076 .042 .068
2
1 1 .0 .5
.104 .299 .086 .102
.126 .117 .086 .126
.103 .157 .077 .099
.131 .568 .173 .094
.074 .211 .065 .087
.078 .073 .058 .094
.071 .107 .055 .082
.070 .196 .060 .078
.055 .150 .047 .072
.056 .048 .041 .075
.052 .076 .042 .069
.052 .143 .046 .067
3
1 1 .0 .5
.104 .224 .083 .105
.120 .132 .084 .117
.102 .150 .075 .099
.099 .203 .077 .093
.073 .156 .059 .084
.076 .086 .057 .091
.071 .107 .055 .082
.071 .149 .057 .079
.054 .115 .045 .070
.054 .058 .042 .072
.052 .076 .042 .068
.052 .109 .044 .067
1
1 1 .5 .5
.044 .083 .029 .056
n = 50, .047 .090 .031 .060
T =7 .044 .084 .028 .055
.043 .082 .028 .054
.033 .061 .022 .046
n = 100, T = 7 .034 .032 .032 .062 .059 .058 .022 .021 .021 .048 .046 .045
.025 .042 .016 .039
2
1 1 .5 .5
.045 .187 .029 .057
.050 .050 .032 .066
.044 .084 .028 .055
.043 .172 .028 .052
.033 .133 .022 .047
.035 .032 .023 .051
.032 .059 .021 .046
.032 .127 .021 .045
.025 .093 .017 .040
.026 .021 .016 .041
.025 .042 .016 .039
.025 .092 .016 .038
3
1 1 .5 .5
.044 .134 .029 .057
.049 .062 .031 .063
.044 .083 .028 .056
.043 .128 .028 .053
.033 .097 .021 .047
.034 .041 .022 .049
.032 .059 .021 .046
.032 .092 .021 .046
.026 .069 .017 .039
.026 .028 .016 .040
.025 .042 .016 .039
.025 .066 .016 .038
n = 200, T = 7 .026 .025 .025 .043 .042 .042 .016 .016 .015 .040 .039 .039
Table 2a. Empirical Mean(sd) of CQMLE and M-Estimator, SL Model, T = 3, m = 5 n = 50 n = 100 n = 200 err ψ CQMLE M-Est CQMLE M-Est CQMLE M-Est 1 1 1.0190(.053) .9992(.055) 1.0024(.035) 1.0004(.036) 1.0112(.025) .9998(.025) .9365(.133) .9695(.143) .9620(.096) .9855(.100) .9657(.068) .9949(.072) 1 .4279(.042) .5015(.047) .4467(.024) .5007(.026) .4310(.020) .4995(.022) .5 .2331(.060) .1933(.064) .2114(.049) .1953(.052) .2048(.039) .1980(.040) .2 2
1 1 .5 .2
1.0193(.052) .9391(.260) .4289(.045) .2335(.061)
.9992(.054) .9743(.280) .5031(.048) .1938(.065)
1.0005(.034) .9558(.184) .4474(.027) .2124(.050)
.9984(.034) .9797(.194) .5012(.028) .1967(.053)
1.0116(.026) .9635(.137) .4318(.022) .2030(.037)
1.0003(.026) .9929(.145) .5000(.023) .1962(.039)
3
1 1 .5 .2
1.0180(.055) .9388(.203) .4277(.043) .2354(.060)
.9980(.056) .9730(.218) .5018(.047) .1960(.064)
1.0019(.035) .9581(.147) .4461(.027) .2121(.050)
.9998(.036) .9817(.155) .5000(.029) .1962(.052)
1.0111(.024) .9623(.102) .4319(.021) .2054(.037)
.9997(.025) .9913(.108) .4999(.023) .1986(.039)
1
1 1 -.5 .2
1.0261(.057) .9600(.136) -.5577(.046) .1898(.096)
.9978(.059) .9753(.141) -.4976(.050) .1841(.097)
1.0188(.037) .9761(.099) -.5543(.031) .1855(.059)
.9993(.037) .9891(.102) -.4989(.034) .1978(.059)
1.0226(.027) .9797(.068) -.5513(.022) .1824(.044)
.9992(.027) .9917(.070) -.4991(.024) .1975(.044)
2
1 1 -.5 .2
1.0254(.056) .9557(.281) -.5557(.048) .1985(.093)
.9971(.058) .9719(.290) -.4958(.052) .1920(.094)
1.0179(.037) .9745(.201) -.5542(.033) .1853(.058)
.9985(.037) .9878(.206) -.4990(.035) .1974(.058)
1.0228(.028) .9857(.140) -.5518(.023) .1815(.043)
.9993(.028) .9980(.143) -.4995(.024) .1966(.043)
3
1 1.0261(.057) .9978(.059) 1.0184(.037) .9989(.037) 1.0225(.028) .9990(.028) .9487(.198) .9640(.204) .9752(.155) .9884(.159) .9808(.105) .9929(.108) 1 -.5 -.5562(.048) -.4971(.052) -.5547(.034) -.4992(.037) -.5514(.023) -.4991(.025) .1938(.097) .1877(.098) .1824(.058) .1943(.058) .1827(.043) .1974(.044) .2 Note: Par = ψ = (β, σv2 , ρ, λ1 ) ; err = 1 (normal), 2 (normal mixture), 3 (chi-square). Xt values are generated with θx = (g, φ1 , φ2, σ1 , σ2 ) = (.01, .5, .5, 2, 1) as in Footnote 13. W1 is generated according to Queen Contiguity scheme.
err ψ 1 1 1 .5 .2
Table 2b. Empirical sd and average of standard errors of M-Estimator SL Model, T = 3, m = 5, Parameter configurations as in Table 2a. n = 50 n = 100 n = 200 sd se se rse sd se se rse sd se se .055 .060 .054 .052 .036 .036 .035 .034 .025 .026 .026 .143 .158 .142 .138 .100 .106 .101 .100 .072 .075 .073 .047 .049 .044 .044 .026 .028 .026 .026 .022 .021 .021 .064 .070 .063 .061 .052 .048 .052 .059 .040 .037 .039
rse .025 .072 .021 .042
2
1 1 .5 .2
.054 .280 .048 .065
.066 .105 .052 .077
.054 .143 .044 .063
.052 .255 .046 .061
.034 .194 .028 .053
.039 .063 .029 .051
.034 .101 .026 .052
.034 .190 .027 .058
.026 .145 .023 .039
.027 .042 .021 .039
.026 .073 .021 .039
.025 .141 .022 .042
3
1 1 .5 .2
.056 .218 .047 .064
.062 .122 .050 .074
.054 .143 .044 .063
.052 .196 .045 .060
.036 .155 .029 .052
.037 .078 .028 .049
.034 .101 .026 .052
.034 .144 .027 .058
.025 .108 .023 .039
.027 .053 .021 .038
.026 .072 .021 .039
.025 .105 .022 .042
Table 3a. Empirical Mean(sd) of n = 50 err ψ CQMLE M-Est 1 1 1.0028(.051) 1.0012(.052) 1 .9268(.133) .9542(.141) .5 .4332(.041) .4993(.044) .2 .2064(.078) .1930(.083) .2 .1383(.185) .1489(.183) 2
1 1 .5 .2 .2
1.0006(.050) .9219(.263) .4355(.043) .2016(.078) .1411(.175)
.9989(.051) .9505(.280) .5011(.045) .1881(.083) .1525(.171)
CQMLE and M-Estimator, SLE Model, T = 3, m = 5 n = 100 n = 200 CQMLE M-Est CQMLE M-Est .9894(.035) .9990(.036) 1.0130(.028) 1.0001(.028) .9491(.093) .9800(.100) .9629(.070) .9911(.075) .4285(.029) .4999(.032) .4337(.019) .5010(.021) .2190(.069) .1905(.075) .1871(.063) .1938(.066) .1561(.148) .1582(.146) .1781(.122) .1723(.120) .9908(.036) .9495(.188) .4291(.032) .2199(.067) .1597(.148)
1.0005(.037) .9813(.200) .5006(.034) .1921(.072) .1634(.145)
1.0123(.027) .9626(.138) .4332(.022) .1881(.064) .1778(.122)
.9995(.028) .9910(.146) .5001(.023) .1940(.067) .1733(.120)
.9984(.053) .9920(.036) 1.0015(.037) 1.0121(.028) .9993(.028) 1 1.0001(.052) 1 .9247(.199) .9527(.212) .9461(.143) .9771(.152) .9596(.102) .9875(.108) .5 .4345(.042) .5006(.046) .4287(.031) .4996(.034) .4324(.021) .4991(.022) .2 .2038(.080) .1901(.086) .2209(.071) .1925(.076) .1898(.063) .1955(.066) .2 .1394(.186) .1510(.182) .1598(.148) .1629(.145) .1745(.121) .1695(.118) 2 Note: Par = ψ = (β, σv , ρ, λ1 , λ3 ) ; err = 1 (normal), 2 (normal mixture), 3 (chi-square). Xt values are generated with θx = (g, φ1 , φ2 , σ1, σ2 ) = (.01, .5, .5, 2, 1) as in Footnote 13. W1 and W3 are from Group Interaction scheme, and not equal; see Footnote 14. 3
err ψ 1 1 1 .5 .2 .2
Table 3b. Empirical sd and average of standard errors of M-Estimator SLE Model, T = 3, m = 5, Parameter configurations as in Table 3a. n = 50 n = 100 n = 200 sd se se rse sd se se rse sd se se .052 .056 .051 .050 .036 .038 .036 .035 .028 .029 .028 .141 .157 .140 .137 .100 .108 .102 .100 .075 .075 .072 .044 .045 .042 .042 .032 .032 .031 .031 .021 .021 .021 .083 .072 .080 .098 .075 .065 .072 .086 .066 .056 .064 .183 .179 .160 .163 .146 .143 .135 .138 .120 .116 .114
rse .028 .072 .021 .076 .117
2
1 1 .5 .2 .2
.051 .280 .045 .083 .171
.062 .107 .049 .080 .208
.050 .140 .041 .079 .160
.049 .247 .042 .095 .153
.037 .200 .034 .072 .145
.041 .067 .033 .070 .155
.036 .102 .031 .072 .134
.035 .192 .033 .085 .134
.028 .146 .023 .067 .120
.030 .043 .021 .059 .122
.028 .072 .021 .064 .113
.027 .141 .022 .074 .114
3
1 1 .5 .2 .2
.053 .212 .046 .086 .182
.058 .123 .046 .076 .191
.051 .140 .041 .080 .160
.050 .190 .043 .097 .159
.037 .152 .034 .076 .145
.039 .080 .031 .067 .147
.036 .101 .031 .072 .134
.035 .144 .033 .085 .136
.028 .108 .022 .066 .118
.029 .054 .021 .057 .119
.028 .072 .021 .064 .114
.028 .105 .022 .075 .115
Table 4a. Empirical Mean(sd) of CQMLE and M-Estimator, STL Model, T = 7, m = 5 n = 50 n = 100 n = 200 err ψ CQMLE M-Est CQMLE M-Est CQMLE M-Est 1 1 1.0045(.025) 1.0000(.025) 1.0073(.017) .9998(.017) 1.0078(.012) 1.0002(.012) 1 .9809(.079) .9855(.080) .9897(.058) .9940(.058) .9923(.041) .9966(.041) .5 .4780(.018) .4996(.019) .4807(.012) .5001(.012) .4812(.008) .5001(.009) .2 .1904(.046) .1950(.046) .1971(.031) .1994(.031) .1974(.023) .1984(.023) .2 .2280(.041) .2043(.042) .2208(.030) .2006(.030) .2200(.021) .2013(.021) 2
1 1 .5 .2 .2
1.0040(.026) .9920(.182) .4780(.018) .1908(.046) .2276(.042)
.9994(.026) .9968(.184) .4999(.019) .1954(.046) .2038(.042)
1.0078(.017) .9884(.130) .4805(.013) .1962(.032) .2216(.031)
1.0003(.018) .9927(.131) .4999(.013) .1985(.032) .2014(.031)
1.0074(.012) .9934(.090) .4816(.009) .1986(.023) .2186(.022)
.9998(.012) .9977(.091) .5005(.009) .1995(.023) .1999(.022)
.9999(.012) 1 1.0050(.025) 1.0006(.025) 1.0076(.018) 1.0001(.018) 1.0075(.012) 1 .9815(.135) .9861(.137) .9912(.095) .9955(.096) .9903(.067) .9945(.068) .5 .4783(.018) .4999(.018) .4805(.012) .4999(.012) .4810(.008) .4999(.009) .2 .1905(.048) .1950(.048) .1954(.031) .1977(.031) .1978(.023) .1988(.023) .2 .2278(.042) .2041(.042) .2226(.030) .2024(.030) .2198(.022) .2012(.022) 2 Note: Par = ψ = (β, σv , ρ, λ1 , λ2 ) ; err = 1 (normal), 2 (normal mixture), 3 (chi-square). Xt values are generated with θx = (g, φ1 , φ2 , σ1 , σ2 ) = (.01, .5, .5, 2, 1) as in Footnote 13. W1 and W2 are from Queen Contiguity, and equal. 3
err ψ 1 1 1 .5 .2 .2
Table 4b. Empirical sd and average of standard errors of M-Estimator STL Model, T = 7, m = 5, Parameter configurations as in Table 4a. n = 50 n = 100 n = 200 sd se se rse sd se se rse sd se se .025 .027 .025 .024 .017 .018 .017 .017 .012 .012 .012 .080 .088 .081 .080 .058 .060 .058 .057 .041 .042 .041 .019 .019 .018 .019 .012 .012 .012 .013 .009 .008 .009 .046 .049 .047 .048 .031 .032 .032 .032 .023 .023 .023 .042 .044 .042 .044 .030 .030 .030 .032 .021 .021 .022
rse .012 .041 .009 .024 .023
2
1 1 .5 .2 .2
.026 .184 .019 .046 .042
.029 .046 .021 .052 .047
.025 .082 .019 .047 .042
.025 .174 .019 .048 .044
.018 .131 .013 .032 .031
.019 .029 .013 .033 .031
.017 .058 .012 .031 .030
.017 .126 .013 .032 .032
.012 .091 .009 .023 .022
.013 .020 .008 .024 .021
.012 .041 .009 .023 .022
.012 .091 .009 .024 .023
3
1 1 .5 .2 .2
.025 .137 .018 .048 .042
.028 .060 .020 .050 .045
.025 .081 .018 .046 .042
.024 .126 .019 .048 .044
.018 .096 .012 .031 .030
.018 .039 .012 .033 .030
.017 .058 .012 .032 .030
.017 .092 .013 .032 .032
.012 .068 .009 .023 .022
.013 .026 .008 .023 .021
.012 .041 .009 .023 .022
.012 .066 .009 .024 .023
Table 5a. Empirical Mean(sd) of CQMLE and M-Estimator, STLE Model, T = 3, m = 5 n = 50 n = 100 n = 200 err ψ CQMLE M-Est CQMLE M-Est CQMLE M-Est 1 1 1.0076(.031) .9995(.031) 1.0092(.024) .9999(.025) 1.0057(.016) .9999(.016) 1 .9287(.133) .9437(.138) .9622(.099) .9746(.102) .9723(.070) .9841(.071) .3 .2578(.033) .3000(.035) .2663(.021) .2992(.022) .2685(.014) .2999(.015) .2 .1966(.073) .1957(.075) .1877(.059) .1967(.060) .2030(.037) .1987(.038) .2 .2209(.084) .2055(.091) .2240(.047) .2037(.049) .2019(.037) .2005(.039) .2 .1463(.185) .1459(.191) .1846(.132) .1745(.135) .1785(.092) .1838(.094) 2
1 1 .3 .2 .2 .2
1.0084(.031) .9288(.264) .2564(.035) .1989(.073) .2168(.082) .1367(.184)
1.0003(.032) .9448(.273) .2986(.036) .1983(.076) .2008(.089) .1358(.190)
1.0091(.025) .9591(.195) .2661(.022) .1895(.060) .2211(.047) .1805(.131)
.9998(.025) .9717(.201) .2988(.022) .1985(.060) .2008(.050) .1702(.134)
1.0055(.016) .9713(.140) .2685(.015) .2043(.036) .2015(.036) .1799(.090)
.9997(.016) .9832(.144) .2998(.015) .1999(.037) .2003(.039) .1855(.092)
.9985(.031) 1.0084(.025) .9991(.025) 1.0054(.016) .9995(.016) 1 1.0066(.031) 1 .9318(.201) .9476(.208) .9644(.148) .9769(.152) .9727(.102) .9846(.105) .3 .2590(.034) .3014(.036) .2679(.022) .3008(.022) .2689(.015) .3003(.015) .2 .1983(.071) .1978(.073) .1879(.060) .1969(.060) .2046(.037) .2003(.038) .2 .2193(.081) .2035(.087) .2207(.047) .2003(.049) .2009(.037) .1998(.040) .2 .1412(.184) .1403(.190) .1852(.133) .1750(.135) .1794(.093) .1849(.095) 2 Note: Par = ψ = (β, σv , ρ, λ1 , λ2 , λ3 ) ; err = 1 (normal), 2 (normal mixture), 3 (chi-square). Xt values are generated with θx = (g, φ1 , φ2 , σ1 , σ2 ) = (.01, .5, .5, 3, 1) as in Footnote 13. W1 , W2 and W3 are all from Queen Contiguity, and equal. 3
err ψ 1 1 1 .3 .2 .2 .2
Table 5b. Empirical sd and average of standard errors of M-Estimator STLE Model, T = 3, m = 5, Parameter configurations as in Table 5a. n = 50 n = 100 n = 200 sd se se rse sd se se rse sd se se .031 .035 .031 .030 .025 .026 .025 .025 .016 .017 .016 .138 .156 .136 .132 .102 .106 .099 .098 .071 .073 .071 .035 .038 .035 .035 .022 .023 .021 .022 .015 .015 .015 .075 .080 .071 .071 .060 .062 .059 .060 .038 .039 .038 .091 .079 .085 .108 .049 .043 .049 .060 .039 .033 .039 .191 .209 .184 .186 .135 .141 .132 .132 .094 .096 .093
rse .016 .070 .015 .038 .050 .094
2
1 1 .3 .2 .2 .2
.032 .273 .036 .076 .089 .190
.039 .109 .042 .089 .088 .243
.031 .137 .034 .070 .083 .184
.030 .239 .035 .069 .104 .177
.025 .201 .022 .060 .050 .134
.028 .065 .024 .067 .047 .154
.025 .099 .021 .059 .049 .132
.024 .187 .022 .059 .059 .130
.016 .144 .015 .037 .039 .092
.017 .042 .016 .041 .034 .102
.016 .071 .015 .038 .039 .093
.016 .139 .015 .038 .049 .092
3
1 1 .3 .2 .2 .2
.031 .208 .036 .073 .087 .190
.037 .124 .040 .084 .084 .224
.031 .137 .035 .070 .084 .183
.030 .187 .035 .069 .106 .181
.025 .152 .022 .060 .049 .135
.027 .079 .023 .065 .045 .147
.025 .099 .021 .059 .049 .131
.025 .142 .022 .059 .059 .130
.016 .105 .015 .038 .040 .095
.017 .052 .015 .040 .033 .098
.016 .071 .015 .038 .039 .093
.016 .104 .015 .038 .050 .094