Panel Vector Autoregression in R: The panelvar Package

Michael Sigmund1, Robert Ferstl2, Daniel Unterkofler1

Abstract

This paper considers two types of generalized method of moments (GMM) estimators for panel vector autoregression (PVAR) models with fixed individual effects. First, we implement the first difference GMM estimator, which extends the estimator for the single equation dynamic panel model. For single equation dynamic panel models, a GMM estimator is implemented in the STATA package xtabond2, and some of its features are also covered by the R package plm. Second, we extend the so-called system GMM estimator from single equation dynamic panel models to PVAR models. In addition to the GMM estimators, we contribute to the literature by providing specification tests (Hansen overidentification test, lag selection criterion and stability test of the PVAR polynomial) and classical structural analysis for PVAR models, such as orthogonal and generalized impulse response functions, bootstrapped confidence intervals for impulse response analysis and forecast error variance decompositions. Finally, we implement both the first difference and the forward orthogonal transformation to remove the fixed effects.

Keywords: Panel vector autoregression model, generalized method of moments, first difference and system GMM, R
JEL classification: G20, G30

Email addresses: [email protected] (Michael Sigmund), [email protected] (Robert Ferstl), [email protected] (Daniel Unterkofler)
1 Oesterreichische Nationalbank (OeNB), Otto-Wagner-Platz 3, A-1090 Vienna, Austria.
2 Department of Finance, University of Regensburg, 93053 Regensburg, Germany.

1. Introduction

Over the past decades important advances have been made in the study of dynamic panel data models with fixed effects for the typical setting in which the cross-sectional dimension (N) is large and the time dimension (T) is short. Classical OLS-based regression methods cannot be applied because of the well-known Nickell bias (Nickell, 1981), which does not disappear asymptotically if N → ∞ while T remains fixed. One solution to this problem is to apply

generalized method of moments estimators, popularized in economics by Hansen (1982). Important contributions are Anderson & Hsiao (1982), Holtz-Eakin et al. (1988), Arellano & Bond (1991), Arellano & Bover (1995) and Blundell & Bond (1998). Within the well-established GMM estimators it is useful to distinguish between the first difference GMM estimator (Holtz-Eakin et al., 1988; Arellano & Bond, 1991), which uses lags of the endogenous variable(s) as instruments, and the system GMM estimator (Blundell & Bond, 1998), which uses additional moment conditions based on information contained in the "levels". First difference and system GMM estimators for single equation dynamic panel data models have been implemented in STATA in xtabond2 (Roodman, 2009a), and some of these features are also available in the R package plm (Croissant & Millo, 2008).

However, with the exception of Holtz-Eakin et al. (1988) and later Binder et al. (2005), the theoretical literature has primarily focused on single equation dynamic panel data models, whereas there are many applications that require a simultaneous treatment of the decision problems of households, firms, banks or other economic agents. Ever since the seminal paper of Sims (1980) on macroeconomic reality, vector autoregressive (VAR) models have been considered a starting point in economics for studying models with more than one endogenous variable. These models have been extensively studied in the time series literature (see Luetkepohl, 2006; Pfaff, 2008). Again, the standard ordinary least squares equation-by-equation estimation procedure for VAR models does not provide unbiased estimates for PVAR models. The popularity of PVAR models in empirical economics (and other social sciences) is documented by over 640 citations of Love & Zicchino (2006). They provide an unofficial STATA code that has recently been extended by Abrigo & Love (2016), who use the first generation GMM estimator suggested by Anderson & Hsiao (1982) to deal with the Nickell bias (Nickell, 1981). Our code implements the direct extension of Anderson & Hsiao (1982), namely the first difference GMM estimator (Holtz-Eakin et al., 1988; Arellano & Bond, 1991), as well as the more complex system GMM estimator of Blundell & Bond (1998) for PVAR models.3

In the panelvar package we essentially extend all features of xtabond2 to a system of dynamic panel equations. We implement the first difference GMM and system GMM estimators as laid out in Binder et al. (2005) for PVAR models. In doing so, the panelvar package also brings to single equation dynamic panel models most of the xtabond2 features that are missing in the plm package, most notably the forward orthogonal transformation (an additional method to remove the fixed effect, especially useful for panel data sets

3 The differences between Anderson & Hsiao (1982), Arellano & Bond (1991) and Blundell & Bond (1998) are described in Sections 2.2 and 2.6.


with gaps). In addition to the GMM estimators we also provide structural analysis functions for PVAR models that are well established for (time series) VAR models. These functions include orthogonal impulse response functions (see Luetkepohl, 2006), generalized impulse response functions (see Pesaran & Shin, 1998) and forecast error variance decompositions. For the impulse response functions we also provide a GMM-specific bootstrap method for estimating confidence intervals. Furthermore, we extend the Hansen J overidentification test (Hansen, 1982; Roodman, 2009a), the model selection procedure of Andrews & Lu (2001) and the Windmeijer-corrected standard errors (Windmeijer, 2005) from single equation dynamic panel models to PVAR models.

The paper is organized as follows: Section 2 sets up the PVAR model. First, we show how basic GMM estimation works for single equation dynamic panel models. Based on this short introduction we derive the first difference GMM estimator for PVAR models. In the next step we define the system GMM moment conditions and the corresponding system GMM estimator. Next, the Windmeijer correction is extended to the standard errors of a PVAR model. The following subsections introduce the orthogonal impulse response function, the generalized impulse response function and confidence bands for impulse response analysis for PVAR models. The final subsection of Section 2 defines the Hansen J overidentification test and the Andrews-Lu model selection procedure. Section 3 uses the panelvar package for single equation dynamic panel models and for PVAR models. In this section we use publicly available data sets that are also included in the plm package.4

2. Methodology

2.1. Specification of a PVAR model

A basic first order panel vector autoregressive (PVAR) model was first introduced by Holtz-Eakin et al. (1988). We extend their model to allow for p lags of m endogenous variables, k predetermined variables and n strictly exogenous variables. Therefore, we consider the following stationary PVAR with fixed effects:5

$$
y_{i,t} = \left(I_m - \sum_{l=1}^{p} A_l\right)\mu_i + \sum_{l=1}^{p} A_l\, y_{i,t-l} + B\, x_{i,t} + C\, s_{i,t} + \epsilon_{i,t} \qquad (1)
$$

4 For these data sets we provide additional PVAR examples in the corresponding vignettes.
5 A random effects specification in a dynamic panel context is possible but requires strong assumptions on the individual effects. Empirical applications mostly use a fixed-effects specification. We do not consider a random-effects implementation at this stage. See Binder et al. (2005) for more details.


$I_m$ denotes an $m \times m$ identity matrix. Let $y_{i,t} \in \mathbb{R}^m$ be an $m \times 1$ vector of endogenous variables for the $i$th cross-sectional unit at time $t$. Let $y_{i,t-l} \in \mathbb{R}^m$ be an $m \times 1$ vector of lagged endogenous variables. Let $x_{i,t} \in \mathbb{R}^k$ be a $k \times 1$ vector of predetermined variables that are potentially correlated with past errors. Let $s_{i,t} \in \mathbb{R}^n$ be an $n \times 1$ vector of strictly exogenous variables that depend neither on $\epsilon_t$ nor on $\epsilon_{t-s}$ for $s = 1, \dots, T$. The idiosyncratic error vector $\epsilon_{i,t} \in \mathbb{R}^m$ is assumed to be well-behaved and independent of both the regressors $x_{i,t}$ and $s_{i,t}$ and the individual error component $\mu_i$. Stationarity requires that all eigenvalues of the companion matrix of the PVAR model fall inside the unit circle, which therefore places some constraints on the fixed effect $\mu_i$. The cross section $i$ and the time section $t$ are defined as follows: $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$. In this specification we assume parameter homogeneity for $A_l$ ($m \times m$), $B$ ($m \times k$) and $C$ ($m \times n$) for all $i$. A PVAR model is hence a combination of a single equation dynamic panel model (DPM) and a vector autoregressive (VAR) model.

2.2. GMM estimation of single equation dynamic panel models

A panel data set of size $N \times T$ offers a rich statistical structure. Most panel econometric theory has been developed for data sets where $N$ is large and $T$ is small. In the spirit of this data structure, various generalized method of moments (GMM) estimators have been developed for the single equation dynamic panel model (no VAR structure) to avoid the Nickell bias (Nickell, 1981).6 Most notably, starting with Anderson & Hsiao (1982), instrumental variable approaches have been developed that use internal instruments (further lags of the dependent variable) to eliminate the Nickell bias. The idea of using internal instruments can be formulated by applying the first difference (or the forward orthogonal transformation7) instead of the within transformation in a simple AR(1) model:

$$
y_{i,t} - y_{i,t-1} = \phi\,(y_{i,t-1} - y_{i,t-2}) + (\epsilon_{i,t} - \epsilon_{i,t-1})
$$

By construction $\epsilon_{i,t-1}$ and $y_{i,t-1}$ are correlated, but $\epsilon_{i,t-1}$ is not correlated with $y_{i,t-2}$ if we assume that the $\epsilon_{i,t}$ are not serially correlated and $\phi$ is not too close to one. The use of internal instruments can also be expressed in terms of the following moment conditions:

6 The Nickell bias disappears if $T \to \infty$. However, for small $T$ various studies confirm that the bias is severe (Phillips & Sul, 2007).
7 $y^{\perp}_{i,t+1} = c_{i,t}\big(y_{i,t} - \tfrac{1}{T_{i,t}}\sum_{s>t} y_{i,s}\big)$, where $c_{i,t} = \sqrt{T_{i,t}/(T_{i,t}+1)}$. This transformation is suggested by Arellano & Bover (1995) to minimize data losses due to data gaps.


$$
E\big[(\epsilon_{i,t} - \epsilon_{i,t-1})\, y_{i,t-2}\big] = 0 \qquad (2)
$$
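To make the instrumenting strategy behind Eq. (2) concrete, the following minimal base-R sketch simulates an AR(1) panel with fixed effects and computes the just-identified level-instrument estimator; all object names and parameter values are illustrative, and the code is not taken from the panelvar package.

set.seed(1)
N <- 500; T <- 6; phi <- 0.5
mu <- rnorm(N)                                   # individual fixed effects
y  <- matrix(0, N, T)
y[, 1] <- mu + rnorm(N)
for (t in 2:T) y[, t] <- mu + phi * y[, t - 1] + rnorm(N)

dy    <- y[, 3:T] - y[, 2:(T - 1)]               # Delta y_{i,t},   t = 3, ..., T
dylag <- y[, 2:(T - 1)] - y[, 1:(T - 2)]         # Delta y_{i,t-1}
z     <- y[, 1:(T - 2)]                          # instrument y_{i,t-2}, as in Eq. (2)

phi_hat <- sum(z * dy) / sum(z * dylag)          # just-identified IV estimate
phi_hat                                          # close to 0.5 despite the fixed effects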

Originally, Holtz-Eakin et al. (1988) came up with the idea of exploiting more moment conditions by substituting $y_{i,t-2}$ with $y_{i,t-j}$ for each $j = 3, \dots, t$ in Eq. (2).8 The resulting estimator is known as first difference GMM, where the first difference refers to the $(\epsilon_{i,t} - \epsilon_{i,t-1})$ part of Eq. (2). To be more precise, Anderson & Hsiao (1982) and Arellano & Bond (1991) use the same instruments, but Arellano & Bond (1991) exploit more moment conditions based on them and are therefore asymptotically more efficient (i.e. they attain a lower asymptotic variance). The first difference GMM estimator for single equation dynamic panel models is implemented in plm (see Croissant & Millo, 2008) and xtabond2 (see Roodman, 2009a).

2.3. First difference moment conditions

Before we set up the moment conditions of the first difference estimator, we apply the first difference or the forward orthogonal transformation to Eq. (1):

$$
\Delta y_{i,t} = \sum_{l=1}^{p} A_l\, \Delta y_{i,t-l} + B\, \Delta x_{i,t} + C\, \Delta s_{i,t} + \Delta\epsilon_{i,t} \qquad (3)
$$

$\Delta$ refers either to the first difference or to the forward orthogonal transformation. The first difference transformation exists for $t \in \{p+2, \dots, T\}$ and the forward orthogonal transformation exists for $t \in \{p+1, \dots, T-1\}$. We denote the set of indexes $t$ for which the transformation exists by $T_\Delta$. Binder et al. (2005) extend the equation-by-equation estimator of Holtz-Eakin et al. (1988) to a PVAR model with only endogenous variables that are lagged by one period. We further extend Binder et al. (2005) by adding more lags of the endogenous variables, predetermined and strictly exogenous variables. Moreover, we follow Binder et al. (2005) in setting up the GMM conditions for each individual $i$. First, we express the moment conditions for the lagged endogenous, the predetermined and the strictly exogenous variables for a fixed $i$ and $t$.9

Definition 2.1. (First difference GMM moment conditions):

$$
\begin{aligned}
E[\Delta\epsilon_{i,t}\, y'_{i,j}] &= 0, \quad j \in \{1, \dots, T-2\} \text{ and } t \in T_\Delta,\\
E[\Delta\epsilon_{i,t}\, x'_{i,j}] &= 0, \quad j \in \{1, \dots, T-1\} \text{ and } t \in T_\Delta,\\
E[\Delta\epsilon_{i,t}\, \Delta s'_{i,t}] &= 0, \quad t \in T_\Delta
\end{aligned}
\qquad (4)
$$

8 Interestingly, though Arellano & Bond (1991) is now seen as the source of an estimator, the paper is entitled Some tests of specification for panel data.
9 $\Delta\epsilon_{i,t}$ follows from rewriting Eq. (3).


The dimensions of these matrices are the following: $\Delta\epsilon_{i,t}$ is $m \times 1$, $y_{i,s}$ is $[m(T-p-1)] \times 1$,10 $x_{i,s}$ is $[k(T-1)-1] \times 1$ and, finally, $\Delta s_{i,t}$ is $n \times 1$. For later derivations it is useful to define $q_{i,t}$ by

$$
q'_{i,t} := \big(y'_{i,t-p-1},\, y'_{i,t-p-2},\, \dots,\, y'_{i,1},\, x'_{i,t-1},\, x'_{i,t-2},\, \dots,\, x'_{i,1},\, \Delta s'_{i,t}\big), \quad t \in \{p+2, \dots, T\},
$$

such that we can define the moment conditions more compactly later on. We rewrite the moment conditions of Eq. (4) in a different form by stacking over $t$. Therefore we start by stacking Eq. (3) over $t$:

$$
\Delta Y_i = \sum_{l=1}^{p} \Delta Y_{i,l}\, A'_l + \Delta X_i\, B' + \Delta S_i\, C' + \Delta E_i \qquad (5)
$$

$\Delta Y_i$, $\Delta Y_{i,l}$ and $\Delta E_i$ are $(T-1-p) \times m$ matrices. $A$, $B$ and $C$ have the same dimensions as in Eq. (1). $\Delta X_i$ is $(T-1-p) \times k$ and $\Delta S_i$ is $(T-1-p) \times n$. Based on Eq. (5), we set up the stacked moment conditions for each $i$:

$$
E[Q'_i\, \Delta E_i] = 0 \qquad (6)
$$

$Q_i$ is the stacked form of $q_{i,t}$, defined by

$$
Q_i := \begin{pmatrix}
q'_{i,p+2} & 0 & \cdots & 0\\
0 & q'_{i,p+3} & & 0\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & q'_{i,T}
\end{pmatrix} \qquad (7)
$$
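To fix ideas, the block-diagonal instrument matrix of Eq. (7) can be written down explicitly for a single endogenous variable (m = 1, p = 1, no predetermined or strictly exogenous variables) and T = 5; the following lines use the Matrix package for the block-diagonal structure, and the numerical values are purely illustrative.

library(Matrix)
y  <- c(0.2, -0.1, 0.4, 0.0, 0.3)                 # one individual's series, T = 5
q  <- lapply(3:5, function(t) rev(y[1:(t - 2)]))  # q'_{i,t} = (y_{i,t-2}, ..., y_{i,1})
Qi <- as.matrix(bdiag(lapply(q, function(v) matrix(v, nrow = 1))))
Qi                                                # 3 x 6: one block-row per transformed period t = 3, 4, 5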

The dimension of $Q_i$ depends on the number of lags of the endogenous variables ($p$), the number of endogenous variables ($m$), the number of predetermined variables ($k$) and the number of strictly exogenous variables ($n$). Mathematically, the number of moment conditions follows an arithmetic progression $S_h = \tfrac{h}{2}\,\big(2a_1 + (h-1)\,d\big)$, where one has to identify the value of the first element $a_1$, the number of elements $h$ and the difference $d$ between two consecutive elements. The dimension consists of three parts, which can be attributed to the lagged endogenous variables, the predetermined variables and the strictly exogenous variables. We further introduce the variables $L^{endo}_{min}$ and $L^{pre}_{min}$, which refer to the first possible lag of the endogenous and the predetermined variables, respectively, that can be used as an instrument.

10 Here we assume that in our specification all lags up to order $p$ of the endogenous variable are included.


$$
\dim(Q_i) = \Big( m\,\tfrac{(T - (L^{endo}_{min} + p - 1))\,\big(2p + ((T - (L^{endo}_{min} + p - 1)) - 1)\,d\big)}{2} + k\,\tfrac{(T - (L^{pre}_{min} + p - 1))\,\big(2p + ((T - (L^{pre}_{min} + p - 1)) - 1)\,d\big)}{2} + n \Big) \times (T - 1 - p) \qquad (8)
$$

We illustrate the number of moment conditions implied by Eq. (8) with a simple example. Suppose T = 9 and all data of individual i are available.

Table 1: Number of moment conditions for an individual

Time period | m = p = 1 | m = p = k = 1 | m = 1, p = 2 | m = 1, p = 2, k = 1 | m = 1, p = 3 | m = 1, p = 3, k = 2
t = 1       |     0     |       0       |      0       |          0          |      0       |          0
t = 2       |     0     |       0       |      0       |          0          |      0       |          0
t = 3       |     1     |       2       |      0       |          0          |      0       |          0
t = 4       |     2     |       3       |      2       |          3          |      0       |          0
t = 5       |     3     |       4       |      3       |          4          |      3       |          4
t = 6       |     4     |       5       |      4       |          5          |      4       |          5
t = 7       |     5     |       6       |      5       |          6          |      5       |          6
t = 8       |     6     |       7       |      6       |          7          |      6       |          7
t = 9       |     7     |       8       |      7       |          8          |      7       |          8
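As a quick check on the first column of Table 1 (the pure AR(1) case with m = p = 1), the per-period instrument count is simply max(t − 2, 0); two lines of base R reproduce the column and its total (variable names are illustrative).

T_periods <- 9
pmax((1:T_periods) - 2, 0)        # 0 0 1 2 3 4 5 6 7, the first column of Table 1
sum(pmax((1:T_periods) - 2, 0))   # 28 moment conditions in total for one individual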

As the dimension of $Q_i$ signals, there are many moment conditions available to identify a possibly small number of parameters. There are several reasons why reducing the number of moment conditions can be of major importance. First, there is the problem of instrument proliferation (Roodman, 2009b). Instrument proliferation is intrinsic in GMM estimation of dynamic panel models when all lags of the endogenous explanatory variables are exploited, as the number of moment conditions increases with $T$ and with the dimension of the vectors of endogenous regressors and predetermined variables. Although more conditions should improve efficiency (Dagenais & Dagenais, 1998), the bias due to overfitting becomes quite severe as the number of moment conditions expands, outweighing the gains in efficiency (see Bekker, 1994; Newey & Smith, 2004; Ziliak, 1997). Second, a related problem concerns asymptotic theory. Even if we consider the case that $N \to \infty$ and $T \to \infty$ (instead of the classical assumptions of fixed $T$ and $N \to \infty$), Alvarez & Arellano (2003) show in their Theorem 2 that consistency and asymptotic normality of the first difference GMM estimator hold under the condition $\log(T)^2/N \to 0$. In practical terms, if for large $T$ we can fix the number of moment conditions $q$ after $T > c$, the first difference GMM estimator remains consistent. We can also rely on an alternative proof of Koenker & Machado (1999) that the first difference GMM estimator remains consistent and asymptotically normally distributed in the case of $T \to \infty$; Koenker & Machado (1999) state the condition $q^3/N \to 0$. Third, reducing the number of moment conditions is necessary for very large panel data sets to make estimation computationally feasible.

As a consequence, the single equation dynamic panel literature offers two possibilities to reduce this number. The first idea is to reduce the number of moment conditions by fixing a maximal lag length $L_{max}$ after which no further instruments are used, even if available. It is less popular, but statistically possible, not to start with the first possible lag of the instruments $L_{min}$ but with a deeper lag. Both parts of the first idea can be implemented. Only the $L_{max}$ part of the first idea is formulated by Mehrhoff (2009) for single equation dynamic panel models; we extend his idea by adding the $L_{min}$ option and the extension to PVAR models.

Due to the different $L_{min}$ assumptions for the lagged endogenous variables ($L^{endo}_{min} = 2$) and the predetermined variables ($L^{pre}_{min} = 1$) we implemented two transformation matrices in our code, but we only explain the transformation matrix for the "endogenous" block of the instrument matrix $Q_i$. The transformation matrix for the "predetermined" block of the instrument matrix is derived in a similar fashion.

In principle, we apply a linear transformation to the matrix $Q^{endo}_i$ to reduce the number of rows and columns (that have non-zero elements). For each $q^{endo}_{i,t}$ in Eq. (7) there is a corresponding transformation matrix $f^{L}_{i,j}$; these are identity matrices of growing dimension depending on the number of periods $T$, the lags $l$, $L_{min}$ and $L_{max}$. The columns of the identity matrices are cut off in the following way:

$$
f^{L}_{i,j} =
\begin{cases}
(1,\dots,j) \times (1,\dots,j) & \text{if } j < L^{endo}_{max},\ L^{endo}_{min} = 2\\
(1,\dots,j) \times (1,\dots,L^{endo}_{max}) & \text{if } j \ge L^{endo}_{max},\ L^{endo}_{min} = 2\\
(1,\dots,j) \times (L^{endo}_{min},\dots,j) & \text{if } j < L^{endo}_{max},\ L^{endo}_{min} > 2\\
(1,\dots,j) \times (L^{endo}_{min},\dots,L^{endo}_{max}) & \text{if } j \ge L^{endo}_{max},\ L^{endo}_{min} > 2
\end{cases}
$$

The $f^{L}_{i,j}$ are defined for each $j \in \{p, \dots, T-2\}$, where $p$ is the lag order of the PVAR model. The $f^{L}_{i,j}$ are block-diagonalized to complete the full transformation matrix $F^{L}_i$:

$$
F^{L}_i = \begin{pmatrix}
f^{L}_{i,j} & 0 & \cdots & 0\\
0 & f^{L}_{i,j+1} & & 0\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & f^{L}_{i,T-2}
\end{pmatrix} \otimes I_{m \times m}
$$

The second idea to reduce the number of moment conditions is called collapsing of instruments. A detailed description of the motivation and theory behind collapsing can be

found in Roodman (2009b). The collapsed version of Eq. (7) reduces $Q_i$ to:

$$
Q^{collapse}_i := \begin{pmatrix}
q'_{i,p+2} & 0 & \cdots & 0\\
q'_{i,p+3} & 0 & \cdots & 0\\
\vdots & & & \vdots\\
q'_{i,T} & & &
\end{pmatrix} \qquad (9)
$$

where each row $q'_{i,t}$ is padded with zeros up to the length of $q'_{i,T}$. $Q^{collapse}_i$ is thereby reduced to a $(T-2) \times (T-2)$ matrix. Following Mehrhoff (2009), the transformation matrix for collapsing the lagged endogenous instrument set is made up of identity matrices of increasing dimension stacked one upon the other, with blocks of zero matrices to the right:11

$$
F^{C}_i = \begin{pmatrix}
f^{L}_{i,j} & 0 & \cdots & 0\\
f^{L}_{i,j+1} & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
f^{L}_{i,T-2} & 0 & \cdots & 0
\end{pmatrix} \otimes I_{m \times m}
$$

Not surprisingly, both ideas to reduce the number of moment conditions can be applied at the same time, as $F^{L}_i$ and $F^{C}_i$ have the same "block" elements.

2.4. GMM estimator with first difference moment conditions

After deriving all the inputs that are necessary to state the moment conditions in Eq. (6), we can set up the estimation procedure. Based on the moment conditions in Eq. (6) and the derived set of instruments $Q_i$, we can now formulate the following minimization problem:12

$$
\min_{\Phi}\ \Big(\sum_{i=1}^{N} Q'_i\big(\Delta Y_i - [\Delta Y_{i,-1}\ \Delta X_i\ \Delta S_i]\,\Phi\big)\Big)'\ \Lambda_Q\ \Big(\sum_{i=1}^{N} Q'_i\big(\Delta Y_i - [\Delta Y_{i,-1}\ \Delta X_i\ \Delta S_i]\,\Phi\big)\Big) \qquad (10)
$$

where $\Phi$ is defined as $[A\ B\ C]$, which is an $m \times (mp + k + n)$ matrix, and $\Lambda_Q$ is the GMM weighting matrix. We also define $\Delta W_{minus,i} := [\Delta Y_{i,-1}\ \Delta X_i\ \Delta S_i]$ and $\Delta W_i := \Delta Y_i$.

11 Again, the matrix for collapsing the predetermined variables is derived in the appendix.
12 In our notation we try to stick to our code as closely as possible.


We follow the standard GMM literature, which proposes a one-step and a two-step estimation procedure that differ in how $\Lambda_Q$ is defined. Since the two-step estimation builds on the residuals of the one-step estimation, we start with the one-step (or initial) estimate $\Phi_{IE}$. For the one-step estimation, $\Lambda_Q$ has the following structure:

$$
\Lambda_Q = \sum_{i=1}^{N} Q'_i D D' Q_i
$$

Let $V_i$ be the matrix with untransformed time series data of cross section $i$. Then there exists a $(T-1) \times T$ linear transformation matrix $D$ such that $D V_i = \Delta V_i$. If the first difference transformation is used to remove the fixed effect, then $D$ has the following structure:

$$
D = \begin{pmatrix}
-1 & 1 & 0 & \cdots & 0\\
0 & -1 & 1 & & 0\\
\vdots & & \ddots & \ddots & \vdots\\
0 & 0 & \cdots & -1 & 1
\end{pmatrix} \qquad (11)
$$

If the forward orthogonal transformation is applied to remove the fixed effect, then $D$ has the following structure:

$$
D = \begin{pmatrix}
\sqrt{\frac{T-1}{T}} & -\frac{1}{\sqrt{T(T-1)}} & -\frac{1}{\sqrt{T(T-1)}} & \cdots & -\frac{1}{\sqrt{T(T-1)}}\\
0 & \sqrt{\frac{T-2}{T-1}} & -\frac{1}{\sqrt{(T-1)(T-2)}} & \cdots & -\frac{1}{\sqrt{(T-1)(T-2)}}\\
\vdots & & \ddots & \ddots & \vdots\\
0 & 0 & \cdots & \sqrt{\frac{1}{2}} & -\sqrt{\frac{1}{2}}
\end{pmatrix} \qquad (12)
$$

It consists of two parts: a $(T-1) \times (T-1)$ identity matrix and a $(T-1) \times 1$ column of zeros. The optimal solution for $\Phi_{IE}$ in vectorized form for Eq. (10) is given by:13

$$
\mathrm{vec}(\Phi_{IE}) = \big(S'_{ZX}\,\Lambda_Z^{-1}\,S_{ZX}\big)^{-1} S'_{ZX}\,\Lambda_Z^{-1}\,S_{Zy} \qquad (13)
$$
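A small base-R illustration of the two transformation matrices in Eqs. (11) and (12) may help; the function names are illustrative and not part of the panelvar package. Both matrices map the T raw observations of one cross section into T − 1 transformed observations and annihilate the individual fixed effect, which can be checked by applying them to a constant vector.

make_D_fd <- function(T) {                       # first difference transformation, Eq. (11)
  D <- matrix(0, T - 1, T)
  for (t in 1:(T - 1)) { D[t, t] <- -1; D[t, t + 1] <- 1 }
  D
}
make_D_fod <- function(T) {                      # forward orthogonal transformation, Eq. (12)
  D <- matrix(0, T - 1, T)
  for (t in 1:(T - 1)) {
    r <- T - t                                   # number of remaining future observations
    D[t, t] <- sqrt(r / (r + 1))
    D[t, (t + 1):T] <- -sqrt(r / (r + 1)) / r    # equals -1 / sqrt(r * (r + 1))
  }
  D
}
T <- 5
round(make_D_fd(T)  %*% rep(1, T), 10)           # zero vector: fixed effect removed
round(make_D_fod(T) %*% rep(1, T), 10)           # zero vector as well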

In the two-step estimation the choice of the optimal weighting matrix $\Lambda_{Z\hat e}$ requires the residuals of the one-step estimation ($\Delta\hat E_i = \Delta Y_i - \Delta W_{minus,i}\,\Phi_{IE}$). The so-called feasible efficient generalized method of moments estimator (FEGMM) reads as follows:14

13 For a proof of the existence of this solution see Hansen (2012), pp. 8–10.
14 See for example Roodman (2009b) for more details.


$$
\mathrm{vec}(\Phi_{FEGMM}) = \big(S'_{ZX}\,\Lambda_{Z\hat e}^{-1}\,S_{ZX}\big)^{-1} S'_{ZX}\,\Lambda_{Z\hat e}^{-1}\,S_{Zy} \qquad (14)
$$

where

$$
\begin{aligned}
S_{QX} &= \sum_{i=1}^{N} Q'_i\, \Delta W_{minus,i}, & S_{Qy} &= \sum_{i=1}^{N} Q'_i\, \Delta W_i,\\
Z_i &= Q_i \otimes I_{m\times m}, & S_{ZX} &= S_{QX} \otimes I_{m\times m},\\
S_{Zy} &= \mathrm{vec}(S'_{Qy}), & \Lambda_Q &= \sum_{i=1}^{N} Q'_i D D' Q_i,\\
\Lambda_Z &= \Lambda_Q \otimes I_{m\times m}, & \hat e_i &= \mathrm{vec}(\hat E_i),\\
\hat E_i &= \Delta W_i - \Delta W_{minus,i}\,\Phi_{IE}, & \Lambda_{Z\hat e} &= \sum_{i=1}^{N} Z'_i\, \Gamma_{\hat e}\, Z_i,\\
\Gamma_{\hat e} &= \sum_{i=1}^{N} \hat e_i \hat e'_i. & &
\end{aligned}
\qquad (15)
$$

2.5. Additional moment conditions

Additional moment conditions can be constructed when imposing the following assumption (see Arellano & Bover (1995) and Blundell & Bond (1998) for the case $m = 1$) on the structure of the process.

Definition 2.2. (System GMM moment conditions):

$$
E[y_{i,t}\,\mu'_i] = E[y_{i,s}\,\mu'_i], \qquad E[x_{i,t}\,\mu'_i] = E[x_{i,s}\,\mu'_i], \qquad E[z_{i,t}\,\mu'_i] = 0, \qquad \forall i \in \{1, \dots, N\} \text{ and } \forall s, t
$$

This assumption is valid if changes in $y_{i,t}$ are not systematically related to $\mu_i$. Following Blundell & Bond (1998), this assumption is clearly satisfied in a fully stationary PVAR model.
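A short simulation sketch makes this assumption tangible: in a stationary AR(1) panel with fixed effects, the differenced instrument Δy_{i,t−1} is uncorrelated with the level-equation error μ_i + ε_{i,t}, whereas the untransformed level y_{i,t−1} is not. The base-R snippet below (illustrative names and numbers, not panelvar code) checks this numerically under the assumption of a stationary start.

set.seed(42)
N <- 20000; phi <- 0.6
mu <- rnorm(N)
y0 <- mu / (1 - phi) + rnorm(N, sd = 1 / sqrt(1 - phi^2))   # draw from the stationary distribution
y1 <- mu + phi * y0 + rnorm(N)
y2 <- mu + phi * y1 + rnorm(N)
eps3 <- rnorm(N)
y3 <- mu + phi * y2 + eps3
cor(y2 - y1, mu + eps3)    # close to zero: the level moment condition of Definition 2.2 holds
cor(y2,      mu + eps3)    # clearly positive: untransformed levels are not valid instruments here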

Blundell & Bond (1998) also argue that the system GMM estimator performs better than the first difference GMM estimator because the additional instruments remain good predictors for the endogenous variables even when the series are very persistent. The additional moment conditions (additional to Eq. (4)) are the following:

$$
\begin{aligned}
E\Big[\big(\epsilon_{i,t} + (I - \textstyle\sum_{j=1}^{p} A_j)\,\mu_i\big)\,(y_{i,t-1} - y_{i,t-2})'\Big] &= 0, \quad t \in \{3, 4, \dots, T\}\\
E\Big[\big(\epsilon_{i,t} + (I - \textstyle\sum_{j=1}^{p} A_j)\,\mu_i\big)\,(x_{i,t} - x_{i,t-1})'\Big] &= 0, \quad t \in \{2, 3, \dots, T\}\\
E\Big[\big(\epsilon_{i,t} + (I - \textstyle\sum_{j=1}^{p} A_j)\,\mu_i\big)\,z'_{i,t}\Big] &= 0, \quad t \in \{2, 3, \dots, T\}
\end{aligned}
\qquad (16)
$$

Hence we define the matrices $P_i$ for the case $p = 1$ as follows, where $\Delta$ denotes the first difference operator:

$$
P_i := \begin{pmatrix}
0 & \Delta y_{i,2} & 0 & \cdots & 0\\
0 & 0 & \Delta y_{i,3} & & \vdots\\
\vdots & & & \ddots & \\
0 & 0 & \cdots & & \Delta y_{i,T-1}\\
\Delta x_{i,2} & 0 & \cdots & & 0\\
0 & \Delta x_{i,3} & 0 & & \vdots\\
0 & 0 & \Delta x_{i,4} & & \\
\vdots & & & \ddots & \\
0 & 0 & \cdots & & \Delta x_{i,T}
\end{pmatrix}
$$

Next we define a new matrix for the instruments:15

$$
Q^{*}_i := \begin{pmatrix} Q_i & 0\\ 0 & P_i \end{pmatrix}
$$

2.6. System GMM estimator

The extended GMM estimator is derived in the same way as Eq. (13) and reads as follows (IEE stands for initial estimator extended):

$$
\mathrm{vec}(\Phi_{IEE}) = \big(S'_{Z^*X}\,\Lambda^{-1}\,S_{Z^*X}\big)^{-1} S'_{Z^*X}\,\Lambda^{-1}\,S_{Z^*y} \qquad (17)
$$

where

$$
\begin{aligned}
S_{Q^*X} &= \sum_{i=1}^{N} Q^{*\prime}_i\, \Delta W^{*}_{minus,i}, & S_{Z^*X} &= S_{Q^*X} \otimes I_{m\times m},\\
\Delta W^{*}_{minus,i} &= \big[\Delta Y_{i,-1}\ \Delta X_i\ \Delta Z_i\ Y_{i,-1}\ 1\big], & S_{Q^*Y} &= \sum_{i=1}^{N} Q^{*\prime}_i\, \Delta W^{*}_i,\\
S_{Z^*Y} &= S_{Q^*Y} \otimes I_{m\times m}, & \Lambda_{Q^*} &= \sum_{i=1}^{N} Q^{*}_i\, D^{*} (D^{*})'\, (Q^{*}_i)',\\
\Delta W^{*}_i &= \big[\Delta Y_i\ Y_i\big]. & &
\end{aligned}
$$

$D^{*}$ is a $(2T-1) \times T$ matrix defined as follows:

$$
D^{*} = \begin{pmatrix} D\\ I_{T\times T} \end{pmatrix}
$$

where $D$ is defined in Eq. (11) for the first difference transformation and in Eq. (12) for the forward orthogonal transformation. For the two-step estimation we use

$$
\Lambda_{Z^*\hat e} = \sum_{i=1}^{N} Z^{*}_i\, \Gamma_{\hat e}\, (Z^{*}_i)'
$$

as the weighting matrix:

$$
\mathrm{vec}(\Phi_{EFEGMM}) = \big(S'_{Z^*X}\,\Lambda_{Z^*\hat e}^{-1}\,S_{Z^*X}\big)^{-1} S'_{Z^*X}\,\Lambda_{Z^*\hat e}^{-1}\,S_{Z^*y}
$$

15 In the case where $p > 1$ the matrices $P_i$ have to be adapted by deleting columns and rows such that Eq. (16) holds.




2.7. Estimating the asymptotic covariance matrix of the estimator

In this section we define the estimates of the asymptotic covariance matrix of the GMM estimators. The covariance matrix for the one-step GMM estimators defined in Eq. (13) and Eq. (17) can be defined in a straightforward way. Ruud (2000) shows that the one-step GMM estimators are asymptotically normally distributed:

$$
\widehat{\mathrm{Var}}(\Phi_{IE}) := \big(S'_{ZX}\,\Lambda_Z^{-1}\,S_{ZX}\big)^{-1} S'_{ZX}\,\Lambda_Z^{-1}\,\Lambda_{Z\hat e}\,\Lambda_Z^{-1}\,S_{ZX}\,\big(S'_{ZX}\,\Lambda_Z^{-1}\,S_{ZX}\big)^{-1},
$$

with $\Lambda_Z = \Lambda_Q \otimes I_m$ and $\Lambda_{Z\hat e}$ as defined in Eq. (15). Since the choice of the weighting matrix is in general not optimal in the one-step estimation, we cannot expect the one-step estimator to be asymptotically efficient. For the two-step GMM estimators in Eq. (14) the asymptotic variance is defined as follows:

$$
\widehat{\mathrm{Var}}(\Phi_{FEGMM}) := \frac{1}{N}\,\big(S'_{ZX}\,\Lambda_{Z\hat e}^{-1}\,S_{ZX}\big)^{-1}
$$

However, Windmeijer (2005) shows that this estimator does not perform well in finite samples on simulated data of a dynamic panel process ($m = 1$). Windmeijer (2005) therefore suggested a small sample correction for the dynamic panel model (see Roodman (2009a)), which we extend to a PVAR model (see Appendix A for a derivation):

$$
\begin{aligned}
\widehat{\mathrm{Var}}^{Wc}(\Phi_{FEGMM}) ={}& \big(S'_{ZX}\,\Lambda_{Z\hat e(\Phi_{IE})}^{-1}\,S_{ZX}\big)^{-1}\\
&+ D_{\Phi_{FEGMM},\Lambda_{Z\hat e}}(\Phi_{IE})\,\big(S'_{ZX}\,\Lambda_{Z\hat e(\Phi_{IE})}^{-1}\,S_{ZX}\big)^{-1}\\
&+ \big(S'_{ZX}\,\Lambda_{Z\hat e(\Phi_{IE})}^{-1}\,S_{ZX}\big)^{-1}\,D'_{\Phi_{FEGMM},\Lambda_{Z\hat e}}(\Phi_{IE})\\
&+ D_{\Phi_{FEGMM},\Lambda_{Z\hat e}}(\Phi_{IE})\,\widehat{\mathrm{Var}}(\Phi_{IE})\,D'_{\Phi_{FEGMM},\Lambda_{Z\hat e}}(\Phi_{IE})
\end{aligned}
\qquad (18)
$$

2.8. Orthogonal impulse response analysis

In this section responses to orthogonal impulses are calculated. This section closely follows Luetkepohl (2006) and Pfaff (2008). Considering Eq. (1), impulse response analysis in a vector autoregression context is concerned with the response of one (endogenous) variable to an impulse in another (endogenous) variable. This idea can be formalized by first deriving the so-called PVMAX representation (panel vector moving average representation with exogenous variables) of a PVAR-X(1) process:16

$$
y_{i,t} = \nu_i + \sum_{j=0}^{\infty} A^{j}\,[B\ C]\begin{bmatrix} x_{i,t-j}\\ s_{i,t-j}\end{bmatrix} + \sum_{j=0}^{\infty} A^{j}\,\epsilon_{i,t-j} \qquad (19)
$$

with $\nu_i = (I_m - A)^{-1}\mu^{*}_i$.17 It is also important to note that in the impulse response analysis we treat predetermined and strictly exogenous variables in the same way. Based on the VMA-X representation the impulse response function can be stated as follows:

$$
\mathrm{IRF}(k, r) = \frac{\partial y_{i,t+k}}{\partial(\epsilon_{i,t})_r} = A^{k} e_r
$$

where $k$ is the number of periods after the shock to the $r$-th component of $\epsilon_{i,t}$, with $e_r$ being an $m \times 1$ vector with a 1 in the $r$-th position and 0 otherwise. Let $\Sigma_\epsilon$ be the covariance matrix of $\epsilon_t$. Usually the off-diagonal elements of $\Sigma_\epsilon$ are different from 0, so shocks across the $m$ equations are not independent of each other. The parameters of the PVAR model therefore have to be adjusted such that the responses to "independent" shocks are transferred through the PVAR system accordingly. Since we assume that $\Sigma_\epsilon$ is a symmetric positive definite matrix, there exists a unique Cholesky decomposition $\Sigma_\epsilon = P P'$, where $P$ is a lower triangular matrix. Defining $\Theta_k = A^{k} P$ and $u_{i,t} = P^{-1}\epsilon_{i,t}$, we obtain the orthogonal impulse response function:

$$
\mathrm{OIRF}(k, r) = \frac{\partial y_{i,t+k}}{\partial(u_{i,t})_r} = \Theta_k e_r \qquad (20)
$$

As stated in Luetkepohl (2006) and many others, although the Cholesky decomposition is unique, it depends on the ordering of the variables, which has been criticized in the literature. An alternative to the OIRF that meets some of this criticism is presented in the next section.

Following Luetkepohl (2006), a closely related tool for interpreting PVAR models is the forecast error variance decomposition. It determines how much of the forecast error variance of each of the variables can be explained by exogenous shocks to the other variables. We start the forecast error variance decomposition by defining the h-step forecast error in the MA representation:

16 As a PVAR(p) process can be expressed as a PVAR(1) process (see Luetkepohl (2006)), we need not derive a more general expression.
17 $\mu^{*}_i = \big(I_m - \sum_{l=1}^{p} A_l\big)\,\mu_i$.


$$
y_{i,t+h} - y_{i,t} = \sum_{k=0}^{h-1} \Theta_k\, u_{i,t-k}
$$

Let $\theta_{k,m,n}$ be the $(m, n)$-th component of $\Theta_k$. Then it is possible to define the contribution of innovations in variable $n$ to the forecast error variance or MSE of the $h$-step forecast of variable $m$:

$$
y_{i,m,t+h} - y_{i,m,t} = \sum_{k=0}^{h-1} \big(e'_m \Theta_k e_n\big)^2 \qquad (21)
$$

If we divide Eq. (21) by the mean squared error of the $h$-step forecast of $y_{i,m,t+h}$, we obtain the share of the forecast error variance of variable $y_{i,m}$ attributable to shock $n$:

$$
\omega^{o}_{m,n,h} = \sum_{k=0}^{h-1} \big(e'_m \Theta_k e_n\big)^2 \Big/ \sum_{k=0}^{h-1}\sum_{j=1}^{m} \theta^{2}_{k,m,j} \qquad (22)
$$

−1/2

where σr,r is the r-th diagonal element of Σ . Lin (2006) states that when Σ is diagonal OIRF and GIRF are the same. Moreover GIRF is unaffected by the ordering of variables. The GIRF of the effect of an unit shock to the r-th equation is the same as that of an orthogonal impulse response but different for other shocks. Hence, the GIRF can easily computed by using OIRF with each variable as the leading one. In analogy to Eq. (22), the forecast error variance decomposition for the generalized impulse response function is defined as follows:     h−1 m h−1  X 2  X   −1 X g   k 2 ω = σr,r em |A Σ en  /  θk,m,n  , (23) m,n,h

k=0 m=1

k=0

16

2.10. Confidence bands for impulse response analysis In this section we do not enter the theoretical discussion which methods are best for estimating confidence bands for our (orthogonal and generalized) impulse response functions but focus on the ideas presented in Luetkepohl (2006). He states that if the distributions of a VAR model under consideration is unknown, so-called bootstrap or resampling methods may be applied to investigate the distributions of functions of stochastic processes or multiple time series. It is important to observe that our problem does not only involve a standard VAR model but a PVAR model with a GMM estimator. For panel datasets, there are in general three resampling schemes, temporal resampling, cross-sectional resampling and a combined resampling. In contrast to the earlier literature Kapetanios (2008) suggests to use cross-sectional resampling, where subsets of the data with the same panel individual panel identifier are drawn completely with replacement. Whereas Kapetanios (2008) shows that this bootstrapping procedure works well with many cross-sectional units for general panel models (not only dynamic panel models), Yan (2012) additionally presents Monte Carlo simulations that shows the superiority of this procedure in combination with the first difference GMM estimator. Kapetanios (2008) defines the bootstrapping procedure that we implement as follows: Definition 2.3. (cross-sectional resampling): For a T × N matrix of random variables Θ, cross-sectional resampling is defined as the operation of constructing a T × N ∗ matrix Θ∗ where the columns of Θ∗ are a random sample with replacement of blocks of the columns of Θ and N ∗ is not necessarily equal to N. Based on Definition (2.3) our bootstrapping procedure can be defined as follows: 1. Let P(i) = 1/N be the uniformly distributed probability of drawing i from the set i = 1, ..., N. Let Yi , Xi , S i be such a draw from the full panel data set. Repeat this draw N ∗ -times with replacement. 2. Depending on the selected argument, estimate Φ (onestep/twostep, system or first difference GMM) for the drawn dataset. 3. Depending on what impulse response function was selected, calculate the orthogonal or generalized impulse response function. Theorem 2.1. (finite sample distribution): By repeating step 1 to step 3 ω times, where ω is sufficiently large, one can approximate the finite sample distribution of the selected impulse response function by the empirical distribution of their bootstrap version.

17

√ We aware of Cao & √ Sun (2011) who18 show that there exist a correlation between N(ΦFEGMM − Φ) and N(ΣFEGMM − Σ). However, we also estimate our first difference GMM models jointly, not equation by equation, so in the twostep estimator we take this correlation into account.19 2.11. Specification tests 2.11.1. Hansen J-test for the validity of instrument subsets Following Roodman (2009a) a critical assumption for the validity of the GMM estimator is that the instruments are exogenous. If the model is overidentified, the Hansen J-Test (Hansen, 1982) falls naturally out of the GMM framework. The test can be applied to onestep (ΛZ ) and twostep (ΛZeˆ ) GMM estimations:  N |  N  X  −1 X  ˆ i  ΛZ  Zi E ˆ i  ∼a χ˜ 2L−K  Zi E (24) eˆ

i=1

i=1

L represents the number of instruments. K counts the number of parameters in the model. 2.11.2. Andrews-Lu model selection procedure In Andrews & Lu (2001) a class of model and moment selection criteria (MMSC) are introduced that are analogous to the well-known model section criteria for choosing between competing models. In the original notation of Andrews & Lu (2001) the basic MMSC reads as follows: MMS Cn (b, c) = Jn (b, c) − h(|c| − |b|)κn

(25)

Jn (b, c) is the Hansen J test statistics from Eq. (24). b stands for the number of parameters, c is the number of moment conditions and n is the number of total observations. We implemented three different versions of MMSC function: MMS C BIC,n (b, c) = Jn (b, c) − (|c| − |b|) · ln(n) MMS C AIC,n (b, c) = Jn (b, c) − (|c| − |b|) · 2 MMS C HQIC,n (b, c) = Jn (b, c) − Q · (|c| − |b|) · ln(ln(n))

(26)

Σ is defined as the variance-covariance matrix of the estimated residuals. Moreover, the model of Cao & Sun (2011) is developed for balanced panels. Based on the very limited Monte carlo studies in Cao & Sun (2011) we conclude that for T i > 5 for all i = {1, ..., N} the bias might not be too severe, even if 18 19

18

Andrews & Lu (2001) recommend the MMSC-BIC (Bayesian information criterion) or the MMSC-HQIC (Hannan-Quinn information criterion). The MMSC-AIC (Akaike information criterion) does not fulfill their consistency criterion as it has positive probability even asymptotically of selecting too few over-identifying restrictions. 2.11.3. Stability of the PVAR model The standard stability condition of the panel VAR coefficients is based on the modulus of each eigenvalue of the estimated model. Luetkepohl (2006) and Hamilton (1994) both show that a VAR model is stable if all moduli of the companion matrix are strictly less than one. Stability implies that the panel VAR is invertible and has an infinite-order vector moving-average representation. 3. Using panelvar In this section we demonstrate how to use panelvar package. We start with a wellknown one dimensional example (a single dynamic panel equation) to compare the results with STATA’s xtabond2 and the pgmm function in the plm package. For the same specification we apply first difference and system GMM estimations with the first difference and the forward orthogonal transformation. We compare these results for two-step estimator (also the Windmeijer correction). In this exercise we show the short comings of the pgmm function that does not allow the forward orthogonal transformation and cannot apply the collapse option in specific settings. Second, we apply the package panelvar to examples with more than one endogenous variable. We also perform a simulation exercise in which we compare our code with the existing STATA codes of Abrigo & Love (2016) who claim to use the Anderson & Hsiao (1982) GMM estimator. 3.1. Single equation dynamic panel estimation with panelvar In the following example a well-known data sets from the literature is used. The EmplUK (abdata) data set was used in the dynamic panel literature by Arellano & Bond (1991), Blundell & Bond (1998), Roodman (2009a) and also in Croissant & Millo (2008). This data set describes employment, wages, capital and output of 140 firms in the United Kingdom from 1976 to 1984. To be in line with the literature, we choose the specification of table 4b in Arellano & Bond (1991) and apply the first difference GMM estimator (i.e. )

19

log empi,t = log empi,t−1 + log empi,t−2 + log wagei,t + log wagei,t−1 + P log outputi,t + log outputi,t−1 + log capitali,t + Tt=2 yeart

(27)

Employment is explained by past values of employment (two lags), current and first lag of wages and output and current value of capital. The Eq. (27) can be estimated with the following STATA code: clear all webuse abdata #delimit ; xtabond2 n L1.n L2.n w L1.w k ys L1.ys i.year, gmmstyle(n, lag(2 99)) ivstyle(w L1.w k ys L1.ys i.year) nolevel twostep robust; #delimit cr The same example can be estimated with the package panelvar and the package plm: R> library("panelvar") R> data("abdata") R> Arellano_Bond_1991_table4b plm_Arellano_Bond_1991_table4b library("panelvar") R> data("abdata") R> Arellano_Bond_1991_table4b plm_Arellano_Bond_1991_table4b library(panelvar) R> library(plm) R> data("abdata") R> Blundell_Bond_1998_table4 plm_Blundell_Bond_1998_table4_test2 library(panelvar) R> data("Cigar") R> ex1_cigar_data_pvar ex1_hansen_j_test ex1_model_to_tests_eigenvalues ex1_cigar_data_pvar_gir ex1A_abigo_fd_nosystem ex1B_abigo_fod_nosystem