Impulse Response Functions in Generalized Bayesian Autoregressive Models

Hedibert F. Lopes (IM-UFRJ) and Helio S. Migon (IM-UFRJ)

Abstract

Vector autoregressions (VAR) are extensively used to model economic time series. The large number of parameters is, however, the main difficulty with VAR models. To overcome this, Litterman (1986) suggests using a Bayesian strategy to estimate the VAR equation by equation, where, a priori, the lags have decreasing importance (known as the Litterman prior). In this paper, a VAR model is analyzed through a Bayesian multivariate regression with time-varying parameters, i.e. a multivariate Bayesian dynamic linear model. In this case correlation between the VAR equations is allowed, as suggested by Kadiyala and Karlsson (1993). This methodology was applied to some Brazilian economic variables, as a bivariate VAR (Lima, Migon and Lopes (1993)). The main interest lies in studying the sensitivity of the impulse response functions in Bayesian VAR modeling to the choice of hyperparameters of the Litterman prior. Confidence intervals for these functions, extremely non-linear functions of the VAR parameters, are given.

1 Introduction

Vector autoregressions have been extensively used to model economic time series over the last two decades. Some of their main aspects are: easy forecasting implementation (Litterman (1986)) and unrestricted reduced-form estimation, which permits a two-stage procedure to estimate structural models (Lima, Migon & Lopes (1993)). The most common tool for analyzing the dynamics of VAR models is the impulse response function. Another advantage of VAR models is the possibility of analyzing the reduced form either from a classical (Lutkepohl (1991)) or a Bayesian approach (Litterman (1986), Koop (1992), Lima, Migon & Lopes (1993), Polasek (1993) and Kadiyala & Karlsson (1993)). The aims of this paper are to analyze the sensitivity of the impulse response function to the prior's hyperparameters, in a bivariate autoregression, and to evaluate posterior confidence intervals by means of Monte Carlo numerical integration. The rest of the paper is organized as follows: in Section 2, the VAR model and its vector moving average (VMA) representation, necessary to understand the impulse response function, are presented. The VAR model is also formulated as a Bayesian multivariate regression model. In Section 3, the VAR formulation is rewritten as a Bayesian dynamic model. Two alternative prior specifications are shown in Section 4. One is used to

estimate the VAR model equation by equation, known as Litterman's prior; the other assumes joint estimation. The Monte Carlo procedure used to obtain confidence intervals for the impulse response function is discussed in Section 5. Some examples are presented in Section 6 and Section 7 gives some conclusions.

2 Vector Autoregressive Models (VAR)

The earliest use of VAR models was the work of Doan, Litterman & Sims (1984), whose main objective was to impose less arbitrary restrictions than traditional econometric models. The idea is that the model itself should reveal possible restrictions at the time of estimation and analysis. Following Lutkepohl (1991), suppose that $y_t$ follows a VAR of order $p$, i.e. a VAR(p):

$$y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + u_t \qquad (1)$$

where $y_t = (y_{1t}, y_{2t}, \ldots, y_{nt})'$ is a vector of $n$ time series, the $A_i$ are coefficient matrices, $\nu = (\nu_1, \ldots, \nu_n)'$ is a vector of intercepts and $u_t = (u_{1t}, u_{2t}, \ldots, u_{nt})'$ is a Gaussian white noise process, denoted $u_t \sim NID(0, \Sigma_u)$.

2.1 The Vector Moving Average Representation (VMA)

The moving average representation of a VAR process gives a straightforward way to analyze the dynamic relations among the variables in the VAR. Equation (1) can be rewritten as

$$A(L) y_t = \nu + u_t \qquad (2)$$

where $L$ is the lag operator and $A(L) = I_n - A_1 L - \cdots - A_p L^p$.

Assuming that all the series in $y_t$ are stationary (the roots of $\det A(z) = 0$ are all outside the unit circle), the moving average representation is:

$$y_t = \mu + \Psi(L) u_t \qquad (3)$$

where $\mu = A(1)^{-1} \nu$, $\Psi(L) A(L) = I_n$, $\Psi(L) = \sum_{i=0}^{\infty} \Psi_i L^i$ and $\Psi_0 = I_n$.

The moving average coefficient matrices can be obtained recursively:

$$\Psi_i = \sum_{j=1}^{i} \Psi_{i-j} A_j, \qquad i = 1, 2, \ldots \qquad (4)$$

with $A_j = 0$ for all $j > p$. These coefficients are called impulse response functions, since $\partial y_{j,t+i} / \partial u_{k,t} = \psi_{jk,i}$ corresponds to the effect on $y_j$, $i$ steps ahead, of a unit shock on $y_k$. This interpretation depends on the lack of correlation among the components of $u_t$. This is the reason why researchers only analyze impulse response functions where the shocks are orthogonal, i.e. $E(u_t u_t') = I_n$. A straightforward way to achieve this is to define $\omega_t = P^{-1} u_t$, where $\Sigma_u = P P'$, so that $\Sigma_\omega = I_n$. Notice that $P$ can be, and usually is, the Cholesky decomposition of $\Sigma_u$. Algebraically,

$$y_t = \mu + \sum_{i=0}^{\infty} \Psi_i u_{t-i} = \mu + \sum_{i=0}^{\infty} \Theta_i \omega_{t-i} \qquad (5)$$

where $\omega_t = P^{-1} u_t$ and $\Theta_i = \Psi_i P$. However, to assume $\omega_t = P^{-1} u_t$ with $P$ lower triangular means imposing a recursive structure on the system; in other words, the first shock is the most exogenous and the last one is the least exogenous, which is a strong assumption. An alternative would be to try other orderings of the system and analyze the robustness of the identification. Fixing some parameters in $P$ and using maximum likelihood estimation is the most accepted method to estimate structural VARs (Lima, Migon & Lopes (1993)). Since the main interest concerns analyzing the sensitivity of the impulse response function to the prior choice, it is enough to concentrate on reduced-form estimation of the VAR.
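To make the recursion in (4) and the orthogonalization in (5) concrete, here is a minimal sketch in Python (our own, not part of the original analysis); the VAR coefficients and innovation covariance below are hypothetical values for a bivariate VAR(2).

```python
import numpy as np

def ma_matrices(A, Sigma_u, horizon):
    """MA coefficients Psi_i of a VAR(p) via the recursion (4), and
    orthogonalized responses Theta_i = Psi_i P, with P the lower
    Cholesky factor of Sigma_u."""
    n = A[0].shape[0]
    p = len(A)
    Psi = [np.eye(n)]                          # Psi_0 = I_n
    for i in range(1, horizon + 1):
        # Psi_i = sum_{j=1}^{i} Psi_{i-j} A_j, with A_j = 0 for j > p
        Psi.append(sum(Psi[i - j] @ A[j - 1] for j in range(1, min(i, p) + 1)))
    P = np.linalg.cholesky(Sigma_u)
    Theta = [Psi_i @ P for Psi_i in Psi]
    return Psi, Theta

# Hypothetical coefficients for a bivariate VAR(2)
A = [np.array([[0.5, 0.1], [0.2, 0.3]]),
     np.array([[0.1, 0.0], [0.0, 0.1]])]
Sigma_u = np.array([[1.0, 0.3], [0.3, 0.5]])
Psi, Theta = ma_matrices(A, Sigma_u, horizon=8)
```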

2.2 Bayesian VAR Estimation

The reduced VAR can be rewritten, transposing equation (1), as

$$y_t' = \nu' + y_{t-1}' A_1' + \cdots + y_{t-p}' A_p' + u_t' \qquad (6)$$

or, as a multivariate regression model,

$$y_t' = x_t B + u_t' \qquad (7)$$

where $x_t = (1, y_{t-1}', \ldots, y_{t-p}')$ is a $(1 \times k)$ vector, $B = (\nu, A_1', \ldots, A_p')'$ is a $(k \times n)$ matrix and $k = 1 + np$ is the number of parameters in each equation. The vector $x_t$ can be

extended to allow seasonal dummies, exogenous variables, and so on, without modifying the conclusions. Stacking the T observations:

$$Y = XB + U \qquad (8)$$

with $Y = (y_1, \ldots, y_T)'$, $X = (x_1', \ldots, x_T')'$, $U = (u_1, \ldots, u_T)'$ and $U \sim N(0, I_T, \Sigma)$. Note that the rows of $U$ are i.i.d. $N(0, \Sigma)$. The likelihood function is:

$$L(B, \Sigma \mid X, Y) \propto |\Sigma|^{-(\tau + n + 1)/2} \exp\{-\tfrac{T}{2} \mathrm{tr}\, \Sigma^{-1} \hat{\Sigma}\} \times |\Sigma|^{-k/2} \exp\{-\tfrac{1}{2} \mathrm{tr}\, \Sigma^{-1} (B - \hat{B})' (X'X) (B - \hat{B})\} \qquad (9)$$

where $\tau = T - (n + k + 1)$. The MLEs of $B$ and $\Sigma$ are, respectively, $\hat{B} = (X'X)^{-1} X'Y$ and $\hat{\Sigma} = T^{-1} Y' (I - X (X'X)^{-1} X') Y$. The kernel of the likelihood function is the kernel of a Matrix Normal-Inverted Wishart distribution (see Appendix A). Then it is possible to use a conjugate analysis, with the following prior distribution:

$$p(B, \Sigma) \propto |\Sigma|^{-(m + n + 1)/2} \exp\{-\tfrac{1}{2} \mathrm{tr}\, \Sigma^{-1} G\} \times |\Sigma|^{-k/2} \exp\{-\tfrac{1}{2} \mathrm{tr}\, \Sigma^{-1} (B - \Delta)' A^{-1} (B - \Delta)\} \qquad (10)$$

Combining the likelihood function in (9) and the conjugate prior distribution (10), the posterior distribution is:

$$p(B, \Sigma \mid X, Y) \propto |\Sigma|^{-(\tau + m + 2n + k + 2)/2} \exp\{-\tfrac{1}{2} \mathrm{tr}\, \Sigma^{-1} G_1\} \times |\Sigma|^{-k/2} \exp\{-\tfrac{1}{2} \mathrm{tr}\, \Sigma^{-1} (B - B_0)' A_1^{-1} (B - B_0)\} \qquad (11)$$

where $G_1 = T \hat{\Sigma} + G + \hat{B}' X'X \hat{B} + \Delta' A^{-1} \Delta - B_0' A_1^{-1} B_0$, $A_1^{-1} = A^{-1} + X'X$ and $B_0 = (A^{-1} + X'X)^{-1} (A^{-1} \Delta + X'Y)$.

Using a noninformative (diffuse) prior, the posterior distribution is obtained by setting: (1) $\Delta = 0$; (2) $A^{-1} = 0$; (3) $G = 0$ and $m = -k$. This development is identical to the one used in Bayesian univariate and multivariate regression (Press (1972)).
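As a sketch of the algebra behind (11), the conjugate update can be coded directly; the function and variable names are ours, and the posterior Inverted Wishart degrees of freedom $m + T$ are read off the exponent of (11).

```python
import numpy as np

def niw_posterior(X, Y, Delta, A_inv, G, m):
    """Matrix Normal-Inverted Wishart update of prior (10) by
    likelihood (9), following the expressions below equation (11)."""
    T = X.shape[0]
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)       # MLE of B
    resid = Y - X @ B_hat                           # T * Sigma_hat = resid' resid
    A1_inv = A_inv + X.T @ X                        # A_1^{-1}
    B0 = np.linalg.solve(A1_inv, A_inv @ Delta + X.T @ Y)
    G1 = (resid.T @ resid + G
          + B_hat.T @ (X.T @ X) @ B_hat
          + Delta.T @ A_inv @ Delta
          - B0.T @ A1_inv @ B0)                     # G_1 in (11)
    return B0, A1_inv, G1, m + T                    # posterior WI dof (assumed)
```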

3 The Bayesian Dynamic Common Components Model

Bayesian dynamic common components models (West & Harrison (1989)) arise when the interest is in modeling a system of time series driven by the same explanatory variables (regressors, seasonal components, dummies etc.). VAR models have this characteristic, i.e., the explanatory variables are lagged dependent variables. Suppose that each of the $n$ time series $y_{jt}$ ($j = 1, \ldots, n$) follows a standard Bayesian dynamic linear model (DLM), i.e.,

$$\{x_t, G_t, V_t \sigma_j^2, W_t \sigma_j^2\} \qquad (12)$$

where $x_t$, as in the last section, is a $(1 \times k)$ vector of explanatory variables, $G_t$ is a transition matrix, and $V_t$ and $W_t$ are the observational and system variance components, respectively, all known for $t = 1, 2, \ldots, T$. The scale factors $\sigma_j^2$ ($j = 1, \ldots, n$) are unknown. Then we write the following observation and system equations:

$$y_{jt} = x_t \theta_{jt} + u_{jt}, \qquad u_{jt} \sim N(0, V_t \sigma_j^2) \qquad (13)$$
$$\theta_{jt} = G_t \theta_{j,t-1} + \omega_{jt}, \qquad \omega_{jt} \sim N(0, W_t \sigma_j^2) \qquad (14)$$

Notice that $x_t$, $G_t$, $V_t$ and $W_t$ are common components, although each series has its own well-specified state vector $\theta_{jt}$. The model above, rewritten in matrix form, is:

$$y_t' = x_t \Theta_t + u_t', \qquad u_t \sim N(0, V_t \Sigma) \qquad (15)$$
$$\Theta_t = G_t \Theta_{t-1} + \Omega_t, \qquad \Omega_t \sim N(0, W_t, \Sigma) \qquad (16)$$

where $y_t = (y_{1t}, \ldots, y_{nt})'$ is $(n \times 1)$, $u_t = (u_{1t}, \ldots, u_{nt})'$ is $(n \times 1)$, $\Theta_t = (\theta_{1t}, \ldots, \theta_{nt})$ is $(k \times n)$, $\Omega_t = (\omega_{1t}, \ldots, \omega_{nt})$ is $(k \times n)$ and $\Sigma = \{\sigma_{ij}\}$ with $\sigma_{jj} = \sigma_j^2$.

The model specified by (15) and (16) is clearly an extension of the multivariate regression model (7), since it permits time-varying parameters. Some advantages of dynamic models are the possibility of giving more weight to recent information via discount factors, and the possibility of incorporating external information into the system. To complete the model specification it is necessary to provide some initial information.

3.1 Initial Information

Suppose that

$$(\Theta_0 \mid \Sigma, D_0) \sim N(m_0, C_0, \Sigma) \qquad (17)$$
$$(\Sigma \mid D_0) \sim WI(S_0, n_0) \qquad (18)$$

is the initial information on the parameters, with known hyperparameters $m_0$, $C_0$, $S_0$, $n_0$. It is useful to note the direct relationship between these hyperparameters and $\Delta$, $A$, $G$, $m$, presented in the last section for the static VAR model. Therefore we can always use the same prior specifications for static or dynamic BVAR models.

4 Prior Specification

The large number of parameters in a VAR model ($k = np + 1$ parameters per equation) makes the task of specifying all the prior hyperparameters intractable: $k$ means and $k(k+1)/2$ covariances per equation. For instance, in a simple model with three variables ($n = 3$) and four lags ($p = 4$), $k = 13$, so the total number of hyperparameters to assign per equation is $13 + 91 = 104$. Fortunately, Litterman (1986) developed a procedure to specify these hyperparameters in a particular, but useful, way.

4.1 Litterman's Prior

The Litterman prior is specified equation by equation,

$$y_{(i)} = X \beta_{(i)} + u_{(i)}, \qquad i = 1, 2, \ldots, n \qquad (19)$$

where $u_{(i)} \sim N(0, \sigma_i^2 I_T)$, as:

$$(\beta_{(i)} \mid \sigma_i^2 = \hat{\sigma}_i^2) \sim N(0, \tilde{\Sigma}_i) \qquad (20)$$

where $\tilde{\Sigma}_i = \mathrm{diag}(\omega_1, \ldots, \omega_k)$ and

$$\omega_j = \frac{\lambda}{l} \frac{\hat{\sigma}_i^2}{\hat{\sigma}_j^2}, \qquad l = 1, \ldots, p; \; j = 1, \ldots, n \qquad (21)$$

that is, the prior distribution for each lagged variable is increasingly concentrated around zero as the lag length grows (giving less weight, a priori, to higher orders in the VAR). The variance ratio is only used to eliminate scale effects. The priors for predetermined variables can be specified similarly; for instance, $\omega_j = \lambda_1 \hat{\sigma}_i^2$. The hyperparameters $\lambda$ and $\lambda_1$ express our prior knowledge. Taking $\tilde{\Sigma}_i \to \infty$ ($\lambda$ very large) indicates lack of prior information, or little information a priori. On the other hand, if $\lambda \to 0$, each equation reduces to a white noise process $y_{it} = u_{it}$. In Section 6 we will use different values of $\lambda$ in a bivariate VAR with Brazilian economic data. It is easy to show that the posterior is a multivariate normal (see Press (1972)). This kind of procedure does not permit interaction among the $n$ variables in the system. This assumption is generally rather unrealistic, since it is common in economics for series to have similar patterns. Below we present an alternative prior specification, joint for all VAR equations.
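To fix ideas, a minimal sketch of how $\tilde{\Sigma}_i$ in (20)-(21) might be assembled for equation $i$; the residual variances $\hat{\sigma}_j^2$ would come from, e.g., univariate autoregressions, the coefficient ordering follows $x_t$ in (7) (intercept first, then the $n$ variables at each lag), and all numerical values are hypothetical.

```python
import numpy as np

def litterman_cov(sigma2_hat, i, p, lam, lam1):
    """Diagonal prior covariance Sigma~_i of (20) for equation i:
    omega = (lam / l) * sigma2_hat[i] / sigma2_hat[j] for variable j
    at lag l, and lam1 * sigma2_hat[i] for the intercept."""
    n = len(sigma2_hat)
    omega = [lam1 * sigma2_hat[i]]             # predetermined term (intercept)
    for l in range(1, p + 1):
        for j in range(n):
            omega.append((lam / l) * sigma2_hat[i] / sigma2_hat[j])
    return np.diag(omega)

# Hypothetical residual scales for a bivariate VAR(2), equation i = 0
Sigma_i = litterman_cov(np.array([0.4, 0.9]), i=0, p=2, lam=1.0, lam1=100.0)
```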

4.2 Generalized Litterman's Prior

Kadiyala & Karlsson (1993) compare the forecast performance of VARs with different prior specifications. In this paper, the hyperparameters of the conjugate prior defined in (10) are adapted to accommodate a generalization of the Litterman strategy. So,

1. Defining $G = \mathrm{diag}(G_1, \ldots, G_n)$ with $G_i = (m - n - 1) \hat{\sigma}_i^2$, then

$$E(\Sigma_{ii}) = \frac{G_i}{m - n - 1} = \hat{\sigma}_i^2, \qquad m > n + 1 \qquad (22)$$

Notice that the mean of $\Sigma_{ii}$ is equal to $\hat{\sigma}_i^2$ only a priori, in contrast to Litterman's prior, which assumes that $\Sigma_{ii} = \hat{\sigma}_i^2$.

2. $\Delta = 0$ (i.e., zero mean for all lags).

3. Using

$$(\beta \mid \Sigma) \sim N(\delta, \Sigma \otimes A) \qquad (23)$$
$$\beta \sim T_m(\delta, G \otimes A) \qquad (24)$$

where $\beta = \mathrm{vec}(B)$, $\delta = \mathrm{vec}(\Delta)$, $E(\beta) = \delta$ and $V(\beta) = \frac{G \otimes A}{m - n - 1}$ if $m > n + 1$, then

$$V(\beta) = (m - n - 1)^{-1} \mathrm{diag}(G_1 A, \ldots, G_n A) = \mathrm{diag}(\hat{\sigma}_1^2 A, \ldots, \hat{\sigma}_n^2 A). \qquad (25)$$

The similarity with Litterman's prior follows by setting

$$A = \mathrm{diag}(a_{lj}), \qquad a_{lj} = \frac{\lambda}{l \, \hat{\sigma}_j^2} \qquad (26)$$

From (21) and (26) it is easy to see that $\lambda$ can be different for each variable $j$ and lag $l$. We thus obtain a conjugate prior distribution for $(B, \Sigma)$ with the same properties as Litterman's prior, but without forcing $\Sigma$ to be diagonal. Other prior specifications could be used directly, for instance the generalized natural conjugate prior presented in Press (1972) and the diffuse-normal prior presented in Kadiyala & Karlsson (1993). In the following section we present the procedure used to generate confidence intervals for the impulse response functions from Monte Carlo simulations from $p(B, \Sigma \mid X, Y)$.
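A companion sketch for the generalized specification: $G$ is chosen as in (22) so that $E(\Sigma_{ii}) = \hat{\sigma}_i^2$ a priori, and $A$ as in (26); treating the intercept entry of $A$ like $\lambda_1$ in Section 4.1 is our assumption.

```python
import numpy as np

def generalized_litterman(sigma2_hat, p, lam, lam1, m):
    """Hyperparameters (Delta, A, G) of the conjugate prior (10)
    for the generalized Litterman specification of Section 4.2."""
    sigma2_hat = np.asarray(sigma2_hat)
    n = len(sigma2_hat)
    k = 1 + n * p
    a = [lam1]                                      # intercept scale (assumed)
    for l in range(1, p + 1):
        a += [lam / (l * s2) for s2 in sigma2_hat]  # a_{lj} in (26)
    A = np.diag(a)
    G = np.diag((m - n - 1) * sigma2_hat)           # G_i in (22)
    Delta = np.zeros((k, n))                        # zero prior mean for all lags
    return Delta, A, G
```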

5 Confidence Intervals for the Impulse Response Function

Once the reduced-form VAR has been estimated, it is interesting to calculate the distribution of the impulse response functions. The MA representation of a VAR(p) process can be described by $y_t = \mu + \sum_{i=0}^{\infty} \Psi_i u_{t-i}$, where $\Psi_i = \sum_{j=1}^{i} \Psi_{i-j} A_j$, as was shown in Section 2.1. It is easy to see that the impulse response functions are extremely non-linear functions of the parameters of the VAR. In spite of this nonlinearity it is useful to access their distributions, at least numerically. There are several results in the literature concerning numerical and stochastic integration techniques to obtain moments and densities of this kind of function (for instance, Laplace's method, Gaussian quadrature, sampling-resampling and Gibbs sampling). Since our posterior distribution for $(B, \Sigma)$ is easy to sample from, we use simple Monte Carlo integration to obtain the moments of the impulse response functions. Geweke (1989) shows that simple Monte Carlo integration is efficient in Gaussian linear models, and we have shown that the VAR belongs to this class of models. The main steps to get approximate confidence intervals for the impulse response functions using Monte Carlo integration are as follows. Denoting the posterior distributions by $p(B \mid \Sigma) \sim$ Matrix Normal and $p(\Sigma) \sim$ Inverted Wishart, and the impulse response functions by $G \equiv G(B, \Sigma)$, the steps to get $\hat{E}(G)$ and $\hat{V}(G)$ from $M$ posterior draws are:

1. Draw $(B_i, \Sigma_i)$ from $p(B, \Sigma)$, $i = 1, 2, \ldots, M$.

2. Calculate $G_i$ from $B_i$ and $\Sigma_i$ for all $i$.

3. Calculate

$$\hat{E}(G) = \frac{1}{M} \sum_{i=1}^{M} G_i, \qquad \hat{V}(G) = \frac{1}{M} \sum_{i=1}^{M} (G_i - \hat{E}(G))^2$$

The dynamic relations among the variables in the VAR can then be investigated. In Appendix A some matrix distributions (Matrix Normal, Matrix-t, Wishart and Inverted Wishart) are described, and Appendix B gives the steps to generate random matrices from these distributions through univariate generations.

6 An Illustration

In this section a bivariate VAR is analyzed from both a Bayesian and a classical point of view, and the confidence intervals for the impulse response functions are estimated. Brazilian economic data, from the third quarter of 1982 up to the first quarter of 1990, are used; the variables involved are the industrial production index and the unemployment rate. Only 31 observations are available, which suggests that Bayesian inference should be essential. The logarithm of the unemployment rate (UN) and the first difference of the log of industrial production (IP) were used to achieve stationarity. Therefore, the vector of endogenous variables is $y_t = (UN_t, IP_t)'$.

6.1 VAR Order

An advantage of BVAR modeling is its unrestricted estimation, with decreasing weights on longer lags. Nevertheless, the lag length was searched using some information criteria: the Akaike information criterion (AIC), the Schwarz criterion (SC) and the Hannan & Quinn criterion (HQ), which are defined, in a VAR context, as follows:

(a) $AIC(p) = \ln|\hat{\Sigma}_u(p)| + \frac{2}{T} kn$

(b) $SC(p) = \ln|\hat{\Sigma}_u(p)| + \frac{\ln T}{T} kn$

(c) $HQ(p) = \ln|\hat{\Sigma}_u(p)| + \frac{2 \ln \ln T}{T} kn$

where $\hat{\Sigma}_u(p)$ is the MLE of $\Sigma_u$. In this application the maximum lag length was fixed at 8 quarters. Applying these criteria, we found that $p = 2$. To compare the sensitivity of the impulse response functions to the lag length choice, $p = 4$ and $p = 6$ were also used. Another motivation for using these lags is that the criteria depend on maximum likelihood estimation, which is sensitive to the sample size (we only have 31 observations). Table 1 shows the information criteria search. More details can be found in Lutkepohl (1991). Koreisha & Pukkila (1993) have studied the sensitivity of some information criteria to the number of components in $y_t$, and they suggest using a modified HQ criterion.
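A sketch of how the criteria (a)-(c) can be computed, fitting each candidate order by least squares; for simplicity each order here uses its own effective sample, so the values may differ slightly from a common-sample search such as the one in Table 1.

```python
import numpy as np

def var_order_criteria(y, p_max):
    """AIC, SC and HQ of (a)-(c) for VAR orders 1..p_max on the
    (T x n) data array y; a minimal sketch."""
    T_all, n = y.shape
    out = {}
    for p in range(1, p_max + 1):
        # regressors: intercept plus y_{t-1}, ..., y_{t-p}
        X = np.hstack([np.ones((T_all - p, 1))] +
                      [y[p - l:T_all - l] for l in range(1, p + 1)])
        Y = y[p:]
        T = Y.shape[0]
        B = np.linalg.lstsq(X, Y, rcond=None)[0]
        U = Y - X @ B
        Sigma = U.T @ U / T                          # MLE of Sigma_u(p)
        kn = (1 + n * p) * n                         # parameters in the system
        ld = np.log(np.linalg.det(Sigma))
        out[p] = (ld + 2 * kn / T,                   # AIC(p)
                  ld + np.log(T) * kn / T,           # SC(p)
                  ld + 2 * np.log(np.log(T)) * kn / T)  # HQ(p)
    return out
```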

6.2 Hyperparameters Choice

The Litterman hyperparameter $\lambda$, presented in (21) of Section 4.1, was fixed at 2.7 and at 1.0. The first value yields a fairly diffuse prior and the second is more informative. A scheme for this choice of $\lambda$ can be found in Lima, Migon & Lopes (1993). It is common to search for a $\lambda$ that minimizes some objective function, such as the h-step-ahead mean absolute error. This procedure is, however, not completely Bayesian, since it uses the dataset twice (this kind of procedure is called empirical Bayes). A complete Bayesian analysis of $\lambda$ would specify $p(\lambda)$, a prior distribution for $\lambda$, for instance in a hierarchical fashion (Polasek (1993)), and use Markov chain Monte Carlo techniques to build posterior moments numerically.

Table 1: Lag Length Choice

p   AIC        HQ         SC
1   -10.9963   -10.6614   -10.5138
2   -11.4769   -10.7891   -10.5249
3   -11.3557   -10.2956    -9.8909
4   -10.0916    -9.6378    -9.0866
5   -10.1765    -9.3060    -8.6014
6    -9.7631    -9.4506    -8.5850
7   -10.1236    -9.3411    -8.3065
8    -9.6417    -8.3587    -7.1456

6.3 Impulse Response Functions

The procedure suggested in Section 5 was used to analyze the behavior of the impulse response functions obtained from the VAR or BVAR. Table 2 below summarizes the simulation exercises. The impulse response graphics for reduced-form shocks are presented in Appendix C. We used M = 500 draws of $(B, \Sigma)$, since for M as large as 50,000 the results were virtually the same. All the programs were run on a PC-AT/286 with a math coprocessor, and the time spent was about 30 seconds for 500 Monte Carlo iterations.

Table 2: Exercises

Estimation                  p = 2        p = 4        p = 6
Classical ($\lambda = 100$)   Exercise 1   Exercise 4   Exercise 7
Bayesian ($\lambda = 2.7$)    Exercise 2   Exercise 5   Exercise 8
Bayesian ($\lambda = 1.0$)    Exercise 3   Exercise 6   Exercise 9

Some comments are in order. Firstly, the VAR(2) gives similar results under all estimation procedures, but with shorter intervals as more prior information becomes available. Secondly, the VAR(4) gives almost the same results as the VAR(2), but its impulse responses are smoother and its confidence intervals shorter, as can be seen in Appendix C. With $p = 6$ the classical model is the worst (widest confidence intervals), possibly a reflection of the few degrees of freedom; this disadvantage is less evident when prior information is used (exercises 8 and 9). Finally, in the classical estimation, the larger the VAR order, the longer the memory of the system, as can be seen, for instance, in the impulse response functions of IP. The same happens in the Bayesian cases, but to a lesser extent. Notice that the shocks in our system are correlated; for instance, the response of unemployment to a shock in IP varies between -2 and 4, while the response of IP varies from -0.8 to 0.6, clearly indicating scale influences¹. The Bayesian analysis is much the same whether $\lambda = 2.7$ or $\lambda = 1.0$, which confirms that $\lambda = 2.7$ is very informative.

¹Our objective was not to give an economic interpretation to the MA representation; instead, we were interested in its sensitivity to the choice of the prior hyperparameters.

7 Final Comments and Perspectives

As mentioned in the introduction, our main interest was to study, empirically, the sensitivity of the impulse response functions to the choice of hyperparameters of Litterman's prior. In this context, we have seen that informative prior distributions have a direct effect on these functions. The Monte Carlo integration was simplified, since we drew from univariate distributions and used matrix calculations to obtain matrix draws. The Bayesian VAR, static or dynamic, has great potential. It is important to remember that the multivariate analysis of time series remains a challenge to modellers. Some ideas for further work are: (1) hierarchical BVAR modeling could be used instead of fixing $\lambda$ in Litterman's prior; (2) to investigate other prior distributions in place of the Litterman prior used in this work; (3) to investigate the sensitivity to discount factors in Bayesian dynamic modeling; (4) to study the effects of these results on structural VARs; (5) to apply recent integration techniques, such as Markov chain Monte Carlo, to approximate posterior moments of complicated functions.

References

[1] Dawid, A. P. (1981) "Some matrix-variate distribution theory: Notational considerations and a Bayesian application", Biometrika, 68, 265-274.

[2] Doan, T., Litterman, R., and Sims, C. (1984) "Forecasting and Conditional Projection Using Realistic Prior Distributions", Econometric Reviews, 3, 1-100.

[3] Dreze, J. H., and Richard, J. F. (1983) "Bayesian Analysis of Simultaneous Equation Systems", in Handbook of Econometrics, eds. Z. Griliches and M. Intriligator, North-Holland, Amsterdam, pp. 517-598.

[4] Geweke, J. (1989) "Bayesian Inference in Econometric Models Using Monte Carlo Integration", Econometrica, 57(6), 1317-1339.

[5] Johnson, N. L., and Kotz, S. (1972) Distributions in Statistics: Continuous Multivariate Distributions, John Wiley, New York.

[6] Kadiyala, K. R., and Karlsson, S. (1993) "Forecasting with Generalized Bayesian Vector Autoregressions", Journal of Forecasting, 12, 365-378.

[7] Koop, G. (1992) "Aggregate Shocks and Macroeconomic Fluctuations: A Bayesian Approach", Journal of Applied Econometrics, 7, 395-411.

[8] Koreisha, S. G., and Pukkila, T. (1993) "Determining the Order of a Vector Autoregression when the Number of Component Series is Large", Journal of Time Series Analysis, 14(1), 47-69.

[9] Lima, E. C. R., Migon, H. S., and Lopes, H. F. (1993) "Efeitos Dinâmicos dos Choques de Oferta e Demanda Agregadas sobre o Nível de Atividade Econômica do Brasil", Revista Brasileira de Economia, 47(2), 177-204.

[10] Litterman, R. (1986) "Forecasting With Bayesian Vector Autoregressions - Five Years of Experience", Journal of Business and Economic Statistics, 4(1), 25-38.

[11] Lopes, H. F. (1994) "Aplicações de Modelos Autoregressivos Vetoriais Bayesianos", unpublished MSc dissertation, Department of Statistical Methods, Federal University of Rio de Janeiro.

[12] Lutkepohl, H. (1991) Introduction to Multiple Time Series Analysis, Springer-Verlag, New York.

[13] Polasek, W. (1993) "Gibbs Sampling in VAR Models with Tightness Priors", manuscript, University of Basel.

[14] Press, S. J. (1972) Applied Multivariate Analysis, Holt, Rinehart and Winston, New York.

[15] West, M., and Harrison, J. (1989) Bayesian Forecasting and Dynamic Models, Springer-Verlag, New York.


APPENDIX

Appendix A: Matrix Distributions

This appendix presents some of the main matrix distributions. Some basic references are Dawid (1981), Dreze & Richard (1983) and Johnson & Kotz (1972).

Matrix Normal

A matrix $Y$ $(T \times n)$ is said to have a matrix normal (or matricvariate normal) distribution with parameters $M$, $\Omega$ and $\Sigma$, denoted $Y \sim N(M, \Omega, \Sigma)$, if and only if its density function is given by:

$$p(Y \mid M, \Omega, \Sigma) = [(2\pi)^{Tn} |\Sigma|^{T} |\Omega|^{n}]^{-1/2} \exp\{-\tfrac{1}{2} \mathrm{tr}\, \Sigma^{-1} (Y - M)' \Omega^{-1} (Y - M)\} \qquad (27)$$

where $M$ is $(T \times n)$, and $\Omega > 0$, $\Sigma > 0$ are $(T \times T)$ and $(n \times n)$ matrices, respectively. The mean and variance of $Y$ are $E(Y) = M$ and $V(\mathrm{vec}(Y)) = \Sigma \otimes \Omega$, respectively. The matrix $\Omega$ is the covariance matrix of the columns of $Y$ and $\Sigma$ is the covariance matrix of the rows of $Y$. Let $Y = (y_1, y_2, \ldots, y_n) = (y_{(1)}, y_{(2)}, \ldots, y_{(T)})'$, where $y_i$ is a $(T \times 1)$ vector representing the $i$-th column of $Y$ and $y_{(i)}$ is an $(n \times 1)$ vector representing the $i$-th row of $Y$. With this notation, $E(y_i y_j') = \sigma_{ij} \Omega$ and $E(y_{(i)} y_{(j)}') = \omega_{ij} \Sigma$. Some properties of the matrix normal distribution are generalizations of the univariate and multivariate normal:

1. Let $Z$ be $(T \times n)$, where the $z_{ij}$ are independent standard normal; then, in matrix notation, $Z \sim N(0, I_T, I_n)$ (standard matrix normal distribution);
2. Let $Z \sim N(0, I_T, \Sigma)$; then $ZB \sim N(0, I_T, B' \Sigma B)$;
3. Let $Z \sim N(0, \Omega, I_n)$; then $AZ \sim N(0, A \Omega A', I_n)$;
4. Let $Z \sim N(0, I_T, I_n)$, $\Sigma = B'B$ and $\Omega = AA'$; then $AZB + M \sim N(M, \Omega, \Sigma)$.

Matrix-t

A matrix $Y$ $(T \times n)$ is said to have a matrix-t (or t-matricvariate) distribution with parameters $M$, $\Omega$, $\Sigma$ and $\nu$ degrees of freedom, denoted $Y \sim T(M, \Omega, \Sigma, \nu)$, if and only if its density function is:

$$p(Y \mid M, \Omega, \Sigma, \nu) = k\, |\Omega|^{-n/2}\, |\Sigma|^{\nu/2}\, |\Sigma + (Y - M)' \Omega^{-1} (Y - M)|^{-(\nu + T)/2} \qquad (28)$$

where $k = \pi^{-Tn/2} \prod_{i=1}^{n} \Gamma[(\nu + T - i + 1)/2] / \prod_{i=1}^{n} \Gamma[(\nu - i + 1)/2]$, $M$ is $(T \times n)$, $\Omega > 0$ is $(T \times T)$, $\Sigma > 0$ is $(n \times n)$ and $\nu > n - 1$. The mean and variance of $Y$ are $E(Y) = M$ if $\nu > n$ and $V(\mathrm{vec}(Y)) = (\nu - n - 1)^{-1} (\Sigma \otimes \Omega)$ if $\nu > n + 1$, respectively. Some properties of the matrix-t are generalizations of the univariate and multivariate t distributions:

1. Let $\Sigma \sim WI(I_n, \nu)$ and $(Y \mid \Sigma) \sim N(0, I_T, \Sigma)$; then $Y \sim T(0, I_T, I_n, \nu)$ (standard matrix-t distribution);
2. Let $X \sim N(0, I_T, \Sigma)$ and $U \sim W(\Omega^{-1}, \nu)$; then $Y' = X' U^{-1/2} \sim T(0, \Omega, \Sigma, \nu)$, where $U^{-1/2}$ is the inverse of the square root of $U$ (in the univariate case, $x \sim N(0, 1)$ and $y \sim \chi^2_\nu$ give $t = x / \sqrt{y/\nu}$, which has a t distribution);
3. Let $Y = (y_1, y_2, \ldots, y_n)$ be matrix-t; then $p(Y) = p(y_1) p(y_2 \mid y_1) \cdots p(y_n \mid y_1, \ldots, y_{n-1})$, where each conditional distribution is multivariate t;
4. Let $Y$ be matrix-t; then any submatrix of $Y$ will be matrix-t, too.

Wishart

A matrix $\Omega$ $(n \times n)$ is said to have a Wishart distribution with parameters $\Sigma$ and $\nu$ (degrees of freedom), denoted $\Omega \sim W(\Sigma, \nu)$, if and only if its density function is:

$$p(\Omega \mid \Sigma, \nu) = k\, |\Sigma|^{-\nu/2}\, |\Omega|^{(\nu - n - 1)/2} \exp\{-\tfrac{1}{2} \mathrm{tr}\, \Sigma^{-1} \Omega\}$$

where $k^{-1} = 2^{\nu n/2} \pi^{n(n-1)/4} \prod_{i=1}^{n} \Gamma[(\nu + 1 - i)/2]$, $\nu \geq n$, and $\Sigma > 0$ is $(n \times n)$. The mean, variance and covariance are $E(\Omega) = \nu \Sigma$, $V(\omega_{ij}) = \nu (\sigma_{ij}^2 + \sigma_{ii} \sigma_{jj})$ and $\mathrm{Cov}(\omega_{ij}, \omega_{kl}) = \nu (\sigma_{ik} \sigma_{jl} + \sigma_{il} \sigma_{jk})$, respectively, where $\Sigma = \{\sigma_{ij}\}$.

Inverted Wishart

A matrix $\Omega$ $(n \times n)$ is said to have an Inverted Wishart distribution with parameters $Q$ and $\nu$ (degrees of freedom), denoted $\Omega \sim WI(Q, \nu)$, if and only if its density function is:

$$p(\Omega \mid Q, \nu) = k\, |\Omega|^{-(\nu + n + 1)/2}\, |Q|^{\nu/2} \exp\{-\tfrac{1}{2} \mathrm{tr}\, \Omega^{-1} Q\}$$

where $k^{-1} = 2^{\nu n/2} \pi^{n(n-1)/4} \prod_{i=1}^{n} \Gamma[(\nu + 1 - i)/2]$, $\nu \geq n$, and $Q > 0$ is $(n \times n)$. The mean of $\Omega$ is $E(\Omega) = (\nu - n - 1)^{-1} Q$ if $\nu > n + 1$.

Some of the main properties relating these distributions are:

1. Let $\Omega \sim WI(Q, \nu)$; then $\Omega^{-1} \sim W(Q^{-1}, \nu)$;
2. Let $\Omega \sim WI(Q, \nu)$; then $A \Omega A' \sim WI(A Q A', \nu)$;
3. Let $Z \sim N(0, I_T, I_n)$; then $Z'Z \sim W(I_n, T)$ (standard Wishart);
4. For $n = 1$, $W(1, \nu) \equiv \chi^2_\nu$ ($\nu > 0$);
5. For $n = 1$, $WI(1, \nu) \equiv (\chi^2_\nu)^{-1}$.

Appendix B: Matrix Normal-Inverted Wishart Posterior Generation

In the BVAR model, the posterior information is summarized as follows:

$$(B \mid \Sigma) \sim N(B_0, A_0, \Sigma) \qquad (29)$$
$$\Sigma \sim WI(G_0, m_0) \qquad (30)$$

where $B_0$ is $(k \times n)$, $A_0$ is $(k \times k)$, $G_0$ is $(n \times n)$ and $m_0$ is a scalar. Then the following strategy to draw from these posterior densities can be suggested.

0

1

0

4. Calculate A so that AA0 = G (A is the Cholesky decomposition of G ); 5. Calculate i = A(Z 0Z )? A0  WI (G ; m ) (property 2 from Inverted Wishart). 0

0

1

0

0

Matrix Normal Generation To draw (B j i) from (29)we have the following steps: 1. Create Z (k  n) where zij  N (0; 1). This way Z  N (0; Ik ; In);

2. Calculate A so that AA0 = A . (A is the Cholesky decomposition of A ); 3. Calculate B so that BB 0 = i . (B is the Cholesky decomposition of i); 4. Calculate B i = B + AZB 0  N (B ; A ; i) (property 4 from Matrix Normal). 0

0

0

0

13

0
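Both schemes translate directly into code; a minimal sketch, with `rng` and all function names ours. Together with the posterior moments from Section 2.2, it provides the `draw_posterior` used in Section 5.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_sigma(G0, m0):
    """Inverted Wishart draw, steps 1-5: Z'Z ~ W(I_n, m0), hence
    A (Z'Z)^{-1} A' ~ WI(G0, m0), where A A' = G0 (Cholesky)."""
    n = G0.shape[0]
    Z = rng.standard_normal((int(m0), n))       # step 1
    W = Z.T @ Z                                 # step 2
    A = np.linalg.cholesky(G0)                  # step 4
    return A @ np.linalg.inv(W) @ A.T           # steps 3 and 5

def draw_B(B0, A0, Sigma):
    """Matrix normal draw, steps 1-4: B0 + A Z B' ~ N(B0, A0, Sigma),
    where A A' = A0 and B B' = Sigma (Cholesky factors)."""
    Z = rng.standard_normal(B0.shape)           # step 1
    A = np.linalg.cholesky(A0)                  # step 2
    B = np.linalg.cholesky(Sigma)               # step 3
    return B0 + A @ Z @ B.T                     # step 4

def draw_posterior_factory(B0, A0, G0, m0):
    """One joint draw (B, Sigma) from (29)-(30)."""
    def draw():
        Sigma = draw_sigma(G0, m0)
        return draw_B(B0, A0, Sigma), Sigma
    return draw
```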