Factor Markov Models with Finite Dimensional Dependence

Christian Gourieroux* and Joann Jasiak†

This version: March 28, 2000 (first draft: June 26, 1998)

* CREST and CEPREMAP, e-mail: [email protected]
† York University, e-mail: [email protected]

The second author gratefully acknowledges the financial support of the Natural Sciences and Engineering Research Council of Canada.

Abstract

Factor Markov Models with Finite Dimensional Dependence

We consider nonlinear factor models, where the factor $(\xi_t)$ is Markov and features finite dimensional dependence (FDD), i.e. admits a transition function of the type $\pi(\xi_t \mid \xi_{t-1}) = \pi(\xi_t)\, a'(\xi_t)\, b(\xi_{t-1})$, with a finite number of cross effects between the present and past values. We discuss various characterizations of the FDD condition in terms of the predictor space and the nonlinear canonical decomposition. The FDD models are shown to admit explicit recursive formulas for filtering and smoothing of the observable process, arising as an extension of the Kitagawa approach. The filtering and smoothing algorithms are given in the paper.

Keywords: predictor space, factor, finite dimensional dependence, canonical analysis, filtering, smoothing.

JEL classification: C4.

Résumé

Modèles à facteurs markoviens et dépendance finie (Markov factor models and finite dependence). We consider nonlinear factor models, where the factor $(\xi_t)$ is Markov and admits a transition function $\pi(\xi_t \mid \xi_{t-1}) = \pi(\xi_t)\, a'(\xi_t)\, b(\xi_{t-1})$, with a finite number of cross effects between present and past values (finite dimensional dependence). We give various characterizations of the finite dimensional dependence condition in terms of the predictor space and the nonlinear canonical decomposition. We then derive explicit recursive formulas for filtering and smoothing of the observable process.

Keywords: predictor space, factor, finite dimensional dependence, canonical analysis, filtering, smoothing.

JEL classification: C4.

1 Introduction

In this paper we study nonlinear factor models with finite dimensional dependence [FDD, henceforth], where the observable stationary process $(y_t)$ depends on stationary latent factors $(\xi_t)$, with measurement and transition equations defined by:

measurement equation:
$$l(y_t \mid \underline{\xi}_t, \underline{y}_{t-1}) = q(y_t \mid \xi_t), \quad \text{(say)}; \qquad (1.1)$$

transition equation:
$$l(\xi_t \mid \underline{\xi}_{t-1}, \underline{y}_{t-1}) = \pi(\xi_t \mid \xi_{t-1}), \quad \text{(say)}
 = \pi(\xi_t) \sum_{j=1}^{J} a_j(\xi_t)\, b_j(\xi_{t-1})
 = \pi(\xi_t)\, a'(\xi_t)\, b(\xi_{t-1}); \qquad (1.2)$$

where $\underline{\xi}_{t-1} = \{\xi_{t-1}, \xi_{t-2}, \ldots\}$ and $\underline{y}_{t-1} = \{y_{t-1}, y_{t-2}, \ldots\}$ denote the sets of lagged values up to and including date $t-1$, and $\underline{\xi}_t$ denotes $\underline{\xi}_{t-1}$ updated by adding $\xi_t$. The measurement equation (1.1) shows that the dynamics of $y_t$ is driven solely by the current value of the factor. The transition equation defines the factor process $(\xi_t)$ as a Markov process of order one. In particular, we see from (1.2) that the transition function admits a decomposition which involves the marginal distribution $\pi(\xi_t)$ of $\xi_t$ and a finite number of cross effects $a_j(\xi_t)\, b_j(\xi_{t-1})$, $j = 1, \ldots, J$, between the current and lagged values of the factor. Whereas the first two conditions are standard for nonlinear factor models [see, e.g., Harrison, Stevens (1976), Harvey (1981), Kitagawa, Gersch (1984), (1996), Shephard, Pitt (1997)], the latter one pertains to the finite dimensional dependence.

The objective of this paper is to characterize the FDD condition and to show that FDD nonlinear processes admit tractable filtering and smoothing algorithms, which can easily be derived and used for estimation and forecasting.

The paper is organized as follows. In Section 2, we study the dynamic properties of the factor process. We characterize the FDD condition in terms of predictor spaces and explain the computation of factor predictions at any horizon. In Section 3 we consider the observable process. We provide explicit recursive formulas for filtering and smoothing which extend the standard Kitagawa result [Kitagawa (1987)]. In Section 4 we distinguish and compare the dynamics of true latent and mimicking factors. An example of a switching-regime process with smoothed regime transitions is presented in Section 5. The proofs are gathered in the appendices.
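To fix ideas, here is a minimal simulation sketch of a model of the form (1.1)-(1.2). The two-regime specification (Gaussian regime densities, logistic lagged directions, and a Gaussian measurement equation) and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical FDD factor with J = 2: pi(xi_t | xi_{t-1}) = sum_j b_j(xi_{t-1}) pi_j(xi_t),
# where pi_j = pi * a_j are Gaussian regime densities and b_1 + b_2 = 1, b_j >= 0.
mu = np.array([-1.0, 1.0])                     # regime means (assumed)

def b(xi_lag):
    """Lagged directions b_1, b_2: logistic weights of the two regimes."""
    p = 1.0 / (1.0 + np.exp(-2.0 * xi_lag))
    return np.array([p, 1.0 - p])

def simulate(T=500, sigma_y=0.5):
    """Draw (xi_t, y_t): transition (1.2) for the factor, Gaussian measurement for y_t."""
    xi = np.empty(T)
    xi[0] = rng.normal(mu[rng.integers(2)], 1.0)       # start from the marginal mixture
    for t in range(1, T):
        j = rng.choice(2, p=b(xi[t - 1]))              # next regime drawn from b(xi_{t-1})
        xi[t] = rng.normal(mu[j], 1.0)                 # factor drawn from regime density pi_j
    y = xi + sigma_y * rng.normal(size=T)              # measurement q(y_t | xi_t) (assumed Gaussian)
    return xi, y

xi, y = simulate()
```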

2 Dynamic Properties of the Factor Process

The FDD property is defined with respect to the dynamics of the latent factor. Let us first characterize it in terms of the dimension of the predictor space of the factor process. The particular form (1.2) of the transition function allows for a straightforward computation of the transition function h steps ahead.

Property 2.1: If $(\xi_t)$ satisfies the transition equation (1.2), and $\pi^{(h)}(\xi_{t+h-1} \mid \xi_{t-1}) = l(\xi_{t+h-1} \mid \underline{\xi}_{t-1}, \underline{y}_{t-1})$ is the transition function h steps ahead, we get:

$$\pi^{(h)}(\xi_{t+h-1} \mid \xi_{t-1}) = \pi(\xi_{t+h-1})\, a'(\xi_{t+h-1})\, D^{h-1}\, b(\xi_{t-1}),$$

where the matrix D has elements:

$$d_{j,k} = E[b_j(\xi_t)\, a_k(\xi_t)], \quad j, k = 1, \ldots, J.$$

The above equation resembles the general prediction formula for Markov chains with a finite state space. The difference arises from replacing the transition matrix by the matrix D, which defines the basic links between the past and the future. However, the matrix D, unlike a transition matrix, does not necessarily have nonnegative elements, and its elements do not necessarily sum to one by columns. The following properties characterize the FDD condition for a (nonlinear) Markov process of order one. They extend a well-known result of Akaike (1974) [see also Tong (1990), p. 186], valid for linear processes.
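As a numerical illustration of Property 2.1, the matrix D of a given FDD specification can be approximated by Monte Carlo integration under the marginal distribution $\pi$. The two-regime specification below (the same illustrative mixture as in the sketch of Section 1) is an assumption; only the computation $d_{j,k} = E[b_j(\xi_t) a_k(\xi_t)]$ and the use of $D^{h-1}$ are taken from the text.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Illustrative two-regime specification: pi = 0.5 N(-1,1) + 0.5 N(1,1),
# a_j = pi_j / pi (so E[a_j] = 1), b_j = logistic weights (b_1 + b_2 = 1).
mu = np.array([-1.0, 1.0])

def pi_regimes(xi):                    # columns: pi_1(xi), pi_2(xi)
    return np.column_stack([norm.pdf(xi, m, 1.0) for m in mu])

def a(xi):                             # current directions a_j = pi_j / pi
    pj = pi_regimes(xi)
    return pj / pj.mean(axis=1, keepdims=True)

def b(xi):                             # lagged directions, nonnegative and summing to one
    p = 1.0 / (1.0 + np.exp(-2.0 * xi))
    return np.column_stack([p, 1.0 - p])

# Monte Carlo estimate of D = E[ b(xi_t) a(xi_t)' ] under the marginal pi.
xi = rng.normal(mu[rng.integers(2, size=200_000)], 1.0)
D = b(xi).T @ a(xi) / xi.size

# h-step-ahead cross effects: a'(xi_{t+h-1}) D^{h-1} b(xi_{t-1}).
h = 5
D_h = np.linalg.matrix_power(D, h - 1)
print(np.round(D, 3), D.sum(axis=0))   # columns sum to one here because b_1 + b_2 = 1 and E[a_k] = 1
```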

Property 2.2: Let $(\xi_t)$ be a stationary Markov process with a continuous transition function. The predictor space, i.e. the space generated by the conditional expectations $E[g(\xi_t, \ldots, \xi_{t+H}) \mid \xi_{t-1}]$, for H and g varying, has a finite dimension if and only if the transition function can be written as:

$$\pi(\xi_t \mid \xi_{t-1}) = \pi(\xi_t)\, a'(\xi_t)\, b(\xi_{t-1}).$$

The FDD condition can also be characterized in terms of the nonlinear canonical decomposition. Whenever the distribution of $(\xi_t)$ satisfies the condition:

$$\int\!\!\int \left[\frac{\pi(\xi_t \mid \xi_{t-1})}{\pi(\xi_t)}\right]^2 \pi(\xi_t)\, \pi(\xi_{t-1})\, d\xi_t\, d\xi_{t-1} < +\infty, \qquad (2.1)$$

the transition function can be decomposed as [Lancaster (1968)]:

$$\pi(\xi_t \mid \xi_{t-1}) = \pi(\xi_t)\Big[1 + \sum_{j=1}^{\infty} \lambda_j\, \varphi_j(\xi_t)\, \psi_j(\xi_{t-1})\Big], \qquad (2.2)$$

3

THIS VERSION: March 28, 2000

where the canonical correlations $\lambda_j$, j varying, are nonnegative scalars, and the canonical directions satisfy the orthogonality and normalization conditions:

$$E[\varphi_j(\xi_t)\, \varphi_k(\xi_t)] = E[\psi_j(\xi_t)\, \psi_k(\xi_t)] = 0, \ \forall j \neq k; \quad E[\varphi_j(\xi_t)] = E[\psi_j(\xi_t)] = 0, \ \forall j; \quad E[\varphi_j^2(\xi_t)] = E[\psi_j^2(\xi_t)] = 1, \ \forall j.$$

Property 2.3: The stationary Markov process $(\xi_t)$ with a continuous transition function satisfying the constraint (2.1) has the FDD property if and only if it admits a finite number of nonzero canonical correlations.

The dimension of the predictor space is equal to the number of nonzero canonical correlations, i.e. to the minimal number of cross-elements in the decomposition formula (1.2).

Remark 2.1: The nonlinear canonical decomposition can be used to simplify the computation of the h-step-ahead transitions. For example, if:

$$\pi(\xi_t \mid \xi_{t-1}) = \pi(\xi_t)\Big\{1 + \sum_{j=1}^{J} \lambda_j\, \varphi_j(\xi_t)\, \psi_j(\xi_{t-1})\Big\} = \pi(\xi_t)\big\{1 + \varphi'(\xi_t)\, \Lambda\, \psi(\xi_{t-1})\big\},$$

where $\Lambda = \mathrm{diag}(\lambda_j)$, we get:

$$\pi^{(h)}(\xi_{t+h-1} \mid \xi_{t-1}) = \pi(\xi_{t+h-1})\big\{1 + \varphi'(\xi_{t+h-1})\, \Lambda\, (\Gamma\Lambda)^{h-1}\, \psi(\xi_{t-1})\big\},$$

where $\Gamma = E[\psi(\xi_t)\, \varphi'(\xi_t)]$.
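For illustration, in the one-factor case J = 1 the matrices reduce to scalars, $\Lambda = \lambda_1$ and $\Gamma = E[\psi_1(\xi_t)\varphi_1(\xi_t)] =: \gamma_1$, and the h-step transition displays a geometric decay of the temporal dependence (a direct specialization of the formula above):

$$\pi^{(h)}(\xi_{t+h-1}\mid \xi_{t-1}) = \pi(\xi_{t+h-1})\Big[1 + \lambda_1\,(\gamma_1\lambda_1)^{h-1}\, \varphi_1(\xi_{t+h-1})\,\psi_1(\xi_{t-1})\Big], \qquad h \ge 1.$$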

3 Dynamic Properties of the Observed Process

In practice the latent factor is not observed and the available information at date t-1 includes the lagged values of y only. Thus the dynamics of the process has to be inferred from the conditional distribution $l(y_t \mid \underline{y}_{t-1})$. We explain how to compute recursively the predictive distributions $l(y_t \mid \underline{y}_{t-1})$ and $l(\xi_t \mid \underline{y}_t)$. This procedure yields explicit filtering formulas, given in subsection 3.1. In the second subsection these results are extended to the computation of predictions at any horizon h, i.e. $l(y_{t+h-1} \mid \underline{y}_{t-1})$. Smoothing is described in the last subsection.

3.1 The filter

In a standard approach, the filter is based on the Bayesian updating equations. Let us initiate the filter at time t and assume that $l(\xi_{t-1} \mid \underline{y}_{t-1})$ is known. We have:

$$\begin{aligned}
l(y_t, \xi_t \mid \underline{y}_{t-1})
&= \int l(y_t, \xi_t \mid \xi_{t-1}, \underline{y}_{t-1})\, l(\xi_{t-1} \mid \underline{y}_{t-1})\, d\xi_{t-1} \\
&= \int l(y_t \mid \xi_t, \xi_{t-1}, \underline{y}_{t-1})\, l(\xi_t \mid \xi_{t-1}, \underline{y}_{t-1})\, l(\xi_{t-1} \mid \underline{y}_{t-1})\, d\xi_{t-1} \\
&= \int q(y_t \mid \xi_t)\, \pi(\xi_t \mid \xi_{t-1})\, l(\xi_{t-1} \mid \underline{y}_{t-1})\, d\xi_{t-1}, \qquad (3.1)
\end{aligned}$$

by using the conditions (1.1) and (1.2). Then we deduce:

$$l(y_t \mid \underline{y}_{t-1}) = \int l(y_t, \xi_t \mid \underline{y}_{t-1})\, d\xi_t, \qquad (3.2)$$

and

$$l(\xi_t \mid \underline{y}_t) = \frac{l(y_t, \xi_t \mid \underline{y}_{t-1})}{l(y_t \mid \underline{y}_{t-1})}. \qquad (3.3)$$

In general this algorithm is intractable [see Kitagawa, Gersch (1996), chapter 6]. Indeed, if the factor is d-dimensional, it requires at each step the computation of the d-dimensional integral (3.1) for any admissible value of $\xi_t$ in (3.2), i.e. an infinite number of d-dimensional integrals. It is known that this algorithm simplifies when the joint process $(y_t, \xi_t)$ is Gaussian, or when the factor admits a finite state space. In the first case we get the Kalman filter [Kalman (1960), Kalman, Bucy (1961)] or its extension to generalized models [West et al. (1985)], and the Kitagawa filter in the second case [Kitagawa (1987), Hamilton (1989), Hamilton, Susmel (1994)]. We see below that a simplification also arises for FDD factor models. Let us substitute for the transition function its expression (1.2). We get:

$$\begin{aligned}
l(y_t, \xi_t \mid \underline{y}_{t-1})
&= \int q(y_t \mid \xi_t)\, \pi(\xi_t)\, a'(\xi_t)\, b(\xi_{t-1})\, l(\xi_{t-1} \mid \underline{y}_{t-1})\, d\xi_{t-1} \\
&= q(y_t \mid \xi_t)\, \pi(\xi_t)\, a'(\xi_t) \int b(\xi_{t-1})\, l(\xi_{t-1} \mid \underline{y}_{t-1})\, d\xi_{t-1} \\
&= q(y_t)\, \tilde{\pi}(\xi_t \mid y_t)\, a'(\xi_t) \int b(\xi_{t-1})\, l(\xi_{t-1} \mid \underline{y}_{t-1})\, d\xi_{t-1} \\
&= q(y_t)\, \tilde{\pi}(\xi_t \mid y_t)\, a'(\xi_t)\, c(\underline{y}_{t-1}), \qquad (3.4)
\end{aligned}$$

where $q(y_t)$ is the marginal p.d.f. of $y_t$, $\tilde{\pi}(\xi_t \mid y_t)$ is the conditional p.d.f. of $\xi_t$ given $y_t$, and $c(\underline{y}_{t-1}) = E[b(\xi_{t-1}) \mid \underline{y}_{t-1}]$. Formula (3.4) shows that the predictive distribution depends on the lagged distribution through a finite number of summary statistics, namely the J components of $c(\underline{y}_{t-1})$. Therefore we only need to find out how to update the function c when a new observation arrives.

Property 3.1: The FDD filter

i) The updating formula for the summary statistic c is:

$$c(\underline{y}_t) = \frac{E[b(\xi_t)\, a'(\xi_t) \mid y_t]\; c(\underline{y}_{t-1})}{E[a'(\xi_t) \mid y_t]\; c(\underline{y}_{t-1})}.$$

ii) Then the predictive distribution is:

$$l(y_t, \xi_t \mid \underline{y}_{t-1}) = q(y_t)\, \tilde{\pi}(\xi_t \mid y_t)\, a'(\xi_t)\, c(\underline{y}_{t-1}).$$

In particular:

$$l(y_t \mid \underline{y}_{t-1}) = q(y_t)\, E[a'(\xi_t) \mid y_t]\, c(\underline{y}_{t-1}), \qquad
l(\xi_t \mid \underline{y}_t) = \frac{\tilde{\pi}(\xi_t \mid y_t)\, a'(\xi_t)\, c(\underline{y}_{t-1})}{E[a'(\xi_t) \mid y_t]\, c(\underline{y}_{t-1})}.$$

At each step the FDD filter requires the computation of only a finite number of integrals, namely the elements of $E[a'(\xi_t) \mid y_t]$ and $E[b(\xi_t)\, a'(\xi_t) \mid y_t]$. Moreover, the filter is also valid when the information on y contains only the observations recorded between the initial date t = 1 (say) and the present date. Therefore it may be used in the standard approach to compute numerically the likelihood function of the model [see Harvey (1989)].
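A minimal sketch of the resulting recursion is given below. It presumes user-supplied routines for the model-specific integrals $E[a(\xi_t) \mid y_t]$, $E[b(\xi_t) a'(\xi_t) \mid y_t]$ (expectations under $\tilde{\pi}(\xi_t \mid y_t)$) and the marginal density $q(y_t)$; these function names, and the initialization of c by the unconditional mean $E[b(\xi)]$, are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def fdd_filter(y, E_a_given_y, E_ba_given_y, q_marg, c0):
    """FDD filter of Property 3.1 (sketch).

    y            : observations y_1, ..., y_T
    E_a_given_y  : y_t -> J-vector  E[a(xi_t) | y_t]            (user supplied)
    E_ba_given_y : y_t -> JxJ matrix E[b(xi_t) a(xi_t)' | y_t]  (user supplied)
    q_marg       : y_t -> marginal density q(y_t)               (user supplied)
    c0           : starting J-vector, e.g. E[b(xi)] under the stationary law (assumption)

    Returns the summary statistics c(y_t) and the log-likelihood
    sum_t log l(y_t | y_{t-1}) = sum_t log( q(y_t) E[a'(xi_t)|y_t] c(y_{t-1}) ).
    """
    c = np.asarray(c0, dtype=float)
    cs, loglik = [], 0.0
    for yt in y:
        a_hat = np.asarray(E_a_given_y(yt))     # E[a(xi_t) | y_t]
        ba_hat = np.asarray(E_ba_given_y(yt))   # E[b(xi_t) a(xi_t)' | y_t]
        denom = a_hat @ c                       # E[a'(xi_t) | y_t] c(y_{t-1})
        loglik += np.log(q_marg(yt) * denom)    # contribution log l(y_t | y_{t-1})
        c = (ba_hat @ c) / denom                # updating formula of Property 3.1 i)
        cs.append(c)
    return np.array(cs), loglik
```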

3.2 Predictions h steps ahead

We have:

$$\begin{aligned}
l(y_{t+h-1}, \xi_{t+h-1} \mid \underline{y}_{t-1})
&= \int q(y_{t+h-1} \mid \xi_{t+h-1})\, \pi^{(h)}(\xi_{t+h-1} \mid \xi_{t-1})\, l(\xi_{t-1} \mid \underline{y}_{t-1})\, d\xi_{t-1} \\
&= q(y_{t+h-1})\, \tilde{\pi}(\xi_{t+h-1} \mid y_{t+h-1})\, a'(\xi_{t+h-1})\, D^{h-1}\, c(\underline{y}_{t-1}).
\end{aligned}$$

Property 3.2: The predictive distribution of the observable process h steps ahead is:

$$l(y_{t+h-1} \mid \underline{y}_{t-1}) = q(y_{t+h-1})\, E[a'(\xi_{t+h-1}) \mid y_{t+h-1}]\, D^{h-1}\, c(\underline{y}_{t-1}).$$
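Given the matrix D and the latest summary statistic $c(\underline{y}_{t-1})$ produced by the filter, Property 3.2 can be evaluated pointwise as sketched below; q_marg and E_a_given_y are the same user-supplied ingredients as in the filter sketch above.

```python
import numpy as np

def predictive_density_h(y_value, h, D, c_prev, E_a_given_y, q_marg):
    """Evaluate l(y_{t+h-1} | y_{t-1}) = q(y) E[a'(xi)|y] D^{h-1} c(y_{t-1}) at y = y_value."""
    w = np.linalg.matrix_power(D, h - 1) @ np.asarray(c_prev)
    return q_marg(y_value) * (np.asarray(E_a_given_y(y_value)) @ w)
```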

3.3 Smoothing

Explicit smoothing formulas can also be derived for the unobservable factor, allowing us to compute the conditional distribution $l(\xi_t \mid \underline{y}_t, y_{t+1}, \ldots, y_{t+H})$. Let us consider the joint conditional distribution:

$$\begin{aligned}
& l(y_{t+H}, \xi_{t+H}, y_{t+H-1}, \xi_{t+H-1}, \ldots, y_{t+1}, \xi_{t+1}, \xi_t \mid \underline{y}_t) \\
&\quad = q(y_{t+H} \mid \xi_{t+H})\, \pi(\xi_{t+H})\, a'(\xi_{t+H})\, b(\xi_{t+H-1})\;
       q(y_{t+H-1} \mid \xi_{t+H-1})\, \pi(\xi_{t+H-1})\, a'(\xi_{t+H-1})\, b(\xi_{t+H-2}) \\
&\qquad \cdots\; q(y_{t+1} \mid \xi_{t+1})\, \pi(\xi_{t+1})\, a'(\xi_{t+1})\, b(\xi_t)\, l(\xi_t \mid \underline{y}_t) \\
&\quad = q(y_{t+H})\, \tilde{\pi}(\xi_{t+H} \mid y_{t+H})\, a'(\xi_{t+H})\, b(\xi_{t+H-1})\;
       q(y_{t+H-1})\, \tilde{\pi}(\xi_{t+H-1} \mid y_{t+H-1})\, a'(\xi_{t+H-1})\, b(\xi_{t+H-2}) \\
&\qquad \cdots\; q(y_{t+1})\, \tilde{\pi}(\xi_{t+1} \mid y_{t+1})\, a'(\xi_{t+1})\, b(\xi_t)\, l(\xi_t \mid \underline{y}_t).
\end{aligned}$$

By integrating out the future values of the latent factor we find:

$$\begin{aligned}
l(y_{t+H}, y_{t+H-1}, \ldots, y_{t+1}, \xi_t \mid \underline{y}_t)
&= q(y_{t+H})\, E[a'(\xi_{t+H}) \mid y_{t+H}]\;
   q(y_{t+H-1})\, E[b(\xi_{t+H-1})\, a'(\xi_{t+H-1}) \mid y_{t+H-1}] \\
&\quad \cdots\; q(y_{t+1})\, E[b(\xi_{t+1})\, a'(\xi_{t+1}) \mid y_{t+1}]\; b(\xi_t)\, l(\xi_t \mid \underline{y}_t).
\end{aligned}$$

Therefore the smoothing formula is:

$$\begin{aligned}
l(\xi_t \mid \underline{y}_t, y_{t+1}, \ldots, y_{t+H})
&= \frac{l(y_{t+H}, y_{t+H-1}, \ldots, y_{t+1}, \xi_t \mid \underline{y}_t)}{l(y_{t+H}, \ldots, y_{t+1} \mid \underline{y}_t)} \\
&= E[a'(\xi_{t+H}) \mid y_{t+H}]\, E[b(\xi_{t+H-1})\, a'(\xi_{t+H-1}) \mid y_{t+H-1}] \cdots E[b(\xi_{t+1})\, a'(\xi_{t+1}) \mid y_{t+1}]\; b(\xi_t)\, l(\xi_t \mid \underline{y}_t) \\
&\quad \times \Big\{ E[a'(\xi_{t+H}) \mid y_{t+H}]\, E[b(\xi_{t+H-1})\, a'(\xi_{t+H-1}) \mid y_{t+H-1}] \cdots E[b(\xi_{t+1})\, a'(\xi_{t+1}) \mid y_{t+1}]\; E[b(\xi_t) \mid \underline{y}_t] \Big\}^{-1}.
\end{aligned}$$
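In practice the smoothing formula amounts to accumulating a product of J×J matrices, one per future observation, from the most distant date backwards. The sketch below returns the weight vector w such that $l(\xi_t \mid \underline{y}_t, y_{t+1}, \ldots, y_{t+H}) = (w' b(\xi_t))\, l(\xi_t \mid \underline{y}_t) / (w' E[b(\xi_t) \mid \underline{y}_t])$; the helper names mirror the filter sketch above and are assumptions.

```python
import numpy as np

def smoothing_weights(y_future, E_a_given_y, E_ba_given_y):
    """Weight vector w of the FDD smoother (sketch).

    y_future : future observations y_{t+1}, ..., y_{t+H}, in chronological order.
    The smoothed density is (w @ b(xi_t)) * l(xi_t | y_t) / (w @ E[b(xi_t) | y_t]).
    """
    w = np.asarray(E_a_given_y(y_future[-1]))          # E[a'(xi_{t+H}) | y_{t+H}]
    for yt in reversed(y_future[:-1]):                 # dates t+H-1, ..., t+1
        w = w @ np.asarray(E_ba_given_y(yt))           # right-multiply by E[b a' | y_{t+h}]
    return w
```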

4 Multiplicity of Factor Representations

A factor representation, when it exists, is in general not unique. For instance, a process $(y_t)$ satisfying (1.1)-(1.2) with a factor $(\xi_t)$ also admits an FDD factor representation based on any factor $(\xi_t^1)$ related to $(\xi_t)$ by a one-to-one differentiable transformation. For practical implementations it is interesting to study models where the factors are determined by the observed history of $(y_t)$. These factor representations are called endogenous and the corresponding factors are called mimicking factors [see, e.g., Huberman et al. (1987) for financial interpretations in a linear framework].

Let us briefly point out the difference between factors in the wide sense and the mimicking factors. For example, in the filter (3.4), when the global information set $(\underline{y}_{t-1}, \underline{\xi}_{t-1})$ is considered, the effect of the past is entirely captured by the transformed factors $b_1(\xi_{t-1}), \ldots, b_J(\xi_{t-1})$ (the last but one line). However, when we restrict the information to the observables $(\underline{y}_{t-1})$ (the last line), the effect of the past is captured by $c_j(\underline{y}_{t-1}) = E[b_j(\xi_{t-1}) \mid \underline{y}_{t-1}]$, $j = 1, \ldots, J$, which mimic the statistical properties of the latent factor. Even if the process $(y_t)$ does not admit an FDD factor representation with factor $(c(\underline{y}_{t-1}))$, for instance because $(c(\underline{y}_{t-1}))$ is not a Markov process, this mimicking factor still features some properties of the true latent factor dynamics. Typically, we deduce from Property 3.1:

$$l(y_t \mid \underline{y}_{t-1}) = q(y_t)\Big[\sum_{j=1}^{J} \hat{a}_j(y_t)\, c_j(\underline{y}_{t-1})\Big],$$

where $\hat{a}_j(y_t) = E[a_j(\xi_t) \mid y_t]$, $j = 1, \ldots, J$. This is a decomposition with a finite number of cross terms describing the present-past relationship.

Remark 4.1: When the factor transition function admits the canonical decomposition:

$$\pi(\xi_t \mid \xi_{t-1}) = \pi(\xi_t)\big\{1 + \varphi'(\xi_t)\, \Lambda\, \psi(\xi_{t-1})\big\},$$

the observable transition function becomes:

$$l(y_t \mid \underline{y}_{t-1}) = q(y_t)\big\{1 + \hat{\varphi}'(y_t)\, \Lambda\, \psi^*(\underline{y}_{t-1})\big\},$$

where $\hat{\varphi}(y_t) = E[\varphi(\xi_t) \mid y_t]$ and $\psi^*(\underline{y}_{t-1}) = E[\psi(\xi_{t-1}) \mid \underline{y}_{t-1}]$. By comparing this equation to the factor transition, we note that the current canonical directions $a_j(\xi_t)$ have been replaced by the static predictions $\hat{a}_j(y_t) = E[a_j(\xi_t) \mid y_t]$, whereas the lagged directions $b_j(\xi_{t-1})$ have been replaced by their dynamic predictions $\psi^*_j(\underline{y}_{t-1}) = E[b_j(\xi_{t-1}) \mid \underline{y}_{t-1}]$. The initial canonical directions $b_j(\xi_{t-1})$ and their observable counterparts $\psi^*_j(\underline{y}_{t-1})$ define stationary processes with the same mean, and with variances such that:

$$V[\psi^*_j(\underline{y}_{t-1})] \le V[b_j(\xi_{t-1})],$$

by the variance decomposition equation.
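Written out, this variance decomposition argument reads:

$$V\bigl[b_j(\xi_{t-1})\bigr] = V\!\Bigl[E\bigl(b_j(\xi_{t-1}) \mid \underline{y}_{t-1}\bigr)\Bigr] + E\!\Bigl[V\bigl(b_j(\xi_{t-1}) \mid \underline{y}_{t-1}\bigr)\Bigr] \;\ge\; V\bigl[\psi^{*}_j(\underline{y}_{t-1})\bigr].$$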

5 Smoothed Transitions

Factor models with latent switching regimes have been widely studied in the literature. Commonly, the underlying latent process is assumed to follow a Markov chain [see, e.g., Hamilton (1989), Hamilton, Susmel (1994)], and the observable likelihood is computed using the Kitagawa filter

[Kitagawa (1987)]. In this section we extend this approach by allowing for smoothed transitions between regimes.

5.1 The factor dynamics

The transition function of the factor is given by:

$$\pi(\xi_t \mid \xi_{t-1}) = \pi(\xi_t)\, a'(\xi_t)\, b(\xi_{t-1}), \qquad (5.1)$$

where the current and lagged directions satisfy:

$$a_j(\xi) \ge 0, \quad \int \pi(\xi)\, a_j(\xi)\, d\xi = E[a_j(\xi)] = 1, \quad \forall\, j = 1, \ldots, J, \qquad (5.2)$$

$$b_j(\xi) \ge 0, \ \forall\, j = 1, \ldots, J, \quad \sum_{j=1}^{J} b_j(\xi) = 1. \qquad (5.3)$$

The value $b_j(\xi)$ provides the probability that the factor is in regime j next period, given the current state $\xi$, whereas $\pi_j(\xi) = \pi(\xi)\, a_j(\xi)$ is the p.d.f. of the factor in regime j. In this case the elements of the D matrix are:

$$d_{j,k} = E[b_j(\xi_t)\, a_k(\xi_t)] = E_k[b_j(\xi_t)],$$

where $E_k$ denotes the expectation with respect to the distribution $\pi_k$. Although the factor process is not a Markov chain, the matrix D is a transition matrix, with nonnegative elements summing to one by columns (as checked below). We say that such a factor process displays smoothed transitions.
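Indeed, under the constraints (5.2)-(5.3), the elements of D are nonnegative and its columns sum to one:

$$d_{j,k} = E\bigl[b_j(\xi_t)\,a_k(\xi_t)\bigr] = \int b_j(\xi)\,\pi_k(\xi)\,d\xi \;\ge\; 0,
\qquad
\sum_{j=1}^{J} d_{j,k} = E_k\Bigl[\sum_{j=1}^{J} b_j(\xi_t)\Bigr] = 1.$$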

Example 5.1: The qualitative threshold ARCH models introduced by Gourieroux, Monfort (1992) belong to this class of models. For example, the factor dynamics is given by:

$$\xi_t = \sum_{j=1}^{J} \alpha_j\, \mathbb{1}_{\xi_{t-1} \in A_j} + \sum_{j=1}^{J} \sigma_j\, \mathbb{1}_{\xi_{t-1} \in A_j}\, u_t,$$

where $A_j$, $j = 1, \ldots, J$, is a partition of the real line, $\mathbb{1}_{\xi \in A}$ denotes the indicator function, equal to one if $\xi \in A$ and to zero otherwise, and $(u_t)$ is a standard Gaussian white noise. The transition function is:

$$\pi(\xi_t \mid \xi_{t-1}) = \sum_{j=1}^{J} \frac{1}{\sigma_j}\, \phi\!\left(\frac{\xi_t - \alpha_j}{\sigma_j}\right) \mathbb{1}_{\xi_{t-1} \in A_j},$$

where $\phi$ is the p.d.f. of the standard normal. In this model the future regime is entirely predetermined.
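A minimal simulation sketch of this qualitative threshold dynamics, with an assumed partition $A_1 = (-\infty, 0]$, $A_2 = (0, \infty)$ and illustrative values of the location and scale parameters (none of which are specified in the paper), is:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative QTARCH-type factor: partition A_1 = (-inf, 0], A_2 = (0, inf),
# with assumed regime locations alpha_j and scales sigma_j.
alpha = np.array([-0.5, 0.5])
sigma = np.array([1.0, 0.3])

def simulate_qtarch(T=1000, xi0=0.0):
    xi = np.empty(T)
    xi[0] = xi0
    for t in range(1, T):
        j = 0 if xi[t - 1] <= 0.0 else 1            # the regime is predetermined by xi_{t-1}
        xi[t] = alpha[j] + sigma[j] * rng.normal()  # Gaussian innovation within the regime
    return xi

xi = simulate_qtarch()
```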

9

THIS VERSION: March 28, 2000

Example 5.2: Another model with smoothed regime transitions is, for example:

$$\pi(\xi_t \mid \xi_{t-1}) = \frac{1}{\sigma_1}\, \phi\!\left(\frac{\xi_t - \mu_1}{\sigma_1}\right) \mathrm{logit}(a + b\,\xi_{t-1})
 + \frac{1}{\sigma_2}\, \phi\!\left(\frac{\xi_t - \mu_2}{\sigma_2}\right) \big[1 - \mathrm{logit}(a + b\,\xi_{t-1})\big],$$

where $\mathrm{logit}\, x = (1 + \exp x)^{-1}$.
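A direct numerical check of this two-component transition density is sketched below; the parameter values for $\mu_1, \mu_2, \sigma_1, \sigma_2, a, b$ are illustrative assumptions (the paper leaves them unspecified). Integrating over $\xi_t$ confirms that the kernel is a proper density for every conditioning value $\xi_{t-1}$.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Illustrative parameter values (assumptions).
mu1, mu2, s1, s2, a_par, b_par = -1.0, 1.0, 0.8, 1.2, 0.2, 0.5

def logit(x):
    """The paper's convention: logit x = (1 + exp x)^(-1)."""
    return 1.0 / (1.0 + np.exp(x))

def transition(xi_t, xi_lag):
    """Two-regime transition density with a smoothed (logistic) regime weight."""
    w = logit(a_par + b_par * xi_lag)
    return w * norm.pdf(xi_t, mu1, s1) + (1.0 - w) * norm.pdf(xi_t, mu2, s2)

print(quad(lambda x: transition(x, 0.7), -np.inf, np.inf)[0])   # ~ 1.0 for any xi_lag
```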

5.2 The observable dynamics

We now verify that the observable process also features smoothed regime transitions. Let us introduce the marginal p.d.f. of $y_t$ in regime j:

$$\begin{aligned}
q_j(y_t) &= \int q(y_t \mid \xi_t)\, \pi_j(\xi_t)\, d\xi_t \\
&= \int q(y_t \mid \xi_t)\, \pi(\xi_t)\, a_j(\xi_t)\, d\xi_t \\
&= \int q(y_t)\, \tilde{\pi}(\xi_t \mid y_t)\, a_j(\xi_t)\, d\xi_t \\
&= q(y_t)\, \hat{a}_j(y_t), \qquad j = 1, \ldots, J.
\end{aligned}$$

Then the observable transition function is:

$$l(y_t \mid \underline{y}_{t-1}) = q(y_t)\, \hat{a}'(y_t)\, c(\underline{y}_{t-1}), \qquad (5.4)$$

where $q(y_t)\, \hat{a}_j(y_t) = q_j(y_t)$ is the marginal p.d.f. in regime j, $j = 1, \ldots, J$, and $c_j(\underline{y}_{t-1}) = E[b_j(\xi_{t-1}) \mid \underline{y}_{t-1}]$, $j = 1, \ldots, J$, satisfies the constraints $c_j(\underline{y}_{t-1}) \ge 0$ and $\sum_{j=1}^{J} c_j(\underline{y}_{t-1}) = 1$, $\forall\, \underline{y}_{t-1}$. We conclude that the observable dynamics features smoothed transitions.

6 Conclusions

We have derived an explicit filter and smoother for Markov models with finite dimensional dependence. These models provide simple predictor spaces and are useful in various applications which involve predictions. In finance, for instance [see Gourieroux, Jasiak (2000)], they can be used to obtain explicit solutions of intertemporal optimization problems, to construct conditionally heteroscedastic models compatible with temporal aggregation, or to relate the factors of the historical and risk-neutral dynamics.

Appendix 1 Transition Function of the Factor h Steps Ahead

It is immediate by recursion. Indeed we get:

$$\begin{aligned}
\pi^{(h+1)}(\xi_{t+h} \mid \xi_{t-1})
&= \int \pi^{(h)}(\xi_{t+h} \mid \xi_t)\, \pi(\xi_t \mid \xi_{t-1})\, d\xi_t \\
&= \int \pi(\xi_{t+h})\, a'(\xi_{t+h})\, D^{h-1}\, b(\xi_t)\, \pi(\xi_t)\, a'(\xi_t)\, b(\xi_{t-1})\, d\xi_t \\
&= \pi(\xi_{t+h})\, a'(\xi_{t+h})\, D^{h-1} \Big[\int b(\xi_t)\, a'(\xi_t)\, \pi(\xi_t)\, d\xi_t\Big] b(\xi_{t-1}) \\
&= \pi(\xi_{t+h})\, a'(\xi_{t+h})\, D^{h}\, b(\xi_{t-1}).
\end{aligned}$$

Appendix 2 Proof of Property 2.2

Sufficient condition

We have:

$$E[g(\xi_t, \ldots, \xi_{t+H}) \mid \xi_{t-1}] = E\big[E[g(\xi_t, \ldots, \xi_{t+H}) \mid \xi_t] \mid \xi_{t-1}\big] = E[\tilde{g}(\xi_t) \mid \xi_{t-1}], \quad \text{say}.$$

Then we have:

$$E[\tilde{g}(\xi_t) \mid \xi_{t-1}] = \int \tilde{g}(\xi_t)\, \pi(\xi_t)\, a'(\xi_t)\, b(\xi_{t-1})\, d\xi_t = E[\tilde{g}(\xi_t)\, a'(\xi_t)]\, b(\xi_{t-1}).$$

Therefore the predictor space is included in the space generated by the J components of $b(\xi_{t-1})$.

Necessary condition

Let us assume a predictor space generated by a finite number of vectors $b_1(\xi_{t-1}), \ldots, b_J(\xi_{t-1})$. Since $\pi(\xi_t \mid \xi_{t-1})$ belongs to the predictor space (a.s. in $\xi_t$), we can write:

$$\pi(\xi_t \mid \xi_{t-1}) = \sum_{j=1}^{J} A_j(\xi_t)\, b_j(\xi_{t-1}) = \pi(\xi_t) \sum_{j=1}^{J} a_j(\xi_t)\, b_j(\xi_{t-1}),$$

where $a_j = A_j / \pi$.

Appendix 3 Proof of Property 2.3

Sufficient condition

It is immediate, with $a_j = \lambda_j \varphi_j$ and $b_j = \psi_j$.

Necessary condition

Conversely, if there exists an infinite number of nonzero canonical correlations, we note that the predictor space includes the predictions $E[a_j(\xi_t) \mid \xi_{t-1}] = \lambda_j\, b_j(\xi_{t-1})$, $\forall j$, and therefore has infinite dimension, since the $b_j(\xi_{t-1})$, j varying, are orthogonal.

Appendix 4 The filter

We have:

$$\begin{aligned}
c(\underline{y}_t) &= \int b(\xi_t)\, l(\xi_t \mid \underline{y}_t)\, d\xi_t \\
&= \frac{\int b(\xi_t)\, \tilde{\pi}(\xi_t \mid y_t)\, a'(\xi_t)\, d\xi_t\; c(\underline{y}_{t-1})}{E[a'(\xi_t) \mid y_t]\, c(\underline{y}_{t-1})} \qquad \text{(from (3.4))} \\
&= \frac{E[b(\xi_t)\, a'(\xi_t) \mid y_t]\, c(\underline{y}_{t-1})}{E[a'(\xi_t) \mid y_t]\, c(\underline{y}_{t-1})}.
\end{aligned}$$

References

[1] Akaike, H. (1974): "Markovian Representation of Stochastic Processes and Its Application to the Analysis of Autoregressive Moving Average Processes", Annals of the Institute of Statistical Mathematics, 26, 363-387.

[2] Gourieroux, C., and J. Jasiak (2000): "Financial Econometrics", preliminary version, http://dept.econ.yorku.ca/ jasiakj.

[3] Gourieroux, C., and A. Monfort (1992): "Qualitative Threshold ARCH Models", Journal of Econometrics, 52, 159-199.

[4] Granger, C., and T. Terasvirta (1993): "Modelling Nonlinear Economic Relationships", Oxford University Press.

[5] Hamilton, J. (1989): "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle", Econometrica, 57, 357-384.

[6] Hamilton, J., and R. Susmel (1994): "Autoregressive Conditional Heteroscedasticity and Changes in Regime", Journal of Econometrics, 64, 307-333.

[7] Harrison, P., and C. Stevens (1976): "Bayesian Forecasting", Journal of the Royal Statistical Society, Series B, 38, 205-247.

[8] Harvey, A. (1989): "Forecasting, Structural Time Series Models and the Kalman Filter", Cambridge University Press.

[9] Huberman, G., Kandel, S., and R. Stambaugh (1987): "Mimicking Portfolios and Exact Arbitrage Pricing", Journal of Finance, 42, 1-9.

[10] Kalman, R. (1960): "A New Approach to Linear Filtering and Prediction Problems", Journal of Basic Engineering, 82, 35-45.

[11] Kalman, R., and R. Bucy (1961): "New Results in Linear Filtering and Prediction Theory", Journal of Basic Engineering, 83, 95-108.

[12] Kitagawa, G. (1987): "Non-Gaussian State Space Modeling of Nonstationary Time Series", JASA, 82, 1032-1063.

[13] Kitagawa, G., and W. Gersch (1984): "A Smoothness Priors-State Space Modeling of Time Series with Trend and Seasonality", JASA, 79, 378-389.

[14] Kitagawa, G., and W. Gersch (1996): "Smoothness Priors Analysis of Time Series", Lecture Notes in Statistics, 116, Springer.

[15] Lancaster, H. (1968): "The Structure of Bivariate Distributions", Annals of Mathematical Statistics, 29, 716-736.

[16] Shephard, N., and M. Pitt (1997): "Likelihood Analysis of Non-Gaussian Measurement Time Series", Biometrika, 84, 653-667.

[17] Tong, H. (1990): "Nonlinear Time Series: A Dynamical System Approach", Oxford University Press.

[18] West, M., Harrison, P., and H. Migon (1985): "Dynamic Generalized Linear Models and Bayesian Forecasting", JASA, 80, 73-97.
