Decomposition of time series models in state-space form

E.J. Godolphin (a), Kostas Triantafyllopoulos (b)

(a) Royal Holloway University of London, Egham, Surrey, UK.
(b) University of Newcastle, Newcastle upon Tyne, UK.

Abstract
This paper gives a methodology for decompositions of a very wide class of time series, including normal and non-normal time series, which are represented in state-space form. In particular the linked signals generated from dynamic generalized linear models are decomposed into a suitable sum of noise-free dynamic linear models. A number of relevant general results are given and two important cases, consisting of normally distributed data and binomially distributed data, are examined in detail. The methods are illustrated by considering examples involving both linear trend and seasonal component time series.

Some keywords: Decompositions of time series, dynamic models, generalized linear models, Bayesian forecasting, state space models, Kalman filtering.
1 Introduction
Let {yt} denote a discrete set of observations which becomes available at regular and roughly equal intervals of time. A key problem of time series analysis, which has received much attention in the literature, is to decompose yt into independent trend and seasonal component time series. This problem was discussed in the early path-breaking works of Mitchell (1927), Yule (1927) and Davis (1941); and later important contributions include the works of Whittle (1963, ch. 8) and Hillmer and Tiao (1982). In a related work, Pollock (2000) uses modifications of the Wiener-Kolmogorov filter theory to extract trend components from economic time series; and the decomposition of autoregressive integrated moving average (ARIMA) models in terms of structural trend components is considered in the review by Pollock (2001), which provides further useful references. Much of the recent time series literature is focussed on the specification of yt in terms of an aggregation of component time series behaviour represented in state-space form. The Kalman filter is a popular and widely used method for estimation and forecasting of state-space models; see for example Durbin and Koopman (2001) and Pollock (2003), who provides a good account, including historical references, of the use of the Kalman filter in econometrics. From a Bayesian standpoint, West and Harrison (1997) propose a class of time series dynamic linear models (TSDLMs), defined by
yt = µt + νt,    µt = F′θt,    νt ∼ N(0, V),        (1)

θt = Gθt−1 + ωt,    ωt ∼ N(0, W),        (2)
where µt represents the typically unobserved time series signal which depends on a d × 1 parameter vector θt, νt denotes independent normally distributed measurement noise with
expectation zero and a finite variance V and ωt denotes independent d × 1 system noise with finite covariance matrix W. An initial prior density for θ0 is essential, typically a Gaussian prior, i.e. θ0 ∼ N(m0, C0), for some mean and covariance matrix m0 and C0. In many practical applications it is necessary to specify the measurement equation (1) and the state equation (2) which define the full model. The matrices F, G, V and W depend on unknown parameters in general and, as pointed out by Godolphin and Stone (1980), there may be some doubt about d, the dimension of θt. Estimation of the full model from the data by maximum likelihood methods, which is described for example by Durbin and Koopman (2001, ch. 7), is usually effective if the series is not too short. Alternatively, the full model can be pre-specified using suitable canonical forms as described, for example, in West and Harrison (1997, ch. 5, 6). The decomposition of the state space model into independent component series is subsequent to model specification, hence F, G, V and W are a priori known. The problem of decomposing a restricted class of TSDLMs into minimal component series, hereafter known as the irreducible decomposition, is achieved by West (1997). The generalization of this problem to any TSDLM (1) – (2), subject only to the assumption that the model is observable, is given by Godolphin and Johnson (2003), who show that the signal µt can be decomposed into several independent component series of ARMA or ARIMA type.

However, there are many situations where the assumption that the time series follows a normal distribution is evidently incorrect. Behavioural and health sciences give typical examples, with distributions including the binomial, Poisson, gamma, log-normal and negative binomial. Smith (1979) and Key and Godolphin (1981) consider a steady state-space model that has a non-normal distribution, whilst Harvey and Fernandes (1989) and Smith and Miller (1986) discuss other non-normal models. Kitagawa and Gersch (1996) and Durbin and Koopman (2000) study non-normal time series from a computational viewpoint. The generalization of TSDLMs to the non-normal case is achieved by considering dynamic generalized linear models (DGLMs), in which yt has a distribution belonging to the exponential family. Gamerman (1991) and Hemming and Shaw (2002) use DGLMs to model survival data, whilst Fahrmeir and Tutz (1994, p. 271) discuss interesting examples of DGLMs for survival modelling. However, the fundamental problem of decomposition of the class of DGLMs does not appear to have been considered previously in the literature.

In this paper we describe a methodology for this decomposition provided that {yt} has a distribution from the exponential family. The main results in section 2 specify properties of DGLMs and describe their decomposition. Section 3 considers two examples, in which the distribution of yt is normal and binomial respectively. Conclusions are given in section 4. Relevant results on rational canonical forms are described briefly in the appendix (section 5).
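Before turning to the general development, it may help to have a concrete computational picture of the TSDLM (1) – (2). The short sketch below is an illustration added here rather than material from the original analysis; the particular choices of F, G, V, W and θ0 (a linear-growth specification) are assumptions made only for the example.

```python
import numpy as np

# Minimal simulation of the TSDLM (1)-(2): y_t = F'theta_t + nu_t, theta_t = G theta_{t-1} + omega_t.
# The values of F, G, V, W and theta_0 are illustrative assumptions (a linear-growth model),
# not quantities specified in the paper.
rng = np.random.default_rng(0)

F = np.array([1.0, 0.0])                   # d x 1 design vector
G = np.array([[1.0, 1.0], [0.0, 1.0]])     # d x d evolution matrix (linear growth)
V = 1.0                                    # measurement noise variance
W = 0.01 * np.eye(2)                       # system noise covariance matrix

theta = np.array([10.0, 0.5])              # starting value standing in for a draw from N(m0, C0)
y = []
for t in range(84):
    theta = G @ theta + rng.multivariate_normal(np.zeros(2), W)   # state equation (2)
    y.append(float(F @ theta) + rng.normal(0.0, np.sqrt(V)))      # measurement equation (1)
print(np.round(y[:5], 2))
```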
2 Dynamic generalized linear model
Dynamic generalized linear models (DGLMs) are extensions of Gaussian state-space models in which it is assumed that the density, p(yt |γt ), of the response yt is a member of the exponential family of distributions, where γt is the natural parameter. This specification is very general and includes the conventional Gaussian state space model as a special case. It also includes responses following many of the more familiar distributions, e.g. binomial, Poisson, beta and gamma distributions.
In particular, the DGLM sets

p(yt|γt) = c(yt, φt) exp[{z(yt)γt − b(γt)}/a(φt)]        (observation model),        (3)

where φt is known, c(yt, φt) is a known function of yt and φt, z(·) is either the identity function or a simple linear function of yt, and the function b(·) is assumed to be twice differentiable and convex. It is known (McCullagh and Nelder, 1989) that µt = E{z(yt)|γt} = b′(γt). The link function g(·) is a known continuous monotonic function that connects µt to the linear predictor ηt, which is specified in state-space form by the dynamic linear model in terms of the d × 1 state vector θt, i.e.

g(µt) = ηt = F′θt        (link equation),        (4)

θt = Gθt−1 + ωt,    ωt ∼ (0, W)        (evolution equation),        (5)
where F, G and W are known. Here the notation X ∼ (m, C) signifies that the mean of the random vector X is m and the covariance matrix of X is C. Defining y^t = {y1, . . . , yt} to denote the information set of data up to and including time t, it follows from an argument developed by West and Harrison (1997, ch. 14) that the observation equation (3) is implicitly conditional on the information set y^{t−1} (t ≥ 2 and y^1 = y1).
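One concrete way to encode the triple (3) – (5) is to collect the exponential-family ingredients z(·), b(·), a(·) and the link g(·) alongside F, G and W. The sketch below is added for illustration and is not the authors' code: the class and field names, and the particular binomial settings (λt = 25 with a linear-growth state block), are assumptions for the example.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable

@dataclass
class DGLM:
    """Container for the DGLM (3)-(5); names are illustrative, not the authors' notation."""
    z: Callable[[float], float]   # z(y): identity or a simple linear function of y
    b: Callable[[float], float]   # b(gamma): twice differentiable and convex
    a: Callable[[float], float]   # a(phi): dispersion function in (3)
    g: Callable[[float], float]   # link function g(mu) = eta in (4)
    F: np.ndarray                 # d x 1 design vector of the link equation (4)
    G: np.ndarray                 # d x d evolution matrix of the evolution equation (5)
    W: np.ndarray                 # d x d system noise covariance matrix

# Binomial response with lambda_t = 25 trials (cf. section 3.2): z(y) = y/lambda,
# b(gamma) = log(1 + e^gamma), a(phi) = 1/lambda and the canonical logit link.
lam = 25
binomial_dglm = DGLM(
    z=lambda y: y / lam,
    b=lambda gamma: np.log1p(np.exp(gamma)),
    a=lambda phi: 1.0 / lam,
    g=lambda mu: np.log(mu / (1.0 - mu)),
    F=np.array([1.0, 0.0]),
    G=np.array([[1.0, 1.0], [0.0, 1.0]]),
    W=0.01 * np.eye(2),
)
print(binomial_dglm.b(0.0))   # log 2; b'(0) = 1/2 is the success probability at gamma = 0
```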
2.1 Properties of the model
The analysis of model (3) – (5) is based on a specified distribution for γt. If the prior distribution of γt given y^{t−1} is in the conjugate family of prior distributions, then the posterior density of γt given y^t is in this same family. The forecast density of yt|y^{t−1} is derived from the prior for γt and density (3). In general the prior for γt at time t − 1 is specified in terms of some known rt and st and known normalising constant κ(·, ·) by

p(γt|y^{t−1}) = κ(rt, st) exp{rtγt − stb(γt)}.        (6)

As the new observation yt becomes available then y^t = {y^{t−1}, yt}, so that the posterior distribution of γt is obtained by Bayes' theorem as

p(γt|y^t) = p(γt|y^{t−1})p(yt|γt)/p(yt|y^{t−1})
          = κ{rt + yt/a(φt), st + 1/a(φt)} exp[{rt + yt/a(φt)}γt − {st + 1/a(φt)}b(γt)],        (7)
where the density p(yt|y^{t−1}) is the one-step forecast distribution. Density (7) is then in the same family of distributions as (6), and this underpins the conjugacy of the system. Note that it does not generally follow that the forecast distribution has a recognizable form; see section 3.2 for an example. The k-step forecast distribution of yt+k given information y^t is

p(yt+k|y^t) = ∫ p(yt+k|γt+k)p(γt+k|y^t) dγt+k = κ{rt(k), st(k)}c(yt+k, φt+k) / κ{rt(k) + yt+k/a(φt+k), st(k) + 1/a(φt+k)},        (8)
such that rt(k) and st(k) are parameters consistent with ft(k) = E(ηt+k|y^t) = F′G^k mt and qt(k) = var(ηt+k|y^t) = F′{G^k Ct(G′)^k + kW}F, where mt and Ct are respectively the posterior mean vector and covariance matrix of θt. The moments mt and Ct are obtained recursively, as follows. Suppose that at time t − 1, conditional on y^{t−1}, the posterior mean and covariance matrix of θt−1 are known, i.e.

θt−1|y^{t−1} ∼ (mt−1, Ct−1).        (9)

Then the prior first two moments of θt at time t are

θt|y^{t−1} ∼ (Gmt−1, GCt−1G′ + W).        (10)

The posterior mean and covariance matrix of θt at time t are

θt|y^t ∼ (mt, Ct)        (11)

such that

mt = E{E(θt|ηt, y^{t−1})|y^t} = Gmt−1 + (GCt−1G′ + W)F{ft* − F′Gmt−1}/{F′(GCt−1G′ + W)F}

and

Ct = var{E(θt|ηt, y^{t−1})|y^t} + E{var(θt|ηt, y^{t−1})|y^t}
   = (GCt−1G′ + W)[Id − FF′(GCt−1G′ + W){1 − qt*/F′(GCt−1G′ + W)F}/{F′(GCt−1G′ + W)F}],

where the conditional mean and covariance matrix of θt|ηt, y^{t−1} are obtained by linear Bayes' estimation, after observing that, conditional on y^{t−1}, the vector (ηt, θt′)′ has mean vector (F′Gmt−1, (Gmt−1)′)′ and covariance matrix

[ F′(GCt−1G′ + W)F    F′(GCt−1G′ + W)
  (GCt−1G′ + W)F      GCt−1G′ + W   ].

Here ft* = E(ηt|y^t) and qt* = var(ηt|y^t) are calculated from the posterior density (7). An initial specification needs to be made for

θ0 ∼ (m0, C0).        (12)
m0 and C0 are chosen by the modeller, usually following a vague prior specification; see for example Triantafyllopoulos and Pikoulas (2002). For every t ≥ 1 the parameters rt and st need to be specified. This specification is done so that rt and st are consistent with the definitions of ft = E(ηt|y^{t−1}) = F′Gmt−1 and qt = var(ηt|y^{t−1}) = F′(GCt−1G′ + W)F. Equations (6) – (12) constitute a full algorithm with properties similar to the Kalman filter. This algorithm can be considered optimal in two ways. Firstly, it provides a full conjugate analysis, similar to GLM theory, for the natural parameter γt and for the forecast distribution of yt+k|y^t. Secondly, for the state vector θt, this algorithm provides Bayes' linear estimation, exhibiting optimal properties such as minimum least-squares error and minimum expected risk under quadratic loss; some details on this Bayes' linear optimality appear in Goldstein (1976) and West and Harrison (1997, §4.9).
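The recursion (9) – (12) translates directly into code. The following sketch is a schematic rendering added here, not the authors' implementation: the helpers prior_to_conjugate and posterior_moments are hypothetical hooks standing for the family-specific steps that match (ft, qt) to (rt, st) and return (ft*, qt*) from the conjugate posterior (7).

```python
import numpy as np

def linear_bayes_step(m_prev, C_prev, F, G, W, prior_to_conjugate, posterior_moments, y):
    """One step of the filtering recursion (9)-(12); a sketch under stated assumptions.

    prior_to_conjugate(f, q) -> (r, s): matches the conjugate prior (6) to (f_t, q_t).
    posterior_moments(r, s, y) -> (f_star, q_star): mean and variance of eta_t under (7).
    Both hooks are family-specific and assumed to be supplied by the modeller."""
    a = G @ m_prev                               # prior mean of theta_t, equation (10)
    R = G @ C_prev @ G.T + W                     # prior covariance of theta_t, equation (10)
    f = float(F @ a)                             # f_t = E(eta_t | y^{t-1})
    q = float(F @ R @ F)                         # q_t = var(eta_t | y^{t-1})
    r, s = prior_to_conjugate(f, q)              # conjugate prior parameters for gamma_t
    f_star, q_star = posterior_moments(r, s, y)  # posterior moments of eta_t from (7)
    RF = R @ F
    m = a + RF * (f_star - f) / q                # linear Bayes posterior mean m_t
    C = R - np.outer(RF, RF) * (1.0 - q_star / q) / q   # linear Bayes posterior covariance C_t
    return m, C

# Plumbing for the normal model of section 3.1 (identity link, known variance V = 1),
# where the conjugate prior and posterior for gamma_t = eta_t are exact normal updates.
V = 1.0
m, C = linear_bayes_step(
    np.zeros(2), np.eye(2),
    F=np.array([1.0, 0.0]), G=np.array([[1.0, 1.0], [0.0, 1.0]]), W=0.01 * np.eye(2),
    prior_to_conjugate=lambda f, q: (f, q),
    posterior_moments=lambda r, s, y: ((r / s + y / V) / (1 / s + 1 / V), 1.0 / (1 / s + 1 / V)),
    y=3.2,
)
print(np.round(m, 3))
```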
2.2 Decomposition of the model
The d × 1 vector F and d × d evolution matrix G are specified in the link equation (4) and evolution equation (5). It is assumed that the d × d observability matrix

T = [F  G′F  · · ·  (G′)^{d−1}F]

has full rank d. This implies that G is non-derogatory, so the minimum polynomial has degree d and is equal to the characteristic polynomial

Φ(λ) = det(λId − G) = λ^d + φ1λ^{d−1} + · · · + φd,

where Id denotes the d × d identity matrix. Let Φ(λ) be factorized into a product of s ≤ d relatively prime factors Φ(λ) = Φ1(λ)Φ2(λ) · · · Φs(λ) such that the polynomial factor Φℓ(λ) has degree dℓ, given by

Φℓ(λ) = λ^{dℓ} + φ1,ℓλ^{dℓ−1} + · · · + φdℓ,ℓ        (ℓ = 1, . . . , s),

where d1 + d2 + · · · + ds = d. This factorization can usually be achieved in many ways. If the factors Φ1(λ), . . . , Φs(λ) are the elementary divisors of Φ(λ) then this is the irreducible primary factorization of Φ(λ). In general there may be repeated roots, or there may exist a primary factorization of Φ(λ) which is reducible, i.e. some elementary divisors are combined and the s components of Φ(λ) are relatively prime. Under these conditions G is similar to a primary rational canonical form, which implies that there is a nonsingular matrix Q such that

Q^{−1}GQ = CP = C(Φ1) ⊕ C(Φ2) ⊕ · · · ⊕ C(Φs),

i.e. G is similar to the direct sum of s companion matrices, such that C(Φℓ) is the companion matrix of Φℓ(λ), as described in the appendix. Let F′Q be expressed as F′Q = [F1′ F2′ · · · Fs′], where the component term Fℓ′ is a 1 × dℓ row vector, let Ωℓ denote the dℓ × dℓ nonsingular matrix

Ωℓ = [ Fℓ′
       Fℓ′C(Φℓ)
       ⋮
       Fℓ′C(Φℓ)^{dℓ−1} ]

and let Ω denote the corresponding direct sum Ω = Ω1 ⊕ Ω2 ⊕ · · · ⊕ Ωs. Godolphin and Johnson (2003) show that the matrices Ω and CP commute, a result which has the following interesting application. Define the d × 1 transformed state vector ξt = ΩQ^{−1}θt; then it follows from equation (5) that ξt has the recursive form

ξt = CPξt−1 + δt,    where δt = ΩQ^{−1}ωt.
Furthermore the primary rational canonical form CP is a direct sum of companion matrices; consequently the ξt and δt vectors can be partitioned into s components ξt = [ξ1,t′ · · · ξs,t′]′ and δt = [δ1,t′ · · · δs,t′]′ such that for each ℓ = 1, . . . , s we have

ξℓ,t = C(Φℓ)ξℓ,t−1 + δℓ,t,        (13)

and the s vector models specified by (13) are self-contained in the sense that no elements of ξt or δt occur in more than one model.

Now consider the link equation (4). Let edℓ′ = [1 0 · · · 0] denote the 1 × dℓ vector with leading term unity and remaining terms zero. It follows that

g(µt) = ηt = F′θt = F′QQ^{−1}θt = [F1′ F2′ · · · Fs′]Q^{−1}θt
      = [ed1′Ω1 ed2′Ω2 · · · eds′Ωs]Q^{−1}θt = [ed1′ ed2′ · · · eds′]ΩQ^{−1}θt = [ed1′ ed2′ · · · eds′]ξt
      = χt^(1) + χt^(2) + · · · + χt^(s),        (14)

where χt^(ℓ) is the noise-free component time series

χt^(ℓ) = edℓ′ξℓ,t        (ℓ = 1, . . . , s),        (15)
which, together with the s component evolution equations (13), gives the required decomposition. This work generalizes the TSDLM decomposition of Godolphin and Johnson (2003) to a time series whose distribution is a member of the exponential family. The additive property of the linked signal is demonstrated by equation (14); in general, however, this does not extend to the original time series unless the inverse of the link function, g^{−1}(·), preserves this additive property. Note that if Φ(λ) has more than two elementary divisors then the factorization into relatively prime factors is not unique, and each primary factorization implies a different decomposition; in particular, an irreducible decomposition results from an irreducible primary factorization of Φ(λ). However, not all of these decompositions can be expected to yield independent components, and independence is an important requirement when investigating the behavioural properties of individual components. In practice it will be necessary to check the condition of Godolphin and Johnson (2003, section 5) before applying the method.
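The construction of this subsection can be prototyped in a few lines, as in the sketch below, which is added for illustration and is not code from the paper. The companion matrices are taken in the form given in the appendix; the similarity matrix Q is obtained here by matching observability matrices, which is one convenient choice among many; and the example values of F, G and the factor coefficients are assumptions chosen only to exercise the identities (13) – (15) and the commuting property of Ω and CP.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of lambda^n + c1*lambda^(n-1) + ... + cn, in the form of the
    appendix: superdiagonal of ones, last row (-cn, ..., -c1)."""
    c = np.asarray(coeffs, dtype=float)
    n = len(c)
    C = np.eye(n, k=1)
    C[-1, :] = -c[::-1]
    return C

def decompose(F, G, factor_coeffs):
    """Sketch of the section 2.2 construction.  factor_coeffs lists, for each relatively
    prime factor Phi_l, its coefficients (phi_{1,l}, ..., phi_{dl,l}).  Q is built by
    matching observability matrices -- a convenient choice, not necessarily the authors'."""
    d = len(F)
    blocks = [companion(c) for c in factor_coeffs]
    dims = [B.shape[0] for B in blocks]
    pos = np.cumsum([0] + dims)
    CP = np.zeros((d, d))
    for B, a in zip(blocks, pos):
        CP[a:a + B.shape[0], a:a + B.shape[0]] = B
    O = np.vstack([F @ np.linalg.matrix_power(G, k) for k in range(d)])     # obs. matrix of (F, G)
    e1 = np.concatenate([np.eye(n)[0] for n in dims])                       # leading unit vector per block
    OP = np.vstack([e1 @ np.linalg.matrix_power(CP, k) for k in range(d)])  # obs. matrix of (e1, CP)
    Q = np.linalg.solve(O, OP)                                              # Q = O^{-1} O_P, so Q^{-1} G Q = C_P
    FQ = F @ Q
    Omega = np.zeros((d, d))
    for n, a in zip(dims, pos):
        Fl, Cl = FQ[a:a + n], CP[a:a + n, a:a + n]
        Omega[a:a + n, a:a + n] = np.vstack([Fl @ np.linalg.matrix_power(Cl, k) for k in range(n)])
    return CP, Q, Omega, dims, pos

# Example: linear growth plus a period-3 seasonal, Phi(lambda) = (lambda-1)^2 (lambda^2+lambda+1).
F = np.array([1.0, 0.0, 1.0, 0.0])
G = np.array([[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, -1, -1]], dtype=float)
CP, Q, Omega, dims, pos = decompose(F, G, [[-2, 1], [1, 1]])
theta = np.arange(1.0, 5.0)
xi = Omega @ np.linalg.solve(Q, theta)                  # xi_t = Omega Q^{-1} theta_t
chi = [xi[a] for a in pos[:-1]]                         # chi_t^(l) = e'_{dl} xi_{l,t}, equation (15)
assert np.isclose(F @ theta, sum(chi))                  # additive identity (14)
assert np.allclose(Omega @ CP, CP @ Omega)              # Omega and C_P commute
```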
3 Examples
In this section the results of section 2 are illustrated by considering decompositions of time series following certain distributions in the exponential family. Two cases considered here in detail are the normal distribution and the binomial distribution.
3.1 Normal
In the normal case, the distribution p(yt|µt) in the observation equation (3) is

p(yt|µt) = exp[{ytµt − µt²/2}/φt − {yt²/φt + log(2πφt)}/2],        (16)
which is the canonical form of the exponential family with b(µt) = µt²/2; then E(yt|µt) = b′(µt) = µt and var(yt|µt) = b″(µt)φt = φt, so that the natural parameter is the conditional mean and φt is the conditional variance. The canonical link in this case is the identity function so the link equation (4) becomes µt = g(µt) = ηt = F′θt, from which equations (1) and (2) are obtained. Furthermore, the prior and posterior densities in equations (6) and (7) are normal and the k-step forecast distribution of yt+k|y^t is normal for every integer k ≥ 1.

To illustrate the decomposition for the normal model (16), data consisting of coded sales of an agricultural product are considered. Monthly data for a period of seven complete years, i.e. a total of 84 observations, are available and are plotted in Figure 1.

Figure 1: Monthly coded seasonal sales of an agricultural product.

A fit of the primary rational canonical form, with suitably chosen initial values following classical Bayesian procedures, appears to be adequate for these data. The 13 × 13 matrix G is non-derogatory with characteristic polynomial given by

Φ(λ) = (λ − 1)²(λ + 1)(λ² + 1)(λ² − λ + 1)(λ² + λ + 1)(λ⁴ − λ² + 1) = (λ − 1)² × (seasonal factor).        (17)
This polynomial has six elementary divisors but the irreducible decomposition is of little interest in this case because the independence condition of Godolphin and Johnson (2003) is not satisfied. The factorization of (17) into two relatively prime factors corresponds to a decomposition of µt into two component time series, a "linear growth" trend term χt^(1) and a "seasonal" term χt^(2) with period 12. Each component model is represented by equations (15) and (13), and is specified as follows: for the "linear growth" trend term

χt^(1) = [1 0]ξ1,t    and    ξ1,t = [ 1 1 ; 0 1 ]ξ1,t−1 + δ1,t,        (18)
whilst for the "seasonal" component

χt^(2) = [1  0_10′]ξ2,t    and    ξ2,t = [ 0_10  I10 ; −1  −1_10′ ]ξ2,t−1 + δ2,t,        (19)
where 0_10 and 1_10 denote the 10 × 1 column vectors 0_10 = [0 0 · · · 0]′ and 1_10 = [1 1 · · · 1]′, and I10 is the 10 × 10 identity matrix. The two component TSDLMs represented by (18) and (19) exist as independent time series in their own right and their properties can be investigated separately. For example, Figure 2 considers the one-step forecasts for µt, and hence the original observations yt, by deriving and plotting one-step forecasts for each of the two components. The additive property of the linked signal (14) is preserved in the normal case, hence overall forecasts for yt are found by combining both sets of forecasts and these are also plotted in Figure 2.
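As a small numerical check on these two components (an illustration added here, not the authors' code), the blocks of (18) and (19) can be constructed directly and their characteristic polynomials multiplied back to recover the thirteenth-degree polynomial Φ(λ) of (17).

```python
import numpy as np

# Evolution blocks of the decomposition (18)-(19); an illustrative numerical check only.
C1 = np.array([[1.0, 1.0], [0.0, 1.0]])   # "linear growth" block of (18), char. poly (lambda - 1)^2
C2 = np.eye(11, k=1)                      # 11 x 11 seasonal companion block of (19):
C2[-1, :] = -1.0                          # superdiagonal ones, last row of -1's

p1 = np.poly(C1)                          # [1, -2, 1] = coefficients of (lambda - 1)^2
p2 = np.poly(C2)                          # all ones: lambda^11 + lambda^10 + ... + 1
phi = np.polymul(p1, p2)                  # product recovers Phi(lambda) of (17),
print(np.round(phi, 6))                   # i.e. lambda^13 - lambda^12 - lambda + 1
```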
3.2 Binomial
Suppose that the series {yt} follows a binomial distribution with density

p(yt|πt) = (λt choose yt) πt^{yt}(1 − πt)^{λt−yt},    yt = 0, 1, . . . , λt;  0 < πt < 1,        (20)
where πt is a random variable which denotes the probability of success at time t and λt is a known positive integer at time t. Then

p(yt|πt) = exp{yt log(πt/(1 − πt)) + λt log(1 − πt)} (λt choose yt)
         = exp[λt{(yt/λt)γt − log(1 + e^γt)}] (λt choose yt).

This implies that z(·) is the proportion z(yt) = yt/λt and that the natural parameter γt = log{πt/(1 − πt)}. In this case b(γt) = log(1 + e^γt), confirming that πt = E{z(yt)|πt} = b′(γt) = e^γt/(1 + e^γt). The link function is g(µt) = γt and so the canonical link and evolution equations are

ηt = log{πt/(1 − πt)} = F′θt,        (21)
θt = Gθt−1 + ωt,    ωt ∼ (0, W).

The natural prior for πt|y^{t−1} is the beta distribution

p(πt|y^{t−1}) = Γ(st)/{Γ(rt)Γ(st − rt)} πt^{rt−1}(1 − πt)^{st−rt−1},    0 < rt < st,

and the posterior for πt|y^t is the beta distribution

p(πt|y^t) = Γ(st + λt)/{Γ(rt + yt)Γ(st + λt − rt − yt)} πt^{rt+yt−1}(1 − πt)^{st+λt−rt−yt−1},

where Γ(·) is the gamma function and rt and st are known parameters.
Figure 2: One-step forecasts for the two components of the agricultural sales series. The solid line represents the overall forecasts for yt , the dashed line shows the forecasts for the trend, and the dotted line shows the forecasts for the seasonal component.
Table 1: Quarterly binomial data over a period of eleven years.

quarter  yr 1  yr 2  yr 3  yr 4  yr 5  yr 6  yr 7  yr 8  yr 9  yr 10  yr 11
Q1         10    10    11    12    14    14    15    17    17     18     19
Q2          3     4     5     5     6     7     6     7     8      8     10
Q3          1     1     2     2     3     4     4     5     6      7      7
Q4          6     7     7     8     9     9    10     9    10     11     11
As discussed in section 2, these parameters must be consistent with the definitions of ft = E(ηt|y^{t−1}) and qt = var(ηt|y^{t−1}). West and Harrison (1997, p. 530) state that for the above binomial time series, approximately,

rt = {1 + exp(ft)}/qt    and    st = {1 + exp(−ft)}/qt.

Furthermore, the normalizing constant κ(rt, st) = Γ(st){Γ(rt)Γ(st − rt)}^{−1}. Initial settings are required for m0 and C0; then it is possible to calculate all rt and st and identify all prior and posterior densities of πt. The k-step forecast distribution can be obtained from equation (8) as

p(yt+k|y^t) = (λt+k choose yt+k) Γ{st(k)} / [Γ{rt(k)}Γ{st(k) − rt(k)}Γ{st(k) + λt+k}]
              × Γ{rt(k) + λt+k yt+k}Γ{st(k) − rt(k) + λt+k(1 − yt+k)},

where rt(k) and st(k) are defined in §2. Since the forecast density has no standard form the derivation of its moments requires evaluation of Γ(·).

The use of the binomial model (20) is illustrated by considering the quarterly data of Table 1. In each quarter, over an eleven year period, λt = 25 Bernoulli trials are performed and yt, the number of successes, is recorded. These data suggest there are within-year seasonal factors affecting πt, the probability of success, and that there is a steady increase in πt from year to year. It is evident that the normal model is entirely inappropriate for analyzing these data, although some state-space model should be appropriate. The binomial model is used to investigate suggestions about possible growth and seasonality by choosing the canonical link given by (21). The quarterly growth model selected for this purpose is Model B2 of Godolphin (2001), although it could be argued that the "asymptotic ceiling model", i.e. Model D of Godolphin (2001), may be more appropriate. In either case the matrix G is non-derogatory and the results of section 2 apply. The 44 proportions z(yt) = yt/λt are transformed by the logistic transformation

log{z(yt)/(1 − z(yt))} = log{yt/(λt − yt)} = log{yt/(25 − yt)},

and the quarterly growth model is fitted with suitably chosen initial values following classical Bayesian procedures, and plotted in Figure 3. The growth component time series ψt = χt^(1) and quarterly seasonal component time series χt^(2), obtained by decomposing the transformed proportions, are also plotted in Figure 3.

There is some interest in this situation for examining the 'de-seasonalised' linear trend component on the original probability scale. This is obtained in Figure 4 by transforming back the growth term ψt by the inverse link function exp(ψt)/{1 + exp(ψt)} and then by superimposing this on the fitted model ut/(1 + ut), where ut = exp(χt^(1) + χt^(2)).
Figure 3: Transformed logistic proportions for the binomial time series. The solid line is the transformed time series z(yt), the dashed line shows the trend component ψt = χt^(1), and the dotted line shows the seasonal component χt^(2).
Figure 4: De-seasonalised linear trend component on the original probability scale. The solid line is the fitted model ut /(1 + ut ) and the dashed line shows the de-seasonalised trend component ψt .
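To make the updating used in this example concrete, the sketch below gives one possible implementation of the binomial filtering step; it is added for illustration and is not the authors' code. It uses the beta conjugate prior in the Beta(α, β) shape parametrization of West and Harrison (1997), with α = {1 + exp(ft)}/qt and β = {1 + exp(−ft)}/qt, and digamma/trigamma functions for the posterior moments of the logit. The linear-growth state block is a simplified stand-in for the quarterly growth model (Model B2 of Godolphin, 2001), whose exact specification is not reproduced here, and the initial prior is an arbitrary vague choice.

```python
import numpy as np
from scipy.special import digamma, polygamma

def binomial_dglm_step(m_prev, C_prev, F, G, W, y, lam):
    """One filtering step for the binomial DGLM with logit link; a sketch only."""
    a = G @ m_prev                               # prior mean of theta_t
    R = G @ C_prev @ G.T + W                     # prior covariance of theta_t
    f = float(F @ a)                             # f_t = E(eta_t | y^{t-1})
    q = float(F @ R @ F)                         # q_t = var(eta_t | y^{t-1})
    alpha = (1.0 + np.exp(f)) / q                # beta prior shapes matched to (f_t, q_t),
    beta = (1.0 + np.exp(-f)) / q                # in the Beta(alpha, beta) parametrization
    alpha_post, beta_post = alpha + y, beta + lam - y              # conjugate beta posterior
    f_star = digamma(alpha_post) - digamma(beta_post)              # posterior mean of the logit
    q_star = polygamma(1, alpha_post) + polygamma(1, beta_post)    # posterior variance of the logit
    RF = R @ F
    m = a + RF * (f_star - f) / q                # linear Bayes update of the state (section 2.1)
    C = R - np.outer(RF, RF) * (1.0 - q_star / q) / q
    return m, C

# Table 1 counts in time order (Q1-Q4 within each year), lambda_t = 25 throughout.
y_table1 = [10, 3, 1, 6, 10, 4, 1, 7, 11, 5, 2, 7, 12, 5, 2, 8, 14, 6, 3, 9, 14, 7, 4, 9,
            15, 6, 4, 10, 17, 7, 5, 9, 17, 8, 6, 10, 18, 8, 7, 11, 19, 10, 7, 11]

F = np.array([1.0, 0.0])
G = np.array([[1.0, 1.0], [0.0, 1.0]])       # linear-growth trend only: a simplified stand-in
W = 0.05 * np.eye(2)
m, C = np.zeros(2), 4.0 * np.eye(2)          # vague initial prior for theta_0
for y in y_table1:
    m, C = binomial_dglm_step(m, C, F, G, W, y, 25)
print(round(1.0 / (1.0 + np.exp(-float(F @ m))), 3))   # filtered success probability at year 11, Q4
```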
4 Concluding comments
This paper develops a methodology for decomposing the linked signal of a given dynamic generalized linear model into simple noise-free dynamic linear models. No attempt is made to develop performance measures for the full state-space model from the method described here. Decomposition of the signal implies a corresponding decomposition of forecasts, which is particularly useful for the development of forecast-based decisions. Since these combined component forecasts are necessarily identical to the original forecasts for the full model, for any given lead time and for any given model specification, it follows that no measure of performance can usefully be inferred from these results. The focus is instead on the variety of component specifications, the advantages and utility of component forecasting, and the implications for model building in terms of independent component time series.

Earlier attempts to model individual components as state-space models include the work on seasonal models due to Harrison and Stevens (1976) and the work on polynomial trend models due to Godolphin and Stone (1980). An advance on this earlier work is our proposal to use the primary rational canonical form of the evolution matrix to determine the relevant component state-space models. This offers a presentation of the state-space form that complements the work of West and Harrison (1997) and Durbin and Koopman (2001). The approach is computationally convenient and is applicable to normal as well as non-normal time series in state-space form.
Acknowledgements The authors are grateful to Stephen Pollock and two referees for useful comments on an earlier draft of the paper. The work of Kostas Triantafyllopoulos was supported by grant NAL/00642/G of the Nuffield Foundation.
5 Appendix: rational forms
To justify steps in the development of the argument of section 2, we give below a brief discussion of similarity analysis for matrices and the primary rational canonical form. Two d × d matrices G and H are similar if there is a nonsingular matrix Q such that H = Q^{−1}GQ. Similarity is an equivalence relation on the set of d × d matrices, so a canonical form of a given evolution matrix G under similarity is a standard representative of the set of matrices similar to G. Canonical forms considered here are the rational canonical form and the primary rational canonical form, both of which are defined over the real field. An alternative canonical form is the Jordan canonical form, which is specified over the field of complex numbers. West and Harrison (1997, §5.4) adopt the Jordan canonical form for a class of similar evolution matrices with real-valued eigenvalues and a modified real-valued Jordan canonical form for a class of similar evolution matrices with complex-valued eigenvalues. Both canonical forms are real valued; however, the specification of two distinct primary canonical forms for the similarity classes of evolution matrices can be avoided if either the rational canonical form or the primary rational canonical form is adopted.
Let ∆(λ) = λ^n + δ1λ^{n−1} + · · · + δn−1λ + δn be a monic polynomial of degree n. Then the companion matrix C{∆(λ)} associated with ∆(λ) is the n × n matrix given by

            [  0     1     0    · · ·    0
               0     0     1    · · ·    0
C{∆(λ)} =      ⋮     ⋮     ⋮     ⋱       ⋮
               0     0     0    · · ·    1
              −δn  −δn−1  −δn−2 · · ·   −δ1 ]

Let G be a d × d matrix and consider λId − G, the characteristic matrix of G. Let the monic greatest common divisor of all minors of order i × i of λId − G be hi(λ), so that the invariant factors of λId − G are defined by

ki(λ) = hi(λ)/hi−1(λ)    for all i = 1, . . . , d,
where h0 (λ) = 1 and hd (λ) = det(λId − G). It follows that two d × d matrices are similar if and only if their characteristic matrices have the same invariant factors. Furthermore, any d × d matrix G is similar to a unique matrix of the form C = C{kr+1 (λ)} ⊕ C{kr+2 (λ)} ⊕ · · · ⊕ C{kd (λ)},
(22)
where (i) kr+1(λ), . . . , kd(λ) are the d − r nonconstant invariant factors of λId − G; (ii) ki−1(λ) is a factor of ki(λ), for i = r + 2, . . . , d; (iii) C{ki(λ)} is the di × di companion matrix associated with the monic polynomial ki(λ) of degree di. The matrix C in equation (22) is the rational canonical form of G.

The characteristic polynomial of G is the determinant of the characteristic matrix and the minimum polynomial is the monic polynomial m(λ) of least degree such that m(G) = 0. Let the invariant factors of λId − G be denoted by k1(λ), . . . , kd(λ). Then the characteristic polynomial is the product Φ(λ) = k1(λ) · · · kd(λ), and the minimum polynomial is then m(λ) = kd(λ). If k1(λ) = · · · = kd−1(λ) = 1 then Φ(λ) = m(λ) = kd(λ) and the matrix G is said to be non-derogatory (or cyclic). It follows immediately that a real-valued d × d matrix G with characteristic polynomial Φ(λ) is non-derogatory if and only if its rational canonical form is the d × d companion matrix C{Φ(λ)}. Thus a non-derogatory matrix has rank d or d − 1, depending on whether or not Φ(λ) possesses a constant term.

A second canonical form which is defined as a block diagonal of companion matrices is the primary rational canonical form. This canonical form is in general not unique and is constructed by factorizing each of the nonconstant invariant factors of the characteristic matrix λId − G into relatively prime components. In the case of a non-derogatory matrix G with characteristic polynomial Φ(λ) = Φ1(λ)Φ2(λ) · · · Φs(λ), where the s ≤ d factors Φ1(λ), . . . , Φs(λ) are relatively prime, the matrix G is similar to

        [ C(Φ1)    0     · · ·    0
            0    C(Φ2)   · · ·    0
CP =        ⋮      ⋮       ⋱      ⋮
            0      0     · · ·  C(Φs) ]        (23)
such that C(Φi) is the companion matrix of Φi(λ), (i = 1, . . . , s). Matrix representation (23) is the primary rational canonical form for G. When the matrix G is known to be non-derogatory, the key distinction between the rational canonical form C{Φ(λ)} and the corresponding primary rational canonical form (23) is that the latter requires a factorization of the characteristic polynomial Φ(λ) into relatively prime monic polynomials. The decomposition of the linked signal depends on this factorization of the characteristic polynomial.
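As a small numerical illustration of the two canonical forms (added here, not part of the original appendix), the sketch below builds the companion matrix of a monic polynomial in the form displayed above, forms the primary rational canonical form for two relatively prime factors, and confirms that the two representations share the same characteristic polynomial; since both are non-derogatory they are similar.

```python
import numpy as np

def companion(delta):
    """Companion matrix of lambda^n + d1*lambda^(n-1) + ... + dn in the form of the appendix:
    superdiagonal of ones, last row (-dn, ..., -d1).  delta = (d1, ..., dn)."""
    d = np.asarray(delta, dtype=float)
    n = len(d)
    C = np.eye(n, k=1)
    C[-1, :] = -d[::-1]
    return C

# Relatively prime factors: Phi1(lambda) = (lambda - 1)^2, Phi2(lambda) = lambda^2 + lambda + 1.
Phi1, Phi2 = [-2.0, 1.0], [1.0, 1.0]
Phi = list(np.polymul([1.0] + Phi1, [1.0] + Phi2))[1:]      # coefficients of Phi1*Phi2 after the leading 1

C_rational = companion(Phi)                                  # rational canonical form C{Phi(lambda)}
C_primary = np.block([[companion(Phi1), np.zeros((2, 2))],   # primary rational canonical form (23)
                      [np.zeros((2, 2)), companion(Phi2)]])

print(np.round(np.poly(C_rational), 6))   # both give lambda^4 - lambda^3 - lambda + 1
print(np.round(np.poly(C_primary), 6))
```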
References

Davis, H.T., 1941. The Analysis of Economic Time Series, Cowles Commission monograph No. 6. Principia Press, Bloomington, Indiana.

Durbin, J., Koopman, S.J., 2000. Time series analysis of non-Gaussian observations based on state space models from both classical and Bayesian perspectives (with discussion). J. Roy. Statist. Soc. B 62 (1), 3-56.

Durbin, J., Koopman, S.J., 2001. Time Series Analysis by State Space Methods. Oxford University Press, Oxford.

Fahrmeir, L., Tutz, G., 1994. Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, New York.

Gamerman, D., 1991. Dynamic Bayesian models for survival data. Applied Statistics 40 (1), 63-79.

Godolphin, E.J., 2001. Observable trend-projecting state-space models. J. Applied Statistics 28 (3/4), 379-389.

Godolphin, E.J., Johnson, S.E., 2003. Decomposition of time series dynamic linear models. J. Time Series Analysis 24 (5), 513-527.

Godolphin, E.J., Stone, J.M., 1980. On the structural representation for polynomial-projecting models based on the Kalman filter. J. Roy. Statist. Soc. B 42 (1), 35-45.

Goldstein, M., 1976. Bayesian analysis of regression problems. Biometrika 63 (1), 51-58.

Harrison, P.J., Stevens, C.F., 1976. Bayesian forecasting (with discussion). J. Roy. Statist. Soc. B 38 (3), 205-247.

Harvey, A.C., Fernandes, C., 1989. Time series models for count or qualitative observations. J. Business & Econom. Statist. 7 (4), 407-417.

Hemming, K., Shaw, J.E.H., 2002. A parametric dynamic survival model applied to breast cancer survival times. Applied Statistics 51 (4), 421-435.

Hillmer, S.C., Tiao, G.C., 1982. ARIMA-model based approach to seasonal adjustment. J. Amer. Statist. Assoc. 77 (377), 63-70.

Key, P.B., Godolphin, E.J., 1981. On the Bayesian steady forecasting model. J. Roy. Statist. Soc. B 43 (1), 92-96.
Kitagawa, G., Gersch, W., 1996. Smoothness Priors Analysis of Time Series. Springer, New York.

McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models, 2nd Edition. Chapman & Hall, London.

Mitchell, W.C., 1927. Business Cycles, the Problem and Setting. The National Bureau of Economic Research, New York.

Pollock, D.S.G., 2000. Trend estimation and de-trending via rational square-wave filters. J. Econometrics 99 (2), 317-334.

Pollock, D.S.G., 2001. Methodology for trend estimation. Economic Modelling 18 (1), 75-96.

Pollock, D.S.G., 2003. Recursive estimation in econometrics. Computational Statistics and Data Analysis 44 (1/2), 37-75.

Smith, J.Q., 1979. A generalization of the Bayesian steady forecasting model. J. Roy. Statist. Soc. B 41 (3), 375-387.

Smith, R.L., Miller, J.E., 1986. A non-Gaussian state space model and application to prediction of records. J. Roy. Statist. Soc. B 48 (1), 79-88.

Triantafyllopoulos, K., Pikoulas, J., 2002. Multivariate Bayesian regression applied to the problem of network security. J. Forecasting 21 (8), 579-594.

West, M., 1997. Time series decompositions. Biometrika 84 (2), 489-494.

West, M., Harrison, P.J., 1997. Bayesian Forecasting and Dynamic Models, 2nd Edition. Springer, New York.

Whittle, P., 1963. Prediction and Regulation by Linear Least Square Methods. English Universities Press.

Yule, G.U., 1927. On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Phil. Trans. Roy. Soc. 226, 267-298.