Pruscha, H.: Residual and forecast methods in time series models with covariates. Sonderforschungsbereich 386, Paper 33 (1996). Online at: http://epub.ub.uni-muenchen.de/

RESIDUAL AND FORECAST METHODS IN TIME SERIES MODELS WITH COVARIATES

BY HELMUT PRUSCHA

Mathematical Institute, University of Munich, Theresienstr. 39, D-80333 Munich, Germany

SUMMARY

We are dealing with time series which are measured on an arbitrary scale, e.g. on a categorical or ordinal scale, and which are recorded together with time-varying covariates. The conditional expectations are modelled by a regression model, whose parameters are estimated via a likelihood or quasi-likelihood approach. Our main concern is diagnostic methods and forecasting procedures for such time series models. Diagnostics are based on (partial) residual measures as well as on (partial) residual variables; l-step predictors are gained by an approximation formula for conditional expectations. The various methods proposed are illustrated by two different data sets.

Some key words: Categorical time series; Conditional regression models; Forecast methods; Ordinal responses; Partial residuals.

1 INTRODUCTION

We are concerned with time series data $(Z_t, Y_t)$, $t \ge 1$, where $Y_t$ is a response variable measured on a scale which is not necessarily metrical, and $Z_t$ is a vector of covariates. The evolution of the $Y_t$-process is assumed to be driven by its own history as well as by the covariate process $Z_t$. The conditional expectation of $Y_t$ is modelled in the form $h(\eta_t)$, where $h$ is a suitable response function and $\eta_t$ a regression term containing the actual covariates $Z_t$ as well as former observations (sec. 2 and 3). Such models were already investigated by Kaufmann (1987), Zeger and Qaqish (1988), Pruscha (1993), Li (1991, 1994). Their statistical analysis is based on (quasi-) likelihood methods (sec. 4).

The main concern of the present paper is to carry over two classical time series topics to these more general models. The first is partial residual analysis (sec. 5), which can be used to assess the relevance of subsets of covariates as well as to remove the influence of covariate subsets (see Fuller, 1976, sec. 9.3 for the latter). Our methods are inspired by linear model theory and can be found in the special case of a binary logistic model in Landwehr et al. (1984) and of a cumulative logistic model in Pruscha (1994). The second topic deals with forecasting future outcomes $Y_{T+1}, Y_{T+2}, \ldots$, if the process has been observed up to time $T$. Here, some recent work on the cumulative model (Pruscha, 1995) is continued and generalized: we will arrive at forecast formulas covering the well-known recursive equations of Box and Jenkins (1976) as well as the l-step transition laws for finite Markov chains (sec. 6).

In the following we assume that the response variable is m-dimensional,
$$Y_t = (Y_{t1}, \ldots, Y_{tm})^T,$$
and that the covariates $Z_t$ form an r-dimensional vector process. Formally, the continuous case, where the $Y_{tj}$ are metrically scaled, is also covered. Most emphasis, however, lies on the discrete case, where $Y_t$ is e.g. $M_m(1, \pi_t)$-distributed, i.e. multinomially distributed with parameters 1 and $\pi_t$,
$$\pi_t = (\pi_{t1}, \ldots, \pi_{tm})^T, \qquad \pi_{tj} > 0, \qquad \sum_{j=1}^m \pi_{tj} < 1.$$
Within this case special attention is given to an ordinally scaled $Y_t$, where it is useful to introduce cumulative probabilities
$$\pi_{t(j)} = \pi_{t1} + \ldots + \pi_{tj}.$$
Throughout we will use subscripts in parentheses to indicate an increasing order.

2 MODELLING

The collection of variables observed earlier than $Y_t$ is denoted by
$$H_t = (Z_1, Y_1, \ldots, Z_{t-1}, Y_{t-1}, Z_t).$$
The m-dimensional conditional expectation vector $\mu_t = E(Y_t \mid H_t)$ is modelled in the form
$$\mu_t = h(\eta_t), \qquad t = 1, 2, \ldots, \qquad (1)$$
where $h: \mathbb{R}^m \to \mathbb{R}^m$ is an appropriate response function and $\eta_t = (\eta_{t1}, \ldots, \eta_{tm})^T$ the linear regression term of the model. This term is written as
$$\eta_t = X_t \theta, \qquad (2)$$
where $\theta \in \mathbb{R}^p$ comprises the unknown parameters and the entries of the $m \times p$ matrix $X_t$ are functions of $H_t$. Typically, the dependence on the last response $Y_{t-1}$ [or on the last responses $Y_{t-1}, \ldots, Y_{t-k}$] and on the present covariates $Z_t$ are separated in (2), leading to
$$\eta_{tj} = \alpha_j + \gamma^T H_{t-1,j} + \delta^T \psi(Y_{t-1})_j + \beta^T Z_t. \qquad (3)$$
Here, for each $j = 1, \ldots, m$, $\alpha_j$ is an intercept term, $H_{t-1,j}$ is an $s \times 1$ vector whose components are functions of $H_{t-1}$, and $\psi(y)_j$ is a $q \times 1$ vector expressing the dependence on the last response $y = Y_{t-1}$ [the term $\delta^T \psi(Y_{t-1})$ can be expanded to $\delta^T \psi(Y_{t-1}, \ldots, Y_{t-k})$ or to $\sum_{i=1}^k \delta_i^T \psi(Y_{t-i})$]. For model (1)-(3), the vector $\theta \in \mathbb{R}^p$, where
$$\theta^T = (\alpha^T, \gamma^T, \delta^T, \beta^T), \qquad \alpha \in \mathbb{R}^m,\; \gamma \in \mathbb{R}^s,\; \delta \in \mathbb{R}^q,\; \beta \in \mathbb{R}^r,$$
is the unknown parameter vector of dimension $p = m + s + q + r$, and $H_0$ [$\psi(Y_0)$] must be given in advance.
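To fix ideas, here is a minimal numerical sketch (not from the paper) of how the regression term (3) might be assembled for one time step. All dimensions and parameter values are hypothetical, and $\psi$ is taken as the lagged variable dummies introduced as case (i) in sec. 3 below.

```python
import numpy as np

rng = np.random.default_rng(0)
m, s, q, r = 3, 2, 3, 4                          # dim(alpha), dim(gamma), dim(delta), dim(beta)
alpha, gamma = rng.normal(size=m), rng.normal(size=s)
delta, beta = rng.normal(size=q), rng.normal(size=r)

def eta(alpha, gamma, delta, beta, H_prev, psi_prev, z_t):
    """Regression term (3), componentwise:
    eta_tj = alpha_j + gamma^T H_{t-1,j} + delta^T psi(Y_{t-1})_j + beta^T Z_t."""
    return np.array([alpha[j] + gamma @ H_prev[j] + delta @ psi_prev[j] + beta @ z_t
                     for j in range(len(alpha))])

# lagged variable dummies (case (i) of sec. 3): psi(y)_j = y for all j, q = m
y_prev = np.array([0.0, 1.0, 0.0])               # multinomial indicator of Y_{t-1}
psi_prev = np.tile(y_prev, (m, 1))               # the same q-vector for every j
H_prev = rng.normal(size=(m, s))                 # s-vectors built from H_{t-1}
z_t = rng.normal(size=r)                         # current covariates Z_t
print(eta(alpha, gamma, delta, beta, H_prev, psi_prev, z_t))
```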

3 EXAMPLES

Let us consider examples, some of them already existing in the literature.

1. $Y_t$ multinomially distributed. Let, conditionally on $H_t$, the variable $Y_t$ be $M_m(1, p_t)$-distributed. In this case $p_t \equiv \mu_t$. We have $m+1$ alternatives, the last one being $Y_{t,m+1} = 1 - (Y_{t1} + \ldots + Y_{tm})$, occurring with probability
$$p_{t,m+1} = 1 - (p_{t1} + \ldots + p_{tm}).$$
A suitable response function is, e.g., $h_j(\eta) = \exp(\eta_j) / (1 + \sum_{k=1}^m \exp(\eta_k))$, which leads to a multivariate logistic regression model. Together with the general form (2) of the regression term, multinomial models were analysed by Kaufmann (1987). Let us mention two specifications of $\psi(y)$ occurring in (3). In the first, so-called lagged variable dummies are used, i.e. we put

(i) $q = m$, $\psi(y)_j = y$ for all $j$.

In the second, an $(m+1) \times (m+1)$ transition matrix $P(\cdot,\cdot)$ is employed. Letting $y_{m+1} = 1 - (y_1 + \ldots + y_m)$ and $w = \sum_{j=1}^{m+1} j\, y_j$, we put

(ii) $q = m+1$, $\psi(y)_{ij} = P(i, j)$ if $w = i$ [$\psi(y)_{ij} = 0$ else],

such that $\delta^T \psi(y)_j = \delta_w P(w, j)$; see Goettlein and Pruscha (1992) for an application. Assuming case (ii) and $h = \mathrm{id}$, $\alpha = \gamma = \beta = 0$, $\delta = (1, \ldots, 1)^T$, model (1)-(3) describes a simple Markov chain with known transition matrix $P(\cdot,\cdot)$. A Markov chain with unknown transition matrix can be obtained from (1)-(3), if one allows $\delta_j^T \psi(y)_j$ instead of $\delta^T \psi(y)_j$ and takes case (i) above, but with $q = m+1$ instead of $q = m$.

2. Ordinally scaled response. Let $Y_t$ be as in 1. and define the $J = \{1, 2, \ldots, m+1\}$-valued ordinal variable

$$W_t = \sum_{j=1}^{m+1} j\, Y_{tj}. \qquad (4)$$

Introducing $H_{(j)}(\eta) = h_1(\eta) + \ldots + h_j(\eta)$, we have from (1) for $j = 1, \ldots, m+1$
$$p_{t(j)} \equiv \operatorname{pr}(W_t \le j \mid H_t) = H_{(j)}(\eta_t). \qquad (5)$$
If $F$ denotes a (cumulative) distribution function, we define a cumulative regression model (McCullagh, 1980) by setting
$$H_{(j)}(\eta_t) = F(\eta_{t(j)}). \qquad (6)$$
Putting $s = 1$ and $H_{t-1,j} = p_{t-1(j)}$ in (3), the regression term $\eta_t$ can be written in the specific form
$$\eta_{t(j)} = \alpha_{(j)} + \gamma\, p_{t-1(j)} + \delta^T \psi(Y_{t-1})_{(j)} + \beta^T Z_t. \qquad (7)$$
Note that model (5)-(7) has an inherent recursive structure and a side condition on the parameters to ensure $\eta_{t(j)} - \eta_{t(j-1)} > 0$. Besides the specifications (i) and (ii) above we can here also choose

(iii) $q = 1$, $\psi(Y_{t-1})_j = W_{t-1}$ for all $j$,

i.e. we can employ the lagged ordinal variable. Model (7) can easily be extended to a higher order: instead of $\gamma\, p_{t-1}$ and $\delta^T \psi(Y_{t-1})$, sums of the form
$$\sum_{i=1}^{s'} \gamma_i\, p_{t-i} \qquad \text{and} \qquad \sum_{i=1}^{q'} \delta_i^T \psi(Y_{t-i})$$
can be used.

3. Metrically scaled responses. If the response vector $Y_t = (Y_{t1}, \ldots, Y_{tm})^T$ consists of metrically scaled variables $Y_{tj}$, the regression term $\eta_t$ can be put as

$$\eta_t = \alpha + \sum_{i=1}^{s'} \gamma_i\, \eta_{t-i} + \sum_{i=1}^{q'} \delta_i\, Y_{t-i} + \mathbf{1}\, \beta^T Z_t, \qquad (8)$$
with $\mathbf{1} = (1, \ldots, 1)^T \in \mathbb{R}^m$ and $\gamma_i, \delta_i$ scalars or $m \times m$ matrices; see Zeger and Qaqish (1988) and Li (1994). Here we often have a function $h_0: \mathbb{R} \to \mathbb{R}$, e.g. $h_0 = \exp$, such that
$$h(\eta) = (h_0(\eta_1), \ldots, h_0(\eta_m))^T. \qquad (9)$$
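As a small numerical illustration of the cumulative model of example 2, the following sketch (not from the paper) maps cumulative regression terms to category probabilities, assuming a logistic choice for $F$ in (6); the numerical values of the terms are made up.

```python
import numpy as np

def F(x):                                        # logistic choice for F in (6)
    return 1/(1 + np.exp(-x))

def category_probs(eta_cum):
    """Category probabilities p_tj = F(eta_t(j)) - F(eta_t(j-1)), with the
    conventions F(eta_t(0)) = 0 and F(eta_t(m+1)) = 1, cf. (5)-(6)."""
    Fc = np.concatenate(([0.0], F(np.asarray(eta_cum)), [1.0]))
    return np.diff(Fc)                           # length m+1, sums to one

# made-up increasing terms eta_t(1) < ... < eta_t(m) for m = 5 (six categories)
eta_cum = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
p = category_probs(eta_cum)
print(np.round(p, 3), p.sum())
```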

4 (QUASI-) LIKELIHOOD

For the following we will assume that the evolution of the process $Z_t$, $t \ge 1$, is not influenced by the process $Y_t$, $t \ge 0$; precisely,
$$\operatorname{pr}(Z_{t+1} \in \cdot \mid H_t, Y_t) = \operatorname{pr}(Z_{t+1} \in \cdot \mid Z_1, \ldots, Z_t). \qquad (10)$$
Then a full likelihood approach is possible if (conditional) densities $f_t(y, \vartheta_t)$, $y \in \mathbb{R}^m$, of the conditional distributions $\operatorname{pr}(Y_t \in \cdot \mid H_t)$ are available, as in Ex. 1 and 2 above.

1. First, let us assume that the density $f_t$ belongs to an m-parametric exponential family, i.e.
$$f_t(y, \vartheta_t) = \exp\{y^T \vartheta_t - b(\vartheta_t)\}\, g_t(y). \qquad (11)$$
Then we obtain $\mu_t(\vartheta_t) = b'(\vartheta_t)$ and, via (1), $\vartheta_t = u(\eta_t)$ with $u = (b')^{-1} \circ h$ and $\eta_t = X_t\theta$. Based on an observation $(Z_1, Y_1, \ldots, Z_n, Y_n)$, the log-likelihood function $l_n(\theta)$, the $p \times 1$ score vector $U_n(\theta) = (d/d\theta)\, l_n(\theta)$ and the $p \times p$ Hessian matrix $W_n(\theta) = (d^2/d\theta\, d\theta^T)\, l_n(\theta)$ are given by

$$l_n(\theta) = \sum_{t=1}^n \{Y_t^T \vartheta_t - b(\vartheta_t)\}, \qquad \vartheta_t = u(\eta_t), \quad \eta_t = X_t\theta,$$
$$U_n(\theta) = \sum_{t=1}^n X_t^T D_t(\theta)\, \Sigma_t^{-1}(\theta)\, (Y_t - \mu_t(\theta)),$$
$$W_n(\theta) = R_n(\theta) - \sum_{t=1}^n X_t^T D_t(\theta)\, \Sigma_t^{-1}(\theta)\, D_t^T(\theta)\, X_t, \qquad (12)$$
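For concreteness, a minimal sketch (not from the paper) of solving $U_n(\theta) = 0$ by Fisher scoring in the simplest special case $m = 1$ with a binary response and canonical logistic link, where the score in (12) reduces to $U_n(\theta) = \sum_t x_t (Y_t - \mu_t(\theta))$. The simulated model $\eta_t = \alpha + \delta Y_{t-1} + \beta Z_t$ and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta_true = 500, np.array([-0.3, 0.8, 0.5])  # (alpha, delta, beta)

# simulate eta_t = alpha + delta*Y_{t-1} + beta*Z_t with binary Y_t (Y_0 = 0)
z, y = rng.normal(size=n), np.zeros(n)
for t in range(1, n):
    eta = theta_true @ np.array([1.0, y[t-1], z[t]])
    y[t] = rng.random() < 1/(1 + np.exp(-eta))

X = np.column_stack([np.ones(n-1), y[:-1], z[1:]])       # rows x_t = (1, Y_{t-1}, Z_t)
Y = y[1:]

theta = np.zeros(3)
for _ in range(25):                              # Fisher scoring on U_n(theta) = 0
    mu = 1/(1 + np.exp(-X @ theta))              # mu_t = h(eta_t)
    U = X.T @ (Y - mu)                           # score, the m = 1 case of (12)
    W = X.T @ (X * (mu*(1 - mu))[:, None])       # expected negative Hessian
    step = np.linalg.solve(W, U)
    theta += step
    if np.max(np.abs(step)) < 1e-10:
        break
print(theta)                                     # close to theta_true for large n
```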

where we have set $D_t(\theta) = (d/d\eta)\, h^T(\eta_t)$, $\Sigma_t(\theta) = (d^2/d\vartheta\, d\vartheta^T)\, b(\vartheta_t)$, $\vartheta_t = u(\eta_t)$, $R_n(\theta)$ as in Fahrmeir and Tutz (1994, App. A1), and where we have neglected additive terms of $l_n(\theta)$ not depending on $\theta$.

In the case of an $M_m(1, p_t)$-distribution, where $p_t \equiv \mu_t$, Kaufmann (1987) gave conditions under which there exists a consistent m.l. estimate $\hat\theta_n$ for $\theta$, which is asymptotically normally distributed in the sense that the distributional convergence
$$\Gamma_n^{-1}(\hat\theta_n - \theta) \longrightarrow N_p(0, V(\theta)) \qquad (13)$$
takes place for $n \to \infty$, where
$$V(\theta) = \operatorname{pr}_\theta\text{-}\lim\; \Gamma_n^T\, (-W_n(\hat\theta_n))\, \Gamma_n,$$
if the latter limit exists for a sequence $\Gamma_n = \Gamma_n(\theta)$, $n \ge 1$, of invertible norming matrices tending towards 0.

2. Secondly, let us assume an ordinally scaled response variable $W_t$ as in Ex. 3.2. Then $p_{tj}(\theta) = \operatorname{pr}_\theta(W_t = j \mid H_t)$ and we can write

$$l_n(\theta) = \sum_{t=1}^n \log p_{t W_t}(\theta), \qquad U_n(\theta) = \sum_{t=1}^n u_{t W_t}(\theta), \qquad u_{tj}(\theta) = (d/d\theta)\, p_{tj}(\theta)\, /\, p_{tj}(\theta),$$
and $W_n(\theta)$ similarly. Assuming the model (5)-(7), we can make use of recurrence relations for $p_t(\theta)$ and its derivatives (instead of exploiting the exponential family structure of the $M_m(1, p_t)$-distribution). Using a distance-diminishing theory for certain iterative function systems (Norman, 1972), (13) can be proved under the assumptions that $|\gamma|\, \sup_\eta F'(\eta) < 1$ and that $Z_t$, $t \ge 1$, forms a Markov process (of some order) with compact state space and with Lipschitz-bounded transition kernels (Pruscha, 1993).

3. Now we assume that, contrary to 1. and 2., a (conditional) density for $\operatorname{pr}(Y_t \in \cdot \mid H_t)$ cannot be given, but that (conditional) first and second moments
$$\mu_t(\theta) \equiv E_\theta(Y_t \mid H_t) = h(\eta_t), \qquad \operatorname{cov}_\theta(Y_t \mid H_t) = \Sigma_t(\eta_t), \qquad \eta_t = X_t\theta,$$
can be specified. One can still use $U_n(\theta) = 0$ as estimating equation, with $U_n(\theta)$ as in (12). If the response variables $Y_t$, $t = 1, 2, \ldots$, are m-dimensional and independent, we are in the case of longitudinal data, and the asymptotic covariance in (13) is of the form $V^{-1}(\theta)\, S(\theta)\, V^{-1}(\theta)$, where
$$S(\theta) = \operatorname{pr}_\theta\text{-}\lim\; \Gamma_n^T \Bigl\{ \sum_{t=1}^n X_t^T D_t(\hat\theta_n)\, \Sigma_t^{-1}(\hat\theta_n)\, \operatorname{cov}(Y_t)\, \Sigma_t^{-1}(\hat\theta_n)\, D_t^T(\hat\theta_n)\, X_t \Bigr\}\, \Gamma_n;$$
see Liang and Zeger (1986).

5 RESIDUAL ANALYSIS

5.1 Linear model residuals

The following derivation of global and partial residuals is inspired by linear model theory. In a linear regression model of the form
$$Y_t = \mu_t + e_t, \qquad \mu_t = X_t\theta, \qquad t = 1, 2, \ldots, \qquad (14)$$
global residuals are defined by
$$\hat e_t = Y_t - X_t\hat\theta,$$
$\hat\theta$ the least squares estimator of $\theta$. Partial residuals from regression on $X_1$, where
$$X_t = (X_{t1}, X_{t2}) \qquad \text{and} \qquad \theta^T = (\theta_1^T, \theta_2^T)$$
are partitions, are given by
$$\hat e_t^{(par)} = Y_t - X_{t1}\hat\theta_1 = \hat e_t + X_{t2}\hat\theta_2. \qquad (15)$$
Note that $\hat e_t^{(par)}$ can be gained from the "true partial residual" $e_t^{(par)} \equiv \tau_t(Y_t) = Y_t - X_{t1}\theta_1$ by plugging in the estimator $\hat\theta_1$ for $\theta_1$. Let the $e_t$'s now be $N(0, \sigma^2)$-distributed, i.e. let $\operatorname{pr}(Y_t \le y) = \Phi_{\mu_t,\sigma^2}(y)$, with $\Phi_{\mu,\sigma^2}$ being the $N(\mu, \sigma^2)$ distribution function. Then $e_t^{(par)} \equiv \tau_t(Y_t)$ can be gained from
$$\operatorname{pr}(\tau_t(Y_t) \le y) = \Phi_{\nu_t,\sigma^2}(y), \qquad \nu_t = \mu_t - X_{t1}\theta_1 = X_{t2}\theta_2. \qquad (16)$$
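A compact numerical sketch of (14)-(15) with hypothetical data (least squares via numpy): plotting $\hat e_t^{(par)}$ against $X_{t2}\hat\theta_2$ should show a slope near 1 when the regressor set $X_2$ is relevant.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # regressors kept in
X2 = rng.normal(size=(n, 1))                             # regressor set under study
Y = X1 @ np.array([1.0, -0.5]) + X2 @ np.array([2.0]) + rng.normal(0, 0.7, n)

X = np.column_stack([X1, X2])
theta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]         # least squares estimate
e_hat = Y - X @ theta_hat                                # global residuals
e_par = e_hat + X2 @ theta_hat[2:]                       # partial residuals (15)

# slope of e_par on X2*theta2^ is near 1 when X2 is a relevant regressor
print(np.polyfit(X2 @ theta_hat[2:], e_par, 1)[0])
```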

In more general models like ours, two different (partial) residual methods can be established (falling together in the normal linear case above): a) residual measures for diagnostic purposes, which will be defined in analogy with (15); b) residual variables, which have values on the same scale as the $Y_t$-data and which can be submitted to further time series analysis. They will be gained in analogy with (16).

5.2 Partial residual measures

On the basis of the general model (1), (2) we build the global GLM residuals (cf. Fahrmeir and Tutz, 1994, p. 98)
$$\hat e_t = D_t^{-T}(\hat\theta)\, (Y_t - \mu_t(\hat\theta)), \qquad (17)$$
with $D_t$ as in 4.1. Using (17) we define partial residuals from regression on $X_1$ as in (15) by
$$\hat e_t^{(par)} = \hat e_t + X_{t2}\hat\theta_2. \qquad (18)$$

Often it is desirable to summarize the m components $\hat e_{tj}^{(par)}$ of (18) into a one-dimensional measure. In the following two examples we will weight the components $\hat e_{tj}$ of (17) by the diagonal elements $d_{tj}(\hat\theta) \equiv \hat d_{tj}$ of $D_t(\hat\theta)$, i.e. we will build
$$\tilde e_t = \sum_{j=1}^m \hat d_{tj}\, \hat e_{tj} \,\Big/\, \sum_{j=1}^m \hat d_{tj}. \qquad (19)$$
Then, defining $\tilde X_{t2}$ in analogy,
$$\tilde e_t^{(par)} = \tilde e_t + \tilde X_{t2}\hat\theta_2 \qquad (20)$$
is a one-dimensional partial residual measure.

Ex. 1. If we have a response function $h$ of the form (9), then $\hat d_{tj} = h_0'(\hat\eta_{tj})$, and we obtain from (17) and (19)
$$\tilde e_t = \sum_{j=1}^m (Y_{tj} - \mu_{tj}(\hat\theta)) \,\Big/\, \sum_{j=1}^m \hat d_{tj},$$
which is a kind of average of the residual components.

Ex. 2. In the case of an ordinal response with response function $h$ of the form (6), i.e. $h_j(\eta) = F(\eta_{(j)}) - F(\eta_{(j-1)})$, we have $\hat d_{tj} = F'(\hat\eta_{t(j)})$, and $D_t^{-T}(\theta)$ is a lower triangular matrix with j-th row
$$(1/F'(\eta_{t(j)}), \ldots, 1/F'(\eta_{t(j)}), 0, \ldots, 0).$$
Hence the j-th component of (17) turns out to be
$$\hat e_{tj} = (1/\hat d_{tj}) \sum_{k=1}^j (Y_{tk} - p_{tk}(\hat\theta)),$$
and (19) takes the form
$$\tilde e_t = -\Bigl(1 \Big/ \sum_j \hat d_{tj}\Bigr)\, (W_t - m_t(\hat\theta)), \qquad (21)$$
with the $\{1, \ldots, m+1\}$-valued ordinal variable $W_t = \sum_{j=1}^{m+1} j\, Y_{tj}$ as in (4) and with the mean category $m_t = \sum_{j=1}^{m+1} j\, p_{tj}$ (see Pruscha, 1994, sec. 3.1). Note that $W_t - m_t$ is a really ordinal residual.

Partial residual measures like (18), (20) are usually plotted over the regression term $X_{t2}\hat\theta_2$ or $\tilde X_{t2}\hat\theta_2$ to assess the significance of the regressor set $X_2$.
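A sketch (not from the paper) of evaluating the ordinal residual measure (21) for the cumulative logistic case, where $F' = F(1-F)$; the observed category and the fitted cumulative terms are made-up inputs.

```python
import numpy as np

def F(x):
    return 1/(1 + np.exp(-x))

def residual_measure(W_t, eta_cum_hat):
    """One-dimensional measure (21) for the cumulative logistic model:
    e~_t = -(W_t - m_t(theta^)) / sum_j F'(eta^_t(j))."""
    Fc = np.concatenate(([0.0], F(eta_cum_hat), [1.0]))
    p = np.diff(Fc)                              # fitted p_t1, ..., p_t,m+1
    m_t = np.sum(np.arange(1, len(p) + 1) * p)   # fitted mean category
    dF = F(eta_cum_hat) * (1 - F(eta_cum_hat))   # F' = F(1-F) for the logistic
    return -(W_t - m_t) / np.sum(dF)

# made-up inputs: observed category 4 of 5, fitted cumulative terms eta^_t(j)
print(residual_measure(W_t=4, eta_cum_hat=np.array([-1.5, -0.5, 0.5, 1.5])))
```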

5.3 Partial residual variables

We restrict ourselves to univariate response variables with (up to the parameter $\theta$) known distribution function. Putting
$$\eta_t = X_t\theta, \qquad \nu_t = \eta_t - X_{t1}\theta_1 = X_{t2}\theta_2,$$
we will call $\hat Y_t = \tau_t(Y_t, \hat\theta)$ a partial residual variable (from regression on $X_1$), if we have for $Y_t^* = \tau_t(Y_t, \theta)$, in analogy with (16),
$$\operatorname{pr}_t(Y_t^* \le y;\, \eta_t) = \operatorname{pr}_t(Y_t \le y;\, \nu_t) \qquad \text{for all } y \in \mathbb{R}, \qquad (22)$$
where $\operatorname{pr}_t(\cdot) = \operatorname{pr}(\cdot \mid H_t)$ and $\operatorname{pr}_t(\cdot\,;\, \lambda)$ means that $\operatorname{pr}_t(\cdot)$ is evaluated under the regression term $\lambda$, $\lambda = \eta_t$ or $\lambda = \nu_t$. Further we require
$$\operatorname{pr}_t(Y_t^* = Y_t;\, \eta_t) = 1 \qquad \text{if } \eta_t = \nu_t. \qquad (23)$$
We will use the notation $F_t(y, \lambda) = \operatorname{pr}_t(Y_t \le y;\, \lambda)$ and will distinguish the cases where $F_t$ is continuous in $y \in \mathbb{R}$ or not.

a) Let $F_t(y, \lambda)$, $y \in \mathbb{R}$, be continuous for each $\lambda$. Put $F^{-1}(x, \lambda) = \inf\{y : F(y, \lambda) \ge x\}$ and define $Y_t^* = \tau_t(Y_t, \theta)$ via

$$\tau_t(y, \theta) = F_t^{-1}(F_t(y, \eta_t),\, \nu_t).$$
Then, since $F_t(Y_t, \eta_t)$ is $U[0,1]$-distributed under $\operatorname{pr}_t(\cdot\,;\, \eta_t)$, one gets
$$\operatorname{pr}_t(Y_t^* \le y;\, \eta_t) = \operatorname{pr}_t(F_t(Y_t, \eta_t) \le F_t(y, \nu_t);\, \eta_t) = F_t(y, \nu_t),$$
that is, (22). Further, relation (23) is fulfilled, since $F_t^{-1}(F_t(Y_t, \eta_t),\, \eta_t) < Y_t$ occurs with $\operatorname{pr}_t(\cdot\,;\, \eta_t)$-probability zero. In the example $F(y, \lambda) = \Phi_{\lambda,\sigma^2}(y)$, we have
$$\tau_t(y, \theta) = \Phi^{-1}_{\nu_t,\sigma^2}(\Phi_{\eta_t,\sigma^2}(y)) = y - X_{t1}\theta_1,$$
such that $\tau_t(y, \hat\theta)$ is as in (15) above.

b) Let $Y_t \in J = \{1, 2, \ldots, m+1\}$ be ordinally scaled (formerly denoted by $W_t$; for a purely categorical response, partial residual variables don't seem to be meaningful). Then we have $F_t(j, \lambda) = \operatorname{pr}_t(Y_t \le j;\, \lambda)$, $j \in J$, and (22) cannot be satisfied with a function $\tau_t: J \to J$. Instead, we will define transition probabilities $\pi_t(k \mid j)$, $k \in J$, $\sum_{k \in J} \pi_t(k \mid j) = 1$, such that
$$Y_t^* = k \text{ is selected with probability } \pi_t(k \mid j) \text{ if } Y_t = j \text{ is the observed category.} \qquad (24)$$
Let $\operatorname{pr}_t^*$ denote the (conditional) probability law governing the observed process as well as the random experiment (24). Then, making use of the model equation $\operatorname{pr}_t(Y_t = j) = h_j(\eta_t)$, we have
$$\operatorname{pr}_t^*(Y_t = j,\, Y_t^* = k) = h_j(\eta_t)\, \pi_t(k \mid j).$$

Putting $\lambda_t(j, k) = h_j(\eta_t)\, \pi_t(k \mid j)$, we have to define $\pi_t$ in such a way that
$$\sum_{k \in J} \lambda_t(j, k) = h_j(\eta_t), \qquad \sum_{j \in J} \lambda_t(j, k) = h_k(\nu_t), \qquad (25)$$
for the second equation see (22), and that, with respect to (23),
$$\sum_{j \in J} \lambda_t(j, j) = 1 \qquad \text{if } \eta_t = \nu_t. \qquad (26)$$

To this end, let $H_{(j)}(\eta) = h_1(\eta) + \ldots + h_j(\eta)$ as in 3.2, $H_{(0)} = 0$, and introduce for $j, k \in J$ the intervals
$$I_t(j, k) = (H_{(k-1)}(\nu_t),\, H_{(k)}(\nu_t)] \cap (H_{(j-1)}(\eta_t),\, H_{(j)}(\eta_t)].$$
Then we define $\lambda_t(j, k)$ as the length of $I_t(j, k)$, i.e.
$$\lambda_t(j, k) = |I_t(j, k)|. \qquad (27)$$
Since $\sum_k |I_t(j, k)| = H_{(j)}(\eta_t) - H_{(j-1)}(\eta_t)$ and $\sum_j |I_t(j, k)| = H_{(k)}(\nu_t) - H_{(k-1)}(\nu_t)$, equations (25) are fulfilled. Since $|I_t(j, j)| = H_{(j)}(\eta_t) - H_{(j-1)}(\eta_t)$ if $\eta_t = \nu_t$, (26) is also satisfied.

Partial residual variables $Y_t^*$ are constructed with the intention to remove the influence of the regressor set $X_1$ on the response variable. A typical application is the removal of a trend in a time series.
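The construction (24), (27) lends itself to a direct implementation; the following sketch assumes a logistic $F$ in (6) and uses made-up full and reduced regression terms.

```python
import numpy as np

def F(x):
    return 1/(1 + np.exp(-x))

def cum_probs(eta_cum):
    """H_(0), ..., H_(m+1) of sec. 3.2, here for a logistic F."""
    return np.concatenate(([0.0], F(np.asarray(eta_cum)), [1.0]))

def draw_partial_residual(j_obs, eta_cum, nu_cum, rng):
    """Draw Y*_t by (24) with pi_t(k|j) = |I_t(j,k)| / h_j(eta_t), the
    interval-overlap construction (27)."""
    He, Hn = cum_probs(eta_cum), cum_probs(nu_cum)
    lo, hi = He[j_obs - 1], He[j_obs]            # j-th interval under eta_t
    overlap = np.maximum(0.0, np.minimum(Hn[1:], hi) - np.maximum(Hn[:-1], lo))
    pi = overlap / overlap.sum()                 # transition probabilities
    return rng.choice(np.arange(1, len(pi) + 1), p=pi)

rng = np.random.default_rng(3)
eta = np.array([-0.5, 0.5, 1.5])                 # full term (with trend), m = 3
nu = np.array([-1.0, 0.0, 1.0])                  # reduced term (trend removed)
print(draw_partial_residual(2, eta, nu, rng))
```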

6 FORECASTING METHODS

6.1 General method

Based on the model (1), (2), i.e. $\mu_t \equiv E(Y_t \mid H_t) = h(\eta_t)$, $\eta_t = X_t\theta$, and given an observation up to time $T$, i.e.
$$F_T = (Z_1, Y_1, \ldots, Z_T, Y_T) = (H_T, Y_T),$$
we define the l-step predictor for $\mu_{T+l}$ by
$$\hat\mu_T(l) = E(\mu_{T+l} \mid F_T), \qquad l \ge 1. \qquad (28)$$
In the following we will write
$$E_T(\cdot) = E(\cdot \mid F_T), \qquad \operatorname{var}_T(\xi) = E_T(\xi - E_T(\xi))^2,$$
and we will use a similar definition for the conditional covariance matrix $\operatorname{cov}_T(\cdot)$. Due to $F_T \subset H_{T+l}$ we also have
$$\hat\mu_T(l) = E_T(Y_{T+l}).$$
We will compute $\hat\mu_T(l)$ by the approximation $\tilde\mu_T(l) + B_T(l)$, where $\tilde\mu_T(l)$ is gained by interchanging conditional expectation and response function $h$, i.e. by
$$\tilde\mu_T(l) = h(\hat\eta_T(l)), \qquad \hat\eta_T(l) = E_T(\eta_{T+l}), \qquad (29)$$
and $B_T(l)$ is a correction term, to be developed below. The l-step predictor $\hat\eta_T(l)$ is, contrary to $\hat\mu_T(l)$, computable for many models, especially for models with a recursive structure. For the derivation of such computation formulas we will distinguish between a continuous and a discrete response. In any case we have to assume that l-step predictors
$$\hat Z_T(l) = E_T(Z_{T+l})$$
are available for the covariate process $Z_t$, $t \ge 1$. This is the case, e.g., if $Z_t$ forms an r-dimensional autoregressive process of some fixed order; see Brockwell and Davis (1987, sec. 11.4).
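For instance, if $Z_t$ follows a (vector) AR(1) law $Z_{t+1} - \mu = A(Z_t - \mu) + \varepsilon_{t+1}$, the l-step predictor has the closed form sketched below; the matrix $A$, the mean $\mu$ and the observed $Z_T$ are made-up values.

```python
import numpy as np

def ar1_predict(z_T, A, mu, l):
    """l-step predictor for an AR(1) covariate process
    Z_{t+1} - mu = A (Z_t - mu) + eps_{t+1}:  Z^_T(l) = mu + A^l (Z_T - mu)."""
    return mu + np.linalg.matrix_power(A, l) @ (z_T - mu)

A = np.array([[0.6, 0.1], [0.0, 0.4]])           # made-up AR(1) coefficient matrix
mu = np.array([1.0, -0.5])                       # made-up process mean
print(ar1_predict(np.array([2.0, 0.0]), A, mu, l=3))
```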

6.2 Recursive forecast formulas

a) Continuous response. For the m-variate model (8), with $s' = q'$, one starts with
$$\hat\eta_T(1) = \alpha + \sum_{i=1}^{q'} \gamma_i\, \eta_{T+1-i} + \sum_{i=1}^{q'} \delta_i\, Y_{T+1-i} + \mathbf{1}\, \beta^T \hat Z_T(1) \qquad (30)$$
and has, with $\hat Y_T(1) = h(\hat\eta_T(1)) + B_T(1)$, the 2-step predictor
$$\hat\eta_T(2) = \alpha + \sum_{i=2}^{q'} \gamma_i\, \eta_{T+2-i} + \sum_{i=2}^{q'} \delta_i\, Y_{T+2-i} + (\gamma_1 + \delta_1)\, \hat Y_T(1) + \mathbf{1}\, \beta^T \hat Z_T(2),$$
and so on, until for $l > q'$
$$\hat\eta_T(l) = \alpha + \sum_{i=1}^{q'} (\gamma_i + \delta_i)\, \hat Y_T(l-i) + \mathbf{1}\, \beta^T \hat Z_T(l). \qquad (31)$$

The classical Box and Jenkins (1976, p. 129f) forecast formulas are contained as a special case. Indeed, one has to write $(\gamma + \delta)\mu + \delta(Y - \mu)$ instead of $\gamma\mu + \delta Y$, and has to interpret $\mu_t$, $Y_t - \mu_t$ as their $z_t$ and $a_t$, respectively; see also Li (1994, p. 507).

b) Discrete response. We will use the extended version of model (7), i.e.

$$\eta_t = \alpha + \sum_{i=1}^{q'} \gamma_i\, p_{t-i} + \sum_{i=1}^{q'} \delta_i^T \psi(Y_{t-i}) + \mathbf{1}\, \beta^T Z_t. \qquad (32)$$

While $\hat\eta_T(1)$ is similar to (30), we have, with the probability vector $\hat p_T(1) = h(\hat\eta_T(1)) + B_T(1)$, the 2-step predictor
$$\hat\eta_T(2) = \alpha + \sum_{i=2}^{q'} \gamma_i\, p_{T+2-i} + \sum_{i=2}^{q'} \delta_i^T \psi(Y_{T+2-i}) + \gamma_1\, \hat p_T(1) + \delta_1^T \hat\psi_T(1) + \mathbf{1}\, \beta^T \hat Z_T(2),$$
where $\hat\psi_T(1) = \sum_{j=1}^{m+1} \hat p_{Tj}(1)\, \psi(e_j)$, $e_j$ the j-th unit vector, $e_{m+1} = 0 \in \mathbb{R}^m$. Finally, for $l > q'$ and with $\hat\psi_T(k) = \sum_{j=1}^{m+1} \hat p_{Tj}(k)\, \psi(e_j)$,

$$\hat\eta_T(l) = \alpha + \sum_{i=1}^{q'} \gamma_i\, \hat p_T(l-i) + \sum_{i=1}^{q'} \delta_i^T \hat\psi_T(l-i) + \mathbf{1}\, \beta^T \hat Z_T(l). \qquad (33)$$
In the special case of a simple Markov chain as in 3.1 we obtain $\hat p_{Tj}(l) = P^l(i, j)$ if $W_T \equiv \sum_{k=1}^{m+1} k\, Y_{Tk} = i$, with $P^l$ the l-th power of the known or estimated transition matrix.
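A sketch (not from the paper) of the recursion (33) for the cumulative logistic model with $q' = 1$ and $\psi$ of type (iii), omitting the correction term $B_T(l)$ of sec. 6.3 for brevity; all parameter values and the covariate predictors $\hat Z_T(l)$ are made up.

```python
import numpy as np

def F(x):                                        # logistic F in (6)
    return 1/(1 + np.exp(-x))

def step_probs(eta_cum):
    """Category probabilities from cumulative terms, cf. (5)-(6)."""
    return np.diff(np.concatenate(([0.0], F(eta_cum), [1.0])))

def forecast(alpha, gamma, delta, beta, p_T, W_T, Zhat, L):
    """p^_T(l), l = 1..L, by the recursion (33) with q' = 1 and psi of
    type (iii); the correction term B_T(l) of sec. 6.3 is omitted."""
    m = len(alpha)
    p, w = p_T.copy(), float(W_T)                # observed at time T
    out = []
    for l in range(1, L + 1):
        eta_cum = alpha + gamma*np.cumsum(p)[:m] + delta*w + beta @ Zhat[l-1]
        p = step_probs(eta_cum)                  # p^_T(l)
        w = np.sum(np.arange(1, m + 2) * p)      # psi^_T(l): predicted mean category
        out.append(p)
    return out

alpha = np.array([-1.0, 0.0, 1.0])               # m = 3, four categories
p_T = np.array([0.1, 0.3, 0.4, 0.2])             # category distribution at time T
Zhat = [np.array([0.5, -0.2])] * 8               # Z^_T(l), e.g. from an AR(1) fit
for l, p in enumerate(forecast(alpha, 0.3, 0.2, np.array([0.4, -0.1]), p_T, 3, Zhat, 3), 1):
    print(l, np.round(p, 3))
```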

6.3 Correction term

To give an estimate $\hat B_T(l)$ of the bias
$$B_T(l) = \hat\mu_T(l) - \tilde\mu_T(l)$$
produced by the approximation $E_T(h(\eta_{T+l})) \approx h(E_T(\eta_{T+l}))$, we start with expanding $h(\eta_{T+l})$ and $h(\hat\eta_T(l))$ at $\hat\eta_T(l-1)$, $\hat\eta_T(0) = \eta_T$, up to the order 2. To do this, we write $h$ instead of $h_j$ for some fixed $j$, assume twice continuous differentiability of $h$ and introduce the abbreviations
$$\hat\eta = \hat\eta_T(l-1), \qquad x_T(l) = \eta_{T+l} - \hat\eta, \qquad \hat x_T(l) = \hat\eta_T(l) - \hat\eta.$$
Then, with remainder terms $R_T(l)$ etc.,
$$h(\eta_{T+l}) = h(\hat\eta) + x_T^T(l)\, h'(\hat\eta) + \tfrac12\, x_T^T(l)\, h''(\hat\eta)\, x_T(l) + R_T(l), \qquad (34)$$
$$h(\hat\eta_T(l)) = h(\hat\eta) + \hat x_T^T(l)\, h'(\hat\eta) + \tfrac12\, \hat x_T^T(l)\, h''(\hat\eta)\, \hat x_T(l) + \hat R_T(l). \qquad (35)$$
Applying conditional expectation to (34) we obtain
$$E_T(h(\eta_{T+l})) = h(\hat\eta) + \hat x_T^T(l)\, h'(\hat\eta) + \tfrac12\, E_T\{x_T^T(l)\, h''(\hat\eta)\, x_T(l)\} + \tilde R_T(l). \qquad (36)$$
Subtracting (35) from (36), neglecting remainder terms and introducing the omitted subscript $j$ again, we arrive at
$$\hat B_{Tj}(l) = \tfrac12\, E_T\{\eta_{T+l}^T\, h_j''(\hat\eta)\, \eta_{T+l}\} - \tfrac12\, \hat\eta_T^T(l)\, h_j''(\hat\eta)\, \hat\eta_T(l) = \tfrac12\, E_T\{(\eta_{T+l} - \hat\eta_T(l))^T\, h_j''(\hat\eta)\, (\eta_{T+l} - \hat\eta_T(l))\}.$$
Denoting the eigenvalues of the $m \times m$ matrix $h_j''(\hat\eta)$ by $\lambda_{(jk)}$, $k = 1, \ldots, m$, and writing
$$h_j''(\hat\eta) = A_j^T(\hat\eta)\, \operatorname{Diag}(\lambda_{(jk)})\, A_j(\hat\eta),$$
we finally obtain for $j = 1, \ldots, m$
$$\hat B_{Tj}(l) = \tfrac12 \sum_{k=1}^m \lambda_{(jk)}\, A_{(jk)}^T(\hat\eta)\, \operatorname{cov}_T(\eta_{T+l})\, A_{(jk)}(\hat\eta), \qquad \hat\eta = \hat\eta_T(l-1), \qquad (37)$$
where $A_{(jk)}^T$ stands for the k-th row of the $m \times m$ matrix $A_j$ and $\operatorname{cov}_T(\cdot)$ for the conditional covariance matrix. For categorical responses, we have to define $\hat B_{T,m+1} = -\sum_{j=1}^m \hat B_{Tj}$ and (possibly) have to recalculate the $\hat B_{Tj}(l)$, $j = 1, \ldots, m+1$, in such a way that the $\tilde p_{Tj}(l) + \hat B_{Tj}(l)$, $j = 1, \ldots, m+1$, form a probability vector.

To use the correction term (37), estimates for $\operatorname{cov}_T(\eta_{T+l})$ must be available. Noting that $\operatorname{cov}_T(\eta_{T+l}) = \operatorname{cov}_T(\eta_{T+l} - \eta_T)$, one is led to build empirical variances/covariances of the vectors $\eta_{t+l} - \eta_t$, $t = 1, \ldots, T-l$. Equation (37) will now be specialized in two examples.

Ex. 1. For a response function $h$ of the form (9) one obtains
$$\hat B_{Tj}(l) = \tfrac12\, h_0''(\hat\eta_{Tj}(l-1))\, \operatorname{var}_T(\eta_{T+l,j}).$$
Ex. 2. In the case of an ordinal response with response function $h$ as in (5), (6) we obtain for the bias $B_{T(j)}(l) = \hat p_{T(j)}(l) - \tilde p_{T(j)}(l)$ the estimate
$$\hat B_{T(j)}(l) = \tfrac12\, F''(\hat\eta_{T(j)}(l-1))\, \operatorname{var}_T(\eta_{T+l(j)}). \qquad (38)$$
Here, we are also interested in the mean category
$$m_t = \sum_{j=1}^{m+1} j\, p_{tj} = \sum_{j=0}^m (1 - p_{t(j)})$$
and its l-step prediction
$$\hat m_T(l) = \sum_{j=0}^m (1 - \hat p_{T(j)}(l)).$$
For the bias $B_{Tm}(l) = \hat m_T(l) - \tilde m_T(l)$ we obtain from (38) the estimate
$$\hat B_{Tm}(l) = -\tfrac12 \sum_{j=1}^m F''(\hat\eta_{T(j)}(l-1))\, \operatorname{var}_T(\eta_{T+l(j)}). \qquad (39)$$
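For a logistic $F$ one has $F'' = F(1-F)(1-2F)$, so (38) and (39) can be evaluated directly. In the sketch below (not from the paper), the variances $\operatorname{var}_T(\eta_{T+l(j)})$ stand in for the empirical estimates described above and are made-up numbers.

```python
import numpy as np

def F(x):
    return 1/(1 + np.exp(-x))

def d2F(x):                                      # F'' for the logistic case
    p = F(x)
    return p*(1 - p)*(1 - 2*p)

def bias_corrections(eta_cum_prev, var_eta):
    """B^_T(j)(l) of (38) for the cumulative probabilities and, via (39),
    B^_Tm(l) for the mean category."""
    B_cum = 0.5 * d2F(eta_cum_prev) * var_eta    # one entry per j = 1..m
    return B_cum, -np.sum(B_cum)

# eta^_T(j)(l-1) and empirical var_T(eta_{T+l(j)}) are made-up inputs here
B_cum, B_mean = bias_corrections(np.array([-1.0, 0.0, 1.0]),
                                 np.array([0.2, 0.2, 0.3]))
print(B_cum, B_mean)
```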

6.4 Monte Carlo solution

Let us now assume that we are in the (most informative) situation where the conditional probability laws $\operatorname{pr}(Y_t \in \cdot \mid H_t)$ and $\operatorname{pr}(Z_{t+1} \in \cdot \mid Z_1, \ldots, Z_t)$ are explicitly given (see condition (10) above) and where estimates of all model parameters are available. Then one can gain the following computer-intensive solution $\mu_T^*(l)$ for the l-step predictor $\hat\mu_T(l) = E_T(\mu_{T+l})$. Given the observation $(Z_1, Y_1, \ldots, Z_T, Y_T)$, the succeeding outcomes are simulated $M$ times by the Monte Carlo method. From
$$Z_{T+1}^{(i)}, Y_{T+1}^{(i)}, \ldots, Z_{T+l}^{(i)}, Y_{T+l}^{(i)}, \qquad i = 1, \ldots, M,$$
one builds $\eta_{T+l}^{(i)}$, $i = 1, \ldots, M$, and the average
$$\mu_T^*(l) = \frac1M \sum_{i=1}^M \mu_{T+l}^{(i)}, \qquad \mu_{T+l}^{(i)} = h(\eta_{T+l}^{(i)}), \qquad (40)$$
as the Monte Carlo solution for $\hat\mu_T(l)$. The mean squared error
$$V_T^*(l) = E_T(\Delta_T^*(l)\, \Delta_T^{*T}(l)) = \operatorname{cov}_T(\mu_{T+l}), \qquad \Delta_T^*(l) = \mu_{T+l} - \hat\mu_T(l),$$
can be estimated from (40) by
$$\hat V_T^*(l) = \frac1{M-1} \sum_{i=1}^M \Delta_T^{(i)}(l)\, \Delta_T^{(i)T}(l), \qquad \Delta_T^{(i)}(l) = \mu_{T+l}^{(i)} - \mu_T^*(l).$$
Up to now there seems to exist no approach to estimate $V_T^*(l)$ and related mean squared errors outside the Monte Carlo method.
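A generic sketch (not from the paper) of the Monte Carlo predictor (40) and the estimate $\hat V_T^*(l)$; the path simulator is model-specific and is passed in as a function, here a trivial, hypothetical one.

```python
import numpy as np

def mc_forecast(simulate_mu, l, M, rng):
    """Monte Carlo solution (40): average M simulated mean vectors
    mu^(i)_{T+l} = h(eta^(i)_{T+l}); also estimate V*_T(l)."""
    mus = np.array([simulate_mu(l, rng) for _ in range(M)])
    mu_star = mus.mean(axis=0)                   # mu*_T(l)
    dev = mus - mu_star
    V_hat = dev.T @ dev / (M - 1)                # V^*_T(l)
    return mu_star, V_hat

rng = np.random.default_rng(4)

def simulate_mu(l, rng):
    """Hypothetical toy simulator: a scalar model mu = F(eta) with a noisy
    regression term; in practice this would simulate (Z, Y) paths forward."""
    eta = 0.5 + rng.normal(scale=0.3, size=1)
    return 1/(1 + np.exp(-eta))

mu_star, V_hat = mc_forecast(simulate_mu, l=3, M=600, rng=rng)
print(mu_star, V_hat)
```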

7 APPLICATIONS

7.1 Data sets

We will use two different data sets for illustrating the various methods proposed above. The first is a longitudinal data set on damages in beech, oak and pine trees, gathered by Dr. A. Goettlein, University of Bayreuth, during the last years in a forest district of the Spessart (Bavaria). The longitudinal structure of the data is determined by the observation period of 12 years (1983-1994) and by N sites (N = 80/25/14 sites with beech/oak/pine trees). The response variable $W_t$ measures the percentage of leaves/needles lost on an ordinal scale of $m+1 = 8$ categories. For each site and each year $t$, a vector $Z_t$ of $r = 20$ covariates was recorded concerning the trees (age, canopy), the site (gradient, height, exposition), the soil (moisture, pH-values) and the climate; see Goettlein and Pruscha (1992) and (1996) for detailed information. The parameters of the cumulative logistic regression model (5)-(7) were estimated by the m.l. method for each tree species separately. Concerning the $\psi$-function we made the special choice of lagged ordinal variables, see case (iii) in 3.2. Further we put $\gamma = 0$. The covariate process $Z_t$ is assumed, as far as forecasting methods are employed, to be driven by an AR(1)-equation.

The second data set concerns the aftershock series of the Friuli earthquake (May-Sept. 1976), which was placed at my disposal by Dr. H. Gebrande, University of Munich. The response variable $W_t$ gives the number of shocks at day $t$, for $t = 1, \ldots, 115$ (corresponding to the period from 19th May to 10th Sept.); the covariates $Z_t$ are the magnitude ML of the shocks (daily averages) and, in connection with trend analyses, the terms $t^{-1/2}$, $t^{-1/4}$. The cumulative logistic model (5)-(7) was applied, with the $m+1 = 6$ categories $0, \ldots, 5$ (i.e., instead of $w$ we took $\min(5, w)$ as response value), and with a preselected transition matrix $P(\cdot,\cdot)$ as in case (ii) of 3.1. The first two, middle two and last two rows of $P(\cdot,\cdot)$ were chosen as $(\frac14, \frac14, \frac14, \frac14, 0, 0)$, $(0, \frac14, \frac14, \frac14, \frac14, 0)$ and $(0, 0, \frac14, \frac14, \frac14, \frac14)$, respectively (actually, the $F^{-1}$-transformed cumulative probability vectors entered regression equation (7)).

7.2 Partial residuals

For the forest damage data we want to plot one-dimensional partial residual measures (20) over the regression term $\tilde X_{t2}\hat\theta_2$, where $\tilde e_t$ is calculated on the basis of a cumulative logistic model via formula (21). For the oak and pine trees, the partial residual plot for the regressor set X2 = topography = (height, gradient, upper/lower part of slope) shows a clear upward trend (Fig. 1) and gives evidence for the significance of these covariates in the model equation. This is different with the beech tree, where the plot gives no hint of a relevance of the topography (in agreement with related test results, see Goettlein and Pruscha, 1996). The partial residual plot for $X_{2t} = W_{t-1}$, the lagged ordinal response variable (Fig. 2), reveals the strong dependence of the damage value $W_t$ on the value $W_{t-1}$ of the last year.

The method of 5.3 b) of building ordinally scaled partial residual variables (from regression on $X_1$) is demonstrated for the Friuli earthquake data. The time-series plot of the number of shocks per day shows a decreasing tendency (see Fig. 3a). A trend function $a(t) = \alpha_1 t^{-1/2} + \alpha_2 t^{-1/4}$ was incorporated into the regression term $\eta_t$, and $\hat a(t)$ was plotted in the form $\mathrm{trend}(t) = c_0 + c_1\, \hat a(t)$, with appropriate scaling rates $c_0, c_1$. Letting, in the cumulative model with 6 categories as described above, $X_{1t} = (t^{-1/2}, t^{-1/4})$ and $X_{2t} = (p_{t-1}, \psi(Y_{t-1}), ML_t)$, we can calculate partial residual variables $\hat Y_t$ according to (24) and (27). The time-series plot of the $\hat Y_t$-values no longer reveals an obvious trend (Fig. 3b), in agreement with the test result that a trend component would no longer be a significant part of the regression term.

7.3 Forecasting

Fixing the observations of the Spessart data within the period 1983-1994 as known, we try to forecast the damage values for the years 1995-2002. That is, we put $T = 12$, and we are interested in the l-step predictors $\hat m_T(l)$, $l = 1, \ldots, 8$, of the mean damage category $m_t = \sum_{j=1}^{m+1} j\, p_{tj}$, for each of the three tree species separately. The calculations of Fig. 4 were performed for each site $i = 1, \ldots, N$, followed by an average over the N sites of the species. On the basis of the cumulative logistic regression model with regression term $\eta_{t(j)} = \alpha_{(j)} + \delta\, W_{t-1} + \beta^T Z_t$, the l-step prediction $\hat m_T(l)$ was computed by $\tilde m_T(l) + \hat B_{Tm}(l)$, with $\hat B_{Tm}(l)$ as in (39) and with
$$\tilde m_T(l) = \sum_{j=0}^m (1 - \tilde p_{T(j)}(l)), \qquad \tilde p_{T(j)}(l) = F(\hat\eta_{T(j)}(l)),$$
$$\hat\eta_{T(j)}(l) = \alpha_{(j)} + \delta \sum_{k=1}^{m+1} k\, \hat p_{Tk}(l-1) + \beta^T \hat Z_T(l).$$
The correction terms $\hat B_{Tm}(l)$ were calculated in two different ways: the term $\operatorname{var}_T(\eta_{T+l})$ in (39) was estimated by empirical variances, as indicated in 6.3, and by the Monte Carlo method of 6.4, denoted by $\hat B_T^{(A)}(l)$ and $\hat B_T^{(M)}(l)$, respectively.

To come close to the correct forecast $E(m_{T+l} \mid F_T)$, the forthcoming paths (40) were simulated M = 600 times, assuming Gaussian errors in the AR(1)-law of the covariate process. As in 6.4, averages $p_{T(j)}^*(l)$ were built, as well as $m_T^*(l) = \sum_{j=0}^m (1 - p_{T(j)}^*(l))$, together with the 95% confidence limits
$$m_T^*(l) \pm s, \qquad s = \sqrt{\hat V_{Tm}^*(l)}\; \cdot 1.960 / \sqrt{N},$$
where $\hat V_{Tm}^*(l)$ was calculated for each of the N sites as indicated in 6.4 and then averaged. The approximations $\tilde m_T(l)$, $l = 1, \ldots, 8$, run within these confidence limits $m_T^*(l) \pm s$, see Fig. 4; the corrected forecasts $\tilde m_T(l) + \hat B_{Tm}(l)$ come close to $m_T^*(l)$ and hence close to the correct forecast of $m_{T+l}$. In the case of the beech tree the correction term $\hat B_T^{(A)}$ performs badly in the period 1997-2000.

Acknowledgement: This work is partially supported by the Deutsche Forschungsgemeinschaft (SFB 386).

References

Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control, rev. ed. Holden-Day, San Francisco.

Brockwell, P.J. and Davis, R.A. (1987). Time Series: Theory and Methods. Springer, New York.

Fahrmeir, L. and Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models. Springer, New York.

Fuller, W.A. (1976). Introduction to Statistical Time Series. Wiley, New York.

Goettlein, A. and Pruscha, H. (1992). Ordinal time series models with application to forest damage data. In: Lecture Notes in Statistics, 78. Springer, New York, 113-118.

Goettlein, A. and Pruscha, H. (1996). The influence of stand characteristics, topography, site and weather on tree health in the forest district Rothenbuch. Forstw. Cbl., 115 (to appear).

Kaufmann, H. (1987). Regression models for nonstationary categorical time series: asymptotic estimation theory. Ann. Statist., 15, 79-98.

Landwehr, J.M., Pregibon, D. and Shoemaker, A.C. (1984). Graphical methods for assessing logistic regression models. J. Am. Statist. Ass., 79, 61-71.

Li, W.K. (1991). Testing model adequacy for some Markov regression models for time series. Biometrika, 78, 83-89.

Li, W.K. (1994). Time series models based on generalized linear models: some further results. Biometrics, 50, 506-511.

Liang, K.Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13-22.

McCullagh, P. (1980). Regression models for ordinal data (with discussion). J. R. Statist. Soc. B, 42, 109-142.

Norman, M.F. (1972). Markov Processes and Learning Models. Academic Press, New York.

Pruscha, H. (1993). Categorical time series with a recursive scheme and with covariates. Statistics, 24, 43-57.

Pruscha, H. (1994). Partial residuals in cumulative regression models for ordinal data. Statist. Papers, 35, 273-284.

Pruscha, H. (1995). Forecast methods in regression models for categorical time series. In: Lecture Notes in Statistics, 104. Springer, New York, 233-240.

Zeger, S.L. and Qaqish, B. (1988). Markov regression models for time series: a quasi-likelihood approach. Biometrics, 44, 1019-1031.

[Figure 1: three scatterplot panels (SPESSART 1983-94; BEECH, OAK, PINE), partial residuals plotted over BETA * TOPOGRAPHY.]

Figure 1: Partial residuals for the regressor set topography, plotted over the regression term $\beta \cdot$ topography, for each of the three tree species. A smoothing curve was fitted to the scatterplot.

[Figure 2: two scatterplot panels (SPESSART 1983-94; OAK, PINE), partial residuals for $W_{t-1}$ plotted over the lagged variable $W_{t-1}$.]

Figure 2: Partial residuals for the regressor $W_{t-1}$, the lagged (ordinally scaled) damage category, plotted over the values of $W_{t-1}$, for oak and pine trees. A smoothing curve was fitted to the scatterplot.

[Figure 3: two time-series panels (FRIULI EARTHQUAKE 1976 AFTERSHOCKS): a) NO. OF SHOCKS with trend(t), b) Residuals From Trend, each plotted over DAYS 19th May - 10th Sept.]

Figure 3: a) (top) Number $Y_t$ of shocks per day plotted over the aftershock period of 115 consecutive days, together with a trend function. b) (bottom) Ordinally scaled partial residual variable $\hat Y_t$ from trend, plotted over the 115 consecutive days.

[Figure 4: three panels (FORECASTED DAMAGES SPESSART; BEECH, OAK, PINE), forecasted mean damage category plotted over the YEARS 1995-2002, with curves M, A, A+BA, A+BM.]

Figure 4: Forecasted forest damages for the years 1995-2002, for each of the three tree species. The approximation $\tilde m_T(l)$ [A] is plotted, together with the corrections $\tilde m_T(l) + \hat B_T^{(A)}(l)$ and $\tilde m_T(l) + \hat B_T^{(M)}(l)$ [A+BA and A+BM] and with the Monte Carlo solution $m_T^*(l)$ [M]. At the beginning and the end of the beech curves, a confidence interval [M $\pm$ s] is indicated by vertical bars; in the case of the oak and pine curves these bars would overlap the whole plot area ($s \approx 0.15$ and $s \approx 0.30$, resp.) and are therefore omitted.
