THE MODEL BASED APPROACH TO SMOOTHING

by

SONIA V. T. MAZZI

Lic., Universidad Nacional de Cordoba, Argentina, 1989
M.Sc., The University of British Columbia, 1992

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
DEPARTMENT OF STATISTICS

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

January 1997

© Sonia V. T. Mazzi, 1997

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Statistics
The University of British Columbia
Vancouver, Canada


Abstract

Nonparametric regression methods are devised in order to obtain a smooth fit of a regression curve without having to specify a parametric model. This, in principle, provides a flexible approach to curve estimation. It is shown that splines are not such a flexible tool for smoothing. It is proposed instead to use the modern approach of state space modeling to solve a variety of smoothing problems. This model based approach alleviates estimation and computational problems and provides insightful solutions to smoothing problems.

Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Outline

2 The State space model
  2.1 Definition of the state space model
  2.2 Some examples of state space models
    2.2.1 The parametric regression model as a SSM
    2.2.2 Structural models
    2.2.3 ARIMA models as state space models
  2.3 Principles and tools of prediction and estimation for the SSM
    2.3.1 Minimum mean square error prediction
    2.3.2 Combining predictors and iterated prediction
    2.3.3 The prediction and Kalman update formulas
    2.3.4 The Kalman filter
    2.3.5 The smoothing filter
  2.4 The diffuse Kalman filter
    2.4.1 The diffuse likelihood
    2.4.2 Diffuse prediction
    2.4.3 Diffuse smoothing
  2.5 State space models which have applied since time immemorial
    2.5.1 Prediction and likelihood evaluation for non-stationary models
  2.6 Cross-validation and related methods of estimation
    2.6.1 Cross-validation for state space models
    2.6.2 Generalised cross-validation and state space models
    2.6.3 Generalised cross-validation and diffuse state space models
  2.7 Summary

3 Continuous state space models
  3.1 Definition of continuous state space models
  3.2 Discrete version of a continuous state space model
  3.3 Estimation using the continuous state space model
    3.3.1 The state transition matrix
    3.3.2 Variance matrix of the state error
    3.3.3 Initialization of the diffuse Kalman filter when using a continuous state space model
  3.4 Smoothing L-splines
    3.4.1 Definition of L-splines
    3.4.2 A stochastic model for L-spline smoothing
  3.5 Some continuous structural models
  3.6 Summary

4 Smoothing polynomial splines
  4.1 Definition of polynomial smoothing splines
  4.2 A basis for the space of natural splines
    4.2.1 Divided differences
    4.2.2 The Demmler-Reinsch basis
  4.3 Estimation of the smoothing parameter
  4.4 A Bayesian approach to spline smoothing
    4.4.1 Some Gaussian processes
    4.4.2 Wahba's approach
    4.4.3 A state space form of Wahba's model
  4.5 Summary

5 A closer look at polynomial splines
  5.1 The signal plus noise model
  5.2 Penalised least squares
  5.3 Smoothing splines, penalized least squares, signal extraction and state space models
  5.4 Smoothing splines and ARIMA models
  5.5 Related topics
    5.5.1 Smoothing in the presence of heteroskedasticity
    5.5.2 Repeated measurements
    5.5.3 Semiparametric regression
    5.5.4 Change point problems or intervention analysis
  5.6 Summary

Epilogue

References

A Stochastic differential equations
  A.1 Definition of the stochastic integral
  A.2 Stochastic differentials
  A.3 Stochastic differential equations, existence and uniqueness of solutions
  A.4 Linear stochastic differential equations

B Kronecker products

C Jordan canonical form of a matrix

D The matrix exponential

E Data sets

List of Tables

2.1 Different trend models
3.1 Estimated parameters of chlorine data model for two different modeling situations
3.2 Estimated parameters for the L-spline and polynomial spline of order 7 models for melanoma data

List of Figures

1.1 A good data analysis paradigm
3.1 The time immemorial assumption
3.2 Predicted signals with 95% confidence bands of chlorine data using two different models
3.3 Predicted innovation residuals with 95% confidence bands for chlorine data using two different models
3.4 Predicted signals for Cases I and II, superimposed
3.5 Predicted signal and pointwise confidence bands joined by straight lines of incidence of melanoma data using an L-spline and a polynomial spline of order 7
3.6 Predicted innovation residuals of incidence of melanoma data with error bands for the L-spline and polynomial spline of order 7 models
3.7 Predicted signal and pointwise confidence bands joined by straight lines of incidence of melanoma data using a cubic spline, and autocorrelation function of the innovation residuals

Acknowledgements

I am most grateful to my research supervisor, Dr. Piet de Jong, whose advice and words were always a source of inspiration and guidance. I thank him for his patience and care in transmitting not only the technical aspects of statistical science but also the epistemological ones. Our extended discussions were most formative and have definitely marked these first steps of my career.

I wish to thank Dr. Nancy Heckman, Dr. John Petkau, Dr. Ruben Zamar, and Dr. Harry Joe for the invaluable advice and help provided throughout my graduate studies.

Thanks to Laurie MacWilliam, Cheryl Garden, Matias Salibian, Birgitte Rønn, and Felix Menden who, as friends and/or colleagues, provided at different stages of my graduate student life disinterested support, love and care.

I would also like to thank Piet de Jong for his financial support through an NSERC grant.

Finally, I want to thank my family. Without their love and trust this would have never been accomplished.

Chapter 1

Introduction

1.1 Motivation

Suppose that pairs of data values $(y_i, x_i)$, $i = 1, \ldots, n$, are observed and one wishes to describe the relationship between the components of the pairs. This is called a smoothing problem. The reason for the name is that one would like to describe the relationship between the variables without taking into account too much of the inherent noise which is present in almost all data sets. For example, if the $x$'s are the times (or dates) at which the $y$'s were recorded, the set of pairs $(y_i, x_i)$, $i = 1, \ldots, n$, is denoted a time series and the interest lies in describing how $y$ evolves in time. A great deal of effort has been put in by researchers to study time series, and statistical models and inference tools have been devised for this purpose, for example, ARIMA models (see Box, Jenkins and Reinsel, 1994), the more modern scope of state space models (see for example Anderson and Moore, 1979, Harvey, 1989 and 1993, West and Harrison, 1989) and in particular structural time series models (see Harvey, 1989 and 1993). All these studies on time series can be translated with few modifications to the case where $x$ is not necessarily time but any other equally spaced ordering variable for the $y$'s. In this thesis no assumption on the spacing of the indexing variable, $x$, will be made.

Another area of statistics where a lot of effort has been put into studying smoothing problems is nonparametric regression. In this case, the relationship between the variables is assumed to have an unspecified functional form $g$ such that

$$y_i = g(x_i) + e_i, \qquad i = 1, \ldots, n, \qquad (1.1)$$

where the $e_i$'s are i.i.d.

error terms. The main idea behind nonparametric regression is, in contrast to parametric models, not to make any assumptions on the form of $g$. In parametric models $g$ would be known, at least implicitly, up to some parameters (e.g. polynomials). In non-parametric regression, qualitative assumptions on $g$, commonly the continuity of some derivative, are made in order to restrict the search for a suitable regression function to a reasonably small set.

There are different approaches to estimating $g$ in a non-parametric context.

Perhaps the most popular non-parametric approach is the one based on polynomial splines. The estimation criterion that gives rise to polynomial splines consists in finding a function $g$ that satisfies certain regularity conditions (e.g. a degree of differentiability, square integrability of the derivatives, etc.), fits the data well and has some degree of smoothness (where smoothness or curvature is measured as the integral of the square of some derivative of $g$). It turns out that the polynomial smoothing spline of order $2m - 1$ is a smooth piecewise polynomial of order $2m - 1$ (see Chapter 4). It is claimed that because of the segmented nature of piecewise polynomials, polynomial smoothing splines will estimate the regression function with high fidelity and smoothness. It has been realised, though, that polynomial smoothing splines do not always provide satisfactory regression estimates. As a way of generalizing smoothing splines and at the same time covering the cases where smoothing polynomial splines fail to perform appropriately, attention is being paid to the so called L-splines (see Chapter 3 for a definition). Most of the work done in the area of spline smoothing is not based on the use of stochastic models, but on the use of functional analysis theory. It is the claim of statisticians advocating nonparametric regression that by only supplying qualitative information about the regression function the data will speak for themselves concerning the actual form of the regression curve.

As will be shown, parametric and nonparametric techniques do not cover the range of all possible approaches to solving smoothing problems.

From the literature on nonparametric regression (e.g. Eubank, 1988) it seems as though the dichotomy "parametric vs. nonparametric" techniques for regression comprises the universe of possible techniques so far developed. There are other approaches, based on stochastic modeling, which have coexisted most of the time with the deterministic modeling techniques but have been somewhat relegated to equally spaced time series analysis and ignored by researchers working on other regression (or smoothing) problems.

The most important objective in data analysis is to understand the true law that dictates the behaviour of observed variables. For this purpose, the researcher is usually presented with not much more than observed data and little or almost nonexistent theory inherent to the variables. In this case a sensible paradigm to adopt would be as follows.

i) Recognize important characteristics of the data.

ii) Develop a way to analyse the data, i.e. a methodology for data analysis and inference, so that this adopted methodology can incorporate as much available perception and theoretical background as possible.

iii) After the data have been analysed according to step ii), check that the adopted methodology has captured the features in i) (by means of diagnostic checks) and look for further improvement if necessary.

One can represent the above paradigm schematically as in Figure 1.1.

[Figure 1.1: A good data analysis paradigm. The chart depicts a loop: i) recognise characteristics of data; ii) adopt methodology for analysis; data analysis; iii) verify methodology is appropriate; diagnostic checks.]

It is important that the choices of methodologies in step ii) are abundant so that the features of the data recognised in step i) can be captured. These methodologies have to be not only abundant but also easy to implement in terms of the feasibility, amount and level of complexity of computation involved. Also the methodologies of ii) should allow step iii) of the paradigm so that the loop can continue until a satisfactory description of the data is obtained.

This type of paradigm, or the chart depicted in Figure 1.1, is not new in the literature. Box has presented similar schemes in his works (see for example Box, 1979, 1980, and Box, Jenkins and Reinsel, 1994, p. 17). What is different is that Box does not make explicit step i) of the model building process and starts at step ii). Other authors, inspired by Box's model building charts, have also designed and proposed more complete schemes (see Cook and Weisberg, 1982, p. 7, Wild, 1994, Tong, 1990).

For many regression problems it has been shown in the existing literature that the above paradigm is realizable by adopting in step ii) stochastic models such as state space models (see Chapter 2 for a definition and properties). The adoption of state space models permits the incorporation of observed features of data sets (for example in the form of components, like trends, seasonal and/or cyclic components, etc.) as well as existing background theory on the relationship of the variables (such as the influence of covariates and the fulfillment of differential equations) in a specific and transparent way. Estimation within the context of state space models is relatively easy, and convenient algorithms such as the Kalman filter, the diffuse Kalman filter and smoothing algorithms are nowadays easily implemented on practically any computer. Diagnostic tools and tests are readily available to perform part iii) of the paradigm.

On the other hand, if one wishes to adopt the above data analysis paradigm and one decides to use polynomial spline smoothing as a methodology for data analysis, two problems arise.

The first one is that it is not clear how to incorporate observed features or underlying theory on the behaviour of the data into the data analysis methodology. The second problem is that, because of the lack of diagnostic tools and the lack of a model, the completion of the cycle indicated in Figure 1.1 cannot be done. As will be shown, polynomial spline smoothing is a rather narrow class of methodologies to adopt for data analysis. The amount and level of complexity of the computations involved in spline methods is not to be overlooked: only for the case of cubic polynomial splines do there exist relatively easy algorithms and methods of computation (see Green and Silverman, 1994).

Smoothing by means of splines (either polynomial or L-splines) can be posed as a signal extraction problem using a particular stochastic representation of the unobserved signal or, in other words, as a smoothing problem in the context of a particular state space model. Articles where this is shown include Weinert, Byrd and Sidhu (1980), Wecker and Ansley (1983), Ansley and Kohn (1986), Kohn and Ansley (1987), and Ansley, Kohn and Wong (1993). Weinert, Byrd and Sidhu (1980) studied the relationship between L-splines and stochastic processes and gave a state space model to compute L-splines. Wecker and Ansley (1983) found a state space model for polynomial splines. Ansley and Kohn (1986), Kohn and Ansley (1987), and Ansley, Kohn and Wong (1993), among others, building on Weinert, Byrd and Sidhu's work of 1980, explore further properties and extensions of the relationship between L-splines and stochastic processes.

As Box (1979) points out, it should be recognised that the choice of a methodology for data analysis is not an irrevocable decision and the investigator need not attempt to allow for all contingencies a priori. A good data analysis methodology should allow for what Box calls a criticism stage that allows understanding of an initial analysis. This should lead the researcher to question the methodology adopted or to look for improvement of the initial analysis.

For all the reasons above, it is proposed in this thesis to adopt the paradigm represented in Figure 1.1 for regression analysis.

1.2 Contributions

In this thesis, the problem of smoothing using nonparametric tools and stochastic models is addressed. The nonparametric tools considered in this thesis are polynomial smoothing splines and L-splines, and the stochastic models are state space models. The relationship between the two approaches, nonparametric and stochastic models, is studied. A comparison of the performance of these methods is carried out by analysing real data.

In order to carry out the comparison of these two approaches, continuous state space models are studied in detail. First, it is studied how to derive a discrete state space model from a continuous state model. The most difficult part in this derivation is to compute the state error covariance matrix. When the stochastic process defined by the state equation of the continuous model is stationary, a result exists that makes the computation of this matrix easy. The problem when the process is nonstationary has so far not been addressed in the literature. This problem has importance since most of the stochastic processes that are associated with smoothing splines (L- and polynomial) are nonstationary. A result, applicable to the nonstationary case, is shown that makes the computation of the state error covariance easier than the direct calculation in many practical cases. Also, it is shown how to initialise the diffuse Kalman filter to carry out estimation and prediction in the context of a discrete model derived from the continuous one. In general, it is shown how to translate all the existing estimation and prediction tools for the discrete case to the continuous case.

Smoothing by minimising a penalised least squares criterion is associated with smoothing by using a certain continuous state space model. This state space model is often not properly specified, in that the initial conditions are taken to be diffuse. By means of the analysis of an example it is shown that proper initial conditions should not be overlooked, since they affect to a great extent the estimation and prediction results. The advantages of a model based approach to smoothing are also shown in this same example and in a second one. It is shown how different models can be compared and how the paradigm in Figure 1.1 can be realised using state space models.

In order to compare polynomial spline smoothing with a stochastic model based approach to smoothing, the penalised least squares criterion giving rise to polynomial splines is studied in detail. In particular, it is shown that the criterion defining the smoothing polynomial spline is equivalent to another penalised least squares criterion. This result generalises a known result for cubic splines. The new penalised least squares criterion makes the properties of smoothing polynomial splines transparent. In particular, it is shown that a stochastic model associated with smoothing polynomial splines is an ARIMA plus noise model where the only parameter to be estimated from the data is the noise variance.

1.3 Outline

The rest of this thesis is organized as follows.

Chapter 2 is a review of state space models, where the main ideas on state space modeling, definitions, estimation tools and results are presented. This chapter should convey how these stochastic local models can be conceived and how to carry out estimation and prediction in the state space model context. It is also fundamental for the development of the next chapter.

In Chapter 3, estimation within the continuous state space model is addressed. New results on the implementation of estimation and prediction tools (diffuse Kalman filter and diffuse smoothing) are presented. In particular, initialisation issues of the diffuse Kalman filter are studied. Also, L-splines, a nonparametric smoothing tool, are briefly reviewed. The focus is to view L-splines in the context of continuous state space models. Examples are given to illustrate how nonparametric regression methods can be easily implemented when using a model based approach. The advantages of the model based approach to nonparametric smoothing are illustrated with examples. It is shown how the exploration of a model, rather than the construction and minimisation of an objective function, can result in a fruitful endeavour.

In Chapter 4, a review of polynomial smoothing splines is given. Mainly, a basis for the space of smoothing polynomial splines (of any pre-fixed order) is studied so as to extract and make more transparent certain properties of polynomial smoothing splines. It is shown that polynomial smoothing splines can be obtained by minimising a certain penalised least squares criterion, generalising a well known result so far proved for cubic splines only.

In Chapter 5, polynomial splines are further explored and a connection between polynomial splines, solutions to penalised least squares problems and ARIMA processes is shown. In particular, it is shown that polynomial spline smoothing of order $2m - 1$ is equivalent to ARIMA$(0, m, m - 1)$ plus noise modeling where all the MA parameters are fixed. This result may shed more light on the polynomial spline smoothing tool.

Chapter 2

The State space model

A general class of models of much current interest is that of state space models, also referred to as SSM from now on. State space models were originally developed by control engineers, particularly for applications concerning navigation systems such as controlling the position of a space rocket. Currently, and especially due to the contributions of Anderson and Moore (1979), Harvey (e.g. 1989), West and Harrison (1989), and de Jong (e.g. 1987, 1988, 1991), among others, SSM have been found useful in time series problems. It is the purpose of this thesis to show that SSM can be useful in general regression problems as well.

Suppose that pairs of data values $(y_i, x_i)$, $i = 1, \ldots, n$, are observed. The smoothing or regression problem consists of describing the relationship between the components of the pairs. By this notation, the $y$'s are ordered according to the $x$'s and $x_1 < \cdots < x_n$. That is, the variable $x_i$ plays the role of an indexing variable. If $x$ is time, then the data are denoted a time series and the interest lies in describing the evolution of $y$ in time. If $y$ is energy consumption and $x$ is temperature, then the interest lies in the study of the changes of energy consumption as the temperature changes. If $y$ is the height of a child and $x$ its age in months, then the interest is to study how the child grows in height as it grows older. The indexing variable plays the role of imparting structure to the data; it defines contiguity of the $y$'s. Think, for example, of the $y$'s being energy consumption of a household. Ordering the $y$'s in the (time) sequence they were observed usually gives a completely different structure to the data than if the $y$'s had been ordered by a temperature indexing variable.

When the researcher measures any sort of (unobservable) signal, it will typically be disturbed by noise, so that the actual observation is given by

$$\text{observation} = \text{signal} + \text{noise}.$$

In state space models the signal is taken to be a linear combination of a set of variables, which constitute what is called the state vector. The state vector at $x_i$ denotes the so called "state of nature" at $x_i$. Note that the state vector is unobservable and has to be predicted from the data.

Often the state has a physical meaning. For example, the progress of a spacecraft can be monitored using a SSM with a state whose components describe the velocity, acceleration, coordinates, and rate of fuel consumption of the spacecraft. The NASA Space Program utilises SSM to control its spacecraft. In many economic contexts the state may represent constructs such as trends or seasonalities. For example, Crafts et al. (1989) employ the SSM to predict the historical unobserved trend and cycle components of the British industrial production index. In other situations, where physical information is not available, the state can represent an unobserved and/or unknown process driving the observed system. For example, Watson and Engle (1983) use the SSM to estimate unobserved wage rates. Other applications of the SSM include a study in inventory control (Downing et al., 1980) and one in statistical quality control (Phadke, 1981). In the area of policy, Harvey and Durbin (1986) conducted an interesting study of the impact of seat belt legislation on road casualties.

In parametric regression the state vector stays constant as the indexing variable, $x$, varies, and the constants that enter in the linear combination of state variables change. In both parametric and nonparametric regression, the signal is assumed to be deterministic. In parametric regression the functional form of the signal is assumed to be completely known up to some parameters. In nonparametric regression the signal is not assumed to have a specific form but to have some general qualities, such as a certain degree of differentiability. It is usual to read, especially in the nonparametric literature, that parametric vs. nonparametric regression constitute a dichotomy that covers to a great extent the universe of alternatives for the solution of regression problems. As will be seen throughout this chapter and the following ones, there exist other options for solving regression problems.

A great deal of effort has been put into studying signal plus noise models where the signal is stochastic, that is, the signal is assumed to be a stochastic process rather than a deterministic function. Hence the way that the signal at $x_{i+1}$ is related to the signal at $x_i$ is established in a probabilistic rather than in a functional manner. The study of the signal as a stochastic process has yielded interesting results that allow insightful solutions of the problem of estimating the unknown signal. Perhaps because these studies were motivated by econometric applications, they were relegated to the study of equally spaced time series.

Typically, and this is a strength of state space models, the state vector changes as the indexing variable, $x$, changes (whereas the constants that enter in the linear combination of the components of the state vector to give the signal are usually fixed). This is because the state is modeled as a stochastic process on $x$ that suits, or is thought to suit, the data well. Hence, SSM are stochastic local models for the data, where local refers to the adjacency of the data dictated by the ordering variable, $x$. As a consequence, SSM are robust in the sense that isolated departures from the bulk of the data will have a limited influence on the global results of estimation. In this way, smooth estimates of the signal that produce a good local representation of the data can be obtained.

As an example, suppose that the ordering variable, $x$, is equally spaced, with $x_i - x_{i-1} = 1$, and that it is believed that the relationship between $y$ and $x$ is linear. Then, $y_i = a + b x_i$, $i = 1, \ldots, n$, for some $a, b \in \mathbb{R}$. This is equivalent to saying that $y_{i+1} = y_i + b$, with starting condition $y_0 = a$. If one wants a more flexible model than the initial linear model for the relationship between $y$ and $x$, then, starting from the linear model, one could add a disturbance, $e$, and postulate that $y_i = a + b x_i + e_i$. Or, one could add a disturbance, $\eta$, to the alternative formulation and say that $y_{i+1} = y_i + b + \eta_i$. These two new "more flexible" formulations are very different. The first one, $y_i = a + b x_i + e_i$, is the usual linear regression paradigm, whose drawbacks, for example lack of flexibility or lack of robustness, are well known. The second one, $y_{i+1} = y_i + b + \eta_i$, is a state space description that produces a local linear description of the data; it represents local characteristics of the data. Therefore, an initial statement of a model and the placement of disturbances is crucial in making this initial model more flexible. The construction of SSM is concerned with identifying possible components (the state) of an initial global formulation of the signal and the placement of random disturbances in the components in order to obtain a local description.
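To make the contrast concrete, the following minimal simulation sketch (in Python, with arbitrary illustrative values for $a$, $b$ and the disturbance variances; it is not part of the original development) generates data from both formulations. The global formulation commits to a single line, whereas in the local formulation the level is carried forward and revised by each disturbance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, a, b = 100, 1.0, 0.5
x = np.arange(1, n + 1)

# Global formulation: y_i = a + b*x_i + e_i -- one fixed line for all i.
y_global = a + b * x + rng.normal(0.0, 1.0, size=n)

# Local formulation: y_{i+1} = y_i + b + eta_i -- the level is carried
# forward, so an isolated departure influences only nearby observations.
y_local = np.empty(n)
y_local[0] = a + b * x[0]
for i in range(n - 1):
    y_local[i + 1] = y_local[i] + b + rng.normal(0.0, 0.1)
```

The simulated local path drifts away from any single straight line while remaining locally linear, which is precisely the behaviour described above.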

In what follows, a review of discrete state space models is presented. Discrete state space models have been thoroughly studied, especially for use in econometric applications, e.g. time series. As will be seen, these models can be used in most regression situations as well. The formulation and treatment of discrete SSM involves keeping track of the particular spacing of the indexing or ordering variable, $x$. The next chapter introduces continuous state space models, which are formulated independently of any particular spacing of the $x$'s. Once the data are collected, a discrete model is derived from the continuous one in order to suit the data. Most of the results derived for the discrete case will apply fairly directly.

In light of the above paragraph, the review of results on discrete SSM that follows is essential to the development of Chapter 3. In Section 2.1 a formal definition of SSM is given. Section 2.2 shows some examples and explores through them the usefulness of the formulation of SSM in certain situations. The rest of the chapter deals with technical details about the implementation of the prediction and estimation tools used with the SSM. Section 2.3 deals with the criteria that are used for estimation in the context of SSM and with the Kalman and smoothing filters, which are the tools to be used to implement the estimation. Sections 2.4 and 2.5 deal with diffuse SSM and with SSM which have applied since time immemorial; these models are useful generalisations of the original SSM of Section 2.1. Section 2.6 shows how generalised cross-validation ideas can be implemented in the context of SSM if a distribution-free method for estimation is preferred.

2.1 Definition of the state space model

State space models offer the possibility of constructing useful local models to solve smoothing problems. The advantage of explicit stochastic local models is that they are more flexible than parametric regression models. In SSM the data play an important role in determining the local changes. This also brings the advantage of having models that are robust to local departures from the bulk of the data. Also, important observed features of the data, such as trends or cycles, may be incorporated in the model.

In this section the following notation is used:

• If $a$ is a random vector, then $\text{Var}(a)$ denotes its variance-covariance matrix.

• $\gamma \sim (c, \sigma^2 C)$ indicates that $\gamma$ is a random vector with $E(\gamma) = c$ and $\text{Var}(\gamma) = \sigma^2 C$.

If the variance of the slope disturbance is greater than 0, then we allow the slope of the trend to vary in each of the intervals determined by the $x$'s. So, one can appreciate how, starting from an idealised and global deterministic signal, and by adding stochastic disturbances appropriately, one can construct a flexible stochastic model which will focus on local characteristics of the data.

Adding disturbances to components is more effective than just adding a single disturbance to the deterministic signal. Note how a completely deterministic treatment of the signal, as in the previous section in the case of parametric regression, does not suggest other possible ways of making the model more flexible except for turning a parametric signal into a more unspecified one.

For the seasonal component, one may assume that it is approximately a trigonometric function of $x$. If it were perfectly trigonometric then it would be of the form

$$\gamma(x) = a\sin(\omega x) + b\cos(\omega x)$$

for some $a$, $b$ and $\omega$, where $\omega$ is a frequency corresponding to a period $p$, that is, $\omega = 2\pi/p$. This ensures $\gamma(x) = \gamma(x + p)$. Then, defining $\gamma^*(x) = a\cos(\omega x) - b\sin(\omega x)$, one has that

$$\gamma(x_i + h_i) = a\sin(\omega(x_i + h_i)) + b\cos(\omega(x_i + h_i)) = [a\sin(\omega x_i) + b\cos(\omega x_i)]\cos(\omega h_i) + [a\cos(\omega x_i) - b\sin(\omega x_i)]\sin(\omega h_i)$$

and

$$\gamma^*(x_i + h_i) = a\cos(\omega(x_i + h_i)) - b\sin(\omega(x_i + h_i)) = [a\cos(\omega x_i) - b\sin(\omega x_i)]\cos(\omega h_i) - [a\sin(\omega x_i) + b\cos(\omega x_i)]\sin(\omega h_i).$$

Letting $\gamma_i = \gamma(x_i)$ and $\gamma_i^* = \gamma^*(x_i)$, one has that

$$\begin{pmatrix} \gamma_{i+1} \\ \gamma_{i+1}^* \end{pmatrix} = \begin{pmatrix} \cos(\omega h_i) & \sin(\omega h_i) \\ -\sin(\omega h_i) & \cos(\omega h_i) \end{pmatrix} \begin{pmatrix} \gamma_i \\ \gamma_i^* \end{pmatrix}. \qquad (2.6)$$

Since in real life one is unlikely to observe an exactly trigonometric seasonal signal, one may add disturbances in (2.6) so that

$$\begin{pmatrix} \gamma_{i+1} \\ \gamma_{i+1}^* \end{pmatrix} = \begin{pmatrix} \cos(\omega h_i) & \sin(\omega h_i) \\ -\sin(\omega h_i) & \cos(\omega h_i) \end{pmatrix} \begin{pmatrix} \gamma_i \\ \gamma_i^* \end{pmatrix} + \begin{pmatrix} \kappa_i \\ \kappa_i^* \end{pmatrix},$$

where $\kappa_i$ and $\kappa_i^*$ are $(0, k_i)$ and $(0, k_i^*)$ variables, respectively, whose variances depend on $h_i$. In this way, once one has a dynamic linear representation for an idealised signal, one may construct a more flexible model by adding disturbances appropriately.
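As a quick numerical check of recursion (2.6), the following sketch (with arbitrary choices of $a$, $b$ and $p$, not taken from the thesis) propagates the pair $(\gamma_i, \gamma_i^*)$ through the rotation matrix over unequally spaced knots and confirms that the exact trigonometric seasonal is reproduced; the disturbed version is obtained by adding a noise vector at each step.

```python
import numpy as np

# Verify that the rotation recursion (2.6) reproduces
# gamma(x_i) = a*sin(w*x_i) + b*cos(w*x_i) at every (unequally spaced) knot.
a, b, p = 1.3, -0.7, 12.0
w = 2 * np.pi / p                       # frequency for period p
x = np.cumsum(np.r_[0.0, np.random.default_rng(1).uniform(0.5, 2.0, 50)])
h = np.diff(x)                          # unequal spacings h_i

state = np.array([a * np.sin(w * x[0]) + b * np.cos(w * x[0]),
                  a * np.cos(w * x[0]) - b * np.sin(w * x[0])])
gam = [state[0]]
for hi in h:
    T = np.array([[np.cos(w * hi),  np.sin(w * hi)],
                  [-np.sin(w * hi), np.cos(w * hi)]])
    state = T @ state                   # add (kappa_i, kappa_i*) here for the noisy version
    gam.append(state[0])

assert np.allclose(gam, a * np.sin(w * x) + b * np.cos(w * x))
```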

Following are some of the most common structural models used in practice.

Pure local linear trend model

In this case we have a model where only a trend is observed. This trend is assumed to be locally linear with added disturbances. The model expressed in state space form would be

$$y_i = \mu_i + e_i, \qquad \mu_{i+1} = \mu_i + h_i\beta_i + \eta_i, \qquad \beta_{i+1} = \beta_i + \zeta_i, \qquad (2.8)$$

where the $e_i$'s are independent of the $\eta_i$'s and $\zeta_i$'s, the $e_i$'s are i.i.d. $(0, \sigma^2)$, and $\eta_i$ and $\zeta_i$ are $(0, \sigma^2_{\eta_i})$ and $(0, \sigma^2_{\zeta_i})$ variables whose variances depend on $h_i$. Setting one or both of the trend disturbance variances to zero yields the following special cases.

$\sigma^2_{\eta}$    $\sigma^2_{\zeta}$    trend
> 0      = 0      stochastic (or local) level with fixed slope
= 0      > 0      smooth trend
= 0      = 0      deterministic linear

Table 2.1: Different trend models.
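The special cases of Table 2.1 can be generated from one recursion by switching the disturbance variances on and off. The sketch below (equal spacing by default, arbitrary illustrative values; not code from the thesis) makes this concrete.

```python
import numpy as np

def simulate_trend(n, sig2_eta, sig2_zeta, mu0=0.0, beta0=0.2, h=1.0, seed=0):
    """Simulate the trend recursion of (2.8); the special cases of
    Table 2.1 are obtained by zeroing one or both disturbance variances."""
    rng = np.random.default_rng(seed)
    mu, beta = mu0, beta0
    path = np.empty(n)
    for i in range(n):
        path[i] = mu
        mu = mu + h * beta + rng.normal(0.0, np.sqrt(sig2_eta))   # level step
        beta = beta + rng.normal(0.0, np.sqrt(sig2_zeta))         # slope step
    return path

local_level_fixed_slope = simulate_trend(200, sig2_eta=0.1,  sig2_zeta=0.0)
smooth_trend            = simulate_trend(200, sig2_eta=0.0,  sig2_zeta=0.01)
deterministic_linear    = simulate_trend(200, sig2_eta=0.0,  sig2_zeta=0.0)
```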

Autoregressive component plus noise model

In (2.5) let $\mu_i = 0$ for all $i$ and, moreover, assume that $\beta_i = \rho\beta_{i-1} + \zeta_i$, $0 < \rho < 1$. The signal then follows an autoregressive process of order 1. If $\rho = 1$, then the signal follows a random walk process. The corresponding state space formulation of the model is

$$y_i = \alpha_i + e_i, \qquad \alpha_{i+1} = \rho\alpha_i + \zeta_i,$$

where the $e_i$'s and the $\zeta_i$'s are independent, the $e_i$'s are i.i.d. $(0, \sigma^2)$, and the $\zeta_i$'s are $(0, \sigma^2_{\zeta_i})$, where $\sigma^2_{\zeta_i}$ depends on $h_i$. This model is frequently used in the absence of physical information about the signal.
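A minimal simulation of this model (illustrative parameter values, equal spacing assumed; not from the thesis) looks as follows.

```python
import numpy as np

# Simulate the AR(1)-plus-noise model: y_i = alpha_i + e_i,
# alpha_{i+1} = rho * alpha_i + zeta_i.
rng = np.random.default_rng(2)
n, rho, sigma_e, sigma_zeta = 300, 0.9, 1.0, 0.5

alpha = np.empty(n)
alpha[0] = 0.0
for i in range(n - 1):
    alpha[i + 1] = rho * alpha[i] + rng.normal(0.0, sigma_zeta)

y = alpha + rng.normal(0.0, sigma_e, n)   # noisy observations of the signal
```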

Trigonometric cycle or seasonality model

Since seasonality is a periodic phenomenon, it is natural to model it using trigonometric functions. The corresponding state space formulation is

$$y_i = (1, 0)\alpha_i + e_i, \qquad \alpha_{i+1} = \rho \begin{pmatrix} \cos(\omega h_i) & \sin(\omega h_i) \\ -\sin(\omega h_i) & \cos(\omega h_i) \end{pmatrix} \alpha_i + \begin{pmatrix} \kappa_i \\ \kappa_i^* \end{pmatrix},$$

where $\kappa_i$ and $\kappa_i^*$ are correlated white noise disturbances, usually assumed to have a common variance $\sigma^2_{\kappa_i}$, with $\sigma^2_{\kappa_i}$ depending on $h_i$, and $\rho$ is a "damping factor" which lies in the range $0 < \rho < 1$.

There are further variations of these models, and only some of them were illustrated here.

2.2.3 ARIMA models as state space models

A complete reference for ARIMA models can be found in Box, Jenkins and Reinsel (1994). ARIMA models are based on the idea that a time series $y_i$, $i = 1, 2, 3, \ldots$, in which successive values are highly dependent, can frequently be regarded as generated from a series of independent $(0, \sigma^2)$ "shocks" or disturbances $u_i$, $u_{i-1}$, $u_{i-2}, \ldots$. Such a process is called a discrete white noise process. It is assumed that some sort of "linear filter" transforms the white noise process into the stochastic process of interest, yielding

$$y_i = \mu + u_i + \psi_1 u_{i-1} + \psi_2 u_{i-2} + \cdots,$$

where $\mu$ represents the level of the process. The attention is focused on modeling the signal as a stationary process, an assumption which is crucial for estimation and modeling purposes. The above idea is conceptually based on a result by Wold (1938), who established that any zero-mean purely non-deterministic stationary process $y_i$ possesses a linear representation $y_i = u_i + \sum_{j=1}^{\infty} \psi_j u_{i-j}$, with the $u_i$'s being $(0, \sigma^2)$ and uncorrelated, and $\sum_{j=1}^{\infty} \psi_j^2 < \infty$.

In this sense, the models that are studied are moving average (MA) models, where the process $y_i$ is assumed to be a finite linear combination of i.i.d. disturbances, and autoregressive (AR) models, where $y_i$ is assumed to be a finite linear combination of past values of $y_i$ plus a disturbance. Back substitution in the past values of $y_i$ allows expressing an AR process as an infinite linear combination of i.i.d. disturbances. To achieve greater flexibility in fitting an actual time series, $y_i$ can also be assumed to be the sum of an autoregressive process and a moving average process, or in the Box and Jenkins terminology, an ARMA process.

Box and Jenkins recognise the importance of having at hand some nonstationary models, in addition to stationary ARMA models, since some real life processes in industry, business, etc., are better represented as such. In particular, they consider a special type of nonstationarity: they assume that if the process is nonstationary then, after differencing it a few times, it will have a stationary ARMA representation. These types of nonstationary models are called autoregressive integrated moving average models, or ARIMA for short. In using ARIMA models, it is assumed that the values of the indexing variable, $x$, are equally spaced.
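The following sketch (illustrative, with an arbitrary MA coefficient; not from the thesis) simulates a process whose second difference is a stationary MA(1) series, the ARIMA(0,2,1) case treated in Example 2.1 below, and verifies that differencing twice recovers the stationary shocks.

```python
import numpy as np

rng = np.random.default_rng(3)
n, b1 = 500, 0.6
u = rng.normal(0.0, 1.0, n)
w = u[2:] + b1 * u[1:-1]             # MA(1): w_i = u_i + b1*u_{i-1}

y = np.zeros(n - 2)                  # integrate twice: (1 - B)^2 y = w
for i in range(2, n - 2):
    y[i] = 2 * y[i - 1] - y[i - 2] + w[i]

d2y = np.diff(y, n=2)                # apply (1 - B)^2 back to y
assert np.allclose(d2y, w[2:n - 2])  # recovers the stationary MA(1) series
```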

Also, let us define the backward shift operator, $B$, by $B^j y_i = y_{i-j}$. This operator is extensively used in the ARIMA literature. A formal definition of the mentioned processes follows.

Definition 2.2 The process $y_i$ is said to follow an ARMA($p, q$) model if

$$y_i = \delta + a_1 y_{i-1} + a_2 y_{i-2} + \cdots + a_p y_{i-p} + e_i + b_1 e_{i-1} + \cdots + b_q e_{i-q},$$

where $e_i \sim (0, \sigma^2 I)$ and $y_0, y_{-1}, \ldots, y_{-p}$ are unobserved and may be either fixed or random. Also, $a_p \neq 0$ and $b_q \neq 0$.

The process $y_i$ is said to follow an ARIMA($p, d, q$) model if $(1 - B)^d y_i$ follows an ARMA($p, q$) model.

T h e o r e m 2.1 Put c =

(Gardner

et.al., 1980.)

max(p, q + 1) and define T =

Suppose that yi follows { ( a i ; . . . ; a ), (7; 0)}, c

H

an ARMA(p,q) =

model.

(1; b\;...; 6 _i) c

and

W = (1; 0 ; . . . ; 0) where the aj or bj for respectively j > p and j > q are defined as zero. Then,

yi — (l,0,...,0)a,a,-+i

where u.

=

Wp3 + Tai +

Hui

(0,a ).

E x a m p l e 2.1

2

(ARIMA(0,2,1)

models as

SSM.)

Let yi be generated by an A R I M A ( 0 , 2 , 1 ) process.

Then

yi - 2j/,-_i + yi-2 = Ui +

where u 's are independent and it; 8

(0,

( y i , x - i ) , . . . ,

( y „ , x ) it is desired to obtain a prediction of n

y +k n

0.

• Interpolation: given (j/ x ) , . . . , (?/;_!, l5

x

x ^ ) ,

( y

i

+

1

, x

i

+

i ) , . . . , (y , x n

n

)

one wants to

predict y- at X{. t

• Smoothing: if the observations are generated by a S S M , a prediction of the unobserved state a{ at Xi is desired. Another smoothing problem is the signal extraction 28

problem where a prediction of the signal Xid + zT,o:,- at X{ is desired.

Also, an

estimate of the measurement and/or state error components are sometimes desired.

Frequently in the SSM there are unknown parameters, for example some disturbance variances and covariances. Prediction and smoothing can only be carried out once these parameters have been estimated. Some ways to proceed with the estimation of these parameters are via maximum likelihood or via cross-validation. It will be shown that the same tool that will be used for prediction can be used to obtain an expression for the likelihood of the $y$'s and/or an expression for the cross-validation criterion. Maximum likelihood is to be used when the disturbances, the $u_i$'s, are assumed to be normally distributed. If the normality of the disturbances is doubted, then a distribution-free method like cross-validation can be used.

The prediction criterion to be used is the mean square error (Mse), and the estimators that will be considered are a constant plus a linear combination of the observations. Henceforth, predictors will denote "best linear predictors", where "best" is relative to minimization of the Mse. Best linear prediction and minimum mean squared error are the key words in the implementation of the prediction procedures in the context of the SSM. The tools to be used are the Kalman and smoothing filters. These algorithms permit the incorporation of information sequentially, which makes prediction and estimation methods easy to implement. In prediction procedures where all the data are incorporated at once, as in classical linear regression, quite cumbersome operations (for example the inversion of a large matrix) may have to be performed.
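To illustrate the sequential nature of these algorithms, the following is a minimal Kalman filter for the local level model $y_i = \alpha_i + e_i$, $\alpha_{i+1} = \alpha_i + \eta_i$ (a sketch with a large initial variance mimicking a vague initial state; it is not the diffuse treatment of Section 2.4):

```python
import numpy as np

def kalman_filter_local_level(y, sigma2_e, sigma2_eta, a0=0.0, p0=1e7):
    """Minimal Kalman filter for the local level model.
    Returns one-step-ahead state predictions a_i and their variances p_i;
    each observation is incorporated sequentially, no large matrix inversion."""
    n = len(y)
    a, p = np.empty(n + 1), np.empty(n + 1)
    a[0], p[0] = a0, p0
    for i in range(n):
        f = p[i] + sigma2_e                # innovation variance
        k = p[i] / f                       # Kalman gain
        v = y[i] - a[i]                    # innovation (prediction error)
        a[i + 1] = a[i] + k * v            # update, then predict (T = 1)
        p[i + 1] = p[i] * (1 - k) + sigma2_eta
    return a, p

y = np.cumsum(np.random.default_rng(4).normal(0, 1, 200))  # toy data
a, p = kalman_filter_local_level(y, sigma2_e=1.0, sigma2_eta=0.5)
```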

In this section, definitions and technical aspects of inference in the context of the SSM, as per Definition 2.1, will be given. The proofs of the results will be omitted; they can be found in textbooks like Anderson and Moore (1979). First, in Subsection 2.3.1 the concept and some properties of best linear prediction are formally stated. Subsections 2.3.2 and 2.3.3 contain some important properties of best linear predictors in terms of combining information. This is a preparation and the basic material for the Kalman filter algorithm, which is explained in Subsection 2.3.4. The Kalman filter provides predictions of the state at $x_i$ given all the information collected at $x_1, \ldots, x_{i-1}$. The smoothing filter gives predictions of the state given all the available information (fixed interval smoothing), a prediction of the state at a particular point $x_i$ as the sample size increases (fixed point smoothing), and/or a prediction of the signal at a point $x_i$ where no observation of $y$ was made (interpolation). These smoothing situations are reviewed in Subsection 2.3.5.

2.3.1 Minimum mean square error prediction

This subsection contains the definitions and some of the properties of best linear predictors. These are the type of estimators that will be considered for estimation within the SSM.

Suppose that $X$ and $Y$ are random vectors such that $E(X) = \mu_x$, $E(Y) = \mu_y$, $\text{Var}(X) = \sigma^2\Sigma_x$, $\text{Var}(Y) = \sigma^2\Sigma_y$ and $\text{Cov}(X, Y) = \sigma^2\Sigma_{xy}$. The (best linear) predictor of $X$ given $Y$ is $P(X|Y) = c + b'Y$, where $c$ and $b$ are such that $E(X - c - b'Y)^2$ is minimum.

2.4.2 Diffuse prediction

The DKF can be used to compute diffuse predictors of $\alpha_i$ and $y_i$ in the context of the SSM. These are predictors constructed under the assumption that $\gamma \sim (c, \sigma^2 C)$ and $C \to \infty$, for $i = 1, \ldots, n + 1$. If the rows of $A_i$ are in the row space of $S_i$, then as $C \to \infty$, $\hat{\alpha}_i \to A_i(-S_i^{-1}s_i; 1)$ and $\text{Mse}(\hat{\alpha}_i) \to \sigma^2(P_i + A_iS_i^{-1}A_i')$. If the rows of $X_i(-B; b) + Z_iA_i$ are in the row space of $S_i$, then as $C \to \infty$, $\hat{y}_i \to (X_i(-B; b) + Z_iA_i)(-S_i^{-1}s_i; 1)$ and $\text{Mse}(\hat{y}_i) \to \sigma^2(D_i + E_iS_i^{-1}E_i')$.

2.4.3 Diffuse smoothing

Smoothing refers to predicting the state vector $\alpha_i$, $i = 1, \ldots, n$, using the entire observation vector $y$, where $y = (y_1; \ldots; y_n)$.
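As a closing illustration (a sketch, not the diffuse smoothing algorithm of this section), the following fixed-interval smoother for the local level model predicts each state from the entire observation vector $y$ by a forward filtering pass followed by a backward pass:

```python
import numpy as np

def local_level_smoother(y, s2e, s2n, a0=0.0, p0=1e7):
    """Fixed-interval smoother for y_i = alpha_i + e_i, alpha_{i+1} = alpha_i + eta_i:
    predict every alpha_i from the ENTIRE observation vector y = (y_1; ...; y_n)."""
    n = len(y)
    m, v = np.empty(n), np.empty(n)        # filtered means / variances
    apred, ppred = np.empty(n), np.empty(n)
    ap, pp = a0, p0                        # one-step-ahead prediction
    for i in range(n):                     # forward (filtering) pass
        apred[i], ppred[i] = ap, pp
        k = pp / (pp + s2e)                # Kalman gain
        m[i] = ap + k * (y[i] - ap)
        v[i] = (1 - k) * pp
        ap, pp = m[i], v[i] + s2n          # predict next state (T = 1)
    ms, vs = m.copy(), v.copy()            # backward (smoothing) pass
    for i in range(n - 2, -1, -1):
        g = v[i] / (v[i] + s2n)            # smoother gain v_i / p_{i+1}
        ms[i] = m[i] + g * (ms[i + 1] - apred[i + 1])
        vs[i] = v[i] + g**2 * (vs[i + 1] - ppred[i + 1])
    return ms, vs

ms, vs = local_level_smoother(
    np.random.default_rng(5).normal(size=100).cumsum(), s2e=1.0, s2n=0.5)
```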