Additive models for correlated data with applications to air pollution monitoring

Marco Giannitrapani, Adrian W. Bowman & E. Marian Scott
Department of Statistics, The University of Glasgow, Glasgow G12 8QQ

May 27, 2005

Summary Additive models, where individual regression terms are assumed only to be smooth, have proved to be very powerful and useful extensions of linear regression. However, in many applications the underlying data are subject to various types of correlation, often through time series or spatial effects. A method of adjusting the construction of smooth terms to allow for the presence of correlation is proposed, along with techniques to identify the significance of individual terms. This includes assessment of the evidence for bivariate terms, expressing the interaction between two covariates. Appropriate degrees of freedom are derived for these settings and suitable methods of performing the appropriate distributional calculations for model selection are discussed. Throughout the paper, the methods are applied to the analysis of data on SO2 at a European monitoring site.

Keywords: additive model, approximate F test, approximate degrees of freedom, local linear regression, nonparametric smoothing, quadratic forms.

1 Introduction

Nonparametric smoothing techniques aim to provide a means of modelling the relationships between variables without specifying any particular form for the underlying regression function. In the simplest setting, interest lies in estimating the regression function m in a relationship between a response variable y and a covariate x, expressed in the model y = m(x) + ε, where ε denotes independent random error. When several covariates are present, Stone (1985), Buja et al. (1989) and Hastie & Tibshirani (1986) proposed to extend the idea of multiple regression into a flexible form known as an additive model. For data {(y_i, x_{i1}, ..., x_{ip}); i = 1, ..., n} the model can be represented as

    y_i = \mu + \sum_{j=1}^{p} m_j(x_{ij}) + \varepsilon_i,    (1)

where the jth covariate has its own associated component m_j and the regression function is constructed from the combination of these components. The errors ε_i are assumed to be independent. Hastie & Tibshirani (1990) give a general introduction to generalised additive models, which have proved to be very useful tools in analysing a wide variety of datasets. Hastie & Tibshirani (2000) give an example of a further extension where a Bayesian procedure for posterior sampling from a generalized additive model is proposed, based on smoothing partial residuals and adding appropriate noise to obtain a new realization of the underlying relationship. This is equivalent to Gibbs sampling for an appropriately defined Bayesian model.

Standard methods of implementing additive models assume that the errors ε are independent. However, a wide variety of applications generate data which are subject to correlations of various types, often temporal or spatial. Figure 1 shows an example from air pollution involving SO2 concentrations monitored at Waldhof in Germany from 1983 to 2000. The data are plotted on the log scale to remove substantial skewness.


Figure 1: Weekly values of ln(SO2 ) monitored at Waldhof from 1983 to 2000.

There is clear evidence of trend and seasonal effects, and the identification of these is a classic problem in time series. Procedures such as STL (Cleveland et al., 1990) are available as descriptive methods. Alternatively, following Loader (1999), a simple additive model can be constructed as

    \ln(\mathrm{SO}_2) = \mu + m_1(\mathrm{year}) + m_2(\mathrm{week}) + \varepsilon,    (2)

where m_1 represents the trend across years and m_2 describes the seasonal component over the 53 weeks of the year. Independence of errors is an inappropriate assumption for time series such as the air pollution data, where correlation between neighbouring observations may well be expected. An independence model, when used with correlated data, is likely to provide component estimates which are valid, although not necessarily efficient. More seriously, standard errors are likely to be too small and methods of model comparison, to identify the presence or nature of covariate effects, are likely to be inappropriately applied. The effects of correlation on standard forms of nonparametric regression estimators are reviewed by Opsomer et al. (2001), with particular focus on the issue of smoothing parameter selection, where a method of bandwidth choice suitable for correlated data was proposed. Some authors have considered adaptations of additive models to correlated data problems. Kammann & Wand (2003) presented geoadditive models, obtained from the fusion of geostatistical and

additive models. They proposed to incorporate a geographical spline component, merging this with an additive model in a mixed model formulation. Niu (1996) introduced a class of additive models for environmental time series in which both mean levels and variances of the series are modelled as nonlinear functions of meteorological variables. The fitting procedure is based on nonlinear regression and on the Box-Jenkins modelling strategy to deal with the serial correlation.

The aim of this paper is to investigate methods of fitting and analysing additive models when the data contain correlated errors. These may arise from temporal correlation, as in the SO2 data described above, but the methods proposed are general and would apply equally well to spatial and other forms of correlated data. Issues of estimation are addressed in Section 2, where univariate, bivariate and additive models are all considered. Issues of inference which require different models to be compared to identify the presence and nature of particular covariate effects are addressed in Section 3. A simulation study to investigate the statistical properties of the proposed methods is described in Section 4. Applications of the methods to the air pollution data are investigated in Section 5. This involves the use of additional meteorological information which requires more complex additive models, including the use of bivariate explanatory terms. Some final discussion is given in Section 6.

2 Fitting additive models

2.1 Single component models

The basic building block of an additive model is the construction of an estimate of the regression function m in the model

    y_i = m(x_i) + \varepsilon_i,    (3)

for observed data {(x_i, y_i); i = 1, ..., n}, where the errors ε_i are independent with variance σ². This is the simplest form of nonparametric regression and

many approaches are available, as discussed by Green & Silverman (1994), Simonoff (1996), Bowman & Azzalini (1997) and many other authors. Although the methods differ in philosophy and style, the end results in terms of estimation are often similar. It is therefore acceptable to select a method of smoothing which is convenient to the problem at hand. The local linear method of smoothing, described by Cleveland (1979), is adopted here. It is a conceptually appealing way of constructing an estimate from observed data {(x_i, y_i); i = 1, ..., n} by fitting a linear model in a local manner, using weights w(x_i − x; h) to focus attention on the estimation point x of interest. Specifically, the estimator m̂(x) is taken as the least squares estimator α̂ which arises from the criterion

    \min_{\alpha, \beta} \sum_{i=1}^{n} \{y_i - \alpha - \beta(x_i - x)\}^2 \, w(x_i - x; h).    (4)
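As a concrete illustration, a minimal sketch of this estimator is given below in Python, with the normal-density weights used in the paper; the function name and evaluation grid are illustrative rather than taken from the authors' software.

    import numpy as np

    def local_linear(x, y, h, x_eval):
        """Local linear estimate of m at each point of x_eval (criterion (4)),
        with normal-density kernel weights of standard deviation h."""
        est = np.empty(len(x_eval))
        for k, x0 in enumerate(x_eval):
            w = np.exp(-0.5 * ((x - x0) / h) ** 2)        # kernel weights
            D = np.column_stack([np.ones_like(x), x - x0])
            DtW = D.T * w                                  # D'W with W = diag(w)
            alpha_hat, beta_hat = np.linalg.solve(DtW @ D, DtW @ y)
            est[k] = alpha_hat                             # m-hat(x0) is the local intercept
        return est

For example, local_linear(x, y, 1.0, np.linspace(x.min(), x.max(), 100)) evaluates the curve on a regular grid.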

The weight function w(·; h) should be a smooth, symmetric, unimodal function which is here taken to be a normal density function with mean 0 and standard deviation h. Local linear regression has a number of attractive properties. Conceptually, it can be viewed as a relaxation of the usual linear regression model. As h becomes very large the weights attached to the observations by the kernel functions become nearly equal and the curve estimate approaches the fitted least squares regression line. It is appealing to have this standard model within the nonparametric formulation. From a more theoretical perspective, Fan & Gijbels (1992) and Fan (1993) showed the excellent properties which this estimator possesses.

However, covariates which are defined on a cyclical scale, referring for example to seasonal information, require a different treatment. Since there is no natural analogue of linear regression on a cyclical scale, a local mean estimator can be constructed as

    \min_{\alpha} \sum_{i=1}^{n} \{y_i - \alpha\}^2 \, w(x_i - x; h),    (5)

where the weight function is now defined as

    w(x_i - x; h) = \exp\left\{ \frac{1}{h} \cos\left( 2\pi \, \frac{x_i - x}{r} \right) \right\},

with r representing the length of the sample space for x.

The use of this von Mises weight function ensures that the estimate, taken again to be the least squares solution α̂, is adapted to the cyclical scale, with observations at one end influencing the estimate at the other end. Loader (1999) describes an alternative approach based on a suitably scaled sin function.

All of these forms of smoothing are based on weighted least squares criteria and so the resulting estimates can be expressed as linear combinations of the elements of the vector of response data y. It is convenient to define a smoothing matrix S whose rows contain the weights which are appropriate for particular estimation points on the scale of the covariate. The vector of estimated values m̂ at these points then has the convenient representation m̂ = Sy. This is particularly useful for the construction of standard errors and methods of model comparison. These aspects are discussed below.

McMullan et al. (2005) considered the problem of nonparametric estimation in additive models when the error terms are correlated. In the context of model (3), the vector of errors ε is assumed to have variance matrix σ²V, where V is a correlation matrix. McMullan et al. (2005) suggested moving to a scale where a model with independent errors could be applied, through the transformation z = K⁻¹y, where the correlation matrix V has the Cholesky decomposition V = KK'. The model (3) can then be written as z = m̃(x) + η, where the elements of the error vector η are now independent and standard methods of fitting can therefore be applied. The regression function m̃ is equivalent to K⁻¹m. The structure of the regression function, and specifically the hypothesis m̃ = m = 0, can be examined on the new model scale. However, estimates of m by back-transformation from m̃ can be problematic in the absence of conditions on K⁻¹ which guarantee smoothness in the end result.

An alternative formulation in the case of correlated data arises by first rewriting the local least squares criterion (4) in vector-matrix form as {y − α1_n − Xβ}' W {y − α1_n − Xβ}, where X denotes a vector with ith element (x_i − x) and the matrix W has diagonal elements w(x_i − x; h) with 0's elsewhere. This immediately suggests a local least squares criterion which incorporates the correlation structure directly in

    \{y - \alpha 1_n - X\beta\}' \, (K^{-1})' W K^{-1} \, \{y - \alpha 1_n - X\beta\}.    (6)
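A minimal numerical sketch of minimising criterion (6) follows; it simply solves the generalised weighted least squares problem at each evaluation point. This is an illustration written for this exposition, not the authors' implementation.

    import numpy as np

    def local_linear_corr(x, y, h, x_eval, V):
        """Local linear smoothing under correlated errors, minimising (6),
        where V = K K' is the error correlation matrix."""
        Kinv = np.linalg.inv(np.linalg.cholesky(V))    # K^{-1}, with V = K K'
        est = np.empty(len(x_eval))
        for k, x0 in enumerate(x_eval):
            w = np.exp(-0.5 * ((x - x0) / h) ** 2)
            M = Kinv.T @ np.diag(w) @ Kinv             # (K^{-1})' W K^{-1}
            D = np.column_stack([np.ones_like(x), x - x0])
            theta = np.linalg.solve(D.T @ M @ D, D.T @ M @ y)
            est[k] = theta[0]                          # the local intercept alpha-hat
        return est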

Since W is diagonal, the central matrix expression can be written as (K⁻¹)'WK⁻¹ = WV⁻¹ = V⁻¹W, for computational ease. However, the form (6) emphasises the connection with the transformation approach of McMullan et al. (2005).

In solving the criterion (4) for the standard local linear estimator, an explicit solution can be derived. The details are given in Wand & Jones (1995), Bowman & Azzalini (1997) and other authors. In a similar manner, an explicit expression for the case of correlated errors can be derived for the value of α̂ in the minimisation of (6), leading to

    \hat m(x) = \frac{2\left(\sum_{i,j} v_{ij} w_j (y_i + y_j)\right)\left(\sum_{i,j} v_{ij} w_j x_i x_j\right) - \left(\sum_{i,j} v_{ij} w_j (x_i y_j + x_j y_i)\right)\left(\sum_{i,j} v_{ij} w_j (x_j + x_i)\right)}{4\left(\sum_{i,j} v_{ij} w_j x_i x_j\right)\left(\sum_{i,j} v_{ij} w_j\right) - \left(\sum_{i,j} v_{ij} w_j (x_j + x_i)\right)^2},    (7)

where v_{ij} indicates the (i, j)th element of the inverse correlation matrix V⁻¹ and w_j = w(x − x_j; h). This reduces to the appropriate expression in the case of independent errors. The local constant estimator is a special case of the local linear estimator, and its explicit representation follows as

    \hat m(x) = \frac{\sum_i \sum_j v_{ij} w_j (y_i + y_j)}{2 \sum_i \sum_j v_{ij} w_j}.

Local linear regression smoothing can be easily extended to the case of two covariates. With independent errors, the estimate can be defined as the least squares solution α̂ in the problem

    \min_{\alpha, \beta_1, \beta_2} \sum_{i=1}^{n} \{y_i - \alpha - \beta_1(x_{i1} - x_1) - \beta_2(x_{i2} - x_2)\}^2 \, w(x_{i1} - x_1; h_1) \, w(x_{i2} - x_2; h_2).    (8)

With correlated data, formulation (6) applies with X denoting a matrix whose ith row is (x_{i1} − x_1, x_{i2} − x_2) and β denoting the vector (β_1, β_2)'.

An explicit solution for α̂ is again available by performing the detailed algebra, along the lines described by Bowman & Azzalini (2003) in the independent case. In all cases, the vector of fitted values at a set of estimation points of interest continues to have the vector-matrix representation m̂ = Sy. However, S now involves the elements of the correlation matrix V as well as the weights w_j and the covariate values x_i.
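For completeness, a sketch of the bivariate local linear estimator follows, covering both the independent-errors criterion (8) and the correlated-errors form obtained from (6); again the code is illustrative and assumes normal kernel weights.

    import numpy as np

    def local_linear_2d(x1, x2, y, h1, h2, x0, V=None):
        """Bivariate local linear estimate at the point x0 = (x01, x02).
        With V=None this minimises criterion (8); otherwise the weight
        matrix of criterion (6) is used, with V = K K'."""
        w = (np.exp(-0.5 * ((x1 - x0[0]) / h1) ** 2)
             * np.exp(-0.5 * ((x2 - x0[1]) / h2) ** 2))
        if V is None:
            M = np.diag(w)                              # independent errors
        else:
            Kinv = np.linalg.inv(np.linalg.cholesky(V))
            M = Kinv.T @ np.diag(w) @ Kinv              # (K^{-1})' W K^{-1}
        D = np.column_stack([np.ones_like(x1), x1 - x0[0], x2 - x0[1]])
        theta = np.linalg.solve(D.T @ M @ D, D.T @ M @ y)
        return theta[0]                                 # alpha-hat = m-hat(x0)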

2.2 Additive Models

Univariate and bivariate smoothing procedures can be used as building blocks in the estimation of individual components of the additive model (1). The standard approach employs the backfitting algorithm which iteratively applies smoothing with respect to each covariate, using as response the residuals based on the other model components. The details are described by Hastie & Tibshirani (1990) but the heart of the process is expressed in the iterative scheme

    \hat m_j^{(l)} = S_j \left( y - \hat\mu - \sum_{k<j} \hat m_k^{(l)} - \sum_{k>j} \hat m_k^{(l-1)} \right), \quad j = 1, \ldots, p,    (9)

where m̂_j^{(l)} indicates the estimate of component m_j at iteration l and S_j denotes the smoothing matrix associated with component j, as defined in the section above on single components. In order to ensure unique definitions of the estimators, the intercept term can be held at µ̂ = ȳ, the sample mean, throughout, and an additional adjustment to ensure that Σ_i m̂_j^{(l)}(x_{ij}) = 0 can be applied at each step. The initial values of m̂_j can be set to 0 and the process continues until convergence.

For a variety of reasons it is very useful to have available the projection matrices which create the final estimates of the additive components, after convergence. Explicit expressions can be derived for the case of two components. More generally, Hastie & Tibshirani (1990) explain how the projection matrices can be recovered retrospectively by repeated use of indicator variables. However, it is straightforward simply to keep track of the relevant matrices as the iterations proceed. If P_j^{(l)} denotes the matrix of constants which produces the current estimate of component m_j as m̂_j = P_j^{(l)} y, then the backfitting scheme (9) can be expressed as

    P_j^{(l)} = (I_n - P_0) \, S_j \left( I_n - \sum_{k \neq j} P_k \right),    (10)

where P_0 represents an n × n matrix filled with the value 1/n, I_n represents the identity matrix of order n, the sum runs over 0 ≤ k ≤ p and each P_k takes its most recently updated value. At each stage, the updated projection matrix P_j^{(l)} remains independent of the data y.

The result after convergence is a set of projection matrices {P_j; j = 1, ..., p} which create the estimates of the individual components and the fitted values as ŷ = Py, where P = Σ_{j=0}^{p} P_j.

In the discussion above it has been assumed that the correlation matrix V is known. In practice, it will often be required to estimate this. As in the case of linear models, an effective strategy is to fit an independence model and use the residuals from this to identify a suitable structure for the error component. This follows the approach of Niu (1996). In the nonparametric case an additional issue arises as a result of the bias which is inevitably present in the estimation of the regression function. This bias will be transferred to the residuals, leading to inflation of the estimates of the correlation parameters. This is most easily seen by considering the residuals r_i = y_i − m̂(x_i), where m̂(x_i) denotes the fitted additive model µ̂ + Σ_j m̂_j(x_{ij}), evaluated at the ith observation. Estimates of correlation are derived from products of the form r_i r_j, whose mean value has principal terms E{ε_i ε_j} + b_i b_j, where b_i denotes the bias m(x_i) − E{m̂(x_i)}. Since the bias is smooth in the covariates, b_i and b_j will have the same sign where observations i and j are close. Where the covariates are themselves time related, this therefore inflates the estimates of correlation. However, the effect is a conservative one, leading to a more cautious interpretation of the regression components in the fitted model.
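To make the backfitting scheme concrete, a short sketch is given below; it tracks the projection matrices as in (10) so that component estimates, fitted values and the overall projection matrix P are all available after convergence. This is an illustrative implementation written for this discussion, not the authors' code.

    import numpy as np

    def backfit(S_list, y, tol=1e-8, max_iter=200):
        """Backfitting via (9)-(10): S_list holds the smoothing matrices
        S_1, ..., S_p; returns component estimates and projection matrices."""
        n = len(y)
        p = len(S_list)
        I = np.eye(n)
        P0 = np.full((n, n), 1.0 / n)                # intercept: mu-hat = y-bar
        P = [np.zeros((n, n)) for _ in range(p)]     # initial estimates are zero
        for _ in range(max_iter):
            P_old = [Pj.copy() for Pj in P]
            for j in range(p):
                others = P0 + sum(P[k] for k in range(p) if k != j)
                P[j] = (I - P0) @ S_list[j] @ (I - others)
            if max(np.abs(P[j] - P_old[j]).max() for j in range(p)) < tol:
                break
        P_total = P0 + sum(P)                        # overall projection matrix
        return [Pj @ y for Pj in P], P, P_total      # fitted values: P_total @ y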


Figure 2: Additive models for the ln(SO2) data, based on an assumption of AR(1) errors. The left and right hand panels show the estimated components for the trend over years and the seasonal effect of weeks respectively. The dashed lines identify a distance of two standard errors from the estimate.

With the SO2 data, an additive model was fitted using the smoothing parameters h_1 = 1 for years and h_2 = 0.3 for seasonality. These values were chosen to reflect the substantial degrees of smoothness expected in the trend and seasonal curves. The autocorrelation function of the residuals from an independent additive model shows that a simple AR(1) process provides a good description for the error terms. As in most applications, principal interest lies in the mean structure and so it is sufficient to adopt a simple model for the errors which allows the principal effects of correlation to be incorporated without undue complexity. Where there is strong evidence that more complex time series models are required, these can be adopted without difficulty. The estimated correlation parameter, using the residuals from the independence model, is ρ̂ = 0.244, indicating a mild degree of correlation. An estimate of the correlation matrix V is then available by setting the (i, j)th entry to ρ̂^|i−j|. Niu (1996) proposes iteration between estimation of the mean function using the current estimate of the correlation structure and estimation of the error structure from the current residuals. In the work described in the present paper a simple one-step approach was found to be sufficient.
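A sketch of this one-step estimation of the AR(1) correlation structure from the independence-model residuals is given below; the lag-1 moment estimator shown is one natural choice, though the paper does not specify the exact estimator used.

    import numpy as np

    def ar1_correlation(resid):
        """Estimate rho from the lag-1 autocorrelation of the residuals and
        build the AR(1) correlation matrix with (i, j)th entry rho^|i-j|."""
        r = resid - resid.mean()
        rho_hat = np.sum(r[1:] * r[:-1]) / np.sum(r ** 2)
        idx = np.arange(len(resid))
        V = rho_hat ** np.abs(idx[:, None] - idx[None, :])
        return rho_hat, V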


Figure 2 illustrates the estimated components of the additive model which expresses the trend and seasonal effects on SO2 simultaneously and also incorporates the fitted correlation structure. The steady decline over the years and the marked seasonal pattern are both clear. However, it is also important to indicate the variability associated with these estimates through their standard errors. These can easily be derived from the representation of the component estimates as m̂_i = P_i y through the expression var{m̂_i} = P_i V P_i' σ². An estimate of σ² can be constructed from the residual sum-of-squares, which can be written as y'Γy, where Γ = (I − P)' V⁻¹ (I − P). This has mean value

    E\{y' \Gamma y\} = E\{\varepsilon' \Gamma \varepsilon\} + m'(I - P)' V^{-1} (I - P) m = \mathrm{tr}\{V\Gamma\}\,\sigma^2 + b' V^{-1} b,

where b denotes the vector of bias terms at the observed data points. This therefore leads to an estimate of σ² as

    \hat\sigma^2 = \frac{y' \Gamma y}{\mathrm{tr}\{V \Gamma\}}.

Since this has mean value σ² + b'V⁻¹b / tr{VΓ}, the effect of bias is again conservative, inflating the estimate of σ². Following the terminology of Hastie & Tibshirani (1990), the normalising constant tr{VΓ} is referred to as the approximate degrees of freedom for error. The corresponding approximate degrees of freedom for the model is ν = n − tr{VΓ}. When V is replaced by the identity matrix this reduces to one of the standard definitions of approximate degrees of freedom used in the independent errors case.

Estimated standard errors for the model component m̂_i = P_i y are then available as the square root of the diagonal entries of the matrix P_i V P_i' σ̂². The dashed lines in Figure 2 illustrate a band corresponding to ±2 estimated standard errors around the estimate. The high precision of estimation is apparent. However, it is of interest to consider the effect of failing to incorporate the correlation structure in the estimation process. This can easily be done by replacing the matrix V in the expression above by the identity matrix. In this example there is very little difference between the estimates for each model component, reflecting the fact that the independence assumption leads to a valid, although not necessarily fully efficient, estimator.

However, the standard errors under an independence assumption are reduced by around 25% relative to those which incorporate the correlation structure. This marked effect indicates the benefit of adopting a model which properly incorporates correlated errors and therefore gives a more realistic assessment of precision in estimation.
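These variance calculations translate directly into code; the sketch below computes σ̂², the approximate degrees of freedom and componentwise standard errors from the projection matrices, under the assumptions of the formulas above.

    import numpy as np

    def variance_summaries(y, P_total, P_components, V):
        """sigma2-hat = y' Gamma y / tr(V Gamma), with
        Gamma = (I - P)' V^{-1} (I - P); also returns the approximate df
        and standard errors for each component."""
        n = len(y)
        R = np.eye(n) - P_total
        Gamma = R.T @ np.linalg.solve(V, R)      # (I - P)' V^{-1} (I - P)
        df_error = np.trace(V @ Gamma)           # approximate df for error
        sigma2 = (y @ Gamma @ y) / df_error
        nu = n - df_error                        # approximate df for the model
        se = [np.sqrt(np.diag(Pi @ V @ Pi.T) * sigma2) for Pi in P_components]
        return sigma2, nu, se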

3 Comparing nonparametric models

An essential part of using additive models is the ability to compare different candidate models and hence to assess the evidence for the retention or omission of particular components. Some terms may also be well described by parametric rather than nonparametric forms. Hastie & Tibshirani (1990) advocated the use of an F statistic, analogous to the standard procedure for linear models. Specifically, evidence against the null hypothesis that the data were generated by a model M0, as opposed to the larger model M1, is contained in the test statistic

    F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(\nu_1 - \nu_0)}{\mathrm{RSS}_1/(n - \nu_1)},    (11)

constructed from the residual sums-of-squares (RSS) and approximate degrees of freedom (ν) of the two models. Hastie & Tibshirani (1990) proposed that this statistic should be compared to an F distribution with ν_1 − ν_0 and n − ν_1 degrees of freedom, again by analogy with linear models. The presence of bias in the residual sums-of-squares and the absence of the required properties in the underlying projection matrices mean that the test statistic will not follow an F distribution under the null hypothesis, M0. However, this distribution does provide a helpful benchmark.

The expression RSS = y'Γy, where Γ denotes the matrix (I − P)' V⁻¹ (I − P), was derived in Section 2. This can be conveniently extended here to express the residual sums-of-squares for models M0 and M1 as RSS_k = y'Γ_k y, for the models indexed by k = 0, 1. It follows that

    E\{\mathrm{RSS}_0 - \mathrm{RSS}_1\} = E\{\varepsilon'(\Gamma_0 - \Gamma_1)\varepsilon\} + b_0' V^{-1} b_0 - b_1' V^{-1} b_1,

where b_0 and b_1 denote the bias vectors under the two models.

It is very difficult to derive explicit expressions for the two bias terms, although Opsomer (2000) derived recursive expressions. The guiding principle that bias is controlled by curvature and the parameters h² and σ² remains true, but the situation is made very complex by the interactions between the covariates as the backfitting algorithm progresses. However, in order to ensure that the bias terms cancel one another out as far as possible, it is clear that the same smoothing parameters should be used when fitting both models.

An alternative approach to the distributional calculations arises by expressing the F statistic in terms of quadratic forms as

    F = \frac{y^T A y}{y^T B y},

where A is the matrix (Γ_0 − Γ_1)/(ν_1 − ν_0) and B is the matrix Γ_1/(n − ν_1). The significance of the statistic F can then be expressed through its p-value, as

    p = P\left\{ \frac{y^T A y}{y^T B y} > F_{\mathrm{obs}} \right\} = P\{y^T C y > 0\},

where F_obs denotes the value calculated from the observed data, and C is the matrix given by (A − F_obs B). Following the discussion on bias cancellation above, the distributional calculations replace y^T C y by ε^T C ε. Johnson and Kotz (1972) summarize general results about the distribution of a quadratic form in normal variables. Specifically, the jth cumulant of the distribution of ε^T C ε is available as

    2^{j-1} (j-1)! \, \mathrm{tr}\{(VC)^j\}.

This allows the moments to be calculated and matched to a convenient distributional approximation, such as a shifted and scaled chi-squared distribution, a + bχ²_c. Bowman & Azzalini (1997) give all the details of this process.
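A sketch of this calculation follows, matching the first three cumulants of ε^T C ε to a + bχ²_c; it assumes b > 0, which holds in the usual case where the observed statistic leaves some probability in the upper tail.

    import numpy as np
    from scipy.stats import chi2

    def quadratic_form_pvalue(Gamma0, Gamma1, nu0, nu1, n, V, F_obs):
        """p-value P{eps' C eps > 0} for the F statistic, using the cumulants
        2^(j-1) (j-1)! tr{(VC)^j} and a shifted, scaled chi-squared match."""
        A = (Gamma0 - Gamma1) / (nu1 - nu0)
        B = Gamma1 / (n - nu1)
        C = A - F_obs * B
        VC = V @ C
        k1 = np.trace(VC)                       # first cumulant
        k2 = 2.0 * np.trace(VC @ VC)            # second cumulant
        k3 = 8.0 * np.trace(VC @ VC @ VC)       # third cumulant
        b = k3 / (4.0 * k2)                     # match a + b*chi2_c: k2 = 2 b^2 c,
        c = k2 / (2.0 * b ** 2)                 # k3 = 8 b^3 c, k1 = a + b c
        a = k1 - b * c
        return chi2.sf(-a / b, c)               # P{chi2_c > -a/b}, assuming b > 0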


4 A simulation study

A simulation study was carried out to assess the performance of the model comparison tests discussed in the previous section. The study was formulated to match the general patterns of the SO2 data introduced in Section 1. Data were therefore simulated from the additive model

    y = 2 - \beta(\mathrm{year} - 1990) + 0.5\cos(2\pi\,\mathrm{week}/53) + \varepsilon,    (12)

for different values of the parameter β, with the errors ε sampled from an AR(1) process with standard deviation 0.5 and correlation parameter ρ. Simulations consisted of generating 200 data sets of 11 full years of data, from 1990 to 2000. Comparisons of the models

    M0: y = α + m_2(week) + ε,
    M1: y = α + m_1(year) + m_2(week) + ε

were carried out to assess evidence for the presence of trend over the years. The following steps provide a summary of the estimation and testing procedures discussed in Section 3; a sketch of the data generation is given after the list.

1. Model M1 is fitted to the data using the vector of smoothing parameters h.

2. The residuals from this model are used to construct an autocorrelation function, assumed to be of AR(1) form, from which an estimate ρ̂ of the correlation parameter ρ is produced.

3. Models M0 and M1 are fitted to the data using the AR(1) correlation matrix with parameter ρ̂, and the vector of smoothing parameters h.

4. These two fitted models are compared through an F statistic, using both the approximate F distribution and quadratic form methods to calculate the p-values.
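The following sketch generates one data set from model (12) with AR(1) errors; the innovation variance is scaled so that the marginal standard deviation is 0.5, which is one natural reading of the simulation description.

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate(beta, rho, sd=0.5):
        """One data set from model (12): 11 years of weekly data, AR(1) errors."""
        years = np.arange(1990, 2001)
        year = np.repeat(years, 53)
        week = np.tile(np.arange(1, 54), len(years))
        eps = np.empty(len(year))
        eps[0] = rng.normal(0.0, sd)
        for t in range(1, len(eps)):    # AR(1) with marginal sd equal to sd
            eps[t] = rho * eps[t - 1] + rng.normal(0.0, sd * np.sqrt(1 - rho ** 2))
        y = 2 - beta * (year - 1990) + 0.5 * np.cos(2 * np.pi * week / 53) + eps
        return year, week, y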


Table 1 documents the empirical sizes of the tests when data are generated from the null hypothesis M0 by setting the simulation parameter β = 0. Within each cell of the table, the first value refers to the approximate F-test while the second refers to the quadratic form calculation. The results from these two methods are very similar. The upper section of the table shows how the use of a particular fixed value ρ̃ in the correlation matrix V affects the size of the test under different values of the true correlation parameter ρ. The size is well calibrated when ρ̃ is set to the true value but undershoots markedly if ρ̃ is set too high and overshoots markedly if ρ̃ is set too low. In particular, the very poor calibration of the test under the independence model (ρ̃ = 0) when the data are correlated is apparent. It is reassuring to see that use of an estimated correlation coefficient ρ̂ performs very well in keeping the empirical size close to its target value of 0.05. It is only when the degree of correlation becomes strong and the smoothing parameter is set at a low value that the test runs into difficulty. The size of the test is also not greatly affected by changes in the degree of smoothing, investigated here through multiples of the reference values h_1 = 1.3 and h_2 = 0.4 for the trend and seasonal components respectively. This range of values spans the degree of smoothing used with the SO2 data. Similar results were obtained in simulations from other functions used to define the null hypothesis.

In order to investigate the power of the test, data were simulated from (12) with β = 0.025. This represents a very small degree of trend which is not easily detected visually. However, the lower section of Table 1 shows that the test is indeed effective at identifying its presence. As expected, the power of the test reduces as the degree of correlation in the data is increased. The top left hand panel of Figure 3 gives a visual impression of the sensitivity of the test by plotting data from model (12) with β = 0.025 and ρ = 0.2. The graphical indication of trend is unclear, while the evidence for the presence of trend from the test is very strong, with p-values 0.009 and 0.004 from the approximate F and quadratic form calculations respectively.


                      ρ = 0                                    ρ = 0.2                                  ρ = 0.4
    h multiplier  0.67         1            1.5          0.67         1            1.5          0.67         1            1.5
    Size
    ρ̃ = 0.4      0.00/0.00    0.00/0.00    0.00/0.00    0.00/0.00    0.005/0.010  0.00/0.00    0.015/0.03   0.02/0.035   0.025/0.025
    ρ̃ = 0.2      0.00/0.00    0.00/0.00    0.005/0.01   0.025/0.04   0.015/0.05   0.015/0.04   0.275/0.36   0.185/0.23   0.135/0.17
    ρ̃ = 0.0      0.035/0.06   0.02/0.03    0.025/0.03   0.195/0.25   0.165/0.225  0.10/0.11    0.48/0.565   0.375/0.445  0.325/0.375
    ρ̂            0.04/0.06    0.035/0.055  0.02/0.02    0.065/0.09   0.045/0.06   0.03/0.045   0.10/0.145   0.06/0.075   0.03/0.03
    Power
    ρ̂            0.885/0.915  0.895/0.935  0.90/0.92    0.69/0.75    0.705/0.76   0.67/0.72    0.46/0.555   0.47/0.53    0.41/0.475

Table 1: The upper section of the table shows the empirical sizes of the test to compare models M0 and M1. For each parameter setting, 200 datasets were simulated from model M0, as described in the text. The smoothing parameters were set to h_1 = 1.3, h_2 = 0.4, multiplied by the values indicated in the table. ρ̃ refers to the use of a fixed value of the correlation parameter while ρ̂ refers to the use of an estimated correlation coefficient. Within each cell, the first value refers to the approximate F-test while the second refers to the quadratic form calculation. The lower section of the table documents the empirical power of the test using data simulated from model (12) with β = 0.025.

In Section 2, bivariate terms were considered. These may be regarded as expressing interaction between the effects of two variables. Comparisons of a bivariate model of the form

    M2: y = α + m_{12}(year, week) + ε

with the additive model M1 were therefore also carried out to assess the effectiveness of the test in this setting. In order to assess size, data were simulated from the additive model (12) with β = 0.2 to mimic the patterns evident in the SO2 data. The results of the size simulations are displayed in Table 2. The only case where size is markedly inflated is again when a very small smoothing parameter is used and the correlation in the data becomes strong. However, there is also an indication that the quadratic form method of calculation is generally more effective than the use of the approximate F-test. In almost all cases, the approximate F-test produces too few small values while the quadratic form calculation produces empirical sizes which are closer to the target value of 0.05.

                          ρ = 0                                    ρ = 0.2                                  ρ = 0.4
    Size
    h multiplier      0.67         1            1.5          0.67         1            1.5          0.67         1            1.5
    ρ̃ = ρ            0.02/0.075   0.01/0.05    0.015/0.02   0.00/0.085   0.00/0.04    0.01/0.045   0.005/0.03   0.015/0.07   0.005/0.035
    ρ̂                0.02/0.09    0.005/0.025  0.035/0.04   0.01/0.018   0.005/0.055  0.005/0.015  0.15/0.285   0.005/0.095  0.00/0.03
    Power
    h multiplier      1            1.5          2            1            1.5          2            1            1.5          2
    Amplitude change  0.96/0.995   0.965/0.99   0.915/0.925  0.955/0.94   0.79/0.91    0.68/0.86    0.65/0.795   0.49/0.72    0.415/0.615
    Phase change      1.00/1.00    0.99/1.00    0.975/0.955  0.995/0.965  0.92/0.975   0.795/0.96   0.795/0.86   0.65/0.815   0.49/0.75

Table 2: The upper section of the table shows the empirical sizes of the test to compare the additive and bivariate models, M1 and M2. For each parameter setting, 200 datasets were simulated from the additive model. The standard deviation of the errors was 0.5. The smoothing parameters were set to h_1 = 1.3, h_2 = 0.4, multiplied by the values indicated in the table. ρ̃ refers to the use of a plug-in value of the correlation parameter while ρ̂ refers to the use of an estimated correlation coefficient. Within each cell, the first value refers to the approximate F-test while the second refers to the quadratic form calculation. The lower section of the table shows the empirical powers of the test when data were simulated from models (13) and (14), with the standard deviation of the errors again 0.5. The correlation used is the estimated correlation coefficient.

In order to compute the power of the test in the bivariate setting, data were simulated from a model where the seasonality changes across the years. Two kinds of seasonal changes were considered, namely

    y = 2 - 0.2(\mathrm{year} - 1990) - \gamma(\mathrm{year} - 2001)\cos(2\pi\,\mathrm{week}/53) + \varepsilon,    (13)

    y = 2 - 0.2(\mathrm{year} - 1990) - \gamma\cos(2\pi\,\mathrm{week}/53 - \lambda(\mathrm{year} - 1990)) + \varepsilon.    (14)

In both models the seasonal cycle is described by a cosine function, but model (13) is characterized by changes in amplitude, while model (14) is characterized by changes in phase. The empirical power of the test is documented in Table 2, where good performance is exhibited. For the amplitude change the parameter was set to γ = 0.05, while for the phase change the parameters were γ = 0.5 and λ = 0.125. Estimated correlation parameters were used on each occasion. An illustration is given in Figure 3, where the upper right hand panel shows data simulated with amplitude change while the lower panels illustrate data where there is phase change in the seasonal component.



Figure 3: The top left panel shows data simulated from model (12), incorporating a small trend over the years. The top right hand panel shows data simulated from model (13), incorporating a small change in amplitude over the years. The lower panels illustrate data from model (14), which has a change in the phase of the seasonal pattern. Data are plotted for years 1990, 1995 and 2000, with the underlying cosine functions superimposed.


The inadequacy of an additive model is clearly indicated by the p-values of 0.001 and < 0.0001 respectively, using the quadratic form method of calculation.

5 Application to the air pollution data

The data analysed in this section are the weekly means of the natural logarithm of concentrations of SO2 monitored at Waldhof (DE02) from 1983 up to 2000. These data have been monitored by the EMEP network (Co-operative Programme for Monitoring and Evaluation of the Long Range Transmission of Air Pollutants in Europe) and are available on the web at www.emep.int. Meteorological data, on a weekly scale, on precipitation, temperature, humidity, wind speed and wind direction were also available. The wind information was combined into a mean direction, weighted by speed, to reflect the dominant direction of air flow. As a result of skewness, SO2 and rainfall were transformed to log scales.

One question of interest is the extent to which meteorology may explain the variation in observed SO2. This can be tackled by fitting the following two nested additive models:

    M3: ln(SO2) = µ + m_y(year) + m_w(week) + ε,
    M4: ln(SO2) = µ + m_y(year) + m_w(week) + m_r(rain) + m_t(temp) + m_h(humidity) + m_a(air) + ε.

Estimates of each component of M4 are shown in Figure 4, using the smoothing parameters 1, 0.3, 1, 2, 3, 0.3 for years, weeks, rain, temperature, humidity and air flow respectively. The usefulness of the explanatory information contained in the meteorological data is confirmed by comparing M3 and M4 more formally in a hypothesis test. The resulting p-values are lower than 0.001, using both the F-distribution and quadratic form calculations. However, comparison of the top two panels of Figure 4 with the plots in Figure 2 shows that the estimates of the trend and seasonal effects change very little between the two models.

This provides reassurance that these components should be well estimated even at stations where meteorological information is not available.

Some of the panels also raise the issue of whether the covariate effects may be linear rather than nonparametric in nature. The adoption of a linear effect for one or more of the terms in model M4 creates a semiparametric model. The estimation process for models of this type has been thoroughly discussed by Hastie & Tibshirani (1990), Green & Silverman (1994) and other authors. In the present context, semiparametric models can be fitted simply by replacing the smoothing matrices for the appropriate terms in the iterative scheme (10) by the usual matrix expressions for the fitted values in a linear regression; a short sketch of this substitution is given below. The model comparison methods described in Section 3 follow immediately, using the projection matrix P produced by the backfitting algorithm. When model M4 is compared with the restricted models which constrain each term in turn to be linear, all the nonparametric terms are found to be highly significant, with the exception of humidity, whose p-value is 0.61 using the quadratic form calculations.

A further question of interest is whether there is any evidence for a change in the pattern of seasonality across the years. This can be assessed through the model

    M5: ln(SO2) = µ + m_{yw}(year, week) + m_r(rain) + m_t(temp) + m_h(humidity) + m_a(air) + ε.

M4 and M5 both include the meteorological variables but M5 includes trend and seasonality in a bivariate component. Model M4 therefore assumes that the pattern of seasonality is constant across years while M5 allows interaction between years and weeks. A comparison of these two models in a hypothesis test produces the p-values 0.664 and 0.739, using the F-distribution and quadratic form calculations respectively. There is therefore no indication that, at Waldhof, the seasonality of ln(SO2) changes across the years. This is consistent with the shape of the estimate of the bivariate component shown in Figure 5.
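The substitution mentioned above amounts to using the hat matrix of a linear fit in place of a smoothing matrix S_j; a minimal sketch, with an illustrative function name, is:

    import numpy as np

    def linear_hat_matrix(x):
        """Hat matrix of a straight-line fit in covariate x; using this in
        place of S_j in scheme (10) constrains term j to be linear."""
        X = np.column_stack([np.ones_like(x), x])
        return X @ np.linalg.solve(X.T @ X, X.T)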


Figure 4: The full lines show the estimates of the components of the additive model M4 for ln(SO2 ) at Waldhof (DE02). The dashed lines identify a distance of two standard errors from the estimates. The dotted lines and shaded regions identify the estimates and a distance of two standard errors under the assumption of independent errors.



Figure 5: The estimate of the bivariate component for trend and seasonality from model M5 .

There is very little indication of any departure from an additive structure.

6 Discussion

Techniques for fitting and testing additive models with correlated data have been described. Simulation studies show the importance of using adjustments of this type when correlation is present. The methods have been applied to measurements of air pollution in the form of SO2 concentrations and an additive model has been fitted and analysed. The dotted lines and shaded areas in the panels of Figure 4 show the estimates of the components when serial correlation is ignored. In this particular example, the broad patterns remain the same but the incorporation of correlation into the model for the data places the results on a firmer methodological footing and gives confidence in the results obtained. More particularly, it provides a more realistic assessment of the variability of the estimates and a more accurate method of model comparison through the distributional calculations involving the F-statistic. The selection of a suitable smoothing parameter is an important issue


in nonparametric regression, although Bowman & Azzalini (1997) point out that the sensitivity of this choice is often less in issues of inference than in estimation. When the response data are correlated, the choice of smoothing parameter has the effect of setting the balance between the patterns in the data which are attributed to trend in the mean and those attributed to correlated errors. In the application discussed in the paper, the smoothing parameters have been selected through judgement based on background knowledge of the nature of the trends and correlations to be expected in this setting. However, the qualitative results are, in this case, not very sensitive to the smoothing parameters used. The simulations described in the paper and the discussion on the conservative effects of bias also offer reassurance that the incorporation of correlated errors can be performed in an effective and informative manner.

Acknowledgement The assistance of the Umweltbundesamt, Federal Environmental Agency of Germany, in obtaining the meteorological data used in the paper is gratefully acknowledged. Marco Giannitrapani also acknowledges support from the Chancellor’s fund of the University of Glasgow, and the Centre for Ecology and Hydrology, Edinburgh.

References

Bowman, A. W. and Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis. Oxford University Press, Oxford.

Bowman, A. W. and Azzalini, A. (2003). Computational aspects of nonparametric smoothing with illustrations from the sm library. Computational Statistics & Data Analysis 42 (4), 545–560.


Buja, A., Hastie, T. J. and Tibshirani, R. J. (1989). Linear smoothers and additive models. Annals of Statistics 17, 453–555.

Cleveland, R. B., Cleveland, W. S., McRae, J. E. and Terpenning, I. (1990). STL: a seasonal-trend decomposition procedure based on loess. Journal of Official Statistics 6, 3–73.

Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829–836.

Fan, J. and Gijbels, I. (1992). Variable bandwidth and local linear regression smoothers. Annals of Statistics 20 (4), 2008–2036.

Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Annals of Statistics 21, 196–216.

Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. Chapman & Hall, London.

Hastie, T. and Tibshirani, R. (1986). Generalized additive models (with discussion). Statistical Science 1 (3), 297–318.

Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman & Hall, London.

Hastie, T. and Tibshirani, R. (2000). Bayesian backfitting. Statistical Science 15 (3), 196–223.

Johnson, N. L. and Kotz, S. (1972). Distributions in Statistics: Continuous Univariate Distributions, Vol. II. Wiley, New York.


Kammann, E. E. and Wand, M. P. (2003). Geoadditive models. Applied Statistics 52 (1), 1–18.

Loader, C. (1999). Local Regression and Likelihood. Springer-Verlag, New York.

McMullan, A., Bowman, A. W. and Scott, E. M. (2005). Nonparametric modelling using additive models with correlated data. Technical report, Department of Statistics, The University of Glasgow.

Niu, X. F. (1996). Nonlinear additive models for environmental time series, with applications to ground-level ozone data analysis. Journal of the American Statistical Association 91 (435), 1310–1321.

Opsomer, J. D. (2000). Asymptotic properties of backfitting estimators. Journal of Multivariate Analysis 73 (2), 166–179.

Opsomer, J. D., Wang, Y. and Yang, Y. (2001). Nonparametric regression with correlated errors. Statistical Science 16 (2), 134–153.

Simonoff, J. S. (1996). Smoothing Methods in Statistics. Springer-Verlag, New York.

Stone, C. J. (1985). Additive regression and other nonparametric models. Annals of Statistics 13, 689–705.

Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman & Hall, London.

