Scientific Research and Essay Vol.4 (9), pp. 867-871, September, 2009 Available online at http://www.academicjournals.org/SRE ISSN 1992-2248 © 2009 Academic Journals

Full Length Research paper

Using of generalized additive model for model selection in multiple Poisson regression for air pollution data

Y. Terzi* and M. A. Cengiz

Department of Statistics, University of Ondokuz Mayıs, P. M. B. 55139, Samsun, Turkey.

Accepted 4 August, 2009

Multiple Poisson regression analysis is one of the most widely used statistical techniques in analysing air pollution data. It is a powerful tool when its assumptions are met, including the assumption that the relationship between the predictors and the response follows a defined function such as a straight line, a polynomial, or an exponential. In many applications, however, relying on a defined mathematical function is difficult, because many phenomena do not have a relationship that can be easily specified. Generalized additive models (GAM) relax this assumption by replacing the defined function with a non-parametric smoother that uncovers existing relationships. GAM can therefore be used for model selection in multiple Poisson regression. This study focuses on GAM for model selection in multiple Poisson regression when modelling associations between air pollution and increases in hospital admissions for respiratory disease.

Key words: Poisson regression, generalized additive model, cubic spline, air pollution.

INTRODUCTION

Air pollution has been studied for many years following a series of dramatic episodes that took place in industrial countries. Interest in this phenomenon is reflected by the number of studies published over the last decade. It still poses a health risk even for developed countries (Ballester et al., 2006). Associations between air pollution and increases in hospital admissions for respiratory disease have been observed in many studies in various parts of the world (Zanobetti et al., 2000). Some studies, but not others, found associations between sulphur dioxide (SO2) and particulate matter (PM10) exposures and daily or weekly hospital admissions for respiratory disease. Numerous studies have demonstrated an increased risk of respiratory hospitalizations in relation to airborne particles, including particulate matter with an aerodynamic diameter ≤ 10 µm (PM10) (Bell et al., 2008).
The most common and consistent associations have been found with particulate matter and SO2 (Brunekreef and Holgate, 2002). On any given day or week, only a small portion of the population is admitted to hospital. The number of admissions to hospital is a count; that is, it can only take on values limited to the non-negative integers. This suggests that a Poisson process is the underlying mechanism being modelled (Schwartz et al., 1996). Poisson regression models have therefore been used in many studies to examine the associations between daily or weekly hospital admissions and air pollution.

*Corresponding author. E-mail: [email protected].

A standard Poisson regression model assumes that the relationship between the predictors and the response follows a defined function such as a straight line, a polynomial, or an exponential. In many epidemiological studies, however, the reliance on a defined mathematical function is limiting, because many phenomena do not have a relationship that can be easily specified. Non-parametric regression relaxes the usual assumption of linearity and makes it possible to explore the data more flexibly, uncovering structure in the data that might otherwise be missed.

Hastie and Tibshirani (1990) proposed generalized additive models. These models enable the mean of the dependent variable to depend on an additive predictor through a nonlinear link function. The models permit the response probability distribution to be any member of the exponential family of distributions. Many widely used statistical models belong to this general class; they include additive models for Gaussian data, non-parametric logistic models for binary data and non-parametric log-linear models for Poisson data. Generalized additive models (GAM) provide a powerful
class of models for modelling nonlinear effects of continuous covariates in regression models with non-Gaussian responses. A huge variety of competing approaches are now available for modelling and estimating nonlinear functions of continuous covariates. GAM allows variables to be incorporated in a non-parametric way using smooth functions such as loess or splines, which avoids the need to presuppose the shape of the relation and later try to reproduce it by means of an approximate functional expression. Prominent examples are smoothing splines (Hastie and Tibshirani, 1990), local polynomials (Fan and Gijbels, 1996), regression splines with adaptive knot selection (Friedman and Silverman, 1989; Friedman, 1991; Stone et al., 1997) and P-splines (Eilers and Marx, 1996; Marx and Eilers, 1998). Currently, smoothing based on mixed model representations of GAMs and their extensions is extremely popular (Lin and Zhang, 1999; Currie and Durban, 2002; Wand, 2003; Ruppert et al., 2003).

This study models the relation between all hospitalized cases of respiratory disease at Afyon Respiratory Disease Hospital and the measures of air pollution at the city center, using GAM with a cubic smoothing spline. The study was performed by retrospective evaluation of patient records between 1 October 2007 and 30 September 2008. SO2 and PM10 values for the same period were extracted from the archives of the Afyon Environmental Department Air Pollution Unit.

MATERIALS AND METHODS

This section first describes multiple Poisson regression in the generalized linear model (GLM) context and then the methodology behind generalized additive models (GAM). Let $Y$ be a response random variable and $X_1, X_2, \ldots, X_p$ be a set of predictor variables.
A regression procedure can be viewed as a method for estimating the expected value of $Y$ given the values of $X_1, X_2, \ldots, X_p$. The standard linear regression model assumes a linear form for the conditional expectation:

$$E(Y \mid X_1, X_2, \ldots, X_p) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$$

The link function for the Poisson model is the log function. If the mean of the Poisson distribution is $\mu(x)$, the dependence of $\mu(x)$ on the independent variables $X_1, \ldots, X_p$ is expressed through the linear predictor $\eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$ as

$$g(\mu) = \log(\mu) = \eta.$$

The additive model generalizes the linear model by modelling the conditional expectation as:

$$E(Y \mid X_1, X_2, \ldots, X_p) = s_0 + s_1(X_1) + s_2(X_2) + \cdots + s_p(X_p)$$

where $s_i(X_i)$, $i = 1, 2, \ldots, p$, are smooth functions. Similar to generalized linear models, generalized additive models consist of a random component, an additive component, and a link function relating the two components. The response $Y$, the random component, is assumed to have exponential family density

$$f_Y(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$$

where $\theta$ is called the natural parameter and $\phi$ is the scale parameter. The mean of the response variable, $\mu$, is related to the set of covariates $X_1, X_2, \ldots, X_p$ by a link function $g$. The quantity

$$\eta = s_0 + \sum_{i=1}^{p} s_i(X_i),$$

where $s_1(\cdot), \ldots, s_p(\cdot)$ are smooth functions, defines the additive component, and the relationship between $\mu$ and $\eta$ is defined by

$$g(\mu) = \eta = \log \mu.$$
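In the parametric special case where every smooth term is linear, the log-link relation $\log \mu = \eta$ reduces to ordinary Poisson regression, which can be fitted by Newton-Raphson (equivalently, iteratively reweighted least squares). The following is a minimal sketch in plain Python under that assumption. It is not the SAS procedure used in this study; the data are synthetic, and the names `solve_linear`, `fit_poisson`, `exposure` and `counts` are illustrative only.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_poisson(rows, y, n_iter=25):
    """Fit log(mu) = b0 + b1*x1 + ... + bp*xp by Newton-Raphson on the Poisson likelihood."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    # Start from the intercept-only fit: log of the mean count, zero slopes.
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(n_iter):
        mu = [math.exp(sum(b * v for b, v in zip(beta, d))) for d in design]
        # Score vector X^T (y - mu) and Fisher information X^T diag(mu) X.
        score = [sum(d[j] * (yi - mi) for d, yi, mi in zip(design, y, mu))
                 for j in range(p)]
        info = [[sum(d[j] * d[k] * mi for d, mi in zip(design, mu))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Synthetic weekly counts against one made-up exposure variable (not the study data).
exposure = [[i / 10] for i in range(20)]
counts = [2, 3, 1, 4, 3, 5, 4, 6, 5, 8, 7, 9, 8, 12, 10, 14, 13, 17, 16, 21]

beta = fit_poisson(exposure, counts)
fitted = [math.exp(beta[0] + beta[1] * row[0]) for row in exposure]
print("intercept, slope:", beta)
print("sum of residuals:", sum(c - f for c, f in zip(counts, fitted)))
```

At convergence the residuals sum to approximately zero, which is the score equation for the intercept and a convenient check that the iteration has reached the maximum likelihood estimate.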

A smoother is a tool for summarizing the trend of a response measurement $Y$ as a function of one or more predictor measurements $X_1, \ldots, X_p$. It produces an estimate of the trend that is less variable than $Y$ itself. An important property of a smoother is its non-parametric nature: it does not assume a rigid form for the dependence of $Y$ on $X_1, \ldots, X_p$. In this study, we focus only on the cubic smoothing spline, which can be used with GAM. A cubic smoothing spline is the solution to the following optimization problem: among all functions $\eta(x)$ with two continuous derivatives, find the one that minimizes the penalized least squares criterion

$$\sum_{i=1}^{n} \left( y_i - \eta(x_i) \right)^2 + \lambda \int_a^b \left( \eta''(t) \right)^2 \, dt$$

where $\lambda$ is a fixed constant and $a \le x_1 \le \cdots \le x_n \le b$. The first term measures closeness to the data while the second term penalizes curvature in the function. It can be shown that there exists an explicit, unique minimizer and that this minimizer is a natural cubic spline with knots at the unique values of $x_i$. The parameter $\lambda$ is the smoothing parameter.
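The trade-off controlled by the smoothing parameter can be illustrated with a simpler discrete analogue of the criterion above. The sketch below is not a cubic smoothing spline: it assumes equally spaced observations and replaces the integrated squared second derivative with a sum of squared second differences (a Whittaker-type smoother). The data are synthetic and all function names are illustrative.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def second_difference_penalty(n):
    """Return K = D^T D, where D maps an n-vector to its n-2 second differences."""
    k = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        stencil = ((r, 1.0), (r + 1, -2.0), (r + 2, 1.0))
        for i, di in stencil:
            for j, dj in stencil:
                k[i][j] += di * dj
    return k


def roughness(z):
    """Sum of squared second differences, the discrete stand-in for the curvature penalty."""
    return sum((z[i] - 2.0 * z[i + 1] + z[i + 2]) ** 2 for i in range(len(z) - 2))


def smooth(y, lam):
    """Minimize sum (y_i - z_i)^2 + lam * roughness(z) by solving (I + lam * K) z = y."""
    n = len(y)
    k = second_difference_penalty(n)
    a = [[(1.0 if i == j else 0.0) + lam * k[i][j] for j in range(n)] for i in range(n)]
    return solve_linear(a, [float(v) for v in y])


# Synthetic, equally spaced observations (not the study data).
noisy = [1.0, 2.6, 2.1, 4.2, 3.8, 5.9, 5.1, 7.3, 6.6, 8.8, 8.0, 10.1]
for lam in (0.0, 1.0, 100.0):
    z = smooth(noisy, lam)
    print("lambda =", lam, "roughness =", round(roughness(z), 4))
```

With a penalty of zero the fit reproduces the data, and larger penalties give progressively smoother fits. A straight line has zero second differences, so it passes through unchanged for any penalty, mirroring the way the smoothing spline penalty leaves linear functions unpenalized.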

Application

This study presents the relation between all hospitalized cases of respiratory disease at Afyon Respiratory Disease Hospital and the measures of air pollution at the city center. The study was performed by retrospective evaluation of patient records between 1 October 2007 and 30 September 2008. SO2 and PM10 values for the same period were extracted from the archives of the Afyon Environmental Department Air Pollution Unit. Weekly records of hospital admissions for chronic obstructive pulmonary disease (COPD; KOAH in Turkish) were obtained at Afyon State Hospital for the period from December 2006 to December 2008. Weekly average levels of SO2 and PM10 were obtained from the environmental state agency. Weekly counts of respiratory admissions were taken as the dependent variable, with the pollutant levels as predictors, in the Poisson regression model.

The first goal of this analysis is to identify associations between air pollution and hospital admissions for respiratory disease using multiple Poisson regression in the GLM context. PROC GENMOD in SAS can be used to investigate the relationship between hospital admissions and the predictors (SO2 and PM10). The GENMOD analysis of the independent variable effects is shown in Table 1.

Table 1. Analysis of parameter estimates for standard multiple Poisson regression.

Parameter   Df   Estimate   Standard error   Wald 95% lower   Wald 95% upper   Chi-Square   Pr > ChiSq
Intercept   1    0.3932     0.1638            0.0723          0.7142            5.77        0.0163
SO2         1    0.0022     0.0013           -0.0002          0.0047            3.15        0.0759
PM10        1    0.0041     0.0011            0.0019          0.0064           12.77        0.0004

From Table 1, the first model, obtained using standard multiple Poisson regression, is

$$\log(\mu) = 0.3932 + 0.0022\,\mathrm{SO_2} + 0.0041\,\mathrm{PM_{10}}$$

where $\mu$ is the expected count of weekly hospital admissions $Y$. In Table 1, the parameter estimates show that the effect of PM10 on hospital admissions is highly significant, whereas the effect of SO2 on hospital admissions is not significant at the 5% level.

Standard multiple Poisson regression assumes a strict linear relationship between the response and the predictors. GAM can be fitted using PROC GAM to investigate a less restrictive model, with moderately flexible spline terms for each of the predictors. We request an additive model using a cubic smoothing spline for each term. Each term is fitted using a univariate smoothing spline with three degrees of freedom.

Table 2. Regression model analysis parameter estimates using GAM.

Parameter       Parameter estimate   Standard error   t value   Pr > |t|
Intercept       0.43041              0.17136          2.51      0.0136
Linear (SO2)    0.00254              0.00139          1.83      0.0704
Linear (PM10)   0.00375              0.00115          3.27      0.0014

Table 3. Smoothing model analysis of deviance using GAM with a cubic smoothing spline.

Source         Df   Sum of Squares   Chi-Square   Pr > ChiSq
Spline(SO2)    3    14.552864        8.9761       0.0296
Spline(PM10)   3     2.476951        1.5278       0.6759

Table 2 indicates that, at the 0.05 significance level, the partial predictions corresponding to PM10 have a linear pattern and those corresponding to SO2 do not. Table 3 shows whether there is a quadratic relation between hospital admissions and each pollutant: the partial predictions corresponding to SO2 have a quadratic pattern, and those corresponding to PM10 do not, at the 0.05 significance level. The results from Tables 2 and 3 can also be seen by plotting the partial predictions for SO2 and PM10. Figure 1 shows why standard multiple Poisson regression finds the effect of PM10 on hospital admissions highly significant and the effect of SO2 not significant at the 5% level: the partial predictions corresponding to SO2 have a quadratic pattern, while those corresponding to PM10 have a linear pattern.

Combining the results from the standard Poisson regression model and GAM, a new model can be constructed: a standard multiple Poisson regression with linear and quadratic terms for SO2 and only a linear term for PM10. PROC GENMOD in SAS is therefore used again to investigate the relationship between hospital admissions and the predictors (SO2 and PM10). The analysis of the new variable effects is shown in Table 4. From Table 4, the second model, obtained using standard multiple Poisson regression with the terms suggested by GAM, is

$$\log(\mu) = -0.2217 + 0.0212\,\mathrm{SO_2} + 0.0038\,\mathrm{PM_{10}} - 0.0001\,(\mathrm{SO_2})^2.$$

In this case, the analysis of parameter estimates indicates that the effects of all independent variables are significant at the 5% level.
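To make the two fitted equations concrete, the short sketch below inverts the log link to obtain expected weekly admissions and a rate ratio. It uses only the rounded coefficients printed in the two models; the pollutant levels passed in are hypothetical values chosen for illustration (the source does not state the measurement units), and the function names are illustrative.

```python
import math

# Coefficients as printed in the first and second fitted models (rounded to four decimals).
LINEAR_MODEL = {"intercept": 0.3932, "so2": 0.0022, "pm10": 0.0041}
QUADRATIC_MODEL = {"intercept": -0.2217, "so2": 0.0212, "pm10": 0.0038, "so2_sq": -0.0001}


def expected_admissions(model, so2, pm10):
    """Invert the log link: mu = exp(eta) for the given pollutant levels."""
    eta = model["intercept"] + model["so2"] * so2 + model["pm10"] * pm10
    eta += model.get("so2_sq", 0.0) * so2 ** 2  # quadratic SO2 term, if the model has one
    return math.exp(eta)


def rate_ratio(coefficient, increase):
    """Multiplicative change in the expected count for a given increase in a linear term."""
    return math.exp(coefficient * increase)


# Hypothetical pollutant levels, chosen only for illustration.
so2_level, pm10_level = 50.0, 100.0
mu_linear = expected_admissions(LINEAR_MODEL, so2_level, pm10_level)
mu_quadratic = expected_admissions(QUADRATIC_MODEL, so2_level, pm10_level)

# Rate ratio for a 10-unit increase in PM10 under the second model.
pm10_ratio = rate_ratio(QUADRATIC_MODEL["pm10"], 10.0)

# SO2 level at which the fitted quadratic curve peaks: -b / (2c).
so2_peak = -QUADRATIC_MODEL["so2"] / (2.0 * QUADRATIC_MODEL["so2_sq"])

print("expected admissions, first model:", mu_linear)
print("expected admissions, second model:", mu_quadratic)
print("rate ratio per 10 units of PM10:", pm10_ratio)
print("SO2 level at the peak of the quadratic:", so2_peak)
```

Because the quadratic coefficient is reported only as -0.0001, with confidence limits from -0.0002 to -0.0000, the peak location computed from the rounded values should be read as a rough indication rather than an estimate.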

Figure 1. Partial prediction for each predictor (SO2 and PM10).

Table 4. Analysis of parameter estimates for standard multiple Poisson regression with GAM.

Parameter   Df   Estimate   Standard error   Wald 95% lower   Wald 95% upper   Chi-Square   Pr > ChiSq
Intercept   1    -0.2217    0.2507           -0.7131           0.2697           0.78        0.3766
SO2         1     0.0212    0.0057            0.0100           0.0323          13.89        0.0002
SO2*SO2     1    -0.0001    0.0000           -0.0002          -0.0000          11.68        0.0006
PM10        1     0.0038    0.0011            0.0016           0.0060          11.23        0.0008

RESULT AND DISCUSSION

In this study, we first used a standard Poisson regression and found an association between PM10 and hospital admissions but no significant association between SO2 and hospital admissions. Second, we used a generalized additive model (GAM) for Poisson regression with a cubic smoothing spline and found that, at the 0.05 significance level, the partial predictions corresponding to PM10 have a linear pattern and those corresponding to SO2 have a quadratic pattern. We selected the new independent variable structure using GAM with a cubic spline. An important difference between the first analysis of these data with a standard Poisson regression and the subsequent analysis with GAM is that GAM indicates that SO2 is a significant predictor of hospital admissions. The difference arises because the standard Poisson regression model includes only a linear effect of SO2, whereas the GAM allows a more complex relationship, which the plots indicate is nearly quadratic. Having used the GAM procedure to discover an appropriate form of the dependence of hospital admissions on each of the two independent variables, one can use the standard Poisson regression to fit and assess the corresponding parametric
model.

REFERENCES

Ballester F, Rodríguez P, Iñíguez C, Saez M, Daponte A, Galán I, Taracido M, Arribas F, Bellido J, Cirarda FB, Cañada A, Guillén JJ, Guillén-Grima F, López E, Pérez-Hoyos S, Lertxundi A, Toro S (2006). Air pollution and cardiovascular admissions association in Spain: results within the EMECAS Project. Epidemiol. Community Health 60: 328-336.

Bell ML, Ebisu K, Peng RD, Walker J, Samet JM, Zeger SL, Dominici F (2008). Seasonal and regional short-term effects of fine particles on hospital admissions in 202 US Counties 1999-2005. Am. J. Epidemiol. 168(11): 1301-1310.

Brunekreef B, Holgate ST (2002). Air pollution and health. Lancet. 360: 1233-1242.

Currie I, Durban M (2002). Flexible smoothing with P-splines: A unified approach. Stat. Model. 4: 333-349.

Eilers PHC, Marx BD (1996). Flexible smoothing using B-splines and penalized likelihood (with comments and rejoinder). Stat. Sci. 11(2): 89-121.

Fan J, Gijbels I (1996). Local polynomial modelling and its applications. Chapman and Hall, London.

Friedman JH (1991). Multivariate adaptive regression splines (with discussion). Ann. Stat. 19: 1-141.

Friedman JH, Silverman BL (1989). Flexible parsimonious smoothing and additive modelling (with discussion). Technometrics. 31: 3-39.

Hastie T, Tibshirani R (1990). Generalized additive models. Chapman and Hall, London.

Lin X, Zhang D (1999). Inference in generalized additive mixed models by using smoothing splines. J. Royal Stat. Soc. B. 61: 381-400.

Marx BD, Eilers PHC (1998). Direct generalized additive modelling with penalized likelihood. Computational Statistics and Data Analysis. 28: 193-209.

Ruppert D, Wand MP, Carroll RJ (2003). Semi-parametric regression. Cambridge University Press.

Schwartz J, Spix C, Touloumi G, Bachárová L, Barumamdzadeh T, le Tertre A, Piekarksi T, Ponce de Leon A, Pönkä A, Rossi G, Saez M, Schouten JP (1996). Methodological issues in studies of air pollution and daily counts of deaths or hospital admissions. J. Epidemiol. Community Health p. 50.

Stone CJ, Hansen M, Kooperberg C, Truong YK (1997). Polynomial splines and their tensor products in extended linear modelling (with discussion). Ann. Stat. 25: 1371-1470.

Wand MP (2003). Smoothing and mixed models. Comput. Stat. 18: 223-249.

Zanobetti A, Schwartz J, Dockery DW (2000). Airborne particles are a risk factor for hospital admissions for heart and lung disease. Environ. Health Perspect. 108: 1071-1077.
