TITLE: INVESTIGATING THE INFLUENCE OF SYNOPTIC-SCALE METEOROLOGY ON AIR QUALITY USING SELF-ORGANIZING MAPS AND GENERALIZED ADDITIVE MODELLING.

John L. Pearce (a,*), Jason Beringer (a), Neville Nicholls (a), Rob J. Hyndman (b), Petteri Uotila (a), and Nigel J. Tapper (a)

(a) School of Geography and Environmental Science, Monash University, Melbourne, Australia
(b) Department of Econometrics and Business Statistics, Monash University, Melbourne, Australia

Keywords: air pollution, generalized additive models, self-organizing maps, synoptic meteorology.

*Corresponding author: School of Geography and Environmental Science, Monash University, Clayton, Victoria 3800. Tel: +61399054457. E-mail: [email protected]

INVESTIGATING THE INFLUENCE OF SYNOPTIC-SCALE CIRCULATION ON AIR QUALITY USING SELF-ORGANIZING MAPS AND GENERALIZED ADDITIVE MODELLING.

ABSTRACT
The influence of synoptic-scale circulations on air quality is an area of increasing interest to air quality management with regard to future climate change. This study presents an analysis in which the dominant synoptic 'types' over the region of Melbourne, Australia are determined and linked to regional air quality. First, a self-organizing map (SOM) is used to generate a time series of synoptic charts that classify the annual daily circulation affecting Melbourne into 20 different synoptic types. SOM results are then employed within the framework of a generalized additive model (GAM) to identify links between synoptic-scale circulations and observed changes in air pollutant concentrations. The GAMs estimate shifts in pollutant concentrations under each synoptic type after controlling for long-term trends, seasonality, weekly emissions, spatial variation, and temporal persistence. Results showed the aggregate impact of synoptic circulations in the models to be quite modest, as only 5.1% of the daily variance in O3, 4.7% in PM10, and 7.1% in NO2 was explained by shifts in synoptic circulations. Further analysis of the partial residual plots identified that, despite a modest response at the aggregate level, individual synoptic categories had differential effects on air pollutants. In particular, increases of up to 40% in NO2 and PM10 and 30% in O3 occur when synoptic conditions result in a north-easterly gradient wind over the Melbourne area. Additionally, NO2 and PM10 levels showed increases of up to 40% when a strong high pressure system was centered directly over the Melbourne area. In sum, the unified approach of SOM and GAM proved to be a complementary suite of tools capable of identifying the entire range of synoptic circulation patterns over a particular region and quantifying how they influence local air quality.

1. Introduction
Increased air pollutant concentrations in the urban environment do not typically result from sudden increases in emissions, but rather from meteorological conditions that impede dispersion in the atmosphere or result in increased pollutant generation (Cheng, Campbell et al. 2007). The meteorological variables important to these conditions include temperature, winds, radiation, atmospheric moisture, and mixing depth (EPA 2009). Because synoptic-scale circulations are the envelope that governs all of the above meteorological features, synoptic weather typing has become a popular approach for evaluating the impacts of meteorological conditions on air pollution (Triantafyllou 2001; Chen, Cheng et al. 2008; Beaver and Palazoglu 2009). This has led the air quality community to recognize synoptic-scale circulations as an important driver of local air pollution (EPA 2009).
We wish to increase the understanding of the relationship between synoptic-scale circulations and air pollution in Melbourne, Australia. The city of Melbourne, with a population of approximately 3.9 million (ABS 2010), is situated on Port Phillip Bay at the south-eastern edge of continental Australia, in close proximity to the Southern Ocean, at 37° 48' 49" S and 144° 57' 47" E (Figure 1). The climate of Melbourne can best be described as moderate oceanic, and the city is famous for its changeable weather conditions (BOM 2009). This is due in part to the city's location at the poleward margin of a sub-tropical continent, which results in the passage of very different air masses over the region. The mid-latitude synoptic weather systems that affect the region produce persistent westerly winds between the subtropical high pressures to the north and the Southern Ocean lows to the south. This region is also dominated by fronts, which result from the interaction of subtropical and polar air masses. Although Melbourne's air quality can be described as relatively good when compared to other urban centres of similar size, recent periods of anomalous environmental conditions present an interesting opportunity for analysis (Murphy and Timbal 2008). During this time levels of ozone and particles have not always met air quality standards, and events such as bushfires, heatwaves, and dust storms have likely influenced regional air quality. The overall objective of this research is to investigate how the dominant types of synoptic-scale circulation over Melbourne influence local air quality. Moreover, a practical methodology is presented in order to achieve these results.

2. Data
a. Large Scale Meteorological Data
Four-times-daily gridded mean sea level pressure (MSLP) data from the ERA-Interim reanalysis (1989 to 2008) were used to develop the SOM for synoptic-scale circulations affecting Melbourne. This reanalysis was produced by the European Centre for Medium-Range Weather Forecasts (ECMWF) and is discussed in more detail by Uppala et al. (2008). MSLP fields were obtained for 10 a.m., 4 p.m., 10 p.m., and 4 a.m. local standard time (LST) for each day in the reanalysis period at a spatial resolution of 0.72° over the spatial domain of 35-44° S and 140-150° E. It is important to note that the ERA40 and NCEP/NCAR reanalysis products were also trialled for this analysis. While these products produced similar climatologies, ERA-Interim (ERAI) was chosen because the enhanced spatial resolution of its data produced synoptic types that were more interpretable for the Melbourne region. MSLP was chosen as a proxy for atmospheric circulation because it has been shown to relate well to the spatial pattern of these processes.
b. Local Meteorological Data
Links between synoptic types and local weather conditions were made using daily automatic weather station observations for site number 086282 (Melbourne International Airport) for the period of 1999 to 2006. This site is located at 37° 40' 12" S and 144° 49' 48" E with an elevation of 113 m and was chosen because a comprehensive range of measures is collected on a consistent basis. Variables, provided by Climate Information Services, National Climate Centre, Bureau of Meteorology, included:
Maximum daily temperature (°C)
Mean sea level pressure (hPa)
Global radiation (MJ/m2)
Water vapour pressure (hPa)
Zonal (u) and meridional (v) wind components (km/hr)
Precipitation (mm).
Additionally, boundary layer height (BLH) was taken from the ERA-Interim data using the location of 37° 30' 0" S and 145° 30' 0" E for 4 p.m. LST, the approximate time of maximum boundary layer depth.
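For readers unfamiliar with the zonal (u) and meridional (v) wind components listed above, the following Python sketch shows the standard meteorological conversion from wind speed and direction (the direction the wind blows from, in degrees clockwise from north). The function name is hypothetical, and the manuscript does not state how the components were derived from the station record, so this is an illustration only.

```python
import math

def wind_components(speed, direction_deg):
    """Convert speed and 'blowing from' direction to (u, v) components.

    u is positive towards the east and v is positive towards the north,
    so a north-easterly wind (from 45 degrees) has negative u and v.
    """
    rad = math.radians(direction_deg)
    u = -speed * math.sin(rad)
    v = -speed * math.cos(rad)
    return u, v
```

Under this convention a westerly wind (from 270°) gives a positive u and a v close to zero.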
c. Air Pollutant Monitoring Data
Local air pollution data were provided by the Environmental Protection Authority Victoria (EPAV) and were taken from the Port Phillip Bay air monitoring network (Figure 1). Pollutants included ozone (O3), particulate matter ≤ 10 µm (PM10), and nitrogen dioxide (NO2). O3 and NO2 concentrations are reported in parts per billion by volume (ppb) and were measured using pulsed fluorescence chemiluminescence and ultraviolet absorption techniques. PM10 concentrations were measured using photospectrometry and are reported in micrograms per cubic metre (µg/m3). This analysis uses the daily maximum value for 8-hr O3, the 24-hr mean value of PM10, and the daily maximum value for 1-hr NO2 from all available monitoring locations over the period of 1999 to 2006 (Table 1). These averaging times were selected to parallel the air quality objectives in the State Environment Protection Policy for ambient air quality (SEPP 1999). Additionally, days on which significant air quality events (bushfires, dust storms, factory emissions, etc.) were known to have occurred, and data below 5 ppb for O3 and NO2 and 3 µg/m3 for PM10, were removed.
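The daily summaries and screening rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and record layout are hypothetical, the 8-hour windows are assumed to lie entirely within one calendar day, and the manuscript does not describe the actual processing code.

```python
from statistics import mean

# Lower limits quoted in the text: ppb for O3 and NO2, ug/m3 for PM10.
MIN_VALID = {"O3": 5.0, "NO2": 5.0, "PM10": 3.0}

def daily_max_8hr_mean(hourly):
    """Largest 8-hour running mean among 24 hourly values (assumed window rule)."""
    windows = [hourly[i:i + 8] for i in range(len(hourly) - 7)]
    return max(mean(w) for w in windows)

def screen(records, event_days):
    """Keep (day, pollutant, value) records that pass both screening rules."""
    return [
        (day, pol, val)
        for day, pol, val in records
        if day not in event_days and val >= MIN_VALID[pol]
    ]
```

Here `event_days` stands for the set of dates flagged for bushfires, dust storms, or other known events.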
3. Methods
a. Self-organizing Maps
Synoptic typing is an approach used in synoptic climatology in which the atmospheric state is partitioned into broad categories either in terms of the spatial patterns associated with a proxy (e.g. MSLP, 500 hPa height) or the multivariate characteristics of an air mass. These synoptic categories or 'types' are then used to provide insight into the influence of large-scale processes on local environmental conditions (Hewitson and Crane 2002). The classification of synoptic types used in this study was produced using a neural network algorithm known as the self-organizing map (SOM) (Kohonen 2001). This approach has been shown to be an effective means of classifying the expected synoptic circulation patterns over a particular region (Hewitson and Crane 2002; Hope, Drosdowsky et al. 2006). In short, SOMs can be described as a data reduction technique used to visualize and interpret large high-dimensional data sets by projecting them onto a two-dimensional array of representative nodes (Kohonen 2001). This two-dimensional array of nodes is commonly referred to as the self-organizing 'map'. One feature of the SOM that has been found particularly useful in synoptic climatology is the matching of each period in the analysis to a particular synoptic type (Hope, Drosdowsky et al. 2006). This essentially results in a time series of synoptic charts that can be used to link environmental responses to specific types of systems through time. For more information on SOMs and their application in synoptic climatology, please see Hewitson and Crane (2002).
It is important to note that in practice the dimensions of the SOM are selected by the user. This has direct implications for the number of synoptic states represented. We assessed a range of SOM dimensions in an effort to identify one that was adequate to represent the expected range of synoptic patterns for the region and practical for use in our statistical analysis. It was determined that a 4 × 5 array of circulation patterns met these criteria. The software used to create the SOM is part of SOM_PAK, available from http://www.cis.hut.fi/research/som-research.
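To make the SOM procedure more concrete, the following Python sketch trains a small rectangular map and then assigns every input field to its best-matching node, which yields the time series of synoptic types described above. It is an illustration only and not the SOM_PAK implementation: the function names, the linear decay schedules, the Gaussian neighbourhood, and all default parameter values are assumptions made for this example. Each MSLP field is assumed to have been flattened into a list of grid-point values.

```python
import math
import random

def best_matching_unit(nodes, x):
    """Return the (col, row) key of the node whose weights are closest to x."""
    return min(nodes, key=lambda k: sum((w - v) ** 2 for w, v in zip(nodes[k], x)))

def train_som(data, n_cols=5, n_rows=4, n_iter=2000, lr0=0.5, radius0=2.0, seed=0):
    """Train a small rectangular SOM; nodes are keyed by (col, row)."""
    rng = random.Random(seed)
    # Initialise every node with a copy of a randomly chosen input field.
    nodes = {(c, r): list(rng.choice(data))
             for c in range(n_cols) for r in range(n_rows)}
    for t in range(n_iter):
        frac = 1.0 - t / n_iter                      # linear decay (assumed)
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        x = rng.choice(data)
        bc, br = best_matching_unit(nodes, x)
        for (c, r), w in nodes.items():
            dist2 = (c - bc) ** 2 + (r - br) ** 2
            h = math.exp(-dist2 / (2.0 * radius ** 2))   # Gaussian neighbourhood
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])
    return nodes

def classify(nodes, data):
    """Map every field to a two-digit type label such as '00' or '43'."""
    return ["%d%d" % best_matching_unit(nodes, x) for x in data]
```

With the default 5 column by 4 row grid, `classify` returns one of 20 labels for each input field, in the order the fields were supplied.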
b. Generalized Additive Models
There are a multitude of techniques for analysing the effects of meteorology on air pollution (Thompson, Reynolds et al. 2001). One approach that has been found particularly effective at handling the complex non-linearities associated with air pollution is Generalized Additive Modelling (GAM) (Aldrin and Haff 2005; Carslaw, Beevers et al. 2007). The additive model in the context of a concentration time series can be written in the form (Hastie and Tibshirani 1990):
g[E(y_i)] = \beta_0 + \sum_{j=1}^{n} s_j(x_{ij}) + \varepsilon_i    (2.1)
where g[ ] is the log-link function with a Gaussian error distribution, yi is the ith air pollution concentration, E( ) is the expected concentration of y, β0 is the overall mean of the response, sj(xij) is the smooth function of the ith value of covariate j, n is the total number of covariates, and εi is the ith residual, which is assumed to be normally distributed with var(εi) = σ2. Smooth functions are developed through an integration of model selection and automatic smoothing parameter selection using penalised regression splines, which, while optimizing the fit, make an effort to minimize the number of dimensions in the model (Wood 2006). Interaction terms, e.g. s(x1, x2), can also be modelled as thin-plate regression splines or tensor product smooths. This is a notable feature for air quality applications, as interaction terms have been used effectively for modelling complex responses to wind components and to account for spatial trends (Bivand et al. 2008; Carslaw 2007). The choice of the smoothing parameters is made through restricted maximum likelihood (REML) and confidence intervals are estimated using an unconditional Bayesian method (Wood 2006). This analysis was conducted using the gam modelling function in the R environment for statistical computing (R Development Core Team 2009) with the package 'mgcv' (Wood 2006).
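The additive structure of model (2.1) can be illustrated with a short Python sketch. Note that this example deliberately swaps in a simpler technique: it fits the smooth terms by backfitting with a running-mean smoother, whereas the analysis itself relies on penalised regression splines with REML as implemented in mgcv. The function names, the smoother, and the default span are assumptions for illustration, and the response passed in is assumed to be already log-transformed.

```python
from statistics import mean

def running_mean_smooth(x, r, span=5):
    """Smooth residuals r against covariate x with a running mean over ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    fitted = [0.0] * len(x)
    half = span // 2
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - half), min(len(x), pos + half + 1)
        fitted[i] = mean(r[j] for j in order[lo:hi])
    return fitted

def backfit(y, covariates, n_sweeps=20, span=5):
    """Fit y ~ b0 + sum_j s_j(x_j); returns b0 and the fitted values of each s_j."""
    n = len(y)
    b0 = mean(y)
    s = [[0.0] * n for _ in covariates]
    for _ in range(n_sweeps):
        for j, x in enumerate(covariates):
            # Partial residual: remove the intercept and every other smooth.
            partial = [y[i] - b0 - sum(s[k][i] for k in range(len(s)) if k != j)
                       for i in range(n)]
            fit = running_mean_smooth(x, partial, span)
            centre = mean(fit)
            s[j] = [f - centre for f in fit]   # centre each smooth on zero
    return b0, s
```

Each smooth is centred on zero so that the intercept b0 carries the overall mean, which mirrors the role of β0 in model (2.1).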
c. Model Development
The first step in the selection of individual models for O3, PM10, and NO2 was to fit a preliminary base model. This was fit to each pollutant in order to control for the seasonality, persistence, spatial trend, and weekly emissions patterns that exist in these data. Following model (2.1), the preliminary model can be written as:
log(yi) = β0 + s(time) + s(dow) + s(long, lat) + s(yi-1) + εi    (2.2)
where time is a numeric vector ranging from 1 to 2922 included to account for long-term trends and seasonality, dow is a numeric vector ranging from 0 to 6 included to account for day of the week, long and lat are the spatial coordinates of each monitor location included to account for spatial trend, and yi-1 is a one-day lag term included to account for short-term temporal persistence. It is important to note that the residual spatial variation is controlled by including a tensor product smooth, s(long, lat), in the model, and that a smooth function of the preceding day's pollutant concentration, s(yi-1), was included to control for autocorrelation in the residuals. Additionally, since air pollution data are inherently cyclic, a predetermined basis dimension of k = 32 (one knot (k) for each change of season) was used for the construction of the spline function for time. The motivation for this control is that the function should represent a relatively symmetric cyclic pattern in the data. To check the adequacy of our methods for controlling for space-time effects, box-plots and time-series plots of residuals by monitor location were examined. No violations of assumptions were obvious for any pollutant.
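The temporal covariates of model (2.2) can be constructed as in the following Python sketch. The function names are hypothetical, and the day-of-week coding (Monday = 0, as returned by Python's `weekday()`) is an assumption, since the text only states that dow ranges from 0 to 6. The study period of 1 January 1999 to 31 December 2006 gives the 2922 daily time steps quoted above.

```python
from datetime import date, timedelta

def base_covariates(start=date(1999, 1, 1), end=date(2006, 12, 31)):
    """Return one row per day with the time index and day-of-week code."""
    n_days = (end - start).days + 1
    rows = []
    for t in range(n_days):
        day = start + timedelta(days=t)
        # time runs from 1 to n_days; dow uses Monday = 0 (assumed coding).
        rows.append({"date": day, "time": t + 1, "dow": day.weekday()})
    return rows

def add_lag(values):
    """Pair each day's concentration with the previous day's (None on day one)."""
    return list(zip(values, [None] + values[:-1]))
```

The first day has no predecessor, so its lag is left as `None` and that record would need to be dropped before fitting.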
Finally, the categorical predictor (C) representing synoptic-scale circulation types was added to the model. Following model (2.1), the final models can be written as:
log(yi) = β0 + s(time) + s(doy) + s(dow) + s(long, lat) + s(yi-1) + Cp + εi    (2.3)
where the term Cp represents the effect of the pth synoptic type and p = 0, 1, ..., 19.
d. Characterization of Synoptic Influence on Air Pollution
The explanatory power of model (2.3) was measured using the R2 statistic. The aggregate impact of synoptic circulations on each pollutant is assessed by taking the difference between the R2 of model (2.3) and that of model (2.2). Individual relationships between particular synoptic types and each air pollutant are assessed using partial response plots.
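The aggregate measure described above amounts to a difference of two R2 values computed on the same observations. A minimal Python sketch follows; the function names are hypothetical, and the fitted values are assumed to come from models (2.2) and (2.3) respectively.

```python
def r_squared(observed, fitted):
    """Proportion of variance in the observations explained by the fitted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def synoptic_contribution(observed, fitted_base, fitted_full):
    """Gain in R2 when the synoptic-type term is added to the base model."""
    return r_squared(observed, fitted_full) - r_squared(observed, fitted_base)
```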
Partial response plots are used to reveal the marginal effect of each synoptic type on each air pollutant. A partial response plot shows the static effect (i.e., the effect that is stable over time) of a particular synoptic type on a particular pollutant after accounting (i.e., controlling) for the effects of all other explanatory variables in the model (Camalier, Cox et al. 2007). The y-axis of each plot, which has been centred on the mean value of the response, shows the marginal effect of a covariate (Harrell 2001) on a percentage scale. Specifically, the marginal effect is the average percentage change in pollutant concentration as the covariate of interest is varied while all other explanatory variables are kept fixed. Partial response plots make it easy to compare the size of the marginal effects of different covariates on the different pollutants.
Technically, the displayed marginal effects are given by 100 * (Cp / Cp{ref} - 1), where p is the synoptic type of interest, Cp is the corresponding coefficient in model (2.3), and Cp{ref} is the coefficient for the chosen reference value for p, in our case synoptic type 00. In sum, the marginal effects are the estimated average effects of each synoptic type on the predicted values produced by model (2.3).
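The expression for the displayed marginal effects can be transcribed directly into Python. The sketch below follows the formula exactly as it is stated in the text; the function name and the coefficient values in the usage note are hypothetical and are not results from the study.

```python
def marginal_effect_percent(coef, coef_ref):
    """Percentage change relative to the reference type: 100 * (Cp / Cp{ref} - 1)."""
    return 100.0 * (coef / coef_ref - 1.0)

def marginal_effects(coefs, ref="00"):
    """Apply the expression to every synoptic type in a {label: coefficient} dict."""
    return {label: marginal_effect_percent(c, coefs[ref]) for label, c in coefs.items()}
```

For example, with made-up coefficients `{"00": 1.0, "43": 1.4}` the reference type maps to 0% and type 43 to approximately 40%.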
4. Results and Discussion
a. Self-Organizing Maps
The SOM of MSLP provides a clear visualization of the range of circulation features affecting Melbourne (Figure 2). Additionally, the method arranges the output grid so that similar types are near each other while more distinct types are further apart. Individual maps within the grid are referenced throughout the text using XY coordinates, with category 00 in the top left corner and category 43 in the bottom right corner. Synoptic types with strong regions of high pressure are located near the bottom right corner of the grid, while types with broad regions of low pressure are located near the top left. These pressure patterns can be used to infer regional-scale wind speed and direction along with mixing. A secondary feature of the SOM algorithm is the generation of a time series in which each six-hour period in the analysis has been classified under a particular synoptic type represented in Figure 2. Frequency analysis of these periodic classifications found that higher frequency nodes are located on the periphery of the SOM grid and that lower frequency nodes are located towards the center (Figure 3). This indicates that dominant circulations are presented on the periphery and transitional states are presented closer to the center of the grid. It should be noted that the transitions between synoptic features would be expected to reflect the generally eastward movement of weather systems in this region.
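The frequency analysis of the six-hourly classifications can be sketched in Python as a simple tally of the type labels. The function name is hypothetical, and the labels in the usage note are made up for illustration.

```python
from collections import Counter

def type_frequencies(labels):
    """Percentage of classified periods that fall under each synoptic type."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}
```

For instance, the made-up sequence `["00", "00", "43", "21"]` gives 50% for type 00 and 25% each for types 43 and 21.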
Meteorological conditions associated with each synoptic type were determined by matching the automatic weather station observations on any one day to the synoptic type assigned by the SOM at 10 a.m. for that day (Table 2). Across-group variability for each meteorological variable indicates that each synoptic type has distinct local meteorological conditions.
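The matching step described above can be sketched in Python as a group-wise mean of a daily observation keyed by the 10 a.m. synoptic type. The function name and the dictionary layout are assumptions for this illustration; days without a classification are skipped.

```python
from collections import defaultdict
from statistics import mean

def mean_by_type(type_at_10am, daily_obs):
    """Average a daily observation within each 10 a.m. synoptic type.

    type_at_10am maps a day to its type label; daily_obs maps a day to a value.
    """
    groups = defaultdict(list)
    for day, value in daily_obs.items():
        if day in type_at_10am:
            groups[type_at_10am[day]].append(value)
    return {label: mean(values) for label, values in groups.items()}
```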
b. Generalized Additive Models
b.1 Ozone (O3)
Results found significant differences (F=122.6, d.f.=19, p