Studying the effect of meteorological factors on the SO2 and PM10 pollution levels with refined versions of the SARIMA model. D. S. Voynikova, S. G. Gocheva-Ilieva, A. V. Ivanov, and I. P. Iliev. Citation: AIP Conference Proceedings 1684, 100005 (2015); doi: 10.1063/1.4934342. Published by the AIP Publishing.
This article is copyrighted as indicated in the article. Reuse of AIP content is subject to the terms at: http://scitation.aip.org/termsconditions. Downloaded to IP: 87.252.173.209 On: Wed, 18 Nov 2015 03:58:52
Studying the Effect of Meteorological Factors on the SO2 and PM10 Pollution Levels with Refined Versions of the SARIMA Model

D.S. Voynikova1, a), S.G. Gocheva-Ilieva1, b), A.V. Ivanov1, c) and I.P. Iliev2, d)

1 Department of Applied Mathematics and Modeling, Faculty of Mathematics and Informatics, Paisii Hilendarski University of Plovdiv, 24 Tzar Assen str., 4000 Plovdiv, Bulgaria
2 Department of Physics, Technical University - Plovdiv, 25 Tzanko Djusstabanov str., 4000 Plovdiv, Bulgaria

a) Corresponding author: [email protected]
b) [email protected]
c) [email protected]
d) [email protected]
Abstract. Numerous time series methods are used in environmental sciences allowing the detailed investigation of air pollution processes. The goal of this study is to present the empirical analysis of various aspects of stochastic modeling and in particular the ARIMA/SARIMA methods. The subject of investigation is air pollution in the town of Kardzhali, Bulgaria with 2 problematic pollutants – sulfur dioxide (SO2) and particulate matter (PM10). Various SARIMA Transfer Function models are built taking into account meteorological factors, data transformations and the use of different horizons selected to predict future levels of concentrations of the pollutants.
INTRODUCTION

In EU countries, the USA, and other developed economies, efforts are being made to preserve the environment, and such actions are expected to be extended in the future. In order to keep the air clean in Europe, specific standards and legislative measures are implemented [1-3], and pollution is monitored and controlled using monitoring stations. The results are processed and systematized by national agencies [4]. In Bulgaria, 12 air pollutants are monitored continuously by 36 automated stations in urban areas. The Bulgarian Executive Environment Agency manages and coordinates activities related to environmental control and protection within the country [5]. A large array of data has been accumulated over a period of more than 10 years. This allows for scientific processing, modeling, and analysis of the data in order to find relationships and trends, and to make forecasts using various methods and techniques, including numerical and statistical ones, based on temporal and spatio-temporal analysis of atmospheric conditions, climate change related to air quality, etc. Unfortunately, such research is sporadic in Bulgaria compared to the much more intensive studies carried out in the region, e.g., in Greece.

Due to the large variation in concentration levels of different harmful atmospheric gases in various regions, many studies focus mainly on modeling and predicting the main pollutants: PM10, ozone, NO, and CO [6-9]. With regard to forecasting, a review of the theoretical and empirical results obtained using the time series approach can be found in [10], where it is concluded that a third of investigations in time series are in the field of environmental sciences. Some of the most widely used stochastic parametric methods for studying the concentrations of various air pollutants in linear and multivariate cases are based on the Box-Jenkins methodology. These are used to build so-called autoregressive integrated moving average (ARIMA) and seasonal ARIMA (SARIMA) models [11-16].
Application of Mathematics in Technical and Natural Sciences AIP Conf. Proc. 1684, 100005-1–100005-12; doi: 10.1063/1.4934342 © 2015 AIP Publishing LLC 978-0-7354-1331-3/$30.00
Similar environmental studies were recently carried out for the towns of Burgas and Blagoevgrad, Bulgaria [17, 18]. The aim of this study is to build and analyze various multivariate SARIMA/Transfer function (TF) models of the two problematic pollutants SO2 and PM10 in the town of Kardzhali, Bulgaria, with respect to meteorological variables, data transformations, and the use of different horizons selected to predict future levels of pollution. More specifically, the goal is to investigate the influence of the normality of the data on the empirical properties of the SARIMA/TF method and on the predictive performance of the models. The study was carried out using the IBM SPSS software package for Windows.
DATA DESCRIPTION

We examine air quality in the town of Kardzhali, a typical small urban region in Bulgaria. The town is located in the Eastern Rhodope Mountains in Southeast Bulgaria, 243 km from the capital city of Sofia. Its coordinates are 41.65°N, 25.37°E, at an elevation of 275 m. The town is situated on the north and south banks of the Arda river and is bordered by two reservoirs: Kardzhali Dam to the west and Studen Kladenets Dam to the east. The town's population is around 44,000 people. The buildings are typically low- to medium-size. There is no immediate pollution from other nearby cities and highways. The climate in the area is temperate-subtropical, mainly dry throughout the year, with only 10-15 days of rain in December. Winter is relatively mild, with average temperatures around 0°C. Summer is sunny and hot, with maximum temperatures reaching 40-43°C.

In this study we examine hourly data for two problematic pollutants, SO2 and PM10, in the town of Kardzhali over a period of 2 years and 3 months, from 1 January 2012 to 7 March 2014. The following six meteorological time series are used: wind speed (WS), wind direction (WD), atmospheric pressure (PRESS), global solar radiation (GSR), temperature of ground air (TEMP), and relative humidity (HUMIDITY).

Figure 1 presents the wind rose diagram. A wind rose is a graphic tool that gives a succinct view of how wind speed and direction are typically distributed at a given location. For Kardzhali the wind direction WD is mainly from the north-west to the south-east, and the wind type is defined as a light breeze.
FIGURE 1. Wind rose diagram of wind speed and direction for the town of Kardzhali, Bulgaria, according to the examined data and time period
The specific shape of the wind rose is explained by the city being located in a mountainous area and by the effect of the Arda river, which has the same geographical orientation. The wind rose indicates that the accumulation of air pollutants is concentrated in this direction over the low-lying terrain where the city is situated.

Descriptive statistics of the observed air pollutants and meteorological data for the town of Kardzhali in the considered time period are given in Table 1. It is worth noting the high mean level of PM10. Both SO2 and PM10 have high values of skewness and kurtosis, which may indicate that their distributions deviate significantly from the normal distribution. In order to study the influence of the distribution, we also used appropriate transformations of the initial pollutant data. The observed concentrations of SO2 and PM10 are given in Figures 2a, 2b (hourly data).

TABLE 1. Descriptive statistics of the observed air pollutants and meteorological data of the town of Kardzhali over the period between 1 January 2012 and 7 March 2014.

Variable        N      Minimum  Maximum  Mean    Median  Std. Deviation  Skewness  Kurtosis
SO2 (µg/m3)     17713  0.001    593.62   14.49   6.56    25.83           6.49      70.43
PM10 (µg/m3)    18679  0.59     886.93   41.27   28.99   41.24           4.16      33.85
WS (m/s)        19128  0.001    15.98    1.09    0.77    0.87            1.80      8.63
PRESS (mbar)    19128  948.6    1000.6   978.4   978.3   6.98            -0.12     0.40
GSR (W/m2)      19128  0.001    992.97   165.9   11.01   257.8           1.60      1.35
TEMP (°C)       19128  -15.33   38.87    12.92   12.71   9.94            0.13      -0.73
HUMIDITY (%)    19128  4.13     96.80    60.56   53.65   29.64           0.03      -1.52
YJSO2           17713  0.00     3.61     1.65    1.66    0.67            -0.04     -0.42
YJPM10          18679  0.44     3.71     2.48    2.47    0.35            -0.16     1.06
Std. Error of the Skewness is 0.018, Std. Error of the Kurtosis is 0.035.
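The skewness and kurtosis figures in Table 1, and the standard errors quoted beneath it, can be illustrated with a short sketch. The moment estimators below are the simplest textbook versions and are an assumption; a statistical package may apply small-sample corrections, so its output can differ slightly. The standard-error formulas are the usual ones for a normal sample of size n, and the function names are hypothetical.

```python
import math

def skewness_kurtosis(values):
    """Return (skewness, excess kurtosis) from plain central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def standard_errors(n):
    """Standard errors of skewness and kurtosis for a normal sample of size n."""
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return se_skew, se_kurt

# Sample size of the meteorological series in Table 1.
se_skew, se_kurt = standard_errors(19128)
```

With n = 19128 the two values come out close to the 0.018 and 0.035 reported under Table 1, which suggests the quoted errors refer to the full-length series.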
FIGURE 2. Observed hourly concentrations of the pollutants of the town of Kardzhali, Bulgaria: (a) SO2, (b) PM10
Figure 3 shows the average daily concentrations of SO2 and PM10. The horizontal lines indicate the permissible European upper daily limits of 125 µg/m3 for SO2 and 50 µg/m3 for PM10, respectively. PM10 is clearly problematic. Although SO2 does not drastically exceed the prescribed upper limit, the graph shows a pronounced background level with a harmful influence on human health.
FIGURE 3. Observed daily concentrations of the pollutants: (a) SO2, (b) PM10
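METHODS

Data Transformations

As is well known, the application of parametric models in environmental sciences usually requires a normal or near-normal distribution of the variables [19]. To improve the distribution and stabilize the variability of the data, we use the Yeo-Johnson power transformation [20]:

\[
YJ(\lambda, x) =
\begin{cases}
\dfrac{(x+1)^{\lambda} - 1}{\lambda}, & x \ge 0,\ \lambda \ne 0, \\[2mm]
\log(x+1), & x \ge 0,\ \lambda = 0, \\[2mm]
-\dfrac{(-x+1)^{2-\lambda} - 1}{2-\lambda}, & x < 0,\ \lambda \ne 2, \\[2mm]
-\log(-x+1), & x < 0,\ \lambda = 2,
\end{cases}
\qquad \lambda \in [-2, 2]. \tag{1}
\]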
For the considered data, the Yeo-Johnson transformation coefficient $\lambda$ for each of the observed variables was found by simple trial over the sequence [-2, -1.9, -1.8, ..., 2]. The optimal value of $\lambda$ was determined by minimizing the $d_\lambda$ statistic of Hinkley [19]:

\[
d_\lambda = \frac{\left| \mathrm{mean}(\lambda) - \mathrm{median}(\lambda) \right|}{\mathrm{spread}(\lambda)},
\qquad
\mathrm{spread}(\lambda) = \mathrm{IQR} = q_{0.75} - q_{0.25}
\quad \text{or} \quad
\mathrm{spread}(\lambda) = \mathrm{MAD}, \tag{2}
\]

where all calculations are performed on the series transformed with the selected value of $\lambda$, $q_{0.75}$ and $q_{0.25}$ are the third and the first quartile of the sample, respectively, and MAD is the mean absolute deviation. The basis of the $d_\lambda$ statistic is that, for symmetrically distributed data, the mean and the median have to be very close. Applying (2) selects the most symmetric distribution, which serves as an estimate of its proximity to the normal distribution [19]. To minimize round-off error and improve the numerical accuracy of the computations, all predictor time series were initially standardized by z-scores.
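The selection procedure of Eqs. (1)-(2) can be sketched as follows. This is a minimal illustration rather than the procedure actually run in SPSS: the function names and the sample are hypothetical, the spread is taken to be the interquartile range, and the quartiles follow the default convention of Python's `statistics.quantiles`, which may differ from the one used in the study.

```python
import math
import statistics

def yeo_johnson(x, lam):
    """Yeo-Johnson power transformation of a single value, Eq. (1)."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log(x + 1.0)
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log(-x + 1.0)
    return -((-x + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)

def hinkley_d(values, lam):
    """Hinkley's d statistic, Eq. (2), with the interquartile range as spread."""
    t = [yeo_johnson(v, lam) for v in values]
    q1, _, q3 = statistics.quantiles(t, n=4)
    return abs(statistics.mean(t) - statistics.median(t)) / (q3 - q1)

def best_lambda(values):
    """Trial over the grid -2.0, -1.9, ..., 2.0; keep the lambda with smallest d."""
    grid = [round(-2.0 + 0.1 * k, 1) for k in range(41)]
    return min(grid, key=lambda lam: hinkley_d(values, lam))

# Hypothetical right-skewed sample (not the Kardzhali measurements).
sample = [0.5, 1.2, 2.0, 3.1, 4.8, 6.6, 9.5, 14.0, 25.0, 60.0, 180.0]
lam = best_lambda(sample)
```

As a consistency check against Table 1, `yeo_johnson(886.93, -0.2)` evaluates to about 3.71, the reported maximum of YJPM10.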
ARIMA/SARIMA Transfer Function Model

ARIMA (Autoregressive Integrated Moving Average) is a general class of models, introduced by Box and Jenkins in 1970 and widely used in environmental sciences [11]. The multiplicative univariate model for the seasonal and cyclic cases is written in short form as SARIMA $(p, d, q)(P, D, Q)_s$. Here the parameters are nonnegative integers. The non-seasonal parameters are: $p$, the order of the autoregressive process (AR); $d$, the order of differencing (I); and $q$, the order of the moving average process (MA). The seasonal parameters $P, D, Q$ have analogous meanings, and $s$ is an integer indicating the order of seasonality (number of hours, days, etc.). In the multivariate case, when the given series $Y_t$ depends on one or more predictor series $X_{1t}, X_{2t}, \dots, X_{kt}$, the ARIMA/Transfer function (TF) model has the following general form [11]:

\[
\Delta f(Y_t) = \mu + \sum_{i=1}^{k} \frac{\mathrm{Num}_i}{\mathrm{Den}_i} \, \Delta_i B^{b_i} f_i(X_{it}) + \frac{\mathrm{MA}}{\mathrm{AR}} \, a_t, \tag{3}
\]

where $\mu$ is the model constant; $\Delta = (1 - B)^{d} (1 - B^{s})^{D}$ and $\Delta_i = (1 - B)^{d_i} (1 - B^{s})^{D_i}$ are differencing operators; $B$ is the backward shift operator with $B Y_t = Y_{t-1}$; $t = 1, 2, \dots, n$ is the time; $f, f_i$ are optional initial transformations of the dependent and predictor series; $a_t \sim N(0, \sigma^2)$ is the white noise series, normally distributed with mean zero and variance $\sigma^2$; and $B^{b_i}$ is the delay term of positive integer order $b_i$. The autoregressive and moving average lag polynomials are, respectively,

\[
\begin{aligned}
\mathrm{AR} &= \phi_p(B)\, \Phi_P(B^s) = \left(1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p\right) \left(1 - \Phi_1 B^s - \Phi_2 B^{2s} - \dots - \Phi_P B^{Ps}\right), \\
\mathrm{MA} &= \theta_q(B)\, \Theta_Q(B^s) = \left(1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q\right) \left(1 - \Theta_1 B^s - \Theta_2 B^{2s} - \dots - \Theta_Q B^{Qs}\right),
\end{aligned} \tag{4}
\]

and the numerator and denominator in (3) are

\[
\begin{aligned}
\mathrm{Den}_i &= \left(1 - \delta_{i1} B - \delta_{i2} B^2 - \dots - \delta_{iv} B^v\right) \left(1 - \Delta_{i1} B^s - \Delta_{i2} B^{2s} - \dots - \Delta_{iV} B^{Vs}\right), \\
\mathrm{Num}_i &= \left(\omega_{i0} - \omega_{i1} B - \omega_{i2} B^2 - \dots - \omega_{iu} B^u\right) \left(1 - \Omega_{i1} B^s - \Omega_{i2} B^{2s} - \dots - \Omega_{iU} B^{Us}\right).
\end{aligned} \tag{5}
\]

In (4)-(5), $\phi_j, \Phi_j, \theta_j, \Theta_j, \omega_{ij}, \Omega_{ij}, \delta_{ij}, \Delta_{ij}$ are the model parameters to be estimated (the doubly indexed $\Delta_{ij}$ are coefficients, not differencing operators). In the modeling procedure it is necessary to preset the following parameters: $p, d, q, P, D, Q, s$ for the dependent variable, and the orders $u, U, v, V, d_i, D_i, b_i$ from (3), (5) for every TF predictor variable $X_i$.
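To make the role of the numerator, denominator and delay terms concrete, the sketch below applies a non-seasonal rational transfer function to an input series by direct recursion. It is an illustration under simplifying assumptions (no seasonal factors, no differencing, values before the start of the series treated as zero); the function name and coefficients are hypothetical and it does not reproduce the SPSS estimation.

```python
def transfer_function_response(x, omega, delta, delay=0):
    """Apply (Num / Den) B^delay to the series x.

    Num = omega[0] - omega[1] B - ... - omega[u] B^u
    Den = 1 - delta[0] B - ... - delta[v-1] B^v
    Equivalent recursion:
      v_t = sum_j delta_j v_{t-j} + omega_0 x_{t-b} - omega_1 x_{t-b-1} - ...
    """
    v = []
    for t in range(len(x)):
        total = 0.0
        for j, w in enumerate(omega):
            idx = t - delay - j
            if idx >= 0:
                # The leading numerator coefficient enters with a plus sign.
                total += (w if j == 0 else -w) * x[idx]
        for j, d in enumerate(delta, start=1):
            if t - j >= 0:
                total += d * v[t - j]
        v.append(total)
    return v

# Unit impulse through Num = -0.055, Den = 1 - 0.455 B (illustrative values).
impulse = [1.0, 0.0, 0.0, 0.0]
response = transfer_function_response(impulse, omega=[-0.055], delta=[0.455])
```

The impulse response decays geometrically at the rate set by the denominator coefficient, which is how a single denominator term lets a predictor influence the output over many subsequent hours.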
Evaluation of the Model Performance and Model Selection

We use the following generally accepted model fit and model accuracy performance measures: coefficient of determination $R^2$, Root Mean Square Error (RMSE), and Mean Absolute Percent Error (MAPE):

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \varepsilon_t^2}, \qquad
\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{\varepsilon_t}{Y_t} \right|, \qquad
\varepsilon_t = Y_t - \hat{Y}_t, \tag{6}
\]

where $\hat{Y}_t$ are the values of $Y_t$ predicted by the given model.
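The measures in Eq. (6) are straightforward to compute; a minimal sketch follows. The $R^2$ shown is the ordinary coefficient of determination, which is an assumption here, since the text does not give its formula (SPSS also reports a "stationary" variant). The data are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error, Eq. (6)."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)

def mape(observed, predicted):
    """Mean absolute percent error, Eq. (6); undefined if any observation is 0."""
    n = len(observed)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(observed, predicted))

def r_squared(observed, predicted):
    """Ordinary coefficient of determination (assumed definition)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted concentrations.
obs = [10.0, 20.0, 40.0]
pred = [12.0, 18.0, 40.0]
scores = (rmse(obs, pred), mape(obs, pred), r_squared(obs, pred))
```

Because MAPE divides by $Y_t$, observations close to zero can inflate it considerably, which is worth keeping in mind for series whose minimum is near zero.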
Model accuracy was also analyzed from the plots of the autocorrelation function (ACF) and partial ACF (PACF) of the residuals. As a model selection criterion, the Schwarz Bayesian Criterion (or Normalized BIC) was used [19, 20]. Ideally, the BIC has to be as small as possible. The final models were also selected on the basis of their forecasts for the future out-of-sample data horizons of 24, 48 and 72 hours, respectively. When several models showed similar results, the principle of parsimony was respected.
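A model comparison by normalized BIC can be sketched as below, assuming the criterion takes the form ln(MSE) + k ln(n) / n, where k is the number of estimated parameters and n the number of observations. This form is an assumption about how the reported values are defined, and the candidate models are hypothetical.

```python
import math

def normalized_bic(rmse, n_params, n_obs):
    """Normalized BIC, assumed here to be ln(MSE) + k * ln(n) / n."""
    return math.log(rmse ** 2) + n_params * math.log(n_obs) / n_obs

# Hypothetical candidates: (label, RMSE, number of estimated parameters).
candidates = [("small", 0.110, 8), ("large", 0.1099, 20)]
n_obs = 18000
best = min(candidates, key=lambda c: normalized_bic(c[1], c[2], n_obs))
```

In this example the larger model has a marginally lower RMSE, but the penalty term outweighs the gain, so the smaller model is preferred, in line with the principle of parsimony.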
RESULTS

Performed Data Transformations

Applying (1)-(2) yielded the following optimum values of the parameter $\lambda$: -0.2 for SO2 and -0.2 for PM10. Figure 4 shows the distributions of the initial and transformed data. After the Yeo-Johnson transformation (1), the time series variables SO2 and PM10 are close to normally distributed, more clearly so for PM10. The transformed time series will be denoted by YJSO2 and YJPM10, respectively.
FIGURE 4. Histograms of SO2, PM10 and of their corresponding Yeo-Johnson power transformations YJSO2 and YJPM10
Building and Analyzing SARIMA/TF Models

A large number of models were built in order to model the target time series of SO2 and PM10 and their transformed versions. The first stage in determining the parameters p, d, q, P, D, Q, s in (3) was to calculate and analyze the ACF and PACF of the modeled data. For SO2 and PM10, the respective ACF and PACF demonstrated the most notable cyclicity at 24 lags, so the order of seasonality was selected to be s = 24. This is realistic, since for any 2 consecutive days (of 24 hours each) with relatively similar atmospheric conditions, pollution levels can be expected to depend on those observed during the previous day. Here, the influence of relatively weak to moderate winds of 0.3 to 1.5 m/s over the entire period is especially significant, as these are concentrated in one direction, which is also evident from the wind rose in Figure 1. This is a prerequisite for stagnant pollution processes over the considered time period.

The examination of the PACF graphs showed significantly more complex behavior. The graphs lead to the conclusion that the series are stationary, since neither graph shows values close to 1 at lag 1, i.e., d = 0. For SO2 the autoregressive parameter p should be between 1 and 9, and the moving average parameter q between 0 and 2. For PM10 the most probable values are p from 0 to 10 and q from 1 to 2. Seasonal parameters are not strongly expressed, and P, Q can take the values 0, 1 and 2 for the two examined time series.

A more complicated matter is the selection of parameters for the meteorological time series used as predictor series. The examination of each series showed that they are stationary, so the models have to include d_i = D_i = 0. All other predictor parameters in equation (3) were given the values 0, 1, and 2 to refine the models. This makes it possible to study the influence of the meteorological time series during the two hours preceding each current hour t.

Some of the results for the most characteristic models with the best statistical indicators are given in Table 2. The influence of meteorological time series on the models has been accounted for. The best model from each group is designated by an asterisk in column 2 of Table 2.

TABLE 2. SARIMA/TF models of air pollutants SO2 and PM10 and their transformed versions YJSO2 and YJPM10 for the town of Kardzhali with model fit statistics.

Variable  SARIMA/TF Model     Predictors                        Stationary R2  RMSE    MAPE     Normalized BIC
SO2       (2,0,3)(1,0,1)24    WS, HUMIDITY, PRESS, GSR, TEMP    0.725          13.553  156.943  5.220
          (1,0,5)(1,0,1)24    WS, HUMIDITY, PRESS, GSR, TEMP    0.724          13.580  153.516  5.225
          (1,0,8)(2,0,1)24    WS, HUMIDITY, PRESS, GSR, TEMP    0.726          13.528  165.551  5.219
          *(2,0,8)(2,0,1)24   WS, HUMIDITY, PRESS, GSR, TEMP    0.726          13.519  162.349  5.220
YJSO2     (1,0,7)(1,0,1)24    WS, HUMIDITY, PRESS, TEMP         0.872          0.237   25.971   -2.875
          (1,0,5)(1,0,1)24    WS, HUMIDITY, PRESS, TEMP         0.871          0.237   26.393   -2.871
          (2,0,2)(2,0,1)24    WS, PRESS, TEMP                   0.878          0.232   26.383   -2.911
          *(2,0,2)(2,0,1)24   WS, PRESS, TEMP                   0.879          0.233   26.476   -2.908
PM10      (3,0,2)(1,0,1)24    WS, HUMIDITY, PRESS, TEMP         0.845          16.226  24.057   5.579
          (2,0,5)(1,0,1)24    WS, HUMIDITY, PRESS, TEMP         0.845          16.235  23.996   5.582
          (2,0,2)(1,0,1)24    WS, HUMIDITY, PRESS, TEMP         0.845          16.255  24.078   5.582
          *(2,0,2)(2,0,1)24   WS, HUMIDITY, PRESS, TEMP         0.845          16.238  24.043   5.582
YJPM10    (4,0,3)(1,0,1)24    WS, HUMIDITY, PRESS, TEMP         0.901          0.110   3.030    -4.418
          (1,0,7)(1,0,1)24    WS, HUMIDITY, TEMP                0.901          0.109   3.033    -4.417
          (1,0,1)(1,0,1)24    WS, TEMP                          0.900          0.110   3.036    -4.411
          *(2,0,2)(1,0,1)24   WS, HUMIDITY, PRESS, TEMP         0.901          0.109   3.032    -4.419
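The seasonality check used to choose s = 24 (a pronounced ACF peak at lag 24) can be sketched on a synthetic series. The data below are artificial and only illustrate the mechanics; the function name is hypothetical and the sample ACF uses the usual biased estimator.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Synthetic hourly series with a 24-hour cycle (not measured data).
series = [10.0 + 5.0 * math.sin(2.0 * math.pi * h / 24.0) for h in range(24 * 20)]
acf = {lag: autocorrelation(series, lag) for lag in range(1, 49)}

# Look for the strongest positive autocorrelation away from the short lags.
peak_lag = max(range(12, 37), key=lambda lag: acf[lag])
```

For real pollutant series the peak is less clean than for a pure sinusoid, so in practice the ACF and PACF plots are inspected together, as described above.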
As shown, the models of the transformed variable YJSO2 have R2 values about 15 percentage points higher than those of the non-transformed time series of SO2. In the case of PM10, the models of the transformed variable YJPM10 have R2 values about 5 percentage points higher than the models of the initial variable PM10. Moreover, the models of the transformed variables provide better predictions and forecasts in the selected time horizons.

Analysis of the results in Table 2 shows that the meteorological factor GSR (global solar radiation) has no influence in any of the models. For the best selected models of SO2 and YJSO2, HUMIDITY does not have a significant effect and is excluded from the last two models. In all models of PM10 and YJPM10, the contribution of the four predictor variables WS, PRESS, HUMIDITY and TEMP is accounted for. The use of z-scores of these variables gives a more precise estimate of their importance.

As an example, we provide more details about the last model in Table 2 for YJPM10 (model *(2,0,2)(1,0,1)24), which has the best fit statistics and demonstrates the best predictive performance. The Ljung-Box Q(18) statistic of the model residuals obtained from the SPSS software is 16.645, with a significance of 0.163 > 0.05. This indicates that the residuals are serially uncorrelated and appear to be white noise [21]. We can conclude that the model is adequate. Parameter estimates and statistics are given in Table 3. All the estimates are statistically significant at level 0.02. Taking into account the standardization of the predictors, one can compare the coefficients in Table 3, which leads to the conclusion that the variables zTEMP and zWS (air temperature and wind speed) have the biggest effect on the PM10 pollution of the town.

TABLE 3. Parameters of the Selected SARIMA/TF Model *(2,0,2)(1,0,1)24 of YJPM10 a).

Variable    Term in the model  Lag    Estimate  Std. Error  t         Sig.
YJPM10      Constant                  2.482     0.116       21.312    0.000
            AR                 Lag 1  1.775     0.023       77.697    0.000
                               Lag 2  -0.781    0.022       -36.224   0.000
            MA                 Lag 1  0.783     0.024       33.290    0.000
                               Lag 2  0.107     0.008       13.436    0.000
            AR, Seasonal       Lag 1  0.995     0.001       1285.662  0.000
            MA, Seasonal       Lag 1  0.956     0.003       353.330   0.000
zWS         Numerator          Lag 0  -0.055    0.002       -30.583   0.000
            Denominator        Lag 1  0.455     0.026       17.833    0.000
zPRESS      Numerator          Lag 0  -0.025    0.010       -2.577    0.010
zTEMP       Delay                     2
            Numerator          Lag 0  -0.220    0.011       -20.783   0.000
                               Lag 1  0.173     0.040       4.357     0.000
            Denominator        Lag 1  -0.606    0.184       -3.285    0.001
zHUMIDITY   Numerator          Lag 0  -0.016    0.003       -4.831    0.000
                               Lag 1  -0.008    0.003       -2.420    0.016
            Denominator        Lag 1  0.972     0.006       155.357   0.000

a) The symbol z denotes that the predictor variables are preliminarily standardized.
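The residual check reported above can be reproduced in outline. The sketch computes the Ljung-Box statistic Q = n(n+2) Σ ρ_k² / (n − k) from a residual series and evaluates the chi-square upper tail in closed form, which is valid only for even degrees of freedom. The choice of 12 degrees of freedom (18 lags minus the 6 estimated ARMA terms p + q + P + Q) is an assumption about how the significance was obtained, and the function names are hypothetical.

```python
import math

def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    q = 0.0
    for k in range(1, max_lag + 1):
        num = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                  for t in range(k, n))
        rho = num / denom
        q += rho * rho / (n - k)
    return n * (n + 2) * q

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Reported Q(18) = 16.645; assumed df = 18 - (2 + 2 + 1 + 1) = 12.
p_value = chi2_upper_tail_even_df(16.645, 12)
```

Under this assumption the tail probability is approximately 0.163, in agreement with the reported significance, so the hypothesis of uncorrelated residuals is not rejected at the 0.05 level.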
Prediction and Forecasting

The obtained models were used for prediction and forecasting of future pollution levels in three 24-hour horizons. Figures 5 and 6 show measured values for the pollutant (in blue) and predicted values with the chosen best model (in green) for retransformed variables YJSO2 and YJPM10, respectively. Good predictive performance of the models is observed.
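Retransforming model output back to concentration units requires inverting Eq. (1). A minimal sketch of the inverse is given below; the function name is hypothetical, and the example uses $\lambda = -0.2$, the value that reproduces the YJPM10 range in Table 1.

```python
import math

def inverse_yeo_johnson(y, lam):
    """Invert the Yeo-Johnson transformation of Eq. (1) for a single value.

    The transformation preserves sign, so y >= 0 corresponds to x >= 0.
    """
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.exp(y) - 1.0
        return (lam * y + 1.0) ** (1.0 / lam) - 1.0
    if abs(lam - 2.0) < 1e-12:
        return 1.0 - math.exp(-y)
    return 1.0 - (1.0 - (2.0 - lam) * y) ** (1.0 / (2.0 - lam))

# Mean of YJPM10 from Table 1, mapped back to concentration units (µg/m3).
lam = -0.2
back = inverse_yeo_johnson(2.48, lam)
```

For negative $\lambda$ the inverse is defined only while $\lambda y + 1 > 0$, i.e. for transformed values below 5 when $\lambda = -0.2$, so forecasts on the transformed scale should be checked against this bound before retransformation.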
FIGURE 5. Observed values for SO2 (in blue) and predicted values (in green) with SARIMA/TF model *(2,0,2)(2,0,1) 24
FIGURE 6. Observed values for PM10 (in blue) and predicted values (in green) with SARIMA/TF model *(2,0,2)(1,0,1)24
The models were applied for forecasting the future pollution levels. Three successive 24-hour time horizons were obtained. The results are shown in Figures 7 and 8, with comparisons against the measured data. Good agreement is achieved, except for the spikes seen in the graphs.
FIGURE 7. Comparison of the observed data in the last 72 h (at the left hand side of the vertical line) and a forecasting for SO2 using holdout real data for 72 h (at the right hand side of the vertical line) with the retransformed SARIMA/TF model *(2,0,2)(2,0,1)24
FIGURE 8. Comparison of the observed data in the last 72 h (at the left hand side of the vertical line) and a forecasting for PM10 using holdout real data for 72 h (at the right hand side of the vertical line) with the retransformed SARIMA/TF model *(2,0,2)(1,0,1)24
CONCLUSION

High-performance SARIMA models are built, using 5 meteorological variables, as tools for forecasting the concentrations of PM10 and SO2 in the town of Kardzhali, Bulgaria. The final models are selected after modeling either without transformation of the initial measurement data or following a preliminary Yeo-Johnson transformation to improve normality. Models built on the transformed time series demonstrate better statistical performance, achieving a coefficient of determination of up to 88% for SO2 and 90% for PM10, and provide good forecasts. The predictive performance of the models is also analyzed at three forecast horizons of 24, 48, and 72 hours. More specifically, it was found that: 3 to 4 meteorological factors have influence, especially air temperature, wind speed, pressure and humidity; the preliminary Yeo-Johnson power transformation improves the R-squared by up to 15% for the SO2 model and by up to 5% for the PM10 model compared to models with non-transformed data; the use of the models for 3 successive 24-hour horizons demonstrates relatively good results in forecasting the pollutants in comparison with the observed measurements.

Developing suitable mathematical models of air pollution in problematic urban areas appears to be an independent alternative to the conventional and officially used information systems, complementing them and revealing new features of the overall climate conditions. Apart from the SARIMA method, other types of short-scale pollution models could be applied, based on multiple regression, principal component regression, regularized regression, etc.
ACKNOWLEDGMENTS

This work was supported in part by Paisii Hilendarski University of Plovdiv NPD under Grant NI15-FMI-004.
REFERENCES

1. Executive Environment Agency (ExEA), Bulgaria. http://eea.government.bg/en. Accessed 21 June 2015.
2. Directive 2008/50/EC of the European Parliament and of the Council of 21 May 2008 on ambient air quality and cleaner air for Europe, Official Journal of the European Union L 152/1 (2008), http://eurlex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2008:152:0001:0044:EN:PDF. Accessed 26 June 2015.
3. Air Quality Standards, European Commission. Environment, 2015, http://ec.europa.eu/environment/air/quality/standards.htm. Accessed 21 June 2015.
4. Air quality in Europe - 2014 report, European Environment Agency, Publications, 19 Nov 2014. Available: http://www.eea.europa.eu/publications/air-quality-in-europe2014/at_download/file.
5. EEA Daily Bulletin for air quality in the country, EEA - Executive environment agency, National system for realtime air quality control in Bulgaria, http://pdbase.government.bg/airq/bulletinen.jsp. Accessed 21 June 2015.
6. U. Brunelli, V. Piazza, L. Pignato, F. Sorbello, and S. Vitabile (2007) Atmos. Environ. 41(14), 2967-2995.
7. L. Jian, Y. Zhao, Y. P. Zhu, M. B. Zhang, and D. Bertolatti (2012) Sci. Tot. Environ. 426, 336-345, doi: http://dx.doi.org/10.1016/j.scitotenv.2012.03.025
8. E. A. P. Lima, E. C. Guimaraes, S. A. Pozza, M. A. S. Barrozo, and J. R. Coury (2009) Int. J. Environ. Eng. 1(1), 80-94.
9. A. Vlachogianni, P. Kassomenos, A. Karppinen, S. Karakitsios, and J. Kukkonen (2011) Sci. Tot. Environ. 409(8), 1559-1571.
10. J. G. De Gooijer and R. J. Hyndman (2006) 25 years of time series forecasting, Int. J. Forecasting 22, 443-473.
11. G. E. P. Box and G. M. Jenkins, Time Series Analysis, Forecasting and Control, Holden Day, San Francisco, 1976.
12. P. McBerthouex and L. C. Brown, Statistics for Environmental Engineers, 2nd edn., Lewis Publishers, CRC Press LLC, Boca Raton, 2002.
13. T. Slini, A. Kaprara, K. Karatzas, and N. Moussiopoulos (2006) Environ. Modell. Softw. 21(4), 559-565.
14. U. Kumar and V. Jain (2010) Stoch. Environ. Res. Risk Asses. 24(5), 751-760.
15. P. W. G. Liu (2009) Atmos. Environ. 43, 2104-2113.
16. J. Y. Zhao, Y.-P. Zhu, M.-B. Zhang, and D. Bertolatti (2012) Sci. Tot. Environ. 426, 336-345.
17. D. Petelin, A. Grancharova, and J. Kocijana, "Evolving Gaussian process models for prediction of ozone concentration in the air," in Simul. Model. Pract. Th. (EUROSIM 2010), 33, pp. 68-80, 2013.
18. S. G. Gocheva-Ilieva, A. V. Ivanov, D. S. Voynikova, and D. T. Boyadzhiev (2014) Stoch. Env. Res. Risk A. 28(4), 1045-1060.
19. D. S. Wilks, Statistical Methods in the Atmospheric Sciences, 3rd edn., Elsevier, Amsterdam, 2011.
20. I. K. Yeo and R. A. Johnson (2000) Biometrika 87(4), 954-959.
21. W. Enders, Applied Econometric Time Series, 4th edn., Wiley, New York, 2014.