596-124 COMPARING STATISTICAL AND NEURAL NETWORK APPROACHES FOR URBAN AIR POLLUTION TIME SERIES ANALYSIS
Daniel Dunea, Mihaela Oprea, Emil Lungu
University Petroleum-Gas of Ploiesti, Department of Informatics, Bd. Bucuresti Nr. 39, Ploiesti, 100680, Romania
IASTED International Conference on Modelling, Identification and Control (MIC 2008)
I. Context
The urban air pollution in Romania
In many urban areas, outdoor air pollution is still a concern. Sedimentable dusts and suspended particulate matter are the main air pollutants in Romania, often exceeding the legal limits (maximum admissible concentration, MAC) for various averaging intervals. Over 4.5 million inhabitants are permanently exposed to pollution, which leads to high morbidity levels. Pollutants of concern include ground-level ozone and particulate matter, emitted into the air by sources such as vehicles, factories, and construction activities. Both pollutants have been linked to asthma and other respiratory illnesses.
To protect their respiratory health, inner-city residents need timely access to air quality data. Access to air quality forecasts and real-time data can allow residents to reduce their exposures when pollutant levels are high.
Location and environmental problem
Total Suspended Particulates (TSP)– Concentration isolines
MAC exceeding frequency of TSP (Romania’s critical areas)
Urban Air Pollution - Case Study: Targoviste City
Significant emission sources in the area: a thermal power plant and a steel factory
Main Objective of the study:
A comparative study of solving urban air quality forecasting problems with statistical methods and a feed-forward artificial neural network.
Monitoring points
1. APM: 292 m, N 44° 55.288', E 25° 26.802'
2. Liceul 5 (micro XII): 298 m, N 44° 54.864', E 25° 28.090'
3. ProCor (micro XII): 279 m, N 44° 54.657', E 25° 27.710'
4. Micro VI (Autobaza): 315 m, N 44° 55.393', E 25° 26.880'
5. Doicesti: 325 m, N 44° 29.247', E 25° 24.679'
II. Technical description
The ARIMA model introduced by Box and Jenkins combines autoregressive and moving-average parameters and explicitly includes differencing in the formulation of the model.
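For reference, the general ARIMA(p, d, q) model can be written in backshift notation as below. The symbols are the usual textbook ones and do not appear on the poster: B is the backshift operator (B y_t = y_{t-1}), c is the constant, and ε_t is the white-noise error term.

```latex
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\qquad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}
```

With d = 0, as in the ARIMA(3,0,2), ARIMA(4,0,3) and ARIMA(3,0,3) models reported here, the differencing factor (1 - B)^d equals 1 and the model reduces to an ARMA(p, q) model with constant.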
ANN (Artificial Neural Network) models. A software application was developed in C++ to run experiments with a feed-forward neural network. The best results were obtained with 4 or 6 units in the input layer, 6 neurons in the hidden layer and one output neuron.
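A minimal sketch of the forward pass of a (6,6,1) feed-forward network follows. It is written in Python for brevity and is not the authors' C++ application; the tanh hidden activation, the linear output unit, the random initial weights and all names are assumptions made for illustration.

```python
import math
import random

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> tanh hidden layer -> one linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

random.seed(0)  # fixed seed so the sketch is reproducible
n_in, n_hidden = 6, 6

# Hypothetical small random initial weights (assumption, not from the poster).
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0

# Number of trainable parameters in a (6,6,1) network with biases.
n_params = n_in * n_hidden + n_hidden + n_hidden + 1

x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # placeholder inputs, not measured data
y = forward(x, w_hidden, b_hidden, w_out, b_out)
```

Under these assumptions a (6,6,1) network has 49 trainable parameters, which is large relative to a series of 144 monthly values.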
Four learning algorithms were used for the neural network approach: Batch, Incremental, Quickprop and Rprop.
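To illustrate how Rprop differs from standard back-propagation, the sketch below shows one common variant (Rprop without weight backtracking): each weight has its own step size, which grows while the gradient keeps its sign and shrinks when the sign flips, and only the sign of the gradient is used for the update. The constants are the commonly quoted defaults, the function name is hypothetical, and the toy quadratic error function is only a stand-in for a network error surface.

```python
def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One Rprop update (no weight backtracking); all lists are updated in place."""
    for i, g in enumerate(grads):
        sign_product = g * prev_grads[i]
        if sign_product > 0:      # same sign as last time: accelerate
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif sign_product < 0:    # sign flipped: the minimum was overshot
            steps[i] = max(steps[i] * eta_minus, step_min)
        if g > 0:                 # move against the gradient sign only
            weights[i] -= steps[i]
        elif g < 0:
            weights[i] += steps[i]
        prev_grads[i] = g

# Toy problem: minimise (w0 - 3)^2 + (w1 + 1)^2.
weights = [0.0, 0.0]
prev_grads = [0.0, 0.0]
steps = [0.1, 0.1]
for _ in range(200):
    grads = [2.0 * (weights[0] - 3.0), 2.0 * (weights[1] + 1.0)]
    rprop_step(weights, grads, prev_grads, steps)
```

Because the update ignores the gradient magnitude, no learning rate or momentum has to be tuned, in contrast to the batch and incremental back-propagation runs.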
Inputs to the selected statistical and ANN models: time series of monthly average concentrations (sedimentable dusts, total suspended particulates, nitrogen dioxide, and sulfur dioxide immissions) recorded during 1995-2006 in the urban area of Târgovişte (N = 144).
The same raw data were used in both models, without any pre-processing or noise filtering.
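The poster does not state how the series was presented to the network. A common choice, assumed here, is a sliding window in which the previous 4 or 6 monthly values are the inputs and the next value is the target. The function name and the placeholder series are hypothetical.

```python
def make_windows(series, n_inputs):
    """Build (inputs, target) pairs: n_inputs consecutive values predict the next one."""
    pairs = []
    for t in range(n_inputs, len(series)):
        pairs.append((series[t - n_inputs:t], series[t]))
    return pairs

# Placeholder series of 144 values standing in for the monthly averages (not measured data).
monthly = [float(v) for v in range(144)]
pairs_6 = make_windows(monthly, 6)
pairs_4 = make_windows(monthly, 4)
```

Under this assumption, a series of N = 144 values yields 138 training patterns for 6 inputs and 140 for 4 inputs.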
III. Technical description
Time series analysis protocol
1. Identification of the most suitable statistical model among linear trend, moving average, exponential smoothing and ARIMA methods, based on the root mean square error (RMSE), the statistical significance of the terms in the forecasting model and the results of five tests run on the residuals.
2. STATISTICAL MODELS: evaluation of the statistical significance based on the correlation coefficient and R-squared between the forecasted variables and the observed raw data, the correlation between errors and data, the mean absolute error (MAE) and the standard error of estimates.
3. ANN MODEL: the same evaluation criteria as for the statistical models. The network training comprised four learning algorithms. The first two were the batch and incremental implementations of standard back-propagation learning. These standard algorithms were tested with different values of the learning rate and momentum. The other two algorithms were resilient back-propagation (Rprop) and Quickprop.
4. Selection of the most adequate learning algorithm based on the statistical significance (r, R-squared and the correlation between errors and data).
5. Comparison of the statistical significance of the statistical model and the ANN model.
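The evaluation indicators named in the protocol can be sketched as follows. This is an illustration only: R-squared is taken here as the square of the Pearson correlation between forecasted and observed values (an assumption about the authors' definition), and the short series are placeholder numbers, not the Targoviste data.

```python
import math

def rmse(observed, forecast):
    """Root mean square error."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed))

def mae(observed, forecast):
    """Mean absolute error."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

observed = [1.0, 2.0, 3.0, 4.0]   # placeholder values
forecast = [1.5, 2.5, 2.5, 4.5]   # placeholder values

errors = [o - f for o, f in zip(observed, forecast)]
r = pearson_r(forecast, observed)             # forecasted vs. observed
r_squared = r ** 2                            # assumed definition of R-squared
r_errors_data = pearson_r(errors, observed)   # correlation between errors and data
```

A correlation between errors and data close to zero suggests that the residuals carry no remaining structure related to the observed level.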
III. Current results
Sedimentable dusts time series analysis
Table 1. Statistical models used for the time series of sedimentable dusts in Targoviste: root mean square error (RMSE) and the results of five tests run on the residuals

Model                                             RMSE     RUNS  RUNM  AUTO  MEAN  VAR
ARIMA(3,0,2) with constant                        2.31847  NS    NS    *     NS    NS
Linear trend = 7.27945 + 0.00133407 t             2.87789  NS    ***   ***   NS    NS
Simple moving average of 3 terms                  3.09659  ***   ***   ***   NS    NS
Simple exponential smoothing with alpha = 0.0108  2.88439  NS    ***   ***   NS    NS

RUNS = test for excessive runs up and down
RUNM = test for excessive runs above and below median
AUTO = Box-Pierce test for excessive autocorrelation
MEAN = test for difference in mean, 1st half to 2nd half
VAR = test for difference in variance, 1st half to 2nd half
NS = not significant (p ≥ 0.05); * = marginally significant (0.01 < p ≤ 0.05); *** = highly significant (p ≤ 0.001)

ARIMA: r = 0.607
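The two simplest models in Table 1 can be sketched as one-step-ahead forecasters. The initialisation of the smoothed level with the first observation is an assumption (statistical packages often use other start-up schemes), and the function names and example series are hypothetical.

```python
def moving_average_forecasts(series, k=3):
    """Forecast each value as the mean of the k preceding observations."""
    return [sum(series[t - k:t]) / k for t in range(k, len(series))]

def ses_forecasts(series, alpha):
    """Simple exponential smoothing: one-step-ahead forecasts for series[1:]."""
    level = series[0]  # assumed initial level
    forecasts = []
    for y in series[1:]:
        forecasts.append(level)                    # forecast made before seeing y
        level = alpha * y + (1.0 - alpha) * level  # update after seeing y
    return forecasts

series = [2.0, 4.0, 6.0, 8.0, 10.0]  # placeholder values, not measured data
ma = moving_average_forecasts(series, k=3)
ses = ses_forecasts(series, alpha=0.5)
```

With a smoothing constant as small as the fitted alpha = 0.0108, the level barely reacts to new observations, so the forecast is almost constant. That is consistent with its RMSE (2.88439) being close to that of the nearly flat linear trend (2.87789).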
Table 2. Correlation coefficients (forecasted vs. observed) of the ANN algorithms used for the time series of sedimentable dusts in Targoviste

ANN model (6,6,1)  r
Batch              0.644
Quickprop          0.603
Incremental        0.606
Rprop              0.707
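Step 4 of the protocol, applied to the values of Table 2, amounts to picking the algorithm with the highest indicator. The sketch below uses the correlation coefficient alone, whereas the protocol also considers R-squared and the correlation between errors and data; the variable names are hypothetical.

```python
# Correlation coefficients from Table 2 (sedimentable dusts, ANN model (6,6,1)).
r_by_algorithm = {
    "Batch": 0.644,
    "Quickprop": 0.603,
    "Incremental": 0.606,
    "Rprop": 0.707,
}

# Select the learning algorithm with the highest forecasted/observed correlation.
best_algorithm = max(r_by_algorithm, key=r_by_algorithm.get)
```

For sedimentable dusts this selects Rprop, which matches the model reported in the general overview.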
Sedimentable dusts time series analysis
[Figure: two panels, concentration (g m-2) vs. months (1995-2006). Top: Observed vs. ARIMA (3,0,2), r = 0.607. Bottom: Observed vs. ANN Rprop (6,6,1), r = 0.707.]
Total Suspended Particulates (TSP) time series analysis
[Figure: two panels, concentration (µg m-3) vs. months (1995-2006). Top: Observed vs. ARIMA (4,0,3), r = 0.801. Bottom: Observed vs. ANN Quickprop (4,6,1), r = 0.946.]
Nitrogen Dioxide (NO2) time series analysis
[Figure: two panels, concentration (µg m-3) vs. months (1995-2006). Top: Observed vs. ARIMA (3,0,2), r = 0.676. Bottom: Observed vs. ANN incremental (6,6,1), r = 0.877.]
Sulfur Dioxide (SO2) time series analysis
[Figure: two panels, concentration (µg m-3) vs. months (1995-2006). Top: Observed vs. ARIMA (3,0,3), r = 0.876. Bottom: Observed vs. ANN incremental (6,6,1), r = 0.914.]
Nitrogen Dioxide (NO2) time series analysis: correlation between errors and data
Since the measured and simulated data are close, the plots of the observed and simulated pollutants in the figures are difficult to distinguish.
[Figure: residual plots of ARIMA (NO2) and ANN (NO2), errors vs. observed series. Values legible in the source, without panel attribution: standard error of estimate = 3.94825 (second value truncated); mean absolute error = 2.99397 and 2.321.]
Sulfur Dioxide (SO2) time series analysis: correlation between errors and data
[Figure: residual plots of ARIMA (SO2) and ANN (SO2), errors vs. observed series of SO2. Values legible in the source, without panel attribution: standard error of estimate = 12.1 (other value garbled); mean absolute error = 6.82 and 8.12004.]
General overview of the statistical and ANN model forecasting performances for the Targoviste air quality time series

Air pollutant                       Statistical model  r      ANN model            r
Sedimentable dusts                  ARIMA (3,0,2)      0.607  Rprop (6,6,1)        0.707
Total Suspended Particulates (TSP)  ARIMA (4,0,3)      0.801  Quickprop (4,6,1)    0.946
Nitrogen Dioxide (NO2)              ARIMA (3,0,2)      0.676  Incremental (6,6,1)  0.877
Sulfur Dioxide (SO2)                ARIMA (3,0,3)      0.876  Incremental (6,6,1)  0.914

r = correlation coefficient between forecasted and observed values
IV. Innovation relevance
End-user segments: expert, citizen
Useful applications: pollution threshold warnings, Mobile-GIS support, remote-access SCADA
VI. Conclusions
The contribution of artificial intelligence to the environmental monitoring systems under development relates to evolutionary computing, which provides stochastic search processes that can efficiently assess complex spaces described by mathematical, statistical, neural network or fuzzy inference models applied to urban air pollution evaluation.
Machine-learning techniques are currently contributing to on-line air quality monitoring requirements.
Statistical and neural modeling techniques can also provide approximations to supplement results from computationally expensive analytic methods.
In this experiment, significant results for air quality data forecasting were obtained with Rprop (sedimentable dusts) and Quickprop (TSP), which gave faster training by using an adaptive technique for adjusting the network weights, and with the Incremental algorithm (NO2 and SO2).
One weakness of the ARIMA model lies in its assumption that the examined time series is stationary and linear, and therefore has no structural changes.
The advantages of neural computing techniques over conventional statistical approaches are faster computation, learning ability and noise rejection.
This approach is intended to help harmonize ANN-based data forecasting processes worldwide. It may help experts in subsequent assessments and should strengthen the role of local and regional pollution control tools in understanding how to reduce global emissions.
V. Future work
Further investigation is planned using hourly, daily and monthly air quality data from other locations and at the regional level, to assess and verify the reliability, relevance, and adequacy of ANN data forecasting.
An important step for reliable air quality forecasting is the optimal selection of ANN learning algorithm.
The automation of this component is required to optimize the informational fluxes and to facilitate the decision making process.
The software application will decide which algorithm's modeled data best match the time series, based on the statistical significance (r, R-squared and the correlation between errors and data).
Acknowledgements: The research work reported in this paper was funded by a Romanian Postdoctoral Programme under the CEEX research project no. 1533/2006.