investigating data processing and modelling for urban ...

596-124 COMPARING STATISTICAL AND NEURAL NETWORK APPROACHES FOR URBAN AIR POLLUTION TIME SERIES ANALYSIS

Daniel Dunea, Mihaela Oprea, Emil Lungu
University Petroleum-Gas of Ploiesti, Department of Informatics, Bd. Bucuresti Nr. 39, Ploiesti, 100680, Romania

IASTED International Conference on Modelling, Identification and Control (MIC 2008)

I. Context

Urban air pollution in Romania

In many urban areas, outdoor air pollution is still a concern. Sedimentable dusts and suspended particulate matter are the main air pollutants in Romania, often exceeding the legal limits (maximum allowable concentrations, MAC) for various averaging intervals. Over 4.5 million inhabitants are permanently exposed to pollution, which is associated with high morbidity levels. Pollutants of concern include ground-level ozone and particulate matter, emitted into the air by sources such as vehicles, factories, and construction activities. Both pollutants have been linked to asthma and other respiratory illnesses.

To protect their respiratory health, inner-city residents need timely access to air quality data. Access to air quality forecasts and real-time data can allow residents to reduce their exposures when pollutant levels are high.

Location and environmental problem

Total Suspended Particulates (TSP): concentration isolines

Frequency of TSP exceedances of the MAC (Romania's critical areas)

Urban air pollution case study: Targoviste City
Significant emission sources in the area: a thermal power plant and a steel factory.

Main Objective of the study:

A comparison of statistical methods and a feed-forward artificial neural network for solving urban air quality forecasting problems.

Monitoring points:
1. APM: 292 m, N 44° 55.288’ E 25° 26.802’
2. Liceul 5 (micro XII): 298 m, N 44° 54.864’ E 25° 28.090’
3. ProCor (micro XII): 279 m, N 44° 54.657’ E 25° 27.710’
4. Micro VI (Autobaza): 315 m, N 44° 55.393’ E 25° 26.880’
5. Doicesti: 325 m, N 44° 29.247’ E 25° 24.679’

II. Technical description 

The ARIMA model introduced by Box and Jenkins combines autoregressive and moving-average terms and explicitly includes differencing in the model formulation.
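For reference, the general ARIMA(p, d, q) model with constant can be written with the backshift operator B (B y_t = y_{t-1}). The notation below is a standard textbook form, not taken from the poster; sign conventions for the moving-average coefficients differ between texts and software packages.

```latex
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\qquad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}

% ARIMA(3,0,2) with constant (d = 0, so no differencing is applied):
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_3 y_{t-3}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
```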



ANN (artificial neural network) models. A software application was developed in C++ to run experiments with a feed-forward neural network. The best results were obtained with 4 or 6 units in the input layer, 6 neurons in the hidden layer, and one output neuron.
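The (6,6,1) and (4,6,1) architectures can be illustrated with a forward pass. This is a minimal Python sketch, not the authors' C++ application: the sigmoid hidden layer, linear output unit, and random initial weights are assumptions, and `forward` and `init_network` are hypothetical names.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def forward(x: Vector, w_hidden: Matrix, b_hidden: Vector,
            w_out: Vector, b_out: float) -> float:
    """Forward pass of an (n_in, n_hidden, 1) network: sigmoid hidden layer, linear output."""
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(w_hidden, b_hidden)]
    return b_out + sum(w * h for w, h in zip(w_out, hidden))


def init_network(n_in: int = 6, n_hidden: int = 6,
                 seed: int = 0) -> Tuple[Matrix, Vector, Vector, float]:
    """Small random initial weights for a hypothetical (n_in, n_hidden, 1) network."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_hidden, b_hidden, w_out, 0.0


if __name__ == "__main__":
    params = init_network(n_in=6, n_hidden=6)
    window = [7.1, 8.4, 6.9, 9.2, 7.7, 8.0]  # hypothetical last six monthly values
    print(forward(window, *params))
```

Switching to the (4,6,1) variant only requires `init_network(n_in=4)` and a four-value input window.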



Four learning algorithms were used to train the network: batch back-propagation, incremental back-propagation, Quickprop, and Rprop.
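Rprop and Quickprop differ from standard back-propagation in how each weight's step is chosen. The sketch below shows the iRprop- variant of Rprop (only the sign of the gradient is used, with per-weight adaptive step sizes) and a simplified Quickprop step (a secant, parabola-fitting update). The default hyper-parameters are common literature values, not settings reported for this study; the function names are hypothetical.

```python
from typing import List


def rprop_step(weights: List[float], grads: List[float], prev_grads: List[float],
               steps: List[float], eta_plus: float = 1.2, eta_minus: float = 0.5,
               step_min: float = 1e-6, step_max: float = 50.0) -> None:
    """One in-place iRprop- update; only the sign of each gradient matters."""
    for i in range(len(weights)):
        g = grads[i]
        if g * prev_grads[i] > 0:            # same sign as last time: accelerate
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif g * prev_grads[i] < 0:          # sign change: minimum overshot, slow down
            steps[i] = max(steps[i] * eta_minus, step_min)
            g = 0.0                          # iRprop-: skip this weight's update
        if g > 0:
            weights[i] -= steps[i]
        elif g < 0:
            weights[i] += steps[i]
        prev_grads[i] = g


def quickprop_delta(grad: float, prev_grad: float, prev_delta: float,
                    lr: float = 0.01, mu: float = 1.75) -> float:
    """Simplified Quickprop weight change: jump towards the minimum of a fitted parabola."""
    if prev_delta == 0.0:
        return -lr * grad                    # no history yet: plain gradient step
    denom = prev_grad - grad
    factor = grad / denom if denom != 0.0 else mu
    factor = max(-mu, min(mu, factor))       # limit growth to mu times the previous step
    return factor * prev_delta
```

On the quadratic error E = w²/2, a Quickprop step from w = 1 (previous gradient 2, previous step -1) gives `quickprop_delta(1.0, 2.0, -1.0) == -1.0`, landing exactly on the minimum at w = 0.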

Inputs to the selected statistical and ANN models: time series of monthly average concentrations (sedimentable dusts, total suspended particulates, nitrogen dioxide, and sulfur dioxide immissions) recorded during 1995-2006 in the urban area of Târgovişte (N = 144). The same raw data were used in both models, without any pre-processing or noise filtering.
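The poster does not say which values feed the 4 or 6 input units. A common setup, assumed here, is a sliding window over the monthly series: the previous 6 (or 4) monthly averages form the input vector and the following month is the target. `make_patterns` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def make_patterns(series: Sequence[float],
                  window: int = 6) -> Tuple[List[List[float]], List[float]]:
    """Turn a monthly series into (previous `window` values -> next value) training pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs = [list(series[t - window:t]) for t in range(window, len(series))]
    targets = [series[t] for t in range(window, len(series))]
    return inputs, targets
```

Under this assumption, a 144-month series yields 138 training pairs for a 6-input network and 140 for a 4-input network.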

II. Technical description (continued)

Time series analysis protocol
1. Identify the most suitable statistical model among linear trend, moving average, exponential smoothing, and ARIMA, based on the root mean square error (RMSE), the statistical significance of the terms in the forecasting model, and the results of five tests run on the residuals.
2. Statistical models: evaluate statistical significance using the correlation coefficient (r) and R-squared between the forecasted values and the observed raw data, the correlation between errors and data, the mean absolute error (MAE), and the standard error of estimate.
3. ANN model: evaluate with the same indicators as in step 2. Network training used four learning algorithms. The first two were the batch and incremental implementations of standard back-propagation, tested with different values of the learning rate and momentum. The other two were resilient back-propagation (Rprop) and Quickprop.
4. Select the most adequate learning algorithm based on statistical significance (r, R-squared, and the correlation between errors and data).
5. Compare the statistical significance of the statistical model and the ANN model.
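Step 1 ranks the candidate statistical models by RMSE. Minimal versions of the three non-ARIMA baselines can be sketched as follows, assuming the trend's time index t runs 1..N and that simple exponential smoothing is initialised with the first observation; the software that produced the coefficients in Table 1 may use a different indexing or initialisation. All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Forecasts = List[Optional[float]]  # None where no forecast exists yet


def linear_trend(y: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Least-squares trend y ~ a + b*t with t = 1..n; returns a, b and the fitted values."""
    n = len(y)
    t = range(1, n + 1)
    t_mean, y_mean = (n + 1) / 2, sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b, [a + b * ti for ti in t]


def sma_forecasts(y: Sequence[float], k: int = 3) -> Forecasts:
    """One-step-ahead forecasts from a simple moving average of the last k values."""
    return [None] * k + [sum(y[t - k:t]) / k for t in range(k, len(y))]


def ses_forecasts(y: Sequence[float], alpha: float) -> Forecasts:
    """One-step-ahead simple exponential smoothing forecasts (level starts at y[0])."""
    level = y[0]
    out: Forecasts = [None]
    for t in range(1, len(y)):
        out.append(level)                     # forecast for period t, made at t-1
        level = alpha * y[t] + (1 - alpha) * level
    return out


def rmse(y: Sequence[float], forecasts: Forecasts) -> float:
    """Root mean square error over the periods that have a forecast."""
    sq = [(yi - fi) ** 2 for yi, fi in zip(y, forecasts) if fi is not None]
    return math.sqrt(sum(sq) / len(sq))
```

For the sedimentable dust series, the Table 1 settings would correspond to `sma_forecasts(y, 3)` and `ses_forecasts(y, 0.0108)`; a very small alpha such as 0.0108 makes the smoothed level close to a long-run mean.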

III. Current results

Sedimentable dusts time series analysis

Table 1 Statistical models used for the time series of sedimentable dusts in Targoviste: root mean square error (RMSE) and the results of five tests run on the residuals

Model                                               RMSE
ARIMA(3,0,2) with constant                          2.31847
Linear trend = 7.27945 + 0.00133407 t               2.87789
Simple moving average of 3 terms                    3.09659
Simple exponential smoothing with alpha = 0.0108    2.88439

Residual tests: (1) test for excessive runs up and down; (2) test for excessive runs above and below the median; (3) Box-Pierce test for excessive autocorrelation; (4) test for difference in mean, 1st half to 2nd half; (5) test for difference in variance, 1st half to 2nd half.
Significance levels: not significant (p ≥ 0.05); marginally significant (0.01 < p ≤ 0.05); highly significant (p ≤ 0.001).
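Three of the five residual tests in Table 1 can be sketched with the Python standard library: the runs up and down test, the runs above and below the median test (both with normal approximations), and the Box-Pierce statistic. This is an illustrative sketch, not the software used for the table; the mean and variance tests for the two halves are omitted, and Box-Pierce returns only Q, which is usually compared with a chi-square distribution whose degrees of freedom are the number of lags minus the number of fitted ARMA terms.

```python
import math
import statistics
from typing import Sequence, Tuple


def _two_sided_p(z: float) -> float:
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def runs_up_and_down(e: Sequence[float]) -> Tuple[int, float, float]:
    """Runs of increases/decreases; too many or too few suggests non-random residuals."""
    signs = [1 if b > a else -1 for a, b in zip(e, e[1:]) if b != a]  # ties dropped
    runs = 1 + sum(1 for s0, s1 in zip(signs, signs[1:]) if s1 != s0) if signs else 0
    n = len(e)
    z = (runs - (2 * n - 1) / 3) / math.sqrt((16 * n - 29) / 90)
    return runs, z, _two_sided_p(z)


def runs_above_below_median(e: Sequence[float]) -> Tuple[int, float, float]:
    """Wald-Wolfowitz runs of residuals above/below their median (ties dropped)."""
    med = statistics.median(e)
    above = [x > med for x in e if x != med]
    n1 = sum(above)
    n2 = len(above) - n1
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = (runs - mean) / math.sqrt(var)
    return runs, z, _two_sided_p(z)


def box_pierce(e: Sequence[float], max_lag: int) -> float:
    """Box-Pierce Q = n * sum of squared residual autocorrelations for lags 1..max_lag."""
    n = len(e)
    mean = sum(e) / n
    d = [x - mean for x in e]
    c0 = sum(x * x for x in d)
    acf = [sum(d[t] * d[t - k] for t in range(k, n)) / c0 for k in range(1, max_lag + 1)]
    return n * sum(r * r for r in acf)
```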






Table 2 Correlation coefficients (r) between forecasted and observed values for the ANN algorithms used for the time series of sedimentable dusts in Targoviste

Model                              r
ANN model (6,6,1), Batch           0.644
ANN model (6,6,1), Quickprop       0.603
ANN model (6,6,1), Incremental     0.606
ANN model (6,6,1), Rprop           0.707
ARIMA(3,0,2), for comparison       0.607
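Step 4 of the protocol, and the automated selection mentioned as future work, amount to ranking the candidate learning algorithms by their statistics. A minimal sketch, assuming r is the primary criterion and MAE a hypothetical tie-breaker (the source also lists R-squared and the correlation between errors and data); `select_algorithm` is a hypothetical name. With the Table 2 values it picks Rprop, which matches the algorithm shown for sedimentable dusts.

```python
import math
from typing import Dict, Optional


def select_algorithm(r_by_algorithm: Dict[str, float],
                     mae_by_algorithm: Optional[Dict[str, float]] = None) -> str:
    """Return the learning algorithm with the highest forecast/observed r.

    Ties on r are broken by the lower mean absolute error when MAE values are given.
    """
    def key(name: str):
        mae = mae_by_algorithm.get(name, math.inf) if mae_by_algorithm else 0.0
        return (r_by_algorithm[name], -mae)

    return max(r_by_algorithm, key=key)


if __name__ == "__main__":
    # r values for sedimentable dusts, copied from Table 2 (ANN (6,6,1) models).
    table2 = {"Batch": 0.644, "Quickprop": 0.603, "Incremental": 0.606, "Rprop": 0.707}
    print(select_algorithm(table2))
```

Ranking by R-squared instead of r gives the same order here, because R-squared for a linear fit of observed on forecast values is r squared and all r values are positive.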

Figure: Sedimentable dusts time series analysis. First panel: observed series vs. ARIMA(3,0,2) forecast (r = 0.607). Second panel: observed series vs. ANN Rprop (6,6,1) forecast (r = 0.707). Axes: concentration (g m-2) vs. months (1995-2006).

Figure: Total Suspended Particulates (TSP) time series analysis. First panel: observed series vs. ARIMA(4,0,3) forecast (r = 0.801). Second panel: observed series vs. ANN Quickprop (4,6,1) forecast (r = 0.946). Axes: concentration (µg m-3) vs. months (1995-2006).

Figure: Nitrogen Dioxide (NO2) time series analysis. First panel: observed series vs. ARIMA(3,0,2) forecast (r = 0.676). Second panel: observed series vs. ANN incremental (6,6,1) forecast (r = 0.877). Axes: concentration (µg m-3) vs. months (1995-2006).

Figure: Sulfur Dioxide (SO2) time series analysis. First panel: observed series vs. ARIMA(3,0,3) forecast (r = 0.876). Second panel: observed series vs. ANN incremental (6,6,1) forecast (r = 0.914). Horizontal axis: months (1995-2006).

Nitrogen Dioxide (NO2) time series analysis: correlation between errors and data

Because the measured and simulated data are close, the observed and simulated curves in the time series figures are difficult to distinguish.

Figure: Residual plots of the ARIMA and ANN models for NO2 (errors vs. observed series). The panels report a standard error of estimate of 3.94825 and mean absolute errors of 2.99397 and 2.321.
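The indicators in these residual panels (MAE, standard error of estimate, and the correlation between errors and data) plus r and R-squared from step 2 can be computed as below. This is a sketch under stated assumptions: errors are taken as observed minus forecast, and the standard error of estimate is assumed to follow the simple-regression convention sqrt(SSE / (n - 2)); the software behind the figures may define it differently. `residual_summary` is a hypothetical name.

```python
import math
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_summary(observed: Sequence[float], forecast: Sequence[float],
                     n_params: int = 2) -> Dict[str, float]:
    """Goodness-of-fit indicators for one model's forecasts of one pollutant."""
    errors = [o - f for o, f in zip(observed, forecast)]
    n = len(errors)
    r = pearson_r(forecast, observed)
    return {
        "r": r,
        "r_squared": r * r,
        "mae": sum(abs(e) for e in errors) / n,
        # assumption: simple-regression convention with n_params = 2 estimated terms
        "std_error_of_est": math.sqrt(sum(e * e for e in errors) / (n - n_params)),
        "r_errors_vs_data": pearson_r(errors, observed),
    }
```

A residual cloud with no trend against the observed series corresponds to `r_errors_vs_data` close to zero.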

Sulfur Dioxide (SO2) time series analysis: correlation between errors and data

Figure: Residual plots of the ARIMA and ANN models for SO2 (errors vs. observed series of SO2). The panels report a standard error of estimate of 12.1 and mean absolute errors of 6.82 and 8.12004.

General overview of the statistical and ANN model forecasting performances for Targoviste air quality time series

Air pollutant                         Statistical model   r       ANN model               r
Sedimentable dusts                    ARIMA (3,0,2)       0.607   Rprop (6,6,1)           0.707
Total Suspended Particulates (TSP)    ARIMA (4,0,3)       0.801   Quickprop (4,6,1)       0.946
Nitrogen Dioxide (NO2)                ARIMA (3,0,2)       0.676   Incremental (6,6,1)     0.877
Sulfur Dioxide (SO2)                  ARIMA (3,0,3)       0.876   Incremental (6,6,1)     0.914
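Squaring the correlation coefficients in the overview table gives the share of variance in the observations explained by a linear fit on the forecasts, which makes the gap between the models easier to read. A small sketch using the table's values; `SUMMARY_R` and `explained_variance` are hypothetical names.

```python
from typing import Dict

# Correlation coefficients copied from the overview table above.
SUMMARY_R: Dict[str, Dict[str, float]] = {
    "Sedimentable dusts": {"statistical": 0.607, "ann": 0.707},
    "TSP": {"statistical": 0.801, "ann": 0.946},
    "NO2": {"statistical": 0.676, "ann": 0.877},
    "SO2": {"statistical": 0.876, "ann": 0.914},
}


def explained_variance(table: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """r squared per pollutant and model (variance explained by a linear fit)."""
    return {p: {m: r * r for m, r in models.items()} for p, models in table.items()}


if __name__ == "__main__":
    for pollutant, r2 in explained_variance(SUMMARY_R).items():
        gain = r2["ann"] - r2["statistical"]
        print(f"{pollutant:20s} stat={r2['statistical']:.3f} ann={r2['ann']:.3f} gain={gain:+.3f}")
```

For sedimentable dusts, for example, R-squared rises from about 0.368 (ARIMA) to about 0.500 (Rprop).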

IV. Innovation relevance

End-user segments (expert, citizen) and useful applications:
Pollution threshold warnings
Mobile-GIS support
Remote access SCADA

V. Conclusions

The contribution of artificial intelligence to the environmental monitoring systems under development relates to evolutionary computing, which provides stochastic search processes that can efficiently explore complex spaces described by mathematical, statistical, neural network, or fuzzy inference models applied to urban air pollution evaluation.

Machine-learning techniques currently contribute to meeting the requirements of on-line air quality monitoring.

Statistical and neural modeling techniques can also provide approximations that supplement results from computationally expensive analytic methods.

In this experiment, the best air quality forecasting results (highest correlation coefficients) were obtained with Rprop (sedimentable dusts), Quickprop (TSP), which trained faster by using an adaptive technique for adjusting the network weights, and the incremental algorithm (NO2 and SO2).

One weakness of the ARIMA model lies in its assumption that the examined time series is stationary and linear, and therefore has no structural changes.

The advantages of neural computing techniques over conventional statistical approaches include faster computation, learning ability, and noise rejection.

This approach is intended to harmonize ANN-based data forecasting processes worldwide. It might help experts in subsequent assessments and should increase the role of local and regional pollution control tools in understanding how to reduce global emissions.

VI. Future work

Further investigation will use hourly, daily, and monthly air quality data from other locations and at the regional level, to assess and verify the reliability, relevance, and adequacy of ANN data forecasting.

An important step toward reliable air quality forecasting is the optimal selection of the ANN learning algorithm.

Automating this component is required to optimize information flows and to facilitate the decision-making process.



The software application will decide which learning algorithm best models a given time series, based on statistical significance (r, R-squared, and the correlation between errors and data).

Acknowledgements: The research work reported in this paper is funded by a Romanian Postdoctoral Programme under CEEX research project no. 1533/2006.
