Degree Name:
Master of Sciences
Student ID:
DETERMINING CHANGES IN LEVELS OF AIR POLLUTION BASED ON DIFFERENT ENVIRONMENTAL CONDITIONS:
THE CASE OF AIR POLLUTANTS NEAR THE SOHAR HIGHWAY IN OMAN
By Sara Al-Ruzeiqi
Supervisor: Professor Eran Edirisinghe
A Master's by Course Dissertation Submitted in partial fulfilment of the requirements for the award of Master of Science in Information Technology of Loughborough University
September 2014
DECLARATION
I hereby declare that the master's thesis entitled “Determining changes in levels of air pollution based on different environmental conditions: the case of air pollutants near the Sohar highway in Oman” is my own original work, carried out as a master's student at Loughborough University, except to the extent that assistance from others in the dissertation’s design and conception or in style, presentation and linguistic expression is duly acknowledged.
All sources used for this thesis have been fully and properly cited. It contains no material which to a substantial extent has been accepted for the award of any other degree at Loughborough University or any other educational institution, except where due acknowledgement is made in the thesis.
Sara Al-Ruzeiqi
(Signature) 22 September 2014
(Date)
Summary
This paper aims to establish a mechanism for predicting future values of environmental parameters such as temperature, humidity and gas concentrations, including ozone, using machine-learning algorithms. An evaluation of various models for predicting gas concentrations and pollutant factors showed that the Bagging model was the most effective across two datasets of different resolutions (1-hour and 10-minute intervals). Moreover, analyses of SO2, NO2, O3, benzene, toluene, o-Xylene, m-Xylene and p-Xylene as pollutants near the SHW, measured over an 11-month period, show that predictions of pollutant levels can attain a relative absolute error as low as 18.01 for O3, while predictions of pollution can reach an accuracy of 99.87% with a 1-day lead time. Data for 10-minute intervals revealed acceptable levels per the AQG within the eleven-month study period, but exhibited elevated levels on certain days and at certain hours of the day. Based on the results of this study, a user-friendly website, TRCAQP (http://trcaqp.com), has been developed to forecast air pollution in Sohar city. The system predicts and gives daily online updates of the levels of eight air pollution indicators (SO2, NO2, O3, benzene, toluene, o-Xylene, m-Xylene and p-Xylene) for the next day (+1 day) using the Bagging model.
Acknowledgement
During my journey to obtain an MSc, I have not travelled alone. I have received a great deal of support and encouragement from my supervisor, colleagues, friends and family. I would like to thank all those people who made this thesis possible and gave me an unforgettable experience. First of all, I would like to express my deepest appreciation to my supervisor, Professor Eran Edirisinghe, for his interest, guidance and encouragement throughout this work. His profound knowledge, positive attitude, patient guidance, and valuable suggestions guaranteed the completion of this thesis. His unflinching courage and conviction will always inspire me, and I hope to continue to work with him in the future. I would also like to thank my colleagues for their friendly advice, interesting discussions and the enjoyable working atmosphere. My special acknowledgements go to all those people who made possible the difficult task of ‘Using Data Mining Algorithm’ for my experiments. My warm appreciation is due to TRC in Oman. I cannot find proper words to express my deep gratitude to my family and friends for their sincere encouragement and inspiration during this period, which helped to bring me to this stage of my life. And last but not least, I would like to thank all who have knowingly and unknowingly helped me and been involved in the successful completion of this project.
Table of Contents
Summary
Acknowledgement
Table of Contents
List of Tables
List of Figures
Abbreviations
CHAPTER 1 Air Pollution Effects on Environmental Conditions near the Sohar Highway in Oman
  1.1 Introduction
  1.2 Hypotheses and Testing
  1.3. Limitations
  1.4 Thesis Structure
CHAPTER 2 Heavy Air Pollution Prediction by Using Data Mining Algorithm
  2.1. Introduction
  2.2. Air Pollution Measuring Techniques
    2.2.1. Chemiluminescence Methods
      2.2.1.1. Gas-Phase Chemiluminescence
      2.2.1.2. Liquid-Phase Chemiluminescence
    2.2.2. Passive Samplers
    2.2.3. Electrochemical Sensors
    2.2.4. Thick Film Sensors
    2.2.5. Spectroscopic Methods
    2.2.6. Fourier Transform Infrared Spectrometry
  2.3. Air Quality Forecasting Techniques
    2.3.1. Climatology Methods
    2.3.2. Data Mining Methods
  2.4. Conclusion
CHAPTER 3 Methodology and Approaches Applied in This Study
  3.1 Introduction
  3.2 Studied gases
    3.2.1 Oxides of Nitrogen
    3.2.2 Ozone
    3.2.3 Sulphur Dioxide
    3.2.4 BTX
  3.3 Instruments
  3.4 Procedure
CHAPTER 4 Air Pollution Prediction Results and Discussion Using ANOVA and Data Mining Near the Sohar Highway in Oman
  4.1 Introduction
  4.2 Data Analysis by Using Statistical Method ANOVA
    4.2.1 One hour time interval
    4.2.2 Ten minutes time interval
      4.2.2.1 Air composition analysis
      4.2.2.2 Every 10 minutes in one hour
      4.2.2.3 Every 1 hour in one day
      4.2.2.4 Weekdays over the eleven months
      4.2.2.5 AM/PM effects over the eleven months
      4.2.2.6 Tables with Post Hoc probabilities (Duncan test) of the monthly variations
      4.2.2.7 Multivariate Linear Regression for the O3
  4.3 Data Analysis by ANOVA Method with Extended Sample Data and Regression Model
    4.3.1 Average gases concentrations for short period
    4.3.2 Regression model
  4.4 Data Analysis by using machine learning algorithm
    4.4.1 Dataset characteristics
    4.4.2 Dataset attribution process
    4.4.3. Pollution prediction algorithm
  4.4 Model validation results
Chapter 5 The TRC Air Quality Forecasting System
  5.1 Introduction
  5.2. An Overview of the TRCAQ’S Process
  5.3. The TRCAQ Website Design
    5.3.1. The Home Section
    5.3.2. Daily Forecast Section
    5.3.3. About TRCAQ and What Can You Do Sections
    5.3.4. Real Time Data Section
    5.3.5. News & Events Section
    5.3.6. Local Air Quality Oman
    5.3.7. Stay Updated and TRC-AQ Alerts Sections
  5.4. TRCAQ Database Design
  5.5. Software Components Used To Operate the TRCAQ
    5.5.1. Ambient Collection Stage
    5.5.2. Forecasting Module Stage
    5.5.3. Website Stage
  5.6. Advantages of the TRCAQ Website
  5.7 Conclusions
CHAPTER 6 Conclusion
Author’s Contribution
References
Appendix I
Appendix II
Appendix III
Appendix IV
List of Tables
Table 2.1 Comparison of advantages and disadvantages of different air pollution measurement techniques/methods
Table 4.1 Permissible concentrations of gases in the air, reported by the WHO for prevent health consequences
Table 4.2 Descriptive Statistics for the average 1 hour collected data
Table 4.3 Correlation matrix between the relevant gases and the physical/environmental factors for the 1-hour collected data.
Table 4.4 Correlation matrix between the relevant gases concentration among them for the data set II collected data.
Table 4.5 Mean 10 minutes concentration over the eleven months of observations
Table 4.6 The Post Hoc probabilities (Duncan test) of the monthly variations for (a) SO2 (b) NO2 (c) O3 (d) Benzene (e) Toluene and (f) o-Xylene
Table 4.7 The Multivariate Linear Regression Results for (a) O3 taken the o-Xylene as predictors (b) O3 taken the Toluene as predictors (c) the final better predictor results
Table 4.8 Regression summary for dependent variable: O3
Table 4.9 Summary of the multivariate linear regression analysis for the ozone concentration on the air taking the other variables as predictors
Table 4.10 Gases averages concentrations and mean values for the atmospheric-environmental variables grouping by month, weekdays, month days, section and time day
Table 4.11 Summary of the multivariate linear regression analysis for the ozone concentration on the air taking the other variables as predictors, only with the average Timely data.
Table 4.12 Observed and predicted values along with the residual parameters.
Table 4.13 Summary of the multivariate linear regression analysis for the nitrogen dioxide concentration on the air taking the other variables as predictors
Table 4.14 Observed and predicted values for the nitrogen dioxide along with the residual parameters
Table 4.15 Observed and predicted values for the o-Xylene along with the residual parameters.
Table 4.16 Attributes of the target dataset
Table 4.17 Model features and time (t)
Table 4.18 Accuracy of pollution predictor model (hour and day)
Table 4.19 Correlation coefficient of Dataset I (10 min)
Table 4.20 Correlation coefficient of Dataset II (1 h)
Table 4.21 Relative Absolute Error of Dataset I (10 min)
Table 4.22 Relative Absolute Error of Dataset II (1 h)
Table 4.23 Performance count of each algorithm
Table 5.1 Software Components of the TRCAQ
List of Figures
Figure 3.1. Sampling Sites of the DOAS Instrument at Sohar University; Light Emitter (A), Reflector (B), and Parking Lot of Sohar University (C).
Figure 4.1 Mean plot of temperature grouped by month
Figure 4.2 Mean plot of relative humidity grouped by month
Figure 4.3 Mean plot of global radiation (W/m2) grouped by month
Figure 4.4 Mean plot of wind direction (°) grouped by month
Figure 4.5 Mean plot of temperature (°C) grouped by time
Figure 4.6 Mean plot of relative humidity (%) grouped by time
Figure 4.7 Mean plot of Wind direction (°) grouped by time
Figure 4.8 Mean plot of multiple variables grouped by month
Figure 4.9 Mean plot of multi variables grouped by month
Figure 4.10 Mean plot of multiple variables grouped by time
Figure 4.11 Multiple regression results for predicting the O3 concentration
Figure 4.12 M10C of the SO2 two weeks grouping from April to August.
Figure 4.13 M10C of the NO2 two weeks grouping from April to August
Figure 4.14 M10C of the O3 two weeks grouping from April to August
Figure 4.15 M10C of the Benzene two weeks grouping from April to August
Figure 4.16 M10C of the Toluene two weeks grouping from April to August
Figure 4.17 M10C of the o-Xylene Two weeks grouping from April to August
Figure 4.18 M10C within one single hour along the eleven months
Figure 4.19 MHC in any day along the eleven months
Figure 4.20 MDC grouping for weekdays along the eleven months
Figure 4.21 AM/PM effects over the eleven months
Figure 4.22 Predicted and observed values, dependent variable O3
Figure 4.23 The ozone concentration on the air as a function of the other gases levels.
Figure 4.24 The ozone concentration on the air as a function of the most important atmospheric-environmental parameters.
Figure 4.25 Predicted versus observed values taking from the linear model
Figure 4.26 The distribution of the average concentration of Ozone grouping as (left) daily for any month and (right) Time for any day.
Figure 4.27 Graphical results from the prediction model (left) Predicted versus observed values and (right) Residuals versus predicted values distribution.
Figure 4.28 Predicted versus observed values for the o-Xylene taking from the linear model
Figure 4.29 Correlation coefficient of both datasets
Figure 4.30 RAE of O3 on both datasets
Figure 4.31 RAE of bagging model on both datasets
Figure 4.32 TRUE vs. PREDICTED daily maximum values of SO2 concentration (µg/m3)
Figure 4.33 TRUE vs. PREDICTED daily maximum values of NO2 concentration (µg/m3)
Figure 4.34 TRUE vs. PREDICTED daily maximum values of O3 concentration (µg/m3)
Figure 4.35 TRUE vs. PREDICTED daily maximum values of B concentration (µg/m3)
Figure 4.36 TRUE vs. PREDICTED daily maximum values of T concentration (µg/m3)
Figure 4.37 TRUE vs. PREDICTED daily maximum values of P-X concentration (µg/m3)
Figure 4.38 TRUE vs. PREDICTED daily maximum values of M-X concentration (µg/m3)
Figure 4.39 TRUE vs. PREDICTED daily maximum values of O-X concentration (µg/m3)
Figure 5.1 TRCAQ data flow
Figure 5.2 Illustration of TRCAQ Home Page
Figure 5.3 Latest & Forecast Page showing site information and location photographs
Abbreviations
am or AM: ante meridiem (morning)
ANOVA: Analysis of Variance
APHEA: Air Pollution on Health: European Approach
AQG: Air Quality Guidelines
AQR: Air Quality Regulations
AR: Additive Regression
BG: Bagging
BTX: benzene, toluene, and xylene
DIAL: differential absorption lidar
DOAS: Differential Optical Absorption Spectroscopy
EPA: Environmental Protection Agency
FTIR: Fourier-transform infrared
KS: Lazy learning
LR: Linear Regression
M10C: Mean 10 minute concentration
MAE: Mean absolute error
MECA: Ministry of Environmental and Climate Affairs
MDC: Mean daily concentration
MHC: Mean hourly concentration
MYC: Mean yearly concentration
M5P: Trees
M5R: Rule Based
m-Xylene or M-X: meta-Xylene
NO: nitrogen monoxide
NO2: nitrogen dioxide
NOx: nitrogen oxides
O3: ozone
o-Xylene or O-X: ortho-Xylene
pm or PM: post meridiem (afternoon)
PM: particle pollution
PM10: particulate matter up to 10 micrometers in size
PM2.5: particulate matter up to 2.5 micrometers in size
p-Xylene or P-X: para-Xylene
RAE: Relative absolute error
REP: REPTree
RD: RegressionByDiscretization
RMSE: Root mean squared error
RRSE: Root relative squared error
RS: RandomSubSpace
SHW: Sohar Highway
SO2: sulfur dioxide
TEA: triethanolamine
TDLAS: tunable diode laser absorption spectroscopy
TRC: Research Council of Oman
TRCAQP: the Research Council Air Quality Project
WHO: World Health Organization
CHAPTER 1 Air Pollution Effects on Environmental Conditions near the Sohar Highway in Oman
1.1 Introduction
The rise in air pollution is both evident and universal, due particularly to industrialization and urbanization (Özden, Döğeroğlu, & Kara, 2008). Air pollutants, which may be organic or inorganic, have adverse effects on human health and may cause unwanted environmental impacts (Gryparis, Forsberg, Katsouyanni, Analitis, Touloumi, & Schwartz, 2004; Pénard-Morand, Charpi, Raherison, Kopferschmitt, Caillaud, & Lavaud, 2005). These effects are greater in areas with high industrial activity, where the state of the environment is relatively worse and requires urgent intervention from environmental authorities (Elsom, 1994). The task of improving and controlling air quality has drawn a great deal of worldwide attention. It has called for serious policy interventions from governments and for initiatives in technological advancement from public and private entities.
To mitigate the dangers caused by air pollution, it is important to measure its levels accurately, as well as their variations due to changes in environmental conditions. A good grasp of these levels can guide authorities in taking precautionary measures that will minimize the projected health and environmental impacts. However, measuring air quality is not easy. It is usually hampered by unreliable data that suffer from insufficient sampling and from errors in measurement and acquisition (Gerboles, Lagler, Rembges, & Brun, 2003). Several methods for measuring and monitoring ambient concentrations are already in use, such as chemiluminescent analysers, diffusive samplers, electrochemical sensors, thick film sensors, differential optical absorption spectroscopy (DOAS), and Fourier-transform infrared (FTIR) methods. However, their use has been limited to emission testing, leaving unrealized their potential to provide accurate information for detecting the patterns and trends needed to forecast air pollution levels.
The viability of measuring techniques therefore becomes an important question. This paper considers data mining techniques, that is, the process of finding practical knowledge in large volumes of data stored in data warehouses, databases and other information repositories (Fayyad, Piatetsky-Shapiro, & Smyth, 1996), as an alternative to previously known mechanisms for several reasons. They have been utilized in various application domains, such as sentiment analysis,
object recognition, online advertisement, and social marketing. Because data mining combines techniques from several fields, namely statistics, artificial intelligence, database systems, and pattern recognition, its capacity for machine learning (Riga, Tzima, Karatzas, & Mitkas, 2009) on huge volumes of air quality data can significantly improve the forecasting capabilities of current methods. Data mining methods that have been used for forecasting include Linear Regression, Additive Regression, Bagging (Meta Algorithm), Lazy learning (Kstar), Trees M5p, Rule based Random Sub Space, Regression by discretization, and Trees (Reptree) (Yang & Wu, 2006). Bagging stands out as the most suitable, since it operates by altering the data to generate various base models (Bose & Mahapatra, 2001; Opitz & Maclin, 1999). Bagging is a simple ensemble method in which new training sets are formed by sampling with replacement from the original training set, which can replicate the ideal situation (Breiman, 1996). Despite its simplicity compared with more intricate data mining algorithms, bagging is flexible enough to process a wide range of factors, such as air pollution requirements, traffic flow, land uses, and meteorology. Other algorithms are also useful, but they lack some of the functionality that bagging provides.
To test the effectiveness of data mining and machine learning methods in forecasting pollution levels, this study looks into the air pollution levels near the Sohar City highway in Oman at two different time resolutions (1-hour and 10-minute intervals). Studies on air pollution have rarely investigated the feasibility of measuring air pollution levels near a highway, a probable place for pollutant saturation considering its proximity to sources of pollution.
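The bagging procedure described above (resampling the training set with replacement, fitting one base model per resample, and combining the results) can be sketched in a few lines. This is a minimal illustration only, not the implementation used in this study: the base learner (a one-variable least-squares line), the function names, the number of resamples, and the toy temperature/ozone values are all assumptions chosen for brevity.

```python
import random
from statistics import mean


def fit_line(points):
    """Ordinary least-squares fit y = a + b*x for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample (all x identical): predict the mean
        return y_bar, 0.0
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return y_bar - slope * x_bar, slope


def bagging_fit(points, n_models=25, seed=0):
    """Fit one base model per bootstrap resample of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Sampling with replacement, same size as the original training set.
        resample = [rng.choice(points) for _ in points]
        models.append(fit_line(resample))
    return models


def bagging_predict(models, x):
    """Aggregate by averaging the base models' predictions (regression case)."""
    return mean(a + b * x for a, b in models)


# Hypothetical toy data: (temperature in deg C, ozone in ug/m3).
# These values are invented for illustration and are not from the study.
toy = [(20.0, 41.0), (24.0, 47.5), (28.0, 58.0),
       (31.0, 63.5), (35.0, 72.0), (38.0, 80.5)]

models = bagging_fit(toy)
prediction = bagging_predict(models, 30.0)
print(f"bagged ozone estimate at 30 deg C: {prediction:.1f}")
```

Averaging is the usual aggregation rule for numeric targets such as gas concentrations; for a classification target, the base models would vote instead.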
Studying the variation of gaseous species across the highway is necessary in order to enhance our understanding of how to reduce the uncertainties in air quality assessment and the corresponding environmental risks. The environmental conditions that may affect the levels of air pollutants present during particular periods have also rarely been the subject of these studies. This paper therefore asks two significant questions. The first is whether the bagging data mining technique is a viable tool for measuring air pollution levels. The second is whether environmental conditions, such as season, wind direction, time of day, and temperature, affect the levels of air pollution in the area, particularly SO2, NO2, O3, benzene, toluene, o-Xylene, m-Xylene and p-Xylene gases. To answer these questions, this paper argues that data gathered near
the SHW and analysed using data mining techniques for measuring and forecasting air pollution show that the levels of air pollution vary with season, wind direction, time of day, and temperature. It aims to develop a measuring and forecasting model for air pollutants using the Bagging data mining algorithm, with full consideration of environmental conditions. Through this, a systematic forecasting system that takes into account the factors that may affect levels of air pollution may be utilized to provide a solution to this issue. Sohar was selected for various reasons. The city, located on the Gulf of Oman outside the Strait of Hormuz, is subject to land-sea breeze circulation and long periods of meteorological stagnation (Charabi, Al-Bulooshi, & Al-Yahyai, 2013). It has also been experiencing a significant increase in emissions, mainly due to fuel consumption, as the number of vehicles in the city has grown substantially, from 31,206 vehicles in 2006 to more than 45,390 vehicles in 2012 (ROP, 2012). This is worsened by the climate of the region. Westerly winds are predominant and, in combination with the land-sea breeze circulation explained above, move polluted air toward the eastern part of Sohar (Charabi, Al-Bulooshi, & Al-Yahyai, 2013), where most of the city’s working class resides. Furthermore, air quality is affected by temperature inversions that momentarily trap pollutants, including those resulting from vehicle emissions. In particular, the Sohar Highway in Sohar City was selected because it is a busy highway with varying traffic density. The highway is considered the main inter-city road in the Sultanate of Oman, connecting major cities in the Sultanate with major cities in neighbouring countries, such as Dubai in the United Arab Emirates. The area has also seen a rapid emergence of industrial plants, such as chemical plants, which contribute to air pollution in Oman.
Air pollution caused by high nitrogen oxide concentrations is also the result of increasing vehicular traffic density in Sohar. Studying the levels of air pollution near the SHW is vital for understanding and developing a remedy that will address the implications of rising levels of pollution in the city. The government has given special consideration to Sohar City, in order to make it a business and industrial centre of Oman by 2020 and to establish economic reforms that reduce the country's reliance on the oil business. However, the impacts of air quality and the importance of setting air quality goals are not yet fully understood in the wider region. Without this understanding, the industrialization agenda of the Omani government may inevitably set aside the environmental impact of the plan. The allowable levels of the pollutants and air quality standard guidelines have been set by the WHO and other national agencies. These guidelines clearly indicate that pollutant levels exceeding the allowable levels cause human health problems of varying severity (Künzli, Kaiser, Medina, Studnicka, Chanel, & Filliger, 2000). In this case, a reliable air quality
management and information system is needed to forecast future air pollution levels and suggest suitable control actions (Monteiro, Lopes, Miranda, Borrego, & Vautard, 2005). Warnings can be issued to the public, or to a specific group of people who are sensitive to particular air pollutants, and can therefore help to reduce the health effects of air pollution. However, these warnings must be simple and informative in order to be understood by the public. Air quality prediction data and information are also useful for environmental management authorities in preventing or minimizing the other adverse effects caused by air pollution. Currently the Sultanate of Oman does not have any air quality forecast system. The air quality forecasting and information systems that are available online (AirNow, 2014; UK AIR, 2014; EPA VICTORIA, 2014) were consulted for their air quality prediction and other relevant applications. In this research work, an online air pollution forecasting system for Sohar city has been developed using a bagging model. The main idea of this system is similar to the air pollution prediction of the Scottish air quality website (Willis, 2006). After an initial online system was established, its performance and accuracy were improved by reference to alternative models.
1.2 Hypotheses and Testing
For this paper, the following hypotheses shall be tested:
• Null hypothesis (H0): The air pollutant levels near the SHW, as analysed using data mining techniques, do NOT significantly vary due to environmental elements.
• Hypothesis H1: The air pollutant levels near the SHW significantly vary due to changes in season.
• Hypothesis H2: The air pollutant levels near the SHW significantly vary due to changes in wind direction.
• Hypothesis H3: The air pollutant levels near the SHW significantly vary due to changes in time of day.
• Hypothesis H4: The air pollutant levels near the SHW significantly vary due to changes in temperature.
To test these hypotheses, the author measured levels of air pollutants at three main locations near the Sohar highway (SHW), i.e. the roof of the Sohar University building, the top of the Minaret mosque, and near the Sohar University parking lot. Pollutants near the Sohar highway, particularly NO2, O3, SO2, and BTX (benzene, toluene, and xylene), were monitored using Differential Optical Absorption Spectroscopy (DOAS), a technique which is broadly used to determine the concentrations of atmospheric species. To capture the variations in concentration of gaseous species, the light captured by the DOAS instrument was analysed every ten minutes. Meteorological parameters representing environmental conditions, such as wind speed and direction, air humidity, pressure,
temperature, precipitation, and global solar radiation, were also measured to determine their effects on the concentrations of the gaseous species. These were recorded and changes were reported accordingly.
1.3. Limitations
Limits to this research, which may be addressed by further studies, include the absence of a comprehensive assessment of the sources of air pollution (including vehicular emissions), which is not covered by this paper. Although particulate matter has great health impacts, it has not been investigated in this study owing to the limitations of the instruments used. Furthermore, the study covers pollution levels over a span of eleven months, which results in a larger database and more strongly based regression results, but it does not analyse the possible reasons for the evolution of pollution levels. It does not conduct an econometric study of pollution levels against a wide array of non-income variables, such as education, literacy, and policy applications, as the empirical relationship between such variables and pollution levels first needs to be established. Nonetheless, the study will aid the government of Oman in understanding pollution in the region, which will help in developing effective policies for air pollution.
1.4 Thesis Structure
The content of this thesis is organized as follows: Chapter 2 summarizes the literature survey on air quality measuring and forecasting; it provides an understanding of the different methods that have been applied in this study area so far and of their efficiency. Chapter 3 presents the methodologies and procedures applied in this study in detail.
Chapter 4 presents and discusses the results in measuring, analysing, and forecasting pollution levels through statistical analysis, such as Student's t-test and ANOVA, as well as machine learning algorithms; it describes the most significant results obtained through our experimental investigation of the application of data mining methods to air quality forecasting and statistical evaluation. Chapter 5 introduces the online air pollution forecasting and information system, which has been developed based on the results of this research work. Finally, Chapter 6 concludes this paper by presenting the main results of this research work and the future outlook.
CHAPTER 2 Heavy Air Pollution Prediction by Using Data Mining Algorithm
2.1. Introduction
In recent years, the heavy air pollution load near highways has led to numerous health hazards. The negative effects of air pollution and the necessity of controlling air quality have drawn worldwide attention. Particulate matter, also known as particle pollution or PM, is one of the main pollutants. These particles are characterized by size and can cause serious health problems. Relevant research shows that the death rate in cities with elevated levels of pollution significantly surpasses that noted in moderately cleaner cities (Stone, 2003). Traffic intensity on both highways and in urban areas is rapidly increasing, and it has therefore become necessary to find a method or tool to analyze and predict the air pollution in highway traffic areas, in order to prevent the possible hazards caused by air pollution in the long term. Although the introduction of cleaning technologies in both stationary and mobile sources has contributed significantly to the reduction of air pollutants, air pollution still remains a major risk in our lives (Gryparis, Forsberg, Katsouyanni, Analitis, Touloumi, & Schwartz, 2004; Pénard-Morand, Charpi, Raherison, Kopferschmitt, Caillaud, & Lavaud, 2005). Many countries currently inform the public when air pollution exceeds a certain level. Sometimes, the information is directed only at the specific population that is allergic or sensitive to specific particles (e.g. asthmatics) or to overall air pollution. It has therefore become an important task to find innovative ways to predict air quality before these individuals are exposed to ozone or fine particles. Generally, the guidelines for important air pollution alerts are considered only when the level of air pollution becomes considerably serious (Elsom, 1994).
The ambient air quality assessment and management council of the EU states: “The ambient air pollution level should be defined in such a way so that all possible serious harm or negative effects on human should be possible to be avoided, reduced or prevented by taking proper actions”. Typically, when air pollution reaches a level higher than the guideline threshold values, alerts are released and corresponding actions are taken immediately. However, in some countries the alert level is defined to be lower than the guideline value. For example, in New Zealand the alert level for air pollution is below the guideline value (New Zealand Ministry for the Environment, 2004). It has been noted that air pollution at the New Zealand warning level, which is set at 66% to 100% of the guideline
level, may go beyond the guideline level if upward trends in air pollution are not curbed. In this case, managers, administrators and policy makers have to develop policies and methods aimed at controlling potential exceedances or improving the air quality. Thus, many environmental agencies around the world have imposed stringent emission regulations. Evaluating air quality in urban areas using available measured air quality data provides useful information and solutions for the long-term planning of pollution control strategies that keep pollutant levels under the safety limits (Özden, Döğeroğlu, & Kara, 2008). Air quality forecasting is an essential method for air pollution prevention. Conventionally, either mathematical models or specific meteorological knowledge are used to estimate and classify air quality for forecasting purposes. The mathematical models offer the capability of forecasting beforehand, which is useful for making decisions in advance of a coming event or even a crisis. The meteorological methods, on the other hand, reflect the present status of air quality, which makes it possible to infer directly the health risks to humans and the potential harm to the ecosystem. In the following section the statistical measuring and analysis techniques for air pollution forecasting will be discussed.
2.2. Air Pollution Measuring Techniques
The selection of proper air pollution measurement techniques is very important. The main concerns when selecting a measurement tool or technique for air pollution include accuracy, precision, instrument complexity, time consumption, and the cost of the measurement system (WHO, 1976). However, air pollution is caused by different components and substances, such as acidic gases, ozone, and dust and smoke particles.
Therefore, there is no common way to measure all these pollutants simultaneously, and separate methods are used to measure the air pollution caused by each component individually. The air pollution measuring methods commonly used at commercial and industrial levels include the chemiluminescent method, passive samplers, electrochemical sensors, thick film sensors, various spectroscopic methods, and FTIR analysers. These methods can be used to obtain useful information in various circumstances for different applications. This section will briefly introduce these methods. Table 2.1 compares the advantages and disadvantages of the different air pollution measurement techniques/methods.
TECHNIQUE
ADVANTAGES
Relatively high material and operation cost.
Chemiluminescence
• Approved technique by EU First Daughter Directive. • Good detecting resolution ~ 1 µg m-3. • Real time data with short responding time (