An Interactive Predictive System for Weather Forecasting

Ayham Omary
Engineering and Architecture Faculty
Umm Al-Qura University, Mecca, KSA

Ahmad Wedyan
CIS Department, IT Faculty
Salman bin Abdulaziz University, AlKharj, KSA

Ahmed Zghoul, Ahmad Banihani, and Izzat Alsmadi
Computer Science and IT Faculty
Yarmouk University, Irbid, Jordan

Abstract— Studying precipitation and weather data using artificial intelligence (AI) and data mining techniques has been the subject of several research papers. In this paper, a dataset of Jordanian weather and precipitation information is built from local and web resources. A tool is built to parse weather-related information from the different websites that store such information. Data mining techniques and AI algorithms are used to forecast future precipitation from historical data, and data mining and statistical methods are used to study future trends and possible climate change.

Keywords: weather forecasting, data mining, prediction algorithms, numerical weather prediction models.

I.

INTRODUCTION

Fresh water is essential for the sustainability of life. Climate change may have a serious impact on fresh water availability, especially in countries with scarce water resources such as Jordan. Changing climate conditions have increased the demand for decision support systems that include reliable forecasts. Studying historical data to predict future behavior is a software field applicable to many areas of life. Knowledge discovery from temporal, spatial and spatiotemporal data is critical for climate change science and for the study of climate impacts, and historical climate statistics are still studied for forecasting. The evolution of the tools and techniques available to gather data about weather, water, temperature, etc. has increased the possibility of applying data mining techniques in this field. Many researchers around the world have extracted information from satellite remote-sensing image data, which is an important and effective method for forecasting rainfall, for example. Meteorological satellite data have been used operationally in weather services for more than 30 years. During this period, forecasting severe weather from satellite remote-sensing data has remained a challenging task. Reanalysis is an intelligent use of past observations with modern modeling to create consistent, long-term, spatially extended datasets (meteorology, climate, production, etc.). Forecasting is the prediction of outcomes using approaches of varying sophistication, such as deterministic, statistical, semi-empirical and artificial intelligence approaches.

This prediction is based on an understanding of processes and history over different timescales, and it varies in spatial resolution, cost and accuracy. To use limited water more efficiently under global climate change, and to provide adequate lead time for flood and drought warnings, advanced modeling techniques are needed to improve short-term streamflow forecasting. In any data mining model, the raw data is the first required input. In the first part of this project, we will collect and build a dataset of climate-related attributes in Jordan covering several years. The dataset may include attributes related to rainfall, temperature, etc., gathered with the help of climate and water domain experts. In the second stage, to build a climate and weather classifier, we will train the model on actual data. After it has been trained and evaluated, the model will be used for forecasting. Climate modeling may include studying the following attributes:

1. Historical weather records
2. Daily rainfall and maximum and minimum temperatures
3. Rainfall and temperature parameters relative to time, location, and height (e.g., spatiotemporal rainfall distribution)
4. Air temperature, relative humidity, soil moisture, soil temperature, etc.

To apply data mining techniques to climate forecasting, several preprocessing steps should be used to improve accuracy and eliminate outliers. These steps may include:

1. Data scaling, a very important step before the models can be formulated and developed. All input variables are standardized by subtracting the mean and dividing by the standard deviation [10]. This yields variables with mean 0 and standard deviation 1 (they are standard normal only if the original variables were normally distributed).
2. Preparation of inputs for calibration and verification of models.

3. Several data mining techniques may be applied, depending on their suitability and accuracy. Examples include neural networks, k-nearest neighbors (KNN) and genetic programming.

Many numerical streamflow prediction techniques have been widely used in water resources management. Streamflow models of climatic variations include physically-based models such as SHE [11] and conceptual models such as IHACRES [12] and CMD-IHACRES [13]. Physically-based models rely heavily on the continuum mechanics of water transport, so the modeler must understand the underlying physics of water transport and its interactions well. Physically-based models are also computationally demanding. Schreider et al. reported that conceptual models can work very well, with good accuracy, without the heavy computational needs of physically-based models. Recent advances in small-scale, high-performance computers have opened the way for computationally intensive programs [14].

II.

RELATED WORK

The focus of this paper is the use of information technology (mainly data mining, information retrieval and AI) to build a climate and weather prediction and forecasting model. We present some papers with a similar focus. Zhang et al. introduced the problem of mining dynamic interdimension association rules for local-scale severe weather prediction [1]. They first transformed the original weather record database into a new database expressing the change tendencies of the weather measurements. They then proposed a new algorithm, DIAL, which incorporates an algorithm for quantitative association rules together with this database transformation. Next, they used the algorithm for mining quantitative association rules to generate result rules in quantitative form. Finally, they introduced predicates to generate the final rules. Gibson et al. used parallel genetic algorithms to explore weather prediction on a cluster of desktop workstations running the MPICH distribution of MPI [2]. They implemented small-scale parallel genetic algorithms and developed a simple temperature prediction scheme based on an abridged data set. Kusiak and Shah proposed a simple and robust alarm system architecture for predicting incoming faults due to unbalanced water chemistry [5]. In that system, data mining algorithms produced easy-to-interpret multiple rule sets, which were used by a hierarchical decision-making algorithm to predict faults. They successfully applied the alarm system to data from two commercial power plants to monitor water chemistry faults, and the system effectively identified normal and faulty operating conditions. Atsalakis and Minoudaki addressed one of the main problems in managing large water supply and distribution systems [4]: forecasting daily demand in order to schedule pumping effort and minimize costs. They predicted daily irrigation water demand using Adaptive Neuro-Fuzzy Inference Systems (ANFIS) applied to data from the water company O.A.DY.K for the prefecture of Chania. This solution has several advantages, such as reliability, maintainability, ease of use and ease of implementation, and it avoids a drawback of time-series methods, which require a constant time step during sampling. Finally, compared with the AR and ARMA models used for verification, ANFIS had the largest mean absolute prediction error but showed steady performance. In their paper, Tachibana and Ohnari [3] focused on predicting hourly water consumption in order to supply water sustainably to consumers and to operate the plant effectively. They used data mining to analyze and categorize hourly water consumption data gathered over several years at a water purification plant in a metropolitan area in Japan, and tried to construct a precise prediction model for the whole year. They used a categorical approach to predict the hourly water consumption profile for the next day (called a waveform), whether that day is a usual day or an unusual day such as a holiday, extracting regularities from the data as base patterns and proposing a structure for a water demand prediction model. The problem they faced is that clustering the waveforms by attributes such as day of the week, weather and temperature range generates a large number of clusters and base patterns. They solved this problem by clustering the cumulative curves of hourly water consumption instead of the waveforms, which reduces the number of generated clusters and base patterns, because waveforms that fall into different clusters can be classified into the same cluster by their cumulative curves. Liu et al.
used a three-dimensional variational (3D-var) retrieval system as an improvement of the one-dimensional version [6]. Retrieval systems of this kind have been used for several decades and have provided valuable products to a large community for satellite sensor monitoring, climate studies, and quality control in numerical weather prediction (NWP) centers. The improved algorithm is based on the Gridpoint Statistical Interpolation (GSI) scheme from the National Centers for Environmental Prediction (NCEP). As a result, the 3D-var retrieval system was able to produce the low-pressure field at the surface and the warm core in the upper troposphere for Typhoon Choi-wan, which indicates that the 3D-var retrieval system is valuable. Theron et al. described how data mining and visualization techniques can benefit paleoceanographers [7]. They present an example of reconstructing ocean paleodynamics in the South China Sea to show that ocean dynamics modeling is essential for predicting the impact of climatic change on human activities. They used comprehensive integrated datasets obtained from a gravity core recovered on the Sunda Slope (South China Sea), using different techniques such as quantitative analyses of coccolithophores, stable isotopes and biomarkers. Finally, this study gives a good example of combining data mining and paleoceanography to explain general paleodynamics, including short-term events, showing the potential to monitor and predict in the context of decadal time series.

III.

GOALS AND APPROACHES

The difficulty of weather prediction comes from the many variables that may affect it. In addition to considering all current climate statistics, physical equations describing particle interactions at the smallest scales are aggregated to model an entire weather system. The main goal of this paper is to introduce a weather prediction model (based on the HIRLAM and ALADIN models) for weather, and in particular precipitation, prediction focused on the Jordanian geographic area. This model will build on several numerical weather prediction models. Such models depend on collecting historical weather data and all attributes that may affect (to different degrees) the current and future amounts of yearly rainfall in Jordan. Examples of important climate attributes relevant to this model are: temperature (mean, maximum and minimum), cooling and growing degree days, dew point, humidity (average, maximum and minimum), wind speed, heights, and desert- and mountain-related information. Other factors affect precipitation in particular: precipitation efficiency depends on the saturation ratio, the production rate of condensate, the residence time in clouds, dry air entrainment, vertical wind shear, and precipitable water [15, 16]. Weather prediction models are built on differential equations, and the two most common techniques for approximating their solutions are finite-difference methods and spectral methods. In this field, several researchers and practitioners have used the Precipitation Potential (PP) formula to predict precipitation. PP is calculated as follows:

$$\mathrm{PP} = \mathrm{PW} \times \mathrm{MRH}_{1000\text{-}700}$$

where PW is the precipitable water through the entire depth of the atmosphere, in inches, and MRH_{1000-700} is the mean relative humidity of the layer from 1000 to 700 millibars, expressed as a fraction so that PP is also in inches. The 1000-700 millibar layer was chosen because deep moisture is mainly contained in the lowest 3-4 km of the atmosphere [17].

Another important weather prediction model is the operational High Resolution Limited Area Model (HIRLAM). The HIRLAM group was set up among the weather services of the Nordic countries, Ireland, and the Netherlands to research short-range forecasting. HIRLAM is primarily a numerical weather prediction model whose parameterizations are aimed at short-range weather forecasting only. The components of such a model are the atmosphere, the land surface and biosphere (vegetation), the ocean surface, and atmospheric chemistry. The HIRLAM model splits a forecasting problem into two parts: physics and dynamics. The dynamics part solves the model equations and analyzes past weather data, while the physics part aggregates all the equations used to describe microscale behavior such as particle interactions. The model equations form a set of three-dimensional nonlinear hyperbolic partial differential equations, which are solved in the dynamics part of HIRLAM. They describe five predicted (prognostic) variables: the two horizontal wind components, temperature, specific humidity, and surface pressure. The ALADIN model was built to be compatible with the global model ARPEGE, from which it was derived. It is a primitive-equation spectral model with hydrostatic, Eulerian or semi-Lagrangian options, digital-filter initialization, optimum-interpolation analysis and, more recently, three-dimensional variational data assimilation. ALADIN, developed through an international cooperation led by Météo-France, is used operationally for weather prediction. Its grid step is 12 km, and its integration domain covers a major part of Europe. By contacting the national and local weather stations in Jordan, we noticed that weather-related information is scattered, not up to date, or not collected over a sufficient period of time (such as several years). Collecting the historical weather dataset is therefore a major element of the project. We decided to collect this information locally and nationally from different sources, and also from international entities and websites that store such information for purposes such as research studies and air traffic control.
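The Precipitation Potential formula given in this section, PP = PW × MRH over the 1000-700 mb layer, can be illustrated with a short Python sketch. It assumes relative humidity is available in percent at discrete sounding levels and takes a simple arithmetic mean over the levels between 1000 and 700 hPa (1 mb = 1 hPa); a pressure-weighted layer mean would be more rigorous. The function names and the sample sounding are hypothetical illustrations, not part of the HIRLAM or ALADIN models.

```python
from statistics import mean


def mean_rh_1000_700(levels):
    """Arithmetic mean relative humidity (percent) over the 1000-700 hPa layer.

    `levels` is an iterable of (pressure_hpa, rh_percent) pairs (hypothetical format).
    Only levels with 700 <= pressure <= 1000 hPa are averaged.
    """
    rh_values = [rh for pressure, rh in levels if 700 <= pressure <= 1000]
    if not rh_values:
        raise ValueError("no sounding levels between 1000 and 700 hPa")
    return mean(rh_values)


def precipitation_potential(pw_inches, levels):
    """PP = PW * MRH, converting MRH from percent to a fraction.

    The result has the same unit as PW (inches).
    """
    return pw_inches * mean_rh_1000_700(levels) / 100.0


# Hypothetical sounding: (pressure in hPa, relative humidity in percent).
sounding = [(1000, 80), (925, 70), (850, 60), (700, 50), (500, 30)]
pp = precipitation_potential(1.5, sounding)
print(f"PP = {pp:.3f} in")
```

The 500 hPa level is deliberately outside the layer and is ignored by the filter, so only the four lowest levels contribute to the mean.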

Figure 1. Weather data robot main user interface.

At this early stage of the project, we built an information retrieval robot application that collects weather- and precipitation-related information about Jordan covering a period of no less than five years. A major website used in the collection is www.wunderground.com, based in the United States, which is used by air traffic controllers all over the world.
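The internals of the robot are not described here; the following Python sketch only illustrates the parsing step such a robot might perform. It assumes a hypothetical page layout in which daily observations sit in one HTML table whose first row holds the column headers. The class and function names, the column names and the values in SAMPLE_PAGE are made up for illustration and do not reflect the actual structure of www.wunderground.com. Downloading the page (for example with urllib.request.urlopen) is omitted so the example runs offline.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of <th>/<td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore text outside table cells (e.g. whitespace between tags).
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_daily_history(html):
    """Turn an HTML table whose first row is a header into a list of dicts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment with made-up values, for illustration only.
SAMPLE_PAGE = """
<table id="history">
  <tr><th>Date</th><th>Max Temp (C)</th><th>Min Temp (C)</th><th>Precipitation (mm)</th></tr>
  <tr><td>2010-01-01</td><td>14</td><td>6</td><td>3.2</td></tr>
  <tr><td>2010-01-02</td><td>12</td><td>5</td><td>0.0</td></tr>
</table>
"""

records = parse_daily_history(SAMPLE_PAGE)
total_mm = sum(float(r["Precipitation (mm)"]) for r in records)
print(len(records), "days,", total_mm, "mm of precipitation")
```

Using only the standard-library html.parser keeps the sketch dependency-free; a real collection robot would also need to handle pagination, retries, missing values and unit conversion before the records are stored in the historical dataset.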

IV.

CONCLUSION

This paper presents a first look at a project to build a numerical weather prediction model for Jordan. Such models are usually complex and take significant time and resources to build. Unlike most other countries, Jordan has little research focus in this area. One of the challenges is to collect all the required data and build a historical dataset of weather, precipitation and all possible related attributes. Some of these attributes may not be available from any local weather station, and some of the information may be incomplete or unverified. These are examples of the challenges that this project may face. Besides setting out the goals and steps of the project, in this first stage we built a software robot that collects weather-related information about Jordan and other countries from worldwide websites that store such information. Once we have this dataset, we hope to compare it with the data collected from local weather stations in Jordan. Later, data mining and AI algorithms will be used to forecast precipitation based on the historical data.

REFERENCES

[1] Z. Zhang, W. Wu, and Y. Huang, "Mining Dynamic Interdimension Association Rules for Local-scale Weather Prediction," Proceedings of the 28th Annual International Computer Software and Applications Conference (COMPSAC'04), 2004.
[2] E. Gibson, J. Burger, and D. Knox, "Parallel Genetic Algorithms: An Exploration of Weather Prediction through Clustered Computing," CCSCNE03, Posters Under the Dome, Mar. 2003. Presentation to NJ State Legislatures at State Capitol; met with Senator Thomas Kean, Jr. to discuss research at state colleges.
[3] Y. Tachibana and M. Ohnari, "Prediction Model of Hourly Water Consumption in Water Purification Plant through Categorical Approach," IEEE SMC '99 Conference Proceedings, 1999 IEEE International Conference on Systems, Man, and Cybernetics, 1999.
[4] G. Atsalakis and C. Minoudaki, "Daily Irrigation Water Demand Prediction Using Adaptive Neuro-Fuzzy Inferences Systems (ANFIS)," Proc. of the 3rd IASME/WSEAS Int. Conf. on Energy, Environment, Ecosystems and Sustainable Development, Agios Nikolaos, Greece, July 24-26, 2007.
[5] A. Kusiak and S. Shah, "Data-Mining-Based System for Prediction of Water Chemistry Faults," IEEE Transactions on Industrial Electronics, vol. 53, no. 2, April 2006.
[6] Q. Liu, F. Weng, S.-A. Boukabara, and Y. Han, "A Three-Dimensional Variation (3D-var) Retrieval of Temperature and Water Vapor Profiles," 11th Specialist Meeting on Microwave Radiometry and Remote Sensing of the Environment (MicroRad), 2010.
[7] R. Theron, J. A. Flores, F. J. Sierro, C. Pelejero, J. Grimalt, and M. Vaquero, "Using Data Mining and Visualization Techniques for the Reconstruction of Ocean Paleodynamics," IEEE International Geoscience and Remote Sensing Symposium (IGARSS '02), 2002.
[8] P. Best, S. Mushtaq, and R. Stone, Presentation to the WMO Workshop, Toowoomba, 19 May 2009. http://www.wamis.org/agm/meetings/-wocaps09/S5-Best.pdf
[9] A. Makkeasorn, N. B. Chang, and X. Zhou, "Short-term Streamflow Forecasting with Global Climate Change Implications: A Comparative Study between Genetic Programming and Neural Network Models," Journal of Hydrology, vol. 352, pp. 336-354, 2008.
[10] B. Iglewicz, "Robust Scale Estimators and Confidence Intervals for Location," in D. C. Hoaglin, F. Mosteller, and J. W. Tukey (Eds.), Understanding Robust and Exploratory Data Analysis, John Wiley and Sons, New York, 1983, pp. 405-431. ISBN: 0-471-38491-7.
[11] M. B. Abbott, J. C. Bathurst, J. A. Cunge, P. E. O'Connell, and J. Rasmussen, "An Introduction to the European Hydrologic System - Systeme Hydrologique Europeen, 'SHE', 1: History and Philosophy of a Physically-Based, Distributed Modeling System," Journal of Hydrology, vol. 87, pp. 45-59, 1986a.
[12] A. J. Jakeman, I. G. Littlewood, and P. G. Whitehead, "Computation of the Instantaneous Unit Hydrograph and Identifiable Component Flows with Application to Two Small Upland Catchments," Journal of Hydrology, vol. 117, pp. 275-300, 1990.
[13] J. P. Evans, "Improving the Characteristics of Streamflow Modeled by Regional Climate Models," Journal of Hydrology, vol. 284, pp. 211-227, 2003.
[14] S. Y. Schreider, A. J. Jakeman, A. Falkland, and R. Knee, "Streamflow Prediction for the Queanbeyan River at Tinderry, Australia," Environment International, vol. 21, no. 5, pp. 545-550, 1995.
[15] C. Chappell, "Condensation Precipitation Process and Precipitation Efficiency," Hydrometeorological Course, COMET, 1997.
[16] C. A. Doswell et al., "Flash Flood Forecasting: An Ingredients-Based Methodology," Weather and Forecasting, vol. 11, pp. 560-581, 1996.
[17] W. Junker, "QPF/Mesoscale," Hydrometeorological Course, COMET, 1997.