Machine Learning: Using Machine Learning Algorithms to provide Local Temperature Prediction

By Pavan Vadlamudi, under the supervision of Dr. Karl Grabe

This report is submitted in partial fulfilment of the requirements for the Degree of Master of Science in Software Development at Cork Institute of Technology. It represents substantially the result of my own work except where explicitly indicated in the text. The report may be freely copied and distributed provided the source is explicitly acknowledged. August 2014

Abstract

Predicting weather conditions can be considered an example of data mining. Using the weather data collected from a location over a certain period of time, we obtain a model to predict variables such as temperature at a given time based on the input to the model. As weather conditions tend to follow patterns and are not totally random, we can use current meteorological readings, along with those taken a few hours earlier at a location and readings taken from nearby locations, to predict a condition such as the temperature at that location. Thus, the data instances used to build the model may contain the present and previous hours' readings from a set of nearby locations as input attributes. The variable that is to be predicted at one of these locations for the present hour is the target attribute. The type and number of conditions included in an instance depend on the variable we are trying to predict and on the properties of the ML algorithm used.


ACKNOWLEDGMENTS

First of all, I would like to express my gratitude to my supervisor, Dr. Karl Grabe, for his guidance and assistance throughout this project. I would also like to thank Cormac Gebruers of the National Maritime College of Ireland for suggesting the motivation for this paper, for allowing me access to the database, and for his valuable feedback. I wish to thank Dr. Ted Scully, lecturer, for providing timely suggestions for this research project. I would like to thank Dr. John Creagh, our M.Sc. co-ordinator and lecturer, for providing necessary assistance. Next, I wish to thank my organisation, the Health Information and Quality Authority (HIQA), for providing academic support. Finally, I wish to thank my family, especially my wife, for her support and understanding throughout my master's study.


Table of Contents

1 Introduction .......... 1
1.1. Thesis Statement .......... 2
1.2. Thesis Outline .......... 2
2 Background .......... 3
2.1. Weather station .......... 3
3 Literature Survey .......... 6
3.1. Multilayer Perceptron .......... 6
3.2. Linear Regression .......... 8
3.3. M5P .......... 10
3.4. Least Median Square .......... 11
4 Machine Learning .......... 12
4.1. Classification Algorithms .......... 15
4.2. Regression Algorithms .......... 16
4.2.1. Linear Regression .......... 16
4.2.2. LeastMedSquare .......... 17
4.2.3. M5P .......... 18
4.2.4. MultiLayer Perceptron .......... 20
5 WEKA .......... 23
5.1. Background .......... 23
5.2. The Graphical User Interface .......... 23
5.3. The Command-line .......... 25
5.4. WEKA input datasets .......... 26
5.5. ARFF file .......... 27
5.5.1. ARFF Header Section .......... 27
5.5.2. ARFF Data Section .......... 29
The @data Declaration .......... 29
The instance data .......... 29
6 Implementation .......... 31
6.1. Data .......... 31
6.1.1. Data extraction .......... 33
6.1.2. Data clean-up .......... 34
6.1.3. Data validation and Merge .......... 34
6.1.4. Dataset selection .......... 36
6.1.5. Data processing .......... 38
6.1.6. Experiment exception .......... 39
6.2. Software and Tools used .......... 39
7 Results & Conclusions .......... 41
7.1. Results .......... 41
7.2. Conclusion .......... 50
References .......... 51
Appendix A: Abbreviations .......... 54


1 Introduction

In our everyday life there is an ever-increasing demand for more accurate weather forecasts. Weather is our most common topic of conversation, and the needs for weather prediction are many. From factories to farms, from satellite launching stations to commerce and industry, and even from the general public, there is a persistent demand for more reliable weather predictions. Architects and industrialists alike rely on a sound knowledge of the three-dimensional atmosphere. In the present smartphone age, exact knowledge of the coming weather is more necessary than ever.

In the field of agricultural planning, the importance of weather prediction cannot be overstated. For example, even one night's killing frost, if not predicted well in advance, may prove fatal to some of the most cherished delicate crops. Prediction of the monsoon is vital for sowing seeds. Similarly, the danger of floods and droughts is obvious to all of us, but if there is a timely prediction of their incidence, much can be done to minimize the damage and destruction of lives and livelihoods caused by such natural calamities. It is, therefore, true that if everyone who could make use of weather predictions were to have the information available when required and in the form required, much human welfare and economic benefit would be within reach.

In most cases, accurate weather forecasting is the ultimate aim of atmospheric research. It is also the most sophisticated area in meteorology. The nature of modern weather forecasting is very complex and highly quantitative, and it requires a huge amount of data processing and computing power. For example, the Irish meteorological service Met Éireann uses a Numerical Weather Prediction (NWP) model [1] that predicts up to 5 days ahead. It involves a sound knowledge of higher mathematics, physics and other branches of pure science.
In the present thesis, however, an attempt has been made only to highlight the different machine learning approaches used to predict the coming weather. In this paper I attempt to predict the weather variable temperature, using the historical monthly data of two stations to predict a third station's temperature at a given time.

1.1. Thesis Statement

In this thesis I propose to build models that can predict the temperature at a given weather station using current information from that station and surrounding stations. I use ML algorithms, including classification and regression, to build models that predict temperature at a selected sample of weather stations in the areas surrounding Cork. Of all the weather conditions reported by a weather station, I focus on predicting temperature.

1.2. Thesis Outline

This thesis is organized as follows. Chapter 2 presents the background for this thesis, with a detailed description of the weather data collection systems in use and the sensors they employ. Chapter 3 discusses the literature survey carried out for this paper. Chapter 4 describes the basic concepts of machine learning and presents the various ML algorithms used. Chapter 5 presents the WEKA tool. Chapter 6 describes the implementation of machine learning algorithms on the data collected. Chapter 7 presents the results and conclusions and summarizes the work done in this thesis.


2 Background

Weather observations and related environmental and geophysical measurements are necessary for the real-time preparation of weather analyses, forecasts and severe weather warnings, the study of climate, local weather-dependent operations, hydrology and agricultural meteorology, and research in meteorology and climatology [2]. Weather observations are required for weather forecasting. Weather forecasts are issued to save lives, reduce property damage, reduce crop damage and let the general public know what to expect. During the last two decades, the number of automated weather station networks has greatly increased throughout the world. This rapid development has been the consequence of the need to provide meteorological data in near-real time and the great evolution of automatic data acquisition systems. In general, valid meteorological data are required to make useful weather forecasts and other weather-related decisions.

2.1. Weather station

A weather station is a site, located either on land or at sea, consisting of instruments and equipment for measuring atmospheric conditions to provide information for weather forecasts and for the study of weather and climate. The measurements taken include temperature, atmospheric pressure, humidity, wind speed, wind direction, and precipitation (rainfall) amounts. Temperature and humidity instruments are kept free from direct sunlight, and wind measurements are collected as free of obstructions as possible. Automated measurements are taken at least once an hour, whereas manual measurements are collected at least once daily. Slightly different weather variables are collected at sea, such as sea surface temperature, wave height and wave period. To collect sea parameters, drifting weather buoys, moored weather buoys, ships and oil rigs are used. Below is a list of the instruments used in a typical weather station and their usage:

- Thermometer: Measures air/sea temperature
- Anemometer: Measures wind speed and direction
- Barometer: Measures atmospheric pressure
- Ceilometer: Measures the height of the base of the cloud layer
- Hygrometer: Measures the humidity of the air
- Rain gauge: Measures the amount of precipitation
- Sunshine Recorder: Measures the duration of sunshine
- Stevenson Screen: A wooden louvered rectangular box housing a wet- and dry-bulb thermograph

In addition to the above, more specific purpose-built or advanced weather stations may have instruments to measure the UV index, solar radiation, soil moisture, etc.

Figure 1.1: Weather station located at Cork Institute of Technology.

List of weather parameters collected from a typical weather station:

(i) Air Temperature: Air temperature is recorded in Celsius in increments of one hundredth of a degree, with values ranging from -9999 to 9999 and a value of 32767 indicating an error. For example, a temperature of 10.5 degrees Celsius is reported as 1050.

(ii) Soil Temperature: Soil temperature is the temperature measured near the soil surface. It is recorded in the same format as air temperature.

(iii) Dew Point: Dew point is defined as the temperature at which dew forms. It is recorded in the same format as air temperature.

(iv) Visibility: Visibility is the maximum distance to which it is possible to see without any aid. The visibility reported is the horizontal visibility, recorded in tenths of a metre with values ranging from 00000 to 99999. A value of -1 indicates an error. For example, a visibility of 800.2 metres is reported as 8002.

(v) Precipitation: Precipitation is the amount of water in any form that falls to earth. The precipitation rate is measured in millimetres per hour with values ranging from 000 to 999, except for a value of -1 that indicates either an error or the absence of this type of sensor.

(vi) Wind Speed: Wind speed is recorded in knots.

(vii) Wind Direction: Wind direction is reported as an angle with values ranging from 0 to 360 degrees.

(viii) Air Pressure: Air pressure is defined as the force exerted by the atmosphere at a given point. The pressure reported is the pressure reduced to sea level. It is measured in tenths of a millibar and the values reported range from 00000 to 99999.
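As a concrete illustration of these integer encodings, the raw readings can be decoded as follows. This is a minimal Python sketch; the function names and structure are my own, not part of any station software.

```python
from typing import Optional

def decode_temperature(raw: int) -> Optional[float]:
    # Hundredths of a degree Celsius; -9999..9999 valid, 32767 flags an error.
    if raw == 32767 or not -9999 <= raw <= 9999:
        return None
    return raw / 100.0

def decode_visibility(raw: int) -> Optional[float]:
    # Tenths of a metre; 0..99999 valid, -1 flags an error.
    if raw == -1 or not 0 <= raw <= 99999:
        return None
    return raw / 10.0

print(decode_temperature(1050))   # 10.5 degrees Celsius
print(decode_visibility(8002))    # 800.2 metres
print(decode_temperature(32767))  # None (sensor error)
```

Returning None for the error sentinels makes missing readings explicit, which matters later when instances are assembled for the learning algorithms.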


3 Literature Survey

The first step was to survey previous work on applying machine learning algorithms to weather data. This is very important for clearly defining the objectives of this study. For this part of the work, journal papers, books and internet publications were searched and studied. All of these were essential for finding information on the methods used in evaluating algorithms, tools and APIs. There are many statistical and machine learning tools available, and many articles and books on machine learning algorithms. Choosing the right tool and the right algorithms is the main challenge. Given that my research is a mini-thesis, the criterion for selecting the tool was that it be free and open source, with machine learning algorithms capable of processing numerical data. I quickly learned that regression algorithms suit my research project, so I narrowed the literature survey down to regression algorithms and free open-source tools.

3.1. Multilayer Perceptron

In one paper I found a discussion [3] of predicting the weather parameter maximum air temperature using Support Vector Machines (SVMs). For a given location, the model predicts the maximum temperature of the next day using the previous n days as input. Machine learning achieved better performance than the usual statistical methods, and the findings show that the SVM performed better than the MultiLayer Perceptron (MLP). For this study, data from the University of Cambridge for a period of five years (2003-2007) was used to build the models, and data from the months of January and July of the year 2008 was used for testing. Several weather parameters are recorded in the given database at half-hour intervals; for this work only the daily maximum temperature was extracted. While searching for particular algorithms, I also found a paper that presents a neural network-based algorithm for predicting the temperature.
The Neural Networks package supports different types of training or learning algorithms [4]. The RMSE (Root Mean Square Error) value measures the artificial neural network's performance. In this research a sample dataset was taken from Weather Underground for the location of Mumbai airport, covering the year 2009 from January to December. MLPs are one of the most common neural network structures, as they are simple and effective, and have found a home in a wide assortment of machine learning applications, such as character recognition [5].

P. Ramasamy and others [6] recently published a journal article in ScienceDirect; in their research project they successfully demonstrated the use of an artificial neural network (ANN) to predict wind speed. Owing to the lack of measurement instruments over the mountainous regions of India, it is hard to know wind speeds there. The purpose of the study is to predict wind speeds for 11 locations in the Western Himalayan Indian state of Himachal Pradesh in order to identify potential wind generation sites. An ANN model is used to predict wind speeds, using measured wind data of the Hamirpur location for training and testing. Temperature, air pressure, solar radiation and altitude are taken as inputs for the ANN model to predict daily mean wind speeds. The mean absolute percentage error (MAPE) and correlation coefficient between the predicted and measured wind speeds are found to be 4.55% and 0.98 respectively. Predicted wind speeds are found to range from 1.27 to 3.78 m/s for the Kangra, Kinnaur, Keylong, Shimla, Solan, Bilaspur, Una, Kullu, Sirmaur and Mandi locations. A micro wind turbine is used to measure the wind power generated at these locations, which is found to vary from 773.61 W to 5329.76 W, appropriate for small lighting applications. The model is validated by predicting wind speeds for Gurgaon city, for which measured data are available, with a MAPE of 6.489% and a correlation coefficient of 0.99, showing the high prediction accuracy of the developed ANN model.
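To make the error measures reported in these studies concrete, MAPE and the Pearson correlation coefficient can be computed as follows. This is an illustrative Python sketch with invented numbers, not the data from the papers above.

```python
import math

def mape(actual, predicted):
    # Mean absolute percentage error, in percent.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

measured  = [2.0, 2.5, 3.0, 3.5]   # e.g. wind speeds in m/s (made up)
predicted = [2.1, 2.4, 3.1, 3.4]
print(round(mape(measured, predicted), 2))      # a few percent
print(round(pearson_r(measured, predicted), 3)) # close to 1 for a good model
```

A low MAPE together with a correlation coefficient near 1, as in the Ramasamy study (4.55% and 0.98), indicates that the predictions track the measurements closely.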
In an article published in 2012 [7], neural networks were applied successfully to the classification of bank customer data. The prediction correctness of neural networks can be increased by having more training instances in the dataset, and decision trees have proved beneficial for extracting knowledge from trained neural networks. In this paper the authors obtained rules for classifying bank customers according to their attribute values. The results show that although neural networks give good generalization performance on the given data set, they cannot explain how they arrive at a solution. The extracted rules show that three major attributes (age, region and mortgage) have a major influence on the data set. The extraction of knowledge from these networks yields useful rules which further help in understanding the results obtained from the neural networks. In future work, soft computing techniques could be applied to extract rules that are easier to understand.

3.2. Linear Regression

Dr. Ted Scully, lecturer in the CIT department of computer science, helped me via email by pointing me to a web resource on regression using the WEKA tool. In this resource, a regression model is used to predict the value of an unknown dependent variable, given the values of the independent variables. Regression is the easiest technique to use [8], but it is also probably the weakest. The model can be as simple as one input variable and one output variable (similar to a scatter diagram in Excel, or an XY diagram in OpenOffice.org), and of course it can get more intricate than that, including dozens of input variables. In effect, all regression models fit the same general pattern: there are a number of independent variables which, when taken together, produce a result, the dependent variable.

The relationship between forecast geopotential thickness and observed maximum temperature has been investigated [9], and regression equations were calculated using numerical model thickness forecasts for Nashville. Model thickness forecast accuracy is shown to have seasonal variability. Furthermore, the accuracy of eta regression forecasts is dependent on changes in the average low-level humidity.
During December through March, most large eta regression forecast errors occurred when much drier air moved into the Nashville area. The eta regression forecast method was useful in making subjective improvements to the Model Output Statistics (MOS) from the Nested Grid Model (NGM), especially when the difference between the eta regression forecast and the NGM MOS was at least 3°F. The limitations of this study include potential error due to inaccurate model thickness forecasts and dependence on forecast thickness as the preferred predictor of afternoon maximum temperature.

The change in climate has led to an interest in how it will affect the energy consumption of buildings. Most of the work in the literature relates to offices and homes; however, one paper investigates a supermarket in northern England by means of a multiple regression analysis based on gas and electricity data for 2012 [10]. The equations obtained in this analysis use the humidity ratio derived from the dry-bulb temperature and the relative humidity, in conjunction with the actual dry-bulb temperature. These equations are used to estimate the consumption for the base year period (1961-1990) and for the predicted climate period 2030-2059. The findings indicate that electricity use will increase by 2.1% whereas gas consumption will drop by about 13% for the central future estimate. The research further suggests that the year 2012 is comparable in temperature to the future climate, but the relative humidity is lower. Further research should include adaptation/mitigation measures and an evaluation of their usefulness.

In the journal article [11], the authors predicted the number of cases of malaria from historical data drawn from various sources, mainly regional surveys and health reports. They used the linear regression data mining technique. A GUI-based system provides a prediction for a given year and region; this prediction may be of good assistance to public health authorities in decision making for public safety.

In another study [12], the ability of two models, multiple linear regression (MLR) and a Levenberg-Marquardt (LM) feed-forward neural network, was examined to estimate the hourly dew point temperature. Dew point temperature is the temperature at which water vapour in the air condenses into liquid.
This temperature can be useful in estimating meteorological variables such as fog, rain, snow, dew, and evaporation, and in investigating agronomical issues such as stomatal closure in plants. The availability of hourly records of climatic data (air temperature, relative humidity and pressure) which could be used to predict dew point temperature initiated the modelling exercise. Additionally, the wind vector (wind speed magnitude and direction) and a conceptual input of weather condition were employed as further input variables.

In general, a regression model can be defined as a single algebraic equation of the form [13]

    Z = f(X1, X2, ..., Xk) + u

where Z is a variable whose movements and values may be described or explained by the variables X1, X2, ..., Xk. These variables are known as regressors and are assumed to have a causal relationship to the dependent variable Z. The additional term u is a random variable, included to account for the fact that movements in Z are not completely explained by the regressors.

3.3. M5P

Model trees create a decision tree and use a linear model at each node to make a prediction, rather than using an average value; in other words, the method combines a conventional decision tree with the possibility of linear regression. M5P generates models that are compact and relatively intelligible. John Ross Quinlan is a specialist researcher in machine learning and its application to data mining, and the inventor of the M5, ID3 (Iterative Dichotomiser 3), C4.5 and C5.0 algorithms. His research paper [14] describes how model trees, like regression trees, can be learned successfully from large datasets: in his experiments on a DECstation 5000, a model tree for the car price data was constructed in less than a second, while that for the LHRH data (563 cases x 128 attributes) still required less than a minute. The paper demonstrated that model trees have advantages over regression trees in terms of compactness and prediction accuracy, attributable to the ability of model trees to exploit local linearity in the data. There is one other noteworthy difference: regression trees will never give a predicted value lying outside the range observed in the training cases, whereas model trees can extrapolate. Yong Wang later made improvements to the M5 algorithm [15].
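For the single-regressor case, the general form Z = f(X1, X2, ..., Xk) + u reduces to Z = b0 + b1*X + u, and ordinary least squares gives closed-form estimates for the coefficients. The following Python sketch illustrates this on invented, noise-free data; it is not the WEKA implementation.

```python
def fit_simple_ols(xs, zs):
    # Ordinary least squares for Z = b0 + b1*X + u with a single regressor X.
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    # Slope: covariance of X and Z divided by variance of X.
    b1 = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / \
         sum((x - mx) ** 2 for x in xs)
    b0 = mz - b1 * mx
    return b0, b1

# Noise-free data lying exactly on Z = 1 + 2*X recovers the coefficients.
b0, b1 = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

With real data the residuals u are non-zero, and the fitted line minimises their sum of squares rather than passing through every point.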


3.4. Least Median Square

A brief paper on data mining [16] reviews the various data mining prediction techniques in the literature. The paper is helpful to researchers in conveying the various data mining prediction issues. Prediction is difficult and complex: no approach or tool can guarantee 100% accurate predictions for an organization. In this paper, the authors analysed different algorithms and prediction techniques. The algorithms used are Linear Regression and Least Median Square, both applied to predicting numerical values. The paper concludes that the least median squares regression algorithm produces better results, but that it takes more time to build the model compared with linear regression.

Another paper [17] aims at the analysis of a soil dataset using data mining techniques. It focuses on the classification of soil using the various algorithms available. Another important objective is to predict untested attributes using regression techniques, along with the implementation of automated soil sample classification. Using regression algorithms such as Linear Regression, Least Median Square and Simple Regression, different attributes were predicted. According to the results, the values of the phosphorus attribute were the most accurately predicted, depending on the least number of attributes.
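The robustness of least median squares to outliers, which underlies the comparisons above, can be sketched with a brute-force line fit: instead of minimising the sum of squared residuals, it minimises their median. This is an illustrative Python sketch only; practical implementations use random sampling of candidate lines rather than trying all pairs, which is also why the method is slower to build than ordinary linear regression.

```python
import itertools
import statistics

def lms_line(points):
    # Least Median of Squares fit: among candidate lines through each pair
    # of points, pick the one minimising the median squared residual.
    best = None
    for (x1, y1), (x2, y2) in itertools.combinations(points, 2):
        if x1 == x2:
            continue  # skip vertical candidate lines
        b1 = (y2 - y1) / (x2 - x1)
        b0 = y1 - b1 * x1
        med = statistics.median((y - (b0 + b1 * x)) ** 2 for x, y in points)
        if best is None or med < best[0]:
            best = (med, b0, b1)
    return best[1], best[2]

# Points on y = x with one gross outlier: the median ignores the outlier,
# so the fit recovers the line through the clean points.
pts = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 100)]
b0, b1 = lms_line(pts)
print(b0, b1)  # 0.0 1.0
```

An ordinary least-squares fit on the same points would be dragged sharply upward by the (4, 100) outlier; the median of the squared residuals is unaffected by it.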


4 Machine Learning

This chapter gives a brief introduction to machine learning and describes its components, such as the data model, attributes, classification and test sets.

Learning can be defined in general as a process of gaining knowledge through experience. We humans start the process of learning new things from the day we are born. This learning process continues throughout our lives as we try to gather more knowledge and improve what we have already learned, through experience and from information gathered from our surroundings. Artificial Intelligence (AI) is a field of computer science whose objective is to build systems that exhibit intelligent behaviour in the tasks they perform. A system can be said to be intelligent when it has learned to perform a task related to the process it has been assigned without any human interference and with high accuracy. Machine Learning (ML) is a sub-field of AI concerned with the development, understanding and evaluation of algorithms and techniques that allow a computer to learn. ML mixes with other specialties such as statistics, human psychology and brain modelling. Human psychology and the neural models taken from brain modelling help in understanding the mechanisms of the human brain, especially its learning process, which can be used in the construction of ML algorithms. Since many ML algorithms analyse data to build models, statistics plays a key role in this field.

A process or assignment that a computer is given can be termed the knowledge or task domain (or just the domain). The information that is generated by or obtained from the domain constitutes its knowledge base. The knowledge base can be represented in several ways using Boolean, numerical and discrete values, relational literals and their combinations.
The knowledge base is generally denoted in the form of input-output pairs, where the information represented by the input is given by the domain and the result produced by the domain is the output. The information from the knowledge base can be used to model the data generation process (i.e., the output classification for a given input) of the domain. Knowledge of the data generation task does not define the internal workings of the domain, but can be used to classify new inputs accordingly. As the knowledge base grows in size or complexity, inferring new relations about the data generation process (the domain) becomes difficult for humans. ML algorithms attempt to learn from the domain and the knowledge base to build computational models that represent the domain in an accurate and efficient way. The model built captures the data generation task of the domain, and by use of this model the algorithm is able to classify previously unobserved examples from the domain. The models built can take on diverse forms depending on the ML algorithm used; some of these forms are decision lists, inference networks, concept hierarchies, state transition networks and search-control rules. The concepts and workings of the various ML algorithms differ, but their common goal is to learn from the domain they represent.

ML algorithms require a dataset, which constitutes the knowledge base, to build a model of the domain. The dataset is a collection of occurrences from the domain; technically, these occurrences are referred to as instances. Each instance consists of a set of attributes which describe the properties of that example from the domain. An attribute takes on a range of values based on its attribute type, which can be discrete or continuous. Discrete (or nominal) attributes take on distinct values (e.g., car = Honda, weather = sunny) whereas continuous (or numeric) attributes take on numeric values (e.g., distance = 10.4 meters, temperature = 20ºC). Each instance consists of a set of input attributes and an output attribute. The input attributes are the information given to the learning algorithm, and the output attribute contains the feedback of the activity on that information. The value of the output attribute is assumed to depend on the values of the input attributes. An attribute along with the value assigned to it defines a feature, which makes an instance a feature vector. The model built by an algorithm can be seen as a function that maps the input attributes of an instance to a value of the output attribute.
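The terminology can be made concrete with a toy instance. The attribute names and values below echo the examples in the text (car = Honda, weather = sunny, distance = 10.4); the dictionary representation is my own sketch, not WEKA's internal format.

```python
# One instance = a feature vector of input attributes plus one output attribute.
instance = {
    "car": "Honda",        # discrete (nominal) input attribute
    "weather": "sunny",    # discrete (nominal) input attribute
    "distance_m": 10.4,    # continuous (numeric) input attribute
    "temperature_c": 20.0, # continuous output (target) attribute
}

target_name = "temperature_c"
inputs = {name: value for name, value in instance.items() if name != target_name}
target = instance[target_name]
print(inputs)
print(target)  # 20.0
```

A learned model is then a function from the `inputs` part of such a vector to a predicted value of the target attribute.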
Huge amounts of data may look random when observed with the naked eye, but on a closer examination, we may find patterns and relations in it. We also get an understanding into the mechanism that generates the data. Witten & Frank [18] define data mining as a process of discovering patterns in data. It is also denoted to as the process of extracting relationships from the given data. In general data mining differs from machine learning in that the issue of the proficiency of learning a model is considered along with the effectiveness of the learning. In data mining problems, we can look at the data generation process as the domain and the data generated by the domain as the knowledge base. Thus, ML algorithms can be used to learn a model that describes the data generation process based on the dataset given to it. The data given to 13

the algorithm for building the model is called the training data, as the computer is being trained to learn from this data, and the model built is the result of the learning process. This model can then be used to predict or classify previously unseen examples. The new examples used to evaluate the model are called the test set. The correctness of a model can be estimated from the difference between the predicted and actual values of the target attribute on the test set.

Predicting weather conditions can also be considered an example of data mining. Using the weather data collected at a location over a certain period of time, we obtain a model to predict variables such as the temperature at a given time based on the input to the model. As weather conditions tend to follow patterns and are not totally random, we can use current meteorological readings, along with those taken a few hours earlier at a location and readings taken at nearby locations, to predict a condition such as the temperature at that location. Thus, the data instances used to build the model may contain the present and previous hours' readings from a set of nearby locations as input attributes. The variable to be predicted at one of these locations for the present hour is the target attribute. The type and number of conditions included in an instance depend on the variable we are trying to predict and on the properties of the ML algorithm used.

WEKA [18], short for Waikato Environment for Knowledge Analysis, is a collection of ML algorithms, implemented in Java, that can be used for data mining problems. Apart from applying ML algorithms to datasets and analysing the results generated, WEKA also provides options for pre-processing and visualization of the dataset, and it can be extended by the user to implement new algorithms. In this thesis the WEKA Java API is used extensively.

Assume that we want to guess the present temperature at a weather station C (see Figure 4.1).
To do this we use eight input attributes: the temperatures of the previous two hours together with the present-hour temperature at C and at two nearby weather station locations A and B. The output attribute is the present-hour temperature at C. Let temp_Lh denote the temperature taken at hour h at location L; then a data instance takes the form

temp_At-2, temp_At-1, temp_At, temp_Bt-2, temp_Bt-1, temp_Bt, temp_Ct-2, temp_Ct-1, temp_Ct

with the last attribute, temp_Ct, being the output attribute.
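For illustration, such instances could be encoded in WEKA's ARFF format [25] roughly as follows; the relation name and the data row are hypothetical values, not measurements from the study:

```text
@relation temperature_prediction

@attribute 'temp_A_t-2' numeric
@attribute 'temp_A_t-1' numeric
@attribute 'temp_A_t'   numeric
@attribute 'temp_B_t-2' numeric
@attribute 'temp_B_t-1' numeric
@attribute 'temp_B_t'   numeric
@attribute 'temp_C_t-2' numeric
@attribute 'temp_C_t-1' numeric
% the last attribute is the output (class) attribute
@attribute 'temp_C_t'   numeric

@data
12.0,13.0,14.0,11.5,12.5,13.5,12.0,13.0,14.0
```

By WEKA convention the class attribute is usually placed last and selected in code with setClassIndex(numAttributes() - 1).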


[Figure: sketch of stations A, B and C, with temp_Ct = ? to be predicted at C. Data available from the stations: temperatures of 12º, 13º and 14º at hours t-2, t-1 and t.]

Figure 4.1: Using data from nearby weather station locations to predict the temperature at location C.

ML algorithms can be broadly classified into two groups: classification and regression algorithms. We describe these two types and some of the ML algorithms from each group below.

4.1. Classification Algorithms

Algorithms that classify a given instance into one of a set of discrete categories are called classification algorithms. These algorithms work on a training set to come up with a model, or a set of rules, that classifies a given input into one of a set of discrete output values. Most classification algorithms can take inputs in any form, discrete or continuous, although some require all of the inputs to be discrete as well. The output is always a discrete value. Decision trees and Bayes nets are examples of classification algorithms.

To be able to apply classification algorithms to our weather example we need to convert the output attribute into classes. This is generally done by discretization, the process of dividing a continuous variable into classes. Discretization can be done in many ways: a simple approach is to divide the temperature into ranges of 5 degrees and give each range a name; alternatively, entropy-based algorithms [19] [20] can be used. Input attributes can be left continuous if the algorithm can deal with them, or converted into discrete values, depending on the algorithm. I describe in detail the classification algorithms that have been used in this thesis in the sub-sections below.
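The 5-degree approach can be sketched as follows; the class-naming scheme is my own illustration, not the one used in the thesis:

```java
public class TempDiscretizer {

    // Map a continuous temperature to a 5-degree class label,
    // e.g. 12.3 -> "10to15", -2.0 -> "-5to0".
    public static String discretize(double temp) {
        int lower = (int) Math.floor(temp / 5.0) * 5;
        return lower + "to" + (lower + 5);
    }

    public static void main(String[] args) {
        System.out.println(discretize(12.3)); // prints "10to15"
    }
}
```

Unlike this fixed-width scheme, the entropy-based methods of [19] [20] choose cut points that best separate the classes in the training data.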


4.2. Regression Algorithms

Algorithms that build a model based on equations or mathematical operations on the values taken by the input attributes, producing a continuous value to represent the output, are called regression algorithms. The input to these algorithms can take both continuous and discrete values, depending on the algorithm, whereas the output is a continuous value. We describe in detail the regression algorithms that have been used in this thesis below.

4.2.1. Linear Regression

The Linear Regression algorithm of WEKA [18] performs standard least-squares regression to identify linear relations in the training data. This algorithm gives the best results when there is some linear dependency among the data. It requires the input attributes and target class to be numeric, and it does not allow missing attribute values. The algorithm calculates a regression equation to predict the output x for a set of input attributes a1, a2, ..., ak. The output is expressed as a linear combination of the input attributes, with each attribute ai associated with its respective weight wi, and a0 always taken as the constant 1. The equation takes the form

x  w0  w1 a1  ..........  wk a k . For our weather example the equation learned would take the form temp_Cout = w0 + wAt-2 temp_At-2 + wAt-1 temp_At-1 + wAt temp_At + wBt-2 temp_Bt-2 + wBt-1 temp_Bt-1 + wBt temp_Bt + wCt-2 temp_Ct-2 + wCt-1 temp_Ct-1 , where temp_Cout is value assigned to the output attribute, and each term on the right hand side is the product of the values of the input attributes and the weight associated with each input. The accuracy of predicting the output by this algorithm can be measured as the absolute difference between the actual output observed and the predicted output as obtained from the regression equation, which is also the error. The weights must be chosen in such way that they minimize the error. To get better accuracy higher weights must be assigned to those attributes that influence the result the most. A set of training instances is used to update the weights. At the start, the weights can be assigned random values or all set to a constant (such as 0). For the first instance in the training data the predicted output is obtained as 16

w0  w1 a1  ..........  wk a k (1)

(1)

k

  wja j

(1)

j 0

,

where the superscript on an attribute gives the instance's position in the training data. After the predicted outputs for all instances are obtained, the weights are reassigned so as to minimize the sum of squared differences between the actual and predicted outcomes. Thus the aim of the weight update process is to minimize

Σ (i = 1 to n) ( x(i) − Σ (j = 0 to k) wj aj(i) )²,

which is the sum of the squared differences between the observed output for the ith training instance, x(i), and the predicted outcome for that training instance obtained from the linear regression equation.

4.2.2. LeastMedSquare

The WEKA LeastMedSquare, or Least Median of Squares Regression, algorithm [21] is a linear regression method that minimizes the median of the squares of the differences from the regression line. The algorithm requires the input and output attributes to be continuous, and it does not allow missing attribute values. Standard linear regression is applied to the input attributes to predict the output. The predicted output x is obtained as

w0  w1 a1  ..........  wk a k (1)

(1)

k

  wja j

(1)

j 0

,

where the ai are the input attributes and the wi are the weights associated with them. In the LeastMedSquare algorithm the weights are updated, using the training data, in such a way that they minimize the median of the squares of the differences between the actual outputs and the outcomes predicted by the regression equation. Weights can initially be set to random values or assigned a scalar value. The aim of the weight update process is to determine new weights that minimize

median over i of ( x(i) − Σ (j = 0 to k) wj aj(i) )²,

where i ranges from 1 to the number of instances in the training data being used, x(i) is the actual output for training instance i, and the predicted outcome for that instance is obtained from the regression equation.
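The two objectives can be put side by side. The sketch below is an illustration, not WEKA's implementation: it evaluates both objectives for a given weight vector. On training data containing one grossly wrong reading, the line fitting the remaining points has a median objective of zero even though its sum of squares is huge, which is why LeastMedSquare is robust to outliers.

```java
import java.util.Arrays;

public class RegressionObjectives {

    // Predicted output: w[0] + w[1]*inst[0] + ... (w[0] is the intercept, a0 = 1).
    public static double predict(double[] w, double[] inst) {
        double p = w[0];
        for (int j = 0; j < inst.length; j++) p += w[j + 1] * inst[j];
        return p;
    }

    // Sum of squared residuals: the quantity minimized by Linear Regression (4.2.1).
    public static double sumOfSquares(double[] w, double[][] a, double[] x) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double r = x[i] - predict(w, a[i]);
            s += r * r;
        }
        return s;
    }

    // Median of squared residuals: the quantity minimized by LeastMedSquare (4.2.2).
    public static double medianOfSquares(double[] w, double[][] a, double[] x) {
        double[] sq = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double r = x[i] - predict(w, a[i]);
            sq[i] = r * r;
        }
        Arrays.sort(sq);
        int n = sq.length;
        return (n % 2 == 1) ? sq[n / 2] : (sq[n / 2 - 1] + sq[n / 2]) / 2.0;
    }
}
```

For example, with four readings lying exactly on x = 1 + 2a and a fifth corrupted reading, the weights {1, 2} give a median objective of 0 while the sum of squares is dominated by the single outlier.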

4.2.3. M5P

The M5P, or M5Prime, algorithm [15] is a regression-based decision tree algorithm based on the M5 algorithm by Quinlan [14]. M5P was developed from M5 with some additions. We will first describe the M5 algorithm and then the features added to it in M5P.
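Before describing M5 itself, the following sketch shows the kind of model such a tree represents: internal nodes test input attributes, and each leaf holds its own linear regression model. The split threshold and weights here are purely illustrative, not learned values:

```java
public class ModelTreeSketch {

    // inst = {temp_A, temp_B, temp_C_prev}; attribute order is illustrative.
    public static double predict(double[] inst) {
        if (inst[0] > 25.0) {
            // leaf "Model 1": linear model for instances with temp_A > 25
            return 1.0 + 0.5 * inst[0] + 0.3 * inst[1] + 0.2 * inst[2];
        }
        // leaf "Model 2": linear model for the remaining instances
        return 0.5 + 0.2 * inst[0] + 0.3 * inst[1] + 0.5 * inst[2];
    }
}
```

In contrast to an ordinary regression tree, whose leaves hold a single constant, each leaf here predicts with a full linear equation over the input attributes.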

[Figure: example M5P model tree. The root splits on Temp_A > 25; further splits (e.g. Temp_Bt > 23) lead to leaves holding linear models such as Model 1.]

PlusOrMinus1: predicted value with an error between ±0.00001 and < ±1.99

PlusOrMinus2to4: predicted value with an error between > ±1.9999 and < ±4.9999

PlusOrMinus5to10: predicted value with an error between > ±4.9999 and < ±10.9999

PlusOrMinus11to15: predicted value with an error between > ±10.9999 and < ±15.9999

PlusOrMinus16to25: predicted value with an error between > ±15.9999 and < ±25.9999

PlusOrMinus26to50: predicted value with an error between > ±25.9999 and < ±50.9999

GreaterPlusOrMinus51: predicted value with an error > ±50.9999
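These conditions amount to a simple lookup on the absolute error. The cutoffs below follow the bucket definitions above; the boundary for an exact prediction (below ±0.0001) is an assumption on my part:

```java
public class ErrorBucket {

    // Map an absolute prediction error (in degrees) to its report bucket.
    public static String bucket(double error) {
        double e = Math.abs(error);
        if (e < 0.0001)   return "Exact"; // assumed exact-match cutoff
        if (e <= 1.999)   return "PlusOrMinus1";
        if (e <= 4.9999)  return "PlusOrMinus2to4";
        if (e <= 10.9999) return "PlusOrMinus5to10";
        if (e <= 15.9999) return "PlusOrMinus11to15";
        if (e <= 25.9999) return "PlusOrMinus16to25";
        if (e <= 50.9999) return "PlusOrMinus26to50";
        return "GreaterPlusOrMinus51";
    }
}
```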

Here is the complete analysis based on the above conditions.

Dataset #1: 732 instances classified, for the month of December, for the stations Anglesea Street, Blackrock Castle and North Ring Road. The stations are represented as S1, S2 and S3 respectively in Figure 7.1 below.


[Bar chart "December": number of correctly classified instances (0–800) per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket from PlusOrMinus1 to GreaterPlusOrMinus51.]

Figure 7.1: Results of dataset #1.

From Figure 7.1, the LeastMedSquare algorithm predicts the temperature at the Blackrock Castle (S2) station with a margin of error between ±0.0001 and ±1.999, including exact predictions, for 98% of the instances. For the same station, the remaining algorithms predict the temperature within that margin of error for more than 92% of the instances.

Dataset #2: 736 instances classified, for the month of December, for the stations Innishannon, Roches Point and Youghal. The stations are represented as S1, S2 and S3 respectively in Figure 7.2 below.

[Bar chart "December": number of correctly classified instances per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket.]

Figure 7.2: Results of dataset #2.


From Figure 7.2, the Linear Regression algorithm predicts the temperature at the Roches Point (S2) station with a margin of error between ±0.0001 and ±1.999, including exact predictions, for 92% of the instances. For the same station, the remaining algorithms predict the temperature within that margin of error for more than 74% of the instances.

Dataset #3: 687 instances classified, for the month of November, for the stations Innishannon, Roches Point and Youghal. The stations are represented as S1, S2 and S3 respectively in Figure 7.3 below.

[Bar chart "November": number of correctly classified instances per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket.]

Figure 7.3: Results of dataset #3.

From Figure 7.3, the Multilayer Perceptron algorithm predicts the temperature at the Roches Point (S2) station with a margin of error between ±0.0001 and ±1.999 for 84% of the instances; the remaining algorithms predict within that margin for more than 48% of the instances. Note also that the Multilayer Perceptron successfully predicted the temperature at the Youghal (S3) station for 81% of instances, while the rest of the algorithms managed more than 47% within that margin.

Dataset #4: 717 instances classified, for the month of October, for the stations Carrigaloe, Cobh and Platform Alpha. The stations are represented as S1, S2 and S3 respectively in Figure 7.4 below.


[Bar chart "October": number of correctly classified instances per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket.]

Figure 7.4: Results of dataset #4.

From Figure 7.4, the Multilayer Perceptron algorithm predicts the temperature at the Carrigaloe (S1) station with a margin of error between ±0.0001 and ±1.999 for 87% of the instances; the remaining algorithms predict within that margin for more than 40% of the instances. Note, however, the errors greater than ±50 degrees produced by the Multilayer Perceptron for the Platform Alpha station. The reason may be the geographical location of Platform Alpha [31]: it is located offshore, at sea.

Dataset #5: 680 instances classified, for the month of October, for the stations CIT, Cork Airport and North Ring Road. The stations are represented as S1, S2 and S3 respectively in Figure 7.5 below.


[Bar chart "October": number of correctly classified instances per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket.]

Figure 7.5: Results of dataset #5.

From Figure 7.5, the Least Med Square algorithm predicts the temperature at the CIT (S1) station with a margin of error between ±0.0001 and ±1.999 for 92% of the instances; the remaining algorithms predict within that margin for more than 66% of the instances. Note, however, that none of the algorithms was able to predict the Cork Airport station temperature accurately.

Dataset #6: 188 instances classified, for the month of October, for the stations Anglesea Street, Blackrock Castle and CIT. The stations are represented as S1, S2 and S3 respectively in Figure 7.6 below.

[Bar chart "June": number of correctly classified instances per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket.]

Figure 7.6: Results of dataset #6.

From Figure 7.6, the Multilayer Perceptron algorithm predicts the temperature at the Anglesea Street (S1) station with a margin of error between ±0.0001 and ±1.999 (degrees of temperature) for 87% of the instances; the remaining algorithms predict within that margin for more than 25% of the instances. Note also that the same algorithm predicts accurately for 81% and 71% of instances at the Blackrock Castle (S2) and CIT (S3) stations respectively.

Dataset #7: 647 instances classified, for the month of October, for the stations Anglesea Street, Blackrock Castle and CIT. The stations are represented as S1, S2 and S3 respectively in Figure 7.7 below.

[Bar chart "July": number of correctly classified instances per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket.]

Figure 7.7: Results of dataset #7.

From Figure 7.7, the Linear Regression algorithm predicts the temperature at the Anglesea Street (S1) station with a margin of error between ±0.0001 and ±1.999 (degrees of temperature) for 76.5% of the instances; the remaining algorithms predict within that margin for more than 51% of the instances. Note also that for the Blackrock Castle (S2) station all algorithms predicted more than 59% of instances within that margin.

Dataset #8: 428 instances classified, for the month of August, for the stations Platform Alpha, Ringaskiddy-DWB and Roches Point. The stations are represented as S1, S2 and S3 respectively in Figure 7.8 below.

[Bar chart "August": number of correctly classified instances per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket.]

Figure 7.8: Results of dataset #8.

From Figure 7.8, the Multilayer Perceptron algorithm predicts the temperature at the Ringaskiddy-DWB (S2) station with a margin of error between ±0.0001 and ±1.999 (degrees of temperature) for 85% of the instances; the remaining algorithms predict within that margin for more than 77% of the instances. Note also that for the Platform Alpha (S1) station all algorithms predicted more than 61% of instances within that margin.

Dataset #9: 709 instances classified, for the month of January, for the stations Platform Alpha, Roches Point and Smart Bay. The stations are represented as S1, S2 and S3 respectively in Figure 7.9 below.


[Bar chart "January": number of correctly classified instances per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket.]

Figure 7.9: Results of dataset #9.

From Figure 7.9, the Least Med Square algorithm predicts the temperature at the Platform Alpha (S1) station with a margin of error between ±0.0001 and ±1.999 (degrees of temperature) for 75% of the instances; the remaining algorithms predict within that margin for more than 72% of the instances.

Dataset #10: 585 instances classified, for the month of February, for the stations Platform Alpha, Roches Point and Smart Bay. The stations are represented as S1, S2 and S3 respectively in Figure 7.10 below.

[Bar chart "February": number of correctly classified instances per algorithm (LMS, LR, M5P, MLP) and station (S1–S3), broken down by error bucket.]

Figure 7.10: Results of dataset #10.

From Figure 7.10, the Multilayer Perceptron algorithm predicts the temperature at the Platform Alpha (S1) station with a margin of error between ±0.0001 and ±1.999 (degrees of temperature) for 71% of the instances; the remaining algorithms predict within that margin for more than 69% of the instances.

The best result among the above 10 datasets, 98% for the Blackrock Castle station using the LeastMedSquare algorithm, is represented as a chart in Figure 7.11.

[Line chart "December - Blackrock Castle - LeastMedSquare - 98%": actual vs. predicted temperature (approximately −2º to 16º) for each instance (x-axis 1–732).]

Figure 7.11: Results of the LeastMedSquare algorithm for the month of December at the Blackrock Castle station: 98% of predictions fall within a margin of error of ±0.0001 to ±1.999.


7.2. Conclusion

The number of exact predictions achieved by each algorithm is less than 1% of instances. LMS achieved the most, but not significantly more than the other algorithms. LMS also kept the Major error % below 5% in 4 of the 30 station-dataset combinations, including one combination with no Major errors at all.

The LR algorithm predicted the highest number of instances with minor errors, but its Average Major error % and Maximum Major error % show no improvement over the other algorithms. LR kept the Major error % below 5% in 5 of the 30 combinations.

The M5P algorithm performed worst of all: it produced the highest number of Major error instances, the highest Maximum Major error % and the lowest number of minor errors. Its only positive outcome is that it kept the Major error % below 5% in 4 of the 30 combinations.

Even though MLP achieved the lowest Average Major error %, it remains in line with the rest of the algorithms, and its number of minor-error predictions is no significant improvement over them. Like LMS, it kept the Major error % below 5% in 4 of the 30 combinations, including two with no Major errors at all. By far, MLP produced fewer Major errors than any other algorithm.

Algorithm | 0   | ±1 (Minor errors) | ±2 to 4 | ±5 to 50 (Major errors) | > ±51 | Avg. Major error % | Max. Major error %
LR        | 148 | 11024             | 922     | 6168                    | 65    | 33                 | 83.3
M5P       | 140 | 10500             | 976     | 6644                    | 67    | 36                 | 85.7
LMS       | 156 | 10849             | 908     | 6324                    | 90    | 34                 | 82.7
MLP       | 123 | 10872             | 1275    | 5981                    | 76    | 32                 | 82.5

Table 7.1: Summary of results of the algorithms.

Weather prediction is a complex subject that involves computational statistics, physics and atmospheric science. Using ML techniques on historical data, we can predict the temperature at a station using two nearby stations, and it is also possible to automate the process. However, no single algorithm provides consistently better results.
Always consider using similar types of weather stations: for example, use a land-based weather station to predict for another land-based station. In a few cases the prediction error is large because of geographical distance and location.

References

[1] "Met Éireann - The Irish Weather Service", [on-line] Available at http://www.met.ie/forecasts/5day-ireland.asp

[2] "CIMO-Guide", [on-line] Available at https://www.wmo.int/pages/prog/www/IMOP/CIMO-Guide.html

[3] Radhika and M. Shashi, "Atmospheric Temperature Prediction using Support Vector Machines", International Journal of Computer Theory and Engineering, 1, (2009), pp. 1793–8201.

[4] S. S. Baboo and I. K. Shereef, "An Efficient Weather Forecasting System using Artificial Neural Network", International Journal of Environmental Science and Development, 1, 4, (2010), pp. 321–326.

[5] A. Ganatra, Y. P. Kosta, G. Panchal, and C. Gajjar, "Initial Classification Through Back Propagation In a Neural Network Following Optimization Through GA to Evaluate the Fitness of an Algorithm", International Journal of Computer Science and Information Technology, 3, 1, (2011), pp. 98–116.

[6] P. Ramasamy, S. S. Chandel, and A. K. Yadav, "Wind speed prediction in the mountainous region of India using an artificial neural network model", Renewable Energy, 80, (2015), pp. 338–347.

[7] K. Kumar and G. S. Mitra Thakur, "Extracting Explanation from Artificial Neural Networks", International Journal of Computer Science and Information Technologies (IJCSIT), 3, 2, (2012), pp. 3812–3815.

[8] "Data mining with WEKA, Part 1: Introduction and regression", [on-line] Available at http://www.ibm.com/developerworks/library/os-weka1/

[9] D. R. Massie and M. A. Rose, "Predicting Daily Maximum Temperatures Using Linear Regression and Eta Geopotential Thickness Forecasts", Weather and Forecasting, 12, 4, (1997), pp. 799–807.

[10] M. R. Braun, H. Altan, and S. B. M. Beck, "Using regression analysis to predict the future energy consumption of a supermarket in the UK", Applied Energy, 130, (2014), pp. 305–313.

[11] P. Pitale and A. Ambhaikar, "Sensitive Region Prediction using Data Mining Technique", International Journal of Engineering and Advanced Technology (IJEAT), 1, 5, (2012), pp. 332–336.

[12] M. Zounemat-Kermani, "Hourly predictive Levenberg–Marquardt ANN and multi linear regression models for predicting of dew point temperature", Meteorology and Atmospheric Physics, 117, 3–4, (2012), pp. 181–192.

[13] G. J. Hahn, N. Draper, and H. Smith, Applied Regression Analysis, Ed. 3, 1982, ISBN 0-471-17082-8.

[14] J. R. Quinlan, "Learning With Continuous Classes", in Proceedings AI'92 (Adams & Sterling, Eds), Singapore, (1992), pp. 343–348.

[15] Y. Wang and I. H. Witten, "Inducing Model Trees for Continuous Classes", in Poster Papers of the Ninth European Conference on Machine Learning, (1997), pp. 128–137.

[16] N. Padhy, "Data Mining: A prediction Technique for the workers in the PR Department of Orissa (Block and Panchayat)", International Journal of Computer Science, Engineering and Information Technology, 2, 5, (2012), pp. 197–36.

[17] J. Gholap, A. Ingole, J. Gohil, S. Gargade, and V. Attar, "Soil Data Analysis Using Classification Techniques and Soil Attribute Prediction", International Journal of Computer Science Issues, 9, 3, (2012), pp. 415–418.

[18] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, Ed. 3, 2011, ISBN 978-0-12-374856-0.

[19] U. Fayyad and K. Irani, "Multi-interval discretization of continuous-valued attributes for classification learning", Proceedings of the 13th International Joint Conference on Artificial Intelligence, (1993), pp. 1022–1027.

[20] J. Dougherty, R. Kohavi, and M. Sahami, "Supervised and unsupervised discretization of continuous features", Proceedings of the Twelfth International Conference on Machine Learning, 12, (1995), pp. 194–202.

[21] P. J. Rousseeuw, "Least Median of Squares Regression", Journal of the American Statistical Association, 79, 388, (1984), pp. 871–880.

[22] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Ed. 1, 1995, ISBN 0198538642.

[23] D. E. Rumelhart and J. L. McClelland, "Learning internal representations by error propagation", in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. I, Cambridge, MA: MIT Press, 1986, pp. 318–362.

[24] "Weka 3: Data Mining Software in Java", [on-line] Available at http://www.cs.waikato.ac.nz/ml/weka/

[25] "Attribute-Relation File Format (ARFF)", [on-line] Available at http://www.cs.waikato.ac.nz/ml/weka/arff.html

[26] G. Holmes, A. Donkin, and I. H. Witten, "WEKA: A machine learning workbench", in Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, IEEE, (1994), pp. 357–361.

[27] [on-line] Available at http://www.corkharbourweather.ie

[28] [on-line] Available at http://86.43.106.118/weather/index.php?page=pageStations

[29] "Documentation", [on-line] Available at http://www.cs.waikato.ac.nz/ml/weka/documentation.html

[30] "Monthly Data - Climate - Met Éireann - The Irish Meteorological Service Online", [on-line] Available at http://www.met.ie/climate/monthly-data.asp?Num=3904

[31] "Gas Production | Kinsale Energy", [on-line] Available at http://www.kinsale-energy.ie/gas-production.html


Appendix A: Abbreviations API: Application Programming Interface ARFF: Attribute-Relation File Format CIT: Cork Institute of Technology CSV: Comma Separated Values GUI: Graphical User Interface LMS: Least Median Square LR: Linear Regression ML: Machine Learning MLP: Multilayer Perceptron UV: Ultra Violet WEKA: Waikato Environment for Knowledge Analysis

