Temperature Forecasting with a Dynamic Higher-Order Neural Network Model

Noor Aida Husaini, Rozaida Ghazali, Norhamreeza Abdul Hamid, Mustafa Mat Deris, Nazri Mohd Nawi
Faculty of Computer Science and Information Technology, UTHM Malaysia, 86400 Pt. Raja, Bt. Pahat, Johor, Malaysia

Lokman Hakim Ismail
Faculty of Civil & Environmental Engineering, UTHM Malaysia, 86400 Pt. Raja, Bt. Pahat, Johor, Malaysia

ABSTRACT

This paper presents the application of a combined Higher Order Neural Network and Recurrent Neural Network approach, the so-called Jordan Pi-Sigma Neural Network (JPSN), to temperature forecasting. In the present study, one-step-ahead forecasts of daily temperature are made using five years of historical temperature measurements. We also examine the effects of the network parameters, namely the learning factors, the higher order terms and the number of neurons in the input layer, on several performance measures in order to select the best network architecture. The comparison results show that the JPSN model provides an excellent fit and reasonable forecasts, and can therefore be used as a temperature forecasting tool.

Categories and Subject Descriptors

H.2 [Database Management]: Database Applications - Data Mining

General Terms

Algorithms, Performance, Design, Reliability, Experimentation.

Keywords

Temperature, higher order, recurrent, Jordan Pi-Sigma.

1. INTRODUCTION

Forecasting future conditions is critical in many fields, such as disease forecasting [1], [2], technology forecasting [3], [4] and weather forecasting [5], [6]. Such predictions are difficult, owing to uncertainties associated with the reliability of the data being used, the forecasting methods, and environmental change. In such cases, we resort to a dynamic higher-order neural network model, based on historical data, to deal with temperature forecasting, that is, predicting future temperature values. This kind of prediction can ease people's daily routines and may contribute to the tourism sector. Historically, a variety of methods have been proposed for temperature forecasting, including physical methods, statistical-empirical methods and numerical-statistical methods [5], [6]. However, those methods require substantial maintenance and are only capable of providing certain information [7]. Owing to the difficulty of formulating a reasonable nonlinear model, recent attempts have resorted to various soft computing approaches for complex temperature modelling [8-11]. However, the studies mentioned above are black-box models: while they take in and give out information [7], they do not provide users with a function that describes the relationship between the input and output. Indeed, such approaches are prone to overfitting the data. They also suffer from long training times and are easily trapped in local minima. Therefore, we use a dynamic system, armed with computational intelligence, that integrates several different computing paradigms, namely Higher Order Neural Networks (HONN) [12] and Recurrent Neural Networks (RNN) [13]. On its own, each of these network models is highly effective in handling dynamic, nonlinear data and consequently provides good simulations [14], especially when the underlying physical relationships are not fully understood. When utilised together, however, the strengths of each network model can be exploited in a synergistic manner, yielding a promising tool for the development of low-cost hybrid systems. Thus, for providing a passively comfortable outdoor environment in a region as hot and humid as Malaysia, the Jordan Pi-Sigma Neural Network (JPSN) matches this idea well. The JPSN converges fast and maintains the high learning capabilities of HONN [12], and its additional recurrent term from the output layer to the input layer allows it to learn temporal sequences of input-output mappings. The JPSN may therefore be beneficial to meteorological departments, since it captures the nonlinear relationships in meteorological processes and may be helpful in modelling temperature for predictive purposes.

The rest of this paper is organised as follows. Section 2 describes the JPSN. Section 3 describes the implementation and data collection. Results are discussed in Section 4. Finally, conclusions and directions for future research are given in Section 5.


2. JPSN MODEL FOR TEMPERATURE FORECASTING

The general idea behind the use of the JPSN in temperature forecasting is to avoid the combinatorial explosion of higher-order terms as the network order increases. The JPSN realises the higher order terms at the output layer: the hidden layer computes summations of the input terms, and the network output is the product of these summations. The JPSN is constructed with a recurrent link from the output layer back to the input layer, which captures the temporal dynamics of the time-series process and allows the network to compute the required functions in a more parsimonious way. This feedback connection serves as an additional input at the next time step. The nonlinearity introduced by the JPSN consists of multi-linear interactions between the inputs or neurons, which enables the network to expand the input space into a higher dimensional space. The JPSN architecture comprises an input layer, one or more higher order terms, and an output layer. The simple characteristics of the JPSN, which has a single layer of tuneable weights between the input layer and the summing layer, can offer large savings in hardware implementation. The architecture of the JPSN is shown in Figure 1.

Figure 1. The JPSN ($z^{-1}$ denotes the time delay operator).

3. IMPLEMENTATION

The temperature forecasting was implemented using MATLAB 7.10.0 (R2010a) on a Pentium Core 2 Quad CPU under the Windows XP Professional platform. The forecasts were evaluated against standard measuring criteria: Mean Squared Error (MSE), Signal to Noise Ratio (SNR) and Normalised Mean Squared Error (NMSE) [16]. Another important criterion for evaluating a network model is its generalisation ability: a network is said to generalise well when its output is correct, or close enough, for inputs that were not included in the training set. In addition, we also consider the number of iterations during the training process.
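As a hedged illustration, the measuring criteria above might be computed as follows. The exact formulations used in [16] may differ; the common definitions are assumed here (NMSE as MSE normalised by the variance of the target series, SNR in decibels).

```python
import numpy as np

def mse(d, y):
    """Mean Squared Error between targets d and forecasts y."""
    d, y = np.asarray(d, float), np.asarray(y, float)
    return np.mean((d - y) ** 2)

def nmse(d, y):
    """Normalised MSE: MSE divided by the target variance (assumed, common form)."""
    return mse(d, y) / np.var(np.asarray(d, float))

def snr(d, y):
    """Signal-to-noise ratio in dB: target power over error power (assumed form)."""
    d, y = np.asarray(d, float), np.asarray(y, float)
    return 10.0 * np.log10(np.sum(d ** 2) / np.sum((d - y) ** 2))
```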

3.1 Data Collection

One of the most important components in the success of a neural network solution is data collection. The quality, availability, reliability and relevancy of the data used to develop and run the system are critical to its success. For the implementation of temperature forecasting, five years of daily temperature measurements at Batu Pahat, covering 1/1/2005 to 31/12/2009, were collected from the Central Forecast Office, Malaysian Meteorological Department (MMD) [17]. The Batu Pahat data were selected because they are relatively uniform and contain no missing values. Based on these records, the maximum, minimum and average temperature measurements are tabulated in Table 1. The idea behind this selection was to examine the applicability of the JPSN to temperature measurement.

Table 1: The Maximum, Minimum and Average Temperature Measurement at Batu Pahat

Station | Size | Maximum (°C) | Minimum (°C) | Average (°C)
Batu Pahat | 1826 | 29.5 | 23.7 | 26.75

The whole dataset was then divided into three parts: 50% for training, 25% for validation and 25% for testing (out-of-sample). Of the total available data, 913 data points were used for training, while the remaining data were split between validation and testing. Table 2 shows the segregation of the dataset.

Table 2: Summary of Temperature Dataset Segregation of Batu Pahat

Dataset | Training | Validation | Testing
Batu Pahat station | 913 | 456 | 457
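As an illustration, the chronological 50/25/25 segregation described above could be implemented as follows; the file name is hypothetical, and the point counts (913/456/457) match Table 2.

```python
import numpy as np

# Hypothetical file holding the 1826 daily readings (1/1/2005 - 31/12/2009).
temps = np.loadtxt("batu_pahat_daily_temp.txt")

n = len(temps)                      # 1826 data points
n_train = int(round(0.50 * n))      # 913 points for training
n_valid = int(round(0.25 * n))      # 456 points for validation
train = temps[:n_train]
valid = temps[n_train:n_train + n_valid]
test = temps[n_train + n_valid:]    # remaining 457 points (out-of-sample)
```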

For each training example, the output of the JPSN is calculated as:

$$y(t) = f\left(\prod_{L=1}^{k} h_L(t)\right) \qquad (1)$$

where the activation $h_L(t)$ of summing unit $L$ can be calculated as:

$$h_L(t) = \sum_{m=1}^{M} w_{Lm}\, x_m(t) + w_{L(M)} + w_{L(M+1)}\, y(t-1) = \sum_{m=1}^{M+1} w_{Lm}\, z_m(t-1) \qquad (2)$$

where $h_L(t)$ represents the activation of the $L$-th unit at time $t$, $M$ is the number of input nodes, and $y(t-1)$ is the previous network output. The unit's transfer function $f$ is the sigmoid activation function, whose output is bounded to the range $[0,1]$. The error signal $e$ between the target and actual output is determined as follows:

$$e_j(t) = d_j(t) - y_j(t) \qquad (3)$$

where $d_j(t)$ is the target output and $y_j(t)$ is the forecast output. At every time $t$, the weights are updated according to:

$$\Delta w_{ij}(t) = -\eta\, \frac{\partial J(t)}{\partial w_{ij}} + \alpha\, \Delta w_{ij}(t-1) \qquad (4)$$

where $\eta$ is the learning rate and $\alpha$ is the momentum term. The weights are updated using the backpropagation learning algorithm [15], and the network repeats this learning cycle until the maximum number of epochs is reached. The use of product units in the output layer gives the JPSN higher-order capabilities while using a small number of weights and processing units; by combining the properties of both HONN and RNN, the JPSN therefore offers better performance. When the JPSN is used as a one-step-ahead predictor, the previous input values are used to predict the next element in the data. Networks with recurrent connections hold several advantages over ordinary feedforward networks, especially when dealing with time-series problems, so these dynamic properties add value to the JPSN. The JPSN has the topology of a fully connected two-layered feedforward network; since the product-unit weights are fixed rather than tuneable, such a topology, with only one layer of tuneable weights, can reduce the training time.
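A minimal sketch of this forward pass and weight update is given below, assuming M external inputs plus a bias and the fed-back output, as in Eq. (2). The class name, initialisation range and zero-division guard are illustrative choices rather than the authors' implementation (which was in MATLAB); the gradient applies the chain rule for the squared error of Eq. (3) through the product unit of Eq. (1).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class JPSN:
    """Sketch of a Jordan Pi-Sigma network with one tuneable weight layer."""

    def __init__(self, n_inputs, n_order, eta=0.1, alpha=0.2, seed=0):
        rng = np.random.default_rng(seed)
        # Each of the n_order summing units sees the external inputs,
        # a bias term, and the fed-back previous output y(t-1).
        self.W = rng.uniform(-0.5, 0.5, size=(n_order, n_inputs + 2))
        self.eta, self.alpha = eta, alpha
        self.prev_dW = np.zeros_like(self.W)
        self.y_prev = 0.0               # Jordan-style recurrent context

    def forward(self, x):
        z = np.concatenate([np.asarray(x, float), [1.0, self.y_prev]])
        h = self.W @ z                  # Eq. (2): summing units
        y = sigmoid(np.prod(h))         # Eq. (1): product unit, then sigmoid
        return y, h, z

    def train_step(self, x, d):
        y, h, z = self.forward(x)
        e = d - y                       # Eq. (3): error signal
        # For J = e^2 / 2: dJ/dw_Lm = -e * y(1-y) * (product of other h's) * z_m
        dy = y * (1.0 - y)
        prod_others = np.prod(h) / np.where(h != 0.0, h, 1e-12)
        grad = -e * dy * np.outer(prod_others, z)
        dW = -self.eta * grad + self.alpha * self.prev_dW   # Eq. (4)
        self.W += dW
        self.prev_dW = dW
        self.y_prev = y                 # feed the output back for t+1
        return e ** 2
```

With, say, four lagged temperature values as x, repeated calls to train_step over the training series realise the one-step-ahead scheme described above.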


4. RESULTS AND DISCUSSION

Designing the right architecture involves several steps: selecting the number of layers, deciding the number of neurons to be used in each layer, and choosing the appropriate neuron transfer function. In this study, a three-layered JPSN was considered and trained using the backpropagation learning algorithm [15]. The choice of network architecture is always problem dependent and often requires experimentation before a final architecture is selected for training and, ultimately, forecasting. We therefore employed a total of 405 network architectures, covering 5 different numbers of input nodes, ranging from 4 to 8, and higher order terms in the range of 2 to 5. For this network model, only one output unit is needed, representing the value of the temperature measurement one day ahead. The input and output series were scaled using a standard minimum-maximum normalisation method, producing a new bounded dataset. One reason for using data scaling is to handle outliers, i.e. sample values that occur outside the normal range [18]. An average performance over 10 simulations was used, and the network parameters (the learning factors, the higher order terms, and the number of neurons in the input layer) were experimentally tested in an attempt to find the best model. A sigmoid function $1/(1 + e^{-x})$ was employed and all networks were trained for a maximum of 3000 epochs. Considering all the in-sample data that were trained, the best values of the momentum term, α = 0.2, and the learning rate, η = 0.1, were chosen based on simulation results obtained by a trial-and-error procedure. Likewise, the number of input nodes was fixed to 4 in order to exemplify the effects of all network parameters.
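A minimal sketch of the minimum-maximum normalisation mentioned above follows; the paper does not state the target range, so scaling to [0, 1], matching the sigmoid output range, is an assumption.

```python
import numpy as np

def minmax_scale(x, lo=0.0, hi=1.0):
    """Scale a series linearly into [lo, hi] using its own min and max."""
    x = np.asarray(x, dtype=float)
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

# In practice, the min and max should be taken from the training set and
# reused on the validation and test sets to avoid information leakage.
```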

4.1 The Effects of Learning Factors

The value of the learning rate η and the momentum factor α significantly affect the efficiency and convergence of the JPSN learning algorithm. Both parameters determine the size of the steps taken in the gradient descent process. The momentum term determines the effect of past weight changes on the current direction of movement in weight space and helps avoid oscillation, for the practical purpose of making rapid learning possible [19]. Theoretically, a higher η can be used to speed up the learning process, but if it is set too high, the algorithm will "step over" the optimum weights [19]; if it is set too small, training times become incredibly long. Figure 2(a) presents the number of epochs versus different values of η with α = 0.2. The figure shows that a higher η leads the algorithm to converge quickly; a lower value of η, on the other hand, requires an unreasonably high number of epochs to reach the desired solution, leading to longer training times, and the network might diverge. The momentum factor α is also important for the learning process. Figure 2(b) shows the effect of α on model convergence with the learning rate fixed at η = 0.1. It can be seen that a larger value of α reduces the number of epochs required; therefore, a higher α can be used to reach convergence in a smaller number of epochs.

Figure 2: The Effects of Learning Factors on the Network Performance. (a) Number of epochs versus learning rate with momentum fixed to 0.2; (b) number of epochs versus momentum term with learning rate fixed to 0.1.
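The trial-and-error selection of the learning factors can be sketched as a simple grid search, reusing the JPSN sketch from Section 2. The candidate grids, the convergence tolerance and the train_until_converged helper below are hypothetical, not the authors' procedure.

```python
import itertools
import numpy as np

def train_until_converged(net, train, valid, n_inputs=4, max_epochs=3000, tol=1e-4):
    """Assumed helper: one pass per epoch over the scaled training series,
    stopping at max_epochs or when the training MSE falls below tol."""
    for epoch in range(1, max_epochs + 1):
        sq = [net.train_step(train[t - n_inputs:t], train[t])
              for t in range(n_inputs, len(train))]
        if np.mean(sq) < tol:
            break
    # Validation MSE with frozen weights, still feeding the output back.
    val_sq = []
    for t in range(n_inputs, len(valid)):
        y, _, _ = net.forward(valid[t - n_inputs:t])
        net.y_prev = y
        val_sq.append((valid[t] - y) ** 2)
    return epoch, float(np.mean(val_sq))

# Illustrative grid of candidate learning rates and momentum terms.
best = None
for eta, alpha in itertools.product([0.1, 0.2, 0.4, 0.8], [0.1, 0.2, 0.4, 0.8]):
    net = JPSN(n_inputs=4, n_order=2, eta=eta, alpha=alpha)
    epochs, val_mse = train_until_converged(net, train, valid)
    if best is None or val_mse < best[0]:
        best = (val_mse, eta, alpha, epochs)
```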

4.2 The Effects of Higher Order Terms

The number of higher order terms is one of the important factors affecting the network's performance. In theory there is no upper limit on the network order, yet the network order rarely exceeds twice the number of input nodes. Therefore, in this work, the number of higher order terms was gradually increased by one during the training process, from 2nd order up to 5th order. A comparative performance of the networks based on these different structures is tabulated in Table 3. For the rest of the experiments, the momentum term and the learning rate were fixed to α = 0.2 and η = 0.1 respectively, as these proved to be the best parameters for the JPSN, together with 4 input nodes. The results show that the JPSN with 2nd order outperformed the other network orders on all measuring criteria except the number of epochs; this is because the network needs more epochs for the tuneable weights to be adjusted.

Table 3: The Effects of the Number of Higher Order Terms for JPSN with α = 0.2, η = 0.1

ORDER | NMSE | SNR | MSE Training | MSE Testing | Epoch | CPU Time
2 | 0.7710 | 18.7557 | 0.0062 | 0.0065 | 1460.9 | 147.25
3 | 0.7928 | 18.6410 | 0.0064 | 0.0066 | 1641.1 | 323.24
4 | 0.8130 | 18.5389 | 0.0064 | 0.0068 | 1209.9 | 460.93
5 | 0.8886 | 18.1574 | 0.0076 | 0.0074 | 336.8 | 460.93


4.3 The Effects of the Number of Neurons in the Input Layer

The number of neurons in the input layer also affects the performance of the JPSN. As the literature gives no rigorous rules on how to determine the optimal number of input neurons, in this work the number of input neurons was selected by a trial-and-error procedure between 4 and 8. From Table 4 it clearly appears that the network performance starts to decrease as more input neurons are added to the network. It can be concluded that a large number of neurons in the input layer is not always necessary; it can decrease the network performance and may lead to greater execution time, resulting from network overfitting.

Table 4: The Effects of the Input Neurons for JPSN with α = 0.2 and η = 0.1

INPUT | NMSE | SNR | MSE Training | MSE Testing | Epoch | CPU Time
4 | 0.7710 | 18.7557 | 0.0062 | 0.0065 | 1460.9 | 147.25
5 | 0.7837 | 18.6853 | 0.0062 | 0.0066 | 193.5 | 20.17
6 | 0.7912 | 18.6504 | 0.0062 | 0.0066 | 285.3 | 30.72
7 | 0.7888 | 18.6626 | 0.0062 | 0.0066 | 236.3 | 26.13
8 | 0.8005 | 18.5946 | 0.0062 | 0.0067 | 185.9 | 21.12

Table 5 shows the results for temperature prediction using the JPSN. The 2nd order JPSN with 4 inputs demonstrates the best results on all measuring criteria except the number of epochs and CPU time. In terms of learning speed, the 5-input architecture converged much faster than the other input architectures, owing to the larger number of tuneable weights employed by that architecture. Over the whole training process, the JPSN with 4 inputs shows the least error, and its prediction error on the testing set is slightly lower than that of the other network architectures. This indicates that the network is capable of representing the nonlinear function. Consequently, it can be inferred that the JPSN yields accurate results, provided that the network parameters are determined properly.

Table 5: Average Result of JPSN for One-Step-Ahead Prediction

No. of Input Nodes | 4 | 5 | 6 | 7 | 8
Network Order | 2 | 2 | 2 | 2 | 2
NMSE | 0.7710 | 0.7837 | 0.7912 | 0.7888 | 0.8005
SNR | 18.7557 | 18.6853 | 18.6504 | 18.6626 | 18.5946
MSE Training | 0.0062 | 0.0062 | 0.0062 | 0.0062 | 0.0062
MSE Testing | 0.0065 | 0.0066 | 0.0066 | 0.0066 | 0.0067
Epoch | 1460.9 | 193.5 | 285.3 | 236.3 | 185.9
CPU Time | 147.25 | 20.17 | 30.72 | 26.13 | 21.12

Figure 3 presents the learning curve for the prediction of temperature using the JPSN. In the plot, the blue line represents the trend of the actual values, while the red line represents the predicted values. As can be seen, the predicted daily temperature values almost fit the actual values, with minimal forecast error. The good performance of the temperature forecasts reflects the robustness of the model. It can thus be seen that the parsimonious representation of higher order terms in the JPSN is capable of learning the nonlinear behaviour of the temperature data, which helps the network to model it effectively.

Figure 3: Temperature Forecast made by JPSN.

5. CONCLUSION AND FUTURE WORK

In this work, we presented a JPSN model for temperature forecasting in Batu Pahat. The simulation results, validated by (a) the minimum error on all measuring criteria and (b) the effects of the network parameters, indicate that the JPSN can be a promising tool for temperature prediction, since it incorporates auxiliary information through the recurrent connection from the output to the input layer, which acts as additional guidance for evaluating the current noisy input and its signal component. From the extensive simulation results, it is suggested that the JPSN with architecture 4-2-1 provides the best prediction, with learning rate 0.1 and momentum 0.2. As a possible research direction, the JPSN could be implemented and trained with other well-known learning algorithms, such as conjugate gradient, simulated annealing, or the simplex algorithm, rather than backpropagation, with the objective of improving the computational efficiency of the JPSN's training process.



6. ACKNOWLEDGMENTS

The authors would like to thank Universiti Tun Hussein Onn Malaysia for supporting this research under the Postgraduate Incentive Research Grant.

7. REFERENCES

[1] Cooke, B.M., et al., Disease Forecasting, in The Epidemiology of Plant Diseases. 2006, Springer Netherlands. p. 239-267.
[2] Coombs, A., Climate Change Concerns Prompt Improved Disease Forecasting. Nature Medicine, 2008. 14(1): p. 33.
[3] Byungun, Y. and P. Yongtae, Development of New Technology Forecasting Algorithm: Hybrid Approach for Morphology Analysis and Conjoint Analysis of Patent Information. IEEE Transactions on Engineering Management, 2007. 54(3): p. 588-599.
[4] Cheng, A.-C., C.-J. Chen, and C.-Y. Chen, A Fuzzy Multiple Criteria Comparison of Technology Forecasting Methods for Predicting the New Materials Development. Technological Forecasting and Social Change, 2008. 75(1): p. 131-141.
[5] Lorenc, A.C., Analysis Methods for Numerical Weather Prediction. 1986, John Wiley & Sons, Ltd. p. 1177-1194.
[6] Paras, et al., A Feature Based Neural Network Model for Weather Forecasting. Proceedings of World Academy of Science, Engineering and Technology, 2007. 34.
[7] Paras, et al., A Feature Based Neural Network Model for Weather Forecasting. Proceedings of World Academy of Science, Engineering and Technology, 2007.
[8] Bhardwaj, R., et al., Bias-free Rainfall Forecast and Temperature Trend-based Temperature Forecast using T170 Model Output during the Monsoon Season. Meteorology & Atmospheric Sciences, 2007. 14(3): p. 351-360.
[9] Hayati, M. and Z. Mohebi, Application of Artificial Neural Networks for Temperature Forecasting. World Academy of Science, Engineering and Technology, 2007. 28: p. 275-279.
[10] Lee, L.-W., L.-H. Wang, and S.-M. Chen, Temperature Prediction and TAIFEX Forecasting based on High-Order Fuzzy Logical Relationships and Genetic Simulated Annealing Techniques. Expert Systems with Applications, 2008. 34(1): p. 328-336.
[11] Smith, B.A., G. Hoogenboom, and R.W. McClendon, Artificial Neural Networks for Automated Year-Round Temperature Prediction. Computers and Electronics in Agriculture, 2009. 68(1): p. 52-61.
[12] Giles, C.L. and T. Maxwell, Learning, Invariance, and Generalization in High-Order Neural Networks. Applied Optics, 1987. 26(23): p. 4972-4978.
[13] Franklin, J.A., Recurrent Neural Networks for Music Computation. INFORMS Journal on Computing, 2006. 18(3): p. 321-338.
[14] Zhang, J.X., CEO Tenure and Debt: An Artificial Higher Order Neural Network Approach, in Artificial Higher Order Neural Networks for Economics and Business, M. Zhang, Editor. 2009, Information Science Reference: Hershey, New York. p. 330-343.
[15] Rumelhart, D.E., G.E. Hinton, and R.J. Williams, Learning Representations by Back-Propagating Errors. Nature, 1986. 323: p. 533-536.
[16] Ghazali, R., A. Hussain, and W. El-Dereby, Application of Ridge Polynomial Neural Networks to Financial Time Series Prediction. International Joint Conference on Neural Networks (IJCNN '06), 2006: p. 913-920.
[17] Malaysian Meteorological Department, Weather Forecast. 2010 [cited 18 February 2011]; Available from: http://www.met.gov.my.
[18] Ghazali, R., et al., Dynamic Ridge Polynomial Neural Networks in Exchange Rates Time Series Forecasting, in Lecture Notes in Computer Science. 2007, Springer. p. 123-132.
[19] Maier, H.R. and G.C. Dandy, The Effect of Internal Parameters and Geometry on the Performance of Back-Propagation Neural Networks. Environmental Modelling and Software, 1998. 13(1): p. 193-209.
