Available online at www.sciencedirect.com
ScienceDirect Procedia Technology 11 (2013) 557 – 564
The 4th International Conference on Electrical Engineering and Informatics (ICEEI 2013)
Uncertain Time Series in Weather Prediction Nabilah Filzah Mohd Radzuan*, Zalinda Othman, Azuraliza Abu Bakar Data Mining and Optimization, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor D.E, Malaysia
Abstract The paper reviews about methods have been implemented on uncertain time series data in weather prediction. The aim of uncertain time series analysis is to formulate uncertain data in order to gain knowledge, fit low dimensional models, and do prediction. Euclidean distance, particle swarm optimization, data mining, and Monte Carlo simulation are methods that have been compared to investigate the best ways of predicting. These methods have been implemented since early 1900s. This paper discusses on the performance of every methods.
© The Authors. Authors.Published PublishedbybyElsevier ElsevierB.V. Ltd. Open access under CC BY-NC-ND license. © 2013 2013 The Selection and peer-review peer-reviewunder underresponsibility responsibilityofofthe theFaculty FacultyofofInformation Information Science &Technology, Technology,Universiti UniversitiKebangsaan Kebangsaan Science and Malaysia. Malaysia. Keywords: uncertain time series; time series; artificial intelligence; weather
1. Introduction Uncertain time series is part of time series group. Time series well known as a stretch of values on the same scale indexed by a time that occur naturally in many application areas such as environmental, economic, finance, and medicine. The aim of time series analysis is to formulate time series data in order to gain knowledge, fit low dimensional models, and make forecasts. The uncertain time series is a non-negative and precisely different ways in a number of fields [1], [2]. The combination of uncertainties are significant [1–3] and bring important knowledge for the area involved. However, there are many challenges to be faced when it involved application area as example in manufacturing and weather. Researcher in manufacturing tries to survey any techniques that have been proposed specifically for modelling and processing uncertain time series for temporal data [4], [5]. While, in weather the
* Corresponding author. Tel.: +603-8921-6182; fax: +603-8921-6184. E-mail address:
[email protected]
2212-0173 © 2013 The Authors. Published by Elsevier Ltd. Open access under CC BY-NC-ND license. Selection and peer-review under responsibility of the Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia. doi:10.1016/j.protcy.2013.12.228
558
Nabilah Filzah Mohd Radzuan et al. / Procedia Technology 11 (2013) 557 – 564
researcher tries to determine knowledge from uncertain time series that can be used in weather prediction or precipitation for human being benefits. Beginning 1900s, there were several researches being done according to uncertain time series but not specify using the uncertain terms. In 1905, researcher determines the relation between normal and abnormal development of the embryo of the frog by investigate the failed of yolk to segment where yolk had been interrupt during the earlier stages [6]. The inconsistent process of segmentation shown there were uncertain time series in stages of development. On the other hands, during 1911, the uncertain time series had been studied through series of traverses across desert in order to determine the effect secular Oscillation in Egypt during the Cretaceous and Eocene periods [7]. While, in 1985, the researcher helps to identify wet spells of weather and assess the unusualness of the recent episode of heavy precipitation in meteorology department even though the uncertain whether a prolonged dry spell will strike to bring the lake levels down to much lower levels before the onset of the next severe wet spell [8]. These situations shown the uncertain time series was very helpful and useful when knowledge had been extracted from the data set, and more researches must be done in order to get proper method of extraction exact knowledge for predicting. Uncertain time series in weather precipitation forecast is believed can avoid risks and help in order to make better daily decisions, whispered can improve the quality of demand, and identify the temporal patterns that emerge and persist. Therefore, this study will determine the accuracy of forecasting uncertain time series and improve the quality of yield in weather forecast area. The study will focus on present uncertain time series data in order to reduce space, which type of similarity is suitable to measure similarity between patterns in long sequence uncertain time series data and how to manage data without reduce lost of compression properties, and how to maximise the efficiency of prediction in extract knowledge from uncertain time series data. This paper represents a brief overview of uncertain time series and methods being used especially focusing on weather prediction. Methods that have been implemented assists meteorological people in finding parameters and variables even there are some errors exist [9], and further research must been done for predicting process in future. This paper is organized as follows. First, the related research is reviewed in Sec.2. Then, some analysis will be compared in Sec.3. In Sec.4, there will be a discussion of techniques and benefits. Finally, the conclusion is drawn in Sec.5. 2. Related research Uncertain time series is important in helping to build up the prediction. It influences changeable climate that provides more useful information. Important knowledge can be tackled from this changeable gap that exists in uncertain time series data. The uncertainty can provide better results in terms of quality and efficiency [4], [5], [10]. Uncertainty information might arise for different reasons [5]. The uncertain time series has been explored extensively in recent years. Uncertain data is created by several applications in data forecasting. This is such as weather precipitation predicting for meteorology department, or in manufacturing demand prediction which both actually can gain benefits in handling future outcome. There are relationship between uncertain and original series [11], [12]. A certain time series is extracted to represent the original uncertain time series [10]. Uncertainty can be due to data aggregation, privacy-preserving transforms, and error-prone mining algorithms [4], [5]. Uncertain time series can be treated as positional uncertain vectors [13]. Besides, uncertainty exist in the modelling process where it arises from fundamental choice as example in grid resolution and from the parameterization of processes unresolved at the grid scale [14] and the high uncertainty bring big impact on prediction of regional climate change [15]. The lack of model diversity causes limited range of climate change projections [16]. The distinct source of uncertainty in prediction are included internal variability, model uncertainty or response uncertainty, and scenario uncertainty [15], [17]. Weather and climate are difference. Weather is a representation of natural situation over a short period of time where as a “snapshot” of the atmosphere at a particular time. While, climate is the statistics of weather over a long period of time [18]. Change on environment and climate particularly sea level cause difficulties arise in identify an effective policy on multi-decadal time scale in order to gain the factors for evolution were uncertain in variability and trends, especially in analyzing of temperature and precipitation [19]. The present statistics of storm surges to be capable of a large change in the frequency of floods in future [19]. The long-term effects of warming in ecological
Nabilah Filzah Mohd Radzuan et al. / Procedia Technology 11 (2013) 557 – 564
559
regimes are uncertain with no clear evidence found [20]. The predicting effect of future climate in a highly dynamic system is the challenge [20]. As focus on weather area, the artificial intelligence methods for weather prediction currently included model output statistics, fuzzy logic, expert systems, genetic algorithm, and particle swarm optimization [18], but in implementation of uncertain time series data, the method involved are included Euclidean Distance, Particle Swarm Optimization (PSO), Data Mining , and Monte Carlo. Predicting uncertain time series appears to be serious problem in focus area, as the existing forecast of certain time series does not purely mirror the ability of predicting future decisions. The certain time series data cannot be implemented in large scale of dataset which bring error in prediction [3], [9], [21]. Uncertain time series in weather precipitation prediction is believed can avoid risks and help in order to make better daily decisions. The determination of predicting uncertain time series noted as serious action must be taken in order to improve the quality of yield in focus area. Therefore, comparison of analysis been done in order to determine the yield of predicting uncertain time series. The limitation from analysis can be used as an opening of experiment and being aim to secure the limitation for enhanced prediction outcome. 2.1.Eulidean distance method Weather area shows that the main problem of indexing uncertain time series are included the low query selectivity due to the curve of positional, only correlated uncertain time series can be treated as positional uncertain vectors, and none of the approaches such as statistical modelling and Gaussian; is able to support dynamic time warping [13]. Traditional distance measures such as Euclidean distance or dynamic time warping are not always effective for analyzing uncertain time series data cause of limitation in applicability [22]. The change of precipitation from climate are often complex, uncertain, and changeable with the dynamic characteristic and rendered as a complex nonlinear dynamic system [21]. 2.2.Particle swarm optimization method PSO considers as good method when plays in certain time series data based on the yield of prediction. For each potential solution in the PSO is referred as a particle, and each set of particles composes a population where it maintains the position associated with the best fitness ever experienced by it in a personal memory call pbest and the best value of it is called gbest [18]. In this case, there was tool been implemented through PSO [18] which was Matlab 6 with MeteoLab toolboxes which developed by Antonio S. Cofi’no, Rafael Cano, Carmen Sordo, Cristina Primo, and Jos’e Manuel Guti’errez where the tool helps in collection of numerical weather prediction (NWP) algorithms, raw data of weather observations included precipitation, pressure and temperature observation, information of reanalysis projects, and algorithms for generation of simulation data and a routine for the representation of climate data. The used of data set of temperature observations been compared with two techniques included PSO-based approach and Neural Network (NN) toolbox Netlab shown that the mean error NN 3.3289 was bigger than PSO 2.27. This shown PSO was helped in reduce the error from the extraction knowledge processes. PSO works in an unsupervised manner and discover relations hidden inside the data set [18]. In other cases such as precipitation prediction, avoid of uncertainty in data will cause error. This been shown through standard error of estimate [23]. Uncertainty analysis is one of the techniques that has been used to validate the precipitation data apart from systematic bias, model output evaluation with different precipitation inputs, and inter-comparisons of magnitude and spatial pattern; for analysis of spatial similarities between Next Generation Radar (NEXRAD) and North American Land Data Assimilation System (NLDAS) [24]. PSO algorithm is one of heuristic methods that has been applied for analysis and prediction data. It is used to provide experimental results of the aforementioned method on real-world meteorological time series [18]. PSO can outperform classical and modern methods for the discovery of models in weather data. Usually, PSO algorithm is implemented in certain time series data. Self adaptability of PSO algorithm allows it to work in an unsupervised manner and to discover relations hidden inside the data set. Besides, PSO algorithm is helped in decrease mean error compare with NN. This experiment is shown through a data set of temperature observations collected every 10 days in Rennes [18] as in Figure 1.
560
Nabilah Filzah Mohd Radzuan et al. / Procedia Technology 11 (2013) 557 – 564
Fig. 1. Result for Rennes data set
Fig. 2. Result for Ostersund Froson data set
Figure 1 contains the plot of the model with mean error for PSO model was 2.27 while NN was 3.3289. The same method been applied on precipitation analysis in Ostersund Froson and it brought the same result where mean error of PSO model was 2.01 smaller than NN with 3.0390 as shown in Figure 2. The minimum mean error shown that the PSO algorithm can outperform based on artificial intelligent for finding models in weather data. Therefore, this PSO algorithm can be tested in uncertain time series data in extraction knowledge. 2.3. Data mining method On the other hands, regression model in data mining helpful too in determine the uncertainty [1]. The effect of uncertainty of a particular location on future weather fields been calculated through adjoint sensitivity analysis by provides framework to assess the effect of weather conditions and uncertainty on infrastructures [25]. Information from uncertain time series believes can help in prepare to face high wind power variation by allocating reserves or by committing peaking units [25]. The standard errors are large reflecting more correctly the uncertainty on the estimated parameters [26]. The model errors make a larger contribution than initial condition and lateral boundary condition errors to predict uncertainty in the short range, statistical analysis underestimating predict uncertainty [27]. The uncertainty environmental parameters can help in determine the optimal management schedule by using nonlinear stochastic model predictive control technique [28]. The autoregressive models using a Gaussian process modelling technique in estimates of certain weather conditions and related uncertainty information, and can give rise to conservative uncertainty bounds [28]. There are tasks under data mining included classification, prediction, association rules mining, sequential rules, clustering and deviation detection. The prediction of data mining task been used in predict process. Regression model has been used in determine the uncertain time series [1], [3], [8], [21], [29]. In previous research, the regression model through certain time series data assisted on capture the main modes of spatial variability of isotopic composition of precipitation [1], assess the unusualness of the recent episode of heavy precipitation but uncertain whether a prolonged dry spell will strike again in bring the lake levels down much lower levels before the onset of the next severe wet spell [8], [29], improve the prediction accuracy in study of precipitation forecast but difficult to predict climate because of the dynamic characteristics of sample set where the flow was shown in Figure 3 [21], and try to figure out the effect of relationship between uncertain time series data with the weather area. This task helps the uncertain time series in extraction of knowledge with carefully implemented. The comparison of analysis can be clearly shown in Table 1.
Nabilah Filzah Mohd Radzuan et al. / Procedia Technology 11 (2013) 557 – 564
SVM Model
History of precipitation and environmental factors
Nonlinear feature selection
Environmental factors to retain
Environmental factors of measured years
Nonlinear order
Car model
Final modal
SVM Training
Prediction
Fig. 3. Precipitation predicting process [21]
2.4.Monte Carlo method The uncertainties in water vapour observations make climate trend analyses difficult [30]. The uncertain must be considered as risk-assessment strategies which bring impact in heavy precipitation from the record of lake levels [8], [29]. The uncertain brings a number of difficulties associated with estimating risks in populations and wide ranging individual exposures, and change in behaviour with time and the natural variation in individual response [31]. Highly uncertain which is incorporation of information causes best developed within a probabilistic framework [31]. Monte Carlo methods can be used to analysis of complex systems and natural processes subject to a high degree of uncertainty and help in maintain physical accuracy, but it cause bias with limited range [31]. The yield of Monte Carlo simulation helps in provide a basis for decision making and identify the most significant sources even though with bias [31]. Monte Carlo also known as class of computational algorithms that rely on repetitive random sampling to subtract their results and it has been applied in prediction as alternative methods for human perception. In this experiment, the Monte Carlo simulation works to present a method for calculating the long-term cumulative exposure in order to cristobalite from volcanic ash [31]. The probability distribution of controlling variables allow for correlation between variables. The components and processes of the model were shown in Figure 4.
561
562
Nabilah Filzah Mohd Radzuan et al. / Procedia Technology 11 (2013) 557 – 564
Fig. 4. Model flowchart for estimation and cristobalite exposure through weather data [31]
The Monte Carlo simulation has a number of difficulties as associates with estimating risks in populations due to uncertain and wide ranging individual exposures, change in behaviour with time and the natural variation in individual response [2], [31]. Therefore, the elements of the model could be improved by gathering more data and calculate again for gaining the significant sources for uncertain time series data in implemented for weather. Table 1. Comparison of previous analysis Method
Technique
Advantage
Disadvantage
Traditional
Euclidean distance (Dynamic Time Warping)
x
survey the techniques that have been proposed specifically for modelling and processing uncertain time series
x x
not always effective limitation in applicability
Heuristic
PSO
x x
applied for analysis and prediction data to provide experimental results of on realworld meteorological time series to work in an unsupervised manner to discover relations hidden inside the data set decrease mean error
x
implemented in certain time series data
forecast process capture the main modes of spatial variability of isotopic composition of precipitation improve the forecasting accuracy
x
need framework to assess prediction difficult to predict climate because of the dynamic characteristics of sample set usually implement in certain time series
x x x Data Mining
Prediction Task: x Regression model
x x x
x
x Monte Carlo
Monte Carlo Simulation
x x
Monte Carlo approach
x
repetitive random sampling to subtract results applied in prediction as alternative methods for human perception treatment leads to nonlinear terms
x
bias with the limited range
x
the Monte Carlo approach is less appealing
Nabilah Filzah Mohd Radzuan et al. / Procedia Technology 11 (2013) 557 – 564
3. Discussion In this work, the methods had been reviewed previously brought benefits to weather area through predicting the uncertain time series data. The performed analytical and experimental comparisons of techniques shown there should be furthered experiment on the method in order to get the accuracy of predicting uncertain time series and improve the quality of yield in focus area. Besides, the calculation in Monte Carlo simulation should be more persisted in identifies the most significant sources of uncertainty and providing a basis of decision making through simulation. The numerical weather prediction through these approaches can help researcher in predict the weather especially in precipitation of landslide. The approach of PSO algorithm focused on analysis but the classical models may become insufficient because of the lack adaption. The Monte Carlo simulation brought a bit difficult in estimating risks in uncertain time series and get the uncertainty but still helped in determine the effective of the system. The prediction in data mining method through regression model determined test between techniques been used using multivariate model, and further helps in analyze the effect of bias correction and examine forecast periods by search in consider more data model and plans to use extraction algorithm which can be more accurate results. 4. Conclusion This review paper produced an evaluation of methods in uncertain time series. The previous analysis and experiments presented the overview in implementation of methods on certain time series data in order to extract knowledge for future work. There were methods had been used on uncertain time series data but still brought limitation in predicts processes. Uncertain time series was important in helping to build up the prediction. The comparison of analysis been done in order to determine the yield of predicting uncertain time series through PSO algorithm, Monte Carlo simulation, and regression model in data mining method. The methods had been reviewed brought benefits to weather area through predicting but still future action must be taken in obtain the accuracy of predicting uncertain time series and improve the quality of yield in focus area.
References [1] S. P. Lykoudis, A. a. Argiriou, and E. Dotsika, “Spatially interpolated time series of δ18Ο in Eastern Mediterranean precipitation,” Global and Planetary Change, Apr. 2010, vol. 71, no. 3–4, p. 150–159. [2] H. L. Cloke and F. Pappenberger, “Ensemble flood forecasting: A review,” Journal of Hydrology, 2009, vol. 375, no. 3–4, p. 613–626, Sep. [3] V. Jankovic, “Science Migrations: Mesoscale Weather Prediction from Belgrade to Washington, 1970–2000,” Social Studies of Science, Feb. 2004, vol. 34, no. 1, p. 45–75. [4] M. Dallachiesa, “Similarity Matching for Uncertain Time Series : Analytical and Experimental Comparison,” 2011, p. 8–15. [5] M. Dallachiesa, B. Nushi, K. Mirylenka, and T. Palpanas, “Uncertain Time-Series Similarity : Return to the Basics,” in Proceedings of the VLDB Endowment, 2012, vol. 5, no. 11, p. 1662–1673. [6] T. H. Morgan, “The relation between normal and abnormal development of the embryo of the frog,” Archiv für Entwicklungsmechanik der Organismen, Jul.1905, vol. 19, no. 4, pp. 581–587. [7] W. F. Hume, “The Effects of Secular Oscillation in Egypt during the Cretaceous and Eocene Periods,” Quarterly Journal of the Geological Society, Feb. 1911, vol. 67, no. 1–4, pp. 118–148. [8] T. R.Karl and P. J.Young, “Recent heavy precipitation in the vicinity of the Great Salt Lake: just how unusual,” Journal of Climate and Applied Meteorology, 1985, vol. 25, pp. 353–363. [9] D. J. Gagne, A. McGovern, and M. Xue, “Machine learning enhancement of storm scale ensemble precipitation forecasts,” in Proceedings of the 2011 workshop on Knowledge discovery, modeling and simulation - KDMS ’11, 2011, p. 45. [10] Y. Zuo, G. Liu, X. Yue, W. Wang, and H. Wu, “Similarity Matching over Uncertain Time Series,” 2011 Seventh International Conference on Computational Intelligence and Security, Dec. 2011, p. 1357–1361. [11] C. La Russa and K. Andrews, “AC 2010-1093 : Managing A Digitization Project : Issues for State Agency Publications with Folded Maps,” American Society for Engineering Education, 2010. p. 1–6. [12] L. Haitao and Z. Xiaofu, “Precipitation Time Series Predicting of the Chaotic Characters Using Support Vector Machines,” in 2009 International Conference on Information Management, Innovation Management and Industrial Engineering, 2009, p. 407–410. [13] J. Aßfalg, H. Kriegel, P. Kr, and M. Renz, “Probabilistic Similarity Search for Uncertain Time Series,” in SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management, 2009, p. 435 – 443.
563
564
Nabilah Filzah Mohd Radzuan et al. / Procedia Technology 11 (2013) 557 – 564 [14] M. R. S. Angew, J. M. Murphy, D. M. H. Sexton, D. N. Barnett, G. S. Jones, M. J. Webb, M. Collins, and D. A. Stainforth, “Quantification of modelling uncertainties in a large ensemble of climate change simulations,” Nature Publishing Group, 2004, vol. 430, no. August 2004, p. 768–772,. [15] E. Hawkins and R. Sutton, “The Potential to Narrow Uncertainty in Regional Climate Predictions,” Bulletin of the American Meteorological Society, Aug. 2009, vol. 90, no. 8, p. 1095–1107. [16] C. Pennell and T. Reichler, “On the Effective Number of Climate Models,” Journal of Climate, May 2011, vol. 24, no. 9, p. 2358–2367. [17] J. C. Hargreaves, “Skill and uncertainty in climate models,” Wiley Interdisciplinary Reviews: Climate Change, Jul. 2010, vol. 1, no. 4, p. 556–564. [18] S. Esfandeh and M. Sedighizadeh, “Meteorological Data Study and Forecasting Using Particle Swarm Optimization Algorithm,” World Academy of Science, Engineering and Technology, 2011, vol. 59, p. 2117–2119. [19] P. Lionello, “The climate of the Venetian and North Adriatic region: Variability, trends and future change,” Physics and Chemistry of the Earth, Parts A/B/C, Jan. 2012, vol. 40–41, no. October 2008, p. 1–8. [20] E. Johannesen, R. B. Ingvaldsen, B. Bogstad, P. Dalpadado, E. Eriksen, H. Gjosaeter, T. Knutsen, M. Skern-Mauritzen, and J. E. Stiansen, “Changes in Barents Sea ecosystem state, 1970-2009: climate fluctuations, human impact, and trophic interactions,” ICES Journal of Marine Science, vol. 69, Mar. 2012, no. 5, p. 880–889. [21] C. Qing, Z. Xiaoli, and Z. Kun, “Research on Precipitation Prediction Based on Time Series Model,” in 2012 International Conference on Computer Distributed Control and Intelligent Environmental Monitoring, 2012, p. 568–571. [22] S. R. Sarangi and K. Murthy, “DUST : A Generalized Notion of Similarity between Uncertain Time Series Similarity of Uncertain Time Series,” IBM India Research Lab, 2010, vol. 1. [23] D. Harris, E. Foufoula-Georgiou, K. K. Droegemeier, and J. J. Levit, “Multiscale Statistical Properties of a High-Resolution Precipitation Forecast,” Journal of Hydrometeorology, Aug. 2001, vol. 2, no. 4, p. 406–418. [24] Z. Nan, S. Wang, X. Liang, T. E. Adams, W. Teng, Y. Liang, and S. Member, “Analysis of Spatial Similarities Between NEXRAD and NLDAS Precipitation Data Products,” IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2010, vol. 3, no. 3, p. 371–385. [25] A. Cioaca, V. Zavala, and E. Constantinescu, “Adjoint sensitivity analysis for numerical weather prediction: Application to Power Grid Optimization,” in Proceedings of the first international workshop on High performance computing, networking and analytics for the power grid - HiPCNA-PG ’11, 2011, p. 35. [26] T. Brijs, D. Karlis, and G. Wets, “Studying the effect of weather conditions on daily crash counts using a discrete time-series model.,” Accident; analysis and prevention, May 2008, vol. 40, no. 3, p. 1180–90. [27] A. J. Clark, W. a. Gallus, and T.-C. Chen, “Contributions of Mixed Physics versus Perturbed Initial/Lateral Boundary Conditions to Ensemble-Based Precipitation Forecast Skill,” Monthly Weather Review, Jun 2008, vol. 136, no. 6, p. 2140–2156. [28] C. S. Division, Weather Forecast-Based Optimization of Integrated Energy Systems. 2009, p. 1–39. [29] T. R.Karl and P. J.Young, “Recent heavy precipitation in the vicinity of the Great Salt Lake just how unusual,” American Meteorological Society, 1986, vol. 67, no. 1, p. 4–9. [30] J. J. Simpson, J. S. Berg, C. J. Koblinsky, G. L. Hufford, and B. Beckley, “The NVAP global water vapor data set: independent crosscomparison and multiyear variability,” Remote Sensing of Environment, Apr. 2001, vol. 76, no. 1, p. 112–129. [31] T. K. Hincks, W. P. Aspinall, P. J. Baxter, a. Searl, R. S. J. Sparks, and G. Woo, “Long term exposure to respirable volcanic ash on Montserrat: a time series simulation,” Bulletin of Volcanology, Nov. 2005, vol. 68, no. 3, p. 266–284