KSCE Journal of Civil Engineering (2012) 16(5):870-882 DOI 10.1007/s12205-012-1519-3

Water Engineering

www.springer.com/12205

Forecasting Performance of LS-SVM for Nonlinear Hydrological Time Series
Seok Hwan Hwang*, Dae Heon Ham**, and Joong Hoon Kim***
Received March 20, 2011 / Revised September 9, 2011 / Accepted October 3, 2011


Abstract
This paper presents a Least-Squares Support Vector Machine (LS-SVM) approach for forecasting nonlinear hydrological time series. LS-SVM is a machine-learning algorithm firmly based on statistical learning theory. The objective of this paper is to examine the feasibility of using LS-SVM for forecasting nonlinear hydrological time series by comparing it with a statistical method, Multiple Linear Regression (MLR), and a heuristic method, a Neural Network using Back-Propagation (NNBP). The performance of a prediction model also depends on the degree of linearity (or persistency) of the data, not only on the model itself. We therefore examine the prediction performance of the three models as a function of the degree of linearity, using daily water demand and daily dam inflow data. In the experimental results, LS-SVM showed forecasting accuracy and performance superior to those of MLR and NNBP, and it demonstrated better forecasting efficiency for nonlinear hydrological time series as measured by the Relative Correlation Coefficient (RCC), a relative measure of forecasting efficiency under different persistency.
Keywords: forecasting, forecasting performance, support vector machine

1. Introduction
Forecasting future events is difficult and complex due to the numerous unknown parameters involved in events and the interactions between those parameters. Furthermore, forecasting accuracy is hampered by noise in time series data and by increasing complexity over the time horizon. The objective of this paper is to examine the feasibility of using LS-SVM in hydrological time series forecasting by comparing it with a statistical method, MLR, and a heuristic method, NNBP, as these two are by far the most popular data-driven models and have been used extensively over the past couple of decades in hydrological time series forecasting. The foundations of Support Vector Machines (SVMs) were developed by Vapnik (1995), and they have gained popularity due to their many attractive features and promising empirical performance. The formulation embodies the Structural Risk Minimization (SRM) principle, which has been shown to be superior to the traditional Empirical Risk Minimization (ERM) principle employed by conventional neural networks. SRM minimizes an upper bound on the expected risk, as opposed to ERM, which minimizes the error on the training data. This difference equips SVM with a greater ability to generalize, which is the goal of statistical learning (Gunn, 1998).

The American Society of Civil Engineers Task Committee (ASCE Task Committee on the Application of Artificial Neural Networks in Hydrology, 2000a, 2000b) presented an alternative, data-driven model based on an SVM, applied in streamflow forecasting. Dibike et al. (2001) applied SVM to remotely sensed image classification and regression (rainfall-runoff modeling) problems; comparisons between SVM, an Artificial Neural Network (ANN), and a conceptual rainfall-runoff model at three catchments showed the superior performance of SVM. Liong and Sivapragasam (2002) also reported superior SVM performance compared to ANN performance in forecasting flood stages, where flood stages at a number of upstream locations were used to forecast flood stages at a downstream location. Asefa and Kemblowski (2002) used SVM to reproduce the behavior of the Monte-Carlo-based groundwater flow and transport models that are utilized in the design of initial groundwater contamination detection monitoring systems. Chau et al. (2005) employed a Genetic Algorithm-based Artificial Neural Network (ANN-GA) and an Adaptive-Network-based Fuzzy Inference System (ANFIS) for flood forecasting in a channel reach of the Yangtze River in China. When compared with an empirical linear regression model, both ANN-GA and ANFIS gave better accuracy than the linear regression model when cautious treatment was made to avoid overfitting.

*Senior Researcher, Water Resources Research Department, Korea Institute of Construction Technology, Ilsan, Goyang 411-412, Korea (E-mail: [email protected])
**Researcher, Water Resources Research Department, Korea Institute of Construction Technology, Ilsan, Goyang 411-412, Korea (E-mail: [email protected])
***Member, Professor, Dept. of Civil, Environmental, and Architectural Engineering, Korea University, Seoul 136-713, Korea (Corresponding Author, E-mail: [email protected])


The ANFIS model was found to be optimal, except that it entails a large number of parameters. The ANN-GA model is also good, but it requires a long computation time and additional modeling parameters. Lin et al. (2006) presented the SVM as a promising method for hydrological prediction; their SVM model was tested using long-term observations of monthly river flow discharges in the Manwan Hydropower Scheme, and SVM was demonstrated to be a potential candidate for long-term discharge prediction by comparing its performance with ARMA and ANN models. Wang et al. (2009) examined Autoregressive Moving-Average (ARMA) models, ANN approaches, ANFIS techniques, Genetic Programming (GP) models, and the SVM method using long-term observations of monthly river flow discharges; the results showed that ANFIS, GP, and SVM gave the best performance. Wu et al. (2009) discussed the accuracy of monthly streamflow forecasts when using data-driven modeling techniques on streamflow series. A Crisp Distributed Support Vectors Regression (CDSVR) model, ARMA, K-Nearest Neighbors (KNN), ANNs, and Crisp Distributed Artificial Neural Networks (CDANN) were used for monthly streamflow prediction. The results showed that models using preprocessed data performed better than models using the original data, and that a moving average improved the performance of ANN, CDANN, and CDSVR by adjusting the correlation relationship between the input components and the output of the models. It was also found that the performance of CDSVR deteriorated as the forecast horizon increased.

These studies report that hybrid heuristic models are superior to primitive models, or that SVM is superior to neural networks in prediction. However, hydrological time series such as discharge exhibit different degrees of nonlinearity in each data set. Thus, the performance of a prediction model also depends on the degree of linearity (or persistency) of the data, not only on the performance of the model itself. We therefore explicitly examine the prediction performance of the models as a function of the degree of linearity, using the Relative Correlation Coefficient (RCC), a relative measure of forecasting efficiency under different persistency.

The present paper reviews the basic ideas underlying statistical learning theory and LS-SVM, and its potential is demonstrated by applying the method to two practical hydrological time series forecasting problems with different persistency. The first application is the forecasting of daily water demand data, which have relatively strong linear relationships between the main input data sets and the output data set. The second application is the forecasting of daily mean dam inflow data, with relatively weak linear relationships between the input data sets and the output data set. The forecasting performance of LS-SVM is compared with that of a conventional statistical model (MLR) and a heuristic model (NNBP), in terms of the accuracy and efficiency of the nonlinear forecasting model. Moreover, in this study, RCC is used as a special measure of forecasting efficiency to compare the forecasting efficiency between models in proportion to the persistency of the data series.

RCC is an evaluation method for real-time forecasting models that measures the forecasting improvement rate against the unchanged (naïve) condition. Conventional evaluation measures, which are estimated by calculating the errors between measured and forecast data, can quantify forecasting accuracy but cannot clearly show the forecasting efficiency relative to the persistency of the data. In defining RCC, it is first necessary to specify the "persistency" of the data and the "efficiency" of forecasting. Persistency refers to the meteorological phenomenon by which weather remains relatively unchanged over short time intervals, thereby increasing the accuracy of forecasts. An important guide to the persistency of a time series is given by the series of quantities called the sample autocorrelation coefficients, which measure the correlation between observations at different times. Forecasting efficiency can then be evaluated by examining the extent to which an available model improves forecasting performance. In this sense, RCC measures the efficiency of a forecasting model in improving accuracy, defined as its accuracy relative to a forecast based on the naïve (no-knowledge) condition, that is, without any knowledge about the data other than the calibration-period data. Forecasting efficiency with respect to the naïve model is a measure of the extent to which using a forecasting model improves forecasting over simply persisting the last observation. This study shows that SVM gives better prediction performance than regression and neural network models for strongly nonlinear data series, and that model performance can be evaluated as a function of the degree of nonlinearity.

2. Methodology
2.1 Introduction of Support Vector Machine (SVM)
The SVM algorithm was first introduced by Vapnik and has been greatly developed since. SVM has attractive theoretical and practical properties, such as good generalization and convergence to an optimum solution, and its results often exceed those of other conventional learning machines. SVM is a particular instance of kernel machines, a large class of learning algorithms, and has been applied successfully in many fields such as bioinformatics, image recognition, and data fitting.

2.2 Least-Squares Support Vector Machines (LS-SVM)
LS-SVM is a powerful, nonlinear, black-box regression method that builds a nonlinear model in the so-called feature space, where the inputs have been transformed by means of a nonlinear mapping φ that is possibly infinite dimensional. The problem is converted to the dual space by means of Mercer's theorem and the use of a positive definite kernel, without explicitly computing the mapping φ. SVMs (Vapnik, 1998) and LS-SVM are characterized by primal-dual optimization formulations with the use of a positive definite kernel (Suykens and Vandewalle, 1999; Suykens et al., 2002), and their solutions follow from convex programs.


Standard SVM leads to the solution of convex quadratic programming problems, whereas LS-SVM leads to the solution of a set of linear equations. The LS-SVM formulation solves a linear system in dual space under a least-squares cost function, whereby the sparseness property can be obtained by, for example, sequentially pruning the support value spectrum or via a fixed-size subset selection approach. The LS-SVM training procedure involves the selection of a kernel parameter and the regularization parameter of the cost function, which can be done, for example, by cross-validation, Bayesian techniques, or other methods.

The LS-SVM regression framework can be formulated as follows. Given the data set {x_i, y_i}_{i=1}^{l}, with input vectors x_i ∈ ℜ^p and output values y_i ∈ ℜ, consider the regression model y_i = f(x_i) + e_i, where x_1, …, x_l are deterministic points, f: ℜ^p → ℜ is an unknown real-valued smooth function, and e_1, …, e_l are uncorrelated random errors with E[e_i] = 0, E[e_i²] = σ_e² < ∞. LS-SVMs have been used to estimate the nonlinear f of the form:

f(x_i) = w^T \varphi(x_i) + b    (1)

where φ(x_i): ℜ^p → ℜ^{n_h} denotes the potentially infinite (n_h = ∞) dimensional feature map. The cost function of the LS-SVM model (Suykens et al., 2002) in the primal space is given by:

\min_{w, b, e} P(w, e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{i=1}^{l} e_i^2,
\quad \text{s.t. } y_i = w^T \varphi(x_i) + b + e_i, \; i = 1, \ldots, l    (2)

The formulation includes a bias term, as in most standard SVM formulations, which is usually not the case in other methods. The relative importance between the smoothness of the solution and data fitting is governed by the scalar γ, referred to as the regularization constant. The optimization that is performed is known as a ridge regression (Golub and Van Loan, 1989). In order to solve the constrained optimization problem, a Lagrangian is constructed:

L(w, b, e; \alpha) = P(w, e) - \sum_{i=1}^{l} \alpha_i \{ w^T \varphi(x_i) + b + e_i - y_i \}    (3)

with α_i the Lagrange multipliers. The conditions for optimality are given by:

\frac{\partial L}{\partial w} = 0 \rightarrow w = \sum_{i=1}^{l} \alpha_i \varphi(x_i), \quad
\frac{\partial L}{\partial b} = 0 \rightarrow \sum_{i=1}^{l} \alpha_i = 0, \quad
\frac{\partial L}{\partial e_i} = 0 \rightarrow \alpha_i = \gamma e_i, \quad
\frac{\partial L}{\partial \alpha_i} = 0 \rightarrow y_i = w^T \varphi(x_i) + b + e_i, \; i = 1, \ldots, l    (4)

By applying the kernel trick K(x_i, x_j) = φ(x_i)^T φ(x_j) with a positive definite (Mercer) kernel K, the dual problem is given by the following set of linear equations:

\begin{bmatrix} 0 & 1_l^T \\ 1_l & \Omega + \gamma^{-1} I_l \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix} =
\begin{bmatrix} 0 \\ y \end{bmatrix}    (5)

where y = [y_1 … y_l]^T, 1_l = [1 … 1]^T, α = [α_1 … α_l]^T, and Ω ∈ ℜ^{l×l} with Ω_{ij} = K(x_i, x_j). The resulting LS-SVM model can be evaluated at a new point x* by:

\hat{f}(x^*) = \sum_{i=1}^{l} \alpha_i K(x_i, x^*) + b    (6)

2.3 Kernel Function
In Eq. (6), K(x_i, x_j) is defined as the kernel function. The value of the kernel is equal to the inner product of two vectors, x_i and x_j, in the feature space φ(x_i) and φ(x_j), that is, K(x_i, x_j) = φ(x_i)^T φ(x_j). The elegance of using the kernel function is that one can deal with feature spaces of arbitrary dimensionality without having to compute the map φ(x) explicitly. Any function satisfying Mercer's condition can be used as the kernel function. Table 1 lists typical kernel functions, where d is the degree of the polynomial kernel, τ a tuning parameter, and σ the bandwidth of the kernel. The kernel parameter should be carefully chosen, as it implicitly defines the structure of the high-dimensional feature space φ(x) and thus controls the complexity of the final solution. From the implementation perspective, training LS-SVM is equivalent to solving a linearly constrained QP with the number of variables twice that of the training data points. The sequential minimal optimization algorithm proposed by Scholkopf and Smola (2002) is reported to be very effective in training LS-SVM for solving regression problems.

Table 1. Typical Kernel Functions
Kernel name                   Formula
Linear                        K(x_i, x_j) = x_i^T x_j
Polynomial                    K(x_i, x_j) = (x_i^T x_j + τ)^d
Radial Basis Function (RBF)   K(x_i, x_j) = exp(−||x_i − x_j||² / σ²)

2.4 Feedforward Back-propagation Neural Network (NNBP)
The feedforward NNBP is a very popular neural network model. It does not have feedback connections, but errors are back-propagated during model training using a least mean squared error criterion. Many applications can be formulated with a feedforward NNBP, and the methodology is used as the model for most multi-layered neural networks. Errors in the output determine measures of hidden-layer output errors, which are used as a basis to adjust the connection weights between the pairs of layers. Recalculating the outputs is an iterative process that is carried out until the errors fall below a certain tolerance level. Learning rate parameters scale the adjustments to the weights, and a momentum parameter can also be used to scale the adjustments from a previous iteration and add them to the adjustments in the current iteration.
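To make the dual formulation of Eqs. (2)-(6) concrete, the following minimal Python sketch assembles and solves the LS-SVM linear system with an RBF kernel and evaluates the fitted model at new points. It is an illustrative implementation written for this text; the function names, synthetic data, and default γ and σ² values are assumptions for demonstration, not the authors' code.

import numpy as np

def rbf_kernel(A, B, sigma2):
    # K(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2), computed pairwise (Table 1)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma2)

def lssvm_fit(X, y, gamma=50.0, sigma2=200.0):
    # Solve the dual system of Eq. (5): [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T
    l = len(y)
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(l) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(X_train, alpha, b, X_new, sigma2=200.0):
    # Eq. (6): f(x*) = sum_i alpha_i K(x_i, x*) + b
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(80, 1))                      # synthetic inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)       # synthetic targets
    alpha, b = lssvm_fit(X, y, gamma=50.0, sigma2=1.0)
    print(lssvm_predict(X, alpha, b, np.array([[0.5]]), sigma2=1.0))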


2.5 Multiple Linear Regression (MLR)
MLR is a method used to model a linear relationship between explanatory variables and the response variable. MLR is based on least squares: the model is fitted such that the sum of squares of the differences between the observed and predicted values is minimized. An MLR model can be expressed as:

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i    (7)

where i = 1, …, l, y_i are the response variables, x_{i1}, x_{i2}, …, x_{ik} are the explanatory variables, β_0 is an intercept, β_1, β_2, …, β_k are slopes, and the random errors e_i are independent and identically distributed N(0, σ²). Given the data set {x_i, y_i}_{i=1}^{l}, x_i ∈ ℜ^p, y_i ∈ ℜ, the model can be rewritten in matrix notation as:

y = X\beta + e    (8)

where y = [y_1 y_2 … y_l]^T, the i-th row of X is [1 x_{i1} x_{i2} … x_{ik}], and β = [β_0 … β_k]^T.
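For comparison, a minimal sketch of the MLR fit of Eqs. (7)-(8) via ordinary least squares is shown below; the data and coefficient values are synthetic and purely illustrative.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))                                # explanatory variables x_i1..x_i3
y = 2.0 + x @ np.array([1.5, -0.8, 0.3]) + 0.1 * rng.standard_normal(200)

X = np.column_stack([np.ones(len(y)), x])                    # rows [1, x_i1, ..., x_ik]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)                 # [beta_0, ..., beta_k]
y_hat = X @ beta
print(beta, np.sqrt(np.mean((y - y_hat) ** 2)))              # coefficients and RMSE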

3. Model Applications
3.1 Descriptions for Sites and Data Sets
Case I: Daily water demand (WD) forecasting; The daily maximum temperature (MT), daily sunshine duration (SD), and WD for the city of Seoul over the five years from January 1992 to December 1996 were collected. The population of Seoul from 1992 to 1996 (Table 2) showed a slight, but insignificant, decrease. The mean population over those five years was 10,679,019, which accounts for about a quarter of the total national population. Data for the first four years were used for model training or parameter estimation, and the last year of data was used for validation. The variation of population in the service area was not considered, since the variation was small during the study period (the standard deviation is about 3.0% of the mean).

Table 2. Population of Seoul, 1992-1996
Year         1992        1993        1994        1995        1996        Mean
Population   10,969,862  10,925,464  10,798,700  10,231,217  10,469,852  10,679,019

Table 3 outlines the basic statistics of the input and target data sets for WD forecasting. In the case of the WD data, the standard deviation (about 2.2% of the mean) was much smaller than the mean, indicating that the relative variation of the data was much smaller than their full scale. The MT and SD data sets were similar to the WD results, although to differing degrees. Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable. The skewness of WD showed positive values, revealing many values below the mean, whereas MT and SD showed negative skewness values, suggesting many values above the mean. Kurtosis is a measure of the peakedness of the probability distribution of a real-valued random variable. The kurtosis of the WD data was positive, reflecting a sharp probability distribution shape.

Table 3. Basic Statistics of the Observed Data Sets
(a) Daily water demand
Data  Data set     Unit  Length  Mean     Std     Skew    Kurt   AC     Lag of AC
WD    Total        m3    1827    5029224  162142  0.682   4.68   0.855  1 day
WD    Training     m3    1462    5036568  172135  0.575   4.37   0.855  1 day
WD    Forecasting  m3    365     4999806  108947  0.975   3.72   0.840  1 day
MT    Total        °C    1827    17.12    10.25   -0.218  1.83   0.957  1 day
MT    Training     °C    1462    17.18    10.19   -0.230  1.84   0.957  1 day
MT    Forecasting  °C    365     16.88    10.52   -0.173  1.76   0.960  1 day
SD    Total        hour  1827    5.996    3.715   -0.326  1.85   0.290  1 day
SD    Training     hour  1462    6.015    3.729   -0.334  1.84   0.293  1 day
SD    Forecasting  hour  365     5.918    3.661   -0.298  1.87   0.279  1 day
(b) Daily mean inflow of dam
Df    Total        m3/s  6574    167.3    520.2   11.87   208.3  0.675  1 day
Df    Training     m3/s  6209    161.7    520.5   12.27   218.3  0.676  1 day
Df    Forecasting  m3/s  365     262.3    506.0   4.958   40.18  0.654  1 day
Rf    Total        mm    6574    3.271    11.38   7.210   78.64  0.291  1 day
Rf    Training     mm    6209    3.192    11.31   7.462   83.73  0.304  1 day
Rf    Forecasting  mm    365     4.614    12.55   4.024   20.86  0.097  1 day
Length: size of the data series; Mean: mean of the data series; Std: standard deviation; Skew: skewness; Kurt: kurtosis; AC: autocorrelation coefficient of the data series; Lag of AC: lag time used to calculate the AC.

Case II: Daily mean inflow (Df) forecasting of Chung-ju dam; The basin of Chung-ju dam is located in South Korea and has an area of 6,648 km². The annual mean rainfall is 1,197.6 mm and the annual mean discharge (inflow into the dam) is 154.5 m³/s.


All the rivers carry heavy runoff during the summer season (July to September). Df and Rf (daily rainfall) data of Chung-ju dam in Korea were collected for the 18 years from 1986 to 2003. The first 17 whole years of data, from January 1986 to December 2002, were used for model training or parameter estimation, and the last year of data (2003) was used for validating the performance of the forecasting models. Table 3 shows the basic statistics of the input and target data sets for the Df and Rf of Chung-ju dam. The differences in mean and standard deviation between these two data sets were quite large, and the standard deviation was greater than the mean, indicating that the Df of Chung-ju dam included some extreme values or outliers. The skewness of the Df and Rf of Chung-ju dam showed positive values, revealing many values below the mean, and the skewness of both data sets was large. The kurtosis of all data sets was positive and varied widely, illustrating the sharp shape of the probability distributions; moreover, since the kurtosis of the inputs was large, the two data sets contained many extreme values or outliers.

3.2 Correlation and Periodicity Analysis between Primary Factors
An introduction to correlation and periodicity analysis; Scatter plots are useful for determining the linearity of relationships. The Pearson product-moment correlation coefficient (i.e., the Pearson correlation coefficient, or simply the correlation coefficient) is the widely used statistical measure for summarizing the relationship between two variables. Lowess (or Loess; Locally Weighted Regression) curves describe the relationship between two variables without assuming linearity or normality of the residuals. An important guide to the persistency of a time series is given by the series of quantities called the Autocorrelation Coefficients (ACs), which measure the correlation between observations at different times. The set of ACs arranged as a function of separation in time is the Autocorrelation Function (ACF). As the observations in a time series are generally not independent, the ACF gives the correlation between x_i and x_{i-k} for increasing values of k at time steps i and i-k. The spectrum of a time series is the distribution of the variance of the series as a function of frequency, and the object of spectral analysis is to estimate and study the spectrum. The spectrum contains no new information beyond that contained in the Autocovariance Function (ACVF) and, in fact, can be computed mathematically by transformation of the ACVF. However, the spectrum and the ACVF present information about the variance of the time series from complementary viewpoints: the ACF summarizes information in the time domain, and the spectrum in the frequency domain.

Case I: WD forecasting; Daily water-use behavior can be affected by meteorological variables such as air temperature and sunshine duration. As shown in Fig. 1(a-1), the time-dependent water-use time series varies with daily temperature and with the annual seasons as affected by the weather; for example, the water-use rate is higher during summer. Fig. 1(a-1) shows that the time series for WD, MT, and SD had similar patterns of time-dependent variation.

Fig. 1. (1) Time Series Plots of Input Variables, (2) Correlation Coefficients between Input Variables (numerals in (2) are correlation coefficients and solid lines are lowess curves), (3) ACFs of Input Data, (4) Spectra of Input Data (vertical solid lines in (4) mark frequencies at (a) one-year and half-year periods, and (b) one-year, half-year, and one-third-year periods, from the left)


In particular, the correlation coefficients in Fig. 1(a-2) reveal a strong linear relationship between WD and MT. However, linear correlation alone is not proportional to forecasting performance for nonlinear data sets. Temperature is considered to be a primary factor affecting WD; in particular, high temperatures above 20°C show a significant correlation with WD. The lowess curves in Fig. 1(a-2) show that nonlinear relationships exist among the WD, MT, and SD variables. The ACFs in Fig. 1(a-3) show that WD, MT, and SD had periodic variations: the three factors varied within the year and exhibited annual variations. Furthermore, the spectrum analysis in Fig. 1(a-4) shows that WD and SD varied with one-year and six-month periods, while MT varied with a one-year period. There can be many input parameters for WD forecasting; however, exogenous variables other than WD and the temperature factors have little influence on short-term WD forecasting, and WD is not particularly sensitive to SD (for example, SD remains within the realm of noise in the data analysis of Fig. 1(a-1)). Therefore, exogenous variables other than WD and MT were not considered in this study.

Case II: Df forecasting of Chung-ju dam; The Df forecasting of Chung-ju dam used Rf data as an input. In general, the time-dependent inflow time series varies with antecedent rainfall. Fig. 1(b-1) shows that the time series for Df and Rf had similar patterns of time-dependent variation. More particularly, the correlation coefficient in Fig. 1(b-2) showed a weak linear relationship of 0.44 between the Df and Rf data. This indicates that a strong nonlinear relationship exists between the Df and Rf data, because Rf is clearly the primary factor affecting Df in terms of the physical rainfall-runoff process. The ACFs in Fig. 1(b-3) show that both Df and Rf exhibited some periodic patterns with clear annual variations. Furthermore, the spectrum analysis in Fig. 1(b-4) clearly shows that Df and Rf varied over six-month (half-year) and four-month (one-third-year) periods. Many input parameters can influence Df; however, exogenous variables other than Rf had little influence on short-term Df forecasting. Therefore, exogenous factors other than Df and antecedent Rf were not considered in this study.
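The persistency and periodicity diagnostics of this section (the lag-k sample autocorrelation coefficients and the spectrum) can be sketched as follows; the series and helper names are illustrative assumptions, not the analysis code used for Fig. 1.

import numpy as np

def acf(x, max_lag):
    # r_k = sum_{i=k+1}^{l} (x_i - mean)(x_{i-k} - mean) / sum_i (x_i - mean)^2
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:len(x) - k]) / var for k in range(max_lag + 1)])

def periodogram(x):
    # Raw periodogram: distribution of variance over frequency (cycles per day here)
    x = np.asarray(x, dtype=float) - np.mean(x)
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    return freqs, spec

if __name__ == "__main__":
    t = np.arange(5 * 365)
    x = 10 * np.sin(2 * np.pi * t / 365) + np.random.default_rng(0).normal(0, 2, t.size)
    r = acf(x, max_lag=5)
    f, s = periodogram(x)
    print("lag-1 autocorrelation:", round(r[1], 3))
    print("dominant period (days):", round(1.0 / f[1 + np.argmax(s[1:])], 1))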

3.3 Model Application
Case I: WD forecasting; A hydrologist may prefer to implement a simpler data-driven model to identify a direct mapping between the inputs and outputs without any detailed consideration of the internal structure of the physical processes, where the main concern is to create accurate forecasts at specific locations under well-rehearsed Rf scenarios and antecedent conditions (Dibike, 2001). Many methods can be used to reconstruct state space; the time delay coordinate method is currently the most popular choice and was applied in this study. WD and MT data were chosen as inputs, and the results of forecasting by LS-SVM were compared to those of NNBP and MLR. Data from January 1992 to December 1995 were used for training the LS-SVM and NNBP models and for parameter estimation of the MLR model, and the 1996 data were used to validate the performance of the developed models. In this study, the Gaussian radial basis function was used as the kernel function of LS-SVM (Pelckmans et al., 2003). LS-SVM uses two parameters to define the nonlinear function, namely γ and σ². As noted earlier, γ is the regularization constant and σ² is the bandwidth of the radial basis kernel function (RBF). Improper selection of these two parameters can cause over-fitting or under-fitting. Since there are few general guidelines for determining the parameters of LS-SVM, this study varied the parameters to select the values giving the best forecasting performance; that is, the adopted values were chosen through dozens of trial-and-error experiments. The generalized error was minimum for σ² = 200 and γ = 50 for LS-SVM. The parameter values presented in this paper may be considered appropriate, since the sensitivities of the SVM parameters are relatively small, although the appropriate parameter level may differ according to the data. The activation function of the network was a sigmoid function for NNBP. The design of an appropriate WD forecasting model requires the use of data that reflect the physical variables affecting WD. Following synthetic consideration of all the factors and the results of the correlation and periodicity analyses in Fig. 1(a) and Table 4, the input data set was composed of the previous WD and MT. The input data can be expressed as:

WD_t = f\left(w_t, WD_{t-1}, WD_{t-2}, WD_{t-3}, \tfrac{1}{3}\sum_{i=0}^{2} WD_{t-3+i}, MT_{t-1}, \tfrac{1}{3}\sum_{i=0}^{2} MT_{t-3+i}\right), \quad t = 4, \ldots, l    (9)

where f is the linear or nonlinear mapping function used to represent the relationships between the input data and the output data, and WD, w, and MT are the water demand (m³), day type, and maximum temperature (°C), respectively. The day patterns were quantified as 0.1 for weekdays and 0.9 for Sundays; the scale of the numerical values is not important here, because day patterns are logical values such as True or False. As a general rule, day patterns would be set to 0 and 1; this setting, however, can cause trouble in the training process of neural networks. The length of the data is represented by l. In this study, the input data were finally composed of the previous three days' WD (WD_{t-1}, WD_{t-2}, WD_{t-3}), the previous day's MT (MT_{t-1}), and the present day's day type (w_t), which were used to forecast the present day's WD (WD_t). Furthermore, the means of the previous three days' WD ((1/3)Σ_{i=0}^{2} WD_{t-3+i}) and MT ((1/3)Σ_{i=0}^{2} MT_{t-3+i}) were used as input data to represent the daily short-term effects of the temperature cycles in Korea (Kim, 1994).

Table 4. Correlation Coefficients between the Data Sets
Data sets        Correlation coefficients (CC)
(a) WD
(1) WDt vs.      WDt     WDt-1   WDt-2   WDt-3   WDt-4
                 1.000   0.855   0.791   0.771   0.759
(2) WDt vs.      MTt     MTt-1   MTt-2   MTt-3   MTt-4
                 0.625   0.597   0.577   0.572   0.569
(3) WDt vs.      SDt     SDt-1   SDt-2   SDt-3   SDt-4
                 0.137   0.029   -0.001  0.012   0.024
(b) Df
(4) Dft vs.      Dft     Dft-1   Dft-2   Dft-3   Dft-4
                 1.000   0.676   0.387   0.296   0.263
(5) Dft vs.      Rft     Rft-1   Rft-2   Rft-3   Rft-4
                 0.447   0.122   0.084   0.091   0.099
(6) Dft vs.      Rft     Rft-1   Rft-2   Rft-3   Σ(i=0..2) Rft-3+i
                 0.447   0.122   0.084   0.091   0.143
(7) Rft vs.      Rft     Rft-1   Rft-2   Rft-3   Rft-4
                 1.000   0.286   0.080   0.050   0.078

As shown in Table 4, the correlation coefficients between the present day's WD (WD_t) and the previous days' WD (WD_{t-1}~WD_{t-4}) decreased slightly, but the rate of decrease of the correlation coefficients was almost constant after the previous three days (WD_{t-3}~WD_{t-4}). In the case of MT (MT_t), the correlation coefficients between the present WD (WD_t) and MT (MT_t~MT_{t-4}) were almost constant after the previous one day (MT_{t-1}~MT_{t-4}). SD (SD_t) was excluded as an input data set, because the correlation coefficients between the present WD (WD_t) and SD (SD_t~SD_{t-4}) were close to zero after the previous one day (SD_{t-1}~SD_{t-4}).

Case II: Df forecasting of Chung-ju dam; The time delay coordinate method was also applied in this case, and the Df and Rf data were chosen as inputs. The results of forecasting by LS-SVM were compared with those of NNBP and MLR. The data from January 1986 to December 2002 were used for model training or parameter estimation, and the 2003 data were used for validating the performance of the developed models. The generalized error was minimum for σ² = 15 and γ = 3 for LS-SVM. The activation function of the network was a sigmoid function for NNBP. To design an appropriate Df forecasting model, all the factors reflecting the physical variables that cause Df to change were synthetically analyzed, and the correlation and periodicity analysis results are shown in Fig. 1(b) and Table 4. The Df forecasting model can be considered as a function of the lag vectors of Df and Rf and the summation of Rf (ΣRf):

Df_t = f\left(Df_{t-1}, Df_{t-2}, Df_{t-3}, Rf_{t-1}, Rf_{t-2}, Rf_{t-3}, \sum_{i=0}^{2} Rf_{t-3+i}\right), \quad t = 4, \ldots, l    (10)

where f is the linear or nonlinear mapping function used to represent the relationships between the input data and the output data, and l is the full length of the data. As shown in Table 4, in the case of Df (Df_t), the correlation coefficients between the present day's Df (Df_t) and the previous days' Df (Df_{t-1}~Df_{t-4}) decreased rapidly, but became almost constant after the previous three days (Df_{t-3}~Df_{t-4}). In the case of Rf (Rf_t), the correlation coefficients between the present day's Rf (Rf_t) and the previous days' Rf (Rf_{t-1}~Rf_{t-4}) were almost constant after the previous two days (Rf_{t-2}~Rf_{t-4}). In Table 4, the correlation coefficients between the present day's Df (Df_t) and the present and previous days' Rf (Rf_t~Rf_{t-4}) were also almost constant after the previous two days (Rf_{t-2}~Rf_{t-4}). However, the previous three days' Rf data (Rf_{t-1}~Rf_{t-3}) were selected as inputs by considering the time of concentration of Rf. As shown in Table 4, the existence of a superposition effect resulting from differences in the time of concentration is evinced by the increase in the correlation coefficients from 0.09 (Df_t vs. Rf_{t-3}) to 0.14 (Df_t vs. ΣRf_{t-1,t-2,t-3}) between Df and Rf. Finally, the input data were composed of the antecedent three days' Df and Rf (Df_{t-1}, Df_{t-2}, Df_{t-3}, Rf_{t-1}, Rf_{t-2}, Rf_{t-3}) to forecast the next day's Df (Df_t). Furthermore, the sum of the antecedent three days' Rf (Σ_{i=0}^{2} Rf_{t-3+i}) was used as input data to represent the superposition effect resulting from the differences of lag time in the rainfall-runoff process.
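The lagged-input construction behind Eqs. (9) and (10) can be sketched as follows for the Df case; the array names and synthetic data are illustrative assumptions.

import numpy as np

def build_lagged_inputs(df, rf, lags=3):
    # Feature rows [Df_{t-1..t-3}, Rf_{t-1..t-3}, sum Rf_{t-3..t-1}] with target Df_t, t = 4..l
    df, rf = np.asarray(df, float), np.asarray(rf, float)
    rows, targets = [], []
    for t in range(lags, len(df)):
        df_lags = df[t - lags:t][::-1]          # Df_{t-1}, Df_{t-2}, Df_{t-3}
        rf_lags = rf[t - lags:t][::-1]          # Rf_{t-1}, Rf_{t-2}, Rf_{t-3}
        rows.append(np.concatenate([df_lags, rf_lags, [rf_lags.sum()]]))
        targets.append(df[t])
    return np.array(rows), np.array(targets)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    rf = rng.gamma(0.3, 10.0, size=400)                          # synthetic daily rainfall
    df = 50 + np.convolve(rf, [0.5, 0.3, 0.2], mode="same")      # synthetic inflow response
    X, y = build_lagged_inputs(df, rf)
    print(X.shape, y.shape)                                      # (397, 7) (397,)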


3.4 Evaluation Measures for Forecasting Performance
Quantitative measures; The test criteria for the performance measures of the forecasting methods used in this study are the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), the Relative RMSE (RRMSE), and the Mean Absolute Percentage Error (MAPE). These performance measures are outlined in Table 5.

Table 5. Evaluation Measures for Forecasting Performances
Measure  Name                              Formula
MAE      Mean Absolute Error               mean(|e_i|), for i = 1…l
RMSE     Root Mean Square Error            sqrt(mean(e_i²)), for i = 1…l
RRMSE    Relative Root Mean Square Error   F²/F_b², where F_b² = mean(y_i²), for i = 1…l
MAPE     Mean Absolute Percentage Error    mean(|p_i|), for i = 1…l
NSC      Nash-Sutcliffe R²                 1 − F²/F_b², where F_b² = mean{(y_i − ȳ)²}, for i = 1…l
PI       Persistency Index                 1 − F²/F_b², where F_b² = mean{(y_i − y_{i-1})²}, for i = 2…l
RCC      Relative Correlation Coefficient  r_yŷ / r_yy_b
e_i = y_i − ŷ_i, for i = 1…l, where ŷ_i and y_i denote the forecast and observed data at time i, respectively; F² = mean(e_i²); p_i = 100 × e_i / y_i, for i = 1…l; ȳ denotes the mean of the observed data; r represents Pearson's correlation coefficient, and the subscripts ŷ, y, and y_b indicate the forecast, observed, and benchmark data sets, respectively; the benchmark data set is usually the observed data set lagged by k time steps.

The magnitude of MAE for forecasting at a given lead time (or lag time) is a measure of the degree of bias. The RMSE (or RRMSE) is based on the average of the squared forecasting errors; therefore, a few large errors can cause a large RMSE value, even though most of the forecast error magnitudes are within acceptable limits. Despite this disadvantage, RMSE is useful as an unbiased estimate of the variance of the random component. Therefore, in a forecasting comparison between two models, a smaller RMSE indicates better forecasting accuracy; as a general rule, a better forecasting model is one that yields unbiased forecasts with a smaller RMSE value. RRMSE is equal to zero if the forecasting model is perfect and to 1 if the forecast data are equal to the mean of the estimated data. MAPE is very useful not only for forecast comparison purposes but also as an absolute measure of forecasting accuracy. Two different forecasting methods can sometimes possess comparable performance levels with respect to MAE and RMSE. Regardless of the magnitude or variability of the time series, a smaller MAPE means a better, more accurate forecast; as a general rule, an accurate forecasting system must always yield low MAPE values.

Relative measures; RCC is introduced to represent the relative efficiency of nonlinear forecasting accuracy. RCC is proposed as a simple alternative to the persistency index, PI, and the Nash-Sutcliffe coefficient of efficiency, NSC. It is defined as r_yŷ / r_yy_b, where r represents Pearson's correlation coefficient, and the subscripts ŷ, y, and y_b indicate the forecast data set, the observed data set, and the benchmark data set, respectively; the benchmark is commonly the observed data set lagged by k time steps:

RCC = \frac{\text{degree of agreement}}{\text{degree of persistence}}
    = \frac{s_{y\hat{y}} / \sqrt{s_{yy}\, s_{\hat{y}\hat{y}}}}{s_{yy_b} / \sqrt{s^{*}_{yy}\, s_{y_b y_b}}}
    = \frac{r_{y\hat{y}}}{r_{yy_b}}    (11)

where r_yŷ ≥ 0 and r_yy_b ≥ 0. For l pairs of observed and forecast variables y_i and ŷ_i, i = 1…l, the variances and covariance of y_i and ŷ_i are given by:

s_{yy} = \sum_{i=1}^{l}(y_i - \bar{y})^2, \quad
s_{\hat{y}\hat{y}} = \sum_{i=1}^{l}(\hat{y}_i - \bar{\hat{y}})^2, \quad
s_{y\hat{y}} = \sum_{i=1}^{l}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})    (12)

where \bar{y} = \frac{1}{l}\sum_{i=1}^{l} y_i and \bar{\hat{y}} = \frac{1}{l}\sum_{i=1}^{l} \hat{y}_i are the overall means. For l observed variables y_i, i = 1…l, the k-order AC is the simple correlation coefficient of the first l−k observations, y_{i-k}, i = k+1…l, with the next l−k observations, y_i, i = k+1…l, where k = 1…l−1. The variances and covariance of y_i and y_{i-k} are given by:

s^{*}_{yy} = \sum_{i=1+k}^{l}(y_i - \bar{y}^{*})^2, \quad
s_{y_b y_b} = \sum_{i=1+k}^{l}(y_{i-k} - \bar{y}_b)^2, \quad
s_{yy_b} = \sum_{i=1+k}^{l}(y_i - \bar{y}^{*})(y_{i-k} - \bar{y}_b)    (13)

where \bar{y}^{*} = \frac{1}{l-k}\sum_{i=1+k}^{l} y_i and \bar{y}_b = \frac{1}{l-k}\sum_{i=1+k}^{l} y_{i-k} are the corresponding means. For reasonably large l and small k, the difference between the sub-period means ȳ_b and ȳ* can be ignored. The coefficient r_yŷ can also be expressed as the ratio between the covariance and the product of the standard deviations of the observed and forecast values; it therefore estimates the combined dispersion against the single dispersions of the observed and forecast series. The coefficient r_yy_b likewise estimates the combined dispersion against the single dispersions of the observed and lagged observed series. As previously mentioned, in the case of a high r_yy_b (i.e., strong persistency), the model's forecasting properties can be dramatically improved by a simple linear adjustment of the observed value, without using complex nonlinear forecasting models. For this reason, the value of r_yy_b is very important in real-time forecasting.

Generally, when k-step-ahead forecasting is performed with a given forecasting model, the forecast accuracy is estimated from the differences (errors) between the forecast data and the observed data over the same time range. For example, in k-step-ahead forecasting using the observed data, if the correlation between the present data (used as input) and the data k steps ahead (used as target) is very high (i.e., if there is a strong k-step lagged autocorrelation of the target data), then even though the errors between the k-step-ahead forecasts and the k-step-ahead observations are very small, the forecasting efficiency of the model is relatively poor. From this perspective, RCC is a good evaluation method for quantifying the performance of a forecast model by comparing the forecasting accuracy against the persistency. An RCC value greater than 100% represents an improved model. An RCC value equal to 100% indicates that the model is no better than a model in which the forecast is the lagged-k observed record, indicating no change in predictability. An RCC value smaller than 100% indicates that the model performs worse than a model in which the forecast is the lagged-k observed record. In terms of its aim as a test statistic, RCC is similar to PI:

PI\ (\text{Persistence Index}) = 1 - \frac{F^2}{F_b^2}    (14)

where F² = mean{(y_i − ŷ_i)²} and F_b² = mean{(y_i − y_{i-1})²}, for i = 2…l. However, for lead-time model forecasting assessments, a more meaningful value is required, because testing relationships in terms of variations about the mean of the observed series is neither stringent nor rigorous, producing results that are difficult to interpret (Anctil et al., 2004). Furthermore, PI has no absolute standard by which to estimate the magnitude of accuracy or efficiency (the relative improvement of accuracy) from the value of PI itself. In other words, PI is a relative value and is therefore not directly comparable across different data sets. From this point of view, RCC is more appropriate than PI for ensuring that the forecasting performance is better than persistency. Table 5 summarizes the evaluation measures used in this study.
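A minimal sketch of the evaluation measures in Table 5 and Eqs. (11)-(14), including RCC as the ratio of the forecast-observed correlation to the lag-k autocorrelation, is given below; the function and variable names are illustrative assumptions, not the authors' implementation.

import numpy as np

def pearson_r(a, b):
    return np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1]

def metrics(obs, fc, k=1):
    obs, fc = np.asarray(obs, float), np.asarray(fc, float)
    e = obs - fc
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    mape = 100.0 * np.mean(np.abs(e / obs))
    nsc = 1.0 - np.mean(e ** 2) / np.mean((obs - obs.mean()) ** 2)
    pi = 1.0 - np.mean(e[k:] ** 2) / np.mean((obs[k:] - obs[:-k]) ** 2)   # Eq. (14)
    cc = pearson_r(obs, fc)                      # forecasting accuracy, r_yyhat
    ac = pearson_r(obs[k:], obs[:-k])            # lag-k persistency, r_yyb
    rcc = 100.0 * cc / ac                        # Eq. (11), expressed in percent
    return dict(MAE=mae, RMSE=rmse, MAPE=mape, NSC=nsc, PI=pi, CC=cc, RCC=rcc)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    obs = 100 + np.cumsum(rng.normal(0, 5, 365))     # synthetic observed series
    fc = obs + rng.normal(0, 8, 365)                 # synthetic forecasts
    print({key: round(val, 3) for key, val in metrics(obs, fc).items()})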

4. Analysis and Results
4.1 Results of the Forecasting Performance
Figure 2 and Table 6 compare the forecasting performance of the three models with the observed and forecast WD. LS-SVM showed excellent performance for WD forecasting. The performance of the forecasts was evaluated by the various goodness-of-fit measures mentioned previously. The results of the validation test of the forecasting models, shown in Table 6, clearly indicate the greater accuracy of the LS-SVM forecast compared to the NNBP and MLR models in the testing period. The test criteria of MAE, RMSE, RRMSE, and MAPE were calculated to measure the forecasting performance, and the performance measures of LS-SVM showed lower errors than those of NNBP and MLR. MAPE was especially useful for comparing the absolute measure of forecasting accuracy, since it indicates the forecasting error ratio relative to the scale of the original data: the MAPE of LS-SVM, at 0.843%, was lower than the 0.882% of NNBP and the 0.904% of MLR.


Fig. 2. Forecasting Results of LS-SVM, NNBP and MLR for WD

Table 6. Forecasting Performances of LS-SVM, NNBP and MLR for WD
Model    MAE      RMSE     RRMSE  MAPE (%)  CC     RCC (%)
LS-SVM   42223.6  56259.7  0.011  0.843     0.859  102.0
NNBP     44140.9  58579.7  0.012  0.882     0.845  100.4
MLR      45316.6  59684.1  0.012  0.904     0.844  100.3

These comparisons of the performance measures demonstrate the superior forecasting accuracy of LS-SVM compared to that of NNBP and MLR. However, the MAPEs of the three models were all small, indicating that the variation of the forecasting errors was small compared to the full scale of the original data set. In addition, the Pearson correlation coefficient (CC or r) between the forecast and observed data was computed to measure and compare the forecasting accuracy of each model. These CC comparisons showed that the forecasting accuracy of LS-SVM, at 0.859, was higher than that of NNBP and MLR, at 0.845 and 0.844, respectively. The RCC values of the three models were computed to estimate and compare the forecasting improvement rate by comparing the forecasting accuracy (CC) to the persistency (AC). The RCC values of LS-SVM, NNBP, and MLR were 102.0%, 100.4%, and 100.3%, respectively; the higher RCC of LS-SVM indicates that its forecasting efficiency was superior to that of NNBP and MLR. However, the forecasting efficiencies of LS-SVM, NNBP, and MLR for the WD data were not good overall. The NNBP and MLR models, in particular, were almost the same as the unchanged condition, with forecasting performances improved over the 100% RCC value of the naïve model by only 0.4% and 0.3%, respectively. These results are mainly attributed to the strong persistency of the WD data.

Figure 3 shows a plot of observed versus forecast data to compare the performance of the three models with the observed and forecast Df at Chung-ju dam. LS-SVM showed excellent performance for daily rainfall-runoff forecasting and comparatively good results with respect to peak flow matching. The results of the validation test shown in Table 7 clearly indicate that the LS-SVM forecast was more closely aligned with the actual values than the NNBP and MLR models in the testing period, because the forecasting errors of the LS-SVM model were correspondingly smaller than those of the other models. The test criteria of MAE, RMSE, RRMSE, and MAPE were calculated to measure the forecasting performance, and the performance measures (forecasting errors) of LS-SVM were much lower than those of NNBP and MLR. The MAPE criterion was computed to estimate the absolute measure of forecasting accuracy: the MAPE of LS-SVM was significantly lower, at 35.9%, than the 99.7% of NNBP and the 104.1% of MLR, confirming that the variance of the forecasting errors of LS-SVM was smaller than that of NNBP and MLR relative to the scale of the original data set. This result can also be interpreted to indicate that Df forecasting using LS-SVM in high-value regions is much more accurate than with NNBP and MLR. The CC comparison showed that the forecasting accuracy of LS-SVM, at 0.878, was higher than that of NNBP and MLR, at 0.834 and 0.824, respectively. Furthermore, to estimate and compare the forecasting efficiency (forecasting improvement rate) by comparing the forecasting accuracy against the persistency, the RCC values of LS-SVM, NNBP, and MLR were computed as 134.4%, 127.6%, and 126.1%, equating to improvements over the naïve model of 34.4%, 27.6%, and 26.1%, respectively. This confirms the superior forecasting efficiency of LS-SVM compared to that of NNBP and MLR. Finally, this result indicates that the Df data had a relatively strong nonlinear property, and that the forecasting performance of LS-SVM for nonlinear data was better than that of NNBP and MLR.


Fig. 3. Forecasting Results of LS-SVM, NNBP and MLR for Df

Table 7. Forecasting Performances of LS-SVM, NNBP and MLR for Df
Model    MAE    RMSE   RRMSE  MAPE (%)  CC     RCC (%)
LS-SVM   74.6   254.0  0.444  35.9      0.878  134.4
NNBP     106.3  281.2  0.492  99.7      0.834  127.6
MLR      113.4  289.2  0.506  104.1     0.824  126.1

4.2 Analysis for Randomness of Residuals
A general time series model, linear regression in particular, can be written as:

y_i = f(x_i) + e_i, \quad i = 1, \ldots, l    (15)

where f(x_i) is the fitted part that includes any predictable pattern in the series, and e_i is the residual (or error) term, which cannot be modeled any further and is assumed to have zero mean and constant variance and to be probabilistically independent. In this study, the randomness of the residuals was tested to confirm that f(x_i) is a suitable model for the data under these assumptions. The first method to check the randomness of residuals is simply to plot the residuals and inspect whether any periodic pattern or trend emerges. The second method is to plot the residuals against their rank or the equivalent quantiles from a reference distribution (in this study, the normal distribution). The third method is to conduct a statistical test for randomness. In this study, the second and third methods were used. For the second method, normal Q-Q plots were used; the Q-Q plot compares the quantiles of two variables, and if the variables come from the same type of distribution, the Q-Q plot is a straight line. For the third method, the following tests were used at the 95% Confidence Interval (CI): the Run, Turning point, Daniel, Mann-Kendall, and linear regression tests.
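As one concrete example of the third method, the following sketch implements a textbook two-sided runs (Wald-Wolfowitz) test on the residual signs at the 95% CI; this is a generic formulation assumed for illustration and not necessarily the exact variant used in this study.

import numpy as np

def runs_test_z(residuals):
    # Dichotomize about the median, count runs, and standardize against the
    # null expectation E[R] = 2*n1*n2/(n1+n2) + 1 with the usual variance.
    r = np.asarray(residuals, float)
    signs = r > np.median(r)
    n1, n2 = signs.sum(), (~signs).sum()
    runs = 1 + np.count_nonzero(signs[1:] != signs[:-1])
    mean_runs = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    var_runs = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1.0)))
    return (runs - mean_runs) / np.sqrt(var_runs)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    z = runs_test_z(rng.normal(size=365))     # white-noise-like residuals
    print(round(z, 3), "reject H0 at 95% CI" if abs(z) > 1.96 else "accept H0")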

Fig. 4. Normal Q-Q Plots of Residuals for (a) LS-SVM, (b) NNBP, and (c) MLR (WD)

Table 8. Test for Randomness of the Residuals for WD at 95% Confidence Interval
                         LS-SVM           NNBP             MLR
Test                     z-value  H0      z-value  H0      z-value  H0
Run test                 1.419    Accept  2.681    Reject  0.368    Accept
Turning points test      0.083    Accept  1.412    Accept  0.166    Accept
Daniel test              1.156    Accept  2.304    Reject  0.382    Accept
Mann-Kendall test        1.236    Accept  2.369    Reject  0.382    Accept
Linear regression test   1.264    Accept  2.335    Reject  0.447    Accept
H0: null hypothesis. H0 of Run test and Turning points test: no trend and independent. H0 of Daniel test, Mann-Kendall test, and linear regression test: no trend.

Figure 4 shows the distribution of the residuals for WD. The normal Q-Q plots of the residuals show that the residuals matched the normal Q-Q lines well, except for some extreme values, and that the histograms of the residuals were distributed symmetrically. In the results of the statistical tests for the randomness of the residuals shown in Table 8, the residuals between the observed and forecast WD from LS-SVM and MLR exhibited no trend and were independent at the 95% CI. However, the residuals of the forecasts from NNBP showed trends under the Daniel, Mann-Kendall, and linear regression tests at the 95% CI.


These normal Q-Q plots and the results of the statistical randomness tests indicate that the residuals from LS-SVM and MLR, but not from NNBP, exhibited characteristics similar to white noise, that is, a series of uncorrelated random variables with zero mean and constant variance. From these results of the residual analysis for checking the randomness assumptions, we conclude that the forecast WD generated by LS-SVM and MLR, but not by NNBP, presented a statistically good fit to the observed WD.

Figure 5 shows the distribution of the residuals for Df. The normal Q-Q plots of the residuals show that the residuals matched the normal Q-Q lines well, except for some extreme positive values, and the histograms of the residuals were distributed symmetrically, especially for LS-SVM. The results in Table 9 show that the residuals between the observed and forecast Df from LS-SVM exhibited no trend and were independent at the 95% CI. However, the residuals from NNBP showed a trend for the Run test, and those from MLR showed trends for the Run test and the Turning points test at the 95% CI. From these normal Q-Q plots and the results of the statistical tests for the residuals, only the residuals from LS-SVM showed the characteristics of white noise. From these results of the residual analysis for Df, the forecast Df from LS-SVM, but not from NNBP and MLR, presented a statistically good fit to the observed Df.

4.3 Assessment of Forecasting Efficiency
This section presents a sensitivity analysis of RCC and PI with respect to data persistency in order to assess how well they measure forecasting improvement. They are similar methodologies in that they relate performance to persistency, even though they measure quite different quantities and have different structures.

Although two given data sets may show the same forecast results, their forecasting performances should be interpreted differently if the strengths of the persistency of the two data sets are very different; that is, accurate forecasting is generally quite simple and easy when a data set has strong persistency (or strong linearity). From this perspective, traditional validation test methods, which estimate the forecasting accuracy (such as MAE) by comparing forecast and observed data, have some limitations. Therefore, complementary validation test methods, such as PI, were introduced. However, because PI is formulated in terms of squared errors, PI values vary sensitively when the persistency is weak and insensitively when the persistency is strong. For this reason, forecasting efficiency cannot be estimated in consistent amounts; PI can act as a standard of judgment for the improvement of forecasting accuracy, but it cannot serve as an intuitive standard for the degree of improvement. RCC, in contrast, is a very useful measure for estimating the forecasting improvement quantitatively regardless of the persistency of the original data set.

A sensitivity analysis of evaluation measures such as PI, RCC, and NSC, which are the typical relative measures for assessing the performance of real-time forecasting models, can be performed using the WD and Df data. The forecasting improvement rates (RCC) of LS-SVM, NNBP, and MLR for the data sets with different persistency were assessed by comparing the CCs between the forecast data and the observed target data with the ACs of the observed target data. As shown in Fig. 6, since the WD data set had a large AC value of 0.842, the RCC value was only 102.0%, indicating a forecasting improvement over the original observed data set of only 2.0%. The Df data had a small AC value of 0.653, but the forecasting accuracies of all three models were good; in particular, the RCC of LS-SVM was 134.4%, indicating a considerable forecasting improvement of 34.4% over the original observed data set. Summarizing the results, the forecasting accuracy and efficiency of LS-SVM were the highest among the three models, which confirms the clear superiority of LS-SVM over both NNBP and MLR in the real-time forecasting of a nonlinear hydrological time series.

Fig. 5. Normal Q-Q Plots of Residuals for (a) LS-SVM, (b) NNBP, and (c) MLR (Df)

Table 9. Test for Randomness of the Residuals for Df of Dam at 95% Confidence Interval
                         LS-SVM           NNBP             MLR
Test                     z-value  H0      z-value  H0      z-value  H0
Run test                 0.053    Accept  5.639    Reject  4.796    Reject
Turning points test      1.250    Accept  0.250    Accept  2.000    Reject
Daniel test              0.598    Accept  1.860    Accept  1.903    Accept
Mann-Kendall test        0.606    Accept  1.397    Accept  1.424    Accept
Linear regression test   0.389    Accept  0.733    Accept  0.793    Accept
H0: null hypothesis. H0 of Run test and Turning points test: no trend and independent. H0 of Daniel test, Mann-Kendall test, and linear regression test: no trend.

Fig. 6. An Analysis of Forecasting Efficiency (i.e., improvement of forecasting accuracy compared to original correlations) for Nonlinearity by Comparing the Autocorrelation of the Observed Data with the RCC of Each Model


Fig. 7. The Goodness-of-fit of RCC Compared with Others (PI, NSC) as a Validation Test Method to Estimate the Nonlinear Forecasting Efficiency

The goodness-of-fit of the evaluation method is estimated by comparing it with the original data set. Fig. 7 shows the lag-one AC of the observed WD and Df, and the CC between the observed and forecast data; RCC is the ratio of CC to AC. WD had strong persistency (linearity), since it showed a large AC value, whereas Df had weak persistency, since it showed a small AC value. For these two hydrological time series with different persistency (i.e., degree of linearity), Df had a larger RCC than WD; that is, in terms of RCC, PI, and NSC, the forecasting improvement of Df over the original observed data set was much greater than that of WD. However, in the case of NSC, since the NSCs of WD and Df had similar values of 0.736 and 0.749, respectively, a direct comparison of NSCs between the two data series was very difficult. As for PI, with values of 0.162 and 0.638 and no absolute standard for judging them, it was very difficult to measure the absolute amount of forecasting improvement using PI. In the case of RCC, however, the RCC values indicate the degree of improvement or deterioration of the forecasting accuracy compared to the observed data lagged by k steps, on a percentage scale. As a result, the comparison of the forecasting efficiencies evaluated using RCC for the two cases indicates that the forecasting accuracies of WD and Df were increased by 2.0% and 34.4%, respectively, compared to the unchanged condition. This result confirms that the forecasting efficiency of LS-SVM was higher for the strongly nonlinear Df data than for the strongly linear WD data.
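As a quick check of Eq. (11) against the reported values (CC from Tables 6 and 7, lag-one AC from Fig. 6), the RCC values follow directly, up to rounding of the reported coefficients:

RCC_{WD} = \frac{r_{y\hat{y}}}{r_{yy_b}} = \frac{0.859}{0.842} \approx 1.02 \;(\approx 102\%), \qquad
RCC_{Df} = \frac{0.878}{0.653} \approx 1.34 \;(\approx 134\%)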

5. Conclusions

This paper has presented an LS-SVM approach for forecasting nonlinear hydrological time series. The objective was to examine the feasibility of using LS-SVM for forecasting nonlinear hydrological time series by comparing it with a statistical method (MLR) and a heuristic method (NNBP), and to verify the prediction performance of the three models as a function of the degree of linearity, using daily water demand and daily mean dam inflow data. The main results are summarized as follows.

The forecasting accuracies and efficiencies of LS-SVM were superior to those of NNBP and MLR for both the WD and the Df. For WD, which had a strong autocorrelation, LS-SVM gave slightly more accurate and improved results than NNBP and MLR, but the forecasting accuracies of the three models did not differ significantly. For Df, which had a weak autocorrelation, the forecasting accuracy and improvement of LS-SVM were greatly superior to those of NNBP and MLR. The forecasting accuracy of WD increased by only 2.0%, whereas that of Df increased by 34.4%, relative to the original observed (lagged) data. These results confirm the stronger ability of LS-SVM, compared with NNBP and MLR, to forecast hydrological time series with strong nonlinearity.

Conventional estimation measures, which quantify the errors between measured and forecast data, can express the accuracy of forecasting but not the forecasting efficiency, i.e., the rate of forecasting improvement relative to the persistency of the data. The results here confirm the suitability of RCC for assessing the forecasting performance of real-time forecasting models compared with conventional relative evaluation measures such as NSC or PI, and RCC should therefore be considered when assessing the performance or efficiency of real-time hydrological forecasting models. However, because RCC evaluates efficiency rather than measuring error directly, it should be used in conjunction with error measures such as MAE, RMSE, or MAPE to increase the accuracy and objectivity of the judgment.

In this paper, the ability of the LS-SVM model to predict nonlinear data was evaluated, and its prediction performance was shown to be superior to that of multiple linear regression and back-propagation neural networks for nonlinear data. Further studies are needed on forecasting performance for data with various persistencies, and on hybrid forecasting models, since SVMs and neural networks can be improved by coupling them with genetic algorithms or fuzzy techniques.

References

Anctil, F., Michel, C., Perrin, C., and Andreassian, V. (2004). “A soil moisture index as an auxiliary ANN input for stream flow forecasting.” Journal of Hydrology, Vol. 286, Nos. 1-4, pp. 155-167.
Anderson-Sprecher, R. (1994). “Model comparison and R2.” The American Statistician, Vol. 48, No. 2, pp. 113-117.
Armstrong, J. S. and Collopy, F. (1992). “Error measures for generalizing about forecasting methods: Empirical comparisons.” International Journal of Forecasting, Vol. 8, Issue 1, pp. 69-80.
ASCE (1993). “Criteria for evaluation of watershed models.” Journal of Irrigation and Drainage Engineering, Vol. 119, No. 3, pp. 429-442.
ASCE (ASCE Task Committee on the Application of Artificial Neural Networks in Hydrology) (2000a). “Artificial neural networks in hydrology. I: Preliminary concepts.” Journal of Hydrologic Engineering, Vol. 5, Issue 2, pp. 115-123.

ASCE (ASCE Task Committee on the Application of Artificial Neural Networks in Hydrology) (2000b). “Artificial neural networks in hydrology. II: Hydrologic applications.” Journal of Hydrologic Engineering, Vol. 5, Issue 2, pp. 124-137.
Asefa, T. and Kemblowski, M. W. (2002). “Support vector machines approximation of flow and transport models in initial groundwater contamination network design.” Eos Trans. AGU, Vol. 83, No. 47, Fall Meet. Suppl., Abstract H72D-0882.
Bracmort, K. S., Arabi, M., Frankenberger, J. R., Engel, B. A., and Arnold, J. G. (2006). “Modeling long-term water quality impact of structural BMPS.” Trans. ASAE, Vol. 49, No. 2, pp. 367-384.
Chau, K. W., Wu, C. L., and Li, Y. S. (2005). “Comparison of several flood forecasting models in Yangtze River.” Journal of Hydrologic Engineering, ASCE, Vol. 10, No. 6, pp. 485-491.
Cheng, C. T., Ou, C. P., and Chau, K. W. (2002). “Combining a fuzzy optimal model with a genetic algorithm to solve multiobjective rainfall-runoff model calibration.” Journal of Hydrology, Vol. 268, Nos. 1-4, pp. 72-86.
Cheng, C. T., Wang, W. C., Xu, D. M., and Chau, K. W. (2008). “Optimizing hydropower reservoir operation using hybrid genetic algorithm and chaos.” Water Resources Management, Vol. 22, No. 7, pp. 895-909.
Dawson, C. W., Abrahart, R. J., and See, L. M. (2007). “HydroTest: A web-based toolbox of evaluation metrics for the standardized assessment of hydrological forecasts.” Environmental Modelling & Software, Vol. 22, No. 7, pp. 1034-1052.
Dibike, Y. B., Velickov, S., Solomatine, D. P., and Abbott, M. B. (2001). “Model induction with support vector machines: Introduction and applications.” Journal of Computing in Civil Engineering, ASCE, Vol. 15, No. 3, pp. 208-216.
Donigian, A. S., Imhoff, J. C., and Bicknel, B. R. (1983). Predicting water quality resulting from agricultural nonpoint-source pollution via simulation HSPF, In Agricultural Management and Water Quality, pp. 200-249, Ames, Iowa: Iowa State University Press.
Golub, G. H. and Van Loan, C. F. (1989). Matrix computations, Johns Hopkins University Press, Baltimore, MD.
Gunn, S. (1998). Support vector machines for classification and regression, ISIS Technical Report, University of Southampton.
Gupta, H. V., Sorooshian, S., and Yapo, P. O. (1999). “Status of automatic calibration for hydrologic models: Comparison with multilevel expert calibration.” Journal of Hydrologic Engineering, Vol. 4, No. 2, pp. 135-143.
Hyndman, R. J. and Koehler, A. B. (2006). “Another look at measures of forecast accuracy.” International Journal of Forecasting, Vol. 22, No. 4, pp. 679-688.
Kim, Y. O. (1994). Korea's climate and culture, Ewha Womans University Press, Seoul, p. 104.
Klemes, V. (1986). “Operational testing of hydrological simulation models.” Hydrological Sciences Journal, Vol. 31, No. 3, pp. 13-24.
Legates, D. R. and McCabe, G. J. (1999). “Evaluating the use of ‘goodness-of-fit’ measures in hydrologic and hydroclimatic model validation.” Water Resources Research, Vol. 35, No. 1, pp. 233-241.
Lin, J. Y., Cheng, C. T., and Chau, K. W. (2006). “Using support vector machines for long-term discharge prediction.” Hydrological Sciences Journal, Vol. 51, No. 4, pp. 599-612.
Liong, S. Y. and Sivapragasm, C. (2002). “Flood stage forecasting with SVM.” Journal of American Water Resources Association, Vol. 38, No. 1, pp. 173-186.

Motovilov, Y. G., Gottschalk, L., England, K., and Rodhe, A. (1999). “Validation of distributed hydrological model against spatial observations.” Agric. Forest Meteorology, Vol. 98-99, pp. 257-277.
Nash, J. E. and Sutcliffe, J. V. (1970). “River flow forecasting through conceptual models, Part I - A discussion of principles.” Journal of Hydrology, Vol. 10, No. 3, pp. 282-290.
Osuna, E., Freund, R., and Girosi, F. (1997). “An improved training algorithm for support vector machines.” Proc. of the IEEE Workshop on Neural Networks for Signal Processing VII, New York, pp. 276-285.
Pearson, K. (1896). “Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia.” Philosophical Transactions of the Royal Society of London Series A, Vol. 187, pp. 253-318.
Pebesma, E. J., Switzer, P., and Loague, K. (2005). “Error analysis for the evaluation of model performance: Rainfall-runoff event time series data.” Hydrological Processes, Vol. 19, No. 8, pp. 1529-1548.
Pelckmans, K., Suykens, J. A. K., Van Gestel, T., De Brabanter, J., Lukas, L., Hamers, B., De Moor, B., and Vandewalle, J. (2003). SVMlab toolbox user's guide version 1.5, K. U. Leuven.
Ramanarayanan, T. S., Williams, J. R., Dugas, W. A., Hauck, L. M., and McFarland, A. M. S. (1997). Using APEX to identify alternative practices for animal waste management, ASAE Paper No. 972209, St. Joseph, Michigan.
Saleh, A., Arnold, J. G., Gassman, P. W., Hauk, L. M., Rosenthal, W. D., Williams, J. R., and MacFarlan, A. M. S. (2000). “Application of SWAT for the upper North Bosque River watershed.” Transaction of the ASAE, Vol. 43, No. 5, pp. 1077-1087.
Santhi, C., Arnold, J. G., Williams, J. R., Dugas, W. A., Srinivasan, R., and Hauck, L. M. (2001). “Validation of the SWAT model on a large river basin with point and nonpoint sources.” Journal of American Water Resources Association, Vol. 37, No. 5, pp. 1169-1188.
Scholkopf, B. and Smola, A. (2002). Learning with kernels, MIT Press, Cambridge, MA.
Seibert, J. (2001). “On the need for benchmarks in hydrological modeling.” Hydrological Processes, Vol. 15, No. 6, pp. 1063-1064.
Singh, J., Knapp, H. V., and Demissie, M. (2004). Hydrologic modeling of the Iroquois River watershed using HSPF and SWAT, ISWS CR 2004-08.
Suykens, J. A. K. and Vandewalle, J. (1999). “Least squares support vector machine classifiers.” Neural Processing Letters, Vol. 9, No. 3, pp. 293-300.
Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B., and Vandewalle, J. (2002). Least squares support vector machines, World Scientific, Singapore.
Van Liew, M. W., Veith, T. L., Bosch, D. D., and Arnold, J. G. (2007). “Suitability of SWAT for the conservation effects assessment project: A comparison on USDA-ARS experimental watersheds.” Journal of Hydrologic Engineering, Vol. 12, No. 2, pp. 173-189.
Vapnik, V. (1995). The nature of statistical learning theory, Springer, New York.
Vapnik, V. (1998). Statistical learning theory, Wiley, New York.
Wang, W. C., Chau, K. W., Cheng, C. T., and Qiu, L. (2009). “A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series.” Journal of Hydrology, Vol. 374, Nos. 3-4, pp. 294-306.
Wu, C. L., Chau, K. W., and Li, Y. S. (2009). “Predicting monthly streamflow using data-driven models coupled with data-preprocessing techniques.” Water Resources Research, Vol. 45, No. W08432, doi:10.1029/2007WR006737.
