An EMD-Based Chaotic Least Squares Support Vector ... - MDPI

0 downloads 0 Views 4MB Size Report
Feb 23, 2017 - Yaseen, Z.M.; El-Shafie, A.; Jaafar, O.; Afan, H.A.; Sayl, M.N. Artificial ... Barge, J.T.; Sharif, H.O. An ensemble empirical mode decomposition, ...
water Article

An EMD-Based Chaotic Least Squares Support Vector Machine Hybrid Model for Annual Runoff Forecasting Xuehua Zhao 1 , Xu Chen 2 , Yongxin Xu 1,3, *, Dongjie Xi 1 , Yongbo Zhang 1 and Xiuqing Zheng 1 1

2 3

*

College of Water Resources Science and Engineering, Taiyuan University of Technology, Taiyuan 030024, China; [email protected] (Xu.Z.); [email protected] (D.X.); [email protected] (Y.Z.); [email protected] (Xi.Z.) School of Civil Engineering, Tianjin University, Tianjin 300072, China; [email protected] Department of Earth Sciences, University of the Western Cape, Cape Town 7535, South Africa Correspondence: [email protected] or [email protected]; Tel.: +86-351-601-0102

Academic Editor: Marco Franchini Received: 23 November 2016; Accepted: 16 February 2017; Published: 23 February 2017

Abstract: Accurate forecasting of annual runoff is necessary for water resources management. However, a runoff series consists of complex nonlinear and non-stationary characteristics, which makes forecasting difficult. To contribute towards improved prediction accuracy, a novel hybrid model based on the empirical mode decomposition (EMD) for annual runoff forecasting is proposed and applied in this paper. Firstly, the original annual runoff series is decomposed into a limited number of intrinsic mode functions (IMFs) and one trend term based on the EMD, which makes the series stationary. Secondly, it will be forecasted by a least squares support vector machine (LSSVM) when the IMF component possesses chaotic characteristics, and simulated by a polynomial method when it does not. In addition, the reserved trend term is predicted by a Gray Model. Finally, the ensemble forecast for the original runoff series is formulated by combining the prediction results of the modeled IMFs and the trend term. Qualified rate (QR), root mean square errors (RMSE), mean absolute relative errors (MARE), and mean absolute errors (MAE) are used as the comparison criteria. The results reveal that the EMD-based chaotic LSSVM (EMD-CLSSVM) hybrid model is a superior alternative to the CLSSVM hybrid model for forecasting annual runoff at Shangjingyou station, reducing the RMSE, MARE, and MAE by 39%, 28.6%, and 25.6%, respectively. To further illustrate the stability and representativeness of the EMD-CLSSVM hybrid model, runoff data at three additional sites, Zhaishang, Fenhe reservoir, and Lancun stations, were applied to verify the model. The results show that the EMD-CLSSVM hybrid model proved its applicability with high prediction precision. This approach may be used in similar hydrological conditions. Keywords: empirical mode decomposition; chaotic characteristics; phase space reconstruction; least squares support vector machine; runoff forecasting

1. Introduction Runoff forecasting, especially annual runoff forecasting, is required for appropriate and effective water resource planning and management [1,2]. It is generally known that the processes of runoff generation are heavily influenced by a host of factors such as precipitation, temperature, evaporation, underlying surface, and human activity [3,4]. Due to the uncertainty of these factors, runoff series tend to be nonlinear, non-stationary, and even chaotic [5,6]. It is very difficult to make accurate predictions about annual runoff, which has been a challenge for hydrologists over years. In the past two decades, the characteristic of deterministic chaos in runoff series has been observed by some researchers [7–9]. When the runoff series possesses a chaotic trait, the conventional forecasting Water 2017, 9, 153; doi:10.3390/w9030153

www.mdpi.com/journal/water

Water 2017, 9, 153

2 of 17

methods may not function properly. Recently, chaos-based runoff forecasting techniques have become increasingly attractive to the hydrology community [10,11]. Generally, the key to chaotic time series prediction is the phase space reconstruction theory, which is not concerned with whether a time series is stationary or not. Many studies have shown that once the time series has chaotic attractor, the phase space can be reconstructed when the appropriate parameters are chosen [12,13]. However, a reconstruction process may be affected by the non-stationary characteristics of the runoff series. A general method to deal with the non-stationarity is to apply the finite difference method (FDM) to the runoff series in question [14]. However, the series processed by using FDM may not remain stationary. In view of this, an empirical mode decomposition (EMD) has recently been introduced. The EMD method can self-adaptively decompose a complex non-stationary and nonlinear series into several intrinsic mode functions (IMFs). The EMD-based hybrid models have been successfully employed to predict runoff series [15,16]. However, many researchers only consider EMD a useful tool for the analysis of multi-scale variations of the runoff series [17,18], and have paid little attention to converting non-stationary runoff series into stationary ones. In this paper, the EMD method is introduced to make the time series stationary. Due to the nonlinear and chaotic characteristics of the runoff series, artificial intelligence appears to be a useful alternative to conventional statistical techniques for forecasting runoff, such as artificial neural networks (ANNs), support vector machines (SVM), and least squares support vector machines (LSSVM). Previous studies have demonstrated that ANNs are useful for runoff series forecasting [19,20]. However, the ANNs were subject to local convergence, slow learning, and poor generalization ability, which led to unsatisfactory performance. By contrast, the SVM is characterized by having non-linear kernels, high generalization ability, and sparse solutions, which is useful for forecasting small sample cases. The SVM is based on the structural risk minimization induction principle rather than the empirical risk minimization, and can achieve the global optimum by solving a quadratic optimization problem. The SVM is currently used increasingly in forecasting runoff series [21,22], although this method is often time-consuming and has a higher computational burden due to the requisite constrained optimization programming. To reduce the complexity of the optimization process, the LSSVM is proposed, which has also been successfully employed in runoff forecasting [23,24]. It adopts equality constraints rather than inequality constraints and uses the least squares linear system as its loss functions, which makes it computationally easier. The LSSVM presents similar advantages over the SVM, but an additional advantage of the LSSVM is that it only requires solving a set of linear equations rather than quadratic programming. Therefore, this paper presents an algorithm aimed at achieving accurate forecasts of annual runoff based on the combination of the EMD and chaotic least squares support vector machine (EMD-CLSSVM), which means that the original annual runoff time series are transformed into stationary subseries by EMD; subseries would be predicted by LSSVM when it has chaotic characteristics, and by a polynomial method when it does not. The main objective of this paper is to apply the EMD-CLSSVM hybrid model for annual runoff series forecasting under the case studies in the upper reaches of the Fenhe River basin, China. First, the EMD is applied to decompose the original annual runoff data into several IMFs and one trend term. Secondly, the stationarity of these IMFs was tested based on the KSPP method. Thirdly, the chaotic characteristics of the IMFs were identified based on the largest Lyapunov exponent method. The IMFs and the trend term were then modeled and forecasted using the CLSSVM, polynomial method, and Gray Model (GM). Finally, these prediction results are integrated to get the final annual runoff forecasting values. The EMD-CLSSVM hybrid model is compared to the CLSSVM hybrid approach to find a reliable hybrid forecasting model for annual runoff series.

Water 2017, 9, 153 Water 2017, 9, 153

3 of 17 3 of 17

Dataand andMethods Methods 2.2.Data 2.1.Study StudyArea Areaand andData Data 2.1. ◦ E–114 ◦ E, ◦ N–39 ◦ N), TheFenhe FenheRiver Riverisislocated locatedininthe theNorth NorthChina China(110 (110° E–114° E,35 35° N–39° N),which whichcovers coversaa The 2 2 drainagearea areaofofabout about38,728 38,728km km with withaariver riverlength lengthof of716 716km, km,as asseen seenin inFigure Figure1.1.With Withaasemi-arid semi-arid drainage climate,the theFenhe FenheRiver Riverbasin basin has average annual precipitation ranging from to 750 climate, has anan average annual precipitation ranging from 300300 to 750 mmmm andand an ◦ an annual mean temperature varying 12 The °C. The upper reaches ofFenhe the Fenhe annual mean temperature varying fromfrom 9 to 912to C. upper reaches of the RiverRiver are inare thein the above area above Lancun station. Annual datafour from four hydrologic in the upper area Lancun station. Annual runoff runoff data from hydrologic stations stations in the upper reaches, reaches, i.e., Shangjingyou, Fenhe Zhaishang, reservoir, Zhaishang, and Lancun, wereShangjingyou analyzed. Shangjingyou i.e., Shangjingyou, Fenhe reservoir, and Lancun, were analyzed. station lies lies in Lan belongs River, which to one of of the tributaries of Fenhe River. Fenhe The Zhaishang, instation Lan River, which to onebelongs of the tributaries Fenhe River. The Zhaishang, reservoir, Fenhe reservoir, and stations areofinFenhe the mainstream of FenheofRiver. The locations of the and Lancun stations areLancun in the mainstream River. The locations the stations are shown in stations are shown in Figure 1. These runoff data of flow records covering 1956 to 2000 were Figure 1. These runoff data of flow records covering 1956 to 2000 were obtained from the government obtained from the government hydrologic database. The data first were 40 years (1956–1995) of runoff data hydrologic database. The first 40 years (1956–1995) of runoff applied for calibration purpose were for calibration purpose andused the remaining five years of data were used for validation. and theapplied remaining five years of data were for validation.

Figure1.1.Locations Locationsof offour fourhydrologic hydrologicstations stationsininthe theFenhe FenheRiver Riverbasin basinofofChina. China. Figure

2.2. Methods 2.2. Methods 2.2.1.Empirical EmpiricalMode ModeDecomposition Decomposition 2.2.1. EMDisisan aneffective effectivemethod methodused usedfor forthe theanalysis analysisofofnon-stationary non-stationarytime timeseries series[25]. [25].The Themost most EMD important feature of EMD is that it can adaptively decompose the non-stationary runoff series into important feature of EMD is that it can adaptively decompose the non-stationary runoff series into severalstationary stationaryIMFs IMFs and a trend term. are produced an iterative called several and a trend term. TheThe IMFsIMFs are produced by an by iterative processprocess called sifting. sifting. Sifting is the core of the EMD method. The sifting process is summarized as follows: Sifting is the core of the EMD method. The sifting process is summarized as follows: Step 1) Identify all the extreme (maxima and minima) values of a time series x(t ) . Step 1) Identify all the extreme (maxima and minima) values of a time series x (t).

Water 2017, 9, 153

Step 2) Step 3)

Step 4) Step 5)

Generate the upper and lower envelope (u(t) and v(t)) by applying cubic spline interpolation. Compute a local mean curve m(t) of the two envelopes at the same time using m(t) = (u(t) + v(t))/2.

(1)

h ( t ) = x ( t ) − m ( t ).

(2)

Calculate the difference, Check the properties of h(t); it will be considered a valid IMF if it satisfies these two conditions:

• • Step 6)

4 of 17

The number of extreme and zero crossings must be equal or differ at most by one. At any point, the local mean value of the envelope defined by the local extremes must be zero.

When h(t) is not qualified as an IMF, repeat Steps 1) to 5) by sifting the residual series. The sifting process stops when the residual (i.e., the trend term; r(t)) satisfies the predetermined criteria.

Finally, the input runoff series can be decomposed into several IMFs and a residual r(t) as follows. In addition, the original runoff series x(t) can be exactly reconstructed by the following linear superposition: n

x (t ) =

∑ h(t) + r(t),

(3)

i=1

where h(t) is the IMF, r(t) refers to the residual term, and n is the number of IMFs. 2.2.2. Phase Space Reconstruction and Chaotic Characteristics Identification The phase space is the space of all possible states of a system with the observed data. Phase space reconstruction is the prerequisite for predictions of time series. To describe the temporal evolution of a time series in a multi-dimensional phase space, it is essential to unfold a multi-dimensional structure using univariate data, i.e., the one-dimensional time series is embedded to multi-dimensional phase space through reconstruction. For a time series {x1 , x2 , · · · , xn }, it may be reconstructed into a series of delay vectors of the following type: n o yi = xi , xi+τ , xi+2τ , · · · , xi+(m−1)τ ,

i = 1, 2, · · · , n − (m − 1)τ,

(4)

where yi is a point of m-dimensional phase space, m is the embedding dimension, and τ is the delay time. The connection of n − (m − 1)τ points describes the evolution orbit of the system in m-dimensional phase space. By means of phase space reconstruction, it is hoped that the points in m-dimensional phase space form an attractor that is defined as regular phase space orbit. In theory, a good reconstruction means near topological equivalence of the reconstructed attractor to the original one. The selection of τ and m is important for the quality of the reconstruction. Delay time τ is usually calculated by the autocorrelation function method, the multiple correlation function method, and the mutual information method [26]. The autocorrelation function method is mainly used to measure the linear correlation of a continuous time series. However, when the time series is nonlinear, this method will be ineffective. The mutual information method, based on Shannon information entropy, can not only be used to calculate the correlation between variables, but also to provide measurement of the overall dependence of variables. This method can be used to analyze the linear correlation as well as the nonlinear correlation. Therefore, in this paper, the mutual information method is employed to determine the delay time τ. The typical methods of determining embedding dimension m include the Grassberger–Procaccia algorithm, correlation integral, false nearest neighbor (FNN) [27], and the Cao method [28]. Generally, the saturated correlation dimension method requires a larger sample. The FNN method will obtain different false nearest points when the threshold value selection is different. Hence, the Cao method is adopted to determinate the embedding dimension in this paper.

Water 2017, 9, 153

5 of 17

If a time series is non-stationary, the phase space reconstruction method is robust. It is necessary to transform the time series into stationary ones before selecting the parameters of phase space. The KPSS method introduced by Kwiatkowski, Phillips, Schmidt and Shin [29] is used to test the stationarity of a time series. Let yt , t = 1, 2, . . . , N, be the time series for which we wish to test stationarity. Let et , t = 1, 2, . . . , N be the residuals from the regression of y on an intercept and time trend. Let σˆ ε2 be the estimate of the error variance from this regression (the sum of squared residuals, divided by N). Define the partial sum process of the residuals: t

St =

∑ ei , t = 1, 2, . . . ,

N.

(5)

i=1

The Lagrange Multiplier (LM) statistic is N

LM =

∑ St2 /σˆε2.

(6)

t=1

The null hypothesis of this test is stationary series. Thus if the LM statistic value is greater than the critical value (chosen alpha level), the null hypothesis is rejected. On the contrary, if the LM statistic value is less than the critical value, the null hypothesis cannot be rejected. The diagnosis of the character of chaos can begin when the phase space has been reconstructed. Chaotic characteristics identification is significant to reveal the essential law of runoff series and establish a reliable forecasting model. The usual methods of chaotic characteristics identification include the phase portrait, power spectrum, saturated correlation dimension, largest Lyapunov exponent, Kolmogorov entropy, and so on [30]. The largest Lyapunov exponent is employed to identify the chaotic characteristics in this paper. 2.2.3. Chaotic Least Squares Support Vector Machine Model Given the training samples Xt = {x1 , x2 , · · · , xn }, if we choose the proper delay time τ, embed dimension m, and transform the prediction series Xt into new m dimension data space, it may be expressed as follows:      

X1 X2 .. . XM





    =    

x1 x2 .. . xM

x1+τ x2+τ .. . x M +τ

x1+2τ x2+2τ .. . x M+2τ

· · · x1+(m−1)τ · · · x2+(m−1)τ .. .. . . · · · x M+(m−1)τ

   ,  

(7)

where M = n − (m − 1)τ, X2 = F(X1 ), X3 = F(X2 ), . . . , XM = F(XM−1 ), and the relation is: n o Xt+1 = xt+1 , xt+1+τ , xt+1+2τ , · · · , xt+1+(m−1)τ = F(xt , xt+τ , xt+2τ , · · · , xt+(m−1)τ ).

(8)

The former m − 1 dimension of Xt+1 is as in the historic data, which is converted into a single output as follows: xt+1+(m−1)τ = f (xt , xt+τ , xt+2τ , · · · , xt+(m−1)τ ), (9) where f (x) is the mapping from Rm to R. The essence of the prediction problem is to obtain a best approximation of f (x), which has high nonlinearity. Therefore, a nonlinear function is adopted to fit the mapping. Combined with the LSSVM model, training data is obtained as follows: Y = [ x2+(m−1)τ

x3+(m−1)τ

T

· · · x M+1+(m−1)τ ] .

(10)

Assuming the current training sample is as shown in Equation (10), the length of the time window is M. When one new series comes into the time window, the oldest series X1 is removed. Therefore,

Water 2017, 9, 153

6 of 17

Y = [ x2 + ( m −1)τ

Water 2017, 9, 153

x3+ ( m −1)τ

 xM +1+ ( m −1)τ ]T .

6 (10) of 17

Assuming the current training sample is as shown in Equation (10), the length of the time window is M . When series into the time window, removed. the training series goesone backnew to the formcomes of Equation (10) again. Using the the oldest LSSVMseries algorithm, current X1 isthe regression function is given as follows: Therefore, the training series goes back to the form of Equation (10) again. Using the LSSVM algorithm, the current regression function is given as follows: M

f (x) = ∑M αi K(xi , x) + b. f ( x) =i= 1 α i K ( xi , x ) + b . i =1

(11) (11)

Furthermore, the result of the first prediction step is Furthermore, the result of the first prediction step is M M

xˆ M+x2ˆ+( = X MM++1 )1 )++ b ,b, ∑ααiiK (xii, X ττ = M +m 2 +− ( m1−)1) =11 ii=

(12) (12)

where x1M++1τ+2,τx, where XXMM++11 ==[ x[Mx+M1,+x1M,+x1+M · −·1)·τ ], x. M+1+(m−1)τ ]. τ ,+ M2τ +1+,( m M+, 1x+ Thus, the the result result of of pp step step prediction prediction model is, Thus, M

ˆM + p +1+ ( m −1)τ = =  ααi KK((xxi ,,XXM + p ) +) + b ,b, xˆ M+xp+ M+ p i i 1+(m−1)τ i∑ =1 M

(13) (13)

i=1

where X M + p = [ xM + p , xM + p+τ , xM + p+2τ ,, xM + p+( m−1)τ ] . where X M+ p = [x M+ p , x M+ p+τ , x M+ p+2τ , · · · , x M+ p+(m−1)τ ]. 3. 3. Model ModelResults Resultsand andDiscussion Discussion 3.1. 3.1. Stationarity StationarityTest Test of of the the Original Original Runoff Series In In order order to to examine examine the the stationarity stationarity of of aa runoff runoff series, series, the the KPSS KPSS method method is is used used to to test test the the stationarity result indicates thatthat thethe LMLM statistic value (0.847) of the stationarityof ofthe theoriginal originalrunoff runoffseries. series.The The result indicates statistic value (0.847) of annual runoff series for for Shangjingyou hydrologic station is is greater than the annual runoff series Shangjingyou hydrologic station greater thanthe thecritical criticalvalue value(0.463), (0.463), which which shows shows that that the the annual annual runoff runoff series series for for Shangjingyou Shangjingyou station stationisisnon-stationary. non-stationary. Therefore, Therefore, itit is is necessary necessary to to transform transform the the runoff runoff series series to become stationary. 3.2. EMD EMDofofRunoff RunoffSeries Series 3.2. The original original runoff runoff series series are are decomposed decomposed into into several several sub-series sub-seriesover overthe thesame sametime timedomain, domain, The namely, IMF1 to IMF5, hydrologic station, as namely, IMF5, and and one onetrend trendterm termbased basedon onthe theEMD EMDfor forShangjingyou Shangjingyou hydrologic station, seen in in Figure 2. 2. as seen Figure

Figure Figure2.2.Empirical Empiricalmode modedecomposition decomposition(EMD) (EMD)results results of of runoff runoff series series for for Shangjingyou Shangjingyou station. station.

Water 2017, 9, 153 Water 2017, 9, 153

7 of 17 7 of 17

The KPSS is also used to analyze the stationarity of IMFs. As shown in Table 1, the LM statistic value of several IMFs for Shangjingyou station are all less than the critical value, which indicates the The KPSS is also used to analyze the stationarity of IMFs. As shown in Table 1, the LM statistic runoff series decomposed by the EMD become stationary ones. value of several IMFs for Shangjingyou station are all less than the critical value, which indicates the runoff series decomposed by the EMD become stationary ones. Table 1. Lagrange Multiplier (LM) statistic, delay time τ, embedding dimension m, average period p, and Lyapunov exponent λ1 of intrinsic mode functions (IMFs) for Shangjingyou station. Table 1. Lagrange Multiplier (LM) statistic, delay time τ, embedding dimension m, average period p, and Lyapunov λ1 of intrinsic functions (IMFs) Shangjingyou station. λ1 IMFs exponent LM Statistic Average Period p forLyapunov Exponent τ mmode

IMF1 0.114 LM Statistic IMF2 0.148 IMF1IMF3 0.114 0.063 IMF2 0.148 IMF4 0.075 IMF3 0.063 0.571 IMF4IMF5 0.075

1 1τ 21 41 2 14

IMF5

1

IMFs

0.571

2 5 4 6 6

m 2 5 4 6 6

3 4Average Period p 3 4 4 8 4 23 8 23

0.113 Lyapunov Exponent λ1 0.160 0.113 0.101 0.160 0.040 0.101 — 0.040 —

3.3. Determination of Delay Time τ and Embedding Dimension m 3.3. Figure Determination of Delay τ and Embedding Dimension m 3 indicates theTime mutual information I (τ ) of IMF1, IMF2, IMF3, IMF4, and IMF5 for

each information runoff series is attained firstly at the lagIMF5 time of Shangjingyou I (τthe ) for Figure 3station. indicates mutual I (τthe ) ofminimum IMF1, IMF2, IMF3, IMF4, and for station. I (the τ ) for runoff series the minimum firstly at the lag time 1,Shangjingyou 1, 2, 4, and 1. Therefore, fiveeach values of the delayistime τ for IMF1,attained IMF2, IMF3, IMF4, and IMF5 1, 1, 2, 4, and 1.station Therefore, five values the4,delay τ for IMF1, (Table IMF2, IMF3, IMF4, and IMF5 atofShangjingyou are the chosen as 1, 1,of 2, and time 1, respectively 1). The embedding at Shangjingyou station as 1, 1, 2, station, 4, and 1,asrespectively (Table 1). Cao The embedding dimension dimension m of the IMFsare forchosen Shangjingyou determined by the method, is shown in m of the Table 1. IMFs for Shangjingyou station, as determined by the Cao method, is shown in Table 1.

Figure 3. Cont.

Water 2017, 9, 153 Water 2017, 9, 153 Water 2017, 9, 153

8 of 17 8 of 17 8 of 17

Figure 3. Delay time τ of several intrinsic mode functions (IMFs) for Shangjingyou station ((a) IMF1; FigureFigure 3. Delay time τ of several intrinsic mode functions (IMFs) forfor Shangjingyou station 3. Delay time τ of several intrinsic mode functions (IMFs) Shangjingyou station((a) ((a)IMF1; IMF1; (b) IMF2; (c) IMF3; (d) IMF4; and (e) IMF5). (b) IMF2; (c) IMF3; (d) IMF4; and (e) (b) IMF2; (c) IMF3; (d) IMF4; andIMF5). (e) IMF5).

3.4.3.4. Identification of ofChaotic Characteristics Identification Chaotic Characteristics 3.4. Identification of Chaotic Characteristics The average periods ofofseveral IMFs for Shangjingyou stationneed need to bedetermined determined before The average periods several IMFs Shangjingyou station before The average periods of several IMFs for for Shangjingyou station need totobebedetermined before identifying the chaotic characteristics by the largest Lyapunov exponent method. Therefore, the Hilbert identifying the chaotic characteristics by the largest Lyapunov exponentmethod. method.Therefore, Therefore, the the identifying the chaotic characteristics by the largest Lyapunov exponent transform [25] is used toiscalculate the average period pperiod with the results shownshown in Table 1. Hilbert transform [25] used to calculate the average p with the results in Table 1. Hilbert transform [25] is used to calculate the average period p with the results shown in Table 1. The Lyapunov exponent isis used to the chaotic chaoticcharacteristics characteristicsofofIMFs IMFs Thelargest largest Lyapunov exponent used to identify identify forfor The largest Lyapunov exponent is used to identify thethe chaotic characteristics of IMFs for Shangjingyou station. It can be seen in Table 1Table that the Lyapunov exponent λ of IMFs for Shangjingyou Shangjingyou station. It can be seen in 1 that the Lyapunov exponent λ 1 of IMFs for 1 Shangjingyou station. It can be seen in Table 1 that the Lyapunov exponent λ1 of IMFs for station is greaterstation than zero, except forzero, IMF5, which indicates that IMFs have chaotic characteristics. Shangjingyou is greater than except for IMF5, which indicates that IMFs havechaotic chaotic Shangjingyou station is greater than zero, except for IMF5, which indicates that IMFs have characteristics. Theofaverage period of the IMF5 for Shangjingyou station 23 years,ofbut thefor The average period the IMF5 for Shangjingyou station was 23 years, butwas the lengths IMF5 characteristics. The average period of the IMF5 for Shangjingyou station was 23 years, but the lengths of IMF5 forare Shangjingyou are only 45 years. Therefore, the Lyapunov exponent Shangjingyou station only 45 years.station Therefore, the Lyapunov exponent method would be unable to lengths of IMF5 for Shangjingyou station are only 45 years. Therefore, the Lyapunov exponent method unable to analyze chaotic characteristics the IMF5. In view of this, the to analyze thewould chaoticbecharacteristics of thethe IMF5. In view of this, the of Power Spectrum method is used method would be unable to analyze the chaotic characteristics of the IMF5. In view of this, the Powerthe Spectrum is used of to the analyze characteristics ofFigure the IMF5 for Shangjingyou analyze chaotic method characteristics IMF5the forchaotic Shangjingyou station. 4 shows that the IMF5 Power Spectrum method is used to analyze the chaotic characteristics of the IMF5 for Shangjingyou Figure 4station shows has thatathe IMF5 for Shangjingyou station single peak, which indicates forstation. Shangjingyou single peak, which indicates that has the aIMF5 time series does not have station. Figure 4 shows that the IMF5 for Shangjingyou station has a single peak, which indicates that the IMF5 time series does not have chaotic characteristics. chaotic characteristics. that the IMF5 time series does not have chaotic characteristics.

Figure4.4.Power Powerspectrum spectrum of of IMF5 IMF5 for Figure for Shangjingyou Shangjingyoustation. station.

Figure 4. Power spectrum IMF5 IMF1, for Shangjingyou station. Chaotic characteristics are only foundoffrom IMF2, IMF3, and IMF4 for Shangjingyou Chaotic characteristics are only found from IMF1, IMF2, IMF3, and IMF4 for Shangjingyou station. station. The IMF5 for Shangjingyou station has no chaotic characteristics. The IMF5 for Shangjingyou has no from chaotic characteristics. Chaotic characteristics arestation only found IMF1, IMF2, IMF3, and IMF4 for Shangjingyou

station. The IMF5 for Shangjingyou station has no chaotic characteristics.

Water 2017, 9, 153 Water 2017, 9, 153

9 of 17 9 of 17

3.5. 3.5.Hybrid HybridModel Model For Forthe thechaotic chaotictime timeseries, series,after afterthe thephase phasespace spacereconstruction, reconstruction,the theLSSVM LSSVMmodel modelisistrained trainedby by the and σ, σ, where where ε thereconstructed reconstructedlearning learningsamples. samples.The Theparameters parametersof ofthe the LSSVM LSSVM model model include include C, C, ε, ε, and ε decides decidesthe theSVM SVM number and model generalization. greater the value of lower ε, the the lower the number and model generalization. TheThe greater the value of ε, the number number of SVM and the worse the model precision. ε is taken to be 0.001 in this paper. The of SVM and the worse the model precision. ε is taken to be 0.001 in this paper. The parameters of C parameters ofcontrol C andthe σ model jointly complexity control theand model complexity andInprediction accuracy. In the and σ jointly prediction accuracy. the calculation, the Bayesian calculation, the Bayesian Information Criterion (BIC) is used to optimally select the parameters of C Information Criterion (BIC) is used to optimally select the parameters of C and σ. The selected and σ. The selected parameters parameters are shown in Tableare 2. shown in Table 2. Table Table2.2.Least Leastsquares squaressupport supportvector vectormachine machine(LSSVM) (LSSVM)model modelparameters parametersofofIMFs IMFsfor forShangjingyou Shangjingyou station. station.C:C:penalty penaltyfactor; factor;σ:σ:parameter parameterofofkernel kernelfunction. function. IMFs C σ

IMFs IMF1 IMF1 C 625.2 625.2 σ 0.1 0.1

IMF2 IMF3 IMF4 IMF2 IMF3 316.0 68.1 18.1 316.0 68.1 49.2 7.2 8.3 49.2

7.2

IMF4 18.1 8.3

3 Runoff(million m)

The CLSSVM is used to train and test the IMF1, IMF2, IMF3, and IMF4 for Shangjingyou The CLSSVM is used to train test theto IMF1, IMF2, andthe IMF4 Shangjingyou station. station. The polynomial method is and employed model andIMF3, forecast lastfor IMF for Shangjingyou The polynomial method is employed to model and term. forecast the last for Shangjingyou station, station, and the GM is applied to predict the trend Finally, theIMF resultant predictions of the and the GM is applied to predict the trend term. Finally, the resultant predictions of the modeled modeled IMFs and trend term are summed to formulate an ensemble forecast for the original runoff IMFs In and trend areprove summed to formulate forecast for the original series. series. order to term further the superiority of an theensemble EMD-CLSSVM hybrid model overrunoff the others, In CLSSVM order to further superiority EMD-CLSSVM hybrid model thethe others, the the hybrid prove modelthe was employedofasthe a comparative forecast model to over predict annual CLSSVM hybrid model was employed as a comparative forecast model to predict the annual runoff, runoff, which means that in place of the EMD-CLSSVM model, the CLSSVM model is used with whichmodels means unchanged. that in place The of theprediction EMD-CLSSVM the CLSSVM model usedbe with other models other resultsmodel, are shown in Figure 5. Itis can seen from the unchanged. The prediction results are shown in Figure 5. It can be seen from the hydrographs that the hydrographs that the EMD-CLSSVM hybrid model shows better performance for annual runoff EMD-CLSSVM hybrid model shows better performance for annual runoff forecasting compared with forecasting compared with the CLSSVM hybrid model in the testing period. the CLSSVM hybrid model in the testing period.

70

measured value

60

CLSSVM

50 40

EMD-CLSSVM

30 20 10 0 1996

1997

1998

1999

2000

Year Figure 5. Prediction results of annual runoff for Shangjingyou station. CLSSVM: chaotic least squares Figure 5. Prediction results of annual runoff for Shangjingyou station. CLSSVM: chaotic least squares support vector machine; EMD-CLSSVM: EMD-based chaotic LSSVM. support vector machine; EMD-CLSSVM: EMD-based chaotic LSSVM.

3.6.Comparative ComparativeAnalysis Analysis 3.6. To verify verify the representativeness of the hybridhybrid model, model, Fenhe reservoir, To thestability stabilityand and representativeness of EMD-CLSSVM the EMD-CLSSVM Fenhe Zhaishang, and Lancun stations are used as examples. The original runoff series are decomposed into reservoir, Zhaishang, and Lancun stations are used as examples. The original runoff series are several IMFs and a trend term is based on the EMD for the three hydrologic stations, as shown decomposed into several IMFs and a trend term is based on the EMD for the three hydrologicin Figures as 6–8. stations, shown in Figures 6–8.

Water 2017, 9, 153 Water 2017, 9, 153 Water 2017, 9, 153

10 of 17 10 of 17 10 of 17

Figure 6. EMD results of runoff series for Fenhe reservoir station. Figure Figure6.6. EMD EMD results resultsof ofrunoff runoffseries seriesfor forFenhe Fenhereservoir reservoirstation. station.

Figure 7. EMD results of runoff series for Zhaishang station. Figure7.7. EMD EMD results resultsof ofrunoff runoffseries seriesfor forZhaishang Zhaishangstation. station. Figure

Water 2017, 9, 153 Water 2017, 9, 153

11 of 17 11 of 17

Figure 8. 8. EMD EMD results results of of runoff runoff series series for for Lancun Lancun station. station. Figure

Analysis results of the chaotic characteristics of runoff series for the three stations are given in Analysis results of the chaotic characteristics of runoff series for the three stations are given in Table 3. Table 3. Table3.3.LM LM statistic, delay τ, embedding dimension m, average p, and Lyapunov statistic, delay timetime τ, embedding dimension m, average period p,period and Lyapunov exponent Table 1 of IMFs for the three stations. exponent λ λ1 of IMFs for the three stations.

Stations

IMFs LM Statistic LM Statistic τ IMF1 0.180 IMF1IMF2 0.180 1 0.087 IMF2 0.087 1 0.105 Fenhe Fenhe reservoir IMF3 IMF3 0.105 3 reservoir 0.112 IMF4IMF4 0.112 3 0.363 IMF5IMF5 0.363 2 0.248 IMF1IMF1 0.248 1 IMF2IMF2 0.119 2 0.119 Zhaishang Zhaishang IMF3 0.097 2 IMF3 0.097 IMF4 0.335 1 IMF4 0.335 IMF1IMF1 0.129 1 0.129 IMF2 0.129 2 IMF2 0.129 Lancun 0.082 2 Lancun IMF3 0.082 IMF4IMF3 0.329 2 IMF4 0.329 Stations

IMFs

τ 1 1 3 3 2 1 2 2 1 1 2 2 2

m m 7 77 7 6 6 55 4 77 7 33 44 77 7 7 4 44 4

Average Period p Lyapunov Exponent λ1 Average Period p Lyapunov Exponent λ1 7 0.091 0.091 7 7 0.042 7 0.042 6 0.023 6 0.023 5 5 0.168 0.168 2323 —— 7 7 0.01 0.01 0.169 7 7 0.169 0.169 3 3 0.169 23 23 —— 0.163 7 7 0.163 7 0.017 7 0.017 4 0.140 4 23 0.140— 23 —

Similarly, it can be seen in Table 3 that the Lyapunov exponent λ1 of the IMFs for Fenhe reservoir, Similarly, it can be seen in Table 3 that the Lyapunov exponent λ1 of the IMFs for Fenhe Zhaishang, and Lancun stations are more than 0, except for the IMF5 for Fenhe reservoir station and reservoir, Zhaishang, and Lancun stations are more than 0, except for the IMF5 for Fenhe reservoir the IMF4 for Zhaishang and Lancun stations, which indicates that the IMFs have chaotic characteristics. station and the IMF4 for Zhaishang and Lancun stations, which indicates that the IMFs have chaotic The average period of the IMF5 for Fenhe reservoir station and the IMF4 for Zhaishang and Lancun characteristics. The average period of the IMF5 for Fenhe reservoir station and the IMF4 for stations is 23 years. Therefore, the Power Spectrum method is used to analyze the chaotic characteristics Zhaishang and Lancun stations is 23 years. Therefore, the Power Spectrum method is used to of the IMF5 for Fenhe reservoir station, and the IMF4 for Zhaishang and Lancun stations. Figures 9–11 analyze the chaotic characteristics of the IMF5 for Fenhe reservoir station, and the IMF4 for show that the IMF5 for Fenhe reservoir stations and the IMF4 for Zhaishang and Lancun stations each Zhaishang and Lancun stations. Figures 9–11 show that the IMF5 for Fenhe reservoir stations and have single peaks, which indicates that the three time series do not have chaotic characteristics. the IMF4 for Zhaishang and Lancun stations each have single peaks, which indicates that the three time series do not have chaotic characteristics.

Water 2017, 9, 153 Water 2017, 9, 153 Water 2017, 9, 153 Water 2017, 9, 153

12 of 17 12 of 17 12 of 17 12 of 17

Figure 9. Power spectrum of IMF5 for Fenhe reservoir station. Figure spectrumof ofIMF5 IMF5for forFenhe Fenhereservoir reservoir station. Figure 9. 9. Power Power spectrum station. Figure 9. Power spectrum of IMF5 for Fenhe reservoir station.

Figure 10. Power spectrum of IMF4 for Zhaishang station. Figure 10. Power spectrum of IMF4 for Zhaishang station. Figure of IMF4 IMF4for forZhaishang Zhaishangstation. station. Figure10. 10.Power Power spectrum spectrum of

Figure 11. Power spectrum of IMF4 for Lancun station. Figure 11. Power spectrum of IMF4 for Lancun station. Figure 11. Power spectrum of IMF4 for Lancun station. Figure 11. Power spectrum of IMF4 for Lancun station.

The prediction results analyzed from Figures 12–14 are similar to those in Figure 5. It can be The prediction results analyzed from Figures 12–14 are similar to those in Figure 5. It can be seenThe from the hydrographs that the from CLSSVM hybrid model shows worse performance annual prediction results analyzed Figures 12–14 are similar to those in Figure 5.for It can be seen from the hydrographs that thefrom CLSSVM hybrid model shows to worse performance forcan annual The prediction results analyzed Figures 12–14 are similar those in Figure 5. It be seen runoff forecasting than the EMD-CLSSVM hybrid model in the testing period, especially in Figures seen from the hydrographs that the CLSSVM hybrid model shows worse performance for annual runoff forecasting than the EMD-CLSSVM hybrid model in the testing period, especially in Figures from the hydrographs that the CLSSVM hybrid model shows worse performance for annual runoff 13 andforecasting 14. runoff than the EMD-CLSSVM hybrid model in the testing period, especially in Figures 13 and 14. forecasting 13 and 14.than the EMD-CLSSVM hybrid model in the testing period, especially in Figures 13 and 14.

Water 2017, 9, 153

13 of 17

Water 2017, 9, 153

13 of 17

3

3 3 (million Runoff )m)m) (Runoff (m Runoff million million

Water Water 2017, 9, 153

13 13 of of 17 17

700 600 700

measured value

700 500 600 600 400 500

CLSSVM value measured EMD-CLSSVM CLSSVM measured value

500 300 400 400 200 300

EMD-CLSSVM CLSSVM EMD-CLSSVM

300 100 200 200 0 100 100 01996 01996 1996

1997

1998

1999

2000

1997 1997

1998 Year 1998 Year

1999 1999

2000 2000

Figure 12. Prediction results of annualYear runoff for Fenhe reservoir station.

3

3 3 Runoff (million Runoff Runoff (million (million m) m) m)

Figure 12. Prediction results of annual runoff for Fenhe reservoir station. Figure 12. 12. Prediction Predictionresults results of of annual annual runoff runoff for for Fenhe Fenhe reservoir reservoir station. station. Figure

800 700 800 600 700 800 500 600 700 400 500 600 300 400 500 200 300 400 100 200 300 0 100 200

01996 100 01996 1996

measured value CLSSVM value measured EMD-CLSSVM CLSSVM measured value EMD-CLSSVM CLSSVM EMD-CLSSVM

1997

1998

1999

2000

1997 1997

1998 Year 1998 Year

1999 1999

2000 2000

Figure 13. Prediction results of annual runoff for Zhaishang station. Year Figure 13. Prediction results of annual runoff for Zhaishang station. Figure 13. Prediction results of annual runoff for Zhaishang station.

3 3 3 (million Runoff ( (million Runoff Runoff million m)m)m)

Figure 13. Prediction results of annual runoff for Zhaishang station.

800 700 800 600 700 800 500 600 700 400 500 600 300 400 500 200 300 400 100 200 300 0 100 200 01996 100 01996 1996

measured value CLSSVM value measured measured value EMD-CLSSVM CLSSVM CLSSVM EMD-CLSSVM EMD-CLSSVM

1997

1998

1999

2000

1997 1997

1998 Year 1998 Year

1999 1999

2000 2000

Figure 14. Prediction results of annual runoff for Lancun station. Figure 14. Prediction results of annual Yearrunoff for Lancun station.

Figure 14. Prediction results of annual runoff for Lancun station.

3.7. Evaluation Evaluation of of Model Model Performance Performance 3.7.

Figure 14. Prediction results of annual runoff for Lancun station.

To evaluate the thethe models, qualified rate (QR), root mean 3.7. Evaluation of Model Performanceof of To evaluate theperformance performance models, qualified rate (QR), root square mean errors square(RMSE), errors mean absolute relative errors (MARE), and mean absolute errors (MAE) are employed to measure the 3.7. Evaluation of Model Performance (RMSE), mean absolute relative errors (MARE), andqualified mean absolute errors (MAE) aresquare employed to To evaluate the performance of the models, rate (QR), root mean errors goodness of fit of various models in this paper. Each statistical measure provides unique insight into measure the goodness of fit of various models in this paper. Each statistical measure provides (RMSE), mean absolute relative errors (MARE), andqualified mean absolute errorsroot (MAE) aresquare employed to To evaluate the performance of the models, rate (QR), mean errors the performanceinto of the model. The forecast is eligible if the forecast errorthe is less than unique theof performance of thevalue model. Thepaper. forecast value isrelative eligible forecast measure the goodness fit of errors various models in this Each statistical measure provides (RMSE),insight mean absolute relative (MARE), and mean absolute errors (MAE) areif employed to 20% [31]. TheisQR is than defined as[31]. the The ratioQR of the numberasofthe eligible years to the number of prediction relative error less 20% is defined ratio of the number of eligible years to unique insight into the performance of the model. The forecast value is eligible if the forecast measure the goodness of fit of various models in this paper. Each statistical measure provides years. Model performance can be measured as “satisfactory” if QR > 70%, as “good” if QR > 80%, relative is less 20% [31]. The of QRthe is defined theforecast ratio ofvalue the number of eligible to unique error insight into than the performance model. as The is eligible if the years forecast relative error is less than 20% [31]. The QR is defined as the ratio of the number of eligible years to

Water 2017, 9, 153

14 of 17

as “very good” if QR > 90%, and as “unsatisfactory” if QR < 70%. RMSE and MARE provide different types of information about the predictive capabilities of the model. MAE provides a more balanced measure of goodness of fit at moderate values [32]. The assessments of training and testing results obtained by the two models for four stations are presented in Table 4. Table 4. Performance indicators of different models for four hydrologic stations during training and testing periods.

Station

Hybrid Model

Shangjingyou

Training

Testing QR

RMSE (106 m3 )

MARE

MAE (106 m3 )

8.12 2.93

100% 100%

2.49 1.52

0.07 0.05

2.03 1.51

0.27 0.21

70.83 53.43

100% 100%

11.03 9.93

0.05 0.03

9.89 7.95

114.91 40.19

0.32 0.09

90.91 30.33

40% 100%

77.05 17.71

0.26 0.07

67.64 15.72

106.63 24.18

0.30 0.07

65.58 19.11

40% 100%

85.84 12.16

0.30 0.05

74.17 10.18

MARE

MAE (106 m3 )

10.19 3.78

0.21 0.07

65% 97%

120.33 75.17

CLSSVM EMD-CLSSVM

68% 97%

CLSSVM EMD-CLSSVM

58% 97%

QR

RMSE (106 m3 )

CLSSVM EMD-CLSSVM

61% 97%

Fenhe reservoir

CLSSVM EMD-CLSSVM

Zhaishang Lancun

Note: QR: qualified rate; RMSE: root mean square errors; MARE: mean absolute relative errors; MAE: mean absolute errors.

Table 4 shows that the EMD-CLSSVM hybrid model for Shangjingyou station has good performance during both training and testing. In the training period, the EMD-CLSSVM hybrid model obtained good QR, RMSE, MARE and MAE statistics, which increases the QR by 59%, and reduces the RMSE, MARE, and MAE by 62.9%, 66.7%, and 63.9%, respectively, in comparison with CLSSVM hybrid model. The EMD-CLSSVM hybrid model obtained good QR, RMSE, MARE, and MAE statistics in the testing period, reducing the RMSE, MARE, and MAE by 39%, 28.6%, and 25.6%, respectively, in comparison with the CLSSVM hybrid model. Table 4 shows that the performance of the CLSSVM hybrid model is worse than the EMD-CLSSVM hybrid model for the other three hydrologic stations. Training QR and testing QR of the EMD-CLSSVM hybrid model for the other three hydrologic stations also reach up to 97% and 100%, respectively. The training QR of the CLSSVM hybrid model for the other three hydrologic stations is about 60%, and the testing QR of the CLSSVM hybrid model for Fenhe reservoir station is 100%. However, the testing QR of the CLSSVM hybrid model for Zhaishang and Lancun stations is 40%, i.e., it is disqualified. It can be observed from Table 4 that EMD significantly increases the accuracy of the CLSSVM hybrid model in terms of RMSE, MARE, and MAE. For Fenhe reservoir station in the training period, the EMD-CLSSVM hybrid model reduces the RMSE, MARE, and MAE by 37.5%, 22.2%, and 24.6%, respectively, as comparison with the CLSSVM hybrid model. In the testing period, the reductions in RMSE, MARE, and MAE are 10%, 40%, and 19.6%, respectively. For Zhaishang station in the training period, the EMD-CLSSVM hybrid model reduces the RMSE, MARE and MAE by 65%, 71.9%, and 66.6%, respectively, in comparison with the CLSSVM hybrid model. In the testing period the reductions in the RMSE, MARE, and MAE are 77%, 73.1%, and 76.8%, respectively. For Lancun station in the training period, the EMD-CLSSVM hybrid model reduces the RMSE, MARE, and MAE by 77.3%, 76.7%, and 70.9%, respectively, in comparison with the CLSSVM hybrid model. In the testing period the reductions in the RMSE, MARE, and MAE are 85.8%, 83.3%, and 86.3%, respectively. Similarly, it can be seen from Table 4 that the EMD-CLSSVM hybrid model has better accuracy in forecasting annual runoff at Fenhe reservoir, Zhaishang, and Lancun stations, which is consistent with the results for Shangjinyou station. Therefore, the forecasting results of the other three stations have further confirmed the high consistency and good stability of the EMD-CLSSVM hybrid model.

Water 2017, 9, 153

15 of 17

The optimal model that gives the minimum RMSE, MARE, and MAE was selected. The EMD-CLSSVM hybrid model has good performance with lower RMSE, MARE, and MAE values. The reason the forecasting accuracy of the EMD-CLSSVM hybrid model is better than that of the CLSSVM hybrid model lies in the advantages of EMD, e.g., decomposing the original runoff series into several sub-series and transforming the time series into stationary ones. A stationary series is needed when selecting the parameters of phase space in chaotic theory. A nonstationary time series affects the performance of phase space reconstruction, which leads to a reduction in prediction accuracy. In practice, it is very difficult to forecast nonstationary annual runoff series. EMD is introduced to transform nonstationary series into stationary ones, taking advantage of a combination of chaotic theory and LSSVM methods, which enhances the prediction accuracy. On the other hand, EMD is also a decomposition process. The decomposition strategy does effectively enhance the prediction accuracy. From Table 4, it is clear that the prediction accuracy of the CLSSVM hybrid model is worse than the prediction accuracy of the EMD-CLSSVM hybrid model. For example, compared to the CLSSVM hybrid model, the RMSE, MARE, and MAE of the EMD-CLSSVM hybrid model are all reduced by more than 80% for Lancun station in the testing period; the RMSE, MARE, and MAE of the EMD-CLSSVM hybrid model are reduced by more than 70% for Zhaishang station in the testing period. Zhaishang station and Lancun station are located in the lower reaches of the Fenhe reservoir. The runoff time series are affected by human activities at the two stations, including the characteristics of multi-scale changes. These results indicate that many multi-scale components with different characteristics exist in the runoff time series. The decomposition process segregates the multi-scale components from the runoff time series and predicts the components separately, and this can enhance the forecasting performance. The main reason for this improvement is that the EMD-CLSSVM hybrid model is capable of decomposing original runoff series into stationary time series using EMD, which indicates that the EMD-CLSSVM hybrid model significantly outperformed the CLSSVM hybrid model. The models applied to the four hydrological stations have achieved consistent results, which indicates that the EMD-CLSSVM hybrid model has the characteristics of high consistency and great stability. 4. Conclusions If the runoff series possesses a chaotic trait, the traditional forecasting methods cannot perform properly. To improve the accuracy of runoff forecasting, the LSSVM model may be an effective tool for the prediction of chaotic runoff series. Thus, the EMD-CLSSVM hybrid model was investigated for prediction purpose. The EMD-CLSSVM hybrid model was coupled with the EMD technique, chaotic theory, LSSVM, and other forecasting methods. The annual runoff data from Shangjingyou station in Lan River, a tributary of the Fenhe River in China, were used to develop the model, while Fenhe reservoir, Zhaishang, and Lancun in Fenhe River in China were used to verify the proposed method. Firstly, the KPSS method was used to analyze the stationarity of the runoff series. The results showed that the runoff series for four hydrologic stations was nonstationary. Using the EMD technique, the original runoff series were decomposed into several IMFs and one trend term. After the KPSS test, it was found that the decomposed IMFs were stationary. On this basis, the chaotic theory was employed to analyze the chaotic characteristics of several IMFs. It is indicated that the IMF1, IMF2, IMF3, and IMF4 for Shangjingyou and Zhaishang stations have chaotic characteristics. So do the IMF1, IMF2, and IMF3 for Zhaishang and Lancun stations. The remaining IMFs have no chaotic characteristics. Then the CLSSVM model was used to predict the IMF1, IMF2, IMF3, and IMF4 for Shangjingyou and Zhaishang stations, and the IMF1, IMF2, and IMF3 for Zhaishang and Lancun stations; the polynomial method was employed to predict the last IMF, and the GM was applied to predict the trend term. Finally, the prediction result was reconstructed to obtain the prediction value of the original runoff series. The results showed that the EMD-CLSSVM hybrid model has good performance in comparison with the CLSSVM hybrid model. For four hydrological stations, the EMD-CLSSVM hybrid model increased QR with respect to the CLSSVM hybrid model by 30%–60% and reduced RMSE, MARE, and MAE by 10%–85.8%, 22.2%–83.3%, and 19.6%–86.3%, respectively. Although runoff generation is impacted

Water 2017, 9, 153

16 of 17

by many factors, no exogenous variables are considered in this paper. The reason is that a chaotic runoff series carries enough information about the behavior of the system to carry out forecasting. Moreover, this paper focuses on the advantages of the proposed method. However, directly applying the CLSSVM hybrid model for runoff prediction does not produce a better result. The reason is that a nonstationary runoff series limits prediction accuracy. Therefore, the EMD has been introduced for stationarity conduct of the runoff series. The method proposed in this paper can be applied to the remaining reaches of Fenhe River, where data are available. It would also have potential application in other catchments with a similar environment. Acknowledgments: This research was supported by the National Natural Science Foundation of China (Grant Nos. 40901018 and 51509176), the Science and Technology Key Project on Social Development of Shanxi Province (20140313023-4), the Natural Science Foundation of Shanxi Province in China (201601D011054), the Foundation of Taiyuan University of Technology (2015MS011), and the Hydrology Bureau of Shanxi Province (ZNGZ2015-036). Author Contributions: Xuehua Zhao and Yongxin Xu designed and performed the experiments; Xuehua Zhao and Xu Chen wrote the draft of the paper; Yongxin Xu made the corrections; Dongjie Xi analyzed the runoff data; and Yongbo Zhang and Xiuqing Zheng made some comments. Conflicts of Interest: The authors declare no conflict of interest.

References 1. 2. 3. 4. 5. 6.

7. 8. 9. 10. 11. 12. 13. 14. 15. 16.

Yaseen, Z.M.; El-Shafie, A.; Jaafar, O.; Afan, H.A.; Sayl, M.N. Artificial intelligence based models for stream-flow forecasting: 2000–2015. J. Hydrol. 2015, 530, 829–844. [CrossRef] Kisi, O.; Nia, A.M.; Goshen, M.G.; Tajabadi, M.R.J.; Ahmadi, A. Intermittent streamflow forecasting by using several data driven techniques. Water Resour. Manag. 2012, 26, 457–474. [CrossRef] Barge, J.T.; Sharif, H.O. An ensemble empirical mode decomposition, self-organizing map, and linear genetic programming approach for forecasting River streamflow. Water 2016, 8, 247. [CrossRef] Chang, J.X.; Zhang, H.X.; Wang, Y.M.; Zhu, Y.L. Assessing the impact of climate variability and human activities on streamflow variation. Hydrol. Earth Syst. Sci. 2016, 20, 1547–1560. [CrossRef] Chen, C.S.; Liu, C.H.; Su, H.C. A nonlinear time series analysis using two-stage genetic algorithms for streamflow forecasting. Hydrol. Process. 2008, 22, 3697–3711. [CrossRef] Chiew, F.H.S.; Potter, N.J.; Vaze, J.; Petheram, C.; Zhang, L.; Teng, J.; Post, D.A. Observed hydrologic non-stationarity in far south-eastern Australia: Implications for modelling and prediction. Stoch. Environ. Res. Risk Assess. 2014, 28, 3–15. [CrossRef] Islam, M.N.; Sivakumar, B. Characterization and prediction of runoff dynamics: A nonlinear dynamical view. Adv. Water Resour. 2002, 25, 179–190. [CrossRef] Hu, Z.Y.; Zhang, C.; Luo, G.P.; Teng, Z.D.; Jia, C.J. Characterizing cross-scale chaotic behaviors of the runoff time series in an inland river of Central Asia. Quat. Int. 2013, 311, 132–139. [CrossRef] Dhanya, C.T.; Kumar, D.N. Predictive uncertainty of chaotic daily streamflow using ensemble wavelet networks approach. Water Resour. Res. 2011, 47. [CrossRef] Sivakumar, B.; Bemdtsson, R.; Persson, M. Monthly runoff prediction using phase space reconstruction. Hydrol. Sci. J. 2001, 46, 377–387. [CrossRef] Tongal, H. Nonlinear forecasting of stream flows using a chaotic approach and artificial neural networks. Earth Sci. Res. J. 2013, 17, 119–126. Fathima, T.A.; Jothiprakash, V. Behavioural analysis of a time series—A chaotic approach. Sadhana-Acad. Proc. Eng. Sci. 2014, 39, 659–676. [CrossRef] Liu, Q.; Islam, S.; Rodriguez-lturbe, I.; Le, Y. Phase-space analysis of daily streamflow: Characterization and prediction. Adv. Water Resour. 1998, 21, 463–475. [CrossRef] Zhang, Q.; Wang, B.D.; He, B.; Peng, Y.; Ren, M.L. Singular spectrum analysis and ARIMA hybrid model for annual runoff forecasting. Water Resour. Manag. 2011, 25, 2683–2703. [CrossRef] Karthikeyan, L.; Kumar, D.N. Predictability of nonstationary time series using wavelet and EMD based ARMA models. J. Hydrol. 2013, 502, 103–119. [CrossRef] Kisi, O.; Latifoglu, L.; Latifoglu, F. Investigation of empirical mode decomposition in forecasting of hydrological time series. Water Resour. Manag. 2014, 28, 4045–4057. [CrossRef]

Water 2017, 9, 153

17. 18. 19.

20. 21. 22. 23. 24. 25.

26. 27. 28. 29.

30. 31. 32.

17 of 17

Huang, S.Z.; Chang, J.X.; Huang, Q.; Chen, Y.T. Monthly streamflow prediction using modified EMD-based support vector machine. J. Hydrol. 2014, 511, 764–775. [CrossRef] Di, C.L.; Yang, X.H.; Wang, X.C. A four-stage hybrid model for hydrological time series forecasting. PLoS ONE 2014, 9. [CrossRef] [PubMed] Mwale, F.D.; Adeloye, A.J.; Rustum, R. Application of self-organising maps and multi-layer perceptron-artificial neural networks for streamflow and water level forecasting in data-poor catchments: The case of the Lower Shire floodplain, Malawi. Hydrol. Res. 2014, 45, 838–854. [CrossRef] Alvaro, L.R.; Vicente, L.F.; David, P.V.; Joaquin, T.P. One-Day-Ahead Streamflow forecasting using artificial neural networks and a meteorological mesoscale model. J. Hydrol. Eng. 2015, 20, 05015001. Liu, Z.Y.; Zhou, P.; Chen, G.; Guo, L.D. Evaluating a coupled discrete wavelet transform and support vector regression for daily and monthly streamflow forecasting. J. Hydrol. 2014, 519, 2822–2831. [CrossRef] Sudheer, C.; Maheswaran, R.; Panigrahi, B.; Mathur, S. A hybrid SVM-PSO model for forecasting monthly streamflow. Neural Comput. Appl. 2014, 24, 1381–1389. [CrossRef] Shabri, A.; Suhartono, S. Streamflow forecasting using least-squares support vector machines. Hydrol. Sci. J. 2012, 57, 1275–1293. [CrossRef] Samsudin, R.; Saad, P.; Shabri, A. River flow time series using least squares support vector machines. Hydrol. Earth Syst. Sci. 2011, 15, 1835–1852. [CrossRef] Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.L.C.; Shih, H.H.; Zheng, Q.N.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A 1998, 454, 903–995. [CrossRef] Fraser, A.M.; Swinney, H.L. Independent coordinates for strange attractors from mutual information. Phys. Rev. A 1986, 33, 1134–1140. [CrossRef] Kugiumtzis, D. State space reconstruction parameters in the analysis of chaotic time series—The role of the time window length. Phys. D 1996, 95, 13–28. [CrossRef] Cao, L.Y. Practical method for determining the minimum embedding dimension of a scalar time series. Phys. D 1997, 110, 43–50. [CrossRef] Kwiatkowski, D.; Phillips, P.C.B.; Schmidt, P.; Shin, Y.C. Testing the null hypothesis of stationarity against the alternative of a unit-root-How sure are we that economic time series have a unit-root? J. Econometr. 1992, 54, 159–178. [CrossRef] Vallejos, R.O.; Anteneodo, C. Theoretical estimates for the largest lyapunov exponent of many-particle systems. Phys. Rev. E 2002, 66. [CrossRef] [PubMed] Ministry of Water Resources. SL250–2000, Standard for Hydrological Information and Hydrological Forecasting; China Water and Power Press: Beijing, China, 2000. Karunanithi, N.; Grenney, W.J.; Whitley, D.; Bovee, K. Neural networks for river flow prediction. J. Comput. Civ. Eng. ASCE 1994, 8, 201–220. [CrossRef] © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Suggest Documents