Evaluation of a Two-Stage SVM and Spatial Statistics

Water Resour Manage DOI 10.1007/s11269-015-1168-7

Evaluation of a Two-Stage SVM and Spatial Statistics Methods for Modeling Monthly River Suspended Sediment Load Vahid Nourani 1 & Farhad Alizadeh 1 & Kiyoumars Roushangar 1

Received: 20 April 2015 / Accepted: 19 October 2015 © Springer Science+Business Media Dordrecht 2015

Abstract This study aims to model the Suspended Sediment Load (SSL) of the Ajichay River, a quantity of considerable importance in watershed planning and management. A two-stage modeling strategy is proposed to handle the spatio-temporal variation of SSL. At the temporal stage, a Support Vector Machine (SVM) was applied to three stations on the Ajichay River to capture the non-linear behavior of SSL in the time domain, and different input sets were examined via sensitivity analysis. The results of the temporal stage were then used in the spatial stage. There, the empirical semi-variogram of the monthly SSL data was first calculated and a theoretical semi-variogram model was fitted to it; the Gaussian model gave the best fit for the study case. The fitted semi-variogram was imported into a geostatistical tool to estimate SSL at sites without measurements. The temporal stage showed that an input set combining SSL and discharge at lags of 1 and 12 months, used with an RBF-based SVM, gave the best performance at each station. Spatial modeling performance improved when the streamflow dataset was included. The results show that the hybrid of SVM and spatial statistics methods can predict and simulate SSL appropriately by exploiting the strengths of both approaches.

Keywords Suspended sediment load . Support vector machine . Spatial statistics . Ajichay watershed

* Kiyoumars Roushangar [email protected] 1

Faculty of Civil Engineering, University of Tabriz, Tabriz, Iran

V. Nourani et al.

1 Introduction
Suspended sediment load, as a major component of river functioning, has been identified as the leading direct cause of river impairments (USEPA 2000). Estimating suspended loads at different temporal scales therefore remains crucial for river and watershed management issues such as the design of hydraulic structures, pollutant transport, protection of wildlife habitats and environmental impact assessment (Kuhnle and Simon 2000). It is widely recognized that sediment transport varies both spatially and temporally at the watershed scale, which has made it a subject of interest for engineers and geologists (e.g., Wilkinsona et al. 2014; Liu et al. 2015; Roushangar et al. 2011). Roushangar et al. (2011) described a mathematical model to solve the 1D unsteady flow over a movable bed and assessed sediment transport in a coarse bed river. Although suspended sediment load may be estimated using a variety of physically based models (e.g., de Vente et al. 2006; Liu et al. 2015), the development of artificial intelligence techniques as predictors of hydrological phenomena has markedly changed prediction practice (e.g., Cobaner et al. 2009). Black box models have been used successfully in various fields, including water resources (Kisi et al. 2008). Recent experience with hydrological forecasting has shown that Artificial Intelligence (AI) can be a suitable alternative for predicting hydraulic and hydrological properties (Partal and Cigizoglu 2008; Kisi et al. 2008; Zhu et al. 2007; Afan et al. 2015; Roushangar et al. 2014a, b). Roushangar et al. (2014b) developed new formulas for predicting total sediment load using AI-based and theoretical methods. The Support Vector Machine (SVM), first proposed by Vapnik and Cortes (1995), is one of the most remarkable of these AI modeling tools.
SVM is based on the structural risk minimization principle and the Vapnik–Chervonenkis dimension theory. It involves solving a quadratic programming problem, and can therefore theoretically reach the global optimum of the primal problem (Vapnik and Cortes 1995). In recent decades, SVM has been applied in several hydrological fields (e.g., Noori et al. 2011; Sun et al. 2012; Ch et al. 2013), including suspended sediment estimation (Misra et al. 2009; Kakaei et al. 2013; Nourani and Andalib 2015). Kakaei et al. (2013) investigated the capability of Artificial Neural Network (ANN) and SVM models for prediction of daily SSL. They showed that ANN and SVM models linked to the Gamma Test for input selection could lead to better efficiency than the regression combination. Nourani and Andalib (2015) verified the efficiency of the wavelet-based Least Square Support Vector Machine (WLSSVM) for prediction of daily and monthly SSL of the Mississippi River. Their results showed that, for daily SSL, LSSVM outperformed the ad hoc ANN model, whereas for monthly SSL the ANN was slightly more accurate. The WLSSVM and wavelet-based ANN (WANN) models performed similarly for daily SSL but differently for monthly SSL.

Geostatistical methods such as Kriging and CoKriging have been extensively employed in hydrological modeling (Matheron 1963). Many hydrologists have applied geostatistical tools for estimation and simulation of hydrological processes (Rouhani and Hall 1989; Syed et al. 2003; Moulin et al. 2009). In all such works, the hydrological variable of interest was primarily analyzed in either the time or the space domain. The data are usually poor in space but rich in time (Rouhani and Hall 1989). Rouhani and Hall (1989) attempted to extend Kriging to the time-space domain in order to study the phenomena of interest as spatio-temporal variables. Nourani et al. (2010) proposed a hybrid model for spatio-temporal forecasting of groundwater level in coastal aquifers. The basic idea of combining models in forecasting is to use each model's unique features to capture distinct patterns in the data. Both theoretical and empirical findings suggest that combining different methods can be an efficient way to improve forecasting performance.

The principal objective of this study is the spatio-temporal modeling of SSL, applied to a case study of the Ajichay River in northwest Iran. To attain this goal, data from three hydrometric stations located on the Ajichay River were first used for temporal modeling of SSL with SVM. The spatial variation of SSL was then monitored by applying a geostatistical method.

2 Materials and Methods
2.1 Support Vector Machine
SVM is a powerful methodology for solving problems in non-linear classification, function estimation and density estimation (Kumar and Kar 2002). It uses a linear model to separate sample data after a non-linear mapping of the input vectors into a high-dimensional feature space, according to the structural risk minimization (SRM) principle. The most important principle of SVM is that it minimizes an upper bound on the generalization error instead of minimizing the training error. On this basis, SVM can achieve an optimum network structure. Many studies (Vapnik 1998; Guo et al. 2011) have described the theory of SVM in detail, so only a brief description is given here. The basic idea of SVM for regression is to map the original data $x$ non-linearly into a high-dimensional feature space and then to perform a linear regression in that feature space. Given a set of training data $\{(x_i, d_i)\}_{i=1}^{N}$ ($x_i$ is the input vector, $d_i$ is the actual value and $N$ is the total number of data patterns), the general SVM regression function is (Wang et al. 2013):

$$y = f(x) = w \cdot \varphi(x) + b \tag{1}$$

where $\varphi(x)$ represents the high-dimensional feature space, which is non-linearly mapped from the input space $x$, and $w$ and $b$ are the weight vector and bias term, respectively (Vapnik 1998). $w$ and $b$ can be estimated by minimizing the error function (Eq. (2)) after introducing the positive slack variables $\xi_i$ and $\xi_i^*$ (Wang et al. 2013):

$$\text{Minimize: } \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \left(\xi_i + \xi_i^*\right)$$

$$\text{Subject to: } \begin{cases} w \cdot \varphi(x_i) + b - d_i \le \varepsilon + \xi_i, & i = 1, 2, \ldots, N \\ d_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i^*, & i = 1, 2, \ldots, N \\ \xi_i, \xi_i^* \ge 0, & i = 1, 2, \ldots, N \end{cases} \tag{2}$$
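As a small illustration of Eq. (2), the sketch below evaluates the primal objective for given residuals. It is only a minimal sketch: the function names are hypothetical, the residual is assumed to be f(x_i) - d_i, and the slack variables are taken at their smallest feasible values, which is where the minimization places them.

```python
def epsilon_insensitive_slacks(residual, eps):
    """Smallest feasible slacks for one residual = f(x_i) - d_i.

    xi is non-zero when the prediction lies above the eps-tube,
    xi_star when it lies below it (hypothetical helper).
    """
    xi = max(0.0, residual - eps)
    xi_star = max(0.0, -residual - eps)
    return xi, xi_star


def primal_objective(w, residuals, C, eps):
    """Value of 0.5 * ||w||^2 + C * sum(xi_i + xi_i*) from Eq. (2)."""
    regularization = 0.5 * sum(wj * wj for wj in w)
    slack_total = 0.0
    for r in residuals:
        xi, xi_star = epsilon_insensitive_slacks(r, eps)
        slack_total += xi + xi_star
    return regularization + C * slack_total


if __name__ == "__main__":
    # Residuals inside the tube (|r| <= eps) contribute no penalty.
    print(primal_objective([3.0, 4.0], [0.05, 0.3, -0.5], C=10.0, eps=0.1))
```

A larger C makes the slack term dominate, which matches the role of C as the trade-off between empirical error and regularization.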

where $\frac{1}{2}\|w\|^2$ is the weight vector norm and $C$ is the regularization constant, which determines the trade-off between the empirical error and the regularization term. Increasing the value of $C$ increases the relative importance of the empirical risk with respect to the regularization term. $\varepsilon$ is called the tube size and is equivalent to the approximation accuracy placed on the training data points. Both $C$ and $\varepsilon$ are user-determined parameters. By introducing the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$, the above optimization problem is transformed into a dual quadratic optimization problem. After the quadratic optimization problem with inequality constraints is solved, the parameter vector $w$ in Eq. (1) can be obtained (Wang et al. 2013):

$$w^* = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^*\right) \varphi(x_i) \tag{3}$$

Therefore, the SVR regression function is obtained as Eq. (4) (Wang et al. 2013):

$$f\left(x, \alpha_i, \alpha_i^*\right) = \sum_{i=1}^{N} \left(\alpha_i - \alpha_i^*\right) K(x, x_i) + b \tag{4}$$

Here, $K(x, x_i)$ is called the kernel function. Its value is the inner product of the two vectors $x$ and $x_i$ in the feature space, so $K(x, x_i) = \varphi(x) \cdot \varphi(x_i)$, and any function that satisfies Mercer's condition (Vapnik 1998) can be used as a kernel function. Common kernel functions include the linear, polynomial and radial basis function (RBF) kernels. The most widely used kernel function is the RBF (Wang et al. 2013):

$$K(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right) \tag{5}$$
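Eqs. (4) and (5) can be read together as a short prediction routine. The sketch below is illustrative only: the function names and the toy multipliers are assumptions, and the multipliers would in practice come from solving the dual problem, which is not shown here.

```python
import math


def rbf_kernel(x, xi, sigma):
    """RBF kernel of Eq. (5): exp(-||x - xi||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, alpha, alpha_star, b, sigma):
    """Regression function of Eq. (4) for already-known multipliers."""
    total = 0.0
    for sv, a, a_star in zip(support_vectors, alpha, alpha_star):
        total += (a - a_star) * rbf_kernel(x, sv, sigma)
    return total + b


if __name__ == "__main__":
    # Toy values chosen for illustration, not taken from the study.
    svs = [[0.0], [1.0]]
    print(svr_predict([0.0], svs, alpha=[1.0, 0.0],
                      alpha_star=[0.0, 0.5], b=0.1, sigma=1.0))
```

Only training points with a non-zero difference between the two multipliers contribute to the sum, which is why they are called support vectors.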

The RBF kernel has been reported as the best choice among kernel functions (Dibike et al. 2001; Noori et al. 2009). Figure 1 illustrates the general structure of the SVM for predicting SSL.

Fig. 1 Scheme of SVM proposed by Vapnik (1998)


2.2 Spatial Statistics
Hydrological processes often show high variability in both the time and space domains (Webster and Oliver 2007). The study of regionalized variables starts from the ability to interpolate a given field from a limited number of observations while preserving the theoretical spatial correlation. This is accomplished by means of geostatistical tools, which spatially interpolate physical quantities from a number of spatially distributed measurements. The geostatistical interpolation method used in this paper is briefly introduced here. A detailed presentation of geostatistical theory can be found in Cressie (1991), Goovaerts (1997) and Webster and Oliver (2007). Spatial interpolation is generally carried out by estimating a regionalized value at un-sampled points as a weighted sum of observed regionalized values. The general formula for spatial interpolation is as follows (Ly et al. 2011):

$$Z_g = \sum_{i=1}^{n_s} \lambda_i Z(u_{s_i}) \tag{6}$$

where $Z_g$ is the interpolated value at point $g$, $Z(u_{s_i})$ is the observed value at station $i$, $n_s$ is the total number of observed points (stations) and $\lambda_i$ is the weight that station $i$ contributes to the interpolation. The challenge is to calculate the weights used in the interpolation. Geostatistical methods use the semi-variogram as a tool to characterize the spatial dependence of the property of interest and to calculate $\lambda_i$. The spatial structure of SSL depends mainly on basin properties such as topography, streamflow, and spatial and temporal scale, and can be dynamic in space and time (Tayfur et al. 2003). The SSL recorded at a given station is treated as a lumped value, and this holds for every station in the watershed. For two neighboring stations, the SSL response between them can be taken as the difference between their SSL values. Extending this to all stations in the watershed, the net difference in SSL between the watershed inlet and its outlet is recovered from the values recorded at all stations. In order to measure the spatial variability of a regionalized variable $Z$, and assuming the variable is stationary, the traditional experimental semi-variogram is calculated as follows (Ly et al. 2011):

$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left(Z(u_{s_i}) - Z(u_{s_i} + h)\right)^2 \tag{7}$$
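The experimental semi-variogram of Eq. (7) can be sketched for scattered stations as follows. This is a minimal sketch under stated assumptions: station pairs are grouped into distance bins of a chosen width (the binning scheme and the function name are not from the paper), and direction is ignored.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Binned version of Eq. (7).

    Returns a list of (lag centre, gamma, number of pairs) for each
    distance bin that contains at least one station pair.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)  # index of the distance bin
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]


if __name__ == "__main__":
    # Three collinear toy stations, 1 distance unit apart.
    stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    ssl = [1.0, 2.0, 4.0]
    for lag, gamma, pairs in empirical_semivariogram(stations, ssl, 1.0, 3):
        print(lag, gamma, pairs)
```

With only a handful of stations the number of pairs per bin is small, so the bin width has to be chosen with care.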

where $N(h)$ is the number of data pairs located a distance vector $h$ apart. A theoretical model must be fitted in order to deduce semi-variogram values for any lag $h$ required by the interpolation algorithms. The variogram measures dissimilarity, i.e. the increase in variance (decrease in correlation) between points as a function of distance, and so helps assess how values at different locations vary with distance. Most previous studies have used only one theoretical model for each time step (Goovaerts 2000; Lloyd 2005). This paper, however, focuses on monthly data over 21 years and fits the semi-variogram for every month. In order to find the best fit, three existing theoretical models, the Gaussian, Spherical and Exponential models, were employed. Each of these models is combined with a nugget effect. The coefficients of the chosen model were then used to determine the weights through the equation systems of two types of Kriging, Ordinary Kriging (ORK) and CoKriging (CK) (Ly et al. 2011).
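To make the link between a fitted variogram model and the weights of Eq. (6) concrete, the sketch below writes the three models with a nugget effect and solves an ordinary kriging system. The paper does not give its model equations, so this assumes the common practical-range parameterization (factor 3 in the exponential and Gaussian models); the function names and the small Gaussian-elimination solver are illustrative and do not reproduce the geostatistical tool used in the study.

```python
import math


def spherical(h, nugget, partial_sill, rng):
    if h == 0.0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, partial_sill, rng):
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))


def gaussian(h, nugget, partial_sill, rng):
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * (h / rng) ** 2))


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, model, **params):
    """Return (estimate, weights); the weights are forced to sum to 1."""
    n = len(coords)
    a = [[model(math.dist(coords[i], coords[j]), **params) for j in range(n)]
         + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint row
    rhs = [model(math.dist(c, target), **params) for c in coords] + [1.0]
    weights = solve_linear(a, rhs)[:n]  # last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights


if __name__ == "__main__":
    stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    ssl = [0.2, 0.4, 0.6]
    print(ordinary_kriging(stations, ssl, (5.0, 5.0), exponential,
                           nugget=0.0, partial_sill=1.0, rng=20.0))
```

CoKriging extends the same system with cross-variograms for the secondary variable (streamflow here); that extension is not sketched.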


2.3 Study Area and Used Data
The monthly streamflow and suspended sediment load datasets used in this study (Table 1) come from the hydrometric stations of the Ajichay watershed (Fig. 2), which is located in Azerbaijan province in northwest Iran (between 45°30′ and 47°45′ east longitude and 37°45′ and 38°30′ north latitude). The watershed area is 10,853 km² and covers 25 % of the Urmia Lake basin. Watershed elevation varies between 1228 m and 3755 m above sea level. River discharge is approximately 40.6 m³/s, and the river drains into Urmia Lake, which makes it important for the survival of the lake. The statistical parameters of the SSL data, such as the mean, standard deviation, maximum and minimum values, are given in Table 1. In the temporal modeling stage, the data sets were divided into two parts for calibration and verification: the first 70 % of the data formed the training set and the remaining 30 % was used for testing. In the spatial estimation stage, 21 years of monthly data from the different stations were employed (except for Arzanagh station, which covers 12 years of monthly SSL). Data of each station were normalized between 0 and 1.
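The preprocessing described above can be sketched as follows. The paper does not state the normalization formula or how the 70/30 split was drawn, so this sketch assumes min-max scaling over the full record and a chronological split; both function names are hypothetical.

```python
def min_max_normalize(series):
    """Scale a series linearly to the interval [0, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]  # constant series: no spread to scale
    return [(v - lo) / (hi - lo) for v in series]


def chronological_split(series, train_fraction=0.7):
    """First part for calibration (training), remainder for verification."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


if __name__ == "__main__":
    monthly_ssl = [2.0, 4.0, 6.0, 3.0, 5.0, 2.5, 4.5, 6.0, 3.5, 2.0]  # toy data
    scaled = min_max_normalize(monthly_ssl)
    train, test = chronological_split(scaled)
    print(len(train), len(test))
```

Keeping the split chronological avoids using future months to predict past ones, which matters for lagged inputs.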

2.4 Proposed Spatio-Temporal Model
The research in this paper is separated into two parts, the temporal stage and the spatial stage. The temporal stage investigates seasonal processes and predicts monthly SSL at three stations located on the main branch of the Ajichay River by means of an SVM. These stations were selected because their monthly SSL data form continuous and complete time series. The spatial stage consists of SSL monitoring with geostatistical tools. At this stage, spatial modeling uses the output of temporal modeling and fills the data gap at Arzanagh station (9 years). The simultaneous use of SVM and geostatistical tools for spatio-temporal modeling can be considered an advantage of this study. The modeling strategy was performed as follows:

I. An SVM was trained for three stations located on the Ajichay River for temporal modeling of the SSL. The model predicts the current-month SSL (SSLt) of each station from streamflow at one, two and twelve months earlier (Qt-1, Qt-2, Qt-12) and SSL at one, two and twelve months earlier (SSLt-1, SSLt-2, SSLt-12); the twelve-month lags are included to handle the seasonality of the process. A sensitivity analysis was performed to select the dominant input parameters from the available data, and 5 combinations of streamflow and SSL values were designated as the input vector of the SVM to predict monthly SSL. The selected input combinations are as follows:

Comb. 1: SSLt-1
Comb. 2: SSLt-1, SSLt-2
Comb. 3: SSLt-1, Qt-1
Comb. 4: SSLt-1, SSLt-12
Comb. 5: SSLt-1, SSLt-12, Qt-1, Qt-12

In all cases, t represents the current time step. The output layer consisted of only one variable, the SSL at the current time step (SSLt). Each combination places different lagged values of river discharge and SSL in the input vector of the SVM model to predict the single SSL value one month ahead, at time t (SSLt).
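The five input combinations can be assembled from the two monthly series as lagged samples. The sketch below is illustrative: the dictionary layout and the function name are assumptions, while the lags themselves follow Combs. 1 to 5 as listed above.

```python
# (series name, lag in months) pairs for each input combination
COMBINATIONS = {
    1: [("SSL", 1)],
    2: [("SSL", 1), ("SSL", 2)],
    3: [("SSL", 1), ("Q", 1)],
    4: [("SSL", 1), ("SSL", 12)],
    5: [("SSL", 1), ("SSL", 12), ("Q", 1), ("Q", 12)],
}


def build_samples(ssl, q, comb):
    """Return (X, y): lagged inputs and the target SSL_t for each month t."""
    lags = COMBINATIONS[comb]
    max_lag = max(lag for _, lag in lags)
    series = {"SSL": ssl, "Q": q}
    inputs, targets = [], []
    for t in range(max_lag, len(ssl)):  # first usable month needs all lags
        inputs.append([series[name][t - lag] for name, lag in lags])
        targets.append(ssl[t])
    return inputs, targets


if __name__ == "__main__":
    ssl = [100 + i for i in range(24)]  # toy series, two years of months
    q = list(range(24))
    X, y = build_samples(ssl, q, comb=5)
    print(len(X), X[0], y[0])
```

Combinations with a 12-month lag lose the first year of each record as targets, which shortens the training set slightly compared with Combs. 1 to 3.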

Table 1 Statistical analysis of observed data in hydrometric stations

Stations (15): Mehrban, Mirkuh, Sade nahand, Pole sinikh, Sahzab, Saeed abad, Vanyar, Khaje, Merkid, Bostan abad, Arzanagh, Anakhatun, Hervi, Akhola, Harzevarz (the station-to-row correspondence is not preserved in the source layout; rows below follow the order of the numeric columns)

| Row | X(UTM) (m) | Y(UTM) (m) | Mean (ton/day) | Min (ton/day) | Max (ton/day) | Variance | Standard deviation (ton/day) | Skewness coefficient | Data duration (years) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 685,679.6 | 4,219,729.8 | 891.8398 | 0.00 | 7170.00 | 1,815,572 | 1347.4320 | 2.207 | 31 |
| 2 | 719,410.9 | 4,205,148.4 | 33.5571 | 0.00 | 165.02 | 1062 | 32.59194 | 1.259 | 21 |
| 3 | 628,393 | 4,206,875.6 | 27.42416 | 0.059 | 217.950 | 563 | 23.745665 | 2.996 | 21 |
| 4 | 600,716.7 | 4,226,781.4 | 31.2107 | 0.00 | 231.50 | 1769 | 42.06311 | 2.032 | 21 |
| 5 | 718,096.1 | 4,230,871.9 | 30.0342 | 0.03 | 196.36 | 1636 | 40.44833 | 1.714 | 21 |
| 6 | 639,048.2 | 4,212,463.7 | 22.6293 | 0.37 | 238.80 | 1130 | 33.62415 | 2.690 | 21 |
| 7 | 625,646.3 | 4,215,327.2 | 24.6029 | 0.00 | 244.50 | 2224 | 47.16309 | 2.386 | 21 |
| 8 | 637,272.6 | 4,223,617.5 | 30.5981 | 0.00 | 247.90 | 1665 | 40.80547 | 2.051 | 21 |
| 9 | 659,143.2 | 4,225,866.7 | 521.2674 | 0.00 | 4438.42 | 650,816 | 806.73222 | 2.408 | 21 |
| 10 | 661,295.9 | 4,230,157 | 32.13454 | 0.00 | 190.450 | 1273 | 35.689796 | 2.019 | 21 |
| 11 | 630,383.5 | 4,197,606.4 | 18.2898 | 0.00 | 143.18 | 773 | 27.81883 | 2.179 | 21 |
| 12 | 696,092.8 | 4,190,756.1 | 27.2531 | 0.00 | 147.95 | 1108.024 | 33.28700 | 1.545 | 21 |
| 13 | 686,801.4 | 4,208,168.6 | 286.8670 | 0.00 | 2327.80 | 239,267 | 489.15014 | 2.457 | 12 |
| 14 | 612,445.6 | 4,223,236 | 31.2107 | 0.00 | 231.50 | 1769 | 42.06311 | 2.032 | 21 |
| 15 | 590,704.5 | 4,208,168 | 1183.2654 | 0.00 | 16,030.00 | 3,208,055 | 1791.1044 | 3.602 | 31 |


Fig. 2 Location of hydrometric stations and digital elevation model (DEM) of the Ajichay watershed, East Azerbayjan, Iran

II. At this stage, the results of temporal modeling for the three stations were fed, together with the data available from the other stations, into a calibrated geostatistical model in order to monitor SSL at any desired point on the Ajichay River. Following this procedure, the data gap at Arzanagh station was filled with the output of spatial modeling at each time step. Finally, the accuracy of the spatial model was evaluated via cross-validation.

3 Results and Discussion
3.1 Temporal Stage
First, an SVM model was trained to predict one-step-ahead SSL from the streamflow and SSL datasets. The SVM model selected in this research employs an RBF kernel, because RBF kernels tend to give better performance under general smoothness assumptions and have fewer tuning parameters than the polynomial and sigmoid kernels (Noori et al. 2011). The architecture of the SVM model is organized according to the history of the streamflow and SSL processes, because streamflow and SSL time series usually behave as Markovian processes, so that the value at the current time step is related to the conditions at previous time steps. The input vector contains only the most significant time lags, as defined in Section 2.4. The input combinations are fed into the SVM to estimate SSL at the three stations located on the Ajichay River. Combs. 4 and 5 are included because the current-month SSL may correlate well with the SSL of the same month in the previous year, owing to the seasonality of the process at the monthly scale. To obtain a sound one-step-ahead prediction of SSL, the input vectors are formed so that all relevant information on the target data is used.


Since SVM may be very sensitive to the choice of parameters, it is important to check a range of parameter combinations, at least on a reasonable subset of the data. The RBF kernel is preferred because of its good general performance, its accuracy and the small number of parameters to tune (γ, σ). In this study, small parameter values were selected by grid search. For each input combination, the RBF-kernel parameters of the SVM were tuned to achieve the highest performance, and the trained SVM was then verified. The performance of SVM modeling is evaluated by two global statistics, the determination coefficient (DC) and the root mean squared error (RMSE) (Nourani and Andalib 2015). The model performance results for the training and testing steps are given in Table 2. As is clear from Table 2, input Combs. 1 to 3 led to poor results for the Akhola station: their DC values are relatively low and their RMSE values relatively high, which could be attributed to weak Markovian behavior and low correlation of the selected data at different time lags. Combs. 4 and 5, which involve the seasonality of the SSL process (with a one-year period), yielded better results than the other combinations, and comparing the two shows that Comb. 5 performed better. Table 2 also shows that input Combs. 1 through 3 gave poor SVM performance for Merkid station, whereas Combs. 4 and 5 gave better outputs. Comb. 5, which integrates SSL with streamflow data, led to the best performance among the input combinations. For Vanyar station, Comb. 5 likewise gave the best result. For the Akhola, Merkid and Vanyar stations, Comb. 5 gave DC values of 0.9, 0.91 and 0.91 and RMSE values of 0.011, 0.017 and 0.015, respectively, in the calibration period, the best among all selected combinations.
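The two evaluation statistics can be sketched as below. The paper cites Nourani and Andalib (2015) for DC and RMSE without restating the formulas, so this sketch assumes the usual definitions: RMSE as the root of the mean squared error, and DC as one minus the ratio of the squared-error sum to the variance sum of the observations.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def determination_coefficient(observed, predicted):
    """DC = 1 - sum((o - p)^2) / sum((o - mean(o))^2) (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


if __name__ == "__main__":
    obs = [0.10, 0.25, 0.40, 0.30]   # toy normalized SSL values
    pred = [0.12, 0.22, 0.35, 0.33]
    print(rmse(obs, pred), determination_coefficient(obs, pred))
```

Under this definition a DC of 1 means a perfect fit and a DC of 0 means the model is no better than predicting the observed mean.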
The time series of measured and predicted SSL obtained with the SVM model for input Comb. 5 (calibration and verification periods) are illustrated in Fig. 3, together with scatter plots of the SSL predictions versus observed SSL. In the scatter plots, vertical axes represent observed values of SSL and horizontal axes represent calculated values of SSL. Each scatter plot includes a trend line to show the performance of the combination. According to the statistical evaluations, the SVM performed well with input Comb. 5 at all three stations.

Table 2 Results of SVM for different input variables in monthly Suspended Sediment Load (SSL) predictions

| Station | Input vectors | Optimized SVM structure (RBF-Kernel structure (γ, σ)) | Calibration RMSE (Normalized) | Calibration DC | Verification RMSE (Normalized) | Verification DC |
|---|---|---|---|---|---|---|
| Akhola | Comb. 1 | 0.1, 0.1 | 0.07 | 0.5 | 0.13 | 0.37 |
| | Comb. 2 | 0.1, 0.1 | 0.065 | 0.52 | 0.15 | 0.38 |
| | Comb. 3 | 0.18, 0.14 | 0.05 | 0.63 | 0.093 | 0.51 |
| | Comb. 4 | 0.27, 0.24 | 0.037 | 0.67 | 0.093 | 0.39 |
| | Comb. 5 | 0.27, 0.24 | 0.011 | 0.9 | 0.049 | 0.74 |
| Merkid | Comb. 1 | 0.1, 0.1 | 0.1 | 0.49 | 0.17 | 0.35 |
| | Comb. 2 | 0.1, 0.1 | 0.095 | 0.5 | 0.14 | 0.38 |
| | Comb. 3 | 0.25, 0.24 | 0.063 | 0.67 | 0.1 | 0.55 |
| | Comb. 4 | 0.35, 0.28 | 0.06 | 0.62 | 0.092 | 0.56 |
| | Comb. 5 | 0.35, 0.28 | 0.017 | 0.91 | 0.063 | 0.83 |
| Vanyar | Comb. 1 | 0.1, 0.1 | 0.08 | 0.49 | 0.15 | 0.32 |
| | Comb. 2 | 0.1, 0.1 | 0.9 | 0.55 | 0.12 | 0.36 |
| | Comb. 3 | 0.25, 0.75 | 0.045 | 0.6 | 0.1 | 0.56 |
| | Comb. 4 | 0.28, 0.65 | 0.04 | 0.62 | 0.094 | 0.58 |
| | Comb. 5 | 0.28, 0.65 | 0.015 | 0.91 | 0.049 | 0.76 |

RMSE = Root mean squared error; DC = Determination Coefficient

(A) Normalized observed and predicted SSL for Akhola station.
(B) Scatter plot for input combination of Akhola station.
(C) Normalized observed and predicted SSL for Merkid station.
Fig. 3 Results of temporal modeling

(D) Scatter plot for input combination of Merkid station.
(E) Normalized observed and predicted SSL for Vanyar station.
(F) Scatter plot for input combination of Vanyar station.
Fig. 3 (continued)

3.2 Spatial Stage
The spatial modeling procedure is performed in two steps. The first step, variography, measures the similarity or dissimilarity of the spatial data structure; its outcome is the set of calculated weights for the point data. The second step applies the obtained weights in the Kriging equations (e.g., Kriging, CoKriging) in order to perform the spatial estimation. For each monthly time step, the empirical semi-variogram is calculated and three theoretical semi-variograms are then fitted in order to obtain a continuous variogram. The center of the map corresponds to the origin of the variogram, γ(h) = 0, for every direction. Semi-variance increased with separation distance, which indicates that two stations close to each other are more similar, and hence their squared difference is smaller, than stations that are farther apart. The results of these models are given in Table 3.

Table 3 Variography results

| Theoretical model | ORK (univariate): Nugget effect | ORK: Partial sill | ORK: Range (Km) | ORK: Selected model | CK (bivariate): Nugget effect | CK: Partial sill | CK: Range (Km) | CK: Selected model |
|---|---|---|---|---|---|---|---|---|
| Gaussian | 5.4E-6 | 6.5E-3 | 45.8 | 192 | 3.2E-8 | 6.5E-3 | 42 | 184 |
| Spherical | 4E-5 | 7.5E-5 | 47.1 | 4 | 8E-7 | 8.8E-7 | 45 | 36 |
| Exponential | 3E-6 | 1.4E-4 | 46.2 | 56 | 3.7E-8 | 3.4E-4 | 45.2 | 32 |

Km = Kilometers

For the parameters of the theoretical models in Table 3 (e.g., nugget effect, partial sill, range), the smallest values are the desired ones. On this basis, and given the values obtained for these parameters, the Gaussian model performed better than the other models. The geostatistical model with the least RMSE is selected by comparing the observed SSL and streamflow values with the values estimated by the variogram models. After the spatial modeling procedure, the output of each time interval was used to simulate the missing values of Arzanagh station. Figure 4 shows the simulation results for both ORK and CK.

Fig. 4 Simulated values for Arzanagh station

According to the results, CK performed better than ORK, and its simulated values are closer to the observed values. The application of ORK and CK provided some insights into the strengths, weaknesses and applicability of the statistical methods for modeling monthly SSL using data from 15 stations located in the Ajichay watershed. Geostatistics was able to produce 21 years of monthly SSL along the river. A clear superiority was found for the Gaussian model: the "Selected model" column of Table 3 lists 192 for the Gaussian model with ORK and 184 with CK, against 4 to 56 for the other two models. At time intervals when the Gaussian model failed to give a proper semi-variogram, one of the other models replaced it to avoid inappropriate values. The calibrated ORK and CK estimators were then verified via cross-validation. Cross-validation is a process for checking the compatibility between a set of data, the spatial model and the neighborhood design. In cross-validation, each point in the spatial model is individually removed from the model, and its value is then estimated by a covariance model (Isaaks and Srivastava 1989). The results of cross-validation are shown in Table 4. CoKriging, with a DC of 0.83 and an RMSE of 0.0242, outperformed Kriging, which means that using the secondary variable (streamflow) reduced the error of spatial modeling. The results show acceptable performance of the geostatistical tools in the spatial simulation of monthly SSL.
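The leave-one-out procedure described above can be sketched generically. To keep the example short and self-contained, inverse-distance weighting stands in for the Kriging estimator; this is a swapped-in technique for illustration, not the estimator used in the study, and all names are hypothetical.

```python
import math


def idw_estimate(coords, values, target, power=2.0):
    """Inverse-distance-weighted estimate (stand-in for a Kriging estimator)."""
    numerator = 0.0
    denominator = 0.0
    for c, v in zip(coords, values):
        d = math.dist(c, target)
        if d == 0.0:
            return v  # target coincides with a station
        weight = 1.0 / d ** power
        numerator += weight * v
        denominator += weight
    return numerator / denominator


def leave_one_out(coords, values, estimator):
    """Remove each station in turn and estimate it from the remaining ones."""
    predictions = []
    for i in range(len(coords)):
        rest_coords = coords[:i] + coords[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        predictions.append(estimator(rest_coords, rest_values, coords[i]))
    return predictions


if __name__ == "__main__":
    stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # toy layout
    ssl = [1.0, 2.0, 3.0]
    print(leave_one_out(stations, ssl, idw_estimate))
```

The predictions can then be compared with the withheld observations using RMSE and DC, as reported for Kriging and CoKriging in Table 4.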

Table 4 Performance of geostatistics

| Model | RMSE (Normalized) | DC |
|---|---|---|
| Kriging | 0.0258 | 0.79 |
| CoKriging | 0.0242 | 0.83 |


The proposed two-stage SSL modeling, which employs a distinct tool at each stage, led to desirable outcomes according to Tables 3 and 4. These results are applicable in watershed management, since both the temporal and spatial models provide the desired values and features at each time step.

4 Summary and Conclusions
Suspended sediment load (SSL), as a spatio-temporal phenomenon, is a 'watershed wide' measure of soil erosion, transport and deposition. Until now, the suitability of mapping methods for SSL has not been comprehensively established. Because of the complexity of assessing net upstream and downstream sediment yield, a two-stage spatio-temporal model was proposed in this study for SSL modeling. The temporal stage consisted of modeling three stations located from upstream to downstream on the Ajichay River. One-step-ahead estimation of SSL was performed via SVM, with the inputs chosen by sensitivity analysis. The SVM model used an RBF kernel, selected because of its reliable performance in comparison with other kernel functions. The optimized inputs were determined by a sensitivity analysis based on the Markovian property of the monthly dataset. The sensitivity analysis showed that input combination 5, which takes the seasonality of the process into account, gave the best outcome, with DC = 0.9, 0.91 and 0.91 and RMSE = 0.011, 0.017 and 0.015 for the Akhola, Merkid and Vanyar stations, respectively, in the calibration period. At the spatial stage, the spatial statistical method used a monthly variogram model chosen from the three variogram models to avoid negative SSL results. Geostatistical methods (Ordinary Kriging and CoKriging) were employed to simulate the spatial variation of SSL. CK could incorporate streamflow as a secondary variable in the modeling framework. Both the ORK and CK methods produced successful estimates of monthly SSL for the Ajichay watershed. Among the three variogram models used, the Gaussian model gave the best fit and is recommended for the spatial interpolation of monthly SSL. CK was considered the better method, since it gave better RMSE and DC values in cross-validation.
The proposed spatio-temporal model, employing SVM and a spatial statistical method, could handle the complexity of SSL modeling in both the space and time domains. Given the large size of the study area and the complexity of the SSL process, and in order to obtain more precise modeling results, it is suggested to use an integrated geomorphological machine learning tool such as SVM coupled with a spatial clustering approach (such as the Self Organizing Map) for multi-station modeling of SSL.

Acknowledgments This paper is supported by the University of Tabriz and East Azerbaijan regional water company.

References Afan HA, El-Shafie A, Yaseen ZM, Hameed MM, Mohtar WHMW, Hussain A (2015) ANN based sediment prediction model utilizing different input scenarios. Water Resour Manag 29(4):1231–1245 Ch S, Anand N, Panigrahi BK (2013) Streamflow forecasting by SVM with quantum behaved particle swarm optimization. Neurocomputing 101:18–23

Cobaner M, Unal B, Kisi O (2009) Suspended sediment concentration estimation by an adaptive neuro-fuzzy and neural network approaches using hydro-meteorological data. J Hydrol 367(1–2):52–61
Cressie N (1991) Statistics for spatial data. Wiley, New York
de Vente J, Poesen J, Bazzoffi P, Van Rompaey A, Verstraeten G (2006) Predicting catchment sediment yield in Mediterranean environments: the importance of sediment sources and connectivity in Italian drainage basins. Earth Surf Processes Land 31:1017–1034
Dibike YB, Velickov S, Solomatine DP, Abbott MB (2001) Model induction with support vector machines: introduction and applications. J Comput Civ Eng 15:208–216
Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York
Goovaerts P (2000) Geostatistical approaches for incorporating elevation into the spatial interpolation of rainfall. J Hydrol 228:113–129
Guo J, Zhou J, Qin H (2011) Monthly streamflow forecasting based on improved support vector machine model. Expert Syst Appl 38:13073–13081
Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, New York
Kakaei E, Moghddamnia A, Ahmadi A (2013) Daily suspended sediment load prediction using artificial neural networks and support vector machines. J Hydrol 478:50–62
Kisi O, Haktanir T, Ardiclioglu M, Ozturk O, Yalcin E, Uludag S (2008) Adaptive neuro-fuzzy computing technique for suspended sediment estimation. Adv Eng Softw 40(6):438–444
Kuhnle RA, Simon A (2000) Evaluation of Sediment Transport Data for Clean Sediment TMDLs. National Sedimentation Laboratory, USDA Agricultural Research Service, Oxford, Mississippi. Evaluation of Sediment Transport Data. NSL Report No. 17
Kumar M, Kar IN (2002) Non-linear HVAC computations using least square support vector machines. Energ Convers Manage 50:1411–1418
Liu Y, Yang W, Yu Z, Lung I, Gharabaghi B (2015) Estimating sediment yield from upland and channel erosion at a watershed scale using SWAT. Water Resour Manag 29(5):1399–1412
Ly S, Charles C, Degré A (2011) Geostatistical interpolation of daily rainfall at catchment scale: the use of several variogram models in the Ourthe and Ambleve catchments, Belgium. Hydrol Earth Syst Sci 15:2259–2274
Matheron G (1963) Principles of geostatistics. Econ Geol 58:1246–1266
Misra D, Oommen T, Agrawal A (2009) Application and analysis of support vector machine based simulation for runoff and sediment yield. Biosyst Eng 103(4):527–535
Moulin L, Gaume E, Obled C (2009) Uncertainties on mean areal precipitation: assessment and impact on streamflow simulations. Hydrol Earth Syst Sci 13:99–114
Noori R, Abdoli MA, Ghasrodashti AA, Jalili Ghazizade M (2009) Prediction of municipal solid waste generation with combination of support vector machine and principal component analysis: a case study of Mashhad. Environ Prog Sustain 28:249–258
Noori R, Karbassi AR, Moghaddamnia A (2011) Assessment of input variables determination on the SVM model performance using PCA, gamma test and forward selection techniques for monthly stream flow prediction. J Hydrol 401:177–189
Nourani V, Andalib G (2015) Daily and monthly suspended sediment load predictions using wavelet based artificial intelligence approaches. J Mt Sci 12(1):85–100
Nourani V, Ejlali RG, Alami MT (2010) Spatiotemporal groundwater level forecasting in coastal aquifers by hybrid artificial neural network-geostatistics model: a case study. Environ Eng Sci 28(3):217–228
Partal T, Cigizoglu HK (2008) Estimation and forecasting of daily suspended sediment data using wavelet-neural networks. J Hydrol 358:317–331
Rouhani S, Hall TJ (1989) Space-time kriging of groundwater data. In: Armstrong M (ed) Geostatistics, Kluwer Academic Publishers, vol 2, pp. 639–651
Roushangar K, Hassanzadeh Y, Keynejad MA, Alami MT, Nourani V, Mouaze D (2011) Studying of flow model and bed load transport in a coarse bed river: case study – Aland river, Iran. J Hydroinform 13(4):850–866
Roushangar K, Mehrabani FV, Shiri J (2014a) Modeling River total bed Material Load Discharge Using Artificial Intelligence Approaches (Based on Conceptual Inputs). J Hydrol 514:114–122
Roushangar K, Mouaze D, Shiri J (2014b) Evaluation of genetic programming-based models for simulating friction factor in alluvial channels. J Hydrol 517:1154–1161
Sun D, Li Y, Wang Q (2012) A novel support vector regression model to estimate the phycocyanin concentration in turbid inland waters from hyperspectral reflectance. Hydrobiologia 680:199–217. doi:10.1007/s10750-011-0918-7
Syed KH, Goodrich DC, Myers DE, Sorooshian S (2003) Spatial characteristics of thunderstorm rainfall fields and their relation to runoff. J Hydrol 271:1–21
Tayfur G, Ozdemir S, Singh VP (2003) Fuzzy logic algorithm for runoff-induced sediment transport from bare soil surfaces. Adv Water Resour 26(12):1249–1256

USEPA (2000) The Quality of Our Nation’s Waters. A Summary of the National Water Quality Inventory: 1998 Report to Congress, Office of Water. 841-S-00-001 Washington DC
Vapnik V (1998) Statistical learning theory. Wiley, New York
Vapnik V, Cortes C (1995) Support vector networks. Mach Learn 20(3):273–297
Wang W, Xu D, Chau K, Chen S (2013) Improved annual rainfall-runoff forecasting using PSO-SVM model based on EEMD. J Hydroinform 15:1377–1390
Webster R, Oliver MA (2007) Geostatistics for environmental scientists, Statistics in Practice Series. Wiley, United Kingdom
Wilkinson SN, Dougall C, Kinsey-Henderson AE, Searle RD, Ellis RJ, Bartley R (2014) Development of a time-stepping sediment budget model for assessing land use impacts in large river basins. Sci Total Environ 468–469:1210–1224
Zhu YM, Lu XX, Zhou Y (2007) Suspended sediment flux modeling with artificial neural network: an example of the Longchuanjiang River in the upper Yangtze catchment, China. Geomorphology 84:111–125
