Neurocomputing 145 (2014) 324–335
Contents lists available at ScienceDirect
Neurocomputing journal homepage: www.elsevier.com/locate/neucom
An integrated wavelet-support vector machine for groundwater level prediction in Visakhapatnam, India
Ch. Suryanarayana a,*, Ch. Sudheer b, Vazeer Mahammood c, B.K. Panigrahi d
a Department of Civil Engineering, Gayatri Vidya Parishad College of Engineering, Visakhapatnam, A.P., India
b Department of Civil Engineering, Indian Institute of Technology, Hauzkhas, New Delhi, India
c Department of Civil Engineering, Andhra University College of Engineering, Visakhapatnam, A.P., India
d Department of Electrical Engineering, Indian Institute of Technology, Hauzkhas, New Delhi, India
Article info
Article history: Received 23 October 2013; Received in revised form 7 April 2014; Accepted 12 May 2014; Communicated by Swagatam Das; Available online 27 June 2014
Keywords: Predicting; Groundwater level; Support vector regression; Wavelet transform
Abstract
Accurate and reliable prediction of the groundwater level variation is significant and essential in water resources management of a basin. The situation is complicated by the fact that the variation of groundwater level is highly nonlinear in nature because of interdependencies and uncertainties in the hydro-geological process. Models such as Artificial Neural Networks (ANN) and Support Vector Machine (SVM) have proved to be effective in modeling virtually any nonlinear function with a greater degree of accuracy. In recent times, combining several techniques to form a hybrid tool to improve the accuracy of prediction has become a common practice for various applications. This integrated method increases the efficiency of the model by combining the unique features of the constituent models to capture different patterns in the data. In the present study, an attempt is made to predict monthly groundwater level fluctuations using integrated wavelet and support vector machine modeling. The discrete wavelet transform with two coefficients (db2 wavelet) is adopted for decomposing the input data into wavelet series. These series are further used as input variables in different combinations for Support Vector Regression (SVR) model to forecast groundwater level fluctuations. The monthly data of precipitation, maximum temperature, mean temperature and groundwater depth for the period 2001–2012 are used as the input variables. The proposed Wavelet-Support Vector Regression (WA-SVR) model is applied to predict the groundwater level variations for three observation wells in the city of Visakhapatnam, India. The performance of the WA-SVR model is compared with SVR, ANN and also with the traditional Auto Regressive Integrated Moving Average (ARIMA) models. Results indicate that WA-SVR model gives better accuracy in predicting groundwater levels in the study area when compared to other models. © 2014 Elsevier B.V. All rights reserved.
* Corresponding author. E-mail addresses: [email protected] (Ch. Suryanarayana), [email protected] (Ch. Sudheer), [email protected] (V. Mahammood), [email protected] (B.K. Panigrahi). http://dx.doi.org/10.1016/j.neucom.2014.05.026 0925-2312/© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Groundwater management has been a major cause of concern because of the ever increasing demand for water for industrial, agricultural and domestic needs. In India and in many other parts of the world, groundwater depletion has been a common cause of concern for engineers. Further, the prediction of the groundwater levels in a basin is of immense importance for the management of groundwater resources. The prediction of groundwater levels is complex and highly nonlinear in nature, as it depends upon many factors such as precipitation, evapotranspiration, soil characteristics and topography of the watershed. Nonlinear empirical models and data driven models [1–4] have been used in forecasting studies in many areas of science and engineering. In recent decades, machine learning models have been employed in modeling nonlinear processes that are complex in nature [5], particularly in problems of flow prediction where ANN has been widely used [6]. Besides neural networks, more recent techniques such as SVM [7–9], genetic programming [10] and probabilistic graphical models such as Bayesian networks [3,4] are found to be effective in modeling virtually any nonlinear function. Bayesian networks take into account the causal relationship between random variables statistically. Support vector machines are found to perform well compared to the other techniques [11,12]. The concept of the support vector machine (SVM) was developed by Cortes and Vapnik [13]. SVM not only possesses the strengths of ANN but also overcomes some of the major problems associated with ANN. In the context of hydrology,
Ch. Suryanarayana et al. / Neurocomputing 145 (2014) 324–335
SVM has proved to be a promising tool. Its applications include forecasting flood stage [14], extension of rating curves [15], forecasting future water levels [16], long term discharges [12,17], estimation of removal efficiency of settling basins in canals [18], developing pedotransfer functions for water retention of soils [19], developing probabilistic reservoir operation models [20], forecasting monthly discharge [21], statistical downscaling [22], designing optimal in situ bioremediation [23] and so on. In the present study, SVR is chosen to forecast groundwater level. When making a decision regarding water management, hydrologists consider results from many types of techniques that help them to achieve their objective. Relying on a single technique can be very risky, particularly in water resources management, as the consequences of the decision are very sensitive and affect the lives of the people of a region. Combining several techniques to form a hybrid tool has become a common practice to improve the accuracy of predicted results, where the unique features of all the models are combined to capture different patterns in the data. Theoretical as well as empirical findings suggest that hybrid methods can be effective and efficient in improving forecasts [24]. In the present work, the SVR method coupled with wavelet techniques is used to increase the efficiency of the model. Wavelets are mathematical functions that decompose the original time series into various components. The wavelet components thus obtained help to improve the forecasting capability of a model by capturing useful information at various levels. Wavelet transforms have been reported to perform better than the traditional Fourier transforms [11]. In this study wavelet analysis is used to decompose the time series of groundwater depths into various components. The decomposed components are then used as inputs for the SVR model.
The purpose of this paper is to investigate the performance of the wavelet-support vector regression model in predicting groundwater depths and to compare it with the performance of other existing models, namely the Support Vector Regression model, Artificial Neural Networks and the Auto Regressive Integrated Moving Average model. The paper is organized as follows. Section 2 details the formulation of support vector machines. Section 3 describes the discrete wavelet transform. Section 4 gives the details of the study area and the collected data. Section 5 deals with the details of the proposed hybrid wavelet-support vector regression (WA-SVR) model. Section 6 describes the models developed for comparing the forecast performance of the proposed WA-SVR model. Section 7 gives the results and discussion. Finally, the summary and conclusions are presented in Section 8.
2. Support vector machines
Support vector machines (SVMs) are data-driven learning tools that perform regression and pattern recognition. An SVM constructs a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space. The SVM equations are formulated as per Vapnik's theory [13]. Let $\{(x_1, y_1), \ldots, (x_\ell, y_\ell)\}$ be the given training data set, where $x_i \in \mathbb{R}^n$ represents the input space of the sample, $y_i \in \mathbb{R}$ for $i = 1, \ldots, \ell$ represents the respective target value, and $\ell$ denotes the number of elements in the training data set. Errors are tolerated as long as they are less than $\epsilon$. Equation (1) is solved to estimate the linear regression in SVM:
$$\text{minimize} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*)$$
$$\text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \epsilon + \xi_i \\ \langle w, x_i \rangle + b - y_i \le \epsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, \ell \end{cases} \tag{1}$$
Here $w$ is a normal vector, $b$ is a scalar quantity, $C$ is a regularization constant, $\epsilon$ is the tolerance of the insensitive loss function, and the slack variables $\xi_i$ and $\xi_i^*$ correspond to the size of the excess deviation for upper and lower deviations, respectively. Lagrangian multipliers $(\alpha_i, \alpha_i^*, \eta, \eta^*)$ are used to solve Eq. (1). By suitable mathematical manipulation the primal formulation of the objective function is turned into the dual formulation, and the dual problem then becomes the objective function for quadratic programming. Quadratic programming was the optimization approach initially used for solving SVMs because of its ease of solution and because it ensures a global minimum. The choice of optimization technique plays a crucial role in training SVMs: the training speed, the memory requirement and the accuracy of the optimization variables all depend upon the optimization method. A detailed review of the optimization techniques used to train SVMs is given by Shawe-Taylor and Sun [25]. The generic equation obtained after optimization is
$$f(x) = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b \tag{2}$$
The approach in Eq. (2) is applicable to linear regression problems. It can be extended to nonlinear regression problems by replacing $x_i$ with a mapping into the feature space, $\phi(x_i)$, which in turn linearizes the relationship between $x_i$ and $y_i$. The solution of Eq. (2) then becomes
$$f(x) = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*)\, k(x_i, x) + b, \qquad k(x_i, x) = \langle \phi(x_i), \phi(x) \rangle \tag{3}$$
where $k(x_i, x_j)$ is the kernel function. Over the years, the Radial Basis Function has become the researchers' choice of kernel function for SVM because of its accuracy and reliable performance [23,26,27]. Therefore, the Radial Basis Function (RBF) is adopted in this study and is expressed as
$$k(x_i, x) = \exp\left(-\gamma \|x - x_i\|^2\right) \tag{4}$$
The selection of the three SVR parameters, namely $C$, $\gamma$ and $\epsilon$, influences the accuracy of prediction. In the present study a trial and error procedure is used to optimize these parameters.
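As an illustration of Eqs. (3) and (4), the following minimal sketch evaluates an already trained SVR with an RBF kernel. The function names and all numerical values are hypothetical and are not taken from the paper; in practice the support vectors, the dual coefficients and the bias would come from the training step (the authors use LIBSVM).

```python
import math


def rbf_kernel(xi, x, gamma):
    """RBF kernel of Eq. (4): exp(-gamma * ||x - xi||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Evaluate Eq. (3); dual_coefs[i] stands for (alpha_i - alpha_i^*)."""
    return b + sum(c * rbf_kernel(sv, x, gamma)
                   for sv, c in zip(support_vectors, dual_coefs))


# Hypothetical trained model (illustrative values only, not from the paper)
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
print(svr_predict([0.0, 0.0], support_vectors, dual_coefs, b=0.1, gamma=0.5))
```

A larger gamma makes the kernel decay faster with distance, so each support vector influences a smaller neighbourhood; this is one reason the choice of gamma interacts with C during the trial and error search.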
3. Discrete wavelet transform
The wavelet transform is used in analyzing data because of its capacity to extract the relevant time-frequency information from non-periodic and transient signals. Wavelet functions split the data into different frequency components and then study each component with a resolution matched to its scale, thus overcoming the limitations of the Fourier and short-time Fourier transforms. Wavelets are widely used in different fields of civil engineering, including river flow forecasting [28], suspended sediment [29], groundwater level prediction [11], fault classification [30] and hydrological prediction [31,32]. When the wavelet transform technique is employed, a finite number of positions and resolution levels are considered (the Discrete Wavelet Transform):
$$p^{W}_{mn} = 2^{-(m/2)} \sum_{t=0}^{T-1} p_t\, W\!\left(\frac{t - n\,2^m}{2^m}\right) = 2^{-(m/2)} \sum_{t=0}^{T-1} p_t\, W_{mn}(t) \tag{5}$$
where $W(\cdot)$ is the selected wavelet function, $p_t$ is the groundwater depth during the month $t$, $T$ is the length of the series, and $p^{W}_{mn}$ is the decomposition coefficient corresponding to resolution level $m$ and position $n$. The number of coefficients at each resolution level is given by $T/2^m$, provided the number of observations, $T$, is divisible by $2^m$. Faster calculations can be done by treating Eq. (5) as a convolution and using the efficient Fast Fourier Transform. An effective way to apply the wavelet functions is the multiresolution technique, based on a father wavelet function and its complement, a mother wavelet function. The father function extracts the low frequency components, while the mother function extracts the high frequency components of the series. Orthogonal wavelet functions are preferred because of their appropriate mathematical properties. Hence the “approximation series”, $A_m$ $(m = 1, \ldots, M)$, and the “detail series”, $D_m$ $(m = 1, \ldots, M)$, are defined in Eqs. (6) and (7):
$$A_m = \sum_{n} p^{\Phi}_{mn}\, \varphi_{mn}(t), \quad m = 1, \ldots, M \tag{6}$$
and
$$D_m = \sum_{n} p^{\Psi}_{mn}\, \psi_{mn}(t), \quad m = 1, \ldots, M \tag{7}$$
where $\varphi_{mn}(t)$ and $\psi_{mn}(t)$ are the father and mother wavelet functions, and $p^{\Phi}_{mn}$ and $p^{\Psi}_{mn}$ are the coefficients obtained through Eq. (5). The original groundwater depth series $p_t$ $(t = 1, \ldots, T)$ can now be reconstructed as given in Eq. (8), which is termed the multi-resolution decomposition of the groundwater depth series:
$$p_t = D_1 + \ldots + D_M + A_M \tag{8}$$
Daubechies wavelets are considered appropriate for treating non-stationary series and are used in this work. A characteristic of these families of wavelets is that the smoothness increases with the order of the functions, but the support intervals increase at the same time, which may deteriorate the prediction. Therefore, low order wavelet functions are generally advisable.
4. Study area and data
4.1. Study area
Visakhapatnam city is located in Andhra Pradesh along the east coast of India at latitude 17°45′ North and longitude 83°16′ East. The main source of water supply in the study area is impounded water reservoirs in the city. In recent decades, there has been a rapid industrial expansion in and around the city, resulting in a significant diversion of surface water to meet the industrial requirements. Hence groundwater is being used as an alternative source to meet the domestic water needs. Further, the rapid growth in population has increased the demand for groundwater, resulting in a gradual decline of the groundwater table. Certain areas in the city have become vulnerable to sea water intrusion due to the depleting groundwater table. Three such regions of the city, viz. Sivajipalem, Madhurawada and Gullalapalem, are identified. The locations of the observation wells in the study area are shown in Fig. 1. The Sivajipalem observation well is within 3 km of the Cyclone Warning Centre (CWC) of the India Meteorological Department (IMD), Visakhapatnam, whereas the other two observation wells are about 10 km away from the CWC.
4.2. Hydrological and meteorological data
All the models in this study are developed using hydrological and meteorological parameters. The data used in this study consist of the monthly groundwater depth (d, in m below ground level), the monthly precipitation (P, in cm), the maximum temperature (T_Max, in °C) and the mean temperature (T_Mean, in °C). Monthly groundwater depth recorded by the State Groundwater Department is available for the period May 2001–February 2012 for the observation wells in these three regions. The monthly precipitation, maximum temperature and mean temperature are obtained from IMD. The input data are divided into a training set (from May 2001 to May 2010), a validation set (from June 2010 to February 2011) and a testing set (from March 2011 to February 2012).
5. WA-SVR model development
The wavelet-support vector regression (WA-SVR) model is developed for predicting the groundwater depth for the three observation wells located at Sivajipalem, Madhurawada and Gullalapalem. The wavelet decomposition of the time series and the input parameters are discussed briefly in this section.
Fig. 1. Location map of Visakhapatnam and observation wells. [Map labels: Madhurawada, Sivajipalem, IMD Visakhapatnam, Gullalapalem, Bay of Bengal.]
Fig. 3. Wavelet decomposition series of monthly maximum temperature data of the study area. [Panels: maximum temperature (°C), A4, D1, D2, D3 and D4, each plotted against months.]
Fig. 4. Wavelet decomposition series of monthly mean temperature data of the study area. [Panels: mean temperature (°C), A4, D1, D2, D3 and D4, each plotted against months.]
WA-SVR models are formed by combining the decomposing capability of wavelets with support vector machines. The MATLAB Wavelet Toolbox [33] is used in this study. The input data of monthly P, T_Max, T_Mean and d in the WA-SVR models are decomposed into sub-series components (D's and A's, denoting the details and approximations of the sub-series) using the discrete wavelet transform (DWT). The D's represent the detail series of the input data, that is, its higher frequency components. The original time series is decomposed into lower resolution components by iterating the decomposition of successive approximation signals. Four resolution levels are employed in the present study. The four decomposition levels contain four detail signals (2-4-8-16 months) and one approximation signal. All the sub-series coefficients contain information about the original time series of the particular parameter. Therefore, all the sub-series are used as inputs to the SVR model. The P, T_Max and T_Mean data are decomposed into time series of 2-month mode (D1), 4-month mode (D2), 8-month mode (D3), 16-month mode (D4) and one approximation mode (A4) using the db2 wavelet transform. Figs. 2–4 show the wavelet decomposition series of the monthly P, T_Max and T_Mean data of the study area. Here the A4 sub-series represents the low frequency data, and D1, D2, D3 and D4 are the four decomposed sub-series obtained in sequence as the data pass four times through the high pass filters. It can be observed from Figs. 2–4 that the D3 sub-series shows near similarity with the original time series. Similarly, the observed groundwater level data at the Sivajipalem, Madhurawada and Gullalapalem wells are decomposed using the db2 wavelet transform, as shown in Figs. 5–7. The figures show that the groundwater level variation is different at the three wells and that the D4 sub-series is almost similar to the original time series. Fig. 8 gives the input and output structure of the proposed WA-SVR model.
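The four-level db2 decomposition described above can be sketched as follows. This is a minimal standard-library illustration, not the MATLAB Wavelet Toolbox routine used by the authors: it assumes periodic extension at the boundaries (which may differ from the toolbox's boundary handling) and a series length divisible by 2^4, and the names `analyze`, `synthesize` and `decompose` as well as the synthetic series are hypothetical.

```python
import math

# db2 (4-tap Daubechies) low-pass filter and its quadrature-mirror high-pass filter
_S3, _S2 = math.sqrt(3.0), math.sqrt(2.0)
LOW = [(1 + _S3) / (4 * _S2), (3 + _S3) / (4 * _S2),
       (3 - _S3) / (4 * _S2), (1 - _S3) / (4 * _S2)]
HIGH = [(-1) ** k * LOW[3 - k] for k in range(4)]


def analyze(x):
    """One DWT level with periodic extension: (approximation, detail) coefficients."""
    n = len(x)
    approx = [sum(LOW[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(HIGH[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail


def synthesize(approx, detail):
    """Inverse of analyze() for the orthogonal periodic transform."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i in range(len(approx)):
        for k in range(4):
            x[(2 * i + k) % n] += LOW[k] * approx[i] + HIGH[k] * detail[i]
    return x


def decompose(series, levels=4):
    """Return sub-series D1..D<levels> and A<levels>, each as long as the input."""
    if len(series) % (2 ** levels) != 0:
        raise ValueError("series length must be divisible by 2**levels")
    approx, details = list(series), []
    for _ in range(levels):
        approx, d = analyze(approx)
        details.append(d)

    def rebuild(a, keep):
        # Invert from the coarsest level, zeroing every detail band except `keep`.
        for m in range(levels - 1, -1, -1):
            d = details[m] if m == keep else [0.0] * len(details[m])
            a = synthesize(a, d)
        return a

    parts = {f"D{m + 1}": rebuild([0.0] * len(approx), m) for m in range(levels)}
    parts[f"A{levels}"] = rebuild(approx, None)
    return parts


# Synthetic monthly series (illustrative only, not the study data)
series = [10 + 3 * math.sin(2 * math.pi * t / 12) + 0.05 * t for t in range(32)]
parts = decompose(series, levels=4)
recon = [sum(parts[name][t] for name in parts) for t in range(len(series))]
print(max(abs(a - b) for a, b in zip(recon, series)))  # reconstruction error of Eq. (8)
```

Because the transform is orthogonal, the sub-series add back to the original series as in Eq. (8), which is what allows the WA-SVR model to forecast each sub-series separately and sum the outputs.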
The sub-series A4, D1, D2, D3 and D4 of P, T_Max, T_Mean and d are segregated for each well separately and normalized before being given as input to the SVR models, as shown in Fig. 8. The WA-SVR models are tested for various input combinations of the normalized decomposed time series, and for each input combination the impact of adding lag times from t (current month) to t − 6 (6 months earlier) is considered. The output (predicted groundwater level) is the summation of the outputs from each decomposed time series, and it is compared with the observed groundwater levels at the three wells. The optimum values of the SVR parameters C and γ are selected by a trial and error procedure and are given in Table 1. The best WA-SVR model is the one with the best performance criteria; it is selected among the various input combinations by training and testing the SVR model with different input time series and by applying time lags from the current month up to 6 months to all the parameters.

Fig. 2. Wavelet decomposition series of monthly precipitation data of the study area. [Panels: precipitation (cm), A4, D1, D2, D3 and D4, each plotted against months.]
Fig. 5. Wavelet decomposition series of monthly groundwater level data at Sivajipalem well. [Panels: groundwater level (m), A4, D1, D2, D3 and D4, each plotted against months.]
Fig. 6. Wavelet decomposition series of monthly groundwater level data at Madhurawada well. [Panels: groundwater level (m), A4, D1, D2, D3 and D4, each plotted against months.]
Fig. 7. Wavelet decomposition series of monthly groundwater level data at Gullalapalem well. [Panels: groundwater level (m), A4, D1, D2, D3 and D4, each plotted against months.]
Fig. 8. Structure of WA-SVR model. [Flowchart: input time series (P, T_Max, T_Mean and d) → decompose input time series using DWT → segregate the A4, D1, D2, D3 and D4 wave components → one SVR model per component → add the output from each wave series to get the final output (d).]

Table 1 Optimal SVR parameters of the WA-SVR model for various decomposition series.
| Type of WA decomposition | Sivajipalem: best C | Sivajipalem: best γ | Madhurawada: best C | Madhurawada: best γ | Gullalapalem: best C | Gullalapalem: best γ |
|---|---|---|---|---|---|---|
| A4 | 100 | 0.080 | 50 | 0.015 | 50 | 0.120 |
| D1 | 5 | 0.008 | 5 | 0.15 | 1.5 | 0.180 |
| D2 | 10 | 0.009 | 5 | 0.15 | 5 | 0.180 |
| D3 | 50 | 0.008 | 5 | 0.15 | 50 | 0.200 |
| D4 | 150 | 0.080 | 50 | 0.015 | 150 | 0.006 |

5.1. Criterion of performance
The forecast performance is evaluated using the Normalized Mean Square Error (NMSE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), Nash-Sutcliffe Efficiency Coefficient (Ec) and Correlation Coefficient ($R^2$). The evaluation criteria are determined by using the following equations:
$$\mathrm{NMSE} = \frac{\frac{1}{n} \sum_{i=1}^{n} \left[(h_m)_i - (h_s)_i\right]^2}{\frac{1}{n-1} \sum_{i=1}^{n} \left[(h_m)_i - \bar{h}_m\right]^2} \tag{9}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[(h_m)_i - (h_s)_i\right]^2} \tag{10}$$
$$R^2 = \frac{\sum_{i=1}^{n} \left[(h_m)_i - \bar{h}_m\right]\left[(h_s)_i - \bar{h}_s\right]}{\sqrt{\left[\sum_{i=1}^{n} \left[(h_m)_i - \bar{h}_m\right]^2\right]\left[\sum_{i=1}^{n} \left[(h_s)_i - \bar{h}_s\right]^2\right]}} \tag{11}$$
$$E_c = 1 - \frac{\sum_{i=1}^{n} \left[(h_m)_i - (h_s)_i\right]^2}{\sum_{i=1}^{n} \left[(h_m)_i - \bar{h}_m\right]^2} \tag{12}$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left|(h_s)_i - (h_m)_i\right|}{(h_m)_i} \tag{13}$$
where $h$ is the groundwater level, the subscripts $m$ and $s$ represent the measured and simulated values respectively, an overbar denotes the mean and $n$ is the number of observations.

6. Models for comparing forecast performance
The prediction accuracy of the proposed WA-SVR model is compared with the normal Support Vector Regression (SVR) model, the Artificial Neural Networks (ANN) model and the traditional Auto Regressive Integrated Moving Average (ARIMA) model.
6.1. SVR models
The LIBSVM toolbox [34] is used to develop the SVR models for predicting the groundwater level. Generally the accuracy of the SVR model depends on the appropriate selection of the kernel and its parameters. Many forecasting studies have demonstrated the favorable performance of the Radial Basis Function [26,27,35] as the kernel function for SVM. Therefore, the Radial Basis Function (RBF) is adopted in this work. The optimal SVR parameters C and γ obtained after the trial and error process are given in Table 2. The developed SVR models consist of several combinations of the normalized data of P, T_Max, T_Mean and d at different lags. The impact of time lag for each model input combination is tested from t (current month) to t − 6 (6 months earlier). The best SVR model is the one with the best performance criteria among the various input combinations, and one is developed for each of the three observation wells.

Table 2 Optimal SVR parameters of the SVR model.
| Location of well | best C | best γ |
|---|---|---|
| Sivajipalem | 250 | 0.150 |
| Madhurawada | 1 | 0.150 |
| Gullalapalem | 10 | 0.008 |

6.2. ANN models
Feed forward neural networks are the simplest and most commonly adopted architectures. In this study the ANNs are trained with the Levenberg–Marquardt (LM) algorithm, as it is reported to be the fastest training algorithm [6,11]. The performance of ANNs greatly depends upon the optimal choice of parameters. The most significant variable to be determined is the number of neurons in the hidden layer. The ANN models used in this study consist of an input layer with the number of neurons varying from 4 to 28, a hidden layer and an output layer. By a trial and error process the optimal number of neurons in the hidden layer is determined to be 90 for all the cases. The Neural Network Toolbox of the MATLAB package [36] is used for developing the ANN models in this study. A total of 84 ANN models are developed for the three regions, and the models with the best performance criteria are selected as the best ANN models for the three observation wells.
6.3. ARIMA models
SPSS 13 [37] is used in developing the ARIMA model for predicting the groundwater levels. ARIMA techniques can be used with only a single variable. Thus in this study a univariate analysis is performed with the groundwater level as the variable. The groundwater time series for all the three wells are found to be nonstationary. These time series are converted into stationary series by differencing. The number of autoregressive terms is varied from 0 to 5. The number of lagged forecast errors in the prediction equation is also varied from 0 to 5. The parameter estimation in the ARIMA models is performed by using the generalized least squares method.
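The performance criteria of Section 5.1, which are used to compare the models in the results that follow, can be sketched in a few lines. This is a minimal illustration with hypothetical function names and made-up numbers; it assumes that the NMSE normalises the mean square error by the sample variance of the measured series (n − 1 in the denominator).

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(measured, simulated):
    """Root mean square error, Eq. (10)."""
    n = len(measured)
    return math.sqrt(sum((m - s) ** 2 for m, s in zip(measured, simulated)) / n)


def nmse(measured, simulated):
    """Mean square error over the sample variance of the measurements, Eq. (9)."""
    n = len(measured)
    mean_m = _mean(measured)
    mse = sum((m - s) ** 2 for m, s in zip(measured, simulated)) / n
    variance = sum((m - mean_m) ** 2 for m in measured) / (n - 1)
    return mse / variance


def correlation(measured, simulated):
    """Correlation coefficient between measured and simulated values, Eq. (11)."""
    mean_m, mean_s = _mean(measured), _mean(simulated)
    cov = sum((m - mean_m) * (s - mean_s) for m, s in zip(measured, simulated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / math.sqrt(var_m * var_s)


def nash_sutcliffe(measured, simulated):
    """Nash-Sutcliffe efficiency coefficient Ec, Eq. (12)."""
    mean_m = _mean(measured)
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    sst = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - sse / sst


def mape(measured, simulated):
    """Mean absolute percentage error as a fraction, Eq. (13)."""
    n = len(measured)
    return sum(abs(s - m) / m for m, s in zip(measured, simulated)) / n


# Made-up groundwater depths in metres (illustrative only)
measured = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
for metric in (nmse, rmse, correlation, nash_sutcliffe, mape):
    print(metric.__name__, round(metric(measured, simulated), 4))
```

Under this assumption NMSE and Ec are linked by NMSE = (n − 1)/n · (1 − Ec), so a model with NMSE above 1 necessarily has a negative Ec.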
7. Results and discussion
The input combinations of the best WA-SVR model are different for different wells. For the Sivajipalem well, the best WA-SVR model is a function of the monthly precipitation, maximum temperature and groundwater levels from the current month (t) and the previous month (t − 1). For the Madhurawada and Gullalapalem wells, the best WA-SVR model is a function of the monthly precipitation, maximum temperature and groundwater levels from the current month (t), the previous month (t − 1) and 2 months earlier (t − 2). The models are run with maximum temperature and mean temperature separately, and the results are found to be slightly better in the former case. Even when both inputs are used together, the accuracy of the results does not improve. Therefore mean temperature is not included as an input variable in the best models considered in the present study.

The $R^2$, Ec, NMSE, RMSE and MAPE statistics of the best WA-SVR model, best SVR model, best ANN model and best ARIMA model for the Sivajipalem, Madhurawada and Gullalapalem observation wells in the testing period are given in Tables 3–5 respectively. The best WA-SVR model gives the best results overall. Compared with the best SVR model, the best WA-SVR model increases the $R^2$ and Ec values by 20% and 27% for Sivajipalem, 83% and 164% for Madhurawada and 52% and 217% for Gullalapalem observation wells respectively. Similarly, compared with the best ANN model, the best WA-SVR model increases the $R^2$ and Ec values by 15% and 34% for Sivajipalem, 121% and 253% for Madhurawada and 60% and 249% for Gullalapalem observation wells respectively. The $R^2$ and Ec values of the best WA-SVR model are far superior to those of the best ARIMA model for all the three observation wells, as shown in Tables 3–5. In comparison with the best SVR model, the NMSE, RMSE and MAPE values of the best WA-SVR model are lower by 64%, 40% and 32% respectively for Sivajipalem observation well, 80%, 55% and 57% for Madhurawada observation well and 55%, 33% and 34% for Gullalapalem observation well. Compared with the best ANN model, the NMSE, RMSE and MAPE values are lower by 70%, 43% and 38% respectively for Sivajipalem observation well, 83%, 57% and 58% for Madhurawada observation well and 59%, 34% and 23% for Gullalapalem observation well. Compared with the best ARIMA models, the values are lower by 84%, 61% and 56% for Sivajipalem observation well, 86%, 62% and 62% for Madhurawada observation well and 93%, 74% and 67% for Gullalapalem observation well.

Table 3 Performance of the best WA-SVR model and other models for monthly groundwater level prediction for Sivajipalem observation well.
| Performance criterion | Best ARIMA | Best ANN | Best SVR | Best WA-SVR |
|---|---|---|---|---|
| R2 | 0.305 | 0.818 | 0.782 | 0.943 |
| Ec | 0.299 | 0.662 | 0.701 | 0.894 |
| NMSE | 0.643 | 0.334 | 0.275 | 0.098 |
| RMSE | 1.274 | 0.884 | 0.833 | 0.496 |
| MAPE | 0.198 | 0.140 | 0.128 | 0.087 |

Table 4 Performance of the best WA-SVR model and other models for monthly groundwater level prediction for Madhurawada observation well.
| Performance criterion | Best ARIMA | Best ANN | Best SVR | Best WA-SVR |
|---|---|---|---|---|
| R2 | 0.058 | 0.395 | 0.477 | 0.874 |
| Ec | 0.034 | 0.245 | 0.328 | 0.867 |
| NMSE | 0.885 | 0.746 | 0.616 | 0.122 |
| RMSE | 1.436 | 1.269 | 1.198 | 0.534 |
| MAPE | 0.278 | 0.254 | 0.248 | 0.105 |

Table 5 Performance of the best WA-SVR model and other models for monthly groundwater level prediction for Gullalapalem observation well.
| Performance criterion | Best ARIMA | Best ANN | Best SVR | Best WA-SVR |
|---|---|---|---|---|
| R2 | 0.156 | 0.428 | 0.449 | 0.687 |
| Ec | −4.251 | 0.186 | 0.204 | 0.648 |
| NMSE | 4.813 | 0.805 | 0.730 | 0.323 |
| RMSE | 0.761 | 0.300 | 0.296 | 0.197 |
| MAPE | 0.112 | 0.047 | 0.054 | 0.036 |

Figs. 9–11 compare the observed and computed groundwater levels using the best WA-SVR, SVR, ANN and ARIMA models for the 1-month-ahead forecast in the testing period for the Sivajipalem, Madhurawada and Gullalapalem observation wells respectively. The correlation coefficient ($R^2$) values in the testing period for all the models are also shown in the scatter plots for the three wells in Figs. 9–11. It is observed that the WA-SVR model is able to capture the underlying dynamics of the groundwater level variations and to forecast even when there is a sudden change in groundwater levels in consecutive months. Further, the WA-SVR model forecasts the higher and lower peaks of the groundwater levels in the testing period more accurately than the other models. From the scatter plots of observed against computed groundwater levels for the three observation wells it can be seen that the WA-SVR model predicts the groundwater levels with less scatter, and all points are close to the straight line when compared to the SVR, ANN and ARIMA models. It can also be seen from these plots that the overall prediction of the groundwater levels by the SVR model is better than that of the ANN model, and that the ANN model results are superior to those of the ARIMA model.

Tables 6–8 compare the errors in the results obtained by the WA-SVR, SVR, ANN and ARIMA models in the testing period. The maximum errors in the predicted groundwater levels for the Sivajipalem, Madhurawada and Gullalapalem observation wells are 0.8 m, 1.06 m and 0.29 m in the best WA-SVR model, 1.56 m, 2.86 m and 0.49 m in the best SVR model, 1.74 m, 2.53 m and 0.57 m in the best ANN model and 2.49 m, 3.59 m and 1.32 m in the best ARIMA model respectively. The WA-SVR model shows the smallest error of all the models for the three observation wells. The average errors in the testing period for the WA-SVR, SVR, ANN and ARIMA models are 9%, 13%, 15% and 20% at Sivajipalem observation well, 11%, 25%, 26% and 28% at Madhurawada observation well and 4%, 6%, 6% and 12% at Gullalapalem observation well respectively. The results indicate that, out of all the models employed in the present study, the WA-SVR model is the most accurate in predicting the groundwater levels.

8. Summary and conclusions
In the present study, the prediction capability of an integrated model combining the Discrete Wavelet Transform and Support Vector Regression has been investigated for predicting the monthly groundwater levels at three observation wells in Visakhapatnam city, viz. the Sivajipalem, Madhurawada and Gullalapalem observation wells. To assess the accuracy of the WA-SVR model, other models, namely SVR, ANN and ARIMA, are developed. A multivariate time series analysis is performed by considering monthly data of the hydrological variables P, T_Max, T_Mean and d. A four-level wavelet decomposition is applied to all the input variables. All the D series and A series are considered equally important, as each contains information about the original time series of the particular input parameter. Therefore, in the WA-SVR model, various combinations of the sub-series of all the inputs, together with the effect of time lag, are supplied as input to compute the output of each sub-series from the SVR model. The summation of these outputs gives the computed depth, which is compared with the observed depth values at the observation wells by evaluating the standard statistical parameters. The performance of the models used to study the groundwater variation is compared on the basis of the statistical parameters and the error percentage.
Fig. 9. Comparison of observed and computed groundwater levels at Sivajipalem observation well using (a) the best WA-SVR model, (b) best SVR model, (c) best ANN model and (d) ARIMA model for 1-month-ahead forecast. Each panel pairs a time-series plot of observed and forecasted groundwater level (m) against months with a scatter plot of forecasted against observed groundwater level (m).
For the best WA-SVR model, three input parameters (P, T_Max and d) give the best accuracy, with time lags of the current month (t) and the previous month (t − 1) for the Sivajipalem observation well, and of the current month (t), the previous month (t − 1) and 2 months before (t − 2) for the Madhurawada and Gullalapalem observation wells. It can be concluded that the required lag length of the input data is influenced by the validity of the data. Since the Sivajipalem observation well is in close proximity to the rain gauge station, the rainfall data are more valid for this well than for the other two observation wells. Therefore, the other two wells require input data with longer lag times to achieve better accuracy.
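The lagged input sets mentioned above can be assembled as in the following minimal sketch. It assumes that a 1-month-ahead forecast pairs the inputs at months t, t − 1, ... with the depth at month t + 1; the function name `lagged_rows` and the numeric values are made-up placeholders for illustration, not data from the study.

```python
def lagged_rows(variables, target, max_lag):
    """Build (inputs, output) pairs for a 1-month-ahead forecast.

    variables: dict of name -> list of monthly values (equal lengths).
    Each input row holds every variable at t, t-1, ..., t-max_lag;
    the output is the target at t+1.
    """
    n = len(target)
    rows = []
    for t in range(max_lag, n - 1):
        features = [values[t - lag]
                    for values in variables.values()
                    for lag in range(max_lag + 1)]
        rows.append((features, target[t + 1]))
    return rows


# Made-up placeholder values for illustration only (not data from the study).
inputs = {
    "P": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "T_Max": [30.0, 31.0, 32.0, 33.0, 34.0, 35.0],
    "d": [4.0, 4.5, 5.0, 5.5, 6.0, 6.5],
}

rows_lag1 = lagged_rows(inputs, inputs["d"], max_lag=1)  # inputs at t, t-1
rows_lag2 = lagged_rows(inputs, inputs["d"], max_lag=2)  # inputs at t, t-1, t-2
print(len(rows_lag1[0][0]), len(rows_lag2[0][0]))
```

A longer lag widens each input row but leaves fewer usable rows, which matters for a short monthly record.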
Fig. 10. Comparison of observed and computed groundwater levels at Madhurawada observation well using (a) the best WA-SVR model, (b) best SVR model, (c) best ANN model and (d) ARIMA model for 1-month-ahead forecast. Each panel pairs a time-series plot of observed and forecasted groundwater level (m) against months with a scatter plot of forecasted against observed groundwater level (m).
Fig. 11. Comparison of observed and computed groundwater levels at Gullalapalem observation well using (a) the best WA-SVR model, (b) best SVR model, (c) best ANN model and (d) ARIMA model for 1-month-ahead forecast. Each panel pairs a time-series plot of observed and forecasted groundwater level (m) against months with a scatter plot of forecasted against observed groundwater level (m).

The WA-SVR model testing results show a significant improvement in all the statistical parameters (R2, Ec, NMSE, RMSE and MAPE) compared with all the other models for all three observation wells. Across the Sivajipalem, Madhurawada and Gullalapalem observation wells, the best WA-SVR model increased the prediction correlation coefficient and efficiency by 20%–83% and 27%–217%, respectively, compared with the best SVR model, and by 15%–121% and 34%–253%, respectively, compared with the best ANN model. Similarly, for these observation wells the best WA-SVR model reduced NMSE, RMSE and MAPE by 55%–80%, 33%–55% and 32%–57%, respectively, compared with the best SVR model, and by 59%–83%, 34%–57% and 23%–58%, respectively, compared with the best ANN model.

For the Sivajipalem, Madhurawada and Gullalapalem observation wells, the maximum errors are 17%, 50% and 7% for the best WA-SVR model, 27%, 133% and 10% for the best SVR model, 33%, 118% and 12% for the best ANN model, and 71%, 167% and 26% for the best ARIMA model, respectively. The maximum error of the best WA-SVR model is therefore lower than that of the other models. Overall, the SVR model is more accurate than the ANN model, and the ANN model outperforms the ARIMA model. Thus, based on the performance criteria and the maximum prediction error obtained for the study area, it can be concluded that the WA-SVR model is a superior alternative to the SVR, ANN and ARIMA models for forecasting groundwater levels. In future, a multivariate time series analysis can be carried out either by adding the effective D series and A series or by supplying the Ds and As separately as inputs to the SVR model.
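The percentage gains quoted above are relative changes between two models' criteria. The sketch below shows one way to compute them, assuming the change is expressed relative to the baseline (SVR) value; the function names are illustrative, and the numbers are the best SVR and best WA-SVR criteria for the Gullalapalem well from Table 5.

```python
def percent_increase(baseline, improved):
    """Relative gain of a criterion where larger is better (R2, Ec)."""
    return (improved - baseline) / baseline * 100.0


def percent_reduction(baseline, improved):
    """Relative drop of a criterion where smaller is better (NMSE, RMSE, MAPE)."""
    return (baseline - improved) / baseline * 100.0


# Best-model criteria for the Gullalapalem well, taken from Table 5.
svr = {"R2": 0.449, "Ec": 0.204, "NMSE": 0.730, "RMSE": 0.296, "MAPE": 0.054}
wa_svr = {"R2": 0.687, "Ec": 0.648, "NMSE": 0.323, "RMSE": 0.197, "MAPE": 0.036}

for name in ("R2", "Ec"):
    print(f"{name}: +{percent_increase(svr[name], wa_svr[name]):.1f}%")
for name in ("NMSE", "RMSE", "MAPE"):
    print(f"{name}: -{percent_reduction(svr[name], wa_svr[name]):.1f}%")
```

Repeating the calculation for each well and each baseline model gives the ranges of improvement reported in the text.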
Table 6 Comparison of errors in computed groundwater depths by ARIMA, ANN, SVR and WA-SVR models for Sivajipalem observation well.

Month    Observed depth   Computed depth in bgl (m)       Error (%)
         in bgl (m)       ARIMA   ANN    SVR    WA-SVR    ARIMA   ANN   SVR   WA-SVR
1        3.71             3.61    3.77   4.18   4.22      3       2     13    14
2        5.35             4.36    4.30   4.71   4.72      19      20    13    12
3        5.40             4.97    5.13   5.23   6.01      9       5     4     12
4        5.81             5.44    4.82   4.25   6.19      7       18    27    7
5        5.25             5.79    5.63   5.68   5.89      11      8     9     13
6        3.55             6.04    4.16   4.16   4.16      71      18    18    17
7        4.35             6.23    3.07   4.03   4.49      44      30    8     4
8        5.30             6.37    3.56   3.98   6.10      21      33    25    16
9        5.95             6.48    4.85   4.51   6.15      9       19    25    4
10       7.10             6.55    6.49   6.79   7.21      8       9     5     2
11       8.10             6.61    7.56   7.34   8.25      19      7     10    2
12       8.70             6.66    8.32   8.96   9.20      24      5     4     6
Average                                                   20      15    13    9
Table 7 Comparison of errors in computed groundwater depths by ARIMA, ANN, SVR and WA-SVR models for Madhurawada observation well.

Month    Observed depth   Computed depth in bgl (m)       Error (%)
         in bgl (m)       ARIMA   ANN    SVR    WA-SVR    ARIMA   ANN   SVR   WA-SVR
1        4.01             4.00    4.47   5.22   3.57      1       12    31    11
2        4.85             4.66    4.99   5.61   3.82      4       3     16    22
3        5.50             5.13    5.17   5.67   5.40      7       6     4     2
4        6.21             5.42    5.44   5.80   6.26      13      13    7     1
5        4.55             5.59    6.79   6.58   5.20      23      50    45    15
6        4.80             5.69    3.59   5.10   4.75      19      26    7     1
7        4.36             5.74    4.84   5.61   3.97      32      11    29    9
8        2.16             5.75    4.69   5.02   3.22      167     118   133   50
9        5.22             5.75    3.60   4.42   5.35      11      32    16    3
10       6.80             5.75    5.54   6.33   7.16      16      19    7     6
11       7.12             5.73    7.97   6.97   7.63      20      12    3     8
12       7.70             5.72    8.30   7.29   7.51      26      8     6     3
Average                                                   28      26    25    11
Table 8 Comparison of errors in computed groundwater depths by ARIMA, ANN, SVR and WA-SVR models for Gullalapalem observation well.

Month    Observed depth   Computed depth in bgl (m)       Error (%)
         in bgl (m)       ARIMA   ANN    SVR    WA-SVR    ARIMA   ANN   SVR   WA-SVR
1        4.62             4.48    4.71   4.56   4.60      3       3     2     1
2        4.71             4.96    4.94   5.03   4.45      6       5     7     6
3        4.77             4.79    4.85   4.90   4.67      1       2     3     3
4        4.91             4.80    4.84   4.73   5.07      3       2     4     4
5        4.52             4.50    5.03   4.94   4.81      1       12    10    7
6        4.12             4.36    4.20   4.32   4.28      6       3     5     4
7        4.27             4.12    4.39   4.44   4.30      4       3     5     1
8        4.88             4.02    4.31   4.42   4.97      18      12    10    2
9        5.18             3.91    4.88   4.78   4.91      25      6     8     5
10       5.11             3.90    5.10   5.23   4.82      24      1     3     6
11       5.20             3.88    5.16   5.37   5.44      26      1     4     5
12       5.02             3.92    5.57   5.51   5.19      22      11    10    4
Average                                                   12      6     6     4
Acknowledgments

The authors wish to thank the Directors and other authorities of A.P. State Groundwater Board and Indian Meteorological Department for providing necessary data for carrying out this work. We also acknowledge the support rendered by Mr. K.S. Sastry, Deputy Director, State Groundwater Board–Visakhapatnam during this work. We also thank the reviewers of this paper whose comments have significantly improved its quality.
Ch. Suryanarayana received his M.Tech in Water Resources Engineering from Indian Institute of Technology, Delhi. Currently he is pursuing his Ph.D. from Andhra University College of Engineering, Visakhapatnam, India. He is working as an Assistant Professor in Civil Engineering Department, G.V.P. College of Engineering, Visakhapatnam. His research interests include groundwater modeling and optimization of water resource management.
Ch. Sudheer received his Ph.D. in Water Resources from the Indian Institute of Technology Delhi. Currently he is a Senior Project Scientist in the Civil Engineering Department at IIT Delhi. His research interests include groundwater contamination, bioremediation of soils, design of landfills, and developing mathematical models for malaria transmission.
Vazeer Mahammood is a Professor of Civil Engineering at Andhra University College of Engineering, Visakhapatnam, India. He holds a Master of Engineering in Hydraulics, Coastal & Harbour Engineering, an M.Tech in Remote Sensing & GIS from CSSTEAP (affiliated to the United Nations) and a Ph.D. in Water Resources Engineering. He guides research in water resources engineering and applied remote sensing & GIS. He has carried out major research projects funded by UGC and is a consultant for many governmental civil engineering works.
B.K. Panigrahi is an Associate Professor with the Department of Electrical Engineering, Indian Institute of Technology (IIT), New Delhi. His main research focuses on the development of advanced DSP tools and machine intelligence techniques for power quality studies, protection of power systems, etc. He also works in the area of application of evolutionary computing techniques to solve the problems related to power system planning, operation, and control.