Expert Systems With Applications 83 (2017) 145–163
Expert Systems With Applications journal homepage: www.elsevier.com/locate/eswa
Developing an intelligent expert system for streamflow prediction, integrated in a dynamic decision support system for managing multiple reservoirs: A case study
Hamed Zamani Sabzi a,∗, James Phillip King a,1, Shalamu Abudu b
a Department of Civil Engineering, New Mexico State University, MSC 3CE, PO Box 30001, Las Cruces, NM 88003, USA
b Texas AgriLife Research & Extension Center at El Paso, Texas A&M University System, 1380 A&M Circle, El Paso, TX 79927, USA
Article info
Article history: Received 23 June 2016; Revised 18 April 2017; Accepted 18 April 2017; Available online 19 April 2017
Keywords: ANFIS; ARIMA; ANN; Hybrid model of ANN-ARIMA; Data mining; Streamflow prediction
Abstract
Because fresh water is limited while agricultural and human water demands continue to increase, accurate prediction and sound management of streamflow as a source of fresh water are crucially important. This study investigates and demonstrates how data preprocessing and data mining techniques improve the accuracy of streamflow prediction models. Based on easily accessible Snow Telemetry (SNOTEL) data, four streamflow prediction models were developed and applied to Elephant Butte Reservoir: autoregressive integrated moving average (ARIMA), artificial neural networks (ANNs), a hybrid model of ANN and ARIMA (ANN-ARIMA), and an adaptive neuro-fuzzy inference system (ANFIS). Statistical correlation analysis and the extracted importance degrees of the predictors made it possible to efficiently select the most effective predictors for daily and monthly streamflow to Elephant Butte Reservoir. For the daily prediction time step, after preprocessing the historical data and using the climate variability indices extracted through data mining techniques, the ANFIS model achieved superior streamflow prediction performance compared to the other three evaluated models. Additionally, for predicting monthly streamflow to Elephant Butte Reservoir, ANFIS showed significantly higher accuracy than the ANNs. As an application of the developed predictive expert systems, integrating the prediction models into integrated reservoir operations balanced the need for a reliable supply of irrigation water against losses through evaporation. The optimal operation plan significantly reduces the total evaporation loss from both reservoirs by providing the optimal storage levels in both reservoirs.
This study also provides the conceptual procedure for the non-seasonal ARIMA model; since that model is univariate, it offers a reliable inflow prediction when the existing information is limited to streamflow data as a predictor. © 2017 Elsevier Ltd. All rights reserved.
∗ Corresponding author: H. Zamani Sabzi.
1 J.P. King is a member of the Engineering Research Center for Re-inventing Urban Water Infrastructure, Stanford University.
http://dx.doi.org/10.1016/j.eswa.2017.04.039
0957-4174/© 2017 Elsevier Ltd. All rights reserved.

1. Introduction
Multi-objective reservoirs play a critical role in providing a reliable source of fresh water for domestic human needs, agricultural water demands, hydroelectricity production, and environmental purposes. Therefore, to adequately meet these needs, multi-objective reservoirs should be operated optimally. Elephant Butte Reservoir is a multi-objective reservoir that produces electrical power and provides water for south-central New Mexico and west Texas. For irrigation, it provides water for about 170,000 acres of agricultural lands. Elephant Butte Reservoir releases water to Caballo Reservoir, 25 mi downstream of Elephant Butte Reservoir. Direct water release from Caballo Reservoir provides water for irrigation downstream in the Rio Grande Project. The two reservoirs of Elephant Butte and Caballo are considered as an integrated system whose streamflow values, outflows, and storage volumes can be simulated. A reliable release plan from Caballo directly relates to the storage levels at Caballo Reservoir. The larger of the two reservoirs, Elephant Butte, retains water across a length of 40 mi, and the total surface area of that water can reach 57 mi². Although retaining such high levels of water would increase the likelihood of meeting the area’s water needs, this practice can cause significant water losses to evaporation: during the summer, evaporation rates in the area can reach 1 cm per day, and the historical evaporation rates are measured as pan evaporation rates in each specific region. Therefore, a decision support
system that balances the need for reliable water supplies against evaporation loss, under realistic conditions that are uncertain and dynamic, can provide significant benefits. The optimal operation policy critically depends on accurately estimating the streamflow to Elephant Butte Reservoir. Different forecast models are appropriate for different forecasting conditions, depending on the availability of the required data. Although some forecast models perform better in many cases, no specific streamflow forecasting model is universally appropriate. Therefore, it is crucially important to investigate the appropriateness of several streamflow forecast models for a specific prediction condition and to select the most reliable one. Accordingly, considering previous studies on streamflow forecasting, this study investigated four categories of forecast models that have been applied to predict hydrologic parameters: time series models (ARIMA), ANNs, hybrid models of ANN and ARIMA (ANN-ARIMA), and ANFIS. Then, based on the accuracy performances, the most accurate forecast model is introduced for this case study, along with an efficient approach for selecting the effective predictors. Many researchers have studied applications of these forecast models in diverse engineering predictions, including streamflow forecasting. The two major artificial-intelligence-based prediction models, ANN and ANFIS, have been widely used to estimate diverse quantities such as the dispersion coefficient of natural streams, earthquake probability, global solar radiation, daily pan evaporation, fishing rate, and lake level fluctuations. Several previous studies, such as Adamowski (2008), compared the accuracy performances of different models.
Adamowski (2008) investigated several prediction models, including multiple linear regression models, time series models, and ANNs, to forecast the peak daily water demand in the city of Ottawa, Canada, for the summer period of May to August. He utilized 10 years of historical data on peak daily water demand and on the meteorological variables of maximum daily temperature and daily rainfall. His accuracy performance analysis showed that ANNs predicted the peak daily water demand with higher accuracy than the multiple linear regression and time series based models. In the majority of prediction cases, an ANN-based forecast model is used to map the existing nonlinear relationships between the predictors and the predicted values. Therefore, the higher accuracy observed for ANNs is likely due to the nonlinear relationships between the meteorological variables and the peak daily water demands (Adamowski, 2008). Developing an accurate and efficient ANN-based forecast model relies on two factors: correct selection of the predictors, and an efficient ANN design together with efficient training and validation techniques. Some previous researchers, such as Coulibaly, Anctil, and Bobée (2000), studied the accuracy performance of various techniques for training ANNs. Their findings indicate that obtaining higher accuracy from a specific training and validation technique is not a universal expectation and may differ case by case (Coulibaly et al., 2000). However, the majority of the previous studies evaluated in this work, such as Imrie, Durucan, and Korre (2000) and Othman and Naseri (2011), used a typical feed-forward network trained with the Levenberg–Marquardt back-propagation (LMBP) algorithm (Imrie et al., 2000). Other researchers followed the same approach in searching for an optimal ANN structure for streamflow prediction.
Valipour, Banihabib, and Behbahani (2012) specifically investigated the search procedure for finding the near-optimal structure of the ANNs used to predict monthly streamflow to the Dez Reservoir in Iran (Valipour et al., 2012). Some studies
such as Wu, Chau, and Li (2009) indicate that using the effective lagged streamflow values together with the other effective predictors improves the prediction accuracy. In more detail, Wu et al. (2009) examined hybrid models in which moving average (MA), wavelet multi-resolution analysis (WMRA), and singular spectrum analysis (SSA) were each coupled with an ANN model to improve the daily streamflow prediction accuracy, based on data from different watersheds in China. Their accuracy performance analysis showed that using effective lagged streamflow values from moving average analysis along with the ANN (ANN-MA) provided the most accurate model compared to the other hybrid models, ANN-SSA and ANN-WMRA (Wu et al., 2009). ANFIS, too, has been used by several researchers to develop streamflow and runoff forecast models. The majority of previous studies on hydrologic forecast models, such as Atsalakis and Minoudaki (2007) and El-Shafie, Taha, and Noureldin (2007), reported that ANFIS provides higher accuracy than time series and ANN-based models. Atsalakis and Minoudaki (2007) utilized ANFIS to predict daily irrigation water demand and compared their results with ARIMA-based predictions. Their results showed that ANFIS is more accurate than ARIMA in predicting the water demand values (Atsalakis & Minoudaki, 2007). El-Shafie et al. (2007) used ANFIS to predict the monthly streamflow of the Nile River flowing into the Aswan High Dam (AHD). They used 130 years of historical data to develop the ANFIS-based forecast model. The prediction accuracy of their model was higher than that of the previously developed ANN-based forecast model (El-Shafie et al., 2007).
Pramanik and Panda (2009) utilized ANN and ANFIS to develop forecast models to predict the daily outflow from a barrage located in the Mahanadi River Basin, India, based on flow data upstream of the barrage. The accuracy performance of the developed forecast models indicates that ANFIS-based models provide higher prediction accuracy than ANN-based models (Pramanik & Panda, 2009). Talebizadeh and Moridnejad (2011) used ANN and ANFIS models to predict the lake level fluctuations in Urmia Lake, located in northwest Iran. In modeling the lake level fluctuations, they utilized time series of lake levels, rainfall magnitudes, evaporation rates, and inflow to the lake as predictors (Talebizadeh & Moridnejad, 2011). As in several other studies considered in this literature review, wherever ANNs and ANFIS used the same set of predictors, ANFIS in most cases provided more accurate predictions than the ANNs. Although ANFIS provided better prediction accuracy in the majority of previous studies, there are still cases, such as Karimi-Googhari and Lee (2011), in which an ANN-based forecast model provided better prediction accuracy than ANFIS. As a result, higher prediction accuracy from ANFIS-based forecast models compared to ANN-based models is not a universal fact (Karimi-Googhari & Lee, 2011). The following material details some other previous studies that used ANNs, ARIMA, and hybrid models for predicting streamflow. Jain, Das, and Srivastava (1999) utilized ANN and ARIMA models to predict the monthly streamflow to the Indravati multipurpose reservoir in Orissa, India; they then utilized the predicted values to optimize the reservoir operation policies (monthly release plans) (Jain et al., 1999).
Mohammadi, Eslami, and Dayyanidardashi (2005) studied the similarities and dissimilarities of the regression-based model, ARIMA, and ANNs in forecasting reservoir streamflow in Karaj, Iran (Mohammadi et al., 2005). To improve the prediction accuracy, some researchers such as Muluye and Coulibaly (2007) utilized climate variability indices
along with historical time series of streamflow data to forecast the seasonal streamflow from a watershed to the reservoir (Muluye & Coulibaly, 2007). In this study, the runoff regime in the region of the study area is snowmelt dominated, which led us to select and include the most effective physical predictors in the forecast model development procedure. Shukla, Sheffield, Wood, and Lettenmaier (2013) investigated various physical parameters that affect global land surface hydrologic predictability (Shukla et al., 2013). Therefore, considering the physical characteristics of the study area can lead to appropriately selecting the optimal hydrologic prediction model. Most previous studies show that the optimal operation of a reservoir depends on the accurate estimation of streamflow values. Since models have different levels of prediction accuracy when applied to different case studies, it is important to select the most appropriate prediction models. Therefore, to optimize the operation of the Elephant Butte and Caballo reservoirs, the present study investigates several prediction models to accurately estimate the streamflow to Elephant Butte Reservoir. The predicted values are utilized in a multi-objective decision support system that minimizes total evaporation loss by providing the optimal storage levels in both reservoirs. Simultaneously, release reliability from Caballo Reservoir is maximized. In more detail, the most accurate model was identified based on the accuracy performances. Then, the values predicted for one specific month were incorporated in the operation policy for that month to adjust the storage levels in Elephant Butte and Caballo reservoirs. The extended processes of developing the ARIMA, ANN, and ANN-ARIMA procedures have been described in two independent studies that are in the publication process.
Two independent studies have been developed by authors of this study to provide comprehensive details regarding the development of optimal streamflow prediction models focusing on ANNs and ARIMA models (Zamani Sabzi, King, & Abudu, 2017a, 2017b). The following sections describe the study area, utilized data, streamflow prediction models, mass balance concept (control volume), and the optimal choice of reservoir for storing a specific amount of streamflow for a specific prediction period. The adaptive reservoir operation model will support daily operations when short-term predicted values are utilized in providing the optimal operation plans and daily observed values are fed to the prediction model to update the prediction model’s accuracy.
2. Study area and data used
The required data include the recorded historical streamflow data measured at the San Marcial gauges from 1961 to 2015, historical agricultural water demand, rainfall rates, and measured evaporation rates. These data sources were used to estimate the agricultural water demand and the evaporation loss from both Elephant Butte and Caballo reservoirs. Physical characteristics of the reservoirs and bathymetry data recorded by the USGS were utilized to develop the stage-surface area and stage-storage relationships for both reservoirs. Fig. 1 shows the watershed area above Elephant Butte Reservoir delineated in this study. The digital elevation model (DEM) maps were obtained through the USGS website, and GIS software (ArcMap 10.1) was utilized to develop the watershed boundary and its sub-basins. The study area, as shown in Fig. 1, is the Rio Grande watershed located in Colorado and New Mexico upstream of Elephant Butte Reservoir in New Mexico. Two US Geological Survey gaging stations measure the flow at San Marcial, New Mexico, upstream of the reservoir: the Rio Grande Floodway at San Marcial, NM (USGS Station no. 08358400), which is located in the river channel, and the Rio Grande Conveyance Channel at San Marcial, NM (USGS Station no. 08358300), a constructed channel that parallels the Rio Grande. The sum of the streamflows in these two channels is considered the total streamflow at San Marcial flowing into Elephant Butte Reservoir.
All data utilized as model inputs were easily accessible online from the Natural Resources Conservation Service (NRCS), National Weather Service (NWS), and United States Geological Survey (USGS) websites. Specifically, the snow water equivalent (SWE) data and the SNOTEL precipitation (PRCP) data were obtained from the NRCS website: http://www.wcc.nrcs.usda.gov/snow/. In addition, the monthly precipitation data for weather stations were obtained through the Western Regional Climate Center (WRCC) website: http://www.wrcc.dri.edu/. Because the major portion of the streamflow in the study area comes from snowmelt, snowpack-related data were the most effective predictors selected in the developed prediction models. The snowpack-related historical data were obtained from 12 snow telemetry (SNOTEL) sites in the upper part of the Elephant Butte watershed. Table 1 lists the names, locations, geographical coordinates, and elevations of the selected sites. The correlation analysis indicated that the SNOTEL sites located west of the Rio Grande in New Mexico and Colorado have stronger correlations than the sites located east of the Rio Grande; nevertheless, three SNOTEL sites were selected from the east side because they are located in positions that affect the total streamflow in the Rio Grande.
Based on the gathered records, the average total annual precipitation measured at the 12 SNOTEL sites was 189.1 in. The total SWE measured at the 12 SNOTEL sites for the six-month period in which snowpack is available on the watershed was 65.6 in. The minimum, maximum, and average annual temperatures from the 12 weather station sites on the watershed are −8 °C, 12 °C, and 1 °C, respectively. The total watershed area above Elephant Butte Reservoir is about 29,000 mi². The total capacities of the Elephant Butte and Caballo reservoirs are about 2.5 billion m³ and 424 million m³, respectively.
3. Methodology
Fig. 2 shows the schematic diagram of the integrated reservoirs’ operation model. The operation model includes a prediction engine with four prediction models, a control volume concept, and an optimization process. The historical recorded data are utilized through streamflow prediction models. Considering the accuracy performance of the developed models, the most accurate model is used to provide the daily predicted values. The predicted values are utilized in the mass balance calculation process. Finally, the optimal storage strategy on the two reservoirs is obtained. This study utilized the ANFIS, ANN, and the hybrid model of ANN and ARIMA as multivariate models and ARIMA as a univariate streamflow prediction model. The accuracy performances and the availability of data on the predictors identified the appropriate streamflow prediction model.
3.1. ARIMA model
ARIMA models are stochastic univariate time series models that were developed by Box and Jenkins (1976). Generally, the ARIMA models are the generalized forms of the auto regressive moving
Fig. 1. Study area showing the locations of Elephant Butte and Caballo reservoirs, the watershed area above Elephant Butte Reservoir, and the Rio Grande above Caballo Reservoir.
average (ARMA) models (Box & Jenkins, 1976). As a nonseasonal univariate model, ARIMA is an applicable prediction technique when the existing information is limited to streamflow data as a predictor. The nonseasonal form of ARIMA models is defined as ARIMA(p, d, q), in which the parameters p, d, and q are the number of autoregressive lags, the order of differencing, and the number of moving average lags in the ARIMA model, respectively. In this work,
the three stages of identification, estimation, and diagnostic check defined by Box and Jenkins were followed and examined to find the most accurate ARIMA model for daily streamflow prediction. The appropriate parameters of (p and q) were obtained by investigating autocorrelation (ACF) and partial autocorrelation functions (PACF). The parameters of (p and q) demonstrate the time dependency of the time series values. Abudu (2009) and Abudu, King,
and Bawazir (2011) utilized a seasonal ARIMA (SARIMA) model to predict the monthly streamflow to the Elephant Butte Reservoir (Abudu, 2009; Abudu et al., 2011). The ARIMA model was developed and examined through data preprocessing, processing, and programming in Statistical Analysis Software (SAS). Generally, a non-seasonal ARIMA model is defined through Eqs. (1)–(6) as follows:

\[ \phi(B)\, x_t = \theta(B)\, a_t \tag{1} \]

where

\[ x_t = (1 - B)^d X_t \tag{2} \]

is the stationary series after differencing, and d is the order of non-seasonal differencing;

\[ \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p \tag{3} \]

is the nonseasonal autoregressive polynomial; and

\[ \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q \tag{4} \]

is the nonseasonal moving average polynomial. Here p indicates the order of the autoregressive polynomial, q indicates the order of the moving average polynomial, $a_t$ represents the white noise, $X_t$ represents a dependent variable, and B is the backward shift operator (lag operator). The backward shift operator B is defined via Eq. (5) as follows:

\[ B X_t = X_{t-1} \tag{5} \]

Since the mean and variance of non-seasonal daily streamflow data are not constant, the data are non-stationary. In utilizing the ARIMA model for daily prediction, the data are made stationary (constant mean and variance of the time series) by applying the required order of differencing, in this case differencing applied to the square roots of the data. The optimal values of the orders p and q were obtained considering the covariance values observed through plotting the PACF and ACF, respectively. Based on the utilized data, ARIMA(30,1,1) was obtained as the most accurate non-seasonal univariate model. The lags of 1, 2, 7, 8, 24, and 30 were extracted as significant autoregressive factors and passed all required diagnostic checks. The primary daily ARIMA model was obtained as

\[ \left(1 - 1.05209 B + 0.23324 B^{2} - 0.0213 B^{7} + 0.02406 B^{8} - 0.00835 B^{24} + 0.01832 B^{30}\right) W_t = \left(1 - 0.85063 B\right) A_t , \]

which is transferred to the usable predictive model through Eq. (6) as follows (Zamani Sabzi, King, & Abudu, 2017b):

\[
\begin{aligned}
x_t = \Big( & \sqrt{x_{t-1}} + 1.05209\left(\sqrt{x_{t-1}} - \sqrt{x_{t-2}}\right) - 0.23324\left(\sqrt{x_{t-2}} - \sqrt{x_{t-3}}\right) \\
& + 0.0213\left(\sqrt{x_{t-7}} - \sqrt{x_{t-8}}\right) - 0.02406\left(\sqrt{x_{t-8}} - \sqrt{x_{t-9}}\right) \\
& + 0.00835\left(\sqrt{x_{t-24}} - \sqrt{x_{t-25}}\right) - 0.01832\left(\sqrt{x_{t-30}} - \sqrt{x_{t-31}}\right) \\
& + A_t - 0.85063 A_{t-1} \Big)^2
\end{aligned}
\tag{6}
\]

where $W_t = Z_t - Z_{t-1}$, $Z_t = \sqrt{x_t}$, $x_t$ represents the daily streamflow to the reservoir (acre-ft), $Z_t$ represents the transformed daily streamflow data, $W_t$ is the differenced time series, $A_t$ is the error term on $\sqrt{x_t}$ (considered white noise on $\sqrt{x_t}$), and $A_{t-1}$ is the error term on $\sqrt{x_{t-1}}$, that is, the actual $\sqrt{x_{t-1}}$ minus the predicted $\sqrt{x_{t-1}}$.

Table 1. SNOTEL sites used in the study of the Rio Grande Basin above the Elephant Butte Reservoir, New Mexico.

| SNOTEL site | Elevation (feet) | West longitude | North latitude | Region |
|---|---|---|---|---|
| Bateman | 9300 | 106.32 | 36.51 | Rio Grande west in New Mexico |
| Chamita | 8400 | 106.66 | 36.96 | Rio Grande west in New Mexico |
| Culbera#2 | 10,500 | 105.2 | 37.21 | Rio Grande east in New Mexico |
| Cumbres Trestle | 10,040 | 106.45 | 37.02 | Rio Grande west in Colorado |
| Gallegos Peak | 9800 | 105.56 | 36.19 | Rio Grande east in New Mexico |
| Hopewell | 10,000 | 106.26 | 36.72 | Rio Grande west in New Mexico |
| Lily Pond | 11,000 | 106.55 | 37.38 | Rio Grande west in Colorado |
| Middle Greek | 11,250 | 107.03 | 37.62 | Rio Grande west in Colorado |
| Quemazon | 9500 | 106.39 | 35.92 | Rio Grande west in New Mexico |
| Red River Pass#2 | 9850 | 105.34 | 36.7 | Rio Grande east in New Mexico |
| Upper San Juan | 10,200 | 106.84 | 37.49 | Rio Grande west in Colorado |
| Wolf Greek | 11,000 | 106.8 | 37.48 | Rio Grande west in Colorado |

3.2. Artificial neural networks
ANNs are computational models inspired by the human brain. They can identify complex relationships and patterns between input and output values. Generally, neural networks are formed of interconnected neurons whose parameters are estimated from historical input data and the corresponding target values. The most accurate models are built from the most meaningful information. Because the existing relationships may be simple or complex, the best models are obtained by examining both simple and complex models and evaluating the accuracy performances of the constructed models. Feedforward neural networks (FFNNs), the most commonly used ANN models, were utilized in this study. Fig. 3 shows a typical structure of an ANN model with one input layer, one hidden layer, and one output layer. In a three-layered FFNN, the outputs are computed through the nonlinear transformation of linear combinations of input variables. The output values are explicitly expressed and calculated through Eq. (7) as follows:
\[ \hat{y}_k = f_0\left( \sum_{j=1}^{M} w_{kj}\, f_h\left( \sum_{i=1}^{N} w_{ji}\, x_i + w_{j0} \right) + w_{k0} \right) \tag{7} \]
where $w_{ji}$ represents the weight that connects the ith neuron of the input layer to the jth neuron of the hidden layer, $w_{j0}$ is the bias value assigned to the jth neuron of the hidden layer, $f_h$ is the activation function applied to the neurons in the hidden layer, $w_{kj}$ is the weight that connects the jth neuron of the hidden layer to the kth neuron of the output layer, $w_{k0}$ is the bias value assigned to the kth neuron of the output layer, and $f_0$ is the activation function applied to the output neurons.
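To make Eq. (7) concrete, the following minimal Python sketch evaluates the forward pass of a three-layer FFNN. The function name `ffnn_forward`, its argument names, and the default activation functions (hyperbolic tangent in the hidden layer, identity in the output layer) are illustrative assumptions; the study does not state which activation functions were used.

```python
import math
from typing import Callable, List, Sequence


def ffnn_forward(
    x: Sequence[float],
    w_hidden: Sequence[Sequence[float]],   # w_ji: one row per hidden neuron j
    b_hidden: Sequence[float],             # w_j0: hidden-layer biases
    w_out: Sequence[Sequence[float]],      # w_kj: one row per output neuron k
    b_out: Sequence[float],                # w_k0: output-layer biases
    f_h: Callable[[float], float] = math.tanh,      # assumed hidden activation
    f_o: Callable[[float], float] = lambda z: z,    # assumed output activation
) -> List[float]:
    """Evaluate Eq. (7): y_k = f_o(sum_j w_kj * f_h(sum_i w_ji * x_i + w_j0) + w_k0)."""
    # Hidden layer: activation of a linear combination of the inputs plus bias.
    hidden = [
        f_h(sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j)
        for row, b_j in zip(w_hidden, b_hidden)
    ]
    # Output layer: activation of a linear combination of hidden outputs plus bias.
    return [
        f_o(sum(w_kj * h_j for w_kj, h_j in zip(row, hidden)) + b_k)
        for row, b_k in zip(w_out, b_out)
    ]
```

With N inputs, M hidden neurons, and L outputs, `w_hidden` has M rows of length N and `w_out` has L rows of length M. In practice the weights would come from training rather than being set by hand.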
Fig. 2. Optimization process in the developed dynamic model. The flowchart comprises three linked blocks: (1) a streamflow prediction engine, in which inflow is estimated with univariate and multivariate prediction models (ARIMA, ANN, hybrid model of ANN and ARIMA, and ANFIS), the models are compared by accuracy performance, and historical data are utilized and the prediction models are updated with observed data; (2) a mass balance control (control volume) for Elephant Butte and Caballo reservoirs on the Rio Grande, accounting for tributary streamflow, direct precipitation on the reservoir, evaporation from each reservoir, and seepage and groundwater interaction; and (3) a reservoirs optimization model that finds the optimal storage levels on Elephant Butte and Caballo so as to minimize evaporation loss, maximize the release reliability, and maximize the hydropower electricity production.
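The network error at the nth iteration, E(n), is computed through Eq. (8) as follows:

\[ E(n) = \frac{1}{2} \sum_{p=1}^{N} \sum_{k=1}^{L} \left( y_{pk}(n) - \hat{y}_{pk}(n) \right)^2 \tag{8} \]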
where N represents the number of observations (inputs), L represents the number of predicted output values, $y_{pk}(n)$ represents the observed values (desired target values), and $\hat{y}_{pk}(n)$ represents the value simulated by the network for the kth neuron at the nth iteration. One of the optimal models obtained through the accuracy performances of the developed ANNs is formed as follows:
\[ I_{t+1,\,t+2,\,t+3,\,t+4,\,t+5,\,t+6,\,t+7} = f\left(I_t,\ SWE_{t^*},\ SWEE_{t^*},\ P_{M-2},\ M_i\right) \tag{9} \]
where $I_{t+1,\,t+2,\,\ldots,\,t+7}$ are the predicted daily streamflows at 1, 2, 3, 4, 5, 6, and 7 days ahead; $I_t$ is the observed daily streamflow on day t; $SWE_{t^*}$ is the snow water equivalent recorded on the first day of the prediction month; $SWEE_{t^*}$ is the snow water equivalent on the first day of the month for which the recorded SWE has a higher correlation with the streamflow values in the prediction month; $P_{M-2}$ is the precipitation index recorded in the two months preceding the predicted month; and $M_i$ is the month index (month number). Eq. (10) represents the model of Eq. (9) extended with time-dependent trend indices as additional effective predictors.
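As a rough illustration of how one input-output sample for the model of Eq. (9) could be assembled, the sketch below pairs the predictors observed on day t with the seven following daily streamflows. The function name `build_daily_sample` and its arguments are hypothetical, and the snow water equivalent and precipitation indices are assumed to have been computed beforehand; this is not the authors' preprocessing code.

```python
from typing import List, Sequence, Tuple


def build_daily_sample(
    flows: Sequence[float],   # observed daily streamflow series, indexed by day
    t: int,                   # index of the current day (I_t is flows[t])
    swe_first_day: float,     # SWE on the first day of the prediction month
    swee_first_day: float,    # SWE on the first day of the best-correlated month
    precip_m2: float,         # precipitation index of the two preceding months
    month: int,               # month index M_i (1 to 12)
    horizon: int = 7,         # number of days ahead to predict
) -> Tuple[List[float], List[float]]:
    """Return (predictors, targets) for one sample of the Eq. (9) model."""
    if t < 0 or t + horizon >= len(flows):
        raise IndexError("day t must leave `horizon` observed days after it")
    if not 1 <= month <= 12:
        raise ValueError("month index must be between 1 and 12")
    # Predictor vector in the order used in Eq. (9).
    predictors = [flows[t], swe_first_day, swee_first_day, precip_m2, float(month)]
    # Targets are the observed flows on days t+1 through t+horizon.
    targets = list(flows[t + 1 : t + 1 + horizon])
    return predictors, targets
```

Repeating this over every admissible day t yields the training and test matrices for a multi-output network. The additional trend indices of Eq. (10) would simply be appended to the predictor list.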
Fig. 3. Typical structure of an FFNN model with one input layer, one hidden layer, and one output layer. The diagram shows N input neurons (Input 1 to Input N, index i), M hidden neurons (index j), and L output neurons (Output 1 to Output L, index k), connected by the weights $w_{ji}$ (input to hidden layer) and $w_{kj}$ (hidden to output layer), with bias values $w_{j0}$ and $w_{k0}$.
\[ I_{t+1,\,t+2,\,t+3,\,t+4,\,t+5,\,t+6,\,t+7} = f\left(I_t,\ SWE_{t^*},\ SWEE_{t^*},\ P_{M-2},\ M_i,\ P5Y_i,\ P2Y_i,\ PY_i\right) \tag{10} \]

where $P5Y_i$ represents the average annual stream inflow in the 5 years before the predicted year, $P2Y_i$ represents the average annual stream inflow in the 2 years before the predicted year, and $PY_i$ represents the average annual stream inflow in the year before the predicted year. The rest of the utilized predictors are the same as explained for Eq. (9).

3.3. The hybrid model of the ANN-ARIMA
The hybrid model is defined by utilizing the most significant autoregressive lags together with the meaningful predictors of the developed ANN models:

\[ I_{t+1,\,t+2,\,t+3,\,t+4,\,t+5,\,t+6,\,t+7} = f\left(I_t,\ I_{t-1},\ I_{t-2},\ I_{t-7},\ I_{t-8},\ I_{t-24},\ I_{t-30},\ SWE_{t^*},\ P_{M-2},\ M_i\right) \tag{11} \]

where $I_{t-1}, I_{t-2}, I_{t-7}, I_{t-8}, I_{t-24}, I_{t-30}$ are the lagged daily streamflow values at 1, 2, 7, 8, 24, and 30 days in the past. The rest of the utilized predictors are the same as explained for Eq. (9). Because the developed daily ARIMA model uses $I_{t-1}, I_{t-2}, I_{t-7}, I_{t-8}, I_{t-24}, I_{t-30}$ as lagged daily streamflow values, predicting 2 days ahead requires $I_t$, which is not yet available and must first be observed. As a result, when the closest effective lagged value is $I_{t-1}$, the streamflow can be predicted for just one day ahead; if the closest effective lagged value were $I_{t-2}$, it could be predicted for 2 days ahead.

3.4. Adaptive network based fuzzy inference system (ANFIS)
As another prediction model, a fuzzy inference system (FIS) of the Takagi–Sugeno type was utilized to estimate the streamflow values as the outputs of the inference system. ANFIS was utilized as a hybrid learning method to find the optimal parameters of the membership functions of the Takagi–Sugeno system. The Takagi–Sugeno FIS defines the input variables through Gaussian membership functions, and the output variables are defined as constant values or linear functions of the inputs. Then, the parameters of the utilized membership functions are tuned using ANFIS (Jang, 1993). Fig. 4 illustrates the schematic of a Takagi–Sugeno FIS.

3.5. Monthly streamflow prediction models
Monthly streamflow prediction models were developed based on the same data set available from January 1961 to August 2015. Physical parameters (snowpack data, precipitation indices, and streamflow values), time indices, and time dependent streamflow trend indices were utilized to develop monthly streamflow prediction models. The time dependent stream flow trend indices were obtained through supervised data mining techniques. Conceptually, importance degrees along with accuracy performance analysis led us to develop monthly prediction models through Eqs. (12) and (13).
$$I_m = f\left(I_{m-1},\, SWEE_m,\, SWE_m,\, P_{m-1},\, P_{m-2},\, S_i,\, M_i,\, P5Y_i,\, P2Y_i,\, PY_i\right) \tag{12}$$

$$I_m = f\left(I_{m-1},\, SWEE_m,\, SWE_m,\, P_{m-1},\, P_{m-2},\, M_i\right) \tag{13}$$
where Im−1 indicates the observed monthly streamflow value in the preceding month, SWEm represents the snow water equivalent on the first day of the prediction month, SWEEm represents the snow water equivalent on the first day of the month in which the SWE has a higher correlation with the predicted stream inflow values in the prediction month, Pm−1 represents the precipitation index in the month before the predicted month, Pm−2 represents the precipitation index two months before the predicted month, Si represents the season index, Mi stands for the number of the predicted month, P5Yi represents the average value of annual stream inflow in the 5 years before the predicted year, P2Yi represents the average value of annual stream inflow in the 2 years before the predicted year, and PYi represents the average value of annual stream inflow in the year before the predicted year.

4. Results and discussion

4.1. Accuracy performance analysis on developed daily streamflow prediction models

To compare the accuracy of the four prediction models, the last 25% of the data from 1961 to 2015 (from January 2002 to August 2015) was used as a test data set. Tables 2 and 3 present the accuracy performance of the developed models along with the correlation coefficient fitted between
Correlation coefficient (r) between predicted and observed daily streamflow values:

| Model | Prediction period | It+1 | It+2 | It+3 | It+4 | It+5 | It+6 | It+7 |
|---|---|---|---|---|---|---|---|---|
| ARIMA (Model 1) | 1 day | 0.97 | | | | | | |
| 12 Month (ANN) | 7 days | 0.982 | 0.965 | 0.948 | 0.929 | 0.911 | 0.897 | 0.884 |
| 12 month Hybrid ANN_ARIMA (Model 2) | 7 days | 0.977 | 0.961 | 0.943 | 0.926 | 0.907 | 0.892 | 0.877 |
| 12 Month (ANN) | 15 days (first 7 days from 15 days prediction period) | 0.984 | 0.963 | 0.946 | 0.927 | 0.908 | 0.892 | 0.879 |
| 12 Month (ANN) (Model 3) | 7 days | 0.983 | 0.964 | 0.945 | 0.925 | 0.908 | 0.893 | 0.878 |
| 12 Month (ANFIS) (Model 4) | 7 days | 0.987 | 0.984 | 0.978 | 0.973 | 0.967 | 0.956 | 0.950 |

Input variables (predictors) listed in the table: MI, PM-2, SWEEM, SWEt, It (with the lags It, It−1, It−2, It−7, It−8, It−24, It−30), and climate variability indices (P5Yi, P2Yi, PYi). (The per-model ∗/− marks for these predictor columns are not legible in this copy.)

Note: the star sign (∗) indicates that the related parameter has been considered as a predictor and the hyphen (−) sign indicates that the related parameter has not been used as a predictor in the developed prediction model. The bold values represent the parameters of the selected models.
H. Zamani Sabzi et al. / Expert Systems With Applications 83 (2017) 145–163
Table 2 Numerical results of the developed networks for prediction models through different periods of the year.
Fig. 4. Typical structure of ANFIS model with five predictors and one output.

Table 3 Accuracy of the utilized prediction models in different prediction periods.

| Model | Accuracy performance | It+1 | It+2 | It+3 | It+4 | It+5 | It+6 | It+7 |
|---|---|---|---|---|---|---|---|---|
| (ARIMA), Model 1 | MSE | 76,437.3 | | | | | | |
| (ANN_ARIMA), Model 2 | MSE | 110,775.3 | 192,241.2 | 279,312.5 | 352,262.5 | 434,890.2 | 499,070.6 | 566,679.2 |
| (ANN), Model 3 | MSE | 77,628.1 | 164,806.7 | 250,180.5 | 338,942.4 | 412,681.4 | 476,938.0 | 543,114.5 |
| (ANFIS), Model 4 | MSE | 62,553.1 | 138,206.5 | 201,320.2 | 270,732.2 | 336,255.4 | 370,526 | 419,128 |

The bold values represent the parameters of the selected models.
the predicted and observed daily streamflow values for the different prediction periods. Four models, ARIMA, ANN_ARIMA, ANN, and ANFIS, shown in bold in Table 2, were selected and compared for their prediction accuracy. The three models ANN-ARIMA, ANN, and ANFIS use the same set of predictors (MI, PM-2, SWEEM, SWEt, It). Because the ARIMA model uses only inflow data (It) and predicts only one time step ahead, Table 2 gives its predicted value only for It+1. For the ARIMA model, the correlation coefficient between the predicted and observed daily values was 0.97. Figs. 5–8 illustrate the predicted values versus the observed values. As shown in Table 3, ANFIS is more accurate than the other three models in estimating the streamflow to Elephant Butte Reservoir, with the lowest MSE in every prediction period. After ANFIS, the ARIMA model gives the most accurate one-day-ahead prediction, ahead of ANN and ANN-ARIMA. To extend the ARIMA prediction period to more than one day ahead, the one-day-ahead predicted value would have to be used in place of an observed value.

To compare accuracy, both the correlation coefficient and the MSE (mean squared error) are considered: Table 2 reports the correlation coefficients (shown in bold for the selected models), and Table 3 reports the related MSE values. Although the daily ANN model that predicts 15 days ahead (Table 2, fourth row of the numerical results) has a slightly higher correlation coefficient than the selected ANN model (Table 2, fifth row of the numerical results), it was not selected as an optimal model because of its high MSE values for the predicted streamflow. The selected ANN-ARIMA, ANN, and ANFIS models predict 7 days ahead, whereas the ARIMA model predicts only one day ahead. The correlation coefficient of 0.97 for the ARIMA model indicates that the daily ARIMA prediction is accurate only for one day ahead. Since extended predictions produce higher error values, ARIMA is not considered an appropriate model for predicting more than one day ahead. The values predicted by the most appropriate prediction model are incorporated in the mass balance (volume balance) law to investigate the optimization of the operation of the two integrated reservoirs. In addition, because of the different physical characteristics of the two reservoirs, the evaporation rates at different times of the year differ between the reservoirs, and these differences are considered in calculating the evaporation loss from each reservoir. The observed monthly evaporation rates are shown in Fig. 9. The average annual evaporation rates from Elephant Butte and Caballo reservoirs are 112.8 inch/year and 107.6 inch/year, respectively.

Fig. 5. Output values (predicted) versus target values (observed) using ANFIS. 25% of the data from 1961 to 2015 was used as a test data set.

Fig. 6. Output values (predicted) versus target values (observed) using ANN. 25% of the data from 1961 to 2015 was used as a test data set. (Panels: targets and outputs, acre-feet; error, acre-feet; both versus number of utilized daily data. MSE = 77628.1, RMSE = 278.618.)

Considering the numerical results in Table 4, and in order to demonstrate the application of the most appropriate prediction model, ANFIS was used to predict the streamflow for a specific period. Since the Elephant Butte Reservoir is connected to the existing aquifer, there is interaction between the groundwater
and surface water in this reservoir. When the surface water level in the reservoir rises, it recharges the groundwater (existing aquifer), and when the surface water level falls, the groundwater recharges the reservoir. The historical records of both surface water level and groundwater level at the Elephant Butte Reservoir indicate that the interaction between these two water levels is significantly non-linear. As a result, it is recommended to use the daily observed water level in the Elephant Butte Reservoir as a reliable parameter in finding the optimal operation plan (release plan) for this reservoir. Because of the interaction between the groundwater and surface water in Elephant Butte Reservoir, the mass balance equation cannot be used deterministically to find the water levels for a long period ahead. In more detail, it cannot be determined with certainty, for a
Table 4 Incorporating the daily streamflow values predicted by four prediction models in estimating the surface increments in both Elephant Butte (EB) and Caballo (Ca) reservoirs per acre-feet.

| Date | Actual streamflow to EB (acre-ft) | Pred. streamflow, ARIMA (acre-ft/day) | Pred. streamflow, Hybrid (acre-ft/day) | Pred. streamflow, ANN (acre-ft/day) | Pred. streamflow, ANFIS (acre-ft/day) | Storage level at EB (acre-ft) | Release from EB (acre-ft) | Storage level at Ca (acre-ft) | Release from Ca (acre-ft) | dS/dV at EB (ac/acre-ft) | dS/dV at Ca (ac/acre-ft) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6/1/2015 | 2235.40 | 2637.46 | 1631.86 | 2001.03 | 2192.16 | 342,023 | 3828.10 | 42,629 | 5373.22 | 0.0157 | 0.0629 |
| 6/2/2015 | 1763.33 | 2193.99 | 1377.32 | 1562.66 | 1744.24 | 345,365 | 3828.10 | 40,999 | 5032.07 | 0.0157 | 0.0629 |
| 6/3/2015 | 1610.58 | 1739.87 | 1387.54 | 1423.33 | 1591.83 | 348,407 | 3788.43 | 39,611 | 4952.73 | 0.0157 | 0.0545 |
| 6/4/2015 | 1560.96 | 1647.28 | 1404.90 | 1378.38 | 1542.30 | 351,357 | 3768.60 | 38,191 | 4911.07 | 0.0157 | 0.0545 |
| 6/5/2015 | 1509.40 | 1595.05 | 1352.02 | 1331.78 | 1491.04 | 353,578 | 3768.60 | 36,997 | 4748.43 | 0.0157 | 0.0577 |
| 6/6/2015 | 1281.28 | 1515.01 | 1116.91 | 1127.60 | 1271.48 | 355,699 | 3768.60 | 36,080 | 4540.17 | 0.0157 | 0.0577 |
| 6/7/2015 | 1271.42 | 1247.55 | 1058.73 | 1118.79 | 1262.29 | 357,720 | 3788.43 | 35,179 | 4464.79 | 0.0157 | 0.0577 |
| 6/8/2015 | 1390.39 | 1288.19 | 1089.36 | 1224.85 | 1374.66 | 359,747 | 3788.43 | 34,203 | 4839.67 | 0.0157 | 0.0577 |
| 6/9/2015 | 1261.53 | 1439.96 | 944.59 | 1110.00 | 1253.13 | 361,672 | 3788.43 | 33,011 | 5137.19 | 0.0157 | 0.0612 |
| 6/10/2015 | 1267.43 | 1256.74 | 933.83 | 1115.27 | 1258.62 | 363,066 | 3808.26 | 31,850 | 5008.26 | 0.0157 | 0.0612 |
| 6/11/2015 | 1237.70 | 1304.93 | 904.18 | 1088.91 | 1231.29 | 364,355 | 3828.10 | 30,859 | 4863.47 | 0.0157 | 0.0612 |
| 6/12/2015 | 1192.11 | 1262.23 | 870.78 | 1048.59 | 1189.93 | 365,108 | 3828.10 | 29,892 | 4762.31 | 0.0157 | 0.0652 |
| 6/13/2015 | 1715.70 | 1207.07 | 1285.40 | 1519.09 | 1696.91 | 365,431 | 3828.10 | 28,895 | 4724.63 | 0.0157 | 0.0652 |
| 6/14/2015 | 3945.10 | 1852.06 | 3651.48 | 3657.50 | 3862.80 | 366,724 | 3808.26 | 27,973 | 4700.83 | 0.0157 | 0.0652 |
| 6/15/2015 | 4821.77 | 4517.17 | 4277.03 | 4527.70 | 4575.30 | 367,587 | 3728.93 | 27,299 | 4502.48 | 0.0179 | 0.0697 |
| 6/16/2015 | 4242.57 | 4818.95 | 3309.39 | 3952.12 | 4127.49 | 368,452 | 3689.26 | 26,633 | 4462.81 | 0.0179 | 0.0697 |
| 6/17/2015 | 3193.38 | 3923.91 | 2154.55 | 2919.28 | 3097.77 | 369,317 | 3709.09 | 25,977 | 4556.03 | 0.0179 | 0.0697 |
| 6/18/2015 | 3159.68 | 2896.78 | 2332.50 | 2886.45 | 3060.92 | 371,051 | 3709.09 | 25,405 | 4536.20 | 0.0179 | 0.0697 |
| 6/19/2015 | 3058.53 | 3103.60 | 2262.55 | 2788.16 | 2954.43 | 374,095 | 3709.09 | 24,595 | 4625.45 | 0.0179 | 0.0521 |
| 6/20/2015 | 2959.25 | 3065.42 | 2179.02 | 2692.08 | 2855.70 | 376,718 | 3728.93 | 23,944 | 4686.94 | 0.0179 | 0.0521 |
| 6/21/2015 | 2834.39 | 3024.20 | 2106.18 | 2571.45 | 2736.91 | 379,574 | 3728.93 | 23,208 | 4655.21 | 0.0179 | 0.0521 |
| 6/22/2015 | 3151.70 | 2840.57 | 2693.39 | 2878.73 | 3052.34 | 382,001 | 3728.93 | 22,555 | 4637.36 | 0.0179 | 0.0521 |
| 6/23/2015 | 3322.25 | 3168.16 | 2925.95 | 3045.05 | 3242.42 | 384,661 | 3728.93 | 21,618 | 4730.58 | 0.0179 | 0.0550 |
| 6/24/2015 | 3254.93 | 3209.88 | 2743.98 | 2979.21 | 3166.31 | 386,775 | 3709.09 | 20,791 | 4790.08 | 0.0179 | 0.0550 |
| 6/25/2015 | 2423.79 | 3122.77 | 1746.71 | 2178.93 | 2362.81 | 389,345 | 3689.26 | 20,091 | 4677.02 | 0.0179 | 0.0582 |
| 6/26/2015 | 1644.30 | 2199.28 | 1159.77 | 1453.98 | 1625.56 | 391,701 | 3709.09 | 19,511 | 4595.70 | 0.0179 | 0.0582 |
| 6/27/2015 | 1301.12 | 1491.38 | 997.20 | 1145.23 | 1289.95 | 393,841 | 3669.42 | 18,774 | 4651.24 | 0.0179 | 0.0582 |
| 6/28/2015 | 1053.20 | 1268.24 | 860.28 | 926.72 | 1067.16 | 395,876 | 3649.59 | 18,115 | 4669.09 | 0.0179 | 0.0582 |
| 6/29/2015 | 954.01 | 1064.73 | 813.76 | 840.46 | 980.27 | 397,463 | 3649.59 | 17,609 | 4526.28 | 0.0179 | 0.0618 |
| 6/30/2015 | 1297.22 | 1005.13 | 1099.24 | 1141.70 | 1286.25 | 399,054 | 3649.59 | 17,310 | 4270.41 | 0.0179 | 0.0618 |
Fig. 7. Output values (predicted) versus target values (observed) using the hybrid model of ANN-ARIMA. 25% of the data from 1961 to 2015 was used as a test data set. (Panels: targets and outputs, acre-feet; error, acre-feet; both versus number of utilized daily data. MSE = 110775.3, RMSE = 332.829.)

Fig. 8. Output values (predicted) versus target values (observed) using ARIMA. 25% of the data from 1961 to 2015 was used as a test data set. (Panels: targets and outputs, acre-feet; error, acre-feet; both versus number of utilized daily data. MSE = 76437.329, RMSE = 276.473.)
long prediction period, how much water the reservoir will discharge to the groundwater or receive from it. Therefore, it is rational to plan for a short prediction period (for example, only one day ahead). As a result, for each one-day-ahead step, the daily observed storage levels are used in the mass balance calculation; then, given the observed storage levels in both reservoirs and the bathymetry data shown in Figs. 9 and 11, the evaporation loss rates from both reservoirs lead to the selection of the optimal storage levels, which simultaneously minimize the evaporation loss and maximize the release reliability to meet the agricultural water demands downstream of Caballo Reservoir.

Fig. 9. Monthly average evaporation from Elephant Butte and Caballo reservoirs. (Axes: average evaporation, inch/month, versus month 1 to 12; series: Elephant Butte Reservoir and Caballo Reservoir.)

Fig. 10. Surface area versus storage volumes in both Elephant Butte and Caballo reservoirs. (Axes: surface area, acres, versus storage volume, ac-ft.)

Fig. 11. Evaporation volume from Elephant Butte and Caballo reservoirs considering different surface area increments in different storage volumes of the reservoirs. (Axes: daily evaporation volume increment, ac-ft, versus storage volume, ac-ft.)

Fig. 10 illustrates the surface area versus storage volumes in both Elephant Butte and Caballo reservoirs. Table 4 shows how the streamflow values predicted by the four prediction models developed in this study (ANFIS, ANN, ARIMA, and the hybrid model of ANN-ARIMA) are integrated. The predicted values were successfully utilized in the mass balance calculation. Some error is observed in the mass balance calculation because of the interaction between the surface water storage and the groundwater level behind the Elephant Butte Reservoir. The last two columns in Table 4 show the surface area increment per acre-foot, estimated from the bathymetry data (stage-surface area and stage-storage relationships) for both Elephant Butte and Caballo reservoirs. For column 7 of Table 4, given the interaction between the surface water and groundwater in the Elephant Butte Reservoir, using the observed storage volumes obtained from bathymetry data (storage level-volume relationships) provides more accurate data than the storage volumes obtained through the mass balance. Therefore, both the mass balance and the observed storage levels are considered; however, the daily observed storage volume is significantly more reliable. But,
in planning for more than one day ahead, the predicted values are utilized. Fig. 12 shows the numerical results of the sensitivity about the mean, which was determined in order to analyze the importance degrees of the variables. Additionally, the developed pre-trained network was batch tested for another type of sensitivity analysis of the predictors of the average monthly inflow: the first input variable was varied by ± one standard deviation while the rest of the variables were fixed at their respective means, and the output of the network was computed for a limited number of steps (usually 50) above and below the mean. The same process was repeated for the rest of the variables. The sensitivity analysis results, shown in Fig. 12, represent the variation of the output values against the variation of each individual input variable. Finally, in order to rank the impacts of the input variables on the outputs (predicted values), relative importance indices were developed by normalizing the sensitivity of the variables (Cheung, Fung, & Coffey, 2006; Cheung, Tam, & Harris, 2012).

Fig. 12. Sensitivity analysis of effective parameters on monthly inflow prediction (April to September). (Axes: normalized sensitivity about the mean of outputs (predicted values) versus input variables (predictors).)

Fig. 13. Output values (simulated) versus target values (observed) using ANFIS. The data set from 1961 to 2015 was used to tune the model based on the 10 most effective selected predictors.

4.2. Accuracy performance analysis on developed monthly streamflow prediction models

Figs. 13 and 15 illustrate the simulated monthly streamflows versus the target (observed) values using ANFIS with Eqs. (12) and (13). The models of Eqs. (12) and (13) were tuned on the data set from 1961 to 2015 using the 10 and 6 most effective selected predictors, respectively. Figs. 14 and 16 plot the monthly simulated streamflows against the observed monthly historical streamflows for the models based on the 10 and 6 most effective selected predictors, respectively. Among the 6 most effective parameters, Pm−1 is the least sensitive variable. Although based
on importance degree analysis, Pm−1 has a significantly lower importance degree than the other predictors (Im−1, SWEEm, SWEm, Pm−2, Si, Mi, P5Yi, P2Yi, and PYi), including it in the model slightly improves the prediction accuracy compared to the model without Pm−1. The significance of the selected predictors was statistically analyzed; all included predictors are statistically significant and effective, but they have different degrees of importance (Zamani Sabzi et al., 2017a). As illustrated in Figs. 13–16, utilizing the time-dependent indices PYi, P2Yi, and P5Yi, obtained through historical data clustering, improved the prediction accuracy (lower RMSE) and the coefficient of determination (R2).

Fig. 14. Monthly simulated streamflows versus observed monthly historical streamflow, model developed based on the 10 most effective selected predictors. (Axes: predicted monthly streamflow versus monthly streamflow; fitted line y = 0.8161x + 11404, R² = 0.8219.)

Fig. 15. Output values (simulated) versus target values (observed) using ANFIS. The data set from 1961 to 2015 was used to tune the model based on the 6 most effective selected predictors.

Fig. 16. Monthly simulated streamflows versus observed monthly historical streamflow, model developed based on the 6 most effective selected predictors. (Axes: predicted monthly streamflow versus monthly streamflow; fitted line y = 0.8068x + 11904, R² = 0.8061.)

4.3. Numerical analysis on saved water volume due to considering more efficient management based on the daily streamflow prediction models

The release from Elephant Butte Reservoir flows into Caballo Reservoir, and the release from Caballo Reservoir goes to the downstream agricultural lands. Since these two reservoirs have different evaporation losses for different storage levels, the baseline is to select the reservoir that would have the lower evaporation loss. In other words, given the existing storage levels, if Elephant Butte has the lower evaporation loss for storing a specific volume, then at the time of demand the required volume is released from Elephant Butte Reservoir to Caballo Reservoir and, without being stored in Caballo Reservoir, is simultaneously released downstream. The baseline is thus to adjust the storage levels in both reservoirs so as to reduce the water lost to evaporation while providing the required water to the downstream agricultural lands.

Table 5 The operation policies for both Elephant Butte (EB) and Caballo (Ca) reservoirs considering the daily predicted inflows.

| Date | Pred. streamflow, ARIMA (acre-ft/day) | Pred. streamflow, Hybrid (acre-ft/day) | Pred. streamflow, ANN (acre-ft/day) | Pred. streamflow, ANFIS (acre-ft/day) | Storage volume at EB (acre-ft) | dS/dV at EB (ac/acre-ft) | Storage volume at Ca (acre-ft) | dS/dV at Ca (ac/acre-ft) | Correct choice for storing | Saved volume, actual streamflow (acre-ft) | Saved volume, ARIMA streamflow (acre-ft) | Saved volume, Hybrid streamflow (acre-ft) | Saved volume, ANN streamflow (acre-ft) | Saved volume, ANFIS streamflow (acre-ft) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6/1/2015 | 2637.46 | 1631.86 | 2001.03 | 2192.16 | 342,023 | 0.0157 | 42,629 | 0.0629 | EB | 97.7 | 115.3 | 71.3 | 87.5 | 95.8 |
| 6/2/2015 | 2193.99 | 1377.32 | 1562.66 | 1744.24 | 345,365 | 0.0157 | 40,999 | 0.0629 | EB | 77.1 | 95.9 | 60.2 | 68.3 | 76.2 |
| 6/3/2015 | 1739.87 | 1387.54 | 1423.33 | 1591.83 | 348,407 | 0.0157 | 39,611 | 0.0545 | EB | 57.9 | 62.5 | 49.9 | 51.1 | 57.2 |
| 6/4/2015 | 1647.28 | 1404.90 | 1378.38 | 1542.30 | 351,357 | 0.0157 | 38,191 | 0.0545 | EB | 56.1 | 59.2 | 50.5 | 49.5 | 55.4 |
| 6/5/2015 | 1595.05 | 1352.02 | 1331.78 | 1491.04 | 353,578 | 0.0157 | 36,997 | 0.0577 | EB | 58.7 | 62.0 | 52.6 | 51.8 | 58.0 |
| 6/6/2015 | 1515.01 | 1116.91 | 1127.60 | 1271.48 | 355,699 | 0.0157 | 36,080 | 0.0577 | EB | 49.8 | 58.9 | 43.4 | 43.9 | 49.5 |
| 6/7/2015 | 1247.55 | 1058.73 | 1118.79 | 1262.29 | 357,720 | 0.0157 | 35,179 | 0.0577 | EB | 49.4 | 48.5 | 41.2 | 43.5 | 49.1 |
| 6/8/2015 | 1288.19 | 1089.36 | 1224.85 | 1374.66 | 359,747 | 0.0157 | 34,203 | 0.0577 | EB | 54.1 | 50.1 | 42.4 | 47.6 | 53.5 |
| 6/9/2015 | 1439.96 | 944.59 | 1110.00 | 1253.13 | 361,672 | 0.0157 | 33,011 | 0.0612 | EB | 53.2 | 60.7 | 39.8 | 46.8 | 52.8 |
| 6/10/2015 | 1256.74 | 933.83 | 1115.27 | 1258.62 | 363,066 | 0.0157 | 31,850 | 0.0612 | EB | 53.4 | 52.9 | 39.3 | 47.0 | 53.0 |
| 6/11/2015 | 1304.93 | 904.18 | 1088.91 | 1231.29 | 364,355 | 0.0157 | 30,859 | 0.0612 | EB | 52.1 | 55.0 | 38.1 | 45.9 | 51.9 |
| 6/12/2015 | 1262.23 | 870.78 | 1048.59 | 1189.93 | 365,108 | 0.0157 | 29,892 | 0.0652 | EB | 54.6 | 57.9 | 39.9 | 48.1 | 54.5 |
| 6/13/2015 | 1207.07 | 1285.40 | 1519.09 | 1696.91 | 365,431 | 0.0157 | 28,895 | 0.0652 | EB | 78.6 | 55.3 | 58.9 | 69.6 | 77.8 |
| 6/14/2015 | 1852.06 | 3651.48 | 3657.50 | 3862.80 | 366,724 | 0.0157 | 27,973 | 0.0652 | EB | 180.8 | 84.9 | 167.4 | 167.6 | 177.1 |
| 6/15/2015 | 4517.17 | 4277.03 | 4527.70 | 4575.30 | 367,587 | 0.0179 | 27,299 | 0.0697 | EB | 231.3 | 216.7 | 205.2 | 217.2 | 219.5 |
| 6/16/2015 | 4818.95 | 3309.39 | 3952.12 | 4127.49 | 368,452 | 0.0179 | 26,633 | 0.0697 | EB | 203.5 | 231.1 | 158.7 | 189.6 | 198.0 |
| 6/17/2015 | 3923.91 | 2154.55 | 2919.28 | 3097.77 | 369,317 | 0.0179 | 25,977 | 0.0697 | EB | 153.2 | 188.2 | 103.3 | 140.0 | 148.6 |
| 6/18/2015 | 2896.78 | 2332.50 | 2886.45 | 3060.92 | 371,051 | 0.0179 | 25,405 | 0.0697 | EB | 151.6 | 138.9 | 111.9 | 138.5 | 146.8 |
| 6/19/2015 | 3103.60 | 2262.55 | 2788.16 | 2954.43 | 374,095 | 0.0179 | 24,595 | 0.0521 | EB | 96.9 | 98.3 | 71.7 | 88.3 | 93.6 |
| 6/20/2015 | 3065.42 | 2179.02 | 2692.08 | 2855.70 | 376,718 | 0.0179 | 23,944 | 0.0521 | EB | 93.7 | 97.1 | 69.0 | 85.3 | 90.4 |
| 6/21/2015 | 3024.20 | 2106.18 | 2571.45 | 2736.91 | 379,574 | 0.0179 | 23,208 | 0.0521 | EB | 89.8 | 95.8 | 66.7 | 81.4 | 86.7 |
| 6/22/2015 | 2840.57 | 2693.39 | 2878.73 | 3052.34 | 382,001 | 0.0179 | 22,555 | 0.0521 | EB | 99.8 | 90.0 | 85.3 | 91.2 | 96.7 |
| 6/23/2015 | 3168.16 | 2925.95 | 3045.05 | 3242.42 | 384,661 | 0.0179 | 21,618 | 0.0550 | EB | 114.1 | 108.8 | 100.5 | 104.6 | 111.4 |
| 6/24/2015 | 3209.88 | 2743.98 | 2979.21 | 3166.31 | 386,775 | 0.0179 | 20,791 | 0.0550 | EB | 111.8 | 110.3 | 94.3 | 102.3 | 108.8 |
| 6/25/2015 | 3122.77 | 1746.71 | 2178.93 | 2362.81 | 389,345 | 0.0179 | 20,091 | 0.0582 | EB | 90.4 | 116.5 | 65.2 | 81.3 | 88.2 |
| 6/26/2015 | 2199.28 | 1159.77 | 1453.98 | 1625.56 | 391,701 | 0.0179 | 19,511 | 0.0582 | EB | 61.4 | 82.1 | 43.3 | 54.3 | 60.7 |
| 6/27/2015 | 1491.38 | 997.20 | 1145.23 | 1289.95 | 393,841 | 0.0179 | 18,774 | 0.0582 | EB | 48.6 | 55.7 | 37.2 | 42.7 | 48.1 |
| 6/28/2015 | 1268.24 | 860.28 | 926.72 | 1067.16 | 395,876 | 0.0179 | 18,115 | 0.0582 | EB | 39.3 | 47.3 | 32.1 | 34.6 | 39.8 |
| 6/29/2015 | 1064.73 | 813.76 | 840.46 | 980.27 | 397,463 | 0.0179 | 17,609 | 0.0618 | EB | 38.8 | 43.3 | 33.1 | 34.2 | 39.8 |
| 6/30/2015 | 1005.13 | 1099.24 | 1141.70 | 1286.25 | 399,054 | 0.0179 | 17,310 | 0.0618 | EB | 52.7 | 40.9 | 44.7 | 46.4 | 52.3 |

The bold values were used to emphasize the selected reservoir and the water saved when considering the different forecast models.

Table 6 Total saved volume based on correct choice of reservoir for storing through the same prediction period of June 2015 (the same prediction period as in Table 5).

| | Saved volume considering the actual streamflow | Saved volume, ARIMA streamflow | Saved volume, hybrid streamflow | Saved volume, ANN streamflow | Saved volume, ANFIS streamflow |
|---|---|---|---|---|---|
| Total saved volume, acre-ft | 2650.4 | 2680.0 | 2117.0 | 2400.0 | 2591.0 |
| Total saved volume, m3 | 3,269,188 | 3,305,751 | 2,611,216 | 2,960,334 | 3,195,956 |

The bold values were used to emphasize the selected reservoir and the water saved when considering the different forecast models.

As shown in Fig. 9, the evaporation rates in these two reservoirs are not equal and vary throughout the year, so different evaporation rates are considered in determining the total loss from each individual reservoir. The total evaporation loss for storing a specific volume in each of the Elephant Butte and Caballo reservoirs is calculated through Eq. (14) as follows:
$$\left(\text{Total added surface area for a specific volume in each individual reservoir}\right) \times \left(\text{Evaporation rate of each individual reservoir in the considered time period}\right) = \text{Total evaporation loss in that time period} \tag{14}$$
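Eq. (14) can be illustrated with a short sketch. It assumes that the added surface area is approximated as the stored volume times the surface increment per acre-foot (dS/dV), and that the evaporation rate is expressed as a depth in feet over the considered period. The dS/dV values and the inflow are the 6/1/2015 entries of Table 4; the evaporation depths and the function names are hypothetical, chosen only for illustration.

```python
def added_surface_area(volume_acre_ft, ds_dv):
    """Surface area (acres) added by storing a volume, given dS/dV in ac/(acre-ft)."""
    return volume_acre_ft * ds_dv


def evaporation_loss(volume_acre_ft, ds_dv, evap_depth_ft):
    """Eq. (14): added surface area times evaporation depth over the period (acre-ft)."""
    return added_surface_area(volume_acre_ft, ds_dv) * evap_depth_ft


# Inflow and dS/dV values for 6/1/2015 as listed in Table 4.
volume = 2235.40      # acre-ft to be stored
ds_dv_eb = 0.0157     # ac/(acre-ft), Elephant Butte
ds_dv_ca = 0.0629     # ac/(acre-ft), Caballo

# Hypothetical evaporation depths for the period (feet); NOT values from the paper.
evap_depth_eb_ft = 1.00
evap_depth_ca_ft = 0.95

loss_eb = evaporation_loss(volume, ds_dv_eb, evap_depth_eb_ft)
loss_ca = evaporation_loss(volume, ds_dv_ca, evap_depth_ca_ft)

# Store the water in the reservoir with the lower evaporation loss.
better_reservoir = "EB" if loss_eb < loss_ca else "Ca"
```

With these inputs the smaller surface increment at Elephant Butte outweighs its slightly larger assumed evaporation depth, so Elephant Butte is the better place to store the volume, in line with the "EB" choice listed in Table 5.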
Therefore, the total evaporation loss saved by choosing the correct reservoir to store a specific volume is calculated through Eq. (15) as follows:
$$\left(\text{Difference of total added surface area for a specific volume in each individual reservoir}\right) \times \left(\text{Evaporation rate of each individual reservoir in the considered time period}\right) = \text{Total savable water from minimizing evaporation loss in that time period} \tag{15}$$

Table 5 shows the volume of water saved by correctly choosing the reservoir with the lower evaporation loss. For the example in Table 5, the total saved evaporation loss for the month of June is the sum of the daily saved evaporation losses, which is a significant amount of 2680 acre-ft (3,305,750.52 m3). The numerical results in Table 6 present the total saved water volume obtained with the different streamflow prediction models.

5. Conclusion

Our findings from comparing four categories of streamflow forecast models revealed the importance of appropriately selecting the effective predictors for prediction accuracy. Similar to most previous studies comparing ANN- and ANFIS-based streamflow forecast models, in our case study ANFIS performed better than ANNs and the hybrid ANN-ARIMA models; still, the superiority of ANFIS over other forecast models is not a universal conclusion, although it can be considered an expectation. Therefore, in river or reservoir management where optimal operation relies on the accuracy of the streamflow prediction, different forecast models should be examined to find the most accurate one. In our case study, the accuracy of the daily streamflow prediction models showed that ANFIS led to superior prediction performance compared to the other three selected models. In addition, extracting the climate variability indices (the time-dependent indices PYi, P2Yi, and P5Yi) and using them with ANFIS led to superior performance for monthly streamflow prediction. For daily streamflow prediction, the time-dependent trend indices did not change the prediction accuracy as significantly as they did for monthly prediction. It is reasonable to expect that the effect of an existing historical trend would be more noticeable over a longer prediction period. Therefore, in cases where the computational procedure for extracting the time-dependent trend indices is costly and time-consuming, the prediction model can be developed based only on the other physically effective predictors. The accuracy results indicated that, for this case study, the use of more information through hybrid models did not improve the prediction performance. Although ARIMA-based univariate streamflow prediction models are simple to use, most univariate forecast models utilize only previously observed values for prediction.
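As an illustration of the time-dependent indices mentioned above, the sketch below takes their stated definitions at face value: PYi, P2Yi, and P5Yi are treated as the average annual inflow over the 1, 2, and 5 years before the predicted year. The clustering and data mining steps used in the study are not reproduced, and the function names and inflow numbers are hypothetical.

```python
def trailing_mean(annual_inflow, year, n_years):
    """Mean annual inflow over the n_years immediately before `year`."""
    values = [annual_inflow[y] for y in range(year - n_years, year)]
    return sum(values) / len(values)


def climate_indices(annual_inflow, year):
    """Return the PY, P2Y and P5Y indices for the predicted year."""
    return {
        "PY": trailing_mean(annual_inflow, year, 1),
        "P2Y": trailing_mean(annual_inflow, year, 2),
        "P5Y": trailing_mean(annual_inflow, year, 5),
    }


# Hypothetical annual inflow totals (acre-ft), for illustration only.
annual_inflow = {2010: 600.0, 2011: 400.0, 2012: 500.0, 2013: 300.0, 2014: 700.0}
indices_2015 = climate_indices(annual_inflow, 2015)
```

Because each index only looks backward from the predicted year, all three are available at prediction time without any future observations.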
As an example, for a univariate ARIMA model predicting four time steps ahead (It+4), if the effective lags are 1, 2, and 3, then the streamflow values for 1, 2, and 3 days ahead (It+1, It+2, and It+3) must already have been observed. Therefore, extending the prediction period with ARIMA would not be as reliable as with ANFIS and ANNs. The non-seasonal ARIMA and seasonal ARIMA (SARIMA) models are univariate and utilize only the observed past streamflow values. Conceptually, univariate models do not use the information available from other predictors, such as snowpack and precipitation data; therefore, the prediction accuracy of those models is expected to be lower than that of the ANN- and ANFIS-based forecast models. As a result, ARIMA, as a non-seasonal univariate model, offers an approach for reliable inflow prediction when the available information is limited to streamflow data. ANFIS, as a fuzzy-logic-based neural network that applies a fuzzy inference approach between the input and output datasets, provides significantly reliable forecast models. In this study, the applicability of ANFIS was investigated, and the comparison of prediction accuracy showed its reliability compared to the ANN and hybrid ANN-ARIMA models. Of course, this cannot be considered a universal conclusion, since for most forecast models the design of the network structure and the selection of the most effective predictors are also significantly important factors. The data in Table 4 indicate that the streamflow values predicted by ANFIS and the other three prediction models were successfully incorporated in the mass balance calculation along with the optimization process. This achievement will support the daily operation of the reservoirs and short-term forecasting for them, and will provide the operation managers with clearer and more detailed hydrological information.
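The lag argument made for univariate ARIMA at the start of this discussion can be sketched as a small helper. It assumes that a model can forecast directly only as far ahead as its smallest effective lag, since any further step would need values that have not yet been observed. The function names are hypothetical.

```python
def max_direct_horizon(effective_lags):
    """Largest number of steps ahead that needs only already-observed values."""
    if not effective_lags or min(effective_lags) < 1:
        raise ValueError("lags must be positive integers")
    return min(effective_lags)


def unobserved_inputs(effective_lags, steps_ahead):
    """Offsets k such that I(t+k) is a required input that is still in the future."""
    offsets = [steps_ahead - lag for lag in effective_lags]
    return sorted(k for k in offsets if k > 0)


# Example from the text: lags 1, 2 and 3, predicting four steps ahead (It+4).
needed_for_t_plus_4 = unobserved_inputs([1, 2, 3], 4)

# Lags of the daily ARIMA model in this study: 1, 2, 7, 8, 24 and 30 days.
daily_horizon = max_direct_horizon([1, 2, 7, 8, 24, 30])
```

For the three-lag example this lists offsets 1, 2, and 3, that is It+1, It+2, and It+3, and for the daily model the smallest lag of one day limits the direct forecast to one day ahead.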
The developed management model, shown in Fig. 2, and the prediction model provide a foundation for analyzing water managers' values and assessing the interaction between conflicting criteria, such as maximizing the reliability of the Caballo release while minimizing the net evaporation loss from the system by adjusting the storage levels in both reservoirs. Optimal management of integrated reservoirs would save freshwater resources and bring significant hydrological and environmental benefits. The approach used in this study can be extended to similar water resource projects, enabling reservoir operators to save water as the major source for meeting agricultural demands, environmental demands, and hydropower energy production. Overall, preprocessing the monthly historical data through clustering and adding the time-dependent streamflow trend indices improved the prediction accuracy of the developed models. This suggests that prediction accuracy in both ANNs and ANFIS depends significantly on optimal structure design, intelligent selection of the predictors, and preprocessing of the selected predictors. Superior prediction performance was achieved by preprocessing historical data through data mining techniques, selecting the predictors intelligently based on their importance degrees, and employing several prediction models, including ANFIS, ANNs, the hybrid ARIMA-ANN model, and the univariate ARIMA time series model. Specifically, for the monthly model, utilizing ANFIS along with the time-dependent trend indices significantly improved the prediction accuracy. As such, the results of this study should be useful to prediction model developers; to those interested in applying time series and artificial intelligence based prediction models; and to professionals interested in applying those intelligent models in real environments, particularly those involved in diverse engineering applications of intelligent prediction expert systems, such as policy-makers on water and energy resources. The numerical example presented in Table 5 indicates the successful application of the developed prediction models along with the developed dynamic operation model.
The daily observed values were successfully incorporated into the estimation of daily inflow to the Elephant Butte Reservoir and into updating the parameters used in Eqs. (14) and (15), which express the mass balance law in the dynamic model. The developed management approach, together with accurate daily predictions, provided a sound basis for the optimal integrated operation of Elephant Butte and Caballo Reservoirs. It minimized evaporation loss by selecting storage volumes in both reservoirs that minimize the total surface area and therefore the evaporation amount. As presented in Tables 5 and 6, under the optimal reservoir operation strategy, the total volume of evaporation loss saved in a single month (June) is significant for each of the prediction methods considered. The accuracy measures in Table 3 and the numerical results in Tables 5 and 6 show that ANFIS and ARIMA provided more reliable and more accurate predictions than the other models. In addition, using these two models saved 2591 acre-ft (3,195,956 m³) and 2680 acre-ft (3,305,750.52 m³), respectively. Given the freshwater shortage in this region, these savings, achieved through the integrated optimal management of Elephant Butte and Caballo Reservoirs, are critically important. Because the study area is snowmelt dominated, the streamflow prediction models developed here are most appropriate for snowmelt-dominated regions and would be less effective in rainfall-dominated regions.
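The link between storage allocation, surface area, and evaporation loss described above can be illustrated with a minimal sketch. It is not the authors' dynamic model, and it does not reproduce Eqs. (14) and (15). The power-law area-storage curve, the curve parameters, the storage bounds, the evaporation depth, and the names `Reservoir`, `mass_balance`, and `best_split` are all hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservoir:
    """Hypothetical reservoir with an assumed power-law area-storage curve."""
    name: str
    min_storage: float  # acre-ft (assumed bound)
    max_storage: float  # acre-ft (assumed bound)
    coeff: float        # curve coefficient (assumed, not from the paper)
    exponent: float     # curve exponent (assumed, not from the paper)

    def surface_area(self, storage: float) -> float:
        """Surface area in acres for a storage given in acre-ft."""
        return self.coeff * storage ** self.exponent


def mass_balance(storage, inflow, release, evaporation):
    """Generic storage update: next = current + inflow - release - evaporation."""
    return storage + inflow - release - evaporation


def best_split(upper, lower, total_storage, evap_depth_ft, steps=1000):
    """Grid-search the split of total_storage that minimizes evaporation loss.

    Returns (upper_storage, lower_storage, evaporation_loss) in acre-ft.
    """
    # Feasible range for the upper reservoir given both reservoirs' bounds.
    lo = max(upper.min_storage, total_storage - lower.max_storage)
    hi = min(upper.max_storage, total_storage - lower.min_storage)
    if lo > hi:
        raise ValueError("total_storage is infeasible for the given bounds")
    best = None
    for i in range(steps + 1):
        s_upper = lo + (hi - lo) * i / steps
        s_lower = total_storage - s_upper
        area = upper.surface_area(s_upper) + lower.surface_area(s_lower)
        loss = evap_depth_ft * area  # ft * acres = acre-ft
        if best is None or loss < best[2]:
            best = (s_upper, s_lower, loss)
    return best


# Illustrative values only; none of these numbers come from the study.
upper = Reservoir("upper", 50_000.0, 2_000_000.0, 1.2, 0.7)
lower = Reservoir("lower", 10_000.0, 300_000.0, 2.5, 0.7)
s_up, s_low, loss = best_split(upper, lower, 400_000.0, evap_depth_ft=1.0)
print(f"upper={s_up:.0f} acre-ft, lower={s_low:.0f} acre-ft, loss={loss:.0f} acre-ft")
```

With these assumed curves (exponent below 1), total surface area is a concave function of the split, so the search places as much water as the bounds allow in the reservoir with the smaller area per unit of storage. Real area-storage relationships come from reservoir surveys and may behave differently.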
As a future direction of this study, to show the effect of a one-unit increase or decrease in each individual predictor on the streamflow magnitude, multiple logistic regression models
would be developed. In addition, those multiple logistic regression models would be used to quantify and predict streamflow drought (in terms of hydrologic drought) on the Rio Grande above Elephant Butte Reservoir, New Mexico. Those models would provide a basis for quantifying how significant each individual streamflow predictor is in determining the probability of hydrologic (streamflow) drought one year ahead. This study comparatively investigated the applicability of several forecast models to Rio Grande inflow and integrated those models into the successful optimal operation of multiple reservoirs. Alongside this study, a comparative review of the accuracies and capabilities of numerous existing forecast models, whether based on artificial intelligence or regression, would be conducted to establish their applicability; this would provide a basis for categorizing the forecast models and recommending the best one for the prediction conditions at regional and global scales.
Acknowledgment
The authors would like to thank the Reinventing the Nation’s Urban Water Infrastructure Engineering Research Center at Stanford University for partially funding this research project.
References
Abudu, S. (2009). Monthly and seasonal streamflow forecasting in the Rio Grande Basin. New Mexico State University.
Abudu, S., King, J. P., & Bawazir, A. S. (2011). Forecasting monthly streamflow of spring-summer runoff season in Rio Grande headwaters basin using stochastic hybrid modeling approach. Journal of Hydrologic Engineering, 2004, 384–390. http://doi.org/10.1061/(ASCE)HE.1943-5584.0000322.
Adamowski, J. F. (2008). Peak daily water demand forecast modeling using artificial neural networks. Journal of Water Resources Planning and Management, 134(April), 119–128. http://doi.org/10.1061/(ASCE)0733-9496(2008)134:2(119).
Atsalakis, G., & Minoudaki, C. (2007).
Daily irrigation water demand prediction using adaptive neuro-fuzzy inferences systems (ANFIS), 369–374.
Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control. San Francisco: Holden-Day.
Cheung, S. O., Fung, A. S., & Coffey, W. V. (2006). Project dispute resolution satisfaction classification through neural network, 23(8), 573–650.
Cheung, S. O., Tam, C. M., & Harris, F. C. (2012). Arbitration as an alternative dispute resolution method. World Construction Conference 2012 – Global Challenges in Construction Industry, 16(June), 23–31.
Coulibaly, P., Anctil, F., & Bobée, B. (2000). Daily reservoir inflow forecasting using artificial neural networks with stopped training approach. Journal of Hydrology, 230(3–4), 244–257. http://doi.org/10.1016/S0022-1694(00)00214-6.
El-Shafie, A., Taha, M. R., & Noureldin, A. (2007). A neuro-fuzzy model for inflow forecasting of the Nile river at Aswan high dam. Water Resources Management, 21(3), 533–556. http://doi.org/10.1007/s11269-006-9027-1.
Imrie, C. E., Durucan, S., & Korre, A. (2000). River flow prediction using artificial neural networks: Generalisation beyond the calibration range. Journal of Hydrology, 233(1–4), 138–153. http://doi.org/10.1016/S0022-1694(00)00228-6.
Jain, S. K., Das, A., & Srivastava, D. K. (1999). Application of ANN for reservoir inflow prediction and operation, 125(October), 263–271.
Jang, J. R. ANFIS: Adaptive-network-based fuzzy inference system. http://doi.org/10.1109/21.256541.
Karimi-Googhari, S. H., & Lee, T. S. (2011). Applicability of adaptive neuro-fuzzy inference systems in daily reservoir inflow forecasting. International Journal of Soft Computing, 6(3), 75–84. http://doi.org/10.3923/ijscomp.2011.75.84.
Mohammadi, K., Eslami, H. R., & Dayyani-dardashi, Sh. (2005). Comparison of regression, ARIMA and ANN models for reservoir inflow forecasting using snowmelt equivalent, 4, 43–52.
Muluye, G. Y., & Coulibaly, P. (2007). Seasonal reservoir inflow forecasting with low-frequency climatic indices: A comparison of data-driven methods. Hydrological Sciences Journal, 52(3), 508–522. http://doi.org/10.1623/hysj.52.3.508.
Othman, F., & Naseri, M. (2011). Reservoir inflow forecasting using artificial neural network, 6(3), 434–440. http://doi.org/10.5897/IJPS10.649.
Pramanik, N., & Panda, R. K. (2009).
Application of neural network and adaptive neuro-fuzzy inference systems for river flow prediction. Hydrological Sciences Journal, 54(2), 247–260. http://doi.org/10.1623/hysj.54.2.247.
Shukla, S., Sheffield, J., Wood, E. F., & Lettenmaier, D. P. (2013). On the sources of global land surface hydrologic predictability. Hydrology and Earth System Sciences, 17(7), 2781–2796. http://doi.org/10.5194/hess-17-2781-2013.
Talebizadeh, M., & Moridnejad, A. (2011). Uncertainty analysis for the forecast of lake level fluctuations using ensembles of ANN and ANFIS models. Expert Systems with Applications, 38(4), 4126–4135. http://doi.org/10.1016/j.eswa.2010.09.075.
Valipour, M., Banihabib, M. E., & Behbahani, S. M. R. (2012). Monthly inflow forecasting using autoregressive artificial neural network. Journal of Applied Sciences, 12(20), 2139–2147. http://doi.org/10.3923/jas.2012.2139.2147.
Wu, C. L., Chau, K. W., & Li, Y. S. (2009). Methods to improve neural network performance in daily flows prediction. Journal of Hydrology, 372(1–4), 80–93. http://doi.org/10.1016/j.jhydrol.2009.03.038.
Zamani Sabzi, H., King, J. P., & Abudu, S. (2017a). Developing an artificial intelligence-based forecast model utilizing datamining techniques to improve reservoir streamflow prediction - A case study. Water Science and Engineering.
Zamani Sabzi, H., King, J. P., & Abudu, S. (2017b). Integration of time series forecasting in a dynamic decision support system for multiple reservoir management - A case study. Journal of Irrigation and Drainage Engineering.