A hybrid approach to monthly streamflow forecasting

Humphrey G.B., Gibbs M.S., Dandy G.C. and Maier H.R. A hybrid approach to monthly streamflow forecasting: integrating hydrological model outputs into a Bayesian artificial neural network, Journal of Hydrology, DOI: 10.1016/j.jhydrol.2016.06.026.

A hybrid approach to monthly streamflow forecasting: integrating hydrological model outputs into a Bayesian artificial neural network

Greer B. Humphrey (a,∗), Matthew S. Gibbs (a,b), Graeme C. Dandy (a), Holger R. Maier (a)

(a) School of Civil, Environmental and Mining Engineering, The University of Adelaide, Adelaide, SA, Australia, 5005
(b) Department of Environment, Water and Natural Resources, GPO Box 2384, Adelaide, SA, 5001, Australia

Abstract

Monthly streamflow forecasts are needed to support water resources decision making in the South East of South Australia, where baseflow represents a significant proportion of the total streamflow and soil moisture and groundwater are important predictors of runoff. To address this requirement, the utility of a hybrid monthly streamflow forecasting approach is explored, whereby simulated soil moisture from the GR4J conceptual rainfall-runoff model is used to represent initial catchment conditions in a Bayesian artificial neural network (ANN) statistical forecasting model. To assess the performance of this hybrid forecasting method, a comparison is undertaken of the relative performances of the Bayesian ANN, the GR4J conceptual model and the hybrid streamflow forecasting approach for producing 1-month ahead streamflow forecasts at three key locations in the South East of South Australia. Particular attention is paid to the quantification of uncertainty in each of the forecast models and the potential for reducing forecast uncertainty by using the hybrid approach is considered. Case study results suggest that the hybrid models developed in this study are able to take advantage of the complementary strengths of both the ANN models and the GR4J conceptual models. This was particularly the case when forecasting high flows, where the hybrid models were shown to outperform the two individual modelling approaches in terms of the accuracy of the median forecasts, as well as reliability and resolution of the forecast distributions. In addition, the forecast distributions generated by the hybrid models were up to 8 times more precise than those based on climatology; thus, providing a significant improvement on the information currently available to decision makers.

Keywords: Monthly streamflow forecasting; Bayesian artificial neural networks; Conceptual hydrological models; Uncertainty; Hybrid modelling; South Australia

∗ Corresponding author. Email address: [email protected] (Greer B. Humphrey)

Preprint submitted to Journal of Hydrology, June 21, 2016

1. Introduction

Accurate and reliable monthly streamflow forecasts can be extremely valuable for the proper management and allocation of water resources, particularly in a highly variable climate where the historical data alone have limited value in supporting decision making. This is the case in the South East of South Australia, where water resources are under pressure from changing land uses, yet highly variable flow regimes make these resources difficult to predict and manage. However, the competing environmental and agricultural demands on water resources in this region mean that the optimal management of flows is needed in order to ensure maximum benefit is derived from the water that is available (Gibbs et al., 2014).

In monthly streamflow forecasting, two sources of predictability are typically exploited: catchment conditions (wetness) at the time of the forecast and the effect of climate over the forecast period (Pokhrel et al., 2013). As discussed in Wang et al. (2009), there are essentially two distinct approaches for doing this. The first involves the use of hydrological models that are driven by dynamic climate forecasts (i.e. forecasts of rainfall and other weather variables) and represent hydrological processes related to soil water balance and the evolution of the flow to the outlet of the basin. The second approach relies on predictors for representing initial catchment conditions (e.g. antecedent streamflow or soil moisture data) and climate during the forecast period (e.g. large scale climate indices or climate model predictions), together with statistical relationships derived from data that relate these predictors to upcoming streamflows (Plummer et al., 2009; Robertson and Wang, 2012).

Hydrologic models employed in the former ‘dynamical’ forecasting approach typically operate on a daily or sub-daily time scale and can range from simple lumped conceptual rainfall–runoff (R-R) models to more physically-based fully distributed models. Simple conceptual R-R models have been widely employed for modelling streamflow in Australia, as they generally provide good prediction accuracy, provided good climate data are available, and have relatively few parameters to calibrate (see Boughton (2005) for a review of hydrological models used in Australia and Wang et al. (2011b) for a discussion on the use of such models for monthly streamflow forecasting). These models attempt to explicitly simulate the dominant processes occurring within a hydrological system through the simplified representation of the system, typically as a series of conceptual water stores with simple empirical relationships used to describe the recharge and depletion processes that occur within and between them (Jain and Srinivasulu, 2006; Kokkonen and Jakeman, 2001). In the ‘statistical’ flow forecasting approach, on the other hand, system response is characterised primarily through the extraction of information implicitly contained in a set of observed data (e.g. monthly totals or averages), without directly taking into account the physical processes occurring within the hydrological system (Kokkonen and Jakeman, 2001; Toth and Brath, 2002).

A perceived advantage of the dynamical forecasting approach, when compared with statistical streamflow forecasting models, is that given the physical basis of the hydrological models, they are able to capture catchment dynamics that predictors used in statistical models cannot (e.g. those related to catchment wetness). Therefore, they should be more faithful in simulating the rainfall-runoff process (Chen and Adams, 2006; Robertson et al., 2013). This is particularly considered to be the case under nonstationary conditions where ‘past’ predictor–response relationships derived by statistical models may no longer represent those at the time of the forecast (Wang et al., 2011a).
However, the transformation of rainfall into runoff is an extremely complex, dynamic, and nonlinear process (Hsu et al., 1995), which can be difficult to fully understand and represent, particularly by means of a simple, conceptual model. Furthermore, similar to statistical forecasting models, hydrological models generally require calibration using historical rainfall and streamflow data. The choice of the calibration period and its length can have a significant impact on the estimated conceptual model parameters and, hence, the relationships modelled. In addition, dynamical forecasting models generally require statistical post-processing to remove systematic biases and to quantify uncertainty not represented directly by the calibrated model (Robertson et al., 2013). While knowledge-based hydrologic models are important for understanding hydrological processes, the main concern in many practical applications of
monthly streamflow forecasting models is the accuracy and reliability of the forecasts; therefore, in such situations, statistical forecasting may be more suitable. These models do not require explicit consideration of the processes occurring within a hydrological system and, therefore, are not limited by an incomplete or unsuitable description of the complex R-R transformation processes as simple hydrological models may be. Furthermore, in contrast to conceptual R-R models, which typically require daily rainfall and potential evapotranspiration (PET) data as inputs, statistical models are generally not based on a prescribed (and possibly limited or prohibitive) set of input information, but rather they are able to take advantage of whatever relevant data are available. The potential to include auxiliary data, for example, those related to possible land use and climate impacts, may allow statistical streamflow forecasting models to characterise changes in the hydrological behaviour of a catchment that cannot be easily represented by simple conceptual R-R models (provided data are available that describe the change in rainfall-runoff relationship; e.g. data related to changes in land use, extractions or groundwater levels). Examples of statistical streamflow forecasting approaches include linear regression and time series models (Garen, 1992; Pagano et al., 2009; Valipour et al., 2012, 2013), non-parametric fitting (Sharma, 2000), independent component analysis (Westra et al., 2008), joint probability modelling (Wang et al., 2009) and artificial intelligence based approaches such as support vector machines (SVM), fuzzy logic and evolutionary computation based methods, Wavelet-Artificial Intelligence (W-AI) models and artificial neural networks (ANNs) (see Yaseen et al. (2015) for a review of such artificial intelligence based methods). 
ANNs are an extremely versatile type of data-driven model that have become widely adopted for hydrological modelling applications over the past two decades (see Abrahart et al. (2012) and Maier et al. (2010)). An advantage of these models over more traditional statistical modelling approaches is their flexible model structure, which enables them to capture arbitrarily complex and nonlinear input-output relationships from data without any restrictive assumptions about the functional form of the underlying process. However, despite their appeal, the performance of an ANN, like all statistical streamflow forecasting models, is highly dependent on the availability and quality of observed data. Ideally, to develop a reliable and robust ANN model, concurrent observations of all relevant predictors (i.e. those representing catchment wetness and climate effects) and the streamflow response would be required, with records sufficiently long to include a wide range of
conditions; while in reality, ANN models usually need to make do with whatever data are available. For example, antecedent rainfall and streamflows are typically used as rather crude proxies for representing initial catchment wetness in ANNs and other statistical streamflow forecasting models, due to the limited availability of soil moisture observations (Robertson et al., 2013). Furthermore, in comparison with conceptual R-R models, ANNs tend to have many more parameters requiring calibration and, consequently, they are more likely to be overparameterised with respect to the available data. As such, there is a greater risk that ANNs will not be capable of producing reliable forecasts beyond the range of the calibration data, unless models are updated as new data become available (e.g. Bowden et al. (2012)). In order to improve the accuracy and reliability of monthly streamflow forecasts, it would seem opportune to integrate or hybridise dynamical and statistical streamflow forecasting models in some way so as to exploit the strengths and eliminate the weaknesses of the respective modelling methodologies, rather than continuing to choose between the individual techniques and using them in isolation (Maier et al., 2010; Srinivasulu and Jain, 2009; Mount et al., 2016). There are a number of ways in which conceptual R-R models and ANNs can and have been combined in order to take advantage of their complementary strengths. 
These include the use of ANNs for the statistical post-processing of conceptual R-R model outputs and their associated uncertainty (Shamseldin and O’Connor, 2001; Brath et al., 2002; Abebe and Price, 2003; Anctil et al., 2003; Shrestha et al., 2009); the replacement of runoff generation and routing algorithms within both lumped and semi-distributed conceptual R-R models with ANNs (Chen and Adams, 2006; Corzo et al., 2009; Song et al., 2012; Liu et al., 2013; Loukas and Vasiliades, 2014); and the use of non-standard outputs from conceptual R-R models to expand the predictor set for ANNs (see Abrahart et al. (2012) for a more thorough discussion). The latter approach was taken by Anctil et al. (2004); Srinivasulu and Jain (2009); Isik et al. (2013) and Noori and Kalin (2016) who incorporated simulated data including soil moisture, effective rainfall, surface runoff and infiltration depths, baseflow and stormflow information derived from conceptual models into ANNs used for forecasting daily river flows. Similarly, Nilsson et al. (2006) used information about soil moisture and snow accumulation derived from a conceptual R-R model as auxiliary inputs to an ANN used for simulating monthly runoff. Recently, Rosenberg et al. (2011) and Robertson et al. (2013) investigated the benefits of hybrid seasonal forecasting systems where outputs from hydrological
models were used as predictors in a statistical forecasting system (although not ANN-based). In both cases, it was found that improved skill could be gained by using simulated initial conditions provided by a hydrological model rather than observations either due to the better spatial and temporal variation captured by the simulated data or through the provision of data when observations of an important variable were unavailable. In this study, the utility of a hybrid monthly streamflow forecasting approach similar to that developed by Robertson et al. (2013) is explored, whereby simulated soil moisture data from a hydrological model are used to represent initial catchment wetness in a statistical forecasting system. The study was motivated by the need for 1-month ahead streamflow forecasts to support water resources decision making in the South East of South Australia, where baseflow represents a significant proportion of the total streamflow and soil moisture and groundwater are important drivers of runoff. In highly variable, ephemeral catchments such as this (that flow for some period between June and November most years), previous or current streamflow is often not a powerful predictor of future streamflow. Therefore, it is expected that the integration of simulated soil moisture from a conceptual R-R model into the statistical forecast model will help to overcome this shortcoming, as was found to be the case in Robertson et al. (2013) and in Anctil et al. (2004), with the latter study focused on daily forecasting. In contrast to Robertson et al. (2013) who used a Bayesian joint probability statistical forecasting model, which assumes the joint distribution of streamflow and its predictors is described by a transformed multivariate normal distribution, Bayesian ANNs are employed for statistical forecasting in this study, as this modelling approach makes no assumptions about the form of the modelled function. In addition, unlike Robertson et al. 
(2013) who relied on lagged climate indices to represent the climate during the forecast period, rainfall forecasts produced by a dynamical climate model are used for this purpose, since this approach has been found to marginally improve statistical seasonal streamflow forecasts (Pokhrel et al., 2013). However, to ensure that only those predictors that are both relevant and non-redundant for forecasting streamflows in the catchments considered are selected as model inputs, a rigorous input variable selection (IVS) method is applied. In order to identify any gains achieved through the hybridisation of the dynamical and statistical streamflow forecasting methods, a comparison is undertaken of the relative performances of the statistical (Bayesian ANN), dynamical (based on the GR4J hydrological model (Perrin et al., 2003)) and
hybrid (Bayesian ANN with GR4J simulated soil moisture inputs) streamflow forecasting approaches for producing 1-month ahead streamflow forecasts at three key locations in the South East of South Australia. Streamflow forecasts are needed at these locations for flow management purposes and, ultimately, the aim of this study is to develop the most accurate and reliable models for supporting water resources decision making in this region. As such, the models developed in this study are assessed not only based on model accuracy, but also on their ability to provide forecasts on which management decisions could confidently be based. Consequently, particular attention is given to forecast uncertainty, as monthly streamflow forecasts can be sensitive to the representation of climate over the forecasting period and the initial catchment conditions, both of which may be highly uncertain and difficult to characterise appropriately. Thus, the potential for including soil moisture as a strategy for reducing such uncertainty is investigated.

2. Study area

A number of high value wetlands within the South East region of South Australia, including the internationally significant Bool and Hacks Lagoons, are currently under threat as a result of the significantly reduced inflows they have received since European settlement. Most of the region’s runoff is now diverted away from many of the wetlands by a series of cross-country drains that were constructed to provide flood relief by draining flood waters to the ocean. It is now recognised, however, that such flows are needed to maintain wetlands in the region and methods for restoring natural flow paths have been sought (e.g. Gibbs et al. (2014)). The extensive drainage network is shown in Fig. 1. Drain M, which conveys water from Bool Lagoon to the ocean near Beachport, is the largest of the cross-country drains and collects flow from a number of drainage systems located on the southern side of the drain (the Drain M catchment is denoted by the grey shaded region in Fig. 1). Recently, the ability to divert flow from this drain toward wetlands in the north has been established. However, the Drain M system and its contributing catchments also contain wetlands of high importance, including Bool and Hacks Lagoons and Lake George, where the drain terminates. To best manage this water resource, decisions must be made throughout the year regarding diversions from Drain M and whether water should be allowed to drain to Lake George and the ocean or be diverted to the north. As such, streamflow forecasts with a lead time of one month are required for the optimal management of Drain M flows.

Figure 1: Lower South East drainage network (red). The Drain M catchment is denoted by the grey shaded region. The vertical black line at approximately 141E denotes the border between South Australia and Victoria.

For the purposes of this study, the overall catchment area contributing to the flow in Drain M has been divided into three smaller catchments, as shown by the grey shaded regions in Fig. 2, with outlets at flow stations A2390519, A2390514 and A2390512. This division is not only based on the location of flow gauges, but also major management decision making locations. The most upstream catchment (to the northeast) is important for maintaining flows into the internationally significant Bool and Hacks Lagoons and flows into these wetlands are recorded at station A2390519. Not only does Bool Lagoon have an important conservation function, it also acts as a balancing storage for the system when inflow rates from the upstream catchment are substantially greater than the capacity of Drain M. Flow station A2390514 records flow just downstream of the Callendale regulator, which is used to control flows in Drain M and provides the potential to divert flows northward to maintain other wetlands in the region. Flow at this point includes runoff from the middle catchment, as well as any releases from Bool and Hacks Lagoons, which are recorded at station A2390541. The most downstream flow station A2390512 is located near to where Drain M terminates at Lake George, and as such records the volumes flowing into Lake George. All three catchments shown in Fig. 2 may potentially contribute to flow in Drain M at this point; however, releases from Bool and Hacks Lagoons have been infrequent over the past 15 years and more typically only the lower two catchments would contribute. On average, 30 GL/year is required to support Lake George (based on 1.5 times the volume required to fill Lake George annually) (AWE, 2009), while additional flows have the potential to be diverted to the north at Callendale.
The three catchments will henceforth be referred to as A2390519, A2390514 and A2390512, after the flow gauges which record their respective outflows. Details of the catchment areas, annual rainfalls and annual flows are given in Table 1. Flows in Drain M predominantly occur between June and November, while for the remainder of the year there is typically no (or only very low) flow. The highest rainfalls are experienced in the winter months, with rainfall exceeding evapotranspiration between May and September.

Figure 2: Catchments considered with flow stations A2390519, A2390514 and A2390512 marking their outlets. The potential diversion point is at station A2390514, where flow can be directed to the north. Also shown are climate stations and groundwater wells where data used for calibrating the models were recorded. The green shaded region denotes the area of Blue Gum forestry plantations established in the late 1990s.

Table 1: Catchment area and mean annual rainfall and flow. ‘Mean Annual Flow’ was calculated from the total flow recorded at each location; however, ‘Area’ represents the sub-catchment area only (e.g. dark grey area in Fig. 2 for catchment A2390512). Mean annual rainfalls and flows were calculated over the period 1971 to 2012.

Catchment   Area (km²)   Mean Annual Rainfall (mm)   Mean Annual Flow (GL)
A2390519    1003         606                         22.29
A2390514    2200         667                         29.27
A2390512    383          676                         50.06

The development of a streamflow forecasting model and the management of the Drain M system is complicated by the fact that historical flow data may not represent the flows which might be expected in the future under similar climatic conditions. Since the mid to late 1990s, there has been a significant decrease in monthly flow volumes from each of the three catchments, which is in part due to the 1997-2008 ‘Millennium’ drought, where rainfall was substantially below average and there was an observed decline in groundwater levels in the shallow unconfined aquifer. The establishment of Blue Gum forestry plantations (denoted by the green shaded regions in Fig. 2) also coincides with this decrease in streamflow and may have contributed to a lowering of the watertable (Mustafa et al., 2006). In Fig. 3, plots of cumulative flow against cumulative rainfall are presented from the beginning of the flow record until 2012 for each catchment. The nonlinearity of these plots shows a change in the relationship between rainfall and runoff, particularly in the A2390519 and A2390514 catchments, occurring around the late 1990s. It has been proposed by Petheram et al. (2011) that in low-relief, moderate rainfall catchments with shallow unconfined aquifers, relatively high pre-drought groundwater levels may have amplified overland flow by reducing the storage capacity of the unsaturated zone and by facilitating organised patterns of drainage as the soil wetted up during a rainfall event. A falling groundwater table, on the other hand, would result in a reduction of overland flows due to the increased storage capacity of the unsaturated zone, meaning that saturation conditions are less likely to occur (Petheram et al., 2011).

Figure 3: Cumulative Rainfall versus cumulative runoff for the three catchments in Fig. 2 (x-axis: cumulative monthly rainfall, mm; y-axis: cumulative monthly flow, GL).
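The double-mass analysis behind Fig. 3 can be illustrated with a short sketch. The Python snippet below is not the authors' code; the function names and the six synthetic monthly values are hypothetical and serve only to show how a flatter slope in the later segment indicates less runoff per unit of rainfall.

```python
from itertools import accumulate


def double_mass(rain_mm, flow_gl):
    """Return cumulative rainfall and cumulative flow series of equal length."""
    if len(rain_mm) != len(flow_gl):
        raise ValueError("rainfall and flow series must have the same length")
    return list(accumulate(rain_mm)), list(accumulate(flow_gl))


def segment_slope(cum_rain, cum_flow, start, end):
    """Average slope (GL of flow per mm of rain) between two time indices."""
    return (cum_flow[end] - cum_flow[start]) / (cum_rain[end] - cum_rain[start])


# Synthetic monthly values for illustration only (not data from the study).
rain = [50.0, 60.0, 55.0, 50.0, 60.0, 55.0]
flow = [2.0, 3.0, 2.5, 0.5, 1.0, 0.75]

cum_rain, cum_flow = double_mass(rain, flow)
early_slope = segment_slope(cum_rain, cum_flow, 0, 2)
late_slope = segment_slope(cum_rain, cum_flow, 3, 5)
print(f"early slope: {early_slope:.4f} GL/mm, late slope: {late_slope:.4f} GL/mm")
```

A straight double-mass curve would give similar slopes for both segments; a marked drop in the later slope corresponds to the kind of bend discussed for the A2390519 and A2390514 catchments.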

3. Methodology

As discussed in the introduction to this paper, it is expected that a hybrid streamflow forecasting model, integrating simulated output from a hydrological model into an ANN, would exploit the strengths of the respective modelling approaches to provide the most accurate and reliable forecasts at the locations of interest for the management problem considered. Specifically, in this study, the ability of a hydrological model to describe initial catchment conditions (i.e. soil moisture) is taken advantage of, while the ability of an ANN to model complex processes and to readily incorporate the groundwater information needed to properly describe the generation of runoff in the case study area is leveraged. In order to test this hypothesis, a comparison of the statistical (Bayesian ANN), dynamical (based on the GR4J conceptual hydrological model) and hybrid (Bayesian ANN with GR4J simulated soil moisture inputs) forecasting approaches was carried out.

3.1. Dynamical forecast models

3.1.1. Model structure

The GR4J conceptual R-R model (Perrin et al., 2003) was selected as the hydrological model in the dynamical forecasting approach, as it is a parsimonious model that explicitly accounts for non-conservative (or leaky) catchments, and has been shown to perform well for Australian conditions (Coron et al., 2012). The hydromad R package (Andrews et al., 2011) implementation of the GR4J model was adopted in this study, which allows for a 5-parameter version of this model. The five parameters include the original GR4J parameters X1, X2, X3 and X4, as well as an additional parameter X5, which determines the proportions of total effective rainfall modelled as indirect flow (X5) and direct flow (1 - X5). In the original GR4J model, this parameter is treated as a constant with a value of 0.9. The model schematic can be seen in Fig. 4, and further details of the model structure and parameters can be found in Perrin et al. (2003).

Figure 4: Schematic of 5-parameter GR4J model adapted from Perrin et al. (2003). Parameters shown in the schematic: X1, capacity of the production store (mm); X2, water exchange coefficient (mm); X3, capacity of the nonlinear routing store (mm); X4, unit hydrograph time base (day); X5, split between direct and routed runoff (%).

3.1.2. Model data

To generate forecasts using the dynamical approach, the hydrological model is run up to the time of the forecast using historical rainfall and PET observations as inputs in order to initialise the model state variables. The model is subsequently run for the forecast period using dynamical climate model forecasts to produce forecasts of streamflow. This required historical flow, rainfall and PET data for calibration and estimation of the initial catchment conditions, while forecast rainfall and PET data were required for producing streamflow forecasts. All models presented in this paper were ultimately developed to provide monthly forecasts of streamflow; however, the GR4J model is a daily model, requiring daily data. Daily flow data were available at flow stations A2390519, A2390514 and A2390512 from 1971 until 2014. The SILO Patched Point Dataset (Jeffrey et al., 2001) was used to provide daily rainfall and FAO56 PET data from 1889 until 2014, with the climate stations used shown in Fig. 2. A Thiessen polygon approach was used to combine stations and produce one daily time series each of catchment average rainfall and PET for each catchment. Forecasts generated using the Australian Bureau of Meteorology’s (BoM) dynamical climate forecasting system, POAMA-2 (version 2.4), were used to obtain future rainfall and PET data over the forecast period. Monthly rainfall predictions (in the form of 30-member ensembles, each with a spatial resolution of ≈200 km) were available from the forecast system from January 1980 until December 2011. Downscaling of the large scale monthly POAMA forecasts was carried out to obtain daily rainfall and PET at the catchment scale based on the statistical analogue downscaling method detailed in Shao and Li (2013). Using this approach, the analogue period identified by downscaling monthly POAMA rainfall predictions was used to provide daily rainfall and PET ‘forecasts’ at each of the climate stations shown in Fig. 2, whereby the observed daily rainfall and PET data from the analogue month (i.e. the historical month identified as being most similar to the forecast mean monthly weather pattern) were used as daily climate forecasts. This resulted in 30-member ensembles of daily rainfall and PET data over the forecast period.
For the purposes of this study, the forecast period is the period over which the models developed were validated and compared. Data from 2000 to 2004, inclusive, were used for independent validation and comparison of the models developed. These five years were selected such that the performance of the models could be assessed on a mixture of both relatively high and low flows. Furthermore, the validation period represents the “current” runoff regime, which is drier than the pre-drought regime prior to the late 1990s. As such, performance results on these data should give a better representation of how the models may perform if used for operational purposes. The period of data used for model calibration is discussed in the following section.
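As an illustration of the Thiessen polygon step described in this section, the following Python sketch computes an area-weighted catchment average from station series. It is a minimal example under stated assumptions: the station labels, polygon areas and rainfall values are hypothetical, and the polygon areas actually used in the study are not given here.

```python
def thiessen_average(series_by_station, area_by_station):
    """Area-weighted average of station series (one value per day)."""
    stations = list(series_by_station)
    lengths = {len(series_by_station[s]) for s in stations}
    if len(lengths) != 1:
        raise ValueError("all station series must have the same length")
    # Total polygon area within the catchment, used to normalise the weights.
    total_area = sum(area_by_station[s] for s in stations)
    n_days = lengths.pop()
    return [
        sum(area_by_station[s] * series_by_station[s][day] for s in stations)
        / total_area
        for day in range(n_days)
    ]


# Hypothetical stations, polygon areas (km2) and daily rainfall (mm).
rain = {"station_a": [4.0, 0.0, 2.0], "station_b": [0.0, 8.0, 2.0]}
areas = {"station_a": 300.0, "station_b": 100.0}

catchment_rain = thiessen_average(rain, areas)
print(catchment_rain)
```

The same function could be applied to PET series, since the weighting depends only on the polygon areas and not on the variable being averaged.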

3.1.3. Model calibration

Three GR4J models were developed to forecast flows at stations A2390519, A2390514 and A2390512. For each of these models, which will henceforth be referred to as GR4J 0519, GR4J 0514 and GR4J 0512, respectively, values of the five GR4J model parameters ($X_1, \ldots, X_5$) were identified by calibration. In order to account for the uncertainty associated with the GR4J parameter estimates, calibration was undertaken using the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt et al., 2009), which estimates the posterior parameter distributions within a Bayesian framework using an efficient, self-adaptive Markov chain Monte Carlo (MCMC) sampling procedure. As with the GR4J model, the implementation of the DREAM algorithm included in the hydromad R package was adopted. The DREAM algorithm requires specification of a prior distribution for each parameter and a likelihood function that describes how well the estimated parameters represent the process being modelled. Uniform prior distributions were assumed for all parameters, with the assumed bounds given in Table 2, while the likelihood function adopted for calibration of the GR4J model was the sum of squared errors (SSE), which assumes independent, homoscedastic residuals.

Table 2: Bounds adopted for the uniform prior distributions.

Parameter   Units   Lower Bound   Upper Bound
X1          mm      100           600
X2          mm      -15           5
X3          mm      1             300
X4          days    0.5           6
X5          %       0.6           0.99
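As a rough illustration of how the uniform priors in Table 2 and an SSE-based likelihood can be combined when a candidate parameter set is evaluated, the following Python sketch computes an unnormalised log-posterior. The function names are hypothetical, and the likelihood form used (a Gaussian likelihood with the error variance integrated out, giving a log-likelihood proportional to $-(n/2) \log \mathrm{SSE}$) is an assumption made for illustration; it is not taken from the hydromad or DREAM code.

```python
import math

# Bounds of the uniform priors, copied from Table 2.
PRIOR_BOUNDS = {
    "X1": (100.0, 600.0),
    "X2": (-15.0, 5.0),
    "X3": (1.0, 300.0),
    "X4": (0.5, 6.0),
    "X5": (0.6, 0.99),
}


def log_uniform_prior(params):
    """Return 0.0 inside the Table 2 bounds (constant density), -inf outside."""
    for name, (lower, upper) in PRIOR_BOUNDS.items():
        if not lower <= params[name] <= upper:
            return -math.inf
    return 0.0


def sum_squared_errors(observed, simulated):
    """Sum of squared differences between observed and simulated flows."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated))


def log_posterior(params, observed, simulated):
    """Unnormalised log-posterior for one candidate parameter set.

    Assumption (not from the paper): independent, homoscedastic Gaussian
    errors with the variance integrated out, so that the log-likelihood is
    -(n / 2) * log(SSE) up to an additive constant.
    """
    log_prior = log_uniform_prior(params)
    if log_prior == -math.inf:
        return -math.inf  # candidate lies outside the prior bounds
    sse = sum_squared_errors(observed, simulated)
    if sse <= 0.0:
        return math.inf  # perfect fit; avoids log(0)
    return log_prior - 0.5 * len(observed) * math.log(sse)
```

In an MCMC sampler, a value of `-inf` would simply cause the candidate to be rejected, which is how hard parameter bounds are usually enforced.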


To account for the change in hydrological behaviour of the catchments illustrated in Fig. 3, a rolling ten year period was used to calibrate the model parameters for each of the three models. Using this approach, the model parameters were recalibrated for each year in the validation or forecast period based on the data from the preceding 10 years (3652 data points), which were considered to be more representative of the catchment conditions than the entire data record. The parameter values identified were then used to forecast the following one year of streamflow data, before being recalibrated again. This methodology allows for the change in parameter distributions over time to be accounted for. It also allows data from the independent validation period to be used for calibration without inflating the validation results.

To reduce any bias in the simulated flows, observed flows were assimilated into the GR4J models using the approach of Demirel et al. (2013), where the level in the routing store is updated at the start of each month, forcing simulated flows to match observed flows. As can be seen in Fig. 4, the GR4J model calculates total streamflow, $\tilde{Q}_t$, as the sum of the flow direct from the production store (after applying a unit hydrograph), $\tilde{Q}_{t,d}$, and the flow from the routing store, $\tilde{Q}_{t,r}$. The modelled $\tilde{Q}_{t,d}$ is subtracted from the observed flow, $Q_t$, to derive the $\tilde{Q}_{t,r}$ necessary for $\tilde{Q}_t = Q_t$. The routing store level, $R$, can then be solved for using the equation used to calculate the outflow from this storage by the GR4J model:

$$\tilde{Q}_{t,r} = R \left\{ 1 - \left[ 1 + \left( \frac{R}{X_3} \right)^4 \right]^{-1/4} \right\} \qquad (1)$$

where $X_3$ is the estimated GR4J parameter. Updating of the routing store was applied separately to each parameter set sampled from the posterior distribution.

3.1.4. Forecast uncertainty

Only parameter uncertainty was explicitly represented using the DREAM calibration approach, while all other sources of uncertainty (e.g. those related to input data errors and the simplified representation of the rainfall-runoff process) were aggregated into a statistical model of the residual error in order to account for the total, or forecast, uncertainty. This residual error model provides a statistical description of the differences between the model predictions and observed data, without trying to disentangle its contributing sources (Evin et al., 2014), an approach that is generally less data intensive and less complex than methods which attempt to characterise each individual source of error. As with the GR4J model itself, the residual error model also has parameters whose values needed to be estimated for each of the forecast models developed. In this study, a “postprocessor” step was applied after the parameter estimation process in order to estimate the residual error model parameters conditional on the estimated GR4J parameters.
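Returning briefly to the routing store update of Section 3.1.3: Eq. (1) gives the outflow as a function of the store level $R$, so recovering $R$ from a required outflow means inverting it. The Python sketch below does this numerically by bisection, which is safe here because the outflow increases monotonically with $R$ and is always greater than $R - X_3$. The function names are hypothetical and the solver actually used in the study is not stated; bisection is simply one convenient choice.

```python
def routing_store_outflow(level, x3):
    """Outflow from the GR4J routing store for a given level, as in Eq. (1)."""
    return level * (1.0 - (1.0 + (level / x3) ** 4) ** -0.25)


def solve_routing_store_level(target_outflow, x3, tolerance=1e-10, max_iterations=200):
    """Find the store level R whose outflow equals target_outflow.

    Bisection on [0, target_outflow + x3]: the outflow is zero at R = 0 and
    exceeds R - x3 for any R, so the upper end always brackets the solution.
    """
    if target_outflow <= 0.0:
        return 0.0
    lower, upper = 0.0, target_outflow + x3
    for _ in range(max_iterations):
        middle = 0.5 * (lower + upper)
        if routing_store_outflow(middle, x3) < target_outflow:
            lower = middle
        else:
            upper = middle
        if upper - lower < tolerance:
            break
    return 0.5 * (lower + upper)


def updated_store_level(observed_flow, direct_flow, x3):
    """Level that makes simulated total flow match the observed flow.

    The required routed flow is the observed flow minus the modelled direct
    flow; it is floored at zero here, which is an assumption of this sketch.
    """
    required_routed_flow = max(observed_flow - direct_flow, 0.0)
    return solve_routing_store_level(required_routed_flow, x3)
```

In line with the text above, such an update would be repeated for every parameter set sampled from the posterior, since each set has its own value of $X_3$.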
While parameter uncertainty was estimated at a daily time scale, the residual error model was fitted to the errors in the monthly volumes, as this is the main time scale of interest for the management of the drains in the South East. The following residual error model, $\epsilon_t$, was assumed at a monthly timestep $t$:

$$\epsilon_t = \log(Q_t + \alpha_0) - \log(\tilde{Q}_t^{\theta} + \alpha_0) \qquad (2)$$

where $Q_t$ is the observed monthly flow at time $t$, $\tilde{Q}_t^{\theta}$ is the corresponding simulated monthly flow based on model parameter set $\theta$ and $\alpha_0$ is used to allow for the log transformation of zero monthly flows. The logarithmic transform was applied in order to account for any heteroscedasticity in the errors, which is common in hydrological modelling applications. The above error model makes the assumption that, after the transformation, the residual errors are normally distributed with zero mean and constant variance $\sigma_0$ (i.e. $\epsilon_t \sim N(0, \sigma_0)$) and has two parameters, $\alpha_0$ and $\sigma_0$. Values for $\alpha_0$ and $\sigma_0$ were estimated for each GR4J parameter set sampled from the posterior parameter distribution separately. A maximum likelihood approach was used for this, accounting for the Jacobian of the transformation. A sample of 100 parameter sets was taken from the posterior parameter distributions identified by the DREAM algorithm, and for each a further 100 replicates from the fitted residual error model were derived, resulting in 10,000 samples from the overall forecast distributions, representing total forecast uncertainty.

3.2. Statistical forecast models

3.2.1. Model architecture

ANNs are a self-adaptive modelling approach with the ability to model both linear and nonlinear functions without prior specification of the functional form of the model (Qi and Zhang, 2001). As such, these models can be considered as a very general form of nonlinear regression model. The Bayesian ANN models developed in this study were single hidden layer multilayer perceptrons (MLPs), as this is the simplest and most common form of ANN used for hydrological modelling (Maier et al., 2010). As their name suggests, MLPs are made up of several layers of processing nodes, which are arranged into an input layer, an intermediate or ‘hidden’ layer and an output layer.
Input information is transmitted through the network via weighted connections, such that it is redistributed across all of the nodes in the subsequent layers. At the nodes, the weighted information is summed together with a bias term and transformed using a prespecified transfer function. In this study, a logistic transfer function was used at the hidden layer nodes, while a linear transfer function was used at the output layer, as this configuration has been shown to have universal approximation capabilities (Bishop, 1995). The complexity of an MLP can easily be adjusted by increasing or decreasing the number of hidden layer nodes, which determine the number of free parameters in the model. There is, however, a desire to select the minimum number of hidden nodes necessary for capturing the underlying rainfall-runoff relationship, since the more free parameters there are, the more difficult the model is to calibrate and the more uncertain the resulting parameters and forecasts are (Kingston et al., 2008). In an ANN, the free parameters are the network weights and biases, which have no physical interpretation and can take any value. In a Bayesian ANN, these parameters are represented by probability distributions which are estimated within a Bayesian framework (Kingston et al., 2005; Khan and Coulibaly, 2006).

3.2.2. Model data

Statistical flow forecasting models require predictors that describe the initial catchment condition and the effect of climate during the forecast period. Antecedent streamflows, rainfall and PET data were considered as potential predictors for representing initial catchment conditions. As such, daily flow and catchment average rainfall and PET data discussed in Section 3.1.2 were converted to monthly totals for development and validation of the ANN models. In addition, groundwater observations and a computed Antecedent Precipitation Index (API) time series for each catchment were also considered as potential predictors for representing catchment wetness, while POAMA rainfall forecasts were used as potential predictors of climate over the forecast period. For each catchment, the model output was the total monthly flow recorded at the respective downstream flow station (i.e. A2390519, A2390514 or A2390512) one month in advance.

Groundwater data were obtained from two monitoring wells (wells CMM079 and SMT020) within the Drain M catchment, with locations shown in Fig. 2. These were the only wells within or adjacent to the Drain M catchments from which sufficiently long records were available at a monthly time scale (linearly interpolated from approximately 3- to 6-monthly records) and where telemetry is used to provide data in near real-time (which is necessary for operational purposes).
However, there was a significant period of missing data for well CMM079 between 1979 and 1994, which the Hydrograph Analysis Rainfall and Time Trend (HARTT) software (Ferdowsian et al., 2001) was used to infill. The HARTT model was found to provide a good estimate of the measured groundwater depth for this well from 1994 onward (not shown for the purpose of brevity) and, therefore, it was considered that this method was appropriate for filling in the period of missing data.

The API is a popular index used for representing initial catchment wetness in hydrological practice (Brocca et al., 2008) and is given by:

$$API_i(t) = \sum_{j=1}^{i} P_{t-j} \, k^{-j} \qquad (3)$$

where $i$ is the number of antecedent days included in the calculation, $k$ is a recession coefficient and $P_{t-j}$ is the precipitation for day $t - j$. The recession coefficient $k$ represents the “memory” of a catchment by decaying the effect of accumulated rainfall at each time step. The theory is that earlier precipitation should have less influence on present streamflow response than recent precipitation. A value of $k = 0.99$ was used for each catchment based on a recession analysis of the daily flow data, under the assumption that the effect of antecedent precipitation on streamflow would decay at the same rate as the recession limb of a hydrograph during periods without rainfall. Daily API values computed based on daily rainfall data were converted to monthly average values.

Rainfall forecasts generated using the Australian Bureau of Meteorology’s (BoM) dynamical climate forecasting system, POAMA-2 (version 2.4), were considered as potential inputs for representing the effect of climate during the forecast period in the ANN models. Large-scale monthly rainfall predictions from January 1981 until December 2011 were downscaled using the statistical analogue approach of Shao and Li (2013) to provide 1-month ahead rainfall forecasts at each of the rain gauges shown in Fig. 2. The downscaled 30-member ensembles were reduced to three time series at each rain gauge: the minimum, mean and maximum of the ensembles.

As a result of the large number of zero flows recorded at stations A2390519, A2390514 and A2390512, primarily in the summer and autumn months, the information content of the complete flow time-series is relatively small for each catchment. Therefore, to increase the influence of non-zero flows in the dataset and improve model calibration capabilities, it was considered sensible to forecast flows from the months June to November (winter and spring) only, when non-zero flows are dominant.
As such, only monthly flow data between June and November were considered for model development and validation and all corresponding potential input time series were reduced accordingly. The resulting model development data comprised concurrent monthly observations of all potential inputs and streamflow 1-month in advance, which were available for the period from February 1980 to December 2011. However, monthly data between 2000 and 2004 were reserved for independent validation and, in this case, were not used in the model development process.

3.2.3. Input variable selection

The appropriate selection of inputs is one of the most critical steps in the development of ANN models (Wu et al., 2014). The model inputs, together with the response variable, contain all of the information necessary for characterising the underlying process, given that no descriptions of the physical processes are explicitly included in the model. For streamflow forecasting, the database of potential or candidate inputs usually includes observations of the predictors (e.g. those which represent initial catchment conditions and the effect of climate) at different locations and time lags, as well as lagged observations of the streamflow being modelled. Consequently, the number of potentially important inputs can be quite large; however, given their correlated nature, many may be redundant. Furthermore, some potential inputs (e.g. coarse-scale spatial data or point measurements in particular locations) may provide little or no information about the rainfall-runoff process, making them irrelevant to the problem. The inclusion of irrelevant and/or redundant inputs only adds noise and complexity into the model, while the omission of relevant input variables, on the other hand, results in part of the output behaviour remaining unexplained.

The GA-ANN input variable selection (IVS) approach described in Galelli et al. (2014) was used in this study to select the “optimum” set of inputs for each of the models developed. This method combines a genetic algorithm (GA) search procedure (Goldberg, 1989) with a simple, 1-hidden node ANN model, where the aim is to systematically search the potential input space (i.e. the candidate input pool) for the subset of inputs that yields the best ANN model performance.
The GA-ANN algorithm was selected for IVS in this study as preliminary investigations found ANNs with a single hidden node to be sufficient for forecasting flows in Drain M. As such, this algorithm can be considered tailored to this study, in that the selected inputs are those that optimise the performance of the forecasting models developed (and should therefore yield high forecast accuracy). The objective function used to determine whether one subset of inputs is better than another can have a significant impact on the results obtained. While goodness-of-fit criteria such as the sum-squared error (SSE) or coefficient of determination ($R^2$) may be appropriate at times for this purpose, it is often better to use more parsimonious model selection criteria such as Akaike’s information criterion (AIC) (Akaike, 1974) or the Bayesian information criterion (BIC) (Schwarz, 1978), which penalise unnecessary model complexity (see Galelli et al. (2014)). In this study, the out-of-sample AIC, which has been found to consistently indicate the optimum ANN structure in water resources modelling applications (Kingston et al., 2010), was adopted for this purpose. This was computed using $k$-fold cross-validation, where the number of folds, $k$, was equal to 5.

Similar to the dynamical forecasting approach, three ANN models were developed for forecasting flows at stations A2390519, A2390514 and A2390512, which will henceforth be referred to as ANN 0519, ANN 0514 and ANN 0512, respectively. To generate the candidate input pools for each of these models, only input variables recorded within or upstream of the catchment under consideration were included as potential inputs (e.g. rainfall data for catchment A2390514 were included in the candidate input pools for the models developed to forecast flows at A2390512 and A2390514, but not at A2390519). Lagged climate and streamflow data may explain some of the future streamflow due to the long memory response of the catchment, while lagged soil moisture inputs may be important for providing additional information on how the catchment is wetting and drying and for capturing seasonal effects. As such, each of the predictors representing the initial catchment conditions was lagged by up to two months, while 1-month ahead POAMA forecast rainfalls were represented by a minimum, mean and maximum value, as described in Section 3.2.2. A maximum lag of two months was selected in order to minimise the sizes of the candidate input pools without being overly restrictive. This was based on the results of partial correlation analyses, which showed that (linear) correlations between the inputs and the target streamflow data were generally insignificant at lags greater than two months (for the purpose of brevity, results of this analysis are not shown).
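The construction of a lagged candidate input pool described above can be sketched in a few lines of Python. The function name, the `_lag0`/`_lag1`/`_lag2` naming scheme and the treatment of lag 0 as the most recent available month are assumptions made for illustration only; the sketch is not the procedure used to build the pools in this study.

```python
def build_lagged_inputs(series_by_name, max_lag=2):
    """Build a candidate input pool with lags 0..max_lag for each predictor.

    series_by_name maps a predictor name to a list of monthly values; all
    lists must have the same length. Returns (names, rows), where each row
    holds the candidate inputs available at one month t >= max_lag.
    """
    lengths = {len(values) for values in series_by_name.values()}
    if len(lengths) != 1:
        raise ValueError("all predictor series must have the same length")
    n_months = lengths.pop()

    names = [
        f"{name}_lag{lag}"
        for name in series_by_name
        for lag in range(max_lag + 1)
    ]
    rows = []
    for t in range(max_lag, n_months):
        # Months before max_lag are skipped because their lags are undefined.
        rows.append([
            series_by_name[name][t - lag]
            for name in series_by_name
            for lag in range(max_lag + 1)
        ])
    return names, rows
```

Each row would then be paired with the streamflow one month ahead as the target, and an input selection algorithm would search over subsets of the named columns.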
The GA-ANN IVS method selected between 5 and 8 inputs for the three catchments considered (the resulting inputs selected by the GA-ANN IVS algorithm are presented in Table A.1 in Appendix A). The majority of the selected inputs represent antecedent catchment conditions, suggesting that initial catchment wetness is of greatest importance for forecasting flows in Drain M. Rainfall, evaporation, API and groundwater data were selected in all cases where available (groundwater data were not available for catchment A2390519). However, in contrast to the commonly adopted approach of including flow data from previous time periods to forecast future flows, antecedent flow data were not found to contribute to optimal forecast accuracy for any of the catchments, suggesting that these data are suboptimal predictors for representing the soil moisture status of the ephemeral catchments considered. The fact that a POAMA forecast rainfall input was also selected for each catchment suggests that the dynamical climate forecasts contribute to improved forecast accuracy in comparison to the situation where no predictors of future climate are included.

3.2.4. Model structure selection and initialisation of weights

The ANN models ANN 0519, ANN 0514 and ANN 0512 each had a single output node corresponding to the 1-month ahead streamflow forecast at the respective downstream flow station, while the number of input nodes for each model was fixed equal to the number of selected inputs (see Table A.1). However, it was necessary to select the appropriate number of hidden nodes for each of the models. To do this, and to provide initial estimates of the parameter values for the Bayesian calibration procedure employed, standard deterministic ANN development methods, as outlined in Maier and Dandy (2000), were implemented using the R nnet package (Venables and Ripley, 2002).

The model development data, excluding the validation period between 2000 and 2004, were divided into training and test datasets with proportions of 80% and 20%, respectively. To do this, the data were randomly reordered multiple times until the statistics of the training and testing portions of the data were most similar. After accounting for the appropriate lags of the input and output variables, 110 monthly data points were allocated to the training set and 27 data points were allocated to the test set. The training set was used for pre-Bayesian training, where deterministic model parameter values were estimated that would subsequently be used to initialise the Bayesian calibration approach adopted in this study to account for parameter uncertainty. The Nelder-Mead (or downhill simplex) method, as implemented within the nnet package, was used to obtain these estimates.
To avoid this algorithm becoming trapped in a poor local optimum, training was initialised 20 times using different random weight vectors and the best resulting model was selected based on out-of-sample performance on the test dataset. Early stopping using the test set for cross-validation was employed to prevent overfitting of the training data.

To select the optimum number of hidden nodes for each of the three ANN models developed, a trial-and-error approach was used, whereby ANNs with 1, 2, ..., 6 hidden nodes were trained for each of the three catchments and the models with the minimum out-of-sample AIC values, calculated using the test data, were selected. Similar to the ANN input selection procedure (see Section 3.2.3), the out-of-sample AIC was chosen to select the appropriate number of hidden nodes, as it has been successfully used for this purpose previously when applied to a number of different case studies (Kingston et al., 2010). For each catchment, it was found that an ANN model with a single hidden node gave the best results for modelling the rainfall-runoff process.
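To make the trade-off between fit and complexity concrete, the Python sketch below shows a single hidden layer MLP with logistic hidden nodes and a linear output node, a count of its free parameters, and a selection of the number of hidden nodes by minimum AIC. The AIC expression used, $n \ln(\mathrm{SSE}/n) + 2p$, is the common least-squares form and is an assumption here; the exact out-of-sample AIC formulation of Kingston et al. (2010) is not reproduced in the text. All function names are hypothetical.

```python
import math


def mlp_forecast(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: logistic transfer at hidden nodes, linear output node."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        activation = bias + sum(w * x for w, x in zip(weights, inputs))
        hidden.append(1.0 / (1.0 + math.exp(-activation)))
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


def count_free_parameters(n_inputs, n_hidden):
    """Weights and biases of a single hidden layer MLP with one output node."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1)


def aic_least_squares(sse, n_points, n_parameters):
    """Assumed least-squares AIC: n * ln(SSE / n) + 2 * p."""
    return n_points * math.log(sse / n_points) + 2 * n_parameters


def select_hidden_nodes(test_sse_by_hidden, n_inputs, n_points):
    """Return the hidden node count with the lowest AIC, plus all scores."""
    scores = {
        n_hidden: aic_least_squares(
            sse, n_points, count_free_parameters(n_inputs, n_hidden)
        )
        for n_hidden, sse in test_sse_by_hidden.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

With 5 inputs, a 1-hidden-node network has 8 free parameters and a 2-hidden-node network has 15, so a second hidden node must reduce the test-set SSE appreciably before the AIC favours it. This is consistent with the preference for parsimonious structures described above.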


3.2.5. Bayesian calibration and quantification of forecast uncertainty

Similar to the GR4J forecasting approach, the uncertainty associated with the ANN forecasts was estimated by assessing parameter uncertainty explicitly within a Bayesian framework, while all other sources of uncertainty were aggregated into a residual error model. The MCMC Bayesian training approach developed by Kingston et al. (2005) was used to infer the posterior probability distribution of the ANN weights, given the observed data. This method does not require a test data set; therefore, the training and testing data were recombined and used to train the models (giving a total of 137 monthly data points). Like the DREAM algorithm used for calibrating the GR4J models, this method uses an adaptive MCMC sampling procedure to generate samples from the posterior parameter distributions. However, unlike the postprocessor method used to estimate the parameters of the error model in Section 3.1.4, “joint inference” is employed to simultaneously estimate the model parameters (i.e. ANN weights) and the residual error model parameters $\theta_\epsilon$. In this case, the likelihood function is given by the joint pdf of the model residuals (Evin et al., 2014):

$$p(Q \mid w, \theta_\epsilon, X) = p(\epsilon[w, X, Q] \mid \theta_\epsilon) \qquad (4)$$

where $Q$ is the series of observed monthly flows and $X$ denotes the monthly input data. In this study, an additive Gaussian error model, with mean of zero and constant variance $\sigma_\epsilon^2$, was assumed:

$$Q_t = \tilde{Q}_t^{w} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma_\epsilon^2) \qquad (5)$$

where $\tilde{Q}_t^{w}$ is the simulated monthly flow at time $t$ based on the estimated weight vector $w$. As such, the likelihood function is given by:

$$p(Q \mid w, \sigma_\epsilon^2, X) = \prod_{t=1}^{N_t} \frac{1}{\sqrt{2\pi\sigma_\epsilon^2}} \exp\left[ -\frac{\left( Q_t - \tilde{Q}_t^{w} \right)^2}{2\sigma_\epsilon^2} \right] \qquad (6)$$


where $N_t$ is the number of observations in the calibration period. In this case, no attempts were made to correct for heteroscedasticity in the residual error model.

The prior distribution was chosen based on the experience that the values of the weights can be positive and negative with equal probability and that the weights have a finite variance. This prior (given by Eq. 7) is the product of four different normal distributions, each with a mean of zero, corresponding to four different groups of weights: the input-hidden layer weights, the hidden layer biases, the hidden-output layer weights and the output layer biases:

$$p(w \mid \sigma_w^2) = \prod_{g=1}^{4} \left( 2\pi\sigma_{w_g}^2 \right)^{-d_g/2} \exp\left[ -\frac{\sum_{i=1}^{d_g} w_i^2}{2\sigma_{w_g}^2} \right] \qquad (7)$$

where $\sigma_{w_g}^2$ and $d_g$ are the variance and dimension of the $g$th weight group, respectively.

Estimation of the parameters is given a hierarchical treatment, where the lower level of the hierarchy is comprised of the weights $w$, while the upper layer consists of the variance ‘hyperparameters’, $\sigma_w^2 = \{\sigma_{w_1}^2, \ldots, \sigma_{w_4}^2\}$ and $\sigma_\epsilon^2$, which control the distributions of the weights and the residual errors, respectively. As such, the MCMC training method involves a two-step iterative procedure, where ANN weight vectors are sampled from the posterior distribution using the adaptive Metropolis algorithm (Haario et al., 2001) and the likelihood and prior variance hyperparameters are sampled using the Gibbs sampler. Rather non-informative inverse chi-squared hyperprior distributions were assumed for the variance hyperparameters so as to allow their values to be determined from the data. Further details of this method can be found in Kingston et al. (2005). A sample of 10,000 weight vectors and corresponding residual variance values was obtained from the posterior weight distribution, $p(w \mid X, Q)$, and the posterior variance distribution, $p(\sigma_\epsilon^2 \mid w, X, Q)$, respectively, and subsequently used to generate 10,000 samples representing total forecast uncertainty.

3.3. Hybrid forecast models

The hybrid forecasting models were developed using the same approach and data as outlined in Section 3.2 for the statistical models. However, the model development data for the hybrid models included additional soil moisture index (SMI) data simulated by the GR4J models, which could potentially be selected as inputs to the Bayesian ANNs. The SMI is the daily level in the ‘production store’, $S$ (see Fig. 4), and represents the soil water content inside the conceptual soil reservoir. This level is balanced according to the relative magnitudes of rainfall and PET and can be depleted by percolation and is, therefore, analogous to soil moisture. The GR4J models used to simulate the SMI series for each catchment were calibrated to daily flows at A2390512, A2390514 and A2390519 using a maximum likelihood approach (i.e. SMI was simulated based on a “best parameter set” for each catchment and parameter uncertainty was not accounted for). A rolling 10 year period was again used for calibration, with the final year in each calibration period used to provide the SMI time series over the period 1980 to 2011. This was expected to produce the best possible estimate of the internal model parameters, including the daily level of the production store (i.e. the SMI). The daily SMI was extracted and converted to monthly average values to provide monthly SMI time series for the Bayesian ANN models.

Again, three hybrid models were developed for forecasting flows at stations A2390519, A2390514 and A2390512, which will henceforth be referred to as hybrid 0519, hybrid 0514 and hybrid 0512, respectively. The candidate input pools for these models were restricted to only include those inputs found to be optimal for the corresponding statistical forecasting models (i.e. ANN 0519, ANN 0514 and ANN 0512, see Table A.1), with the addition of lagged values of the SMI simulated from within and upstream of the catchment under consideration.
This was done so that any direct gains associated with the hybridisation of the GR4J and Bayesian ANN models could be identified without the confusion of having entirely different combinations of inputs selected for each model, which could result from the high degree of correlation between the candidate inputs and the fact that the GA-ANN algorithm cannot guarantee a globally optimal solution. Furthermore, the smaller the candidate input pool, the easier it is to find optimal combinations of the SMI inputs to complement (or replace) inputs used in the statistical models.

The GA-ANN IVS method selected between 5 and 9 inputs for the hybrid models developed (the resulting inputs selected by the GA-ANN IVS algorithm are presented in Table A.2 in Appendix A). For catchment A2390512, the inclusion of SMI data in the candidate input pool resulted in two more inputs being selected for model hybrid 0512 than for ANN 0512, while for catchment A2390514, one more input was selected for hybrid 0514 than for ANN 0514. For catchment A2390519, the number of inputs selected for the hybrid and statistical models was the same. Based on the IVS results, it is apparent that the GR4J generated SMI data are complementary to the more traditional inputs used for representing initial catchment conditions, rather than providing replacements or substitutes for any of them. The fact that at least one SMI input was selected for each of the hybrid models supports the hypothesis that the integration of this information from the dynamic models into the statistical forecasting models may result in improved streamflow forecasts at each of the flow gauging stations (since the GA-ANN algorithm was used to select inputs based on predictive performance). However, antecedent rainfall, evaporation and groundwater data were also selected in all cases where available (groundwater data were not available for catchment A2390519), suggesting that the SMI data provide information about the rainfall-runoff process in addition to what these inputs are able to characterise.

For catchments A2390512 and A2390519, the API inputs selected for models ANN 0512 and ANN 0519 were omitted from the corresponding hybrid models, implying that these data became redundant with the addition of the SMI inputs. However, this was not the case for model hybrid 0514, where both API and SMI inputs were selected by the IVS algorithm. For this model, the SMI data selected were those generated for the upstream catchment A2390519. It is likely that this result has limited physical interpretation, since the influence of the upstream catchment on flow at A2390514 is largely related to releases from Bool Lagoon, which happen very infrequently (only twice since the late 1990s).
However, it is possible that the catchment processes, in particular, those related to soil moisture, are better captured by the GR4J model developed for catchment A2390519, which is smaller and has less interception than catchment A2390514, with a greater proportion of rainfall converting to runoff. In turn, this simulated soil moisture may be more correlated with actual average soil moisture in catchment A2390514 than that simulated using the GR4J model developed for catchment A2390514.

As with the statistical forecast models, it was found that a single hidden node ANN was sufficient for modelling the rainfall-runoff process using the hybrid approach. A sample of 10,000 weight vectors and corresponding residual variance values was drawn from the posterior weight distribution and the posterior variance distribution estimated using the Bayesian MCMC calibration procedure. These were subsequently used to generate a distribution of 10,000 streamflow forecasts, accounting for total forecast uncertainty, for each month in the validation period.

3.4. Performance assessment

To assess and compare the performance of the models developed in this study, a number of different performance metrics were employed for analysing the quality of the median forecasts, as well as the forecast distributions, which account for the uncertainty associated with the forecasts. In addition, the performance of each model was assessed in regard to the management problem being addressed in the case study presented.

3.4.1. Accuracy and reliability metrics

To evaluate the accuracy of the median forecasts (i.e. the median values of the forecast distributions at all times $t$) produced by the various models, the root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970) and percent bias were computed for each model. These metrics were selected to assess and compare forecast accuracy based on different baselines. The RMSE is a measure of overall error between the observed and modelled data and returns an error value with the same units as the data. For a perfect model this non-negative metric would be 0. The NSE statistic, on the other hand, is a dimensionless “goodness-of-fit” measure which can range from $-\infty$ in the worst case to 1 for a perfect model. This metric can be useful for benchmarking the performance of the streamflow forecasting models, as a number of authors have considered NSE scores ≥ 0.9 to be “very satisfactory”, between 0.8 and 0.9 to be “fairly good”, and < 0.8 to be “unsatisfactory” when applied for the assessment of hydrological models (Dawson et al., 2007; Shamseldin, 1997). Percent bias determines whether forecasts have a general tendency to be too high or too low, where a positive bias indicates over-estimation by the model (the mean of the median forecasts is greater than the mean of the observations), while a negative bias indicates under-estimation (the mean of the median forecasts is less than the mean of the observations).
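The three deterministic metrics described above can be written compactly as follows. This is a minimal Python sketch using the standard definitions of RMSE and NSE; the percent bias is expressed relative to the sum of the observations, with the sign convention chosen to match the text (positive values indicate over-estimation). The function names are hypothetical.

```python
import math


def rmse(observed, forecast):
    """Root mean squared error, in the same units as the data."""
    n = len(observed)
    return math.sqrt(sum((f - o) ** 2 for o, f in zip(observed, forecast)) / n)


def nash_sutcliffe_efficiency(observed, forecast):
    """NSE: 1 for a perfect model, unbounded below for poor models."""
    mean_observed = sum(observed) / len(observed)
    residual = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    variation = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - residual / variation


def percent_bias(observed, forecast):
    """Positive when forecasts are too high on average, negative when too low."""
    return 100.0 * (sum(forecast) - sum(observed)) / sum(observed)
```

For example, with observed flows of 1, 2, 3 and 4 units and median forecasts that are each one unit too high, the RMSE is 1 unit, the NSE is 0.2 and the percent bias is +40%.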
As the above metrics are deterministic measures of forecast accuracy, they cannot be used to analyse the quality of the time-varying forecast distribution (the forecast distributions at all times t). Therefore, in addition to the deterministic metrics, several posterior diagnostics have also been adopted for assessing and comparing the models in terms of the reliability and resolution of the forecast distributions. Forecast reliability describes how statistically consistent the time-varying forecast distribution is with the observed data, while resolution describes the average precision of the forecast distribution. A model that has both good reliability and good resolution is desirable. One of the simplest and most intuitive ways to measure reliability is to compare the proportion of observations that fall within specified forecast intervals with the confidence level for those intervals (Mason and Stephenson, 2008). For example, if a model were reliable, it would be expected that the 50% and 95% forecast intervals would capture 50% and 95% of the observations, respectively, given a sufficient number of observations. In this study, a “hit rate” is used to measure the frequency with which observations fall within the 95% forecast intervals generated by the different models. This is calculated according to (Demirel et al., 2013):

$$\text{hit rate} = \frac{\text{hits}}{\text{hits} + \text{misses}} \times 100\% \qquad (8)$$

where a “hit” occurs when the observation falls within the 95% forecast intervals and a “miss” occurs when the observation is not included within these intervals. A hit rate greater than 95% may indicate that observations are too frequently included within the 95% forecast intervals and that the model is under-confident (forecast uncertainty is overestimated), while a hit rate lower than 95% may either indicate that the model is over-confident (forecast uncertainty is underestimated) or that the model is biased. Despite its intuitive appeal, a problem with this measure of reliability is that a hit rate of 95% could be achieved simply by making the forecast intervals infinitely wide 95% of the time, and infinitely narrow the remaining 5% of the time (Mason and Stephenson, 2008). Therefore, a more informative picture of forecast reliability, in the form of a predictive quantile-quantile (QQ) plot (Laio and Tamea, 2007; Thyer et al., 2009; Renard et al., 2010), is also considered. If the forecast distribution has been accurately estimated, historical observations of streamflow should correspond to realisations from the estimated forecast distributions. Under this assumption, if $F_t$ is the cumulative distribution function (cdf) of the forecast distribution at time t, the values $p = F_t(Q_t)$, for $t = 1, \ldots, N_t$, should follow a uniform distribution on the interval [0,1]. The predictive QQ plot compares the observed p-values to the cdf of a uniform distribution, with perfect reliability represented by the 1:1 line. Deviation from this line can be interpreted as over-/under-confident or biased forecasts, as shown in Fig. 5. In addition, to enable easier comparison of forecast reliability between models, the reliability index α described in Renard et al. (2010) is used to summarise the predictive QQ plots. This index is related to the area between the p-value curve and the 1:1 line and is given by:

$$\alpha = 1 - 2\,\frac{\sum_{t=1}^{N_t} \left| p_{Q_t} - p^{th}_{Q_t} \right|}{N_t} \qquad (9)$$

where $p_{Q_t}$ and $p^{th}_{Q_t}$ are the observed and theoretical p-values of $Q_t$, respectively. The value of α varies between 0 and 1, where a value of α = 0 represents unreliable forecast distributions and occurs when all observed p values equal either 0 or 1 (i.e. all observed flows lie outside of the estimated forecast distributions), while α = 1 represents perfect reliability.

Figure 5: Examples of predictive QQ plots displaying under-/over-confidence and under-/over-prediction (adapted from Laio and Tamea, 2007). [Plot of the quantile of the observed p-value against the theoretical quantile of U[0,1], with curves labelled Perfect Reliability, Under-prediction, Over-prediction, Over-confident and Under-confident.]

Reliability is a necessary, but not sufficient, attribute of good quality forecasts; thus, good reliability does not necessarily indicate good forecasts (on the other hand, poor reliability will indicate poor forecasts). Reliable forecasts may be consistent with the observed data, but may also be relatively uninformative (e.g. forecasts based on climatology). Resolution, on the other hand, relates to how informative the forecasts are, with the outcome being strongly conditioned upon the forecast if the forecasts have good resolution (Mason and Stephenson, 2008). In this study, the relative resolution metric proposed by Renard et al. (2010) is used to quantify forecast resolution. This metric is calculated according to:

$$\pi^{(rel)} = \frac{1}{N} \sum_{t=1}^{N} \frac{\tilde{Q}_{med,t}}{\tilde{Q}_{sd,t}} \qquad (10)$$

where $\tilde{Q}_{med,t}$ and $\tilde{Q}_{sd,t}$ are the median and standard deviation of the simulated flow distribution at time t, respectively. Low values of π (rel) indicate wide uncertainty distributions, where the standard deviation is large relative to the median values, while high values indicate narrow distributions, with a low standard deviation relative to the median. Like reliability, good resolution is not a sufficient condition of forecast quality and, as such, it does not necessarily indicate good forecasts. Consequently, it is important that the reliability and resolution diagnostics are considered in conjunction with one another when assessing the performance of the models.

3.4.2. Utility for supporting decision making

To assess the usefulness of the models developed in this study in regard to the management problem being addressed, the assessment metrics described


in the previous section were computed based on all flows and based only on those flows greater than the 75th percentile historical flow for each catchment. For the case study presented, it is of greatest importance to accurately and reliably forecast high flows that would otherwise drain to the ocean if not diverted to the north. If such flows are not accurately forecast, there may be missed diversion opportunities or situations where flows are prematurely diverted before the downstream water requirements are met. The 75th percentile historical flow (approximately 7600 ML/day, 3200 ML/day and 2500 ML/day for catchments A2390512, A2390514 and A2390519, respectively) was selected as an arbitrary threshold, above which the optimum management of flows would provide the most benefit. In addition, to provide a baseline against which to compare the performance of the models, the assessment metrics were also computed using the distribution of historical streamflows for each month from June to November observed at each flow station between 1971 and 2011. Again, these were computed for all flows and for high flows only. This set of baseline or reference metrics enables an assessment of whether the models provide more useful streamflow forecasts than climatology alone can provide. From a flow management perspective, the most important forecasts are those generated for flow station A2390512 at the bottom of the Drain M system, since forecasts at this location provide important information about volumes flowing into Lake George, which is required before questions regarding any diversions to the north can be addressed. In AWE (2009), it was assumed that flows would not be diverted northward from Drain M until the 30 GL/year requirement of Lake George had been met.
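The probabilistic diagnostics of Section 3.4.1 (hit rate, reliability index α and relative resolution), which are evaluated both for all flows and for the high-flow subset described above, could be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in the study: each forecast distribution is assumed to be available as a sample of values, the cdf is approximated empirically, the theoretical p-values use an assumed plotting position i/(n+1), and the sample standard deviation is used. All function names are hypothetical.

```python
import statistics


def hit_rate(obs, lower, upper):
    """Eq. (8): percentage of observations inside the 95% forecast interval."""
    hits = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo <= o <= hi)
    misses = len(obs) - hits
    return 100.0 * hits / (hits + misses)


def observed_p_values(obs, ensembles):
    """p_t = F_t(Q_t), with F_t approximated by the empirical cdf of each sample."""
    return [sum(1 for x in ens if x <= o) / len(ens)
            for o, ens in zip(obs, ensembles)]


def reliability_index(p_obs):
    """Eq. (9): alpha = 1 - 2 * mean(|p_obs - p_theory|) over sorted p-values."""
    n = len(p_obs)
    p_sorted = sorted(p_obs)
    # Assumed plotting position for the theoretical U[0,1] quantiles.
    p_theory = [(i + 1) / (n + 1) for i in range(n)]
    return 1.0 - 2.0 * sum(abs(p - q) for p, q in zip(p_sorted, p_theory)) / n


def relative_resolution(ensembles):
    """Eq. (10): mean over time of (median / standard deviation) of each sample."""
    ratios = [statistics.median(e) / statistics.stdev(e) for e in ensembles]
    return sum(ratios) / len(ratios)
```

To reproduce the high-flow variant of a metric, the same functions would simply be applied to the subset of time steps whose observed flow exceeds the chosen threshold.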
To assess the total volumes flowing into Lake George up to a given month against the 30 GL/year requirement, cumulative monthly flows over the months June to November were computed for each year in the validation period. These cumulative totals were computed based on the median forecasts, as well as the upper and lower 95% forecast limits, according to:

$$\tilde{Q}_{cum}(mn, yr) = \begin{cases} \tilde{Q}(mn, yr) & \text{for } mn = 6 \\ \tilde{Q}(mn, yr) + \sum_{i=6}^{mn-1} Q(i, yr) & \text{for } mn = 7, \ldots, 11 \end{cases} \qquad (11)$$

where, in this case, $\tilde{Q}(mn, yr)$ is the median forecast, or the upper or lower 95% forecast limit, for month mn in year yr and $Q(i, yr)$ are the observed flows for the preceding months $i = 6, \ldots, mn - 1$ in year yr. Observed flows for the months preceding mn in each year were used in the calculation of the forecast cumulative flows, since in an operational situation, this information would be available at the time of the forecast. As the 95% cumulative forecast limits similarly include observed flows for months preceding mn in their computation, the uncertainty associated with the cumulative forecast is only that associated with the forecast for the month of interest (since the observed flows are known with certainty).
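The cumulative forecast of Eq. (11) can be illustrated with a short Python sketch. The function name, the dictionary layout for observed flows and the example numbers are all hypothetical; only the arithmetic follows the equation: the forecast for the month of interest is added to the observed flows of the preceding months from June onward.

```python
def cumulative_forecast(forecast, observed, month, first_month=6):
    """Eq. (11): forecast for `month` plus observed flows for the preceding
    months first_month..month-1 of the same year.

    forecast: forecast flow for `month` (median, or upper/lower 95% limit)
    observed: dict mapping month number -> observed monthly flow
    """
    # For month == first_month the range is empty, so only the forecast remains.
    preceding = sum(observed[i] for i in range(first_month, month))
    return forecast + preceding


# Hypothetical observed flows (GL) for June and July of one year.
observed_flows = {6: 5.0, 7: 10.0}
august_median = cumulative_forecast(4.0, observed_flows, month=8)  # 4 + 5 + 10
requirement_met = august_median >= 30.0  # compare with the 30 GL/year requirement
```

Applying the same function to the upper and lower 95% limits gives cumulative limits whose width equals that of the single-month forecast, as noted in the text.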


4. Results and Discussion


Model performance results for the calibration periods are not presented in this paper, since different periods were used for calibrating the GR4J models than were used to calibrate the ANN and hybrid models, and therefore these results are not directly comparable (a rolling ten-year period was used to calibrate the GR4J models, while a static calibration data set covering the period 1980-2011, but excluding the validation period from 2000 to 2004, was used to calibrate the ANN and hybrid models). The forecast accuracy results for all of the models developed when applied to the validation data from 2000 to 2004 (months June to November only) based on all flows (30 data points) are presented in Table 3. Also given in this table are the reference metrics based on the distribution of monthly historical flows over the validation period. These are presented as models hist 0512, hist 0514 and hist 0519. The best results for each catchment according to each performance metric are highlighted in bold. It is clear from this table that the hybrid models were most accurate according to the median forecast accuracy metrics (i.e. NSE, RMSE and percent bias) when applied to forecast flows at stations A2390512 and A2390514, while the historical monthly flow distributions resulted in superior median forecast accuracy for station A2390519. For catchment A2390512, model hybrid 0512 achieved a 30% reduction in the RMSE when compared with ANN 0512, while for catchment A2390514, model hybrid 0514 resulted in an RMSE 12% lower than that computed for model ANN 0514. A similar improvement in median accuracy can be seen when comparing models hybrid 0519 and ANN 0519; however, neither of these models was able to generate better median forecasts than those based on climatology (hist 0519).
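As a quick check of the percentages quoted above, the relative RMSE reductions of the hybrid models over the ANN models can be recomputed from the RMSE values reported in Table 3. The short Python sketch below does only that arithmetic; the variable and function names are illustrative.

```python
# RMSE values (ML/month) for the ANN and hybrid models, taken from Table 3.
rmse_table3 = {
    "A2390512": {"ANN": 4376, "hybrid": 3075},
    "A2390514": {"ANN": 2663, "hybrid": 2340},
    "A2390519": {"ANN": 2772, "hybrid": 2481},
}


def rmse_reduction(ann_rmse, hybrid_rmse):
    """Percentage reduction in RMSE of the hybrid model relative to the ANN model."""
    return 100.0 * (ann_rmse - hybrid_rmse) / ann_rmse


reductions = {
    catchment: rmse_reduction(values["ANN"], values["hybrid"])
    for catchment, values in rmse_table3.items()
}
for catchment, reduction in reductions.items():
    print(f"{catchment}: {reduction:.1f}% lower RMSE with the hybrid model")
```

The results round to 30%, 12% and 10% for catchments A2390512, A2390514 and A2390519, respectively, consistent with the figures stated in the text.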
These results indicate the gain in median forecast accuracy achieved by using the hybrid dynamic and statistical streamflow forecasting approach, which is best seen by comparing the performance results of the hybrid and ANN models. Less obvious from Table 3 is how well the models performed in terms of the


forecast distributions generated. For catchment A2390512, model GR4J 0512 has the highest value of α, but a hit rate of only 73%. The predictive QQ plot for this model, shown in Fig. 6 (a), indicates that this model is somewhat over-confident, particularly at the upper end of the distribution, which explains why some of the validation period observations are not accounted for within the 95% forecast intervals. However, it can also be seen in Table 3 that the forecast distribution generated by GR4J 0512 has relatively low resolution in comparison to models ANN 0512 and hybrid 0512. The predictive QQ plots for these models are very similar, as seen in Fig. 6 (a), and indicate that both ANN 0512 and hybrid 0512 are under-confident (forecast uncertainty has been over-estimated), again, particularly in the upper ranges of the distribution, corresponding to higher quantiles. This is also reflected in the high hit rates for these models. Nevertheless, hybrid 0512 also has the highest resolution forecasts of the models developed for this catchment, with a standard deviation that is approximately two thirds of the median value on average. In comparison, the climatological forecasts (hist 0512) have good reliability, but low resolution. For catchment A2390514, model GR4J 0514 has the highest resolution, but lowest reliability according to both the α values and hit rates. The predictive QQ plot for this model, shown in Fig. 6 (b), indicates that the lower reliability associated with this model’s forecasts is due to over-prediction, particularly of lower flows that would tend to fall in the lower range of the forecast distribution. This is also reflected in the positive median bias. The QQ plots for models ANN 0514 and hybrid 0514 are, again, very similar to one another, as can be seen in Fig. 6 (b), and again reveal that these models are under-confident to some degree. 
This is confirmed by the hit rates of 100%, which indicate that all of the validation period streamflows fall within the 95% forecast intervals for these models. As seen in Table 3, the π (rel) values resulting from all of the models developed for catchment A2390514 are low, with standard deviations approximately 2-3 times the median values on average (or approximately 10 times in the case of the historical distributions). This suggests that none of the models developed for this catchment are overly informative about what flows might be expected under different catchment and climate conditions. Finally, for catchment A2390519, the α values presented in Table 3 suggest that all of the forecast distributions generated for this catchment have relatively low reliability. From the predictive QQ plots shown in Fig. 6 (c), this appears to be primarily due to over-prediction, with models ANN 0519 and hybrid 0519 also being over-confident. Resolution of the forecast distributions is again


relatively low for all models, with standard deviations approximately 1.5-2 times the median values on average (or approximately 3.5 times the median for hist 0519).

Table 3: Comparison of the different models considered for the validation period on each catchment based on all flows.

Model         NSE     RMSE (ML/month)   Bias (%)   α      π (rel)   Hit Rate (%)
GR4J 0512     -0.16   10145             -38        0.86   1.04      73
ANN 0512       0.78    4376              12        0.75   1.42      97
hybrid 0512    0.89    3075              10        0.73   1.48      100
hist 0512      0.12    8731             -46        0.80   0.31      100
GR4J 0514     -0.38    5642              28        0.73   0.57      87
ANN 0514       0.68    2663             -29        0.79   0.35      100
hybrid 0514    0.75    2340             -22        0.77   0.38      100
hist 0514      0.12    4413             -59        0.86   0.09      100
GR4J 0519     -2.08    4987              53        0.72   0.54      97
ANN 0519       0.00    2772              86        0.53   0.65      100
hybrid 0519    0.20    2481              86        0.53   0.67      100
hist 0519      0.38    2188               1        0.69   0.28      100

Figure 6: Predictive QQ plots obtained based on the forecast distributions generated using the GR4J, ANN and hybrid models and on the monthly historical distributions when applied to catchments (a) A2390512, (b) A2390514 and (c) A2390519. [Each panel plots the quantile of the observed p-value against the theoretical quantile of U[0,1], with curves for Perfect Reliability, hist, GR4J, ANN and hybrid.]

The same metrics as presented in Table 3 are also presented in Table 4, though this time being based only on flows greater than the 75th percentile historical flow for each catchment (8 data points). The best results for each catchment according to each performance metric are again highlighted in bold. While caution is urged when interpreting these results, since they are computed based on 8 data points only, this table reveals that when applied


for forecasting high flows, the hybrid models tend to produce the most accurate median forecasts and the most reliable and precise forecast distributions. Although this result cannot be seen for catchment A2390519, the ANN 0519 and hybrid 0519 models both generate forecast distributions with comparable reliability to the historical monthly streamflow distributions (hist 0519), but with considerably higher resolution, thus providing somewhat more informative high flow forecasts than those based on climatology alone. This is in contrast to model GR4J 0519, whose forecast distribution is as uninformative as the hist 0519 forecast distribution for high flows. As mentioned in Section 3.4.2, it is especially important for optimal flow management to have accurate and reliable streamflow forecasts at flow station A2390512, particularly for higher flows that may return more benefit if managed appropriately. For this catchment, only model hybrid 0512 was able to provide accurate, reliable and informative forecasts of high flows. For catchments A2390514 and A2390519 on the other hand, median forecast accuracy is relatively poor for all models when based on high flows only. However, in the case of the ANN and hybrid models, forecast accuracy may potentially be improved in future as more data become available that represent the important catchment processes (e.g. rainfall-groundwater-runoff interactions), to which these models may be recalibrated. While the GR4J model may also be recalibrated to data more representative of current catchment conditions, any future changes in the rainfall-runoff relationship due to the influence of the groundwater would not be captured by this model.

Table 4: Comparison of the different models considered for the validation period on each catchment based on flows greater than the 75th percentile historical flow.

Model         NSE     RMSE (ML/month)   Bias (%)   α      π (rel)   Hit Rate (%)
GR4J 0512     -2.81   17723             -72        0.36   1.42      38
ANN 0512       0.27    7455               3        0.82   3.30      88
hybrid 0512    0.74    4476              -1        0.83   3.43      100
hist 0512     -2.41   16113             -64        0.57   0.39      100
GR4J 0514     -0.42    7128             -66        0.54   0.62      88
ANN 0514       0.33    4698             -43        0.67   0.77      100
hybrid 0514    0.42    4375             -26        0.77   0.97      100
hist 0514     -1.11    8352             -73        0.59   0.15      100
GR4J 0519     -2.31    5176             -72        0.55   0.43      88
ANN 0519      -2.20    4609              44        0.60   1.57      100
hybrid 0519   -1.53    4096              47        0.55   1.65      100
hist 0519     -0.47    3125             -43        0.59   0.40      100

The results for catchment A2390512 presented in Tables 3 and 4 can be seen visually in Fig. 7, where time series plots of the monthly median forecasts


and the 95% forecast limits produced by models GR4J 0512, ANN 0512 and hybrid 0512 are shown in comparison to the corresponding observed monthly flows and monthly historical flow distributions. The black dashed line in this figure denotes the 75th percentile historical flow. Most notable in this figure is the improved forecast accuracy on high flows and the significant reduction in forecast uncertainty achieved by the ANN 0512 and hybrid 0512 models in comparison to the historical distributions. Consequently, both the ANN and hybrid models can be considered useful for managing flows at stations A2390512 and A2390514, given that they provide a significant improvement on the information currently available to decision makers. The reduction in forecast uncertainty allows decision makers to place more trust in the median forecasts, which should ultimately lead to better flow management. The above results are reinforced by inspecting the observed and forecast cumulative monthly flows for catchment A2390512 over the months June to November for each year in the validation period, together with the 95% cumulative forecast limits, as shown in Fig 8. The red dashed line in this plot denotes the 30GL/year Lake George requirement. Differences between the forecast (blue) and observed (black) cumulative flows may represent missed opportunities to divert flow northward or instances where flows are diverted before Lake George requirements are fulfilled. As can be seen in this figure, the plots resulting from the ANN and hybrid models are almost identical, with both models able to accurately and reliably forecast cumulative flow into Lake George for the upcoming month. Therefore, from a management perspective, the additional effort required to build the hybrid models may not be warranted, given that the statistical ANN-based forecasting model is sufficient for addressing questions of whether and what volumes of flow should be diverted to the north. 
However, given that the high flow median forecasts generated by hybrid 0512 are considerably more accurate than those produced by ANN 0512 (see Table 4), this additional effort may be considered worthwhile. Additionally, as can be seen in Fig. 8, the forecast cumulative flows obtained using hybrid 0512 are slightly more accurate and have slightly less associated uncertainty (narrower forecast limits) than those obtained using ANN 0512. Despite the relatively poor performance of the GR4J model on the Drain M catchments (as seen in Tables 3 and 4 and Figs. 7-8), it is apparent that the soil moisture data generated using the GR4J models were of sufficient accuracy to improve the forecast accuracy of the ANN models across all catchments.

Figure 7: Time series plots of median forecast flows and 95% uncertainty limits versus observed monthly flows and monthly historical flow distributions obtained using the (a) GR4J, (b) ANN and (c) hybrid models when applied to catchment A2390512. The black dashed line denotes the high flow threshold for this catchment. [Panels GR4J_0512, ANN_0512 and hybrid_0512; vertical axis: Flow (GL); horizontal axis: Aug 2000 to Aug 2004; legend: Observed, Median forecast, Historical monthly median, 95% forecast limits, Historical 95% distribution limits.]

Figure 8: Observed and forecast cumulative flow at gauge A2390512 obtained using the (a) GR4J, (b) ANN and (c) hybrid models. [Panels GR4J_0512, ANN_0512 and hybrid_0512; vertical axis: Cumulative Flow (GL); horizontal axis: Aug 2000 to Aug 2004; legend: Observed, Median forecast, 95% forecast limits, Lake George requirement.]

To diagnose why the GR4J models performed so poorly, yet

were still able to generate useful soil moisture data, actual rainfall and PET data from the validation period were used as inputs to the models, allowing the performance of the GR4J models to be assessed, given a “perfect climate forecast”. The model performance results based on all flows using actual climate data for the forecast month are presented in Table 5. Comparing these results with those presented in Table 3, it can be seen that the poor quality of the streamflow forecasts for catchments A2390512 and A2390514 is primarily related to the quality of the climate forecasts (with NSE > 0.8 and bias < 15% obtained when actual climate data were used). It is not surprising then that the ANN and hybrid models are more accurate than the GR4J models when using forecast climate data, since the GR4J model is more sensitive to errors in the rainfall and PET forecasts than the ANN and hybrid models. Firstly, unlike the ANN and hybrid models, the GR4J model required that large-scale monthly climate forecasts be downscaled to provide local daily forecasts of rainfall and PET and, while these climate forecasts may be reasonable at a monthly time scale, this may not be the case when converted to daily data. Secondly, any errors associated with these inputs are propagated through the GR4J model and accumulated over time, such that initial catchment conditions may be inappropriately represented at subsequent time steps. For the ANN and hybrid models, on the other hand, initial catchment conditions are represented by a number of observed predictors and the uncertain rainfall forecast is only one of at least five model inputs used to predict the streamflow (see Appendix A for the number of selected inputs in each model). Furthermore, by including the POAMA rainfall forecasts as inputs when calibrating the ANN and hybrid models, these models were calibrated to correct for any bias or consistent error in these forecasts or the downscaling method used to provide local scale data. 
In comparison, the GR4J models were calibrated to actual daily rainfall and PET data, making the assumption that the climate forecasts accurately represent the future climate. As such, if the forecast climate data do not accurately represent the observed climate data, the calibrated model parameters may not be appropriate. The GR4J simulated SMI data used in the hybrid models are not similarly affected by any biases in the climate forecasts, as these are based on the calibration results using observed rainfall and PET data.

Table 5: GR4J model for the validation period when actual rainfalls for the forecast month were used to generate forecasts.

Model         NSE     RMSE (ML/month)   Bias (%)   α      π (rel)   Hit Rate (%)
GR4J 0512      0.92    2661               8        0.85   1.19      87
GR4J 0514      0.83    1955              14        0.69   0.88      100
GR4J 0519     -0.98    3995             150        0.61   2.35      50

It can be seen in Table 5 that, even when using observed climate inputs, the GR4J model was unable to accurately forecast streamflows for catchment A2390519. In fact, as was seen in Tables 3 and 4, none of the models developed in this study were able to improve upon the unskilled forecasts provided

by the monthly historical streamflow distributions for this catchment. This is likely due to the change in the catchment rainfall-runoff relationship, which occurred around the late 1990s, as seen in Fig. 3, and the inability of the models developed in this study to properly represent this change. In the case of the GR4J model, the change in rainfall-runoff relationship occurred toward the middle of the 10-year calibration periods used to estimate the model parameters and initial conditions subsequently used to generate the validation period (2000-2004) forecasts. Consequently, much of the calibration period was unrepresentative of validation period conditions (i.e. the model was calibrated to a wetter runoff regime than the drier regime experienced during the validation period, as suggested by the high positive bias of 53% computed based on all flows). This could potentially be ameliorated to some degree if a shorter rolling period (e.g. five years) was chosen for calibrating the GR4J models (since the more recent calibration data should be more representative of catchment conditions over the forecasting period). However, this may also result in greater forecast uncertainty. For the ANN and hybrid models, the lack of groundwater data for this catchment, which are needed to appropriately characterise the monthly flows under a falling groundwater table, may explain the relatively poor performance of these models when used to forecast catchment A2390519 streamflows. Thus, if telemetry were introduced to provide groundwater data from wells in this catchment (near real-time data are required for operational purposes), the inclusion of such data as inputs to the ANN and hybrid models may result in improved forecast accuracy. Shown in Fig. 9 are time series plots of the forecasts generated by models GR4J 0519, ANN 0519 and hybrid 0519 in comparison to observed flows and the monthly historical flow distributions for catchment A2390519 over the validation period. 
Despite the poor median forecast accuracy associated with models ANN 0519 and hybrid 0519, it can be seen in comparison with the climatological forecasts that both of these models resulted in a considerable reduction in forecast uncertainty.

Figure 9: Time series plots of median forecast flows and 95% uncertainty limits versus observed monthly flows and monthly historical flow distributions obtained using the (a) GR4J, (b) ANN and (c) hybrid models when applied to catchment A2390519. The black dashed line denotes the high flow threshold for this catchment. [Panels GR4J_0519, ANN_0519 and hybrid_0519; vertical axis: Flow (GL); horizontal axis: Aug 2000 to Aug 2004; legend: Observed, Median forecast, Historical monthly median, 95% forecast limits, Historical 95% distribution limits.]

5. Conclusions

In this study, monthly streamflow forecasting models were developed for three catchments in the South East of South Australia in order to support flow management decision making in this region. It was hypothesized that a hybrid streamflow forecasting approach, which uses the output of the dynamic GR4J hydrological model as an auxiliary input to a statistical ANN-based forecasting model, would take advantage of the complementary strengths of these conceptually different modelling methods to provide the most accurate and reliable forecasts for the case study catchments considered. To test this hypothesis, hybrid model performance results were compared to those obtained using both the ANN and GR4J forecasting approaches individually, as well as those based on historical monthly streamflow distributions. For two of the three catchments considered, the hybrid models were indeed found to generate the most accurate median forecasts and the most precise (least uncertain) forecast distributions, particularly in relation to the higher flows for which optimal management would be expected to return the greatest benefit. Furthermore, from a management perspective, it was found that both the hybrid and ANN models generated considerably more useful forecasts than those based on climatology for these two catchments. For the remaining catchment, however, it was found that historical monthly flow distributions gave more accurate forecasts than any of the models developed for this catchment. It is likely that this was due to the change in the catchment rainfall-runoff relationship which occurred prior to the validation period and the inability of the models developed to represent this change given data and model limitations.
The results of the study demonstrated that a primarily statistical forecasting approach, as opposed to a dynamical forecasting model, was most suitable for the case study catchments, with the ANN and hybrid models generally outperforming the GR4J model both in terms of the median forecasts and the forecast distributions. This was largely due to the uncertainty associated with the climate forecasts and the sensitivity of the dynamical forecasting approach to this uncertain information. Furthermore, the ability of the ANN and hybrid models to easily include groundwater data meant that these models were better able to characterise the observed change in relationship between catchment rainfall and runoff than the GR4J models, which could not easily incorporate this information. The primary advantage of the hybrid models over the ANN models was due to their better representation of initial catchment conditions, through the inclusion of GR4J simulated soil moisture inputs. While the sample size of 8 data points was very small, the inclusion of simulated soil moisture inputs was shown to aid particularly in the forecasting of high flows, with the hybrid models tending to result in better forecasts than the ANN models in terms of median accuracy, as well as reliability and resolution of the forecast distributions, on all three catchments. A limitation of the hybrid modelling approach, however, is that it requires the time and expertise to develop two different types of models for each catchment. In the case study presented, the very slightly improved ability of the hybrid model to forecast 1-month ahead cumulative flows at station A2390512 (a key location for making flow management decisions) did not appear to warrant the additional effort of developing the GR4J model to provide soil moisture data. This is particularly true given that the statistical (ANN) model alone was found to be sufficient for addressing questions related to diversions from Drain M. However, the generally superior performance of the hybrid models when compared with the ANN models has demonstrated the potential for this hybridisation approach to improve forecast accuracy and reduce forecast uncertainty. While the results presented in this paper are case study focused and cannot be generalised, the methods used can be easily applied to other regions and may be of particular interest for highly variable, ephemeral catchments where soil moisture and potentially groundwater are important drivers of runoff.


Acknowledgements


This work was supported by the Goyder Institute for Water Research, Project E.2.4. The authors are grateful to the two anonymous reviewers and the associate editor of Journal of Hydrology for their constructive and insightful comments, which have helped to improve and clarify this paper.


References

1183 1184 1185

1188 1189 1190

1191 1192

Abebe, A.J., Price, R.K., 2003. Managing uncertainty in hydrological models using complementary models. Hydrological Sciences Journal 48, 679–692. doi:10.1623/hysj.48.5.679.51450. Abrahart, R.J., Anctil, F., Coulibaly, P., Dawson, C.W., Mount, N.J., See, L.M., Shamseldin, A.Y., Solomatine, D.P., Toth, E., Wilby, R.L., 2012. 43

1193 1194 1195

1196 1197 1198

1199 1200 1201

1202 1203 1204 1205

1206 1207 1208 1209

1210 1211 1212 1213

1214 1215

1216 1217 1218

1219 1220 1221 1222

Two decades of anarchy? Emerging themes and outstanding challenges for neural network river forecasting. Progress in Physical Geography 36, 480–513. doi:10.1177/0309133312444943. Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions of Automatic Control 19, 716–723. doi:10.1109/TAC.1974. 1100705. Anctil, F., Michel, C., Perrin, C., Andréassian, V., 2004. A soil moisture index as an auxiliary ANN input for stream flow forecasting. Journal of Hydrology 286, 155–167. doi:10.1016/j.jhydrol.2003.09.006. Anctil, F., Perrin, C., Andréassian, V., 2003. ANN output updating of lumped conceptual rainfall/runoff forecasting models. JAWRA Journal of the American Water Resources Association 39, 1269–1279. doi:10.1111/ j.1752-1688.2003.tb03708.x. Andrews, F.T., Croke, B.F.W., Jakeman, A.J., 2011. An open software environment for hydrological model assessment and development. Environmental Modelling & Software 26, 1171–1185. doi:10.1016/j.envsoft. 2011.04.006. AWE, 2009. Coorong South Lagoon restoration project - hydrological investigation final report. Technical Report. Australian Water Environments Pty Ltd (AWE) for Department of Water, Land and Biodiversity Conservation, Adelaide. Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford. Boughton, W., 2005. Catchment water balance modelling in Australia 1960– 2004. Agricultural Water Management 71, 91–116. doi:10.1016/j.agwat. 2004.10.012. Bowden, G.J., Maier, H.R., Dandy, G.C., 2012. Real-time deployment of artificial neural network forecasting models: Understanding the range of applicability. Water Resources Research 48, W10549. doi:10.1029/ 2012wr011984.

44

1223 1224 1225 1226

1227 1228 1229

1230 1231 1232

1233 1234 1235 1236

1237 1238 1239 1240 1241

1242 1243 1244 1245

1246 1247 1248 1249

1250 1251 1252 1253

Brath, A., Montanari, A., Toth, E., 2002. Neural networks and nonparametric methods for improving real-time flood forecasting through conceptual hydrological models. Hydrology and Earth System Sciences 6, 627–639. doi:10.5194/hess-6-627-2002. Brocca, L., Melone, F., Moramarco, T., 2008. On the estimation of antecedent wetness conditions in rainfall–runoff modelling. Hydrological Processes 22, 629–642. doi:10.1002/hyp.6629. Chen, J., Adams, B.J., 2006. Integration of artificial neural networks with conceptual models in rainfall-runoff modeling. Journal of Hydrology 318, 232–249. doi:10.1016/j.jhydrol.2005.06.017. Coron, L., Andréassian, V., Perrin, C., Lerat, J., Vaze, J., Bourqui, M., Hendrick, F., 2012. Crash testing hydrological models in contrasted climate conditions: An experiment on 216 Australian catchments. Water Resources Research 48, W05552. doi:10.1029/2011wr011721. Corzo, G.A., Solomatine, D.P., Hidayat, de Wit, M., Werner, M., Uhlenbrook, S., Price, R.K., 2009. Combining semi-distributed processbased and data-driven models in flow simulation: a case study of the Meuse river basin. Hydrology and Earth System Sciences 13, 1619–1634. doi:10.5194/hess-13-1619-2009. Dawson, C.W., Abrahart, R.J., See, L.M., 2007. Hydrotest: A web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environmental Modelling & Software 22, 1034–1052. doi:10.1016/j.envsoft.2006.06.008. Demirel, M.C., Booij, M.J., Hoekstra, A.Y., 2013. Effect of different uncertainty sources on the skill of 10 day ensemble low flow forecasts for two hydrological models. Water Resources Research 49, 4035–4053. doi:10.1002/wrcr.20294. Evin, G., Thyer, M., Kavetski, D., McInerney, D., Kuczera, G., 2014. Comparison of joint versus postprocessor approaches for hydrological uncertainty estimation accounting for error autocorrelation and heteroscedasticity. Water Resources Research 50, 2350–2375. doi:10.1002/2013wr014185.

45

1254 1255 1256

1257 1258 1259 1260

1261 1262 1263

1264 1265 1266 1267

1268 1269

1270 1271

1272 1273 1274

1275 1276 1277 1278

1279 1280 1281 1282

1283 1284

Ferdowsian, R., Pannell, D.J., McCarron, C., Ryder, A., Crossing, L., 2001. Explaining groundwater hydrographs: separating atypical rainfall events from time trends. Soil Research 39, 861–876. doi:10.1071/SR00037. Galelli, S., Humphrey, G.B., Maier, H.R., Castelletti, A., Dandy, G.C., Gibbs, M.S., 2014. An evaluation framework for input variable selection algorithms for environmental data-driven models. Environmental Modelling & Software 62, 33–51. doi:10.1016/j.envsoft.2014.08.015. Garen, D., 1992. Improved techniques in regression-based streamflow volume forecasting. Journal of Water Resources Planning and Management 118, 654–670. doi:10.1061/(ASCE)0733-9496(1992)118:6(654). Gibbs, M.S., Dandy, G.C., Maier, H.R., 2014. Assessment of the ability to meet environmental water requirements in the Upper South East of South Australia. Stochastic Environmental Research and Risk Assessment 28, 39–56. doi:10.1007/s00477-013-0735-9. Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Pub. Co., Reading, Mass. Haario, H., Saksman, E., Tamminen, J., 2001. An adaptive Metropolis algorithm. Bernoulli 7, 223–242. Hsu, K.l., Gupta, H.V., Sorooshian, S., 1995. Artificial neural network modeling of the rainfall-runoff process. Water Resources Research 31, 2517–2530. doi:10.1029/95wr01955. Isik, S., Kalin, L., Schoonover, J.E., Srivastava, P., Lockaby, B.G., 2013. Modeling effects of changing land use/cover on daily streamflow: An artificial neural network and curve number based hybrid approach. Journal of Hydrology 485, 103–112. doi:10.1016/j.jhydrol.2012.08.032. Jain, A., Srinivasulu, S., 2006. Integrated approach to model decomposed flow hydrograph using artificial neural network and conceptual techniques. Journal of Hydrology 317, 291–306. doi:10.1016/j.jhydrol.2005.05. 022. Jeffrey, S.J., Carter, J.O., Moodie, K.B., Beswick, A.R., 2001. Using spatial interpolation to construct a comprehensive archive of Australian climate

46

1285 1286

1287 1288 1289

1290 1291

1292 1293 1294

1295 1296 1297

1298 1299 1300

1301 1302 1303

1304 1305 1306 1307

1308 1309 1310

1311 1312 1313 1314

data. Environmental Modelling & Software 16, 309–330. doi:10.1016/ s1364-8152(01)00008-1. Khan, M.S., Coulibaly, P., 2006. Bayesian neural network for rainfallrunoff modeling. Water Resources Research 42, W07409. doi:10.1029/ 2005wr003971. Kingston, G., Maier, H., Lambert, M., 2010. Bayesian Artificial Neural Networks: with Applications in Water Resources Engineering. VDM Verlag. Kingston, G.B., Lambert, M.F., Maier, H.R., 2005. Bayesian training of artificial neural networks used for water resources modeling. Water Resources Research 41, W12409. doi:10.1029/2005wr004152. Kingston, G.B., Maier, H.R., Lambert, M.F., 2008. Bayesian model selection applied to artificial neural networks used for water resources modeling. Water Resources Research 44, W04419. doi:10.1029/2007wr006155. Kokkonen, T.S., Jakeman, A.J., 2001. A comparison of metric and conceptual approaches in rainfall-runoff modeling and its implications. Water Resources Research 37, 2345–2352. doi:10.1029/2000WR000299. Laio, F., Tamea, S., 2007. Verification tools for probabilistic forecasts of continuous hydrological variables. Hydrology and Earth System Sciences 11, 1267–1277. doi:10.5194/hess-11-1267-2007. Liu, S., Xu, J., Zhao, J., Xie, X., Zhang, W., 2013. An innovative method for dynamic update of initial water table in XXT model based on neural network technique. Applied Soft Computing 13, 4185–4193. doi:10.1016/ j.asoc.2013.06.024. Loukas, A., Vasiliades, L., 2014. Streamflow simulation methods for ungauged and poorly gauged watersheds. Natural Hazards and Earth System Sciences 14, 1641–1661. doi:10.5194/nhess-14-1641-2014. Maier, H.R., Dandy, G.C., 2000. Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environmental Modelling & Software 15, 101–124. doi:10. 1016/S1364-8152(99)00007-9.

47

1315 1316 1317 1318 1319

1320 1321 1322

1323 1324 1325 1326 1327

1328 1329 1330 1331 1332

1333 1334 1335

1336 1337 1338

1339 1340 1341

1342 1343 1344 1345

Maier, H.R., Jain, A., Dandy, G.C., Sudheer, K.P., 2010. Methods used for the development of neural networks for the prediction of water resource variables in river systems: Current status and future directions. Environmental Modelling & Software 25, 891–909. doi:10.1016/j.envsoft.2010. 02.003. Mason, S.J., Stephenson, D.B., 2008. How do we know whether seasonal climate forecasts are any good?. Springer Netherlands, Dordrecht. pp. 259–289. Mount, N.J., Maier, H.R., Toth, E., Elshorbagy, A., Solomatine, D., Chang, F.J., Abrahart, R.J., 2016. Data-driven modelling approaches for sociohydrology: opportunities and challenges within the Panta Rhei Science Plan. Hydrological Sciences Journal, 1–17. doi:10.1080/02626667.2016. 1159683. Mustafa, S., Lawson, J., Leaney, F., Osei-Bonsu, K., 2006. Land-use impact on water quality and quantity in the Lower South East, South Australia. Technical Report DWLBC Report 2006/25. Government of South Australia, Department of Water, Land and Biodiversity Conservation, Adelaide. Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I - a discussion of principles. Journal of Hydrology 10, 282–290. doi:10.1016/0022-1694(70)90255-6. Nilsson, P., Uvo, C.B., Berndtsson, R., 2006. Monthly runoff simulation: Comparing and combining conceptual and neural network models. Journal of Hydrology 321, 344–363. doi:10.1016/j.jhydrol.2005.08.007. Noori, N., Kalin, L., 2016. Coupling SWAT and ANN models for enhanced daily streamflow prediction. Journal of Hydrology 533, 141–151. doi:10. 1016/j.jhydrol.2015.11.050. Pagano, T.C., Garen, D.C., Perkins, T.R., Pasteris, P.A., 2009. Daily updating of operational statistical seasonal water supply forecasts for the western U.S. JAWRA Journal of the American Water Resources Association 45, 767–778. doi:10.1111/j.1752-1688.2009.00321.x.

48

1346 1347 1348

1349 1350 1351 1352 1353

1354 1355 1356 1357 1358 1359 1360 1361

1362 1363 1364 1365

1366 1367 1368

1369 1370 1371 1372

1373 1374 1375

1376 1377 1378

Perrin, C., Michel, C., Andréassian, V., 2003. Improvement of a parsimonious model for streamflow simulation. Journal of Hydrology 279, 275–289. doi:10.1016/S0022-1694(03)00225-7. Petheram, C., Potter, N., Vaze, J., Chiew, F., Zhang, L., 2011. Towards better understanding of changes in rainfall-runoff relationships during the recent drought in south-eastern Australia, in: MODSIM 2011, 19th International Congress on Modelling and Simulation, Modelling and Simulation Society of Australia and New Zealand. Plummer, N., Tuteja, N., Wang, Q.J., Wang, E., Robertson, D., Zhou, S., Schepen, A., Alves, O., Timbal, B., Puri, K., 2009. A seasonal water availability prediction service: Opportunities and challenges., in: Anderssen, R.S., Braddock, R.D., Newham, L.T.H. (Eds.), 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation, Modelling and Simulation Society of Australia and New Zealand and International Association for Mathematics and Computers in Simulation. pp. 80–94. Pokhrel, P., Wang, Q.J., Robertson, D.E., 2013. The value of model averaging and dynamical climate model predictions for improving statistical seasonal streamflow forecasts over Australia. Water Resources Research 49, 6671– 6687. doi:10.1002/wrcr.20449. Qi, M., Zhang, G.P., 2001. An investigation of model selection criteria for neural network time series forecasting. European Journal of Operational Research 132, 666–680. doi:10.1016/S0377-2217(00)00171-5. Renard, B., Kavetski, D., Kuczera, G., Thyer, M., Franks, S.W., 2010. Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors. Water Resources Research 46, W05521. doi:10.1029/2009wr008328. Robertson, D.E., Pokhrel, P., Wang, Q.J., 2013. Improving statistical forecasts of seasonal streamflows using hydrological model output. Hydrology and Earth System Sciences 17, 579–593. doi:10.5194/hess-17-579-2013. Robertson, D.E., Wang, Q.J., 2012. 
A Bayesian approach to predictor selection for seasonal streamflow forecasting. Journal of Hydrometeorology 13, 155–171. doi:10.1175/jhm-d-10-05009.1. 49

1379 1380 1381 1382

1383 1384

1385 1386 1387

1388 1389 1390

1391 1392 1393 1394

1395 1396 1397 1398

1399 1400 1401 1402

1403 1404 1405 1406

1407 1408 1409

Rosenberg, E.A., Wood, A.W., Steinemann, A.C., 2011. Statistical applications of physically based hydrologic models to seasonal streamflow forecasts. Water Resources Research 47, W00h14. doi:10.1029/ 2010wr010101. Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics 6, 461–464. Shamseldin, A.Y., 1997. Application of a neural network technique to rainfall-runoff modelling. Journal of Hydrology 199, 272–294. doi:10. 1016/S0022-1694(96)03330-6. Shamseldin, A.Y., O’Connor, K.M., 2001. A non-linear neural network technique for updating of river flow forecasts. Hydrology and Earth System Sciences 5, 577–598. doi:10.5194/hess-5-577-2001. Shao, Q., Li, M., 2013. An improved statistical analogue downscaling procedure for seasonal precipitation forecast. Stochastic Environmental Research and Risk Assessment 27, 819–830. doi:10.1007/ s00477-012-0610-0. Sharma, A., 2000. Seasonal to interannual rainfall probabilistic forecasts for improved water supply management: Part 3 - A nonparametric probabilistic forecast model. Journal of Hydrology 239, 249–258. doi:10.1016/ S0022-1694(00)00348-6. Shrestha, D.L., Kayastha, N., Solomatine, D.P., 2009. A novel approach to parameter uncertainty analysis of hydrological models using neural networks. Hydrology and Earth System Sciences 13, 1235–1248. doi:10.5194/ hess-13-1235-2009. Song, X., Kong, F., Zhan, C., Han, J., 2012. Hybrid optimization rainfallrunoff simulation based on Xinanjiang model and artificial neural network. Journal of Hydrologic Engineering 17, 1033–1041. doi:10.1061/(ASCE)HE. 1943-5584.0000548. Srinivasulu, S., Jain, A., 2009. River flow prediction using an integrated approach. Journal of Hydrologic Engineering 14, 75–83. doi:10.1061/ (ASCE)1084-0699(2009)14:1(75).

50

1410 1411 1412 1413 1414

1415 1416 1417 1418 1419

1420 1421 1422 1423 1424

1425 1426 1427 1428

1429 1430

1431 1432 1433 1434 1435

1436 1437 1438 1439

1440 1441

Thyer, M., Renard, B., Kavetski, D., Kuczera, G., Franks, S.W., Srikanthan, S., 2009. Critical evaluation of parameter consistency and predictive uncertainty in hydrological modeling: A case study using Bayesian total error analysis. Water Resources Research 45, W00B14. doi:10.1029/ 2008wr006825. Toth, E., Brath, A., 2002. Flood forecasting using artificial neural networks in black-box and conceptual rainfall-runoff modelling, in: Rizzoli, A.E., Jakeman, A.J. (Eds.), Integrated Assessment and Decision Support, Proceedings of the 1st Biennial Meeting of the International Environmental Modelling and Software Society, iEMSs. pp. 166–171. Valipour, M., Banihabib, M.E., Behbahani, S.M.R., 2012. Parameters estimate of autoregressive moving average and autoregressive integrated moving average models and compare their ability for inflow forecasting. Journal of Mathematics and Statistics 8, 330–338. doi:10.3844/jmssp.2012.330. 338. Valipour, M., Banihabib, M.E., Behbahani, S.M.R., 2013. Comparison of the ARMA, ARIMA, and the autoregressive artificial neural network models in forecasting the monthly inflow of Dez dam reservoir. Journal of Hydrology 476, 433–441. doi:10.1016/j.jhydrol.2012.11.017. Venables, W.N., Ripley, B.D., 2002. Modern Applied Statistics with S. Fourth ed., Springer, New York. Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., Robinson, B.A., Hyman, J.M., Higdon, D., 2009. Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling. International Journal of Nonlinear Sciences and Numerical Simulation 10, 273–290. Wang, E., Zhang, Y., Luo, J., Chiew, F.H.S., Wang, Q.J., 2011a. Monthly and seasonal streamflow forecasts using rainfall-runoff modeling and historical weather data. Water Resources Research 47, W05516. doi:10.1029/ 2010wr009922. Wang, Q.J., Pagano, T.C., Zhou, S.L., Hapuarachchi, H.A.P., Zhang, L., Robertson, D.E., 2011b. Monthly versus daily water balance models in

51

1442 1443

1444 1445 1446

1447 1448 1449

1450 1451 1452 1453 1454

simulating monthly runoff. Journal of Hydrology 404, 166–175. doi:10. 1016/j.jhydrol.2011.04.027. Wang, Q.J., Robertson, D.E., Chiew, F.H.S., 2009. A Bayesian joint probability modeling approach for seasonal forecasting of streamflows at multiple sites. Water Resources Research 45, W05407. doi:10.1029/2008wr007355. Westra, S., Sharma, A., Brown, C., Lall, U., 2008. Multivariate streamflow forecasting using independent component analysis. Water Resources Research 44, W02437. doi:10.1029/2007wr006104. Wu, W., Dandy, G.C., Maier, H.R., 2014. Protocol for developing ANN models and its application to the assessment of the quality of the ANN model development process in drinking water quality modelling. Environmental Modelling & Software 54, 108–127. doi:10.1016/j.envsoft.2013. 12.016.

1457

Yaseen, Z.M., El-shafie, A., Jaafar, O., Afan, H.A., Sayl, K.N., 2015. Artificial intelligence based models for stream-flow forecasting: 2000-2015. Journal of Hydrology 530, 829–844. doi:10.1016/j.jhydrol.2015.10.038.

1458

Appendix A


Table A.1: Selected inputs for ANN models.

Model        Number of inputs   Selected inputs
ANN 0512^a   7                  Rain A2390519 (t), Rain A2390512 (t), Rain A2390512 (t−1),
                                Evap A2390519 (t−1), GW SMT020 (t), API A2390514 (t−2),
                                POAMA Rain 26075 mean (t+1)
ANN 0514^a   8                  Rain A2390514 (t−1), Rain A2390519 (t), Rain A2390519 (t−2),
                                Evap A2390514 (t), Evap A2390519 (t), API A2390519 (t−2),
                                GW CMM079 (t−1), POAMA Rain 26075 mean (t+1)
ANN 0519^a   5                  Rain A2390519 (t), Evap A2390519 (t), Evap A2390519 (t−1),
                                API A2390519 (t−1), POAMA Rain 26082 mean (t+1)

^a No. of hidden nodes: 1; hidden layer activation: tanh; output layer activation: linear
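The network structure given in the note to Table A.1 (one hidden node, tanh hidden activation, linear output) can be written out as a short forward pass. The sketch below is illustrative only: the function name, the weight values and the assumption that the inputs have already been standardised are not taken from the study, and in a Bayesian setting the weights would be samples from a posterior distribution rather than fixed values.

```python
import math


def forecast(inputs, hidden_weights, hidden_bias, output_weight, output_bias):
    """Forward pass of a network with one tanh hidden node and a linear output."""
    if len(inputs) != len(hidden_weights):
        raise ValueError("inputs and hidden_weights must have the same length")
    # Weighted sum of the inputs entering the single hidden node.
    activation = hidden_bias + sum(w * x for w, x in zip(hidden_weights, inputs))
    # tanh hidden-layer activation followed by a linear output layer.
    return output_bias + output_weight * math.tanh(activation)


# Hypothetical weights for a 5-input model (for example, the size of ANN 0519).
weights = [0.4, -0.2, 0.1, 0.3, 0.5]
# With all inputs at zero and a zero hidden bias, only the output bias remains.
print(forecast([0.0] * 5, weights, 0.0, 2.0, 1.0))  # 1.0
```

A model of this form with n inputs has n + 3 parameters (n hidden weights, one hidden bias, one output weight and one output bias), so the 5-input model above has 8.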


Table A.2: Selected inputs for hybrid models.

Model           Number of inputs   Selected inputs
hybrid 0512^a   9                  Rain A2390519 (t), Rain A2390512 (t), Rain A2390512 (t−1),
                                   Evap A2390519 (t−1), GW SMT020 (t), SMI A2390512 (t),
                                   SMI A2390512 (t−1), SMI A2390512 (t−2),
                                   POAMA Rain 26075 mean (t+1)
hybrid 0514^a   9                  Rain A2390514 (t−1), Rain A2390519 (t), Rain A2390519 (t−2),
                                   Evap A2390514 (t), API A2390519 (t−2), GW CMM079 (t−1),
                                   SMI A2390519 (t), SMI A2390519 (t−1),
                                   POAMA Rain 26075 mean (t+1)
hybrid 0519^a   5                  Rain A2390519 (t), Evap A2390519 (t), Evap A2390519 (t−1),
                                   SMI A2390519 (t), POAMA Rain 26082 mean (t+1)

^a No. of hidden nodes: 1; hidden layer activation: tanh; output layer activation: linear
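The lagged inputs listed in Table A.2 can be assembled into a model input vector with a small helper. The following sketch uses the hybrid 0519 inputs as an example. The dictionary keys, the data values and the convention that the POAMA rainfall series is indexed by the month for which the forecast is valid are assumptions made for illustration, not details taken from the study.

```python
def build_input_row(series, spec, t):
    """Return the input vector for month index t.

    series: dict mapping a variable name to a list of monthly values.
    spec:   list of (variable name, lag) pairs, where lag 0 is month t,
            lag 1 is month t-1 and lag -1 is month t+1.
    """
    row = []
    for name, lag in spec:
        index = t - lag
        values = series[name]
        if index < 0 or index >= len(values):
            raise IndexError(f"{name} has no value for month index {index}")
        row.append(values[index])
    return row


# Made-up monthly values, used only to show the indexing.
series = {
    "Rain_A2390519": [10.0, 20.0, 30.0, 40.0],
    "Evap_A2390519": [5.0, 6.0, 7.0, 8.0],
    "SMI_A2390519": [0.1, 0.2, 0.3, 0.4],
    "POAMA_Rain_26082_mean": [11.0, 21.0, 31.0, 41.0],
}

# Inputs of the hybrid 0519 model: (variable, lag).
hybrid_0519_spec = [
    ("Rain_A2390519", 0),
    ("Evap_A2390519", 0),
    ("Evap_A2390519", 1),
    ("SMI_A2390519", 0),
    ("POAMA_Rain_26082_mean", -1),
]

row = build_input_row(series, hybrid_0519_spec, t=2)
print(row)  # [30.0, 7.0, 6.0, 0.3, 41.0]
```

Keeping the lag specification as data rather than code means the same helper can serve any of the models in Tables A.1 and A.2 by swapping the list of (variable, lag) pairs.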
