AMERICAN METEOROLOGICAL SOCIETY Journal of Hydrometeorology
EARLY ONLINE RELEASE

This is a preliminary PDF of the author-produced manuscript that has been peer-reviewed and accepted for publication. Since it is being posted so soon after acceptance, it has not yet been copyedited, formatted, or processed by AMS Publications. This preliminary version of the manuscript may be downloaded, distributed, and cited, but please be aware that there will be visual differences and possibly some content differences between this version and the final published version.

The DOI for this manuscript is doi: 10.1175/JHM-D-15-0130.1. The final published version of this manuscript will replace the preliminary version at the above DOI once it is available.

If you would like to cite this EOR in a separate work, please use the following full citation:

Zsoter, E., F. Pappenberger, P. Smith, R. Emerton, E. Dutra, F. Wetterhall, D. Richardson, K. Bogner, and G. Balsamo, 2016: Building a multi-model flood prediction system with the TIGGE archive. J. Hydrometeor., doi:10.1175/JHM-D-15-0130.1, in press.

© 2016 American Meteorological Society
Building a multi-model flood prediction system with the TIGGE archive
Ervin Zsoter, European Centre for Medium-Range Weather Forecasts, Reading, UK
Current address: European Centre for Medium-Range Weather Forecasts, Shinfield Park, Reading, RG2 9AX, UK

Florian Pappenberger, European Centre for Medium-Range Weather Forecasts, Reading, UK, and College of Hydrology and Water Resources, Hohai University, Nanjing, China

Paul Smith, European Centre for Medium-Range Weather Forecasts, Reading, UK, and Lancaster Environment Centre, Lancaster University, UK

Rebecca Elizabeth Emerton, Department of Geography and Environmental Science, University of Reading, Reading, UK, and European Centre for Medium-Range Weather Forecasts, Reading, UK

Emanuel Dutra, European Centre for Medium-Range Weather Forecasts, Reading, UK

Fredrik Wetterhall, European Centre for Medium-Range Weather Forecasts, Reading, UK

David Richardson, European Centre for Medium-Range Weather Forecasts, Reading, UK

Konrad Bogner, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland, and European Centre for Medium-Range Weather Forecasts, Reading, UK

Gianpaolo Balsamo, European Centre for Medium-Range Weather Forecasts, Reading, UK

Corresponding author: E. Zsoter, European Centre for Medium-Range Weather Forecasts, Shinfield Park, Reading, RG2 9AX, UK ([email protected])
Abstract
In the last decade operational probabilistic ensemble flood forecasts have become common in supporting decision-making processes leading to risk reduction. Ensemble forecasts can assess uncertainty, but they are limited to the uncertainty in a specific modelling system. Many of the current operational flood prediction systems therefore use a multi-model approach to better represent the uncertainty arising from insufficient model structure.

This study presents a multi-model approach to building a global flood prediction system, using multiple atmospheric reanalysis datasets for river initial conditions and multiple TIGGE ensemble forcing inputs to the ECMWF land-surface model. A sensitivity study is carried out to clarify the effect of using archived ensemble meteorological predictions and uncoupled land-surface models. The probabilistic discharge forecasts derived from the different atmospheric models are compared with those from the multi-model combination. The potential for further improving forecast skill by bias correction and Bayesian Model Averaging is examined.

The results show that, other than for precipitation, the impact of the different TIGGE input variables in the HTESSEL/CaMa-Flood system setup is rather limited. This provides a sufficient basis for evaluation of the multi-model discharge predictions. We also highlight that the three reanalysis datasets used have different error characteristics, allowing large potential gains from a multi-model combination. It is shown that large improvements in forecast performance can be achieved for all models through appropriate statistical post-processing (bias and spread correction). A simple multi-model combination generally improves the forecasts, while a more advanced combination using Bayesian Model Averaging provides further benefits.
1 Introduction
Operational probabilistic ensemble flood forecasts have become more common in the last decade (Cloke and Pappenberger 2009; Demargne et al. 2014; Olsson and Lindström 2008). Ensemble forecasts are a good way of assessing forecast uncertainty, but they are limited to the uncertainty captured by a specific modelling system. A multi-model approach can address this shortcoming and provide a more complete representation of the uncertainty in the model structure, also potentially reducing the errors (Krishnamurti et al. 1999).
“Multi-model” can refer to systems using multiple meteorological models, multiple hydrological models or both (Velázquez et al. 2011). According to Emerton et al. (2016), amongst the many regional-scale operational hydrological ensemble prediction systems across the globe, there are at present six large (continental and global) scale models: four running at continental scale over Europe, Australia and the USA, and two systems available globally. The U.S. Hydrologic Ensemble Forecast Service (HEFS) run by the National Weather Service (NWS) (Demargne et al. 2014) and the Global Flood Forecasting Information System (GLOFFIS), a recent development at Deltares in The Netherlands, are examples of systems using different hydrological models as well as multiple meteorological inputs. The European Flood Awareness System (EFAS), developed by the Joint Research Centre (JRC) of the European Commission (EC) and ECMWF, operates using a single hydrological model with multi-model meteorological input (Thielen et al. 2009). Finally, the European HYdrological Predictions for the Environment (E-HYPE) Water in Europe Today (WET) model of the Swedish Meteorological and Hydrological Institute (SMHI) (Donnelly et al. 2015), the Australian Flood Forecasting and Warning Service, and the Global Flood Awareness System (GloFAS; Alfieri et al. 2013), run in collaboration between ECMWF and JRC, all use one main hydrological model and one meteorological model input.
While the multi-model approach has traditionally involved the use of multiple forcing inputs and hydrological models to generate discharge forecasts, it also allows for consideration of multiple initial conditions. In keeping with GloFAS, this paper uses atmospheric reanalysis data to generate the initial conditions of the land-surface components of the forecasting system; a multi-model approach based on three reanalysis datasets is therefore trialled.
The THORPEX (The Observing System Research and Predictability Experiment) Interactive Grand Global Ensemble (TIGGE; Bougeault et al. 2010) archive is an invaluable source of multi-model meteorological forcing data. The archive has attracted attention among hydrological forecasters and is already being extensively used in hydrological applications. The first published example of a hydro-meteorological forecasting application was by Pappenberger et al. (2008). In that paper, the forecasts of nine TIGGE centres were used within the setting of EFAS for a case study of a flood event in Romania in October 2007, and it was shown that the lead time of flood warnings could be improved by up to 4 days through the use of multiple forecasting models rather than a single model. This study and other subsequent studies using TIGGE multi-model data (e.g. He et al. 2009; He et al. 2010; Bao and Zhao 2012) have indicated that combining different models increases not only the skill, but also the lead time at which warnings can be issued. He et al. (2009) highlighted this and further showed that the individual systems of the multi-model forecast have systematic errors in time and space which require temporal and spatial post-processing. Such post-processing should carefully maintain spatial, temporal and inter-variable correlations; otherwise it can degrade hydrological forecast skill.
The scientific literature contains numerous studies on methods that can lead to significant gains in forecast skill by combining and post-processing different forecast systems. Statistical ensemble post-processing techniques target the generation of sharp and reliable probabilistic forecasts from ensemble outputs. Hagedorn et al. (2012) showed, based on TIGGE ensembles, that in an equal-weight multi-model approach a selection of the best NWP models might be needed to gain skill over the best performing single model. In addition, the calibration of the best single model using a reforecast dataset can lead to comparable or even superior quality to the multi-model prediction. Gneiting and Katzfuss (2014) focus on various methodologies that require weighting of the different contributing forecasts to optimize model error corrections. They recommend the application of well-established techniques in the operational environment, such as non-homogeneous regression or Bayesian model averaging (BMA). The BMA method generates calibrated and sharp probability density functions (PDFs) from ensemble forecasts (Raftery et al. 2005), where the predictive PDF is a weighted average of PDFs centred on the bias-corrected forecasts. The weights reflect the relative skill of the individual members over a training period. BMA has been widely used and has proved beneficial in hydrological ensemble systems (e.g. Ajami et al. 2007; Cane et al. 2013; Dong et al. 2013; Liang et al. 2013; Todini 2008; Vrugt and Robinson 2007).
Previous studies have used hydrological models, rather than land-surface models, to analyse the benefits of multi-model forecasting, and have focussed on individual catchments. The potential of multi-model forecasts at the regional or continental scale shown in previous studies provides the motivation for building a global multi-model hydro-meteorological forecasting system.
In this study we present our experiences in building a multi-model hydro-meteorological forecasting system. Global ensemble discharge forecasts with a 10-day horizon are generated using the ECMWF land-surface model and a river-routing model. The multi-model approach arises from the use of meteorological forecasts from four models in the TIGGE archive and the derivation of river initial conditions using three global reanalysis datasets. The main focus of our study is the quality of the discharge forecasts derived from the TIGGE data. We analyse the HTESSEL/CaMa-Flood set-up and the scope for error reduction by applying the multi-model approach and different post-processing methods to the forecast data. Three sets of experiments are undertaken to test (i) the sensitivity of the forecasting system to the input variables, (ii) the potential improvements in forecasting historical discharge that can be achieved by a combination of different reanalysis datasets and (iii) the use of bias correction and model combination to improve the predictive distribution of the forecasts.
In Section 2 the datasets, models and methodology used throughout the paper are described. Section 3 summarises the discharge experiments we produced and analysed. In Section 4 we provide the results, while Section 5 gives the conclusions of the paper.
2 System Description and Datasets
2.1 HTESSEL land-surface model
The hydrological component of this study was the Hydrology-Tiled ECMWF Scheme for Surface Exchange over Land (HTESSEL; Balsamo et al. 2009; Balsamo et al. 2011) land-surface model. The HTESSEL scheme follows a mosaic (or tiling) approach where the grid boxes are divided into patches (or tiles), with up to six fractions over land (bare ground, low and high vegetation, intercepted water, shaded and exposed snow) and two extra tiles over water (open and frozen water), exchanging energy and water with the atmosphere. The model is part of the Integrated Forecast System (IFS) at ECMWF and is used in coupled atmosphere-surface mode on time ranges from medium-range to seasonal forecasts. In addition, the model provides a research test bed for applications where the land-surface model can run in stand-alone mode. In this so-called “offline” version the model is forced with near-surface meteorological input (temperature, specific humidity, wind speed and surface pressure), radiative fluxes (downward solar and thermal radiation) and water fluxes (liquid and solid precipitation). This offline methodology has been explored in various research applications where HTESSEL or other models were applied (e.g. Agustí-Panareda et al. 2010; Dutra et al. 2011; Haddeland et al. 2011).
2.2 CaMa-Flood river-routing
The Catchment-based Macro-scale Floodplain model (CaMa-Flood; Yamazaki et al. 2011) was used to integrate HTESSEL runoff over the river network into discharge. CaMa-Flood is a distributed global river-routing model that routes runoff to oceans or inland seas along a river network map. A major advantage of CaMa-Flood is the explicit representation of water level and flooded area in addition to river discharge. The relationship between water storage (the only prognostic variable), water level and flooded area is determined on the basis of sub-grid-scale topographic parameters derived from a 1-km digital elevation model.
2.3 TIGGE ensemble forecasts
The atmospheric forcing for the forecast experiments is taken from the TIGGE archive, where all variables are available at a standard 6-hour forecast frequency. The ensemble systems of ECMWF, UKMO (UK Met Office), NCEP (National Centers for Environmental Prediction) and CMA (China Meteorological Administration) provide, in the TIGGE archive, meteorological forcing fields from the 00 UTC runs with 6-hour frequency, starting from 2006-2008 depending on the model. All four models were only available with the complete forcing variable set from August 2008. ECMWF was available with 50 ensemble members at 32 km horizontal resolution (~50 km before January 2010) up to 15 days ahead, UKMO with 23 members at ~60 km (~90 km before March 2010) also up to 15 days, NCEP with 20 members at ~110 km up to 16 days ahead, and finally CMA with 14 members at ~60 km resolution up to 10 days ahead. In testing the sensitivity of the experimental set-up to meteorological forcing (see Section 4.1), the ECMWF control forecasts were used, extracted directly from ECMWF's Meteorological Archival and Retrieval System (MARS), where the meteorological variables are available without the TIGGE restrictions. These have the same resolution as the 50 ensemble members but start from the unperturbed analysis.
2.4 Reanalysis data
The discharge modelling experiments require reanalysis data, which are used to provide the climate and the initial conditions needed for the HTESSEL land-surface model runs, and to produce the river initial conditions required in the CaMa-Flood routing part of the TIGGE forecast experiments.

In this study we have used three different reanalysis datasets: two produced by ECMWF, the ERA-Interim reanalysis (hereafter ERAI) and the ERA-Interim/Land version with GPCP v2.2 precipitation correction (Huffman et al. 2009) (Balsamo et al. 2015; hereafter ERAI-Land), and a third, the MERRA-Land upgrade of the Modern-Era Retrospective Analysis for Research and Applications (MERRA) dataset, produced by NASA. The combination of these three sources serves as a proof of concept of the potential added value of multiple initial conditions.

ERAI is ECMWF's global atmospheric reanalysis from 1979 to present, produced with an older (2006) version of the ECMWF IFS at T255 spectral resolution (Dee et al. 2011). ERAI-Land is a version of ERAI at the same 80 km spatial resolution with improvements for the land surface. It was produced in offline mode with a 2014 version of the HTESSEL land-surface model using atmospheric forcing from ERA-Interim, with precipitation adjustments based on GPCP v2.2, where the ERA-Interim 3-hourly precipitation is rescaled to match the monthly accumulated precipitation provided by the GPCP v2.2 product (for more details please consult Balsamo et al. 2010).

The MERRA-Land dataset is similar to ERAI-Land in that it is a land-only version of the MERRA land model component, also produced in offline mode, using improved precipitation forcing and an improved version of the catchment land surface model (Reichle et al. 2011).
2.5 Discharge Data
In this study a subset of the observations available in GloFAS was used, mainly originating from the Global Runoff Data Centre (GRDC) archive. The GRDC is the digital worldwide depository of discharge data and associated metadata. It is an international archive of data starting in 1812, and fosters multinational and global long-term hydrological studies.

For the discharge modelling a dataset of 1121 stations with upstream areas over 10,000 km² was available until the end of 2013. GRDC has a gradually decreasing number of stations with data in the archive, limiting their use for more recent years. For the forecast discharge we therefore limited our analyses to the period August 2008 to May 2010. This period provided the optimal compromise, between the length of the period and the number of stations with good data coverage, for increasing the sample size. For the reanalysis discharge experiments, and also for generating the observed discharge climate, stations with a minimum of 15 years of available observations in the 30-year period from 1981 to 2010 were used. For the forecast experiments, stations with at least 80% of the observations available in the 22-month period from August 2008 to May 2010 were used. Figure 1 shows the observation availability in the reanalysis and TIGGE forecast experiments. It highlights that for the reanalysis the coverage is better globally, with about 850 stations, while the forecast experiments have around 550 stations, with large areas missing, mainly in Africa and Asia.
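To illustrate the station selection criteria described above, a minimal Python sketch is given below. The data layout (a daily pandas DataFrame of GRDC observations indexed by date, one column per station) and the function names are hypothetical; only the thresholds (15 years within 1981-2010 and 80% availability within August 2008 to May 2010) are taken from the text.

```python
import pandas as pd

def select_reanalysis_stations(obs, min_years=15,
                               period=("1981-01-01", "2010-12-31")):
    """Stations with at least `min_years` of daily observations in `period`
    (used for the reanalysis experiments and the observed climate)."""
    window = obs.loc[period[0]:period[1]]
    n_valid = window.notna().sum()                 # valid days per station
    return n_valid[n_valid >= min_years * 365].index.tolist()

def select_forecast_stations(obs, start="2008-08-01", end="2010-05-31",
                             min_fraction=0.8):
    """Stations with at least 80% of observations available in the
    22-month forecast verification period."""
    window = obs.loc[start:end]
    frac = window.notna().mean()                   # fraction of valid days
    return frac[frac >= min_fraction].index.tolist()
```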
2.6 Forecasting system setup
To produce runoff from the TIGGE atmospheric ensemble variables (see Section 2.3), HTESSEL experiments were run with 6-hourly forcing frequency and an hourly model time step. For instantaneous variables (such as 2 m temperature) linear interpolation was used to move from the 6-hourly to the hourly time step used in the CaMa-Flood simulations. For accumulated variables (such as precipitation) a disaggregation algorithm which conserves the 6-hourly totals was used. The algorithm divides each 6-hourly total into hourly values based on a linear combination of the current and adjacent 6-hourly totals, with weights derived from the time differences.
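The exact hourly weights are not specified in the text; the following minimal Python sketch illustrates one possible implementation under that assumption, with a final rescaling step that guarantees conservation of each 6-hourly total as required.

```python
import numpy as np

def interp_instantaneous(values_6h, hours_per_step=6):
    """Linear interpolation of instantaneous fields (e.g. 2 m temperature)
    from 6-hourly to hourly values."""
    t6 = np.arange(len(values_6h)) * hours_per_step
    t1 = np.arange(t6[-1] + 1)
    return np.interp(t1, t6, values_6h)

def disaggregate_accumulated(totals_6h, hours_per_step=6):
    """Split 6-hourly accumulations (e.g. precipitation) into hourly values.

    Hourly weights blend the current total with the adjacent totals according
    to the time distance from the interval centre (a hypothetical weighting;
    the paper only states that a linear combination of the current and
    adjacent totals with time-difference weights is used), and are then
    rescaled so that each 6-hour sum reproduces the archived total exactly.
    """
    totals = np.asarray(totals_6h, dtype=float)
    padded = np.concatenate(([totals[0]], totals, [totals[-1]]))
    hourly = []
    for k, total in enumerate(totals):
        prev_t, next_t = padded[k], padded[k + 2]
        # fractional position of each hour's centre within the 6-h interval
        frac = (np.arange(hours_per_step) + 0.5) / hours_per_step
        raw = np.where(frac < 0.5,
                       (0.5 - frac) * prev_t + (0.5 + frac) * total,
                       (frac - 0.5) * next_t + (1.5 - frac) * total)
        raw = np.maximum(raw, 0.0)
        scale = total / raw.sum() if raw.sum() > 0 else 0.0
        hourly.append(raw * scale)   # rescaling conserves the 6-hour total
    return np.concatenate(hourly)
```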
The climate and the initial conditions needed for the HTESSEL land-surface model runs to produce runoff were taken from ERAI-Land, with the same initial conditions used for all models and ensemble members, without perturbations. The other two reanalysis datasets could also have been used to initialise HTESSEL, but the resulting variability in the TIGGE runoff (and thus in the TIGGE discharge) would be very small compared with the impact of the TIGGE atmospheric forcing (especially precipitation; see also Section 4.1) and the impact of the TIGGE forecast routing initialisation (see Section 3.2 for further details).
The HTESSEL resolution was set to T255 spectral resolution (~80 km). This was the horizontal resolution used in ERAI, and an adequate compromise between the highest (ECMWF, mainly ~50 km) and lowest (NCEP, ~110 km) forcing model resolutions, which also allowed fast enough computations. The TIGGE forcing fields were transformed to T255 using bilinear interpolation.
The TIGGE archive includes variables at the surface and on several pressure levels. However, variables are not available on model levels, and as such, temperature, wind and humidity at the surface (i.e. 2 m for temperature and humidity, and 10 m for wind) were used in HTESSEL rather than on the preferred lowest model level (LML).
Similarly, TIGGE contains several radiation variables, but not the downward radiation fluxes required by HTESSEL. In order to run HTESSEL without major technical modifications, we had to use a radiation replacement for all TIGGE models and ensemble members. We used ERAI-Land for this purpose, as it does not favour any of the TIGGE models used in this study. In this way, for each daily run the same single radiation forecast was used for all ensemble members and all models. These 10-day radiation forcings were built from 12-hour ERAI-Land short-range predictions. To reduce possible spin-up effects in the first hours of the ERAI-Land forecasts, the 06-18h radiation fluxes were combined (as 12-hour sections) from subsequent 00:00 and 12:00 UTC runs, following the approach described in Balsamo et al. (2015). The sensitivity to the HTESSEL input variables will be discussed in Section 4.1.
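A minimal sketch of this splicing is shown below, assuming a hypothetical accessor `get_run(date, cycle)` returning hourly downward radiation fluxes from one ERAI-Land short-range run; the exact hour windows and run alignment used by the authors may differ.

```python
import numpy as np

def build_radiation_forcing(get_run, dates, n_days=10):
    """Concatenate the 06-18 h sections of successive 00 and 12 UTC
    ERAI-Land short-range runs into a continuous 10-day radiation forcing.

    `dates` lists the n_days consecutive dates covered by the forecast;
    only hours 6..18 of each run are kept, skipping the first six
    spin-up hours (illustrative arrangement only)."""
    sections = []
    for day in range(n_days):
        for cycle in (0, 12):                 # 00 UTC then 12 UTC run
            run = get_run(dates[day], cycle)  # hourly fluxes, W m-2
            sections.append(run[6:18])        # the 06-18 h window (12 h)
    return np.concatenate(sections)           # 24 hourly values per day
```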
In this study we were able to process four of the 10 global models archived in TIGGE: ECMWF, UKMO, NCEP and CMA. The other six models do not archive one or more of the forcing variables required for this study, in addition to lacking the downward radiation.
The runoff produced by HTESSEL for the TIGGE ensembles was routed over the river network by CaMa-Flood. These relatively short experiments for the TIGGE forecasts required initial river conditions, which were provided by three CaMa-Flood runs for the 1980-2010 period with ERAI, ERAI-Land and MERRA-Land runoff input.

The discharge forecasts were produced by CaMa-Flood out to 10 days (T+240h), the longest forecast horizon common to all models. No perturbations were applied to the river initial conditions for the ensemble members. The forecasts were extracted from the CaMa-Flood 15 arc min (~25 km) model grid every 24 hours, matching the 24-hour reporting frequency of the discharge observations.
3 Experiments

The main focus of the experiments was on the quality of the discharge forecasts derived from the TIGGE data. Three sets of experiments were performed to test the HTESSEL/CaMa-Flood set-up and the scope for error reduction by applying the multi-model approach and different post-processing methods:

- Discharge sensitivity to meteorological forcing: the first experiment (Section 4.1) tests the sensitivity of the forecasting system to the input variables.
- Reanalysis impact on discharge: the second experiment (Section 4.2) evaluates the potential improvements in the historical discharge that can be achieved by a combination of different reanalysis datasets.
- Improving the forecast distribution: in the third experiment (Section 4.3) the use of bias correction and model combination to improve the predictive distribution of the forecasts is considered.
3.1 Discharge sensitivity to meteorological forcing
In Section 2.6 a number of compromises in the coupling of HTESSEL and forecasts from the TIGGE archive were introduced. Sensitivity experiments were conducted to study the impact of these compromises. Table 1 provides a short description of the experiments.
The baseline for the comparisons is the set of discharge forecasts generated by HTESSEL and CaMa-Flood driven by ECMWF control forecasts. These forecasts were produced weekly (at 00 UTC) throughout 2008-2012 to cover several seasons (~260 forecast runs in total). In the baseline setup, the LML meteorological output for temperature, wind and humidity was used to drive HTESSEL.
The first sensitivity test (Surf vs. LML) was to replace these LML values with the surface values (2 m temperature and humidity, and 10 m wind) from the same model run. This mirrors the change needed to make use of the TIGGE archive. Due to limitations in the TIGGE archive, the ERAI-Land radiation was used for all forecasts; substitution of the ECMWF control radiation in the HTESSEL input by ERAI-Land is the second sensitivity test (Rad). Further to this, substitution of the wind (Wind), of temperature, humidity and surface pressure together (THP), and finally of precipitation (Prec) from ERAI-Land in place of the ECMWF ensemble control run values was also evaluated. Temperature and humidity were analysed together due to the sensitive nature of the balance between these two variables. Although these changes were not applied to the TIGGE data, they give a more complete picture of the sensitivity to the forcing variables. This puts into context the discharge errors that were indirectly introduced through the TIGGE-HTESSEL setup changes.
The impact on the errors was compared by evaluating the ratio of the magnitude (absolute value) of the discharge change to the discharge value of the baseline experiment. These relative discharge changes were computed for each station as the average of the relative changes over all runs (weekly runs in the 2008-2012 period), and also as a global average over all available stations.
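The paper gives only a verbal description of this metric; the sketch below shows one plausible reading of it (|Q_test - Q_base| / Q_base, averaged first over runs and then over stations), with hypothetical array names.

```python
import numpy as np

def relative_discharge_change(q_test, q_base):
    """Mean relative discharge change of a sensitivity experiment.

    q_test, q_base: arrays of shape (n_runs, n_stations) holding forecast
    discharge at a fixed lead time (e.g. T+240h) for the test and baseline
    experiments.  Assumed metric: |Q_test - Q_base| / Q_base, averaged over
    runs per station and then over all available stations."""
    rel = np.abs(q_test - q_base) / np.where(q_base > 0, q_base, np.nan)
    per_station = np.nanmean(rel, axis=0)   # average over runs
    return np.nanmean(per_station)          # global average over stations
```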
3.2 Reanalysis impact on discharge
For the forecast CaMa-Flood routing, the river initial conditions are provided by reanalysis-based simulations (see Section 2.4). These do not make use of observed river flow and are therefore only an estimate of the observed values. The quality of the forecast discharge is expected to be strongly dependent on the skill of this reanalysis-derived historical discharge. This is highlighted in Figure 2, where ERAI-Land, ERAI and MERRA-Land are compared for a station in the US over a 4-year period.
Each of these reanalyses has different error characteristics, which can potentially be harnessed by using a multi-model approach. For this station, ERAI has a tendency to produce occasional high peaks, while MERRA-Land has a strong negative bias. Although Figure 2 is only a single example, it highlights the large variability between these reanalysis datasets and therefore the potentially severe underestimation of the uncertainties in the subsequent forecast experiments when only a single initialisation dataset is used.
The impact of the multi-model approach was analysed using experiments with the historical discharges derived from ERAI, ERAI-Land and MERRA-Land inputs. Three sets of CaMa-Flood routing runs were performed for each of the four TIGGE models for the whole 22-month period in 2008-2010, each initialised from one of the three reanalysis-derived historical river conditions. The performance of the historical discharge was evaluated independently of the TIGGE forecasts over the period 1981-2010.
3.3 Improving the forecast distribution
In the third group of experiments a number of post-processing techniques were applied at each site with the aim of improving the forecast distribution relative to the observed data. Here we outline the techniques with reference to a single site and forecast origin t. The available forecast values are denoted f_{m,j,t,i}, where m indexes the forecast products (ECMWF, UKMO, NCEP, CMA), j the N_m ensemble members in forecast product m, and i = 1, ..., 10 the available lead times.
3.3.1 Bias correction
As a first step we analysed the biases of the data. As described in Section 3.2, the historical river initial conditions can have large errors. In addition, the variability of the discharge over a 10-day forecast horizon is generally much smaller than that derived from a reanalysis over a long period. Therefore, any timing or magnitude error in the initial conditions provided by the historical discharge means that the forecast errors can be very large and will change only slightly, in relative terms, throughout the 10-day forecast period.
As bias was expected to be a very important aspect of the errors, three methods of computing the bias correction e_{m,j,t,i} to be added to the forecast f_{m,j,t,i} were proposed. The first of these is to apply no correction (uncorrected); that is, e_{m,j,t,i} = 0 in all cases. The second method, referred to as the 30-day correction, removes the mean bias of the 30-day period preceding the actual forecast run for each forecast product at each specified forecast range. The mean bias is computed as the average error of the ensemble mean over the 30-day period. In this case, given a series of 30 dates t' = 1, ..., N_{30} preceding the forecast origin and observed discharge data y_{t'}, the bias corrections are given by

e_{m,j,t,i} = \frac{1}{N_{30} N_m} \sum_{t'=1}^{N_{30}} \sum_{j'=1}^{N_m} \left( y_{t'} - f_{m,j',t'-i,i} \right)
The third correction method, referred to as the initial-time correction, focuses specifically on the errors of the initial conditions based on the historical discharge. The error at the initialisation of the routing (f_{m,j,t,0}), i.e. the error of the historical discharge, was used as a correction for all forecast ranges. This initial-time correction gives

e_{m,j,t,i} = y_t - f_{m,j,t,0}

This method therefore uses a specific error correction for each individual forecast run, from day 1 to day 10. Due to the common initialisation, the initial-time correction is the same for all four TIGGE models, for each of the three historical discharge experiments respectively.
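As an illustration of the two corrections defined above, a minimal Python sketch follows; only the formulas are taken from the text, while the array layout is hypothetical (arrays are assumed to end at the current forecast origin, so negative indices count back from it).

```python
import numpy as np

def thirty_day_correction(obs, fcst, lead):
    """Mean bias of the ensemble mean over the 30 dates preceding the
    forecast origin, for one forecast product and one lead time `lead`
    (in days).  obs[t] is the observation valid on day t; fcst[t, j, i]
    the forecast issued on day t for member j at lead i (illustrative
    layout).  Returns a single additive correction."""
    errors = []
    for t_prime in range(-30, 0):               # the 30 preceding dates
        ens_mean = fcst[t_prime - lead, :, lead].mean()
        errors.append(obs[t_prime] - ens_mean)
    return np.mean(errors)

def initial_time_correction(obs_now, fcst_init):
    """Error of the historical discharge at initialisation (T+0h),
    applied unchanged to all lead times: e = y_t - f_{m,j,t,0}."""
    return obs_now - fcst_init
```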
3.3.2 Multi-model combination
To investigate the potential further benefits of combining different forecast products, two model combination strategies were trialled. The naïve combination strategy (also referred to as the multi-model combination or MM) was based on a grand ensemble in which each member has equal weight. In this combination the larger ensembles (the largest being ECMWF with 50 members) receive larger total weight. In direct analogy with the case of a single forecast product, the cumulative forecast distribution is expressed in terms of the indicator function \delta(z), which takes the value 1 if the statement z is true and zero otherwise, as:

Pr(Y_{t+i} < y) = \frac{\sum_{m} \sum_{j=1}^{N_m} \delta\left( f_{m,j,t,i} + e_{m,j,t,i} < y \right)}{\sum_{m} N_m}

Here e_{m,j,t,i} denotes one of the three bias corrections introduced in the previous section.
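A minimal sketch of this pooled, equal-weight grand ensemble is given below; the list-of-arrays data layout is illustrative only.

```python
import numpy as np

def pooled_cdf(forecasts, corrections, y):
    """Equal-weight multi-model (MM) cumulative forecast probability
    Pr(Y < y) from a pooled grand ensemble.

    `forecasts` and `corrections` are lists (one entry per forecast
    product) of 1-D arrays holding the member values f_{m,j,t,i} and the
    corresponding bias corrections e_{m,j,t,i} for one site, one forecast
    origin and one lead time."""
    pooled = np.concatenate([f + e for f, e in zip(forecasts, corrections)])
    return np.mean(pooled < y)   # fraction of corrected members below y
```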
In the second combination strategy, BMA was used to explore further the effects of a weighted combination and a temporally localised bias correction. Since discharge is always positive, the variables were transformed so that their distributions, marginalised over time, are standard Gaussian. This is achieved using the normal quantile transform (Krzysztofowicz 1997), with the upper and lower tails handled as in Coccia and Todini (2011). The transformed values of the bias-corrected forecasts and observations are denoted \tilde{f}_{m,j,t,i} and \tilde{y}_t respectively.
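A minimal sketch of an empirical normal quantile transform is given below; the tail treatment of Coccia and Todini (2011) used in the paper is not reproduced, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def nqt_fit(sample):
    """Fit an empirical normal quantile transform (NQT) on a training
    sample (e.g. past observed or simulated discharge)."""
    s = np.sort(np.asarray(sample, dtype=float))
    # plotting positions mapped to standard Gaussian quantiles
    p = (np.arange(1, len(s) + 1) - 0.5) / len(s)
    return s, norm.ppf(p)

def nqt_forward(x, s, z):
    """Transform discharge values to the Gaussian domain by piecewise
    linear interpolation of the fitted quantile mapping (values beyond the
    training range would need the tail handling of Coccia and Todini 2011,
    not shown here)."""
    return np.interp(x, s, z)

def nqt_backward(x_tilde, s, z):
    """Back-transform Gaussian-domain values to discharge."""
    return np.interp(x_tilde, z, s)
```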
This study follows the BMA approach proposed by Fraley et al. (2010) for systems with exchangeable members, with the weight (w_{m,t,i}), linear bias correction (with parameters a_{m,t,i} and b_{m,t,i}) and nugget variance (\sigma^2_{m,t,i}) being identical for each ensemble member within a given forecast product. The resulting cumulative forecast distribution in the transformed space is then a weighted combination of standard Gaussian cumulative distributions, specifically:

Pr(\tilde{Y}_{t+i} < \tilde{y}) = \sum_{m} w_{m,t,i} \sum_{j=1}^{N_m} \Phi\left( \frac{\tilde{y} - a_{m,t,i} - b_{m,t,i}\,\tilde{f}_{m,j,t,i}}{\sigma_{m,t,i}} \right)
As indicated by the origin and lead-time subscripts, the BMA parameters were estimated for each forecast origin and lead time. Estimation proceeds by first fitting the linear correction using least squares, before estimating the weight and variance terms by maximum likelihood (Raftery et al. 2005). A moving window of 30 days of data before the initialisation of the forecasts, similar to the 30-day correction, was used for the estimation to mimic operational practice.
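The sketch below evaluates the BMA predictive CDF of the equation above, assuming the parameters have already been estimated from the 30-day training window (the least-squares and maximum-likelihood estimation itself is not shown); the data layout is illustrative.

```python
import numpy as np
from scipy.stats import norm

def bma_cdf(y_tilde, members, weights, a, b, sigma):
    """Evaluate the BMA predictive CDF in the transformed (Gaussian) space.

    `members` is a list of arrays of transformed member forecasts, one
    entry per forecast product; `weights`, `a`, `b`, `sigma` hold the
    per-product (member-identical) BMA parameters w, a, b and sigma."""
    total = 0.0
    for f_m, w, a_m, b_m, s_m in zip(members, weights, a, b, sigma):
        total += w * norm.cdf((y_tilde - a_m - b_m * f_m) / s_m).sum()
    return total
```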
As the initial conditions were expected to play an important role, a further forecast was introduced in the context of the BMA analysis. The deterministic persistence forecast is, throughout the 10-day forecast range, the most recent observation available at the time of issue; that is,

f_{m,1,t,i} = y_t

This persistence forecast was also used as a simple reference against which to compare our forecasts.
To aid comparison with the naïve combination strategy, a similar-sized ensemble of forecasts was generated from the BMA combination by applying ensemble copula coupling (Schefzik et al. 2013) to a sample generated by taking equally spaced quantiles from the forecast distribution and reversing the transformation.
3.3.3 Verification statistics
The forecast distributions were evaluated using the continuous ranked probability score (CRPS; Candille and Talagrand 2005). The CRPS evaluates the overall skill of ensemble prediction systems by measuring a distance between the predicted and the observed cumulative distribution functions of scalar variables. For a set of dates t = 1, ..., N_t with observations and forecasts issued at the same lead time, the CRPS can be defined as:

CRPS = \frac{1}{N_t} \sum_{t=1}^{N_t} \int_{-\infty}^{\infty} \left( Pr(Y_{t+i} < y) - \delta(y \geq y_t) \right)^2 dy

The CRPS has a perfect score of 0 and has the advantage of reducing to the mean absolute error for deterministic forecasts, thus providing a simple way of comparing different types of systems. In this study the method of Hersbach (2000) for computing the CRPS from samples was used. The global CRPS scores reported for each lead time were produced by pooling the samples from all stations before computing the scores.
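As an illustration, the sketch below computes the CRPS of an ensemble forecast against a single observation using the kernel form E|X - y| - 0.5 E|X - X'|, which is mathematically equivalent to the integral definition above when the forecast CDF is the empirical ensemble step function; the paper itself uses the Hersbach (2000) sample formulation.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast for one observation, via the kernel
    (energy) form of the integral definition."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - obs))                         # E|X - y|
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))   # 0.5 E|X - X'|
    return term1 - term2

# Pooled global scores average the per-sample CRPS over all stations/dates.
```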
As the CRPS has the unit of the physical quantity (e.g. m³/s for discharge), comparing scores can be problematic and is only meaningful if scores based on two homogeneous samples are compared. For example, different geographical areas or different seasons cannot really be compared. In this study we ensured that for any comparison of forecast models and post-processed versions the samples were homogeneous: we considered the same days of the verification period at each station, and the same stations in the global analysis, producing equal sample sizes across all compared products.
To help compare results across different stations and areas, we used the CRPS-based skill score (CRPSS), with the observed discharge climate as the reference system in our verification. We produced the daily observed climate for the 30-year period 1981-2010 by pooling observations from a 31-day window centred on each day. The observed climate was produced for stations with at least 10 years of data available in total (310 values) for every day of the year.
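A minimal sketch of this pooled daily climatology is shown below; the handling of leap days and the array names are illustrative, while the 31-day window and the 310-value (10-year) threshold are taken from the text.

```python
import numpy as np

def daily_observed_climate(doy, obs, min_values=310):
    """Pool observations from a 31-day window centred on each day of year.

    `doy` and `obs` are aligned arrays of day-of-year (1..366) and observed
    discharge for one station over 1981-2010.  Returns a dict mapping each
    day of year to the pooled sample (or None if fewer than `min_values`
    values are available)."""
    clim = {}
    for d in range(1, 367):
        # circular distance in days so the window wraps around the year end
        dist = np.minimum(np.abs(doy - d), 366 - np.abs(doy - d))
        sample = obs[dist <= 15]                  # +/- 15 days = 31-day window
        clim[d] = sample if sample.size >= min_values else None
    return clim
```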
The historical discharge experiments were compared using the mean absolute error (MAE) based skill score with the observed daily discharge climate as reference,

MAESS = 1 - \frac{MAE}{MAE_{\mathrm{obsclim}}}

and the sample Pearson correlation coefficient,

CORR = \frac{\sum_{t=1}^{N_t} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{N_t} (x_t - \bar{x})^2}\,\sqrt{\sum_{t=1}^{N_t} (y_t - \bar{y})^2}}

The MAE reflects the ability of the systems to match the actual observed discharge, while the correlation highlights how well the temporal behaviour of the historical forecast time series matches that of the observation time series.
4 Results
First we present the findings of the sensitivity experiments, carried out using the ECMWF ensemble control forecast, on the impact of coupling HTESSEL with the TIGGE meteorological input. Then we compare the quality of the historical discharge produced from the ERAI, ERAI-Land and MERRA-Land reanalysis datasets and the impact of their combination. Finally, from the large number of forecast products described in Sections 3.2 and 3.3, we present results that aid interpretation of the discharge forecast skill and errors, with a focus on the potential multi-model improvements:

- The four uncorrected TIGGE forecasts with ERAI-Land initialisation.
- The MM combination of the four uncorrected models with the ERAI, ERAI-Land and MERRA-Land initialisations, and the grand ensemble of these three MM combinations (called GMM hereafter).
- The GMM combinations of the 30-day corrected, the initial-time corrected, and the combined initial-time and 30-day corrected MM forecasts (the forecasts are first initial-time corrected and the 30-day correction is then applied to these).
- Finally, the GMM of the BMA-combined MM forecasts (from all three initialisations) with the uncorrected models, with the initial-time corrected models, and with the uncorrected models extended by the persistence as a separate single-value model.
4.1 Discharge sensitivity to meteorological forcing
The impact of replacing HTESSEL forcing variables other than precipitation (the combination of the Rad, Wind and THP tests) with ERAI-Land is rather small (~3% by T+240h, brown curve in Figure 3). The least influential is the wind (red curve), while the biggest contribution comes from THP (green curve). When all ensemble forcing is replaced, including precipitation, the impact jumps to ~15% by T+240h, showing that the large majority of the change in the discharge comes from differences in precipitation (not shown).
The analysis of different areas and periods (see Table 2) highlights that larger impacts are seen in the winter period, when the contribution of precipitation decreases and the contribution of the other forcing variables, both individually and combined, increases roughly two- to fivefold (this is particularly noticeable for THP). This is most likely a consequence of snow-related processes, with snowmelt being dependent on temperature, radiation and also wind in the cold seasons. It also implies that the results are dependent on seasonality, a result also found by Liu et al. (2013), who looked at the skill of post-processed precipitation forecasts using TIGGE data for the Huai river basin in China. In this study, due to the relatively short period available for the forecast verification, scores were only computed for the whole verification period and no seasonal differences were analysed.
Regarding the change from LML to surface forcing for temperature (2 m), wind (10 m) and humidity (2 m), the potential impact can be substantial, as shown by an example for 1 January 2012 in Fig. 4. In such cold winter conditions, large erroneous surface runoff values could appear in some parts of Russia when switching to surface forcing in HTESSEL. The representation of dew deposition is a general feature of HTESSEL which can be amplified in stand-alone mode. When coupled to the atmosphere, the deposition is limited in time, as it leads to a decrease of atmospheric humidity. However, in stand-alone mode, since the atmospheric conditions are prescribed, large deposition rates can be generated when the atmospheric forcing is not in balance (e.g. after model grid interpolation, or when changing from LML to surface forcing).
This demonstrates that with a land-surface model such as HTESSEL, particular care needs to be taken in the design of the experiments when model imbalances are expected. The use of surface data was an acceptable compromise, as the sensitivity experiments highlighted only a small impact caused by the switch from LML to surface forcing (dashed black line in Fig. 3), and similarly for the impact of the Rad test, confirming that the necessary changes in the TIGGE land-surface model setup did not have a major impact on the TIGGE discharge.
4.2 Reanalysis impact on discharge
The quality of the historical river flow that provides the initial conditions for the CaMa-Flood TIGGE routing is expected to have a significant impact on the forecast skill. The discharge performance is highlighted in Figure 5, which shows the mean absolute error skill score (MAESS) and the sample Pearson correlation (CORR) for the ERAI, ERAI-Land and MERRA-Land simulated historical discharge from 1981 to 2010, and for their equal-weight multi-model average (MMA). The results are provided as continental and global averages of the available stations for Europe (~150 stations), North America (~350 stations), South America (~150 stations), Africa (~80 stations), Asia (mainly Russia, ~60 stations) and Australia & Indonesia (~50 stations), making ~840 stations globally.
The general quality of these global simulations is quite low. The MAESS averages over the available stations (see Figure 1) are below 0 for all continents, i.e. the large-scale average performance is worse than the daily observed climatology. The models are closest to the observed climate performance over Europe and Australia & Indonesia. The correlation between the simulated and observed time series shows a slightly more mixed picture; in some cases, especially Europe and Australia & Indonesia, the model is better than the observed climate. It is interesting to note that although the observed climate produces a high forecast time series correlation in Asia, the reanalysis discharge scores very low there for all three sources. This could be related to the problematic handling of snow in that area.
Figure 2 showed an example where MERRA-Land displayed a very strong negative bias. That example highlights the large variability amongst these data sources and is not an indication of the overall quality. Although MERRA-Land shows a generally negative bias (not shown), the overall quality of the three reanalysis-driven historical discharge datasets is rather comparable. The highest skill and correlation is generally shown by ERAI-Land for most of the regions, with the exception of Africa and Australia, where MERRA-Land is superior. ERAI, as the oldest dataset, appears to be the least skilful. Reichle et al. (2011) found the same relationship between MERRA-Land and ERAI using 18 catchments in the US. Although they computed correlations between seasonal anomaly time series (rather than the actual time series evaluated here), they showed that the runoff estimates had higher anomaly time series correlations in MERRA-Land than in ERAI.
The multi-model average of the three simulations is clearly superior in the global and continental averages, with very few exceptions where the MMA scores are marginally lower than those of the best individual reanalysis. The MMA is able to improve on the best of the three individual datasets at about half of the stations globally, in both the MAESS and CORR scores. Figure 6 shows the improvements in correlation. The points where the combination of the three reanalyses helps to improve on the best model cluster mainly over Europe, Amazonia and the eastern US. On the other hand, the northern hemispheric winter areas seem to show mainly deterioration. This again is most likely related to the difficulty with snow-related processes, which can hinder the success of the combination if, for example, one model is significantly worse, with larger biases, than the other two. Further analysis could help identify these more detailed error characteristics, providing a basis for further potential improvements.
4.3 Improving the forecast distribution
Figure 7 displays example hydrographs of some of the analysed forecast products for a single forecast run, to provide a practical impression of our experiments. The forecasts from 18 April 2009 are plotted for the GRDC station of Lobith in the Netherlands. The thin solid coloured lines are the four TIGGE models (ECMWF, UKMO, NCEP and CMA) plotted together (MM) with ERAI-Land (red), ERAI (green) and MERRA-Land (blue) initialisations. They start from very different levels, which are quite far from the observation (thick black line), but then seem to converge to roughly the same range in this example. The ensemble means of the initial-error corrected MMs (from the three initialisations, dashed lines), which by definition start from the observed discharge at T+0h, then faithfully follow the pattern of the means of the respective MMs. The 30-day corrected forecasts (dash-dotted lines) follow a pattern relative to the MM ensemble means set by the performance of the last 30 days. The combination of the two bias correction methods (dotted lines) blends the characteristics of the two: all three versions start from the observation (as the initial error is removed first), and then follow the pattern set by the past 30-day performance of this initial-time corrected forecast. Finally, the BMA-transformed (uncorrected) MMs (thin grey lines) happen to be closest to the observations in this example, showing a rather uniform spread throughout the processed T+24h to T+240h range.
The quality of the TIGGE discharge forecasts over the verified August 2008 to May 2010 period is strongly dependent on the historical discharge used to initialise them. Figure 5 highlighted that the daily observed discharge climate is a better predictor than any of the three historical reanalysis-driven discharges (MAESS < 0). It is therefore not surprising that the uncorrected TIGGE forecasts show similarly low relative skill based on the CRPS (Figure 8). Figure 8 also shows the performance of the four models (dashed grey lines). As in this study we concentrate on the added value of the multi-model combination, we do not distinguish between the four raw models. The scores change very little over the 10-day forecast period, showing a marginal increase in CRPSS as lead time increases. This is indicative of the incorrect initialisation, with the forecast outputs becoming less dependent on the initialisation further into the medium range, and slowly converging towards climatology.
The first stage of the multi-model combination is the red line in Figure 8, the combination of the four uncorrected models with the same ERAI-Land initialisation. On the basis of this verification period and global station list, the simple equal-weight combination of the ensembles does not really seem to be able to improve on the best model. However, we have to acknowledge that the performance in general is very low.
The other area where we expect improvements through the multi-model approach is the initialisation. Figure 8 highlights a significant improvement when using three historical discharge initialisations instead of only one. The quality of the ERAI-Land (red), ERAI (green) and MERRA-Land (blue) initialised forecasts (shown here only as the multi-model combination versions) is comparable, with ERAI-Land slightly ahead, in agreement with the results of the direct historical discharge comparisons presented in Section 4.2. However, the grand combination of the three improves significantly (orange line) on all of them. The improvement is much larger at shorter lead times, where the TIGGE meteorological inputs provide lower spread, and therefore the spread introduced by the different initialisations is able to have a bigger impact.
The quality of the discharge forecasts could thus be improved noticeably by introducing different initial conditions. However, the CRPSS is still significantly below zero, pointing to the need for post-processing. In this study we have experimented with a few methods which proved to be beneficial.
The 30-day correction removes the mean bias of the most recent 30 runs from the forecasts. Figure 8 shows the grand combination of the 30-day bias corrected multi-models (with all three initialisations), which brings the CRPSS to almost 0 throughout the 10-day forecast range (dashed burgundy line in Figure 8). This confirms that the forecasts are severely biased. In addition, the shape of the curve remains fairly horizontal, suggesting this correction is not making best use of the temporal patterns in the bias.
Further significant improvements in CRPSS are gained at shorter forecast ranges by using the initial-time correction (dashed purple line with markers in Figure 8), which does make use of temporal patterns in the bias. The shape of this error curve shows a typical pattern, with the CRPSS decreasing with forecast range, reflecting the decreasing impact of the initial-time correction and the increased uncertainty in the forecast. The impact of the initial-time errors gradually decreases until it finally disappears by around day 5-6, when the 30-day correction becomes superior.
The combination of the two methods, applying the 30-day bias correction to forecasts already adjusted by the initial-time correction, blends the advantages of both corrections. The CRPSS is further improved, mainly in the middle of the 10-day forecast period, with the gain disappearing by T+240h (solid burgundy line with markers in Figure 8).
The fact that the performance of the 30-day correction is worse in the short range than that of the initial-time correction highlights that the impact of the errors at initial time has a structural component that cannot be explained by the temporally averaged bias. Similarly, the initial-time correction cannot account exclusively for the large biases in the forecasts, as its impact trails off relatively quickly.
The persistence forecast shows a distinct advantage over these post-processed forecasts (dashed light blue line with circles in Figure 8). It has positive skill up to T+144h, and the advantage of the persisted observation as a forecast diminishes so that by T+240h its skill is similar to that of the combined corrected forecasts. This further highlights that utilising the discharge observations in the forecast production promises a really significant improvement.
It is suggested that the structure of the initial errors has two main components: (i) biases in the reanalysis initialisations due to biases in the forcing (e.g. precipitation) and in the simulations (e.g. evapotranspiration), and (ii) biases introduced by timing errors in the routing model due, in part, to the lack of optimized model parameters. A further evaluation of the weight of each of these error sources is beyond the scope of this study.
The final post-processing method trialled is the BMA. In Figure 8, similarly to the other post-processed products, only the grand combination of the three BMA-transformed MMs with the different initialisations is displayed. The BMA of the uncorrected forecasts was able to further increase the CRPSS markedly across all forecast ranges except T+24h (black line without markers). The results for T+24h suggest that at this lead time the perfect initial-error correction from T+0h is still superior.
The other two BMA versions, one with the uncorrected forecasts extended by the persistence as a predictor (black line with circles), and one with the initial-time corrected forecasts (not shown), both provide further skill improvements. The version with the persistence performs better overall, especially in the first few days. The BMA incorporating the persistence forecast remains skilful up to T+168h, the longest lead time of any of the forecast methods tested. At longer lead times (days 8-10) the BMA of the uncorrected model forecasts appears to provide the highest skill of all the post-processed products. This is evidence that the training of the BMA is not optimal. In part this is due to the estimation methodology used. More significantly, experiments (not reported) show that the optimal training window for the BMA varies across sites, showing a different picture for the BMA with or without persistence, and also delivering potentially higher global average skill when using a longer window.
Although Figure 8 shows only the impact of the four post-processing methods on the grand combination of the MM forecasts, the individual MMs with the three initialisations show the same behaviour. The GMMs always outperform the three MMs for all the post-processing products; e.g. for the most skilful method, the BMA, the grand combination extends the positive skill by ~1 day (from around 5 days to 6 days, not shown).
The distribution of the skill increments over all stations provided by the different combination and post-processing products is summarised in Figure 9 at T+24h (Figure 9a) and T+240h (Figure 9b). The reference skill is the average CRPSS of the four TIGGE models with the ERAI-Land initialisation (these values are represented by the dashed grey lines in Figure 8). Figure 9 highlights the structure of the improvements in different ranges of the CRPSS for the different methods over all verified stations in the August 2008 to May 2010 period. The picture is characteristically different at different lead times, as suggested by the T+24h and T+240h plots. At short range the improvements of the different products scale nicely into separate bands. The relatively simple MM combination of the four models with ERAI-Land (red circles) does not improve on the forecast; the increments are small and of mixed sign. The GMM combination of the three uncorrected MMs (green triangles) shows a marked improvement, the 30-day correction version (orange triangles) improves further, while the initial-time correction products (cyan squares and purple stars) show the largest improvement over most of the stations. At this short T+24h range the BMA (blue stars) of the uncorrected forecasts is slightly behind, which is a general feature across the displayed -5 to 1 CRPSS range.
In contrast to the short range, T+240h provides a significantly different picture. The relatively clear ranking of the products is gone by this lead time. The MM and GMM combinations are able to improve slightly for most of the stations, and at this range their contribution seems to be generally positive. The post-processing methods at this medium range, however, sometimes deteriorate the forecasts, especially in the -1 to 0.5 range (the 30-day correction seems to behave noticeably better in this respect). The general improvements are nevertheless clear for most of the stations, and the overall ranking of the methods seen in Figure 8 is also reflected, although much less clearly than at T+24h, with the BMA topping the list at T+240h.
Finally, Figure 10 presents the discharge performance that could be achieved in this study for all the stations that could be processed in the August 2008 to May 2010 period at T+240h. It displays the CRPSS of the best overall product, the GMM with the BMA of the uncorrected forecasts (the combination of the three BMA-transformed MMs with the three initialisations, without initial-time or 30-day bias correction). The variability of the scores is very large geographically, but there are emerging patterns. Higher performance is observed in the Amazon and the central and western parts of the USA, while lower CRPSS values are seen over the Rocky Mountains in North America and at northerly points in Europe and Russia. Unfortunately the geographical coverage of the stations is not good enough to draw more detailed conclusions.
5 Conclusions

This study has shown aspects of building a global multi-model hydro-meteorological forecasting system using the TIGGE archive, and has analysed the impact on the forecasts of the post-processing required to run a multi-model system.
718 719
The atmospheric input was taken from four operational global meteorological ensemble
720
systems, using data available from TIGGE. The hydrological component of this study was the
721
HTESSEL land-surface model while the CaMa‐Flood global river routing model was used to
722
integrate runoff over the river network. Observations from the GRDC discharge archive were
723
used for evaluation and post-processing.
We have shown that the TIGGE archive is a valuable resource for river discharge forecasting, and three main objectives were successfully addressed: (i) the sensitivity of the forecasting system to the meteorological input variables, (ii) the potential improvements to the historical discharge dataset (which provides the initial river conditions for the forecast routing) and (iii) the improvement of the predictive distribution of the forecasts. The main outcomes can be grouped as follows:
(i) The impact of replacing or altering the input meteorological variables to fit the system requirements is small, which allows the use of variables from the TIGGE archive for this hydrological study.
(ii) The multi-model average historical discharge dataset provides a valuable source of uncertainty information and a general gain in skill (a minimal sketch of this equal-weight averaging is given after this list).
(iii) Significant improvements in the forecast distribution can be produced through the use of initial-time and 30-day bias corrections on the TIGGE model discharge, or on the combination of the forecast models; however, the combination of techniques used has a large impact on the improvement observed, with the best BMA products providing positive skill up to 6 days.
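The equal-weight multi-model averaging of the reanalysis-driven historical discharges referred to in (ii) can be sketched as follows; this is a minimal illustration with assumed variable names rather than the exact implementation used in the study.

import numpy as np

def multi_model_average(*discharge_series):
    # Equal-weight average of several reanalysis-driven historical discharge
    # series (e.g. ERAI-Land, ERAI and MERRA-Land), all given on the same
    # daily time axis for one station.
    stacked = np.vstack([np.asarray(q, dtype=float) for q in discharge_series])
    return stacked.mean(axis=0)

# Illustrative use with assumed input arrays:
# q_mma = multi_model_average(q_erai_land, q_erai, q_merra_land)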
The quality of the raw TIGGE-based discharge forecasts has been shown to be low, which is mainly determined by the limited performance of the reanalysis-driven historical river conditions analysed in Section 4.2. This low skill is in agreement with results found in other studies. For example, Alfieri et al. (2013) showed that, in the context of GloFAS, the Lisflood hydrological model (Van Der Knijff et al. 2010), forced by ERAI-Land runoff, displays variable performance over the 1990-2010 historical period. Across the 620 global observing stations analysed, the Pearson correlation coefficient reaches values as low as -0.2, and only 71% of the stations give correlation values above 0.5. Donnelly et al. (2015) highlighted similar behaviour for the E-HYPE system, based on 181 river gauges in Europe for 1981-2000. The correlation component of the Kling-Gupta efficiency started around 0, and the geographical distribution of values in Europe was very similar to our results (not shown). The lowest correlations were found mainly in Spain and in Scandinavia, with an average value comparable to our European mean of 0.6-0.7 (see our Figure 5).
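For readers unfamiliar with the score, the Kling-Gupta efficiency mentioned above is commonly written as a distance of three components from their ideal values, of which the correlation r is the component quoted here; the standard form below is general background and is not taken from Donnelly et al. (2015):

\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}

where r is the linear correlation between simulated and observed discharge, alpha is the ratio of their standard deviations and beta is the ratio of their means.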
The combination and post-processing methods we applied to the discharge forecasts provided a significant improvement in skill. Although the simple multi-model combinations and the 30-day bias correction (removing the mean error of the most recent 30 days) both provide significant improvements, they are not capable of achieving positive global skill (i.e. of outperforming the daily observed discharge climate). The initial-time correction, which adjusts the forecast to the observations at initial time and applies this error correction throughout the forecast, is able to provide skill in the short range (only up to 2-3 days), especially when combined with the 30-day correction. However, its impact quickly wears off, and at longer lead times (up to about 6 days) only the BMA post-processing method is able to provide positive average global skill (closely followed by the persistence forecast).
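A minimal sketch of these two simple corrections for a single-station ensemble discharge forecast is given below. The function names, the array shapes and the constant propagation of the initial-time error along the forecast are illustrative assumptions; the paper's exact implementation (e.g. its handling of missing observations and of any lead-time dependence) is not reproduced.

import numpy as np

def thirty_day_correction(fc, past_fc, past_obs):
    # Subtract the mean error of the most recent 30 days from the forecast.
    # fc:       (n_members, n_leads) raw discharge forecast for one station
    # past_fc:  (30,) forecast values valid over the last 30 days (assumed
    #           here to be taken at a matching lead time)
    # past_obs: (30,) corresponding observed discharges
    bias = np.nanmean(np.asarray(past_fc, dtype=float) - np.asarray(past_obs, dtype=float))
    return np.clip(fc - bias, 0.0, None)   # keep discharge non-negative

def initial_time_correction(fc, obs_t0):
    # Shift the forecast by the error diagnosed at initial time; a constant
    # shift over all lead times is assumed in this sketch.
    # fc:     (n_members, n_leads) forecast, with lead 0 in the first column
    # obs_t0: observed discharge at the forecast initial time
    err0 = fc[:, 0].mean() - obs_t0
    return np.clip(fc - err0, 0.0, None)

In the combined product the two corrections are applied together; the order assumed here (30-day bias removal followed by the initial-time shift) is also an assumption of the sketch.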
Although other studies have shown significant improvement from using multiple meteorological inputs (e.g. Pappenberger et al. 2008), in this study the impact of combining different TIGGE ensemble models is rather small. This is most likely a consequence of the overwhelming influence of the historical river conditions on the river initialisation. The grand combinations, in which we combine the forecasts produced with the different reanalysis-driven historical river conditions, however, always outperform the individual MMs (single initialisation) for all the post-processing products. They provide a noticeable overall skill improvement, which in our study translated into an extension of the lead time at which the CRPSS drops below 0 by about one day, as a global average, for the most skilful BMA forecasts.
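In their simplest uncorrected form, the MM and GMM combinations discussed above amount to pooling ensemble members, as in the minimal sketch below; equal weighting of all members is assumed, since the member weighting is not detailed here, and the variable names are illustrative.

import numpy as np

def pool_multi_model(*model_ensembles):
    # Multi-model (MM) combination: pool the members of several ensemble
    # discharge forecasts that share one set of initial river conditions.
    # Each input has shape (n_members_i, n_leads).
    return np.concatenate(model_ensembles, axis=0)

def pool_grand_multi_model(*mm_ensembles):
    # Grand multi-model (GMM) combination: pool MM forecasts produced from
    # different reanalysis-driven initial river conditions
    # (e.g. ERAI-Land, ERAI and MERRA-Land).
    return np.concatenate(mm_ensembles, axis=0)

# Illustrative use with assumed member-by-lead-time arrays:
# mm_erai_land = pool_multi_model(ecmwf_ens, ukmo_ens, ncep_ens, cma_ens)
# gmm = pool_grand_multi_model(mm_erai_land, mm_erai, mm_merra_land)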
In the future we plan to extend this study to address other aspects of building a skilful multi-model hydro-meteorological system. The following areas are considered:
(i) Include other data sets that provide global coverage of runoff at a sufficiently high horizontal resolution, such as the Japanese 55-year Reanalysis (JRA-55, Kobayashi et al. 2015) or the NCEP Climate Forecast System Reanalysis (CFSR, Saha et al. 2010), to further improve the initial river condition estimates.
(ii) Introduce a multi-hydrology aspect by adding an additional land-surface model such as JULES (the Joint UK Land Environment Simulator, Best et al. 2011).
(iii) Utilise the information in the discharge observations within the modelling itself; the scores presented in this study are relatively low even with the post-processing methods applied, and significantly higher overall scores would require this information.
(iv) Similarly, the discharge quality could be significantly improved by better calibration of many of the watersheds in the CaMa-Flood routing.
(v) Apply different river routing schemes, such as Lisflood, which is currently used in GloFAS; their multi-model use would also offer a potential increase in skill.
(vi) Analyse the errors further and trial other post-processing methods; in particular, better allowance should be made for temporal correlation in the forecast errors. The use of the Extreme Forecast Index (Zsoter 2006) as a tool to compare the forecasts with the model climate could also bring added skill to the flood predictions.
6 Acknowledgements

We are thankful to the European Commission for funding this study through the GEOWOW project in the 7th Framework Programme for Research and Technological Development (FP7/2007-2013) under Grant Agreement no. 282915. We are also grateful to the Global Runoff Data Centre, 56068 Koblenz, Germany, for providing the discharge observation dataset for our discharge forecast analysis.

7 References
Agustí-Panareda, A., G. Balsamo, and A. Beljaars, 2010: Impact of improved soil moisture on the ECMWF precipitation forecast in West Africa. Geophys. Res. Lett., 37, L20808, doi:10.1029/2010GL044748.

Ajami, N. K., Q. Duan, and S. Sorooshian, 2007: An integrated hydrologic Bayesian multimodel combination framework: Confronting input, parameter, and model structural uncertainty in hydrologic prediction. Water Resour. Res., 43, W01403, doi:10.1029/2005WR004745.

Alfieri, L., P. Burek, E. Dutra, B. Krzeminski, D. Muraro, J. Thielen, and F. Pappenberger, 2013: GloFAS - global ensemble streamflow forecasting and flood early warning. Hydrol. Earth Syst. Sci., 17, 1161–1175, doi:10.5194/hess-17-1161-2013.

Balsamo, G., A. Beljaars, K. Scipal, P. Viterbo, B. van den Hurk, M. Hirschi, and A. K. Betts, 2009: A Revised Hydrology for the ECMWF Model: Verification from Field Site to Terrestrial Water Storage and Impact in the Integrated Forecast System. J. Hydrometeor., 10, 623–643, doi:10.1175/2008JHM1068.1.

Balsamo, G., S. Boussetta, P. Lopez, and L. Ferranti, 2010: Evaluation of ERA-Interim and ERA-Interim-GPCP-rescaled precipitation over the U.S.A. ERA Report Series 01/2010, 10 pp., European Centre for Medium-Range Weather Forecasts, Reading, UK.

Balsamo, G., F. Pappenberger, E. Dutra, P. Viterbo, and B. van den Hurk, 2011: A revised land hydrology in the ECMWF model: a step towards daily water flux prediction in a fully closed water cycle. Hydrol. Proc., 25, 1046–1054, doi:10.1002/hyp.7808.

Balsamo, G., and Coauthors, 2015: ERA-Interim/Land: a global land surface reanalysis data set. Hydrol. Earth Syst. Sci., 19, 389–407, doi:10.5194/hess-19-389-2015.

Bao, H. H., and O. Zhao, 2012: Development and application of an atmospheric-hydrologic-hydraulic flood forecasting model driven by TIGGE ensemble forecasts. Acta Meteorologica Sinica, 26, 93–102.

Best, M. J., and Coauthors, 2011: The Joint UK Land Environment Simulator (JULES), model description – Part 1: Energy and water fluxes. Geosci. Model Dev., 4, 677–699, doi:10.5194/gmd-4-677-2011.

Bougeault, P., and Coauthors, 2010: The THORPEX Interactive Grand Global Ensemble. Bull. Amer. Meteor. Soc., 91, 1059–1072.

Candille, G., and O. Talagrand, 2005: Evaluation of probabilistic prediction systems for a scalar variable. Q. J. R. Meteorol. Soc., 131, 2131–2150, doi:10.1256/qj.04.71.

Cane, D., S. Ghigo, D. Rabuffetti, and M. Milelli, 2013: The THORPEX Interactive Grand Global Ensemble: The case study of the May 2008 flood in western Piemonte, Italy. Nat. Hazards Earth Syst. Sci., 13, 211–220.

Cloke, H. L., and F. Pappenberger, 2009: Ensemble flood forecasting: A review. J. Hydrol., 375, 613–626, doi:10.1016/j.jhydrol.2009.06.005.

Coccia, G., and E. Todini, 2011: Recent developments in predictive uncertainty assessment based on the model conditional processor approach. Hydrol. Earth Syst. Sci., 15, 3253–3274, doi:10.5194/hess-15-3253-2011.

Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Q. J. R. Meteorol. Soc., 137, 553–597, doi:10.1002/qj.828.

Demargne, J., and Coauthors, 2014: The science of NOAA's Operational Hydrologic Ensemble Forecast Service. Bull. Amer. Meteor. Soc., 95, 79–98, doi:10.1175/BAMS-D-12-00081.1.

Dong, L., L. Xiong, and K.-x. Yu, 2013: Uncertainty Analysis of Multiple Hydrologic Models Using the Bayesian Model Averaging Method. J. Appl. Math., 2013, 11 pp., doi:10.1155/2013/346045.

Donnelly, C., J. C. M. Andersson, and B. Arheimer, 2015: Using flow signatures and catchment similarities to evaluate the E-HYPE multi-basin model across Europe. Hydrological Sciences Journal, doi:10.1080/02626667.2015.1027710.

Dutra, E., C. Schär, P. Viterbo, and P. M. A. Miranda, 2011: Land-atmosphere coupling associated with snow cover. Geophys. Res. Lett., 38, L15707, doi:10.1029/2011GL048435.

Emerton, R. E., E. M. Stephens, F. Pappenberger, T. C. Pagano, A. H. Weerts, A. W. Wood, P. Salamon, J. D. Brown, N. Hjerdt, C. Donnelly, C. A. Baugh, and H. L. Cloke, 2016: Continental and Global Scale Flood Forecasting Systems. WIREs Water, doi:10.1002/wat2.1137.

Fraley, C., A. E. Raftery, and T. Gneiting, 2010: Calibrating Multimodel Forecast Ensembles with Exchangeable and Missing Members Using Bayesian Model Averaging. Mon. Wea. Rev., 138, 190–202, doi:10.1175/2009MWR3046.1.

Gneiting, T., and M. Katzfuss, 2014: Probabilistic Forecasting. Annual Review of Statistics and Its Application, 1, 125–151.

Haddeland, I., and Coauthors, 2011: Multimodel Estimate of the Global Terrestrial Water Balance: Setup and First Results. J. Hydrometeor., 12, 869–884, doi:10.1175/2011JHM1324.1.

Hagedorn, R., R. Buizza, T. M. Hamill, M. Leutbecher, and T. N. Palmer, 2012: Comparing TIGGE multimodel forecasts with reforecast-calibrated ECMWF ensemble forecasts. Q. J. R. Meteorol. Soc., 138, 1814–1827, doi:10.1002/qj.1895.

He, Y., F. Wetterhall, H. L. Cloke, F. Pappenberger, M. Wilson, J. Freer, and G. McGregor, 2009: Tracking the uncertainty in flood alerts driven by grand ensemble weather predictions. Meteor. Appl., 16, 91–101, doi:10.1002/met.132.

He, Y., F. Wetterhall, H. Bao, H. L. Cloke, Z. Li, F. Pappenberger, Y. Hu, D. Manful, and Y. Huang, 2010: Ensemble forecasting using TIGGE for the July-September 2008 floods in the Upper Huai catchment: a case study. Atmos. Sci. Lett., 11, 132–138, doi:10.1002/asl.270.

Hersbach, H., 2000: Decomposition of the Continuous Ranked Probability Score for Ensemble Prediction Systems. Wea. Forecasting, 15, 559–570, doi:10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.

Huffman, G. J., R. F. Adler, D. T. Bolvin, and G. Gu, 2009: Improving the Global Precipitation Record: GPCP Version 2.1. Geophys. Res. Lett., 36, doi:10.1029/2009GL040000.

Kobayashi, S., and Coauthors, 2015: The JRA-55 Reanalysis: General Specifications and Basic Characteristics. J. Meteor. Soc. Japan, 93, 5–48, doi:10.2151/jmsj.2015-001.

Krishnamurti, T. N., C. M. Kishtawal, T. E. LaRow, D. R. Bachiochi, Z. Zhang, C. E. Williford, S. Gadgil, and S. Surendran, 1999: Improved weather and seasonal climate forecasts from multimodel superensemble. Science, 285, 1548–1550.

Krzysztofowicz, R., 1997: Transformation and normalization of variates with specified distributions. J. Hydrol., 197, 286–292.

Liang, Z., D. Wang, Y. Guo, Y. Zhang, and R. Dai, 2013: Application of Bayesian Model Averaging Approach to Multimodel Ensemble Hydrologic Forecasting. J. Hydrol. Eng., 18, 1426–1436.

Liu, Y., Q. Duan, L. Zhao, A. Ye, Y. Tao, C. Miao, X. Mu, and J. C. Schaake, 2013: Evaluating the predictive skill of post-processed NCEP GFS ensemble precipitation forecasts in China's Huai river basin. Hydrol. Processes, 27, 57–74, doi:10.1002/hyp.9496.

Olsson, J., and G. Lindström, 2008: Evaluation and calibration of operational hydrological ensemble forecasts in Sweden. J. Hydrol., 350, 14–24, doi:10.1016/j.jhydrol.2007.11.010.

Pappenberger, F., J. Bartholmes, J. Thielen, H. L. Cloke, R. Buizza, and A. de Roo, 2008: New dimensions in early flood warning across the globe using grand-ensemble weather predictions. Geophys. Res. Lett., 35, L10404, doi:10.1029/2008GL033837.

Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian Model Averaging to Calibrate Forecast Ensembles. Mon. Wea. Rev., 133, 1155–1174, doi:10.1175/MWR2906.1.

Reichle, R. H., R. D. Koster, G. J. M. De Lannoy, B. A. Forman, Q. Liu, S. P. P. Mahanama, and A. Toure, 2011: Assessment and enhancement of MERRA land surface hydrology estimates. J. Climate, 24, 6322–6338, doi:10.1175/JCLI-D-10-05033.1.

Saha, S., and Coauthors, 2010: The NCEP Climate Forecast System Reanalysis. Bull. Amer. Meteor. Soc., 91, 1015–1057, doi:10.1175/2010BAMS3001.1.

Schefzik, R., L. T. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty Quantification in Complex Simulation Models Using Ensemble Copula Coupling. Statistical Science, 28, 616–640, doi:10.1214/13-STS443.

Thielen, J., J. Bartholmes, M. H. Ramos, and A. de Roo, 2009: The European Flood Alert System - Part 1: Concept and development. Hydrol. Earth Syst. Sci., 13, 125–140.

Todini, E., 2008: A model conditional processor to assess predictive uncertainty in flood forecasting. Intern. J. River Basin Manag., 6, 123–137, doi:10.1080/15715124.2008.9635342.

Van Der Knijff, J. M., J. Younis, and A. P. J. De Roo, 2010: LISFLOOD: a GIS-based distributed model for river basin scale water balance and flood simulation. International Journal of Geographical Information Science, 24, 189–212, doi:10.1080/13658810802549154.

Velázquez, J. A., F. Anctil, M. H. Ramos, and C. Perrin, 2011: Can a multi-model approach improve hydrological ensemble forecasting? A study on 29 French catchments using 16 hydrological model structures. Adv. Geosci., 29, 33–42, doi:10.5194/adgeo-29-33-2011.

Vrugt, J. A., and B. A. Robinson, 2007: Treatment of uncertainty using ensemble methods: Comparison of sequential data assimilation and Bayesian model averaging. Water Resour. Res., 43, W01411, doi:10.1029/2005WR004838.

Yamazaki, D., S. Kanae, H. Kim, and T. Oki, 2011: A physically based description of floodplain inundation dynamics in a global river routing model. Water Resour. Res., 47, W04501, doi:10.1029/2010WR009726.

Zsoter, E., 2006: Recent developments in extreme weather forecasting. ECMWF Newsletter, 8–17, ECMWF, Reading, United Kingdom.
Figure 1 Location of the discharge observing stations that could be processed in the discharge experiments. The blue points are used in the evaluation of both the reanalysis (at least 15 years of data available in 1981-2010) and the TIGGE forecast experiment (at least about 80% of days available in August 2008 to May 2010). The yellow points provide enough observations only for the reanalysis, while the red points have enough data available only for the TIGGE forecasts.

Figure 2 Example of discharge produced by ERAI-Land (red), ERAI (green) and MERRA-Land (blue) forcing and the corresponding observations (black) for a GRDC station on the Rainy River at Manitou Rapids in the US.

Figure 3 Impact of different forcing configurations in HTESSEL on the discharge outputs as a relative change compared to the baseline (see Section 3.1). The black dashed line displays the impact of changing the lowest model level (LML) to surface forcing (2 m for temperature and humidity, 10 m for wind). The coloured lines highlight the impact of replacing different ensemble control forcing variables, either individually or in combination, with ERAI-Land data.

Figure 4 Surface runoff output of HTESSEL for the period 1-10 January 2012 (240-hour accumulation) from two ensemble control experiments, one (a) using surface forcing and one (b) using, where possible, the lowest model level forcing. In the surface forcing experiment very large erroneous surface runoff values appear in very cold winter conditions.

Figure 5 Historical discharge forecast performance for ERAI-Land, ERAI, MERRA-Land and their equal-weight multi-model average (MMA). The Mean Absolute Error Skill Score (MAESS) and the sample Pearson correlation (CORR) are provided for each continent (North America, South America, Europe, Africa, Asia and Australia & Indonesia). The reference forecast system in the skill score is the observed discharge climate used as a daily prediction. The correlation values are also provided for the observed climate. The scores are continental and global averages of the individual scores of the available stations (for the station locations see Figure 1).

Figure 6 Relative improvements in the sample Pearson correlation (CORR) obtained with the equal-weight average of the ERAI-Land, ERAI and MERRA-Land discharges. Values show the change in correlation compared with the best of ERAI-Land, ERAI and MERRA-Land. Positive values show improvement, while negative values mean lower skill in the average than in the best of the three historical discharges.

Figure 7 Example of different discharge forecast products for the GRDC station of Lobith on the Rhine river in the Netherlands. All forecasts are from the 00 UTC run on 18 April 2009, up to T+240h. The following products are plotted: multi-model combinations of four TIGGE models (ECMWF, UKMO, NCEP and CMA) with ERAI-Land (solid red lines), ERAI (solid green lines) and MERRA-Land (solid blue lines) initialisations; 30-day corrected (dash-dotted lines), initial time corrected (dashed lines) and 30-day plus initial time corrected (dotted lines) versions of the three multi-model combinations, each with all three initialisations (in the respective colours); and finally the Bayesian Model Averaging versions of the three multi-model combinations (all with grey lines, only from T+24h). The verifying observations are displayed by the black line.

Figure 8 Discharge forecast performance for forecast ranges of T+0h to T+240h for the period of August 2008 to May 2010, as global averages of the CRPSS (computed at each station over the whole period), for the following forecast products: the four TIGGE models (ECMWF, UKMO, NCEP and CMA) with ERAI-Land initialisation (grey lines); the multi-model combination of these four models with ERAI-Land (red line), ERAI (green line) and MERRA-Land (blue line) initialisation; a grand combination of these three multi-models (orange line); and grand combinations for six post-processed products: the multi-model of the 30-day correction (dashed burgundy line), the initial-error correction (dashed purple line with markers), the 30-day and initial-error correction combination (solid burgundy line with markers), two Bayesian model averaging (BMA) versions of the multi-model, one with the uncorrected forecasts (black line without markers) and one with the uncorrected forecasts extended by the persistence as predictor (black line with circles), and finally the persistence forecast (dashed light blue line with circles). The CRPSS is positively oriented and has a perfect value of 1. The 0 line represents the quality of the reference system, the daily observed discharge climate.

Figure 9 Distribution of the skill increments over all stations provided by six combination and post-processing products for two time ranges, T+24h (a) and T+240h (b). The x axis shows the reference skill, the average CRPSS of the four TIGGE models with the ERAI-Land initialisation, while the y axis displays the CRPSS of the post-processed forecasts at the stations. The six products are the MM combination of the four models with ERAI-Land (red circles), the GMM combination of the three uncorrected MMs (green triangles), and the GMM combination of four post-processed products: the 30-day corrected MMs (orange triangles), the initial time corrected MMs (cyan squares), the combined 30-day and initial time corrected MMs (purple stars) and finally the BMA-transformed MMs of the uncorrected forecasts (blue stars), where in each case the MMs are the three multi-models with the different initialisations. The diagonal line represents no skill improvement; above this line the six products are better, while below it they are worse than the reference. The CRPSS values are computed for the period of August 2008 to May 2010. Some of the stations with reference CRPSS below -5 are not plotted.

Figure 10 Global CRPSS distribution of the highest quality post-processed product at T+240h, the grand multi-model combination of the BMA-transformed uncorrected forecasts, based on the August 2008 to May 2010 period. The CRPSS is positively oriented and has a perfect value of 1. The value 0 represents the quality of the reference system, the daily observed discharge climate.
Table 1 Summary description of the sensitivity experiments with the ECMWF ensemble control forecasts. The baseline is the reference run with the lowest model level (LML) for wind, temperature and humidity forcing. The other experiments apply different changes to the forcing variables: first the LML is changed to surface (Surf) forcing, then different variables of the ensemble control and their combinations are substituted by ERAI-Land data. Entries starting with EC denote ensemble control forcing input, while entries starting with ERAI denote substituted ERAI-Land input.

Sensitivity experiment     Forcing variable setup
                           Rad          THP          Wind         Prec
Baseline                   EC-Surf      EC-LML       EC-LML       EC-Surf
Surf vs. LML               EC-Surf      EC-Surf      EC-Surf      EC-Surf
Radiation                  ERAI-Surf    EC-LML       EC-LML       EC-Surf
Wind                       EC-Surf      EC-LML       ERAI-LML     EC-Surf
THP                        EC-Surf      ERAI-LML     EC-LML       EC-Surf
Rad/THP                    ERAI-Surf    ERAI-LML     EC-LML       EC-Surf
Rad/THP/Wind               ERAI-Surf    ERAI-LML     ERAI-LML     EC-Surf
Rad/THP/Wind/Prec          ERAI-Surf    ERAI-LML     ERAI-LML     ERAI-Surf
Table 2 Detailed evaluation of the discharge sensitivity experiments at the T+240h range for different areas and periods. Relative discharge differences are shown after replacing ensemble control forcing variables, either individually or in combination, by ERAI-Land, and also after replacing the lowest model level (LML) with surface forcing (2 m for temperature and humidity, 10 m for wind). The whole globe, the Northern Extratropics (defined here as 35-70N) and the Tropics (30S-30N), as well as the specific seasons, are displayed.

Average difference (%)    Surf vs. LML   Rad    THP    Wind   Rad+THP+Wind   Rad+THP+Wind+Prec
Global                    2.4            0.7    1.0    0.6    2.9            15.6
N. Extratropics           3.1            0.8    1.0    0.6    3.5            12.8
N. Extratropics JJA       0.6            0.3    0.5    0.2    0.9            13.0
N. Extratropics DJF       2.9            0.8    1.1    0.7    3.6            9.5
Tropics                   1.1            0.5    1.0    0.5    1.8            17.6
Tropics JJA               0.9            0.5    0.9    0.4    1.5            15.0
Tropics DJF               1.3            0.6    1.2    0.6    2.1            18.7