Danmarks og Grønlands Geologiske Undersøgelse — Særudgivelse 2007
Hydrological Modelling and River Basin Management

Doctoral Thesis

Jens Christian Refsgaard

Geological Survey of Denmark and Greenland
Danish Ministry of the Environment
Denne afhandling er af Det Naturvidenskabelige Fakultet ved Københavns Universitet antaget til offentligt at forsvares for den naturvidenskabelige doktorgrad. København, den 5. januar 2007. Nils O. Andersen, Dekan. Forsvaret vil finde sted fredag den 1. juni 2007 kl. 14:00 i Anneksauditorium A, Studiestræde 6, Københavns Universitet.

This thesis has been accepted by the Faculty of Natural Science at the University of Copenhagen for public defence in fulfilment of the degree of Doctor of Science. Copenhagen, 5th January 2007. Nils O. Andersen, Dean. The defence will take place on Friday 1st June 2007 at 14:00 in Anneksauditorium A, Studiestræde 6, University of Copenhagen.
Special Issue

Author: Jens Christian Refsgaard
Illustrations: Kristian A. Rasmussen and reproductions from existing publications
Cover: Kristian A. Rasmussen
Date: January 2007

The report is available on the internet at http://www.geus.dk/
ISBN 978-87-7871-185-4
Geological Survey of Denmark and Greenland (GEUS)
Øster Voldgade 10
DK-1350 København K
Tel: +45 38142000
Fax: +45 38142050
Email: [email protected]
http://www.geus.dk/
Table of Contents

Dansk Resume
Abstract
1. Introduction
   1.1 Water Resources Management and Hydrological Modelling
   1.2 Objective and Content
2. Water Resources Management and the Modelling Process
   2.1 Modelling as Part of the Planning and Management Process
   2.2 Terminology and Scientific Philosophical Basis for the Modelling Process
      2.2.1 Background
      2.2.2 Terminology and guiding principles
      2.2.3 Scientific philosophical aspects
   2.3 Modelling Protocol
   2.4 Classification of Models
3. Simulation of Hydrological Processes at Catchment Scale
   3.1 Flow modelling
      3.1.1 Groundwater/surface water model for the Suså catchment ([1], [2])
      3.1.2 Application of SHE to catchments in India ([4], [5])
      3.1.3 Intercomparison of different types of hydrological models ([6])
   3.2 Reactive Transport
      3.2.1 Oxygen transport and consumption in the unsaturated zone ([3])
      3.2.2 An integrated model for the Danubian Lowland ([9])
      3.2.3 Large scale modelling of groundwater contamination ([10])
   3.3 Real-time Flood Forecasting
      3.3.1 Intercomparison of updating procedures for real-time forecasting ([8])
4. Key Issues in Catchment Scale Hydrological Modelling
   4.1 Scaling
      4.1.1 Catchment heterogeneity
      4.1.2 A scaling framework
      4.1.3 Scaling - an example
      4.1.4 Discussion – post evaluation
   4.2 Confirmation, Verification, Calibration and Validation
      4.2.1 Confirmation of conceptual model
      4.2.2 Code verification
      4.2.3 Model calibration
      4.2.4 Model validation
      4.2.5 Discussion – post evaluation
   4.3 Uncertainty Assessment
      4.3.1 Modelling uncertainty in a water resources management context
      4.3.2 Data uncertainty
      4.3.3 Parameter uncertainty
      4.3.4 Model structure uncertainty
      4.3.5 Discussion – post evaluation
   4.4 Quality Assurance in Model based Water Management
      4.4.1 Background
      4.4.2 The HarmoniQuA approach
      4.4.3 Organisational requirements for QA guidelines to be effective
      4.4.4 Performance criteria and uncertainty – when is a model good enough?
      4.4.5 Discussion – post evaluation
5. Conclusions and Perspectives for Future Work
   5.1 Summary of Main Scientific Contributions
   5.2 Modelling Issues for Future Research
6. References
Appendices: Publications [1] – [15]

[1] Refsgaard JC, Hansen E (1982) A Distributed Groundwater/Surface Water Model for the Suså Catchment. Part 1: Model Description. Nordic Hydrology, 13, 299-310.
[2] Refsgaard JC, Hansen E (1982) A Distributed Groundwater/Surface Water Model for the Suså Catchment. Part 2: Simulations of Streamflow Depletions Due to Groundwater Abstraction. Nordic Hydrology, 13, 311-322.
[3] Refsgaard JC, Christensen TH, Ammentorp HC (1991) A model for oxygen transport and consumption in the unsaturated zone. Journal of Hydrology, 129, 349-369.
[4] Refsgaard JC, Seth SM, Bathurst JC, Erlich M, Storm B, Jørgensen GH, Chandra S (1992) Application of the SHE to catchments in India - Part 1: General results. Journal of Hydrology, 140, 1-23.
[5] Jain SK, Storm B, Bathurst JC, Refsgaard JC, Singh RD (1992) Application of the SHE to catchments in India - Part 2: Field experiments and simulation studies with the SHE on the Kolar subcatchment of the Narmada River. Journal of Hydrology, 140, 25-47.
[6] Refsgaard JC, Knudsen J (1996) Operational validation and intercomparison of different types of hydrological models. Water Resources Research, 32(7), 2189-2202.
[7] Refsgaard JC (1997) Parametrisation, calibration and validation of distributed hydrological models. Journal of Hydrology, 198, 69-97.
[8] Refsgaard JC (1997) Validation and Intercomparison of Different Updating Procedures for Real-Time Forecasting. Nordic Hydrology, 28, 65-84.
[9] Refsgaard JC, Sørensen HR, Mucha I, Rodak D, Hlavaty Z, Bansky L, Klucovska J, Topolska J, Takac J, Kosc V, Enggrob HG, Engesgaard P, Jensen JK, Fiselier J, Griffioen J, Hansen S (1998) An Integrated Model for the Danubian Lowland – Methodology and Applications. Water Resources Management, 12, 433-465.
[10] Refsgaard JC, Thorsen M, Jensen JB, Kleeschulte S, Hansen S (1999) Large scale modelling of groundwater contamination from nitrogen leaching. Journal of Hydrology, 221(3-4), 117-140.
[11] Thorsen M, Refsgaard JC, Hansen S, Pebesma E, Jensen JB, Kleeschulte S (2001) Assessment of uncertainty in simulation of nitrate leaching to aquifers at catchment scale. Journal of Hydrology, 242, 210-227.
[12] Refsgaard JC, Henriksen HJ (2004) Modelling guidelines – terminology and guiding principles. Advances in Water Resources, 27(1), 71-82.
[13] Refsgaard JC, Henriksen HJ, Harrar WG, Scholten H, Kassahun A (2005) Quality assurance in model based water management – review of existing practice and outline of new approaches. Environmental Modelling & Software, 20, 1201-1215.
[14] Refsgaard JC, Nilsson B, Brown J, Klauer B, Moore R, Bech T, Vurro M, Blind M, Castilla G, Tsanis I, Biza P (2005) Harmonised techniques and representative river basin data for assessment and use of uncertainty information in integrated water management (HarmoniRiB). Environmental Science and Policy, 8, 267-277.
[15] Refsgaard JC, van der Sluijs JP, Brown J, van der Keur P (2006) A framework for dealing with uncertainty due to model structure error. Advances in Water Resources, 29, 1586-1597.
Preface

The work presented in this thesis, together with the 15 publications published between 1982 and 2006, forms the material for evaluation for the degree of doctor scientiarum (dr. scient.) at the University of Copenhagen. The papers have all been published in peer reviewed international scientific journals. They are referred to by the numbers [1] to [15].

In the present report I have assembled and summarised my most important scientific contributions to catchment modelling, which has been my research interest during the past three decades. In this connection I wish to thank all my co-authors for a very inspiring co-operation over the years. Research does not take place in a vacuum, and without the interactions with them my work would not have been possible.

I wish to acknowledge former and present colleagues and managements at the three organisations where I have been employed. At the Institute of Hydrodynamics and Hydraulic Engineering, Technical University of Denmark (now Environment and Resources, DTU) I was given the opportunity to explore and develop new integrated groundwater/surface water catchment models at a time when hydrological modelling was still in its infancy. This showed me the enormous potential of this new field. At the Danish Hydraulic Institute (now DHI Water & Environment) I was then entrusted with further development of modelling tools and with testing them in real life applications. This taught me the limitations and difficulties we encounter and the need to be humble when applying models in water resources management. Finally, the Geological Survey of Denmark and Greenland (GEUS) has provided a very inspiring scientific environment and given me the opportunity to get involved in broader international research projects that have matured much of my previous views and allowed me to assemble this work.

A special thanks goes to Kristian A. Rasmussen, GEUS, for using his magic touch to polish some of the old dusty figures from the last century to make them easier to read in this thesis.

Last, but not least, I wish to thank my family for their patience and support and for accepting that I have always been too busy with this topic.
Copenhagen, January 2007 Jens Christian Refsgaard
"Life can only be understood backwards; but it must be lived forwards" Søren Kierkegaard (1813-1855)
Dansk Resume

Publikationerne og materialet i denne doktorafhandling beskriver en række videnskabelige undersøgelser af hydrologisk modellering på oplandsskala i relation til vandressourceforvaltning. Hver af de 15 publikationer fokuserer på dele af det overordnede emne spændende fra udvikling af nye koncepter og modelkoder til modelanvendelser; fra punktskala til oplandsskala; fra modellering af vandstrømninger til transport af opløste og reaktive stoffer; fra fokus på planlægning til real-tids oversvømmelsesvarsling og videre til tværgående emner og protokoller for selve modelleringsprocessen.

Afhandlingens kapitel 2 præsenterer protokoller for hydrologisk modellering og en diskussion af interaktionen mellem hydrologisk modellering og vandressourceforvaltning. Endvidere forklares den terminologi og den tilgrundliggende videnskabsfilosofiske tankegang samt den klassifikation af modeltyper, som benyttes i resten af afhandlingen.

Kapitel 3 indeholder resumeer af modelstudier baseret på ni af publikationerne. Vurderingerne af disse publikationers bidrag til ny viden på det tidspunkt de blev publiceret og af emner som ikke blev behandlet i publikationerne, viser en betydelig udvikling gennem de sidste 25 år. Fx indeholder de første publikationer om udvikling af nye modelkoder intet om verifikation af modelkode, validering af modeller mod uafhængige data eller usikkerhedsvurderinger – emner som i dag betragtes som meget væsentlige. Eksemplerne illustrerer ligeledes, hvordan generelle emner som skalaproblemer og modelvalidering gradvis udviklede sig med baggrund i erfaringer og erkendte problemer fra modelstudier, som egentlig havde andre formål.

Kapitel 4 præsenterer og diskuterer herefter fire generelle emner: (a) heterogenitet og skalering; (b) konfirmation, verifikation, kalibrering og validering af modeller; (c) usikkerhedsvurderinger; og (d) kvalitetssikring af modelleringsprocessen.

Mine væsentligste bidrag til ny videnskabelig viden har været indenfor de følgende fem områder:
• Ny konceptuel forståelse og tilhørende kodeudvikling. Suså modellen var baseret på en ny forståelse af interaktionen mellem overfladevand og grundvand i moræneområder og bragte ny viden om hvorledes grundvandsindvinding påvirker vandløb i sådanne oplande.
• Validering af modeller. Arbejdet med rigoristiske principper for validering af modeller og eksempler på anvendelser for såvel ’lumped conceptual’ som ’distributed physically-based’ modeller har været en grundpille gennem de sidste 15 år af min forskning. Specielt er introduktionen af begrebet ’conditional validation’ ny.
• Skalering. Mit arbejde har ikke ’løst’ skalaproblemerne, men bidrager til at tydeliggøre de principielt forskellige metoder med fokus på deres respektive forudsætninger og begrænsninger.
• Usikkerhedsvurderinger. En betydelig del af min forskningsaktivitet gennem de sidste 10 år har fokuseret på usikkerhedsaspekter. Mit hovedbidrag i den sammenhæng har været introduktion af bredere usikkerhedsaspekter i hele modelleringsprocessen samt arbejdet med usikkerheder på modelstruktur.
• Protokoller for hydrologisk modellering og kvalitetssikring af modelleringsprocessen. Den omfattende og detaljerede modelleringsprotokol, som blev udviklet i HarmoniQuA projektet, er en formalisering og udmøntning af erfaring fra de foregående 25 års arbejde med hydrologisk modellering. De nye elementer heri er den fokus der lægges på (a) den interaktive dialog mellem modellør, vandressourceforvalter, reviewer, interessenter og offentligheden; (b) usikkerhedsvurderinger som et løbende element gennem hele modelleringsprocessen; (c) modelvalidering; og (d) introduktion af erfaringer og subjektiv viden via eksterne reviews.
Abstract

The publications and material presented in this thesis describe a series of scientific investigations on catchment modelling in relation to water resources management. Each of the 15 publications represents parts of the overall topic, ranging from development of new concepts and model codes to model applications; from point scale to catchment scale; from flow modelling to transport and reactive modelling; from planning type applications to real-time forecasting; and further on to cross-cutting issues and protocols for the modelling process.

The thesis starts with a presentation of protocols for the hydrological modelling process together with a discussion of the interaction between the water resources planning and management process and the hydrological modelling process. This includes a definition of terminology, a discussion of the underlying scientific philosophy and a classification of hydrological models.

The following chapter comprises summaries of cases of simulation models based on nine of the publications. The post evaluations of the contributions to scientific knowledge in the publications, and of the issues not taken into account in the earlier publications, reveal significant developments over the years. For example, the first publications focussing on development of new model codes did not put any emphasis on rigorous verification or validation tests, nor on uncertainty assessments, which are key issues today. The cases furthermore illustrate how general issues such as scaling and model validation gradually emerged from experiences and problems encountered in catchment studies that had other primary objectives.

The next chapter then provides a presentation and discussion of four general issues: (a) catchment heterogeneity and scaling; (b) confirmation, verification, calibration and model validation; (c) uncertainty assessment; and (d) quality assurance in model based water management.

My main contributions to scientific knowledge have been in the following five areas:
• New conceptual understanding and code development. The Suså model was based on a new conceptual understanding of the surface water/groundwater interaction in moraine catchments and brought new insight into the effect of groundwater abstraction on streamflow in catchments with such hydrogeological characteristics.
• Model validation. The work on rather rigorous principles for model validation, and the examples of their application for both lumped conceptual and distributed physically based models, is a cornerstone in my research. In particular, the introduction of the term ‘conditional validation’ is novel.
• Scaling. The framework on scaling does not ‘solve’ the scaling problem but contributes to clarifications on applicable methodologies with focus on their respective assumptions and limitations.
• Uncertainty assessment. During the past decade a considerable part of my research work has focussed on uncertainty aspects. I consider my main contributions in this respect to be the introduction of the broader uncertainty aspects integrated into the modelling framework and the work with model structure uncertainty.
• Modelling protocols and guidelines for quality assurance in the modelling process. The comprehensive modelling protocol developed within the HarmoniQuA project is a formalisation of experience and practices that have gradually emerged over the years. The novel elements are the emphasis on (a) the interactive dialogue between modeller, water manager, reviewer, stakeholders and the public; (b) uncertainty assessments throughout the modelling process; (c) model validation; and (d) experience and subjective knowledge introduced through external model reviews.
1. Introduction
1.1 Water Resources Management and Hydrological Modelling

"Scarcity and misuse of fresh water pose a serious and growing threat to sustainable development and protection of the environment. Human health and welfare, food security, industrial development and the ecosystems on which they depend, are all at risk, unless water and land resources are managed more effectively in the present decade and beyond than they have been in the past." (ICWE, 1992)

"The fact that the world faces a water crisis has become increasingly clear in recent years. Challenges remain widespread and reflect severe problems in the management of water resources in many parts of the world. These problems will intensify unless effective and concerted actions are taken." (WWAP, 2003)
The first of the above quotes presents the status and the future challenges facing hydrologists and water resources managers as summarised in the introductory paragraph of the Dublin Statement on Water and Sustainable Development (ICWE, 1992). The second quote is from the first chapter of the UN World Water Development Report "Water for People, Water for Life", which is a collaborative effort of 23 UN agencies and convention secretariats co-ordinated by the World Water Assessment Programme. Thus the challenges in water resources management are enormous, both at the global scale as illustrated above and at smaller scales, as for instance outlined in the vision for the European water sector recently formulated by the European Water Supply and Sanitation Technology Platform (WSSTP, 2005).

The present thesis deals with hydrological modelling. It must be emphasised that modelling in itself is not sufficient to address these challenges. Modelling constitutes only one, among several, sets of tools that can be used to support water resources management.

Computer based hydrological models have been developed and applied at an ever increasing rate during the past four decades. The key reasons for this are twofold: (a) improved models and methodologies are continuously emerging from the research community, and (b) the demand for improved tools increases with the increasing pressure on water resources. Overviews of the status and development trends in catchment scale hydrological modelling during this period can be found in Fleming (1975) and Singh (1995).
1.2 Objective and Content

The objective of this thesis is to present the contributions to scientific knowledge that have emerged from the research described in the 15 appended publications. I have structured the thesis with the aim of presenting my research contributions within a framework of catchment modelling and its application to support water resources management.
The next chapter (Chapter 2) therefore presents an overall framework for the water resources management and planning process, the modelling process and the interaction between these two processes. Here the terminology and modelling protocol are introduced and discussed. This chapter is based on publications [7], [12] and [13], i.e. mainly some of my most recent work.

Chapter 3 comprises a number of examples of simulation models ranging from point scale to catchment scale, from flow modelling to transport and reactive modelling and from planning type applications to real-time forecasting. This chapter is based on publications [1], [2], [3], [4], [5], [6], [8], [9] and [10], i.e. mainly some of my earlier work.

Chapter 4 then provides a presentation and discussion of key and cross-cutting issues in hydrological modelling such as scaling, model validation, uncertainty assessment and quality assurance. These issues, which were introduced as part of the overall framework in Chapter 2, are discussed here with reference to the experience and findings reported in the publications. This chapter includes ideas, views and material from all 15 publications, but with more emphasis on some of the more general purpose publications [6], [7], [10], [11], [12], [13], [14] and [15].

Finally, Chapter 5 contains some conclusions and perspectives for future work.

Thus I have not structured the content of this report according to the chronology of my publications [1] – [15]. The reason for this is that my most recent work provides a broader and better overview of the topic and is thus better suited for providing a framework for my earlier work.
2. Water Resources Management and the Modelling Process
2.1 Modelling as Part of the Planning and Management Process

Integrated Water Resources Management (IWRM) is "a process, which promotes the co-ordinated development and management of water, land and related resources, in order to maximise the resultant economic and social welfare in an equitable manner without compromising the sustainability of vital ecosystems" (GWP, 2000). In the EU Water Framework Directive (WFD) Guidance Document on Planning Processes, planning is defined as "a systematic, integrative and iterative process that is comprised of a number of steps executed over a specified time schedule" (EC, 2003b). All new guidelines on water resources management emphasise the importance of integrated approaches, cross-sectoral planning and public participation in the planning process (GWP, 2000; EC, 2003b; Jønch-Clausen, 2004).

Models describing water flows, water quality, ecology and economy are being developed and used in increasing number and variety to support water management decisions. The interactions between the modelling process and the water management process are illustrated in Figs. 1 and 2. Fig. 1 shows the key actors in the water management process and the five steps into which the modelling process may typically be decomposed. The organisation that commissions a modelling study is denoted the water manager. This is often the competent authority, but can also be a stakeholder such as a water supply company. The role of the government is most often limited to providing the enabling environment such as legislation, research and information infrastructure.

The typical cyclic and iterative character of the water management process, such as the WFD process, is illustrated in Fig. 2, where the interaction with the modelling process is illustrated by the large circle (water management) and the four smaller supporting circles (modelling). The WFD planning process, like most other planning processes, contains four main elements:
• Identification, including assessment of present status, analysis of impacts and pressures and establishment of environmental objectives. Here modelling may be useful, for example, for supporting assessments of the reference conditions and of the impacts of the various pressures (EC, 2004).
• Designing, including the set-up and analysis of a programme of measures designed to reach the environmental objectives in a cost-effective way. Here modelling will typically be used for supporting assessments of the effects and costs of various measures under consideration.
• Implementing the measures. Here on-line modelling may in some cases support the operational decisions to be made.
• Evaluation of the effects of the measures on the environment. Here modelling may support the monitoring in order to extract maximum information from the monitoring data, e.g. by indicating errors and inadequacies in the data and by filtering out the effects of climate variability.
[Fig. 1 flow diagram: the water management process (The Environment, Problem Identification, Public Opinion, Stakeholders, Competent Authority, Government, Water Management Decision, Implementation) interacting with the five steps of the modelling process: 1. Model Study Plan (identify problem, define requirements, assess uncertainties, prepare model study plan); 2. Data and Conceptualisation (collect and process data, develop conceptual model, select model code, review and dialogue); 3. Model Set-up (construct model, reassess performance criteria, review and dialogue); 4. Calibration and Validation (model calibration, model validation, uncertainty assessment, review and dialogue); 5. Simulation and Evaluation (model predictions, uncertainty assessment, review and dialogue).]

Fig. 1 The role of the modelling process and the water management decision process (inspired from Pascual et al., 2003).
It is important to note that the modelling studies typically do not address the entire planning and management process, but rather support certain elements of the process. Modelling is applied as a response (but usually not the only response) to an identified problem and can provide support for water management decisions. The types of interactions between the modelling process and the planning and management process are:
• The modelling process starts with a thorough framing of the problem to be addressed and definition of modelling objectives and requirements for the modelling study (Step 1 in Fig. 1). Water managers and stakeholders dominate this step, which basically is identical to part of the broader planning process. A participatory based assessment of the most important sources of uncertainty for the decision process should be used as a basis for prioritising the elements of the modelling study. The uncertainty assessments made at this stage will typically be qualitative.
• The main modelling itself is composed of Steps 2, 3 and 4 of Fig. 1. Here the link with the main planning process consists of dialogue, reviews and discussions of preliminary results. The amount and type of interaction depends on the level of public participation, which may vary from case to case, from providing information, through consultation, to active involvement (Henriksen et al., submitted).
• The finalisation of the modelling study (equivalent to the last step in Fig. 1), typically including scenario simulations. Here the water managers and the stakeholders again have a dominant role. The decisions made at the outcome of this step on the basis of modelling results are made in the context of the main planning process. Uncertainty assessment of model predictions is a crucial aspect of the modelling results and should be communicated in a way that is accessible to the stakeholders in the further water management process.
[Fig. 2 diagram: the cyclic WFD process (Identification → Designing → Implementation → Evaluation), with a supporting modelling circle attached to each of the four stages.]

Fig. 2 The role of modelling in the water management process within the context of the EU Water Framework Directive (WFD)
2.2 Terminology and Scientific Philosophical Basis for the Modelling Process
2.2.1 Background

As pointed out in [12], a key problem in relation to the establishment of a theoretical modelling framework is confusion on terminology. For example, the terms validation and verification are used with different, and sometimes interchangeable, meanings by different authors. The confusion arises from both semantic and philosophical considerations (Rykiel, 1996). Another important problem is the lack of consensus in the so far non-conclusive debate on the fundamental question of whether a water resources model can be validated or verified, and whether it as such can be claimed to be suitable or valid for particular applications (Konikow and Bredehoeft, 1992; De Marsily et al., 1992; Oreskes et al., 1994).

An important issue in relation to validation/verification is the distinction between open and closed systems. A system is a closed system if its true conditions can be predicted or computed exactly. This applies to mathematics and mostly to physics and chemistry. Systems where the true behaviour cannot be computed due to uncertainties and lack of knowledge on e.g. input data and parameter values are called open systems. The systems we are dealing with in water resources management, based on geosciences, biology and socio-economy, are open systems. According to Konikow and Bredehoeft (1992) and Oreskes et al. (1994) it is not possible to verify or validate models of open systems.

Finally, the principles have to reflect and be in line with the underlying philosophy of environmental modelling, which has changed significantly during the past decades. In the early days many of us focussed on the huge potential of sophisticated models in a way that in retrospect may be characterised as rather naive enthusiasm (e.g. Freeze and Harlan, 1969; Abbott, 1992). The dominant view today appears to be much more balanced and mature (e.g. Beven, 2002a; Beven, 2002b).
2.2.2 Terminology and guiding principles
According to the terminology presented in [12] the simulation environment is divided into four basic elements as shown in Fig. 3. The inner arrows describe the processes that relate the elements to each other, and the outer circle refers to the procedures that evaluate the credibility of these processes. In general terms a model is understood as a simplified representation of the natural system it attempts to describe. However, a distinction is made between three different meanings of the general term model, namely the conceptual model, the model code and the model, which here is defined as a site-specific model. The most important elements in the terminology and their interrelationships are defined as follows:

Reality: The natural system, understood here as the study area.

Conceptual model: A description of reality in terms of verbal descriptions, equations, governing relationships or ‘natural laws’ that purport to describe reality. This is the user's perception of the key hydrological and ecological processes in the study area (perceptual model) and the corresponding simplifications and numerical accuracy limits that are assumed acceptable in order to achieve the purpose of the modelling. A conceptual model thus includes both a mathematical description (equations) and a description of the flow processes, river system elements, ecological structures, geological features, etc. that are required for the particular purpose of the modelling. By drawing an analogy to the scientific philosophical discussion, the conceptual model in other words constitutes the scientific hypothesis or theory that we assume for our particular modelling study.
Fig. 3 Elements of a modelling terminology [12].
Model code: A mathematical formulation in the form of a computer program that is so generic that it, without program changes, can be used to establish a model with the same basic type of equations (but allowing different input variables and parameter values) for different study areas.

Model: A site-specific model established for a particular study area, including input data and parameter values.
Model confirmation: Determination of adequacy of the conceptual model to provide an acceptable level of agreement for the domain of intended application. This is in other words the scientific confirmation of the theories/hypotheses included in the conceptual model.

Code verification: Substantiation that a model code is in some sense a true representation of a conceptual model within certain specified limits or ranges of application and corresponding ranges of accuracy.

Model calibration: The procedure of adjustment of parameter values of a model to reproduce the response of reality within the range of accuracy specified in the performance criteria.

Model validation: Substantiation that a model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model.

Model set-up: Establishment of a site-specific model using a model code. This requires, among other things, the definition of boundary and initial conditions and parameter assessment from field and laboratory data.

Simulation: Use of a validated model to gain insight into reality and obtain predictions that can be used by water managers. This includes insight into how reality can be expected to respond to human interventions. In this connection uncertainty assessments of the model predictions are very important.

Performance criteria: Level of acceptable agreement between model and reality. The performance criteria apply both for model calibration and model validation.

Domain of applicability (of conceptual model): Prescribed conditions for which the conceptual model has been tested, i.e. compared with reality to the extent possible and judged suitable for use (by model confirmation).

Domain of applicability (of model code): Prescribed conditions for which the model code has been tested, i.e. compared with analytical solutions, other model codes or similar to the extent possible and judged suitable for use (by code verification).

Domain of applicability (of model): Prescribed conditions for which the site-specific model has been tested, i.e. compared with reality to the extent possible and judged suitable for use (by model validation).
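The relation between calibration, validation and performance criteria can be illustrated with a minimal split-sample sketch. The one-parameter linear-reservoir "model code", the parameter name k and the acceptance threshold below are illustrative assumptions, not taken from any of the codes discussed in this thesis; the point is simply that the model is calibrated on one period, validated against an independent period, and judged against a predefined performance criterion.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency as an example performance criterion (1 = perfect agreement)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def linear_reservoir(precip, k, storage0=10.0):
    """Illustrative one-parameter 'model code': daily outflow Q[t] = S[t] / k."""
    storage, runoff = storage0, []
    for p in precip:
        storage += p
        q = storage / k
        storage -= q
        runoff.append(q)
    return np.array(runoff)

rng = np.random.default_rng(1)
precip = rng.gamma(shape=0.8, scale=5.0, size=730)                 # two years of synthetic rainfall
obs = linear_reservoir(precip, k=12.0) + rng.normal(0, 0.3, 730)   # synthetic 'reality' with noise

cal, val = slice(0, 365), slice(365, 730)                          # split-sample periods

# Model calibration: adjust the parameter to reproduce observations in the calibration period
candidates = np.linspace(2.0, 30.0, 200)
k_best = max(candidates, key=lambda k: nash_sutcliffe(obs[cal], linear_reservoir(precip, k)[cal]))

# Model validation: judge performance against independent data and a predefined criterion
criterion = 0.85                                                    # illustrative performance criterion
ns_val = nash_sutcliffe(obs[val], linear_reservoir(precip, k_best)[val])
print(f"k = {k_best:.2f}, validation NSE = {ns_val:.3f}, "
      f"{'accepted' if ns_val >= criterion else 'rejected'} for the intended application")
```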
2.2.3 Scientific philosophical aspects
The credibility of the descriptions, or the agreements between reality, conceptual model, model code and model, is evaluated through the terms confirmation, verification, calibration and validation. Thus, the relation between reality and the scientific description of reality, which is constituted by the conceptual model with its theories and equations on flow and transport processes, its interpretation of the geological system and ecosystem at hand, etc., is evaluated through the confirmation of the conceptual model. By using the term confirmation in connection with the conceptual model, it is implied that it is never considered possible to prove the truth of a theory/hypothesis and as such of a conceptual model. And even if a site-specific model is eventually accepted as valid for specific conditions, this is not a proof that the conceptual model is true, because, due to non-uniqueness, the site-specific model may turn out to perform right for the wrong reasons.

The fundamental view expressed by scientific philosophers is that verification and validation of numerical models of natural systems is impossible, because natural systems are never closed and because the mapping of model results is always non-unique (Popper, 1959; Oreskes et al., 1994). I agree that it is not possible to carry out model verification or model validation if these terms are used universally, without restriction to domains of applicability and levels of accuracy. [12] note, however, that Popper (1959) distinguished between two kinds of universal statements: the 'strictly universal' and the 'numerically universal'. The strictly universal statements are those usually dealt with when speaking about theories or natural laws. They are a kind of 'all-statement' claiming to be true for any place and any time. In contrast, numerically universal statements refer only to a finite class of specific elements within a finite individual spatio-temporal region. A numerically universal statement is thus in fact equivalent to conjunctions of singular statements. The restrictions in the use of the terms confirmation, verification and validation imposed by the respective domains of applicability imply, according to Popper's views, that the conceptual model, model code and site-specific model can only be classified as numerically universal statements as opposed to strictly universal statements. This distinction is fundamental for the terminology described in [12] and its link to scientific philosophical theories. Consequently the terms verification and validation should never be used without qualifiers.

An important aspect of the framework outlined in [12] lies in the separation between the three different 'versions' of the word model, namely the conceptual model, the model code and the site-specific model. Due to this distinction it is possible, at a general level, to talk about confirmation of a theory or a hypothesis about how nature can be described, using the relevant scientific method for that purpose, and, at a site-specific level, to talk about validity of a given model within certain domains of applicability and associated with specified accuracy limits.
2.3 Modelling Protocol

The procedure for applying a hydrological model is often denoted a modelling protocol. It comprises a series of actions to be followed in a sequential or iterative form. The modelling protocol presented in [7] for distributed catchment modelling was inspired by the groundwater community (Anderson and Woessner, 1992). It was subsequently used in the Danish Handbook for Groundwater Modelling (Henriksen et al., 2001), which has been used extensively in practice since its emergence. A more recent modelling protocol, developed within the context of the EU research project HarmoniQuA, is reported in [13] and Scholten et al. (2007). The two protocols are illustrated in Figs. 4 and 5.
Fig. 4 The modelling protocol from [7].
A modelling study will involve several phases and several actors. A typical modelling study will involve the following four types of actors:
• The water manager, i.e. the person or organisation responsible for the management or protection of the water resources, and thus responsible for the modelling study and the outcome (the problem owner).
• The modeller, i.e. a person or an organisation that conducts the modelling study using the model. If the modeller and the water manager belong to different organisations, their roles will typically be denoted consultant and client, respectively.
• The reviewer, i.e. a person conducting some kind of external review of a modelling study. The review may be more or less comprehensive depending on the requirements of the particular case. The reviewer is typically appointed by the water manager to help the water manager match the modelling capability of the modeller.
• The stakeholders/public. A stakeholder is an interested party with a stake in the water management issue, either in exploiting or protecting the resource. Stakeholders include the following groups: (i) the competent water resource authority (typically the water manager, cf. above); (ii) interest groups; and (iii) the general public.

The modelling process may, according to [13], be decomposed into five major steps, which again are decomposed into 48 tasks (Fig. 5). The contents of the five steps are:
• STEP 1 (Model Study Plan). This step aims to agree on a Model Study Plan comprising answers to the questions: Why is modelling required for this particular model study? What is the overall modelling approach and which work should be carried out? Who will do the modelling work? Who should do the technical reviews? Which stakeholders/public should be involved and to what degree? What are the resources available for the project? The water manager needs to describe the problem and its context as well as the available data. A very important task is then to analyse and determine the various requirements of the modelling study in terms of the expected accuracy of modelling results. The acceptable level of accuracy will vary from case to case and must be seen in a socio-economic context. It should, therefore, be defined through a dialogue between the modeller, water manager and stakeholders/public. In this respect an analysis of the key sources of uncertainty is crucial in order to focus the study on the elements that produce most information of relevance to the problem at hand.
• STEP 2 (Data and Conceptualisation). In this step the modeller should gather all the relevant knowledge about the study basin and develop an overview of the processes and their interactions in order to conceptualise how the system should be modelled in sufficient detail to meet the requirements specified in the Model Study Plan. Consideration must be given to the spatial and temporal detail required of a model, to the system dynamics, to the boundary conditions and to how the model parameters can be determined from the available data. The need to model certain processes in alternative ways, or to differing levels of detail, in order to enable assessments of model structure uncertainty should be evaluated. The availability of existing computer codes that can address the model requirements should also be addressed.
• STEP 3 (Model Set-up). Model set-up implies transforming the conceptual model into a site-specific model that can be run in the selected model code. A major task in model set-up is the processing of data in order to prepare the input files necessary for executing the model. Usually, the model is run within a Graphical User Interface (GUI) where many tasks have been automated. The GUI speeds up the generation of input files, but it does not guarantee that the input files are error free. The modeller performs this work.
• STEP 4 (Calibration and Validation). This step is concerned with the process of analysing the model that was constructed during the previous step, first by calibrating the model, and then by validating its performance against independent field data. Finally, the reliability of model simulations for the intended domain of applicability is assessed through uncertainty analyses. The results are described so that the scope of model use and its associated limitations are documented and made explicit. The modeller performs this work.
• STEP 5 (Simulation and Evaluation). In this step the modeller uses the calibrated and validated model to make simulations to meet the objectives and requirements of the model study. Depending on the objectives of the study, these simulations may result in specific results that can be used in subsequent decision making (e.g. for planning or design purposes) or to improve understanding (e.g. of the hydrological/ecological regime of the study area). It is important to carry out suitable uncertainty assessments of the model predictions in order to arrive at a robust decision. As with the other steps, the quality of the results needs to be assessed through internal and external reviews.

Each of the last four steps is concluded with a reporting task followed by a review task. The review tasks include dialogues between water manager, modeller, reviewer and, often, stakeholders/public. The protocol includes many feedback possibilities (Fig. 5).

A comparison of the old protocol (Fig. 4) and the HarmoniQuA protocol developed a decade later (Fig. 5) shows some interesting developments:
• The basic sequence of the prescribed activities in the protocols is the same. The HarmoniQuA protocol is much more detailed than the old one, but there are no fundamental disagreements between the two.
• The HarmoniQuA protocol puts much more emphasis on the framing of the modelling study. This is only considered in one box in Fig. 4 and not given much weight in [7], while it is one full step comprising seven tasks in Fig. 5. This implies for instance that requirements on performance criteria and uncertainty assessments are introduced rather late in the old protocol, while they are an important part of Step 1 in the HarmoniQuA protocol.
• There is much emphasis on uncertainty assessments throughout the modelling process in the HarmoniQuA protocol, while uncertainty assessments are only considered as part of model calibration and simulation in the old protocol.
• The HarmoniQuA protocol is part of a quality assurance framework with much emphasis on the interplay between the various actors in the modelling process. This results in stakeholder involvement, peer reviews, focus on reporting and dialogue between water manager and modeller. In contrast, the old protocol focuses only on the modeller.

These developments reflect a move from guidance for the modeller only (old protocol) towards guidance for all actors involved in the modelling process (HarmoniQuA). This process has been inspired by feedback from introducing the old protocol in real world applications, where it was realised that a broader concept was required.
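As a purely illustrative sketch of how such a protocol can be made operational, the snippet below encodes the five steps with their concluding review gates and feedback loops as a simple workflow. This is a hypothetical Python representation, not part of the HarmoniQuA software; the step names follow Fig. 5, while the task lists are abbreviated.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    tasks: list                      # abbreviated, illustrative task list
    reviewed: bool = False           # each step concludes with reporting and review

PROTOCOL = [
    Step("Model Study Plan", ["describe problem", "determine requirements", "agree on plan"]),
    Step("Data and Conceptualisation", ["collect data", "develop conceptual model", "select code"]),
    Step("Model Set-up", ["process data", "construct model", "test runs"]),
    Step("Calibration and Validation", ["calibrate", "validate", "uncertainty analysis"]),
    Step("Simulation and Evaluation", ["run scenarios", "uncertainty analysis", "evaluate"]),
]

def run_protocol(review):
    """Walk through the steps; a failed review sends the study back one step (feedback loop)."""
    i = 0
    while i < len(PROTOCOL):
        step = PROTOCOL[i]
        for task in step.tasks:
            print(f"[{step.name}] {task}")
        step.reviewed = review(step)                     # dialogue: modeller, manager, reviewer
        i = i + 1 if step.reviewed else max(i - 1, 0)    # iterate rather than proceed if not OK
    print("Model study closure")

# Example run where every review is accepted (a real review is a stakeholder dialogue, not a flag)
run_protocol(lambda step: True)
```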
[Fig. 5 flow chart: the five modelling steps (Model Study Plan; Data and Conceptualisation; Model Set-up; Calibration and Validation; Simulation and Evaluation), each decomposed into tasks, concluded by reporting and review tasks, and connected by numerous feedback loops leading finally to Model Study Closure.]
Fig. 5 The five modelling steps and the 48 tasks in the HarmoniQuA modelling protocol. The diagram is an updated version of Fig. 5 in [13] (Refsgaard et al., 2006).
2.4 Classification of Models

Many attempts have been made to classify hydrological models (or model codes). Refsgaard (1996) presented the classification shown in Fig. 6, which I have used in all papers of the present thesis. Deterministic models can be classified according to whether the model gives a lumped or a distributed description of the considered area, and whether the description of the hydrological processes is empirical, conceptual or more physically-based.

A lumped model implies that the catchment is considered as one computational unit. A distributed model, on the other hand, provides a description of catchment processes at geo-referenced computational grid points within the catchment. An intermediate approach is a semi-distributed model, which uses some kind of distribution, either in sub-catchments or in hydrological response units, where areas with the same key characteristics are aggregated into sub-units without considering their actual locations within the catchment. Examples of hydrological response units considered in semi-distributed models are elevation zones, which are relevant for snow modelling, and combinations of soil and vegetation type, which may be relevant for simulation of root zone processes such as evapotranspiration and nitrate leaching.

As most conceptual models are also lumped, and as most physically-based models are also distributed, three main classes emerge:
• Empirical (black box)
• Lumped conceptual models (grey box)
• Distributed physically-based (white box)

The classification is discussed in some detail in Refsgaard (1996). Here, the focus is on the two traditional approaches in deterministic hydrological catchment modelling, namely the lumped conceptual and the distributed physically-based ones. The fundamental difference between these two types of models lies in their process descriptions and the way spatial variability is treated. The distributed physically-based models contain equations which have originally been developed for point scales and which provide detailed descriptions of flows of water and solutes. The variability of catchment characteristics is accounted for explicitly through the variations of hydrological parameter values among the different computational grid points. This approach leaves the variability within a grid unaccounted for, which in some cases is of minor importance but in other cases may pose a serious constraint. The lumped conceptual models use empirical process descriptions with built-in accounting for the spatial variability of catchment characteristics.
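To make the contrast concrete, the sketch below shows a minimal lumped conceptual ('grey box') rainfall-runoff model treating the whole catchment as a single storage, in the spirit of codes such as NAM or HBV. The structure, parameter names and values are illustrative assumptions and do not reproduce the formulation of any of the codes cited in this section.

```python
import numpy as np

def lumped_conceptual(precip, pet, fc=150.0, beta=2.0, k_base=0.05, k_quick=0.3):
    """
    Minimal lumped conceptual rainfall-runoff sketch (daily time steps, units mm/day).
    One soil-moisture storage controls runoff generation; two linear reservoirs route
    quick flow and baseflow. All parameters are illustrative, not from any cited code.
    """
    soil, quick, base = 0.5 * fc, 0.0, 0.0
    discharge = []
    for p, ep in zip(precip, pet):
        runoff_coeff = (soil / fc) ** beta        # runoff share rises nonlinearly with wetness
        effective = p * runoff_coeff
        soil += p - effective
        soil -= min(soil, ep * soil / fc)         # actual ET limited by soil moisture
        if soil > fc:                             # overflow when storage exceeds field capacity
            effective += soil - fc
            soil = fc
        quick += 0.7 * effective                  # split effective rainfall between two routes
        base += 0.3 * effective
        q_quick, q_base = k_quick * quick, k_base * base
        quick -= q_quick
        base -= q_base
        discharge.append(q_quick + q_base)        # catchment outlet discharge for this day
    return np.array(discharge)

# Synthetic one-year forcing just to exercise the sketch
rng = np.random.default_rng(0)
precip = rng.gamma(0.7, 6.0, 365)
pet = np.full(365, 2.0)
print(lumped_conceptual(precip, pet)[:5].round(2))
```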
Fig. 6 Classification of hydrological models according to process description (Refsgaard, 1996).
Typical examples of lumped conceptual model codes are the Stanford Watershed Model (Crawford and Linsley, 1966), the Sacramento (Burnash, 1995), the HBV (Bergström, 1995) and the NAM (Nielsen and Hansen, 1973). Typical examples of distributed physically-based model codes are the MIKE SHE (Abbott et al., 1986a, b; Refsgaard and Storm, 1995) and the Thales (Grayson et al., 1992a, b). Groundwater model codes like MODFLOW belong to the distributed physically-based class.

The classification has some shortcomings that should be noted. First of all, the use of the term 'conceptual model' is unfortunate, because its meaning here differs from the definition given in Section 2.2 and used in the modelling protocols (Section 2.3). This can cause some confusion, but introducing a new term completely different from what is used by almost all other scientists in the community of catchment modelling might cause even more confusion. Secondly, and more fundamentally, the names of the classes should be considered as relative rather than absolute. For example, Beven (1989) argued that in most applications physically-based models are used as lumped conceptual models at the grid scale. As discussed in [4], I agree that some degree of lumping and conceptualisation will always need to take place, but in spite of this there is a fundamental difference in the functioning and, as shall also be discussed later, in the applicability of the two model types.
3. Simulation of Hydrological Processes at Catchment Scale
In this chapter some modelling examples from the publications are briefly summarised and discussed within the framework outlined in Chapter 2.
3.1 Flow modelling
3.1.1 Groundwater/surface water model for the Suså catchment ([1], [2])
Summary

The publications [1] and [2] describe a new model code and the set-up, calibration and validation of a model for a 1,000 km2 area. Further details can be found in Stang (1981), Refsgaard (1981) and Refsgaard and Stang (1981). The objectives of the study were to develop a spatially distributed groundwater/surface water model code, to apply it to the Suså catchment with a particular focus on the stream-aquifer interaction in a hydrogeological system consisting of a confined aquifer, an aquitard and a phreatic aquifer, and to test the model for prediction of the hydrological consequences of groundwater abstraction on streamflows and hydraulic heads. The new model code was rather complex and computationally demanding at the time of development. Thus, standard 30-year model simulations could only be carried out as night runs on the mainframe computer at DTU's computer centre.

The model area, comprising the Suså and the neighbouring Køge Å catchments, is located in the central and southern part of Zealand. The model area, the topographic divides and the groundwater model polygonal mesh are shown in Fig. 7. The overall structure of the model is outlined in Fig. 8. It consists of four separate components for the confined regional aquifer, the aquitard, the phreatic aquifer and the root zone. The spatial distribution and the degree of physical basis differ between the four components. The time step in the calculations is one day in all parts of the model.

The confined aquifer is described by a two-dimensional integrated finite difference model with 112 polygons. For the phreatic aquifer, consisting of till with very small transmissivities, and for the aquitard, each polygon is further divided into four sub-polygons based on hypsographic curves (Fig. 9). Due to small scale topographic variations the flows in the aquitard in most polygons are upwards in some parts and downwards in other parts of the polygon. A correct representation of these flows between the regional aquifer and the phreatic aquifer that discharges to the rivers is crucial for achieving a good description of the stream-aquifer interaction. Without such an approach, allowing a description of both upward and downward flows in the aquitard within the same polygon, a much finer spatial resolution with 10-100 times as many polygons would have been required. This would have been impossible 25 years ago due to computational constraints.
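The sub-polygon idea can be illustrated with a small sketch of the vertical leakage between the phreatic and the regional aquifer: within one polygon the Darcy leakage is evaluated separately for each sub-polygon, so that upward and downward flows can coexist. This is a simplified illustration only; the variable names, values and leakage formulation are assumptions and are not taken from the Suså code.

```python
def aquitard_leakage(h_regional, phreatic_heads, areas, kv=1e-8, thickness=10.0):
    """
    Darcy leakage between the phreatic aquifer and the regional confined aquifer through
    an aquitard, computed per sub-polygon (illustrative values, not the Suså code itself).
    Positive flux = downward flow (recharge of the regional aquifer), negative = upward flow.
    Heads in m, areas in m2, kv in m/s, thickness in m; returns fluxes in m3/s.
    """
    fluxes = []
    for h_phreatic, area in zip(phreatic_heads, areas):
        gradient = (h_phreatic - h_regional) / thickness      # vertical head gradient
        fluxes.append(kv * gradient * area)                   # Darcy flux times sub-polygon area
    return fluxes

# One polygon with four sub-polygons defined from the hypsographic curve (cf. Fig. 9):
# low-lying sub-polygons near the streams have low water tables, elevated ones have high.
phreatic = [22.0, 26.0, 31.0, 36.0]           # phreatic water table per sub-polygon (m above MSL)
areas = [6.0e6, 5.0e6, 4.0e6, 3.0e6]          # sub-polygon areas (m2)
h_reg = 27.0                                   # head in the regional confined aquifer (m above MSL)

flows = aquitard_leakage(h_reg, phreatic, areas)
for i, q in enumerate(flows, 1):
    direction = "downward" if q > 0 else "upward"
    print(f"sub-polygon {i}: {q * 86400:8.1f} m3/day ({direction})")
print(f"net exchange for the polygon: {sum(flows) * 86400:.1f} m3/day")
```

With these illustrative numbers the low-lying sub-polygons receive upward flow from the regional aquifer (sustaining baseflow to the streams) while the elevated ones recharge it, which is exactly the within-polygon behaviour the sub-polygon approach was designed to capture.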
The root zone component calculated the net precipitation that recharged the phreatic aquifer. The modelling area was divided into seven sub-areas with separate precipitation input and soil parameters. The spatial variation in vegetation was further accounted for by dividing each of these seven areas into five vegetation areas, based on agricultural statistics, and one meadow (wetland) area. This gives a total of 42 sub-areas, where each sub-area is a kind of 'hydrological response unit', i.e. a semi-distributed approach. The root zone calculations were based on a box approach with four layers in the root zone.
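A minimal sketch of such a box-type root zone water balance for a single 'hydrological response unit' is given below; the layer capacities, the evapotranspiration rule and the drainage rule are purely illustrative assumptions and do not reproduce the actual Suså root zone component.

```python
def root_zone_recharge(precip, pet, capacities=(20.0, 30.0, 40.0, 50.0)):
    """
    Box-type root zone water balance with four layers (all values in mm).
    Each day precipitation fills the layers from the top; evapotranspiration is drawn
    from the uppermost non-empty layer; water exceeding the capacity of the bottom
    layer becomes net precipitation, i.e. recharge to the phreatic aquifer.
    Layer capacities are illustrative, not the calibrated Suså values.
    """
    storage = [0.0] * len(capacities)
    recharge = []
    for p, ep in zip(precip, pet):
        # Evapotranspiration from the uppermost layer holding water
        for i, s in enumerate(storage):
            if s > 0.0:
                storage[i] = max(0.0, s - ep)
                break
        # Infiltration cascades downwards through the layers
        water = p
        for i, cap in enumerate(capacities):
            storage[i] += water
            water = max(0.0, storage[i] - cap)    # excess spills to the next layer
            storage[i] = min(storage[i], cap)
        recharge.append(water)                     # excess from the bottom layer recharges groundwater
    return recharge

# Example: a wet week followed by a dry week (mm/day)
precip = [15, 20, 30, 25, 10, 5, 0, 0, 0, 0, 0, 0, 0, 0]
pet = [1.5] * len(precip)
print([round(r, 1) for r in root_zone_recharge(precip, pet)])
```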
Fig. 7 Topographic divides, groundwater polygonal mesh, precipitation gauging stations and precipitation zones of the Suså model.
Fig. 8 The structure of the Suså model

[Fig. 9 cross-section for polygon 21: ground surface, water table, aquitard, pre-Quaternary surface and regional aquifer, with the elevation classes < 24 m, 24-28 m, 28-34 m and > 34 m above MSL defining the four sub-polygons; the streams Lilleå, Vendebæk, Suså and Gasmose Bæk are marked.]
Fig. 9 Hypsographic curve for polygon 21 and areas represented by the four sub-polygons.
Fig. 10 Examples of simulation results from soil moisture in root zone, hydraulic head of regional confined aquifer and river discharge. The model was calibrated against soil moisture data from four experimental plots, time series of hydraulic heads from 40 observation wells in the regional aquifer and streamflow from six gauging stations. Examples of simulation results from the calibration period are shown in Fig. 10 which shows excellent curve fits. The groundwater and aquitard models were calibrated, along with the code development itself, using all available hydraulic head data from the period 1950-80. Between 1964 and 1970 the groundwater abstraction to Copenhagen Water Supply from the Regnemark Waterworks in the Køge Å catchment was increased from zero to about 15 million m3/year. The remaining model components
were calibrated against only some of the available streamflow data, namely some of the data from the Suså catchment, while amongst others the Køge Å data were not used for calibration. While the simulation of streamflows in the Køge Å catchment in [1] was characterised as a "half-way test of the model's ability to simulate streamflow from ungauged catchments", no systematic validation tests against independent data were carried out as part of the study. Some years later the model simulations were extended with new data from the period 1981-87, where the groundwater abstractions had changed slightly. In this post-audit validation study the model simulations were found to match the observations to the same degree of accuracy as during the calibration period (Jensen and Jørgensen, 1988).

The model's ability to simulate the streamflow depletion caused by groundwater abstraction from the regional confined aquifer was tested on historical data from the Køge Å catchment. Fig. 11 shows simulated streamflow assuming the actual groundwater abstraction from the Regnemark Waterworks starting in 1964, Qsim, and assuming no abstraction from Regnemark, Q1sim. The recorded streamflow fits reasonably well with Qsim. The difference Q1sim - Qsim, which is the simulated streamflow depletion caused by the increased groundwater abstraction, is seen to have a clear seasonal variation, with smaller depletion during the dry summer periods and larger depletion during the wet winter season.
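The depletion analysis behind Fig. 11 can be sketched as follows: the model is run with and without the abstraction, the two simulated discharge series are differenced, and the difference is smoothed with a 15-day moving average. The series below are placeholders, not model output.

# Minimal sketch of the depletion calculation illustrated in Fig. 11.
def moving_average(series, window=15):
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

q_sim  = [2.1, 2.0, 1.9, 1.8, 1.7] * 20     # simulated discharge with abstraction [m3/s]
q1_sim = [2.4, 2.3, 2.2, 2.1, 2.0] * 20     # simulated discharge without abstraction [m3/s]

depletion = [a - b for a, b in zip(q1_sim, q_sim)]   # streamflow depletion [m3/s]
depletion_smoothed = moving_average(depletion, window=15)
print(depletion_smoothed[:5])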
Fig. 11 Comparison of 15 days moving average streamflows for Køge Å (lower) and the relative streamflow depletion caused by the groundwater abstraction (upper)
Discussion - post evaluation

Most other catchment models existing when the Suså model code was developed were either purely rainfall-runoff models of the lumped conceptual type, such as the classical Stanford Watershed Model (Crawford and Linsley, 1966), the HBV (Bergström and Forsman, 1973; Bergström, 1976) and the NAM (Nielsen and Hansen, 1973), or purely groundwater models (Prickett and Lonnquist, 1971; Thomas, 1973). A few authors had concluded that coupled groundwater/surface water modelling was essential (e.g. Luckner, 1978; Lloyd, 1980) and some had outlined specific, but not yet operational, concepts (e.g. Freeze and Harlan, 1969; Wardlaw, 1978; Jønch-Clausen, 1979). In some studies groundwater models and rainfall-runoff models were used for the same catchment, but without coupling (e.g. Weeks et al., 1974). Thus, apparently no other model had previously been used to dynamically simulate coupled groundwater/surface water conditions at catchment scale (rainfall, evapotranspiration, near-surface runoff, groundwater recharge, groundwater heads, baseflow discharge from aquifers to streams).

During the decade following [1] and [2] a few model codes with integrated groundwater/surface water descriptions emerged. The most prominent of these was the SHE (Abbott et al., 1986a, b) and its operational daughter codes, MIKE SHE from DHI (Refsgaard and Storm, 1995) and SHETRAN from the University of Newcastle (Bathurst and O'Connell, 1992), both of which are still used today, although in later versions. Other operational models from that period were described by Miles and Rushton (1983), Christensen (1994) and Wardlaw et al. (1994). Miles and Rushton (1983) used a simpler root zone and surface water component than [1] together with a two-dimensional finite difference groundwater model and monthly time steps. Christensen (1994) developed a model for the Tude Å catchment (a neighbour to Suså) that was conceptually similar to, and a little simpler than, [1]. Wardlaw et al. (1994) used the concepts outlined in Wardlaw (1978), coupling the Stanford Watershed Model with a finite-difference groundwater model and a channel routing model for simulation of discharge and groundwater levels in the Allen catchment in England.

During the past decade the number of integrated modelling codes has exploded. The existing codes today can be considered to fall into three classes: (a) fully integrated codes such as MIKE SHE (Graham and Butts, 2005); (b) couplings of existing groundwater codes and surface water codes such as MODFLOW and SWAT (Perkins and Sophocleous, 1999); and (c) codes based on the fully three-dimensional Richards' equation (Panday and Huyakorn, 2004). Independent reviews of the scientific basis and practical applicability of a number of recent integrated model codes are provided by e.g. Kaiser-Hill (2001) and Tampa Bay Water (2001).

A major novelty of [1] and [2] was that the Suså model code was one of the first codes to integrate surface water and groundwater descriptions, and the first of its kind applied operationally to moraine landscapes. The model results were unique with respect to simulation of the dynamics of the groundwater/surface water interaction, as for instance reflected by the annual hydraulic head fluctuations and the streamflow depletion due to the groundwater abstraction. Furthermore, the study provided new insights and understanding of the mechanisms that govern streamflow depletion due to groundwater abstraction from confined aquifers in moraine catchments.
In contrast to the traditional type curve analyses, which were used extensively in hydrogeology to analyse pumping tests and to predict the effects of abstractions, [1] and [2] were based on non-stationary analysis which, as evident from the annual variations of streamflow depletion shown in Fig. 11, turns out to be crucial. The only modelling study from the following decade that considered the dynamics of the stream-aquifer interaction in moraine catchments
in connection with groundwater abstraction was Christensen (1994), who basically confirmed the results of [2].

The spatial distribution and the degree of physical basis differ between the four components of the Suså model. The groundwater model can be characterised as distributed physically-based, the aquitard model as semi-distributed physically-based and the phreatic aquifer and root zone models as semi-distributed conceptual. In contrast to, for instance, the later SHE code (Abbott et al., 1986a, b), the Suså model code was not generic, because it could not be applied to other catchments without changes in the code. Furthermore, it was tailored to the specific hydrological conditions prevailing in the Suså catchment and could for instance not be applied to an alluvial unconfined aquifer.

In retrospect, it is interesting to observe that issues related to the credibility of model simulations were not critically analysed or discussed in [1] and [2]. First of all, aspects of code verification were not dealt with in the publications, although a major novelty of the work was the development of a completely new code. Secondly, and maybe more surprisingly, model validation and uncertainty assessment of model simulations were hardly addressed. By using all the available groundwater head data for calibration, the opportunity to make split-sample validation tests against parts of the data, or even the unique opportunity to calibrate on data before the groundwater abstraction and validate on data after the abstraction (differential split-sample test according to Klemes (1986)), was not utilised. By not addressing the uncertainty and by not conducting rigorous validation tests, the reader may be left with the undocumented impression that the curve fitting in Fig. 10 is supposed to reflect the predictive capability of the model. That the model proved to perform well in a subsequent post-audit validation study could not be known at the time of [1] and [2].

The other integrated groundwater/surface water modelling studies from the following decade (Miles and Rushton, 1983; Christensen, 1994; Wardlaw et al., 1994) had the same characteristics, i.e. a focus only on calibration and model prediction, but no mention of verification of the new model codes, no model validation tests against independent data and no uncertainty assessments. The SHE study reported by Bathurst (1986a, b), focussing on surface water hydrology, did include split-sample validation testing and sensitivity analysis. For surface water (rainfall-runoff) modelling studies focusing more on model applications than code developments, split-sample testing was more common (e.g. Bergström, 1976; WMO, 1975; WMO, 1988), but uncertainty assessment was not systematically carried out and usually not even considered until Beven called for it (Beven, 1989; Beven and Binley, 1992). Altogether, this illustrates a very significant development in modelling practice during these three decades.
3.1.2 Application of SHE to catchments in India ([4], [5])
Summary

The publications [4] and [5] describe the set-up, calibration and validation of the 'Système Hydrologique Européen' (SHE) code for six sub-catchments, totalling about 15,000 km2, of the Narmada basin in India (Fig. 12). The objective of the papers was to describe experiences from applying a distributed physically-based code like SHE to large basins with rather limited data coverage compared to previous SHE applications to research catchments. In contrast to the Suså study in [1] and [2], the India study did not include any code development, except for data processing utility software. Instead it comprised application of an existing code (Abbott et al., 1986a,b) to conditions that were far beyond those for which the SHE had previously been tested in terms of catchment size, data coverage and hydrological regime (Bathurst, 1986a).
Fig. 12 Location map for the Narmada and the six sub-catchments.
In terms of application, the study focused on simulation of catchment runoff, i.e. surface water aspects only. The model structure was as illustrated in Fig. 17. The groundwater zone was, however, represented by only one layer, i.e. a two-dimensional groundwater model, and there were no data from observation wells to allow a calibration of the groundwater part of the model. The six models were set up with a 2 km x 2 km computational grid. A split-sample approach was used, with typically three years for model calibration and another three years for the subsequent model validation.
The data requirements for a SHE-based model are substantial and much larger than for a rainfall-runoff model of the lumped conceptual type that previously had been applied to such types of catchments. A major challenge of the study was therefore to identify, collect and process data and to check their quality. Data were collected from more than 15 different agencies belonging to many different ministries, and the data quality varied substantially.

Another challenge was how to assess parameter values in a distributed model when data, in contrast to the previous tests on small experimental catchments like in Bathurst (1986a), are scarce. Each of the grid points in a distributed model is characterised by one or more parameters. Although the parameter values in principle (as in nature) vary from grid point to grid point, it is neither feasible nor desirable to allow the parameter values to vary so freely. Instead, a given parameter should only reflect the significant and systematic variation described in the available field data. Therefore a parameterisation procedure was developed, where representative parameter values were associated with individual soil types, vegetation types, geological layers, etc. This process of defining the spatial pattern of parameter values effectively reduced the number of free parameter coefficients that need to be adjusted in the subsequent calibration procedure.

For example, the 820 km2 Kolar catchment is parameterised into three soil classes and 10 land use/soil depth classes. For the soil type classes calibration was allowed for the hydraulic conductivity in the unsaturated zone (for each soil type class the conductivity could vary among three different land uses, giving nine parameter values). For the land use/soil depth classes the calibration parameters comprised soil depths (10 parameters in total) and the Strickler overland flow coefficients for four land use types (four parameters in total). A further three parameters were subject to calibration (hydraulic conductivity in the saturated zone, an (empirical) by-pass coefficient and a surface retention parameter; all kept constant throughout the catchment). Although the 26 calibration parameters could not be assessed from field data alone, but had to be modified through calibration, the physical realism of the parameter values resulting from the subsequent calibration procedure could be evaluated from available field data.

The simulation results are illustrated in Fig. 13 as hydrographs for the largest sub-catchment and in Fig. 14 as annual runoff and annual peaks for all six sub-catchments. In both figures the results are for the validation periods, where results are slightly poorer as compared to the calibration periods. In [4] the rainfall-runoff simulation results were characterised as having the same degree of accuracy as would have been expected with simpler hydrological models of the lumped conceptual type. The results therefore suggested that application of complex, data demanding models like the present SHE approach is not justified in cases where the modelling objective is limited to simulation of catchment runoff and where observed runoff records exist for calibration purposes. No attempts were made in the study to test the capability of a model without calibration.
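The parameterisation procedure described above can be illustrated with a minimal sketch: class values are the only free calibration parameters and are expanded onto the model grid via classified soil and land-use maps. Class names, values and the tiny grid below are illustrative assumptions, not those of the Kolar model.

# Minimal sketch: parameter values tied to soil and land-use classes.
# Free calibration parameters, one value per class (3 soil classes x 2 land uses).
unsat_conductivity = {                     # m/s, per (soil class, land use); assumed values
    ("sandy", "forest"): 2.0e-5, ("sandy", "agriculture"): 1.5e-5,
    ("loamy", "forest"): 8.0e-6, ("loamy", "agriculture"): 6.0e-6,
    ("clayey", "forest"): 1.0e-6, ("clayey", "agriculture"): 8.0e-7,
}

# Classified maps for a (tiny) model grid.
soil_map = [["sandy", "loamy"],
            ["clayey", "loamy"]]
landuse_map = [["forest", "agriculture"],
               ["forest", "forest"]]

# Expand the handful of free parameters to a distributed parameter field.
conductivity_field = [
    [unsat_conductivity[(soil_map[i][j], landuse_map[i][j])]
     for j in range(len(soil_map[0]))]
    for i in range(len(soil_map))
]
print(conductivity_field)
# During calibration only the class values are adjusted; the grid field is rebuilt from
# them, keeping the spatial pattern dictated by the soil and land-use maps.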
After the first calibration and validation tests had been made, field investigations were carried out in the Kolar catchment during a 2½ week period to improve the parameter estimates, mainly for soil and vegetation parameters, and to evaluate the importance of additional field data. Subsequently, the Kolar model was recalibrated in such a way that rather narrow constraints were put on the range of values allowed for the key parameters. The final model, based on the additional data, produced simulation results of the same quality as the preliminary model with respect to the simulated hydrograph. Although it is argued in [5] that the final model is believed to give an improved physical representation of the hydrological regime, it is concluded that a good match between observed and simulated outlet hydrographs does not provide a sufficient guarantee of a hydrologically realistic process description.
Fig. 13 Observed and simulated hydrographs for the Narmada at Manot during the validation period 1985 and 1987.
Fig. 14 Simulated monthly runoff during monsoon season (left) and simulated annual peak discharge compared with measured values during validation periods for all six sub-catchments.
Discussion - post evaluation

At the time of [4] and [5], lumped conceptual catchment model codes such as HBV (Bergström, 1992) and NAM (Jønch-Clausen and Refsgaard, 1984) had been used operationally for two decades, typically for catchments ranging from a few km2 to more than 10,000 km2. At the same time distributed physically-based models had mainly been tested on flood events on small catchments that typically had very good data due to experimental instrumentation (Loague and Freeze, 1985; Bathurst, 1986a; Grayson et al., 1992a,b; Troch et al., 1993). Loague and Freeze (1985) compared a quasi-physically based model with a regression model and a unit hydrograph model on three experimental catchments, the 0.1 km2 R-5, Chickasha, Oklahoma, the 7.2 km2 WE-38, Klingertown, Pennsylvania, and the 0.1 km2 HB-6, West Thornton, New Hampshire. Bathurst (1986a) applied the SHE to the simulation of flood events for the 10.6 km2 experimental Wye catchment in Wales. Grayson et al. (1992a,b) applied the THALES to the simulation of flood events for the 7.0 ha Wagga catchment in Australia and the 4.4 ha Lucky Hill catchment at the Walnut Gulch Experimental Area in Arizona. Troch et al. (1993) applied a model based on a three-dimensional numerical solution of Richards' equation to the 7.2 km2 WE-38 catchment and a 0.64 km2 subcatchment.

To my knowledge the only examples until then of distributed physically-based model studies including applications to catchments of several hundred km2 and continuous simulation over periods of several years were the coupled groundwater/surface water models discussed in the previous section ([1]; [2]; Miles and Rushton, 1983; Christensen, 1994; Wardlaw et al., 1994), which all had distributed physically-based groundwater components and lumped (or semi-distributed) conceptual surface water components, and some models such as WATBAL (Knudsen et al., 1986) that had semi-distributed surface water components and lumped conceptual groundwater components. During the following few years a few additional catchment scale studies with continuous simulations of distributed physically-based models emerged. One example is Querner (1997), who applied the MOGROW to the 6.5 km2 Hupselse Beek catchment, simulating both discharge and groundwater head dynamics. Another example is Kuchment et al. (1996), who simulated surface water processes for the 3315 km2 Ouse catchment. The study of Kuchment et al. (1996) had many similarities with [4] and [5] with respect to model conceptualisation and conclusions.

The main scientific contribution of [4] and [5] was therefore that they were the first study to demonstrate that distributed physically-based models could be established for catchments of this size and with ordinary data availability. Previous studies reported in the literature had either been tests on small research catchments or models with major components of the lumped conceptual type. As outlined above, it is worth noting the different traditions in the communities that had dealt with (large scale) lumped conceptual models, (small scale) physically-based models and groundwater models, respectively. I believe that an important characteristic of the team who performed the present study ([4] and [5]) was that it comprised scientists who together had comprehensive experience from all these communities.

Another key contribution was the parameterisation approach introduced. The point of departure for this approach, e.g.
[1] and Bathurst (1986a), was an approach allowing parameter values to vary as required to fit the observed data during the calibration phase. This approach had been criticised by Beven (1989) as resulting in overparameterisation. The procedure resulted in 26 parameters to be calibrated for the Kolar catchment. Although this number is significantly less than e.g. the number of free parameters
in [1], it is still very high, and it is very likely that a sensitivity analysis would have shown that this number could easily be reduced without loss of model performance. It is interesting to note that similar parameterisation approaches reported for other catchments in 1997 ([7]) and 2001 (Andersen et al., 2001) resulted in 11 and 4 free parameters, respectively, implying that the parameterisation approach adopted in [4] and [5] was not yet fully developed.

Beven (1989) had provided a fundamental critique of the way physically-based models such as the SHE had been promoted by e.g. Abbott et al. (1986a) and Bathurst (1986a). His main critique was that the attitudes in these early SHE papers were not realistic with respect to the abilities and achievements of physically-based models. Beven pointed, among other things, to the following key problems:
• The process equations are simplifications leading to model structure uncertainty.
• Spatial heterogeneity at subgrid scale is not included in the physically-based models. The current generation of distributed physically-based models are in reality lumped conceptual models.
• There is a great danger of overparameterisation if it is attempted to simulate all hydrological processes thought to be relevant and to calibrate the related parameters against observed discharge data only.

As a conclusion, Beven argued that for future applications attempts must be made to obtain realistic estimates of the uncertainty associated with model predictions, particularly in the case of evaluating future scenarios of the effects of management strategies.

[4] noted some of Beven's critique, acknowledging that the process representation at the 2 km x 2 km grid squares causes significant violations of some of the process descriptions, that "some degree of lumping and conceptualisation has taken place at the grid scale" and that "scale problems are important". [4] stressed, however, that in spite of these acknowledged limitations "the present basin model is much more physically based and distributed than the traditional lumped conceptual model, where the entire catchment is represented in effect by one grid square, and where the process representations due to averaging over characteristics of topography, soil type and vegetation type are fundamentally different from the basic physical laws". [4] and [5] concluded that the SHE is a suitable tool to support water management for conditions in India. In contrast to this, Beven (1989) had stated that physically-based models "are not well suited to applications to real catchments".

In retrospect, it is remarkable that [4] and [5] did not engage more substantially with the very fundamental critique raised by Beven (1989). For instance [4] and [5] did not comment at all on Beven's main conclusion on the need for uncertainty assessment, although [5] actually used the model to study the impact of soil and land use by performing sensitivity analyses. A more comprehensive response and dialogue took place a few years later (Beven, 1996a; Refsgaard et al., 1996; Beven, 1996b). Seen in the perspective of present protocols for good modelling practice ([12] and [13]), the approach and conclusions in [4] and [5] are especially deficient in their lack of focus on uncertainty assessment.
A main reason for the lack of dialogue with Beven's critique and the lack of focus on uncertainty in [4] and [5] may be that we were too preoccupied with the real achievement of being the first to set up and run this type of model for such large catchments. Another reason may be that some of us had a background in groundwater modelling, where large scale distributed physically-based models had been successfully used to support practical water resources management for more than a decade, so we considered Beven's statement that physically-based models "are not well suited to applications to real catchments" a gross exaggeration.
3.1.3 Intercomparison of different types of hydrological models ([6])
Summary

The research study reported in publication [6] had two objectives. The first objective was to identify a rigorous framework for the testing of model capabilities for different types of tasks. The second objective was to use this theoretical framework in an intercomparison study involving application of three model codes of different complexity to a number of tasks, ranging from traditional simulation of stationary, gauged catchments to simulation of ungauged catchments and of catchments with non-stationary climate conditions. Data from three catchments in Zimbabwe were used for the tests. The three codes used in the study were (a) NAM (Nielsen and Hansen, 1973; Havnø et al., 1995) – Fig. 15; (b) WATBAL (Knudsen et al., 1986) – Fig. 16; and (c) MIKE SHE (Abbott et al., 1986a,b; Refsgaard and Storm, 1995) – Fig. 17. The NAM and MIKE SHE can be characterised as very typical of their lumped conceptual and distributed physically-based types, respectively, while the WATBAL with its semi-distributed approach falls in between these two standard classes.
Fig. 15 Structure of the NAM rainfall-runoff model code
Fig. 16 Structure of the WATBAL code.
Fig. 17 Schematic representation of the model structure of the ‘Système Hydrologique Européen’ (SHE) code.
The three catchments in Zimbabwe that were selected for the tests were Ngezi-South (1090 km2), Lundi (254 km2) and Ngezi-North (1040 km2). For two of the catchments the model simulations started with a blind simulation, i.e. a simulation where no calibration was conducted, but where model parameters were assessed directly from field data and indirectly by considering parameter values in the first catchment (proxy-basin test). Then one year of data was made available for calibration, and finally the full calibration period of 4-5 years was used. In all cases an independent period was used for validation tests (split-sample test). The hydrological regime in Zimbabwe is semi-arid and characterised by very large interannual variations. It was therefore possible to construct a test scheme in such a way that a model's ability to predict the effects of differences in climate input could be tested by calibrating on a dry period and validating on a wet period, or vice versa (differential split-sample test).
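The test scheme can be summarised in a small sketch of which data are used for calibration and which are reserved for validation in each test type (after Klemes, 1986). The catchment/period pairings below are illustrative, not the exact design used in [6].

# Minimal sketch of the hierarchical validation testing scheme.
tests = {
    # stationary, gauged: calibrate and validate on different periods of the same catchment
    "split_sample":              {"calibrate": ("Ngezi-South", "period 1"),
                                  "validate":  ("Ngezi-South", "period 2")},
    # non-stationary conditions: calibrate on a dry period, validate on a wet one (or vice versa)
    "differential_split_sample": {"calibrate": ("Lundi", "dry years"),
                                  "validate":  ("Lundi", "wet years")},
    # ungauged: transfer parameters from a gauged catchment, validate on the target catchment
    "proxy_basin":               {"calibrate": ("Ngezi-South", "period 1"),
                                  "validate":  ("Ngezi-North", "period 1")},
}

for name, spec in tests.items():
    cal_basin, cal_period = spec["calibrate"]
    val_basin, val_period = spec["validate"]
    print(f"{name}: calibrate on {cal_basin} ({cal_period}) -> validate on {val_basin} ({val_period})")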
The model performance was evaluated for annual runoff and for criteria focussing on the shape of the discharge hydrograph, i.e. rainfall-runoff modelling; an illustrative example of such criteria is sketched after the conclusions below. The modelling work was carried out by three different persons/teams that were very experienced in applying their respective model codes.

A general conclusion from the study was that the performances of the three codes were surprisingly similar. Thus, the ability of WATBAL and SHE to explicitly utilise data such as topography, soil and vegetation data, which the NAM could not use, turned out to make no significant difference in most cases. In summary the conclusions were:
• Given a few (1–3) years of runoff measurements, a lumped model of the NAM type would be a suitable tool from the point of view of technical and economic feasibility. This applies to catchments with homogeneous climatic input as well as cases where significant variations in the exogenous input are encountered.
• For ungauged catchments, however, where accurate simulations are critical for water resources decisions, a distributed model is expected to give better results than a lumped model if appropriate information on catchment characteristics can be obtained.
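As an illustration of hydrograph-oriented performance criteria (the exact criteria used in [6] are not listed in the text above), the following sketch computes the Nash-Sutcliffe efficiency and the relative annual runoff error on placeholder data.

# Illustrative example only; the data and the choice of criteria are assumptions.
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

def annual_runoff_error(observed, simulated):
    """Relative error on total runoff volume."""
    return (sum(simulated) - sum(observed)) / sum(observed)

obs = [5.0, 12.0, 30.0, 18.0, 7.0, 3.0]     # placeholder monthly runoff [mm]
sim = [6.0, 10.0, 27.0, 20.0, 8.0, 2.5]
print(nash_sutcliffe(obs, sim), annual_runoff_error(obs, sim))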
Discussion - post evaluation

A scientific contribution of [6] was the adoption and demonstration of Klemes's model validation testing scheme, which had not been much used since the basic idea was published by Klemes (1986). This is discussed further in Section 4.2.4. Furthermore, the results from the intercomparison contributed to the ongoing scientific discussion on which types of model codes should be recommended for which application purpose. Only a few intercomparison studies involving different model types had been reported in the literature, and only two studies included physically-based models (Loague and Freeze, 1985; Michaud and Sorooshian, 1994). Most of these previous studies had been conducted on small research catchments and none of them had included tests for non-stationary climate conditions as in [6].

From the emergence of the distributed physically-based models it was widely stated and believed that these new model types generally would be able to provide more accurate simulation of the hydrological cycle (Abbott et al., 1986a). In the absence of hard facts from suitable tests, the scientific debate had to a very large extent been based on expectations and qualitative arguments, namely that models with more physical basis in their model structure were assumed to be able to provide more accurate simulation results, or the opposite view, as e.g. advocated by Beven (1989), that such expectations of the superior performance of the physically-based models were unrealistic. In [4] we basically agreed with Beven (1989) with respect to the SHE's capability to simulate discharge for large scale catchments with ordinary data, i.e. that the rainfall-runoff simulation results were of the same degree of accuracy "as would have been expected" with simpler hydrological models of the lumped conceptual type.

With the results from [6] it was now possible to conclude more firmly that if the purpose of modelling is limited to simulation of runoff under stationary catchment conditions and if data exist for calibration purposes, there is no scientifically documented reason to go beyond lumped conceptual models. This issue has been subject to several studies since then, where the conclusions from [6] basically have been confirmed (e.g. Perrin et al., 2001; Reed et al., 2004). I believe that the only thing that may change that conclusion is the introduction of new spatial data from new airborne or satellite sensors. Whereas these new data types have proven to have great value for many hydrological purposes and for special
conditions (e.g. snow cover), it has in general not yet been documented that they can provide distributed models with comparative advantages in the simulation of catchment runoff.
3.2 Reactive Transport
3.2.1 Oxygen transport and consumption in the unsaturated zone ([3])
Summary

Publication [3] describes the development of a new code for simulation of oxygen transport and consumption in the unsaturated zone. The code was linked as a sub-component to the SHE modelling system (Abbott et al., 1986a,b). The objective of the paper was to describe the new process formulation, document its applicability through two case studies and outline the perspectives in relation to its use as part of the comprehensive SHE code.

The unsaturated zone water flow calculations in SHE were based on a finite difference solution to the full Richards' equation for unsteady soil water flow. The solute transport calculations were based on the traditional convection-dispersion equation. The new code for oxygen transport and consumption was an add-on to these first two steps and used information on soil moisture content, water flows and solute concentrations and fluxes as input. Thus the spatial representation is given by the underlying flow and solute transport discretisation, implying a one-dimensional description with spatial resolution ranging from a few cm close to the terrain to 20-40 cm further down in the soil column.

The process description in [3] is based on a three-phase system (soil, water, air) and accounts for spatial heterogeneity at this small scale. Fig. 18 shows a microscale illustration of the soil. Air tends to fill the larger pores in the soil matrix, whereas water is drawn into the narrow necks and finer pore spaces in aggregates, forming capillary films and wedges. The air and water coexist in the soil by occupying different geometric configurations. Oxygen movement within these different portions of the pore space can occur by: convective transport in the water, diffusion in water, convective transport in soil air, diffusion in soil air, diffusion into water-saturated soil crumbs, and consumption in free and fixed water.

Microorganisms and plant roots are generally found in the finer pores of the soil because they require close contact with the soil particles for uptake of substrate and nutrients. Transport of oxygen to these respiring sites usually occurs in the water phase of soil crumbs. It is the rate of oxygen diffusion through this fixed water in micropores that determines the availability of oxygen for respiration and the anaerobic fraction of the soil. A soil crumb is considered to be any fully water-saturated subvolume of soil, the physical size of which is determined by the nearness of air-filled soil pores. The crumb is thus defined by the fact that oxygen transport within the crumb is primarily due to diffusion in water-filled pores. The size of the soil crumbs depends on the water content of the soil and the corresponding number of air-filled pores. The relation between soil water content and the size of the water-saturated crumbs is derived from the soil water retention curve that is already used in Richards' equation. The idea behind this is illustrated in Fig. 19 and described in more detail in [3]. The number of air-filled pores at a given soil moisture content can be
calculated from the retention curve (Fig. 19b). It is furthermore assumed that the distance between two air-filled pores, di, corresponds to the average diameter of a water-saturated crumb (Fig. 19a).
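A minimal sketch of this idea is given below; it is not the formulation in [3]. The capillary relation, the assumed pore-size distribution and all numbers are illustrative assumptions introduced only to show how a crumb diameter can follow from the retention curve and the current water content.

# Minimal sketch, under simplifying assumptions: air-filled pores are spread evenly over
# a unit area L x L, so their spacing (taken as the crumb diameter) is L / sqrt(n_air).
import math

def pore_radius_from_tension(psi_cm):
    """Capillary relation: radius [cm] of the largest water-filled pore at tension psi [cm H2O]."""
    return 0.15 / psi_cm          # 2*sigma*cos(alpha)/(rho*g) is roughly 0.15 cm^2 for water

def pores_per_cm2(r):
    """Assumed cumulative pore-size distribution: number of pores per cm2 with radius > r [cm]."""
    return 50.0 * math.exp(-r / 0.002)

def crumb_diameter(theta, retention_curve, L=1.0):
    """Average water-saturated crumb diameter [cm] at water content theta.
    retention_curve: list of (theta, psi_cm) pairs."""
    # tension corresponding to the current water content (nearest point on the curve)
    psi = min(retention_curve, key=lambda tp: abs(tp[0] - theta))[1]
    r_drained = pore_radius_from_tension(psi)
    n_air = pores_per_cm2(r_drained) * L * L   # number of air-filled pores within L x L
    if n_air < 1:
        return L                               # hardly any air-filled pores: crumb spans the unit
    return L / math.sqrt(n_air)                # spacing between air-filled pores = crumb diameter

retention = [(0.40, 10.0), (0.35, 30.0), (0.30, 80.0), (0.25, 200.0)]   # assumed curve
print(crumb_diameter(theta=0.30, retention_curve=retention))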
Fig. 18 Microscale representation of the three-phase soil system with respect to oxygen transport.
Fig. 19 (a) The assumed pore distribution within the unit L x L. (b) Retention curve showing the relation between tension, water content and pore radius of a soil.

The two case studies where the model code was tested and demonstrated dealt with operation of a waste water infiltration plant and assessment of anaerobic zones of importance for denitrification in agricultural soils.
Discussion - post evaluation

Previous research on oxygen transport processes in heterogeneous soils (e.g. Currie, 1961; Smith, 1980; Troeh et al., 1982) was based on the assumption of steady-state conditions with regard to crumb/aggregate size and aerobic-anaerobic fractions. The novel scientific contribution of this paper was the new concept of calculating the size of the water-saturated crumbs as a function of the water retention curve and the time-varying soil moisture content originating from SHE calculations, and the linking of this concept to the previous research in this field. In this way it became possible to calculate aerobic-anaerobic fractions dynamically. Although the scale of consideration in this study is the smallest possible in a catchment modelling perspective, namely point or column scale, it illustrates that smaller scale phenomena (here diffusion into soil crumbs that are of mm size or less and temporally varying) often dominate the oxygen conditions at grid (cm - dm) scale. The approach in [3] is an upscaling from grain size to computational model grid point, where the within-grid heterogeneity is accounted for by developing a set of process equations that includes the effect of the smaller scale heterogeneity at the larger grid scale.

In retrospect, it is interesting to consider the issues that were not discussed in [3]. In this respect it should be noted that code verification aspects were not mentioned in [3], although a completely new code was developed. Furthermore, [3] did not discuss the issue of upscaling the present grid scale processes to application at catchment scale. Interesting issues in this regard would be evaluations of how data and parameter values could be assessed for catchment scale applications, and discussions of whether it would still be the mm-scale (crumb) processes that would dominate when simulating at large scale, or whether larger scale heterogeneities, such as differences in crops, soil types or topography, would become more important and thus reduce the importance of the present process description.

The model code presented in [3] was developed in a 'research version' of the SHE code. After the completion of the study it was not upgraded to become part of the 'commercial version' of MIKE SHE that emerged a few years later. The oxygen model has not been used for practical purposes. To my knowledge, a process description of the same detail as in [3] has not been included in any catchment model, and not even in the most comprehensive physically-based root zone models such as DAISY (Hansen et al., 1991; Abrahamsen and Hansen, 2000). DAISY, which provides state-of-the-art descriptions of root zone processes with focus on water, plant growth and nitrogen, uses a much simpler and more empirical process formulation for calculating denitrification as a function of anaerobic subsoil conditions.
3.2.2 An integrated model for the Danubian Lowland ([9])
Summary

Publication [9] is concerned with environmental assessment studies in connection with the Gabcikovo hydropower scheme along the Danube. The objective of the underlying study was to develop and apply a comprehensive integrated modelling system to support management decisions in this respect. The Danubian Lowland (Fig. 20) in Slovakia and Hungary downstream of Bratislava is an inland delta formed in the past by river sediments from the Danube. The entire area forms an alluvial aquifer, which throughout the year receives around 30 m3/s of infiltration water from the Danube in the upper parts of the area and returns it to the Danube and the drainage canals in the downstream part. The aquifer is an important water resource for municipal and agricultural water supply, and the floodplain area with its alluvial forests and associated ecosystems represents a unique landscape of outstanding ecological importance.
Fig. 20 The Danubian Lowland with the new reservoir and the Gabcikovo hydropower scheme.
The Gabcikovo hydropower scheme was put into operation in 1992. A large number of hydraulic structures were established as part of the hydropower scheme. The key structures are a system of weirs across the Danube at Cunovo 15 km downstream of Bratislava, a reservoir created by the damming at Cunovo, a 30 km long lined navigation canal, outside the floodplain area, parallel to the Danube River
with intake to the hydropower plant, a hydropower plant and two ship-locks at Gabcikovo, and an intake structure at Dobrohost, 10 km downstream of Cunovo, diverting water from the new canal to the river branch system. The entire scheme has significantly affected the hydrological regime and the ecosystem of the region. The scheme was originally planned as a joint effort between former Czecho-Slovakia and Hungary, and the major parts of the construction were carried out as such on the basis of an international treaty from 1977. However, since 1989 Gabcikovo has been a major matter of controversy between Slovakia and Hungary, who have referred some disputed questions to the International Court of Justice in The Hague (ICJ, 1997).

The hydrological regime in the area is very dynamic, with so many crucial links and feedback mechanisms between the various parts of the surface and subsurface water regimes that no single existing model code was able to describe the entire regime. Therefore, the modelling system illustrated in Fig. 21 was established. It integrates four model codes: (a) MIKE 21 (DHI, 1995) for describing the reservoir (2D flow, eutrophication, sediment transport); (b) MIKE 11 (Havnø et al., 1995) describing the river and river branches (1D flow including effects of hydraulic control structures, water quality, sediment transport); (c) MIKE SHE (Refsgaard and Storm, 1995) describing the groundwater (3D flow, solute transport, geochemistry) and flood plain conditions (dynamics of inundation pattern, groundwater and soil moisture conditions); and (d) DAISY (Hansen et al., 1991) describing agricultural aspects (crop yield, irrigation, nitrogen leaching). The interfaces between the various models were:
Fig. 21 Structure of the integrated modelling system with indication of the interactions between the individual models
A) MIKE SHE forms the core of the integrated modelling system having interfaces to all the individual modelling systems. The coupling of MIKE SHE and MIKE 11 is a fully dynamic coupling where data is exchanged within each computational time step.
B) Results of eutrophication simulations with MIKE 21 in the reservoir are used to estimate the concentration of various water quality parameters in the water that enters the Danube downstream of the reservoir. This information serves as boundary conditions for water quality simulations for the Danube using MIKE 11.
C) Sediment transport simulations in the reservoir with MIKE 21 provide information on the amount of fine sediment on the bottom of the reservoir. The simulated grain size distribution and sediment layer thickness is used to calculate leakage coefficients, which are used in ground water modelling with MIKE SHE to calculate the exchange of water between the reservoir and the aquifer.
D) DAISY simulates vegetation parameters that are used in MIKE SHE to simulate the actual evapotranspiration. Ground water levels simulated with MIKE SHE act as lower boundary conditions for DAISY unsaturated zone simulations. Consequently, this process is iterative and requires several model simulations.
E) Results from water quality simulations with MIKE 11 and MIKE 21 provide estimates of the concentration of various components/parameters in the water that infiltrates to the aquifer from the Danube and the reservoir. This can be used in the ground water quality simulations (geochemistry) with MIKE SHE.
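The fully dynamic coupling in interface A can be illustrated by a minimal sketch of two toy model components exchanging interface variables within each time step. The classes and the toy process equations below are illustrative stand-ins, not the actual MIKE SHE/MIKE 11 coupling interface.

# Minimal sketch of a per-time-step exchange between a river model and a groundwater model.
class RiverModel:
    def __init__(self):
        self.stage = 2.0                              # river water level [m]
    def step(self, baseflow_inflow, dt):
        self.stage += 0.001 * baseflow_inflow * dt    # toy stage response to baseflow
        return self.stage

class GroundwaterModel:
    def __init__(self):
        self.head = 3.0                               # groundwater head near the river [m]
    def step(self, river_stage, dt):
        exchange = 0.5 * (self.head - river_stage)    # leakage driven by the head difference
        self.head -= 0.01 * exchange * dt
        return exchange                               # positive = baseflow to the river

river, groundwater = RiverModel(), GroundwaterModel()
dt = 1.0                                              # one computational time step [days]
for step in range(5):
    # within each time step the two models exchange their interface variables
    baseflow = groundwater.step(river.stage, dt)
    stage = river.step(baseflow, dt)
    print(f"step {step}: baseflow {baseflow:.3f}, stage {stage:.3f}, head {groundwater.head:.3f}")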
The integrated model was established for the 3,000 km2 area on the basis of a large amount of good quality data. Most of the model parameters were assessed directly from field data, and some were estimated through calibration. For most of the individual model components, traditional split-sample validation tests were carried out. The modelling system was used in a scenario approach to assess the environmental impacts of alternative water management options. The uncertainties of the model predictions were assessed through sensitivity analyses.

As an example, Figs 22 and 23 show a characterisation of the floodplain area between the (old) main Danube river channel (western model boundary) and the power canal for pre-dam (Fig. 22) and a hypothetical post-dam condition (Fig. 23), where the major part of the water is diverted from the main Danube channel to the power canal. The classes with different groundwater depths and flooding have been determined from ecological considerations according to requirements of (semi)terrestrial (floodplain) ecotopes. For the pre-dam condition (Fig. 22) the contacts between the main Danube river and the river branch system are clearly seen. The corresponding results for a hypothetical post-dam water management regime (Fig. 23) show significant differences in hydrological regime; e.g. many areas are characterised by high groundwater tables and small/seldom flooding, while the pre-dam situation (Fig. 22) generally has deeper groundwater tables and more frequent flooding. From such changes in hydrological conditions inferences can be made on possible changes in the floodplain ecosystem.
Fig. 22 Hydrological regime in the river branch area for 1988 pre-dam conditions characterised in ecological classes
Fig. 23 Hydrological regime in the river branch area for a post-dam water management regime characterised in ecological classes. The scenario has been simulated using 1988 observed upstream discharge data and a given hypothetical operation of the hydraulic structures.
Discussion - post evaluation

The uniqueness of the established modelling system is the integration between the individual model codes, each of which provides a complex distributed physically-based description of the various processes. The validation tests were generally carried out for the individual models, whereas only a few tests on the integrated model were possible. Altogether, the integrated modelling system and the applications were more comprehensive and complex in terms of interactive dynamics between different components of an ecosystem than had previously been reported in the scientific literature.

In the years following [9] a few comprehensive large scale studies with coupled models emerged. The most comprehensive of these was probably Wolf et al. (2003), who developed the STONE for calculating nutrient emissions from agriculture in The Netherlands. Although based on different codes, the STONE resembles the integrated modelling system in [9] in terms of number of codes and complexity of process descriptions. One main difference, however, was that STONE consists of a chain of models without the feedback couplings that characterise [9]. Simpler, although still comprehensive, modelling systems were presented by Birkinshaw and Ewen (2000) as the SHETRAN code with a built-in nitrate transformation component and by Conan et al. (2003) with a coupling of SWAT, MODFLOW and MT3DMS, also focusing on nitrate fate at catchment scale.

The complexity of the modelling studies in [9] may be compared to coupled modelling studies in neighbouring fields. The hydrology-related field with the strongest modelling traditions is no doubt atmospheric science. Here very comprehensive coupled models have been used in connection with hydrology-oriented climate change studies. An example of a sequentially coupled atmospheric-hydrological model from that period is Graham (1999), who used the ECHAM4 regional atmospheric model coupled with the HBV hydrological model to simulate discharge for the entire 1.6 x 10^6 km2 Baltic Sea basin. The atmospheric modelling component is in itself more demanding in terms of computer power than comprehensive hydrological modelling such as [9], and the complexity of the atmospheric modelling is maybe larger than the complexity of the individual process model codes in [9]. Otherwise the complexity of the coupled atmospheric-hydrological studies with respect to feedback couplings between process descriptions, data requirements, different scales for different processes, etc., may be considered comparable to the complexity of [9].

In retrospect it is interesting to evaluate how much this comprehensive modelling system was actually used as part of the political decision process. Was the full potential of the models utilised by the decision makers? In the following, my personal perception of these aspects is presented. The application of the integrated modelling and information system in practice may be categorised in three principally different functions: (a) to assist in design of structures and details of water management regimes, (b) to assist in policy analysis by assessing the environmental impacts of alternative water management regimes, and (c) to assist in resolving different views between interest groups on environmental assessments. The use of models to assist in designs is the classical "engineering" way of using such models. There were a number of such applications.
The best example of this is the final design in 1993 of the guiding structures of the Cunovo reservoir, which was based on model simulations. Such model use was possible because the objectives of the decision-makers were clear and there was an urgent need for the results before the construction works actually started.
Use of models to assess the environmental impacts of alternative water management regimes was one of the primary reasons for establishing the modelling system. There were several examples of such model applications. A key example was a combined field and modelling study of the geochemical conditions in the aquifer to assess whether the changed boundary conditions with the new reservoir would affect the redox conditions, and hence the groundwater quality, in the aquifer that forms the basis for the water supply of Bratislava. Another example is a combined field and modelling study of the eutrophication conditions in the reservoir. Such studies were conducted in close dialogue with the decision-makers in order to assist in their policy formulation.

Finally, the modelling system was an invaluable tool in connection with the international attempts made to assist in resolving some of the issues that were disputed between Slovakia and Hungary. Many of the arguments brought forward on these highly controversial issues were mixtures of scientifically based facts and politically based views, but they were often claimed to be purely scientifically based. It is very natural and fully legitimate that all parties have political interests and do their best to pursue them. However, the mixing of scientific facts and political interests makes the whole scene less transparent and may be an obstacle to arriving at rational decisions. The role the modelling system had in this context was that it made it possible on some occasions to help distinguish between facts and fiction with respect to the scientific arguments. In this way the modelling tools assisted in separating scientific and political problems. Thus, the modelling system was often used as an important tool in resolving technical disagreements between the Slovakian and Hungarian delegations in the international expert groups (EC, 1992, 1993a, 1993b). Similarly, it is my impression that the modelling results played a significant role for the International Court of Justice when dealing with the question of whether the ecological situation could be characterised as a catastrophe justifying the use of the legal principle of "the ecological state of necessity", as invoked when Hungary stopped the construction works on the Gabcikovo scheme in 1989 (ICJ, 1997).

However, there were also clear limitations to the application of the modelling tools. These limitations occurred when the political objectives were not clearly defined. It was for instance envisaged that the modelling tools should be used to identify the optimal solution for the water management regime in the river branch system. This unique area is, however, subject to considerable interest from different sectors such as commercial forestry, fishery, tourism and nature conservation. The requirements of these different sectoral interests do not coincide and are in some cases even contradictory with respect to how the water regime should be managed. Thus, until the balance of interests between these different stakeholders has been decided in terms of clear political goals from the government, an optimal solution does not exist. Another example of lack of clear political goals was related to the overall sharing of water between hydropower and the environment.
3.2.3 Large scale modelling of groundwater contamination ([10])
Summary

Publication [10] describes results from an EU research project on groundwater pollution from non-point sources. The rationale outlined in [10] is that physically-based models for describing nitrate, due to their better process descriptions, may be expected to have better predictive capabilities than simpler empirical models for certain applications related to assessing the impacts of changes in agricultural management practice. Such models were well proven for simulation of nitrate contamination at small scale with good data availability. Two of the main constraints on using such models operationally were that (a) the databases existing at national or European scale had not previously been tested as input for such models; and (b) almost no tests had been conducted for such models at large scale. The objectives of the paper were therefore to study the data availability at the large scale and to develop methodologies for model upscaling/aggregation to represent conditions at larger scale.

The theoretical aspects of scaling included in [10] are dealt with in Section 4.1. Here some key results from one of the two catchments (Karup) are discussed. The modelling system used was MIKE SHE (Refsgaard and Storm, 1995) coupled with the DAISY root zone model (Hansen et al., 1991). Two Danish catchments of about 500 km2 each, Karup and Odense, were used for the tests. The principles used for collecting input data and assessing values of model parameters were:
• The data must be easily accessible. This implied that most of the data were aggregated data from national or European databases.
• No model calibration is carried out. Instead parameter values are estimated from generic transfer functions.

Data were collected from the following sources:
• Topography: 1 km grid data downloadable from USGS and GISCO (Geographical Information System of the European Commission)
• Catchment boundaries and river network: generated from the topographical data using standard GIS functionality
• River cross-sections: derived from a special GIS application where the cross-section was estimated based on upstream catchment area, slope and a characteristic discharge
• Soil type: GISCO soil map
• Soil organic matter: experience values
• Vegetation: EEA CORINE land cover map
• Agricultural management practice: agricultural statistics and government-prescribed norms
• Geology and groundwater abstraction: EC report
• Climatic variables and discharge data: national data

The MIKE SHE models were run with 1, 2 and 4 km grids. For describing the nitrate leaching from the root zone, 17 crop rotation schemes were established by use of DAISY. The crop rotations were based
on the statistical information on crop type and livestock densities. The 17 schemes were distributed randomly over the catchment in such a way that the statistical distribution was in accordance with the agricultural statistics (a minimal sketch of this random distribution is given after the validation results below). As an alternative, all the agricultural area was described by one representative crop instead of 17 cropping patterns. These two approaches are denoted 'Distributed' and 'Uniform' in Figs. 24 and 25 below.

The Karup model was validated by comparison of model simulations and field data on annual water balances, discharge hydrographs (Fig. 24) and nitrate concentrations in the upper groundwater layer from 35 observation wells (Fig. 25). The results of the validation tests were characterised as follows:
• The annual water balance was simulated remarkably well, with only a 2% difference as an average value over the five-year validation period. The variation over the year (Fig. 24) is less well described.
• The simulated nitrate concentrations (Fig. 25) match the observed data remarkably well, both with respect to average concentrations and the statistical distribution of concentrations within the catchment.
• The simulations are clearly affected by various scale effects (1, 2, 4 km grid and Distributed/Uniform). This is addressed further in Section 4.1 below.
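The random distribution of rotation schemes mentioned above can be sketched as follows; the scheme names and areal fractions are assumptions for illustration, not those of the Karup study.

# Minimal sketch of the 'Distributed' agricultural representation: rotation schemes are
# assigned randomly to grid cells so that the areal proportions match the statistics.
import random

rotation_fractions = {          # target areal fractions from (assumed) agricultural statistics
    "cereal_rotation": 0.45,
    "grass_livestock": 0.30,
    "root_crops":      0.15,
    "set_aside":       0.10,
}

def assign_rotations(n_cells, fractions, seed=1):
    """Randomly distribute rotation schemes over n_cells grid cells, keeping the
    overall distribution in accordance with the prescribed fractions."""
    cells = []
    for scheme, frac in fractions.items():
        cells += [scheme] * round(frac * n_cells)
    # adjust for rounding so that exactly n_cells cells are assigned
    while len(cells) < n_cells:
        cells.append(next(iter(fractions)))
    cells = cells[:n_cells]
    random.Random(seed).shuffle(cells)        # random spatial placement
    return cells

cells = assign_rotations(n_cells=500, fractions=rotation_fractions)
print({s: cells.count(s) / len(cells) for s in rotation_fractions})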
Fig. 24 Comparison of the recorded discharge hydrograph for the Karup catchment with simulations based on 1, 2 and 4 km grids. The two simulated curves correspond to the combined upscaling/aggregation procedure (Distributed) and the simpler upscaling procedure (Uniform).
Fig. 25 Comparison of the statistical distribution of nitrate concentrations in groundwater (ultimo 1993) for the Karup catchment simulated by the model with 1, 2 and 4 km grids and observed in 35 wells. The lower figure corresponds to the upscaling procedure resulting in a distributed representation of agricultural crops, while the upper figure is from the run with the upscaling procedure where all agricultural area is represented by one uniform crop.
Discussion - post evaluation
The model codes used in [10] were well known and had previously been used in one of the catchments (Styczen and Storm, 1993a, b). The scientific contributions of [10] relate partly to scaling issues, which are dealt with in Section 4.1 below, and partly to testing the performance of nitrate catchment models when scarce data are used and when no model calibration is carried out.
The most important finding with respect to data availability is probably that aggregated data in many cases can provide sufficient input to perform useful model simulations. This message is similar to the output from the first large scale application of SHE to catchments in India with scarce data ([4] and [5]), namely that an apparent lack of primary data should not always prevent you from using a model. With regard to data availability at large scale it was concluded that the most critical data that may cause problems for large scale applications are the geological data, for which no suitable global or European digital database exists. In this respect the establishment of a national hydrological model in Denmark (Henriksen et al., 2003), based on comprehensive geological data from the very large national geological database, is an important development.
The study showed that one of the strengths of physically-based models is the possibility to assess many parameter values from standard values, obtained from experience through a number of other applications. It also showed some of the limitations in this respect. While the key results in terms of annual runoff and nitrogen concentration distributions are encouraging, the discharge hydrographs clearly illustrate that it would be very easy to obtain a better hydrograph fit through calibration of a couple of parameter values. When parameters are assessed in this way they are subject to considerable uncertainty, which will generate significant uncertainty in model predictions. This aspect is addressed in [11], which is discussed in Section 4.3 below.
The attempt to assess parameter values directly from data without any model calibration can be seen as the extreme end of the development starting with hundreds of free parameters in the Suså model ([1]), via 26 parameters in the Kolar basin in India ([5]), to 11 free parameters in a previous Karup study ([7]). The results from the present study showed some obvious shortcomings of this approach, and in a later study of the Senegal basin (Andersen et al., 2001) we used 4 free parameters for calibration.
3.3 Real-time Flood Forecasting
3.3.1 Intercomparison of updating procedures for real-time forecasting ([8])
Summary
Publication [8] presents a classification of updating procedures used in real-time flood forecasting modelling and a review of the results from the WMO project ‘Simulated Real-Time Intercomparison of Hydrological Models’ (WMO, 1992), comprising more than 10 commonly used hydrological model codes and a variety of different updating procedures. The objective of the paper was to analyse the performance of different types of updating procedures and to assess what is more important, the simulation model or the updating procedure.
In the context of real-time forecasting a hydrological catchment model, such as those in the remaining part of this thesis, may be denoted a process model (Fig. 26). A process model consists of a model structure including process equations, model parameters that are constant throughout a model run, and state variables. The transformation from input to output by the process model is called simulation, in accordance with the terminology defined in Section 2.2 above. Process models that operate in real-time may take into consideration the measured discharge/water level at the time of preparing the forecast. This feedback process of assimilating the measured data into the forecasting procedure is referred to as updating, or data assimilation. Updating procedures can be classified according to four different methodologies (Fig. 26):
1. Updating of input variables, typically by adjusting precipitation.
2. Updating of state variables, e.g. the soil moisture content.
3. Updating of model parameters.
4. Updating of output variables (error prediction).
A minimal sketch of the error prediction approach (type 4) is given after the two model descriptions below.
The core of the WMO project was a workshop held in Vancouver during the period July 30 – August 8, 1987, where 15 models from 14 different organisations were run in a simulated real-time environment. Data from three catchments with significantly different hydrological characteristics were used for the tests. Before the workshop the modellers had received historical data for several years for calibration and validation and two ‘warm up’ flood events. During the workshop four additional flood events were forecasted as blind tests, each with seven forecasts at consecutive times. Each event was forecasted within one workshop day, often under considerable time pressure. I participated in the workshop with two models that differed both with respect to process model and updating procedure:
• NAMS11 comprising the NAM as catchment model, St. Venant river routing and an error prediction model as updating procedure. This is basically identical to what later became known as the flood forecasting module of MIKE 11 (Havnø et al., 1995).
• NAMKAL comprising the NAM formulated in a state-space form and built into an extended Kalman filter for updating. This version had no separate river routing but relied on the linear reservoirs in NAM.
The two models were tested on the 104 km2 Orgeval catchment (France) and the 2,344 km2 Bird Creek catchment (United States). The models were not tested on the third, snow-dominated catchment.
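As a purely illustrative sketch of the error prediction (output updating) approach (not the actual NAMS11/MIKE 11 routine), the forecast errors observed up to the time of the forecast can be described by a simple autoregressive model whose predictions are added to the process-model forecast. The function and variable names below are hypothetical.

    import numpy as np

    def error_prediction_update(simulated, observed, forecast):
        """Correct a process-model forecast with an AR(1) model of its recent errors.

        simulated, observed: discharge series up to the time of forecast
        forecast:            process-model forecast for the coming lead times
        """
        errors = observed - simulated                        # historical simulation errors
        phi = np.corrcoef(errors[:-1], errors[1:])[0, 1]     # lag-1 autocorrelation used as AR(1) coefficient
        last_error = errors[-1]
        leads = np.arange(1, len(forecast) + 1)
        predicted_errors = (phi ** leads) * last_error       # error correction decays with lead time
        return np.asarray(forecast, dtype=float) + predicted_errors

In this sketch the correction decays towards zero with increasing lead time, which is consistent with the general finding discussed below that updating mainly improves short-range forecasts.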
Fig. 26 Schematic diagram of simulation and forecasting with illustration of four different updating methodologies, [8].
Summary results from the two catchments are shown in Fig. 27 as root mean square errors (RMSE) as a function of forecast lead time (lag); a minimal sketch of this RMSE computation is given after the list of findings below. As can be seen from the figure, the intercomparison test turned out to be a very close ‘race’, with at least one third of the models performing almost equally well. Depending on the selected criteria for comparison (which catchment, priority to short, medium or long lead times, etc.) several of these could claim to be the ‘best model’. What is maybe more interesting is some of the general findings:
• The process models belonged to two of the classes shown in Fig. 6, namely empirical (black box) models and lumped conceptual models. From the results it was not possible to clearly distinguish which model type performed better.
• All four types of updating procedures were represented, both among the models with the best performance and among the models with the poorest performance. This indicates that the selection of a specific updating methodology is only one out of several important factors.
• The forecast error (RMSE) generally increases with forecast lead time. This shows that updating procedures most often significantly improve the performance of hydrological models for short-range forecasting.
• In most cases the models with the best performance for short lead times were also those with the best results for the long lead times. This indicates that the goodness of the basic simulation (by the
process model) is crucial to forecast accuracy, or in other words that a good updating procedure can not compensate for a poor process model.
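The RMSE-versus-lead-time summary of Fig. 27 can in principle be computed as in the following sketch; the array layout (one row per issued forecast, one column per lead time) is an assumed, hypothetical format and not the WMO project's actual data structure.

    import numpy as np

    def rmse_per_lead_time(forecasts, observations):
        """forecasts, observations: arrays of shape (n_forecasts, n_lead_times).

        Returns the RMSE for each lead time, averaged over all issued forecasts,
        i.e. one point per lead time on a curve like those in Fig. 27.
        """
        errors = forecasts - observations
        return np.sqrt(np.mean(errors ** 2, axis=0))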
Discussion - post evaluation
Real-time forecasting is the toughest field I have experienced in hydrological modelling with respect to model validation, because the results of the model forecasts are continuously confronted with observations. In many studies involving model simulations for planning purposes it is often not possible to conduct a validation test that exactly fits the conditions for which model simulations of future conditions are needed. Therefore, the validation test results will often have many qualifiers and be considered together with other arguments. In real-time flood forecasting there is no need for such qualifiers and arguments (‘no nonsense’) and therefore only the hard facts are considered.
Fig. 27 Root Mean Square Errors (RMSE) as a function of forecast lead time for all models participating in the Orgeval and Bird Creek catchments. The RMSE values are averaged over the four forecasted flood events with blind tests (events 3-6), [8].
The main scientific contribution of [8] was the analysis of the performance of different types of process models and updating procedures and combinations hereof. Our motivations to participate in this unique WMO intercomparison project were (a) to test DHI’s code NAMS11 (now MIKE 11), which was used operationally in India at that time, in an intercomparison with some of the internationally leading codes and modellers; and (b) to test whether an extended Kalman filter could provide a better updating routine than the more commonly used and simpler error prediction routine. In addition to noting that the NAMS11 performed very well and that the extended Kalman filter under ideal conditions could perform marginally better than the standard updating procedure, the analysis led to the following interesting findings:
• It was not possible to conclude which model type, black box or lumped conceptual, is better suited for simulation of runoff. This is in good agreement with [6] and later studies such as Reed et al. (2004), which concluded that lumped conceptual and distributed physically-based models performed equally well for split-sample tests. Thus it may be argued that all three model types described in Section 2.4 in many cases can be expected to perform equally well in rainfall-runoff modelling.
• It turned out that the personal factor is maybe the most important aspect of hydrological modelling. It was clear after the workshop that the difference in model performances between the participating codes could often not be explained by differences in model codes. Personal factors such as the modeller’s ability to make a good model calibration, experience from working in hydrological regimes different from the regime you see in your home office, ability to work under extreme stress, level of preparation beforehand and random luck also played important roles. The personal factor is most often overlooked in natural science, maybe because it is subjective in nature and therefore does not fit well into the methods usually adopted in natural science. The ultimate consequence of this finding is that good quality of modelling results requires both use of good scientifically based methodologies and adoption of sound practises by competent professionals. This consequence was not derived in [6] but is central for recent work on quality assurance guidelines in the modelling process ([13]).
Most of the model codes that participated in the intercomparison study were state-of-the-art hydrological model codes such as Sacramento (Burnash, 1995), HBV (Bergström, 1995) and MIKE 11 (NAMS11) with comprehensive experience in operational flood forecasting. These codes are still among the most commonly used today. The updating techniques tested in [8] are also still the basic techniques used operationally today, although more sophisticated developments and improvements have taken place, e.g. a combination of the Kalman filtering and the error prediction procedure (Madsen and Skotner, 2005).
4. Key Issues in Catchment Scale Hydrological Modelling
4.1 Scaling
This section provides a discussion of catchment heterogeneity and upscaling in relation to catchment modelling, based partly on the publications in the present thesis (most importantly [7] and [10]) and partly on other previous work such as Refsgaard (1981), the foundation of [1] and [2], and Refsgaard and Butts (1999), which was heavily inspired by the EU research project behind [10] and [11]. Hydrological modelling is being carried out at spatial scales ranging from pore scale to global scale, and a variety of scaling theories have been developed, see e.g. Blöschl and Sivapalan (1995) and Beven (1995). Many of the scaling theories consider different spatial scales for single processes. For catchment modelling it is necessary to include several processes and their linkages.
4.1.1 Catchment heterogeneity
Catchment properties exhibit spatial variability. For almost all properties this heterogeneity is very large and dominates the behaviour of the catchment. Scaling is basically a question of how to handle heterogeneity at different spatial scales. Different model types do this in fundamentally different ways. Let us illustrate this by two examples.
As the first example, let us consider an idealised description of flow through the root zone (Fig. 28). If a soil column, initially dry, is supplied with a certain amount of water it will retain water until it is filled to a certain level, the field capacity θ’F, whereupon all the supplied water will pass through. This is illustrated in Fig. 28 A,B,C, where also the frequency and the distribution of θF are shown. If we then consider a catchment with a spatial variability in soil physical properties, the frequency and the distribution of the field capacity are illustrated in Fig. 28 D and E respectively. If the root zone of this catchment, initially dry, is being supplied with water, not all of the area will contribute to throughflow at the same time, as θF varies in the catchment. When, for instance, the rainfall has supplied the water amount θ’F,m, it is seen from Fig. 28 E that field capacity has been reached in one half of the catchment, thus contributing to throughflow, while the other half of the catchment still retains the rain in its root zone.
In a lumped model, such as NAM, such spatial variability is taken into account by using semi-empirical relations, e.g. the dashed line in Fig. 28 F, where θ’1 and θ’2 typically have to be estimated from calibration. The difference between θ’1 and θ’2 can be seen as a measure of the heterogeneity of the catchment, or of the catchment input that is also assumed homogeneously distributed in a lumped approach. This way of accounting for the spatial variability in the process equations can be considered the heart of lumped models and also explains why the process equations in lumped models are fundamentally different from point scale physical process equations. In a distributed model the spatial variability is taken into account by dividing the catchment into several smaller elements, which are then usually treated as homogeneous units, i.e. as a column in Fig. 28.
However, the spatial variability of soil physical properties comprises both variability between different soil types and variability within the same soil type, as illustrated in Fig. 29. It has been demonstrated in several studies (Nielsen et al., 1973; Jensen and Refsgaard, 1991a,b,c; Djurhus et al. 1999) that the spatial variability of e.g. soil properties within one standard soil type at field scale is very high and can significantly influence the water balance and solute transport at this scale.
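The effect illustrated in Fig. 28 can also be shown numerically. The sketch below assumes, purely for illustration, a normal distribution of field capacity over the catchment and compares the throughflow from a single ‘average’ column with the area-averaged throughflow from the heterogeneous catchment.

    import numpy as np

    rng = np.random.default_rng(1)
    # Assumed (illustrative) spatial distribution of field capacity over the catchment, in mm
    theta_f = np.clip(rng.normal(loc=100.0, scale=25.0, size=10_000), 10.0, None)

    def column_throughflow(supplied, field_capacity):
        """Idealised root zone: no throughflow until the column is filled to field capacity."""
        return np.maximum(supplied - field_capacity, 0.0)

    for supplied in (50.0, 100.0, 150.0):
        single_column = column_throughflow(supplied, 100.0)             # homogeneous 'average' column
        catchment_mean = column_throughflow(supplied, theta_f).mean()   # area-averaged, heterogeneous
        contributing_area = (supplied > theta_f).mean()                 # fraction of catchment contributing
        print(supplied, single_column, round(catchment_mean, 1), round(contributing_area, 2))

The heterogeneous catchment yields some throughflow well before the ‘average’ column does, and the response develops gradually rather than as a sharp threshold; this gradual behaviour is precisely what the semi-empirical relation between θ’1 and θ’2 in a lumped model is meant to represent.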
[Figure: panels A-C show, for a single soil column, the frequency and distribution of the field capacity θF and the resulting throughflow as a function of supplied water; panels D-F show the corresponding relations for a catchment, where throughflow develops gradually between θ’1 and θ’2.]
Fig. 28 Idealised description of the variation of field capacity, θF, and its effect on flow through the root zone in a soil column and in a catchment (Refsgaard, 1981).
[Figure: frequency distributions of the field capacity θF within a single soil type and within the entire catchment.]
Fig. 29 The principle of spatial variability of a soil physical property within a single soil type and within a catchment containing more than one soil type (Refsgaard, 1981).
Let us then turn to another example focusing on the limitation of a distributed model to resolve key features of a catchment. Fig. 30 shows the topography and river network for two models that are identical
except for differences in spatial discretisation. It is clearly seen that the 500 m grid provides a much better resolution of the topography and the river network, and also of other catchment characteristics as explained in [7]. In the 2000 m grid the river valley cannot be described well, and many of the smaller streams have to be omitted where the distance between neighbouring streams is smaller than the model grid size. This significantly affects the stream-aquifer interaction and in this way the simulation of both river discharge and groundwater heads. As discussed in [7], a change in scale (grid size) in this way changes the model simulations. This can in some cases be compensated for by adjusting parameter values. But it implies that parameter values are scale dependent and that the physical basis is reduced if the grid size is increased.
Fig. 30 Topography, river network and model grid for two models with discretisations of 500 m and 2000 m [7].
This example focussed on river discharges and hydraulic heads at some given observational locations, for which [7] argues that a 500 m resolution provides an adequate description. If we instead had focussed on other processes such as reactive transport in aquifers or in river valleys, we would have needed to account for geological and geomorphological heterogeneity of much smaller scale than 500 m. This line of argument can continue down to pore scale processes such as those described in [3]. The point is that, no matter which resolution a model has, it is always possible to find processes that require a smaller scale in order to provide a physically based description. Consequently, the ultimate distributed physically based model where everything is described can never be achieved. This implies that any distributed model needs to provide a kind of lumped conceptual representation at its scale of operation. An excellent example of this is the traditional advection-dispersion equation with its associated dispersivities, where the dispersivities show the well-known scale dependence (Gelhar, 1986). The process description of oxygen transport and consumption given in [3] is another example. Although meant for
inclusion as a submodel in a distributed physically based model, [3] incorporates spatial heterogeneity of processes at pore scale (mm) into a process equation assumed valid at its scale of operation (grid points with 10-40 cm distance). This process equation can therefore be considered a lumped conceptual description at this scale.
4.1.2 A scaling framework
In this section we only consider the case of moving from the smaller to the larger scale, which is often denoted upscaling. When moving to larger scales the spatial variability of physical parameters and variables has to be taken into account. This can in principle be done in two ways, either by aggregation or upscaling (Heuvelink and Pebesma, 1999):
• Upscaling means that the process equations and the associated parameters that basically constitute the model are in principle modified or substituted when moving from the smaller scale to the larger scale.
• Aggregation means that the process equations are applied at the smaller scale (where they were derived) and the large-scale results are obtained by aggregating the small-scale results at the larger scale.
Hence, in order not to confuse the terminology with two different meanings of the term upscaling, the term scaling will in the following be used for the case of moving from modelling at the smaller scale to modelling at the larger scale. Thus, the term upscaling is reserved for the specific approach of scaling defined above.
The differences between upscaling and aggregation are illustrated in Fig. 31 and some key characteristics are summarised in Table 1. At the smaller scale, the hydrological processes can be described by smaller scale equations and associated smaller scale parameters. If the aggregation approach is adopted for large-scale modelling, then the model is operated at the smaller scale units with smaller scale equations and parameters, and the model output valid for the larger scale emerges after aggregation of the results. The aggregation consists of estimating the spatial mean and in some cases also the statistical distribution of the model outputs. If the model is linear or the parameters and variables are spatially constant, computational time may be saved by averaging of model parameters and input before running the model; otherwise the model runs must be made before the aggregation step.

Table 1. Characteristics of different scaling procedures when moving from a smaller scale (SS) to a larger scale (LS).

                                   Aggregation     Upscaling:         Upscaling:        Upscaling:
                                                   SS equations       large-scale       LS equations
                                                   used at LS         PDE developed     developed at LS
Basis of process descriptions      Smaller scale   Smaller scale      Smaller scale     Larger scale
Computational unit                 Smaller scale   Larger scale       Larger scale      Larger scale
Parameter estimation possible      Yes             No, some values    Yes               No, some values
from field data?                                   need calibration                     need calibration
Fig. 31 Upscaling and aggregation methods for extending hydrological processes from small-scale (SS) to large-scale (LS) models (Refsgaard and Butts, 1999).
If the upscaling approach is adopted for the large-scale modelling, the smaller scale equations and parameters are in principle substituted by larger scale ones. The upscaling approach can be carried out in three different ways:
• The smaller scale equations are assumed valid also at the larger scale. In this case the parameter values have to be estimated as effective parameters corresponding to the larger scale computational unit. Effective parameters are single values, similar to point scale parameters, but somehow reproduce the bulk behaviour of a heterogeneous medium. The estimation of parameter values is in such cases often done by calibration, at least for a handful of the key parameters. An example of this approach is given in [5] describing an application of the SHE to a large catchment in India using spatial grid sizes of 2 km x 2 km. A minimal numerical illustration of how this approach differs from aggregation is sketched after this list.
• The equations at the larger scale are derived in a theoretical framework from a set of deterministic partial differential equations (PDE) assumed valid at the smaller scale and assumptions on the spatial variability of key parameters and/or input data. This is often carried out in a stochastic framework where quantities such as the average value and higher order statistical moments of the desired model output variables can be assessed. An example of this approach is Jensen and Mantouglou (1992), who consider the spatial variability of soil hydraulic parameters in field scale modelling. In this case the parameter values may be assessed directly on the basis of smaller scale information.
• The equations at the larger scale are developed at the larger scale using a concept which does not explicitly consider the smaller scale equations, i.e. the formulation of laws that apply at the large scale. Examples of this approach are the conceptual rainfall-runoff models such as the NAM (Nielsen and Hansen, 1973; [6]; [8]), cf. Fig. 28 and the discussion above.
The oxygen model described in [3] is also an example of this approach, although smaller scale and larger scale here refer to mm and dm scales and not to catchment scale. As a result of the larger scale concepts, such codes are often not adequate for smaller scale applications, and parameters can most often not be assessed directly from small scale information.
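A minimal numerical illustration (using the same idealised root zone process as in Fig. 28, with assumed parameter values) of why aggregation and effective-parameter upscaling are not equivalent for a non-linear process: the mean of the small-scale outputs differs from the output obtained with the mean parameter, so an ‘effective’ parameter generally has to be found by calibration rather than by simple averaging.

    import numpy as np

    rng = np.random.default_rng(2)
    theta_f = np.clip(rng.normal(100.0, 25.0, 10_000), 10.0, None)  # small-scale field capacities (mm)
    supplied = 120.0                                                 # water supplied to the root zone (mm)

    throughflow = np.maximum(supplied - theta_f, 0.0)

    aggregation = throughflow.mean()                       # aggregation: run at small scale, average outputs
    naive_upscaling = max(supplied - theta_f.mean(), 0.0)  # upscaling with the averaged parameter

    print("aggregation:", round(aggregation, 1), "mm")
    print("mean-parameter upscaling:", round(naive_upscaling, 1), "mm")
    # The two differ because the process is non-linear; an effective field capacity
    # reproducing the aggregated result would have to be found by calibration.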
4.1.3 Scaling - an example
The above four scaling approaches each have their advantages and limitations, and the specific approach to use in particular applications will depend on many factors such as the purpose of a given study, the dominating processes in the particular hydrological regime and the data availability. Thus, no unique approach can be claimed superior in all cases. As illustrated below, scaling procedures are in practise often based on combinations of the above approaches.
The example outlines the scaling methodologies adopted under an EU research project dealing with uncertainties of assessing non-point pollution to aquifers at the European scale (Refsgaard et al., 1998; [10]). During this project two model codes were used:
• SMART2 for studying leaching to groundwater of nitrate and aluminium from natural areas due to atmospheric deposition. SMART2 is a relatively simple dynamic model operating in vertical columns with annual time steps (Kros et al., 1995).
• MIKE SHE/DAISY for studying groundwater contamination from agricultural areas. Both MIKE SHE (Refsgaard and Storm, 1995) and DAISY (Hansen et al., 1991) are physically-based model codes with detailed process descriptions and typically hourly time steps.
The objective of the project was to assess the uncertainty in model predictions when applied at the European scale. As both codes had been developed for, and previously mainly been applied at, much smaller scales, a scaling procedure had to be adopted. The two scaling procedures, illustrated in Fig. 32, show significant differences.
SMART2 operates at a 1 km grid scale. It was developed on the basis of experience with the NUCSAM code (Groenenberg et al., 1995), which is a detailed physically-based code operating at point scale. Thus, SMART2 can be considered as an upscaling of NUCSAM with new equations and parameters applicable at the 1 km scale, equivalent to the upscaling procedure of the conceptual hydrological models described above. For the application in the Netherlands the SMART2 model results were aggregated to a 5 km x 5 km grid by selecting the median value among the 25 grids of 1 km x 1 km size. The parameters were assessed by pedotransfer functions from field data without prior model calibration. The scaling procedure from point scale to national or European scales thus consists of a combination of an upscaling and an aggregation step.
MIKE SHE/DAISY, on the other hand, is in this case run with equations and parameter values in each model grid point representing field scale conditions. The field scale is characterised by ‘effective’ soil and vegetation parameters, but assuming only one soil type and one cropping pattern. The smallest horizontal discretisation in the model is the grid scale (1-5 km), which is larger than the field scale. This implies that all the variations between categories of soil type and crop type within the area of each grid can not be resolved and described at the grid level.
Input data whose variations are not included in the grid scale representation are distributed randomly at the catchment scale so that their statistical distributions are preserved at that scale. The results from the grid scale modelling are then aggregated to catchment scale (10-50 km) and the statistical properties of model output and field data are then compared at catchment scale (Hansen et al., 1999; [10]). Thus the scaling procedure from point scale to catchment scale is again a combination of an upscaling step and an aggregation step. In contrast to the NUCSAM-SMART2 case, the upscaling step here is simply the (important) assumption that the point scale equations are valid at field scale. The aggregation step highlights a key issue from the concept of Representative Elementary Area, REA (Wood et al., 1988), namely that variability can be explicitly represented only at scales larger than the model grid size.
Validation tests against field data suggested that the two different scaling procedures basically could be assumed valid for their respective cases, although important limitations were also identified. An important question regarding the differences between the two upscaling methods is why it apparently was possible to make the large upscaling step from the smaller scale NUCSAM to the larger scale SMART2 code, while a similar step was not judged possible for the MIKE SHE/DAISY code. The answer may be that the nitrogen leaching in agricultural fields is a highly non-linear and dynamic process that depends on cropping pattern and agricultural management practise, which can not be lumped to a larger scale description, while the geochemical processes below natural lands, where no management practise is interfering, can more easily be represented by long term average simulations focussing on the gradual reduction of the chemical buffer capacities due to the acids in the atmospheric deposition.
An inherent limitation of the scaling methodologies illustrated in this example is that they do not preserve the georeferenced location of simulated concentrations, but only their statistical distribution over the catchment area (e.g. Fig. 25). Therefore, comparisons with field data make no sense on a well by well or subcatchment by subcatchment basis, and no information on the actual location of the simulated ‘hot spots’ within the catchment is provided. If a more detailed spatial resolution of the model predictions is required from a management point of view, then the same scaling method has to be carried out at a finer scale with all the statistical input data being supplied on a subcatchment basis. This is in principle straightforward, but in reality it may often be limited by data availability.
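The random distribution step described above can be sketched as follows; the scheme names, areal fractions and grid size are hypothetical and serve only to show how areal statistics are preserved while georeferenced locations are not.

    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical areal fractions of three crop rotation schemes from agricultural statistics
    schemes = ["cereal/pig farm", "cereal/cattle farm", "grass/cattle farm"]
    fractions = [0.5, 0.3, 0.2]

    n_cells = 2_000  # agricultural grid cells in the catchment
    assignment = rng.choice(schemes, size=n_cells, p=fractions)

    # The statistical distribution is preserved at catchment scale ...
    for scheme in schemes:
        print(scheme, round(float((assignment == scheme).mean()), 2))
    # ... but the location of any individual scheme within the catchment is arbitrary,
    # so simulated concentrations can only be compared with observations statistically.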
4.1.4 Discussion – post evaluation
The issue of scaling represents both a major scientific challenge and a practical problem in water resources management. Scaling is dealt with as a key issue in two of the publications in this thesis ([7], [10]). As the studies behind the other publications operate on scales ranging from point scale ([3]) to thousands of km2 ([4], [5], [9]) catchment heterogeneity and scaling are dealt with and discussed in many of the publications.
Fig. 32 Scaling methodology adopted by the SMART2 and MIKE SHE/DAISY models in the UNCERSDSS project (Refsgaard and Butts, 1999).
In the beginning of my career I had the rather naive view that it might be possible to develop a universal model code and a methodology that could be used to address most problems in hydrological management. This is reflected in the dualism of statements in the MIKE SHE description in Refsgaard and Storm (1995), where it on the one hand is stated that “MIKE SHE is applicable on spatial scales ranging from a single soil profile to a large regions”, while it on the other hand is acknowledged that “there are a number of fundamental scale problems which need to be carefully considered in the model applications”. I do not believe any longer that a universally applicable code and modelling methodology is theoretically realistic, and certainly it is not feasible in practise. The main reason for this is the scaling problems. Because scaling is interlinked with modelling concepts, I therefore do not believe it will ever be possible to derive a universal scaling theory of practical applicability.
Scaling implies taking spatial heterogeneity into account. In catchment modelling it is furthermore complicated by the need to include and link several processes, such as subsurface processes (Dagan, 1986; Gelhar, 1986; Wen and Gómez-Hernández, 1996); root zone processes including land surface-atmosphere interaction (Michaud and Shuttelworth, 1997); and surface water processes including stream-aquifer interaction (Saulnier et al., 1997; [7]). Many researchers have expressed doubts whether it is feasible to use the same model process descriptions at different scales. For instance Beven (1995) states that “… the aggregation approach towards macroscale hydrological modelling, in which it is assumed that a model applicable at small scales can be applied at larger scales using ‘effective’ parameter values, is an inadequate approach to the scale problem. It is also unlikely in the future that any general scaling theory can be developed due to the dependence of hydrological systems on historical and geological perturbations.” Beven’s view can be considered a universal and fundamental statement with which it is difficult to disagree. A more pragmatic, but not necessarily conflicting, view is expressed by Grayson and Blöschl (2000): “As modellers, we are often left with little choice but to use the effective parameter approach, but we must recognise that effective parameters may have a narrow range of application and an effective parameter value that “works” for one process may not be valid for another process.”
The scaling framework presented above should be seen in this context. It is not a fundamental theory but rather a collection of different methods and an emphasis on their respective assumptions and associated costs in terms of lost information. These methods or building blocks can then be used in composing specific scaling methodologies depending on the purposes of the particular modelling studies. In this respect it is crucial that the modeller is aware of the limitations of the scaling methodology chosen in a particular study.
4.2 Confirmation, Verification, Calibration and Validation
As illustrated in Fig. 3 the credibility of the descriptions or the agreements between reality, conceptual model, model code and model are evaluated through confirmation of the conceptual model, verification of the code, model calibration and model validation. These four terms are addressed in this section.
4.2.1 Confirmation of conceptual model
The conceptual model, with its selection of process descriptions, equations, etc., is the foundation for the model structure. Therefore a good conceptual model is most often a prerequisite for obtaining trustworthy model results. In groundwater modelling, establishment of the conceptual model is often considered the most important part of the entire modelling process (Middlemis, 2000). Evaluation of conceptual models is an important part of assessing uncertainty due to model structure error (Section 4.3 below and [15]).
Methods for conceptual model confirmation should follow the standard procedures for confirmation of scientific theories. This implies that conceptual models should be confronted with actual field data and be subject to critical peer reviews. Furthermore, the feedback from the calibration and validation process may also serve as a means by which one or a number of alternative conceptual models may be either confirmed or falsified. As Beven (2002b) argues, we need to distinguish between our qualitative understanding (perceptual model) and the practical implementation of that understanding in our conceptual model. As a conceptual model is defined in [12] as a combination of a perceptual model and the simplifications acceptable for a particular model study, a conceptual model becomes site-specific and even case-specific. For example, a conceptual model of a groundwater aquifer may be described as two-dimensional for a study focussing on regional groundwater heads, while it may need to include more complex three-dimensional geological structures for a study requiring detailed solute transport simulations.
4.2.2 Code verification
The ability of a given model code to adequately describe the theory and equations defined in the conceptual model by use of numerical algorithms is evaluated through the verification of the model code. Use of the term verification in this respect is in accordance with Oreskes et al. (1994), because mathematical equations are closed systems. The methodologies used for code verification include comparing a numerical solution with an analytical solution or with a numerical solution from other verified codes. However, some programme errors only appear under circumstances that do not routinely occur, and may not have been anticipated. Furthermore, for complex codes it is virtually impossible to verify that the code is universally accurate and error-free. Therefore, the term code verification must be qualified in terms of specified ranges of application and corresponding ranges of accuracy. Code verification is not an activity that is carried out from scratch in every modelling study. In a particular study it has to be ascertained that the domain of applicability for which the selected model code has been verified covers the conditions specified in the actual conceptual model. If that is not the case, additional
verification tests have to be conducted. Otherwise, the code must explicitly be classified as not verified for this particular study, and the subsequent simulation results therefore have to be considered with extra caution.
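A minimal, purely illustrative example of a code verification test: an explicit numerical solver for the linear reservoir equation dS/dt = -S/k is compared with its analytical solution, and the code is only accepted as verified for the stated range of application (time step, recession constant) and the stated accuracy.

    import numpy as np

    def linear_reservoir_numerical(s0, k, dt, n_steps):
        """Explicit Euler solution of dS/dt = -S/k (the 'code' to be verified)."""
        s = np.empty(n_steps + 1)
        s[0] = s0
        for i in range(n_steps):
            s[i + 1] = s[i] - dt * s[i] / k
        return s

    def verify_against_analytical(s0=100.0, k=10.0, dt=0.1, n_steps=500, tolerance=0.01):
        """Compare the numerical solution with the analytical solution s0*exp(-t/k)."""
        t = np.arange(n_steps + 1) * dt
        numerical = linear_reservoir_numerical(s0, k, dt, n_steps)
        analytical = s0 * np.exp(-t / k)
        max_relative_error = np.max(np.abs(numerical - analytical)) / s0
        return max_relative_error <= tolerance, max_relative_error

    verified, error = verify_against_analytical()
    print("verified for this range of application:", verified, "- max relative error:", round(error, 4))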
4.2.3 Model calibration
The application of a model code to be used for setting up a site-specific model is usually associated with model calibration. The model performance during calibration depends on the quantity and quality of the available input and observation data as well as on the conceptual model. If sufficient accuracy cannot be achieved, the conceptual model and/or the data have to be re-evaluated.
Many of the publications ([1], [4], [5], [6], [7], [8], [9]) have involved model calibration. This was in all cases done manually. Today automatic calibration (inverse modelling) is state-of-the-art (Duan et al., 1994; Hill, 1998; Doherty, 2003), also as part of the calibration process for rather complex distributed physically-based models (Sonnenborg et al., 2003; Henriksen et al., 2003). A key issue related to calibration of distributed models with potentially hundreds or thousands of parameter values is a rigorous parameterisation procedure, where the spatial pattern of the parameter values is defined and the number of free parameters adjustable through calibration is reduced as much as possible. A methodology for this is presented in [7], and this issue is further discussed in [4], [5], [10] and Andersen et al. (2001).
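As an illustration of automatic calibration (inverse modelling), the sketch below adjusts the two free parameters of a toy rainfall-runoff model (a runoff coefficient and a linear reservoir constant) by minimising the root mean square error against observed discharge. The toy model, the synthetic data and the choice of optimiser are hypothetical stand-ins for the far more elaborate procedures cited above.

    import numpy as np
    from scipy.optimize import minimize

    def toy_model(params, rainfall):
        """Toy rainfall-runoff model: runoff coefficient and linear reservoir constant."""
        runoff_coef, k = params
        storage, discharge = 0.0, []
        for p in rainfall:
            storage += runoff_coef * p
            q = storage / k
            storage -= q
            discharge.append(q)
        return np.array(discharge)

    def objective(params, rainfall, observed):
        if params[1] <= 1.0:                      # keep the reservoir constant physically meaningful
            return 1.0e6
        return np.sqrt(np.mean((toy_model(params, rainfall) - observed) ** 2))

    rng = np.random.default_rng(4)
    rainfall = rng.gamma(shape=0.8, scale=5.0, size=365)
    observed = toy_model([0.4, 12.0], rainfall) + rng.normal(0.0, 0.2, size=365)  # synthetic 'observations'

    result = minimize(objective, x0=[0.2, 5.0], args=(rainfall, observed), method="Nelder-Mead")
    print("calibrated parameters:", np.round(result.x, 2), "- RMSE:", round(result.fun, 3))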
4.2.4 Model validation
Often the model performance during calibration is used as a measure of the predictive capability of a model. This is a fundamental error. Many studies (e.g. [4]; [6]; Andersen et al., 2001) have demonstrated that the model performance against independent data not used for calibration is generally poorer than the performance achieved in the calibration situation. Therefore, the credibility of a site-specific model’s capability to make predictions about reality must be evaluated against independent data. This process is denoted model validation.
In designing suitable model validation tests a guiding principle should be that a model should be tested to show how well it can perform the kind of task for which it is specifically intended (Klemes, 1986). Klemes proposed the following scheme comprising four types of test, corresponding to different situations with regard to whether data are available for calibration and whether the catchment conditions are stationary or the impact of some kind of intervention has to be simulated:
• The split-sample test is the classical test, being applicable to cases where there is sufficient data for calibration and where the catchment conditions are stationary. The available data record is divided into two parts. A calibration is carried out on one part and then a validation on the other part. Both the calibration and validation exercises should give acceptable results.
• The proxy-basin test should be applied when there is not sufficient data for a calibration of the catchment in question. If, for example, streamflow has to be predicted in an ungauged catchment Z, two gauged catchments X and Y within the region should be selected. The model should be calibrated on catchment X and validated on catchment Y and vice versa. Only if the two validation results are
acceptable and similar can the model command a basic level of credibility with regard to its ability to simulate the streamflow in catchment Z adequately.
• The differential split-sample test should be applied whenever a model is to be used to simulate flows, soil moisture patterns and other variables in a given gauged catchment under conditions different from those corresponding to the available data. The test may have several variants depending on the specific nature of the modelling study. If, for example, a simulation of the effects of a change in climate is intended, the test should have the following form. Two periods with different values of the climate variables of interest should be identified in the historical record, such as one with a high average precipitation and the other with a low average precipitation. If the model is intended to simulate streamflow for a wet climate scenario, then it should be calibrated on a dry segment of the historical record and validated on a wet segment. Similar test variants can be defined for the prediction of changes in land use, effects of groundwater abstraction and other such changes. In general, the model should demonstrate an ability to perform through the required transition regime.
• The proxy-basin differential split-sample test is the most difficult test for a hydrological model, because it deals with cases where there is no data available for calibration and where the model is required to predict non-stationary conditions. An example of a case that requires such a test is simulation of hydrological conditions for a future period with a change in climate and for a catchment where no calibration data presently exist. The test is a combination of the two previous tests.
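The following sketch illustrates, with assumed data and a placeholder calibration call, how the split-sample and differential split-sample tests can be organised: the record is divided either chronologically or into dry and wet years, the model is calibrated on one part, and performance is reported for the independent part only.

    import numpy as np

    def split_sample_periods(years, annual_precip, differential=False):
        """Return (calibration_years, validation_years).

        differential=False: classical split-sample test (first half / second half).
        differential=True : differential split-sample test for a wet scenario
                            (calibrate on the driest half, validate on the wettest half).
        """
        years = np.asarray(years)
        half = len(years) // 2
        if not differential:
            return years[:half], years[half:]
        order = np.argsort(annual_precip)          # driest years first
        return years[order[:half]], years[order[half:]]

    # Hypothetical 20-year record of annual precipitation (mm)
    years = np.arange(1981, 2001)
    precip = np.random.default_rng(5).normal(700.0, 120.0, size=20)

    calibration_years, validation_years = split_sample_periods(years, precip, differential=True)
    # calibrate_model(calibration_years); performance is then reported for validation_years only
    print("calibrate on:", sorted(calibration_years.tolist()))
    print("validate on:", sorted(validation_years.tolist()))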
The above test types are very general and need to be translated into specific tests in each case, depending on data availability, hydrological regime and purpose of the modelling study. Except for the situations where the split-sample test is sufficient, rather limited work has been carried out so far on validation test schemes. From a theoretical point of view the procedures outlined by Klemes (1986) for the proxy-basin and the differential split-sample tests, where tests have to be carried out using data from similar catchments, are weaker than the usual split-sample test, where data from the specific catchment are available. However, no obviously better testing schemes exist.
It must be realised that the validation test schemes proposed above are so demanding that many applications today would fail to meet them. Thus, for many cases where either proxy-basin or differential split-sample tests are required, suitable test data simply do not exist. This is for example the case for prediction of regional scale transport of potential contamination from underground radionuclide deposits over the next thousands of years. In such cases model validation is not possible. This does not imply that these modelling studies are not useful, only that their output should be recognised to be somewhat more uncertain than is often stated and that the term ‘validated model’ should not be used. Thus, a model’s validity will always be confined in terms of space, time, boundary conditions, types of application, etc.
4.2.5 Discussion – post evaluation
Relative to confirmation, verification and calibration, the main scientific contributions in my publications [1] – [15] are on the model validation issue. The motivation for this research was twofold: First of all, there were too many undocumented claims (over-selling) in the modelling community on model capabilities during the years following the development of many comprehensive model codes such as MIKE
SHE. This over-selling was most obvious in practical studies conducted by consultants, but it was also common in large parts of the scientific community, e.g. Abbot et al. (1986a,b) and many others. Secondly, dominant parts of the hydrological scientific community advocated that model validation was not possible (Konikow and Bredehoeft, 1992; Beven, 1996a). This left the practising world in a vacuum without scientifically based methodologies to test and document the degree of credibility of particular model predictions. The methodologies described in [6] and [7] should be seen as pragmatic approaches to help fill this vacuum, and the discussions in [12] should be seen as an attempt to provide a scientific basis for adopting rigorous model validation schemes as part of a good modelling practise.
The principles and schemes proposed by Klemes have been extensively used in the last 12 of the publications ([4] – [15]). Thus, the intercomparison study in [6] was based on a rigorous use of all four types of tests. Furthermore, [7] ‘translated’ Klemes’ principles, which were developed with lumped conceptual models in mind, for use in distributed modelling. After demonstrating that a distributed model that was validated for simulating catchment response often performs much more poorly for internal sites, [7] emphasised that a model should only be assumed valid with respect to the outputs that have been directly validated. This implies e.g. that multi-site validation is needed if predictions of spatial patterns are required. Furthermore, a model which is validated against catchment runoff can not automatically be assumed valid also for simulation of erosion on a hillslope within the catchment, because smaller scale processes may dominate here; it will need specific validation against hillslope soil erosion data. Furthermore, systematic split-sample tests were made in [4], [5] and [9], and proxy-basin tests were conducted in [10]. Finally, the validation requirements are emphasised in the publications related to quality assurance [12] and [13].
[6] and [7] were not the first studies to use Klemes’ principles for validation. For example Quinn and Beven (1993) used split-sample tests, proxy-basin tests and differential split-sample tests (wet/dry periods) to analyse TOPMODEL’s predictive capabilities for the Plynlimon catchment in Wales. The key contribution of [7] and [12] in this respect was the integration of Klemes’ principles as core elements of a protocol for good modelling practise. The principles outlined in [7] and consolidated in [12], that a model should never be considered universally validated but can only be conditionally validated, restricted by the availability of data and the specifically performed validation tests, are well in line with Lane and Richards (2001), who argue that “evidence of a successful prediction in observed spaces and times (conventional validation) cannot provide a sufficient basis for use of a model beyond the set of situations for which the model has been empirically tested”. The principles are also in accordance with the new coherent philosophy for modelling of the environment proposed by Beven (2002b), where he argues that it is required to be able to “define those areas of the model space where behavioural models occur”.
4.3 Uncertainty Assessment
This section presents a broad framework originating from Refsgaard et al. (2005) and [14], followed by a discussion on data uncertainty (including [14]), parameter uncertainty (including [11]) and model structure uncertainty (including [15]) and how they affect model output uncertainty.
4.3.1 Modelling uncertainty in a water resources management context
Definitions and Taxonomy
Uncertainty and associated terms such as error, risk and ignorance are defined and interpreted differently by different authors (see Walker et al. (2003) for a review). The different definitions reflect, among other factors, the different scientific disciplines and philosophies of the authors involved, as well as the intended audience. In addition they vary depending on their purpose. Here I will use the terminology used in Refsgaard et al. (2005) and [14], which has emerged after discussions between social scientists and natural scientists specifically aiming at applications in model based water management (Klauer and Brown, 2003). It is based on a subjective interpretation of uncertainty in which the degree of confidence that a decision maker has about possible outcomes and/or probabilities of these outcomes is the central focus. Thus, according to this definition a person is uncertain if s/he lacks confidence about the specific outcomes of an event. Reasons for this lack of confidence might include a judgement that the information is incomplete, blurred, inaccurate, imprecise or potentially false. Similarly, a person is certain if s/he is confident about the outcome of an event. It is possible that a person feels certain but has misjudged the situation (i.e. s/he is wrong).
There are many different (decision) situations, with different possibilities for characterising what we know or do not know and what we are certain or uncertain about. A first distinction is between ignorance as a lack of awareness about imperfect knowledge and uncertainty as a state of confidence about knowledge (which includes the act of ignoring). Our state of confidence may range from being certain to admitting that we know nothing (of use), and uncertainty may be expressed at a number of levels in between. Regardless of our confidence in what we know, ignorance implies that we can still be wrong (‘in error’). In this respect Brown (2004) has defined a taxonomy of imperfect knowledge illustrated in Fig. 33.
[Figure: a spectrum of confidence (a state of awareness) ranging from certainty, through ‘bounded’ uncertainty (all possible outcomes known, with all, some or no probabilities known) and ‘unbounded’ uncertainty (some or no possible outcomes known, no probabilities known, ‘do not know’), to indeterminacy (‘cannot know’); ignorance denotes being unaware of imperfect knowledge.]
Fig. 33 Taxonomy of imperfect knowledge resulting in different uncertainty situations (Brown, 2004)
In evaluating uncertainty, it is useful to distinguish between uncertainty that can be quantified, e.g. by probabilities, and uncertainty that can only be qualitatively described, e.g. by scenarios. If one throws a balanced die, the precise outcome is uncertain, but the ‘attractor’ of a perfect die is certain: we know precisely the probability for each of the 6 outcomes, each being 1/6. This is what we mean by ‘uncertainty in terms of probability’. However, the estimates for the probability of each outcome can also be uncertain. If a model study says: “there is a 30% probability that this area will flood two times in the next year”, there is not only ‘uncertainty in terms of probability’ but also uncertainty regarding whether the estimate of 30% is a reliable estimate.
Secondly, it is useful to distinguish between bounded uncertainty, where all possible outcomes have been identified, and unbounded uncertainty, where the known outcomes are considered incomplete. Since quantitative probabilities require ‘all possible outcomes’ of an uncertain event and each of their individual probabilities to be known, they can only be defined for ‘bounded uncertainties’. If probabilities cannot be quantified in any undisputed way, we can often still qualify the available body of evidence for the possibility of various outcomes. The bounded uncertainty where all probabilities are deemed known (Fig. 33) is often denoted ‘statistical uncertainty’ (e.g. Walker et al., 2003). This is the case traditionally addressed in model based uncertainty assessment. It is important to note that this case constitutes only one of many decision situations outlined in Fig. 33, and in other situations the main uncertainty in a decision situation cannot be characterised statistically.
Sources of uncertainty
Walker et al. (2003) describe the uncertainty as manifesting itself at different locations in the model based water management process. These locations, or sources, may be characterised as follows:
• Context, i.e. at the boundaries of the system to be modelled. The model context is typically determined at the initial stage of the study, where the problem is identified and the focus of the model study is selected as a confined part of the overall problem. This includes, for example, the external economic, environmental, political, social and technological circumstances that form the context of the problem.
• Input uncertainty in terms of external driving forces (within or outside the control of the water manager) and system data that drive the model, such as land use maps, pollution sources and climate data.
• Model structure uncertainty is the conceptual uncertainty due to incomplete understanding and simplified descriptions of processes as compared to nature.
• Parameter uncertainty, i.e. the uncertainties related to parameter values.
• Model technical uncertainty is the uncertainty arising from computer implementation of the model, e.g. due to numerical approximations and bugs in the software.
• Model output uncertainty, i.e. the total uncertainty on the model simulations taking all the above sources into account, e.g. by uncertainty propagation.
Nature of uncertainty
Many authors (e.g. Walker et al., 2003) categorise the nature of uncertainty into:
• Epistemic uncertainty, i.e. the uncertainty due to imperfect knowledge.
• Stochastic uncertainty, i.e. uncertainty due to inherent variability, e.g. climate variability.
Epistemic uncertainty is reducible by more studies, e.g. research or data collection. Stochastic uncertainty is non-reducible. Often the uncertainty on a certain event includes both epistemic and stochastic uncertainty. An example is the uncertainty of the 100-year flood at a given site. This flood event can be estimated, e.g. by use of standard flood frequency analysis on the basis of existing flow data. The (epistemic) uncertainty may be reduced by improving the data analysis, by making additional monitoring (longer time series) or by deepening our understanding of how the modelled system works. However, no matter how much we improve our knowledge, there will always be some (stochastic) uncertainty inherent to the natural system, related to the stochastic and chaotic nature of several natural phenomena, such as weather. Perfect knowledge of these phenomena cannot give us a deterministic prediction, but would have the form of a perfect characterisation of the natural variability; for example, a probability density function for rainfall in a month of the year.
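The 100-year flood example can be made concrete with a small sketch (synthetic annual maxima, an assumed Gumbel distribution and a simple bootstrap, not a recommended design procedure): the fitted distribution characterises the stochastic variability of the floods themselves, while the spread of the bootstrap estimates illustrates the epistemic uncertainty, which shrinks as the record grows.

    import numpy as np
    from scipy.stats import gumbel_r

    rng = np.random.default_rng(6)
    annual_max = gumbel_r.rvs(loc=200.0, scale=60.0, size=40, random_state=rng)  # synthetic 40-year record (m3/s)

    # Best estimate of the 100-year flood (the fitted pdf describes the stochastic variability)
    loc, scale = gumbel_r.fit(annual_max)
    q100 = gumbel_r.ppf(1.0 - 1.0 / 100.0, loc=loc, scale=scale)

    # Epistemic uncertainty of that estimate: bootstrap resampling of the limited record
    estimates = []
    for _ in range(1000):
        sample = rng.choice(annual_max, size=annual_max.size, replace=True)
        l, s = gumbel_r.fit(sample)
        estimates.append(gumbel_r.ppf(0.99, loc=l, scale=s))
    low, high = np.percentile(estimates, [5, 95])

    print(f"100-year flood estimate: {q100:.0f} m3/s (90% bootstrap interval {low:.0f}-{high:.0f} m3/s)")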
The uncertainty matrix
The uncertainty matrix in Table 2 can be used as a tool to get an overview of the various sources of uncertainty in a modelling study. The matrix is modified after Walker et al. (2003) in such a way that it matches Fig. 33, and so that the taxonomy now gives the ‘uncertainty type’ in descriptions that indicate in what terms uncertainty can best be described. The vertical axis identifies the source of uncertainty, while the horizontal axis covers the level and nature of uncertainty. It is noticed that the matrix is in reality three-dimensional (source, type, nature), because the categories Type and Nature are not mutually exclusive.

Table 2 The uncertainty matrix (modified after Walker et al., 2003); the cells are filled in for the specific modelling study at hand.

                                               Taxonomy (types of uncertainty)                          Nature
Source of uncertainty                          Statistical  Scenario     Qualitative  Recognised       Epistemic    Stochastic
                                               uncertainty  uncertainty  uncertainty  ignorance        uncertainty  uncertainty
Context        Natural, technological,
               economic, social, political
Inputs         System data
               Driving forces
Model          Model structure
               Technical
               Parameters
Model outputs
Methodologies for assessing uncertainty
A list of the most common methodologies applicable for addressing different types of uncertainty has been compiled and briefly described in Refsgaard et al. (2005). Table 3 provides an overview.

Table 3 Applicability of different methodologies to address different types and sources of uncertainty (modified after Refsgaard et al., 2005).

Source of uncertainty                           Statistical             Scenario             Qualitative        Recognised
                                                uncertainty             uncertainty          uncertainty        ignorance
Context        Natural, technological,          EE                      EE, SC, SI           EE, EPR, NUSAP,    EE, EPR, NUSAP,
               economic, social, political                                                   SI, UM             SI, UM
Inputs         System data                      DA, EPE, EE, MCA, SA    DA, EE, SC           DA, EE             DA, EE
               Driving forces                   DA, EPE, EE, MCA, SA    DA, EE, SC           DA, EE, EPR        DA, EE, EPR
Model          Model structure                  EE, MMS, QA             EE, MMS, SC, QA      EE, NUSAP, QA      EE, NUSAP, QA
               Technical                        QA                      QA                   QA                 QA
Parameters                                      EE, IN-PA, SA           EE, IN-PA, SA        EE                 EE
Model outputs                                   EPE, EE, IN-UN,         EE, IN-UN, MMS, SA   EE, NUSAP          EE, NUSAP
                                                MCA, MMS, SA

Abbreviations of methodologies:
DA      Data Uncertainty
EPE     Error Propagation Equations
EE      Expert Elicitation
EPR     Extended Peer Review (review by stakeholders)
IN-PA   Inverse modelling (parameter estimation)
IN-UN   Inverse modelling (predictive uncertainty)
MCA     Monte Carlo Analysis
MMS     Multiple Model Simulation
NUSAP   NUSAP (Numeral, Unit, Spread, Assessment, Pedigree)
QA      Quality Assurance
SC      Scenario Analysis
SA      Sensitivity Analysis
SI      Stakeholder Involvement
UM      Uncertainty Matrix
4.3.2 Data uncertainty
Uncertainty in data is a major source of uncertainty when assessing the uncertainty of model outputs. It is also an uncertainty source that is very visible to people outside the modelling community. One of the scientific contributions of the HarmoniRiB project ([14]) is to address data uncertainty. This has been done in three steps:
• A methodology has been developed for characterising uncertainty in different types of data (Brown et al., 2005).
• A software tool (Data Uncertainty Engine – DUE) for supporting the assessment of data uncertainty has been developed (Brown and Heuvelink, 2005).
• Reviews of results on data uncertainty reported in the literature have been compiled into a guideline report for assessing uncertainty in various types of data originating from meteorology, soil physics and geochemistry, hydrogeology, land cover, topography, discharge, surface water quality, ecology and socio-economics (Van Loon and Refsgaard, 2005).
The categorisation of data types distinguishes 13 categories (Table 4), for each of which a conceptual data uncertainty model is developed. By considering the measurement scale, it becomes possible to quickly limit the relevant uncertainty models for a certain variable. On a discrete measurement scale, for example, it is only relevant to consider discrete probability distribution functions, whereas continuous density functions are required for continuous numerical data. In addition, the space and time variability determines the need for autocorrelation functions alongside a probability density function (pdf). Each data category is associated with a range of uncertainty models, for which more specific pdfs may be developed under different simplifying assumptions (e.g. Gaussian; second-order stationarity; degree of temporal and spatial autocorrelation).

Table 4 The subdivision of uncertainty categories along the 'axes' of space-time variability and measurement scale (Brown et al., 2005).

                                                         Measurement scale
Space-time variability             Continuous      Discrete        Categorical     Narrative
                                   numerical       numerical
Constant in space and time         A1              A2              A3
Varies in time, not in space       B1              B2              B3                  4
Varies in space, not in time       C1              C2              C3
Varies in time and space           D1              D2              D3
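As an illustration of what a conceptual data uncertainty model for one of these categories might look like, the sketch below (Python; an assumed error model for illustration only, not the DUE implementation) generates realisations of an uncertain daily rainfall series of category B1 – continuous numerical, varying in time but not in space – using a multiplicative Gaussian error with first-order autoregressive temporal autocorrelation.

# Minimal sketch with an assumed error model: realisations of an uncertain daily rainfall series
# (category B1), multiplicative Gaussian error with AR(1) temporal autocorrelation.
import numpy as np

rng = np.random.default_rng(7)
observed = rng.gamma(shape=0.8, scale=5.0, size=365)     # hypothetical observed daily series (mm/day)
cv, corr_length = 0.25, 3.0                              # assumed error coefficient of variation and
phi = np.exp(-1.0 / corr_length)                         # day-to-day autocorrelation (exponential decay)

def realisation(series):
    """One equally plausible realisation of the 'true' series given the error model."""
    eps = np.empty(series.size)
    eps[0] = rng.normal()
    for t in range(1, series.size):                      # AR(1) noise with unit variance
        eps[t] = phi * eps[t - 1] + np.sqrt(1.0 - phi**2) * rng.normal()
    return np.maximum(series * (1.0 + cv * eps), 0.0)    # multiplicative error, no negative rainfall

ensemble = np.array([realisation(observed) for _ in range(100)])
print("annual total:", ensemble.sum(axis=1).mean(), "+/-", ensemble.sum(axis=1).std(), "mm")

The ensemble of realisations, rather than a single 'best' series, is what would be fed into a model when propagating input data uncertainty.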
4.3.3 Parameter uncertainty
In addition to data uncertainty, uncertainty in parameter values is the most commonly considered source of uncertainty in hydrological modelling. The most scientifically sound way of assessing parameter uncertainty is through inverse modelling (Duan et al., 1994; Hill, 1998; Doherty, 2003). These techniques have the benefit that, in addition to optimal parameter values, they also produce calibration statistics in terms of parameter and observation sensitivities, parameter correlations and parameter uncertainties. When parameter uncertainties have been assessed, they can be propagated through the model to infer the model output uncertainty. A serious constraint in this respect is the interdependence between model parameters and model structure, as discussed under model structure uncertainty below.

[11] describes an example of how (input) data uncertainty and parameter uncertainty are propagated through a model to assess the uncertainty in model simulations of nitrate concentrations in groundwater. The assessment of data and parameter uncertainties was done by expert judgement, and a Monte Carlo technique with Latin hypercube sampling was used for the uncertainty propagation. The simulated uncertainty band around the deterministic model simulation in Fig. 25 is shown in Fig. 34, based on 25 Monte Carlo realisations. The uncertainty is seen to be considerable, e.g. with the estimate of the areal fraction of the aquifer having concentrations less than 50 mg NO3/l ranging between 30% and 80%.
Fig. 34 Measured and simulated cumulative areal distributions of NO3 concentrations (mg/l) in groundwater at a point in time (ultimo 1993). Measured values are based on 35 groundwater observations. [11].

As noted in [11], a fundamental limitation of the adopted approach is that errors due to an incorrect model structure are neglected. As also discussed below, one approach to assessing such model structure error is through comparison of predicted and observed values. In the present case (Figs 25 and 34) the deviation between observed and simulated values is so small that this term may be neglected. This is, however, by no means a proof of a correct model structure; it only shows that the particular model performs without apparent model errors for this particular application.
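The general pattern of the uncertainty propagation used in [11] – sample the uncertain inputs and parameters, run the model for each realisation, and summarise the spread of the outputs – can be sketched as follows. The model, the parameter names and the ranges below are placeholders chosen for illustration; the actual study used a coupled, distributed catchment model and expert-judged distributions.

# Sketch of Monte Carlo uncertainty propagation with Latin hypercube sampling (illustrative only).
import numpy as np

rng = np.random.default_rng(42)

def latin_hypercube(n_samples, n_params):
    """n_samples x n_params matrix of stratified uniform [0, 1) samples."""
    u = (rng.random((n_samples, n_params)) + np.arange(n_samples)[:, None]) / n_samples
    for j in range(n_params):                 # independent random permutation per parameter
        rng.shuffle(u[:, j])
    return u

# Assumed parameter ranges (hypothetical: leaching factor, denitrification rate, recharge factor)
lower = np.array([0.5, 0.05, 0.8])
upper = np.array([1.5, 0.30, 1.2])

def toy_model(p):
    """Placeholder for the catchment model: returns a nitrate concentration (mg NO3/l)."""
    leach, denit, recharge = p
    return 60.0 * leach * np.exp(-denit * 10.0) / recharge

params = lower + latin_hypercube(25, 3) * (upper - lower)   # 25 realisations, as in [11]
outputs = np.array([toy_model(p) for p in params])
print("5-95% band:", np.percentile(outputs, [5, 95]))

Latin hypercube sampling stratifies each parameter range so that a small number of realisations still covers the distributions reasonably well, which matters when each model run is computationally expensive.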
4.3.4 Model structure uncertainty
Existing approaches and new framework
Any model is an abstraction, simplification and interpretation of reality. The incompleteness of a model structure and the mismatch between the real causal structure of a system and the assumed causal structure as represented in the model will therefore always result in uncertainty about model predictions. The importance of the model structure for predictions is well recognised, even for situations where predictions are made for output variables, such as discharge, for which field data are available (Franchini and Pacciani, 1992; Butts et al., 2004). The considerable challenge faced in many applications of environmental models is that predictions are required beyond the range of available observations, either in time or in space, e.g. to make extrapolations towards unobservable futures (Babendreier, 2003) or to make predictions for natural systems, such as ecosystems, that are likely to undergo structural changes (Beck, 2005). In such cases, uncertainty in model structure is recognised by many authors to be the main source of uncertainty in model predictions (Dubus et al., 2003; Neuman and Wierenga, 2003; Linkov and Burmistrov, 2003).

The existing strategies for assessing uncertainty due to an incomplete or inadequate model structure may be grouped into the categories shown in Fig. 35. The most important distinction is whether data exist that make it possible to infer directly about the model structure uncertainty. This requires that data are available for the output variable of predictive interest and for conditions similar to those in the predictive situation. In other words, it is a distinction between whether the model predictions can be considered as interpolations or extrapolations relative to the calibration situation.
[Fig. 35 is a classification diagram: Availability of data for model validation test? Target data exist (interpolation) – strategies: increase parameter uncertainty; estimate structural term. No direct data (extrapolation), covering both the intermediate data (differential split-sample) case and the no data at all (proxy basin) case – strategies: multiple conceptual models; expert elicitation; pedigree analysis.]

Fig. 35 Classification of existing strategies for assessing conceptual model uncertainty [15].
The two main categories are thus equivalent to different situations with respect to model validation tests. According to Klemes' classical hierarchical test scheme (Klemes, 1986; see Section 4.2 above), the interpolation case corresponds to situations where the traditional split-sample test is suitable, while the extrapolation case corresponds to situations where no data exist for the concerned output variable (proxy-basin test) or where the basin characteristics are considered non-stationary, e.g. for predictions of the effects of climate change or land use change (differential split-sample test).

The strategies used in 'interpolation', i.e. for situations that are similar to the calibration situation with respect to variables of interest and conditions of the natural system, have the advantage that they can be based directly on field data (e.g. Radwan et al., 2004; van Griensven and Meixner, 2004; Vrugt et al., 2005). A fundamental weakness is that field data are themselves uncertain. Nevertheless, in many cases they can be expected to provide relatively accurate estimates of, at least, the total predictive uncertainty for the specific measured variable and for the same conditions as those in the calibration and validation situation. A more serious limitation of the strategies depending on observed data is that they are only applicable for situations where the output variables of interest are measured. While relevant field data are often available for variables such as water levels and water flows, this is usually not the case for concentrations, or when predictions are desired for scenarios involving catchment change, such as land use change or climate change. Another serious limitation stems from the assumption that the underlying system does not undergo structural changes, such as changes in ecosystem processes due to climate change.

The strategy that uses multiple conceptual models benefits from an explicit analysis of the effects of alternative model structures, e.g. IPCC (2001), Harrar et al. (2003), Troldborg (2004), Poeter and Anderson (2005) and Højberg and Refsgaard (2005). The multiple conceptual model strategy makes it possible to include expert knowledge on plausible model structures. This strategy is strongly advocated by Neuman and Wierenga (2003) and Poeter and Anderson (2005). They characterise the traditional approach of relying on a single conceptual model as one in which plausible conceptual models are rejected (in this case by omission). They conclude that the bias and uncertainty that result from reliance on an inadequate conceptual model are typically much larger than those introduced through an inadequate choice of model parameter values. This view is consistent with Beven (2002b), who outlines a new philosophy for modelling of environmental systems. The basic aim of his approach is to extend traditional schemes with a more realistic account of uncertainty, rejecting the idea that a single optimal model exists for any given case. Instead, environmental models may be non-unique in their accuracy of both reproduction of observations and prediction (i.e. unidentifiable or equifinal), and subject to only a conditional confirmation, due to e.g. errors in model structure, calibration of parameters and the period of data used for evaluation. A weakness of the multiple modelling strategy is the absence of quantitative information about the extent to which each model is plausible.
Furthermore, it may be difficult to sample from the full range of plausible conceptual models. In this respect, the expert knowledge on which the formulations of multiple conceptual models are based is an important and unavoidable subjective element. The framework presented in [15] for assessing the predictive uncertainties of environmental models used for extrapolation therefore combines the use of multiple conceptual models with an assessment of their credibility by use of the pedigree approach, as well as a reflection on the extent to which the sampled models adequately represent the space of plausible models.
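The following sketch is an illustration only, not the formal framework of [15]: predictions of the same target variable from alternative conceptual models are pooled, with hypothetical pedigree scores used as weights expressing the credibility of each model structure.

# Illustrative sketch: pooling predictions from alternative conceptual models, weighted by
# (hypothetical) pedigree scores. Values and model names are invented for the example.
import numpy as np

# Each entry: Monte Carlo predictions (e.g. travel time in years) from one calibrated model
# structure, plus a pedigree score in [0, 1] assigned by expert review.
models = {
    "geology A": {"predictions": np.random.default_rng(1).normal(35, 5, 500), "pedigree": 0.8},
    "geology B": {"predictions": np.random.default_rng(2).normal(55, 7, 500), "pedigree": 0.6},
    "geology C": {"predictions": np.random.default_rng(3).normal(20, 4, 500), "pedigree": 0.3},
}

weights = np.array([m["pedigree"] for m in models.values()], dtype=float)
weights /= weights.sum()

# Pool the ensembles by resampling each model in proportion to its weight
rng = np.random.default_rng(0)
pooled = np.concatenate([
    rng.choice(m["predictions"], size=int(round(w * 3000)), replace=True)
    for m, w in zip(models.values(), weights)
])
print("pooled 5-95% range (years):", np.percentile(pooled, [5, 95]))
# The spread between model structures dominates the pooled uncertainty here, illustrating why
# neglecting conceptual model uncertainty can be seriously misleading.

How the pedigree scores are elicited, and whether the three structures span the space of plausible models, remain subjective judgements, which is exactly the point made in the text above.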
The role of model calibration
Some of the existing strategies used in 'interpolation' cannot differentiate how the total predictive uncertainty originates from model input, model parameter and model structure uncertainty. Other methods attempt to do so, but as discussed in [15] this is problematic. In the case of uncalibrated models, the parameter uncertainty is very difficult to assess quantitatively, and wrong estimates of model parameter uncertainty will influence the estimates of model structure uncertainty. In the case of calibrated models, estimates of model parameter uncertainty can often be derived from autocalibration routines. An inadequate model structure will, however, be compensated for by biased parameter values that optimise the model fit with field data during calibration. Hence, the uncertainty due to model structure will be underestimated in this case.

The importance of model calibration can be illustrated by the example described in Højberg and Refsgaard (2005). They used three different conceptual models, based on three alternative geological interpretations, for a multi-aquifer system in Denmark. Each of the models was calibrated against piezometric head data using inverse techniques. The three models provided equally good and very similar predictions of groundwater heads, including well field capture zones. However, when the models were used to extrapolate beyond the calibration data to predictions of flow pathways and travel times, the three models differed dramatically. When assessing the uncertainty contributed by the model parameter values, the overlap of uncertainty ranges between the three models decreased significantly when moving from groundwater heads to capture zones and travel times. They conclude that the larger the degree of extrapolation, the more the underlying conceptual model dominates over the parameter uncertainty and the effect of calibration. This diminishing effect of calibration, as the prediction situation is extrapolated further and further away from the calibration base, resembles the conclusion on the effect of updating relative to the underlying process model when forecast lead times are increased in real-time forecasting (Fig. 27, Section 3.3). Here the effect of updating is reduced, and the forecast error therefore increases, as the forecast lead time (= degree of extrapolation) increases.
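The point can be illustrated with a purely synthetic example (not the Danish multi-aquifer case): two model structures calibrated to the same data are nearly indistinguishable within the calibration range, but diverge increasingly as predictions are extrapolated away from it.

# Purely synthetic illustration of how calibration masks structural error inside the calibration
# range, while the structural choice dominates under extrapolation.
import numpy as np

rng = np.random.default_rng(5)
x_cal = np.linspace(0.0, 1.0, 20)                    # conditions covered by calibration data

def truth(x):
    """Unknown 'true' system generating the observations."""
    return 1.0 + 2.0 * x + 1.5 * x**2

obs = truth(x_cal) + rng.normal(0.0, 0.05, x_cal.size)

# Structure 1: linear; structure 2: quadratic. Both calibrated by least squares to the same data.
p_lin = np.polyfit(x_cal, obs, 1)
p_quad = np.polyfit(x_cal, obs, 2)

for x in (0.5, 2.0, 4.0):                            # interpolation, then growing extrapolation
    y1, y2 = np.polyval(p_lin, x), np.polyval(p_quad, x)
    print(f"x={x}: linear {y1:6.2f}, quadratic {y2:6.2f}, difference {abs(y1 - y2):6.2f}")
# Inside the calibration range the two structures are nearly indistinguishable; far outside it the
# structural choice dominates, analogous to heads versus travel times in the cited study.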
4.3.5 Discussion – post evaluation
Uncertainty is a key, and crosscutting, issue that I consider a useful platform or catalyst for establishing a common understanding in hydrological modelling and water resources management. By this I mean a common understanding both within the natural science based modelling issues, such as scaling and validation, and between people from the modelling and the monitoring communities, as well as a broader dialogue between modellers and stakeholders on issues such as when a model is accurate and credible enough for its purpose of application, see Subsection 4.4.4 below.

In the publications on developing the Suså model ([1], [2]) and the oxygen module ([3]) no explicit consideration is given to the goodness of the model structure, and uncertainty assessment was not an issue at all. In the later work on catchment modelling in India ([4], [5]), where the physical realism of the model was twisted somewhat due to scaling problems, it was noted that the model results might be 'right for the wrong reasons', and the limitations of model applicability were emphasised in this respect, but no uncertainty assessments were made. In the paper describing a methodology for parameterisation, calibration and validation of distributed hydrological models ([7]) uncertainty is also neglected. In the publications [6], [8], [9] and [10] uncertainty is discussed, but as a secondary issue only.

Although examples of model prediction uncertainty assessments had been reported previously from different modelling disciplines (e.g. Refsgaard et al., 1983; Beck, 1987), the first to emphasise the need to systematically perform uncertainty assessments related to catchment model predictions was probably Beven (1989). This was followed by Binley et al. (1991), who used Monte Carlo analysis to assess the predictive uncertainty of the Institute of Hydrology Distributed Model, and by the introduction of the Generalised Likelihood Uncertainty Estimation (GLUE) methodology (Beven and Binley, 1992), after which uncertainty in catchment modelling was high on the agenda in the scientific community.

My main scientific contributions on uncertainty are the publications [11], [14] and [15] and the link of uncertainty to principles and protocols for good modelling practice in [12] and [13]. Although reported 10 years later than Binley et al. (1991), [11] was one of the first studies with uncertainty propagation through a complex, coupled, distributed physically based catchment model with a focus on water quality. A key contribution of [14] and Refsgaard et al. (2005) is the broad framework for characterising uncertainty. This framework provides the link to uncertainty in the quality assurance work ([12], [13]). The broad framework is inspired by research in social science (Pahl-Wostl, 2002; van Asselt and Rotmans, 2002; Dewulf et al., 2005). The main difference between the traditions in social science and natural science is that social scientists emphasise participatory processes, including consultation and involvement of users, also on uncertainty aspects, right from the beginning of a study, while natural scientists often talk about users as someone to whom uncertainty results should be communicated, e.g. Pappenberger and Beven (2006).

The most difficult uncertainty problem (in natural science) to handle today is model structure uncertainty, and the most important and novel contribution is probably the efforts made in this respect, primarily the new framework outlined in [15] but also the inclusion of options for evaluating multiple conceptual models in the HarmoniQuA modelling protocol ([13] and Fig. 5). The approach suggested in [15] of using multiple conceptual models (model structures) is not new (IPCC, 2001; Beven, 2002a; Neuman and Wierenga, 2003), and the use of pedigree analysis to qualitatively assess the credibility of something is not new either (van der Sluijs et al., 2005). The novelty lies in the combination of the two approaches, which originate from different disciplines.
4.4 Quality Assurance in Model based Water Management
4.4.1 Background
During the last decade many problems have emerged in river basin modelling projects, including poor quality of modelling, unrealistic expectations, and lack of credibility of modelling results. Some of the reasons for this lack of quality can be evaluated ([13]; Scholten et al., 2007) as the effect of:
• Ambiguous terminology and a lack of understanding between key players (modellers, clients, reviewers, stakeholders and concerned members of the public)
• Bad practice (careless handling of input data, inadequate model set-up, insufficient calibration/validation and model use outside of its scope)
• Lack of data or poor quality of available data
• Insufficient knowledge on the processes
• Poor communication between modellers and end-users on the possibilities and limitations of the modelling project, and overselling of model capabilities
• Confusion on how to use model results in decision making
• Lack of documentation and clarity on the modelling process, leading to results that are difficult to audit or reproduce
• Insufficient consideration of economic, institutional and political issues and a lack of integrated modelling.

In the water resources management community many different guidelines on good modelling practice have been developed, see [13] for a review. One of the most comprehensive examples of a modelling guideline has been developed in The Netherlands (Van Waveren et al., 2000) as a result of a process involving all the main players in the Dutch water management field. The background for this was a perceived need to improve the quality of modelling (Scholten et al., 2000). Similarly, modelling guidelines for the Murray-Darling Basin in Australia were developed due to the perception among end-users that model capabilities may have been 'over-sold', and that there was a lack of consistency in approaches, communication and understanding among and between the modellers and the water managers, which often resulted in considerable uncertainty for decision making (Middlemis, 2000).
4.4.2 The HarmoniQuA approach
A software tool, MoST, with its associated knowledge base (KB), has been developed by the HarmoniQuA project ([13]; Scholten et al., 2007) to provide QA in modelling through guidance, monitoring and reporting. As defined in HarmoniQuA: "Quality Assurance (QA) is the procedural and operational framework used by an organisation managing the modelling study to build consensus among the organisations concerned in its implementation, to assure technically and scientifically adequate execution of all tasks included in the study, and to assure that all modelling-based analysis is reproducible and justifiable". This modification of the older NRC (1990) definition includes the organisational, technical and scientific aspects, but also the need to build consensus among the organisations concerned, in accordance with the discussion in Section 2.1 above.

Guidelines for good modelling practice are included in the Knowledge Base (KB) of MoST. The modelling process has been decomposed into five steps, see the flowchart in Fig. 5. Each step includes several tasks. Each task has an internal structure, i.e. name, definition, explanation, interrelations with other tasks, activities, activity-related methods, references, sensitivity/pitfalls, and task inputs and outputs. The KB contains knowledge specific to seven domains (groundwater, precipitation-runoff, river hydrodynamics, flood forecasting, water quality, ecology and socio-economics), and forms the heart of the tool. A computer based journal is produced within MoST, where the water manager and modelling team record the progress and decisions made during a model study according to the tasks in the flowchart. This record can be used when reviewing the model study to judge its quality. The most important QA principles incorporated in the KB are:
• The five modelling steps conclude with a formal dialogue between the modeller and manager, where activities and results from the present step are reported, and details of plans for the next step (a revised work plan) are discussed.
• External reviews are prescribed as the key mechanism for ensuring that the knowledge and experience of other, independent modellers are used.
• The KB provides public interactive guidelines to facilitate dialogue between modellers and the water manager, with options to include auditors (reviewers), stakeholders and the public.
• There are many feedback loops, some technical, involving only the modeller, and others that may require a decision before doing costly additional work.
• The KB allows performance and accuracy criteria to be updated during the modelling process. In the first step the water manager's objectives and requirements are translated into performance criteria that may include qualitative and quantitative measures. These criteria may be modified during the formal reviews of subsequent steps.
• Emphasis is put on validation schemes, i.e. tests of model performance against data that have not been used for model calibration.
• Uncertainties must be explicitly recognised and assessed (qualitatively and/or quantitatively) throughout the modelling process.

MoST supports multi-domain studies and working in teams of different user types (water managers, modellers, auditors, stakeholders and members of the public). It contains an interactive glossary that is accessible via hyperlinked text. The key functionality of MoST is to:
• Guide, to ensure a model has been properly applied. This is based on the Knowledge Base.
• Monitor, to record decisions, methods and data used in the modelling work and in this way enable transparency and reproducibility of the modelling process.
• Report, to provide suitable reports of what has been done for managers/clients, modellers, auditors, stakeholders and the general public.
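The internal task structure described above can be illustrated schematically as a simple record type. This is a hypothetical illustration only, not MoST's actual data model; all field names are taken from the description in the text.

# Hypothetical illustration of a knowledge base task record (not MoST's actual data model).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str
    definition: str
    explanation: str = ""
    interrelations: List[str] = field(default_factory=list)   # links to other tasks
    activities: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)          # activity-related methods
    references: List[str] = field(default_factory=list)
    pitfalls: List[str] = field(default_factory=list)         # sensitivity / pitfalls
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    domains: List[str] = field(default_factory=list)          # e.g. groundwater, water quality

# Example journal entry recorded during a model study
validate = Task(
    name="Validate model",
    definition="Test model performance against data not used for calibration",
    domains=["groundwater"],
    outputs=["validation report", "updated accuracy statement"],
)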
4.4.3 Organisational requirements for QA guidelines to be effective
Modelling studies involve several parties with different responsibilities. The key players are modellers and water managers, but often reviewers, stakeholders and the general public are also involved. To a large extent the quality of the modelling study is determined by the expertise, attitudes and motivation of the teams involved in the modelling and QA process. QA will only be successful if all parties actively support its use. The attitude of the modellers is important. NRC (1990) characterises this as follows: “most modellers enjoy the modelling process but find less satisfaction in the process of documentation and quality assurance”. Scholten and Groot (2002) describe the main problem with the Dutch Handbook on Good Modelling Practice as “they all like it, but only a few use it”. The water manager, however, has a particular responsibility, because he/she has the power to request and pay for adequate QA in modelling studies. Therefore, QA guidelines can only be expected to be used in practice if the water manager prescribes their use. It is therefore very important that the water manager has the technical capacity to organise the QA process. Often, water managers do not have individuals available with the appropriate training to understand and use models. An external modelling expert should then be sought to help with the QA process. However, this requires that the manager is aware of the problem and the need.
4.4.4 Performance criteria and uncertainty – when is a model good enough?
A critical issue is how to define the performance criteria. We agree with Beven (2002b) that any conceptual model is known to be wrong, and hence any model will be falsified if we investigate it in sufficient detail and specify very high performance criteria. Clearly, if one attempts to establish a model that should simulate the truth, it will always be falsified. However, this is not very useful information. Therefore, we use conditional validation, i.e. validation restricted to a domain of applicability (numerically universal as opposed to strictly universal in Popperian terms). The good question is then: what is good enough? Or, in other words, what are the criteria and how do we select them?

A good reference for model performance is to compare it with the uncertainties of the available field observations. If the model performance is within this uncertainty range, we often characterise the model as good enough. However, usually it is not so simple. How wide confidence bands do we accept on observational uncertainties – ranges corresponding to 65%, 95% or 99%? Do we then always reject a model if it cannot perform within the observational uncertainty range? In many cases even results from less accurate models may be useful. Therefore, the decision on what is good enough must generally be taken in a socio-economic context. For instance, the accuracy requirements for a model to be used for an initial screening of alternative options for the location of a new small well field for a small water supply will be much lower than the requirements for a model intended to be used for the final design of a large well field for a major water supply in an area with potentially damaging effects on valuable nature and other significant conflicts of interest. Thus, the accuracy criteria cannot be decided universally by modellers or researchers, but must differ from case to case depending on how much is at stake in the decision that depends on support from model predictions. This implies that the performance criteria must be discussed and agreed between the manager and the modeller beforehand.
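One simple way to operationalise 'performance within the observational uncertainty range' is sketched below. The observation error, the confidence level of the band and the acceptable fraction of points inside it are all assumptions that, following the argument above, must be agreed between water manager and modeller for the specific case.

# Minimal sketch: fraction of simulated values lying within an assumed observational uncertainty band.
import numpy as np

observed = np.array([2.1, 2.4, 3.0, 2.8, 3.5, 3.1])      # e.g. groundwater heads (m), hypothetical
obs_std = 0.15                                            # assumed observation uncertainty (standard deviation)
simulated = np.array([2.0, 2.5, 2.9, 3.1, 3.3, 3.2])

z = 1.96                                                  # 95% band for Gaussian observation errors
inside = np.abs(simulated - observed) <= z * obs_std
print(f"{inside.mean():.0%} of simulated values lie within the 95% observational uncertainty band")
# Whether e.g. 80% is 'good enough' depends on what is at stake in the decision the model supports.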
Accuracy requirements and uncertainty assessments of model simulations are two sides of the same coin, just seen from two different perspectives, namely those of the water manager and the modeller. As not all uncertainty can be characterised as statistical uncertainty (see Fig. 33 and Tables 2 and 3 in Subsection 4.3.1), it is also necessary to characterise accuracy requirements in qualitative terms. Furthermore, the risk perception of the water manager and the stakeholders/public has to be considered. Therefore, involvement of stakeholders and the public is most often required as an integrated part of this process (see also Section 2.1 and Figs. 1-2). According to the HarmoniQuA methodology, stakeholder/public involvement is crucial at the beginning of a modelling project to frame the problem, define the requirements and assess the uncertainties (Henriksen et al., submitted).

This way of thinking is well in line with the principles behind some of the Water Framework Directive Guidance Documents. For example, the Guidance Document on Monitoring (EC, 2003a) does not specify the levels of precision and confidence required from the monitoring programmes, but rather states that the precision and confidence level should be sufficient to enable a meaningful assessment of, for instance, the status of the environment, and should be sufficient to achieve an acceptable risk of making the wrong decision. This obviously calls for uncertainty assessments and public participation to have a central role in the entire process, which paves the way towards making adaptive management an important part of the river basin management process (Pahl-Wostl, 2002).
4.4.5 Discussion – post evaluation
The ideas and concepts behind the HarmoniQuA guidelines ([12], [13]) summarised above have been inspired by previous QA guidelines. The novel contributions have been inspired both by previous research activities (including [4], [5], [6], [7], [9], [11]) and by participation in a large range of national and international consultancy projects. Without having been at this crossroads between the research world and the practical world for more than two decades, this would not have been possible. I consider my most important contributions in this respect to be:
• The terminology and guiding principles behind the guidelines [12] are novel in their attempt to formulate a coherent approach that on the one hand has a solid scientific philosophical foundation and on the other hand can be useful for practitioners. In the very controversial issue of model validation, where there has been almost a deadlock between different schools with respect to whether validation is possible at all, the philosophy of conditional validation is novel.
• The major novelty of the HarmoniQuA approach does not lie in its guidance on model technical issues, but in its emphasis and more elaborate focus on the dialogue between modeller, water manager, reviewer, stakeholders and the public. In addition, there are novel elements in the strong emphasis on uncertainty assessments throughout the modelling process and on model validation. Finally, the emphasis on model reviews allows subjective knowledge and experience to be brought into the QA process.

Both the HarmoniQuA guidelines and other recent good modelling practice guidelines have been deeply rooted both in the scientific community and among practitioners ([13]). As a comparison, ideas originating solely from the natural science community, such as the suggested Code of Practice for performing uncertainty analysis by Pappenberger and Beven (2006), are typically limited to valuable contributions on model technical issues, while they often do not consider the broader aspects of the modelling process, such as the involvement of water managers and stakeholders.
5 Conclusions and Perspectives for Future Work
5.1 Summary of Main Scientific Contributions
The contributions to scientific knowledge in the papers of the present thesis are discussed in the previous chapters. The main contributions have been in the following five areas:
• New conceptual understanding and code development. The Suså model ([1], [2]) was based on a new conceptual understanding of the surface water/groundwater interaction in moraine catchments. The code and its application brought new insight regarding the effect of groundwater abstraction on streamflow in catchments with such hydrogeological characteristics.
• Model validation. The adoption and adaptation of rather rigorous principles for model validation and the examples of their application, both for lumped conceptual and distributed physically based models, is a cornerstone in my research. This work was first published in [6] and [7] and later brought into a broader modelling framework in [12] and [13]. In particular the introduction of the term 'conditional validation' in [7] and the outline of its scientific philosophical basis in [12] is novel.
• Scaling. The publications focussing on scaling ([7], [10]) present ideas crystallised from work with scaling problems in many modelling studies ranging from point scale to thousands of km2. The framework outlined in Section 4.1 above does not in any way 'solve' the scaling problem, but contributes to clarifying applicable methodologies with a focus on their respective assumptions and limitations.
• Uncertainty assessment. During the past decade a considerable part of my research work has focussed on uncertainty aspects. I consider my main contributions in this respect to be the introduction of the broader uncertainty framework integrated into the modelling framework ([13], [14]) and the work with model structure uncertainty ([15]).
• Modelling protocols and guidelines for quality assurance in the modelling process. The modelling protocol in [7] and the later and more comprehensive one presented as part of the guidelines for quality assurance in the modelling framework in [13] are a formalisation of experience and practices that have gradually emerged over the years. The novel elements in [13] are the emphasis on (a) the interactive dialogue between modeller, water manager, reviewer, stakeholders and the public; (b) uncertainty assessments throughout the modelling process; (c) model validation; and (d) experience and subjective knowledge introduced through external model reviews.
These main contributions to scientific knowledge would, however, not have been possible without the experience and insight gained in modelling studies ranging from point scale ([3]) to large catchments ([4], [5], [8], [9], [11]).
5.2 Modelling Issues for Future Research
Hydrological modelling has developed significantly during the three decades I have worked in this field. I started by editing punch cards and could only run one simulation per day (overnight), using model codes that today are considered small and simple. Since then, comprehensive new knowledge has been built into model codes and into the methodologies used in the modelling process. During the process of writing this thesis, where I had to review my older publications, it was interesting to note the gradual change in research focus. During the first decade my research focused on the development of new codes. During the second decade more general methodological problem areas, such as scaling and model validation, were addressed. Towards the end of the third decade the emphasis is now on broader issues, such as uncertainty assessment and quality assurance frameworks for the entire modelling process, and the interaction between the modelling and the water management processes. While this no doubt is affected by personal and career developments, it also reflects a general trend. We are no longer satisfied with being able to produce beautiful simulations with sophisticated new model codes; we also want to evaluate the credibility of such simulations and to apply them in real-world water management decisions.

Certainly I did not foresee this development three decades ago. Against this background it is therefore not wise to make long range forecasts of what we can expect to be the key issues for future modelling research. Hence, the following list does not pretend to cover all the most important research issues for modelling in the many years to come. It rather presents a list of issues which I, seen from the perspective dealt with in the present thesis, consider the presently most important and fundamental problems requiring more research during the coming years:
• Improved representation of heterogeneity in reactive transport modelling. There will always be a need to improve our conceptual understanding of hydrological processes. It appears that, whereas we have had some success with the prediction of flows and hydraulic heads, the existing paradigms in hydrological modelling are not good enough to simulate concentrations of conservative and reactive contaminants. Flows and hydraulic heads are much less dependent on heterogeneity than concentrations, and it will be necessary to include heterogeneity much more explicitly in the modelling than has been done until now. Examples of areas where this is important include simulation of the transport and fate of contaminants in aquifers and simulation of the stream-aquifer interaction governed by processes in river valleys.
• Utilisation of new data types. Whenever possible we should try to make use of new data types. New techniques for collecting satellite data on surface conditions and geophysical data on subsurface features are promising and have not been fully exploited yet. We can hope and expect that better techniques will be developed during the coming years. Thus, it is not unrealistic that in some years we will have improved data providing both a much better spatial resolution of catchment/aquifer properties and on-line information on state variables. The improved spatial resolution can help us give a better representation of heterogeneities in models (see above), while on-line information provides interesting potential for improved management. In order to utilise on-line data optimally, new and improved data assimilation (updating) techniques will be required.
• Model structure error. Probably the most important single issue related to uncertainty of model predictions is how to assess the uncertainty caused by model structure error. It is important because the most interesting fields of model application deal with assessments of the effects of human activities on the ecosystem. And it is at the same time fundamentally difficult, because in such situations we are using models beyond the situations where we can test the model performance against field data. I consider the framework based on multiple conceptual models ([15]) to be only a very first beginning in this respect.
• Uncertainty and credibility of modelling in relation to water resources management. Uncertainty assessments of model predictions are crucial for a sound use of models in water resources management in practice. Model predictions without uncertainty assessments correspond to presenting only a (minor) part of the available information. Uncertainty in relation to water resources management in practice is not confined to statistical uncertainty. It is also required to include aspects of qualitative uncertainty and ignorance. Furthermore, uncertainty must be seen in a broad socio-economic context where stakeholder and policy views are taken into account. There are many future challenges on this multi-disciplinary road. How do we ensure that models incorporate the best available information and adequately address the issues and the priorities set by water managers and stakeholders? How should we translate objectives and requirements formulated in qualitative language by water managers and stakeholders into accuracy criteria for a modelling study? And how should we compile and present uncertainties from a modelling study in a way that is understandable by non-modellers? Some of these questions are likely to be answered within the context of new water management paradigms such as adaptive management.
6 References Abbott MB (1992) The theory of the hydrological model, or: the struggle for the soul of hydrology. In: O’Kane JP (Ed.) Advances in theoretical hydrology, Elsevier, 237-254. Abbott MB, Bathurst JC, Cunge JA, O'Connel PE, Rasmussen J (1986a) An introduction to the European Hydrological System - Systeme Hydrologique Européen "SHE", 1: History and philosophy of a physically-based distributed modelling system. Journal of Hydrology, 87, 45-59. Abbott MB, Bathurst JC, Cunge JA, O'Connel PE, Rasmussen J (1986b) An introduction to the European Hydrological System - Systeme Hydrologique Européen "SHE", 2: Structure of a physically-based distributed modelling system. Journal of Hydrology, 87, 61-77. Abrahamsen P, Hansen S (2000) Daisy: an open soil-crop-atmosphere system model. Environmental Modelling & Software, 15, 313-330. Andersen J, Refsgaard JC, Jensen KH (2001) Distributed hydrological modelling of the Senegal River Basin – model construction and validation. Journal of Hydrology, 247, 200-214. Anderson MP, Woessner WW (1992) The role of postaudit in model validation. Advances in Water Resources, 15, 167-173. Babendreier JE (2003) National-scale multimedia risk assessment for hazardous waste disposal. International Workshop on Uncertainty, Sensitivity and Parameter Estimation for Multimedia Environmental Modelling held at U.S Nuclear Regulatory Commission, Rockville, Maryland, August 19-21, 2003. Proceedings, 103-109. Bathurst JC (1986a) Physically-based distributed modelling of an upland catchment using the Systeme Hydrologique Européen. Journal of Hydrology, 87, 79-102. Bathurst JC (1986b) Sensitivity analysis of the Systeme Hydrologique Européen for an upland catchment. Journal of Hydrology, 87, 103-123. Beck MB (1987) Water quality modelling: a review of the analysis of uncertainty. Water Resources Research, 23(8), 1393-1442. Beck MB (2005) Environmental foresight and structural change. Environmental Modelling & Software, 20, 651-670. Bergström (1976) Development and application of a conceptual runoff model for Scandinavian catchments. PhD Thesis, University of Lund, Bulletin Series A No 52. Bergström S (1992) The HBV model – its structure and applications. SMHI RH No 4. Norrköping. Bergström S (1995) The HBV model. In: Singh VP (Ed) Computer Models of Watershed Hydrology. Water Resources Publications, Highlands Ranch, Colorado, 443-476. Bergström S, Forsman A (1973) Development of a conceptual deterministic rainfall-runoff model. Nordic Hydrology, 4, 147-170. Beven K (1989) Changing ideas in hydrology – the case of physically based models. Journal of Hydrology, 105, 157-172. Beven K (1995) Linking parameters across scales: Subgrid parameterization and scale dependent hydrological models. Hydrological Processes, 9, 507-525. Beven K (1996a) A discussion of distributed hydrological modelling. In: Abbott MB, Refsgaard JC (Eds): Distributed Hydrological Modelling, Kluwer Academic Publishers, 255-278. Beven K (1996b) Response to comments on ‘A discussion of distributed hydrological modelling’. In: Abbott MB, Refsgaard JC (Eds): Distributed Hydrological Modelling, Kluwer Academic Publishers, 289-295. Beven K (2001) How far can we go in distributed hydrological modelling? Hydrology and Earth System Sciences, 5(1), 1-12. Beven K (2002a) Towards an alternative blueprint for a physically based digitally simulated hydrologic response modelling system. Hydrological Processes, 16(2), 189-206. Beven K (2002b) Towards a coherent philosophy for modelling the environment. 
Proceedings of the Royal Society of London, A, 458 (2026), 2465-2484.
Beven K, Binley AM (1992) The future of distributed models: model calibration and uncertainty prediction. Hydrological Processes, 6, 279-298. Binley AM, Beven KJ, Calver A, Watts LG (1981) Changing Responses in Hydrology: Assessing the Uncertainty in Physically Based Model Predictions. Water Resources Research, 27(6), 1253-1261. Birkinshaw SJ, Ewen J (2000) Nitrogen transformation component for SHETRAN catchment nitrate transport modelling. Journal of Hydrology, 230, 1-17. Blöschl G, Sivapalan M (1995) Scale issues in hydrological modelling: A review. Hydrological Processes, 9, 251-290. Brown JD (2004) Knowledge, uncertainty and physical geography: towards the development of methodologies for questioning belief. Transactions of the Institute of British Geographers 29(3), 367-381. Brown JD, Heuvelink GBM, Refsgaard JC (2005) An integrated framework for assessing and recording uncertainties about environmental data. Water Science and Technology, 52(6), 153-160. Brown JD, Heuvelink GBM (2005) Data Uncertainty Engine (DUE) User’s Manual. University of Amsterdam. http://www.harmonirib.com. Butts MB, Payne JT, Kristensen M, Madsen H (2004) An evaluation of the impact of model structure on hydrological modelling uncertainty for streamflow prediction. Journal of Hydrology, 298, 242-266. Burnash RJC (1995) The NWS river forecast system - catchment modelling. In: Singh VP (Ed): Computer Models of Watershed Hydrology, Water Resources Publications, 311-366. Christensen S (1994) Hydrological Model for the Tude Å Catchment. Nordic Hydrology, 25, 145-166. Conan C, Bouraoui F, Turpin N, de Marsily G, Bidoglio G (2003) Modelling Flow and Nitrate Fate at Catchment Scale in Brittany (France). Journal of Environmental Quality, 32, 2026-2032. Crawford NH, Linsley RK (1966) Digital simulation in hydrology, Stanford Watershed Model IV, Department of Civil Engineering, Stanford University, Technical Report 39. Currie JA (1961) Gaseous diffusion in the aeration of aggregated soils. Soil Science, 92, 40-45. Dagan G (1986) Statistical theory of groundwater flow and transport: pore to laboratory, laboratory to formation and formation to regional scale. Water Resources Research, 22(9), 120-134. De Marsily, G Combes P, Goblet P (1992) Comments on 'Ground-water models cannot be validated', by Konikow LF, Bredehoeft, JD, Advances in Water Resources, 15, 367-369. Dewulf A, Craps M, Bouwen R, Pahl-Wostl C (2005) Integrated management of natural resources dealing with ambiguous issues, multiple actors and diverging frames. Water Science and Technology, 52(6), 115-124. DHI (1995) MIKE 21 Short Description. Danish Hydraulic Institute, Hørsholm, Denmark. Djuurhus J, Hansen S, Schelde K, Jacobsen OH (1999) Modelling mean nitrate leaching from spatially variable fields using effective parameters. Geoderma, 87,261-279. Doherty J (2003) Ground water model calibration using pilot points and regularization. Ground Water, 41(2), 170-177. Duan Q, Sorooshian S, Gupta VK (1994) Optimal use of the SCE-UA global optimization method for calibrating watershed models. Journal of Hydrology 158, 265–284. Dubus, IG, Brown CD, Beulke S (2003) Sources of uncertainty in pesticide fate modelling. The Science of the Total Environment, 317, 53-72. EC (1992) Working Group of Independent Experts on Variant C of the Gabcikovo-Nagymaros Project, Working Group Report, Commission of the European Communities, Czech and Slovak Federative Republic, Republic of Hungary, Budapest November 23, 1992. 
EC (1993a) Working Group of Monitoring and Water Management Experts for the Gabcikovo System of Locks - Data Report, Commission of the European Communities, Republic of Hungary, Slovak Republic, Budapest November 2, 1993. EC (1993b) Working Group of Monitoring and Water Management Experts for the Gabcikovo System of Locks - Report on Temporary Water Management Regime, Commission of the European Communities, Republic of Hungary, Slovak Republic, Bratislava, December 1, 1993. EC (2003a) Common Implementation Strategy for the Water Framework Directive (2000/60/EC). Guidance Document No. 7. Monitoring under the Water Framework Directive. Working Group 2.7. Office for the Official Publications of the European Communities, Luxembourg.
EC (2003b) Common Implementation Strategy for the Water Framework Directive (2000/60/EC). Guidance Document No. 11. Planning Processes. Working Group 2.9. Office for the Official Publications of the European Communities, Luxembourg. EC (2004) Common Implementation Strategy for the Water Framework Directive (2000/60/EC) Guidance Document No 3, pressures and impacts, IMPRESS. Working Group 2.3. Office for the Official Publications of the European Communities, Luxembourg. Fleming G (1975) Computer simulation techniques in hydrology. Elsevier, New York. Franchini M, Pacciani M (1992) Comparative analysis of several conceptual rainfall-runoff models. Journal of Hydrology, 122, 161-219. Freeze RA, Harlan RL (1969) Blueprint for a physically-based digitally-simulated hydrologic response model. Journal of Hydrology, 9, 237-258. Gelhar LW (1986) Stochastic subsurface hydrology. From theory to application. Water Resources Research, 22(9), 135-145. Graham DN, Butts MB (2005) Flexible integrated watershed modelling with MIKE SHE. In: Singh VP, Frevert DK (Eds) Watershed Models. CRC Press, Chapter 10. Graham LP (1999) Modelling runoff to the Baltic Sea, Ambio, 28, 328-334. Grayson RB, Moore ID, McHahon TA (1992a) Physically based hydrologic modelling, 1. A terrain-based model for investigative purposes. Water Resources Research, 28(10), 2639-2658. Grayson RB, Moore ID, McHahon TA (1992b) Physically based hydrologic modelling, 2. Is the concept realistic ? Water Resources Research, 28(10), 2639-2658. Grayson R, Blöschl G (2000) Spatial Modelling of Catchment Dynamics. In: Grayson R, Blöschl G (Eds.) Spatial Patterns in Catchment Hydrology: Observations and Modelling. Cambridge University Press, UK. Groenenberg JE, Kros J, van der Salm C, de Vries W (1995) Application of the model NUCSAM to the Solling spruce site. Ecological Modelling, 83, 97-107. GWP (2000) Integrated Water Resources Management. TAC Background Papers No. 4. Global Water Partnership, Stockholm. Hansen S, Jensen HE, Nielsen NE, Svendsen H (1991) Simulation of nitrogen dynamics and biomass production in winter wheat using the Danish simulation model DAISY. Fertilizer Research, 27, 245-259. Hansen S, Thorsen M, Pebesma E, Kleeschulte S, Svendsen H (1999) Uncertainty in simulated leaching due to uncertainty in input data. A case study. Soil Use and Management, 15, 167-175. Harrar WG, Sonnenborg TO, Henriksen HJ (2003) Capture zone, travel time and solute transport predictions using inverse modelling and different geological models. Hydrogeology Journal, 11(5), 536-548. Havnø K, Madsen MN, Dørge J (1995) MIKE 11 - A Generalized River Modelling Package. In: Singh VP (Ed) Computer Models of Watershed Hydrology, Water Resources Publications, Highlands Ranch, Colorado, 733-782. Henriksen HJ, Refsgaard JC, Sonnenborg TO, Gravesen P, Brun A, Refsgaard A, Jensen KH (2001) STÅBI i grundvandsmodellering (Handbook in groundwater modelling). Danmarks og Grønlands Geologiske Undersøgelse, Rapport 2001/56. (In Danish) Henriksen HJ, Troldborg L, Nyegaard P, Sonnenborg TO, Refsgaard JC, Madsen B (2003) Methodology for construction, calibration and validation of a national hydrological model for Denmark. Journal of Hydrology 280, 52-71. Henriksen HJ, Refsgaard JC, Højberg AL, Ferrand N, Gijsbers P, Scholten H (submitted) Public participation in relation to quality assurance of water resources modelling (HarmoniQuA). Heuvelink GBM, Pebesma EJ (1999) Spatial aggregation and soil process modelling. Geoderma, 89, 47-65. 
Hill MC (1998) Methods and guidelines for effective model calibration. U.S. Geological Survey, WaterResources Investigations Report 98-4005. Denver CO. Højberg AL, Refsgaard JC (2005) Model Uncertainty - Parameter uncertainty versus conceptual models. Water Science and Technology, 52(6), 177-186. ICJ (1997) Case Concerning Gabcikovo-Nagymaros project (Hungary/Slovakia). Summary of the Judgement of 25 September 1997. International Court of Justice, The Hague.
ICWE (1992) The Dublin Statement and report of the conference. International Conference on Water and the Environment: Development issues for the 21st century. 26-31 January 1992, Dublin, Ireland.
IPCC (2001) Climate Change 2001: The Scientific Basis. Contribution of Working Group I to the Third Assessment Report of the Intergovernmental Panel on Climate Change [Houghton JT, Ding Y, Griggs DJ, Noguer M, van der Linden PJ, Dai X, Maskell K and Johnson CA (eds)]. Cambridge University Press, Cambridge, UK and New York, NY, USA, 881 pp.
Jensen KH, Mantoglou A (1992) Application of stochastic unsaturated flow theory, numerical simulations, and comparisons to field observations. Water Resources Research, 28, 269-284.
Jensen RA, Jørgensen GH (1988) Hydrologisk overfladevands/grundvands model (Hydrological surface water/groundwater model). Technical report prepared by Danish Hydraulic Institute for the County of Storstrøm and the County of Vestsjælland. (in Danish)
Jensen KH, Refsgaard JC (1991a) Spatial variability of physical parameters and processes in two field soils. Part I: Water flow and solute transport at local scale. Nordic Hydrology, 22, 275-302.
Jensen KH, Refsgaard JC (1991b) Spatial variability of physical parameters and processes in two field soils. Part II: Water flow at field scale. Nordic Hydrology, 22, 303-326.
Jensen KH, Refsgaard JC (1991c) Spatial variability of physical parameters and processes in two field soils. Part III: Solute transport at field scale. Nordic Hydrology, 22, 327-340.
Jønch-Clausen T (1979) SHE. Système Hydrologique Européen. A short description. Danish Hydraulic Institute, Hørsholm, Denmark.
Jønch-Clausen T (2004) Integrated Water Resources Management (IWRM) and Water Efficiency Plans by 2005. Why, What and How? Global Water Partnership, TEC Background Papers No. 10, Stockholm.
Jønch-Clausen T, Refsgaard JC (1984) A Mathematical Modelling System for Flood Forecasting. Nordic Hydrology, 15, 307-318.
Kaiser-Hill (2001) Model Code and Scenario Selection Report, Site-Wide Water Balance, Rocky Flats Environmental Technology Site. Report 01-RF-00337. Kaiser-Hill Company LLC.
Klauer B, Brown JD (2003) Conceptualising imperfect knowledge in public decision making: ignorance, uncertainty, error and ‘risk situations’. Environmental Research, Engineering and Management.
Klemes V (1986) Operational testing of hydrological simulation models. Hydrological Sciences Journal, 31, 13-24.
Knudsen J, Thomsen A, Refsgaard JC (1986) WATBAL: A semi-distributed, physically based hydrological modelling system. Nordic Hydrology, 17, 347-362.
Konikow LF, Bredehoeft JD (1992) Ground-water models cannot be validated. Advances in Water Resources, 15, 75-83.
Kros J, Reinds GJ, de Vries W, Latour JB, Bollen M (1995) Modelling of soil acidity and nitrogen availability in natural ecosystems in response to changes in acid deposition and hydrology. Report 95, DLO Winand Staring Centre, Wageningen.
Kuchment LS, Demidov VN, Naden PS, Cooper DM, Broadhurst P (1996) Rainfall-runoff modelling of the Ouse basin, North Yorkshire: an application of a physically based distributed model. Journal of Hydrology, 181, 323-342.
Lane SA, Richards KS (2001) The ‘Validation’ of Hydrodynamic Models: Some Critical Perspectives. In: Anderson MG, Bates PD (Eds) Model Validation: Perspectives in Hydrological Science, 413-438. John Wiley & Sons, Ltd.
Linkov I, Burmistrov D (2003) Model Uncertainty and Choices Made by Modelers: Lessons Learned from the International Atomic Energy Model Intercomparisons. Risk Analysis, 23(6), 1297-1308.
Lloyd JW (1980) The importance of drift deposit influences on the hydrogeology of major British aquifers. Institution of Water Engineers and Scientists, Journal, 34, 346-356.
Loague KM, Freeze RA (1985) A Comparison of Rainfall-Runoff Modelling Techniques on Small Upland Catchments. Water Resources Research, 21(2), 229-248.
Luckner L (1978) Gekoppelte Grundwasser-Oberflächenwassermodelle (A coupled groundwater-surface water model). Wasserwirtschaft-Wassertechnik, 1978, 276-278. (In German)
Madsen H, Skotner C (2005) Adaptive state updating in real-time river flow forecasting – a combined filtering and error forecasting procedure. Journal of Hydrology, 308, 302-312.
Michaud J, Sorooshian S (1994) Comparison of simple versus complex distributed runoff models on a midsized semiarid watershed. Water Resources Research, 30(3), 593-605.
Michaud JD, Shuttleworth WJ (1997) Executive summary of the Tucson aggregation workshop. Journal of Hydrology, 190, 176-181.
Middlemis H (2000) Murray-Darling Basin Commission. Groundwater flow modelling guideline. Aquaterra Consulting Pty Ltd., South Perth, Western Australia. Project no. 125.
Miles JC, Rushton KR (1983) A coupled surface water and groundwater catchment model. Journal of Hydrology, 62, 159-177.
Neuman SP, Wierenga PJ (2003) A comprehensive strategy of hydrogeologic modeling and uncertainty analysis for nuclear facilities and sites. University of Arizona, Report NUREG/CR-6805.
Nielsen DR, Biggar JW, Erh KT (1973) Spatial variability of field measured soil water properties. Hilgardia, 42, 215-259.
Nielsen SA, Hansen E (1973) Numerical simulation of the rainfall-runoff process on a daily basis. Nordic Hydrology, 4, 171-190.
NRC (1990) Ground Water Models: Scientific and Regulatory Applications. National Research Council, National Academy Press, Washington, D.C.
Oreskes N, Shrader-Frechette K, Belitz K (1994) Verification, validation and confirmation of numerical models in the earth sciences. Science, 264, 641-646.
Pahl-Wostl C (2002) Towards sustainability in the water sector – The importance of human actors and processes of social learning. Aquatic Sciences, 64, 394-411.
Panday S, Huyakorn PS (2004) A fully coupled physically-based spatially-distributed model for evaluating surface/subsurface flow. Advances in Water Resources, 27, 361-382.
Pappenberger F, Beven KJ (2006) Ignorance is bliss: Or seven reasons not to use uncertainty analysis. Water Resources Research, 42, W05302, doi:10.1029/2005WR004820.
Pascual P, Steiber N, Sunderland E (2003) Draft guidance on development, evaluation and application of regulatory environmental models. The Council for Regulatory Environmental Modeling, Office of Science Policy, Office of Research and Development, US Environmental Protection Agency, Washington D.C. 60 pp.
Perkins SP, Sophocleous M (1999) Development of a Comprehensive Watershed Model Applied to Study Stream Yield under Drought Conditions. Ground Water, 37(3), 418-426.
Perrin C, Michel C, Andréassian V (2001) Does a large number of parameters enhance model performance? Comparative assessment of common catchment model structures on 429 catchments. Journal of Hydrology, 242, 275-301.
Poeter E, Anderson D (2005) Multimodel Ranking and Inference in Ground Water Modeling. Ground Water, 43(4), 597-605.
Popper KR (1959) The logic of scientific discovery. Hutchinson & Co, London.
Prickett TA, Lonnquist CG (1971) Selected digital computer techniques for groundwater resource evaluation. Illinois State Water Survey, Bulletin 55.
Querner EP (1997) Description and application of the combined surface and groundwater flow model MOGROW. Journal of Hydrology, 192, 158-188.
Quinn PF, Beven KJ (1993) Spatial and temporal predictions of soil moisture dynamics, runoff, variable source areas and evapotranspiration for Plynlimon, Mid-Wales. Hydrological Processes, 7, 425-448.
Radwan M, Willems P, Berlamont J (2004) Sensitivity and uncertainty analysis for river quality modelling. Journal of Hydroinformatics, 6, 83-99.
Reed S, Koren V, Smith M, Zhang Z, Moreda F, Seo D-J (2004) Overall distributed model intercomparison project results. Journal of Hydrology, 298, 27-60.
Refsgaard JC (1981) The surface water component of an integrated hydrological model. Danish Committee for Hydrology. Suså Report No. H12.
Refsgaard JC (1996) Terminology, modelling protocol and classification of hydrological model codes. In: Abbott MB, Refsgaard JC (Eds) Distributed Hydrological Modelling, Kluwer Academic Publishers, 17-39.
Refsgaard JC, Stang O (1981) An integrated groundwater/surface water hydrological model. Danish Committee for Hydrology. Suså Report No. H13.
Refsgaard JC, Rosbjerg D, Markussen LM (1983) Application of Kalman filter to real-time operation and to uncertainty analyses in hydrological modelling. IAHS Publication No 147, 273-282.
Refsgaard JC, Storm B (1995) MIKE SHE. In: Singh VP (Ed) Computer Models of Watershed Hydrology. Water Resources Publications, Highlands Ranch, Colorado, 809-846.
Refsgaard JC, Storm B, Abbott MB (1996) Comments on ‘A discussion of distributed hydrological modelling’. In: Abbott MB, Refsgaard JC (Eds) Distributed Hydrological Modelling, Kluwer Academic Publishers, 279-287.
Refsgaard JC, Ramaekers D, Heuvelink GBM, Schreurs V, Kros H, Rosén L, Hansen S (1998) Assessment of ‘cumulative’ uncertainty in spatial decision support systems: Application to examine the contamination of groundwater from diffuse sources (UNCERSDSS). Presented at the European Climate Science Conference, Vienna, 19-23 October 1998.
Refsgaard JC, Butts MB (1999) Determination of grid scale parameters in catchment modelling by upscaling local scale parameters. Keynote presentation. Proceedings of the EurAgEng International Workshop on Modelling of transport processes in soils at various scales in time and space, 24-26 November 1999, Leuven, Belgium.
Refsgaard JC, van der Sluijs JP, Højberg AL, Vanrolleghem P (2005) Harmoni-CA Guidance Uncertainty Analysis. Guidance 1. 46 pp. www.harmoni-ca.info.
Rykiel ER (1996) Testing ecological models: the meaning of validation. Ecological Modelling, 90, 229-244.
Saulnier GM, Beven K, Obled C (1997) Digital elevation analysis for distributed hydrological modelling: Reducing scale dependence in effective hydraulic conductivity values. Water Resources Research, 33(9), 2097-2101.
Scholten H, Van Waveren RH, Groot S, Van Geer FC, Wösten JHM, Koeze RD, Noort JJ (2000) Good Modelling Practice in water management. Paper presented at Hydroinformatics 2000, Cedar Rapids, IA, USA.
Scholten H, Groot S (2002) Dutch guidelines. In: Refsgaard JC (Ed) State-of-the-Art Report on Quality Assurance in modelling related to river basin management. Chapter 12, Geological Survey of Denmark and Greenland, Copenhagen. www.harmoniqua.org.
Scholten H, Kassahun A, Refsgaard JC, Kargas T, Gavardinas C, Beulens AJM (2007) A methodology to support multidisciplinary model-based water management. Environmental Modelling & Software, 22, 743-759.
Singh VP (Ed) (1995) Computer Models of Watershed Hydrology. Water Resources Publications, Highlands Ranch, Colorado.
Smith KA (1980) A model of the extent of anaerobic zones in aggregated soils and its potential application to estimates of denitrification. Journal of Soil Science, 31, 263-277.
Sonnenborg TO, Christensen BSB, Nyegaard P, Henriksen HJ, Refsgaard JC (2003) Transient modelling of regional groundwater flow using parameter estimates from steady-state automatic calibration. Journal of Hydrology, 273, 188-204.
Stang O (1981) A regional groundwater model for the Suså area. Danish Committee for Hydrology. Suså Report No. H9.
Styczen M, Storm B (1993a) Modelling of N-movements on catchment scale – a tool for analysis and decision making. 1. Model description. Fertilizer Research, 36, 1-6.
Styczen M, Storm B (1993b) Modelling of N-movements on catchment scale – a tool for analysis and decision making. 2. A case study. Fertilizer Research, 36, 7-17.
Tampa Bay Water (2001) Scientific review of integrated hydrologic model ISGW/CNTB121. Prepared by West Consultants, Gartner Lee Ltd and AQUA TERRA Consultants for Tampa Bay Water, Florida.
Thomas RG (1973) Groundwater models. FAO, Irrigation and Drainage Paper 21, Rome.
Troch PA, Mancini M, Paniconi C, Wood EF (1993) Evaluation of a Distributed Catchment Scale Water Balance Model. Water Resources Research, 29(6), 1805-1817.
Troeh FR, Jabro JD, Kirkham D (1982) Gaseous diffusion equations for porous materials. Geoderma, 27, 239-253.
Troldborg L (2004) The influence of conceptual geological models on the simulation of flow and transport in Quaternary aquifer systems. PhD Thesis. Geological Survey of Denmark and Greenland, Report 2004/107.
Van Asselt MBA, Rotmans J (2002) Uncertainty in Integrated Assessment Modelling. From Positivism to Pluralism. Climatic Change, 54, 75-105.
Van der Sluijs JP, Craye M, Funtowicz SO, Kloprogge P, Ravetz J, Risbey JS (2005) Combining Quantitative and Qualitative Measures of Uncertainty in Model based Foresight Studies: the NUSAP System. Risk Analysis, 25(2), 481-492.
Van Griensven A, Meixner T (2004) Dealing with unidentifiable sources of uncertainty within environmental models. In: Pahl C, Schmidt S, Jakeman T (Eds) iEMSs 2004 International Congress: "Complexity and Integrated Resources Management". International Environmental Modelling and Software Society, Osnabrück, Germany, June 2004.
Van Loon E, Refsgaard JC (Eds) (2005) Guidelines for assessing data uncertainty in hydrological studies. HarmoniRiB Report. Geological Survey of Denmark and Greenland. http://www.harmonirib.com.
Van Waveren RH, Groot S, Scholten H, Van Geer FC, Wösten JHM, Koeze RD, Noort JJ (2000) Good Modelling Practice Handbook. STOWA Report 99-05, Utrecht, RWS-RIZA, Lelystad, The Netherlands. http://waterland.net/riza/aquest/
Vrugt J, Diks CGH, Gupta HV (2005) Improved treatment of uncertainty in hydrologic modelling: Combining the strengths of global optimization and data assimilation. Water Resources Research, 41, W01017, doi:10.1029/2004WR003059.
Walker WE, Harremoës P, Rotmans J, Van der Sluijs JP, Van Asselt MBA, Janssen P, Krayer von Krauss MP (2003) Defining Uncertainty: A Conceptual Basis for Uncertainty Management in Model-Based Decision Support. Integrated Assessment, 4(1), 5-17.
Wardlaw RB (1978) The development of a deterministic integrated surface/subsurface hydrological response model. PhD Thesis, University of Strathclyde, Glasgow.
Wardlaw RB, Wyness A, Rippon P (1994) Integrated catchment modelling. Surveys in Geophysics, 15, 311-330.
Weeks JB (1974) Simulated effects of oil-shale development on the hydrology of the Piceance basin, Colorado. US Geological Survey, Professional Paper 908.
Wen X-H, Gómez-Hernández JJ (1996) Upscaling hydraulic conductivities in heterogeneous media: An overview. Journal of Hydrology, 183, ix-xxxii.
WMO (1975) Intercomparison of conceptual models used in operational hydrological forecasting. WMO Operational Hydrology Report No 7, WMO No 429, World Meteorological Organisation, Geneva.
WMO (1988) Intercomparison of models for snowmelt runoff. WMO Operational Hydrology Report No 23, WMO No 646, World Meteorological Organisation, Geneva.
WMO (1992) Simulated real-time intercomparison of hydrological models. WMO Operational Hydrology Report No 38, WMO No 779, World Meteorological Organisation, Geneva.
Wolf J, Beusen AHW, Groenendijk P, Kroon T, Rötter R, van Zeijts H (2003) The integrated modelling system STONE for calculating nutrient emissions from agriculture in the Netherlands. Environmental Modelling & Software, 18, 597-617.
Wood EF, Sivapalan M, Beven KJ, Band L (1988) Effects of spatial variability and scale with implications to hydrologic modelling. Journal of Hydrology, 102, 29-47.
WSSTP (2005) Water safe, strong and sustainable. A European vision for water supply and sanitation in 2030. Water Supply and Sanitation Technology Platform, October 2005. http://www.wsstp.org
WWAP (2003) Water for People, Water for Life. UN World Water Development Report. Prepared as a collaborative effort of 23 UN agencies and convention secretariats co-ordinated by the World Water Assessment Programme. UNESCO, Paris. http://www.unesco.org/water/wwap/index.shtml
[1]
Refsgaard JC, Hansen E (1982) A Distributed Groundwater/Surface Water Model for the Suså Catchment. Part 1: Model Description. Nordic Hydrology, 13, 299-310.
Reprinted with permission from Nordic Hydrology
[2]
Refsgaard JC, Hansen E (1982) A Distributed Groundwater/Surface Water Model for the Suså Catchment. Part 2: Simulations of Streamflow Depletions Due to Groundwater Abstraction. Nordic Hydrology, 13, 311-322.
Reprinted with permission from Nordic Hydrology
[3]
Refsgaard JC, Christensen TH, Ammentorp HC (1991) A model for oxygen transport and consumption in the unsaturated zone. Journal of Hydrology, 129, 349-369.
Reprinted from Journal of Hydrology with permission from Elsevier
[4]
Refsgaard JC, Seth SM, Bathurst JC, Erlich M, Storm B, Jørgensen GH, Chandra S (1992) Application of the SHE to catchments in India - Part 1: General results. Journal of Hydrology, 140, 1-23.
Reprinted from Journal of Hydrology with permission from Elsevier
[5]
Jain SK, Storm B, Bathurst JC, Refsgaard JC, Singh RD (1992) Application of the SHE to catchments in India - Part 2: Field experiments and simulation studies with the SHE on the Kolar subcatchment of the Narmada River. Journal of Hydrology, 140, 25-47.
Reprinted from Journal of Hydrology with permission from Elsevier
[6]
Refsgaard JC, Knudsen J (1996) Operational validation and intercomparison of different types of hydrological models. Water Resources Research, 32 (7), 2189-2202.
Reproduced by permission of American Geophysical Union
WATER RESOURCES RESEARCH, VOL. 32, NO. 7, PAGES 2189–2202, JULY 1996
Operational validation and intercomparison of different types of hydrological models
Jens Christian Refsgaard and Jesper Knudsen
Danish Hydraulic Institute, Hørsholm, Denmark
Abstract. A theoretical framework for model validation, based on the methodology originally proposed by Klemes [1985, 1986], is presented. It includes a hierarchical validation testing scheme for model application to runoff prediction in gauged and ungauged catchments subject to stationary and nonstationary climate conditions. A case study on validation and intercomparison of three different models on three catchments in Zimbabwe is described. The three models represent a lumped conceptual modeling system (NAM), a distributed physically based system (MIKE SHE), and an intermediate approach (WATBAL). It is concluded that all models performed equally well when at least 1 year’s data were available for calibration, while the distributed models performed marginally better for cases where no calibration was allowed.
Introduction
In recent years water resources studies have become increasingly concerned with aspects of water resources for which data are not directly available. Examples include studies of the development potential of ungauged areas, environmental impacts of land use changes related to agricultural and forestry practices, conjunctive use of groundwater and surface water, and climate impact studies concerned with the effects on water resources of an anticipated climate change. In these and other types of studies, hydrological simulation models are often used to provide the missing information as a basis for decisions regarding the development and management of water and land resources.
Traditionally, hydrological simulation modeling systems are classified in three main groups, namely, (1) empirical black box, (2) lumped conceptual, and (3) distributed physically based systems. The great majority of the modeling systems used in practice today belongs to the simple types (1) or (2) and requires a modest number of parameters (approximately 5–10) to be calibrated for their operation. Despite their simplicity, many models have proven quite successful in representing an already measured hydrograph. A severe drawback of these traditional modeling systems, however, is that their parameters are not directly related to the physical conditions of the catchment. Accordingly, it may be expected that their applicability is limited to areas where runoff has been measured for some years and where no significant change in catchment conditions has occurred.
To provide a more appropriate tool for the type of studies mentioned above, considerable efforts within hydrological research have been directed toward development of distributed physically based catchment models. Such models use parameters which are related directly to the physical characteristics of the catchment (topography, soil, vegetation, and geology) and operate within a distributed framework to account for the spatial variability of both physical characteristics and meteorological conditions. These models aim at describing the hydrological processes and their interaction as and where they occur in the catchment and therefore offer the prospect of remedying the shortcomings of the traditional rainfall runoff models.
Although there appears to be a certain degree of consensus at the theoretical level regarding the potential of the distributed physically based types of models, there are widely divergent points of view as to whether they offer a significant improvement in actual performance when compared to the well-proven lumped conceptual model type. Beven [1989, p. 161] argues from theoretical considerations of scale problems that “the current generation of distributed physically based models are lumped conceptual models,” and, further, that all current physically based models “are not well suited to applications to real catchments.” Grayson et al. [1992] support this view and claim that physically based models have been oversold by their developers. Other authors, for example, Smith et al. [1994], argue that this criticism is “overly pessimistic.” An evaluation of the capabilities of hydrological models when applied in the absence of site calibration data and limited validation data to predict the effects of major land use changes was made by the Task Committee on Quantifying Land-Use Change Effects [U.S. Committee, 1985], which reported a great belief among committee members in the capabilities of 28 surface water hydrological modeling systems, most of which can be classified as lumped conceptual models. In view of the limited number of model comparison studies conducted and the less-than-encouraging results often obtained, this confidence is remarkable. According to the U.S. Committee [1985, p. 1], “the reasons for this confidence were explored and appear to be based upon personal experience, possibly tempered by belief in the model originators.”
Owing to the complexity of the problems involved, further theoretical evaluation is not likely to provide a definite conclusion regarding the capability and limitation of distributed, physically based modeling systems. For establishing a basis to better advance the discussion, relevant model validations appear to be a more fruitful approach, where the models concerned simply are subjected to a range of practical modeling tests to validate their capability for undertaking particular tasks.
In this respect, Klemes [1986, p. 17] has developed a hierarchical scheme for model testing, which is based on the philosophy that “a hydrological simulation model must demonstrate, before it is used operationally, how well it can perform the kind of task for which it is intended.” It may appear needless to advocate such a basic and evident requirement. Unfortunately, it is well justified in view of the current practice in hydrological model testing. The present paper is based on results from a research project conducted at the Danish Hydraulic Institute (DHI) [1993a]. The project had two major objectives. The first objective was to identify a rigorous framework for the testing of model capabilities for different types of tasks. The second objective was to use this theoretical framework and conduct an intercomparison study involving application of three modeling systems of different complexity to a number of tasks ranging from traditional simulation of stationary, gauged catchments to simulation of ungauged catchments and of catchments with nonstationary climate conditions. Data from three catchments in Zimbabwe were used for the tests. The research project was a contribution to project D.5, “Testing the transferability of hydrological simulation models,” forming part of the World Climate Programme—Water [World Meteorological Organization (WMO), 1985]. Some of the results of DHI [1993a] were presented by Refsgaard [1996] with a focus on modeling the land surface processes and the coupling between hydrological and atmospheric models within the global change context. Thus Refsgaard [1996] presents some of the results from two of the Zimbabwean catchments to illustrate data requirements and form the basis for conclusions regarding which type of hydrological model is required for climate change modeling. The present paper, on the other hand, emphasizes the modeling methodology and contains a summary of all the test results from all the three Zimbabwean catchments. It furthermore provides a general discussion of these results with references to similar studies reported in the literature.
Theoretical Framework for Model Validation Terminology No unique and generally accepted terminology is presently used in the hydrological community with regard to issues related to model validation. The framework used in the present paper is basically in line with the terminology defined by Schlesinger et al. [1979], Tsang [1991], and Flavelle [1992] and comprises the following key definitions. A modeling system (i.e., code) is a generalized software package, which can be used for different catchments without modifying the source code. Examples of modeling systems are MIKE SHE, SACRAMENTO, and MODFLOW. A model is a site-specific application of a modeling system, including given input data and specific parameter values. An example of a model is a MIKE SHE– based model for the Ngezi catchment (cf. the case study below). A modeling system or a code can be “verified.” A code verification involves comparison of the numerical solution generated by the code with one or more analytical solutions or with other numerical solutions. Verification ensures that the computer program accurately solves the equations that constitute the mathematical model. Model validation is here defined as the process of demonstrating that a given site-specific model is capable of making
accurate predictions for periods outside a calibration period. A model is said to be validated if its accuracy and predictive capability in the validation period have been proven to lie within acceptable limits or errors. It is important to notice that the term model validation refers to a site specific validation of a model. This must not be confused with a more general validation of a generalized modeling system which, in principle, will never be possible. Testing Scheme for Validation of Hydrological Models The hierarchial testing scheme proposed by Klemes [1985, 1986] appears suitable for testing the capability of a model to predict the hydrological effect of climate change, land use change, and other nonstationary conditions. Klemes distinguished between simulations conducted for the same station (catchment) used for calibration and simulations conducted for ungauged catchments. He also distinguished between cases where climate, land use, and other catchment characteristics remain unchanged (are stationary) and cases where they are not. This leads to the definitions of four basic categories of typical modeling tests. 1. The split-sample test (SS) involves calibration of a model based on 3–5 years of data and validation on another period of a similar length. 2. The differential split-sample test (DSS) involves calibration of a model based on data before catchment change occurs, adjustment of model parameters to characterize the change, and validation on the subsequent period. 3. In the proxy-basin test (PB) no direct calibration is allowed, but advantage may be taken of information from other gauged catchments. Hence validation will comprise identification of a gauged catchment deemed to be of a nature similar to that of the validation catchment; initial calibration; transfer of model, including adjustment of parameters to reflect actual conditions within validation catchment; and validation. 4. With the proxy-basin differential split-sample test (PBDSS), again no direct calibration is allowed, but information from other catchments may be used. Hence validation will comprise initial calibration on the other relevant catchment, transfer of model to validation catchment, selection of two parameter sets to represent the periods before and after the change, and subsequent validations on both periods.
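As an illustration of how the simplest of these tests can be organised in practice, the sketch below partitions a gauged record into a calibration part and an independent validation part and exposes only the former to parameter estimation. The model interface (calibrate/simulate) and the function name are hypothetical assumptions for illustration, not code from the study or from any of the modeling systems discussed here.

```python
import numpy as np

def split_sample_test(model, forcing, q_obs, criterion, calib_fraction=0.5):
    """Illustrative split-sample (SS) test: calibrate on the first part of a gauged
    record, then validate on the remaining independent part with the parameter set
    frozen.  `model.calibrate` and `model.simulate` are hypothetical interfaces."""
    n_cal = int(len(q_obs) * calib_fraction)
    # Parameter estimation sees only the calibration period
    params = model.calibrate(forcing[:n_cal], q_obs[:n_cal], criterion)
    # Validation period is simulated with the frozen parameter set
    q_sim_val = model.simulate(forcing[n_cal:], params)
    return params, criterion(np.asarray(q_obs[n_cal:]), np.asarray(q_sim_val))
```

The differential and proxy-basin variants differ only in which record is used for calibration and in whether the parameters are transferred or adjusted before the validation run.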
Relevant Literature on Model Intercomparison Studies The testing of hydrological models through validation on independent data has for a long time been emphasized by the World Meteorological Organization (WMO). In their pioneering studies [WMO, 1975, 1986, 1992] several hydrological modeling systems of the empirical black box and the lumped conceptual types were tested on the same data from different catchments. The actual testing, however, only included the standard SS test comprising an initial calibration of a model and subsequent validation based on data from an independent period. No firm conclusions were derived regarding significant differences in performance among different model types. Franchini and Pacciani [1991] made a comparative analysis of seven different lumped conceptual models. They used an SS testing approach calibrating on a 1-month period and validating on a subsequent 3-month period. They concluded that in spite of a wide range of structural complexity all the models produced similar and equally valid results. With regard to the
question of whether the simpler or the more complex variants within this group of models are better, they concluded that significantly different models produced basically equivalent results, with calibration times being generally proportional to the complexity of their structure. On the other hand, they concluded that the model structure should not be made too simple, because it will then cause a loss of the link with the physics of the problem and of the possibility of taking advantage of prior knowledge of the geomorphological nature of the catchment. Other researchers have conducted similar intercomparison studies involving empirical black box models and lumped conceptual models [Naef, 1981; Wilcox et al., 1990] with similar conclusions. Only a few studies have included comparisons of distributed physically based models with simpler models. Loague and Freeze [1985] in a classical study compared two empirical black box modeling systems (a regression model and a unit hydrograph model) and a quasi physically based system on three small experimental catchments ranging from 10 ha to 7.2 km2. The models were used on an event basis to simulate runoff peaks. The two empirical models were calibrated against runoff data and subsequently validated on independent data in an SS approach. The parameter values for the quasi physically based model were assessed directly from field data and not subject to any calibration before being validated against the same data as the two other models. Loague and Freeze [1985] found that all models performed poorly. For one catchment the quasi physically based model was subsequently applied with and without calibration of one key model parameter. Such calibration had little impact on the model performance during the validation period. In a study in the semiarid 150 km2 Walnut Gulch experimental watershed Michaud and Sorooshian [1994] compared a lumped conceptual model (SCS), a distributed conceptual model (SCS with eight subcatchments, one per raingauge) and a distributed physically based model (KINEROS) for simulation of storm events. They found that with calibration, the accuracies of the two distributed models were similar. Without calibration the distributed physically based model performed better than the distributed conceptual model, and in both cases the lumped conceptual model performed poorly. Thus, as far as the test experience for distributed physically based models is concerned, both Loague and Freeze [1985] and Michaud and Sorooshian [1994] have performed tests on relatively small experimental catchments with very good data coverage. Both studies have used the models on ungauged conditions (without calibration) but in all cases under stationary climate conditions. The present paper presents results from larger catchments in Zimbabwe with ordinary data coverage and performs a sequence of rigorous tests of increasing complexity according to the hierarchial scheme outlined by Klemes [1986], involving intercomparisons between lumped conceptual and distributed physically based models.
Hydrological Modeling Systems The following three modeling systems (codes) are used in the present study: a lumped conceptual rainfall-runoff modeling system (NAM), a semidistributed hydrological modeling system (WATBAL), and a distributed physically based hydrological modeling system (MIKE SHE). The NAM and MIKE SHE can be characterized as very typical of their respective
classes, while the WATBAL falls in between these two standard classes. All three modeling systems are being used on a routine basis at the Danish Hydraulic Institute (DHI) in connection with consultancy and research projects. NAM NAM is a traditional hydrological modeling system of the lumped conceptual type operating by continuously accounting for the moisture contents in four mutually interrelated storages. The NAM was originally developed at the Technical University of Denmark [Nielsen and Hansen, 1973] and has been modified and extensively applied by DHI in a large number of engineering projects covering all climatic regimes of the world. Furthermore, the NAM has been transferred to more than 100 other organizations worldwide as part of DHI’s MIKE 11 generalized river modeling package. The structure of NAM is illustrated in Figure 1. The NAM has in its present version a total of 17 parameters; however, in most cases only about 10 of these are adjusted during calibration. WATBAL WATBAL was developed in the early 1980s by DHI in an attempt to enable full utilization of readily available, distributed data on land surface properties (topography, vegetation, and soil) in a physically based model, and yet it is simple enough to allow large-scale applications within reasonable computational requirements. Here the WATBAL is briefly introduced; more detailed information has been given by Knudsen et al. [1986]. WATBAL has been designed to account for the spatial and temporal variations of soil moisture. On the basis of distributed information on meteorological conditions, topography, vegetation, and soil types, the catchment area is divided into a number of hydrological response units, as illustrated in Figure 2, with each unit being characterized by a different composition of the above features. These units are used to provide the spatial representation of soil moisture, while temporal variations within each unit are accounted for by means of empirical relations for the processes affecting soil moisture, using physical parameters particular to each unit. For the representation of subsurface flows a simple lumped, conceptual approach is applied, using a cascade of linear reservoirs to account for the interflow and baseflow components (Figure 3). In summary, WATBAL provides a distributed physically based description of the surface processes affecting soil moisture (interception, infiltration, evapotranspiration, and percolation), while a lumped conceptual approach is used to represent subsurface flows. WATBAL has previously been used successfully for prediction of runoff from ungauged catchments [Nielsen and Bari, 1988]. MIKE SHE MIKE SHE is a further development of the European Hydrological System—SHE [Abbott et al., 1986a, b]. It is a deterministic, fully distributed and physically based modeling system for describing the major flow processes of the entire land phase of the hydrological cycle. MIKE SHE solves the partial differential equations for the processes of overland and channel flow and unsaturated and saturated subsurface flow. The system is completed by a description of the processes of snow melt, interception, and evapotranspiration. The flow equations are solved numerically using finite difference methods. In the horizontal plane the catchment is discretized in a
network of grid squares. The river system is assumed to run along the boundaries of these. Within each square the soil profile is represented by a number of computational nodes in the vertical direction, which above the groundwater table may become partly saturated. Lateral subsurface flow is only considered in the saturated part of the profile. Figure 4 illustrates the structure of the MIKE SHE. A description of the methodology and some experiences of model application to ordinary catchments have been given by Refsgaard et al. [1992] and Jain et al. [1992]. A more detailed description has been given by Refsgaard and Storm [1995].

Figure 1. Structure of the NAM rainfall runoff modeling system [DHI, 1994].
Figure 2. WATBAL representation of catchment characteristics and definition of hydrological response units [Knudsen et al., 1986].
Figure 3. Principal structure of WATBAL [Knudsen et al., 1986].
Figure 4. Schematic presentation of the MIKE SHE [DHI, 1993b].
Figure 5. Location of the three catchments in Zimbabwe.

MIKE SHE is usually categorized as a physically based system. The characterization is, strictly speaking, correct only if it is applied on an appropriate scale. A number of scale problems arise when the MIKE SHE is used on a regional scale [Refsgaard and Storm, 1995]. In addition, if there is a considerable
uncertainty attached to the basic information, and if the spatial and temporal variables (such as groundwater table elevations) cannot be validated against observations, a MIKE SHE model of that particular site cannot be considered fully physically based but will degenerate towards a detailed conceptual model. In this case the calibration procedure is usually to adjust the parameters with the largest uncertainties attached, within a reasonable range.
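To make the contrast between the model classes more concrete, the following toy sketch shows the kind of continuous storage accounting that a lumped conceptual model performs for an entire catchment with a handful of calibrated parameters. It is a deliberately simplified illustration under assumed parameter names, not a description of NAM, WATBAL or MIKE SHE.

```python
def lumped_bucket_model(precip, pet, smax, k_base, rc, s0=0.0, g0=0.0):
    """Toy lumped conceptual rainfall-runoff model: one soil moisture storage feeding a
    linear groundwater reservoir.  precip and pet are daily series in mm; returns a
    list of simulated runoff values in mm/day.  Purely illustrative of the model class."""
    s, g, runoff = s0, g0, []
    for p, e in zip(precip, pet):
        s += p
        aet = min(e * s / smax, s)      # actual evapotranspiration limited by soil moisture
        s -= aet
        q_fast = max(s - smax, 0.0)     # saturation excess becomes fast runoff
        s -= q_fast
        recharge = rc * s               # a fraction of soil water percolates to groundwater
        s -= recharge
        g += recharge
        q_base = k_base * g             # baseflow drains from a linear reservoir
        g -= q_base
        runoff.append(q_fast + q_base)
    return runoff
```

A distributed physically based code would instead solve flow equations on a grid and derive most of its parameters from mapped catchment properties, which is the trade-off examined in the tests below.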
Case Study: Methodology Selected Catchments in Zimbabwe The three catchments in Zimbabwe that were selected for the model tests are Ngezi-South (1090 km2), Lundi (254 km2), and Ngezi-North (1040 km2). The locations of the catchments are shown in Figure 5. A brief data collection/field reconnaissance to Zimbabwe was arranged to obtain relevant information. Daily series of rainfall and monthly series of pan evaporation were obtained from the Department of Meteorological Services. Records of mean daily discharges as well as information on water rights were obtained from the Hydrological Branch, Ministry of Energy Water Resources and Development. Detailed information on land use was obtained through subcontracting R. Whitlow, University of Zimbabwe, to prepare land-use maps based upon 1:25,000 aerial photographs. Furthermore, 1:50,000 topographical maps were collected and digitized. Information on vegetation characteristics was obtained from Timberlake [1989] as well as from J. Timberlake and N. Nobanda, National Herbarium (personal communication, 1989); B. Campell, Department of Biological Sciences (personal communication, 1989);
and G. MacLaureen, Department of Crop Science, University of Zimbabwe (personal communication, 1989). Information on soil characteristics and hydrogeology was obtained from Anderson [1989]. Finally, valuable information of various kinds was provided by R. Whitlow, Department of Geography, University of Zimbabwe (personal communication, 1989); H. Elwell, Agritex (personal communication, 1989); J. Anderson, Chemistry and Soil Research Institute, Ministry of Agriculture (personal communication, 1989); and others. A more detailed description is given in DHI [1993a]. The annual catchment rainfall and runoff for the periods selected for modeling are shown in Table 1, while some of the key features for the three catchments are presented in Table 2. It is noticed from the rainfall and runoff figures in Table 1 that there are very large interannual variations. From Table 2 it appears that there are significant differences in the vegetation and soil characteristics from catchment to catchment. Model Testing Scheme The model testing scheme is illustrated in Figure 6. The testing of the involved models has been undertaken in parallel and in the following sequence. 1. The SS test was based on data from Ngezi-South comprising an initial calibration of the models and a subsequent validation using data for an independent period. 2. The PB test involved transfer of models to the Lundi catchment and adjustment of parameters to reflect the prevailing catchment characteristics and validation without any calibration. 3. The modified proxy-basin (M-PB) test was as above, but
was adjusted by allowing model calibration based on 1 year of runoff data. 4. For the DSS test, model calibration was based on data from an initial calibration period, and validation was based on data from a subsequent period. The differential nature of this test is justified by the fact that the later independent period includes three successive years (1981/1982–1983/1984) with a markedly lower rainfall than the calibration period and hence represents a nonstationary climate scenario. 5. The PB-DSS test involved transferring the models to the Ngezi-North catchment, adjusting the parameters to represent the catchment characteristics, and validating them by runoff simulation over a nonstationary climate period. 6. The modified proxy-basin differential split-sample (M-PB-DSS) test was as above, though it allowed models to be calibrated using a short-term (1 year) record.

Table 1. Annual Rainfall and Runoff Values for the Three Zimbabwean Test Catchments

  Catchment      Hydrological Year   Rainfall, mm/yr   Runoff, mm/yr
  Ngezi-South    1971/1972            890               131
                 1972/1973            317                 2
                 1973/1974           1290               349
                 1974/1975           1087               236
                 1975/1976            879                90
                 1976/1977            872               116
                 1977/1978           1131               245
                 1978/1979            609                59
  Lundi          1971/1972            920                89
                 1972/1973            371                 2
                 1973/1974           1384               460
                 1974/1975           1046               217
                 1975/1976            857                89
                 1981/1982            416                10
                 1982/1983            528                 7
                 1983/1984            547                 8
  Ngezi-North    1977/1978           1047               156
                 1978/1979            730                64
                 1981/1982            430                12
                 1982/1983            395                 1
                 1983/1984            436                 4

Table 2. Land-Use, Vegetation and Soil Characteristics Estimated From Available Information and a Brief Field Visit

  Characteristic                                    Ngezi-South     Lundi           Ngezi-North
  Land use/vegetation (area %)
    Dense/closed woody vegetation                   7               13              10
    Open woody vegetation                           36              25              35
    Sparse woody vegetation                         14              19              14
    Grassland                                       11              39              16
    Cropland                                        29              3               19
    Abandoned cropland                              2               0               6
    Rock outcrops                                   1               0               0
  Soil depth range, m                               0–2.5           0–1             0.5–6
  Saturated hydraulic conductivity in root zone     range: 1–250    range: 1–70     range: 2–100
    soil, mm/hr                                     average: 80     average: 60     average: 50
  Available water content in root zone soil,        range: 10–14    range: 10–12    range: 9–29
    vol %                                           average: 12     average: 11     average: 17

Evaluation Criteria
For measuring the performance of the models for each test, a standard set of criteria has been defined. The criteria have been designed with the sole purpose of measuring how closely the simulated series of daily flows agree with the measured series. Owing to the generalized nature of the defined model validations, it has been necessary to introduce several criteria for measuring the performance with regard to water balance, low flows, and peak flows. The standard set of performance criteria comprises a combination of the following four graphical plots and three numerical measures: (1) joint plots of the simulated and observed hydrographs; (2) scatter diagram of monthly runoffs; (3) flow duration curves; (4) scatter diagram of annual maximum discharges; (5) overall water balance; (6) the Nash-Sutcliffe coefficient (R2); and (7) an index (EI) measuring the agreement between the simulated and observed flow duration curves.
The coefficient R2, introduced by Nash and Sutcliffe [1970], is computed on the basis of the sequence of observed and simulated monthly flows over the whole testing period (perfect agreement for R2 is 1):

    R^2 = 1 - \frac{\sum_{m=1}^{M} (Q_{o,m} - Q_{s,m})^2}{\sum_{m=1}^{M} (Q_{o,m} - \bar{Q}_o)^2}

where M is the total number of months, Q_{s,m} the simulated monthly flows, Q_{o,m} the observed monthly flows, and \bar{Q}_o the average observed monthly flow over the whole period.
The flow duration curve error index, EI, provides a numerical measure of the difference between the flow duration curves of simulated and observed daily flows (perfect agreement for EI is 1):

    EI = 1 - \frac{\int \left| f_o(q) - f_s(q) \right| \, dq}{\int f_o(q) \, dq}

where f_o(q) is the flow duration curve based on observed daily flows, and f_s(q) is the flow duration curve based on simulated daily flows.
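A minimal sketch of how the two numerical criteria defined above can be computed from flow series is given below. The function names are illustrative, and the flow duration curves are approximated by empirical quantiles, which is only one possible discretisation of the integrals in the EI definition.

```python
import numpy as np

def nash_sutcliffe(q_obs, q_sim):
    """Nash-Sutcliffe coefficient R2 over a flow series; 1 means perfect agreement."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    return 1.0 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)

def flow_duration_error_index(q_obs, q_sim, n_points=200):
    """EI: 1 minus the integrated absolute difference between the observed and simulated
    flow duration curves, normalised by the area under the observed curve."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    p = np.linspace(0.0, 1.0, n_points)          # exceedance probabilities
    f_obs = np.quantile(q_obs, 1.0 - p)          # flow exceeded with probability p
    f_sim = np.quantile(q_sim, 1.0 - p)
    return 1.0 - np.trapz(np.abs(f_obs - f_sim), p) / np.trapz(f_obs, p)
```

In the paper R2 is evaluated on monthly flows and EI on daily flows, so the two functions would typically be fed different aggregations of the same simulated record.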
Figure 6. Model validation test schemes.
Model Construction, Calibration, and Application All models have had access to the same hydrometeorological data and catchment information at any time. Due to the nature of the different models, however, the WATBAL and SHE have been able to make more direct use of the available information than the NAM. In this respect, the NAM has disregarded the spatial variation of rainfall and used the catchment average series as input, and for the simulation of ungauged catchments, a subjective evaluation of catchment characteristics has been undertaken for estimation of the appropriate model parameters. On the other hand, the WATBAL and SHE have attempted to account for the spatial variability of rainfalls as well as information on typical storm durations to convert daily rainfall series to realistic hourly rainfalls. Furthermore, these models have directly used the available information on the spatial variation of topography and soil and vegetation types and their characteristics for model setup and estimation of appropriate model parameters. As an illustration of the differences in model complexity and the different abilities of the three modeling systems to utilize the available distributed catchment data, some key facts for the three model applications to the 1090 km2 Ngezi-South catchment are given in the following three paragraphs. The NAM model considered the entire catchment as one unit, utilized only catchment areal rainfall, and initially disregarded information on soil, vegetation, and geology. Such information was subsequently used on a subjective basis for assessing likely parameter values in the PB tests on the other two catchments. During the model calibrations (when allowed) the values of the 10 parameters were assessed. The WATBAL model was established on the basis of six meteorological zones, eight soil types, and 11 vegetation types. The spatial occurrences of these three features resulted in 129 hydrological response units. During the model calibrations (when allowed) parameter values reflecting root depths, soil water retention capacity, soil hydraulic conductivities, and time constants in subsurface flow routing were adjusted. The MIKE SHE also distributed the rainfall information to different inputs in six meteorological zones. Information on topography, soil, vegetation, and geology were distributed to a 1-km grid. Thus MIKE SHE carried out calculations at 1090 horizontal grid points. During the model calibrations (when
allowed) parameter values reflecting soil depth and maximum root depths, as well as an empirical drainage time constant, were adjusted. In order to minimize the calibration work the parameter values were not varied within all 1090 grid points, but kept identical within each of the 13 land-use classes. In general, the parameters for which field data were available, such as soil water retention curves and leaf area index, were not modified during the calibration process. The present study has aimed at testing various types of general modeling systems. However, it should be emphasized that validation results are not solely dependent on the modeling system but, indeed, also depend on the hydrologist operating the model, including his or her personal interpretation of available information and subjective assessments. In the present study this element of uncertainty has been minimized to the extent possible by assigning three experienced hydrologists with comprehensive experience in the application of each of the three modeling systems and by providing each of them with the same catchment data. The calibration procedure adopted was that of “trial and error,” implying that the hydrologists made subjective adjustments of parameter values in between the calibration runs. The numerical and graphical performance criteria described above were used as important guidance for the hydrologists when deciding upon the set of parameter values which they assessed to be the optimal ones. As these decisions inevitably depend on the personal experiences and judgments of the hydrologists, it may be argued that this procedure adds an undesirable degree of subjectivity to the results. However, given the large number of performance criteria and the large number of adjustable parameters, especially in the WATBAL and MIKE SHE models, suitable and well-proven automatic parameter optimization techniques did not exist. Instead, by applying the standard calibration procedure by which the three hydrologists had comprehensive experience, the results may be seen as typical results from three different modeling systems, when using standard engineering procedures for data collection, model construction, and calibration.
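The class-based parameterisation described above, in which distributed parameter values are kept identical within each land-use class so that calibration only adjusts one value per class rather than one per grid cell, can be sketched as follows; the variable names are illustrative assumptions, not taken from any of the modeling systems.

```python
import numpy as np

def expand_class_parameters(landuse_grid, class_values):
    """Map one calibrated value per land-use class onto a distributed model grid.
    `landuse_grid` is a 2-D array of integer class codes and `class_values` maps
    class code -> parameter value."""
    out = np.full(landuse_grid.shape, np.nan, dtype=float)
    for cls, value in class_values.items():
        out[landuse_grid == cls] = value
    return out

# e.g. one root-depth value per land-use class on a 1 km grid:
# root_depth = expand_class_parameters(landuse_grid, {1: 0.6, 2: 1.2, 3: 2.0})
```

Tying grid-cell parameters to a small number of classes keeps the number of adjustable quantities comparable to that of a lumped model, which is what made the manual trial-and-error calibration described above feasible.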
Results of Model Validation Test Scheme The results of the six tests outlined in Figure 6 are summarized in Figure 7, which shows the overall water balances and
the R2 and EI numerical criteria. Simulated and observed hydrographs are shown in Figure 8 for two of the tests from the Lundi and Ngezi-North catchments. Annual water balances are shown for all the tests in Figures 9–15. Assessments of uncertainties in the PB predictions are shown in Figures 16 and 17. Note that the different performance criteria presented in the figures focus on different aspects, such as overall annual water balances (Figures 9–17), monthly flows (R2 in Figure 7), flow pattern on a daily basis (EI in Figure 7) and hydrograph shapes (Figure 8). The results are discussed test by test in the following sections.

Figure 7. Summary of key validation results for all tests.

SS Test
This test is based on data from Ngezi-South and comprises an initial calibration of the models and a subsequent validation using data for an independent period. As indicated in Figures 7, 9, and 10 the performances of the three models are very similar. All models are able to provide a close fit to the recorded flows for the calibration period, while for the independent validation period the performance is somewhat reduced, as expected. The reduction is, however, limited, and all models are able to maintain a very good representation of the overall water balance and the interannual and seasonal variations, as well as the general flow pattern.

PB Test
This test comprises a transfer of models to the Lundi catchment, adjustment of parameters to reflect the prevailing catchment characteristics, and validation without any calibration. The PB test was arranged to test the capability of the different models to represent runoff from an ungauged catchment area, and hence no calibration was allowed prior to the simulation. All models have used the experience from the Ngezi-South calibrations in combination with the available information on the particular catchment characteristics for Lundi. While the NAM model has used this information in a purely subjective manner to revise model parameters, both the WATBAL and MIKE SHE models have directly used this information for the model setup. The estimates prepared by the latter two models have, however, also been influenced by the individual modelers’ subjective interpretation of the available information on soil and vegetation characteristics. In order to assess the effects of the uncertainty in parameter estimation as perceived by the individual modelers, three alternative runoff simulations were prepared, reflecting expected low, central, and high (runoff) estimates, respectively. The results of the central estimates are included in Figures 7, 8a, and 11, while annual runoff figures for the assessed uncertainty intervals are shown in Figure 16. In general, all models provide an excellent representation of the general flow pattern and the overall water balance, while maintaining the significant interannual variability to a satisfactory degree. The predicted hydrographs for the rainy season of 1973/1974, shown in Figure 8a, confirm that the overall hydrograph pattern is predicted quite well by all three models. The overall performance of the central estimates by the NAM and MIKE SHE models is somewhat reduced compared to validation runs for the Ngezi-South catchment as expected when no calibration is possible. The estimates would, however, still be very valuable for all practical purposes. For the WATBAL model, the central estimate is even better than obtained for the validation period for Ngezi-South, providing for a very accurate representation of observed runoff record. From Figure 16 it appears that the assessed uncertainty interval for the NAM predictions of annual runoff is about twice as wide as for the WATBAL and MIKE SHE predictions. M-PB Test This test is based on the same data from Lundi as the above PB test. The M-PB test was undertaken to evaluate whether better model performance could be obtained should shortterm measurements be available for calibration. Hence, before the results of the previous test were revealed, 1 year (1975/ 1976) of runoff record was released for calibration, and the PB test repeated. The main results of this test are summarized in Figure 7, and annual water balances are shown in Figure 12. For the NAM model the short-term calibration leads to an improved performance, decreasing the deviation of the overall water balance to some 15%. At the same time, the statistics of R2 and EI confirm the good representation of monthly flows and the overall flow pattern in general. For the WATBAL model the short-term calibration introduces only a slight improvement in the overall performance. The reason for this is thought to be due to the originally very good performance, which in any case would be difficult to improve. The main benefit of the short runoff record is in this case primarily to confirm the validity of the central estimate
and hence to reduce the uncertainty related to the final runoff estimate. In this sense the calibration has proven quite valuable and would indeed be so in any practical case.
For the MIKE SHE model the calibration has not introduced any improvement in the overall performance. As compared to the best of the original estimates (i.e., the low case) the calibration has in fact caused a deterioration of the performance. This rather unfortunate incident may occur for all types of models when calibration data are not fully consistent, but it appears that the SHE type of model requires a greater reliability of input data than other, more simple types of models to avoid the pitfall of miscalibration.

Figure 8. (a) Lundi (central estimates) proxy-basin (PB) test hydrographs from 1973/1974. (b) Ngezi-North (central estimates) PB differential split-sample (SS) test hydrographs for 1977/1978.
Figure 9. Annual water balances for the calibration part of the SS test on Ngezi-South catchment.
Figure 10. Annual water balances for the validation part of the SS test on Ngezi-South catchment.
Figure 11. Annual water balances for PB test on Lundi catchment.
Figure 13. Annual water balances for differential split sample (DSS) test on Lundi catchment.

DSS Test
This test consists of model calibrations based on data from Lundi for 4 wet years (1971/1972–1975/1976 with mean annual runoff of 171 mm) and validation on data from 3 very dry years (1981/1982–1983/1984 with mean annual runoff of 8 mm). The purpose of this test is to assess the capability of the models to do simulations under nonstationary climate conditions. A summary of the main results of the differential SS tests is given in Figure 7, and the annual water balances are shown in Figure 13. As is evident from the results, both NAM and MIKE SHE predict the water balance well. The WATBAL model, however, grossly overestimates the peaks in the relative sense, causing the simulated average runoff to be about twice that measured (15 mm compared to 8 mm). The related statistics are poorer than those in the other testing schemes, but it should be noted that even small deviations cause poor statistics when mean flows are as low as those in this case.

PB-DSS Test
This test is based on data from the third catchment, NgeziNorth. Without allowing for any prior calibration, all modelers were requested to prepare low, central, and high estimates of the expected series of flows for the 1977/1978 –1983/1984 period. This period contained a sequence of mainly wet years (1977/1978 –1980/1981) followed by 3 consecutive dry years, with rainfalls being less than half of that experienced in the former period. At the stage when the measured flow record was revealed, it was unfortunately discovered that the record for the 1979/ 1980 –1980/1981 years was erroneous and hence had to be disregarded when computing the test statistics. The results of this test are summarized in Figure 7, while the annual water
Figure 12. Annual water balances for modified proxy-basin (M-PB) test on Lundi catchment.
Figure 14. Annual water balances for proxy-basin differential split-sample (PB-DSS) test on Ngezi-North catchment.
2200
REFSGAARD AND KNUDSEN: INTERCOMPARISON OF HYDROLOGICAL MODELS
Figure 17. Assessments of uncertainty interval for prediction of annual water balances in the PB-DSS test on Ngezi-North catchment. Figure 15. Annual water balances for modified proxy-basin differential split-sample (M-PB-DSS) test on Ngezi-North catchment.
balances are shown in Figure 14. The assessed uncertainty intervals of the model predicted annual runoff are shown in Figure 17. From Figure 17 it appears that all models have managed to provide for a nonbiased range of estimates of the overall water balance, which for some models is quite narrow: NAM, 650%; WATBAL, 630%; and MIKE SHE, 610%. In terms of the overall water balance, the central estimates of the models agree within 25% (NAM), 5% (WATBAL), and 2% (MIKE SHE). The agreement between the recorded and simulated monthly flows and the flow duration curves, however, is less accurate for NAM and MIKE SHE than for the WATBAL model, which provides for an excellent fit in terms of these measures. The reason for the somewhat lower R2 and EI figures for the NAM model is related to its generally less accurate prediction of flows, while for the MIKE SHE model this is directly linked to the erroneous assessment of a key drainage parameter, causing the model to produce much more base flow than actually exist. Hydrographs showing measured discharge and predictions by the three models for the rainy season of 1977/1978 are presented in Figure 8b. These graphs confirm the conclusions derived from the numerical criteria, R2, and EI, namely, that
Figure 16. Assessments of uncertainty interval for prediction of annual water balances in the PB test on Lundi catchment.
These graphs confirm the conclusions derived from the numerical criteria, R2 and EI, namely, that the WATBAL model reproduces the observed hydrograph very well, while the daily hydrograph for MIKE SHE reveals major errors in the overall flow pattern. Note that the model which produces the best overall water balance (MIKE SHE) at the same time has the poorest fit when compared on daily values.
M-PB-DSS Test
This test is based on the same data from Ngezi-North as the previous PB-DSS test. The above test was repeated after all models had been calibrated on only 1 year of data (1977/1978), before the results for the other years were revealed. The main results of the modified test are shown in Figures 7 and 15. These results clearly demonstrate that access to only 1 year of runoff data has enabled all models to provide an excellent representation of the runoff within the entire testing period. The overall water balance agrees within 7% for all models, and despite the fact that the calibration was based on a wet year, the annual flows for the dry period come within the right order of magnitude, although the relative deviation in some cases is quite significant. The high R2 and EI scores achieved by all models confirm that the representation of the monthly flow sequence and the overall flow pattern has become very good after the calibration.
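As an aside for the reader, the R2 criterion referred to throughout this intercomparison is, in this literature, commonly the Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970, cited in the references); the EI index is not defined in this excerpt and is therefore not reproduced. A minimal sketch of such an efficiency calculation, with illustrative data only, could look as follows:

import numpy as np

def nash_sutcliffe(observed, simulated):
    # Efficiency = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).
    # 1.0 indicates a perfect fit; values near zero mean the model performs
    # no better than simply using the observed mean.
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Hypothetical monthly flows (mm), for illustration only
obs = [12.0, 30.0, 55.0, 20.0, 8.0, 3.0]
sim = [10.0, 34.0, 50.0, 22.0, 9.0, 2.0]
print(round(nash_sutcliffe(obs, sim), 3))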
Discussion and Conclusions
The three generalized modeling systems, NAM, WATBAL, and MIKE SHE, have been subjected to a rigorous testing scheme on data from three Zimbabwean catchments. NAM is a typical representative of the lumped conceptual class of models, while MIKE SHE similarly belongs to the distributed physically based class. WATBAL falls between the two classes. However, for the specific applications in Zimbabwe, where surface water hydrological aspects are dominant, it can be argued that WATBAL can be considered another representative of the distributed physically based class. Although establishing an objective framework for the model tests and intercomparisons has been attempted, it should be recognized that the results of a given validation test will be influenced by the specific test conditions, including the particular climate, catchment characteristics, data availability and quality, as well as subjective assessments made by the user (e.g., interpretation of available information for determining model parameters). Hence the obtained results are not only a function
of the modeling system itself, but also of the user and numerous other factors. To arrive at a firm conclusion many validations would usually be required, and conclusions based on the limited number of tests undertaken here should therefore be drawn with caution. With this caution regarding generality in mind, a number of specific conclusions may be derived from the case study. First, in view of the difficult tasks given to the models, involving simulation for ungauged catchments and nonstationary time periods, the overall performance of the models is considered quite impressive. The overall water balance agrees within ±25% in all cases but one, and good results are achieved without balancing out excessive positive and negative deviations within individual years. In most cases the models score an R2 value of about 0.8 or greater and an EI index generally above 0.7. Secondly, the following is noted with regard to the specific types of validation tests: 1. For the SS test the NAM, WATBAL, and MIKE SHE systems generally exhibit similar performance. All models are able to provide a close fit to the recorded flows for the calibration period, without severely reducing the performance during the independent validation period. Hence this test suggests that if an adequate runoff record of a few (3–5) years exists, any of the modeling systems could be used as a reliable tool for filling in gaps in such records or for extending runoff series based on long-term rainfall series. Considering the data requirements and efforts involved in the setup of the different models, however, a simple model of the NAM type should generally be selected for such tasks. 2. For the PB tests, designed for validating the capability of the models to represent flow series of ungauged catchments, it had been expected that the physically based models would produce better results than the simple type of models. The results, however, do not provide unambiguous support for this hypothesis. All three modeling systems generated good results, with the WATBAL providing slightly more accurate results than the others. Hence for the Zimbabwean conditions the additional capabilities of the MIKE SHE, as compared to the WATBAL, namely, the distributed physically based features relating to subsurface flow, proved to be of little value in simulating the water balance. For the PB tests it is noticed that the uncertainty range represented by the low and high estimates is significantly larger for the NAM than for the WATBAL and MIKE SHE cases. This probably reflects the fact that parameter estimation for ungauged catchments is generally more uncertain for the NAM, whose parameters are semiempirical coefficients without direct links to catchment characteristics. 3. A general experience of the M-PB tests is that allowing for model calibration based on only 1 year of runoff data improves the overall performance of all models. The improvement appears to be particularly significant for the NAM model, which also showed the largest uncertainties in the cases where no calibration was possible. 4. For the DSS tests all models have been able to simulate flows of the right order of magnitude and correct pattern. Hence all models have proven their ability to simulate the runoff pattern in periods with much reduced rainfall and runoff as compared to the calibration period.
On the basis of these results there appears to be no immediate justification for using an advanced type of model to represent flows following a significant change of rainfall, provided that a number of years are available for calibration purposes. It is tempting to extend this finding to suggest that the simple type of model could be used to assess the impact of climate change on water resources. It should be recognized, however, that the above results cannot fully justify such a hypothesis, since a long-term climate change would probably bring about changes in vegetation and hence in evaporation. This type of nonstationarity has not been adequately tested. As far as the SS tests are concerned the above conclusion is in full agreement with results of other studies [e.g., Michaud and Sorooshian, 1994]. With regard to the PB tests the present conclusion in favor of the distributed physically based modeling systems is in agreement with, albeit vaguer than, that of Michaud and Sorooshian [1994]. In summary, the present study, as well as similar studies reported in the literature, suggests the following conclusions with regard to rainfall-runoff modeling. 1. Given a few (1–3) years of runoff measurements, a lumped model of the NAM type would be a suitable tool from the point of view of technical and economical feasibility. This applies for catchments with homogeneous climatic input as well as cases where significant variations in the exogenous input are encountered. 2. For ungauged catchments, however, where accurate simulations are critical for water resources decisions, a distributed model is expected to give better results than a lumped model if appropriate information on catchment characteristics can be obtained.
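To make the hierarchical testing scheme concrete, the following sketch outlines the bookkeeping of the split-sample (SS) and differential split-sample (DSS) tests described above. The model interface (calibrate/simulate) is a hypothetical placeholder, not any of the modeling systems discussed here.

def split_sample_test(model, annual_runoff, split_year):
    # SS test: calibrate on the first part of the record and validate on the
    # remaining, independent part (annual_runoff: dict year -> runoff series).
    calibration = {y: q for y, q in annual_runoff.items() if y < split_year}
    validation = {y: q for y, q in annual_runoff.items() if y >= split_year}
    params = model.calibrate(calibration)           # placeholder calibration call
    simulated = model.simulate(params, validation)  # placeholder simulation call
    return {y: (validation[y], simulated[y]) for y in validation}

def differential_split_sample_test(model, wet_years, dry_years):
    # DSS test: calibrate on wet years and validate on dry years (or vice
    # versa), probing transferability under nonstationary climate conditions.
    params = model.calibrate(wet_years)
    simulated = model.simulate(params, dry_years)
    return {y: (dry_years[y], simulated[y]) for y in dry_years}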
Acknowledgments. The modeling work on the Zimbabwe catchments was carried out by our colleagues Børge Storm and Merete Styczen (MIKE SHE) and Roar Jensen (NAM), while the second author was responsible for the WATBAL work. During the data collection and field reconnaissance in Zimbabwe, kind help and assistance were provided by the University of Zimbabwe; the National Herbarium; and the Department of Meteorological Services and the Hydrological Branch, Ministry of Energy, Water Resources and Development. The study was carried out with financial support from the Danish Council of Technology, and the paper preparation was supported by the Danish Technical Research Council.
References
Abbott, M. B., J. C. Bathurst, J. A. Cunge, P. E. O'Connell, and J. Rasmussen, An introduction to the European Hydrological System—Système Hydrologique Européen, "SHE," 1, History and philosophy of a physically based distributed modelling system, J. Hydrol., 87, 45–59, 1986a.
Abbott, M. B., J. C. Bathurst, J. A. Cunge, P. E. O'Connell, and J. Rasmussen, An introduction to the European Hydrological System—Système Hydrologique Européen, "SHE," 2, Structure of a physically based distributed modelling system, J. Hydrol., 87, 61–77, 1986b.
Anderson, J., Communal land physical resource inventory, Mhondoro and Ngezi, Draft Rep. A 551, Chem. and Soil Res. Inst., Minist. of Agric., Harare, Zimbabwe, 1989.
Beven, K. J., Changing ideas in hydrology—The case of physically based models, J. Hydrol., 105, 157–172, 1989.
Danish Hydraulic Institute (DHI), Validation of hydrological models, Phase II, Hørsholm, 1993a.
Danish Hydraulic Institute (DHI), MIKE SHE WM, short description, 1993b.
Danish Hydraulic Institute (DHI), MIKE 11 short description, 1994.
Flavelle, P., A quantitative measure of model validation and its potential use for regulatory purposes, Adv. Water Resour., 15, 5–13, 1992.
Franchini, M., and M. Pacciani, Comparative analysis of several conceptual rainfall-runoff models, J. Hydrol., 122, 161–219, 1991.
Grayson, R. B., I. D. Moore, and T. A. McMahon, Physically based hydrologic modeling, 2, Is the concept realistic?, Water Resour. Res., 28(10), 2659–2666, 1992.
Jain, S. K., B. Storm, J. C. Bathurst, J. C. Refsgaard, and R. D. Singh, Application of the SHE to catchments in India, 2, Field experiments and simulation studies with the SHE on the Kolar subbasin to the Narmada River, J. Hydrol., 140, 25–47, 1992.
Klemes, V., Sensitivity of water resources systems to climate variations, WCP Rep. 98, World Meteorological Organization, Geneva, 1985.
Klemes, V., Operational testing of hydrological simulation models, Hydrol. Sci. J., 31(1), 13–24, 1986.
Knudsen, J., A. Thomsen, and J. C. Refsgaard, WATBAL: A semi-distributed, physically based hydrological modelling system, Nordic Hydrol., 17, 347–362, 1986.
Loague, K. M., and R. A. Freeze, A comparison of rainfall-runoff modeling techniques on small upland catchments, Water Resour. Res., 21(2), 229–248, 1985.
Michaud, J., and S. Sorooshian, Comparison of simple versus complex distributed runoff models on a midsized semiarid watershed, Water Resour. Res., 30(3), 593–605, 1994.
Naef, F., Can we model the rainfall-runoff process today?, Hydrol. Sci. Bull., 26(3), 281–289, 1981.
Nash, J. E., and J. V. Sutcliffe, River flow forecasting through conceptual models, 1, J. Hydrol., 10, 282–290, 1970.
Nielsen, S. A., and Bari, Simulation of runoff from ungauged catchments by a semi-distributed hydrological modelling system, Proceedings, 6th IAHR Congress, Int. Assoc. for Hydraul. Res., Delft, Netherlands, 1988.
Nielsen, S. A., and E. Hansen, Numerical simulation of the rainfall-runoff process on a daily basis, Nordic Hydrol., 4, 171–190, 1973.
Refsgaard, J. C., Model and data requirements for simulation of runoff and land surface processes, in Proceedings from NATO Advanced Research Workshop "Global Environmental Change and Land Surface Processes in Hydrology: The Trials and Tribulations of Modelling and Measuring", Tucson, May 17–21, 1993, edited by S. Sorooshian and V. K. Gupta, Springer-Verlag, New York, 1996.
Refsgaard, J. C., and B. Storm, MIKE SHE, in Computer Models of Watershed Hydrology, edited by V. P. Singh, pp. 809–846, Water Resour. Publ., Littleton, Colo., 1995.
Refsgaard, J. C., S. M. Seth, J. C. Bathurst, M. Erlich, B. Storm, G. H. Jørgensen, and S. Chandra, Application of the SHE to catchments in India, 1, General results, J. Hydrol., 140, 1–23, 1992.
Schlesinger, S., R. E. Crosbie, R. E. Gagné, G. S. Innis, C. S. Lalwani, J. Loch, J. Sylvester, R. D. Wright, N. Kheir, and D. Bartos, Terminology for model credibility, Simulation, 32(3), 103–104, 1979.
Smith, R. E., D. R. Goodrich, D. A. Woolhiser, and J. R. Simanton, Comment on "Physically based modeling, 2, Is the concept realistic?" by R. B. Grayson, I. D. Moore, and T. A. McMahon, Water Resour. Res., 30(3), 851–854, 1994.
Timberlake, J., Brief description of the vegetation of Mondoro and Ngezi communal lands, Mashonaland West, Natl. Herbarium, Harare, Zimbabwe, 1989.
Tsang, C.-F., The modelling process and model validation, Ground Water, 29(6), 825–831, 1991.
U.S. Committee, Task Committee on Quantifying Land-Use Change Effects, Evaluation of hydrological models used to quantify major land-use change effects, J. Irrig. Drain. Eng., 111(1), 1–17, 1985.
Wilcox, B. P., W. J. Rawls, D. L. Brakensiek, and J. R. Wright, Predicting runoff from rangeland catchments: A comparison of two models, Water Resour. Res., 26(10), 2401–2410, 1990.
World Meteorological Organization (WMO), Intercomparison of conceptual models used in operational hydrological forecasting, WMO Oper. Hydrol. Rep. 7, WMO 429, Geneva, 1975.
World Meteorological Organization (WMO), Third planning meeting on World Climate Programme Water, WCP 114, WMO/TD 106, Geneva, 1985.
World Meteorological Organization (WMO), Intercomparison of models for snowmelt runoff, WMO Oper. Hydrol. Rep. 23, WMO 646, Geneva, 1986.
World Meteorological Organization (WMO), Simulated real-time intercomparison of hydrological models, WMO Oper. Hydrol. Rep. 38, WMO 779, Geneva, 1992.
J. Knudsen and J. C. Refsgaard, Danish Hydraulic Institute, Agern Alle 5, DK-2970 Hørsholm, Denmark.
(Received September 25, 1995; revised March 15, 1996; accepted March 20, 1996.)
[7]
Refsgaard JC (1997) Parametrisation, calibration and validation of distributed hydrological models. Journal of Hydrology, 198, 69-97.
Reprinted from Journal of Hydrology with permission from Elsevier
[8]
Refsgaard JC (1997) Validation and Intercomparison of Different Updating Procedures for Real-Time Forecasting. Nordic Hydrology, 28, 65-84.
Reprinted with permission from Nordic Hydrology
[9]
Refsgaard JC, Sørensen HR, Mucha I, Rodak D, Hlavaty Z, Bansky L, Klucovska J, Topolska J, Takac J, Kosc V, Enggrob HG, Engesgaard P, Jensen JK, Fiselier J, Griffioen J, Hansen S (1998) An Integrated Model for the Danubian Lowland – Methodology and Applications. Water Resources Management, 12, 433-465.
Reprinted from Water Resources Management with permission from Springer (www.springerlink.com)
Water Resources Management 12: 433–465, 1998. © 1998 Kluwer Academic Publishers. Printed in the Netherlands.

An Integrated Model for the Danubian Lowland – Methodology and Applications

J. C. REFSGAARD1, H. R. SØRENSEN1, I. MUCHA2, D. RODAK2, Z. HLAVATY2, L. BANSKY2, J. KLUCOVSKA2, J. TOPOLSKA4, J. TAKAC3, V. KOSC3, H. G. ENGGROB1, P. ENGESGAARD5, J. K. JENSEN5, J. FISELIER6, J. GRIFFIOEN7 and S. HANSEN8

1 Danish Hydraulic Institute, Denmark
2 Ground Water Consulting Ltd., Bratislava, Slovakia
3 Irrigation Research Institute (VUZH), Bratislava, Slovakia
4 Water Research Institute (VUVH), Bratislava, Slovakia
5 Water Quality Institute (VKI), Denmark
6 DHV Consultants BV, The Netherlands
7 Netherlands Institute of Applied Geosciences TNO, The Netherlands
8 Royal Veterinary and Agricultural University, Denmark

(Received: 30 December 1997; in final form: 10 November 1998)

Abstract. A unique integrated modelling system has been developed and applied for environmental assessment studies in connection with the Gabcikovo hydropower scheme along the Danube. The modelling system integrates model codes for describing the reservoir (2D flow, eutrophication, sediment transport), the river and river branches (1D flow including effects of hydraulic control structures, water quality, sediment transport), the ground water (3D flow, solute transport, geochemistry), agricultural aspects (crop yield, irrigation, nitrogen leaching) and flood plain conditions (dynamics of inundation pattern, ground water and soil moisture conditions, and water quality). The uniqueness of the established modelling system is the integration between the individual model codes, each of which provides complex descriptions of the various processes. The validation tests have generally been carried out for the individual models, whereas only a few tests on the integrated model were possible. Based on discussion and examples, it is concluded that the results from the integrated model can be assumed less uncertain than outputs from the individual model components. In an example, the impacts of the Gabcikovo scheme on the ecologically unique wetlands created by the river branch system downstream of the new reservoir have been simulated. In this case, the impacts of alternative water management scenarios on ecologically important factors such as flood frequency and duration, depth of flooding, depth to ground water table, capillary rise, flow velocities, sedimentation and water quality in the river system have been explicitly calculated.

Key words: Danube, environmental impacts, floodplain, Gabcikovo, groundwater, hydropower, integrated modelling, river branch.
Figure 1. The Danubian Lowland with the new reservoir and the Gabcikovo scheme.
1. Introduction

1.1. THE DANUBIAN LOWLAND AND THE GABCIKOVO HYDROPOWER SCHEME

The Danubian Lowland (Figure 1) in Slovakia and Hungary between Bratislava and Komárno is an inland delta (an alluvial fan) formed in the past by river sediments from the Danube. The entire area forms an alluvial aquifer, which receives around 30 m3 s−1 of infiltration water from the Danube throughout the year in the upper parts of the area and returns it to the Danube and the drainage canals in the downstream part. The aquifer is an important water resource for municipal and agricultural water supply. Human influence has gradually changed the hydrological regime in the area. Construction of dams upstream of Bratislava, together with straightening and embanking of the river for navigational and flood protection purposes as well as exploitation of river sediments, has significantly deepened the river bed and lowered the water level in the river and the surrounding ground water level. These changes have had a significant influence on the ground water regime as well as on the sensitive riverine forests downstream of Bratislava. Despite this basically negative trend, the floodplain area with its alluvial forests and associated ecosystems still represents a unique landscape of outstanding ecological importance. The Gabcikovo hydropower scheme was put into operation in 1992. A large number of hydraulic structures has been established as part of the hydropower scheme. The key structures are a system of weirs across the Danube at Cunovo 15 km downstream of Bratislava, a reservoir created by the damming at Cunovo, a 30 km long lined power and navigation canal, outside the floodplain area, parallel to the Danube River with intake to the hydropower plant, a hydropower plant and two ship-locks at Gabcikovo, and an intake structure at Dobrohost, 10 km downstream of Cunovo, diverting water from the new canal to the river branch system. The entire scheme has significantly affected the hydrological regime and the ecosystem of the region, see, e.g., Mucha et al. (1997). The scheme was originally planned as a joint effort between former Czecho-Slovakia and Hungary, and the major parts of the construction were carried out as such on the basis of a 1977 international treaty. However, since 1989 Gabcikovo has been a major matter of controversy between Slovakia and Hungary, who have referred some disputed questions to international expert groups (EC, 1992, 1993a, b) and others to the International Court of Justice in The Hague (ICJ, 1997). Comprehensive monitoring and assessments of environmental impacts have been made, see Mucha (1995) for an overview. Since 1995 a joint Slovak-Hungarian monitoring program has been carried out (JAR, 1995, 1996, 1997).

1.2. NEED FOR INTEGRATED MODELLING

The hydrological regime in the area is very dynamic, with so many crucial links and feedback mechanisms between the various parts of the surface- and subsurface water regimes that integrated modelling is required to thoroughly assess the environmental impacts of the hydropower scheme. This is illustrated by the following three examples:
• Ground water quality. Based on qualitative arguments it was hypothesised that the damming and creation of the reservoir might lead to changes in the oxidation-reduction state of the ground water. The reason for this is that the reservoir might increase infiltration from the Danube to the aquifer because of increased head gradients. On the other hand, fine sediment matter might accumulate on the reservoir bottom, thereby creating a reactive sediment layer. The river water infiltrating to the aquifer has to pass this layer, which might induce a change in the oxidation status of the infiltrating water. This could change the quality of the ground water from being oxic or suboxic towards being anoxic, which is undesirable for Bratislava's water works, most of which are located near the reservoir. Thus, the oxidation-reduction state of the ground water is intimately linked to a balance between the rate of infiltrating reducing water and the oxidizing capacity of the aquifer. The infiltrating water is linked to the hydraulic behaviour of the reservoir: how large is the infiltration area and at which rates does the infiltration take place at different locations. However, without an integrated model it is not possible to quantify whether and under which conditions these mechanisms play a significant role in practice, whether they are correct in principle but without practical importance, and what measures should be realised.
• Agricultural production. Changes in discharges in the Danube caused by diversion of some of the water through the power canal and creation of a reservoir would lead to changes in the ground water levels. As the agricultural crops depend on capillary rise from the shallow ground water table and on irrigation, the new hydrological situation created by the damming of the Danube might influence the crop yield, the irrigation requirements and the nitrogen leaching. Traditional crop models describing the root zone are not sufficient in this case, because the lower boundary conditions (ground water levels) are changed in a way that can only be quantified if the reservoir, the river and canal system and the aquifer are also explicitly included in the modelling.
• Floodplain ecosystem. The flora and fauna, which in the floodplain area are dominated by the river side branches, depend on many factors such as flooding dynamics, flow velocities, depth of ground water table, soil moisture, water quality and sediments. Also in this case the important factors depend on the interaction between the ground water and the surface water systems (illustrated in Figure 2), and even on water quality and sediments in the surface water system, so that quantitative impact assessments require an integrated modelling approach.

Figure 2. Important processes and their interactions with regard to floodplain hydrology.

2. Integrated Modelling System

2.1. INDIVIDUAL MODEL COMPONENTS

An integrated modelling system (Figure 3) has been established by combining the following existing and well proven model codes:
• MIKE SHE (Refsgaard and Storm, 1995), which, on a catchment scale, can simulate the major flow and transport processes in the hydrological cycle:
– 1-D flow and transport in the unsaturated zone
– 3-D flow and transport in the ground water zone
– 2-D flow and transport on the ground surface
– 1-D flow and transport in the river.
All of the above processes are fully coupled, allowing for feedbacks and interactions between components. In addition, MIKE SHE includes modules for multi-component geochemical and biodegradation reactions in the saturated zone (Engesgaard, 1996).
• MIKE 11 (Havnø et al., 1995) is a one-dimensional river modelling system. MIKE 11 is used for simulating hydraulics, sediment transport and morphology, and water quality. MIKE 11 is based on the complete dynamic wave formulation of the Saint Venant equations. The modules for sediment transport and morphology are able to deal with cohesive and noncohesive sediment transport, as well as the accompanying morphological changes of the river bed. The noncohesive model operates on a number of different grain sizes.
• MIKE 21 (DHI, 1995), which has the same basic characteristics as MIKE 11 but is extended to two horizontal dimensions, is used for reservoir modelling.
• MIKE 11 and MIKE 21 include River/Reservoir Water Quality (WQ) and Eutrophication (EU) modules (Havnø et al., 1995; VKI, 1995) to describe oxygen, ammonium, nitrate and phosphorus concentrations and oxygen demands as well as eutrophication issues such as bio-mass production and degradation.
• DAISY (Hansen et al., 1991) is a one-dimensional root zone model for simulation of soil water dynamics, crop growth and nitrogen dynamics for various agricultural management practices and strategies.
Figure 3. Structure of the integrated modelling system with indication of the interactions between the individual models.
2.2. INTEGRATION OF MODEL COMPONENTS

The integrated modelling system is formed by the exchange of data and feedbacks between the individual modelling systems. The structure of the integrated modelling system and the exchange of data between the various modelling systems are illustrated in general in Figure 3, and the steps in the integrated modelling are described further in Section 6.2 and illustrated in Figure 10 for the case of flood plain modelling. The interfaces between the various models indicated in Figure 3 are:
A) MIKE SHE forms the core of the integrated modelling system, having interfaces to all the individual modelling systems. The coupling of MIKE SHE and MIKE 11 is a fully dynamic coupling where data are exchanged within each computational time step, see Section 2.3 below.
B) Results of eutrophication simulations with MIKE 21 in the reservoir are used to estimate the concentration of various water quality parameters in the water that enters the Danube downstream of the reservoir. This information serves as boundary conditions for water quality simulations for the Danube using MIKE 11.
C) Sediment transport simulations in the reservoir with MIKE 21 provide information on the amount of fine sediment on the bottom of the reservoir. The simulated grain size distribution and sediment layer thickness are used to calculate leakage coefficients, which are used in ground water modelling with MIKE SHE to calculate the exchange of water between the reservoir and the aquifer.
D) The DAISY model simulates vegetation parameters which are used in MIKE SHE to simulate the actual evapotranspiration. Ground water levels simulated with MIKE SHE act as lower boundary conditions for DAISY unsaturated zone simulations. Consequently, this process is iterative and requires several model simulations.
E) Results from water quality simulations with MIKE 11 and MIKE 21 provide estimates of the concentration of various components/parameters in the water that infiltrates to the aquifer from the Danube and the reservoir. This can be used in the ground water quality simulations (geochemistry) with MIKE SHE.
A general discussion on the limitations in the above couplings is given in Section 7 below.

2.3. A COUPLING OF MIKE SHE AND MIKE 11

The focus in MIKE SHE lies on catchment processes, with a comparatively less advanced description of river processes. In contrast, MIKE 11 has a more advanced description of river processes and a simpler catchment description than MIKE SHE. Hence, for cases where full emphasis is needed on both river and catchment processes, a coupling of the two modelling systems is required.
A full coupling between MIKE SHE and MIKE 11 has been developed (Figure 4). In the combined modelling system, the simulation takes place simultaneously in MIKE 11 and MIKE SHE, and data transfer between the two models takes place through shared memory. MIKE 11 calculates water levels in rivers and floodplains. The calculated water levels are transferred to MIKE SHE, where flood depth and areal extent are mapped by comparing the calculated water levels with the surface topographic information stored in MIKE SHE. Subsequently, MIKE SHE calculates water fluxes in the remaining part of the hydrological cycle. Exchange of water between MIKE 11 and MIKE SHE may occur due to evaporation from surface water, infiltration, overland flow or river-aquifer exchange. Finally, water fluxes calculated with MIKE SHE are exchanged with MIKE 11 through source/sink terms in the continuity part of the Saint Venant equations in MIKE 11. The MIKE SHE–MIKE 11 coupling is crucial for a correct description of the dynamics of the river-aquifer interaction. Firstly, the river width is larger than one MIKE SHE grid cell, in which case the MIKE SHE river-aquifer description is no longer valid. Secondly, the river/reservoir system comprises a large number of hydraulic structures, the operation of which is accurately modelled in MIKE 11 but cannot be accounted for in MIKE SHE. Thirdly, the very complex river branch system with loops and flood cells needs a very efficient hydrodynamic formulation such as that in MIKE 11.
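As a purely conceptual illustration of the per-time-step exchange described above, the sketch below shows how such a coupling loop might be organised. The object and method names are hypothetical wrappers, not the actual MIKE SHE or MIKE 11 interfaces.

def coupled_time_step(river_model, catchment_model, dt):
    # One coupled step with hypothetical wrappers around the two codes:
    # the river code solves water levels in rivers and floodplains; the
    # catchment code maps flooding against topography, simulates the rest
    # of the hydrological cycle, and returns exchange fluxes that enter the
    # continuity part of the Saint Venant equations as source/sink terms.
    water_levels = river_model.solve_hydrodynamics(dt)
    flood_map = catchment_model.map_flooding(water_levels)
    exchange_fluxes = catchment_model.solve_catchment(flood_map, dt)
    river_model.add_lateral_sources(exchange_fluxes)
    return water_levels, exchange_fluxes

def run_coupled(river_model, catchment_model, n_steps, dt):
    for _ in range(n_steps):
        coupled_time_step(river_model, catchment_model, dt)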
Figure 4. Principles of the coupling between the MIKE SHE catchment code and the MIKE 11 river code.
2.4. COMPARISON TO OTHER MODELLING SYSTEMS REPORTED IN LITERATURE

Yan and Smith (1994) described the demand for and outlined a concept of a fully integrated ground water–surface water modelling system, including descriptions of hydraulic structures and agricultural irrigation, as a decision support tool for water resources management in South Florida. Typical examples of integrated codes described in the literature are Menetti (1995) and Koncsos et al. (1995). In a review of recent advances in understanding the interaction of groundwater and surface water, Winter (1995) mainly describes groundwater codes, such as MODFLOW, which have been expanded with some, but very limited, surface water simulation capabilities. The research activities are characterized as '... although studies of these systems have increased in recent years, this effort is minimal compared to what is needed'. Winter (1995) sees the prospects for the future as follows: 'Future studies of the interaction of groundwater and surface water would benefit from, and indeed should emphasise, interdisciplinary approaches. Physical hydrologists, geochemists, and biologists have a great deal to learn from each other, and contribute to each other, from joint studies of the interface between groundwater and surface water.' Integrated three-dimensional descriptions of flow, transport and geochemical processes are still rarely seen for groundwater modelling of large basins. Thus, according to a recent review of basin-scale hydrogeological modelling (Person et al., 1996), most of the existing reactive transport model codes are based on one-dimensional descriptions. While many model codes contain a distributed physically-based representation of one of the three main components (ground water, unsaturated zone and surface water systems), only a few codes provide a fully integrated description of all three main components. For example, in an up-to-date book (Singh, 1995) presenting descriptions of 25 hydrological codes, only three codes, SHE/SHESED (Bathurst et al., 1995), IHDM (Calver and Wood, 1995) and MIKE SHE (Refsgaard and Storm, 1995), provide such integrated descriptions. Among these three codes only MIKE SHE has capabilities for modelling advection-dispersion and water quality. None of the three codes contains options for computation of hydraulic structures in river systems, nor for agricultural modelling such as crop yield and nitrogen leaching. The individual components of the integrated modelling system presented in this paper, we believe, represent state-of-the-art within their respective disciplines. The uniqueness is the full integration.

3. Methodology for Model Construction, Calibration, Validation and Application

The terminology and methodology used in the following is based on the concepts outlined in Refsgaard (1997).
3.1. MODEL CONSTRUCTION

All of the applied models are based on distributed physically-based model codes. This implies that most of the required input data and model parameters can ideally be measured directly in nature.

3.2. MODEL CALIBRATION

The calibration of a physically-based model implies that simulation runs are carried out and model results are compared with measured data. The adopted calibration procedure was based on 'trial and error', implying that the model user between calibration runs made subjective adjustments of parameter values within physically realistic limits. The most important guidance for the model user in this process was graphical display of model results against measured values. It may be argued that such a manual procedure adds a degree of subjectivity to the results. However, given the very complex and integrated modelling, focusing on a variety of output results and containing a large number of adjustable parameters, automatic parameter optimisation is not yet possible, and 'trial and error' remains the only feasible method in practice.

3.3. MODEL VALIDATION

Good model results during a calibration process cannot automatically ensure that the model will perform equally well for other time periods, because the calibration process involves some manipulation of parameter values. Therefore, model validations based on independent data sets are required. To the extent possible, limited by data availability, the models have been validated by demonstrating the ability to reproduce measured data for a period outside the calibration period, using a so-called split-sample test (Klemes, 1986). For some of the models, the model was even calibrated on pre-dam conditions and validated on post-dam conditions, where the flow regime at some locations was significantly altered due to the construction of the reservoir and related hydraulic structures and canals.

3.4. MODEL APPLICATION

The validated models have finally been used, as an integrated system, in a scenario approach to assess the environmental impacts of alternative water management options. The uncertainties of the model predictions have been assessed through sensitivity analyses.
4. Selected Results from Model Construction, Calibration and Validation of Individual Components

Comprehensive data collection and processing as well as model calibration and validation were carried out (DHI et al., 1995). In the following sections a few selected results are presented for the individual components. Further aspects of model validation focusing on integrated aspects are discussed in Section 5.

4.1. RIVER AND RESERVOIR FLOW MODELLING

The following models have been constructed, calibrated and validated:
• one-dimensional MIKE 11 model for the Danube from Bratislava to Komarno,
• one-dimensional MIKE 11 model for the river branch system at the Slovak floodplain, and
• two-dimensional MIKE 21 model for the reservoir.
The MIKE 11 models have been established in two versions reflecting post- and pre-dam conditions, respectively.

4.1.1. MIKE 11 River Model for the Danube

The MIKE 11 model for the Danube is based on river cross-sections measured in 1989 and 1991. The applied boundary conditions were measured daily discharges at Bratislava (upstream) and a discharge rating curve at Komarno (downstream). The model was initially calibrated for two steady state situations reflecting a low flow situation (905 m3 s−1) and a flow situation close to the long term average (2390 m3 s−1), respectively. Subsequently, the model was calibrated in a nonsteady state against daily water level and discharge measurements from 1991. The model was finally validated by demonstrating the ability to reproduce measured daily water level data from 1990. Calibration and validation results are presented in Topolska and Klucovska (1995). For the post-dam model some river reaches were updated with cross-sections measured in 1993. In addition, the reservoir and related hydraulic structures and canals were included. As the conditions after damming of the Danube have changed significantly, re-calibration of the post-dam model was carried out for the period April 1993–July 1993. Subsequently, the model was validated against measured data from the period November 1992–March 1993.

4.1.2. MIKE 11 Model for the River Branch System

The Danubian floodplain is a forest area of major ecological interest characterised by a complex system of river branches. A layout of the river branch system is shown in Figure 5. The cross-sections in the river branch system were measured during the 1960's and 1970's. The pre-dam model was calibrated against water level and flow data from the 1965 flood. In the post-dam situation, the branch system is fed by an
The regional and local ground water models all use the coupled version of the MIKE SHE and MIKE 11 and hence, include modelling of evapotranspiration and
• A regional ground water model for pre-dam conditions (3000 km2, 500 m horizontal grid, 5 vertical layers).
• A regional ground water model for post-dam conditions (3000 km2, 500 m horizontal grid, 5 vertical layers).
• A local ground water model for an area surrounding the reservoir for both pre- and post-dam conditions (200 km2, 250 m horizontal grid, 7 vertical layers).
• A local ground water model for the river branch system for both pre- and post-dam conditions (50 km2, 100 m horizontal grid, 2 vertical layers).
• A cross-sectional (vertical profile) model near Kalinkovo at the left side of the reservoir (2 km long, 10 m horizontal grid, 24 vertical layers).
Ground water modelling has been carried out at three different spatial scales:
4.2. GROUND WATER FLOW MODELLING
A MIKE 21 hydrodynamic model for the reservoir was established based on a reservoir bathymetry measured in 1994. The spatial resolution of the finite difference model is 100 × 50 m. The model was calibrated against flow velocities measured in the reservoir in the autumn of 1994.
4.1.3. MIKE 21 Reservoir Model
inlet structure with water from the power canal. The system consists of a number of compartments (cascades) separated by small dikes. On each of these dikes combined structures of culverts and spillways are located enabling some control of the water levels and flows in the system. Results of the model calibration against data measured during the summer 1994 are shown in Klucovska and Topolska (1995). Finally, the model was validated by demonstrating the ability to reproduce water levels measured during the summer of 1993. Some of these results are presented in Sørensen et al. (1996).
Figure 5. Layout of the river branch system on the Slovakian side of the Danube.
The ground water model was calibrated against selected measured time series of ground water levels. The following parameters were subject to calibration: specific yield in the upper aquifer layer, leakage coefficients for the river bed and hydraulic conductivities for the aquifer layers. The soil physical characteristics for the unsaturated zone have been adopted directly from the unsaturated zone/agricultural modelling. The river model that has been used in the ground water modelling is identical to the MIKE 11 river model of the Danube, which was successfully validated independently as a ‘stand alone model’ (Subsection 4.1, above). When coupling MIKE SHE and MIKE 11 water is exchanged between the two models. The amount of water that recharges the aquifer in the upstream part and re-enters the river further downstream is in the order of 10–60 m3 s−1 depending on the Danube discharge and on the actual ground water level. The recharge is typically two orders of magnitude less than the Danube discharge, and hence, a re-calibration of the MIKE 11 river model is not required. As the major part of the ground water recharge originates from infiltration through the river bed, the leakage coefficient for the river bed becomes very important. Limited field information was available on this parameter, and hence, it was assumed spatially constant and through calibration assessed to be 5 × 10−5 s−1 for the Danube and Vah rivers and 5 × 10−6 s−1 for
4.2.2. Model Calibration
Comprehensive input data were available and used in the construction of the models. In general, the regional and the local models are based on the same data, with the main difference being that the local models provide finer resolutions and less averaging of measured input data. The two regional models, reflecting pre- and post-dam conditions, are basically the same. The only difference is that the post-dam model includes the reservoir and related hydraulic structures and seepage canals. The models are based on information on the location of river systems and cross-sectional river geometry, surface topography, land use and cropping pattern, soil physical properties and hydrogeology. In addition, time series of daily precipitation, potential evapotranspiration and temperature as well as discharge inflow at Bratislava have been used. Comprehensive geological data exist from this area, see e.g., Mucha (1992) and Mucha (1993). The aquifer, ranging in thickness from about 10 m at Bratislava to about 450 m at Gabcikovo, consists of Danube river sediments (sand and gravel) of late Tertiary and mainly Quaternary age. The present model is based on the work of Mucha et al. (1992a, b).
4.2.1. Model Construction
snowmelt processes, river flow, unsaturated flow and ground water flow. The crosssectional model only includes ground water processes.
A geochemical field investigation was carried out in a cross-section north of the reservoir near Kalinkovo as a basis for identifying the key geochemical processes and estimating parameter values (see Mucha, 1995). Eleven multi-screen wells were installed close to the water supply wells at Kalinkovo forming a 7.5 km long cross-section parallel to the regional ground water flow direction. The multi-screen wells have been sampled frequently to investigate the ongoing bio-geochemical processes during infiltration of the Danube river water into the aquifer. A ground water quality model was established for the Kalinkovo cross-sectional profile based on all the measured field data. This model includes a comprehensive description of the bio-geochemical processes such as kinetically controlled denitrification and equilibrium controlled inorganic chemistry based on the well known PHREEQE code. More details are given in Griffioen et al. (1995) and
4.3. GROUND WATER QUALITY
The calibrated ground water model was validated by demonstrating the ability to reproduce measured ground water tables after damming of the Danube. In this regard the only model modification is the inclusion of the reservoir and related structures and canals. Due to the nonstationarity of the hydrological regime such a validation test, which according to Klemes (1986) is denoted a differential split-sample test, is a demanding test. Figure 7 shows the simulated and observed ground water levels for the same three observation wells as shown for the calibration period in Figure 6. The effects of the damming of the Danube in October 1992, when the new reservoir was established, are clearly seen in terms of increased ground water levels and reduced ground water dynamics when comparing the two figures. These features are well captured by the model.
4.2.3. Model Validation
the Little Danube. These values are in good agreement with previous modelling experiences (Mucha et al., 1992b). When keeping the specific yield and the leakage coefficients for the river bed fixed, the main calibration parameters were the hydraulic conductivities of the saturated zone. About 300 time series of ground water level observations were available for the model area, typically in terms of 30–40 yr of weekly observations. The calibration was carried out on the basis of about 80 of these series for the period 1986–1990. In the parameter adjustments the overall spatial pattern described in the geological model was maintained. Some of the calibration results are illustrated in Figure 6, showing observed Danube discharge data together with simulated and measured ground water levels for three wells located at different distances from the Danube. Wells 694 and 740 are seen to react relatively quickly to fluctuations in river discharge as compared to well 7221, which is located further away from the river. This illustrates how the dynamics of the Danube propagates and is dampened in the aquifer.
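To illustrate the role played by the calibrated leakage coefficient, a minimal sketch of a leakage-type river-aquifer exchange flux is given below. The linear formulation and the numbers are illustrative assumptions, not the exact MIKE SHE formulation.

def leakage_exchange(leakage_coef, wetted_area, h_river, h_groundwater):
    # Illustrative exchange flux (m3/s): leakage coefficient (1/s) times the
    # wetted river-bed area (m2) times the head difference (m). A positive
    # value means infiltration from the river to the aquifer.
    return leakage_coef * wetted_area * (h_river - h_groundwater)

# Hypothetical example: 500 m x 300 m wetted bed area, 0.5 m head difference,
# and the calibrated Danube value of 5e-5 1/s quoted above.
print(leakage_exchange(5e-5, 500.0 * 300.0, 131.5, 131.0))  # 3.75 m3/s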
Figure 6. Danube discharge at Bratislava together with simulated and observed ground water levels for three wells before the damming of the Danube (calibration period).
Modelling of the pre-dam and post-dam conditions of agricultural potential and nitrate leaching risk was carried out using a representative selection of soil units, cropping pattern and meteorological data covering the area between the Danube and the Maly Danube (Figure 1). The DAISY model uses time-varying ground water levels (simulated with the regional MIKE SHE ground water model) as lower boundary conditions for the unsaturated flow simulations. Cropping pattern and fertiliser application are included in the model based on measurements and statistical data. The model was calibrated on the basis of data from field experiments carried out during the years 1981–1987 at the experimental station in Most near Bratislava. During this process the crop parameters used in the model were adjusted to Slovak
4.4. UNSATURATED ZONE AND AGRICULTURAL MODELLING
Engesgaard (1996). The transport part of the Kalinkovo cross-section has been calibrated against 18O isotope data. The parameters describing reactive processes have been assessed and adjusted on the basis of the detailed field measurements in the Kalinkovo cross-sectional profile. It was shown that the geochemical model behaves qualitatively correctly (Engesgaard, 1996).
Figure 7. Simulated and observed ground water levels for three wells after damming of the Danube (validation period).
A two-dimensional fine graded sediment model was constructed for the reservoir. The suspended sediment input was imposed as a boundary condition in Bratislava with time series of sediment concentrations of six suspended sediment fractions with their own grain sizes and fall velocities. The fall velocity for each of the six fractions was assessed according to field measurements. No further model calibration was carried out. The only field data available for validation were a few bed
4.5.3. Reservoir Sediment Model
A one-dimensional fine sediment model was constructed for the river branch system in order to have a tool for quantitative evaluation of the possible sedimentation in the river branch system for alternative water management options. The upstream boundary condition for the model was provided in terms of concentrations of suspended sediments simulated by the reservoir model. As virtually no field data on sedimentation in the river branch system were available, neither calibration nor validation was possible. Instead, experience-based values of model parameters from other similar studies reported in the literature were used.
4.5.2. Sediment Transport in the River Branch System
A one-dimensional morphological model was established for the Danube. As the model operates with cross-sectionally averaged parameters representing the river reach between every computational point (i.e. approximately 500 m), a special technique for comparing 'real' and simulated state variables was required. Therefore, the changes in mean water level over a decade, rather than changes in bed elevations, were compared between observations and simulations. For this purpose the changes in the so-called 'Low Regulation and Navigable Water Level' (LRNWL) were used. The LRNWL is specified by the Danube Commission as the water level corresponding to Q94%, which is approximately 980 m3 s−1. By using such an approach, perturbations in bed levels from one cross-section to another did not destroy the picture of the overall trends in aggradation and degradation of the river bed. The results of the calibration (1974–84) and validation runs (1984–90) are described in Topolska and Klucovska (1995).
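To illustrate the flow-duration basis of the LRNWL comparison, the sketch below computes a Q94%-type exceedance discharge from a daily series; the synthetic data and the omitted rating-curve step (discharge to water level) are hypothetical.

import numpy as np

def exceedance_discharge(daily_q, percent_exceeded=94.0):
    # Discharge equalled or exceeded 'percent_exceeded' % of the time.
    # The text states that Q94% for the Danube is approximately 980 m3/s.
    q = np.sort(np.asarray(daily_q, dtype=float))[::-1]   # descending order
    rank = int(np.ceil(percent_exceeded / 100.0 * len(q))) - 1
    return q[rank]

# Synthetic daily discharges for one decade, for illustration only
daily_q = np.random.default_rng(1).gamma(shape=2.0, scale=1000.0, size=3650)
print(exceedance_discharge(daily_q))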
4.5.1. Danube River Sediment Transport
4.5. RIVER AND RESERVOIR SEDIMENT TRANSPORT MODELLING
conditions. After the initial model construction and calibration, the model performance was evaluated through preliminary simulations using data from a number of plots located on an experimental field site at Lehnice in the middle of the project area. On the basis of comparisons between measured and simulated values of nitrogen uptake, dry matter yield and nitrate concentrations in soil moisture, the model performance under Slovak conditions was considered satisfactory (DHI et al., 1995).
In the reservoir the driving force is also the algae growth, and hence a eutrophication model (MIKE 21 EU) was applied. The reservoir model was calibrated against measured data from August 1994. This field programme was substantial and resulted in much more data than were available for the river branch system. Good correspondence between simulated and observed values was achieved during the calibration period. However, no further data have been available for independent validation tests.
4.6.3. Reservoir Model
The water quality in the river branches was simulated with a eutrophication model (MIKE 11 EU), in which the algae production is the driving force. The algae growth in this model is described as a function of incoming light, transparency of the water, temperature, sedimentation and growth rate of the algae, and of the available inorganic nutrients. The calibration was carried out on the basis of the few data available for the period June–August 1993. Due to the lack of further data, no independent model validation was possible, and hence the uncertainties related to applying the model for making quantitative predictions of the effects of alternative water management schemes may be considerable.
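As an illustration of the type of multiplicative limitation commonly used in eutrophication models of this kind, the sketch below combines light, temperature and nutrient limitation factors into a specific growth rate. The functional forms and constants are hypothetical and do not reproduce the MIKE 11 EU formulation.

def algal_growth_rate(mu_max, light, temperature, nitrogen, phosphorus,
                      k_light=50.0, theta=1.07, t_ref=20.0, k_n=0.1, k_p=0.01):
    # Illustrative specific growth rate (1/day): maximum rate scaled by
    # Monod-type light limitation, an Arrhenius-type temperature correction
    # and the most limiting nutrient (Liebig's law of the minimum).
    # Assumed units: light in W/m2, temperature in deg C, nutrients in mg/l.
    f_light = light / (k_light + light)
    f_temp = theta ** (temperature - t_ref)
    f_nutrient = min(nitrogen / (k_n + nitrogen), phosphorus / (k_p + phosphorus))
    return mu_max * f_light * f_temp * f_nutrient

print(round(algal_growth_rate(2.0, light=120.0, temperature=22.0,
                              nitrogen=0.5, phosphorus=0.03), 3))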
4.6.2. Model for the River Branch System
A BOD-DO model (MIKE 11 WQ) has been used to describe the water quality in the main stream of the Danube between Bratislava and Komarno. This model describes oxygen concentration (DO) as a function of the decay of organic matter (BOD), transformation of nitrogen components, re-aeration, oxygen consumption by the bottom and oxygen production and respiration by living organisms. As the conditions from pre-dam to post-dam have changed significantly, separate calibrations and validations were carried out. The pre-dam model was calibrated against data from October 1991 and validated against data from April and August/September 1991. The post-dam model was calibrated against data from May 1993 and validated against data from June 1993.
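For readers unfamiliar with this class of model, the sketch below shows a minimal Streeter-Phelps-type BOD-DO balance of the kind the text describes; the rate constants are hypothetical, and the MIKE 11 WQ module includes further processes (e.g. nitrogen transformations) not resolved here.

def bod_do_step(bod, do, dt=1.0, k1=0.3, k2=0.6, do_sat=9.0, sod=0.2, p_minus_r=0.0):
    # Advance BOD and dissolved oxygen DO (both mg/l) by one time step (days):
    # k1 = BOD decay rate (1/d), k2 = re-aeration rate (1/d), do_sat = saturation
    # DO, sod = sediment oxygen demand (mg/l/d), p_minus_r = net photosynthesis
    # minus respiration (mg/l/d). Simple explicit Euler step, illustration only.
    new_bod = bod + dt * (-k1 * bod)
    new_do = do + dt * (-k1 * bod + k2 * (do_sat - do) - sod + p_minus_r)
    return new_bod, new_do

bod, do = 6.0, 8.0
for _ in range(10):            # ten daily steps along a hypothetical reach
    bod, do = bod_do_step(bod, do)
print(round(bod, 2), round(do, 2))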
4.6.1. Danube River Model
4.6. SURFACE WATER QUALITY MODELLING
sediment samples from summer 1994 with data on sedimentation thickness and grain size analyses (Holobrada et al., 1994). A comparison of model results and field data indicated that a reservoir sedimentation of the right order of magnitude was simulated. The simulated reservoir sedimentation corresponded to 42% of the total suspended load at Bratislava.
The model calibration and validation have basically been carried out for the individual models using separate domain data for the river system, the aquifer system, etc. Rigorous validation tests of the integrated model were generally not possible due to lack of specific and simultaneous data on the processes describing the various couplings. Furthermore, although reasonably good assessments of the uncertainties of the individual model predictions could be made, it was not obvious how such uncertainty would propagate in the integrated model. It can be argued that uncertainties in output from one model would in principle influence the uncertainties in other components of the integrated modelling system, thus adding to the total uncertainty of the integrated model. Following this line of argument would lead to the conclusion that the uncertainty of predictions by the integrated model would be larger than the corresponding uncertainty of predictions made by traditional individual models. On the other hand, it can also be argued that in the integrated modelling approach the uncertainties in the crucial boundary conditions are reduced, because assumptions needed for executing individual models are substituted by model simulations based on data from neighbouring domains, which, if properly calibrated and validated, better represent the boundary effects. This would lead to the conclusion that the uncertainties in predictions by the integrated model would be smaller than those of the individual models. In the present study, no theoretical analyses have been made of this problem. Instead, a few validation tests have been made for cases where the couplings could indirectly be checked by testing the performance of the integrated model against independent data. In the following, results from one of these validation tests for the integrated model are shown. The river-aquifer interaction changed significantly when the reservoir was established. An important model parameter describing this interaction is the leakage coefficient, which was calibrated on the basis of ground water level data for the pre-dam situation (Subsection 4.2). For the post-dam situation the MIKE 21 reservoir model calculates the thickness and grain sizes of the sedimentation at all points in the reservoir. By use of the Carman-Kozeny formula, the leakage factors are recalculated for the area which is now covered by the reservoir. The model results were then checked against ground water level observations from wells near the reservoir, and it was found that a calibration factor of 10 had to be applied to the Carman-Kozeny formula. This can theoretically be justified by the fact that the sediments are stratified or layered due to variations in flow velocities during the sedimentation process. The same formula and the same calibration factor were also used for converting all texture data from aquifer sediment samples to hydraulic conductivity values in the model. Now, how can the validity of the integrated model be tested? The ground water level observations from a few wells have been used to assess the leakage calibration factor, so although the model output was subsequently checked against data from
5. Validation of Integrated Model
The hydrology of the river branch system is highly complex, with many processes influencing the water characteristics of importance for flora and fauna (Figure 2). These processes are highly interrelated and dynamic, with large variations in time and space. The complexity of the floodplain, with its river branch system, is indicated in Figures 5 and 9 for the 20 km reach downstream of the reservoir on the Slovakian side, where alluvial forest occurs. Before the damming of the Danube
6.1. HYDROLOGY OF RIVER BRANCH SYSTEM
6. Model Application – Case Study of River Branch System
many more wells, it may be argued that this in itself is not sufficient for a true model validation. Consider instead a comparison of simulated and measured discharges in the so-called seepage canals, which are small canals constructed a few hundred meters away from the reservoir with the aim of intercepting part of the infiltration through the bottom of the reservoir. In Figure 8 it can be seen that the model simulations match the measured data remarkably well at different locations along the seepage canals. Thus, at the two stations most downstream on both seepage canals (stations 2809 and 3214) the agreements between model predictions and field data are within 5%. This is a powerful test, because the discharge data have not been used at all in the calibration process, and because it integrates the effects of reservoir sedimentation, calculation of leakage factors and geological parameters.
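To illustrate the sediment-to-leakage conversion described above, the sketch below uses one common form of the Kozeny-Carman relation for hydraulic conductivity and divides by the sediment layer thickness to obtain a leakage coefficient. The exact formulation used in the study, and the direction in which the calibration factor of 10 was applied, are not given in the text; applying the factor as a reduction here is an assumption consistent with the stratification argument.

def kozeny_carman_conductivity(d_eff, porosity, g=9.81, nu=1.0e-6):
    # Hydraulic conductivity (m/s) from a common Kozeny-Carman form:
    # K = (g/nu) * n^3/(1-n)^2 * d^2/180, with effective grain diameter d (m),
    # porosity n (-), gravity g (m/s2) and kinematic viscosity nu (m2/s).
    return (g / nu) * (porosity ** 3 / (1.0 - porosity) ** 2) * (d_eff ** 2 / 180.0)

def leakage_coefficient(k_sediment, layer_thickness, calibration_factor=10.0):
    # Leakage coefficient (1/s) of the reservoir-bottom sediment layer,
    # reduced by the calibration factor (assumed direction, see text above).
    return (k_sediment / calibration_factor) / layer_thickness

k = kozeny_carman_conductivity(d_eff=2.0e-5, porosity=0.45)   # fine deposit (assumed)
print(k, leakage_coefficient(k, layer_thickness=0.2))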
Figure 8. Measured and simulated discharges in seepage canals. The data are from a particular day in May 1995 and in m3 s−1 .
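The leakage-factor update described at the start of this section (the Carman-Kozeny formula with a calibration factor of 10) can be sketched in a few lines. In the sketch below the grain diameter, porosity, sediment thickness and the conversion from conductivity to a leakage coefficient are illustrative assumptions, and it is assumed that the calibration factor reduces the Carman-Kozeny conductivity (consistent with the stratification argument above); the study itself does not spell out these details.

```python
# Minimal sketch of a Carman-Kozeny based leakage-factor update (illustrative only).
# Assumed inputs: representative grain diameter d (m), porosity n (-), and the
# thickness of the reservoir sediment layer (m) simulated by the reservoir model.

GRAVITY = 9.81             # m/s^2
KIN_VISCOSITY = 1.0e-6     # m^2/s, water at roughly 20 degrees C
CALIBRATION_FACTOR = 10.0  # factor found by checking against ground water level observations

def carman_kozeny_conductivity(d_grain: float, porosity: float) -> float:
    """Hydraulic conductivity K (m/s) from the Carman-Kozeny relation."""
    return (GRAVITY / KIN_VISCOSITY) * (d_grain ** 2 / 180.0) * porosity ** 3 / (1.0 - porosity) ** 2

def leakage_coefficient(d_grain: float, porosity: float, sediment_thickness: float) -> float:
    """Leakage coefficient (1/s) of the reservoir bottom, here taken as the corrected
    conductivity divided by the sediment thickness (an assumption for the example)."""
    k = carman_kozeny_conductivity(d_grain, porosity) / CALIBRATION_FACTOR
    return k / sediment_thickness

if __name__ == "__main__":
    # Hypothetical fine sediment: d = 0.02 mm, porosity 0.45, 0.5 m of deposited material
    print(f"Leakage coefficient: {leakage_coefficient(2.0e-5, 0.45, 0.5):.2e} 1/s")
```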
6. Model Application – Case Study of River Branch System

6.1. HYDROLOGY OF RIVER BRANCH SYSTEM

The hydrology of the river branch system is highly complex, with many processes influencing the water characteristics of importance for flora and fauna (Figure 2). These processes are highly interrelated and dynamic, with large variations in time and space. The complexity of the floodplain, with its river branch system, is indicated in Figures 5 and 9 for the 20 km reach downstream of the reservoir on the Slovakian side, where alluvial forest occurs. Before the damming of the Danube
in 1992 the river branches were connected with the Danube during periods with discharge above average. However, some of the branches were only active during flood situations a few days per year. It was anticipated that after the damming the water level in the Danube would decrease significantly. Therefore, to prevent water from draining from the river branches to the Danube, which would leave the river branches completely dry, the outflows from the branches into the Danube have been blocked, except for the downstream one at chainage 1820 rkm (Figure 5). Now, the river branch system receives water from an inlet structure in the hydropower canal at Dobrohost (Figure 5). This weir has a design capacity of 234 m3 s−1. Together with the various hydraulic structures in the river branches, it controls the hydraulic, hydrological and ecological regime in the river branches and on the flood plains.

Figure 9. Plan and perspective view of the surface topography, of the river branches and the related flood plains as represented in a model network of 100 m grid squares.
6.2. MODELLING APPROACH

Comprehensive field studies and modelling analyses are often carried out in connection with assessing environmental impacts of hydropower schemes. Recent examples from the Danube include the studies of the Austrian schemes Altenwörth (Nachtnebel, 1989) and Freudenau (Perspektiven, 1989). However, as in the Austrian cases, the modelling studies have most often been limited to independent modelling of river systems, groundwater systems or other subsystems, without providing an integrated approach such as the one presented in this paper. The models in this study were applied in a scenario approach simulating the hydrological conditions resulting from alternative possible operations of the entire system of hydraulic structures (alternative water management regimes). Thus, one historical (pre-dam) regime and three hypothetical (post-dam) water regimes corresponding to alternative operation schemes for the structures of the Gabcikovo system were simulated (DHI et al., 1995). Due to the integration of the overall modelling system, each scenario simulation involves a sequence of model calculations, sometimes run in an iterative mode. For the case of river branch modelling, a hierarchical scheme of simulation runs (Figure 10) included the following major steps:
Figure 10. Steps in integrated model for floodplain hydrology.
Step 1. Hydraulic river modelling (MIKE 11)
Model simulation: The MIKE 11 model simulates the river flows and water levels in the entire river system and river branches.
Coupling: The model outputs, in terms of flows into the reservoir at the upstream end and downstream outflows through the reservoir structures, are used as boundary conditions for the reservoir modelling (Step 2). Furthermore, the flow velocities and water levels are used in the river water quality simulations (Step 4a).

Step 2. Reservoir modelling (MIKE 21)
Model simulation: The MIKE 21 reservoir model simulates velocities, sedimentation and eutrophication/water quality in the reservoir.
Coupling: The flow boundary conditions are generated by the river model (Step 1). Results on sedimentation are used to calculate leakage coefficients. Results on oxygen, nitrogen and carbon can be used as boundary conditions for the river water quality and for the water quality of infiltrating water (Step 3a).

Step 3a. Regional ground water flow (MIKE SHE/MIKE 11)
Model simulation: The coupled MIKE SHE/MIKE 11 model simulates the ground water flow and levels including the interaction with the river system and the reservoir.
Coupling: In the reservoir, the infiltration is simulated on the basis of leakage coefficients, which have been calculated from the amount and composition (grain sizes) of the sedimentation on the reservoir bottom (Step 2). This link between reservoir sedimentation and ground water was shown to be crucial for the model results. Furthermore, an iterative link to the DAISY agricultural model exists (Step 3b). Hence, spatially and temporally varying ground water levels from MIKE SHE/MIKE 11 are used as lower boundary conditions in DAISY, which in turn simulates the leaf area index and the root zone depth, which are used as input time series data in MIKE SHE/MIKE 11. The model outputs, in terms of ground water flow velocities, are used as input to the ground water quality simulation. The model results, in terms of river flow velocities and water levels, ground water flow velocities and water levels, are used as time varying boundary conditions for the local flood plain model (Step 4b).
Step 3b. Root zone (DAISY)
Model simulation: The DAISY model simulates the unsaturated zone flows and the vegetation development, including crop yield.
Coupling: DAISY has an iterative link to the MIKE SHE/MIKE 11 model (as described above under Step 3a).

Step 4a. River branches water quality (MIKE 11)
Model simulation: The MIKE 11 model simulates the river water quality (BOD, DO, COD, NO3, etc.).
Coupling: The model uses data from Step 2 and Step 4b and produces output on concentrations of COD and DO, which are used as input to the ecological assessments (Step 5).

Step 4b. Flood plain model (MIKE SHE/MIKE 11)
Model simulation: The coupled MIKE SHE/MIKE 11 model simulates all the flow processes in the flood plain area, including water flows and storages on the ground surface, river flows and water levels, ground water flows and water levels, evapotranspiration, soil moisture content in the unsaturated zone and capillary rise.
Coupling: The model uses data from Step 3a as boundary conditions and provides river flow velocities as the basis for the water quality and sediment simulations (Steps 4a and c). The model provides data on flood frequency and duration, depth of flooding, depth to ground water table, moisture content in the unsaturated zone and flow velocities in river branches, which are key figures in the subsequent ecological assessments (Step 5).

Step 4c. River branches sedimentation (MIKE 11)
Model simulation: The MIKE 11 model simulates the transport of fine sediments through the river branch system. As a result, the sedimentation/erosion and the suspended sediment concentrations are simulated.
Coupling: The model uses sediment concentrations simulated by the reservoir model (Step 2) as input. Furthermore, the flow velocities simulated by the local flood plain model (Step 4b) are used as the basis for the sediment calculations. The results, in terms of grain size of the river bed and concentrations of suspended material, are used as input to the ecological assessments (Step 5).

Step 5. Ecology
A correlation matrix between the physical/chemical parameters provided by the model simulations (Steps 4a, b and c) and the aquatic and terrestrial ecotopes has been established for the project area. Alternative water management regimes can be described in terms of specific operation of certain hydraulic structures and corresponding distribution of water discharges primarily between the Danube, the Gabcikovo hydropower scheme and the river branch
system. The hydrological effects of such alternative operations can be simulated by the integrated model and subsequently, the ecological impacts can be assessed in terms of likely changes of ecotopes.
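As a rough illustration of the hierarchical scheme above, the sketch below chains stand-in functions in the order Step 1 to Step 5 and passes the results of one step to the next. The function names, arguments and returned dictionaries are invented for the illustration and do not correspond to the actual MIKE 11, MIKE 21, MIKE SHE or DAISY interfaces; the stubs only make the control flow explicit.

```python
# Illustrative orchestration of one scenario simulation (Steps 1-5).
# Each function is a placeholder for the corresponding model component; only the
# order of the calls and the hand-over of results follow the description above.

def run_river_hydraulics(scenario):            # Step 1 (MIKE 11)
    return {"reservoir_inflow": ..., "flow_velocities": ..., "water_levels": ...}

def run_reservoir(river):                      # Step 2 (MIKE 21)
    return {"sedimentation": ..., "reservoir_water_quality": ...}

def run_regional_groundwater(reservoir):       # Steps 3a/3b (MIKE SHE/MIKE 11 + DAISY, iterative)
    return {"gw_levels": ..., "gw_velocities": ...}

def run_floodplain(regional):                  # Step 4b (local MIKE SHE/MIKE 11)
    return {"flood_depth": ..., "flood_duration": ..., "branch_velocities": ...}

def run_branch_water_quality(reservoir, floodplain):   # Step 4a (MIKE 11 water quality)
    return {"DO": ..., "COD": ...}

def run_branch_sediment(reservoir, floodplain):        # Step 4c (MIKE 11 sediment)
    return {"bed_grain_size": ..., "suspended_sediment": ...}

def assess_ecology(*model_outputs):            # Step 5 (correlation with ecotopes)
    return {"ecotope_changes": ...}

def simulate_scenario(scenario):
    river = run_river_hydraulics(scenario)
    reservoir = run_reservoir(river)
    regional = run_regional_groundwater(reservoir)
    floodplain = run_floodplain(regional)
    water_quality = run_branch_water_quality(reservoir, floodplain)
    sediment = run_branch_sediment(reservoir, floodplain)
    return assess_ecology(floodplain, water_quality, sediment)
```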
6.3. THE FLOODPLAIN MODEL

The extent of the floodplain model area is indicated in Figure 5 and a perspective view of the area with the river branch system and floodplains is shown in Figure 9. The horizontal discretization of the finite difference model is 100 m, and the ground water zone is represented by two layers. Several hundred cross-sections and more than 50 hydraulic structures in the river branch system were included in the MIKE 11 model for the river system. For the pre-dam model, the surface water boundary conditions comprise a discharge time series at Bratislava and a discharge rating curve at the downstream end (Komarno). For the post-dam model, the Bratislava discharge time series has been divided into three discharge boundary conditions, namely at Dobrohost (intake from hydropower canal to river branch system), at the inlet to the hydropower canal and at the inlet to the Danube from the reservoir. For the groundwater system, time-varying ground water levels simulated with the regional ground water models act as boundary conditions. The Danube river forms an important natural boundary for the area. The Danube is included in the model, located on the model boundary, and symmetric ground water flow is assumed below the river. Hence, a zero-flux boundary condition is used for ground water flow below the river.

To illustrate the complex hydrology and in particular the interaction between the surface and subsurface processes, model results from a model simulation for a period in June–July 1993 are shown in Figures 11 and 12. Figure 11 presents the inlet discharges at the upstream point of the river branch system (Dobrohost), while the discharges and water levels at the confluence between the Danube and the hydropower outlet canal downstream of Gabcikovo during the same period are shown in Figure 12. Figure 11 further shows the soil moisture conditions for the upper two m below terrain and the water depth on the surface at location 2. Similar information is shown for location 1 in Figure 12. A soil water content above 0.40 (40 vol.%) corresponds to saturation. Location 2 is situated in the upstream part of the river branch system, while location 1 is located in the downstream part (see Figure 9). At location 2 (Figure 11) flooding is seen to occur as a result of river spilling (surface inundation occurs before the ground water table rises to the surface) whenever the inlet discharge exceeds approximately 60 m3 s−1. The soil moisture content is seen to react relatively fast to the flooding and the soil column becomes saturated. In contrast, full saturation and inundation do not occur in connection with the flood in the Danube in July, but the event is recognised through increasing ground water levels following the temporal pattern of the Danube flood.
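As an informal illustration of how the boundary conditions listed above might be collected in one place, the snippet below groups them in plain Python dictionaries. The keys, file names and station labels are assumptions made for the example and are not the actual MIKE SHE/MIKE 11 set-up files.

```python
# Illustrative grouping of the floodplain model boundary conditions described above.
# Dictionary keys and file names are invented for the example.

PRE_DAM_BOUNDARIES = {
    "surface_water": {
        "upstream_discharge": {"station": "Bratislava", "type": "time series", "source": "bratislava_q.txt"},
        "downstream": {"station": "Komarno", "type": "rating curve", "source": "komarno_rating.txt"},
    },
    "groundwater": {
        "lateral": "time-varying levels from the regional ground water model",
        "below_danube": "zero flux (symmetric ground water flow assumed below the river)",
    },
}

POST_DAM_BOUNDARIES = {
    "surface_water": {
        # After damming, the Bratislava series is split into three discharge boundaries.
        "dobrohost_intake": {"type": "time series"},
        "hydropower_canal_inlet": {"type": "time series"},
        "danube_inlet_from_reservoir": {"type": "time series"},
        "downstream": {"station": "Komarno", "type": "rating curve"},
    },
    "groundwater": PRE_DAM_BOUNDARIES["groundwater"],
}
```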
At location 1 (Figure 12) the conditions are somewhat different. During the simulation period location 1 never becomes inundated due to high inlet flows at Dobrohost. However, during the July flood in the Danube, inundation at location 1 occurs as a result of an increased ground water table caused by higher water levels in the river branches due to backwater effects from the Danube. The surface elevation at location 1 is 116.4 m, which is 0.4 m below the flood water level shown in Figure 12 at the confluence (5 km downstream of location 1). It is noted that the inundation at this location occurs as a result of ground water table rise and not due to spilling of the river (surface inundation occurs after the ground water table has reached ground surface).
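The distinction made above between inundation caused by river spilling and inundation caused by ground water rise can be expressed as a small check on two simulated time series. The series names, the zero thresholds and the simple first-exceedance logic below are assumptions for illustration only, not part of the original post-processing.

```python
# Classify the inundation mechanism at a location from two simulated series:
# water depth on the surface and depth to the ground water table (one value per time step).
# Sketch only; thresholds and data sources are assumed.

def inundation_mechanism(surface_depth, depth_to_groundwater):
    """Return 'river spilling', 'groundwater rise' or 'no inundation'."""
    first_flooded = next((i for i, d in enumerate(surface_depth) if d > 0.0), None)
    first_gw_at_surface = next((i for i, d in enumerate(depth_to_groundwater) if d <= 0.0), None)
    if first_flooded is None:
        return "no inundation"
    if first_gw_at_surface is None or first_flooded < first_gw_at_surface:
        return "river spilling"      # surface water arrives before the water table reaches the surface
    return "groundwater rise"        # the water table reaches the surface first (as at location 1)

# Hypothetical example: flooding starts at step 3, the water table reaches the surface at step 5
print(inundation_mechanism([0, 0, 0, 0.2, 0.4, 0.4], [1.2, 0.8, 0.5, 0.3, 0.1, 0.0]))
```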
Figure 11. Observed inlet discharge to the river branch system at Dobrohost; simulated moisture contents at the upper two m of the soil profile at location 2 and simulated depths of inundation at location 2 during June–July 1993.
Figure 12. Simulated discharge and water levels in the Danube at the confluence between Danube and the outlet canal from the hydropower plant; simulated moisture contents at the upper two meter of the soil profile at location 1 and simulated depths of inundation at location 1 in the river branch system during June–July 1993.
6.4. EXAMPLE OF MODEL RESULTS

As an example of the results which can be obtained by the floodplain model, Figure 13 shows a characterisation of the area according to flooding and depths to groundwater. The map has been processed on the basis of simulations for 1988 for pre-dam conditions. The classes with different ground water depths and flooding have been determined from ecological considerations according to requirements of (semi)terrestrial (floodplain) ecotopes. From the figure, the contacts between the main Danube river and the river branch system are clearly seen. Similar computations have been made for alternative water management schemes after damming of the Danube. The results of one of the hypothetical post-dam water management regimes, characterized by average water flows in the power canal, Danube and river branch system intake of 1470 m3 s−1, 400 m3 s−1 and 45 m3 s−1, respectively, are shown in Figure 14. By comparing Figure 13 and Figure 14, the differences in hydrological conditions can clearly be seen. For instance, the pre-dam conditions (Figure 13) are in many places characterised by high groundwater tables
and small/seldom flooding, while the post-dam situation (Figure 14) generally has deeper ground water tables and more frequent flooding. From such changes in hydrological conditions, inferences can be made on possible changes in the floodplain ecosystem. Further scenarios (not shown here) have, among other things, investigated the effects of establishing underwater weirs in the Danube and thereby improving the connectivity between the Danube and the river branch system.
Figure 13. Hydrological regime in the river branch area for 1988 pre-dam conditions characterized in ecological classes.
Figure 14. Hydrological regime in the river branch area for a post-dam water management regime characterized in ecological classes. The scenario has been simulated using 1988 observed upstream discharge data and a given hypothetical operation of the hydraulic structures.
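The maps in Figures 13 and 14 rest on a classification of each grid square by its simulated flooding and depth to ground water. A toy version of such a classification is sketched below; the thresholds and class labels are invented for the illustration and do not reproduce the ecological classes actually used.

```python
# Toy classification of model grid squares into hydrological classes, based on the
# simulated mean depth to ground water (m) and the number of flood days per year.
# Thresholds and class labels are invented for illustration.

def classify_cell(depth_to_gw_m: float, flood_days_per_year: int) -> str:
    if flood_days_per_year > 30:
        return "frequently flooded"
    if flood_days_per_year > 0:
        return "occasionally flooded"
    if depth_to_gw_m < 1.0:
        return "shallow ground water, not flooded"
    if depth_to_gw_m < 3.0:
        return "moderate ground water depth"
    return "deep ground water"

# Hypothetical values for the same cell under two water management regimes
print(classify_cell(0.8, 5))   # -> occasionally flooded
print(classify_cell(3.5, 0))   # -> deep ground water
```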
7. Limitations in the Couplings made in the Integrated Model

The integrated modelling system and the way it was applied include different degrees of integration, ranging from sequential runs, where results from one model are used as input to the next model, to a full integration, such as the coupling between MIKE SHE and MIKE 11. Hence, the system is not truly integrated in all respects. The justification for these different levels lies in assessments of where it was required in the present project area to account for feedback mechanisms and where such feedbacks could be considered to be of minor importance for all practical purposes. For other areas with different hydrological characteristics, the required levels of integration are not necessarily the same. Therefore, a discussion is given below on the universality and limitations of the various couplings made in the present case.

A. Hydrological catchment/river hydraulics (MIKE SHE/MIKE 11)
This coupling between the hydrological code and the river hydraulic code is fully dynamic and fully integrated, with feedback mechanisms between the two codes within the same computational time step. This coupling cannot be treated sequentially in this area, since the feedback between river and aquifer works in both directions, with the river functioning as a source in part of the area and as a drain in other parts, and since the direction of the stream-aquifer interaction changes dynamically in time and space as a consequence of discharge fluctuations in the Danube. This coupling was shown to be crucial during the course of the project, and, due to the full integration, it is fully generic.

B. Reservoir/river (MIKE 21/MIKE 11)
This coupling is a simple one-way coupling with the reservoir model providing input data to the downstream river model, both in terms of sediment and water quality parameters. This coupling is sufficient in the present case, because there is no feedback from the downstream river to the reservoir. Even though this coupling is not fully generic, it may be sufficient in most cases, even in cases with a network of reservoirs and connecting river reaches.
C. Reservoir/groundwater water exchange (MIKE 21/MIKE SHE)
This coupling is a simple one-way coupling with the reservoir model providing data on sedimentation to the groundwater module of MIKE SHE, where they are used to calculate leakage coefficients in the surface water/ground water flow calculations. This coupling is sufficient in the present case, where the reservoir water table is always higher than the ground water table, and where the flow is always from the reservoir to the aquifer. However, for cases where water flows in both directions, or where there are significant temporal variations in the sedimentation, the present coupling is not necessarily sufficient.

D. Hydrology catchment/crop growth (MIKE SHE/DAISY)
This coupling is an iterative coupling with data flowing in both directions. However, it is not a full integration with the two model codes running simultaneously. Therefore, a number of iterations are required until the input data used in MIKE SHE (vegetation data simulated by DAISY) generate the input data used in DAISY (ground water levels) and vice versa. For example, changes in river water levels affect the ground water levels, implying that the crop growth conditions change and hence, the DAISY simulated vegetation data used by MIKE SHE to simulate the ground water levels are not correct. In such a case, the MIKE SHE simulation has to be repeated with the new crop growth data and subsequently, the DAISY simulation has to be repeated with the new ground water levels, etc., until the differences become negligible. This coupling has been used successfully in previous studies (Styczen and Storm, 1993), but may, due to the iterative mode, be troublesome in practice; a schematic sketch of such an iteration loop is given below, after item E.

E. Surface water/ground water quality (MIKE 11 – MIKE 21/MIKE SHE)
In contrast to the full coupling of flows (coupling A), the corresponding water quality coupling is a simple one-way coupling, with the river and reservoir models providing the water quality parameters of the infiltrating water, which are used as boundary conditions for the ground water quality simulations. This coupling is sufficient in the present case with respect to the reservoir, where the flow is always from the reservoir to the aquifer. The river-aquifer interaction involves flows in both directions, but the return flow from the aquifer to the Danube is very small (about 1%) as compared to the Danube flow, and hence, the feedback from the ground water quality to the Danube water quality is assumed negligible. However, for other cases where the mass flux from the aquifer to the river system is important for the river water quality, the present one-way coupling will not be sufficient.
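The iterative coupling described under D amounts to a fixed-point loop: run one model with the other model's latest output, pass the result back, and stop when the exchanged variables no longer change appreciably. The sketch below replaces MIKE SHE and DAISY with deliberately simple linear stand-ins (invented coefficients and tolerance) so the convergence logic can be run as written; it is not the actual coupling implementation.

```python
# Fixed-point iteration between two stand-in models, mimicking the iterative
# MIKE SHE <-> DAISY coupling described under D. The linear relations and the
# tolerance are invented for the illustration.

def groundwater_model(vegetation_index: float) -> float:
    """Stand-in for MIKE SHE: returns a ground water level (m) given vegetation data."""
    return 110.0 - 0.5 * vegetation_index

def crop_model(gw_level: float) -> float:
    """Stand-in for DAISY: returns a leaf area index given the ground water level."""
    return 0.05 * (115.0 - gw_level) + 2.0

def iterate_coupling(tol: float = 1e-4, max_iter: int = 50):
    lai = 2.0  # initial guess for the vegetation data
    for iteration in range(1, max_iter + 1):
        gw = groundwater_model(lai)
        new_lai = crop_model(gw)
        if abs(new_lai - lai) < tol:   # differences have become negligible
            return gw, new_lai, iteration
        lai = new_lai
    raise RuntimeError("coupling did not converge")

print(iterate_coupling())
```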
8. Discussion and Conclusions
The hydrological and ecological system of the Danubian Lowland is so complex with so many interactions between the surface and the subsurface water regimes and between physical, chemical and biological changes, that an integrated numerical modelling system of the distributed physically-based type is required in order to provide quantitative assessments of environmental impacts on the ground water, the surface water and the floodplain ecosystem of alternative management options for the Gabcikovo hydropower scheme. Such an integrated modelling system has been developed, and an integrated model has been constructed, calibrated and, to the extent possible, validated for the 3000 km2 area. The individual components of the modelling system represent state-of-the-art techniques within their respective disciplines. The uniqueness is the full integration. The integrated system enables a quite detailed level of modelling, including quantitative predictions of the surface and ground water regimes in the floodplain area, ground water levels and dynamics, ground water quality, crop yield and nitrogen leaching from agricultural land, sedimentation and erosion in rivers and reservoirs, surface water quality as well as frequency, magnitude and duration of inundations in floodplain areas. The computations were carried out on Hewlett Packard Apollo 9000/735 UNIX workstations with 132 MB RAM. With a 300 MHz Pentium II NT computer a typical computational times for one of the steps described in Section 6.2 (Figure 10) would be 2–10 hr. Thus, although the integrated system is rather computationally demanding, the computational requirements are not a serious constraint in practise as compared to the demand for comprehensive field data. For most of the individual model components, traditional split-sample validation tests have been carried out, thus documenting the predictive capabilities of these models. However, this was not possible for some aspects of the integrated model. Hence, according to rigorous scientific modelling protocols, the integrated model can be argued to have a rather limited predictive capability associated with large uncertainties. A theoretical analysis of error propagation in such an integrated model would be quite interesting, but was outside the scope of the present study which was limited to the comprehensive task of developing the integrated modelling system and establishing the integrated model on the basis of all available data. However, on the basis of the few possible tests (e.g. Figure 7) of the integrated model against independent data not used in the calibration-validation process for the individual models, it is our opinion that the uncertainties of the integrated model are significantly smaller than those of the individual models. The two key reasons for this are: (1) in the integrated model the internal boundaries are simulated by neighbouring model components and not just assessed through qualified but subjective estimates by the modeller; and (2) the integrated model makes it possible to explicitly include more sources of data in validation tests that can not all be utilised in the individual models. Thus, by adding independent validation tests for
the integrated model, such as the one shown in Figure 7 on discharges in seepage canals, to the validation tests for the individual models, the outputs of the integrated model have been subject to a more comprehensive test based on more data and hence must be considered less uncertain than outputs from the individual models. The environmental impacts of the new reservoir and the diversion of water from the Danube through the Gabcikovo power plant can be simulated in rather fine detail by the integrated model established for the area. The integrated nature of the model has been illustrated by a case study focusing on hydrology and ecology in the wetland comprising the river branch system. The integrated model is not claimed to be capable of predicting detailed ecological changes at the species level. However, it is believed to be capable of simulating changes in the hydrological regime resulting from alternative water management decisions to such a degree of detail that it becomes a valuable tool for broader assessments of possible ecological changes in the area.

Acknowledgements

The present paper is based on results from the project ‘Danubian Lowland – Ground Water Model’ supported by the European Commission under the PHARE program. The project was executed by the Slovak Ministry of the Environment. The work was carried out by an international group of research and consulting organisations, as reflected by the team of authors. The constructive criticisms of two anonymous reviewers are acknowledged.

References

Bathurst, J. C., Wicks, J. M. and O’Connell, P. E.: 1995, The SHE/SHESED basin scale water flow and sediment transport modelling system, In V. P. Singh (ed.), Computer Models of Watershed Hydrology, Water Resources Publications, pp. 563–594. Calver, A. and Wood, W. L.: 1995, The Institute of Hydrology distributed model, In V. P. Singh (ed.), Computer Models of Watershed Hydrology, Water Resources Publications, pp. 595–626. CEC: 1991, Commission of European Communities, Czech and Slovak Federative Republic, Danubian Lowland – Ground Water Model, No. PHARE/90/062/030/001/EC/WAT/1. DHI: 1995, MIKE 21 Short Description. Danish Hydraulic Institute, Hørsholm, Denmark. DHI, DHV, TNO, VKI, Krüger and KVL: 1995, PHARE project Danubian Lowland – Ground Water Model (EC/WAT/1), Final Report. Prepared by a consultant group for the Ministry of the Environment, Slovak Republic and for the Commission of the European Communities, Vol. 1, 65 pp.; Vol. 2, 439 pp.; Vol. 3, 297 pp., Bratislava. EC: 1992, Working group of independent experts on variant C of the Gabcikovo-Nagymaros project, Working Group Report, Commission of the European Communities, Czech and Slovak Federative Republic, Republic of Hungary, Budapest, 23 November, 1992. EC: 1993a, Working group of monitoring and water management experts for the Gabcikovo system of locks – Data Report, Commission of the European Communities, Republic of Hungary, Slovak Republic, Budapest, 2 November, 1993.
EC: 1993b, Working group of monitoring and water management experts for the Gabcikovo system of locks – Report on temporary water management regime, Commission of the European Communities, Republic of Hungary, Slovak Republic, Bratislava, 1 December, 1993. Engesgaard, P.: 1996, Multi-Species Reactive Transport, In M. B. Abbott and J. C. Refsgaard (eds), Distributed Hydrological Modelling, Kluwer Academic Publishers, pp. 71–91. Griffioen, J., Engesgaard, P., Brun, A., Rodak, R., Mucha, I. and Refsgaard, J. C.: 1995, Nitrate and Mn-chemistry in the alluvial Danubian Lowland aquifer, Slovakia. Ground Water Quality: Remediation and Protection (GQ95), Proceedings of the Prague Conference, May 1995, IAHS Publ. No. 225, pp. 87–96. Hansen, S., Jensen, H. E., Nielsen, N. E. and Svendsen, H.: 1991, Simulation of nitrogen dynamics and biomass production in winter wheat using the Danish simulation model DAISY. Fertilizer Research 27, 245–259. Havnø, K., Madsen, M. N. and Dørge, J.: 1995, ‘MIKE 11 – A Generalized River Modelling Package’, In V. P. Singh (ed), Computer Models of Watershed Hydrology, Water Resources Publications, pp. 733–782. Holobrada, M., Capekova, Z., Lukac, M. and Misik, M.: 1994, Prognoses of the Hrusov reservoir eutrophication and siltation under various discharge distribution to the Old Danube (in Slovak), Water Research Institute (VUVH), Bratislava. ICJ: 1997, Case Concerning Gabcikovo-Nagymaros project (Hungary/Slovakia). Summary of the Judgement of 25 September 1997. International Court of Justice, The Hague, (available on www.icj-cij.org). JAR: 1995, 1996, 1997, Joint Annual Report of the environment monitoring in 1995, 1996, 1996 according to the ‘Agreement between the Government of the Slovak Republic and the Government of Hungary about Certain Temporary Measures and Discharges to the Danube and Mosoni Danube’, signed 19 April, 1995. Klemes, V.: 1986, Operational testing of hydrological simulation models, Hydrological Sciences Journal, 13–24. Klucovska, J. and Topolska, J.: 1995, Water regime in the Danube river and its river branches, In I. Mucha (ed.), Gabcikovo Part of the Hydroelectric Power Project. Environmental Impact Review, Faculty of Natural Sciences, Comenius University, Bratislava, pp. 33–42. Kocinger, D.: 1995, Gabcikovo Part of the Hydroelectric Power Project, Basic Characteristics, In I. Mucha (ed.), Gabcikovo Part of the Hydroelectric Power Project – Environmental Impact Review, Faculty of Natural Sciences, Comenius University, Bratislava, pp. 5–14. Koncsos, L., Schütz, E. and Windau, U.: 1995, Application of a comprehensive decision support system for the water quality management of the river Ruhr, Germany, In S. P. Simonovic, Z. Kunzewicz, D. Rosbjerg and K. Takeuchi (eds), Modelling and Management of Sustainable Basin-Scale Water Resources Systems, IAHS Publ. No. 231, pp. 49–59. Menetti, M.: 1995, Analysis of regional water resources and their management by means of numerical simulation models and satellites in Mendoza, Argentina, In S. P. Simonovic, Z. Kunzewicz, D. Rosbjerg and K. Takeuchi (eds), Modelling and Management of Sustainable Basin-Scale Water Resources Systems, IAHS Publ. No. 231, pp. 49–59. Mucha, I.: 1992, Database processing of the hydropedological parameters for the ground water flow model of the Danubian Lowland (in Slovak), Ground Water Division, Faculty of Natural Science, Comenius University, Bratislava. Mucha, I., Paulikova, E., Hlavaty, Z., Rodak, D. 
and Pokorna, L.: 1992a, Danubian Lowland Ground Water Model, Working Manual to consortium of invited specialists for workshop in Bratislava, Ground Water Division, Faculty of Natural Sciences, Comenius University, Bratislava. Mucha, I., Paulikova, E., Hlavaty, Z. and Rodak, D.: 1992b, Elaboration of basis data for preparation of hydrogeological parameters for the model of the ground water flow of the Danubian Lowland area (in Slovak), Ground Water Division, Faculty of Natural Science, Comenius University, Bratislava.
Mucha, I., Paulikova, E., Hlavaty, Z., Rodak, D. and Pokorna, L.: 1993, Surface and ground water regime in the Slovak part of the Danube alluvium, Ground Water Division, Faculty of Natural Science, Comenius University. Mucha, I. (ed): 1995, Gabcikovo part of the hydroelectric power project environmental impact review. Evaluation based on two years monitoring, Faculty of Natural Sciences, Comenius University, Bratislava. Mucha, I., Rodak, D., Hlavaty, Z. and Bansky, L.: 1997, Environmental aspects of the design and construction of the Gabcikovo Hydroelectric Power Project on the river Danube, Proceedings International Symposium on Engineering Geology and the Environment, organized by the Greek National Group of IAEG, Athens, June 1997, Engineering Geology and the Environment, pp. 2809–2817. Nachtnebel, H.-P. (ed): 1989, Ökosystemstudie Donaustau Altenwörth, Veränderungen durch das Donaukraftwerk Altenwörth, Österreische Akademie der Wissenschaften, Veröffentlichungen des Österreischen MaB-Programs, Band 14, Universitätsverlag Wagner, Innsbruck. Person, M., Raffensperger, J. P., Ge, S. and Garven, G.: 1996, Basin-scale hydrogeologic modelling, Rev. Geophys. 34(1), 61–87. Perspektiven: 1989, Staustufe Freudenau, Perspektiven, Magazin für Stadtgestaltung und Lebensqualität, Dezember 1989. Refsgaard, J. C.: 1997, Parameterisation, calibration and validation of distributed hydrological models, J. Hydrology 198, 69–97. Refsgaard, J. C. and Storm, B.: 1995, MIKE SHE, In V. P. Singh (ed), Computer Models of Watershed Hydrology, Water Resources Publications, pp. 809–846. Singh, V. P. (ed): 1995, Computer Models of Watershed Hydrology, Water Resources Publications. Sørensen, H. R., Klucovska, J., Topolska, T., Clausen, T. and Refsgaard, J. C.: 1996, An engineering case study – Modelling the influences of the Gabcikovo hydropower plant in the hydrology and ecology in the Slovak part of the river branch system, In M. B. Abbott and J. C. Refsgaard (eds), Distributed Hydrological Modelling, Kluwer Academic Publishers, pp. 233–253. Styczen, M. and Storm, B.: 1993, Modelling of N-movements on catchment scale – a tool for analysis and decision making. 1. Model description. 2. A case study, Fertilizer Research 36, 1–17. Topolska, J. and Klucovska, J.: 1995, River morphology, In I. Mucha (ed.), Gabcikovo Part of the Hydroelectric Power Project. Environmental Impact Review, Faculty of Natural Sciences, Comenius University, Bratislava, pp. 23–32. VKI: 1995, Short Description of water quality and eutrophication modules,. Water Quality Institute, Hørsholm, Denmark. Winter, T. C.: 1995, Recent advances in understanding the interaction of groundwater and surface water, Rev. Geophys., Supplement, U.S. National Report 1991–94 to IUGG, pp. 985–994. Yan, J. and Smith, K. R.: 1994, Simulation of integrated surface water and ground water systems – model formulation, Water Resources Bulletin 30(5), 879–890.
[10]
Refsgaard JC, Thorsen M, Jensen JB, Kleeschulte S, Hansen S (1999) Large scale modelling of groundwater contamination from nitrogen leaching. Journal of Hydrology, 221(3-4), 117-140.
Reprinted from Journal of Hydrology with permission from Elsevier
Journal of Hydrology 221 (1999) 117–140
Large scale modelling of groundwater contamination from nitrate leaching

J.C. Refsgaard a,*, M. Thorsen a, J.B. Jensen a, S. Kleeschulte b, S. Hansen c

a Danish Hydraulic Institute, Hørsholm, Denmark
b GIM, Luxembourg
c Royal Veterinary and Agricultural University, Copenhagen, Denmark

Received 20 July 1998; received in revised form 3 May 1999; accepted 31 May 1999

* Corresponding author. E-mail address: [email protected] (J.C. Refsgaard)

0022-1694/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0022-1694(99)00081-5
Abstract Groundwater pollution from non-point sources, such as nitrate from agricultural activities, is a problem of increasing concern. Comprehensive modelling tools of the physically based type are well proven for small-scale applications with good data availability, such as plots or small experimental catchments. The two key problems related to large-scale simulation are data availability at the large scale and model upscaling/aggregation to represent conditions at larger scale. This paper presents a methodology and two case studies for large-scale simulation of aquifer contamination due to nitrate leaching. Readily available data from standard European level databases such as GISCO, EUROSTAT and the European Environment Agency (EEA) have been used as the basis of modelling. These data were supplemented by selected readily available data from national sources. The model parameters were all assessed from these data by use of various transfer functions, and no model calibration was carried out. The adopted upscaling procedure combines upscaling from point to field scale using effective parameters with a statistically based aggregation procedure from field to catchment scale, preserving the areal distribution of soil types, vegetation types and agricultural practices on a catchment basis. The methodology was tested on two Danish catchments with good simulation results on water balance and nitrate concentration distributions in groundwater. The upscaling/aggregation procedure appears to be applicable in many areas with regard to root zone processes such as runoff generation and nitrate leaching, while it has important limitations with regard to hydrograph shape due to its lack of accounting for scale effects in relation to stream aquifer interaction. q 1999 Elsevier Science B.V. All rights reserved. Keywords: Upscaling; Databases; Non-point pollution; Nitrate leaching; Distributed model; Water balance
1. Introduction

Groundwater is a significant source of freshwater used by industry, agriculture and domestic users. However, increasing demand for water, increasing use of pesticides and fertilisers as well as atmospheric deposition constitute a threat to the quality of groundwater. The use of fertilisers and manure leads to the leaching of nitrates into the groundwater, and atmospheric deposition contributes to the acidification of soils, which may have an indirect effect on the contamination of water. In Europe, for instance, the present situation is summarised in EEA (1995), where it is assessed that the major part of the aquifers in Northern and Central Europe are subject to risk of nitrate contamination, among other things due to agricultural activities. Therefore, policy makers and legislators in the EU are concerned about the issue and a number of preventive
legislation steps are being taken in these years (EU Council of Ministers, 1991; EC, 1996). In the scientific community, concerns on groundwater contamination have motivated the development of numerous simulation models for groundwater quality management. Groundwater models describing the flow and transport mechanisms of aquifers have been developed since the 1970s and applied in numerous pollution studies. They have mainly described the advection and dispersion of conservative solutes. More recently, geochemical and biochemical reactions have been included to simulate the transport and fate of pollutants from point sources as industrial and municipal waste disposal sites, see e.g. Mangold and Tsang (1991); Engesgaard et al. (1996) for overviews. Fewer attempts have been made to simulate non-point pollution at catchment scale resulting from agricultural activities, see e.g. Thorsen et al. (1996); Person et al. (1996) for overviews. The approaches range from relatively simple models with semi-empirical process descriptions of the lumped conceptual type such as ANSWERS (Beasley et al., 1980), CREAMS (Knisel, 1980; Knisel and Williams, 1995), GLEAMS (Leonard et al., 1987), SWRRB (Arnold and Wiliams, 1990; Arnold et al., 1995) and AGNPS (Young et al., 1995) to more complex models with a physically based process description. The physically based models are most commonly one-dimensional leaching models, such as RZWQM (DeCoursey et al., 1989, 1992), Daisy (Hansen et al., 1991) and WAVE (Vereecken et al., 1991; Vanclooster et al., 1994, 1995), which basically describe root zone processes only, while true, spatially distributed, catchment models based on comprehensive process descriptions, such as the coupled MIKE SHE/Daisy (Styczen and Storm, 1993), are seldom reported. The simple conceptual models are attractive because they require relatively less data, which are usually easily accessible, while the predictive capability of these models with regard to assessing the impacts of alternative agricultural practises is questionable due to the semi-empirical nature of the process descriptions. On the contrary, a key problem in using the more complex catchment models operationally lies in the generally large data requirements prescribed by the developers of such model codes. However, due to the better process descriptions these models may for some types of
application be expected to have better predictive capabilities than the simpler models (Heng and Nikolaidis, 1998). Input data for the complex catchment models have traditionally been available in practise only for small areas such as experimental research catchments. However, as more and more data have been gathered in computerised databases and, in particular, in Geographical Information Systems (GIS), the data availability has improved significantly. Further, experience from case studies indicates that a considerable part of the input data may be derived from statistical data and more general databases (Styczen and Storm, 1995). The database of EUROSTAT, the statistical office of the European Commission, holds statistical information about different topics from all Member States of the European Union. Agricultural statistics provide information on main crops, on the structure of agricultural holdings and crop and on animal production. Environment statistics provide figures on impacts of other sector’s work on the environment, such as fertiliser and pesticide input, groundwater withdrawal, water quality or manure production on animal farms. These figures are mostly aggregated and published on national level. In order to use these statistics in a spatially distributed simulation model, the information needs to be spatially referenced to represent a unit on the ground. Therefore the statistical information needs to be linked to a GIS data set. Such GIS data is stored in the GISCO (Geographic Information System of the European Commission) database. The GISCO database holds spatial data about administrative boundaries down to commune level, thematic data sets such as the soil database, CORINE land cover (managed by the EEA) or climatic time series for about 2000 measuring stations in the European Union. Thus on one hand, there is a clearly expressed need from decision makers at national and international level to have tools, which on the basis of readily available data can predict the risks of groundwater pollution from non-point sources and the impacts of alternative agricultural management practices; and on the other hand, the scientific community has achieved new knowledge and developed new tools aiming at this. However, there are some important gaps to be filled before the scientifically based tools
Fig. 1. Schematic structure of the MIKE SHE.
can be applied operationally for supporting the decision makers: • The physically based models are very promising tools for assessing the impacts of alternative agricultural practises, but have so far been tested on plot scale and very small experimental catchments, whereas the need from a policy making point of view mainly relates to application on a much larger scale. Hence, there is a need to derive and test methodologies for upscaling of such models to run with model grid sizes one to two order of magnitudes larger than usually done. • Readily available data on large (national and international) scales do exist, although in a somewhat aggregated form. However, such data have not yet been used as the basis for comprehensive modelling, which so far always have been based on more detailed data, often from experimental catchments. Hence, there is a need to test to which extent these readily available data are suitable for modelling. • There is a need to assess the predictive
uncertainties, before it can be evaluated whether the approach of combining complex predictive models with existing data bases is of any practical use in the decision making process or whether the uncertainties are too large. This paper presents results from a joint EU research project on prediction of non-point nitrate contamination at catchment scale due to agricultural activities. Other results from the same study focussing on uncertainty aspects are presented in UNCERSDSS (1998), Refsgaard et al. (1998a, 1999) and Hansen et al. (1999). 2. Methodology 2.1. Materials and methods 2.1.1. MIKE SHE MIKE SHE is a modelling system describing the flow of water and solutes in a catchment in a distributed physically based way. This implies numerical
solutions of the coupled partial differential equations for overland (2D) and channel flow (1D), unsaturated flow (1D) and saturated flow (3D) together with a description of evapotranspiration and snowmelt processes. The model structure is illustrated in Fig. 1. For further details reference is made to the literature (Abbott et al., 1986; Refsgaard and Storm, 1995). 2.1.2. Daisy Daisy (Hansen et al., 1991) is a one-dimensional physically based modelling tool for the simulation of crop production and water and nitrogen balance in the root zone. Daisy includes modules for description of evapotranspiration, soil water dynamics based on Richards’ equation, water uptake by plants, soil temperature, soil mineral nitrogen dynamics based on the advection–dispersion equation, nitrate uptake by plants and nitrogen transformations in the soil. The nitrogen transformations simulated by Daisy are mineralization–immobilization turnover, nitrification and denitrification. In addition, Daisy includes a module for description of agricultural management practices. Details on the Daisy application in the present study are given by Hansen et al. (1999). 2.1.3. MIKE SHE/Daisy coupling By combining MIKE SHE and Daisy, a complete modelling system is available for the simulation of water and nitrate transport in an entire catchment. In the present case the coupling is a sequential one. Thus for all agricultural areas, Daisy first produces calculations of water and nitrogen behaviour from the soil surface and through the root zone. The percolation of water and nitrate at the bottom of the root zone simulated by Daisy, is then used as input to MIKE SHE calculations for the remaining part of the catchment. For natural areas, MIKE SHE calculates also the root zone processes assuming no nitrate contribution from these areas. Owing to the sequential execution of the two codes, it has to be assumed that there is no feed back from the groundwater zone (MIKE SHE) to the root zone (Daisy). Further, overland flow generated by high intensity rainfall (Hortonian) cannot be simulated by this coupling, while overland flow due to saturation from below (Dunne) can be accounted for by MIKE SHE. Thus, MIKE SHE does not in the present case handle evapotranspiration and other root zone
processes in the agricultural areas. As Daisy is onedimensional, one Daisy run in principle should be carried out for each of MIKE SHE’s horizontal grids. However, several MIKE SHE grids are assumed to have identical root zone properties (soil, crop, agricultural management practices, etc.), so that in practise the outputs from each Daisy run can be used as input to several MIKE SHE grids. 2.2. Data availability at European databases Input data for modelling at the European scale need to satisfy certain requirements to make them useful for large-scale applications: • The data must be available for the whole of Europe. • The data must be harmonised according to a common nomenclature in order to avoid regional or national inconsistencies. • The data should be available in a seamless database. • The data should be available from one single source to avoid regional or national inconsistencies. • The data should be available in a format which can be directly integrated into a Geographical Information System (GIS). Attached to the use of “European” data sets are also certain problems. The data are generalised in geometric as well as in thematic detail, local particularities which are especially important for hydrological simulations are not always accounted for. Often information that is required for specific modelling objectives is not directly available on European level demanding the establishment and use of transfer functions instead. On the contrary, information is sometimes too specific when it has been collected in the framework of a particular research project, e.g. information on a particular soil property is being collected in natural soils but not in agricultural soils. Given these formal requirements, a first task of the project was to study the availability of data sets suited for large-scale hydrological modelling of groundwater contamination from diffuse sources. After intensive searches of on-line data catalogues, paper publications and direct contacts with organisations holding relevant information, it was possible to
identify sources for all the information requirements. However, after evaluation of all the potential sources the following deficiencies became apparent:

• Not all information was available in spatially referenced GIS format; therefore other sources such as tables and statistics had to be considered.
• Not all information was available from “European” databases; finally, national sources had to be considered. For these national sources strict requirements in terms of ease of availability, data quality and data comparability were imposed.
• The scale of the available data was often too coarse for the application. Global data sets with 1 × 1° longitude/latitude resolution are often not detailed enough.

The potential “European scale” data sources and the data sources which were ultimately used for the model are shown in Table 1.

Table 1. Data sources for European scale hydrological modelling

Data | Potential data source identified in European data base | Source actually used for modelling | Scale of available data used
Topography | USGS a/GISCO | USGS/GISCO | 1 km grid
Soil type | GISCO soil map | GISCO soil map | 1 km grid
Soil organic matter | RIVM b report | Experience value for Danish arable soils c | Denmark
Vegetation | EEA: CORINE land cover | EEA: CORINE land cover | 1 km grid
River network and river cross sections | DCW d | Provided by an application developed within the project | 1 km grid
Geology | Report on groundwater resources in Denmark (EC, 1982); RIVM—digital map data of report | Report on groundwater resources in Denmark (EC, 1982) | County, i.e. approximately 3,000 km2
Groundwater abstraction | Report on groundwater resources in Denmark (EC, 1982); RIVM—digital map data of report | Report on groundwater resources in Denmark (EC, 1982) | Commune, i.e. approximately 200 km2
Management practices | SC-DLO e report | Plantedirektoratet (1996) | Denmark
Crop type | Eurostat—Regional Statistics | Agricultural Statistics (1995) | County, i.e. approximately 3000 km2
Livestock density | Eurostat—Regional Statistics | Agricultural Statistics (1995) | County, i.e. approximately 3000 km2
Fertilizer consumption | Eurostat—Eurofarm | Agricultural Statistics (1995) | County, i.e. approximately 3000 km2
Manure production | Eurostat—Environmental Statistics | Agricultural Statistics (1995) | County, i.e. approximately 3000 km2
Atmospheric deposition | Eurostat—Environmental Statistics; MARS project | National data | Denmark
Climatic variables | MARS project f | National data | Denmark
River runoff | GRDC g | National data | Catchment

a USGS—United States Geological Survey.
b RIVM—National Institute of Public Health and the Environment of The Netherlands.
c RIVM data only include natural areas, not arable land. Instead the figure was assessed on the basis of previous experience with Danish agricultural soils.
d DCW—Digital Chart of the World.
e SC-DLO—Winand Staring Centre, The Netherlands.
f MARS—Monitoring Agriculture by Remote Sensing database.
g GRDC—Global Runoff Data Centre, database mainly for large river basins.

Data about climatic variables were obtained from
the national meteorological institutes and river runoff from the national hydrological institutes. These data were only available from national sources, but on the contrary these data are probably the most easily available (if the issue of price charges is disregarded) and the most easily comparable due to international harmonised measuring techniques at these organisations. Regional statistics on Denmark obtained from EUROSTAT proved to be not detailed enough (country level only). The required statistical information could easily be recovered from Danish national statistics. Cost estimates for the compilation of the database have only been undertaken to a limited extent. The project data itself have mostly been obtained in exchange for the anticipated project results, i.e. at no cost. The main data that in a fully commercial environment cost a substantial amount of money are meteorological data which are available from the national meteorological institutes (Kleeschulte, 1998). 2.3. Change of scale Large scale hydrological models are required for a variety of applications in hydrological, environmental and land surface-atmosphere studies, both for research and for day to day water resources management purposes. The physically based models have so far mainly been tested and applied at small scale and therefore require upscaling. The complex interactions between spatial scale and spatial variability is widely perceived as a substantial obstacle to progress in this respect (Blo¨schl and Sivapalan, 1995; and many others). The research results on the scaling issue reported during the past decade have, depending on the particular applications, focussed on different aspects, which may be categorised as follows: • Subsurface processes focussing on the effect of geological heterogeneity. • Root zone processes including interactions between land surface and atmospheric processes. • Surface water processes focussing on topographic effects and stream–aquifer interactions. The effect of spatial heterogeneity on the description of subsurface processes has been the subject of
comprehensive research for two decades, see e.g. Dagan (1986) and Gelhar (1986) for some of the first consolidated results and Wen and Go´mezHerna´ndez (1996) for a more recent review, mainly related to aquifer systems. The focus in this area is largely concerned with upscaling of hydraulic conductivity and its implications on solute transport and dispersion processes in the unsaturated zone and aquifer system, typically at length scales less than 1 km. The research in the land surface processes has mainly been driven by climate change research where the meteorologists typically focus on length scales up to 100 km. Michaud and Shuttelworth (1997), in a recent overview, conclude that substantial progress has been made for the description of surface energy fluxes by using simple aggregation rules. Sellers et al. (1997) conclude that “it appears that simple averages of topographic slope and vegetation parameters can be used to calculate surface energy and heat fluxes over a wide range of spatial scales, from a few meters up to many kilometers at least for grassland and sites with moderate topography”. An interesting finding is the apparent existence of a threshold scale, or representative elementary area (REA) for evapotranspiration and runoff generation processes (Wood et al., 1988, 1990, 1995). Famiglietti and Wood (1995) concludes on the implications of such an REA in a study of catchment evapotranspiration that “the existence of an REA for evapotranspiration modelling suggests that in catchment areas smaller than this threshold scale, actual patterns of model parameters and inputs may be important factors governing catchment-scale evapotranspiration rates in hydrological models. In models applied at scales greater than the REA scale, spatial patterns of dominant process controls can be represented by their statistical distribution functions”. The REA scales reported in the literature are in the order of 1–5 km 2. The research on scale effects related to topography and stream–aquifer interactions has been rather limited as compared to the above two areas. Saulnier et al. (1997) have examined the effect of the grid sizes in digital terrain maps (DTM) on the model simulations using the topography-based TOPMODEL. They concluded that in particular for channel pixels the spatial resolution of the underlying DTM is important. Refsgaard (1997) using the distributed MIKE SHE
Fig. 2. Schematic representation of upscaling/aggregation procedure.
model to the Danish Karup catchment with grid sizes of 0.5, 1, 2 and 4 km, found that the discharge hydrograph shape was significantly affected for the 2 and 4 km grids as compared to the almost identical model results with 0.5 and 1 km grids. He concluded that the main reason for this change was that the density of smaller tributaries within the catchment was smaller for the models with the larger grids. Many researchers doubt whether it is feasible to use the same model process descriptions at different scales. For instance Beven (1995) states that “… the aggregation approach towards macroscale hydrological modelling, in which it is assumed that a model applicable at small scales can be applied at larger scales using ‘effective’ parameter values, is an inadequate approach to the scale problem. It is also unlikely in the future that any general scaling theory can be developed due to the dependence of hydrological systems on historical and geological perturbations”. We have experienced some of the same problems and agree that it is generally not possible to apply the same model without recalibration at small and large scales. Therefore, we have used another
approach based on a combination of aggregation and upscaling in accordance with the principles recommended by Heuvelink and Pebesma (1998). The scale terminology and the upscaling procedure adopted here are as follows (Fig. 2): • The basic modelling system is of the distributed physically based type. For application at point scale (where it is not used spatially distributed) the process descriptions of this model type can be tested directly against field data. • The model is in this case run with (equations and) parameter values in each horizontal grid point representing field scale (50–200 m) conditions. The field scale is characterised by ‘effective’ soil and vegetation parameters, but assuming only one soil type and one cropping pattern. Thus the spatial variability within a typical field is aggregated and accounted for in the ‘effective’ parameter values. • The smallest horizontal discretization in the model is the grid scale or grid size (1–5 km) that is larger than the field scale. This implies that all the variations between categories of soil type and crop type
Fig. 3. Locations of the Karup and Odense catchments in Denmark.
within the area of each grid cannot be resolved and described at the grid level. Such input data, whose variations are not included in the grid scale model representation, are distributed randomly at the catchment scale so that their statistical distributions are preserved at that scale.
• The results from the grid scale modelling are then aggregated to catchment scale (10–50 km) and the statistical properties of model output and field data are then compared at catchment scale.
• For applications to larger scales than catchment scale, such as continental scale, the catchment scale concept is used, just with more grid points. This implies that the continental scale can be considered to consist of several catchments, within each of which the field scale statistical variations are preserved and at which scale the predictive capability of the model thus lies.
In the upscaling procedure a distinction is made
between the terms upscaling and aggregation. Thus, spatial attributes are aggregated and model parameters are scaled up. A principal difference between aggregation and upscaling is that whereas aggregation can be defined irrespective of a model operating on the aggregated values, upscaling must always be defined in the context of a model that uses the parameters that have been scaled up (Heuvelink and Pebesma, 1998). In this respect the main principle of the upscaling procedure can be summarised as follows:
• Upscale model from point scale to field scale.
• Run model at grid scale using field scale parameters in such a way that their statistical properties are preserved at catchment scale.
• Aggregate grid scale model output to catchment scale.
This methodology mainly attempts to address scaling within the second of the above fields, namely root
zone processes, while scaling in relation to subsurface processes and stream–aquifer interaction has not been considered when designing the present upscaling procedure. The methodology has some complications and critical assumptions:
• The assumption of upscaling from point scale to field scale is crucial. This assumption is documented to be fulfilled in many cases (Jensen and Refsgaard, 1991a–c; Djurhuus et al., 1999), but may fail in other cases (Bresler and Dagan, 1983), for instance in areas where overland flow is a dominant flow mechanism.
• Running the model at grid scale but using model parameters valid at a field scale, which is typically 2 to 3 orders of magnitude smaller, is necessary to make the computational demand acceptable for catchment and continental scale applications. The solution adopted is to assign the soil and vegetation type inputs without correct georeferencing, but such that their statistical distribution at catchment scale is preserved. This implies that results at grid scale are dubious and should not be used. The aggregation step up to catchment scale is therefore essential.
• While the statistical properties of the critical root zone parameters have, due to the aggregation step, been preserved at catchment scale, this is not the case for the geological, topographical and stream data, which are used directly at the grid scale. A critical question is therefore how the catchment scale model output, due to these other data, is influenced by the selection of grid scale. Here, investigations with 1, 2 and 4 km grids are made.
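The statistics-preserving assignment of field scale classes to grid cells and the subsequent aggregation of grid scale output to catchment scale can be illustrated with a minimal sketch. The class names, areal fractions and the stub grid model below are hypothetical and are not taken from the actual MIKE SHE/Daisy set-up:

```python
import random
from collections import Counter

def assign_classes(n_grids, class_fractions, seed=1):
    """Randomly assign field scale classes (e.g. crop rotation schemes) to
    grid cells so that their areal fractions - their statistical distribution
    at catchment scale - are preserved, without any correct georeferencing."""
    cells = []
    for cls, frac in class_fractions.items():
        cells += [cls] * round(frac * n_grids)
    cells = (cells + [cells[-1]] * n_grids)[:n_grids]   # guard against rounding
    random.Random(seed).shuffle(cells)
    return cells

def run_grid_model(cls):
    """Stub for a grid scale model run with field scale 'effective' parameters
    of the assigned class; returns e.g. annual nitrate leaching in kg N/ha."""
    leaching = {"cattle rotation": 60.0, "pig rotation": 75.0,
                "arable rotation": 55.0, "natural": 5.0}
    return leaching[cls]

# Hypothetical catchment scale fractions (70% agriculture, 30% natural/urban).
fractions = {"cattle rotation": 0.30, "pig rotation": 0.20,
             "arable rotation": 0.20, "natural": 0.30}
grids = assign_classes(n_grids=500, class_fractions=fractions)
outputs = [run_grid_model(c) for c in grids]

# Aggregation: only catchment scale statistics of the grid output are used.
print({cls: n / len(grids) for cls, n in Counter(grids).items()})
print("catchment mean leaching [kg N/ha]:", sum(outputs) / len(outputs))
```

In this sketch only the catchment scale statistics of the assignment and of the aggregated output are intended to be meaningful; the individual grid values are deliberately not georeferenced.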
3. Application
3.1. Modelling approach for the Karup and Odense catchments
The modelling studies have focussed on two aspects, namely the feasibility of using coarse aggregated data available in European level databases, and the effect of the upscaling procedure. The modelling aims at describing the integrated runoff at the catchment
outlet and the distribution function of the nitrate concentrations sampled from available wells over the catchment (aquifer). On this basis the following approach has been adopted:
1. Simulation models have been established for two catchments in Denmark, Karup Å and Odense Å (Fig. 3), in the following denoted the Karup and Odense models, respectively. The topographical area for the Karup catchment at gauging station 20.05 Hagebro is 518 km². Correspondingly, the catchment area at the gauging station used for the model validation tests in the Odense catchment, 45.26 Ejby Mølle, is 536 km². The most detailed studies were carried out for the Karup catchment, while the results for the Odense catchment were included mainly to check the generality of the conclusions derived from the Karup catchment.
2. The models are established directly from the European level databases and all input parameter values are assessed from these data or in a predefined objective way from experience values obtained from previous model studies. Thus, the models are not calibrated at all.
3. The results of the models are compared with field data, on which basis the model performance is assessed.
4. The effects of upscaling have been examined in two ways:
• The models are run with different grid sizes (1, 2 and 4 km) and the results compared.
• For the Karup catchment two different procedures have been compared, namely: the upscaling/aggregation procedure described above (Fig. 2), which according to its representation of agricultural crops is denoted ‘distributed’; and a simpler procedure where the agricultural crops are upscaled all the way from field scale to catchment scale. This implies that one crop type represents all the agricultural areas. The dominant crop in the area, namely winter wheat, has been selected as the crop for the 70% agricultural area, while the 30% natural/urban areas remain as the only other vegetation type. This procedure is denoted ‘uniform’.
Fig. 4. Surface topography, catchment delineation and river network for the Karup-EU model.
3.2. Karup model
3.2.1. Catchment and river system
The catchment area and locations of the river branches (Fig. 4) were generated from the DEM by use of standard ARC/Info functionalities. The generated catchment areas for the 1, 2 and 4 km grids were within 4% of the correct one at station 20.05 Hagebro. The river cross-sections were subsequently derived automatically on the basis of the following assumptions:
• The bankfull discharge (i.e. water flow up to the top of the cross-section) corresponds to a typical annual maximum discharge. This characteristic discharge is further assumed uniform in terms of specific runoff (l s⁻¹ km⁻²), so that the actual discharge at any cross-section is estimated as the specific
runoff multiplied by the upstream catchment area, which can be estimated from the DEM.
• The river slope corresponds to the slope of the surrounding surface, which can be derived from the DEM.
• The cross-section has a trapezium shape with a fixed given angle and relation between depth and width.
• The relation between discharge, slope and river cross-section can be determined by the Manning formula with a given Manning number.
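A minimal sketch of how such a cross-section derivation could be implemented is given below; the specific runoff, Manning number, side slope and width-to-depth ratio are illustrative assumptions, not the values used in the study:

```python
import math

def trapezoid_discharge(h, manning_m, slope, width_depth_ratio=5.0, side_slope=1.0):
    """Manning discharge Q = M * A * R**(2/3) * sqrt(S) for a trapezoidal
    cross-section whose bottom width is a fixed multiple of the depth."""
    b = width_depth_ratio * h                       # fixed depth-width relation
    area = (b + side_slope * h) * h
    perimeter = b + 2.0 * h * math.sqrt(1.0 + side_slope ** 2)
    radius = area / perimeter
    return manning_m * area * radius ** (2.0 / 3.0) * math.sqrt(slope)

def bankfull_depth(upstream_area_km2, specific_runoff_l_s_km2, slope, manning_m=30.0):
    """Depth at which the section just conveys the bankfull discharge (typical
    annual maximum), estimated as specific runoff times upstream area."""
    q_bankfull = specific_runoff_l_s_km2 * upstream_area_km2 / 1000.0  # m3/s
    lo, hi = 1e-3, 20.0
    for _ in range(60):                              # simple bisection
        mid = 0.5 * (lo + hi)
        if trapezoid_discharge(mid, manning_m, slope) < q_bankfull:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative values only: 200 km2 upstream area, 15 l/s/km2 specific runoff,
# slope 0.001 taken from the DEM.
print(round(bankfull_depth(200.0, 15.0, 0.001), 2), "m")
```

Under these assumptions the depth, and thereby the full trapezoidal geometry, follows directly from the upstream area and the DEM-derived slope.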
Most areas in Denmark are drained in order to make the land suitable for agriculture. Agricultural areas are typically artificially drained with tile drains in combination with small ditches. Other areas may be naturally drained by creeks and rivers. It is not possible to include a detailed and fully correct drainage description in a coarse model like the Karup model. Moreover, detailed information on the drainage network is not available. Therefore, when establishing a coarse scale model, a lumped description must be used. In the present case it is simply assumed that the entire catchment area is drained and that the drains are located 1 m below the ground surface. Drainage water is produced whenever the groundwater table is located above this drainage level. Drainage water is routed to the nearest river node where it contributes as a source to the river flow. Routing of groundwater to the drains and further to the ultimate recipient is in MIKE SHE described using a linear routing technique, where a time constant is specified by the user. In this case a time constant of 2.3 × 10⁻⁷ s⁻¹ was used, corresponding to an average retention time (in the linear reservoir) of 50 days. This time constant represents a typical value for Danish catchments.
3.2.2. Soil properties
The soil texture classes in a 1 × 1 km resolution were provided by the GISCO soil database. The texture classes were translated into soil parameters in terms of hydraulic conductivity functions and soil water retention curves using pedo-transfer functions (Cosby et al., 1984). According to the GISCO database the Karup catchment is covered by coarse sandy soil, for which the following key parameter values were estimated: (a) saturated hydraulic conductivity Ks = 1.7 × 10⁻⁵ m/s; (b) moisture content at saturation θs = 40 vol%; (c) moisture content at field capacity θFC = 20 vol%; and (d) moisture content at wilting point θwp = 6 vol%. A specific problem was related to the assessment of soil organic matter, which is an important parameter for nitrogen turnover processes. As indicated in Table 1 such information was not identified in any of the European databases. Instead a value was estimated based on previous experience (Lamm, 1971) with Danish agricultural soils. In the plough layer (0–20 cm) a value of 1.5%C was used, and this value decreased rapidly with depth to a minimum of 0.01%C below 1 m depth.
3.2.3. Hydrogeology
The geological perception of the area and the basis for estimation of the hydrogeological parameters used in the model are all based on EC (1982), where the aquifer is described as composed of two main geological layers. The upper layer is Quaternary sediments consisting
of sands and gravel. The transmissivity of these sediments is assessed to be in the order of 2 × 10⁻³ m²/s and the thickness about 15 m (EC, 1982). This leads to a horizontal hydraulic conductivity of 1.3 × 10⁻⁴ m/s, which was used in the model calculations. An anisotropy factor of 10 between horizontal and vertical hydraulic conductivities was assumed, leading to a vertical hydraulic conductivity of 1.3 × 10⁻⁵ m/s. Moreover, a specific yield of 0.2 and a storage coefficient of 10⁻⁴ m⁻¹ were assumed. Below the Quaternary sediments there are Miocene quartz sand sediments with a relatively high transmissivity of 3 × 10⁻³ m²/s and a thickness of typically 10–20 m (EC, 1982). Hence, in the model a thickness of 15 m has been used. This leads to a horizontal hydraulic conductivity of 2.0 × 10⁻⁴ m/s. The same assumptions on anisotropy, specific yield and storage coefficient as for the Quaternary sediments were applied for the Miocene sediments. EC (1982) provides information on groundwater abstraction on a commune (local administrative unit) basis. The Miocene sediments are described as suitable for drinking water supply, which is why it is assumed that all groundwater abstractions are made from these sediments, which form the lower layer in the model. The total abstraction is given as 13 × 10⁶ m³/year. The exact location of the individual water supply wells is not given in EC (1982), and the abstraction has therefore been evenly distributed among 10–20 model grids located along the river system. The location of the reduction front in the aquifer is an important parameter for nitrate conditions. As percolation water containing nitrate moves into areas with reduced geochemical conditions the nitrate will disappear. No information on this important parameter was provided in EC (1982). It was assumed that the front separating oxic and reduced aquifer conditions is located, all over the aquifer, in the Miocene sediments, 3 m below the interface to the Quaternary sediments. This corresponds to a location 18 m below the terrain surface.
3.2.4. Hydrometeorology
Time series of daily precipitation and temperature based on standard meteorological stations within the catchment were used. In addition, monthly values of potential evapotranspiration were calculated by the Makkink equation on the basis of climate data from the synoptic station at Karup airport. The data from synoptic stations are generally easily available internationally.
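A commonly cited form of the Makkink equation estimates reference evapotranspiration from mean air temperature and global radiation; the sketch below uses that form with standard constants. The coefficient and the input values are assumptions for illustration, not the station data or the exact formulation actually used in the study:

```python
import math

def makkink_et0(temp_c, global_radiation_mj_m2_day, coefficient=0.65):
    """Makkink reference evapotranspiration [mm/day] from mean air temperature
    [deg C] and global (shortwave) radiation [MJ/m2/day]."""
    # Slope of the saturation vapour pressure curve [kPa/deg C] (FAO-56 form)
    es = 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))
    delta = 4098.0 * es / (temp_c + 237.3) ** 2
    gamma = 0.066          # psychrometric constant [kPa/deg C], near sea level
    lam = 2.45             # latent heat of vaporisation [MJ/kg]
    return coefficient * delta / (delta + gamma) * global_radiation_mj_m2_day / lam

# Illustrative monthly mean values for a Danish summer month (assumed values,
# not observations from the Karup synoptic station).
et0_daily = makkink_et0(temp_c=16.0, global_radiation_mj_m2_day=18.0)
print(round(et0_daily * 30, 1), "mm/month")
```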
3.2.5. Crop growth, evapotranspiration and nitrate leaching model
Distributions of crop types and livestock densities were obtained from Agricultural Statistics (1995) and converted to slurry production using standard values for nitrogen content. Based on typical crop rotations proposed by The Danish Agricultural Advisory Centre and the constraints given by the crop distribution and livestock density, two cattle farm rotations, one pig farm rotation and one arable farm rotation were constructed. In order to capture the effect of the interaction between weather conditions and crops, simulations were performed in such a way that each crop at its particular position in the considered rotation occurred exactly once in each of the years, which resulted in a total of 17 crop rotation schemes. These 17 schemes were distributed randomly over the area in such a way that the statistical distribution was in accordance with the agricultural statistics. To simulate the trend in the nitrate concentrations in the groundwater and in the streams, it is necessary to have information on the history of the fertiliser application in space and time. In Denmark, norms and regulations for fertilisation practice are defined (Plantedirektoratet, 1996) which regulate the maximum amount of nutrients allowed for a particular crop depending on the preceding crop and soil type, and in addition provide norms for the lower limit of nitrogen utilisation for organic fertilisers. It was assumed that the farmers follow the statutory norms, and that the proportion of organic fertiliser allocated to the individual crop in a rotation is proportional to the production of organic fertiliser in the rotation and to the relative nitrogen demand of the crop (the fertiliser norm of the particular crop in relation to the fertiliser norm of the rotation).
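A sketch of how such an allocation rule could look is given below. The crop names, norm values and the utilisation fraction are invented for illustration, and the treatment of the statutory minimum utilisation of organic nitrogen is one possible interpretation, not the study's actual implementation:

```python
def allocate_fertiliser(crop_norms_kg_n_ha, organic_n_available_kg_ha,
                        min_organic_utilisation=0.45):
    """Distribute the rotation's organic fertiliser N among its crops in
    proportion to each crop's share of the total fertiliser norm, and top up
    with mineral N so every crop just reaches its statutory norm."""
    total_norm = sum(crop_norms_kg_n_ha.values())
    plan = {}
    for crop, norm in crop_norms_kg_n_ha.items():
        organic_n = organic_n_available_kg_ha * norm / total_norm
        # only a statutory fraction of the organic N counts towards the norm
        effective_organic = organic_n * min_organic_utilisation
        mineral_n = max(norm - effective_organic, 0.0)
        plan[crop] = {"organic": round(organic_n, 1), "mineral": round(mineral_n, 1)}
    return plan

# Hypothetical cattle-farm rotation and norms [kg N/ha], for illustration only.
norms = {"winter wheat": 170.0, "spring barley": 120.0, "grass/clover": 230.0}
print(allocate_fertiliser(norms, organic_n_available_kg_ha=140.0))
```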
Based on estimated application rates of organic and mineral fertilisers to the individual crops each year, the Daisy model simulated time series of nitrate leaching from the root zone for each agricultural grid. The MIKE SHE model then routed these fluxes further through the unsaturated zone and the groundwater layers, accounting for dispersion and dilution processes, and finally into the Karup stream where the integrated load from the entire catchment was estimated. The parameterisation of the Daisy model is adopted from previous studies. The basic processes and standard parameter values were originally assessed from results of Danish agricultural field experiments (Hansen et al., 1990). Since then the process descriptions and standard parameters have only been subject to minor modifications in connection with model tests against data from The Netherlands, Germany, Denmark and Slovakia (Hansen et al., 1991; Jensen et al., 1994, 1996, 1997; Svendsen et al., 1995). Hence, the parameters related to both the evapotranspiration/water balance processes and the nitrogen transformation processes have, except for the soil parameters described in Section 3.2.2, been taken as the standard values. More details on the parameter values, their assessed uncertainties and results from the Daisy simulations are provided in Hansen et al. (1999).
3.2.6. Boundary and initial conditions
In addition to precipitation and groundwater abstraction rates the following boundary conditions are used:
• The area included in the catchment is by definition a hydrological catchment delineated from topography. Thus a zero-flux boundary is used along the catchment boundaries, also for the aquifer layers. The bottom of the model is considered impermeable.
• For all upstream river ends a zero-flux boundary condition is applied. For the downstream end, a constant water level was applied.
The most important initial conditions are the moisture content in the unsaturated zone and the elevation of the groundwater table. The initial soil moisture content was assumed equal to field capacity, while the initial groundwater table was assumed equal to the groundwater table obtained after a seven-year simulation period with guessed initial conditions. The model was run for seven years (1987–1993). In order to reduce the importance of uncertain initial conditions, the first two years were considered as a ‘warming-up period’ and the last five years were considered the simulation period.
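The lumped drainage routing described for the Karup model in Section 3.2.1 can be illustrated schematically as a linear reservoir; the head values below are invented, and the flux expression is a simplified sketch rather than the actual MIKE SHE code:

```python
SECONDS_PER_DAY = 86_400.0

def drain_time_constant(retention_time_days):
    """Linear reservoir: the routing time constant is the reciprocal of the
    mean retention time, here 1 / (50 days) = 2.3e-7 s^-1."""
    return 1.0 / (retention_time_days * SECONDS_PER_DAY)

def drainage_flux(groundwater_head_m, drain_level_m, time_constant_s):
    """Drainage is produced only when the groundwater table is above the
    drain level (1 m below ground surface); the flux per unit area [m/s] is
    taken proportional to the excess head times the time constant."""
    excess = groundwater_head_m - drain_level_m
    return time_constant_s * excess if excess > 0.0 else 0.0

k = drain_time_constant(50.0)
print(f"{k:.2e} s^-1")                     # about 2.3e-07 s^-1
print(drainage_flux(groundwater_head_m=29.6, drain_level_m=29.0,
                    time_constant_s=k))    # flux for 0.6 m excess head
```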
Table 2
Water balance in mm/year for the Karup catchment at station 20.05 Hagebro (518 km²)

Year      Precipitation   River flow
                          Observed   Model 1 km grid   Model 2 km grid   Model 4 km grid
1989       812            460        428               392               353
1990      1020            476        496               518               512
1991       863            449        446               441               424
1992       892            437        499               531               527
1993       835            432        434               425               405
Average    884            451        460               461               444
3.3. Odense model
The same procedure as outlined above for the Karup model was followed. The two main differences compared to the Karup catchment are that the top soils belong to finer textured classes with lower hydraulic conductivities and that the aquifer from which groundwater is abstracted is confined in the Odense catchment. This results in an assumption that the covering sediments are less permeable than the aquifer material. As no direct information on these confining sediments is given in EC (1982), the hydraulic properties of the soil in the root zone are assumed valid. This implies in practice that recharge rates to the aquifer are lower than in the Karup catchment and that the horizontal flow towards the drains and the river system is correspondingly larger. A similar geological geometry as in the Karup catchment is assumed, i.e. the upper, less permeable, confining layer is assumed to have a thickness of 15 m and the reduction front is assumed to be located in the lower aquifer, 3 m below this confining layer.
4. Results
To test the model performance a number of validation tests were carried out for both catchments. Validation is here defined as substantiation that a site specific model performs simulations at a satisfactory level of accuracy. Hence, no universal validity of the general model code is tested nor claimed. In Tables 2 and 3 and Figs. 5–8 results are shown for model grid sizes of 1, 2 and 4 km and, for the Karup catchment, additionally for both the distributed and uniform upscaling procedures. The validation tests described below only consider the 1 km grid model runs, while the remaining results are discussed further below in the section dealing with scaling effects.
4.1. Karup catchment
The Karup model (1 km grid) was validated by comparison of model simulations and field data on the following aspects:
• Annual water balances. Table 2 shows the annual water balances for the five-year simulation period together with the observed annual discharge. The
Table 3
Water balance in mm/year for the Odense catchment at station 45.21 Ejby Mølle (536 km²)

Year      Precipitation   River flow
                          Observed   Model 1 km grid   Model 2 km grid   Model 4 km grid
1989       649            181        220               177               187
1990       943            299        349               351               394
1991       760            265        312               291               308
1992       770            243        308               306               332
1993       906            306        334               329               353
Average    805            259        305               291               315
Fig. 5. Comparison of the recorded discharge hydrograph for the Karup catchment with simulations based on 1, 2 and 4 km grids. The two simulated curves correspond to the combined upscaling/aggregation procedure (Distributed) and the simpler upscaling procedure (Uniform).
simulated and observed hydrographs are shown in Fig. 5.
• Nitrate concentrations in the upper groundwater layer. Simulated values are compared to observed values from 35 wells in terms of statistical distributions over the aquifer (Fig. 6).
The main findings from these validation tests can be summarised as follows:
• The annual water balance is simulated remarkably well. Thus the simulated and recorded flows, which also reflect the annual groundwater recharge in this area, differ by only 2% as average values over the five-year simulation period (Table 2).
• The variation of the river runoff over the year is relatively well described, although not nearly as well as the long-term average water balance (Fig. 5). The model generally underestimates the runoff in the summer periods (low flows) and overestimates the winter flow. There may be many reasons for this. The most important is probably that the observed groundwater levels and dynamics are poorly reproduced by the model. The runoff
from the Karup catchment is dominated by drainage flow and baseflow components. Thus a good simulation of groundwater levels and dynamics is required in order to produce a good runoff simulation. An improved simulation of groundwater levels and dynamics requires that the model includes, in particular, spatial variations of the transmissivity of the aquifer, which is not possible based on the available input data.
• The nitrate concentrations simulated by the model are seen to match the observed data remarkably well, both with respect to average concentrations and the statistical distribution of concentrations within the catchment. It may be noticed that the critical NO3 concentration level of 50 mg/l (maximum admissible concentration according to drinking water standards) is exceeded in about 60% of the area.
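The statistical (rather than well-by-well) comparison underlying Fig. 6 can be sketched as follows; all concentration values below are invented for illustration:

```python
def exceedance_distribution(concentrations_mg_l, thresholds):
    """Fraction of values (wells or grid cells) exceeding each threshold,
    i.e. the areal/statistical distribution compared in Fig. 6."""
    n = len(concentrations_mg_l)
    return {t: sum(c > t for c in concentrations_mg_l) / n for t in thresholds}

# Hypothetical simulated (per grid) and observed (per well) nitrate values.
simulated = [12, 25, 38, 47, 52, 58, 63, 71, 77, 84]
observed = [8, 22, 35, 49, 55, 60, 66, 73, 80, 90]

thresholds = [0, 25, 50, 75]
print("simulated:", exceedance_distribution(simulated, thresholds))
print("observed: ", exceedance_distribution(observed, thresholds))
# e.g. the share of the area above the 50 mg/l drinking water limit
print("area above 50 mg/l (simulated):",
      exceedance_distribution(simulated, [50])[50])
```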
4.2. Odense catchment
The Odense model (1 km grid) was validated by
Fig. 6. Comparison of the statistical distribution of nitrate concentrations in groundwater for the Karup catchment predicted by the model with 1, 2 and 4 km grids and observed in 35 wells. The upper figure corresponds to the upscaling/aggregation procedure resulting in a distributed representation of agricultural crops, while the lower figure is from the run with the upscaling procedure, where all the agricultural area is represented by one uniform crop.
comparison of model simulations and field data on the following aspects:
• Annual water balances. Table 3 shows the annual water balances for the five-year simulation period together with the observed annual discharge. The
simulated and observed hydrographs are shown in Fig. 7.
• Nitrate concentrations in the upper groundwater layer. Simulated values are compared to observed values from 42 wells in terms of statistical distributions over the aquifer (Fig. 8).
Fig. 7. Discharge hydrographs for Odense catchment simulated with 1, 2 and 4 km grids.
The main findings from these validation tests are:
• The annual water balance is simulated reasonably well, although not with the same accuracy as for the Karup catchment. Thus the simulated and recorded flows differ by 18% for the 1 km grid
model as average values over the five-year simulation period (Table 3). A comparison with another model study for this area reveals that one of the reasons for this deviation is uncertainties (errors) in the catchment delineation in the flat downstream part of the catchment. Another reason may be that
Fig. 8. Comparison of the statistical distribution of nitrate concentrations in groundwater for the Odense catchment predicted by the model with 1, 2 and 4 km grids and observed in 42 wells.
the soil hydraulic conductivity functions and the soil water retention curves that significantly affect the evapotranspiration are not very accurately determined. These inaccuracies may originate either from non-representative soil texture data in the 1 km × 1 km GISCO database or from errors introduced by use of the pedo-transfer functions.
• The variation of the river runoff over the year is relatively well described, although the winter peaks are simulated too low and the summer low flows too high, reflecting that some of the internal hydrological processes may not be simulated correctly.
• The distribution of groundwater concentrations by the end of the simulation period is seen not to compare very well to the observations from 42 wells. Thus, in 80% of the observation wells no nitrate was found, whereas the model simulates zero concentration in only 25% of the area. With respect to the critical concentration value of 50 mg/l, the observations indicate that such high concentrations are not found in the area, while the model simulates such concentrations to exist in about 5% of the catchment area. The main reason for this disagreement is most likely that in reality the nitrate is in most of the area reduced (disappears) in the confining sediments overlying the aquifer. This is not simulated by the model, because the reduction front was assumed to be located within the aquifer, while analysis of local geological data reveals that in reality it is located in the upper confining layer over most of the aquifer.
• It is noticed that the nitrate concentrations are significantly lower in the Odense catchment than in the Karup catchment, both the observed and the simulated values. The main reason for this is that the different soil properties and the lower livestock density result in lower nitrate leaching from the root zone in the Odense catchment.
4.3. Scaling effects
The results of running the Karup and Odense models with different computational grid sizes, 1, 2 and 4 km, appear from Tables 2 and 3 for annual water balances and Figs. 5 and 7 for discharge hydrographs. Further, the results in terms of groundwater
concentrations are shown in Figs. 6 and 8. From these results the following findings appear:
• The simulated annual runoff is almost identical and thus independent of grid size. A reason for some of the small differences is that the catchment areas in the 1, 2 and 4 km models are not quite identical. Thus, the root zone processes responsible for generating the evapotranspiration and consequently the runoff do not appear to be scale dependent as long as the statistical properties of the soil and vegetation types are preserved, which is the case with the upscaling/aggregation procedure used in this case.
• The hydrograph shape differs significantly for the three grid sizes. For the Karup model, the simulation with the 1 km grid reproduces the low flow conditions reasonably well, whereas the 2 and 4 km grids have a rather poor description of the baseflow recession in general and the low flow conditions in particular. For the Odense model, the simulation with the 1 km grid shows too large baseflows during the low flow season, while the 2 km grid model has the right level and the 4 km grid model simulates less low flow than observed. This indicates that there are significant scale effects on the stream–aquifer interaction that are not properly described in the present upscaling/aggregation procedure.
• The nitrate concentrations in the groundwater are not clearly influenced by the grid size for the Karup catchment, while there appears to be some effect for the Odense catchment. The reason for this difference is related to the different hydrogeological situations in the two catchments. In the Karup catchment the groundwater table is generally located a couple of meters below the terrain surface and the horizontal flows take place in both the Quaternary and the Miocene sediments. Hence, for the 1, 2 and 4 km grid models, the main part of the horizontal groundwater flow takes place in the approximately 15 m of the aquifer located above the reduction front, and only a relatively small part of the flow lines cross the reduction front, below which the nitrate disappears. In the Odense catchment, the horizontal groundwater flows take place almost exclusively in the lower aquifer, of which only the upper 3 m is located
above the reduction front. This implies that a large part of the groundwater flow is crossing the reduction front on its route from the infiltration zones in the hilly areas towards the discharge zones near the river. As the size of the grid influences the smoothness of the aquifer geometry, the grid size will significantly influence the number of flow lines crossing the reduction front and hence the nitrate concentrations. Such scaling effects on the geological conditions are not accounted for in the present upscaling/aggregation procedure.
Further, for evaluating the importance of the combined upscaling/aggregation method (‘distributed’), a model run has been carried out for the Karup catchment with another upscaling method. This alternative method is based on upscaling of soil/crop types all the way from point scale to catchment scale. This implies that all the agricultural area is described by one representative (‘uniform’) crop instead of the 17 cropping patterns used in the ‘distributed’ method. This representative crop has been assumed to have the same characteristics as the dominant crop, namely winter wheat, and further to be fertilised by the same total amount of organic manure as in the other simulations, supplemented by some mineral fertiliser up to the amount prescribed in the norms defined by Plantedirektoratet (1996). The results are illustrated in Figs. 5 and 6 by the legend denoted ‘uniform’. The effects on the discharge hydrographs (Fig. 5) are seen to be negligible, indicating that the dominant crop (by chance) has evapotranspiration characteristics similar to those of the different crops weighted according to their actual occurrence. The nitrate concentrations in groundwater (Fig. 6) show some differences in terms of a lower average concentration and a less smooth areal distribution as compared to the distributed agricultural representation. Thus, in the case of the ‘uniform’ representation the nitrate concentrations fall in two main groups. Around 30% of the area, corresponding to the natural areas with no nitrate leaching, has concentrations between 0 and 20 mg/l, while the remaining 70%, corresponding to the agricultural area with the ‘uniform’ crop, has concentrations between 70 and 90 mg/l. In the ‘distributed’ agricultural representation the areal distribution curve is much smoother, in accordance with the measured data.
5. Discussion and conclusions
Two prerequisites are required for performing large scale simulations of nitrate leaching on an operational basis: firstly, access to readily available global (or in the present case European) databases; and secondly, an adequate scaling methodology enabling suitable models to be applied at a larger scale than the field scales for which they usually have been proven valid. A key challenge as compared to the experiences reported in the literature is then how to make use of the physically based model at large scale without the possibility of detailed calibration at that scale, when we know that its physically based equations were developed for small scales. Such a model can only be stated as well proven for small scales, and the few attempts made so far to use it on scales above 1000 km² have applied calibration at that scale (Refsgaard et al., 1998b, 1992; Jain et al., 1992).
5.1. Data availability
From the experiences gathered and the lessons learnt with regard to availability of European databases the following conclusions can be drawn:
• Not all of the existing “European” databases are generally applicable due to various restrictions (e.g. copyright, not open to other projects, pointers only).
• Not all databases maintained by international institutions contain harmonised and integrated data sets. Many databases in fact only contain a collection of national data sets that are neither integrated in one seamless data set, nor harmonised in their contents or nomenclatures.
• Not all input data requirements could be satisfied from GIS (spatial) data sets, which is why tables and paper maps were needed to supplement the information. However, often the available data are too coarse in scale (e.g. EU statistics at a higher administrative unit than needed) or too specific (e.g. transfer functions for natural soils only but not for agricultural soils).
• Use of national data sets is to some extent necessary, with the associated restrictions on data quality and origin.
• The search for data sets could have been largely improved by the existence of a European spatial
data clearinghouse and the association of the available data sets with meta-information. It is noted that, in spite of comprehensive efforts made during recent years for assessing spatial data by use of advanced remote sensing technology, the only data in the “European” databases which originate from remote sensing are the CORINE land cover data, which were useful for distinguishing between natural, urban and agricultural areas, but which did not contain any further information about the agricultural crops of importance in the present context. In spite of the above limitations, the attempts in the present study to identify suitable data sources at the European scale have shown that useful data are available at that scale for most of the required model input data. Although these data require some kind of transformation, e.g. through pedo-transfer functions, the data appear adequate for overall model simulations at this scale. However, some gaps exist in the European level databases. Thus, for the following data it was necessary to use national data sources:
• Meteorological data on a daily basis.
• Soil organic matter from arable land.
• Agricultural statistics.
• Agricultural practices.
These data were all easily available at a national scale, and hence their availability is not expected to pose significant constraints for large scale modelling in other parts of Europe. The most critical data that may cause problems in terms of availability at larger scale are the geological data, for which no global (or European) digital database apparently exists. The present case study relied heavily on an EC report produced by the Danish Geological Survey. The information in this report proved adequate for the present purpose, although the lack of geochemical information turned out to have some importance for one of the two catchments. Similar readily available EC reports exist for other countries, but they appear to be non-standardised and comprise information at a variable level of detail. Hence, the positive conclusions from using the geological data in EC (1982) for Denmark cannot necessarily be generalised.
5.2. Parameter assessment—no calibration
An important element of the present methodology is the principle not to carry out any calibration. The parameter values were assessed in three different ways:
• Directly from the available data, e.g. topography and geology.
• Indirectly from the available data through application of predefined transfer functions, e.g. the soil hydraulic parameters.
• Use of standard parameter values that have been assessed in previous studies at other locations.
While the first two methods can be characterised as fully objective and transparent, it may be argued that there will always be some elements of subjective assessment hidden in the use of standard parameter values and that the possible calibration exercises in previous studies may question the “no calibration” statement. In the present case the standard parameters originate from two model codes and their associated accumulated experience:
• Parameters in the MIKE SHE part. The standard parameter used here is the time constant for routing of groundwater to drains (50 days). From comprehensive hydrological modelling experience on dozens of Danish catchments, starting with Refsgaard and Hansen (1982), this value can be characterised as a typical value. It is not the optimal value that would be estimated in a calibration for either of the two catchments; thus, for instance, the calibrated value for the Karup catchment was estimated at 33 days in Refsgaard (1997).
• Parameters in the Daisy part. The standard parameters used here are the ones controlling the vegetation part of the evapotranspiration and the nitrogen turnover processes in the root zone. These parameters are essential both for the water balance and the nitrogen concentrations. Daisy has standard parameters that can be used if no calibration is possible (or desirable). These standard parameter values were originally assessed from agricultural field experiments on plot scales (Hansen et al., 1990). Since then the process descriptions and associated standard parameter values
have only been subject to minor adjustments through a number of additional tests on new data sets from different countries. It should be emphasised that Daisy has not previously been calibrated on the Karup and Odense catchments. These two catchments, and in particular the Karup catchment, have been subject to modelling studies which have included calibration of the water balance (evapotranspiration) parameters. However, in the previous studies of the Karup catchment (Styczen and Storm, 1993; Refsgaard, 1997) the water balance in the root zone was simulated by MIKE SHE, which is not the case in the present study. As the process descriptions for evapotranspiration in MIKE SHE and Daisy are fundamentally different, the Daisy standard parameters used in the present study have not been affected at all by the previous MIKE SHE studies in the same catchment. Thus, although it may correctly be argued that the standard model parameters are results of previous studies where calibration was carried out, the specific parameters used in the present study have not been subject to, and are not results of, calibration in either the Karup or the Odense catchment. In our opinion, one of the strengths of physically based models is the possibility of assessing many parameter values from standard values obtained from experience through a number of other applications. We think that the results of the present study show both this strength and some of the limitations in this respect. Thus, on the one hand, the key results in terms of annual runoff and nitrogen concentration distributions are encouraging, while on the other hand Figs. 5 and 7 clearly illustrate that it would be very easy to obtain a better hydrograph fit through calibration of a couple of parameter values. When parameter values are assessed in this way they are inevitably subject to considerable uncertainty, which again will generate significant uncertainty in model results. It is therefore highly relevant to conduct uncertainty analyses in order to assess whether the resulting uncertainty becomes so large that the model results are not of any use for water management in practice. A methodology and some results of such uncertainty analyses are provided in Hansen et al. (1999) for the root zone processes and in Refsgaard et al. (1998a) for the catchment processes.
5.3. Upscaling
The adopted upscaling methodology is a combination of upscaling and aggregation. Hence, upscaling in its traditional definition (Beven, 1995) is used only from point scale to field scale, where the same equations are assumed valid and where ‘effective’ parameter values are used. The parameter values estimated through pedo-transfer functions (soil data) and the vegetation parameters representing the different crops are assumed valid at field scale. Subsequently, an aggregation procedure is used to represent catchment scale conditions with regard to soil and vegetation types. This aggregation procedure is in full agreement with the findings made regarding the apparent existence of a threshold area (REA) above which “… spatial patterns of dominant process controls can be represented by their statistical distribution functions” (Famiglietti and Wood, 1995). This theoretical consideration is supported empirically by the model results, which show that the annual catchment runoff can be simulated well, even when using different model grid sizes. For the Karup catchment, where the nitrate reduction in the aquifer does not appear to have influenced the results adversely, even the statistical distribution of nitrate concentrations is simulated well. For simulation of annual runoff and nitrate concentration distributions, both of which are affected primarily by root zone processes, the impact of changes of scale is thus relatively small. In contrast to this, the impact on hydrograph shape is consistently rather large. This finding, which was also documented earlier in Refsgaard (1997), indicates that the applied upscaling/aggregation procedure has important limitations with regard to describing the stream–aquifer interactions. Thus, in summary, upscaling of processes described by vertical, non-correlated but patchy columns is successful, while the upscaling fails in the case of processes where horizontal flows between grids dominate. The differences in hydrograph shapes caused by the differences in grid sizes illustrate how careful a model user has to be when changing grid size. In our opinion it is not relevant to talk about an ‘optimal’ scale for hydrograph simulation. The important point is rather that the present methodology is scale dependent with regard to hydrograph simulation; hence a change of scale (grid size)
generates a need for recalibration of the parameters responsible for baseflow recession and low flow simulation. An alternative, and commonly used, upscaling procedure, where upscaling is used all the way from point scale to catchment scale by selecting the dominant crop type in each grid, resulted in one uniform crop representing all the agricultural area. Results indicate that whereas this uniform upscaling procedure may be sufficient for simulating the annual water balance and discharge hydrographs, it is not satisfactory for simulation of nitrate leaching and groundwater concentrations. This is in agreement with Beven (1995), who states that upscaling from small scales to larger scales using effective parameter values cannot be assumed to be generally adequate. An inherent limitation of the applied upscaling/aggregation method is that it does not preserve the georeferenced location of simulated concentrations, but only their statistical distribution over the catchment area. Therefore, comparisons with field data make no sense on a well-by-well or subcatchment-by-subcatchment basis, and no information on the actual location of the simulated “hot spots” within the catchment is possible. If a more detailed spatial resolution of the model predictions is required from a management point of view, then the same upscaling method has to be carried out at a finer scale with all the statistical input data being supplied on a subcatchment basis. This is in principle straightforward, but in reality it may often be limited by data availability. A critical assumption in the upscaling procedure is the application of the point scale equations at the field scale with effective parameters. This corresponds to interpreting the field as a single equivalent soil column using effective hydraulic parameters. This approach was evaluated on two Danish experimental 0.25 ha plots, a coarse sandy soil and a sandy loam, using the Daisy model (Djurhuus et al., 1999). The two plots were monitored with respect to soil water content and nitrate in soil water at several depths at 57 points, where texture, soil water retention and hydraulic conductivity functions had also been measured. The conclusion from comparing the field measured data with the model simulations over the experimental plot, represented by the 57 points, was that the observed mean nitrate concentrations were matched
well by a simulation using the geometric means as effective parameters. This conclusion is in agreement with previous studies for the Danish hydrological regime (Jensen and Refsgaard, 1991a–c; Jensen and Mantoglou, 1992). Other studies from other regimes (Bresler and Dagan, 1983) conclude that effective soil hydraulic parameters are not adequate for modelling water flow in spatially variable fields. The critical issue determining whether such an approach is feasible or not may depend on whether Hortonian overland flow is created in the hydrological regime in question. Thus, although the upscaling methodology from point to field scale is far from universally valid, there are good reasons to believe that this assumption was satisfactorily fulfilled in the present case studies. The spatial patterns, which in subsurface hydrology are considered to be of significant importance (Wen and Gómez-Hernández, 1996), have been treated in different ways with regard to continuous data (parameter values) and categorical data (soil and vegetation classes). The effects of spatial autocorrelation of soil and vegetation parameters within a field have been assumed incorporated into the ‘effective parameters’, which in the present case are assessed in a rather crude way through pedo-transfer functions and use of standard values. The categorical data have been treated differently in the aggregation procedure for soil and vegetation classes. The soil data (one soil type for Karup and two soil types for Odense) were assessed from the soil map and assigned on a grid basis so that the percentage of each soil type within a catchment was preserved and the individual grids to the largest possible extent were characterised by the dominant soil type within the respective grid. For the vegetation types, the same procedure was applied to initially distinguish between agricultural and non-agricultural areas by use of the land cover map. Subsequently, it was assumed that the spatial distribution of cropping patterns is random and without spatial autocorrelation. This is justified by the agricultural management practice of rotating the crops within the individual farms.
5.4. General applicability of methodology
From the results of the present study it appears that it is possible to use distributed physically based models of the same type as the MIKE SHE/Daisy
for catchment scale assessment of nitrate contamination from agricultural land. It appears obvious that such model application is straightforward and the above conclusion is valid for other areas in Denmark. The interesting question is therefore how well this conclusion generalises to other areas in Europe (and on other continents) and what the scientific and practical limitations are. In this respect the following considerations may be noted:
• Except for the geological data, the general availability of which is somewhat uncertain, there is no reason to expect that the application of similar data for other catchments in other European countries should not be as relatively easy as the application for the two Danish catchments. Likewise, the encouraging simulation results of using European level databases, in spite of their often coarse resolution and high level of aggregation, may also be expected for other areas. With regard to geological data it may be noted that considerable efforts are being made at most (if not all) national geological institutes to provide geological data to users in digital form; hence the limitation on data availability existing so far is likely to be overcome, at least nationally, during the coming years.
• The combined aggregation/upscaling procedure appears valid in many areas. The catchments for which it was used in the present study were limited to a maximum of about 500 km². However, the further upscaling to larger areas poses no fundamental problems, as it consists of just a larger number of computational grids. Computationally, running a model like MIKE SHE/Daisy for an area of for instance 100 000 km² with e.g. 250 subcatchments of 100 grids each is maybe close to the limit of what is practically feasible today (a five-year run would require 100 h of CPU time on a 300 MHz Pentium), but this problem will soon disappear as computers become faster.
• The MIKE SHE/Daisy modelling methodology is general and applicable to many other areas. Some limitations, however, are related to special geological conditions such as karstic flow and fissured aquifers, which cannot be described explicitly. Another important limitation is related to the upscaling procedure from point to field scale, which may fail in areas where Hortonian overland
flow is a dominant mechanism. In this respect it should be noted that many areas with dominant overland flow regimes are mountainous regions characterised by thin soil layers and steep slopes, which are generally not regions with important aquifers. Hence, it may be concluded that the methodology can relatively easily be applied to larger areas and used as a decision support tool for evaluation of legislative and management measures aiming at reducing nitrate contamination risks.
Acknowledgements
The present work was partly funded by the EC Environment and Climate Research Programme (contract number ENV4-CT95-0070). Good ideas and constructive comments on the manuscript by Gerard Heuvelink, University of Amsterdam, are gratefully acknowledged. Further, the constructive criticism of Marnik Vanclooster, Université Catholique de Louvain, and an anonymous reviewer is acknowledged.
References
Abbott, M.B., Bathurst, J.C., Cunge, J.A., O'Connell, P.E., Rasmussen, J., 1986. An introduction to the European Hydrological System—Système Hydrologique Européen, SHE, 2: structure of a physically based distributed modelling system. Journal of Hydrology 87, 61–77. Agricultural Statistics, 1995. Danmarks Statistik, 294 pp. (In Danish). Arnold, J.G., Williams, J.R., 1995. SWRRB—a watershed scale model for soil and water resources management. In: Singh, V.P. (Ed.). Computer Models of Watershed Hydrology, Water Resources Publication, pp. 847–908. Arnold, J.G., Williams, J.R., Nicks, A.D., Sammons, N.B., 1990. SWRRB—A basin scale simulation model for soil and water resources management, Texas A & M University Press, College Station, 241 pp. Beasley, D.B., Huggins, L.F., Monke, E.J., 1980. ANSWERS: a model for watershed planning. Transactions of ASAE 23 (4), 938–944. Beven, K., 1995. Linking parameters across scales: subgrid parameterizations and scale dependent hydrological models. Hydrological Processes 9, 507–525. Blöschl, G., Sivapalan, M., 1995. Scale issues in hydrological modelling: a review. Hydrological Processes 9, 251–290. Bresler, E., Dagan, G., 1983. Unsaturated flow in spatially variable
fields: application of water flow models to various fields II. Water Resources Research 19, 421–428. Cosby, B.J., Hornberger, G.M., Clapp, R.B., Ginn, T.R., 1984. A statistical exploration of relationships of soil moisture characteristics to the physical properties of soils. Water Resources Research 20, 682–690. Dagan, G., 1986. Statistical theory of groundwater flow and transport: pore to laboratory, laboratory to formation, and formation to regional scale. Water Resources Research 22 (9), 120–134. DeCoursey, D.G., Rojas, K.W., Ahuja, L.R., 1989. Potentials for non-point source groundwater contamination analyzed using RZWQM. Paper No. SW892562, presented at the International American Society of Agricultural Engineers' Winter Meeting, New Orleans, Louisiana. DeCoursey, D.G., Ahuja, L.R., Hanson, J., Shaffer, M., Nash, R., Rojas, K.W., Hebson, C., Hodges, T., Ma, Q., Johnsen, K.E., Ghidey, F., 1992. Root zone water quality model, Version 1.0, Technical Documentation. United States Department of Agriculture, Agricultural Research Service, Great Plains Systems Research Unit, Fort Collins, Colorado, USA. Djurhuus, J., Hansen, S., Schelde, K., Jacobsen, O.H., 1999. Modelling mean nitrate leaching from spatially variable fields using effective parameters. Geoderma 87, 261–279. EC, 1982. Groundwater resources in Denmark. Commission of the European Communities. EUR 7941 (In Danish). EC, 1996. Commission proposal for an Action Programme for Integrated Groundwater Protection and Management, Brussels. EEA, 1995. Europe's Environment. The Dobris Assessment. The European Environment Agency, Copenhagen. Engesgaard, P., 1996. Multi-species reactive transport modelling. In: Abbott, M.B., Refsgaard, J.C. (Eds.). Distributed Hydrological Modelling, Kluwer Academic Publishers, Dordrecht, pp. 71–91. EU, 1991. Resolution from Ministerial seminar held in The Hague in November 1991. Famiglietti, J.S., Wood, E.F., 1995. Effects of spatial variability and scale on areally averaged evapotranspiration. Water Resources Research 31 (3), 699–712. Gelhar, L.W., 1986. Stochastic subsurface hydrology. From theory to applications. Water Resources Research 22 (9), 135–145. Hansen, S., Jensen, H.E., Nielsen, N.E., Svendsen, H., 1990. Daisy, a soil plant system model. NPO-forskning fra Miljøstyrelsen, Report no. A10. Danish Environmental Protection Agency, Copenhagen. Hansen, S., Jensen, H.E., Nielsen, N.E., Svendsen, H., 1991. Simulation of nitrogen dynamics and biomass production in winter wheat using the Danish simulation model Daisy. Fertilizer Research 27, 245–259. Hansen, S., Thorsen, M., Pebesma, E., Kleeschulte, S., Svendsen, H., 1999. Uncertainty in simulated leaching due to uncertainty in input data. A case study. Soil Use and Management. Heng, H.H., Nikolaidis, N.P., 1998. Modelling of non-point source pollution of nitrogen at the watershed scale. Journal of the American Water Resources Association 34 (2), 359–374. Heuvelink, G.B.M., Pebesma, E.J., 1998. Spatial aggregation and soil process modelling. Geoderma. Jain, S.K., Storm, B., Bathurst, J.C., Refsgaard, J.C., Singh, R.D.,
1992. Application of the SHE to catchments in India. Part 2. Field experiments and simulation studies with the SHE on the Kolar subcatchment of the Narmada River. Journal of Hydrology 140, 25–47. Jensen, C., Stougaard, B., Østergaard, H.S., 1996. The performance of the Danish simulation model Daisy in prediction of Nmin at spring. Fertilizer Research 44, 79–85. Jensen, C., Stougaard, B., Østergaard, H.S., 1994. Simulation of the nitrogen dynamics in farm land areas in Denmark 1989–1993. Soil Use and Management 10, 111–118. Jensen, K.H., Refsgaard, J.C., 1991. Spatial variability of physical parameters in two fields. Part II: Water flow at field scale. Nordic Hydrology 22, 303–326. Jensen, K.H., Refsgaard, J.C., 1991. Spatial variability of physical parameters in two fields. Part III. Solute transport at field scale. Nordic Hydrology 22, 327–340. Jensen, K.H., Refsgaard, J.C., 1991. Spatial variability of physical parameters in two fields. Part I. Water flow and solute transport at local scale. Nordic Hydrology 22, 275–302. Jensen, K.H., Mantoglou, A., 1992. Application of stochastic unsaturated flow theory, numerical simulations, and comparisons to field observations. Water Resources Research 28, 269–284. Jensen, L.S., Mueller, T., Nielsen, N.E., Hansen, S., Crocker, G.J., Grace, P.R., Klir, J., Körschens, M., Poulton, P.R., 1997. Simulating trends in soil organic carbon in long-term experiments using the soil–plant–atmosphere model DAISY. Geoderma 81 (1–2), 5–28. Kleeschulte, S., 1998. Assessment of data availability for direct modelling use at the European scale. In: Refsgaard, J.C., Ramaekers, D.A. (Eds.), Assessment of 'cumulative' uncertainty in spatial decision support systems: Application to examine the contamination of groundwater from diffuse sources. Final Report, vol. 1, EU contract ENV-CT95-070. http://projects.gim.lu/uncersdss. Knisel, W.G. (Ed.), 1980. CREAMS: a field-scale model for chemicals, runoff, and erosion from agricultural management systems. US Department of Agriculture, Science, and Education Administration. Conservation Research Report no. 26, 643 pp. Knisel, W.G., Williams, J.R., 1995. Hydrology component of CREAMS and GLEAMS models. In: Singh, V.P. (Ed.). Computer Models of Watershed Hydrology, Water Resources Publication, pp. 1069–1114. Lamm, C.G., 1971. Det danske jordarkiv (The Danish soil archive), Tidsskrift for Planteavl, pp. 703–720 (in Danish). Leonard, R.A., Knisel, W.G., Still, D.A., 1987. GLEAMS: groundwater loading effects of agricultural management systems. Transactions of ASAE 30, 1403–1418. Mangold, D.C., Tsang, C.F., 1991. A summary of subsurface hydrological and hydrochemical models. Reviews of Geophysics 29 (1), 51–79. Michaud, J.D., Shuttleworth, W.J., 1997. Executive summary of the Tucson aggregation workshop. Journal of Hydrology 190, 176–181. Person, M., Raffensperger, J.P., Ge, S., Garven, G., 1996. Basin-scale hydrogeologic modelling. Reviews of Geophysics 34 (1), 61–97.
Plantedirektoratet, 1996. Guidelines and forms 1996/1997. Ministry for Food, Agriculture and Fishery, 38 pp. (In Danish). Refsgaard, J.C., 1997. Parameterisation, calibration and validation of distributed hydrological models. Journal of Hydrology 198, 69–97. Refsgaard, J.C., Hansen, E., 1982. A distributed groundwater/ surface water model for the Susa˚ catchment. Part 1. Model description. Nordic Hydrology 13, 299–310. Refsgaard, J.C., Storm, B., 1995. MIKE SHE. In: Singh, V.P. (Ed.). Computer Models of Watershed Hydrology, Water Resources Publication, pp. 809–846. Refsgaard, J.C., Seth, S.M., Bathurst, J.C., Erlich, M., Storm, B., Jørgensen, G.H., Chandra, S., 1992. Application of the SHE to catchment in India. Part1. General results. Journal of Hydrology 140, 1–23. Refsgaard, J.C., Thorsen, M., Jensen, J.B., Hansen, S., Heuvelink, G., Pebesma, E., Kleeschulte, S., Ramamaekers, D., 1998. Uncertainty in spatial decision support systems—Methodology related to prediction of groundwater pollution. In: Babovic, V., Larsen, L.C. (Eds.), Hydroinformatics ‘98. Proceedings of the Third International Conference on Hydroinformatics, Copenhagen, Balkema, 24–26 August 1998, pp. 1153–1159. Refsgaard, J.C., Sørensen, H.R., Mucha, I., Rodak, D., Hlavaty, Z., Bansky, L., Klucovska, J., Topolska, J., Takac, J., Kosc, V., Enggrob, H.G., Engesgaard, P., Jensen, J.K., Fiselier, J., Griffioen, J., Hansen, S., 1998. An integrated model for the Danubian Lowland—methodology and applications. Water Resources Management 12, 433–465. Refsgaard, J.C., Ramaekers, D., Heuvelink, G.B.M., Schreurs, V., Kros, H., Rose´n, L., Hansen, S., 1998. Assessment of ‘cumulative’ uncertainty in spatial decision support systems: Application to examine the contamination of groundwater from diffuse sources (UNCERSDSS). Presented at the European Climate Science Conference, Vienna, 19–23 October. Saulnier, G.M., Beven, K., Obled, C., 1997. Digital elevation analysis for distributed hydrological modelling: Reducing scale dependence in effective hydraulic conductivity values. Water Resources Research 33 (9), 2097–2101. Sellers, P.J., Heiser, M.D., Hall, F.G., Verma, S.B., Desjardins, R.L., Schuepp, P.M., MacPherson, J.I., 1997. The impact of using area-averaged land surface properties—topography, vegetation conditions, soil wetness—in calculations of intermediate scale (approximately 10 km 2) surface-atmosphere heat and moisture fluxes. Journal of Hydrology 190, 269–301. Styczen, M., Storm, B., 1993. Modelling of N-movements on catch-
ment scale—a tool for analysis and decision making. 1. Model description. 2. A case study. Fertilizer Research 36, 1–17. Styczen, M., Storm, B., 1995. Modelling of the effects of management practices on nitrogen in soils and groundwater. In: Bacon, P.E. (Ed.). Nitrogen Fertilization in the Environment, Marcel Dekker, New York, pp. 537–564. Svendsen, H., Hansen, S., Jensen, H.E., 1995. Simulation of crop production, water and nitrogen balances in two German agroecosystems using the Daisy model,. Ecological Modelling 81, 197–212. Thorsen, M., Feyen, J., Styczen, M., 1996. Agrochemical modelling. In: Abbott, M.B., Refsgaard, J.C. (Eds.). Distributed Hydrological Modelling, Kluwer Academic Publishers, Dordrecht, pp. 121–141. UNCERSDSS, 1998. Assessment of cumulative uncertainty in Spatial Decision Support Systems: Application to examine the contamination of groundwater from diffuse sources (UNCERSDSS). EU contract ENV4-CT95-070. Final Report, available on http://projects.gim.lu/uncersdss. Vanclooster, M., Viaene, P., Christians, K., 1994. WAVE—a mathematical model for simulating agrochemicals in the soil and vadose environment. Reference and user’s manual (release 2.0). Institute for Land and Water Management, Katholieke Universiteit Leuven, Belgium. Vanclooster, M., Viaene, P., Diels, J., Feyen, J., 1995. A deterministic validation procedure applied to the integrated soil crop model. Ecological Modelling 81, 183–195. Vereecken, H., Vanclooster, M., Swerts, M., Diels, J., 1991. Simulating nitrogen behaviour in soil cropped with winter wheat. Fertilizer Research 27, 233–243. Wen, X.-H., Go´mez-Herna´ndez, J.J., 1996. Upscaling hydraulic conductivities in heterogeneous media: An overview. Journal of Hydrology 183, ix–xxxii. Wood, E.F., Sivapalan, M., Beven, K.J., Band, L., 1988. Effects of spatial variability and scale with implications to hydrologic modelling. Journal of Hydrology 102, 29–47. Wood, E.F., Sivapalan, M., Beven, K., 1990. Similarity and scale in catchment storm response. Reviews of Geophysics 28, 1–18. Woods, R., Sivapalan, M., Duncan, M., 1995. Investigating the representative elementary area concept: an approach based on field data. Hydrological Processes 9, 291–312. Young, R.A., Onstad, C.A., Bosch, D.D., 1995. AGNPS: an agricultural nonpoint source model. In: Singh, V.P. (Ed.). Computer Models of Watershed Hydrology, Water Resources Publication, pp. 1001–1020.
[11]
Thorsen M, Refsgaard JC, Hansen S, Pebesma E, Jensen JB, Kleeschulte S (2001) Assessment of uncertainty in simulation of nitrate leaching to aquifers at catchment scale. Journal of Hydrology, 242, 210-227.
Reprinted from Journal of Hydrology with permission from Elsevier
Journal of Hydrology 242 (2001) 210-227
www.elsevier.com/locate/jhydrol
Assessment of uncertainty in simulation of nitrate leaching to aquifers at catchment scale

M. Thorsen a, J.C. Refsgaard a,*, S. Hansen b, E. Pebesma c, J.B. Jensen a, S. Kleeschulte d

a DHI Water and Environment, Hørsholm, Denmark
b Royal Veterinary and Agricultural University, Copenhagen, Denmark
c University of Amsterdam, Amsterdam, The Netherlands
d GIM, Luxembourg, Luxembourg

* Corresponding author. Present address: Department of Hydrology, Geological Survey of Denmark and Greenland, Thoravej 8, DK-2400 Copenhagen, Denmark. E-mail address: [email protected] (J.C. Refsgaard).
Received 21 February 2000; revised 21 July 2000; accepted 23 October 2000
Abstract

Deterministic models are used to predict the risk of groundwater contamination from non-point sources and to evaluate the effect of alleviation measures. Such model predictions are associated with considerable uncertainty due to uncertainty in the input data used, especially when applied at large scales. The present paper presents a case study related to prediction of nitrate concentrations in groundwater aquifers using a spatially distributed catchment model. Input data were primarily obtained from databases at a European level. The model parameters were all assessed from these data by use of transfer functions, and no model calibration was carried out. The Monte Carlo simulation technique was used to analyse how uncertainty in input data propagates to model output. It appeared that the magnitude of the uncertainty depends significantly on the considered temporal and spatial scale. Thus simulations of flux concentrations leaving the root zone at grid level were associated with large uncertainties, whereas uncertainties in simulated concentrations at aquifer level on catchment scale were much smaller. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Nitrate; Non-point pollution; Distributed model; Catchment scale; Uncertainty; Monte Carlo method
1. Introduction

1.1. Background

Deterministic models are important tools for assessing nitrate leaching, transport and transformation in connection with groundwater resources management.
Such models may be classified according to the description of the physical processes as black box, conceptual and physically-based, and according to the spatial description as lumped and distributed (Wood and O'Connell, 1985; Nemec, 1994; Refsgaard, 1996; and others). In this respect three typical model types are the lumped black box model, the lumped conceptual model and the distributed physically-based model. Most nitrogen leaching models, such as RZWQM (DeCoursey et al., 1989) and DAISY (Hansen et al., 1991), are of the physically-based type, but cover only the root zone at plot or field scale. Within the field of nitrogen modelling at a catchment scale, typical examples of a black box, a conceptual and a distributed physically-based model are statistical regression models (Simmelsgaard,
0022-1694/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0022-1694(00)00396-6
1991), the SWRRB (Arnold et al., 1990; Arnold and Williams, 1995) and the MIKE SHE/DAISY (Styczen and Storm, 1993), respectively. The black box and conceptual models are attractive because they require relatively few data, which are usually easily accessible, while the predictive capability of these models with regard to assessing the impacts of alternative agricultural practices is questionable due to the semi-empirical nature of the process descriptions. A key problem in using the more complex physically-based catchment models operationally lies in the generally large data requirements prescribed by the developers of such model codes. However, due to the better process descriptions these models may for some types of application be expected to have better predictive capabilities than the simpler models (Heng and Nikolaidis, 1998). Traditionally, complex leaching models are only used on plot or field scales in areas with extraordinarily good data availability, and even for such cases the relevance of such an approach is often questioned because of the perceived uncertainty related to the model simulations (Skop, 1993). Hence, there is an evident need to assess the uncertainty related to large scale simulation of aquifer pollution from diffuse sources.
When analysing uncertainties in model simulations, the two fundamentally different sources of uncertainty are: (1) uncertainty in input data in terms of input variables (time varying input such as climate data) and model parameters (e.g. soil physical characteristics); and (2) inadequate model structure (process descriptions, equations). When comparing the model outputs to measured field data a third source of uncertainty has to be added, namely the error in the measurement of output from nature.
Stochastic approaches are useful tools in uncertainty analyses. Assessment of uncertainties of model simulations requires a joint stochastic-deterministic approach, where the input data and/or the structure of the deterministic model are somehow considered stochastic. By considering input data as realisations of stochastic variables with given statistical properties, the governing equations become so-called stochastic partial differential equations (PDEs). The three traditional approaches to solving
the stochastic PDEs are (1) state space formulations - Kalman filtering (Gelb, 1974; Ahsan and O'Connor, 1994), (2) Monte Carlo techniques (Smith and Freeze, 1979a,b; Freeze, 1980; Zhang et al., 1993), and (3) analytical solutions to the stochastic PDEs (Gelhar, 1986; Dagan, 1986; Jensen and Mantoglou, 1992). A severe limitation of the above three methods is that they only consider uncertainties on input data, while all of them assume the model structure to be correct. A more comprehensive approach, also allowing consideration of the uncertainty in the model structure and process equations, is the generalised likelihood uncertainty estimation (GLUE) methodology outlined in Beven and Binley (1992). Although no such studies have been reported yet, GLUE in principle allows the uncertainty on model structure to be considered by introducing several alternative models, so that the Monte Carlo procedure includes both uncertainties on input data and on model structure.
The objective of the present paper is, by use of Monte Carlo simulations, to assess whether a distributed physically-based model can provide fairly accurate predictions of nitrate concentrations in aquifers when applied at a catchment scale with input data only from readily available, aggregated data sources such as European databases. A limitation of the present paper is that only uncertainties in input data are considered, while errors in model structure are not taken into account. The studies reported in the literature dealing with assessment of uncertainty of physically-based models consider only individual components of the hydrological cycle, typically groundwater, while the studies dealing with conceptual models, including surface water, root zone and groundwater processes, have not considered uncertainties on nitrogen or other water quality aspects. Thus, to our knowledge, no similar attempts have been reported so far. The present paper, focussing on uncertainty assessment at catchment scale, is an extension of Refsgaard et al. (1999) and Hansen et al. (1999), where details on the deterministic modelling at catchment scale and the uncertainty aspects of the nitrogen leaching from the root zone, respectively, have been described. All three papers present results from the UNCERSDSS project (Refsgaard et al., 1998).
2. Methodology

2.1. Modelling approach

The deterministic simulation is carried out by the coupled MIKE SHE/DAISY system. This is a coupling of a 1D root zone model (DAISY) and a 3D distributed catchment model (MIKE SHE).
MIKE SHE is a modelling system describing the flow of water and solutes in a catchment in a distributed, physically-based way. This implies numerical solutions of the coupled PDEs for overland (2D) and channel flow (1D), unsaturated flow (1D) and saturated flow (3D), together with a description of evapotranspiration and snowmelt processes. For further details reference is made to the literature (Abbott et al., 1986; Refsgaard and Storm, 1995).
DAISY (Hansen et al., 1991) is a 1D physically-based modelling tool for the simulation of crop production and water and nitrogen balance in the root zone. DAISY includes modules for description of evapotranspiration, soil water dynamics based on Richards' equation, water uptake by plants, soil temperature, soil mineral nitrogen dynamics based on the advection-dispersion equation, nitrate uptake by plants and nitrogen transformations in the soil. The nitrogen transformations simulated by DAISY are mineralisation-immobilisation turnover (MIT), nitrification and denitrification. In addition, DAISY includes a module for description of agricultural management practices.
By combining MIKE SHE and DAISY, a complete modelling system is available for the simulation of water and nitrate transport in an entire catchment. In the present case the coupling is a sequential one. Thus, for all agricultural areas, DAISY first performs calculations of water and nitrogen behaviour from the soil surface down through the root zone. The percolation of water and nitrate at the bottom of the root zone, simulated by DAISY, is then used as input to the MIKE SHE calculations for the remaining part of the catchment. For natural areas, MIKE SHE also calculates the root zone processes, assuming no nitrate contribution from these areas. Due to the sequential execution of the two codes, it has to be assumed that there is no feedback from the groundwater zone (MIKE SHE) to the root zone (DAISY). As the riparian buffer zone, where such a feedback mechanism is effective, often mainly
(as in our case study) constitutes a part of the natural areas, this limitation is of minor practical importance. Furthermore, overland flow generated by high intensity rainfall (Hortonian) can not be simulated by this coupling, while saturation-excess overland flow (Dunne) can be accounted for by MIKE SHE. Thus, MIKE SHE does not in the present case handle evapotranspiration and other root zone processes in the agricultural areas.
As DAISY is 1D, one DAISY run should in principle be carried out for each of MIKE SHE's horizontal grids. However, several MIKE SHE grids are assumed to have identical root zone properties (soil, crop, agricultural management practices, etc.), so that in practice the output from each DAISY run can be used as input to several MIKE SHE grids.
To fulfil one of the overall objectives of the project, which was to assess the quality of European data sets for direct use for modelling at the European scale, two key constraints were applied to the modelling approach. One constraint was that, if possible, input data such as model parameters and driving variables should be based on publicly available information, which preferably could be accessed from standard European databases such as GISCO or EUROSTAT, or from very easily available national sources. Another constraint was that all model parameters obtained from standard databases were to be used directly or by way of transfer functions without any model calibration.
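To make the sequential coupling concrete, the sketch below outlines the data flow in Python. The functions run_daisy and run_mike_she are hypothetical stand-ins for the two codes, and the dictionary-based bookkeeping is purely illustrative; it is not part of either modelling system.

```python
import numpy as np

def run_daisy(rotation_scheme, climate):
    # Hypothetical stand-in for a 1D DAISY run: returns daily percolation (mm) and
    # nitrate flux (kg N/ha) at the bottom of the root zone for one rotation scheme.
    n_days = len(climate["precipitation"])
    return np.zeros(n_days), np.zeros(n_days)

def run_mike_she(percolation_by_grid, nitrate_by_grid):
    # Hypothetical stand-in for a 3D MIKE SHE run: routes the root-zone fluxes through
    # the unsaturated and saturated zones and into the river network.
    return {"river_flow": None, "groundwater_concentration": None}

def coupled_run(grid_to_scheme, schemes, climate):
    # One DAISY run per rotation scheme; its output is reused for every grid cell that
    # shares the scheme (identical soil, crop and management assumed), after which the
    # percolation and nitrate fluxes are passed to MIKE SHE with no feedback upwards.
    daisy_out = {name: run_daisy(scheme, climate) for name, scheme in schemes.items()}
    percolation = {g: daisy_out[name][0] for g, name in grid_to_scheme.items()}
    nitrate = {g: daisy_out[name][1] for g, name in grid_to_scheme.items()}
    return run_mike_she(percolation, nitrate)
```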
Fig. 1. Location of the Karup catchment in Jutland, Denmark.
2.2. Scaling

As the equations in both the MIKE SHE and the DAISY codes basically are point scale equations, a scaling procedure had to be adopted in order to apply the codes at a catchment scale. MIKE SHE/DAISY is in this case run with equations and parameter values in each model grid point representing field scale conditions. The field scale is characterised by 'effective' soil and vegetation parameters, assuming only one soil type and one cropping pattern. The smallest horizontal discretisation in the model is the grid scale (2 × 2 km²), which is larger than the field scale. This implies that the variations between categories of soil type and crop type within the area of each grid can not be resolved and described at the grid level. Input data, whose variations are not included in the grid scale representation, are distributed randomly at the catchment scale so that their statistical distributions are preserved at that scale. The results from the grid scale modelling are then aggregated to catchment scale (130 grids) and the statistical properties of model output and field data are then compared at catchment scale. Thus the scaling procedure from point scale to catchment scale may be characterised as a combination of an upscaling step and an aggregation step. The upscaling step is simply the important assumption that the point scale equations are valid at field scale. The aggregation step highlights a key issue from the concept of representative elementary area (REA) (Wood et al., 1988), namely that variability can be explicitly represented only at scales larger than the model grid size. More details on the adopted scaling approach are
provided in Refsgaard et al. (1999), where it is also documented that the approach can be assumed valid for the case study in question.

2.3. Input error assessment

The MIKE SHE/DAISY model contains a very large number of input parameters. Ideally, all these parameters should be treated stochastically and included in the uncertainty analyses. However, this would result in an unrealistically high number of Monte Carlo simulations and excessive CPU-time. Therefore, the input uncertainty was limited to five key parameters (see Section 3.2 below), which were selected because they are known by experience to be the dominant parameters in the processes governing the water balance and nitrate leaching and transformation.
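The aggregation step of Section 2.2 can be summarised in a few lines. The numbers below are synthetic and only illustrate how grid-level Monte Carlo output is reduced to catchment-scale statistics; the 25 runs and 130 grids follow the set-up of the present study.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic grid-level output: mean NO3 concentration (mg/l) per 2 x 2 km grid,
# one row per Monte Carlo run (25 runs x 130 grids).
grid_conc = rng.lognormal(mean=3.8, sigma=0.4, size=(25, 130))

catchment_mean = grid_conc.mean(axis=1)                      # areal mean per run
grid_cv = grid_conc.std(axis=0) / grid_conc.mean(axis=0)     # uncertainty at grid scale
catchment_cv = catchment_mean.std() / catchment_mean.mean()  # uncertainty at catchment scale
print(f"median grid-scale CV {np.median(grid_cv):.2f}, catchment-scale CV {catchment_cv:.2f}")
```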
The actual input error assessment, i.e. the choice and parameterisation of the joint probability distribution of the stochastic variables, was partly based on the analysis of available data and partly on expert judgement. Available data comprised data from national surveys or previous studies. The expert judgement refers, for instance, to the choice of the distribution type if no data were present, and to the assessment of 'realistic' ranges between which the true parameter values were expected to vary. Although this assessment seems rather subjective, it was hard to find a better way of doing this in the absence of data. Since the basic unit of calculation is a field, the variation of field-effective values was used for determining the range of the parameter probability distributions. A single realisation of such a parameter was then used in the model for each grid cell. All stochastic parameters were treated as being mutually independent. The reasons for this are that no significant correlation was suspected a priori, and that no data were available to actually estimate possible correlations.

2.4. Error propagation

The propagation of errors in the input data to the model output was assessed using Monte Carlo analysis. This means that a number of realisations were drawn at random from the stochastic input parameter distributions and that the model was run for each realisation. The ensemble of model outputs then is an estimate of the model output probability distribution, as influenced only by uncertainty in model input parameters. In order to reduce the number of Monte Carlo runs, Latin hypercube sampling was used to draw realisations from the input variables (McKay et al., 1979). This essentially means that each sample of a stochastic input variable was stratified in N strata with equal probability mass, where N equals the number of Monte Carlo runs. The theoretical background for the adopted Latin hypercube sampling method is described in Pebesma and Heuvelink (1999).
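A minimal sketch of the Latin hypercube step, assuming mutually independent parameters as in the study. The mapping to two of the Table 1 distributions is shown for illustration only and uses scipy for the truncated normal.

```python
import numpy as np
from scipy.stats import truncnorm

def latin_hypercube(n_runs, n_params, rng):
    # One equal-probability stratum per run and parameter; strata are shuffled
    # independently for each parameter so the runs are not rank-correlated.
    u = (np.arange(n_runs)[:, None] + rng.random((n_runs, n_params))) / n_runs
    for j in range(n_params):
        u[:, j] = u[rng.permutation(n_runs), j]
    return u

rng = np.random.default_rng(42)
u = latin_hypercube(25, 2, rng)

# Map the stratified uniforms onto two Table 1 distributions (illustrative values):
depth = 18.0 + u[:, 0] * (27.0 - 18.0)   # reduction front depth, uniform on 18-27 m
mu, sd, lo, hi = 0.5, 0.22, 0.06, 0.94   # SOM2, truncated normal
som2 = truncnorm.ppf(u[:, 1], (lo - mu) / sd, (hi - mu) / sd, loc=mu, scale=sd)
```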
3. Application

3.1. Study area

The area used in the study is the Karup river basin, located in the middle part of Jutland, Denmark (Fig. 1). The topographic catchment covers approximately 500 km², of which 70% is used for agricultural purposes and 30% is natural areas. The catchment characteristics are described in Styczen and Storm (1993). The data used for the present study and the model construction are described in detail in Refsgaard et al. (1999) and Hansen et al. (1999). In the following a brief summary is provided.
The catchment was represented in the model as a 3D network. The discretisation used for the uncertainty analysis was 2 km in the horizontal direction and varied in the vertical from 5 to 40 cm in the unsaturated zone, and from 10 to 15 m in the saturated zone. The catchment area and the location of the river branches as well as the stream geometry were generated on the basis of a digital elevation map from USGS/GISCO using Arc/Info facilities. Spatial distributions of land use and soil types were derived from the GISCO database and hydrogeological data were obtained from EC (1982). Distributions of crop types and livestock densities were obtained from Agricultural Statistics (1995) and converted to slurry production using standard values for nitrogen content. Based on typical crop rotations proposed by The Danish Agricultural Advisory Centre and the constraints imposed by crop distribution and livestock density, two cattle farm rotations, one pig farm rotation and one arable farm rotation were constructed. In order to capture the effect of the interaction between weather conditions and crop, simulations were performed in such a way that each crop at its particular position in the considered rotation occurred once in each of the years in the rotation. This resulted in a total of 17 agricultural crop rotation schemes and one scheme representing natural areas with no assumed nitrate leaching. These 18 schemes were distributed randomly over the area in such a way that the statistical distribution was in accordance with the agricultural statistics.
To simulate the trend in the nitrate concentrations in the groundwater and in the streams, it is necessary to have information on the history of the fertiliser application in space and time.
M. Thorsen et al. / Journal of Hydrology 242 (2001) 210±227
215
Table 1 Statistical properties of the input error considered in the Monte Carlo analysis Parameter Daily rainfall Standard error Clay content SOM2 Cattle slurry Dry matter content Total N content Pig slurry Dry matter content Total N content Depth of reduction front a
Unit
Distribution
Mean
Std.
Range
% % %
Uniform Truncated normal
50 8.5 0.5
0.22
0.0±17.0 0.06±0.94
% %
Truncated normal Truncated normal
7.5 0.5
2.5 0.12
1.89±14.35 0.24±1.02
% % m
Truncated normal Truncated normal Uniform
4.9 0.61 22.5
2.5 0.18
0.82±13.79 0.24±1.02 18±27
a
The series was normalised so that the mean value was preserved.
In Denmark, norms and regulations for fertilisation practice are defined (Plantedirektoratet, 1996). These regulate the maximum amount of nutrients allowed for a particular crop depending on the preceding crop and soil type, and in addition provide norms for the lower limit of nitrogen utilisation for organic fertilisers. It was assumed that the farmers follow these statutory norms. Based on estimated application rates of organic and mineral fertiliser to the individual crops each year, the DAISY model simulated time series of nitrate leaching from the root zone for each agricultural grid. The MIKE SHE model then routed these fluxes further through the unsaturated zone and the groundwater layers, accounting for dispersion and dilution processes, and finally into the Karup stream, where the integrated load from the entire catchment was estimated.
The model was run for seven years, from 1987 to 1993. The large storage possibilities in the unsaturated zone and the aquifer imply that the initial conditions influence the simulation results for several years. The initial conditions were established by running the deterministic model twice for the period 1987-1993. In the first run the initial conditions were guessed and in the second run they were taken as the simulated conditions by the end of the period. The simulated 1993 conditions in the second run were then used as initial conditions for the Monte Carlo runs. This procedure ensures that the initial conditions are consistent with the assumptions made in the deterministic simulation, but not necessarily with the parameter values drawn in the Monte Carlo runs, where e.g. a run with a parameter value resulting in higher nitrate leaching should, in principle, have been associated with higher initial nitrate concentrations in the aquifer. In order to reduce the effect of this, the first two years were considered as a 'warming-up period' and the last five years were considered the simulation period.
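The random placement of the 18 schemes over the model grids can be sketched as below; the areal fractions are placeholders for the values derived from the agricultural statistics.

```python
import numpy as np

n_grids = 130                      # model grids at catchment scale
n_schemes = 18                     # 17 crop rotation schemes + 1 natural-area scheme
fractions = np.full(n_schemes, 1.0 / n_schemes)   # placeholder areal fractions

rng = np.random.default_rng(0)
# Random spatial assignment that reproduces the prescribed statistical distribution
# of schemes over the catchment, with no imposed spatial structure.
scheme_of_grid = rng.choice(n_schemes, size=n_grids, p=fractions)
```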
3.2. Assessment of input errors

Uncertainty on the following five parameters was introduced in the analysis: precipitation, soil hydraulic properties, soil organic matter (SOM) content, slurry composition, and depth of the nitrate reduction front in the aquifer. The rationale for selecting these five parameters and details on their assessment are provided in Sections 3.2.2-3.2.6 below. The statistical characteristics of the data included in the Monte Carlo analysis are shown in Table 1.

3.2.1. Length scale and spatial correlation

A fundamental question in the assessment of uncertainty of input data for a spatially distributed model like MIKE SHE/DAISY is whether the input data are spatially correlated or not. It is possible to take spatial correlation into account; however, it complicates the Monte Carlo sampling considerably (Kros et al., 1999). The critical question in this respect is whether the spatial autocorrelation length scale of the input data is larger than the computational scale, or whether the dominating spatial variability takes place within a computational length scale, in which case it should be incorporated into the effective model parameters and their inherent uncertainties. As discussed above, the basic unit of calculation is the model grid (2 × 2 km²), with some of the parameters, however, representing field-effective values
(typically 1-10 ha in size). Hence the soil hydraulic parameters, the SOM content and the slurry composition represent field length scales in the order of 100-300 m, while the precipitation and the reduction front are represented at a 2 km length scale. For the field related parameters the correlation length scales can be assumed smaller than 100 m. For soil hydraulic properties this is documented in previous studies (Hansen and Jensen, 1988), while no data exist on length scales for SOM. With respect to slurry composition, this parameter is the result of farm management and storage conditions, and it is known that the temporal variability of the produced slurry on the individual farm is considerable. Hence, it is assumed that the variability within the individual fields is much larger than the variability among the fields. Daily rainfall data are known to have correlation length scales that are usually larger than the 2 km grid scale used in the present case. Geostatistical analysis (Storm et al., 1988) suggests that the length scale for Danish conditions is in the order of 10 km. Similarly, the length scale of the location of the reduction/oxidation front, which is mainly dependent on geological conditions, may be assumed to be significantly larger than the 2 km grid. This implies that the three field related parameters in principle should be treated as spatially independent in the Monte Carlo analysis, while the two other input data could be treated as almost spatially constant.
As a consequence of the adopted scaling approach, the relevant scale at which the uncertainty on the input data should be generated in the Monte Carlo analysis is the catchment scale and not the grid scale. The uncertainty at catchment scale can be generated either by allowing spatial variation among grids and using a variance applicable for the grid scale in the Monte Carlo sampling, or by assuming a spatially constant value and using the (smaller) catchment scale variance. In the present study we have adopted the latter approach. This has two important limitations. Firstly, the nitrate reduction processes in the aquifer, where the horizontal dimension with flows between neighbouring grids is important, are not fully correctly described because the autocorrelation length scale is not preserved. Secondly, the output uncertainties are only simulated correctly at the catchment scale, while they are underestimated at grid scales.
3.2.2. Precipitation

In general, the required daily climate data are available throughout Europe from the national meteorological institutes. Among the required meteorological variables, precipitation is the one subject to most local variation. Therefore uncertainty on the daily amount of precipitation was included in the present analysis. The uncertainty was described by adding a random error to the measured series. This error was assumed to follow a normal distribution with zero mean and a standard deviation equivalent to 50% of the measured daily value. Thus, dry days were kept dry. The error was assumed to contain no temporal autocorrelation. Finally, the series was normalised so that the mean value, taken over the 25 Monte Carlo runs, was preserved. The adopted variance is in agreement with Allerup et al. (1982) as the standard error of daily rainfall for a catchment of this size.
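A sketch of the rainfall perturbation under the stated assumptions. Clipping negative values at zero and rescaling each realisation to the measured total are simplifications of the normalisation over the 25 runs described above.

```python
import numpy as np

def perturb_rainfall(precip, rng):
    # Additive, temporally uncorrelated normal error with std = 50% of the daily value.
    noise = rng.normal(0.0, 0.5 * precip)
    perturbed = np.clip(precip + noise, 0.0, None)
    perturbed[precip == 0.0] = 0.0            # dry days stay dry
    if perturbed.sum() > 0:                   # keep the mean of the measured series
        perturbed *= precip.sum() / perturbed.sum()
    return perturbed

rng = np.random.default_rng(5)
daily_precip = np.array([0.0, 2.3, 0.0, 11.0, 4.2])   # mm, synthetic example
print(perturb_rainfall(daily_precip, rng))
```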
3.2.3. Soil hydraulic properties

The modelling system requires soil hydraulic parameters in terms of retention curves and hydraulic conductivity functions. Such data were not directly available through European databases. Instead, these properties were estimated using pedo-transfer functions based on soil information in terms of texture composition obtained from the GISCO soil database, in which soils are divided into five texture classes according to the FAO classification. All soil types of the Karup catchment fall within one texture class (coarse texture), which covers soils with less than 18% clay and more than 65% sand. As the texture class covers a wide range of different texture compositions, soil hydraulic properties derived from this information will be associated with considerable uncertainty. Based on a review by Tietje and Tapkenhinrichs (1993) evaluating available pedo-transfer functions, and based on the constraints imposed by the available information on texture (clay, silt and sand content), the pedo-transfer functions proposed by Cosby et al. (1984) were selected. These functions estimate the saturated hydraulic conductivity and the parameters in the soil water retention function proposed by Campbell (1974). The hydraulic conductivity function was calculated according to Burdine (1952) using the same parameters. In order to facilitate a smooth retention function, the Campbell functions were modified according to the modifications of the Brooks-Corey function (Brooks and Corey, 1966) proposed by Smith (1992).
In Danish soils the clay and the silt contents are correlated. Based on information in the Danish Soil Library (Lamm, 1971), a relation between clay and silt has been established:

    Silt content = 0.035 + 0.82 × Clay content    (r² = 0.68)
Adopting this relation and assuming that clay, silt and sand constitute all soil solids, the soil hydraulic properties can be calculated once the clay content is known. In the uncertainty analysis, the clay content was drawn by stratified random sampling from a uniform distribution ranging from 0 to 17% (Table 1). In reality, the uncertainty on the soil hydraulic parameters originates from two sources, namely the uncertainty on soil texture and the uncertainty related to the use of the adopted pedo-transfer function. In the present approach, uncertainty is only associated with soil texture. Data from the Danish Soil Textural Database show that a uniform distribution, as adopted in the present study, clearly overestimates the uncertainty on soil texture (Børgesen, 2000). The assumed large uncertainty range on soil texture may therefore compensate for the lack of uncertainty on the pedo-transfer function, so that the integrated uncertainty on the soil hydraulic parameters is of the right order of magnitude. Considering that the autocorrelation length scale for soil texture is in the order of 100 m, this adopted uncertainty range may at first glance appear as a rather high uncertainty for soil texture at the catchment scale. However, as the FAO texture class is so broad that it actually covers different soil types with large differences in hydraulic properties, the adopted catchment scale variance should be seen to cover uncertainty about which soil type actually is present in the catchment, rather than uncertainty on hydraulic properties due to small scale variations.
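The texture-to-hydraulic-parameter chain can be sketched as follows. The units of the clay-silt relation (fractions of total solids) are an assumption, and the Cosby et al. (1984) regressions are left as a stub since their coefficients are not reproduced here.

```python
def texture_realisation(u):
    # u: one stratified uniform number in [0, 1) from the Latin hypercube sample.
    clay = 0.17 * u                    # clay fraction, uniform on 0-17% (Table 1)
    silt = 0.035 + 0.82 * clay         # Danish clay-silt relation (r2 = 0.68); fractions assumed
    sand = 1.0 - clay - silt           # clay + silt + sand assumed to constitute all solids
    return clay, silt, sand

def cosby_pedotransfer(clay, silt, sand):
    # Placeholder for the Cosby et al. (1984) regressions returning the Campbell
    # retention parameters and the saturated hydraulic conductivity.
    raise NotImplementedError("insert the Cosby et al. (1984) regression coefficients")
```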
3.2.4. Soil organic matter

In DAISY, the MIT model considers three types of organic matter: newly added, relatively fresh organic matter (AOM) with a relatively short turnover rate, the living soil microbial biomass (SMB), and old native SOM with slow turnover. The former two can be initialised with default values when the model is run with a 'warm-up' period of a couple of years prior to the actual simulation period. The latter comprises by far most of the organic matter found in the soil. However, SOM is divided into two sub-pools, SOM1 and SOM2. The turnover of SOM1 is so slow that its contribution to the annual nitrogen mineralisation in agricultural soils is negligible. Hence, when initialising the MIT model the important factor is the quantity of SOM2. As the European databases did not provide this information, we had to rely on estimates of both the amount of organic matter present in the soil and the amount of this organic matter that is allocated to SOM2. The assumed statistical properties of this uncertainty are shown in Table 1.

3.2.5. Slurry composition

Due to the high livestock density, slurry is a substantial source of nitrogen in the Karup region. Hence the management of slurry is of prime importance for the leaching losses. A main problem in the management of slurry is the large variability found in the composition of the slurry. This variability makes the actual fertiliser application in slurry differ from the planned application and therefore introduces a considerable source of uncertainty. In the uncertainty analysis this has been accounted for by introducing uncertainty on the dry matter content and the nitrogen content of the slurry. The assumed error statistics are shown in Table 1. Further details on the agricultural management and the rationale behind the error statistics are provided in Hansen et al. (1999).

3.2.6. Depth of reduction front

In the uncertainty analysis the depth of the reduction front in the saturated zone was drawn from a uniform distribution in the interval 18-27 m below the soil surface.

3.3. Uncertainty analyses

The initial part of the uncertainty analysis comprised an evaluation of the selected number of Monte Carlo runs. As the CPU-time required to run the model for the seven year period is substantial, it was necessary to keep the number of Monte Carlo runs to a minimum. Therefore an initial choice of 25
Table 2
Evaluation of the representativeness of 25 Monte Carlo runs

                                          Runs 1-25      Runs 26-50     Runs 51-75     Runs 1-75
Variable                                  Mean    Std.   Mean    Std.   Mean    Std.   Mean a  Std.   CV (%)
Leaching from root zone (kg N/ha/year)    64.7    19.2   68.2    18.9   67.2    16.7   66.7    18.1   27.1
Groundwater concentration (mg NO3/l)      47.7     8.0   48.3     7.2   47.6     6.0   47.8     7.0   14.6
River flow (mm/year)                     464.0    22.0  464.0    23.0  464.0    17.0  464.0    21.0    4.5
River concentration (mg NO3/l)            45.1     7.8   46.2     7.3   45.7     6.6   45.7     7.1   15.5

a Homogeneity of means accepted by F-test.
runs was made. In order to investigate whether 25 Monte Carlo runs are sufficient to capture the variability, 75 Monte Carlo runs were performed, the results were split into three groups of 25 runs each, and the statistical distributions of the three groups were compared. The output variables analysed were river flow, average NO3 concentration in groundwater, and average NO3 concentration in the stream. The three sets of Monte Carlo runs were evaluated by comparing the statistical distributions of the simulation results, i.e. testing whether the simulation results can be described by a normal distribution and whether homogeneity of mean and variance can be assumed.
In the second part of the uncertainty analysis, the contributions of uncertainty associated with each of the selected Monte Carlo parameters were evaluated by performing five sets of Monte Carlo simulations, in each of which one of
the initially stochastic parameters was kept deterministic. The uncertainty contributions of the different parameters were then evaluated. As annual leaching depends on weather, crop and crop position in the rotation, groundwater concentrations in single years were not considered; instead, data averaged over the five year simulation period, 1989-1993, were used for the uncertainty analysis.
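The batch comparison behind Table 2 can be reproduced along these lines. The concentrations are synthetic, and the tests shown (one-way ANOVA for means, Levene's test for variances) are standard choices standing in for the tests actually used in the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Synthetic catchment-average groundwater concentrations (mg NO3/l) for 75 runs,
# split into three batches of 25 as in Table 2.
conc = rng.normal(48.0, 7.0, size=75)
batches = conc.reshape(3, 25)

f_stat, p_means = stats.f_oneway(*batches)   # homogeneity of means
w_stat, p_vars = stats.levene(*batches)      # homogeneity of variances
print(f"means: p = {p_means:.2f}, variances: p = {p_vars:.2f}")
```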
4. Results - uncertainties of model results

4.1. Evaluation of the number of Monte Carlo runs

The main results of the comparison between the three individual sets of 25 Monte Carlo runs are given in Table 2. Statistical tests showed that the hypothesis of homogeneity of means and variances can not be
Fig. 2. Statistical distribution from 25 Monte Carlo runs of simulated average annual river flow at the catchment outlet. The corresponding measured value based on daily river flow data was 451 mm/year.
Fig. 3. Statistical distribution over 25 Monte Carlo runs of simulated areal average NO3 concentrations in the upper aquifer layer by the end of 1993. The corresponding measured value based on data from 35 wells was 58 mg/l.
rejected. As the three sub-sets appear statistically similar, it was concluded that 25 Monte Carlo runs were sufficient to assess the uncertainty on the simulation results. It should be emphasised that this small number of Monte Carlo runs is only possible because we focus on mean values and standard deviations. If the aim were to assess uncertainties on extreme values, such as the 1% fractile, 25 runs would obviously not have been sufficient.

4.2. Comparisons with field data

The simulated uncertainty intervals on selected
model results were, if possible, compared to corresponding measured data available from monitoring programmes conducted in the area. In this context it is noted that, due to the adopted scaling approach, the simulation results are only supposed to reflect the field observations at a catchment scale and not at a point scale.
The simulated water balance, represented by the average annual river discharge at the catchment outlet, varies from 428 to 502 mm/year among the Monte Carlo runs (Fig. 2). The corresponding measured value is 451 mm/year, which falls within the simulated interval and within 5% of both the median (462 mm) and the average (463 mm)
Fig. 4. Statistical distribution over 25 Monte Carlo runs of percentage of catchment area with NO3 concentrations above the drinking water limit of 50 mg/l. The corresponding measured value based on data from 35 wells was 57%.
Fig. 5. Measured (■) and simulated (×) areal distribution of NO3 concentrations in groundwater at eight points in time. Measured values are based on 35 groundwater observations.
Fig. 6. (a) Simulated time series of six-monthly flux concentrations from the root zone obtained in three different crop rotations (■ mean, bars ± 1 std). The range of seasonal variation in standard errors is shown inside the figures. (b) Simulated time series of average areal aquifer concentrations (■ mean, bars ± 1 std). The range of seasonal variation in standard errors is shown inside the figures.
of the simulated values. Fig. 3 presents the simulated distribution of average nitrate concentrations in the upper groundwater layer, averaged over the entire catchment and over the five year simulation period. The corresponding value obtained from observations in 35 wells is 58 mg/l, which falls within the simulated interval (35.4-61.4 mg/l) and within 25% of both the median (46.7 mg/l) and the average (47.4 mg/l) of the Monte Carlo runs. In Fig. 4 the fraction of the catchment area with groundwater concentrations above the drinking water limit of 50 mg/l is shown in terms of the statistical distribution for the 25 Monte Carlo runs. Also in this case the observed value from the 35 observation wells (57%) falls within the simulated interval (27-65%) and within 10% of the median (53%) of the Monte Carlo runs. A visual comparison is shown in Fig. 5, where observed areal distributions of nitrate concentrations from existing wells are compared to similar results from the Monte Carlo runs on a six-monthly basis. From this figure it is seen that the measured concentration distribution in general lies within the uncertainty band generated from the Monte Carlo simulations, though not always centred. It appears that, in general, the simulated fraction of the area with nitrate concentrations exceeding 50 mg/l is slightly overestimated in the summer period and slightly underestimated in the winter period, indicating that the overall trend in the concentration level is simulated adequately, whereas the seasonal variation in observed concentrations is not fully represented in the simulations.
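The comparisons with field data in Figs. 2-4 amount to checking whether an observed value falls inside the simulated ensemble and how far it lies from the ensemble median. A sketch with synthetic numbers:

```python
import numpy as np

sim = np.random.default_rng(3).normal(47.4, 7.0, size=25)   # synthetic ensemble (mg NO3/l)
obs = 58.0                                                   # observed value (35 wells, cf. Fig. 3)

inside = sim.min() <= obs <= sim.max()
relative_deviation = abs(obs - np.median(sim)) / np.median(sim)
print(f"range {sim.min():.1f}-{sim.max():.1f} mg/l, observed inside: {inside}, "
      f"deviation from median: {relative_deviation:.0%}")
```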
4.3. Nitrate concentrations in the aquifer - at different temporal and spatial scales

The results regarding the uncertainty on simulated nitrogen leaching from different cropping patterns and the importance of the contributions from different error sources are described in detail in Hansen et al. (1999). The present paper focuses on the catchment scale and on how uncertainties at a point scale propagate and are transformed (reduced) at larger spatial and temporal scales.
The transformation process is illustrated in Fig. 6, which shows the uncertainty, characterised by time series of the means and standard deviations among the 25 Monte Carlo runs, for (a) six-monthly flux concentrations from the root zone (DAISY output) for three different crop rotations, and (b) mean six-monthly concentrations in the upper aquifer layer averaged over the entire aquifer. It is very clearly seen from the figures how the uncertainties are reduced when moving from root zone leakage to aquifer concentrations at catchment scale. Thus it is remarkable that, for instance, average standard errors (standard deviation divided by mean) of six-monthly root zone flux concentrations in the order of 33-44% are reduced to a standard error of 18% on the assessed mean six-monthly values for groundwater concentrations at the catchment scale. The large seasonal variation in concentration levels observed in the percolation water (Fig. 6a) is levelled out in the simulated groundwater concentrations at both grid level and catchment level. This is mainly a
Table 3
Simulations used for evaluation of uncertainty contributions. All six sets are based on the input uncertainties drawn for the first set of Monte Carlo simulations (1-25)

Monte Carlo run series   Status of parameters
O                        All five parameters are treated stochastically
A                        Precipitation is treated deterministically
B                        Texture is treated deterministically
C                        Soil organic matter is treated deterministically
D                        Slurry composition is treated deterministically
E                        Depth of reduction front is treated deterministically
result of dilution and averaging over the entire groundwater volume of the upper layer, which accounts for 8-13 m of the saturated zone. The differences in concentration levels between crop rotations are, on the other hand, still reflected in the groundwater concentrations of the corresponding grids (Fig. 6b), with the lowest concentrations arising from the plant production rotations and the highest concentrations from the pig rotations.

4.4. Analyses of different sources of input error

In addition to the basic set of Monte Carlo simulations (1-25), where all five selected parameters were treated stochastically, five series were simulated in each of which one of the Monte Carlo parameters was kept deterministic (Table 3). The results of
these extra five series were compared to the results of the basic set in order to evaluate the uncertainty associated with each of the selected parameters. In Table 4, the uncertainty contribution of each series is given as variances. The variance contribution of a single parameter was obtained by subtracting the total simulated variance obtained with only four stochastic parameters (e.g. series A) from the total variance obtained with five stochastic parameters (series O). Ideally, the sum of the variances corresponding to the simulation series A-E should equal the variance associated with Monte Carlo run series O, if no covariance components were generated. It is, however, noted that discrepancies occur, indicating that not all variance and covariance components are accounted for. In spite of this, the results can give a rough estimate of the relative importance of the selected sources of uncertainty.
As can be seen from Table 2 (runs 1-25), the uncertainty on the simulated annual river flows (CV = std./mean = 5%) was significantly less than the uncertainty related to the components of the nitrogen balance, i.e. nitrogen leaching (CV = 30%) and nitrate concentrations in groundwater and stream water (CV = 17%). According to Table 4, the uncertainty on simulated river flow was dominated by contributions from uncertainty on soil texture and on precipitation, whereas the uncertainties associated with components of the nitrogen balance were dominated by the uncertainty contributions from soil texture, SOM and slurry composition. Uncertainty on precipitation contributed only little to the simulated uncertainties on the nitrogen components, despite the influence it had on the water balance.
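The variance decomposition behind Table 4 reduces to a subtraction per parameter. The sketch below assumes each series is available as an array of 25 catchment-scale outputs.

```python
import numpy as np

def variance_contribution(series_all, series_fixed):
    # Contribution of one parameter: variance of series O (all five parameters stochastic)
    # minus the variance of the series in which that parameter is kept deterministic.
    # Covariance terms are neglected, which is why the contributions need not sum to
    # the total variance (cf. Table 4).
    return np.var(series_all, ddof=1) - np.var(series_fixed, ddof=1)
```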
Table 4
Estimation of uncertainty on selected simulation results, given as the calculated variance contributions (σ²) from precipitation (A), soil texture (B), soil organic matter (C), slurry composition (D), and depth of the reduction front (E), respectively

                                          Variance contribution from single parameters
Variable                                   A      B      C      D      E    SUM (A:E)   All parameters O a
Leaching from root zone (kg/ha/year)        0    192    100    114     0      406          370
Groundwater concentration (mg/l)            2     30     29     28     0       89           64
River flow (mm/year)                      284    345      6      6     0      641          499
River concentration (mg/l)                  0     27     21     19     0       67           61

a Variance from simulations with all five Monte Carlo parameters included.
The depth of the reduction front appeared to have only minor influence on the uncertainty of stream water concentrations in the present simulations.

5. Discussion and conclusions

From the analysis of input error contributions it was observed that only three of the five input parameters included in the uncertainty analysis contributed significantly to the simulated variation in the model output related to the nitrogen balance, i.e. areal leaching from the root zone and average nitrate concentrations in groundwater and stream water. Of these three, only one, soil texture, is related to the transport processes. The two others, SOM and slurry composition, are related to the nitrogen turnover processes. The uncertainty introduced to the driving variable precipitation influenced the simulated water balance but not the simulated nitrogen balance. This indicates that the timing of the percolating water, governed by the hydraulic parameters, is more important for the simulated nitrogen loads than the total annual amounts of percolation. This result is supported by other studies showing that one of the major factors influencing nitrogen losses from the root zone under a northern temperate climate is the amount of readily available organic nitrogen present in the soil at the end of the growing season, when groundwater recharge is initiated (Landbrugets Rådgivningscenter, 1996). The predicted uncertainty on the simulated river flow is in good agreement with results from Storm et al. (1988).
The uncertainty introduced to the depth of the reduction front in the saturated zone had no influence on the simulation results. The main reason for this is that the simulated groundwater levels were shallower than normally observed in the area. This prevented the percolating water from passing through the reduced zone before entering the stream. If the hydrogeological parameters had been included in the Monte Carlo analysis, the depth of the reduction front might have contributed to the simulated variation in the nitrogen balance components, in particular stream flow concentrations, as well.
A fundamental limitation of the adopted approach is that the errors due to incorrect model structure are neglected. One approach to assess such model error is through comparison of predicted and observed values.
In the present case it was, however, not possible during the validation tests to identify a significant model error. This must not be taken as a general proof of a correct model structure. It only shows that the model performs without apparent model error for the particular case study.
Another limitation of the adopted approach lies in the choice of associating input uncertainty with only five parameters. Although these five parameters, according to our experience, are the most important ones in the different processes governing the nitrate leaching and transformation, this has not been documented by systematic sensitivity analyses, either by us or by other authors. It can be argued that the uncertainties have been underestimated by neglecting the uncertainty on the other input parameters. Hence, the absolute uncertainty figures should be considered with some reservation.
A third limitation is the mostly subjective method of assessing errors in input data. If suitable data had been available for assessing such errors in a statistically more rigorous way, this should have been done. Cases where such data are available are typically studies on small experimental areas, while our case is more comparable to practical studies, where such data most often are not available. In spite of the weak data basis for the input error assessment, the adopted Monte Carlo analysis is still valuable as a rigorous method of analysing uncertainty propagation, although the predicted uncertainties should be treated with some caution.
When considering uncertainties at different scales, it must be noted that, due to the adopted approaches with respect to upscaling and Monte Carlo sampling, the uncertainties can only be assumed to be correctly assessed at the catchment scale, while the uncertainties at smaller scales are underestimated. This amplifies the finding reflected in Fig. 6, namely that the uncertainties in flux concentrations leaving the root zone are much larger than the uncertainty at the catchment/aquifer scale. Taking this into account, one could argue that the uncertainty in simulated flux concentrations leaving the root zone at point/grid scale is so large that this in itself may lead to the conclusion that modelling with this type of model, this grid size, and this data basis is of minor practical use. However, the uncertainty at the catchment (or aquifer) scale, which is an interesting scale seen from a water
supply and policy point of view, is reduced so much that the results may be useful in practice. This duality illustrates that discussions of model uncertainty are useless unless the type of simulation result is defined precisely in terms of spatial and temporal scale, which is probably one of the reasons why 'field/process study oriented scientists' and 'modellers/large scale oriented scientists' often misunderstand each other.
One way of reducing the simulated uncertainty would be to increase the quality of the input data support, either by using national databases instead of the European data sets or by actually gathering site specific data through field monitoring. The uncertainty related to the texture composition could be reduced by using national soil databases, which often include more detailed classification systems than the FAO approach provided in the GISCO database. Keeping the procedure of using pedo-transfer functions for obtaining hydraulic parameters, this would decrease the uncertainty within each defined soil class. Based on the effect of keeping soil texture deterministic (Table 4), it could for example be expected that a 50% reduction in the input error related to soil texture, obtained by collecting better data in this way, would reduce the uncertainty on simulated groundwater concentration by approximately 25%. Gathering better precipitation data would, on the other hand, only improve the simulation of the water balance and not influence the simulated uncertainty in groundwater concentrations significantly.
Another way of decreasing the uncertainty would be to carry out model calibration, as this in principle would decrease the uncertainty related to the input parameters. In practice it is, however, difficult to quantify how much the input error of a single parameter should be reduced if calibration involving this parameter is conducted. In the present study, calibration of the hydrogeological parameters by use of measured groundwater levels and observed stream flow might have influenced both the simulated groundwater concentrations, by introducing a more diverse hydrology, and in particular the simulated stream concentrations, as the reduction front might then have come into play. Calibration of the root zone processes would have required field data in terms of e.g. soil moisture contents, nitrogen concentrations in the root zone, crop yields, etc., data which
are not often available. In order to get some idea of the quality of the simulated mass balances, one possibility could be to calibrate the simulated crop yields using regional agricultural statistics, though these can only provide rather rough estimates.
From the results of the present study it can be concluded that the present modelling approach appears feasible for estimating uncertainties in predicted nitrate concentrations at larger scales, and hereby also for evaluating the reliability of the simulation results. The results also indicate that the use of distributed physically-based models is feasible at the catchment scale, even if data have to be obtained from readily available, aggregated data sources such as European databases. Given the constraints for obtaining data, and given that no model calibration was performed in the present case study, the validation tests came out surprisingly well, as measured groundwater concentrations were within the uncertainty intervals of the simulated groundwater concentrations. The uncertainty of the model simulations at catchment scale is at a relatively low level, and thus the predictive capability of the model appears very interesting from a practical water resources management point of view.

Acknowledgements

The present work was partly funded by the EC Environment and Climate Research Programme (contract number ENV4-CT95-0070). We thank the two reviewers, Tim Burt and Bernd Huwe, for valuable comments on an earlier version of this manuscript.

References

Abbott, M.B., Bathurst, J.C., Cunge, J.A., O'Connell, P.E., Rasmussen, J., 1986. An introduction to the European hydrological system - Système Hydrologique Européen 'SHE'. 1. History and philosophy of a physically based distributed modelling system. 2. Structure of a physically based distributed modelling system. Journal of Hydrology 87, 45-77.
Agricultural Statistics, 1995. Danmarks Statistik, 294 pp.
Ahsan, M., O'Connor, K.M., 1994. A reappraisal of the Kalman filtering technique as applied in river flow forecasting. Journal of Hydrology 161, 197-226.
Allerup, P., Madsen, H., Riis, J., 1982. Methods for calculating areal
precipitation – applied to the Suså catchment. Nordic Hydrology 13, 263–278.
Arnold, J.G., Williams, J.R., Nicks, A.D., Sammons, N.B., 1990. SWRRB – A Basin Scale Simulation Model for Soil and Water Resources Management. Texas A & M University Press, College Station (241 pp).
Arnold, J.G., Williams, J.R., 1995. SWRRB – a watershed scale model for soil and water resources management. In: Singh, V.P. (Ed.). Computer Models of Watershed Hydrology. Water Resources Publication, pp. 847–908.
Beven, K., Binley, A.M., 1992. The future role of distributed models: model calibration and predictive uncertainty. Hydrological Processes 6, 279–298.
Brooks, R.H., Corey, A.T., 1966. Properties of porous media affecting fluid flow. Journal of the Irrigation and Drainage Division of the American Society of Civil Engineering 92, 61–88.
Burdine, N.T., 1952. Relative permeability calculations from pore-size distribution data. Transactions of the AIME 198, 35–42.
Børgesen, C.D., 2000. Personal communication. Danish Institute of Agricultural Science.
Campbell, G.S., 1974. A simple method for determining unsaturated conductivity from moisture retention data. Soil Science 117, 311–314.
Cosby, B.J., Hornberger, G.M., Clapp, R.B., Ginn, T.R., 1984. A statistical exploration of relationships of soil moisture characteristics to the physical properties of soils. Water Resources Research 20, 682–690.
Dagan, G., 1986. Statistical theory of groundwater flow and transport: pore to laboratory, laboratory to formation, and formation to regional scale. Water Resources Research 22 (9), 120–134.
DeCoursey, D.G., Rojas, K.W., Ahuja, L.R., 1989. Potentials for non-point source groundwater contamination analyzed using RZWQM. Paper no. SW892562. Presented at the International American Society of Agricultural Engineers' Winter Meeting, New Orleans, Louisiana.
EC, 1982. Groundwater resources in Denmark. Commission of the European Communities. EUR 7941 (in Danish).
Freeze, R.A., 1980. A stochastic-conceptual analysis of the rainfall-runoff process on a hillslope. Water Resources Research 16 (2), 391–408.
Gelb, A. (Ed.), 1974. Applied Optimal Estimation. MIT Press, Cambridge, MA.
Gelhar, L.W., 1986. Stochastic subsurface hydrology. From theory to applications. Water Resources Research 22 (9), 135–145.
Hansen, S., Jensen, H.E., 1988. Spatial variability of soil physical properties. Theoretical and experimental analysis. II. Soil water variables – data acquisition, processing and basic statistics. Research report no. 1210. Department of Soil and Water and Plant Nutrition. The Royal Veterinary and Agricultural University, Copenhagen, 54pp.
Hansen, S., Jensen, H.E., Nielsen, N.E., Svendsen, H., 1991. Simulation of nitrogen dynamics and biomass production in winter wheat using the Danish simulation model DAISY. Fertiliser Research 27, 245–259.
Hansen, S., Thorsen, M., Pebesma, E., Kleeschulte, S., Svendsen, H., 1999. Uncertainty in simulated leaching due to uncertainty in input data. A case study. Soil Use and Management 15, 167–175.
Heng, H.H., Nikolaidis, N.P., 1998. Modelling of nonpoint source pollution of nitrogen at the watershed scale. Journal of the American Water Resources Association 34 (2), 359–374.
Jensen, K.H., Mantoglou, A., 1992. Application of stochastic unsaturated flow theory, numerical simulations and comparison to field observations. Water Resources Research 28 (1), 269–284.
Kros, J., Pebesma, E.J., Reinds, G.J., Finke, P.A., 1999. Uncertainty assessment in modelling soil acidification at the European scale: a case study. Journal of Environmental Quality 28 (2), 366–377.
Lamm, C.G., 1971. The Danish soil database. Tidskrift for Planteavl 75, 703–720 (in Danish).
Landbrugets Rådgivningscenter, 1996. Square grid for nitrate investigations in Danmark 1990–1993. Landskontoret for Planteavl, Skejby, Denmark (in Danish).
McKay, M.D., Conover, W.J., Beckman, R.J., 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21 (2), 239–245.
Nemec, J., 1994. Distributed hydrological models in the perspective of forecasting operational real time hydrological systems (FORTHS). In: Rosso, P., Peano, A., Becchi, I., Bemporad, G.A. (Eds.). Advances in Distributed Hydrology. Water Resources Publications, pp. 69–84.
Pebesma, E.J., Heuvelink, G.B.M., 1999. Latin hypercube sampling of Gaussian random fields. Technometrics 41 (4), 303–312.
Plantedirektoratet, 1996. Vejledninger og skemaer 1996/1997. Ministry for Food, Agriculture and Fishery, 38pp.
Refsgaard, J.C., 1996. Terminology, modelling protocol and classification of hydrological model codes. In: Abbott, M.B., Refsgaard, J.C. (Eds.). Distributed Hydrological Modelling. Kluwer Academic, pp. 17–39.
Refsgaard, J.C., Storm, B., 1995. MIKE SHE. In: Singh, V.P. (Ed.). Computer Models of Watershed Hydrology. Water Resources Publication, pp. 809–846.
Refsgaard, J.C., Ramaekers, D., Heuvelink, G.B.M., Schreurs, V., Kros, H., Rosén, L., Hansen, S., 1998. Assessment of cumulative uncertainty in spatial decision support systems: application to examine the contamination of groundwater from diffuse sources (UNCERSDSS). Presented at the European Climate Science Conference, Vienna, 19–23 October, 1998. To appear in conference proceedings.
Refsgaard, J.C., Thorsen, M., Birk Jensen, J., Kleeschulte, S., Hansen, S., 1999. Large scale modelling of groundwater contamination from nitrogen leaching. Journal of Hydrology 221, 117–140.
Simmelsgaard, S.E., 1991. Estimating functions for nitrogen leaching: nitrogen fertilizers in agriculture – requirement and leaching now and in the future. National Institute of Agricultural Economics, Copenhagen, Denmark (in Danish).
Skop, E., 1993. Calculation of nitrogen leaching on a regional scale. Technical report no. 65. National Environmental Research Institute, Silkeborg, Denmark, 54 pp (in Danish).
Smith, L., Freeze, R.A., 1979a. Stochastic analysis of steady state flow in a bounded domain. 1. One-dimensional simulations. Water Resources Research 15 (3), 521–528.
Smith, L., Freeze, R.A., 1979b. Stochastic analysis of steady state
flow in a bounded domain. 2. Two-dimensional simulations. Water Resources Research 15 (6), 1543–1559.
Smith, R.E., 1992. An integrated simulation model of nonpoint-source pollutants at the field scale. Department of Agriculture, Agricultural Research Service, 120pp.
Storm, B., Jensen, K.H., Refsgaard, J.C., 1988. Estimation of catchment rainfall uncertainty and its influence on runoff prediction. Nordic Hydrology 19, 77–88.
Styczen, M., Storm, B., 1993. Modelling of N-movements on catchment scale – a tool for analysis and decision making. 1. Model description. & 2. A case study. Fertiliser Research 36, 1–17.
Tietje, O., Tapkenhinrichs, M., 1993. Evaluation of pedo-transfer
functions. Soil Science Society of America Journal 57, 1088–1095.
Wood, E., O'Connell, P.E., 1985. Real-time forecasting. In: Anderson, M.G., Burt, T.P. (Eds.). Hydrological Forecasting. Wiley, New York, pp. 505–558.
Wood, E.F., Sivapalan, M., Beven, K.J., Band, L., 1988. Effects of spatial variability and scale with implications to hydrologic modelling. Journal of Hydrology 102, 29–47.
Zhang, H., Haan, C.T., Nofziger, D.L., 1993. An approach to estimating uncertainties in modelling transport of solutes through soils. Journal of Contaminant Hydrology 12, 35–50.
[12]
Refsgaard JC, Henriksen HJ (2004) Modelling guidelines – terminology and guiding principles. Advances in Water Resources, 27(1), 71-82.
Reprinted from Advances in Water Resources with permission from Elsevier
Advances in Water Resources 27 (2004) 71–82 www.elsevier.com/locate/advwatres
Modelling guidelines––terminology and guiding principles
Jens Christian Refsgaard *, Hans Jørgen Henriksen
Department of Hydrology, Geological Survey of Denmark and Greenland (GEUS), Øster Voldgade 10, Copenhagen DK-1350, Denmark
Received 29 October 2002; received in revised form 7 August 2003; accepted 18 August 2003
Abstract
Some scientists argue, with reference to Popper's scientific philosophical school, that models cannot be verified or validated. Other scientists and many practitioners nevertheless use these terms, but with very different meanings. As a result of an increasing number of examples of model malpractice and mistrust of the credibility of models, several modelling guidelines have been elaborated in recent years with the aim of improving the quality of modelling studies. This gap between the conflicting views and lack of consensus in the scientific community on the one hand, and the strongly perceived need for commonly agreed modelling guidelines on the other, is constraining the optimal use and benefits of models. This paper proposes a framework for quality assurance guidelines, including a consistent terminology and a foundation for a methodology bridging the gap between scientific philosophy and pragmatic modelling. A distinction is made between the conceptual model, the model code and the site-specific model. A conceptual model is subject to confirmation or falsification like scientific theories. A model code may be verified within given ranges of applicability and ranges of accuracy, but it can never be universally verified. Similarly, a model may be validated, but only with reference to site-specific applications and to pre-specified performance (accuracy) criteria. Thus, a model's validity will always be limited in terms of space, time, boundary conditions and types of application. This implies a continuous interaction between manager and modeller in order to establish suitable accuracy criteria and predictions associated with uncertainty analysis.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Model guidelines; Scientific philosophy; Validation; Verification; Confirmation; Domain of applicability; Uncertainty
1. Introduction
Models describing water flows, water quality and ecology are being developed and applied in increasing number and variety. With the requirements imposed by the EU Water Framework Directive the trend in recent years to base water management decisions to a larger extent on model studies and to use more sophisticated models is likely to be reinforced. At the same time insufficient attention is generally given to documenting the predictive capability of the models. Therefore, contradictions emerge regarding the various claims of model applicability on the one hand and the lack of documentation of these claims on the other hand. Hence, the credibility of the models is often questioned, and sometimes with good reason. As emphasised by e.g. Forkel [12] modelling studies involve several partners with different responsibilities.
* Corresponding author. Tel.: +45-38-14-27-76; fax: +45-38-14-20-50. E-mail address: [email protected] (J.C. Refsgaard).
0309-1708/$ - see front matter © 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.advwatres.2003.08.006
The 'key players' are code developers, model users and water resources managers. However, due to the complexity of the modelling process and the different backgrounds of these groups, gaps in terms of lack of mutual understanding easily develop. For example, the strengths and limitations of modelling applications are most often difficult, if not impossible, to assess by the water resources managers. Similarly, the transformation of water managers' objectives to specific performance criteria can be very difficult to assess for the model users. Due to lack of documentation and transparency, modelling projects can be difficult to audit, and without a considerable effort it is hardly possible to reconstruct, repeat and reproduce the modelling process and its results. In the water resources management community a number of different guidelines on good modelling practice have been prepared. One of the most, if not the most, comprehensive examples of modelling guidelines has been developed in The Netherlands [37] as a result of a process involving all the main players in the Dutch water management field. The background for this process was a perceived need for improving the quality in modelling
by addressing malpractice such as careless handling of input data, insufficient calibration and validation and model use outside its scope [34]. Similarly, the background for modelling guidelines for the Murray–Darling Basin in Australia was a perception among the end-users that model capabilities may have been 'over-sold', and that there is a lack of consistency in approaches, communication and understanding among and between modellers and water resources managers, often resulting in considerable uncertainty for decision making [25]. A key problem in relation to establishment of generally acceptable modelling guidelines is confusion on terminology. For example, the terms validation and verification are used with different, and sometimes interchangeable, meanings by different authors. The confusion arises from both semantic and philosophical considerations [32]. Another important problem is the lack of consensus related to the so far non-conclusive debate on the fundamental question concerning whether a water resources model can be validated or verified, and whether it as such can be claimed to be suitable or valid for particular applications [3,11,16,20,26]. Finally, modelling guidelines have to reflect and be in line with the underlying philosophy of environmental modelling, which has changed significantly during the past decades from what in retrospect may be called rather naive enthusiasm (see for example Freeze and Harlan [13]; Abbott [1]––many of us focussed on the huge potentials of sophisticated models outlined in these early days without reflecting too much on the associated limitations) to what now appears to be a much more balanced and mature view (e.g. Beven [7,9]). Thus, there is a gap between theory and practice, i.e. between the various, contradictory views and the lack of a common terminology and methodology in the scientific community on the one side, and the need of having quality assurance guidelines for practical model applications on the other side. The objective of the present paper is to establish guiding principles for quality assurance guidelines, including establishing a consistent terminology and a foundation for a methodology bridging the gap between scientific philosophy and pragmatic modelling.
2. Key opinions in the scientific community
The present paper does not attempt to provide a full review of all relevant papers on this subject. Rather, it provides a review of a few selected characteristic examples.
2.1. Terminology
No unique and generally accepted terminology and methodology exist at present in the scientific community
with respect to modelling protocol and guidelines for good modelling practice. Examples of general methodologies exist [4,32,33], but they use different terminology and have significant differences with respect to the underlying scientific philosophy. A rigorous and comprehensive terminology for model credibility was presented by Schlesinger et al. [33]. This terminology was developed by a committee composed of members from diverse disciplines and backgrounds with the intent that it could be employed in all types of simulation applications. In regard to terminology, distinctions are made between model qualification (adequacy of conceptual model), model verification (adequacy of computer programme) and model validation (adequacy of site-specific model). With the exception of a few important terms, such as generic model code and model calibration, which are not considered by Schlesinger et al. [33], their proposed terminology includes all the important elements of the modelling process. Konikow and Bredehoeft [20], in their thought-provoking paper, express the view that "the terms validation and verification have little or no place in groundwater science; these terms lead to a false impression of model capability". Their main argument relates to the anti-positivistic view that a theory (in this case a model) can never be proved to be generally valid, but may on the contrary be falsified by just one example. They argue and recommend that the term history matching, which does not indicate a claim of predictive capability, should be used instead. Oreskes et al. [26], in their classic and philosophically based paper, distinguish between verification, validation and confirmation:
• Verify is "an assertion or establishment of truth". To verify a model therefore means to demonstrate its truth. According to the authors "verification is only possible in closed systems in which all the components of the system are established independently and are known to be correct. In its application to models of natural systems, the term verification is highly misleading. It suggests a demonstration of proof that is simply not accessible". They argue that mathematical components are subject to verification, because they are part of closed systems, but numerical models in application cannot be verified because of uncertainty of input parameters, scaling problems and uncertainty in observations.
• The term validation is weaker than the term verification. Thus validation does not necessarily denote an establishment of truth, but rather "the establishment of legitimacy, typically given in terms of contracts, arguments and methods". They argue that "the term valid may be useful for assertions about a generic model code but is clearly misleading if used to refer to actual model results in any particular realisation".
• The term confirmation is weaker than the terms verification and validation. It is used with regard to a theory, when it is found that the theory is in agreement with empirical observations. As discussed below such agreement does not prove that the theory is true, it only confirms it.
Oreskes et al. [26] do not define how the terms verification and validation should be used, but rather define their meaning and set limitations to the contexts in which they can meaningfully be used. An important distinction is made between open and closed systems. A system is a closed system if its true conditions can be predicted or computed exactly. This applies to mathematics and mostly to physics and chemistry. Systems where the true behaviour cannot be computed due to uncertainties and lack of knowledge on e.g. input data and parameter values are called open systems. The systems we are dealing with in water resources management, based on geosciences, biology and socio-economy, are open systems. It may be argued that e.g. the behaviour of a groundwater flow system can be predicted correctly if all the details of the subsurface (soil system and geological system) media were known, because the fundamental physical laws governing the flow are known. However, in practice it will never be possible to know all the details of the media down to molecular scale, and hence uncertainties will always exist. For instance, several alternative representations of the subsurface system at microscopic scale will be able to provide the same flow field at a macroscopic scale. Therefore, the results from a groundwater flow model are said to be non-unique. In addition, as the system is a so-called open system, the boundary conditions generate further uncertainty. Matalas et al. [24] draw a distinction between the terms 'model' and 'theory'. They state that "a theory represents a synthesis of understanding, which provides not only a description of what constitutes the states of the system and their connectedness (i.e. postulated concepts), but also deducted consequences from these postulates. A model is an analogy or an abstraction, which ... may be derived intuitively and without formal deductive capability". Rykiel [32] argues that models can be validated as acceptable for pragmatic purposes, whereas theoretical validity is always provisional. In this respect he, like Matalas et al. [24], distinguishes between scientific models and predictive (engineering) models. Scientific models can be corroborated (confirmed) or refuted (falsified) in the sense of hypothesis testing, while predictive models can be validated or invalidated in the sense of engineering performance testing. Thus according to Rykiel [32], validation is not a procedure for testing scientific theory or for certifying the 'truth' of
current scientific understanding, but rather a testing of whether a model is acceptable for its intended use. Within the hydraulic engineering community attempts have been made to establish a common quality assurance methodology, IAHR [18]. The IAHR methodology comprises guidelines for standard validation documents, where validation of a software package is considered in four steps [10,23]: conceptual validation, algorithmic validation, software validation and functional validation. It is noted that the term validation in the IAHR methodology corresponds to what other authors call code verification, while schemes for validation of site-specific models are not included.
2.2. Scientific philosophical aspects of verification and validation
Different principal schools of philosophical thought exist on the issue of verification and validation. During the second half of the 19th century and the first half of the 20th century positivism was the dominant philosophical school. Matalas et al. [24] characterise the positivistic school in the following way: "... theories are proposed through inductive logic, and the proposed theories are confirmed or refuted on the basis of critical experiments designed to verify the consequences of the theories. And through theory reduction or adoption of new or modified theories, science is able to approach truth". The logical rationale behind positivism is the inductive method, i.e. the inference from singular statements, such as accounts of results of observations or experiments, to universal statements, such as hypotheses or theories. Popper [29] opposed the positivistic school, arguing that science is deductive rather than inductive, and that theories cannot be verified, only falsified. The deductive method implies inferences from a universal statement to a singular statement, where conclusions are logically derived from given premises. Science is considered as a hypothetico-deductive activity, implying that empirical observations must be framed as deductive consequences of a general theory or scientific law. If the observations can be shown to be true then the theory or law is said to be corroborated. Popper used the term corroborate instead of confirmation, because he "wanted a neutral term to describe the degree to which a theory has stood up to severe tests and proved its mettle". The greater the number and diversity of confirming observations, the more credible the theory or law becomes. But no matter how much data and how many confirmations we have, there will always be the possibility that more than one theory can explain the observations. Over time the false theories are likely to be confronted with observations that falsify them. Thus, scientific theories are never certain or proved but only hypotheses subject to corroboration or falsification.
Popper [29] distinguished between two kinds of universal statements: the 'strictly universal' and the 'numerical universal'. The strictly universal statements are those usually dealt with when speaking about theories or natural laws. They are a kind of 'all-statement' claiming to be true for any place and any time. In contrast, numerical universal statements refer only to a finite class of specific elements within a finite individual spatio-temporal region. A numerical universal statement is thus in fact equivalent to conjunctions of singular statements. Kuhn [21] also strongly criticised positivism, and in a discussion of selection of correct scientific theories (paradigms) states "... few philosophers of science still seek absolute criteria for the verification of scientific theories. Noting that no theory can ever be exposed to all possible relevant tests, they ask not whether a theory has been verified but rather about its probability in the light of the evidence that actually exists. And to answer that question one important school is driven to compare the ability of different theories to explain the evidence at hand." According to the deductive approach a given system is reduced into elements or sub-systems that are closed, i.e. without uncertainties from the boundary or initial conditions, and a given hypothesis is then confirmed by use of causal relationships and rigorous logic. The deductive method is the traditional scientific philosophy and methodology for 'exact sciences' such as physics and chemistry. Hansen [15] and Baker [5] argue that this deductive or 'theory-directed' scientific method is not suitable for earth sciences, such as geology and biology, which are characterised by open systems, and where many of the signs in the historical development process are not preserved. Instead, they argue for another scientific method, which they, respectively, denote 'holistic' or 'earth-directed'. The earth-directed scientific method does not focus on idealised theories verified in experimental laboratories. Instead, it is oriented towards observations in nature, uncontrolled by artificial constraints. The earth-directed method, being more 'soft' and accepting conclusions on the complex state of nature from an integration of many observations, but without the logically rigorous proof required by the deductive method, can be argued to be well in line with Popper's philosophy, where scientific knowledge comprises a variety of falsifiable theories that are subject to tests against observations [15].
2.3. Philosophy of environmental modelling
Following several papers (ranging from Beven [6] to [7]) with comprehensive critique against the predominant philosophy underlying most environmental modelling, Beven [9] outlines a new philosophy for modelling of environmental systems. The basic aim of this new
approach is to extend the most common, past approach with a more realistic account of uncertainty, rejecting the idea of being able to identify only one optimal model as being the most reliable for a given case. His basic idea is in line with Oreskes et al. [26] that verification and validation of environmental models is impossible, because natural systems are open. Instead, environmental models may be non-unique and subject to only a conditional confirmation, due to e.g. errors in model structure, calibration of parameters and period of data used for evaluation. Due to this there will always be the possibility of equifinality, in that many different model structures and parameter sets may give simulations that cannot be falsified from the available observational data. Beven therefore argues that the range of behavioural models (structures and parameter sets) is best represented in terms of mapping of the 'landscape space' into the 'model space', and that uncertainty predictions should consider all the behavioural models.
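The equifinality idea can be made concrete with a small illustration. The following Python fragment is not part of the original paper and does not reproduce any particular published algorithm; it is a minimal sketch in the spirit of Beven's GLUE approach, assuming a hypothetical model interface run_model(params, forcing) and an observed series obs that the reader would have to supply. Parameter sets whose likelihood measure exceeds a behavioural threshold are all retained, and their spread, rather than a single 'optimal' simulation, is reported as prediction uncertainty.

```python
import numpy as np

def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency, used here as a simple likelihood measure."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def glue_prediction_bounds(run_model, forcing, obs, param_ranges,
                           n_samples=2000, threshold=0.5, seed=1):
    """Keep all 'behavioural' parameter sets and return 5-95% prediction bounds.

    run_model(params, forcing) -> simulated series (hypothetical interface).
    param_ranges: dict mapping parameter name -> (low, high) sampling range.
    Simplified: unweighted bounds; full GLUE would weight by the likelihood.
    """
    rng = np.random.default_rng(seed)
    behavioural = []
    for _ in range(n_samples):
        params = {name: rng.uniform(lo, hi)
                  for name, (lo, hi) in param_ranges.items()}
        sim = run_model(params, forcing)
        if nash_sutcliffe(sim, obs) >= threshold:   # behavioural acceptance test
            behavioural.append(np.asarray(sim, float))
    if not behavioural:
        raise RuntimeError("No behavioural parameter sets found; "
                           "relax the threshold or revisit the conceptual model.")
    ensemble = np.vstack(behavioural)
    return (np.percentile(ensemble, 5, axis=0),
            np.percentile(ensemble, 95, axis=0),
            len(behavioural))
```

Typically many parameter sets pass such a test, which is exactly the equifinality described above; the width of the resulting bounds, rather than any single simulation, is then what is communicated to the water resources manager.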
3. Proposed terminology and methodological framework
The following terminology is inspired by the generalised terminology for model credibility proposed by Schlesinger et al. [33], but modified and extended to accommodate some of the scientific philosophical issues raised above. The simulation environment is divided into four basic elements as shown in Fig. 1. The inner arrows describe the processes that relate the elements to each other, and the outer circle refers to the procedures that evaluate the credibility of these processes. In general terms a model is understood as a simplified representation of the natural system it attempts to describe. However, in the terminology proposed below a distinction is made between three different meanings of the general term model, namely the conceptual model, the model code and the model, which here is defined as a site-specific model. The most important elements in the terminology and their interrelationships are defined as follows:
Reality: The natural system, understood here as the study area.
Conceptual model: A description of reality in terms of verbal descriptions, equations, governing relationships or 'natural laws' that purport to describe reality. This is the user's perception of the key hydrological and ecological processes in the study area (perceptual model) and the corresponding simplifications and numerical accuracy limits that are assumed acceptable in order to achieve the purpose of the modelling. A conceptual model thus includes both a mathematical description (equations) and a description of flow processes, river system elements, ecological structures, geological features, etc. that are required for the particular purpose of modelling. By drawing an analogy to the scientific
Fig. 1. Elements of a modelling terminology. Modified after Schlesinger et al. [33].
philosophical discussion above, the conceptual model in other words constitutes the scientific hypothesis or theory that we assume for our particular modelling study.
Model code: A mathematical formulation in the form of a computer program that is so generic that it, without program changes, can be used to establish a model with the same basic type of equations (but allowing different input variables and parameter values) for different study areas.
Model: A site-specific model established for a particular study area, including input data and parameter values.
Model confirmation: Determination of adequacy of the conceptual model to provide an acceptable level of agreement for the domain of intended application. This is in other words the scientific confirmation of the theories/hypotheses included in the conceptual model.
Code verification: Substantiation that a model code is in some sense a true representation of a conceptual model within certain specified limits or ranges of application and corresponding ranges of accuracy.
Model calibration: The procedure of adjustment of parameter values of a model to reproduce the response of reality within the range of accuracy specified in the performance criteria.
Model validation: Substantiation that a model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model.
Model set-up: Establishment of a site-specific model using a model code. This requires, among other things, the definition of boundary and initial conditions and parameter assessment from field and laboratory data.
Simulation: Use of a validated model to gain insight into reality and obtain predictions that can be used by water managers. This includes insight into how reality can be expected to respond to human interventions. In this connection uncertainty assessments of the model predictions are very important.
Performance criteria: Level of acceptable agreement between model and reality. The performance criteria apply both for model calibration and model validation.
Domain of applicability (of conceptual model): Prescribed conditions for which the conceptual model has been tested, i.e. compared with reality to the extent possible and judged suitable for use (by model confirmation).
Domain of applicability (of model code): Prescribed conditions for which the model code has been tested, i.e. compared with analytical solutions, other model codes or similar to the extent possible and judged suitable for use (by code verification).
Domain of applicability (of model): Prescribed conditions for which the site-specific model has been tested, i.e. compared with reality to the extent possible and judged suitable for use (by model validation).
The credibility of the descriptions or the agreements between reality, conceptual model, model code and model are evaluated through the terms confirmation, verification, calibration and validation. Thus, the relation between reality and the scientific description of reality which is constituted by the conceptual model with its theories and equations on flow and transport processes, its interpretation of the geological system and ecosystem at hand, etc., is evaluated through the confirmation of the conceptual model. As a logical consequence of our
position on scientific methodology, we use the term confirmation in connection with conceptual model. This implies that we agree that it is never possible to prove the truth of a theory/hypothesis and as such of a conceptual model. And even if a site-specific model is eventually accepted as valid for specific conditions, this is not a proof that the conceptual model is true, because, due to non-uniqueness, the site-specific model may turn out to perform right for the wrong reasons. Methods for conceptual model confirmation should follow the standard procedures for confirmation of scientific theories. This implies that conceptual models should be confronted with actual field data and be subject to critical peer reviews. Furthermore, the feedback from the calibration and validation process may also serve as a means by which one or a number of alternative conceptual model(s) may be either confirmed or falsified. The ability of a given model code to adequately describe the theory and equations defined in the conceptual model by use of numerical algorithms is evaluated through the verification of the model code. Use of the term verification in this respect is in accordance with Oreskes et al. [26], because mathematical equations are closed systems. The methodologies used for code verification include comparing a numerical solution with an analytical solution or with a numerical solution from other verified codes. However, some programme errors only appear under circumstances that do not routinely occur, and may not have been anticipated. Furthermore, for complex codes it is virtually impossible to verify that the code is universally accurate and error-free. Therefore, the term code verification must be qualified in terms of specified ranges of application and corresponding ranges of accuracy. A code may be applied outside its documented ranges of application, but in such cases it must not carry the label 'verified' and caution should be expressed with respect to its results. The application of a model code to be used for setting up a site-specific model is usually associated with model calibration. The model performance during calibration depends on the quantity and quality of the available input and observation data as well as on the conceptual model. If sufficient accuracy cannot be achieved, either the conceptual model and/or the data have to be re-evaluated. A discussion of the problems and methodologies in model calibration is provided by Gupta et al. [14]. Often the model performance during calibration is used as a measure of the predictive capability of a model. This is a fundamental error. Many studies (e.g. Refsgaard and Knudsen [31]; Liden [22]) have demonstrated that the model performance against independent data not used for calibration is generally poorer than the performance achieved in the calibration situation. Therefore, the credibility of a site-specific model's
capability to make predictions about reality must be evaluated against independent data. This process is denoted model validation. In designing suitable model validation tests a guiding principle should be that a model should be tested to show how well it can perform the kind of task for which it is specifically intended [19]. This implies for instance that for the case where a model is intended to be used for conditions similar to conditions where test data exist, such as extension of streamflow records, a standard split-sample test may be applied. However, models are often intended to be used as management tools to help answer questions such as: What happens to the water resources if land use is changed? In such case no site-specific test data exist and the question of defining a validation test scheme becomes non-trivial.
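As a concrete illustration of the distinction between calibration and validation, the sketch below is not taken from the paper; the calibrate and run_model functions, the split point, the assumed observational uncertainty and the acceptance threshold are all hypothetical placeholders. It carries out a simple split-sample test: parameters are estimated on one period, and the pre-agreed performance criterion is evaluated only against the independent remainder of the record.

```python
import numpy as np

def split_sample_validation(run_model, calibrate, forcing, obs,
                            split_index, criterion=0.7, obs_uncertainty=0.1):
    """Split-sample test: calibrate on the first period, validate on the rest.

    calibrate(forcing, obs) -> parameter set       (hypothetical interface)
    run_model(params, forcing) -> simulated series (hypothetical interface)
    criterion: minimum Nash-Sutcliffe efficiency agreed beforehand with the
               manager for the intended type of application.
    obs_uncertainty: assumed relative uncertainty of the observations, used
               only to report how much of the validation period falls within it.
    """
    # Calibration uses the first sub-period only
    params = calibrate(forcing[:split_index], obs[:split_index])

    # Validation is carried out strictly against independent data
    obs_val = np.asarray(obs[split_index:], float)
    sim_val = np.asarray(run_model(params, forcing[split_index:]), float)

    nse = 1.0 - np.sum((sim_val - obs_val) ** 2) / np.sum((obs_val - obs_val.mean()) ** 2)
    within_band = np.mean(np.abs(sim_val - obs_val) <= obs_uncertainty * np.abs(obs_val))

    return {"validation_nse": float(nse),
            "fraction_within_obs_uncertainty": float(within_band),
            "valid_for_intended_use": nse >= criterion}
```

For the non-trivial cases mentioned above, e.g. predicting the effect of a land use change for which no site-specific test data exist, the same structure would have to be replaced by a proxy-basin or differential split-sample type of test, as discussed further in Section 4.2.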
4. Discussion
4.1. Scientific philosophical aspects
The fundamental view expressed by scientific philosophers is that verification and validation of numerical models of natural systems is impossible, because natural systems are never closed and because the mapping of model results is always non-unique [26]. Thus, seen from a theoretical point of view it is tempting to conclude that the establishment of modelling guidelines comprising these terms simply is not possible. On the other hand, there is a large and increasing need to establish guidelines to improve the quality of modelling, and such guidelines need to address the issues of verification and validation in order to be operational in practice. Irrespective of what the scientific community decides regarding terminology and validation methodology, including the associated philosophical aspects, models are being used more and more to support water resources management in practice. As long as the present situation continues, characterised by a large degree of confusion on terminology and methodology, the potential benefits of using models are severely constrained. They are often subject to either 'overselling' or 'mistrust', and misunderstandings between model users and water resources managers may easily occur in the absence of a commonly accepted and understood 'language'. Thus, establishment of a terminology and methodology that bridge the gap between scientific philosophy and pragmatic modelling is a key challenge and an important one. This gap between a scientific philosophical and a pragmatic modelling position is also clearly reflected in the dialogue between Konikow and Bredehoeft [20] and De Marsily et al. [11]. Following the Popperian school, Konikow and Bredehoeft [20] express the view that "the terms validation and verification have little or no place
in ground-water science; these terms lead to a false impression of model capability". De Marsily et al. [11], in a response, argue for a more pragmatic view: "... using the model in a predictive mode and comparing it with new data is not a futile exercise; it makes a lot of sense to us. It does not prove that the model will be correct for all circumstances, it only increases our confidence in its value. We do not want certainty; we will be satisfied with engineering confidence." With regard to scientific methodology we fundamentally agree with the views of Popper [29] and the earth-directed theoretical method described by Baker [5]. Consequently, we agree with the view of Oreskes et al. [26], Konikow and Bredehoeft [20] and many others that it is not possible to carry out model verification or model validation, if these terms are used without restriction to domains of applicability and levels of accuracy. The restrictions in use of the terms confirmation, verification and validation imposed by the respective domains of applicability imply, according to Popper's views, that the conceptual model, model code and site-specific models can only be classified as numerical universal statements as opposed to strictly universal statements. This distinction is fundamental for our proposed methodology and its link to scientific philosophical theories.
4.2. Model confirmation, verification and validation
An important aspect of our proposed methodology lies in the separation between the three different 'versions' of the word model, namely the conceptual model, the model code and the site-specific model. This separation is in line with Matalas et al. [24] and Rykiel [32], who distinguish between the theory (conceptual model) and the engineering model (the site-specific model). Similarly, Schlesinger et al. [33] distinguish between conceptual model and computerised model. Schlesinger et al. [33], Matalas et al. [24] and Rykiel [32] do not separate the model code from the site-specific model. Due to this distinction it is possible, at a general level, to talk about confirmation of a theory or a hypothesis about how nature can be described using the relevant scientific method for that purpose, and, at a site-specific level, to talk about validity of a given model within certain domains of applicability and associated with specified accuracy limits. As Beven [9] argues, we need to distinguish between our qualitative understanding (perceptual model) and the practical implementation of that understanding in our conceptual model. As we have defined a conceptual model as a combination of a perceptual model and the simplifications acceptable for a particular model study, a conceptual model becomes site-specific and even case specific. For example, a conceptual model of a groundwater
aquifer may be described as two-dimensional for a study focussing on regional groundwater heads, while it may need to include more complex three-dimensional geological structures for detailed simulation of solute transport studies. Confirmation of a conceptual model is a non-trivial issue. It is hardly possible to prescribe general test procedures, in particular not exact tests. Conceptual models are more difficult in some domains than in others. For example, the process descriptions/equations and the actual system are relatively easily identifiable in a hydrodynamic river flow system as compared to a groundwater system or an ecosystem, because the geology will never be completely known in a groundwater system and the biological processes may not be well known in an ecosystem. The more complex and difficult the conceptual model becomes, the more 'soft' the confirmation tests may turn out to be. Thus, expert knowledge in terms of peer reviews may be an important element of such tests. In cases where considerable uncertainty exists in the conceptual model, the possibility of testing alternative conceptual models should be promoted. An example of this is given by Troldborg [35], who reports a study where three scientists developed alternative geological interpretations for the same area, and three numerical groundwater models were set up and calibrated on this basis. During this process, or in the subsequent validation phase, one or more of these models may turn out to perform so poorly that the underlying conceptual model has to be rejected. This approach of building the uncertainty of our knowledge of reality into alternative conceptual models, which are subsequently subject to a confirmation test, is fully in line with Popper's scientific philosophical school. Unfortunately, this is very seldom pursued in practice. Code verification is not an activity that is carried out from scratch in every modelling study. In a particular study it has to be ascertained that the domain of applicability for which the selected model code has been verified covers the conditions specified in the actual conceptual model. If that is not the case, additional verification tests have to be conducted. Otherwise, the code must explicitly be classified as not verified for this particular study, and the subsequent simulation results therefore have to be considered with extra caution. Establishment of validation test schemes for the situations where the split-sample test is not sufficient is an area where limited work has been carried out so far. The only rigorous and comprehensive methodology reported in the literature is that of Klemes [19]. He proposed a systematic scheme of validation tests, where a distinction is made between simulations conducted for the same catchment as was used for calibration (split-sample test) and simulations conducted for ungauged catchments (proxy-basin tests). He also distinguished between
cases where catchment conditions such as climate, land use and ground water abstraction are stationary (split-sample test) and cases where they are not (differential split-sample test). A further discussion, including examples, of Klemes's test scheme is given in Refsgaard [30]. The two key principles are: (a) the validation tests must be carried out against independent data, i.e. data that have not been used during calibration, and (b) the model should be tested to show how well it can perform the kind of task for which it is specifically intended to be applied subsequently. This implies e.g. that multi-site validation is needed if predictions of spatial patterns are required, and multi-variable checks are required if predictions of the behaviour of individual subsystems within a catchment are needed. Thus, a model should only be assumed valid with respect to outputs that have been explicitly validated. This means for instance that a model which is validated against catchment runoff cannot automatically be assumed valid also for simulation of erosion on a hillslope within the catchment, because smaller scale processes may dominate here; it will need validation against hillslope soil erosion data. From a theoretical point of view the procedures outlined by Klemes [19] for the proxy-basin and the differential split-sample tests, where tests have to be carried out using data from similar catchments, are weaker than the usual split-sample test, where data from the specific catchment are available. However, no obviously better testing schemes exist. Therefore, this will have to be reflected in the performance criteria in terms of larger expected uncertainties in the predictions. It must be realised that the validation test schemes proposed above are so demanding that many applications today would fail to meet them. Thus, for many cases where either proxy-basin or differential split-sample tests are required, suitable test data simply do not exist. This is for example the case for prediction of regional scale transport of potential contamination from underground radionuclide deposits over the next thousands of years. In such cases model validation is not possible. This does not imply that these modelling studies are not useful, only that their output should be recognised to be somewhat more uncertain than is often stated and that the term 'validated model' should not be used. Thus, a model's validity will always be confined in terms of space, time, boundary conditions, types of application, etc. According to the methodology, model validation implies substantiating that a site-specific model can produce simulation results within the range of accuracy specified in the performance criteria for the particular study. Hence, before carrying out the model calibration and the subsequent validation tests, quantitative performance criteria must be established. In determining the acceptable level of accuracy a trade-off will, either explicitly or implicitly, have to be made between costs,
in terms of data collection and modelling work, and associated benefits that can be obtained due to more accurate model results. Consequently, the acceptable level of accuracy will vary from case to case and must be seen in a socio-economic context. It should therefore usually not be defined by the modeller, but in a dialogue between the modeller and the manager.
4.3. Need for interaction between manager, code developer and modeller
As discussed above, the validation methodologies presently used, even in research projects, are generally not rigorous and far from satisfactory. At the same time models are being used in practice, and claims on the validity of models are being made daily on the basis of, at best, not very strict and rigorous test schemes. An important question, then, is how the situation can be improved in the future. As emphasised by Forkel [12] improvements cannot be achieved by the research community alone, but require an interaction between the three main 'players', namely water resources managers, code developers and model users (modellers). The key responsibilities of the water resources manager are to specify the objectives and define the acceptance limits of accuracy performance criteria for the model application. Furthermore, it is the manager's responsibility to define requirements for code verification and model validation. In many consultancy jobs accuracy criteria and validation requirements are not specified at all, with the result being that the model user implicitly defines them in accordance with the achieved model results. In this respect it is important in the terms of reference for a given model application to ensure consistency between the objectives, the specified accuracy criteria, the data availability and the financial resources. In order for the manager to make such evaluations, some knowledge on the modelling process is required. The model user has the responsibility for selection of a suitable code as well as for construction, calibration and validation of the site-specific model. In particular, the model user is responsible for preparing validation documents in such a way that the domain of applicability and the range of accuracy of the model are explicitly specified. Furthermore, the documentation of the modelling process should ideally be done in enough detail that it can be repeated several years later, if required. The model user has to interact with the water resources manager on assessments of realistic model accuracies. Furthermore, the model user must be aware of the capabilities and limitations of the selected code and interact with the code developer with regard to reporting of user experience such as shortcomings in documentation, errors in code, market demands for extensions, etc.
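To make the model user's documentation responsibility more tangible, the fragment below sketches, purely as an illustration and not as any standard prescribed by the paper, how the key items of the proposed terminology (domain of applicability, performance criteria, validation results) could be recorded in a structured way so that a model application can be audited and repeated later. All field names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ValidationRecord:
    """One documented validation test of a site-specific model."""
    test_type: str    # e.g. "split-sample", "proxy-basin", "differential split-sample"
    variable: str     # e.g. "catchment runoff", "groundwater head"
    data_basis: str   # period or site of the independent test data
    criterion: str    # pre-agreed performance criterion
    achieved: str     # achieved performance
    passed: bool

@dataclass
class ModelDocumentation:
    """Minimal audit trail for a site-specific model application."""
    conceptual_model: str                        # assumptions, geology, processes
    model_code: str                              # code name and verified version
    domain_of_applicability: Dict[str, str]      # space, time, boundary conditions, use
    calibration_data: List[str]
    validations: List[ValidationRecord] = field(default_factory=list)

    def is_valid_for(self, variable: str) -> bool:
        """A model is only assumed valid for outputs that were explicitly validated."""
        return any(v.passed and v.variable == variable for v in self.validations)
```

The is_valid_for check reflects the principle, elaborated in Section 4.2 above, that a model validated against one type of output cannot automatically be claimed valid for another.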
The key responsibilities of the developer of the model code are to develop and verify a model code. In this connection it is important that the capabilities and limitations of the code appear in the documentation. As code development is a continuous process, code maintenance and regular updating with new versions, improved as a response to user reactions, become important. Although a model code should be comprehensively documented, doubts about its functioning will in practice still arise once in a while, even for experienced users. Hence, active support to and dialogue with model users are crucial for ensuring operational model applications at a high professional level.
4.4. Performance criteria––when is a model good enough?
A critical issue in relation to the methodological framework is how to define the performance criteria. We agree with Beven [9] that any conceptual model is known to be wrong and hence any model will be falsified if we investigate it in sufficient detail and specify very high performance criteria. Clearly, if one attempts to establish a model that should simulate the truth it would always be falsified. However, this is not very useful information. Therefore, we are using the conditional validation, or the validation restricted to a domain of applicability (or numerical universal as opposed to strictly universal in Popperian terms). The key question is then: what is good enough? Or in other words, what are the criteria? How do we select them? A good reference for model performance is to compare it with the uncertainties of the available field observations. If the model performance is within this uncertainty range we often characterise the model as good enough. However, usually it is not so simple. How wide confidence bands do we accept on observational uncertainties––ranges corresponding to 65%, 95% or 99%? Do we always then reject a model if it cannot perform within the observational uncertainty range? In many cases even results from less accurate models may be very useful. Therefore, our answer is that the decision on what is good enough generally must be taken in a socio-economic context. For instance, the accuracy requirements for a model to be used for an initial screening of alternative options for location of a new small well field for a small water supply will be much smaller than the requirements for a model that is intended to be used for the final design of a large well field for a major water supply in an area with potentially damaging effects on precious nature areas and other significant conflicts of interests. Thus, we believe that the accuracy criteria cannot be decided universally by modellers or researchers, but must be different from case to case depending on how much is at stake in the decision to depend on the support from model predictions. This implies that the performance
criteria must be discussed and agreed between the manager and the modeller beforehand. However, as the modelling process and the underlying study progress, with improved knowledge on the data and model uncertainties as well as on the risk perception of the concerned stakeholders, it may well be required to adjust the performance criteria in a sort of adaptive project management context [27].
4.5. The role of uncertainty assessments
Should we then trust a model if it happens to pass a validation test? Are we sure that this model is the best one and that the underlying conceptual basis and input data are basically correct? Yes, on the one hand, in such a case we may trust a model as a suitable tool to make predictions through model simulations. But on the other hand, we can never be sure that a model that passes a validation test will have a sound conceptual basis. It could be right for the wrong reasons, e.g. by compensating errors in the conceptual model (model structure) with errors in parameter values. And we know that it would be possible to find many other models that can pass the validation test, and that it would not be possible beforehand to identify one of these models as the best one in all respects. Having realised this equifinality problem, the relevant question is what we should do to address it in practical cases. In this respect our framework prescribes that model predictions (see definition of 'simulation' in Section 3) made subsequent to passing a validation test should include uncertainty assessments. Hence, we basically agree with Beven [9] that uncertainty assessments are necessary, and that such uncertainty analyses should include uncertainty on model structure, parameter values etc. Different methodologies exist for conducting uncertainty assessments, e.g. Beven [8] and Van Asselt and Rotmans [36].
5. Guiding principles and future perspectives for modelling guidelines
5.1. Guiding principles
In our opinion the two key factors causing the poor quality of much modelling work in practice are: (a) the poor quality of the modelling work done by practitioners (inadequate use of guidelines and quality assurance procedures and inadequate role play between manager (client) and modeller (consultant)) and (b) lack of data and methodology in the hydrological science. Modelling guidelines like [25,37] almost exclusively address the former issue, while scientific literature like [7,9] focuses on the latter issue. In our opinion it is crucial that the two lines of action are combined. This implies that we need to define modelling guidelines that are both operational
in practice and scientifically founded. The framework we have described here attempts to establish one such bridge between the two fields, i.e. pragmatic modelling and natural science. An important aspect of this framework is, in a scientifically consistent way, to enable the manager and the modeller to make the compromises that are required in practice. On this background the following five key principles for pragmatic modelling have emerged:
• A terminology that is internally consistent. We acknowledge that many authors in the scientific literature use different terminology and that, in particular, some authors do not use the terms verification and validation. However, these terms are also widely used, and we need in practice to have understandable terms for these operations. Thus, with the clear distinction between conceptual model, model code and site-specific model and the restrictions to domains of applicability (numerical universal in the Popperian sense), we believe that our terminology is in accordance with the main stream of scientific philosophy.
• We never talk about universal code verification or universal model validation, but always restrict these terms to clearly defined domains of applicability. This is a necessary assumption for the consistency of the terminology and methodology and must be emphasised explicitly in any guidelines.
• Validation tests against independent data that have not also been used for calibration are necessary in order to be able to document the predictive capability of a model.
• Model predictions achieved through simulation should be associated with uncertainty assessments where, amongst others, the uncertainty in model structure and parameter values should be accounted for.
• A continuous interaction between manager and modeller is crucial for the success of the modelling process. One of the key aspects in this regard is to establish suitable performance criteria for the model calibration and validation tests. This dialogue is also very important in connection with uncertainty assessments.
5.2. Future challenges
Some of the issues dealt with in the present manuscript are still not fully explored. The four most important future challenges are:
• Establishment of accuracy criteria for a modelling study is a very important issue and one where we maybe differ from most scientific literature. Modellers often establish numerical accuracy criteria in order to classify the goodness of a given model [2,17,28]. These attempts are very useful in making the performance more transparent and quantitative, but do not
provide an objective means to decide what the optimal accuracy criteria really should be in a given case. According to our framework no universal accuracy criteria can be established, i.e. it is generally not possible from a natural scientific point of view to tell when a model performance is good enough. Such acceptance criteria will vary from case to case depending on the socio-economic context, i.e. what is at stake in the decisions to be supported by the model predictions. The good question now is: how do we translate the 'soft' socio-economic objectives to 'hard-core' model performance criteria? This is obviously a challenge that cannot be solved by natural science alone, but needs to be addressed in a much broader context including aspects of economy, stakeholder interests and risk perception. Until we become better at overcoming this challenge we will, however, not be able to arrive at the optimal balance between the costs of modelling and the derived societal benefits. Although this work has hardly begun yet, and we know that it is a very difficult road, we see no real alternative. • Although all experience shows that models generally perform poorer in validation tests against independent data than they do in calibration tests, model validation is in our opinion a much neglected issue, both in many modelling guidelines and in the scientific literature. Maybe many scientists have not wanted to use the term validation due to the controversies related to scientific philosophy, but in any case many scientists are not advocating the need for model validation. One of the unfortunate consequences of this 'lack of interest' is that not much work has been devoted to developing suitable validation test schemes since Klemes [19]. In our opinion further development of suitable testing schemes and imposing them on all modelling projects is a major future challenge. • A third issue that requires considerable attention is how we decide among alternative model structures and parameter sets (the equifinality problem). If we use multiple criteria, one model may be better on one criterion and another model on another criterion. In our opinion we need not necessarily choose. We know that all conceptual models are wrong and we know that wrong conceptual models are compensated by biased model parameter values through calibration. But, unless we can falsify a conceptual model directly, which is very difficult, or unless the resulting model is falsified through the validation test, this model is a possible candidate for predictions. And if several models pass the validation tests we may not be able to tell which one is the best. In such a case they should all be considered suitable, and the fact that they provide different predictive results should be used as part of the uncertainty assessments. Work on this relatively
new paradigm has just begun [9] and a lot of work is still required to further develop and operationalise it. • Finally, there are many more challenges related to uncertainty in water resources management. Quality assurance and uncertainty assessments are two aspects that are very closely linked. Initially, the manager has to define accuracy criteria from a perception of which uncertainty level he believes is suitable in a particular case (see above). Subsequently, as the modelling study proceeds, the dialogue between modeller and manager has to continue with the necessary trade-off between modelling accuracy and the cost of the modelling study. In the uncertainty assessments it is very important to go beyond the traditional statistical uncertainty analysis. Thus, e.g. aspects of scenario uncertainty and ignorance should generally be included and, in addition, the uncertainties originating from data and models often need to be integrated with socio-economic aspects in order to form a suitable basis for the further decision process [36]. Thus, like with the accuracy criteria (above), the use of uncertainty assessments in water resources management goes beyond natural science. Acknowledgements The present work was carried out within the Project 'Harmonising Quality Assurance in model based catchments and river basin management (HarmoniQuA)', which is partly funded by the EC Energy, Environment and Sustainable Development programme (Contract EVK2-CT2001-00097). The constructive comments and suggestions on the manuscript by the HarmoniQuA project team and by our colleague William (Bill) G. Harrar are acknowledged. Finally, the constructive criticisms by Keith Beven, University of Lancaster; Rodger Grayson, University of Melbourne; and a third, anonymous referee helped to improve the manuscript significantly.
References [1] Abbott MB. The theory of the hydrological model, or: the struggle for the soul of hydrology. In: O’Kane JP, editor. Advances in theoretical hydrology. Elsevier; 1992. p. 237–54. [2] Andersen J, Refsgaard JC, Jensen KH. Distributed hydrological modelling of the Senegal River Basin––model construction and validation. J Hydrol 2001;247:200–14. [3] Anderson MG, Bates PD, editors. Model validation: perspectives in hydrological science. John Wiley and Sons; 2001. [4] Anderson MP, Woessner WW. The role of postaudit in model validation. Adv Water Resour 1992;15:167–73. [5] Baker VR. Conversing with the Earth: the geological approach to understanding. In: Frodeman R, editor. Earth matters The earth science, philosophy and the claims of community. Prentice Hall; 2000. [6] Beven K. Changing ideas in hydrology––the case of physically based models. J Hydrol 1989;105:157–72.
[7] Beven K. Towards an alternative blueprint for a physically based digitally simulated hydrologic response modelling system. Hydrol Process 2002;16(2):189–206. [8] Beven K, Binley AM. The future of distributed models: model calibration and uncertainty prediction. Hydrol Process 1992;6:279–98. [9] Beven K. Towards a coherent philosophy for modelling the environment. Proc Roy Soc Lond A 2002;458(2026):2465–84. [10] Dee DP. A pragmatic approach to model validation. In: Lynch DR, Davies AM, editors. Quantitative skill assessment of coastal ocean models. Washington: AGU; 1995. p. 1–13. [11] De Marsily G, Combes P, Goblet P. Comments on 'Ground-water models cannot be validated', by Konikow LF, Bredehoeft JD. Adv Water Resour 1992;15:367–9. [12] Forkel C. Das numerische Modell – ein schmaler Grat zwischen vertrauenswürdigem Werkzeug und gefährlichem Spielzeug. Presented at the 26. IWASA, RWTH Aachen, 4–5 January 1996. [13] Freeze RA, Harlan RL. Blueprint for a physically-based digitally-simulated hydrologic response model. J Hydrol 1969;9:237–58. [14] Gupta HV, Sorooshian S, Yapo PO. Toward improved calibration of hydrologic models: multiple and noncommensurable measures of information. Water Resour Res 1998;34(4):751–63. [15] Hansen JM. The line in the sand, the wave on the water – Steno's theory on the language of nature and the limits of the knowledge. Copenhagen: Fremad; 2000. 440 pp (in Danish). [16] Hassanizadeh SM, Carrera J. Editorial, special issue on validation of geo-hydrological models. Adv Water Resour 1992;15:1–3. [17] Henriksen HJ, Troldborg L, Nyegaard P, Sonnenborg TO, Refsgaard JC, Madsen B. Methodology for construction, calibration and validation of a national hydrological model for Denmark. J Hydrol 2003;280(1–4):52–71. [18] IAHR. Publication of guidelines for validation documents and call for discussion. Int Assoc Hydraul Res Bull 1994;11:41. [19] Klemes V. Operational testing of hydrological simulation models. Hydrol Sci J 1986;31:13–24. [20] Konikow LF, Bredehoeft JD. Ground-water models cannot be validated. Adv Water Resour 1992;15:75–83. [21] Kuhn TS. The structure of scientific revolutions. Chicago: University of Chicago Press; 1962. [22] Liden R. Conceptual runoff models for material transport estimations. PhD dissertation, Report No. 1028, Lund Institute of Technology, Lund University, Sweden, 2000. [23] Los H, Gerritsen H. Validation of water quality and ecological models. Presented at the 26th IAHR Conference, London, Delft Hydraulics, 11–15 September 1995, 8 pp. [24] Matalas NC, Landwehr JM, Wolman MG. Prediction in water management. In: Scientific basis of water resource management. Washington, DC: National Research Council, National Academy Press; 1982. p. 118–27. [25] Middlemis H. Murray–Darling Basin Commission. Groundwater flow modelling guideline. Aquaterra Consulting Pty Ltd, South Perth, Western Australia. Project no. 125, 2000. [26] Oreskes N, Shrader-Frechette K, Belitz K. Verification, validation and confirmation of numerical models in the earth sciences. Science 1994;264:641–6. [27] Pahl-Wostl C. Towards sustainability in the water sector – the importance of human actors and processes of social learning. Aquat Sci 2002;64:394–411. [28] Parkin G, O'Donnell GO, Ewen J, Bathurst JC, O'Connel PE, Lavabre J. Validation of catchment models for predicting land-use and climate change impacts. 2. Case study for a Mediterranean catchment. J Hydrol 1996;175:595–613. [29] Popper KR. The logic of scientific discovery. London: Hutchinson & Co; 1959.
[30] Refsgaard JC. Towards a formal approach to calibration and validation of models using spatial data. In: Grayson R, Blöschl G, editors. Spatial patterns in catchment hydrology: observations and modelling. Cambridge University Press; 2001. p. 329–54. [31] Refsgaard JC, Knudsen J. Operational validation and intercomparison of different types of hydrological models. Water Resour Res 1996;32(7):2189–202. [32] Rykiel ER. Testing ecological models: the meaning of validation. Ecol Modell 1996;90:229–44. [33] Schlesinger S, Crosbie RE, Gagne RE, Innis GS, Lalwani CS, Loch J, et al. Terminology for model credibility. SCS Tech Comm Model Credibil Simul 1979;32(3):103–4. [34] Scholten H, Van Waveren RH, Groot S, Van Geer FC, Wösten JHM, Koeze RD, et al. Good modelling practice in water management. Paper presented on Hydroinformatics 2000, Cedar Rapids, IA, USA, 2000. [35] Troldborg L. Effects of geological complexity on groundwater age prediction. Poster Session 62C, AGU December 2000. EOS Transactions, 81(48), F435. [36] Van Asselt MBA, Rotmans J. Uncertainty in integrated assessment modelling – from positivism to pluralism. Climat Change 2002;54(1–2):75–105. [37] Van Waveren RH, Groot S, Scholten H, Van Geer FC, Wösten JHM, Koeze RD, et al. Good modelling practice handbook. STOWA Report 99-05, Utrecht, RWS-RIZA, Lelystad, The Netherlands. Available from: http://waterland.net/riza/aquest/.
[13]
Refsgaard JC, Henriksen HJ, Harrar WG, Scholten H, Kassahun A (2005) Quality assurance in model based water management – review of existing practice and outline of new approaches. Environmental Modelling & Software, 20, 1201-1215.
Reprinted from Environmental Modelling & Software with permission from Elsevier
Environmental Modelling & Software 20 (2005) 1201–1215 www.elsevier.com/locate/envsoft
Quality assurance in model based water management – review of existing practice and outline of new approaches
Jens Christian Refsgaard a,*, Hans Jørgen Henriksen a, William G. Harrar a, Huub Scholten b, Ayalew Kassahun b
a Geological Survey of Denmark and Greenland (GEUS), Øster Voldgade 10, DK-1350 Copenhagen K, Denmark
b Wageningen University (WU), Dreijenplein 2, 6703 HB Wageningen, The Netherlands
Received 11 December 2003; received in revised form 30 March 2004; accepted 30 July 2004
Abstract Quality assurance (QA) is defined as protocols and guidelines to support the proper application of models. In the water management context we classify QA guidelines according to how much focus is put on the dialogue between the modeller and the water manager as: (Type 1) Internal technical guidelines developed and used internally by the modeller's organisation; (Type 2) Public technical guidelines developed in a public consensus building process; and (Type 3) Public interactive guidelines developed as public guidelines to promote and regulate the interaction between the modeller and the water manager throughout the modelling process. State-of-the-art QA practices vary considerably between different modelling domains and countries. It is suggested that these differences can be explained by the scientific maturity of the underlying discipline and differences in modelling markets in terms of volume of jobs outsourced and level of competition. The structure and key aspects of new generic guidelines and a set of electronically based supporting tools that are under development within the HarmoniQuA project are presented. Model credibility can be enhanced by a proper modeller-manager dialogue, rigorous validation tests against independent data, uncertainty assessments, and peer reviews of a model at various stages throughout its development. © 2004 Elsevier Ltd. All rights reserved. Keywords: Modelling guidelines; Quality assurance; Water resources management; Uncertainty; Support tools
1. Introduction Models describing water flows, water quality and ecology are being developed and applied in increasing number and variety. The trend in recent years has been to base water management decisions to a larger extent on modelling studies, and to use more sophisticated models. In Europe this trend is likely to be reinforced by the EU Water Framework Directive due to its demand for integrating groundwater, surface water, ecological
and economic aspects of water management at the river basin scale and due to the explicit requirement to study impacts of alternative measures (human interventions) intended to improve the ecological status in the river basin. Insufficient attention is often given to documenting the predictive capability of models. Therefore, contradictions may emerge regarding the various claims of model applicability on the one hand and the lack of documentation of these claims on the other hand. Hence, the credibility of the model is often questioned, and sometimes with good reason. Another important trend is the demand to involve different stakeholders in the water resources management process, and therefore also indirectly in the modelling process (Pahl-Wostl, 2002). This stakeholder
involvement does not imply active participation in the technical modelling itself, but rather appears as a demand to be able to understand and review the various assumptions and their implications for the modelling results. This trend is seen at the global scale in connection with the generally accepted principles behind integrated water resources management, where public participation is a key element (GWP-TAC, 2000). In Europe, this is reflected in the EU Water Framework Directive, where it is explicitly prescribed that stakeholders and the general public should be involved in the water resources management process. The need for improving the quality of the modelling process has been emphasised by the research community, e.g. Klemes (1986), NRC (1990), Anderson and Woessner (1992), Forkel (1996), and Rykiel (1996). The recommendations made in this respect primarily focus on scientific/technical guidance on how the modeller should carry out various steps during the modelling process in order to achieve the best and most reliable results. Anderson and Bates (2001) in a discussion of model credibility and scientific integrity state that ‘‘over the last decade we have begun to have an appreciation of the need to be much more rigorous in establishing procedures for defining model credibility’’. They argue further that this demand has not evolved from the hydrological science itself due to immaturity and data limitations, but instead comes from policy makers and regulators who wish to have some kind of certification of model results. As emphasised by e.g. Forkel (1996) modelling studies involve several partners with different responsibilities. The ‘key players’ are code developers, model users and water managers. However, a lack of mutual understanding may develop due to the complexity of the modelling process and the different backgrounds of the ‘key players’. For example, the strengths and limitations of modelling applications are often difficult, if not impossible, for the water managers to assess. Similarly, the transformation of objectives defined by the water manager to specific performance criteria can be very difficult for the model users to assess. It can be difficult to audit modelling projects due to the lack of proper documentation and transparency. Furthermore, it is often difficult to reconstruct and reproduce the modelling process and its results. In the water resources management community many different guidelines on good modelling practise have been developed. One of, if not the most, comprehensive example of a modelling guideline has been developed in The Netherlands (Van Waveren et al., 2000; Scholten and Groot, 2002) as a result of a process involving all the main players in the Dutch water management field. The background for this process was a perceived need for improving the quality in modelling by addressing
malpractice issues such as careless handling of input data, insufficient calibration and validation, and model use outside its intended scope (Scholten et al., 2000). Similarly, modelling guidelines for the Murray-Darling Basin in Australia were developed due to the perception among end-users that model capabilities may have been ‘over-sold’, and that there was a lack of consistency in approaches, communication and understanding among and between the modellers and the water managers, which often resulted in considerable uncertainty for decision making (Middlemis, 2000). As pointed out by Merrick et al. (2002) good modelling practice cannot be decomposed into a set of rigid rules that can be followed without communication between modellers and water managers. Furthermore, there is a risk that modellers will not embrace guidelines aiming to inject too much consistency in the review procedure. Experiences from Australia have shown that review reports are commonly interpreted by water managers (non-modellers) as quite negative. Nonmodellers may tend to focus mainly on the negative review comments rather than balance those against the positive comments. This may mostly be the case for projects where there has not been a proper specification of the purpose and conditions at the initiation of the model study or where previous reviews during earlier project stages have been inadequate. External reviews performed at the end of a project when things may have already gone wrong may often result in defensive responses both from the modellers and the water managers (Henriksen, 2002a). All the existing modelling guidelines that we are aware of exist as reports. Electronically based support is only available as text forms to record modelling activities. No electronically based tool that is coupled to a knowledge base defining how to carry out the modelling (electronic version of guidelines with comprehensive guidance to different types of users) exists at present. This is a paradox, considering the significant resources that are invested in improving modelling software packages with respect to new sophisticated information technology. Poor modelling results may be caused by the lack of adequate model codes, or data of insufficient quantity or quality. However, according to our experience the most prevalent reason for poor modelling results is the inadequate use of guidelines and quality assurance procedures, and improper interaction between the manager (client) and the modeller (consultant). Our work has been carried out within the context of an EU supported research project (http://www.harmoniqua.org) aimed at developing a common set of quality assurance guidelines and supporting software tools. The scientific philosophical basis for the adopted terminology and guiding principles are described by Refsgaard and Henriksen (2004). The objective of the present
paper is to establish new approaches and outline the requirements of supporting tools for quality assurance procedures in the modelling process.
2. Theoretical framework 2.1. Terminology and scientific basis The terminology and methodology used in the following are based on Refsgaard and Henriksen (2004). The key elements in the terminology are illustrated in Fig. 1 and the most important definitions are: A model code is a generic software program, which can be used for different study areas without modifying the source code. A model is a site application of a code to a particular study area, including input data and parameter values. A model code can be verified. A code verification involves comparison of the numerical solution generated by the code with one or more analytical solutions or with other numerical solutions. Verification ensures that the computer programme accurately solves the equations that constitute the mathematical model. Model validation is here defined as the process of demonstrating that a given site-specific model is capable of making accurate predictions for periods outside a calibration period. A model is said to be validated if its accuracy and predictive capability in the validation period have been proven to lie within acceptable limits of errors. These terms are commonly used, although with differences in meaning between authors. Our views on
Fig. 1. Elements of a modelling terminology (Refsgaard and Henriksen, 2004).
these terms and the ongoing discussion on validation-falsification-confirmation as well as between the terms perceptual model, conceptual model and site-specific model are given in Refsgaard and Henriksen (2004). Here we just note that, from a quality assurance guideline point of view, it is fundamental for us to make a clear distinction between the terms conceptual model, model code and (site-specific) model. Furthermore, we never use the terms verification and validation in a universal sense, but always restrict them to clearly defined domains of applicability (never universal in a Popperian sense). In addition, to ensure a proper quality of work, the three most important underlying principles identified from an analysis of the modelling process are (Refsgaard and Henriksen, 2004): Validation tests against independent data that have not also been used for calibration are necessary in order to be able to document the predictive capability of a model. Model predictions achieved through simulation should be associated with uncertainty assessments where, amongst others, the uncertainty in model structure and parameter values should be accounted for. A continuous interaction between water manager and modeller is crucial for the success of the modelling process. One of the key aspects in this regard is to establish suitable performance criteria for the model calibration and validation tests. This dialogue is also very important in connection with uncertainty assessments.
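To illustrate the first and third of these principles, the following is a minimal sketch of a split-sample validation test in the spirit of Klemes, in which a model is judged against data not used for calibration and against an accuracy criterion agreed beforehand between water manager and modeller. It is our own illustration, not part of any of the guidelines discussed here; the choice of the Nash-Sutcliffe efficiency, the data series and the criterion value of 0.7 are all hypothetical.

```python
from typing import Sequence

def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1.0 is a perfect fit; values <= 0 mean the model
    performs no better than simply using the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ssm = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / ssm

def passes_validation(observed_val: Sequence[float], simulated_val: Sequence[float],
                      criterion: float = 0.7) -> bool:
    """Conditional validation test: uses only data NOT used for calibration and
    compares the performance with the criterion agreed with the water manager."""
    return nash_sutcliffe(observed_val, simulated_val) >= criterion

# Hypothetical validation-period series (e.g. monthly discharge in m3/s).
obs_val = [2.1, 3.4, 5.0, 4.2, 3.1, 2.5]
sim_val = [2.0, 3.6, 4.7, 4.4, 3.0, 2.8]
print(round(nash_sutcliffe(obs_val, sim_val), 2), passes_validation(obs_val, sim_val))
```

Even when such a test is passed, the validation only holds for the tested domain of applicability, which is why the second principle couples any subsequent predictions to uncertainty assessments.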
2.2. Types of QA guidelines 2.2.1. Definition and classification of quality assurance (QA) Quality assurance (QA) is defined by NRC (1990) as the procedural and operational framework used by an organisation managing the modelling study to assure technically and scientifically adequate execution of all tasks included in the study, and to assure that all modelling-based analysis is reproducible and defensible. In line with this we define QA guidelines as protocols and guidelines to support good application of models in water management. QA in the modelling process has two main components: (a) QA in development of model codes; and (b) QA in relation to application studies. Our paper focuses on the second component only. QA in model application studies includes data analyses, methodologies of good modelling practice, reviews and administrative procedures. Such QA guidelines can be classified according to how much focus is
put on the consensus building process between the modeller and the water manager in the following three classes: Internal technical guidelines (Type 1) established and used internally by the modeller’s organisation. Public technical guidelines (Type 2) established as public guidelines and used internally by the modeller’s organisation. Public interactive guidelines (Type 3) established as public guidelines and based on regulation of the interaction between the modeller and the water manager throughout the modelling process.
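Purely as an illustration of this classification (the attribute names below are our own, not taken from the guidelines), the three types differ in two properties: whether they were developed through a public consensus building process and whether they regulate the interaction between modeller and water manager.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GuidelineType:
    label: str
    publicly_developed: bool      # produced through a public consensus building process?
    regulates_interaction: bool   # does it cover the modeller / water manager dialogue?

TYPE_1 = GuidelineType("Internal technical", publicly_developed=False, regulates_interaction=False)
TYPE_2 = GuidelineType("Public technical", publicly_developed=True, regulates_interaction=False)
TYPE_3 = GuidelineType("Public interactive", publicly_developed=True, regulates_interaction=True)
```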
2.2.2. Type 1: Internal technical guidelines Most organisations involved in modelling studies have some kind of internal QA procedures. They usually focus on the technical aspects, i.e. to ensure that the modelling work itself is done without making unqualified judgements or errors. The better of these are based on the modelling protocols and similar scientifically based procedures originating from the research community. These procedures are internal in nature because they have been established or adopted unilaterally by the modeller's organisation, and because they seldom deal with the interaction between modeller and end-user. Examples of Type 1 guidelines include: Internal QA procedures, common in many companies. Text books. Many textbooks contain chapters with recommended modelling protocols (e.g. Anderson et al., 1993). Manuals to software packages with hints on the best way to use a model (e.g. Rumbaugh and Rumbaugh, 2001; DHI, 2002).
2.2.3. Type 2: Public technical guidelines These guidelines often contain the same substance as the internal technical guides mentioned above. However, they differ in the sense that they have been prepared through a consultative and consensus building process involving many persons and organisations. They focus on the technical aspects and give no or little emphasis to the interaction between the modeller and the end-user. Examples of Type 2 guidelines include: The CAMASE guidelines for modelling that were developed after substantial consultation within the scientific modelling community (CAMASE, 1996). Standards from American Society for Testing and Materials (e.g. ASTM, 1994). Many of the UK standards, especially the older ones (Packman, 2002).
2.2.4. Type 3: Public interactive guidelines These guidelines have, like the public technical guidelines (Type 2), been established through a public consultative and consensus building process. However, they differ from the Type 2 guidelines by an additional focus on regulating the interaction between the modeller and the water manager, who often have the roles of consultant and client, respectively. Important elements in public interactive guidelines are reviews that, in addition to QA in the sense of technical guidance, can facilitate the consensus-building process between the parties. Experience shows that such a process is crucial for the overall credibility of the modelling process. Examples of such QA guidelines include (more details on these guidelines provided in next chapter): The Dutch guidelines (Van Waveren et al., 2000; Scholten and Groot, 2002). The Australian groundwater flow modelling guidelines established by the Murray-Darling Basin Commission (Middlemis, 2000; Merrick et al., 2002; Henriksen, 2002a). The Danish groundwater modelling guidelines (Henriksen, 2002b). Some of the recent UK standards (Packman, 2002). Californian guidelines prepared by Bay-Delta Modelling Forum (BDMF, 2000).
2.3. Development stage and prevalence of QA guidelines Reviews of a number of existing QA guidelines (see details in next chapter) revealed significant differences in current practice, both between domains and between different countries. In some domains and some countries there has been a clear trend over the past couple of decades to move from Type 1 to Type 2 or Type 3 guidelines. In order to understand the development of QA guidelines and be able to provide recommendations based on anticipated future needs, it is important to try to understand why the present differences in the developmental stage of QA guidelines exist. The hypothesis that we will test is that the development stage depends on two main factors: The scientific maturity of the underlying discipline, i.e. how well understood are the underlying processes and how easily available are the data necessary for practical applications. In this respect, a mature scientific discipline is one where there is a general acceptance in the scientific community on how the processes are described, there are no significant controversies on key issues, and it is feasible to acquire the necessary data for practical
studies. Similarly, an immature scientific discipline is one where some processes are not well understood, where there are several alternative 'schools' on how to describe things, and where it is often not possible to obtain sufficient field data necessary to perform scientifically sound modelling. Immature scientific disciplines are often considered as being complex, and are characterised by unresolved problems such as scale problems. For example, whereas biology is a relatively old science in comparison with hydrogeology, biota (ecological) modelling is considered to be immature in contrast to groundwater flow modelling which is considered to be mature. Biota modelling is rather uncertain due to the inherent complexity of ecological systems and the general limited availability of relevant field data, whereas the mathematical principles describing groundwater flow are well established and flow systems are readily characterised in the field. The modelling market maturity, i.e. how well developed is the market for modelling studies. In this respect, a mature market is characterised by (a) the modelling market is relatively old with numerous examples of good and poor quality modelling studies, and the motivation for establishing QA guidelines is largely due to water managers having experience with studies of poor quality; (b) most jobs are outsourced to private consultants; (c) the volume of modelling work is large, so that a number of consultants can be sustained and standard routines can evolve; and (d) there is a considerable competition among modellers in getting the jobs. Similarly, an immature market is characterised by (a) it is relatively new (typically <10 years); (b) most modelling studies are carried out by government agencies themselves; (c) the volume of work for the consultants is small; and (d) there is virtually no competition
among modellers, instead the work is carried out by a few specialised groups which are often located in or have close ties to the research community. If these hypotheses were true one would a priori expect that a considerable degree of scientific maturity is required for QA guidelines of Type 2 to develop, and that further a mature modelling market is a necessary prerequisite for the development of Type 3 guidelines.
3. Existing guidelines Reviews of existing QA guidelines were conducted (Refsgaard, 2002). The reviews attempted to cover two aspects: (a) variation of practices between seven different modelling domains (groundwater, precipitation-runoff, hydrodynamics, flood forecasting, surface water quality, biota (ecology) and socio-economy); and (b) differences between geographical regions. The reviews of state-of-the-art in the seven domains were carried out by seven different organisations with special expertise in the respective domains. During these reviews a broad search of relevant QA guidelines was made with primary focus on existing guidelines in Europe and secondarily on guidelines from North America and Australia. Subsequently, a few cases with guidelines from different geographical areas were selected for a more detailed review. The reviews did not intend to be exhaustive by including all important QA guidelines, but aimed at selecting guidelines representative for conditions in Europe, North America and Australia. In order to test the above hypotheses the conclusions of the state-of-the-art of QA guidelines for the different domains summarised in Section 3.1 are plotted in Fig. 2 as a function of scientific maturity. Furthermore, examples of guidelines from different countries are
Fig. 2. State-of-the-art for QA guidelines in different modelling domains plotted against maturity of the underlying scientific disciplines. (Axes: scientific maturity, immature to mature, versus QA guideline type, Type 1 Internal, Type 2 Public, Type 3 Interactive. Domains: GW-HD groundwater flow; GW-AD groundwater solute transport; GW-WQ groundwater geochemistry; PR precipitation runoff; HD hydrodynamic surface water flow; HD-Sed sediment transport/morphology; FF flood forecasting; SWQ surface water quality; Biota ecology; SE socio-economy.)
Fig. 3. Different types of guidelines as a function of maturity in the modelling market. (Axes: modelling market maturity, immature (new, small, specialised) to mature (old, big, competitive), versus QA guideline type, Type 1 Internal, Type 2 Public, Type 3 Interactive. Cases: BDMF Bay Delta Modelling Forum (California); AUS-GW Australia, groundwater; NL-GMP Dutch Good Modelling Practice; DK-GW Denmark, groundwater; UK United Kingdom, several domains; ASTM American Society for Testing and Materials; CEE Central and Eastern Europe; FR-FF France, flood forecasting.)
presented in Section 3.2 and Fig. 3 with focus on market maturity. 3.1. State-of-the-art in different modelling domains Groundwater modelling (Refsgaard and Henriksen, 2002): In this field, QA guidelines are well developed and used in many countries, but mostly in groundwater flow modelling, where the state-of-the-art corresponds to Type 3 guidelines. For solute transport, and in particular for geochemical modelling, relatively few guidelines exist and they are not commonly used. The need for QA guidelines differs from country to country, amongst others due to different stages of development of the groundwater modelling market. For instance, the guides from the American Society for Testing and Materials (ASTM) were among the first of their kind to be developed, in the early 1990s, because the practical application of groundwater models at that time had progressed further in the USA than in most other countries. Precipitation-runoff modelling (Perrin et al., 2002a): Relatively few guidelines exist for this domain as standalone guidelines. The guidelines that do exist are generally confined to relatively simple (lumped) approaches, while no generic guidelines exist for the more complex models of the distributed physically-based type. Thus, the state-of-the-art for precipitation-runoff as a standalone domain may be characterised as Type1/Type2. However, it is also noted that precipitation-runoff modelling is often used as an integral part of other domains, e.g. groundwater models, hydrodynamic models, flood forecasting models and surface water quality models. For some of these integrated applications some guidelines have been developed which include the precipitation-runoff domain. This is, for instance, the case for the Danish groundwater guidelines
(Henriksen, 2002b) which include aspects of precipitation-runoff modelling. Hydrodynamic modelling (Metelka and Krejcik, 2002a): This domain includes environmental applications such as modelling of urban drainage and sewer systems, rivers, floodplains, estuaries and coastal waters both with respect to flows, sediment and morphological issues. QA guidelines are well developed in some fields (e.g. in urban drainage and river modelling), but not in other fields (e.g. sediment and morphological modelling). For hydrodynamic modelling in coastal areas and estuaries few QA guidelines have been identified. The state-of-the-art may be characterised as Type 2 for most parts of the domain and Type 1 for other parts. It is noted that hydrodynamic modelling is often an integral part of flood forecasting and surface water quality modelling. Although very similar in theoretical scientific background, this domain is different from the field of Computational Fluid Dynamics that typically is used for industrial purposes. Flood forecasting modelling (Balint, 2002): This domain differs fundamentally from the other domains by being based on real-time operation. This implies that the models, once established, are applied on a routine (daily) basis although often under extreme boundary conditions. The focus on QA in this domain is often concentrated on data quality for the on-line data acquisition. Due to this fundamental difference in nature, the status of QA guidelines for this domain does not fit well into the above classification, and it is not easily comparable to the status of the other domains. Surface water quality modelling (Da Silva et al., 2002): Surface water quality modelling is based on a description of physical, chemical and biological processes. Often the data availability to assess model processes and parameters is sparse and often the key processes are not well understood. QA guidelines
are generally not well developed. The state-of-the-art may be characterised as Type 1. Biota (ecological) modelling (Old et al., 2002): Ecology is a diverse branch of biology that focuses on the relations of flora and fauna to one another and to their physical environment. Ecological models are widely used today, but perceived as being rather uncertain due to the inherent complexity of ecological systems and the general limited availability of relevant field data. QA guidelines are generally not well developed. The state-of-the-art may be characterised as Type 1. Socio-economic modelling (Heinz and Eberle, 2002): No general QA guidelines exist for socio-economic modelling. The few existing guidelines, such as the CAMS, CFMPS and RBMPs in the UK, are specific for particular types of application, and they are so far only used in practice in a few countries. The state-of-the-art may be characterised as Type1/Type2. In Fig. 2 the state-of-the-art for QA guidelines in the respective modelling domains have been plotted against the scientific maturity of the underlying disciplines. The scientific maturity of the respective domains has been assessed subjectively on the basis of the criteria outlined in Section 2.3 above. There is a tendency that the least developed guidelines (Type 1) appear in domains where the underlying scientific basis is characterised as immature, i.e. in surface water quality, biota (ecology) and groundwater quality, reflecting that many fundamental scientific issues remain to be solved. Similarly, the Type 2 and Type 3 guidelines are dominant in domains characterised by scientific maturity. However, there are clear exceptions such as precipitation-runoff and flood forecasting, where other factors than scientific maturity must play a role for the development stage of QA guidelines. 3.2. Current practice in different countries The current practice of using QA guidelines in different countries has been illustrated through some selected cases that have been reviewed in Refsgaard (2002). In Fig. 3 the type of QA guidelines used in the case studies is plotted against the maturity of the modelling market that has been assessed subjectively on the basis of the criteria given in Section 2.3 above. The practice as reflected by the case studies and shown on the figure is summarised as follows: Dutch guidelines (Scholten and Groot, 2002): The Dutch guidelines are the most generic of the existing guidelines in the sense that they cover all the domains relevant for river basin management. The technical guidance for different modelling domains exist, but are not as detailed as some of the guidelines that only cover one domain (e.g. ASTM guides or Australian guidelines on groundwater flow modelling). The Dutch guidelines
emphasise the dialogue process between modeller and water manager, including the review procedures. The Dutch guidelines belong to Type 3. The Dutch modelling market may be characterised as mature. Australian groundwater flow modelling guidelines (Henriksen, 2002a): The Australian guidelines are technically comprehensive. They focus on the dialogue between the modeller and the water manager in general and on review procedures in particular. The guidelines were developed over several years with involvement of all of the key stakeholders. The Australian guidelines belong to Type 3. The Australian groundwater modelling market may be characterised as mature. Danish groundwater modelling guidelines (Henriksen, 2002b): The Danish Handbook of Good Modelling Practice and draft guidelines is similar to the Australian ones, although some important details differ. The water managers, who also ensure that they presently are being used in most studies, have initiated the Danish guidelines. The Danish guidelines belong to Type 3. The Danish groundwater modelling market may be characterised as mature. Central and Eastern Europe (Metelka and Krejcik, 2002b;Van Gils and Groot, 2002): Public QA guidelines are neither well developed nor used. Many modellers therefore rely only on internal QA procedures (Type 1) adopted by their respective organisations. This situation reflects a new and unregulated market for modelling services, and a market where the managers and their organisations often are technically too weak to adopt and enforce QA guidelines. French guidelines in flood forecasting (Perrin et al., 2002b): Public or interactive guidelines do not exist in this area, and the case study describes a set of internal technical guidelines (Type 1). Although flood forecasting is an old modelling discipline, the modelling market is virtually non-existent, because flood forecasting modelling in France (as well as in most other countries) is carried out either by a government agency or by a specialised research institute. UK guidelines (Packman, 2002): QA guidelines are generally very well developed in the UK. Application of guidelines is prescribed as a routine in most areas of model application. Thus, in general the UK market for modelling services is well regulated and characterised as being mature. Most of the guidelines are of Type 2 and some recent ones of Type 3. The exceptions to this are the surface water quality and biota (ecological) domains where no general guidelines exist. The guidelines in these domains are therefore confined to internal procedures inspired by textbooks and manuals (Type 1). Bay Delta Modelling Forum, California (BDMF, 2000): The Californian guidelines provide a framework, but very few technical details. The main emphasis of these guidelines is on the interaction between modellers, managers and the public (Type 3). In this respect various
kinds of reviews are prescribed at various stages of the modelling process. The American market in general and the Californian in particular are well established (mature). American Society for Testing and Materials (ASTM, 1992, 1994): The American guidelines are especially comprehensive in the groundwater domain, where they have served as inspiration for all the other groundwater guidelines, including the Australian and the Danish guidelines. There are a number of guidelines on various elements of the modelling process. These guides are 5–10 years old and are mainly technical of nature, while limited focus is put on the interaction and review process. In addition to the above QA guidelines ISO (the International Organisation for Standardisation) regularly publishes quality management and quality assurance standards. ISO standards provide guidance on fundamental principles and procedures, but on a rather general level. We have found ISO standards addressing development, supply and maintenance of computer software (ISO 9000-3:1997) and other standards providing guidance for a general process based quality management system in an organisation (ISO 9004:2000(E)). However, none of the ISO standards include any particular guidance on matters related to water resources modelling or management, and they are therefore of limited practical use as compared to the above other QA guidelines dedicated to water resources modelling. 3.3. Content of existing guidelines 3.3.1. Key elements The existing guidelines all comprise modelling protocols with recommended steps and technical guidance on how to perform these steps in the modelling process. The key elements may be divided into two groups, namely: (1) technical guides on how to use models; and (2) guides for regulating the interaction between modeller and end-user/water manager. The key elements in the technical guides include:
Definition of the purpose of the modelling study. Collection and processing of data. Establishment of a conceptual model. Selection of code or alternatively programming and verification of code. Model set-up. Establishment of performance criteria. Model calibration. Model validation. Uncertainty assessments. Simulation with model application for a specific purpose. Reporting.
The key elements in the interaction between the modeller and the end-user, in addition to some of the above elements, also include other aspects: Definition of the purpose of the modelling study, including translation of the end-user's needs to preliminary performance criteria. Establishment of performance criteria. The accuracy of the model predictions has to be established via a trade-off between the benefits of improving the accuracy in terms of less uncertainty on the management decisions and the costs of improving the accuracy through additional model studies and/or collection of additional field data. Reviews with subsequent consultation between the modeller and the end-user at different phases of the modelling project. The content of the technical guides is to a large extent domain-specific, while the elements of the interaction between the modeller and the end-user are more general in nature and differ only slightly from one domain to another.
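As a purely illustrative sketch of how the key elements listed above could be kept transparent and reproducible in practice, the following records each element of a hypothetical modelling study in a simple journal, including whether the step was reviewed with the end-user. The structure and names are our own and are not prescribed by any of the reviewed guidelines.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class JournalEntry:
    step: str                    # one of the key elements listed above
    performed_by: str
    reviewed_with_manager: bool  # was the step discussed with the end-user/water manager?
    notes: str = ""

@dataclass
class ModelJournal:
    study_purpose: str
    entries: List[JournalEntry] = field(default_factory=list)

    def record(self, step: str, performed_by: str, reviewed: bool, notes: str = "") -> None:
        self.entries.append(JournalEntry(step, performed_by, reviewed, notes))

# Hypothetical study used only to show the idea.
journal = ModelJournal("Assess abstraction impacts on low flows in a small catchment")
journal.record("Definition of the purpose of the modelling study", "modeller and manager", reviewed=True)
journal.record("Establishment of performance criteria", "modeller and manager", reviewed=True,
               notes="Criteria may be revisited after calibration")
journal.record("Model calibration", "modeller", reviewed=False)
```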
3.3.2. Integration across modelling domains Almost all the existing guidelines were developed for a specific domain, e.g. groundwater modelling. As integrated modelling may be expected to play an important role in connection with implementation of the EU Water Framework Directive and adoption of Integrated Water Resources Management principles, guidelines not including integrated modelling aspects are inadequate. Even the Dutch guidelines (Scholten and Groot, 2002), which cover a large number of domains, are essentially single domain guidelines, because they do not provide guidance on how to integrate across domains (interdependencies etc.). However, the Dutch guidelines do have the clear advantage over other existing guidelines in that they are based on a common methodology and a common glossary. It should be noted though that some guidelines cover more than one modelling domain, as they are defined here. For instance, hydrodynamic modelling or groundwater modelling is often combined with precipitation-runoff, and guidelines combining these domains exist.
3.3.3. Differences in terminology As illustrated in Refsgaard (2002) the terminology used in the modelling community varies significantly between domains and even to some extent from one country to another. This clearly demonstrates the need for establishing one common terminology and glossary for modelling applications as addressed by Refsgaard and Henriksen (2004).
4. Outline of new guidelines – HarmoniQuA 4.1. Overall aim and structure On the basis of the knowledge achieved through the review of existing guidelines, the HarmoniQuA project aims to develop a new comprehensive set of guidelines and supporting software tools to facilitate an improved quality of the modelling process and hence enhance the confidence of all stakeholders. HarmoniQuA forms part of the CATCHMOD cluster of EU research projects (Blind, 2004). It aims to be a methodological component of a future infrastructure for model based decision support for water management at catchment and river basin scale. This main goal will be reached by providing the elements of a methodological layer in this infrastructure, embodied in a knowledge base (KB) and software tools. HarmoniQuA will collect methodological expertise, structure this knowledge and identify and fill in gaps. It will consist of generic and domain specific knowledge, modelling software specific aspects, and a transparent and consistent glossary of terms and concepts. This body of knowledge will be structured in a knowledge base. The following set of software tools will provide functionality for the HarmoniQuA system: guideline tool: will generate guidelines from the KB; monitoring tool: will monitor all activities within a modelling job and store these activities as a single model journal in a model archive; report tool: generates reports from a model journal; advisor tool: advises modellers in new modelling jobs based on decisions and choices of previous jobs and associated model journals in the model archive.
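The sketch below is our own schematic simplification of how these four tools could relate to the knowledge base and the model archive; the real tools (MoST, described below) are far richer and were built following an ontological approach rather than as plain Python functions.

```python
from typing import Dict, List

TaskList = List[str]                  # ordered task names for one modelling domain (simplified)
KnowledgeBase = Dict[str, TaskList]   # domain -> tasks
Journal = List[dict]                  # recorded activities of one modelling job
Archive = List[Journal]               # journals of previous jobs

def guideline_tool(kb: KnowledgeBase, domain: str) -> TaskList:
    """Generate a (trivially simplified) guideline: the task list for a domain."""
    return kb[domain]

def monitoring_tool(journal: Journal, task: str, result: str) -> None:
    """Record an activity of the running job in its model journal."""
    journal.append({"task": task, "result": result})

def report_tool(journal: Journal) -> str:
    """Generate a report from a model journal."""
    return "\n".join(f"{entry['task']}: {entry['result']}" for entry in journal)

def advisor_tool(archive: Archive, task: str) -> List[str]:
    """Suggest choices made for the same task in previous jobs stored in the archive."""
    return [entry["result"] for journal in archive for entry in journal if entry["task"] == task]
```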
An overview of the HarmoniQuA products (KB and tools) and how these interact with the activities of the users is presented in Fig. 4. The lower part of Fig. 4 depicts the five major steps of the modelling process. These five major steps are decomposed into 45 tasks, with interrelations (order and feedback) as shown in Fig. 5. Each task has an internal structure, i.e. name, definition, explanation, interrelations with other tasks, activities, activity related methods, references, task inputs and outputs. This knowledge structure (steps, tasks, within-task-knowledge) is stored in the KB. The five steps and the tasks have been selected on the basis of existing modelling protocols and QA guidelines and include the key elements outlined in Section 3.3 above. Model based decision support has several dimensions, which hinder a ‘one-size-fits-all’-approach. HarmoniQuA attempts to serve several types of users in
Fig. 4. HarmoniQuA tools (MoST) to support the QA process. (The Knowledge Base holds guidelines, software capabilities and a glossary for the domains groundwater, precipitation-runoff, hydrodynamics, flood forecasting, water quality, biota (ecology) and socio-economics; the Model Archive holds model journals from previous projects. The tools provide guidance, monitoring, reporting and advice to the user/model team, generic and specific for model domain, user type and job complexity, through the five steps Model Study Plan, Data and Conceptualisation, Model Set-up, Calibration and Validation, and Simulation and Evaluation; reporting and client review take place in each step.)
Fig. 5. The five steps and 45 tasks of the modelling process in the HarmoniQuA knowledge base. (Flow chart of ordinary, decision and review tasks with feedforward and feedback links, grouped under the five steps Model Study Plan, Data and Conceptualisation, Model Set-up, Calibration and Validation, and Simulation and Evaluation; each step ends with a review task and a possible revisit of the model study plan.)
a series of water management domains, in jobs of diverse complexity and diverse application purpose. In this way, users working on a specific job will only be confronted with guidelines, instructions, decisions and activities that are relevant to their role in a particular modelling job. The HarmoniQuA tools have been developed in Protégé-2000 following an ontological approach. More details can be found in Kassahun et al. (2004). The tools are available at http://www.harmoniqua.org/. 4.2. Key elements Some of the key features to be implemented in the new HarmoniQuA guidelines are: 4.2.1. Interactive guidelines The dialogue between the different players is crucial to ensure that the output from the modelling process is understandable for stakeholders and beneficial for the client. The importance of involving stakeholder and public opinions is emphasised by Pahl-Wostl (2002) and addressed in some Type 3 guidelines (e.g. BDMF, 2000; Pascual et al., 2003). In HarmoniQuA, each of the five major steps (Fig. 5) is therefore concluded with a dialogue task, in terms of either contract negotiation (first step) or reviews (last four steps). A dialogue task encourages the assessment of the present step and provides the opportunity to redefine the content of the model study plan for the next step based upon the results and findings of the present step. These dialogue steps provide flexibility to the modelling study and ensure that the tasks that have yet to be performed can be modified according to the achieved results and perceptions of modeller and client.
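The internal task structure described in Section 4.1 (name, definition, explanation, interrelations, activities, methods, references, inputs and outputs) can be pictured roughly as below. This is our own illustration only; the actual knowledge base is an ontology built with Protégé, not Python classes, and the example values are hypothetical apart from the task names, which appear in Fig. 5.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    """One task in the knowledge base (simplified rendering of the structure described above)."""
    name: str
    definition: str
    explanation: str = ""
    activities: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    feeds_into: List[str] = field(default_factory=list)  # order/feedback interrelations

validation_task = Task(
    name="Validation",
    definition="Test the calibrated model against data not used for calibration.",
    inputs=["calibrated model", "independent data set", "validation targets and criteria"],
    outputs=["validation results"],
    feeds_into=["Assess Soundness of Validation"],
)
```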
1211
et al., 2003; Scholten and Van der Tol, 1998). These attempts are very useful in making the performance more transparent and quantitative, but do not provide an objective means to decide what the optimal accuracy criteria really should be in a given case. According to Refsgaard and Henriksen (2004) no universal accuracy criteria can be established, i.e. it is generally not possible from a natural scientific point of view to tell when a model performance is good enough. Such acceptance criteria will vary from case to case depending on the socio-economic context, i.e. what is at stake in the decisions to be supported by the model predictions. An appropriate question may be: how do we translate the ‘soft’ socio-economic objectives to ‘hard-core’ model performance criteria? This is obviously a challenge that cannot be solved by natural science alone, but needs to be addressed in a much broader context including aspects of economy, stakeholder interests and risk perception. Performance statistics must comprise quantifiable and objective measures. However numerical measures cannot stand alone. Often expert opinions are necessary supplements.
4.2.2. Transparency and reproducibility Transparency and reproducibility are important, especially for large studies involving use of complex models. This will be ensured through the Monitoring Tool which enables modelling teams, consisting of modellers, managers and auditors, to be guided through the modelling process, to monitor all modelling activities and to oversee the status of each task to perform. With an increasing tendency to reuse existing models or rebuild them with additional data, modified conceptual models (revised model structure and/or inclusion of additional processes) and improved calibration and validation tests, this functionality of the Monitoring Tool becomes very important.
4.2.3. Accuracy criteria
Establishment of accuracy criteria for a modelling study is a very important, but difficult, issue. Modellers often establish numerical accuracy criteria in order to classify the goodness of a given model (e.g. Henriksen et al., 2003; Scholten and Van der Tol, 1998). These attempts are very useful in making the performance more transparent and quantitative, but do not provide an objective means to decide what the optimal accuracy criteria really should be in a given case. According to Refsgaard and Henriksen (2004) no universal accuracy criteria can be established, i.e. it is generally not possible from a natural scientific point of view to tell when a model performance is good enough. Such acceptance criteria will vary from case to case depending on the socio-economic context, i.e. what is at stake in the decisions to be supported by the model predictions. An appropriate question may be: how do we translate the 'soft' socio-economic objectives into 'hard-core' model performance criteria? This is obviously a challenge that cannot be solved by natural science alone, but needs to be addressed in a much broader context including aspects of economy, stakeholder interests and risk perception. Performance statistics must comprise quantifiable and objective measures. However, numerical measures cannot stand alone; expert opinions are often necessary supplements.
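As an illustration of the kind of quantifiable performance statistics referred to above, the following sketch (not part of the HarmoniQuA guidelines; the discharge values and the acceptance thresholds are invented for illustration) computes two common measures for simulated discharge, the Nash-Sutcliffe efficiency and the relative water balance error, and checks them against case-specific acceptance criteria:

import numpy as np

def nash_sutcliffe(obs, sim):
    # Nash-Sutcliffe efficiency: 1.0 is a perfect fit; values below 0 mean the
    # model performs worse than simply using the mean of the observations.
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def water_balance_error(obs, sim):
    # Relative volume error in percent over the simulation period.
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * (sim.sum() - obs.sum()) / obs.sum()

def meets_criteria(obs, sim, min_nse=0.7, max_abs_wbe=10.0):
    # The thresholds are purely illustrative; in practice they must be agreed
    # between modeller and water manager for the specific job (see text).
    nse = nash_sutcliffe(obs, sim)
    wbe = water_balance_error(obs, sim)
    return {"NSE": nse, "WBE_percent": wbe,
            "accepted": nse >= min_nse and abs(wbe) <= max_abs_wbe}

# Invented daily discharge values (m3/s), for illustration only.
observed = [1.2, 1.5, 2.8, 2.1, 1.7, 1.4, 1.3]
simulated = [1.1, 1.6, 2.5, 2.3, 1.8, 1.3, 1.2]
print(meets_criteria(observed, simulated))

The computation itself is straightforward; as argued above, the difficult part is choosing the thresholds, which cannot be derived from natural science alone but must be negotiated in the given socio-economic context.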
4.2.4. Uncertainty assessments
Quality assurance and uncertainty assessments are two closely linked aspects. Initially, the manager has to define accuracy criteria from a perception of which uncertainty level he/she believes is suitable for a particular case (see above). Subsequently, as the modelling study proceeds, the dialogue between modeller and manager has to continue, with the necessary trade-off between modelling accuracy and the cost of the modelling study. In the uncertainty assessments it is very important to go beyond traditional statistical uncertainty analysis. Thus, aspects of scenario uncertainty and ignorance should generally be included, and in addition the uncertainties originating from data and models often need to be integrated with socio-economic aspects in order to form a suitable basis for the further decision process (e.g. Van Asselt and Rotmans, 2002). Thus, as with the accuracy criteria (above), the use of uncertainty assessments in water resources management goes beyond natural science. Assessment of uncertainty due to errors in the model structure is a particularly difficult task and is most often neglected. One way of evaluating this source of uncertainty is through the establishment of alternative conceptual models. This aspect is emphasised in the HarmoniQuA guidelines.
4.2.5. Model validation Although experience shows that models generally perform poorer in validation tests against independent data than they do in calibration tests, model validation is in our opinion a neglected issue, both in many modelling
guidelines and in the scientific literature. Perhaps many scientists have avoided the term validation because of the associated controversies in the philosophy of science, but in any case many scientists do not advocate the need for model validation. One of the unfortunate consequences of this 'lack of interest' is that not much work has been devoted to developing suitable validation test schemes since Klemes (1986). In our opinion, the further development of suitable testing schemes, particularly for non-linear models and for applications comprising extrapolations beyond the calibration data basis, and imposing such tests on all modelling projects, is a major future challenge.
4.2.6. Dedication aspects
The QA guidelines describe the different tasks and responsibilities of the different types of users such as (1) modellers; (2) water managers; (3) auditors; (4) stakeholders (other than the water manager); and (5) the general public. The QA guidelines are developed so that they adequately reflect the different requirements in several modelling domains (while still maintaining a common generic core to ensure coherency). Furthermore, the guidelines will be applicable to studies where several domains, including socio-economy, are integrated. The QA guidelines differentiate according to job complexity in modelling, e.g. (1) basic (rough calculations); (2) intermediate (moderately complex calculations); and (3) comprehensive (sophisticated, detailed calculations).
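To illustrate how such 'dedication' can work in practice, the sketch below is a simplified stand-in for the actual Protégé-based knowledge base described by Kassahun et al. (2004); all task names, step names and attributes are invented. It filters a set of tasks so that a given user only sees the tasks relevant to his/her role, the modelling domain and the complexity of the job:

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    step: str               # one of the five major steps of the modelling process
    users: set              # user types the task is relevant for
    domains: set            # modelling domains; "generic" marks the common core
    min_complexity: int     # 1 = basic, 2 = intermediate, 3 = comprehensive

# A few invented tasks standing in for the real knowledge base.
KB = [
    Task("Agree on accuracy criteria", "Model study plan",
         {"modeller", "water manager"}, {"generic"}, 1),
    Task("Assess parameter uncertainty", "Calibration and validation",
         {"modeller"}, {"groundwater", "precipitation-runoff"}, 2),
    Task("Review validation report", "Calibration and validation",
         {"auditor", "water manager"}, {"generic"}, 2),
    Task("Present uncertainty to stakeholders", "Simulation and evaluation",
         {"modeller", "stakeholder"}, {"generic"}, 3),
]

def tasks_for(user, domain, complexity):
    # Return only the tasks a given user needs to see for a given job.
    return [t for t in KB
            if user in t.users
            and (domain in t.domains or "generic" in t.domains)
            and complexity >= t.min_complexity]

for t in tasks_for("water manager", "groundwater", complexity=2):
    print(t.step, "->", t.name)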
5. Discussion and conclusions
5.1. Types and reasons of existing QA guidelines
We have classified quality assurance (QA) guidelines in three types: Internal technical guidelines (Type 1), Public technical guidelines (Type 2), and Public interactive guidelines (Type 3). We have then characterised the conditions under which the guidelines are used by (a) the scientific maturity of the underlying discipline(s) and (b) the maturity of the modelling market in the region/country for which the guidelines were developed.
Our review of existing QA guidelines is not exhaustive, but limited to examples aimed at being representative for conditions in Europe, North America and Australia. Thus, we have for instance not reviewed QA guidelines from countries in Asia, where modelling has taken place for many years. The results of our review revealed significant variations in the type of guidelines available and their usage between different modelling domains and countries. We hypothesised that the stage of QA guideline development largely depends on the maturity of both the specific scientific discipline and the modelling market in the respective country or region (Figs. 2 and 3).
Considering Figs. 2 and 3 it appears that the maturity of the scientific discipline and of the market both play an important role in QA development. However, neither the scientific level nor the market maturity alone is able to explain the differences in the stage of QA guideline development. If the underlying process understanding or the necessary data are too weak, then the modelling process lacks credibility no matter how well QA procedures are adhered to. Hence, the motivation to establish sophisticated QA guidelines in such cases is small. Similarly, even though a specific discipline may be scientifically mature, modellers may be reluctant to use sophisticated QA guidelines if they are not required to do so by regulators and/or water managers.
The general development of QA guidelines has progressed over time from Type 1 towards Type 3. A developmental process that is consistent with the results of the reviews, as reflected in Figs. 2 and 3, is the following. Initially, when models are introduced for practical application, internal technical guidelines (Type 1) originating from the research community are applied. The development from Type 1 to Type 2 QA guidelines requires a certain degree of maturity within both the specific scientific discipline and the market. This implies that there should be no significant gaps in knowledge on process descriptions, and that there is common agreement about the scientifically sound procedures for solving the problems in the domain. The development of Type 2 guidelines is most often driven by the demands of regulators and water managers. The development from Type 2 to Type 3 requires a clear and conscious demand from regulators and water managers.
It would also have been possible to classify the QA guidelines according to other criteria, for example according to how uncertainty analysis is treated, whether they apply to single or multiple domains, and whether they apply to natural or social science. We have chosen our classification for two main reasons. Firstly, an improved mutual understanding between modeller and water manager is crucial for a model application to be successful in practice, and this should be facilitated by the QA guidelines. Secondly, the trend of increasing stakeholder involvement in the water resources management process demands that QA guidelines also enable stakeholders to observe and take part in parts of the modelling process.
Our characterisation of QA guidelines according to scientific and market maturity has some weaknesses. First of all, the assessments have been made subjectively, because there was no other feasible method. Secondly, the two characteristics are not completely independent: a large and mature market will often create demand for new scientific knowledge, and hence stimulate scientific development, as well as lead to needs for improved technical standards. Altogether, it may be concluded that our hypotheses on the importance of scientific and market maturity for
the development of QA guidelines have not been falsified. However, due to the above weaknesses and the limited empirical basis (the review was not exhaustive but based on selected examples), this conclusion should be taken with some reservation.
5.2. Organisational requirements for QA guidelines to be effective
As emphasised by e.g. Forkel (1996), modelling studies involve several partners with different responsibilities. The 'key players' are code developers, model users (modellers) and water managers (including planning and regulatory authorities). To a large extent the quality of the modelling study is determined by the expertise, attitudes and motivation of the teams involved in the modelling and quality assessment process. The attitude of the modellers is important. NRC (1990) characterises this as follows: "most modellers enjoy the modelling process but find less satisfaction in the process of documentation and quality assurance". Scholten and Groot (2002) describe the main problem with the Dutch Handbook on Good Modelling Practice as being that everybody likes it, but only a few use it. QA will only become successful if both parties, modeller and water manager, are motivated and active in supporting its use.
The water manager has a particular responsibility, because he/she has the power to request and pay for adequate QA in modelling studies. Therefore, QA guidelines can only be expected to be used in practice if the water manager prescribes their use. In this respect it is very important that the water manager has the technical capacity to organise the QA process. A significant problem is that the water manager's organisation often lacks individuals who are trained at an appropriate level to understand and use models. If the water manager does not possess such skills within his/her own staff, an external modelling expert can be hired to help the manager in the QA process. However, this requires that the manager is aware of the problem and of the need.
5.3. The HarmoniQuA guidelines
The approach adopted in the present HarmoniQuA guidelines corresponds to Type 3. However, in addition to its focus on the dialogue and role play between the various actors in the modelling process, i.e. modellers, water managers, auditors and the public/stakeholders, the HarmoniQuA approach is innovative compared to existing Type 3 QA guidelines in the following aspects:
- Supporting software tools, beyond simple scoreboards and templates, are novel and important elements. These tools, which contain the knowledge base (KB), can guide the users through the modelling process, monitor decisions and outcomes,
and provide experience-based advice on the appropriate route to be followed. This will significantly improve the transparency and reproducibility of the modelling process. To our knowledge no such tools exist or are under development at present.
- The focus on performance and accuracy criteria in the modelling process is not novel as such. However, the continued adaptation of these criteria through the process, in connection with the formalised review steps, is, if not novel, then at least emphasised much more in the HarmoniQuA guidelines than in any other existing guidelines. This approach allows the HarmoniQuA guidelines to fit nicely with the new ideas of adaptive management (Pahl-Wostl, 2002).
- The uncertainty aspects are given a more central role than in existing guidelines, where uncertainty is often confined to assessment of predictive uncertainties towards the end of the study. In the HarmoniQuA guidelines uncertainty aspects play an important role in 13 of the 45 tasks. Thus, uncertainty assessment is a central element in the dialogue between modeller and water manager already at the beginning of the model study, when the initial performance criteria are outlined. Furthermore, HarmoniQuA recommends including less quantifiable elements such as scenario uncertainty and model structural uncertainty in the assessment.
- Model validation tests against independent data have more emphasis than in most other guidelines. Even the most comprehensive of the existing guidelines, e.g. the Dutch guidelines (Van Waveren et al., 2000), which recommend that validation be carried out, do not describe validation tests beyond the traditional split-sample test.
- The HarmoniQuA guidelines are unique in their dedication aspects, namely that different tasks and responsibilities are described for different users, different modelling domains and different levels of modelling job complexity. The Australian groundwater modelling guidelines have the same feature, but only with respect to the review procedures (Merrick et al., 2002).
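The split-sample and differential split-sample tests referred to above can be outlined as follows. This is only a schematic sketch of the hierarchical testing idea of Klemes (1986); the calibrate and evaluate functions are placeholders for whatever calibration routine and performance measure a given study uses:

import numpy as np

def split_sample(yearly_data, calibrate, evaluate, frac=0.5):
    # Classical split-sample test: calibrate on one half of the record,
    # validate on the other half, then swap the roles of the two halves.
    n = int(len(yearly_data) * frac)
    first, second = yearly_data[:n], yearly_data[n:]
    return (evaluate(calibrate(first), second),
            evaluate(calibrate(second), first))

def differential_split_sample(yearly_data, wetness, calibrate, evaluate):
    # Differential split-sample test: calibrate on the wetter years and
    # validate on the drier years (and vice versa), i.e. test the model under
    # conditions different from those it was calibrated for.
    order = np.argsort(wetness)                  # indices from driest to wettest year
    dry = [yearly_data[i] for i in order[: len(order) // 2]]
    wet = [yearly_data[i] for i in order[len(order) // 2:]]
    return (evaluate(calibrate(wet), dry),
            evaluate(calibrate(dry), wet))

# 'calibrate' is assumed to return a calibrated model, and 'evaluate' to return
# a performance measure (e.g. the Nash-Sutcliffe efficiency) for that model on
# the given validation period.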
The HarmoniQuA guidelines consist of a comprehensive set of QA guidelines for multiple modelling domains combined with the supporting software tools. These functionalities appear to be well suited to the challenges and demands of modern water resources management. The usefulness, user friendliness and appreciation by the users will be assessed through testing of the guidelines and tools in a range of river basin modelling projects.
Acknowledgements
The present work was carried out within the Project 'Harmonising Quality Assurance in model based
catchments and river basin management (HarmoniQuA)’, which is partly funded by the EC Energy, Environment and Sustainable Development programme (Contract EVK1-CT2001-00097). The constructive comments of five anonymous reviewers are acknowledged. References Anderson, M.G., Bates, P.D., 2001. Hydrological science: model credibility and scientific integrity. In: Anderson, M.G., Bates, P.D. (Eds.), Model Validation. Perspectives in Hydrological Science. John Wiley & Sons, Chichester, pp. 1–10. Anderson, M.P., Woessner, W.W., 1992. The role of postaudit in model validation. Advances in Water Resources 15, 167–173. Anderson, M.P., Ward, D.S., Lappala, E.G., Prickett, T.A., 1993. Computer models for subsurface water. In: Maidment, D.R. (Ed.), Handbook of Hydrology. McGraw-Hill, Inc (Chapter 22). ASTM, 1992. Standard Practice for Evaluating Mathematical Models for the Environmental Fate of Chemicals. Standard E978-92, American Society for Testing and Materials, http://www.astm.org. ASTM, 1994. Standard Guide for Application of a Ground-Water Flow Model to a Site-Specific Problem. Standard D5447-93, American Society for Testing and Materials, http://www.astm.org. Balint, G., 2002. State-of-the-art for flood forecasting modelling. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 7, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. BDMF, 2000. Protocols for Water and Environmental Modeling. Bay-Delta Modeling Forum. Ad hoc Modeling Protocols Committee, http://www.sfei.org/modelingforum/. Blind, M., 2004. ICT requirements for an ‘evolutionary’ development of WFD compliant River Basin Management Plans. In: Pahl, C., Schmidt, S., Jakeman, T. (Eds.), iEMSs 2004 International Congress: ‘‘Complexity and Integrated Resources Management’’. International Environmental Modelling and Software Society, Osnabru¨ck, Germany, June 2004. CAMASE, 1996. CAMASE was a Concerted Action for the Development and Testing of Quantitative Methods for research on Agricultural Systems and the Environment, http://www.bib.wau. nl/camase/. Da Silva, M.C., Barbosa, A.E., Rocha, J.S., Fortunato, A.B., 2002. State-of-the-art for surface water quality modelling. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 8, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. DHI, 2002. MIKE 11 User Guide. DHI Water & Environment, Hørsholm, Denmark. Forkel, C., 1996. Das numerische Modell – ein schmaler Grat zwischen vertrauenswu¨rdigem Werkzeug und gefa¨hrlichem Spielzeug. Presented at the 26. IWASA, RWTH Aachen, 4–5 January 1996. GWP-TAC, 2000. Integrated Water Management, TEC Background Papers No. 4, Global Water Partnership, SE-105 25 Stockholm, Sweden, ISBN: 91-630-9229-8. Heinz, I., Eberle, S., 2002. State-of-the-art for socio-economic modelling. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 10, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. Henriksen, H.J., 2002a. Australian groundwater modelling guidelines. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 13, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org.
Henriksen, H.J., 2002b. Danish groundwater modelling guidelines. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 14, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. Henriksen, H.J., Troldborg, L., Nyegaard, P., Sonnenborg, T.O., Refsgaard, J.C., Madsen, B., 2003. Methodology for construction, calibration and validation of a national hydrological model for Denmark. Journal of Hydrology 280 (1–4), 52–71. Kassahun, A., Scholten, H., Zompanakis, G., Gavardinas, C., 2004. Support for model based water management with the HarmoniQuA toolbox. In: Pahl, C., Schmidt, S., Jakeman, T. (Eds.), iEMSs 2004 International Congress: ‘‘Complexity and Integrated Resources Management’’. International Environmental Modelling and Software Society, Osnabru¨ck, Germany, June 2004. Klemes, V., 1986. Operational testing of hydrological simulation models. Hydrological Sciences Journal 31, 13–24. Merrick, N.P., Middlemis, H., Ross, J.B., 2002. Groundwater Modelling Guidelines for Australia – Recommended Procedures for Modelling Reviews. International Groundwater Conference. Balancing the Groundwater Budget. Northern Territory. Australia. 12–17 May 2002. Metelka, T., Krejcik, J., 2002a. State-of-the-art for hydrodynamic. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 6, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. Metelka, T., Krejcik, J., 2002b. Quality assurance in Central and Eastern Europe. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 15, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. Middlemis, H., 2000. Murray-Darling Basin Commission. Groundwater Flow Modelling Guideline. Aquaterra Consulting Pty Ltd. South Perth. Western Australia. Project no. 125. NRC, 1990. Ground Water Models: Scientific and Regulatory Applications. National Research Council, National Academy Press, Washington, D.C. Old, G.H., Packman, J.C., Calver, A.N., 2002. State-of-the-art for biota (ecological) modelling. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 9, Geological Survey of Denmark and Greenland, Copenhagen, http://www. harmoniqua.org. Packman, J.C., 2002. Quality Assurance in the UK. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 17, Geological Survey of Denmark and Greenland, Copenhagen, http://www. harmoniqua.org. Pahl-Wostl, C., 2002. Towards sustainability in the water sector – the importance of human actors and processes of social learning. Aquatic Sciences 64, 394–411. Pascual, P., Stiber, N., Sunderland, E., 2003. Draft Guidance on the Development, Evaluation, and Application of Regulatory Environmental Models. Council for Regulatory Environmental Modeling. US EPA, Washington D.C. Perrin, C., Andreassian, V., Michel, C., 2002a. State-of-the-art for precipitation-runoff modelling. In: Refsgaard, J.C. (Ed.), Stateof-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 5, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. Perrin, C., Andreassian, V., Michel, C., 2002b. Quality assurance for precipitation-runoff modelling in France. In: Refsgaard, J.C. 
(Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 16, Geological Survey of Denmark and Greenland, Copenhagen, http://www. harmoniqua.org.
J.C. Refsgaard et al. / Environmental Modelling & Software 20 (2005) 1201–1215 Refsgaard, J.C. (Ed.), 2002. State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Report from the EU research project HarmoniQuA, http://www. harmoniqua.org. 18 chapters, 182 pp. Geological Survey of Denmark and Greenland, Copenhagen. Refsgaard, J.C., Henriksen, H.J., 2002. State-of-the-art for Groundwater Modelling. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 4, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. Refsgaard, J.C., Henriksen, H.J., 2004. Modelling guidelines – terminology and guiding principles. Advances in Water Resources 27, 71–82. Rumbaugh, J.O., Rumbaugh, D.B., 2001. Guide to Using Groundwater Vistas. Environmental Simulations, Inc, Virginia, USA. Rykiel, E.R., 1996. Testing ecological models: the meaning of validation. Ecological Modelling 90, 229–244. Scholten, H., Van der Tol, M.W.M., 1998. Quantitative validation of deterministic models: when is a model acceptable? In: Obaidat, M.S., Davoli, F., DeMarinis, D. (Eds.), The Proceedings of the Summer Computer Simulation Conference. SCS, The Society for Computer Simulation International, San Diego, CA, USA, pp. 404–409.
Scholten, H., Groot, S., 2002. Dutch guidelines. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 12, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. Scholten, H., Van Waveren, R.H., Groot, S., Van Geer, F.C., Wösten, J.H.M., Koeze, R.D., Noort, J.J., 2000. Good Modelling Practice in Water Management. Paper presented at Hydroinformatics 2000, Cedar Rapids, IA, USA. Van Asselt, M.B.A., Rotmans, J., 2002. Uncertainty in integrated assessment modelling – From positivism to pluralism. Climatic Change 54 (1–2), 75–105. Van Gils, J.A.G., Groot, S., 2002. Examples of good modelling practice in the Danube Basin. In: Refsgaard, J.C. (Ed.), State-of-the-Art Report on Quality Assurance in Modelling Related to River Basin Management. Chapter 18, Geological Survey of Denmark and Greenland, Copenhagen, http://www.harmoniqua.org. Van Waveren, R.H., Groot, S., Scholten, H., Van Geer, F.C., Wösten, J.H.M., Koeze, R.D., Noort, J.J., 2000. Good Modelling Practice Handbook, STOWA Report 99-05, Utrecht, RWS-RIZA, Lelystad, The Netherlands, http://waterland.net/riza/aquest/ (In Dutch).
[14]
Refsgaard JC, Nilsson B, Brown J, Klauer B, Moore R, Bech T, Vurro M, Blind M, Castilla G, Tsanis I, Biza P (2005) Harmonised techniques and representative river basin data for assessment and use of uncertainty information in integrated water management (HarmoniRiB). Environmental Science and Policy, 8, 267-277.
Reprinted from Environmental Science and Policy with permission from Elsevier
Environmental Science & Policy 8 (2005) 267–277 www.elsevier.com/locate/envsci
Harmonised techniques and representative river basin data for assessment and use of uncertainty information in integrated water management (HarmoniRiB)
Jens Christian Refsgaard a,*, Bertel Nilsson a, James Brown b, Bernd Klauer c, Roger Moore d, Thomas Bech e, Michele Vurro f, Michiel Blind g, Guillermo Castilla h, Ioannis Tsanis i, Pavel Biza j
a Geological Survey of Denmark and Greenland (GEUS), Department of Hydrology, Øster Voldgade, DK-1350 Copenhagen, Denmark
b Universiteit van Amsterdam (UVA), Amsterdam, The Netherlands
c Centre for Environmental Research (UFZ), Leipzig, Germany
d Centre for Ecology and Hydrology (CEH), Wallingford, UK
e DHI Water and Environment (DHI), Hørsholm, Denmark
f Istituto di Ricerca Sulle Acque del CNR (IRSA), Bari, Italy
g Institute of Inland Water Management and Waste Water Treatment (RIZA), Lelystad, The Netherlands
h Universidad de Castilla – La Mancha (UCLM), Albacete, Spain
i Technical University Crete (TUC), Chania, Greece
j Povodi Moravi (PM), Brno, Czech Republic
Abstract
This paper describes progress on HarmoniRiB, a European Commission Framework 5 project. The HarmoniRiB project aims to support the implementation of the EU Water Framework Directive (WFD) by developing concepts and tools for handling uncertainty in data and modelling, and by designing, building and populating a database containing data and associated uncertainties for a number of representative basins. This river basin network aims at becoming a 'virtual laboratory for modelling studies', and it will be made available for the scientific community. The data may, e.g. be used for comparison and demonstration of methodologies and models relevant to the WFD.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: Uncertainty; River basin management; Data; Models; River basin network; HarmoniRiB; Water Framework Directive
1. Introduction
1.1. Problems to be addressed
The Water Framework Directive (WFD) provides a European policy basis at the river basin scale. The river basin management and planning process prescribed in the WFD is an adaptation of the Integrated Water Resources Management principles (GWP, 2000), involving all physical domains in water management, sectors of water use, socio-economics and stakeholder participation. As such,
the WFD poses new challenges to water resources managers. The traditional physical domain specific and sectoral approaches need to be combined and extended to fulfil the WFD requirements. The preparation of the river basin management plans, prescribed in the WFD, is furthermore influenced by uncertainties on the underlying data and modelling results. In several sections of the WFD document, uncertainty is addressed (Blind and de Blois, 2003). In addition, most of the WFD guidance documents, being more specific than the WFD document itself, explicitly emphasise that uncertainty analyses should be performed. However, in spite of strong recommendations to consider uncertainty aspects the guidance documents do not include recommendations on how to do so.
Therefore, there is a clear and urgent need for developing new concepts, methodologies and tools that can be used to assist in implementing the WFD. In order to support such research and development, it is necessary to have a network of representative river basins with datasets suitable for this purpose. This implies that the datasets, in addition to covering the diversity in terms of ecological regimes and socio-economic conditions found across Europe, must have built-in information on the uncertainties in the data.
1.2. Objectives
The paper presents status and preliminary results from an ongoing research project, HarmoniRiB, that is supported under the EU's 5th Framework Programme. The overall goal of HarmoniRiB is to develop methodologies for quantifying uncertainty and its propagation from the raw data to concise management information. The four specific project objectives are:
- To establish a practical methodology and a set of tools for assessing and describing uncertainty originating from data and models used in decision making processes for the production of integrated water management plans. This includes a methodology for integrating uncertainties in basic data and models and socio-economic uncertainties into a decision support concept applicable for implementation of the WFD.
- To provide a conceptual model for data management that can handle uncertain data and implement it for a network of representative river basins.
- To provide well documented datasets, suitable for studying the influence of uncertainty on management decisions for a network of representative river basins, and to provide examples of their use in the development of integrated water management plans.
- To disseminate intermediate and final results among researchers and end-users across Europe and to obtain and incorporate feedback on the methodologies, tools and the datasets.
2. Uncertainty assessments 2.1. Definitions and taxonomy Uncertainty and associated terms such as error, risk and ignorance are defined and interpreted differently by different authors (see Walker et al., 2003 for a review). The different definitions reflect, among other factors, the different scientific disciplines and philosophies of the authors involved, as well as the intended audience. In addition they vary depending on their purpose. Some are rather generic, such as Funtowicz and Ravetz (1990), while others apply more specifically to model based water management, such as Beck (1987). The terminology used in HarmoniRiB has
emerged after discussions between social scientists and natural scientists specifically aiming at applications in model based water management (Klauer and Brown, 2003). By doing so we adopt a subjective interpretation of uncertainty in which the degree of confidence that a decision maker has about possible outcomes and/or probabilities of these outcomes is the central focus. Thus, according to our definition a person is uncertain if s/he lacks confidence about the specific outcomes of an event. Reasons for this lack of confidence might include a judgement that the information is incomplete, blurred, inaccurate, imprecise or potentially false. Similarly, a person is certain if s/he is confident about the outcome of an event. It is possible that a person feels certain but has misjudged the situation (i.e. s/he is wrong). There are many different (decision) situations, with different possibilities for characterising of what we know or do not know and of what we are certain or uncertain. A first distinction is between ignorance as a lack of awareness about imperfect knowledge and uncertainty as a state of confidence about knowledge (which includes the act of ignoring). Our state of confidence may range from being certain to admitting that we know nothing (of use), and uncertainty may be expressed at a number of levels in between. Regardless of our confidence in what we know, ignorance implies that we can still be wrong (‘in error’). In this respect Brown (2004) has defined a taxonomy of imperfect knowledge illustrated in Fig. 1. In evaluating uncertainty, it is useful to distinguish between uncertainty that can be quantified, e.g. by probabilities and uncertainty that can only be qualitatively described, e.g. by scenarios. If one throws a balanced die, the precise outcome is uncertain, but the ‘attractor’ of a perfect die is certain: we know precisely the probability for each of the 6 outcomes, each being 1/6. This is what we mean with ‘uncertainty in terms of probability’. However, the estimates for the probability of each outcome can also be uncertain. If a model study says: ‘‘there is a 30% probability that this area will flood two times in the next year’’, there is not only ‘uncertainty in terms of probability’ but also uncertainty regarding whether the estimate of 30% is a reliable estimate. Secondly, it is useful to distinguish between bounded uncertainty, where all possible outcomes have been identified (they can be distinct or indistinct) and unbounded uncertainty, where the known outcomes are considered incomplete. Since quantitative probabilities require ‘all possible outcomes’ of an uncertain event and each of their individual probabilities to be known, they can only be defined for ‘bounded uncertainties’. If probabilities cannot be quantified in any undisputed way, we often can still qualify the available body of evidence for the possibility of various outcomes. The bounded uncertainty where all probabilities are deemed known (Fig. 1) is often denoted ‘statistical uncertainty’ (e.g. Walker et al., 2003). This is the case traditionally addressed in model based uncertainty assess-
ment. It is important to note that this case constitutes one of many decision situations outlined in Fig. 1, and in other situations the main uncertainty in a decision situation cannot be characterised statistically.
Fig. 1. Taxonomy of imperfect knowledge resulting in different uncertainty situations (Brown, 2004).
2.2. Framework for describing data uncertainty
By considering space–time variability and data type, Brown et al. (2005) have distinguished 13 categories of uncertain data (Table 1). By considering measurement scale, it becomes possible to quickly limit the relevant uncertainty models for a certain variable. On a discrete measurement scale, for example, it is only relevant to consider discrete probability distribution functions, whereas continuous density functions are required for continuous numerical data. In addition, the space–time variability of the data determines the need for autocorrelation functions alongside a probability density function (pdf). Brown et al. (2005) explain that this classification of data by measurement scale and space–time variability is useful for uncertainty assessment because: (1) it reduces the amount of required information requested from the user in populating a
database; (2) it reduces the amount of information stored in a database (model parameter values); (3) it ensures a close relationship between the structure of the probability model and the techniques used to estimate its parameters and; (4) it encourages planning of measurement campaigns for collecting information on uncertainty. Each data category is associated with a range of uncertainty models, for which more specific pdfs may be developed with different simplifying assumptions (e.g. Gaussian; second-order stationarity; degree of temporal and spatial autocorrelation). The advantages of allowing a range of possible models for each data category are threefold. First, there is a need to explicitly define an appropriate set of statistical assumptions for a particular dataset. Secondly, a range of possible assumptions can be defined a priori, and hence the significance of particular assumptions can be demonstrated with examples. Finally, the trade-off between model complexity, identifiability and reliability can be reviewed over time and balanced against the (changing) practical constraints on assessing uncertainty. For example, levels of risk and expertise can be associated with the simplifying assumptions allowed in a pdf, with default
Table 1. The subdivision and coding of uncertainty categories, along the 'axes' of space–time variability and measurement scale (Brown et al., 2005)

Space–time variability          | Continuous numerical | Discrete numerical | Categorical | Narrative
Constant in space and time      | A1                   | A2                 | A3          | 4
Varies in time, not in space    | B1                   | B2                 | B3          | 4
Varies in space, not in time    | C1                   | C2                 | C3          | 4
Varies in time and space        | D1                   | D2                 | D3          | 4

(All narrative data, regardless of their space–time variability, form a single category, coded 4.)
models for low-risk applications involving users with limited expertise. Minimum requirements can also be identified for specific datasets, such as data on toxic chemicals. Categorical data (3) differ from numerical data (1, 2) and narrative (4) in three important ways. First, categorical data cannot be manipulated statistically (i.e. computation of mean and variance), because the categories are not measured on a numerical scale. Secondly, individual values may be assigned to unique classes (one value to one class), where pdfs are based on the measured frequency, or perceived probability (Bayes rule), that a value occurs in a particular ‘hard’ class or they can be partially assigned to multiple classes (fuzzy), where probabilities reflect doubt about the proportional membership of a value to a particular class (Heuvelink and Burrough, 1993). For the purposes of an uncertainty analysis, this distinction is important, because accuracy assessments are more complicated for fuzzy descriptions of reality. An important issue often overlooked with categorical data (e.g. the confusion matrix in landcover classification) is the problem of correlation in space and time or between datasets, since traditional statistical techniques do not apply to categorical data. Reviews with results on data uncertainty reported in the literature have been compiled into a guideline report for assessing uncertainty in various types of data originating from meteorology, soil physics and geochemistry, hydrogeology, land cover, topography, discharge, surface water quality, ecology and socio-economics (Van Loon and Refsgaard, 2005). 2.3. Software tool to support uncertainty assessment in data and models The components of the HarmoniRiB uncertainty software are shown in Fig. 2. There are four software components in the HarmoniRiB design, namely: (1) a module for assessing uncertainties in data and storing this information within a database design
(the database design is described briefly below) (assess data uncertainty); (2) a module for assessing uncertainties in models (assess model uncertainty); (3) a module for sampling from a distribution of uncertain inputs and (possibly) model parameters and implementing the model for each realisation of the uncertain inputs and parameters (uncertainty propagation); (4) a module for synthesising and presenting the uncertainty results (present uncertainty).
The Data Uncertainty Engine (DUE) is illustrated in Fig. 3. It separates the analysis of data uncertainties into four stages, whereby objects are first imported into the software (1), the sources of uncertainty are then identified (2) (important for a structured analysis) and translated into a simple model (3) (e.g. a probability model) from which 'alternative realities' can be generated. These 'alternative realities' are used in an uncertainty propagation analysis to establish the impacts of data uncertainty on other operations, such as modelling. Finally, it is necessary to reflect on the quality of the uncertainty analysis (4), as such analyses are fraught with assumptions and difficulties and can be misleading without quality control. The information required to generate 'alternative realities' of one or more environmental attributes is stored in the project database (see below). The methodology proposed for assessing model uncertainty is outlined in Refsgaard et al. (submitted for publication).
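As a minimal illustration of these stages (not the actual DUE or HarmoniRiB software; the rainfall series, the error model and the toy runoff model are invented for this example), the sketch below defines a simple uncertainty model for a daily rainfall series (multiplicative Gaussian error with lag-1 temporal autocorrelation), generates 'alternative realities' from it, and propagates them through a model by Monte Carlo simulation:

import numpy as np

rng = np.random.default_rng(42)

# An observed daily rainfall series (mm) and an assumed uncertainty model for it:
# multiplicative Gaussian error with coefficient of variation cv and lag-1
# autocorrelation rho (both values are assumptions made for this example).
rain = np.array([0.0, 5.2, 12.1, 3.4, 0.0, 0.0, 8.7, 15.3, 2.2, 0.0])
cv, rho = 0.2, 0.6

def rainfall_realisation():
    # Generate one 'alternative reality' of the rainfall series.
    eps = np.empty(len(rain))
    eps[0] = rng.normal()
    for t in range(1, len(rain)):
        eps[t] = rho * eps[t - 1] + np.sqrt(1 - rho ** 2) * rng.normal()
    return np.maximum(rain * (1.0 + cv * eps), 0.0)

def runoff(p, coeff=0.35):
    # Toy model: runoff volume as a fixed fraction of rainfall.
    return coeff * p.sum()

# Uncertainty propagation: run the model for many realisations of the input.
samples = np.array([runoff(rainfall_realisation()) for _ in range(2000)])
print(f"runoff volume: mean={samples.mean():.1f} mm, "
      f"90% interval=({np.percentile(samples, 5):.1f}, {np.percentile(samples, 95):.1f}) mm")

In the real tool chain the uncertainty model and its parameters would be retrieved from the HarmoniRiB database rather than hard-coded, and the model would be a full hydrological or water quality model rather than a one-line runoff coefficient.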
Fig. 2. HarmoniRiB software components.
Fig. 3. Screen shots from the HarmoniRiB data uncertainty assessment tool.
2.4. Uncertainty in socio-economics
Often uncertainty assessments are confined to uncertainties in data and models originating from natural science. We also consider uncertainty in socio-economic aspects by developing concepts based on the management of water resources and river basins (e.g. Cech, 2003). This work takes into account the literature on evaluation, e.g. cost-benefit analysis (Hanley and Spash, 1993; Bergstrom et al., 2001), multicriteria analysis (Roy, 1996; Munier, 2004) and decision making under uncertainty (Jungermann et al., 1998). The innovative aspects of our work lie in the further development of these ideas to support the implementation of the WFD, and particularly in elaborating the role of uncertainty in the process of creating and selecting management measures.
The uncertainty in socio-economic data from official statistics (Eurostat, statistical bureaus of the German Länder and the FRG) has been surveyed. We found that the efforts to produce accurate economic data are enormous, but the knowledge and awareness of the remaining uncertainties are generally low. Despite the lack of knowledge and awareness about uncertainty in socio-economic data and their sources, we judge the consideration of these uncertainties in river basin management to be highly relevant. On the basis of our investigations and our experience, we expect that it will be difficult to reach a meaningful quantification of many of these uncertainties. Methods for the systematic collection of qualitative information on uncertainties, as well as strategies to deal with uncertainties that are not necessarily based on quantification, are therefore needed.
3. Databases for accommodating uncertain data
3.1. Functionality with respect to data uncertainty
We have designed and developed software for a database that can handle data and data uncertainty. The novelty of this database is that it meets the following requirements: It can store time-series data. It can store spatial data, both raster and vector, as well as time-series of spatial data.
It can store information about uncertainty in these data. The uncertainty characteristics are described according to the uncertainty categories listed in Table 1. This implies that for the continuous data types the uncertainty is described by use of a probability density function (pdf) and a correlation matrix (or correlation function) for normally distributed data. For categorical data (such as land cover or soil type), a non-parametric distribution is typically required, and may be stored alongside transition probabilities for describing statistical dependence. The HarmoniRiB database design therefore allows the user to associate a probability model with each uncertain data item. In future, the database will be extended to allow numerical bounds (e.g. confidence intervals) and scenarios when probabilities cannot be defined. Information on the sources of uncertainty and the quality of an uncertainty model is also stored in the database. An initial list of pdfs and autocorrelation functions are included in a Probability Distribution Function Dictionary and an Autocorrelation Function Dictionary of the database. In addition the software will allow a user to add new functions when required. In practice, it may not be possible to calculate the pdf parameters for every attribute value in the database individually. It may only be feasible to calculate them at the level of the attribute with which the value is associated (i.e. an assumption of stationarity in space or time). In all cases, an uncertainty model is referenced by an Uncertainty Model ID (UMID), which acts as a pointer to an uncertainty model that applies to a specific location in space or time and to the information on statistical dependence between locations and attributes.
3.2. General database functionality The overall aim of the HarmoniRiB database system is to enable the HarmoniRIB Data Centre to receive, quality control, store and make available the representative basin data being assembled by the project. Ideally, it should be able to handle any data required for developing WFDcompliant River Basin Management Plans. This includes data for underlying modelling studies, and thus exceeds the WFD needs for reporting or river basin characterisations. The data will cover a wide range of water related topics but will mainly take the form of site descriptions and time series records. They will also include spatial data describing site locations, networks and variables such as land use or elevation. The proposed HarmoniRiB database design for holding these data is generic and is based on the WIS Cube (Moore, 1997). The major enhancements are not only the inclusion of uncertainty but also the seamless linking of metadata to data and a new underlying table design. At the user level, a HarmoniRiB database perceives the world as being composed of objects. These are any objects whose description and history the user wishes to record. The types or classes of object are decided by the user. Examples of object classes relevant to the WFD are sampling points, wells, reservoirs and rivers. The descriptions of objects and the events observed at them are recorded in terms of attribute values. Attributes, like object classes, are decided and defined by the user, the definitions being held in a dictionary. A wide range of spatial and non-spatial data types are supported, allowing the system to record most known or foreseeable types of attribute information required for the implementation of the WFD. Examples of attributes are object identifiers (names, reference codes, serial numbers, etc.), position, mean daily river flow, concentration (of e.g. nitrate), soil type and hydraulic conductivity. At the conceptual level, there is no differentiation between spatial and non-spatial attributes. They are all stored within the same logical framework. One way of visualising the manner in which data are stored in a HarmoniRiB database is to imagine a large cube, made up of individual cells as shown in Fig. 4. The three axes of this cube represent objects (WHERE observations were made), attributes (which record WHAT the observation was a measure of) and occasions (WHEN the observations were made). Thus, each cell in the cube records the value of an attribute at a particular object for a particular point in time. For example, one cell might record the concentration of calcium on 29 June 2002 at 10:20 (GMT) in the river Thames at Wallingford. The design regards all attribute values as potentially changeable over time, thus enabling it to handle time-series data such as river flow. This facility applies to spatial attributes as well as conventional time series making it possible to track an object’s movement. There is no constraint on the number of objects, attributes or occasions
Fig. 4. The Cube as a way of visualising how time series data are stored (Tindal et al., 2004).
which can be recorded, other than that imposed by the physical limits of the hardware. The Cube is otherwise unlimited in all directions. The cells in the cube hold the users' data. Each cell contains a single attribute value. A cell can also contain some or all of the following information associated with the value:
- A qualifier for the value. A qualifier is an item of information which users may enter in order to amplify the meaning of an attribute value. For example, qualifiers may be useful in: bird or bacteriological count attributes, where the value may take the form of, say, 'more than 10,000' (the value would be entered as 10,000, and the qualifier as '>'); and chemical concentration attributes, where the actual concentration is unknown but it is possible to say that it is less than a certain value representing the limit of detection of the analysis method (the value would be entered as the limit of detection, for example 0.001, and the qualifier as '<').
- A method of derivation identifier. The method code is a user defined code identifying the source from which the value was obtained or the method by which it was derived. This information can be used, for example, by future users of the value, to determine its reliability.
- A measure of the value's uncertainty in the form of a reference to an uncertainty model stored elsewhere in the database. This part of the requirement represents the major area of innovation and is likely to evolve as the project progresses.
- Dataset ID. Every value in the database has a pointer connecting the value to the dataset of which it is a member. The definition of what constitutes a dataset is up to the user. The only mandatory part of its definition is that the data values that make up a dataset must be owned by the same person or organisation. This condition is necessary to facilitate access control, which will relate to 'owned' blocks of data.
- Uncertainty Model ID. Each value contains a reference to an uncertainty model, which describes the range of possible values that an attribute might take at a given location.
At the physical level, the data will be stored in a set of tables in a relational database such as Oracle. These will be held in a single account managed by the database administrator. Approved applications such as the data load facility will have direct access to this account and will be able to select and update data. Users and user-written applications will be given read-only access to the database via their own accounts. The database software is developed for application on an ArcSDE/ArcGIS platform using ESRI technology.
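A highly simplified sketch of a cell in this design might look as follows. The class and field names are invented for illustration and do not reproduce the actual HarmoniRiB table design; the sketch simply combines the calcium example above with the qualifier, method code, dataset and uncertainty model references just described:

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class UncertaintyModel:
    umid: int                    # Uncertainty Model ID referenced from data values
    pdf_name: str                # entry in the Probability Distribution Function Dictionary
    pdf_parameters: dict         # e.g. {"mean": 0.0, "sd": 0.05} for a relative error
    autocorrelation: Optional[str] = None   # entry in the Autocorrelation Function Dictionary

@dataclass
class CubeCell:
    object_id: str               # WHERE the observation was made, e.g. a sampling point
    attribute: str               # WHAT was observed
    occasion: datetime           # WHEN it was observed
    value: float
    qualifier: Optional[str] = None     # e.g. "<" for a value below the detection limit
    method_code: Optional[str] = None   # how the value was obtained or derived
    dataset_id: Optional[int] = None    # pointer to the owning dataset
    umid: Optional[int] = None          # pointer to the value's uncertainty model

# Invented example: a concentration reported as below a detection limit of 0.001 mg/l.
um = UncertaintyModel(umid=42, pdf_name="normal",
                      pdf_parameters={"mean": 0.0, "sd": 0.05})
cell = CubeCell("River Thames at Wallingford", "calcium concentration (mg/l)",
                datetime(2002, 6, 29, 10, 20), 0.001, qualifier="<",
                method_code="LAB-001", dataset_id=7, umid=um.umid)
print(cell)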
4. River basin network and data
Many networks of river basin data have been established for research purposes during the last couple of decades. A review of the characteristics of existing networks with respect to type of data, geographical coverage, data
accessibility and data use by third parties is provided by Passarella and Vurro (2003). Examples of existing international networks are Flow Regimes from International Experimental and Network Data (FRIEND); Global Runoff Data Centre (GRDC); Hydrology for the Environment, Life and Policy (HELP); World Hydrological Cycle Observing System (WHYCOS); European River and Catchment Database Pilot Project (ERICA); Inventory of the Catchments for Research in Europe (ICARE) metadatabase and the Experimental Representative Basins (ERB) network and GLOWA. In addition to these international networks, many national databases containing data from national networks of river basins exist, e.g. Lowland Catchment Research (LOCAR); Data Storage for the Rijkswaterstaat (DONAR) and British Oceanographic Data Centre (BODC). Some of the existing networks provide data for operational purposes, while most of them have been established for research purposes. Many of these networks have existed for long periods and have served (and still do) important purposes. However, seen from a Water Framework Directive perspective, most of them have the key deficiency that they focus on only some aspects (domains) of
data required for water management under the WFD, and most typically they do not contain data on ecological and socio-economic aspects. Even comprehensive national databases such as LOCAR and DONAR do not contain much data on groundwater, land use and socio-economics. Among the international networks HELP has the broadest scope, with a focus on socio-economic aspects. HELP, however, does not include groundwater or coastal water data. Furthermore, HELP so far only consists of rather few river basins worldwide and does not have a good coverage in Europe. Thus, none of the existing river basin networks can provide suitable datasets for supporting research on integrated water management of direct relevance for implementation of the WFD. In addition, none of the existing networks comprise any quantifiable information on data uncertainty. Consequently, it is concluded that there is a clear need to supplement the existing networks with a network of representative river basins that has as its principal aim to provide data supporting research in integrated water resources management as required by the WFD. The HarmoniRiB river basin network is meant for this purpose.
The HarmoniRiB network of representative river basins comprises eight basins, see Fig. 5 for locations and Table 2 for characteristic features. These basins have been selected to ensure a good coverage across Europe in terms of ecoregions, types of water problems, socio-economic conflicts and amount and quality of existing data. In addition, two of the river basins (Odense and Jucar) are also included in the Pilot River Basin Network, where the EC guidance documents have been tested. The aim of HarmoniRiB is, through interaction with the respective river basin organisations and data owners, to provide well documented data for research purposes, suitable for studying the influence of uncertainty on management decisions. The data will be publicly accessible for all research purposes. Thus, scientists may use the data to, e.g. assess the appropriateness of models and other tools in relation to the WFD.
For each of the eight river basins a comprehensive amount of data is presently being collected and uploaded to the HarmoniRiB database. The data basically include all data that are required to carry out analyses for the WFD implementation (Blind and de Blois, 2003). Most of the data are organised in seven datasets: one for each of the six domains climate, rivers, lakes, groundwater, transitional waters and coastal waters, and one for spatial data, river basin characteristics and socio-economic data. Specific lists of data have been prepared by matching the data requirements given in the guidance documents on 'Monitoring' (EC, 2003b) and 'Analysis of pressures and impacts' (EC, 2003a) with the data available in the respective river basins (Rasmussen, 2003). After collecting and reformatting the data they are being uploaded to the HarmoniRiB Data Centre. Subsequently, uncertainty will be assessed and added to the data following the framework outlined above.

Fig. 5. Location of the HarmoniRiB network of representative river basins.

Table 2. Key characteristics of the HarmoniRiB network of representative river basins

Country, river basin | Area (km2) | Dominant land use | Main water uses | Main conflicting interest
CZ, Svratka | 3998 | Agriculture, forest | Drinking water, electrical power, recreation, nature | Flood protection, minimum discharges, water quality
DE, Weisse Elster | 5325 | Agriculture | Drinking water, industry | Point and non-point sources; wastewater and contaminated sites; strong economic and social changes
DK, Odense | 1090 | Agriculture | Public water supply, recreation, nature | Agricultural contamination; groundwater abstraction depletes stream flow and wetlands
ES, Jucar | 21328 | Agriculture | Irrigation, hydroelectric, touristic supply, industry | Farming use; hydroelectric use; touristic water demand
GR, Geropotamou | 600 | Agriculture | Irrigation, touristic | Water shortage, water quality, oversized dam, salt intrusion, difficulties in sharing water among municipalities
IT, Candelaro | 1980 | Agriculture | Irrigation, industry | Water shortage; decreasing rainfall; intensive horticultural farming
NL + DE, Vecht | 3780 (1980 in NL) | Industry, agriculture, habitation | Agriculture, drinking water, receiving water, recreation | Agriculture, water quality, ecology, flooding - room for water retention
UK, Thames | 12917 | Urban, agriculture | Public water supply, ecosystem, recreation | Water supply vs. ecology
5. Case studies
The methodologies will be tested through one case study in each of the eight river basins. The focus in the case studies will be assessment of uncertainties related to various aspects of the decision process for evaluating potential measures for achieving the WFD objective of good ecological status. The following aspects of uncertainty will be considered:
- Uncertainty related to framing of the decision making process. This uncertainty will typically be described in qualitative terms.
- Uncertainty related to prediction of the effects of a given measure, i.e. the impact of a given management decision such as changes in agricultural practice or abstraction of groundwater. Such predictions will often be made by use of hydrological models and involve the following sources of uncertainty:
- Uncertainty of input data.
- Uncertainty of model parameter values.
- Uncertainty of model techniques (numerical solution, software bugs, etc.).
- Uncertainty of model structure.
- Uncertainty in economic assessments, which, as for uncertainty in hydrological model predictions, may originate from economic data and from the choice of evaluation method.
A key problem in assessing the uncertainty of the effects of a measure is that the effects usually are estimated as a
difference between two model simulations, e.g. a reference run describing the present conditions and a run where the measure is taken into account. Procedures for assessing the uncertainty of a model simulation are well known, while procedures for assessing uncertainties in differences between two simulation runs are theoretically difficult and rarely used. However, here we are mainly interested in the uncertainty in the difference figures. These uncertainties related to differences in simulated output may be much smaller than the uncertainties in the model predictions of each simulation (Reichert and Borsuk, 2005), as many sources of uncertainty affect the predictions for different alternatives in similar ways.
The results of the case studies will be uncertainties expressed partly quantitatively and partly qualitatively. The quantitative parts may be illustrated as in Fig. 6, where the uncertainty in the impacts (from the hydrological models) is shown along the vertical axis and the uncertainty in the costs of implementing a measure is shown along the horizontal axis. In the hypothetical example shown in Fig. 6, measure no. 1 (PoM 1) is clearly suboptimal compared to the two other measures, because its effect is much lower and its implementation cost higher. A decision on whether to choose PoM 2 or PoM 3 is, however, more difficult, because the uncertainty ranges overlap with regard to both effects and costs. The choice will also be influenced by the risk strategy of the decision maker. If the decision maker wants a high degree of certainty for an effect corresponding to the dashed line denoted 'Minimum effect', s/he will have to select PoM 3, even if the expected cost efficiency of PoM 2 is more favourable.
Fig. 6. Graphical representation of uncertainty in simulated effect of measure vs. estimated uncertainty in cost of implementing a measure.
6. Discussion and conclusions
Assessment of uncertainty in model simulations is important when such models are used to support decisions in water resources management (Beven and Binley, 1992; Pahl-Wostl, 2002; Jakeman and Letcher, 2003; Refsgaard and Henriksen, 2004). This is reflected in the EU's new water management approaches as described in the Water Framework Directive (EC, 2000) and the associated guidance documents. A basic principle in EU environmental policy on which the WFD is based is ‘‘. . . to contribute to pursuit of the objectives of preserving, protecting and improving the quality of the environment, in prudent and rational use of natural resources, and to be based on the precautionary principle . . .’’ (paragraph 11 in the directive). The holistic concept that is prescribed in the WFD, with its integrated approach to natural resources and socio-economic issues, therefore requires that uncertainty be considered in the decision making process in order for it to become truly rational. This need for taking uncertainties into account is also explicitly stated in the WFD guidance documents (Blind and de Blois, 2003).

The key sources of uncertainty of importance for evaluating the effect and cost of a measure in relation to preparing a WFD-compliant river basin management plan are (1) uncertainty related to framing of the decision making process; (2) uncertainty related to hydrological models (input data, parameter values, model technique, model structure); and (3) uncertainty in economic assessments. The framework adopted in HarmoniRiB addresses this wide spectrum of uncertainties. The particularly novel contributions of HarmoniRiB in this respect are related to the assessment of uncertainty in data and to the integration of uncertainty in the effects of a measure (outputs from hydrological models) and socio-economic uncertainty, including uncertainty in the costs of implementing a measure.

New principles often lead to a demand for new research to support their implementation. This is also the case for the WFD. Hence there is a need for easy access to river basin datasets suitable for WFD related research. None of the existing international river basin networks can provide suitable datasets for supporting research on integrated water management of direct relevance for implementation of the WFD. In addition, none of the existing networks comprise any quantifiable information on data uncertainty. The HarmoniRiB project aims at filling this gap by designing, building and populating a database containing data and associated uncertainties for eight river basins representatively characterising the diversity of climatic regimes and water management challenges across Europe. This river basin network aims at becoming a ‘virtual laboratory for modelling studies’, and it will be made available to the scientific community. The data may, e.g., be used for comparison and demonstration of methodologies and models relevant to the WFD.
Acknowledgement

This work is partly funded by the EC Energy, Environment and Sustainable Development programme (Contract EVK1-2002-00109).
References

Beck, M.B., 1987. Water quality modelling: a review of the analysis of uncertainty. Water Resour. Res. 23 (8), 1393–1442.
Bergstrom, J.C., Boyle, K.J., Poe, G.L. (Eds.), 2001. The Economic Value of Water Quality. Edward Elgar, Cheltenham.
Beven, K., Binley, A.M., 1992. The future of distributed models, model calibration and uncertainty predictions. Hydrol. Processes 6, 279–298.
Blind, M., de Blois, C., 2003. The Water Framework Directive and its Guidance Documents — Review of data aspects. In: Refsgaard, J.C., Nilsson, B. (Eds.), Requirements Report. Geological Survey of Denmark and Greenland, Copenhagen (Chapter 5). Available on http://www.harmonirib.com/.
Brown, J.D., 2004. Knowledge, uncertainty and physical geography: towards the development of methodologies for questioning belief. Trans. Inst. Br. Geographers 29 (3), 367–381.
Brown, J.D., Heuvelink, G.B.M., Refsgaard, J.C., 2005. An integrated framework for assessing and recording uncertainties about environmental data. To appear in a special issue of Water Sci. Technol.
Cech, T.V., 2003. Principles of Water Resources — History, Development, Management, and Policy. John Wiley & Sons, New York.
EC, 2000. Water Framework Directive. Directive 2000/60/EC. European Commission.
EC, 2003a. Guidance for the analysis of Pressures and Impacts in accordance with the Water Framework Directive. Working Group 2.1. Available on http://forum.europa.eu.int/Public/irc/env/wfd/library.
EC, 2003b. Water Framework Directive, Common Implementation Strategy. Working Group 2.7. Monitoring. Available on http://forum.europa.eu.int/Public/irc/env/wfd/library.
Funtowicz, S.O., Ravetz, J., 1990. Uncertainty and Quality in Science for Policy. Kluwer Academic Publishers, Dordrecht.
GWP, 2000. Integrated Water Resources Management. TAC Background Papers No. 4. Global Water Partnership, Stockholm. Available on http://www.gwpforum.org/.
Hanley, N., Spash, C.L., 1993. Cost-Benefit Analysis and the Environment. Edward Elgar, Brookfield.
Heuvelink, G.B.M., Burrough, P.A., 1993. Error propagation in cartographic modelling using Boolean logic and continuous classification. Int. J. Geogr. Inform. Sci. 7 (3), 231–246.
Jakeman, A.J., Letcher, R.A., 2003. Integrated assessment and modelling: features, principles and examples for catchment management. Environ. Modell. Software 18, 491–501.
Jungermann, H., Pfister, H.-R., Fischer, K., 1998. Die Psychologie der Entscheidung (The Psychology of Decisions). Spektrum Akademischer Verlag, Heidelberg.
Klauer, B., Brown, J.D., 2003. Conceptualising imperfect knowledge in public decision making: ignorance, uncertainty, error and ‘risk situations’. Environ. Res., Eng. Manage.
Moore, R.V., 1997. The logical and physical design of the Land Ocean Interaction Study database. Sci. Total Environ. 194/195, 137–146.
Munier, N., 2004. Multicriteria Environmental Assessment. Kluwer Academic Publishers, Dordrecht.
Pahl-Wostl, C., 2002. Towards sustainability in the water sector — the importance of human actors and processes of social learning. Aquatic Sci. 64, 394–411.
Passarella, G., Vurro, M., 2003. Review of Existing River Basin Networks. In: Refsgaard, J.C., Nilsson, B. (Eds.), Requirements Report. Geological Survey of Denmark and Greenland, Copenhagen (Chapter 3). Available on http://www.harmonirib.com/.
Rasmussen, P., 2003. Requirements for Data for HarmoniRiB. In: Refsgaard, J.C., Nilsson, B. (Eds.), Requirements Report. Geological Survey of Denmark and Greenland, Copenhagen (Chapter 7). Available on http://www.harmonirib.com/.
Refsgaard, J.C., Henriksen, H.J., 2004. Modelling guidelines — terminology and guiding principles. Adv. Water Resour. 27, 71–82.
Refsgaard, J.C., van der Sluijs, J.P., Brown, J., van der Keur, P., submitted for publication. A framework for dealing with uncertainty due to model structure error.
Reichert, P., Borsuk, M.E., 2005. Does high forecast uncertainty preclude effective decision support? Environ. Modell. Software 20 (8), 991–1001.
Roy, B., 1996. Multicriteria Methodology for Decision Aiding. Kluwer Academic Publishers, Dordrecht.
Tindal, C.I., Moore, R.V., Dunbar, M., Goodwin, T., 2004. The HarmoniRiB project — the effect of uncertainty on catchment management. In: British Hydrological Society International Conference on Hydrology: Science and Practice for the 21st Century, 12–16 July 2004, London, UK.
Walker, W.E., Harremoës, P., Rotmans, J., Van der Sluijs, J.P., Van Asselt, M.B.A., Janssen, P., Krayer von Krauss, M.P., 2003. Defining uncertainty. A conceptual basis for uncertainty management in model-based decision support. Integrated Assess. 4 (1), 5–17.
Van Loon, E., Refsgaard, J.C. (Eds.), 2005. Guidelines for assessing data uncertainty in hydrological studies. First draft version prepared September 2004. Final version to be published beginning of 2005 on http://www.harmonirib.com/.

Jens Christian Refsgaard is co-ordinator of the HarmoniRiB project. Since his graduation in hydrology at the Technical University of Denmark in 1976 he has worked with hydrological modelling and water resources management at DTU, DHI and now at GEUS, where he holds a position as research professor. He is currently also WP leader in HarmoniQuA (quality assurance in the modelling process) and NeWater (new approaches in water resources management).

Bertel Nilsson has been a research scientist in hydrogeology at the Geological Survey of Denmark and Greenland since 1988.

James Brown is a postdoctoral research associate at the University of Amsterdam with interests in environmental modelling, methods for uncertainty analysis of models, and the impacts of scientific uncertainty on decision making.
Bernd Klauer has a professional background in mathematics, physics and economics. After his PhD in economics from the University of Heidelberg he joined the UFZ Centre for Environmental Research, Leipzig. There he currently works as a senior scientist and leader of a research group on integrated assessment and decision support.

Roger Moore is a member of the Centre for Ecology and Hydrology, UK. His background lies in civil engineering, but he has spent most of his career working on integrated database design, mainly in the UK but also around the world. Currently, he is also co-ordinator for the FP5 project HarmonIT.

Thomas Bech holds an MSc in electronics engineering and computer science, and has worked as software developer and project manager at Seven Technologies and DHI Water & Environment. He is currently working as a Software Development Manager at DHI Water & Environment.

Michele Vurro graduated in hydraulic engineering. He has been a researcher at CNR-IRSA since 1982 and is now principal researcher with responsibility for methodologies and techniques for protecting and managing water resources, with particular emphasis on the water budget under scarce water availability.

Michiel Blind, MSc in Environmental Science – Water Systems Analysis, worked for 5 years on monitoring network design at Wageningen University, after which he continued his career at RWS-RIZA on IT-water management issues. He is mainly involved in European research projects on catchment modelling.

Guillermo Castilla is a forest engineer specialised in remote sensing and GIS. He is currently involved in the dissemination activities of HarmoniRiB.

Ioannis K. Tsanis is a professor in the Department of Environmental Engineering at the Technical University of Crete. He obtained his PhD in civil engineering from the University of Toronto. His research activities are in the areas of hydroinformatics, water resources management and coastal engineering. His main background is hydrological modelling, water resources management and hydroinformatics.

Pavel Biza was educated in civil engineering and developed his career at the water board Povodi Moravy in the Czech Republic. He is now involved in the development of river basin management plans.
[15]
Refsgaard JC, van der Sluijs JP, Brown J, van der Keur P (2006). A framework for dealing with uncertainty due to model structure error. Advances in Water Resources, 29, 1586-1597.
Reprinted from Advances in Water Resources with permission from Elsevier
Advances in Water Resources 29 (2006) 1586–1597 www.elsevier.com/locate/advwatres
A framework for dealing with uncertainty due to model structure error

Jens Christian Refsgaard a,*, Jeroen P. van der Sluijs b, James Brown c, Peter van der Keur a

a Department of Hydrology, Geological Survey of Denmark and Greenland (GEUS), Øster Voldgade 10, 1350 Copenhagen, Denmark
b Copernicus Institute for Sustainable Development and Innovation, Department of Science Technology and Society, Utrecht University, Utrecht, The Netherlands
c University of Amsterdam (UVA), Amsterdam, The Netherlands

Received 29 July 2004; received in revised form 6 September 2005; accepted 21 November 2005. Available online 5 January 2006.
Abstract

Although uncertainty about structures of environmental models (conceptual uncertainty) is often acknowledged to be the main source of uncertainty in model predictions, it is rarely considered in environmental modelling. Rather, formal uncertainty analyses have traditionally focused on model parameters and input data as the principal source of uncertainty in model predictions. The traditional approach to model uncertainty analysis, which considers only a single conceptual model, may fail to adequately sample the relevant space of plausible conceptual models. As such, it is prone to modelling bias and underestimation of predictive uncertainty. In this paper we review a range of strategies for assessing structural uncertainties in models. The existing strategies fall into two categories depending on whether field data are available for the predicted variable of interest. To date, most research has focussed on situations where inferences on the accuracy of a model structure can be made directly on the basis of field data. This corresponds to a situation of ‘interpolation’. However, in many cases environmental models are used for ‘extrapolation’; that is, beyond the situation and the field data available for calibration. In the present paper, a framework is presented for assessing the predictive uncertainties of environmental models used for extrapolation. It involves the use of multiple conceptual models, assessment of their pedigree and reflection on the extent to which the sampled models adequately represent the space of plausible models.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Environmental modelling; Model error; Model structure; Conceptual uncertainty; Scenario analysis; Pedigree
1. Introduction

1.1. Background

Assessing the uncertainty of model simulations is important when such models are used to support decisions about water resources [6,33,23,39]. The key sources of uncertainty in model predictions are (i) input data; (ii) model parameter values; and (iii) model structure (= conceptual model). Other authors further distinguish uncertainty in model context, model assumptions, expert judgement and indicator choice [46,54,48] but these are beyond the scope of this paper. Uncertainties due to input data and due to parameter values have been dealt with in many studies, and methodologies to deal with these are well developed. However, no generic methodology exists for assessing the effects of model structure uncertainty, and this source of uncertainty is frequently neglected.

Any model is an abstraction, simplification and interpretation of reality. The incompleteness of a model
structure and the mismatch between the real causal structure of a system and the assumed causal structure as represented in a model always result in uncertainty about model predictions. The importance of the model structure for predictions is well recognised, even for situations where predictions are made on output variables, such as discharge, for which field data are available [16,8]. The considerable challenge faced in many applications of environmental models is that predictions are required beyond the range of available observations, either in time or in space, e.g. to make extrapolations towards unobservable futures [2] or to make predictions for natural systems, such as ecosystems, that are likely to undergo structural changes [4]. In such cases, uncertainty in model structure is recognised by many authors to be the main source of uncertainty in model predictions [44,13,31,28].

1.2. An example – five alternative conceptual models

The problem is illustrated for a study conducted by the County of Copenhagen in 2000 involving a real water management decision [11,37]. The County of Copenhagen is the authority responsible for water resources management in the county where the city of Copenhagen abstracts groundwater for most of its water supply. According to a new Water Supply Act the county had to prepare an action plan for protection of groundwater against pollution. As a first step, the county asked five groups of Danish consulting firms to conduct studies of the aquifer's vulnerability towards pollution in a 175 km2 area west of Copenhagen, where the groundwater abstraction amounts to about 12 million m3/year. The key question to be answered was: which parts of this particular area are most vulnerable to pollution and need to be protected?

The five consultants were among the most well reputed consulting firms in Denmark, and they were known to have different views and preferences on which methodologies are most suitable for assessing vulnerability. As the task was one of the first consultancy studies on a new major market for preparation of groundwater protection plans it was considered a prestigious job to which the consultants generally allocated some of their most qualified professionals.

The five consultants used significantly different approaches. One consultant based his approach on annual fluctuations of piezometric heads assuming that larger fluctuations represent greater interaction between aquifer and surface water systems and hence a larger vulnerability. Several consultants used the DRASTIC multi-criteria method [1], but modified it in different ways by changing weights and adding new, mainly geochemically oriented, criteria. One consultant based his approach on advanced hydrological modelling of both groundwater and surface water systems using the MIKE
SHE code [40], while two other consultants used simpler groundwater modelling approaches. Thus, the five consultants had different perceptions of what causes groundwater pollution and used models with different processes and causal relationships to describe the possibility of groundwater pollution in the area. In addition, their different interpretations and interpolations made from common field data resulted in significantly different figures for e.g. areal means of precipitation and evapotranspiration and the thickness of various geological layers [37]. The conclusions of the five consultants regarding vulnerability to nitrate pollution are shown in Fig. 1. It is apparent that the five estimates differ substantially from each other. In the present case, no data exist to validate the model predictions, because the five models were used to make extrapolations. Thus, it is not possible, from existing field data, to tell which of the five model estimates are more reliable. The differences in prediction originate from two main sources: (i) data and parameter uncertainty and (ii) conceptual uncertainty. Although the data and parameter uncertainties were not explicitly assessed by any of the consultants (as is common in such studies), the substantial differences in model structures and the fact that the consultants all used the same raw data point to structural uncertainty as the main cause of difference between the five model results and as a major source of uncertainty in model predictions.
Fig. 1. Model predictions on aquifer vulnerability towards nitrate pollution for a 175 km2 area west of Copenhagen [11].
Usually a water manager bases their decisions on the conclusions from only one study. The uniqueness of the present study was that five consultants were asked to answer the same question on the basis of the same data. In this respect the differences between the five estimates are striking and clearly do not provide a sound basis for deciding anything about which areas should be protected. A worrying question, which is left unanswered, is whether the basis for decisions is similarly poor in the many other cases where only a single conceptual model has been adopted and where millions of DKK have subsequently been used to prepare and implement action plans.

1.3. Objective and outline of paper

The objective of this paper is to review possible strategies for dealing with model structure errors and to outline a framework for handling the effects of model structure errors on predictive uncertainty, with particular emphasis on situations where model predictions represent extrapolations to situations not covered by calibration data and are often outside the domain on which our knowledge on the dynamics of the system and our understanding of its causal relationships is based.

The paper is organised so that reviews of existing strategies and the discussion of their potentials and limitations are given in Section 2. A new framework is presented in Section 3 for analysing the uncertainties due to model structure errors when models are used for making extrapolations beyond their calibration base. Finally, the problems and perspectives of the new framework
are discussed in Section 4. The terminology used is defined in the Appendix.
2. Review of possible strategies

2.1. Classification

The existing strategies for assessing uncertainty due to incomplete or inadequate model structure may be grouped into the categories shown in Fig. 2. The most important distinction is whether data exist that make it possible to make inferences on the model structure uncertainty directly. This requires that data are available for the output variable of predictive interest and for conditions similar to those in the predictive situation. In other words it is a distinction between whether the model predictions can be considered as interpolations or extrapolations relative to the calibration situation.

The two main categories are thus equivalent to different situations with respect to model validation tests. According to Klemes' classical hierarchical test scheme [26,38], the interpolation case corresponds to situations where the traditional split-sample test is suitable, while the extrapolation case corresponds to situations where no data exist for the concerned output variable (proxy-basin test) or where the basin characteristics are considered non-stationary, e.g. for predictions of effects of climate change or effects of land use change (differential split-sample test).

In the review of existing strategies given below examples of studies have been selected to illustrate the classification and the common approaches. It is not an
Fig. 2. Classification of existing strategies for assessing conceptual model uncertainty.
exhaustive review, but illustrates the range of approaches available to diagnose structural uncertainty in models.

2.2. Data exist – interpolation

In this situation, calibration is usually carried out against a sample of the existing field data to ensure some kind of optimal parameter values, and then the model predictions are compared with the remaining (‘independent’) field data. The deviations between model predictions and independent field observations can be used to infer the model's conceptual error. Different methodologies can be used in this respect.

2.2.1. Increasing parameter uncertainty to account for structural uncertainty

One strategy is to increase the parameter uncertainty to a level where it is assumed to compensate for omitting model structure error from the analysis. Van Griensven and Meixner [45] provide an example of this. They assess the total predictive uncertainty without identifying or quantifying the underlying sources of uncertainty. They use the split-sample approach, assessing ranges of predictive uncertainty from analyses of predictions and data for a period different from the calibration period. Their total predictive uncertainty is assessed by increasing the model parameter uncertainty beyond the magnitudes estimated during calibration to a level where the resulting predictive uncertainty intervals bracket the observations. This technique does not introduce a separate stochastic term for the structural uncertainty, but represents the structural term in the parameter term. The model structure error is likely to influence the model simulations in non-random and temporally varying ways. Because it compensates for the model structure error by increasing the variance of a temporally constant random variable, the results from this approach can be questioned, particularly if used for predictions in situations where split-sample tests are not made.

2.2.2. Estimation of the structural uncertainty term

Other strategies attempt to estimate the structural contribution to uncertainty in the model predictions. An example of such an approach is given by Radwan et al. [35], who estimate the total predictive uncertainty from a statistical analysis of the residuals between model predictions and observations. Further, they analyse the propagated uncertainties from model input and parameter values. By subtracting these two uncertainties from the total predictive uncertainty they assign the remaining predictive uncertainty to be an effect of model structure uncertainty. It is then possible to add the model structure uncertainty when making other predictions. This approach assumes that the uncertainties from different sources are additive. This assumption is questionable,
because the combination of uncertainties is often non-linear due to interactions, correlations and dependencies between variables in a model. It also assumes that the differences between predictions and observations are caused by structural error and not by the poor specification of input and parameter uncertainty, nor by errors in the observations.

Vrugt et al. [53] present another stochastic approach based on a simultaneous parameter optimisation and data assimilation with an ensemble Kalman filter. By specifying values for measurement error and a so-called ‘stochastic forcing term’, representing structural uncertainty, they are able to estimate the dynamic behaviour of the model structure uncertainty. Both techniques assume a smooth contribution from structural uncertainty, but an important advantage of the latter is that parameter innovations (an output from the Kalman filter) may be used to diagnose non-stationarity in system structure.

2.3. No direct data – extrapolation

In cases where model structure errors cannot be assessed directly due to a lack of relevant data, the main strategy is to do the extrapolation with multiple conceptual models. Two supporting methods can be used here for the generation and qualification of each of the alternative models: expert elicitation and pedigree analysis (Fig. 2).

2.3.1. Multiple conceptual models

In the scenario approach a number of alternative conceptual models are considered. For each of these, the model input and parameter uncertainties may be analysed and the differences between model predictions are then seen as a measure of the model structure uncertainty. The idea of using alternative or competing candidate model structures was introduced in water quality modelling some time ago [5]. The issue typically dealt with here is whether models developed for current conditions can yield correct predictions when used under changed control. Van Straten and Keesman [50] note in this respect that good performance at the calibration stage does not guarantee correctly predicted behaviour, due to non-stationarity of the underlying processes in space or time.

The multiple modelling approach has also been used in flood forecasting. For example, Butts et al. [8] use 10 different model structures to evaluate structural uncertainty in flood predictions. They conclude that exploring an ensemble of model structures provides a useful approach in assessing simulation uncertainty.

In groundwater modelling different conceptual models are typically based on different geological interpretations [18,43,42,30,34]. Højberg and Refsgaard [21] present an example using three different conceptual
models, based on three alternative geological interpretations for a multi-aquifer system in Denmark. Each of the models was calibrated against piezometric head data using an inverse technique. The three models provided equally good and very similar predictions of groundwater heads, including well field capture zones. However, when using the models to extrapolate beyond the calibration data to predictions of flow pathways and travel times, the three models differed dramatically. When assessing the uncertainty contributed by the model parameter values, the overlap of uncertainty ranges between the three models significantly decreased when moving from groundwater heads to capture zones and travel times. They conclude that the larger the degree of extrapolation, the more the underlying conceptual model dominates over the parameter uncertainty and the effect of calibration.

The strategy of applying several alternative models based on codes with different model structures is also common in climate change modelling. In its description of uncertainty related to model predictions of both present and future climates the Intergovernmental Panel on Climate Change (IPCC) [22] bases its evaluation on scenarios of many (up to 35) different models. The same strategy is followed in the Dialogue model [52]. Dialogue is a so-called integrated assessment model (IAM) of climate change. It has been developed as an interactive decision-support tool for energy supply policy making. Dialogue simulates the cause–effect chain of climate change, using mono-disciplinary sub-models for each step in the chain. The chain starts with scenarios for economic growth, energy demand, fuel mix etc., leading to emissions of greenhouse gases, leading to changes in atmospheric composition, leading to radiative forcing of the climate, leading to climate change, leading to impacts of climate change on societies and ecosystems. Rather than selecting one mono-disciplinary sub-model for each step, as most other climate IAMs do, Dialogue uses multiple models for each step (for instance, three different carbon cycle models, simplified versions of five different global climate model outcomes, etc.), representing the major part of the spectrum of expert opinion in each discipline.

2.3.2. Expert elicitation

Expert elicitation can be used as a supporting method in uncertainty analysis. It is a structured process to elicit subjective judgements and ideas from experts. It is widely used in uncertainty assessment to quantify uncertainties in cases where there are no or too few direct empirical data available to infer uncertainty. Usually the subjective judgement is represented as a probability density function reflecting the experts' degree of belief. Expert elicitation aims to specify uncertainties in a structured and documented way, ensuring the account is both credible and traceable to its assumptions. Typically it is
applied in situations where there is scarce or insufficient empirical material for a direct quantification of uncertainty [20]. An example of the use of expert elicitation to estimate probabilities of alternative conceptual models is given by Meyer et al. [29]. They assessed probabilities as subjective values, from expert elicitation, reflecting a belief about the relative plausibility of each model based on its apparent consistency with available knowledge and data.

Expert elicitation can also be used to generate ideas about alternative causal structures (conceptual models) that govern the behaviour of a system. Techniques used in decision analysis include group model building [51] and the hexagon method [19], but these techniques usually aim to achieve consensus. From the point of view of model structure uncertainty, these elicitation techniques could perhaps be used to generate alternative conceptual models.

2.3.3. Pedigree analysis

Another supporting method is pedigree analysis. The idea comes from Funtowicz and Ravetz [17], who note that statistical uncertainty in terms of inexactness does not cover all relevant dimensions of uncertainty, including the methodological and epistemological dimensions. To promote a more differentiated insight into uncertainty they propose to extend good scientific practice with five qualifiers for quantitative scientific information: numeral, unit, spread, assessment, and pedigree (NUSAP). By adding expert judgement of reliability (assessment) and systematic multi-criteria evaluation of the processes by which numbers have been produced (pedigree), NUSAP has extended the statistical approach to uncertainty (inexactness) with the methodological (unreliability) and epistemological (ignorance) dimensions. By providing a separate qualification for each dimension of uncertainty, it enables flexibility in their expression. Each special sort of information has its own aspects that are key to its pedigree, so different pedigree matrices using different pedigree criteria can be used to qualify different sorts of information.

Early applications of pedigree analysis of environmental models have focussed on parameter pedigree, using proxy representation, empirical basis, methodological rigor, theoretical understanding and validation as pedigree criteria. Later on, pedigree analysis has been extended to assessment of model assumptions and problem framing [49,12].

2.4. Discussion of strengths/weaknesses and potentials/limitations

The strategies used in ‘interpolation’, i.e. for situations that are similar to the calibration situation with respect to variables of interest and conditions of the natural system, have the advantage that they can be based directly on field data. A fundamental weakness is that
field data are themselves uncertain. Nevertheless, in many cases, they can be expected to provide relatively accurate estimates of, at least, the total predictive uncertainty for the specific measured variable and for the same conditions as those in the calibration and validation situation. Some of the methods cannot differentiate how the total predictive uncertainty originates from model input, model parameter and model structure uncertainty. Other methods attempt to do so. However, this distinction is, as recognised by many authors, e.g. Vrugt et al. [53], problematic. In the case of uncalibrated models, the parameter uncertainty is very difficult to assess quantitatively, and wrong estimates of model parameter uncertainty will influence the estimates of model structure uncertainty. In the case of calibrated models, estimates of model parameter uncertainty can often be derived from autocalibration routines. An inadequate model structure will, however, be compensated by biased parameter values to optimise the model fit with field data during calibration. Hence, the uncertainty due to model structure will be underestimated in this case. A more serious limitation of the strategies depending on observed data is that they are only applicable for situations where the output variables of interest are measured (e.g. [35,45,53]). While relevant field data are often available for variables such as water levels and water flows, this is usually not the case for concentrations, or when predictions are desired for scenarios involving catchment change, such as land use change or climate change. Another serious limitation stems from an assumption that the underlying system does not undergo structural changes, such as changes in ecosystem processes due to climate change. The strategy that uses multiple conceptual models benefits from an explicit analysis of the effects of alternative model structures. Furthermore, it makes it possible to include expert knowledge on plausible model structures. This strategy is strongly advocated by Neuman and Wierenga [31] and Poeter and Anderson [34]. They characterise the traditional approach of relying on a single conceptual model as one in which plausible conceptual models are rejected (in this case by omission). They conclude that the bias and uncertainty that results from reliance on an inadequate conceptual model are typically much larger than those introduced through an inadequate choice of model parameter values. This view is consistent with Beven [7] who outlines a new philosophy for modelling of environmental systems. The basic aim of his approach is to extend traditional schemes with a more realistic account of uncertainty, rejecting the idea that a single optimal model exists for any given case. Instead, environmental models may be non-unique in their accuracy of both reproduction of observations and prediction (i.e. unidentifiable or equifinal), and subject to only a conditional confirmation, due
to e.g. errors in model structure, calibration of parameters and period of data used for evaluation.

A weakness of the multiple modelling strategy is the absence of quantitative information about the extent to which each model is plausible. Furthermore, it may be difficult to sample from the full range of plausible conceptual models. In this respect, expert knowledge, on which the formulations of multiple conceptual models are based, is an important and unavoidable subjective element. The level of subjectivity can be reduced if the scenarios are generated in a formalised and reproducible manner. For example, this is possible with the TPROGS procedure [9,10], by which alternative geological models can be generated stochastically. The subjectivity does not disappear with this approach. Rather, it is transferred from formulation of the geological model itself to assumptions on probability functions and correlation structures of the various geological units that are more easily constrained in practice.

The strategy of expert elicitation has the advantage that subjective expert knowledge can be included in the evaluation. It has the potential to make use of all available knowledge including knowledge that cannot be easily formalised otherwise. It can include views of sceptics, and reveals the level of expert disagreement on certain estimates. Expert elicitation also has several limitations. The fraction of experts holding a given view is not proportional to the probability of that view being correct. One may safely average estimates of model parameters, but if the experts' models were incommensurate, one cannot average models [25]. If differences in expert opinion are irresolvable, weighing and combining the individual estimates of distributions is impossible. In practice, the opinions are often weighted equally, although sometimes self-rating is used to obtain a weight-factor for the experts' competence. Finally, the results of expert elicitation tend to be sensitive to the selection of the experts whose estimates are gathered.

In a review of four different case studies in which pedigree analysis was applied, Van der Sluijs et al. [49] show that pedigree analysis broadens the scope of uncertainty assessment and stimulates scrutiny of underlying methods and assumptions. Craye et al. [12] reported similar experiences. It facilitates structured, creative thinking on conceivable sources of error and fosters an enhanced appreciation of the issue of quality in information. It thereby enables a more effective criticism of quantitative information by providers, clients, and also users of all sorts, expert and lay. It provides differentiated insight into what the weakest parts of a given knowledge base are. It is flexible in its use and can be used on different levels of comprehensiveness: from a ‘back of the envelope’ sketch based on self-elicitation to a comprehensive and sophisticated procedure involving structured informed in-depth group discussions, covering each pedigree criterion. The scoring of pedigree criteria is to a certain
degree subjective. Subjectivity can partly be remedied by the design of unambiguous pedigree matrices and by involving multiple experts in the scoring. The choice of experts to do the scoring is also a potential source of bias. The method is relatively new, with a limited (but growing) number of practitioners. There is as yet no settled guideline for good practice. We must keep in mind that it is not a panacea for the problem of unquantifiable uncertainty.
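As a schematic illustration of the multiple conceptual model strategy reviewed in Section 2.3.1, the sketch below propagates parameter uncertainty through three invented stand-in 'model structures' and separates the within-structure (parameter) variance from the between-structure (conceptual) variance; the functions and numbers are assumptions chosen only for illustration and do not represent real hydrological models:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-ins for alternative conceptual models of the same system: each maps an
# uncertain recharge parameter to a predicted travel time (years). In practice
# each entry would be a full model run based on a different geological interpretation.
structures = {
    "two_layer":   lambda r: 120.0 / r,
    "three_layer": lambda r: 95.0 / r + 10.0,
    "fracture":    lambda r: 40.0 / r + 90.0,
}

n = 5_000
recharge = rng.normal(loc=1.5, scale=0.2, size=n)           # within-structure parameter uncertainty

predictions = {name: f(recharge) for name, f in structures.items()}
means = np.array([p.mean() for p in predictions.values()])
within = np.mean([p.var() for p in predictions.values()])    # mean parameter-driven variance
between = means.var()                                         # variance between structure means

print("structure means:", dict(zip(predictions, np.round(means, 1))))
print(f"within-structure (parameter) variance:   {within:.1f}")
print(f"between-structure (conceptual) variance: {between:.1f}")
```

In a real application the anonymous functions would be replaced by full model runs for alternative geological interpretations, as in the example of Højberg and Refsgaard [21].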
3. New framework

We propose that conceptual uncertainty can be assessed by adopting a protocol based on the six elements shown in Fig. 3. The central aim is to establish a number of plausible conceptual models, with a range that adequately samples the space of possible conceptual models, to evaluate the tenability of each conceptual model and the overall range of models selected in relation to the perceived uncertainty on model structure and to propagate the uncertainties in each case.

STEP 1: Formulate a conceptual model. A conceptual model is established. Since we have defined a conceptual model as a combination of our qualitative process understanding and the simplifications acceptable for a particular modelling study, a conceptual model becomes highly site-specific and even case-specific. For example a conceptual model of an aquifer may be described as
Fig. 3. Protocol for assessing conceptual model uncertainty.
two-dimensional for a study focussing on regional groundwater heads, while it may need to include three-dimensional geological structures for detailed simulation of contaminant transport. Formulating a new conceptual model may involve changing or refining the model structure, e.g. by modifying the hydrogeological interpretations (in the case of groundwater models), dimensionality, temporal and spatial resolution, initial and boundary conditions and process descriptions (governing equations).

STEP 2: Set up and calibrate model. On the basis of the formulated conceptual model a site- and case-specific model is set up. Subsequently the model is calibrated and the model parameter uncertainty assessed. For the purposes of ‘interpolation’ (i.e. relevant observations are available), the parameter uncertainty can reasonably be constrained through calibration. However, for the case of ‘extrapolation’, the risk of calibrating model parameters for prediction of unobserved variables is that the model becomes biased for the unobserved variable.

STEP 3: Sufficient conceptual models? The first two steps are repeated until sufficient conceptual models are included. This judgement will be influenced by the practical constraints on including additional models and the desire to include additional conceptual models that are substantially different from those already included.

STEP 4: Perform validation tests (to the extent data availability allows). In order to evaluate how well the models describe the system in question, the performances of each of the models are tested by comparing model predictions with independent field data, i.e. data not used for calibration. This may be achieved by splitting the sample data into a calibration and validation set, or, alternatively, by cross-validation (e.g. bootstrapping: [15]) against ‘independent data’. The models whose predictive capability is deemed low are discarded and the reasons for these predictive failures are explored, where possible, for insight into the origins of structural uncertainty. In ‘extrapolation’ cases, data will usually not be available for validation tests and STEP 4 must be skipped. However, in some cases, it is possible to test ‘intermediate’ model results. For example a groundwater model aimed at prediction of concentration values can often be tested against groundwater head and discharge data, or sparse concentration data may be available for parts of the study area.

STEP 5: Evaluate tenability and completeness of conceptual models. The aim of this step is to analyse the retained models with respect to their predictive bias and uncertainty. This has two elements: (i) to evaluate the tenability of each conceptual model; and (ii) as far as possible, to evaluate the extent to which the retained models represent the space of plausible conceptual models. The tenability of the conceptual models is evaluated
through expert reviews. First, the strength of the tenability of each conceptual model is evaluated by using the pedigree matrix in Table 1. A structured procedure for the elicitation of pedigree scores is given by Van der Sluijs et al. [47]. Note that there is no need to arrive at a consensus pedigree score for each criterion: if experts disagree on the pedigree scores for a given model, this reflects further epistemological uncertainty surrounding that model. Next, the adequacy of the retained conceptual models to represent the range of plausible models is evaluated. This is an assessment of whether the space of the retained conceptual models is sufficient to encapsulate the relevant range of plausible conceptual models without becoming impractical. This has strong similarities to Dunn's concept of context validation [14]. Context validity refers to the validity of inferences that we have estimated the proximal range of rival hypotheses. Context validation can be performed by a bottom-up process to elicit from experts rival hypotheses on causal relations governing the dynamics of a system. One could argue that an infinite number of conceivable models might exist. However, it has been shown in projects where such elicitation processes were used, that the cumulative distribution of unique rival models flattens out after consultation of a limited number of experts, usually somewhere between 20 and 25 when chosen with diverse enough backgrounds [27].

STEP 6: Make model predictions and assess uncertainty. Together with model predictions of the desired variables, uncertainty assessments are carried out. This will typically include uncertainty in input data and parameter values in addition to the conceptual uncertainty. Furthermore, on the basis of the goodness of the conceptual models evaluated in STEP 5, the goodness of the assessed predictive uncertainty associated with the model structure should be evaluated.
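A simple bookkeeping of elicited pedigree scores can support STEP 5. In the sketch below the criteria follow Table 1 (reproduced after this passage), the scores for 'model_A' and 'model_B' loosely mirror the Model A/Model B example given later in Section 4.2.2, and the second expert's scores are invented to show how disagreement between experts can be retained rather than averaged away; the unweighted summary is only one possible convention, not a prescription of the framework:

```python
# Pedigree scores (0-4) per criterion of Table 1, elicited from two hypothetical
# experts for two alternative conceptual models. All scores are illustrative only.
criteria = ["quality_and_quantity", "proxy", "theoretical_understanding",
            "representation_of_mechanisms", "plausibility", "colleague_consensus"]

scores = {
    "model_A": {"expert_1": [4, 1, 1, 1, 2, 2], "expert_2": [3, 1, 2, 1, 2, 2]},
    "model_B": {"expert_1": [1, 4, 4, 4, 3, 3], "expert_2": [2, 4, 3, 4, 3, 3]},
}

for model, by_expert in scores.items():
    per_criterion = list(zip(*by_expert.values()))           # group scores criterion-wise
    spread = {c: (min(s), max(s)) for c, s in zip(criteria, per_criterion)}
    mean_score = sum(sum(s) for s in per_criterion) / (len(criteria) * len(by_expert))
    print(model, "mean pedigree score:", round(mean_score, 2))
    print("  per-criterion (min, max) across experts:", spread)
```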
Table 1
Pedigree matrix for evaluating the tenability of a conceptual model

Score 4. Supporting empirical evidence (quality and quantity): controlled experiments and large sample direct measurements. Proxy: exact measures of the modelled quantities. Theoretical understanding: well-established theory. Representation of understood underlying mechanisms: model equations reflect high mechanistic process detail. Plausibility: highly plausible. Colleague consensus: all but cranks.

Score 3. Supporting empirical evidence (quality and quantity): historical/field data, uncontrolled experiments, small sample direct measurements. Proxy: good fits or measures of the modelled quantities. Theoretical understanding: accepted theory with partial nature (in view of the phenomenon it describes). Representation of understood underlying mechanisms: model equations reflect acceptable mechanistic process detail. Plausibility: reasonably plausible. Colleague consensus: all but rebels.

Score 2. Supporting empirical evidence (quality and quantity): modelled/derived data, indirect measurements. Proxy: well correlated but not measuring the same thing. Theoretical understanding: accepted theory with partial nature and limited consensus on reliability. Representation of understood underlying mechanisms: aggregated parameterised meta model. Plausibility: somewhat plausible. Colleague consensus: competing schools.

Score 1. Supporting empirical evidence (quality and quantity): educated guesses, indirect approx., rule of thumb estimate. Proxy: weak correlation but commonalities in measure. Theoretical understanding: preliminary theory. Representation of understood underlying mechanisms: grey box model. Plausibility: not very plausible. Colleague consensus: embryonic field.

Score 0. Supporting empirical evidence (quality and quantity): crude speculation. Proxy: not correlated and not clearly related. Theoretical understanding: crude speculation. Representation of understood underlying mechanisms: black box model. Plausibility: not at all plausible. Colleague consensus: no opinion.

4. Discussion and conclusions
4.1. Methodologies to assess conceptual uncertainty

As discussed above, the existing strategies fall into two main categories, each with limitations. The strategies where model structure errors are assessed from observed data are confined to interpolation cases, understood as cases where the model can be calibrated and validated against field data for the variables of predictive interest and where the natural system does not undergo structural change. The strategies used for situations involving extrapolation depend either on multiple conceptual models (preferred) or on expert elicitation or pedigree analysis for a single conceptual model (usually less preferred).

The novelty of our proposed framework is the combination of multiple conceptual models and the pedigree
approach for assessing the overall tenability of these models in one formalised protocol. Some of our proposed steps are similar to other approaches for dealing with equifinality, multiple possible models and the rejection of non-behavioural model [6,31]. Other steps are based on qualitative approaches, including expert knowledge in a structured manner [20,49]. The aim of our new framework is not to identify the ‘‘true’’ model structure or the cause of the errors in the existing model structure. Instead, we propose an approach that integrates different types of knowledge, not previously combined, such as quantitative and qualitative uncertainty, to estimate the impact of model structure uncertainty on model predictions. The GLUE approach (generalised likelihood uncertainty estimation, [6,7]) also operates with a range of alternative models. Although almost all applications of GLUE reported so far operate with only one model structure and many alternative model parameter sets, it is possible to use GLUE with alternative model structures [24]. In addition to prescribing multiple conceptual models, an important difference between our proposed approach and GLUE is that we recommend parameter optimisation is conducted as part of the calibration in order to take full advantage of the information in field data. There are different opinions about whether calibration by parameter optimisation is advisable or not. The main advantage of calibration is that it improves the ability of the model to reproduce hydrological behaviour of a system within the limits of observed behaviour [31]. An important by-product is that it provides useful information about the uncertainty of model parameters. The disadvantage is that parameter optimisation may result in biased parameter values to compensate for errors in model structure and that many parameter sets (i.e. many models) perform more or less equally well but provide different results. In implementing our framework, model calibration might be skipped and many models with different parameter sets retained, as in the GLUE approach. The reason we are not advocating such an approach is partly for pragmatic reasons (very large computational requirements) and partly that we aim to focus on model structure uncertainty rather than parameter uncertainty. Although intended for use in a very different context, the central aim behind our proposed protocol is similar to the approach of IPCC [22], who assign a level of confidence to their assessment of climate change by evaluating predictions from multiple models. The level of confidence placed in a particular finding reflects both the degree of consensus amongst modellers and the quantity of evidence that is available to support the finding. IPCC [22] classifies the confidence qualitatively in three levels: (i) ‘well established’, (ii) ‘evolving’ and (iii) ‘speculative’.
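A minimal sketch of how a GLUE-type analysis might be extended across alternative model structures, as suggested above; the synthetic observations, the two toy model structures, the Nash-Sutcliffe-based likelihood measure and the 0.3 behavioural threshold are all assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Synthetic 'observations' and two toy recession-curve structures (stand-ins for real codes).
t = np.arange(50)
obs = 10.0 + 5.0 * np.exp(-t / 12.0) + rng.normal(0, 0.3, t.size)

structures = {
    "linear_store":  lambda k: 10.0 + 5.0 * np.exp(-t / k),
    "two_store_mix": lambda k: 10.0 + 3.0 * np.exp(-t / k) + 2.0 * np.exp(-t / (3.0 * k)),
}

def nash_sutcliffe(sim, obs):
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

behavioural = []                                   # (weight, simulation) pairs across ALL structures
for name, model in structures.items():
    for k in rng.uniform(5.0, 25.0, size=2000):    # Monte Carlo sampling of the parameter
        sim = model(k)
        ns = nash_sutcliffe(sim, obs)
        if ns > 0.3:                               # behavioural threshold (assumed)
            behavioural.append((ns, sim))

weights = np.array([w for w, _ in behavioural])
weights /= weights.sum()
sims = np.array([s for _, s in behavioural])
ensemble_mean = weights @ sims                     # likelihood-weighted prediction
print(f"{len(behavioural)} behavioural simulations retained across both structures")
```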
4.2. Critical issues for implementing the new protocol

4.2.1. Performance criteria – threshold for accepting/rejecting models

A critical issue in relation to acceptance/rejection of models (STEP 4 above) is how to define performance criteria. We agree with Beven [7] that any conceptual model is (known to be) wrong in an absolute sense, and hence that any model will be rejected if we investigate it in sufficient detail and specify very high performance criteria. On the other hand, the whole point in modelling is to simplify. A good reference for model performance is to compare it with uncertainties of the available field observations. If the model performance is within this uncertainty range we may characterise the model as good enough. However, usually it is less straightforward. For example, how wide should the confidence bands be before we reject models or accept them within observational uncertainties – ranges corresponding to 65%, 95% or 99%? Indeed, the differences between 95% and 99% may be significant in practical terms. Do we always then reject a model if it cannot perform within the observational uncertainty range? How reasonable are our estimates of uncertainty in observations? In many cases, even the results from less accurate models may be very useful. Another reference for what is acceptable accuracy is the use of a benchmark model as discussed by e.g. Seibert [41]. The difficulty is then transferred to selecting an appropriate benchmark.

Our answer is that the decision on performance criteria must, in general, be taken in a socio-economic context, for which predictive uncertainties must be clearly explained and open to interpretation beyond small groups of scientists. Thus, we believe that the accuracy criteria cannot be decided universally by modellers or researchers, but must be different from case to case depending on the nature of a decision and the risks involved.

4.2.2. Qualitative assessment of tenability of conceptual models

Pedigree analysis structures the critical appraisal of alternative model structures and provides insight into the state of knowledge on which each of the conceivable model structures is based. However, it does not give an indication of the relative quality of the various model structures. With reference to Table 1, the pedigree analysis for a simple statistical model (A) and a complex mechanistic model (B) could, for example, result in statements like:

• Model A is weakly correlated to the predicted variable (Proxy, score 1), based on a large sample of direct measurements (Quality and quantity, score
4), built on a preliminary theory and a black box model (Theoretical understanding, score 1; Representation of mechanisms, score 1), somewhat plausible (Plausibility, score 2) and controversial among colleagues (Colleague consensus, score 2);
• Model B exactly addresses the desired predictive variable (Proxy, score 4), is based on data with rule of thumb estimates (Quality and quantity, score 1), built on a well-established theory with model equations reflecting high process details (Theoretical understanding, score 4; Representation of mechanisms, score 4), reasonably plausible and accepted by all colleagues except rebels (Plausibility and Colleague consensus, score 3).

Such statements cannot be integrated in a quantitative uncertainty analysis in terms of probabilities, but they should be available as the best possible scientifically based characterisation of uncertainties and as such be made available to those involved in the decision making process. Furthermore, as the selected conceptual models can never cover all possibilities, but instead cover a limited range, it is important to emphasise that the overall uncertainty of model predictions cannot be assessed in an absolute sense, only in a conditional or relative sense [7,31]. Our suggested method does not alter this fundamentally. However, we believe that the outcome of the proposed formalised review is a qualitative assessment that is more useful in a decision making context than unstructured information, or verbose information from scientific outlets that is not always available to the decision maker. The challenge is to design environmental management strategies that are robust against the uncertainties identified. Inclusion of a wider range of conceivable model structures may help to anticipate surprises that would have been overlooked otherwise.

4.2.3. Different degrees of extrapolation

Our proposed framework deals with situations where predictions involve extrapolations beyond available field data. However, there are different degrees of extrapolation (Fig. 2). If we look at the situation where a three-dimensional groundwater model is calibrated against groundwater head and discharge data, model predictions of groundwater recharge to a given layer are a smaller extrapolation than model predictions of groundwater age or contaminant concentration. In both situations, model predictions are carried out for variables that have not been used as calibration targets and for which no traditional split-sample validation tests are possible. The type of validation test recommended for such a situation is a proxy-basin test, which according to the principles in Klemes [26] and Refsgaard [38], for instance, could imply that validation tests have to be conducted in two similar catchments where relevant data (e.g.
concentrations) exist, and where such data are not used for calibration. The residuals in the other catchments can then be seen as a measure of the uncertainty to be expected in the catchment of interest.
If model predictions are made for groundwater heads in cases involving groundwater abstraction, and the existing data available for calibration and validation tests do not include such abstraction, we also have an extrapolation case, although of a different nature. In this case we have data for the variable of predictive interest, but the catchment characteristics are non-stationary. This corresponds to the situation of model validation denoted by a differential split-sample test [26,38]. The differential split-sample test scheme recommended by Klemes also operates by tests on similar catchments where data for the type of non-stationary situation exist. Differential split-sample tests are often less demanding than proxy-basin tests [36]. A similar type of differential split-sample situation arises when predictions are required for a system in which structural change is expected (e.g. [50,4]). In cases where the conceptual models can be transferred to other catchments in a reliable and reproducible way, such proxy-basin and differential split-sample tests could be conducted and the results used to evaluate the goodness of the underlying conceptual models. It is worth noting that Klemes’ test schemes, which also apply to cases of extrapolation, operate with tests for two alternative catchments. This has clear similarities with our strategy of recommending the use of multiple conceptual models.

4.3. Perspectives
In many cases where environmental models are used to make predictions that are extrapolations beyond the calibration base, no suitable framework exists for assessing the effects of model structure error. The proposed framework is composed of elements originating from different scientific disciplines. The elements are well tested individually, but have not previously been applied in such an integrated manner for water resources or environmental modelling applications. The full framework still needs to be tested in real-life cases.
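As a purely illustrative sketch, and not part of the framework proposed in this paper, the acceptance/rejection question raised in Section 4.2.1 can be phrased as a simple numerical check: does a sufficiently large fraction of the simulated values fall within an assumed observational uncertainty band? All numbers below are hypothetical placeholders; the 10% observation error, the 95% band width and the 90% acceptance threshold would in practice have to be set in the socio-economic context of the decision at hand.

```python
import numpy as np

def fraction_within_band(simulated, observed, obs_std, z=1.96):
    """Fraction of time steps where the simulation lies inside the
    observational uncertainty band observed +/- z * obs_std."""
    lower = observed - z * obs_std
    upper = observed + z * obs_std
    return np.mean((simulated >= lower) & (simulated <= upper))

# Hypothetical daily discharges (m3/s); a 10% standard error on the
# observations is assumed purely for illustration.
observed = np.array([12.1, 15.3, 30.2, 22.8, 18.4, 14.9])
simulated = np.array([13.0, 14.1, 26.5, 24.0, 19.2, 16.8])
obs_std = 0.10 * observed

score = fraction_within_band(simulated, observed, obs_std, z=1.96)
threshold = 0.90  # case-specific acceptance level, to be negotiated with end users
print(f"{score:.0%} of simulated values lie inside the 95% observation band")
print("model accepted" if score >= threshold else "model rejected or revised")
```

The same residual-based reasoning carries over to the proxy-basin and differential split-sample tests discussed in Section 4.2.3, where residuals obtained in an independent but similar catchment serve as an estimate of the uncertainty to be expected in the catchment of interest.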
Acknowledgement
For the three authors from GEUS and UVA, the present work was supported by the Project ‘Harmonised Techniques and Representative River Basin Data for Assessment and Use of Uncertainty Information in Integrated Water Management’ (www.harmonirib.com), which is partly funded by the EC Energy, Environment and Sustainable Development programme (Contract EVK1-2002-00109). The constructive comments of
Hoshin V. Gupta and two anonymous reviewers are acknowledged.
Appendix. Terminology
The terminology used is mainly based on Refsgaard and Henriksen [39]:
Reality: The system that we aim to represent with the model, understood here as the study area.
Conceptual model: A representation of ‘reality’ in terms of verbal descriptions, equations, governing relationships or ‘natural laws’ that purport to describe reality. This is the user’s perception of the key hydrological and ecological processes in the study area (perceptual model) and the corresponding simplifications and numerical accuracy limits that are assumed acceptable in order to achieve the purpose of the modelling. A conceptual model therefore includes a mathematical description (equations) of assumed processes and a description of the objects they interact with, including river system elements, ecological structures, geological features, etc. that are required for the particular purpose of modelling.
Model code: A generic mathematical description of a conceptual model, implemented in a computer program. It is generic in the sense that, without program changes, it can be used to establish a model with the same basic type of equations (but allowing different input variables and parameter values) for a different study area.
Model: A case-specific tailored version of a model code established for a particular study area and set of modelling objectives (output variables), including specific input data and parameter values.
Model confirmation: Determination of the adequacy of the conceptual model to provide an acceptable performance for the domain of intended application.
Code verification: Substantiation that a model code adequately represents a conceptual model within certain specified limits or ranges of application and corresponding ranges of accuracy.
Model calibration: The procedure of adjusting the parameter values of a model in such a way that the model reproduces an observed response of the system represented in the model within the range of accuracy specified in the performance criteria.
Model validation: Substantiation that a model, within its domain of applicability, possesses a satisfactory range of accuracy, consistent with the intended application of the model. Note that various authors have criticised the use of the word validation for predictive models because universal validation of a model is in principle impossible, and therefore prefer to use the term model evaluation [32,3]. In our definition [39] the term validation is not used in a universal sense, but is always restricted to clearly defined domains of applicability and performance accuracy (‘numerical universal’ in Popperian sense).
Pedigree: Pedigree conveys an evaluative account of the production process of information, and indicates different aspects of the underpinning and scientific status of the knowledge used. Pedigree is expressed by means of a set of pedigree criteria to assess these different aspects. Criteria for model parameter pedigree are for instance proxy representation, empirical basis, methodological rigor, theoretical understanding and validation. Assessment of pedigree involves qualitative expert judgement. To minimise arbitrariness and subjectivity in measuring strength, a pedigree matrix is used to code qualitative expert judgements for each criterion into a discrete numeral scale from 0 (weak) to 4 (strong), with linguistic descriptions (modes) of each level on the scale [49].
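As a purely illustrative aid, and not part of the formal NUSAP procedure, the pedigree scores assigned to the hypothetical models A and B in Section 4.2.2 can be tabulated as in the following sketch. The summary statistics (mean score and weakest criterion) are only one possible way of condensing a pedigree matrix; the scores remain qualitative expert judgements and cannot be fed into a probabilistic uncertainty analysis.

```python
# Pedigree criteria coded 0 (weak) to 4 (strong); values reproduce the
# hypothetical scores for the statistical model A and mechanistic model B
# discussed in Section 4.2.2.
pedigree = {
    "Model A (statistical)": {
        "Proxy": 1, "Quality and quantity": 4, "Theoretical understanding": 1,
        "Representation of mechanisms": 1, "Plausibility": 2, "Colleague consensus": 2,
    },
    "Model B (mechanistic)": {
        "Proxy": 4, "Quality and quantity": 1, "Theoretical understanding": 4,
        "Representation of mechanisms": 4, "Plausibility": 3, "Colleague consensus": 3,
    },
}

for model, scores in pedigree.items():
    mean_score = sum(scores.values()) / len(scores)
    weakest = min(scores, key=scores.get)
    print(f"{model}: mean pedigree score {mean_score:.2f} of 4, "
          f"weakest criterion: {weakest} (score {scores[weakest]})")
```

Keeping the weakest criterion visible, rather than reporting only an average, avoids hiding the trade-off between empirical basis and theoretical underpinning that distinguishes the two model types.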
References
[1] Aller LT, Bennet T, Lehr JH, Petty RJ. DRASTIC: a standardized system for evaluating ground water pollution potential using hydrogeologic setting, US EPA Robert S. Kerr Environmental Research Laboratory, EPA/600/287/035, Ada, OK, 1987.
[2] Babendreier JE. National-scale multimedia risk assessment for hazardous waste disposal. In: International workshop on uncertainty, sensitivity and parameter estimation for multimedia environmental modelling held at US Nuclear Regulatory Commission, Rockville (MD), August 19–21, 2003. Proceedings, pp. 103–9.
[3] Beck MB. Model evaluation and performance. In: El-Shaarawi AH, Piegorsch WW, editors. Encyclopedia of environmetrics, vol. 3. Chichester: John Wiley & Sons, Ltd; 2002. p. 1275–9.
[4] Beck MB. Environmental foresight and structural change. Environ Modell Software 2005;20:651–70.
[5] Beck MB, van Straten G, editors. Uncertainty and forecasting of water quality. Springer-Verlag; 1983.
[6] Beven K, Binley AM. The future of distributed models, model calibration and uncertainty predictions. Hydrol Process 1992;6:279–98.
[7] Beven K. Towards a coherent philosophy for modelling the environment. Proc Roy Soc London, A 2002;458(2026):2465–84.
[8] Butts MB, Payne JT, Kristensen M, Madsen H. An evaluation of the impact of model structure on hydrological modelling uncertainty for streamflow prediction. J Hydrol 2004;298:242–66.
[9] Carle SF, Fogg GE. Transition probability-based indicator geostatistics. Math Geol 1996;28(4):453–77.
[10] Carle SF, Fogg GE. Modeling spatial variability with one- and multidimensional continuous-lag Markov chains. Math Geol 1997;29(7):891–917.
[11] Copenhagen County. Pilot project on establishment of methodology for zonation of groundwater vulnerability. In: Proceedings from seminar on groundwater zonation, November 7, 2000, County of Copenhagen [in Danish].
[12] Craye M, van der Sluijs JP, Funtowicz S. A reflexive approach to dealing with uncertainties in environmental health risk science and policy. Int J Risk Assess Manage 2005;5(2):216–36.
[13] Dubus IG, Brown CD, Beulke S. Sources of uncertainty in pesticide fate modelling. Sci Total Environ 2003;317:53–72.
[14] Dunn W. Using the method of context validation to mitigate type III errors in environmental policy analysis. In: Hisschemöller M,
Hoppe HV, Dunn W, Ravetz J, editors. Knowledge, power and participation in environmental policy. Policy studies review annual, vol. 12. New Jersey (USA): Transaction Publishers. p. 417–36.
[15] Efron B, Tibshirani RJ. An introduction to the bootstrap. Monographs on statistics and applied probability. New York: Chapman and Hall; 1993.
[16] Franchini M, Pacciani M. Comparative analysis of several conceptual rainfall-runoff models. J Hydrol 1992;122:161–219.
[17] Funtowicz SO, Ravetz JR. Uncertainty and quality in science for policy. Dordrecht: Kluwer; 1990. p. 229.
[18] Harrar WG, Sonnenborg TO, Henriksen HJ. Capture zone, travel time and solute transport predictions using inverse modelling and different geological models. Hydrogeol J 2003;11(5):536–48.
[19] Hodgson AM. Hexagons for systems thinking. Eur J Oper Res 1992;59:220–30.
[20] Hora SC. Acquisition of expert judgement: examples from risk assessment. J Energy Eng 1992;118:136–48.
[21] Højberg AL, Refsgaard JC. Model uncertainty – parameter uncertainty versus conceptual models. Water Sci Technol 2005;52(6):177–86.
[22] IPCC. Climate change 2001: the scientific basis. Contribution of working group I to the third assessment report of the intergovernmental panel on climate change [Houghton JT, Ding Y, Griggs DJ, Noguer M, van der Linden PJ, Dai X, Maskell K, Johnson CA, editors]. Cambridge University Press, Cambridge (UK) and New York (NY, USA). p. 881.
[23] Jakeman AJ, Letcher RA. Integrated assessment and modelling: features, principles and examples for catchment management. Environ Modell Software 2003;18:491–501.
[24] Jensen JB. Parameter and uncertainty estimation in groundwater modelling. PhD thesis, Department of Civil Engineering, Aalborg University, Series Paper no. 23, 2003.
[25] Keith DW. When is it appropriate to combine expert judgements? Clim Change 1996;33:139–43.
[26] Klemes V. Operational testing of hydrological simulation models. Hydrol Sci J 1986;31:13–24.
[27] Kloprogge P, van der Sluijs JP. The inclusion of stakeholder knowledge and perspectives in integrated assessment of climate change. Climatic Change, in press.
[28] Linkov I, Burmistrov D. Model uncertainty and choices made by modelers: lessons learned from the international atomic energy model intercomparisons. Risk Anal 2003;23(6):1297–308.
[29] Meyer PD, Ye M, Neuman SP, Cantrell KJ. Combined estimation of hydrogeologic conceptual model and parameter uncertainty. NUREG/CR-6843 Report, NRC, Washington, DC, 2004.
[30] National Research Council. Conceptual models of flow and transport in the vadose zone. Washington, DC: National Academy Press; 2001.
[31] Neuman SP, Wierenga PJ. A comprehensive strategy of hydrogeologic modeling and uncertainty analysis for nuclear facilities and sites. University of Arizona, Report NUREG/CR-6805, 2003.
[32] Oreskes N, Shrader-Frechette K, Belitz K. Verification, validation, and confirmation of numerical models in the Earth Sciences. Science 1994;263:641–6.
[33] Pahl-Wostl C. Towards sustainability in the water sector – the importance of human actors and processes of social learning. Aquat Sci 2002;64:394–411.
[34] Poeter E, Anderson D. Multiple ranking and inference in ground water modeling. Ground Water 2005;43(4):597–605.
[35] Radwan M, Willems P, Berlamont J. Sensitivity and uncertainty analysis for river quality modelling. J Hydroinform 2004:83–99.
[36] Refsgaard JC, Knudsen J. Operational validation and intercomparison of different types of hydrological models. Water Resources Res 1996;32(7):2189–202.
[37] Refsgaard JC, Hansen LK, Vahman M. Groundwater zonation in Copenhagen County – Intercomparison of thematic results from different consultants. In: Seminar on groundwater zonation, County of Copenhagen, November 7, 2000 [in Danish].
[38] Refsgaard JC. Towards a formal approach to calibration and validation of models using spatial data. In: Grayson R, Blöschl G, editors. Spatial patterns in catchment hydrology: observations and modelling. Cambridge University Press; 2001. p. 329–54.
[39] Refsgaard JC, Henriksen HJ. Modelling guidelines – terminology and guiding principles. Adv Water Resources 2004;27:71–82.
[40] Refsgaard JC, Storm B. MIKE SHE. In: Singh VP, editor. Computer models of watershed hydrology. Water Resources Publication; 1995. p. 809–46.
[41] Seibert J. On the need for benchmarks in hydrological modelling. Hydrol Process 2001;15(6):1063–4.
[42] Selroos JO, Walker DD, Strom A, Gylling B, Follin S. Comparison of alternative modelling approaches for groundwater flow in fractured rock. J Hydrol 2001;257:174–88.
[43] Troldborg L. The influence of conceptual geological models on the simulation of flow and transport in quaternary aquifer systems. PhD Thesis. Geological Survey of Denmark and Greenland, Report 2004/107.
[44] Usunoff E, Carrera J, Mousavi SF. An approach to the design of experiments for discriminating among alternative conceptual models. Adv Water Resources 1992;15:199–214.
[45] Van Griensven A, Meixner T. Dealing with unidentifiable sources of uncertainty within environmental models. In: Pahl C, Schmidt S, Jakeman T, editors. iEMSs 2004 international congress: "Complexity and integrated resources management". International Environmental Modelling and Software Society, Osnabrück, Germany, June 2004.
[46] Van der Sluijs JP. Anchoring amid uncertainty; On the management of uncertainties in risk assessment of anthropogenic climate change. PhD thesis, Utrecht University, 1997. p. 260.
[47] Van der Sluijs JP, Potting J, Risbey JS, Van Vuuren D, de Vries B, Beusen A, et al. Uncertainty assessment of the IMAGE/TIMER B1 CO2 emissions scenario, using the NUSAP method. Report commissioned by the Netherlands National Research Program on global Air Pollution and Climate Change, RIVM, Bilthoven, The Netherlands, 2002. p. 225.
[48] Van der Sluijs JP, Risbey JS, Kloprogge P, Ravetz JR, Funtowicz SO, Corral Quintana S, et al. RIVM/MNP Guidance for uncertainty assessment and communication: detailed guidance, report commissioned by RIVM/MNP – Copernicus Institute, Department of Science, Technology and Society, Utrecht University, Utrecht, The Netherlands, 2003. p. 71.
[49] Van der Sluijs JP, Craye M, Funtowicz SO, Kloprogge P, Ravetz J, Risbey JS. Combining quantitative and qualitative measures of uncertainty in model based foresight studies: the NUSAP system. Risk Anal 2005;25(2):481–92.
[50] Van Straten G, Keesman KJ. Uncertainty propagation and speculation in projective forecasts of environmental change: a lake-eutrophication example. J Forecast 1991;10:163–90.
[51] Vennix JAM. Group model-building: tackling messy problems. Syst Dyn Rev 1999;15(4).
[52] Visser H, Folkert RJM, Hoekstra J, De Wolff JJ. Identifying key sources of uncertainty in climate change projections. Clim Change 2000;45:421–57.
[53] Vrugt JA, Diks CGH, Gupta HV. Improved treatment of uncertainty in hydrologic modelling: combining the strengths of global optimization and data assimilation. Water Resources Res 2005;41(1). Art No W01017.
[54] Walker WE, Harremoës P, Rotmans J, Van der Sluijs JP, Van Asselt MBA, Janssen P, et al. Defining uncertainty. A conceptual basis for uncertainty management in model-based decision support. Integr Assessment 2003;4(1):5–17.