Model Validation Kit – status and outlook

H.R. Olesen
National Environmental Research Institute (NERI), P.O. Box 358, DK-4000 Roskilde, Denmark
E-mail: [email protected]
Abstract: Over the past few years, the so-called Model Validation Kit has been the basis for much work on model evaluation. The kit has recently been enhanced with a supplement containing experimental data from Indianapolis. A change of methodology is under consideration, based on the concept of near-centreline concentrations. The paper examines some consequences of such a change in methodology.

Keywords: atmospheric dispersion models, model evaluation, Model Validation Kit, near-centreline concentrations.

1 Introduction

The present conference is the fifth in a series of meetings organised since 1992 by the initiative on “Harmonisation within Atmospheric Dispersion Modelling for Regulatory Purposes”. At these meetings, model evaluation has been a key issue, and work has been conducted to start establishing a toolbox of recommended methods for model evaluation. The basis for much of this work has been the so-called Model Validation Kit (Olesen, 1995a). Over the past few years, the kit has been distributed to approximately 140 research groups.

Since the previous harmonisation workshop in Oostende in 1996, a supplement to the Model Validation Kit has become available. The supplement comprises data from an experiment in Indianapolis, USA, and some related software tools. One section of the present paper introduces this supplement.

The Model Validation Kit has been used for a large number of studies reported at the previous workshops (see IJEP, 1995; 1997). It will also play a role in connection with a new Model Documentation System which has become publicly available under the auspices of the European Topic Centre on Air Quality of the European Environment Agency (Moussiopoulos, 1998). This Model Documentation System is a catalogue of models available through the Internet (http://www.etcaq.rivm.nl). The system as such does not contain detailed information on model performance, but model quality is an issue of obvious interest to its users. Therefore, modellers are encouraged to conduct model evaluation exercises according to standard methodologies – to the extent that standard methods exist. The Model Validation Kit provides one such de facto standard methodology and can be recommended for use for the time being. However, the kit has limitations, and model evaluation results should be used with prudence, as pointed out in a series of previous papers (Olesen, 1994; 1995b; 1996).

Looking to the future, there may be changes to the methodology of the Model Validation Kit. Such potential changes are a topic for discussion in the present paper. Their consequences are presently being explored in the model evaluation community, but until consensus has been reached on a new set of well-tested standard tools, the existing Model Validation Kit provides the best available common frame of reference for evaluation of short-range models.

The bulk of this paper is devoted to one particular issue which may eventually have great implications. According to the methodology used in the present version of the kit, observed arcwise maxima are compared directly to modelled centreline values. In a draft standard from ASTM (the American Society for Testing and Materials), an alternative methodology has been proposed which uses the concept of “near-centreline concentrations”. The implications of using either of the two methodologies are discussed below.
2 Supplement to the Model Validation Kit

The original Model Validation Kit comprises three experimental data sets (Kincaid, Copenhagen and Lillestrøm) and software. The software includes a program for statistical model evaluation, a program for analysis of residuals, and a simple plotting package well suited for presenting the results from these programs. The software was originally developed by Hanna et al. (1991).
[Figure 1: map of measured concentrations; axes East-West (km) and North-South (km); the source and monitoring arcs at 1, 2, 3, 5, 7 and 10 km are marked.]
Figure 1 Geographical distribution of measured concentrations at Kincaid, 22 May 1981, 10-11 hours. Values are in ppt, and the arcwise maxima are enclosed in circles.
The supplement to the Model Validation Kit, which became generally available in 1997, includes a data set from Indianapolis (USA) as well as some software tools. Among these tools are utilities specifically for handling the Indianapolis data; they are intended to make it relatively easy for a modeller to combine modelled results with the observed tracer concentration data from Indianapolis. Additionally, the software tools include an enhanced version of the SIGPLOT plotting software, which can also be used in conjunction with the original Model Validation Kit.

During the Indianapolis experiment, SF6 tracer was released from an 84 m power plant stack in the town of Indianapolis, USA. 170 hours of tracer data are available from monitoring arcs at distances ranging from 0.25 to 12 km from the source. The data set will not be described here in any detail; we refer to the descriptions by Murray and Bowne (1988), by Olesen (1997a), and on the Internet. The Indianapolis data set is an interesting complement to the other data sets of the Model Validation Kit because it represents urban conditions.

The Model Validation Kit and its supplement are available free of charge from the author. Information on the kit can be found on the Internet through the home page of the initiative on Harmonisation within Atmospheric Dispersion Modelling for Regulatory Purposes (http://www.dmu.dk/AtmosphericEnvironment/harmoni.htm).

3 The near-centreline methodology

3.1 Introduction

When evaluating dispersion models, various concentration variables can be considered of interest, for instance cross-wind integrated concentrations and arcwise maximum concentrations. Both of these variables are considered in the Model Validation Kit (although in the current version of the kit, cross-wind integrated concentrations are not included for the Kincaid case). There is a possible alternative to the use of observed arcwise maxima, which will be discussed here at some length. It uses the concept of near-centreline concentrations. An evaluation methodology using this concept has been proposed by J. Irwin in the context of the American Society for Testing and Materials (ASTM, 1997).
[Figure 2: concentration (ppt) versus azimuth relative to centre of mass (degrees).]
Fig. 2. Kincaid data. Concentrations along the arc 7 km downwind, 10-11 hours May 22, 1981.
This methodology is also being considered in the context of an ISO ad hoc work group on model evaluation¹. There is an overlap between the persons involved in the ASTM and ISO work as well as in the “Harmonisation...” workshops, so results obtained in one of these frameworks are likely to propagate to the others.

The essence of the ASTM methodology is that we should consider not only one observed value (the maximum) per arc, but several “near-centreline” concentrations. A model’s ability to reproduce the near-centreline concentrations should be evaluated by first classifying the experimental data into regimes with similar physical properties, and then assessing statistical performance measures within each regime. In particular, the fractional bias (FB) is considered. Finally, a composite performance measure over many regimes can be constructed. (A small illustrative sketch of this regime-wise computation is given below.)

The remaining part of this paper is devoted to the question: what are the consequences if we change methodology – from a methodology focusing on maximum arcwise concentrations (MACs) to a methodology focusing on near-centreline concentrations (NCCs)? The complete methodology involves a wide range of issues, but we will restrict ourselves here to the most basic questions. The present paper can be regarded as one among many building blocks in a foundation for a decision on recommended model evaluation methodologies.

3.2 Why consider a change?

The background for considering a change in methodology compared to the Model Validation Kit lies in the fact that atmospheric dispersion processes are stochastic. Models can be expected only to predict ensemble averages – not the results of specific realisations. The Model Validation Kit in its present form does not explicitly address this issue; it does, however, have the advantage of being straightforward and practically oriented. On the other hand, a methodology building on the concept of near-centreline concentrations is better suited to handle the question of ensemble averages. This comes at a price in terms of increased complexity. Also, the definition of a "perfect model" changes with a change of methodology.

In the following subsections, the procedure for defining near-centreline concentrations and the implications of using them will be explored and contrasted with the procedures of the existing Model Validation Kit.

3.3 Basic concepts – defining an ensemble average

An example of the layout of a dispersion experiment is shown as Figure 1. It displays the geographical distribution of measured ground-level tracer concentrations during one particular hour of the Kincaid experiment. According to the methodology used in the Model Validation Kit, arcs of monitors are considered, where all monitors in an arc have approximately the same distance from the source. For each arc, the observed maximum is compared directly to the modelled value for the plume centreline at ground level. Take as an example the 7 km arc illustrated in Fig. 2: the maximum value (238 ppt) is compared to a modelled value for the plume centreline.
¹ ISO is the International Organization for Standardization; the work group is under TC 146/SC 5, the subcommittee on Meteorology under the Technical Committee on Air Quality.
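As a concrete illustration of the regime-wise bookkeeping described in Section 3.1, the following sketch computes the fractional bias for each regime and a simple composite measure. It is only a minimal sketch under stated assumptions, not the ASTM draft procedure or the Model Validation Kit software: the data layout, the modelled values in the example, the regime labels, and the choice of the mean absolute FB as composite measure are illustrative assumptions.

from statistics import mean

def fractional_bias(obs_mean, mod_mean):
    # FB = (Cobs - Cmod) / (0.5 * (Cobs + Cmod)); FB = 0 for a perfect match.
    # The sign convention (observed minus modelled) is one common choice.
    denom = 0.5 * (obs_mean + mod_mean)
    return (obs_mean - mod_mean) / denom if denom else 0.0

def evaluate_by_regime(data):
    # data maps a regime label to (observed NCCs, modelled centreline values).
    fb_per_regime = {}
    for regime, (observed, modelled) in data.items():
        fb_per_regime[regime] = fractional_bias(mean(observed), mean(modelled))
    # One possible composite measure: the mean absolute FB over all regimes.
    composite = mean(abs(fb) for fb in fb_per_regime.values())
    return fb_per_regime, composite

# Hypothetical example: observed NCCs (ppt) and modelled values for two regimes.
example = {
    "7 km, convective": ([238.0, 130.0, 110.0], [150.0, 150.0, 150.0]),
    "3 km, convective": ([104.0, 92.0, 86.0], [120.0, 120.0, 120.0]),
}
fb_values, composite = evaluate_by_regime(example)
print(fb_values, composite)

In the actual methodology, the regimes and the observed near-centreline values come from a data set prepared as described in Section 3.4 below.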
In contrast, according to the methodology involving NCCs, we should instead take several near-centreline concentrations (in this case, for instance, the three highest values) and compare them to modelled values. The details of this will be discussed later, but let us first consider a basic question: how should we ideally form an ensemble average of arcwise centreline concentrations, under the assumption that we have many realisations of one "event", i.e. a meteorological scenario? This is not a priori clear.

Let us assume that we know all details of the concentration distribution along an arc. We may choose between at least two methods for forming an ensemble average of arcwise centreline concentrations:

(i) The maximum-based method: take the maximum concentration for each arc and form the average
$$\bar{c}_{max} = \frac{1}{n}\sum_{i=1}^{n} c_{max,i}$$
(where $c_{max,i}$ is the maximum concentration along the arc for the i'th realisation of the event).

(ii) The centre-of-mass based method: find the concentration at the centre of mass for each arc, and form the average
$$\bar{c}_{c\_o\_m} = \frac{1}{n}\sum_{i=1}^{n} c_{c\_o\_m,i}$$
(where $c_{c\_o\_m,i}$ is the arcwise centre-of-mass concentration for the i'th realisation of the event).

Method (i) fits with an evaluation methodology where we compare observed arcwise maxima to modelled centreline values, as we do in the case of the Model Validation Kit. A "perfect model" would then be a model which correctly predicts ensemble averages of arcwise maximum values. In practice, we cannot determine the true arcwise maximum because our network of monitors has gaps. Strictly speaking, this "receptor spacing effect" makes it impossible for us to assess whether a given model is perfect according to definition (i).

Method (ii) corresponds to an evaluation methodology where we identify the centre of mass for each arc and compare the concentration there with the modelled centreline concentration. A "perfect model" is then one which correctly predicts ensemble averages of arcwise centre-of-mass values.

Some properties of the two types of "perfect models" are trivial from a mathematical standpoint, but very interesting from a practical point of view. In the first place, if we have a perfect model (according to either method), and if we have a reasonable number of realisations of the same event, then there will exist observed values greater than our model prediction. Thus, it is a characteristic feature of a perfect model that it underpredicts the highest concentrations. There is a slight modification: in the case of a perfect method-(i) model, the underprediction can to some extent be concealed by the "receptor spacing effect". Secondly, a model which is perfect according to the maximum-based definition (i) will predict larger concentrations than a model which is "perfect" according to the centre-of-mass based definition (ii); this follows because, for every realisation, the arcwise maximum is at least as large as the concentration at the centre of mass, so the same inequality holds for the ensemble averages. The practical consequences of this will be treated in the subsequent sections. A discussion along similar lines, but with a few additional details, can be found in another paper by the author (Olesen, 1997b).

3.4 Basic concepts – preparation of a data set with near-centreline concentrations

The basis of the ASTM methodology, with changes as suggested by Irwin (1998), is outlined here (a minimal illustrative sketch follows at the end of this subsection):

1. Take an experimental data set with a good coverage of samplers along the monitoring arcs.
2. Classify the observations into regimes. Each regime should represent uniform physical conditions (arcs at the same distance from the source, with the same stability, etc.). The definition of a "regime" is important because ensemble averages will be determined regime by regime.
3. Consider an arc, and accept it or reject it for further processing. For instance, it will be rejected if there are unacceptably few monitors.
4. Determine the centre of mass and the lateral dispersion, $\sigma_y$, for the arc.
5. Based on all observations in the regime, compute the average $\sigma_y$ for the regime.
6. Go back to the individual arcs and select "near-centreline" concentrations. On a given arc, there may be none, one, or several selected values. "Near-centreline" is defined in terms of the regime-averaged $\sigma_y$.

The resulting data set with observed concentrations will be the basis for all further work. It can be used by modellers for computing concentrations at corresponding points. The above steps can be performed once and for all for a given experimental data set, once the set of regimes is defined.
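The numbered steps above can be made more tangible with a small Python sketch. It is only an illustration under stated assumptions, not the ASTM/Irwin implementation: the data layout, the rejection threshold MIN_MONITORS, the use of monitor azimuth as the cross-wind coordinate along the arc, and the width factor K that defines "near-centreline" in terms of the regime-averaged $\sigma_y$ are all assumptions made for the example.

import math

MIN_MONITORS = 6   # assumed rejection threshold for an arc
K = 0.67           # assumed half-width of the near-centreline window,
                   # in units of the regime-averaged sigma_y

def arc_moments(monitors):
    # Concentration-weighted centre of mass and lateral spread (sigma_y),
    # using azimuth (degrees) as the cross-wind coordinate along the arc.
    total = sum(c for _, c in monitors)
    if total <= 0.0:
        return None
    com = sum(az * c for az, c in monitors) / total
    var = sum(c * (az - com) ** 2 for az, c in monitors) / total
    return com, math.sqrt(var)

def select_ncc(regime_arcs):
    # regime_arcs: list of arcs; each arc is a list of (azimuth_deg, concentration).
    accepted = []
    for arc in regime_arcs:
        if len(arc) < MIN_MONITORS:        # step 3: accept or reject the arc
            continue
        moments = arc_moments(arc)         # step 4: centre of mass and sigma_y
        if moments is not None:
            accepted.append((arc, moments))
    if not accepted:
        return []
    # Step 5: regime-averaged sigma_y over all accepted arcs.
    sigma_regime = sum(sigma for _, (_, sigma) in accepted) / len(accepted)
    # Step 6: back to the individual arcs, keep monitors within K * sigma_regime
    # of the arc's centre of mass; an arc may contribute none, one, or several values.
    ncc = []
    for arc, (com, _) in accepted:
        ncc.extend(c for az, c in arc if abs(az - com) <= K * sigma_regime)
    return ncc

With the near-centreline concentrations selected once and for all, modellers can compute concentrations at the corresponding points, and statistics such as the fractional bias sketched earlier can then be evaluated regime by regime.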
[Figure 3: concentration normalised by emission (10⁻⁹ s/m³) versus azimuth relative to centre of mass (degrees); the average of the NCCs is indicated.]
Fig. 3. Kincaid data. Concentrations from 14 arcs belonging to a regime with arcs 7 km downwind and -25