GAW Report No. 181

Joint Report of COST Action 728 and GURME

Overview of Tools and Methods for Meteorological and Air Pollution Mesoscale Model Evaluation and User Training

WMO/TD - No. 1457

For more information, please contact:

World Meteorological Organization Research Department Atmospheric Research and Environment Branch 7 bis, avenue de la Paix – P.O. Box 2300 – CH 1211 Geneva 2 – Switzerland Tel.: +41 (0) 22 730 81 11 – Fax: +41 (0) 22 730 81 81 E-mail: [email protected] – Website: http://www.wmo.int/pages/prog/arep/index_en.html

© World Meteorological Organization, 2008 © COST Office, 2008, ISBN 978-1-905313-59-4 The right of publication in print, electronic and any other form and in any language is reserved by WMO. Short extracts from WMO publications may be reproduced without authorization provided that the complete source is clearly indicated. Editorial correspondence and requests to publish, reproduce or translate this publication (articles) in part or in whole should be addressed to: Chairperson, Publications Board World Meteorological Organization (WMO) 7 bis avenue de la Paix P.O. Box No. 2300 CH-1211 Geneva 2, Switzerland

Tel.: +41 22 730 8403 Fax.: +41 22 730 8040 E-mail: [email protected]

NOTE The designations employed in WMO publications and the presentation of material in this publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of WMO concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. Opinions expressed in WMO publications are those of the authors and do not necessarily reflect those of WMO. The mention of specific companies or products does not imply that they are endorsed or recommended by WMO in preference to others of a similar nature which are not mentioned or advertised. This document (or report) is not an official publication of WMO and has not been subjected to its standard editorial procedures. The views expressed herein do not necessarily have the endorsement of the Organization.

European COoperation in Science and Technology (COST) COST, which is supported by the EU RTD Framework Programme, is the oldest and widest European intergovernmental network for cooperation in research. Established by the Ministerial Conference in November 1971, COST is presently used by the scientific communities of 35 European countries to cooperate in common research projects supported by national funds. The funds provided by COST support the COST cooperation networks (COST Actions) through which, with EUR 30 million per year, more than 30,000 European scientists are involved in research having a total value which exceeds EUR 2 billion per year. This is the financial worth of the European added value which COST achieves. A "bottom up approach" (the initiative of launching a COST Action comes from the European scientists themselves), "à la carte participation" (only countries interested in the Action participate), "equality of access" (participation is open also to the scientific communities of countries not belonging to the European Union) and "flexible structure" (easy implementation and light management of the research initiatives) are the main characteristics of COST. As precursor of advanced multidisciplinary research, COST has a very important role for the realisation of the European Research Area (ERA) anticipating and complementing the activities of the Framework Programmes, constituting a "bridge" towards the scientific communities of emerging countries, increasing the mobility of researchers across Europe and fostering the establishment of "Networks of Excellence" in many key scientific domains such as: Biomedicine and Molecular Biosciences; Food and Agriculture; Forests, their Products and Services; Materials, Physical and Nanosciences; Chemistry and Molecular Sciences and Technologies; Earth System Science and Environmental Management; Information and Communication Technologies; Transport and Urban Development and Individuals, Societies, Cultures and Health. It covers basic and more applied research and also addresses issues of pre-normative nature or of societal importance. For further information visit: www.cost.esf.org

ESF provides the COST Office through an EC contract

COST is supported by the EU RTD Framework programme

Joint Report of COST Action 728 (Enhancing Mesoscale Meteorological Modelling Capabilities for Air Pollution and Dispersion Applications) and GURME (GAW Urban Research Meteorology and Environment Project)

OVERVIEW OF TOOLS AND METHODS FOR METEOROLOGICAL AND AIR POLLUTION MESOSCALE MODEL EVALUATION AND USER TRAINING

Editors: K. Heinke Schlünzen (Meteorological Inst., ZMAW, University of Hamburg, Germany), Ranjeet S. Sokhi (University of Hertfordshire, UK)

Contributors: Elissavet Bossioli, Peter Builtjes, Bruce Denby, Marco Deserti, John Douros, Barbara Fay, Gertie Geertsema, Marko Kaasik, Kristina Labancz, Volker Matthias, Ana Isabel Miranda, Nicolas Moussiopoulos, Viel Ødegaard, Denise Pernigotti, Christer Persson, Roberto San Jose, K. Heinke Schlünzen, Ranjeet Sokhi, Joanna Struzewska, Alessio D'Allura, Maria Athanassiadou, A. Arvanitis, Alexander Baklanov, Sylvia Bohnenstengel, J Elissavet Bossioli, Giovanni Bonafè, Carlos Borrego, Anabela Carvalho, Ulrich Damrath, Edouard Debry, Jaime Diéguez, Sandro Finardi, Bernard Fisher, Stefano Galmarini, Hubert Glaab, Steen C. Hoe, Nutthida Kitwiroon, Liisa Jalkanen, P. Louka, Alexander Mahura, Helena Martins, D R Middleton, Millán Millán, Alexandra Monteiro, Lina Neunhäuserer, Jose Luis Palau, Ulrike Pechinger, Gorka Perez-Landa, Martin Piringer, Denise Pernigotti, Víctor Prior, Maria Tombrou, C. Simonidis, Leiv Håvard Slørdal, Ariel Stein, Jens Havskov Sørensen, Y. Yu Electronic version: November 2008

Website: www.cost728.org

WMO/TD-No. 1457 November 2008

Table of Contents

EXECUTIVE SUMMARY ..... i
1. INTRODUCTION ..... 1
2. COST728 MESOSCALE MODEL INVENTORY ..... 3
3. SUMMARY OF MESOSCALE MODEL APPLICATIONS ..... 7
4. DETERMINATION OF MODEL UNCERTAINTY ..... 12
4.1 Monte Carlo Meteorological and Air Quality Data Uncertainty Analysis ..... 12
4.2 Sensitivity Analysis ..... 16
4.2.1 Meteorological and Photochemical Ensemble Simulations ..... 16
4.2.2 Input Parameters Sensitivity Analysis (Topography, Land-use) ..... 16
4.2.3 Adjoint Modelling Approach ..... 17
4.2.4 Sensitivity of Model Results to Nesting ..... 18
4.2.5 Sensitivity of UAP Forecasts to Meteorological Input and Resolution ..... 18
5. MODEL QUALITY INDICATORS ..... 21
5.1 Quality Indicators for Evaluating Meteorological Parameters ..... 21
5.1.1 Observation Availability ..... 21
5.1.2 Observation Error ..... 21
5.1.3 Recommended Quality Indicators for Different Meteorological Parameters ..... 22
5.2 Quality Indicators for Air Quality Model Evaluation ..... 24
5.2.1 Statistical Parameters for Concentrations ..... 24
5.2.2 EPA Quality Indicators ..... 25
5.2.3 EU Directives Modelling Quality Objectives ..... 25
5.2.4 Application Examples ..... 26
6. VALIDATION DATASETS ..... 29
6.1 Model Validation Datasets and Selection Criteria ..... 29
6.2 Mesoscale Model Validation Datasets and COST728 ..... 30
6.3 Other Efforts for the Harmonisation and Standardisation of Validation Datasets ..... 31
7. MODEL VALIDATION AND EVALUATION EXERCISES ..... 35
7.1 Mesoscale Meteorological Model Validation and Evaluation Studies ..... 35
7.1.1 Use of the European Tracer Experiment (ETEX) for Model Evaluations ..... 37
7.1.2 Meteorological Simulations over the Greater Athens Area Using MM5 and MEMO Mesoscale Models ..... 37
7.1.3 Evaluation of MEMO Using the ESCOMPTE Pre-campaign Dataset ..... 39
7.1.4 Modelling of SOA in the MARS-MUSE Dispersion Model ..... 40
7.1.5 Photochemical Simulations over the Greater Athens Area ..... 41
7.1.6 Mesoscale Meteorological Model Inter-comparison and Evaluation in FUMAPEX ..... 42
7.1.7 Evaluation of COSMO-IT for Air Quality Forecast and Assessment Purposes ..... 44
7.1.8 Evaluation of MM5-CMAQ Systems for an Episode over the UK ..... 46
7.1.9 Evaluation of the MM5-CMAQ-EMIMO Modelling System in Spain ..... 48
7.2 Concentrations of Chemical Species ..... 49
8. MODEL EVALUATION METHODOLOGIES ..... 51
9. USER TRAINING ..... 54
9.1 User Training in Different Countries ..... 54
9.2 Summary on User Training ..... 56
9.3 Recommendations for User Training ..... 57
9.3.1 Model User ..... 57
9.3.2 Model Developer ..... 57
10. Conclusions ..... 58
References ..... 60
Annex A: Glossary of terms ..... 65
Annex B: Entries to the web based model inventory ..... 66
Annex C: Estimates for measurement and model uncertainty ..... 68
Annex D: Statistical measures for meteorological parameters ..... 70
Annex E: Statistical measures for concentrations ..... 74
Annex F: Evaluation of different wavelengths ..... 76
Annex G: Detailed evaluation results from FUMAPEX (FP5 project) ..... 77
Annex H: Details on the evaluation of COSMO_IT for air quality and assessment purposes ..... 79
Annex I: Detailed results of the evaluation of CMAQ for an episode over UK ..... 83
Annex J: Detailed evaluation results for MM5-CMAQ-EMIMO over Spain ..... 85
Annex K: Structure of meta-database for model evaluation exercises ..... 87
Annex L: Entries in meta-database for model evaluation exercises ..... 91
Annex M: Summary tables on model validation and evaluation ..... 95
Annex N: Mesoscale model user training ..... 98
Annex O: Structure of WMO GURME air quality forecasting training course ..... 106

EXECUTIVE SUMMARY

This report provides an overview of current methodologies and tools for mesoscale meteorological model validation and result evaluation, on validation datasets, and on user training. This overview will assist in the wider aim of COST728 to enhance European capabilities in meteorological modelling for air pollution dispersion applications. This report is meant as a first, but important, step towards developing protocols for evaluating the use of mesoscale atmospheric models for pollution transport studies and towards developing procedures for model quality assurance based on scientific and fundamental principles. Three different time scales are considered:

• Episodes (a few days).
• Single cases that concern meteorological situations relevant for determining statistical values.
• Extended periods/years, on an hour-by-hour to daily averaged basis, to determine air quality concentrations relevant for the EU-Directives.

This report provides a summary of the existing models and their capabilities (Section 2) and of selected mesoscale model applications undertaken by COST728 partners (Section 3). Information on how to determine model uncertainty (Section 4) and how to evaluate model performance (Section 5) is given, and a review of available, well documented, three-dimensional datasets of known quality for model evaluation is provided (Section 6). A summary of earlier evaluation exercises is given in Section 7. The validation methodologies and datasets covered include those for meteorological parameters that are relevant for concentration forecasts. Concepts for model evaluation that are based on fundamental physical principles rather than on single case applications are summarized in Section 8. The evaluation of models is a necessary, but not sufficient, step to ensure reliable model results. Mesoscale models are too complex to be applied without deep knowledge of, and experience in, their application. There is currently no consensus on the extent or the depth of training that would be required for a non-expert to competently use mesoscale models. Mesoscale model user training being undertaken by COST728 partners is summarized in Section 9. First conclusions on how to evaluate mesoscale models and how to perform training are given in Section 10. A glossary of terms follows in Annex A.


1. INTRODUCTION

The objective of this report is to provide an overview of current methodologies and tools for the evaluation of mesoscale meteorological models, on validation datasets, and on user training. This overview will assist in the wider aim of COST728 to enhance European capabilities in meteorological modelling for air pollution dispersion applications. This report is meant as a first, but important, step towards developing protocols for evaluating the use of mesoscale atmospheric models for pollution transport studies and towards developing procedures for model quality assurance based on scientific and fundamental principles. Three different time scales are considered:

• Episodes (a few days).
• Single cases that concern meteorological situations relevant for determining statistical values.
• Extended periods/years, on an hour-by-hour to daily averaged basis, to determine air quality concentrations relevant for the EU-Directives.

This report is based on literature reviews, questionnaires and on the web based metadata databases of COST728. The basic structure of the 5-piece meta-database system is shown in Figure 1. COST728 collaborates closely with COST 732 and ACCENT with respect to the model meta-database.


Figure 1. Distributed database of COST728 to collect meta-data: "Model inventory" (Section 2), "Model applications" (Section 3), "Validation datasets" (Section 6), "Validation and evaluation exercises" (Section 7), "User training" (Section 9)

The "model inventory" has been set up by the University of Hamburg (UHH*) and the meta-databases on "validation datasets" and "model validation and evaluation exercises" have been set up by Aristotle University Thessaloniki (AUTH†). User training information is given in Annex E. The interlinked information from the database will be used to define a protocol for conducting quality assurance of mesoscale models. This report provides a summary of the existing models and their capabilities (Section 2), a summary of selected mesoscale model applications undertaken by COST728 partners (Section 3), information on how to determine model uncertainty (Section 4) and how to evaluate model performance (Section 5), and a review of available, well documented, three-dimensional datasets of known quality for model evaluation (Section 6). Model validation datasets include air pollution episode datasets resulting from or used in earlier projects (e.g. ESCOMPTE, FUMAPEX‡, COST715, CITY-DELTA, TFS§). A summary of earlier validation and evaluation exercises is given in Section 7. Evaluation methodologies and validation datasets include those attempts and datasets of meteorological parameters that are relevant for concentration forecasts. Within COST728 the impact of meteorological input uncertainties on concentration and meteorological model output is of primary concern. Uncertainty estimates for other datasets, such as emissions, chemistry and kinetic data, will not be specifically investigated within COST728; instead, the uncertainty values will be extracted from other sources. Evaluation concepts that are based on fundamental physical principles rather than on single case applications are summarized in Section 8. The evaluation of models is a necessary, but not sufficient, step to ensure reliable model results. Mesoscale models are too complex to be applied without deep knowledge of, and experience in, their application. There is currently no consensus on the extent or the depth of training that would be required for a non-expert to competently use mesoscale models. Mesoscale model user training being undertaken by COST728 partners is summarized in Section 9. First conclusions on how to evaluate mesoscale models and how to perform training are given in Section 10. A glossary of terms follows in Annex A. As mentioned earlier, the information discussed in this report has been derived mainly from existing methods and tools being used by the COST728 partners. This also relates to mesoscale meteorological model applications and user training aspects. Consequently, the report does not attempt to provide a fully comprehensive review covering all mesoscale models being employed or training being offered in Europe or elsewhere. However, the work cited here substantially reflects the wider research and applications being undertaken within European organizations.

* http://www.cost728.org
† http://pandora.meng.auth.gr/mqat
‡ EC 5FP project FUMAPEX: Integrated Systems for Forecasting Urban Meteorology, Air Pollution and Population Exposure; web-site: http://fumapex.dmi.dk
§ TFS: Tropospheric Research Programme funded by the German Research Minister (1996-2002)

2. COST728 MESOSCALE MODEL INVENTORY

K. Heinke Schlünzen(1), Roberto San Jose(2)
(1) ZMAW, University of Hamburg, Meteorological Institute, Hamburg, Germany
(2) Computer Science School, Technical University of Madrid, Madrid, Spain

A reliable forecast of meteorological data (wind direction and speed, turbulence parameters, humidity, radiation, boundary layer height, influence of heterogeneous terrain) is one of the preconditions for a reliable forecast of concentrations (Schlünzen, 2002). Therefore, COST728 places its emphasis on improving the meteorological models used in atmospheric dispersion studies. In this chapter an overview on the current capabilities of these models is given. A web-based model inventory with detailed information on model capabilities is provided by COST728 (accessible from http://www.cost728.org). The inventory includes models for the microscale (models resolving the canopy layer and obstacles therein), mesoscale (regional models, domain covering at least 100 x 100 km2) and macroscale (hemispheric and global models) and covers meteorology, chemistry and transport models as well as models that simultaneously simulate meteorology, transport and chemistry. Table 1 summarizes the models relevant for COST728. Additional information on these entries can be found in Annex B.

Table 1. Summary of the regional Eulerian meteorology models included in the web based model inventory (status 30.10.2007)

Model name | Comments
ADREA | Meso-microscale model
ALADIN (AU, PL) | Comprehensive Air Quality model based on ALADIN forecast data
ARPS | Advanced Regional Prediction System
BOLCHEM | Meteorology model with chemistry included
GESIMA | Non-hydrostatic mesoscale model
HIRLAM (_Enviro, _NH) | Regional model, limited area
COSMO (COSMO_CH [formerly aLMo], COSMO Climate model [formerly CLM], COSMO_EU [formerly LME], COSMO_DE [formerly LM], COSMO_IT [formerly LAMI]) | Non-hydrostatic mesoscale model (German versions COSMO_EU and COSMO_DE, Swiss Alpine version COSMO_CH, Italian version COSMO_IT; COSMO climate model)
MC2 | Non-hydrostatic. Limited area model. Semi-Lagrangian
MEMO (ES, GR, PT) | Non-hydrostatic mesoscale model
MERCURE | Limited area
MESO-NH | Non-hydrostatic limited area model; optional chemistry on-line (then called Meso-NHC)
METRAS | Non-hydrostatic community model with (passive) tracer and pollen transport; part of multiscale model system M-SYS (meso-/microscale meteorology and chemistry); simplified non-research public domain version
MM5 (GR, PT, UK, GER) | Non-hydrostatic model. A globally used version
RAMS | Non-hydrostatic models. Limited area
SAIMM | Prognostic non-hydrostatic model
TAPM | Very fast. Simplified linear chemistry
UM | Global and regional domains. Climatic variables
WRF_ARW | Mesoscale meteorological model. Improved physics and new modules with respect to MM5
WRF_CHEM | Chemistry included on-line. Hemispheric domain

The model qualities are summarized in several tables as part of the inventory for the different model types (see web based model meta-database at http://www.cost728.org/). Overview tables on the equations solved, on parametrizations and solution techniques used, as well as on initialisation and nesting techniques used within the models can be found on the website. Furthermore, summary tables provide details on the validation and evaluation of the models (Section 7). The database is open for updates and new entries. Changes in the existing entries are possible at any time; the summary tables are updated automatically once changes are made to the entries. Most models provide several options for solving the basic equations, the approximations used and the applied parametrizations. In this document only a summary is given of how they are normally applied by the research institutes contributing to COST728. Currently (as of 30.10.2007) 18 different mesoscale meteorology model families (Table 1) have been introduced in the model meta-database; five of these are hydrostatic models (ALADIN, BOLCHEM, GME, HIRLAM, TAPM). Twelve models calculate precipitation from prognostic equations (ARPS, GESIMA, COSMO, MC2, MERCURE, MESO-NH, METRAS, MM5, RAMS, TAPM, UM, WRF), another four diagnose precipitation (ALADIN, BOLCHEM, GME, HIRLAM, MEMO) and two models do not calculate precipitation at all (ADREA, SAIMM). Cloud cover is diagnostically calculated in nine of the 15 models calculating clouds (ALADIN, BOLCHEM, GME, HIRLAM, COSMO, MEMO, MERCURE, MESO-NH, UM). With the exception of two models (ALADIN, UM) all calculate turbulent kinetic energy from a prognostic equation (ADREA, ARPS, BOLCHEM, COSMO climate model, ENVIRO-HIRLAM, GESIMA, HIRLAM, COSMO, M-SYS, MC2-AQ, MCCM, MEMO, MERCURE, Meso-NH, METRAS, MM5, RAMS, SAIMM, TAPM, WRF) and four models also calculate dissipation from a prognostic equation (ADREA, MERCURE, SAIMM, TAPM). Inversion heights are calculated using prognostic equations by six models (ADREA, MC2, MEMO, MM5, SAIMM, TAPM) and diagnosed in another nine (ALADIN, ARPS, BOLCHEM, HIRLAM, COSMO_EU, MERCURE, MESO-NH, METRAS, RAMS). Details on the parametrizations used in the models can be found in the COST728-WG1 report, and on modelling systems in the COST728-WG2 report (Baklanov et al., 2008). Further information can be found on the COST 728 website*. Among the non-hydrostatic models, five models (COSMO, MC2, MM5 family members, UM, WRF) use no approximations (non-hydrostatic, fully compressible models), five the anelastic approximation (GESIMA, MERCURE, MESO-NH, METRAS, NH-HIRLAM) and six (BOLCHEM, GESIMA, MEMO, METRAS, RAMS, SAIMM) the Boussinesq approximation. ALADIN, BOLCHEM, HIRLAM, SAIMM and TAPM are hydrostatic models. Table 2 lists the meteorological and chemical models according to the classical approach of solving the dispersion equation: Lagrangian† and Eulerian. Additionally, the classification includes the terms mesoscale, global and microscale for the Eulerian type of models in regard to the classical meteorological approach of the extension of the domain covered by the model simulation. The information on the models is again mostly based on the COST model inventory‡, which does not include OPANA and CFS. OPANA is documented in the European Environmental Agency model inventory§ and CFS is the operational climate model from NCEP (USA)¶.

* http://www.cost728.org/
† Lagrangian concerns here the model type and not the method of numerical solution (e.g. Lagrangian advection schemes)
‡ http://www.cost728.org
§ http://pandora.meng.auth.gr/mds/showlong.php?id=113
¶ http://cfs.ncep.noaa.gov/

Table 2. Summary of the Eulerian and Lagrangian transport and chemistry & transport models included in the web based model inventory (status 05.02.2007)

Eulerian mesoscale models
Chemistry & transport model | Comments
AURORA | AURORA employs exactly the same grid as its meteorological 'driver' (the ARPS model)
BOLCHEM | BOLCHEM can operate using two different gas phase chemistry schemes: SAPRC-90 and CB-IV
CAMx (_ALADIN, _MM5) | Many applications. Several chemical schemes. Optimized numerics. OSAT and PA
CALGRID | Meteorology is taken from CALMET. Simple chemistry
CHIMERE | Meteorology with MM5 and COSMO-IT
CMAQ | Meteorology with MM5. Several chemical schemes and process analysis (PA)
EMEP | European applications
Enviro_HIRLAM | -
EPISODE | Combined Eulerian and Gaussian approaches. Limited area. Simple linear chemistry. Diagnostic meteorology or from MM5
LOTOS-EUROS | Ozone and aerosol chemistry. European domain. 4 vertical layers
MARS | Model for the Atmospheric Dispersion of Reactive Species
MATCH | Basic atmospheric chemistry
MC2-AQ | Mesoscale Compressible Community - Air Quality
MCCM | Mesoscale Climate Chemistry Model
MECTM | 3D Eulerian photochemistry and aerosol chemistry community model; part of model system M-SYS; meteorology from METRAS
MOCAGE | With chemical data assimilation
MUSE | Multilayer dispersion model. Photochemistry and PM. Meteorology from MEMO
OFIS | Two-layer 2-dimensional Eulerian photochemical model
OPANA | Eulerian chemical model based on MM5-CMAQ. Including EMIMO model (emission model)
SILAM | Dual Eulerian & Lagrangian Monte-Carlo modelling system with modular chemistry and data assimilation options. HIRLAM & ECMWF meteo
TCAM | Aerosol chemistry. Limited area model
TREX | Limited area. Simple chemistry
WRF/Chem | Weather Research and Forecasting Chemistry model

Lagrangian models
FLEXPART | Particle Lagrangian model. European domain
FLEXTRA | Trajectory model. European domain
LPDM | Lagrangian particle dispersion model
SILAM | Lagrangian dispersion model with a high-precision iterative advection algorithm and a Monte-Carlo random-walk representation of atmospheric diffusion

Eulerian microscale models
MICTM | 3D microscale Eulerian photochemistry model; meteorology from MITRAS; part of model system M-SYS

Eulerian global models
CFS | Climate Forecast System. An operational climate system

As of February 2007, 47 different transport or transport and chemistry model families (Annex B) have been introduced in the database. In the mesoscale, 29 of the models (Table 2) include chemical reactions (AERMOD, ALADIN-CAMx, AURORA, BOLCHEM, CALGRID, CALPUFF, CAMx_ALADIN/_MM5, CHIMERE, CMAQ, EMEP, EPISODE, FARM, LOTOS-EUROS, MARS, MATCH, MC2, MCCM, MECTM, Meso-NH, MOCAGE, MUSE, OFIS, OPANA, RCG, SILAM, TAPM, TCAM, TREX, WRF/Chem), and 13 include aerosol chemistry (ALADIN-CAMx, CHIMERE, CMAQ, ENVIRO-HIRLAM, FARM, MECTM, MESO-NH, MOCAGE, MUSE, OFIS, RCG, TCAM, WRF/Chem). The meteorology is taken from various other models and interpolated to the grid used in the CTM. Some chemistry transport models use the same grid as their meteorological drivers (ALADIN-CAMx, CMAQ, LPDM, MATCH, MC2-AQ, MECTM, MOCAGE, SILAM, TREX) and thereby avoid additional errors caused by the interpolation. Ten mesoscale models avoid the problem by calculating both meteorology and chemistry (BOLCHEM, CALMET/CALPUFF, CALMET/CAMx, MC2-AQ, MCCM, Meso-NH, RCG, TAPM, WRF/Chem).
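The interpolation step mentioned above is a common source of additional error when the CTM grid differs from that of the meteorological driver. As a minimal illustration of what such a step involves (not taken from any of the inventoried models; the function name and grid values are hypothetical), the sketch below bilinearly interpolates a met-model wind component to an off-node CTM location:

```python
import numpy as np

def bilinear(field, x, y):
    """Bilinear interpolation of a 2-D meteorological field (e.g. a wind
    component) at fractional grid indices (x, y) of a CTM cell centre."""
    i0, j0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - i0, y - j0
    return ((1 - dx) * (1 - dy) * field[i0, j0] +
            dx * (1 - dy) * field[i0 + 1, j0] +
            (1 - dx) * dy * field[i0, j0 + 1] +
            dx * dy * field[i0 + 1, j0 + 1])

# Example: met model on a 50 x 50 grid, CTM cell centre between grid nodes.
met_u = np.fromfunction(lambda i, j: 5.0 + 0.1 * i - 0.05 * j, (50, 50))
print(bilinear(met_u, 12.3, 7.8))  # u-wind at an off-node CTM location
```

Whenever the interpolated field is not exactly linear between nodes, the interpolated value differs from the driver model's own solution, which is the error that shared-grid systems avoid.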

3. SUMMARY OF MESOSCALE MODEL APPLICATIONS

Joanna Struzewska(1), K. Heinke Schlünzen(2)
(1) Warsaw University of Technology, Faculty of Environmental Engineering, Institute of Environmental Engineering Systems, Warsaw, Poland
(2) ZMAW, University of Hamburg, Meteorological Institute, Hamburg, Germany

The main purposes of mesoscale air quality models are to quantify the concentration levels of primary and secondary gaseous and particle pollutants, to assess the loading of acidifying compounds to the different parts of the ecosystem, and to understand the physical and chemical processes involved in formation, transport and deposition of these compounds. In the past air quality models were divided into two categories: policy decision support models and research models. Modelling tools designed to provide input for policy purposes had simplified descriptions of physical and chemical processes and were used to carry out simulations over long time periods or for multiple scenarios. The research models, including the complex description of atmospheric processes, required large computer resources to carry out long term integrations or multiple runs for policy applications. At present, due to increasing computational capabilities, state-of-the-art air quality models are being used in decision-making processes. Even long term assessments of abatement measures are to be evaluated with models that give reliable results under a variety of environmental conditions. This requirement is also crucial for operational and semi-operational systems used to inform the public on air quality or on the possible occurrence of smog episodes.

Table 3. Mesoscale air quality models' application

Policy support:
• Long term: long term assessment (concentration, deposition, exposure); emission reduction policy
• Short term: public information (on-line forecast, alerts on episodes); emergency response

Research:
• Long term: trends, seasonal and interannual variation of trace species concentrations; climatological transport pathways; regional climate change impacts and feedbacks
• Short term: chemical processes studies (e.g. SOA formation, removal processes, photooxidants formation); biogenic and natural emissions variation (VOC, primary aerosols); impact of meteorological processes on pollutants transformation, transport and dispersion

For short term scenarios, air quality model applications might be connected with the description of dispersion characteristics, chemical transformation and removal. There is an ongoing effort to increase the understanding of the fundamental physical and chemical processes that govern pollutant transformation and transport in the atmosphere. This effort is important to improve and develop comprehensive parametrization schemes for meteorological and air quality modelling systems. In the mesoscale, the air flow depends both on dynamics and on energy balance heterogeneities (i.e. spatial variation of surface characteristics, terrain slope). For air pollution dispersion, thermal effects are especially important during periods characterised by weak synoptic forcing, which, due to poor ventilation, are favourable for the formation of pollution episodes. Uncertainties are mostly found in the flow fields above complex terrain, when precipitation and clouds develop, and in the structure of the planetary boundary layer. Hence, current research activity in this area is dedicated to better describing surface fluxes over complex terrain, turbulent mixing and the height of the planetary boundary layer. In addition, convective processes and cloud formation are considered to be important factors influencing pollutant distributions.

Meteorological conditions not only impact pollutant dispersion, but also the chemical transformation processes, the intensity of biogenic emissions and the efficiency of dry and wet removal. This requires proper treatment of surface layer characteristics and correct approaches for radiation and condensation. Figure 2 schematically shows the dependence of calculated concentrations on meteorological parameters and on chemical/physical processes that in turn depend on pollutant and other characteristics. The figure also indicates the meteorological, physical and chemical parameters and processes that should be treated in a mesoscale model.


Figure 2. Sketch of necessary meteorological parameters (in green circle) and pollutant characteristics (in blue cube) including their dependency on external characteristics in the outermost field (adapted from Schlünzen & Krell, 1994)

Comprehensive chemical mechanisms (often with a few hundred reactions) involving a large number of chemical species (~70 or more) are included in current air quality models. However, considering the type of modelled problem, three major applications might be distinguished: summer photochemical pollution episodes, aerosol formation and distribution, and the assessment of acidification and eutrophication (Table 4).

Table 4. Model applications – types of air quality problems

Photooxidants:
• Impact of local circulation (breeze, urban heat island, flow over complex terrain)
• Summer smog downwind of urban areas (urban plume)
• Pan-European summer photochemical episodes
• Long-range transport of ozone and its precursors

Aerosols:
• Urban winter smog formation
• SOA formation
• Long range transport of naturally emitted aerosols
• Aerosols size segregation and speciation

SOx, NOx, NHx:
• Critical loads (deposition)
• Wet removal (rain and fog)


All institutions involved in the COST728 Action implement significant research programmes in the field of dispersion meteorology and air quality. Based on the results of an internal survey, universities are mostly research oriented, while meteorological services, due to their national responsibilities, combine scientific activity with more practical and policy oriented applications (Table 5). It is worthwhile to note that there is a clear tendency to go in the direction of "One-Atmosphere Models". This means that all the different aspects mentioned in Table 4 and Figure 2 are treated within the same model system. Table 5 provides an overview of the partner activities within COST728. These activities reflect the types of applications in which air quality models are applied in Europe. Note that not all research work is listed for the different institutions.

Table 5. Main contributions of the participating institutions to the research activities of COST728

[Table 5 is a matrix of institutions against research topics; only its row and column labels are reproduced here.]

Application types covered by the table: dispersion meteorology (ABL height parametrization; surface fluxes; urban surface layer), air quality (air quality assessment over long periods; climate change impacts on air quality; long range and transboundary transport; emission & transport of natural aerosols*; short-term air quality episodes; dispersion influenced by local meteorology†; urban air quality) and chemistry (dry and wet removal processes‡; aerosol processes; feedback to meteorology; anthropogenic and biogenic emissions), each characterised by spatial cover (global to regional, regional, regional to local, local to regional scale) and period (short term, long term, short and long term).

Participating institutions: Aristotle University of Thessaloniki; ARPA-Hydro-Meteorological Service (Emilia Romagna region, Italy); Bogazici University; Bulgarian Academy of Sciences; Czech Technical University; Danish Meteorological Institute; Deutscher Wetterdienst (German Weather Service); Dokuz Eylul University; Earth System Research Lab Global Systems Div (ESRL/GSD); Finnish Meteorological Institute; Flemish Institute for Technological Research; Fundación CEAM; GKSS Research Center; Graz University of Technology; Hungarian Meteorological Service; Instituto de Meteorologia, Lisboa; Istanbul Technical University; KNMI Royal Netherlands Meteorological Institute; MAQNet, York University, Canada; Meteo-France; Meteorological Institute, ZMAW, University of Hamburg; National Institute of Meteorology and Hydrology; Norwegian Meteorological Institute; Paul Scherrer Institute; Swedish Meteorological and Hydrological Institute; Technical University of Madrid (UPM); TNO-MEP; UK Met Office; Universitat Politècnica de Catalunya; University of Athens (NKUOA); University of Aveiro; University of Brescia; University of Hertfordshire; University of Sofia; University of Tartu; University of West Macedonia; US EPA Atmospheric Modelling Division; Warsaw University of Technology.

* Natural aerosol: dust, sea salt, pollens, particles from biomass burning
† Local circulations and/or complex terrain
‡ Gas pollutants and particles (incl. radionuclides)

4. DETERMINATION OF MODEL UNCERTAINTY

Ana Isabel Miranda(1), Anabela Carvalho(1), Richard Tavares(1), Peter Builtjes(2), Víctor Prior(1), Carlos Borrego(1), K. Heinke Schlünzen(3), Barbara Fay(4), Viel Ødegaard(5)
(1) CESAM & Department of Environment and Planning, University of Aveiro, 3810-193 Aveiro, Portugal
(2) TNO, Dep. of Air Quality and Climate Change, Utrecht, the Netherlands and Free Univ. Berlin, Inst. of Meteorology, Berlin, Germany
(3) ZMAW, University of Hamburg, Meteorological Institute, Hamburg, Germany
(4) Deutscher Wetterdienst, Offenbach, Germany
(5) Det Norske Meteorologisk Institutt, Blindern, Oslo, Norway

The main purpose of this chapter is to present a state-of-the-art review on the impact of model errors on meteorological data relevant for concentration calculations, giving some case studies as examples. Hence, this review deals with model uncertainty estimation methodologies, namely those related to meteorological outputs important for air quality simulation. Uncertainties associated with air quality model simulations are varied and complex (Fine et al., 2003). Despite the need to quantify these uncertainties (Dabberdt et al., 2004), few attempts have been made to investigate meteorological uncertainties and their role in limiting the expected accuracy of deterministic air quality simulations. The impact of uncertainties in meteorological inputs has been particularly difficult to assess because of the complex correlations, both in the spatio-temporal evolution of the individual meteorological inputs and among the meteorological inputs (Sathya, 2003). On the one hand, meteorology may control or influence emission rates of chemical species and aerosol formation processes (Seaman, 2000) due to the strong dependence of reaction rates on relative humidity, solar energy, temperature and the presence of liquid water and, on the other hand, chemical species concentrations are influenced by thermodynamical processes. Boundary layer structure is strongly related to chemical species concentrations in air quality modelling systems, especially concerning mixed-layer depth, boundary layer stability, turbulent mixing intensity and the lower tropospheric three-dimensional wind field (Shafran et al., 2000). These quantities are determined by atmospheric processes that must be simulated accurately, namely horizontal and vertical transport, turbulent mixing and convection. Seaman (2000) has listed the principal meteorological state variables usually supplied to air quality models:

• Horizontal and vertical wind components.
• Temperature.
• Water vapour mixing ratio.
• Cloud fraction and liquid water content.
• Solar actinic flux.
• Sea level pressure.
• Boundary layer depth.
• Turbulence intensity.
• Surface fluxes for heat, moisture and momentum.

With such a large number of input fields, estimating the uncertainties in the model outputs is not a trivial exercise. In principle, there are two main methods to investigate model uncertainty: Monte Carlo analysis (Section 4.1) and sensitivity studies (Section 4.2).

4.1 Monte Carlo Meteorological and Air Quality Data Uncertainty Analysis

The Monte Carlo analysis is one of the most commonly used methods to estimate uncertainties in model input variables since it is based on quite simple principles (Hanna et al., 1998; Hanna et al., 2001; Bergin et al., 1999). It may be applied to a complete set of more than 100 input parameters and it allows the use of standard nonparametric statistical tests concerning confidence intervals. Several studies have included Monte Carlo simulations with perturbed meteorological and photochemical variables (Hanna et al., 2001; Beekmann and Derognat, 2003), which attempt to span the range of uncertainties of the input parameters by quasi-random sampling from a specified probability distribution for each parameter, and adjoint linear sensitivity studies of meteorological and photochemical variables (Menut, 2003) about a control parameter set. According to Zhang et al. (2005), all these studies have limitations in the manner of treating meteorological variability. The Monte Carlo simulations apply adjustments to meteorological fields that are uniform in space and time, thereby ignoring the true scales of meteorological variability and the differences in meteorological uncertainty across scales (Hogrefe et al., 2001). The linear sensitivities computed by the adjoint technique are valid only in the neighbourhood of the control simulation, and in the case of sensitivity to wind, that neighbourhood is likely to be quite small (Yegnan et al., 2002). The Monte Carlo uncertainty analysis deals with only one component of the total model uncertainty: the uncertainty in the inputs to the model. In the Monte Carlo procedure, a model is run a large number of times. Each time, new values for all the input variables whose variability is considered are selected from their respective "pre-defined" uncertainty distributions using a suitable re-sampling technique such as Simple Random Sampling (SRS) or Latin Hypercube Sampling (LHS), and the model outputs are recorded. The ensemble of model outputs may then be subjected to statistical analysis to ascertain uncertainty in model predictions due to input uncertainties. The following example presents a summary of the work developed by Hanna et al. (2001), which was started in 1997, in order to illustrate the Monte Carlo simple random sample methodology applied to input data uncertainties and their impacts on photochemical model results. The UAM-IV is the air quality model used in this study. It is essential to obtain knowledge on each specific input variable uncertainty. The first step is to identify the input parameters that will be considered for the Monte Carlo experiment, and the second is to associate with each input parameter a distribution function (shapes and key parameters such as median and variance). All this information was gathered by Hanna et al. (2001) through an expert elicitation where around 20 experts were asked to give estimates of uncertainties, based on their experience, on a web page. The experts had to give estimates of the uncertainty range that would include 95% of the possible values (i.e., from the 2.5th percentile to the 97.5th percentile of the cumulative distribution function (CDF)). Table 6 gives the information compiled for the considered input variables by Hanna et al. (2001), their 95% uncertainty ranges, their assumed distribution functions and the standard deviations of the natural logarithm of the input variable (for log-normal distributions) or the input variable itself (for normal distributions). No correlations between input variables were considered due to lack of information. Moreover, there are some constraints on independent random value estimations for input wind speed and direction for each site at the same instant. Almost all 128 input variables considered in the Monte Carlo experiment are described by a log-normal distribution function, by hypothesis. Exceptions are wind direction, ambient temperature, relative humidity and cloud cover, which are assumed to follow a normal distribution.

Table 6. Uncertainty ranges (include 95% of data) and associated sigmas (standard deviations of log-transformed data) for some of the 128 UAM-V input variables studied in the Monte Carlo runs by Hanna et al. (2001). An uncertainty range defined by plus and minus and a "factor of…" encompasses 95 % of the data. For small uncertainty factors (i.e., less than 2), a factor of 1 + x uncertainty can be considered to be "plus and minus 100 x %"
Variable | Uncertainty range (includes 95 % of data) | Sigma (log-normal unless noted)
Initial ozone concentration | Factor of 3 | 0.549
Initial NOx concentration | Factor of 5 | 0.805
Initial VOC concentration | Factor of 5 | 0.805
Top ozone concentration | Factor of 1.5 (50 %) | 0.203
Top NOx concentration | Factor of 3 | 0.549
Top VOC concentration | Factor of 3 | 0.549
Side ozone concentration | Factor of 1.5 | 0.203
Side NOx concentration | Factor of 3 | 0.549
Side VOC concentration | Factor of 3 | 0.549
Major point NOx emissions | Factor of 1.5 | 0.203
Major point VOC emissions | Factor of 1.5 | 0.203
Wind speed | Factor of 1.5 | 0.203
Wind direction | ± 40 º | 20 º (normal)
Ambient temperature | ± 3 K | 1.5 K (normal)
H2O concentration (as RH) | 30 % | 15.0 % (normal)
Vertical diffusivity (8AM-6PM; < 1000 MAGL) | Factor of 1.3 (30 %) | 0.131
Vertical diffusivity (all other times and heights) | Factor of 3 | 0.549
Rainfall amount | Factor of 2 | 0.347
Cloud cover (tenths) | 30 % | 15 % (normal)
Cloud liquid water content | Factor of 2 | 0.347
Area biogenic NOx emissions | Factor of 2 | 0.347
Area biogenic VOC emissions | Factor of 2 | 0.347
Area mobile NOx emissions | Factor of 2 | 0.347
Area mobile VOC emissions | Factor of 2 | 0.347
Area low point VOC emissions | Factor of 2 | 0.347
Other area NOx emissions | Factor of 2 | 0.347
Other area VOC emissions | Factor of 2 | 0.347
NO2, HCHOr, HCHOs, ALDs, and O3-O1 photolysis rates | Factor of 2 | 0.347
CB-4 reactions 1-94 | Factor of 1.01 to 3.02 (median 1.80, mode 2.5) | 0.10 to 0.55 (median 0.30, mode 0.46)
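The sampling step of the Monte Carlo procedure described above can be illustrated with a short sketch. The code below is not from Hanna et al. (2001); it is a minimal Python illustration, with a hypothetical run_model() stand-in for a UAM-type simulation, of how log-normal multipliers and normal offsets consistent with the sigmas of Table 6 could be drawn and the output percentiles collected. Note that the tabulated sigmas correspond to ln(factor)/2, e.g. ln(3)/2 ≈ 0.549.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigma_from_factor(f):
    # Table 6 sigmas read the 95 % range as "median multiplied/divided by
    # the factor" at +/- 2 sigma of ln(x): sigma = ln(f) / 2.
    return np.log(f) / 2.0

# Illustrative subset of the 128 perturbed inputs (names are hypothetical).
lognormal_sigmas = {
    "initial_o3": sigma_from_factor(3.0),   # ~0.549
    "side_nox":   sigma_from_factor(3.0),
    "point_nox":  sigma_from_factor(1.5),   # ~0.203
}
normal_sigmas = {
    "wind_dir_deg":  20.0,   # +/- 40 deg covers 95 %
    "temperature_k": 1.5,    # +/- 3 K covers 95 %
}

def run_model(p):
    """Stand-in for one photochemical model run; returns the
    domain-maximum hourly ozone (ppm) for this perturbed input set."""
    return 0.12 * p["initial_o3"] * p["point_nox"]  # toy response only

outputs = []
for _ in range(100):                     # 100 Monte Carlo runs, as in the study
    p = {k: rng.lognormal(0.0, s) for k, s in lognormal_sigmas.items()}
    p.update({k: rng.normal(0.0, s) for k, s in normal_sigmas.items()})
    outputs.append(run_model(p))

print(np.percentile(outputs, [2.5, 50, 97.5]))   # output spread, cf. Table 7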

Some additional estimates on input and output data uncertainty have been given by COST728 experts in Annex C. The simple random sample Monte Carlo exercise was performed for a period containing an ozone episode on 12-14 July 1995 over the domain covering the north-eastern part of the United States, referred to as the Ozone Transport Assessment Group (OTAG) domain. This domain is divided into 11 sub-domains where the results were analysed. Four emission input files were considered: 1995 emissions, 2007 estimated emissions, 50 % NOx reductions on anthropogenic emissions in 2007 and, also for this particular year, 50 % VOC reductions on anthropogenic emissions. The sets of 128 random perturbation numbers for the 100 Monte Carlo runs were identical over the four base emissions scenarios. Gross uncertainties in the output results were determined with a high level of confidence. The variance of the output variables can be well defined, making it possible, for example, to determine the range of variance in the predicted maximum daily hourly average ozone concentrations with 100 Monte Carlo runs. To illustrate the consistent spread of the distributions, Table 7 lists the 2.5th, 50th, and 97.5th percentiles of the distributions of predicted maximum hourly averaged ozone concentration (ppm), over the 12-14 July 1995 period, for the 11 sub-domains and for the entire OTAG domain (whole-domain) from the 100 Monte Carlo runs with the median year-2007 projected emissions.


Table 7. 2.5th, 50th, and 97.5th percentiles of the cumulative distribution function of the 100 Monte Carlo predictions of maximum hourly averaged ozone concentration (ppm) for the 11 sub-domains and for the entire domain for the 12-14 July 1995 ozone period and for year-2007 median emissions, for the OTAG domain (12 km grid) (Hanna et al., 2001)

Sub-domain | 2.5th percentile | 50th (median) | 97.5th percentile
Atlanta | 0.09 ppm | 0.17 ppm | 0.32 ppm
Balt-Wash | 0.08 ppm | 0.14 ppm | 0.22 ppm
Nashville | 0.07 ppm | 0.12 ppm | 0.21 ppm
Chicago | 0.07 ppm | 0.12 ppm | 0.19 ppm
Louisville | 0.06 ppm | 0.12 ppm | 0.19 ppm
Pittsburgh | 0.07 ppm | 0.12 ppm | 0.18 ppm
Philly | 0.07 ppm | 0.11 ppm | 0.19 ppm
New York | 0.06 ppm | 0.11 ppm | 0.19 ppm
New England | 0.06 ppm | 0.11 ppm | 0.19 ppm
Charlotte | 0.07 ppm | 0.11 ppm | 0.18 ppm
St. Louis | 0.06 ppm | 0.09 ppm | 0.15 ppm
Whole Domain | 0.13 ppm | 0.19 ppm | 0.32 ppm
In another study Beekmann and Derognat (2003) applied a Bayesian Monte Carlo (BMC) uncertainty analysis to a case study simulation (7 August 1998 and 16-17 July 1999) of photochemical smog formation in the Ile-de-France region during the ESQUIF campaign. The uncertainty assessment is based on the chemistry transport model CHIMERE covering the European continental scale with nesting options for several urban areas. The study addresses the overall model uncertainty due to several model input parameters (emissions, meteorological parameters, rate constants, photolysis frequencies). The Bayesian variant of Monte Carlo analysis allows the attribution of larger weights to those individual simulations which give a better fit to observations. The authors obtained the following results: •





• •

Uncertainties in the simulated ozone maxima (O3 max) for the 3 days, both for the baseline and for the 50% reduced emissions scenario, are reduced by a factor between 1.5 and 2.7 by the measurement constraint and range between ±15 and ±30% (when expressed as relative differences between the 50th and the 10th or 90th percentiles). Uncertainties in the simulated differential sensitivity of ozone formation to NOx and VOC emission reductions are reduced by a factor between 1.8 and 3.1 by the measurement constraint and range between ±4 and ±10 ppb (when averaged over the plume). The measurement constraint induces little changes in daily surface ozone (DSO) for 7 August, shifts it to even more positive values for 16 July, and shifts it to negative values for 17 July (larger probability for a sensitivity to NOx emission reductions). The constraint by ozone measurements in the urban area and in the plume is, in most cases, sufficient to efficiently reduce uncertainties in O3 maxima for the baseline and for the 50% reduced anthropogenic emissions scenario. Additional nitrogen species measurements (NOy, NOx) in the plume are necessary to reduce the uncertainties in the DSO; additional constraints by VOC and wind measurements only slightly change the results. Sensitivity tests with modifications in the BMC method (varying uncertainty ranges for input parameters, a lognormal instead of a normal distribution of the uncertainty in observations, and a larger number of Monte Carlo simulations) confirm that the results are robust. Changes in the simulated sensitivity to NOx and VOC emission reductions are related to the modified a posteriori distributions of emissions, i.e., a smaller average VOC/NOx emission ratio for 16 July (-22%) and a larger one for 17 July (+27%). A possible underestimation of the PBL height on 17 July would cause a posteriori NOx emissions to decrease to a lesser extent (-11% instead of -21%). The median wind speed and propane equivalent carbon/NOy ratio obtained with the standard constraint alone are in good agreement with corresponding measurements from the DIMONA flight.


4.2 Sensitivity Analysis

Sensitivity studies are an alternative approach to investigating model uncertainties.

4.2.1 Meteorological and Photochemical Ensemble Simulations

Zhang et al. (2007) present an ensemble approach to evaluate the impact of meteorological uncertainties on ozone pollution. The purpose of the study is to investigate the sensitivity of Eulerian grid model ozone simulations to small perturbations of meteorological variables that are realistic in structure and evolution. The ensemble approach demonstrates that the sensitivity of ozone to such perturbations is substantial and constitutes a serious limitation on deterministic photochemical simulations. The study quantifies the impact of these uncertainties on the predictability of ozone pollution through ensemble forecasts with current state-of-the-art meteorological and photochemical prediction models. The strong correlation of peak ozone with the initial wind and temperature uncertainties clearly demonstrates the importance of an accurate representation of meteorological conditions for local prediction. The paper illustrates the real need for probabilistic evaluation and forecasting of air pollution, in particular for regulatory purposes.

4.2.2 Input Parameters Sensitivity Analysis (Topography, Land-use)

This case study concerns the influence of thermally induced circulations on photochemical model results. The applied numerical system comprises the models MM5 and MARS. The model domain covers the north-western part of mainland Portugal (40 x 40 grid cells with 5 x 5 km resolution). The simulated period for pollutant dispersion covered two consecutive summer days, 15 and 16 July 2000, under the influence of a thermal low-pressure system located over the Iberian Peninsula. More details on this modelling study can be found in Carvalho et al. (2006). The sensitivity analysis of the vertical ozone concentration fields was carried out using the factor analysis developed by Stein and Alpert (1993). According to these authors, 2^n simulations are required to correctly evaluate the contributions of, and the interactions between, the n factors. In this study a set of four simulations was performed (Table 8). The modified characteristics are constant terrain height (flat terrain at 0 m) and constant land use, defined as mixed shrub and grassland (code 9 of the United States Geological Survey database).

Table 8. Sensitivity analysis options

Simulation   Land use   Topography
f12          USGS       USGS
f2           USGS       Null
f1           constant   USGS
f0           constant   Null

The ozone concentration fields obtained are labelled according to the four simulations (Table 8). Hence, fields are obtained for the control simulation (f12), for the flat terrain simulation (f2), for the constant land use simulation (f1) and for the simulation where both factors are set constant, i.e. flat terrain and constant land use (f0). To detect the contribution of these factors to the vertical ozone distribution, it is necessary to examine the fields defined by:

f̂0 = f0
f̂1 = f1 - f0
f̂2 = f2 - f0
f̂12 = f12 - (f1 + f2) + f0

where f̂0 is the ozone concentration related to neither of the two factors under analysis, f̂1 is the ozone concentration induced by the topography, f̂2 is the influence of heterogeneous land use, and f̂12 describes the non-linear interaction of the two factors on the ozone concentration field. Vertical gradients almost vanish in the simulation with flat terrain and constant land use, although some ozone spots appear below 5 km altitude. Under these conditions higher ozone concentrations are found at higher altitudes over land. An enhancement of ozone concentrations is simulated over the ocean (western part of the domain), more pronounced when topography is the inducing factor (f̂1). At noon, ozone concentrations, as well as their horizontal extension, increase in the western part of the domain due to both factors (f̂1 and f̂2) on the two simulated days. This feature is also detected at 18 UTC, but it is less intense. Land use effects (f̂2), on the other hand, are mostly associated with decreasing ozone values over land (centre and east of the domain). Topography is the leading factor for ozone transport to higher levels in the western part of the domain. The air quality stations selected for the MARS ozone uncertainty estimation are Teixugueira, Estarreja and Coimbra, representative of a rural, an industrial and an urban location, respectively. The coefficient of variation (CV; Annex E) was the parameter selected for the uncertainty evaluation (Figure 3).
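A minimal sketch of the Stein and Alpert factor separation defined above, using placeholder arrays for the four runs of Table 8; only the four difference fields follow the study, everything else is illustrative:

```python
# Stein and Alpert (1993) factor separation from the four sensitivity runs.
import numpy as np

shape = (40, 40)                      # 40 x 40 grid cells, as in the study
rng = np.random.default_rng(0)
f12, f1, f2, f0 = (rng.random(shape) for _ in range(4))  # placeholder fields

f0_hat  = f0                          # part related to neither factor
f1_hat  = f1 - f0                     # contribution of topography
f2_hat  = f2 - f0                     # contribution of heterogeneous land use
f12_hat = f12 - (f1 + f2) + f0        # non-linear interaction of both factors
```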

[Figure 3: bar chart of CV (%) per air quality station (Coimbra, Avanca, Teixugueira) for the runs f12, f2, f1 and f0]

Figure 3. Coefficient of variation (CV) of the hourly ozone concentration results for each run due to the input changes in topography and land use given in Table 8
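Annex E defines the CV used in Figure 3; a common form is the standard deviation divided by the mean, expressed in percent. A one-function sketch under that assumption (the array name is hypothetical):

```python
# Coefficient of variation of an hourly ozone series at one station.
import numpy as np

def cv_percent(hourly_o3):
    hourly_o3 = np.asarray(hourly_o3, dtype=float)
    return 100.0 * hourly_o3.std() / hourly_o3.mean()
```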

As expected, the air quality stations located at rural or industrial/rural sites show greater CV values for all performed simulations; values can exceed 50% for the control simulation at Teixugueira. At these sites an enhancement of variability is also observed when constant land use is introduced. Simulated ozone at the urban air quality station of Coimbra is slightly more variable around the mean values when flat terrain is considered. This result agrees with that obtained by Stein and Alpert (1993).

4.2.3 Adjoint Modelling Approach

According to Menut (2003), several sensitivity studies have been performed in which a single perturbation is applied to an input parameter and its influence on the modelled concentrations is diagnosed. These methodologies are powerful for investigating uncertainties. However, such studies provide only limited information for ranking the parameters by magnitude of impact. In addition, no information can be derived on the time and location of the most important contribution of a chosen parameter. Adjoint modelling takes a different approach, since the sensitivity of one pollutant to all parameters is estimated within a single model integration. Menut (2003) focused on the sensitivity of O3, Ox and NOx in the surface layer.


4.2.4 Sensitivity of Model Results to Nesting

Lenz et al. (2000) present a study in which the sensitivity of the results of a high-resolution chemistry transport model is analysed with respect to nesting the model into a larger scale model. The meteorology simulations were performed with the mesoscale transport and fluid model METRAS (Schlünzen, 1990; Schlünzen et al., 1996) and the transport and chemistry simulations with the mesoscale chemistry transport model MECTM (Müller et al., 2000). The model study was performed for the second measuring campaign of the TRACT experiment, which took place in the border area of south-western Germany, north-eastern France and northern Switzerland on 16 September 1992 (Zimmermann, 1995). Airplane measurements of different meteorological and chemical quantities were taken along two flight patterns in the model area. These measurements have been used for comparison with the model results and to assess the importance of model nesting for selected quantities. Cumulative distribution functions were calculated and hit rates with respect to measurement reliability determined. METRAS and MECTM were nested into the larger scale models MM5 (Grell et al., 1993) and CTM2 (Hass, 1991) from the EURAD model system (Ebel et al., 1997). For the prognostic meteorological variables the method of nudging was used (Schlünzen et al., 1996). MECTM is nested into time-dependent boundary conditions interpolated from the trace gas concentrations of CTM2 (Niemeier, 1997; Müller et al., 2000). To determine the sensitivity of the model system METRAS/MECTM to model nesting, four model simulations were performed:

•	Full nesting: nesting of meteorological and chemical variables.
•	Nesting of meteorology, but chemical variables are not nested.
•	Meteorology not nested, but chemical variables are nested.
•	No nesting: neither meteorological nor chemical variables are nested.

In general, the nesting of the meteorological part of METRAS into a larger scale model enhances the precision of the forecast of the meteorological variables. However, the forcing data adopted from the larger scale model must be of good quality, because the nested model METRAS can only partly correct deficiencies contained in the forcing data. In the considered case, the forecast of NOx and O3 concentrations depends more on a correct description of the meteorological boundary values than on the concentration fluxes across the lateral boundaries. The simulated NOx concentrations are rather insensitive to the nesting of trace gas concentrations, which is probably a result of the poor performance of the forcing data in the considered case. For the prediction of O3 concentrations, the nesting of meteorology has to be accompanied by the nesting of chemical variables. Thus, for both gases it is concluded that nesting meteorology is at least as relevant as nesting concentrations. Further sensitivity studies, as well as simulations for other periods, are needed to confirm these findings.*

4.2.5 Sensitivity of UAP Forecasts to Meteorological Input and Resolution†

Numerical weather prediction (NWP) models are increasingly used as providers of meteorological data for urban air quality (UAQ) or urban air pollution (UAP) models. UAQ forecasts are used as a decision-making tool by local authorities. The near-surface wind and temperature fields are the main forcing of UAP models, directly and indirectly, in defining the turbulence regime and parameters. Attempts to improve the input of temperature, wind and turbulence parameters in the boundary layer are evaluated in terms of their effect on the UAQ forecasts.

In the EU FP5 project FUMAPEX‡ (Integrated Systems for Forecasting Urban Meteorology, Air Pollution and Population Exposure, 2002-2005), six partners participated in a mesoscale meteorological model inter-comparison and validation exercise with their distinct model chains. Simulations were performed for 10 different pollution episodes in 4-5 target cities (Helsinki, Oslo, Bologna, Valencia, (Torino)) with varying models and partners. The episodes are characteristic of the regions: winter inversion-induced and spring particle episodes in Helsinki, Oslo and Bologna, and summer ozone episodes in Bologna and Valencia. The models were nested as follows: CEAM RAMS (40, 13, 4.5, 1.5 km), DMI-HIRLAM (5, 1.4 km), DNMI-HIRLAM (10 km) and MM5 (9, 3, 1 km), DWD COSMO-EU (formerly: LM) and ARPA model COSMO-IT (7, 2.8, 1.1 km), FMI-HIRLAM (33, 22 km), and MM5 UH (Univ. of Hertfordshire) (81, 27, 9, 3, 1 km).

Table 9. Horizontal resolution

Torino       RAMS 4 km, 1 km
Valencia     RAMS 9 km, 4.5 km; CAMx 12 km, 4 km; COSMO-EU 7 km/35l, 2.8 km/45l, 1.1 km/45l
Oslo         COSMO-EU 7 km/35l, 2.8 km/45l, 1.1 km/45l
Helsinki     COSMO-EU 7 km/35l, 2.8 km/45l, 1.1 km/45l
Copenhagen   HIRLAM 15 km, 5 km, 1.4 km

* This work was conducted within the BMBF funded tropospheric research programme TFS.
† This work was conducted as part of the FUMAPEX FP5 EU project.
‡ http://fumapex.dmi.dk

The sensitivity of the UAQ forecasts to the NWP model horizontal resolution (Table 9) is examined in the simulations for the FUMAPEX target cities Bologna and Torino (model system RAMS/FARM), Valencia (model systems RAMS/CAMx and COSMO-EU trajectories), Oslo (model systems MM5/AirQUIS, COSMO-EU trajectories and COSMO-EU/LPDM), Helsinki (model systems COSMO-EU trajectories and COSMO-EU/LPDM) and Copenhagen (model system HIRLAM/DERMA) (Ødegaard et al., 2005). Furthermore, the sensitivity of the forecast concentrations to the vertical resolution (Table 10), the forecast length, improved parameterizations and the introduction of an urban surface was investigated.

Table 10. Vertical resolution

Valencia     COSMO-EU 7 km/35l, 2.8 km/45l, 1.1 km/45l
Oslo         COSMO-EU 7 km/35l, 2.8 km/45l, 1.1 km/45l; MM5 17l, 26l
Helsinki     COSMO-EU 7 km/35l, 2.8 km/45l, 1.1 km/45l

The influence of an increase in horizontal (and, in some models, vertical) resolution without adapted urban parameterisations for resolutions below 10 km is largest for the model results interpolated to the station locations. No height corrections were made. For increasing horizontal resolution, the model inter-comparison revealed sharper gradients in the wind and temperature fields, higher maximum wind speeds, increased channelling of trajectories, and increased vertical velocities in steep terrain. The last feature might be an artefact resulting from a model problem. The impact can be summarised for the different types of stations as follows:

For coastal stations (Helsinki / Valencia): Improvements are mainly due to a better land/sea description of the coastline and the associated soil type distribution, which affects the surface fluxes via its specific thermal and hydrological characteristics. Similar effects may also occur at inland stations, but with smaller impact. A substantial influence of varying surface properties and physiographic parameters with increased resolution near the coast is reported for FMI-HIRLAM, DMI-HIRLAM and DWD COSMO-EU (Fay et al., 2004, 2005).

For mountainous areas (Oslo / Valencia / Bologna): Large impacts and often some improvement (e.g. for 2 m temperature and 10 m wind) at station locations are mainly due to the more detailed orography, leading to improved topographic effects like blocking, shading, flow around mountains, increased channelling in valleys, more clearly defined convergence/divergence lines, improved foehn simulations and mesoscale circulations. The changes/improvements are very distinct when looking at horizontal model fields of temperature, wind speed and wind direction.

In all other cases, the impact of an increased model resolution on the investigated parameters is small for surface parameters, but may increase for vertical profiles of meteorological parameters or for parameters involving some height integration (e.g. total cloud cover, planetary boundary layer height). Thus, the general results of previous studies on increasing model resolution (as reviewed e.g. in Mass et al., 2001) are confirmed for the mesoscale models participating in FUMAPEX and the considered urban regions and air pollution episodes. The results are described in detail in Fay et al. (2004, 2005, 2006) and Fay and Neunhäuserer (2006).

Increasing the vertical resolution did not yield such clear results, since in most experiments horizontal and vertical resolution were changed simultaneously. To study the impact of vertical resolution, clean tests are required; otherwise it is difficult to identify which change has the largest impact, and on which results. A higher vertical resolution of the air pollution model also increases the complexity of the input data, since dispersion depends heavily on the height of the emissions.

The forecast length was 48 hours for all cities except Torino, where it was 72 hours. The general NWP result that the forecast error increases with forecast length is not valid for meteorological input to UAQIFS (urban air quality information and forecasting systems). Instead, the error level depends on the meteorological processes to be described. Urban processes are difficult to reproduce with NWP simulations on short and long time scales. Initial errors and spin-up problems are relevant at high resolution; the initial imbalances are due to differences in resolution between the host model analysis and the UAQ model.

To study the impact of PBL parameterizations on concentrations, a strong inversion case was studied for Oslo. The PBL schemes tested all seem capable of doing their job, i.e. in stable cases they guarantee small (zero) vertical exchange of heat and momentum. This finding is based only on MM5 results; a generalisation still needs to be performed.


5. MODEL QUALITY INDICATORS

The statistical analysis to evaluate model performance and to estimate uncertainties comprises a set of parameters giving information about the ability of the model to predict the tendency of the observed values, about errors in the simulation of average and peak observed values, and about the type of errors (systematic or unsystematic).

5.1 Quality Indicators for Evaluating Meteorological Parameters

Viel Ødegaard
Det Norske Meteorologisk Institutt, Blindern, Oslo, Norway

Evaluation of meteorological parameters as simulated by mesoscale models should be performed by comparison with observations. Annex D summarizes the most commonly used evaluation measures for meteorological parameters. The availability of observations is limited with respect to spatial resolution and number of parameters. Moreover, an observation is not necessarily representative of the area surrounding the observational site. When running mesoscale models, one important aim is to obtain simulations that are closer to the observed extreme (peak) values than the smoother simulations from synoptic scale NWP models. Statistical measures, however, often favour smooth fields. These problems are discussed in detail below.

5.1.1 Observation Availability

Mesoscale models output a considerable amount of information on meteorological parameters, including parameters that are not observed. Evaluation of models is inevitably limited to those parameters which are observed. Common parameters observed by ground observation networks are mean sea level pressure, 2 m temperature, 10 m wind, 2 m relative humidity and rainfall. From manual observation sites, cloud coverage, fog and snow depth/snow coverage are often available as well. Precipitation and wind can be derived from radars. Surface (sea surface) temperature, wind direction, cloud cover, cloud top temperature and snow cover are commonly derived from satellite data. Observations can be based on the stationary ground network of synoptic stations, or they can be supplied by remote sensing instruments, including satellite-borne ones. The advantage of remotely sensed observations, such as radar or satellite measurements, is usually their higher spatial resolution compared to the ground network. Horizontal resolutions down to 1 km are available from many data sources. These data sources will play an important role in future observation systems, and great effort is being made within the meteorological community to make optimum use of these data. For evaluation purposes the main challenge lies in the interpretation of the monitored variables in terms of meteorological parameters, e.g. how to derive air temperature from radiation data, and in developing reliable methods for estimating precipitation amounts from radar reflections.

5.1.2 Observation Error

The ground observing network supplies data which can be directly compared to model simulations. However, the ground network has a spatial resolution far lower than that of mesoscale models. In addition, the data are point-wise and, in general, not representative of the larger surrounding area. Knowledge of the observation error (including the measurement error and the error due to the representativeness of the measurements) is important for data assimilation as well as for model evaluation. To reduce the measurement error, data control routines are used to remove values outside the accepted range. The accepted range must cover all possible observation values and should preferably be derived from long time series of observations (several years) at the specific location. The accepted range should also be physically sound; e.g. relative humidity should be in the range 0-100%, and the dew point temperature has a range related to the temperature itself and should therefore be expressed as dew point depression: the dew point temperature is always equal to or lower than the ambient temperature.

The observation error also includes the representativeness of the measurements. WMO rules for the set-up of meteorological sites ensure that the sites are representative of the surroundings (a few kilometres) and of a certain time interval (10-minute averages). Local influences like trees, buildings or urban growth, the latter being especially relevant for long-term analyses of data, may affect the measurements. As a result of shading or increased insolation, e.g. in steep terrain, temperature can have a very local representativity. Within urban areas most measurements are of local representativity only: meteorological data are strongly influenced by buildings, while concentration data additionally depend on the emission distribution, which is of a very local nature in urban areas.

5.1.3 Recommended Quality Indicators for Different Meteorological Parameters

The quality indicators are described in Annex D. Special problems arising from different event timings in model results and measurements are discussed in Annex F.

5.1.3.1 Mean Sea Level Pressure

Air quality is not directly related to mean sea level pressure (mslp). The parameter is, however, important for establishing the correct weather regime and thus determines many other parameters to which air quality is sensitive. The quality of mslp forecasts is a general model quality indicator. RMSE and STDE are widely used and compared. An overall measure of model performance is the hit rate H using an allowed deviation of e.g. 1.7 hPa (Table 24). Caution should be exercised with the BIAS of mslp, since mslp is not observed directly: the pressure is reduced from observation height to sea level using, among other variables, the temperature, and the model mslp is reduced similarly. One source of BIAS is an inconsistency between these reduction procedures. Similarly, values of H might be small if the error caused by the pressure reduction is large.

5.1.3.2 Wind Speed

Mesoscale models tend to have problems capturing the full amplitude of the wind speed (ff). An overall measure of model performance is the hit rate H using e.g. an allowed deviation of 1 m s-1 (Annex D). However, air pollution episodes occur in low wind speed cases, which need to be evaluated separately. To evaluate the model's ability to capture low wind speed cases, the observations of these cases should be treated separately in the statistical computations. BIAS, hit ratio HR and false alarm ratio FAR can be calculated for wind speeds in the interval 0 to 1.5 m s-1. Ozone episodes frequently occur in high pressure situations with a sea breeze circulation, which has a diurnal cycle. Evaluation of the diurnal cycle in the model can utilize the BIAS calculated for each time of the day. When very strong winds are observed, the absolute value of the error is often large. Normalized measures are therefore recommended for the evaluation of long time series which include all observations and where the extreme events are not in focus.

5.1.3.3 Wind Direction

An overall measure of model performance for wind direction (dd) is the hit rate H, using e.g. an allowed deviation of DA=30° (see Annex D). This value was suggested by Cox et al. (1998) for the evaluation of weather forecast models. It is relatively large for air quality applications and should be reduced to DA=10° for those, if the averaging interval of the comparison data is of several tens of minutes. When using the BIAS for wind direction, the wind observations must be sorted by direction, e.g. into eight sectors. In areas with complex orography and surface properties, the BIAS will tend to be pronounced for some wind directions, in particular those arising from surface properties that are not resolved in the applied model. In complex orography it is possible that most of the wind cases are distributed over different directions in the observations and in the model data. The wind vector error is therefore a better measure; for this purpose the direction weighted wind error DIST can be used.

5.1.3.4 Temperature

Air pollution episodes are not forced by the temperature itself but by the vertical (and horizontal) temperature gradient. Intense air pollution episodes occur when there is a temperature inversion, and the height of the inversion level is the crucial parameter. Evaluation of the inversion level requires temperature observations at several levels, e.g. from observation towers or radio soundings. These observations are limited and have insufficient coverage: the vertical resolution of radio sounding data is often too coarse, and there are few towers of sufficient height to capture inversion levels. The hit ratio (HR) and false alarm ratio (FAR) of near-surface inversions are a possible measure. Air temperature at 2 m (T2m) is strongly tied to surface properties, and the model error can vary considerably in areas with complex surface properties. The BIAS calculated separately for e.g. coastal, urban, mountainous or forested areas points directly to model deficiencies and to specific corrections of model errors. The strong influence of local conditions on the temperatures often gives a standard deviation of error (STDE) of the same magnitude at the end of the forecast as at the beginning, due to model deficiencies in capturing the local conditions. The sea breeze regime results in a diurnal cycle of the temperature which is as pronounced as the diurnal cycle of the wind; the diurnal cycle of the BIAS reveals model deficiencies. Normalized measures should be avoided for temperature, since an error in high temperature cases should count the same as an error in low temperature cases. This can be ensured by using hit rates H with an allowed deviation of 2 K (Annex D).

5.1.3.5 Cloud Cover

Air pollution episodes occur in clear and cloudy conditions, but radiative processes like ozone production are active in direct sunlight. Cloud cover is a parameter with relatively large errors in NWP models as well as in measurements. Comparisons of model quality by RMSE, STDE and BIAS are often used to gain information on the model prediction of cloud cover.

5.1.3.6 Humidity

Air quality modelling is not very sensitive to atmospheric humidity. Evaluation should be performed on the dew point temperature, since this parameter is less sensitive to error than the temperature itself. An overall evaluation of model performance should be made using the hit rate (H) with an allowed deviation of 2 K (Annex D). Note that the dew point temperature is not a directly calculated parameter, because the saturation vapour pressure is a non-linear function of temperature. If the horizontal variation of humidity is large, the model error tends to be large. As for temperature, the dependence of the humidity and the humidity error on surface properties is strong. Important surface properties that influence the observations might be unresolved in the model; this is easily recognized when inspecting the BIAS at sites where surface properties vary.

5.1.3.7 Precipitation

Air quality is highly dependent on the scavenging effect of precipitation. An adequate evaluation measure is the false alarm ratio FAR of rain/no rain (Annex D). If the model tends to produce too smooth precipitation fields or the event is rare, FAR is close to 1; focusing on FAR alone would thus result in very similar values for very different reasons. As an additional measure, HR or H can provide valuable insight into the model performance. The lower limit for categorizing rain must be high enough that the rainfall can be effective in washing out pollutants and binding dust; 1 mm / 6 hours is perhaps a reasonable value. However, the spatial representativity of precipitation data is very small when using rain gauge data. Therefore, quantitative comparisons with these data should only be performed for at least monthly values (95% of measurements within a factor of 1.4; Annex C). As for wind speed, the error tends to be large when the observed amounts are large; therefore, normalized values should be used when calculating BIAS and RMSE on long time series.
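The exact formulations of these indicators are given in Annex D (not reproduced here); the sketch below therefore uses the common textbook forms of BIAS, RMSE, STDE, the hit rate H, the hit ratio HR and the false alarm ratio FAR, plus the smallest-angle difference for wind direction, as assumptions:

```python
# Common meteorological quality indicators; P = predicted, O = observed
# (1-D numpy arrays of paired values).
import numpy as np

def bias(P, O):
    return np.mean(P - O)

def rmse(P, O):
    return np.sqrt(np.mean((P - O) ** 2))

def stde(P, O):
    return np.std(P - O)              # standard deviation of the error

def hit_rate(P, O, DA):
    """Percentage of pairs with |P - O| within the allowed deviation DA,
    e.g. DA = 2 K for temperature, 1 m/s for wind speed, 30 deg for dd."""
    return 100.0 * np.mean(np.abs(P - O) <= DA)

def dd_error(P_dd, O_dd):
    """Smallest angular difference for wind direction (degrees)."""
    d = np.abs(P_dd - O_dd) % 360.0
    return np.minimum(d, 360.0 - d)

def hr_far(P_event, O_event):
    """HR and FAR for a binary event, e.g. rain >= 1 mm / 6 h
    or wind speed in the interval 0 to 1.5 m/s."""
    P_event, O_event = np.asarray(P_event, bool), np.asarray(O_event, bool)
    hits = np.sum(P_event & O_event)
    misses = np.sum(~P_event & O_event)
    false_alarms = np.sum(P_event & ~O_event)
    HR = hits / (hits + misses) if hits + misses else np.nan
    FAR = false_alarms / (hits + false_alarms) if hits + false_alarms else np.nan
    return HR, FAR
```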


5.2 Quality Indicators for Air Quality Model Evaluation

Ana Isabel Miranda, Alexandra Monteiro, Helena Martins, Carlos Borrego
CESAM & Department of Environment and Planning, University of Aveiro, 3810-193 Aveiro, Portugal

This section presents a collection of quality indicators currently used in the evaluation of concentration values calculated with air quality models, together with examples of their application. Part of this compilation and analysis work was conducted within the scope of the European project Air4EU (Borrego et al., 2006). The methods for statistical model evaluation discussed in this section are the most commonly used statistical parameters, also used by the EPA, and are incorporated in the EU Framework Directive.

5.2.1 Statistical Parameters for Concentrations

The evaluation of air quality models and the development of general evaluation methods have been discussed by many scientists. However, standard evaluation procedures and performance standards still do not exist. Traditionally, model predictions are directly compared to observations, but this may yield misleading results, because uncertainties in observations and in model predictions arise from different sources (Chang and Hanna, 2004). Hanna et al. (1993) recommended a set of quantitative statistical performance measures for evaluating models, which have been widely used in many studies and have been adopted as a common European model evaluation framework (Olesen, 2001). Recently, the hit rate (H) (Trukenmüller et al., 2004; VDI, 2005) has been added to these measures (Olesen, 2007). The main statistical parameters used as quality indicators are presented in Annex E. The parameters defined in Annex E are not exhaustive; others can be defined and used according to the purpose and emphasis of the study. Multiple performance measures should be applied and considered in any model evaluation exercise, as each measure has advantages and disadvantages and no single measure is universally applicable to all conditions.

Since the distribution of most atmospheric pollutant concentrations is close to lognormal, the Gaussian-distribution-based measures fractional BIAS (FB) and normalized mean square error (NMSE) may be overly influenced by infrequently occurring high observed and/or predicted concentrations, whereas the logarithmic measures geometric mean BIAS (MG) and geometric variance (VG) may provide a more balanced treatment of extreme high and low values. MG and VG may be overly influenced by extremely low values near the instrument thresholds and are undefined for zero values. The factor of two (FAC2) is the most robust measure, as it is not overly influenced by outliers. FB is a measure of mean relative BIAS, and both FB and MG indicate only systematic errors, whereas NMSE is a measure of mean relative scatter, and both NMSE and VG reflect systematic and unsystematic (random) errors. The correlation coefficient (r) reflects the linear relationship between two variables and is thus insensitive to an additive or a multiplicative factor. The hit rate (H) is a measure independent of the error distribution and is the only error measure that can consider both absolute and relative measurement uncertainty, by selecting corresponding values for W and A (Annex E). Elbir (2003) proposed a statistical analysis that includes the index of agreement (IOA), which determines the degree to which the magnitudes and signs of the observed deviations about the mean observed value are related to the predicted deviations about the mean predicted value, and which allows for sensitivity towards differences in observed and predicted values as well as proportionality changes. IOA varies from 0.0 (theoretical minimum) to 1.0 (perfect agreement between observed and predicted values) and gives the degree to which model predictions are error-free. Schlünzen and Meyer (2007) and Trukenmüller et al. (2004) used in their evaluations hit rates (H_EU) based on the accuracy requirements defined in the EU daughter directives on air quality (Section 5.2.3). In contrast to the EU directives, however, they kept the timing in their evaluation. When all values are within the allowed difference, H_EU is 100%.
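The following is a sketch of the measures named above in the forms commonly used in the literature (e.g. Chang and Hanna, 2004; Willmott's index of agreement); the report's Annex E definitions may differ in detail:

```python
# Common air quality evaluation measures; P = predicted, O = observed
# concentrations (numpy arrays, positive values required for MG and VG).
import numpy as np

def fb(P, O):    # fractional BIAS (systematic error only)
    return (O.mean() - P.mean()) / (0.5 * (O.mean() + P.mean()))

def nmse(P, O):  # normalized mean square error (systematic + random)
    return np.mean((O - P) ** 2) / (O.mean() * P.mean())

def mg(P, O):    # geometric mean BIAS (undefined for zero values)
    return np.exp(np.mean(np.log(O)) - np.mean(np.log(P)))

def vg(P, O):    # geometric variance
    return np.exp(np.mean((np.log(O) - np.log(P)) ** 2))

def fac2(P, O):  # fraction of predictions within a factor of two
    ratio = P / O
    return np.mean((ratio >= 0.5) & (ratio <= 2.0))

def ioa(P, O):   # index of agreement, 0 (worst) to 1 (perfect)
    denom = np.sum((np.abs(P - O.mean()) + np.abs(O - O.mean())) ** 2)
    return 1.0 - np.sum((P - O) ** 2) / denom
```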


5.2.2 EPA Quality Indicators

EPA (1996) presents a compilation of a series of photochemical model simulations and validation exercises conducted within the United States. Validation focuses on the models' ability to predict the domain-wide peak ozone concentration and the concentrations at all locations with observed ozone concentrations above 60 ppb. These quality indicators, valid only for ozone, are described in Table 11 together with acceptable values; the latter are merely indicative, because they were defined on the basis of the tests performed.

Table 11. EPA quality indicators for air quality model performance evaluation regarding ozone (P = predicted value, O = observed value, N = number of values)

Parameter                                                      Formula                            Acceptable values
Normalized accuracy of the maximum 1-hour concentration,       Au = (Pmax - Omax) / Omax          ± 15-20%
unpaired in space and time
Mean normalized BIAS of all predicted and observed             MNB60 = (1/N) Σ (Pi - Oi) / Oi     ± 5-15%
concentration pairs with Co > 60 ppb
Mean normalized gross error of all predicted and observed      MNG60 = (1/N) Σ |Pi - Oi| / Oi     ± 30-35%
concentration pairs with Co > 60 ppb

This group of parameters complements the measures in Section 5.2.1, since it evaluates the model's capability to simulate peaks, which is particularly important for the evaluation of atmospheric pollution episodes, as described in the example in Section 5.2.4.1.
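The following is a sketch of the three Table 11 indicators; applying the 60 ppb cut-off to the observed values follows the table's wording, and all array names are illustrative:

```python
# EPA ozone performance indicators of Table 11.
import numpy as np

def epa_ozone_indicators(P, O, cutoff=60.0):
    """P, O: paired hourly ozone values (ppb). Returns Au, MNB60, MNG60."""
    au = (P.max() - O.max()) / O.max()           # peak accuracy, unpaired
    m = O > cutoff                               # pairs with Co > 60 ppb
    mnb60 = np.mean((P[m] - O[m]) / O[m])        # mean normalized BIAS
    mng60 = np.mean(np.abs(P[m] - O[m]) / O[m])  # mean normalized gross error
    return au, mnb60, mng60
```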

5.2.3 EU Directives Modelling Quality Objectives

The Air Quality Framework Directive (FWD) sets a general policy framework for ambient air quality. For this purpose, a set of long-term objectives for air quality is established by the legislation. Monitoring and modelling are identified as air quality management tools, and the uncertainty of the monitoring data and modelling results is one of the essential issues of the FWD. The FWD and Daughter Directives establish requirements for air quality modelling, including the definition of the Modelling Quality Objectives as a measure of the acceptability of modelling results. In this context, the uncertainty for modelling and objective estimation is defined as the maximum deviation of the measured and calculated concentration levels over the period for calculating the appropriate threshold, without taking into account the timing of the events. The quality objectives defined for each quality indicator are listed in Table 12.

Table 12. Modelling Quality Objectives established by European Directives

Pollutant        Quality Indicator        Quality Objective   Directive
SO2, NO2, NOx    Hourly mean              50-60%              1999/30/EC
                 Daily mean               50%
                 Annual mean              30%
PM10, Pb         Annual mean              50%                 1999/30/EC
CO               8-hour mean              50%                 2000/69/EC
Benzene          Annual mean              50%                 2000/69/EC
Ozone            8-hour daily maximum     50%                 2002/3/EC
                 1-hour average           50%

Model quality measures described in the above EU Directives have been interpreted as the relative maximum error without timing (RME), which is the largest of all percentile (p) concentration differences, normalized by the respective measured value:

RME = max_p ( |Pp - Op| / Op )    (1)

The question of timing is relevant for those target values defined as a number of allowed exceedances of a given threshold concentration. Moreover, the model quality objectives for the allowed uncertainty are given as a relative uncertainty, without clear guidance on how to calculate this relative uncertainty. It can be assumed that the respective measured value is used to normalize the absolute difference between the measured and calculated concentration levels. Another possibility would be to take the maximum relative deviation, but this approach could shift the emphasis to the very low measured concentration ranges, where the largest relative deviations between observations and calculations usually occur; this could then be the main reason for non-compliance with the uncertainty requirements for annual mean values. A further interpretation problem of the model uncertainty requirements is that no distinction is made between short-term and long-term model application uncertainty analyses, the former being at an advantage due to the larger number of paired-in-time results. An alternative model error measure was proposed by Stern and Flemming (2004), defining the quality indicator as the concentration difference at the percentile p corresponding to the allowed number of exceedances of the limit value, normalized by the observation (Relative Percentile Error, RPE):

RPE = |Pp - Op| / Op    (2)

where p is the percentile corresponding to the allowed number of exceedances of the limit value.

This measure is more robust than the error defined in the EU Directive, since it also evaluates the model performance in the high concentration ranges, but without the sensitivity to outliers. Since eq. (2) examines the model uncertainty in the concentration range of the limit values, there is also a direct link to the EU Directives.
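A sketch of eqs. (1) and (2) follows; interpreting "all percentile differences" as a comparison of sorted (unpaired-in-time) distributions is an assumption consistent with the "without timing" definition above:

```python
# Relative maximum error (RME, eq. 1) and relative percentile error
# (RPE, eq. 2) from predicted (P) and observed (O) concentration series.
import numpy as np

def rme(P, O, percentiles=np.arange(1, 100)):
    Pp = np.percentile(P, percentiles)
    Op = np.percentile(O, percentiles)
    return np.max(np.abs(Pp - Op) / Op)

def rpe(P, O, p):
    """Relative error at the percentile p corresponding to the allowed
    number of exceedances of the limit value (e.g. p = 99.73 for SO2)."""
    return abs(np.percentile(P, p) - np.percentile(O, p)) / np.percentile(O, p)
```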

5.2.4 Application Examples

5.2.4.1 Application of Quality Indicators to Portugal

In order to test and illustrate these model quality measures, a one-year simulation of the chemistry-transport MODEL 1 was used. MODEL 1 was applied in regional scale mode, covering Portugal with a resolution of 10 km for the entire year 2001 (Borrego et al., 2005). The model results were compared with measured data from 23 sites of the national air quality monitoring network according to the EU directive thresholds. Table 13 presents the average of the relative maximum error (RME) and of the relative error at the percentile corresponding to the allowed number of exceedances of the limit value threshold (RPE), for the background sites and for all monitoring sites, for each pollutant indicator defined by the EU Directives.

Table 13. Average of RME and RPE for the background and all the monitoring sites, for each pollutant indicator defined by the EU Directives

Pollutant  EU Directives indicator                           RME (%)*  Percentile (P)                   RPE (%)*  RPE (%)**
SO2        Human health protection (hourly mean)             79        99.73 (25th max 1h mean)         34        40
SO2        Human health protection (daily mean)              66        99.18 (4th max 24h mean)         57        69
SO2        Vegetation protection                             33        annual mean                      33        46
SO2        Vegetation protection                             44        winter mean                      44        58
NO2        Human health protection (hourly mean)             81        99.79 (19th max 1h mean)         39        48
NO2        Human health protection                           47        annual average                   47        50
O3         Human health protection (8h running daily mean)   69        93.15 (26th max 8h daily mean)   16        35
O3         Vegetation protection                             71        AOT40                            49        65

* considering only background monitoring stations; ** considering all monitoring stations

Concerning the hourly and daily average indicators, the analysis of the relative maximum error (RME) defined by the EU directives reveals that it is calculated at the highest measured value. In these cases the assessment of the model uncertainty depends on the model performance in a concentration range having an extremely small probability. This also means that the model uncertainty assessment could be based on an outlier concentration caused by an error of the monitoring unit or by an extreme weather situation. In fact, and in contrast to the RME (eq. 1), the error measure RPE (eq. 2) shows almost complete compliance with the legislative uncertainty requirement of 50% for all pollutant indicators. These conclusions are in agreement with other model evaluation studies of similar or even higher complexity (Stern and Flemming, 2004; Hass et al., 2003). The analysis of Table 13 also reveals the problem of the heterogeneity of the observed concentration fields and the importance of selecting monitoring sites that are adequate and representative for the model resolution, since it is impossible for a grid model to simulate all stations with the required accuracy. The same methodology for estimating the model quality measures of the EU Directives should be applied in the case of local scale applications. Restrictions to the application of this methodology will appear for models with feasible temporal applications of only several days, and for pollutants with averaging periods of one year, such as PM, Pb and benzene.

Comparing the EPA performance measures of two different air quality models (referred to as MODEL 1 and MODEL 2) reveals differences in performance. The models were applied to an ozone episode that occurred in Portugal from 27 to 31 May 2001. During this period, exceedances of the O3 limit value (180 µg m-3) were registered at 5 air quality monitoring stations in Portugal, three of them considered background stations and two located in industrial areas. Table 14 presents the calculated EPA quality indicators.

Table 14. EPA quality indicators obtained for the MODEL 1 and MODEL 2 simulations

             Average for all stations      Average for background stations
Parameter    Model 1      Model 2          Model 1      Model 2
Au           18.0         46.6             10.1         26.5
MNB60        -0.8         0.1              -1.1         0.1
MNG60        0.0          0.1              0.0          0.1

The results show that, while the values of the quality indicators are acceptable for Model 1 (Table 14), Au is not acceptable for Model 2.

5.2.4.2 Application of EU Quality Indicators for the Southern North Sea Region

Schlünzen and Meyer (2007) investigated the impact of the meteorological situation and chemistry on dry deposition to the southern North Sea using the high-resolution model system M-SYS, which employs METRAS (meteorology) and MECTM (chemistry). For the evaluation of the METRAS meteorology results, hit rates (H) and correlation coefficients (r) are calculated (Table 15; values from Schlünzen and Meyer, 2007). The values for the desired accuracy criteria DA of the meteorological variables were taken as given in Table 26 (Annex D). The column SK gives the results of Schlünzen and Katzfey (2003) for comparison. Temperature and dew point temperature as well as wind direction are well simulated, while wind speed agrees less well (Table 15). The hit rates for wind speed ff are between 27% and 39%. These values are in the same range as the hit rates found by Cox et al. (1998), who obtained hit rates for ff of 22-41% for 12-26 h forecasts.

For the same case, Schlünzen and Meyer (2007) calculated hit rates H_EU for the concentrations, using the accuracy requirements defined in the EU daughter directives on air quality (European Communities, 1999, 2002; Table 12). Based on the directives, the maximum deviation of the measured and calculated concentration levels must not exceed 50-60% of the hourly limit value for sulphur dioxide (SO2: hourly limit value 350 μg m-3) and nitrogen dioxide (NO2: hourly limit value 200 μg m-3), and 50% of the hourly threshold value for ozone (O3: information hourly threshold value 180 μg m-3). Hit rates based on 15% of the maximum measured values were additionally calculated, to obtain more detailed information on the error spread (Table 16). H_EU values are 100% except for NO (H_EU = 99%) and O3 (H_EU = 94.8%). For the evaluation the timing was kept. H_15 values are in the same range as found for meteorological data. BIAS values range from about 10% of the measured mean values (SO2, ozone) to 50% (NH3). NH3 maximum values show large differences, probably a result of too low emissions or of missing transport from the NH3-emitting regions of western Germany bordering the eastern part of the model domain.

Table 15. Correlation coefficients and hit rates (in %) for the variables T (temperature), Td (dew point temperature), ff (wind speed) and dd (wind direction), based on routine meteorological data (RD) and two specially equipped field sites (WAO, MPN; de Leeuw et al., 2003)

      Correlation coefficient r               Hit rate H
      RD      WAO     MPN      DA             RD      WAO     MPN     SK
T     0.91    0.95    0.89     2 ºC           86      77      89      73
Td    0.85    0.91    0.94     2 ºC           76      87      94      79
ff    0.26    0.21    0.64     1 m s-1        39      34      27      58
dd    0.74    0.78    0.80     30º            67      74      64      63

Table 16. Number of measurements (No), mean of the measurements (MeMean in μg m-3), BIAS of measured and simulated data (BIAS in μg m-3), maximum measured value (MeMax in μg m-3) and maximum modelled value (MoMax in μg m-3), hit rate (H_EU in %) with desired accuracy DA_EU (in μg m-3) defined as 50% of the threshold values given by the EU directives, and hit rate (H_15 in %) with desired accuracy DA_15 (in μg m-3) defined as 15% of MeMax

       No     MeMean   BIAS   MeMax   MoMax   DA_EU   H_EU   DA_15   H_15
NO     3531   8.0      3.1    247     196     100     99     37      91
NO2    4697   21.8     5.4    100     128     100     100    15      64
NH3    561    6.8      3.3    81      22      100     100    12      89
SO2    5000   8.7      0.9    133     108     175     100    20      96
O3     4777   51       -5.5   185     194     90      94.8   28      38

6. VALIDATION DATASETS

John Douros(1), Kristina Labancz(2), Nicolas Moussiopoulos(1)
(1) Aristotle University, Thessaloniki, Greece
(2) Hungarian Meteorological Service, Department for Atmospheric Environment, Budapest, Hungary

The issue of model evaluation is of particular importance both for research model applications and for air quality management applications. Discussions of model performance often revolve around the definitions of several core concepts such as "validation" or "verification". Model evaluation consists of a number of elements, including the usefulness and reliability of the model and its results. For a model to be useful, it must reflect the behaviour of the real-world atmospheric processes being simulated with a pre-defined level of accuracy that is acceptable for the intended purpose of use. A model is regarded as reliable if the implementation of the calculations involved reproduces the conceptual model of the system to be simulated. Model evaluation is necessary in order to identify the strengths and weaknesses of a model and to assess the efficiency of its use in providing realistic results for air quality management and assessment. Therefore, before any model is applied for a particular research or management purpose, it must be evaluated so that its suitability for the specific application is ensured. While validation incorporates the more traditional elements of model testing, e.g. comparisons with analytic solutions or more qualitative evaluations of the model behaviour, the evaluation process involves as a necessary step the comparison of model results with observations using quantitative measures. The measured values intended to be used for model evaluation are referred to as "validation datasets" when they are produced and used specifically within a foreseen model evaluation procedure.

6.1 Model Validation Datasets and Selection Criteria

Model validation datasets are produced within model development laboratories (e.g. in wind tunnels), within field experiments dedicated to producing validation datasets to check the performance of models for a specific model application, or they are derived from monitoring datasets. Several issues arise in choosing the appropriate model validation datasets for particular evaluation purposes. For each specific case, the required data completeness (suitable size, temporal and spatial coverage, minimum number of data gaps and consideration of any compilation procedures that may have caused data to be eliminated), quality and accuracy have to be specified. These requirements vary according to the intended model application, as well as the model properties, such as model scale and parametrizations. Although some requirements may differ depending on the application, the requirement for Quality Assurance and Quality Control (QA/QC) of the produced datasets is applicable in all cases. This is because all validation datasets should satisfy some general quality criteria in order for the model evaluation exercise to be realistic and useful to the modelling community. QA/QC assures that the relevant measurements meet some pre-defined standards of quality with a stated level of confidence. It should be emphasised that the function of QA/QC is not to achieve the highest possible data quality; rather, it is a set of activities enabling the measurements to comply with the specific Data Quality Objectives of the particular monitoring programme. The main parts of the Quality System are:

Quality assurance (QA): the management of the activities within the data acquisition, and the setting of overall objectives and criteria.
Quality control (QC): the procedures of the day-by-day operations and data validation.
Quality assessment: the external validation of the implementation of the quality system.

Assuming that these procedures are followed during the generation of the validation dataset, the data reported for a particular monitoring station will have a stated level of accuracy and precision, a specified area of representativeness, and a sufficient time coverage, as defined by the Data Quality Objectives (DQO). The EU Air Quality Directives* specify DQO and certain data quality related requirements. DQO requirements are given for:

http://www.europa.eu.int/comm/environment/air/ambient.htm


•	Minimum accuracy and data capture for monitoring, as well as for modelled data and objective estimation.
•	Location of monitoring stations.
•	Minimum number of stations.
•	Reference monitoring methods.

Therefore, before the generation of a validation dataset, DQO for accuracy, precision, data capture and time coverage should be defined, which must comply with the EU AQ Directives and with the evaluation objectives. Site selection criteria for the location of the monitoring stations must be established taking into account the nature of the particular campaign. Moreover, a documented calibration programme and a data validation procedure complying with Decision 97/101/EC (EC, 1997) should be followed, in order to ensure that the quality criteria are met. Although the definitions of "quality" and "accuracy" of the validation dataset may vary slightly depending on the specific application purpose and user group, in general accuracy refers to the closeness of the results of observations to the true values (or the values accepted as being true). This implies that observations of most spatial phenomena are usually only considered to be estimates of the true value. Quality is a wider term that includes accuracy, and can simply be defined as the fitness for purpose of a specific dataset. Data that are appropriate for evaluating a model for one application may not be suitable for evaluating the model for another application. Therefore, the definition of acceptable data quality varies depending on the scale, accuracy and extent of the dataset, as well as on the quality of other datasets to be used. Five components are most commonly associated with data quality definitions:

•	Lineage.
•	Positional accuracy.
•	Numerical accuracy.
•	Logical consistency (data should be presented in a consistent and unambiguous form).
•	Completeness.

The difference between observed and true (or accepted as being true) values indicates the accuracy of the observations. All model validation datasets should be adequately documented, and their availability and relevant contact details should be provided in the documentation. It is also important to include any information relevant to the error status of the data (such as statistical error indices), so that the user is able to make a subjective statement on the quality and reliability of the data, and thus a scientifically based decision on the suitability of the data for the intended application. Model validation datasets independent of those used for calibration should be employed for model evaluation. Every effort should be made to evaluate the model across the range of conditions for which it will be run. Model evaluation and the analysis of model errors must be undertaken for the key variables required from the modelling study. It has to be noted that within a project or team, a cycle of information exchange and feedback needs to be developed among model developers, technology developers and technology appliers, in spite of the difficulty of sharing proprietary information.

6.2 Mesoscale Model Validation Datasets and COST728

Several European projects and working groups have produced model validation datasets as well as model validation databases that are available upon request to other project members, academic institutions or authoritative bodies for model assessment and model inter-comparison studies. The present action, COST728, provides a framework for the use of model validation datasets. By reference to existing model results, the strengths and weaknesses of current approaches and common successes or failures (if any) will be established within COST728. WG4 in particular aims at developing tools and methodologies that can be applied to evaluate mesoscale meteorological models for air pollution and dispersion applications. Within the scope of this aim, a meta-database has been compiled (Figure 1) that includes information on available, well-documented air quality and meteorological datasets, produced or used in earlier projects (http://pandora.meng.auth.gr/mqat/). The meta-database is an ongoing activity; it is envisaged as a valuable data source for model users and demonstrates the wide range of projects related to or meant to support model evaluation. Its sustainable operation and frequent updating are therefore of major importance for the modelling community world-wide. The model validation datasets established within COST728 WG4 (from, e.g., FUMAPEX, CITY-DELTA, ESCOMPTE, MESOCOM, VALIUM) and elsewhere (e.g. the EUMetNet Short Range Numerical Weather Prediction programme, SRNWP) will then be reviewed to highlight, amongst others, which model parametrizations are thought to be most critical in each case. Some examples of other projects and initiatives on model validation datasets are summarized in Annex I; the table there presents the datasets produced and/or used particularly for model evaluation purposes that are currently uploaded into the meta-database.

Similar objectives as for the COST728 meta-database are also behind JRC's DAM meta-database†. The main purpose of JRC's DAM meta-database is to facilitate access to valuable information on available datasets for any model developer or user who intends to evaluate his/her modelling tool. DAM is intended to be an interface between modellers and the information available through existing web sites or contact points. While DAM cannot be considered capable of answering every possible type of request from model users or developers, it aims at being as complete as possible; continuous updating is as important here as it is for the COST728 meta-database.

6.3 Other Efforts for the Harmonisation and Standardisation of Validation Datasets

A notable effort to harmonise model evaluation studies using the same evaluation datasets has been undertaken by the Harmonisation initiative‡. In 1991, a European initiative was launched to enhance cooperation and standardisation of atmospheric dispersion models for regulatory purposes. The initiative responded to the increased need for new dispersion models with advanced parametrizations to be developed in a well-organized manner and to emerge as practical, generally accepted tools to be used by policy and decision makers. In this context, a series of workshops has been organised within the initiative during the last 10 years to promote the use of new-generation models within atmospheric dispersion modelling, and to improve the "modelling culture". A central activity of the Harmonisation initiative, closely related to the conferences, is the development and distribution of the so-called Model Validation Kit. The Model Validation Kit is intended to be used for the evaluation of atmospheric dispersion models. It is a collection of four field datasets together with suitable software for incorporating the data in a model evaluation. The Kit is recommended as a practical tool serving as a common frame of reference for model performance evaluation. It addresses the classic problem of dispersion from a single point source. The package was updated to Version 2.0 in October 2005, and an extensive set of web pages, from which the Kit can also be downloaded, provides details on its contents§. However, the results from the Model Validation Kit should be interpreted with care, because it does not explicitly address the stochastic nature of observed concentrations.

An American Society for Testing and Materials (ASTM)** standard guide on model evaluation, published in November 2000, represents an alternative approach to that of the Model Validation Kit. The ASTM standard guide contains detailed discussions of the framework and procedures for model evaluation. The framework is general in the sense that it is not tied to a certain model type or to a specific concentration variable. However, there is an annex to the guide which presents an example of how the framework is used; this example deals with the classic problem of a plume emitted from an isolated point source.

Another initiative, the European Network of Excellence on Atmospheric Composition Change (ACCENT), operating since March 2004, aims to promote a common European strategy for research on atmospheric composition change, to facilitate this research and to optimise the two-way interaction with policy-makers and the general public. Within ACCENT, the Transport and Transformation of Pollutants (T&TP) project aims at bringing together the European community of researchers in the atmospheric sciences to identify current problems of understanding and promoting

† http://rem.jrc.cec.eu.int/dam/
‡ http://www.harmo.org/default.asp
§ http://www.harmo.org
** http://www.harmo.org/astm


In this context, the refinement of methods for assessing urban and local scale air pollution levels is of particular importance. In particular, methods are required for source apportionment, for ensuring compliance with air quality legislation and for the analysis of air pollution episodes. Towards this aim, the generation of model validation datasets for sites with different characteristics is required, in order to improve the quantitative level of confidence in model predictions. The case of the MITRAS model provides an example where the model evaluation data were generated within an academic group to assess the performance of a model developed by the same community††. Based on the mesoscale model METRAS, the microscale model MITRAS has been developed by Schlünzen et al. (2003) in a consortium of four partners within the tropospheric research programme funded by BMBF. MITRAS is a community model. The Meteorological Institute, Centre for Ocean and Climate Research, University of Hamburg, Germany, coordinated the model development, implemented the modules that were produced within the consortium into MITRAS, and was also responsible for model validation. The datasets that were specifically produced for model validation were obtained from controlled wind tunnel (CEDVAL) experiments, which were tailored for the evaluation of microscale models. This approach cannot be transferred to mesoscale models, since the scaling does not allow the use of wind tunnel data, and field experiments cannot be performed with controlled external boundary conditions. The GAIM‡‡ Task Force was formed as an overarching framework activity of the International Geosphere-Biosphere Programme (IGBP) for coordinating and promoting different multi-disciplinary research components that can be combined to formulate an integrated view of the Earth System, using both data and models as tools. In order to assess the validity of Earth system models, it is critical to understand the sensitivity of the system to each of the input datasets, and to conduct sensitivity analyses of dynamic vegetation models, ocean carbon cycle models, GCMs and hydrologic models, as well as of simple Earth system models, with respect to the various input climate and ecological data. GAIM will also address some of the more theoretical issues involved in complex model development, coupling and evaluation. In particular, for the evaluation procedure, the minimum necessary resolution for model validation datasets will be determined, and inverse methods for applying model validation datasets will be established. Another example is the Working Group on iodine, which has been established in the framework of the EMRAS programme§§. EMRAS continues some of the more traditional work of previous international programmes (VAMP - Validation of Model Predictions; BIOMOVS - BIOspheric Model Validation Study; BIOMASS - BIOsphere Modelling and ASSessment) on increasing confidence in methods and models for the assessment of radiation exposure related to environmental releases, for the purposes of radiation protection of the public and the environment. The preparation of appropriate model validation datasets is therefore an essential component of this work.
In this case, the validation dataset formed a comprehensive database that took into consideration the different model variables, the temporal and spatial resolution of the simulations, and other elements needed to support the analysis of the model evaluation results. In particular, concerning the model validation dataset, it was suggested to prepare a large database including air concentrations of iodine over Warsaw, meteorological data (such as precipitation, wind trajectories and temperature), soil concentrations of iodine for several locations, time periods and durations, and iodine concentrations in grass, specified leafy vegetables (lettuce) and milk. Epidemiological data were also monitored in this case, such as the thyroid burden of inhabitants of the affected district; information about age, sex, date of thyroid blocking, diet (milk consumption) and physical activity was associated with each measurement by interview.

†† http://www.mi.uni-hamburg.de/mitras
‡‡ http://gaim.unh.edu/Structure/GAIM_Plan/index.html
§§ http://www-ns.iaea.org/projects/emras/emras-background.htm


The Biospheric Model Validation Study-Phase II (BIOMOVS II***), previously mentioned, was an international cooperative programme that examined the accuracy of predictions of environmental assessment models. Model evaluation was based on calculations made by individual participants for ten test case scenarios focusing on both short- and long-term releases of radioactivity from facilities such as power reactors. Model predictions were compared with each other and, where possible, with independent field observations, and reasons were sought for any observed differences. Confidence intervals on predictions, and differences between predictions and observations, were often less than a factor of 10, although there was much variability among models and scenarios. Model performance depended not only on the formulation and parameter values of the model, but also on the experience of, and assumptions made by, the user. The study demonstrated the need for harmonised validation datasets to better explain and justify model structure and application and to assess sources of uncertainty. A key recommendation was that assessments should not be undertaken in isolation by one individual modeller using one model.

The VAMP (Validation of Environmental Model Predictions) programme, which ran from 1988 to 1994, aimed at examining the widespread distribution of radionuclides in the environment after the Chernobyl accident. The results of the measuring and monitoring campaigns performed in this context established a basis for evaluating the predictions of mathematical models. The VAMP programme proved to be very successful and involved over 100 scientists from many different countries. The exercises in VAMP provided a unique opportunity for testing the accuracy of model predictions following a common evaluation methodology. In some cases, existing models and transfer coefficients were found to give a reasonable representation of the transfer of radionuclides through the environment. In other cases, previous generic assumptions regarding, for example, dietary intakes and food sources proved to be inappropriate for application to a particular environment. In the model testing studies, there was a general trend towards over-prediction. One of the most likely reasons for this is associated with the use to which these models are normally put: they are most commonly used to compare radiation doses received by critical population groups from releases of radionuclides from operating practices with dose limits. In this application, there is a need to be sure that doses do not exceed the dose limit, and so the assumptions and parameter values in the models tend to be selected in a way which will make underestimation unlikely.

The NAME data management activities are being coordinated by the NCAR Earth Observing Laboratory (EOL)†††. EOL has established and maintains the NAME Project‡‡‡, including the data management pages and the final project archive. The NAMAP-2 Project§§§ has recently been initiated to bring together a variety of modelling comparison and evaluation efforts. Designated modelling group participants are being identified, as are experimental details such as standard protocols, formats and model validation datasets. This project will take advantage of the large amount of special research and verification datasets collected during the 2004 NAME EOP to improve model development. The NAME data group in EOL will facilitate the archiving and coordination of NAMAP-2 model and validation datasets.
SATURN**** (a project of Eurotrac-2, 1997-2002) comprised different experimental activities, such as local scale field experiments, urban scale field experiments and large urban scale field campaigns. Comprehensive model validation datasets resulted from these field campaigns, which were planned in urban areas representative of conditions prevailing in different parts of Europe. Each of these campaigns satisfied the following criteria: the scientific aims were well-defined (for instance, analysis of specific physical or chemical processes or checking of a suggested working hypothesis); the field measurement programme included all quantities necessary to address the given scientific problem; and the temporal and spatial resolution was sufficient for establishing a dataset applicable for model validation purposes. The evaluated field campaign results are summarised in the COST728 WG4 metadata inventory and are accessible via http://www.cost728.org/.

*** http://info.casaccia.enea.it/evanet-hydra/Cadarache/VAMP%20BIOMOVS%20assessment/VAMP-BIOMOVS.htm
††† formerly the Field Operations and Data Management Group of UCAR's Joint Office for Science Support
‡‡‡ http://www.joss.ucar.edu/name
§§§ http://www.joss.ucar.edu/name/namap2/
**** http://www.gsf.de/eurotrac/sp-sat-f.htm , http://aix.meng.auth.gr/lhtee/saturn.htm


Some further examples of field datasets selected and used for model validation and evaluation purposes in Europe and the U.S.†††† are presented in Annex I, along with data information and availability details.

†††† http://camp.gmu.edu/FieldDatasetInventory.htm


7. MODEL VALIDATION AND EVALUATION EXERCISES

Several validation and evaluation exercises are reported in the model meta-database*. More details are given in the model validation and evaluation exercises database (Annex I). A comprehensive collation of exercises has not been possible, and the focus has been to select from those that have been undertaken by the groups participating in COST728. However, even for the COST728 groups this summary is not complete, in the sense that not all model validation exercises conducted by the groups are listed. The validation exercises reported, however, serve to reflect the spectrum of evaluation attempts performed by some key groups in Europe. They comprise different approaches to demonstrate the usefulness of the models in answering particular scientific questions. The approaches that were used for the different models are listed in Annex M.

7.1 Mesoscale Meteorological Model Validation and Evaluation Studies

Volker Matthias
GKSS Research Centre, Geesthacht, Germany

Mesoscale meteorological models are particularly suited for analysing complex meteorology and air pollution situations. One important application has been the analysis of air pollution episodes. As part of such applications, models have been evaluated with a range of methods. We distinguish between four types of model validation:

i) Comparison to analytic solutions.
ii) Comparison to reference datasets.
iii) Model intercomparisons.
iv) Additional efforts (case studies).

Of the 15 mesoscale meteorological models reported in the database, MM5 is operated by three different groups and appears three times. Seven mesoscale meteorological models were compared against analytic solutions; mostly the wind field was investigated for flows over mountainous terrain. Eight models were compared to reference datasets. These include standard meteorological measurements such as wind, temperature and humidity at ground level, as well as vertical profiles from radiosoundings and radar. In particular, the operational models of the weather services, such as COSMO-EU, COSMO-DE and GME, undergo daily comparisons with measurements. Additionally, well-documented and quality-tested datasets from extensive field campaigns were used to evaluate the models. This is documented, for example, for METRAS, using data from TRACT, BERLIOZ and FLUMOB (see Section 6) and from experiments in the Arctic. Model intercomparison studies are reported for 10 models. The operational weather forecast models are very extensively tested by comparing their results to daily NWP forecasts from other European weather services. Other evaluation exercises include the EU FP5 project FUMAPEX, COST715 and MESOCOM. MESOCOM (Thunis et al., 2003) included seven models and used idealised cases, which provide information on the variability of the model results and on possible reasons for the differences, but not on the 'correctness' or accuracy of the model results. In FUMAPEX, measurements were also taken into account: seven models, most of them operational NWP models, were applied to pollution episodes in different European cities. For more information on the FUMAPEX results see also Sections 4.2.5, 7.1.6 and 7.1.7 of this overview report. By far most of the activities fall under what is here called 'additional validation and evaluation efforts', which are reported for 11 models. Almost every publication of model results includes at least a paragraph on the evaluation of the model by comparing the results to measurements. However, the comparisons have not always been conducted very systematically, and variables other than temperature, wind and humidity come into play. Comparisons of cloud liquid water content were made for GME, COSMO-EU, MM5, NHHIRLAM and UM, for example. Boundary layer height was compared to, among others, lidar and radar measurements for COSMO-EU, MM5 and UM.

* http://www.mi.uni-hamburg.de/costmodinv


METRAS was also tested in regions of the Earth other than the mid-latitudes, namely the sub-tropics and the Arctic.

This section describes the results of model evaluation studies conducted by COST728 partners. Several meteorological and air quality models have been evaluated for modelling episodes. These models include CALMET, COSMO-IT, MEMO, MM5, MARS/MUSE, MM5/AirQUIS, M-SYS and MM5/CMAQ. Table 17 shows the model evaluations employed by model users within COST728. These models are not in themselves representative of the whole field, but they do provide real examples of how mesoscale models are being evaluated for studying episodic conditions. Examples of sensitivity study results can be found in Section 4.2.

Table 17. Summary of model evaluation studies undertaken recently by COST728 members. For each model, the table marks which evaluation approaches were applied: analytic solutions, comparison with observations, statistical analysis, sensitivity tests on model settings, and model intercomparison. Models covered: ADREA, ALADIN/A, ALADIN/PL, ARPS, COSMO-CH, BOLCHEM, CALMET, COSMO-EU, GESIMA, GME, HIRLAM, COSMO-IT, COSMO-DE, COSMO-EU_MH, MARS/MUSE, M-SYS, MC2-AQ, MCCM, MEMO, MERCURE, Meso-NH, METRAS, MM5 (GR / AirQUIS / CMAQ (UH)), NHHIRLAM, RAMS, RCG, SAIMM, UM, WRF/Chem. (The individual table entries could not be recovered from this version of the document.)

The following studies also indicate that a full model evaluation exercise would involve the following elements (a minimal statistics sketch follows this list):

• Comparison of model results with observations.
• Intercomparison of models for the same cases.
• Statistical quantification of the model performance.
• Sensitivity analysis of the outputs to changes in input parameters and model formulations or scheme options.
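By way of illustration of the third element, the sketch below computes the paired statistics quoted most often in the studies that follow (BIAS, RMSE and the correlation coefficient r). It is a minimal sketch under our own assumptions: co-located hourly observation and model series without missing values, synthetic data, and illustrative function names that do not come from any COST728 tool.

    import numpy as np

    def paired_scores(obs, mod):
        """BIAS, RMSE and correlation r for co-located, equal-length series."""
        obs, mod = np.asarray(obs, float), np.asarray(mod, float)
        bias = np.mean(mod - obs)                   # mean model-minus-observation error
        rmse = np.sqrt(np.mean((mod - obs) ** 2))   # root mean square error
        r = np.corrcoef(obs, mod)[0, 1]             # Pearson correlation coefficient
        return {"BIAS": bias, "RMSE": rmse, "r": r}

    # Synthetic hourly 10 m wind speeds (m s-1)
    obs = np.array([2.1, 3.4, 4.0, 3.2, 2.8, 1.9])
    mod = np.array([2.5, 3.9, 4.4, 3.0, 3.3, 2.2])
    print(paired_scores(obs, mod))

In practice such scores are computed per station and per season before being aggregated, since averaging over heterogeneous stations can mask compensating errors.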

Not all model evaluation examples given in the following sections have employed all of the above methods. This in itself suggests that more thorough studies of mesoscale model evaluation for meteorological and air pollution applications are required.

7.1.1 Use of the European Tracer Experiment (ETEX) for Model Evaluations

Christer Persson
SMHI, Norrköping, Sweden

Within ETEX, two successful atmospheric dispersion experiments on the meso- and European scale were carried out in October and November 1994, ETEX-1 and ETEX-2. In each experiment an inert, non-water-soluble tracer was emitted to the atmosphere from a site in western France. Each release lasted 12 hours, and air samples were taken during the 72 hours after the release at 168 stations in 17 European countries. Upper-air tracer measurements were also made from three aircraft. During ETEX-1 the tracer plume was emitted into an unstable atmosphere and first transported north-eastwards across Europe; during the second and third days it reached a deformation zone with weak winds and a rather complex atmospheric transport situation. After 2.5 days the tracer plume was stretched in a broad band from Norway south to Bulgaria. In ETEX-2 the tracer release took place in stable warm air with a very strong south-westerly wind (Nodop et al., 1998). Although the experiments are old and used a passive tracer, they can still be very valuable for model evaluations on meso- and European scales. ETEX-1 and -2 are two of the few existing sets of information related to controlled long-range dispersion of tracers. These datasets are today easily available within the ENSEMBLE project for nuclear emergency preparedness at JRC, Ispra†. The ENSEMBLE software can be used in a convenient way for model evaluations based on these datasets, including some statistical measures. Meteorological data needed for dispersion model calculations, in the form of gridded 3-D analyses/forecasts every 3 h as well as observations, can be obtained from ECMWF (European Centre for Medium-Range Weather Forecasts‡) and also, sometimes at higher resolution than from ECMWF, from several European National Meteorological Services. The ENSEMBLE project is an RTD FP5 project supported by the European Commission; it is described, for example, in Galmarini et al. (2004) and based on work within the ENSEMBLE Consortium§. Model evaluations applying user-oriented measures of effectiveness to transport and dispersion model predictions have been performed by Warner et al. (2004, 2005). This work is based on the ETEX experiments, and the quite extensive evaluation methods are described in those papers.

7.1.2 Meteorological Simulations over the Greater Athens Area Using MM5 and MEMO Mesoscale Models

Nicolas Moussiopoulos(1), John Douros(1), George Tsegas(1), Evangelia Fragkou(1), Anabela Carvalho(2), Carlos Borrego(2)
(1) Laboratory of Heat Transfer and Environmental Engineering, Aristotle University Thessaloniki, Thessaloniki, Greece
(2) Department of Environment and Planning, University of Aveiro, Aveiro, Portugal

MEMO simulations have been performed in complex terrain areas, including the greater areas of Athens, Greece, and Marseille, France. In the case of Athens, MEMO was evaluated against observations and also compared to the 3D Eulerian, limited-area, non-hydrostatic, terrain-following MM5 model system. The Greater Athens Area (GAA) presents several terrain irregularities and large water bodies. It is located in an oblong basin, surrounded by mountains on three sides and open towards the Saronikos Bay to the southwest. The local wind circulations caused by this complex topography, particularly the sea breeze, greatly influence air pollution in the GAA.

† http://rem.jrc.cec.eu.int/etex/
‡ http://ecmwf.org/
§ http://ensemble.jrc.it


The period between the 16th and 19th of July 2002 was simulated, for which a complete set of meteorological observations was available. Comparisons with observations were carried out at 10 different stations in and around the GAA, mainly suburban stations plus two urban stations, Patision and Marousi. Both MEMO and MM5 reproduced the afternoon sea breeze, although in the case of MM5 the flow was generally more homogeneous, whereas MEMO simulated a sea breeze in two different cells of the peninsula (Figure 4). More specifically, a south-easterly change in wind direction in the Gulf of Petalion encouraged the development of a sea breeze cell in the Mesogia Plain, which was hardly apparent in the case of MM5.

Figure 4. MM5 (a) and MEMO (b) predicted wind (m s-1) over terrain (m), 19th of July 2002, 14:00

Both MEMO and MM5 followed the observed wind speed diurnal pattern successfully (Figure 5a,b). The wind speed BIAS reveals a tendency of both models to overestimate, with MEMO's overestimation slightly more pronounced than MM5's. However, the correlation coefficients of the time series show that MEMO follows the diurnal variation of wind speed more accurately almost everywhere in the GAA. Regarding temperature, both MM5 and MEMO were able to capture the diurnal pattern, as well as the gradual decreasing trend during the simulation period (Figure 5c,d). This decrease in temperature was both observed and predicted at all stations, in agreement with the prevailing synoptic conditions. At most stations, MEMO was closer to the observations. MM5 performed well at the beginning of the simulation, but significantly underestimated the temperatures during the last two days of the modelled period.



Figure 5. Surface wind speed (a, b) and temperature (c, d) at the stations of Marousi (a, c) and Liosia (b, d) for the whole simulation period

7.1.3 Evaluation of MEMO Using the ESCOMPTE Pre-campaign Dataset

N. Moussiopoulos(1), I. Douros(1), P. Louka(1), C. Simonidis(2), A. Arvanitis(1)
(1) Laboratory of Heat Transfer and Environmental Engineering, Aristotle University Thessaloniki, Thessaloniki, Greece
(2) Institute of Technical Thermodynamics, University of Karlsruhe, Karlsruhe, Germany

The Greater Marseille Area (GMA) is a challenging area for mesoscale simulations, since it includes distinct geographical features (the sea, the Southern Alps, the Rhone valley) which directly influence the local circulation. The selected case study was the period between 29 June and 1 July 2000, i.e. a summer period for which, depending on the meteorological conditions, the formation of photochemical smog was favoured. Three out of a total of 15 measuring stations were considered, at locations where differences in local meteorological characteristics can be expected: a location by the sea (Marseille), a location further inland (Tarascon) and a location far from the sea (Carpentras). In order to assess the sensitivity of the model results to the grid resolution, two different cell sizes were used, namely 4×4 km2 and 2×2 km2. In general, the simulated values for the two resolutions are comparable, with the higher resolution, not surprisingly, capturing more detail. Taking a closer look at each selected station, for the station of Marseille, located by the sea, the correlation with measurements is better for the MEMO run with 2 km resolution than with 4 km resolution, especially for wind speed and temperature, as the local flow is more accurately captured. The model performance for temperature is found to be better for Marseille and Tarascon, while for Carpentras the correlation is considerably higher for daytime than for night-time values, the night-time temperature being overpredicted. As Carpentras is located on a mountain slope, this is possibly due to an underestimation of the radiative heat flux from the ground associated with the land-use categorisation implemented in the model. Overprediction of the night-time temperature is generally observed at mountainous stations in similar positions.

Figure 6. BIAS for wind speed (a) and air temperature (b) during the simulation period (29 June to 1 July 2000). Red bars denote midnight

The statistical analysis generally suggests an overestimation of wind speed early in the morning of the last day of the simulation (Figure 6a), which, however, is not accompanied by a poor prognosis of the wind direction during the same period (not shown). Temperature, on the other hand, reveals the night-time overestimation trend (Figure 6b), which was already evident from the diurnal profiles at the selected stations. Generally, the BIAS at a grid spacing of 4 km is comparable to that at 2 km, which does not justify the much higher computational effort associated with the higher resolution.

7.1.4 Modelling of SOA in the MARS-MUSE Dispersion Model

Edouard Debry, Ioannis Douros and Nicolas Moussiopoulos
Laboratory of Heat Transfer and Environmental Engineering, Aristotle University Thessaloniki, Thessaloniki, Greece

The MARS/MUSE (Moussiopoulos, 1995) 3D Eulerian mesoscale photochemical modelling system was evaluated for the area of Milan. In the version examined, a modal aerosol model was incorporated for secondary organic aerosols (SOA), in which coagulation, condensation/evaporation and nucleation were solved for each mode of the aerosol distribution. Simulation results were compared with measured data for PM10, O3, NO2 and NO. The simulation covers one week starting on 1 April 1999. Initial and boundary conditions were provided by predictions of the EMEP model. The model was able to reproduce the average level of the observations quite well (small BIAS); however, it could not adequately follow the diurnal variation of PM10 at the stations of Magenta and Limito (large BIAS, small correlation coefficient; Table 18). Secondary organics produce an increase in PM10 concentration but do not significantly change the PM10 behaviour. This suggests that secondary organics rapidly reach an equilibrium between the gas and aerosol phases. The simulation with SOA slightly reduces the error relative to the simulation without SOA, except for the Limito station, but in both cases the RMSE and correlation coefficients remain of the same order of magnitude (Table 18). The impact of SOA on the BIAS values is more evident.


Table 18. Statistical analysis of the MARS/MUSE simulations with and without SOA

Station                   BIAS     RMSE    Correlation coefficient r
Limito (with SOA)        +13.81    24.99    0.15
Limito (without SOA)     +11.42    23.22    0.14
Meda (with SOA)           -2.44    16.86    0.19
Meda (without SOA)        -4.94    17.34    0.15
Vimercate (with SOA)       0.38    13.90    0.43
Vimercate (without SOA)   -2.15    13.38    0.44
Magenta (with SOA)        -3.46    15.15   -0.02
Magenta (without SOA)     -4.99    15.42   -0.03

The time averages and standard deviations of PM10 and PM2.5 concentrations at the Meda station can be examined for the observations and for the simulations with and without SOA. The fact that the SOA increment also appears for PM2.5 predictions in equivalent proportions indicates that secondary organics are mainly associated with finer aerosols.

7.1.5 Photochemical Simulations over the Greater Athens Area

Elissavet Bossioli, Maria Tombrou-Tzella
University of Athens, Department of Applied Physics, Laboratory of Meteorology, Athens, Greece

In the study of Bossioli et al. (2007), several factors that influence the ozone concentration levels over the Greater Athens Area (GAA) are examined by applying the three-dimensional photochemical Urban Airshed Model UAM-V, off-line coupled with the Penn State/NCAR mesoscale meteorological model MM5-v3.6. The initial scenario (Base Case) constitutes the basis for all the numerical experiments. In particular, this scenario uses both the meteorological data and the emission data of anthropogenic origin in their primary form, with no further modification. The Base Case scenario (BC) reproduces the observed ozone patterns, but underestimates the observed peaks at most of the downwind suburban stations (Figure 7). Starting from the Base Case scenario, several numerical experiments were performed focusing on:

i) a better representation of the anthropogenic emissions;
ii) the incorporation of the spatial and hourly distribution of the biogenic non-methane hydrocarbon emission rates;
iii) the adoption of two speciation profiles for the anthropogenic NMVOC emissions;
iv) the effect of the urban sector, introduced via a simplified urbanized meteorological dataset;
v) the application of the MM5 model without nesting, in order to isolate the synoptic effects from the local circulation evolution; and
vi) the effect of the ozone boundary inflows.

The performance of the UAM-V model is evaluated using statistical measures for management studies recommended by US EPA (1991). The measures used are the mean normalized BIAS (MNB), the mean normalized error (MNE), the unpaired peak prediction accuracy (Au) and the spatially-paired peak estimation accuracy (As). All the calculated values are based on the hourly prediction-observation differences normalized by the observed ozone concentrations for all the monitoring stations. The MNB and MNE measures have been calculated for three cut-off concentration levels (110, 80 and 40 μg m-3). The As values were calculated both for the entire set of stations and for each category separately.
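As a rough illustration of these measures, the sketch below implements common definitions of MNB, MNE and the unpaired peak prediction accuracy Au; the data and function names are our own illustrative assumptions, and the exact US EPA (1991) formulations may differ in detail.

    import numpy as np

    def mnb_mne(obs, pred, cutoff):
        """Mean normalized bias (MNB) and mean normalized error (MNE),
        restricted to hours whose observed value is at or above the cut-off."""
        obs, pred = np.asarray(obs, float), np.asarray(pred, float)
        mask = obs >= cutoff
        norm = (pred[mask] - obs[mask]) / obs[mask]   # normalized differences
        return norm.mean(), np.abs(norm).mean()

    def unpaired_peak_accuracy(obs, pred):
        """Au: relative deviation of the predicted peak from the observed
        peak, irrespective of where and when the two peaks occur."""
        return (np.max(pred) - np.max(obs)) / np.max(obs)

    # Synthetic hourly ozone concentrations (ug m-3) at one station
    obs = np.array([60.0, 95.0, 140.0, 180.0, 150.0, 90.0])
    pred = np.array([70.0, 100.0, 120.0, 160.0, 155.0, 85.0])
    for cutoff in (110.0, 80.0, 40.0):   # the cut-off levels used in the study
        print(cutoff, mnb_mne(obs, pred, cutoff))
    print("Au:", unpaired_peak_accuracy(obs, pred))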



Figure 7. Measured and predicted mean hourly O3 concentrations at (a) the urban traffic station of Athinas (measurements at Patision are also included), (b) the urban background station of N.Smirni, (c) the suburban industrial station of Geoponiki, (d) the suburban station of Likovrisi, and the suburban background stations of (e) Liossia, and (f) Demokritos for various scenarios. BC: Base Case; BCB: BC plus biogenic emissions; BCB_spec (standard): BCB with different NMVOC speciation profiles (the meteorological model is applied with nesting); BCB_spec (urbanized): BCB_spec with the effect of the urban sector introduced via a simplified urbanized meteorological dataset; BCB_spec (no nesting): the meteorological model is applied with no nesting. Figure from Bossioli et al. (2007)

7.1.6 Mesoscale Meteorological Model Inter-comparison and Evaluation in FUMAPEX

Barbara Fay(1), Viel Odegaard(2)
(1) Deutscher Wetterdienst, Offenbach, Germany
(2) Det Norske Meteorologisk Institutt, Blindern, Oslo, Norway

As already outlined in Section 4.2.5, an extensive model inter-comparison was performed within the EU FP5 project FUMAPEX. Evaluations were performed separately for selected episodes and for a full one-year time series (long-term evaluation). A few results are highlighted here; more details can be found in Annex G and in Fay et al. (2005).

A main outcome of the evaluations performed in FUMAPEX is an insufficient simulation of temperature inversions in all models. This is attributed to the following model deficiencies:

• Model set-up (insufficient vertical resolution, hydrostatic modelling (HIRLAM), terrain-following coordinates).
• Physiographic parameters (incl. deficient land-sea mask).
• Soil and surface parameterisations (invariant snow properties, false soil moisture, lack of urbanised parameterisations).
• Cloud parameterisation.
• Surface evaporation (overestimated).
• Simulation of strong stability (deficient turbulence parameterisation, overpredicted vertical exchange and vertical shear of the horizontal wind).
• Data assimilation (missing snow and sea ice, insufficient vertical soundings, soil and surface parameters and urban observations).

For the one-year evaluation and for the summer and winter seasons, the ranges of the parameter scores in FUMAPEX are shown in Table 19 for COSMO-EU/COSMO-IT, DNMI HIRLAM and MM5, and FMI HIRLAM. Model performance for episode forecasting seems to depend mainly on the ability of the model to forecast the specific meteorological features of the episode, in sometimes complex locations and even for extreme meteorological conditions, and on the station representativeness and observation quality. The performance depends much less on whether the location is urban, suburban or rural.

Table 19. Range of scores for forecast lengths below 48 hours for wind speed at 10 m (FF 10 m), temperature at 2 m (T 2 m) and dew point temperature at 2 m (Td 2 m)

                                   FF 10 m (m s-1)   T 2 m (°C)     Td 2 m (°C)
BIAS (1 year data, not seasonal)   -1.0 to +1.2      -1.5 to +2.0   -2.0 to +3.5
RMSE, year                          1.5 to 2.3        1.8 to 4.2     1.4 to 4.8
RMSE, summer                        1.0 to 2.9        1.2 to 4.3     1.2 to 8.0
RMSE, winter                        1.2 to 3.9        1.6 to 4.5     1.6 to 6.6

Comparing the results of the different models for the different episodes in terms of their skill in forecasting air pollution episodes, the models apparently perform better in predicting the summer episodes than the winter/spring inversion episodes. In some regions, like Valencia, summer episode conditions are very frequent and possibly involve less unusual or extreme meteorological conditions than in most other areas. This picture is also confirmed by the evaluation of the episode performance against the background of longer-term statistical scores. These results clearly show the scope, but also the limitations, of even highly resolving mesoscale NWP models, especially for the sometimes extreme episode conditions. Very strong inversions and stability, complex orography, and superimposed valley-mountain and land-sea breeze systems, combined also with larger-scale circulations, may decrease model performance and challenge model predictability. Information on the model evaluation strategy used in FUMAPEX and detailed single-model evaluation statistics (full tables and graphs, for the whole year and the seasons) and their interpretation for all models and episodes are compiled in Fay et al. (2004, 2005).


7.1.7 Evaluation of COSMO-IT for Air Quality Forecast and Assessment Purposes

M. Deserti(1), G. Finzi(6), S. Bande(2), G. Bonafè(1), E. Minguzzi(1), M. Stortini(1), E. Angelino(3), M.P. Costa(3), G. Fossati(3), E. Peroni(3), G. Pession(4), F. Dalan(5), S. Pillon(5), C. Carnevale(6), E. Pisoni(6), G. Pirovano(7), M. Bedogni(8)
(1) ARPA Emilia Romagna ([email protected])
(2) ARPA Piemonte
(3) ARPA Lombardia
(4) ARPA Valle d'Aosta
(5) ARPA Veneto
(6) DEA, Università degli Studi di Brescia
(7) CESIRICERCA S.p.A.
(8) Mobility and Environment Agency of Milan

The prognostic meteorological model COSMO-IT is the Italian version of the non-hydrostatic limited-area model COSMO-EU (formerly named Lokal Modell; Steppeler et al., 2003). It is run twice a day with a horizontal resolution of about 7 km and provides meteorological forecasts and analyses for Italy. Two validation exercises were performed with the aim of evaluating the performance of the operational COSMO-IT model as input to a chemical transport model for producing forecast and/or hindcast simulations of air quality in the Po Valley (northern Italy). The first exercise was performed by the Hydrometeorological Service of Emilia-Romagna (SIM) in the framework of the EU FP5 FUMAPEX project (Fay et al., 2004); model output (forecasts and analyses) was compared with routine meteorological data and with data from special measurement campaigns. To assess the performance of the coarse model (7 km horizontal resolution), the model outputs were also compared with high-resolution outputs during pollution episodes over the three different areas shown in Figure 8 (COSMO-IT domains with 1.1 km and 2.8 km horizontal resolution). The second exercise was carried out by the Meteorology Centre of Teolo (CMT) in the framework of the Italian CTN-ACE project (Deserti et al., 2004) and compared model output (reanalysis) with routine meteorological data over a one-year period. In addition, the COSMO-IT 7 km results were compared with results from a diagnostic mass-consistent meteorological model (CALMET, 4 km horizontal resolution) run on a sub-domain (CALMET Domain, Figure 8).

Figure 8. Model domains for the Italian evaluation exercises

In Annex H, results of the COSMO-IT evaluation are given in tabular form for the special campaigns and for the long term. They can be summarized as follows:

• COSMO-IT forecasts were generally better in flat terrain than in the mountains (complex terrain).
• The results are less good during peak pollution episodes (atmospheric stability, low or calm wind, clear sky).
• Long-term verification shows that T 2 m temperature forecasts are of acceptable quality (mean absolute error MAE < 2.5 K except for urban stations), and wind speed forecasts are generally acceptable, especially over flat terrain.
• Results for wind depend on forecast time and season (Figure 9).
• In the Po Valley winds are frequently overestimated, while in the Apennine mountains winds are frequently underestimated, leading to incorrect air pollution episode forecasting.
• The forecast of wind direction is generally poor (the direction MAE ranges between 30° and 80°) and depends strongly on the station (Po Valley better than Apennine mountains), the season (summer is the worst) and the wind sector (225°-270° sector during the night, 45°-90° sector during the day, 0°-45° sector during the day in the mountains).

Figure 9. Wind roses for most of the selected surface stations (left) and for COSMO-IT (right)

The following general conclusions can be drawn from the evaluation of the suitability of COSMO-IT results (Annex H) for AQ forecasts:

• Errors in temperature and humidity forecasting in the PBL are partly due to an incorrect partitioning of surface heat fluxes into sensible and latent heat fluxes. These errors lead to a strong underestimation of the surface temperature inversions (Figure 10). A more detailed soil texture field and an operational routine for soil moisture initialization could reduce these errors.
• The errors in the T 2 m daily cycle, the cold and moist BIAS in the PBL, the overestimation of the 10 m wind over flat terrain and the poor treatment of cases with extreme thermal stratification are problems related to the turbulence scheme implemented in the version of COSMO-IT used. These problems will probably be solved in the near future by reorganizing the turbulence scheme and tuning some parameters of the PBL scheme, changing the interpolation of model variables to synoptic levels, reducing the depth of the lowest model layer, and testing and implementing improved schemes for soil moisture analysis.

COSMO-IT was also run at 2.8 km and at 1.1 km horizontal resolution; these simulations show that wind field structures become more detailed and realistic. In addition, an impact was found on turbulence and on the variability of the vertical velocity but, due to the lack of experimental data, it was not possible to validate these effects against observations. Therefore, it cannot be concluded from this study that an increase of horizontal resolution improves the accuracy of meteorological input for air quality models.


Figure 10. Temperature profile at "San Pietro Capofiume" station (rural) on the 21st of June 2002 at noon (a) and at midnight (b), observed (black line) and forecast at three different horizontal resolutions: 7 km (blue line), 2.8 km (green line) and 1.1 km (red line)

Both episodic and long-term verification show that forecasts were generally better in rural than in urban regions. This result shows the need to account for urbanization. The FUMAPEX project has suggested several strategies and techniques for NWP urbanization; nevertheless, it should be considered that any type of urbanization of COSMO-EU could enhance turbulence in urban areas. This could improve the model's ability to forecast mixing height and other related turbulence parameters, but could also reduce its ability to forecast inversions. This is, at the moment, one of the most critical problems for peak pollution episode forecasting.

7.1.8 Evaluation of MM5-CMAQ Systems for an Episode over the UK

Y. Yu(1), R.S. Sokhi(1), Nutthida Kitwiroon(1), Bernard Fisher(2), D.R. Middleton(3)
(1) Centre for Atmospheric and Instrumentation Research (CAIR), University of Hertfordshire, UK
(2) Environment Agency, Reading, UK
(3) Met Office, Exeter, UK

The MM5/CMAQ system was applied to an air pollution episode on 22-28 June 2001 to examine its performance characteristics for simulating regional O3, NO2 and SO2. Further details of the model setup and analysis can be found in Yu et al. (2006, 2008). This section focuses on the results of the model evaluation exercise. A range of statistical measures were used for the MM5 and CMAQ evaluations. These were calculated for the innermost domains of MM5 and CMAQ for the whole simulated period in June 2001. For the CMAQ evaluations, model values used in these calculations are extracted from the first model level (about 14 m AGL).

7.1.8.1 MM5 Performance

Table 20 lists the results of the statistics for MM5. These values reflect averages over space (all monitoring stations in the innermost MM5 domain) and time (all hours in the simulation episode). All the statistics indicate a good overall agreement between observations and model predictions, especially for 2 m temperature and 10 m wind direction, with correlation coefficients of 0.94 and 0.8, respectively. Considering the low wind speeds observed during this period, the model wind speed performance is very good and comparable to values found in the literature (Zhong et al., 2003).


Table 20. Performance statistics for the meteorological predictions with 3-km grid spacing

Score                      2 m temperature (°C)   10 m wind speed (m s-1)   10 m wind direction (degree)
Mean Obs.                  18.2                   3.4                       155
Mean Sim.                  18.8                   3.0                       158
Total # (N)                4599                   4414                      4391
Corr. Coeff. R             0.94                   0.59                      0.8
BIAS                       0.7                    -0.3                      7.3
NMB%                       3.7                    -8.8                      4.7
MAE                        1.44                   1.2                       28.2
NME%                       7.6                    36.6                      18.2
RMSE                       1.7                    1.5                       42.6
Index of Agreement IOA     0.97                   0.75                      0.93
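The scores in Table 20 are standard paired metrics. The sketch below shows common textbook definitions of NMB, NME and the index of agreement (IOA); it is an illustration under our own assumptions, not the exact formulations of Yu et al., and wind direction would additionally require circular (±180°) difference handling, which is omitted here.

    import numpy as np

    def nmb_nme_ioa(obs, mod):
        """Normalised mean bias (%), normalised mean error (%) and index of
        agreement for co-located series (common definitions; not necessarily
        identical to those used in the original study)."""
        obs, mod = np.asarray(obs, float), np.asarray(mod, float)
        d = mod - obs
        obar = obs.mean()
        nmb = 100.0 * d.sum() / obs.sum()
        nme = 100.0 * np.abs(d).sum() / obs.sum()
        ioa = 1.0 - (d ** 2).sum() / (
            (np.abs(mod - obar) + np.abs(obs - obar)) ** 2).sum()
        return nmb, nme, ioa

    # Synthetic 2 m temperature series (deg C)
    print(nmb_nme_ioa([17.9, 18.4, 18.0, 18.6], [18.2, 19.1, 18.3, 19.0]))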

7.1.8.2 CMAQ Performance

Measured hourly air quality data at 22 monitoring stations were used in the model evaluation for CMAQ. Qualitatively, the model simulates the diurnal O3 and NO2 concentration patterns very well at all sites. Figure 11 compares the measured O3 and NO2 time series with the modelled results extracted from the first model level (about 14 m AGL) at two representative sites. At both sites, the model captures the O3 night-time lows quite well, but it tends to underpredict daytime peaks during high-ozone days, for example on 24 June at London Bexley and on 25/26 June at Harwell.


Figure 11. Comparison of measured and modelled time series of O3 (a, b) and NO2 (c, d) concentrations at London Bexley (a, c) and Harwell (b, d). Modelled O3 and NO2 concentrations were taken from the first model level (about 14 m AGL) for 23 to 28 June 2001


Statistical parameters indicate a satisfactory overall model performance (Annex I). The CMAQ model was able to reproduce the observed temporal and spatial variations of O3 (Table 33) and NO2 (Table 34). On average, the model slightly under-predicts O3 concentrations, with a BIAS of -3.6 μg m-3 and an MNB of 30 % for the 3 km resolution, and a BIAS of -0.8 μg m-3 and an MNB of 39.4 % for the 9 km resolution. For O3, the 9-km and 3-km resolution simulations gave comparable model performance. However, the model tends to miss very high peak O3 values. The causes of this disagreement should be investigated as part of future work. In the case of NO2, the model shows an under-prediction, with a BIAS of -11.8 μg m-3 and an MNB of -14.7 % for the 3 km resolution and a BIAS of -13.6 μg m-3 and an MNB of -14.0 % for the 9 km resolution. The model performs better for ozone than for primary pollutants such as NO2. For NO2, the 3-km resolution generally gives better predictions than the 9-km resolution simulation.

7.1.9 Evaluation of the MM5-CMAQ-EMIMO Modelling System in Spain

Roberto San Jose
Computer Science School, Technical University of Madrid, Campus de Montegancedo, Madrid, Spain

The MM5-CMAQ air quality modelling system has been used together with the EMIMO emission model, which was developed by UPM in 2001 and has been updated in several subsequent versions (San José et al., 2004, 2005, 2006, 2007; Sokhi et al., 2006). The system has been used in Spain to carry out a large set of air quality impact studies for new industrial sites (combined cycle power plants, incinerators, etc.). In most cases the system has been implemented on a domain configuration of 400 x 400 km2, 100 x 100 km2 and 24 x 24 km2 centred on the industrial plant, with 9 km, 3 km and 1 km spatial resolution, respectively. Recently, the system has been applied over a whole year; in earlier applications it was applied only for 5 days, a month, or 60 days. The system is also being applied in forecasting mode to provide air quality forecasts for several Spanish cities, such as Madrid and Las Palmas de Gran Canaria, and for UK cities (Leicester City Council). These systems operate on a daily basis. The system is implemented on sophisticated cluster platforms to provide real-time forecasts of the impacts of the industrial emissions of electricity and cement companies, in order to support decision-making by the commercial partners and policy makers. In the latter case, the system is run in parallel with different scenarios, and the impact of industrial emissions is assessed by calculating the differences between scenarios. For application, the system has been calibrated by comparing the concentration results with observational data provided by the different air quality monitoring stations. In Figure 12 the hourly area-averaged observed ozone data obtained from the Madrid Community monitoring network and the corresponding averaged modelled data from MM5-CMAQ-EMIMO are compared for the year 2005. The difference between the mean observed and mean modelled values is 1.1 µg m-3, which corresponds to an uncertainty of approximately 0.04 %. This result was obtained with a 3 km spatial resolution model domain nested within 9 km and 50 km European CMAQ model domains.

Figure 12. MM5-CMAQ-EMIMO model simulation over the Madrid (Spain) domain with 3 km spatial resolution. Comparison between observed and modelled ozone data averaged over 23 different monitoring stations in the Madrid Community for 2005 (365 x 24 hours = 8760 data points)


More details on results of the MM5-CMAQ-EMIMO system can be found in Annex J. The system has been used in hindcast mode (air quality impact studies) and in forecast mode (real-time forecasting systems reaching 96-120 hours into the future). Figure 13 shows two examples in real-time forecasting mode.

Figure 13. Ozone observations versus modelled data produced by the MM5-CMAQ-EMIMO air quality modelling system (3 km resolution) operating in real-time 24-hour forecasting mode for (a) 20-28 August 2006 in Torrejón (Madrid Community) and (b) 7-28 August 2006 at the Torrejón monitoring station (Madrid Community)

The results obtained by the ESMG (Environmental Software and Modelling Group) of the UPM-FI show similar characteristics to those obtained in the applications in the UK (see Section 7.1.8).

7.2 Concentrations of Chemical Species

Besides the studies mentioned in Section 7.1, numerous studies have been published in which the performance of a single, specific regional-scale Chemical Transport Model (CTM) is described. The outputs of these CTMs are compared to observations, statistical analysis is carried out, and conclusions are drawn indicating that "there is reasonable agreement between model results and observations". Over the last couple of years, several model validation and intercomparison studies have been carried out in Europe in which several models participated, in contrast to the usual single-model evaluation studies. The large advantage of such a set-up is that the models are also tested against each other, and that a more open discussion originates in which the strong and weak parts of the different models are analysed. One of the first regional-scale studies was reported by Hass et al. (1996, 1997), in which four photo-oxidant models were compared and validated against O3 observations for a 2-day episode in 1990. The results of this study were used in a later study by Delle Monache and Stull (2003) to investigate the possibilities of ensemble modelling. Roemer et al. (2003) performed a study with ten different CTMs focussing on ozone trends over Europe. This study was followed by a study to intercompare aerosol modelling over Europe (Hass et al., 2003), in which six different CTMs participated. In the framework of the review of the EMEP model, seven models were evaluated and intercompared both for gas-phase species and for aerosols (van Loon et al., 2004). Within the EURODELTA project, led by JRC-Ispra, long-term ozone simulations by seven different models are compared and evaluated to analyse their ensemble average, their combined uncertainty and their overall performance (van Loon, 2006; Vautard, 2006).


Considering the above-mentioned studies, a number of joint characteristics can be seen (a short sketch of two of the statistical measures follows this list):

• The model evaluations were focussed on trace gas and aerosol characteristics, and did not explicitly consider meteorology. Obviously, specific studies have been performed to evaluate meteorological models, also focussed on air quality applications; see for example Seaman (2000). In the studies listed above both prognostic/NWP meteorology and diagnostic meteorology are used as input, but the impact of using these two types of input data is not explicitly studied.
• In general, common types of statistical parameters are used in the analysis, such as the correlation coefficient r, RMSE, NMSE, fractional BIAS and standard deviation STDE. Sometimes PDFs of the differences between observations and model results, and frequency analyses, with appropriate plots and tables, are calculated.
• The main driving force of air quality, the emission data, is harmonised between the different models in the intercomparison studies.
• The models taking part in these studies should be considered operational, deterministic models, which perform hour-by-hour calculations over several years. This means that these are not empirical models, for which model evaluation is quite different, nor models which focus on the evaluation of process studies.
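For completeness, the sketch below gives one common definition of the NMSE and the fractional BIAS named in the list above; sign and normalisation conventions vary between studies, so these are illustrative assumptions rather than the formulations of any particular intercomparison.

    import numpy as np

    def nmse(obs, mod):
        """Normalised mean square error: 0 for a perfect model, dimensionless."""
        obs, mod = np.asarray(obs, float), np.asarray(mod, float)
        return np.mean((mod - obs) ** 2) / (np.mean(mod) * np.mean(obs))

    def fractional_bias(obs, mod):
        """Fractional BIAS; with this convention it is positive when the model
        underestimates the observed mean (sign conventions differ)."""
        obs, mod = np.asarray(obs, float), np.asarray(mod, float)
        return (obs.mean() - mod.mean()) / (0.5 * (obs.mean() + mod.mean()))

    # Synthetic daily mean ozone (ug m-3)
    obs = np.array([55.0, 72.0, 88.0, 61.0])
    mod = np.array([60.0, 65.0, 80.0, 58.0])
    print("NMSE:", nmse(obs, mod), "FB:", fractional_bias(obs, mod))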

Summarising and presenting the results of model intercomparison and validation studies is often quite complicated due to the large amount of data produced. At JRC-Ispra, in the framework of the City-Delta project, a tool has been developed based on so-called Taylor diagrams, in which the statistical evaluation of models can be presented in a coherent way. At TNO, within the EMEP review project, a system has been developed by which the statistical analysis of the results of model validation and intercomparison can be presented in a handy, tabular form. In practice, many mistakes are made during model application and model improvement studies through small coding and input errors. A possibility to avoid at least part of such mistakes is the use of a test set/test system ('quick scan') which enables fast testing of the behaviour of new and updated model versions. Such a quick-scan system might be used in a generic way by different modelling groups and might contain a kind of standard test set. For chemical box models, such a system has been developed by Poppe and Kuhn (1996).
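A Taylor diagram places each model on a polar plot through three linked statistics: the standard deviations of model and observations, their correlation, and the centred RMS difference (Taylor, 2001). The sketch below computes these quantities; it is a minimal illustration of the underlying statistics, not the JRC City-Delta tool itself.

    import numpy as np

    def taylor_stats(obs, mod):
        """Statistics shown on a Taylor diagram. They satisfy the identity
        E'^2 = s_m^2 + s_o^2 - 2*s_m*s_o*r, where E' is the centred RMS
        difference and s_m, s_o are the standard deviations."""
        obs, mod = np.asarray(obs, float), np.asarray(mod, float)
        s_o, s_m = obs.std(), mod.std()
        r = np.corrcoef(obs, mod)[0, 1]
        e_prime = np.sqrt(np.mean(((mod - mod.mean()) - (obs - obs.mean())) ** 2))
        return {"sigma_obs": s_o, "sigma_mod": s_m, "r": r, "centred_rms": e_prime}

    # One point per model on the diagram, here for two hypothetical models
    obs = np.array([40.0, 55.0, 70.0, 65.0, 50.0])
    for name, mod in {"model_A": [45.0, 50.0, 72.0, 60.0, 52.0],
                      "model_B": [38.0, 60.0, 66.0, 70.0, 47.0]}.items():
        print(name, taylor_stats(obs, np.array(mod)))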


8. MODEL EVALUATION METHODOLOGIES

K. Heinke Schlünzen
ZMAW, University of Hamburg, Meteorological Institute, Hamburg, Germany

The current status of model evaluation mainly concerns comparisons of single components with data or analytic solutions, model intercomparisons, or comparisons with evaluated reference data (Section 7). After some early attempts at suggesting generic evaluation protocols (Model Evaluation Group, 1994) and mesoscale model evaluation guidelines (Schlünzen, 1997), several more approaches followed during the past few years. The SATURN project (Moussiopoulos, 2003; Borrego et al., 2003) initiated work in this area. Activities associated with more general model evaluation include:

• The Clean Air for Europe project (CAFE) under the 6th Environment Action Programme, which strives to develop a thematic strategy on air pollution.
• The City-Delta project, which has been organised by the Joint Research Centre of the EC and focuses on urban background concentrations in several European cities.
• The ENV-e-City project, which aims to improve access to environmental data, with meteorology for air pollution assessments as a pilot application area.
• The Network of Excellence ACCENT*.

With respect to what the actual procedures of a model evaluation protocol should consist of, the initial attempt by the Model Evaluation Group (1994) concluded that the following steps should be followed:

a. A complete description of the model.
b. A complete description of the database which is used for the evaluation of the model.
c. Scientific evaluation: a description of the equations employed to describe the physical and chemical processes that the model has been designed to include.
d. Code verification, including sensitivity analysis and model inter-comparison.
e. Model validation, including comparison with experimental data and statistical analysis on the basis of selected measures.
f. User-oriented assessment, which essentially includes a documentation of the code, including best practice guidelines.

A similar list is used in a guideline for evaluating obstacle-resolving microscale models (VDI, 2005). The above six points are mainly aimed at the model developer. Following the structure of VDI (2005), they can be summarized in three groups:

1. General evaluation: includes points a and f and can be performed off the computer without deep scientific knowledge.
2. Scientific evaluation: includes points b and c and can also be performed off the computer, but needs scientific knowledge.
3. Benchmark tests: includes points d and e, which need computer simulations and detailed comparisons.

The results of these three evaluation steps should be summarized in an evaluation protocol (Figure 14). In a second part of the evaluation, control steps should be defined to ensure that the model user obtains reliable results. These can be part of a best practice guideline (point f of the above list).

* http://www.accent-network.org/


Figure 14. Structure of an evaluation guideline (from Schlünzen et al., 2007). Part I, to be applied by the model developer, comprises the general evaluation, the scientific evaluation and the benchmark tests, summarized in an evaluation protocol; Part II, to be applied by the model user, is the operational evaluation

The different steps given in Figure 14 are detailed in Figure 15. While the general evaluation is generic and need not be more specific for the mesoscale, all other steps need to be defined specifically for the mesoscale and the applications intended (e.g. integration period).

[Figure content recovered from the diagram:

(a) 1. General evaluation - checking the comprehensibility. 1.1 Documentation must be available, consisting of: a short model description; an extended model description; a user manual; a technical reference. 1.2 Source code open for inspection. 1.3 Three publications in refereed journals.

(b) 2. Scientific evaluation (specification needs to be scale dependent). Identify the processes required in the model: model equations; model approximations; parameterisations; boundary conditions; initialisation; input data.

(c) 3. Benchmark tests (specification needs to be scale dependent). 3.1 Quality indicators: correlation coefficient, NRMSE, standard deviation, hit rate, etc.; PDF of differences between OBS and MOD; frequency analysis. 3.2 Definition of validation test cases: specification of grid structure; definition of time scale; definition of horizontal and vertical resolution; specification of input and comparison data; quality assurance of data, spatial representativeness, flagging (use previous intercomparisons and new studies); prescription of fixed input data (emissions, land use, meteorology, boundary conditions, etc.); specification of evaluation criteria, evaluation variables and error tolerances. 3.3 Definition of sensitivity tests (these should reflect the purpose): all specifications as for the validation test cases. 3.4 On-line quality tests (these operationally check for numerical problems): no numerical oscillations in time; mass conservation; no exceedance of threshold values (e.g. negative specific humidity/concentrations); etc.

(d) Σ Evaluation protocol: compiles all evaluation results on one page.

(e) Operational evaluation (specification needs to be scale dependent): demands on the model grid structure; use of the operational on-line quality checks of the model; quality control of model results: no 2Δx oscillations (inspection of cross-sections); check of "independence" of model results from resolution and model area size (5% differences allowed); check of model results for plausibility and, whenever possible, quantitative comparison with measurements and results of other models; documentation of the model evaluation and of the model limitations.]

Figure 15. Details of the different parts of a generic evaluation guideline: general evaluation (a), scientific evaluation (b), benchmark tests (c), evaluation protocol (d) and operational evaluation (e) (Figures from Schlünzen et al., 2007)
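The on-line quality tests of item 3.4 and the operational checks of panel (e) lend themselves to automation. The following minimal Python sketch indicates how such checks might be coded; the function names, the tolerance and the simple sign-alternation heuristic for 2Δx oscillations are illustrative assumptions, not prescriptions from the guideline.

    import numpy as np

    def has_negative_values(field: np.ndarray) -> bool:
        """Threshold check: specific humidity or concentrations must not be negative."""
        return bool(np.any(field < 0.0))

    def two_dx_oscillation_index(cross_section: np.ndarray) -> float:
        """Fraction of points along a 1-D cross-section where successive
        differences alternate in sign - a simple indicator of 2*dx noise."""
        d = np.diff(cross_section)
        flips = d[:-1] * d[1:] < 0.0
        return float(np.mean(flips)) if flips.size else 0.0

    def mass_conserved(mass_before: float, mass_after: float, rtol: float = 1e-6) -> bool:
        """Mass-conservation check: the domain-integrated mass should stay
        constant up to a small relative tolerance (sources/sinks neglected)."""
        return abs(mass_after - mass_before) <= rtol * abs(mass_before)

    # Example with synthetic data
    q = np.array([1.2e-3, 0.9e-3, 1.3e-3, 0.8e-3])  # specific humidity (kg/kg)
    print(has_negative_values(q))                    # False
    print(two_dx_oscillation_index(q))               # 1.0 -> grid-scale noise
    print(mass_conserved(1.000, 0.9999995))          # True

In an operational chain such tests would run after every output step and flag suspicious fields rather than silently correct them.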

Steps 1 to 3 should be performed by the model developer and be summarized in an evaluation protocol, while the operational evaluation should also be performed by the model user. The details of what to check are, however, again scale-specific and need to be defined within an evaluation protocol. The structure outlined in Figure 14 and Figure 15 is already applied in VDI (2005) for the evaluation of microscale models. It is currently taken as the structure for VDI (2008) and outlines the structural approach to be used in COST728. Here, models are to be evaluated with respect to their suitability for air pollution dispersion applications. For this purpose, special focus will be on meteorological parameters (e.g. wind direction, PBL height, radiation) and, in addition, on concentrations. The model resolution for which the guideline is to be developed is 1-16 km in the horizontal direction. The focus will lie on model results in terms of hourly data for time periods between a few days and one year.
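For concreteness, the scope just described can be captured in a small configuration structure. The following Python snippet merely restates the text above; the keys and layout are illustrative and do not correspond to any actual COST728 file format:

    # Illustrative encoding of the evaluation scope described in the text.
    evaluation_scope = {
        "target_application": "air pollution dispersion",
        "focus_variables": ["wind direction", "PBL height", "radiation",
                            "concentrations"],
        "horizontal_resolution_km": (1, 16),   # range covered by the guideline
        "output_frequency": "hourly",
        "evaluation_period": "a few days up to one year",
    }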


9. USER TRAINING

Marko Kaasik(1), Ranjeet Sokhi(2), K. Heinke Schlünzen(3), Gertie Gertsema(4), Barbara Fay(5), Liisa Jalkanen(6) (1) University of Tartu, Institute of Environmental Physics, Tartu, Estonia (2) Centre for Atmospheric and Instrumentation Research (CAIR), University of Hertfordshire, UK (3) ZMAW, University of Hamburg, Meteorological Institute, Hamburg, Germany (4) KNMI, Section Observations and modelling / Department research applied models, De Bilt, Netherlands (5) Deutscher Wetterdienst, Offenbach, Germany (6) WMO, 7 bis avenue de la Paix, Postale N° 2300, Geneva, Switzerland

User training is an important aspect of model implementation and application, especially for routine or operational situations or for assessment studies. In order to gauge the status of user training provision within Europe, a questionnaire about mesoscale model user training was distributed to COST728 participants. This consultation resulted in 10 responses from 6 countries (Annex N). The main responses were collected in November 2006, with some updates in October 2007. Different responses from the same country represent either different institutions or different models in the same institution. Regular user training exists in Finland, Germany, Portugal and the United Kingdom. Formal user training dedicated to mesoscale models was reported not to exist in Estonia, Norway and the Netherlands; in these countries users are trained individually, in the form of personal supervision and consultation.

9.1 User Training in Different Countries

Estonia

Regular training courses for users of any specific model do not exist. There is a course “Numerical Methods in Meteorology” (32 hours, for M.Sc. students) at the University of Tartu, covering the basics of the numerical integration of the equations of the atmosphere and of parametrizations. Students gain hands-on experience individually, supervised by researchers of the atmospheric dynamics working group, while preparing their B.Sc., M.Sc. and Ph.D. theses. The number of students specialising in atmospheric modelling (one per year on average) is not sufficient to justify a dedicated course. The basic air quality model is SILAM and the basic meteorological model is HIRLAM. Most students acquire their skills as model developers at B.Sc. or Ph.D. level.

France

Two training courses were reported for France. The Meso-NH training course mainly focuses on accidental releases of radioactivity and is aimed at Meso-NH users. The course takes 25 hours, covering code modification as well as visualization of results and Meso-NH-Chemistry. Half of the time is dedicated to practical work on all topics. Web sites are available for the Meso-NH training course (http://mesonh.aero.obs-mip.fr/mesonh/) as well as for the CHIMERE training course (http://euler.lmd.polytechnique.fr/chimere/). Each new CHIMERE user can run the model for a real test case using the web site, supported by the model documentation, in which a chapter is dedicated to this test run.

Germany

Training in at least the theory of mesoscale atmospheric modelling is given at most universities in Germany where meteorology can be studied as a major (Berlin, Bonn, Cologne, Frankfurt, Hamburg, Hannover, Karlsruhe, Kiel, Leipzig, Mainz). Some of the institutes have a research focus on mesoscale meteorology and complement the theoretical training with hands-on lectures for all students. This is most fully developed at the University of Hamburg, where undergraduate students start to use mesoscale models with only basic knowledge of the theory of modelling. The course will be extended within a new master's programme, starting in 2008, by adding to the current lecture curriculum another 28 hours of lectures dedicated to numerical schemes and to physical modelling.

The example model used for applications is the in-house developed mesoscale model METRAS, which is also used for consultancy work in Europe. Lectures are therefore also open to consultants. Another training institution is the German Weather Service (Deutscher Wetterdienst, DWD), which provides training for the use of the non-hydrostatic COSMO_EU of the COSMO consortium on small-scale modelling and of DWD's High Resolution Model HRM (Majewski, 2001), which is the operational NWP model in 9 countries worldwide. While the COSMO_EU user training used to be a one-day workshop of mainly practical training in operating the COSMO_EU with different run-time options, for university students and future COSMO_EU users, it has recently tended towards more specialised and theoretical training, e.g. in advanced numerical methods. The HRM user training may be a 1- to 2-week course at DWD or abroad, aimed at HRM users, often from less scientifically advanced countries, who operate the HRM. It comprises intensive theoretical lectures and hands-on training on all model aspects. It is supplemented by visits of DWD staff abroad to install the HRM at the specific institutes and perform hands-on training. Regularly, groups of HRM scientists from abroad also spend about a month at DWD for specific HRM work and research.

Portugal

In Portugal, the University of Aveiro, in collaboration with the Institute for the Environment and Development, organizes training events for the TAPM model. The majority of the participants come from institutions connected to the government and attend mainly to learn how to apply the model and to understand its features and capabilities. At their institutions these users will then be able to install the model, run it and interpret the obtained results. The adequate assessment and interpretation of the model predictions based on the input parameters was one of the main goals of the training events. In the future, these training events may be repeated for other air quality models that are operated at the University of Aveiro, such as MARS, CAMx and CHIMERE.

The Netherlands

Regular user-consultation sessions are held in which new developments are communicated, together with the reasons for these changes and their consequences. A regular training course for new users and developers is not available; if necessary, new users can follow courses at ECMWF. Model users and developers are academics, mostly with a Ph.D. in physics, so training is provided by the universities. Dedicated training for special purposes is organised on an ad-hoc basis. Training of model developers is mainly on the job, i.e. information and knowledge are acquired through close collaboration with experienced developers. Forecasters are model output users who need a profound understanding of the characteristics of NWP models. Nowadays forecasters are academics holding a degree in physics with a major in meteorology. New forecasters receive training courses either in-house or at sister institutes (e.g. ECMWF), and experience is acquired through the operational tasks. Information on model updates is given in meetings, via a Dutch magazine dedicated to meteorology and via the intranet. The in-house courses cover different aspects of meteorology and typically last half a day to several days.

UK

Within the UK most, if not all, users of mesoscale models are within the atmospheric science research community, represented by the National Centre for Atmospheric Science (NCAS, http://www.ncas.ac.uk). All main UK atmospheric science research groups are affiliated to NCAS, which also provides the infrastructure and support for its members. In addition to the research community, other organizations are also interested in the use of mesoscale models for air pollution problems. These include the Government's Department for Environment, Food and Rural Affairs (DEFRA), the Environment Agency and industrial users such as those within the power industry. Although individual research groups can employ any mesoscale model, NCAS only provides support for the Unified Model (UM), which has been developed by the UK Met Office. Details on the NCAS Computational Modelling Support can be found on the web (http://ncas-cms.nerc.ac.uk/).

The users (research, policy and industrial) with a particular interest in air pollution applications of mesoscale models belong to the MESOMAQ network (Mesoscale Modelling for Air Quality applications, http://ncasweb.leeds.ac.uk/mesomaq/). A list server has been set up for MESOMAQ (http://www.ncas.ac.uk/mailman/listinfo/mesomaq), which aids communication and exchange of ideas in this field. As part of MESOMAQ, new developments are underway to enable users to employ UM meteorological fields to drive the CMAQ model. In addition to some universities providing science-based training in mesoscale modelling, NCAS also organises training sessions on the Unified Model (UM). Future initiatives are underway to extend this form of training.

USA

Comprehensive training is provided within the USA by CMAS (Community Modelling and Analysis System) on the use of the Models-3 system (http://www.cmascenter.org/). Although much of the training is held within the USA, training has recently also been provided by CMAS within Europe (e.g. as supported by ACCENT). Models-3 training normally includes an introduction to CMAQ and the emissions processor SMOKE. Separate events have been held on MM5 training. With the development of the new Weather Research and Forecasting (WRF) model, training is regularly being offered (http://www.wrf-model.org).

WMO

The WMO GAW Urban Research Meteorology and Environment (GURME) project Training Team has developed a training course on Air Quality Forecasting (AQF, http://www.wmo.ch/web/arep/gaw/urban.html). The five-day course was delivered in Lima, Peru, in July 2006 for participants from Latin American countries; see Annex O for the course content. The use of satellite data for assessing aerosol properties was included in training in January 2006; the presentations are available for download at http://www.wmo.int/pages/prog/arep/gaw/urban_training_finals_en.htm. This topic is also planned to be part of appropriate future AQF courses. South Asian countries received training on AQF in December 2008 in Pune, India, from local and foreign instructors, including air quality impacts on health and agriculture and AQ management for policy support. A course is planned to be held together with AIRNow International in Shanghai in spring 2009 to enhance capabilities for air quality forecasting.

9.2 Summary on User Training

The inventory seems too incomplete to provide an overall evaluation of user training or to highlight which country might provide the best user training in Europe and elsewhere. Therefore, the following summary should not be interpreted as an evaluation of the training needs and provisions in the different countries, but as an effort to derive possible advantages and shortcomings from the information available.

In Germany user training is carried out at the Deutscher Wetterdienst (COSMO_EU and HRM models) and at the universities; as an example, detailed information was given for the University of Hamburg (METRAS model). In the United Kingdom training exists for the UM model, in Finland at the Finnish Meteorological Institute (SILAM model) and in Portugal at the University of Aveiro together with the Institute for the Environment and Development (TAPM model). The two courses reported for France both have web-based documentation.

The number of academic hours per course varies greatly, from 8 to 90, with 33 hours being the average. Introductory courses are held 1-2 times per year, and some more specific and individual parts up to 4 times per year. In Germany the courses are long and detailed, given by 1-2 teachers per course (sometimes with additional invited teachers); in other countries the specific parts of a course are divided between experts. Teachers are either atmospheric scientists (meteorologists) or computer specialists. The number of participants varies from 6-20 per basic course and from 6-35 per country. The highest number per country is in Germany; however, the relatively small Finland is remarkable with 20 participants once, or sometimes even twice, per year. Advanced or specific courses have fewer participants in general, but in the UK seminars with 20-40 participants are held 2-3 times per year. Among the objectives of the lectures are training for operational modelling and for research. Hands-on lectures constitute 33-75% of the volume of a course, with a higher fraction for shorter courses. Only the courses at the University of Hamburg and at DWD (the most extensive ones) end with an examination. In general the trainers expect that a skilled user can run the model independently and understands the basic structure of the model and the general input-output relationships. In Portugal the user must be able to install the TAPM model, understand its basic formulations and interpret the obtained results.

9.3 Recommendations for User Training

In general, we recommend distinguishing two levels of training courses: (1) for model users (operational and practical in character) and (2) for model developers (with more emphasis on the science of the model and its application).

9.3.1 Model User

The preliminary knowledge expected from model users includes basic computer skills (data processing, data formats) and, desirably, some knowledge about processes in the atmosphere and about environmental management. During the course they must be trained to run the existing programme, to make the necessary changes in the input, and to understand and critically evaluate the results. Such training is targeted at practical application and, to some extent, at testing the model. It is desirable that a model user is able to install the model. However, in the case of a complicated model consisting of several modules of different types, where the model package needs to be configured for a specific operating system, installation is beyond the skills of an ordinary user; in such a case the skills required for installation lie between those needed by an ordinary user and those of a developer.

9.3.2 Model Developer

The preliminary knowledge expected (in addition to the user skills) from people to be trained as model developers is an understanding of the processes in the atmosphere and their mathematical representation, and of the programming language(s) used to write the code. Typically, an academic degree in atmospheric sciences is expected. Alternatively, for developing certain parts of the code (e.g. data assimilation, interfaces for users and other software modules), advanced skills in IT and programming are desirable. The course includes overviews of the functions of the modules and the connections between them, the functionality and mathematical formulation of each module, and how they are programmed. Hands-on exercises must be sufficient to learn to change, debug and compile the code and to control the changes. As a result, the developer must be able to follow the consequences of any changes in the code (or in the part of the code he/she is expected to develop) and critically evaluate the output: whether the changes in the code made it better (results closer to reality, a test case, an analytical solution, etc.) or worse, and by how much. Moreover, some creativity in defining problems and in planning further steps is expected from a developer.


10. CONCLUSIONS

This report provides an overview of the range of mesoscale models being used for air pollution dispersion applications by the COST728 partners. The emphasis of this report is on existing methodologies for mesoscale meteorological model evaluation and related applications. Results from several validation and evaluation exercises are summarized, and an overview of user training in different countries is presented. The report contains the basic information for developing recommendations for model evaluation that will be specified in the next phase of work. In addition, a formal evaluation protocol will be developed. A goal of this protocol is to lead to a model quality assurance process that is based on scientific and fundamental principles. The protocols will be target-oriented and therefore differ for the three time scales considered in COST728:

• Episodes (a few days).
• Single cases that concern meteorological situations relevant for determining statistical values.
• Extended periods / years, on an hour-by-hour to daily-averaged basis, to determine air quality concentrations relevant for the EU Directives.

The time-scale-oriented evaluation protocol will follow the structure given in Figure 14. Test cases are currently being defined for COST728 that will allow testing of the evaluation protocol. The targets of the protocol are both scientific and user-oriented. From the user's point of view the objectives are:

• To define quality standards for meteorological data usable for performing air quality assessments (concentration hindcasts).
• To assess the quality of meteorological input data for forecasting air pollution for the next 2-3 days (mainly PM10 daily average, O3 hourly average and 8-hour running mean, NO2 hourly average) from regional to local scales (urban agglomerations, about 5×5 km2 horizontal resolution).

Several points can be highlighted from the report:

• A web-based inventory of mesoscale models, with details on model characteristics, has been created by COST728 and is accessible from the web (http://www.cost728.org). It is hoped that the database will also be useful to the wider mesoscale modelling community and will act as a focal point for users requiring technical information on mesoscale models in a summarised format. The model inventory is already additionally used by the microscale and global-scale modelling communities.

• Selected applications of mesoscale models have been reviewed. These include research and policy-related applications spanning local to regional scales. The perspective, however, has been from the evaluation viewpoint, with the aim of providing a first overview of the methods being used by COST728 partners to test and evaluate their models. University partners are mostly research oriented, while meteorological services, due to their responsibilities, combine scientific activity with more practical and policy-oriented applications.

• The examples cited in this report show that there is a clear tendency to move in the direction of unified or integrated atmospheric models, sometimes referred to as 'one-atmosphere' models, where all the main aspects are treated within the same modelling framework.

• Estimation of model uncertainty can be examined either through Monte Carlo analysis or through sensitivity studies. Examples are given where the model uncertainty has been estimated for meteorological parameters as well as for pollutant concentrations. Input parameters can be varied through sensitivity analysis to estimate the resultant change in the output concentrations. This technique has also been applied to investigate the influence of model configuration and settings, including changes in initial and boundary conditions, nesting of domains, resolution and the use of parametrization schemes.

• A range of model performance quality indicators has been examined, including the standard deviation of error (STDE), BIAS, the correlation coefficient r and hit rates H. Examples of case studies have been given to show how these indicators should be used for meteorological and air quality applications. More recent indicators such as the Relative Percentile Error (RPE) are also examined and could be more robust than the RME indicated in the first EU Air Quality Directives. The latter measure is currently being reformulated and will probably soon be replaced by a measure similar to RME (Eq. 1), but using the limit values in the denominator (see the illustrative formula after this list).

• It is important that model validation datasets are independent of those used for setting up or calibrating models. To provide confidence in the model performance, the model should be tested across the range of conditions relevant to its intended applications.

• In order to facilitate model evaluation, datasets have been collated in a meta-database developed within COST728 (accessible via www.cost728.org). Such needs have been examined within the wider context of model applications and other European and international initiatives such as ACCENT, IGBP, BIOMOVS II and VAMP. Validation datasets need to be of known quality.

• Although the above concepts and framework are continuously being developed within this COST Action, some selected examples of individual studies are provided to show the range of approaches being adopted by the wider community. This report will form the basis for developing more common recommendations and protocols for evaluating mesoscale models. It will lead to model quality assurance procedures based on scientific and fundamental principles. It is intended that these will be employed in the joint case studies being planned by COST728.

• A limited overview of training for mesoscale modelling in some European countries has been conducted. Even from this small survey it was evident that the level of training is quite disparate within the EU. Regular user training seems to exist only in a few countries, including Finland, France, Germany, Portugal and the United Kingdom.
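To illustrate the kind of reformulation referred to above (the final definition was still under discussion, and Eq. 1 is not repeated here), a measure that normalises by the limit value LV rather than by the observed concentration could take the form

$$\mathrm{RDE} = \frac{\left|O_{\mathrm{LV}} - P_{\mathrm{LV}}\right|}{\mathrm{LV}}$$

where $O_{\mathrm{LV}}$ is the observed concentration closest to the limit value and $P_{\mathrm{LV}}$ the correspondingly ranked model concentration. Both the name (relative directive error) and the exact form are given here only as an illustration of using the limit value in the denominator, not as the adopted definition.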


REFERENCES

AQEG, 2004: Nitrogen Dioxide in the United Kingdom. Air Quality Expert Group (AQEG) First Report, Department of the Environment, London, U.K.†
Baklanov, A., Fay, B., Kaminski, J., 2007: Overview on existing integrated (off-line and on-line) mesoscale systems in Europe. Available from the web site http://www.cost728.org.
Beekmann, M., Derognat, C., 2003: Monte Carlo uncertainty analysis of a regional-scale chemistry model constrained by measurements from the Atmospheric Pollution Over the Paris Area (ESQUIF) campaign. Journal of Geophysical Research, 108, 8559, doi:10.1029/2003JD003391.
Bergin, M.S., Noblet, G.S., Petrini, K., Dhieux, J.R., Milford, J.B., Harley, R.A., 1999: Formal uncertainty analysis of a Lagrangian photochemical air pollution model. Environmental Science and Technology, 33(7), 1116-1126.
Bohnenstengel, S. and Schlünzen, K.H., 2006: A locality index to classify meteorological situations with respect to precipitation. Submitted to Journal of Applied Meteorology, in review.
Bonafè, G. and Jonghen, S., 2006: LAMI verification for air quality forecast and assessment purposes: case studies, special measurement campaigns, long-term evaluation. ARPA-SIM Internal Report (available from www.arpa.emr.it/sim).
Borrego, C., Schatzmann, M. and Galmarini, S., 2003: Quality assurance of air pollution models. Chapter 7 in SATURN (Studying Atmospheric Pollution in Urban Areas), ed. Moussiopoulos, Springer.
Borrego, C., Miranda, A., Costa, A., Monteiro, A., Ferreira, J., Martins, H., Tchepel, O., Carvalho, A., 2006: AIR4EU Milestone Report 6.5 - Cross-Cutting 2: Uncertainties of Models & Monitoring, July 2006, Portugal.
Borrego, C., Monteiro, A., Ferreira, J., Miranda, A.I., Costa, A.M., Sousa, M., 2005: Modelling uncertainty estimation procedures for air quality assessment. In: Proceedings of the 3rd International Symposium on Air Quality Management at Urban, Regional and Global Scales (AQM), 26-30 September 2005, Istanbul, Turkey. Eds. S. Topçu, M.F. Yardim, A. Bayram, T. Elbir and C. Kahya, Vol. I, pp. 210-219.
Bossioli, E., Tombrou, M., Dandou, A., Soulakellis, N., 2007: Simulation of the effects of critical factors on ozone formation and accumulation in the greater Athens area. Journal of Geophysical Research, 112, D02309.
Carvalho, A.C., Carvalho, A., Gelpi, I., Barreiro, M., Borrego, C., Miranda, A.I., Pérez-Muñuzuri, V., 2006: Influence of topography and land use on pollutants dispersion in the Atlantic coast of the Iberian Peninsula. Atmospheric Environment, 40(21), 3969-3982.
Chang, J.C. and Hanna, S.R., 2004: Air quality model performance evaluation. Meteorol. Atmos. Phys., 87, 167-196.
Cox, R., Bauer, B.L. and Smith, T., 1998: Mesoscale model intercomparison. Bull. Am. Meteorol. Soc., 79, 265-283.
Dabberdt, W.F., Carroll, M.A., Baumgardner, D., Carmichael, G., Cohen, R., Dye, T., Ellis, J., Grell, G., Grimmond, S., Hanna, S., Irwin, J., Lamb, B., Madronich, S., McQueen, J., Meagher, J., Odman, T., Pleim, J., Peter, H., Westphal, D.L., 2004: Meteorological research needs for improved air quality forecasting: Report of the 11th Prospectus Development Team of the U.S. Weather Research Program. Bulletin of the American Meteorological Society, 85(4), 563-586.
Delle Monache, L. and Stull, R., 2003: An ensemble air-quality forecast over western Europe during an ozone episode. Atmos. Environ., 37, 3469-3474.
Deserti, M., Lollobrigida, F., Angelino, E., Bonafè, G., Minguzzi, E., Stortini, M., Cascone, C., De Maria, R., Clemente, M., Mossetti, S. and Angius, S., 2004: Modelling techniques for air quality evaluation and management in Italy: the work of the national Topic Centre. Proceedings of the 9th Int. Conf. on Harmonisation within Atmospheric Dispersion Modelling for Regulatory Purposes, 197-201.
Ebel, A., Elbern, H., Feldmann, H., Jakobs, H.J., Kessler, C., Memmesheimer, M., Oberreuter, A. and Piekorz, G., 1997: Air pollution studies with the EURAD model system (3): EURAD - European Air Pollution Dispersion Model System. Mitteilungen aus dem Institut für Geophysik und Meteorologie der Universität zu Köln, Heft 120.



† http://www.defra.gov.uk

EC, 1997: Council Decision 97/101/EC, establishing a reciprocal exchange of information and data from networks and individual stations measuring ambient air pollution within the Member States. OJ L 035, 05.02.1997, 14-22, and its Amended Annexes to 97/101/EC, Commission Decision 2001/752/EC, OJ L 282, 26.10.2001, 69-76.‡
EC, 1999: First Daughter Directive, Council Directive 1999/30/EC, relating to limit values for sulphur dioxide, nitrogen dioxide and oxides of nitrogen, particulate matter, and lead in ambient air. OJ L 163, 29.06.1999, 41-60.§
EC, 2002: Third Daughter Directive, Council Directive 2002/3/EC, relating to ozone in ambient air. OJ L 67, 09.03.2002, 14-30.**
Elbir, T., 2003: Comparison of model predictions with the data of an urban air quality monitoring network in Izmir, Turkey. Atmospheric Environment, 37, 2149-2157.
Fay, B. and Neunhäuserer, L., 2006: Evaluation of high-resolution simulations with the non-hydrostatic numerical weather prediction model Lokalmodell for urban air pollution episodes in Helsinki, Oslo and Valencia. Atmos. Chem. Phys., 6, 2107-2128. SRef-ID: 1680-7324/acp/2006-6-2107.
Fay, B., Neunhäuserer, L., Palau, J.L., Perez-Landa, G., Dieguez, J.J., Ødegaard, V., Bonafe, G., Jongen, S., Rasmussen, A., Amstrup, B., Baklanov, A., Damrath, U., 2005: Evaluation and inter-comparison of operational mesoscale models for FUMAPEX target cities. EU-project FUMAPEX Report D3.4, DWD, Offenbach, Germany, 110 pp.
Fay, B., Neunhäuserer, L., Palau, J.L., Dieguez, J.J., Ødegaard, V., Bjergene, N., Sofiev, M., Rantamäki, M., Valkama, I., Kukkonen, J., Rasmussen, A., Baklanov, A., 2004: Model simulations and preliminary analysis for three air pollution episodes in Helsinki. EU-project FUMAPEX Report D3.3, DWD, Offenbach, Germany, 60 pp.
Fay, B., Neunhäuserer, L., Baklanov, A., Bonafé, G., Jongen, S., Kukkonen, J., Ødegaard, V., Palau, J.L., Perez-Landa, G., Rantamäki, M., Rasmussen, A., Sokhi, R.S., Yu, Y., 2006: Final results of the model inter-comparison of high-resolution simulations with numerical weather prediction models for 8 urban air pollution episodes in 4 European cities in the FUMAPEX project. Proceedings for oral presentation at the 28th ITM, May 2006, Leipzig, Germany.
Fay, B., Neunhäuserer, L., Ødegaard, V., Sofiev, M., Valkama, I., Kukkonen, J., Palau, J.L., Pérez-Landa, G., Bonafé, G., Rasmussen, A., Baklanov, A., 2004: Evaluating and inter-comparing operational NWP and mesoscale models for forecasting urban air pollution episodes in FUMAPEX. 4th Annual Meeting of the European Meteorological Society, Nice, France, 27-30 September 2004.
Fine, J., Vuilleumier, L., Reynolds, S., Roth, P. and Brown, N., 2003: Evaluating uncertainties in regional photochemical air quality modeling. Annual Review of Environment and Resources, 28, 59-106.
Galmarini, S., Bianconi, R., Addis, R., Andronopoulos, S., Astrup, P., Bartzis, J.C., Bellasio, R., Buckley, R., Champion, H., Chino, M., D'Amours, R., Davakis, E., Eleveld, H., Glaab, H., Manning, A., Mikkelsen, T., Pechinger, U., Polreich, E., Pradanova, M., Slaper, H., Syrakov, D., Terada, H., van der Auwera, L., 2004: Ensemble dispersion forecasting, Part II: Application and evaluation. Atmospheric Environment, 38(28), 4619-4632.
Galmarini, S., Bianconi, R., Klug, W., Mikkelsen, T., Addis, R., Andronopoulos, S., Astrup, P., Baklanov, A., Bartiniki, J., Bartzis, J.C., Bellasio, R., Bomyay, F., Buckley, R., Bouzom, M., Champion, H., D'Amours, R., Davakis, E., Eleveld, H., Geertsema, G.T., Glaab, H., Kollax, M., Ilvonen, M., Manning, A., Pechinger, U., Persson, C., Polreich, E., Potemski, S., Pradanova, M., Saltbones, J., Slaper, H., Sofiev, M.A., Syrakov, D., Sørensen, J.H., van der Auwera, L., Valkama, I., Zelazny, R., 2004: Ensemble dispersion forecasting, Part I: Concept, approach and indicators. Atmospheric Environment, 38(28), 4607-4617.
Grell, A.G., Dudhia, J. and Stauffer, D.R., 1993: A Description of the Fifth-Generation Penn State/NCAR Mesoscale Model (MM5). NCAR Technical Note 398+IA, National Center for Atmospheric Research, Boulder, Colorado, USA.
Hanna, S.R., Chang, J.C., Strimaitis, D.G., 1993: Hazardous gas model evaluation with field observations. Atmos. Environ., 27A, 2265-2285.



‡ http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31997D0101:EN:NOT and http://air-climate.eionet.europa.eu/announcements/country_tools/aq/aq-dem/docs/2001_752_EC.pdf
§ http://eur-lex.europa.eu/LexUriServ/site/en/oj/1999/l_163/l_16319990629en00410060.pdf
** http://eur-lex.europa.eu/pri/en/oj/dat/2002/l_067/l_06720020309en00140030.pdf

Hanna, S.R., Chang, J.C. and Fernau, M.E., 1998: Monte Carlo estimates of uncertainties in predictions by a photochemical grid model due to uncertainties in input variables. Atmospheric Environment, 32(21), 3619-3628.
Hanna, S.R., Zhigang, L., Frey, H.C., Wheeler, N., Vukovich, J., Arunachalam, S., Fernau, M., Hansen, D.A., 2001: Uncertainties in predicted ozone concentrations due to input uncertainties for the UAM-V photochemical grid model applied to the July 1995 OTAG domain. Atmospheric Environment, 35, 891-903.
Hass, H., 1991: Description of the EURAD Chemistry-Transport-Model Version 2 (CTM2). Mitteilungen aus dem Institut für Geophysik und Meteorologie der Universität zu Köln, Heft 83.
Hass, H., Builtjes, P.J.H., Simpson, D. and Stern, R., 1997: Comparison of model results obtained with several European regional air quality models. Atmos. Environ., 31(19), 3259-3279.
Hass, H., van Loon, M., Kessler, C., Matthijsen, J., Sauter, F., Stern, R., Zlatev, R., Langner, J., Fortescu, V., Schaap, M., 2003: Aerosol modeling: results and intercomparison from European regional-scale modeling systems. A contribution to the EUROTRAC-2 subproject GLOREAM. EUROTRAC report.
Hogrefe, C., Rao, S.T., Kasibhatla, P., Hao, W., Sistla, G., Mathur, R. and McHenry, J., 2001: Evaluating the performance of regional-scale photochemical modelling systems: Part II - ozone predictions. Atmospheric Environment, 35, 4175-4188.
Kukkonen, J., Valkonen, E., Walden, J., Koskentalo, T., Aarnio, P., Karppinen, A., Berkowicz, R. and Kartastenpää, R., 2001: A measurement campaign in a street canyon in Helsinki and comparison of results with predictions of the OSPM model. Atmos. Environ., 35(2), 231-243.
Kukkonen, J., Valkonen, E., Walden, J., Koskentalo, T., Karppinen, A., Berkowicz, R. and Kartastenpää, R., 2000: Measurements and modelling of air pollution in a street canyon in Helsinki. Environmental Monitoring and Assessment, 65(1/2), 371-379.
Kunz, R. and Moussiopoulos, N., 1995: Simulation of the wind field in Athens using refined boundary conditions. Atmos. Environ., 29, 3575-3591.
Lenz, C.-J., Müller, F. and Schlünzen, K.H., 2000: The sensitivity of mesoscale chemistry transport model results to boundary values. Env. Monitoring and Assessment, 65, 287-298.
Majewski, D., 2001: HRM - User's Guide. Deutscher Wetterdienst, Offenbach, Germany, 73 pp.
Menut, L., 2003: Adjoint modelling for atmospheric pollution sensitivity at regional scale. Journal of Geophysical Research, 108, 8562, doi:10.1029/2002JD002549.
Model Evaluation Group, 1994: Guideline for model developers and model evaluation protocol. European Community, DG XII, Major Technological Hazards Programme, Brussels, Belgium.
Moussiopoulos, N. (ed.), 2003: Air Quality in Cities. SATURN/EUROTRAC-2 Subproject Final Report, Springer, Berlin, 298 pp.
Moussiopoulos, N., 1995: The EUMAC Zooming Model, a tool for local-to-regional air quality studies. Meteor. Atmos. Phys., 57, 115-133.
Müller, F., Schlünzen, K.H. and Schatzmann, M., 2000: Test of numerical solvers for chemical reaction mechanisms in 3D air quality models. Environmental Modelling & Software, 15, 639-646.
Neunhäuserer, L., Fay, B., Raschendorfer, M., 2007: Towards urbanisation of the non-hydrostatic numerical weather prediction model Lokalmodell (LM). Bound. Lay. Met., 124, 81-97.
Niemeier, U., 1997: Chemische Umsetzungen in einem hochauflösenden mesoskaligen Modell - Bestimmung geeigneter Randwerte und Modellanwendungen. Berichte aus dem Zentrum für Meeres- und Klimaforschung, Reihe A, 28,
Zentrum für Meeres- und Klimaforschung der Universität Hamburg, Meteorologisches Institut.
Nodop, K., Klug, W., Kulmala, A., Dop, H.V., Pretel, J., Addis, R., Fraser, G., Girardi, G.G.F., Inoue, Y. and Kelly, N., 1998: ETEX: a European tracer experiment; observations, dispersion modelling and emergency response. Atmos. Environ., 32(24), 4089-4094.
Nurmi, P., 1994: Recommendations on the verification of local weather forecasts. Technical Memorandum 200, European Centre for Medium-Range Weather Forecasts, Shinfield Park, Reading. http://www.ecmwf.int/publications/library/do/references/show?id=86094
Ødegaard, V., D'Allura, A., Baklanov, A., Dieguez, J., Fay, B., Finardi, S., Glaab, H., Hoe, S.C., Millan, M., Mahura, A., Neunhauserer, L., Palau, J.L., Perez, G., Slørdal, L.H., Stein, A., Havskov Sørensen, J., 2005: Study of sensitivity of UAP forecasts to meteorological input. met.no report 13/2005. http://met.no/english/r_and_d_activities/publications/2005/13_2005/abstract_13_2005.html
Olesen, H.R., 2001: Ten years of harmonization activities: past, present and future. 7th Int. Conf. on Harmonization within Atmospheric Dispersion Modelling for Regulatory Purposes, Belgirate, Italy. National Environmental Research Institute, Roskilde, Denmark. Web page: www.harmo.org.

Olesen, H.R., 2007: Computing hit rate. National Environmental Research Institute, Roskilde, Denmark. Web page: www.harmo.org.
Pernigotti, D., Sansone, M. and Ferrario, M., 2005: Validation of one-year LAMI model re-analysis on the Po Valley, northern Italy. Comparison to CALMET model output on the sub-area of the Veneto Region. Proceedings of the 10th International Conference on Harmonisation within Atmospheric Dispersion Modelling for Regulatory Purposes, HARMO 10, Sissi (Crete, Greece), 17-20 October 2005.
Poppe, D. and Kuhn, M., 1996: Intercomparison of the gas-phase chemistry of several chemistry and transport models. EUROTRAC-ISS Report.
Roemer, M., Beekmann, M., Bergström, R., Boersen, G., Feldmann, H., Flatøy, F., Honore, C., Langner, J., Jonson, J.E., Matthijsen, J., Memmesheimer, M., Simpson, D., Smeets, P., Solberg, S., Stern, R., Stevenson, D., Zandveld, P. and Zlatev, Z., 2003: Ozone trends according to ten dispersion models. EUROTRAC-2 Special Report, EUROTRAC International Scientific Secretariat, GSF - National Research Center for Environment and Health, Munich, Germany.
San José, R., Pérez, J.L., González, R.M., 2004: A mesoscale study of the impact of industrial emissions by using the MM5-CMAQ modelling system. Intern. J. of Environment and Pollution, 22(1-2), 144-162.
San José, R., Stohl, A., Karatzas, K., Bohler, T., James, P., Pérez, J.L., 2005: A modelling study of an extraordinary night-time episode over the Madrid domain. Environmental Modelling and Software, 20(5), 587-593.
San José, R., Pérez, J.L., González, R.M., 2006: The use of MM5-CMAQ for an incinerator air quality impact assessment for metals, PAH, dioxins and furans: Spain case study. Lecture Notes in Computer Science, Vol. 3743, Large-Scale Scientific Computations, pp. 498-505, Springer-Verlag GmbH.
San José, R., Pérez, J.L., González, R.M., 2007: An operational real-time air quality modelling system for industrial plants. Environmental Modelling and Software, 22, 297-307.
Sathya, V., 2003: Uncertainty analysis in air quality modelling - the impact of meteorological input uncertainties. PhD Thesis, École Polytechnique Fédérale de Lausanne.
Schlünzen, K.H., 1990: Numerical studies on the inland penetration of sea breeze fronts at a coastline with tidally flooded mudflats. Beitr. Phys. Atmosph., 63, 243-256.
Schlünzen, K.H., 1997: On the validation of high-resolution atmospheric mesoscale models. J. Wind Engineering and Industrial Aerodynamics, 67 & 68, 479-492.
Schlünzen, K.H., 2002: Simulation of transport and chemical transformations in the atmospheric boundary layer - review on the past 20 years developments in science and practice. Meteorol. Zeitschrift, 11, 303-313.
Schlünzen, K.H., Bigalke, K., Lüpkes, C., Niemeier, U. and von Salzen, K., 1996: Concept and realization of the mesoscale transport- and fluid-model 'METRAS'. METRAS Techn. Rep. 5, Meteorologisches Institut, Universität Hamburg, Germany, 156 pp.
Schlünzen, K.H., Builtjes, P., Deserti, M., Douros, J., Kaasik, M., Labancz, K., Matthias, V., Miranda, A.I., Moussiopoulos, N., Ødegaard, V., San Jose, R., Sokhi, R., Sofiev, M., Struzewska, J., 2007: Model evaluation methodologies for mesoscale atmospheric models. DACH 2007, 10.-14.09.2007, Hamburg, extended abstract on the web.††
Schlünzen, K.H., Hinneburg, D., Knoth, O., Lambrecht, M., Leitl, B., Lopez, S., Lüpkes, C., Panskus, H., Renner, E., Schatzmann, M., Schoenemeyer, T., Trepte, S. and Wolke, R., 2003: Flow and transport in the obstacle layer - first results of the microscale model MITRAS. J. Atmos. Chem., 44, 113-130.
Schlünzen, K.H., Katzfey, J.J., 2003: Relevance of subgrid-scale land-use effects for mesoscale models. Tellus, 55A, 232-246.
Schlünzen, K.H., Krell, U., 1994: Mean and local transport in air. In: Circulation and Contaminant Fluxes in the North Sea, Springer Verlag, Berlin, pp. 317-344.
Schlünzen, K.H., Meyer, E.M.I., 2007: Impacts of meteorological situations and chemical reactions on daily dry deposition of nitrogen into the southern North Sea. Atmospheric Environment, 41(2), 289-302.
Scire, J. et al., 2000: A user's guide for the CALMET meteorological model.
Seaman, N.L., 2000: Meteorological modeling for air-quality assessments. Atmospheric Environment, 34, 2231-2259.
Shafran, P.C., Seaman, N.L. and Gayno, G.A., 2000: Evaluation of numerical predictions of boundary layer structure during the Lake Michigan Ozone Study. Journal of Applied Meteorology, 39(3), 412-426.

†† http://meetings.copernicus.org/dach2007/download/DACH2007_A_00399.pdf

Sokhi, R.S., San José, R., Kitwiroon, N., Fragkou, E., Pérez, J.L., Middleton, D.R., 2006: Prediction of ozone levels in London using the MM5-CMAQ modelling system. Environmental Modelling and Software, 21(4), 566-576.
Stein, U., Alpert, P., 1993: Factor separation in numerical simulations. Journal of the Atmospheric Sciences, 50(14), 2107-2115.
Steppeler, J., Doms, G., Schättler, U., Bitzer, H.W., Gassmann, A., Damrath, U., Gregoric, G., 2003: Meso-gamma scale forecasts using the nonhydrostatic model LM. Meteorology and Atmospheric Physics, 82, 75-96.
Stern, R., Flemming, J., 2004: Formulation of criteria to be used for the determination of the accuracy of model calculations according to the requirements of the EU Directives for air quality - examples using the chemical transport model REM-CALGRID. Freie Universität Berlin, Institut für Meteorologie.
Thunis, P., Galmarini, S., Martilli, A., Clappier, A., Andronopoulos, S., Bartzis, J., Vlachogianni, M., de Ridder, K., Moussiopoulos, N., Sahm, P., Almbauer, R., Sturm, P., Oettl, D., Dierer, S. and Schlünzen, K.H., 2003: MESOCOM: an inter-comparison exercise of mesoscale flow models applied to an ideal case simulation. Atmos. Environ., 37, 363-382.
Trukenmüller, A., Grawe, D. and Schlünzen, K.H., 2004: A model system for the assessment of ambient air quality conforming to EC Directives. Meteorol. Zeitschrift, 13, 387-394.
US EPA, 1991: Guideline for regulatory application of the Urban Airshed Model. EPA-450/4-91-013, United States Environmental Protection Agency, Research Triangle Park, NC 27711, July 1991.
US EPA, 1996: Compilation of Photochemical Models' Performance Statistics for 11/94 Ozone SIP Applications. EPA-454/R-96-004, US EPA, Office of Air Quality Planning and Standards, Research Triangle Park, NC 27711, 156 pp. Web page: http://nepis.epa.gov/pubtitle.htm.
van Loon, M., 2006: Evaluation of long-term ozone simulations from seven regional air quality models and their ensemble average. Submitted to Atmospheric Environment.
van Loon, M., Roemer, M.G.M. and Builtjes, P.J.H., 2004: Model inter-comparison in the framework of the review of the Unified EMEP model. Report prepared by TNO Environment, Energy and Process Innovation, Apeldoorn, The Netherlands (forthcoming) (http://www.mep.tno.nl/EMEP_review).
Vautard, R., 2006: Is regional air quality model diversity representative of uncertainty for ozone simulation? Submitted to Geophysical Research Letters.
Vautard, R., Honore, C., Beekmann, M., Rouil, L., 2005: Simulation of ozone during the August 2003 heat wave and emission control scenarios. Atmospheric Environment, 39(16), 2957-2967.
VDI, 2005: Environmental meteorology - Prognostic microscale wind field models - Evaluation for flow around buildings and obstacles. VDI Guideline 3783, Part 9, VDI, Düsseldorf, Germany.
VDI, 2008: Environmental meteorology - Prognostic mesoscale wind field models - Evaluation for dynamically and thermally induced flow fields. VDI Guideline 3783, Part 7, VDI, Düsseldorf, Germany, in preparation.
Warner, S., Platt, N. and Haegy, J.F., 2004: Applications of user-oriented measure of effectiveness to transport and dispersion model predictions of the European Tracer Experiment. Atmospheric Environment, 38, 6789-6801.
Warner, S., Platt, N. and Haegy, J.F., 2005: Comparison of transport and dispersion model predictions of the European Tracer Experiment - user-oriented measures of effectiveness. Atmospheric Environment, 39, 4425-4437.
Yegnan, A., Williamson, D.G., Graettinger, A.J., 2002: Uncertainty analysis in air dispersion modelling. Environmental Modelling and Software, 17(7), 639-649.
Yu, Y., Sokhi, R.S. and Middleton, D.R., 2006: Estimating contributions of Agency-regulated sources to secondary pollutants using CMAQ and NAME III models. Report for the UK Environment Agency.
Yu, Y., Sokhi, R.S., Kitwiroon, N., Middleton, D.R. and Fisher, B., 2008: Performance characteristics of MM5-SMOKE-CMAQ for a summer photochemical episode in southeast England, United Kingdom. Atmospheric Environment, 42, 4870-4883. http://dx.doi.org/10.1016/j.atmosenv.2008.02.051
Zhang, F., Bei, N., Nielsen-Gammon, J.W., Li, G., Zhang, R., Stuart, A., Aksoy, A., 2007: Impacts of meteorological uncertainties on ozone pollution predictability estimated through meteorological and photochemical ensemble forecasts. J. Geophys. Res., 112, D04304, doi:10.1029/2006JD007429.
Zhong, S.Y. and Fast, J., 2003: An evaluation of the MM5, RAMS, and Meso-Eta models at subkilometer resolution using VTMX field campaign data in the Salt Lake Valley. Monthly Weather Review, 131(7), 1301-1322.
Zimmermann, H., 1995: Field Phase Report of the TRACT Field Measurement Campaign. EUROTRAC International Scientific Secretariat, Garmisch-Partenkirchen, Germany.

ANNEX A

GLOSSARY OF TERMS

Accuracy*): Extent of agreement between a value to be determined and a reference value.

Deviation from result*): Difference between a model result and the reference value.

Evaluation*)†): Assessment of the response of a model and the associated programmes with respect to its performance characteristics, including comparison with measured data, probabilistic and statistical analysis, process analysis and sensitivity analysis. Typically, comparisons are made against a set of standards.

Macroscale model: Hemispheric and global models.

Mesoscale model: Regional model covering domain scales of the order of a few 100 km to a few 1000 km.

Model*): Description of atmospheric and associated processes according to physical principles, using fundamental physical equations, assumptions, approximations and parametrizations. The equation systems of the models described are solved by means of numerical methods with specified boundary and initial values.

Microscale model: Model which resolves the canopy layer and obstacles.

Programme*): Translation of the model onto a computer using a computer programming language.

Model calculation*): Use of the programme for a specific application.

Validation*)†): Testing the extent to (or the accuracy with) which a programme describes, within the formal scope of the model, the phenomena for which it was developed. This can include a comparison with measured data. To validate a model, scope-specific criteria need to be defined.

Verification*): Act of confirming that the model exhibits the specified behaviour in terms of output results and process analysis for a given case. A model cannot be verified in general (for all possible cases) but might be verifiable for single cases.

Precision‡): Extent of agreement between independent measurements or independent model results for the same situation.

Repeatability*): Extent of agreement of the results of two model experiments performed under the same conditions (same computer, compiler, person, input).

Representativeness: Range of validity of a measurement over space and time.

Reproducibility*): A measure indicating how well the model results can be reproduced, e.g. by another person, by following a procedure that defines how the model experiment to be reproduced is to be performed.

*) definition based on VDI (2005) and clarified for the mesoscale
†) clarifications used in the definition given by Schlünzen (1997) are used here
‡) definition taken as an abridged version from DIN ISO 6879 and adapted for mesoscale models

ANNEX B

ENTRIES TO THE WEB-BASED MODEL INVENTORY

The tables summarize all entries in the web-based model inventory (http://www.cost728.org; Table 21, Table 22, Table 23) for the different scales.

Table 21. Entries in the web-based model inventory – microscale models (COST 732; status 05.02.2007)

Meteorology: ADREA, Chensi, M-SYS, MERCURE, Meso-NH, MIMO, MITRAS, RCG, STAR-CD, VADIS

Transport or chemistry & transport: ADREA, AERMOD, Chensi, M-SYS, MERCURE, Meso-NH, MICTM, MIMO, MITRAS, NAME, RCG, STAR-CD, VADIS

Meteorology & chemistry & transport: M-SYS, Meso-NH, MIMO, RCG

Table 22. Entries in the web-based model inventory – global scale (macroscale) models (status 05.02.2007)

Meteorology: GME, Hirlam, UM

Transport or chemistry & transport: CAM-CHEM, CHIMERE, CHIMERE (ARPA-IT), EMEP, FLEXPART, FLEXPART/A, GEOS-Chem, GOCART, IMPACT, LPDM, MATCH, MOCAGE, SILAM

Meteorology & chemistry & transport: no entries

Table 23. Entries in the web-based model inventory – mesoscale models (COST728; status 24.10.2007)

Meteorology: ADREA, ALADIN/A, ALADIN/PL, ARPS, BOLCHEM, CALMET/CALPUFF, CALMET/CAMx, CLM, COSMO-2, COSMO-7, ENVIRO-HIRLAM, GESIMA, GME, Hirlam, LAMI* (COSMO_IT), LM-MUSCAT, LME, LME_MH, M-SYS, MC2-AQ, MCCM, MEMO (UoT-GR), MEMO (UoA-PT), MERCURE, Meso-NH, METRAS, MM5 (met.no), MM5 (UoA-GR), MM5 (UoA-PT), MM5 (UoH-UK), MM5 (GKSS), NHHIRLAM, RAMS, RCG, SAIMM, TAPM, UM, WRF-ARW, WRF/Chem

Transport or chemistry & transport: ADREA, AERMOD, ALADIN-CAMx, AURORA, BOLCHEM, CAC, CALGRID, CALMET/CALPUFF, CALMET/CAMx, CAMx, CHIMERE, CHIMERE (ARPA-IT), CMAQ, CMAQ (GKSS), EMEP, ENVIRO-HIRLAM, EPISODE, EURAD-IM, FARM, FLEXPART, FLEXPART V6.4, FLEXPART/A, LM-MUSCAT, LOTOS-EUROS, LPDM, M-SYS, MARS (UoT-GR), MARS (UoA-PT), MATCH, MC2-AQ, MCCM, MECTM, MEMO (UoT-GR), MERCURE, Meso-NH, MOCAGE, MUSE, NAME, OFIS, RCG, SILAM, TAPM, TCAM, TREX, WRF/Chem

Meteorology & chemistry & transport: BOLCHEM, CALMET/CALPUFF, CALMET/CAMx, LM-MUSCAT, M-SYS, MC2-AQ, MCCM, Meso-NH, RCG, TAPM, WRF/Chem

* LAMI is the old name of COSMO-IT

ANNEX C

ESTIMATES FOR MEASUREMENT AND MODEL UNCERTAINTY

Table 24. Estimates by Heinke Schlünzen (Meteorological Institute, University of Hamburg) and Sylvia Bohnenstengel (Max-Planck-Institut für Meteorologie, Hamburg). For each variable the first value is the uncertainty estimate for the variable from measurements (includes 95% of measurement data; initial and comparison data), the second the estimate for the variable from model results (includes 95% of data); "—" marks cells for which no value could be recovered.

ozone concentration at surface: ±15%* | factor of 2†
NOx concentration at surface: ±15%* | factor of 2†
NO/NO2 speciation at source: ±10% | factor of 3†
VOC concentration at surface: ±30% | factor of 2
top ozone concentration: ±25% | factor of 2
top NOx concentration: ±25% | factor of 2
top VOC concentration: ±50% | factor of 2†
side ozone concentration: ±15% | factor of 2†
side NOx concentration: ±15% | factor of 3†
side VOC concentration: ±30% | factor of 2†
major point NOx emissions: hourly base: factor of 5 | factor of 3†
major point VOC emissions: hourly base: factor of 8 | —
wind speed: ±1 m/s; 15% | ±3 m/s to ±5 m/s
wind direction: ±30° | ±60°
ambient temperature: ±1.5 K | ±4 K
dewpoint temperature: ±1.5 K | ±4 K
H2O concentration (as RH): RH absolute ±5%, RH relative ±10% | RH absolute ±10%, RH relative ±20%
vertical diffusivity (8AM-6PM; < 1000 m AGL): — | factor of 10
vertical diffusivity (all other times and heights): — | factor of 5
rainfall amount‡: daily value factor of 3.3, weekly value factor of 2.1, monthly value factor of 1.4, annual values 2%§ | daily value factor of 4, weekly value factor of 3, monthly value factor of 1.5, annual values 5%
cloud cover (tenths): ±30% | ±80%
cloud liquid water content: upper air: factor of 10000 | upper air: factor of 10000
surface pressure: ±170 Pa | ±5 hPa
area biogenic NOx emission: factor of 2** | factor of 3
area biogenic VOC emission: factor of 2** | factor of 3
area mobile NOx emission: factor of 3* | —
area mobile VOC emission: factor of 3* | —
area low point VOC emission: factor of 3* | —
other area NOx emissions: factor of 3* | —
other area VOC emissions: factor of 3* | —
NO2, HCHOr, HCHOs, ALDs and O3-O1D photolysis rates: factor of 3*, factor of 3† | factor of 2*

* This includes uncertainties resulting from a lack of representativeness of the sites (traffic sites are left out).
† Within the urban canopy layer uncertainties will be larger.
‡ It is assumed here that single-station data are used for comparison. The area-representative model results are also assumed to be compared with single-station measurements (not radar or satellite data).
§ Bohnenstengel and Schlünzen (2007): A locality index to classify meteorological situations with respect to precipitation. Submitted to Journal of Applied Meteorology.
** Concerns the principal relation of biogenic emissions with respect to vegetation (does not include temperature/humidity/radiation errors).

Table 25. Estimates by Ranjeet Sokhi, University of Hertfordshire. For each variable the first value is the uncertainty range (includes 95% of data), the second the sigma (log-normal unless noted). A further column, the uncertainty estimate for the variable (includes 95% of data), contains the entries ±10%, ±10°, ±2 K, ±8%, ±10% and -15%.

initial ozone concentration: factor of 3 | 0.549
initial NOx concentration: factor of 5 | 0.805
initial VOC concentration: factor of 5 | 0.805
top ozone concentration: factor of 1.5 (50%) | 0.203
top NOx concentration: factor of 3 | 0.549
top VOC concentration: factor of 3 | 0.549
side ozone concentration: factor of 1.5 | 0.203
side NOx concentration: factor of 3 | 0.549
side VOC concentration: factor of 3 | 0.549
major point NOx emissions: factor of 1.5 | 0.203
major point VOC emissions: factor of 1.5 | 0.203
wind speed: factor of 1.5 | 0.203
wind direction: ±40° | 20° (normal)
ambient temperature: ±3 K | 1.5 K (normal)
H2O concentration (as RH): 30% | 15.0% (normal)
vertical diffusivity (8AM-6PM; < 1000 m AGL): factor of 1.3 (30%) | 0.131
vertical diffusivity (all other times and heights): factor of 3 | 0.549
rainfall amount: factor of 2 | 0.347
cloud cover (tenths): 30% | 15% (normal)
cloud liquid water content: factor of 2 | 0.347
area biogenic NOx emission: factor of 2 | 0.347
area biogenic VOC emission: factor of 2 | 0.347
area mobile NOx emission: factor of 2 | 0.347
area mobile VOC emission: factor of 2 | 0.347
area low point VOC emission: factor of 2 | 0.347
other area NOx emissions: factor of 2 | 0.347
other area VOC emissions: factor of 2 | 0.347
NO2, HCHOr, HCHOs, ALDs and O3-O1D photolysis rates†: factor of 2 | 0.347
CB-4 reactions 1-94*: factor of 1.01 to 3.02, median 1.80, mode 2.5 | 0.10 to 0.55, median 0.30, mode 0.46

* assuming hourly emission data.
† including errors in cloudiness and upper-air concentrations.
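The sigma column is numerically consistent with interpreting a "factor of F" 95% range as a log-normal distribution whose ±2σ interval spans that factor, i.e. σ = ln(F)/2 (ln(2)/2 ≈ 0.347, ln(3)/2 ≈ 0.549, ln(5)/2 ≈ 0.805). The following is a minimal Python sketch of how such uncertainties might be sampled in a Monte Carlo analysis; the function and variable names are illustrative assumptions, not taken from the report:

    import numpy as np

    def lognormal_sigma(factor: float) -> float:
        """Sigma of a log-normal distribution whose +/-2-sigma interval
        spans a 'factor of F' 95% uncertainty range."""
        return np.log(factor) / 2.0

    def perturb(base_value: float, factor: float, n: int, seed: int = 0) -> np.ndarray:
        """Draw n log-normally distributed perturbations of base_value."""
        rng = np.random.default_rng(seed)
        return base_value * rng.lognormal(mean=0.0, sigma=lognormal_sigma(factor), size=n)

    # Example: area mobile NOx emission with a 'factor of 2' uncertainty
    print(round(lognormal_sigma(2.0), 3))        # 0.347, as in Table 25
    samples = perturb(100.0, 2.0, 100000)
    print(np.percentile(samples, [2.5, 97.5]))   # roughly [50, 200]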


ANNEX D

STATISTICAL MEASURES FOR METEOROLOGICAL PARAMETERS

The statistical measures most commonly used with numerical weather prediction (NWP) models are discussed below in relation to meteorological parameters from NWP simulations. For some parameters the error inevitably increases with the measured value. Normalization of the measure is one way to avoid that the large errors related to a few extreme observations dominate the error measure. In the following, predicted values are denoted by Pi and observed values by Oi, for each single site and each time i; all in all we consider a dataset of N values.

D-1 ERROR (Ei)

The difference Ei between simulation and observation is calculated for each site and each time by

$$E_i = P_i - O_i \qquad (3)$$

This error should be 0.0 for an ideal forecast.

D-2 AVERAGE VALUES

The average values of the measurements and of the model results are calculated as given in Eq. (4) and (5), respectively.

$$E_i = P_i - O_i \quad (3)$$

This error should be 0.0 for an ideal forecast.

D-2 AVERAGE VALUES

The average values of the measurements and of the model results are calculated as given in Eqs. (4) and (5), respectively.

$$\overline{O} = \frac{1}{N}\sum_{i=1}^{N} O_i \quad (4)$$

$$\overline{P} = \frac{1}{N}\sum_{i=1}^{N} P_i \quad (5)$$

The two averages should be the same for an ideal forecast.

D-3 STANDARD DEVIATIONS

The standard deviations of the measurements and of the model results are calculated as given in Eqs. (6) and (7), respectively.

$$\sigma_O = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_i - \overline{O}\right)^2} \quad (6)$$

$$\sigma_P = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - \overline{P}\right)^2} \quad (7)$$

The two standard deviations should be the same for an ideal forecast.

D-4 BIAS

The average difference (BIAS) of all Ei for each forecast length is calculated as

$$\mathrm{BIAS} = \frac{1}{N}\sum_{i=1}^{N} E_i = \overline{P} - \overline{O} \quad (8)$$

The BIAS should be zero for an ideal forecast. BIAS gives a measure of the sign of the error of the simulations. This is particularly useful for model developers, as it points to weak parts of the model. In addition, BIAS is easily corrected with statistical post-processing. It is important to note that BIAS can vary in time and space; breaking the data up by time of year, time of day and single locations gives more information and better possibilities for correction. Moreover, BIAS gives information on systematic model errors for particular observational values, e.g. low wind speed cases, if the data are sorted accordingly when calculating the BIAS.
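For illustration, Eqs. (3)-(8) can be computed directly from paired series. A minimal NumPy sketch, assuming P and O are arrays of model and observed values paired by site and time (function and variable names are illustrative, not from the report):

```python
import numpy as np

def basic_scores(P, O):
    """Basic comparison measures, Eqs. (3)-(8), for paired model/observation series."""
    P, O = np.asarray(P, float), np.asarray(O, float)
    E = P - O                    # Eq. (3): one error per site and time
    return E, {
        "O_mean": O.mean(),      # Eq. (4)
        "P_mean": P.mean(),      # Eq. (5)
        "sigma_O": O.std(),      # Eq. (6); np.std uses the 1/N normalization of the annex
        "sigma_P": P.std(),      # Eq. (7)
        "BIAS": E.mean(),        # Eq. (8) = P_mean - O_mean
    }
```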

D-5 STANDARD DEVIATION OF ERROR (STDE)

The standard deviation of error (STDE) evaluates the non-systematic part of the error and is a measure of model predictability.

$$\mathrm{STDE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(P_i - \overline{P}\right) - \left(O_i - \overline{O}\right)\right]^2} \quad (9)$$

The STDE should be zero for an ideal forecast. STDE usually increases with forecast length; if STDE has the same magnitude throughout the simulation, this can be interpreted as the error being saturated already at the initial time due to model deficiencies.

D-6 SKILL VARIANCE (SKVAR)

The skill variance (SKVAR) evaluates the ability of the model to reproduce the variance of the observed data. It is sometimes also named normalized standard deviation.

$$\mathrm{SKVAR} = \frac{\sigma_P}{\sigma_O} \quad (10)$$

The SKVAR should be one for an ideal forecast.

D-7 ROOT MEAN SQUARE ERROR (RMSE)

The total error (RMSE) results from STDE and BIAS:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i - O_i\right)^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} E_i^2} = \sqrt{\mathrm{BIAS}^2 + \mathrm{STDE}^2} \quad (11)$$

Root mean square error (RMSE) is simply a combination of BIAS and STDE and expresses the total model error. The RMSE should be zero for an ideal forecast. It is a useful measure for comparing, e.g., two different models. One should bear in mind that the squaring implies that a few large errors have relatively more impact on the measure than many small errors.

D-8 CORRELATION COEFFICIENT (r)

The correlation coefficient (r) is very similar to STDE; however, r is dimensionless, while STDE has the dimension of the measured parameter. Caution is needed when using r on very long time series, e.g. years: the annual amplitude of the temperature is usually so large that r will be dominated by this large-scale structure, hiding smaller-scale errors such as diurnal cycles. Caution is also needed because a large systematic error (BIAS) will not be expressed in r. It is calculated as:

$$r = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(O_i - \overline{O}\right)\left(P_i - \overline{P}\right)}{\sigma_O \, \sigma_P} \quad (12)$$
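A minimal sketch of Eqs. (9)-(12), including a numerical check of the identity RMSE² = BIAS² + STDE² from Eq. (11); as above, P and O are assumed to be paired arrays and the names are illustrative:

```python
import numpy as np

def second_order_scores(P, O):
    """STDE, SKVAR, RMSE and r, Eqs. (9)-(12)."""
    P, O = np.asarray(P, float), np.asarray(O, float)
    bias = (P - O).mean()                             # Eq. (8)
    stde = np.std((P - P.mean()) - (O - O.mean()))    # Eq. (9)
    skvar = P.std() / O.std()                         # Eq. (10)
    rmse = np.sqrt(np.mean((P - O) ** 2))             # Eq. (11)
    r = np.mean((O - O.mean()) * (P - P.mean())) / (O.std() * P.std())  # Eq. (12)
    # Eq. (11): the total error decomposes into systematic and random parts
    assert np.isclose(rmse ** 2, bias ** 2 + stde ** 2)
    return {"BIAS": bias, "STDE": stde, "SKVAR": skvar, "RMSE": rmse, "r": r}
```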

The correlation coefficient r should be one for an ideal forecast.

D-9 HIT RATE (H)

The hit rate (H) is defined as the fraction of the total simulation data whose values lie inside an acceptable range DA of the simultaneous observations. H is particularly useful as an overall measure of model performance, and it is one of the few measures that do not assume a Gaussian distribution of the errors. The hit rate can be interpreted as a probability of detection (POD).


$$H = \frac{1}{N}\sum_{i=1}^{N} n_i \quad \text{with} \quad n_i = \begin{cases}1 & \text{for } |E_i| \le \mathrm{DA}\\ 0 & \text{else}\end{cases} \quad (13)$$

DA is the desired accuracy (Table 26). H allows the comparison of model results for quite different meteorological situations. Table 26 gives guidance on ranges that can be used for categorizing values of some weather parameters. The hit rate H should be 100% for an ideal forecast.

Table 26. Desired accuracy DA (values taken from Cox et al., 1998*)

| Variable | Wind direction | Temperature (°C) | Dew point depression (°C) | Wind speed (m s-1) | Pressure (hPa) |
|---|---|---|---|---|---|
| Desired accuracy DA | ± 30° | ± 2 | ± 2 | ± 1 for ff < 10 m s-1; ± 2.5 for ff > 10 m s-1 | ± 1.7 |
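A minimal sketch of Eq. (13) using the Table 26 accuracies; the sample numbers are purely illustrative, not data from the report:

```python
import numpy as np

def hit_rate(P, O, DA):
    """Hit rate H, Eq. (13): fraction of forecasts whose error |E_i| is within
    the desired accuracy DA (DA may be a scalar or a per-sample array)."""
    E = np.asarray(P, float) - np.asarray(O, float)
    return np.mean(np.abs(E) <= DA)        # 1.0 (i.e. 100%) for an ideal forecast

# Temperature: DA = +/- 2 degC
O_t2m = np.array([12.1, 15.4, 18.0])
P_t2m = np.array([13.0, 14.2, 21.5])
print(hit_rate(P_t2m, O_t2m, DA=2.0))      # errors 0.9, -1.2, 3.5 -> 0.667

# Wind speed: DA = 1 m/s below 10 m/s, 2.5 m/s above (Cox et al., 1998)
O_ff = np.array([3.2, 8.0, 14.5])
P_ff = np.array([4.0, 9.5, 16.0])
print(hit_rate(P_ff, O_ff, DA=np.where(O_ff < 10.0, 1.0, 2.5)))   # -> 0.667
```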

Precipitation categories are suggested to have a desired accuracy dependent on the precipitation amount (0-1 mm/day, 1-5 mm/day, 5-10 mm/day, 10-25 mm/day, above 25 mm/day). A similar approach can be used for wind speed.

D-10 HIT RATIO (HR)

For precipitation the hit ratio (HR) is also a very good measure; it describes the model's capability of simulating extreme events (no rain or rain). Using the contingency table counts a-d (Table 27), HR can be calculated with Eq. (14). The optimum value for HR is 1.

$$\mathrm{HR} = \frac{a + d}{a + b + c + d} \quad (14)$$

Table 27. Contingency table (yes means value inside interval DA)

| | observed event yes | observed event no |
|---|---|---|
| forecast event yes | a | b |
| forecast event no | c | d |

For the calculation of an overall HR, e.g. hits in all categories of the event, the accuracy ranges used in the contingency tables are suggested to be constant for parameters such as temperature, dew point temperature and pressure. For wind speed and precipitation the accuracy ranges should be larger for large amounts/speeds, as the variability in the observed values increases with increasing amounts/speeds. For precipitation the following ranges are suggested: 0 to 1 mm/day, 1 to 5 mm/day, 5 to 10 mm/day, 10 to 25 mm/day, above 25 mm/day. For wind the suggested ranges are 0 to 2 m/s, 2 to 5 m/s, 5 to 10 m/s, 10 to 20 m/s, above 20 m/s.

D-11 FALSE ALARM RATIO (FAR)

The false alarm ratio (FAR) should also be calculated, to ensure that the frequency of extreme events is not over-predicted. Using the classification given in Table 27, FAR can be calculated as:

$$\mathrm{FAR} = \frac{b}{a + b} \quad (15)$$

Values for FAR range from 0 to 1, optimal score is 0.

____________
* Values are also used in Schlünzen and Katzfey (2003), Trukenmüller et al. (2004), Schlünzen and Meyer (2006).


An alternative definition is:

$$\mathrm{FAR1} = \frac{b}{b + d} \quad (16)$$

Again, the optimum value is 0.

D-12 DIRECTION WEIGHTED WIND ERROR (DIST)

The direction weighted wind error (DIST) takes into account both wind intensity and direction. With ui and vi being the wind vector components (either measured, Oi, or predicted, Pi) at a specific site and time, and N being the total number of observations, DIST is defined as:

$$\mathrm{DIST} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(u_{P,i} - u_{O,i}\right)^2 + \left(v_{P,i} - v_{O,i}\right)^2\right]} \quad (17)$$

The DIST should be zero for an ideal forecast.

D-13 MEAN ABSOLUTE ERROR (MAE)

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left|E_i\right| \quad (18)$$

The MAE takes only positive values and is less sensitive to large errors than the root mean square error. The MAE should be zero for an ideal forecast.

D-14 PROBABILITY OF DETECTION (POD)

In order to evaluate the model's ability to forecast a particular event, e.g. rain (event yes) / no rain (event no), the probability of detection (POD; Nurmi, 1994) is commonly used. Using the classification given in Table 27, POD is defined as:

$$\mathrm{POD} = \frac{a}{a + c} \quad (19)$$

The values of POD lie between 0 and 1; the optimal score is 1. POD can be interpreted as the number of correct alarms in relation to the number of occurring events. Combining POD and FAR gives

$$\mathrm{POD} + \mathrm{FAR} \;\; \begin{cases} > 1 & \text{systematic overestimation} \\ = 1 & \text{no bias} \\ < 1 & \text{systematic underestimation} \end{cases} \quad (20)$$

D-15 HANSSEN-KUIPERS SKILL SCORE (KSS)

POD and FAR1 are combined to give the Hanssen-Kuipers skill score (KSS):

$$\mathrm{KSS} = \mathrm{POD} - \mathrm{FAR1} \quad (21)$$

The KSS range is -1 to 1; the optimal score is 1.
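The contingency-table measures of D-10, D-11, D-14 and D-15 can be computed together once the counts a-d of Table 27 are derived from paired event series. A minimal sketch, assuming boolean yes/no series for forecast and observation (names and example values are illustrative):

```python
import numpy as np

def contingency_scores(forecast_event, observed_event):
    """Categorical scores from the Table 27 counts: HR (Eq. 14), FAR (Eq. 15),
    FAR1 (Eq. 16), POD (Eq. 19), POD+FAR (Eq. 20) and KSS (Eq. 21)."""
    f = np.asarray(forecast_event, bool)
    o = np.asarray(observed_event, bool)
    a = np.sum(f & o)        # event forecast and observed
    b = np.sum(f & ~o)       # false alarm
    c = np.sum(~f & o)       # missed event
    d = np.sum(~f & ~o)      # correct rejection
    pod = a / (a + c)        # Eq. (19)
    far = b / (a + b)        # Eq. (15)
    far1 = b / (b + d)       # Eq. (16)
    return {
        "HR": (a + d) / (a + b + c + d),   # Eq. (14)
        "FAR": far,
        "FAR1": far1,
        "POD": pod,
        "POD+FAR": pod + far,              # Eq. (20): >1 over-, <1 underestimation
        "KSS": pod - far1,                 # Eq. (21)
    }

# Example: rain / no-rain events at six times (illustrative values)
print(contingency_scores([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```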

ANNEX E

STATISTICAL MEASURES FOR CONCENTRATIONS

This section includes those statistical measures that are frequently used for evaluating concentration forecasts. Some of the parameters are already defined in Annex D; their formulas are repeated here only for clarity. As in Annex D, predicted values are denoted by Pi, observed values by Oi, and N values are considered.

Table 28. Quality indicators for air quality model performance evaluation

| Parameter | Formula | Ideal value | Eq. |
|---|---|---|---|
| Average observed value | $\overline{O} = \frac{1}{N}\sum_{i=1}^{N} O_i$ | — | (4) |
| Average modelled value | $\overline{P} = \frac{1}{N}\sum_{i=1}^{N} P_i$ | same as $\overline{O}$ | (5) |
| Standard deviation of measurements | $\sigma_O = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_i-\overline{O}\right)^2}$ | — | (6) |
| Standard deviation of model results | $\sigma_P = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i-\overline{P}\right)^2}$ | same as $\sigma_O$ | (7) |
| Average normalized absolute BIAS | $\mathrm{ANB} = \frac{\overline{P}-\overline{O}}{\overline{O}}$ | 0.0 | (22) |
| Mean normalised BIAS | $\mathrm{MNB} = \frac{1}{N}\sum_{i=1}^{N}\frac{P_i-O_i}{O_i}$ | 0.0 | (23) |
| Mean normalised error | $\mathrm{MNE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|P_i-O_i\right|}{O_i}$ | 0.0 | (24) |
| Standard deviation of error | $\mathrm{STDE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(P_i-\overline{P}\right)-\left(O_i-\overline{O}\right)\right]^2}$ | 0.0 | (9) |
| Fractional bias | $\mathrm{FB} = \frac{\overline{P}-\overline{O}}{0.5\left(\overline{P}+\overline{O}\right)}$ | 0.0 | (25) |
| Geometric mean bias | $\mathrm{MG} = \exp\left(\frac{1}{N}\sum_{i=1}^{N}\ln P_i - \frac{1}{N}\sum_{i=1}^{N}\ln O_i\right)$ | 1.0 | (26) |
| Geometric variance | $\mathrm{VG} = \exp\left(\frac{1}{N}\sum_{i=1}^{N}\left(\ln P_i - \ln O_i\right)^2\right)$ | 1.0 | (27) |
| Skill variance | $\mathrm{SKVAR} = \frac{\sigma_P}{\sigma_O}$ | 1.0 | (10) |
| Root mean square error | $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_i-O_i\right)^2} = \sqrt{\mathrm{BIAS}^2+\mathrm{STDE}^2}$ | 0.0 | (11) |
| Normalized mean square error | $\mathrm{NMSE} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(P_i-O_i\right)^2}{\overline{P}\,\overline{O}}$ | 0.0 | (28) |

| Correlation coefficient | $r = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(O_i-\overline{O}\right)\left(P_i-\overline{P}\right)}{\sigma_O\,\sigma_P}$ | 1.0 | (12) |
| Coefficient of variation | $\mathrm{CV} = \frac{\mathrm{STDE}}{\overline{O}}$ | 0.0 | (29) |
| Fraction of predictions within a factor of two of observations | $\mathrm{FAC2} = \frac{1}{N}\sum_{i=1}^{N} n_i$ with $n_i = 1$ for $0.5 \le \frac{P_i}{O_i} \le 2$, else 0 | 1.0 | (30) |
| Hit rate | $\mathrm{HC} = \frac{1}{N}\sum_{i=1}^{N} n_i$ with $n_i = 1$ for $\left|\frac{P_i-O_i}{O_i}\right| \le A$ or $\left|E_i\right| \le \mathrm{DA}$, else 0 (A: desired relative accuracy; DA: minimum desired absolute accuracy) | 1.0 | (31) |
| Index of agreement | $\mathrm{IOA} = 1 - \frac{\sum_{i=1}^{N}\left(P_i-O_i\right)^2}{\sum_{i=1}^{N}\left(\left|P_i-\overline{P}\right|+\left|O_i-\overline{O}\right|\right)^2}$ | 1.0 | (32) |
| Unpaired peak concentration accuracy | $A_u = \frac{P_{max}-O_{max}}{O_{max}}$ ($P_{max}$, $O_{max}$ are unpaired maxima; no timing/spacing considered) | 0.0 | (33) |
| Spatially-paired peak concentration accuracy | $A_s = \frac{P_{max,x}-O_{max,x}}{O_{max,x}}$ ($P_{max,x}$, $O_{max,x}$ are maxima paired in space, but not in time) | 0.0 | (34) |
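A minimal sketch of a few of the Table 28 indicators, assuming paired, strictly positive concentration arrays (required for the logarithmic measures); the function name is illustrative:

```python
import numpy as np

def concentration_scores(P, O):
    """A selection of the Table 28 quality indicators for paired concentrations."""
    P, O = np.asarray(P, float), np.asarray(O, float)   # must be > 0 for MG, VG
    fb = (P.mean() - O.mean()) / (0.5 * (P.mean() + O.mean()))    # Eq. (25)
    mg = np.exp(np.mean(np.log(P)) - np.mean(np.log(O)))          # Eq. (26)
    vg = np.exp(np.mean((np.log(P) - np.log(O)) ** 2))            # Eq. (27)
    nmse = np.mean((P - O) ** 2) / (P.mean() * O.mean())          # Eq. (28)
    fac2 = np.mean((P / O >= 0.5) & (P / O <= 2.0))               # Eq. (30)
    ioa = 1.0 - np.sum((P - O) ** 2) / np.sum(
        (np.abs(P - P.mean()) + np.abs(O - O.mean())) ** 2)       # Eq. (32)
    return {"FB": fb, "MG": mg, "VG": vg, "NMSE": nmse, "FAC2": fac2, "IOA": ioa}

# Example with illustrative ozone concentrations (ug/m3):
print(concentration_scores(P=[48.0, 75.0, 102.0], O=[55.0, 80.0, 90.0]))
```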

ANNEX F

EVALUATION OF DIFFERENT WAVELENGTHS

Given two curves, for example observed and simulated wind direction, one can calculate the standard deviation of the difference between them in the form of the standard deviation of the error (STDE). When the simulated wind direction has a phase displacement (an error) of 60° compared to the observations, the STDE approaches that obtained when the simulation is a straight line (Figure 16). Intuitively, there is more information in a curve with the correct amplitude displaced by 60° than in a field without any variation. This shows that STDE is not always the best measure. Horizontal displacement of minimum/maximum values might occur as a result of physiographic properties that are not fully resolved in the numerical model. A similar exercise with the correlation coefficient gives the same result. How to deal with this property of the statistical measures is not clear; several other measures have been suggested, but the traditional statistical measures are still widely used. Comparing frequency distributions (or doing spectral analysis) of observations and models is one method to evaluate models, but it does not measure how well the model captures the day-to-day observed values at a specific location. When using, for example, the hit rate, it is at least ensured that simulated data are within an allowed difference of the measured values.
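This behaviour is easy to reproduce numerically. A minimal sketch using an idealized sine signal in place of the observed wind direction (the values are illustrative, not those underlying Figure 16):

```python
import numpy as np

# A simulation displaced by 60 degrees scores the same STDE as a simulation
# with no variation at all, although it clearly carries more information.
x = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
obs = np.sin(x)                                   # idealized observed signal

def stde(P, O):
    return np.std((P - P.mean()) - (O - O.mean()))   # Eq. (9)

for shift_deg in (15, 30, 45, 60):
    sim = np.sin(x - np.radians(shift_deg))
    print(shift_deg, stde(sim, obs))   # grows with displacement: 0.18 ... 0.71

print("flat", stde(np.zeros_like(x), obs))  # straight line: 0.71, same as 60 deg
```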

Figure 16. Standard deviation of error calculated for the difference between observations (solid) and simulations (broken) when the simulations have a phase displacement of 15° (top), 30° (second plot), 45° (third plot) and 60° (fourth plot) relative to the observations, and if the simulation is a straight line (lower)


ANNEX G

DETAILED EVALUATION RESULTS FROM FUMAPEX (FP 5 PROJECT)

A summary of the results from the various intercomparisons is provided here, separated into the episode evaluation and the long-term evaluation.

Episode Evaluation Results

The main results for the episodes are:
• Poor forecast of meteorological inversions in most models: underpredicted inversion strength dominant in Northern European areas with inversion-induced episodes (Helsinki Dec 1995; Oslo Jan 2003); for all models, underpredicted inversion strength also in the Helsinki spring dust episodes; for most models, underpredicted inversion strength in the Po valley for inversion-induced winter episodes and for night-time inversions in summer ozone episodes.
• Overpredicted surface and 2 m temperatures in many models/cases for extreme inversion episodes.
• Underpredicted stability (a key meteorological episode-predictor) for all models in cases of (very) stable stratification/inversions, i.e. excessive vertical exchange.
• Overpredicted 10 m wind speeds in calm or low wind conditions for all models (an episode-predictor especially for inversion-induced episodes; this coincides with / leads to reduced inversion strength and stability).
• False wind direction forecasts, often in combination with inversions and overpredicted wind speeds, may lead to erroneous temperature advection, especially in regions with large temperature gradients: in mountain-valley systems like the Po valley, or in coastal areas with frozen land / open sea in Scandinavian winters (e.g. Helsinki Dec 1995 for FMI HIRLAM, DWD COSMO_EU, CEAM RAMS, partly DMI HIRLAM; Oslo Jan 2003 for DWD COSMO_EU and partly DNMI MM5).
• Successful forecast of the episode-predictors and the various wind field structures determining pollutant concentrations (e.g. rising maximum temperatures, drainage winds, time-dependent convergence lines inland and at sea, combined sea breeze and upslope circulations with pollutant injections) for the Valencia ozone episode, for both participating models (CEAM RAMS, DWD COSMO_EU).
• A very complex wind situation with poor predictability seems to prevail for Bologna city (2002 episodes), due to the very variable occurrence and superposition of local-scale, mesoscale and synoptic-scale influences between an Apennine valley-mountain circulation and a larger Po valley circulation. Both participating models (ARPA COSMO_IT, DMI HIRLAM) succeed in forecasting the nocturnal drainage winds and mountain-valley circulations in the Apennines; ARPA COSMO_IT also captures the sudden wind speed increases marking interruptions or the end of episodes.

Long-term Model Evaluation Results

The meteorological services ARPA-SIM, DMI, DNMI, DWD and FMI participated in the longer-term statistical evaluation (usually 1 year) for 2 m temperature (Figure 17), 10 m wind and 2 m relative humidity, performed to analyse a more representative data sample and to place the episode results in a longer-term context. In particular, the results for the 50 stations for COSMO_EU/COSMO_IT were also investigated in grouped categories: urban, suburban, rural, Po valley and Apennine mountains (Section 7.1.7). The different models perform well or poorly depending on:
• The chosen station or group of stations.
• The meteorological parameter and time of day (forecast hour).
• The chosen statistical score (BIAS or RMSE).
• Partly also the season, for the same parameter.

Comparing the size of the parameter deviations from the observations during the episodes with the statistical scores evaluated over one year, some unusually poor model performance is detected for some of the episodes. This also points to these episodes showing some extreme behaviour of the meteorological parameters. The episodes and models concerned are:
• Helsinki Dec 1995 (all models).
• Oslo Jan 2003 (DNMI MM5, DMI HIRLAM, DWD COSMO_EU = all models participating).
• Bologna Jan (DMI HIRLAM, ARPA COSMO_IT = all models participating).

Figure 17. Example statistics of the model inter-comparison of 2 m temperature RMSE (°C) versus forecast hour: summer (a) and winter (b). If only one value is given at the position of forecast hours 22 to 27, it is valid as an average over the whole 48 h forecast (for all DNMI results only 24 forecast hours are available). Piet(rur), Panig(sub) and Met(urb) are simulated with COSMO_IT; Blind 1, 3, 10 and Tryv 1, 3, 10 with DNMI MM5 (1 + 3 km) and DNMI HIRLAM (H10 = 10 km), as are H10 Helsinki + Copenhagen. The legend further distinguishes the station groups LM all/urban/suburban/rural and LAMI mountain/valley.

Model performance for episode forecasting seems to depend mainly on the model's ability to forecast the specific meteorological episode features, in sometimes complex locations and even under extreme meteorological conditions, and on the representativeness and observation quality of the stations. The performance depends much less on whether the location is urban, suburban or rural.


ANNEX H

DETAILS ON THE EVALUATION OF COSMO_IT FOR AIR QUALITY AND ASSESSMENT PURPOSES

Meteorological parameters playing a key role in the Po Valley winter pollution episodes are temperature inversions and wind fields. Summer episodes are characterised by gradually increasing maximum temperature values and very weak winds, including local circulations. The key parameters evaluated during some pollution episodes are temperature inversions, 2 m temperature and relative humidity, the surface energy budget, 10 m wind speed and direction, wind profiles, cloudiness and turbulent kinetic energy.

The experimental data to evaluate the model were collected during a micrometeorological field campaign carried out in San Pietro Capofiume, a rural site located in the Po Valley, 25 km from Bologna, during winter 2004/2005 and spring 2005. Solar and infrared radiation data were collected by means of a radiometer; sensible heat flux, friction velocity and Monin-Obukhov length data were collected by means of a sonic anemometer. The site is a flat grassland area surrounded by farmland. The data from this campaign were compared with the routine analysis of COSMO_IT (LAMA), which uses data from standard meteorological stations for assimilation. At the same rural location, soil moisture data were collected by means of a TDR and compared with the model soil moisture. Soil moisture is a relevant parameter for air quality purposes: it modifies the partitioning of the surface heat fluxes between sensible and latent heat flux, which determines the temperature profile of the PBL and consequently its stability. In some chemical transport models (CTM), soil moisture also plays a key role in the calculation of the resuspension of aerosols.

The evaluation of the standard meteorological parameters (temperature, wind speed, wind direction and humidity) for the 48 h forecasts starting at 00 UTC was performed for a long-term period and the four seasons (April 2003 - March 2004). The evaluation was separated for different areas, using data (Table 29 - Table 31) from the following stations:
• Bologna (Bonafè and Jonghen, 2006): 3 stations (1 rural, 1 urban, 1 suburban).
• Emilia Romagna (Bonafè and Jonghen, 2006): 32 temperature stations (28 in the Po Valley + 4 in the Apennines), 13 wind stations (12+1), 1 radiosounding station.
• Veneto, Piemonte and Emilia-Romagna (Pernigotti et al., 2005): 42 wind stations. On the Veneto test area the COSMO_IT analyses were also compared with the CALMET (Scire et al., 2000) wind fields (diagnostic fields based on 21 wind stations).



Table 29. Summary of the COSMO_IT results for the pollution episodes

| Parameters/features | Where | When | COSMO_IT results |
|---|---|---|---|
| Inversion | Po Valley | | strongly underestimated |
| T2m and RH2m daily cycles | close to Bologna | summer | too smooth |
| T in the PBL | close to Bologna | summer | underestimated |
| urban heat island | urban areas | | not reproduced |
| wind at 10 m | Po Valley | | more frequently over- than underestimated |
| wind canalization | valleys in the Apennines | night, summer | underestimated, not very accurate |
| | valleys in the Alps | night, summer | 1.1 km: well reproduced, but often it lasts too long |
| | valleys in the Alps | | always underestimated |
| Cloudiness | over the Alps | sometimes | different with different resolution |
| T2m | | winter episode | 7 km: 15-18°C underestimated; … km: 4-8°C underestimated (but prognostic T doesn't change) |
| TKE | close to Bologna | winter episode | increases with finer resolution |
| spatial variability of the vertical velocity | close to Bologna | winter episode | increases with finer resolution |

Table 30. Summary of the results of the evaluation of COSMO_IT analysis against observed data collected during special campaigns

| Parameters/features | When | COSMO_IT results |
|---|---|---|
| friction velocity | | mean daily course is reproduced |
| | late afternoon, evening | ~0.2 m/s overestimation |
| Monin-Obukhov length | | overpredicts the occurrence of unstable conditions; underpredicts the occurrence of stable conditions |
| | evening | stabilization of the surface layer often occurs too late |
| SHF | afternoon | random errors: ~200 W/m2 instead of ~100 W/m2 |
| | night | [-20, -50] W/m2 instead of [0, -10] W/m2 |
| infrared radiation budget | | when observed is [-80, 0] W/m2, simulated is often [-120, -75] W/m2 |
| visible radiation budget | | good agreement between observations and simulation |
| soil moisture | | good relation between observation and simulation; more than half of the errors are positive; very small systematic overprediction of ~0.03 m3/m3 |

Table 31. The most relevant errors detected through the long-term evaluation

| Parameters | Where | When | COSMO_IT results |
|---|---|---|---|
| T2m | Bologna urban area | night | strong underestimation (~3°C) |
| | Bologna urban area | evening, winter/spring/summer | large RMSE (~4°C) |
| | Apennines | summer/winter | underestimation (~2.5°C) |
| | Apennines | spring | too strong daily cycle |
| | Apennines | winter | very large RMSE (~6°C) |
| | Po Valley | summer | too smooth daily cycle |
| | all stations | spring/winter | large RMSE (>3°C) |
| | all stations | summer | RMSE grows with verification time |
| RH2m | Bologna urban area | night, spring/summer | overestimation (>20%), large RMSE (~30%) |
| | Apennines | summer | overestimation (~20%) |
| | Po Valley | summer | too smooth daily cycle |
| | all stations | summer | RMSE grows with verification time |
| wind speed | Bologna urban area | morning, spring | large RMSE (>2 m/s) |
| | Bologna urban area | | overestimation (~1 m/s) |
| | at the mouth of the Reno Valley | summer | underestimation (~1 m/s), large RMSE (~2 m/s) |
| | at the mouth of the Reno Valley | morning, spring | large RMSE (>2 m/s) |
| | Apennines | morning, spring | large RMSE (>2 m/s) |
| | Apennines | afternoon, summer | underestimation (~1 m/s) |
| | Apennines | night, autumn | underestimation (~1 m/s) |
| | Po Valley | winter | large RMSE (~2.5 m/s) |
| | Po Valley | autumn/winter | overestimation (~0.5 m/s) |
| wind direction | at the mouth of the Reno Valley | night, autumn/winter/spring | very large RMSE (~90°) |
| | at the mouth of the Reno Valley | night | SSW winds poorly forecasted |
| | all stations | | large RMSE (~70°) |
| | Apennines | noon | NNE winds poorly forecasted |
| | Po Valley | | N wind frequency overestimated |
| Inversions | eastern part of the Po Valley | summer | strong underprediction |
| | central part of the Po Valley | autumn/winter/spring | underprediction |
| | central part of the Po Valley | summer | underprediction |

The scores calculated are the mean absolute error (MAE), BIAS and RMSE; for wind direction, the mean absolute error, the hit rate HR for 45° sectors, and wind roses for the wind speed classes.