A Grid-enabled Regional-scale Ensemble Forecasting System in the Mediterranean area

K. Lagouvardos (1), E. Floros (2) and V. Kotroni (1)

(1) National Observatory of Athens, Institute of Environmental Research and Sustainable Development, Athens, Greece
(2) Greek Research and Technology Network S.A.

Contact Information
Dr. Kostas Lagouvardos, Dr. Vassiliki Kotroni
National Observatory of Athens, Institute of Environmental Research and Sustainable Development, Lofos Koufou, P. Penteli, 15236, Athens, Greece
tel: 00-30-210-8109126, fax: 00-30-210-8103236, e-mail: [email protected], [email protected]

Evangelos Floros
Greek Research and Technology Network S.A., Mesogion 56, 11527 Athens, Greece
tel: +30 210 7474277, fax: +30 210 747449, email: [email protected]


Abstract

Numerical weather prediction applications require considerable processing power and data storage resources, and can thus benefit from the offerings of grid technologies. In this paper we discuss the porting of such an application to the Grid infrastructure provided by the SEE-GRID-SCI European research project. This study explores the regional ensemble forecasting technique as a means to contribute to the improvement of weather forecasting in the Mediterranean. Indeed, a regional ensemble system was built, based on the use of two limited-area models that are run using a multitude of initial and boundary conditions over the Mediterranean. This large-scale application requires infrastructure that is not easily available, and thus its porting to the Grid at production level has proved to be a challenging endeavour. The application workflow, the operational and grid infrastructure requirements, as well as the problems encountered, are presented. Multiple requirements, mainly related to the characteristics of the implemented workflow, the model characteristics and the limitations imposed by the grid infrastructure itself, had to be satisfied. The paper concludes with a recent result of the implemented application: the regional-scale ensemble forecasting system provided useful probabilistic forecasts for a severe thunderstorm case that affected Central Europe during summer 2009 (with damage, casualties and several injuries).

Keywords: ensemble forecasting, Grid Computing, weather models


1 Introduction

During the last 30 years, weather forecasting has been based on the use of numerical weather prediction (NWP) models that are able to perform all the necessary calculations that describe/predict the major atmospheric processes. The NWP skill is inherently limited for a number of reasons that include errors in the initial conditions, in the model numerical schemes, and simplifications and deficiencies in the representation of physical processes. On top of that, one common problem in weather forecasting derives from the uncertainty related to the chaotic behaviour of the atmosphere. Indeed, Lorenz [24][25] stated that "even with perfect models and perfect observations, the chaotic nature of the atmosphere would impose a finite limit of about two weeks to the predictability of the weather", thus laying the foundations of the theory of chaos. As the meteorological forecast skill decreases with time, Epstein [10] and Leith [23] suggested that, in addition to performing "deterministic" forecasts, "stochastic" forecasts that provide an estimate of the prediction skill should be performed, and the computationally feasible approach towards this aim is to perform "ensemble forecasts". Ensemble forecasting was initially performed by introducing perturbations in the initial conditions or in the models themselves. Ensemble forecasting has three main goals: (a) to provide an ensemble average forecast that is more accurate than the individual forecasts, as the components of the forecasts that are most uncertain tend to be averaged out, (b) to provide to the forecasters an estimation of the reliability of the forecasts, and (c) to provide a quantitative basis for probabilistic forecasting [16]. The ensemble forecasting approach was operationally implemented at the European Centre for Medium-Range Weather Forecasts (ECMWF) and at the National Centers for Environmental Prediction (NCEP) at the beginning of the 1990s [31][28]. In these meteorological centres the global ensemble consists of running the same model with perturbed initial conditions that replicate the statistical uncertainty of the initial conditions. Since then, and mainly since 2000, other approaches to ensemble forecasting have also been followed, using the same initial conditions but running different models (multi-model ensemble) in order to account for model imperfections and the uncertainties about model deficiencies. Further, the ensemble average of operational global forecasts from different operational centres has also shown the considerable skill and utility of such ensemble approaches for short-range ensembles of regional models [14]. This latter method was further developed and applied by [19]; called the multi-model statistical super-ensemble, it does not simply average the model forecasts but applies a weighted average in which the individual biases of the member models are collectively removed. At the regional scale, the implemented ensemble systems are quite limited. Among them, the COSMO-LEPS project started in November 2002 on the ECMWF computer system under the auspices of COSMO (COnsortium for Small-scale MOdeling, http://www.cosmo-model.org). COSMO-LEPS is made up of 16 integrations of the COSMO model, which is nested on selected members of the ECMWF ensemble prediction system, and runs on a daily basis using a 10 km grid spacing [26]. In Spain, the Spanish Meteorological Service runs experimentally, on a daily basis, a regional-scale ensemble prediction system (AEMET-SREPS) that consists of 20 members, constructed by integrating 5 limited-area models at 25 km horizontal resolution. Each model uses 4 different sets of initial and boundary conditions, provided by 4 global models [11].


Further, there have been case study applications where a number of high-resolution NWP models have been run in order to assess the uncertainty of the forecasts, mainly of the quantitative precipitation forecasts, and how this uncertainty propagates into hydrological simulations and discharge predictions [8]. Operational weather forecasting has been routinely performed at the National Observatory of Athens (NOA) since 1999, based on the use of the BOLAM model [20], and also since 2001 based on the use of the MM5 model [18]. With the aim to contribute to the improvement of the forecasts in the Mediterranean, the regional ensemble forecasting technique should be explored together with other techniques. For that reason, in the frame of this study, a regional ensemble system was built over the Mediterranean, based on the use of two limited-area models that are run using a multitude of initial and boundary conditions provided by NCEP. This activity needs a large infrastructure that is not easily available at medium-scale research centres such as NOA. For that reason, the Grid infrastructure is explored for its ability to support the high CPU and storage needs of such a regional ensemble forecasting system. The development of the Grid infrastructure introduced innovative ways to share computing and storage resources that are geographically distributed, by establishing a global resource management architecture [12]. Such large-scale international Grid infrastructures have been deployed, for instance, in the context of the Enabling Grids for E-sciencE (EGEE) [21] and the South Eastern Europe Grid eInfrastructure for regional eScience (SEE-GRID-SCI, http://www.see-grid-sci.eu) projects. Both utilize the gLite grid middleware [22] in order to enable access to a large number of computing clusters all over Europe. For example, SEE-GRID-SCI currently consists of 35 sites, providing roughly 2200 CPU cores and 200 TB of storage in total, spread among 16 countries in the South-East Europe region. A large number of applications are already running on Grid infrastructures, although few of them are ready for large-scale production: the Large Hadron Collider experiments at CERN, such as the ATLAS collaboration [7], have been the first to test a large data production system on Grid infrastructures [2], while WISDOM is another international initiative that enables a virtual screening pipeline on a Grid infrastructure. NWP is a typical example of an application that requires large amounts of processing power and can therefore benefit tremendously from the offerings of grid technologies and the aforementioned infrastructures. Meteorological models are usually data-parallel applications that apply domain decomposition methods to accelerate the calculation of weather forecasts over specific geographical regions. As NWP applications are highly demanding, a large number of widely used NWP models have been ported to the grid during the last few years: BRAMS and WRF in the frame of the EELA2 project (E-science grid facility for Europe and Latin America, http://www.eu-eela.eu), and WRF (both ARW and NMM cores) and ETA/NCEP in the frame of the SEE-GRID-SCI project (http://www.see-grid-sci.eu), among others. Ensemble forecasting applications are even more demanding, since they require the concurrent execution of parallel applications, thus requiring both a large number of CPUs for execution and extended storage for the initial data of the simulation and the produced results.
Grid infrastructures, as described previously, provide the appropriate development and production environment to satisfy the requirements imposed by weather ensemble forecasting applications. The work presented in this paper is being conducted in the context of the SEE-GRID-SCI project exploiting the relevant grid infrastructure.


The rest of this paper is organized as follows. Section 2 is devoted to the presentation of the NWP models used for the regional-scale ensemble forecast system and their setup. Section 3 presents the implementation of this system on the Grid infrastructure. In Section 4 some examples of the output of the ensemble system and a case study are shown, while the prospects of this work are discussed in the last section.

2 The Regional Ensemble Forecasting System

The Regional-scale Ensemble Forecasting System (REFS hereafter), discussed in this paper, is based on the use of two NWP limited-area models: BOLAM and MM5. As already mentioned, these models are used at NOA to provide operational "deterministic" forecasts. The same two models are used for the ensemble system. In the following, a brief presentation of the models is given, as well as of the setup of REFS.

2.1 The MM5 model

The MM5 model (Version 3) is a non-hydrostatic, primitive-equation model using terrain-following coordinates [9]. Several physical parameterization schemes are available in the model for the boundary layer, radiative transfer, microphysics and cumulus convection. In order to select a combination of microphysical and convective parameterization schemes that better reproduces wet processes, Kotroni and Lagouvardos [17] performed a comparison of various combinations of schemes for cases with important precipitation amounts over the eastern Mediterranean. This comparison showed that the combination of the Kain-Fritsch parameterization scheme [15] with the highly efficient and simplified microphysical scheme proposed by [29] provides the most accurate forecasts of accumulated precipitation for a grid spacing of 24 km. For that reason, both the operational chain of MM5 at NOA and the operational chain of the regional-scale multi-analysis ensemble system based on the MM5 model use the combination of these two schemes. Concerning the choice of the planetary boundary layer (PBL) scheme, the scheme proposed by [13] is used. This selection is based on our previous work [1], which verified the operational MM5 forecasts over Athens with three PBL schemes for the warm period of 2002 and found that the MRF scheme produces the best forecasts. One grid has been defined and used for the regional ensemble system (Fig. 1a), consisting of 220x140 grid points with a 24-km horizontal grid increment, covering the major part of Europe, the Mediterranean and the northern African coast. In the vertical, twenty-three unevenly spaced levels are used, defined by the full sigma values (σ = 1., 0.99, 0.98, 0.96, 0.93, 0.89, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.00). This grid coincides with the outer grid of the operational MM5 model chain used at NOA since 2001. The deterministic operational runs with the MM5 model at NOA are performed on a Linux cluster using 12 dual-core AMD Opteron CPUs with 1 GB of memory per CPU.

2.2 The BOLAM model

The second model used in the regional ensemble forecasting system is the hydrostatic model BOLAM. The version of BOLAM used in this study is based on previous versions of the model described in detail in [3][4][5][6]. The microphysical scheme implemented in BOLAM is coded mainly on the basis of the transformation process models described in [29]. The scheme includes five hydrometeor categories: cloud ice, cloud water, rain, snow, and graupel.


The sub-grid scale precipitation is treated in BOLAM following the Kain-Fritsch convective parameterisation scheme [15]. In the version of the Kain-Fritsch scheme implemented in BOLAM, an additional modification regarding the delay of downdraft occurrence [30] has been introduced: the first downdraft is not started before 30 min after the initiation of new convection. The BOLAM model has been used for operational weather forecasting at NOA since 1999. An evaluation of these operational forecasts in the Mediterranean region is given in [20], with very encouraging results, mainly concerning precipitation forecasts. For the scope of this study one domain is defined, consisting of 135x110 points with a 0.21 deg horizontal grid interval (~23 km), centred at 41°N latitude and 15°E longitude and covering the area of the Eastern Mediterranean (Fig. 1b). In the vertical, 30 levels are used from the surface up to the 10 hPa level. The vertical resolution is higher in the boundary layer and, to a lesser extent, at the average tropopause level. This domain coincides with the outer domain used for the deterministic operational model chain running at NOA since 1999. The deterministic operational runs with the BOLAM model at NOA are performed on an AMD Opteron Linux workstation with 1 GB of memory.

2.3 REFS setup

For the REFS deployment, the initial and boundary conditions are provided by the Global Forecast System (GFS, NCEP, USA). At NCEP, 20 perturbed forecasts are run 4 times daily out to 16 days at a horizontal resolution of ~105 km. The perturbations of the initial conditions for this global ensemble system are statistically independent and are produced following the Ensemble Transform method with rescaling, as described by [32]. For our application, 10 members of the global GFS ensemble system, including gridded fields of the 10 perturbed initial conditions and the resulting forecasts at 6-hour intervals, are used to initialize each model (MM5 and BOLAM) and to perturb the boundary conditions. All simulations are initialised at 0000 UTC and the duration of the regional ensemble forecast simulations is 72 hours. In total, therefore, REFS consists of 10x2=20 model forecasts which, once available, are averaged in order to provide the REFS products. The REFS system presented in this paper differs from already developed regional-scale ensemble prediction systems. The COSMO-LEPS system [26] uses one meteorological model and initial and boundary conditions provided by a global ensemble system, so it represents a regional-scale multi-analysis ensemble system. The AEMET-SREPS [11], on the other hand, uses 4 models (so it is a multi-model system), but it differs from the REFS system discussed in this paper in the use of initial and boundary conditions, which in the case of AEMET-SREPS are provided by the deterministic runs of 4 global models. In the following section, all the technical aspects of model scripting, CPU and storage usage, and the gridification procedure are discussed in detail.
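To make the composition of REFS concrete, the short sketch below simply enumerates the daily forecast tasks implied by the setup described above (two models, ten GFS perturbation members, 72-hour forecasts with 6-hourly boundary conditions). It is purely illustrative; the names and data structure are not those of the operational scripts.

```python
from itertools import product

# Illustrative only: enumerate the 20 daily REFS forecasts as
# (model, GFS perturbation member) pairs, following the setup in Section 2.3.
MODELS = ["BOLAM", "MM5"]          # the two limited-area models
GFS_MEMBERS = range(1, 11)         # 10 perturbed GFS members used for IC/BC
FORECAST_LENGTH_H = 72             # forecast length (hours)
BC_INTERVAL_H = 6                  # boundary-condition update interval (hours)

members = [
    {
        "model": model,
        "gfs_member": gfs,
        "init_time": "0000 UTC",
        "boundary_hours": list(range(0, FORECAST_LENGTH_H + 1, BC_INTERVAL_H)),
    }
    for model, gfs in product(MODELS, GFS_MEMBERS)
]

print(len(members))  # 20 forecasts per day (10 members x 2 models)
```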

3 REFS on the Grid

Porting a production-level, large-scale application to the Grid is always a challenging endeavour. In the case of REFS we had to satisfy multiple requirements, mainly related to the characteristics of the implemented workflow, the model characteristics and the limitations imposed by the grid infrastructure itself.


3.1 Application Workflow

By its nature the application is an ensemble of independent jobs running in parallel. Each job implements a multi-step workflow, depicted in Fig. 2. This workflow reflects the execution of the weather model given a set of initial and boundary conditions. The execution of a weather prediction application can be roughly split into four individual steps:

1. Download of initial and boundary conditions
2. Initial conditions pre-processing
3. Execution of the weather model
4. Post-processing of results and production of the final forecast
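As an illustration of how a single ensemble member chains these four steps, a minimal driver sketch is given below. It is a sketch only: the step scripts are hypothetical placeholders for the utilities described in the remainder of this section, not the actual REFS code.

```python
import subprocess
import sys

# Hypothetical per-member driver; the script names are placeholders for the
# utilities described in this section (download, pre-process, model, post-process).
member_id = sys.argv[1] if len(sys.argv) > 1 else "1"

STEPS = [
    ["python", "ncep_get.py", "--member", member_id],  # 1. download IC/BC
    ["./preprocess.sh", member_id],                    # 2. pre-process initial conditions
    ["./run_model.sh", member_id],                     # 3. run the weather model
    ["./postprocess.sh", member_id],                   # 4. post-process and package results
]

for step in STEPS:
    # A failure in any step aborts this member; the job manager may resubmit it.
    if subprocess.run(step).returncode != 0:
        sys.exit(1)
```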

The initial and boundary conditions are provided by the National Centers for Environmental Prediction (NCEP) of the US National Oceanic and Atmospheric Administration (NOAA). These data are made available to the public through the National Operational Model Archive and Distribution System (NOMADS) using the ftp2u service. NOMADS is a Web-based application that facilitates the preparation and delivery of atmospheric and oceanic data based on a set of user-defined criteria. These criteria determine the terrestrial domain and the various atmospheric parameters that are required as input for a weather prediction model. More specifically, for the purpose of our forecast we retrieve the atmospheric conditions and the sea-surface temperature for a domain extending from 30W to 60E in longitude and from 20N to 70N in latitude. These data are split among 14 different files that are transferred from NOMADS using FTP. A single file contains the sea surface temperature, and the remaining 13 contain the atmospheric conditions for the respective 6-hour time periods, providing in total the initial and boundary conditions for a 72-hour forecast. The preparation and acquisition of the initial and boundary conditions poses the first challenge for the weather forecast workflow. First of all, the data are made available after a certain hour of the day (usually late in the morning), which means that no model can run before that time. Moreover, the NOMADS servers offering these data frequently suffer from high system loads, because many research centres around the world try simultaneously to access the service and retrieve data. Thus, in many cases an attempt to prepare or retrieve the data might fail. Although NOMADS provides a set of redundant servers that can be used interchangeably when such high system loads are experienced, one may need to try many times before succeeding to access the service. In other cases some of the servers might be completely unreachable, concentrating the load on one or two machines and rendering the service practically unusable on some days. For example, there have been periods during which, over a month, roughly 30% of download attempts failed; this corresponds to more or less 10 days per month on which ensemble forecasting data could not be fetched. Lately the availability of the data has significantly increased due to the addition of a new NOMADS server offering high availability, and this will be reflected in the near-future statistics of the application. Once the initial condition and boundary data are successfully retrieved, they typically pass a first step of pre-processing in order to be transformed into the format required by the weather model. This pre-processing is done by one or more utility applications specifically developed for this purpose.
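The retry-across-redundant-servers logic described above can be sketched as follows. This is an illustration only, assuming anonymous FTP access; the server names and remote paths are placeholders, not the actual NOMADS endpoints or file names used by the application.

```python
import ftplib
import time

# Placeholder server list and paths; the real application uses the NOMADS
# ftp2u service and its redundant mirrors.
SERVERS = ["nomads-mirror-1.example.org", "nomads-mirror-2.example.org"]
MAX_TRIES = 5
RETRY_WAIT_S = 60

def fetch(remote_path, local_path):
    """Try the redundant servers in turn, up to MAX_TRIES attempts in total."""
    for attempt in range(MAX_TRIES):
        server = SERVERS[attempt % len(SERVERS)]
        try:
            with ftplib.FTP(server, timeout=120) as ftp:
                ftp.login()  # anonymous login
                with open(local_path, "wb") as out:
                    ftp.retrbinary("RETR " + remote_path, out.write)
            return True
        except ftplib.all_errors:
            time.sleep(RETRY_WAIT_S)  # back off before trying the next server
    return False

# One sea-surface-temperature file plus 13 six-hourly atmospheric files (0-72 h).
files = ["sst.grb"] + ["gfs_t%03d.grb" % h for h in range(0, 73, 6)]
all_ok = all(fetch("/data/" + name, name) for name in files)
```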


After the data have been processed they are ready to be used as input for the weather model. The execution of the model itself is the core of the workflow and typically the most time-consuming part of the application. For example, in the case of BOLAM the model run requires on average 30 minutes out of the 40 minutes that a typical complete workflow execution takes (in the case of MM5 the corresponding values are 60 minutes out of a total average workflow run of 80 minutes). Note that these metrics largely depend on the computing architecture and the initial input domain. The model execution is also the most CPU- and memory-demanding part of the workflow, especially when a parallel weather model is used, such as the MPI-enabled version of MM5. In this case, the model runs by utilizing multiple CPU cores in the same or different computing nodes, exploiting some implementation of the MPI standard (e.g. MPICH). In these cases the CPU core utilization, while the model is running, approaches 100%. The final step of the workflow is the post-processing of the data produced by the model run. This process can be as simple as packing the results and transferring them to another host for storage and further processing, or it can comprise various filtering steps that transform the output to other formats before it is again packaged and stored. The final output can be as large as 200 MB, depending on the weather model that was used. In the case of ensemble forecasting the issue of storage space becomes significant, since every member produces a different set of data that has to be combined with the others in order to generate the final ensemble forecast. For example, in the case of the 10-member ensemble forecast roughly 2 GB of storage is required every day for storing intermediate and final results. Furthermore, if archiving of results is required, storage space becomes a considerable issue for the application. In reality this workflow is also applicable for operational deterministic model runs in order to generate the regular daily forecast. By using multiple different initial conditions, and by running multiple copies of the same workflow, we implement the multi-analysis part of the ensemble forecasting. By further utilizing different weather models and running the same workflow, we achieve the multi-model ensemble notion of REFS. The gridification of the REFS application essentially implies the execution of the described workflow on a computational grid infrastructure. The porting of this workflow to the grid is rather straightforward: all steps can be run on the grid site. The workflow can be packed as a single job and submitted to the grid. In the case of ensemble forecasting the same workflow has to run many times in parallel; thus the natural approach is to submit one job per ensemble member. The model run (step 3) is the most demanding part in this porting process, since it implies that the model code has to be able to run on the platform supported by the grid infrastructure (e.g. Scientific Linux 4 or 5 in the case of EGEE and SEE-GRID-SCI). In the case of the parallel MPI-enabled model this is not trivial, since in practice these infrastructures provide poor to average support for MPI. The first step in the gridification process was to understand this workflow, based on the existing practices used at NOA to execute both deterministic and ensemble forecasts.
The next step was to improve this process locally and to abstract it, in order to be able to run on different platforms and to be generic enough to support different weather models. Once the supporting scripts and applications were optimized along these lines, the focus of the gridification process shifted to the models themselves and their ability to be ported to the computing platforms provided by the production grid infrastructures.


3.2 Weather Models

REFS exploits two different models to implement the multi-model ensemble forecasting, namely BOLAM and MM5. BOLAM is a serial application written in Fortran and poses modest requirements in terms of CPU power and memory consumption. MM5 is a parallel application, also developed in Fortran, that uses MPICH v1.2. This model is well known for being sensitive to various factors related to the execution environment. In particular, from our past experience of using the model in different environments, we have observed that the model may not work properly for specific combinations of library versions, OS distributions and compiler options. Furthermore, it can be influenced negatively by other applications running on the same host. Concerning main memory, the model requires at least 1 GB per process in order to run properly. The above observations immediately pose some challenging requirements for the proper execution of MM5 on the grid. These challenges stem from the fact that a grid infrastructure in principle encompasses a general-purpose environment not optimized for a specific application; rather, it is up to the application itself to adapt to the environment and try to find the best solution in order to properly exploit the offered resources.

3.3 Operational Requirements

Running a production-level application requires high Quality of Service (QoS) from all the subsystems and the middleware that facilitate the application execution. This impacts the application at multiple levels. For instance, it sets specific requirements on the application's expected behaviour in case of system misbehaviour, and on the compensation steps taken when one of the software layers responsible for the application execution fails. Another important parameter is the interaction with the end user. In our case, the latter are NWP experts, familiar with running simulation codes on local computing clusters using Unix commands, shell scripting and OS system utilities. Our goal when developing REFS was to keep the same command-line 'look-n-feel', while at the same time hiding the complexities introduced by the grid infrastructure. This means that issues related to job, data and credential management should remain hidden from the user. Interaction with the grid, whenever needed, should be done through a small set of command-line tools that hide away the complexity of the grid middleware layers.

3.4 Grid infrastructure requirements

The primary motivation for exploiting the Grid for the purposes of ensemble forecasting is to take advantage of the vast amounts of processing power, storage capacity and the capabilities it offers for data sharing and collaboration. On the other hand, it is true that current Grid implementations have been designed with high throughput, and not high performance, in mind. This means that real-time, interactive applications, or applications with quick turnaround times or strict deadline requirements, do not fit very well the "one-size-fits-all", batch-processing model of the Grid. A further issue relates to the software tools made available to end-user applications. Most scientific grids offer only open-source tools, supporting software and libraries. Commercial and licensed software usually requires special arrangements in order to be made available to grid users. In our case, both BOLAM and MM5 require commercial compilers (either Intel Fortran or Portland Group (PGI) Fortran) in order to compile and run properly.


The table below summarizes the technical requirements of BOLAM and MM5 from the underlying Grid infrastructure.

Table 1. BOLAM and MM5 technical requirements

Requirement | BOLAM | MM5 | Ensemble overall
Number of jobs submitted per day | 10 (1 job / member) | 10 (1 job / member) | 20
Cores per ensemble | 10 (1 core / job) | 60-120 (6 to 12 cores / MPI job) | 70-130
Submission time | Once every day, 11:00 am - 12:00 pm | Once every day, 11:00 am - 12:00 pm | Once every day, 11:00 am - 12:00 pm
Ensemble completion deadline | 3-4 hrs | 4-5 hrs | 4-5 hrs
Archived storage | 10 MBytes / job, 100 MBytes total | 200 MBytes / job, 2 GBytes total | 2.1 GBytes per day
Storage per job on running node (WN) | 20 MBytes | 250 MBytes | -
Minimum memory per job on running node (WN) | 512 MBytes | 1 GByte | -
Build environment | PGI Fortran or Intel Fortran | PGI Fortran or Intel Fortran, MPICH 1.2 | Binaries compiled for Linux Red Hat Enterprise 4 or similar (SL4, CentOS 4)
Grid environment | gLite 3.1 (components used: CE/WN, MyProxy, SE, LFC, UI) | gLite 3.1 (components used: CE/WNs with support for MPICH 1.2, MyProxy, SE, LFC, UI) | Ganga [27] used for job submission and user-level scheduling

3.5 Early implementation issues and workarounds

During the first months of REFS development we evaluated the software tools available from the Grid infrastructure, the behaviour of the grid services and the behaviour of the models running on the grid in standalone (non-ensemble) mode. We soon realized that there would be a few hurdles to overcome. Job scheduling was the first problem: the jobs would rarely be scheduled and finish on time. In many cases failures would occur, due either to the model itself, to the infrastructure or to the NOMADS servers. In such cases recovery and resubmission of jobs was required. MPI support was one of the main reasons for MM5 job failures. Insufficient MPI support in current large-scale grid infrastructures is a well-known issue that the relevant Grid projects are still trying to overcome. During this trial period we identified the subset of sites that offer proper MPICH 1.2.7 support and we restricted the job scheduling to these sites only. The fetching of initial and boundary data was also the source of many failures. As mentioned in previous paragraphs, the NOMADS servers can often be overloaded with requests or become completely unavailable. This problem is aggravated in the case of ensemble forecasts, where a different set of initial data has to be downloaded for each member. In these cases it is very likely that some of the members will be delayed considerably, or fail altogether, increasing the probability that the ensemble will not run properly. To overcome this we developed a specialized Python application that tries many times, and from different servers, to retrieve the required data.


The number of tries and the timeout before each try are user-defined. The application is also multithreaded, enabling the retrieval of initial data in parallel streams and thus reducing the overall time required to transfer all the required data. In order to implement scheduling and monitoring logic on the user side, we decided to adopt the Ganga framework [27]. Ganga is a Python library which provides an object-oriented programmatic abstraction of the command-line tools provided by gLite. Using Ganga we are able to have better control over the job management aspects of the application. It also gives us the opportunity to exploit the power of Python for issues related to user interaction, statistics, log-keeping, etc. From this early development stage it also became evident that, irrespective of the weather model used, ensemble forecasting exhibits similar characteristics in terms of execution steps and tool usage. For this reason we decided to build a parametric, general-purpose code-base that could be used to perform ensemble forecasting on the grid irrespective of the model used for weather prediction. The model codes were statically compiled on a User Interface (UI) gLite node using PGI Fortran. In this way we overcame the requirement to have licensed PGI libraries on the target Worker Nodes (WNs). Nevertheless, in this way we sacrificed performance, since the generated binaries were not optimized for the target running machines. Moreover, due to MM5's sensitivities reported previously, this binary code would often crash for no apparent reason on many grid sites. After many trials with different compilers and compilation parameters we finally adopted the Intel Fortran compiler for MM5. The code was compiled on a UI which was custom-built for us in order to provide an environment identical to that offered by the HellasGrid infrastructure (http://www.hellasgrid.gr), which is part of the SEE-GRID-SCI infrastructure. By doing this we managed to eliminate the model crashes and to optimize the performance by minimizing the model's execution time. On the downside, this means that we have to run the code only on a subset of the SEE-GRID-SCI infrastructure (the 6 sites comprising HellasGrid). This also increases the probability that the jobs will have to queue for a long time before they are finally scheduled for execution, so the chances that the ensemble will complete within the required time window are even slimmer. Unfortunately, there is no optimal solution to this problem. One option could be to experiment with the short-deadline job solution that has been tested with moderate success in the EGEE infrastructure. Nevertheless, the profile of the MM5 ensemble forecasting application does not exactly match that of a short-deadline job (it can rather be defined as a strict-deadline job, with execution times extending to many hours). Finally, the solution we chose was to get in touch with 3 HellasGrid sites and agree with them to allocate a subset of their cores and dedicate them to the MM5 ensemble. This way it is guaranteed that, once submitted, the jobs will be scheduled and run immediately on the Grid, minimizing the startup overhead and practically eliminating the probability that the ensemble will not complete within the required deadline.

3.6 Running REFS in production phase

REFS moved to production in the second year of development. During this phase the code runs daily, producing weather forecasts for the region shown in Fig. 1. The code-base is still being fine-tuned and new features and improvements are being added. Fig. 3 depicts the application workflow as it is currently implemented.


The application execution is triggered daily at a specific time in the morning (currently 11:30 am for MM5 and 12:00 pm for BOLAM) by a cron job that initiates the REFS job management (REFSjm) Python script. REFSjm starts by reading a configuration file that provides various parameters, such as the type of model that is submitted (MM5 or BOLAM) and details about the underlying Grid infrastructure that will be used. REFSjm uses the Ganga library to prepare and submit a compound job consisting of 10 sub-jobs, one for each member of the ensemble. Upon successful submission of the job, REFSjm switches to a monitoring state, sampling the status of the sub-jobs every 30 seconds. If a sub-job fails, REFSjm will try to re-submit it up to a maximum number of times defined by the user. The compound job will run until all sub-jobs complete, either successfully or with a failure, or until a user-defined time limit expires, in which case REFSjm will kill the compound job. Upon job completion, REFSjm sends a notification email to a list of recipients defined in the configuration file. Since our intention is to run the application unattended on a daily basis, it is important to automate the generation of the Grid user credential (proxy certificate). For this purpose we utilize the MyProxy service. A proxy certificate, valid for 30 days, is stored on a MyProxy server, which is contacted by a cron job every day in order to retrieve a freshly signed proxy certificate valid for 24 hours. The procedure does not require any password, although it is expected that the user will manually upload a new proxy certificate to the MyProxy server once every month. This approach was selected as more secure compared to other solutions that would require storing the private-key password somewhere in the local file system. During job submission, REFSjm will typically contact a WMS (Workload Management System) server, which keeps an overall picture of the Grid status and is responsible for scheduling the jobs to the appropriate CEs (Computing Elements) according to the default or user-supplied policies. There is also the ability to bypass the WMS and submit the jobs directly to a specific CE. In the general case, though, each sub-job will be submitted to a different CE and will run concurrently on multiple WNs spread among the same or different grid sites. At the CE the job will be scheduled for execution on one of the WNs of the site. On the WN side the application is driven by a shell script which is responsible for preparing and executing the workflow described in the previous section. The first action performed by the script is to fetch the packages containing the model binaries and the static data needed for the model execution. These static data, which can be quite large, contain for example the details of the earth terrain over which the model will run. This package is stored on a Grid Storage Element (SE) and is accessed using the LFC (LCG File Catalogue) service of gLite. As a side note, we should mention that all data-management-related activities in REFS are performed using the LFC. After fetching the model package, the script invokes the NCEPget Python script. NCEPget is responsible for the preparation and transfer of the initial data from NOMADS to the WN, and contains the required logic to overcome the problems that can occur with NOMADS, as described in the previous section. As mentioned, two sets of data are required: the atmospheric initial and boundary conditions and the sea surface temperature.
The latter is common to all models and members of the ensemble. For this reason, and in order to accelerate the procedure and increase the robustness of the application, these data are retrieved and stored on the LFC asynchronously, by another program running on the UI, before the execution of the ensemble. Once all initial and boundary data have been transferred to the local host, the script initiates the execution of the workflow steps: pre-processing, model execution and post-processing. All three steps are implemented as user-provided shell scripts that are transferred to the WN within the job sandbox.


In the case of MM5, the model execution script contains the mpirun command that executes the model in parallel on the grid cluster. For this purpose the requirement for MPI is explicitly defined in the job specification, and the local scheduler at the CE reserves the appropriate number of CPU cores. Concerning the invocation of MPI in particular, we rely on the environment variables provided by gLite, which help us determine various important parameters such as the number of processes, the names of the machines available to the application, the path to the MPI executables, etc. These variables facilitate the proper setup of the execution environment and the invocation of the mpirun script. After the successful completion of the post-processing step, the main script uploads the results to a specific directory on the LFC and exits. The final step, processing the results and generating the ensemble forecast, is performed serially outside the grid, on the UI node. The REFS code is generic enough to enable the quick adoption of other weather models that implement similar ensemble forecasting workflows. In order to do this the user has to prepare the binary packages and upload them to the LFC, and provide the shell scripts that implement the workflow steps (pre-process, model run, post-process). The rest of the code-base can be used practically as-is, although the user will most probably also want to modify the REFSjm configuration file in order to fine-tune the job submission to the specific requirements of his/her application.
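To summarize the job-management behaviour described in this section (submission of ten sub-jobs, 30-second status polling, bounded re-submission of failed members, and a kill on deadline expiry), a heavily simplified sketch is given below. It is an illustration only: the GridJob class is a hypothetical stand-in for the Ganga job objects actually used by REFSjm, and its random completion is there merely to make the example self-contained.

```python
import random
import time

class GridJob:
    """Hypothetical stand-in for a Ganga-managed sub-job (one ensemble member)."""
    def __init__(self, member_id):
        self.member_id = member_id
        self.status = "new"           # new / running / completed / failed / killed
        self.resubmissions = 0

    def submit(self):
        self.status = "running"       # the real REFSjm submits through Ganga / the WMS

    def poll(self):
        # Stand-in for querying the grid: here a running member finishes at random.
        if self.status == "running" and random.random() < 0.3:
            self.status = random.choice(["completed", "failed"])

    def kill(self):
        self.status = "killed"

def run_ensemble(n_members=10, max_resubmissions=3, deadline_s=400 * 60, poll_s=30):
    jobs = [GridJob(i) for i in range(n_members)]
    for job in jobs:
        job.submit()
    start = time.time()
    while any(j.status == "running" for j in jobs):
        if time.time() - start > deadline_s:
            for j in jobs:            # deadline expired: kill whatever is still running
                if j.status == "running":
                    j.kill()
            break
        for j in jobs:
            j.poll()
            if j.status == "failed" and j.resubmissions < max_resubmissions:
                j.resubmissions += 1  # re-submit failed members, up to a user-defined limit
                j.submit()
        time.sleep(poll_s)            # sample sub-job status every 30 seconds
    return jobs                       # the real script then emails a summary report

if __name__ == "__main__":
    summary = run_ensemble(poll_s=1)  # short polling interval just for the demo
    print([(j.member_id, j.status) for j in summary])
```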

4 Demonstration of REFS results

4.1 Example Forecasts

"Deterministic" forecasts are inherently uncertain, and it is important that this uncertainty is estimated and communicated to forecast users so that they can make optimal decisions. Probability forecasts can be more "believable" than deterministic forecasts, and they are also required for the user to be able to make optimal decisions: the predicted uncertainty of the forecasts is a key element in the decision-making process, since the forecast user needs to quantify risk through estimates of the probabilities of all the different possible outcomes. The regional weather ensemble forecast system developed on the Grid infrastructure will provide forecasts that quantify the probability of occurrence of weather events, with special emphasis on high-impact weather, e.g. high winds, extreme precipitation, frost conditions and heat waves. Namely, from the post-processing of the multitude of forecasts a variety of products will be made available, such as: (a) probability of 6-hour rainfall exceeding 1 mm, 5 mm, 10 mm, etc., (b) probability of wind speed exceeding 10 m/s, 15 m/s, etc., (c) ensemble average temperature at the 850 hPa level, (d) standard deviation from the mean temperature at the 850 hPa level, and (e) meteograms of temperature at the 850 hPa level at various cities.
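To illustrate how such probabilistic products are derived from the individual members, the short sketch below computes an exceedance probability field on the model grid. It is illustrative only: the array shapes and the random input stand in for the actual post-processed model output, and the variable names are not those of the operational post-processing code.

```python
import numpy as np

# Illustrative only: 20 members of 6-hour accumulated rainfall (mm) on a
# ny x nx grid; random values stand in for the real post-processed output.
n_members, ny, nx = 20, 140, 220
rain_6h = np.random.gamma(shape=0.5, scale=8.0, size=(n_members, ny, nx))

threshold_mm = 10.0
# Exceedance probability = fraction of members above the threshold, in percent.
prob_exceed = 100.0 * (rain_6h > threshold_mm).mean(axis=0)

# Ensemble mean and spread (e.g. of 850 hPa temperature) are obtained analogously:
ens_mean = rain_6h.mean(axis=0)
ens_std = rain_6h.std(axis=0)
```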


In the following, a recent example of the regional ensemble forecast system results is given. During the summer, the weather over continental Europe is characterized by the development of thunderstorms that usually occur during the warm part of the day (noon and early afternoon). These thunderstorms are often accompanied by heavy localized rain, gusty winds, lightning and hail, locally provoking severe damage and sometimes casualties. Due to the short life cycle and localized nature of thunderstorms, the forecasting of such events is a challenging task for the weather prediction scientific community, and therefore ensemble forecasts and the associated probabilities of occurrence are of paramount importance. On 18 July 2009, a cluster of thunderstorms affected areas of western Hungary and western Slovakia, with the most intense phenomena observed within the time interval 1200-1500 UTC, as is evident in Fig. 4, which shows the lightning activity over the area as measured by the long-range lightning detection system operated by the National Observatory of Athens. The thunderstorm activity was particularly intense over the western part of Slovakia, with fatal results: one person died and dozens of others were injured when a huge tent collapsed, due to heavy rainfall and gusty winds, during Slovakia's largest open-air music festival. Figure 5 shows the probability map of 6-h accumulated precipitation exceeding 10 mm, as calculated by the regional ensemble weather forecasting system presented in this study. It is evident that REFS delineates an area where the probability of exceeding the threshold of 10 mm of rain within 6 hours is 100%, i.e. all REFS members predict high rain amounts over this area. This information gives the forecaster more confidence in the model forecasts, thus permitting the respective warning to be issued for this area.

4.2 Performance results

In this section we report on the performance of REFS from the computing point of view. We focus on two aspects: the performance of the application, reflected by the total time it takes for an ensemble to complete successfully, and its robustness, indicated by the number of times the application succeeded or failed to complete. Table 2 summarizes the statistics collected for the MM5 and BOLAM ensembles during a period of four months, from mid-April 2009 until mid-August 2009. MM5 ran with 60 processors (6 cores allocated per member) on the three reserved sites of the HellasGrid infrastructure. The BOLAM ensembles, during the same period, were submitted through the WMS to all SEE-GRID-SCI sites supporting the Meteo VO (a Virtual Organisation set up in SEE-GRID-SCI to accommodate all meteorology applications of the project). In both cases the deadline set for the ensemble to complete was 400 minutes (6:40 hrs).

Table 2. Ensemble performance statistics

Statistic | MM5 | BOLAM
Total ensemble runs | 85 | 123
Average members successfully completed per ensemble | 8.54 | 7.95
Ensembles completed with status "Success" | 61 (72%) | 69 (56%)
Ensembles completed with status "Fail" | 1 (1%) | 32 (26%)
Ensembles completed with status "Killed" | 7 (8%) | 22 (18%)
Ensembles completed with status "Running" | 15 (18%) | 0 (0%)
Average time for an ensemble to complete successfully (h:mm:ss) | 2:59:07 | 1:59:40
Minimum observed time for successful ensemble completion (h:mm:ss) | 1:19:49 | 0:41:10
Maximum observed time for successful ensemble completion (h:mm:ss) | 6:23:15 | 6:37:42

From the point of view of robustness, it is evident that MM5 runs with higher success rates and with a higher probability that the whole ensemble completes, compared to BOLAM. This is explained by the fact that the former model runs on dedicated clusters, whereas the latter goes through all the stages of Grid scheduling, relying on the WMS and also on the local schedulers to decide where and when the jobs will run. The "Killed" and "Running" states refer to the situation where the deadline for completion has expired and REFSjm sends a "kill" command to the Grid in order to force the termination of all jobs.


As can be seen, for MM5 there are cases where this command is ignored and the jobs are not killed. This is most probably related to the nature of MPI jobs and the way they are handled by the Grid scheduling mechanism, which in many cases is unable to acquire full control of the job. Regarding execution time, BOLAM shows better average and best completion times. This is more or less expected, since this model generally completes much faster than MM5 for the same region given the same initial and boundary data. The best execution time observed for a BOLAM ensemble is ~41 min. This is close to the optimum performance possible, since each individual member of the ensemble requires ~40-45 min to complete. The corresponding time for an MM5 member is ~75-90 min for an optimized binary, and thus the observed best time is 1:19 hrs. Figures 6(a) and 6(b) show the distribution of completion times (jobs with exit status "success") for MM5 and BOLAM respectively, for the referenced period of 4 months. In these figures the x axis represents the total execution time, whereas the y axis is the run counter (typically 1 run per day). One interesting observation from these graphs is that performance degradation tends to extend over a period of 3-4 days and is generally not isolated to a single day. This can be explained by the fact that when the infrastructure is overloaded or exhibits some kind of technical problem, it takes a few days for the problem to be resolved and for the Grid to come back to its normal status. Another observation one can make is that, despite the fact that MM5 runs on dedicated clusters, the ensemble does not complete successfully (that is, with all members finished) and on time in every run. There are two reasons for this: the first relates to the failures caused by the NOMADS servers; on some days it takes more time for a member to retrieve the initial data, or a member may fail to retrieve them at all, hence the imperfect success rates and times. The second reason most probably lies in the Grid software stack itself and in the fact that the job submission, monitoring and management mechanisms often fail to provide the required level of service. On the other hand, the failure rate of BOLAM (26%) is associated with the fact that no dedicated resources are used; rather, we follow the normal approach of relying on the WMS to find the most appropriate resource for execution. If there is a problem with the selected site, the jobs will fail to complete successfully. Such problems include, for instance, temporary network failures, middleware misconfiguration, etc. Although REFSjm will try to resubmit the failed jobs, there is a user-defined maximum number of re-submissions; once it is reached, the system stops trying to execute the job and marks the member as failed. At this point, it should be stated that even if not all the ensemble members finish successfully, there is still value in the process. As already stated in Section 2.3, the perturbed initial conditions used for the global ensemble system are statistically independent. For REFS, 10 different and statistically independent sets of initial and boundary conditions are used for each model, thus producing 20 ensemble members. Although it is desirable to obtain as large an ensemble as possible, the ensemble forecast is issued if at least half of the members from each model have finished.

5 Concluding remarks and prospects

This work presented the development of a regional ensemble forecasting system, aiming at providing more accurate weather forecasts, especially in cases of high-impact weather (high and low temperatures, strong winds, heavy precipitation events, etc.).


Since the necessary computer infrastructure for the development and operational use of such a system is highly demanding, this forecasting system and the relevant development make use of grid technology, both for the execution of the necessary computing jobs and for the storage of the large amount of output data. The application gridification proved more challenging than anticipated. Many factors contributed to this. First of all, the application depends on external sources (NOMADS) that may not be available all the time. Despite the various workarounds and solutions we have tried, it remains a fact that on some days the NCEP data were not available and thus the forecast could not be executed. At this point it should be mentioned that lately the addition of a new NOMADS server offering high availability has considerably improved access to the data. Moreover, the porting of MM5 to the Grid was not a straightforward task, due to the specific peculiarities of the application. Finally, in many situations the Grid infrastructure itself fails to provide the QoS needed for applications of this nature. Despite the fact that we have applied advance reservation techniques in order to ensure that the required number of CPU cores is immediately available for the MM5 ensemble, temporary failures of other services still impede the successful completion of the ensemble on some days. For example, if the VOMS service fails to generate proxy credentials, no operation can be performed on the Grid. Also, if the LFC service or the underlying SEs are off-line, no data can be transferred to and from the grid sites, preventing the transfer not only of input and output files but also of the model binaries and support files themselves. In the coming months we plan to work closely with the Grid sites in order to detect the exact causes of this misbehaviour and try to improve the overall application performance. Indeed, our experience shows that close collaboration with resource providers and site administrators is a prerequisite for running production-level codes on large-scale Grid infrastructures. Current state-of-the-art Grids are highly distributed but still contain a few single points of failure (such as the services mentioned above) which, when not operating properly, may impede the execution of applications. Although these infrastructures are closely monitored for problems, in many cases it is up to the user to report failures and push for their prompt resolution. Moreover, in order to run applications with strict deadline restrictions (such as a meteorological model), the negotiation for, and dedicated allocation of, resources is in some cases unavoidable in order to ensure fast scheduling and acceptable job completion times. Concerning the problem of MPI support in particular, the usage of a few specific sites with dedicated resources is obviously not optimal and, to some extent, not in line with the spirit of Grid computing. This lack of proper MPI support has actually been recognised by large production Grids (EGEE in particular) and specific efforts have been put in place in order to overcome these problems. We hope that, through our feedback and by closely collaborating with them, we will contribute to improving the current situation, for the benefit of other applications with similar requirements. Despite the above technical issues, it was demonstrated that the resulting system is able to provide ensemble forecasts at a regional scale and at the resolution necessary to better describe local weather phenomena that can produce high-impact weather.
An example of a case study characterized by heavy thunderstorms in Central Europe during summer 2009 (with damage, casualties and several injuries) was discussed, showing the utility of REFS for the accurate prediction of thunderstorm activity during summer. So far, the described REFS makes use of two weather models (BOLAM and MM5). It is in the authors' plans to extend REFS with the addition of two other meteorological models, namely WRF/NMM and Eta, through the collaborations built in the frame of the EU-funded project SEE-GRID-SCI.


WRF/NMM and Eta are both non-hydrostatic models that, like MM5, are parallel applications and use MPI. Indeed, the multi-analysis ensemble (based on the use of multiple initial conditions) is used to generate inter-forecast variability that reflects a realistic spectrum of initial errors, while the multi-model ensemble contributes to sampling the uncertainty of the models themselves. That is why the increase from two to four models will benefit the multi-analysis, multi-model REFS that is being built, with the number of ensemble members increasing from 20 to 40.

Acknowledgements

This work has been supported by the EU funded project SEE-GRID-SCI. The authors are also grateful to NCEP (USA) for providing GFS initial and forecast field data that allowed the operational use of the regional ensemble forecasting system.

REFERENCES

[1] Akylas, E., V. Kotroni and K. Lagouvardos, 2007: Sensitivity of high resolution operational weather forecasts to the choice of the planetary boundary layer scheme. Atmospheric Research, 84, 49-57.
[2] Bird, I., et al., 2004: Operating the LCG and EGEE production Grids for HEP. In: Proceedings of the CHEP'04 Conference.
[3] Buzzi, A., M. Fantini, P. Malguzzi and F. Nerozzi, 1994: Validation of a limited area model in cases of Mediterranean cyclogenesis: surface fields and precipitation scores. Meteorol. Atmos. Phys., 53, 137-153.
[4] Buzzi, A., R. Cadelli and P. Malguzzi, 1997: Low level jet simulation over the Antarctic ocean. Tellus, 49A, 263-276.
[5] Buzzi, A., N. Tartaglione and P. Malguzzi, 1998: Numerical simulations of the 1994 Piedmont flood: role of orography and moist processes. Mon. Wea. Rev., 126, 2369-2383.
[6] Buzzi, A., and L. Foschini, 2000: Mesoscale meteorological features associated with heavy precipitation in the southern Alpine region. Meteorol. Atmos. Phys., 72, 131-146.
[7] Campana, S., et al., 2005: Analysis of the ATLAS Rome Production Experience on the LHC Computing Grid. In: IEEE International Conference on e-Science and Grid Computing.
[8] Diomede, T., S. Davolio, C. Marsigli, M. M. Miglietta, A. Moscatello, P. Papetti, T. Paccagnella, A. Buzzi and P. Malguzzi, 2008: Discharge prediction based on multi-model precipitation forecasts. Meteorol. Atmos. Phys., 101, 245-265.
[9] Dudhia, J., 1993: A non-hydrostatic version of the Penn State/NCAR mesoscale model: validation tests and simulation of an Atlantic cyclone and cold front. Mon. Wea. Rev., 121, 1493-1513.
[10] Epstein, E. S., 1969: Stochastic dynamic prediction. Tellus, 21, 739-757.
[11] Escriba, P. A., A. Callado, D. Santos, C. Santos, J. Simarro and J. A. García-Moya, 2009: Added value of non-calibrated and BMA calibrated AEMET-SREPS probabilistic forecasts: the 24 January 2009 extreme wind event over Catalonia. Plinius Conference Abstracts, Vol. 11, Plinius11-138-2, 11th Plinius Conference on Mediterranean Storms.
[12] Foster, I., and C. Kesselman, 2003: The Grid: Blueprint for a New Computing Infrastructure, 2nd edn., Morgan Kaufmann, San Francisco.
[13] Hong, S.-Y., and H.-L. Pan, 1996: Nonlocal boundary layer vertical diffusion in a medium-range forecast model. Mon. Wea. Rev., 124, 2322-2339.
[14] Hou, D., E. Kalnay and K. K. Droegemeier, 2001: Objective verification of the SAMEX '98 ensemble forecasts. Mon. Wea. Rev., 129, 73-91.
[15] Kain, J. S., and J. M. Fritsch, 1993: Convective parameterization for mesoscale models: The Kain-Fritsch scheme. The Representation of Cumulus Convection in Numerical Models, Meteor. Monogr., No. 46, Amer. Meteor. Soc., 165-177.
[16] Kalnay, E., 2003: Atmospheric Modelling, Data Assimilation and Predictability. Cambridge Univ. Press, 341 pp.
[17] Kotroni, V., and K. Lagouvardos, 2001: Precipitation forecast skill of different convective parameterization and microphysical schemes: application for the cold season over Greece. Geophys. Res. Lett., Vol. 108, No. 10, 1977-1980.
[18] Kotroni, V., and K. Lagouvardos, 2004: Evaluation of MM5 high-resolution real-time forecasts over the urban area of Athens, Greece. J. Appl. Meteor. (in press).
[19] Krishnamurti, T. N., C. M. Kishtawal, Z. Zhang, T. Larow, D. Bachiochi and E. Williford, 2000: Multimodel ensemble forecasts for weather and seasonal climate. J. Climate, 13, 4196-4216.
[20] Lagouvardos, K., V. Kotroni, A. Koussis, C. Feidas, A. Buzzi and P. Malguzzi, 2003: The meteorological model BOLAM at the National Observatory of Athens: assessment of two-year operational use. J. Appl. Meteor., 42, 1667-1678.
[21] Laure, E., and B. Jones, 2009: Enabling Grids for e-Science: The EGEE Project. In: Grid Computing: Infrastructure, Service, and Application, CRC Press, 55-72.
[22] Laure, E., S. Fisher, A. Frohner, et al., 2006: Programming the Grid with gLite. Computational Methods in Science and Technology, 12 (1), 22-45.
[23] Leith, C. E., 1974: Theoretical skill of Monte Carlo forecasts. Mon. Wea. Rev., 102, 409-418.
[24] Lorenz, E. N., 1965: A study of the predictability of a 28-variable atmospheric model. Tellus, 17, 321-333.
[25] Lorenz, E. N., 1968: The predictability of a flow which possesses many scales of motion. Tellus, 21, 289-307.
[26] Montani, A., M. Capaldo, D. Cesari, C. Marsigli, U. Modigliani, F. Nerozzi, T. Paccagnella, P. Patruno and S. Tibaldi, 2003: Operational limited-area ensemble forecasts based on the Lokal Modell. ECMWF Newsletter, 98, 2-7.
[27] Mościcki, J. T., et al., 2009: Ganga: a tool for computational-task management and easy access to Grid resources. Computer Physics Communications, abs/0902.2685.
[28] Palmer, T. N., F. Molteni, R. Mureau, R. Buizza, P. Chapelet and J. Tribbia, 1993: Ensemble prediction. In: Proceedings of the ECMWF Seminar on Validation of Models over Europe, Vol. I, ECMWF, Shinfield Park, Reading, UK.
[29] Schultz, P., 1995: An explicit cloud physics parameterization for operational numerical weather prediction. Mon. Wea. Rev., 123, 3331-3343.
[30] Spencer, P. L., and D. J. Stensrud, 1998: Flash flood events: importance of the subgrid representation of convection. Mon. Wea. Rev., 126, 2884-2912.
[31] Tracton, M. S., and E. Kalnay, 1993: Ensemble forecasting at NMC: Practical aspects. Weather and Forecasting, 8, 379-398.
[32] Wei, M., Z. Toth, R. Wobus and Y. Zhu, 2008: Initial perturbations based on the ensemble transform (ET) technique in the NCEP global operational forecast system. Tellus, 60A, 62-79.


FIGURES LIST

Figure 1. Horizontal extension of (a) the MM5 grid and (b) the BOLAM grid used in the regional ensemble forecasting model chain.


Figure 2. Typical workflow of a weather prediction application

Figure 3. REFS workflow implementation on the Grid


Figure 4. Map of lightning flashes recorded from 1200 to 1800 UTC on 18 July 2009 by the ZEUS lightning detection network.


Figure 5. Probability of exceedance of 10 mm of precipitation in 6 hours (1200 to 1800 UTC, 18 July 2009), as forecast by the REFS system.


Figure 6. (a) MM5 ensemble distribution of completion times, (b) BOLAM ensemble distribution of completion times.

