G-AQFS: Grid computing exploitation for the management of air quality in presence of complex meteorological circulations

G. Aloisio1, M. Cafaro1, R. Cesari2, C. Mangia2, G.P. Marra1,2,*, M. Miglietta2, M. Mirto1, U. Rizza2, I. Schipa2, A. Tanzarella2

1 Center for Advanced Computational Technologies/ISUFI, University of Lecce, Italy
2 Institute of Atmospheric Sciences and Climate, C.N.R., Section of Lecce, Italy
1 {giovanni.aloisio, massimo.cafaro, gianpaolo.marra, maria.mirto}@unile.it
2 {r.cesari, c.mangia, gp.marra, m.miglietta, u.rizza, i.schipa, a.tanzarella}@isac.cnr.it
* To whom all correspondence should be addressed: Gian Paolo Marra, Institute of Atmospheric Sciences and Climate, C.N.R., S.P. Lecce-Monteroni, km 1,200 - 73100 Lecce (Italy). Tel: +39 0832 298 811, Fax: +39 0832 298 716, Email:
[email protected]
Abstract: Leveraging Grid Computing technology, i.e. the virtualization of distributed computing and data resources such as processing, network bandwidth and storage capacity to create a single system image, we present a Grid Air Quality Forecast System (G-AQFS). The Modeling system consists of meteorological and dispersion models coupled in cascade. The computational workflow of the Modeling system is defined by means of DAGs (Directed Acyclic Graphs). A simple system is presented to manage and schedule the computational Grid resources. In particular, the algorithm developed for the Work Flow Scheduler, named Depth-First Search Job with Priority (DFSP), is illustrated. As a case study, the system has been applied over the Salento area, in the Apulia region (South-eastern Italy), to simulate ground level ozone concentration. Model predictions have been compared with field measurements, with reasonable results.

Keywords: Air Quality and Atmospheric Modeling, Computational Grid, Grid Computing, Globus Toolkit

Reviewed and accepted: 31 Mar. 2004

1. Introduction

The management of air quality is a quite complex task: it involves identification of the sources of materials emitted into the atmosphere, estimation of the emission rates of pollutants, understanding of the transport and diffusion of the substances, and knowledge of the physical and chemical transformation processes that can occur during that transport. Mathematical models, putting together all these aspects, represent a fundamental tool not only to assist environmental authorities in planning control measures, but also to improve the understanding of the emissions, chemistry and meteorology that drive air quality. Coastal areas are often preferred sites for industrial development. The meteorology of such regions can adversely affect the transport and dispersion of air pollutants and cannot generally be captured with simplified models, which assume that the flow is stationary and homogeneous [1].
In the presence of coastlines and orography, complex circulations arise, characterized by large horizontal and vertical variations of the meteorological parameters, caused by the different diurnal heating cycle at the sea/land boundary [2]. In particular, over a flat, straight peninsula, small-scale temporal and spatial variations of the wind field and of the boundary layer structures are present, due to the development and overlapping of different thermal circulations. The ground level impact of pollutants is determined by non-stationary 3D trajectories, which must be computed for a correct calculation of pollutant transport and dispersion. Therefore, a combined Modeling system that couples atmospheric flows with dispersion and chemistry is needed. This is particularly true for photochemical pollution, where non-linear chemistry is combined with meteorological effects that strongly influence the maximum ozone concentration: primary and secondary pollutants may also be transported far away from the area where they are emitted and produced. Very often, high ozone levels are reached not close to the areas in which the precursors are emitted, but in areas downwind of the sources.
The emerging Grid technology offers the resources needed to perform complex atmospheric and climate simulations. These simulations may ultimately be used to assess the impacts of global climate change at the regional scale. According to Foster et al. [3], a Computational Grid is a collection of distributed, possibly heterogeneous resources which can be used as an ensemble to execute large-scale applications. By using these resources, it is possible to access information about the grid components, locate and schedule resources, communicate between nodes, access programs and data sets within data archives, measure and analyze performance and, finally, authenticate users and resources. We exploit the Globus Toolkit [4], the de facto middleware standard for computational grids, which offers the power and security needed to develop atmospheric Modeling applications. In this paper, we describe the use of Grid Computing technology to optimise the execution of meteorological and dispersion models coupled in cascade: RAMS [5], CALMET [6], CALPUFF [7] and CALGRID [8]. As a test case, the modeling system has been applied to simulate a summer photochemical smog episode over the Salento Peninsula, a narrow flat land in the south-eastern part of Italy with two big industrial complexes on the opposite coastlines. The geographic position of this area favours the development of complex meteorological circulations and a consequently complex pattern of ground level pollutant concentrations. Predicted meteorological and concentration data are then compared with measured data. The paper is organized as follows. In Section 2, we recall some essential computational background, and in Section 3 we describe the Modeling system and the models used in the simulations. Section 4 describes G-AQFS and its main components, while Section 5 describes the meteorological and dispersion simulations and presents a case study.
Finally, the conclusions and the future evolution of our work are given in Section 6.

2. Computational background

Grid computing couples parallel and distributed computing with high speed networking; it is an evolution of concepts that date back to 1992, when metacomputing was introduced. The key idea of a virtual supercomputer, made of several computing resources connected by high-speed networks, promised to allow the solution of problems otherwise too large for a single supercomputer, or whose execution would benefit from division into several components to be executed on different architectures. Grid technologies need to provide support for:

· security;
· resource management;
· data management;
· communication;
· quality of service;
· adaptation.
Journal of Digital Information Management · Volume 2 Number 2 · June 2004
A security infrastructure is needed to provide users with authentication and authorization services; the aim of resource management is to hide heterogeneity by providing a coherent and uniform interface. Data grids need mechanisms to handle large amounts of data, such as metadata indexing, searching, and parallel data transfers. Communication requires specialized protocols for the wide area environment, possibly guaranteeing quality of service. Finally, support for adaptation is crucial in grid environments; it is based on static information known a priori about resources, networks, etc., and on dynamic information not known until runtime (e.g., resource load). Collaboration is at the heart of scientific investigation, and it is even more constructive between people with expertise in different fields. Grids can assist and foster collaboration in many ways, and provide researchers with sophisticated approaches to their data resources. The challenge here is to provide shared virtual environments with the possibility of multiple flows (audio, video, text, control) in near real time. Grids will allow researchers to solve large scale computational science problems, to collaborate on results, and to manage and analyze large volumes of distributed data. This is the impact for the professional scientist.

2.1 Scientific workflow

Today many scientific endeavours rely on complex computer simulations. In this context, computing has become increasingly complex and intensive. Indeed, in several fields, such as computational biology, chemistry and the management of environmental problems, scientists carry out their research with heavy use of computing techniques. We can identify one of these with the term Scientific Workflow. Workflows have received an enormous amount of attention in the research on information systems such as databases, business environments and others. M.P. Singh and M.
Vouk [17] introduce the term Scientific Workflows as a blanket term to describe series of structured activities and computations that arise in scientific problem-solving in a complex, structured environment with dependencies. Details about Scientific Workflows can be found in [11, 12].

2.2 Globus Toolkit and Grid Resources Broker (GRB) libraries

The Globus Toolkit [19] has gained wide acceptance worldwide: it is deployed at multiple organizations as the middleware of choice for grid computing, and many scientific endeavours rely on it. It is a layered architecture that addresses grid security and remote access and control, providing support for PKI [18] single sign-on authentication/authorization, an information-rich environment based on the LDAP protocol, and a standardized interface to heterogeneous computing resources. However, the complexity of this middleware hinders the majority of scientists (who are not computer scientists) from doing their useful work. The Grid Resource Broker [14, 15] libraries are designed to bridge the gap between the grid and those scientists and developers deterred by the complexity of the Globus Toolkit. They leverage existing functionalities in the Globus Toolkit, providing enhanced Globus services. The transition to grid computing is not easy; moreover, the learning curve for Globus is particularly steep, so we designed GRB to promote the use of computational and data grids by providing a tool for accessing Globus services. The libraries utilize the following core Globus services to enable grid computing: Grid Resource Information Service (GRIS); Grid Index Information Service (GIIS); Globus Resource Allocation and Management (GRAM); Grid Security Infrastructure (GSI); Grid Security Infrastructure FTP (GSI-FTP).
The GRIS is a service running on each computing resource; it collects both static and dynamic resource information, and can also report the collected information to a hierarchical GIIS server, posting and receiving data in LDIF format using the Lightweight Directory Access Protocol (LDAP). Thus, the GIIS provides a uniform resource information service allowing distributed access to structural and state information about the grid.
Resource management is the aim of the GRAM (Globus Resource Allocation Manager) module; it takes care of resource location and allocation and, moreover, performs process management. The user can specify application requirements using RSL, the Resource Specification Language. The Grid Security Infrastructure (GSI) module is responsible for handling the authentication mechanism used to validate the identity of users and resources; it makes use of both the Generic Security Service API (GSS) and SSL (Secure Sockets Layer). Certificates are used for the authentication/authorization mechanisms. Remote access to data via sequential and parallel interfaces can be obtained by exploiting the GSI-FTP server. The service is based on the GridFTP protocol and allows partial-file, third-party and striped/parallel transfers, support for GSI authentication and negotiation of TCP buffer sizes.

3. Modeling system

The Modeling system consists of meteorological and dispersion models [9, 10]. Two meteorological models have been coupled in cascade: RAMS and CALMET. The first is a prognostic mesoscale model, while the second is essentially a boundary layer preprocessor designed to provide all of the fundamental parameters needed as input by the dispersion models. The output provided by the RAMS/CALMET system is used by the dispersion models CALGRID and/or CALPUFF. RAMS (Regional Atmospheric Modeling System) is a highly versatile model, developed at Colorado State University, for simulating and forecasting weather systems. It contains an atmospheric model, which performs the actual simulation, and a data analysis package, which prepares initial data for the atmospheric model from observed meteorological data. The atmospheric model is constructed around the full set of primitive dynamical equations which govern atmospheric motions.
The RAMS model in this study was initialised and driven using data from the European Centre for Medium-Range Weather Forecasts (ECMWF), updating the fields every six hours. CALMET (CALifornian METeorological model) is a meteorological model which includes a diagnostic wind field generator containing objective analysis and parameterized treatments of slope flows, kinematic terrain effects, terrain blocking effects, a divergence minimization procedure, and a micrometeorological model for overland and overwater boundary layers. The input required by CALMET consists of four categories of data: a geophysical data file, upper air sounding data, surface meteorological data, overwater data and, optionally, a prognostic gridded wind field; we have chosen this last option. The output of CALMET consists of 3D gridded fields of wind components and air temperature, and 2D fields of turbulence parameters. CALMET is designed to drive the two dispersion models CALPUFF and CALGRID. CALPUFF (CALifornian PUFF model) is a non-steady-state Gaussian puff model containing modules for complex terrain effects, overwater transport, coastal interaction effects, building downwash, dry and wet pollutant removal, and simple chemical transformation. It is designed to use the meteorological fields provided by CALMET and time-dependent source and emission data. It produces one-hour averaged ground concentrations for the simulated species. CALGRID (CALifornian GRID model) is an Eulerian photochemical three-dimensional model which includes accurate modules for horizontal and vertical advection/diffusion. The model is based on the SAPRC-90 chemical mechanism, which contains 54 chemical species and 129 reactions. The model requires information about the meteorological and turbulence fields (from CALMET) and emission data in the domain, at the boundary and at the initial time. It produces 3D hourly fields of the concentrations of the simulated species.

4. Grid Air Quality Forecast System

G-AQFS is a tool that manages the execution of models in cascade, integrated in a Computational Grid. The application flow of this system is defined by the number of models considered and by their logical interconnections. Furthermore the user, as we show later,
can redefine the logical workflow, based on the same models but with different logical interconnections. The G-AQFS software core must be available on at least one machine where a power user has installed and configured all of the packages of the system. The computational strategy is based on the Master-Slave model: the Master runs on the machine where the system is installed and the Slaves on other nodes belonging to the same computational Grid. In particular, the machine where G-AQFS is installed is called the Master Grid Node (MGN), while the other computational Grid resources are named Slave Grid Nodes (SGNs). The installation of G-AQFS on the MGN includes the RAMS, CALMET, CALPUFF and CALGRID models, the configuration files and the software modules required by the integrated Modeling system. G-AQFS contains two large data repositories: the land topography and the large-scale synoptic model data coming from the European Centre for Medium-Range Weather Forecasts (ECMWF), both used by the RAMS model. The data repositories are stored only on the MGN. During the execution, G-AQFS manages the assigned workflow, defined by means of a DAG (Directed Acyclic Graph). DAGs are very useful and widely used to manage scientific workflows. The graph vertices represent the models and the edges characterize functional dependencies. The DAG file contains information about: the number of vertices, the vertex labels, the execution priority of each vertex, the lists of previous and following vertices, and the package name related to each vertex. By interpreting the lists of previous and following vertices, the graph edges are obtained. The vertex priority is an index used by the scheduling algorithm for the ordering of the workflow tasks. The accepted values are positive numbers [1 = MAX_PRIORITY, 2, 3, etc.]. A package of a vertex of an assigned DAG can be run on the MGN or on the SGNs of the Computational Grid. This possibility depends mainly on the use of the data repositories present on the MGN.
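For illustration only (the vertex data below is made up, not the real G-AQFS DAG file), recovering the graph edges from the per-vertex lists of previous and following vertices is straightforward in Ruby, the scripting language used for part of G-AQFS:

```ruby
# Hypothetical three-model chain: each vertex records its predecessor and
# successor lists; the edges are derived from the :next lists.
dag = {
  1 => { label: "rams",    prev: [],  next: [2] },
  2 => { label: "calmet",  prev: [1], next: [3] },
  3 => { label: "calgrid", prev: [2], next: [] },
}

# One directed edge [v, n] for every entry of every :next list.
edges = dag.flat_map { |v, attrs| attrs[:next].map { |n| [v, n] } }
```

Here `edges` reconstructs exactly the functional dependencies the text describes: model 1 feeds model 2, which feeds model 3.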
As shown in Figure 1, the packages 1, 2, 4 and 11 require access to the ECMWF datasets (these packages correspond to runs of the RAMS model). It is possible to move the execution of these packages onto SGNs, but this choice requires replicating the ECMWF datasets. To specify whether a package is executed on the Master or on a Slave Grid Node, we use the following syntax for the user DAG description. In the former case, a DAG vertex is written as:

V:L:P:(PAVL):(NAVL):PACKAGE:MASTER

otherwise:

V:L:P:(PAVL):(NAVL):PACKAGE:SLAVE:(LIST_OF_PLATFORM_FOR_EXECUTION)

(V = VERTEX, L = LABEL, P = PRIORITY, PAVL = PREVIOUS_ADJ_VERTEX_LIST, NAVL = NEXT_ADJ_VERTEX_LIST)

The LIST_OF_PLATFORM_FOR_EXECUTION parameter lists the platforms for which the executables are available. The platform of the master node is known in the setup phase; if the user wants to execute a package on this machine, the related executable must be available. The DAG description is provided by the user and allows the creation of customized simulations. Figure 1 shows the DAG of the G-AQFS packages included in this first version of the system (the main ones are RAMS, CALMET, CALPUFF and CALGRID). A Computational Grid is made of platform-independent nodes. Generally, many computing platforms are available: supercomputers, workstations and PCs. Thus, executables of the G-AQFS packages for several platforms (i386, alpha, ...) are needed (for some packages the source code can also be provided). In the workflow execution, the output of one model becomes the input of another; in order to also include datasets coming from particular data repositories (topography, ECMWF, measurements, etc.), we need to manage the presence of various data formats and select the right information to use as input for a specific model. Package execution is managed by appropriate drivers called Package Drivers (PDs). The driver is responsible for input file preparation and for package execution (on the MGN or on SGNs). In the case of execution on a Slave Grid Node, the PD needs to move the input and the executable files to the slave machine and to collect back on the master machine the output generated during the execution.

Figure 1. DAG of the G-AQFS packages (vertices 1-17; priority 1 = high, priority 2, default priority 3 = low)

An overview of the G-AQFS components is shown in Figure 2. The Controller module integrates the functional elements needed for the management of the Modeling system workflow and for the control of single jobs on a master or slave grid node. The workflow is described through the simulation DAG. The Package Configuration module collects and catalogues the configuration files and the models. Several DAG topologies (defined by the user to describe the models and specific running paths) are catalogued in the Workflow Topologies module (WFT). In this way, all of the information needed for the correct operation of the Controller is defined. While running the paths related to several DAGs, the Controller uses the Races Container module to store every simulation of the model chain. For each DAG, the simulation result is stored in the Races Container.
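As a hypothetical sketch (only the V:L:P:(PAVL):(NAVL):PACKAGE:... field layout comes from the paper; the parsing details and names below are our own assumptions), a vertex line of the DAG description could be parsed in Ruby as:

```ruby
# Parsed representation of one DAG vertex line.
DagVertex = Struct.new(:id, :label, :priority, :prev_v, :next_v,
                       :package, :location, :platforms)

def parse_vertex(line)
  f = line.split(':').map(&:strip)
  # Turn "(3,5)" into [3, 5]; an empty list "()" becomes [].
  ints = ->(s) { s.delete('()').split(',').reject(&:empty?).map(&:to_i) }
  DagVertex.new(f[0].to_i, f[1], f[2].to_i, ints.(f[3]), ints.(f[4]),
                f[5], f[6], f[7] ? f[7].delete('()').split(',') : [])
end

v = parse_vertex("2:calmet:2:(1):(3,5):CALMET:SLAVE:(i386,alpha)")
```

With this layout, the edge information needed by the scheduler is carried directly by the `prev_v` and `next_v` lists of each vertex.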
Figure 2. G-AQFS overview

4.1. The Work Flow Scheduler

In this section we describe the strategy that manages the execution of the packages present in an assigned workflow by exploiting a DAG. To store a DAG we use the adjacency list representation (a linked representation). Each vertex has two lists (the list of incoming vertices and the list of outgoing vertices) and other attributes, e.g., the priority of execution, the job status, the name of the Package Driver and the list of input files. Part of the G-AQFS software has been implemented using Ruby [21], an object-oriented scripting language. A DAG node is represented by the Vertex class. The status of a vertex can be:
· Vertex not explored: vertex.status = BLUE
· The associated Package Driver is running: vertex.status = RED
· The execution of the Package Driver has finished: vertex.status = GREEN

Each vertex has an attribute for the name of its Package Driver: v.pk_driver. Given an assigned DAG, the Work Flow Scheduler is an implementation of the recursive algorithm presented below as pseudo code; the recursive function is named Search(v, p).

When the DAG file is read, all of the vertices are in the BLUE status (vertex_i.status = BLUE). By interpreting the priority of each vertex we can decide which package may execute its Package Driver while respecting the execution constraints imposed by the DAG. A vertex j cannot run until all of its incoming edges come from vertices in the GREEN status. When vertex j can run its Package Driver, its status is set to RED (vertex_j.status = RED). When the execution finishes, the controller (described in the next section) sets the status to GREEN (vertex_j.status = GREEN). The Work Flow Scheduler iterates the search for runnable vertices, respecting the functional dependencies imposed by the DAG; the priority of the vertices influences the order of the visit of the DAG nodes. The search is obtained by traversing the vertices in a depth-first fashion. We named this visit of the DAG Depth-First Search Job with Priority (DFSP). In order to illustrate the DFSP algorithm of the Work Flow Scheduler we need some definitions.

Let G be a directed acyclic graph (DAG) and let V[G] be the set of its vertices. For each vertex v ∈ V[G] we have an assigned priority p ≥ 0, where p_MAX = 1 is the highest priority. For each vertex there exists a pair (in, out), where in is the number of edges entering the vertex v and out is the number of edges leaving it. Let T[G] be the set of final vertices of the DAG, T[G] = {v ∈ V[G] : out = 0}, and let S[G] be the set of source vertices of the DAG, S[G] = {v ∈ V[G] : in = 0}. Let ADJ[v]_Previous be the set of predecessor vertices of the vertex v and ADJ[v]_Next the set of its successor vertices. Each vertex has an attribute representing its status, v.status ∈ {BLUE (vertex not explored), RED (vertex running), GREEN (vertex execution done)}, and a priority attribute v.priority. Let P_V[G] be the set of distinct priority values p_i of the DAG:

∀ p_i ∈ P_V[G] : ∃ v_j ∈ V[G] such that v_j.priority = p_i;  ∀ p_i, p_j ∈ P_V[G], i ≠ j : p_i ≠ p_j
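The search strategy just described can be sketched in a few lines of Ruby, the language used for part of G-AQFS (the class and method names here are our own illustration, not the actual G-AQFS code): a BLUE vertex is runnable when all of its predecessors are GREEN and its priority is within the bound p; from a GREEN vertex the visit descends into its successors in priority order.

```ruby
# Illustrative DAG vertex for the DFSP visit.
class Vertex
  attr_accessor :status, :priority, :prev_v, :next_v, :pk_driver
  def initialize(priority, pk_driver)
    @status    = :blue      # not explored yet
    @priority  = priority   # 1 is the highest priority
    @pk_driver = pk_driver
    @prev_v    = []         # ADJ[v]_Previous
    @next_v    = []         # ADJ[v]_Next
  end
end

# Depth-first search for a runnable job, bounded by priority p.
def search(v, p)
  case v.status
  when :blue
    # A source vertex has an empty prev_v list, so it is always "ready".
    if v.prev_v.all? { |a| a.status == :green } && v.priority <= p
      v.status = :red
      return v
    end
  when :green
    # Descend into eligible successors, most urgent priority first.
    v.next_v.select { |a| a.priority <= p }.sort_by(&:priority).each do |a|
      job = search(a, p)
      return job if job
    end
  end
  nil  # nothing runnable along this branch (e.g. a RED vertex)
end
```

The outer scheduler loop would repeatedly call `search` from the source vertices, execute the returned job's Package Driver, and stop once every vertex is GREEN.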
The Work Flow Scheduler stops when all of the vertices of the DAG are in the GREEN status (vertex_j.status = GREEN).

num_vertex = count_vertex_of_graph(G)
counter = 0
while (counter < num_vertex)
    Job = NIL
    for (each p ∈ P_V[G], ordered by max priority)
        for (each v ∈ S[G], ordered by max priority)
            Job = Search(v, p)
            if (Job != NIL) then
                execute(Job.pk_driver)
                exit_all_for
            end if
        end for
    end for
    counter = count_green_vertex_of_graph(G)
end while

Search(v, p)
    Job = NIL
    if (v.status == BLUE)
        if (v ∉ S[G]) then
            ready = 1
            for (each a ∈ ADJ[v]_Previous)
                if (a.status != GREEN) then ready = 0
            end for
            if (ready == 1) then
                if (v.priority ≤ p) then
                    v.status = RED
                    Job = v
                    return Job
                end if
            else
                return Job
            end if
        else
            if (v.priority ≤ p) then
                v.status = RED
                Job = v
                return Job
            end if
        end if
    else if (v.status == GREEN)
        for (each a ∈ ADJ[v]_Next with priority ≤ p, ordered by max priority)
            Job = Search(a, p)
            if (Job != NIL) then return Job
        end for
    end if
    return Job
end Search

4.2. G-AQFS Controller overview
The G-AQFS Controller is shown in Figure 3. The main components are the Work Flow Scheduler and the Round Robin Scheduler with Priority. The first uses the algorithm that we have developed for this system, called "depth-first search job with priority", in order to determine the operating sequence of the packages based on a specific DAG. The second builds a queue of Grid resources in order to implement a round robin mechanism. All of the functional modules run as concurrent threads when the Controller starts. In the following, we describe the G-AQFS Controller components.

WFS (Work Flow Scheduler): the "depth-first search job with priority" algorithm is implemented to select the packages to be executed. The output (a job) is the input for the Q1-FIFO queue.

RRSP (Round Robin Scheduler with Priority): by querying the Globus MDS (Monitoring and Discovery Service) [11] with a specific function of the GRB (Grid Resource Broker) library, it is possible to obtain the dynamic information related to the status of the Grid resources. Thus, a queue of available computational resources, ordered on the basis of their computational features (e.g. CPU type and speed, RAM, workload, etc.), is built. The RRSP module takes a package from the Q1-FIFO queue and runs it on the best computational resource available at time Ti, i.e., on the BEST_GRID_HOST(Ti) machine. BEST_GRID_HOST(Ti) is obtained using a simple metric based on MDS attributes. Afterwards, the package is forwarded to the Q2-FIFO queue.

PS (Package Starter): it constantly checks whether any elements have been inserted in the Q2-FIFO queue. If so, it creates a thread for the independent execution of the PD associated to the package.
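The metric behind BEST_GRID_HOST is not spelled out in detail; purely as an illustration (the attribute names and weights below are our assumptions, standing in for values that would really come from Globus MDS queries), such a selection could look like:

```ruby
# Toy ranking of grid hosts by a weighted score: favour fast CPUs and
# free memory, penalize current workload. Weights are arbitrary here.
GridHost = Struct.new(:name, :cpu_mhz, :free_ram_mb, :load)

def best_grid_host(hosts)
  hosts.max_by { |h| 0.5 * h.cpu_mhz + 0.3 * h.free_ram_mb - 100.0 * h.load }
end

h1 = GridHost.new("sgn1", 1000, 512, 0.1)
h2 = GridHost.new("sgn2", 2000, 256, 0.9)
best = best_grid_host([h1, h2])
```

A fresh query before each scheduling decision keeps the ranking consistent with the dynamic (runtime) information discussed in Section 2.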
PD (Package Driver): the GridFTP protocol [16], available through the GRB-GSIFTP library, is used for transferring the package and its related input/output files. The package is run on the grid nodes using the GRB-GSI library. The PD starts the execution by adding a token with the associated package name to a circular queue. The monitoring of job progress is achieved by the PD querying the MDS. If a job fails, the PD deletes the related token from the circular queue.

CSJ (Check Status of Jobs): this thread monitors the status of the tokens inserted in the circular queue. When a package terminates its execution normally, the CSJ notifies this condition to the WFS module. If any problem occurs while the package is running, the CSJ communicates to the WFS module the error status and the type of failure; in this case the WFS module reschedules the package execution. The CSJ includes a timeout mechanism to ensure that a token does not remain in the circular queue for a long time due to an error of the associated PD. In such a situation, the token is removed and the CSJ module communicates an error message to the WFS; the package is then forwarded to the Q1-FIFO queue for a new run. When all packages present in the DAG of the considered path have been executed, the Controller stops.

Figure 3. G-AQFS Controller overview (modules: WFS, Q1-FIFO, RRSP, Q2-FIFO, PS, PD, CSJ, circular queue, MDS search, Grid computational resources)

5. A case study

The main focus of this work is the implementation of the first version of G-AQFS. To this end, the main activity has been developing, debugging and running the Modeling system on a small Grid environment (with four Linux workstations). As a case study, the Modeling system has been applied over the Salento Peninsula in the Apulia region (South-eastern Italy) to simulate photochemical pollution. The area, situated between two seas, is subject to complex meteorological circulations. The period chosen is 4-7 August 2002. In this period, the analysis of AVN maps at 500 hPa shows the presence of a minimum south-east of the British Isles and a wide area of high pressure extending from Tunisia to the central Mediterranean Sea. This typical summer condition determines stable weather over the Apulia region. The absence of a strong synoptic circulation favours the generation of complex sea breeze circulations (mainly northerly during daytime) along the coasts of Apulia. The simulation has been performed using the RAMS-CALMET-CALGRID part of the DAG. The simulations with the RAMS model have been performed in a two-way nested grid configuration with three grids (see Figure 4). This makes it possible to resolve meteorological features at different spatial scales. For initial and boundary conditions, the Isentropic Analysis System (ISAN) package (the RAMS module for the generation of data analyses) is used. Initially, the analysed fields are based on the ECMWF (European Centre for Medium-Range Weather Forecasts) gridded datasets. Every 6 hours, the lateral and top boundary conditions are updated in the coarsest grid using the ECMWF gridded datasets. In the coarsest grid domain, a nudging toward the data is applied in the 3 grid points closest to the lateral boundaries and in the upper 5 grid levels. CALMET and CALGRID have been run on the inner grid. The domain size and grid spacing used by the modeling system are summarized in Table 1.

Figure 4. The modeling domain and the three nested grids.

Table 1. Specification of domain size and grid spacing used by the modeling system. Nx, Ny and Nz are the numbers of mesh points; Dx and Dy are the mesh spacing.
Grid                       Nx    Ny    Nz    Dx, Dy (km)
1  RAMS                    60    60    25    30
2  RAMS                    44    38    25    15
3  RAMS-CALMET-CALGRID     58    58    10    1.875
5.1 Meteorological and dispersion simulations: comparison with observed data

In order to compare the observed data with the model results, it must be taken into account that the measurements are taken at discrete locations, while the calculated values are representative of a horizontal grid cell of 1800 x 1800 m, and that the lowest sigma level is located 48 m above the ground. Therefore, the meteorological values produced by
the RAMS model at that level need to be adjusted to the height of the observations, that is 10 m, by using standard boundary layer profile formulas [20].
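For example, under the common assumption of a neutral logarithmic profile u(z) = (u*/k) ln(z/z0) (just one of the standard formulas alluded to above, and the roughness length z0 is an assumed parameter, not a value from the paper), the adjustment reduces to a simple scaling:

```ruby
# Scale a wind speed from the lowest model level (48 m) down to the 10 m
# observation height, assuming a neutral logarithmic wind profile:
# u10 = u48 * ln(10/z0) / ln(48/z0), with z0 the roughness length in metres.
def wind_at_10m(u48, z0 = 0.1)
  u48 * Math.log(10.0 / z0) / Math.log(48.0 / z0)
end
```

With z0 = 0.1 m, a 5 m/s model wind at 48 m maps to roughly 3.7 m/s at 10 m; stable or unstable conditions would require the corresponding stability-corrected profiles instead.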
In addition, comparisons between observations and model predictions are complicated by the fact that observations are point measurements while model predictions are Reynolds-averaged mean state variables. For the comparisons we used meteorological data from station A, and ozone concentration data measured at station B (see Figure 5). Table 2 summarises the station locations and their main characteristics.

Figure 5. Salento Peninsula: A and B indicate the meteorological and the environmental monitoring station, respectively.

Table 2. Main characteristics of the measurement stations. ws is the wind speed, wd the wind direction and T the temperature.

Station                                               Measured quantities       Measurement heights (m)
A: meteorological station (40°20', 18°06')            ws, wd, T                 10
B: environmental monitoring network (40°23', 18°01')  O3 hourly concentration   10
Figure 6 (a, b and c) shows the time variation of the modelled and observed meteorological parameters at station A for the period 4-7 August 2002. The triangles represent the measured data, while the continuous and the dash-dot lines indicate the prognostic model RAMS and the boundary layer model CALMET, respectively. It is evident that both models reproduce the diurnal evolution of the three meteorological parameters, characterised by maxima during the day and minima at night. Figure 7 shows the comparison between the predicted and observed ozone ground concentration at a measurement station situated in the middle of the modelling domain. It is evident that the model reproduces the data realistically, with the typical diurnal cycle characterised by maxima during the day and minima at night.
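The agreement reported here is qualitative. A quantitative check of the kind routinely used in air-quality model evaluation could be sketched as follows; the hourly values below are invented for illustration and are not the paper's data:

```python
import math

def bias_rmse(model, obs):
    """Mean bias and root-mean-square error between paired hourly
    modelled and observed values (two sequences of equal length)."""
    n = len(obs)
    bias = sum(m - o for m, o in zip(model, obs)) / n
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / n)
    return bias, rmse

model = [60.0, 80.0, 110.0, 90.0]   # illustrative hourly O3, ug/m3
obs   = [55.0, 85.0, 100.0, 95.0]
print(bias_rmse(model, obs))  # small positive bias, moderate RMSE
```

A positive bias indicates systematic over-prediction; the RMSE aggregates both systematic and random error over the diurnal cycle.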
Figure 6 (a, b and c). Evolution of modelled and observed wind speed, wind direction and temperature at station A: the solid line represents the RAMS simulation, the dash-dot line the CALMET simulation, while triangles are the observed data
Figure 7. Comparison between modelled and measured O3 ground concentration at station B (40°23’, 18°01’).

5.2 Performance considerations

Repeating the experiment several times and comparing the execution time of the Modeling system on a single machine with that on our small Grid environment, we obtained a reduction of the execution time of about 50%.

6. Conclusions and Future work

An integrated G-AQFS for studying the transport, diffusion and reaction of air pollutants has been presented. The system includes two meteorological models, emission pre-processors and two dispersion models for inert and reactive pollutants. The integrated grid computing system (G-AQFS) allows the investigation of meteorological and chemical effects on the formation of air pollution. The computational workflow is managed by means of user-defined DAGs. The computational Grid resources are scheduled with a round-robin mechanism querying the Globus MDS: a simple metric is used to select the best computational Grid resource at time Ti. As a test case, the modeling system has been applied on a small Grid environment over the Salento Peninsula during a summer period, to study photochemical pollution. The comparison between predicted and measured ground-level ozone concentrations indicates that the system can simulate the ozone evolution realistically.

We plan to test G-AQFS on a real Grid infrastructure. We are going to investigate further scheduling techniques and to compare the various possible metrics based on additional Globus MDS attributes. Furthermore, our work can be extended in many ways, for instance by varying the scheduling algorithm or by adding other models.

Acknowledgements

Many thanks to the “Osservatorio dell’inquinamento dell’atmosfera e dello spazio circumterrestre di Campi Salentina (Lecce-Italy)” for supplying the environmental data, to Dr. Dario Conte and Mr. Cosimo Elefante for their computational support, and to Mr. Giovanni Lella and Mr. Gennaro Rispoli for their technical support.

References

1. Fisher, B. (2002). Meteorological factors influencing the occurrence of air pollution episodes involving chimney plumes. Meteor. Appl., 9: 199-210.
2. Melas, D., Lavagnini, A., Sempreviva, A.M. (2000). An investigation of the boundary layer dynamics of Sardinia Island under sea-breeze conditions. J. Appl. Meteor., 39: 516-524.
3. Foster, I., Kesselman, C. (1998). The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann.
4. Foster, I., Kesselman, C. (1997). Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, 11(2): 115-128.
5. Pielke, R.A., Cotton, W.R., Walko, R.L., Tremback, C.J., Lyons, W.A., Grasso, L.D., Nicholls, M.E., Moran, M.D., Wesley, D.A., Lee, T.J., Copeland, J.H. (1992). A comprehensive meteorological modeling system - RAMS. Meteorol. Atmos. Phys., 49: 69-91.
6. Scire, J.S., Insley, E.M., Yamartino, R. (1990). Model formulation and user’s guide for the CALMET meteorological model. California Air Resources Board.
7. Scire, J.S., Stimatis, D.G., Yamartino, R. (1990). Model formulation and user’s guide for the CALPUFF dispersion model. California Air Resources Board.
8. Yamartino, R.J., Scire, J.S., Hanna, S.R., Carmichael, G.R., Chang, Y.S. (1989). CALGRID: A mesoscale photochemical grid model. Volume I: Model formulation document. Sigma Research Report No. A6-215-74, PTSD, California Air Resources Board, Sacramento, CA.
9. Schipa, I., Tanzarella, A., Cesari, R., Mangia, C., Marra, G.P., Martano, P., Miglietta, M., Rizza, U. (2003). A modeling system for the transport and dispersion in presence of complex meteorological circulations. ECAM2003, September 2003, Roma, Italia.
10. Rizza, U., Mangia, C., Giostra, U., Martano, P., Gabucci, M.F., Di Rocco, D., Miglietta, M., Schipa, I., Marra, G.P. (2001). A modeling system for air quality estimates in coastal areas. In: Risk Analysis II, C.A. Brebbia (Ed.), WIT Press, ISBN 1-85312-830-9, 552-513.
11. On-line Scientific Workflows Survey. URL: http://www.extreme.indiana.edu/swf-survey/
12. Basney, J., Livny, M. (1999). Deploying a high throughput computing cluster. In: High Performance Cluster Computing, Vol. 1. Prentice Hall PTR.
13. Monitoring Discovery Service. URL: http://www.globus.org/mds/
14. Aloisio, G., Blasi, E., Cafaro, M., Epicoco, I. (2001). The GRB library: Grid computing with Globus in C. Proceedings HPCN Europe 2001, Amsterdam, Netherlands, Lecture Notes in Computer Science, N. 2110, Springer-Verlag, 133-140.
15. Aloisio, G., Cafaro, M., Epicoco, I. (2002). Early experiences with the GridFTP protocol using the GRB-GSIFTP library. Future Generation Computer Systems, 18(8): 1053-1059, Special Issue on Grid Computing: Towards a New Computing Infrastructure, North-Holland.
16. GridFTP Protocol. URL: http://www-fp.mcs.anl.gov/dsl/GridFTPProtocol-RFC-Draft.pdf
17. Singh, M.P., Vouk, M. Scientific workflows: scientific computing meets transactional workflows. URL: http://lsdis.cs.uga.edu/activities/NSF-workflow/singh.html
18. Tuecke, S. (2001). Grid Security Infrastructure (GSI) Roadmap. Internet Draft, URL: www.gridforum.org/security/ggf1_200103/drafts/draft-ggf-gsi-roadmap-02.pdf
19. Information for the Globus Toolkit Development Community. URL: http://www.globus.org/developer/
20. Stull, R.B. (1988). An Introduction to Boundary Layer Meteorology. Kluwer Academic Publishers.
21. Matsumoto, Y. Ruby: an object-oriented scripting language. URL: http://www.ruby-lang.org/en/