Parallel and GRID Implementation of a Large Scale Air Pollution Model

Tzvetan Ostromsky¹ and Zahari Zlatev²

¹ Institute for Parallel Processing, Bulgarian Academy of Sciences, Acad. G. Bonchev str., bl. 25-A, 1113 Sofia, Bulgaria
[email protected], http://parallel.bas.bg/ceco/

² National Environmental Research Institute, Department of Atmospheric Environment, Frederiksborgvej 399, P.O. Box 358, DK-4000 Roskilde, Denmark
[email protected], http://www.dmu.dk/AtmosphericEnvironment
Abstract. Large-scale environmental models are powerful tools, designed to meet the increasing demand in various environmental studies. The atmosphere is the most dynamic component of the environment, where the pollutants and other chemical species actively interact with each other and can quickly be transported over very long distances. Therefore advanced modelling is usually done in a large computational domain. Moreover, all relevant physical, chemical and photochemical processes should be taken into account, and these depend heavily on the meteorological conditions. All this makes air pollution modelling a huge and rather difficult computational task, requiring a large amount of computational power. The most powerful supercomputers have been used for the development and test runs of such a model, the Danish Eulerian Model (DEM). Distributed parallel computing via MPI is one of the most efficient techniques for achieving good performance and getting results in real time. The quickly advancing GRID computing technology is another powerful tool that can be used to reach a higher level of performance of such a huge model. Both techniques and their inherent problems are discussed in this paper. Results of numerical experiments are presented and analysed, and some conclusions are drawn, based on the experiments.
1 Introduction
The problem of air pollution modelling has been studied for years [8,9,15]. An air pollution model is generally described by a system of partial differential equations for calculating the concentrations of a number of chemical species (pollutants and other components of the air that interact with the pollutants) in a large 3-D domain (the part of the atmosphere above the studied geographical region). The main physical and chemical processes (horizontal and vertical wind, diffusion, chemical reactions, emissions and deposition) should be adequately represented in the system.
The Danish Eulerian Model (DEM) [1,10,14,15] is mathematically represented by the following system of partial differential equations:

\[
\frac{\partial c_s}{\partial t} =
- \frac{\partial (u c_s)}{\partial x}
- \frac{\partial (v c_s)}{\partial y}
- \frac{\partial (w c_s)}{\partial z}
+ \frac{\partial}{\partial x}\left( K_x \frac{\partial c_s}{\partial x} \right)
+ \frac{\partial}{\partial y}\left( K_y \frac{\partial c_s}{\partial y} \right)
+ \frac{\partial}{\partial z}\left( K_z \frac{\partial c_s}{\partial z} \right)
+ E_s + Q_s(c_1, c_2, \ldots, c_q) - (k_{1s} + k_{2s}) c_s ,
\qquad s = 1, 2, \ldots, q ,
\tag{1}
\]

where
– $c_s$ – the concentrations of the chemical species;
– $u, v, w$ – the wind components along the coordinate axes;
– $K_x, K_y, K_z$ – diffusion coefficients;
– $E_s$ – the emissions;
– $k_{1s}, k_{2s}$ – dry / wet deposition coefficients;
– $Q_s(c_1, c_2, \ldots, c_q)$ – non-linear functions describing the chemical reactions between the species under consideration [4].
2 Splitting into Submodels
The above rather complex system (1) is split into three subsystems (submodels), according to the major physical and chemical processes as well as the numerical methods applied in their solution. These are the horizontal advection and diffusion submodel (2), the chemistry, emissions and deposition submodel (3), and the vertical exchange submodel (4):

\[
\frac{\partial c_s^{(1)}}{\partial t} =
- \frac{\partial (u c_s^{(1)})}{\partial x}
- \frac{\partial (v c_s^{(1)})}{\partial y}
+ \frac{\partial}{\partial x}\left( K_x \frac{\partial c_s^{(1)}}{\partial x} \right)
+ \frac{\partial}{\partial y}\left( K_y \frac{\partial c_s^{(1)}}{\partial y} \right)
\tag{2}
\]

\[
\frac{\partial c_s^{(2)}}{\partial t} =
E_s + Q_s(c_1^{(2)}, c_2^{(2)}, \ldots, c_q^{(2)}) - (k_{1s} + k_{2s}) c_s^{(2)}
\tag{3}
\]

\[
\frac{\partial c_s^{(3)}}{\partial t} =
- \frac{\partial (w c_s^{(3)})}{\partial z}
+ \frac{\partial}{\partial z}\left( K_z \frac{\partial c_s^{(3)}}{\partial z} \right)
\tag{4}
\]
The discretization of the spatial derivatives in the right-hand sides of the submodels (2)–(4) results in three systems of ordinary differential equations. More details about the numerical methods used in the submodels can be found in [1,6,7,15].
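One common way to advance such a split system over a time step $\Delta t$ (a schematic sketch only; the exact order and coupling of the stages in DEM are described in [1,15]) is to apply the discrete solution operators of the submodels sequentially, each stage starting from the result of the previous one:

\[
c(t + \Delta t) \approx \mathcal{A}_3(\Delta t)\, \mathcal{A}_2(\Delta t)\, \mathcal{A}_1(\Delta t)\, c(t),
\]

where $\mathcal{A}_1$, $\mathcal{A}_2$ and $\mathcal{A}_3$ denote the numerical solution operators of the submodels (2), (3) and (4), respectively.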
3 Parallelization Strategy
The MPI standard library routines are used to parallelize this model. The MPI (Message Passing Interface, [5]) was initially developed as a standard communication library for distributed memory computers. Later, proving to be efficient,
portable and easy to use, it became one of the most popular parallelization tools in application programming. Nowadays it can be used on a much wider class of parallel systems, including shared-memory computers and clustered systems (each node of the cluster being a separate shared-memory machine), and thus provides a high level of portability of the code.

Our MPI parallelization is based on space-domain partitioning [12,13]. The space domain is divided into several sub-domains (the number of sub-domains being equal to the number of MPI tasks), and each MPI task works on its own sub-domain. At each time step there is no data dependency between the MPI tasks in either the chemistry or the vertical exchange stage. This is not so in the advection-diffusion stage: the spatial grid partitioning between the MPI tasks requires overlapping of the inner boundaries and exchange of certain boundary values between the neighbouring subgrids for proper treatment of the boundary conditions.

The subdomains are usually too large to fit into the fast cache memory of the target processor. In order to achieve good data locality, the smaller (low-level) tasks are grouped in chunks, where appropriate, for more efficient cache utilization. A parameter CHUNKSIZE is provided in the code, which should be tuned with respect to the cache size of the target machine. A more detailed description of the main computational stages and of the parallelization strategy can be found in [1,10,12,13,15].
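The following short C/MPI fragment is a minimal sketch of this strategy (illustrative only; it is not taken from the DEM source code). It shows a simple one-dimensional strip partitioning of a 96 x 96 horizontal grid, the exchange of the overlapping inner-boundary rows before the advection-diffusion stage, and a purely local chemistry stage processed in chunks of CHUNKSIZE grid points. The grid size and the value CHUNKSIZE = 32 are taken from the experiments below; the strip decomposition, the assumption that the rows divide evenly among the tasks, and all variable names are simplifications introduced here.

/* Minimal sketch of the space-domain partitioning described above
 * (illustrative only, not part of the DEM code).  The x-direction of a
 * 2-D grid is split into strips, one strip per MPI task; the advection-
 * diffusion stage needs the neighbouring boundary rows (halo exchange),
 * while the chemistry stage is purely local and is processed in
 * cache-sized chunks.                                                  */
#include <mpi.h>
#include <stdlib.h>

#define NX 96          /* global grid size, as in the 96 x 96 test runs */
#define NY 96
#define CHUNKSIZE 32   /* cache-tuning parameter, cf. Table 1           */

int main(int argc, char **argv)
{
    int rank, nproc;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    /* Each task owns NX/nproc rows plus one halo row on each side
       (for simplicity NX is assumed to be divisible by nproc).        */
    int local_nx = NX / nproc;
    double *c = calloc((size_t)(local_nx + 2) * NY, sizeof(double));

    int up   = (rank > 0)         ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nproc - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int step = 0; step < 10; step++) {          /* placeholder time loop */
        /* Advection-diffusion stage: exchange the inner-boundary rows first. */
        MPI_Sendrecv(&c[1 * NY],              NY, MPI_DOUBLE, up,   0,
                     &c[(local_nx + 1) * NY], NY, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&c[local_nx * NY],       NY, MPI_DOUBLE, down, 1,
                     &c[0 * NY],              NY, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* ... advection-diffusion update of the local strip goes here ... */

        /* Chemistry stage: no communication, chunked for cache reuse. */
        int npoints = local_nx * NY;
        for (int first = 0; first < npoints; first += CHUNKSIZE) {
            int last = (first + CHUNKSIZE < npoints) ? first + CHUNKSIZE : npoints;
            for (int i = first; i < last; i++) {
                /* ... chemistry, emission and deposition terms for point i ... */
                c[(i / NY + 1) * NY + (i % NY)] += 0.0;   /* placeholder */
            }
        }
    }

    free(c);
    MPI_Finalize();
    return 0;
}

In the real code a two-dimensional partitioning and non-blocking communication may be preferable; the sketch only illustrates where communication is needed and where it is not.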
4 Performance and Scalability of the Parallel Code
Results of the parallel execution of the 2-D MPI version of DEM for a one-month period on the SUN HPC system at DTU are presented in Table 1. The target system, a SunFire E25k, consists of 72 UltraSPARC-IV dual-core CPUs (1350 MHz), i.e. 144 CPUs in total. This is the largest SMP server available for scientific computing in Denmark. The MPI parallel code scales very well, as can be seen from the speed-ups in Table 1.

Table 1. Results of parallel execution of the 2-D version of DEM for one month on a cluster of SunFire E25k computers at DTU. The time spent waiting in the queue is given in the second column. The total user time and the times of the main computational stages, in seconds, together with the corresponding speed-ups (given next in brackets), are shown in the last three columns.

The 2-D DEM on a SunFire E25k machine, CHUNKSIZE=32

PE's  Wait time  Run time  User time           Advection           Chemistry
       [sec.]     [sec.]   [sec.] (speed-up)   [sec.] (speed-up)   [sec.] (speed-up)
  1       13       1800    1798                 307                 1374
  2        3        904     902  (1.99)         155  (1.98)          702  (1.96)
  4      158        456     454  (3.96)          78  (3.94)          346  (3.97)
  8     9740        249     247  (7.28)          41  (7.49)          178  (7.72)
 12     9634        182     181  (9.93)          31  (9.90)          120 (11.45)
 16     9451        161     152 (11.82)          24 (12.79)           91 (15.10)
 24     9184        116     107 (16.80)          17 (18.06)           60 (22.90)
 32    10943         93      87 (20.67)          14 (21.93)           46 (29.87)
Although the system is not heavily loaded, it has many users, and requesting more processors can cause the job to be queued for several hours. This waiting time is given in the second column of the table.
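The speed-ups quoted in parentheses in Table 1 are consistent with the usual definition, i.e. the ratio between the time on one processor and the time on $p$ processors for the corresponding column:

\[
S_p = \frac{T_1}{T_p}, \qquad \text{e.g.} \quad S_8 = \frac{1798}{247} \approx 7.28
\]

for the user time on 8 PEs.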
5 Running DEM on Various GRID Sites
As one can see from the above results, DEM can be run efficiently on various supercomputers. Moreover, the MPI code scales rather well on various parallel machines with up to 32 PEs. If the job is not run on a dedicated queue, however, one should take into account the time which the job spends waiting in a queue. For relatively short jobs (as, for example, the 2-D DEM for a one-month period) this time can be much longer than the execution time of the job. Moreover, for the parallel queues requiring more than 8 PEs the waiting time increases quickly with the number of required processors, so the time saved due to the speed-up is entirely "eaten" by the time spent waiting in the queue.

On the other hand, there is a vast amount of low-cost computer resources, distributed all over the world, with free computational power. Some of them are even faster than a single processor of the SunFire supercomputer from the previous section, as seen from Table 2. Thanks to the novel GRID technology (based on the power of the Internet), these scattered free resources can be used as a powerful computing system. If a similar job is submitted to various GRID sites, it has a good chance of being executed with almost no delay in the queue. The execution time, however, will vary with the speed of the particular machine.

The results of such an experiment are given in Table 2. The sequential 2-D version of DEM was submitted, through an appropriate queue (resource), to all sites open to the Earth Science Research group (ESR) of the EGEE GRID project. Almost half of them (14 out of 31) started running within 5 minutes of submission, 4 within 1 hour of submission, and 13 more than an hour after submission; 6 of the sites aborted or failed to execute the job, while the remaining 25 runs were successful.

Comparing the results in the two tables, one can see that most of the GRID sites finished the job in a shorter time than the parallel supercomputer on 8 or more PEs, if the queueing time is also taken into account, in spite of the much shorter running times on the supercomputer. This happens because the parallel supercomputers are often busy and the job has to wait in a long queue, especially if a large number of processors is required.

The possibility of running parallel jobs (MPI jobs in particular) on some of the GRID sites has not been used yet. This is a task for the near future. By using this possibility we can expect to decrease significantly the run time of the job (depending on the degree of parallelism). The waiting time, however, will probably increase due to the larger resource requirements, which in general need more time to be satisfied. This is a common rule, valid for any multiprocessor system.
Table 2. Results from running the 2-D version of DEM with a 96 x 96 grid for 1 month on the available GRID sites. The address of each site is given in the first column, the waiting time (while the job is in state "scheduled") in the second column, and the execution (wall clock) time in the third column. The same test problem, with the same parameters as in the previous section, is used in this experiment. The performance of most sites (20 out of 31) appears to be similar to or better than that of a single PE of the SunFire E25k supercomputer used in the previous experiment.

GRID site [web address]        Wait time  Run time  User time  Advection  Chemistry
                                 [sec.]    [sec.]    [sec.]     [sec.]     [sec.]
atlasce01.na.infn.it              41652       937       935        186        718
cclcgceli01.in2p3.fr                725      5706      2815        916       1750
ce.epcc.ed.ac.uk                  78939   aborted
ce.grid.tuke.sk                      60      1763      1751        279       1408
ce.hep.ntua.gr                      136      1741      1736        256       1426
ce.phy.bg.ac.yu                   44569      2169      2163        515       1530
ce.ui.savba.sk                       51      1509      1506        221       1238
ce001.grid.bas.bg                 37111      1701      1697        257       1386
ce01.ariagni.hellasgrid.gr           68      1461      1460        226       1185
ce01.isabella.grnet.gr              152      2161      2155        421       1629
ce01.kallisto.hellasgrid.gr       21520      1466      1465        226       1189
ce01.marie.hellasgrid.gr             59      1451      1448        224       1177
ce02.marie.hellasgrid.gr          28279      1518      1516        239       1225
ce1.egee.fr.cgg.com               11088    failed
ce2.egee.unile.it                   253    failed
grid012.ct.infn.it                  136      3569      3001        663       2189
grid10.lal.in2p3.fr               15343      1165      1161        223        900
grid8.wdcb.ru                      2369   aborted
gridba2.ba.infn.it                53571      1977      1972        427       1440
gridgate.cs.tcd.ie                  139      1947      1761        338       1332
griditce01.na.infn.it              1434      1952      1944        369       1477
helmsley.dur.scotgrid.ac.uk       50268      2396      2394        457       1813
hudson.datagrid.jussieu.fr          188      2641      2608        510       1958
lcgce01.gridpp.rl.ac.uk             216      1637      1613        325       1204
mu6.matrix.sara.nl                  153      1526      1522        296       1148
polgrid1.in2p3.fr                 34031      1546      1544        232       1263
prod-ce-01.pd.infn.it               665      1596      1591        240       1301
scaicl0.scai.fraunhofer.de        77881   aborted
spaci01.na.infn.it                  224    failed
tbn20.nikhef.nl                      66      1519      1518        225       1245
testbed001.grid.ici.ro            41631      2091      2083        349       1674
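As a concrete illustration of the comparison made above, consider the total turnaround time (waiting plus running) taken from the two tables:

\[
\underbrace{9740 + 249}_{\text{SunFire E25k, 8 PEs (Table 1)}} = 9989 \ \mathrm{s}
\qquad \text{versus} \qquad
\underbrace{51 + 1509}_{\text{ce.ui.savba.sk (Table 2)}} = 1560 \ \mathrm{s} .
\]

In this example the single-processor GRID run delivers the result more than six times sooner, although its pure execution time is about six times longer than that of the 8-PE run on the supercomputer.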
6 Applications
DEM has many applications in various environmental studies: forestry and wildlife protection, human health preservation, agricultural economics, studies of global climate change, etc. Some of them are illustrated by the plots below, based on some of the output results of the model.
The levels of AOT40 for crops and for forests, respectively, are given in the first two plots for the year 2004. These special characteristics are used to evaluate the negative effect of high ozone concentrations on the vegetation of plants (see [2,3,11,16,17] for more details). The next two plots are related to the effect of high ozone levels on human health.
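For reference, AOT40 (Accumulated Ozone exposure over a Threshold of 40 ppb) is commonly computed from the hourly mean ozone concentrations during daylight hours as

\[
\mathrm{AOT40} = \sum_{\text{daylight hours}} \max\big( [\mathrm{O_3}] - 40\ \mathrm{ppb},\ 0 \big),
\]

accumulated over the relevant growing season (typically about three months for crops and six months for forests in European studies); the exact conventions used with DEM can be found in [2,3,11,16,17].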
7 Conclusions and Plans for Future Work
– The Danish Eulerian Model is a complicated large-scale air pollution model. Its numerical treatment requires significant computational power, provided either by a high-performance supercomputer or the emerging GRID technology.
– By splitting the original PDE system and applying special parallelization techniques to each of the submodels we have created an efficient, scalable and highly portable parallel implementation of DEM. Important results from various application areas can be obtained within a reasonable time.

– The powerful GRID technology allows us to achieve similar or better time results on distributed low-cost resources of the GRID, at the price of somewhat lower reliability. This is in comparison with relatively busy supercomputers and ordinary (not high-priority) queues.

– The portability of the parallel code, achieved by using only the MPI standard library, is also essential for a parallel GRID implementation. This is one of our tasks for future work. Our preliminary expectation is that the run time will decrease significantly while the time spent waiting for free PEs will increase, with the best overall results for a moderate degree of parallelism (4 or 8 PEs). If the fastest result is the target, the optimal trade-off would have to be determined by experiments.
Acknowledgments

This research was supported in part by the Bulgarian IST Centre of Competence in 21st Century – BIS-21++ (contract # INCO-CT-2005-016639) and by the NATO grant NATO ARW "Impact of future climate changes on pollution levels in Europe". A grant from the Danish Natural Sciences Research Council gave us access to all Danish supercomputers.
References

1. V. Alexandrov, A. Sameh, Y. Siddique and Z. Zlatev, Numerical integration of chemical ODE problems arising in air pollution models, Env. Modeling and Assessment, 2 (1997), pp. 365–377.
2. A. Bastrup-Birk, J. Brandt, I. Uria and Z. Zlatev, Studying cumulative ozone exposures in Europe during a 7-year period, Journal of Geophysical Research, 102 (1997), pp. 23917–23935.
3. I. Dimov, Tz. Ostromsky, I. Tzvetanov, Z. Zlatev, Economical estimation of the losses of crops due to high ozone levels, Large-Scale Scientific Computations of Engineering and Environmental Problems II (M. Griebel, S. Margenov, P. Yalamov, eds.), NNFM, Vol. 73, Vieweg (2000), pp. 275–282.
4. M. W. Gery, G. Z. Whitten, J. P. Killus and M. C. Dodge, A photochemical kinetics mechanism for urban and regional modeling, J. Geophys. Res. 94 (1989), pp. 12925–12956.
5. W. Gropp, E. Lusk and A. Skjellum, Using MPI: Portable programming with the message passing interface, MIT Press (1994), Cambridge, Massachusetts.
6. E. Hesstvedt, Ø. Hov and I. A. Isaksen, Quasi-steady-state approximations in air pollution modeling: comparison of two numerical schemes for oxidant prediction, Int. Journal of Chemical Kinetics 10 (1978), pp. 971–994.
7. Ø. Hov, Z. Zlatev, R. Berkowicz, A. Eliassen and L. P. Prahm, Comparison of numerical techniques for use in air pollution models with non-linear chemical reactions, Atmospheric Environment 23 (1988), pp. 967–983.
8. G. I. Marchuk, Mathematical modeling for the problem of the environment, Studies in Mathematics and Applications, No. 16 (1985), North-Holland, Amsterdam.
9. G. J. McRae, W. R. Goodin and J. H. Seinfeld, Numerical solution of the atmospheric diffusion equations for chemically reacting flows, J. Comp. Physics 45 (1984), pp. 1–42.
10. Tz. Ostromsky, W. Owczarz, Z. Zlatev, Computational challenges in large-scale air pollution modelling, Proc. 2001 International Conference on Supercomputing in Sorrento, ACM Press (2001), pp. 407–418.
11. Tz. Ostromsky, I. Dimov, I. Tzvetanov, Z. Zlatev, Estimation of the wheat losses caused by the tropospheric ozone in Bulgaria and Denmark, Numerical Analysis and Its Applications (L. Vulkov, J. Wasniewski, P. Yalamov, eds.), LNCS-1988, Springer (2001), pp. 636–643.
12. Tz. Ostromsky, Z. Zlatev, Parallel implementation of a large-scale 3-D air pollution model, Large Scale Scientific Computing (S. Margenov, J. Wasniewski, P. Yalamov, eds.), LNCS-2179, Springer (2001), pp. 309–316.
13. Tz. Ostromsky, Z. Zlatev, Flexible two-level parallel implementations of a large air pollution model, Numerical Methods and Applications (I. Dimov, I. Lirkov, S. Margenov, Z. Zlatev, eds.), LNCS-2542, Springer (2002), pp. 545–554.
14. WEB-site of the Danish Eulerian Model, available at: http://www.dmu.dk/AtmosphericEnvironment/DEM
15. Z. Zlatev, Computer treatment of large air pollution models, Kluwer (1995).
16. Z. Zlatev, I. Dimov, Tz. Ostromsky, G. Geernaert, I. Tzvetanov, A. Bastrup-Birk, Calculating losses of crops in Denmark caused by high ozone levels, Environmental Modelling and Assessment, Vol. 6, Kluwer (2001), pp. 35–55.
17. Z. Zlatev, G. Geernaert and H. Skov, A study of ozone critical levels in Denmark, EUROSAP Newsletter 36 (1999), pp. 1–9.