Distributed Algorithm for Two-Dimensional Global Climate Modeling

T. L. Huntsberger
Department of Computer Science
University of South Carolina
Columbia, SC 29208
[email protected]

Abstract
The importance of global climate modeling for long-term studies of global warming and ozone depletion is the focus of the recent DOE CHAMMP program [7]. We have developed a model and distributed algorithm for 2-D climate modeling which includes ocean intermediate/deep layer dynamics, coupled atmosphere/biosphere hydrology and velocity interactions, and variable lapse rates. This algorithm allows the testing of new sub-models without the computational overhead and possible poor spatial resolution entailed in a full 3-D Global Climate Model (GCM) study. The results of some performance and scaling experiments with the 2-D climate model on the Intel Paragon at the University of South Carolina using ParaSoft Express are also presented.
1 Introduction
Global climate models can be broadly categorized based on the dimensionality of the analysis. The lowest order Energy Balance Models (EBMs) are zero and one dimensional, with recent extensions to two dimensions [17, 19]. These types of models predict mean values of most atmospheric variables quite well, depending on the assumptions made about energy input, atmospheric aerosol concentrations, etc. The next level up the hierarchy in sophistication consists of the Radiative Convective Models (RCMs) [8] and Statistical Dynamical Models (SDMs) [15], which are two dimensional. Integration is usually in the vertical direction, where atmospheric dynamics are accurately modeled. At the final level are the three dimensional Global Circulation Models (GCMs) [1, 13, 27]. Early studies into climate modeling using SIMD platforms concentrated on the porting of the algorithms, with very little scaling and performance information being gathered. Among these investigations are the spectral meteorological model on the ICL DAP by Carver [6], the barotropic quasigeostrophic ocean circulation model on the MPP by Grosch and
Fatoohi [9], and the barotropic vorticity model on the MPP by Suarez and Abeles [22]. We have developed a model and distributed algorithm for 2-D climate modeling which includes ocean intermediate/deep layer dynamics, coupled atmosphere/biosphere hydrology and velocity interactions, and variable lapse rates. This algorithm facilitates the testing of new sub-models without the computational overhead and possible poor spatial resolution entailed in a full 3-D Global Climate Model (GCM) study. The next section reviews previous work in parallel implementations of GCMs. This is followed by a description of our two dimensional model. Finally, the results of some experiments run on the Intel Paragon at the University of South Carolina are given.
2 3-D Global Climate Models
The dominant methods for solution of the classes of PDEs used in climate models are finite difference and spectral transform. A recent study by Jakob and Hack at NCAR on a 20-processor shared-memory MIMD Encore Multimax compared these two techniques for solution of the shallow water model equations, which are the basis for the NCAR Community Climate Model (CCM2) [13]. Their previous work indicated that the spectral transform methods showed promise for efficient solution on a single-processor CRAY X-MP/48 [5]. The results of their most recent investigation indicated a ramping effect in the performance of the spectral transform, with overhead costs sometimes as high as 30%. This greatly affected the scaling behavior of the algorithm. Performance of the finite difference version of the algorithm started to degrade after 15 processors. Most of the overhead was attributed to the bus bottleneck, which will not improve with the addition of more processors. Recently Worley, Walker and Drake at ORNL analyzed the shallow water problem on a distributed memory 128-node Intel iPSC/860 hypercube [24, 25, 28, 29]. They investigated scaling with problem size, as well as with the number of nodes. The TM notation is generally used for spatial problem decomposition of a grid of size I x J, where I = 2J and J = ⌈(3M + 1)/2⌉. For example, T60 corresponds to a 2° spatial resolution, while T169 corresponds to a 0.7° resolution. Efficiency very quickly fell to 32% on the T21 problem (CCM1 size) by 16 nodes. For the T42 problem (CCM2 target), the efficiency fell to 38% by 32 nodes. In the T169 problem study (ECMWF size) the efficiency exhibited a less drastic fall-off, degrading to a 72% level by 128 nodes. Most of the inefficiency can be traced to the sequential and message passing portions of the FFT steps in the spectral algorithm. The main limitation in the problem size study was the resident memory on each node. The T340 problem could only be run on the two smallest size decompositions.
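As a concrete illustration of this notation, the short Python sketch below converts a truncation number M into the grid dimensions implied by the relation given above. It is an illustrative aid only, not code from the studies cited here, and the 180/J degree spacing is an approximation assumed for the example.

```python
import math

def grid_from_truncation(M):
    """Grid size implied by a TM truncation, using J = ceil((3M + 1)/2) and I = 2J."""
    J = math.ceil((3 * M + 1) / 2)   # number of latitude rows
    I = 2 * J                        # number of longitude columns
    spacing = 180.0 / J              # approximate grid spacing in degrees (assumed here)
    return I, J, spacing

for M in (21, 42, 60, 169, 340):
    I, J, dx = grid_from_truncation(M)
    print(f"T{M}: {I} x {J} grid, ~{dx:.2f} degree resolution")
```

For T60 this gives a 182 x 91 grid (about 2°) and for T169 a 508 x 254 grid (about 0.7°), consistent with the resolutions quoted above.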
3 2-D Models
The 3-D UCLA Atmospheric General Circulation Model (AGCM) of Arakawa [1, 2] was recently ported into a 2-D latitude/longitude decomposed form by Wehner and associates [26]. Integration is carried out along a vertical column, which shows little opportunity for parallelization, at least for a small number of layers. They used the staggered mesh decomposition of Arakawa [3], with vorticity corrections at the poles and discrete Fourier filters to allow longer time steps. Macros were designed to handle the portability issues of dynamic memory allocation and message passing. Tests run with a 9-vertical-layer, 4° by 5° square decomposition on a BBN TC2000 indicated a scaling efficiency of about 42% with 100 processors. Scaling performance was about the same on a network of workstations, with an efficiency of about 56% on 16 nodes. Most of the loss of performance was in the Fourier filtering step, which involved global communication throughout the calculations.

The starting point for our work is the 2-D climate model proposed by Sellers [19]. This model used a 10° by 10° box grid and a vertical integration step to compute monthly values of sea level pressure, temperature, wind speed and direction, relative humidity, precipitation, evaporation, runoff, soil moisture content, ice and snow cover thickness, cloud cover and poleward energy transport. The temporal timestep used by Sellers was 15 days, which would be unsuitable for a finer spatial grid decomposition. Some of the assumptions made in the model led to unrealistic values of the predicted variables. Among these are:

1. The lack of a stratosphere, due to a cut-off at the tropopause pressure level during vertical integration and the use of a constant lapse rate (6.5 K km⁻¹), leads to unreasonably high values for the zonal wind at the top of the atmosphere. We have used the idealized vertical temperature profile of the U.S. Standard Atmosphere [23] in our model to address this problem. Vertical integration in our model is carried out over 30 levels, with 10 levels in the troposphere. With this many levels in the troposphere, we can model the effects of vegetation canopy cover on the lapse rate, which may differ significantly from the Standard
Atmosphere assumption in the planetary boundary layer (PBL).

2. The use of surface friction coefficients that depend only on the presence of land or water within a grid box leads to unrealistic wind speed responses. We have used the simple biosphere model (SiB) [20] to explicitly include the effects of vegetation canopy on these coefficients. The vegetation classification scheme of Matthews is used to determine local types [16]. Vegetation can introduce up to an order of magnitude variation in the surface friction coefficients, depending on the height of the canopy [4]. In addition, the evapotranspiration rates differ greatly for bare earth versus vegetated regions, leading to totally different precipitation behavior.

3. The use of parameterized pressure dependencies leads to a lack of meridional resolution. Pressure and pressure gradients are used in the wind speed equations:

    f v_0 - a_2 |u_0| u_0 - (R_d T_0 / p_0) ∂p_0/∂x = 0        (1)
    f u_0 + a_2 |v_0| v_0 + (R_d T_0 / p_0) ∂p_0/∂y = 0        (2)

where u_0 and v_0 are the horizontal velocity components, f is the Coriolis parameter, R_d is the gas constant for dry air (287.04 J kg⁻¹ K⁻¹), p_0 is the surface pressure, T_0 is the surface temperature, and a_2 is a surface friction coefficient. In the original model used by Sellers [19], the last term on the left-hand side of equation (2) was parameterized using zonal mean values and a latitude-dependent variable b as

    ∂p_0/∂y = (b / a_2) ∂T_0/∂y.
Our model uses initial values of T_0 and p_0 as seeds, so the partial derivatives in the original equations can be evaluated directly, with no need for the parameterization above.

The enhancements to the original model should address some of the points mentioned as shortcomings in the study done by Sellers [19]. This comes at the expense of much more computation, making a sequential algorithm unsuitable for problem sizes of interest. Despite the relatively large number of levels (30) used for the vertical integration, our model is still a 2-D one, since the vertical profiles are based on parameterized functions of the surface variables. Time updates in our model were performed using a two-step Lax-Wendroff scheme in order to minimize mesh drifting and eliminate the need for the expensive Fourier filtering step used by Wehner and associates [26].
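To make the time-update step concrete, the following is a minimal sketch of a two-step (Richtmyer-style) Lax-Wendroff update for a 1-D linear advection equation on a periodic grid. It illustrates the general scheme only; the grid size, time step, and advection speed are arbitrary assumptions, not values from the model described here.

```python
import numpy as np

def lax_wendroff_step(u, c, dt, dx):
    """One two-step Lax-Wendroff update for u_t + c u_x = 0 on a periodic 1-D grid."""
    # Predictor: provisional values at cell interfaces, half a time step ahead.
    u_right = np.roll(u, -1)
    u_half = 0.5 * (u + u_right) - (c * dt / (2.0 * dx)) * (u_right - u)
    # Corrector: full-step update using differences of the interface values.
    return u - (c * dt / dx) * (u_half - np.roll(u_half, 1))

# Example: advect a Gaussian bump once around a periodic domain.
nx, c = 200, 1.0
x = np.linspace(0.0, 1.0, nx, endpoint=False)
dx = x[1] - x[0]
dt = 0.5 * dx / c                      # CFL number of 0.5
u = np.exp(-200.0 * (x - 0.5) ** 2)
for _ in range(int(round(1.0 / (c * dt)))):
    u = lax_wendroff_step(u, c, dt, dx)
```

The two-step form avoids computing flux Jacobians explicitly, which is one reason it is convenient for prototyping different finite difference solutions on a distributed grid.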
For the ocean component of our model, we use the box model of Harvey and Schneider [11]. This model assumes a 3-level vertical decomposition of the ocean into mixed, intermediate, and bottom layers. The equations governing these three layers are:

    R_M dT_M/dt = Q_s + L↓ - L↑ - H - LE + (C_w / g)(V_E / τ_{M-I} + V̇)(T_I - T_M)    (3)
    R_I dT_I/dt = -(C_w V_E / (g τ_{M-I}))(T_I - T_M) + (C_w V̇ / g)(T_B - T_I)        (4)
    R_B dT_B/dt = (C_w V̇ / g)(T_M - T_B)                                               (5)

where Q_s is the solar radiation absorbed at the surface, L↓ is the downward emitted atmospheric longwave radiation, L↑ is the upward emitted surface longwave radiation, H and LE are the turbulent sensible and latent heat fluxes, C_w is the volumetric heat capacity of water, g is the surface area of water within a cell, V_E is the equivalent mixed layer volume, τ_{M-I} is the turn-over time of mixed layer water with intermediate layer water, V̇ is the thermohaline mass flux, and T_M, T_I, and T_B are the mixed, intermediate and bottom layer temperatures.
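The following Python sketch shows one way the three-box ocean component of equations (3)-(5) could be stepped in time. It uses a plain forward Euler update for brevity (the model itself uses the two-step Lax-Wendroff scheme described above), and every parameter value shown is a placeholder assumption chosen only to make the example self-contained.

```python
import numpy as np

def ocean_box_step(T, dt, forcing, R, Cw, g, VE, tau_MI, Vdot):
    """One forward Euler step of the three-box ocean model, equations (3)-(5).

    T = (T_M, T_I, T_B); forcing stands for Qs + L_down - L_up - H - LE.
    All parameter values passed in are illustrative assumptions.
    """
    T_M, T_I, T_B = T
    R_M, R_I, R_B = R
    dT_M = (forcing + (Cw / g) * (VE / tau_MI + Vdot) * (T_I - T_M)) / R_M
    dT_I = (-(Cw * VE) / (g * tau_MI) * (T_I - T_M)
            + (Cw * Vdot / g) * (T_B - T_I)) / R_I
    dT_B = ((Cw * Vdot / g) * (T_M - T_B)) / R_B
    return np.array([T_M + dt * dT_M, T_I + dt * dT_I, T_B + dt * dT_B])

# Example usage with arbitrary placeholder values (dt = 12 hours in seconds).
T = np.array([288.0, 278.0, 274.0])   # K: mixed, intermediate, bottom
for _ in range(100):
    T = ocean_box_step(T, dt=4.32e4, forcing=5.0,
                       R=(1.0e8, 5.0e8, 1.0e9), Cw=4.2e6,
                       g=1.0, VE=100.0, tau_MI=3.0e7, Vdot=1.0e-5)
```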
4 Experimental Studies
During the course of our experimental studies, we wanted to test the scaling properties of the algorithm both with problem size and with the number of processors. The algorithm was written using ParaSoft Express to maintain portability and code consistency across platforms [18]. The algorithm was developed under Network Express and then ported to the Paragon. The Express grid mapping routines [18] gave us enough flexibility to quickly prototype different decompositions and different-order finite difference solutions to the motion equations.
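The bookkeeping behind such a decomposition is straightforward. The sketch below, in plain Python rather than the Express grid mapping calls themselves, shows one way to compute the local block of an I x J grid owned by each processor in a P x Q layout, splitting any remainder rows and columns as evenly as possible.

```python
def block_extents(n, parts, rank):
    """Start index and length of block `rank` when n points are split into `parts` blocks."""
    base, extra = divmod(n, parts)
    start = rank * base + min(rank, extra)
    return start, base + (1 if rank < extra else 0)

def local_block(I, J, P, Q, p, q):
    """Column and row extents of the sub-grid owned by processor (p, q) in a P x Q layout."""
    i0, ni = block_extents(I, P, p)
    j0, nj = block_extents(J, Q, q)
    return (i0, ni), (j0, nj)

# Example: a T169-sized grid (508 x 254) distributed over a 4 x 8 processor layout.
for p in range(4):
    for q in range(8):
        print((p, q), local_block(508, 254, 4, 8, p, q))
```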
Our datasets were standardized with I columns by J rows, where I = 2J, in order to maintain a symmetrical grid cell size. The results of runs on the 56-node Intel Paragon at the University of South Carolina are presented in Table 1. Results are for a model time of one month.

    P×Q    T120     T169     T337      T682
    2×2    45.598   88.505   322.965   1111.904
    2×4    23.00    44.254   161.489   555.457
    4×4    11.508   22.348   80.752    277.780
    4×8    5.763    11.194   40.388    138.923

Table 1: Timing results for the model (all times in seconds)

The number of processors in each dimension is given by P and Q. We found very little variation in the average times for different row/column decompositions (i.e., 4×8 versus 2×16), so only a single entry is reported for each total number of processors. Efficiencies are equal to or better than 98% on all of the runs. A 100-year run at T169 resolution with a 4×8 spatial
decomposition on the Intel Paragon took 251 minutes, which indicates a scaling efficiency of about 89%. A 10-year run at T682 resolution with a 4×8 spatial decomposition on the Intel Paragon took 53.1 hours, with a scaling efficiency of 92%. The temporal stepsize for the T682 run was 1/2 hour. Scaling is not linear for the reasons given in the next paragraph.

Table 1 reports only the average times per node for each run. There is a load balancing problem in the algorithm during the solution of equations (1) and (2). The equations are quartic in either variable, with four different decompositions of the answers due to the absolute value in the friction term. We solve the system using the eigenvalue method and need to eliminate solutions that do not meet the correct sign criteria. As such, in areas of the globe where the system is unstable, such as the equator and the poles, the system may have to be solved up to four times.
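To illustrate this solution strategy, the sketch below works from the reconstructed balance equations (1)-(2): it eliminates v_0, forms the quartic in u_0 for each of the four sign branches of the friction terms, finds the roots as eigenvalues of the companion matrix (which is what numpy.roots does), and discards roots that violate the assumed signs. The names f, a2, Gx, and Gy (Coriolis parameter, friction coefficient, and the two pressure-gradient terms) are illustrative only; the actual model code may organize this differently.

```python
import numpy as np

def surface_wind(f, a2, Gx, Gy, tol=1e-9):
    """Solve f*v - a2*|u|*u - Gx = 0 and f*u + a2*|v|*v + Gy = 0 for (u, v).

    Gx and Gy stand for the pressure-gradient terms (R_d*T_0/p_0)*dp_0/dx and
    (R_d*T_0/p_0)*dp_0/dy.  Each of the four sign branches of |u| and |v|
    yields a quartic in u whose roots are found as companion-matrix eigenvalues.
    """
    solutions = []
    for su in (+1.0, -1.0):            # assumed sign of u
        for sv in (+1.0, -1.0):        # assumed sign of v
            # Substituting v = (a2*su*u**2 + Gx)/f into the second equation gives:
            # sv*a2**3*u**4 + 2*sv*su*a2**2*Gx*u**2 + f**3*u + (sv*a2*Gx**2 + f**2*Gy) = 0
            coeffs = [sv * a2**3, 0.0, 2.0 * sv * su * a2**2 * Gx,
                      f**3, sv * a2 * Gx**2 + f**2 * Gy]
            for u in np.roots(coeffs):
                if abs(u.imag) > tol:
                    continue            # keep real roots only
                u = u.real
                v = (a2 * su * u * u + Gx) / f
                if u * su >= -tol and v * sv >= -tol:   # signs must match the branch
                    solutions.append((u, v))
    return solutions

# Example with arbitrary mid-latitude magnitudes for the inputs.
print(surface_wind(f=1.0e-4, a2=1.0e-5, Gx=5.0e-4, Gy=-5.0e-4))
```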
We have derived a measure of the load balance, L_B, as

    L_B = 1 / [ (1/n) Σ_{i=1}^{n} (time_aver / time_i) ]

where time_aver is the average time for the run and time_i is the time taken by each of the n processors. This expression has a value of 1.0 for a perfectly balanced system.
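A two-line helper makes the measure concrete; this is simply the formula above, as reconstructed here, applied to a list of per-processor times.

```python
def load_balance(times):
    """Load balance measure L_B: equals 1.0 when all per-processor times are identical."""
    avg = sum(times) / len(times)
    return 1.0 / (sum(avg / t for t in times) / len(times))

print(load_balance([11.1, 11.2, 11.3, 12.0]))   # slightly imbalanced -> just under 1.0
```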
Load balance versus problem size and number of processors is shown for the Intel Paragon in Table 2, where the same notation for the runs is used as in Table 1.

    P×Q    T120    T169    T337    T682
    2×2    0.977   0.981   0.992   0.999
    2×4    0.974   0.978   0.991   0.999
    4×4    0.963   0.968   0.981   0.994
    4×8    0.958   0.964   0.978   0.994

Table 2: Load balance measure

The load imbalance was not severe up to and including 32 processors, but was steadily increasing. A possible solution to this problem would be an uneven dataset decomposition, which would reduce the number of grid points in unstable solution areas. Load imbalance decreased with problem size, indicating that the system was more stable at finer grid resolutions. Further studies with a higher number of processors will be needed to see if this trend continues. The load imbalance was still within a reasonable range for all of the problem sizes that were studied.

The algorithm produces the sea-level wind and ocean current fields prior to the vertical integration. For the sake of brevity, only the ocean current field is shown. In Figure 1, gray values represent the scaled speed, with black being the lowest negative value and white being the highest positive value. The major ocean currents are captured, most notably the Gulf Stream and the Equatorial Countercurrent. The model was validated using the mean annual surface temperatures at a grid resolution of 2.5°.
Figure 1: Ocean current field for January 1990.

The NASA GISS database was used for the monthly ground-truth information, and our model produced monthly values using a temporal stepsize of 12 hours. The most notable differences were along the western coastline of South America, where the temperatures were underestimated, and in the polar regions, where the temperatures were overestimated. Some of the discrepancies can be traced to our simple sea-ice model in the polar regions, and to the lack of an El Niño component in our model for the coastal region. The majority of differences were less than 1°C, with the largest difference being 2.8°C on the western South American coast.
5 Discussion
We have presented a distributed 2-D GCM which is based on an energy balance form of the coupled atmosphere/ocean system. This model includes variable lapse rates and vegetation canopy corrections to the vertical integration process. There are 30 layers in the vertical direction, with 10 layers in the troposphere. This level of detail allows better PBL and cloud analysis to be included. The algorithm showed excellent scaling behavior over both problem size and number of processors. The slight load imbalance that resulted from the system of equations for wind speed in unstable portions of the globe can be addressed using uneven decompositions. We are presently investigating the incorporation of more sophisticated PBL models, such as those of Sommeria [21] and Laval [14], better sea-ice models, such as that of Hibler [12], and better ocean circulation models, such as that of Han [10], into our algorithm.
Acknowledgements

Sincere thanks to Al Bessey for the 100 years of solitude. I would also like to thank the PICS group and Ken Sallenger at the University of South Carolina for extended access to the Intel Paragon for the 10-year T682 run.
References

[1] A. Arakawa and V. R. Lamb, "Computational design of the basic dynamical processes of the UCLA general circulation model," in Methods in Computational Physics, 17, (Ed. J. Chang), Academic Press, NY, pp. 173-265, 1977.
[2] A. Arakawa and M. Suarez, "Vertical differencing of the primitive equations in sigma coordinates," Mon. Wea. Rev., 111, pp. 34-45, 1983.
[3] A. Arakawa, "Finite-difference models in climate modeling," in Physically-Based Modelling and Simulation of Climate and Climatic Change I, (Ed. M. E. Schlesinger), Kluwer Academic, Dordrecht, pp. 79-168, 1988.
[4] S. P. Arya, Introduction to Micrometeorology, Academic Press, Inc., NY, 1988.
[5] G. L. Browning, J. J. Hack and P. N. Swarztrauber, "A comparison of three numerical methods for solving differential equations on the sphere," Mon. Wea. Rev., 117, pp. 1058-1075, 1989.
[6] G. Carver, "A spectral meteorological model on the ICL DAP," Parallel Comp., 8, pp. 121-126, 1988.
[7] U. S. Department of Energy, "Building an advanced climate model: Program plan for the CHAMMP climate modeling program," DOE Rep. DOE/ER-0479T, Washington, DC, Dec 1990.
[8] Y. Fouquart, "Radiative transfer in climate models," in Physically-Based Modelling and Simulation of Climate and Climatic Change I, (Ed. M. E. Schlesinger), Kluwer Academic, Dordrecht, pp. 223-284, 1988.
[9] C. E. Grosch and R. Fatoohi, "An implementation of a barotropic quasigeostrophic model of ocean circulation on the MPP," in Proc. First Sympos. Frontiers of Massively Parallel Scientific Computation, Goddard Space Flight Center, Greenbelt, MD, Sep 24-25, 1986, pp. 3-9.
[10] Y.-J. Han, "Modelling and simulation of the general circulation of the ocean," in Physically-Based Modelling and Simulation of Climate and Climatic Change I, (Ed. M. E. Schlesinger), Kluwer Academic, Dordrecht, pp. 465-508, 1988.
[11] L. D. D. Harvey and S. H. Schneider, "Transient climate response to external forcing on 10^0-10^4 year time scales, Parts 1 & 2," J. Geophys. Res., 90, pp. 2191-2222, 1985.
[12] W. D. Hibler, "Modelling sea ice thermodynamics and dynamics in climate studies," in Physically-Based Modelling and Simulation of Climate and Climatic Change I, (Ed. M. E. Schlesinger), Kluwer Academic, Dordrecht, pp. 509-563, 1988.
[13] R. Jakob and J. J. Hack, "Parallel MIMD programming for global models of atmospheric flow," in Proc. Supercomputing '89, Reno, NV, 1989, pp. 106-112.
[14] K. Laval, "Land surface processes," in Physically-Based Modelling and Simulation of Climate and Climatic Change, (Ed. M. E. Schlesinger), Kluwer Academic, Dordrecht, pp. 285-306, 1988.
[15] M. C. MacCracken and S. J. Ghan, "Design and use of zonally-averaged climate models," in Physically-Based Modelling and Simulation of Climate and Climatic Change II, (Ed. M. E. Schlesinger), Kluwer Academic, Dordrecht, pp. 755-809, 1988.
[16] E. Matthews, "Global vegetation and land use: New high-resolution data bases for climate studies," J. Clim. Appl. Meteor., 22, pp. 474-487, 1983.
[17] G. R. North, "Lessons from energy balance models," in Physically-Based Modelling and Simulation of Climate and Climatic Change II, (Ed. M. E. Schlesinger), Kluwer Academic, Dordrecht, pp. 627-651, 1988.
[18] Express Reference Guide, ParaSoft Corporation, Pasadena, CA, 1992.
[19] W. D. Sellers, "A two-dimensional global climatic model," Mon. Wea. Rev., 104, pp. 233-248, 1976.
[20] P. J. Sellers, Y. Mintz, Y. C. Sud and A. Dalcher, "A simple biosphere model (SiB) for use within general circulation models," J. Atmosph. Sci., 43, pp. 505-531, 1986.
[21] G. Sommeria, "Three-dimensional simulation of turbulent processes in an undisturbed trade wind boundary layer," J. Atmosph. Sci., 33, pp. 216-241, 1976.
[22] M. J. Suarez and J. Abeles, "Implementation of the barotropic vorticity equation on the MPP," in Proc. First Sympos. Frontiers of Massively Parallel Scientific Computation, Goddard Space Flight Center, Greenbelt, MD, Sep 24-25, 1986, pp. 161-164.
[23] U.S. Standard Atmosphere, NOAA, NASA, USAF, Washington, DC, 1976, 227 pp.
[24] D. W. Walker, P. H. Worley and J. B. Drake, "Parallelizing the spectral transform method, part II," Tech. Rep. ORNL/TM-11855, Oak Ridge National Laboratory, Oak Ridge, TN, June 1991.
[25] D. W. Walker, P. H. Worley and J. B. Drake, "Parallelizing the spectral transform method, part II," Concurrency: Practice and Experience, 4:7, pp. 509-531, 1992.
[26] M. F. Wehner, J. J. Ambrosiano, J. C. Brown, W. P. Dannevik, P. G. Eltgroth, A. A. Mirin, J. D. Farrara, C. C. Ma, C. R. Mechoso and J. A. Spahr, "Toward a high performance distributed memory climate model," in Proc. 2nd Intern. Sympos. on High Performance Distributed Computing (HPDC2), Spokane, WA, July 20-23, 1993, pp. 102-113.
[27] D. L. Williamson, J. T. Kiehl, V. Ramanathan, R. E. Dickinson and J. J. Hack, "Description of NCAR community climate model (CCM1)," NCAR Tech. Note NCAR/TN-285+STR, NTIS PB87-203782/AS, NCAR, Boulder, CO, June 1987.
[28] P. H. Worley and J. B. Drake, "Parallelizing the spectral transform method, part I," Tech. Rep. ORNL/TM-11747, Oak Ridge National Laboratory, Oak Ridge, TN, Feb 1991.
[29] P. H. Worley, D. W. Walker and J. B. Drake, "Parallelizing the spectral transform method," in Proc. Sixth Distributed Memory Computing Conf. (DMCC6), Portland, OR, Apr 28-May 1, 1991, pp. 306-313.