A Dynamic Load Balancing Framework for Unstructured Adaptive Computations on Distributed-Memory Multiprocessors

Andrew Sohn, CIS Dept., New Jersey Institute of Technology, Newark, NJ 07102, [email protected]
Rupak Biswas, RIACS, NASA Ames Research Center, Moffett Field, CA 94035, [email protected]
Horst D. Simon, NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, [email protected]

ABSTRACT
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view each time the computational mesh is adapted. JOVE has been implemented on an SP2 in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. Furthermore, JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.

1 INTRODUCTION
Flow computation in complex three-dimensional domains is a challenging task. It is particularly daunting when dynamic mesh adaption is used on unstructured grids. The CPU time and in-core memory requirements for such problems are extremely large and can only be satisfied by massively parallel machines [8,10]. During an adaptive, unsteady calculation, the unstructured meshes are locally refined and coarsened to capture important solution features. The computational intensity is thus not only time dependent, but also varies spatially over the problem domain. A distributed-memory implementation of such methods requires two steps: partitioning the computational mesh into smaller submeshes, and mapping the submeshes to processors. While static partitioning and mapping are adequate for problems that do not change in computational intensity over time, they are grossly inefficient for unsteady, adaptive calculations. It is thus imperative that the amount of work assigned to each processor be balanced at runtime to increase processor utilization and improve performance [4,8]. Balancing the runtime computational load is usually very difficult. It requires reliable measurements of the computational load and the amount of data movement, as well as the minimization of inter-processor communication. Various methods for dynamic load balancing have been reported to date [2-7]; however, most of them lack a global view of loads across processors. A systematic way of measuring and balancing all processor loads is needed for a method to be applicable to a variety of realistic applications. Our dynamic load balancer, called JOVE, is intended to satisfy these requirements. It possesses three novel features. First, a dual graph representation of the computational mesh is used to keep the complexity and connectivity constant during the course of an adaptive computation. Second, a new inertial spectral mesh partitioning method is introduced. Finally, an accurate metric for the computational gain and communication overhead is developed.

2 THREE-DIMENSIONAL MESH ADAPTION
The three-dimensional unstructured mesh adaption scheme that is used in this work is called 3D TAG [1]. At each adaption step, tetrahedral elements are refined or coarsened based on an error indicator for each edge. Edges whose error values exceed an upper threshold are bisected. Similarly, edges whose error values lie below a lower threshold are removed. Only the subdivision types shown in Fig. 1 are allowed for each element.

Figure 1: Types of element subdivision (1:8, 1:4, and 1:2).

Changes in the mesh are what make parallel adaptive flow computation difficult. As the numerical simulation progresses, some regions of the grid may contain more elements due to refinement while other regions may contain fewer due to coarsening. The imbalance is likely to increase with each adaption. In the worst case, the use of a parallel machine would offer little advantage over sequential machines.
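The two-threshold rule that 3D TAG applies to edge error values can be illustrated with a short sketch. This is only a minimal illustration, not the 3D TAG implementation: the function name `mark_edges`, the representation of an edge as a pair of vertex ids, and the sample error values are all assumptions, and the sketch does not enforce the restriction to the subdivision types of Fig. 1.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # assumed representation: a pair of vertex ids


def mark_edges(errors: Dict[Edge, float], upper: float, lower: float):
    """Split edges into a refine set and a coarsen set using two thresholds."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    # Edges whose error exceeds the upper threshold are candidates for bisection.
    refine: Set[Edge] = {e for e, err in errors.items() if err > upper}
    # Edges whose error lies below the lower threshold are candidates for removal.
    coarsen: Set[Edge] = {e for e, err in errors.items() if err < lower}
    return refine, coarsen


# Hypothetical error values for three edges.
errors = {(0, 1): 0.9, (1, 2): 0.5, (2, 3): 0.05}
refine, coarsen = mark_edges(errors, upper=0.8, lower=0.1)
```

Edges whose error falls between the two thresholds, such as `(1, 2)` here, are left unchanged, which is why the amount of work in a region can grow or shrink from one adaption step to the next.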

3 JOVE DYNAMIC LOAD BALANCING SCHEME
Figure 2 gives an overview of our approach to dynamic load balancing. The system consists of three modules: the load balancer JOVE, a flow solver, and the 3D TAG mesh adaptor [1]. Details of the flow solver are beyond the scope of this paper, except to note that it generates error values for each edge that are then used by 3D TAG to adapt the mesh.

[Figure 2. Box labels: JOVE: Load Balancer; Flow Solver; 3D_TAG: Mesh Adaptor. Connection labels: Balanced Mesh; Errors; New Mesh. Contents of the JOVE box:]

if (Pre_evaluate(new) == OK) {
    Partition(new);
    comp, comm = Evaluate(old, new);
    if (comp > comm)
        Move(old, new);
}

Proceedings of the 8th ACM Symposium on Parallel Algorithms and Architectures, Padua, Italy, June 24-26, 1996.

ered for repartitioning. If projecting the new values on the current partitions indicates that they are adequately load balanced, there is no need to repartition the mesh. In that case, JOVE terminates and allows the computation to continue uninterrupted on the current partitions. A proper metric is required to measure the load imbalance. If $W_{max}$ is the sum of the $w_{comp}$ on the most heavily loaded processor, and $W_{avg}$ is the average load across all processors, the average idle time for each processor is $(W_{max} - W_{avg})$. This is an exact measure of the load imbalance. The mesh is repartitioned if the imbalance factor $W_{max}/W_{avg}$ is greater than a specified threshold.
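The imbalance test and the control flow of Figure 2 can be sketched together as follows. This is a minimal sketch under stated assumptions, not the JOVE code: the function names, the callables `partition`, `evaluate`, and `move`, and the sample loads are hypothetical, and the sketch reads the pre-evaluation step as "repartition only when Wmax/Wavg exceeds the threshold".

```python
from typing import Callable, Sequence, Tuple


def imbalance_factor(loads: Sequence[float]) -> float:
    """Return Wmax / Wavg, where each load is the sum of wcomp on one processor."""
    if not loads or sum(loads) <= 0:
        raise ValueError("loads must be non-empty with a positive total")
    w_avg = sum(loads) / len(loads)
    return max(loads) / w_avg


def idle_time(loads: Sequence[float]) -> float:
    """Return Wmax - Wavg, the average idle time per processor."""
    return max(loads) - sum(loads) / len(loads)


def rebalance(
    projected_loads: Sequence[float],
    threshold: float,
    partition: Callable[[], object],
    evaluate: Callable[[object], Tuple[float, float]],
    move: Callable[[object], None],
) -> bool:
    """Follow the Figure 2 control flow; return True if data was moved."""
    # Pre-evaluation: keep the current partitions if they are balanced enough.
    if imbalance_factor(projected_loads) <= threshold:
        return False
    new_partition = partition()
    # Compare the computational gain with the communication (remapping) cost.
    comp_gain, comm_cost = evaluate(new_partition)
    if comp_gain > comm_cost:
        move(new_partition)
        return True
    return False


# Hypothetical loads on four processors after an adaption step.
loads = [10.0, 10.0, 10.0, 30.0]
print(imbalance_factor(loads))  # 2.0
print(idle_time(loads))         # 15.0
```

Passing the partitioner, evaluator, and mover in as callables keeps the decision logic separate from the expensive steps, so the cheap imbalance check can always run first.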

3.3 INERTIAL SPECTRAL MESH PARTITIONING
If Pre_evaluate(new) determines that the dual graph with the new $w_{comp}$ is not load balanced, JOVE invokes the mesh partitioning procedure. Several partitioning algorithms are available for unstructured grids; however, a new procedure that combines the high quality of spectral methods [9] with an efficient update strategy is used. This new algorithm [11] is based on the center of inertia of the vertices of the dual graph and utilizes information from the initial spectral partitioning. It is thus capable of rapidly updating a partition from one grid to the next. The following algorithm briefly explains the method.

for (i=0; i
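For reference only, the classical weighted inertial bisection step, which is also built on a center of inertia, can be sketched as below. This is not the algorithm of [11] and uses no spectral information. It assumes each dual-graph vertex has 3D coordinates and a computational weight, and all names are hypothetical. The principal axis is found by power iteration on the weighted covariance matrix.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def principal_axis(points: Sequence[Point], weights: Sequence[float], iters: int = 100):
    """Return the weighted center and the dominant axis of the point cloud."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    center = [sum(w * p[k] for p, w in zip(points, weights)) / total for k in range(3)]
    # Weighted covariance (scatter) matrix about the center of inertia.
    cov = [[0.0] * 3 for _ in range(3)]
    for p, w in zip(points, weights):
        d = [p[k] - center[k] for k in range(3)]
        for a in range(3):
            for b in range(3):
                cov[a][b] += w * d[a] * d[b]
    # Power iteration; assumes the start vector is not orthogonal to the axis.
    v = [1.0, 1.0, 1.0]
    for _ in range(iters):
        u = [sum(cov[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            break  # degenerate cloud: keep the current direction
        v = [x / norm for x in u]
    return center, v


def inertial_bisect(points: Sequence[Point], weights: Sequence[float]):
    """Split vertex indices into two sets of roughly equal total weight."""
    center, axis = principal_axis(points, weights)

    def proj(i: int) -> float:
        return sum((points[i][k] - center[k]) * axis[k] for k in range(3))

    order = sorted(range(len(points)), key=proj)
    half = sum(weights) / 2.0
    acc, left = 0.0, []  # type: float, List[int]
    for i in order:
        if acc >= half:
            break
        left.append(i)
        acc += weights[i]
    right = order[len(left):]
    return left, right


# Hypothetical vertices lying roughly along the x axis, with unit weights.
pts = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.0, 0.0), (3.0, 0.1, 0.0)]
left, right = inertial_bisect(pts, [1.0, 1.0, 1.0, 1.0])
```

Applying the bisection recursively to each half yields more partitions. Splitting on accumulated weight rather than vertex count is what lets the cut follow the computational load.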
