A simple and robust method for load balancing and mesh smoothing 

Yuefan Deng, Ronald F. Peierls, and Carlos Rivera

Center for Scientific Computing, State University of New York at Stony Brook, Stony Brook, NY 11794-3600

[email protected], [email protected], [email protected]

Keywords: Load balance, mesh smoothing, molecular dynamics
Mathematical Reviews Index: 65Y05, 65L50, 81Vxx

Short title: A simple load balance method
Address for correspondence: Yuefan Deng, Center for Scientific Computing, State University of New York at Stony Brook, Stony Brook, NY 11794-3600; email: [email protected]; phone: 516-632-8614

Abstract

This paper describes a simple and robust method for load balancing and mesh smoothing in both 2d and 3d spaces. The method produces nearly perfect balance in 2d for three artificial load imbalance problems. We also explain the extension of the method to mesh smoothing, while pointing out the route to generalize it to higher dimensions.

1 Introduction

We extend earlier research on load balancing [2] to develop a robust and efficient algorithm for a broad range of computational problems that involve the effective utilization of computing resources and arise in a variety of areas. We consider the class of computations characterized by the decomposition of a region in 2d (or 3d) space into polygonal (polyhedral) cells. Such computations include:

1. Computations on finite difference grids;
2. Finite element computations on polygonal (polyhedral) meshes; and
3. Parallel computations involving spatially localized subdomains.

The generic issue which often arises in each of these cases is that a natural, static decomposition of the problem may have undesirable properties for efficient numerical computing. For example, in a parallel computation, the loads allocated to individual processors vary during execution, leading to significant imbalance that reduces parallel efficiency. Similarly, in mesh-based computations, the mesh cells may have poor aspect ratios or vary widely in size. Many computationally intensive scientific applications fall into the class described above:

1. Molecular dynamics with short-range interactions [3, 1, 4]. Evolving inhomogeneous distributions of particles lead to load imbalance with any natural, spatially based decomposition.
2. Multi-phase fluid flow problems with moving interfaces. Any static mesh quickly becomes obsolete as the interface moves, so a rapid re-meshing procedure is needed.
3. Porous media flow problems with localized source terms such as wells. The natural domain decomposition may produce severe load imbalance.
4. Mesh smoothing of hexahedral meshes generated in finite element engineering computations. Volume meshes generated from surface fitting can have undesirable properties that lead to poorly convergent results.
5. Processing of time-varying, highly inhomogeneous images using hierarchical data structures on parallel computers, which can produce serious load imbalances.

In some cases the space to be decomposed is not the physical space in which the underlying problem is defined, but a discrete index space labeling the computational elements. If the decomposition has a hierarchical structure, our method can be applied initially at a coarse level, then incrementally to finer levels. In §2 we present the essential features of the proposed method by considering the particular case of a 2d logically rectangular decomposition. In §3 we present the algorithm in more detail, including its extension to 3d and higher dimensions. We also discuss embedding it in a hierarchical approach, and issues related to parallelization. In §4 we present the results of our load-balancing studies.

2 Our Approach

The central idea of our method is to move the vertices of the polygonal (or polyhedral) regions, preserving the topology but changing the detailed geometry to achieve the desired properties. We introduce a fictitious "force vector" at each vertex which determines the direction and magnitude of the move. To illustrate the method, we focus on the case of a logically rectangular (quadrilateral) decomposition of a 2d problem. Each vertex is shared by four quadrilaterals. These four are grouped into two pairs in three different ways. For each such grouping, a force component is constructed whose direction depends on which of the pairs has the larger total load, and whose magnitude depends on the ratio of the loads. The direction is chosen so as to reduce the area of the more heavily loaded pair. The resultant of all these components determines the direction and distance of the vertex move. Alternative directional forces can be introduced to correct for such things as poor aspect ratios. The main steps are:

Step I. Compute the partial force vector at each vertex due to each pair decomposition. Combine all the forces into a resultant total force vector.

Step II. Allow each vertex to move in the direction of the force by a distance proportional to its magnitude. The proportionality constant is chosen to optimize the rate of improvement while maintaining stability. Only one vertex of any triangle can be moved at any one stage. This does not exclude parallel implementations, since it is always possible to divide the vertices of the mesh into two non-overlapping sets of vertices which are independently movable.

Step III. If the desired properties have not yet been attained, repeat the above steps unless improvement has ceased or some specified cost has been exceeded.
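Step II relies on splitting the mesh vertices into two sets that can be moved independently. A minimal sketch for a 2d logically rectangular grid follows, using a red-black (checkerboard) coloring by the parity of p1 + p2; the function names are hypothetical. It verifies only that every edge neighbor of a moving vertex is held fixed while that vertex moves. It does not address the triangle condition stated in Step II, since diagonally opposite corners of a cell share a color.

```python
from itertools import product

def red_black_classes(n1, n2):
    """Split the grid points (p1, p2), 0 <= pk <= nk, by the parity of p1 + p2."""
    red, black = [], []
    for p1, p2 in product(range(n1 + 1), range(n2 + 1)):
        (red if (p1 + p2) % 2 == 0 else black).append((p1, p2))
    return red, black

def edge_neighbors(p1, p2, n1, n2):
    """Grid points differing from (p1, p2) by +/-1 in exactly one coordinate."""
    for dp1, dp2 in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        q1, q2 = p1 + dp1, p2 + dp2
        if 0 <= q1 <= n1 and 0 <= q2 <= n2:
            yield (q1, q2)

if __name__ == "__main__":
    n1 = n2 = 5                      # a 5 x 5 mesh of cells, 6 x 6 grid points
    red, black = red_black_classes(n1, n2)
    red_set = set(red)
    # While the red points move, all of their edge neighbors are black and fixed.
    assert all(q not in red_set
               for p in red for q in edge_neighbors(*p, n1, n2))
    print(len(red), len(black))      # 18 18
```

A sweep would then move all red vertices, followed by all black vertices, with each half processed in parallel.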

3 The Method

3.1 Detailed description

Although our principal focus in this paper is on the 2d case, much of the discussion can be generalized to higher dimensions. We therefore begin with a statement of the problem and introduce notation for the general case. We consider a logically rectangular $d$-dimensional mesh of $n_1 \times n_2 \times \cdots \times n_d$ cells. We adopt the convention that Roman indices have range $(1, \ldots, d)$, while Greek indices have range $(1, \ldots, 2^d - 1)$. We also define the function $b_k(i)$ to be the $k$-th bit in the binary representation of the integer $i$, and $B_k(i) \equiv 2 b_k(i) - 1$. A grid point $P$ is labeled by its $d$ logical coordinates $\{p_k\}$, $0 \le p_k \le n_k$, and is located at the Cartesian point $x^P \equiv \{x^P_k\}$. Each cell has $2^d$ vertices, and therefore each interior mesh point is common to $2^d$ cells. We introduce the $2d$ neighbor points of $P$, each differing from $P$ in one coordinate:

$$N_k^{+P} = \{p_1, \ldots, p_{k-1}, p_k + 1, p_{k+1}, \ldots, p_d\},$$

$$N_k^{-P} = \{p_1, \ldots, p_{k-1}, p_k - 1, p_{k+1}, \ldots, p_d\}.$$

(From here on we drop the superscript $P$, since all discussion relates to the same point.) The $2^d$ cells having the vertex $P$ in common are labeled $C_\alpha$, $0 \le \alpha < 2^d$. Cell $C_\alpha$ contains, in addition to $P$, the $d$ neighbor points $\{b_k(\alpha) N_k^+ + (1 - b_k(\alpha)) N_k^-\}$. These $d$ points together with $P$ form a $d$-simplex, $S_\alpha$. The $N_k^+$ points belong to exactly half of these $2^d$ cells; the $N_k^-$ points belong to the other half.
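The bit-labeling rule can be made concrete with a short enumeration. This sketch assumes that $k = 1$ refers to the least significant bit (the bit order is not fixed in the text) and represents the neighbor points $N_k^\pm$ by string labels; the helper names are hypothetical. $B_k(\alpha) = 2 b_k(\alpha) - 1$ is then the $\pm 1$ sign telling whether cell $C_\alpha$ contains $N_k^+$ or $N_k^-$.

```python
def bit(i, k):
    """b_k(i): the k-th bit of i, with k = 1 taken as the least significant bit."""
    return (i >> (k - 1)) & 1

def B(i, k):
    """B_k(i) = 2 b_k(i) - 1, which is +1 or -1."""
    return 2 * bit(i, k) - 1

def cell_neighbors(alpha, d):
    """Labels of the d neighbor points of P that belong to cell C_alpha."""
    return [f"N{k}{'+' if B(alpha, k) > 0 else '-'}" for k in range(1, d + 1)]

if __name__ == "__main__":
    for alpha in range(2 ** 2):
        print(f"C{alpha}:", cell_neighbors(alpha, 2))
    # C0: ['N1-', 'N2-']
    # C1: ['N1+', 'N2-']
    # C2: ['N1-', 'N2+']
    # C3: ['N1+', 'N2+']
    d = 3
    for k in range(1, d + 1):
        # N_k^+ lies in exactly half of the 2^d cells around P.
        assert sum(bit(a, k) for a in range(2 ** d)) == 2 ** (d - 1)
```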

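Let $w_\alpha$ be the measure, for cell $C_\alpha$, of the quantity we wish to equidistribute, and define

$$W = 2^{-d} \sum_{\alpha} w_\alpha, \qquad W_k = 2^{1-d} \sum_{\alpha \,:\, N_k^+ \in C_\alpha} w_\alpha.$$

We introduce a measure of excess value:

$$\epsilon_k = (1 - \delta)\, \frac{W_k - W}{2W}.$$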
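A minimal sketch of $W$, $W_k$ and $\epsilon_k$ for arbitrary $d$, computed from the $2^d$ cell loads around $P$. It assumes the same least-significant-bit-first reading of $b_k(\alpha)$ as above, treats $\delta$ as a free parameter (its value is not given here), and requires a positive total load; the function name is hypothetical.

```python
def excess_values(w, d, delta):
    """
    w[alpha] is the load of cell C_alpha, alpha = 0, ..., 2**d - 1, and cell
    C_alpha contains N_k^+ exactly when bit k of alpha is set (k = 1 taken as
    the least significant bit, an assumption). Returns [eps_1, ..., eps_d]:
        W     = 2**-d    * sum of all w[alpha]
        W_k   = 2**(1-d) * sum of w[alpha] over cells containing N_k^+
        eps_k = (1 - delta) * (W_k - W) / (2 * W)
    """
    n = 2 ** d
    if len(w) != n:
        raise ValueError("expected exactly 2**d cell loads")
    W = sum(w) / n
    if W <= 0:
        raise ValueError("total load must be positive")
    eps = []
    for k in range(1, d + 1):
        W_k = sum(w[a] for a in range(n) if (a >> (k - 1)) & 1) / (n // 2)
        eps.append((1 - delta) * (W_k - W) / (2 * W))
    return eps

if __name__ == "__main__":
    # C3 (which contains N_1^+ and N_2^+) carries most of the load, so both
    # excess values are positive: P should move toward N_1^+ and N_2^+.
    print(excess_values([1.0, 1.0, 1.0, 5.0], d=2, delta=0.0))  # [0.25, 0.25]
```

Since $W_k \le 2W$, this form gives $|\epsilon_k| \le (1 - \delta)/2$.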

The magnitude $|\epsilon_k|$ lies between 0 and $(1 - \delta)$. We define the vector $v_k$ to have magnitude $|\epsilon_k|$ and to be directed from $P$ towards $N_k^+$ ($N_k^-$) if $\epsilon_k$ is positive (negative). Each of these $d$ vectors represents a potential move for $P$ which tends to reduce the imbalance between the cells containing $N_k^+$ and the cells containing $N_k^-$. Other vectors are constructed to generate potential moves tending to reduce the imbalance between the two groups of cells resulting from other partitions, $2^d - 1$ independent ones being possible. The endpoint of the mean vector

$$V = \frac{1}{d} \sum_k v_k$$

determines the new location of $P$. This new point is constrained so that all resultant cells remain convex.

We need to make several remarks:

1. The above discussion assumes $P$ is an interior point. For boundary points we adopt reflective boundary conditions to extend the region, and apply the algorithm as before.

2. In moving $P$ we assume that all the neighbors $N_k^\pm$ are held fixed. Other mesh points may be moved simultaneously without affecting the relocation of $P$. It is easy to see that we can apply a red-black coloring scheme to adjacent points, so that in a parallel implementation half the total number of mesh points can be moved at each step.

3. The parameter $\epsilon_k$ is constructed on the assumption that our goal is the equidistribution of the integral of some density function over the cells sharing the vertex $P$. Other goals, such as improving the shape of the cells, can be accommodated by constructing different parameters and move vectors.

4. The approach can be carried out hierarchically. This involves an initial uniform decomposition into the coarsest mesh of exactly $2^d$ cells. The interior point and boundary points are moved according to the algorithm to balance these loads. Each distorted rectangular cell which results is then uniformly subdivided into $2^d$ subcells, and the procedure is repeated at the next level.
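The update of a single vertex can be sketched as follows, with the neighbors held fixed as in remark 2. The text gives $v_k$ the magnitude $|\epsilon_k|$; this sketch reads that as a fraction of the distance from $P$ to the chosen neighbor, which is an assumption. It also omits the extra partition vectors and the convexity constraint, and the function name is hypothetical.

```python
def new_position(P, N_plus, N_minus, eps):
    """
    P       : coordinates of the vertex being moved (length d)
    N_plus  : N_plus[k] holds the coordinates of N_{k+1}^+ (0-based list)
    N_minus : N_minus[k] holds the coordinates of N_{k+1}^-
    eps     : excess values eps_1, ..., eps_d
    Each v_k points from P toward N_k^+ if eps_k > 0, otherwise toward N_k^-,
    and has length |eps_k| times the distance to that neighbor (assumed
    scaling). Returns the endpoint of V = (1/d) * sum_k v_k, i.e. P + V.
    """
    d = len(P)
    V = [0.0] * d
    for k in range(d):
        target = N_plus[k] if eps[k] > 0 else N_minus[k]
        for j in range(d):
            V[j] += abs(eps[k]) * (target[j] - P[j])
    return tuple(P[j] + V[j] / d for j in range(d))

if __name__ == "__main__":
    P = (0.0, 0.0)
    N_plus = [(1.0, 0.0), (0.0, 1.0)]     # N_1^+, N_2^+
    N_minus = [(-1.0, 0.0), (0.0, -1.0)]  # N_1^-, N_2^-
    # Positive eps_1 and eps_2: the cells on the N_1^+ and N_2^+ sides are
    # heavier, so P moves toward those neighbors.
    print(new_position(P, N_plus, N_minus, [0.25, 0.25]))  # (0.125, 0.125)
```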

3.2 Two dimensional example

For $d = 2$ the configuration is shown in Fig. 1. The $d$-simplices are triangles. There are three independent load ratios. Two of the vectors $v_k$ are constructed by comparing the loads in cells $(C_0, C_1)$ with $(C_2, C_3)$ and $(C_0, C_2)$ with $(C_1, C_3)$, as in the definition of $\epsilon_k$ above. One more is needed, and this is obtained by comparing $(C_0, C_3)$ with $(C_1, C_2)$. This vector is directed along $v_1 \pm v_2$, depending on which pair has the higher load. The need for this third vector can be seen by considering the case where $C_1$ and $C_2$ are equally loaded, as are $C_0$ and $C_3$ at a much lower level. This configuration is manifestly unbalanced, but the forces associated with $\epsilon_1$ and $\epsilon_2$ cancel.
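The cancellation argument can be checked numerically. The sketch below applies the excess formula used for $\epsilon_k$ to all three pairings of the four cells. Applying it to the diagonal pairing, and labeling $C_0, \ldots, C_3$ by the bit order assumed earlier, are assumptions of the sketch. Because the sign convention for the move along $v_1 \pm v_2$ is not spelled out, it reports only the excess values, not the move; the function name is hypothetical.

```python
def pair_excess_2d(w, delta=0.0):
    """
    w = (w0, w1, w2, w3): loads of C0..C3 around P, labelled so that
    b_1 = 1 for C1, C3 and b_2 = 1 for C2, C3 (assumed bit order).
    Returns (1 - delta) * (W_pair - W) / (2 W) for the first-listed pair
    of each of the three pairings, where W is the mean over all four cells.
    """
    W = sum(w) / 4.0
    pairings = {
        "eps1: (C1,C3) vs (C0,C2)": (1, 3),
        "eps2: (C2,C3) vs (C0,C1)": (2, 3),
        "diag: (C0,C3) vs (C1,C2)": (0, 3),
    }
    return {name: (1 - delta) * ((w[a] + w[b]) / 2.0 - W) / (2.0 * W)
            for name, (a, b) in pairings.items()}

if __name__ == "__main__":
    # C1 and C2 heavily loaded; C0 and C3 equally, but much more lightly, loaded.
    for name, value in pair_excess_2d([100, 1000, 1000, 100]).items():
        print(f"{name}: {value:+.3f}")
    # eps1: (C1,C3) vs (C0,C2): +0.000
    # eps2: (C2,C3) vs (C0,C1): +0.000
    # diag: (C0,C3) vs (C1,C2): -0.409
```

Only the diagonal comparison detects the imbalance, and its negative sign indicates that $(C_1, C_2)$ is the heavier pair.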

In order to preserve convexity, it is necessary to replace the neighbor point $N_k$ by an interior point $N_k^*$ if the extrapolated external edge of one of the four quadrilaterals cuts the line from $P$ to the neighbor $N_k$, as shown in Fig. 1.
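Whether a trial move keeps the cells around $P$ convex can be checked with the standard cross-product test: a quadrilateral is strictly convex when the turns at all four corners have the same nonzero sign. This is a check one might apply to each of the four cells after moving $P$. It is not the neighbor-replacement rule described above, and the function name is hypothetical.

```python
def is_convex_quad(quad):
    """
    quad: four (x, y) vertices listed in order around the cell.
    True if every turn between consecutive edges has the same nonzero sign,
    which for a quadrilateral means it is simple and strictly convex.
    """
    turns = []
    for i in range(4):
        (x0, y0), (x1, y1), (x2, y2) = quad[i], quad[(i + 1) % 4], quad[(i + 2) % 4]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross == 0:
            return False          # degenerate: collinear consecutive edges
        turns.append(cross > 0)
    return all(turns) or not any(turns)

if __name__ == "__main__":
    # Unit cell whose corner (1, 1) plays the role of P after a trial move.
    print(is_convex_quad([(0, 0), (1, 0), (0.9, 0.8), (0, 1)]))  # True
    print(is_convex_quad([(0, 0), (1, 0), (0.2, 0.1), (0, 1)]))  # False
```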

4 Case study and test results

We have defined a quantity called the load imbalance ratio [2], which quantifies the percentage of time wasted due to load imbalance in one cycle of computing:

$$R = 1 - W_{avg}/W_{max}.$$

We have designed three basic load imbalance cases to examine the performance of our algorithm. All cases have 5 × 5 subdomains (Figs. 2-4), and the load in each subdomain is represented by the number of uniformly distributed dots in that subdomain. In all of our experiments, the maximum number of dots is 1000 and the minimum is 100. In Case 1, we place the maximum number of dots in the upper left subdomain, with the number of dots decreasing gradually with distance from this subdomain. This case "models" a point source. In Case 2, we place the maximum number of dots in the diagonal subdomains, with the number of dots decreasing gradually with distance from the diagonal. This case "models" a line source. In Case 3, we place the maximum and minimum numbers of dots alternately, creating a checkerboard-like load distribution which "models" a lattice of source points. We believe this to be a tough imbalance problem and a stringent test of the robustness of our method. In Figs. 2-4, we show the balancing process for the above three cases. Each figure contains two pictures. The upper one shows the final balanced decomposition of subdomains with the original distribution of dots. The lower one shows the fall of the load imbalance ratio, typically from around 50% to nearly zero, i.e., to nearly perfect balance. Inserted in this lower picture are three histograms showing the initial, middle-point, and final load distributions. In all cases the distribution becomes progressively narrower and eventually becomes almost a vertical line, indicating nearly perfect balance.
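$R$ is easy to evaluate from a list of subdomain loads. The sketch below (function names hypothetical) computes it for a 5 × 5 checkerboard of 1000 and 100 dots. It assumes the maximum sits in the corner subdomains, which the text does not specify. Thirteen subdomains then carry 1000 dots and twelve carry 100, so $W_{avg} = 568$ and $R = 0.432$, close to the starting value of about 44% quoted for Fig. 4.

```python
def imbalance_ratio(loads):
    """Load imbalance ratio R = 1 - W_avg / W_max over all subdomain loads."""
    w_max = max(loads)
    w_avg = sum(loads) / len(loads)
    return 1.0 - w_avg / w_max

def checkerboard(n=5, high=1000, low=100):
    """n x n subdomain loads alternating high/low, high in the corners (assumed)."""
    return [[high if (i + j) % 2 == 0 else low for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    loads = [w for row in checkerboard() for w in row]
    print(round(imbalance_ratio(loads), 3))  # 0.432
```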

5 Remarks

Load balancing is generally highly problem dependent. There have been other tiling approaches, but few are applicable to 3d problems. Recursive bisection is an approach which has been used with some success, but the resulting topology depends on the bisection history, and it cannot always be used to follow dynamically changing loads easily. This paper is meant to establish the principle of the method and to demonstrate that it can succeed for a variety of load distributions. In subsequent work we intend to apply it to some 3d problems, to improve both the efficiency of the code and its parallelization, and to develop the hierarchical version. We also intend to introduce alternative vertex "forces" to apply the algorithm to mesh smoothing.

The authors thank the National Science Foundation (Grant Number: DMS9626859) for supporting this project.

References

[1] Y. Deng, R. McCoy, R. Marr, R. Peierls, and O. Yasar, Molecular dynamics on distributed-memory MIMD computers with load balancing, Appl. Math. Letters 8, 37 (1995).

[2] Y. Deng, R. McCoy, R. B. Marr, and R. F. Peierls, An Unconventional Method for Load Balancing, in Proceedings of the 7th SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, CA (1995), eds. D. Bailey et al., p. 645.

[3] S. J. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J. Comput. Phys. 117, 1 (1995).

[4] R. McCoy, Parallel molecular dynamics, Ph.D. thesis, SUNY at Stony Brook (1995).

Journal of comp. physics Deng, Peierls Rivera

Fig. 1. Two dimensional case: basic notation. [Figure: the cells $C_0$, $C_1$, $C_2$, $C_3$ sharing the vertex $P$, with the neighbor points $N_1$ and $N_2$.]

Fig. 2. [Figure: top, final decomposition of the 5x5 subdomains; bottom, Imbalance(%) versus Step, with inset histograms I, II, III of # Subdomains versus Load.]

Top: Balancing the loads on 5x5 subdomains, starting from a uniform decomposition, with the upper left corner carrying the most load. Bottom: Plot of the load imbalance ratio, which falls from about 65% to nearly zero. Inserted in this picture are three histograms showing the initial, middle-point, and final load distributions.

Fig. 3. [Figure: top, final decomposition of the 5x5 subdomains; bottom, Imbalance(%) versus Step, with inset histograms I, II, III of # Subdomains versus Load.]

Top: The solution for a dominant load along one of the diagonals. Bottom: Plot of the load imbalance ratio, which falls from about 60% to nearly zero. Inserted in this picture are three histograms showing the initial, middle-point, and final load distributions.

Fig. 4. [Figure: top, final decomposition of the 5x5 subdomains; bottom, Imbalance(%) versus Step, with inset histograms I, II, III of # Subdomains versus Load.]

Top: The solution for an initial load distribution in a checkerboard pattern. Bottom: Plot of the load imbalance ratio, which falls from about 44% to nearly zero. Inserted in this picture are three histograms showing the initial, middle-point, and final load distributions.