Optimal and Alternating-Direction Loadbalancing ... - Semantic Scholar

3 downloads 0 Views 233KB Size Report
Robert Els asser, Andreas Frommer, Burkhard Monien, Robert Preis sub-region independently (see e.g. 4, 5]). Here we associate a node with a mesh region, anĀ ...
Optimal and Alternating-Direction Loadbalancing Schemes Robert Elsasser1;? , Andreas Frommer2, Burkhard Monien1 ; *, and Robert Preis1; * Department of Mathematics and Computer Science University of Paderborn, D-33102 Paderborn, Germany 1

2

felsa,bm,[email protected]

Department of Mathematics and Institute for Applied Computer Science University of Wuppertal, D-42097 Wuppertal, Germany [email protected]

Abstract. We discuss iterative nearest neighbor load balancing schemes

on processor networks which are represented by a cartesian product of graphs like e.g. tori or hypercubes. By the use of the AlternatingDirection Loadbalancing scheme, the number of load balance iterations decreases by a factor of 2 for this type of graphs. The resulting ow is analyzed theoretically and it can be very high for certain cases. Therefore, we furthermore present the Mixed-Direction scheme which needs the same number of iterations but results in a much smaller ow. Apart from that, we present a simple optimal di usion scheme for general graphs which calculates a minimal balancing ow in the l2 norm. The scheme is based on the spectrum of the graph representing the network and needs only m ? 1 iterations in order to balance the load with m being the number of distinct eigenvalues.

1 Introduction We consider the load balancing problem in a synchronous, distributed processor network. Each node of the network has a computational load and, if it is not equally distributed, it will have to be balanced by moving parts of the load via communication links between two processors. We model the processor network by an undirected, connected graph G = (V; E ) in which node vi 2 V contains a computational load of wi . We want to determine a schedule in order to move load across edges so that nally the weight on each node is approximately equal. In each step a node is allowed to move any size of load to each of its neighbors in G. Communication is only allowed along the edges of G. This problem models load balancing in parallel adaptive nite element simulations where a geometric space, which is discretized using a mesh, is partitioned into sub-regions. Besides, the computation proceeds on mesh elements in each ?

Supported by European Union ESPRIT LTR Project 20244 (ALCOM-IT) and German Research Foundation (DFG) Project SFB-376.

2

Robert Elsasser, Andreas Frommer, Burkhard Monien, Robert Preis

sub-region independently (see e.g. [4, 5]). Here we associate a node with a mesh region, an edge with the geometric adjacency between two regions, and load with the number of mesh elements in each region. As the computation proceeds, the mesh gets re ned or coarsened depending on the characteristics of the problem and the size of the sub{regions (numbers of elements) has to be balanced. Since elements have to reside in their geometric adjacency, they can only be moved between adjacent mesh regions, i.e. via edges of the graph [5]. In this paper we consider the calculation of the load balancing ow. In a further step, the moved load elements have to be chosen and moved to their new destination. Scalable algorithms for our load balancing problem iteratively balance the load of a node with its neighbors until the whole network is globally balanced. The class of local iterative load balancing algorithms distinguishes between diffusion [2] and dimension exchange [2, 13] iterations. Di usion algorithms assume that a node of the graph can send and receive messages to/from all its neighbors simultaneously, whereas dimension exchange iteratively only balances with one neighbor after the other. The quality of a balancing algorithm can be measured in terms of numbers of iterations that are required in order to achieve a balanced state and in terms of the amount of load moved over the edges of the graph. The earliest local method is the di usive rst order scheme (FOS) by [2]. It lacks in performance because of its slow convergence. With the idea of over-relaxation FOS can be sped up by an order of magnitude [6] (second order scheme, SOS). Di usive algorithms have gained new attention in the last couple of years [4, 5, 7, 8, 10, 12, 13] and it has been shown that all local iterative di usion algorithms determine the unique l2 -minimal ow [3]. In Sect. 3, we discuss a combination of the di usion and the dimensionexchange model which we call the Alternating Direction Iterative (ADI) model. It can be used for cartesian products of graphs like tori or hypercubes, which often occur as processor networks. Here, in each iteration, a processor rst communicates with its neighbors in one and then in the other direction. Thus, it communicates once per iteration with each neighbor. We show that for these graphs the upper bound on the number of load balance steps will be halved. As a drawback, we can prove that the resulting ow is not minimal in the l2 -norm and can be very large if optimal parameters for the number of iterations are used. As a (partial) remedy to this problem we present the Mixed Direction Iterative (MDI) scheme which needs the same number of iterations but results in a much smaller ow. [3] introduced an optimal load balancing scheme based on the spectrum of the graph. It only needs m ? 1 iterations with m being the number of distinct eigenvalues. This scheme keeps the load-di erences small from step to step, i.e. it is numerically stable. In Sect. 4, we present a much simpler optimal scheme OPT which also needs only m ? 1 iterations. It might trap into numerical instable conditions, but we present rules of how to avoid those. The calculation of the spectrum for arbitrary large graphs is very time-consuming, but it is known for many classes of graphs and can eciently be computed for small graphs. In Sect. 5 experiments are presented to underline the theoretical observations.

Optimal and Alternating-Direction Loadbalancing Schemes

3

2 De nitions Let G = (V; E ) be a connected, undirected graph with jV j = n nodes and jE j = N edges. Let wi 2 IR be P then load of node vi 2 V and w 2 IRn be

the vector of load values. w := n1 ( i=1 wi )(1; : : : ; 1) denotes the vector of the average load. De ne A 2 f?1; 0; 1gnN to be the node-edge incidence matrix of G. Every column has exactly two non-zero entries \1" and \?1" for the two nodes incident to the corresponding edge. The signs of these non-zeros implicitly de ne directions for the edges of G which will later on be used to express the direction of the ow. Let B 2 f0; 1gnn be the adjacency matrix of G. As G is undirected, B is symmetric. Column/row i of B contains 1's at the positions of all neighbors of vi . The Laplacian L 2 ZZ nn of G is de ned as L := D ? B , with D 2 IN nn containing the node degrees as diagonal entries and 0 elsewhere. Obviously, L = AAT . Let x 2 IRN be a ow on the edges of G. x is called a balancing ow on G i Ax = w ? w. Among all possible ows P we are interested in balancing ows with minimalPl2 -norm de ned as kxk2 = ( Ni=1 x2i )1=2 . Let us also de ne the l1 -norm kxk1 = Ni=1 xi and the l1 -norm kxk1 = maxNi=1 xi . We consider the following local iterative load balancing algorithm which performs iterations on the nodes of G with communication between adjacent nodes only. This method performs on each node vi 2 V the iteration 8 e = fvi ; vj g 2 E : yek?1 = (wik?1 ? wjk?1 ); xke = xek?1 + yek?1 ; X and wik = wik?1 ? e=fv ;v g2E yek?1 (1) i j

Here, yek is the amount of load sent via edge e in step k with being a parameter for the fraction of the load di erence, xke is the total load sent via edge e until iteration k and wik is the load of node i after the k-th iteration. The iteration (1) converges to the average load w [2]. This scheme is known as the di usion algorithm. Equation (1) can be written in a matrix notation as wk = Mwk?1 withPM = I ? L 2 IRnn . M contains at position (i; j ) of edge e = (vi ; vj ), 1? e=fvi ;vj g2E at diagonal entry i, and 0 elsewhere. Now let i (L), 1  i  m be the distinct eigenvalues of the Laplacian L in increasing order. It is known that 1 (L) = 0 is simple with eigenvector (1; : : : ; 1) and m (L)  2  degmax (with degmax being the maximum degree of all nodes) [1]. Then M has the eigenvalues i = 1 ? i and has to be chosen such that m > ?1. Note that 1 = 1 is a simple eigenvalue to the eigenvector (1; 1; : : : ; 1). Such a matrix M is called di usion matrix. We denote by = maxfj2 j; jm jg < 1 the second largest eigenvalue of M according to absolute values and call it the di usion norm of M . P Let w0 = m i=1 zi be represented as a sum of (not necessarily normalized) eigenvectors zi of M with Mzi = i zi ; i = 1; : : : ; m. Then w = z1 [3]. De nition 1. A polynomial based load balancing scheme is any scheme for which the work load wk in step k can be expressed in the form wk = pk (M )w0 where pk 2  k . Here,  k denotes the set of all polynomials p of degree deg(p)  k satisfying the constraint p(1) = 1.

4

Robert Elsasser, Andreas Frommer, Burkhard Monien, Robert Preis

Condition pk (1) = 1 implies that all row sums in matrix pk (M ) are equal to 1. For a polynomial based scheme to be feasible practically, it must be possible to rewrite it as an update process where wk is computed from wk?1 (and maybe some previous iterates) involving one multiplication with M only. This means that the polynomials pk have to satisfy some kind of short recurrence relation. The convergence of a polynomial based scheme depends on whether (and how = pk (M )e0 fast) the `error' ek = wk ? w tends to zero. These errors ek satisfy ek P m z [3]. 0 0 which yields kek k2  maxm j p (  ) jk e k k = 0 ; 1 ; : : : , where e = 2 i=2 k i i=2 i As an example, take the rst order-scheme (FOS) [2], where we have pk (t) = tk . These polynomials satisfy the simple short recurrence pk (t) = t  pk?1 (t); k = 1; 2; : : : ; so that we get wk = Mwk?1 ; k = 1; 2; : : :. Now jpk (i )j = jki j  k for i = 2; : : : ; m and therefore

kek k  k  ke k : Here, (M ) = maxfj1 ?  (L)j; j1 ? m (L)jg and the minimum of is achieved 2

0

2

2

1? for the optimal value = 2 (L)+2m (L) . Then we have = mm ((LL))+?22((LL)) = 1+  with  = m2 ((LL)) called the condition number of the Laplace matrix L. The ow characterized by FOS is l2 minimal [3]. The solution to the l2 minimization problem "minimize kxk2 over all x with Ax = w0 ? w" is given by x = AT z , where Lz = w0 ? w [3],[8]. Now zPcan be expressed through Pm the 1 0 eigenvalues i P and eigenvectors zi of L as z = m z where w = i i=2 i i=1 zi z . As it has been shown in [3], for a polynomial based load and w0 ? w = m i=2 i balancing scheme with wk = pk (M )w0 , the corresponding l2 -minimal ow xk (such that w0 ? wk = Axk ) is given by xk = qk?1 (L)w0 , the polynomial qk?1 being de ned by pk (1 ? t) = 1 + tqk?1 (t). The ows xk converge to x, the l2 -minimal balancing ow. It was also shown in [3] that xk can be very easily computed along with wk if the iteration relies on a 3-term recurrence relation (as is the case for FOS, SOS and the OPT scheme of Sect.4).

3 Alternating Direction Iterative Schemes Cartesian products of lines and paths have been discussed previously in [13], but no analysis of the quality of the ow have been made. We generalize the problem to the product of arbitrary graphs, show that one can reduce the number of loadbalance iterations by alternating the direction of the loadbalancing and analyze the resulting ow. We propose a new Mixed-Direction scheme that results in a better ow than the classical Alternating-Direction scheme. Let G be the cartesian product of two graphs G0 and G00 with vertex sets V 0 and V 00 , edge sets E 0 and E 00 and the spectra 1 (G0 ), : : :, p (G0 ) and 1 (G00 ), : : :, q (G00 ), where we denote with p = jV 0 j and q = jV 00 j. Now G has the vertex set V = V 0  V 00 and the edge set E = f((u0i ; u00j ); (vi0 ; vj00 )) j where u0i = vi0 and (u00j ; vj00 ) 2 E 00 or u00j = vj00 and (u0j ; vj0 ) 2 E 0 g. Any eigenvalue (G) of G is of the form (G) = i (G0 ) + j (G00 ) with i (G0 ) eigenvalue of G0 i 2 f1; : : : ; pg and j (G00 ) eigenvalue of G00 j 2 f1; : : : ; qg. (The same relation also holds for the

Optimal and Alternating-Direction Loadbalancing Schemes

5

eigenvalues of the adjacency matrices.) We denote with Ik the identity matrix of order k. Now the adjacency matrix BG of G is BG = Iq BG0 + BG00 Ip where

denotes the tensor product. Obviously, the matrices Iq BG0 and BG00 Ip have a common system of eigenvectors. Now we apply the alternating direction iterative (ADI) load balancing strategy on G, which is similar to the alternating direction method for solving linear systems [11]. In an iteration, rstly balance among one component of the cartesian product (for example G0 ) using FOS, and secondly among the other component (in our case G00 ) using FOS. We denote with M 0 = 1 ? 0 L0 and M 00 = 1 ? 00 L00 the di usion matrices of G0 and G00 . The di usion scheme ADI-FOS then has the form

8 e = f(u0i ; u00j ); (u0i ; vl00 )g 2 E : yek? = 00 (wku?0i ;u00j ? wku?0i ;vl00 ); and 8 e = f(u0i ; u00j ); (vl0 ; u00j )g 2 E : yek = 0 (wku0i ;u00j ? wkvl0 ;u00j ); 1

(

(

1

)

)

(

(

1

)

)

(2) (3)

Then we have wi+1 = (M 00 Ip )(Iq M 0 )wi = (M 00 M 0 )wi . Theorem 1. Let G be the cartesian product of two connected graphs G0 and G00 . Denote L; L0; L00 the corresponding Laplacians and M = I ? L; M 0 = I ? 0 L0 ; M 00 = I ? 00 L00 the di usion matrices for the FOS with optimal parameters ; 0 ; 00 . Moreover, let M and M 00 M 0 be the di usion norms. Then M 00 M 0