
[4] Barry Wilkinson and Michael Allen, Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers, Prentice.
Load Balance Heuristics for Synchronous Iterative Applications on Heterogeneous Cluster Systems

Weizhe Zhang, Mingzeng Hu, Hongli Zhang
School of Computer Science and Technology, Harbin Institute of Technology, P.R. China
{zwz, mzh, zhl}@pact518.hit.edu.cn

Abstract

Heterogeneous computing systems are emerging as a computing infrastructure that will enable the use of distributed heterogeneous clusters for a variety of challenging applications. A key challenge is load balancing for tightly coupled applications. In this paper, we focus on an important subclass of tightly coupled applications, synchronous iterative applications, and formally define their load balance problem. Two novel static metaheuristic algorithms are proposed for load distribution under different communication-to-computation ratios: a genetic tabu hybrid search (GTHS) algorithm and a host clustering based iterative search (HCIS) algorithm. Analysis and experimental results demonstrate the effectiveness of the heuristic algorithms.

1. Introduction

The major obstacle to parallelizing grand challenge applications on heterogeneous computing systems (HCS) is the difficulty of balancing the load across processors of different speeds and heterogeneous networks. It is convenient to classify classic parallel problems as embarrassingly parallel, synchronous, loosely synchronous, asynchronous, and meta problems [1]. Embarrassingly parallel problems require no communication or synchronization, and loosely synchronous applications are data parallel but their data points evolve differently. Almost all traditional load balancing strategies (such as domain decomposition, recursive bisection, and round-robin algorithms) can be applied to such applications on HCS because there is little or no interprocess communication. However, fully synchronous applications with a higher communication-to-computation ratio require that all operations be synchronized at regular points, and the computation and communication capabilities of each resource must be taken into account simultaneously.

Therefore, balancing the load for this class of tightly coupled applications on HCS is a real challenge. Moreover, asynchronous and meta problems consist of two or more subtasks, each of which belongs to one of the above categories; they demand more sophisticated meta scheduling strategies [5, 6] that are beyond the scope of this article. We focus on an important subclass of fully synchronous applications, synchronous iterative (or multiphase) applications, which encompass many numerical solvers (such as elliptic PDE solvers [2] and finite element equation solvers [3]), optimization algorithms, discrete-event simulation, atmosphere and ocean circulation simulation, N-body simulation, and cellular automata [4]. In such applications, parallel iterative algorithms repeatedly execute a computation on a large collection of application data, with an explicit synchronization of the tasks and an exchange of data performed at the end of each iteration. Each process reaches a barrier synchronization after each iteration, and the next iteration cannot begin until all processes have finished the previous one.

Several authors have derived performance models and load balance strategies for synchronous iterative algorithms running on homogeneous resources [5-8]. However, studies of load balance strategies for heterogeneous resources have been sparse. In this article, the load balance problems of synchronous iterative applications are formulated as two combinatorial optimization problems. Two heuristic algorithms are then proposed, and their performance is evaluated through experiments.

The rest of this paper is organized as follows. Section 2 describes the problem and formulates the load balance problems as combinatorial optimization problems: SILBP and FPSILBP. Section 3 presents two heuristics: the genetic tabu hybrid search algorithm for min-max FPSILBP (Section 3.1) and the host clustering based iterative search algorithm for min-max SILBP (Section 3.2). A performance study and comparison for N-body simulation and Jacobi iteration are carried out in Section 4. Finally, Section 5 gives some remarks and conclusions.
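The barrier-synchronized iteration pattern described in the introduction can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `run_synchronous_iterations`, the 1-D Jacobi-style averaging update, and the even split of the data among workers are all assumptions chosen for brevity. The point is only that no worker starts iteration k+1 until every worker has finished iteration k.

```python
import threading

def run_synchronous_iterations(values, num_workers, num_iters):
    """1-D Jacobi-style smoothing with a barrier after every iteration.

    Hypothetical illustration: each worker owns a contiguous slice and
    replaces every interior point by the mean of its two neighbours.
    """
    n = len(values)
    current = list(values)   # values from the previous iteration (read-only)
    nxt = list(values)       # values being computed in this iteration
    barrier = threading.Barrier(num_workers)
    bounds = [(w * n // num_workers, (w + 1) * n // num_workers)
              for w in range(num_workers)]

    def worker(w):
        nonlocal current, nxt
        lo, hi = bounds[w]
        for _ in range(num_iters):
            # Compute phase: boundary points 0 and n-1 stay fixed.
            for i in range(max(lo, 1), min(hi, n - 1)):
                nxt[i] = 0.5 * (current[i - 1] + current[i + 1])
            # Barrier: wait until every worker has finished computing.
            arrival_index = barrier.wait()
            if arrival_index == 0:
                # Exactly one worker swaps the buffers for the next iteration.
                current, nxt = nxt, current
            # Second barrier: nobody reads until the swap is done.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return current
```

In this pattern the duration of each iteration is set by the slowest worker, which is why the per-iteration maximum is the natural quantity to minimize on heterogeneous resources.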

2. Problem Description and Formulation

Definition 1 (SILBP), Load Balance Problem of Synchronous Iterative Applications: Given a set $V$ of $n$ processors $v_1, \ldots, v_n$ with computation power $\pi_i$ and a communication link $(v_i, v_j)$ between each pair of processors with bandwidth $\omega_{i,j}$, and given the total workload $U$ and the communication volume $L$ at each step of the synchronous iterative application, determine $q$ processors, a ring permutation of the processors $(v_1, \ldots, v_q)$, and $\lambda_i$, where $1 \le q \le n$, $1 \le i \le q$, $0 \le \lambda_i \le 1$, and $\sum_{i=1}^{q} \lambda_i = 1$, so that

$$T_{iter} = \min_{1 \le q \le |V|} \left\{ \max_{1 \le i \le q} \left( (\lambda_i \times U) \times \pi_i + L \times \left( \omega_{i,(i-1) \bmod q} + \omega_{i,(i+1) \bmod q} \right) \right) \right\} \tag{1}$$

The special case of SILBP is formulated as follows:

Definition 2 (FPSILBP), Full Processor Load Balance Problem of Synchronous Iterative Applications: Given a set $V$ of $n$ processors $v_1, \ldots, v_n$ with computation power $\pi_i$ and a communication link $(v_i, v_j)$ between each pair of processors with bandwidth $\omega_{i,j}$, and given the total workload $U$ and the communication volume $L$ with $U/L \ge \mathrm{CONST}$, can we find a ring permutation of the processors $(v_1, \ldots, v_n)$ and $\lambda_i$, where $1 \le i \le n$, $0 < \lambda_i < 1$, and $\sum_{i=1}^{n} \lambda_i = 1$, so that the resulting time $T_{iter}$ of each iteration satisfies

$$\max_{1 \le i \le n} \left( (\lambda_i \times U) \times \pi_i + L \times \left( \omega_{i,(i-1) \bmod n} + \omega_{i,(i+1) \bmod n} \right) \right) \le D\,?$$
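To make the objective in Definitions 1 and 2 concrete, the following Python sketch evaluates the iteration time of a single candidate solution. The function name `iteration_time`, the list-based data layout, and the sample numbers are illustrative assumptions, not part of the original formulation. Read literally, the formula multiplies the load by `pi[v]` and the volume by `omega[v][w]`, so the sketch treats them as time per unit of workload and time per unit of data, respectively.

```python
def iteration_time(ring, lam, U, L, pi, omega):
    """Per-iteration time of one candidate solution (inner max of the objective).

    ring  : processor indices in ring order (0-based, an assumption here)
    lam   : load fraction for each ring position; should sum to 1
    U, L  : total workload and communication volume per iteration
    pi    : pi[v] is the cost factor of processor v
    omega : omega[v][w] is the cost factor of link (v, w); omega[v][v] == 0
    """
    q = len(ring)
    worst = 0.0
    for k, v in enumerate(ring):
        prev_v = ring[(k - 1) % q]   # left neighbour on the ring
        next_v = ring[(k + 1) % q]   # right neighbour on the ring
        compute = lam[k] * U * pi[v]
        comm = L * (omega[v][prev_v] + omega[v][next_v])
        worst = max(worst, compute + comm)
    return worst

# Hypothetical three-processor instance.
sample_pi = [1.0, 2.0, 4.0]
sample_omega = [[0.0, 0.1, 0.2],
                [0.1, 0.0, 0.3],
                [0.2, 0.3, 0.0]]
sample_time = iteration_time([0, 1, 2], [0.5, 0.3, 0.2],
                             100.0, 10.0, sample_pi, sample_omega)
```

With these sample values the slowest processor is the one with the largest cost factor, even though it receives the smallest load fraction, so a better assignment would shift more work to the faster processors.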

Since SILBP and FPSILBP are both NP-complete, we do not expect to find a polynomial-time algorithm that finds the minimum iteration time over ring processor permutations. Section 3 therefore presents polynomial-time heuristic algorithms that produce approximate solutions for these problems.
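For very small instances, an exhaustive baseline can still be useful for checking heuristic results. The Python sketch below is not one of the paper's algorithms; the names `balanced_fractions` and `brute_force_full_ring` are hypothetical. It enumerates all rings over the full processor set (fixing processor 0 to skip rotations) and, for each ring, sets the load fractions by equalizing the finish times: writing $a_k = U \pi_k$ and $c_k$ for the communication term of position $k$, solving $\sum_k (T - c_k)/a_k = 1$ gives $T = (1 + \sum_k c_k/a_k) / \sum_k (1/a_k)$, bounded below by $\max_k c_k$. The sketch assumes $U > 0$ and $\pi_i > 0$, and its running time grows as $(n-1)!$.

```python
from itertools import permutations

def balanced_fractions(ring, U, L, pi, omega):
    """Best load fractions for one fixed ring (sketch, assumes U > 0, pi > 0)."""
    q = len(ring)
    # Communication term of each ring position (independent of the load).
    comm = [L * (omega[v][ring[(k - 1) % q]] + omega[v][ring[(k + 1) % q]])
            for k, v in enumerate(ring)]
    # Time each processor would need for the entire workload.
    cost = [U * pi[v] for v in ring]
    # Finish time at which all processors end simultaneously.
    t_equal = (1.0 + sum(c / a for c, a in zip(comm, cost))) \
        / sum(1.0 / a for a in cost)
    # No processor can finish before its own communication is done.
    t = max(t_equal, max(comm))
    # Largest fraction each processor can take without exceeding t.
    caps = [(t - c) / a for c, a in zip(comm, cost)]
    total = sum(caps)                    # always >= 1 by construction
    lam = [x / total for x in caps]      # scale down so fractions sum to 1
    return t, lam

def brute_force_full_ring(U, L, pi, omega):
    """Try every ring over all processors; return (time, ring, fractions)."""
    n = len(pi)
    best = None
    for rest in permutations(range(1, n)):
        ring = (0,) + rest               # fix processor 0 to skip rotations
        t, lam = balanced_fractions(ring, U, L, pi, omega)
        if best is None or t < best[0]:
            best = (t, ring, lam)
    return best
```

When communication dominates, the equalized time falls below the largest communication term and some fraction can become zero, which violates the strict requirement $\lambda_i > 0$ of FPSILBP; the condition $U/L \ge \mathrm{CONST}$ in Definition 2 plausibly serves to exclude that regime.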

3. Heuristic Algorithms

In this section, two novel heuristic algorithms are presented: the GTHS algorithm for the FPSILBP optimization problem, which is the special case of SILBP, and the HCIS algorithm for the general SILBP min-max problem.

3.1 The genetic tabu hybrid search (GTHS) algorithm for min-max FPSILBP

We first give an overall step-by-step description of the GTHS algorithm; the details are explained afterwards.

GTHS algorithm for min-max FPSILBP
Input: A complete weighted graph G = (V, E) and fitness function F(x)
Output: A cycle permutation x = (v1, …, vn, v1) and the load λi U on vi, 0
