
International Network Optimization Conference (INOC’03), 27-29 October 2003, Evry/Paris, France.


Optimal Allocation of Tasks onto Networked Heterogeneous Computers Using Minimax Criterion

Gamal ATTIYA and Yskandar HAMAM
ESIEE, Lab. A2 SI, Cité Descartes - BP 99, 93162 Noisy-Le-Grand, FRANCE
Phone: +33 1 45 92 66 11, Fax: +33 1 45 92 66 99
Email: [email protected] & [email protected]

Abstract

Advances in microprocessors and computer networks have made distributed systems a reality. However, exploiting the full potential of these systems requires efficient allocation of the tasks comprising a distributed application to the available processors. This problem is known to be NP-hard and therefore intractable as soon as the number of tasks and/or processors exceeds a few units. This paper presents an optimal, memory-efficient algorithm for allocating an application program onto the processors of a distributed system so as to minimize the program completion time. The algorithm is derived from the well-known Branch-and-Bound technique, with some modifications to reduce its computational time. Some experimental results are given to show the effectiveness of the proposed algorithm.

Keywords: Task Allocation, Distributed Computing Systems, Integer Programming, Optimization.

1 Introduction

Distributed computing systems have become competitive in providing the power of a super machine at only a small initial cost. Such a system has the further benefit of providing industry with an easy and modular upgrade path, because increasing the power of the system simply involves increasing the number of networked computers. A major problem that arises with such a system is how to optimally distribute an application program over the available processors so as to achieve one or more objectives, such as minimizing interprocessor communication costs, minimizing total completion time (quick turnaround time), maximizing system throughput, maximizing system reliability, balancing the load among processors and utilizing system resources efficiently. If this step is not done properly, an increase in the number of processors may actually result in a decrease in total throughput. This degradation is caused by what is commonly called the 'saturation effect', which occurs due to heavy communication traffic incurred by data transfer between tasks that reside on separate computers.

In general, the problem of assigning the tasks of an application to the processors of a distributed system is known to be NP-hard and therefore intractable as soon as the number of tasks and/or processors exceeds a few units. Several approaches have been suggested to solve it. They may be roughly classified into four categories, namely graph theoretical [1]–[3], mathematical programming [4]–[8], state space search [9]–[14] and heuristics [15]–[18]. The graph theoretical method uses a graph to represent the problem and then finds an optimal allocation by applying maximum flow/minimum cut algorithms to the graph. The mathematical programming approach formulates the task allocation problem as an optimization problem and then solves it using mathematical programming techniques such as Branch-and-Bound.
The state space search approach uses the well-known A* algorithm from artificial intelligence to traverse a search tree in search of an optimal solution. The heuristic methods search for a near-optimal solution by using special parameters that affect the system indirectly.

Due to the complexity of the allocation problem, most reported algorithms yield suboptimal solutions, while optimal allocation exists only for restricted cases or for small problems. In general, an optimal solution can be found through an exhaustive enumeration of all possible assignments and their costs. However, there are $N^M$ ways in which M tasks can be assigned to N processors, so an exhaustive search is often not possible.

This paper presents an optimal, memory-efficient algorithm for allocating a parallel program onto the processors of a distributed system so as to minimize the program completion time. Based on the so-called minimax criterion, the allocation problem is first formulated as an optimization problem and then solved by applying our algorithm. The algorithm is derived from the well-known Branch-and-Bound technique, with two distinct features to minimize its computational time and to increase the efficiency of using the system memory. First, the algorithm follows a best-first branch strategy for selecting a node to be expanded. This property speeds up the search for the first candidate solution. Second, the algorithm handles tasks at the tree levels according to a pre-specified order. That is, it orders the application tasks by decreasing number of neighbors. If two tasks have the same degree, the task with the higher communication requirements is considered first. This property reduces the total number of generated nodes, and so reduces the computational time of the algorithm and increases the efficiency of using the system memory.

The remainder of the paper is organized as follows. Section 2 presents the general structure of the allocation model. A mathematical model for the allocation problem is presented in Section 3. Section 4 illustrates the Branch-and-Bound technique and describes our algorithm. Some experimental results are presented in Section 5. Finally, concluding remarks are given in Section 6.

2 Framework of the Allocation Problem

The task allocation problem is usually addressed using a graph representation of the application program. Typically, a distributed application is represented as a collection of tasks which correspond to nodes in a graph, while the arcs of the graph may represent communication between tasks, precedence relations or both. Both directed and undirected graphs may be used to represent the application. A directed graph may be used to represent execution dependencies in terms of precedence relations between tasks. In contrast, an undirected graph may be used to represent information exchange and concurrent execution between tasks. This paper considers the case where the parallel program is represented by an undirected task graph G(V,E), where V represents the set of M nodes (tasks) and E represents the set of edges. In the graph, each task i ∈ V is labeled with its memory requirements and each arc (i, j) ∈ E is labeled with the communication requirements between tasks i and j.

On the other hand, the distributed computing system consists of N heterogeneous computers connected via an interconnection network. Each computer has some computation facilities and its own memory. Furthermore, the interconnection network has some communication capacity and a cost of transferring a data unit from a sending computer to a receiving computer. The issue is how to map the task graph onto the distributed system such that the requirements of tasks and edges are met and the capacities of the system resources are not violated.

A general allocation model is depicted in Figure 1. In the model, the application program is assumed to be available in the form of an undirected task graph, as shown in Figure 1(a), and is fed into the allocator to be mapped onto a distributed computing system with bus topology, as shown in Figure 1(b).
The allocator first formulates the allocation problem as an optimization problem and then distributes the application tasks among the available processors of the distributed system by applying the proposed algorithm. The final result constitutes a set of clusters, where each cluster represents the set of tasks assigned to one processor for execution.

[Figure 1: General Allocation Model. (a) Task Interaction Graph: tasks t1 to t7 connected by weighted edges. (b) Distributed Computer System: the allocator maps the tasks onto computers C1 to Cn, each with a processor (P1 to Pn) and a memory (M1 to Mn), connected by an interconnection network; TC1 to TCn denote the clusters of tasks at each processor.]
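As a minimal illustration of the allocation model in this section, the task interaction graph and the resulting clusters can be held in plain Python dictionaries. The task names, weights and assignment below are invented for illustration and are not the values of Figure 1; the helper names are hypothetical as well.

```python
from collections import defaultdict

# Hypothetical undirected task graph G(V, E): memory requirement per task
# and communication requirement per edge (each edge is stored once).
memory_req = {"t1": 2, "t2": 1, "t3": 3, "t4": 2}
comm_req = {("t1", "t2"): 4, ("t1", "t3"): 1, ("t2", "t4"): 3, ("t3", "t4"): 2}

# Hypothetical allocation produced by some allocator: task -> computer.
assignment = {"t1": "C1", "t2": "C1", "t3": "C2", "t4": "C2"}


def clusters(assignment):
    """Group tasks by the computer they are assigned to."""
    groups = defaultdict(list)
    for task, computer in sorted(assignment.items()):
        groups[computer].append(task)
    return dict(groups)


def cut_edges(comm_req, assignment):
    """Edges whose endpoints sit on different computers (they use the network)."""
    return [e for e in comm_req if assignment[e[0]] != assignment[e[1]]]


def memory_used(memory_req, assignment):
    """Total memory demanded on each computer."""
    used = defaultdict(int)
    for task, computer in assignment.items():
        used[computer] += memory_req[task]
    return dict(used)


task_clusters = clusters(assignment)
network_edges = cut_edges(comm_req, assignment)
mem = memory_used(memory_req, assignment)
print(task_clusters, network_edges, mem)
```

Only the edges returned by `cut_edges` load the interconnection network; edges inside a cluster are assumed to cost nothing, which matches the model of Section 3.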


3 Integer Programming Model

This section presents a mathematical model of the allocation problem based on the minimax criterion. Generally, a processor load comprises all the execution and communication costs associated with its assigned tasks. The time needed by the bottleneck processor (i.e., the most heavily loaded processor) determines the entire program completion time. Therefore, the completion time may be minimized by minimizing the total time required at the most heavily loaded processor to finish the work assigned to it.

Given an application graph composed of M communicating tasks, a distributed computer system composed of N available computers with bus topology, and the following information:

$m_i$: memory requirements of task i,
$d_{ij}$: data to be transferred between tasks i and j,
$b_{ij}$: communication capacity requirements of edge (i,j),
$M_p$: available memory capacity of computer p,
$A_{lan}$: available communication capacity of the shared transmission media,
$cc_{avg}$: average communication cost of a data unit through the LAN,
$C_{ip}$: cost of processing task i on computer p.

Let the assignment variables be $X_{ip}$, such that $X_{ip} = 1$ if task i is assigned to computer p and $X_{ip} = 0$ otherwise. Assume that each edge (i,j) of the application graph represents one data flow between the two communicating tasks i and j, and define binary variables $Y_{ijp}$ such that $Y_{ijp} = 1$ if the two communicating tasks i and j of edge (i,j) are assigned to different processors p and q, while $Y_{ijp} = 0$ if the two tasks are assigned to the same processor p. Then the allocation problem may be formulated as a pure 0-1 Integer Programming (IP) problem:

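\begin{align}
& \min \; L_{max} \tag{1} \\
& \text{s.t.} \notag \\
& \sum_{p} X_{ip} = 1 && \forall \text{ tasks } i \tag{2} \\
& \sum_{i} m_i X_{ip} \le M_p && \forall \text{ computers } p \tag{3} \\
& \sum_{p} \sum_{(i,j) \in E} \frac{b_{ij}\, Y_{ijp}}{2} \le A_{lan} && \forall \text{ shared bus } lan \tag{4} \\
& \sum_{i} C_{ip} X_{ip} + \sum_{(i,j) \in E} (d_{ij} \cdot cc_{avg})\, Y_{ijp} \le L_{max} && \forall \text{ computers } p \tag{5} \\
& X_{ip} - X_{jp} \le Y_{ijp} \tag{6} \\
& -X_{ip} + X_{jp} \le Y_{ijp} \tag{7}
\end{align}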

In the model, the cost function minimizes the largest time $L_{max}$ at a bottleneck computer p, where $L_{max}$ is the total time required for processing all tasks assigned to p and for communicating with the other computers q, as defined in constraint (5). Several constraints are incorporated into the model to meet the application requirements without violating the system resources. The first constraint set (2) guarantees that each task is assigned to exactly one computer. The second constraint set (3) guarantees that the total memory required by the tasks assigned to computer p does not exceed the available memory of p. The third constraint set (4) guarantees that the total communication capacity required by all arcs placed on the transmission media lan does not exceed the available capacity of the lan. The fourth constraint set (5) defines the maximum time at a bottleneck computer p. Finally, constraints (6) and (7) link the two sets of variables: together they force $Y_{ijp} = 1$ whenever exactly one of the tasks i and j is assigned to computer p.
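To make the model concrete, the following minimal Python sketch evaluates a complete assignment against constraints (3) to (5): it computes the load of each computer, takes the maximum as $L_{max}$, and checks the memory and bus capacities. The function name `evaluate` and all numerical values are assumptions made for illustration. $Y_{ijp}$ is taken as $|X_{ip} - X_{jp}|$, the smallest value allowed by constraints (6) and (7).

```python
def evaluate(assign, C, m, d, b, M_cap, A_lan, cc_avg):
    """Return (L_max, feasible) for a complete assignment.

    assign[i] is the computer index of task i; C[i][p] is the cost of
    running task i on computer p; d and b map an edge (i, j) to its data
    volume and its capacity requirement.
    """
    n_comp = len(M_cap)
    load = [0.0] * n_comp      # left-hand side of constraint (5), per computer
    mem = [0.0] * n_comp       # left-hand side of constraint (3), per computer
    for i, p in enumerate(assign):
        load[p] += C[i][p]
        mem[p] += m[i]

    bus = 0.0                  # left-hand side of constraint (4)
    for (i, j), data in d.items():
        if assign[i] != assign[j]:
            # Y_ijp = 1 for both p = assign[i] and p = assign[j].
            load[assign[i]] += data * cc_avg
            load[assign[j]] += data * cc_avg
            bus += b[(i, j)]   # two Y terms contributing b_ij / 2 each

    feasible = all(mem[p] <= M_cap[p] for p in range(n_comp)) and bus <= A_lan
    return max(load), feasible


# Hypothetical instance: 3 tasks, 2 computers.
C = [[4, 6], [3, 2], [5, 4]]
m = [2, 1, 2]
d = {(0, 1): 2, (1, 2): 3}
b = {(0, 1): 1, (1, 2): 1}
M_cap, A_lan, cc_avg = [4, 4], 2, 0.5

print(evaluate([0, 0, 1], C, m, d, b, M_cap, A_lan, cc_avg))
print(evaluate([0, 1, 1], C, m, d, b, M_cap, A_lan, cc_avg))
```

On this invented instance, moving task 1 next to task 2 removes the heavier cut edge and lowers the bottleneck load, which is exactly the trade-off the minimax objective captures.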

4 Allocation Algorithm

This section first illustrates the Branch-and-Bound technique. Then it explains how the Branch-and-Bound technique was employed as an allocation algorithm in the literature. Finally it presents our modifications to the Branch-and-Bound technique and describes the algorithm operation.


4.1 Branch-and-Bound Technique

For a 0-1 IP problem such as the above minimization problem, the Branch-and-Bound (BB) algorithm finds an optimal solution by enumerating all possible solutions through constructing and traversing a binary search tree [19]. The algorithm starts at a root node and computes the node cost by solving the current problem. It then proceeds by alternating branching and bounding operations, using Backtracking or Jumptracking to select the node to be expanded. Each edge in the tree represents an assignment of a task to a computer, while each node corresponds to a partial or complete allocation. The branching process expands the selected node into two child nodes by taking one of the variables $X_{ip}$ that does not yet have a binary value (0 or 1) and setting $X_{ip} = 1$ on the first branch and $X_{ip} = 0$ on the second branch. The bounding process evaluates the cost of the current node to determine whether the node can lead to an optimal solution. A node is treated as infeasible if it does not satisfy the constraints or if its cost does not improve on the current incumbent solution. An infeasible node may be fathomed, which prevents the generation of its subtree.

4.2 Branch-and-Bound as Allocation Algorithm

Generally, using the above concepts to solve the task allocation problem requires considerable computational time to trace all the generated nodes. In the worst case, tracing a binary search tree requires $O(2^{N \cdot M + 1} - 1)$ computational time for M tasks and N computers. Moreover, a large amount of memory is required to save all the untraced nodes. To reduce the search space, some of the existing algorithms construct and traverse a general search tree. They start with a root node of null assignment, assign task i to each of the available computers, evaluate the generated nodes to decide whether each node may be fathomed or must wait for later expansion, and then branch a node considering task i+1, and so on until an optimal solution is found. In the worst case, using these concepts for tracing the tree nodes decreases the computational time to $O\left(\frac{N^{M+1} - 1}{N - 1}\right)$. Although the worst-case computational time is improved, finding an exact solution still requires substantial computational time.
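To give a feel for the two worst-case bounds, the short Python sketch below evaluates both node counts for a system of 4 computers, the size used in Section 5. The function names are hypothetical; the formulas are the complete binary tree over the N·M binary variables and the complete N-ary tree with one level per task.

```python
def binary_tree_nodes(n_computers, n_tasks):
    """Nodes of a complete binary tree over the N*M binary variables X_ip."""
    return 2 ** (n_computers * n_tasks + 1) - 1


def general_tree_nodes(n_computers, n_tasks):
    """Nodes of a complete N-ary tree with one level per task (needs N >= 2)."""
    return (n_computers ** (n_tasks + 1) - 1) // (n_computers - 1)


for n_tasks in (4, 8, 12):
    print(n_tasks, binary_tree_nodes(4, n_tasks), general_tree_nodes(4, n_tasks))
```

Both counts grow exponentially with the number of tasks, which is why pruning and task ordering matter even for the smaller general tree.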

4.3 Modified Branch-and-Bound Algorithm

From the above minimization problem, it is clear that the size of the assignment problem increases as the number of tasks and/or the number of computers increases. However, we observed that most of the existing algorithms try to solve the original minimization problem. An important property that is not considered in the literature is that the size of the assignment problem can be reduced by constructing and solving the dual problem. Another important property that is not considered in the literature is branching the tree nodes according to a specific order of the given tasks. Most of the literature handles tasks in sequence, i.e., task 1 at the first tree level, task 2 at the second level, and so on until an optimal solution is found. However, if the tasks are handled according to one of the priorities that affect task placement, fewer nodes may be generated and the optimal solution may be found faster. In our case, the above minimization problem is converted into a maximization problem by duality, and the application tasks are ordered by degree, i.e., the task with the largest number of neighbors comes first. If two tasks have the same degree, the task with the higher communication requirements is considered first. In addition, the following modifications are applied to the Branch-and-Bound rules:

• Selection rule: the selection rule selects the node to be expanded. We use a best-first branch strategy. That is, among all the unsolved nodes, the one with the maximum cost is selected for expansion.
• Branching rule: the branching rule defines the scheme for generating the children of a selected node. In our case, a selected node is branched by assigning the task from the ordered list that corresponds to the current tree level to each of the available computers.
• Elimination rule: the elimination rule eliminates newly generated nodes. In our case, after the first candidate solution is obtained, all the generated nodes that have a cost lower than the current candidate cost are removed from the waiting list.

The algorithm works as follows. Initially, it starts at a root node of null assignment, assigns the first task in the ordered list to each available computer, evaluates each generated node by using the Simplex algorithm to find the current cost (CurCost) after mapping the selected task onto a computer, and then follows a best-first branch strategy for selecting a node to be expanded. A selected node is branched by assigning the task corresponding to the current tree level to each of the available computers. Throughout the BB algorithm, a best cost (BestCost) is maintained; it is initially set to negative infinity and is updated to max(BestCost, CurCost) whenever a feasible leaf node is reached. A node is fathomed if it has an infeasible solution or a cost lower than BestCost; otherwise it is saved for later expansion. Moreover, when a leaf node is found, all the generated nodes that have a cost lower than the leaf node cost are eliminated. These restrictions cause more nodes to be fathomed, and so decrease the computation time of the algorithm as well as the memory required for saving the generated nodes. Finally, when the algorithm terminates, the optimal cost is the value of BestCost and the optimal solution is given by the values of the decision variables at the leaf node corresponding to BestCost.
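The ordering, selection, branching and elimination rules of this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: instead of solving the dual problem with the Simplex algorithm, it works directly on the minimization problem and bounds a node by the largest load of its partial assignment, which can only grow as further tasks are placed (all costs are assumed non-negative). Consequently the best-first rule here expands the node with the lowest bound. The function names `order_tasks`, `partial_cost` and `allocate` and all data are hypothetical.

```python
import heapq


def order_tasks(n_tasks, d):
    """Order tasks by decreasing degree, ties broken by total communication."""
    degree = [0] * n_tasks
    comm = [0.0] * n_tasks
    for (i, j), data in d.items():
        degree[i] += 1
        degree[j] += 1
        comm[i] += data
        comm[j] += data
    return sorted(range(n_tasks), key=lambda t: (-degree[t], -comm[t], t))


def partial_cost(placed, C, m, d, b, M_cap, A_lan, cc_avg):
    """Largest load of a partial assignment, or None if it is infeasible."""
    load = [0.0] * len(M_cap)
    mem = [0.0] * len(M_cap)
    for i, p in placed.items():
        load[p] += C[i][p]
        mem[p] += m[i]
    bus = 0.0
    for (i, j), data in d.items():
        if i in placed and j in placed and placed[i] != placed[j]:
            load[placed[i]] += data * cc_avg
            load[placed[j]] += data * cc_avg
            bus += b[(i, j)]
    if bus > A_lan or any(u > cap for u, cap in zip(mem, M_cap)):
        return None
    return max(load)


def allocate(C, m, d, b, M_cap, A_lan, cc_avg):
    """Best-first Branch-and-Bound; returns (L_max, assignment) or None."""
    order = order_tasks(len(C), d)
    best_cost, best = float("inf"), None
    heap = [(0.0, 0, ())]            # (bound, tie-breaker, computers chosen so far)
    counter = 1
    while heap:
        bound, _, chosen = heapq.heappop(heap)   # selection rule: lowest bound
        if bound >= best_cost:       # elimination rule: cannot beat the incumbent
            continue
        if len(chosen) == len(order):            # feasible leaf: new incumbent
            best_cost, best = bound, dict(zip(order, chosen))
            continue
        for p in range(len(M_cap)):  # branching rule: next task on every computer
            child = chosen + (p,)
            cost = partial_cost(dict(zip(order, child)), C, m, d, b,
                                M_cap, A_lan, cc_avg)
            if cost is not None and cost < best_cost:
                heapq.heappush(heap, (cost, counter, child))
                counter += 1
    return None if best is None else (best_cost, best)


# Hypothetical instance: 3 tasks, 2 computers.
C = [[4, 6], [3, 2], [5, 4]]
m = [2, 1, 2]
d = {(0, 1): 2, (1, 2): 3}
b = {(0, 1): 1, (1, 2): 1}
result = allocate(C, m, d, b, [4, 4], 2, 0.5)
print(order_tasks(3, d), result)
```

Because the bound of a leaf equals its true cost and bounds never decrease along a path, the first leaf popped from the priority queue is optimal in this sketch; every node still waiting has a bound at least as large and is discarded by the elimination test.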


5 Experimental Results

To show the effectiveness of the proposed algorithm against other algorithms derived from the BB technique, four algorithms were coded in Matlab and tested on a large set of randomly generated application graphs allocated to a distributed computing system of 4 computers with bus topology, whose resource availabilities were also randomly generated. The first and second algorithms construct a binary search tree and use Backtracking and Jumptracking, respectively, for tracing the tree nodes. The third algorithm constructs a search tree by assigning a task to each of the available computers, uses a best-first branch strategy to select a node for expansion and handles tasks at the tree levels in sequence (Sequence list). The fourth algorithm uses the same principles as the third algorithm but handles tasks at the tree levels according to the Ordered list. Furthermore, all the algorithms solve the dual problem. The criteria adopted for evaluation, as shown in Figure 2, are the total number of generated nodes, the total number of expanded nodes, the computation time of each algorithm and the memory saving rates with respect to Backtracking.

[Figure 2: Performance Evaluation. (a) Generated Nodes vs. Number of Tasks; (b) Expanded Nodes vs. Number of Tasks; (c) Computation Time (sec) vs. Number of Tasks; (d) Memory Saving Rates (%) vs. Number of Tasks. Curves: Backtracking, Jumptracking, Sequence and Ordered; panel (d) shows Jumptracking, Sequence and Ordered only.]

The results show that, for a given number of tasks, the numbers of generated and expanded nodes are lower with Jumptracking than with Backtracking, which leads to a lower computation time for Jumptracking. This is because Jumptracking obtains the first candidate solution faster, so more nodes may be pruned. On the other hand, both algorithms that use the best-first branch strategy require little computational time to find an optimal allocation. This is because, with the best-first branch strategy, the first candidate solution may be found faster and many subtrees may be eliminated, so fewer nodes are evaluated. However, the algorithm that uses the Ordered list is faster than the algorithm that uses the Sequence list. This means that the order in which the algorithm handles the application tasks at the tree levels affects its performance. The results also show that, as the number of tasks increases, the problem size increases and the number of nodes generated by most of the algorithms increases. However, the rate of increase for the algorithm that handles tasks according to the pre-specified order is very small. As the number of generated nodes decreases, both the computational time of the algorithm and the memory required for saving the generated nodes decrease.


6 Conclusions

This paper presented an allocation algorithm for mapping the tasks of a parallel program onto a heterogeneous distributed computing system so as to minimize the program completion time. The algorithm is derived from the Branch-and-Bound (BB) technique, with two distinct features to increase its efficiency. It follows a best-first branch strategy for selecting a node to be expanded and handles tasks at the search tree levels in order of decreasing number of neighbors. If two tasks have the same degree, the algorithm considers the task with the higher communication requirements first. The simulation results show that constructing the search tree by assigning each task to each of the available computers and using the best-first branch strategy to traverse the tree nodes reduces the number of generated nodes. Moreover, the order in which the algorithm considers tasks at the tree levels considerably affects its performance. An improvement in performance can be realized if the application tasks are handled according to one of the factors that affect task placement, such as the number of neighbors of each task.

References

[1] G.S. Rao, H.S. Stone and T.C. Hu, "Assignment of Tasks in a Distributed Processor System with Limited Memory", IEEE Transactions on Computers, C-28, 4, 1979.
[2] C.-H. Lee, D. Lee and M. Kim, "Optimal Task Assignment in Linear Array Networks", IEEE Transactions on Computers, 41, 7, 877-880, 1992.
[3] C.-C. Hui and S.T. Chanson, "Allocating Task Interaction Graphs to Processors in Heterogeneous Networks", IEEE Transactions on Parallel and Distributed Systems, 8, 9, 1997.
[4] P.R. Ma, E.Y. Lee and M. Tsuchiya, "A Task Allocation Model for Distributed Computing Systems", IEEE Transactions on Computers, C-31, 1, 1982.
[5] M.-S. Chern, G.H. Chen and P. Liu, "An LC Branch-and-Bound Algorithm for The Module Assignment Problem", Information Processing Letters, 32, 61-71, 1989.
[6] A. Billionnet, M.C. Costa and A. Sutter, "An Efficient Algorithm for a Task Allocation Problem", Journal of ACM, 39, 3, 502-518, 1992.
[7] Y.-C. Ma and C.-P. Chung, "A Dominance Relation Enhanced Branch-and-Bound Task Allocation", Journal of Systems and Software, 58, 125-134, 2001.
[8] G. Attiya and Y. Hamam, "Static Task Assignment in Distributed Computing Systems", 21st IFIP TC7 Conference on System Modeling and Optimization, Sophia Antipolis, Nice, France, 2003.
[9] C.C. Shen and W.H. Tsai, "A Graph Matching Approach to Optimal Task Assignment in Distributed Computing Systems Using a Minimax Criterion", IEEE Transactions on Computers, C-34, 3, 197-203, 1985.
[10] J.B. Sinclair, "Efficient Computation of Optimal Assignments for Distributed Tasks", Journal of Parallel and Distributed Computing, 4, 342-362, 1987.
[11] S.M. Shatz, J.P. Wang and M. Goto, "Task Allocation for Maximizing Reliability of Distributed Computer Systems", IEEE Transactions on Computers, 41, 9, 1156-1168, 1992.
[12] S. Kartik and C. Murthy, "Task Allocation Algorithms for Maximizing Reliability of Distributed Computing Systems", IEEE Transactions on Computers, 46, 6, 1997.
[13] M. Kafil and I. Ahmed, "Optimal Task Assignment In Heterogeneous Distributed Computing Systems", IEEE Concurrency, 6, 3, 42-51, 1998.
[14] A. Tom and C. Murthy, "Optimal Task Allocation in Distributed Systems by Graph Matching and State Space Search", Journal of Systems and Software, 46, 59-75, 1999.
[15] V.M. Lo, "Heuristic Algorithms for Task Assignment in Distributed Systems", IEEE Transactions on Computers, 37, 11, 1384-1397, 1988.
[16] P. Bouvry, J. Chassin and D. Trystram, "Efficient Solution for Mapping Parallel Programs", Proceeding of EuroPar'95, Volume 966 of LNCS, 379-390, 1995.
[17] J. Aguilar and E. Gelenbe, "Task Assignment and Transaction Clustering Heuristics for Distributed Systems", Information and Computer Sciences, 97, 199-219, 1997.
[18] Y. Hamam and K.S. Hindi, "Assignment of Program Modules to Processors: A Simulated Annealing Approach", Journal of Operational Research, 122, 509-513, 2000.
[19] W.L. Winston, "Operations Research: Applications and Algorithms", Third Edition, Wadsworth Publishing Company, Belmont, California, 1994.
