Scheduling Replicated Critical Tasks in Faulty Networks Using Evolutionary Strategies

Garrison Greenwood and Ajay Gupta
Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008
[email protected]

Mark Terwilleger
Dept. of Math. & Comp. Sci., Lake Superior State Univ., Sault Ste. Marie, MI 49783
[email protected]
ABSTRACT
Scheduling tasks in distributed systems is a difficult problem. Finding good schedules becomes even more complex when the network is faulty and there are no additional resources available. This paper presents a technique using evolutionary strategies to find task schedules in such systems. Our results indicate that good schedules can be found even when critical tasks must be replicated on distinct processors.
1. Introduction

Fault tolerant distributed computing systems must have the capability of detecting hardware failures and continuing to operate, albeit in a somewhat degraded manner. The goal in any fault tolerant system is to recover to an operational state in the minimum amount of time. Fault tolerance is normally achieved in distributed systems by providing redundant resources that can replace faulty resources. N-way programming may also be used. This technique requires that critical sections of code be replicated and assigned to distinct processors. This provides both physical and logical fault tolerance [1].

We consider the class of fault tolerant distributed systems in which there are no additional resources available to replace faulty resources. This situation occurs where there are space, power, or other design constraints. In such systems, the recovery mechanism must redistribute tasks from faulty processors to the remaining processors. This reassignment becomes more difficult if replicated tasks must still be assigned to distinct processors.

In general, task assignment problems are NP-hard [7]. Thus heuristic approaches must be adopted to perform task scheduling. Most heuristic scheduling algorithms suffer from excessive computation time. For example, although simulated annealing algorithms can find good schedules, the long execution time has been cited as a disadvantage of the technique [2]. Recently there has been a great deal of interest (and success) in using evolutionary algorithms to quickly find reasonable task assignments in fully operational networks [4, 5]. This raises a question: if evolutionary algorithms can find good task assignments in fully operational systems, why can't they find them in faulty systems as well?

Previously we have shown that evolutionary techniques can efficiently find reasonable schedules in faulty networks [6]. However, that work did not consider systems which have critical tasks replicated on distinct processors to achieve fault tolerance. This replication requirement must be maintained in the faulty network, which makes the scheduling problem even more difficult. In this paper we extend our previous work by using evolutionary techniques to find schedules which satisfy this requirement for critical tasks.

Specifically, this paper investigates the capability of evolutionary strategies (ES) in task scheduling problems on faulty networks of the type described above. We have selected a ring topology of p >= 2 homogeneous processors as our system for investigation. For such systems, a total of 5 potential hardware faults are identified. Using ES, we assign large random task graphs and large binary tree task graphs (modeling divide-and-conquer algorithms) onto networks with various faults and quickly determine reasonable task assignments. We also consider cases where tasks are replicated to achieve fault tolerance. Our results indicate that the ES is a viable technique that can quickly find task redistributions in faulty networks that will exhibit reasonable speedups.

(Research supported in part by a Fellowship from the Faculty Research and Creative Activities Support Fund, WMUFRACASF 94-040, and by the National Science Foundation under grant CCR-9405377.)

2. Problem Description

The distributed systems of interest consist of a number of processors (with an associated set of
resources) interconnected in a ring topology. All processors are homogeneous and communicate over the interconnection network via message passing. We assume that each processor contains hardware for performing computations and additional routing circuitry for handling communications. Thus processor failures and communication failures are independent. Processor failures are fail-stop (Byzantine failures are not considered). Faulty links prevent all message traffic from being transmitted over the link. It is assumed that all failures are permanent and detectable. In such systems, the following faults can occur:

1. One or more non-adjacent processors fail.
2. One or more adjacent processors fail.
3. A single link fails.
4. Two adjacent links fail.
5. Two or more non-adjacent links fail.

In ring networks with P processors, these faults produce different faulty network topologies. For example, faults 1 and 2 reduce the number of available processors but the ring topology is preserved (though communication latencies are now increased). Fault 3 forms a P-processor linear array, while fault 4 isolates a reliable processor and forms at most a (P-1)-processor linear array. Fault 5 forms two or more linear arrays.

In some systems, critical tasks are replicated and run on distinct processors in order to achieve fault tolerance. For example, suppose accurate sensor data is crucial to proper system operation. By providing multiple sensors interfaced to distinct processors (with a copy of the sensor processing task on each processor), both physical and logical fault tolerance is achieved.

The tasks to be scheduled in our distributed systems have the following characteristics:

1. Tasks have a defined precedence. This can be represented as a digraph where vertices represent tasks and a directed arc from a task points to its successor.
2. Each task has a known execution time.
3. No task or subtask can be preempted once it has begun execution.
4. In support of fault tolerance, certain critical tasks will be replicated. Each replicated task must be assigned to a unique processor.
5. The size of all messages transferred between a task and its successor is known.
6. Communication delays are bounded.

Any scheduling algorithm must assign each task to a processor and specify the order of execution on that processor. While there are many possible schedules, not all of them are desirable. For example, communication in distributed systems is costly. These costs can be minimized by assigning tasks
that communicate with each other to the same processor. However, this approach does not take advantage of any concurrency present in the application, and it is not even permitted if it assigns replicated tasks to the same processor. As mentioned earlier, finding solutions to scheduling problems is NP-hard [7].
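Each of the fault classes described above reduces the ring to one or more linear arrays of surviving processors. The following sketch (function and variable names are our own; the paper gives no code) computes those surviving segments with a union-find pass over the working links:

```python
# Hypothetical illustration: connected components of a faulty ring, where
# link i connects processor i to processor (i + 1) % P.

def surviving_segments(P, failed_procs=(), failed_links=()):
    """Return the groups of working processors still connected to each other."""
    failed_procs = set(failed_procs)
    failed_links = set(failed_links)

    parent = list(range(P))          # union-find forest over processors
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Union the two endpoints of every link that is still usable.
    for i in range(P):
        j = (i + 1) % P
        if i in failed_procs or j in failed_procs or i in failed_links:
            continue
        parent[find(i)] = find(j)

    groups = {}
    for i in range(P):
        if i not in failed_procs:
            groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Fault 4 from the paper: two adjacent links fail, isolating one processor
# and leaving at most a (P-1)-processor linear array.
print(surviving_segments(8, failed_links=[0, 1]))  # [[0, 2, 3, 4, 5, 6, 7], [1]]
```

With no faults the function returns the whole ring as one segment; processor failures remove the processor and implicitly disable both of its links.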
3. The ES Implementation

ES are based upon the principles of adaptive selection found in the natural world. Each generation (iteration of the ES algorithm) takes a population of individuals (potential solutions) and modifies the genetic material (problem parameters) to produce new offspring. Both the parents and the offspring are evaluated, but only the most fit individuals (better solutions) survive over multiple generations. ES have been successfully used to solve various types of optimization problems. The reader is referred to Fogel for an excellent discussion of ES techniques [3].

The particular genetic encoding for an individual is referred to as the genotype. New individuals are created by special operations (e.g., mutation) which modify the genetic material. Decoding this genetic material gives the observed characteristics of the individual, which is referred to as the phenotype. In our case, the genotype consists of P integer lists which reflect the tasks allocated to the P processors in the distributed system. The left-to-right order of tasks in a list indicates the order of execution. The phenotype is the resulting schedule based upon this task allocation and execution ordering.

Every point in the search space is an individual. The ES uses a population of individuals to search for task allocations that will yield good schedules. The initial population is randomly generated but, ideally, should be uniformly distributed throughout the search space so that all regions may be explored. During each generation, the individuals are mutated to produce offspring. This means the ES is simultaneously investigating several regions of the search space, which greatly decreases the amount of time required to locate a good schedule. Those regions of the search space containing individuals which represent reasonable schedules are selected for further investigation.
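The genotype just described can be sketched as follows (a hypothetical illustration in Python; the names are ours, and the paper itself gives no code):

```python
import random

# Sketch of the genotype: P integer lists, one per processor; the
# left-to-right order within a list is the execution order on that processor.

def random_genotype(num_tasks, P, rng=random):
    """Build one individual: tasks 1..num_tasks (the paper numbers tasks by
    a breadth-first search of the task graph) are taken in numerical order
    and each appended to a randomly chosen processor's list."""
    genotype = [[] for _ in range(P)]
    for task in range(1, num_tasks + 1):
        genotype[rng.randrange(P)].append(task)
    return genotype

# A population is simply a list of such individuals.
population = [random_genotype(15, 2) for _ in range(10)]
```

Decoding a genotype into its phenotype (the schedule) then only requires simulating each processor's list in order, which is how fitness is evaluated below.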
Search operations are conducted at the genotype level while selection for survival is done at the phenotype level. Each individual in each generation is evaluated to determine its fitness. Individuals with high fitness represent task assignments in which no precedence constraints are violated and the schedule length (i.e., the time required to execute all tasks) is low. The ES terminates after a fixed number of generations have been produced and evaluated, or earlier if a good schedule is found. The ES approach is implemented as follows:
1. Conduct a breadth-first search of the task graph, numbering the vertices in the order visited.
2. Create an initial population of individuals by selecting vertices from the task graph in numerical order and randomly assigning them to processors in the distributed system. Task numbers are appended onto an array corresponding to the selected processor. The left-to-right order of the array dictates the order of task execution.
3. For each individual, generate offspring by applying a mutation operator (described below).
4. Evaluate all individuals to determine their fitness. This is done by computing the schedule length based upon the indicated task assignments and their order of execution.
5. Select the fittest individuals for survival. Discard the other individuals.
6. Proceed to step 3 unless a suitable solution has been found or the generation limit has been reached.

New individuals (offspring) are created by applying a mutation operator to the current members of the population (parents). Each parent creates a single offspring each generation; parents and offspring compete equally for survival. Selection is purely deterministic, as only the best survive. Mutation was implemented by randomly selecting a thread of execution from one processor and inserting it into the thread of execution on another processor. The processors, the selection points within the execution threads, and the size of the thread to be moved are all randomly chosen. Mutation is adaptive since the upper and lower bounds on the number of tasks to be moved depend upon the total number of tasks currently assigned to the losing processor.

We illustrate the effect this has on the task assignments with the following example. Consider an assignment of 15 tasks on a 2-processor distributed system. (These concepts are easily extended to the case where there are P > 2 processors and several times this number of tasks.) The "*"s bracket the execution thread on processor 1 which will be transferred to processor 2; the "^" indicates where this thread is to be inserted.

Initial Assignment

Processor 1: 1 5 9 * 4 2 * 8 6 11 14 15 13
Processor 2: 3 7 ^ 12 10

Mutated Assignment

Processor 1: 1 5 9 8 6 11 14 15 13
Processor 2: 3 7 4 2 12 10
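The mutation operator above can be sketched as follows (our construction from the paper's description; names and random-bound choices are ours):

```python
import random

# Sketch of the adaptive mutation: move a randomly chosen contiguous
# "thread" of tasks from one processor's list to a random position in
# another processor's list.

def mutate(genotype, rng=random):
    offspring = [list(proc) for proc in genotype]    # copy; parent survives
    donors = [i for i, proc in enumerate(offspring) if proc]
    src = rng.choice(donors)                         # the losing processor
    dst = rng.choice([i for i in range(len(offspring)) if i != src])
    # Adaptive bound: thread length depends on the donor's current load.
    n = len(offspring[src])
    length = rng.randint(1, n)
    start = rng.randrange(n - length + 1)
    thread = offspring[src][start:start + length]
    del offspring[src][start:start + length]
    pos = rng.randrange(len(offspring[dst]) + 1)     # random insertion point
    offspring[dst][pos:pos] = thread
    return offspring

# The worked example above: all 15 tasks survive the move intact.
parent = [[1, 5, 9, 4, 2, 8, 6, 11, 14, 15, 13], [3, 7, 12, 10]]
child = mutate(parent)
assert sorted(t for p in child for t in p) == list(range(1, 16))
```

Note that mutation can never lose or duplicate a task; it only changes which processor runs a thread and where in that processor's order it runs.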
The fitness of a genotype is the resultant speedup that can be obtained with the associated task assignment. (Speedup in this case refers to the schedule length if all tasks were assigned to a single processor divided by the schedule length in the distributed system.) The ES uses fitness to determine which individuals survive into the next generation. Highly fit individuals delineate allocations which yield high speedups. Conversely, low-fitness individuals delineate allocations which yield low speedups, violate precedence constraints, or assign replicated tasks to the same processor.

Task allocations are considered invalid if a schedule for the tasks cannot be obtained. Recall that the execution order of tasks assigned to a processor is reflected in the left-to-right order of tasks in the processor list. Certain orderings of tasks violate the precedence constraints indicated by the task graph. Referring to the example assignment above, suppose task 4 must be executed before task 7. Clearly the mutation operation above would violate the precedence constraints. In such cases, no schedule length can be determined; the fitness of this individual is set extremely low so that it will not survive.

If N-way programming is used to replicate tasks, the execution times of the copies may differ slightly from those of the original task. This is because different copies of the task are usually written by different programmers. The copies may even be written in different high-level languages. We modeled this phenomenon by (randomly) varying the execution times of the copies by ±5% of the original task execution time.
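A simplified evaluator in this spirit (our construction; the paper's evaluator also charges per-hop communication time, which is omitted here) detects both kinds of invalid individuals and otherwise returns the speedup:

```python
# Hypothetical fitness sketch. A genotype is a list of per-processor task
# lists; exec_time maps task -> duration; preds maps task -> its predecessor
# tasks; replicas is a collection of groups of replicated copies that must
# sit on distinct processors.

def fitness(genotype, exec_time, preds, replicas=()):
    # Replication constraint: no two copies from a group on one processor.
    for group in replicas:
        for proc in genotype:
            if len(set(proc) & set(group)) > 1:
                return 1e-9                       # extremely low fitness

    finish = {}                       # task -> completion time
    pos = [0] * len(genotype)         # next list index per processor
    clock = [0.0] * len(genotype)     # per-processor elapsed time
    remaining = sum(len(p) for p in genotype)
    while remaining:
        progressed = False
        for i, proc in enumerate(genotype):
            if pos[i] == len(proc):
                continue
            task = proc[pos[i]]
            ps = preds.get(task, ())
            if all(p in finish for p in ps):      # predecessors all done?
                start = max([clock[i]] + [finish[p] for p in ps])
                finish[task] = start + exec_time[task]
                clock[i] = finish[task]
                pos[i] += 1
                remaining -= 1
                progressed = True
        if not progressed:            # deadlock: list order breaks precedence
            return 1e-9

    serial = sum(exec_time[t] for p in genotype for t in p)
    return serial / max(clock)        # speedup = serial time / makespan

exec_time = {1: 10, 2: 10, 3: 10}
preds = {3: (1, 2)}                   # task 3 needs tasks 1 and 2 first
print(fitness([[1, 3], [2]], exec_time, preds))  # 30 / 20 = 1.5
```

An individual that lists a task before its predecessor on the same processor stalls the simulation, which is how "no schedule length can be determined" surfaces here.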
4. Results

To evaluate the ES scheduling technique we embedded 3 sets of complete binary trees and three sets of randomly generated graphs into an 8-processor distributed system interconnected by a ring topology. (A random graph with N nodes had at least 2N edges.) The three sets had 127, 255, and 511 tasks, respectively, for both types of graphs. In each case up to 4 randomly selected tasks were considered critical and replicated two more times. Replicated tasks in the binary trees were randomly chosen and thus could include the root, interior, or leaf nodes. Replicated tasks maintain the same connectivity as that of the original node. This can dramatically increase the degree (number of incident edges) of some nodes, particularly for the random graphs.

Execution times for the tasks were randomly chosen and varied from 10 to 20 milliseconds. The schedule length must also consider the time spent to communicate intermediate results. Communication tasks were modeled as follows. Transferring a message from one processor to its neighbor takes approximately 2.3 milliseconds. (Our previous work has shown that this is a reasonable value [5].) Suppose task A assigned to one processor needs to pass a result to task B which is assigned to another processor. The communication latency is then 2.3d milliseconds, where d is the minimum number of links that must be traversed by the message. The communication task cannot be scheduled until the task which produced the message has completed execution. This, in turn, determines the earliest start time for task B.

As stated previously, link failures convert a ring topology into a linear array. Our results indicate that there is no significant speedup difference between a P-processor linear array and a P-processor ring topology. We conjecture that this is a result of using a small (only 8-processor) distributed system. Consequently, we only report the results for processor failures.

Eight runs of the ES scheduling technique were conducted and the average performance recorded. In all cases we used a population size of 100 and ran the ES for 100 generations. Figures 1 and 2 show the speedups obtained as the number of faulty processors increased. In both cases the 511-node task graphs show a near-linear decrease in speedup as the number of faulty processors increased. The 127-node and 255-node task graphs show a lower rate of speedup decrease for 1 or 2 processor faults and then provide performance similar to that of the 511-node task graph.

Fig. 1: Speedup on an 8-processor faulty ring for binary tree task graphs. [Plot: speedup (2.9 to 7.9) vs. number of faulty processors (0 to 5); curves for 127, 255, and 511 tasks.]

Fig. 2: Speedup on an 8-processor faulty ring for random task graphs. [Plot: speedup (2.6 to 6.6) vs. number of faulty processors (0 to 5); curves for 127, 255, and 511 tasks.]

The most interesting fact evident from the graphs is that the ES was able to find task allocations with excellent speedups even in the presence of failures. With a P-processor distributed system, the maximum theoretical speedup is P. With 3 failures,
our system has 5 operational processors (giving a theoretical maximum speedup of 5, ignoring communication costs). From Figures 1 and 2 it can be seen that the speedups for the 511-node task graphs are approximately 4.8 for the binary trees and 4.5 for the random graphs. The speedups are slightly lower for the smaller task graphs. This is not unexpected, since the communication-to-computation ratio increases as the task graphs become smaller.
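The latency model is straightforward to state in code (a sketch; the helper name is ours, and on a faulty ring the hop count d would be taken over the surviving links rather than the full ring):

```python
# Per-hop latency from the paper: ~2.3 ms between ring neighbours, so a
# message costs 2.3 * d ms, where d is the minimum hop count.

def comm_latency_ms(src, dst, P, per_hop_ms=2.3):
    """Latency of a message between processors src and dst on a
    fault-free P-processor ring."""
    d = abs(src - dst)
    d = min(d, P - d)              # the message may travel either direction
    return per_hop_ms * d

print(round(comm_latency_ms(0, 5, 8), 1))  # 3 hops the short way -> 6.9 ms
```

In the schedule evaluation this latency is added between a producing task's finish time and the earliest start time of its remote successor.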
5. Conclusions

We have introduced a heuristic technique for scheduling task graphs into faulty networks. This technique uses ES, which are based upon the biological properties of natural selection. Our test results indicate that this technique is capable of identifying suitable task reassignments that support any logical task distribution requirements (e.g., as the result of N-way programming). The actual speedups achievable will be slightly less than those indicated, as we did not consider network congestion when modeling the communication tasks. Nevertheless, our results do show that the ES is a viable technique for scheduling tasks in faulty networks.
References

[1] A. Avizienis, "The N-version Approach to Fault-Tolerant Software", IEEE Trans. on Software Engr., Vol. SE-11, No. 12, pp. 1491-1501, Dec. 1985.

[2] S. Bollinger and S. Midkiff, "Heuristic Techniques for Processor and Link Assignments in Multicomputers", IEEE Trans. on Computers, Vol. 40, No. 3, Mar. 1991.

[3] D. Fogel, Evolutionary Computation, IEEE Press, 1995.

[4] E. Hou, N. Ansari, and H. Ren, "A Genetic Algorithm for Multiprocessor Scheduling", IEEE Trans. on Parallel & Dist. Sys., Vol. 5, No. 2, pp. 113-120, Feb. 1994.

[5] G. Greenwood, A. Gupta, and K. McSweeney, "Scheduling Tasks in Multiprocessor Systems Using Evolutionary Strategies", Proc. of 1st IEEE Conf. on Evolutionary Computation, pp. 345-349, June 1994.

[6] G. Greenwood, A. Gupta, and M. Terwilliger, "Task Redistribution in Faulty Networks Using Evolutionary Strategies", Proc. of 1st Int'l Workshop on Parallel Proc., pp. 249-254, Dec. 1994.

[7] J. D. Ullman, "NP-complete Scheduling Problems", J. Comput. Syst. Sci., Oct. 1975.