Applicability of Simulated Annealing Methods to Real-Time Scheduling and Jitter Control

Marco Di Natale
Scuola Superiore S. Anna, Pisa, Italy
Department of Computer Science, University of Massachusetts, Amherst, MA 01003
[email protected]

John A. Stankovic
Department of Computer Science, University of Massachusetts, Amherst, MA 01003
[email protected]
FAX 413-545-1249
Abstract

This paper presents a non-conventional scheduling approach for distributed static systems where tasks are periodic and have arbitrary deadlines, precedence, and exclusion constraints. The solution presented in this work not only creates feasible schedules, but also minimizes jitter for periodic tasks. The problem of scheduling real-time tasks with minimum jitter is particularly important in many control applications; nevertheless, it has rarely been studied in the scientific literature. We present a general framework consisting of an abstract architecture model and a general programming model. We show how to design a surprisingly simple and flexible scheduling method based on simulated annealing and present some experimental results.
1 Introduction

Real-time distributed systems are becoming increasingly commonplace. Applications like process control, avionics, and robotics need real-time support to schedule and synchronize real-time tasks running on remote nodes. When the environment and the application characteristics are well known in advance, the worst-case conditions and the critical rates can be evaluated. The computations required by the application can be distributed over a set of periodically activated tasks scheduled according to a fixed pattern. The scheduler is executed off-line and uses the parameters of the task set to generate a table of activation times to be used by the local dispatchers. In some applications, the feasible scheduling of all the task instances within their deadlines is not sufficient to guarantee the correct behavior of the system. For example, the common semantics used for periodic tasks permits two successive instances of the same task to be separated by an amount of time that varies between
zero and two periods minus the minimum computation time of the task. In some cases this variation is a serious problem, and different scheduling solutions that minimize jitter are needed. Even if the task scheduling is performed off-line, finding a feasible ordering for thousands of task instances in the least common multiple of their periods is a difficult job, especially if we require synchronization and the usage of shared resources. Most of the research solutions try to solve the feasibility problem only (i.e., to schedule tasks within their deadlines). These solutions most often use heuristic-driven search algorithms to find a feasible task sequence. Good examples of state-of-the-art scheduling algorithms are the branch-and-bound algorithm by Xu and Parnas [10], where tasks with exclusion relations and precedence constraints are scheduled in a preemptive manner, and the Mars scheduler [5], which uses an iterative deepening search algorithm to solve the distributed scheduling problem. If we tried to use a traditional search-based solution of this kind, we would need a fairly complex heuristic function to achieve the secondary goal of minimizing the jitter. Finding a suitable heuristic function for the jitter minimization problem is likely to be difficult, as no solution has been proposed to our knowledge. In this paper we propose the application of simulated annealing techniques to static scheduling problems as an alternative to conventional search algorithms. As we show, a simulated annealing scheduler can deal with both jitter minimization and feasibility. The main contributions of this paper include:
- the design and implementation of a description language called DTDL that identifies the information that must be provided in order to perform distributed, real-time static scheduling; it serves as a controlled input to the scheduling tool itself, making the tool easier to use;
- the creation of a real-time scheduling tool based on simulated annealing, with innovative ways of specializing simulated annealing to find feasible schedules with minimized jitter;
- a demonstration of the performance and value of the algorithm on three example task sets.

In this paper, we focus our attention on a non-preemptive scheduling scheme. In our opinion, the use of non-preemptive schemes should not be ruled out a priori, as there are cases where application constraints or design considerations make non-preemption an attractive choice. For example, non-preemptive schemes allow better control over the assignment of processor time and ease the observability and debugging of the system. Furthermore, non-preemptive schemes allow better control of jitter than their preemptive counterparts. Lastly, the greater flexibility of preemptive schemes doesn't pay off in multiprocessor systems as it does in single-processor scheduling, as suggested by some theoretical results. In any case, it is possible to slightly modify the solution presented in this paper to allow a limited degree of preemption, as shown in [4]. At least one other work has been published on the application of simulated annealing techniques to real-time systems. Specifically, in [3] simulated annealing is applied to the task allocation problem: the allocation of the tasks is performed by simulated annealing, while the scheduling of the tasks is performed according to local fixed-priority policies. In our work, on the contrary, the assignment of the tasks to the processor(s) is at first assumed fixed, while the scheduling of the tasks on the local CPUs is computed with simulated annealing techniques. We also show how our method can be expanded to handle the allocation problem. In the following sections we show that simulated annealing techniques can be very
flexible and simple, and discuss the theoretical results indicating that the method is likely to avoid local minima and to find approximate solutions in polynomial time [1]. We believe that the extreme simplicity of the method and its applicability to problems that are hard to solve with conventional methods should be sufficient to make simulated annealing scheduling a valuable approach for real-time designers.
2 Scheduling Problem: Definitions

We consider the following scheduling problem: we assume a LAN connecting multiprocessor
nodes. Each node consists of a multiprocessor bus and a number of boards with processors. A set of periodic processes P_1, P_2, ..., P_n is statically assigned to the processors (later this condition will be removed) and must be executed prior to their deadlines d_i. By periodic process we mean a process that is activated periodically or has periodic arrival (or starting) times r_i. Each process instance in the scheduling period, defined as the least common multiple of the processes' periods, is characterized by a deadline d_i, a start time r_i, a value v_i, and a CPU specification (static allocation case). Each process is divided into a chain of scheduling units, the tasks. Tasks are sequential non-preemptive computational units that begin or end with a synchronization point or with a basic operation (request or release) on one or more resources. The tasks inherit the time attributes of the processes to which they belong and add other attributes of their own. Each task instance τ_i in the scheduling period is characterized by a deadline d_i, a computation time c_i, a start time r_i, a value v_i, and a set of resources requested in exclusive mode
{R_{i1}, R_{i2}, ..., R_{in}}.
Tasks can have precedence and exclusion constraints (induced by the usage of synchronization primitives and shared resources), with the notation τ_i → τ_j meaning that task τ_i must be completed prior to the execution of task τ_j, and an exclusion constraint between τ_i and τ_j meaning that the two tasks cannot be executed concurrently. Now, if we divide the tasks into sets of periodic instances, we can define s_{ik} as the time instant when τ_{ik}, the k-th instance of task τ_i, takes control of the CPU in the computed schedule. The jitter J_{ik} of the k-th instance of a successfully scheduled task τ_i (k = 1 ... m_i) can be defined as:
J_{ik} = | s_{i,k+1} − s_{ik} − T_i |    for k = 1, 2, ..., m_i − 1    (1)

J_{i,m_i} = | s_{i1} + lcm(T) − T_i − s_{i,m_i} |

where T_i is the activation period for task τ_i and lcm(T)
is the scheduling period. For example, a task with period T_i = 40 whose consecutive instances start at s_{i1} = 0 and s_{i2} = 45 has jitter J_{i1} = |45 − 0 − 40| = 5. The distributed scheduling problem can now be formulated as a combinatorial optimization problem on the set of all the task instances. Assume the cost of a schedule is either the number of tasks scheduled after their deadlines or the maximum jitter J_max = max_{i,k} {J_{ik}}. The problem is to minimize the cost of the schedule subject to the precedence, exclusion, and deadline constraints, and to the implicit constraint that no task can be scheduled before its starting time.
Now that the scheduling problem has been defined, we give a brief survey of classical results proving that even the simple feasibility problem is NP-hard. Theory tells us that:
- The problem of deciding whether it is possible to schedule a set of periodic processes which use semaphores only to enforce mutual exclusion is NP-hard [6]. This tells us that the use of resources causes NP-hardness, regardless of whether the normal execution of the tasks is preemptable or not.
- The multiprocessor scheduling problem with n processors, no resources, arbitrary partial order, and unit computation time for each task is NP-complete [7]. This means that arbitrary precedence constraints cause NP-completeness.
- The multiprocessor problem of scheduling P processors, with task preemption allowed, where we try to minimize the number of late tasks, is NP-hard [8]. Hence even the multiprocessor preemptive scheduling of independent tasks is intractable.

After the formal definition of our scheduling problem, we designed a system description language called DTDL (Dependency and Temporal Description Language) to describe processes, tasks, the allocation of processes to CPUs, the usage of resources by the tasks, the precedence constraints among tasks, and finally all the temporal data of the system (deadlines, periods, computation times, and so on). The DTDL description, mirroring all the definitions of this section, should be produced at the time the application is designed, and then checked and completed when the application program is compiled and analyzed off-line. Its purpose is not only to serve as a synthetic description of the application, but also as the input for our scheduling procedure. The DTDL language consists of a set of definitions characterizing the composition and the attributes of a real-time application. We use DTDL to describe the examples in the experimental section. Figure 1 contains the description of the DTDL language in BNF form.
3 Scheduling tasks with simulated annealing

In the previous section we explained how our scheduling problem can be formulated as a combinatorial optimization problem. Now we explain how the problem can be solved by applying a simulated annealing algorithm.
<system>        ::= <resource-list> [ <process-list> ]
<resource-list> ::= { <resource-def> ";" }
<process-list>  ::= { <process-def> ";" }
<process-def>   ::= PROCESS <name> "(" <allocation> <timing> <task-list> ")"
<resource-def>  ::= RESOURCE <name>
<allocation>    ::= NODE <name> CPU <name>
<timing>        ::= PERIOD <number> DEADLINE <number>
<task-list>     ::= <task-def> { "," <task-def> }
<task-def>      ::= TASK <name> "(" <task-attrs> ")"
<task-attrs>    ::= [ <resource-use> ] [ <precedence> ] [ <task-timing> ]
<resource-use>  ::= RES <name-list>
<name-list>     ::= <name> { "," <name> }
<precedence>    ::= PREC <task-ref> { "," <task-ref> }
<task-ref>      ::= <process-name> <task-name>
<task-timing>   ::= [ DEADLINE <number> ] [ CTIME <number> ] [ RTIME <number> ]

Figure 1: Syntax of the DTDL language
In our combinatorial optimization problem we have an objective function to be minimized, and the space over which the function is defined is factorially large, so it cannot be explored exhaustively. In other words, we have to search the solution space S (typically a subset of the possible permutations of the set of all task instances) for an optimal (minimum-jitter and/or maximum number of scheduled tasks) solution. A possible search algorithm is local search: we define a transition operator TR between any pair of scheduling solutions (σ_i, σ_j) ∈ S and a neighborhood structure S_i for each solution σ_i containing all the solutions that are "reachable" from σ_i by means of the operator TR (for example, a possible definition for a neighborhood is the set of all schedules obtainable from the given one by a permutation of any two task instances). In a local search, starting from any solution σ ∈ S, the algorithm examines the neighborhood of σ and iteratively chooses the adjacent solution having minimal cost for the next step. This algorithm generally ends in a local optimum, with no guarantee on the global optimality or even on the "quality" of the solution found. To improve the chances of reaching the global optimum, this baseline algorithm can be modified by conditionally allowing transitions in the direction opposite to the pursued optimum. This modification aims to escape local optima, but not global ones. Simulated annealing works this way: each step of the algorithm corresponds to a transition to a randomly chosen neighbor solution, stochastically accepted even when it has a higher cost. The method takes its name from an analogy with thermodynamics, specifically with the way liquid metals cool and anneal into a solid state with minimum energy. The Boltzmann probability distribution,
Prob(E) ∼ e^{−E/kT}

tells us that even at a low temperature there is a small chance of a system being in a high-energy state. Therefore, there is a corresponding chance for the cooling metal to get out of a local energy minimum (polycrystalline state) in favor of finding a better, global one (crystal state), changing its configuration from energy E_1 to a higher energy E_2 with probability

p = e^{−(E_2 − E_1)/kT}    (p = 1 by definition if E_2 < E_1)    (2)

To solve the distributed scheduling problem we use a simulated annealing algorithm, modeled on the one proposed in [2]. The input consists of the set of all the instances of the tasks τ_1, τ_2, ..., τ_n in the least common multiple of the processes' periods, or scheduling time, each labeled with its future starting (or arrival) time and deadline. The simulated annealing scheduler works iteratively, assigning random changes to the task parameters and computing the resulting schedules. The schedules are evaluated on the basis of a feasibility or jitter metric, and the changes are accepted or rejected with the probability of acceptance given by (2). The pseudo-code for the simulated annealing algorithm used in this paper is in Figure 2. We will briefly examine the algorithm of Figure 2 and then give a summary of the theoretical results on simulated annealing methods. This will help explain how to choose the parameters and the evaluation function that characterize the algorithm. The required ingredients for our simulated annealing procedure are:
- a description of all the instances of the tasks and of the solution space (all possible scheduling orderings);
- a generator of random changes in the scheduling solutions; these changes are the options presented to the system (the transition operator TR);
- an objective cost function C (the analog of energy) whose minimization is the goal of the procedure;
- a control parameter T (the analog of temperature) and an annealing schedule which tells how it is lowered from high to low values.
anneal(init_temperature, final_temperature, coolrate, instance_list)
{
    temperature = init_temperature;
    value = schedule_and_compute_value(instance_list);
    for (j = 1; temperature > final_temperature; j++) {
        /* start evaluation of new chain */
        nsucc = 0;
        for (k = 1; k <= MAXTRY; k++) {
            instance = choose_instance(instance_list);
            change = choose_change(instance);
            apply_change(instance, change, instance_list);
            newvalue = schedule_and_compute_value(instance_list);
            if (accept(newvalue - value, temperature)) {
                /* stochastic acceptance test, eq. (2) */
                nsucc++;
                value = newvalue;
                good_change(instance, change);
            } else
                undo_change(instance, change, instance_list);
            if (nsucc >= MAXCHANGE)     /* max accepted transitions */
                break;
        }
        temperature *= coolrate;        /* decrease temperature */
        if (nsucc == 0)
            return;
    }
}
Figure 2: Core of the simulated annealing scheduling algorithm

The input parameters to the algorithm are the initial and final values of the temperature parameter, the cooling rate, and the descriptors of all the task instances in the scheduling time. The latter are ordered in a list, each descriptor containing information about the timing data, the precedence relations, and the resource usage of the corresponding task instance. The routine anneal works by calling other procedures that perform the random changes in the solution space, the evaluation of the newly found solution, and the possible acceptance (or rejection) of the new scheduling solution. Their meaning will be made clearer later, when the mechanisms of our algorithm are detailed. The final output of the algorithm is an execution time interval (expressed as a starting and a corresponding ending instant) for each task instance in the least common multiple of the process periods, or scheduling time. These execution time intervals can be stored in a table to be used by the run-time dispatchers.
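As an illustration, the stochastic acceptance test called in Figure 2 might be coded as follows. This is a minimal sketch: the factoring of the test into a helper named accept, and the use of the C library rand() generator, are our choices for illustration, not part of the original tool; the test itself implements equation (2).

#include <math.h>
#include <stdlib.h>

/* Metropolis acceptance test of equation (2): downhill moves are
   always accepted; uphill moves are accepted with probability
   e^(-delta/temperature). */
int accept(double delta, double temperature)
{
    if (delta <= 0.0)
        return 1;                       /* lower (or equal) cost: accept */
    double u = (double)rand() / ((double)RAND_MAX + 1.0);  /* uniform in [0,1) */
    return u < exp(-delta / temperature);
}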
The theory tells us that, once the transition function TR is defined, the probability of a transition between any two solutions (or states) can be expressed as the product of two factors: the generation probability G_ij, i.e. the probability that state j is chosen for the transition exiting from i (uniform over the neighborhood), and the acceptance probability A_ij for the transition to j to be accepted (defined as one for cost-decreasing transitions and exponentially decreasing for higher-cost transitions, according to the law p = e^{−(C_j − C_i)/T}). The set of conditional probabilities P_ij(k−1, k), defined for each pair of states (i, j) and for the k-th iteration as the probability that the transition exiting from state j ends in state i, identifies a Markov chain. If T is held constant (homogeneous chain), this Markov chain is proven to converge (in the limit k → ∞) to a stationary distribution q_i(T) for each i-th state (or solution) that is a function of the control parameter T:

q_i(T) = e^{−(C(i) − C_opt)/T} / Σ_{j∈S} e^{−(C(j) − C_opt)/T}    (3)

where C is the cost function, C_opt the minimum cost, and S the space of all solutions. When T → 0 the sta-
tionary distribution has probability one for the state (or the overall set of states) corresponding to the global minimum and zero for all the other states. This result holds even if the chain is nonhomogeneous [9] (with T decreasing with k, the number of iterations), provided the parameter T, or better T_k, is decreased such that:
lim_{k→∞} T_k = 0,    T_k ≥ T_{k+1} ∀k ≥ 2,    T_k ≥ nΔ / log(k)

where n is the maximum distance between any state and the minimum-cost solution (expressed as a number of transitions) and Δ is the maximum cost difference between two solutions. In any case, convergence is guaranteed only for an infinite length of the Markov chains, meaning that the algorithm gives no advantage over exhaustive search in finding exact solutions. It is the approximate implementations of the algorithm, like the one in Figure 2, which use limited chains and provide approximate results in polynomial time, that make simulated annealing a feasible choice. Almost all the approximations are based on the concept of quasi-equilibrium. The idea is to generate homogeneous Markov chains of finite length L(T) for decreasing values of the parameter T. The length of each Markov chain should be sufficient, if not
to reach the stationary distribution, at least to get as close as possible to it. The objective is to obtain a distribution vector a_i(L, T), a function of the length L of the chain, such that

‖a(L, T) − q(T)‖ < ε

From this distribution vector the algorithm starts the next iteration, pursuing the next state of quasi-equilibrium for a lower value of T. The parameter T must not be decreased too quickly, to keep a reasonable length for the Markov chains when moving to the next state of quasi-equilibrium. The algorithm we used accepts a maximum number of transitions (proportional to the dimension of the problem) for each value T_k of the control parameter. We have chosen to limit the length of the chains by allowing a maximum number MAXCHANGE of accepted transitions for each value of T_k and an upper limit MAXTRY to the length of each chain (for lower values of T_k only a limited number of transitions can be accepted). The value T_{k+1} is lowered proportionally to T_k according to the formula

T_{k+1} = α · T_k    (4)

with α = 0.95, as proposed in [1]. It is particularly important that T_0, the starting value of T, be sufficiently large to allow a transition to any other state of the system. The terminal value T_1 for T_k must be chosen such that at T_1 it is practically impossible to have a transition towards cost-increasing states. The initial and terminal values assigned to T_k depend on the definition of the cost function and the transition operator. Like any other algorithm of this kind, the final error and the convergence speed depend on multiple factors such as the size of the problem, the starting values of the control parameter, the limit on the length of the Markov chain for each iteration, and especially the complexity of the functions for generating the random changes and evaluating the new configurations. It is important to define the parameters of the algorithm correctly and to program it as efficiently as possible to reduce the overall complexity and computation time.
3.1 The Transition Operator

For our scheduling problem, the set of all possible solutions consists of all the possible permutations of the tasks subject to the precedence and exclusion constraints and to their deadlines. To generate new schedules, it is possible to change the scheduling policy or, more reasonably, to keep the scheduling policy fixed and change the task parameters to new values consistent with their constraints. In particular, if we define a task configuration as the set of the temporal characteristics (starting times, deadlines, computing times), the resources requested by each task (communication network included), and the precedence relations, and if we fix a scheduling policy (in our case the CPUs are assigned with a FIFO policy), each configuration corresponds to one resulting schedule. Our transition operator applies random changes to the task configuration in order to generate a new, different schedule. The new configuration is obtained by changing the starting times r_i of the tasks, delaying them to change the activation times and the priority ordering of the tasks in the new schedule. The choice of the new starting times is not arbitrary: the new values must be compatible with the original starting times, the deadlines, and the precedence relations. By compatible starting time for a task τ_i we mean any time instant r'_i that would not a priori prevent the feasible scheduling of τ_i, that is:

r'_i ≥ r_i
r'_i ≥ r_j + c_j    for all tasks τ_j : τ_j → τ_i
r'_i ≤ d_i − c_i
r'_i ≤ d_k − c_k − c_i    for all tasks τ_k : τ_i → τ_k
The operator responsible for the transition to the neighbor solutions works as follows:
- a task instance is randomly chosen;
- the operator computes the time interval within which the starting time of the instance can be moved;
- the operator assigns one randomly chosen new starting time r'_i to the given task instance.

The beginning r_first of the compatibility interval is given by the maximum between the actual starting time of the task instance τ_i and the largest of the earliest possible completion times (r_j + c_j) of its predecessors τ_j. The end r_last of the interval is the minimum between the latest possible starting time of the task, d_i − c_i, and the deadlines of the successors τ_k minus the time c_i + c_k necessary to complete the execution of τ_i and to execute τ_k before its deadline. If τ_i is the task instance chosen for the transition, the algorithm explores all its predecessors τ_j : τ_j → τ_i and evaluates r_first = max_j {r_j + c_j, r_i}, then all its successors τ_k : τ_i → τ_k and computes r_last = min_k {d_k − c_k − c_i, d_i − c_i}. The new starting time for τ_i can then be chosen at any instant between r_first and r_last. Going back to the algorithm of Figure 2, it is now possible to explain what the other functions called in the body of the anneal procedure do. The function choose_instance randomly picks a task instance from the list of all instances; the function choose_change computes the compatibility interval and the new starting time for the chosen instance; the function apply_change performs the change in the starting time and inserts the task back in the instance list; the functions good_change and undo_change later confirm or undo the proposed modification.
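As a concrete illustration, the computation of the compatibility interval and the random choice of the new starting time might be sketched in C as follows. The data layout and all names here are our assumptions for illustration, not the tool's actual code.

#include <stdlib.h>

/* Sketch of the transition operator of Section 3.1 (names are ours). */
typedef struct { int r, c, d; } inst;  /* start time, computation time, deadline */

int choose_new_start(const inst *i,
                     const inst *pred, int npred,  /* tasks j : j -> i */
                     const inst *succ, int nsucc)  /* tasks k : i -> k */
{
    int r_first = i->r;          /* never earlier than the current r_i */
    int r_last  = i->d - i->c;   /* latest start meeting i's own deadline */

    for (int j = 0; j < npred; j++)           /* after each predecessor's */
        if (pred[j].r + pred[j].c > r_first)  /* earliest completion r_j + c_j */
            r_first = pred[j].r + pred[j].c;

    for (int k = 0; k < nsucc; k++) {         /* leave room for each successor */
        int bound = succ[k].d - succ[k].c - i->c;   /* d_k - c_k - c_i */
        if (bound < r_last)
            r_last = bound;
    }
    /* uniform random choice in [r_first, r_last]; the interval is
       non-empty for any instance that is not a priori infeasible */
    return r_first + rand() % (r_last - r_first + 1);
}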
3.1.1 Precedence constraints

Any configuration of precedence constraints can be managed by the algorithm in the following way. All the schedules calculated by the algorithm in the successive iterations are obtained by FIFO-ordering all the task instances whose starting times have been randomly modified. If all tasks were to be executed on the same CPU and there were no other shared resources, then any ordering in which the starting times of the predecessors are earlier than the starting times of the successors would be consistent with the precedence constraints. An initial ordering of this kind can be obtained by assigning the starting times so that

r_i < r_j whenever τ_i → τ_j

that is,

r_i = max{ r_i, max_j { r_j + c_j : τ_j → τ_i } }    (5)

Then the transition operator previously defined keeps the ordering of the starting times of the tasks consistent with the precedence constraints. This procedure does not remove the need to check the precedence constraints, as it doesn't guarantee by itself that the tasks will be scheduled in the right order. We must remember that our scheduling problem includes shared resources and remote precedence constraints, so a consistent ordering of the starting times doesn't imply that the precedence constraints are respected. Nevertheless, we assign the starting times at the beginning of the simulated annealing algorithm (right before calling the procedure of Figure 2) and in the intermediate iterations according to equation (5) and
the compatibility constraints, for two good reasons. First, the described assignments do not add any artificial constraint; they change the problem into an equivalent one. Second, they help avoid considering a priori infeasible solutions, thus saving time when randomly producing new configurations. At run time, when evaluating the solutions, our FIFO scheduler simply suspends the execution of the task instances that happen to be at the top of the ready list until their local or remote predecessors have completed their execution.
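As an illustration, the initial assignment of equation (5) might be sketched as follows; the representation of the precedence relation as an edge list, processed predecessors-first, and all names are our assumptions.

/* Sketch of the initial start-time assignment of equation (5).
   edges[e][0] -> edges[e][1] are precedence pairs, assumed listed in
   an order consistent with the partial order (predecessors first). */
typedef struct { int r, c; } tinst;   /* start time r_i, computation time c_i */

void assign_initial_starts(tinst *t, const int (*edges)[2], int nedges)
{
    for (int e = 0; e < nedges; e++) {
        int j = edges[e][0], i = edges[e][1];   /* tau_j -> tau_i */
        if (t[j].r + t[j].c > t[i].r)
            t[i].r = t[j].r + t[j].c;           /* r_i = max{r_i, r_j + c_j} */
    }
}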
3.1.2 Access to the shared resources

The shared resources are managed at the moment the resulting schedule is evaluated, for each single transition. The FIFO procedure that evaluates the partial schedules blocks the tasks that request access to busy resources and puts them in a waiting queue. When the resource is released, the waiting instances compete again for access. Optimizing the assignment of the resources is not the FIFO scheduler's duty; it is the result of selecting, through simulated annealing, the right activation times for all tasks.

3.1.3 Access to the communication network

The communication network can be modeled, in the simplest way, as a global shared resource to be accessed in a mutually exclusive way. In static systems this solution is probably sufficient; the concurrency control could be implemented a priori and on a global basis, thus allowing the use of very simple access protocols (without the MAC layer). A different possibility is to use only a fraction of the global bandwidth on each node (with media access protocols like TDMA, token ring, or timed token) and allow concurrent accesses with bandwidth sharing. What is required, at the stage the partial schedules are computed, is to handle the network resource by calculating the time necessary to forward the messages to the remote locations, on the basis of the fraction of bandwidth allocated to each node, and to assign these transmission times to the sending tasks. This implies a larger time for sending a single message, since only a fraction of the total bandwidth is available.
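For illustration, under the bandwidth-sharing model just described, the transmission time charged to a sending task could be computed as in the following sketch (the function and parameter names are our assumptions):

/* Sketch of the message transmission time when each node may use only
   a fraction of the network bandwidth (TDMA-like sharing schemes). */
double transmission_time(double message_bits,
                         double bandwidth_bps,   /* total network bandwidth */
                         double node_fraction)   /* 0 < node_fraction <= 1  */
{
    /* only a fraction of the bandwidth is available to the sending node,
       so forwarding a single message takes proportionally longer */
    return message_bits / (bandwidth_bps * node_fraction);
}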
3.1.4 Reachability of the optimal solution

It is reasonable to question whether our method for generating new solutions through a FIFO scheduler and the modification of the starting times allows an optimal solution to be attained starting from the initial configuration. The following theorem answers this question (the proof can be found in [4]).

Theorem 3.1 If the previously defined scheduling problem admits a feasible solution, then the solution can be found by applying the previously defined transition operator a finite number of times and then scheduling the task instances according to a FIFO policy (blocking all ready tasks until their predecessors have completed and all the resources they need are available).

3.2 The cost function

The value of each newly generated schedule is a function of the number of tasks scheduled within their deadlines and, for the basic schedulability problem, is simply obtained as
C = Σ_i δ_i    (6)

where δ_i = 1 if τ_i is scheduled beyond its deadline and δ_i = 0 otherwise. For the minimization of the maximum jitter (using the notation of equation (1)) we can write:

C = max_{i,k} {J_{ik}}    (7)

where J_{ik}, the jitter of the k-th instance of task τ_i, is evaluated according to equation (1). These functions are easily computable during (or right after) the evaluation of the partial schedules. It is important to specify that our scheduler rejects all the task instances that would execute past their deadlines, together with their successors. The schedules obtained with one or more tasks missing (because they are not schedulable within their deadlines) are still accepted and evaluated by the simulated annealing routine. When a task instance is missing, the jitter is simply calculated spanning the empty period and considering the next schedulable instance. The reason why we consider schedules with one or more missing instances is to make the solution space and the energy function as continuous as possible. In complex systems, we expect most of the computed schedules to be infeasible. If we threw them away, we would probably end up with a very limited number of solutions, widely separated both in the solution space (maybe only reachable with complex transition functions) and in the value of the metric (making hops to global minima a pure matter of
luck). Because of the way we calculate jitter on both feasible and not feasible schedules, it is clear that the solution with the minimum jitter is likely to be also a feasible one (when existing). When no task instance is missing there are no void periods and the jitter should be sensibly reduced (actually a new scheduled instance causes a step in the jitter function as shown in the experimental section). To make sure that the solution with the minimum jitter is also a feasible one it is possible to check the nal schedules and verify that no instance is missing.
3.3 Initial and final values for the control parameter T

Given the cost function (6) or (7) and the transition operator used to generate the new configurations, the initial value T_0 must be chosen such that in the first chain almost all the proposed transitions can be accepted. Too small a value for T_0 could make the algorithm stop in a local minimum, preventing the reachability of the global optimum. At the same time, T_0 shouldn't be so large that excessive time is lost in useless random wandering through the solution space. We evaluate T_0 from the equation shown in [1] that links the initial temperature with the initial acceptance ratio,

T_0 = Δ / ln(χ_0^{−1})    (8)

by imposing an initial acceptance ratio χ_0 = 0.9 (Δ is an estimate of the average cost difference for a cost-increasing transition). If the cost is given by the number of tasks terminated past their deadlines, the value of Δ corresponds to the decrease in the number of scheduled tasks caused by the delayed arrival of a task. Because of domino effects, the value of Δ can in theory be high, but in most cases the maximum number of concurrent instances is a reasonable estimate of the highest cost increase for a new schedule obtained by changing the starting time of a task. If we adopt the jitter metric, the value of Δ should be sensibly higher, greater than twice the maximum period of the application processes, to allow for a temporary transition to a solution where at least one more task instance gets rejected. As regards the final value T_1, it is sufficient to note that the minimum increase in cost for both our metrics is one unit. This means that for T = 0.06 the probability of accepting any higher-cost solution is lower than p = 4 · 10^{−7}, and for T = 0.02 lower than 10^{−20}. Going below this last value makes little sense.
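As a minimal sketch (the function name is ours), equation (8) translates directly into code:

#include <math.h>

/* Initial temperature from equation (8): T0 = delta / ln(1/chi0), where
   delta estimates the average cost increase of an uphill transition and
   chi0 is the desired initial acceptance ratio (0.9 in our experiments). */
double initial_temperature(double delta, double chi0)
{
    return delta / log(1.0 / chi0);
}

With χ_0 = 0.9 this gives T_0 ≈ 9.5 Δ, large enough that nearly all early transitions are accepted.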
3.4 Implementation and considerations of efficiency

All the procedures called from the main loop of the simulated annealing algorithm must be as efficient as possible. In our case, the task to be modified is chosen with a look-up table, and the admissible transitions are obtained by scanning a doubly linked list of task instances. The computation of the new schedule and the corresponding evaluation are implemented with a FIFO scheduler running in time quadratic in the size of the problem (the number of task instances). The overall complexity is therefore O(n^4).
3.5 Considerations on task allocation

In our model, we assumed the tasks to be statically assigned to the processors on the basis of locality or other optimization functions (distribution of the system load or minimization of the communication flow over the network). Actually, finding the optimal allocation of the system tasks according to most metrics of practical interest is itself an NP-complete problem. A simulated annealing solution has already been proposed for it [3]. In our system, nothing prevents us from doing the same thing, at the price of slower execution and increased complexity of the scheduling algorithm. It is sufficient to define another transition operator that modifies the allocation of a randomly chosen task, and to use it in conjunction with the previously described operator that modifies the release times. At this point, two additional activities must be considered. First, the usage of the network must be re-evaluated at each new iteration and cannot simply be computed statically; second, the evaluation of the new solution must consider not only the schedulability of the task instances with respect to their deadlines, but also the feasibility of the new allocations (e.g., because of memory bounds). Simulated annealing can then perform the allocation of the tasks together with their scheduling. The great flexibility of the method, which accommodates extensions or subcases of the original problem with minimum effort, is probably the greatest advantage of simulated annealing techniques.
4 Scheduling tool description and experimental results

4.1 The scheduling tool

We built a scheduling tool based on simulated annealing and ran experiments with it. The DTDL description of the application is translated into a graph of processes and tasks linked by precedence constraints. Then a description of the system architecture is automatically computed from the allocation declarations, and the system resources are identified. Processes and tasks are sorted into two separate lists. Each process descriptor contains a list of all the tasks that originate from it. Each task descriptor contains references to the predecessors and the successors (two linked lists) and another linked list of the resources it needs. Once the scheduler has a representation of all tasks, processes, and system resources, it computes all the task instances contained in the scheduling period. The instances are ordered in a doubly linked list according to their starting (arrival) times. The instance descriptors contain the time information inherited from the processes and tasks they represent, together with linked lists for predecessors and successors and the specification of the resources they use. The doubly linked list of the task instances works as an event list during the iterative scheduling process. Each instance is characterized by an arrival time. When the partial schedules are evaluated, the list is scanned from the first to the last arrival time. Each instance descriptor has a bidirectional link to a similar descriptor representing the termination event for the task instance. The event (or instance) list is scanned, simulating the flow of time. When a new event is examined, if it is the arrival of a task instance, the instance is queued either in the ready list of its CPU or in the waiting lists of the resources that it needs and that are currently in use. The procedure for checking the schedulability and assigning the CPU is called after the instance enters the ready list, if the CPU is idle. Whenever an instance reaches the top of its ready list and all its predecessors are completed, its schedulability is checked. If the instance is found schedulable it gets the CPU; otherwise it is rejected together with all its successors. When an instance gets the CPU, the corresponding descriptor for the ending event is inserted in the event list, in a position corresponding to the worst-case ending time.
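A hypothetical C rendering of the instance descriptor just described might look as follows; all field names are our assumptions, not the tool's actual code.

/* Sketch of a task-instance descriptor as described above. */
struct dlist { void *item; struct dlist *next; };

struct instance {
    int arrival;                  /* starting (arrival) time r_i            */
    int deadline;                 /* absolute deadline d_i                  */
    int ctime;                    /* worst-case computation time c_i        */
    int cpu;                      /* CPU the instance is assigned to        */
    struct dlist *predecessors;   /* instances that must complete first     */
    struct dlist *successors;     /* instances depending on this one        */
    struct dlist *resources;      /* resources requested in exclusive mode  */
    struct instance *end_event;   /* paired descriptor for the ending event */
    struct instance *prev, *next; /* links in the doubly linked event list  */
};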
Later, while scanning the event list, the scheduler will find the ending instance: the corresponding CPU will be set idle, the resources used by the ending instance will be made available, and the dispatching procedure will be called for all the idle CPUs in the node. At a more abstract level, we have a first phase, when the task instances are ordered in an event list and the starting times are calculated according to equation (5), and a second phase, when the simulated annealing algorithm described in Figure 2 takes control. The algorithm works through successive modifications of the configuration of the task instances, the computation of the resulting schedules, and the evaluation of the results. At the end of the cooling process, the final result is a set of tables containing the activation times (when the tasks get control of the CPU) and the ending times of all task instances, for each CPU. These tables are stored in files and can be used at run time by the local dispatchers.

PROCESS proc1 (
    NODE one CPU c2
    PERIOD 40 DEADLINE 20
    TASK T1 ( RES res1 CTIME 5 ),
    TASK T2 ( CTIME 2 )
);

PROCESS proc4 (
    NODE one CPU c1
    PERIOD 40 DEADLINE 20
    TASK T1 ( PREC proc1 T1 CTIME 4 ),
    TASK T2 ( CTIME 2 )
);

Figure 3: Two processes in the experimental set
4.2 Experimental examples

We now show some experimental examples to explain the way the scheduling algorithm works and to show its potential performance advantages. We present three examples on three slightly different task sets. The tasks and processes of the three examples differ only in their activation periods. To describe the simulated system, we give the DTDL notation of two precedence-related tasks (Figure 3) extracted from the overall set used in the experiments (the complete description can be found in [4]), and then a table (Figure 4) that shows the computation times, the deadlines, and the periods of all the processes in the three task sets. Figure 5 plots the maximum jitter as a function of the number of iterations of the algorithm: the Y axis represents the maximum jitter among all tasks as the simulated annealing scheduler iterates over decreasing values of the temperature T (the plotted values are those obtained at the end of each iteration).
              proc0  proc1  proc2  proc3  proc4  proc5  proc6  proc7  proc8  proc9
Comp. times     2+5    5+2    6+2    1+7    4+2    2+4     1      1      1      1
Deadlines        20     20     20     20     20     20      2      4      6     10
Periods set1     60     40     40     60     40     40     40     60     40     40
Periods set2     30     20     20     30     20     20     20     30     20     20
Periods set3     30     40     40     30     40     40     16     60     40     40

Figure 4: Process attributes in the examples

Figure 5: Maximum jitter on each iteration

The continuous line on the graph of Figure 5 shows the results for Task set 1. For this task set the system is not heavily loaded, and simple EDF scheduling gives a correct schedule immediately. The solution obtained by EDF scheduling has a maximum jitter of 5 time units (dotted horizontal line), while our simulated annealing scheduler manages to order the task instances with no jitter. The dashed line shows the results for Task set 2, a reasonably loaded system obtained from the previous specification by dividing the process periods by 2. In this case, the EDF scheduler cannot find a feasible solution (2 task instances are rejected). With our simulated annealing algorithm the whole set is scheduled, and the maximum jitter in the activation of the task instances is as low as 3.5 time units. The third curve corresponds to a moderately loaded system (Task set 3) where the processes have four different periods (more instances in the scheduling period). Again, EDF scheduling doesn't work, while our scheduler gives a feasible solution with a maximum jitter of 15 time units.

5 Conclusions

Simulated annealing is often used to solve NP-complete problems. It has the potential to avoid local minima while giving approximate solutions in polynomial time [1]. Even though it is polynomial, the algorithm remains computationally heavy, which suggests its use only for problems that are not easily solvable with other techniques. A valuable example of a problem of this kind is distributed real-time scheduling with jitter minimization. Simulated annealing's flexibility and its potential for solving problems not easily solvable in a conventional way should be sufficient to justify the development of simulated annealing based scheduling tools such as the one presented here.
References

[1] E. Aarts and J. Korst, "Simulated Annealing and Boltzmann Machines," Wiley and Sons, 1989.
[2] E. Aarts and P. Van Laarhoven, "A New Polynomial Time Cooling Schedule," Proc. IEEE Conf. on Computer Aided Design, Santa Clara, 1985.
[3] K. Tindell, A. Burns, and A. Wellings, "Allocating Real-Time Tasks (An NP-Hard Problem Made Easy)," Real-Time Systems Journal, June 1992.
[4] M. Di Natale and J. Stankovic, "A Simulated Annealing Solution for Multiprocessor Scheduling," Tech. Report ARTS Lab 10-95, Scuola Superiore S. Anna, Pisa.
[5] G. Fohler, "Flexibility in Statically Scheduled Hard Real-Time Systems," Technisch Naturwissenschaftliche Fakultaet, Technische Universitaet Wien, April 1994.
[6] A. K. Mok, "Fundamental Design Problems of Distributed Systems for the Hard Real-Time Environment," Ph.D. Thesis, MIT, Cambridge, MA, May 1983.
[7] J. D. Ullman, "Polynomial Complete Scheduling Problems," Proc. 4th Symp. on Operating Systems Principles, 1973.
[8] E. L. Lawler, "Recent Results in the Theory of Machine Scheduling," in Mathematical Programming: The State of the Art, A. Bachem et al. (eds.), Springer Verlag, New York, 1983.
[9] D. Mitra et al., "Convergence and Finite-Time Behavior of Simulated Annealing," Proc. 24th Conf. on Decision and Control, Ft. Lauderdale, Dec. 1985.
[10] J. Xu and D. Parnas, "Scheduling Processes with Release Times, Deadlines, Precedence, and Exclusion Relations," IEEE Transactions on Software Engineering, Vol. 16, No. 3, pp. 360-369, March 1990.