An Engineering Approach to Decomposing End-to-End Delays on a Distributed Real-Time System
Manas Saksena Department of Computer Science Concordia University, Canada
Seongsoo Hong School of Electrical Engineering and ERC-ACI Seoul National University, Korea
[email protected]
[email protected]
Abstract

In this paper we propose a practical engineering technique for decomposing end-to-end delays in distributed real-time systems. Our technique greatly simplifies the real-time system design process by turning a global distributed scheduling problem into a set of single-processor scheduling problems with local deadlines. The deadline decomposition uses the critical scaling factor [4] as a schedulability metric. As the problem is extremely hard in general, we develop an approximate technique using a simple linear response time model to generate a quick initial solution. We then show how the initial solution helps us identify the bottlenecks, and how that knowledge can be used to iteratively fine-tune the initial solution. The end result is a practical engineering technique for decomposing end-to-end deadlines.
1 Introduction

The recent maturity of real-time scheduling and analysis techniques (e.g., rate-monotonic scheduling and other variants of static priority scheduling), and their incorporation into industrial systems, have established the periodic task model as an essential vehicle for real-time systems development. In such a model, tasks run repeatedly at fixed rates, interacting with each other in a controlled manner. However, as real-time applications become more diversified and complex, modeling them as a set of independent, periodic tasks gets more difficult. Many of today's real-time systems, such as multimedia, manufacturing, and vehicle control systems, require sharing resources and data, synchronizing task executions, and timely flow of data through multiple data paths on a distributed platform. These systems often possess constraints established between external inputs and outputs, such as end-to-end deadlines, input and output jitter, and rate constraints.

Such end-to-end constraints over a distributed system lead to an extremely complex scheduling problem. One way to approach it is through global scheduling of tasks and resources; this introduces strong coupling among the scheduling decisions taken on each node, is not easy to change or to understand, and forces a common scheduling paradigm on all nodes. Another approach is to decompose the constraints into local constraints on each processor, and schedule each processor independently. While not globally optimal, this approach is more manageable. The reason is obvious: scheduling on a single processor is a better understood problem than distributed scheduling, and the approach allows loose coupling between processors; in dynamic systems it allows new scheduling decisions to be taken locally without a complex global scheduling protocol. It also permits different scheduling techniques on each processor, tailored to the tasks scheduled there. Thus we take this approach. With this approach, however, one must have a systematic way, based on notions of schedulability, of decomposing end-to-end constraints into local constraints.

* The work reported in this paper was supported in part by NSERC Operating Grant OGP0170345, and by the Engineering Research Center for Advanced Control and Instrumentation (ERC-ACI) under Grant 95-26.

The current practice of complex real-time
system engineering is to transform the high-level end-to-end constraints into a set of task-specific constraints, so that an underlying scheduling algorithm and a run-time dispatcher can process them for analysis and enforcement. This transformation is often a manual and labor-intensive exercise of trial and error. Developers may reach a final set of feasible constraints only through numerous iterations, and even then only through conservative approximations of the timing constraints at each step. There is also a serious problem with this practice: many of the key synchronization and precedence requirements get tightly coupled with the derived timing constraints. As a result, real-time programmers often lose traceability of the system under development or maintenance, and significant redesign may be required if the timing constraints change.

We addressed these problems and presented a partial solution in a comprehensive design methodology [2, 3]. The methodology was formulated for single-processor systems, and subsequently extended to distributed systems [5]. The extended approach, like the original one, has three components, as shown in Figure 1. The first component is a constraint deriver, which translates an application's end-to-end requirements into a system of non-linear inequalities, considering the application's topology and task allocation. The constraint system contains as free variables task-specific attributes, such as task periods, offsets, deadlines, and initial phasing. The second component is a constraint solver, which computes solutions of the constraint system. The last one is a task scheduler, which analyzes and schedules the resultant tasks. We pay special attention to the constraint solver, since it is of the greatest complexity among the three. It tackles a system of constraints possessing four different types of variables, which defy a single uniform optimization criterion.
Here, we again rely on the decomposition strategy to render the optimization problem manageable. To be specific, we classify these variables into three classes, namely periods, delays or deadlines, and phases (offsets and intertask phases), and provide a distinct optimization criterion for each variable class. For example, "utilization" is used to derive periods, and "schedulability" to derive delays. Then we solve the system for one variable class at a time: for a given constraint system C, we can single out a specific class of variables by removing the variables of the other classes from C via variable elimination techniques [3]. Let us call the resultant constraint system C'. After solving C', we plug the values of the solution back into C and obtain C''. We proceed to solve C'' in the same manner. The process is depicted in Figure 1.

In this paper, we address only the problem of deriving intermediate delays, with the objective of maximizing the schedulability of the system at hand. We do not discuss the other difficult subproblems, since a similar approach can be adopted from [2, 3]. Our problem is still extremely challenging, not only because schedulability is a very difficult measure to formulate, but also because it is very hard to evaluate the schedulability of a task set when not all attributes of the task set are known. Therefore, instead of attempting to find the "best" solution, we focus on efficiently determining an initial solution that enables us to identify schedulability bottlenecks. We also propose a strategy for a heuristic solver to iteratively refine the initial solution, taking into account the bottlenecks identified in the initial solution. While not optimal, the method is a helpful engineering technique to identify the bottlenecks and make tradeoffs based on them, allowing a real-time system designer to converge on an acceptable solution quickly.
2 Problem Description and Solution Overview

[Figure 1 (diagram): task graph, task code, end-to-end requirements, and task allocation feed Constraint Derivation; the resulting general non-linear inequations are solved for task periods, then for local delays, then for task offsets and task phasing; Per-Processor Scheduling yields schedulable tasks with offsets, deadlines, periods, and phasings.]
Figure 1: The structure of the end-to-end design methodology.

2.1 Task and System Model

We consider a set of end-to-end transactions Γ = {Γ1, Γ2, ..., Γm} as constituting an application. Each transaction Γi has an end-to-end deadline Di, and represents a chain, or sequence, ⟨τi1, ..., τi,ni⟩ of tasks from an external input to an external output. A data or control dependency exists between two successive tasks in a transaction, and imposes a precedence requirement on scheduling. The tasks of a transaction may span multiple processors. We model the network as a processor, and treat a message transfer over the network as a (network) task. Transactions may be overlapping, i.e., tasks may be shared between two or more transactions. Tasks form the units of scheduling. Let T = {τ1, ..., τn} be the set of all tasks of the transactions. Each task τi ∈ T is given two attributes: a period Ti and a maximum execution time ei. Furthermore, let pi denote the processor on which task τi is allocated. For any two tasks τi and τj, if τi immediately precedes τj in some transaction, then Ti exactly divides Tj; i.e., if Tj/Ti = k, then τj is invoked once every k executions of τi. This allows two transactions with different rates of execution to share tasks, without imposing a common period on them.
Example. We present a simple example task set that will be used throughout this paper to illustrate the methodology. The example is meant primarily as an aid to the reader, and is not intended as a representative task set. Consider a system with three transactions Γ1 = ⟨τ1, τ2⟩, Γ2 = ⟨τ3, τ4⟩, and Γ3 = ⟨τ5, τ6⟩. Let the tasks τ1, τ3, τ5 be allocated on processor 1, and the others on processor 2. To keep the example simple, we assume that the message transfer delay over the network is bounded by 5 time units. The attributes of the tasks are given below:

         τ1    τ2    τ3    τ4    τ5     τ6
   ei    4.5   6     4     2.5   5.25   8.75
   Ti    15    15    10    10    35     35

Each of the transactions has an end-to-end deadline, given as D1 = 19, D2 = 17, and D3 = 80. Since each transaction involves one message transmission, the end-to-end deadlines are tightened to D1' = 14, D2' = 12, and D3' = 75.
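The deadline-tightening step above is simple arithmetic, and the example's numbers can be reproduced directly. The variable names here are ours, not the paper's:

```python
# Example task set from the paper: transaction id -> end-to-end deadline.
end_to_end = {1: 19, 2: 17, 3: 80}

# Each transaction sends one message over the network; its transfer delay
# is assumed bounded by 5 time units, so each deadline tightens by 5.
MSG_DELAY = 5
tightened = {i: D - MSG_DELAY for i, D in end_to_end.items()}
```

This yields D1' = 14, D2' = 12, and D3' = 75, matching the text.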
2.2 Problem Statement

Given an application as defined above, with end-to-end deadlines on its transactions, the problem is to derive a local deadline for each task. (In fact, a sequence of tasks on the same processor can be bunched up under a common deadline; however, that only changes the number of variables, i.e., the size of the problem, not its nature.) Let di denote the deadline of task τi. The deadlines derived for the tasks must satisfy the end-to-end deadline of each transaction; thus for each transaction Γi, the following constraint must be satisfied [5]:

    d_i1 + d_i2 + ... + d_i,ni ≤ Di

Let C denote the set of all such constraints, and let D = {di :: 1 ≤ i ≤ n} be any solution satisfying the constraints C. A solution that merely preserves the end-to-end deadlines is not very useful; it may be trivially unschedulable. Therefore, the local deadlines on the tasks must be derived in a manner that leads to "higher schedulability."

In our approach, we tackle the deadline assignment problem after periods have been assigned to the tasks, but before other task attributes, such as offsets and phases, have been determined. Thus, the following information is known: (i) the allocation of tasks to processors, (ii) task periods, and (iii) task execution times. Hence, given a solution for the deadlines, we can ask: is the task set T = {τi = ⟨Ti, ei, di⟩ :: 1 ≤ i ≤ n} schedulable on each processor? The scheduling of such task sets is well understood, and it is known that such a task set is schedulable if and only if it is schedulable under the earliest-deadline-first (EDF) scheme [6].

Note that this is only an intermediate stage in our entire methodology, and therefore not all task and system attributes are known at this time. This means that the schedulability computed at this stage cannot guarantee that the final task set will be schedulable; rather, it serves as a heuristic for deriving deadlines. Thus it is better to have a continuous measure of schedulability instead of a binary "yes/no" measure. A good measure of schedulability is the critical scaling factor [4]. Let D = {di :: 1 ≤ i ≤ n} be any solution satisfying the end-to-end constraints. Then, the critical scaling factor Δ(D) of the solution is the largest value of Δ such that the task set T' = {⟨Ti, Δ·ei, di⟩ :: 1 ≤ i ≤ n} is schedulable. We refer to the critical scaling factor as the gain of a solution; it represents the capacity of the task set to accept up to Δ·ei computation demand for each task τi in T without sacrificing schedulability. Clearly, the larger the gain, the better the solution. Thus, the problem may now be stated as follows:

    "For a given T, determine a set D of local intermediate deadlines di, satisfying the end-to-end constraints, such that Δ(D) is maximized."
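For a single processor, the gain of a candidate deadline assignment can be evaluated with a standard EDF demand-bound test combined with a binary search on Δ. The sketch below is illustrative, not the paper's algorithm; it assumes synchronous release, integer periods, and di ≤ Ti, and all names are our own:

```python
import math

def edf_feasible(tasks, scale=1.0):
    """Synchronous-release EDF test via the demand bound function:
    the total execution of jobs with absolute deadline <= t must stay
    <= t at every deadline point up to the hyperperiod.
    tasks: list of (T, e, d) triples; assumes integer T and d <= T."""
    H = math.lcm(*(T for T, e, d in tasks))
    deadlines = sorted({d + k * T for T, e, d in tasks for k in range(H // T)})
    for t in deadlines:
        demand = sum(((t - d) // T + 1) * e * scale
                     for T, e, d in tasks if t >= d)
        if demand > t + 1e-9:
            return False
    return True

def critical_scaling_factor(tasks, hi=8.0, iters=60):
    """Binary search for the largest Delta that keeps the task set
    EDF-schedulable when every e_i is scaled by Delta (feasibility is
    monotone in Delta, so bisection applies)."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if edf_feasible(tasks, mid):
            lo = mid
        else:
            hi = mid
    return lo
```

For instance, the processor-1 tasks of the example, with their LRT response times used as deadlines, pass the test at Δ = 1 and fail well before Δ = 2 (total utilization 0.85 caps the gain at 1/0.85 ≈ 1.18).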
2.3 Solution Overview

It is well known that EDF is an optimal scheduling algorithm for a periodic task set with deadlines, as is the case in our problem [6]. Therefore, it is possible to devise an optimization algorithm that searches the feasible solution space for D (i.e., those solutions that satisfy the end-to-end constraints), determining the gain of a solution by iteratively running EDF on the task set {⟨Ti, Δ·ei, di⟩ :: 1 ≤ i ≤ n} for different values of Δ. However, even a cursory thought reveals that this leads to an overwhelmingly large search space. Therefore, we look for simpler algorithms that can be used to find a "good," but not necessarily optimal, solution. To this end, we look for alternate dispatching models which provide sufficient, but not necessary, tests for schedulability. Examples of such dispatching models include deadline monotonic, round-robin, etc.

In order to deal with an arbitrary dispatching model, we formulate a schedulability measure based on the response time of a task, as in [1, 7]. Let r_i^S(D, Δ) denote the response time of task ⟨Ti, Δ·ei, di⟩, with respect to some dispatching model S, in the task set T = {⟨Ti, Δ·ei, di⟩}. (We use superscripts to indicate the dispatching model used, dropping them when the model can be inferred from the context; additional parameters are identified within parentheses.) Then, the gain of the solution with respect to the dispatching model S is:

    Δ^S(D) = max { Δ :: r_i^S(D, Δ) ≤ di for all i }.

Since EDF is an optimal scheduling model for our task sets, it follows that Δ^EDF(D) = Δ(D), and for any other dispatching model S, Δ^S(D) ≤ Δ^EDF(D).

Static priority dispatching is an attractive dispatching model, as it provides simpler and more efficient tests for schedulability than EDF, and yet closely approximates EDF for schedulability in most cases. However, static priority models introduce another variable into the picture, namely task priorities. Furthermore, an optimal priority assignment depends on the deadlines, leading to a circular dependency which is not easy to resolve. To circumvent this problem, we adopt a simpler dispatching model based on "weighted processor sharing." In this approximation, which we call linear response time (LRT) dispatching, each task gets a processor share equal to the relative utilization of the task. As a result, a task's response time does not depend on its deadline, breaking the circular dependency and making it feasible to efficiently obtain a solution D which maximizes Δ^LRT(D). In Section 3, we present an algorithm for it.

As one might expect, in general the linear response time model may be too far off from the optimal dispatching of EDF. However, as we show in Section 4, it provides a base on which to build a better solution through iterative refinement. The model incorporates the load on processors in determining the response times, and therefore allows us to quickly identify bottleneck tasks and transactions. That then serves as a simple heuristic to refine the solution: assign higher priorities to bottleneck tasks, and use a more elaborate dispatching model based on fixed priorities.
3 Solving with the Linear Response Time Model

In linear response time dispatching, each task gets a processor share equal to its relative utilization. Thus, a task τi gets a fraction f_i = U_i / U(p_i), where U_i is the utilization (e_i / T_i) of the task τi, and U(p_i) is the total utilization of the processor p_i on which τi is allocated. The response time of τi is thus r_i = e_i / f_i, which simplifies to r_i = T_i · U(p_i). That is, the response time of τi equals the average computation demand of its processor over an interval of length T_i. In other words, a task allocated on a lightly loaded (low-utilization) processor will have a small response time compared to tasks allocated on a heavily loaded processor.
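The LRT response times follow directly from the two formulas above. A minimal sketch, with data structures of our own choosing (a map from task id to (Ti, ei) and a task-to-processor allocation):

```python
def lrt_response_times(tasks, alloc):
    """Linear response time model: task i's share is f_i = U_i / U(p_i),
    so its response time is r_i = e_i / f_i = T_i * U(p_i)."""
    # Total utilization per processor.
    util = {}
    for i, (T, e) in tasks.items():
        util[alloc[i]] = util.get(alloc[i], 0.0) + e / T
    # r_i depends only on the task's period and its processor's load.
    return {i: T * util[alloc[i]] for i, (T, e) in tasks.items()}
```

Running this on the example task set reproduces the response times tabulated next (12.75, 13.5, 8.5, 9.0, 29.75, 31.5).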
Example Revisited: Revisiting our example, we can determine the response times of the tasks using the linear response time model. First, the utilizations of the two processors are U1 = 0.85 and U2 = 0.9. From the utilizations, the response times are easily computed as follows:

         τ1     τ2     τ3    τ4    τ5      τ6
   ei    4.5    6      4     2.5   5.25    8.75
   Ti    15     15     10    10    35      35
   Ui    0.3    0.4    0.4   0.25  0.15    0.25
   ri    12.75  13.5   8.5   9.0   29.75   31.5

Now, consider the task set with task execution times scaled by Δ. Since the relative utilizations stay the same, each task gets the same share of the processor. Thus, for any task τi, we have r_i^LRT(Δ) = Δ·r_i. We use this fact to efficiently determine a solution D = {di :: 1 ≤ i ≤ n} which maximizes the gain Δ^LRT.

Given the task response times, it is straightforward to determine whether a solution exists to the constraint set C, by simply substituting di = ri and checking for constraint violations. The problem of finding the optimal gain is also quite straightforward, since it reduces to finding the largest scaling factor Δ such that the set of deadlines di = Δ·ri satisfies the constraints C. This amounts to solving a set of constraints with a single variable Δ, such that the value of the variable is maximized. However, this approach may leave slack in many end-to-end deadlines, so some of the task deadlines can be relaxed further. In Figure 2, we present an algorithm which accomplishes this by first obtaining an optimal solution as shown above, and then refining it by relaxing the deadlines of non-bottleneck tasks.

To accomplish this, we define and use a new variable λi = di/ri in the algorithm. Intuitively, λi represents the gain of the single task τi. A solution is determined as a set of values Λ = {λi :: 1 ≤ i ≤ n}, with a gain of Δ(Λ) = min_i λi. The objective of the algorithm is then first to find a solution which maximizes Δ(Λ), and then to refine the solution by increasing the values of the λi's (i.e., relaxing the deadlines) of non-bottleneck tasks. The algorithm proceeds as follows:

(1) We substitute all λi by a single variable λ and obtain C[∀i :: λi := λ]. Then we find the maximum value λ* of λ satisfying this system.

(2) At this point, we have found the optimal value of Δ. However, it is still possible to find higher values of λi for some of the tasks. This is done by substituting λ* for λi for all tasks in constraints that are just satisfied (tight) at this value. The result is a smaller set of constraints and variables λi, which is then solved again in the same manner. The process is repeated until no more variables are left.

(3) The individual deadlines are finally computed from the λi's as di = λi·ri.
Example Revisited: Let us revisit our example, and trace the algorithm on it.

(Step 1) From the end-to-end deadlines, we have the following constraint set:

    d1 + d2 ≤ 14;  d3 + d4 ≤ 12;  d5 + d6 ≤ 75    (Eq 1)
Algorithm End-to-End Delay Decomposition

Input:  Λ : set of variables λ1, λ2, ..., λn.
        R : set of response times r1, r2, ..., rn.
        C : set of end-to-end delay constraints.
Output: D : set of delays d1, d2, ..., dn.

1. Let C' = C[∀i :: λi := λ], and
2. let λ* be the maximum value of λ that satisfies C'.
3. Let C'' be the set of inequalities in C' that are "just satisfied" at λ = λ*.
4. foreach γ ∈ C'' do
5.     foreach variable λj in γ do
6.         C := C[λj := λ*];  λj := λ*;  Λ := Λ − {λj};
7. if Λ is not empty then goto 1.
8. ∀i :: di := ri · λi;

Figure 2: Decomposition of End-to-End Deadlines into Local Deadlines
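The Figure 2 procedure can be prototyped in a few lines. The sketch below is our own rendering, not code from the paper; it uses exact rational arithmetic so that "just satisfied" constraints are detected by an equality test rather than a floating-point tolerance:

```python
from fractions import Fraction

def decompose(constraints, r):
    """Sketch of the Figure 2 deadline decomposition.
    constraints: list of (task_ids, D), meaning sum_i r[i]*lambda_i <= D.
    r: response time per task id.
    Returns (lam, d): the per-task gains lambda_i and deadlines d_i."""
    R = {i: Fraction(str(v)) for i, v in r.items()}
    lam, free = {}, set(R)
    active = [(set(ids), Fraction(str(D))) for ids, D in constraints]
    while free and active:
        # Step 1: substitute one common lambda for all free variables,
        # and compute the largest value each constraint allows.
        ratios = []
        for ids, D in active:
            fixed = sum(R[i] * lam[i] for i in ids - free)
            coef = sum(R[i] for i in ids & free)
            if coef:
                ratios.append(((D - fixed) / coef, ids))
        best = min(v for v, _ in ratios)
        # Step 2: freeze lambda_j in every constraint tight at best.
        for v, ids in ratios:
            if v == best:
                for j in ids & free:
                    lam[j] = best
        free -= set(lam)
        active = [c for c in active if c[0] & free]
    # Step 3: deadlines from the per-task gains.
    return lam, {i: lam[i] * R[i] for i in lam}
```

On the example's constraints and LRT response times, this reproduces the trace worked out in the text: λ1 = λ2 ≈ 0.533, λ3 = λ4 ≈ 0.686, λ5 = λ6 ≈ 1.22, and d1 = 6.8, d3 ≈ 5.83, d6 ≈ 38.57.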
(Step 2) Since the scaled response time should be no greater than the deadline,

    ri(λi) = ri · λi ≤ di,  for i = 1, 2, ..., 6.    (Eq 2)

Combining inequalities (Eq 1) and (Eq 2), we derive the constraint set C:

    { r1·λ1 + r2·λ2 ≤ 14;  r3·λ3 + r4·λ4 ≤ 12;  r5·λ5 + r6·λ6 ≤ 75 }

(Step 3) Solve C[∀i :: λi := λ] such that λ is maximized.

As a result, λ* = 14/26.25 = 0.533. The bottleneck constraint is the first one, thus giving λ1 = λ2 = 0.533. We can iterate further to get λ3 = λ4 = 12/17.5 = 0.686, and λ5 = λ6 = 1.22. The associated deadlines are then d1 = 6.8, d2 = 7.2, d3 = 5.83, d4 = 6.17, d5 = 36.43, and d6 = 38.57.
4 Bottleneck Analysis

Once a solution for the deadlines has been determined using the linear response time model, we can decide whether to accept it, or to look for a better solution using more intricate algorithms. In the best case, the gain of the derived solution may be sufficiently high, and there is no need to search for better solutions. The more interesting situation is when the optimal gain under the linear response time model is low, and we desire a solution with a higher gain.

Let us consider how further analysis and refinement of the solution may proceed once an initial solution has been obtained with the algorithm of the previous section. The values of the λi's obtained from our algorithm are helpful in proceeding further. Revisiting our example, we find that the gains for tasks τ1, τ2, τ3, and τ4 are small, especially compared to the other two tasks. This clearly indicates that these are the bottleneck tasks, and we should concentrate our efforts on them. In essence, this implies that either (1) their response times should be lowered, or (2) their deadlines should be relaxed. A careful look reveals the following two fundamental tradeoffs:

(1) A task τi with a low value of λi may be assigned a higher priority than another task τj with a high value of λj. This, in effect, reduces the response time of one task (τi), with a potential increase in the response time of the other (τj), thus bringing their gains closer to each other. That, in turn, will improve the overall gain of the solution if τi was the bottleneck task (i.e., the one with the smallest gain). In general, this tradeoff may be applied to sets of tasks. For instance, the gains of the tasks in the example task set suggest that we can increase the overall gain of the solution by assigning lower priority to tasks τ5 and τ6, as compared to the other four tasks.

(2) In a distributed multiprocessor system, another tradeoff may be obtained by trading a higher priority on one processor for a lower priority on another processor. For example, transactions Γ1 = ⟨τ1, τ2⟩ and Γ2 = ⟨τ3, τ4⟩ are the bottleneck transactions. By trading high priority for τ1 (as compared to τ3) against low priority for τ2 (as compared to τ4), an overall gain in the solution may be obtained. In general, such a tradeoff may involve many transactions and many processors, leading to an exponential number of combinations. Furthermore, there is no easy way to determine, a priori, which of those combinations will work best. Unfortunately, there seems to be no way to circumvent this problem. However, as we illustrate shortly, our methodology allows us to identify when such a tradeoff needs to be made, and to isolate the tasks involved in it. While not solving the problem completely, it goes a long way toward helping a real-time system designer make the appropriate choices.
4.1 A Hybrid Dispatching Model

Based on the above tradeoffs, we propose a hybrid dispatching model which combines linear response time dispatching and fixed priority dispatching. In this model, each task is also assigned a priority; however, tasks may have equal priority. That is, the tasks are decomposed into priority classes. Dispatching between classes follows a fixed priority model, while dispatching within a class follows the linear response time model. Thus, the response time of a task is determined by (1) the interference from higher priority tasks, and (2) the proportional progress within its own priority class. Given the set of tasks and their priorities, it is possible to determine the response times using a combination of fixed priority response time analysis [7] and the analysis presented in this paper for the linear response time model. Unfortunately, the task of determining an optimal set of deadlines for a given set of response times is no longer simple, since the response times do not scale linearly with Δ in the hybrid model. However, the algorithm presented in the previous section may be used as an approximate algorithm for this purpose.

The interesting aspect of this model is that it subsumes both the linear response time model (by assigning equal priority to all tasks) and the fixed priority model (by assigning a unique priority to each task). By appropriately dividing the tasks into priority classes, it is possible to discriminate between tasks "just the right amount" needed to get a good deadline decomposition. Our strategy is then outlined as follows:

(1) Start with equal priorities for all tasks (i.e., the linear response time model), and find a solution using the algorithm presented in this paper.

(2) Using the gains obtained, and the tradeoffs presented earlier, assign tasks to different priority classes using some heuristics and/or human assistance.

(3) Based on the priorities assigned, determine the worst-case response times for all the tasks. Then, using the algorithm presented earlier, obtain a solution for the deadlines and gains.

(4) If a satisfactory solution is found, then stop; otherwise go back to step 2.

Example Revisited: We revisit our example to show how the above heuristic solving strategy would work. Based on the gains, we divide the system into two sets of tasks: the higher priority tasks T1 = {τ1, τ2, τ3, τ4}, and the lower priority tasks T2 = {τ5, τ6}. The response times are now obtained as r1 = 10.5, r2 = 9.75, r3 = 7, r4 = 6.5, r5 = 26.25, and r6 = 28.25. Then, using the algorithm, we get λ1 = λ2 = 0.691, λ3 = λ4 = 0.88, and λ5 = λ6 = 1.376. This clearly shows that we can proceed further by ignoring tasks τ5 and τ6, since they have high gains even with the lowest priorities. Focusing attention on the remaining tasks, we see that all of them have low gains, and we must use the second tradeoff mentioned earlier. In this case, we are
left with 4 priority combinations. The following table shows the response times for each combination:

                          Response Times
  High Priority Tasks    τ1    τ2    τ3    τ4
  τ1 and τ2              4.5   6     8.5   8.5
  τ1 and τ4              4.5   8.5   8.5   2.5
  τ3 and τ2              8.5   6     4     8.5
  τ3 and τ4              8.5   8.5   4     2.5

It is clear that the best choice of priorities is to give τ1 and τ4 high priority (on processors 1 and 2, respectively), while τ2 and τ3 get low priority. With this estimate of response times, we can determine the deadlines as d1 = 4.85, d2 = 9.15, d3 = 9.27, and d4 = 2.73, with a gain Δ = 1.076.

The above example illustrates several aspects of our methodology, which we now highlight: (1) it shows how the linear response time model and the hybrid model help us identify schedulability bottlenecks; (2) it shows how the bottlenecks can be used to identify the tradeoffs needed to improve the overall solution; and (3) it shows how selective prioritization may be used to isolate the second tradeoff (which, we believe, is the major source of complexity for this problem) to a limited set of tasks. Unfortunately, it does not give all the answers: heuristics or human guidance are still needed to make the correct tradeoffs. However, we believe that our methodology provides a substantial benefit by isolating and identifying the nature of the tradeoffs involved.
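The four-way comparison above can be reproduced with standard fixed-priority response-time analysis restricted to τ1..τ4 (τ5 and τ6 are ignored, as the text suggests). This is a sketch under our own naming; the gain is computed here directly as the largest uniform scaling of the response times that still meets D1' = 14 and D2' = 12, a shortcut that coincides with the decomposition algorithm for these two disjoint constraints:

```python
import itertools
import math

# (period, execution time) for tau_1..tau_4; tau_5 and tau_6 are ignored.
TASKS = {1: (15, 4.5), 2: (15, 6.0), 3: (10, 4.0), 4: (10, 2.5)}

def response_times(hp):
    """Classic fixed-priority recurrence r = e_i + sum_{j in hp(i)} ceil(r/T_j)*e_j.
    hp maps each task to the list of higher-priority tasks on its processor."""
    out = {}
    for i, (T, e) in TASKS.items():
        r, prev = e, None
        while r != prev:
            prev = r
            r = e + sum(math.ceil(prev / TASKS[j][0]) * TASKS[j][1] for j in hp[i])
        out[i] = r
    return out

best = None
for hi1, hi2 in itertools.product([1, 3], [2, 4]):   # high task on proc 1, proc 2
    lo1, lo2 = (3 if hi1 == 1 else 1), (4 if hi2 == 2 else 2)
    hp = {hi1: [], hi2: [], lo1: [hi1], lo2: [hi2]}
    r = response_times(hp)
    # Largest uniform scaling of the r_i's meeting D1'=14 and D2'=12.
    gain = min(14 / (r[1] + r[2]), 12 / (r[3] + r[4]))
    if best is None or gain > best[0]:
        best = (gain, hi1, hi2)
```

Enumerating the combinations picks τ1 and τ4 as the high-priority tasks, with a gain of 14/13 ≈ 1.077, matching the table and the value 1.076 quoted in the text.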
5 Conclusion

In this paper we have presented a practical engineering technique for decomposing end-to-end delays in distributed real-time systems. Our technique makes use of approximate response times and the critical scaling factor to model the schedulability of tasks. It computes an initial solution very fast, and helps programmers identify the bottlenecks in their design. We have also proposed a strategy for a heuristic solver to iteratively fine-tune the initial solution.

There is room for further improvement and future research. First, the effectiveness of the linear response time model should be analyzed through experimentation. Second, we need to develop heuristics to make the tradeoffs needed to iteratively refine the initial solution. Third, our technique should be integrated into the complete design tool shown in Figure 1; we are currently implementing the tool. Lastly, we believe that some of the concepts introduced here may be useful in adaptive real-time domains such as multimedia applications.
References

[1] N. Audsley. Optimal priority assignment and feasibility of static priority tasks with arbitrary start times. Technical Report YCS 164, Department of Computer Science, University of York, England, December 1991.

[2] R. Gerber, S. Hong, and M. Saksena. Guaranteeing end-to-end timing constraints by calibrating intermediate processes. In Proceedings of the IEEE Real-Time Systems Symposium, pages 192-203. IEEE Computer Society Press, December 1994.

[3] R. Gerber, S. Hong, and M. Saksena. Guaranteeing real-time requirements with resource-based calibration of periodic processes. IEEE Transactions on Software Engineering, 21(7), July 1995.

[4] J. Lehoczky, L. Sha, and Y. Ding. The rate monotonic scheduling algorithm: Exact characterization and average case behavior. In Proceedings of the IEEE Real-Time Systems Symposium, pages 166-171. IEEE Computer Society Press, December 1989.

[5] M. Saksena and S. Hong. Resource conscious design of real-time systems: An end-to-end approach. Technical Report ASRI-TR-95-01, Automation and Systems Research Institute, Seoul National University, Korea, November 1995.

[6] J. Stankovic, M. Spuri, M. Di Natale, and G. Buttazzo. Implications of classical scheduling results for real-time systems. IEEE Computer, pages 16-25, June 1995.

[7] K. Tindell, A. Burns, and A. Wellings. An extendible approach for analysing fixed priority hard real-time tasks. The Journal of Real-Time Systems, 6(2):133-152, March 1994.