Dynamic Scheduling of Parallelizable Tasks and Resource Reclaiming in Real-time Multiprocessor Systems

G. Manimaran and C. Siva Ram Murthy
Department of Computer Science and Engineering
Indian Institute of Technology, Madras 600 036, INDIA
Fax: 91-44-2350509
Email: {gmani@bronto., [email protected]}

This work was supported by the Indian National Science Academy and the Department of Science and Technology.

Abstract

Many time-critical applications require predictable performance, and tasks in these applications have deadlines to be met despite the presence of faults. In this paper, we propose a new dynamic non-preemptive scheduling algorithm for a relatively new task model, called the parallelizable task model, in which real-time tasks can be executed concurrently on multiple processors. We use this parallelism in tasks to meet their deadlines and thus obtain better processor utilization than non-parallelizable task scheduling algorithms. We assume that tasks are aperiodic. Further, each task is characterized by its deadline, resource requirements, and worst case computation time on p processors, where p is the degree of task parallelization. To study the effectiveness of our algorithm, we have conducted extensive simulation studies and compared its performance with the myopic scheduling algorithm [8], the scheduling algorithm used in the Spring kernel. We found that the success ratio offered by our algorithm is always higher than that of the myopic algorithm for a wide variety of task parameters. Also, we propose a resource reclaiming algorithm to reclaim resources from parallelizable real-time tasks when their actual computation times are less than their worst case computation times. Our parallelizable task scheduling, together with its associated reclaiming, offers the best guarantee ratio compared to the other algorithmic combinations.

1 Introduction

Multiprocessors have emerged as a powerful computing means for real-time applications such as flexible manufacturing and process control because of their capability for high performance and reliability. The problem of multiprocessor scheduling is to determine when and where a given task executes, and this can be done either statically or dynamically. In static algorithms, the assignment of tasks to processors and the times at which the tasks start execution are determined a priori. Static algorithms are often used to schedule periodic tasks with hard deadlines. However, this approach is not applicable to aperiodic tasks whose arrival times and deadlines are not known a priori. Scheduling such tasks in a multiprocessor real-time system requires dynamic scheduling algorithms. In dynamic scheduling [2, 8], when new tasks arrive, the scheduler dynamically determines the feasibility of scheduling these new tasks without jeopardizing the guarantees that have been provided for the previously scheduled tasks. A feasible schedule is generated if the timing and resource constraints of all the tasks can be satisfied; tasks are dispatched according to this feasible schedule. Dynamic scheduling algorithms can be either distributed or centralized. In a centralized scheme, all the tasks arrive at a central processor, called the scheduler, from where they are distributed to the other processors in the system for execution. In this paper, we assume a shared-memory multiprocessor with a centralized scheduling scheme. The communication between the scheduler and the processors is through dispatch queues; each processor has its own dispatch queue. This organization, shown in Fig.1, ensures that the processors will always find some tasks in the dispatch queues when they finish the execution of their current tasks. The scheduler runs in parallel with the processors, scheduling the newly arriving tasks and periodically updating the dispatch queues. For this parallel operation, the scheduler has to ensure that the dispatch queues are always filled to their minimum capacity (if there are tasks left with it). This minimum capacity depends on the average time required by the scheduler to reschedule its tasks upon the arrival of a new task [9].

[Fig.1 Parallel execution of scheduler and processors: new tasks enter the task queue; the scheduler, holding the current (feasible) schedule, fills the per-processor dispatch queues (P1, P2, P3), each kept at a minimum length.]
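To make this organization concrete, here is a rough sketch of the scheduler's periodic top-up of the dispatch queues. It is an illustration only; the paper gives no code, and the function and variable names are our own.

```python
from collections import deque

def fill_dispatch_queues(scheduled_tasks, dispatch_queues, min_len):
    # One scheduler pass: top up each per-processor dispatch queue to its
    # minimum length so that processors never idle while tasks remain.
    # scheduled_tasks holds tasks in the order of the current feasible schedule.
    for dq in dispatch_queues:
        while len(dq) < min_len and scheduled_tasks:
            dq.append(scheduled_tasks.popleft())

# The scheduler would invoke this periodically, running in parallel with the
# processors, which consume tasks from their own queues.
tasks = deque(["T1", "T2", "T3", "T4", "T5", "T6"])
queues = [deque(), deque(), deque()]
fill_dispatch_queues(tasks, queues, min_len=2)
```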

It was shown in [2] that there does not exist an algorithm for optimally scheduling dynamically arriving tasks, with or without mutual exclusion constraints, on a multiprocessor system. These negative results motivated the need for heuristic approaches to the scheduling problem. Recently, many heuristic scheduling algorithms [8] have been proposed to dynamically schedule a set of tasks with computation times, deadlines, and resource requirements. For multiprocessor systems with resource constrained tasks, a heuristic search algorithm, called the myopic scheduling algorithm, was proposed in [8]. The authors of [8] have shown that an integrated heuristic which is a function of the deadline and the earliest start time of a task performs better than simple heuristics such as EDF, least laxity first, and minimum processing time first.

Meeting deadlines and achieving high resource utilization are the two main goals of task scheduling in real-time systems. Both preemptive and non-preemptive algorithms are available in the literature to satisfy these requirements. The schedulability of a preemptive algorithm is always higher than that of its non-preemptive version; however, this higher schedulability comes at the cost of higher scheduling overhead. Parallelizable task scheduling, considered in this paper, is an intermediate solution which tries to meet the conflicting requirements of higher schedulability and low overhead. Most of the known scheduling algorithms [8] assume that each task can be executed on a single processor only. This may result in missed task deadlines when tasks' laxities are tight. Moreover, a task necessarily misses its deadline when its total computation time requirement exceeds the time to its deadline. These are the motivating factors for parallelizable task scheduling.

Parallelizable real-time task scheduling has wide applicability in problems such as robot arm dynamics and image processing. For example, the robot arm dynamics problem consists of two computational modules: computation of the dynamics and the solution of a linear system of equations; both exhibit a high degree of parallelism and have real-time constraints [10]. Similarly, a typical real-time image processing application involves pixel-level operations, such as convolution, which can be carried out in parallel on different portions of the image, and operations in a task such as matching, grouping, and splitting of objects can also be done in parallel [3]. The NP-completeness of several cases of parallelizable task scheduling has been proved in [5]. Also, a heuristic algorithm for finding an approximate task partition on two processors was proposed in [5]. In [6], under a linear overhead assumption, an optimal pseudo-polynomial time algorithm is proposed to schedule imprecise computational tasks in real-time systems. In [1], algorithms for scheduling real-time tasks on a partitionable hypercube multiprocessor are proposed; here, the degree of parallelization of a task is not determined by the scheduler but is specified as part of the task itself on its arrival, i.e., the scheduler cannot change the degree of parallelization of a task to meet its deadline. Most importantly, the algorithms reported in [1, 5, 6] do not consider resource constraints among tasks, which is a practical requirement in any complex real-time system.

The rest of the paper is structured as follows: The system model and some definitions are stated in Section 2. In Section 3, we present our parallelizable task scheduling algorithm, and in Section 4, we evaluate its performance. Section 5 first describes the problem of reclaiming resources from parallelizable tasks and then presents a solution for the same. Finally, some concluding remarks are made in Section 6.

2 Task Model

We assume that the real-time system consists of m processors, where m > 1, and aperiodic tasks with the following characteristics (a minimal code sketch of this model follows the list):

1. Each aperiodic task T_i has a ready time r_i and a deadline d_i, which are known only on its arrival.

2. c_i^j is the worst case computation time of T_i when run on j processors in parallel, where 1 <= j <= m; it is an upper bound on the computation time. The actual computation time of a task T_i, when executed on j processors, is the execution time actually taken by T_i at run-time using j processors. We assume that the values of c_i^j are known through static code analysis.

3. Resource constraints: A task might need some resources, such as data structures, variables, and communication buffers, for its execution. Every task can have two types of accesses to a resource: (a) exclusive access, in which case no other task can use the resource with it, or (b) shared access, in which case it can share the resource with another task (the other task also should be willing to share the resource). A resource conflict exists between two tasks T_i and T_j if both of them require the same resource and one of the accesses is exclusive.

4. When a task is parallelized, all its parallel subtasks, also called split tasks, have to start at the same time. This is necessary to achieve efficient synchronization among split tasks.

5. Tasks are non-preemptable, i.e., once a task is scheduled on one or more processors, it runs to completion.

6. For each task T_i, the worst case computation times satisfy j * c_i^j <= k * c_i^k for any j < k. This is called the sublinear speedup assumption.
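The task model can be captured in a few lines of Python. This is a minimal sketch with illustrative names of our own choosing (the paper gives no code); wcc[j-1] stores c_i^j.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    # Aperiodic task of Section 2; field names are illustrative.
    ready: int                 # ready time r_i, known only on arrival
    deadline: int              # deadline d_i
    wcc: list                  # wcc[j-1] = c_i^j, worst case time on j processors
    resources: dict = field(default_factory=dict)  # resource id -> "shared" | "exclusive"

def satisfies_sublinear_speedup(task: Task) -> bool:
    # Assumption 6: j * c_i^j <= k * c_i^k for all j < k.
    w = task.wcc
    return all(j * w[j - 1] <= k * w[k - 1]
               for j in range(1, len(w) + 1)
               for k in range(j + 1, len(w) + 1))
```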

2.1 Terminology

Definition 1: The scheduler fixes a feasible schedule S taking into account the resource requirements of all the tasks. A feasible schedule uses the worst case computation time of each task for scheduling it and ensures that the deadlines of all the tasks in S are met. A task, when parallelized, is scheduled on more than one processor. A partial schedule is one which does not contain all the tasks.

Definition 2: A partial schedule is said to be strongly feasible if all the schedules obtained by extending the current schedule by any one of the remaining tasks are also feasible [8].

Definition 3: start_time(T_i) is the scheduled start time of task T_i, which satisfies r_i <= start_time(T_i) <= d_i - c_i. finish_time(T_i) is the scheduled finish time of task T_i, which satisfies r_i + c_i <= finish_time(T_i) <= d_i.

Definition 4: EAT_k^s (EAT_k^e) is the earliest time when resource R_k becomes available for shared (exclusive) usage [8].

Definition 5: Let P be the set of idle processors and Q be the set of resources requested by task T_i. The earliest start time of task T_i, denoted EST(T_i), is the earliest time when its execution can be started: EST(T_i) = max(r_i, min_{P_j in P}(avail_time(P_j)), max_{R_k in Q}(EAT_k^u)), where u = s for shared mode, u = e for exclusive mode, and avail_time(P_j) denotes the time at which processor P_j becomes available for executing a task.
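Definition 5 translates directly into code. The following sketch assumes the Task type above, a list avail_time of availability times for the idle processors, and dictionaries eat_shared and eat_exclusive for EAT_k^s and EAT_k^e; all of these names are ours, not the paper's.

```python
def earliest_start_time(task, avail_time, eat_shared, eat_exclusive):
    # EST(T_i) = max(r_i, min over idle processors of avail_time(P_j),
    #                max over requested resources of EAT_k^u).
    # avail_time is assumed non-empty (P is the set of idle processors).
    resource_ready = max(
        (eat_shared[k] if mode == "shared" else eat_exclusive[k]
         for k, mode in task.resources.items()),
        default=0,   # no resources requested
    )
    return max(task.ready, min(avail_time), resource_ready)
```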

3 The Parallelizable Task Scheduling Algorithm

In this section, we present our parallelizable task scheduling algorithm, which is a heuristic search algorithm similar to the myopic algorithm [8] except that it parallelizes a task whenever the task's deadline cannot be met otherwise. The degree of parallelization (i.e., the number of split tasks) of a task is chosen such that the task's deadline is just met. For scheduling a task, the processor(s) and the resource(s) which have the minimum earliest available time are selected. The parallelizable task scheduling algorithm is given below; a code sketch of step 2 appears at the end of this section.

1. Order the tasks (in the task queue) in non-decreasing order of their deadlines and start with an empty partial schedule.

2. Determine whether the current schedule is strongly feasible. This is done for at most K tasks in the task queue, which we call the feasibility check window.
(a) Let K' be the actual number of tasks checked for feasibility within the feasibility check window.
(b) Let num-split be the maximum degree of parallelization of a task.
(c) Let split be the minimum degree of parallelization required to just meet a task's deadline; its value lies in [1, num-split].
(d) Let cost be the number of splits encountered over all the tasks.
(e) K' = 0; cost = 0; feasible = TRUE.
(f) While (cost < K and feasible = TRUE)
    i. If (K - cost < num-split) then num-split = K - cost.
    ii. Check whether the (K'+1)-th task of the feasibility check window is schedulable without parallelizing it.
    iii. If (not schedulable) then schedule the task by parallelizing it.
    iv. If (still not schedulable) then feasible = FALSE else K' = K' + 1.
    v. cost = cost + split.

3. If feasible is TRUE:
(a) Compute the heuristic function H for the first K' tasks.
(b) Choose the task with the best (smallest) H value to extend the schedule.

4. Else backtrack to the previous search level.

5. Repeat steps (2)-(4) until a termination condition is met. The termination conditions are: (a) a complete feasible schedule has been found, (b) the maximum number of backtracks or H function evaluations has been reached, or (c) no more backtracking is possible.

For computing H, an integrated heuristic function H(T_i) = d_i + W * EST(T_i), which captures the deadline and resource requirements of task T_i, is used, where the weight W is an input parameter. The time complexity of the parallelizable task scheduling algorithm for scheduling n tasks is O(Kn), which is the same as that of the myopic algorithm. The value of K' depends on the number of tasks which have been parallelized and their degrees of parallelization. For example, when K' = K, all the tasks in the feasibility check window are checked for feasibility without parallelization, and when K' = 1, the first task in the feasibility check window is checked for feasibility with the maximum degree of parallelization. This is how we equate the cost of the parallelizable task scheduling algorithm to that of the myopic scheduling algorithm.
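The loop in step 2(f) trades window width (K') against parallelization budget (cost). The sketch below is one possible formulation; try_schedule is a hypothetical helper, not specified in the paper, that returns the smallest degree of parallelization, up to the given cap, at which the task fits feasibly into the partial schedule, or None if none does.

```python
def check_window(tasks, K, num_split, try_schedule):
    # Step 2: strong feasibility over the check window with a split budget.
    k_prime, cost, feasible = 0, 0, True
    while cost < K and feasible and k_prime < len(tasks):
        cap = min(num_split, K - cost)               # step (f)i: cap the degree
        split = try_schedule(tasks[k_prime], 1)      # step (f)ii: no parallelization
        if split is None:
            split = try_schedule(tasks[k_prime], cap)  # step (f)iii: parallelize
        if split is None:                            # step (f)iv: still infeasible
            feasible = False
        else:
            k_prime += 1
            cost += split                            # step (f)v
    return feasible, k_prime

def heuristic(task, est, W):
    # Integrated heuristic H(T_i) = d_i + W * EST(T_i) used in step 3.
    return task.deadline + W * est
```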

4 Simulation Studies

To study the effectiveness of task parallelization in meeting task deadlines, we have conducted extensive simulation studies. Here, we are interested in whether or not all the tasks in a task set can finish before their deadlines. Therefore, the most appropriate metric is the schedulability of task sets [8], called the success ratio, which is defined as the ratio of the number of task sets found schedulable (by a scheduling algorithm) to the number of task sets considered for scheduling. Since there is no dynamic algorithm for scheduling parallelizable real-time tasks, we compare our algorithm with the well known Spring scheduling algorithm, which schedules sequential tasks. In our study, we assume that c_i^1 <= d_i for each task T_i in order to compare the results with a non-parallelizable task scheduling algorithm. Feasible task sets are generated for simulation using the following approach (the recurrence in step 4 is sketched in code below):

1. Tasks (of a task set) are generated up to the schedule length, which is an input parameter, with no idle time in the processors, as described in [8]. The computation time c_i^1 of a task T_i is chosen randomly between MIN_C and MAX_C.

2. The deadline of a task T_i is chosen randomly in the range [SC, (1 + R) * SC], where SC is the shortest completion time of the task set generated in the previous step.

3. The resource requirements of a task are generated based on the input parameters UseP and ShareP.

4. The computation time c_i^j of a task T_i when executed on j processors, j >= 2, is c_i^j = floor(c_i^{j-1} * (j-1)/j) + 1. For example, when c_i^1 = 12, the computation times c_i^2, c_i^3, and c_i^4 are 7, 5, and 4, respectively.

Each point in the performance curves (Figs.3-5) is the average of 5 simulation runs, each with 200 task sets. Each task set contains approximately 175 to 200 tasks, obtained by fixing the schedule length to 800 during task set generation. For all the simulation runs, the number of instances of every resource is taken as 2. The values chosen for the fixed parameters are representative values. Figs.3-5 show the success ratio obtained by varying R, UseP, and num-btrk, respectively. When num-split is 1, a task is considered to be non-parallelizable and the algorithm behaves like the myopic algorithm. Note that the scheduling costs for different values of num-split are equal. This is achieved by making the number of tasks checked for feasibility (K') a variable, as discussed in the previous section, i.e., when num-split = 1, K' = K, and K' is less than K for num-split > 1. From Figs.3-5, it is interesting to note that an increase in the degree of parallelization (based on the sublinear speedup function used here) increases the success ratio.
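The recurrence in step 4 is easy to check in code; a small sketch follows (the function name is ours).

```python
import math

def wcc_profile(c1, m):
    # c_i^j = floor(c_i^{j-1} * (j-1)/j) + 1 for j = 2..m (step 4 above).
    wcc = [c1]
    for j in range(2, m + 1):
        wcc.append(math.floor(wcc[-1] * (j - 1) / j) + 1)
    return wcc

# Paper's example: c_i^1 = 12 gives 7, 5, and 4 on 2, 3, and 4 processors.
assert wcc_profile(12, 4) == [12, 7, 5, 4]
```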

4.1 Effect of Laxity Parameter

Fig.3 shows the effect of the laxity parameter (R) on the success ratio. This helps in investigating the sensitivity of task parallelization to varying laxities. From Fig.3, it is clear that lower values of num-split are more sensitive to changes in R than higher values of num-split. For example, the success ratio offered by num-split=1 varies from 47.2% to 99.4%, compared to a variation of 68.5% to 99.7% for num-split=4. This is due to the fact that tasks undergo a higher degree of parallelization (in order to meet their deadlines) when their laxities are tight, whereas the same task sets with higher laxities rarely need parallelization since their deadlines can be met without parallelizing the tasks. This shows that task parallelization is more effective for tasks having tighter laxities.

parameter   explanation
MIN_C       minimum computation time of tasks
MAX_C       maximum computation time of tasks
R           laxity parameter; decides the deadline of tasks
UseP        probability that a task uses a resource
ShareP      probability that a task uses a resource in shared mode
K           size of the feasibility check window
W           weightage given to EST(T_i)
num-btrk    number of backtracks permitted
num-proc    number of processors considered
num-res     number of resources considered

Fig.2 Simulation parameters

4.2 Effect of Resource Usage

From Fig.4, we observe that the success ratio decreases with increasing UseP. This is due to more resource conflicts among tasks, which make the value of EST(T_i) decided by the availability of the required resources rather than by the availability of processors and the ready time of the task T_i. For lower values of resource usage (UseP), the difference between the success ratios offered by num-split=4 and num-split=1 is smaller than at higher values of UseP. This shows that task parallelization is more effective when the resource requirements of tasks are high.

[Fig.3 Effect of laxity parameter (R) on success ratio; num-proc = 10, num-res = 4, UseP = 0.6, ShareP = 0.5, num-btrk = 10, K = 7, W = 1.0; success ratio (40-100%) vs. R (0.04-0.18), one curve per num-split = 1, 2, 3, 4]

[Fig.4 Effect of resource usage probability (UseP) on success ratio; num-proc = 10, num-res = 4, ShareP = 0.5, R = 0.09, num-btrk = 10, K = 7, W = 1.0; success ratio (75-100%) vs. UseP (0.2-0.7), one curve per num-split = 1, 2, 3, 4]

[Fig.5 Effect of number of backtracks on success ratio; num-proc = 10, num-res = 4, UseP = 0.6, ShareP = 0.5, R = 0.09, K = 7, W = 1.0; success ratio (85-95%) vs. num-btrk (0-25), one curve per num-split = 1, 2, 3, 4]

4.3 Effect of Number of Backtracks

In Fig.5, the impact of the number of backtracks on the success ratio is plotted for various values of num-split. From the plot, it is interesting to note that the success ratio does not improve significantly with increasing values of num-btrk, for all values of num-split. This clearly motivates the need for techniques which increase the success ratio with increasing scheduling cost while keeping the number of backtracks fixed. The parallelization of tasks proposed in this paper is one such technique, as demonstrated by our simulation results for different values of num-split.

5 Resource Reclaiming

Resource reclaiming refers to the problem of reclaiming the resources left unused by a real-time task when it takes less time to execute than its worst case computation time, or when a task is deleted from the current schedule; reclaiming is invoked by each processor on completion of its currently executing task. Task deletion takes place when extra tasks are initially scheduled to account for fault tolerance: when no faults occur, there is no necessity for these temporally redundant tasks to be executed, and hence they can be deleted. Resource reclaiming on multiprocessor systems with independent tasks is straightforward. A resource reclaiming algorithm is said to be work-conserving if it never leaves a processor idle when there is a dispatchable task. But resource reclaiming in multiprocessor systems with resource and precedence constrained tasks is more complicated, owing to the potential parallelism provided by a multiprocessor and the potential resource and precedence constraints among tasks. When the actual computation time of a task differs from its worst case computation time in a non-preemptive multiprocessor schedule with resource constraints, run-time anomalies [4] may occur. These anomalies may cause some of the already guaranteed tasks to miss their deadlines. In particular, one cannot simply use a work-conserving scheme without verifying that the task deadlines will not be missed. Therefore, a resource reclaiming algorithm is said to be correct only if it does not result in run-time anomalies.

5.1 A New Reclaiming Algorithm

In [9], two algorithms, Basic algorithm and Early start, were proposed for resource reclaiming in multiprocessor real-time systems with resource constraints among tasks. In [7], a new data structure, the restriction vector (RV), was introduced, and RV-based resource reclaiming algorithms were proposed for tasks having both resource and precedence constraints. These algorithms were designed for sequential tasks and cannot be applied to the parallelizable tasks considered here: when used directly, they violate the assumption (requirement) that all the split tasks of a task start at the same time. We propose the following extensions, which can be applied to any of the above mentioned reclaiming algorithms:

1. The reclaiming algorithm should ensure that all split tasks of a task start at the same time, to satisfy the parallelizable task property.

2. To improve the guarantee ratio, the reclaiming algorithm can attempt to reduce the degree of parallelization of a task in the schedule. This may be possible due to the resources reclaimed while executing the previous tasks. Reducing the degree of parallelization of a task reduces the parallelizing overheads (sublinear speedup) and thereby increases processor and resource utilization, which in turn increases the guarantee ratio. While doing this, it must be ensured that the finish time of the task after reducing its degree of parallelization is less than or equal to its scheduled finish time; this is necessary to avoid run-time anomalies. (A sketch of this check follows the discussion of Fig.6 below.)

[Fig.6 Resource reclaiming from a parallelized task Tk (split tasks Tk* on processors P1-P3):
Fig.6a Pre-run schedule: WCC(T1) = WCC(T2) = WCC(T3) = 7; WCC(Tk) = 4 on 3 processors; Tk runs in [7, 11].
Fig.6b Post-run schedule with no change in parallelization: ACC(T1) = 6, ACC(T2) = 6, ACC(T3) = 4; ACC(Tk) = 4 on 3 processors; Tk starts at 6 and completes at 10.
Fig.6c Post-run schedule with reduced parallelization for Tk: ACC(T1) = 6, ACC(T2) = 5, ACC(T3) = 5; ACC(Tk) = 5 on 2 processors (P2, P3); Tk starts at 5 and completes at 10.]

Fig.6 illustrates the proposed extensions. Fig.6a is the pre-run schedule produced by the scheduler, and Fig.6b and Fig.6c are the post-run schedules produced after applying our first and second extensions, respectively. Note that, in Fig.6b, even though processor P3 finishes early (at time 4), it has to wait till the other processors (P1 and P2) are ready (at time 6) to execute the split tasks of Tk. The worst case computation times (WCC) and the actual computation times (ACC) of the tasks are indicated in Fig.6. The worst case computation times of task Tk are 8, 5, and 4 on 1, 2, and 3 processors, respectively. In Fig.6c, the degree of parallelization of task Tk has been reduced from three to two, which brings the completion time of Tk down from 11 (Fig.6a) to 10. Such useful processor utilization (with reduced parallelizing overheads) results in an improved guarantee ratio, and the same is achieved by our reclaiming algorithm. We have also conducted simulations to study the parallelizable task scheduling algorithm together with its associated reclaiming algorithm. We found that parallelized scheduling with the proposed reclaiming offers a better guarantee ratio compared to the other combinations. Due to space limitations, we do not present these results.
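Under our assumptions, the second extension amounts to a simple check at reclaim time. The sketch below is one possible formulation, not the paper's exact procedure: degree, finish_time, the WCC profile wcc, and the common start time now are illustrative names, and extension 1 is honored by starting all split tasks together at now.

```python
def reduced_degree(task, wcc, now, free_procs):
    # Extension 2: try to run the task on fewer processors, all split tasks
    # starting together at `now` (extension 1), provided the new finish time
    # does not exceed the scheduled finish time (no run-time anomaly).
    for degree in range(1, task.degree):         # prefer the smallest safe degree
        if degree <= free_procs and now + wcc[degree - 1] <= task.finish_time:
            return degree                        # safe to de-parallelize
    return task.degree                           # otherwise keep the scheduled degree

# Fig.6c: Tk is scheduled on 3 processors with scheduled finish time 11 and
# WCC profile (8, 5, 4). At time 5, two processors are free: 5 + 5 = 10 <= 11,
# so the degree is reduced to 2 and Tk completes at 10.
```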

6 Conclusions

In this paper, we have proposed a new algorithm for the dynamic scheduling of parallelizable tasks in multiprocessor real-time systems. We have demonstrated through simulation that task parallelization is a useful concept for achieving better schedulability than allowing a larger number of backtracks without parallelization. The simulation studies show that the success ratio offered by our algorithm is always higher than that of the myopic algorithm for a wide variety of task parameters. Also, we have extended the existing resource reclaiming algorithms to handle parallelizable tasks. The simulation studies of parallelizable task scheduling show that task parallelization is more effective for tasks having tighter laxities and high resource conflicts.

References

[1] D. Babbar and P. Krueger, "On-line hard real-time scheduling of parallel tasks on partitionable multiprocessors," Intl. Conf. on Parallel Processing, vol.2, pp.29-38, 1994.
[2] M.L. Dertouzos and A.K. Mok, "Multiprocessor on-line scheduling of hard real-time tasks," IEEE Trans. on Software Engg., vol.15, no.12, pp.1497-1506, Dec. 1989.
[3] I. Ekmecic, I. Tartalja, and V. Milutinovic, "A survey of heterogeneous computing: concepts and systems," Proc. IEEE, vol.84, no.8, pp.1127-1146, Aug. 1996.
[4] R.L. Graham, "Bounds on multiprocessing timing anomalies," SIAM J. Appl. Math., vol.17, no.2, Mar. 1969.
[5] C.C. Han and K.J. Lin, "Scheduling parallelizable jobs on multiprocessors," Real-Time Systems Symposium, pp.59-67, 1989.
[6] J.W.S. Liu, K.J. Lin, W.K. Shih, A.C. Yu, J.Y. Chung, and W. Zhao, "Algorithms for scheduling imprecise computations," IEEE Computer, pp.58-68, May 1991.
[7] G. Manimaran, C. Siva Ram Murthy, Machiraju Vijay, and K. Ramamritham, "New algorithms for resource reclaiming from precedence constrained tasks in multiprocessor real-time systems," to appear in Journal of Parallel and Distributed Computing, 1997.
[8] K. Ramamritham, J.A. Stankovic, and P.-F. Shiah, "Efficient scheduling algorithms for real-time multiprocessor systems," IEEE Trans. on Parallel and Distributed Systems, vol.1, no.2, pp.184-194, Apr. 1990.
[9] C. Shen, K. Ramamritham, and J.A. Stankovic, "Resource reclaiming in multiprocessor real-time systems," IEEE Trans. on Parallel and Distributed Systems, vol.4, no.4, pp.382-397, Apr. 1993.
[10] A.Y. Zomaya, "Parallel processing for real-time simulation: a case study," IEEE Parallel & Distributed Technology, pp.49-56, June 1996.
