Heuristic scheduling algorithms to access the critical section in Shared Memory Environment

Tahany A. Fergany, Engineering Mathematics Department, Faculty of Engineering, Cairo University, Egypt.
Reda A. Ammar, Computer Science Department, University of Connecticut, Storrs, CT 06269-3155, USA.
Ali I. El-Desouky, Computer & Control Department, Faculty of Engineering, Mansoura University, Egypt.
Mohamed M. Hefeeda, Computer & Control Department, Faculty of Engineering, Mansoura University, Egypt.
Abstract
In a shared memory parallel processing environment, shared variables facilitate communication among processes. To protect the shared variables from concurrent access by more than one process at a time, they are placed in a critical section. Scheduling a set of parallel processes to access this critical section so as to minimize the time spent executing these processes is a crucial problem in parallel processing. This paper presents heuristic scheduling algorithms to access this critical section.

1 Introduction
The increasing demand for faster computers has led to the availability of many parallel computers. It is hoped that computationally intensive applications which are impracticable today will become practicable through their execution on highly parallel computers. A number of factors, however, hinder the growth of parallel computing. First, there is a substantial investment in sequential programming tools that aid in program testing, execution profiling, and interactive debugging. Second, there is no single, predominant parallel architecture. Third, it is difficult to develop efficient programs for parallel computers. This paper addresses one of the obstacles that hinders producing efficient parallel programs, namely accessing the shared variables. In parallel programs, parallelism is gained through process creation. One of the most common mechanisms proposed for the creation of processes is the FORK/JOIN mechanism [1, 9], where the FORK statement spawns several processes and the JOIN statement is used to synchronize the termination of processes. The portion of the program between the FORK and the JOIN is called the parallel structure. The semantics of the parallel structure require that exactly those processes created by the FORK operation terminate at the associated JOIN operation, and that no operation after the JOIN can start until all processes created by the FORK are completed.

The cooperation of n processes to solve a problem is useful only if the partial results are efficiently exchanged between the processes. Shared variables facilitate communication among the processes, but they must be protected from the nondeterminism that can result from concurrent access by more than one process at a time. To protect the shared variables from nondeterminism, the code that handles these variables is placed in a critical section [1, 6, 9]. The critical section is a section of code which can be executed by only one process at a time and which, once started, runs to completion without interruption. Unfortunately, accessing the critical section by different processes creates a serial bottleneck that can seriously impair the performance of the software. Since shared memory multiprocessors are becoming more important in commercial environments, it becomes necessary to schedule shared memory access in the most efficient way. The scheduling problem [2-5, 7, 8, 10, 11] is complicated by the fact that each branch of the parallel structure resulting from the FORK operation includes the time to process the portion of the code before accessing the shared variables, the time to access the shared variables, and the time to process the portion of the code after using the shared variables, which may all be different. In order to make optimization possible, it is necessary to have an approach to quantify the time costs of parallel computations. The time cost of processes that require access to the critical section can then be minimized by using a suitable scheduling method. The computation structure model [9] is used to represent the detailed time cost of a parallel structure. This model assumes that the underlying computer system has a finite number of processors of the same speed which communicate with each other through a shared memory. In the computation structure model, lock nodes are used to obtain locks on shared data and unlock nodes are used to release these locks. These locks facilitate protection of the shared variables.
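To make the lock/unlock protection concrete, the following is a minimal C++ sketch (our illustration, not code from the paper) of a FORK/JOIN-style parallel structure in which each branch performs a pre-lock job, enters a mutex-protected critical section to update a shared variable, and then performs its remaining job. The job durations and the shared counter are illustrative assumptions only.

#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative sketch: each branch = pre-lock job (PLJ), critical section (LASV), remaining job (RJ).
int shared_result = 0;          // shared variable protected by the critical section
std::mutex cs_lock;             // lock/unlock nodes of the computation structure model

void branch(int plj_ms, int lasv_ms, int rj_ms) {
    std::this_thread::sleep_for(std::chrono::milliseconds(plj_ms));    // PLJ: work before the lock
    {
        std::lock_guard<std::mutex> guard(cs_lock);                    // lock node
        shared_result += lasv_ms;                                      // access the shared variable
        std::this_thread::sleep_for(std::chrono::milliseconds(lasv_ms));
    }                                                                  // unlock node (guard released)
    std::this_thread::sleep_for(std::chrono::milliseconds(rj_ms));     // RJ: work after the unlock
}

int main() {
    std::vector<std::thread> branches;                 // FORK: spawn n parallel branches
    branches.emplace_back(branch, 10, 50, 60);
    branches.emplace_back(branch, 20, 70, 40);
    branches.emplace_back(branch, 30, 60, 80);
    for (auto& t : branches) t.join();                 // JOIN: wait for all branches
    return 0;
}

Because the mutex serializes the middle part of every branch, the order in which the branches obtain the lock determines the overall completion time; this is the scheduling problem studied in the rest of the paper.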
2 Previous Research Efforts
Previously [2, 4, 7, 8], algorithms were developed for accessing the critical section based on the time cost of the operations before the lock nodes, the time cost of the operations between the lock and unlock nodes, and the time cost of the operations after the unlock nodes. In the parallel structure in Fig. 1, assume that every two lock nodes are in conflict, and let:
Time cost of the Pre-lock Job = PLJi
Time cost of the Lock and Access Shared Variables = LASVi
Time cost of the Remaining Job = RJi
In order to schedule the operations between the FORK and JOIN nodes (that is, the PLJs, the LASVs, and the RJs) we considered eight possible cases which may arise in the parallel structure. These cases are listed in Table 1 along with their scheduling algorithms. In Table 1, "=" indicates that all jobs have the same time cost, and "≠" indicates that at least one job has a time cost different from the others. The algorithms for cases I, II, III, IV, V, and VII were mathematically proved to give the optimal solutions [7]. For cases VI and VIII, a Branch and Bound algorithm was developed which yields the minimum time cost for the parallel structure [7]. Although the Branch and Bound approach is a widely accepted technique [7], it is computationally expensive, especially when the problem size grows. Therefore, heuristic algorithms were introduced which can produce optimal or near optimal solutions.
[Fig. 1 Parallel structure model: a Fork node spawns n branches B1, ..., Bi, ..., Bn. Branch i performs its pre-lock job Bi (PLJ), then Locki, the shared-variable access Si, and Unlocki (together the LASV), followed by the remaining job Ai (RJ). All branches terminate at the Join node.]
In Fig. 1 we have a parallel structure with n branches that are all in conflict, i.e. they need to access the critical section simultaneously. In this parallel structure we can classify the operations into the following three categories:
1. The operation before accessing the critical section, defined as the Pre-Lock Job, PLJ.
2. The operation of accessing the critical section, which contains three sub-operations: the lock operation, which prevents other branches from accessing the critical section; the access to the shared variables; and the unlock operation, which frees the critical section for the other branches. This combined operation is defined as Lock and Access Shared Variables, LASV.
3. The operation after accessing the critical section, defined as the Remaining Job, RJ.
Algorithms were developed to schedule the access to the critical section [1, 6, 9]. The Branch and Bound algorithm [7, 8] was used to find the optimal order in which the conflicting processes access the critical section. Branch and Bound produces the optimal solution, but it may take a long time to find it, especially when the number of processes is large (greater than 8). Therefore, other heuristic algorithms were suggested [2, 4] which can produce optimal or near optimal solutions in a short time. These algorithms are called the comparison and adjustment algorithms. This paper first evaluates these algorithms by simulation programs and compares them; second, it presents a new algorithm which gives better results.
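Since all of the scheduling algorithms below are compared by the execution time they produce, it helps to make the underlying cost model explicit. The following is a minimal C++ sketch (our illustration, with our own helper names, not code from the paper) that computes the completion time of every branch, and the execution time of the parallel structure, for a given access order: a branch starts its LASV only when both its own PLJ and the previous branch's unlock have completed. The sample values in main are arbitrary.

#include <algorithm>
#include <iostream>
#include <vector>

struct Branch { double plj, lasv, rj; };   // time costs of one branch

// Completion time of each branch when the branches access the critical
// section in the given (already scheduled) order.
std::vector<double> finishTimes(const std::vector<Branch>& order) {
    std::vector<double> finish;
    double unlockTime = 0.0;                           // time at which the critical section becomes free
    for (const Branch& b : order) {
        double enterCS = std::max(b.plj, unlockTime);  // wait for own PLJ and for the lock
        unlockTime = enterCS + b.lasv;                 // lock, access shared variables, unlock
        finish.push_back(unlockTime + b.rj);           // remaining job after the unlock
    }
    return finish;
}

// Execution time of the whole parallel structure = time of the JOIN node.
double executionTime(const std::vector<Branch>& order) {
    std::vector<double> f = finishTimes(order);
    return *std::max_element(f.begin(), f.end());
}

int main() {
    std::vector<Branch> order = {{10, 50, 60}, {20, 70, 40}, {30, 60, 80}};
    std::cout << "execution time = " << executionTime(order) << "\n";
    return 0;
}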
Case   PLJ   LASV   RJ    Scheduling Algorithm
I      =     =      =     FCFS or LRJF
II     =     =      ≠     LRJF
III    =     ≠      =     FCFS or LRJF
IV     =     ≠      ≠     LRJF
V      ≠     =      =     FCFS
VI     ≠     =      ≠     Branch and Bound
VII    ≠     ≠      =     FCFS
VIII   ≠     ≠      ≠     Branch and Bound

FCFS: First Come First Served, LRJF: Longest Remaining Job First
Table 1 Scheduling Methods
2.1 Comparison Algorithm
This is a heuristic algorithm, i.e. one that is not mathematically proved, which finds optimal solutions in some cases and near optimal solutions in the others. It is simple compared to the Branch and Bound algorithm and therefore takes less time. For the parallel structure in Fig. 1 with n conflicting branches, the comparison algorithm is applied as follows (a code sketch is given after the steps):
1. Use the Longest Remaining Job First (LRJF) scheduling policy to order the branches of the given parallel structure.
2. If PLJi-1 < PLJi for every i = 2, 3, ..., n, then the branches also follow the First Come First Served (FCFS) policy at the same time. No additional movements are considered and the resulting order provides an optimal (or near optimal) solution.
3. If PLJi-1 > PLJi and PLJi-1 - PLJi < RJi-1 - RJi for some i = 2, 3, ..., n, reverse the order of branch i-1 and branch i.
4. Repeat step 3 until no more movements are possible.
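The following is a minimal C++ sketch of these steps as we read them (an illustration under our own naming, not the authors' code): the branches are first ordered by LRJF, and then adjacent branches are swapped while the condition of step 3 holds.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Branch { double plj, lasv, rj; };

// Comparison algorithm: LRJF ordering followed by pairwise adjustments.
std::vector<Branch> comparisonOrder(std::vector<Branch> branches) {
    // Step 1: Longest Remaining Job First.
    std::sort(branches.begin(), branches.end(),
              [](const Branch& a, const Branch& b) { return a.rj > b.rj; });

    // Steps 2-4: swap neighbouring branches while the rule of step 3 applies.
    bool moved = true;
    while (moved) {
        moved = false;
        for (std::size_t i = 1; i < branches.size(); ++i) {
            const Branch& prev = branches[i - 1];
            const Branch& cur  = branches[i];
            if (prev.plj > cur.plj && (prev.plj - cur.plj) < (prev.rj - cur.rj)) {
                std::swap(branches[i - 1], branches[i]);
                moved = true;
            }
        }
    }
    return branches;
}

The resulting order can then be fed to a completion-time routine such as the executionTime sketch above to obtain the cost of the schedule.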
2.2 Adjustment Algorithm
The comparison algorithm is easy to apply, but another round of adjustments is needed to produce the optimal solution. The adjustment process is based upon the following two phases of movements:
1. Look for a branch that follows the current maximum branch and whose communication cost is smaller than the communication cost of a branch that precedes the current maximum branch. Swapping these two branches may reduce the execution time of the parallel structure.
2. Move the maximum branch, the branch whose execution time is the longest, to the front of the waiting queue. In this way it can access the critical section earlier and hence its execution time is reduced.
This adjustment process is iterative and continues until no more improvement is possible. The comparison algorithm is used to derive the initial solution for the adjustment algorithm.
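As an illustration of phase 2 only, here is a hedged C++ sketch under our reading of the algorithm (our own helper names, not the authors' implementation): the maximum branch is swapped with branches in front of it until a swap strictly reduces the overall execution time. The completion-time model is the same one used throughout the paper.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Branch { double plj, lasv, rj; };

// Overall execution time of the parallel structure for a given access order.
double executionTime(const std::vector<Branch>& order) {
    double unlockTime = 0.0, makespan = 0.0;
    for (const Branch& b : order) {
        unlockTime = std::max(b.plj, unlockTime) + b.lasv;  // wait for PLJ and for the lock
        makespan = std::max(makespan, unlockTime + b.rj);   // finish time of this branch
    }
    return makespan;
}

// Phase 2: try to move the maximum branch (at position k, found by the caller)
// toward the front of the waiting queue.
bool applyPhase2(std::vector<Branch>& order, std::size_t k) {
    double best = executionTime(order);
    for (std::size_t i = 1; i <= k; ++i) {                  // displacement i = 1, 2, ..., k
        std::swap(order[k], order[k - i]);
        if (executionTime(order) < best) return true;       // keep the improving swap
        std::swap(order[k], order[k - i]);                  // otherwise restore and try further
    }
    return false;                                           // no improving position found
}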
3 The New Adjustment Algorithm
The adjustment algorithm produces optimal solutions in many cases and near optimal solutions in the others. Yet we can add another round of enhancement, phase 3, which enhances the original adjustment algorithm and produces better results. Phase 3 states that moving the longest waiting branch, the branch that finishes its PLJ operation and waits the longest time to access the critical section, to the front of the waiting queue of the critical section may reduce the overall execution time. Thus, the new algorithm consists of the following three phases:
1. Look for a branch that follows the current maximum branch and whose communication cost is smaller than the communication cost of a branch that precedes the current maximum branch. Swapping these two branches may reduce the execution time of the parallel structure.
2. Move the maximum branch, the branch whose execution time is the longest, to the front of the waiting queue. In this way it can access the critical section earlier and hence its execution time is reduced.
3. Move the longest waiting branch, the branch that finishes its PLJ operation and waits the longest time to access the critical section, to the front of the waiting queue of the critical section. Thus, it can access the critical section earlier, which reduces its execution time and the overall execution time.
Simulation results (see Section 4) showed that applying the new algorithm with the order phase 1, phase 2, and finally phase 3 gave better results than the original algorithm. Moreover, when we changed the order of the phases to phase 2, phase 1, and finally phase 3, the algorithm gave much better results. Other combinations of the three phases, however, gave worse results than the original algorithm. We tried the following combinations: (phase 2, phase 3, phase 1); (phase 2, phase 3, phase 1, phase 3); (phase 1, phase 3, phase 2, phase 3); (phase 1, phase 2, phase 3, phase 2, phase 3); all of them gave worse results.

3.1 Example
This example describes the application of the new algorithm to a parallel structure that consists of five branches; each branch has three time costs, PLJ, LASV, and RJ. The comparison algorithm is used to derive the initial solution. The following tables show the application of the new algorithm; the last column gives the total time cost of each branch, i.e. the time at which it completes its RJ.

Initial solution (the maximum branch is branch 4):
Branch   PLJ   LASV   RJ   Total time cost
1        12    55     67   134
2         4    71     59   197
3        20    67     70   275
4        92    68     90   363
5        83    54     13   340

Apply phase 2; after swapping branch 4 with branch 3 (the maximum branch is now branch 4):
Branch   PLJ   LASV   RJ   Total time cost
1        12    55     67   134
2         4    71     59   197
3        92    68     90   296
4        20    67     70   343
5        83    54     13   340

Apply phase 2; after swapping branch 4 with branch 2 (branch 4 is now the longest waiting branch):
Branch   PLJ   LASV   RJ   Total time cost   Waiting time
1        12    55     67   134               0
2        20    67     70   204               47
3        92    68     90   292               42
4         4    71     59   332               198
5        83    54     13   340               190

Apply phase 3; after swapping branch 4 with branch 1:
Branch   PLJ   LASV   RJ   Total time cost
1         4    71     59   134
2        20    67     70   212
3        92    68     90   300
4        12    55     67   332
5        83    54     13   332

The overall execution time is reduced from 363 in the initial solution to 332.
Note that in the example above, useless steps are omitted. The new adjustment algorithm can be written in steps as follows (a code sketch of phase 3 is given after the steps):
1. Find the branch k of the parallel structure, after applying the comparison algorithm, whose path has the longest execution time.
2. If k = 1, then the current parallel structure has the minimum possible execution time.
3. If the execution time of the parallel structure equals the sum of the execution times of PLJk, LASVk, and RJk, then the scheduling order we have is optimal and no additional improvement is possible.
4. Apply Phase 2 as follows:
a) Initialize a displacement variable i to 1.
b) Swap branch k with branch k-i. Evaluate the new execution times.
c) If the new order has a larger overall execution time, then keep the previous order, increment i, and go to step 4.b.
d) Evaluate the longest path of the parallel structure with the new order. If more than one branch has the same maximum value, we use the backmost one. Assume that the new maximum branch is j.
e) If j = 1 or the execution time of the current parallel structure equals the sum of the execution times of PLJj, LASVj, and RJj, then the scheduling order we have is optimal and no additional improvement is possible. Otherwise, apply phase 2 again until no more improvement is achieved.
5. Apply Phase 1 as follows:
a) Set two pointers i (the front index) and j (the back index). The front index changes from 1 to k-1 and the back index changes from k+1 to n. For every value of j, change i from 1 to k-1. If LASVi > LASVj, then swap the two branches, evaluate the execution times of the different branches, and test whether the new order is better than the previous one.
b) If the new order has a larger overall execution time, then keep the previous order and try another swap.
c) Evaluate the longest path of the parallel structure with the new order. Assume that the new maximum branch is branch k.
d) If k = 1 or the execution time of the current parallel structure equals the sum of the execution times of PLJk, LASVk, and RJk, then the scheduling order we have is optimal and no additional improvement is possible. Otherwise, apply phase 1 again until no more improvement is achieved.
6. Apply Phase 3 as follows:
a) Find the branch w that has the maximum waiting time. The waiting time of a branch x is evaluated by subtracting the time cost of PLJx from the time at which the previous branch x-1 finishes with the critical section.
b) Initialize a displacement variable i to 1.
c) Swap branch w with branch w-i. Evaluate the new execution time.
d) If the new order has a larger overall execution time, then restore the previous order, increment i, and go to step 6.c.
e) Evaluate the branch with the maximum waiting time in the new parallel structure. Assume that the new branch is branch j.
f) If j = 1 or the execution time of the current parallel structure equals the sum of the execution times of PLJj, LASVj, and RJj, then the scheduling order we have is optimal and no additional improvement is possible. Otherwise, apply phase 3 again until no more improvement is achieved.
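To make phase 3 concrete, here is a minimal, self-contained C++ sketch as we interpret steps 6.a-6.d (our own helper names; a sketch, not the authors' implementation). It computes each branch's waiting time under the serial critical-section model and then swaps the longest waiting branch toward the front until a swap reduces the overall execution time.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Branch { double plj, lasv, rj; };

// Finish time of the whole parallel structure for a given access order.
double executionTime(const std::vector<Branch>& order) {
    double unlockTime = 0.0, makespan = 0.0;
    for (const Branch& b : order) {
        unlockTime = std::max(b.plj, unlockTime) + b.lasv;
        makespan = std::max(makespan, unlockTime + b.rj);
    }
    return makespan;
}

// Waiting time of each branch: time between finishing its PLJ and obtaining the lock.
std::vector<double> waitingTimes(const std::vector<Branch>& order) {
    std::vector<double> wait;
    double unlockTime = 0.0;
    for (const Branch& b : order) {
        wait.push_back(std::max(0.0, unlockTime - b.plj));   // 0 if the lock is already free
        unlockTime = std::max(b.plj, unlockTime) + b.lasv;
    }
    return wait;
}

// Phase 3: move the longest waiting branch toward the front of the queue.
bool applyPhase3(std::vector<Branch>& order) {
    if (order.empty()) return false;
    std::vector<double> wait = waitingTimes(order);
    std::size_t w = std::max_element(wait.begin(), wait.end()) - wait.begin();
    double best = executionTime(order);
    for (std::size_t i = 1; i <= w; ++i) {             // displacement i = 1, 2, ..., w
        std::swap(order[w], order[w - i]);
        if (executionTime(order) < best) return true;  // keep the improving swap
        std::swap(order[w], order[w - i]);             // otherwise restore and try further
    }
    return false;
}

Applied to the example above, this is the move that brings the longest waiting branch (waiting time 198) to the front and reduces the overall execution time from 340 to 332.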
4 Simulation Results
This section first shows the effect of scheduling the critical section on the execution of parallel programs. Second, it evaluates the scheduling algorithms and compares them.
4.1 Effect of Scheduling
To show the benefits of scheduling the access to the critical section, we developed a C++ simulation program. The program generates different numbers of branches, from 3 to 8. For each number of branches, the program generates 500 sets of random values for PLJ, LASV, and RJ. Then, for each set, it evaluates the execution time. It also finds the optimal order for the branches to access the critical section; this is done by trying all possible permutations, whose number equals the factorial of the number of branches. Then it evaluates the optimal execution time. Eventually, it aggregates and averages the execution time and the optimal execution time over the 500 sets. The following pseudo-code describes the structure of the main body of the program.
for( branches=3;branches

To evaluate the scheduling algorithms and compare them, the program generates the time costs randomly from different ranges. It starts with an LASV range which is double the range of PLJ and RJ, and narrows it until the LASV range reaches only 1% of the PLJ and RJ ranges; the last case is the one likely to appear in practice. For each range, it generates different numbers of branches, from 3 to 8. For each number of branches it generates 500 sets of random values for PLJ, LASV, and RJ. Then, for each set, it orders the branches according to the scheduling algorithm (Comparison, Adjustment, or New Adjustment) and evaluates the execution time. Then it finds the optimal execution time by exhaustive search, i.e. by trying all possible permutations, whose number equals the factorial of the number of branches, to compare with. If the optimal time is not equal to the time obtained after applying the algorithm, the program counts this case as a not-optimal one and evaluates the time difference between the optimal and not-optimal times. It then aggregates the time differences resulting from the not-optimal cases out of the overall 500 cases. After that, the program evaluates the percentage of the total time difference with respect to the total optimal time. The following pseudo-code describes the structure of the main body of the program.
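Since the pseudo-code above is cut off here, the following is a hedged C++ sketch of how such a simulation loop could look (our own reconstruction under the description in Section 4.1, not the authors' program): for each number of branches it draws random costs, measures the execution time of the generated order, and finds the optimum by exhaustive permutation. The cost range (1 to 100) and the random seed are assumptions for illustration.

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

struct Branch { double plj, lasv, rj; };

double executionTime(const std::vector<Branch>& order) {
    double unlockTime = 0.0, makespan = 0.0;
    for (const Branch& b : order) {
        unlockTime = std::max(b.plj, unlockTime) + b.lasv;
        makespan = std::max(makespan, unlockTime + b.rj);
    }
    return makespan;
}

// Optimal execution time found by trying all permutations of the access order.
double optimalTime(const std::vector<Branch>& branches) {
    std::vector<int> idx(branches.size());
    for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = static_cast<int>(i);
    double best = executionTime(branches);
    do {
        std::vector<Branch> perm;
        for (int i : idx) perm.push_back(branches[i]);
        best = std::min(best, executionTime(perm));
    } while (std::next_permutation(idx.begin(), idx.end()));
    return best;
}

int main() {
    std::mt19937 gen(12345);
    std::uniform_real_distribution<double> cost(1.0, 100.0);   // assumed cost range
    for (int branches = 3; branches <= 8; ++branches) {
        double sumTime = 0.0, sumOptimal = 0.0;
        for (int set = 0; set < 500; ++set) {                  // 500 random sets per size
            std::vector<Branch> order(branches);
            for (Branch& b : order) b = {cost(gen), cost(gen), cost(gen)};
            sumTime    += executionTime(order);                // generated (unscheduled) order
            sumOptimal += optimalTime(order);                  // exhaustive-search optimum
        }
        std::cout << branches << " branches: average time = " << sumTime / 500
                  << ", average optimal time = " << sumOptimal / 500 << "\n";
    }
    return 0;
}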