Future Generation Computer Systems 25 (2009) 364–370 www.elsevier.com/locate/fgcs
An efficient adaptive scheduling policy for high-performance computing J.H. Abawajy ∗ Deakin University, School of Engineering and Information Technology, Geelong, VIC. 3217, Australia Received 25 March 2006; accepted 20 April 2006 Available online 26 July 2006
Abstract
The advent of commodity-based high-performance clusters has raised parallel and distributed computing to a new level. However, in order to achieve the best possible performance improvements for large-scale computing problems as well as good resource utilization, efficient resource management and scheduling is required. This paper proposes a new two-level adaptive space-sharing scheduling policy for non-dedicated heterogeneous commodity-based high-performance clusters. Using trace-driven simulation, the performance of the proposed scheduling policy is compared with existing adaptive space-sharing policies. Results of the simulation show that the proposed policy performs substantially better than the existing policies.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Distributed systems; Commodity cluster computing; Space-sharing; Job scheduling; Heterogeneous systems; Performance analysis
1. Introduction
The processing power of a single computer system has become inadequate for certain problems of a global scale, and the use of a supercomputer is not always an option for many researchers. With advances in hardware, software and computer networks, the focus in high-performance computing system design and deployment has shifted from the conventional parallel and distributed supercomputer to network-based distributed systems such as commodity-based cluster computing [11]. Clusters are now recognized as popular high-performance computing platforms for both scientific and commercial applications [5]. These commodity-based high-performance clusters have better price–performance ratios for a given computing problem than alternative high-performance computing platforms and have raised parallel and distributed computing to a new level. In this paper, we address the problem of job scheduling for commodity-based high-performance clusters (HPC), with an emphasis on the space-sharing scheduling policy. The motivation for addressing this problem is that, while commodity-based HPC clusters offer tremendous computing power, this potential power is not exploited effectively [12,2]. Also, as
∗ Tel.: +61 3 52271376.
E-mail address: [email protected]
doi:10.1016/j.future.2006.04.007
commodity clusters become more commonly used for large-scale applications that pose tremendous processing and/or storage demands on the system, resource management and scheduling becomes an important issue for the efficient deployment of commodity-based HPC clusters [2,12,6]. Since commodity-based HPC clusters are commonly operated in space-sharing mode [6], the space-sharing scheduling policy is an appropriate policy for such platforms. Space-sharing is the short-term partitioning of the processors in the system into sets of varying size and the allocation of each set to a different job. There are three main forms of space-sharing disciplines: static space-sharing, adaptive space-sharing, and dynamic space-sharing. Under the static space-sharing policy, the processors are divided into a fixed number of disjoint sets, each of which is allocated to an individual job. The problem with the static approach is that it can lead to processor fragmentation, in which there is a mismatch between the number of processors allocated and the processor requirements of the jobs [1]. Also, short jobs can easily be blocked by long jobs for a long time before being executed, even though, in practice, short jobs usually demand a short turnaround time. The adaptive space-sharing policy configures each job to execute on a subset of the total available processors, the size of which is based on the job's processor requirements as well as the current system load conditions. However, adaptive approaches are not sensitive to subsequent system changes. This shortcoming
is addressed in the dynamic space-sharing approach, where processors can be taken away from running jobs, or jobs can be given extra processors, at run-time. This pre-emptive capability, however, entails substantial overhead; as a result, dynamic space-sharing policies are commonly used in shared-memory systems, while adaptive space-sharing policies are the preferred choice over static space-sharing in distributed systems.
In this paper, we propose a new adaptive space-sharing policy for non-dedicated, heterogeneous and distributively owned HPC clusters [2]. Although the adaptive space-sharing policy has been shown to provide better performance than the static space-sharing policy for distributed memory systems [9,3], the static space-sharing policy is the most common approach used in commodity-based HPC clusters [6,10,7]. Using trace-driven simulation, the effectiveness of the proposed policy is examined. We compare the performance of the proposed policy with two baseline policies. The results show that the proposed policy provides substantial performance improvements at system loads of interest (i.e., medium-to-high system loads). We also discuss how a shared environment and resource heterogeneity affect space-sharing parallel job scheduling policies. From this study, we conclude that an adaptive space-sharing policy for commodity clusters is viable, both from a conceptual point of view and quantitatively for jobs of sufficient size.
The rest of the paper is organized as follows. Section 2 discusses the background, related work and system model used in this paper. In Section 3, a new adaptive space-sharing policy is proposed to address these problems, and Section 4 illustrates the proposed algorithm with an example. In Section 5, the baseline adaptive space-sharing policies used for comparison with the proposed policy are discussed. In Section 6, we describe the experimental setup as well as the system and workload models used in the experiments.
The performance of these adaptive policies under various system and workload parameters is presented in Section 7, which shows that the proposed adaptive policy outperforms the two baseline policies. The conclusion and future directions are given in Section 8.
2. Background and problem statement
In this section, we discuss the background, related work and the system workloads. We also formulate the problem and discuss the basic structure of adaptive space-sharing policies and the problems facing these scheduling policies in a shared heterogeneous environment.
2.1. Commodity-based high-performance computing
Commodity-based HPC clusters can generally be classified into enterprise high-performance clusters (EHPC) and pervasive high-performance clusters (PHPC). Both are formed from a collection of commodity off-the-shelf computers interconnected and configured to operate as a single unit. However, computers in an EHPC are dedicated, homogeneous and interconnected with a local-area network. Also, the entire system is privately owned and operates within a network scope managed by a single system administrator. In contrast,
pervasive computing environments are far more dynamic and heterogeneous than enterprise environments. In pervasive computing, heterogeneity occurs in many aspects: hardware, software platforms, network protocols, and processors. Also, computers in a PHPC are individually owned, non-dedicated and interconnected with a local-area or wide-area network [2].
2.2. System workloads
In this paper, we focus on pervasive high-performance clusters (PHPC). Users submit applications to the system for execution without being concerned about the heterogeneity or the changing set of volunteer machines on which the computation is actually performed. In this paper, we refer to a program or an application as a job. There are two types of jobs in the system: local and parallel. A parallel job is composed of a set of tasks that can execute on any of the processors as external tasks. In contrast, a local job consists of a single process (task) and runs as a single entity, a local task, on the workstation on which it originates. Moreover, local tasks have pre-emptive priority over external tasks: when a local task arrives at a node, the external tasks running on that node are suspended, to be resumed when the node becomes available again or migrated to another available node by the Migration Manager.
2.3. Problem statement
The universal problem in resource allocation is the conflict between individual users trying to maximize their use of a resource and the global limitations on the availability of that resource. In shared and heterogeneous systems, in addition to the classic allocation problem (how many processors to allocate to a job), we are also faced with the placement problem (which processors to allocate to a job) and the task reassignment problem (when and which tasks to reallocate to where). Existing scheduling policies for commodity-based HPC clusters provide only rudimentary facilities for space-sharing the processors.
Existing adaptive space-sharing policies have been developed for traditional dedicated homogeneous multiprocessor systems [9] and have also found their way into commodity cluster computing environments with little or no modification [8]. Also, most of the previous studies focus on optimizing a single application at a time. However, the viability of running parallel and sequential applications concurrently on clusters has been demonstrated in [4]. The conventional adaptive space-sharing policy can generally be characterized by three main attributes:
1. the scheduler is activated whenever a job arrives and there are idle processors, or whenever a job departs and there are queued jobs;
2. an incoming job is assigned to a subset of the total available processors; and
3. a running job releases its partition only after it finishes execution.
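The three attributes above can be made concrete with a short sketch. The class and helper names, and the equal-share rule used to compute the partition size, are illustrative assumptions rather than any specific published policy:

```python
from collections import deque

class AdaptiveSpaceSharingScheduler:
    """Sketch of the conventional adaptive space-sharing discipline:
    (1) activated on job arrival (if processors are idle) and on job
    departure (if jobs are queued); (2) each scheduled job receives a
    subset of the free processors; (3) a partition is released only
    when its job finishes."""

    def __init__(self, processors):
        self.free = set(processors)   # currently idle processors
        self.queue = deque()          # waiting (job, max_parallelism) pairs
        self.partition = {}           # job -> set of allocated processors

    def on_arrival(self, job, max_parallelism):
        self.queue.append((job, max_parallelism))
        if self.free:                 # attribute 1: arrival and idle processors
            self._schedule()

    def on_departure(self, job):
        self.free |= self.partition.pop(job)   # attribute 3: release at finish
        if self.queue:                # attribute 1: departure and queued jobs
            self._schedule()

    def _schedule(self):
        while self.queue and self.free:
            job, m = self.queue.popleft()
            # attribute 2: an equal share of the free processors,
            # capped by the job's maximum parallelism (illustrative rule)
            share = max(1, len(self.free) // (1 + len(self.queue)))
            size = min(m, share)
            self.partition[job] = {self.free.pop() for _ in range(size)}
```

Note that, exactly as the text observes, a partition here is held until the job departs, even if some of its processors fall idle earlier.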
However, the processor heterogeneity, load variation and dynamic availability of resources that commonly characterize a commodity-based HPC cluster require important differences in the decisions made by the scheduling policy. Thus, the standard adaptive space-sharing policy cannot be used effectively in non-dedicated heterogeneous multi-user HPC clusters. For example, there are multiple jobs with different execution priorities in HPC clusters. Therefore, the requirement that, once a job is scheduled on a partition, it executes on the same partition without any further interruption until completion is not realistic. Also, the all-or-nothing approach that is inherent in standard adaptive space-sharing policies can lead to a form of processor fragmentation, in which faster processors finish their assigned tasks and sit idle while the slower processors finish the part of the job assigned to them [1]. In addition, the inability to adjust scheduling decisions in response to subsequent workload and system changes limits the performance benefits of conventional adaptive space-sharing policies on non-dedicated heterogeneous multi-user HPC clusters. Moreover, the presence of local computation and resource heterogeneity, individually and in combination, can further exacerbate the deterioration of parallel job performance. This is because the assignment of a task to a machine on which it executes slowly can significantly reduce overall performance, lead to load imbalances, and impose constraints on the choice of processors for executing jobs. For example, in a cluster of workstations, local jobs can be injected into the system at any point in time and have to be executed without delay. In such an environment, an optimal assignment of tasks to idle machines can easily become sub-optimal if one of the machines is suddenly loaded with a task from another user. In the next section, we propose a new two-level space-sharing policy for heterogeneous systems.
We believe that a scheduling policy for shared and heterogeneous systems must be able to address how many processors to allocate to a job, which processors to allocate to a job, and when and which tasks to reallocate to where. To the best of our knowledge, there is no work that combines these three components (the allocation, placement, and reassignment problems) into one adaptive space-sharing scheduling approach. In the following section, we present a new two-level adaptive scheduling policy that addresses all three problems.
3. Two-level adaptive space-sharing policy
In this section, we present a new two-level adaptive scheduling policy for shared and heterogeneous systems. The commodity-based HPC cluster used in this paper is composed of P interconnected volunteer machines of varying processing speeds linked together by a high-speed shared network. Without loss of generality, we assume one processor (CPU) per node, with each node running its own operating system, having private memory, and possibly having its own file system. We also assume that each node has a primary user (i.e., owner). The nodes are heterogeneous
Fig. 1. A schematic representation of the commodity cluster computing architecture.
and geographically distributed and globally used by large user communities. The pervasive high-performance cluster (PHPC) used in this paper is shown in Fig. 1. The job-level scheduler is responsible for allocating and reallocating the system resources to parallel jobs. In contrast, the task-level scheduler is responsible for allocating and reallocating the local resources to the local and external tasks. The status of a running application is monitored by the Partition Manager, whereas the Node Manager is responsible for monitoring the system nodes. The Partition Manager is also responsible for determining the partition size as well as making task migration decisions. The scheduler is invoked when a job arrives, a job finishes, or migration of a task is required. In this paper, we focus on the scheduling aspect only.
Fig. 2 shows the proposed adaptive algorithm. The scheduling step is essentially an on-line bin-packing approach in which a fixed number of bins, possibly of different sizes, are given and the aim is to pack as many items (tasks) as possible. In the following, the scheduling and reassignment requests are described in detail.
3.1. Scheduling request
Every time the scheduler is invoked with a request to schedule jobs in the job wait queue, a set of clusters is created from a subset of the available processors. Let C_t be the set of processors available in the system at time t. Let µ_t be the total processing capacity available in the system at time t, computed on the basis of the mean service rate, µ_i, of an individual processor p_i ∈ C_t, as follows:

µ_t = Σ_{p_i ∈ C_t} µ_i.  (1)
Let J_wait be the set of non-local jobs in the job wait queue in the system at time t. An incoming non-local job specifies its maximum parallelism, M_max, such that 1 ≤ M_max ≤ P. Let M_sum be the sum of the maximum parallelism
Fig. 2. A two-level adaptive space-sharing scheduling policy.
of the waiting jobs, given as follows:

M_sum = Σ_{J_i ∈ J_wait} M_max(J_i).  (2)
The average parallelism of the jobs in the job wait queue at time t, M_t, is given as follows:

M_t = µ_eff × (M_sum / |J_wait|)  (3)
where µ_eff is the effective processing capacity of the system, given as follows:

µ_eff = ( Σ_{i ∈ C_t} [P_h(µ_h) − r_i(µ_i)] ) / µ_sys.  (4)

The parameters P_h(µ_h) and r_i(µ_i) in Eq. (4) are the mean processing rate of the fastest processor in the system and the mean processing capacity of the individual available processor, r_i ∈ C_t, respectively. The parameter µ_sys is the total processing capacity of the system, given as follows:

µ_sys = Σ_{j=1}^{P} µ_j.  (5)

Based on the above information, the C_t available processors are partitioned into S subclusters such that each subcluster S_i ∈ S has m_i processors and ℵ_i processing capacity, as follows:

S = µ_t / ℵ_t  (6)

where ℵ_i is determined as follows:

ℵ_i = µ_t / M_t.  (7)

The algorithm then assigns a job, J_i ∈ J_wait, from the job wait queue to each cluster, S_i ∈ S.
3.2. Reassignment request
The reassignment request is solved by adjusting scheduling decisions in response to subsequent workload and system changes. The issue here is how to find an available processor to migrate the suspended task to. One solution is to make an idle processor held by a particular non-local job immediately available for reassignment. This means that the released processors are returned to the pool of available processors immediately. The advantage of this method is that it is very simple to implement. The problem is that the released processor may wait idle, while there are parallel tasks either suspended due to local task execution or running on a very slow processor.
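The trade-off just described can be sketched in a few lines of code. Here, a released processor is offered first to suspended tasks, then to tasks running on much slower processors, before being returned to the pool; the function name, the task representation and the speed-ratio threshold are illustrative assumptions, not the paper's exact reassignment rule:

```python
def on_processor_released(proc, speed, tasks, pool, migrate, ratio=2.0):
    """Reassignment sketch: prefer migrating a suspended task (pre-empted
    by a local job) to the released processor; failing that, rescue a task
    running on a processor at least `ratio` times slower; otherwise the
    processor goes back to the idle pool."""
    suspended = [t for t in tasks if t["state"] == "suspended"]
    if suspended:
        migrate(suspended[0], proc)
        return
    slow = [t for t in tasks
            if t["state"] == "running" and speed >= ratio * t["proc_speed"]]
    if slow:
        # rescue the task on the slowest processor first
        migrate(min(slow, key=lambda t: t["proc_speed"]), proc)
        return
    pool.add(proc)  # no candidate task: return to the available pool
```

This avoids the situation described above, in which a released processor sits idle while parallel tasks are suspended or running on very slow processors.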
An alternative is to return a processor to the pool of available processors only if the job cannot use it at all, as shown in Fig. 2. Remember that the basic adaptive scheduling policy is only activated when a job arrives or completes. Therefore, the released processors can remain unused if these events do not occur. Moreover, the approach discussed in Fig. 2 is sensitive to the number of processors allocated to a job; in other words, at high system loads, at which adaptive policies tend to assign a single processor to each job, it is not useful. The reassignment policy tries to address these problems.
4. Illustration of the scheduling algorithm
Suppose that we have 3 jobs in the job wait queue and 7 idle processors, P1, . . . , P7, such that P1 and P2 are 3 times faster than P5, P6 and P7, while P3 and P4 are twice as fast as P5, P6 and P7. With speeds expressed in units of the slowest processors (each of P5, P6 and P7 contributing 1):

µ_t = 3 × (P1 + P2) + 2 × (P3 + P4) + P5 + P6 + P7 = 13.  (8)

The partition size is 4, and the 7 idle processors will be divided among the 3 jobs in such a way that J1 = {P1, P5}, J2 = {P2, P6} and J3 = {P3, P4, P7}.
5. Baseline adaptive space-sharing policies
We have examined the impact of dynamic load variations and resource heterogeneity on several adaptive scheduling policies [3,9,8]. However, we choose two adaptive scheduling policies, introduced in [8] and [9] respectively. The rationale for using these two policies is that they represent the two main streams of adaptive scheduling policies: the adaptive scheduling policies introduced in [8] are work-conserving, while the adaptive scheduling policy introduced in [9] is non-work-conserving. Moreover, in addition to being the subject of many papers, these two policies also use quite different rules in determining the number of active jobs in the system, as well as the partition size allocated to these jobs.
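A minimal sketch of the scheduling-request computation of Section 3.1, checked against the Section 4 illustration, may be helpful. The function name, the simplified per-subcluster capacity target (total capacity divided by the number of waiting jobs), and the greedy fastest-first packing are our illustrative assumptions, not the paper's exact procedure:

```python
def make_subclusters(proc_speeds, num_jobs):
    """Sketch of the scheduling-request step: compute the available
    capacity mu_t (Eq. (1)), a per-subcluster capacity target aleph
    (a simplification of Eqs. (6) and (7)), and greedily pack the
    processors, fastest first, into num_jobs subclusters (the on-line
    bin-packing step of Fig. 2)."""
    mu_t = sum(proc_speeds.values())          # Eq. (1): total capacity
    aleph = mu_t / num_jobs                   # target capacity per subcluster
    clusters = [[] for _ in range(num_jobs)]
    loads = [0.0] * num_jobs
    # fastest processor first, each to the currently least-loaded subcluster
    for p, speed in sorted(proc_speeds.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))
        clusters[i].append(p)
        loads[i] += speed
    return clusters, loads, aleph
```

For the Section 4 example (µ_t = 13, three jobs), the target is 13/3 ≈ 4.33 and the resulting subcluster capacities are 4, 4 and 5, matching the illustration, although this greedy heuristic may group the processors differently (e.g. {P1, P5, P7} rather than {P3, P4, P7} as the capacity-5 subcluster).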
We also found that, qualitatively, the impact of both load variations and resource heterogeneity is similar on all the policies that we studied.
5.1. Equi-partition adaptive policy
In the Equi-partition Adaptive Policy (AAP), if a job finds idle processors on arrival, it is allocated the lesser of the number of available processors and the job's maximum parallelism (P_max). Otherwise, the job waits in the job wait queue. When a job completes, the released processors are divided among the waiting jobs in two steps. First, a target partition size (PS) is computed. Second, based on the target partition size, the number of processors assigned to each job is determined. Note that, if there is an excess processor left over, one of the jobs is given the excess processor. For example, when a job that has 10 processors completes and there are 3 jobs in the job wait queue, the 10 idle processors will be divided among the jobs as 4-3-3.
Let us illustrate how the proportional share rule works with the AAP policy. Suppose that we have 3 jobs in the job wait queue and 7 idle processors, P1, . . . , P7, such that P1 and P2 are three times faster than P5, P6 and P7, while P3 and P4 are twice as fast as P5, P6 and P7. Without the proposed rule, the partition size will be 2 and each job will get 2 processors, with one of them getting the processor left over. With the proposed rule, the partition size is 4, and the 7 idle processors will be divided among the jobs in such a way that J1 = {P1, P5}, J2 = {P2, P6} and J3 = {P3, P4, P7}.
5.2. Modified adaptive policy
In the Modified Adaptive Policy (MAP) [9], which is a variation of the well-known adaptive scheduling policy, when determining the partition size, the policy considers both running and waiting jobs, with the scheduled jobs contributing only half of the total number of running jobs. After the target partition size is determined, the policy decides whether or not to allocate the available processors to the jobs, as well as how many jobs to schedule. If the number of free processors is smaller than the computed target partition size, then a non-work-conserving decision is made, which means that no job is scheduled and the free processors are kept idle. As soon as the target number of processors is available, the scheduler starts allocating processors to the jobs from the waiting queue in a first-come-first-served (FCFS) manner.
Let us illustrate how the proportional share rule works with the MAP policy. Suppose that we have 3 jobs in the job wait queue and 7 idle processors, P1, . . . , P7, such that P1 and P2 are three times faster than P5, P6 and P7, while P3 and P4 are twice as fast as P5, P6 and P7. Without the proposed rule, the partition size will be 2 and each job will get 2 processors, with one of them getting the processor left over.
With the proposed rule, the partition size is 4, and the 7 idle processors will be divided among the jobs in such a way that J1 = {P1, P5}, J2 = {P2, P6} and J3 = {P3, P4, P7}.
6. Performance evaluation
We conducted several experimental measurements using simulation to quantify the impacts of load variations and resource heterogeneity on adaptive scheduling policies, as well as to determine the effectiveness of the proposed approaches in combating the inefficiencies caused by these factors. We have examined the impact of dynamic load variations and resource heterogeneity on the adaptive scheduling policies discussed in [3,9,8]. However, we choose two adaptive scheduling policies, namely the Equi-partition Adaptive Policy (AAP) introduced in [8] and the Modified Adaptive Policy (MAP) [9], for comparison with the proposed scheduling policy.
6.1. System and workload models
We used a system composed of 40 independent workstations; each workstation services a stream of workstation-owner processes, which we refer to as local jobs. We used traces of workstations from public workstation clusters at the University of Maryland to carry out the performance evaluation of the three scheduling policies and the proposed solutions. This trace
Fig. 3. Relative performance of the SOUL and AAP policies as a function of the mean response time.
contains data for about 40 workstations and covers a 14-day period. However, the trace did not have processor speed data, so we added this information. In order to model the processor speeds, we used the SPECfp2000 results for a family of Intel Pentium 3 and 4 processors with different clock speeds. We used the execution time function of [9,8] to generate the execution times of the jobs as follows:

T(W, P) = Φ × (W / P) + α + β × P  (9)
where P is the number of processors assigned to the job, W is the amount of work (cumulative service demand) of the job, and Φ denotes the load imbalance among the threads of the job. The parameter α captures the amount of sequential computation and the per-processor work required for the parallelization of the computation, and β captures the communication and congestion delays that increase with the number of processors. By assigning different values to these parameters, one can generate different types of parallel jobs. The values that we used in this paper for these parameters are the same as in [9,8]. We set Φ = 1 and generate the maximum parallelism of each job from a uniform distribution ranging between 2 and 40. The service demand (i.e., W) of a parallel job is generated using a two-stage hyper-exponential distribution with a mean of 13.76 and a coefficient of variation of 3.5. The inter-arrival time is exponentially distributed, with a coefficient of variation of 1.0. We then created two parallel workloads to drive the performance analysis of the scheduling policies. The first workload, Workload WK1, consists of jobs with relatively good speed-up curves, to the degree that this is permitted by the value of the maximum parallelism of the job. The second workload, Workload WK3, consists of jobs with speed-up curves not as good as those of Workload WK1. In the experiments, the effects of the memory requirements and the communication or synchronization latencies are not represented explicitly in the system model. Rather, they appear implicitly in the shape of the job execution time functions.
7. Simulation results and discussions
In this section, we present the results of the simulation. We use the mean response time as the chief performance metric, as in [3,9,8]. In all experiments performed in this paper, a batch strategy is used to compute confidence intervals (at least 30 batches, each of 5,000 job runs, were used for the results reported in this paper).
7.1. Relative performance of the SOUL and AAP policies
In this section, we compare the performance of the proposed adaptive scheduling policy (i.e., SOUL) against the Equi-partition Adaptive Policy (AAP) [8]. Fig. 3 shows the performance of the two policies under the WK1 and WK3 workloads, respectively. From the data on the two graphs, we observe that the interruption from a local job has a substantial effect on the performance of non-local jobs. The two graphs also show that, simply by disallowing a job to hold onto processors for which it has no use, the performance of the jobs can be improved substantially.
7.2. Relative performance of the SOUL and MAP policies
In this section, we compare the performance of the proposed adaptive scheduling policy (i.e., SOUL) against the MAP policy [9]. Fig. 4 shows the performance of the MAP and SOUL policies under the WK1 and WK3 workloads, respectively. The trend is the same as in the previous section, showing that the proposed approach leads to a substantial improvement in the performance of the jobs. This is because the proposed approach takes both heterogeneity and node variability into account when scheduling.
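As a concrete sketch, the execution-time function of Eq. (9) used to drive these simulations can be written directly. The α and β values below are illustrative placeholders, not necessarily those of [9,8]:

```python
def execution_time(W, P, phi=1.0, alpha=0.1, beta=0.05):
    """Execution-time model of Eq. (9): T(W, P) = phi * W / P + alpha + beta * P.
    W is the job's cumulative service demand, P the number of processors,
    phi the load imbalance among the job's threads, alpha the sequential and
    per-processor parallelization overhead, and beta the communication and
    congestion delay that grows with P."""
    return phi * W / P + alpha + beta * P
```

With these parameter values, the model reproduces the limited speed-up of real parallel jobs: for W = 13.76 the execution time first falls as processors are added and then rises again once the β × P term dominates, so there is an optimal processor count beyond which extra processors slow the job down.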
Fig. 4. Relative performance of the SOUL and MAP policies as a function of the mean response time.
8. Conclusion and future directions
With the growing popularity of clusters, efficient scheduling and resource management is essential to making them more suitable for next-generation applications. This paper examined the issues that arise when combining parallel and sequential jobs on a single heterogeneous platform under an adaptive scheduling policy. We showed that both heterogeneity and background workloads have a noticeable effect on parallel program performance. We proposed a new adaptive policy to reduce the impacts of heterogeneity and processor load variations on the performance of jobs under space-sharing policies. The results of our experiments show that the proposed technique effectively minimizes the impacts of heterogeneity and processor load variation. We are currently investigating whether or not the migration of processes from busy to idle workstations is necessary to maintain acceptable parallel application performance.
Acknowledgement
[5] L. He, S.A. Jarvis, D.P. Spooner, H. Jiang, D.N. Dillenberger, G.R. Nudd, Allocating non-real-time and soft real-time jobs in multiclusters, IEEE Transactions on Parallel and Distributed Systems 17 (2) (1999) 99–112.
[6] S. Iqbal, R. Gupta, Y.-C. Fang, Scheduling in HPC clusters, in: Dell Power Solutions, 2005, pp. 133–136.
[7] J. Sherwani, N. Ali, N. Lotia, Z. Hayat, R. Buyya, Libra: A computational economy based job scheduling system for clusters, Software: Practice and Experience (SPE) Journal 34 (6) (2004) 573–590.
[8] A. Stergios, A. Sevcik, Parallel application scheduling on networks of workstations, Journal of Parallel and Distributed Computing 43 (1997) 109–124.
[9] T.K. Thanalapati, S.P. Dandamudi, An efficient adaptive scheduling scheme for distributed memory multicomputers, IEEE Transactions on Parallel and Distributed Systems 12 (7) (2001) 758–768.
[10] M.Q. Xu, Effective meta-computing using LSF multi-cluster, in: Proceedings of CCGrid2001, San Francisco, California, 2001, pp. 100–106.
[11] C.S. Yeo, R. Buyya, H. Pourreza, R. Eskicioglu, P. Graham, F. Sommers, Cluster Computing: High-Performance, High-Availability, and High-Throughput Processing on a Network of Computers, vol. 29(6), Springer Science+Business Media Inc., New York, USA, 2006, pp. 521–551.
[12] B.B. Zhou, B. Qu, R. Brent, Effective scheduling in a mixed parallel and sequential computing environment, in: Proceedings of the 6th Euromicro Workshop on Parallel and Distributed Processing, 1998, p. 32.
My thanks to Maliha Omar, without whom this paper would not have been completed.
References
[1] J.H. Abawajy, Performance analysis of adaptive scheduling policies under shared heterogeneous distributed systems, in: Proceedings of ISPDSEC/PACT2002, 2002, pp. 336–343.
[2] J.H. Abawajy, Preemptive job scheduling policy for distributively-owned workstation clusters, Journal of Parallel Processing Letters 14 (2) (2001) 255–270.
[3] S.P. Dandamudi, H. Yu, Performance of adaptive space sharing processor allocation policies for distributed-memory multicomputers, Journal of Parallel and Distributed Computing 58 (1) (1999) 109–125.
[4] A.K.L. Goscinski, A.M. Wong, Performance evaluation of the concurrent execution of NAS parallel benchmarks with byte sequential benchmarks on a cluster, in: Proceedings of Parallel and Distributed Systems, 2005, pp. 313–319.
Dr. J.H. Abawajy is a faculty member in computer science at Deakin University, Department of Science and Technology, School of Engineering and Information Technology. Dr. Abawajy received his Ph.D. in computer science from the Ottawa-Carleton Institute of Technology (Canada), an M.Sc. in computer science from Dalhousie University (Canada), and a B.Sc. in computer science from St. F.X. University (Canada). He has published more than 70 papers in refereed international journals and conferences. His research interests are in the areas of high-performance grid and cluster computing, performance analysis, data management for large-scale applications, and mobile systems. Dr. Abawajy has guest-edited several journals and served on the program committees of numerous international and national conferences. He has also chaired several special sessions and workshops in conjunction with international conferences. Dr. Abawajy has worked as a software engineer, UNIX systems administrator and database administrator for many years.