A Scheduling Approach with Respect to Overlap of Computing and Data Transferring in Grid Computing

Changqin Huang 1,2, Yao Zheng 1,2, and Deren Chen 1

1 College of Computer Science, Zhejiang University, Hangzhou, 310027, P. R. China
2 Center for Engineering and Scientific Computation, Zhejiang University, Hangzhou, 310027, P. R. China

Abstract. In this paper, we present a two-level distributed scheduling model and propose a scheduling approach with respect to the overlap of computing and data transferring. On the basis of network status, node load, and the relation between task execution and task data access, data transferring and computing can occur concurrently in the following three cases: a) a task is being executed on one part of its dataset while the rest of its dataset is being replicated; b) a dataset of a scheduled task is being replicated to a node at which another task is running; c) data exchange happens while dependent subtasks are running at different nodes. Corresponding theoretical analysis and experimental results demonstrate that the scheduling approach improves execution performance and resource utilization.

1 Introduction

A computational grid is an emerging computing infrastructure that enables effective access to distributed and heterogeneous computing resources in order to serve the needs of a Virtual Organization (VO) [1]. The performance that can be delivered varies dynamically owing to resource competition, network status, task type, and so on. Therefore, resource management and scheduling is a key and challenging issue. In data management, replication from primary repositories to other locations at an apt moment can be an important optimization step [2,3]. In the present paper, we focus on scheduling approaches suitable for large-scale data-intensive applications, or those of a combined data-intensive and computing-intensive nature, which are widespread in the area of engineering and scientific computation. In the present work, we adopt a distributed scheduling model with two levels of schedulers. The scheduler schedules task execution on the basis of a variety of metrics and constraints, and meanwhile tries to reduce the task elapsed time, and thus improve performance, by overlapping computing with data transferring.

This paper is organized as follows: Section 2 reviews related work in the arena of grid scheduling. In Section 3, details of our approach and the proposed scheduling model are described. An algorithm and the corresponding analysis are included in Section 4. Case studies with experimental results are presented in Section 5, and conclusions in Section 6.

2 Related Work

For the development and deployment of applications on computational grids, there are a number of approaches to scheduling. Vadhiyar et al. [4] present a metascheduler with a 2D chart and several metascheduler types. Berman et al. [5] adopt performance evaluation techniques [6] and utilize the NWS [7] resource monitoring service for application-level scheduling. Abraham et al. [8] use a parametric engine and heuristic algorithms. Zomaya et al. [9] apply a genetic algorithm. Beaumont et al. [10] aim at independent and equal-sized tasks. Dogan et al. [11] consider the problem of scheduling independent tasks with multiple QoS requirements. The above schedulers [4,5,8-11] are restricted to independent tasks or ignore issues of efficient replication. An adaptive scheduling algorithm for parameter sweep applications is used by Casanova et al. [12], who take data storage into account. The essential difference between their work and ours is that our heuristic actively replicates datasets. Thain et al. [13] describe a system that links jobs and data by binding execution and storage sites into I/O communities, but they do not address policy issues. Ranganathan et al. [14] focus on data-intensive applications, where data movement can be either tightly bound to job scheduling decisions or handled in a decoupled way. They do not consider the case in which task computing and data transferring proceed in parallel on a node.

3 Scheduling Strategy and Scheduling Model

To provide the context for the scheduling strategy and system model, we first describe the scheduling scenario in detail. Each site (LAN) comprises a number of nodes (such as PCs, clusters, and supercomputers), and each node has a limited amount of storage. A set of data is initially placed onto the node where the user's task is submitted, or is mapped to nodes at the site according to a certain distribution. The target computational grid consists of heterogeneous nodes connected by LANs and/or WANs. The whole computational grid is thus hierarchical: node, LAN, and WAN. Scheduling within a single node is ignored here. The scheduling is divided into two levels: the Global Scheduler (GS), corresponding to a WAN, and the Local Scheduler (LS), corresponding to a LAN. Firstly, tasks are submitted to a site/node, at which the associated LS is activated to schedule them; when this scheduler fails in scheduling, the requests are passed to the associated GSs. A GS is responsible for determining which site(s) in its domain the tasks are sent to. Finally, the corresponding LS produces a complete schedule by means of the local scheduling algorithm; it cancels the requests at the other schedulers, and then lets the tasks be executed and the results be returned.

As far as algorithms are concerned, the "best" schedule considers information such as CPU speed, network status between hosts, and task properties. This information is retrieved from resource information providers, such as the Network Weather Service (NWS) and the Metacomputing Directory Service (MDS). Our approach requires an application-specific performance model and a scheduler. The scheduler schedules tasks and makes decisions about transferring data. The goal of the scheduler is to develop a schedule that minimizes the makespan and maximizes the resource utilization rate. Each scheduler has two components and two queues, as shown in Fig. 1, and their functionalities and relations are described as follows.

[Figure 1 is not reproduced here. It depicts top-level Global Schedulers above bottom-level Local Schedulers and, within a scheduler, the Arrived Task Queue (ATQ), the Task Scheduling Component (TSC), the Scheduled Task Queue (STQ), and the Data Transferring Component (DTC), with arrows for data control and transfer, task control and transfer, and component interaction, connecting the user, database, and public store.]

Fig. 1. Scheduling model and interaction in a scheduler or among schedulers

• Task Scheduling Component (TSC): The TSC makes scheduling decisions on the basis of information about resources and tasks, and passes messages about data transferring to the Data Transferring Component (DTC) if an overlap of computing and data transferring occurs (we discuss the details in Section 4). While there are tasks in the Arrived Task Queue (ATQ), the TSC remains active: it produces a schedule for the tasks in the associated ATQ, puts the scheduled tasks into the associated Scheduled Task Queue (STQ), and directs the tasks to be executed on the selected resources. If the TSC is not able to schedule a certain task within a limited period of time, it delivers the task request to the associated GS's ATQ, where the task is scheduled by a similar method; if no higher-level scheduler is available, it returns "failure".
• Data Transferring Component (DTC): The DTC keeps track of the popularity of each locally available dataset. It works in the following two ways: a) only when the DTC receives the associated messages from the TSC does it decide "how" to replicate the datasets needed by tasks in the STQ or by tasks being executed, under the condition that the CPU is busy but the connected network is idle; b) it decides "how" to exchange the data needed by dependent subtasks being executed. Finally, it directs nodes to transfer the datasets, so computing and data transferring are performed concurrently.
• Arrived Task Queue (ATQ) and Scheduled Task Queue (STQ): An ATQ stores all tasks delivered to its scheduler. Tasks are put into an ATQ when task requests arrive, and a task is taken out when it has been scheduled. An STQ stores the tasks scheduled by the local TSC, and a task is taken out when it comes into execution.
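As an illustration of how these components and queues fit together, the following Python sketch models one scheduler with its ATQ, STQ, a TSC step, and a DTC step. All class and method names here are hypothetical; this is a minimal sketch of the model described above, not the VGRID implementation.

```python
from collections import deque

class Task:
    def __init__(self, name):
        self.name = name
        self.node = None
        self.status = "arrived"
        self.replicated = False

    def replicate_dataset(self):
        self.replicated = True   # stand-in for directing a node to copy data

class Scheduler:
    """One LS or GS: an ATQ and an STQ plus TSC/DTC steps."""
    def __init__(self, parent=None):
        self.atq = deque()    # Arrived Task Queue
        self.stq = deque()    # Scheduled Task Queue
        self.parent = parent  # associated higher-level (global) scheduler

    def submit(self, task):
        self.atq.append(task)

    def tsc_step(self, core_algorithm):
        """Task Scheduling Component: schedule the next task or escalate it."""
        if not self.atq:
            return
        task = self.atq.popleft()
        node = core_algorithm(task)      # e.g. FCFS over this LAN's idle nodes
        if node is not None:
            task.node, task.status = node, "scheduled"
            self.stq.append(task)
        elif self.parent is not None:
            self.parent.submit(task)     # pass the request to the GS's ATQ
        else:
            task.status = "failure"

    def dtc_step(self, network_idle):
        """Data Transferring Component: pre-stage data for a scheduled task
        while the connected network is idle, overlapping with computing."""
        if network_idle and self.stq:
            self.stq[0].replicate_dataset()
```

A local scheduler would be constructed with its global scheduler as `parent`, so that tasks it cannot place are passed upward, as described above.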

4 Scheduling Algorithm and Theoretical Analysis

4.1 Assumptions

Based on the above scheduling model, and out of concern for network traffic, we restrict a task's execution to a single LAN. To simplify the scheduling, we make the following assumptions: a) each task/subtask (a subtask exists when a task is divided into parallel subtasks) is assigned to a specific node on which it can meet its deadline; b) the time it will spend can be predicted by related techniques (e.g., PACE [15]); c) before execution, each task/subtask can obtain information about the relation between its computing and its dataset (e.g., the computing may proceed on a part of the dataset); d) tasks/subtasks can be pre-scheduled on the basis of task status and grid information. The core algorithm, by which the scheduler schedules task execution, is not fixed and can be selected by users (e.g., the FCFS algorithm or a GA algorithm).

4.2 Scheduling Algorithm

Both the GS and the LS adopt the approach described below, although their core algorithms need not be the same. A distributed task starts to run only once it has obtained its dataset/subset.

ScheduleGenerate(coreAlgorithm, predictModel, performanceMetric)
Repeat
    If a message arrives that a dependent subtask has generated data for exchange
        Repeat
            Search for network information and for corresponding nodes for the data exchange
            If corresponding nodes exist and the associated network is idle
                Give the DTC a message of the data exchange for the dependent subtask
            Endif
        Until the DTC has been given the message
    Endif
    If the ATQ is not empty
        Get a task from the ATQ and divide it into parallel subtasks if possible
        Search for information about nodes
        Schedule by the core algorithm under the constraints
        Put the scheduled tasks into the STQ and pass the remaining tasks to the higher-level ATQ
    Endif
    If the STQ is not empty
        Get a scheduled task T from the STQ and analyze its dataset characteristics
        If the network connected to the nodes at which T will be executed is idle
            If its dataset can be divided
                Divide the dataset and give the DTC a message of the subset replication for T
            Else
                Give the DTC a message of the dataset replication for T
            Endif
        Endif
    Endif
Until the scheduling system halts

4.3 Theoretical Analysis

The metrics used in our analysis are the makespan and the average resource utilization rate. We analyze only the efficiency brought by our data transferring strategy. The generic algorithm, without our approach, is assumed to behave as follows: a scheduled task/subtask must obtain all of its dataset by replication before it starts execution, which is the opposite of our approach. To simplify the analysis, concurrent computing and data exchanging between dependent subtasks being executed are not considered here.

First consider the case in which the dataset of one task/subtask is divisible. Let p denote this task or subtask, and m the data size. The dataset is divided into n equal blocks. Let x denote the percentage by which CPU performance decreases when data are transferred on the network concurrently, ν1 the speed of transferring data on the network, ν2 the speed of processing data by a CPU, ω1 the makespan for the generic scheduling algorithm, and ω2 the makespan for our algorithm. With our algorithm, computing and data transferring are performed concurrently except while the first block of data is being transferred, so we have:

ω1 = m/ν1 + m/ν2 ,                                                  (1)

ω2 = (m/n)/ν1 + max( m/(ν2(1 − x%)), ((m/n)(n − 1))/ν1 ).           (2)

If ((m/n)(n − 1))/ν1 ≥ m/(ν2(1 − x%)), then ω1 − ω2 = m/ν2; otherwise, ω1 − ω2 = ((m/n)(n − 1))/ν1 + (m/ν2)(1 − 1/(1 − x%)). Obviously, under the first condition there exists ω1 − ω2 > 0. In general, x ≤ 20 when a multi-storage system is present, and m is very large, so ω1 − ω2 > 0 under the second condition as well. Overall, the makespan obtained with our algorithm decreases considerably in general.

Let υ1 denote the average resource (CPU) utilization rate for the generic scheduling algorithm, and υ2 that for our algorithm. Over a period of time t = ω:

υ1 = (m/ν2) / (m/ν1 + m/ν2) = ν1/(ν1 + ν2) ,                        (3)

υ2 = (m/(ν2(1 − x%))) / ( (m/n)/ν1 + max( m/(ν2(1 − x%)), ((m/n)(n − 1))/ν1 ) ).   (4)

Because m/ν2 ≤ m/(ν2(1 − x%)) and m/ν1 + m/ν2 ≥ (m/n)/ν1 + max( m/(ν2(1 − x%)), ((m/n)(n − 1))/ν1 ), we have υ2 ≥ υ1. If x is small and m is large, υ2 increases considerably.

If the dataset of one task/subtask cannot be divided, computing can still be overlapped with data transferring between the task/subtask being executed and a scheduled task/subtask in an STQ. A similar analysis yields the same conclusion: provided that x is small and m is large, the makespan is reduced and the average resource utilization rate is increased considerably.
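The analysis above can be checked numerically. The following Python sketch transcribes Eqs. (1)-(4) directly; the parameter values at the bottom are invented for illustration and are not measurements from Section 5.

```python
def makespans(m, n, x, v1, v2):
    """Eqs. (1)-(2): makespan without (w1) and with (w2) overlapped transfer.
    m: data size, n: number of equal blocks, x: percentage CPU slowdown
    while transferring, v1: network speed, v2: CPU processing speed."""
    w1 = m / v1 + m / v2                                        # eq. (1)
    w2 = (m / n) / v1 + max(m / (v2 * (1 - x / 100)),
                            ((m / n) * (n - 1)) / v1)           # eq. (2)
    return w1, w2

def utilizations(m, n, x, v1, v2):
    """Eqs. (3)-(4): average CPU utilization over the respective makespans."""
    u1 = (m / v2) / (m / v1 + m / v2)                           # eq. (3)
    busy = m / (v2 * (1 - x / 100))
    u2 = busy / ((m / n) / v1 + max(busy, ((m / n) * (n - 1)) / v1))  # eq. (4)
    return u1, u2

# Illustrative values: 1000 units of data in 10 blocks, x = 10, and a
# network half as fast as the CPU (the transfer-dominated case).
w1, w2 = makespans(1000, 10, 10, v1=50, v2=100)
u1, u2 = utilizations(1000, 10, 10, v1=50, v2=100)
assert w1 - w2 == 1000 / 100   # transfer-dominated case: w1 - w2 = m/v2
assert u2 >= u1                # utilization never drops
```

With these numbers the makespan falls from 30 to 20 time units, matching the first branch of the ω1 − ω2 analysis.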

5 Experiments

We have developed an engineering-computation-oriented visual grid prototype system, named VGRID, in which tasks are auto-scheduled in a visual fashion, and which permits a selection of task scheduling core algorithms. In this environment, three pairs of experiments have been designed using the above scheduling approach. The tasks consist of iterations of two application examples: Monte Carlo integration in high dimensions, which involves transferring a small dataset, and video conversion, which involves replicating and compressing a large dataset. All nodes are PCs with Intel Pentium 4 processors at 2.0 GHz, 512 MB of memory, 100 Mbps Ethernet, and 80 GB/7200 rpm hard disks. The experiments are described as follows, where two approaches are used: Approach A adopts the FCFS algorithm, whereas Approach B adopts the FCFS algorithm together with our scheduling approach.

Case 1: one task (video conversion) on one node.
Case 2: four tasks (Monte Carlo simulation, video conversion, Monte Carlo simulation, and video conversion, in sequence) on one node.
Case 3: the same four tasks in sequence on three nodes.

Experimental results are illustrated in Figs. 2 and 3. As these figures show, different types of tasks, scheduled task sequences, and grid resources lead to different performance scenarios. In all experiments, the average resource utilization rate increases by over 15% when our algorithm is adopted. In Case 1, however, the new makespan decreases very little even though the associated average resource utilization rate increases by 18%. This means that the overlap of computing and data transferring brings little benefit in this case while adding a small amount of workload. Our algorithm is therefore not well suited to tasks of this type, for which the performance decrease percentage x is large; this occurs under conditions of competition for grid resources.
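To make the comparison between Approaches A and B concrete, here is a toy single-node model of the two schedules. It ignores the CPU slowdown x and uses invented task times, so it only illustrates the mechanism, not the measured results.

```python
def fcfs_makespan(tasks):
    """Approach A: each task first replicates its whole dataset, then computes."""
    return sum(transfer + compute for transfer, compute in tasks)

def fcfs_overlap_makespan(tasks):
    """Approach B: while one task computes, the otherwise idle network
    pre-stages the next task's dataset (CPU slowdown ignored)."""
    net_free = cpu_free = 0.0
    for transfer, compute in tasks:
        arrival = net_free + transfer      # time the dataset is fully replicated
        net_free = arrival                 # network is busy until then
        start = max(arrival, cpu_free)     # need both the data and the CPU
        cpu_free = start + compute
    return cpu_free

# Invented times: two Monte Carlo tasks (tiny transfer, long compute) and
# two video conversions (large transfer), alternating as in Case 2.
tasks = [(1, 20), (15, 30), (1, 20), (15, 30)]
assert fcfs_overlap_makespan(tasks) < fcfs_makespan(tasks)
```

Pre-staging hides most of the transfer time behind computation, which is the effect measured in Cases 2 and 3.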

[Figures 2 and 3 are not reproduced here. Fig. 2 plots the makespan (m) and Fig. 3 the utilization rate (%) for the generic algorithm and the new algorithm across case numbers 1-3.]

Fig. 2. Variation of the makespan

Fig. 3. Variation of the average resource utilization rate

6 Conclusions

A scheduling model and an associated algorithm were proposed in the present work. The approach strives to reduce the task elapsed time, and thereby improve performance, by overlapping computing with data transferring. We have theoretically analyzed this algorithm and instantiated it with three tests, based on an FCFS core algorithm, in VGRID under different conditions. Our results show, firstly, that the approach clearly improves system performance, and secondly, that the relation between task execution and its dataset, as well as the size of the data, has a significant impact on system performance. Though these results are promising, in interpreting their significance we have to bear in mind that they are based on simplified grid scenarios. The case in which dependent subtasks move data for exchange has not yet been studied in detail.

Acknowledgements

The authors wish to thank the National Natural Science Foundation of China for the National Science Fund for Distinguished Young Scholars under Grant No. 60225009. We would also like to thank the Center for Engineering and Scientific Computation, Zhejiang University, for the computational resources with which this research project has been carried out.

References

1. I. Foster, C. Kesselman, et al.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 2001, 15(3): 200-222
2. J. Subhlok and G. Vondran: Optimal Use of Mixed Task and Data Parallelism for Pipelined Computations. Journal of Parallel and Distributed Computing, 2000, 60: 297-319
3. O. Beaumont, A. Legrand, et al.: Scheduling Strategies for Mixed Data and Task Parallelism on Heterogeneous Clusters and Grids. Proc. of the 11th Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2003
4. S. S. Vadhiyar and J. J. Dongarra: A Metascheduler for the Grid. Proc. of the 11th IEEE International Symposium on High Performance Distributed Computing, 2002
5. F. Berman et al.: Adaptive Computing on the Grid Using AppLeS. IEEE Transactions on Parallel and Distributed Systems, 2003, 14(4): 369-382
6. W. Smith, I. Foster, and V. Taylor: Predicting Application Run Times Using Historical Information. Proc. of the IPPS/SPDP Workshop on Job Scheduling Strategies for Parallel Processing, 1998
7. R. Wolski et al.: The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing. Future Generation Computer Systems, 1999, (5-6): 757-768
8. A. Abraham, R. Buyya, et al.: Nature's Heuristics for Scheduling Jobs on Computational Grids. Proc. of the 8th International Conference on Advanced Computing and Communications, Cochin, India, 2000
9. A. Y. Zomaya et al.: Observations on Using Genetic Algorithms for Dynamic Load-Balancing. IEEE Transactions on Parallel and Distributed Systems, 2001, 9: 899-911
10. O. Beaumont and L. Carter: Bandwidth-Centric Allocation of Independent Tasks on Heterogeneous Platforms. Proc. of the International Parallel and Distributed Processing Symposium, 2002
11. A. Dogan and F. Özgüner: Scheduling Independent Tasks with QoS Requirements in Grid Computing with Time-Varying Resource Prices. Proc. of Grid Computing - GRID 2002, 2002, 58-69
12. H. Casanova, G. Obertelli, et al.: The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid. Proc. of Supercomputing 2000, Denver, 2000
13. D. Thain, J. Bent, et al.: Gathering at the Well: Creating Communities for Grid I/O. Proc. of Supercomputing 2000, Denver, 2000
14. K. Ranganathan and I. Foster: Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications. Proc. of the 11th International Symposium on High Performance Distributed Computing, 2002
15. G. R. Nudd et al.: PACE - A Toolset for the Performance Prediction of Parallel and Distributed Systems. Journal of High Performance Computing Applications, 2000, 3: 228-251
