Prioritized Workflow Scheduling in Cloud Computing

International Conference on Cloud, Big Data and Trust 2013, Nov 13-15, RGPV

Rajwinder Singh
Lovely Professional University, Jalandhar, Punjab
[email protected]

Prasenjit Kumar Patra
Lovely Professional University, Jalandhar, Punjab
[email protected]

Abstract- Cloud computing is a large-scale distributed computing paradigm driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms, and services is delivered on demand to external customers over the Internet. It harnesses the Internet and wide area networks to use remotely available resources, thereby providing cost-efficient solutions on a pay-per-use basis. Cloud computing provides extensive computation and resource facilities for executing application workflows over the Internet, and many heterogeneous resources are involved in the execution of a single application workflow. With the increasing demand for and benefits of cloud infrastructure, society can leverage its intensive computing services and scalable virtualized environment to carry out various tasks. The cloud is growing very fast, and the workload increases with the number of cloud services and clients. It has become a challenge for a cloud infrastructure to handle the requests generated by a large number of cloud clients. These requests must first be scheduled for execution on the available cloud resources, yet the execution of cloud workflows faces many uncertain factors in allocating and scheduling workload. The first step is to provide an efficient workflow allocation model that considers the client's requirements. The workflow scheduling model then schedules jobs so that all of them are executed in the minimum possible time while respecting those requirements.

Keywords- Prioritized Workflow, Cloud Computing, Workflow, Workflow scheme.

I. INTRODUCTION
Cloud computing is a large-scale distributed computing paradigm driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms, and services is delivered on demand to external customers over the Internet [1][4]. Cloud computing, a relatively recent term, embraces cyber infrastructure and builds upon virtualization, distributed computing, utility computing, and, more recently, networking, web and software services. It implies a service-oriented architecture, reduced information technology overhead for the end user, great flexibility, reduced total cost of ownership, and on-demand services [2]. In cloud computing, a user sitting at a single computer system can, using the Internet and an application (commonly a browser), access a number of resources and services provided by various cloud providers.

The number of cloud services is growing very fast, which also increases the number of cloud users. With this increase comes a need to manage all those users and their requests. The concept of workflow arises from the need to manage the requests generated by different users so that each request makes proper use of resources. Among the many definitions of workflow, one defines it as the automation of a business process, in whole or in part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules [6]. A workflow models a process as a series of steps that simplify the complexity of executing and managing applications [10].

Scheduling is a central concept in managing requests and users in a computing environment. It is essentially the mapping of a set of tasks onto a set of processors. Workflow scheduling can be defined as the automation of workload scheduling, where the workload consists of the requests and tasks generated by the users or clients of the cloud. Scheduling can be divided into two categories: Job Scheduling, and Job Mapping and Scheduling. In Job Scheduling, independent jobs are scheduled among the available processors of a distributed computing system for optimization. Job Mapping and Scheduling requires the allocation of multiple interacting tasks of a single parallel program in order to minimize the completion time on a parallel computer system [9]. A task is a (sequential) activity that uses a set of inputs to produce a set of outputs. A fixed set of processes is statically assigned to processors, either at compile time or at start-up.

There are two types of scheduling: static and dynamic. In static load balancing, all information is known in advance; tasks are allocated according to this prior knowledge, and the allocation is not affected by the state of the system. A dynamic load-balancing mechanism has to allocate tasks to the processors dynamically as they arrive, and tasks have to be redistributed when some processors become overloaded [8].
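The difference between static and dynamic allocation can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the round-robin rule used for the static case, and the "earliest free processor" rule used for the dynamic case are all assumptions chosen for illustration, and the redistribution of tasks from overloaded processors is not modelled.

```python
import heapq


def static_makespan(task_costs, num_processors):
    """Static: task i is bound to processor i % n before execution starts,
    regardless of how loaded that processor turns out to be."""
    loads = [0.0] * num_processors
    for i, cost in enumerate(task_costs):
        loads[i % num_processors] += cost
    return max(loads)


def dynamic_makespan(task_costs, num_processors):
    """Dynamic: each arriving task goes to the processor that becomes free first."""
    heap = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(heap)
    for cost in task_costs:
        load, p = heapq.heappop(heap)      # currently least-loaded processor
        heapq.heappush(heap, (load + cost, p))
    return max(load for load, _ in heap)


# Hypothetical task costs (arbitrary time units), alternating long and short.
tasks = [8, 1, 8, 1, 8, 1]
print(static_makespan(tasks, 2), dynamic_makespan(tasks, 2))
```

With this alternating workload the fixed mapping places every long task on the same processor, while the state-aware mapping spreads them out, which is the effect the dynamic mechanism is meant to achieve.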


II. RELATED WORK
A cloud is a very large network in which millions of users access thousands of servers at all times. Scheduling the tasks/jobs at the server end is very difficult, because the number of requesting jobs is very large and each requires some resources to execute. The schedule built by the server must be good enough that each user request receives a response in time and every task/job gets proper resources for its execution [14]. Various algorithms and models have been defined to solve this problem; a few of these workflow scheduling algorithms and models are described below.

A. POSEC and Pareto Analysis
For job mapping and scheduling in the cloud, two algorithms are used to optimize scheduling: one based on the POSEC method [12] and the other based on Pareto analysis. POSEC stands for Prioritize by Organize, Streamlining, Economizing and Contributing. The POSEC method prioritizes jobs on the basis of these parameters and then categorizes them into four levels according to their urgency and importance. The scheduling of jobs is then applied on these levels for optimal execution. The objective of this algorithm is efficient time management and load balancing [9].

According to Pareto analysis [12], 80% of tasks complete their execution in 20% of the time, and the remaining 20% of tasks take the remaining 80% of the time. This principle is used to sort tasks into two parts, and it is recommended that tasks falling into the first category be assigned a higher priority. The 80-20 rule can also be applied to increase productivity: it is assumed that 80% of the productivity can be achieved by doing 20% of the tasks. If productivity is the aim of time management, then these tasks should be prioritized higher. If the higher-priority jobs are placed in the first (80%) category, the execution of jobs takes much less time, because the important or prioritized jobs execute first, and the resulting model is closer to optimal. It has been found that the algorithm based on Pareto analysis takes less time to execute the same set of jobs than the algorithm based on the POSEC method [9].

B. Hierarchical cloud workflow scheduling schema
A cloud workflow system can coordinate multiple job submissions over cloud services. The goal of a cloud workflow scheduling schema is to make sure that the proper activities are executed by the right service at the right time. Another way to achieve an optimal schedule is the hierarchical cloud workflow scheduling schema, which divides workflow scheduling into three stages. In the first stage, incoming job requests are examined for parallel jobs, which are then split. In the second stage, each job is matched with corresponding candidate services, which means that resources are assigned to jobs according to their requirements. In the third and last stage, a scheduling algorithm is applied to execute the jobs; any algorithm can be used, depending on the requirements and the desired optimality. The work in [5] focuses on hierarchical cloud service workflow scheduling, parallel splitting of cloud workflow tasks, a syntax- and semantics-based matching algorithm for cloud workflow tasks, and cloud workflow scheduling and optimization under multiple QoS constraints, and it presents experiments conducted to evaluate the efficiency of the algorithm. Using the Heuristic Generic Algorithm, this scheme can achieve an optimal workflow schedule [5].

C. One-port model and Multi-port model
A linear workflow is one in which the dependencies between stages can be represented by a linear graph. Two models are used to schedule such a workflow: the one-port model and the multi-port model. In the one-port model, each processor can perform only one operation at a time: computing, receiving an incoming task, or sending an output. There is no parallel processing in the one-port model. In the multi-port model, a processor can perform multiple operations, such as computing and receiving input, at the same time, and it allows multiple incoming and outgoing communications simultaneously. These two models are useful for linear workflow optimization and help in minimizing latency [7], which leads to an optimal schedule for linear workflows.

D. Activity based costing in cloud computing
Activity based costing is another model for optimal workflow scheduling in cloud computing. It is a way to measure both the cost of an object and its performance. According to the activity based costing model, each task can be evaluated separately on the basis of the resources, space and time it needs to execute completely. A job can be categorized as Available or Partially Available. An Available job is one for which all required resources are available at the same data center. A Partially Available job is one whose required resources are not present at a single data center, which means they are scattered among different data centers and need to be collected from them for execution. Jobs are further subdivided on the basis of their dependencies into Dependent and Independent jobs [3]. Scheduling can be done at this bottom level to optimize workflow scheduling.

E. Two levels task scheduling mechanism based on load balancing
The two-level task scheduling mechanism based on load balancing is one way to optimize task scheduling in a cloud computing system. According to this method, tasks are scheduled by a scheduling optimizer, which takes information from a system model and a predicted execution time model; together these keep track of all the resources and the predicted execution information. The scheduling optimizer itself checks whether the currently prepared schedule is optimal; if it is not, it regenerates the schedule [11]. Using this model, the execution time decreases and the resource utilization increases.
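The Pareto-based prioritization described in subsection A can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say which attribute is used to sort the tasks, so the sketch assumes an estimated execution time per job, and the function name, the job tuples, and the 80% cut parameter are hypothetical.

```python
def pareto_split(jobs, high_share_percent=80):
    """Split jobs into a high-priority part (the shortest 80% by default)
    and a low-priority part (the remaining, longest jobs).

    Each job is a (name, estimated_time) tuple; the estimate is an assumed input.
    """
    ordered = sorted(jobs, key=lambda job: job[1])
    cut = len(ordered) * high_share_percent // 100
    return ordered[:cut], ordered[cut:]


# Hypothetical jobs with estimated execution times.
jobs = [("j1", 40), ("j2", 5), ("j3", 300), ("j4", 12), ("j5", 8)]
high, low = pareto_split(jobs)

# High-priority jobs are executed first, then the remaining ones.
execution_order = [name for name, _ in high + low]
print(execution_order)
```

Here the single long job is deferred so that the four short jobs, which together need far less time, complete first.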

III. PROBLEM FORMULATION
Cloud computing is a large-scale distributed computing paradigm. It harnesses the Internet and wide area networks to use remotely available resources, and it provides extensive computation and resource facilities for executing application workflows over the Internet. Many heterogeneous resources are involved in the execution of a single application. These features make cloud computing popular, and it is growing at a very fast rate. The cloud has become a very large network in which millions of users access thousands of servers at all times. These servers may be located at a single place or at different geographical locations.

Cloud users generate requests for processing on cloud servers. Because the number of cloud users is very large, the number of requests they generate is also large. If the applications or resources accessed by these users are managed by a single management server, then managing and scheduling the requests becomes very difficult for that server. In addition, each request needs some computing or storage resources in order to execute, so it becomes a tough job for the management server to manage all requests, users and resources. Scheduling requests onto different processing elements solves the problem to some extent. For this, however, the schedule must be good enough that each request gets proper resources for its execution and executes in the least time.

IV. PROPOSED ALGORITHM
Managing the requests generated by cloud users is a difficult task for a management server. To overcome this managing and scheduling problem and to provide each request with a suitable resource and minimum execution time, a Prioritized Workflow Scheduling Algorithm is proposed and implemented in a simulated environment. The algorithm works in the following three steps:
Step 1: Cluster the requesting jobs on the basis of certain attributes.
Step 2: Within each cluster, apply a priority algorithm to prioritize the jobs on the basis of certain attributes. The attribute used for prioritizing may differ from the attribute used for clustering.
Step 3: Assign each cluster to the computing environment that is capable of executing the jobs within the cluster in the least time.

V. SIMULATION AND RESULT
The proposed algorithm was executed and evaluated in a simulated environment using the CloudSim simulator. CloudSim is a Java-based tool for executing cloud-based applications and simulations. The algorithm has been evaluated in our experimental cloud environment. The details of the experimental process and the case scenario are described below.

Fig.1. A proposed model (Prioritized Workflow Scheduling)
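The three steps of the proposed model (cluster, prioritize within each cluster, assign each cluster to a computing environment) can be sketched as follows. The paper leaves the concrete attributes and the assignment rule open, so everything specific here is an assumption: jobs are clustered by a hypothetical `kind` field, prioritized by a hypothetical `length` field (shortest first), and each cluster goes to the VM group with the earliest estimated finish time.

```python
from collections import defaultdict


def prioritized_schedule(jobs, vm_groups):
    """Return {cluster_key: (vm_group_name, ordered_job_ids)}."""
    # Step 1: cluster jobs on one attribute (assumed here: "kind").
    clusters = defaultdict(list)
    for job in jobs:
        clusters[job["kind"]].append(job)

    busy = {group["name"]: 0.0 for group in vm_groups}  # time already committed
    plan = {}
    for kind, members in clusters.items():
        # Step 2: prioritize inside the cluster on a different attribute
        # (assumed here: shorter jobs first).
        members.sort(key=lambda job: job["length"])

        # Step 3: pick the VM group that would finish this cluster earliest.
        work = sum(job["length"] for job in members)

        def finish_time(group):
            capacity = group["mips"] * group["count"]
            return busy[group["name"]] + work / capacity

        best = min(vm_groups, key=finish_time)
        busy[best["name"]] = finish_time(best)
        plan[kind] = (best["name"], [job["id"] for job in members])
    return plan


# Hypothetical jobs (length in million instructions) and VM groups.
jobs = [
    {"id": 1, "kind": "cpu", "length": 4000},
    {"id": 2, "kind": "io", "length": 1000},
    {"id": 3, "kind": "cpu", "length": 1000},
    {"id": 4, "kind": "io", "length": 500},
]
vm_groups = [
    {"name": "A", "mips": 250, "count": 2},
    {"name": "B", "mips": 250, "count": 1},
]
plan = prioritized_schedule(jobs, vm_groups)
print(plan)
```

Keeping the clustering attribute and the priority attribute separate mirrors the remark in Step 2 that the two may differ.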

a) Simulation Description
In this setup, the requests generated by cloud users are termed jobs. All jobs are independent, which means there is no dependency between any of them. The "Prioritized Workflow Scheduling" model is implemented and compared with first come first serve (sequential scheduling and execution). The results are calculated as the total time taken to execute all jobs in different scenarios. CloudSim v3.0.2 is used to implement workflow scheduling in cloud computing. The simulation is performed on a computer running Windows 7 with the following configuration: Intel® Core™ i3-2350M CPU @ 2.30GHz processor, 4GB DDR3 main memory, and a 500GB 5400RPM hard drive.

b) Virtual Machine
A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine [13]. Table I describes the configuration of the VMs.

TABLE I
CONFIGURATION OF VMs

Configuration        VM
-------------------  --------
RAM                  512 MB
No. of Processors    1
MIPS                 250
Bandwidth            1000
Storage Space        10000 MB
Processor Type       Xen

c) Cloudlet
A Cloudlet acts as the input job/task to the cloud environment. In addition to the information describing the job itself, a Cloudlet stores the ID of the VM running it.
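To make the role of the Table I parameters concrete, the following sketch estimates the total time of a sequential first come first serve run. It is not CloudSim code and does not reproduce the paper's measurements: the cloudlet lengths (in million instructions) are invented, the round-robin binding of cloudlets to VMs is an assumption, and the class and function names are hypothetical. Only the default VM values come from Table I.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmConfig:
    """VM parameters as listed in Table I."""
    ram_mb: int = 512
    processors: int = 1
    mips: int = 250
    bandwidth: int = 1000
    storage_mb: int = 10000
    processor_type: str = "Xen"


def fcfs_total_time(cloudlet_lengths_mi, num_vms, vm=VmConfig()):
    """Bind cloudlets to VMs in arrival order (assumed round-robin) and
    return the time at which the last VM finishes."""
    finish = [0.0] * num_vms
    for i, length in enumerate(cloudlet_lengths_mi):
        # One cloudlet of `length` MI takes length / MIPS time units on one VM.
        finish[i % num_vms] += length / vm.mips
    return max(finish)


# Hypothetical cloudlet lengths, in million instructions (MI).
print(fcfs_total_time([1000, 2000, 3000, 4000], num_vms=2))
```

The sketch ignores RAM, bandwidth and storage, which a full simulator would also take into account.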


To evaluate "Prioritized Workflow Scheduling" experimentally, two scenarios were created in the simulated cloud environment using CloudSim. Experiments were performed using both "Prioritized Workflow Scheduling" and the simple First Come First Serve (sequential) approach. The two scenarios are as follows.

In the 1st scenario, the number of virtual machines is kept constant (50) and the number of jobs is variable. Table II compares the execution time of jobs under "Prioritized Workflow Scheduling" with the execution time under the sequential workflow execution algorithm, keeping the number of VMs constant (50). Fig. 2 shows a graphical comparison of the two algorithms with the number of VMs kept constant.

TABLE II
COMPARISON OF EXECUTION TIME WITH CONSTANT NUMBER OF VMs

Sr. No.  No. of Cloudlets  Execution Time using  Execution Time using
                           Proposed Approach     Simple Approach
-------  ----------------  --------------------  --------------------
1        10                92.28                 98.74
2        50                98.88                 100.04
3        100               150.74                188.8
4        150               234.92                272.84
5        200               284.78                349.98
6        300               409.06                496.44
7        400               528.4                 678.06
8        500               677.28                742.3
9        600               779.23                871.8
10       700               894.48                997.1
11       800               1090.34               1176.44
12       900               1177.71               1313.89
13       1000              1292.01               1452.34
14       1100              1427.39               1602.43
15       1200              1548.36               1742.15
16       1300              1647.79               1872.67
17       1400              1801.95               2026.66
18       1500              1917.78               2151.2

Fig.2. Comparison of Execution Time with constant number of VMs

In the 2nd scenario, the number of jobs is kept constant (500) and the number of virtual machines is variable. Table III compares the execution time of jobs under "Prioritized Workflow Scheduling" with the execution time under the sequential workflow execution algorithm, keeping the number of jobs constant (500). Fig. 3 shows a graphical comparison of the two algorithms with the number of Cloudlets kept constant.

TABLE III
COMPARISON OF EXECUTION TIME WITH CONSTANT NUMBER OF JOBS

Sr. No.  No. of VM  Execution Time using  Execution Time using
                    Proposed Approach     Simple Approach
-------  ---------  --------------------  --------------------
1        5          6063.79               6143.41
2        10         3105.64               3198.25
3        20         1598.21               1694.14
4        30         1151.5                1215.3
5        40         835.88                908.93
6        50         668.52                756.73
7        60         595.74                652.52
8        70         516.78                582.47
9        80         480.03                554.33
10       90         407.73                488.54
11       100        369.57                431.06

Fig.3. Comparison of Execution Time with constant number of Cloudlets
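The gap between the two approaches in Table III is easier to read as a relative improvement. The short calculation below uses the Table III values as given; the formula, improvement = (simple - proposed) / simple, and the function name are chosen here for illustration and are not stated in the paper.

```python
# (number of VMs, proposed approach, simple approach), copied from Table III.
table_iii = [
    (5, 6063.79, 6143.41),
    (10, 3105.64, 3198.25),
    (20, 1598.21, 1694.14),
    (30, 1151.5, 1215.3),
    (40, 835.88, 908.93),
    (50, 668.52, 756.73),
    (60, 595.74, 652.52),
    (70, 516.78, 582.47),
    (80, 480.03, 554.33),
    (90, 407.73, 488.54),
    (100, 369.57, 431.06),
]


def improvement_percent(proposed, simple):
    """Relative reduction in execution time with respect to the simple approach."""
    return (simple - proposed) / simple * 100.0


improvements = [improvement_percent(p, s) for _, p, s in table_iii]
for (vms, _, _), pct in zip(table_iii, improvements):
    print(f"{vms:>3} VMs: {pct:.2f}% faster than the simple approach")
```

Computed this way, the relative improvement is a little over 1% with 5 VMs and a little over 14% with 100 VMs.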

VI. CONCLUSION
Two scenarios were created for simulation, and in each of them "Prioritized Workflow Scheduling" was compared with the simple First Come First Serve (sequential) approach. From the observed results, it is concluded that:
1. Prioritized Workflow Scheduling in cloud computing improves the execution time of jobs when compared with the simple First Come First Serve (sequential) approach.
2. The improvement in execution time grows as the number of requesting jobs increases.
3. The execution time is also improved as the number of virtual machines increases.
4. Using the Prioritized Workflow Scheduling approach, two priority algorithms can be applied to a single set of jobs.

REFERENCES
[1] Mladen A. Vouk (2008). "Cloud Computing - Issues, Research and Implementations." Journal of Computing and Information Technology, 16(4), 235-246.
[2] Prasenjit Kumar Patra, Harshpreet Singh, Gurpreet Singh (2013). "Fault Tolerance Techniques and Comparative Implementation in Cloud Computing." International Journal of Computer Applications, 64(14), 37-41.
[3] Ashutosh Ingole, Sumit Chavan and Utkarsh Pawde (2011). "An Optimized Algorithm for Task Scheduling Based on Activity Based Costing in Cloud Computing." 2nd National Conference on Information and Communication Technology (NCICT), Nagpur (India), 23-24 December, 34-37. Foundation of Computer Science, New York, USA.
[4] Jeremy Geelan. "Twenty-one Experts Define Cloud Computing." SYS-CON Media, January 24, 2009. Retrieved October 2012.
[5] Cheng, B. (2012). "Hierarchical Cloud Services Workflow Scheduling Optimization Schema Using Heuristic Generic Algorithm." Przegląd Elektrotechniczny, 1b/2012, 92-95.
[6] WFMC-TC-1011, Liu, K. (1999). Workflow Management Coalition, "Workflow Management Coalition Terminology & Glossary." Brussels.
[7] K. Agrawal, A. Benoit, L. Magnan, Y. Robert (2010). "Scheduling Algorithms for Linear Workflow Optimization." IEEE International Symposium on Parallel & Distributed Processing (IPDPS), Atlanta, GA, 19-23 April 2010, pp. 1-12.
[8] UCB/EECS-2009-55 (2009). "Job Scheduling for Multi-user Mapreduce Cluster." Electrical Engineering and Computer Sciences, University of California at Berkeley.
[9] Navjot Kaur, T. S. (2011). "Comparison of Workflow Scheduling Algorithms in Cloud Computing." International Journal of Advanced Computer Science and Applications, 2(10), 81-86.
[10] S. Pandey, Dileban Karunamoorthy and Rajkumar Buyya (2011). "Workflow Engine for Clouds." In Rajkumar Buyya, James Broberg, Andrzej Goscinski, "Cloud Computing: Principles and Paradigms." John Wiley & Sons, Inc., Hoboken, New Jersey.
[11] Tayal, S. (2011). "Tasks Scheduling Optimization for the Cloud Computing System." International Journal of Advanced Engineering Science and Technologies (IJAEST), 5(2), 111-115.
[12] Nyema R Hamilton. "Key Facts about Time Management." Retrieved July 20, 2012 from http://ezinearticles.com/?KeyFacts-About-TimeManagement&id=7071162.
[13] Smith, James; Nair, Ravi (2005). "The Architecture of Virtual Machines." Computer (IEEE Computer Society), 38(5), 32-38.
[14] Rajwinder Singh, Pushpendra Kumar Pateriya. "Workflow Scheduling in Cloud Computing." International Journal of Computer Application (0975-8887), January 2013.
