BUDGET CONSTRAINED PRIORITY BASED GENETIC ALGORITHM FOR WORKFLOW SCHEDULING IN CLOUD

Amandeep Verma and Sakshi Kaushal

University Institute of Engineering and Technology, Panjab University, Chandigarh, India. [email protected], [email protected]

Abstract: Data transfer is a major overhead for cloud workflows because of the pay-as-you-use business model of the cloud environment. In such an environment, the cost of data transfers between resources, as well as the execution cost, must be taken into account when scheduling under the user's Quality of Service (QoS) constraints. In this paper, we present a priority based genetic algorithm, BCHGA, that schedules workflow applications onto cloud resources and optimizes the total cost of the workflow within the user-specified budget. Each workflow task is assigned a priority using its bottom level (b-level) and top level (t-level). To increase population diversity, these priorities are then used to create the initial population of BCHGA. The proposed algorithm is simulated in Java and evaluated with synthetic workflows based on realistic workflows from different areas, using the pricing model of a cloud service provider such as Amazon. The simulation results show that the proposed algorithm has promising performance compared with the Standard Genetic Algorithm (SGA).

Keywords: Cloud Computing, Workflow, Scheduling, DAG, Genetic Algorithm, b-level, t-level.

INTRODUCTION

Cloud computing delivers hardware infrastructure and software applications as services on demand over the Internet through virtualization [1, 2]. It adopts a market-oriented business model in which users are charged on a pay-as-you-go basis for consuming cloud services such as computing, storage and network services, much like conventional utilities in everyday life (e.g. water, electricity, gas and telephony) [3]. Workflow scheduling is one of the key challenges in cloud workflow systems. It allocates suitable resources to workflow tasks so that the execution can be completed while satisfying the objective functions imposed by users [4]. There are two major types of scheduling for grid workflows: best-effort based and QoS constraint based scheduling [5]. Best-effort algorithms attempt to minimise the execution time and ignore other factors such as the monetary cost of accessing resources. Because cloud computing adopts a market-oriented business model, however, there is another important parameter besides the execution time: the cost of accessing resources [6]. Usually, faster resources are more expensive than slower ones. Therefore, scheduling algorithms designed for grid workflows cannot be applied directly to cloud workflow applications [7].

In this paper, we propose the Budget Constrained Priority based Genetic Algorithm (BCHGA) to schedule applications onto cloud resources. It optimizes the total cost (the sum of execution cost and data transfer cost) of running the workflow and also minimizes the execution time under the budget constraint.

The remainder of the paper is organized as follows. Related work in the area of workflow scheduling for distributed computing is presented in the section Related Work. The workflow scheduling model used in the experiments is explained in the section Workflow Scheduling Model. The Standard Genetic Algorithm (SGA) and the proposed BCHGA are discussed in the sections Standard Genetic Algorithm and Proposed Algorithm respectively. BCHGA is evaluated and compared with SGA in the section Performance Evaluation, and the paper is concluded in the section Conclusion and Future Work.

RELATED WORK

Workflow scheduling is a classical NP-complete problem. The major grid workflow scheduling algorithms have been classified into two basic categories: best-effort based scheduling and QoS constraint based scheduling [5]. In traditional community based computing paradigms, best-effort based scheduling strategies are often applied to minimize only the execution time, without considering the monetary cost, since resources are shared freely among system users. Many heuristic algorithms, such as Minimum Execution Time, Minimum Completion Time, Min-min and Max-min, are used as candidates for best-effort based scheduling strategies [8].

Many researchers have used the Genetic Algorithm (GA) for task assignment. In [9], the fitness function of the GA is designed to favour solutions that minimize the execution time of a workflow in grids under deadline and budget constraints. A Hierarchic Genetic Scheduler [10] was developed to improve the effectiveness of the single-population genetic based scheduler in the dynamic grid environment for scheduling independent jobs. The authors considered the bi-objective independent batch job scheduling problem, with makespan and flowtime minimized in hierarchical mode. Furthermore, a two-phase algorithm called H2GS [11] has been proposed for task scheduling in heterogeneous processor networks. The first phase implements a heuristic list based algorithm, called LDCP, to generate a high quality schedule. In the second phase, this schedule is injected into the initial population of a GA, which proceeds to evolve shorter schedules. In [12], the authors embedded the elitism method into the GA to generate shorter schedules and to reduce the computation time needed to find a sub-optimal schedule compared with the basic GA. The authors further extended their work for multiprocessor systems by improving the sub-optimal results using Simulated Annealing (SA) [13].

For cloud workflow systems, mainly QoS-constraint based scheduling strategies based on the market-oriented business model are required. A compromised-time-cost scheduling algorithm has been proposed to accommodate transaction-intensive [14] and instance-intensive [15] cost-constrained workflows in the cloud by trading off execution time against cost, with user input enabled on the fly. The algorithm cuts the mean execution cost by over 15% whilst meeting the user-designated deadline, or shortens the mean execution time by over 20% within the user-designated execution cost. A market-oriented hierarchical scheduling strategy [16] has been developed for instance-intensive workflow applications; it performs the workflow scheduling at two levels in the cloud environment. A package based random scheduling algorithm was presented as the candidate service-level scheduling algorithm, and three representative metaheuristic based scheduling algorithms, GA, Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO), were adapted, implemented and analyzed as the candidate task-level scheduling algorithms under different QoS constraints. However, none of these works considers the different pricing models of the cloud environment.
Saeid et al. [17] proposed two workflow scheduling algorithms for the cloud environment: a one-phase algorithm, IC-PCP, and a two-phase algorithm, IC-PCPD2. Both algorithms have polynomial time complexity for scheduling large workflows under a deadline constraint. The authors considered different types of pricing models in their simulation. In this paper, we use a Genetic Algorithm to schedule workflow applications onto cloud resources, considering the pricing model specified by Amazon [18]. We focus on minimizing the execution cost and time while meeting the user-specified budget for delivering the result.

WORKFLOW SCHEDULING MODEL

A workflow application is modelled as a Directed Acyclic Graph (DAG), defined by a tuple $G(T, E)$, where $T$ is the set of $n$ tasks $\{t_1, t_2, \ldots, t_n\}$ and $E$ is a set of $e$ edges that represent the dependencies. Each $t_i \in T$ represents a task in the application, and each edge $(t_i, t_j) \in E$ represents a precedence constraint, such that the execution of $t_j \in T$ cannot start before $t_i \in T$ finishes its execution [19]. If $(t_i, t_j) \in E$, then $t_i$ is the parent of $t_j$, and $t_j$ is the child of $t_i$. A task with no parent is known as an entry task, and a task with no children is known as an exit task.

Basic Definitions

Bottom level (b-level). The b-level of a task is the length of the longest path from the task to a leaf (exit) task [13]. The b-level of a task $t_i$ is calculated as:

$$blevel(t_i) = w_i + \max_{t_j \in succ(t_i)} \{ d_{ij} + blevel(t_j) \} \qquad (1)$$

where $w_i$ is the average execution time of task $t_i$ over the different computing machines, $succ(t_i)$ is the set of all child tasks of $t_i$, and $d_{ij}$ is the data transmission time from task $t_i$ to task $t_j$. If a task has no children, its b-level is equal to its average execution time over the different computing machines.

Top level (t-level). The t-level of a task in the DAG is defined as the length of the longest path between the entry task and the task, without counting the execution time of the task itself [13]. It is given by the following equation:

$$tlevel(t_i) = \max_{t_j \in pred(t_i)} \{ d_{ji} + tlevel(t_j) + w_j \} \qquad (2)$$

where $w_j$ is the average execution time of the parent task $t_j$ over the different computing machines, $pred(t_i)$ is the set of all parent tasks of $t_i$, and $d_{ji}$ is the data transmission time from task $t_j$ to task $t_i$. For an entry task, i.e. a task with no parent, the t-level is zero.

Estimated Completion Time (ECT). The ECT is an $n \times m$ matrix in which $ECT_{i,j}$ is the estimated completion time of task $t_i$ on machine $m_j$. Users are charged according to the number of time intervals for which they have used a particular machine. All computation and storage services of the service provider are assumed to be in the same physical region, so the average bandwidth between the different available machines is roughly equal [17].

An Illustrative Example

Consider a DAG with 11 tasks, as shown in Figure 1. Each edge weight of the DAG represents the data transmission time between the tasks. Table 1 shows the expected completion time of the tasks on three different machines. The b-level and t-level of every task are calculated using equations (1) and (2) respectively. The tasks are then sorted in descending order of b-level and in ascending order of t-level to decide their order of execution (Table 2). The tasks are sent to the machines in this order to complete the workflow application. Figures 2 and 3 show the schedules generated according to the b-level and the t-level of the DAG respectively.
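Equations (1) and (2) can be evaluated with a memoized recursion over the DAG. The following Python sketch is only illustrative: the four-task DAG, its execution times and its transfer times are hypothetical and are not the DAG of Figure 1, and the names `avg_exec`, `transfer`, `blevel` and `tlevel` are chosen here for the example.

```python
from functools import lru_cache

# Hypothetical 4-task DAG (NOT the DAG of Figure 1):
# t1 -> t2, t1 -> t3, t2 -> t4, t3 -> t4
avg_exec = {"t1": 3, "t2": 2, "t3": 4, "t4": 1}      # w_i, average execution time
transfer = {("t1", "t2"): 1, ("t1", "t3"): 2,
            ("t2", "t4"): 3, ("t3", "t4"): 1}        # d_ij, data transmission time

# Build child and parent lists from the edge set.
succ = {t: [] for t in avg_exec}
pred = {t: [] for t in avg_exec}
for parent, child in transfer:
    succ[parent].append(child)
    pred[child].append(parent)

@lru_cache(maxsize=None)
def blevel(t):
    # Equation (1); a task without children gets its own execution time.
    return avg_exec[t] + max(
        (transfer[(t, c)] + blevel(c) for c in succ[t]), default=0)

@lru_cache(maxsize=None)
def tlevel(t):
    # Equation (2); an entry task gets zero.
    return max(
        (transfer[(p, t)] + tlevel(p) + avg_exec[p] for p in pred[t]), default=0)

b_levels = {t: blevel(t) for t in avg_exec}
t_levels = {t: tlevel(t) for t in avg_exec}
print(b_levels)
print(t_levels)
```

For any task on the longest (critical) path of the DAG, the sum of its t-level and b-level equals the length of that path, which gives a quick sanity check on the two recursions.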

Fig. 1: A Sample DAG

Table 1: ECT Matrix

Task   M1   M2   M3
T1     3    5    1
T2     2    3    1
T3     3    5    1
T4     2    3    1
T5     2    3    1
T6     2    3    1
T7     2    3    1
T8     4    6    2
T9     3    5    1
T10    2    3    1
T11    5    7    3

Table 2: b-level and t-level of DAG

Task   AvgECT   b-level   t-level   Order of execution    Order of execution
                                    according to b-level  according to t-level
T1     3        16        0         2                     1
T2     2        17        0         1                     2
T3     3        14        0         3                     3
T4     2        13        0         4                     4
T5     2        11        5         5                     5
T6     2        8         6         7                     6
T7     2        10        7         6                     7
T8     4        4         12        9                     10
T9     3        3         11        10                    9
T10    2        2         10        11                    8
T11    5        5         12        8                     11

Fig. 2: Schedule according to b-level
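The AvgECT column and the two ordering columns of Table 2 follow mechanically from Table 1 and from the b-level and t-level values. A minimal Python sketch is given below; it assumes that ties are broken by task index, which is consistent with the table but is not stated in the text.

```python
# ECT matrix from Table 1: task -> (M1, M2, M3)
ect = {
    "T1": (3, 5, 1), "T2": (2, 3, 1), "T3": (3, 5, 1), "T4": (2, 3, 1),
    "T5": (2, 3, 1), "T6": (2, 3, 1), "T7": (2, 3, 1), "T8": (4, 6, 2),
    "T9": (3, 5, 1), "T10": (2, 3, 1), "T11": (5, 7, 3),
}
# b-level and t-level values copied from Table 2
blevel = {"T1": 16, "T2": 17, "T3": 14, "T4": 13, "T5": 11, "T6": 8,
          "T7": 10, "T8": 4, "T9": 3, "T10": 2, "T11": 5}
tlevel = {"T1": 0, "T2": 0, "T3": 0, "T4": 0, "T5": 5, "T6": 6,
          "T7": 7, "T8": 12, "T9": 11, "T10": 10, "T11": 12}

tasks = list(ect)  # T1..T11 in index order

# Average execution time of each task over the three machines
avg_ect = {t: sum(ect[t]) / len(ect[t]) for t in tasks}

# sorted() is stable, so tasks with equal keys keep their index order
# (assumed tie-break rule).
by_blevel = sorted(tasks, key=lambda t: -blevel[t])   # descending b-level
by_tlevel = sorted(tasks, key=lambda t: tlevel[t])    # ascending t-level

order_b = {t: rank for rank, t in enumerate(by_blevel, start=1)}
order_t = {t: rank for rank, t in enumerate(by_tlevel, start=1)}
print(by_blevel)
print(by_tlevel)
```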

Fig. 3: Schedule according to t-level

STANDARD GENETIC ALGORITHM (SGA)

The Standard Genetic Algorithm (SGA) belongs to the class of evolutionary algorithms inspired by evolutionary biology. Any solution in the search space of the problem is represented by an individual, or chromosome [20]. A genetic algorithm maintains a population of individuals that evolves over generations towards better solutions through the repeated application of genetic operators such as crossover, mutation and selection [21]. The quality of an individual in the population is determined by a fitness function. The fitness value indicates how good the individual is compared with the others in the population [22]. For the budget constrained scheduling problem, SGA defines the fitness function as:

$$F(I) = \frac{c(I)}{B} \qquad (3)$$

where $c(I)$ is the total cost of an individual $I$ and $B$ is the user-specified budget for scheduling the workflow application. An individual is fit if $F(I) < 1$; otherwise the individual is not included in the population.

The pseudocode for the Standard Genetic Algorithm (SGA) is given as:

Pseudocode for SGA
1. BEGIN
2. Create an initial population consisting of randomly generated solutions.
3. While the termination criterion is not met do
4. Evaluate the fitness of the individuals in the population using equation (3).
5. Apply the selection operator to select the parents from the population.
6. Apply the crossover operator on the selected parents using crossover probability Cr to create the children.
7. Apply the mutation operator with probability Mr on the newly created children.
8. Validate each child according to the fitness function.
9. Add the valid children to create the new population.
10. end while.
11. END

To increase the population diversity of SGA, we propose the Budget Constrained Priority based Genetic Algorithm.
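The SGA loop and the fitness test of equation (3) can be sketched in Python as follows. This is a simplified illustration, not the authors' Java simulation. The execution times are taken from Table 1, but the prices, the billing interval and the budget are hypothetical values. The chromosome is assumed to be a plain task-to-machine list, selection is assumed to be a binary tournament, crossover is assumed to be single-point, and data transfer cost and task ordering are left out.

```python
import math
import random

# Execution times from Table 1 (rows T1..T11, columns M1..M3).
ECT = [(3, 5, 1), (2, 3, 1), (3, 5, 1), (2, 3, 1), (2, 3, 1), (2, 3, 1),
       (2, 3, 1), (4, 6, 2), (3, 5, 1), (2, 3, 1), (5, 7, 3)]
PRICE = (2.0, 1.0, 6.0)   # hypothetical price per billing interval on M1..M3
INTERVAL = 5              # hypothetical billing interval, in ECT time units
BUDGET = 20.0             # hypothetical user budget B
CR, MR = 0.7, 0.1         # crossover and mutation probabilities

def cost(ind):
    """c(I): every machine is billed per started interval of busy time.
    Data transfer cost is ignored in this sketch."""
    busy = [0] * len(PRICE)
    for task, machine in enumerate(ind):
        busy[machine] += ECT[task][machine]
    return sum(math.ceil(b / INTERVAL) * p for b, p in zip(busy, PRICE))

def fitness(ind):
    return cost(ind) / BUDGET              # equation (3)

def is_valid(ind):
    return fitness(ind) < 1                # fit individuals stay within budget

def sga(pop_size=10, generations=50, seed=0):
    rng = random.Random(seed)
    n, m = len(ECT), len(PRICE)
    population = []
    while len(population) < pop_size:      # random initial population
        ind = [rng.randrange(m) for _ in range(n)]
        if is_valid(ind):
            population.append(ind)
    for _ in range(generations):
        new_population = []
        while len(new_population) < pop_size:
            # Binary tournament selection (assumed operator); lower F is better.
            p1 = min(rng.sample(population, 2), key=fitness)
            p2 = min(rng.sample(population, 2), key=fitness)
            c1, c2 = p1[:], p2[:]
            if rng.random() < CR:          # single-point crossover (assumed)
                cut = rng.randrange(1, n)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                if rng.random() < MR:      # mutation: move one task to a random machine
                    child[rng.randrange(n)] = rng.randrange(m)
                # Only valid children enter the new population.
                if is_valid(child) and len(new_population) < pop_size:
                    new_population.append(child)
        population = new_population
    return min(population, key=fitness)

best = sga()
print(best, cost(best), fitness(best))
```

Because invalid children are simply discarded, the inner loop keeps producing children until the new population is full; with a very tight budget a practical implementation would need a cap on the number of attempts.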

PROPOSED ALGORITHM

We propose BCHGA, in which the bottom level and top level of the workflow tasks are used to assign a priority to each task. The pseudocode of BCHGA is given below.

Pseudocode for BCHGA
1. BEGIN
2. Calculate the b-level and t-level of all the tasks of the workflow using equations (1) and (2).
3. Create the initial population of BCHGA: for every individual, the priority of each task is first set equal to the sum of its b-level and a random number generated in the range (-t-level/2, t-level/2). All the tasks are then assigned to the available machines according to their priority. Each individual is encoded using the 2-d encoding.
4. While the termination criterion is not met do
5. Evaluate the fitness of the individuals in the population using equation (3).
6. Apply the selection operator to select the parents from the population.
7. Apply the crossover operator on the selected parents using crossover probability Cr to create the children.
8. Apply the mutation operator with probability Mr on the newly created children.
9. Validate each child according to the fitness function.
10. Add the valid children to create the new population.
11. end while.
12. END

PERFORMANCE EVALUATION

In this section, we present the simulation of the proposed algorithm. To evaluate the workflow scheduling algorithm, we used five synthetic workflows based on realistic workflows from diverse scientific applications:

Montage: Astronomy
CyberShake: Earthquake
Epigenomics: Biology
LIGO: Gravitational physics
SIPHT: Biology

The detailed characterization of each workflow, including its structure and its data and computational requirements, can be found in [23]. Figure 4 shows the approximate structure of a small instance of each workflow. The DAX (Directed Acyclic Graph in XML) files for all these workflows are available at [24], from which we have chosen three sizes for our experiment: Small (about 50 tasks), Medium (about 100 tasks) and Large (about 1000 tasks).

Experiment setup

For our experiment, we simulated a cloud environment in Java. The processor speeds of the VMs are selected randomly in the range 1000-5000 MIPS, and the price of using these VMs is set within a range of 2-10 basic units, such that the fastest VM is roughly five times more expensive than the slowest one. The average bandwidth between the VMs is roughly equal [17]. To evaluate the impact of the time interval on our algorithm, we consider a long time interval of one hour, as used by Amazon [18].

Performance Metrics

The performance metric chosen for the comparison is the Normalized Schedule Length (NSL). The NSL of a schedule is calculated as:

$$NSL = \frac{\text{execution time of the schedule}}{M_c}$$

where $M_c$ is the execution time of the same workflow when all the tasks are executed on the fastest VM according to their precedence constraints. To assign the budget, we first define the cheapest schedule as the one that schedules every workflow task on the fastest VM, according to the precedence constraints, with all data transmission costs taken as zero. The execution cost of this schedule, denoted by $C_f$, is a lower bound for the budget of executing the workflow. The budget for the whole workflow is therefore defined as:

$$Budget = \alpha \cdot C_f$$

where $\alpha$ is a budget factor in the range from 1.5 to 5. For the GA, the following parameters are set:

Parameter                     Value
Initial population            10
Crossover probability (Cr)    0.7
Mutation probability (Mr)     0.1
Maximum generation            50

Fig. 4: Structure of Various Workflows [23]

Experiment Result

As GA is a stochastic algorithm, each algorithm is run 10 times for each workflow, and the average value of NSL is used to compare BCHGA and SGA. Figure 5 shows the average NSL of scheduling large workflows with BCHGA and SGA with the time interval equal to one hour. BCHGA outperforms SGA in all cases. It is clear from Figures 5(a) and 5(e) that the average NSL for Montage and CyberShake is high compared with the other workflow structures, as both of these workflow structures consist of a large number of tasks with small execution times on the fastest VM at the second row, which increases the overall execution cost of the workflow. Similar graphs are obtained for small and medium size workflows.

Fig. 5: The NSL of scheduling workflows with BCHGA and SGA with the time interval equal to 1 hour. Panels: (a) Montage, (b) Epigenomics, (c) SIPHT, (d) LIGO, (e) CyberShake. Each panel plots NSL against the budget factor α (1.5 to 5) for BCHGA and SGA.

CONCLUSION AND FUTURE WORK

In this paper, we have presented the Budget Constrained Priority based Genetic Algorithm, BCHGA, which schedules applications onto cloud resources so as to minimize the execution cost while meeting the budget for delivering the result. Each workflow task is assigned a priority using its bottom level and top level. These priorities are then used to create the initial population of BCHGA. The proposed algorithm is evaluated with synthetic workflows that are based on realistic workflows with different structures and different sizes. The algorithm also considers the pay-as-you-use pricing model of current commercial clouds such as Amazon. The proposed algorithm is compared with SGA under the same budget constraint and pricing model. The simulation results show that the proposed algorithm has promising performance compared with SGA. In future, we intend to extend our work to a real cloud environment, to include other QoS constraints, and to compare it with other meta-heuristic techniques such as PSO and ACO.

REFERENCES

[1] Foster I., Zhao Y., Raicu L., and Lu S. (2008): ‘Cloud computing and grid computing 360-degree compared’, in Proceedings of Grid Computing Environment Workshop, Austin, pp. 1-10.
[2] Verma A., and Kaushal S. (2011): ‘Cloud computing security issues and challenges: a survey’, in Proceedings of 1st International Conference on Advances in Computing and Communications, Part IV, Kochi, India, July 22-24. Series Title: Communications in Computer and Information Sciences, Vol. 193, Springer, pp. 445-454.
[3] Gabriel M., Wolfgang G., and Calvin J. R. (2011): ‘Hybrid computing - where HPC meets grid and cloud computing’, Journal of Future Generation Computer Systems, Vol. 27, No. 5, May, pp. 440-453.

[4] Taylor I., Deelman E., Gannon D., and Shields M. (2007): ‘Workflows for E-Science: Scientific Workflows for Grid’, 1st Edition, Springer.
[5] Yu J. and Buyya R. (2008): ‘Workflow scheduling algorithms for grid computing’, in Xhafa F., Abraham A. (eds), Metaheuristics for scheduling in distributed computing environment, Springer, Berlin. ISBN: 978-3-540-69260-7.
[6] Pandey S. (2010): ‘Scheduling And Management Of Data Intensive Application Workflows In Grid And Cloud Computing Environment’, PhD Thesis, University of Melbourne, Australia.
[7] Pandey S., Karunamoorthy D., and Buyya R. (2011): ‘Workflow engine for clouds’, Cloud Computing: Principles and Paradigms, Chapter 12, Wiley STM.
[8] Yu J. and Buyya R. (2005): ‘Cost based scheduling of scientific workflow application on utility grid’, in Proceedings of 1st IEEE International Conference on e-Science and Grid Computing, Melbourne, Australia, pp.8p-147.
[9] Florin P., Ciprian D., and Valentin C. (2009): ‘Genetic algorithm for DAG scheduling in grid environments’, in IEEE ICCP 2009: Proceedings of International Conference on Intelligent Computer Communications and Processing, Cluj-Napoca, pp. 299-305.
[10] Joanna K., and Samee U. K. (2012): ‘Multi-level hierarchic genetic-based scheduling of independent jobs in dynamic heterogeneous grid environment’, Journal of Information Sciences, Vol. 214, No. 12, December, pp. 1-19.
[11] Mohammad I.D., and Nawwaf K. (2011): ‘A hybrid heuristic-genetic algorithm for task scheduling in heterogeneous processor networks’, Journal of Parallel and Distributed Computing, Vol. 71, No. 11, November, pp. 1518-1531.
[12] Amir M. R., and Mohammad A. V. (2008): ‘A novel task scheduling in multiprocessor systems with genetic algorithm by using elitism stepping method’, INFOCOMP Journal of Computer Science, Vol. 7, No. 2, pp. 58-64.
[13] Amir M. R., and Mohammad A. V. (2009): ‘A novel genetic algorithm for static task scheduling in distributed systems’, Journal of Computer Theory and Engineering, Vol. 1, No. 1, pp. 1-6, ISSN 17931801.
[14] Yang Y., Liu K., Chen J., Liu X., Yuan D., and Jin H. (2008): ‘An algorithm in SwinDeW-C for scheduling transaction-intensive cost-constrained cloud workflows’, in e-Science08: Proceedings of 4th IEEE International Conference on e-Science, Indianapolis, USA, pp. 374-375.
[15] Ke L., Hai J., Jinjun C., Xiao L., Dong Y., and Yun Y. (2010): ‘A compromised-time-cost scheduling algorithm in SwinDeW-C for instance-intensive cost-constrained workflows on cloud computing platform’, International Journal of High Performance Computing Applications, Vol. 24, No. 4, pp. 445-456.
[16] Zhangjun W., Xiao L., Zhiwei N., Dong Y., and Yun Y. (2013): ‘A market-oriented hierarchical scheduling strategy in cloud workflow systems’, Journal of Supercomputing, Vol. 63, No. 1, January, pp. 256-293, DOI 10.1007/s11227-011-0578-4.
[17] Saeid A., Mahmoud N., and Dick H.J.E. (2013): ‘Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds’, Journal of Future Generation Computer Systems, Vol. 29, No. 1, January, pp. 158-169.
[18] Amazon.com (2012): ‘Amazon Elastic Compute Cloud [online]’. Available: http://aws.amazon.com/ec2/.
[19] Verma A., and Kaushal S. (2012): ‘Deadline and budget distribution based cost-time optimization workflow scheduling algorithm for cloud’, in IJCA Proceedings of International Conference on Recent Advances and Future Trends in IT, Patiala, India, April, pp. 1-4.
[20] Yu J. and Buyya R. (2006): ‘A budget constraint scheduling of workflow application on utility grid using genetic algorithm’, in HPDC 2006: Proceedings of 15th IEEE International Symposium on High Performance Distributed Computing, Paris, pp. 1-10.
[21] Yu J. and Buyya R. (2006): ‘Scheduling scientific workflow applications with deadline and budget constraints using genetic algorithms’, Scientific Programming Journal, Vol. 14, No. 3, December, pp. 217-230.
[22] Kumar P., and Verma A. (2012): ‘Scheduling using improved genetic algorithm in cloud computing for independent tasks’, in Proceedings of International Conference on Advances in Computing, Communications and Informatics, Chennai, India, August 3-5, pp. 137-142.
[23] Bharathi S., Lanitchi A., Deelman E., Mehta G., Su M.H., and Vahi K. (2008): ‘Characterization of scientific workflows’, 3rd Workshop on Workflows in Support of Large Scale Science, CA, USA, pp: 110.
[24] Pegasus Workflow Generator (2012): Available: http://confluenece.peagasus.isi.edu/display.peagasus/WorkflowGenerator