Parallel Computing 62 (2017) 1–19


A hybrid multi-objective Particle Swarm Optimization for scientific workflow scheduling

Amandeep Verma, Sakshi Kaushal
University Institute of Engineering & Technology, Panjab University, Chandigarh, India

Article info

Article history: Received 28 August 2015; Revised 26 October 2016; Accepted 23 January 2017; Available online 25 January 2017

Keywords: Cloud computing; Scientific workflows; Particle Swarm Optimization; Scheduling; Multi-objective Optimization; IaaS cloud

Abstract

Nowadays, cloud computing is a technology which avoids provisioning cost while providing scalability and elasticity of accessible resources on a pay-per-use basis. To satisfy the increasing demand for computing power to execute large-scale scientific workflow applications, workflow scheduling is the main challenging issue in Infrastructure-as-a-Service (IaaS) clouds. As workflow scheduling is an NP-complete problem, meta-heuristic approaches are the preferred option. Users often specify deadline and budget constraints for scheduling these workflow applications over cloud resources, but these constraints conflict with each other, i.e., the cheaper resources are slow as compared to the expensive resources. Most of the existing studies try to optimize only one of the objectives, i.e., either time minimization or cost minimization under user-specified Quality of Service (QoS) constraints. However, due to the complexity of workflows and the dynamic nature of the cloud, a trade-off solution is required to balance execution time and processing cost. To address these issues, this paper presents a non-dominance sort based Hybrid Particle Swarm Optimization (HPSO) algorithm to handle the workflow scheduling problem with multiple conflicting objective functions on IaaS clouds. The proposed algorithm is a hybrid of our previously proposed Budget and Deadline constrained Heterogeneous Earliest Finish Time (BDHEFT) algorithm and multi-objective PSO. The HPSO heuristic tries to optimize two conflicting objectives, namely makespan and cost, under the deadline and budget constraints. Along with these two conflicting objectives, the energy consumed by the created workflow schedule is also minimized. The proposed algorithm gives a set of Pareto optimal solutions from which the user can choose the best solution. The performance of the proposed heuristic is compared with state-of-the-art multi-objective meta-heuristics like NSGA-II, MOPSO, and ε-FDPSO. The simulation analysis substantiates that the solutions obtained with the proposed heuristic deliver better convergence and more uniform spacing among the solutions as compared to the others. Hence it is applicable to solve a wide class of multi-objective optimization problems for scheduling scientific workflows over IaaS clouds.

1. Introduction

Cloud computing is a booming area in distributed computing that delivers dynamically scalable services on demand over the Internet through virtualization of hardware and software [1]. It is based on a market-oriented business paradigm where users consume these services under a Service Level Agreement (SLA) and are charged on a pay-as-you-go basis like conventional utilities [2].

The main advantages of the cloud are its scalability and flexibility, where the user can lease and release resources/services as per need [3]. Moreover, cloud providers offer two resource provisioning plans, namely, short-term on-demand and long-term reservation plans. In the on-demand plan, the user can dynamically provision resources at the moment they are needed, to fit fluctuating and unpredictable demands. With the reservation plan, the user reserves the resources in advance [4]. Cloud providers such as Amazon EC2 [5] and GoGrid [6] offer services under both plans.

Scientific workflows require massive computation and communication. These workflows, especially those related to scientific areas such as astronomy and biology, present a strong case for the use of the cloud for their execution. Workflow scheduling is the process of mapping inter-dependent tasks onto the available resources such that the workflow application completes its execution within the user's specified QoS constraints, such as deadline and budget [7]. Initially, for grid workflows, scheduling algorithms attempted to minimize the execution time without considering the cost of accessing resources. In cloud computing, however, the service provider offers resources of different capabilities at different prices, and faster resources are normally more expensive than slower ones. Thus, different scheduling plans for the same workflow using different resources may result in different makespans and different monetary costs. Therefore, the workflow scheduling problem in the cloud requires both the time and cost constraints specified by the user to be satisfied [8]. The time constraint ensures that the workflow is executed within the given deadline, and the cost constraint ensures that the budget specified by the user is not overshot. A good heuristic tries to balance both these values and still obtain a near-optimal solution [9]. In recent years, to achieve better solution quality, most researchers have focused on developing nature-inspired meta-heuristic algorithms like Simulated Annealing (SA) [10], Genetic Algorithm (GA) [11], Ant Colony Optimization (ACO) [12], and Particle Swarm Optimization (PSO) [13] to solve the multi-objective workflow scheduling problem, considering minimizing makespan and minimizing execution cost as the two main conflicting objectives, without paying much attention to the energy consumed by the created schedule plan.

This paper presents the use of a multi-objective optimization approach to generate Pareto optimal solutions for cloud workflow applications. We propose the multi-objective Hybrid Particle Swarm Optimization (HPSO) algorithm, based upon a non-dominance sorting procedure, to solve the cloud workflow scheduling problem. Our new approach is a hybrid of a multi-objective PSO algorithm and our previously proposed BDHEFT algorithm [14]. The BDHEFT algorithm gives a trade-off schedule plan between cost and makespan depending upon the user preference. A multi-objective PSO algorithm generates a wide range of potential solutions for the multi-objective workflow scheduling problem, so combining the meta-heuristic with the BDHEFT algorithm leads to more efficient behavior of the HPSO algorithm when scheduling workflow applications over an IaaS cloud. We consider the two conflicting objectives, i.e., makespan and total cost, along with the energy consumed by the created schedule, under deadline and budget constraints.
To the best of our knowledge, none of the previous multi-objective workflow scheduling approaches has considered energy consumption along with these two conflicting objectives, i.e., all three objectives at the same time under deadline and budget constraints. The simulation analysis substantiates that the solutions obtained with the proposed heuristic deliver better convergence and more uniform spacing among the solutions as compared to the others. The remainder of the paper is organized as follows: Section 2 presents the related work on multi-objective optimization for workflow scheduling. The problem description is presented in Section 3. In Section 4, we briefly introduce the approach of multi-objective optimization. Section 5 explains the proposed hybrid multi-objective PSO. Section 6 discusses the simulation strategy and result analysis. Finally, Section 7 concludes the paper.

2. Related work

Scheduling of workflows is an NP-complete problem [15]. Many heuristic algorithms such as Minimum Completion Time, Sufferage, Min-min, and Max-min are used as candidates for best-effort scheduling strategies [16]. List scheduling has been a very popular method for workflow task scheduling. In list scheduling, a priority is assigned to the workflow tasks and a task with higher priority is scheduled before a task with lower priority. There are different list-based heuristic algorithms in the literature, like Dynamic Critical Path (DCP) [17], Dynamic Level Scheduling (DLS) [18], Critical Path on Processor (CPOP) [19], Heterogeneous Earliest Finish Time (HEFT) [19], etc. All of these heuristics minimize the makespan without considering the monetary cost of executing the workflow tasks, so these methods are mainly suitable for the grid environment. There are also many scheduling heuristics in the literature that are derived from list scheduling algorithms for workflow scheduling. Sakellariou et al. [20] proposed two scheduling heuristics, LOSS and GAIN (based upon HEFT), for grid workflows that try to optimize either time or cost to meet the user's specified budget, so only one of the objectives, i.e., either time or cost, is optimized at a time. Cost- and deadline-constrained workflow scheduling in IaaS clouds was discussed in [21]; in this case, the resource model considered in the proposed algorithms consists of homogeneous resources. Abrishami et al. [22] proposed two workflow scheduling algorithms for the cloud environment: a one-phase algorithm, IC-PCP, and a two-phase algorithm, IC-PCPD2. Both algorithms have polynomial time complexity for scheduling large workflows and minimize the cost of workflow execution under a deadline constraint, and the authors considered different types of pricing models for simulation. Similarly, Bossche et al. [23] proposed a set of algorithms to schedule deadline-constrained bag-of-tasks applications on hybrid clouds so as to minimize the execution cost, i.e., only one of the objectives is minimized; this work is also only suitable for applications consisting of a number of independent tasks. In our previous work, we had proposed deadline and budget constrained heuristic-based Genetic Algorithms to schedule workflow tasks over the cloud resources that minimize either computation cost or makespan at a time [24,25,26].

The major issue with all these heuristic and meta-heuristic techniques is that the majority of them are mono-objective optimization techniques. Few researchers have considered bi-objective criteria (mainly time and cost) to schedule workflow tasks in a distributed environment. Zhou et al. [27] presented Multi-Objective Evolutionary Algorithms (MOEAs) to solve multi-objective scheduling optimization problems in the grid; the MOEA approach produces a Pareto optimal set of solutions. A multi-objective scheduling algorithm using the R-NSGA-II approach [28] has been proposed to generate Pareto optimal solutions. This scheme considered three conflicting objectives, namely execution time, total cost, and reliability, and produced solutions near the Pareto optimal front within a small time. A Multi-Objective List Scheduling (MOLS) algorithm [29] was discussed to find a dominant solution by using the Pareto relation for heterogeneous environments. There are also bi-criteria scheduling heuristics derived from list scheduling algorithms for workflow scheduling. To achieve a trade-off between execution time and reliability, the Bi-objective Dynamic Level Scheduling (BDLS) algorithm [30] was proposed, which produces task assignments in which the execution time is weighted against the failure reliability. Su et al. [31] considered time and cost and used the concept of Pareto dominance to execute large programs in the cloud so as to reduce the monetary cost, without considering the user's budget and deadline constraints. An ε-Fuzzy Dominance sort based discrete Particle Swarm Optimization (ε-FDPSO) approach has been proposed to solve the multi-objective workflow scheduling problem in the grid [32]. The authors used a fuzzy-based mechanism to generate the Pareto optimal solutions and optimized the makespan, cost, and reliability objectives simultaneously in order to incorporate the dynamic characteristics of grid resources. Similarly, a multi-objective HEFT [33] has been proposed to schedule workflows over Amazon EC2. The authors considered minimizing cost and time as two conflicting objectives and created Pareto optimal schedules from which the user is able to select the best solution manually. These authors further extended their work [34] to generate a set of trade-off optimal solutions in terms of makespan and energy efficiency. Durillo et al. [35] extended the HEFT workflow scheduling heuristic to deal with multiple conflicting objectives, i.e., time and cost, approximating the Pareto frontier of cost trade-off optimal schedules for a federated cloud environment. Zheng and Sakellariou [36] presented the budget and deadline constrained heuristic BHEFT, an extension of the HEFT algorithm that gives a Budget and Deadline Constrained (BDC) plan to check whether a workflow request should be accepted or not while considering the confirmed resource reservations of other users. During creation of a BDC plan, the authors considered the spare budget for each task of the workflow while selecting the resource. However, this heuristic is only applicable to heterogeneous computing systems like utility grids, where the number of resources is fixed and, if a user has reserved a resource, no other user is able to execute its tasks on that resource for that time.
To overcome this problem, in our previous work we introduced a novel heuristic, Budget and Deadline constrained Heterogeneous Earliest Finish Time (BDHEFT) [14], that generates a BDC schedule plan by considering the spare deadline along with the spare budget while selecting a suitable resource for each workflow task in the cloud environment. The BDHEFT algorithm was shown to give a BDC schedule plan with significantly reduced execution cost as compared to the BDC plan created by the state-of-the-art scheduling heuristic under the same deadline and budget constraints. From the review of the literature, it has been found that the majority of these multi-objective heuristics are only suitable for the utility grid model. Very little work has been done to solve the multi-objective workflow scheduling problem in a cloud environment. Moreover, most of the existing studies that try to solve the multi-objective workflow scheduling problem consider minimizing makespan and minimizing execution cost as the two main conflicting objectives, without paying much attention to the energy consumed by the created schedule plan. This paper presents the use of a multi-objective optimization approach to generate Pareto optimal solutions for cloud workflow applications. We propose the multi-objective Hybrid Particle Swarm Optimization (HPSO) algorithm, based upon a non-dominance sorting procedure, to solve the cloud workflow scheduling problem.

3. System model and assumptions

This section presents the model used and the assumptions made for the simulation study.

3.1. Application model and cloud model

A workflow application is modelled by a Directed Acyclic Graph (DAG), defined by a tuple G(T, E), where T is the set of n tasks {t1, t2, ..., tn}, and E is a set of e edges representing the dependencies. Each ti ∈ T represents a task in the application, and each edge (ti, tj) ∈ E represents a precedence constraint such that the execution of tj ∈ T cannot start before ti ∈ T finishes its execution [25]. If (ti, tj) ∈ E, then ti is a parent of tj, and tj is a child of ti. A task with no parent is known as an entry task and a task with no children is known as an exit task. The task size (Zi) is expressed in Millions of Instructions (MI).
Our cloud model consists of a service provider which offers m computational resources, R = {r1, r2, ..., rm}, with different processing powers and different prices. It is assumed that any resource from the set R is able to execute all the tasks of a workflow. The processing power of a resource rp ∈ R is expressed in Millions of Instructions per Second (MIPS) and is denoted by PPrp. The pricing model is based on a pay-as-you-go basis similar to current commercial clouds, i.e., the users are charged based upon the number of time intervals for which they have used the resources, even if they have not completely used the last time interval.
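To make the model concrete, the following minimal Python sketch (illustrative only; the Task and Resource classes, the billing-interval length, and the sample values are assumptions and not taken from the paper) represents a workflow DAG with task sizes in MI and a pool of resources with MIPS ratings and per-interval prices, including the charge-per-started-interval policy described above.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    size_mi: float                       # task size Z_i in Millions of Instructions
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)

@dataclass
class Resource:
    name: str
    mips: float                          # processing power PP_rp in MIPS
    price_per_interval: float            # price unit mu_p per charging interval
    interval: float = 3600.0             # assumed length of one billing interval (s)

    def charge(self, busy_time: float) -> float:
        # Pay-as-you-go: every started interval is charged in full.
        return math.ceil(busy_time / self.interval) * self.price_per_interval

# A toy four-task workflow t1 -> {t2, t3} -> t4 (hypothetical sizes).
t1, t2, t3, t4 = Task("t1", 8000), Task("t2", 12000), Task("t3", 6000), Task("t4", 9000)
for parent, child in [(t1, t2), (t1, t3), (t2, t4), (t3, t4)]:
    parent.children.append(child)
    child.parents.append(parent)

# Three resources; the fastest is roughly ten times faster and ten times more expensive.
resources = [Resource("r1", 1000, 0.10), Resource("r2", 5000, 0.55), Resource("r3", 10000, 1.00)]
print(resources[0].charge(5000.0))       # two started intervals are billed
```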

Each task can be executed on different resources. The execution time, ET(i,p) , of a task ti on a resource, rp is calculated by the following equation:

ET(i,p) = Zi / PPrp    (1)

and the execution cost EC(i,p) is given by:

EC(i,p) = μp · ET(i,p)    (2)

where μp is the price unit of using resource rp for each time interval. Moreover, all the computation resources of a service provider are assumed to be in the same physical region, so data storage and data transmission costs are assumed to be zero, and the average bandwidth between these resources is assumed to be roughly equal. Only the time to transmit data between two dependent tasks (ct) that are mapped to different resources is considered during the experiments. Let EST(ti, rp) and EFT(ti, rp) denote the Earliest Start Time and the Earliest Finish Time of a task ti on a resource rp, respectively. For the entry task, we have:

EST(tentry, rp) = avail(rp)    (3)

For the other tasks in the DAG, we compute EST and EFT recursively as follows:



EST(ti, rp) = max{ avail(rp), max_{tj ∈ pred(ti)} ( AFT(tj) + ct ) }    (4)

EFT(ti, rp) = ET(i,p) + EST(ti, rp)    (5)

where pred(ti) is the set of parent tasks of task ti, and avail(rp) is the time when the resource rp is ready for task execution. AST(ti, rp) and AFT(ti, rp) denote the Actual Start Time and Actual Finish Time of task ti on resource rp, respectively; these may differ from the task's earliest start time EST(ti, rp) and earliest finish time EFT(ti, rp). The makespan is equal to the maximum actual finish time over the exit tasks texit and is defined by

M = max{ AFT(texit) }    (6)
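As an illustration of Eqs. (1)–(6), the short Python sketch below walks a toy DAG in topological order and computes execution times, start and finish times, the makespan, and the total execution cost for one fixed task-to-resource mapping. All names and numbers are hypothetical, and the traversal is a simplified stand-in for the scheduling procedures discussed later, not the paper's algorithm.

```python
# Illustration of Eqs. (1)-(6) for a toy DAG; all names and values are hypothetical.
size_mi = {"t1": 8000, "t2": 12000, "t3": 6000, "t4": 9000}          # Z_i in MI
parents = {"t1": [], "t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}
mips    = {"r1": 1000, "r2": 5000}                                    # PP_rp in MIPS
price   = {"r1": 0.10, "r2": 0.55}                                    # mu_p per interval
ct      = 30.0                                                        # transfer time between dependent tasks
mapping = {"t1": "r2", "t2": "r1", "t3": "r2", "t4": "r2"}            # an assumed schedule

def exec_time(t, r):                     # Eq. (1): ET(i,p) = Z_i / PP_rp
    return size_mi[t] / mips[r]

def exec_cost(t, r):                     # Eq. (2): EC(i,p) = mu_p * ET(i,p)
    return price[r] * exec_time(t, r)

avail = {r: 0.0 for r in mips}           # time at which each resource becomes free
aft = {}                                 # actual finish times, AFT
for t in ["t1", "t2", "t3", "t4"]:       # topological order of the toy DAG
    r = mapping[t]
    ready = max((aft[p] + (ct if mapping[p] != r else 0.0) for p in parents[t]), default=0.0)
    est = max(avail[r], ready)           # Eq. (3) for the entry task, Eq. (4) otherwise
    aft[t] = est + exec_time(t, r)       # Eq. (5): EFT = ET + EST
    avail[r] = aft[t]

makespan = max(aft.values())             # Eq. (6): M = max AFT(t_exit)
total_cost = sum(exec_cost(t, mapping[t]) for t in mapping)
print(makespan, total_cost)
```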

The makespan is also referred to as the running time for the entire DAG. The energy model used in this study is derived from the capacitive power (Pc ) of Complementary Metal-Oxide Semiconductor (CMOS)-based logic circuits [37] which is given by:

Pc = A·C·V²·f    (7)

where A is the number of switches per clock cycle, C is the total capacitance load, V is the supply voltage, and f is the frequency. Eq. (7) clearly indicates that the supply voltage is the dominant factor; therefore, its reduction would be the most influential in lowering power consumption. The energy consumed by executing workflow tasks over the available resources is defined as [37]

E = Σ_{i=1}^{n} A·C·Vi²·f · ET(i,p) = Σ_{i=1}^{n} α·Vi²·ET(i,p)    (8)

where Vi is the supply voltage of the processor on which task ti is executed, and ET(i,p) is the execution time of task ti on the scheduled resource rp.

4. Workflow scheduling based on Particle Swarm Optimization

The first part of this section presents a brief overview of multi-objective combinatorial optimization and Particle Swarm Optimization algorithms.

4.1. Multi-objective optimization

A Multi-objective Optimization Problem (MOP) [38] with m decision variables and n objectives can be formally defined as:

Min y = f(x) = [ f1(x), ..., fn(x) ]

where x = (x1, ..., xm) ∈ X is an m-dimensional decision vector, X is the search space, y = (y1, ..., yn) ∈ Y is the objective vector, and Y is the objective space. In a general MOP, there is no single optimal solution with regard to all objectives. In such problems, the desired solution is the set of potential solutions which are optimal for one or more objectives; this set is known as the Pareto optimal set. Some of the Pareto concepts used in MOP are as follows:


(i) Pareto dominance. For two decision vectors x1 and x2 , dominance (denoted by ≺) is defined as follows:

x1 ≺ x2 ⟺ ∀i: fi(x1) ≤ fi(x2) ∧ ∃j: fj(x1) < fj(x2)

The decision vector x1 is said to dominate x2 if and only if x1 is as good as x2 for all the objectives and strictly better than x2 in at least one objective.

(ii) Pareto optimal set. The Pareto optimal set PS is the set of all Pareto optimal decision vectors.

PS = { x1 ∈ X | ¬∃ x2 ∈ X : x2 ≺ x1 }

where a decision vector x1 is said to be Pareto optimal when it is not dominated by any other decision vector x2 in the set.

(iii) Pareto optimal front. The Pareto optimal front PF is the image of the Pareto optimal set in the objective space.

PF = { f(x) = (f1(x), ..., fn(x)) | x ∈ PS }

4.2. Particle Swarm Optimization

Particle Swarm Optimization (PSO) is a stochastic optimization technique that operates on the principle of the social behavior of swarms of birds or schools of fish [39]. In this technique, a swarm of individuals, known as particles, flows through the search space. Each particle represents a candidate solution to the given problem. Each particle is associated with two parameters, namely, its current position xi and current velocity vi. The position of a particle is influenced by the best position visited by the particle itself, i.e., its own experience (pbest). Along with pbest, the second parameter that influences the position is the position of the best particle in its neighborhood, i.e., the experience of the neighboring particles (gbest). The performance of each particle is measured using a fitness function that varies depending on the optimization problem. During each PSO iteration k, particle i updates its velocity vector vik and position vector xik as described below [39]:

(a) Updating Velocity Vector

vi^(k+1) = ω·vi^k + c1·rand1·(pbesti − xi^k) + c2·rand2·(gbest − xi^k)    (9)

where ω is the inertia weight; c1 is the cognitive coefficient based on the particle's own experience; c2 is the social coefficient based on the swarm's experience; and rand1, rand2 are random variables with values in (0, 1). The inertia weight ω controls the momentum of the particle. Improvement in performance is obtained by decreasing the value of ω linearly from its maximum value ω1 to its minimum value ω2 [40]. At iteration k, its value ωk is obtained as:

ωk = (ω1 − ω2) · (max_k − k) / max_k + ω2    (10)

Similarly, if c1 decreases from its maximum value c1max to its minimum value c1min, then more divergence among the particles in the search space can be achieved, while if c2 increases from its minimum value c2min to its maximum value c2max, then the particles move much closer to the current gbest. The following equations are used to find the values of c1k and c2k at iteration k:

c1k = (c1min − c1max) · k / max_k + c1max    (11)

c2k = (c2max − c2min) · k / max_k + c2min    (12)

where max_k is the maximum number of iterations and k is the iteration number.

(b) Updating Position Vector

xi^(k+1) = xi^k + vi^k    (13)

where xi^k is the position of the particle at the kth iteration and vi^k is the velocity of the particle at the kth iteration.
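A minimal Python sketch of the particle update of Eqs. (9) and (13), with the time-varying coefficients of Eqs. (10)–(12), is given below; the function names and example values are hypothetical. Following common PSO practice, the freshly updated velocity is applied immediately to the position, which matches the equations when the velocity update is performed first.

```python
import random

def pso_coefficients(k, max_k, w1=0.9, w2=0.1,
                     c1_max=2.5, c1_min=0.5, c2_min=0.5, c2_max=2.5):
    """Time-varying coefficients of Eqs. (10)-(12)."""
    w  = (w1 - w2) * (max_k - k) / max_k + w2          # Eq. (10)
    c1 = (c1_min - c1_max) * k / max_k + c1_max        # Eq. (11): decreases over time
    c2 = (c2_max - c2_min) * k / max_k + c2_min        # Eq. (12): increases over time
    return w, c1, c2

def update_particle(x, v, pbest, gbest, k, max_k):
    """One velocity and position update for a real-valued particle (Eqs. (9), (13))."""
    w, c1, c2 = pso_coefficients(k, max_k)
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = random.random(), random.random()
        vi_next = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)   # Eq. (9)
        new_v.append(vi_next)
        new_x.append(xi + vi_next)                                     # position update
    return new_x, new_v

# Example: a 3-dimensional particle at iteration 10 of 100 (values are arbitrary).
x, v = [0.2, 1.5, 3.0], [0.0, 0.1, -0.2]
x, v = update_particle(x, v, pbest=[0.5, 1.0, 2.5], gbest=[0.4, 0.9, 2.0], k=10, max_k=100)
```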

(c) Fitness Function

The fitness function used in the proposed HPSO is described in Eq. (14):

Fitness = α · Time + (1 − α) · Cost    (14)

where Time is the total execution time of a generated workflow schedule and is given by Eq. (15); Cost is the total execution cost of a generated workflow schedule and is calculated using Eq. (16); and α is the cost-time balance factor in a range of [0,1] which represents the user preference for execution time and execution cost.

Time = max{ AFT(texit) }    (15)

Cost = Σ_{i=1}^{n} EC(i,j)    (16)

The next section describes the proposed algorithm, which is based upon the multi-objective PSO and BDHEFT algorithms.


5. Proposed work

In order to solve the multi-objective workflow scheduling problem, we propose the multi-objective Hybrid Particle Swarm Optimization (HPSO) algorithm, based upon a non-dominance sorting procedure, to solve the cloud workflow scheduling problem. The proposed algorithm is a hybrid of a Multi-Objective Particle Swarm Optimization (MOPSO) algorithm and the BDHEFT algorithm. The main operators used in this algorithm are explained below.

5.1. Updating external archive

In multi-objective algorithms, an elite archive [41] is used to store the non-dominated particles found along the search process. After the evaluation of the objective functions, each particle is checked for its dominance against the other members of the population. Then, the solutions from the current generation and the solutions from the archive of previous generations are combined together to make 2N solutions, where N is the size of the archive. After this, these 2N solutions are sorted in ascending order of their dominance. If multiple solutions have the same dominance value, then a perimeter I(·) [32] is assigned to each such solution and the solution with the higher value of I(·) is preferred. From these 2N solutions sorted on the basis of dominance and perimeter, the best N solutions are selected to update the archive.

5.2. Perimeter assignment

When multiple solutions have the same dominance value, we use the diversity perimeter I(y), whose value for any solution y is given by [32]:

I(y) = Σ_{i=1}^{M} ( fi(x) − fi(z) ) / ( max(fi) − min(fi) )    (17)

where x and z are the solutions adjacent to y after sorting the combined set in ascending order according to the ith objective. The solutions at the boundary are allocated an infinite value. A solution with a higher value of I(y) is preferred because it indicates a region of sparseness around the solution, which ultimately maintains the diversity of the solutions. All N solutions are sorted corresponding to the M objective functions; therefore, perimeter assignment has O(MN log N) complexity.

5.3. Updating particle memory (pbest and gbest)

The gbest solution is selected, using binary tournament selection, from the solutions of the current archive, which is sorted on the basis of non-dominance and perimeter. For pbest, we compare the particle's current position with the best position of the particle from the previous generation. The non-dominating solution is assigned as the current pbest. If the solutions are mutually non-dominating, then the current position of the particle is selected as the current pbest.

5.4. Adaptive mutation

A mutation operator is needed in multi-objective PSO [32] to avoid getting stuck in local minima and to efficiently explore the search space. We have applied replacement mutation in the HPSO algorithm. The mutation probability P(Mutation) in the HPSO algorithm is calculated using the following equation:

P(Mutation) = 1 − k / max_k    (18)

where k is the current iteration and max_k is the maximum number of iterations taken. For every particle, a random number (rand) in the range (0, 1) is taken. If rand < P(Mutation), then a task is randomly selected from the particle for mutation. The HPSO algorithm is executed for the following test suites:

Bi-objective workflow scheduling problem: For the bi-objective cloud workflow scheduling problem, we consider two conflicting objectives, i.e., minimization of the execution time, Time, and minimization of the total cost, Cost, of the created schedule. Therefore, the cloud workflow scheduling problem is formulated as:

Min (Time, Cost)   subject to   Time < D and Cost < B    (19)

where B is the cost constraint (Budget) and D is the time constraint (Deadline) required by the users for workflow execution.

Tri-objective workflow scheduling problem: For the tri-objective workflow scheduling problem, along with the two conflicting objectives, i.e., minimization of the execution time, Time, and minimization of the total cost, Cost, we consider the energy consumed by the created schedule, E, which should be as low as possible. Therefore, the workflow scheduling problem can be formulated as the mathematical optimization problem:

Min (Time, Cost, E)   subject to   Time < D and Cost < B    (20)


Fig. 5.1. HPSO algorithm.

The main steps followed in the HPSO algorithm are described in Fig. 5.1. The fitness function used for the bi-objective workflow scheduling problem as well as for the tri-objective workflow scheduling problem is the same, i.e., Eq. (14) presented in Section 4.2. During each HPSO iteration, for the bi-objective problem, the created schedule is considered feasible iff Cost < Budget B and Time < Deadline D; otherwise, the schedule is considered infeasible. Then non-dominance among all created feasible solutions is compared based upon time and cost, and the archive is updated accordingly. Similarly, for the tri-objective problem, during each HPSO iteration, the created schedule is considered feasible iff Cost < Budget B and Time < Deadline D; otherwise, it is considered infeasible. Afterwards, the energy consumed by the feasible schedule is calculated according to Eq. (8) presented in Section 3.1. Non-dominance among all created feasible solutions is then compared based upon time, cost, and energy consumed, and the archive is updated accordingly.
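The following Python sketch illustrates the feasibility filtering and Pareto-dominance comparison described above for the bi-objective case (a tri-objective run would simply add "energy" to the objective tuple). The particle encoding, helper names, and numbers are hypothetical, and the archive truncation is simplified: ties are not broken here by the perimeter I(y) of Eq. (17).

```python
def is_feasible(sol, deadline, budget):
    """A schedule is feasible iff Time < D and Cost < B (Eqs. (19)/(20))."""
    return sol["time"] < deadline and sol["cost"] < budget

def dominates(a, b, objectives):
    """Pareto dominance: a is no worse in every objective and strictly better in one."""
    no_worse = all(a[o] <= b[o] for o in objectives)
    strictly = any(a[o] < b[o] for o in objectives)
    return no_worse and strictly

def update_archive(archive, population, objectives, size):
    """Keep non-dominated solutions from the combined set (simplified truncation)."""
    combined = archive + population
    non_dominated = [s for s in combined
                     if not any(dominates(o, s, objectives) for o in combined if o is not s)]
    return non_dominated[:size]

# Toy bi-objective example (time, cost); values are illustrative only.
D, B = 500.0, 40.0
population = [{"time": 420, "cost": 30}, {"time": 380, "cost": 35},
              {"time": 600, "cost": 20}, {"time": 450, "cost": 28}]
feasible = [s for s in population if is_feasible(s, D, B)]
archive = update_archive([], feasible, objectives=("time", "cost"), size=20)
```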

6. Performance evaluation

In this section, the simulation of the proposed heuristic, HPSO, is presented. To evaluate the proposed workflow scheduling algorithm, we used five synthetic workflows based on realistic workflows from diverse scientific applications, which are:

• Montage: Astronomy
• EpiGenomics: Biology
• CyberShake: Earthquake science
• LIGO: Gravitational physics
• SIPHT: Biology

The detailed characterization for each workflow including their structure, data and computational requirements can be found in [42]. Fig. 6.1 shows the approximate structure of each workflow.


Fig. 6.1. Structure of various workflows [42].

Table 6.1
Voltage–relative speed pairs.

Level | Pair 1: Voltage (vi) | Pair 1: Relative Speed (%) | Pair 2: Voltage (vi) | Pair 2: Relative Speed (%) | Pair 3: Voltage (vi) | Pair 3: Relative Speed (%)
0     | 1.5                  | 100                        | 2.2                  | 100                        | 1.75                 | 100
1     | 1.4                  | 90                         | 1.9                  | 85                         | 1.4                  | 80
2     | 1.3                  | 80                         | 1.6                  | 65                         | 1.2                  | 60
3     | 1.2                  | 70                         | 1.3                  | 50                         | 0.9                  | 40
4     | 1.1                  | 60                         | 1.0                  | 35                         | –                    | –
5     | 1.0                  | 50                         | –                    | –                          | –                    | –
6     | 0.9                  | 40                         | –                    | –                          | –                    | –

6.1. Experimental setup

For simulation, we assume a cloud environment consisting of a service provider which offers 20 different computation resources with different processing speeds and hence with different prices. For this study, we have used the CloudSim [43] library. The existing CloudSim simulator allows scheduling of independent tasks in a cloud environment; it is not suitable for workflow scheduling, as a workflow consists of multiple dependent tasks. So, the core framework of the CloudSim simulator was extended to handle workflow scheduling. One of the crucial changes is to read the DAX files generated by Pegasus [44] for the workflow structures and to extract the required parameters like run time, input file size, output file size and task dependencies. During simulation, we have assumed that this run time is achieved by executing the task over the fastest resource available on the cloud. The processor speeds of the different resources are selected randomly in the range of 1000–10,000 MIPS, such that the fastest resource is roughly ten times faster than the slowest one as well as ten times more expensive. Each resource is Dynamic Voltage Scaling (DVS) enabled; in other words, it can operate with different Voltage Scaling Levels (VSLs), i.e., at different clock frequencies. For each resource, a set Vj of v VSLs is randomly and uniformly distributed among three different sets of VSLs (Table 6.1), with the assumption that when resources are busy they operate at the maximum voltage scaling level, and during the idle state their voltage level drops to the minimum scale. For calculating the energy of the created schedule, we used Eq. (8) as described in Section 3.1. The average bandwidth between these resources is set equal to 20 Mbps. We use a pricing model similar to Amazon's. The values for the deadline D and budget B are generated as:

Deadline D = LBD + k1 × (UBD − LBD), where LBD = MHEFT (the makespan of HEFT), UBD = 3 × MHEFT [36], and k1 is a deadline ratio in the range 0 to 1.

Budget B = LCB + k2 × (UCB − LCB), where LCB is the lowest cost obtained by mapping each task to the cheapest service, UCB is the highest cost obtained conversely (mapping each task to the most expensive service), and k2 is a budget ratio in the range 0 to 1.
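For illustration, the constraint generation just described can be written as a small Python helper (hypothetical names; MHEFT, LCB, and UCB would come from a HEFT run and from the cheapest and most expensive mappings, respectively).

```python
def deadline_budget(m_heft, lcb, ucb, k1, k2):
    """Generate the deadline D and budget B constraints as described above.

    m_heft : makespan obtained by HEFT (LBD); UBD is taken as 3 * m_heft [36].
    lcb    : lowest cost (every task on the cheapest service).
    ucb    : highest cost (every task on the most expensive service).
    k1, k2 : deadline and budget ratios in [0, 1].
    """
    lbd, ubd = m_heft, 3.0 * m_heft
    deadline = lbd + k1 * (ubd - lbd)
    budget = lcb + k2 * (ucb - lcb)
    return deadline, budget

# Example with arbitrary values: HEFT makespan 400 s, costs between $12 and $90.
D, B = deadline_budget(m_heft=400.0, lcb=12.0, ucb=90.0, k1=0.5, k2=0.5)
```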


6.2. Performance metrics

The analysis of the proposed algorithm has been done against existing state-of-the-art algorithms using the following performance metrics for different values of deadline and budget.

(a) Generational Distance (GD): GD [45] is a well-known convergence metric to evaluate the quality of an algorithm against the true front P∗. The true front P∗ has been obtained by merging the solutions of the algorithms over 20 runs. Mathematically, GD is given by Eq. (21):

GD = ( Σ_{i=1}^{|Q|} di² )^(1/2) / |Q|    (21)

where di is the Euclidean distance between the ith solution of Q and the nearest solution of P∗, and Q is the front obtained from the algorithm for which the GD metric is calculated.

(b) Spacing: The Spacing metric [45] is used to evaluate the diversity among the solutions. It is given by Eq. (22):

Spacing = √( (1/|Q|) Σ_{i=1}^{|Q|} ( di − d̄ )² )    (22)

where di is the distance between a solution and its nearest solution in Q (this distance is different from the Euclidean distance used in GD) and d̄ is the mean value of the distance measures di. Small values of both the GD and Spacing metrics are desirable for an evolutionary algorithm.
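A compact Python sketch of both metrics is shown below (illustrative only; function names and the toy fronts are hypothetical). GD uses the Euclidean distance to the nearest point of the true front, per Eq. (21); for Spacing, di is taken here as the absolute-difference (Manhattan) distance to the nearest neighbour within the obtained front, which is one common reading of the remark that it differs from the Euclidean di used in GD.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def generational_distance(front, true_front):
    """Eq. (21): square root of the sum of squared nearest distances, divided by |Q|."""
    d = [min(euclid(q, p) for p in true_front) for q in front]
    return math.sqrt(sum(x * x for x in d)) / len(front)

def spacing(front):
    """Eq. (22): spread of nearest-neighbour distances within the obtained front."""
    d = [min(sum(abs(x - y) for x, y in zip(a, b)) for b in front if b is not a)
         for a in front]
    d_bar = sum(d) / len(d)
    return math.sqrt(sum((x - d_bar) ** 2 for x in d) / len(d))

# Toy bi-objective fronts (time, cost); values are illustrative only.
obtained = [(400, 30), (450, 26), (520, 22)]
true_pf  = [(395, 29), (445, 25), (515, 21), (600, 18)]
print(generational_distance(obtained, true_pf), spacing(obtained))
```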

6.3. Simulation results

This section presents the simulation results and analysis of our proposed multi-objective HPSO algorithm. At present, the most popular techniques to solve MOPs are the Non-dominated Sorting Genetic Algorithm (NSGA-II) [45], MOPSO [41], and ε-FDPSO [32]. To measure the effectiveness of the proposed HPSO algorithm, all these algorithms have been designed and simulated for the multi-objective workflow scheduling problem in a cloud environment. For implementing NSGA-II, we have used binary tournament selection, one-point crossover, and replacement mutation. The parameters used in the ε-FDPSO, MOPSO, and HPSO algorithms are: population size = 20, c1 = 2.5 → 0.5, c2 = 0.5 → 2.5, and inertia weight ω = 0.9 → 0.1; for NSGA-II, the population size is 20, the crossover rate is 0.8, and the mutation rate is 0.5. The performance of the scheduling algorithms is evaluated considering different workflow applications, e.g., Montage, CyberShake, EpiGenomics, SIPHT, and LIGO, for the bi-objective and tri-objective test suites. For the bi-objective workflow scheduling problems, we considered the two conflicting objectives, i.e., the makespan and cost of the schedules; for the tri-objective workflow scheduling problems, along with these conflicting objectives, the energy consumed by the created schedule is also considered. To obtain the Pareto optimal solutions with the ε-FDPSO, MOPSO, NSGA-II, and HPSO algorithms, 10 samples have been captured through simulation and each algorithm is iterated 100 times. Figs. 6.2 to 6.6 show the bi-objective and tri-objective non-dominated solutions for the Montage, CyberShake, EpiGenomics, LIGO, and SIPHT workflows, respectively. The x-axis represents the execution time of the created schedule for the respective workflow structure, the y-axis represents the execution cost, and the z-axis represents the energy consumed. The results are analyzed using the two metrics, i.e., GD and Spacing. Tables 6.2 to 6.6 present the comparison among the four algorithms on the basis of the GD and Spacing metrics for the Montage, CyberShake, EpiGenomics, LIGO, and SIPHT workflows, respectively. The results are obtained by taking the average and standard deviation of 20 simulations, as described in the following sub-sections.

6.3.1. Montage workflow

Fig. 6.2(a) and Fig. 6.2(b) represent the bi-objective and tri-objective non-dominated Pareto optimal solutions for the Montage workflow, respectively. Most of the solutions obtained using the HPSO algorithm lie close to the true front while preserving uniform spacing among the solutions. Table 6.2 shows the results obtained using the Spread and GD metrics for the Montage workflow considering the bi-objective and tri-objective test suites. From Table 6.2, it has been observed that the value of the convergence metric GD corresponding to the HPSO algorithm is lower than that of the other algorithms for the bi-objective as well as the tri-objective test suites. This indicates that the performance of the HPSO algorithm is better, reaching a solution set that is 63%, 84% and 90% closer to the true Pareto front in comparison to the solution sets created by the MOPSO, FDPSO and NSGA-II algorithms, respectively. Similarly, the values of the spacing metric using the HPSO algorithm are 15%, 26% and 35% lower than those obtained using the MOPSO, FDPSO and NSGA-II algorithms, respectively, on average for both the bi-objective and tri-objective test suites. This is due to the use of the trade-off schedule plan between cost and makespan created by the BDHEFT algorithm in the creation of the non-dominated solution set. Hence, the HPSO algorithm provides uniform spacing as well as better convergence among the solution set as compared to the other three algorithms for the Montage workflow structure.


Fig. 6.2. Bi-objective and tri-objective non-dominated solutions for Montage workflow.

6.3.2. CyberShake workflow

The non-dominated Pareto optimal solutions for the bi-objective and tri-objective test suites for the CyberShake workflow are shown in Fig. 6.3(a) and Fig. 6.3(b), respectively. It has been observed that the solution set obtained using the HPSO algorithm lies close to the true front while preserving uniform spacing among the solutions. Table 6.3 shows the simulation results obtained using the Spread and GD metrics for the CyberShake workflow considering the bi-objective and tri-objective test suites. From Table 6.3, it has been observed that the value of the convergence metric GD corresponding to the HPSO algorithm is lower than that of the other algorithms for the bi-objective as well as the tri-objective problems. This indicates that the performance of the HPSO algorithm is better, reaching a solution set that is 65%, 83% and 86% closer to the true Pareto front


Table 6.2 Comparative results of bi-objectives and tri-objectives Spread and GD for Montage. Spread (bi-objectives) Bf

0.3 0.5 0.7

HPSO

MOPSO

FDPSO

NSGA-II

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

0.473 0.577 0.633

0.138 0.107 0.021

0.574 0.678 0.721

0.146 0.108 0.156

0.603 0.818 0.863

0.154 0.112 0.125

0.773 0.893 0.901

0.161 0.114 0.211

Std. Dev. 0.046 0.156 0.0765

Average 0.584 0.655 0.842

Std. Dev. 0.064 0.168 0.147

Average 0.783 0.795 0.883

Std. Dev. 0.002 0.021 0.057

Average 0.101 0.136 0.151

Std. Dev. 0.022 0.027 0.028

Average 0.137 0.146 0.167

Spread (tri-objectives) Bf

HPSO

0.3 0.5 0.7

Average 0.368 0.407 0.506

MOPSO Std. Dev. 0.002 0.091 0.053

Average 0.507 0.552 0.627

FDPSO

NSGA-II Std. Dev. 0.206 0.123 0.151

Generational Distance (bi-objectives) Bf

HPSO

0.3 0.5 0.7

Average 0.002 0.005 0.031

MOPSO Std. Dev. 0.004 0.003 0.034

Average 0.022 0.029 0.036

FDPSO Std. Dev. 0.035 0.041 0.091

Average 0.054 0.065 0.086

NSGA-II Std. Dev. 0.039 0.009 0.011

Generational Distance (tri-objectives) Bf

HPSO

0.3 0.5 0.7

Average 0.005 0.014 0.036

MOPSO Std. Dev. 0.001 0.005 0.032

Average 0.038 0.04 0.063

FDPSO Std. Dev. 0.041 0.053 0.119

Average 0.091 0.091 0.141

NSGA-II Std. Dev. 0.032 0.013 0.019

in comparison to the solution sets created by the MOPSO, FDPSO and NSGA-II algorithms, respectively. Similarly, the values of the spacing metric using the HPSO algorithm are 28%, 43% and 47% lower than those obtained using the MOPSO, FDPSO and NSGA-II algorithms, respectively, on average for both the bi-objective and tri-objective test suites, because of embedding the trade-off solution of the BDHEFT algorithm while generating the non-dominated solution set. Therefore, the HPSO algorithm provides uniform spacing as well as better convergence among the solution set as compared to the other three algorithms for the CyberShake workflow structure.

6.3.3. EpiGenomics workflow

For the EpiGenomics workflow, the non-dominated Pareto optimal solutions for the bi-objective and tri-objective test suites are shown in Fig. 6.4(a) and Fig. 6.4(b), respectively. Fig. 6.4 shows that most of the solutions obtained using the HPSO algorithm are close to the true front and exhibit uniform spacing among the solutions. Table 6.4 shows the simulation results obtained using the Spread and GD metrics for the EpiGenomics workflow considering the bi-objective and tri-objective test suites. The value of the convergence metric GD corresponding to the HPSO algorithm is lower than that of the other algorithms for the bi-objective as well as the tri-objective problems, as shown in Table 6.4. Also, from Fig. 6.4, the performance of the HPSO algorithm is better, reaching a solution set that is 25%, 55% and 72% closer to the true Pareto front in comparison to the solution sets created by the MOPSO, FDPSO and NSGA-II algorithms, respectively. Similarly, the values of the spacing metric using the HPSO algorithm are 20%, 36% and 47% lower than those obtained using the MOPSO, FDPSO and NSGA-II algorithms, respectively, on average for both the bi-objective and tri-objective test suites. Since we have considered the schedule plan created by the BDHEFT algorithm while creating the solution set for the HPSO algorithm, the HPSO algorithm provides uniform spacing as well as better convergence among the solution set as compared to the other three algorithms for the EpiGenomics workflow structure.

6.3.4. LIGO workflow

Fig. 6.5(a) and Fig. 6.5(b) show the non-dominated Pareto optimal solutions for the bi-objective and tri-objective test suites for the LIGO workflow, respectively. It is clear from Fig. 6.5 that the solutions obtained using the HPSO algorithm lie close to the true front while preserving uniform spacing among the solutions. Table 6.5 shows the simulation results obtained using the Spread and GD metrics for the LIGO workflow considering the bi-objective and tri-objective test suites. The value of the convergence metric GD corresponding to the HPSO algorithm is lower than that of the other algorithms for the bi-objective as well as the tri-objective problems, as shown in Table 6.5. Also, from Fig. 6.5, the performance of the HPSO algorithm is better, reaching a solution set that is 25%, 55% and 72% closer to the true Pareto front in comparison to the solution sets created by the MOPSO, FDPSO and NSGA-II algorithms, respectively. Similarly, the values of the spacing metric using the HPSO algorithm are 20%, 36% and 47% lower than those obtained using


Table 6.3 Comparative results of bi-objectives and tri-objectives Spread and GD for CyberShake. Spread (bi-objectives) Bf

HPSO

MOPSO

FDPSO

NSGA-II

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

0.3 0.5 0.7

0.435 0.473 0.581

0.046 0.232 0.052

0.526 0.565 0.609

0.159 0.148 0.111

0.537 0.691 0.627

0.134 0.146 0.134

0.676 0.794 0.974

0.168 0.076 0.148

Std. Dev. 0.058 0.022 0.034

Average 0.532 0.557 0.581

Std. Dev. 0.019 0.023 0.041

Average 0.577 0.592 0.622

Std. Dev. 0.102 0.136 0.215

Average 0.0396 0.0756 0.0762

Spread (tri-objectives) Bf

HPSO

0.3 0.5 0.7

Average 0.373 0.455 0.497

MOPSO Std. Dev. 0.029 0.022 0.033

Average 0.516 0.541 0.563

FDPSO

NSGA-II Std. Dev. 0.051 0.029 0.025

Generational Distance (bi-objectives) Bf

HPSO

0.3 0.5 0.7

Average 0.0094 0.0106 0.0116

Bf

HPSO

0.3 0.5 0.7

Average 0.0105 0.0209 0.0221

MOPSO Std. Dev. 0.057 0.194 0.046

Average 0.0228 0.035 0.0365

FDPSO Std. Dev. 0.081 0.116 0.075

Average 0.0595 0.0715 0.0915

NSGA-II Std. Dev. 0.079 0.096 0.076

Generational Distance (tri-objectives) MOPSO Std. Dev. 0.027 0.035 0.041

Average 0.0228 0.035 0.0365

FDPSO Std. Dev. 0.105 0.015 0.035

Average 0.0595 0.0715 0.0915

NSGA-II Std. Dev. 0.064 0.031 0.028

Average 0.0396 0.0756 0.0762

Std. Dev. 0.065 0.032 0.013

Table 6.4 Comparative results of bi-objectives and tri-objectives Spread and GD for EpiGenomics. Spread (bi-objectives) Bf

HPSO

MOPSO

FDPSO

NSGA-II

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

0.3 0.5 0.7

0.3612 0.4172 0.5386

0.196 0.04 0.122

0.4767 0.5043 0.6727

0.058 0.104 0.036

0.5426 0.643 0.899

0.061 0.058 0.104

0.5715 0.9364 0.9726

0.036 0.149 0.126

Bf

HPSO

0.3 0.5 0.7

Average 0.3612 0.4172 0.5386

Std. Dev. 0.03 0.041 0.125

Average 0.5426 0.643 0.899

Std. Dev. 0.154 0.104 0.105

Average 0.5715 0.9364 0.9726

Bf

HPSO

0.3 0.5 0.7

Average 0.0097 0.0104 0.0172

Std. Dev. 0.007 0.061 0.005

Average 0.0709 0.0763 0.089

Std. Dev. 0.007 0.034 0.015

Average 0.0709 0.0763 0.089

Spread (tri-objectives) MOPSO Std. Dev. 0.072 0.067 0.037

Average 0.4767 0.5043 0.6727

FDPSO

NSGA-II Std. Dev. 0.171 0.064 0.117

Generational Distance (bi-objectives) MOPSO Std. Dev. 0.012 0.014 0.01

Average 0.0192 0.0198 0.0228

FDPSO Std. Dev. 0.009 0.021 0.031

Average 0.037 0.0254 0.0261

NSGA-II Std. Dev. 0.014 0.027 0.003

Generational Distance (tri-objectives) Bf

HPSO

0.3 0.5 0.7

Average 0.0097 0.0104 0.0172

MOPSO Std. Dev. 0.001 0.006 0.002

Average 0.0192 0.0198 0.0228

FDPSO Std. Dev. 0.02 0.012 0.027

Average 0.037 0.0254 0.0261

NSGA-II Std. Dev. 0.086 0.045 0.066


Fig. 6.3. Bi-objective and tri-objective non-dominated solutions for CyberShake workflow.

the MOPSO, FDPSO and NSGA-II algorithms, respectively, on average for both the bi-objective and tri-objective test suites. As the resultant schedule plan of the BDHEFT algorithm is inserted into the initial population of the HPSO algorithm, it improves the convergence as well as the uniform spacing among the solution set of the HPSO algorithm as compared to the other three algorithms for the LIGO workflow structure.

6.3.5. SIPHT workflow

Fig. 6.6(a) and Fig. 6.6(b) show the non-dominated Pareto optimal solutions for the bi-objective and tri-objective test suites for the SIPHT workflow, respectively. Most of the solutions obtained using the HPSO algorithm lie close to the true front


Table 6.5 Comparative results of bi-objectives and tri-objectives Spread and GD for LIGO. Spread (Bi-objectives) Bf

HPSO

MOPSO

FDPSO

NSGA-II

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

0.3 0.5 0.7

0.285 0.3722 0.394

0.023 0.187 0.13

0.3993 0.6735 0.6922

0.179 0.084 0.134

0.6965 0.7316 0.8124

0.123 0.127 0.083

0.7041 0.852 0.9473

0.093 0.092 0.092

Bf

HPSO

0.3 0.5 0.7

Average 0.3773 0.3904 0.4917

Std. Dev. 0.2 0.266 0.135

Average 0.6006 0.6691 0.7725

Std. Dev. 0.077 0.123 0.1

Average 0.718 0.8754 0.9465

Bf

HPSO

0.3 0.5 0.7

Average 0.0048 0.0121 0.0184

Std. Dev. 0.005 0.021 0.055

Average 0.0358 0.038 0.0649

Bf

HPSO

0.3 0.5 0.7

Average 0.0257 0.035 0.0453

Std. Dev. 0.022 0.01 0.037

Average 0.103 0.1088 0.117

Spread (tri-objectives) MOPSO Std. Dev. 0.056 0.054 0.051

Average 0.4465 0.4847 0.5897

FDPSO

NSGA-II Std. Dev. 0.068 0.124 0.108

Generational Distance (bi-objectives) MOPSO Std. Dev. 0.02 0.012 0.03

Average 0.0156 0.0244 0.0237

FDPSO Std. Dev. 0.007 0.045 0.055

Average 0.0204 0.0278 0.0468

NSGA-II Std. Dev. 0.029 0.016 0.017

Generational Distance (tri-objectives) MOPSO Std. Dev. 0.017 0.013 0.011

Average 0.0312 0.0397 0.0503

FDPSO Std. Dev. 0.036 0.013 0.012

Average 0.0339 0.0794 0.0899

NSGA-II Std. Dev. 0.009 0.014 0.018

Table 6.6 Comparative results of bi-objectives and tri-objectives Spread and GD for SIPHT. Spread (bi-objectives) Bf

HPSO

MOPSO

FDPSO

NSGA-II

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

Average

Std. Dev.

0.3 0.5 0.7

0.4315 0.5266 0.5952

0.153 0.149 0.053

0.5973 0.7091 0.7774

0.195 0.135 0.137

0.6965 0.7316 0.8124

0.122 0.26 0.242

0.7409 0.8267 0.9832

0.219 0.057 0.13

Bf

HPSO

0.3 0.5 0.7

Average 0.2061 0.4868 0.5434

Std. Dev. 0.203 0.191 0.212

Average 0.6006 0.6691 0.7725

Std. Dev. 0.116 0.132 0.146

Average 0.7122 0.8157 0.9327

Bf

HPSO

0.3 0.5 0.7

Average 0.0047 0.008 0.022

Std. Dev. 0.003 0.017 0.011

Average 0.0423 0.0865 0.129

Std. Dev. 0.011 0.065 0.056

Average 0.1204 0.1433 0.1503

Spread (tri-objectives) MOPSO Std. Dev. 0.142 0.174 0.137

Average 0.5499 0.5726 0.6549

FDPSO

NSGA-II Std. Dev. 0.191 0.067 0.054

Generational Distance (bi-objectives) MOPSO Std. Dev. 0.006 0.068 0.013

Average 0.0109 0.0298 0.0486

FDPSO Std. Dev. 0.048 0.014

Average 0.0206 0.0538 0.0769

NSGA-II Std. Dev. 0.007 0.037 0.01

Generational Distance (tri-objectives) Bf

HPSO

0.3 0.5 0.7

Average 0.005 0.0151 0.0457

MOPSO Std. Dev. 0.007 0.021 0.008

Average 0.0227 0.0444 0.0691

FDPSO Std. Dev. 0.011 0.024 0.032

Average 0.0731 0.0965 0.1166

NSGA-II Std. Dev. 0.008 0.024 0.019


Fig. 6.4. Bi-objective and tri-objective non-dominated solutions for EpiGenomics workflow.

while preserving uniform spacing among the solutions. Table 6.6 shows the simulation results obtained using the Spread and GD metrics for the SIPHT workflow considering the bi-objective and tri-objective test suites. The value of the convergence metric GD corresponding to the HPSO algorithm is lower than that of the other algorithms for the bi-objective as well as the tri-objective problems, as shown in Table 6.6. Also, from Fig. 6.6, the performance of the HPSO algorithm is better, reaching a solution set that is 63%, 80% and 87% closer to the true Pareto front in comparison to the solution sets created by the MOPSO, FDPSO and NSGA-II algorithms, respectively. Similarly, the values of the spacing metric using the HPSO algorithm are 28%, 38% and 44% lower than those obtained using the MOPSO, FDPSO and NSGA-II algorithms, respectively, on average for both the bi-objective and tri-objective test suites. The HPSO algorithm provides uniform spacing as well as better convergence among the solution set as compared to the other three algorithms for the SIPHT workflow structure, due to the use of the trade-off schedule plan created by the BDHEFT algorithm.


Fig. 6.5. Bi-objective and tri-objective non-dominated solutions for LIGO Workflow.

From Figs. 6.2 to 6.6 and Tables 6.2 to 6.6, it is evident that the solutions obtained using the HPSO algorithm lie close to the true front. It is concluded that the HPSO algorithm provides better convergence and uniform spacing among the solutions for all workflow structures under consideration. Hence, it is applicable to solve a wide class of multi-objective optimization problems for scheduling scientific workflows in a cloud environment.


Fig. 6.6. Bi-objective and tri-objective non-dominated solutions for SIPHT Workflow.

7. Conclusion and future work

Over the years, many researchers have focused their attention on the cloud workflow scheduling problem with a single objective. However, the goal of a decision maker is multi-fold, and a set of Pareto optimal solutions is preferred when considering real-life applications. We proposed the multi-objective Hybrid Particle Swarm Optimization (HPSO) algorithm, based upon a non-dominance sorting procedure, to solve the cloud workflow scheduling problem. It is a combination of a multi-objective Particle Swarm Optimization algorithm and a list-based heuristic. Its performance is analyzed using the three conflicting objectives of makespan, total cost and energy consumption under deadline and budget constraints. The efficacy and applicability of the proposed approach are demonstrated through varying sized application task graphs and comparison with state-of-the-art meta-heuristics. The simulation experiments exhibit that HPSO performs better for workflow scheduling over IaaS clouds in terms of convergence towards the true Pareto optimal front and uniformly distributed solutions, with small computation overhead. Hence, it is applicable to solve a wide class of multi-objective optimization problems for scheduling


scientific workflows over IaaS clouds. In the future, the proposed work can be extended by considering other QoS constraints like reliability, trust management, and VM migration for further lowering the energy consumption. The concepts of neural networks, fuzzy logic, etc. need to be tested for possible enhancement of the proposed heuristics, and a further focus would be to reduce the multi-objective optimization problem into a single-objective optimization problem using a scalarization method.

References [1] I. Foster, Y. Zhao, I. Raicu, S. Lu, Cloud computing and grid computing 360-degree compared, in: Grid Computing Environments Workshop, 2008, pp. 1–10, doi:10.1109/GCE.2008.4738445. [2] M. C.J.R. Gabriel, G. Wolfgang, Hybrid computing—where hpc meets grid and cloud computing, J. Futur. Gener. Comput. Syst. 27 (2011) 440–453. [3] A. Verma, S. Kaushal, Cloud computing security issues and challenges: a survey, in: Advances in Computing and Communications SE - 46, 2011, pp. 445–454, doi:10.1007/978- 3- 642- 22726- 4_46. [4] S. Chaisiri, B.S. Lee, D. Niyato, Optimization of resource provisioning cost in cloud computing, IEEE Trans. Serv. Comput. 5 (2012) 164–177, doi:10.1109/ TSC.2011.7. [5] Amazon EC2, 2014. http://aws.amazon.com/ec2. [6] Go Grid, 2014. http://www.gogrid.com. [7] S.M., I. Taylor, E. Deelman, D. Gannon, Workflows for e-Science: Scientific Workflows for Grid. 1st, 1st ed., Springer, 2007. [8] S. Pandey, Scheduling and Management of Data Intensive Application Workflows in Grid and Cloud Computing Environment, University of Melbourne, Australia, 2010. [9] L. Y. Ke, J. Hai, C.X.L. Jinjun, Dong, A compromised time-cost scheduling algorithm in SwinDeW-C for instance-intensive cost-constrained workflows on cloud computing platform, J. High Perform. Comput. Appl. (2010) 1–16. [10] G H.Y. Attiya, Task allocation for maximizing reliability of distributed systems: a simulated annealing approach, J. Parallel Distrib. Comput. 66 (2006) 1259–1266. [11] G L.M. Falzon, Enhancing genetic algorithms for dependent job scheduling in grid computing environments, J. Supercomput. 62 (2012) 290–314. [12] R. Graham, L. John, A fast, effective local search for scheduling independent jobs in heterogeneous computing environments., 2003. [13] C H.B. Grosan, A Abraham, Multiobjective evolutionary algorithms for scheduling jobs on computational grids, in: International Conference on Applied Computing, 2007, pp. 459–463. [14] A. Verma, S. Kaushal, Cost-time efficient scheduling plan for executing workflows in the cloud, J. Grid Comput. (n.d.). doi:10.1007/s10723-015-9344-9. [15] J. Yu, R. Buyya, Workflow scheduling algorithms for grid computing, in: Xhafa F Abraham (Ed.), A Metaheuristics Sched. Distrib. Comput. Environ., Springer, Berlin. IS, 2008. [16] J. Yu, R. Buyya, Taxonomy of workflow management systems for grid computing, J. Grid Comput. 3 (2008) 171–200. [17] Y.K. Kwok, I. Ahmad, Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors, IEEE Trans. Parallel Distrib. Syst. 7 (1996) 506–521. [18] G.C. Sih, E.A. Lee, Compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures, IEEE Trans. Parallel Distrib. Syst. 4 (1993) 175–187. [19] H. Topcuoglu, S. Hariri, M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, Parallel Distrib. Syst. 13 (2002) 260–274, doi:10.1109/71.993206. [20] R. Sakellariou, H. Zhao, E. Tsiakkouri, M.D. Dikaiakos, Scheduling workflows with budget constraints, in: Integr. Res. GRID Comput. CoreGRID Integr. Work. 2005 Sel. Pap., 2007, pp. 189–202. http://www.springerlink.com/content/np74q18167155vp6. [21] M. N.J. Malawki, G. Juve, E Deelman, Cost and deadline-constrained provisioning for scientific workflow ensembles in iaas clouds, in: IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, UT, 2012, pp. 1–11. [22] S. Abrishami, M. Naghibzadeh, D.H.J. 
Epema, Deadline-constrained workflow scheduling algorithms for Infrastructure as a service clouds, Futur. Gener. Comput. Syst. 29 (2013) 158–169, doi:10.1016/j.future.2012.05.004. [23] R. Van Den Bossche, K. Vanmechelen, J. Broeckhove, Online cost-efficient scheduling of deadline-constrained workloads on hybrid clouds, Futur. Gener. Comput. Syst. 29 (2013) 973–985, doi:10.1016/j.future.2012.12.012. [24] A. Verma, S. Kaushal, Deadline and budget distribution based cost- time optimization workflow scheduling algorithm for cloud, in: International Conference on Recent Advances and Future Trends in Information Technology, 2012, pp. 1–4. [25] A. Verma, S. Kaushal, Deadline constraint heuristic-based genetic algorithm for workflow scheduling in cloud, Int. J. Grid Util. Comput. 5 (2014) 96– 106, doi:10.1504/IJGUC.2014.060199. [26] A. Verma, S. Kaushal, Budget constrained priority based genetic algorithm, in: Fifth International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom 2013), IET, 2013, pp. 216–222, doi:10.1049/cp.2013.2206. [27] A. Zhou, B.-Y. Qu, H. Li, S.-Z. Zhao, P.N. Suganthan, Q. Zhang, Multiobjective evolutionary algorithms: a survey of the state of the art, Swarm Evol. Comput. 1 (2011) 32–49, doi:10.1016/j.swevo.2011.03.001. [28] R. Garg, Multi-objective optimization to workflow grid scheduling using reference point based evolutionary algorithm, Int. J. Comput. Appl. 22 (2011) 44–49. [29] H.M. Fard, R. Prodan, J.J.D. Barrionuevo, A multi-objective approach for workflow scheduling in the heterogeneous environment, in: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Ottawa, Canada, IEEE, 2012, pp. 300–309, doi:10.1109/CCGrid.2012.114. [30] A. Dogˇ an, F. Özgüner, Biobjective scheduling algorithms for execution time-reliability trade-off in hetero∗ geneous computing systems, Comput. J. 48 (20 05) 30 0–314, doi:10.1093/comjnl/bxh086. [31] S. Su, J. Li, Q. Huang, X. Huang, K. Shuang, J. Wang, Cost-efficient task scheduling for executing large programs in the cloud, Parallel Comput 39 (2013) 177–188, doi:10.1016/j.parco.2013.03.002. [32] R. Garg, A.K. Singh, Multi-objective workflow grid scheduling using ε -fuzzy dominance sort based discrete Particle Swarm Optimization, J. Supercomput. 68 (2014) 709–732, doi:10.1007/s11227- 013- 1059- 8. [33] J.J. Durillo, R. Prodan, Multi-objective workflow scheduling in Amazon EC2, Clust. Comput. (2013) 169–189. [34] J.J. Durillo, V. Nae, R. Prodan, Multi-objective energy-efficient workflow scheduling using list-based heuristics, Futur. Gener. Comput. Syst. 36 (2013) 221–236, doi:10.1016/j.future.2013.07.005. [35] J.J. Durillo, R. Prodan, J.G. Barbosa, Pareto tradeoff scheduling of workflows on federated commercial clouds, Simul. Model. Pract. Theory. 58 (2015) 95–111, doi:10.1016/j.simpat.2015.07.001. [36] W. Zheng, R. Sakellariou, Budget-deadline constrained workflow planning for admission control, J. Grid Comput. 11 (2013) 633–651, doi:10.1007/ s10723- 013- 9257- 4. [37] M. Mezmaz, N. Melab, Y. Kessaci, Y.C. Lee, E.G. Talbi, a.Y. Zomaya, et al., A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems, J. Parallel Distrib. Comput. 71 (2011) 1497–1508, doi:10.1016/j.jpdc.2011.04.007. [38] K. Deb, in: Multi-Objective Optimization using Evolutionary Algorithms, John Wiley & Sons, 2005, pp. 13–46. [39] J. Kennedy, R. Eberhart, Particle Swarm Optimization, in: Neural Networks, 1995. Proceedings of IEEE International Conference 4, 4, 1995, pp. 
1942– 1948, doi:10.1109/ICNN.1995.488968. [40] P.K. Tripathi, S. Bandyopadhyay, S.K. Pal, Multi-Objective Particle Swarm Optimization with time variant inertia and acceleration coefficients, Inf. Sci. (Ny). 177 (2007) 5033–5049, doi:10.1016/j.ins.2007.06.018.


[41] C.a C. Coello, G.T. Pulido, M.S. Lechuga, Handling multiple objectives with Particle Swarm Optimization, Evol. Comput. IEEE Trans. 8 (2004) 256–279, doi:10.1109/TEVC.2004.826067. [42] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, S Lanitchi, K. Vahi, et al., Characterization of scientific workflows, in: Work. Work. Support Large Scale Sci., CA, USA, 2008, pp. 1–10, doi:10.1109/WORKS.2008.4723958. [43] R.N. Calheiros, R. Ranjan, A. Beloglazov, C.A.F. De Rose, R. Buyya, CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Softw. - Pract. Exp. 41 (2011) 23–50, doi:10.1002/spe.995. [44] http://confluenece.peagasus.isi.edu/display.peagasus/WorkflowGenerator, (2014). [45] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2002) 182–197, doi:10.1109/4235.996017.