On Scientific Workflow Scheduling in Clouds under Budget Constraint

Xiangyu Lin, Chase Qishi Wu
Department of Computer Science, University of Memphis, Memphis, TN 38152, USA
{xlin, qishiwu}@memphis.edu

Abstract—Next-generation e-Science features large-scale, compute-intensive workflows of many computing modules that are typically executed in a distributed manner. With the recent emergence of cloud computing and the rapid deployment of cloud infrastructures, an increasing number of scientific workflows have been shifted or are in active transition to cloud environments. As cloud computing makes computing a utility, scientists across different application domains are facing the same challenge of reducing financial cost in addition to meeting the traditional goal of performance optimization. We construct analytical models to quantify the network performance of scientific workflows using cloud-based computing resources, and formulate a task scheduling problem to minimize the workflow end-to-end delay under a user-specified financial constraint. We rigorously prove that the proposed problem is not only NP-complete but also non-approximable. We design a heuristic solution to this problem, and illustrate its performance superiority over existing methods through extensive simulations and real-life workflow experiments based on a proof-of-concept implementation and deployment in a local cloud testbed.

Keywords—scientific workflows; workflow scheduling; cloud computing

I. Introduction

With the emergence of cloud computing and the rapid deployment of cloud infrastructures, an increasing number of scientific workflows have been moved or are in active transition to clouds. Built on virtualization technologies over wide-area networks, cloud computing has established a new computing paradigm that creates a much more customizable and cost-effective computing environment than any previous computing solution for scientific applications. Instead of managing and utilizing concrete nodes as in grids, clouds provision virtual servers working in federation through the Internet [1] to reduce the users' cost in purchasing, operating, and maintaining a physical computing infrastructure [2]. Furthermore, virtualization technologies allow multiple virtual machines (VMs) to be configured on one physical machine, resulting in more flexible sharing and higher utilization of physical resources [1].

Current cloud services are categorized into Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). IaaS clouds provide virtualized hardware and storage for users to deploy their own applications, and are therefore most suitable for executing scientific workflows. Real-world IaaS cloud services such as Amazon EC2 [3] provide VM instances with different CPU or memory capacities to meet the demands of various applications. Such VM instances are usually priced according to their processing power, though not necessarily linearly [4], and are charged by provisioned time units, such as hours (in the cost evaluation of workflow execution, any partial hours are often rounded up, as in the case of EC2 [5]). Within the same cloud, VMs work together as a virtual cluster and data transfers are typically performed through a shared storage system without financial charge [6], [1], while across different clouds, users generally need to pay for inter-cloud data transfers.

The shift of scientific workflows to clouds has reaped the most significant benefit of resource virtualization by untangling application users from the complex management and maintenance of underlying resources, but it has also brought many new challenges for workflow execution and performance optimization. Since cloud computing makes computing a utility, one major objective of resource provisioning in clouds is to allocate, and thus pay for, only those cloud resources that are truly needed and used [4]. Hence, to support cost-effective execution of scientific workflows in cloud environments, scientists are now facing the challenge of reducing financial cost in addition to the traditional goal of optimizing workflow performance.

In this paper, we construct analytical models to quantify the network performance of scientific workflows in IaaS cloud environments, and formulate a task scheduling problem to minimize the workflow end-to-end delay under a user-specified financial cost constraint, referred to as MED-CC (Minimum End-to-end Delay under Cost Constraint). We rigorously prove that the proposed MED-CC problem is NP-complete and non-approximable. We then design a heuristic solution to this problem and illustrate its performance superiority over existing methods through extensive simulations and experiments using real-life scientific workflows in a local cloud testbed. Our workflow scheduling solution can offer IaaS cloud service providers an economical resource allocation scheme that meets the budget constraint specified by the user, and can also serve as a cloud resource provisioning reference for scientific users to make proactive and informed resource requests.

The rest of the paper is organized as follows. Section II surveys related work on workflow optimization, especially in cloud environments. Section III models the performance of scientific workflows executed in clouds and formulates MED-CC, whose complexity is analyzed in Section IV. Section V designs a workflow scheduling algorithm. Section VI presents simulation and experimental results.

II. Related Work

We conduct a survey of related work on workflow optimization in heterogeneous networks and its extension to cloud environments. Task scheduling or module mapping for workflows has been investigated extensively in the literature over the past decade [7]–[10]. Many heuristics have been proposed to minimize the workflow makespan (i.e. execution time) in grid environments, such as Heterogeneous Earliest Finish Time (HEFT) [11] and Hybrid Balanced Minimum Completion Time (HBMCT) [12]. HBMCT first assigns weights to the nodes and edges of a workflow graph, then partitions the nodes into ordered groups and schedules independent tasks within each group. These scheduling algorithms have been demonstrated to be very effective for makespan minimization in their targeted simulation or experimental settings.

There also exists a relatively limited number of workflow scheduling efforts with a budget constraint in utility grids, such as [8], [12]–[14]. In [14], Sakellariou et al. proposed two approaches, namely LOSS and GAIN, to schedule DAG-structured workflow applications in grid environments. They start from a standard DAG schedule (such as HEFT or HBMCT) and a least-cost schedule, respectively, and reschedule the workflow to meet the budget constraint. In [15], Yu et al. performed a cost-based scheduling process through workflow task partitioning and deadline assignment to meet a user-defined deadline with minimal cost. The workflow tasks are first categorized into synchronization tasks and simple tasks according to the number of their parent and child tasks. Interdependent simple tasks that are executed sequentially are then grouped into branches connected by synchronization tasks. The overall deadline specified by the user is divided into sub-deadlines over the task partitions in proportion to their minimal processing time.

Workflow scheduling that takes both cost and delay into consideration in cloud environments has not been studied as extensively. Recently, Zeng et al. proposed a budget-conscious scheduler to minimize many-task workflow execution time within a certain budget [16]. Their scheduling strategy starts with tasks labeled with an average execution time on several VMs, and then sorts the tasks such that those on the initial critical path are rescheduled first, based on an objective function defined as Comparative Advantage (CA). In [17], Abrishami et al. designed a QoS-based workflow scheduling algorithm based on Partial Critical Paths (PCP) in SaaS clouds to minimize the cost of workflow execution within a user-defined deadline. As in many existing critical-path heuristics, they schedule modules on the critical path first to minimize the cost without exceeding the deadline. PCPs are then formed ending at those scheduled modules, and each PCP takes the start time of its scheduled critical module as its deadline. This scheduling process continues recursively until all modules are scheduled. In [4], [18], Mao et al. investigated the automatic scaling of clouds with budget and deadline constraints and proposed Scaling-Consolidation-Scheduling (SCS) with VMs as basic computing elements. In [19], Hacker proposed a combination of four scheduling policies based on an on-line estimation of physical resource usage.

In this paper, we construct analytical models to quantify workflow performance in IaaS clouds with VMs as fundamental computing elements, and incorporate the instance-hour billing model into the cost calculation. These models facilitate a rigorous formulation of a budget-constrained end-to-end delay minimization problem for scientific workflows executed in clouds.

III. Mathematical Models and Problem Formulation

We start with the generic notations in [20] for independent tasks, and extend them towards our workflow execution model with inter-module dependencies to identify and quantify the key financial and time cost components in clouds.

A. Cost Models

1) Financial Cost: We consider a set VM of n available virtual machines and a set P of m programs to be executed. The financial cost C_{i,j} of executing a specific program P_i, i ∈ {0, 1, ..., m−1}, on a VM_j, j ∈ {0, 1, ..., n−1}, for a duration of T_{i,j} is the sum of four cost components:

C_{i,j} = C(I_j) + C(E_{i,j}) + C(R_i) + C(S_i),    (1)

where i) C(I_j) = I(VM_j) denotes the cost of initializing VM_j, ii) C(E_{i,j}) denotes the cost of running program P_i on VM_j,

iii) C(R_i) denotes the cost of transferring the data required (downloading input data) and produced (uploading output data) by P_i, and iv) C(S_i) denotes the data storage cost of P_i. The time duration T_{i,j} spans from the initialization of VM_j to the end of the output data transfer from P_i.

2) Time Cost: Similarly, the execution time T_{i,j} of program P_i on virtual machine VM_j is calculated as:

T_{i,j} = T(I_j) + T(E_{i,j}) + T(R_i),    (2)

where i) T(I_j) denotes the startup time of VM_j, ii) T(E_{i,j}) denotes the time for running P_i on VM_j, and iii) T(R_i) denotes the time P_i takes to download and upload data to and from the virtual machine it is running on.

B. Workflow and Cloud Models

As illustrated in Fig. 1, there are three layers in the workflow scheduling problem in cloud environments: i) the task graph layer comprised of interdependent workflow modules, ii) the resource graph layer representing a network of VMs, and iii) the cloud infrastructure layer consisting of physical computer nodes connected by network links. Note that in the task graph layer, we consider scientific workflows that have been preprocessed by an appropriate clustering technique based on the inter-module dependencies and the volumes of inter-module data transfer [21]–[24], such that a group of modules in the original workflow is bundled together as one aggregate module in the resulting task graph. Such clustering techniques have been well studied in the literature and are beyond the scope of our work. We consider a one-to-one mapping scheme in which each aggregate module in the task graph is assigned to a different VM for execution. This requires us to choose the most suitable VM type, i.e. execution environment, for each module. However, in practice, once the schedule is obtained, we can always reuse VMs of the same type to reduce the actual number of VMs being created and used, based on the workflow structure and physical resource availability.
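As a concrete illustration of these two decompositions, the following minimal Python sketch evaluates Eqs. (1) and (2) for a single program-VM pair and applies the instance-hour rounding mentioned in Section I. All numeric values are hypothetical placeholders rather than measurements.

```python
import math

# Illustrative decomposition following Eqs. (1)-(2); all inputs are
# hypothetical placeholder values in hours and dollars.
def total_time(t_init, t_exec, t_transfer):
    """T_{i,j} = T(I_j) + T(E_{i,j}) + T(R_i): startup + execution + transfer."""
    return t_init + t_exec + t_transfer

def total_cost(c_init, c_exec, c_transfer, c_storage):
    """C_{i,j} = C(I_j) + C(E_{i,j}) + C(R_i) + C(S_i)."""
    return c_init + c_exec + c_transfer + c_storage

# Example: a module occupying a VM for 1.75 hours in total.
t = total_time(t_init=0.05, t_exec=1.60, t_transfer=0.10)
billed_hours = math.ceil(t)   # partial hours are rounded up, as on EC2
print(f"duration = {t:.2f} h, billed instance-hours = {billed_hours}")
```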

Figure 1. Modeling workflow execution in cloud environments.

The layer-based hierarchy of cost models in our workflow scheduling problem is detailed as follows:

• A task graph or workflow modeled as a directed acyclic graph (DAG) G_w(V_w, E_w), consisting of |V_w| = m modules, with |E_w| directed edges l_{i,j} representing the data dependency between modules w_i and w_j. Each module w_i ∈ V_w carries a certain amount of workload WL_i, and each edge l_{i,j} transfers a certain data size DS_{i,j}.

• A set of available VM types VT = {VT_0, VT_1, ..., VT_{n−1}}, where each type VT_j is associated with both cost- and performance-related attributes. We define each VT_j as:

VT_j = {VP_j, CV_j},    (3)

where VP_j represents the overall processing power of a virtual machine of type VT_j, including its processor speed, disk volume, and memory space, and CV_j denotes the overall financial cost per unit time for using this virtual machine, including all cost components such as VM initialization, module execution, and data transfer.

• A cloud infrastructure graph modeled as an arbitrary weighted graph G_c(V_c, E_c) consisting of a set of physical computer nodes. Each computer node c_i ∈ V_c has a processing power PP_i. The cost of a data transfer R_{i,j} of size DS_{i,j} from module w_i to w_j is calculated as:

C(R_{i,j}) = CR · DS_{i,j},    (4)

where CR is the fixed data transfer cost per unit of data determined by the cloud infrastructure. In the case of intra-cloud data transfer, CR = 0.

• A fully connected virtual resource graph modeled as an arbitrary weighted graph G'_c(V'_c, E'_c) consisting of |V'_c| VMs interconnected by |E'_c| virtual links. Each virtual link l'_{p,q} has a bandwidth BW'_{p,q}, which is a function of the physical bandwidth between the two physical machines provisioning VMs c'_p and c'_q. Note that each VM c' ∈ V'_c is an instance of a certain VM type. The time of a data transfer R_{i,j} of data size DS_{i,j} from module w_i to w_j is calculated as:

T(R_{i,j}) = DS_{i,j} / BW'_{p,q} + d'_{p,q},    (5)

where d'_{p,q} denotes the delay of the virtual link.

The time T(E_{i,j}) of the execution E_{i,j} of module w_i on a VM of type VT_j is a function of w_i's workload and VT_j's performance-related attributes:

T(E_{i,j}) = WL_i / VP_j,    (6)

and the corresponding module execution cost is calculated as:

C(E_{i,j}) = T'(E_{i,j}) · CV_j,    (7)

where T'(E_{i,j}) is the rounded-up time of T(E_{i,j}).

For each module w_i, we denote its earliest start time and earliest finish time as est(w_i) and eft(w_i), respectively, and its latest start time and latest finish time as lst(w_i) and lft(w_i), respectively. The module's buffer time, defined as lst(w_i) − est(w_i) or lft(w_i) − eft(w_i), is the amount of time the execution of module w_i can be delayed without affecting the overall end-to-end delay of the entire workflow. The critical path (CP) is the longest path in the task graph weighted with time cost, which consists of all the modules with zero buffer time (a minimal sketch of this computation is given after the following list).

In addition, we consider several general constraints on workflow mapping and execution/transfer precedence [25]:
• Each module/edge is required to be mapped to a different virtual node/link.
• A computing module cannot start execution until all its required input data arrive.
• A dependency edge cannot start data transfer until its preceding module finishes execution.
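As referenced above, the est/eft/lst/lft bookkeeping and the zero-buffer characterization of the critical path can be computed with one forward and one backward pass over the DAG. The following minimal sketch assumes negligible data transfer time, consistent with our single-cloud setting; the toy DAG and execution times are hypothetical.

```python
from collections import defaultdict

# Forward pass: earliest start/finish times; backward pass: latest
# start/finish times. Modules with zero buffer form the critical path.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]          # module dependencies
exec_time = {0: 1.0, 1: 4.0, 2: 2.0, 3: 1.5}      # T(E_i) under some schedule

succ, pred = defaultdict(list), defaultdict(list)
for u, v in edges:
    succ[u].append(v)
    pred[v].append(u)

order = [0, 1, 2, 3]                               # a topological order
est, eft = {}, {}
for w in order:                                    # forward pass
    est[w] = max((eft[p] for p in pred[w]), default=0.0)
    eft[w] = est[w] + exec_time[w]

makespan = max(eft.values())
lft, lst = {}, {}
for w in reversed(order):                          # backward pass
    lft[w] = min((lst[s] for s in succ[w]), default=makespan)
    lst[w] = lft[w] - exec_time[w]

buffer = {w: lst[w] - est[w] for w in order}
critical_path = [w for w in order if abs(buffer[w]) < 1e-9]
print(critical_path)                               # here: [0, 1, 3]
```

Modules off the critical path (here w_2, with a buffer of 2.0) can be delayed or mapped to cheaper VM types without increasing the end-to-end delay, which is the property our scheduling algorithm in Section V exploits.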

C. Problem Formulation

Based on the mathematical models constructed above, we formally formulate a workflow scheduling problem for Minimum End-to-end Delay under a user-specified Cost Constraint in cloud environments, referred to as MED-CC:

Definition 1. MED-CC: Given a DAG-structured workflow graph G_w(V_w, E_w), a set of available virtual machine types VT = {VT_0, VT_1, ..., VT_{n−1}}, and a fixed financial budget B, we wish to find a task schedule S: w_i → VT_j, ∀w_i ∈ V_w, ∃VT_j ∈ VT, such that the minimum end-to-end delay (MED) of the one-to-one mapped workflow is achieved:

MED = min_{all possible S}(T_Total) = min_{all possible S}( ∑_{all w_i ∈ CP} T(E_{i,j}) ),    (8)

subject to the financial constraint:

C_Total = ∑_{i=0}^{m−1} C(E_{i,j}) ≤ B.    (9)

In the above definition, T_Total and C_Total denote the total execution time and the total financial cost of a mapped workflow, respectively, and C(E_{i,j}) denotes the execution cost of module w_i on a virtual machine of type VT_j. Since this work targets a single cloud, we do not consider the data transfer cost, i.e. CR = 0.

IV. Complexity Analysis

We have the following theorem on the complexity of the MED-CC problem.

Theorem 1. MED-CC is NP-complete.

Proof: Obviously, MED-CC ∈ NP, since given any solution (a schedule), we can always verify in polynomial time whether it is feasible by calculating the associated cost and comparing it with the given budget. We prove the NP-hardness of MED-CC by showing that the Multiple-Choice Knapsack Problem (MCKP) [26], a well-known classical NP-complete problem, is a special case of MED-CC. We first provide the definition of MCKP as follows:

Definition 2. MCKP: Given m classes N_1, ..., N_m of items to pack in a knapsack of capacity c, where each item j ∈ N_i, i = 1, 2, ..., m, has a profit p_{ij} and a weight w_{ij}, the problem is to choose exactly one item from each class such that the sum of profits is maximized while the sum of weights does not exceed the capacity c, formulated as:

Maximize
z = ∑_{i=1}^{m} ∑_{j∈N_i} p_{ij} x_{ij},

subject to
∑_{i=1}^{m} ∑_{j∈N_i} w_{ij} x_{ij} ≤ c,
∑_{j∈N_i} x_{ij} = 1, i = 1, ..., m,
x_{ij} ∈ {0, 1}, j ∈ N_i, i = 1, ..., m.

We consider a special case of MED-CC where the workflow consists only of a linear pipeline of m modules with zero data transfer time over each dependency edge. For convenience, we refer to this special case as MED-CC-Pipeline. For each VM type VT_j, we compute an estimated performance vector [4] as T_E[j] = [T(E_{0,j}), T(E_{1,j}), ..., T(E_{m−1,j})], where T(E_{i,j}) represents the execution time of module w_i on a virtual machine of type VT_j. Accordingly, we compute a vector of execution costs C_E[j] = [C(E_{0,j}), C(E_{1,j}), ..., C(E_{m−1,j})], where C(E_{i,j}) = CV_j · T'(E_{i,j}). The optimization goal is to minimize the total execution time T_Total of the pipeline-structured workflow subject to the constraint of a given financial budget B.

We show that MED-CC-Pipeline is essentially the MCKP problem. As illustrated in Fig. 2, the m modules in MED-CC-Pipeline correspond to the m classes in MCKP, and the n available VM types for each module w_i correspond to the n items in each class N_i. We set c = B and assign each item in a class in MCKP a weight equal in value to the corresponding financial cost C(E_{i,j}) in MED-CC-Pipeline, and a profit equal in value to K − T(E_{i,j}), where K is a sufficiently large constant such that K ≥ T(E_{i,j}), ∀i, j. Under this construction, MCKP and MED-CC-Pipeline are the same problem: choosing exactly one item (VM type) per class (module) to maximize the total profit (i.e. minimize the total execution time T_Total), subject to the knapsack capacity (i.e. financial budget B). Since data transfer in MED-CC-Pipeline is ignored, T_Total is simply the sum of the execution times of all modules, and the module dependency along the pipeline does not affect the problem.
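To make the correspondence concrete, the sketch below encodes a small hypothetical MED-CC-Pipeline instance in the MCKP view, giving item j of class i the weight C(E_{i,j}) and the profit K − T(E_{i,j}), and solves it by brute force. The time and cost matrices are illustrative only.

```python
from itertools import product

T = [[4.0, 2.0, 1.0],      # T(E_{i,j}): rows = modules, cols = VM types
     [6.0, 3.0, 1.5],
     [2.0, 1.0, 0.5]]
C = [[1, 4, 8],            # C(E_{i,j}): corresponding execution costs
     [1, 4, 8],
     [1, 4, 8]]
B = 13                     # knapsack capacity c = budget B
K = max(max(row) for row in T)   # constant with K >= T(E_{i,j}) for all i, j

# MCKP view: item j in class i has weight C[i][j] and profit K - T[i][j];
# maximizing total profit == minimizing total pipeline execution time.
best = None
for choice in product(range(3), repeat=3):         # one item per class
    weight = sum(C[i][j] for i, j in enumerate(choice))
    profit = sum(K - T[i][j] for i, j in enumerate(choice))
    if weight <= B and (best is None or profit > best[0]):
        best = (profit, choice)

profit, choice = best
print(choice, 3 * K - profit)   # chosen VM types and the resulting T_Total
```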

Figure 2. Illustrating the relation between MCKP and MED-CC-Pipeline.

Since a special case of MED-CC is NP-complete, the original MED-CC problem is also NP-complete; the validity of proof by restriction is established in [27]. Proof ends.

Theorem 2. MED-CC does not have a polynomial-time approximation scheme (PTAS) of any constant approximation ratio, unless P = NP.

Proof: We prove the non-approximability of MED-CC by contradiction. Given a positive constant ε, let us assume that there exists an approximation algorithm A for a problem instance I_MED of MED-CC with an approximation ratio less than 1 + ε, i.e. A(I_MED)/OPT(I_MED) = T_A/T_OPT ≤ 1 + ε.

Consider an arbitrary instance I_MCKP of MCKP consisting of m classes with n_i (n_i > 1, i = 1, 2, ..., m) items in the i-th class, and a positive constant PB. We first increase the number of items in each class to n = n_max = max_{i=1,...,m}(n_i) by padding n − n_i dummy items, each with zero profit and a weight larger than the weight of each of the original n_i items. This way, none of the dummy items affects the solution, since they would not be included in any schedule. The decision version of MCKP asks whether there exists a choice of items such that the total profit is no less than the profit bound PB.

Given the MCKP instance I_MCKP, we can always construct an instance I_MED of MED-CC by creating a sequence of m modules corresponding to the m classes, creating a set of n VM types corresponding to the n items in each class, and setting the budget B = c. Suppose that each module has n VM types of its own to choose from. For all the modules in MED-CC, we set the processing power of the j-th VM type VT_j to be an identical constant VP_{*,j} such that VP_{*,j} = VP_{i,j}, i = 1, 2, ..., m. For each module w_i in MED-CC, we set its workload to be WL_i = VP_max · (K − p_{i,max}), where VP_max = max_{j=1,...,n}(VP_{*,j}), p_{i,max} = max_{j=1,...,n}(p_{i,j}) in the i-th class of MCKP, and K is a sufficiently large constant such that K ≥ p_{i,j}, ∀i, j. Similarly, for all the modules in MED-CC, we set the financial charging rate of the j-th VM type VT_j to be an identical constant CV_{*,j} such that CV_{*,j} = CV_{i,j}, i = 1, 2, ..., m.

We first construct a constant factor k = c/(m · w_{max,max}) from I_MCKP, where w_{max,max} = max_{i=1,...,m; j=1,...,n}(w_{i,j}). For each VM type VT_j in MED-CC, we set its financial charging rate to be

CV_{*,j} = k · w_{max,j} / (WL_max/VP_{*,j})' = k · w_{max,j} / T'(E_{max,j}),

where w_{max,j} = max_{i=1,...,m}(w_{i,j}), WL_max = max_{i=1,...,m}(WL_i), and T'(E_{max,j}) is the rounded-up time of T(E_{max,j}). This instance construction process is illustrated in Fig. 3.

Figure 3. Instance construction in the non-approximability proof.

Since A is an approximation algorithm for I_MED, it returns a total execution time T_A ≤ (1 + ε) · T_OPT, where T_OPT denotes the optimal total delay of I_MED under the current budget. Once we have a schedule S, we can calculate:

T_Total = ∑_{i=1}^{m} T(E_{i,j}),    C_Total = ∑_{i=1}^{m} C(E_{i,j}),

where E_{i,j} denotes the execution of module w_i on the VM of type VT_j under the schedule S. It follows that we can find their value ranges according to the schedule when VP_max is selected for each module:

T_Total = ∑_{i=1}^{m} WL_i/VP_{*,j} ≥ ∑_{i=1}^{m} WL_i/VP_max,

C_Total = ∑_{i=1}^{m} CV_{*,max} · (WL_i/VP_max)'
        = k · ∑_{i=1}^{m} [w_{max,max} / (WL_max/VP_max)'] · (WL_i/VP_max)'
        ≤ k · [m · w_{max,max} / (WL_max/VP_max)'] · (WL_max/VP_max)'
        = [c/(m · w_{max,max})] · m · w_{max,max} = c,

where CV_{*,max} is the charging rate for the VM type with processing power VP_max.

Let us denote ∑_{i=1}^{m} WL_i/VP_max by T_A. When T_Total achieves T_A, we are choosing the VM type with VP_max for all modules w_i; it is obvious that T_A is the optimal result of I_MED in this case. This optimal solution to I_MED corresponds to the optimal solution to I_MCKP in which the item with p_{i,max} in each class N_i is selected, if the given knapsack capacity c ≥ B. We can always choose the value of ε = ε_A such that:

ε_A · T_A < T(E_{i,h}) − T(E_{i,max}), ∀i ∈ {1, ..., m}, ∀h ∈ {1, ..., n}, h ≠ argmax_{j∈{1,...,n}}(p_{i,j}).

The approximation algorithm A for I_MED with ε = ε_A only returns a solution that corresponds to the optimal solution to I_MCKP, because any other feasible solution (within the budget) would result in T_Total > (1 + ε) · T_A, which conflicts with our definition of A. We only need to consider the following cases:
• Case i): If PB ≤ m · K − T_A, where K is the constant defined above, the decision version of I_MCKP has a solution that answers "yes", since A can always achieve the required profit bound PB;
• Case ii): If PB > m · K − T_A, the decision version of I_MCKP does not have a solution (i.e. the answer is "no"), since A cannot achieve the required profit bound PB.

In either case, we can thus decide I_MCKP exactly in polynomial time by applying A to I_MED, which is not possible unless P = NP. Hence, we conclude that there is no PTAS for MED-CC. Proof ends.

V. Algorithm Design

Our work targets compute-intensive workflows in a single-datacenter cloud where the data transfer time is assumed negligible, because: i) we consider aggregate workflows where modules have been bundled into groups using a clustering technique such that inter-module data transfer is minimized, and ii) the time for data transfers in most compute-intensive scientific workflows constitutes less than 10% of the overall workflow execution time [1]. Note that widely adopted cloud computing models such as those in CloudSim [28] use high-bandwidth network connections by default within the same datacenter, and hence only inter-datacenter transfers incur non-trivial latency.

The GAIN3 approach in [14] starts with an initial least-cost schedule and then repeatedly reassigns the module with the largest GainWeight value. Among all GAIN-group algorithms starting from the least-cost schedule, GAIN3 yields the best performance, so we choose it for comparison. Our approach, referred to as Critical-Greedy (CG), also starts with a least-cost schedule, but differs in the rescheduling criteria. The ratio of the time and cost difference between different schedules is adopted in many scheduling algorithms for makespan or cost optimization, such as the Cost Decrease Ratio (CDR) in [17], LossWeight and GainWeight in [14], Comparative Advantage (CA) in [16], and the rank value for deadline assignment in [18]. We also consider the time and cost difference between different schedules in our algorithm.

A. Critical-Greedy Algorithm

We first define the differences ΔT(E_{i,j}) and ΔC(E_{i,j}) in execution time and financial cost, respectively, as follows:

ΔT(E_{i,j}) = T_old − T(E_{i,j}),    (10)
ΔC(E_{i,j}) = C(E_{i,j}) − C_old,    (11)

where T_old and C_old are the execution time and cost of w_i in some old schedule S_old, and T(E_{i,j}) and C(E_{i,j}) are the execution time and cost incurred by mapping w_i to a VM of a different type VT_j in some new schedule S_new.

Starting with the least-cost schedule, we calculate the current critical path of the mapped workflow and only consider critical modules for rescheduling. The critical path calculation takes O(m + |E_w|) time, where m is the number of modules and |E_w| is the number of dependency edges. This step is executed repeatedly during the rescheduling process, and the number of iterations depends on both the number n of VM types and the budget level.

Algorithm 1 Critical-Greedy(G_w, VT, B)
Input: A workflow graph G_w(V_w, E_w), a set of virtual machine types VT, a fixed budget B.
Output: MED and C_Total under S_CG.
1:  Calculate the execution time matrix T_E and the execution cost matrix C_E, ∀w_i ∈ G_w, ∀VT_j ∈ VT;
2:  Find the least-cost schedule S_least-cost, which maps each computing module w_i to the type VT_j such that C(E_i,min) = min_{j∈{1,...,n}} C(E_{i,j}); if there are multiple VM types with the same C(E_i,min), choose the one with the minimum T(E_{i,j}) among them;
3:  Calculate the total cost as C_min, which is the lower-bound cost;
4:  if B < C_min then
5:      return an error message, as there is no feasible solution for the given B;
6:  else
7:      Calculate the time difference matrix ΔT(E_{i,j}) and the cost difference matrix ΔC(E_{i,j}), ∀w_i ∈ G_w, ∀VT_j ∈ VT, based on S_least-cost;
8:      C_tmp = C_min;
9:      while (C_extra = B − C_tmp) > 0 do
10:         Calculate the critical path (CP) under the current schedule;
11:         for all modules w_i ∈ CP do
12:             Calculate ΔT(E_{i,j}) and ΔC(E_{i,j}) under all possible reschedules;
13:         Find the module w_i yielding the largest time decrease ΔT(E_{i,j}) with ΔC(E_{i,j}) ≤ C_extra; if multiple VM types provide the same largest time decrease, choose the one with the minimum cost increase;
14:         if no w_i is found with ΔC(E_{i,j}) ≤ C_extra then
15:             break;
16:         Reschedule w_i to VT_j;
17:         C_tmp = C_tmp + ΔC(E_{i,j});
18: C_Total = C_tmp;
19: MED = eft(w_{m−1});
20: return MED and C_Total under S_CG;

We first identify and reschedule the critical module with the largest execution time reduction ΔT(E_{i,j}) whose cost increase ΔC(E_{i,j}) fits within the remaining budget. During the rescheduling process, the critical path of the mapped workflow may change; in this case, we recalculate a new set of critical modules and repeat the rescheduling process until no further rescheduling is feasible with the remaining budget. The pseudocode of Critical-Greedy is provided in Alg. 1.
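A condensed Python sketch of the procedure is given below, under the paper's single-cloud assumptions (no data transfer time; execution time and cost from Eqs. (6) and (7)). The workloads, DAG, VM powers, prices, and budget are hypothetical inputs chosen only for illustration, not taken from the paper.

```python
import math
from collections import defaultdict

WL = [3, 30, 15, 45]                    # workload of modules w0..w3
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
VP, CV = [3, 15, 30], [1, 4, 8]         # VM type power / price per hour
B = 30                                  # budget

T = [[wl / vp for vp in VP] for wl in WL]                         # Eq. (6)
C = [[math.ceil(t) * cv for t, cv in zip(row, CV)] for row in T]  # Eq. (7)

succ, pred = defaultdict(list), defaultdict(list)
for u, v in edges:
    succ[u].append(v); pred[v].append(u)
order = list(range(len(WL)))            # already a topological order here

def critical_path(sched):
    """Return (makespan, zero-buffer modules) via forward/backward passes."""
    est, eft, lst, lft = {}, {}, {}, {}
    for w in order:
        est[w] = max((eft[p] for p in pred[w]), default=0.0)
        eft[w] = est[w] + T[w][sched[w]]
    mk = max(eft.values())
    for w in reversed(order):
        lft[w] = min((lst[s] for s in succ[w]), default=mk)
        lst[w] = lft[w] - T[w][sched[w]]
    return mk, [w for w in order if abs(lst[w] - est[w]) < 1e-9]

sched = [min(range(len(VP)), key=lambda j: (C[i][j], T[i][j]))  # least cost,
         for i in range(len(WL))]                               # min-time tie-break
cost = sum(C[i][sched[i]] for i in range(len(WL)))
assert cost <= B, "no feasible schedule under this budget"

while True:
    extra = B - cost
    mk, cp = critical_path(sched)
    # best affordable reschedule of a critical module: max time decrease,
    # ties broken by the smallest cost increase
    cand = [(T[i][sched[i]] - T[i][j], -(C[i][j] - C[i][sched[i]]), i, j)
            for i in cp for j in range(len(VP))
            if T[i][j] < T[i][sched[i]] and C[i][j] - C[i][sched[i]] <= extra]
    if not cand:
        break
    _, neg_dc, i, j = max(cand)
    cost += -neg_dc
    sched[i] = j

print(sched, critical_path(sched)[0], cost)   # final mapping, MED, C_Total
```

Note how the candidate set is restricted to critical modules and ranked primarily by the time decrease, with ties broken by the smallest cost increase, mirroring the while-loop of Alg. 1.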

B. A Numerical Example

We provide a simple numerical example to illustrate the scheduling procedure of Critical-Greedy. We consider a workflow consisting of 6 computing modules w_1, ..., w_6, with an entry (start) module w_0 and an exit (end) module w_7 representing the initial data input process and the final data output process of the workflow. Both the entry module and the exit module are assumed to have a fixed execution time of one hour.

Figure 4. The workflow used in the example.

Table I. Three available VM types in the example.
VM Type | VP_j | CV_j
VT1 | 3 | 1
VT2 | 15 | 4
VT3 | 30 | 8

Table II. Schedules computed by Critical-Greedy (CG).
S_CG | B | w1 | w2 | w3 | w4 | w5 | w6 | MED
1 | [60.0, ∞) | 2 | 3 | 3 | 3 | 3 | 3 | 5.43
2 | [56.0, 60.0) | 2 | 3 | 3 | 3 | 2 | 3 | 6.77
3 | [52.0, 56.0) | 2 | 2 | 3 | 3 | 2 | 3 | 8.10
4 | [50.0, 52.0) | 2 | 2 | 3 | 3 | 2 | 1 | 10.77
5 | [49.0, 50.0) | 2 | 2 | 1 | 3 | 2 | 1 | 12.10
6 | [48.0, 49.0) | 2 | 2 | 1 | 1 | 2 | 1 | 16.77

The workload of each module and the data size on each directed edge are marked in Fig. 4. There are three available types of VMs, VT1, VT2, and VT3, whose processing power VP_j and charging rate CV_j per unit time are provided in Table I. For simplicity, we ignore the financial cost of the entry and exit modules.

By calculating the execution time and cost of each module mapped onto every type of VM based on our profiling models in Eqs. (6) and (7), we obtain the matrices C_E and T_E in O(m · n) time, where m is the number of workflow modules and n is the number of available VM types. This step is executed only once at the beginning of the algorithm. Then, we determine the least-cost schedule S_least-cost, with a minimal cost of 48 and a total delay of 16.77 hours, by instantiating 3 VT2 and 3 VT1 instances, and the fastest schedule S_fastest, with a minimal execution time of 5.43 hours and a total cost of 64, by instantiating 6 VT3 instances. The mappings of VM types in the schedules S_least-cost and S_fastest are marked (shaded) in Fig. 5. Based on these two schedules, we know that a reasonable financial budget falls in the range [C_min = 48, C_max = 64]: with any budget below 48 the workflow cannot be completed, and any budget above 64 wastes money.
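The budget range can be derived mechanically from Eqs. (6) and (7): sum the cheapest per-module cost for C_min, and the cost of mapping every module to VT3 for C_max. The sketch below uses the VM types of Table I but hypothetical workloads, since the actual workloads appear only in Fig. 4; it therefore illustrates the procedure without reproducing the exact values of 48 and 64.

```python
import math

VP, CV = [3, 15, 30], [1, 4, 8]        # Table I: VT1, VT2, VT3
WL = [6, 40, 28, 55, 12, 33]           # hypothetical workloads of w1..w6

Cmin = Cmax = 0
for wl in WL:
    costs = [math.ceil(wl / vp) * cv for vp, cv in zip(VP, CV)]  # Eq. (7)
    Cmin += min(costs)                 # least-cost schedule, per module
    Cmax += costs[-1]                  # fastest schedule: always VT3
print(f"feasible budgets lie in [{Cmin}, {Cmax}]")
```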

Figure 5. The T_E and C_E matrices in the least-cost and fastest schedules.

For a given budget, Critical-Greedy starts with the least-cost schedule for each module, and then distributes the remaining budget to reschedule the VM types of some or all of the modules. This way, the workflow is guaranteed to finish unless the budget is below the minimal cost under the least-cost schedule.

For illustration purposes, we consider a financial budget B = 57. Starting with the least-cost schedule, we first reschedule module w_4 to a VM of type VT3, which decreases the execution time of w_4 by 6 and brings the current total time T_Total down to 12.1.

Figure 6. MED measurements of Critical-Greedy under different financial budgets in the numerical example.

Since the critical path changes after this round of rescheduling, we recalculate a new critical path and reschedule module w_3 to type VT3, resulting in an updated total time T_Total = 10.77. This process of critical path recalculation and module rescheduling is repeated, with w_6 and then w_2 mapped to VT3, until there is no budget left or no further rescheduling is possible within the remaining budget. We finally achieve a minimal end-to-end delay of 6.77 hours under the budget of 57, with one unit of budget left unused.

We list in Table II all the schedules S_CG found by Critical-Greedy (CG) in the above numerical example as the financial budget B varies within the range [C_min = 48, C_max = 64]. Each schedule determines the mapping between all the modules and their corresponding VM types, as shown in the table. The MED performance of these schedules is further plotted in Fig. 6, which shows that the MED decreases as the budget increases.

In practice, once S_CG is produced, we can explore the possibility of VM reuse. In this example, as shown in Table II, schedule 1 suggests a potential VM reuse among 3 groups of modules, i.e. (w_2, w_4, w_6), (w_3, w_5), and (w_4, w_5); schedule 3 suggests another potential VM reuse between w_4 and w_6. We can further determine VM reuse based on both the workflow structure and the underlying resource availability. Due to VM reuse, the number of actual VMs needed is often less than the number of workflow modules.

VI. Performance Evaluation

A. Simulation Settings

We evaluate the performance of Critical-Greedy using CloudSim [28], which has been widely adopted for the modeling and evaluation of cloud-based solutions. In particular, we extended CloudSim to support DAG-structured workflows and scheduling algorithms. We implemented both Critical-Greedy and GAIN3 in Java in CloudSim, and ran them on a Windows 7 desktop PC equipped with an Intel(R) Core(TM) Duo CPU T2300 at 1.66 GHz and 2.00 GB of memory.

To evaluate algorithm performance, we consider different problem sizes from small to large scales. The problem size is defined as a 3-tuple (m, |E_w|, n), where m is the number of workflow modules, |E_w| is the number of workflow links, and n is the number of available VM types. We generate workflow instances of different scales in a random manner: we first lay out m modules sequentially from w_0 to w_{m−1} as a pipeline, each of which is assigned a workload randomly generated within an appropriate range. The workload of the entry and exit modules is ignored for simplicity. For each module w_i, we randomly choose a number k within the range [1, m − 1 − i] and then choose k modules with module IDs in the range [i + 1, m − 1] as its successors. Finally, we connect all modules without any predecessors to the entry module w_0, such that the total number of links equals the given |E_w|. A code sketch of this generator is given at the end of this subsection.

We consider the pricing model for VMs used by real-world cloud service providers such as EC2, where the price is a linear function of the number of processing units in the VM type. In our simulation, we set a base unit of processing power for the virtual CPU with a base price; the financial cost of each VM type is then priced according to the number of base units of its processing power. Different VM types are associated with different performance parameters.

We compare our Critical-Greedy (CG) algorithm with GAIN3 [14] in terms of the total workflow execution time under the same budget constraint. The MED performance improvement of CG over GAIN3 is defined as:

Imp = (MED_GAIN − MED_CG) / MED_GAIN × 100%,

where MED_GAIN is the MED achieved by GAIN3 and MED_CG is the MED achieved by Critical-Greedy. The GAIN3 algorithm is initialized with the least-cost schedule and then reassigns the task with the largest GainWeight, which is the ratio of the time decrease over the cost increase. The key difference between Critical-Greedy and GAIN3 lies in the selection of candidate modules for rescheduling and in the rescheduling criterion: Critical-Greedy considers only the critical modules in each iteration, and makes a rescheduling decision based primarily on the time decrease, as long as it is affordable.
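A minimal sketch of the random workflow generator described above follows. It implements the successor-sampling procedure but, for simplicity, attaches orphan modules to the entry without targeting an exact |E_w|; the workload range is an assumed placeholder.

```python
import random

def generate_workflow(m, wl_range=(10, 100)):
    """Random DAG over modules 0..m-1 with entry 0 and exit m-1."""
    WL = [0] + [random.randint(*wl_range) for _ in range(m - 2)] + [0]
    edges = set()
    for i in range(1, m - 1):                     # skip entry/exit modules
        k = random.randint(1, m - 1 - i)          # number of successors
        for j in random.sample(range(i + 1, m), k):
            edges.add((i, j))
    has_pred = {v for _, v in edges}
    for i in range(1, m):                         # connect orphans to entry
        if i not in has_pred:
            edges.add((0, i))
    return WL, sorted(edges)

WL, edges = generate_workflow(8)
print(len(edges), edges[:5])
```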

B. Simulation Results

1) Comparison with Optimal Solutions: We first compare the performance of Critical-Greedy with optimal solutions on small-scale problems with 5, 6, and 7 modules and 3 available VM types. For each problem size, we randomly generate 5 problem instances with different module workloads and DAG topologies. For each problem instance, we calculate the corresponding least cost C_min and maximum cost C_max, as described in the numerical example in Subsection V-B, and then choose a random budget level in this range. We run Critical-Greedy on these instances and compare the MED results with the optimal ones computed by an exhaustive search approach, as shown in Table III. We observe that Critical-Greedy achieves the same results as the optimal solution in most cases, which indicates the efficacy of Critical-Greedy.

Table III. Performance comparison between Critical-Greedy and the optimal results in small-scale problems.
Instance | (5,6,3) CG | (5,6,3) Optimal | (6,11,3) CG | (6,11,3) Optimal | (7,14,3) CG | (7,14,3) Optimal
1 | 20.98 | 20.98 | 52.69 | 52.69 | 61.75 | 54.52
2 | 23.63 | 23.63 | 19.54 | 19.54 | 53.25 | 53.25
3 | 51.35 | 51.35 | 23.57 | 23.57 | 37.02 | 31.67
4 | 20.57 | 20.57 | 45.22 | 45.22 | 38.43 | 38.43
5 | 41.60 | 41.60 | 31.75 | 31.75 | 41.97 | 41.97

For a more comprehensive comparison with optimal solutions, we consider small-scale problems consisting of 5, 6, 7, and 8 workflow modules with 3 available VM types. For each problem size, we randomly generate 100 problem instances with different module workloads and DAG topologies, and for each problem instance, we calculate the range [C_min, C_max] and use the median of this range as the budget level. We run Critical-Greedy, GAIN3, and the optimal exhaustive search on these instances and measure the corresponding MED results. Fig. 7 shows the percentage of instances in which Critical-Greedy and GAIN3 achieve the optimal results. We observe that Critical-Greedy is more likely to reach optimality than GAIN3 in a statistical sense.

Figure 7. Performance statistics of Critical-Greedy and GAIN3 compared with optimal.

2) Comparison with GAIN3: We further consider 20 problem sizes from small to large scales, indexed from 1 to 20. For each problem size, we randomly generate one problem instance and calculate the corresponding cost range [C_min, C_max]. We then vary the budget from C_min to C_max at an interval of ΔC = (C_max − C_min)/20, indexed by 20 budget levels. We provide in Table IV the average MED achieved by both CG and GAIN3 as well as the average performance improvement across all 20 budget levels, which are further plotted in Fig. 8 for a visual comparison. For a clear illustration, the ratio MED_CG/MED_GAIN between the average execution times of CG and GAIN3 is also provided in the table.

Table IV. Average MED measurements of Critical-Greedy and GAIN3 across 20 budget levels within the range [C_min, C_max].
Prb Idx | (m, |E_w|, n) | CG | GAIN3 | Imp (%) | MED_CG/MED_GAIN
1 | (5, 6, 3) | 8.63 | 8.63 | 0.00 | 1.00
2 | (10, 17, 4) | 10.04 | 10.76 | 6.72 | 0.93
3 | (15, 65, 5) | 9.51 | 11.13 | 14.82 | 0.85
4 | (20, 80, 5) | 11.16 | 12.73 | 12.93 | 0.87
5 | (25, 201, 5) | 16.19 | 19.92 | 21.11 | 0.81
6 | (30, 269, 6) | 23.46 | 28.48 | 17.95 | 0.82
7 | (35, 401, 6) | 24.03 | 29.23 | 17.83 | 0.82
8 | (40, 434, 6) | 24.30 | 29.58 | 18.27 | 0.82
9 | (45, 473, 6) | 22.29 | 25.77 | 13.89 | 0.86
10 | (50, 503, 7) | 19.76 | 24.78 | 20.48 | 0.79
11 | (55, 838, 7) | 27.03 | 33.64 | 19.65 | 0.80
12 | (60, 842, 7) | 25.64 | 38.93 | 34.20 | 0.66
13 | (65, 993, 7) | 25.27 | 37.77 | 33.46 | 0.67
14 | (70, 1142, 7) | 33.87 | 46.75 | 27.67 | 0.72
15 | (75, 1179, 8) | 31.40 | 38.55 | 18.57 | 0.81
16 | (80, 1352, 8) | 29.56 | 38.71 | 23.72 | 0.76
17 | (85, 1424, 8) | 29.62 | 39.49 | 25.07 | 0.75
18 | (90, 1825, 8) | 29.53 | 42.20 | 30.16 | 0.70
19 | (95, 1891, 9) | 29.20 | 43.22 | 32.53 | 0.68
20 | (100, 2344, 9) | 38.34 | 48.19 | 20.50 | 0.80

Figure 8. Average MED performance improvement of CG over GAIN3 across 20 instances of different budget levels for each problem size.
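The optimal baseline in these comparisons can be realized as a plain exhaustive search over all n^m module-to-type assignments, which is tractable only at such small problem sizes. A sketch under the same illustrative assumptions as the earlier examples:

```python
from itertools import product

def optimal_med(T, C, makespan, budget):
    """T, C: m x n time/cost matrices; makespan(sched) evaluates the DAG."""
    m, n = len(T), len(T[0])
    best = (float("inf"), None)
    for sched in product(range(n), repeat=m):          # n**m assignments
        if sum(C[i][j] for i, j in enumerate(sched)) <= budget:
            best = min(best, (makespan(sched), sched))
    return best

# Demo on a 3-module pipeline, where the makespan is just the sum of times:
T = [[4.0, 2.0, 1.0], [6.0, 3.0, 1.5], [2.0, 1.0, 0.5]]
C = [[1, 4, 8], [1, 4, 8], [1, 4, 8]]
med, sched = optimal_med(T, C, lambda s: sum(T[i][j] for i, j in enumerate(s)), 13)
print(med, sched)   # 5.5 (1, 2, 0) -- matches the brute-force MCKP view
```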

Figure 9. Average MED performance improvement across 200 instances (20 different budget levels × 10 random workflow instances) for each problem size.

Figure 10. Average performance improvement across 200 instances (20 different problem sizes × 10 random workflow instances) for each budget level.

Figure 11. Overall performance improvement with varying budgets and problem sizes.

For a more thorough comparison, we conduct another two sets of experiments. i) For each problem size, we randomly generate 10 workflow instances with different module workloads and DAG topologies. Then, for each workflow instance, we calculate the corresponding least cost C_min and maximum cost C_max, and divide the budget range [C_min, C_max] into 20 budget levels at a uniform interval ΔC = (C_max − C_min)/20. The average performance improvement percentage across these 200 problem instances for each problem size is plotted in Fig. 9. ii) For each budget level, we consider 20 different problem sizes, for each of which we generate 10 random workflow instances. The average performance improvement percentage across these 200 problem instances for each budget level is plotted in Fig. 10.

For a better illustration, we plot the overall performance improvement percentage of Critical-Greedy over GAIN3 in Fig. 11, where the x-axis denotes the budget increment across 20 levels and the y-axis denotes the index of the 20 problem sizes. Each point (x, y, imp) in the 3D plot represents the average performance improvement across all 10 problem instances of the same problem size under the same budget level (the actual budget values may differ across instances). These performance results show that Critical-Greedy achieves an average of 35% performance improvement over GAIN3, which is significant for workflow execution in large-scale scientific applications.

3) Discussion: We observe that the performance improvement is in general more significant for larger problem sizes. This is mainly because, when there is a small number of modules and links, the rescheduling decisions of the two algorithms are similar. For larger problem sizes with complex DAG topologies, the number of branch (non-critical) modules in a given task schedule is much larger than the number of critical ones. In such cases, GAIN3 makes little improvement on the overall end-to-end delay, since the modules with large GainWeight, which is only a local difference ratio, may not have a critical impact on the entire execution time. For large workflows, GAIN3 very likely reschedules some branch modules that have no global impact and only consume budget.

Another observation is that the performance improvement increases as the budget increases. When the budget is close to the lower bound C_min, there is not much room for either algorithm to explore. Since Critical-Greedy's performance gain arises mainly from distributing the available budget to speed up the most critical modules, it is able to improve the performance more than GAIN3 under a larger budget.

C. Experimental Results

1) WRF Workflow:

Figure 12. The structure of a general WRF workflow.

Our workflow experiments are based on the Weather Research and Forecasting (WRF) model [29], which has been widely adopted for regional- to continental-scale weather forecasting. The WRF model [30] generates two large classes of simulations, either with an ideal initialization or utilizing real data. In our experiments, the simulations are generated from real data, which usually requires preprocessing by the WPS package [31] to provide each atmospheric and static field with fidelity appropriate to the grid resolution chosen for the model.

The structure of a general WRF workflow is illustrated in Fig. 12, where WPS consists of three independent programs: geogrid.exe, ungrib.exe, and metgrid.exe. The geogrid program defines the simulation domains and interpolates various terrestrial datasets to the model grids; the user can specify information in the namelist file of WPS to define simulation domains. The ungrib program "degribs" the data and writes them in a simple intermediate format. The metgrid program horizontally interpolates the intermediate-format meteorological data extracted by ungrib onto the simulation domains defined by geogrid. The interpolated metgrid output can then be ingested by the WRF package, which contains an initialization program real.exe for real data and a numerical integration program wrf.exe. The postprocessing model consists of ARWpost and GrADS: ARWpost reads in WRF-ARW model data and creates output files for display by GrADS.

2) Experiment Settings: Nimbus [32] is an open-source toolkit focused on providing IaaS capabilities to the scientific community. The basic structure of a Nimbus cloud consists of a central controller node and multiple virtual machine monitor (VMM) nodes. VM images are uploaded through a client program to the storage repository on the controller node, and VMs are provisioned upon client requests on available VMM nodes. We set up a local Nimbus cloud testbed with Xen virtualization tools [33]. This private cloud contains 5 physical servers: one serves as the cloud controller and VM image repository, and the others are VMM nodes where VMs are deployed.

We duplicate three WRF pipelines, each from the program ungrib.exe to ARWpost.exe, and group the programs into different aggregate modules to simulate real-world workflow clustering and provide various degrees of module parallelism, as shown in Fig. 13 and Fig. 14. Fig. 14 is a high-level view of the grouped workflow in Fig. 13, where w_0 and w_7 are the start and end modules.

Figure 13. The experiment workflow of three WRF pipelines.

Figure 14. The grouped WRF experiment workflow.

Based on the physical resources, we provide 3 available VM types with sufficient RAM size and disk space for WRF execution and different CPU speeds, as shown in Table V, and assign pricing linearly according to the processing power.

Table V. VM types in the experiment.
VM Type | CPU (GHz) | RAM (KB) | Disk (GB) | CV_j
VT1 | 0.73 × 1 | 1024 | 6.8 | 0.1
VT2 | 2.93 × 1 | 1024 | 6.8 | 0.4
VT3 | 2.93 × 2 | 1024 | 6.8 | 0.8

Table VI. Execution time matrix T_E (seconds).
T_E | w1 | w2 | w3 | w4 | w5 | w6
VT1 | 43.8 | 22.7 | 13.8 | 47.0 | 752.6 | 377.8
VT2 | 19.2 | 9.6 | 7.0 | 30.0 | 241.6 | 123.1
VT3 | 12.0 | 10.1 | 7.2 | 19.4 | 143.2 | 119.7

Table VII. Performance comparison between CG and GAIN3 in the WRF workflow experiments.
Budget B | Schedule | w1 | w2 | w3 | w4 | w5 | w6 | MED
147.5 | S_CG | 1 | 1 | 1 | 1 | 2 | 1 | 468.6
147.5 | S_GAIN3 | 3 | 2 | 2 | 1 | 1 | 2 | 809.2
150.0 | S_CG | 1 | 1 | 1 | 1 | 3 | 1 | 467.9
150.0 | S_GAIN3 | 3 | 2 | 2 | 1 | 1 | 2 | 809.8
155.0 | S_CG | 3 | 2 | 1 | 1 | 2 | 1 | 436.8
155.0 | S_GAIN3 | 3 | 2 | 2 | 3 | 1 | 2 | 784.0
174.9 | S_CG | 1 | 1 | 1 | 1 | 3 | 2 | 213.9
174.9 | S_GAIN3 | 3 | 2 | 2 | 2 | 2 | 2 | 281.2
180.1 | S_CG | 3 | 1 | 1 | 1 | 3 | 2 | 212.7
180.1 | S_GAIN3 | 3 | 2 | 2 | 3 | 2 | 2 | 270.6
186.2 | S_CG | 1 | 1 | 1 | 3 | 3 | 2 | 206.4
186.2 | S_GAIN3 | 3 | 2 | 2 | 3 | 2 | 2 | 270.8
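As a cross-check, the budget range quoted below can be reproduced from Tables V and VI if the rounding of Eq. (7) is assumed to be applied at one-second granularity, and the fastest schedule picks the quickest VM type per module; this reading is an assumption, but it matches the reported values exactly.

```python
import math

# Cross-check of the experimental budget range from Tables V and VI,
# assuming per-second rounded-up billing (our reading of Eq. (7) here).
CV = [0.1, 0.4, 0.8]                          # Table V charging rates
TE = [[43.8, 19.2, 12.0],                     # Table VI, transposed:
      [22.7,  9.6, 10.1],                     # rows = modules w1..w6,
      [13.8,  7.0,  7.2],                     # cols = VT1..VT3
      [47.0, 30.0, 19.4],
      [752.6, 241.6, 143.2],
      [377.8, 123.1, 119.7]]

cost = lambda t, j: math.ceil(t) * CV[j]      # Eq. (7), 1-second units
Cmin = sum(min(cost(t, j) for j, t in enumerate(row)) for row in TE)
Cmax = sum(cost(min(row), row.index(min(row))) for row in TE)  # fastest type
print(round(Cmin, 1), round(Cmax, 1))         # 125.9 243.6
```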

Since the VMs in the experiment have the same disk space, they have a nearly identical VM startup time. Hence, we can always launch the VMs in advance, before actually running the workflow modules.

3) Performance Comparison: We store the entire workflow as well as the input data on each VM image. To obtain the execution time of each module, we create a VM of each available VM type and run each module on it multiple times. We observe that the module execution time remains stable on the same type of VM, and we construct the execution time matrix T_E, in units of seconds, in Table VI. From T_E and the pricing of the VM types, we calculate C_min = 125.9 and C_max = 243.6, and then run the two scheduling algorithms, Critical-Greedy (CG) and GAIN3, with 6 different budget values selected within the range (C_min, C_max).

Note that in the real-network experiments, adjacent modules with execution precedence constraints can reuse the same VM if they are mapped to the same type in the schedule. For example, in the schedules generated by CG: i) when B = 147.5, w_1 and w_3 are executed on the same VM of type VT1, as are w_2, w_4, and w_6; ii) when B = 155.0, w_4 and w_6 are executed on the same VM of type VT1. The schedules generated by both algorithms and their corresponding MED under different budget values are tabulated in Table VII and further plotted in Fig. 15 for comparison. These performance measurements show that the proposed CG algorithm consistently outperforms GAIN3 in all the test cases we studied.

VII. Conclusion

We formulated a budget-constrained workflow scheduling problem for delay minimization in cloud computing environments, referred to as MED-CC, and proved it to be NP-complete and non-approximable. We proposed a heuristic algorithm, Critical-Greedy, for MED-CC and demonstrated its performance superiority over existing methods through extensive simulations and real-life workflow experiments in a cloud testbed.

Figure 15. Performance comparison between CG and GAIN3 with different budget values using the WRF experiment workflow.

In future work, we plan to refine and generalize the mathematical models to achieve a higher level of accuracy in workflow execution time measurement in real-world cloud environments. We also plan to incorporate the cost of inter-cloud data movement into workflow scheduling in multi-cloud environments. Such data transfers may pose restrictions on VM provisioning, as we need to consider the VMs' connectivity to support inter-module communication based on the available bandwidth in the cloud infrastructure.

Acknowledgment

This research is sponsored by the U.S. Department of Energy's Office of Science under Grant No. DE-SC0002400 with the University of Memphis.

References
[1] C. Hoffa, G. Mehta, T. Freeman, E. Deelman, K. Keahey, B. Berriman, and J. Good, "On the use of cloud computing for scientific workflows," in Proc. of the 4th IEEE Int. Conf. on eScience, Washington, DC, USA, 2008, pp. 640–645.
[2] E. Deelman, G. Singh, M. Livny, B. Berriman, and J. Good, "The cost of doing science on the cloud: the Montage example," in Proc. of the ACM/IEEE Conference on Supercomputing, Austin, Texas, USA, 2008.
[3] Amazon EC2. http://aws.amazon.com/ec2/.
[4] M. Mao and M. Humphrey, "Auto-scaling to minimize cost and meet application deadlines in cloud workflows," in Proc. of Int. Conf. for High Performance Computing, Networking, Storage and Analysis, 2011, pp. 49:1–49:12.
[5] G. Juve, E. Deelman, K. Vahi, G. Mehta, B. Berriman, B. Berman, and P. Maechling, "Data sharing options for scientific workflows on Amazon EC2," in Proc. of the ACM/IEEE Int. Conf. for High Performance Computing, Networking, Storage and Analysis, Washington, DC, USA, 2010, pp. 1–9.
[6] Pegasus in the cloud. http://pegasus.isi.edu/cloud.
[7] Y. Gu and Q. Wu, "Optimizing distributed computing workflows in heterogeneous network environments," in Proc. of the 11th Int. Conf. on Distributed Computing and Networking, Kolkata, India, Jan. 3-6, 2010.
[8] M. Rahman, S. Venugopal, and R. Buyya, "A dynamic critical path algorithm for scheduling scientific workflow applications on global grids," in IEEE Int. Conf. on e-Science and Grid Computing, Dec. 2007, pp. 35–42.
[9] R. Bajaj and D. Agrawal, "Improving scheduling of tasks in a heterogeneous environment," IEEE Trans. Parallel Distrib. Syst., vol. 15, no. 2, pp. 107–118, February 2004.
[10] Q. Wu and Y. Gu, "Optimizing end-to-end performance of data-intensive computing pipelines in heterogeneous network environments," J. of Parallel and Distributed Computing, vol. 71, no. 2, pp. 254–265, 2011.
[11] H. Topcuoglu, S. Hariri, and M. Wu, "Performance-effective and low-complexity task scheduling for heterogeneous computing," IEEE TPDS, vol. 13, no. 3, pp. 260–274, 2002.
[12] R. Sakellariou and H. Zhao, "A hybrid heuristic for DAG scheduling on heterogeneous systems," in Proc. of the 18th IEEE IPDPS, vol. 2, 2004, p. 111.
[13] J. Yu, "A budget constrained scheduling of workflow applications on utility grids using genetic algorithms," in Proc. of SC WORKS Workshop, 2006, pp. 1–10.
[14] R. Sakellariou, H. Zhao, E. Tsiakkouri, and M. Dikaiakos, "Scheduling workflows with budget constraints," in Integrated Research in GRID Computing, S. Gorlatch and M. Danelutto, Eds. Springer US, 2007, pp. 189–202.
[15] J. Yu, R. Buyya, and C. Tham, "Cost-based scheduling of scientific workflow application on utility grids," in Proc. of the 1st Int. Conf. on e-Science and Grid Computing, Washington, DC, USA, 2005, pp. 140–147.
[16] L. Zeng, V. Bharadwaj, and X. Li, "ScaleStar: budget conscious scheduling of precedence-constrained many-task workflow applications in cloud," in Int. Conf. on Advanced Information Networking and Applications, Los Alamitos, CA, USA, 2012, pp. 534–541.
[17] S. Abrishami and M. Naghibzadeh, "Deadline-constrained workflow scheduling in software as a service cloud," Scientia Iranica, 2012.
[18] M. Mao, J. Li, and M. Humphrey, "Cloud auto-scaling with deadline and budget constraints," in Grid Computing, Oct. 2010, pp. 41–48.
[19] T. Hacker and K. Mahadik, "Flexible resource allocation for reliable virtual cluster computing systems," in Proc. of the ACM/IEEE Supercomputing Conference, 2011, pp. 48:1–48:12.
[20] V. Viana, D. Oliveira, and M. Mattoso, "Towards a cost model for scheduling scientific workflows activities in cloud environments," in Proc. of the IEEE World Congress on Services, 2011, pp. 216–219.
[21] J. Jung and J. Bae, "Workflow clustering method based on process similarity," in Proc. of the Int. Conf. on Computational Science and Its Applications, vol. 2, 2006, pp. 379–389.
[22] G. Singh, M. Su, K. Vahi, E. Deelman, B. Berriman, J. Good, D. Katz, and G. Mehta, "Workflow task clustering for best effort systems with Pegasus," in Proc. of the 15th ACM Mardi Gras Conference, 2008, pp. 9:1–9:8.
[23] W. Chen and E. Deelman, "Integration of workflow partitioning and resource provisioning," in Proc. of the 12th IEEE/ACM Int. Symp. on Cluster, Cloud and Grid Computing, Washington, DC, USA, 2012, pp. 764–768.
[24] M. Tanaka and O. Tatebe, "Workflow scheduling to minimize data movement using multi-constraint graph partitioning," in Proc. of the 12th IEEE/ACM Int. Symp. on Cluster, Cloud and Grid Computing (CCGRID), 2012, pp. 65–72.
[25] Y. Gu, Q. Wu, and N. Rao, "Analyzing execution dynamics of scientific workflows for latency minimization in resource sharing environments," in Proc. of the 7th IEEE World Congress on Services, Washington, DC, Jul. 4-9, 2011.
[26] S. Martello and P. Toth, Knapsack Problems: Algorithms and Computer Implementations. New York, NY, USA: John Wiley & Sons, Inc., 1990.
[27] M. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. San Francisco: W.H. Freeman and Company, 1979.
[28] R. Calheiros, R. Ranjan, A. Beloglazov, C. De Rose, and R. Buyya, "CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms," Software: Practice and Experience, vol. 41, no. 1, pp. 23–50, 2011.
[29] W. Skamarock, J. Klemp, J. Dudhia, D. Gill, D. Barker, M. Duda, X. Huang, W. Wang, and J. Powers, "A description of the Advanced Research WRF version 3," National Center for Atmospheric Research, Boulder, Colorado, USA, Tech. Rep. NCAR/TN-475+STR, June 2008.
[30] The WRF model. http://www.wrf-model.org/index.php.
[31] WRF Preprocessing System (WPS). http://www.mmm.ucar.edu/wrf/users/wpsv2/wps.html.
[32] Nimbus cloud project. http://www.nimbusproject.org.
[33] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in ACM SIGOPS Operating Systems Review, vol. 37, no. 5, 2003, pp. 164–177.
