Elastic Resource Provisioning for Cloud Workflow Applications

Xiaoping Li, Senior Member, IEEE, and Zhicheng Cai, Member, IEEE

Abstract—Many workflow applications are moved to clouds for their elastic capacities, and elastic resource provisioning is one of the most important problems. Realistic factors are involved, including an interval-based charging model, data transfer time, VM loading time, software setup time, resource utilization, and the workflow deadline. A multirule-based heuristic is proposed for the problem under study, which contains two components: deadline division and task scheduling. Taking into account the gaps between tasks, the impact of different critical paths, and the precedence constraints, the workflow deadline is properly divided into task deadlines based on the solution of a relaxed problem. The relaxed problem is modeled by integer programming and solved by CPLEX. All tasks are sorted in terms of the developed depth-based rule. For different realistic factors, three priority rules are developed to allocate tasks to appropriate available time slots, from which a weighted rule is constructed for task scheduling. The weights are calibrated on random instances. Experiments are conducted on benchmark realistic workflows. Experimental results show that the proposal is effective and efficient for realistic workflows.

Note to Practitioners—This paper is motivated by the elastic resource provisioning problem of virtual data centers in clouds which are managed by scientific research institutes or small and medium-sized enterprises, to minimize the total resource renting cost of cloud workflow applications. For example, when virtual machines are rented from Amazon EC2 for big-data analysis applications, the number and the type of rented virtual machines change in order to save on renting costs. Because virtual machines are priced by intervals in most commercial clouds, tasks must be properly scheduled on rented virtual machines to improve the utilization of rented intervals. Existing methods do not factor in software setup times, yet these have an impact on scheduling effectiveness (especially when tasks have shorter execution times than software setup times). In this paper, a heuristic called MRH is developed for elastic virtual machine provisioning. Practical factors (utilization of rented intervals, VM loading time, software setup, data transfer, execution efficiency, and the match between the length of time slots and that of task executions) are considered in MRH. Experimental results on realistic applications show that MRH can decrease virtual machine renting costs by up to 78.57%. Furthermore, MRH is fast enough to meet the quick reaction times required by modern IT applications in rented virtual data centers (such as data centers built on Amazon EC2).

Manuscript received June 16, 2015; revised September 17, 2015; accepted November 10, 2015. This paper was recommended for publication by Associate Editor J. Sethuraman and Editor H. Ding upon evaluation of the reviewers' comments. This work was supported in part by the National Natural Science Foundation of China under Grant 61572127 and Grant 61272377 and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20120092110027. (Corresponding author: Zhicheng Cai.)

X. Li is with the School of Computer Science and Engineering, Southeast University, Nanjing 211189, China, and also with the Key Laboratory of Computer Network and Information Integration, Southeast University, Ministry of Education, Nanjing 211189, China (e-mail: [email protected]).

Z. Cai is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/TASE.2015.2500574

Index Terms—Cloud computing, heuristic, resource provisioning, workflow scheduling.

I. INTRODUCTION

Many business and institutional applications take hours or days to process large volumes of data. The speed of processors cannot keep up with the growth rate of data, and the limited capacity of local clusters cannot meet the requirements of users who need to analyze data quickly. Nowadays, many companies and institutes are trying to migrate their applications for real-time analysis, online advertising, and scientific computing to existing commercial clouds. The auto-scaling nature of clouds provides flexible resource provisioning: resources are acquired or released according to application volumes at any time, and are charged according to the use that is ultimately made of them. However, resource under-provisioning usually degrades system performance, whereas resource over-provisioning leads to idle resources (which incur unnecessary costs). Therefore, it is a challenge for cloud users to match computation tasks to the most appropriate rented cloud resources with minimum renting costs while meeting system requirements at the same time.

Generally, business analysis and scientific computing applications involve large volumes of data. In order to reduce processing times, data are partitioned and processed in parallel, and intermediate data are transferred to the sequential steps of the applications. Such applications with parallel and sequential tasks are often modeled as workflows, which can be represented by directed acyclic graphs (DAGs): tasks are nodes, while precedence relationships (such as data dependencies) between tasks are arcs. Furthermore, most applications are restricted by deadlines. Therefore, cloud users are required to allocate workflow tasks to the most appropriate type and number of virtual machine (VM) instances in order to achieve a balance between resource renting costs and the workflow makespan.

There are several established projects (such as Pegasus [1], ASKALON [2], and GrADS [3]) that manage workflow applications on distributed resources, e.g., grid resources. Task scheduling with various constraints and objectives is one of the most important functions of workflow management (the other common functions are workflow description, execution, and monitoring). In the task scheduling strategies of traditional distributed systems, resources are shared by different entities and managed by providers, i.e., users cannot exclusively utilize the resources according to their requirements. Deelman et al. [1] claimed that resource reservation could improve the performance of workflow management.

1545-5955 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


VMs in clouds are fully reserved resources, which means that workflow applications can be handled more effectively and flexibly. At the same time, existing strategies for scheduling shared resources in grids are not suitable for scheduling reserved resources in clouds because of the unique characteristics and constraints of clouds, such as their auto-scaling capacity and their interval-based pricing model. In this paper, we present heuristics for scheduling workflow tasks on reserved resources in clouds, which could be used, for example, in the task manager (Condor Schedd) of Pegasus [1] with a cloud resource pool.

There are several factors that affect service provisioning for the cloud workflow applications under study.

1) Interval-based charging strategies. In many realistic workflows, task processing times are usually much shorter than the reserved pricing intervals. Therefore, serializing parallel tasks in a workflow can improve the utilization of the rented resource intervals while meeting the workflow deadline, i.e., resource renting costs can be minimized (see the pricing sketch after this list).

2) Software setup time and intermediate data transfer time. VMs rented from public clouds are bare machines. It is very time-consuming to pre-install all of the different types of professional software required. Therefore, setup times are spent on downloading specific software from file systems and loading it onto the operating system before tasks are executed. Since many tasks in a workflow-based application need the same kind of software, software setup times can be greatly reduced by reusing the installed software. Similarly, a large volume of intermediate data is usually transferred between workflow tasks. If a successive task is placed on the VM instance where one of its immediate predecessors is located, the data transfer time between them can be avoided. However, the VM of the predecessor may not have the software needed by the following task, which incurs a software setup time. Conversely, if the next task is assigned to another VM which has the required software, then the data produced by its predecessor has to be transferred. Therefore, it is desirable to find a balance between software setup times and data transfer times, which makes the related scheduling problem much more complex.

3) Complex network structures and diverse task requirements (computation-intensive, memory-intensive, or I/O-intensive). Many commercial clouds (e.g., Amazon EC2) provide multiple types of VMs with different configurations (number of virtual cores, CPU frequency, memory size, and I/O bandwidth) and prices. To optimize efficiency and costs, different tasks require different types of VM instances (normal, high-CPU, or high-memory).

There are some studies [4], [5] on cloud workflow scheduling in which data transfer time and the match between task attributes and VM types are usually considered. However, software setup time is either not considered or simply treated as a fixed part of the execution time, which leads to software reinstallation. In this paper, the three factors mentioned above are considered at the same time. We propose a multiple rule-based heuristic for provisioning elastic cloud resources to workflow tasks to minimize total resource renting costs.
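Returning to factor 1), the following minimal Python sketch makes the interval-based charging model concrete; the hourly interval and the $0.10 price are illustrative assumptions, not values from the paper. It shows why serializing short parallel tasks onto one VM reduces the number of paid intervals:

```python
import math

def renting_cost(busy_seconds, interval_seconds, price_per_interval):
    # Interval-based charging: every started interval is paid in full.
    return math.ceil(busy_seconds / interval_seconds) * price_per_interval

# Four 10-minute tasks in parallel on four VMs: four partly used hours are paid.
parallel_cost = 4 * renting_cost(10 * 60, 3600, 0.10)   # 4 x 0.10 = 0.40
# The same tasks serialized on one VM fit into a single rented hour.
serial_cost = renting_cost(4 * 10 * 60, 3600, 0.10)     # 1 x 0.10 = 0.10
print(parallel_cost, serial_cost)
```

Serialization only pays off while the workflow deadline still holds, which is exactly the constraint the proposed heuristic works under.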


The framework of the proposal is divided into two steps: dividing the workflow deadline into task deadlines, and scheduling tasks on VM instances subject to the task deadlines. As opposed to traditional methods, which divide workflow deadlines according to the fastest task processing times, an optimizing deadline division method is presented here. Furthermore, three novel priority rules are developed for task scheduling to match tasks with time slots (periods of time available for tasks). The main contributions of this paper are summarized here.

• Based on the solution of a relaxed problem, a workflow deadline division method is developed. To properly generate task deadlines, the execution time/cost tradeoff, the impact of different critical paths, and the precedence constraints of tasks are considered.

• Three rules are proposed for matching tasks with time slots with different objectives: minimizing the number of rented intervals, minimizing real costs (including data transfer, software setup, and VM loading costs), and maximizing resource utilization.

• We present a multiple rule-based heuristic (MRH) for the problem under study. Tasks are constrained by the generated task deadlines and are scheduled in the order determined by the introduced task selection rule and the weighted combination of the three rules.

II. RELATED WORK

In the literature, more attention has been paid to the system performance of clouds from the perspective of cloud providers [6]–[9] than to minimizing the resource renting costs paid by cloud users. Zuo et al. [10] proposed a self-adaptive learning particle swarm optimization for minimizing the outsourcing cost of tasks with deadline constraints in IaaS clouds. Mao et al. [11] presented a cloud auto-scaling mechanism to automatically change the scale (rent or release) and the type of VM instances based on real-time workloads to minimize renting costs while meeting performance requirements. Bossche et al. [12] considered cost-optimal scheduling in hybrid IaaS clouds for deadline-constrained workloads to maximize the utilization of internal data centers and minimize the cost of running outsourced tasks in clouds. These methods only deal with independent tasks. However, precedence relationships (e.g., data transfer dependencies) between tasks are widespread in practical applications, and such applications are generally modeled as workflows.

There are few studies on cloud workflow scheduling. Byun et al. [13] considered workflow scheduling in distributed systems and proposed the Balanced Time Scheduling (BTS) algorithm to minimize client-oriented resource renting costs. BTS assumes that a fixed number of homogeneous resources is used during the whole workflow execution. Later, they [14] developed the Partitioned Balanced Time Scheduling (PBTS) algorithm for elastic resource provisioning to minimize the number of resources in each time partition rather than over the whole workflow execution duration. Resources are again assumed to be homogeneous, i.e., only one kind of VM is used in the system. Heterogeneous resources and data transfer times have been considered by some researchers; however, software setup times are rarely considered. Recently, Abrishami et al. [5] proposed two heuristics, IC-PCP and IC-PCPD2, for allocating workflow tasks to heterogeneous resources.


TABLE I CONFIGURATIONS AND PRICES (PER HOUR) FOR VMS

Fig. 1. Architecture of the considered system.

They considered data transfer between tasks and the choice among multiple VM types. However, operating system image loading times and software setup times were not considered. Durillo and Prodan [15] studied the resource provisioning of scientific workflows in Amazon EC2 to optimize both the makespan and the resource renting cost; operating system image loading times and software setup times were again ignored. Mao et al. [4] proposed an approach for supporting the running of workflow applications on auto-scaling cloud VMs. The workflow deadline was divided by a simple heuristic, resources were scaled according to a Load Vector, and the instance-hour utilization of VMs was improved by consolidating tasks. However, they treated the software setup time as a fixed part of the processing time and did not take into account the tradeoff between the data transfer time, the software setup time, and the choice of VM type. In Pegasus [1], resource selection algorithms can be designed as pluggable components with software-aware setup times. However, neither the dependent software setup time nor the tradeoff between the software setup time and the data transfer time has been investigated for cloud workflow applications in the literature. In this paper, we consider cloud workflow scheduling to optimize renting costs under a given workflow deadline, in which operating system image loading times and software setup times are considered.

III. PROBLEM DESCRIPTION

There are many constraints in elastic resource provisioning for cloud workflow applications, and it is intractable to take all of them into account in practice. In this paper, we model the scheduling problem under some assumptions on application tasks and cloud resources. As shown in Fig. 1, there are two roles in clouds: the cloud provider and the cloud user. Cloud providers (such as Infrastructure-as-a-Service (IaaS) providers) supply services (such as VMs and storage) to cloud users. Cloud users rent resources from cloud providers to establish their own virtual data centers to support their applications, in which the resource renting cost is minimized by applying auto-scaling mechanisms. When a workflow is submitted to the system, the Workflow Scheduler (WS) is in charge of renting VM instances from clouds. In order to optimize the system's performance, e.g., to improve the utilization of rented intervals, algorithms should be designed in the WS to provision resources and schedule tasks properly. The Elasticity Broker (EB) is in charge of renting and releasing VM instances according to the schedules obtained by the WS: the EB rents a new VM instance from the cloud provider if the WS decides to assign a task to a newly rented VM instance, and releases VM instances which are no longer used in the next time interval. The objective of this paper is to design algorithms for the WS to minimize resource renting costs while meeting the workflow deadline $D$.

Different types of VMs (with distinct prices per month, hour, or minute) are provided by many existing commercial clouds. Similarly, interval-based (monthly, hourly, or per-minute) pricing models are adopted by many commercial clouds, i.e., users are charged by the number of started intervals. Generally, the whole interval is paid for even if only a part of it is used. Table I gives the configurations and prices of different VM types on Amazon EC2.

Workflows are often used to model cloud application tasks in many fields such as big data mining, the online advertising industry, and scientific computing. A workflow is depicted by a directed acyclic graph $G=(V,E)$, where $V$ is the set of tasks in the workflow, $E$ is the set of precedence constraints between tasks (such as data transfer dependencies), and $(t_i, t_j)\in E$ indicates that $t_j$ cannot start before $t_i$ is completed. $P_i$ and $S_i$ represent the immediate predecessor and immediate successor sets of $t_i$, respectively.

Fig. 2. Illustrative workflow.

Fig. 2 depicts a workflow example with seven tasks plus dummy source and sink nodes $t_0$ and $t_8$. The label on each arc denotes the volume of intermediate data transferred from $t_i$ to $t_j$.
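As a concrete reading of this DAG model, the following sketch (with hypothetical tasks, execution times, and data volumes, not those of Fig. 2) stores the workflow as arcs labeled with data volumes and computes earliest start/finish times by a forward pass:

```python
from collections import defaultdict

# Hypothetical workflow: arcs carry the volume of intermediate data (MB).
arcs = {("src", "t1"): 0, ("src", "t2"): 0,
        ("t1", "t3"): 120, ("t2", "t3"): 80,
        ("t3", "sink"): 0}
exec_time = {"src": 0, "t1": 300, "t2": 450, "t3": 200, "sink": 0}  # seconds
order = ["src", "t1", "t2", "t3", "sink"]        # a topological order

preds = defaultdict(list)
for (u, v) in arcs:
    preds[v].append(u)

# Forward pass: earliest start/finish times (data transfer times omitted here;
# they would be added when a task and its predecessor run on different VMs).
est, eft = {}, {}
for t in order:
    est[t] = max((eft[u] for u in preds[t]), default=0)
    eft[t] = est[t] + exec_time[t]
print(eft["sink"])  # earliest possible makespan under these execution times
```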


Different types of VMs have different configurations, which suit different types of tasks: normal VM instances are suitable for common tasks, high-CPU VM instances for computation-intensive tasks, and high-memory VM instances for database operation tasks. For example, a computation-intensive task needs less time and incurs a lower cost on high-CPU VM instances than on other types of VM instances, and a memory-intensive task executes efficiently on high-memory VM instances. Therefore, it is critical to allocate proper types of VM instances to tasks to improve cost-efficiency (shorter times and lower costs). The fact that the execution times of tasks differ across VM types makes allocation very difficult. Several existing methods [16], [17] can be used to estimate execution times. Let $e_{ik}$ denote the execution time of task $t_i$ on the $k$th type of VM instance, whose price per interval is $c_k$.

In addition to the execution time of each node, data transfer times are significant. In fact, data transfer between tasks is always time-consuming and depends on the volume of data and the system network bandwidth. Because VM instances are usually rented from the same data center of a cloud provider, we assume that the bandwidths of different VM instances are the same.

Different system software (such as the operating system, middleware, and professional software components) is required to execute tasks, which can be regarded as setup times (including VM image transfer times, OS loading times, and specific software downloading and installing times). In this paper, we assume that VM setup times for the same type of VM instance are equal (they can be estimated from experience). Let $l_k$ denote the VM setup time for VM type $k$, and let $s_i$ denote the setup time of the software component required by $t_i$.

Scheduling DAG-based tasks on distributed machines has been proved to be NP-hard [18], [19], and the additional realistic features make the DAG-based scheduling under study even harder. In this paper, a heuristic is proposed for the cloud workflow scheduling problem under study to minimize VM renting costs within the workflow deadline.

IV. PROPOSED HEURISTIC

Dividing deadlines and scheduling tasks are two crucial phases when scheduling workflows: the workflow deadline must be divided into task deadlines, and then tasks must be scheduled subject to the obtained task deadlines. Many factors influence the two phases, including the complexity of workflows (precedence constraints between tasks), task types (batch-based tasks or single tasks), and resource classes (priced per usage or per time interval). For workflow scheduling problems with different factors, different deadline division and task scheduling methods have been proposed [5], [20]–[22]. Since different constraints and factors are considered in this paper, it is necessary to investigate new deadline division and task scheduling methods.

A. Workflow Deadline Division

The share of the overall deadline given to each task in the workflow is closely related to the execution time of the task, which is determined by the allocated VM type. Once all tasks have been allocated to corresponding VM types, the earliest start and finish times of every task can be obtained by dynamic programming. We define a time float as the time slot (gap) between the earliest finish time of a task and the earliest start time of one of its immediate successors, or between the earliest finish time of the sink node and the workflow deadline.

Fig. 3. Examples of time floats based on determined execution times. (a) Workflow structure. (b) Example of time floats.

For the workflow example shown in Fig. 3(a), Fig. 3(b) depicts its partial Gantt chart with time floats. In order to make task scheduling more flexible, these time floats must be distributed appropriately among tasks when generating task deadlines. Therefore, VM type selection and time float distribution are the two critical steps of workflow deadline division.

Abrishami et al. [5] and Yu et al. [20] divided the workflow deadline into task deadlines (subdeadlines) in proportion to the minimum execution times on the fastest VM types. However, selecting the fastest VM type ignores the execution cost, i.e., only time is considered, which results in unbalanced task deadlines: some tasks are given more candidate VM types whilst others are given fewer. When both time and cost matter, this does not lead to optimal solutions. Both time and cost are considered in the workflow deadline division methods DET [21] and PCP [22]. However, DET and PCP are not suitable for the problem under study in this paper because more constraints (such as data transfer, the reuse of software, and interval-based pricing models) are present in cloud computing, which makes workflow deadline division much more complex. Furthermore, because of the networked structures of workflows, the time floats of some critical paths are constrained by those of other critical paths. Some existing time float distribution methods do not consider these interactions, while others only take into account part of the time floats on critical paths, which limits the search for better solutions in the task scheduling procedure.

In this paper, new VM type selection and time float distribution methods are proposed for workflow deadline division. When choosing VM types, it is hard to determine whether setup times (the data transfer time, the software setup time, etc.) are needed in the final solution, and the interval-based pricing model makes the problem even more complex. Taking into account the time/cost tradeoff between the different VM types required by different tasks, the original problem is relaxed to a traditional service-type selection problem by removing these constraints. The relaxed problem is modeled using integer programming (IP) and solved with the IBM ILOG-CPLEX solver [23] in a few seconds for the tested workflows (the IBM ILOG-CPLEX Optimizer is commonly used to solve integer programming problems, very large linear programming problems, etc.).

In this paper, the float duration $fd_i$ of task $t_i$ is defined as the sum of the task execution time and the length of the time floats distributed to $t_i$. The float durations are used to record the distributed time floats of the previously processed paths, and the time floats of new critical paths are updated according to the current float durations. Based on the impact that different critical paths have on each other, a float-duration-based time float distribution method is investigated, and an iterative method is proposed to comprehensively distribute all of the time floats of each critical path.

1) VM Type Selection: To use the same or similar VM types in both the task deadline division and the task scheduling phases, the time/cost tradeoffs of different tasks have to be considered together when generating task deadlines. When choosing VM types, it is hard to determine whether the setup times (the data transfer time, the software setup time, etc.) are needed in the final solution. The VM types chosen at this step are only used to determine task deadlines; they are not the final types on which tasks are scheduled. Therefore, we simply add these setup times to the execution time on each VM type when choosing VM types for deadline division. The processing time of task $t_i$ on VM type $k$ is

$$p_{ik} = e_{ik} + l_k + s_i + d_i \quad (1)$$

where $d_i$ is the time for transferring the input data of $t_i$. Furthermore, the problem becomes much more complex if the interval-based pricing model is involved in the workflow deadline division phase. We therefore assume in this phase that VM instances are charged by time units rather than by whole pricing intervals. The cost of task $t_i$ on VM type $k$ is calculated as

$$c_{ik} = \frac{p_{ik}}{I_k}\, c_k \quad (2)$$

where $I_k$ denotes the length of the pricing interval of VM type $k$.

Based on these assumptions, the problem under study is relaxed to a typical service-type decision problem. The binary variable $x_{ik}$ takes value 1 if task $t_i$ is assigned to VM type $k$, and $x_{ik}=0$ otherwise. $f_i$ represents the finish time of task $t_i$. The objective is to minimize the total cost; the cost of unused time slots in the rented intervals is not included. The IP problem can be formally modeled as follows:

$$\min \sum_{t_i \in V} \sum_{k} c_{ik}\, x_{ik} \quad (3)$$

subject to

$$\sum_{k} x_{ik} = 1, \quad \forall t_i \in V \quad (4)$$

$$f_j \ge f_i + \sum_{k} p_{jk}\, x_{jk}, \quad \forall (t_i, t_j) \in E \quad (5)$$

$$f_i \ge \sum_{k} p_{ik}\, x_{ik}, \quad \forall t_i \in V \quad (6)$$

$$x_{ik} \in \{0, 1\} \quad (7)$$

$$f_{\mathrm{sink}} \le D. \quad (8)$$

Each task is assigned to one and only one VM type according to constraint (4). Constraints (5) and (6) guarantee the precedence constraints. Constraint (7) requires the assignment variables to be binary. The deadline is met by constraint (8).
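Under the reconstruction above, the following is a minimal sketch of the relaxed service-type selection IP, using PuLP with its default CBC solver as a stand-in for the CPLEX solver the paper uses; all task data are invented toy values:

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum

# Toy data (invented): processing times p (s) and unit-charged costs c per Eqs. (1)-(2).
tasks = ["t1", "t2", "t3"]
types = ["normal", "highcpu"]
price = {"normal": 0.10, "highcpu": 0.20}        # illustrative $/hour
p = {("t1", "normal"): 600, ("t1", "highcpu"): 250,
     ("t2", "normal"): 400, ("t2", "highcpu"): 500,
     ("t3", "normal"): 700, ("t3", "highcpu"): 300}
c = {(i, k): p[i, k] / 3600.0 * price[k] for i in tasks for k in types}
E = [("t1", "t2"), ("t2", "t3")]                 # precedence arcs
D = 2000                                         # workflow deadline (s)

m = LpProblem("service_type_selection", LpMinimize)
x = {(i, k): LpVariable(f"x_{i}_{k}", cat="Binary") for i in tasks for k in types}
f = {i: LpVariable(f"f_{i}", lowBound=0) for i in tasks}

m += lpSum(c[i, k] * x[i, k] for i in tasks for k in types)        # Eq. (3)
for i in tasks:
    m += lpSum(x[i, k] for k in types) == 1                        # Eq. (4)
    m += f[i] >= lpSum(p[i, k] * x[i, k] for k in types)           # Eq. (6)
    m += f[i] <= D                                                 # Eq. (8)
for (i, j) in E:
    m += f[j] >= f[i] + lpSum(p[j, k] * x[j, k] for k in types)    # Eq. (5)

m.solve()
print({i: k for (i, k) in x if x[i, k].value() == 1},
      {i: f[i].value() for i in tasks})
```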


Fig. 4. Results of the time float distribution method in IC-PCPD2.

The IP is solved by IBM ILOG-CPLEX. Although the relaxed problem is NP-hard [24], the computation time of CPLEX with the relative MIP gap tolerance set to 0.2% ("a relative tolerance on the gap between the best objective and the objective of the best node remaining" [23]) is no more than 3 s on the tested benchmarks. When the gap falls below the specified value, the integer optimization of CPLEX stops and a heuristic solution is obtained. Let $k^*_i$ denote the VM type selected for task $t_i$ in this solution.

2) Time-Float Distribution: Based on the currently selected VM types, $eft_i$, $est_i$, and $lft_i$ are defined as the earliest finish time, the earliest start time, and the latest finish time of task $t_i$. Two types of time float distribution methods have traditionally been used.

1) Intuitive division in proportion to task execution times (such as that described in [5] and [22]). This method does not consider the time floats already distributed to previous partial critical paths, which can result in overly long task deadlines; in other words, the relation between different critical paths is not considered. Abrishami et al. [5] calculated the subdeadline of each task on the current critical path from the earliest finish times, earliest start times, and latest finish times of the related tasks, which are generated directly from the current task execution times without considering the time floats distributed to previous paths. Therefore, the path length becomes longer (compared with the scenario based on the distributed time floats), which ultimately results in longer subdeadlines for all tasks on the path. In the workflow in Fig. 3(a), for example, assume that the subdeadlines of the colored tasks have been determined, i.e., some time floats were distributed to the colored tasks when determining the subdeadlines of the first two paths, and that the remaining path is the last critical path. Fig. 4 shows the task subdeadlines generated by IC-PCPD2 [5]: the subdeadlines of the tasks on the last path are calculated without considering the time floats already distributed on the first two paths and are therefore still generated from the original latest finish times, which results in significantly larger subdeadlines. These longer subdeadlines allow the tasks on the last path to choose slower and cheaper VM types, which ultimately gives the tasks on the earlier paths fewer chances to choose appropriate VM types. The PCP method [22] also distributes time floats; however, for each partial critical path, only the difference between the latest finish time of the last task and its earliest finish time is considered.

2) The task float-duration-based division method [20], in which task deadlines are generated by allocating a float duration to each task in the workflow. The time float distribution process is a traversal, which implies that different traversal orders lead to different distribution results. A combined depth- and width-first traversal is used to determine the task processing order in [20]. However, the method of distributing time floats to task float durations, which has a great impact on the finally generated task deadlines, is not given.

In this paper, a new float-duration-based time float distribution method is developed. The traversal method ASSIGNPARENT presented by Abrishami et al. [5], [22] is applied to iteratively generate partial critical paths. Once a new critical path has been generated, the time floats of the path are distributed to the float durations of the tasks on the path. Initially, $fd_i = p_{ik^*_i}$, and $fd_i$ is iteratively increased by the distributed time floats; in other words, the float durations record the time floats distributed on previous critical paths. The related temporal parameters of each task are calculated by Algorithm 1. A task is said to be "Done" if its float duration cannot be increased any further, i.e., $eft_i = lft_i$; the set of such tasks is denoted $Done$. The total length of all time floats on a newly generated critical path $\pi = (\pi_1, \dots, \pi_m)$ (denoted by $TF$) is calculated as

$$TF = lft_{\pi_m} - est_{\pi_1} - \sum_{t_i \in \pi} fd_i \quad (9)$$

so that all of the time floats between tasks on the path, which result from precedence relationships, are fully considered rather than only the last time float [22]. The latest finish time of each task is obtained by a typical backward calculation. The time floats on the path are distributed to the in-process (not yet "Done") tasks in proportion to their float durations, i.e., the length of the float allocated to $t_i$ is

$$\delta_i = TF \cdot \frac{fd_i}{\sum_{t_j \in \pi,\ t_j \notin Done} fd_j}. \quad (10)$$

If $eft_i + \delta_i > lft_i$, the earliest finish time of $t_i$ would exceed its latest finish time; therefore, $\delta_i$ is pruned to $lft_i - eft_i$ and $t_i$ is added to $Done$. Whenever the float duration of a task is updated, the earliest start and finish times of its successors are recalculated. After $TF$ has been distributed to the tasks, the latest finish times of all tasks are updated according to the newly generated float durations, and $TF$ is recalculated. If $TF > 0$, the time float of the path could not be fully distributed in the last scan because of complex precedence constraints between tasks, and the above process is iterated to distribute the remaining float. Otherwise, the time float distribution of the current path terminates and a new critical path is generated, whose time float is distributed in the same way. The process is iterated until the time float of the last critical path has been distributed. Finally, the earliest finish time of each task generated by the final float durations is set as the task deadline $d_i$. The workflow deadline distribution is formally described in Algorithm 2.

Fig. 5(a)–(c) illustrates the details of distributing time floats to tasks for the workflow example in Fig. 3(a): the length of all time floats on the newly generated critical path is calculated by (9); each in-process task receives a share in proportion to its float duration by (10), with shares pruned at the latest finish times; the earliest start and finish times of all successors are recalculated after every update; and, because the time float of the path cannot be fully distributed in the first scan, a second scan distributes the remainder until $TF = 0$, at which point the time float distribution of the path is finished.
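The scan-based distribution can be sketched as follows; this is a compressed reading of Eqs. (9)-(10) that treats the earliest finish times as directly updatable and omits the successor recalculation performed by Algorithm 1 (both simplifying assumptions):

```python
def distribute_path_float(path, fd, est, eft, lft, done, eps=1e-9):
    """Distribute the time float of one critical path over its float durations.

    path: tasks in path order; fd: float durations; est/eft/lft: earliest
    start / earliest finish / latest finish times; done: "Done" tasks whose
    float duration can no longer grow.
    """
    open_tasks = [t for t in path if t not in done]
    tf = lft[path[-1]] - est[path[0]] - sum(fd[t] for t in path)   # Eq. (9)
    while tf > eps and open_tasks:
        total = sum(fd[t] for t in open_tasks)
        granted_total = 0.0
        for t in list(open_tasks):
            share = tf * fd[t] / total            # Eq. (10): proportional share
            grant = min(share, lft[t] - eft[t])   # prune at the latest finish time
            fd[t] += grant
            eft[t] += grant                       # stand-in for the forward update
            granted_total += grant
            if grant < share:                     # pruned task becomes "Done"
                done.add(t)
                open_tasks.remove(t)
        if granted_total <= eps:                  # nothing distributable: stop
            break
        tf -= granted_total                       # remaining float for the next scan
    return fd
```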


Fig. 5. Example of time float distribution. (a) Float durations of the path. (b) After the first scan of distribution. (c) After the second scan of distribution.

B. Multiple Priority Rule-Based Task Scheduling

Once task deadlines have been determined in the workflow deadline division stage, tasks are scheduled to appropriate time slots of VMs in a specific order. The order is determined by a task-depth-based priority rule, and tasks are then matched, subject to their deadlines, to time slots of VM instances based on the three priority rules proposed below.

1) Task Selection: A task-depth-based priority rule is proposed to select the next ready task to schedule. The depth of $t_i$ is defined as the smallest number of tasks on a path from the source task to $t_i$ in the DAG. Let $R$ be the set of ready tasks, i.e., tasks all of whose predecessors are scheduled; $R$ is initialized with the successors of the source task. To pick the next task, $R$ is partitioned into subsets according to the depths of its tasks. Usually, tasks with the same depth require the same software in practical applications; therefore, tasks in the subset with the smallest depth are scheduled first. Within each subset, tasks are sorted by execution time in nonincreasing order, and the task with the maximum execution time is assigned the highest priority. After the selected task is scheduled, each of its immediate successors is added to $R$ if all of that successor's predecessors have been scheduled. The scheduling procedure terminates once $R$ becomes empty.

2) Priority Rules for Matching Tasks With Time Slots: Task scheduling is affected by several factors, such as the number of newly rented intervals, the price per interval, the execution time, the data transfer time, the software setup time, the VM loading time, and the match between the task processing time and the length of the remaining rented intervals. These factors interact with each other; for example, the task execution time affects both the number of newly rented intervals and the length-match. In [5], only the interval-cost (the cost of newly added VM instance intervals, i.e., the product of the price per interval and the number of newly rented intervals) is used to select VM instances. Although minimizing the interval-cost decreases the cost of the current step, tasks can be assigned to time slots of poorly matched VM types. For instance, if high-memory VM types are cheaper, a computation-intensive task can be assigned to a high-memory VM instance with a much longer execution time, which ultimately leads to a higher actual cost (the cost actually incurred during task processing, excluding the unused fractions of intervals under interval-based pricing). Conversely, minimizing the actual cost of each task helps to minimize the total renting cost of a large number of tasks by sharing the rented intervals (possibly at a higher price per interval). In other words, selecting a VM type merely because of its cheaper price per interval can be misleading. Instead of using the interval-cost, we develop three rules for time slot selection: 1) minimizing the number of newly rented intervals, to maximize the reuse of the fractions of rented intervals; 2) minimizing the actual cost, to make good matches between task execution requirements and VM configurations, minimize data transfer times, reuse software, and minimize VM loading times; and 3) making a good match between the processing time and the length of the remaining rented intervals, to improve the utilization of rented VM time intervals. The three rules have different advantages; we construct a hybrid rule by combining them with weights, which are calibrated experimentally in Section V.

Fig. 6. Available time slots of a VM instance.

We let $VM_1, \dots, VM_M$ represent all rented VM instances in the considered cloud system, where $M$ is the number of VM instances. Each instance has a set of available time slots (e.g., the available time slots are [3, 7], [8, 13], and [16, 21] for the VM instance in Fig. 6), and $A_i$ denotes the set of candidate time slots for task $t_i$, i.e., all of the available time slots between the earliest start time of $t_i$ and its deadline $d_i$. For each candidate slot, indicator variables record whether a new VM is launched for the slot (incurring the VM setup time of its type) and whether the slot's VM lacks the software required by $t_i$ (incurring the software setup time $s_i$); the data transfer time of $t_i$ on a slot depends on where its immediate predecessors are placed. The details of the three rules for assigning $t_i$ to the most appropriate time slot in $A_i$ are as follows.

FNIF rule (Fewest Newly rented time Intervals First): The number of newly rented intervals should be as small as possible, i.e., the utilization of already rented intervals should be maximized. Because only a few intervals can be fully used by the scheduled tasks, most rented intervals contain idle time fractions; by assigning tasks to these fractions, the utilization of the rented VM instances is improved and the rental cost is reduced.

Fig. 7. Rent as few VM intervals as possible.

Fig. 7 shows an example scenario in which several tasks have been scheduled and one task is currently ready; [3, 7], [8, 13], [16, 21], [6, 13], [15, 21], [8, 21], and [7, 16] are the available time slots of the rented VM instances, and the pricing interval is 6. Time fractions on already rented intervals can be reused by assigning suitable tasks to them: [3, 7], [8, 13], and [7, 16] lie entirely within rented intervals, whereas [8, 21] is only partially rented. If the ready task is assigned to [6, 13], a new time interval must be rented; if it is assigned to [8, 21], a new time interval must also be rented even though the rented fraction is reused. Slot [7, 16] just meets the requirement: the existing time fraction is used and no new time interval is rented, so [7, 16] is the most suitable slot. Therefore, let $n_{ij}$ denote the number of newly rented intervals when $t_i$ is assigned to slot $j$, and let $N_i$ be the maximum number of time intervals needed by $t_i$. The FNIF priority is defined by the normalization

$$r^{1}_{ij} = 1 - \frac{n_{ij}}{N_i}. \quad (11)$$

LACF rule (Lowest Actual Cost First): When the VM configuration matches the task processing requirements, the execution time is usually shorter than on a poorly matched VM type, so the actual cost is lower even though the price per interval of the VM may be higher; in other words, a lower actual cost always indicates a better match between task execution requirements and the VM configuration. Furthermore, VM loading, software setup, and data transfer all exert a great influence on the actual cost. Therefore, the total actual cost includes four components: 1) the significant cost of VM loading when the task is assigned to a new VM instance; 2) the software setup cost of downloading and installing the necessary software from remote file systems if the task is deployed to an existing VM instance without this software; 3) the data transfer cost when the task and its immediate predecessors are assigned to different VM instances; and 4) the execution cost of the task on the given VM type, which depends on the match between task attributes and the VM configuration. The actual cost is accordingly defined as

$$AC_{ij} = \frac{c_{k_j}}{I_{k_j}} \left( v_{ij}\, l_{k_j} + u_{ij}\, s_i + d_{ij} + e_{ik_j} \right) \quad (12)$$

where $k_j$ is the VM type of slot $j$, $v_{ij}=1$ if a new VM instance is launched for the slot (0 otherwise), $u_{ij}=1$ if the required software is missing on the slot's VM (0 otherwise), and $d_{ij}$ is the data transfer time of $t_i$ on slot $j$.
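The actual cost of Eq. (12), as reconstructed above, can be sketched as a small function; the argument values in the usage example are invented:

```python
def actual_cost(exec_time, transfer_time, is_new_vm, has_software,
                vm_load_time, sw_setup_time, price_per_unit):
    """Eq. (12) sketch: sum the four cost components of the LACF rule."""
    occupied = exec_time + transfer_time          # data transfer is 0 if co-located
    if is_new_vm:
        occupied += vm_load_time                  # OS image loading on a fresh VM
    elif not has_software:
        occupied += sw_setup_time                 # download + install the software
    return occupied * price_per_unit              # charged by time, ignoring fractions

# A task on an existing VM that already holds the software but needs its input data:
print(actual_cost(exec_time=200, transfer_time=60, is_new_vm=False,
                  has_software=True, vm_load_time=30, sw_setup_time=15,
                  price_per_unit=0.10 / 3600))
```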

Fig. 8. Select the VM instance with the cheapest setup, data transfer, and execution cost.

Fig. 8 shows four tasks in a data mining workflow: three tasks are already scheduled and one is ready. The first two scheduled tasks are in charge of data preparation and data partitioning, respectively, while the third scheduled task and the ready task are in charge of data mining and need the same kind of software. The candidate placements of the ready task illustrate the tradeoffs captured by the actual cost:

• If it is assigned to the slot [2, 21], the mining software is not installed there, so a software setup time is incurred.

• If it is assigned to the slot [7, 21] immediately after the other data mining task, the required software is already installed (it was set up before that task executed), but the intermediate data must be transferred from the VM holding its predecessor's output.

• If it is assigned to the slot [4, 21] on the VM where its immediate predecessor ran, no data transfer is needed because the predecessor is on the same VM instance; however, the execution time on this VM type is longer than on the other VMs.

• If a new VM instance is rented for it, the VM loading time must be paid as well.

The total actual costs of these options are compared, and the cheapest placement is preferred. The priority value of this rule is normalized as

$$r^{2}_{ij} = 1 - \frac{AC_{ij}}{\max_{j' \in A_i} AC_{ij'}}. \quad (13)$$

Fig. 9. Make a better match between the length of slots and the execution time.

BMF rule (Best Match between the slot length and the task processing time First): In order to improve resource utilization, BMF tries to choose the available time slot whose length best matches the processing time of the task. Because of the interval-based pricing model, time fractions of rented VM intervals are produced during the scheduling process. A perfect match occurs when a task is allocated to a time fraction whose length exactly meets what the task requires. Naturally, it is almost impossible to ensure perfect matches for all tasks; in practice, small fractions appear before the assigned task (because of the earliest start time constraints in workflows) or after it (when the length of the required interval is less than that of the fraction). For the example in Fig. 9, when the task is assigned to [1, 21], two smaller fractions are created before and after it, whose lengths are 2 and 1, respectively. The smaller the total length of these fractions, the better the match between the length of the time slot and the processing time required by the task. Out of all the candidate slots in the example, the total length of the newly produced fractions is smallest when the task is assigned to [9, 21]; therefore, [9, 21] has the highest priority. Accordingly, letting $rt_{ij}$ denote the total remaining time (the newly produced smaller fractions) of $t_i$ on slot $j$, we define the match as

$$r^{3}_{ij} = 1 - \frac{rt_{ij}}{\max_{j' \in A_i} rt_{ij'}}. \quad (14)$$

3) Hybrid Time Slot Selection Rule: Based on the three developed rules, a hybrid priority for allocating task $t_i$ to candidate slot $j$ is computed as

$$r_{ij} = a\, r^{1}_{ij} + b\, r^{2}_{ij} + c\, r^{3}_{ij} \quad (15)$$

where $a$, $b$, and $c$ are weights. Different values of $a$, $b$, and $c$ put distinct emphases on the three time slot selection rules (FNIF, LACF, and BMF). A higher weight on the FNIF rule implies a greater tendency to reuse the rented intervals irrespective of whether the VM configurations match the task processing requirements. The LACF rule only takes into account the actual cost: with a larger $b$, the proposed heuristic tends to select time slots with a lower actual cost without considering the pricing intervals of the rented time slots, and if $b$ is large enough, the heuristic always picks the time slot with the lowest actual cost, so the rented time fractions of poorly matched VM types are wasted and the resource utilization of the rented intervals decreases. The BMF rule focuses on the match between task execution times and the lengths of time slots, which improves the utilization of rented intervals; however, if $c$ is sufficiently large, the heuristic only favors the length-match instead of the actual cost, which results in high resource renting costs. Therefore, $a$, $b$, and $c$ should be calibrated by experiments to find the best combination of the three rules.

Under the hybrid rule, for each ready task, a newly rented VM instance is temporarily added to the data center and compared with the existing rented instances. The priority of the task on every candidate time slot in the data center is calculated by (15), and the time slot with the highest priority is selected. A temporarily added instance is removed from the data center if it is not used.
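Putting the three rules together, the following sketch scores candidate slots by Eq. (15); the per-slot quantities (newly rented intervals, actual cost, leftover fractions) are assumed to be precomputed, and the normalizations follow the reconstructions of Eqs. (11), (13), and (14) above:

```python
def hybrid_priority(slots, a, b, c):
    """Pick the slot with the highest weighted score (Eq. 15, sketch).

    `slots` maps slot id -> (new_intervals, max_intervals, actual_cost, fractions),
    all hypothetical precomputed quantities for the current ready task.
    """
    max_cost = max(s[2] for s in slots.values()) or 1.0
    max_frac = max(s[3] for s in slots.values()) or 1.0
    best, best_score = None, float("-inf")
    for sid, (n_new, n_max, cost, frac) in slots.items():
        r1 = 1 - n_new / n_max          # FNIF: fewer newly rented intervals
        r2 = 1 - cost / max_cost        # LACF: lower actual cost
        r3 = 1 - frac / max_frac        # BMF: smaller leftover fractions
        score = a * r1 + b * r2 + c * r3
        if score > best_score:
            best, best_score = sid, score
    return best

# Three invented candidate slots for a ready task (pricing interval = 1 h):
slots = {"vm1[7,16]": (0, 2, 0.05, 0.2), "vm2[6,13]": (1, 2, 0.04, 0.5),
         "new-vm": (1, 2, 0.03, 0.9)}
print(hybrid_priority(slots, a=0.5, b=0.3, c=0.2))  # -> "vm1[7,16]"
```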


After that, the ready task set is updated and the next ready task is obtained. The process is iterated until all tasks are scheduled. The task scheduling procedure is formally described in Algorithm 3.

C. Multi-Rule Heuristic

The proposed Multi-Rule Heuristic (MRH) consists of two steps: 1) the workflow deadline is divided into task deadlines by the float-duration-based division method, and 2) after determining the best combination of the three weights $a$, $b$, and $c$, tasks with deadline constraints are scheduled by the TaskSchedule(a, b, c) procedure (Algorithm 3).

D. Complexity Analysis

The time complexity of MRH depends mainly on two parts: the workflow deadline division and the task scheduling. The workflow deadline division consists of two steps: solving the IP using CPLEX and distributing the deadline using the combined depth- and width-first traversal. Although the IP is NP-hard, the CPU time of CPLEX for workflow applications with 1000 tasks is less than 3 s when the relative MIP gap tolerance is 0.2%. For the task scheduling, let $m$ be the maximum number of available time slots (which is closely related to the number of scheduled tasks); evaluating the hybrid of the three priority rules on all slots for one task takes time proportional to $m$, so scheduling all $n$ tasks takes $O(nm)$ time.

V. PERFORMANCE EVALUATION

There is no comprehensive benchmark available to researchers for the problem under study. Fortunately, the preliminary work by Bharathi et al. [25] studied the characteristics of several types of realistic workflows, including Montage (MON), CyberShake (CYB), Epigenomics (biology), LIGO (LIG), and SIPHT (SIP). In this paper, workflow instances are produced using the workflow generator developed by Bharathi et al. [25] (https://confluence.pegasus.isi.edu/display/pegasus/Workflow-Generator). The generated workflows are saved in XML format, which provides the network structures, task names, and running times, and they are extended to instances of the considered problem. In order to generate execution times for the different VM types, tasks with the same software requirements are assigned a unified category chosen from Normal, High-memory, and High-CPU. The execution time described in the XML file is taken to be the execution time on a reference VM type of the corresponding category (see Table I), and the execution time on any other VM type is scaled according to the CPU and memory configurations of that type. The software needed by each task is determined by the task name described in the XML file. Data transfers are described in the XML files, which specify the volume of the data and the transfer direction.

For the tested workflows, the number of tasks takes eleven different values, and one hundred instances are randomly generated for each value, giving a total of 1100 tested instances. The test-bed generated in this paper is available on the website (http://www.seu.edu.cn/lxp/bb/06/c12114a113414/page.psp).

In the cloud computing environment under study, nine types of VMs (Table I) with different configurations and prices are considered. The bandwidth takes values from {100, 1000} Mbps (100 Mbps has been common in data center network adapters over the past few years, and 1000 Mbps is currently used). Before installing new software, a file has to be downloaded from a remote file system; the transfer time varies with the file size and the bandwidth. The software setup time takes values from {0, 5, 10, 15, 20} s (a span large enough to simulate general scenarios). The VM loading time is set to 30 s. Another important parameter in the experiments is the length of the pricing interval: although there are minute-, hour-, and month-based pricing models, we adopt the hourly model commonly used in commercial clouds. Let $M_{min}$ be the shortest execution time of the tested workflow with the fastest and a sufficient number of VM instances, and let $M_{max}$ be the longest execution time with all tasks executed sequentially on a single VM instance. The deadlines of each tested workflow are generated between $M_{min}$ and $M_{max}$ by a parameter named the deadline factor.

The Normalized Cost (NC) from [5] is adopted to normalize the resource renting cost over a large set of workflows with different attributes: $NC = C(S)/C_{min}$, where $C(S)$ is the total resource renting cost of a schedule and $C_{min}$ is the cheapest cost to execute the workflow regardless of the deadline.

To the best of our knowledge, there is no algorithm for the exact problem analyzed in this paper. In IaaS clouds, the workflow scheduling problem studied by Abrishami et al. [5] is one of the most similar problems; they proposed two heuristics, IC-PCP and IC-PCPD2, and showed that, on average, IC-PCP outperformed IC-PCPD2. For a fair comparison, IC-PCP is modified to take into account the operating system loading time and the software setup time: when scheduling a sequence of tasks on an existing VM instance, the software setup times are included in the total execution time whenever consecutive tasks need different types of software, and when a new VM instance has to be rented, the operating system loading time is added to the total execution time before comparing it with existing VM instances. Similarly, Durillo and Prodan [15] considered a multi-objective workflow scheduling problem to simultaneously optimize makespan and resource renting costs, and presented a Pareto-based list scheduling heuristic (MOHEFT) that obtains a set of trade-off solutions for each tested workflow instance. To compare the proposals fairly with MOHEFT, the makespan of each solution in the MOHEFT Pareto set is used as the deadline for the proposed MRH heuristic and the corresponding resource renting cost is obtained; for each workflow instance, we thus obtain a set of solutions with different deadlines. Since neither VM loading times nor software setup times are considered in MOHEFT, we modify MOHEFT for the problem under study: in each iteration, both the VM loading time and the software setup time are checked when the current task is mapped to each available time slot of every existing partial solution. A larger value of the MOHEFT parameter (the number of partial solutions retained after each iteration) yields better effectiveness at the price of a longer computation time; with an unbounded value, MOHEFT degenerates into an exhaustive search, which is extremely time-consuming. When this parameter is increased to 20, the computation time of MOHEFT rises to about 650 s for workflows with 1000 tasks, significantly longer than the MRH computation time (a few seconds). We aim to show that even under this condition, the proposed MRH still generates lower resource renting costs for the same deadline.

All of the compared algorithms are coded in Java and run on virtual machines with a single-core i5-2400 CPU @3.1 GHz and 1 GB RAM under the Windows XP operating system.

A. Parameter Calibration

We first calibrate the best combination of the three weights $a$, $b$, and $c$ of the constructed priority rules in MRH on the five types of workflows mentioned above. Workflow instances for the parameter calibration are generated randomly using the workflow generator [25]. For simplicity, we only test the relative relations among the three weights, and all combinations of the tested values are evaluated. The experimental results are analyzed by the multifactor analysis of variance (ANOVA) technique [26]. A number of hypotheses must ideally be met by the experimental data; the main three (in order of importance) are the independence of the residuals, homoscedasticity (homogeneity of the factor-level variances), and normality in the residuals of the model.


Fig. 10. Interactions between $a$ and $b$ with 95.0% Tukey HSD confidence intervals.

Fig. 11. Mean plot of $c$ with 95.0% Tukey HSD confidence intervals.

Apart from a slight non-normality in the residuals, all of the hypotheses are easily accepted. The response variable in the experiments is the NC of each algorithm on every instance. The interactions between $a$ and $b$ with 95% Tukey Honest Significant Difference (HSD) confidence intervals are depicted in Fig. 10, which shows that the NC difference between the settings is statistically significant: MRH achieves a better NC when the FNIF weight exceeds the LACF weight, which means that it is better to select time slots that minimize the number of newly rented intervals than time slots with the lowest actual cost, i.e., to put a greater weight on the FNIF rule than on the LACF rule. The reason lies in the fact that task execution times are much shorter than the applied pricing interval (one hour) in the considered workflow instances, so rented intervals would be wasted to a large extent if the time slots with the lowest actual cost were always selected. Although the remaining differences in NC are not statistically significant, the experimental results show which combination of $a$ and $b$ obtains the lowest NC, and that combination is adopted in this paper.

Fig. 11 shows the means plot of $c$ with 95.0% Tukey HSD intervals and implies that there is no statistically significant difference in the performance of MRH for different values of $c$, which illustrates that the weight of the BMF rule is not important for the hybrid rule.


Fig. 12. Epigenomic workflows.


Fig. 13. Epigenomic workflow results when software setup time equals 0.

The reason lies in the fact that the task execution times of the considered workflows are much shorter than the considered pricing interval (one hour), so few chances are left for MRH to consolidate tasks to improve the utilization of rented intervals; we therefore select the lowest tested value of $c$. According to the above analysis, the resulting combination of $a$, $b$, and $c$ is used by MRH for the five types of workflows under study.

B. Algorithm Comparisons

1) Comparisons With Epigenomics: Fig. 12 illustrates an example of Epigenomics (biology) workflows, which consist of eight different kinds of tasks: fastsplit, filtercontants, sol2sanger, fastq2bfq, map, mapmerge, mapindex, and pipeup. Figs. 13 and 14 show the average normalized cost of Epigenomics workflows when the software setup time equals 0 s and 5, 10, 15, and 20 s, respectively. The horizontal axis is labeled with two categories: the bottom labels are the numbers of nodes in the workflows and the top labels are the deadline factors. The two figures show that the proposed MRH outperforms IC-PCP in most cases (a saving of about 20% on resource renting costs for instances with the largest deadline factor). However, the proposed MRH is outperformed by IC-PCP in the cases with both small deadline factors and short software setup times, for two reasons. 1) The CPLEX-based deadline division method adopted in MRH produces tight task deadlines for short tasks when the workflow deadline is strict; tight task deadlines hamper the reuse of both software and rented resource intervals, which increases the total renting cost. 2) Fewer chances are given to the proposals to improve the utilization of rented intervals by task consolidation when strict deadlines are given.

Fig. 14. Epigenomic workflow results when software setup time equals 5, 10, 15, and 20 s.

For example, Fig. 15 shows the Gantt chart of a schedule generated by MRH when the deadline factor is 4 (about six hours): tasks in Regions 1 and 2 are well consolidated and the rented hours are fully used, but for the remaining time fractions in Region 3 it is hard to find well-matched tasks because of the tight deadline factor. Fig. 16 depicts the Gantt chart of MRH for the same workflow when the deadline factor is 32 (about 39 hours), which shows that the rented hours are fully used thanks to improved task consolidation. As software setup times increase, the performance of the proposal improves considerably compared to IC-PCP, as shown in Fig. 14.
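How "fully used" the rented hours are can be quantified as the ratio of busy time to paid time; a minimal sketch, assuming the one-hour pricing interval of the experiments:

// Utilization of rented intervals = total task busy time / total paid time,
// assuming the one-hour pricing interval used in the experiments.
class IntervalUtilization {
    static final double INTERVAL_SECONDS = 3600.0;

    static double utilization(double busySeconds, int rentedIntervals) {
        return busySeconds / (rentedIntervals * INTERVAL_SECONDS);
    }

    public static void main(String[] args) {
        // e.g., 612 s of work on a single rented hour -> 17% utilization
        System.out.printf("%.0f%%%n", 100 * utilization(612, 1));
    }
}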


Fig. 17. LIGO workflows.

Fig. 15. Gantt Chart for the MRH schedule result when the deadline factor equals 4.

Fig. 18. LIGO results when software setup time equals 0 s.

Fig. 16. Gantt Chart for the MRH schedule result when the deadline factor equals 32.

MRH outperforms IC-PCP significantly in total renting costs (although the NC differences are hard to discern in the figure) when the software setup time is 5 s, 10 s, 15 s, or 20 s. The reason lies in the fact that the scheduling strategy of IC-PCP binds sequential tasks together and schedules them on the same VM instance; sequential tasks usually need different types of software, which generates considerable software setup times. Furthermore, both MRH and IC-PCP are very fast heuristics: MRH takes only a few seconds for workflows with thousands of tasks.

2) Comparison With LIGOs: The network structure of LIGO (gravitational physics) workflows is shown in Fig. 17; they consist of four kinds of tasks: TmpltBank, Inspiral, Thinca, and TrigBank. Figs. 18 and 19 show the average NCs of LIGO workflows when the software setup time is zero and greater than zero, respectively.
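The setup-time penalty just described for IC-PCP is straightforward to tally: on a given VM, one setup is paid whenever the next task needs a different software type than its predecessor. A hedged sketch, with the task representation assumed for illustration:

import java.util.List;

// Total software setup time paid on one VM's task sequence: the first task
// installs its software, and every software change afterwards pays again.
// The representation is an illustrative assumption.
class SetupTimeTally {
    static double totalSetup(List<String> softwarePerTask, double setupSeconds) {
        if (softwarePerTask.isEmpty()) return 0;
        double total = setupSeconds; // initial installation
        for (int i = 1; i < softwarePerTask.size(); i++)
            if (!softwarePerTask.get(i).equals(softwarePerTask.get(i - 1)))
                total += setupSeconds;
        return total;
    }
}

Under this counting, binding the LIGO chain TmpltBank, Inspiral, Thinca, TrigBank to one VM pays four setups, whereas grouping four Inspiral tasks on one VM pays only one.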

Fig. 19. LIGO results when software setup time equals 5, 10, 15, and 20 s.




Fig. 20. Gantt Chart for the MRH scheduling result when the deadline factor equals 4.

Fig. 21. Gantt Chart for the MRH scheduling result when the deadline factor equals 32.

When the software setup time equals 0 s, the performance of the proposal in the cases with tight deadline factors is worse than that of IC-PCP, a tendency similar to that of the Epigenomics instances. The two figures show that the average NC of the workflows decreases from 16 to nearly 1 as the deadline factor increases from 2 to 128. The reason is that the shortest execution duration of each LIGO workflow is much shorter than the pricing interval (one hour): when the deadline factor is below 4, only a small part of the rented hours can be used, which leads to a very low resource utilization rate (e.g., 17% for the workflow in Fig. 20), whereas the utilization rate increases to about 90% when the deadline factor reaches 32. Fig. 19 shows that, as software setup times increase to 5, 10, 15, and 20 s, the proposed MRH substantially outperforms IC-PCP on all instances, as occurs with Epigenomics. When the number of nodes increases, the performance of the proposed heuristic improves greatly, and the average NC approaches 1 (the lower bound) in some cases with large deadline factors. This is because the subsequent task selection procedure tends to schedule tasks needing the same kind of software together, whereas IC-PCP binds sequential tasks with different software requirements together and schedules them on the same VM instance. In the Gantt chart of Fig. 21, for example, tasks with the same software requirements are bundled together so as to share the same installed software and the same rented intervals, which considerably decreases renting costs; the two sub-figures below are zoomed views of parts of the top sub-figure, the left consisting of tasks needing the TmpltBank software and the right of Inspiral tasks. This is not the case for IC-PCP: the Gantt chart in Fig. 22 shows that binding sequential tasks with different software requirements together on the same VM instance leads to a sparse distribution of tasks and a lower resource utilization rate.
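The grouping behavior visible in Fig. 21 can be sketched as a simple preference in task selection: among ready tasks, pick one whose software matches what the VM already has installed, and fall back to any ready task otherwise. The names below are illustrative assumptions, not MRH's exact procedure.

import java.util.List;

// Prefer a ready task that reuses the VM's installed software; this mirrors
// the consolidation visible in Fig. 21. Names are illustrative assumptions.
class SoftwareAwareSelection {
    static final class Task {
        final String id;
        final String software;
        Task(String id, String software) { this.id = id; this.software = software; }
    }

    static Task next(List<Task> ready, String installedSoftware) {
        for (Task t : ready)
            if (t.software.equals(installedSoftware)) return t; // reuse the setup
        return ready.isEmpty() ? null : ready.get(0);           // otherwise any task
    }
}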

Fig. 22. Gantt Chart for the IC-PCP scheduling result when the deadline factor equals 32.

TABLE II
AVERAGE NORMALIZED COST FOR OTHER TYPES OF WORKFLOWS

A zoomed view of Fig. 22 is shown at the bottom, in which the small black rectangles between different tasks represent software setups.

3) Comparison With Other Workflows: To further evaluate the performance of the proposal, three more kinds of realistic scientific workflows, namely CyberShake (earthquake science), Montage (astronomy), and SIPHT (biology), are tested. Their structures are taken from https://confluence.pegasus.isi.edu. Overall results are shown in Table II, revealing that the proposal is much better than IC-PCP on the CyberShake and Montage workflows.


Fig. 23. LIGO results of MOHEFT and MRH.

For example, when the deadline factor is greater than 16 on the Montage workflows, the average renting cost of MRH is less than half of the IC-PCP resource renting cost. On the SIPHT workflows, the proposal is better than IC-PCP when the deadline factor is greater than 8 (a maximum renting cost saving of about 14%); the reason is similar to the Epigenomics case.

4) Comparisons With MOHEFT: To compare the results of MRH and MOHEFT, the Relative Deviation Index (RDI) of the resource renting cost is adopted: the RDI of an algorithm on an instance is the deviation of its NC from the smallest NC obtained by MRH and MOHEFT on that instance, relative to that smallest NC, and ARDI is the average RDI over a given set of workflow instances. As MOHEFT is a multi-objective heuristic, for each workflow instance it produces a Pareto set of solutions, each with its own execution time and renting cost. For a fair comparison, the execution time of each MOHEFT solution is adopted as the deadline of MRH and a corresponding MRH solution is obtained, so that MRH generates one solution for every MOHEFT solution of each workflow instance. For each such pair, the RDIs of MRH and MOHEFT are computed with respect to the cheaper of the two costs. Fig. 23 plots the ARDI of MRH and MOHEFT for different deadline factors. The overall tendency is that MRH achieves zero ARDI in most cases while the ARDI of MOHEFT fluctuates between 0.5 and 1, which implies that MRH usually obtains much better results than MOHEFT. Moreover, the average execution time of MOHEFT is about 650 s on instances with 1000 tasks, whereas MRH needs only about 2.3 s per deadline on such instances, and even its total execution time over all tested deadlines remains far below that of MOHEFT.
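Under the reading above, RDI measures each algorithm's deviation from the cheaper of the two costs on the same instance, and ARDI averages this over instances. Since the exact formula is garbled in this version of the text, the sketch below is a plausible reconstruction rather than the paper's verbatim definition.

// RDI of a cost against the cheaper of the two algorithms' costs on the
// same instance; ARDI averages RDI over a set of instances. A plausible
// reconstruction, not the paper's verbatim formula.
class RelativeDeviationIndex {
    static double rdi(double cost, double rivalCost) {
        double best = Math.min(cost, rivalCost);
        return (cost - best) / best;
    }

    static double ardi(double[] costs, double[] rivalCosts) {
        double sum = 0;
        for (int i = 0; i < costs.length; i++)
            sum += rdi(costs[i], rivalCosts[i]);
        return sum / costs.length;
    }
}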


VI. CONCLUSION

We have considered VM instance provisioning for workflow applications in clouds with interval-based charging, setup times, and deadline constraints; the objective is to minimize resource renting costs. For the problem under study, we proposed MRH, a heuristic consisting of a deadline division stage and a task scheduling stage. We presented a method to divide the workflow deadline into task deadlines based on the solution of a relaxed problem. During task scheduling, the next task to schedule is determined by a task depth-based priority rule, and three task scheduling rules were developed to account for different factors (newly rented intervals, setup times, and resource utilization); from these three rules we constructed a weighted hybrid rule whose weights are calibrated by experiments on randomly generated instances. MRH was compared with IC-PCP and MOHEFT over benchmark instances of five types of real workflows and outperformed both significantly in most cases, reducing normalized costs by up to 78.57%. However, MRH is outperformed by IC-PCP when workflow deadline factors are small and software setup times are short on some workflow types, such as Epigenomics and SIPHT. In addition, MRH is fast enough to meet the responsiveness requirements of cloud computing. For future research, the impact of different pricing interval lengths on resource provisioning algorithms is worth studying, as is resource provisioning for task-batch-based or instance-intensive workflows.

REFERENCES

[1] E. Deelman et al., “Pegasus: A framework for mapping complex scientific workflows onto distributed systems,” Scientific Programming, vol. 13, no. 3, pp. 219–237, 2005.
[2] M. Wieczorek, R. Prodan, and T. Fahringer, “Scheduling of scientific workflows in the ASKALON grid environment,” ACM SIGMOD Rec., vol. 34, no. 3, pp. 56–62, 2005.
[3] F. Berman et al., “New grid scheduling and rescheduling methods in the GrADS project,” Int. J. Parallel Programming, vol. 33, no. 2–3, pp. 209–229, 2005.
[4] M. Mao and M. Humphrey, “Auto-scaling to minimize cost and meet application deadlines in cloud workflows,” in Proc. IEEE Int. Conf. High Performance Computing, Networking, Storage and Anal., 2011, pp. 1–12.
[5] S. Abrishami, M. Naghibzadeh, and D. Epema, “Deadline-constrained workflow scheduling algorithms for IaaS clouds,” Future Generation Comput. Syst., vol. 29, pp. 158–169, 2013.
[6] H.-M. Luo, C.-K. Yan, and J.-W. Luo, “Dynamic programming based grid workflow scheduling algorithm,” in Software Engineering and Knowledge Engineering: Theory and Practice. Berlin, Germany: Springer, 2012, pp. 993–1000.
[7] Q. Wu and Y. Gu, “Supporting distributed application workflows in heterogeneous computing environments,” in Proc. 14th IEEE Int. Conf. Parallel and Distrib. Syst., 2008, pp. 3–10.
[8] Z. Xiao, W. Song, and Q. Chen, “Dynamic resource allocation using virtual machines for cloud computing environment,” IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 6, pp. 1107–1117, 2013.
[9] H. Hsiao, H. Chung, H. Shen, and Y. Chao, “Load rebalancing for distributed file systems in clouds,” IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 5, pp. 951–962, 2013.
[10] X. Zuo, G. Zhang, and W. Tan, “Self-adaptive learning PSO-based deadline constrained task scheduling for hybrid IaaS cloud,” IEEE Trans. Autom. Sci. Eng., vol. 11, no. 2, pp. 564–573, 2014.
[11] M. Mao, J. Li, and M. Humphrey, “Cloud auto-scaling with deadline and budget constraints,” in Proc. IEEE/ACM Int. Conf. Grid Computing, 2010, pp. 41–48.
[12] R. Van den Bossche, K. Vanmechelen, and J. Broeckhove, “Cost-optimal scheduling in hybrid IaaS clouds for deadline constrained workloads,” in Proc. IEEE 3rd Int. Conf. Cloud Computing, 2010, pp. 228–235.
[13] E.-K. Byun, Y.-S. Kee, J.-S. Kim, E. Deelman, and S. Maeng, “BTS: Resource capacity estimate for time-targeted science workflows,” J. Parallel Distrib. Computing, vol. 71, no. 6, pp. 848–862, 2011.



[14] E.-K. Byun, Y.-S. Kee, J.-S. Kim, and S. Maeng, “Cost optimized provisioning of elastic resources for application workflows,” Future Gener. Comput. Syst., vol. 27, no. 8, pp. 1011–1026, 2011.
[15] J. J. Durillo and R. Prodan, “Multi-objective workflow scheduling in Amazon EC2,” Cluster Computing, pp. 1–21, 2013.
[16] L. David and I. Puaut, “Static determination of probabilistic execution times,” in Proc. 16th Euromicro Conf. Real-Time Syst., 2004, pp. 223–230.
[17] M. A. Iverson, F. Ozguner, and L. C. Potter, “Statistical prediction of task execution times through analytic benchmarking for scheduling in a heterogeneous environment,” in Proc. 8th Heterogeneous Computing Workshop, 1999, pp. 99–111.
[18] R. H. Möhring, “Minimizing costs of resource requirements in project networks subject to a fixed completion time,” Oper. Res., vol. 32, no. 1, pp. 89–120, 1984.
[19] R. Bajaj and D. P. Agrawal, “Improving scheduling of tasks in a heterogeneous environment,” IEEE Trans. Parallel Distrib. Syst., vol. 15, no. 2, pp. 107–118, 2004.
[20] J. Yu, R. Buyya, and C. Tham, “Cost-based scheduling of scientific workflow applications on utility grids,” in Proc. 1st Int. Conf. e-Science and Grid Computing, 2005.
[21] Y. Yuan, X. Li, Q. Wang, and X. Zhu, “Deadline division-based heuristic for cost optimization in workflow scheduling,” Inf. Sci., vol. 179, no. 15, pp. 2562–2575, 2009.
[22] S. Abrishami, M. Naghibzadeh, and D. Epema, “Cost-driven scheduling of grid workflows using partial critical paths,” IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 8, pp. 1400–1414, 2012.
[23] IBM ILOG CPLEX Optimization Studio. [Online]. Available: http://en.wikipedia.org/wiki/CPLEX
[24] P. De, E. Dunne, J. Ghosh, and C. Wells, “Complexity of the discrete time-cost tradeoff problem for project networks,” Oper. Res., vol. 45, no. 2, pp. 302–306, 1997.
[25] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.-H. Su, and K. Vahi, “Characterization of scientific workflows,” in Proc. 3rd Workshop Workflows in Support of Large-Scale Sci., 2008, pp. 1–10.
[26] T. Bartz-Beielstein, M. Chiarandini, L. Paquete, and M. Preuss, Experimental Methods for the Analysis of Optimization Algorithms. Berlin, Germany: Springer, 2010.

Xiaoping Li (M’09–SM’12) received the B.Sc., M.Sc., and Ph.D. degrees in applied computer science from the Harbin University of Science and Technology, Harbin, China, in 1993, 1999, and 2002, respectively. He joined Southeast University, Nanjing, China, in 2005, and is currently a Professor with the School of Computer Science and Engineering. From January 2003 to December 2004, he performed postdoctoral research with the Department of Automation, Tsinghua University, Beijing, China. From March 2009 to March 2010, he was a Visiting Professor with the National Research Council, London, ON, Canada. He is the author or coauthor of more than 100 academic papers, some of which have been published in international journals such as the IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, the IEEE TRANSACTIONS ON SERVICES COMPUTING, Information Sciences, Omega, European Journal of Operational Research, International Journal of Production Research, Expert Systems with Applications, Journal of Network and Computer Applications, and Engineering Optimization. His research interests focus on scheduling in cloud computing, scheduling in cloud manufacturing, machine scheduling, project scheduling, terminal container scheduling, learning effects in scheduling, and manufacturing software interoperability.

Zhicheng Cai (M’14) received the B.Sc. and Ph.D. degrees in computer science and engineering from Southeast University, Nanjing, China, in 2009 and 2015, respectively. He is currently a Lecturer with the School of Computer Science and Engineering, Nanjing University of Science and Technology. His research interests focus on load prediction, dynamic capacity management, task scheduling in clusters such as grids and clouds, project scheduling, and service-oriented computing. He is the author of several publications in international journals such as the IEEE TRANSACTIONS ON SERVICES COMPUTING and at conferences such as ICSOC, ICPADS, SMC, and CASE.