Modeling and Scheduling Hybrid Workflows of Tasks and Task Interaction Graphs on the Cloud
Mahmoud Naghibzadeh
Department of Computer Engineering, Ferdowsi University of Mashhad
[email protected]

Abstract—A workflow represents a complex activity that is often modeled as a Directed Acyclic Graph (DAG) in which each vertex is a task and each directed edge represents both precedence and possible communication from its originating vertex to its ending vertex. When the execution of a task is completed, the communication with its successor(s) can start and the anticipated results are transferred. A task can start executing only after all of its parents have completed and their results (if any) have been received. This constraint rules out the more general case in which some tasks interact during their executions. In this paper, a task model composed of both interaction and precedence of tasks is introduced. It is shown that, under certain conditions, this kind of graph can be transformed into an extended DAG, called a Hybrid DAG (HDAG), composed of tasks and super-tasks. With this, it becomes possible to model many applications in the form of hybrid workflows and go about scheduling them on the cloud. In this paper, the validity conditions of such graphs are investigated, a verification algorithm is developed, and the time used by this algorithm is evaluated. An algorithm for scheduling hybrid workflows is presented and its performance is evaluated. The effect of different values of the relative deadline on the schedulability of workflows is also presented.

Keywords: Task interaction-precedence graph, hybrid DAG, hybrid workflow, Cloud computing
I. INTRODUCTION
Activities of many composite processes, such as likelihood propagation in Bayesian networks, family trees (actually graphs) in genealogy, and scientific and industrial workflows, are modeled as Directed Acyclic Graphs (DAGs) in which vertices are connected via directed edges [1]. As the name suggests, an important property of such graphs is that a DAG contains no cycle. This model is applicable where there are both control and data dependencies. Control dependencies impose a partial precedence relation between tasks. Besides, if a task has to transfer data to one (or more) of its successors, it can do so only at the end of its execution; it is not possible for two or more tasks to interact in the middle of their executions. This resembles an assembly line in which one station has to complete its job and then pass the object to the next station. In any case, a DAG represents a partial ordering of tasks that must be obeyed in their executions. Figure 1 shows a DAG composed of 10 vertices and 12 edges. The number next to a vertex shows its required execution time averaged over all applicable resource types.
Similarly, the number on an edge shows the average data transfer time from the task of the originating vertex to the task of the ending vertex. The transfer takes place at the end of the execution of the task of the originating vertex.

Figure 1: A DAG of precedence-communicating tasks
On the other hand, a Task Interaction Graph (TIG) is used to show which tasks of an application interact during their executions [2]. Tasks of a TIG may start at different times as long as their interactions remain possible, and each task can continue its execution as long as it does not need to synchronize or communicate with any other task. Only two directly connected tasks, i.e., neighbors, can interact, and they may not actually do so, depending on the logic of the corresponding programs and the current values of data objects. In some circumstances, the interaction may even be one-sided. The nature of the interaction depends on the parallel (or concurrent) problem being solved; it may be an actual information transfer or even the need to interact indirectly through shared data. Parallel execution of a TIG means all tasks of the TIG must start simultaneously (on different processors) and the processors are freed when the executions of all tasks of the TIG are completed. Concurrent execution, on the other hand, does not necessitate the simultaneous start of all tasks and the simultaneous release of all processors, but it requires that whenever two (or more) tasks need to interact they must be in the state of execution, i.e., their execution has started but not yet completed. Therefore, a TIG, G(V, E), is composed of a set of vertices V and a set of undirected edges E, where each edge connects two vertices and the whole graph is connected. The definition of a TIG is similar to the general definition of an undirected connected graph. There is, however, a distinction: in graph theory an undirected edge is often interpreted as two directed edges, but this is not true for a TIG, because if a directed edge were taken to show precedence there would be a cycle corresponding to every pair of vertices joined by an undirected edge, which is not allowed. If the edges showed dependencies, we would again face circular dependencies, which is not allowed. Each edge in a TIG means that the tasks corresponding to its vertices can interact and transfer data during their executions. As an example of a TIG, consider a parallel program designed to multiply a sparse Matrix M with m rows and n columns by Vector V with n rows. This simple example has a wide variety of scientific applications and has recently been ported to new technologies such as General Purpose Graphics Processing Units (with thousands of cores). Suppose Task i's responsibility is the pairwise multiplication of the elements of Row i, i = 1, 2, …, m, of Matrix M with the corresponding elements of Vector V and the computation of their sum, i.e.,

R[i] = Σ_{j=1..n} M[i, j] × V[j].
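As an illustration of this row-wise decomposition, a minimal Python sketch follows (not from the paper; task granularity and names are illustrative). Each row is handed to its own task, and the tasks interact only indirectly, through shared read access to Vector V.

--------------------------------------------------------------------------------
Python sketch: row tasks of sparse matrix-vector multiplication
--------------------------------------------------------------------------------
from concurrent.futures import ThreadPoolExecutor

def row_task(row, V):
    # One task: multiply the nonzero entries of a row by the
    # corresponding elements of V and sum the products.
    return sum(m * V[j] for j, m in enumerate(row) if m != 0)

def sparse_mxv(M, V):
    # One task per row; the tasks "interact" only indirectly,
    # through shared read access to Vector V.
    with ThreadPoolExecutor(max_workers=len(M)) as pool:
        return list(pool.map(lambda row: row_task(row, V), M))

M = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
V = [1, 2, 3]
print(sparse_mxv(M, V))   # [7, 6, 19]
--------------------------------------------------------------------------------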
Here, Task k, k = 1, 2, …, n, and Task l, l = 1, 2, …, n, l ≠ k, conflict on those elements of Vector V for which both M[k, j] and M[l, j] are nonzero [3]. These two tasks cannot simultaneously access the same memory location. The nature of the task interactions in this example is indirect. In the context of scheduling DAGs and TIGs, a task is a piece of work that is assigned in its entirety to one processor.

The cloud provides wide varieties of resources and software, in the forms of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS), for public use. Users can lease these services for different applications and for the needed periods of time, and pay as they are used [4][5]. This is a great opportunity for the many users who lack the resources required to run their complex computational jobs. Scientific workflows that solve complex problems with hundreds of tasks require many simultaneous processing resources that users may not have. Therefore, the cloud has become a widely used Internet-based infrastructure to overcome this problem. Scheduling workflows on the cloud is an ongoing research activity with continuous improvements in qualities of service such as make-span, price to be paid, privacy preservation, energy efficiency, and fairness. There are many types of workflows for different applications [6]. A scientific workflow is often modeled as a DAG, which is unable to handle task interactions that may happen during the tasks' executions.

In this paper, a new task modeling approach called the Task Interaction-Precedence Graph (TIPG) is introduced, in which task precedence, dependency, and interaction are all possible, subject to passing validity tests. The modeling starts with tasks only, and TIGs are automatically recognized by the developed algorithms. The correctness and validity of the model are also automatically verified, which replaces the tedious work of doing so manually. The TIPG model is an extension of the DAG model in which interaction is also made possible. Between a given pair of tasks there may be no edge, which means there is no communication, interaction, or precedence relation between the two tasks. There may be a directed edge, which means the execution of the task the edge begins from must precede that of the other task; at the same time, the former task can communicate, i.e., send data, to the other task at the end of its execution. There may be an undirected edge, which means the two tasks can interact during their executions, i.e., share input, output, or intermediate data. Further, it is assumed that each task or super-task is executed only once and loops are not allowed; even each task within each super-task is executed only once. The difficulties involved in scheduling applications which can be modeled using TIPGs on the cloud are studied. Some preliminary scheduling results are presented which show that the success rate of scheduling with respect to the given deadlines is improved. However, the originality of this work lies in the introduction of a new task model which allows tasks of a workflow to directly or indirectly interact during their executions.

The remainder of the paper is organized as follows. Related work is briefly sketched in Section 2; the problem formulation is elaborated in Section 3; Section 4 provides the details of the transformation of a TIPG into a potential hybrid DAG of tasks and super-tasks and then checks the validity of the potential hybrid DAG; if it is indeed valid, the workflow model is complete. Section 5 is concerned with scheduling such workflows. A summary and future work section precedes the references.

II. RELATED WORK
In scheduling workflows on the cloud, we are faced with processor and bandwidth varieties, an assortment of execution orderings of ready tasks, and the different qualities of service needed by users. Optimal scheduling of a workflow represented by a general DAG is an NP-hard problem [7]. Therefore, different approaches such as integer linear programming [8], genetic algorithms [9], and heuristic algorithms [10] have been proposed to produce close-to-optimal solutions. The goal is often meeting a user-defined deadline, minimizing the computational cost of the workflow, minimizing the make-span, and/or maximizing the success rate of the proposed algorithm with respect to user limitations. Some algorithms are multi-objective, meaning that they are designed to optimize more than one objective, such as minimizing make-span and cost at the same time [11]. Here, make-span is defined as the time from when the workflow is submitted to the cloud to when its execution is completed. Success rate is the ratio of the number of workflows successfully scheduled to the total number of workflows examined. There are other aspects to scheduling workflows. Data and computation privacy is something many companies would like to preserve; they hesitate to send their sensitive data or delicate computations to the public cloud. Scheduling on a hybrid cloud, in which private data and computations are kept on the private cloud and others may be sent to public clouds, has recently been studied [12]. Simultaneous scheduling of many workflows belonging to one organization, with the goal of, for example, minimizing make-span, is an additional active area of current
research [13]. Fairness is a newer concern for workloads of workflows all belonging to one enterprise that wants to be nondiscriminatory in assigning resources to its different workflows [14].

Scheduling TIGs is another field, and it forms the foundation of this paper's scheduling of extended workflows. In a connected parallel TIG, it is not possible to complete one task (or many, but not all) at a time; the whole TIG must complete at one time. For example, the whole sparse matrix-vector multiplication can be thought of as one super-task, and computing each element of the resulting vector can be organized as one task. Tasks of such a super-task can run simultaneously on different hardware resources of the cloud. However, it is possible to assign more than one task to one processor to be executed in some specified order; still, only one task can be executing at any given time on each processor, i.e., single-programming. In this research, a super-task is itself considered to be a TIG, hence these two concepts may be used interchangeably. Figure 2 shows a TIG for the parallel multiplication of a small sparse matrix and a vector. A TIG may represent a complete independent application or it may be part of a larger application. For example, sparse matrix-vector multiplication is seldom a complete application; it can be part of solving a greater problem, such as matrix multiplication organized as matrix-vector products to reduce the number of operations, or solving a system of linear equations.

Figure 2: The TIG of matrix-vector multiplication
TIGs have traditionally been scheduled on multiprocessors and computer clusters [15]. Nowadays, distributed systems and clouds provide favorable platforms for running parallel applications [2]. A TIPG is neither a simple DAG nor a TIG but a new graph model in which interaction, communication, and precedence are all allowed. The closest model to this is a DAG (or workflow) composed of simple (sequential) tasks and parallel tasks. Nevertheless, there are many differences between the two. (1) Before the production of a DAG of tasks and parallel tasks (called a TPDAG for short), the parallel tasks of the application being modeled have to be recognized and treated as indivisible units, similar to simple tasks, within the model. A TIPG model, however, starts with tasks only and there is no composite task at the beginning. Two different types of relations are allowed between each pair of tasks: interaction, and precedence combined with communication from the preceding task to the descending task. Interaction resembles concurrency of tasks and information exchange during execution. The recognition of super-tasks in the TIPG is automatic and without the interference of users. (2) Parallel tasks of a TPDAG are co-scheduled to run in parallel, i.e., all subtasks of a parallel task start simultaneously and the whole parallel task completes at one time. This means a child of a task which is coalesced into a parallel task cannot start until the whole parallel task is completed and its data (if any) is received [9]. On the other hand, tasks of a super-task within a TIPG can also run concurrently, i.e., they can start at different times and complete at different times as long as the required communications are possible. Besides, a child of a super-task can start its execution as soon as its parent tasks (not the whole super-task) are completed. This can have a great impact on the time-span of the TIPG and, as a result, on the price to be paid for running the TIPG on the cloud. (3) Inputs to a parallel task are received by the parallel task itself, and outputs are also sent by the parallel task. For a super-task of a TIPG, any input from tasks external to the super-task is independently received by the receiving task within the super-task, and any output from a task within the super-task is sent directly by that sending task. As a result, parallel communication from/to tasks of a super-task is the rule. This is another flexibility which can have a great impact on the quality of scheduling services. Our work can be considered an extension of workflows of tasks and parallel tasks, which have been used by many applications; a well-known such application is the Proteomics workflow [16]. Many workflow management systems, such as Pegasus, have the capability to handle parallel tasks.

III. PROBLEM FORMULATION
A directed acyclic graph, when used to model workflows, prohibits tasks from directly or indirectly interacting during their executions. We propose an extended model called the Task Interaction-Precedence Graph (TIPG) which simultaneously allows interaction, communication, and precedence. A TIPG, G(V, U, D), consists of a set of vertices, V, a set of undirected edges, U, and a set of directed edges, D. Each edge, directed or undirected, connects one vertex to another, and there can be at most one edge between any pair of vertices. The transfer of information between two interacting tasks may occur at any time during their executions; even at the end of its execution, one task may send information to another (which has not yet completed) as a last chance. Interactions are represented by undirected edges. On the other hand, each directed edge expresses two things: (1) a precedence relation, i.e., the execution of the starting vertex of the edge must precede that of the ending vertex, and (2) possible data communication, at the end of the execution of the originating vertex, to the ending vertex before the latter can start. In introducing the TIPG, our purpose is to extend DAGs to provide a foundation for allowing interaction between any pair of tasks while preserving the directed and acyclic attributes of the graph. Doing so makes it possible to use the rich background of scheduling workflows that are modeled by DAGs. Therefore, considering directed edges only, a TIPG must be acyclic, i.e., it should not be possible to start from some vertex and follow a sequence of directed edges back to the same vertex. For undirected edges only, however, cycles are not forbidden. For a TIPG to be valid, it must be acyclic with respect to directed edges and also conflict-free.

Definition 1: A TIPG is conflict-free if there is no directed path between any pair of vertices of any connected component of the TIPG, where only undirected edges are considered in obtaining these connected components.

Definition 2: In the rest of this paper, wherever we talk about connected components of a TIPG G(V, U, D) we mean a sub-graph G′(V′, U′) ⊆ G(V, U, D), V′ ⊆ V, U′ ⊆ U, i.e., U′ is composed of only undirected edges of the TIPG.

Definition 2 complies with the classical definition of connected components for undirected graphs when all directed edges of the TIPG are ignored. Recall that, in computer science, for a directed graph the concept of strongly connected components is used rather than connected components. If there is any conflict in a TIPG, it means that there is at least one pair of vertices such that one vertex precedes the other and, at the same time, they can interact during their executions. According to Definition 2, this pair belongs to one component which is supposed to form a TIG. However, this component is not a valid TIG [15]. Recall that the concern of this work is to schedule hybrid DAGs of tasks and TIGs, not other task structures. Hence the design of such a system is incorrect and should be fixed before going about scheduling it. Therefore, the conflict detection algorithm, which will be designed later, can be thought of as one type of design verification [17]. It should be mentioned that conflict-freeness is a sufficient condition, not a necessary one. There could be examples with specific execution times and execution start-end restrictions for which the condition is not satisfied but the system is schedulable. In this paper, no such restrictions are imposed on the system; therefore, for the general case, conflict-freeness has to be ensured. An undirected edge between a pair of tasks means that these tasks can, directly or indirectly, interact at any time during their executions. This interaction includes possible information transfer from either vertex to the other, once or many times. If an undirected edge connects two vertices vi and vj, it is represented by either (vi, vj) or (vj, vi), with no distinction. However, if a directed edge connects two vertices vk and vl, it is represented by <vk, vl>, where vk is the starting vertex and vl is the ending vertex of the edge. Suppose that, for the DAG of Figure 1, tasks T1 and T2, T3 and T5, and T5 and T7 have to interact during their executions. The resulting TIPG is shown in Figure 3. In this figure, all execution and communication times are removed only to increase the clarity of the graphical representation of the TIPG, but they are implicitly in place. Besides, the average interaction times of the pairs of tasks T1 and T2, T3 and T5, and T5 and T7 are 6, 4 and 7, respectively. These values will be needed later for the workflow scheduling.
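To make the G(V, U, D) definition concrete, a minimal structural sketch follows (illustrative only; the paper's own representation is the matrix given after Figure 3). The three interactions listed above are added on top of Figure 1's tasks.

--------------------------------------------------------------------------------
Python sketch: a G(V, U, D) structure for a TIPG
--------------------------------------------------------------------------------
from dataclasses import dataclass, field

@dataclass
class TIPG:
    # G(V, U, D): vertices, undirected interaction edges, and
    # directed precedence/communication edges.
    V: set = field(default_factory=set)
    U: set = field(default_factory=set)   # frozenset({vi, vj}), unordered
    D: set = field(default_factory=set)   # (vk, vl), ordered

    def add_interaction(self, vi, vj):    # undirected edge (vi, vj)
        self.V |= {vi, vj}
        self.U.add(frozenset((vi, vj)))

    def add_precedence(self, vk, vl):     # directed edge <vk, vl>
        self.V |= {vk, vl}
        self.D.add((vk, vl))

# The three interactions added to Figure 1's DAG to obtain Figure 3:
g = TIPG()
for a, b in [("T1", "T2"), ("T3", "T5"), ("T5", "T7")]:
    g.add_interaction(a, b)
--------------------------------------------------------------------------------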
Figure 3: A sample task interaction-precedence graph
Considering only directed edges, this TIPG is acyclic. After the connected components are produced, we will show that there is no directed path between any pair of vertices within any connected component; therefore the TIPG is conflict-free. This could then be a valid model of the tasks of a real application, although there is one more thing to check. We will discuss this check later; it concerns the whole DAG of tasks and super-tasks being acyclic. Assuming that it passes this check, such an application can be modeled by neither a DAG nor a TIG. To represent a TIPG, an N×N matrix, M, where N is the number of vertices, is used. In this matrix, if there is no edge between vi and vj, i, j = 1, 2, …, N, then M[i, j] = null; if there is a directed edge from vertex vi to vertex vj, then M[i, j] is set to the average data transfer time from vi to vj; otherwise, M[i, j] is set to −1 times the average interaction time between vi and vj, i.e.,

M[i, j] = null, if there is no edge between vi and vj;
M[i, j] = ci,j, the average data transfer time, if <vi, vj> ∈ D;
M[i, j] = −τi,j, the negative of the average interaction time, if (vi, vj) ∈ U.
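A minimal sketch of this matrix representation follows (illustrative; the helper names are hypothetical and Python's None stands for null).

--------------------------------------------------------------------------------
Python sketch: the matrix representation of a TIPG
--------------------------------------------------------------------------------
# M[i][j] = None       : no edge between vi and vj (null)
# M[i][j] = c,  c > 0  : directed edge <vi, vj>, average transfer time c
# M[i][j] = -t, t > 0  : undirected edge (vi, vj), average interaction
#                        time t, stored symmetrically

def empty_tipg(n):
    return [[None] * n for _ in range(n)]

def add_directed(M, i, j, comm_time):
    M[i][j] = comm_time              # precedence plus communication

def add_undirected(M, i, j, interact_time):
    M[i][j] = -interact_time         # interaction, both directions
    M[j][i] = -interact_time

# Toy TIPG: v0 -> v2, v1 -> v2, with v0 and v1 interacting for 6.
M = empty_tipg(3)
add_directed(M, 0, 2, 5)
add_directed(M, 1, 2, 3)
add_undirected(M, 0, 1, 6)
--------------------------------------------------------------------------------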
To distinguish directed edges from undirected edges, interaction times are represented by negative numbers. However, when necessary, the absolute values of these negative numbers will be used. Matrix M is usually very sparse with respect to non-null values. For example, for the TIPG of Figure 3 there would be 12 positive numbers, six negative ones, and 82 nulls.

IV. VALIDITY VERIFICATION OF HYBRID DAGS
In the context of this research, a TIPG must pass three tests to become a valid hybrid DAG. (1) The original TIPG must be acyclic with respect to directed edges. (2) The TIPG must be conflict-free. (3) Considering each connected component as an indivisible super-task with one entry and one exit, the resulting graph must be acyclic. Condition 2 is an absolute must; otherwise the model is not valid. Conditions 1 and 3 are restrictions of our research, meaning that this research aims at eventually scheduling hybrid workflows which can be modeled as DAGs, and no other models of workflows such as iteration structures. These restrictions will also enable us to use the vast knowledge of scheduling that has been developed by many other researchers in the area of DAG scheduling. An algorithm has to be designed to systematically perform these tests. To do so, after the first test succeeds, it has to find all connected components of the graph considering only undirected edges, i.e., at this stage the positive entries of Matrix M are treated as nulls and the negative ones as undirected edges. For our aim, it is safe to assume that a vertex by itself is not a connected component; thus a connected component has at least two vertices. Then, for every pair of vertices of every connected component, we have to make sure there is no directed path from one to the other, considering only the directed edges of the TIPG at this stage. This is the conflict-freeness of the TIPG. Using the notation of Definition 2, if for any two vertices vi and vj of the same component there is at least one directed path, over edges of D, from vi to vj, the TIPG is not conflict-free and hence the model is not valid. Then the whole TIPG is restructured in the form of a directed graph by considering each whole connected component as one indivisible super-task. All communications to the tasks of a super-task are received by the super-task itself, and all communications from the tasks of a super-task are sent by the super-task itself. The final check is to make sure that the new directed graph of tasks and super-tasks is acyclic. It is worth mentioning that one purpose of restructuring the graph is to run the validity test and make sure that there are no cycles in the model. It will be discussed later that each task of a super-task can retain some degree of autonomy in data communications. Algorithm 1 shows the steps to be taken to transform a TIPG into a directed graph and verify its validity as a hybrid DAG. At the start of the algorithm, Matrix M is the representation of the TIPG as explained earlier. This matrix will be augmented with new rows and columns corresponding to the connected components found. The number of elements in Vector V is equal to the number of vertices of the original TIPG. At the end, Vector V represents which vertex belongs to which component, i.e.,

V[i] = c, if vertex vi belongs to connected component c; V[i] = 0, otherwise.
Since there are many references in the literature for finding cycles (if any) in a directed graph (Test 1), it is assumed here that the given TIPG is already acyclic. Algorithm 1 first produces all connected components and then proceeds with the validity tests.

--------------------------------------------------------------------------------
Algorithm 1: Producing a potential Hybrid DAG (HDAG) from a TIPG and checking its validity
--------------------------------------------------------------------------------
1:  Boolean procedure HDAG_Production&Validity(M, V)
2:    while (exists new connected component, C)
3:      update Vector V to represent this component
4:    end while  /* from here on, "component" means connected component */
5:    for every (pair of vertices (vi, vj) of every component) do
6:      if (exists a directed path from vi to vj or from vj to vi)
7:        return false
8:      end if
9:    end for
10:   for every (component, C) do
11:     add a new row and column to M and fill entries using Rules 2 to 5
12:     remove rows and columns of nodes of C from M
13:   end for
14:   for every (vertex v of every component) do
15:     replace all positive values of row v by null
16:     replace all positive values of column v by null
17:   end for
18:   if (cycle(M, V)) return false
19:   else return true
20: end procedure
--------------------------------------------------------------------------------
In Lines 2 to 4, all connected components of the given TIPG are found, one by one. From Line 4 on, any reference to a component means a connected component. Vector V is set to represent which tasks of the original TIPG are part of which component, if any. The conflict-freeness of each connected component is then checked in Lines 5 to 9; if there is at least one conflict in any connected component, there is a design error and the original TIPG must be redesigned. Each component, called a super-task in its entirety, is now a new object in the model. To represent the connections of each of these new objects with other objects, for every component a new row and a new column are appended to Matrix M, i.e., super-tasks are represented in the same structure in which tasks are represented. The values of the elements of these rows and columns are filled with respect to the communication times from tasks and super-tasks to this super-task, using Rules 2 to 5 discussed below. Then all rows and columns of the tasks that are melted into a super-task are removed from the matrix. Lines 10 to 13 of the algorithm have this responsibility. Up to this point, a potential hybrid DAG is produced. Nevertheless, there should not be any edge from a task or super-task outside a given super-task to a task inside that super-task, or vice versa; such edges have to be redirected to/from the super-task itself. Lines 14 to 17 serve this purpose. What remains is to make sure that the resulting graph of tasks and super-tasks, where each super-task as a whole is considered an indivisible unit of work at this stage, is acyclic. Line 18 takes care of this job, and if a violation is diagnosed the graph is not acyclic; otherwise, the hybrid DAG is valid. In the body of Algorithm 1 the following rules are used.

Rule 1: All vertices and undirected edges of every connected component are separately grouped and encapsulated as a super-task before going about checking the correctness of the model.
Rule 2: Any data sent from an external task or super-task to a task within a super-task is sent to the super-task itself.
Rule 3: Any data to be sent from a task within a super-task to an external task or super-task is sent by the super-task itself.
Rule 4: If an external task or super-task wants to send data to more than one task within a super-task, the maximum or the average (depending on the scheduling method to be used) of the data over all receiving tasks is sent to the super-task. It is assumed that there are independent communication links between each pair of resources.
Rule 5: If more than one task within a super-task wants to send data to an external task or super-task, the maximum or the average (depending on the scheduling method to be used) of the data is sent by the super-task.

4.1 Complexity Analysis of Algorithm 1
The number of operations needed for Lines 2 to 4 of Algorithm 1 is proportional to (N+E), i.e., C1(N+E), where C1, N, and E are a constant, the number of vertices, and the number of edges of the original TIPG, respectively [18]. The number of operations for Lines 5 to 9 is proportional to the number of pairs of vertices times (NlogN + E), i.e., C2·N²·(NlogN + E). The number of operations for Line 18 is C3(N+E). Therefore, the time complexity of the algorithm is dominated by that of Lines 5 to 9. Consequently, the time complexity of the algorithm is O(N²(NlogN + E)). Algorithm 1 is also very space-efficient. It uses one matrix, M, and one vector, V. Matrix M is originally N×N and Vector V has N elements. After augmentation, the number of rows and columns of the matrix increases by as many as there are super-tasks. Since the number of super-tasks cannot reach N/2, the space complexity of the algorithm is O(N²). For the TIPG of Figure 3, Vector V will be filled as in Equation 1:

V^T = (1, 1, 2, 0, 2, 0, 2, 0, 0, 0)    (1)
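For concreteness, a minimal Python sketch of the two core steps of Algorithm 1 follows: finding connected components over undirected (negative) entries of M, and checking conflict-freeness over directed (positive) entries. The function names are hypothetical, and a plain depth-first search stands in for the reachability matrix mentioned below.

--------------------------------------------------------------------------------
Python sketch: connected components and the conflict-freeness test
--------------------------------------------------------------------------------
from collections import deque

def components(M):
    # Connected components over undirected edges only (negative
    # entries); a vertex with no undirected edge is in no component.
    n, comp, label = len(M), [0] * len(M), 0
    for s in range(n):
        if comp[s] == 0 and any(x is not None and x < 0 for x in M[s]):
            label += 1
            comp[s], queue = label, deque([s])
            while queue:
                u = queue.popleft()
                for v in range(n):
                    if M[u][v] is not None and M[u][v] < 0 and comp[v] == 0:
                        comp[v] = label
                        queue.append(v)
    return comp    # e.g. [1, 1, 2, 0, 2, 0, 2, 0, 0, 0] for Figure 3

def reaches(M, src, dst):
    # Directed reachability over positive entries (depth-first).
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in range(len(M)):
            if M[u][v] is not None and M[u][v] > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def conflict_free(M, comp):
    # No directed path may exist between two vertices of one component.
    n = len(M)
    return not any(comp[i] != 0 and comp[i] == comp[j] and reaches(M, i, j)
                   for i in range(n) for j in range(n) if i != j)
--------------------------------------------------------------------------------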
There are two connected components: (1) the TIG of tasks T1 and T2, and (2) the TIG of tasks T3, T5, and T7. There are no directed paths from T1 to T2, T2 to T1, T3 to T5, T5 to T3, T5 to T7, T7 to T5, T3 to T7, or T7 to T3. Therefore, the TIPG of Figure 3 is conflict-free. The algorithm checks and makes sure that there is no cycle in the whole new graph of tasks and super-tasks when each super-task is considered indivisible. Having shown so, the new graph is a valid hybrid workflow. An actual implementation of Algorithm 1 could include a preliminary stage of finding the reachability matrix of the graph, considering only directed edges. This can be done using the Floyd-Warshall algorithm [19]. This matrix can then be consulted many times in Line 6 of the algorithm. The complexity of this preliminary stage is O(N³), where N is the number of vertices. This is lower than the time complexity of the algorithm produced earlier; hence the overall time complexity is still O(N²(NlogN + E)). We can now proceed with scheduling valid hybrid workflows.

V. SCHEDULING HYBRID WORKFLOWS
In this section, given a valid hybrid DAG and its representation using the original Matrix M and the Vector V obtained by Algorithm 1, we would like to study its scheduling on the cloud. The original Matrix M is the matrix before being modified by Algorithm 1, i.e., the same matrix is given to Algorithms 1 and 2 as input. Super-tasks will be
identified by Vector V. In scheduling a hybrid workflow consisting of tasks and super-tasks represented by a hybrid DAG, there are many parameters to pay attention to. Each super-task usually needs more than one simultaneous processing resource to execute properly. There may be situations in which the workflow owner is not under pressure to complete the workflow soon; in such cases, interacting tasks can execute in a multiprogramming fashion. If l is the least number of simultaneous processing resources needed for scheduling a super-task composed of n ≥ l tasks, then this super-task can be scheduled using any of l, l+1, l+2, …, or n processing resources. This freedom to use different numbers of processing resources for each TIG adds a new dimension to the workflow scheduling problem. If the goal is to reduce the whole workflow's make-span, we would like to use more processing resources. However, if the goal is to reduce the cost (money to be paid to the cloud provider), it might be better to use a lower number of processing resources. The problem of scheduling a set of tasks on multiprocessor systems in which some tasks need more than one processor simultaneously has been studied before. The nature of such tasks is quite different from what is called a TIG. Each of those tasks is called a multiprocessor task, and to run it, all processors start and finish simultaneously. An example of such a task is one responsible for the simultaneous analysis of testing data received from many testers; data from each tester is analyzed by a different processing element [20]. Subtasks of a multiprocessor task are usually of finer granularity than tasks of a TIG. Tasks of a TIG, on the other hand, can start and finish at different times, as will be discussed later. Thirdly, each task of a super-task is allowed to send/receive communications to/from other tasks, i.e., a task of a super-task has its own identity and some degree of autonomy. Furthermore, the execution times of the tasks of a super-task are not usually all the same. When concurrency, as opposed to parallelism, is allowed, it is possible to assign different time intervals to different tasks of a TIG. At the exact time when a super-task becomes ready for execution, i.e., the predecessor tasks and/or super-tasks of each of its tasks are completed and their results have been received by the related tasks, the already-leased resources usually have not finished, and will not finish, their current tasks simultaneously. Therefore, scheduling a super-task in the middle of scheduling a hybrid workflow is quite different from scheduling a standalone TIG, by itself, on the cloud. The make-span of executing a super-task is not always equal to the sum of the execution times of its tasks; neither is it always equal to the minimum (or maximum) execution time over all its tasks. To design a scheduler using some of the existing scheduling approaches, our hybrid workflow solution would not be complete before we have a good estimate of the execution time of each of the super-tasks of the workflow. Therefore, a safe average execution time must be estimated to go about adapting the existing workflow schedulers to schedule a hybrid workflow. It is called average because the cloud offers a variety of processor (resource) types with different processing powers. Before the actual assignment of a super-task to specific processors, it is not possible to compute its exact execution time. To start with, this average is useful; when assignments are made during the scheduling process, the actual execution times are used. These facts make clear that efficient scheduling of hybrid workflows is a more complex job than efficient scheduling of simple workflows. However, existing schedulers can provide good suggestions on how to go about scheduling hybrid workflows. There are two major categories of TIGs: TIGs with parallel, i.e., multiprocessor, tasks and TIGs with concurrent tasks. In this paper, workflows composed of super-tasks of both parallel tasks and concurrent tasks are studied.

5.1 Workflows with parallel TIGs
In the parallel-tasks model, all tasks of a super-task are simultaneously assigned to as many processors as there are tasks. The common practice for parallel tasks is that they are assumed to be non-preemptable. A non-preemptable task is a task that, once assigned to a processor to run, the processor cannot be taken back before its execution is completed. Execution continues to the end, and tasks can communicate during their whole execution lifetime. The super-task completes when all of its tasks complete. Although tasks could theoretically have different execution times, in the parallel model all must be available for possible interactions with each other from the beginning up to completion. For example, in sparse matrix-vector multiplication, computing each element of the resulting vector could be assigned to a different processor. Because of the sparseness of the matrix, not all processors have to do the same number of scalar multiplications and additions. However, because they use shared data, a task with fewer scalar multiplications may logically finish later than a task with more. In such an environment, the whole result vector must move to the next stage of processing at one time. This is the simple model of TIGs assumed in the design of our first scheduling experiment. That is, all the resources needed by tasks of a super-task are assigned, and at the end taken back, simultaneously. The assignment time is when all input data of all tasks from outside the super-task have been received. This assumption will be relaxed in the next subsection. In the hybrid workflow model presented here, for the purpose of scheduling, there is no need to add a master or coordinator task on top of each super-task, which would otherwise impose extra overhead on the system. Each task of a super-task may be assigned to a different processor, and it has the autonomy to independently receive data from its predecessor tasks outside the super-task and send its results to its successor tasks outside the super-task. However, tasks of a super-task are co-scheduled, and a task of a super-task cannot be scheduled independently. One can think of these tasks as being semi-autonomous.
Some characteristics of the TIPG, especially the effective execution time of a TIG, were not needed in the section dealing with validity verification. Here, we have to explain such details. Suppose Super-task Si is composed of Ni tasks, ti_1, ti_2, …, ti_Ni, with corresponding average execution times ei_1, ei_2, …, ei_Ni, i.e., Si = {(ti_1, ei_1), (ti_2, ei_2), …, (ti_Ni, ei_Ni)}. It is possible that the interaction times of the tasks of some TIGs of a workflow are not negligible, so the model has to be capable of handling interaction times. Let us represent the interaction time of two tasks vi and vj within a super-task by τi,j. During the interaction both tasks are involved, whether through shared memory, if both are on the same resource (with more than one core), or through system calls such as send and receive. Theoretically speaking, one task may spend more resource time during the interaction than the other; however, in this research it is assumed that the required interaction time is split evenly between the two. We can now compute the interaction-augmented execution time needed by each task as follows:

fi_k = ei_k + (1/2) Σj τk,j, where the sum ranges over the tasks ti_j of Si that interact with ti_k.

Hence, in the parallel-task model, the average effective execution time of Super-task Si is estimated as in Equation 2:

ei = max{ fi_k : k = 1, …, Ni }.    (2)
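A small sketch of Equation 2 follows. The per-task execution times used here are assumptions chosen so that the effective times 59 and 47 used later in the paper are reproduced; they are not values quoted from Figure 1.

--------------------------------------------------------------------------------
Python sketch: effective execution time of a super-task (Equation 2)
--------------------------------------------------------------------------------
def effective_time(exec_time, interact):
    # Equation 2: each task is charged half of each of its interaction
    # times; the parallel super-task finishes with its slowest task.
    def augmented(k):
        share = sum(t for pair, t in interact.items() if k in pair) / 2
        return exec_time[k] + share
    return max(augmented(k) for k in exec_time)

# S1: T1 and T2 interacting for 6; S2: T3-T5 for 4 and T5-T7 for 7.
print(effective_time({'T1': 56, 'T2': 8}, {('T1', 'T2'): 6}))       # 59.0
print(effective_time({'T3': 45, 'T5': 16, 'T7': 5},
                     {('T3', 'T5'): 4, ('T5', 'T7'): 7}))           # 47.0
--------------------------------------------------------------------------------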
Considering Equation 2, the hybrid workflow obtained from the TIPG of Figure 3 is shown in Figure 4; the effective execution times of super-tasks S1 and S2 are 59 and 47, respectively.

Figure 4: A hybrid DAG of tasks and super-tasks
In scheduling workflows, it is essential to first define the objectives, such as minimum cost for the user, shortest make-span, privacy preservation, or even a combination of these. Our objective for the rest of the paper is to minimize the cost subject to a deadline constraint. One of the most common approaches to scheduling simple workflows is called list scheduling. In this approach, each task is given a rank which represents the execution priority (with respect to precedence and its critical-path length to the end of the workflow) of this task in comparison to other tasks. Ranks are computed such that a higher rank represents a higher execution priority [21]. A minor modification in the calculation of ranks is applied here to include both tasks and super-tasks; the recursive formula is shown in Equation 3:

rank(ti) = ei, if ti is the exit vertex;
rank(ti) = ei + max over children tj of (ci,j + rank(tj)), otherwise.    (3)

In this equation, the rank of a task or super-task is equal to its execution time if the task or super-task is the exit vertex. Otherwise, it is equal to its execution time plus the maximum, over its children, of the rank of the child plus the communication from this task to that child. If there is more than one ending vertex in the hybrid workflow, they are all directed, with zero-communication edges, to a dummy task called texit with zero execution time; otherwise, the only ending task or super-task is called texit. We assume that a tentry is also added to the workflow, if necessary.
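A minimal recursive sketch of Equation 3 on a hypothetical three-vertex chain (not the workflow of Figure 4):

--------------------------------------------------------------------------------
Python sketch: upward rank of tasks and super-tasks (Equation 3)
--------------------------------------------------------------------------------
from functools import lru_cache

def make_rank(exec_time, succ, comm):
    @lru_cache(maxsize=None)
    def rank(t):
        # Equation 3: the exit vertex's rank is its execution time;
        # otherwise add the costliest (communication + child rank).
        children = succ.get(t, [])
        if not children:
            return exec_time[t]
        return exec_time[t] + max(comm[(t, c)] + rank(c) for c in children)
    return rank

# Hypothetical chain A -> B -> C with communications 3 and 2.
rank = make_rank({'A': 10, 'B': 20, 'C': 5},
                 {'A': ['B'], 'B': ['C']},
                 {('A', 'B'): 3, ('B', 'C'): 2})
print(rank('A'))   # 10 + 3 + 20 + 2 + 5 = 40
--------------------------------------------------------------------------------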
The ranks of the tasks and super-tasks of the hybrid workflow of Figure 4 are shown in Figure 5. As can be seen, tasks within super-tasks are not ranked. Based on the ranks, an execution priority queue is also shown in the same figure; the head of the queue is at the right side of the figure. Tasks or super-tasks are taken one at a time from the head of the queue and, if ready, i.e., the predecessor tasks are completed and their data has been sent to this task (or to all tasks within this super-task), it can be assigned to a resource (or resources, in the case of a super-task) to start executing. Clearly, more than one task can be in the state of execution at any given time. For example, Super-task S2 and task T8 could execute at the same time, which means tasks T3, T5, T7, and T8 can be running simultaneously.

Priority list (queue) of tasks: T9 | T10 | T4 | T8 | S2 | T6 | S1
Figure 5: Ranks of tasks and super-tasks of Figure 4
After laying the foundations of hybrid workflows of tasks and TIGs, scheduling such workflows is studied first with one objective, minimizing execution cost, and one constraint, respecting the given workflow deadline. The pricing system of the cloud's infrastructure as a service is based on an entity called the cost unit, which is a time interval of fixed size. Therefore, the objective is to consume as few cost units of cheap resources as possible for the given workflow. In the cloud, each processing resource comes with main memory and cache. For example, one virtual machine could be equivalent to one core of a quad-core Intel Xeon X3210, 2.13 GHz, with 8 MB cache and 8 GB RAM. The price of a processing resource for each cost unit depends on the processing performance, i.e., the average Million Instructions Per Second (MIPS) capability, of the resource. The processing performance of the resource, in turn, depends on many things such as clock frequency, memory size, cache size, the number of cache levels in multicores, etc. Pricing cloud resources such that it is beneficial to both providers and consumers is a research topic by itself; however, there are some guidelines [22]. In the graph of Figure 1, the numbers next to tasks represent the average execution time needed by the task with respect to the resource types offered by the cloud. When a task is actually assigned to a specific resource, its actual execution time must be calculated. In scheduling a mixture of tasks and TIGs using the list scheduling approach, a TIG may show up at any time in the process; hence there must be enough available processors to co-schedule its tasks simultaneously. As such, the whole scheduling strategy should be designed in such a way that, when possible, all processing resources are allocated to tasks and super-tasks evenly at all times. This means that, at a given time, if one processing resource is allocated up to time x, then the others, or many of them, are also allocated up to time x or close to it. Speaking in probabilistic terms, considering the finish time of resources to be a random variable with mean μ at time t, the standard deviation of the finish time should be small; roughly speaking, all finish times should be close to μ. This prevents an undesirable consequence called excess external fragmentation. This concept is customarily used in main memory management, where the memory space is broken into little unusable pieces and the memory allocation approach does not allow relocation of allocated spaces to obtain a large usable partition. A similar situation can occur for time allocation, instead of space allocation, when the time periods of processing resources are allocated to tasks or super-tasks. External fragmentation increases rapidly when the availability times, i.e., the completion times of the currently assigned tasks, of resources are uneven and all tasks of a TIG have to be co-scheduled simultaneously. It turns out that, to increase the evenness of the assignments, tasks should be assigned to the resources which complete them soonest. An optimistic estimate of the number of processing resources required to schedule a given hybrid workflow can help in designing a better resource allocator. The deadline constraint of the hybrid workflow is the time by which the workflow must be completed by the cloud. If the workflow is submitted at time t and the deadline is at time d, then the cloud is given a period of length RD = d − t to complete the workflow. To differentiate RD from the absolute deadline, it will be called the relative deadline, with respect to the submission time, when necessary. The total computation time needed by all tasks and super-tasks can be computed using Equation 4:

C = Σ over super-tasks Si of (Ni × ei) + Σ over tasks Tj not in any super-task of ej.    (4)

In this equation, ei is the effective execution time of super-task Si, calculated using Equation 2, and Ni is the number of tasks in this super-task; ej is the execution time of a task Tj which is not in any super-task.
Using Equation 4, the total computation time of the hybrid workflow of Figure 4 is C = 59×2 + 47×3 + 38 + 30 + 42 + 43 + 4 = 416. In the best situation, all communications would overlap with executions. However, this is not usually the case, and the wasted time is considered in the following estimation of fragmentation. Fragmentation here is treated by analogy with memory fragmentation. In the absence of other information, based on the 50% rule, which says the number of unallocated partitions is one half of the number of allocated partitions, 1/3 of each processing resource is wasted [23]. In this calculation, the average size of an allocated period is considered equal to the average size of an unallocated period. To the best of our knowledge, this is the first time that the concept of memory fragmentation is used in estimating the effective computation time of a workflow. Therefore, the effective computation time, EC, needed by the hybrid workflow is given by Equation 5:

EC = C + C/2 = (3/2) C.    (5)

Using Equation 5, the effective computation time for the hybrid workflow of Figure 4 is EC = 1.5 × 416 = 624. It is worth mentioning that the execution times of tasks in a workflow are average values; if the processing performance of a resource is, say, one half of the average processing performance, the execution time of the task will be twice what is given in the workflow model. These calculations are performed after the actual assignment of the task to the processor during the scheduling phase. The make-span of completing a workflow cannot be less than the time it takes to run the whole critical path of the workflow on the fastest resource. But which critical path are we talking about? There are two different critical paths: with communications and without communications. In finding the critical path with communications, the goal is to assign the whole critical path to one resource, thereby saving communication times and hoping to produce a more efficient schedule [21]. This kind of critical path cannot help in finding the absolute minimum time-span needed to complete the workflow. In the critical path without communications, only the execution times of tasks are considered. Finding the critical path without communications is very similar to finding the critical path with communications, with communications taken to be zero. For example, this critical path for the graph of Figure 4 includes the tasks and super-tasks S1, T6, S2, T4, T10, and T9, with a total execution time of 221. This is the least make-span for completing the workflow using resources with average processing power; of course, it is not always possible to achieve it. On the other hand, the total execution time needed by the critical path with communications, which includes S1, T6, S2, T4, and T9, is 217. It can be much lower than that of the critical path without communications. From now on, when we talk about critical path length we mean the total execution time needed by all tasks and super-tasks on the critical path without communications. For example, a scheduler may schedule workflows with, say, a 20 percent higher make-span than the absolute minimum; this by itself could be a criterion for the efficiency of the scheduler, which would not be meaningful for the critical path length with communications. One more factor which affects the scheduling decisions to minimize cost and complete the workflow within the given deadline is the price of resources. Although there are formulas that estimate the price of resources based on the prices of the most expensive and least expensive resource types, the prices of resource types are directly obtainable from the specific provider who will run the workflow; these prices are publicly available. Given the deadline of the workflow and the performance of resource types, we can estimate the least number of simultaneous resources, RN, which will be needed to execute the workflow. Of course, it is not forbidden to lease more simultaneous resources when needed. Equation 6 gives this number with respect to a reference resource type r. If the performance of the fastest, average, and slowest resource types are Qf, Qa, and Qs, respectively, then, with β = Qa/Qr,

RN = ⌈(β × EC) / RD⌉.    (6)

Taking β = 1 means that an average resource type is selected; greater values of β correspond to lower-performance resources and smaller values to higher-performance ones. Since tasks of super-tasks are co-scheduled, in cases where the number of tasks of some super-task is greater than RN, we must use more than RN resources simultaneously. As will be discussed later, there are other situations in which the scheduler has to use more than RN simultaneous resources.
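Putting Equations 4 to 6 together, a short sketch reproducing the running example follows. The deadline RD = 300 and the β form of Equation 6 are assumptions for illustration, not values from the paper.

--------------------------------------------------------------------------------
Python sketch: total computation, effective computation, and least resources
--------------------------------------------------------------------------------
import math

def total_computation(stasks, free_tasks):
    # Equation 4: every task of super-task Si is charged the
    # super-task's effective time ei; free tasks add their own times.
    return sum(n_i * e_i for n_i, e_i in stasks) + sum(free_tasks)

def effective_computation(C):
    # Equation 5 (as reconstructed): one third of each leased
    # resource is lost to fragmentation, so 3/2 of C is provisioned.
    return 1.5 * C

def least_resources(EC, RD, Qa, Qr):
    # Equation 6 (as reconstructed): beta = Qa / Qr scales the work
    # to the reference resource type r; beta = 1 for the average type.
    return math.ceil((Qa / Qr) * EC / RD)

# The hybrid workflow of Figure 4: S1 (2 tasks, 59), S2 (3 tasks, 47),
# free tasks with times 38, 30, 42, 43, 4.
C = total_computation([(2, 59), (3, 47)], [38, 30, 42, 43, 4])
EC = effective_computation(C)
print(C, EC)                                         # 416 624.0
print(least_resources(EC, RD=300, Qa=1.0, Qr=1.0))   # 3
--------------------------------------------------------------------------------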
Since the aim is to minimize the cost and complete the workflow before its deadline, it is preferred to use cheaper resources relative to their performance. With these details in mind, Algorithm 2 is developed. The global approach of the algorithm is to lease RN resources as soon as possible, on the condition that, at the time of the lease, there are tasks to assign to the newly leased resource and there are no other free resources.

--------------------------------------------------------------------------------
Algorithm 2: Scheduling hybrid workflows of tasks and TIGs
--------------------------------------------------------------------------------
1:  Input: Matrix M representing the hybrid workflow, RD representing the allowed time period, Vector V, and tasks' execution times
2:  Output: Schedule H
3:  retrieve cloud resources' availability and prices
4:  rank tasks and super-tasks and enqueue (Q)
5:  compute RN
6:  mark the entry task as ready
7:  while (exists ready task in Q)
8:    remove (highest ranked ready task T (Q))
9:    case 1: regular task Ti (T, V)
10:   begin
11:     if leased < RN and no available resource
12:       lease_appropriate(1)
13:     end if
14:     call assign_task(Ti)
15:     remove father-son links of Ti
16:     if a child becomes orphan, mark it ready
17:   end begin
18:   case 2: super-task Si (T, V)
19:   begin
20:     if available resources are not enough
21:       lease_appropriate(Ni − available)
22:     end if
23:     call assign_stask(Si)
24:     remove father-son links of Si
25:     if exists child with no father, mark it ready
26:   end begin
27: end while
28: end algorithm
--------------------------------------------------------------------------------
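The lease decisions in Lines 11-12 and 20-21 rely on a progress test, formalized in Equations 7 and 8 below. A minimal sketch follows; the lease bookkeeping and the grade encoding are hypothetical.

--------------------------------------------------------------------------------
Python sketch: the ahead/behind-schedule test behind lease_appropriate
--------------------------------------------------------------------------------
def work_done(leases, Tf):
    # Equation 7: work all leased resources can complete by time Tf.
    # leases maps processor -> (lease_start, lease_end, q), where q
    # is the work the processor does per time unit; time past Tf and
    # communication times are not counted.
    return sum(max(0.0, min(end, Tf) - start) * q
               for start, end, q in leases.values())

def behind_schedule(leases, Tf, W, t_submit, RD):
    # Equation 8 (as reconstructed): behind schedule when the fraction
    # of the total work W done so far is below the fraction of the
    # relative deadline RD that has already elapsed.
    return work_done(leases, Tf) / W < (Tf - t_submit) / RD

def next_lease_grade(grade, ahead):
    # lease_appropriate policy: one grade cheaper when ahead of
    # schedule, one grade more expensive when behind (hypothetical
    # encoding: a higher grade number is a cheaper, slower resource).
    return grade + 1 if ahead else max(0, grade - 1)
--------------------------------------------------------------------------------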
Before going into the details of Algorithm 2, there are some points to be made. The queue Q is a priority queue with the properties that (1) tasks are ordered according to their ranks, where higher-ranked tasks are closer to the front of the queue, and (2) all ready tasks are moved to the front of the queue in the order of their ranks, i.e., higher-ranked ready tasks are closer to the front of the queue than higher-ranked non-ready tasks. To lease a new resource, we must check whether we are ahead of our schedule or behind it. If we are ahead of our schedule, the next resource to be leased is one grade cheaper than the last resource leased; if we are behind our schedule, it is the other way around. lease_appropriate(1) leases one such resource under some conditions: leasing a new resource is done if the total number of leased resources is at least RN and, at the time of leasing the resource, we are still ahead of our schedule, as explained in the following. lease_appropriate(Ni − available) leases as many resources as the difference between Ni and the number of available resources, if Ni is greater than the number of available resources. Suppose the exact time at which the scheduler wants to lease one or more resources is Tf. The algorithm computes how much work has been done by all leased resources up to time Tf using Equation 7:

W(Tf) = Σ over leased processors p of (lp × qp).    (7)

In this equation, lp is the total time that processor p has been leased up to time Tf; if a resource is leased up to a time T > Tf, only its leased time up to Tf is counted. qp is the work that processor p can do in one unit of time. Equation 7 ignores communication times altogether. Suppose W is the total work (ignoring communications) needed by the workflow. We are behind our schedule if Equation 8 is satisfied:

W(Tf) / W < (Tf − t) / RD.    (8)

We are ahead of schedule if the reverse relation holds in Equation 8; the two sides are equal if the system has done exactly the same amount of work that it was supposed to do by time Tf. To assign the selected task to a processor, the procedure assign_task(task) is used. This procedure assigns the task to the resource which completes it the soonest. assign_stask(super-task) assigns all tasks of the super-task to as many processors as the super-task needs. As before, it must do so in the way that is best for the evenness of the finish times of the processors. For both procedures, we must also pay attention to the amount of communication from previously assigned tasks to the newly assigned task. To be clear, the scheduler will not intentionally delay the assignment of tasks to processors in order to increase the evenness factor. Depending on the number of leased processors and the number of tasks to be co-allocated, an optimal assignment for the latter case takes a long time, so a simple heuristic was adopted for this stage. If the scheduler wants to assign k tasks of a super-task to k resources simultaneously, the k resources with the earliest finish times are selected and then the best assignment among them is found. Considering the different performances of resources and the communications to/from the super-task, even this simplified solution can be time-consuming for a large k. At the exact time when all tasks of a super-task are ready to be assigned, if there are already k1 (k1