A Semantic Based Execution Model for Multilevel Secure Workflows

Vijayalakshmi Atluri
Center for Information Management, Integration and Connectivity and MS/IS Department, Rutgers University, 180 University Avenue, Newark NJ 07102

Wei-Kuang Huang
Department of Operation and Information Management, University of Connecticut

Elisa Bertino
Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Via Comelico, 39/41, 20135 Milano, Italy

ABSTRACT

Workflow management systems (WFMS) support the modeling and coordinated execution of processes within an organization. To coordinate the execution of the various activities (or tasks) in a workflow, task dependencies are specified among them. As workflow management advances, WFMS are also required to support security. In a multilevel secure (MLS) workflow, tasks may belong to different security levels, and enforcing the dependencies from tasks at higher security levels to those at lower security levels (high-to-low dependencies) may compromise security. In this paper, we consider such MLS workflows and show how they can be executed in a secure and correct manner. Our approach is based on a semantic classification of task dependencies that examines their source. We classify high-to-low dependencies in several ways: conflicting versus conflict-free, result-independent versus result-dependent, strong versus weak, and abortive versus non-abortive. We propose algorithms to automatically redesign the workflow and demonstrate that only a small subset of the types of high-to-low dependencies needs to be executed by trusted subjects, while all other types can be executed without compromising security. The solutions proposed in this paper are directly applicable to another relevant area of research, the execution of multilevel transactions in multilevel secure databases, since atomicity and other semantic requirements can be modeled as a workflow. Compared to prior research in this area, our work (1) is more general, in the sense that it can model several other types of dependencies, thereby allowing one to specify relaxed atomicity requirements, and (2) can automatically redesign a workflow without human intervention by eliminating some cycles among task dependencies, which helps to attain a higher degree of atomicity.

1

Introduction

Workflow management systems (WFMS) support the modeling and coordinated execution of processes within an organization. WFMS are today an important, interdisciplinary and commercially significant area, as witnessed by the large number of available products and by the standardization effort undertaken by the Workflow Management Coalition. WFMS are becoming increasingly important because, from an enterprise point of view, the effective management of business processes is increasingly crucial. Business processes control which piece of work (task) will be performed by whom and which resources are required and used to accomplish this task. A business process therefore specifies how an organization will achieve its goals, and optimizing such processes is crucial in today’s competitive world. The use of WFMS is often connected with business process reengineering, by which business processes are redesigned to achieve significant improvements in critical factors such as cost, quality, service and speed. Applications already supported by WFMS include insurance policy/claims processing, travel expense approvals, healthcare management, and system monitoring and exception handling. To coordinate the execution of the various activities (or tasks) in a workflow, a set of constraints, called task dependencies, is specified among them. Task dependencies are a key component in providing the flexibility required to support exceptions, alternatives, compensations and so on, which all arise in real-life activities. An example of a constraint is that a certain task must be aborted if another task is aborted. Such a constraint models the fact that if the latter task does not complete successfully, the former task is useless and therefore must be aborted. The development of flexible and powerful WFMS raises many important issues.
These systems are thus continuously evolving to better satisfy application requirements. In particular, as WFMS advance and their application scope widens, they are also required to support security: coordination among processes at different security levels has to be supported without violating security. To ensure correctness and reliability, workflows are associated with a workflow transaction model [12]. Workflow transaction models must be based on more “flexible” correctness criteria than traditional transaction models. For example, the classical “all-or-nothing” property of traditional transaction models is not appropriate for workflow transactions, since a workflow transaction may need to commit some of its actions while aborting others. To satisfy such flexibility requirements, a large number of workflow transaction models have been proposed. Although in this paper we consider transactional workflows, workflows in general include users, activities, programs, and data; it is the ability to integrate these four that sets general WFMSs apart from transactional workflow models [1]. (Refer to [3] for a more general definition of workflow model.) Since the major thrust of this paper is to show how task dependencies can be enforced in MLS workflows, we limit our attention to transactional workflows. Despite the flurry of research and development work around workflow transaction models, security for such models has not yet been addressed. In a multilevel secure workflow transaction (MLS workflow, for short), tasks may belong to different security levels. Thus enforcing all the task dependencies, especially those from a task at a higher security level to one at a lower security level, may compromise security.
It is easy to see that in a multilevel environment it is not possible to force the abort of a lower level task upon the abort of a higher level task, since doing so would convey information from the higher level to the lower level. The goal of the work we present here is to consider MLS workflows and show how they can be executed in a secure and correct manner. Our approach begins with a semantic classification of task dependencies based on a close examination of their source. We argue that only certain types of dependencies can occur in MLS workflows. Then we propose algorithms to automatically (without human intervention) redesign the workflow so that it can be executed in a secure and correct manner. In particular, our approach focuses on redesigning dependencies from higher level tasks to lower level tasks, because they are a potential source of signaling channels. We show that the redesigned workflow is equivalent to the original workflow by proving that both can potentially reach the same set of states. The remainder of the paper is organized as follows. Section 2 reviews the workflow and security models and develops the definitions needed to formalize our approach. Section 3 presents the multilevel secure workflow model. Section 4 provides an approach to identify the various types of task dependencies based on their semantics. Section 5 presents our notion of equivalence, used to show that the redesigned workflow is in some sense equivalent to the original workflow. Section 6 presents the redesign algorithms, which describe how all types of dependencies can be enforced without compromising security. Finally, Section 8 provides conclusions. Proofs of the theorems are presented in the appendix.

2

The Model

In this section we introduce the basic elements of our workflow model and we summarize the security model we assume.

2.1

The Workflow Model

A workflow is a set of tasks with task dependencies defined among them. A task in its simplest form consists of a set of data operations and the task primitives {begin, abort, commit}. Executing a task requires, in addition to invoking operations (reads or writes) on data items, invoking these task primitives. All data operations in a task must be executed only after the begin primitive is issued, and every task that begins must end with either a commit or an abort. A primitive may move a task from one state to another. A task $t_i$ can be in one of the following states: initial ($in_i$), execution ($ex_i$), commit ($cm_i$) or abort ($ab_i$). (We use $b_i$, $a_i$ and $c_i$ to denote the begin, abort and commit primitives of $t_i$.) For instance, a task moves from its initial state to its execution state by invoking the begin primitive. Figure 1 shows the general structure of a task, where the initial and final states are denoted by filled circles and the intermediate state by an unfilled circle. Let $D$ be the set of all data objects, and $DO$ the set of all data operations, that is, $DO = \{r[x], w[x] \mid x \in D\}$.

Definition 1 A task $t_i$ is a partially ordered set of operations with an ordering relation $\prec_i$, where
1. $t_i = DO_i \cup PO_i$, where $DO_i \subseteq DO$ and $PO_i \subset \{b_i, a_i, c_i\}$;
2. $c_i \in t_i$ iff $b_i \in t_i \wedge a_i \notin t_i$, and $a_i \in t_i$ iff $b_i \in t_i \wedge c_i \notin t_i$;
3. for any $o_i \in DO_i$, $b_i \prec_i o_i \prec_i$ either $c_i$ or $a_i$ (whichever is in $t_i$); and
4. if $r_i[x], w_i[x] \in t_i$, then either $r_i[x] \prec_i w_i[x]$ or $w_i[x] \prec_i r_i[x]$. □

[Figure 1: Task Structure. States in, ex, cm, ab; transitions labeled b (in to ex), c (ex to cm) and a (ex to ab).]
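To make Definition 1 and the task structure of Figure 1 concrete, the following Python sketch models a single task as a small state machine. It is only an illustration under our own naming: the classes `State` and `Task` and the methods `primitive`, `operate` and `may_be_final` are hypothetical, and the sketch enforces just the intra-task constraints of Definition 1 (data operations only between $b_i$ and $c_i$ or $a_i$, and at most one of $c_i$, $a_i$), not any inter-task dependency.

```python
from enum import Enum


class State(Enum):
    """Task states of Definition 2."""
    INITIAL = "in"
    EXECUTING = "ex"
    COMMITTED = "cm"
    ABORTED = "ab"


# Legal transitions of Figure 1: b moves in -> ex; c or a moves ex -> cm or ab.
TRANSITIONS = {
    (State.INITIAL, "b"): State.EXECUTING,
    (State.EXECUTING, "c"): State.COMMITTED,
    (State.EXECUTING, "a"): State.ABORTED,
}


class Task:
    """Hypothetical model of a task t_i: current state plus data operations in issue order."""

    def __init__(self, name):
        self.name = name
        self.state = State.INITIAL
        self.ops = []  # (kind, object) pairs such as ("r", "h") or ("w", "p")

    def primitive(self, p):
        """Issue the task primitive 'b', 'c' or 'a'; reject moves Figure 1 does not allow."""
        key = (self.state, p)
        if key not in TRANSITIONS:
            raise ValueError(f"{self.name}: {p!r} not allowed in state {self.state.value}")
        self.state = TRANSITIONS[key]

    def operate(self, kind, obj):
        """Record a data operation; Definition 1 requires b_i < o_i < (c_i or a_i)."""
        if self.state is not State.EXECUTING:
            raise ValueError(f"{self.name}: {kind}[{obj}] issued outside the execution state")
        self.ops.append((kind, obj))

    def may_be_final(self):
        """Definition 2: a final state is in, cm or ab, never ex."""
        return self.state is not State.EXECUTING


# Usage: task t1 of Example 1 (Section 3) reads n and o, writes h, and commits.
t1 = Task("t1")
t1.primitive("b")
t1.operate("r", "n")
t1.operate("r", "o")
t1.operate("w", "h")
t1.primitive("c")
print(t1.state.value, t1.ops)  # prints: cm [('r', 'n'), ('r', 'o'), ('w', 'h')]
```

A task that never issues $b_i$ stays in $in_i$, which `may_be_final` accepts, matching the remark after Definition 2 that not every task in a workflow is necessarily executed.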

Definition 2 At any given time, the state of a task $t_i$, denoted $st_i$, is one of $in_i$, $ex_i$, $ab_i$, $cm_i$. The final state of the task, however, is one of $in_i$, $ab_i$, $cm_i$. □

According to the above definition, the final state of a task may be its initial state. This situation arises when a task is not executed; note that not all tasks within a workflow are necessarily executed.

Definition 3 Two data operations $o_i[d]$ and $o_j[d]$ conflict with each other if they operate on the same data object $d$ and at least one of them is a write. □

To control the coordination among different tasks, dependencies are specified based on these task primitives. Task dependencies can be either static or dynamic. In the static case, the workflow is defined well in advance of its actual execution, whereas dynamic dependencies develop as the workflow progresses through its execution [18]. Task dependencies may exist among tasks within a workflow (intra-workflow) or between two different workflows (inter-workflow). In [17], three basic types of task dependencies have been identified: control-flow dependencies, value dependencies and external dependencies. Control-flow dependencies may in turn involve explicit transmission of data as part of the result of a task; we call such dependencies control-flow dependencies with data flow.

Definition 4 A workflow $W$ can be defined as a directed graph whose nodes are the tasks $t_1, t_2, \ldots, t_n$ in the workflow and whose edges are the task dependencies $t_i \xrightarrow{x} t_j$, where $t_i, t_j \in W$ and $x$ denotes the type of dependency. □

In the remainder of this section, we briefly discuss the four categories of dependencies.

2.1.1

Control-flow Dependencies

A control-flow dependency specifies the conditions, expressed as the state $st_i$ of task $t_i$, under which a task $t_j$ can enter state $st_j$. Control-flow dependencies can be modeled based on the ACTA framework [9]. Given two tasks $t_i$ and $t_j$, a list of possible control-flow dependencies is presented below.

1. Begin-on-Commit Dependency: A task $t_j$ cannot begin until $t_i$ commits (represented as $t_i \xrightarrow{bc} t_j$).
2. Begin Dependency: A task $t_j$ cannot begin until $t_i$ has begun (represented as $t_i \xrightarrow{b} t_j$).
3. Abort Dependency: A task $t_j$ must abort if $t_i$ aborts (represented as $t_i \xrightarrow{a} t_j$).
4. Termination Dependency: A task $t_j$ can terminate (either commit or abort) only after the completion (commit or abort) of $t_i$ (represented as $t_i \xrightarrow{t} t_j$).
5. Strong Commit Dependency: If a task $t_i$ commits then $t_j$ must commit (represented as $t_i \xrightarrow{sc} t_j$).
6. Force Begin-on-Abort (Commit/Begin) Dependency: A task $t_j$ must begin if $t_i$ aborts (commits/begins) (represented as $t_i \xrightarrow{fba} t_j$ ($t_i \xrightarrow{fbc} t_j$ / $t_i \xrightarrow{fbb} t_j$)).
7. Exclusion Dependency: Given any two tasks $t_i$ and $t_j$, if $t_i$ commits then $t_j$ must abort, or vice versa (represented as $t_i \xrightarrow{e} t_j$).
8. Weak Begin-on-Commit Dependency: Given any two tasks $t_i$ and $t_j$, $t_j$ begins if $t_i$ commits (represented as $t_i \xrightarrow{wbc} t_j$).
9. Strong Abort Dependency: A task $t_j$ can abort only if $t_i$ commits (represented as $t_i \xrightarrow{sa} t_j$).

A comprehensive list of task dependencies based on the three task primitives (begin, commit and abort) can be found in [11, 9]; it includes the commit, weak-abort, force-commit-on-abort, serial, and begin-on-abort dependencies. A dependency specified between $t_i$ and $t_j$ implies a logical relationship between $st_i$ and $st_j$: some dependencies imply $st_i \Rightarrow st_j$ (i.e., if $st_i$ holds then $st_j$ must hold), while others imply $st_i \Leftarrow st_j$ (i.e., $st_j$ may hold only if $st_i$ holds). Based on this logical implication, we categorize dependencies into strong (the latter case) and weak (the former case). Moreover, since no task can be forced to commit (see the NFC assumption below), a dependency $t_i \xrightarrow{x} t_j$, whether strong or weak, can be enforced only by enforcing the precedence relationship “$st_i$ precedes $st_j$” (denoted $st_i \prec st_j$). Formally, the strong and weak types can be distinguished as follows:

Definition 5 Given a control-flow dependency $t_i \xrightarrow{x} t_j$, if the dependency implies the logical relationship $st_i \Rightarrow st_j$ ($st_i \Leftarrow st_j$), we say that it is weak (strong). □

An example of the strong type is the begin-on-commit dependency ($t_i \xrightarrow{bc} t_j$), which states that a task $t_j$ cannot begin until $t_i$ commits. (Other examples include $b$ and $t$.) In practice, strong dependencies are used to specify the precondition(s) for a particular event to occur. The strong commit dependency ($t_i \xrightarrow{sc} t_j$), which states “if a task $t_i$ commits then $t_j$ must commit,” is, despite its name, an example of the weak type. (Other examples include $a$, $fba$ and $wbc$.) In a workflow, the weak type can be used where a particular workflow state has to trigger an event. In addition to the above classification, dependencies can be further categorized based on the resultant state of the child task, as follows.

Definition 6 Given a control-flow dependency $t_i \xrightarrow{x} t_j$, the dependency is of type abortive if $st_j$ is $ab_j$; otherwise it is non-abortive. □

The classification given by the above definition applies to both strong and weak dependencies. For example, abortive dependencies include the abort and exclusion dependencies, whereas non-abortive dependencies include the commit, strong commit, begin-on-commit, serial and begin-on-abort dependencies.

No-Force-Commit and No-Prevent-Abort Assumptions: When enforcing the dependencies, we make the following two assumptions.
No-Force-Commit (NFC) Assumption: No task execution can be guaranteed to commit. However, a task can be forced to begin or abort.
No-Prevent-Abort (NPA) Assumption: No task execution can be prevented from aborting. However, a task can be prevented from beginning or committing.

Further examination of these dependencies reveals that all abortive workflow dependencies must be weak. A strong abortive dependency would state that a task may abort only if a certain condition is true. According to the NPA assumption, however, the abort of a task cannot be prevented, so one cannot guarantee that a task aborts only if a certain condition arises and does not abort otherwise. Non-abortive workflow dependencies, on the other hand, can be either weak or strong. Additionally, because of the NFC assumption, enforcing some weak non-abortive dependencies, such as strong commit ($sc$) and force-commit-on-abort ($fca$: $t_j$ must commit if $t_i$ aborts), may require an additional primitive operation such as precommit.

2.1.2

Control-flow Dependencies with Data-flow

A control-flow dependency with data flow can be defined as follows: a task $t_j$ can enter state $st_j$ only after task $t_i$ enters state $st_i$ and $t_i$ passes values of data objects to $t_j$. In these dependencies, in addition to the control flow, there is information flow (or data flow) between the tasks, where a task needs to wait for data from another task. Note that a control-flow dependency with data flow is meaningful only for limited combinations of $st_i$ and $st_j$. For example, $st_i$ and $st_j$ can be “commit” and “begin,” respectively, but cannot be “begin” and “commit.”

2.1.3

Value Dependencies

A value dependency can be defined as follows: a task $t_j$ can enter state $st_j$ only after task $t_i$’s outcome satisfies a condition $c_i$. The condition can be a logical expression whose value is either 0 or 1. Note that this dependency is different from a control-flow dependency with data flow. For example, “$t_j$ can begin if $t_i$ is a success (semantically).”∗

2.1.4

External Dependencies

Unlike the preceding types, external dependencies are caused by parameters external to the system, such as time. An external dependency can be defined as follows: a task $t_i$ can enter state $st_i$ only after a certain condition $c_j$ is satisfied, where the parameters in $c_j$ are external to the workflow. Examples include “a task $t_i$ can start its execution only at 9:00am” and “a task $t_j$ can start its execution only 24 hours after the completion of task $t_k$.”

∗ Failure of a task does not necessarily mean abort of a task. A task may still semantically fail even if it successfully commits.
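The strong/weak and abortive/non-abortive classification of Section 2.1.1 can be recorded as data, which makes the NPA argument mechanical: a dependency type that is both strong and abortive could not be enforced. The Python sketch below is illustrative only; the names `DepType`, `DEPENDENCY_TYPES`, `violates_npa` and `describe` are hypothetical, and the table covers a representative subset of the types listed above (exclusion is marked weak on the strength of the NPA argument rather than an explicit example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepType:
    """A control-flow dependency type t_i -x-> t_j, classified per Definitions 5 and 6."""
    label: str      # the arrow label x, e.g. "bc"
    name: str
    strong: bool    # True: st_i <= st_j (precondition); False: st_i => st_j (trigger)
    abortive: bool  # True if the child's resulting state st_j is ab_j


DEPENDENCY_TYPES = {
    d.label: d
    for d in [
        DepType("bc", "begin-on-commit", strong=True, abortive=False),
        DepType("b", "begin", strong=True, abortive=False),
        DepType("sc", "strong commit", strong=False, abortive=False),
        DepType("fba", "force begin-on-abort", strong=False, abortive=False),
        DepType("wbc", "weak begin-on-commit", strong=False, abortive=False),
        DepType("a", "abort", strong=False, abortive=True),
        DepType("e", "exclusion", strong=False, abortive=True),
    ]
}


def violates_npa(dep):
    """A strong abortive type would let t_j abort only under a condition on t_i;
    under No-Prevent-Abort an abort cannot be prevented, so it cannot be enforced."""
    return dep.strong and dep.abortive


def describe(dep):
    kind = "strong" if dep.strong else "weak"
    effect = "abortive" if dep.abortive else "non-abortive"
    return f"{dep.label}: {dep.name} ({kind}, {effect})"


for dep in DEPENDENCY_TYPES.values():
    assert not violates_npa(dep), describe(dep)
    print(describe(dep))
```

Keeping the two classifications as independent flags mirrors the text: strong/weak is a property of the logical implication, abortive/non-abortive of the child's resulting state, and only their combination is constrained by NPA.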

2.2

The Security Model

We assume the security structure to be a partially ordered set $S$ of security levels with ordering relation $\leq$. A class $s_i \in S$ is said to be dominated by another class $s_j \in S$ if $s_i \leq s_j$. A class $s_i$ is said to be strictly dominated by another class $s_j$ (denoted $s_i < s_j$) if $s_i \leq s_j$ and $i \neq j$.† Each data object $d \in D$, where $D$ is the set of all data objects, is associated with a security level, and so is every task $t_i$ in a workflow $W$. We assume that there is a function $L$ that maps all data objects and tasks to security levels; that is, $L(d) \in S$ for every $d \in D$, and $L(t_i) \in S$ for every task $t_i \in W$. We require every task to obey the following two security properties, the simple security property and the restricted ⋆-property:
1. A task $t_i$ is allowed to read a data object $d$ only if $L(d) \leq L(t_i)$.
2. A task $t_i$ is allowed to write to a data object $d$ only if $L(d) = L(t_i)$.
In addition to these two restrictions, a secure system must prevent illegal information flows via signaling channels.
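The two properties above reduce to comparisons in the partial order $S$. The following Python sketch checks them; it is illustrative only, and both the helper names (`dominated`, `can_read`, `can_write`, `task_obeys`) and the example poset (a level `low` dominated by two incomparable levels `high` and `other`) are hypothetical.

```python
# Hypothetical security structure S, given by its covering relation:
# each level maps to the levels that immediately dominate it.
COVERS = {
    "low": ["high", "other"],  # "high" and "other" are incomparable
    "high": [],
    "other": [],
}


def dominated(s_i, s_j, covers=COVERS):
    """True iff s_i <= s_j, i.e. s_j is reachable from s_i (reflexive-transitive closure)."""
    stack, seen = [s_i], set()
    while stack:
        s = stack.pop()
        if s == s_j:
            return True
        if s not in seen:
            seen.add(s)
            stack.extend(covers[s])
    return False


def can_read(task_level, obj_level):
    """Simple security property: t_i may read d only if L(d) <= L(t_i)."""
    return dominated(obj_level, task_level)


def can_write(task_level, obj_level):
    """Restricted *-property: t_i may write d only if L(d) = L(t_i)."""
    return obj_level == task_level


def task_obeys(task_level, reads, writes, L):
    """Check every read and write of a task; L maps data objects to levels."""
    return (all(can_read(task_level, L[d]) for d in reads)
            and all(can_write(task_level, L[d]) for d in writes))


# Levels of the data in Example 1 (Section 3): r and p are high, everything else low.
L = {"h": "low", "n": "low", "o": "low", "r": "high", "p": "high"}
print(task_obeys("high", reads={"h", "r"}, writes={"p"}, L=L))  # t2 at high: True
print(task_obeys("low", reads={"h", "r"}, writes={"p"}, L=L))   # t2 at low: False
```

Because writes require equality rather than dominance, a high task can read low data but never write it, which is what later forces the parent of a conflicting high-to-low dependency to be the reader.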

3

Multilevel Secure Workflows

A multilevel secure (MLS) workflow may consist of tasks of different security levels (as in Example 1 below). Thus, an MLS workflow consists of nodes at different security levels, where the dependency edges may connect tasks of either the same security level or different security levels. A dependency edge connecting tasks of the same security level is referred to as an intra-level dependency, and one connecting tasks of different security levels as an inter-level dependency. Since intra-level dependencies by themselves cannot violate any multilevel security constraints and are no different from task dependencies in a non-secure environment, hereafter we concentrate only on inter-level dependencies. We further divide inter-level dependencies into two categories, high-to-low‡ and low-to-high, since their treatment has to be different in an MLS environment because of its “no downward information flow” requirement.

Example 1 Consider a workflow that computes the weekly pay of all employees at the end of each week. This process involves the following tasks. Task $t_1$: compute the number of hours worked by an employee, $h$, which is the sum of the regular hours ($n$) and the overtime hours ($o$) worked by the employee during that week. Task $t_2$: calculate the weekly pay of the employee ($p$) by multiplying $h$ by the employee’s hourly rate ($r$). Task $t_3$: after the pay for the week has been computed, reset $h$, $n$ and $o$ to zero. The information about the hourly rate ($r$) and the weekly pay ($p$) is considered sensitive, and therefore both $r$ and $p$ are classified high, while the rest of the information is classified low. Since this workflow involves write operations at different levels, it is an MLS workflow.

† Here we assume that $s_i \neq s_j$ iff $i \neq j$.
‡ Although we use the term high-to-low, this category also includes dependencies between two incomparable security levels.

According to the two restrictions of our security model, since $t_1$ and $t_3$ write low objects ($h$, $n$ and $o$), they must be low tasks, and since $t_2$ reads the high object $r$ and writes the high object $p$, it must be a high task. Moreover, the following task dependencies exist: task $t_2$ can begin only after $t_1$ commits, thus $t_1 \xrightarrow{bc} t_2$, and $t_3$ can begin only after $t_2$ commits, thus $t_2 \xrightarrow{bc} t_3$, as shown in Figure 2. While $t_1 \xrightarrow{bc} t_2$ is a low-to-high dependency, $t_2 \xrightarrow{bc} t_3$ is high-to-low. □

Executing a workflow involves (1) enforcing all task dependencies, (2) assuring correct execution of interleaved workflows, and (3) ensuring that the workflow terminates in one of the predefined acceptable states. In this paper, we focus on the first part only. While a high-to-low dependency is being enforced, information may leak from tasks at a higher level to those at a lower level via signaling channels. As an example, consider the dependency $t_2 \xrightarrow{bc} t_3$ of the above example. Imagine that $t_2$ and $t_3$ are two Trojan Horse infested programs, both currently executing in the system. If $t_2$, the higher level task, wishes to signal information to the lower level task $t_3$, it can delay its commit by initiating a computationally intensive program. The lower level task $t_3$ then has to wait until $t_2$ commits before it can begin, and it can measure such delays, for example, by going into a busy loop with a counter. Thus $t_2$ can effectively signal information to $t_3$, establishing a signaling channel. Therefore, it follows from the “no downward information flow” requirement that every enforced task dependency $t_i \rightarrow t_j$ must satisfy $L(t_i) \leq L(t_j)$. That is, to prevent signaling channels, no high-to-low dependency may be enforced. In a correct MLS workflow specification, it is not possible to have a high-to-low value dependency, because enforcing such a dependency amounts to directly sending data from a higher to a lower security level. The same argument applies to a high-to-low control-flow dependency with data flow. With respect to external dependencies, for the purpose of our work we categorize them as absolute and relative: absolute dependencies are solely controlled by external factors, whereas relative dependencies are specified with external parameters but are controlled by internal events. For example, “a task $t_i$ can start its execution only at 9:00am” is an absolute external dependency, whereas “task $t_j$ can start its execution only 24 hours after the completion of task $t_k$” is a relative external dependency. We need this classification because the two types are enforced differently in an MLS environment. While absolute external dependencies can be enforced without compromising security, relative external dependencies may be exploited to establish a signaling channel, especially when the dependency is from a high task to a low task. A relative external dependency is essentially a control-flow dependency with additional temporal constraints. Since temporal constraints cannot be modeled by simple graph structures but require special modeling techniques that can incorporate external events, we do not consider them in this paper. Therefore, in this paper, we consider only control-flow dependencies.
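The edge classification of this section can be checked mechanically on the workflow graph of Definition 4. The Python sketch below does so for Example 1 and is illustrative only: the function names `classify` and `unsafe_edges` are hypothetical, the strict order is given explicitly as a set of pairs (assumed to be transitively closed), and, following the footnote on the term high-to-low, any inter-level edge whose parent is not strictly dominated by its child, including an edge between incomparable levels, is treated as high-to-low.

```python
# Example 1 as a workflow graph: task levels and edges (parent, child, dependency type).
LEVELS = {"t1": "low", "t2": "high", "t3": "low"}
EDGES = [("t1", "t2", "bc"), ("t2", "t3", "bc")]

# Strict dominance s_i < s_j as an explicit, transitively closed set of pairs.
STRICTLY_BELOW = {("low", "high")}


def classify(parent, child, levels=LEVELS, below=STRICTLY_BELOW):
    """Label a dependency edge intra-level, low-to-high or high-to-low."""
    lp, lc = levels[parent], levels[child]
    if lp == lc:
        return "intra-level"
    if (lp, lc) in below:
        return "low-to-high"
    # Parent strictly higher than, or incomparable with, the child.
    return "high-to-low"


def unsafe_edges(edges=EDGES, levels=LEVELS, below=STRICTLY_BELOW):
    """Edges violating L(t_i) <= L(t_j): enforcing them directly opens a signaling channel."""
    return [(p, c, x) for p, c, x in edges
            if classify(p, c, levels, below) == "high-to-low"]


for p, c, x in EDGES:
    print(f"{p} -{x}-> {c}: {classify(p, c)}")
print(unsafe_edges())
```

For Example 1 this reports $t_1 \xrightarrow{bc} t_2$ as low-to-high and flags only $t_2 \xrightarrow{bc} t_3$, the edge exploited by the Trojan Horse scenario above.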

3.1

Execution Criteria

In the following, we define four levels of execution based on the degree of security or correctness they guarantee. First we recall the definition of secure execution from [15]: an execution is said to be secure if it satisfies the non-interference property [13], i.e., no lower level task is affected by any higher level task.


[Figure 2: Task dependencies in the MLS workflow in Example 1. High level: t2; low level: t1, t3; both edges labeled bc.]

[Figure 3: Task dependencies in the MLS workflow in Example 2. High level: t3, t4; low level: t1, t2; edges labeled bc and a.]

1. SSSC-level (strongly-secure and strongly-correct): An MLS workflow execution is said to be of SSSC-level if it is secure and all the task dependencies are enforced. This calls for complete elimination of signaling channels while still enforcing all dependencies. This is the most desirable case.
2. SSWC-level (strongly-secure and weakly-correct): An MLS workflow execution is said to be of SSWC-level if it is secure but does not necessarily enforce all the task dependencies. This requires complete elimination of all signaling channels; however, one need not enforce all the task dependencies.
3. WSSC-level (weakly-secure and strongly-correct): An MLS workflow execution is said to be of WSSC-level if it enforces all the task dependencies but may allow a low-capacity signaling channel. The capacity of the signaling channel can be reduced by introducing noise or a fixed delay.
4. WSWC-level (weakly-secure and weakly-correct): An MLS workflow execution is said to be of WSWC-level if it neither enforces all the task dependencies nor eliminates signaling channels.

Although the first level of execution is the most desirable, it is difficult to achieve due to the inherent conflicts between security and correctness. Reference [4] proposes an approach to eliminate all high-to-low task dependencies (i.e., it ensures SSWC-level execution). In this paper, we show how SSSC-level execution can be attained.

4

Semantic Classification of Task Dependencies in MLS Workflows

In this section, we take a closer look at all types of dependencies and examine what they mean semantically in an MLS environment. We argue that only some types of dependencies can be specified in MLS workflows, and that the other types do not exist in a correct secure workflow specification. Note that our arguments focus only on high-to-low dependencies because, as we argue in Section 6.1, low-to-high dependencies can be enforced without compromising security requirements.

To reason about the semantics of high-to-low (control-flow) dependencies, we first look at the source of these dependencies and categorize them as follows:
1. The first category of dependencies arises to force the order of (conflicting) operations on shared data objects.
2. The second category of dependencies arises to force properties such as atomicity, mutual exclusion, etc.

Consider the task dependency $t_2 \xrightarrow{bc} t_3$ in Example 1. The intention of this dependency is to prevent $t_3$ from resetting $h$ before $t_2$ has read it. Thus, the source of this dependency is to force a specific order on conflicting operations, and it therefore belongs to the first category. Dependencies in the second category are specified according to the semantics of the workflow. For example, the semantics may require that one of two tasks commits, but not both (mutual exclusion). For instance, consider a travel reservation workflow with two tasks that purchase a ticket from Delta and from United, respectively, where only one of the two tasks may commit. In the following, we provide an example illustrating such dependencies in an MLS environment.

Example 2 Consider a workflow that arranges a travel schedule for a person P. Assume that P has to first make a trip from Washington D.C. to Toronto and then from Toronto to Moscow. The second part of the trip is on a secret mission, so it is considered highly sensitive and is assigned the high level. The first part of the trip is not classified and is therefore considered low. Assume this workflow consists of the following four tasks: reserving a ticket for the first part of the trip ($t_1$), purchasing the ticket for the first part ($t_2$), reserving a ticket for the second part of the trip ($t_3$), and purchasing the ticket for the second part ($t_4$), where $t_1$ and $t_2$ are low level tasks and $t_3$ and $t_4$ are high level tasks. The following task dependencies exist. Purchasing a ticket cannot start until reserving that ticket is complete; thus, $t_1 \xrightarrow{bc} t_2$ and $t_3 \xrightarrow{bc} t_4$. Moreover, the flight for the second part of the trip has to be reserved only after making sure that a flight is available for the low part of the trip; thus, $t_1 \xrightarrow{bc} t_3$. Furthermore, if purchasing the ticket for the second part of the trip aborts, purchasing the ticket for the first part also needs to be aborted; thus, $t_4 \xrightarrow{a} t_2$. While the first two task dependencies are intra-level dependencies, the latter two are low-to-high and high-to-low dependencies, respectively, as shown in Figure 3. □
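Whether the two tasks joined by a high-to-low dependency issue conflicting operations (Definition 3) can be read off their read and write sets, and this is what separates the high-to-low dependencies of the two examples. The Python sketch below is illustrative: `conflicts` is a hypothetical helper, the operations for Example 1 follow its task descriptions, and the object names for Example 2 (`res1`, `res2`, `ticket1`, `ticket2`) are assumptions, since that example does not say which data its tasks access.

```python
def conflicts(ops_i, ops_j):
    """Objects on which two tasks issue conflicting operations (Definition 3):
    the same data object, with at least one of the two operations a write."""
    reads_i = {d for kind, d in ops_i if kind == "r"}
    writes_i = {d for kind, d in ops_i if kind == "w"}
    reads_j = {d for kind, d in ops_j if kind == "r"}
    writes_j = {d for kind, d in ops_j if kind == "w"}
    return (writes_i & (reads_j | writes_j)) | (reads_i & writes_j)


# Example 1: t2 reads h and the hourly rate r, and writes p; t3 resets h, n and o.
t2_pay = [("r", "h"), ("r", "r"), ("w", "p")]
t3_reset = [("w", "h"), ("w", "n"), ("w", "o")]

# Example 2 (hypothetical data): each purchase reads its reservation and writes its ticket.
t4_buy2 = [("r", "res2"), ("w", "ticket2")]
t2_buy1 = [("r", "res1"), ("w", "ticket1")]

print(conflicts(t2_pay, t3_reset))  # {'h'}: t2 -bc-> t3 orders conflicting operations
print(conflicts(t4_buy2, t2_buy1))  # set(): t4 -> t2 stems from workflow semantics alone
```

The only conflict in Example 1 is a high read followed by a low write of $h$, the single read/write combination the security model permits across a high-to-low edge.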

The intention of the high-to-low dependency $t_4 \xrightarrow{a} t_2$ in the above example is to capture the semantics of the workflow rather than to force an order between conflicting operations. Thus this dependency belongs to the second category. If we examine the two sources of dependencies once again and analyze their effect on the workflow, we can make the following observation. Imagine two scenarios: in the first, the high-to-low dependency $t_i \rightarrow t_j$ is enforced, whereas in the second, it is not. With the second category of dependency (e.g., $t_4 \xrightarrow{a} t_2$ in Example 2), the result of $t_j$ might differ between the two scenarios; in other words, the result of $t_j$ might be affected if $t_i \rightarrow t_j$ is not enforced. However, the dependency $t_i \rightarrow t_j$ does not affect the result of $t_i$. Thus we call the second category of dependencies result-dependent (RD). On the other hand, consider the first category of dependencies (e.g., $t_2 \xrightarrow{bc} t_3$ in Example 1). If the dependency is not enforced, the result of $t_j$ will not be affected, but that of $t_i$ will be. In a multilevel secure system, this can only occur when the two tasks share common data (given that the dependency is high-to-low). Thus, we call the first category of dependencies conflicting (CN). In the following, we formally define these categories.

Definition 7 A dependency $t_i \xrightarrow{x} t_j$ between two tasks is said to be conflicting (CN) if there exist at least two conflicting operations $o_i[d]$ and $o_j[d]$ ($i \neq j$); otherwise $t_i \xrightarrow{x} t_j$ is said to be conflict-free (CF). □

On the other hand, from the perspective of task results, a dependency can be either result-independent or result-dependent. Formally:

Definition 8 A dependency $t_i \xrightarrow{x} t_j$ is said to be result-dependent (RD) if the result of executing the child differs depending on whether or not the dependency is enforced; otherwise $t_i \xrightarrow{x} t_j$ is said to be result-independent (RI). □

The intuitive idea behind this classification is that the result of the execution of either the child (in the case of RD) or the parent (in the case of CN) must differ depending on whether or not the dependency is enforced. Thus, there cannot be any dependency that is both conflict-free and result-independent. Let us now examine this categorization in light of multiple security levels on data items and tasks. At this point, our primary concern is how to enforce high-to-low dependencies in an MLS workflow.

• Case CN: Dependencies in this category indicate that the two tasks involved access some common data items in conflicting modes. The primary reason to enforce this type of dependency is to enforce the order of these conflicting operations. This category depicts a typical conflicting situation in which a read operation by the parent is followed by a write operation by the child on the same data object. No other combination of reads and writes is possible under our security model, since the higher level parent can only write data at its own level while the lower level child can only access data it dominates. High-to-low CN dependencies can be further classified into result-dependent (RD) and result-independent (RI) dependencies, resulting in CN-RD and CN-RI dependencies, which are briefly discussed below.
  – CN-RI: For RI dependencies, the result of the child does not depend on whether the dependency is enforced, although not enforcing the dependency may produce a different result for the parent task. Therefore, all dependencies in this category must be non-abortive, such as begin-on-commit, begin-on-abort, etc.; otherwise the parent could affect the result of the child (i.e., cause it to abort).
  – CN-RD: Dependencies such as force abort, termination, exclusion, etc., which may cause the child task to abort (the abortive type), fall under the RD category. All abortive dependencies are RD because, if the dependency is not enforced, the child task may commit instead of aborting; hence they cannot be RI.

[Figure 4 (grid of CN/CF versus RI/RD): CN-RI: non-abortive (strong and weak); CN-RD: abortive (weak); CF-RD: abortive (weak) and non-abortive (strong and weak); the remaining cells are shaded.]

Figure 4: Categorization of high-to-low dependencies

• Case CF: Dependencies in this category can only arise from pure semantic specification. Although the two tasks do not conflict, such dependencies are sometimes specified to ensure an acceptable termination state of the workflow.
  – CF-RD: These dependencies are specified so that the execution of the child depends on the value or outcome of the parent. Therefore, altering the execution order of the two tasks would affect the outcome of the child; in other words, the dependency is result-dependent. These dependencies can be either strong or weak.
  – CF-RI: A CF dependency has no effect on the result of the parent task. Hence no dependency can be both CF and RI, because its presence would affect neither the parent nor the child.

The above categorization of high-to-low dependencies (shown in figure 4) is important because each category must be handled by a different approach in an MLS environment. The approach used for each category is presented in section 6.

We now introduce an algorithm that classifies the high-to-low dependencies of a given workflow according to the above categorization. The algorithm only needs to know the set of data items that each task will potentially read and write, and the set of dependencies among tasks. Note, however, that the approach we introduce in section 6 still applies even if a task actually reads (writes) only a subset of that set.

Algorithm 1 [Identifying the Type of Dependency]

for every $t_i \xrightarrow{x} t_j$ in $W$ where $L(t_j) < L(t_i)$   /* for every high-to-low dependency */
    if $\exists\, r_i[d] \in t_i$ and $w_j[d] \in t_j$ where $i \neq j$   /* if the two tasks conflict */
        if $t_i \xrightarrow{x} t_j$ is abortive
            label $x$ with CN-RD   /* abortive CN type dependencies are RD */
        else
            label $x$ with CN-RI   /* non-abortive CN type dependencies are RI */
    else
        label $x$ with CF-RD   /* all conflict-free dependencies must be RD */
end{for} □
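To make Algorithm 1 concrete, here is a minimal Python sketch of it. It is an illustration under stated assumptions, not the authors' implementation: levels are assumed to be totally ordered, each task is reduced to the sets of data items it may read and write, and whether a dependency is abortive is supplied as an input flag. The class names, task names and the data items h and p are hypothetical.

```python
from dataclasses import dataclass, field

# Assumption: security levels are totally ordered and encoded as integers.
RANK = {"low": 0, "mid": 1, "high": 2}


@dataclass
class Task:
    name: str
    level: str
    reads: set = field(default_factory=set)   # items the task may read
    writes: set = field(default_factory=set)  # items the task may write


@dataclass
class Dependency:
    parent: str
    child: str
    kind: str       # dependency type, e.g. "bc"; kept only for reference
    abortive: bool  # can enforcing it force the child to abort?


def classify(tasks, deps):
    """Label each high-to-low dependency CN-RD, CN-RI or CF-RD (Algorithm 1)."""
    labels = {}
    for dep in deps:
        ti, tj = tasks[dep.parent], tasks[dep.child]
        if RANK[tj.level] >= RANK[ti.level]:
            continue  # not high-to-low: nothing to classify
        key = (dep.parent, dep.child)
        if ti.reads & tj.writes:  # some r_i[d] in t_i and w_j[d] in t_j
            labels[key] = "CN-RD" if dep.abortive else "CN-RI"
        else:
            labels[key] = "CF-RD"  # conflict-free dependencies are always RD
    return labels


# Hypothetical workflow loosely modeled on example 1: the high task t2 reads the
# low items n and o, which the low task t3 later overwrites.
tasks = {
    "t1": Task("t1", "low"),
    "t2": Task("t2", "high", reads={"n", "o", "h"}, writes={"h"}),
    "t3": Task("t3", "low", writes={"n", "o"}),
    "t4": Task("t4", "low", writes={"p"}),
}
deps = [
    Dependency("t1", "t2", "bc", abortive=False),  # low-to-high: skipped
    Dependency("t2", "t3", "bc", abortive=False),  # conflicting, non-abortive
    Dependency("t2", "t4", "e", abortive=True),    # no shared data items
]
labels = classify(tasks, deps)
print(labels)
```

With these inputs, $t_2 \xrightarrow{bc} t_3$ is labeled CN-RI and $t_2 \xrightarrow{e} t_4$ is labeled CF-RD, while the low-to-high $t_1 \xrightarrow{bc} t_2$ is left unlabeled. No branch can produce CF-RI, matching the observation that no dependency is both CF and RI.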

5

Notion of Equivalence

In this section, we develop the formalism needed to prove the correctness of our MLS workflow execution model. Given a dependency $t_i \xrightarrow{x} t_j$, we say that $t_i$ is the parent of $t_j$ and $t_j$ is the child of $t_i$.

Definition 9 Given two tasks $t_i$ and $t_j$ in $W$, $t_i$ is said to be an ancestor (descendant) of $t_j$ if $t_i$ is a parent (child) of $t_j$, or $t_i$ is a parent (child) of some $t_k$ that is an ancestor (descendant) of $t_j$. □

Definition 10 Given a workflow $W$, a potential-state-set of $W$ (denoted $PSS(W)$) is a set in which each element (called a potential state) represents an allowed combination of the final states of all the tasks in $W$. □

According to this definition, $PSS(W)$ is a set of sets. As an example, consider the workflow $W = \{t_1, t_2\}$ and the dependency $t_1 \longrightarrow t_2$. Then $PSS(W) = \{(ab_1, ab_2), (\sim ab_1, ab_2), (\sim ab_1, \sim ab_2)\} = \{(ab_1, ab_2), (cm_1, ab_2), (cm_1, cm_2)\}$. Note that a different dependency between $t_1$ and $t_2$ may result in an entirely different $PSS(W)$.

We define a workflow history as the set consisting only of the data operations of all the tasks in the workflow, i.e., with all task primitive operations removed. Formally:

Definition 11 [Workflow History] Given a workflow $W$ comprising a set of tasks $\{t_1, t_2, \ldots, t_n\}$, the workflow history $W_H$ of $W$ is defined as $W_H = \bigcup_{i=1}^{n} t_i^H$, where $t_i^H = t_i - PO_i$. □

Our approach to executing MLS workflows is to redesign the workflow using two mechanisms: splitting a task and running a compensating task. In the following, we develop the formalism needed to show that the original workflow is, in a precise sense, equivalent to the redesigned workflow.

Definition 12 Given a task $t_i$ and a security level $s \le L(t_i)$, we say that there exists a partition $t_i^s$ of $t_i$ if $t_i^s = \{o_i[d] \in t_i \mid L(d) = s\} \neq \emptyset$. □
Since a task $t_i$ may read and write data at its own level and may read data at lower levels, all read operations on data at a lower level $s$ belong to one partition ($t_i^s$), whereas all read and write operations on data at the level of the task belong to another partition. Thus, if a task reads from two lower levels and reads or writes at its own level, it has three partitions. As an example, consider the task $t_i = r_i[x]\, r_i[y]\, r_i[z]\, w_i[z]$, where $L(t_i) = high$, $L(x) = low$, $L(y) = mid$, $L(z) = high$, and $low < mid < high$ in $S$. Then $t_i$ has three partitions: $t_i^{low} = r_i[x]$, $t_i^{mid} = r_i[y]$ and $t_i^{high} = r_i[z]\, w_i[z]$. Thus, every partition accesses data at only one security level.
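The partitioning of Definition 12 can be sketched as below, assuming the total order low < mid < high used in the example; the function name and the (kind, item) encoding of operations are illustrative only. The sketch reproduces the three partitions of the example task and rejects operations the security model forbids (reading up or writing down).

```python
from collections import defaultdict

# Assumption: a total order low < mid < high, as in the example above.
RANK = {"low": 0, "mid": 1, "high": 2}


def partitions(ops, task_level, data_level):
    """Group a task's operations by the level of the item they access (Definition 12).

    ops:        list of (kind, item) pairs, kind being "r" or "w"
    data_level: maps each data item to its security level
    """
    parts = defaultdict(list)
    for kind, item in ops:
        s = data_level[item]
        # The security model allows reads at or below L(t_i) and writes only at L(t_i).
        if RANK[s] > RANK[task_level] or (kind == "w" and s != task_level):
            raise ValueError(f"{kind}[{item}] is not allowed for a {task_level} task")
        parts[s].append(f"{kind}[{item}]")
    return dict(parts)


# The example task t_i = r[x] r[y] r[z] w[z] with L(t_i) = high.
example = partitions(
    [("r", "x"), ("r", "y"), ("r", "z"), ("w", "z")],
    "high",
    {"x": "low", "y": "mid", "z": "high"},
)
print(example)  # {'low': ['r[x]'], 'mid': ['r[y]'], 'high': ['r[z]', 'w[z]']}
```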

Definition 13 A task $t_i$ is said to be compensatable if the effects of its execution can be semantically undone by executing a compensating task $t_i^{-1}$. □

We assume that all compensating tasks eventually commit; in other words, $ex_i^{-1} \Rightarrow cm_i^{-1}$.

Definition 14 If there exists a compensating task $t_i^{-1}$ for a task $t_i$, then $L(t_i^{-1}) = L(t_i)$. □

Definition 15 [Semantic Projection] Given a workflow history $W_H$, the semantic projection of $W_H$, denoted $S(W_H)$, is obtained as follows: (1) for every pair of tasks $t_i$ and $t_i^{-1}$ in $W$, if both $t_i$ and $t_i^{-1}$ commit, then remove all $DO_i$ and $DO_i^{-1}$ from $W_H$; (2) for every $t_i$ in $W$, if $t_i$ aborts, then remove $DO_i$ and $DO_i^s$ of every existing partition $t_i^s$ of $t_i$, for all $s < L(t_i)$, from $W_H$. □

The first item states that if a task is compensated, then all its operations, as well as those of its compensating task, are removed from the workflow history. The second item states that if a task aborts, all its operations are removed from the history.

Definition 16 [Semantic Transformation] Given a potential-state-set $PSS(W)$, the semantic transformation $\tau$ of $PSS(W)$ is obtained by applying the following rules to every $P \in PSS(W)$: (1) replace every occurrence of $cm_i, cm_i^{-1}$ with $in_i$; (2) remove every occurrence of $in_i^{-1}$; (3) if $cm_i^s \in P$ where $s = L(t_i)$, then replace every $cm_i^{s'}$ such that $s' \le L(t_i)$ with $cm_i$; otherwise, replace every $ab_i^{s'}$ and $cm_i^{s'}$ such that $s' \le L(t_i)$ with $ab_i$; and (4) replace every occurrence of $in_i$ with $ab_i$. □

Since the potential-state-set of a workflow represents the set of all possible final states, the semantic transformation replaces a final state, or a combination of final states, with its semantically equivalent state. More specifically, the first rule states that if a task executes successfully and is later compensated, it is treated as if it had never started its execution. The second rule removes the states of all unexecuted compensating tasks. The third rule states that if a task commits, all its lower-level partitions are also considered committed; if it aborts, its lower-level partitions are considered aborted. Finally, the fourth rule treats every unexecuted task as if it had aborted.
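A minimal sketch of the semantic projection of Definition 15, assuming each data operation is tagged with its base task name (shared by all partitions of the task) and a flag marking operations of the compensating task. The Op record and the "t1^-1" naming convention for compensating tasks are hypothetical.

```python
from typing import NamedTuple


class Op(NamedTuple):
    task: str           # base task name, shared by all partitions t_i^s of t_i
    compensating: bool  # True if the operation belongs to t_i^{-1}
    text: str           # the data operation itself, e.g. "r[x]"


def semantic_projection(history, committed, aborted):
    """Sketch of S(W_H) from Definition 15.

    committed: names of committed tasks; a committed t_i^{-1} is written "t_i^-1"
    aborted:   names of aborted tasks (their partitions go with them)
    """
    # Rule (1): t_i counts as compensated when both t_i and t_i^{-1} committed.
    compensated = {t for t in committed if f"{t}^-1" in committed}
    kept = []
    for op in history:
        if op.task in compensated:
            continue  # drop DO_i and DO_i^{-1}
        if op.task in aborted and not op.compensating:
            continue  # rule (2): drop DO_i and DO_i^s of an aborted t_i
        kept.append(op)
    return kept


history = [
    Op("t1", False, "r[x]"), Op("t1", False, "w[y]"),  # t1, later compensated
    Op("t2", False, "r[x]"),                           # t2 commits normally
    Op("t1", True, "w[y]"),                            # t1^{-1} undoes w[y]
    Op("t3", False, "w[z]"),                           # t3 aborts
]
projected = semantic_projection(history, {"t1", "t1^-1", "t2"}, {"t3"})
print([(op.task, op.text) for op in projected])  # [('t2', 'r[x]')]
```

Here $t_1$ and $t_1^{-1}$ both commit, so rule (1) removes all their operations, and rule (2) removes the operation of the aborted $t_3$, leaving only $r_2[x]$.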
Definition 17 [Semantic Equivalence] Given two workflows $W$ and $W'$, we say that $W'$ is semantically equivalent to $W$, denoted $W \cong W'$, if (1) the semantic projections of $W_H$ and $W'_H$ consist of the same data operations, i.e., $S(W_H) = S(W'_H)$; (2) for every pair of conflicting operations $o_i[d]$ and $o_j[d]$, if $t_i$ is an ancestor of $t_j$ in $W$, then $t_i$ is an ancestor of $t_j$ in $W'$; and (3) $PSS(W) \equiv_\tau PSS(W')$. □

Definition 18 Given a dependency $t_i \xrightarrow{x} t_j$ of type $x$, we define an inverse $x^{-1}$ of $x$ such that $PSS(W) \equiv_\tau PSS(W')$, where $W = \{t_i, t_j\}$ with the dependency $t_i \xrightarrow{x} t_j$, and $W' = \{t_i, t_j, t_j^{-1}\}$ with the dependencies $t_j \xrightarrow{bc} t_j^{-1}$ and $t_i \xrightarrow{x^{-1}} t_j^{-1}$, in which $t_j^{-1}$ is a compensating task of $t_j$. □

In order to derive such an inverse for each dependency, we make the following assumptions:
1. $ex_i \Rightarrow$ either $ab_i$ or $cm_i$
2. $\sim ab_i \equiv cm_i$
3. $\sim cm_i \equiv ab_i$
4. $\sim ex_i \equiv in_i$

The first assumption indicates that every task that starts its execution must either commit or abort. Assumptions 2 and 3 state that abort and commit are complements of each other.

$x$      $x^{-1}$
bc       fba
a        fba
e        fbc
ba       fbc
sc       fba

Table 1: Some RD type dependencies and their inverses

Assumption 4 states that a task that has not begun is functionally equivalent to a task in its initial state. Table 1 lists some RD dependencies and their inverses, derived from these assumptions.

Lemma 1 The dependency fba is an inverse of bc.

Proof: We show that fba is an inverse of bc by showing that $PSS(W) \equiv_\tau PSS(W')$, where $W = \{t_i, t_j\}$ with the dependency $t_i \xrightarrow{bc} t_j$, and $W' = \{t_i, t_j, t_j^{-1}\}$ with the dependencies $t_j \xrightarrow{bc} t_j^{-1}$ and $t_i \xrightarrow{fba} t_j^{-1}$.

Recall from section 2.1.1 that $t_i \xrightarrow{bc} t_j$ represents the dependency that task $t_j$ cannot begin until task $t_i$ commits, which implies the logical relationship $cm_i \Leftarrow ex_j$. Hence the admissible combinations of $cm_i$ and $ex_j$ are $(cm_i, ex_j)$, $(cm_i, \sim ex_j)$ and $(\sim cm_i, \sim ex_j)$. Applying the above assumptions to these combinations yields
$PSS(W) = \{(cm_i, cm_j), (cm_i, ab_j), (cm_i, in_j), (ab_i, in_j)\} = \{(cm_i, cm_j), (cm_i, ab_j), (ab_i, ab_j)\}$.

Now consider $W'$. The dependency $t_j \xrightarrow{bc} t_j^{-1}$ implies the logical relationship $cm_j \Leftarrow ex_j^{-1}$. On the other hand, $t_i \xrightarrow{fba} t_j^{-1}$ states that if $t_i$ aborts, then the compensating task $t_j^{-1}$ must begin its execution; it therefore implies $ab_i \Rightarrow ex_j^{-1}$. Together, these two dependencies admit the following combinations of $ab_i$, $ex_j^{-1}$ and $cm_j$: $(ab_i, ex_j^{-1}, cm_j)$, $(\sim ab_i, ex_j^{-1}, cm_j)$, $(\sim ab_i, \sim ex_j^{-1}, cm_j)$ and $(\sim ab_i, \sim ex_j^{-1}, \sim cm_j)$. (There is no combination with $ab_j^{-1}$, because of our assumption that every compensating task eventually commits.) Applying our set of assumptions yields
$PSS(W') = \{(ab_i, cm_j^{-1}, cm_j), (cm_i, cm_j^{-1}, cm_j), (cm_i, in_j^{-1}, cm_j), (cm_i, in_j^{-1}, ab_j)\}$.
Applying the semantic transformation $\tau$ to $PSS(W')$ yields $\{(ab_i, in_j), (cm_i, in_j), (cm_i, cm_j), (cm_i, ab_j)\} = \{(ab_i, ab_j), (cm_i, ab_j), (cm_i, cm_j)\}$, which is equivalent to $PSS(W)$. □
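The proof of Lemma 1 can be checked mechanically by enumerating final states. The sketch below assumes, as the proof does, that $t_i$ always runs, and encodes each final state as "cm", "ab" or "in"; it implements only rules (1), (2) and (4) of the semantic transformation, since no task is split here. All function names are illustrative.

```python
from itertools import product

# Final states: "cm" committed, "ab" aborted, "in" never started.
# Assumption 1: a started task ends in cm or ab. t_i is assumed to run; a
# compensating task always commits once it starts, so it ends in cm or in.
PARENT, CHILD, COMP = ("cm", "ab"), ("cm", "ab", "in"), ("cm", "in")


def pss_w():
    """PSS(W) for t_i --bc--> t_j: ex_j implies cm_i."""
    return {(ti, tj) for ti, tj in product(PARENT, CHILD)
            if tj == "in" or ti == "cm"}


def pss_w_prime():
    """PSS(W') for t_j --bc--> t_j^{-1} and t_i --fba--> t_j^{-1}."""
    return {(ti, tj, tc) for ti, tj, tc in product(PARENT, CHILD, COMP)
            if (tc == "in" or tj == "cm")     # ex_j^{-1} implies cm_j
            and (ti != "ab" or tc == "cm")}   # ab_i implies ex_j^{-1}


def as_aborted(state):
    """Rule (4) of tau: an unexecuted task counts as aborted."""
    return "ab" if state == "in" else state


def tau(state):
    """Rules (1), (2) and (4) of Definition 16 (there are no partitions here)."""
    ti, tj, *comp = state
    if comp and comp[0] == "cm" and tj == "cm":
        tj = "in"  # rule (1): committed and then compensated = never ran
    # rules (1)/(2): the compensating task's own state is dropped
    return (as_aborted(ti), as_aborted(tj))


lhs = {tau(s) for s in pss_w()}
rhs = {tau(s) for s in pss_w_prime()}
print(sorted(lhs), lhs == rhs)
```

It prints [('ab', 'ab'), ('cm', 'ab'), ('cm', 'cm')] True. The raw enumeration yields five states for $W'$ rather than the four listed in the proof, because it keeps $(cm_i, in_j^{-1}, in_j)$ apart from $(cm_i, in_j^{-1}, ab_j)$; both map to $(cm_i, ab_j)$ under $\tau$.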

6

Execution of MLS Workflows

Enforcing a low-to-high dependency will not result in violation of security. By contrast, a signaling channel may be established while enforcing a high-to-low dependency. The high-to-low dependencies are, therefore, much more difficult to handle than low-to-high dependencies. In the remainder of this section, we first briefly summarize two possible approaches to enforcing low-to-high dependencies. We then discuss approaches to enforcing high-to-low dependencies which are the focus of our paper.

6.1

Low-to-High Dependencies

Consider a task dependency $t_i \xrightarrow{bc} t_j$ such that $L(t_i) < L(t_j)$, meaning that $t_j$ can begin only after $t_i$ commits. Enforcing such a dependency requires a mechanism by which the higher-level task $t_j$ is activated upon the commit of $t_i$. Several approaches can be devised. The first is based on triggers. Under this approach, a trigger is incorporated into $t_i$: upon its commit, $t_i$ fires a trigger at the high level, at which point the high task $t_j$ can begin. This does not violate security, since it is equivalent to writing up. To ensure that the trigger is delivered from low to high, and thus to increase reliability, this approach can be complemented by mechanisms supporting the reliable transfer of messages in multilevel systems. Recently, an approach called the NRL Pump has been proposed [14], which provides reliable transfer of messages from lower to higher levels with a controlled stream of acknowledgments from higher to lower levels. Even though a Trojan Horse can still leak some bits, the channel capacity can be kept so small that no meaningful leakage of information is possible [16]. An analysis carried out in [14] measures the capacity of this signaling channel.

Another approach is based on testing a precondition that must be satisfied before the high task $t_j$ can begin. This can be implemented by having the high task periodically read some data at the low level and check whether the precondition is satisfied. Note that this approach does not require secure message passing, as the first one does; on the other hand, it requires the high task to poll low data to test the precondition.
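A minimal, single-process sketch of the precondition-testing approach, assuming the low task records its commit in a low data item (the name ti_committed is hypothetical) and a toy round-robin scheduler over Python generators stands in for real concurrency. The point it illustrates is that the high task only ever reads the low item, so no information flows from high to low.

```python
# Hypothetical sketch of precondition testing for t_i --bc--> t_j with
# L(t_i) < L(t_j). Generators and a round-robin loop stand in for a real system.


def low_task(store, steps_before_commit):
    """Low task t_i: does some work, then writes its commit flag at its own level."""
    for _ in range(steps_before_commit):
        yield
    store["ti_committed"] = True
    yield


def high_task(store, log):
    """High task t_j: polls the low flag (a read-down) until the precondition holds."""
    polls = 0
    while not store["ti_committed"]:
        polls += 1
        yield  # wait for the next polling interval
    log.append(("tj begins", polls))


def run(steps_before_commit):
    store, log = {"ti_committed": False}, []
    active = [low_task(store, steps_before_commit), high_task(store, log)]
    while active:  # toy round-robin scheduler
        for task in list(active):
            try:
                next(task)
            except StopIteration:
                active.remove(task)
    return log


print(run(3))  # [('tj begins', 3)]
```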

6.2

High-to-Low Dependencies

In this section, we present our approach to handling high-to-low dependencies. Since CN-RI dependencies are conflicting, the main issue is how to synchronize the tasks so as to satisfy the dependency without introducing signaling channels. Our approach to CN-RI dependencies eliminates the high-to-low dependency by splitting the high-level task. The purpose of a CF-RD dependency is to force a low-level task into a certain state according to the state of a high-level task. Our approach to CF-RD dependencies compensates the low task when necessary by executing an inverse transaction. Finally, we view CN-RD as a combination of CN-RI and CF-RD, since it may be due to conflicting operations as well as to the semantics of the workflow.

6.2.1

Enforcing CN-RI type high-to-low Dependencies

As described earlier, these dependencies arise when a high task must read a low data item before it is modified by another low task. As in example 1, the intention of the dependency $t_2 \xrightarrow{bc} t_3$ is to prevent $t_3$ from overwriting the low values of data items n and o before they are read by $t_2$. Since delaying $t_3$ until $t_2$ commits would create a signaling channel, we propose two possible approaches to this problem. The first is based on the multiple versions approach presented in [10]; the second is new and is proposed in this paper.

Maintaining Multiple Versions: One may use multiple versions of data to cope with such high-to-low dependencies. Whenever a task writes a data item d, a new version of d is created, so the value yet to be read by a high transaction is preserved as an older version of d. Costich and Jajodia [10] have proposed an approach in which an index is associated with each read/write operation. When a multilevel transaction first updates d, the write is indexed by 1, the next write to d is indexed by 2, and so on; when it reads d, the read operation is assigned the same index as the preceding write operation. This indexing preserves the dependencies by allowing a high task (which they call a section of the multilevel transaction) to read the appropriate version of d. This approach has the major drawback of requiring a special-purpose multiversion concurrency control mechanism. The approach we propose here (described below) has no such requirement and can therefore be supported by any standard DBMS. A similar approach to preserving high-to-low dependencies has been proposed by Smith et al. [19]. It uses a cache to save data to be read by a high section, so that even if a low section overwrites the data before the high section reads them, the dependency is still preserved. Our approach of splitting the task (presented below) provides a framework for enforcing high-to-low dependencies that can be implemented using Smith et al.'s caching scheme.

[Figure 5: a workflow of tasks t1 to t9 placed at the high, mid and low levels, with dependencies labeled x1 to x9.]
Figure 5: An example demonstrating the closest ancestor

Splitting the High Task: In our approach, the operations of every task are first reordered so that all read operations on lower-level data items occur before all operations on data items at the level of the task. (Since the reordered operations are all reads of lower-level data, this does not affect the correctness of the task.) The task is then divided into partitions based on the data items it accesses. For example, the high task $t_2$ in example 1 is split into two tasks: the first, $t_2^{low}$, contains all the low read operations, and the second, $t_2^{high}$, contains all the high read/write operations. We introduce a begin-on-commit dependency with data flow, $t_2^{low} \xrightarrow{bc(data)} t_2^{high}$, to ensure that the data read by the operations in $t_2^{low}$ are in fact carried over to $t_2^{high}$ even in the presence of other interfering tasks. Then, to enforce $t_1 \xrightarrow{bc} t_2$ and $t_2 \xrightarrow{bc} t_3$, we convert these two dependencies into $t_1 \xrightarrow{bc} t_2^{low}$ and $t_2^{low} \xrightarrow{bc} t_3$, as shown in figure 6. The low task $t_3$ proceeds only if $t_2^{low}$ commits, thus preserving the high-to-low dependency between $t_2$ and $t_3$. In the following, we formally present our approach.

Definition 19 Given two tasks $t_i$ and $t_j$ in $W$, $t_j$ is said to be a closest-s-ancestor of $t_i$ if (1) $L(t_j) = s$, (2) $t_j$ is an ancestor of $t_i$, and (3) there exists no $t_k$ in $W$ such that $L(t_k) = s$ and $t_k$ is an ancestor of $t_i$ and a descendant of $t_j$. □

For example, in the workflow shown in figure 5, $t_8$ and $t_1$ are the closest-mid-ancestors of $t_7$, and $t_3$ is the closest-low-ancestor of $t_7$. By contrast, $t_6$ is not a closest-mid-ancestor of $t_7$, because $t_8$ is an ancestor of $t_7$ and a descendant of $t_6$. The following algorithm specifies our approach to task splitting.

Algorithm 2 [split task]

for every $t_i \xrightarrow{x_{dep}} t_j$ in $W$ where $dep$ is CN-RI
    Step 1: split $t_i$ into partitions, where each partition $t_i^s = \{o_i[d] \in t_i \mid L(d) = s \wedge s \le L(t_i)\}$
        /* divide the high parent so that each partition consists of operations on data items of the same security level */
        replace $t_i$ with $t_i^{L(t_i)}$ and add all $t_i^s$ where $s < L(t_i)$ to $W$
    Step 2: for each $t_i^s$ such that $s < L(t_i)$
        add an edge $t_i^s \xrightarrow{bc(data)} t_i^{L(t_i)}$
            /* add a bc dependency with data flow from every lower-level partition to the partition at level $L(t_i)$ */
        for every $t_k$ in $W$ where $t_k$ is the closest-s-ancestor of $t_i$
            add an edge $t_k \xrightarrow{bc} t_i^s$
        end{for}
    end{for}
    Step 3: add an edge $t_i^s \xrightarrow{x} t_j$ such that $s = L(t_j)$
end{for} □

[Figure 6: high level: $t_2^{high}$; low level: $t_1$, $t_2^{low}$ and $t_3$; edges $t_1 \xrightarrow{bc} t_2^{low}$, $t_2^{low} \xrightarrow{bc(data)} t_2^{high}$ and $t_2^{low} \xrightarrow{bc} t_3$.]
Figure 6: Modified workflow after executing split task for Example 1

[Figure 7: tasks $t_1$, $t_2$, $t_3$, $t_4$ and the compensating task $t_2^{-1}$, connected by bc and fba edges.]
Figure 7: Modified workflow after executing compensate task for Example 2

According to the above algorithm, given a dependency $t_i \xrightarrow{x_{CN-RI}} t_j$, we first split $t_i$ into partitions based on the security level of the data item involved in each operation of $t_i$. We then add an edge of type bc (begin-on-commit) with data flow from each existing partition $t_i^s$ at a lower level $s < L(t_i)$ to the highest-level partition $t_i^{L(t_i)}$. This ensures that the data read by a low partition reach the high partition. Next, for each level at which $t_i$ has a partition, we add an edge of type bc from every closest ancestor $t_k$ of $t_i$ at that level to that partition. This edge ensures that splitting does not remove any dependency from $t_k$ to $t_i^s$. We need to take care of the case in which $t_k$ and $t_i^s$ conflict. An abortive dependency need not be enforced here: since $t_i^s$ is always read-only, there is no order of conflicting operations to preserve if $t_i^s$ aborts. Thus $t_k \xrightarrow{x} t_i^s$ is always of type bc. However, the algorithm does not add an edge from a closest-s-ancestor $t_k$ of $t_i$ to a partition $t_i^{s'}$ where $s' < s$, because the dependency path from $t_k$ to $t_i$ is not meant to capture a dependency on data at the lower level $s'$. For example, in the workflow shown in figure 5, if $t_7^{mid}$ exists and $t_7^{low}$ does not, we need to add the dependencies $t_1 \xrightarrow{bc} t_7^{mid}$ and $t_8 \xrightarrow{bc} t_7^{mid}$, but we need not add any dependency from $t_3$, even though it is a closest-low-ancestor of $t_7$. On the other hand, if $t_7^{mid}$ does not exist and only $t_7^{low}$ exists, we need to add only the dependency $t_3 \xrightarrow{bc} t_7^{low}$. In the last step, we add an edge from the partition at the level of $t_j$ to $t_j$; this edge is of the same type as the original dependency.
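The following Python sketch applies Algorithm 2 to a single CN-RI dependency. It assumes totally ordered levels, an acyclic dependency graph of (parent, child, kind) edges, and partition names of the form "t2^low"; all function names are illustrative, and the high data item h in the example is hypothetical. The closest_ancestors helper is a direct reading of Definition 19: among the level-s ancestors of a task, it keeps those with no other level-s ancestor below them.

```python
RANK = {"low": 0, "mid": 1, "high": 2}  # assumed total order


def descendants(edges, node):
    """All tasks reachable from node along dependency edges (p, c, kind)."""
    seen, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for p, c, _ in edges:
            if p == cur and c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def closest_ancestors(edges, level_of, node, s):
    """Closest-s-ancestors of node (Definition 19)."""
    cands = {t for t, lvl in level_of.items()
             if lvl == s and t != node and node in descendants(edges, t)}
    # t is closest unless another level-s ancestor of node lies below t.
    return {t for t in cands if not (cands - {t}) & descendants(edges, t)}


def split_task(ops, level_of, data_level, edges, dep):
    """Apply Algorithm 2 to dep = (t_i, t_j, x); returns (parts, levels, edges)."""
    ti, tj, x = dep
    top = level_of[ti]
    top_name = f"{ti}^{top}"
    parts = {}
    for kind, item in ops[ti]:  # Step 1: one partition per data level
        parts.setdefault(data_level[item], []).append(f"{kind}[{item}]")

    def rename(t):  # t_i is replaced by its top-level partition
        return top_name if t == ti else t

    # The CN-RI edge itself is dropped here and re-added from a partition in Step 3.
    new_edges = [(rename(p), rename(c), k) for p, c, k in edges if (p, c, k) != dep]
    new_levels = {t: lvl for t, lvl in level_of.items() if t != ti}
    new_levels[top_name] = top
    for s in parts:  # Step 2
        if RANK[s] >= RANK[top]:
            continue
        low_name = f"{ti}^{s}"
        new_levels[low_name] = s
        new_edges.append((low_name, top_name, "bc(data)"))
        for tk in sorted(closest_ancestors(edges, level_of, ti, s)):
            new_edges.append((tk, low_name, "bc"))
    new_edges.append((f"{ti}^{level_of[tj]}", tj, x))  # Step 3
    return parts, new_levels, new_edges


# Example 1 shape: t1 --bc--> t2 --bc--> t3, with t2 high reading low n and o.
ops = {"t2": [("r", "n"), ("r", "h"), ("r", "o"), ("w", "h")]}
level_of = {"t1": "low", "t2": "high", "t3": "low"}
data_level = {"n": "low", "o": "low", "h": "high"}
edges = [("t1", "t2", "bc"), ("t2", "t3", "bc")]
parts, levels, new_edges = split_task(ops, level_of, data_level, edges, ("t2", "t3", "bc"))
for edge in new_edges:
    print(edge)

# A hypothetical reading of figure 5, used only to exercise Definition 19.
fig5_edges = [("t6", "t8", "bc"), ("t8", "t7", "bc"), ("t1", "t7", "bc"), ("t3", "t7", "bc")]
fig5_levels = {"t6": "mid", "t8": "mid", "t1": "mid", "t3": "low", "t7": "high"}
print(sorted(closest_ancestors(fig5_edges, fig5_levels, "t7", "mid")))
```

For the example 1 shape, the result contains $t_1 \xrightarrow{bc} t_2^{low}$, $t_2^{low} \xrightarrow{bc(data)} t_2^{high}$ and $t_2^{low} \xrightarrow{bc} t_3$, plus the renamed edge $t_1 \xrightarrow{bc} t_2^{high}$, which is low-to-high and implied by the others; no high-to-low edge remains. On the figure 5 reading, the closest-mid-ancestors of $t_7$ come out as $t_1$ and $t_8$, while $t_6$ is excluded.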

Theorem 1 Let $W$ be a workflow consisting of all CN-RI dependencies. Let $W'$ be the workflow obtained by applying algorithm 2 to $W$. Then $W' \cong W$. □

6.2.2

Enforcing CF-RD type high-to-low Dependencies

As noted earlier, weak abortive, strong non-abortive and weak non-abortive dependencies fall into the CF-RD category. In this section, we first present a straightforward approach, employed in many systems, that is based on a buffer. It has, however, the drawback of introducing a signaling channel, albeit one of limited capacity. We then present our approach, based on compensating tasks, which does not have this drawback. In our approach, the compensating tasks sometimes need to be executed by a trusted subject, such as a human user. This is not necessarily a drawback, because workflow systems are designed to support user interaction; requiring the intervention of a user is therefore natural in a workflow environment.

Using a buffer. One approach is to use a buffer at the high level (assumed to be sufficiently large) in which the commit message of the high task is stored. This message is first delayed by a random duration and then transmitted to the low level. If several such messages of a single workflow accumulate during the delay period of the first message, they are not sent at the same time, but individually, with a delay between consecutive messages. Thus, although a channel of downward information flow exists, its capacity is low. (If the channel capacity does not exceed 100 bits per second, the channel is regarded as acceptable at the B3 or A1 level.) Note that this approach may degrade system performance, because the signal from high is delayed, which unnecessarily delays all the low tasks.

Running Compensating Tasks. In this paper, we propose an alternative approach that enforces an RD type high-to-low dependency by executing a compensating task. We show below that some high-to-low dependencies (strong non-abortive and weak abortive) can be enforced by compensating a low task. However, this approach assumes that the low tasks are compensatable. In the special situation where no compensating task can be found, an equivalent dependency approach can be employed to enforce an RD type dependency. The idea of the equivalent dependency approach is to find another dependency that is logically equivalent to the original one (refer to [5] for details). Note that the low compensating task needs to be initiated by a trusted subject. This approach, however, is applicable only to tasks for which a compensating task exists.

The formalism developed in section 5 can be used to enforce high-to-low dependencies as follows. For example, if there exists a high-to-low dependency $t_i \xrightarrow{bc} t_j$, since the inverse of bc is fba and bc is strong and non-abortive, we replace this dependency with $t_i \xrightarrow{fba} t_j^{-1}$, meaning that the compensating task $t_j^{-1}$ must begin if $t_i$ aborts. That is, $t_i$ and $t_j$ can be executed independently; $t_j$ need not wait for $t_i$, which eliminates potential signaling channels. However, if $t_i$ aborts, the compensating task $t_j^{-1}$ is started. This compensating task must be executed by a trusted subject, e.g., a human user. If there are any dependencies involving $t_j$ (e.g., $t_j \xrightarrow{x} t_k$), compensating $t_j$ requires $t_k$ to be compensated as well, to capture the cascaded compensation.
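The buffer-based approach described above can be modeled as a release schedule. The sketch below is a toy model of timing only, not a capacity analysis: it assumes delays drawn uniformly from [min_delay, max_delay] (both parameters, and the function name, are hypothetical) and releases each buffered commit message a random delay after it arrives or after the previous message left, whichever is later, so messages never leave together.

```python
import random


def release_times(arrivals, min_delay, max_delay, rng):
    """Toy schedule for downgrading high commit messages through a buffer.

    Each message leaves a random delay after it arrives or after the previous
    message left, whichever is later, so buffered messages never leave together.
    """
    times, last = [], None
    for arrived in sorted(arrivals):
        start = arrived if last is None else max(arrived, last)
        last = start + rng.uniform(min_delay, max_delay)
        times.append(last)
    return times


# Three commits arriving close together are released one at a time.
arrivals = [0.0, 0.1, 0.2]
times = release_times(arrivals, 1.0, 2.0, random.Random(7))
print([round(t, 2) for t in times])
```

The schedule also makes the performance cost mentioned above visible: every low task waiting on one of these messages is delayed by at least min_delay, and by more when messages queue up.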


Since our approach does not compromise security, yet enforces equivalent compensating dependencies when all tasks are compensatable, it ensures SSSC-level execution. Figure 7 shows the modified workflow for example 2. The following algorithm shows how the above formalism can be used to redesign a high-to-low CF-RD dependency by introducing a compensating task.

Algorithm 3 [compensate task]

for every $t_i \xrightarrow{x_{CF-RD}} t_j$ in $W$
    if there exists a $t_j^{-1}$
        remove $t_i \xrightarrow{x} t_j$ from $W$
        if $x$ is strong non-abortive or weak abortive
            add a node $t_j^{-1}$ and edges $t_i \xrightarrow{x^{-1}} t_j^{-1}$ and $t_j \xrightarrow{bc} t_j^{-1}$ in $W$
                /* compensate the low task: add a new inverse dependency of the original high-to-low dependency from the parent to the compensating task of the child */
            for every $t_l$ where $t_l$ is the closest-$L(t_j)$-ancestor of $t_i$
                add an edge $t_l \xrightarrow{bc} t_j$ in $W$
            end{for}
            for every $t_j \xrightarrow{y} t_k$
                parent ← j and child ← k
                execute cascaded-compensation
            end{for}
        elseif $x$ is weak non-abortive
            add a node $t_i^{-1}$ and edges $t_j \xrightarrow{x^{-1}} t_i^{-1}$ and $t_i \xrightarrow{bc} t_i^{-1}$ in $W$
                /* compensate the high task: add a new inverse dependency of the original high-to-low dependency from the child to the compensating task of the parent */
            for every $t_i \xrightarrow{y} t_k$
                parent ← i and child ← k
                execute cascaded-compensation
            end{for}
end{for}

cascaded-compensation
    for each $t_{parent} \xrightarrow{z} t_{child}$
        if $L(t_{parent}) \le L(t_{child})$
            if there exists a $t_{child}^{-1}$
                add node $t_{child}^{-1}$ and edges $t_{child} \xrightarrow{bc} t_{child}^{-1}$ and $t_{parent}^{-1} \xrightarrow{fbb} t_{child}^{-1}$ to $W$
    end{for} □
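Below is a Python sketch of the first branch of Algorithm 3 (strong non-abortive or weak abortive dependencies), including the cascaded-compensation step. It assumes totally ordered levels, an acyclic graph of (parent, child, kind) edges and compensating tasks named like "t3^-1", and it only knows the inverses listed in Table 1; the weak non-abortive branch, which compensates the high task instead, is symmetric and omitted. The function names and the example workflow are hypothetical. The cascade recurses down the graph, following the "and so on" described after the algorithm, and stops at high-to-low edges or at tasks without a compensating task.

```python
RANK = {"low": 0, "mid": 1, "high": 2}  # assumed total order
INVERSE = {"bc": "fba", "a": "fba", "e": "fbc", "ba": "fbc", "sc": "fba"}  # Table 1


def reach(edges, node):
    """All tasks reachable from node along (parent, child, kind) edges."""
    seen, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for p, c, _ in edges:
            if p == cur and c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def closest_ancestors(edges, levels, node, s):
    """Closest-s-ancestors of node (Definition 19)."""
    cands = {t for t, lvl in levels.items() if lvl == s and node in reach(edges, t)}
    return {t for t in cands if not (cands - {t}) & reach(edges, t)}


def cascade(edges, levels, compensatable, parent, out):
    """Chain fbb edges from parent^-1 to the compensations of its descendants."""
    for p, c, _ in edges:
        if p == parent and RANK[levels[p]] <= RANK[levels[c]] and c in compensatable:
            levels[f"{c}^-1"] = levels[c]
            out |= {(c, f"{c}^-1", "bc"), (f"{p}^-1", f"{c}^-1", "fbb")}
            cascade(edges, levels, compensatable, c, out)


def compensate_low(edges, levels, compensatable, dep):
    """Replace dep = (t_i, t_j, x) by its inverse on t_j^{-1}; returns (edges, levels)."""
    ti, tj, x = dep
    if tj not in compensatable:
        return set(edges), dict(levels)  # no t_j^{-1}: leave W unchanged
    levels = dict(levels)
    new = {e for e in edges if e != dep}  # remove t_i --x--> t_j
    levels[f"{tj}^-1"] = levels[tj]  # Definition 14
    new |= {(ti, f"{tj}^-1", INVERSE[x]), (tj, f"{tj}^-1", "bc")}
    for tl in closest_ancestors(edges, levels, ti, levels[tj]):
        new.add((tl, tj, "bc"))
    cascade(edges, levels, compensatable, tj, new)
    return new, levels


# Hypothetical workflow: t1 --bc--> t2 (high) --bc--> t3 --bc--> t4.
levels = {"t1": "low", "t2": "high", "t3": "low", "t4": "low"}
edges = [("t1", "t2", "bc"), ("t2", "t3", "bc"), ("t3", "t4", "bc")]
new, lv = compensate_low(edges, levels, {"t3", "t4"}, ("t2", "t3", "bc"))
for edge in sorted(new):
    print(edge)
```

In the result, the high-to-low $t_2 \xrightarrow{bc} t_3$ is gone; the only edge from high to low is $t_2 \xrightarrow{fba} t_3^{-1}$, whose target is a compensating task to be run by a trusted subject, and $t_1 \xrightarrow{bc} t_3$ keeps $t_3$ after $t_2$'s low ancestor.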

The cascaded-compensation part of the above algorithm ensures that, in the event of $t_j$'s compensation, all its descendants are also compensated. We accomplish this by adding a force-begin-on-begin (fbb) dependency from $t_j^{-1}$ to $t_k^{-1}$, where $t_k$ is a child of $t_j$, from $t_k^{-1}$ to $t_l^{-1}$, where $t_l$ is a child of $t_k$, and so on. However, we need not continue this cascaded compensation until we reach the farthest descendant of $t_j$; we can stop whenever a high-to-low dependency is encountered along this path, since that dependency has already been redesigned by one of algorithms 2, 3 and 4, according to its type. It is important to note that if a compensating task $t_j^{-1}$ has more than one parent (other than $t_j$), $t_j^{-1}$ is initiated as soon as any one of its parents reaches the triggering state.

Theorem 2 Let $W$ be a workflow consisting of all CF-RD dependencies. Let $W'$ be the workflow obtained by applying algorithm 3 to $W$. If for every dependency $t_i \xrightarrow{x} t_j$ in $W$ where $x$ is non-abortive or weak abortive there exist a $t_j^{-1}$ and a $t_k^{-1}$ where $t_k$ is a descendant of $t_j$, then $W' \cong W$. □

6.2.3

Enforcing CN-RD type high-to-low Dependencies

As noted earlier, we treat CN-RD as a combination of CN-RI and CF-RD. Therefore, we use both split task and compensate task, in an approach called split compensate task, which consists of two steps. In the first step, the dependency is treated as if it were of type CN-RI, and the high task is split using the split task algorithm. In the second step, the CF-RD dependency between the high partition of the parent task and the child task is handled using the compensate task algorithm. Algorithm 4 formally presents this procedure.

Algorithm 4 [split compensate task]
for each control-flow dependency $(t_i \xrightarrow{x} t_j)$ in $W$ such that $L(t_i) \not\le L(t_j)$
    if ($x$ is of type CN-RD), then
        relabel $x$ as CN-RI
        execute algorithm split task
        add an edge $t_i^{L(t_i)} \xrightarrow{x} t_j$ and label $x$ as CF-RD
        execute algorithm compensate task
end{for} □

Theorem 3 Let $W$ be a workflow consisting of all CN-RD dependencies. Let $W'$ be the workflow obtained by applying algorithm 4 to $W$. Then $W' \cong W$. □

In figure 8, we summarize how each type of dependency can be redesigned.
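The summary in figure 8 amounts to a small decision table. As a sketch, assuming a dependency is described by its CN/CF class, an abortive flag and a strength (the function name and the string encoding are hypothetical), the choice of redesign approach could be written as:

```python
def redesign_approach(conflict, abortive, strength):
    """Redesign approach for a high-to-low dependency, following figure 8.

    conflict: "CN" or "CF"; abortive: bool; strength: "strong" or "weak".
    """
    if abortive and strength == "strong":
        # Figures 4 and 8 list abortive high-to-low dependencies only as weak.
        raise ValueError("strong abortive dependencies are not covered by figure 8")
    if conflict == "CN":
        # Abortive CN dependencies are CN-RD, non-abortive ones CN-RI (Algorithm 1).
        return "split and compensate" if abortive else "split"
    if abortive:
        return "compensate low"  # CF-RD, weak abortive
    return "compensate low" if strength == "strong" else "compensate high"


for case in [("CN", False, "weak"), ("CN", True, "weak"), ("CF", True, "weak"),
             ("CF", False, "strong"), ("CF", False, "weak")]:
    print(case, "->", redesign_approach(*case))
```

Only CF-RD weak non-abortive dependencies lead to compensating the high task; every other category is handled either by splitting the high task, by compensating the low task, or both.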

6.3

Algorithm for the Execution of MLS Workflows

Given a workflow specification $W$, we now present an algorithm to derive a workflow execution graph (WEG) that determines the execution order of the tasks in a workflow. Here we assume that each task dependency is of only one type.

Algorithm 5 [Constructing WEG from W]
nodes of WEG are the tasks in $W$
include all dependencies $(t_i \xrightarrow{x} t_j)$ in $W$ such that $L(t_i) \le L(t_j)$ as edges of WEG
for each control-flow dependency $(t_i \xrightarrow{x} t_j)$ in $W$ such that $L(t_i) \not\le L(t_j)$
    if ($x$ is of type CF-RD), then
        if ($x$ is a weak abortive or strong non-abortive dependency), then
            execute algorithm compensate task
        else ignore $x$
    elseif ($x$ is of type CN-RI), then
        execute algorithm split task
    elseif ($x$ is of type CN-RD), then
        execute algorithm split compensate task
end{for} □

[Figure 8 (grid of CN/CF versus RI/RD): CN-RI (non-abortive): split; CN-RD (weak abortive): split and compensate; CF-RD weak abortive and strong non-abortive: compensate low; CF-RD weak non-abortive: compensate high.]
Figure 8: Approach for redesigning each type of dependency

[Figure 9: tasks t1 to t7 at the high and low levels, connected by dependencies labeled a, b, bc, ba and sc.]
Figure 9: An example workflow (W)
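A skeleton of Algorithm 5 in Python, assuming security labels form a lattice of (rank, category set) pairs, so that the test $L(t_i) \not\le L(t_j)$ also catches incomparable levels. The redesign algorithms themselves are not run here; they are only recorded as work items. The Level type, the ROUTE table and the tasks in the example are hypothetical.

```python
from typing import NamedTuple


class Level(NamedTuple):
    """Assumed security label: a classification rank plus a set of categories."""
    rank: int
    cats: frozenset = frozenset()


def dominated(a, b):
    """a <= b in the lattice: lower-or-equal rank and a subset of categories."""
    return a.rank <= b.rank and a.cats <= b.cats


ROUTE = {"CN-RI": "split task", "CN-RD": "split compensate task"}


def build_weg(level_of, deps):
    """Skeleton of Algorithm 5.

    deps: (t_i, t_j, x, label, cf_kind) tuples, where label is the Algorithm 1
    result ("CN-RI", "CN-RD" or "CF-RD"; None when not needed) and cf_kind is
    "weak abortive", "strong non-abortive" or "weak non-abortive" for CF-RD.
    Returns the edges copied into the WEG and the redesign work to perform.
    """
    weg_edges, work = [], []
    for ti, tj, x, label, cf_kind in deps:
        if dominated(level_of[ti], level_of[tj]):
            weg_edges.append((ti, tj, x))  # L(t_i) <= L(t_j): keep as is
        elif label == "CF-RD":
            if cf_kind in ("weak abortive", "strong non-abortive"):
                work.append(("compensate task", (ti, tj, x)))
            # weak non-abortive CF-RD dependencies are ignored by Algorithm 5
        else:
            work.append((ROUTE[label], (ti, tj, x)))
    return weg_edges, work


A, B = frozenset({"A"}), frozenset({"B"})
level_of = {"t1": Level(0), "t2": Level(2), "t3": Level(0),
            "t4": Level(1, A), "t5": Level(1, B)}
deps = [
    ("t1", "t2", "bc", None, None),                      # low-to-high
    ("t2", "t3", "bc", "CN-RI", None),                   # high-to-low, conflicting
    ("t4", "t5", "bc", "CF-RD", "strong non-abortive"),  # incomparable levels
    ("t2", "t1", "x", "CF-RD", "weak non-abortive"),     # ignored
]
edges, work = build_weg(level_of, deps)
print(edges)
print(work)
```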

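[Figure 10: the redesigned workflow, in which $t_3$ is split into $t_3^{high}$ and $t_3^{low}$ and compensating tasks $t_5^{-1}$ and $t_7^{-1}$ are added, with edges labeled a, b, bc, bc(data), fba and fbc.]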

Figure 10: The WEG for W in figure 9

Theorem 4 Let $W$ be a workflow. Let WEG be the workflow obtained by applying algorithm 5 to $W$. Then WEG $\cong W$. □

The workflow execution graph thus constructed contains only low-to-high dependencies (including dependencies between tasks at the same level), except for dependencies whose child is a compensating task. Since all compensating transactions that involve a high-to-low dependency are executed by trusted subjects, enforcing the dependencies in WEG does not create any signaling channels. Figure 10 shows the WEG for the W in figure 9.

Theorem 5 Let $W$ be a workflow. Let WEG be the workflow obtained by applying algorithm 5 to $W$. Then the execution of WEG achieves SSSC-level execution. □

[Figure 11, panels (a) and (b): tasks $t_1$ and $t_3$ at the high level and $t_2$ at the low level, compensating tasks $t_2^{-1}$ and $t_3^{-1}$, and edges labeled ba, bc, fbc, fbb and wnbb.]
Figure 11: Redesigning using the modified-cascaded-compensation

6.4

Reducing Cascaded Compensation

In algorithm 3, we presented an approach to redesigning CF-RD type high-to-low dependencies by using inverse dependencies with compensating tasks. That is, for every $t_i \xrightarrow{x_{CF-RD}} t_j$ dependency, we construct the redesigned workflow by replacing this dependency with $t_i \xrightarrow{x^{-1}} t_j^{-1}$ and $t_j \xrightarrow{bc} t_j^{-1}$. In the event of $t_j$'s compensation, all the tasks that follow $t_j$ also need to be compensated, resulting in cascaded compensation (as described in the cascaded-compensation part of algorithm 3). Since a workflow is in general a long-running activity, it is prudent to detect the compensation of a task as early as possible and to prevent the unnecessary execution of the tasks that follow it, as they would have to be compensated later. In other words, if the tasks following $t_j$ have not started their execution, we prevent them from executing; however, if they have already started or finished their execution by the time $t_j^{-1}$ starts, we compensate them as in algorithm 3.

To accomplish this, we modify the cascaded-compensation part of algorithm 3 by introducing a new dependency, called weak no-begin-on-begin (wnbb), between $t_j^{-1}$ and its following task, as shown in algorithm 6 below. The semantics of $t_j^{-1} \xrightarrow{wnbb} t_k$ are as follows: when $t_j^{-1}$ begins its execution, if $t_k$ has not yet started its execution, then $t_k$ will not begin.

Algorithm 6 [modified-cascaded-compensation]
for each $t_{parent} \xrightarrow{z} t_{child}$
    if $L(t_{parent}) \le L(t_{child})$
        if there exists a $t_{child}^{-1}$
            add a node $t_{child}^{-1}$ and an edge $t_{child} \xrightarrow{bc} t_{child}^{-1}$ to $W$
            add an edge $t_{parent}^{-1} \xrightarrow{wnbb} t_{child}$ to $W$
            add an edge $t_{parent}^{-1} \xrightarrow{fbb} t_{child}^{-1}$ to $W$
            parent ← child
end{for} □
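The sketch below walks Algorithm 6 along a chain of tasks and then plays out the wnbb/fbb semantics when the first compensating task begins. It is a simplified illustration under stated assumptions: each task has at most one successor to follow, levels are totally ordered, compensating tasks are named like "t4^-1", and a compensating task forced to begin for a task that never started is treated as a no-op. The function names and the example chain are hypothetical.

```python
RANK = {"low": 0, "high": 1}  # assumed total order


def modified_cascade(edges, levels, compensatable, start):
    """Sketch of Algorithm 6 starting from the compensated task `start`.

    Assumption: each task has at most one successor to follow, so the walk
    (parent <- child) is a simple chain. Edges are (parent, child, kind).
    """
    added, parent = [], start
    while True:
        nxt = [c for p, c, _ in edges
               if p == parent and RANK[levels[p]] <= RANK[levels[c]]
               and c in compensatable]
        if not nxt:
            return added  # stop at a high-to-low edge or at the end of the chain
        child = nxt[0]
        added += [(child, f"{child}^-1", "bc"),
                  (f"{parent}^-1", child, "wnbb"),
                  (f"{parent}^-1", f"{child}^-1", "fbb")]
        parent = child


def on_compensation(added, status, start):
    """What happens when start^-1 begins: wnbb blocks, fbb compensates."""
    outcome, frontier = {}, [f"{start}^-1"]
    while frontier:
        comp = frontier.pop()
        for p, c, kind in added:  # wnbb edges precede fbb edges for each parent
            if p != comp:
                continue
            if kind == "wnbb" and status[c] == "not started":
                outcome[c] = "never begins"
            elif kind == "fbb":
                task = c[: -len("^-1")]
                outcome.setdefault(task, "compensated")  # no-op if it never began
                frontier.append(c)
    return outcome


edges = [("t3", "t4", "bc"), ("t4", "t5", "bc")]
levels = {"t3": "low", "t4": "low", "t5": "low"}
added = modified_cascade(edges, levels, {"t4", "t5"}, "t3")
outcome = on_compensation(added, {"t4": "committed", "t5": "not started"}, "t3")
print(outcome)  # {'t4': 'compensated', 't5': 'never begins'}
```

The already committed $t_4$ is compensated, as in algorithm 3, while $t_5$, which had not started when the cascade reached it, is simply prevented from beginning, which is the saving that motivates the wnbb dependency.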


6.5

Relevance to Multilevel Transaction Processing

Research addressing how to incorporate multilevel security in workflow management systems is fairly new. Recently, Atluri and Huang [4] proposed a Petri net based approach that can automatically detect and prevent all task dependencies that could potentially cause signaling channels. Because their approach eliminates some dependencies, it cannot guarantee correct execution of multilevel secure workflows.

Several researchers have addressed execution models for multilevel transactions, which are relevant to our work. Unlike traditional transactions, a multilevel transaction can read as well as write at multiple security levels. Multilevel transaction execution cannot meet both atomicity and secrecy requirements, because aborting the portion of a transaction at a lower security level due to the abort of its higher level counterpart creates an information flow that violates multilevel security restrictions. Since multilevel transactions can be modeled in our workflow framework, the solutions we propose in this paper apply to this area as well.

Some of the earlier solutions deal with this problem by relaxing the atomicity requirements. For example, Blaustein et al. [7] proposed several levels of atomicity and showed that, depending on the structure of the multilevel transaction, only a certain level of atomicity can be achieved. They proposed two algorithms, called Low-First and High-Ready-Wait. In Low-First, single level portions (called sections) are executed in order of increasing security level; that is, all lower level sections must be executed and committed before a higher level section starts execution. Thus, this algorithm cannot enforce high-to-low dependencies, and consequently Low-First can make no guarantees on the level of atomicity. In High-Ready-Wait, all sections of a multilevel transaction are executed (but not committed) in high-to-low order and then committed in low-to-high order. Thus, High-Ready-Wait cannot enforce low-to-high dependencies. Moreover, it works only for hierarchically ordered security structures and may also cause a limited capacity signaling channel. Thus, Blaustein et al.'s approach works only if all dependencies are low-to-high or all are high-to-low; it does not work if both high-to-low and low-to-high dependencies exist (which is referred to as cycles in [7]). Our redesign approach can be employed to eliminate some of the high-to-low dependencies, thus increasing the degree of atomicity that can be guaranteed.

Note that earlier researchers have also proposed techniques for eliminating high-to-low dependencies between sections of a multilevel transaction, either by rewriting the section [7] or by maintaining multiple versions of data [10]. However, their rewriting of each section requires a careful examination of the semantics of the transaction by a human user and, moreover, may not be possible in all cases. In contrast, our approach redesigns a workflow by simply examining the read and write operations of the tasks and can therefore be fully automated. Our approach is similar to the cache scheme proposed by Smith et al. [19]. Later, Ammann et al. [2] proposed a solution based on semantic atomicity, which again requires rewriting multilevel transactions manually. Moreover, Ammann et al.'s approach assumes that every dependency from a higher level task to a lower level task can be converted into one from a lower level task to a higher level task. In this paper, we characterize all the types of dependencies and show that only certain types (called conflicting in section 4) can be converted in this way.

Another advantage of modeling a multilevel transaction as a workflow is that it allows one to distinguish the various types of dependencies that can occur among the sections of a multilevel transaction. This allows one to identify the sections that need to be executed atomically, rather than the entire transaction, thereby allowing one to specify relaxed atomicity requirements.
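To make the contrast between the two algorithms of Blaustein et al. concrete, the following Python sketch (an illustration, not code from [7]) lists the execute/commit event order each algorithm imposes on sections and checks which dependency directions each can honor. It assumes a totally ordered set of integer levels (higher means more sensitive) and represents a dependency only by the levels of its source and target sections; all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# A dependency between two sections, reduced to (source_level, target_level).
# Levels are assumed to be totally ordered integers (higher = more sensitive).
Dependency = Tuple[int, int]


def low_first_order(levels: List[int]) -> List[Tuple[str, int]]:
    """Low-First: execute and commit each section in increasing level order."""
    events = []
    for lvl in sorted(levels):
        events.append(("execute", lvl))
        events.append(("commit", lvl))
    return events


def high_ready_wait_order(levels: List[int]) -> List[Tuple[str, int]]:
    """High-Ready-Wait: execute high-to-low (uncommitted), then commit low-to-high."""
    events = [("execute", lvl) for lvl in sorted(levels, reverse=True)]
    events += [("commit", lvl) for lvl in sorted(levels)]
    return events


def enforceable(deps: List[Dependency]) -> Dict[str, bool]:
    """Report which schedule can enforce every dependency in deps."""
    has_high_to_low = any(src > dst for src, dst in deps)
    has_low_to_high = any(src < dst for src, dst in deps)
    return {
        "low_first": not has_high_to_low,        # cannot enforce high-to-low
        "high_ready_wait": not has_low_to_high,  # cannot enforce low-to-high
        "cycle": has_high_to_low and has_low_to_high,
    }
```

For example, `enforceable([(0, 1), (1, 0)])` reports that neither schedule suffices, which is the "cycle" case that redesigning the workflow aims to reduce.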


7 The System Model

In this section, we describe the architecture of our model and provide the relevant algorithms for implementing each of its components. The architecture of our system is shown in figure 12. We adopt the kernelized architecture and employ a separate workflow management system (WFMS) and DBMS for each security level, as shown. Thus, the WFMSs and DBMSs need not be trusted. For simplicity, the figure shows only two security levels, high and low. The static workflow redesign module is responsible for examining the nature of the workflow dependencies and redesigning the workflow according to algorithm 5. Since the redesign process is carried out statically, this component need not be trusted. The redesigned workflow W′ is forwarded to the task dispenser, which is responsible for generating the WEGs for the different levels and dispensing them to the appropriate WFMSs. The task dispenser must be trusted to avoid any signaling channels. The portion of the WEG sent to the WFMS at each security level $s$, called $WEG_s$, comprises all tasks at levels dominated by $s$ and all the dependencies among these tasks. Each WFMS at level $s$ is responsible for scheduling the tasks and submitting them to the DBMS at its level such that all the dependencies in $WEG_s$ are enforced. The DBMS at each level $s$ simply executes the submitted tasks and sends the responses back to the WFMS at its level. The WFMS at level $s$ forwards these responses to the WFMSs at all security levels that dominate $s$.
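The forwarding rule at the end of this paragraph depends only on the dominance relation between levels. As a hedged illustration, the Python sketch below assumes the common lattice model in which a level is a hierarchical rank plus a set of categories (figure 12 itself shows only two totally ordered levels, high and low); `Level` and `forward_targets` are hypothetical names, not part of the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Level:
    """A security level: a hierarchical rank plus a set of categories (assumed model)."""
    rank: int                   # e.g. 0 = low, 1 = high
    categories: FrozenSet[str]

    def dominates(self, other: "Level") -> bool:
        # self dominates other iff its rank is at least as high and its
        # category set is a superset of other's category set.
        return self.rank >= other.rank and self.categories >= other.categories


def forward_targets(source: Level, wfms_levels: List[Level]) -> List[Level]:
    """Levels of the WFMSs that receive responses produced at `source`.

    Responses flow only upward: to every other WFMS whose level dominates
    the source level. The source WFMS already holds its own responses.
    """
    return [lvl for lvl in wfms_levels if lvl != source and lvl.dominates(source)]
```

With only two totally ordered levels, as in the figure, the low WFMS forwards to the high WFMS and the high WFMS forwards to no one; incomparable levels in a lattice never exchange responses.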

[Figure 12 image not recoverable. Labeled components: Multilevel Workflow W, Static Workflow Redesign Module, Redesigned Workflow W′, Task Dispenser, High WFMS, Low WFMS, High DBMS, Low DBMS, Compensating Manager. Labeled flows: WEG_high, WEG_low, high tasks, low tasks, responses, compensating request, compensating task.]

Figure 12: The MLS WFMS Architecture

When a WFMS receives responses from other WFMSs, it ignores a response if there is no dependency from the relevant lower level task to any of the tasks at its level. Otherwise, if there exists a dependency from a lower level task to a task at its level, the WFMS uses the response to determine how to enforce the dependency and sends the relevant tasks to the corresponding DBMS for execution. If, based on the outcome of a task at its level, a WFMS has to compensate a lower level task in order to enforce a dependency (according to the redesigned workflow W′), it sends a compensating request to the compensating manager. The compensating manager then notifies the corresponding WFMS to initiate the compensating task. Since this request originates from a higher level WFMS and triggers a lower level task, the compensating manager must be a trusted component in order not to introduce any signaling channels. If more than one WFMS sends requests to compensate the same task, the compensating manager initiates the compensating task only once. Note that only two components of our model, the task dispenser and the compensating manager, must be trusted; they are shown as shaded blocks in figure 12. In the following we present the algorithms for each component, from which it is evident that the code required to implement these trusted components is very small.

7.1 Implementation of the Task Dispenser

Formally, $WEG_s$ is defined as follows.

Definition 20 Given a $WEG$ and a security level $s$, we define $WEG_s$ of $WEG$ such that the nodes of $WEG_s$ are $\{t_i \mid t_i \in WEG \text{ and } L(t_i) \le s\}$ and the edges of $WEG_s$ are $\{t_i \xrightarrow{x} t_j \mid t_i \xrightarrow{x} t_j \in WEG,\ L(t_i) \le s \text{ and } L(t_j) \le s\}$.

Algorithm 7 [The Task Dispenser]
for each security level $s \in S$
    $WEG_s \leftarrow WEG$
    for each dependency $t_i \xrightarrow{x} t_j \in WEG_s$
        if $s < L(t_i)$ or $s < L(t_j)$, remove $t_i \xrightarrow{x} t_j$ from $WEG_s$
    end for
    for each $t_i \in WEG_s$
        if $s < L(t_i)$, remove $t_i$ from $WEG_s$
    end for
end for
□
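Because Algorithm 7 only filters nodes and edges, it amounts to projecting the WEG onto the tasks dominated by each level, exactly as in Definition 20. The Python sketch below assumes totally ordered integer levels (so "dominated by s" is `<= s`, matching the `s < L(t_i)` removal test) and represents a WEG as a task-to-level map plus a list of labeled edges; `project_weg` and `dispense` are hypothetical names.

```python
from typing import Dict, List, Tuple

# An edge t_i --x--> t_j, stored as (source task, dependency type x, target task).
Edge = Tuple[str, str, str]


def project_weg(levels: Dict[str, int], edges: List[Edge], s: int):
    """Build WEG_s: keep tasks with L(t) <= s and only edges between kept tasks."""
    nodes = {t: lvl for t, lvl in levels.items() if lvl <= s}
    kept = [(ti, x, tj) for ti, x, tj in edges if ti in nodes and tj in nodes]
    return nodes, kept


def dispense(levels: Dict[str, int], edges: List[Edge], all_levels: List[int]):
    """Sketch of Algorithm 7: one projected WEG per security level."""
    return {s: project_weg(levels, edges, s) for s in all_levels}
```

The sketch filters in a single pass instead of copying the WEG and then removing elements; both orders yield the subgraph of Definition 20, since an edge survives only when both of its endpoints do.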

7.2 Implementation of the WFMS

According to our architecture, the WFMS layer enforces all the dependencies among the tasks in a workflow by submitting the tasks to the appropriate DBMS in a coordinated manner, while each DBMS simply executes the tasks submitted to it and sends the responses back to its WFMS. Each task is divided into three control scripts, named begin script, commit script and abort script (see Algorithm 8), and these control scripts are submitted separately. To facilitate this implementation, we introduce an additional task primitive, called precommit, for each task. A precommit primitive is also required in other settings, such as two-phase commit, which ensures the atomic commit of subtransactions in a distributed database environment. Similar implementations can be found in the context of extended transaction models, for example the reflective transaction framework in [6] and transaction adapters in [8]. In a WFMS, the precommit primitive is necessary to ensure certain dependencies, such as strong commit. The precommit primitive, together with the separate control scripts, ensures that the WFMS retains control over commitment once a task has been sent to the DBMS for execution; otherwise, the tasks submitted to the DBMS might commit directly regardless of the dependency requirements.

Algorithm 8 [The Task Execution by the DBMS]

begin script$_i$:
    $BEGIN_i$
    return $ex_i$
    while not $ab_i$ do
        for each $d_i \in DO_i$
            perform $d_i$
        end for
        $PRECOMMIT_i$
        return $dn_i$
    end while
    return $ab_i$
commit script$_i$:
    $COMMIT_i$
    return $cm_i$
abort script$_i$:
    $ABORT_i$
    return $ab_i$

The DBMS executes the task primitives as well as the data operations submitted to it. Whenever the DBMS reaches a return statement, it echoes a notification back to the WFMS. Based on the dependency specification, the WFMS must schedule the submission of task scripts. To accomplish this, the WFMS keeps track of the current state of each workflow $W$, called the Current-State-Set ($CSS(W)$), and updates it whenever a task changes its state. The WFMS uses the responses from the DBMS at its own level and from the lower level WFMSs to update $CSS(W)$, which is then used to determine the primitives to be invoked. To facilitate this, the WFMS maintains, for each task primitive $pr_i$, a set of preconditions called $PreSet(pr_i)$, which is computed in advance. In the following, we formally define $CSS(W)$ and $PreSet(pr_i)$.

Definition 21 [Current-State-Set] Given a workflow $W = \{t_1, \ldots, t_n\}$, we define the current-state-set at a given time instance as $CSS(W) = \{st_1, \ldots, st_n\}$, the set of the current states of all tasks in $W$ at that time.

Definition 22 [Precondition-Set] Given a task $t_i$ in $W$, we define the precondition-set for each primitive $pr_i$ of $t_i$ as $PreSet(pr_i) = \{st_k \mid t_k \xrightarrow{x} t_i\}$.

Definition 23 Given a $PreSet(pr_i)$, we say that $PreSet(pr_i)$ is satisfied at a given instance of time if $PreSet(pr_i) \subseteq CSS(W)$. (If $PreSet(pr_i) = \emptyset$, then $PreSet(pr_i)$ is always satisfied.)

Algorithm 9 [The WFMS at each level s]
begin
(1) Static Step:
    for each task $t_i \in WEG_s$ such that $L(t_i) = s$
        determine $PreSet(pr_i)$ for all $pr_i$ of $t_i$
        if there exists a $t_j^{-1} \in WEG_s$ such that $t_i \xrightarrow{x^{-1}} t_j^{-1}$ and $L(t_j^{-1}) < s$
            determine $PreSet(b_j^{-1})$
        end if
    end for
(2) Dynamic Step:
    mark all tasks in $WEG_s$ unvisited
    for each unvisited task $t_i \in WEG_s$ such that $L(t_i) = s$
        for each $pr_i \in PO_i$
            if $PreSet(pr_i)$ is satisfied then
                $response_i \leftarrow Null$
                submit_script($pr_i$)
                do $response_i \leftarrow$ listen_response($pr_i$) until $response_i \neq Null$
                send $response_i$ to all WFMSs at levels $> s$
                replace $st_i \in CSS(W)$ with $response_i$
            end if
            if $t_i \xrightarrow{x^{-1}} t_j^{-1}$ such that $L(t_j^{-1}) = s' < s$
                if $PreSet(b_j^{-1})$ is satisfied then
                    $response_j^{-1} \leftarrow Null$
                    submit_compensate_request($t_j^{-1}$)
                    do $response_j^{-1} \leftarrow$ listen_response($cm_j^{-1}$) until $response_j^{-1} \neq Null$
                    send $response_j^{-1}$ to all WFMSs at levels $> s$
                    replace $st_j \in CSS(W)$ with $in_j$
                end if
            end if
        end for
        mark $t_i$ visited
    end for
end

submit_script($pr_i$) {
    if $pr_i = b_i$ then begin script$_i$
    elseif $pr_i = c_i$ then commit script$_i$
    else abort script$_i$
}
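The core of the dynamic step is the satisfaction test of Definition 23 followed by an update of $CSS(W)$. The Python sketch below assumes that $CSS(W)$ and each PreSet are sets of (task, state) pairs, so satisfaction reduces to a subset test; it replaces the busy-wait on listen_response with a synchronous `submit` callback and omits the compensation branch. `satisfied`, `dynamic_step`, `submit` and `forward` are hypothetical names.

```python
from typing import Callable, Dict, List, Set, Tuple

# A task state, e.g. ("t1", "cm") meaning "t1 has committed".
State = Tuple[str, str]


def satisfied(preset: Set[State], css: Set[State]) -> bool:
    """Definition 23: PreSet(pr) is satisfied iff PreSet(pr) is a subset of CSS(W)."""
    return preset <= css  # an empty PreSet is always satisfied


def dynamic_step(
    local_tasks: List[str],
    primitives: Dict[str, List[str]],           # PO_i: ordered primitives of t_i
    preset: Dict[Tuple[str, str], Set[State]],  # PreSet for each (task, primitive)
    css: Set[State],
    submit: Callable[[str, str], str],          # runs a script, returns the new state
    forward: Callable[[str, str], None],        # sends a response to higher WFMSs
) -> Set[State]:
    """One pass of the dynamic step of Algorithm 9 (compensation omitted)."""
    for t in local_tasks:
        for pr in primitives[t]:
            if satisfied(preset.get((t, pr), set()), css):
                new_state = submit(t, pr)
                forward(t, new_state)
                # Replace t's current state in CSS(W) with the response.
                css = {(task, st) for task, st in css if task != t} | {(t, new_state)}
    return css
```

Because $CSS(W)$ is updated immediately after each response, a later primitive of the same task (for example commit, whose PreSet may require the task's own done state) can become eligible within the same pass.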


7.3 The Compensating Manager

When the compensating manager receives a compensation request from a WFMS, it first checks whether that compensating task has already been executed. If not, it sends a message to the corresponding WFMS to initiate the compensating task. The algorithm is shown below.

Algorithm 10 [The Compensating Manager]
$comp\_list \leftarrow \emptyset$
$response \leftarrow Null$
do $response \leftarrow$ listen_compensate_request($t_j^{-1}$) until $response \neq Null$
if $t_j^{-1} \notin comp\_list$
    submit_script($b_j^{-1}$)
    $comp\_list \leftarrow comp\_list \cup \{t_j^{-1}\}$
end if
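Algorithm 10 is written for a single request; a running manager would apply the same membership check to every request it receives, which is what makes duplicate requests from several WFMSs harmless. The Python sketch below assumes requests arrive as an iterable of compensating-task names and that `initiate` stands in for sending the begin primitive $b_j^{-1}$ to the WFMS at the task's level; both names are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def compensating_manager(
    requests: Iterable[str],
    initiate: Callable[[str], None],
) -> List[str]:
    """Initiate each requested compensating task at most once.

    `requests` yields names of compensating tasks t_j^{-1} as WFMSs ask for
    them; `initiate` stands in for submitting b_j^{-1} to the owning WFMS.
    Returns the compensating tasks actually initiated, in order.
    """
    comp_list: Set[str] = set()   # compensating tasks already initiated
    initiated: List[str] = []
    for task in requests:
        if task not in comp_list:
            initiate(task)
            comp_list.add(task)
            initiated.append(task)
    return initiated
```

The only state the manager keeps is the set of already-initiated compensations, which is consistent with the claim that the trusted code is very small.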

8 Conclusions

Correct execution of a multilevel secure workflow requires enforcing all the task dependencies. However, enforcing high-to-low dependencies is difficult because of the inherent conflict between security and correctness. In this paper, we have shown how a multilevel secure workflow can be executed in a secure and correct manner. Our approach is based on a semantic classification of the task dependencies that examines their source. We have proposed algorithms that automatically redesign the workflow in such a way that all task dependencies can be executed without compromising security. Note that workflows can be executed by untrusted, commercially available workflow management systems, although the redesign algorithm must be trusted. Our solutions are directly applicable to another relevant area of research, the execution of multilevel transactions in multilevel secure databases, since the atomicity requirements and other semantic requirements can be modeled as a workflow. Modeling a multilevel transaction as a workflow allows one to distinguish the various types of dependencies that can occur among the sections of a multilevel transaction. This allows one to identify the sections that need to be executed atomically, rather than the entire transaction, thereby allowing one to specify relaxed atomicity requirements. Our redesign process can be used to increase the degree of atomicity that can be guaranteed for a multilevel transaction. Unlike prior research in this area, which bases the redesign on transaction semantics and therefore requires careful examination by a human, our approach can be fully automated.

Acknowledgment

The work of V. Atluri was partially supported by a National Science Foundation CAREER award under grant IRI-9624222 and by the National Security Agency under grant MDA904-96-1-0127.


References

[1] G. Alonso, D. Agrawal, A. El Abbadi, M. Kamath, R. Gunthor, and C. Mohan. Advanced transaction models in workflow contexts. Research report, IBM Almaden Research Center, 1995.
[2] Paul Ammann, Sushil Jajodia, and Indrakshi Ray. Ensuring atomicity of multilevel transactions. In Proc. IEEE Symposium on Security and Privacy, Oakland, California, May 1996.
[3] Vijayalakshmi Atluri and Wei-Kuang Huang. An Authorization Model for Workflows. In Proc. of the Fifth European Symposium on Research in Computer Security, September 1996.
[4] Vijayalakshmi Atluri and Wei-Kuang Huang. An Extended Petri Net Model for Supporting Workflows in a Multilevel Secure Environment. In Proc. of the 10th IFIP WG 11.3 Working Conference on Database Security, July 1996.
[5] Vijayalakshmi Atluri, Wei-Kuang Huang, and Elisa Bertino. A Semantic Based Redesigning of Distributed Workflows. In 9th International Conference on Management of Data, December 1998.
[6] Roger Barga and Calton Pu. A practical and modular method to implement extended transaction models. In Proceedings of the International Conference on Very Large Data Bases, pages 206–217, 1995.
[7] Barbara T. Blaustein, Sushil Jajodia, Catherine D. McCollum, and LouAnna Notargiacomo. A model of atomicity for multilevel transactions. In Proc. IEEE Symposium on Security and Privacy, pages 120–134, Oakland, California, May 1993.
[8] Panos K. Chrysanthis and Krithi Ramamritham. ACTA, A framework for specifying and reasoning about transaction structure and behavior. In Proc. ACM SIGMOD Int'l. Conf. on Management of Data, pages 194–203, June 1990.
[9] P.K. Chrysanthis. ACTA, A framework for modeling and reasoning about extended transactions. PhD thesis, Department of Computer and Information Science, University of Massachusetts, Amherst, 1991.
[10] Oliver Costich and Sushil Jajodia. Maintaining multilevel transaction atomicity in MLS database systems with kernelized architecture. In Carl Landwehr and Sushil Jajodia, editors, Database Security, V: Status and Prospects, pages 173–189. North Holland, 1992.
[11] Ahmed K. Elmagarmid. Database Transaction Models for Advanced Applications. Morgan Kaufmann, San Mateo, California, 1992.
[12] Dimitrios Georgakopoulos, Mark Hornick, and Amit Sheth. An Overview of Workflow Management: From Process Modeling to Workflow Automation Infrastructure. Distributed and Parallel Databases, pages 119–153, 1995.
[13] J. A. Goguen and J. Meseguer. Security Policy and Security Models. In Proc. IEEE Symposium on Security and Privacy, pages 11–20, 1982.
[14] Myong H. Kang and Ira S. Moskowitz. A Pump for Rapid, Reliable, Secure Communication. In Proc. of the 1st ACM Conf. on Computer and Communication Security, Fairfax, VA, November 1993.
[15] T. F. Keefe, W. T. Tsai, and J. Srivastava. Multilevel Secure Database Concurrency Control. In Proc. IEEE 6th Int'l. Conf. on Data Engineering, pages 337–344, Los Angeles, California, February 1990.
[16] Ira S. Moskowitz and Myong H. Kang. Covert Channels – Here to Stay? In Proc. COMPASS, pages 235–243, Gaithersburg, MD, IEEE Press, IEEE Cat. 94CH3415-7, ISBN 0-7803-1855-2, June 1994.
[17] Marek Rusinkiewicz and Amit Sheth. Specification and Execution of Transactional Workflows. In W. Kim, editor, Modern Database Systems: The Object Model, Interoperability, and Beyond. Addison-Wesley, 1994.
[18] Amit Sheth, Marek Rusinkiewicz, and G. Karabatis. Using Polytransactions to Manage Interdependent Data. Bulletin of IEEE Technical Committee on Data Engineering, 16(2):37–40, 1993.
[19] K.P. Smith, B.T. Blaustein, S. Jajodia, and L. Notargiacomo. Correctness Criteria for Multilevel Secure Transactions. IEEE Transactions on Knowledge and Data Engineering, 8(1):32–45, February 1996.

A Proofs

Theorem 1 Let $W$ be a workflow consisting of all CN-RI dependencies. Let $W'$ be the workflow obtained by applying algorithm 2 to $W$. Then, $W' \cong W$.

Proof: We prove $W' \cong W$ by proving that all three conditions in definition 17 hold.

Part 1: We first prove condition 1 of definition 17. Let $W$ consist of $n$ tasks and $m$ CN-RI dependencies. If we treat each dependency as a separate workflow (say $W_1, W_2, \ldots, W_m$), then $S(W_H) = \bigcup_{k=1}^{m} S(W_{kH})$. For every $W_k$ consisting of $t_i \xrightarrow{x} t_j$ where $x$ is CN-RI, $W_k = \{t_i, t_j\}$. Thus, $W_{kH} = t_{iH} \cup t_{jH}$. According to step 1 of algorithm 2, $t_i$ in $W$ is split into partitions $t_i^s$ where $s \le L(t_i)$. Thus $W'_k = \{t_i^s \mid s \le L(t_i)\} \cup \{t_j\}$. Since all the data operations of the original $t_i$ are present in $\{t_i^s \mid s \le L(t_i)\}$, $W_{kH} = W'_{kH}$. According to our security model, each partition $t_i^s$ where $s < L(t_i)$ consists of read-only operations, and only partition $t_i^{L(t_i)}$ may contain write operations. Therefore, if $t_i$ aborts (commits) in $W_k$, $t_i^{L(t_i)}$ will also abort (commit) in $W'_k$. If $t_i$ aborts, then according to item (2) of definition 15, the data operations of all $t_i^s$ where $s < L(t_i)$ are removed from $W'_{kH}$; thus $S(W_{kH}) = S(W'_{kH})$. If $t_i$ commits, since $W_{kH} = W'_{kH}$, it follows that $S(W_{kH}) = S(W'_{kH})$. Considering all $m$ dependencies, $\bigcup_{k=1}^{m} S(W_{kH}) = \bigcup_{k=1}^{m} S(W'_{kH})$, which means $S(W_H) = S(W'_H)$. Thus condition 1 of definition 17 holds.

Part 2: Now we prove condition 2 of definition 17. Let $t_i \xrightarrow{x_{CN\text{-}RI}} t_j$ be a dependency in $W$. To prove condition 2, we must show that applying algorithm 2 does not change the order of the operations that conflict with those of $t_i$, whether they are in $t_i$ itself or belong to an ancestor or descendant of $t_i$. According to our security model, each partition $t_i^s$ such that $s < L(t_i)$ consists of read-only operations $\{r_i^s[d] \mid L(d) = s\}$, and $t_i$ does not contain any $w_i[d]$ that conflicts with an $r_i[d]$ in $t_i^s$. Thus no partition $t_i^s$ conflicts with $t_i$ or with other partitions of $t_i$, and the order of all conflicting operations within the original $t_i$ is preserved in $W'$.

Because of $t_i \xrightarrow{x_{CN\text{-}RI}} t_j$, $t_i$ is an ancestor (in fact, the parent) of $t_j$. Thus all operations of $t_i$ precede those of $t_j$ in $W$. According to algorithm 1, $t_i$ must contain at least one operation $r_i[d]$ such that $L(d) = L(t_j)$, and there must exist a $w_j[d]$ in $t_j$. Thus, $t_i^{L(t_j)} \neq \emptyset$. Because algorithm 2 adds an edge $t_i^s \xrightarrow{x} t_j$ with $s = L(t_j)$, $t_i^{L(t_j)}$ is an ancestor of $t_j$, i.e., all operations of $t_i^{L(t_j)}$ precede those of $t_j$. Thus $W'$ preserves the order of conflicting operations between $t_i$ and $t_j$. If $t_i$ has any child $t_k$ other than $t_j$, splitting $t_i$ does not change the order of conflicting operations of $t_k$ and $t_i$, because these operations are only in $t_i$ (all other partitions consist of read-only operations, and any other CN type dependency would have been expressed as a separate dependency), and the other dependencies $t_i \xrightarrow{x} t_k$ are not affected by the split.

Now we prove that algorithm 2 preserves the order of conflicting operations of $t_i$ and its ancestors. Suppose $t_k$ is an ancestor at level $s < L(t_i)$. Then $t_i$ conflicts with $t_k$ only if $t_k$ has a write operation on a data object $d$ (where $L(d)$ must equal $s$) and $t_i$ has a read operation on the same $d$. That means $t_i^s \neq \emptyset$. A dependency $t_k \xrightarrow{bc} t_i^s$ would preserve the order of the conflicting operations of $t_i$ and $t_k$. However, if there exists another ancestor of $t_i$ at level $s$, say $t_l$, such that $t_l$ is a descendant of $t_k$, then a dependency $t_l \xrightarrow{bc} t_i^s$ suffices, instead of $t_k \xrightarrow{bc} t_i^s$, to preserve the order of the conflicting operations between $t_k$ and $t_i$ as well as between $t_l$ and $t_i$. Applying the same logic, we conclude that the order of conflicting operations is preserved if we add a dependency $t_l \xrightarrow{bc} t_i^s$, where $t_l$ is the closest-$s$-ancestor of $t_i$. Since $t_l$ does not contain an operation conflicting with any other partition $t_i^{s'}$ where $s \neq s' < L(t_i)$, it is not necessary to add a dependency from $t_l$ to such a $t_i^{s'}$. Thus condition 2 of definition 17 holds.

Part 3: Now we prove condition 3 of definition 17. According to algorithm 2, a task $t_i$ may be split into several partitions $t_i^s$ such that $s \le L(t_i)$. Assume $t_i$ is split into $p$ partitions excluding $t_i^{L(t_i)}$, say $t_i^{s_1}, t_i^{s_2}, \ldots, t_i^{s_p}$. All of $t_i^{s_1}, \ldots, t_i^{s_p}$ consist of read-only data operations, and only $t_i^{L(t_i)}$ may contain write operations. Since $t_i$ in $W$ is simply replaced by $t_i^{L(t_i)}$ in $W'$, the original dependencies between $t_i$ and its ancestors are preserved. Thus, the final state of $t_i^{L(t_i)}$ in $W'$ is the same as that of $t_i$ in $W$.

Suppose $t_i$ commits in $W$. Then there exists a $P \in PSS(W)$ such that $cm_i \in P$. Since $t_i^{L(t_i)}$ also commits, there exists a $P' \in PSS(W')$ corresponding to $P$ such that $cm_i^{L(t_i)}, cm_i^{s_1}, \ldots, cm_i^{s_p} \in P'$. (Note that none of $t_i^{s_1}, \ldots, t_i^{s_p}$ can have abort as its final state if $t_i^{L(t_i)}$ commits, because of the "bc" dependency from each lower level partition to $t_i^{L(t_i)}$.) According to item 3 of definition 16, $cm_i^{L(t_i)}, cm_i^{s_1}, \ldots, cm_i^{s_p}$ are replaced by $cm_i$; thus $P$ and $P'$ consist of the same elements.

Suppose $t_i$ aborts in $W$. Then there exists a $P \in PSS(W)$ such that $ab_i \in P$. Since $t_i^{L(t_i)}$ also aborts, there exists a $P' \in PSS(W')$ corresponding to $P$ such that $ab_i^{L(t_i)}, cm_i^{s_1}, \ldots, cm_i^{s_p} \in P'$. According to item 3 of definition 16, $ab_i^{L(t_i)}, cm_i^{s_1}, \ldots, cm_i^{s_p}$ are replaced by $ab_i$; thus $P$ and $P'$ consist of the same elements. Therefore $PSS(W) \equiv_\tau PSS(W')$. Hence $W' \cong W$. □
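The partitioning that this proof relies on can be sketched in a few lines. This is not the paper's algorithm 2 (which is defined in an earlier section and not reproduced here); it only encodes the properties the proof uses: reads of lower-level data move to read-only partitions $t_i^s$, all writes stay in $t_i^{L(t_i)}$, each lower partition gets a "bc" edge to $t_i^{L(t_i)}$, and the merged final state follows item 3 of definition 16. It assumes integer levels and a model in which $t_i$ writes only data at its own level; the edges from closest-$s$-ancestors and to $t_j$ are omitted, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# A data operation: ("r" or "w", data item).
Op = Tuple[str, str]


def split_task(ops: List[Op], data_level: Dict[str, int], task_level: int):
    """Split t_i into partitions t_i^s keyed by level s.

    Reads of data below L(t_i) go to the read-only partition at that data's
    level; everything else (including all writes) stays in t_i^{L(t_i)}.
    Returns the partitions and the "bc" edges from each lower partition to
    the top partition.
    """
    parts: Dict[int, List[Op]] = {task_level: []}  # top partition always exists
    for kind, d in ops:
        s = data_level[d]
        target = s if (kind == "r" and s < task_level) else task_level
        parts.setdefault(target, []).append((kind, d))
    edges = [(s, "bc", task_level) for s in sorted(parts) if s < task_level]
    return parts, edges


def merged_final_state(states: Dict[int, str], task_level: int) -> str:
    """Map partition final states back to t_i's state (Part 3 of the proof)."""
    # The read-only lower partitions are expected to end committed.
    if any(st != "cm" for s, st in states.items() if s < task_level):
        raise ValueError("a read-only partition did not commit")
    return states[task_level]  # "cm" maps to cm_i, "ab" maps to ab_i
```

For a task at level 1 that reads x (level 0) and reads and writes y (level 1), the sketch yields a read-only partition at level 0 holding the read of x and a single "bc" edge from it to the level-1 partition.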


Theorem 2 Let $W$ be a workflow consisting of all CF-RD dependencies. Let $W'$ be the workflow obtained by applying algorithm 3 to $W$. If, for every dependency $t_i \xrightarrow{x} t_j$ in $W$ where $x$ is non-abortive or weak abortive, there exist a $t_j^{-1}$ and a $t_k^{-1}$ where $t_k$ is a descendant of $t_j$, then $W' \cong W$.

Proof: We prove $W' \cong W$ by proving that all three conditions in definition 17 hold.

Part 1: We first prove condition 1 of definition 17. As in the proof of theorem 1, let $W$ consist of $n$ tasks and $m$ CF-RD dependencies. If we treat each dependency as a separate workflow (say $W_1, W_2, \ldots, W_m$), then $S(W_H) = \bigcup_{k=1}^{m} S(W_{kH})$. For every $W_k$ consisting of $t_i \xrightarrow{x} t_j$ where $x$ is CF-RD, $W_k = \{t_i, t_j\}$. Thus, $W_{kH} = t_{iH} \cup t_{jH}$. According to algorithm 3, a compensating task $t_j^{-1}$ is added. Thus $W'_k = \{t_i, t_j, t_j^{-1}\}$, so $W_{kH} = DO_i \cup DO_j$ and $W'_{kH} = DO_i \cup DO_j \cup DO_j^{-1}$. By the semantics of compensating tasks, a compensating task $t_j^{-1}$ is executed only when $t_j$ would have to be aborted. Thus, applying item 2 of definition 15 gives $S(W_{kH}) = DO_i$, and applying item 1 of definition 15 to $W'_k$ gives $S(W'_{kH}) = DO_i$. Hence $S(W_{kH}) = S(W'_{kH})$. Considering all $m$ dependencies, $\bigcup_{k=1}^{m} S(W_{kH}) = \bigcup_{k=1}^{m} S(W'_{kH})$, which means $S(W_H) = S(W'_H)$. Thus condition 1 of definition 17 holds.

Part 2: Since in CF-RD type dependencies $t_i$ and $t_j$ do not have any conflicting operations, condition 2 of definition 17 always holds.

Part 3: We prove condition 3 of definition 17 as follows. For every $t_i \xrightarrow{x} t_j$, assume there exist descendants $t_{k_1}, \ldots, t_{k_n}$ of $t_i$ such that $t_j \xrightarrow{x_1} t_{k_1}, \ldots, t_{k_{n-1}} \xrightarrow{x_n} t_{k_n}$. We use the following property:
$(t_i, t_j, t_{k_1}, \ldots, t_{k_n}) = (t_i, t_j) \cup (t_j, t_{k_1}) \cup \cdots \cup (t_{k_{n-1}}, t_{k_n})$.
Consider the case where $x$ is either strong non-abortive or weak abortive. From definition 18, $PSS(t_i, t_j, t_j^{-1}) \equiv_\tau PSS(t_i, t_j)$. Similarly, $PSS(t_j, t_{k_1}, t_{k_1}^{-1}) \equiv_\tau PSS(t_j, t_{k_1})$, and so on. Thus $PSS(t_i, t_j, t_{k_1}, t_{k_1}^{-1}, \ldots, t_{k_n}, t_{k_n}^{-1}) \equiv_\tau PSS(t_i, t_j, t_{k_1}, \ldots, t_{k_n})$. □

Theorem 3 Let $W$ be a workflow consisting of all CN-RD dependencies. Let $W'$ be the workflow obtained by applying algorithm 4 to $W$. Then, $W' \cong W$.

Proof: This follows from theorems 1 and 2. □

Theorem 4 Let $W$ be a workflow. Let $WEG$ be the workflow obtained by applying algorithm 5 to $W$. Then, $WEG \cong W$.

Proof: This follows directly from theorems 1, 2 and 3, because algorithm 2 does not add any new CF-RD type dependencies and algorithm 3 does not add any new CN-RI type dependencies; thus the redesign process cannot cycle. □

Theorem 5 Let $W$ be a workflow. Let $WEG$ be the workflow obtained by applying algorithm 5 to $W$. Then, the execution of $WEG$ achieves SSSC-level execution.

Proof: Since in $WEG$ all the CN-RI high-to-low dependencies are converted by algorithm 2 into low-to-high dependencies, these dependencies can be enforced without introducing any signaling channels. If $x$ is not weak and non-abortive, then according to algorithm 3, every high-to-low dependency $t_i \xrightarrow{x_{CF\text{-}RD}} t_j$ is removed and a new high-to-low dependency $t_i \xrightarrow{x^{-1}} t_j^{-1}$ is introduced in $WEG$. This new dependency, however, can be added only if $t_j^{-1}$ exists. Since this new dependency is enforced only by a trusted subject, it does not introduce any signaling channels. Since in $WEG$ all the CN-RD high-to-low dependencies are broken down into a CN-RI type and a CF-RD type dependency, the above arguments apply in this case as well. Thus the execution of $WEG$ achieves SSSC-level execution. □
