May 14, 2009 - wfa ::= WorkflowAnnotation(wfID focPointDesc fa+ conditionAnnotâ) fa::= FormAnnotation(formID itemAnnotation+). //item annotation.
Consistency Checking for Workflows with an Ontology Based Data Perspective Gabriele Weiler May 14, 2009 Abstract Static analysis techniques for consistency checking of workflows allow to avoid runtime errors. This is in particular crucial for long running workflows where errors detected late can cause high costs. Checking techniques can analyse the control flow of individual tasks as well as the consistency of how data of the workflow is represented, collected and utilized. In many classes of workflows, the data perspective is rather simple, and the control flow perspective is the focus of consistency checking. In our setting, however, workflows are used to collect and integrate complex data based on a given domain ontology. In such scenarios, the data perspective becomes central. In this report, we motivate and describe a simple workflow language with an ontology-based data perspective (SWOD), explain its semantics, classify possible inconsistencies, and present an algorithm for detecting such inconsistencies utilizing semantic web reasoning. We discuss soundness and completeness of the technique.
1
Introduction
Workflows describe the processing of data according to a well-defined control flow. Workflows often interact with humans and integrate them into the processing. We consider potentially long-running workflows in which the processed data has to be consistent with a given domain ontology, like e.g. for medical trials or complex financial workflows. The central goal is to detect errors in the workflow definition at design time of the workflow. Such static checking prevents to execute faulty workflows where errors might occur only months after the workflow was started, causing potentially huge costs. In this paper, we focus on the detection of data inconsistencies. Integration with existing control flow checking techniques, such as deadlock detection (e.g. [VvdAtH07], [QXW+ 07]) is considered future work. We divide data inconsistencies into two categories: 1. Data-dependent control flow inconsistencies causing e.g. undesired abortion of workflow executions or non-reachable tasks due to unsatisfiable conditions. 1
2. Semantic data inconsistencies causing data collected during workflow execution to be inconsistent with the knowledge of the underlying domain that we assume to be given by a domain ontology1 . To guarantee reliable workflow executions, it is important to detect and eliminate the first kind of inconsistencies. Avoiding the second kind of inconsistencies guarantees that the collected data is consistent, a prerequisite to get reliable results from data analysis and to enable integration of data collected in different information or workflow systems. Few existing algorithms are able to detect data inconsistencies in workflows. Sun et al. [SZNS06] developed a framework for detecting data flow anomalies, which is e.g. capable of detecting missing data in conditions, but is not able to check the satisfiability of conditions in a workflow. Eshuis [Esh02] describes a framework for verification of workflows based on model checking. His framework is able to check satisfiability of conditions, but these checks consider only boolean expressions and their dependencies. Our central contribution is an algorithm to check both described categories of data inconsistencies, in particular to verify consistency with domain ontologies. The main aspects of this contribution are: • Workflow Language. The considered class of workflows consists of human processable tasks, comprising forms that users have to fill in at execution time. We present a language for this class of workflows, putting special emphasize to a formally defined data perspective based on an existing domain ontology. We provide a well defined semantics for the ontology-based data. • Categories of Inconsistencies. We provide a precise definition and categorization of potential data inconsistencies that can cause problems during workflow execution. • Consistency Checking Algorithm. We describe a static consistency checking technique developed in two steps. The first algorithm computes for each possible execution of the workflow an ABox (according to description logics) that describes a set of individuals of the domain ontology. Based on the ABoxes, the algorithm detects the described inconsistencies. Since this algorithm does not terminate in all cases, we refine it, obtaining an algorithm that computes instead of the ABoxes an appropriate ontology. We prove that it is able to detect the same inconsistencies as the first algorithm. Our approach has various strengths. Semantic annotation in terms of the domain ontology can be assigned automatically to data collected in workflows encoded in our workflow language. That allows to query the data in terms of the ontology, enables meaningful interpretation of the data and inference of new knowledge from the data. Technologies developed for the semantic web 1 Similarly,
one could check consistency with complex XML-schema information
2
are reused, e.g. the Web Ontology Language (OWL) and description logic reasoners (DL-reasoners). Consistency checking avoids that contradictory data is collected that can not be utilized for information integration and increases reliability of the workflow execution avoiding e.g that workflow execution aborts unexpectedly. Our workflow language shall not be seen as a substitute to existing powerful workflow languages, but as an example, how these languages can be augmented with ontology-based data perspectives and profit from the algorithms described in this work. The paper is structured as follows: In Sec. 2, we describe as our motivating application scenario the ontology-based trial management system ObTiMA. In Sec. 3 a short introduction into Description logics is given. In Sec. 4, we specify the workflow language its semantics and possible data inconsistencies. In Sec. 5 algorithms to detect these inconsistencies are described. We conclude with related work and discussion.
2
ObTiMA - An ontology based trial management system for ACGT
Although our algorithm is generally usable for different kind of workflow systems, it has originally been designed for application in ObTiMA, an ontologybased trial management system [WBG+ 07]. ObTiMA is developed for the integrated European project ACGT (Advancing Clinico Genomic Trials on Cancer) [TBN+ 08], which aims to provide a European biomedical grid for cancer research. The main objective of the project is the semantic integration of heterogeneous biomedical databases using an ontology-based mediator. The mediator is able to query different heterogeneous data sources in terms of a shared ontology for cancer trials, the ACGT Master Ontology (ACGT-MO) [BWC+ 08]. ObTiMA allows clinical trial chairmen to set up patient data management systems with comprehensive metadata in terms of the ACGT-MO. Forms to collect patient data during the trial can be designed and questions on the forms can be created from the ontology in a user friendly way. Forms with ontology annotation created in this process, have the advantage that data, which is later collected on them during the trial and stored in form based databases, have metadata in terms of the ontology. Such data can be seamlessly integrated into a mediator architecture, without a tedious and error-prone mapping process to the ontology, which is necessary to integrate data from legacy data bases. Workflow management is increasingly important for clinical trials. For prospective trials treatment plans, which are workflows to guide doctors through the treatment of a patient, are crucial. Therefore, treatment plans can be designed in ObTiMA and a form with ontology annotation can be assigned to each task of these treatment plans to document the patient’s treatment. In ObTiMA the description for the treatment plans and forms are stored according to the workflow language described in Sec. 4. Although ObTiMA guarantees that the ontology
3
annotation of the forms is syntactically correct and contains only classes and relations from the underlying domain ontology, ObTiMA can currently not check if the annotation is consistent with the knowledge coded in the domain ontology and if structural inconsistencies occur in the workflows.
3
Description logics
We have chosen OWL DL 2 as the ontology language for the domain ontology, since that is the standard ontology language for the Semantic Web, but we do not allow role hierarchy or role assertions. OWL DL 2, as most modern ontology languages is based on description logics (DLs) a family of knowledge representation formalisms. The basic elements in DL are individuals (a,b,c,...), atomic concepts (A,B,...) representing sets of individuals and roles (r,s,..) representing binary relationships between individuals. Every DL provides constructors for defining (general) roles (R, S,...) and (general) concepts (C,D,...). Description logic languages are distinguished by the constructors they provide. Formal semantics for DL-languages are defined with interpretations I. An interpretation I is a pair I = (∆I , .I ), where ∆I is a non-empty set, called the domain of the interpretation and .I is the interpretation function that assigns: to every atomic concept A a subset AI ⊆ ∆I , to every atomic role r a binary relation rI ⊆ ∆I × ∆I , and to every individual a an element a I ∈ ∆I . The interpretation function is extended to complex concept and role descriptions by inductive definitions. When considering concrete datatypes in a DL, then the basic elements comprise furthermore constants and datatypes. The interpretation of the DL is then defined with respect to a concrete domain, which is a tuple D = (∆D , .D ), where ∆D is a fixed set called the data domain and .D assigns to each constant v an element (vD ∈ ∆D ) and to each datatype d a set (dD ⊆ ∆D ). In description logic a TBox T introduces the terminology (i.e. the vocabulary of an application domain) as a set of terminological axioms of the form C v D or C ≡ D. Axioms of the first kind are called inclusions, while axioms of the second kind are called equalities. A RBox describes the roles as a set of role axioms e.g. R v S or R ≡ S. An interpretation I satisfies an inclusion C v D (R v S) written as I |= C v D (I |= R v S) if C I ⊆ DI (RI ⊆ S I ), and an equality C ≡ D (R ≡ S) if C I = DI (RI = S I ). An ABox A describes a set of instantiations by assertions about individuals in terms of the vocabulary introduced in a TBox. An ABox contains concept assertions C(a) and role assertions R(a,b). An interpretation I satisfies a concept assertion C(a) (written as I |= C(a) ) if aI ∈ C I , and it satisfies a role assertion R(a,b) if (aI , bI ) ∈ RI . We assume an ontology O to be a finite set of terminology axioms (TBox), role axioms (RBox) and assertions (ABox). The domain ontologies, which we consider in the following, do not comprise role axioms and assertions. An interpretation I is a model of an ontology O if I satisfies all axioms in O. An ontology O implies an axiom α (written O |= α) if I |= α for every
4
model I of O. An Abox A is consistent with respect to an ontology O if there is a model I for O such that I |= A. An Abox A implies an assertion β (written A |= β) if I |= β for every model I of A. The signature of an ontology O (of an ABox) is the set Sig(O) (Sig(A)) of atomic concepts, atomic roles and individuals that occur in O (respectively in A). Note, that in the following we use the notation x in A for x ∈ Sig(A) and X in O for X ∈ Sig(O), where x is an individual and X is a concept. The explicit logical basis of OWL-DL 2 is SROIQ(D). Due to simplicity, we do currently not allow role constructors or role axioms in the domain ontologies we consider. Therefore, the explicit logical basis for the domain ontologies, we consider in this paper is the description logic ALCQ(D). Although OWL-DL has various XML-based notations, we will use in the following the description logic notation as described in this section and in [BCM+ 03] since this notation is more suited to describe our algorithms. But note that we use class instead of concept. Please note, that, when considering OWL-ontologies, concepts are called classes and roles are called properties. We will use both notations in the following.
4
Specifying Workflows
In this section we describe a simple workflow language with an ontology-based data perspective (SWOD). A workflow description in SWOD is assembled of a workflow template and a workflow annotation. The workflow template describes the control flow and the forms. The workflow annotation contains the semantic description for the data in terms of an existing domain ontology, which is necessary for data integration and interpretation.
4.1
Workflow Template
A workflow template describes tasks and their transitions. Tasks describe a piece of work, which has to be executed by a human user. Each task contains one form that has to be filled in to document the piece of work. A form contains one or more items, which are the questions to collect data. SWOD supports the basic control flow patterns sequence, XOR-/AND-split and XOR-/AND-join (for semantics s. [RtHvdAM06]). To simplify the exposition, we will describe in the following SWOD, inconsistencies and the consistency checking algorithms firstly for workflows without concurrency. In Sec. 6 we describe how to handle concurrency. Each outgoing transition of an XOR-split task has an associated condition. Each workflow description has one explicitly defined begin task and one end task. To illustrate our language and our algorithms, we use a simple example treatment plan. The outline is shown in Fig. 1.
5
Condition A Registration
Surgery
Form RE
Form SU
ChemotherapyA Form CA
ChemoCondition B therapyB
Follow Up Form FU
Form CB
atomic task XOR-split task
XOR-join task
Figure 1: Example workflow: “example treatment plan”.
4.2
Workflow Annotation
A workflow annotation describes the semantic annotation of the data in terms of an existing domain ontology. It assigns a description from the ontology to each item (described in item annotations) and to each condition (in condition annotations). With this information the data, stored in form based data sources, can be queried in terms of the ontology, e.g. through a mediator. Each workflow has a so-called focal point, which denotes the subject of a workflow execution (e.g. for a treatment plan it is a patient). Each annotation for an item or a condition refers to that focal point. The workflow annotation for the example treatment plan is shown in Lst. 2. The underlying domain ontology is a simple example ontology, which is described in Lst. 1. In the following all classes of the domain ontology are prefixed with ’d’. Item Annotation. An item annotation is the ontology description for an item. It consists of an ontology path and an item constructor. Ontology Path. An ontology path is the basic ontology description for an item. In a simplified notation it can e.g. be “Patient hasTumor Tumor” or “Patient hasTumor Tumor hasWeight Weight”. Formally an ontology path consists of variable assertions and relation assertions. A variable assertion is defined as “type(vname)”, where type can be a class or a primitive data type from the domain ontology and vname is the name representing the variable. Variables of the former kind are called object variables, variables of the later kind data type variables. E.g. in Lst. 2, line 7 the object variable “PTumor” of type “d:Tumor” and the data type variable “PDia” of type “float” are described. Relations between variables can be expressed with relation assertions (e.g. in Lst. 2, line 7 “PTumor” and “PDia” are related with “hasDiameter”). A relation between an object variable and a data type (an object) variable has to be a data type (an object) property from the domain ontology. For each workflow description one focal variable exists, which represents the focal point of the workflow, e.g. “PPatient” (Lst. 2, line 2). Each ontology path for an item
6
starts with this variable. Item Constructors. Different kinds of item descriptions can be assembled from an ontology path depending on how the value of the item is considered in the semantics (s. Sec. 4.3). Therefore, we defined different item constructors. The last variable in the ontology path is the associated variable of the item constructor. 1. Value-Items query values of datatype properties from the ontology, e.g. “diameter of tumor” (Lst. 2, line 6-7) or “weight of patient”. Numerical value-items can have associated range restrictions, denoted with “Min” and “Max”. The associated variable has to be a data type variable. E.g. the associated variable for item “diameter of tumor” is “PDia” and the maximum value is 20 cm. 2. Exist-Items query if an individual for the associated variable exists, e.g. “Has patient metastasis?” (Lst. 2, line 15-16) or “Has patient a tumor?”. The associated variable has to be an object variable. 3. Specify-Items are multiple choice items, which restrict an existent variable of a class by one of its subclasses, e.g. “Type of tumor?” with answer possibilities: “breasttumor”, “nephroblastoma”, “other” (Lst. 2, line 1114). The associated variable has to be an object variable. The answer possibilities need to have associated class descriptions from the ontology which have to be subsumed by the ontology classes which are defined as type of the associated variable. A variable or its equivalent variables may currently only occur as associated variable for one item of the same type. Note, that there are more complex constructors to create items by more complex descriptions from the ontology. For reasons of simplicity this shall not be considered at the moment. The grammar for an item annotation is as follows: itemAnnotation ::= ItemAnnotation(itemID ontoPath itemConstructor); ontoPath ::= OntoPath(focPointDesc ontoPathPart); ontoPathPart ::= (relAssert varAssert)∗; itemConstructor ::= existItem|specifyItem|valueItem; valueItem ::= Value(dvar Min(minv)? Max(maxv)?); specifyItem ::= Specify(ovar cases); cases ::= case | cases case; case ::= Case(acode ontdescr); existItem ::= Exist(ovar); focPointDesc ::= varAssert; relAssert ::= rel(srcvar tarvar); varAssert ::= type(var);
5
Ontology: HumanBeing v > u ∀ hasTumor.Tumor u ≤ 1 hasMother.HumanBeing u ¬ Tumor u ¬ Metastasis u ∀ hasAge.nonNegativeInteger[< 150] Tumor v > u ¬ Metastasis u ¬ HumanBeing Braintumor v Tumor u ¬ Breasttumor u ¬ Nephroblastoma u ∀ locatedIn.Brain
7
10
Breasttumor v Tumor u ¬ Nephroblastoma u ¬ Braintumor u ∀ locatedIn.Breast Nephroblastoma v Tumor u ¬ Breasttumor u ¬ Braintumor u ∀ locatedIn.Kidney Metastasis v > u ¬ Tumor u ¬ HumanBeing > v ∀ hasMetastasis.Metastasis u ∀ hasTumor.Tumor u ∀ hasDiameter.float[> 0.0] u ∀ hasAge.nonNegativeInteger u ∀ hasName.String ...
Listing 1: Extract from underlying domain ontology for example treatment plan.
5
10
15
20
WFAnnotation(ExampleTreatmentPlan focal PPatient FormAnnotation(FormRE ItemAnnotation(IAgeMother OntoPath(d:HumanBeing(PPatient) hasMother(PPatient, PMother) d:HumanBeing(PMother) hasAge(PMother, PAge) integer(PAge)) Value(PAge Max(130.0))) ItemAnnotation(ITumorDiameter OntoPath(d:HumanBeing(PPatient) hasTumor(PPatient, PTumor) d:Tumor(PTumor) hasDiameter(PTumor, PDia) float(PDia)) Value(PDia Max(20.0))) FormAnnotation(FormSU ItemAnnotation(INameMother OntoPath(d:HumanBeing(PPatient) hasMother(PPatient, PMother) d:HumanBeing(PMother) hasName(PMother, PNam) string(PNam)) Value(PNam)) ItemAnnotation(ITumorType OntoPath(d:HumanBeing(PPatient) hasTumor(PPatient, PTumor) d:Tumor(PTumor)) Specify(PTumor Case(breasttumor d:Breasttumor) Case(nephroblastoma d:Nephroblastoma) Case(other NOT(d:Breasttumor OR d:Nephroblastoma)))) ItemAnnotation(IMetestasis OntoPath(d:HumanBeing(PPatient) hasMetastasis(PPatient, PMet) d:Metastasis(PMet)) Exist(PMet)) ConditionAnnotation(ConditionA d:Tumor(?tum) ∧ hasTumor(PPatient, ?tum) ∧ hasDiameter(?tum, ?dia) ∧ float(?dia) ∧ ≤(?dia, 4)) ConditionAnnotation(ConditionB d:Tumor(?tum) ∧ hasTumor(PPatient, ?tum) ∧ hasDiameter(?tum, ?dia) ∧ float(?dia) ∧ >(?dia, 4))))
Listing 2: Extract from workflow annotation for example treatment plan.
Condition Annotation. A condition annotation is the ontology description for a condition. It is formalized similar to bodies of SWRL-rules (Semantic Web Rule Language [HPSB+ 04]), but in a simplified form. Conditions refer to the focal variable of the workflow also called focal variable of condition. Furthermore, the conditions refer to condition variables, which are prefixed with “?” (e.g. ?x). Conditions consist of a conjunction of atoms of the form class(x), dataType(x), objProp(x,y), dataProp(x,y) or cmpOp(x,y), where class is a class description, dataType a data type, objProp is an object property, dataProp an data type property from the domain ontology, cmpOp is a comparison operator like ≤ and x and y are either the focal variable of the condition, condition variables or data values. Each condition variable has to be related by property atoms to the focal variable. An example for a condition is ”Patient has a tumor with weight greater than 4”, which is formalized as follows: Tumor ( ? tumor ) ∧ HasTumor ( PPatient , ? tumor ) ∧
8
HasWeight ( ? tumor , ? w e i g h t ) ∧ >(? weight , 4 )
The grammar for a condition annotation is as follows: conditionAnnot ::= ConditionAnnotation(conID condition); condition ::= atom ∧ condition | atom; atom ::= class(x) | dataType(x) | objProp(x y) | dataProp(x y)| cmpOp(x y);
4.3
Semantics of Workflow Description
We describe the semantics of a workflow description by defining workflow executions for it. A workflow execution starts with the begin task of the workflow description. Then tasks are executed in the order they are related with transitions. For an XOR-split the task succeeding the condition, which is satisfied for the current execution, is executed next. This must be exactly one in a consistent workflow description. During execution of a task a user has to fill in a value into each item of the associated form of the task. Since we do not consider concurrency, at each point of execution the executed part of the workflow can be described by a sequence of already filled items (described by their item annotation and the filled in value). We call such a sequence executed workflow data path (EWDP), the grammar is as follows: ewdp::= ewdp filledItem | filledItem; filledItem := itemAnnotation value;
From such an EWDP an ABox can be calculated (s. Sec 4.3.1), representing the state, i.e. the data which has been collected until this point in the workflow execution. This ABox is the base to query the collected data in terms of the ontology. If a condition is satisfied for an execution can be determined with the help of the ABox calculated from the EWDP ending with the last item before the condition. A condition is satisfied for the execution, if it has a valid binding to this ABox (s. Sec. 4.3.2). 4.3.1
Creation of ABoxes for workflow executions
To derive the ABox Ap for one EWDP from an empty ABox we defined a calculus, called “ABoxRules”, which is shown in Appendix B.1. In Table 1 an ABox derived from an example workflow execution with calculus ABoxRules is shown. The calculus extends Ap for each filledItem as follows: • Ontology Path. For each of the object variables in ontology path an individual is created in Ap by adding its variable assertion as concept assertion and each of the object relation assertions is added as role assertion into Ap . The individual created for the focal variable of the workflow is called focal individual (e.g. PPatient in Table 1). • Item constructor. For a value-item the according individual in Ap is related with the appropriate data type relation to the value of the item. For
9
a specify-item the according individual is restricted with the class description associated to the answer filled into the form. For an exist-item an individual for the associated variable is only created if value is “yes”. Introducing Item Age of mother: ’51’ Diameter of tumor: ’2.1 cm’ Name of mother: ’Anna Zeil’ Type of tumor: ’breasttumor’ Has metastasis: ’Yes’
ABox AP {d:HumanBeing(PPatient), hasMother(PPatient, PMother), d:HumanBeing(PMother), hasAge(PMother, 51)}∪ {hasTumor(PPatient, PTumor), d:Tumor(PTumor), hasDiameter(PTumor, 2.1)}∪ {hasName(PMother, ’Anna Zeil’)}∪ {d:Breasttumor(PTumor)}∪ {hasMetastasis(PPatient, PMet), d:Metastasis(PMet)}
Table 1: ABox created for example execution of treatment plan. On the left the values filled into the items during execution are shown, on the right the assembled ABox is shown, assertions are ordered according to the items on the left. 4.3.2
Checking if condition is satisfied
A condition is satisfied in a workflow execution, if a valid binding exists between the condition and the ABox created from the EWDP ending with the last item before the condition. A formal defintion for valid bindings is given in Definition 4.1. In principle a valid binding between a condition and an ABox exists, if each of the variables from the condition can be bound to an individual or a constant of the ABox (represented as tuple in a set called binding B), where the focal variable of condition is bound to the focal individual, and for each atom in the condition the appropriate constraint described in Table 2 holds on the ABox and the binding. If the condition does not have a valid binding to the ABox, it is either not satisfied or it is not decidable if it is satisfied or not, due to missing data. E.g. Condition A (s. Lst. 2) has a valid binding to the ABox AP described in Table 1 and is therefore satisfied for the example workflow execution. A valid binding is BA = {(PPatient, PPatient); (PTumor, ?tum); ( 2.1, ?dia)}, since for each of the atoms in Condition A the appropriate constraint described in Table 2 holds. For condition B no valid binding exists. Binding BA is not a valid binding between Condition B and AP because the constraint for atom >(?dia, 4) does not hold. Therefore, in the example workflow execution, after the task Surgery the task Chemotherapy A is executed.
Definition 4.1. A condition C is satisfied for an ABox A if a valid binding B exists, which binds each variable from C to an individual or constant from A. A binding is a set of 2-tuples consisting of a condition variable from C and an individual or a constant from A. For a valid binding between A and C it has to hold: 10
Atom C(var) P(srcvar, tarvar) cmpOp(var, const)
Constraint ∃ x ∈ Sig(A).((x,var) ∈ B) ∧ (A |= C(x)) ∃ x, y ∈ Sig(A).({(x, srcvar), (y, tarvar)} ⊆ B) ∧ (A |= P(x,y)) ∃ x ∈ Sig(A). ((x, var) ∈ B) ∧ (x cmpOp const))
Table 2: Constraints for a binding B between a condition C and an ABox A, which have to hold for each atom in C, in order that B is valid. C is a class description or data type, P is a data type property or an object property, cmpOp is a comparison operator. • for each of the variables v in C the binding B comprises exactly one tuple (x, v), where x is an individual or constant from A • the focal condition variable of C is bound to the focal individual of A: (α(focvar, focalID), focConVar) ∈ B • for each atom in C the appropriate constraint described in Table 2 holds.
4.4
Data Inconsistencies
In the following we describe the different kinds of data inconsistencies, which can occur in a workflow description. 4.4.1
Data-Dependent Control Flow Inconsistencies
• Unsatisfiable conditions. If a condition is not satisfied for any workflow execution (has no valid binding to the ABox of its EWDP), we call the condition unsatisfiable for the workflow description. That means that for each workflow execution it is either not satisfied or it is not decidable if it is satisfied or not, due to missing data. In such a case it can occur that the tasks after the condition are not reachable. Examples: – When in the example treatment plan Condition A is replaced with “Patient does not have a tumor”, formalized as (=0 hasTumor.d:Tumor) (PPatient), this condition is unsatisfiable. It is not satisfied for any workflow execution, since each patient for which the forms are filled in has a tumor. – The condition “Patient has brain tumor” is also unsatisfiable at this point, since it can not be decided if the patient has a brain tumor from the information filled into the forms. In both cases task Chemotherapy A is not reachable, what is clearly a design error in the workflow description. • XOR-stalls If in a workflow execution none or more then one of the conditions at an XOR-split are satisfied, the task to be executed next can not be determined unambiguously. We call such a situation XOR-stall. In such a case workflow execution aborts. 11
Please note, that XOR-stalls can be also omitted by default conditions. Nevertheless, to detect this kind of inconsistency is crucial. When considering complex workflows with complex conditions, stating each condition explicitly and not introducing default-conditions can avoid that the default-condition comprises cases, the workflow designer did not think about, which do not fit to the flow following the default-condition. Example: Condition A of the treatment plan is replaced with “Patient has breasttumor” and Condition B with “Patient has nephroblastoma”. Since in case answer possibility “other” is selected for specify item “Type of tumor?”, none of the conditions is satisfied, an XOR-Stall occurs. 4.4.2
Semantic Data Inconsistencies
To guarantee that each ABox, which can be created during workflow execution, is consistent, the descriptions of the items must not violate any restriction of the domain ontology. If at any point of a workflow execution the calculated ABox is inconsistent wrt. the domain ontology, a semantic data inconsistency occurs. In such a situation the workflow description contradicts the domain ontology, what effects collected data to be erroneous. In such a case workflow execution aborts. We divide semantic data inconsistencies into different kinds, regarding which kind of restriction is violated in the domain ontology. We describe in the following the different kinds of inconsistencies with an example and give a definition what has to hold in at least one of the ABoxes A, which can result from the execution of the workflow, in order that the inconsistency occurs. • Violation of disjoint class restrictions Example: A variable on a form is declared to be of type “Breasttumor”, an equivalent variable on another form of type “Nephroblastoma”. Both forms can occur in the same workflow execution. The classes “Nephroblastoma” and “Breasttumor” are declared to be disjoint in the domain ontology. Definition: A |={ontoClass(x); disOntoClass(x)}, where ontoClass and disOntoClass are disjoint class descriptions from the domain ontology. • Violation of number restrictions Example: In the domain ontology is defined that a human being can have at most one mother. Two variables with different names for the patient’s mother are defined on the forms . Definition: Let objRel be an object property from the domain ontology and n a non negative integer, ontClass and ontClassF be classes from the domain ontology. A number restriction from the domain ontology (OD |= ontClass v (≤ n objRel.ontClassF)) is violated if an individual x exists with A |= ontClass(x) and at least n+1 individuals yi with ∀yi=1..n+1 .A |= {ontclassF( yi ); objRel(x, yi )} exist, with yi 6= yj for i 6= j. 12
• Violation of data type restrictions Example: In the domain ontology is defined that the height of a patient must not be greater than 250 cm. On the forms for the workflow a value between 0 and 300 cm may be filled into an item, which queries the height of a patient. Definition: Let datarel be a data property from the domain ontology and C a class from the domain ontology. A datatype restriction from the domain ontology (OD |= C v ∀ datarel.datarangeD) is violated if an individual x and a constant y exist with A |= {datarel(x,y); C(x) } and y ∈ / datarangeD. • Violation of domain, range or value restrictions Example: In the domain ontology is defined that the range of the object property “hasTumor” is the class “Tumor” and the classes “Tumor” and “Metastasis” are declared to be disjoint. On a form two object variables are declared, one of type “HumanBeing” and one of type “Metastasis” and both are related with “hasTumor”. Definition for violation of range restriction: Let objRel be an object property from the domain ontology and ontologyClass a class from the domain ontology. A range restriction from the domain ontology (OD |= > v ∀ objRel.ontologyClass) is violated if individuals x and y exist with A |= {objRel(x,y); ¬ ontologyClass(y)}. Similar definitions hold for domain and value restriction.
5
Algorithms to detect inconsistencies
In this section, we describe two algorithms to detect the described inconsistencies during design time. As a first step, we present an algorithm to compute all possible ABoxes for a workflow and check them for the inconsistencies. From the construction of the algorithm follows that all inconsistencies are detected. However, in general, this algorithm does not terminate. In a second step, we describe a refined terminating algorithm and prove that the second algorithm detects the same inconsistencies as the first one. The first description of the algorithms will only focus on workflows without concurrency. In Sec. 6 we will describe how to extend the algorithms for workflows with concurrency.
5.1
Consistency checking algorithm based on ABoxes (CCABoxes-algorithm)
The CCABoxes-algorithm computes all possible ABoxes XP for each of the possible workflow paths and checks them for the described inconsistencies. A workflow path for a workflow description is any sequence of tasks, which are connected through transitions and start with the begin task. We will describe in the following the CCABoxes-algorithm, an outline is shown in algorithmus 1.
13
Determine WFDPs. To calculate Xp for one workflow path we have to consider the items in the flow as well as each condition, because the ABoxes for which the condition is not satisfied are not any more possible after its application. We describe this information by a workflow data path (WFDP), which is a sequence of item and condition annotations, according to the following grammar: wfdp::= wfdp itemAnnotation| wfdp conditionAnnot | itemAnnotation | conditionAnnot;
The first step of the CCAboxes-algorithm is to determine all workflow paths from swodWF, which end with the end task and store the corresponding WFDPs in a set W. E.g. one of the two WFDPs for the example consists of the item annotations IAgeMother, ITumorDiameter, INameMother, ITumorType, IMetastasis and Condition A. Create XP s for each WFDP in W. Xp is created by the function aboxesRules(XP , OD , WFDP), where XP i is the initial set of ABoxes comprising an empty ABox (s. Sec. 5.1.1). The result of the function is either a set of consistent ABoxes, which are possible for this WFDP, or an error term describing an inconsistency which occurred during creation of Xp . If the result is an error term ERROR(kind id) with kind “XOR-Stall” or “Semantic Data inconsistency”, the algorithm aborts with an error message. If kind is “Condition unsatisfiable for WFDP” the processed WFDP is deleted from W since the corresponding workflow path can never be taken during workflow execution. After Xp is created for each WFDP, it is checked if a condition is unsatisfiable for the workflow by checking if any of the conditions does not appear in any of the WFDPs left in W, which represent the possible flows through the workflow. In that case the algorithm aborts with an appropriate error. If the algorithm does not abort with an error the workflow is consistent. Algorithm 1: CCABoxes-algorithm input: SWOD WF description swodWf, domain ontology OD Set W ← determineWfdps(swodWf); foreach WFDP ∈ W do Xpi ← {{}}; Xp ← aboxesRules(Api , OD , WFDP); if XP ≡ ERROR(kind, id) then if kind ≡ “XOR-Stall” OR “Semantic Data Inconsistency” then ABORT with ERROR(kind, id) if kind ≡ “Condition unsatisfiable for WFDP” then delete WFDP from W foreach condition con appearing in swodWf do if con does not appear in any of WFDP ∈ W then ABORT with ERROR(Condition unsatisfiable, id);
14
5.1.1
Creation of XP
Function aboxesRules(XP i , OD , WFDP) creates Xp as the longest possible derivation with the rules of the calculus SOABoxRules, which is shown in Appendix B.2. In principle Xp is created as follows from WFDP and the intitial set of ABoxes utilizing the calculus. WFDP is split into its item annotations and conditions (described in rules SAB-WfdpComp and SAB-WfdpCon), which are processed in the order they appear in the WFDP: • Item Annotation. For each item annotation following steps are processed as described in rule SAB-ItemAnnotationV and SAB-ItemAnnotationSE: – Object variable assertion. An individual assertion for each of the object variable assertions from the ontology path is inserted into each A ∈ Xp . To determine the individual for an object variable and a constant focal ID the function α(objvar) is used, where objvar is the object variable. – Object relation asssertion. A role assertion for each object relation assertion is inserted into each A ∈ Xp . – Item constructor. XP is extended to comprise all ABoxes resulting from all possible input values to this item. – Semantic Data Inconsistency. After processing an item description, each ABox in Xp is checked with a DL-reasoner for consistency (in function checkSemInc). If any inconsistency occurs, Xp is set to an error term ERROR(“Semantic Data Inconsistency”, itemId). • Condition. For a condition the set of possible ABoxes is reduced to comprise only the ABoxes for which the condition is satisfied. The set of valid bindings for an ABox A and a condition can be derived from the set of bindings containing only the initial binding by applying the calculus InfRules-Bin (described in Appendix B.3) with condition. The initial binding for an ABox and a condition contains one tuple that indicates the binding of the focal individual to the focal condition variable (focInd, focCondVar). – Condition unsatisfiable for WFDP. If Xp is empty after applying the condition, Xp is set to ERROR(“Condition unsatisfiable for WFDP”, conditionId). – XOR-Stall. Before the first condition of an XOR-split is applied it is checked if each ABox has a binding to at least one of the conditions. If that is not the case, a possible ABox exists for which none of the conditions is satisfied and Xp is set to an error term ERROR(XORStall, conditionId).
15
5.2
Consistency checking algorithm based on ontology (CCOnto-algorithm)
Primitive data types (e.g. integer) can have infinite data ranges. In the CCABoxesalgorithms e.g. for a value-item with data type integer, an Abox is created for each possible value of the data range, resulting into creation of an infinite number of ABoxes. That means the CCABoxes-algorithm is not computable. Therefore, we need to replace XP with an abstraction of the set of ABoxes, that has to be finite and that preserves the information to detect the described inconsistencies to gain a computable algorithm. We chose as an abstraction an ontology, which describes all ABoxes sufficiently. The resulting “CCOnto-algorithm” is described in Algorithm 2. The steps are the same as in the CCABoxesalgorithm. The only difference is that instead of Xp an ontology Op , which we will call path ontology is created and utilized to detect inconsistencies. Algorithm 2: CCOnto-algorithm input: SWOD WF description swodWf, domain ontology OD Set W ← determineWfdps(swodWf); foreach WFDP ∈ W do Opi ← OD ; Op ← ontoRules(Opi , WFDP); if OP ≡ ERROR(kind, id) then if kind ≡ “XOR-Stall” OR “Semantic Data Inconsistency” then ABORT with ERROR(kind, id) if kind ≡ “Condition unsatisfiable for WFDP” then delete WFDP from W foreach condition con appearing in swodWf do if con does not appear in any of WFDP ∈ W then ABORT with ERROR(Condition unsatisfiable, id);
5.2.1
Construction of path ontology
Function ontoRules(OP i , WFDP) creates Op as the longest possible derivation with the rules of the calculus OntoRules shown in Appendix B.4. The initial path ontology, OP i is an empty ontology. In principle Op is created as follows from WFDP and the intitial path ontology utilizing the calculus. WFDP is split into its item annotations and conditions (described in rules O-WfdpComp and O-WfdpCon), which are processed in the order they appear in the WFDP: • Item Annotation. For each item annotation following steps are processed as described in rule O-WfdpItemAnnotationV and O-WfdpItemannotationSE (illustrated with item IAgeMother (Lst. 2, line 4-5)): – Object variable assertion. As described in rule O-ObjVar, for each object variable in the ontology path a class is created, which can be 16
determined with function γ(ovar), called corresponding class of the variable. The class is declared to be a subclass of the type of the variable. Furthermore, the class is declared to be disjoint to other path classes. The class created for the focal variable is called top focal path class and its subclasses are called focal path classes. For the example item “PPatient v d:HumanBeing u ¬ PMother” and “PMother v d:HumanBeing” are added to Op . Class “PPatient” is the top focal path class. – Object variable assertion. For each object relation with source variable X, property z and target variable Y in Op it is declared as described in rule O-ObjRel that an individual of a corresponding class for X has at least one relation z to an individual of a corresponding class of Y (i.e. a qualified number restriction is added to Op , e.g. PPatient v ≥1 hasMother.PMother). – Item constructor. For an item constructor, depending on its type, item specific information is added to Op . For a value-item, as described e.g. in rule O-VitemMinMax, beside others it is declared that the corresponding class of the last object variable in the ontology path has at least one relation to the appropriate range of the item, i.e. the datatype of the item restricted with the min and the max value. For example item “PMother v ∃ hasAge.integer[< 130]” is added to Op ). Here it becomes clear that using an ontology allows the CCOnto-algorithm to work with data ranges instead of single data values, which effects the CCOnto-algorithm to be terminating. For an exist-item with associated variable y, last assertions objProp(x y) and class(y) in its ontology path, classes for answer possibilities “yes” and “no” are created as subclasses for each leaf sub class of X, where X is the corresponding class for x. A class created for answer possibility “no” has as superclass (=0 objProp.class). A class created for answer possibility “yes” has as superclass (≥ 1 objProp.Y) where y is the corresponding class for Y. E.g. for the item “Has patient Metastasis?” classes PPatMet and PPatNMet are created (s. Lst. 3). For a specify-item classes for each answer possibility are created, which have as super classes the according ontology description of the answer possibility and the corresponding class of the associated variable. E.g. for the item “type of tumor” classes PTumB, PTumN and PTumO are created (s. Lst. 3). For exist-items and specify-items OP is expanded. We call class X related to class Y (resp. Y backwards related to X), if OP |= (X v (≥ 1 objProp.Y)). OP is expanded if any class X is related to any superclass S of a set of newly created classes setNew. Then for each class R ∈ setNew a new class N is created as subclass of X and N is related to R. For the newly created classes the process is
17
repeated. E.g all subclasses of PPatMet and PPatNMet are created in the expansion step for item “type of tumor”. – Semantic Data Inconsistency. After processing an item description, Op is checked, in function checkSemInc, with a DL-reasoner for consistency and subsumption of data ranges (s. Theorem 5.7). If any inconsistency occurs, Op is set to an error term ERROR(“Semantic data inconsistency”, itemId). Op after processing all item descriptions for first workflow path of example treatment plan is shown in Lis. 3. • Condition. When a condition is applied to OP , all classes, which are corresponding to individuals belonging to ABoxes for which the condition is not satisfied, have to be deleted. Therefore the condition ontology is created, OP is preprocessed for the condition and it is reduced to its focal path classes that are subsumed by the focal condition class, to their superclasses and classes which are backwards related to these classes (for details s. below). – Condition unsatisfiable for WFDP. If Op has after applying a condition no more path classes, the condition is unsatisfiable for this WFDP. Then Op is set to ERROR(“Condition unsatisfiable for WFDP”, conditionId). – XOR-Stall. Before the first condition of an XOR-split is applied it is checked if the union of the focal condition classes of each outgoing transition subsumes the top focal path class. If that is not the case, it exists a possible ABox for which none of the conditions is satisfied and Op is set to an error term ERROR(XOR-Stall, conditionId). The union of the focal condition classes of the outgoing transitions of the XOR-split in the example treatment plan is APatient t BPatient (with BPatient ≡ =0 hasMetastasis.BMetastasis; BMetastasis ≡ Metastasis). Since with an DL-reasoner can be detected that the union subsumes the focal path class “PPatient”, no XOR-Stall can occur. PPatNMet v PPatient u =0 hasMetastasis.PMet u ¬ PPatMet PTumor ≡ PTumB t PTumN t PTumO PTumor v d:Tumor u∃ hasDiameter.float[≤ 20] u ¬ PPatient PPatMet v PPatient u ≥ 1 hasMetastasis.PMet PTumB v PTumor u d:Breasttumor u ¬ PTumN u¬ PTumO PMet v d:Metastasis u ¬ PTumor u ¬ PPatient u¬ PMother PTumN v PTumor u d:Nephroblastoma u¬ PTumB u¬ PTumO PPatient ≡ PPatNMet t PPatMet PPatient v d:HumanBeing u ≥ 1 hasMother.PMother u ≥ 1 hasTumor.PTumor PTumO v PTumor u ¬(d:Breasttumor t d:Nephroblastoma) PMother v d:HumanBeing u ∃ hasAge.integer[< 130] u∃ hasName.string
18
u¬ PTumor u¬ PPatient PPatMetBre v PPatMet u ≥ 1 hasTumor.PTumB PPatMetNep v PPatMet u ≥ 1 hasTumor.PTumN PPatMetOth v PPatMet u ≥ 1 hasTumor.PTumO PPatNMetBre v PPatNMet u ≥ 1 hasTumor.PTumB PPatNMetNep v PPatNMet u ≥ 1 hasTumor.PTumN PPatNMetBre v PPatNMet u ≥ 1 hasTumor.PTumO
Listing 3: Op after processing all item descriptions for first workflow path of example treatment plan; classes from domain ontology are prefixed with ’d’.
Construction of Condition Ontology. We have defined a calculus called InfRules-CO (s. Appendix B.5) to derive the condition ontology from an empty ontology. For each object variable in the condition, a class is created called condition variable class (CC). CC is set equal to class >. We call the conjunction of classes, which are set equal to CC, condition variable equal class (CEC). For each atom in the condition the condition ontology is extended as follows: • class(x): The CEC of variable x is set to (R u class), where R is the CEC of x before the application of the atom. • objProp(x, y): The CEC of variable x is set to (R u (≥ 1 objProp.Z)), where R is the CEC of x and Z is the CC of y before the application of the atom. • dataProp(x, y): The CEC of variable x is set to (R u (≥ 1 dataProp)), where R is the CEC of x before the application of the atom. • dataType(y) The CEC of each variable x, which is related with a data property DataProp to y is set to (P u (≥ 1 dataProp.datatype)) resp. (P u (≥ 1 dataProp.odatatype ∩ datatype)), when it was before the atom was applied (P u (≥ 1 dataProp) resp. (P u (≥ 1 dataProp.odatatype). • cmpOp(y, const): The CEC of each variable x, which is related with a data property DataProp to y is set to (P u (≥ 1 dataProp.[cmpOp const])) resp. (P u (≥ 1 dataProp.odatatype ∩ [cmpOp const])), when it was before (P u (≥ 1 dataProp) resp. (P u (≥ 1 dataProp.odatatype). As a first example we consider that Condition A in the example treatment plan is replaced with “Patient has Metastasis” (d:Metastasis(?met) ∧ hasMetastasis(PPatient, ?met)). We apply the condition to OP (s. Lst. 3) to determine if it is satisfiable for the WFDP. The condition ontology is as follows, where CPatient is the focal condition class: CPatient ≡ ≥1 hasMetastasis.CMet CMet ≡ d:Metastasis
In this example it is not necessary to preprocess OP as described below. The condition is satisfiable for the workflow path, since the focal path class PPatMet can be inferred as subclass of CPatient. Therefore, after applying the condition, OP comprises the classes PPatMet, its subclasses, its superclass PPatient and the classes, which are backwards related to these classes, which are PTumor, its 19
subclasses and PMet. Class PPatNMet and its subclasses are deleted from OP . Preprocessing path ontology to apply condition. To preprocess OP for a condition, for each datatype variable in the condition, OP is extended as follows. The datatype dataC of the variable and data property dataPropC, which relates the variable to object variables, is determined. Each leaf path class L in Op , which is a subclass of (∃ dataPropC.dataP), where dataP overlaps with dataC is determined. For each class L, two subclasses are created, one as a subclass of (∃ dataPropC.dataP ∩ dataC) and the other as subclass of (∃ dataPropC.dataP\dataC). OP is expanded as described above. The condition ontology derived for Condition A as described in Lst 2 is: CPatient ≡ ≥1 hasTumor.CTumor CTumor ≡ d:Tumor u ≥1 hasDiameter.float[≤ 4]
OP is expanded for the condition as follows. For data type variable ?dia the datatype is float[≤ 4]. Object variable ?tum is related with data property hasDiameter to ?dia. The classes PTumO, PTumB and PTumN are subclasses of data type restriction (∃ hasDiameter. float[≤ 20]). For each of them two subclasses are created, which we will call small tumor and big tumor class in the following, for PTumB e.g. big tumor class PTumBG and small tumor class PTumBL are created and therefore following axioms are added to OP : PTumBL v PTumB u (∃ hasDiameter.float[≤ 4]) PTumBG v PTumB u (∃ hasDiameter.float[> 4, ≤ 20])
In the expansion step for each leaf focal path class two subclasses are created, one which is related to the appropriate small tumor class and one related to the appropriate big tumor class. When the condition is applied to OP , it can be detected that the condition is satisfiable for WFDP, since all top focal path classes, which are related to small tumor classes, are subclasses of the focal condition class. After applying the condition, all focal path classes related to big tumor classes are deleted from OP as well as all big tumor classes itself, since they are no longer backwards related to any focal path class. Using “Patient does not have a tumor”, formalized as (=0 hasTumor.d:Tumor) (PPatient), as an replacement of Condition A is an example for an unsatisfiable condition. The condition ontology is: CPatient ≡ (=0 hasTumor.d:Tumor). None of the focal path classes in OP is a subclass of CPatient, since all are subclasses of (≥ 1 hasTumor.PTumor). That means the condition is unsatisfiable for the WFDP and since Condition A appears in only one WFDP, it is unsatisfiable for the workflow description. 5.2.2
Abstraction criteria
The steps of the CCABoxes - algorithm and the CCOnto-algorithm are the same. The only difference is, that in the CCOnto-algorithm instead of the set of all possible ABoxes XP for the workflow an ontology OP is created and utilized to detect the inconsistencies. Op is called an abstraction of the set of ABoxes XP (s. definition 5.1).
20
Definition 5.1 (Abstraction). Let OD be a domain ontology and WFDP be a workflow data path formalized with OD . Let OP be the path ontology for WFDP, which is derived by applying WFDP to the ontology containing all axioms from OD using OntoRules. Let XP be the set of ABoxes for WFDP, which is derived by applying WFDP to the set with one empty ABox using SOABoxRules. Then OP is called an abstraction of XP . Each rule from the set of OntoRules has a corresponding rule from the set of SOABoxRules. Corresponding rules have the same name (e.g. ontoPathPart) with prefix O for the OntoRules and prefix SAB for the SOABoxRules. In Theorem 5.4 we formulate abstraction criteria, which hold between the path ontology OP and the set of ABoxes XP derived for the same WFDP using the according rules: Each individual in any Abox of XP has at least one corresponding class in OP and each leafsubclass in OP has at least one corresponding individual. The definition for corresponding classes and individuals can be found in Definition 5.2. We will need this theorem to prove soundness and completeness of the CCOnto-algorithm (s. Sec. 5.2.3). The proof for the theorem is by induction over the set of corresponding rules. Firstly we prove that the initial path ontology Opi is an abstraction of the initial set of ABoxes Xpi . Then we prove for each pair of corresponding rules that the criteria still hold after applying the rule. Definition 5.2 (Corresponding classes and individuals). Let ontology OP be an abstraction of set of ABoxes XP and OD be the appropriate domain ontology. Let objvar be an object variable occurring in the WFDP from which OP and XP are derived. An individual x = α(objvar) from any ABox A ∈ XP and a leafsubclass X v γ(ojvar) from OP are called corresponding to each other if they comply to the following criteria: • Criteria 1: Let C be any class description from the domain ontology, but not a data qualified number restriction: A |= C(x) iff Op |= X v C • Criteria 2: Let objrel be an object property from the domain ontology and a an individual from A with a 6= x and A a corresponding class for a: A |= objrel(x,a) iff Op |= X v (≥1 objrel.A). • Criteria 3: Let datarel be a data property, a be some constant, Y a class from Op and datarange some datarange from the domain ontology with a ∈ datarange, and R a qualified number restriction (≥1 datarel.datarange) and (Y v R) ∈ Op \OD : A |= datarel(x,a) iff Op |= X v Y. If Op |= X v Y then an ABox A0 for each b ∈ datarange exists with A0 |= datarel(x, b).
21
Definition 5.3 (Related classes and individuals). Let ontology Op be an abstraction of set of ABoxes Xp and A be an ABox ∈ Xp and OD the appropriate domain ontology. We say that a class X is related to a class Y from Op , if n classes exist, with n ≥ 2 (X1 , ...Xn ) where X1 = X and Xn = Y and for each Xi and Xi+1 it holds Op |= (Xi v (≥1 objreli .Xi+1 )), where objreli is some object property from the domain ontology. Then Y is backwards related to X. We say that an individual x is related to an individual y, if n individuals exist, with n ≥ 2, (x1 , ..., xn ) where x1 = x and xn = y and for each xi and xi+1 it holds A |= (objreli (xi , xi+1 )), where objreli is some object property from the domain ontology. Then y is backwards related to x. Theorem 5.4. Let ontology Op be an abstraction of set of ABoxes Xp and OD be the appropriate domain ontology. Let objvar be any object variable occurring in the WFDP from which Op and Xp are derived. Then it holds: 1. For each individual x= α(ojvar) in any ABox ∈ Xp it exists at least one corresponding class X in Op . 2. For each leafsubclass X v γ(ojvar) in Op it exists a corresponding individual x in at least one of the ABoxes in Xp . Proof. We prove this theorem by induction. We firstly show that for the base case the theorem holds: The ontology containing only the axioms from the domain ontology is an abstraction of the set with one empty ABox. Then we prove in the induction steps for each pair of corresponding rules, that the theorem still holds after applying them with the same WFDP-part to an ontology and a set of ABoxes, for which the theorem holds. • Base Case Since the ontology OP i containing only the axioms from the domain ontology does not contain any class X v γ(objvar) and the set of one empty ABox XP i does not contain any individuals it is straightforward that the theorem holds for the base case. • Induction Step: Rules with names WFDPComp, WFDPCon, itemAnnotationV, itemAnnotationSE and ontoPathPart These rules split WFDP-parts in smaller parts and describe in what order the smaller parts have to be applied to Xp or Op . How these rules split and apply the smaller parts is identical for the OntoRules and the SOABoxRules. Therefore, the theorem still holds after these rules are applied if the theorem hold for the rules describing how to apply the smaller parts to Xp and Op . In the following we prove that the theorem holds for these rules. • Induction Step: Rules with names objvar, objrel, vitemMinMax, vitemMin, vitemMax and vitem
22
Let var be an object variable from WFDP. Then these rules add an assertion for each individual α(var) from each ABox ∈ Xp and an axiom for the class γ(var) from Op according to one of the criteria described in Definition 5.2. Other individuals and classes do not change. It is obvious that the changed individuals still have a corresponding class if they had before and that the changed classes still have at least one corresponding individual in one of the ABoxes if they had before. Furthermore, it is obvious that the corresponding entities of individuals or classes, which are not changed are also not changed. – For rules with name objvar: Xp and Op are extended according to criteria 1: ∗ Each ABox in Xp is extended with concept assertion ontoClass(α(var)), ontoClass being a class description from the domain ontology. ∗ The ontology is extended with γ(var) v ontoClass. – For rules with name objRel: Xp and Op are extended according to criteria 2: ∗ Each ABox in Xp is extended with role assertion objProp(α(var), α(tarvar)), objProp being an object property from the domain ontology. ∗ The ontology is extended with γ(var) v (≥ 1 objProp.γ(tarvar)). – For rules with name vitem, vitemMin, vitemMax and vitemMinMax: Xp and Op are extended according to criteria 3: ∗ For each ABox A in Xp , it is created for each z ∈ datarange a new ABox A’ = A, which is extended with role assertion dataProp(α(var), z), dataProp being a datatype property from the domain ontology. Xp ’ is the set of all newly created ABoxes A’ ∗ The ontology is extended with γ(var) v (≥ 1 dataProp.datarange). • Induction Step: Rules with names specifyitem, existitem Let objvar be some object variable from WFDP, z some integer. In these rules for each leafsubclass L of X = γ(objvar) z new classes YLZ are created and YLZ v CZ is added to Op . Each CZ is either a class description from the domain ontology or a qualified number restriction of form (≥ 1 objProp.γ(tarvar)) where objProp is an object property from the domain ontology and tarvar an object variable from WFDP. For each ABox AO in Xp z new ABoxes AOZ are created from the original set of ABoxes by adding – CZ (α(objvar)) if CZ is any class description from the domain ontology – objProp(α(objvar), α(tarvar)) if CZ is (≥ 1 objProp.γ(tarvar))
23
That means the changes are either according to criteria 1 or 2 from Definition 5.2. The new set of ABoxes Xp ’ is the set of all ABoxes AOZ . Each of the changed individuals had at least one corresponding class L, therefore after the change it has a corresponding class YLZ . Since each class L had at least one corresponding individual in AO before the change, YLZ has at least one corresponding individual in an AOZ after the change. This also holds for all other classes, beside for those classes C which are backwards related to any class L (since L is now not any more a leaf subclass). Therefore, for each class C new subclasses are created, which have relations to the classes YLZ (this is done in the expansion step: Exp(Op , set of all classes L)). Then each class C has again a corresponding individual, which was before a corresponding individual of a class L. – for rules with name specifyitem for each possible value of the specifyitem with associated ontology class description ontoDesc and associated variable var it is created according to abstraction criteria 1: ∗ for each ABox A in Xp a new ABox AontoDesc = A with additional axiom ontoDesc(α(var)). ∗ for each leafsubclass of γ(var) a subclass XontoDesc with inclusion axiom XontoDesc v ontoDesc. Expansion step is conducted for the set of all classes XontoDesc . – for rules with name existitem: ∗ for answer possibility “yes” according to criteria 2 it is created: · for each of ABoxes in Xp a new ABox AY es with additional assertion objProp( α(srcvar),α(tarvar)) · for each leafsubclass of γ(srcvar) a class XY es with inclusion axiom XY es v (≥1 objProp.γ(tarvar)) ∗ for answer possibility “no” according to criteria 1 it is created: · for each of ABoxes in Xp a new ABox AN o with additional assertion ((=0 objProp.ontoClass)(α(srcvar)), with ontoClass being the type of the corresponding variable of the existitem. · for each leaf-subclass of γ(srcvar) a class XN o and inclusion axiom XN o v (=0 objProp.ontoClass) is added to Op ∗ Expansion step is conducted for the set of classes XN o and XY es . • Induction step: Rules with name condition The rule SAB-condition leaves all ABoxes in Xp0 which have a binding to the condition. The rule O-condition leaves all classes in Op0 which are sublcasses of the focal condition class and all classes which are backwards related to any of these classes. We prove that OP ’ is still an abstraction of XP ’ in 4 steps: 1. for each focal individual in each A ∈ Xp ’ at least one corresponding class exists in Op ’, which is shown in part 1 of theorem 5.5 and
24
2. for each focal leaf class from Op0 at least one corresponding individual exists in some ABox A ∈ Xp ’ , which is shown in part 2 of theorem 5.5. 3. for each non-focal individual in each A ∈ Xp ’ at least one corresponding class exists in Op ’. Proof: By definition each non-focal individual a in an ABox is backwards related to the focal individual focind of the ABox. Before the condition was applied according to lemma 5.6 to each of the corresponding classes of focind a class is backwards related, which is the corresponding class of a. Only those non-focal path classes are deleted, which are not backwards related to any focal path class. Thus it still exists a corresponding class for a, since it exists at least one corresponding class for focind as shown in step 1. 4. for each non-focal path leaf class in Op ’ at least one corresponding individual exists in at least one A ∈ Xp ’. Proof: By definition each class A in Op is backwards related to a focal path class FocClass. Before the condition was applied according to lemma 5.6 to each of the corresponding individuals of FocClass an individual is backwards related, which is the corresponding individual of A. Only those nonfocal individuals are deleted, for which the backwards related focal individual is deleted (because only complete ABoxes are deleted). Thus it still exists a corresponding individual for A, since it exists at least one corresponding individual for FocClass as shown in step 2.
Theorem 5.5. Let C be a condition, Xp be a set of ABoxes and ontology Op be its abstraction, after the preprocessing step for C, according to Theorem 5.4. 1. For each tuple (x, c) in a binding B for C to an ABox A ∈ Xp holds, that at least one of the corresponding classes X ∈ Sig(Op ) of individual x is a subclass of θ(c), where c is a variable from C, x an individual from A and X a class from OP . 2. If a focal path class in Op , which is a leafclass, is a subclass of the focal condition class, then the ABoxes in Xp of each corresponding individual are satisfied for the condition. Proof. We prove in the following part 1 (P1) of the theorem by induction over the atoms of a condition. We firstly show that for the base case, the trivial condition, P1 holds. Then we prove in the induction steps for each kind of condition atom that P1 holds for a condition C, which is constructed by adding the atom to a condition Co , for which P1 holds. Proof for part 2 of the theorem is similar. • Base case: The trivial condition is >(focConVar), where > is the DL top class and focConVar the focal condition variable. 25
For each nonempty ABox ∈ XP exactly one valid binding B to C exists, which is {(focInd, focConVar)}, where focInd is the focal individual of A. The only axiom in the condition ontology for C is θ(focConVar) ≡ >. Thus each corresponding class for focInd from OP is a subclass of θ(focvar), because each class is a subclass of >. • Induction step: Atom class(var) (1) C has a valid binding to an ABox A, if for a valid binding of Co to A, which contains the tuple (x, var), the additional constraint A |= class(x) holds, where x is an individual from A. If this constraint holds then each corresponding class for x is a subclass of C. (2) The condition ontology Oc for C is constructed from OcO by replacing θ(var) ≡ CECvar with θ(var) ≡ CECvar u C. (3) P1 holds for Co . From (2) follows that it also holds for C, if at least one corresponding class X ∈ Op of individual x, which is a subclass of the origin class CECvar is a subclass of CECvar u C. From (1) follows that that is the case for each corresponding class of x. • Induction step: Atom objProp(srcvar tarvar) (1) If C has a valid binding to an ABox A, then it is an extension of a valid binding of Co to A, which contains the tuple (x, srcvar), with additional tuple (y, tarvar) and the additional constraint A |= objP rop(x, y) holds, where x and y are individuals from A. If this constraint holds then each corresponding class for x is a subclass of (≥ 1 objProp.Y), where Y is a corresponding class of y. (2) The condition ontology Oc for C is constructed from OcO by replacing θ(srcvar) ≡ CECsrcvar with θ(srcvar) ≡ CECsrcvar u (≥ 1 objProp.θ(tarvar)) and by adding θ(tarvar) ≡ >. (3) P1 holds for Co . From (2) follows that it would also hold for C, if at least one corresponding class X ∈ Op of individual x, which is a subclass of the origin class CECvar is a subclass of CECvar u (≥ 1 objProp.θ(tarvar)). And a corresponding class for y exists, which is a subclass of >. From (1) follows that that is the case. • Induction step: Atom dataProp(srcvar tarvar) (1) If C has a valid binding to an ABox A, then it is an extension of a valid binding of Co to A, which contains the tuple (x, srcvar), with additional tuple (y, tarvar) and the additional constraint A |= dataP rop(x, y) holds, where x is an individual and y a constant from A. If this constraint holds then each corresponding class for x is a subclass of (≥ 1 dataProp). (2) The condition ontology Oc for C is constructed from OcO by replacing θ(srcvar) ≡ CECsrcvar with θ(srcvar) ≡ CECsrcvar u (≥ 1 dataProp). (3) P1 holds for Co . From (2) follows that it would also hold for C, if at least one corresponding class X ∈ Op of individual x, which is a subclass of 26
the origin class CECvar is a subclass of CECvar u (≥ 1 dataProp). From (1) follows that that is the case. • Induction step: Atom dataType(var) (1) A precondition to add atom dataType(var) to Co is that Co contains an atom dataProp(srcvar, var). If C has a valid binding to an ABox A, then it is an extension of a valid binding of Co to A, which contains the tuple (y, var) and constraint A |= dataRangeO(y) holds, with the additional constraint A |= dataRangeC(y), where y is a constant from A and dataRangeC = dataRangeO ∩ dataType. If such a binding exists for an ABox, then for each corresponding class X of x it has to hold: X v (≥1 dataProp.datarangeOnt) with some datarangeOnt with y ∈ datarangeOnt. (2) Condition ontology is extended by replacing the CEC of variable srcvar with (≥ 1 dataProp.datarangeC), when it was before applying the atom (≥ 1 dataProp.datarangO), with datarangeC = datarangeO ∩ dataType (3) Since the preprocessing step for a condition guarantees, that if (≥ 1 dataProp.datarangeC) in condition and x has a corresponding class X with superclass (≥1 dataProp.datarangeOnt) in Op and datarangeOnt overlaps with datarangeC then also a corresponding class for x exists with superclass (≥1 dataProp.datarangeOnt ∩ datarangeC). (4) P1 holds for Co . From (2) follows that it would also hold for C, if at least one corresponding class X ∈ Op of individual x, which is a subclass of the origin class CECvar is a subclass of CECvar u (≥ 1 dataProp.dataRangeC). From (1) and (3) follows that a corresponding class with X v (≥1 dataProp.datarangeOnt ∩ datarangeC) exists and since datarangeOnt ∩ datarangeC is subsumed by datarangeC, X v (≥1 dataProp.datarangeC) holds. • Induction step: Atom cmpOp(var, const) Similar as for rules dataType.
Lemma 5.6. Let ontology Op be an abstraction of set of ABoxes Xp and A be an ABox ∈ Xp . Then the following holds: 1. When X is a corresponding class for individual x in A and class Y is backwards related to X, then an individual y exists in A and y is backwards related to x and Y is a corresponding class of y. 2. When y is backwards related to x and X is a corresponding class of x, then a class Y exists in Op and Y is backwards related to X and y is a corresponding individual of Y. Proof. Proof follows from Theorem 5.4 and criteria 2 of Definition 5.2. 27
5.2.3
Proof of soundness and completeness
The developed algorithms should be complete, i.e. all inconsistencies listed in Sec. 4.4 are detected, and sound, i.e. only the described inconsistencies are detected. From the definition of inconsistencies follows soundness and completeness for the CCABoxes-algorithm. Completeness and soundness of the CCOntoalgorithm is proved by showing that it detects the same inconsistencies as the CCABoxes-algorithm. In the following we will show for each of the inconsistencies “Semantic Data Inconsistency”, “XOR-Stall” and “Unsatisfiable condition” that exactly those inconsistencies are detected with the CCOnto-algorithm as with the CCABoxes-algorithm. Detecting Semantic Data Inconsistencies We have to show that by checking the path ontology Op for consistency and subsumption of data ranges (s. Theorem 5.7) each inconsistency described in section 4.4.2 can be found that can be found by checking Xp for consistency wrt. the domain ontology, where Op is an abstraction of Xp . We prove that with Theorem 5.7. In part 1 of the theorem it is shown that each appearance of an inconsistency described in section 4.4.2 is detected by checking Op for consistency and subsumption of data ranges (Proof of completeness). In part 2 it is shown that if either Op is inconsistent or subsumption of data ranges holds then at least one of the possible ABoxes, which can result from execution of the workflow is inconsistent (Proof of soundness). Theorem 5.7. Let ontology Op be an abstraction of set of ABoxes Xp and OD be the appropriate domain ontology. Let WFDP be the workflow data path from which Op and Xp are derived. 1. If in at least one of the ABoxes in Xp one of the inconsistencies described in Section 4.4.2 occurs, then for Op one of the following axioms hold: • Inconsistent ontology: Op is inconsistent, when checked with a DL-Reasoner. • Subsumption of data ranges: It exists some number restriction DECR = (≥1 datarel.datarangeP) and {(Y v DECR), Y v γ(objvar)} ∈ Op \OD , where datarel is a data property from OD and objvar is an object variable in WFDP. It exists some data value restriction DVR = (∀ datarel.datarangeD) and (E v DVR) ∈ OD , {(X v E), (X v Y)} ∈ Op and datarangeP is not a subset of datarangeD, where E is a class from OD and X is a class from Op . 2. One of the axioms described in part 1 of this theorem can only hold for Op , if at least one ABox ∈ Xp is inconsistent. Proof of part 1. In the following we will prove for each inconsistency described in Section 4.4.2 that if it occurs in one of the ABoxes ∈ Xp then one of the axioms described in Theorem 5.7 holds. 28
1. Violation of disjoint class restrictions According to the definition the inconsistency occurs if at least one ABox A exists in Xp with A |={ontoClass(x); disOntoClass(x)}, where ontoClass and disOntoClass are disjoint class descriptions from the domain ontology. Then according to Theorem 5.4 a class X exists with Op |={X v ontoClass; X v disOntoClass}. Then X is not satisfiable thus Op is inconsistent. 2. Violation of number restrictions Let objRel be an object property from the domain ontology and n a non negative integer, ontClass and ontClassF be classes from the domain ontology. According to the definition of the inconsistency a number restriction from the domain ontology (OD |= ontClass v (≤ n objRel.ontClassF)) is violated if at least one ABox A in Xp exists with an individual x with A |= ontClass(x) and at least n+1 individuals yi with ∀yi=1..n+1 .Op |= {ontclassF( yi ); objRel(x, yi )} exist, with yi 6= yj for i 6= j. Then according to Theorem 5.4 in Op a class X exists with X v ontClass and n+1 classes Yi with ∀Yi=1..n+1 .Yi v ontClassF ∧ X v (≥1 objRel. Yi ) with Yi disjoint to Yj for i 6= j. Since ontClass is also a subclass of (≤ n objRel.ontClassF), X is not satisfiable thus Op is inconsistent. 3. Violation of domain, range or value restrictions, defined for the properties of the domain ontology. According to the definition a range restriction from the domain ontology (OD |= > v ∀ ontoRel.ontologyClass) is violated if at least one ABox A in Xp exists with individuals x and y with A |= {ontoRel(x,y); ¬ ontologyClass(y)}. Then according to Theorem 5.4, in Op classes X and Y exist and Op |= {(Y v ¬ ontologyClass), (X v (≥ 1 ontoRel.Y))}. Then X is not satisfiable thus Op is inconsistent. The proof is similar for value and domain restrictions. 4. Violation of Data Type restrictions According to the definition a datatype restriction from the domain ontology (OD |= C v ∀ datarel.datarangeD) is violated if at least one ABox A in Xp exists with an individual x and a constant y with A |= {datarel(x,y); C(x)} and y ∈ / datarangeD. Then according to Theorem 5.4, in Op a class X exists with X v (≥1 datarel.dataRangeP), X v C, with some dataRangeP and y ∈ dataRangeP and it holds ¬(datarangeP ⊆ datarangeD) Then the second axiom from Theorem 5.7 holds: datarangeP of the data number restriction (≥1 datarel.dataRangeP) that is a direct superclass of X = γ(objvar) is not a subset of the datarangeD of the data value
29
restriction (∀ datarel.datarangeD) with subclass X and with same data type relation. Note: If datarangeT is disjoint from datarangeD then Op is also inconsistent because X is a subclass of (∀ datarel.datarangeD) and (≥1 datarel.dataRangeT).
Proof of part 2. We prove for each axiom that it can only hold, if at least one of the ABoxes from Xp is inconsistent: • Inconsistent ontology: We have to prove that Op is only inconsistent if at least one of the ABoxes in the corresponding Xp is inconsistent. That means if all ABoxes in Xp are consistent then Op is consistent. An ontology of expressivity ALCQ(D) is consistent if all classes are satisfiable. A class C is satisfiable if a consistent ABox with some individual a and the assertion C(a) exists. It is obvious that we can add for a class X to each ABox containing a corresponding individual x the assertion X(x). The resulting ABox is consistent iff the original ABox was consistent. Since each class X in the path ontology has a corresponding individual x in at least one of the ABoxes in Xp , each class X from the path ontology is satisfiable if all ABoxes are consistent. Thus we have proved that Op is consistent if all ABoxes ∈ Xp are consistent. • Subsumption of data ranges: From datarangeD is not a subset of datarangeP it follows that a constant a exists with a ∈ datarangeD and a ∈ / datarangeP. From Y v (≥1 datarel.datarangeP) ∈ Op \OD and ((X v Y), (X v γ(objvar))) and Theorem 5.4 follows that an ABox A for each b ∈ datarangeP exists with A |= datarel(x,b), where X is the corresponding class for x. Then an ABox Ainc exists with Ainc |= {datarel(x,c), datarangeP(c)} and c ∈ / datarangeD. Since Op |= (X v ∀ datarel.datarangeD) from Theorem 5.4 it follows that Ainc |= (∀ datarel.datarangeD)(x). Then the ABox Ainc is inconsistent.
Detecting unsatisfiable Conditions Let Op be an abstraction of Xp . We have to show that in the step “Check satisfiability of conditions for DMCC” the same WFPs are deleted. Therefore we have to show, that a condition has no binding to any of the ABoxes in Xp iff the focal class of the condition ontology does not subsume any focal path class in Op . That follows from Theorem 5.5. Detecting XOR-Stalls Let Op be an abstraction of Xp and CS be a set of conditions (which can possibly occur at an XOR-split). Then we have to show that no Abox exists in Xp for which no condition from CS is satisfied iff for each 30
focal path class fom Op at least one focal condition class of a condition in CS exist, which subsumes it. That follows from Theorem 5.5.
6
Handling concurrency
So far we have only focused on workflows without concurrency. In this section we will explain the extension of the semantics for the workflow annotation and of the algorithms to workflows with concurrency.
6.1
Semantics of Workflow Annotation
When considering concurrency, a partial or complete execution of the workflow is not any more a linear sequence of tasks but an acyclic graph of tasks. We will call such a graph executed workflow instance in the following. To describe the necessary information to assemble the ABox for an executed workflow instance we introduce the notion of executed workflow data instance (EWDI). A workflow data instance consists of nodes and of edges. The nodes are the the begin task of the workflow and the end task of the execution and each AND-split or ANDJoin the workflow instance comprises. The edges are the executed workflow paths connecting the nodes. Therefore, we describe an EWDI by listing its edges with the nodes they start from and end to. The edges of the EWDI have to be ordered in a way that for each node A, it holds that edges ending at node A have to be listed before edges starting from node A. Since we do not consider cyclic workflows this order can be easily achieved. To derive the ABox for one EWDI we defined a set of rules called ConABoxrules, which extend the set of ABox-rules with a set of rules described in Appendix B.6. For each of the nodes an ABox is calculated by deriving it from the ABoxes of its predecessor nodes. The start node is initialized with the empty ABox. Each node exept the AND-join nodes has one direct predecessor node. For those nodes the ABox is derived by extending the ABox of its direct predecessor node with the corresponding EWDP. An AND-join node has more then one direct predecessor node. The ABox of an AND-join node is the union of the ABoxes of its direct predecessor nodes extended with the corresponding EWDP. The calculated ABoxes are stored in a list LA with their corresponding nodes. LA is initialized with the empty ABox for all nodes contained in EWDI. The ABox for the end node is the resulting ABox for the EWDI. Definitions of inconsistencies for concurrent workflows is the same as described in Sec. 4.4.
6.2
Consistency checking algorithms
Considering concurrency, a flow through the workflow is not any more a sequence of tasks but an acyclig graph called a workflow instance. Therefore, we adapt the CCABoxes-algorithm in order that it calculates for each such flow a set of ABoxes called Xf The necessary information to assemble Xf for the flow
31
is described as workflow data instance (WFDI), which describes as an acyclic graph the complete items and conditions for the flow. Similar as in the EWDI, the nodes of an WFDI are the begin task of the workflow and the end task of the execution and each AND-split or AND-Join the workflow instance comprises. The edges are the workflow data paths connecting the nodes. To derive Xf for a WFDI we defined a set of rules called ConSOABoxesrules, which extend the set of SOABoxes-rules with a set of rules described in Appendix B.7. For each of the nodes Xf is calculated by deriving it from the Xf of its predecessor nodes. The start node is initialized with the initial Xf i . For the nodes, except the AND-joins, Xf is derived from the Xf of its predecessor node extended with WFDP. Xf for an AND-join node is the union of each ABox in Xf of its direct predecessor nodes extended with the corresponding EWDP (as described in function joinXf ). The Xf for the end node is the resulting Xf for the EWDI. The abstraction of Xf is Of which is derived with rule system ConOntoRules, which extends the set of OntoRules with a set of rules described in Appendix B.8. The extension is similar to the one described for the CCABoxes-algorithm.
6.3
Abstraction criteria
The abstraction criteria hold for Xf and Of . That follows from Theorem 5.4 and the fact that the abstraction criteria also hold for Xf derived from function joinXf (Xf ’, Xf ”) and Of derived from joinOf (Of ’, Of ”), where Of ’ is an abstraction of Xf ’ and Of ’ is an abstraction of Xf ’.
7
Related Work
7.1
Consistency Checking for Workflows
Various consistency checking algorithms exist to check structural consistency (e.g. [VvdAtH07], [QXW+ 07] or [PM07]), but few integrate data (e.g. [SZNS06], [SSB07], [SOSF04] or [Esh02]). Sun et al. [SZNS06] claim to have developed the first complete framework to analyze data-flow anomalies in workflow management. They describe the data perspective for workflows with data-flow matrices to enable the detection of data-flow anomalies. They define three basic types of data flow errors: missing data, redundant data and conflicting data and provide an analytical approach for detecting and eliminating the three kind of inconsistencies. Comparison of our and their approach is not straightforward since they do not focus on tasks to collect data and concentrate more on structural data flow inconsistencies then on inconsistencies on the semantic level of the data. Nonetheless, we discuss in the following that in principle the inconsistencies they define are included in the inconsistencies we define or are not relevant in our setting: 1. Missing Data. A missing data inconsistency occurs if a data item is accessed before it is initialized. 32
In our setting data is only accessed in conditions. Checking for “Unsatisfiable conditions” includes checking for missing data. 2. Redundant Data is defined as data that is produced by an activity and is not the input of an other activity or does not contribute to the production of the final output data. In our setting the data collected on forms can be compared to their “data produced by an activity”. Our tasks in comparison to the activities of Sun’s framework have the aim to collect data. Therefore, in our setting it is not an inconsistency if data collected in the workflow is not used in the workflow itself. 3. Conflicting Data occurs if more then one activity tries to initialies the same data item. We currently omit that kind of inconsistency in avoiding that two items may have the same or equivalent associated variables. However we are able to detect conflicting data on the semantic level, which goes far beyond Sun’s approach. Sundari et al. [SSB07] extend the work of Sun et al. [SZNS06] by more complex control flow structures and introduce a new type of inconsistency called “redundant data in loops”. Since we do currently not consider loops in our workflow language this kind of error can not appear in our setting. Sadiq et al. [SOSF04] describe a data model, which they claim to be sufficient to detect data flow inconsistencies, however they do not describe the algorithms. It has been shown of Sun et al. [SZNS06] that the inconsistencies they define include the inconsistencies defined in Sadiq et al. [SOSF04]. None of the described approaches is able to describe the semantics of the data and use it for consistency checking. Therefore these approaches are not able to detect data inconsistencies, we defined and have limited capability to detect data dependent control flow inconsistencies. However, the control flow perspective of all described approaches is more expressive than the one currently used in SWOD. We are not aware of an algorithm, which is able to detect inconsistencies in complex data perspectives by utilizing semantic annotation.
7.2
Semantic Annotation
Nonetheless, semantic annotation and its applications are an active research field crucial for the realization of the semantic web. Existing work on semantic annotations for web services is especially interesting, since web services can be composed to workflows, also called composite services. Various formats for semantic annotations of Web Services have been defined (e.g [FL08], [MBJ+ 04]), which amongst others describe data, like input and output values, with metadata from ontologies. E.g. SAWSDL (Semantic Annotations for WSDL and XML), recommended by W3C, defines the syntax for semantic annotations, but does
33
not define a formal semantics. The main aims of SAWSDL is automatic service discovery and composition, but enriched with appropriate semantics it can be used for data consistency checking of complex workflows composed from web services. Therefore, in the future we aim to investigate how to integrate semantic web services in our work. However, currently we were interested in allowing collection of data and focused on the semantic annotation for forms.
7.3
Generating Forms from Ontologies
Few related work on generating forms from an ontology exist. These approaches mostly base the forms on the structure of the ontology and do therefore not allow for flexible item creation (e.g. Protege form generation [RKF+ 05]). More flexible approaches allow to define arbitrary items but do not allow to define relational metadata from the ontology, but only link items to classes (e.g. caCore FormBuilder [WR07]). Our approach allows flexibility during form creation through defining different item types, but nevertheless provides relational metadata, through associating paths from the ontology to each item.
8
Discussion
In this paper, we have presented a workflow language with an ontology based data perspective (SWOD) and an appropriate algorithm to detect inconsistencies in the workflow description. Implementation. A prototypical implementation of the CCOnto-algorithm is currently developed in Java using the OWL API [HBN07] and the description logic reasoner Pellet [SPG+ 07]. It is intended to integrate the implementation into ObTiMA, to support users in defining consistent treatment plans. Therefore, we are currently working on generating user-friendly error-messages from the results of the algorithm, which give users proposals to eliminate the detected inconsistencies. Prior to integration in ObTiMA performance tests will be performed to ensure the applicability of the algorithms for big treatment plans. Future Work. In the future we plan to extend the data perspective e.g. by allowing full expressivity of OWL DL 2 for the domain ontology, by allowing to create more complex items from the ontology and to define constants or constraints between items. We aim for a more expressive control flow perspective comprising e.g. cyclic workflows with respect to the extensive research on workflow-patterns [RtHvdAM06]. Furthermore, we plan to work on solutions how to consider time in the data and control flow perspectives. Conclusion. Ontology-based data perspectives in data intensive workflows are well suited to provide the basis for static data analysis. In this work we defined a semantics for a workflow ontology annotation based on ABoxes, which provides the basis for algorithms to find data inconsistencies. We have shown how to use ontologies to finitely abstract infinitely many ABoxes, representing
34
infinite data ranges in such an algorithm. Integrated with existing algorithms for checking structural consistency, the technique described in this paper can have the capability to guarantee soundness of complex workflows.
References [BCM+ 03]
F. Baader, D. Calvanese, D. McGuiness, D. Nardi, and P. PatelSchneider, editors. The Description Logic Handbook, Theory Implementation and Applications. Cambridge University Press, 2003.
[BWC+ 08]
M. Brochhausen, G. Weiler, C. Cocos, H. Stenzhorn, N. Graf, M Doerr, and M. Tsiknakis. The ACGT Master Ontology on Cancer - A new Terminology Source for Oncological Practice. In Proc. of 21st IEEE International Symposium on ComputerBased Medical Systems, pages 324–329, 2008.
[Esh02]
R. Eshuis. Semantics and verification of UML activity diagrams for workow modelling. PhD thesis, University of Twente, http://www.ctit.utwente.nl/library/phd/eshuis.pdf, 2002.
[FL08]
J. Farrell and H. Lausen. Semantic annotations for wsdl and xml schema. W3C Recommendation, 2008.
[HBN07]
M. Horridge, S. Bechhofer, and O. Noppens. Igniting the OWL 1.1 Touch Paper: The OWL API. In OWLED 2007, 3rd OWL Experienced and Directions Workshop, Innsbruck, Austria, June 2007.
[HPSB+ 04]
I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean. Swrl: A semantic web rule language. W3C Member Submission, 2004.
[MBJ+ 04]
D. Martin, M. Burstein, Hobbs J., O. Lassila, D. McDermott, S. McIlraith, S. Narayanan, M. Paolucci, B. Parsia, T. Payne, E. Sirin, N. Srinivasan, and K. Sycara. Owl-s: Semantic markup for web services. W3C Member Submission, 2004.
[PM07]
S. Perumal and A. Mahanti. Applying graph search techniques for workflow verification. In Proc. 40th Annual Hawaii International Conference on System Sciences, 2007.
[QXW+ 07]
Yi Qian, Yuming Xu, Zheng Wang, Geguang Pu, Huibiao Zhu, and Chao Cai. Tool Support for BPEL Verification in ActiveBPEL Engine. In Proceedings of the 2007 Australian Software Engineering Conference, pages 90–100, 2007.
35
[RKF+ 05]
D. Rubin, H. Knublauch, R. Fergerson, D. Olivier, and M. Musen. Protege-owl: Creating ontology-driven reasoning applications with the web ontology language. In AMIA Annu Symp Proc., 2005.
[RtHvdAM06] N. Russel, ter Hofstede, W.M.P van der Aalst, and N Mulyar. Workflow control-flow patterns: A revised view. Technical report, BPMcenter.org, 2006. [SOSF04]
S. Sadiq, M. Orlowska, W. Sadiq, and C. Foulger. Data flow and validation in workflow modeling. In Proc. 15th Australasian database conference, volume 27, pages 207–214, 2004.
[SPG+ 07]
B. Sirin, B. Parsia, B.C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. Journal of Web Semantics, 5(2), 2007.
[SSB07]
M. Sundari, A. Sen, and A. Bagchi. Detecting data flow errors in workflows: A systematic graph traversal approach. In Proc 17th Workshop on Information Technology & Systems, 2007.
[SZNS06]
X.S. Sun, J.L. Zhao, J.F. Nunamaker, and O.R.L Sheng. Formulating the data-flow perspective for business process management. Information Systems Research, 17(4):374–391, December 2006.
[TBN+ 08]
M. Tsiknakis, M. Brochhausen, J. Nabrzyski, J. Pucacki, S.G. Sfakianakis, G. Potamias, C. Desmedt, and D. Kafetzopoulos. A semantic grid infrastructure enabling integrated access and analysis of multilevel biomedical data in support of postgenomic clinical trials on cancer. IEEE Transactions on Information Technology in Biomedicine, 12(2):205–217, 2008.
[VvdAtH07]
H.M.W. Verbeek, W.M.P van der Aalst, and A.H.M. ter Hofstede. Verifying Workflows with Cancellation Regions and ORjoins: An Approach Based on Relaxed Soundness and Invariants. Computer Journal, 50(3):294–314, 2007.
[WBG+ 07]
G. Weiler, M. Brochhausen, N. Graf, F. Schera, A. Hoppe, and S. Kiefer. Ontology based data management systems for postgenomic clinical trials within a european grid infrastructure for cancer research. In Proc. of the 29 Annual International Conference of the IEEE EMBS, pages 6434–6437, August 2007.
[WR07]
S. Whitley and D. Reeves. Formbuilder: A tool for promoting data sharing and reuse within the cancer community. In Oncological Nursing Forum, volume 34, 2007.
36
A
Abstract Syntax
In the following we describe the abstract syntax for SWOD, building on this syntax afterwards we describe the abstract syntax for the data structures needed for the algorithms.
A.1
SWOD
//named or complex class from ontology class::=name; //DataProperty from ontology dataProp::= name; //objectProperty from ontology objProp::= name; //data types from OWL dataType::= ”integer”|”string’’... type ::= class | dataType rel ::= objProp | dataProp //SWOD workflow description swodWfd ::= WorkflowDescription(wfTemp wfa) //workflow annotation wfa ::= WorkflowAnnotation(wfID focPointDesc fa+ conditionAnnot∗) fa::= FormAnnotation(formID itemAnnotation+) //item annotation itemAnnotation ::= ItemAnnotation(itemID ontoPath itemConstructor); ontoPath ::= OntoPath(focPointDesc ontoPathPart); ontoPathPart ::= (relAssert varAssert)∗; itemConstructor ::= existItem|specifyItem|valueItem; valueItem ::= Value(dvar Min(minv)? Max(maxv)?); specifyItem ::= Specify(ovar cases); cases ::= case | cases case; case ::= Case(acode ontdesccr); existItem ::= Exist(ovar); focPointDesc ::= varAssert; relAssert ::= rel(srcvar tarvar); varAssert ::= type(var);
//conditions conditionAnnot ::= ConditionAnnotation(conID condition); condition ::= objAtoms|dataAtoms | objAtoms ∧ dataAtoms; objAtoms ::= objProp(x y) | class(x) | objProp(x y) ∧ objAtoms | class(x) ∧ objAtoms dataAtoms ::= relatedDataAtoms ∧ dataAtoms| relatedDataAtoms relatedDataAtoms ::= dataProp(x y) ∧ dataConstraintAtoms | dataProp(x y) dataConstraintAtoms ::= dataConstraintAtoms ∧ dataType(x) | dataConstraintAtoms ∧ cmpOp(x y)
37
| cmpOp(x y) | dataType(x) cmpOp ::= ≤ | ≥ | = 6 |=| //starting rule for workflow template wftemp ::= WfTemplate(wfID task+ transition∗) task::= Task(taskID ’splitXOR’? ’joinXOR’? ’isStartTask’? ’isEndTask’? form) // the conditionID is only relevant for transitions that have // as source task an XOR−split task transition ::= Transition(taskID taskID conditionID?) item::= Item(itemID dataType query possibleAnswer+ minValue? maxValue?) form::= Form(formID name item+ )
A.2
data structures for algorithms
//executed workflow data path ewdp::= ewdp filledItem | filledItem; filledItem := itemAnnotation value; //workflow data path wfdp::= wfdp itemAnnotation| wfdp conditionAnnot | itemAnnotation | conditionAnnot;
nodeType::= ”join”|”nonjoin” wfiNode::= WfiNode(type, id) ewfiPart ::= EwfiPart(wfiNode ewdp wfiNode) wfiPart ::= WfiPart(wfiNode wfdp wfiNode) //executed workflow data instance ewfdi ::= ExWFDI(ewfipart ewfdi) | ewfipart //workflow data instance wfdi ::= WFDI(wfiPart wfdi) | wfdi
B
Rules
In the following the rules, which are used in the algorithms and to define the semantics of SWOD are listed.
B.1
ABoxRules
AB-ewdpA
A ` filledItem −→ A0 A0 ` ewdp −→ A00 A ` filledItem ewdp −→ A00
AB-compItem
A ` focPointDecl −→ A0 A0 ` ontoPathPart −→ A00 00 A ` itemConstructor relAssert varAssert value −→ A000 A ` ItemAnnotation(itemID OntoPath(focPointDesc ontoPathPart relAssert varAssert) itemConstructor) value −→ A000
38
AB-OntoPathpart
A ` varAssert −→ A0 A ` relAssert −→ A00 A00 ` ontoPathPart −→ A000 A ` relAssert varAssert ontoPathPart −→ A000 0
AB-OntoPathpartA
ontoPathPart = ”” A0 = A A ` ontoPathPart −→ A0
AB-objvar
A0 = A ∪ {ontoClass(α(var, focalid))} A ` ontoClass(var) −→ A0
AB-objrel
A0 = A ∪ {rel(α(srcvar, focalid), α(tarvar, focalid))} A ` rel(srcvar, tarvar) −→ A0
AB-vitem
A0 = A ∪ {rel(α(srcvar,focalid), value)} A ` Value(dvar Min(minv)? Max(maxv)?) rel(srcvar tarvar) varAssert value −→ A0
AB-specifyitem
A ` varAssert −→ A0 A ` relAssert −→ A00 A00 ` ovar cases value −→ A000 A ` Specify(ovar cases) relAssert varAssert value −→ A000 0
AB-specifyitemA
A ` ovar cases value −→ A0 A = A ∪ {ontdescr(α(ovar, focalid))|acode = value} A ` ovar cases Case(acode ontdescr) value −→ A00 00
0
AB-specifyitemB
A0 = A ∪ {ontdescr(α(ovar, focalid))|acode = value} A ` ovar Case(acode ontdescr) value −→ A0
AB-exitem
A ` type(var) −→ A0 A = A ∪ {rel(α(srcvar, focalid),α(tarvar, focalid))|value = “true”} A000 = A00 ∪ {(= 0 objVar.type)(α(srcvar, focalid))|value = “false”} A ` Exist(ovar) rel(srcvar tarvar) type(var) −→ A000 00
B.2
0
SOABoxRules
In the following the rules to derive the set of possible ABoxes for one workflow path, which are used in the CCABoxes-algorithm are listed.
39
SAB-wfdpComp
Xp0
Xp ` wfdp −→ Xp0 6= ERROR(kind, id) Xp0 ` itemAnnotation −→ Xp00 Xp ` wfdp itemAnnotation −→ Xp00
SAB-wfdpCon
Xp0
Xp ` wfdp −→ Xp0 6= ERROR(kind, id) Xp0 ` conditionAnnot −→ Xp00 Xp ` wfdp conditionAnnot −→ Xp00
SAB-ItemAnnotationV
Xp ` focPointDesc −→ Xp0 Xp0 ` ontoPathPart −→ Xp00 00 Xp ` itemConstructor relAssert varAssert −→ Xp000 Xp0000 = checkSemInc(Xp000 )
Xp ` ItemAnnotation(OntoPath(focPointDesc ontoPathPart relAssert varAssert) itemConstructor) −→ Xp0000 SAB-ItemAnnotationSE
Xp ` focPointDesc −→ Xp0 Xp0 ` ontoPathPart −→ Xp00 00 000 Xp ` varAssert −→ Xp Xp000 ` relAssert −→ Xp0000 0000 00000 Xp ` itemConstructor −→ Xp Xp∗ = checkSemInc(Xp00000 )
Xp ` ItemAnnotation(itemID OntoPath(focPointDesc ontoPathPart relAssert varAssert) itemConstructor) −→ Xp∗ SAB-ontoPathPart
Xp0
Xp ` varAssert −→ Xp0 ` relAssert −→ Xp00 Xp00 ` ontoPathPart −→ Xp000 Xp ` relAssert varAssert ontoPathPart −→ Xp000
SAB-OntoPathpartA
ontoPathPart = ”” X0 = X X ` ontoPathPart −→ X 0
SAB-objvar
Xp0 = {Ar |∀A ∈ Xp .Ar = A ∪ type(α(var))} Xp ` type(var) −→ Xp0
SAB-objrel
Xp0 = {Ar |∀A ∈ Xp .Ar = A ∪ rel(α(srcvar), α(tarvar)) Xp ` rel(srcvar tarvar) −→ Xp0
SAB-vitemMinMax
Xp0 = {Ar |∀A ∈ Xp .∀z ∈ ontoDataT ype.Ar = A ∪{rel(α(srcvar), z)} ∧ minv < z < maxv} Xp ` Value(iID dvar Min(minv) Max(maxv)) rel(srcvar tarvar) ontoDataType(var) −→ Xp0
40
SAB-vitemMin
Xp0 = {Ar |∀A ∈ Xp .∀z ∈ ontoDataT ype.Ar = A ∪{rel(α(srcvar), z)} ∧ minv < z} Xp ` Value(iID dvar Min(minv)) rel(srcvar tarvar) ontoDataType(var) −→ Xp0
SAB-vitemMax
Xp0 = {Ar |∀A ∈ Xp .∀z ∈ ontoDataT ype.Ar = A ∪{rel(α(srcvar), z)} ∧ z < maxv} Xp ` Value(iID dvar Max(maxv)) rel(srcvar tarvar) ontoDataType(var) −→ Xp0
SAB-vitem
Xp0 = {Ar |∀A ∈ Xp .∀z ∈ ontoDataT ype.Ar = A ∪ {rel(α(srcvar), z)}} Xp ` Value(iID dvar) rel(srcvar tarvar) ontoDataType(var) −→ Xp0
SAB-specifyitem
Xp0 = {Ar |∀A ∈ Xp .∀case.Ar = A ∪ {ontdescr(α(ovar)}} cases::=Case(acode ontdescr)* Xp ` Specify(iID ovar cases) −→ Xp0
SAB-exitem
Xp0 = {Ar |∀A ∈ Xp .Ar = A ∪ {rel(α(srcvar),α(tarvar)) } ∪{As |∀A ∈ Xp .As = A ∪ {(= 0 rel.ontoClass)(α(srcvar))}\{rel(α(srcvar),α(tarvar))}} Xp ` Exist(iID ovar) rel(srcvar tarvar) ontoClass (ovar) −→ Xp0
SAB-condition
Xp0 = {A|A ∈ Xp ∧ Bin(A, condition) 6= ∅)
Xp ` ConditionAnnotation(conID condition) −→ Xp0
B.3
InfRules-Bin
In the following the rules to derive a binding for an ABox and a condition, which are used in the CCABoxes-algorithm are listed. Bin-cond
Bin ` objAtoms −→ Bin0 Bin0 ` dataAtoms −→ Bin00 Bin, A ` objAtoms ∧ dataAtoms −→ Bin00 , A
Bin-objAtomsA
Bin ` objProp(x y) −→ Bin0 Bin0 ` objAtoms −→ Bin00 Bin, A ` objProp(x y) ∧ objAtoms −→ Bin00 , A
Bin-objAtomsB
Bin ` class(x) −→ Bin0 Bin0 ` objAtoms −→ Bin00 Bin, A ` class(x) ∧ objAtoms −→ Bin00 , A
41
Bin-class
∀B ∈ Bin.∃(b, x) ∈ B
Bin0 = {B|B ∈ Bin.∃(b, x) ∈ B ∧ (A |= class(b))} Bin, A ` class(x) −→ Bin0 , A
Bin-objPropNE
∀B ∈ Bin.∃(b, x) ∈ B ∧ @(d, y) ∈ B Bin0 = {B|∀B 0 ∈ Bin.∀d.(b, x) ∈ B 0 ∧ (A |= objP rop(b, d)) ∧ B = B 0 ∪ (d, y)} Bin, A ` objProp(x y) −→ Bin0 , A
Bin-obPropE
∀B ∈ Bin.∃(b, x) ∈ B ∧ ∃(d, y) ∈ B Bin0 = {B|∀B ∈ Bin.(d, y) ∈ B ∧ (b, x) ∈ B ∧ (A |= objP rop(b, d))} Bin, A ` objProp(x y)) −→ Bin0 , A
Bin-dataAtoms
Bin ` relatedDataAtoms −→ Bin0 Bin0 ` dataAtoms −→ Bin00 Bin, A ` relatedDataAtoms ∧ dataAtoms −→ Bin00 , A
Bin-relDataAtoms
∀B ∈ Bin.∃(b, x) ∈ B ∧ @(d, y) ∈ B Bin0 = {B|∀B 0 ∈ Bin.∀d.(b, x) ∈ B 0 ∧ (A |= dataP rop(b, d)) ∧ B = B 0 ∪ (d, y)} Bin0 ` dataConstraintAtoms −→ Bin00 Bin, A ` dataProp(x y) ∧ dataConstraintAtoms −→ Bin00 , A
Bin-dataProp
∀B ∈ Bin.∃(b, x) ∈ B ∧ @(d, y) ∈ B Bin0 = {B|∀B 0 ∈ Bin.∀d.(b, x) ∈ B 0 ∧ (A |= dataP rop(b, d)) ∧ B = B 0 ∪ (d, y)} Bin, A ` dataProp(x y) −→ Bin0 , A
Bin-dataConAtomsA
Bin ` dataConstraintAtoms −→ Bin0 Bin0 ` dataType(x) −→ Bin00 Bin, A ` dataConstraintAtoms ∧ dataType(x) −→ Bin00 , A
Bin-cond
Bin ` dataConstraintAtoms −→ Bin0 Bin0 ` cmpOp(x y) −→ Bin00 Bin, A ` dataConstraintAtoms ∧ cmpOp(x y) −→ Bin00 , A
Bin-dataType
∀B ∈ Bin.∃(b, x) ∈ B Bin0 = {B|B ∈ Bin.(b, x) ∈ B ∧ (A |= dataT ype(b))} Bin, A ` dataType(x) −→ Bin0 , A
Bin-cmpOp
∀B ∈ Bin.∃(b, x) ∈ B Bin0 = {B|B ∈ Bin.(b, x) ∈ B ∧ (b cmpOp y)} Bin, A ` cmpOp(x y) −→ Bin0 , A
42
B.4
OntoRules
In the following rules to derive the path ontology for one workflow path, which are used in the CCOnto-algorithm are listed. O-wfdpComp
Op0
Op ` wfdp −→ Op0 6= ERROR(kind, id) Op0 ` itemAnnotation −→ Op00 Op ` wfdp itemAnnotation −→ Op00
O-wfdpCon
Op0
Op ` wfdp −→ Op0 6= ERROR(kind, id) Op0 ` conditionAnnot −→ Op00 Op ` wfdp conditionAnnot −→ Op00
O-ItemAnnotationV
Op ` focPointDesc −→ Op0 Op0 ` ontoPathPart −→ Op00 00 Op ` itemConstructor relAssert varAssert −→ Op000 Op0000 = checkSemInc(Op000 )
Op ` ItemAnnotation(OntoPath(focPointDesc ontoPathPart relAssert varAssert) itemConstructor) −→ Op0000 O-ItemAnnotationSE
Op ` focPointDesc −→ Op0 Op0 ` ontoPathPart −→ Op00 00 000 Op ` varAssert −→ Op Op000 ` relAssert −→ Op0000 0000 00000 Op ` itemConstructor −→ Op Op∗ = checkSemInc(Op00000 )
Op ` ItemAnnotation(OntoPath(focPointDesc ontoPathPart relAssert varAssert) itemConstructor) −→ Op∗ O-OntoPathpartA
ontoPathPart = ”” O0 = O O ` ontoPathPart −→ O0
O-ontoPathPart
Op0
Op ` varAssert −→ Op0 ` relAssert −→ Op00 Op00 ` ontoPathPart −→ Op000 Op ` relAssert varAssert ontoPathPart −→ Op000
O-objVar
Op000
=
Op00
Op0 = Op ∪ Op0 ∪ {γ(var) v ontoClass} ∪ {(γ(var) v ¬C) | (C ∈ Sig(OP00 )) ∧ (OP00 |= C 6v γ(var))} Op ` ontoClass (var) −→ Op000
O-objRel
Op0 = Op ∪ {γ(srcvar) v (≥ 1 rel.γ(tarvar))} Op ` rel(srcvar tarvar) −→ Op00
43
O-vitemMinMax
Op0 = Op ∪ {γ(srcvar) v ∃rel.ontoDataType[ > minv, < maxv]} Op ` Value(iID dvar Min(minv) Max(maxv)) rel(srcvar tarvar) ontoDataType(var) −→ Op0
O-vitemMin
Op0 = Op ∪ {γ(srcvar) v ∃rel.ontoDataType[ > minv]} Op ` Value(iID dvar Min(minv)) rel(srcvar tarvar) ontoDataType(var) −→ Op0
O-vitemMax
Op0 = Op ∪ {γ(srcvar) v ∃rel.ontoDataType[ < maxv]} Op ` Value(iID dvar Max(maxv)) rel(srcvar tarvar) ontoDataType(var) −→ Op0
O-vitem
Op0 = Op ∪ {γ(srcvar) v ∃rel.ontoDataType}
Op ` Value(iID dvar) rel(srcvar tarvar) ontoDataType(var) −→ Op0 O-specifyitem
cases::=Case(acode ontdescr)* z = {E ∈ Sig(Op )|@D.E v γ(ovar) ∧ D @ E} Op0 = Op ∪ {(F v E), (F v ontdescr)|∀E ∈ z.∀case.F = newClass()} Op00 = Expand(Op0 , z) Op ` Specify(iID ovar cases) −→ Op00 O-exitem
z = {E ∈ Sig(Op )|@D.E v γ(srcvar) ∧ D @ E} Op0 = Op ∪ {(F v E), (F v (≥1 rel.γ(tarvar)))|∀E ∈ z.F = newClass()} ∪{(F v E), (F v (=0 rel.ontoClass))|∀E ∈ z.F = newClass()}\ {γ(srcvar) v (≥ 1 rel.γ(tarvar)))} Op00 = Expand(Op0 , z) Op ` Exist(iID ovar) rel(srcvar tarvar) ontoClass(ovar) −→ Op00
O-condition
Op0
Op0 = P rep(Op , condition) = Op \{(X v Y )|@Z.Z v X ∧ Z v f oc(condition)}
Op000 = Delete(Op00 )
Op ` ConditionAnnot(conID condition −→ Op000
B.5
InfRules-CO
In the following rules to derive a condition ontology, which are used in the CCOnto-algorithm are listed. TC-cond
0 0 00 Ocond ` objAtoms −→ Ocond Ocond ` dataAtoms −→ Ocond 00 Ocond ` objAtoms ∧ dataAtoms −→ Ocond
44
TC-objAtomsA
0 0 00 Ocond ` objProp(x y) −→ Ocond Ocond ` objAtoms −→ Ocond 00 Ocond ` objProp(x y) ∧ objAtoms −→ Ocond
TC-objAtomsB
0 0 00 Ocond ` class(x) −→ Ocond Ocond ` objAtoms −→ Ocond 00 Ocond ` class(x) ∧ objAtoms −→ Ocond
TC-class
∃CECx |((θ(x) ≡ CECx ) ∈ Ocond ) 0 Ocond = Ocond \{θ(x) ≡ CECx } ∪ {θ(x) ≡ CECx u class} 0 Ocond ` class(x) −→ Ocond
TC-objPropNE
∃CECx |((θ(x) ≡ CECx ) ∈ Ocond ) ∧ @CECy |((θ(y) ≡ CECy ) ∈ Ocond ) 0 Ocond = Ocond ∪ {θ(y) ≡ >} 00 0 Ocond = Ocond \{θ(x) ≡ CECx } ∪ {θ(x) ≡ CECx u ≥ 1 objProp.θ(y)} 00 Ocond ` objProp(x y) −→ Ocond
TC-objPropE
∃CECx |((θ(x) ≡ CECx ) ∈ Ocond ) ∧ ∃CECy |((θ(y) ≡ CECy ) ∈ Ocond ) 0 Ocond = Ocond \{θ(x) ≡ CECx } ∪ {θ(x) ≡ CECx u ≥ 1 objProp.θ(y)} 0 Ocond ` objProp(x y) −→ Ocond
TC-dataAtoms
0 0 00 Ocond ` relatedDataAtoms −→ Ocond Ocond ` dataAtoms −→ Ocond 00 Ocond ` relatedDataAtoms ∧ dataAtoms −→ Ocond
TC-relDataAtoms
∃CECx |((θ(x) ≡ CECx ) ∈ Ocond ) R=> R ` dataConstraintAtoms −→ R00 0 Ocond = Ocond \{θ(x) ≡ CECx } ∪ {θ(x) ≡ CECx u ≥ 1 dataProp.R0 } 00 Ocond ` dataProp(x y) ∧ dataConstraintAtoms −→ Ocond
TC-dataProp
∃CECx |((θ(x) ≡ CECx ) ∈ Ocond ) 0 Ocond = Ocond \{θ(x) ≡ CECx } ∪ {θ(x) ≡ CECx u ≥ 1 dataProp} 0 Ocond ` dataProp(x y) −→ Ocond
TC-dataConAtomsA
R ` dataConstraintAtoms −→ R0 R0 ` dataType(x) −→ R00 R ` dataConstraintAtoms ∧ dataType(x) −→ R00
TC-dataConAtomsB
R ` dataConstraintAtoms −→ R0 R0 ` cmpOp(x y) −→ R00 00 Ocond ` dataConstraintAtoms ∧ cmpOp(x y) −→ Ocond
45
TC-datatype
R0 = R ∩ dataType R ` dataType(var) −→ R0
TC-cmpOp
R0 = R ∩ [cmpOp const]) R ` cmpOp(var const) −→ R0
B.6
ConABoxRules
CA-ewfi
LA ` ewfipart −→ LA0 LA0 ` ewfdi −→ LA00 LA ` ExWFDI(ewfipart ewfdi) −→ LA00
CA-nojoin
A = getA(LA, srcid) A ` ewdp −→ A0 LA0 = setA(LA, A0 , tarid) LA ` EwfiPart(WfiNode(type, srcid) ewdp WfiNode(type, tarid)) −→ LA0
CA-join
A = getA(LA, srcid) A0 = getA(LA, tarid) 0 00 A ` ewdp −→ A LA = setA(LA, A0 ∪ A00 , tarid) LA ` EwfiPart(WfiNode(type, srcid) ewdp WfiNode(“join”, tarid)) −→ LA0
B.7
ConSOABoxRules
CS-wfi
LX ` wfipart −→ LX 0 LX 0 ` wfdi −→ LX 00 LX ` WFDI(wfipart wfdi) −→ LX 00
CS-nojoin
X = getX (LX , srcid) X ` wfdp −→ X 0 LX 0 = setX (LX , X 0 , tarid) LX ` WfiPart(WfiNode(type, srcid) wfdp WfiNode(type, tarid)) −→ LX 0
CS-join
X = getX (LX , srcid) X 0 = getX (LX , tarid) 0 00 X ` wfdp −→ X LX = setX (LX , joinX (X 0 , X 00 ), tarid) LX ` WfiPart(WfiNode(type, srcid) wfdp WfiNode(“join”, tarid)) −→ LX 0
B.8
ConOntoRules
CO-wfi
LO ` wfipart −→ LO0 LO0 ` wfdi −→ LO00 LO ` WFDI(wfipart wfdi) −→ LO00
46
CO-nojoin
O = getO(LO, srcid) O ` wfdp −→ O0 LO0 = setO(LO, O0 , tarid) LO ` WfiPart(WfiNode(type, srcid) wfdp WfiNode(type, tarid)) −→ LO0
CO-join
O = getO(LO, srcid) O0 = getO(LO, tarid) 0 00 O ` wfdp −→ O LO = setO(LO, joinO(O0 , O00 ), tarid) LO ` WfiPart(WfiNode(type, srcid) wfdp WfiNode(“join”, tarid)) −→ LO0
C
Description of functions for rules
//for each leafsubclass C, which is directly related to one of //SuperClasses S a new class N for each leafsubclass LS of S //is created and N is related to the appropriate LS and //becomes a subclass of C. Expand(Op , SuperClasses){ foreach SuperClass ∈ SuperClasses{ SetOfSubClasses = leafsubclasses of SuperClass foreach leafsubclass C of any class γ(objvar) with (C v (≥1 objrel.SuperClass)) ∈ Op { foreach class Z ∈ SetOfSubClasses{ N = newClass() Op = Op ∪ {(N v (≥1 objrel.Z)), (N v C)} } Expand(Op , {C}) } } } //for each restriction R = (≥ 1 datarel.datarangeC) which is part //of any class in condition ontology, for each class X in Op which //has direct superClass (≥1 datarel.datarangeP) and datarangeC overlaps with //datarangeP and ¬(datarangeP ⊆ datarangeC) //two subclasses of each leafsubclass of X are created from which one //is subclass of (≥1 objrel.datarangeP ∩ datarangeC)) Preprocess(OWLOntology Op , Condition condition){ foreach (≥ 1 datarel.datarangeC) which is part of any class in condition ontology{ foreach class X in Op which has direct superClass (≥1 datarel.datarangeP) and datarangeC overlaps with datarangeP and !(datarangeP ⊆ datarangeC){ foreach leafsubclass L of X{ N = newClass(); O = newClass(); Op = Op ∪ {(N v (≥1 objrel.datarangeP ∩ datarangeC)), (N v X), (O v X)} } } }
47
} //deletes all classes which are not any more related to a subclass //of focal path class Delete(OWLOntology Op ) //determine individual for object variable and focalID α(objectVariable ovar, Id focalID) //determine individual for object variable with constant focalID α(objectVariable ovar) //determine path class for object variable γ(objectVariable ovar) //determine condition class for condition variable θ(conditionVariable conVar)
//returns all top path classes in O, which are not C getTopPathClasses(OWLOntology O, OWLClass C)
Set of ABoxes joinX (X , X ’){ newX = {} foreach ABox A ∈ X { foreach ABox A’ ∈ X ’{ newA = A ∪ A’ newX = newX ∪ {newA} } } return newX } ontology joinO(ontology O, ontology O0 ){ newO = O ∩ O’ while there is an unprocessed objectvar unprocObjVar occuring in O or O’, which has no restriction where the filler is a subclass of an unprocessed objectvar, process unprocObjVar:{ if it does not exist a subclass of α(unprocObjVar), which is in Sig(O) and is not in Sig(newO){ foreach sub− and superclass axiom ax containing a subclass of α(unprocObjVar)of O’{ newO = newO ∪ ax } }else if it does not exist a subclass of α(unprocObjVar), which is in Sig(O’) and is not in Sig(newO){ foreach sub− and superclass axiom ax containing a subclass of α(unprocObjVar)of
48
O’{ newO = newO ∪ ax } }else { foreach leafsubclass LC of α(objvar) in Sig(O){ foreach leafsubclass LC’ of α(objvar) in Sig(O’){ create a new named class newLC determine all superclasses of LC from Sig(O) and of LC’ from Sig(O0 ), which are not named classes of Sig(O) ∪ Sig(O’), and put them into a list lisSup for all restrictions in lisSup, which contain a class, which is not in Sig(newO): replace class with appropriate newly created class foreach leafsubclass leafSub of α(unprocObjVar) in Sig(newO){ newO = newO ∪ newLC v leafSub foreach class lisCl in lisSup{ newO = newO ∪ newLC v lisSub } } } } } } } //focal variable of condition foc(condition) = θ (focalvar)
49