A Data Quality Approach to Conformance Checks for Business Network Models

Daniel Ritter

Stephanie Rupprich

SAP AG, Technology Development, Dietmar-Hopp-Allee 16, 69190 Walldorf, Germany Email: [email protected]

University Heidelberg, Grabengasse 1, 69117 Heidelberg, Germany Email: [email protected]

Abstract—The recent advances in programming languages, compilers and hardware allow software systems to process huge amounts of data. Therefore, data is represented in domain-specific models, capturing real-world artifacts as structured data for calculation and analysis. To guarantee data quality and reduce uncertainty within the domain information, model- and content-based data conformance has to be checked. Domain model changes often require code changes for validations and redeployment to keep pace. Current programming techniques and tools address this topic only partially, if at all. In this paper, we present a novel approach to content-based validation of structured data for arbitrary domains. We define a validation programming model and show how a compiler and run-time system based on Deterministic Finite Automata (DFA) can be generated. We validated this approach by applying it to the domain of Network Mining (NM), which requires conformance checks for discovered raw data used to compute business networks. The validation programs are illustrated by debugging the run-time system with textual and graphical tools.

I. INTRODUCTION

Enterprises are part of value chains consisting of business processes with intra- and inter-enterprise stakeholders. To remain competitive, enterprises need insight into their business network and ideally into the relevant parts of partner and customer networks and their processes. However, currently this insight often ends at the borders of systems or enterprises. Business Network Management (BNM) helps to overcome this situation and allows companies to get insight into their (technical) integration, social and business relations [20], [22]. The model used to capture real-world entities that constitute such a network consists of linked integration and business data [18], [19]. The data represents information about, e.g., applications, business processes, (integration) middleware, and system landscapes, which is hidden in the different domains. This raw data is discovered by Network Mining (NM), a part of BNM, and represented as Datalog facts [25], which allows reasoning on the data to re-construct the business network, consisting of nodes, called system_discx, and edges, called message_flow_discx [21]. Further entities are named subsequently while defining our approach. For the inference task, the quality and conformance of the raw data is important. The declarative approaches, e.g. RDF [23] in the linked data domain or Datalog [25] in the case of BNM, do not expose an explicit domain model. Rather, domain models are only implicitly represented in

these declarative languages. Hence, conformance checks according to the model cannot be done in an XML-schema-validation- or database-like manner. In this paper, we propose a general approach to domain model conformance checks using Deterministic Finite Automata (DFA). We present and evaluate our approach by applying it to the NM domain, which covers common linked data, semantic web and quality issues. Our approach describes how data from complex linked data models can be checked for conformance using a validation-specific programming language and its compile and runtime aspects. Hereby we show how validation programs can be written, compiled, executed and debugged on sample data from NM represented as Datalog facts. In a model-driven way, this approach allows even non-technical users to write custom conformance checks on structured data. Section II introduces the domain and links to related work, which is further discussed in Section III. Section IV describes basic design principles, while the programming model, compile and run-time systems are defined in Section V. Section VI illustrates the approach by debugging "non-conforming" data. Section VII gives conclusions and outlines future work.

II. CONFORMANCE CHECK DEFINITION AND APPROACHES

A. Business Network Mining and Conformance

As part of BNM, Network Mining (NM) is the discipline that covers the discovery, extraction and domain-specific analysis of relevant data from dynamic, distributed and heterogeneous enterprise landscapes [20]. The data is automatically discovered and the resulting raw material is transformed into a model suitable to cover all aspects and to allow inference on the captured "as-is" state of the network [22]. An excerpt of the NM model as a dependency graph is shown in Fig. 1. The basic model entities for integration networks are the system_discx and message_flow_discx Datalog facts, representing the knowledge about nodes and edges of the network. The system_discx and host_discx entities, i.e. the physical host a software system runs on, depend on data_discx facts which reference the original domain data from the various sources. The same_system_discx and runs_on_discx facts denote semantic links between model entities. A dependency stands for an existential relationship between entities or facts from different levels, where higher-level facts depend on lower-level facts.

Fig. 1. Excerpt of the dependency graph of the NM model represented as Datalog facts

The dependencies as shown in Fig. 1 are calculated by the validation program compiler (VPC) in our approach. The dependency graph of the model gives insight into its structure, which allows for efficient processing and optimization during compilation and run-time, e.g. distribution or shared memory schemas. Similar to the data quality problem classification in [8], the term model conformance check defines the compliance of the structure and content according to its implicit model, while meeting certain quality requirements. Among these quality requirements are QR1: syntax violations, i.e. data values contain disallowed characters or do not match predefined patterns, QR2: illegal values, QR3: missing properties or values, i.e. a property of an instance does not exist or does not hold a value, QR4: missing references, i.e. relations cannot be determined, QR5: out of range values, QR6: duplicate instances, QR7: uniqueness violations, QR8: invalid references, and QR9: functional dependency violations, i.e. a combination of different property values that is semantically not allowed in the same record. With these requirements, the completeness of the data and its properties, their syntactic and semantic accuracy, as well as model entity and property uniqueness can be checked. To ensure the model conformance of NM data, the consistency of the structure and the content of the facts is checked. That means the existence of all required fact arguments, the compliance with argument restrictions, e.g. the length of an argument, and the resolution of dependencies between facts, e.g. a valid system_discx fact requires a valid data_discx fact, are validated. Thereby, dependencies between facts are defined by URIs [1], e.g. uri, systemUri, which are also used to identify the fact. For instance, a system_discx fact is valid if and only if it has a name and a URI, the name is not empty and within defined length boundaries, and there is a valid data_discx fact in this fact set.
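To make this conformance rule concrete, the following is a minimal, hypothetical Java sketch of such a check. The Fact record, the argument layout and the check driver are assumptions for illustration only; the rule itself (non-empty name within length boundaries, a URI, and a matching data_discx fact) and the 255-character bound, borrowed from the system-name example later in the paper, are taken from the text.

import java.util.List;
import java.util.Set;

// Illustrative only: the record and field layout below are assumptions, not the paper's API.
public class SystemDiscxCheck {

    // A discovered fact: predicate plus positional arguments, e.g. system_discx(name, description, systemURI).
    record Fact(String predicate, List<String> args) {}

    static final int MAX_NAME_LENGTH = 255; // boundary taken from the paper's system-name example

    /** Checks the conformance rule quoted above against a set of discovered data_discx URIs. */
    static boolean isValidSystemDiscx(Fact system, Set<String> dataDiscxUris) {
        if (system.args().size() < 3) return false;           // QR3: missing properties/values
        String name = system.args().get(0);
        String uri  = system.args().get(2);
        if (name == null || name.isEmpty()) return false;     // QR3: missing value
        if (name.length() > MAX_NAME_LENGTH) return false;    // QR5: out of range value
        if (uri == null || uri.isEmpty()) return false;       // QR4: missing reference
        return dataDiscxUris.contains(uri);                   // QR8: invalid reference otherwise
    }

    public static void main(String[] args) {
        Fact system = new Fact("system_discx", List.of("CRM host", "desc", "systemUri"));
        Set<String> knownDataDiscx = Set.of("systemUri");      // URIs of valid data_discx facts
        System.out.println(isValidSystemDiscx(system, knownDataDiscx)); // true
    }
}

In this sketch, the dependency on data_discx is resolved by a simple URI lookup, mirroring the URI-based dependencies described above.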

B. Conformance Check Approaches

1) Datalog Validation Rules: Conformance check approaches can be found either in the technology used for a specific system, e.g. a Datalog reasoner [25], or in the data format, e.g. XML [11]. For the BNM system, the knowledge is encoded as Datalog facts. A native approach to Datalog fact validation is the insertion of validation rules into Datalog programs, i.e. facts and inference rules. For instance, let runs_on_discx be a fact with corresponding system_discx, host_discx and data_discx facts, as defined in List. 1, where the arguments of the facts are reduced to their URIs. The relation specifies which software system is running on which hardware. Consequently, a valid runs_on_discx fact must reference valid system_discx and host_discx facts. However, these facts are in turn valid if and only if valid data_discx facts exist. With that, a validation program can be defined as Datalog rules [25] (see List. 1).

Listing 1. Datalog validation rules

valid_system(?x) :- system_discx(?x), data_discx(?x).
valid_host(?x) :- host_discx(?x), data_discx(?x).
valid_runs_on(?x, ?y) :- runs_on_discx(?x, ?y), valid_system(?x), valid_host(?y).

Now, the query ?- valid_runs_on(?x, ?y) would return the correct tuple ('systemURI', 'hostURI'). However, the structural conformance check according to the model becomes intractable with an increasing number of model entities or facts, since for each fact there has to be a rule that defines its validity. This would result in complex Datalog programs with at least two facts of similar meaning instead of only the original fact. Furthermore, it is not possible to validate the facts based on their content. This means that there is no way to handle required and optional arguments in a different way.

2) Constraint Handling Rules: An alternative to that approach could be Constraint Handling Rules (CHR) [5], [6], [7], where user-defined constraints are brought into a higher-level host language, e.g. Datalog. These constraints may include propositional rules, i.e. constraints without arguments, logical variables, i.e. constraints with arguments, and built-in constraints, i.e. constraints predefined in the host language [7]. With CHR, the issue of a valid runs_on_discx fact can be solved using logical variables as shown in List. 2, where the first two constraints ensure the existence of a corresponding data_discx fact for system and host. The third constraint ensures the validity of the runs_on_discx fact. One advantage of CHR is that no separate validation rules have to be defined; the constraints themselves handle the dependencies. However, this does not ensure the correct structure of the arguments.

Listing 2. CHR constraints as logical variables

system(X) ⇔ data(X).
host(X) ⇔ data(X).
runs_on(X, Y) ⇔ system(X), host(Y), data(X), data(Y).

3) Schema Validation: One way to manage the variance of the arguments is schema validation, known from XML Schema validation, which allows restricting the number and sequence of elements [11]. For that, all facts have to be represented as XML elements, where each fact consists of a predicate and an integer arity. The arguments are stored as a list, i.e. a values element. A valid fact has at least one value (argument) in the list in order for the XML document to be valid, which can be detected by the schema validation. However, the arity cannot be validated. For instance, a fact without values and an arity of 3 contradicts the definition of Datalog facts, but cannot be detected by schema validation. Although structural conformance and some content constraints within XML documents can be checked by schema validation, there is no possibility to define specific facts according to the model, since the schema only defines the abstract fact. For instance, it is not possible to define a constraint on a system_discx with exactly one name, an optional description and exactly one URI with exactly one related data_discx fact, if the model only specifies a predicate, an arity and a list of values (see List. 3).

Listing 3. Sample for the implicit system_discx model representation

<fact>
  <predicate>system_discx</predicate>
  <arity>3</arity>
  <values><val/><val/></values>
</fact>

III. RELATED WORK

Some of the related work in the area of structure validation using Datalog, CHR or schema validation has already been discussed in Section II. Additional approaches concentrate on knowledge

validation at a higher level. For example, [10] presents a summary of the recent trends in this field by defining validation as the assurance of "functional accuracy or correctness of the system's performance" [10]. According to that, the types of validity are content validity, i.e. correct representation of the problem domain, construct validity, i.e. the model represents expert behavior correctly, and empirical validity, i.e. proper mapping between the system output and the real world. [10] also states that verification ensures the structural correctness of a system by detecting errors in the logic of the knowledge base. Other authors give similar definitions. For example, [17] describes verification as structural correctness, while stating that evaluation is the ability to reach correct conclusions. Others state that validation is concerned with building the right system, i.e. it ensures that the system does what it is supposed to do [14], [24], [26]. Other authors imply that the knowledge in the knowledge base is injected by different experts and thus the rules have to be checked for consistency [10], [26], [24], [17], [14]. However, in this paper injecting and validating rules is not considered. This work deals with the conformance of the injected facts with the model that an expert has defined. As a consequence of the different definitions of validation, the methods for knowledge validation are mostly inappropriate for this work. Some suggested methods for knowledge validation include decision tables [16], graphs [26], constraints [2], [4] and even Petri nets [27]. A general approach seems to be testing [14], [15], [17], [24]. While [17] discusses completeness and consistency tests, [14] and [15] elaborate on the creation of the test sets. [24] explains validation by testing all possible input scenarios. Again, these approaches focus on a different problem domain.

IV. DESIGN PRINCIPLES

Since common conformance check approaches only cover parts of the requirements of current information management systems, we take NM as a representative example to define design principles for a comprehensive approach. To check data quality and conformance according to a (business) network data model, the quality requirements discussed in Section II for the structure and content have to be validated automatically, while allowing manual configuration of the validation programs. Therefore, validation programs shall reliably check the conformance, e.g. of complex business network Datalog programs, and shall be written in a simple, intuitive, application-domain-independent Domain Specific Language (DSL) adapted to the validation domain. The resulting programming model should be usable by different personas, e.g. developers, data model and quality experts. That means even non-technical personas shall be able to write validation programs without knowing how to program. The runtime system for validation programs shall allow for efficient processing, parallelization and shared memory computing. Default runtime artifacts shall be assigned to validation program operations. Custom runtime artifacts can be deployed, which allow for extensions, e.g. filter operators,

length checks, starts-with or ends-with checks for strings, and boundaries for numeric types. The validation programs shall be debuggable to detect errors in the validation code as in any other programming language. When the validation run is finished, the program shall terminate and report on the conformance of the data. Contradictions shall be detected and visualized in an understandable, structured and readable way.
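As an illustration of such custom runtime artifacts, the following hedged Java sketch shows one possible shape for deployable argument checks (length, prefix and numeric-range checks as named above). The ArgumentCheck interface and the factory names are assumptions, not part of the described system.

import java.util.List;
import java.util.function.Predicate;

// A possible shape for custom runtime artifacts; interface and names are assumptions.
public class CustomChecks {

    /** A deployable argument check: returns true if the argument value conforms. */
    interface ArgumentCheck extends Predicate<String> {}

    static ArgumentCheck maxLength(int bound) {
        return value -> value != null && value.length() <= bound;
    }

    static ArgumentCheck startsWith(String prefix) {
        return value -> value != null && value.startsWith(prefix);
    }

    static ArgumentCheck numericRange(long min, long max) {
        return value -> {
            try {
                long n = Long.parseLong(value);
                return n >= min && n <= max;
            } catch (NumberFormatException e) {
                return false; // QR1: syntax violation
            }
        };
    }

    public static void main(String[] args) {
        List<ArgumentCheck> nameChecks = List.of(maxLength(255), startsWith("sap-"));
        System.out.println(nameChecks.stream().allMatch(c -> c.test("sap-erp-01"))); // true
    }
}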

V. THE CONTENT-BASED VALIDATION APPROACH

Based on the considerations on conformance checks and design principles, a system and programming model for validation programs is defined subsequently. Fig. 2 schematically shows the compile and runtime system, which is based on Deterministic Finite Automata (DFA) as in [13]. For the validation of data represented as Datalog facts (Facts), a validation program defines the NM model for conformance (Definition file). Given this input, the validation program compiler (VPC) computes a dependency graph as semantic model, see Fig. 1, and converts it into an execution plan based on DFA. Then the runtime system (VRS) validates the facts according to the execution plan and terminates by reporting the validation result (Output (Log)).
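The core idea of DFA-based validation can be sketched in a few lines of Java. This is a minimal illustration only: the state names and the hard-coded two-argument fact are assumptions and do not represent the generated execution plan described in the following subsections.

import java.util.List;
import java.util.function.BiFunction;

// Minimal sketch of the DFA idea only; state names and the driver are assumptions.
public class FactAcceptorSketch {

    enum State { EXPECT_NAME, EXPECT_URI, ACCEPT, ERROR }

    // Transition function delta: (current state, next argument) -> next state.
    static final BiFunction<State, String, State> DELTA = (state, arg) -> switch (state) {
        case EXPECT_NAME -> (arg != null && !arg.isEmpty()) ? State.EXPECT_URI : State.ERROR;
        case EXPECT_URI  -> (arg != null && !arg.isEmpty()) ? State.ACCEPT     : State.ERROR;
        default          -> State.ERROR; // unexpected extra arguments
    };

    /** Runs the automaton over the fact's arguments and checks the accepting state. */
    static boolean accepts(List<String> arguments) {
        State state = State.EXPECT_NAME; // start state
        for (String arg : arguments) {
            state = DELTA.apply(state, arg);
        }
        return state == State.ACCEPT;    // accepting set F = {ACCEPT}
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("sys1", "systemUri"))); // true
        System.out.println(accepts(List.of("", "systemUri")));     // false: empty name
    }
}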

Fig. 2. Schematic view on compile and runtime system

A. Validation Programs


The validation programs describe the conformance model in an external DSL as input to the VPC. The language follows the grammar as in List. 4. For instance, facts, their predicates, arguments and the relationships between them are defined.
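As a rough illustration of what the VPC has to extract from such a program, the following Java sketch parses a single definition line, as in the sample file of List. 5 below, into a simple fact model. The record types and the regular expression are assumptions for illustration and not the actual compiler implementation.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: model classes and tokenization are assumptions based on List. 4 and List. 5.
public class DefinitionLineParser {

    record ArgumentModel(String name, List<String> dependsOn, int cardinality) {}
    record FactModel(String predicate, Integer keyIndex, List<ArgumentModel> arguments) {}

    // Matches e.g.: name({},1) or systemURI({data_discx},1)
    private static final Pattern ARG = Pattern.compile("(\\w+)\\(\\{([^}]*)\\},(\\d+)\\)");

    static FactModel parse(String line) {
        String head = line.substring(0, line.indexOf(':')).trim();       // e.g. system_discx[2]
        String body = line.substring(line.indexOf(':') + 1).trim();      // argument list
        String predicate = head.substring(0, head.indexOf('['));
        String key = head.substring(head.indexOf('[') + 1, head.indexOf(']')).trim();
        Integer keyIndex = key.isEmpty() ? null : Integer.valueOf(key);  // key index is optional

        List<ArgumentModel> argModels = new ArrayList<>();
        Matcher m = ARG.matcher(body);
        while (m.find()) {
            List<String> deps = m.group(2).isEmpty()
                    ? List.of()
                    : Arrays.asList(m.group(2).split(","));
            argModels.add(new ArgumentModel(m.group(1), deps, Integer.parseInt(m.group(3))));
        }
        return new FactModel(predicate, keyIndex, argModels);
    }

    public static void main(String[] args) {
        FactModel m = parse("system_discx[2]: name({},1), description({},0), systemURI({data_discx},1).");
        System.out.println(m.predicate() + " key=" + m.keyIndex() + " args=" + m.arguments().size());
        // prints: system_discx key=2 args=3
    }
}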



Listing 4. Basic validation program grammar

// definition of the schema
<fact predicate>[<key index>]:<argument list>.
// definition of the argument
<argument name>({<list of fact predicates that the argument depends on>},<cardinality>).

Listing 5. Sample Definition file




1 # level-0
2 data_discx[0]: uri({},1), data({},0), content_type({},0), origin({},0).
3 # level-1
4 system_discx[2]: name({},1), description({},0), systemURI({data_discx},1).
5 host_discx[3]: name({},1), description({},0), origin({},1), hostURI({data_discx},1).
6 # level-2
7 runs_on_discx[]: systemURI({system_discx,bpartner_discx},1), hostURI({host_discx},1), origin({},1).

The key index is the integer value that indicates the index of the argument in the argument list that uniquely identifies the fact, e.g. the URI. The key index is optional, as there might be facts that do not have a single key argument, e.g. the runs_on_discx fact, which has a combined identifier of systemURI and hostURI. The key index of a fact is written in brackets ([ ]). The arguments in the argument list are separated by a comma (,). If an argument depends on more than one fact, the predicates of the lower-level facts are separated by a comma (,). After the definition of one fact, there is a dot (.) that terminates the fact definition. Comments may be included using the number sign (#). A sample definition file is shown in List. 5. The first line is a comment, then a data_discx fact is defined. Its key argument is the uri; the other three arguments are optional. None of the arguments depend on other facts. Then the definition of system_discx specifies a mandatory name, as the cardinality is 1, while the description is optional. The last argument, i.e. systemURI, at index 2 of the argument list, is a required argument which depends on a valid and matching data_discx fact. The host_discx is specified similarly. The runs_on_discx definition in line 7 has no unique URI and specifies a semantic relation between system_discx or bpartner_discx and host_discx.

B. Compile Time System

The validation program compiler (VPC) analyzes the programs and creates a semantic model. From that, an execution plan based on DFA is compiled which is able to check the conformance of the facts. This approach follows the idea of event-based systems [12], which stay in a state until a specific event happens. Depending on the state and the event, they react with a specified action. Thus, the state is the "memory" of the system. The event causes the machine to react and may lead to a change of the machine's internal state. Since the validation of an argument decides on the conformance, or acceptance, of a fact, and a fact impacts the consistency of a set of facts, the concept of Finite Automata [13] can be applied. More precisely, fact validation has to be done by an Acceptor Automaton [12] as it either accepts or declines a fact. When applying the concepts of an acceptor DFA [13], [12], its states and transition functions to knowledge validation, the automaton used for fact validation (FVA) will be similar to a "normal" finite automaton. The automaton definition for validation programs consists of a five-tuple with a set of states (Q′), a set of input symbols (Σ′), a transition function (δ′), a start state (q′0) and a set of accepting states (F′):

FVA = (Q′, Σ′, δ′, q′0, F′),

while the definition is slightly different from the DFA that accepts regular languages [13]:

Q′: The set of states. The automaton is in one of these states before the validation of an argument. Afterwards, it can be in the same state as before or in another state out of Q′.

Σ′: The set of input symbols is the set of arguments of a fact, which can be either valid or invalid. Therefore, Σ′ can be interpreted as one fact with the restriction that its arguments are in an ordered set or, more precisely, a list.

δ′: The transition function represents the validation of one argument, i.e. it determines whether the input argument is valid or not.

q′0: For the fact validation automaton, the start state is the state that the automaton is in before it starts to validate the first argument.

F′: The set of accepting states includes only one state, which is the state the automaton goes to after successfully validating all arguments.

Therefore, the success of the validation can be determined by investigating whether the automaton is in its accepting state after validating the fact. Let qn be the state that the automaton is in after the fact validation, then qn ∈ F′ → fact is valid and qn ∉ F′ → fact is not valid. When applying this definition to the conformance problem, the compiler generates default or custom states from the semantic model. Thereby, the semantic model gives information on how to generate the FVA. For instance, an optional argument in the semantic model translates into a state which will be skipped by the transition function. Some of the default states, which are translated to the FVA execution plan similar to DB operators, are described subsequently:

Argument Validation: In the transition function of this state, an argument is validated in terms of checking whether it is specified. If the argument contains an empty string, the transition function returns the state that the automaton is currently in. Otherwise it returns the next state.

Skip Argument Validation: This transition function simply realizes that there is an argument but does not validate it. It does not matter whether the argument is an empty string or not. Therefore, this transition function always returns the next state.

Validation of a Specific Argument of a Particular Fact: In this transition function, a fact- and argument-specific validation is implemented. For example, there is one transition function that only validates the name of system facts, which is defined to be a non-empty string that is less than 255 characters long.

Error State: This state represents any kind of unexpected input. For example, when the fact model for the given fact cannot be found, the start state of the automaton will be this state. If a fact has more arguments than the corresponding model, this state is the last state of the automaton. In any case, once the automaton is in this state, the fact cannot become valid any more. Consequently, the transition function of this state returns the same state on any input.

Fact Validation: This transition function is the most advanced one as it has to validate all facts that the actual fact depends on.

During the definition of the fact and argument models, non-default validation operations might be required. For instance, the system name should be less than 255 characters long. Therefore, the system can be extended by a new transition function, i.e. a new state, by following a special naming schema: Validate<fact predicate><argument name>,

where the name starts with the literal "Validate" followed by the fact predicate and the name of the argument that this state is for. For example, the state that validates system names would be ValidateSystem_discxName. The predicate and the argument name are the ones that were defined in the definition file. As a remark on the implementation, the transition function and its states are implemented in the Java programming language. Hence, all constructs of the context-free language could be used within the programs. Since the transition function is an extension to the FVA, which natively accepts regular grammar languages, the constructs used in transition functions are limited to variable declarations, conditions and finite loops. All other constructs are considered unsafe. For instance, for infinite loops or RPCs, the termination of the programs or the correct detection of conformance contradictions cannot be guaranteed. In general, the transition function of the automaton is defined as a complex function which allows (a) to check whether an argument is a non-empty string, (b) to skip the validation of an optional argument, (c) to include custom fact- and argument-specific validation, (d) to validate the URI of a fact, and (e) in all cases, to return the next state of the validation automaton if the validation was successful or an error state if any of the validations mentioned above fail. While the transitions (a) to (c) are simple tests on the content of the argument, transition (d) is more complex due to the reference check.

C. Runtime System

The runtime system manages the actual validation of a fact or a set of facts, while executing the (automaton) plans generated by the compile time system. The actual process of validation consists of several phases such as preparation, applying a validation strategy (not explained in this paper), applying a distribution or shared memory schema, pre-processing, validation, and post-processing. The preparation phase mainly consists of compiling the validation programs, generating the semantic model as well as the execution plan from the VPC. Then the runtime system is prepared by creating a context in which the facts will be validated. In general, a fact set includes a number of facts that may or may not have the same predicate. If the facts do not have the same predicate, they may depend on each other. The set of facts that will be validated later on is a subset of the context or the context itself: ValidationFactSet ⊆ Context. For example, let the context C be a fact set that consists of a runs_on_discx fact R with a system_discx and a host_discx fact (S and H) that each have a corresponding data_discx fact (D_S and D_H):

C = {R, S, H, D_S, D_H}

A fact set S1 that is a subset of this context may be

S1 = {S}

If the arguments of all facts are valid, the fact set S1 will be valid as well, as for the system_discx fact there is a corresponding data_discx fact in the context. Now, let another subset S2 be

S2 = {R, E}

where E is an example fact. The fact set S2 will not be valid, even though the arguments of all facts are valid, because there is no data_discx fact D_E that matches the URI referenced by the example fact E. Now, let the fact set S3 be

S3 = {R, S, H, D_S, D_H} = C

The fact set S3 will be valid as the corresponding lower-level facts are in the fact set itself. Therefore, the automaton is not only able to validate a set of facts in the context, it is also able to validate the whole context. When the runtime system is prepared, it applies a validation strategy and a distribution or shared memory schema derived from the semantic model. The distribution schema determines connected components within the semantic model and deploys programs and data to an arbitrary number of nodes. The shared memory schema facilitates the storage of all validated facts. When any of the already validated, lower-level facts is supposed to be validated again, the automaton can look up the shared memory and skip this validation. When disjunctive automata store their valid facts within the shared memory, the complete validation is processed efficiently. All invalid facts are stored in the local memory of the automaton for later revalidation. Before the real validation can be started, all conditions of the system are checked, e.g. shared memory, arity and argument consistency checks, and the determination of the start state. Then the validation starts by calling the transition function of the current state for each argument, which returns a state depending on the validation result. If the returned state is the current state of the automaton, the validation of the argument was not successful. Then the processing is stopped with one invalid argument, hence the fact cannot become valid any more. Otherwise, the current state is set to the state returned by the processed state. When a fact is validated, the post-processing handles success or failure cases and logs them appropriately.

VI. THE RUNTIME SYSTEM DURING PROGRAM EXECUTION

To demonstrate the defined compile- and runtime system, a sample validation program is used to debug the information represented as Datalog facts according to the NM model. Therefore, a user is assumed who creates a semantic relation between a system and a host as a runs_on_discx fact. In addition, that requires a system_discx with a name, description and a URI. In this case, there are no custom validation states deployed to the system, which would, e.g., allow enhancing the validation with user-defined checks like QR1, QR2 or QR5, QR6. In the example shown in List. 6, the error comes from an invalid URI of the data_discx fact, which belongs to the system_discx but whose URI, invalidUri, does not match (relates to QR8). For that, the definition from List. 5 is taken as validation program.
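Such a custom validation state, had one been deployed, might look like the following Java sketch. Only the naming schema Validate<fact predicate><argument name> and the 255-character bound for system names are taken from Section V; the ValidationState interface and its signature are assumptions for illustration.

// Hedged sketch of a deployable custom state; the ValidationState interface is an assumption,
// only the naming schema and the 255-character bound come from the paper.
public class ValidateSystem_discxName {

    /** Assumed contract: validate one argument and return the follow-up state index,
     *  or -1 to signal that the automaton must move to the error state. */
    interface ValidationState {
        int validate(String argumentValue, int currentState);
    }

    static final ValidationState INSTANCE = (value, current) -> {
        boolean ok = value != null && !value.isEmpty() && value.length() <= 255;
        return ok ? current + 1 : -1; // next state on success, error state otherwise (QR3/QR5)
    };

    public static void main(String[] args) {
        System.out.println(INSTANCE.validate("CRM Production", 0));  // 1: argument accepted
        System.out.println(INSTANCE.validate("", 0));                // -1: empty name rejected
    }
}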

Listing 6. Invalid reference

runs_on_discx(systemUri, hostUri).
system_discx(sys1, desc, systemUri).
data_discx(invalidUri, content, content-type, origin).

The runtime system detects the inconsistency during execution. The automaton starts with the validation of the runs_on_discx fact and then checks the validity of the system_discx fact. The name is validated, the description is skipped (because it is optional), and then the uri fails in the ValidateFact state due to the missing data_discx fact: the present data_discx fact is valid, but has an invalid uri (relates to QR8). This can be seen in the textual output

system_discx: Could not find a valid data_discx with the same key (systemURI: systemUri).

or in the graphical output in dot-notation [9]. When the invalidUri in the data_discx fact is corrected, the validation process is started again and fails again, as shown in Fig. 3. This time the automaton does not find a valid host_discx fact required for the runs_on_discx fact (relates to QR4). Guided by the textual and graphical support, this issue can be fixed (see List. 7) by creating a valid host_discx and linking it to the runs_on_discx fact. The textual, the graphical or both output variants can be chosen. In case of small input files, the graphical representation should be used; there, the textual output can be used complementarily as a more detailed description of the error. For larger input files, the dot-graph [9] can only help to locate the issue, but the text might be more helpful.

Fig. 3. Automata in error state due to missing host-fact

Listing 7. Correct fact input

runs_on_discx(systemUri, hostUri).
system_discx(sys1, desc, systemUri).
data_discx(systemUri, , , ).
host_discx(host1, desc, hostUri).
data_discx(hostUri, , , ).

VII. DISCUSSION AND FUTURE WORK

In this paper, we introduced a novel approach for conformance checks on structured data given an implicit data model. For that, we defined a programming model including validation programs as well as a compile- and runtime system based on automata theory, which is able to execute these programs. We evaluated our approach on a linked data domain, i.e. NM as part of BNM. On sample data and a data model from NM, we showed the correctness and reliability of our system. Through debugging, we illustrated how non-conforming data, identifier uniqueness and model structure are checked. We hinted at how efficient, distributed validation plans are executed. The application of our system to linked data or network domains is quite promising. Although tool support for debugging, distribution and shared memory schemas has to be


improved, the application to real business network data shows good results. Especially when knowledge is added manually to the system, inconsistencies were found precisely and quickly. Additionally, the system turned out to be helpful in detecting incompatible model changes or interoperability issues between the NM discovery clients and the BNM server. Through its model-driven, DSL-based approach, our experience with the maintenance of the model and entity lifecycle management is positive. For instance, model extensions or changes can be maintained in the validation program and are automatically adapted by the compiler and the runtime system. Future work will be conducted in the areas of the language, further debugging and tool support, and support for distributed or shared memory programming. In our approach, the grammar of the languages accepted by the automata is regular. According to [13], the automata could also be described by regular expressions. Consequently, it may be possible to describe the format of conforming data models using regular expressions.

REFERENCES

[1] Berners-Lee, T., Fielding, R. & Masinter, L.: Uniform resource identifier (URI): Generic syntax. http://labs.apache.org/webarch/uri/rfc/rfc3986.html, 2005.
[2] Berstel, B. & Leconte, M.: Using constraints to verify properties of rule programs. Software Testing, Verification, and Validation Workshops (ICSTW), pp. 349–354, 2010.
[3] About the Eclipse Foundation. http://www.eclipse.org/org/, 2011.
[4] Elfaki, A., Muthaiyah, S., Magboul, I., Phon-Amnuaisuk, S. & Ho, C. K.: Defining variability in DSS: An intelligent method for knowledge representation and validation. System Sciences (HICSS), pp. 1–9, 2010.
[5] Frühwirth, T.: Constraint handling rules, in A. Podelski (ed.), Constraint Programming: Basics and Trends, Vol. 910 of Lecture Notes in Computer Science, Springer Berlin / Heidelberg, pp. 90–107, 1995.
[6] Frühwirth, T.: Theory and practice of constraint handling rules, The Journal of Logic Programming 37(1-3): 95–138, 1998.
[7] Frühwirth, T.: Constraint Handling Rules, Cambridge University Press, chapter 1, pp. 3–10, 2009.
[8] Fürber, C., Hepp, M.: Towards a Vocabulary for Data Quality Management in Semantic Web Architectures. LWDM, Uppsala, 2011.
[9] Gansner, E., Koutsofios, E. & North, S.: Drawing graphs with dot. 2006.
[10] Gupta, U. G.: Validation and verification of knowledge-based systems: A survey, Applied Intelligence 3: 343–363, 1993.
[11] Harold, E. R. & Means, W. S.: XML in a Nutshell, 3rd edn, O'Reilly, 2004.

[12] Hoffmann, D. W.: Theoretische Informatik, Carl Hanser Verlag, Munich, Germany, 2009.
[13] Hopcroft, J. E., Motwani, R. & Ullman, J. D.: Introduction to Automata Theory, Languages, and Computation, 2nd edn, Addison-Wesley Longman, Amsterdam, Netherlands, 2001.
[14] Knauf, R., Gonzalez, A. & Abel, T.: A framework for validation of rule-based systems, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 32(3): 281–295, 2002.
[15] Knauf, R., Tsuruta, S. & Gonzalez, A. J.: Toward reducing human involvement in validation of knowledge-based systems. Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on 37(1): 120–131, 2007.
[16] Merlevede, P. & Vanthienen, J.: A structured approach to formalization and validation of knowledge. Developing and Managing Expert System Programs, Proceedings of the IEEE/ACM, pp. 149–158, 1991.
[17] Owoc, M. L., Ochmanska, M. & Gladysz, T.: On principles of knowledge validation, Collected papers from the 5th European Symposium on Validation and Verification of Knowledge Based Systems - Theory, Tools and Practice, EUROVAV '99, Kluwer, B.V., Deventer, Netherlands, pp. 25–35, 1999.
[18] Ritter, D., Bhatt, A.: Modeling Approach for Business Networks with an Integration and Business Perspective. ER 2011 Workshops, Brüssel, 2011.
[19] Ritter, D., Ackermann, J., Bhatt, A., Hoffmann, F. O.: Building a Business Graph System and Network Integration Model based on BPMN. In: 3rd International Workshop on BPMN, Luzern, 2011.
[20] Ritter, D.: From Network Mining to Large Scale Business Networks. International Workshop on Large Scale Network Analysis (LSNA), WWW Companion, Lyon, 2012.
[21] Ritter, D., Westmann, T.: Reconstructing Linked Business Networks from Network Mining Data using Datalog. Datalog 2.0, Vienna, 2012.
[22] Ritter, D.: Towards Business Network Management. Confenis: 6th International Conference on Research and Practical Issues of Enterprise Information Systems, Ghent, 2012.
[23] Resource Description Framework (RDF). http://www.w3.org/RDF/, 2012.
[24] Rosenwald, G. & Liu, C.-C.: Rule-based system validation through automatic identification of equivalence classes. Knowledge and Data Engineering, IEEE Transactions on 9(1): 24–31.
[25] Ullman, J. D.: Principles of Database and Knowledge-base Systems, Computer Science Press, Rockville, MD, USA, 1988.
[26] Wu, C.-H., Lee, S.-J. & Chou, H.-S.: Dependency analysis for knowledge validation in rule-based expert systems, Artificial Intelligence for Applications, Proceedings of the Tenth Conference on, pp. 327–333, 1994.
[27] Wu, C.-H. & Lee, S.-J.: Knowledge validation with an enhanced high-level Petri net model, Artificial Intelligence for Applications, Proceedings, 11th Conference on, pp. 126–132, 1995.
