Enterprise integration by market-driven schema evolution
Manfred A. Jeusfeld¹, Matthias Jarke
Aachen University of Technology, Computer Science Department
Ahornstr. 55, 52056 Aachen, Germany
Email: [email protected] · Fax: +49-241-8888-321 · Tel: +49-241-8021510
Short title: Enterprise integration by market-driven schema evolution
Submitted to: International Journal on Concurrent Engineering Research and Applications (CERA), Special Issue on Enterprise Modelling Languages
¹ Part of the work was done while the first author was with the Hong Kong University of Science & Technology.
Abstract
The way enterprises organize their processes is ultimately dictated by the requirements of their markets. Enterprises serving slowly changing markets prefer a functional decomposition of their work force into departments, with information flows along the hierarchy. Quickly changing markets require quick redefinition of implemented functions and information flows in the distributed information systems of the enterprise. This paper proposes conceptual modeling facilities and a trader architecture which enable a bottom-up, market-driven evolution of the information system schemes. A case study has been undertaken in the area of distributed quality management. The trader has been implemented with the meta database manager ConceptBase; the information systems realize the information flows via federated SQL servers.
Keywords: schema evolution, meta modeling, federated database, cooperative design, query language, quality management
1 INTRODUCTION
As enterprises split up into smaller units, and as these smaller units start cooperating with each other even across company boundaries, the amount of information passed between independent units significantly increases. The lack of central management is accompanied by diverging schema definitions for the local information systems. Any local change potentially threatens the consistency of all applications, and local changes are frequent because of market changes. The traditional software life cycle prescribes a maintenance cycle performed by a software development team to implement a change. This includes updated analysis, design, coding, and testing.
This paper proposes a method for incremental change of the information flow in a distributed, enterprise-wide information system. The ultimate goal is to attach an ‘evolution function’ to the implemented system with the following properties:
A) Evolution may occur at any time in the life cycle and at any place in the distributed information system. Changes are proposed by the so-called method engineers of the participating information systems. Although there is no central arbiter who controls the changes, there should be support for adapting to local changes. We propose a computerized information trader which accepts changes described in terms of a meta database forming an abstract information market.

B) The method engineers should be able to define their own evolution policy, i.e., the set of rules governing an evolution phase. The rules depend on the development status of the distributed information system as well as on the organization of the enterprise. Rule definition must be possible in an ad hoc fashion, especially to react to local evolution steps. We propose an extensible set of market rules, maintained by the information trader, to be configured by the method engineers.

C) There should be a standardized way to exchange information objects across system boundaries. There are many proposals for data exchange formats for distributed systems. We will use SQL because it covers both data definition and data manipulation and is already accepted as an industry standard.
In Section 2 we present the logic-based meta modeling language Telos, used throughout this paper for the formal representation of objects, tasks, methods, and agents. Section 3 presents Telos-based requirements and implementation languages for schema evolution. Then, Section 4 shows how queries and constraints can assist the evolution task. Finally, Section 5 reports the application of the approach in the area of distributed quality information systems and presents an architecture with the information trader as a part system.
2 META MODELING FEATURES
We use a derivative of the knowledge representation language Telos [MBJK90] to design the schema of the information trader's meta database, especially the rules and integrity constraints that support the distributed evolution of the federated database schemes. This derivative is optimized for integrating meta-level information stemming from heterogeneous design and implementation languages. While the base predicates, rules, and integrity constraints are presented in more detail in [JGJ*95], the category normal form and the categorized queries are new.
Base predicates. O-Telos is a predicative language built on three base predicates:

(x in c)    expresses that the item x is an instance of the item c,
(c isA d)   declares c as a subclass of d, and
(x m/n y)   states an attribute relation named n with category m between x and y.
The arguments x, y, c, d are names of objects or classes. All three predicates are written in infix notation; the first two are binary predicates, the third is a 4-ary predicate. For compactness, we use a frame notation to represent facts in these base predicates:

Class c isA d end

Object x in c with
  m
    n: y
end
Rules and integrity constraints. O-Telos uses deductive rules and integrity constraints to define built-in axioms like class membership inheritance and well-typedness. The complete list of axioms is described in [JGJ*95]. For example, class membership inheritance is encoded by the deductive rule
∀ x,c,d (x in c) ∧ (c isA d) ⇒ (x in d)

Note that this rule quantifies over classes. Besides the three base predicates we allow equality/inequality predicates. We use the short form (x m y) instead of (x m/n y), defined by the deductive rule
∀ x,m,n,y (x m/n y) ⇒ (x m y)
whenever the name of the attribute is irrelevant. Application-specific rules and integrity constraints are formally not distinguished from the built-in axioms. They just have more constants referring to application classes like ’Employee’ etc.
The Category Normal Form. This paper utilizes an abstract meta model to describe both the model (language) of requirements and the model of the implementation. To empower this meta model, it will be augmented by integrity constraints, rules, and queries that refer to concepts which are instances of instances of the meta model. The technical trick is to associate the formulas with attribute categories of the meta model (explained in subsequent sections). The trick works for formulas which have the so-called category normal form
∀ c,n,d,x (c m/n d) ⇒ ((x in c) ⇒ φ_m(x,c,n,d))

where φ_m is some formula with free variables x, c, n, d and 'm' is the name of some attribute category. One should read such constraints as follows: if class 'c' defines an attribute 'n' under category 'm' then each instance of 'c' must fulfill the condition φ_m. Category normal forms can be partially evaluated for ground instances of the 'header' predicate (c m/n d). Let's assume that (c1 m/n1 d1) is such a ground instance. Then, the partially evaluated form
∀ x (x in c1) ⇒ φ_m(x,c1,n1,d1)
is just a constraint on the instances of class 'c1': if 'x' is an instance of 'c1' then some condition must hold. As an example, consider the following frame defining the class 'Employee':
Class Employee isA Person with
  mandatory, single
    empid: Integer
  single
    proj: Project
end
This frame is a shorthand for the ground facts (Employee in Class), (Employee isA Person), (Employee mandatory/empid Integer), (Employee single/empid Integer), (Employee single/proj Project). The last three facts associate the attributes of Employee with the constraints defined for the categories 'mandatory' and 'single'. By partially evaluating the header predicates of their logical definition we obtain:

∀ x (x in Employee) ⇒ φ_mandatory(x,Employee,empid,Integer)
∀ x (x in Employee) ⇒ φ_single(x,Employee,empid,Integer)
∀ x (x in Employee) ⇒ φ_single(x,Employee,proj,Project)
A similar idea was already proposed for an early implementation of Telos [GS87] where it was restricted to a single class level.
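The partial-evaluation idea can be made concrete with a minimal Python sketch. It represents the meta database as plain tuples and specializes the 'mandatory' category for each ground header fact; the tuple representation, the function name, and the instance names ('bill', 'mary') are our own illustrative assumptions, not ConceptBase's actual interface.

```python
# Meta database as ground facts; representation is illustrative only.
attrs = {                       # (c, category, n, d) stands for (c category/n d)
    ("Employee", "mandatory", "empid", "Integer"),
    ("Employee", "single", "empid", "Integer"),
    ("Employee", "single", "proj", "Project"),
}
instances = {("bill", "Employee"), ("mary", "Employee")}   # (x, c) for (x in c)
values = {("bill", "empid", 7)}                            # (x, n, y) for (x n y)

def mandatory_violations(attrs, instances, values):
    """Partially evaluated 'mandatory' constraints: for each ground header
    (c mandatory/n d), every instance x of c needs at least one filler for n."""
    return sorted(
        (x, c, n)
        for (c, cat, n, d) in attrs if cat == "mandatory"
        for (x, cx) in instances if cx == c
        if not any(x2 == x and n2 == n for (x2, n2, _y) in values)
    )

print(mandatory_violations(attrs, instances, values))
# 'mary' has no 'empid' filler -> [('mary', 'Employee', 'empid')]
```

Inserting or removing a ground header fact in `attrs` immediately adds or drops the corresponding specialized constraint, mirroring the behavior described above.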
Categorized queries. We will use so-called categorized queries (extended from [SNJ94]) as shorthand for deductive rules. A query in frame format looks as follows:

Query q isA c with
  m1
    n1: c1
  m2
    n2: c2
  ...
end
Except for the keyword 'Query' there is no syntactic difference to ordinary classes. However, objects are not inserted into queries but derived. Intuitively, the query class 'q' contains all objects of class 'c' whose attributes 'n1', 'n2' etc. fulfill the constraints defined for the categories 'm1', 'm2' etc., respectively. Recall the categorized constraints defined for a class: there, membership in the class (x in c) implies that a certain condition is true. For queries, we invert the implication: if an object 'x' satisfies all conditions associated with 'q' then (x in q) holds. More formally:
∀ x (x in c) ∧ φ_m1(x,q,n1,c1) ∧ φ_m2(x,q,n2,c2) ∧ ... ⇒ (x in q)
The parameters m1, n1, c1 etc. are taken from the definition of 'q'. At the logical level, queries are deductive rules. However, the condition is composed by specializing the pre-fabricated constraints associated with its categories m1, m2 etc. Hence, a user just has to pick the names of the proper categories to formulate such queries. This mechanism is the key to providing the method engineers with a tool to define their information market rules (see Section 4). Sometimes it is useful to access the relative complement of a query, denoted by 'q-' where 'q' is the name of the original query. The relative complement is defined by the rule
∀ x (x in c) ∧ ¬(x in q) ⇒ (x in q-)

where 'c' is the superclass of 'q'. ConceptBase [JGJ*95] provides an implementation of the above logical structures: the base predicates and the logical formulas are stored in a persistent object manager. Meta-level formulas (e.g., those matching the category normal form) are partially evaluated against the current extension of the base predicates to yield efficient representations.
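A categorized query and its relative complement can be mimicked in a few lines of Python. The fact representation, the function names, and the example sinks ('s1', 's2') are illustrative assumptions of ours; the 'mandatory match' condition anticipates the market queries of Section 4.

```python
instances = {("s1", "Info-Sink"), ("s2", "Info-Sink")}   # (x in c)
values = {("s1", "match", "src1")}                       # (x n y)

def query_answer(superclass, condition, instances, values):
    """Instances of the superclass for which all category conditions hold
    (the inverted implication: conditions => (x in q))."""
    return {x for (x, c) in instances if c == superclass and condition(x, values)}

def complement(answer, superclass, instances):
    """Relative complement q-: instances of c that are not in q."""
    return {x for (x, c) in instances if c == superclass} - answer

# a 'mandatory match' condition, as in a query over Info-Sink:
has_match = lambda x, vals: any(x2 == x and n == "match" for (x2, n, _y) in vals)

resolved = query_answer("Info-Sink", has_match, instances, values)
print(sorted(resolved))                                      # ['s1']
print(sorted(complement(resolved, "Info-Sink", instances)))  # ['s2']
```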
3 ABSTRACTIONS OF FEDERATED DATABASES
These technical preliminaries allow us to define pre-fabricated properties of data models used below without explicitly assigning them to the data model. Instead, the designers can activate them on demand depending on the phase of the development process. For example, when the requirements on the information exchange are collected, it makes no sense to demand that each request for information is assigned to a resource. However, when a request is to be implemented by an (SQL) query, then we may well require that the assignment is known. In the following, we motivate a four-layered classification [ISO90] for federated databases.
A database schema can be regarded as a set of (logical) statements about admissible databases. Analogously, a data modeling language is a set of statements about admissible database schemes. Practically all data modeling languages use graphical representations. They differ in the set of node/link categories and the set of statements associated with them. We exploit this observation by setting up an abstraction hierarchy which considers the requirements on information exchange and
the federated DB schema as descendants of the same so-called language model (Figure A). The language model provides the syntactic and semantic means for representing dependencies between both layers. These dependencies are used to map updates to the requirements on information exchange into an evolution of the federated DB schema. Interschema knowledge is encoded in queries within the federated DB schema and in matches from information requests to resources in the requirements specification. Thereby, the effect of evolving requirements can be estimated.
Figure A: Levels of abstraction and their interconnection.

The evolution method rests on four abstraction levels for information, as known from repository systems [JJR90,ISO90]. The most abstract one is the language model (Section 3.1), used to define both the implementation language (Section 3.2) of SQL databases and the requirements language (Section 3.3). The third abstraction level holds the federated DB schema and the requirements on information exchange. Evolution is specified as updates to the requirements and implemented on the federated database schema (Section 4). The other nodes and links are included to complete the abstraction hierarchy.
3.1 The Language Model

Evolution of a federated database schema should be regarded as implementing the result of a negotiation between experts [AB89]. The degree of support depends on the formalization of such negotiation, i.e., a language. The model shown in Figure B is just a graphical representation of the propositions (x in Object), (x references/references y), (t in Tool), (rq in Request), (rs in Resource), (t references/issues rq), (t references/offers rs), and (rq references/accesses rs).

Figure B: Language model (meta model) for offering resources and issuing requests.
The language model is augmented in Section 4 by categories for constraining the evolution of requirements and their implementation. The next step is to apply the language model to the setting
of federated databases used as implementation platform in Section 5. Formally, the language model is class level for the models in the subsequent subsections.
3.2 The Implementation Language SQL

Figure C is the formal representation of how SQL is used as a medium for information exchange: the SQL server requests information by SQL queries; it uniformly offers information as relations. SQL queries join 'from' relations and express membership conditions in a 'where' clause. SQL queries may have parameters which are filled in at query time; such queries are typically found in application programs. Relations have attributes, some of them being primary keys. Relations have a (local) name within an SQL server; names can be made absolute by pre-pending the database name of the SQL server.
Figure C: Abstract model of SQL servers as implementation platforms.
The model is a formal instance of the class definitions in the language model. Below is an excerpt of the predicative representation. The 'references' attribute category is the most general one in O-Telos and is used for marginal attributes not directly classified to the language model.

SQL-Query in Tool with
  references
    parameter: SQL-Domain
end

SQL-Server in Tool with
  issues
    byQuery: SQL-Query
end

SQL-Relation in Resource,Object with
  references
    attr: Rel-Attribute;
    primaryKey: Attr-Set
  ...
end

Attr-Set in Object with
  references
    consistsOf: Rel-Attribute
end
...
Semantic integrity of federated DB schemes can be expressed by first-order axioms. For example, the following axiom ensures uniqueness of relation names within the same SQL server. Note that the objects of the implementation model are now used as classes. Thus, the variable ’r’ stands for an instance of an instance of the class ’Resource’.
Compare node layouts for identifying the corresponding classes.
∀ r,s,n,srv (r in SQL-Relation) ∧ (s in SQL-Relation) ∧ (n in String) ∧ (srv in SQL-Server) ∧ (r name n) ∧ (s name n) ∧ (srv schema r) ∧ (srv schema s) ⇒ (r = s)
The abstract model of an SQL server is the framework for describing an implementation of a federated database. The implementation is indeed completely represented as instances of the classes in Figure C. An example is given in the Section 5. The model in Figure C accounts for the target implementation chosen for our quality management application. It is however general in the sense that it covers most of the SQL code necessary to describe the schema of a federated relational database and the queries executed on it.
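The uniqueness axiom can be evaluated directly on the stored facts. The following Python sketch (our own tuple-based representation, not ConceptBase syntax; server and relation names are invented) reports pairs of distinct relations that share a name within the same SQL server:

```python
schema = {("srvA", "R"), ("srvA", "S"), ("srvB", "T")}   # (srv schema r)
name = {"R": "emp", "S": "emp", "T": "emp"}              # (r name n)

def name_clashes(schema, name):
    """Violations of the uniqueness axiom: two distinct relations carrying
    the same name within the same SQL server."""
    return {
        tuple(sorted((r, s)))
        for (srv, r) in schema
        for (srv2, s) in schema
        if srv == srv2 and r != s and name[r] == name[s]
    }

print(name_clashes(schema, name))
# {('R', 'S')}: both are named 'emp' on srvA; 'T' on srvB does not clash
```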
3.3 The Requirements Language and its Implementation
The counterpart of the implementation language is the requirements language for federated databases - again an instance of the language model. The tool ’Participant’ is used to classify the part systems of the distributed environment, e.g. the tools presented in Section 5. They can appear in the role of information producers and consumers. The central task of the requirements language is to specify the match between the supply of information and the demand: the abstract information market. Information sources can be decomposed into smaller parts. Sometimes a consumer will be interested in parts rather than the full information. The three kinds of implementation links relate the requirements language to the implementation language: information sinks can be implemented by SQL queries, information sources are implemented as SQL relations. Figure D: Requirements language for information exchange.
It is named ’attribute’ in the original paper. We choose the new name in order to avoid ambiguities.
The predicative representation in O-Telos (see below) shows that the 'match' link is classified into the same category as 'from'. This indicates an analogy that can be exploited in partial code generation (Section 5).

Participant in Tool with
  references
    assignServer: SQL-Server
end

Producer in Tool isA Participant with
  offers
    supply: Info-Source
end

Consumer in Tool isA Participant with
  issues
    demand: Info-Sink
end

Info-Sink in Request with
  accesses
    match: Info-Source
  references
    implementQuery: SQL-Query
end

Info-Source in Resource with
  references
    implementRel: SQL-Relation;
    linkInfo: Info-Source
end
The 'linkInfo' reference between information sources is included in the model to state arbitrary references between information sources. Special cases of 'linkInfo' like 'part-of' and 'subclass' can easily be included when the requirements language is extended to a conceptual database design language. Note that the concepts of the requirements language (e.g., Info-Source) are explicitly related to concepts of the implementation language (e.g., SQL-Relation). This feature of the meta modeling approach is used in the next section to define evolution at the requirements level and derive the necessary updates at the implementation level via queries.
4 EVOLUTION BY UPDATES TO REQUIREMENTS
The models presented in the previous section constitute a set of ground facts of a meta database. The requirements specification is a set of facts of this meta database, more specifically, a collection of instances of the requirements language presented in Section 3.3. Thus, evolution of the requirements is an update to the collection of these instances. The development of a new federated database system is just a special case of evolution, starting from an empty requirements specification.
We propose to regard the meta database as part of the distributed system (see Figure G). Then, updates to the federated schemes can be executed at run-time. ConceptBase [JGJ*95], a deductive object manager implementing O-Telos, provides services for consistently updating and querying a meta database.
4.1 Policies for Evolution

The requirements specification and the federated DB schema are both stored as objects in a meta database. Since evolution is represented by updates to the meta database, integrity constraints are a suitable choice for specifying the legal updates. Additional tests can be expressed by queries. All such 'market rules' make up a policy for evolution. Since the evolution concerns the group of system designers, we argue that the group itself should be able to define and adapt the policy. This is feasible because of the rather coarse level of granularity in the requirements specification. Support can be provided by pre-fabricated 'patterns' of integrity constraints, rules, and queries, formulated in terms of the language model. The first one is the 'mandatory' category: it classifies attributes of objects with at least one filler.
(Object references/mandatory Object)
∀ c,m,d,x (c mandatory/m d) ⇒ ((x in c) ⇒ ∃ y (y in d) ∧ (x m y))

The ground fact declares the 'mandatory' category; the quantified formula defines it. Note that this formula does not contain any constant referring to an object in the requirements or implementation language: it quantifies over those objects with the variables 'c,m,d'. The variables 'x,y' quantify over instances of 'c' and 'd', i.e., over objects at the third level of the abstraction hierarchy presented in Figure A. Thereby, it can be used in both the requirements and the implementation branch of the meta database. Assume the group of system designers wants the 'demand' and 'supply' attributes of the requirements language to be mandatory. This is done by inserting just two facts

(Consumer mandatory/demand Info-Sink)
(Producer mandatory/supply Info-Source)
into the meta database. This renders two specialized versions of the constraint:
∀ x (x in Consumer) ⇒ ∃ y (y in Info-Sink) ∧ (x demand y)
∀ x (x in Producer) ⇒ ∃ y (y in Info-Source) ∧ (x supply y)
ConceptBase includes an incremental formula compiler [JJ91] which automatically generates these specialized versions. If a fact like '(Consumer mandatory/demand Info-Sink)' is removed from the meta database, then the corresponding constraint is also removed. Thereby, a change of policy is technically very easy.
A second pre-fabricated formula is used for querying unresolved requests for resources, e.g. an information sink with no matching source. Such situations may well occur since the requests usually refer to resources from distant tools. The following query 'ResolvedSinks' serves this purpose; unresolved sinks can be computed by the relative complement query 'ResolvedSinks-'.

Query ResolvedSinks isA Info-Sink with
  mandatory
    match: Info-Source
end
If the query 'ResolvedSinks-' has instances then the information market is incomplete: the designers who defined these instances have to search for appropriate information sources matching their demand. This feature is very useful to focus the designers on their task of defining information flows between their distributed systems. The complete definition of algebraic properties for relationships is shown below. Note that all these categories, including their predicative definitions, are part of the meta database. The categories 'transitive' and 'symmetric' define patterns for deductive rules, the others are integrity constraints. Note that these categories only come into effect when certain attributes of the requirements or implementation language are assigned to them.

Class Object with
  references
    mandatory: Object;
    inv-mandatory: Object;
    transitive: Object;
    reflexive: Object;
    anti-reflexive: Object;
    single: Object;
    inv-single: Object;
    symmetric: Object
end
∀ c,m,d,x (c mandatory/m d) ⇒ ((x in c) ⇒ ∃ y (y in d) ∧ (x m y))
∀ c,m,d,x (c inv-mandatory/m d) ⇒ ((x in d) ⇒ ∃ y (y in c) ∧ (y m x))
∀ c,m,d,x (c transitive/m d) ⇒ ((x in c) ⇒ ∀ y,z (x m z) ∧ (z m y) ⇒ (x m y))
∀ c,m,d,x (c reflexive/m d) ⇒ ((x in c) ⇒ (x m x))
∀ c,m,d,x (c anti-reflexive/m d) ⇒ ((x in c) ⇒ ¬(x m x))
∀ c,m,d,x (c single/m d) ⇒ ((x in c) ⇒ ∀ y,z (x m y) ∧ (x m z) ⇒ (y = z))
∀ c,m,d,x (c inv-single/m d) ⇒ ((x in c) ⇒ ∀ y,z (x m z) ∧ (y m z) ⇒ (x = y))
∀ c,m,d,x (c symmetric/m d) ⇒ ((x in c) ⇒ ∀ y (x m y) ⇒ (y m x))
The set of active market rules is defined by assigning the market relationships to the predefined categories. The designers have decided that 'supply' and 'demand' are 'inv-single' and 'mandatory', i.e., each information source and sink has exactly one participant connected to it. The 'linkInfo' attribute is declared as 'transitive' and 'symmetric' to cope with indirect dependencies between information sources.

Producer in Tool with
  inv-single,mandatory
    supply: Info-Source
end

Consumer in Tool with
  inv-single,mandatory
    demand: Info-Sink
end

SQL-Server in Tool with
  inv-single,inv-mandatory
    schema: SQL-Relation;
    byQuery: SQL-Query
  mandatory
    from: SQL-Relation
end

Info-Source in Object with
  transitive,symmetric
    linkInfo: Info-Source
end
The presence of these statements in the meta database enforces the semantics expressed by the formulas associated with them. Note that there are periods of time during the evolution of the requirements specification when some or all of the statements are not desirable; e.g., the association of an information source with a producer may come into effect only at the end of a long transaction (on the meta database) because there are multiple choices. The benefit of the pre-fabricated categories is that they can easily be switched on and off by the system designers without ever having to change (or even read) the predicative formulas that define them.
Before enforcing integrity, one may use queries to check to what degree a given requirements specification and/or federated database schema fulfills the conditions. For example, the query

Query GoodProducers isA Producer with
  inv-single,mandatory
    supply: Info-Source
end

and its relative complement 'GoodProducers-' partition the producers into those who uniquely supply at least one information source and those who don't. Not incidentally, the categories 'inv-single' and 'mandatory' do the job. As soon as the answer to 'GoodProducers-' is empty, the designers may enforce them by inserting the corresponding definition for 'Producer' as shown above.
4.2 Implementation of Evolution
The requirements specification is not complete in the sense that a federated database schema could be automatically generated from it. The information sources are aggregations of the relations implemented by them; details about the attributes of the relations are not represented as properties of the information sources. On the other side of the requirements language, information sinks are even more abstract than their counterparts on the SQL side, namely queries and views. Nevertheless, a formal representation of the interrelationships between the two languages yields effective support in the evolution task.
Service 1: Assignment and re-assignment of relations to SQL servers. This can be derived from the assignment of the SQL servers to the participating tools:
∀ p,i,s,r (p supply i) ∧ (p assignServer s) ∧ (i implementRel r) ⇒ (s schema r)
If an information source X changes its supplier from A to B, then the new assignment of relations can be computed by evaluating the rule first for 'i=X, p=A'. The solutions for 's' and 'r' deliver all relational schema definitions which have to be removed. The evaluation of the rule for 'i=X, p=B' yields the new assignments of relations to SQL servers. The SQL queries accessing the reassigned relations can be detected by following the 'from' link. They need no change in the meta database; however, their code has to be re-generated (service 5).
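Service 1 can be sketched as Datalog-style rule evaluation in Python. The fact names (supply, assignServer, implementRel, schema) follow the paper; the tuple representation, function names, and the example constants (A, B, X, srvA, srvB, R1, R2) are our own illustrative assumptions.

```python
def schema_assignments(supply, assign_server, implement_rel):
    """(p supply i) & (p assignServer s) & (i implementRel r) => (s schema r)"""
    return {(s, r)
            for (p, i) in supply
            for (p2, s) in assign_server if p2 == p
            for (i2, r) in implement_rel if i2 == i}

supply = {("A", "X")}
assign_server = {("A", "srvA"), ("B", "srvB")}
implement_rel = {("X", "R1"), ("X", "R2")}

old = schema_assignments(supply, assign_server, implement_rel)
# information source X changes its supplier from A to B:
new = schema_assignments({("B", "X")}, assign_server, implement_rel)

print(sorted(old - new))  # schema entries to remove: [('srvA', 'R1'), ('srvA', 'R2')]
print(sorted(new - old))  # schema entries to add:    [('srvB', 'R1'), ('srvB', 'R2')]
```

The set difference of the two evaluations directly yields the removals and additions described in the text.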
Service 2: Estimate effect of removal of information. The removed units are information sources. Two rules specify exactly which queries and relations are affected by the removal.
∀ i,r (i in Info-Source) ∧ ((i implementRel r) ∨ ∃ j (j linkInfo i) ∧ (j implementRel r)) ⇒ (r dependsOn i)
∀ i,q,r (q in SQL-Query) ∧ (q from r) ∧ (r dependsOn i) ⇒ (q dependsOn i)
The first rule specifies the relations depending on an information source 'i'; the second propagates the dependencies to the queries. The 'linkInfo' attribute is assumed to be transitive and symmetric, as stated in the previous section. Thus, the effect of removing X as an information source is computed by evaluating the two rules specialized by the substitution 'i=X'. The effect of removing a relation directly from the federated DB schema follows from the 'from' and 'foreignKey' dependencies; the corresponding rule is left out here. The above rules only exhibit the pieces of SQL code to be updated. The actual implementation (e.g., removing some joins from a query or removing the whole query) has to be done manually or with foreign tool support.
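Service 2 can likewise be sketched in Python, including the symmetric-transitive closure of 'linkInfo'. The closure algorithm (a naive fixpoint) and all names in the example are our own illustrative assumptions.

```python
def sym_trans_closure(pairs):
    """Symmetric-transitive closure of 'linkInfo' (naive fixpoint iteration)."""
    rel = set(pairs) | {(b, a) for (a, b) in pairs}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(rel):
            for (c, d) in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d))
                    changed = True
    return rel

def affected_by_removal(i, implement_rel, link_info, query_from):
    """Relations and queries that depend on information source i (Service 2)."""
    link = sym_trans_closure(link_info)
    connected = {i} | {j for (j, k) in link if k == i}
    rels = {r for (src, r) in implement_rel if src in connected}
    queries = {q for (q, r) in query_from if r in rels}
    return rels, queries

implement_rel = {("X", "R1"), ("Y", "R2"), ("Z", "R3")}
link_info = {("X", "Y")}                 # Y is linked to X; Z is unrelated
query_from = {("Q1", "R1"), ("Q2", "R3")}

rels, queries = affected_by_removal("X", implement_rel, link_info, query_from)
print(sorted(rels), sorted(queries))     # ['R1', 'R2'] ['Q1']
```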
Service 3: Detect potential foreign keys. Assume that a new information source X is inserted and implemented by some SQL relations (which have to be provided by the system designer). The new information source has some links ('linkInfo') to other information sources.
∀ i,j,r,s (i linkInfo j) ∧ (i implementRel r) ∧ (j implementRel s) ∧
  (∀ a,pk,d (r primaryKey pk) ∧ (pk consistsOf a) ∧ (a valueType d) ⇒ ∃ b (s attr b) ∧ (b valueType d))
⇒ (s candidate-foreignKey r)
The middle line of the rule makes up a necessary condition for foreign keys (matching SQL domains). Thus, any potential foreign key dependency is subsumed by this condition. The 'linkInfo' knowledge is used to rule out those relations whose information sources are not connected in the requirements specification, i.e., which are not planned to be used. One can switch off transitiveness of 'linkInfo' to limit the search space. A stricter variant of the rule is to test on equality of attribute names instead of the domains.
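A minimal Python sketch of Service 3, under the simplifying assumption (ours, for illustration) that each information source is implemented by a single relation. The dictionaries and their contents are invented example data.

```python
implement = {"X": "R", "Y": "S"}          # info source -> implementing relation
link_info = {("X", "Y")}                  # (i linkInfo j)
primary_key = {"R": [("eid", "INT")]}     # relation -> primary-key (attr, domain) pairs
attr_domain = {"S": {"INT", "CHAR"}}      # relation -> domains of its attributes

def candidate_foreign_keys(link_info, implement, primary_key, attr_domain):
    """(s candidate-foreignKey r) whenever the sources are linked and every
    primary-key domain of r occurs among s's attribute domains."""
    return {
        (implement[j], implement[i])      # (s, r)
        for (i, j) in link_info
        if i in implement and j in implement
        and all(d in attr_domain.get(implement[j], set())
                for (_a, d) in primary_key.get(implement[i], []))
    }

print(candidate_foreign_keys(link_info, implement, primary_key, attr_domain))
# {('S', 'R')}: S may hold a foreign key referencing R's primary key
```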
Service 4: Implement new information sinks. A new information sink is connected to existing information sources. According to Figure D it can be implemented by an SQL query. Certainly, the complete code for the query cannot be derived from the incomplete requirements specification. However, one can derive a superset of the 'from' clause and use service 3 for setting join conditions.
∀ k,i,r (k in Info-Sink) ∧ (k match i) ∧ (r in SQL-Relation) ∧ (r dependsOn i) ⇒ (k candidate-from r)

An integrity constraint ensures that the SQL queries implementing the information sink 'k' only take the candidate relations in their 'from' clauses:
∀ k,q,r (k in Info-Sink) ∧ (k implementQuery q) ∧ (q from r) ⇒ (k candidate-from r)
The constraint rules out accesses to relations which are not specified in the requirements. In other words, if a query accesses information from a relation, then there must be a path via an information sink and information source back to the same relation.
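The derivation rule and the integrity constraint of Service 4 can be sketched together. The 'dependsOn' facts are assumed to have been computed as in Service 2; all concrete names (K, Q, X, R1, R2, R3) are our own example data.

```python
match = {("K", "X")}                      # (k match i)
depends_on = {("R1", "X"), ("R2", "X")}   # (r dependsOn i), computed as in Service 2
implement_query = {("K", "Q")}            # (k implementQuery q)
query_from = {("Q", "R1")}                # (q from r)

def candidate_from(match, depends_on):
    """(k match i) & (r dependsOn i) => (k candidate-from r)"""
    return {(k, r) for (k, i) in match for (r, i2) in depends_on if i2 == i}

def from_clause_legal(implement_query, query_from, candidates):
    """Integrity check: every 'from' relation of a sink's query is a candidate."""
    return all(
        (k, r) in candidates
        for (k, q) in implement_query
        for (q2, r) in query_from if q2 == q
    )

cands = candidate_from(match, depends_on)
print(sorted(cands))                                          # [('K', 'R1'), ('K', 'R2')]
print(from_clause_legal(implement_query, query_from, cands))  # True
print(from_clause_legal(implement_query, {("Q", "R3")}, cands))  # False: R3 not a candidate
```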
Service 5: Code generation. Once a complete federated DB schema has been specified in the meta database, the executable SQL statements can be extracted by formatting the instances of 'SQL-Query' and 'SQL-Relation' according to the SQL syntax. Instances of 'SQL-Relation' are uniquely assigned to SQL servers (Figure C). Thus, a code generator for instances of 'SQL-Query' can identify those relations in the 'from' clause which are imported from a distant SQL server. Parameterized SQL queries are completely specified by the identifier of the query and the sequence of parameters. This can be used for query-time code generation, as reported in Section 5.
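A tiny sketch of Service 5: distant relations in the 'from' clause are prefixed with the name of their SQL server, exploiting the unique server assignment. The qualification syntax `server.relation` and all names are simplifying illustrative assumptions; real federated SQL products have their own naming schemes.

```python
server_of = {"R1": "srvA", "R2": "srvB"}   # unique server assignment (Figure C)

def qualify(rel, server_of, local_server):
    """Prefix relations imported from a distant SQL server with its name."""
    srv = server_of[rel]
    return rel if srv == local_server else f"{srv}.{rel}"

def generate_query(select_list, from_rels, server_of, local_server):
    """Format a stored SQL-Query instance as executable SQL text."""
    tables = ", ".join(qualify(r, server_of, local_server) for r in from_rels)
    return f"SELECT {select_list} FROM {tables};"

print(generate_query("*", ["R1", "R2"], server_of, "srvA"))
# SELECT * FROM R1, srvB.R2;
```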
To summarize, all services in this section are described by means of deductive rules which check the meta database and extract derived information useful for the services. Integrity constraints are, with a few exceptions, expressed by assigning attributes of the requirements and implementation languages to pre-fabricated categories. Thereby, designers not familiar with predicate calculus can easily set up the terms under which evolution can take place. All rules and integrity constraints are effectively computable by Datalog interpreters [CGT90]. The effect of an incremental update can be computed by evaluating partially instantiated versions of certain rules.
5 APPLICATION IN QUALITY MANAGEMENT

We assume that each participating system has an SQL server as front end. SQL servers provide a uniform data model for information exchange and the ability to evolve their schemes at run time. Finally, all SQL servers are assumed to be able to evaluate queries on distant SQL servers. Indeed, such products are readily available from different vendors. Figure E summarizes the requirements on the participating database systems. The reader should note that the SQL servers here primarily play the role of buffers for information exchange between the heterogeneous applications of an enterprise. While SQL servers make up the target infrastructure, they are not suitable for specifying evolution. SQL schemes and queries are tuned for efficiency (of query evaluation), not for understanding the semantics of the schemes. SQL has basically no support for the case where a group of designers has to evolve the federated database system. Therefore, we present a method within which the participating experts themselves can cooperatively specify and evolve the requirements on the information exchange. The result is used to support the update of the federated database schema. The management of the updates is implemented via the meta database system ConceptBase [JGJ*95], whose predicative query language is capable of quantifying over requirements and schema elements as well as over their relationships.
Figure E: Tools and SQL servers connected to a network.
Such tools are interesting because their interface, an SQL server, can be manipulated from external tools by submitting SQL calls. Specifically, the schema of the server can be evolved from outside. This feature allows the incorporation of an information trader tool which supports schema evolution
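The following sketch illustrates this "evolution from outside" with plain DDL calls; sqlite3 stands in for the Oracle/Informix/Ingres servers of the project, and the relation and column names are only illustrative.

```python
import sqlite3

# Sketch: an SQL server's schema evolved at run time by submitting
# plain DDL statements, the way an external trader tool would.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PSEM (ID INTEGER, NAME VARCHAR(20))")

# Later, the trader adds a column that a new information flow requires:
con.execute("ALTER TABLE PSEM ADD COLUMN MAXVALUE INTEGER")

# Inspect the evolved schema (column names only):
cols = [row[1] for row in con.execute("PRAGMA table_info(PSEM)")]
print(cols)  # → ['ID', 'NAME', 'MAXVALUE']
```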
based on global information not available in any local schema. The approach has been validated in the project WibQuS [JJS93,Pfei96]. The project participants developed autonomous quality management tools whose information exchange was cooperatively designed and implemented using ConceptBase as information trader. The goal of the WibQuS project was to discover and enact information flows between half a dozen quality management methods covering the so-called quality cycle [Ima86,Pfei93]. Practically every department of an enterprise is directly or indirectly involved in quality management. The following list of tools developed in WibQuS gives an impression of the diversity of tasks.
•
Dacapo: an extension of the quality function deployment method [Aka90]; supports the engineer in mapping customer wishes (like "a coffee machine which boils coffee in less than a minute") to technical solutions
•
CaDoX: a knowledge-based system which helps the engineer in planning laboratory experiments, esp. in selecting the appropriate statistical method for evaluating the results
•
CAFA: a failure analysis tool for tracing symptoms (detected failures) back to their causes; the tool uses rule-based knowledge, esp. from the FMEA (failure modes and effects analysis) method, and data from the service field, the production units, and others
•
Argus: a system for planning the measurement points and methods to check the quality of produced products, e.g. the exactness of a hole drilled in a plastic corpus; Argus needs information from Dacapo and CaDoX
•
XSPC: an extension of the widely-used SPC (statistical process control) method; XSPC supports the engineer in designing intervention intervals depending on actual values of process and product parameters (planned by Dacapo and Argus)
•
WiFEA: a configurable form-based user interface that supports service technicians in recording quality-relevant information during their visits to customers; the forms are multi-purpose, e.g., the engineers using Dacapo should be able to request specific information from the service field on the customer's satisfaction with some product properties
The short descriptions show that there are many references to information generated in distant autonomous systems. The six methods of the WibQuS project are only one possible selection; there may also be interplay with the CAD database, with the production planning system, and more. Figure F contains a snapshot of a certain stage in the requirements specification for WibQuS. A new information sink ’Product-Description’ has been inserted as a demand of the participant WiFEA. This demand is not matched by any information source, i.e. it appears in the answer of the query ’ResolvedSinks’. The designers may find out that the requested information must be accessed from a CAD database not yet included in the architecture. It is included by inserting the following facts into the meta database and matching them to the unresolved request:
(CAD in Producer)
(Product-Structure in Info-Source)
(Product-Structure linkInfo/hasProperties Product-Properties)
(Product-Description match/useCAD Product-Structure)
The product structure is linked to the product properties managed by the Dacapo system. The next step is the declaration of the schema exported by the CAD system:
(CAD-DB in SQL-Server), (R34 in SQL-Relation)
(CAD-DB schema/r1 R34), (R34 name/relname "PSE")
(R34 attr/a1 ID), (R34 attr/a2 NAME)
(R35 in SQL-Relation), (R35 name/relname "PSE2PSE")
(R35 attr/a1 PSE), (R35 attr/a2 SUCC)
(ID valueType/dom int), (PSE valueType/dom int), (SUCC valueType/dom int)
(NAME valueType/dom varchar(20))
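From facts of this kind, executable DDL for the exported schema can be derived mechanically. A minimal sketch, with the fact triples flattened into a Python dictionary (the generation function and its name are assumptions, not the ConceptBase service):

```python
# Schema facts as declared for the CAD server above:
# relation name -> list of (attribute, SQL domain) pairs.
schema_facts = {
    "PSE":     [("ID", "int"), ("NAME", "varchar(20)")],
    "PSE2PSE": [("PSE", "int"), ("SUCC", "int")],
}

def to_ddl(relname, attrs):
    """Derive a CREATE TABLE statement from the meta-database facts."""
    cols = ", ".join(f"{a} {t}" for a, t in attrs)
    return f"CREATE TABLE {relname} ({cols})"

for rel, attrs in schema_facts.items():
    print(to_ddl(rel, attrs))
```

This is the kind of extraction the QualityTrader performs when it initializes the schema of a new tool ’on the fly’ (Section 5.2).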
Figure F: Requirements specification in the WibQuS case.
The two relations ’PSE’ and ’PSE2PSE’ are used to store component trees of products. The products are identified by the ’ID’ attribute. The next task is the implementation of the new
information sink ’Product-Description’. In WibQuS, the information source ’Product-Properties’ of Dacapo is implemented by two relations ’R12’ (named PSEM) and ’R15’ (named PSEMA). They relate products with properties and admissible values for the properties. Then, service 4 of the previous section computes ’R34’, ’R35’, ’R12’ and ’R15’ as the candidate relations for ’Product-Description’. The designer may now decide to implement it by an SQL query with a parameter for the NAME attribute. The statements for the meta database are as follows.
(Q77 in SQL-Query)
(Product-Description implementbyQuery/q1 Q77)
(Q77 from/r1 R34), (Q77 from/r2 R12), (Q77 from/r3 R15)
(Q77 parameter/p1 varchar(20))
(Q77 where/text ".NAME= and .ID=.ID and ...")
(Q77 select/list ".ID,.MAXVALUE")
For code generation, the placeholders for the relations and the parameters are substituted by the actual relation names or the parameter value, respectively. This completes the example for the insertion of the information sink ’Product-Description’.
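The substitution step can be sketched as simple template instantiation. Since the WHERE text of Q77 is elided above, the template below is a hypothetical reconstruction; the relation bindings (r1 = PSE, r2 = PSEM, r3 = PSEMA) follow the example, while the join condition and parameter value are assumptions.

```python
# Hypothetical query template with placeholders for relations ({r1}..{r3})
# and the varchar(20) parameter ({p1}), as stored for query Q77.
template = ("SELECT {r1}.ID, {r3}.MAXVALUE FROM {r1}, {r2}, {r3} "
            "WHERE {r2}.NAME = {p1} AND {r1}.ID = {r2}.ID")

def generate(template, relations, params):
    """Code generation: substitute actual relation names and parameter values."""
    return template.format(**relations, **params)

sql = generate(template,
               {"r1": "PSE", "r2": "PSEM", "r3": "PSEMA"},
               {"p1": "'coffee machine'"})
print(sql)
```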
5.2 Architecture with Information Trader
So far, the meta database has been used to describe and support the evolution of the federated DB schema. To be functional, evolution must be possible even in the production phase of the federated DB system. Therefore, the meta database, together with the information trader, is included as one of the distributed systems (see node QualityTrader in Figure G). Like all other systems it has an SQL interface to the network, i.e., it accepts SQL calls and can submit SQL calls to other systems. Updating the information market is done by inserting the appropriate Telos string into a dedicated relation. The information trader reads the new tuple and then invokes the services of Section 4.2 to adapt the overall system to the update.
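A minimal sketch of this update channel, assuming a dedicated relation and a single polling pass (the table name TELOS_UPDATES and its columns are illustrative, not the WibQuS schema):

```python
import sqlite3

# The dedicated relation into which participants insert Telos strings.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TELOS_UPDATES (ID INTEGER, FRAME TEXT, DONE INTEGER)")
con.execute("INSERT INTO TELOS_UPDATES VALUES "
            "(1, '(Product-Description in Info-Sink)', 0)")

def poll_once(con, handle):
    """Trader side: read unprocessed tuples, react, mark them as done."""
    rows = con.execute(
        "SELECT ID, FRAME FROM TELOS_UPDATES WHERE DONE = 0").fetchall()
    for uid, frame in rows:
        handle(frame)  # here: invoke the evolution services
        con.execute("UPDATE TELOS_UPDATES SET DONE = 1 WHERE ID = ?", (uid,))

seen = []
poll_once(con, seen.append)
print(seen)  # → ['(Product-Description in Info-Sink)']
```

A second poll finds no pending tuples, so each update is processed exactly once.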
Figure G: Federated database with trader component in WibQuS.
The participating systems use rather different software and hardware platforms, agreeing only on Internet and SQL standards. The QualityTrader can initialize the schema of new tools ’on the fly’ by extracting the SQL code from the meta database. Some relations common to all DB schemes are used to submit updates to the meta database.
6
RELATED WORK
Kashyap and Sheth [KS94] proposed an architecture with information providers, consumers, and brokers. The task of the broker is to transform a consumer query into a query on the providers' databases by comparing semantic and structural similarities. It is an approach for information access in federated databases where the part schemes have evolved independently. Our proposal is tightly coupled [SL90]: the group of designers having a common goal collaboratively defines the requirements of information exchange (interschema knowledge). Negotiation in federated databases was proposed by Alonso and Barbará [AB89]. Their main concern is to estimate the price for accessing information in order to organize the ’information market place’. They also propose to materialize views of often-used queries on local sites. Such views are regarded as crucial for query optimization in distributed databases (see [RES93]). The institution of an active trader has an important role in Open Distributed Processing (ODP) [PM93]. The properties of a service and its arguments (as opposed to schema information in the federated database case) are defined in algebraic terms. The trader has the task of computing a match between a service request and a service offer. Type conversions of arguments are performed for both service request and answer transmission. Recent work on schema evolution concentrates on object-oriented databases. Compatibility with existing applications is achieved either by versioning of objects and schemes [Clam94] or by object-oriented views [TS93]. Rieche and Dittrich [RD94] present an application of a federated DB system for molecular biology. They propose an architecture with an object-oriented DBMS as data repository. The data repository provides a uniform query language and the functionality to import data from external databases.
7
CONCLUSIONS
We proposed a framework for the evolution of a federated DB schema via updates to a meta database. The meta database distinguishes between requirements of information exchange and their implementation in SQL databases. Evolution of information requests (queries) is supported concurrently by representing their match to information resources. The group of DB designers can set up an evolution policy by enforcing suitable integrity constraints on the meta database. An extensible set of predefined categories frees the designers from the task of encoding predicate calculus. The implementation of the coarse-grained updates to the requirements specification as an evolution of the federated DB schema exploits the power of a deductive query language. Services include re-assignment of information resources and assistance in the code generation for SQL queries. A real-life application has been undertaken in the WibQuS project. The meta database is managed in the production phase of the distributed system by the ConceptBase system; it remains accessible and updatable at any time. The approach has proven to work in an environment of heterogeneous SQL servers running on different hardware platforms. The constraint and query facilities were especially productive: in the early phase only few constraints were enforced, but as the experts became more familiar with ConceptBase and the approach, more and more constraints were activated. The approach is open to intermediate languages between the requirements and implementation domains. Such a language would be another instance of the language model of the meta database. Richer languages (measured by the number of different concepts) allow more support in code generation and in consistency checking. Ongoing research investigates the requirements specification by discrete simulation techniques for detecting weak points in the information flow.
The experience with WibQuS has shown that meta models with customizable categories make it possible to bridge heterogeneous representations and empower designer groups to cooperate in the evolution of a distributed information system.
Acknowledgments. This work was supported in part by the German Ministry of Research and Technology under grant 02QF3004/1, and by the European Community under ESPRIT Basic Research Action 6810 (Compulog 2) and Working Group No. 8319 (ModelAge). We would like to thank Rainer Gallersdörfer, Peter Peters and Peter Szczurko for the fruitful collaboration in WibQuS for almost three years. A big thank goes to the members of the ConceptBase team, esp. Martin Staudt, Kai von Thadden and René Soiron, for investing much of their spare time in supporting the WibQuS project, and to our student programmers, Almudena Rodriguez-Pardo, Ute Löb, Markus Mandelbaum and Reiner Nix.
8
REFERENCES
[AB89]
Alonso, R., Barbará, D. (1989). Negotiating data access in federated database systems. 5th Intl. Conf. Data Engineering, pp. 56-65, 1989.
[Aka90]
Akao, Y. (1990). Quality function deployment - integrating customer requirements into product design. Productivity Press, Cambridge, 1990.
[CGT90]
Ceri, S., Gottlob, G., Tanca, L. (1990). Logic Programming and Databases. Springer-Verlag, 1990.
[Clam94]
Clamen,S. (1994). Schema Evolution and Integration. Distributed and Parallel Databases, 2, 1, pp. 101-126, Jan. 1994.
[GS87]
Gallagher,J., Solomon,L. (1987). Mapping CML to assertional level. Technical report, SCS Technische Automation und Systeme, Hamburg, Germany, 1987.
[Ima86]
Imai, M. (1986). Kaizen - The Key to Japan’s Competitive Success. McGraw-Hill, New York, 1986.
[ISO90]
ISO/IEC 10027 (1990). Information technology - information resource dictionary system (IRDS) - framework. ISO/IEC International Standard, 1990.
[JGJ*95]
Jarke, M., Gallersdörfer, R., Jeusfeld, M.A., Staudt, M., Eherer, S. (1995). ConceptBase - a deductive object base for meta data management. Journal of Intelligent Information Systems, 4, 2, pp. 167-192, 1995.
[JJ91]
Jeusfeld, M.A., Jarke, M. (1991). From relational to object-oriented integrity simplification. Proc. 2nd Intl. Conf. Deductive and Object-Oriented Databases, LNCS 566, Springer-Verlag, pp. 460-477, 1991.
[JJR90]
Jarke, M., Jeusfeld, M.A., Rose, T. (1990). A software process data model for knowledge engineering in information systems. Information Systems, 15, 1, pp. 85-116, 1990.
[JJS93]
Jarke, M., Jeusfeld, M.A., Szczurko, P. (1993). Three aspects of intelligent cooperation in the quality cycle. Intl. J. of Intelligent and Cooperative Information Systems, 2, 4, pp. 355-374, 1993.
[MBJK90]
Mylopoulos, J., Borgida, A., Jarke, M., Koubarakis, M. (1990). Telos: a language for representing knowledge about information systems. ACM Trans. Inf. Syst., 8, 4, pp. 325-362, 1990.
[Pfei93]
Pfeifer, T. (1993). Qualitätsmanagement. Carl-Hanser-Verlag, Germany, 1993.
[Pfei96]
Pfeifer, T. et al. (eds.) (1996). Wissensbasiertes Qualitätsmanagement - Methoden und Techniken zur Nutzung verteilten Wissens. Springer-Verlag, 1996.
[PM93]
Popien, C., Meyer, B. (1993). Federating ODP traders: an X.500 approach. Proc. of IEEE Intl. Conf. on Communications (ICC), Geneva, pp. 313-317, May 1993.
[RD94]
Rieche, B., Dittrich, K.R. (1994). A federated DBMS-based integrated environment for molecular biology. Proc. 7th Intl. Working Conf. Scientific and Statistical Database Management, Charlottesville, VA, USA, Sept. 28-30, 1994.
[RES93]
Roussopoulos, N., Economou, N., Stamenas, A. (1993). ADMS: a testbed for incremental access methods. IEEE Trans. Knowledge and Data Engineering, 5, 5, pp. 762-774, Oct. 1993.
[SL90]
Sheth, A.P., Larson, J.A. (1990). Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22, 3, pp. 183-236, Sept. 1990.
[SNJ94]
Staudt, M., Nissen, H.W., Jeusfeld, M.A. (1994). Query by Class, Rule, and Concept. Journal of Applied Intelligence, 4, pp. 133-156, 1994.
[TS93]
Tresch, M., Scholl M.H. (1993). Schema transformation without database reorganization. SIGMOD Record, 22, 1, pp. 21-27, 1993.
Figure A: Levels of abstraction and their interconnection.
Figure B: Language model (meta model) for offering resources and issuing requests.
Figure C: Abstract model of SQL servers as implementation platforms.
Figure D: Requirements language for information exchange.