Determining Schema Interdependencies in Object-Oriented Multidatabase Systems

Jian Yang
Dept of Computer Science, University College, UNSW, Australian Defence Force Academy, Canberra, ACT 2600, Australia

Mike P. Papazoglou
School of Information Systems, Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001, Australia
Abstract

Schema analysis is the process of determining and capturing how individual schema items in diverse databases are related to each other. During this process different types of semantic relationships between schema items in different databases are identified and meaningful cross-links are established. The work presented in this paper concentrates on the issue of determining several types of interdependencies among schema items in object-oriented multidatabase systems in terms of their semantic context. A methodology is proposed for semi-automated schema analysis whereby knowledge from schema analysts may be applied in an interactive manner to refine and fully determine potential schema item interdependencies.

Keywords: multi-database systems, object-oriented DBs, schema analysis and integration, semantic and structural discrepancies, inter-type assertions, analogical reasoning.

1 INTRODUCTION

A multidatabase system (MDBS) consists of a collection of independently developed and separately administered component database systems (CDBSs). The purpose of an MDBS is to support controlled sharing of system-wide data sources and provide coordinated control among a collection of autonomous and possibly heterogeneous CDBSs. Corporate data may reside anywhere in the CDBS network and can be operated upon transparently by both users and applications. To achieve this functionality an MDBS provides a homogenizing layer on top of the existing CDBSs, thus giving users the illusion of a single unified homogeneous database system. In order for sharing to take place among heterogeneous CDBSs, some common data model must be used for describing corporate data and for coordinating distributed control. This model must be semantically expressive in order to capture the intended meanings of CDBS schemas which may use different data formats, obey different data modeling conventions and, more importantly, may model different aspects of the same concepts of some common domain. It is generally accepted that object-oriented models (especially those which include active database capabilities) can be considered as a key core technology for achieving integration and interoperation between multiple autonomous data sources along with providing the capacity to perform distributed object management functions [1], [2], [3], [4].

Successful integration of multiple CDBS schemas requires the identification and subsequent reconciliation of structural and semantic disparities which occur between the individual data elements within the CDBSs. For the purposes of this research it is convenient to distinguish between two complementary sets of such problems: structural conflicts and incompatibilities, and semantic inconsistencies and incompleteness [6]. By structural conflicts and incompatibilities we mean that the objects and object properties represented in the component databases are either conflicting or present domain mismatches with respect to one another, even though they may model the same 'real-world' concepts. Examples include problems in instance matches, units of measurement, contextual and structural mismatches, and so forth.

Semantic inconsistencies involve different interpretations of the meanings of schema entities in the individual CDBSs. For example, consider a CDBS which refers to its students as undergraduate and postgraduate students, whereas a second may choose to refer to them as first, second, third year, Master's and PhD students. The issue of semantic incompleteness relates to the situation where meaningful inter-object and inter-schema relationships, required for the purposes of schema merging and interoperation, are missing from the component schemas. For example, if one data source deals with information regarding commercial enterprises, it is quite unlikely to include all the information relating to a university lecturer with respect to his employment as company consultant. Conversely, a university database may not contain information concerning the consulting record of its lecturers. Only by establishing meaningful inter-schema relationship(s) between Lecturers in the university database and Employees in the company database and combining information regarding these two views will the corporate information source be able to form an overall picture about employees who are lecturers at the same time.
The issues of structural conflicts and incompatibilities have been extensively researched. There exist various proposals for solving these and related problems [5, 7, 8, 9, 10, 11, 12]. In the area of structural incompatibilities, techniques were proposed for structural and domain mismatches whereby conflicting attributes are mapped to common domains by means of applying extended relational operators [13]. It is only recently that it has been realized that tremendous potential lies in the use of sophisticated semi-automated tools for data analysis and integration. Bellcore's Schema Design and Integration Toolkit (BERDI) [14] provides intelligent support to improve the quality and reduce the time required to perform schema analysis and integration of non-trivial schemas in industrial settings. These capabilities build on and enhance previous attempts to provide interactive support for schema integration purposes [10].

As a general rule most of the above proposals are based on fairly high-level and intuitive understanding rather than a detailed classification scheme. Recently Kim and Seo [15] attempted to resolve this problem in a relational multi-database environment by proposing a systematic classification scheme. However, the issue of a general formal methodology for specifying useful inter-schema relationships and their characteristics during the process of schema analysis has yet to be addressed.

Semantic issues and in particular the problem of semantic incompleteness have either been marginally treated or received no attention thus far. Most notable is the work of Kent [16] where the author describes a great variety of ways to represent a given fact, some of which relate to our perception of semantic inconsistencies. The paper, however, does not propose any solution for resolving semantic inconsistencies. In a more recent paper [17] Kent discusses and proposes solutions to another semantic problem, that of ontological correspondences between entities created in different CDBSs. The author proposes a solution to this problem in terms of spheres of knowledge which act as repositories of abstracted knowledge describing CDB information. Semantic incompleteness has yet to receive appropriate attention; it has only received tangential treatment as a byproduct of schema merging activities [18], [19]. The main thrust of these research activities has been towards using a knowledge representation scheme for automating the process of identifying different types of attribute relationships.

The work reported in this paper discusses several fundamental issues which have to be addressed during schema analysis in the context of object-oriented multidatabases and concentrates predominantly on semantic incompleteness issues. Towards this end, we use a classification scheme in conjunction with determining inter-schema linkages according to the semantics and structure of analyzed CDBS schema items and a formal methodology for describing inter-schema relationships and their semantics. Research activities described in this paper build upon and partially extend previous research work that we conducted on schema analysis [20], [21], [22]. In [20, 21] we proposed a knowledge-based framework and discussed the basic features of a semi-automated tool for representing the semantics of inter-schema components and for reasoning about them, while in [22] we argued in favor of introducing case-based reasoning techniques for semi-automating the process of discovering structural and semantic relatedness between component schemas in CDBSs.

The remainder of this paper is organized as follows: Section 2 introduces the salient features of an object-oriented model used for discussing our approach to schema analysis. Section 3 outlines our basic approach to schema analysis in object-oriented multidatabase systems and proposes a formal methodology and an algorithm for type comparison in inheritance lattices, i.e., object-oriented schemas. Finally, Section 4 presents our conclusions and discusses future research directions.

2 DATA MODELING

In this paper we assume that the MDBS uses a common object-oriented database model which adapts knowledge representation and reasoning to traditional object-oriented database modeling constructs. To provide a sound basis for discussion we delineate only those object-oriented KOM features which are required for analyzing disparate CDBS schemas. More information about the nature and implementation of this knowledge-oriented model, which was developed on the basis of the object-oriented database system ONTOS and the expert system shell CLIPS, can be found in [23, 24, 25].

The following discussion will focus on typed objects which have three major characteristics: they are all instantiated from some common type description; they each belong to a single type only; and their instances are grouped into classes on the basis of similarity.

The information that can be obtained from a database system concerns its schema and the actual instances of data stored in the database. The former provides us with information on the organization and definitions of data, i.e., intensions, which consist of types and relationships between types. A type T is a description of the structural and behavioral properties of a set of objects which exhibit the same characteristics and which make up the instances of that type. In the object model we organize types around a type (or inheritance) lattice [26], a directed acyclic graph, which supports the notions of generalization and specialization via is-a relationships and is used to represent the conceptual schema for an object-base. Different types may be identified with the same set of functions (attributes and methods) and a type T2 is said to be a subtype of type T1 if T1 provides a subset of the functions of T2. Conversely, we also say that T1 is a supertype of T2. Figure 1 illustrates the rudimentary schemas of component databases in terms of types which are organized around inheritance lattices.

The notion of class is associated with all instances of a given type [27]. A class is based on the same specification as a type, but it includes the run-time notions of object creation by cloning the prototype for the class, and the extent, which is the set of all objects that are instances of the type at any given point in time [27]. Thus objects which correspond to a single type in the inheritance lattice are grouped together and treated as a single unit, viz. their corresponding class. Since a class represents only the set of all objects that are instances of a specific type at any given time, we introduce the concept of a Real World Semantic Class ($\Re$), which will be used later for inter-schema relationship definition. $\Re(T)$ denotes the set of all the possible objects which are instances of type T. Obviously, the class of objects of type T in a database (at a given point in time) is a subset of $\Re(T)$.

In this paper, the term type property is used to refer to attributes, relationships and methods. An attribute may take as its value an instance of a primitive type in ONTOS, such as int, char, float, etc., or of a non-primitive type such as another object. A relationship relates the current instance of its defining type to other type instance(s). For example, the type Student in Figure 2 has the attributes major, degree, score and the relationship take-course, which relates an instance of type Student to instances of type Course.
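The notions of type property and subtype introduced in this section can be illustrated with a minimal Python sketch. It is not the ONTOS/CLIPS model used by the authors: the `TypeDef` class, the `is_subtype` helper and the properties given to `Person` are assumptions made purely for illustration, while the `Student` properties follow the example in the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TypeDef:
    """Hypothetical schema type: named properties plus an optional supertype."""
    name: str
    attributes: dict = field(default_factory=dict)     # attribute name -> value type
    relationships: dict = field(default_factory=dict)  # relationship name -> target type
    methods: tuple = ()
    supertype: Optional["TypeDef"] = None

    def properties(self) -> set:
        """All property names, including those inherited through is-a links."""
        own = set(self.attributes) | set(self.relationships) | set(self.methods)
        inherited = self.supertype.properties() if self.supertype else set()
        return own | inherited


def is_subtype(t2: TypeDef, t1: TypeDef) -> bool:
    """T2 is a subtype of T1 if T1 provides a subset of the functions of T2."""
    return t1.properties() <= t2.properties()


# Person's attributes are assumed for the example.
person = TypeDef("Person", attributes={"name": str, "id_no": int})
student = TypeDef(
    "Student",
    attributes={"major": str, "degree": str, "score": int},
    relationships={"take_course": "Course"},
    methods=("insert_major", "insert_degree", "calculate_score"),
    supertype=person,
)

print(is_subtype(student, person))  # Student offers everything Person does
print(is_subtype(person, student))  # Person lacks Student's own properties
```

Treating properties as plain name sets ignores value types and method signatures; a fuller model would compare those as well.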
[Figure 1: Examples of Component Schemas. The figure shows inheritance lattices for DB1: Computer Science Department, DB2: Engineering Department, and DB3: Company Staff.]

Type Student
    supertype: Person
    attributes:
        major: string
        degree: string
        score: int
    relationship:
        take-course: Type Course
    methods:
        insert-major { }
        insert-degree { }
        calculate-score { }
/* Student */

Figure 2: Definition of Type Student in Component Database-1

3 A METHODOLOGY FOR OBJECT-ORIENTED SCHEMA ANALYSIS

As a basis for our analysis of and approach to the various types of discrepancies that exist in object-oriented MDBSs we consider developing an application for a collaborative long-term project conducted between three distinct institutions: a university's computer science department, an engineering department and a computer company. Hence the need arises to integrate at least three distinct but interconnected component databases. Parts of the individual CDBS schemas are shown in Figure 1, where each database carries a tag name of the form $DB_i$ ($1 \le i \le 3$). Figure 3 shows possible properties for the two types Person and Staff. This figure also indicates which properties these two types share in common and the properties which are private to each of them, referred to as additional properties in the figure.

As with most methodologies, we determine potential relationships between pairs of similarly structured CDBS schema types on the basis of comparison. Currently, a semi-automated knowledge-driven schema analysis tool (SAKSAT) is under development whereby a human operator is in a position to interact with this relatively sophisticated schema analyzer, which also stores derivation histories and cases of previous findings in a case base, in order to identify potentially related schema types and/or attributes [22]. This approach presents many similarities with BERDI [14]. SAKSAT acts as an "intelligent" assistant and is geared only towards determining the semantic relatedness of schema types, discovering implicit facts about them, reasoning about its findings and classifying the related types with the aid of a human operator whenever it does not have sufficient information. BERDI, on the other hand, addresses the entire schema integration cycle and focuses on the use of graphical user interface facilities to expedite integration. In this sense it uses graphical display facilities for representing extended ER schemas (called source schemas) and graphical query facilities for retrieving model level information (e.g., attributes and relationships with the same names) and dictionary information (e.g., aliases, creators, business purposes, schemas of origin, entity descriptions and so on).
Following the relevant literature [11] we refer to that part of schema analysis used for determining relationships among different schema types as type assertions. Inter-type assertions represent serious sources of problems because they do not exist in the separate schemas as such and have to be identified during the analysis process. Previous research work for detecting inter-schema relationships either relied totally on attribute equivalence or used heuristics to order pairs of objects for review by the database administrator (DBA) [21]. In fact, two complementary methods were proposed for determining type assertions: attribute subsumption [10, 12] and instance subsumption [19, 11]. Although these types of approaches present certain similarities and may in fact influence the treatment of type assertions they are rather restrictive in their present form.
Furthermore the proposed methodology places a tremendous mental burden on the DBA. The entire process is 'manual' and the DBA acts only as a 'technician' who mechanically encodes the findings regarding type relatedness but cannot interact with the system to help it further refine type/attribute assertions based on previous findings. In order to overcome these limitations it is necessary to examine a variety of existing or implied semantic links between types (and their respective attributes) in the individual CDBSs. Moreover, in order to identify potential type assertions it is imperative that designers rely on the assistance of semi-automated intelligent schema analysis tools whereby the human operator may be asked to apply his knowledge if additional information is needed to further refine the nature of type assertions.

3.1 A Theoretical Basis for Determining Commonality and Inferring Similarity

We base our discoveries regarding existing commonalities and inferred similarities in CDBS schemas on a sound theoretical foundation. In the following, we provide formal definitions of inter-schema assertions based both on type definitions as well as existing type instances. For this purpose we use two fundamental concepts from the AI areas of Analogy and Induction, namely existing commonality and inferred similarity [28]. Subsequently, we classify the various kinds of inter-type assertions and base our findings on these two concepts. In the following we give a list of helpful definitions for this purpose:

Let $P(T)$ be the set of properties which are defined in a given type T and $P_e(T)$ be the set of all permitted, i.e. conformant, properties of type T which are currently not included in its signature. We refer to such properties as shareable properties of the type T.

Since $P_e(T)$ includes the properties which are defined in type T as well as those which are not included in this type's definition but are allowed to be part of it (e.g., attributes like age and sex in Figure 3 are shareable properties of type T2), we then have $P(T) \subseteq P_e(T)$.

Now for any object x, let $P_e(x)$ be the set of all properties which are shareable by that object x; further let $\bar{P}_e(x)$ be the set of all those properties not shareable by the object x. Hence we may now define:

$$P_e(T) = \bigcap_{x \in \Re(T)} P_e(x), \qquad \bar{P}_e(T) = \bigcap_{x \in \Re(T)} \bar{P}_e(x), \qquad P_e'(T) = \bigcup_{x \in \Re(T)} P_e(x) - P_e(T)$$

where $P_e(T)$ denotes the set of properties which are shareable by all objects in $\Re(T)$; $\bar{P}_e(T)$ denotes the set of properties which cannot be shared by any object in $\Re(T)$; and $P_e'(T)$ denotes the set of properties which are shareable by some, but not all, objects in $\Re(T)$.

If $C(T_1,T_2)$ is the set of common properties between types T1 and T2, then we have $C(T_1,T_2) = C(T_2,T_1) = P(T_1) \cap P(T_2)$. For example, from Figure 4 one may observe that $C(T_1,T_2) = C(T_2,T_1) = \{$name, id-no, address, tel-no$\}$. If $Q(T_1,T_2)$ is the set of properties that a type T1 includes in addition to its common properties with T2, we then have $Q(T_1,T_2) = P(T_1) - C(T_1,T_2)$. For example, from Figure 4 one may observe that $Q(T_1,T_2) = \{$age, sex$\}$.

[Figure 3: Common and Additional Attributes]

[Figure 4: Existing Commonalities and Inferred Similarities]

Using the above set definitions we can derive the inferred similarities formula as follows:

$$(C(T_1,T_2) \neq \emptyset) \wedge (q_i \in Q(T_1,T_2)) \rightarrow (q_i \in P_e(T_2)).$$

The above formula is used to prescribe the interaction between the schema analysis tool and the DBA. Loosely speaking, this formula corresponds to a form of question that should be answered by the DBA during his interaction with SAKSAT. The interpretation of this formula is as follows: consider the additional properties of type T1, referred to as $Q(T_1,T_2)$, and ask the DBA if they can also be shared by type T2.

Obviously, there are three possible answers to the above question: (1) all the objects of $\Re(T_2)$ can share the additional properties of T1; (2) not all the objects of $\Re(T_2)$ can share these properties; (3) none of the objects of $\Re(T_2)$ can share these properties. The following formulae correspond to these three types of answers:

1. $(C(T_1,T_2) \neq \emptyset) \wedge (q_i \in Q(T_1,T_2)) \rightarrow (q_i \in P_e(T_2))$.
2. $(C(T_1,T_2) \neq \emptyset) \wedge (q_i \in Q(T_1,T_2)) \rightarrow (q_i \in P_e'(T_2))$.
3. $(C(T_1,T_2) \neq \emptyset) \wedge (q_i \in Q(T_1,T_2)) \rightarrow (q_i \in \bar{P}_e(T_2))$.

We exemplify this type of system/DBA interaction in the context of Figure 4. When SAKSAT compares the type Person in DB1 with the type Staff in DB3 it finds that the former type has two additional attributes: age and sex. Subsequently, it considers these two attributes and asks whether type Staff can also share them. The obvious answer is 'Yes, all staff objects can have these two attributes'. Similarly, type Staff includes three attributes not contained in type Person, namely rank, salary and experience. The same type of question is now asked for Person. The obvious answer here is 'Some can, some cannot'. This answer in fact implies that Staff is a subtype of Person. In this way, the system derives the inferred similarities between related types. In the following section we use these formal definitions in order to classify type assertions.
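The Person/Staff exchange described here can be sketched in a few lines of Python. This is only an illustration and not SAKSAT itself: the property sets are taken from the example, whereas the `DBA_ANSWERS` table (a canned stand-in for the interactive DBA), the function names and the simplification of one answer per type pair rather than per property are all assumptions.

```python
def common(p1: set, p2: set) -> set:
    """C(T1,T2): properties defined in both types."""
    return p1 & p2


def additional(p1: set, p2: set) -> set:
    """Q(T1,T2): properties T1 has beyond those it shares with T2."""
    return p1 - common(p1, p2)


P_PERSON = {"name", "id_no", "address", "tel_no", "age", "sex"}
P_STAFF = {"name", "id_no", "address", "tel_no", "rank", "salary", "experience"}

# Hypothetical DBA answers: can objects of the second type share the
# additional properties of the first type?
DBA_ANSWERS = {("Person", "Staff"): "all", ("Staff", "Person"): "some"}

# Each answer places the additional properties in one of the three sets.
TARGET_SET = {"all": "Pe", "some": "Pe'", "none": "not-Pe"}


def inferred_similarity(name1, p1, name2, p2, answers):
    """Map every q in Q(T1,T2) to the set of T2 it is inferred to belong to."""
    if not common(p1, p2):
        return {}  # the formula only applies when C(T1,T2) is non-empty
    label = TARGET_SET[answers[(name1, name2)]]
    return {q: f"{label}({name2})" for q in sorted(additional(p1, p2))}


print(inferred_similarity("Person", P_PERSON, "Staff", P_STAFF, DBA_ANSWERS))
print(inferred_similarity("Staff", P_STAFF, "Person", P_PERSON, DBA_ANSWERS))
```

With these answers, age and sex land in the set of properties shareable by all Staff objects, while rank, salary and experience are shareable only by some Person objects, which is the pattern the text reads as Staff being a subtype of Person.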
3.2 Classifying Inter-Type Assertions

In what follows we give definitions and examples of the most representative classes of type assertions in an object-oriented multidatabase environment. For the classification scheme that follows we assume that the types T1 and T2 belong to two different schemas in different CDBSs.

Definition 1: T1 is Equivalent to (Eq) T2 iff $\Re(T_1) = \Re(T_2)$.

Finding 1: If T1 Eq T2, then either
(1) $(C(T_1,T_2) \neq \emptyset) \wedge (Q(T_1,T_2) = \emptyset) \wedge (Q(T_2,T_1) = \emptyset)$; or
(2) $(C(T_1,T_2) \neq \emptyset) \wedge (q_i \in Q(T_1,T_2)) \wedge (q_j \in Q(T_2,T_1)) \rightarrow (q_i \in P_e(T_2)) \wedge (q_j \in P_e(T_1))$.

Definition 1.1: For case 1.(1) above, we say that T1 and T2 are Identical (Id).

Definition 1.2: This case is a subcase of definition 1.1 and corresponds to case 1.(2) above. Here we say that T1 and T2 are Isomorphic (Iso).

Description 1: If two types are equivalent, then their real world semantic classes are equal. If two types are equivalent then their definitions are either the same, in which case they are identical, or one type has additional properties which can be shared, i.e., are permissible, by the other type, in which case the types are isomorphic.

Definition 2: T1 is a Generalization (Ge) of T2 iff $\Re(T_1) \supset \Re(T_2)$, and T2 is a Specialization (Sp) of T1.

Finding 2: If T1 Ge T2, then either
(1) $(C(T_1,T_2) \neq \emptyset) \wedge (Q(T_1,T_2) = \emptyset) \wedge (q_j \in Q(T_2,T_1)) \rightarrow (q_j \in P_e'(T_1))$; or
(2) $(C(T_1,T_2) \neq \emptyset) \wedge (q_i \in Q(T_1,T_2)) \wedge (q_j \in Q(T_2,T_1)) \rightarrow (q_i \in P_e(T_2)) \wedge (q_j \in P_e'(T_1))$.

Definition 2.1: For case 2.(1) above, we say that T1 is a Supertype (Sup) of T2.

Definition 2.2: This case is a subcase of definition 2.1 and corresponds to case 2.(2). Here we say that T1 is a Conceptual Supertype (Csup) of T2.

Description 2: If T1 is a generalization of T2 then the real world semantic class of T2 is a subset of that of T1, and all the definitions of T1 are logically subsumed by T2.

Definition 3: T1 and T2 are Category (Ca) related if $\Re(T_1) \cap \Re(T_2) = \emptyset$ and $C(T_1,T_2) \neq \emptyset$ holds.

Finding 3: If T1 Ca T2, then
$(C(T_1,T_2) \neq \emptyset) \wedge (q_i \in Q(T_1,T_2)) \wedge (q_j \in Q(T_2,T_1)) \rightarrow (q_i \in \bar{P}_e(T_2)) \wedge (q_j \in \bar{P}_e(T_1))$.

Description 3: The fact that two types are category related implies that although their real world semantic classes are disjoint, they may have some common properties as well as shareable properties. Some of the additional properties of one type cannot be shared by the other type, i.e. they are mutually exclusive. Types which are inter-related according to one of the above three categories are said to be overlapping.

Definition 4: T1 and T2 are Role (Ro) related if $\Re(T_1) \cap \Re(T_2) \neq \emptyset$ holds.

Finding 4: If T1 Ro T2, then either
$(C(T_1,T_2) \neq \emptyset) \wedge (q_i \in Q(T_1,T_2)) \rightarrow (q_i \in P_e(x),\ x \in \Re(T_2))$; or
$(C(T_1,T_2) \neq \emptyset) \wedge (q_j \in Q(T_2,T_1)) \rightarrow (q_j \in P_e(y),\ y \in \Re(T_1))$.

For the first case above, we say that T1 and T2 are role 'T2' related; whereas for the second case, we say that T1 and T2 are role 'T1' related.

Description 4: If two types are role related, their real world semantic classes are overlapping. All the properties of one type can be shared by some instances of the real world semantic class of another type [29].

Definition 5: T1 and T2 are Disjoint (Dsj) if the following two conditions hold: $\Re(T_1) \cap \Re(T_2) = \emptyset$ and $(C(T_1,T_2) = \emptyset) \wedge (Q(T_1,T_2) = \emptyset) \wedge (Q(T_2,T_1) = \emptyset)$.

The above definition implies that two types are disjoint if they have nothing in common.

In the following section we present an algorithm for type comparison and illustrate how this methodology is used to determine the inter-type assertions discussed in this section.

3.3 A Strategy for Type Comparison in Inheritance Lattices

Schemas in object-oriented CDBSs correspond to inheritance lattices. The purpose of the inheritance lattice is to organize types around a rooted and connected directed acyclic graph so that each type is reachable from the root type. From the preceding discussion it is obvious that the basic unit in an object-oriented schema is the type. Hence, the primary task of schema analysis in object-oriented databases is concerned with type analysis and comparison, i.e., determining the valid inter-type assertions regarding CDBS schemas.

The fact that types are organized around inheritance lattices facilitates inter-schema type comparison. One can compare types in a systematic way, unlike in conventional data models, e.g., relational, ER, etc., where this sort of comparison is intuitive and painstaking since each type (e.g. relation) in the MDBS must be compared with every other. Moreover, the process of type comparison can be expedited on several occasions by not comparing unrelated sub-lattices. For example, if we ascertain that there is no semantic affinity between two types, T1 and T2 in two different CDBSs, i.e., they are disjoint, then this can be used as an indication that their descendants are also not related. This reduces the search time since we do not have to compare all pairs of types where one type in the pair belongs to one schema whereas the other belongs to another schema, which incidentally is common practice for determining attribute relatedness.
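The classification by real world semantic classes given in Section 3.2 can be approximated with a small Python sketch. It rests on strong assumptions: $\Re(T)$ is not computable in practice, so finite sets of object identifiers stand in for it, the `classify` function and all sample data are hypothetical, and the disjoint case follows the prose reading ("nothing in common") by testing only for an empty set of common properties.

```python
def classify(r1: set, r2: set, p1: set, p2: set) -> str:
    """Return the inter-type assertion holding between T1 and T2.

    r1, r2: finite stand-ins for the real world semantic classes of T1, T2.
    p1, p2: property sets P(T1), P(T2).
    """
    if r1 == r2:
        return "Eq"   # equivalent: equal semantic classes
    if r1 > r2:
        return "Ge"   # T1 generalizes T2: proper superset
    if r1 < r2:
        return "Sp"   # T1 specializes T2: proper subset
    if r1 & r2:
        return "Ro"   # role related: classes overlap
    if p1 & p2:
        return "Ca"   # category related: disjoint classes, common properties
    return "Dsj"      # disjoint: nothing in common


# Illustrative data only.
persons, staff = {"ann", "bob", "cy"}, {"ann", "bob"}
lecturers, employees = {"ann", "dee"}, {"ann", "eve"}
students, academics = {"bob"}, {"ann"}

print(classify(persons, staff, {"name", "age"}, {"name", "rank"}))
print(classify(lecturers, employees, {"name"}, {"name", "salary"}))
print(classify(students, academics, {"name", "major"}, {"name", "rank"}))
print(classify(students, academics, {"major"}, {"rank"}))
```

In the methodology itself these relationships are inferred from the DBA's answers about shareable properties rather than from enumerated instances; the set comparison above only mirrors the definitions.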
The following rules are used to accelerate the process of type comparison:

• Rule 1: If T2 is a subtype of T1, then T2 is disjoint with all the other siblings of T1 and their descendants.
• Rule 2: If T1 and T2 are equivalent, then all the siblings of T1 and their descendants may be overlapping with all the siblings of T2 and their descendants.
• Rule 3: If T1 and T2 are category related, then all the descendants of T1 and T2 are category related to each other.

Using the above rules during the process of type comparison in object-oriented schemas, the search times will be significantly reduced.

In the following we present an algorithm for inter-type comparison in inheritance lattices. We assume that IR is an inter-schema assertions matrix, where an element IR[Ti, Tj] is used to denote the sort of inter-type assertion holding between types Ti and Tj in different CDBs. According to the inter-type assertion definitions in Section 3.2, IR[Ti, Tj] is allowed to assume only one of the following values: {Eq, Ge, Sp, Ro, Ca, Un, Dsj}. These symbols correspond to assertions of the type Equivalent, Generalization, Specialization, Role, Category, Unknown, Disjoint, respectively. The IR matrix is initialized with Un entries in all its positions. The algorithm which is shown in Figure 5 presents a systematic method for comparing pairs of types in diverse CDBs. This algorithm can be easily generalized and be used for comparison among n types rather than pairs of types only. In fact in a companion paper we have argued in favor of the use of case-based reasoning techniques for this purpose [22].

main
begin
    for all types in two schemas
        IR[Ti, Tj] := Un;
    endfor
    Ti := root of lattice 1;
    Tj := root of lattice 2;
    compare (Ti, Tj);
end

procedure compare (Ti, Tj)
    if IR[Ti, Tj] ≠ Un then return endif
    result case of
    'Category':
        IR[Ti, Tj] := Ca;
        for each descendant Ti' of Ti do
            for each descendant Tj' of Tj do
                IR[Ti', Tj'] := Ca;
            endfor
        endfor
        return
    'Specialization':
        IR[Ti, Tj] := Sp;
        for each descendant Ti' of Ti do
            IR[Ti', Tj] := Sp;
        endfor
        if there is no overlap between Tj and its sibling Tjb
            for each Tjb and its descendants Tjb' do
                IR[Ti, Tjb*] := Ca;    /* Tjb* can be either Tjb or Tjb' */
            endfor
        endif
        for each child Tj' of Tj do
            compare (Ti, Tj');
        endfor
        return
    'Generalization':
        IR[Ti, Tj] := Ge;
        for each descendant Tj' of Tj do
            IR[Ti, Tj'] := Ge;
        endfor
        if there is no overlap between Ti and its sibling Tib
            for each Tib and its descendants Tib' do
                IR[Tj, Tib*] := Ca;    /* Tib* can be either Tib or Tib' */
            endfor
        endif
        for each child Ti' of Ti do
            compare (Ti', Tj);
        endfor
        return
    'Equivalent':
        IR[Ti, Tj] := Eq;
        if there is no overlap between Ti and its sibling Tib
            for each Tib and its descendants Tib' do
                IR[Tj, Tib*] := Ca;
            endfor
        endif
        if there is no overlap between Tj and its sibling Tjb
            for each Tjb and its descendants Tjb' do
                IR[Ti, Tjb*] := Ca;
            endfor
        endif
        for each child Ti' of Ti do
            for each child Tj' of Tj do
                compare (Ti', Tj');
            endfor
        endfor
        return
    'Role':
        IR[Ti, Tj] := Ro;
        for each child Ti' of Ti do
            compare (Ti', Tj);
        endfor
        for each child Tj' of Tj do
            compare (Ti, Tj');
        endfor
        return
    'Disjoint':
        IR[Ti, Tj] := Dsj;
        return
    endcase
end compare.

Figure 5: An Algorithm for Lattice-based Type Comparison.
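A runnable approximation of the lattice-based comparison can be written in Python as follows. It is a simplified sketch rather than the authors' implementation: lattices are given as hypothetical `{type: [children]}` dictionaries, the `oracle` callback stands in for the assertion that SAKSAT and the DBA would determine for a pair, the sibling "no overlap" steps are deliberately omitted, the disjoint case records Dsj, and the type names and answers in the example are invented for illustration.

```python
UN, EQ, GE, SP, RO, CA, DSJ = "Un", "Eq", "Ge", "Sp", "Ro", "Ca", "Dsj"


def descendants(children, t):
    """All types below t in a lattice given as {type: [child, ...]}."""
    found = []
    for child in children.get(t, []):
        for d in [child] + descendants(children, child):
            if d not in found:  # a lattice node may be reachable twice
                found.append(d)
    return found


def compare_lattices(root1, children1, root2, children2, oracle):
    """Fill the inter-schema assertion matrix IR, starting from the roots."""
    types1 = [root1] + descendants(children1, root1)
    types2 = [root2] + descendants(children2, root2)
    ir = {(a, b): UN for a in types1 for b in types2}

    def compare(ti, tj):
        if ir[(ti, tj)] != UN:
            return
        result = oracle(ti, tj)  # assumed to return one of the codes above
        ir[(ti, tj)] = result
        if result == CA:    # descendants are category related as well
            for a in descendants(children1, ti):
                for b in descendants(children2, tj):
                    ir[(a, b)] = CA
        elif result == SP:  # descendants of Ti also specialize Tj
            for a in descendants(children1, ti):
                ir[(a, tj)] = SP
            for b in children2.get(tj, []):
                compare(ti, b)
        elif result == GE:  # Ti also generalizes descendants of Tj
            for b in descendants(children2, tj):
                ir[(ti, b)] = GE
            for a in children1.get(ti, []):
                compare(a, tj)
        elif result == EQ:
            for a in children1.get(ti, []):
                for b in children2.get(tj, []):
                    compare(a, b)
        elif result == RO:
            for a in children1.get(ti, []):
                compare(a, tj)
            for b in children2.get(tj, []):
                compare(ti, b)
        # DSJ: sub-lattices below a disjoint pair are not explored

    compare(root1, root2)
    return ir


# Hypothetical lattices and canned answers; unlisted pairs count as disjoint.
lattice1 = {"Person": ["Student", "Academic"]}
lattice2 = {"Staff": ["Management", "Salaried"]}
answers = {("Person", "Staff"): GE, ("Academic", "Staff"): RO,
           ("Academic", "Salaried"): RO}
ir = compare_lattices("Person", lattice1, "Staff", lattice2,
                      lambda a, b: answers.get((a, b), DSJ))
for pair in sorted(ir):
    print(pair, ir[pair])
```

Pairs below a disjoint pair keep the value Un, which reflects the pruning idea: once two types are found to be disjoint their descendants are not compared.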
4 CONCLUSIONS

Determining the structural and semantic relatedness of types is paramount in our effort to analyze and integrate diverse component database schemas in a multidatabase environment. Unless one understands what the individual object types mean, how they are organized, how they are semantically related to each other, and if there are any conflicts between them, one cannot integrate them effectively. The most critical resource in our effort to resolve these problems is knowledge of the internal structures and semantics of conceptual schemas. In this paper we identified several fundamental issues which have to be addressed during schema analysis in the context of object-oriented multidatabases. We introduced a formal framework for determining commonalities and inferring similarities between inter-schema types, classified inter-type assertions according to the semantics and structure of analyzed types and proposed an algorithm for type comparison in inheritance lattices. Currently, a case-based methodology is being explored to semi-automate the process of discovering structural and semantic conflicts and relatedness of type components from disparate component data sources. As this methodology leans heavily on reasoning, deductive and object-oriented facilities, our implementation vehicle relies on a novel data model based on a merger of the object-oriented database system ONTOS and the expert system shell CLIPS.

Acknowledgments

We gratefully acknowledge the contribution of Brendan McKay for his suggestions and helpful comments regarding an earlier version of this article.

References

[1] F. Manola, "Object-Oriented Knowledge Bases", AI Expert, vol., no. 3, March 1990.

[2] W. Kim et al., "A Distributed Object-Oriented Database System Supporting Shared and Private Databases", ACM Trans. on Information Systems, vol., no. 1, Jan. 1991.

[3] F. Manola et al., "Distributed Object Management", Int'l Journal of Intelligent & Cooperative Information Systems, vol. 1, no. 1, 1992.

[4] M.P. Papazoglou, L. Marinos, "An Object Oriented Approach to Distributed Data Management", Journal of Systems & Software, vol. 11, no. 2, 1990.

[5] C. Batini, M. Lenzerini, S. B. Navathe, "A Comparative Analysis of Methodologies for Database Schema Integration", Computing Surveys, vol. 18, no. 4, December 1986.

[6] A. Silberschatz et al., "Database Systems: Achievements and Opportunities", Communications of the ACM, vol. 34, no. 10, October 1991.

[7] B. M. Johansson and C. Sundblad, "View Integration: A Knowledge Problem", SYSLAB Working Paper, no. 15, 1988.

[8] S. Navathe et al., "Integrating User Views in Database Design", IEEE Computer, vol., no. 1, January 1986.

[9] M. V. Mannino et al., "A Rule-Based Approach for Merging Generalization Hierarchies", Information Systems, vol. 13, no. 3, 1988.

[10] A. P. Sheth and J. A. Larson, "A Tool for Integrating Conceptual Schema and User Views", in Proceedings of 4th Conf. on Data & Knowledge Eng., LA, Feb. 1988.

[11] J. A. Larson et al., "A Theory of Attribute Equivalence in Database with Application to Schema Integration", IEEE Trans. on Software Engineering, vol. 15, no. 4, April 1989.

[12] S. Hayne and S. Ram, "Multi-User View Integration System (MUVIS): An Expert System for View Integration", in Proceedings of 6th Conf. on Data & Knowledge Eng., Feb. 1990.

[13] L. DeMichiel, "Resolving Database Incompatibility: An Approach to Performing Relational Operations over Mismatched Domains", IEEE Trans. on Knowledge & Data Eng., vol. 1, no. 4, Dec. 1989.

[14] A.P. Sheth, H. Marcus, "Schema Analysis and Integration: Methodology, Techniques and Prototype Toolkit", Bellcore, Techn. Memo: TM-STS-019981/1, March 1992.

[15] W. Kim and J. Seo, "Classifying Schematic and Data Heterogeneity in Multidatabase Systems", IEEE Computer, Dec. 1991.

[16] W. Kent, "The Many Forms of a Single Fact", Spring COMPCON89, San Francisco, March 1989.

[17] W. Kent, "The Breakdown of the Information Model in Multi-Database Systems", SIGMOD Record, vol. 20, no. 4, Dec. 1991.

[18] A.P. Sheth, S.K. Gala, "Attribute Relationships: An Impediment in Automating Schema Integration", in Proceedings of Workshop on Heterogeneous Databases Systems, Chicago, Dec. 1989.

[19] A. Savasere et al.,

[27] M. Atkinson et al., "The Object-Oriented Database System Manifesto", in Procs. 1st Deductive Object-Oriented Database Conf., Kyoto 1989.

[28] S. J. Russell, "The Use of Knowledge in Analogy and Induction", Research Notes in Artificial Intelligence, Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1989.

[29] M.P. Papazoglou, "Roles: A Methodology for Representing Multifaceted Objects", in Procs. Database Expert Systems and Applications (DEXA)-91, Berlin, August 1991.