Proceedings of the 28th Annual Hawaii International Conference on System Sciences - 1995
An N-ary View Integration Method Using Conceptual Dependencies
Nabil R. Adam*, MS/CIS Department, Rutgers University, Newark, NJ 07102
[email protected]
Aryya Gangopadhyay, School of Business and Management, Morgan State University
*We would like to express our thanks to the referees for their valuable comments.
Abstract
The purpose of this paper is to describe a methodology that can integrate n database views simultaneously. The methodology consists of transforming a database view into an intermediate representation, based on the Conceptual Dependency theory. The conceptual representations corresponding to the views are then combined to form a “global” representation, which is subsequently converted back to a data model that represents the global, integrated schema. Our methodology makes use of the semantic content of a database view in the integration process, unlike other view integration methodologies proposed in the literature. We show, by examples, how this approach can eliminate multiple restructuring of constituent views in the integration process.
1 Introduction
View integration refers to the process of combining multiple database views into a single one (e.g., [2, 3, 4, 5, 6, 8, 10, 12, 13, 14, 15, 19, 22]). Typically, multiple views develop as a result of having independently developed application views. For example, the finance department focuses on the finance aspect of the organization, whereas the production department focuses on the production-related aspect of the same organization. Thus, there could be a database view developed for the “finance” department, and another for the “production” department. These views could be mutually exclusive, where no entity or relationship in one is represented in the other, or overlapping, where some entities and relationships are represented in both.
Integrating multiple views into a global schema requires equivalent concepts to be identified and merged, and dissimilar concepts to be kept as separate elements in the global schema. Integration may also require restructuring individual views. As an example, two entities in individual views may be generalized into a higher-level entity in the combined one. In such cases, the relationships in which the entities participated before integration may also need to be restructured during integration. Thus, view integration is a semantic unification process that makes use of restructuring rules to establish similarity, equivalence, dissimilarity, or conflict among concepts.
Similarity could be measured at the attribute level, entity level, relationship level, or view level. Attribute-level similarity can be measured based on the names and domains of values. Entity-level and relationship-level similarities can be measured on names, structure, constraints, and population. View-level similarity can be measured in terms of the constituent entities and relationships.
A view integration method can start with high-level application views developed for different functional areas and integrate them into a global schema. This approach has been referred to as the top-down approach in the view integration literature (e.g., [7]). Unlike the top-down approach, a bottom-up approach starts with the simplest views (often those developed for the lowest-level activities of a functional model or a hierarchy chart of a process model), and performs a successive integration of simple to complex views until the global schema is formed. It has been noted, e.g., in [7, 1], that the bottom-up approach leads to less complexity than the top-down approach. This is because, at least in the initial stages of the integration process, the bottom-up approach deals with fewer components. We note here that even such an approach is not completely bottom-up, and the advantage decreases as the integration progresses.
Still another aspect of view integration is the number of views that are integrated at each stage of the process.
View integration methodologies, in general, can be categorized as binary, where two views are integrated into one, or n-ary, where any number of views are integrated at once. As shown in [9], a binary view integration method will typically require more restructuring than an n-ary view integration method. In this paper, we suggest a methodology that is n-ary and purely bottom-up in nature. The proposed methodology draws upon the work done in the theory of conceptual dependencies, originally proposed in the area of natural language processing [17, 18]. The methodology consists of decomposing the views into their constituent elements and representing them as parts of a conceptual diagram, the diagram being obtained from the first view. Since conceptual diagrams were originally designed for representing natural language sentences, they provide a rich semantic construct for describing and integrating database views.
The rest of the paper is organized as follows. A brief description of the theory of conceptual dependencies and details of the proposed methodology are presented in Section 2. The advantages of the proposed methodology are discussed in Section 3. Finally, our conclusion is presented in Section 4.
Proceedings of the 28th Hawaii International Conference on System Sciences (HICSS '95) 1060-3425/95 $10.00 © 1995 IEEE
2 Integrating Database Views
In this section, we begin with a brief description of the theory of conceptual dependencies. We then discuss the proposed methodology together with an illustrative example. A more detailed discussion of the methodology and the related proofs are included in [9].
2.1 The Conceptual Dependency Theory
The Conceptual Dependency Theory was developed by Roger Schank [17, 18], and was subsequently used for developing natural language processing systems like MARGIE [16], SAM and PAM [21], and natural language generation systems (for example, BABEL [11]). In this section, we emphasize those aspects of the Conceptual Dependency (CD) theory that are relevant to our work.
There are three fundamental concepts in the CD theory: Nominal, Action, and Modifier. A nominal, also called a picture producer (PP), is a physical object that can be thought of by itself, without the need for relating to some other concept. Examples of PPs are common nouns such as customer, product, and material, as well as proper nouns such as “John” and “minestrone”. A nominal is the conceptual counterpart of an entity.
A PP can perform certain actions (ACTs). ACTs are the conceptual counterparts of what appear as verbs at the sentential level. There are eleven primitive ACTs in the CD theory. All verbs are broken down into their basic conceptual elements by making use of the primitive ACTs. An action is the conceptual counterpart of a relationship.
A modifier is a concept that describes a property of a nominal or an action. A modifier is the conceptual counterpart of an attribute. There are two types of modifiers: those modifying PPs are called PAs (for Picture Aiders), and those modifying ACTs are called AAs (for Action Aiders). A PA could be an attribute like color, and also its value, like red. There are very few AAs, an example of which is speed, for the primitive ACT “propel”. The CD theory also provides a way of representing dependencies among these concepts.
2.2 Methodology
The first step in integrating views is to convert them into conceptual diagrams. We start with the first view and develop the corresponding CD. The next view is then added to the CD already developed. This process continues until all views are incorporated into the CD, which is the global CD. The global CD is converted back to the database schema, which is the global database schema. Below is a detailed discussion of each of these major steps.
2.3 Conversion of a View to a Conceptual Diagram
Converting a given view to a conceptual diagram is accomplished by applying the following rules.
Rule 1. An entity is represented as a PP.
Rule 2. An attribute is represented as a modifier to the corresponding PP or ACT.
Rule 3. A binary relationship is represented as an ACT, where the entity is the PP that perpetrates the ACT.
Rule 4. An N-ary relationship, where N is greater than 2, is represented as cases as follows:
Rule 4.1. If a relationship describes a transfer of possession of an object, then it is represented as a recipient case, where the object transferred is the objective case of the act of transfer, the final possessor is the recipient, and the initial possessor is the actor.
Example. A receiving clerk receives raw material from vendors. Figures 1 and 2 represent the database view and the corresponding CD. In Figure 2, R denotes the recipient case and the direction of the arrow represents the direction of transfer of the possession of the object.
Figure 1: Receiving Raw Material From Vendors: Database View
Figure 2: Receiving Raw Material From Vendors: CD Diagram
Rule 4.2. If a relationship describes the physical transfer of an object, it is represented as a directive case, where the initial and final locations represent the direction of the motion, and the object transferred and the entity responsible for the transfer represent the objective case and the actor, respectively, for the act.
Example. A mover moves raw material from the receiving dock to the inventory area. Figures 3 and 4 represent the database view and the corresponding CD. In Figure 4, D and O represent the directive and objective cases respectively, the arrows show the direction of the transition of a physical object, and the symbol ⇔ represents a mutual dependency between the actor and the action.
Figure 3: Moving Raw Material From Receiving Dock: Database View
Figure 4: Moving Raw Material From Receiving Dock: CD Diagram
Rule 4.3. If a relationship describes an action that results in the generation of an object through the use of another object, then the former is the objective case for the act of generation, the latter (the instrument) is the objective case of a “dummy” action (usually “DO”), the entity responsible for the act of generating the former is the actor, and the actor, the instrument, and the dummy action “DO” together form the instrumental case.
Example. The machine operator uses raw material and a machine to manufacture the finished product. Figures 5 and 6 represent the database view and the corresponding CD. In Figures 5 and 6, M/C Op represents a machine operator, I stands for the instrumental case, and O stands for the objective case. “Uses” is the dummy action.
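The rules above can be illustrated with a small sketch. The class names `PP` and `Act`, the three helper functions, and the role keys (`object`, `recipient`, `source`, `destination`, `instrument`) are assumptions made for this illustration only; they are not notation from the paper, and only a single instrument is modeled for Rule 4.3.

```python
from dataclasses import dataclass, field


@dataclass
class PP:
    """Picture producer: the CD counterpart of an entity (Rule 1)."""
    name: str
    modifiers: dict = field(default_factory=dict)  # PAs, i.e. attributes (Rule 2)


@dataclass
class Act:
    """An ACT: the CD counterpart of a relationship (Rules 3 and 4)."""
    name: str
    actor: PP
    case: str = ""  # "" binary, "R" recipient, "D" directive, "I" instrumental
    roles: dict = field(default_factory=dict)


def possession_transfer(name, initial_possessor, final_possessor, transferred):
    # Rule 4.1: recipient case; the initial possessor is the actor.
    return Act(name, initial_possessor, "R",
               {"object": transferred, "recipient": final_possessor})


def physical_transfer(name, mover, transferred, source, destination):
    # Rule 4.2: directive case; source and destination give the direction.
    return Act(name, mover, "D",
               {"object": transferred, "source": source,
                "destination": destination})


def generation(name, producer, product, instrument, dummy="DO"):
    # Rule 4.3: instrumental case; the instrument is the object of a dummy ACT.
    uses = Act(dummy, producer, "", {"object": instrument})
    return Act(name, producer, "I", {"object": product, "instrument": uses})


# The three examples of this section, expressed with the hypothetical helpers.
clerk, vendor, material = PP("Receiving Clerk"), PP("Vendor"), PP("Raw Material")
receives = possession_transfer("Receives", vendor, clerk, material)
moves = physical_transfer("Moves", PP("Mover"), material,
                          PP("Receiving Dock"), PP("Inventory"))
makes = generation("Manufactures", PP("Machine Operator"),
                   PP("Finished Product"), material, dummy="Uses")
```

Each helper fixes which participant plays the actor, which mirrors how the rules assign cases; a fuller version would also need to carry AAs on ACTs and several instruments.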
Figure 5: Machine Operator Produces Finished Product From Raw Material: Database View
2.3.1 Integrating Conceptual Diagrams
Two or more CDs can be integrated to form a composite CD. In integrating CDs, equivalent PPs can be merged, or generalized/specialized to higher/lower level PPs. ACTs can be merged if the constituent PPs are equivalent and the CDs represent the same case. Dissimilar components are simply added to expand the CD. As an example of generalization, let us assume that we have the following relationship depicted in a separate view: every employee works in a department. The “Receiving Clerk”, “Mover”, and “Machine Operator” can be generalized to “Employee”, using the ISA relationship. We note that all relationships of “Employee” must also hold for the specializations “Receiving Clerk”, “Mover”, and “Machine Operator”. The integrated CD is represented in Figure 7. The corresponding schema is represented in Figure 8.
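The merging and generalization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: a CD is reduced to a set of PP names plus a list of ACT tuples, PPs with the same name are taken to be equivalent, and the functions `integrate` and `acts_of` are hypothetical names introduced here, not part of the paper.

```python
def integrate(diagrams, isa=None):
    """Merge CDs given as (pp_names, acts) pairs into one composite CD.

    Each ACT is a hashable tuple (name, case, roles), where roles is a tuple
    of (role, pp_name) pairs. PPs with the same name are treated as equivalent.
    """
    isa = isa or {}
    pps, acts = set(), set()
    for pp_names, diagram_acts in diagrams:
        pps |= set(pp_names)        # equivalent PPs collapse into one node
        acts |= set(diagram_acts)   # identical ACTs (same PPs, same case) merge
    # Generalization: link each specialized PP to its higher-level PP.
    isa_edges = {(child, parent) for child, parent in isa.items() if child in pps}
    pps |= {parent for _, parent in isa_edges}
    return pps, acts, isa_edges


def acts_of(pp, acts, isa_edges):
    """ACTs in which pp takes part, directly or through a direct ISA parent."""
    names = {pp} | {parent for child, parent in isa_edges if child == pp}
    return {a for a in acts if any(p in names for _, p in a[2])}


receiving = ({"Vendor", "Receiving Clerk", "Raw Material"},
             [("Receives", "R", (("actor", "Vendor"),
                                 ("object", "Raw Material"),
                                 ("recipient", "Receiving Clerk")))])
moving = ({"Mover", "Raw Material", "Receiving Dock", "Inventory"},
          [("Moves", "D", (("actor", "Mover"),
                           ("object", "Raw Material"),
                           ("source", "Receiving Dock"),
                           ("destination", "Inventory")))])
staffing = ({"Employee", "Department"},
            [("Works in", "", (("actor", "Employee"),
                               ("object", "Department")))])
isa = {"Receiving Clerk": "Employee", "Mover": "Employee"}

pps, acts, isa_edges = integrate([receiving, moving, staffing], isa)
mover_acts = {a[0] for a in acts_of("Mover", acts, isa_edges)}
```

All views are processed in one pass, which is the n-ary aspect; `acts_of` reflects the remark that every relationship of “Employee” must also hold for its specializations, although it follows only one ISA level.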
Figure 6: Machine Operator Produces Finished Product From Raw Material: CD Diagram
Figure 7: The Integrated CD Diagram
2.4 Conversion of a CD to the Conceptual Schema
Given an integrated CD, the next step is to convert it back to the database schema, in the form of an ER diagram. The resulting schema is the global database schema. Converting a CD to an ER diagram is accomplished by applying the following procedure.
1. If the node is a PP, represent it as an entity.
2. If the node is an ACT, represent it as a relationship.
3. If the node is a PA or an AA, represent it as an attribute.
4. If the arc is binary and connects a PP with a PA or an ACT with an AA, represent the dependent concept as an attribute of the governor concept.
5. If the arc is directional and joins three PPs $p_i$, $p_j$, $p_k$, then
(a) If $\exists\, p_i, p_j, p_k \mid (p_i, p_j, p_k \rightarrow p_l)$, where $i \neq j \neq k \neq l$, then there are four associated PPs and we should create a 4-way relationship, connecting the four PPs by the ACT.
Note: the above condition implies that for a specific group of instances of $p_i$, $p_j$, $p_k$, there is a specific instance of $p_l$. Thus, the 4-way relation can have as its primary key $(t_i, t_j, t_k, t_l)$, where $t$ represents the unique identifier of the corresponding $p$. This justifies having a 4-way relation that makes use of the functional dependency stated in the condition. A similar reasoning has been used for ternary relationships in [20].
(b) If the above condition cannot be met, then if $\exists\, p_i, p_j \mid (p_i, p_j \rightarrow p_k)$, where $p_i$, $p_j$, $p_k$ are any of the three PPs object, source, and destination, and $i \neq j \neq k$, then create a 3-way relationship, connecting the three PPs by the ACT. The actor is connected by a binary relationship with the object.
Note: since a 4-way relationship is not supported, we try for a ternary relationship. We hypothesize that the most natural group of candidates for a ternary relationship is (object, source, destination). This is because the only other possible group (actor, source, destination) is semantically incomplete, since the object is essential in such a grouping.
(c) If the above conditions are not satisfied, then create three binary relationships: $(p_i, p_j)$, $(p_j, p_k)$, and $(p_i, p_k)$.
6. If the arc is of type R (recipient case), then there are three associated PPs: sender, recipient, object (or actor, object, location) such that:
(a) If $\exists\, p_i, p_j \mid (p_i, p_j \rightarrow p_k)$, where $p_i$, $p_j$, $p_k$ are any of the three PPs sender, recipient, object (or actor, object, location), and $i \neq j \neq k$, then create a 3-way relationship, connecting the three PPs by the ACT.
(b) If the above condition is not satisfied, then create three binary relationships: $(p_i, p_j)$, $(p_j, p_k)$, and $(p_i, p_k)$.
7. If the arc is connecting a PP with an ACT such that the PP is an objective case of the ACT, then connect the PPs through the ACT.
8. If the arc is connecting two PPs such that there is a locative or possessive dependency between them, then connect the two PPs by a relationship “location”, if the dependency is locative, or “possessed-by”, if the dependency is possessive.
9. If the arc represents a change in the state of an attribute of a PP, with the PP connected to its initial and final states, then represent the state as an attribute of the PP and the initial and final values as values of the attribute.
3 Advantages of The Proposed Methodology
The proposed methodology performs the integration by making use of the semantic structure underlying the views. Working with conceptual dependencies has some advantages over working with the views themselves, including the following.
• CDs are smaller structures than views. Unlike a view, which can consist of multiple semantic units, a CD is a “complete” semantic unit. Thus, a view can be broken up into a number of CDs. For example, “Employee sends P.O. to a vendor and the vendor supplies material to the employee” can be represented in one view. However, there would be two CDs corresponding to it: “Employee sends P.O. to vendor”, and “Vendor sends material to employee.”
• The semantic units in a view may be related to each other (for example, through causality), which is not explicitly represented in a view. These relationships can be used in resolving conflicts. Consider, for example, the following two CDs:
CD1: “Employee sends P.O. to a vendor”.
CD2: “Supplier sends material to employee”.
Here we have two naming conflicts:
- Synonyms: vendor and supplier.
- Homonyms: sends in the two CDs.
We notice here that the event in CD1 causes that in CD2. In this case, the recipient in CD1 is prompted to act in CD2. Thus, the recipient case in CD1 is the same as the ACTOR in CD2. Therefore, “vendor” and “supplier” are synonymous. In addition, the ACTs “sends” in CD1 and CD2 are homonyms, i.e., the same word with different meanings. This can be recognized by the fact that the event in CD1 causes that in CD2, and thus the two CDs describe two different actions. Hence, these two ACTs must be represented as separate relationships.
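The causality argument above can be sketched in a few lines. The `CD` tuple and the function `naming_conflicts` are hypothetical names introduced for this illustration, and the causal link between the two CDs is assumed to be known in advance rather than inferred.

```python
from typing import NamedTuple


class CD(NamedTuple):
    actor: str
    act: str
    obj: str
    recipient: str


def naming_conflicts(cause: CD, effect: CD):
    """Find naming conflicts between two CDs where `cause` triggers `effect`."""
    synonyms, homonyms = [], []
    # The recipient of the causing event is prompted to act in the caused
    # event, so differing names for these two roles denote the same concept.
    if cause.recipient != effect.actor:
        synonyms.append((cause.recipient, effect.actor))
    # Causally ordered events are distinct actions, so a shared ACT name is
    # the same word with different meanings.
    if cause.act == effect.act:
        homonyms.append(cause.act)
    return synonyms, homonyms


cd1 = CD("Employee", "sends", "P.O.", "Vendor")
cd2 = CD("Supplier", "sends", "Material", "Employee")
synonyms, homonyms = naming_conflicts(cd1, cd2)
```

For CD1 and CD2 this reports “Vendor”/“Supplier” as synonyms and “sends” as a homonym, matching the discussion above; deciding that one CD causes another is the part that needs the semantic knowledge.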
Figure 8: The Integrated Database Schema
4 Conclusion
In this paper we outlined a methodology for integrating n database views simultaneously. The proposed methodology makes use of the Conceptual Dependency Theory that was developed by Roger Schank [17, 18]. Unlike other view integration methodologies proposed in the literature, the proposed methodology performs the integration by making use of the semantic structure underlying the views.
References
[1] N. R. Adam and A. Gangopadhyay. Integrating Functional and Data Modeling in a Computer Integrated Manufacturing System. In Proceedings of the Ninth International Conference on Data Engineering, April 1993.
[2] S. Al-Fedaghi and P. Scheuermann. Mapping Considerations in the Design of Schemas for the Relational Model. IEEE Transactions on Software Engineering, SE-7(1), 1981.
[3] C. Batini and M. Lenzerini. A Methodology for Data Schema Integration in the Entity Relationship Model. IEEE Transactions on Software Engineering, SE-10(6):650-663, 1984.
[4] C. Batini, M. Lenzerini, and S. B. Navathe. A Comparative Analysis of Methodologies for Database Schema Integration. ACM Computing Surveys, 18(4):323-364, 1987.
[5] M. Bouzeghoub and I. Comyn-Wattiau. View Integration by Semantic Unification and Transformation of Data Structures. In H. Kangassalo, editor, Entity-Relationship Approach: The Core of Conceptual Modeling, pages 381-398. North Holland, 1991.
[6] P. Buneman, S. Davidson, and A. Kosky. Theoretical Aspects of Schema Merging. In A. Pirotte, C. Delobel, and G. Gottlob, editors, Lecture Notes in Computer Science, volume 580. Springer-Verlag, 1992.
[7] J. L. Carswell and S. B. Navathe. SA-ER: A Methodology that Links Structured Analysis and Entity-Relationship Modeling for Database Design. In 5th International Conference on ER Approach, 1987.
[8] M. Casanova and M. Vidal. Towards a Sound View Integration Methodology. In Proceedings of the 2nd ACM SIGACT/SIGMOD Conference on Principles of Database Systems, pages 36-47, 1983.
[9] A. Gangopadhyay. Using Conceptual Dependencies for Database Design and Query Processing in a CIM Environment. PhD thesis, Rutgers University, May 1993.
[10] J. Geller, A. Mehta, Y. Perl, E. Neuhold, and A. Sheth. Algorithms for Structural Schema Integration. In P. Ng, C. V. Ramamoorthy, L. C. Seifert, and R. T. Yeh, editors, Proceedings of the Second International Conference on Systems Integration, pages 604-614, 1992.
[11] N. Goldman. Conceptual Generation. In Conceptual Information Processing, chapter 6, pages 289-371. North-Holland/American Elsevier, 1975.
[12] S. Navathe, R. Elmasri, and J. Larson. Integrating User Views in Database Design. IEEE Computer, 19:50-62, January 1986.
[13] S. Navathe and S. G. Gadgil. A Methodology for View Integration in Logical Database Design. In 8th International Conference on Very Large Databases, 1982.
[14] S. Navathe and G. Pernul. Design of Relational Databases. In M. C. Yovits, editor, Advances in Computers, volume 35, pages 1-80. Academic Press, Inc., 1992.
[15] S. Navathe, T. Sashidhar, and R. Elmasri. Relationship Merging in Schema Integration. In 10th International Conference on Very Large Databases, 1984.
[16] C. K. Riesbeck. Conceptual Analysis. In Conceptual Information Processing, chapter 4, pages 83-155. North-Holland/American Elsevier, 1975.
[17] R. C. Schank. Conceptualizations Underlying Natural Language. In R. C. Schank and K. M. Colby, editors, Computer Models of Thought and Language, chapter 5, pages 187-247. W. H. Freeman and Co., San Francisco, 1973.
[18] R. C. Schank. Conceptual Dependency Theory. In Conceptual Information Processing, chapter 3, pages 22-82. North-Holland/American Elsevier, 1975.
[19] T. Teorey and J. Fry. Design of Database Structures. Prentice-Hall, Englewood Cliffs, NJ, 1982.
[20] T. Teorey, D. Yang, and J. Fry. A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model. ACM Computing Surveys, 18(2):197-222, June 1986.
[21] R. Wilensky. Understanding goal-based stories. Technical report, Dept. of Computer Science, Yale University, 1978.
[22] S. B. Yao, V. Waddle, and B. Housel. View Modeling and Integration Using the Functional Data Model. IEEE Transactions on Software Engineering, SE-8(6):544-553, 1982.