CoCoA: A Conceptual Data Modelling Approach for Complex Problem Domains
BY JOHN ROBERT VENABLE B.S., The United States Air Force Academy, 1975 M.S., Management Science (Information Systems) The State University of New York at Binghamton, 1983 M.S., Advanced Technology (Computer Science) The State University of New York at Binghamton, 1985
DISSERTATION
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Advanced Technology (with specialization in Information Systems) from the Thomas J. Watson School of Engineering and Applied Science in the Graduate School of the State University of New York at Binghamton 1993
(c) Copyright by John Robert Venable 1993 ALL RIGHTS RESERVED
Accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Advanced Technology (with specialization in Information Systems) from the Thomas J. Watson School of Engineering and Applied Science in the Graduate School of the State University of New York at Binghamton
Dr. Joseph V. Cornacchio Computer Science, Watson School
December 21, 1993
Dr. Heinz K. Klein School of Management
December 21, 1993
Dr. Michal Cutler Computer Science, Watson School
December 21, 1993
Dr. Richard L. Baskerville School of Management (external examiner)
December 21, 1993
Dissertation Abstract "CoCoA: A Conceptual Data Modelling Approach for Complex Problem Domains" by John R. Venable Heinz K. Klein, Dissertation Advisor This dissertation research describes and evaluates the suitability of a conceptual data model named CoCoA (for Complex Covering Aggregation) for the description and modelling of complex problem domains. The dissertation additionally proposes a software architecture for information systems which support work in complex problem domains. A complex problem domain is one which exhibits the characteristics of (1) complex entities, (2) shared components, and (3) varying and overlapping levels of granularity. A complex entity is one which aggregates components which may include both other entities as well as the relationships between them. Some of these components may be shared with other complex entities. Examples of types of information systems which support complex problem domains include CASE (which supports development of information systems), CAD/CAM, CIM, Geographic Information Systems, Office Information Systems, and Enterprise Integration Systems. The research describes and evaluates the suitability of several popular conceptual data modelling approaches for modelling complex problem domains and compares them to the CoCoA conceptual data model. It then applies the CoCoA model to an extensive example in order to demonstrate and further evaluate the model's usefulness. The example application is to information systems modelling, which is a subset of the large complex problem domain which could be supported by CASE (or ICASE). Within information systems modelling, CoCoA is first applied to describe, model and integrate several information systems models which support the perspective of data flow modelling. The example is then extended to describe, model, and integrate the perspective of conceptual data modelling, which includes CoCoA itself, as well as other conceptual data models. Finally, a software architecture for information systems which support a complex problem domain is described, as well as a method for mapping the conceptual data models onto the architecture. Key Words: information systems model, information systems modelling language, data model, data abstraction, conceptual data model, object-oriented model, ER model, extended ER model, NIAM, Entity-Category-Relationship model, ECR Model, Object Modelling Technique, object-oriented analysis, object-oriented design, relational model, CAD/CAM, CASE, office information system, geographic information system, enterprise integration system, information systems development method v
This dissertation is dedicated to: My love, friend, and wife - Polly
ACKNOWLEDGEMENTS
This dissertation is not the result of my own effort only. It has been aided and enabled by the contributions of many people. I would like to gratefully acknowledge the help of those people, without whom much of what follows would never have even been more than vague ideas, much less a (finally) finished product. My apologies to those I have forgotten. The limitations of my memory do not diminish the significance of your assistance. First and foremost, I wish to thank my doctoral dissertation advisor, Dr. Heinz K. Klein. Heinz has been an inspiration to me now for nearly 10 years. He is the epitome of what it means to be a scholar, upholding high academic standards while maintaining an enthusiasm for the field of Information Systems. He has always been very supportive of my research, taking an active interest in what I have accomplished and remaining attentive to what I have had to say. In particular, he consistently encouraged me to write, write, and write. Clearly, he knew exactly what it would take for me to finish this work and, ultimately, he succeeded in getting me to achieve it. His active guidance and support are to a large extent responsible for the successful completion of this work. For that in particular, he has my sincere thanks. My sincere thanks also to Dr. Joseph V. Cornacchio, chairman of my PhD Committee and Dr. Michal Cutler, Ph.D. committee member, for their help and advice in setting up and executing this research. I would also like to thank Dr. James E. Senn (now at Georgia State University), my first PhD advisor, who started me on the road to the PhD. My interest in the field and much of what I know about information systems development came from him. My thanks also to Dr. Jorge Diaz-Herrera (who moved on to George Mason University), for advising me for a short while after Dr. Senn left before his own departure. I learned much from him that is reflected in this dissertation. I would especially like to thank Dr. Lars Mathiessen and Dr. Jan Stage at Aalborg University in Denmark, where I have been writing this dissertation for the past one and a half years. They have taken a very active role in my work, helping to guide, structure, and manage it, as well as reviewing some of my writing. I am sure I would not have been successful without their assistance and encouragement. I would also like to thank many others at Aalborg University, for their general encouragement and for making Aalborg the pleasant, supportive, collegial place that it is. In particular, I would like to thank Dr. Peter Axel Nielsen, Dr. Ivan Aaen, Dr. Carsten Sørensen (a true friend, now at RISØ), Dr. Christian Søndergaard Jensen (who advised vii
me to start writing this section earlier than I'd planned), Dr. Lars Bækgaard (thanks for both the comments and the music), Dr. Kasper Østerby, Dr. Bent Bruun Kristensen, and Dr. Kurt Nørmark. Many of them have provided comments and suggestions on some part of my work and all have provided needed encouragement. Thanks also to Jane Bengsten, Lise-Lotte Dahl Knudsen, Helle Verdier, Jette Mathiesen, Lisbeth Grubbe Nielsen, Lisbeth Karlsson, and my fellow PhD students at Aalborg, particularly Søren Dittmer, Birgitte Krogh, and Jan Damsgaard, for, among other things, making my stay in Denmark much easier and more fun. I would also like to acknowledge the intellectual contribution to this dissertation through many collegial discussions at various points during and before this dissertation work by many in and around the information systems community, but especially Dr. Richard Welke at Georgia State University, Dr. Kalle Lyytinen and Dr. Juhani Iivari at the University of Jyväskylä in Finland, Dr. Richard Baskerville and Dr. Duane Truex at Binghamton University, Dr. Tony Wasserman at the University of San Francisco and Interactive Development Environments, Richard Carpenter formerly of Index Technology, and Lou Mazzuchelli at Cadre Technologies. Thanks for their general interest and intellectual contribution also go to various other members of IFIP WG 8.2, including Dr. Rudi Hirschheim, Dr. Trevor Wood-Harper, Julie Travis, Dr. David Avison, Dr. Michael Newman, Drs. Julie and Ken Kendall, Dr. Hans Opelland, and many others. Many thanks go to those at Binghamton University. Some of those who provided support, encouragement, and intellectual stimulation at the School of Management (SOM) include Dr. Thomas Kelly (then Dean), Dr. Gary Roodman, Dr. Jerri Frantzve, Dr. Walter Einstein, Dr. George Westacott, Dr. Dallas Defee, Dr. Arieh Ullman, Dr. Richard Blizzard, Dr. Gabriel Ramirez, and Dr. Dennis Lasser. Thanks also to Alesia Wheeler, Terri Bower, Margaret Goodfellow, Marge Walker, and Frances Littlefield in SOM for their support. Thanks also to Dr. Phil Kraft of the Sociology Department. My thanks also to some exceptional teachers in the Watson School of Engineering at Binghamton University, who, besides teaching me a lot, greatly stimulated and inspired me, including Margaret Iwobi, Don Gause, and Dr. Narendra Goel. Many thanks also to Lary Jones for assistance in all things technical, but especially for help in getting my dissertation transferred to the USA and printed. My sincere thanks to Dr. Duane Truex and to Polly Venable in that regard also. Thanks also for their interest, support, and comments on my research to my fellow students participating in the PhD program in Information Systems at Binghamton, including Dr. Duane P. Truex III, Dr. Ojelanki Ngwenyama, Dr. U. Rex Dumdum, Dr. Joanne P. Hoopes, Margaret Hamilton, Dr. Delvin Grant, Dr. Rajeev Kaula, and Roy Alvarez (from Cornell University). Thanks as well for their support to other fellow students in Master's Degree programs including Barbara Kay and Carlaine Blizzard.
Thanks also to my many students at Binghamton, Central Connecticut State, and Aalborg Universities, who greatly stimulated, inspired, and encouraged my work, not to mention making it fun, but especially Irene Ash, Curtis Britt, Kim Conaboy, Marla Corell, Karen Cornacchio (now D'Andria), Kristian Dahl, Steve Feehan, Joe Florendo, Rob Gaughran, Laura Johnston, John Kong, Ib-Rene Kruse, Gil Madrid, Bryce Matsuoka, Angela and George Meikle, Brenda McBride, Pat Morris, Kathleen Murray, Eileen Pascal, Ernie Schirmer, Lisa Schumann, Karen Silvermintz (now herself a PhD student), Julie Singleton, Lu Su, Nina Watrous, Seth Weinstock, and last, but only alphabetically, Torben Worm. Teaching is especially great when there are students like you. In particular, I would like to acknowledge the intellectual contribution, comments, suggestions, encouragement, assistance, advice, support, and friendship of Dr. Duane P. Truex III. Thanks for everything, buddy, I couldn't have done it without you. Finally, my most sincere thanks go to my wife, Mary Elizabeth Smith Tate Venable (better known as Polly). Not only were you a great help, but you've stuck with me through all of this and I am deeply grateful. An uncountable number of thanks go to you.
TABLE OF CONTENTS
Copyright notice  iii
Signature page  iv
Abstract and Keywords  v
Dedication  vi
Acknowledgements  vii
Table of Contents  x

PART I: Introduction  1

Chapter 1 Introduction: Dissertation Research Topic and Method  3
  1.1 Complex Problem Domains  3
  1.2 Conceptualization vs. Implementation  6
  1.3 Research Method  7
  1.4 Organization of this Dissertation  9

PART II: A Conceptual Approach  11

Chapter 2 Literature Review  13
  2.1 Complex Problem Domains  13
    2.1.1 Complex Objects  14
    2.1.2 Multi-Granularity  15
    2.1.3 Partially Overlapping Components  16
    2.1.4 Combined Characteristics  17
    2.1.5 Example Domains  18
  2.2 Criteria for conceptual data models  20
    2.2.1 Some Principles of Data Model Design  21
    2.2.2 Other Criteria  23
    2.2.3 Synthesis of Criteria  25
    2.2.4 Comparison of Criteria  27

Chapter 3 Existing Conceptual Data Modelling Approaches  29
  3.1 Entity relationship attribute model  29
    3.1.1 Description  30
    3.1.2 Evaluation  32
  3.2 Extended entity relationship model  33
    3.2.1 Description  33
    3.2.2 Evaluation  36
  3.3 NIAM data model  37
    3.3.1 Description  38
    3.3.2 Evaluation  45
  3.4 Entity-Category-Relationship data model  49
    3.4.1 Description  50
    3.4.2 Evaluation  52
  3.5 Object Modelling Technique (OMT)  54
    3.5.1 Description  54
    3.5.2 Evaluation  57
  3.6 Summary  60
Chapter 4 The CoCoA Conceptual Data Model  63
  4.1 Entities  64
    4.1.1 Semantics  65
    4.1.2 Notation  66
  4.2 Named Relationships or Associations  67
    4.2.1 Semantics  67
    4.2.2 Notation  70
  4.3 Generalization and Specialization  71
    4.3.1 Semantics  71
    4.3.2 Notation  75
  4.4 Categorization  75
    4.4.1 Semantics  76
    4.4.2 Notation  77
  4.5 Covering Aggregation  78
    4.5.1 Semantics  78
      4.5.1.1 Simple Covering Aggregation  79
      4.5.1.2 Complex Covering Aggregation  82
    4.5.2 Notation  83
  4.6 Evaluation of the CoCoA Model  85
    4.6.1 Evaluation Against Criteria  86
      4.6.1.1 Criteria for Semantic Concepts  86
      4.6.1.2 Criteria for Syntactic Constructs  89
      4.6.1.3 Criteria for Relationship to Other Areas  91
    4.6.2 Support for Complex Problem Domains  92
      4.6.2.1 Support for Complex Objects  93
      4.6.2.2 Support for Multiple Granularity Levels  94
      4.6.2.3 Support for Shared Objects  95
Chapter 5 Comparison and Integration of Data Flow Models: An Example Use of the CoCoA Model  97
  5.1 IS Modelling Languages and Perspectives  98
  5.2 Modelling and Integrating an Individual Perspective  101
    5.2.1 Introduction  101
    5.2.2 View Integration  102
  5.3 ISMLs in the Data Flow (Functional) Perspective  105
    5.3.1 Data Flow Diagrams  106
    5.3.2 SADT Activity Diagrams  107
    5.3.3 ISAC A-Graphs  110
  5.4 Integrating the Data Flow Perspective  112

Chapter 6 Comparison and Integration of Conceptual Data Models: Expanding the Example Use of the CoCoA Model  119
  6.1 An Enhanced View Integration Method  120
  6.2 CoCoA Modelling of Conceptual Data Models  123
    6.2.1 Entity Relationship Model  124
    6.2.2 Complex Covering Aggregation (CoCoA) Conceptual Data Model  127
  6.3 Modelling of Data Abstraction Mechanisms  129
    6.3.1 Simple Entities  130
    6.3.2 Named Relationships or Associations  133
    6.3.3 Fact Types and Reference Types  140
  6.4 Integration of Conceptual Data Models  145
    6.4.1 Integration of Abstraction Mechanisms and Terminology  146
    6.4.2 Resolving Individual CoCoA Models with the Integrated CoCoA Model  157
      6.4.2.1 The ER Model  157
      6.4.2.2 The CoCoA Model  158
    6.4.3 Higher Levels of Conceptual Data Modelling  159
  6.5 Summary  160

PART III: A Possible Implementation Approach  161

Chapter 7 A Possible Internal Implementation: Objects and Relations  163
  7.1 Overview of the Architecture  164
  7.2 Implementation of CoCoA in the Relational DB Layer  169
    7.2.1 Entity Types  169
    7.2.2 Named Relationship Types  169
    7.2.3 Generalization-Specialization Relationships  171
    7.2.4 Categories  172
    7.2.5 Covering Aggregation  173
      7.2.5.1 Covering Entities  173
      7.2.5.2 Covering Relationships  174
    7.2.6 Surrogate Keys  174
  7.3 Implementation of CoCoA in the Object-Oriented Layer  175
    7.3.1 Data Representation  175
      7.3.1.1 Entity Types  176
      7.3.1.2 Named Relationship Types  176
      7.3.1.3 Generalization-Specialization  178
      7.3.1.4 Categories  179
      7.3.1.5 Covering Aggregation  180
      7.3.1.6 Surrogate Keys  181
      7.3.1.7 Summary  182
    7.3.2 Fundamental Operations  183
      7.3.2.1 Operations to Support Entities and Attributes  184
      7.3.2.2 Operations to Support Named Relationships  186
      7.3.2.3 Operations to Support Covering Aggregation  189
      7.3.2.4 Operations to Interface the Object-Oriented and Relational Levels  192
  7.4 An Example  197
    7.4.1 Relational Implementation of the ISAC IFIP Case Solution  197
    7.4.2 Object-Oriented Implementation of the ISAC IFIP Case Solution  203
  7.5 Summary  206

Chapter 8 A Possible External Implementation: The User Interface  207
  8.1 Review of the Architecture  208
  8.2 Level 3: Cognitive Level Operations  211
  8.3 Level 4: User Presentation and Dialogue  213
    8.3.1 User Presentation  213
    8.3.2 User Interface Style  215
    8.3.3 Dialogue Structure  215
    8.3.4 Implementation  215
  8.4 An Example  216
    8.4.1 Cognitive Level Operations  216
    8.4.2 User Interface Level  218
    8.4.3 An Example User Interface  219

PART IV: Conclusions  225

Chapter 9 Conclusions  227
  9.1 Contributions of the Research  227
  9.2 Limitations of the Approach  229
  9.3 Topics for Further Research  231

APPENDICES  235

Appendix A CoCoA Modelling of Additional Conceptual Data Models  237
  1. Extended Entity Relationship Model  237
  2. NIAM Data Model  238
  3. ECR Data Model  243
  4. OMT Model  246

Appendix B CoCoA Modelling of Additional Conceptual Data Modelling Abstractions  251
  1. Generalization and Specialization  251
  2. Categorization  254
  3. Covering Aggregation  256
  4. Objectification  259
  5. Object Types  260
  6. Derivation and Derived Concepts  262
  7. Constraints  264

Appendix C Reviewed CoCoA Models of Conceptual Data Models  271
  1. The EER Model  271
  2. The NIAM Model  273
  3. The ECR Model  280
  4. The OMT Model  282

Appendix D Fundamental Object-Oriented Operations  287
  1. Operations to Support Entities and Attributes  288
  2. Operations to Support Named Relationships  290
  3. Operations to Support Covering Aggregation  298
  4. Operations to Interface the Object-Oriented and Relational Levels  304
  5. Summary  315

Appendix E Implementation of ISAC A-Graph of the IFIP Case  317
  1. Relational Implementation of the IFIP Case Example  319
  2. Object-Oriented Implementation of the IFIP Case Example  327

BIBLIOGRAPHY  335
Part I Introduction
Chapter 1 Dissertation Research Topic and Method
This dissertation is concerned with conceptualization and understanding of complex problem domains. The dissertation's perspective is that we need and want to build computer-based information systems which support human work within many such domains. But, before we can design such systems, we must understand, thoroughly and precisely, the problem domains which they will support. In order to do that, we must have some means to help us deal with the complexity. One possible means to do that -- an improved conceptual data modelling technique -- is the topic of this dissertation. The purpose of this chapter is to prepare the reader for the rest of the dissertation. It will introduce the qualities of complex problem domains (section 1.1), discuss the secondary nature of our concern here with implementation of our conceptualization (section 1.2), present and justify the dissertation's method (section 1.3), and give the organization of the rest of this dissertation (section 1.4). 1.1 Complex Problem Domains The kinds of problem domains that we need to conceptualize and understand today in building information systems are becoming more and more difficult. We are building information systems which are larger and larger and more and more complex. We are especially building systems today which try to integrate many diverse areas or which try to support areas of human work which are themselves very complex. There are many areas which exhibit high levels of complexity, including Computer-Aided Design and Computer-Aided Manufacturing (CAD/CAM), its outgrowth - Computer-Integrated Manufacturing (CIM), Computer-Aided Software (or System) Engineering (CASE), and
Office Information Systems (OIS). In each of these problem domains, we want to be able to support multiple people, each of whom contribute their own part to the work on the whole. We would like to have each be able to work in a way which suits their individual needs, yet enable them to cooperate easily with other people who in turn work in their own way. The different work elements usually need to be distributed over time also - yet still be supported and kept consistent. Even with only a single person, we often want to be able to support different portions of the work at different times. Enabling flexible and easy sharing and division of work, while at the same time maintaining a consistency of the work, is a major goal in each of these domains. One approach to serving these multiple needs is to build an integrated set of tools. Before we can build tool sets which interact properly, we must understand the problem domains they support, in all their complexity. Improving these activities is a major goal of this research. Note that this research does not directly address conflicting views of the work objects (e.g., alternative designs) by different individuals, nor negotiations to resolve the conflicts. Conflicting views (like versions or configurations), arguments, resolutions, and the like, are treated as parts of the problem domain to be understood, which make the problem domain more complex. The difficulty in understanding these problem domains is not just because of the size of a domain, i.e., that it has many concepts which must be organized and related to each other. It is also because of the complex nature of the relationships between those concepts. In particular, these problem domains are complicated by three factors which haven't been dealt with adequately in existing conceptual data models. These factors and various conceptual data models' abilities to deal with them are discussed extensively later in this dissertation; they are only briefly introduced here. 4
First, complex problem domains are partly composed of complex objects (here we use the term object in a very general sense). Complex objects are composed of both other objects and of the relationships between them. As a simple example, a list is composed of items in that list. But it is also composed of the relationships between those items, i.e., that one item follows another in the list. Secondly, complex objects may exist at multiple levels of granularity. Some complex objects are very large and very complex while others are relatively small and simple. For example, in the CASE field, a project configuration may be extremely large, being composed of requirements documents, designs, programs, etc. Each of those things may in turn be composed of other things. We want to be able to deal with each of them as a whole sometimes (for example, baselining a particular configuration), and as being made up of components at other times (e.g., when we work on the components of a design). Third, many of the complex objects partially overlap, i.e., a complex object may share some (or all) of its component objects and/or relationships with another complex object. As an example (again drawn from CASE), a data definition which is a component of the requirements may be used within various design and implementation components. Complex problem domains must be precisely understood in spite of these factors before we can build tool sets which adequately serve people working within these problem domains. 1.2 Conceptualization vs. Implementation A conceptual data model should adequately model the problem domain. Additionally, a conceptual data model should serve as a bridge to an eventual implementation. Some papers written in the IS field discuss conceptual (or semantic or logical) data modelling together with their support in databases or other systems (see citations below). Unfortunately, some of these papers casually mix the conceptual aspects of the data modelling with implementation concerns. 5
While both of these concerns are important, the primary concern and contribution of this dissertation is to describe a data model which facilitates the conceptualization and understanding of the semantic content (or concepts) of complex, multiple-granularity-level, partially-overlapping problem domains. The complexity of the problem domains being addressed here requires it. Only afterward should we consider implementation concerns. On the other hand, we also cannot ignore the need to support implementation. Therefore, a subsequent part of this dissertation will discuss a possible means for implementing the proposed conceptual data model. Anything more than a high-level sketch of a possible means is beyond the scope of this dissertation. Furthermore, alternative implementation means are possible. 1.3 Research Method As noted above, this dissertation is primarily concerned with possible ways to better understand the kinds of complex, multiple-granularity-level, partially-overlapping problem domains which are challenging information systems builders today. It develops an approach to gaining a rich understanding of such a domain that can serve as a partial specification for a system which supports work in that problem domain. The dissertation is only secondarily concerned with implementation methods which support the conceptualization. As such it only proposes one possible implementation means. Research on the primary theme will follow a series of steps to provide both theoretical and empirical evidence in support of a new conceptual data modelling approach. First, the research will begin with a review of the literature in order to establish the distinguishing characteristics of the problem domains which are of interest in this dissertation. Second, criteria will be investigated and established for evaluating conceptual data modelling approaches.
Third, existing conceptual data modelling approaches will be surveyed and evaluated against the criteria. This will include characterizing how well or poorly they support description of the distinguishing characteristics of complex problem domains. Fourth, a new conceptual data modelling approach will be presented and evaluated. The presentation will highlight the new concepts and distinguish them from those borrowed from existing approaches. The new conceptual data model will be evaluated against the same criteria used to evaluate existing conceptual data models. This will provide theoretical evidence that the proposed approach is better. Finally, the new conceptual data model will be applied to a significant example of the problem domain to illustrate its benefits to the reader. This will provide empirical evidence of the benefits of the model, the evaluation being based on the individual reader's perception of the new conceptual data model's utility and benefits. Research on the secondary theme of implementation will be more limited. It will provide theoretical evidence of the implementability of the conceptual data model by describing a standard mapping process to a possible implementation software architecture. The possible implementation software architecture will comprise a relational database which supports partially overlapping complex objects and their multiple levels of granularity through explicit representation of those characteristics. An object-oriented level on top of that supports working with the objects at various levels of granularity, as desired by the application programs which may use them. The defined standard mapping process will translate individual components from the conceptual data model onto the constructs provided by the implementation mechanisms. This mapping will be to both the relational and object-oriented levels of the possible implementation software architecture. However, space limitations will prevent the presentation of empirical evidence, such
as the development and evaluation of prototypes of the architecture or the application of the mapping process to a large example (although a small example will be presented). 1.4 Organization of this Dissertation This dissertation is organized into four parts -- Part I: Introduction, Part II: A Conceptual Approach, Part III: A Possible Implementation Approach, and Part IV: Conclusions. The purpose of Part I (this chapter) is to prepare the reader for the rest of the dissertation by introducing the salient aspects of the research topic and succinctly stating the research to be done. The purpose of Part II is to present the primary research contribution, which directly addresses conceptual data modelling of complex, multiple-granularity-level, partially-overlapping problem domains. It is organized to follow the steps in the research method above. Chapter 2 will review the relevant literature to describe complex problem domains and establish criteria for evaluating conceptual data models. Chapter 3 will review existing alternative conceptual data modelling approaches from the literature. It will also evaluate those conceptual data models against the criteria established in chapter 2. Chapter 4 will define the proposed conceptual data modelling technique and evaluate it against the established criteria. This will particularly include showing how the new conceptual data model supports description of the salient aspects of complex problem domains. Chapter 5 uses the proposed conceptual data model to describe information systems models which use the data flow perspective. This serves as an example application of the new conceptual data model to give empirical evidence of its usefulness. Chapter 6 expands that example by using the new conceptual data model to describe information systems models which use the conceptual data modelling perspective (including itself). This is a more complex example which will also highlight the differences (and similarities) between the new conceptual data model and other
conceptual data models. The purpose of Part III is to present the secondary research contribution, which proposes a possible way to implement the conceptual data model. The proposed implementation architecture interfaces a relational database layer (which facilitates normalized storage of complex, partially-overlapping objects at various granularity levels) with a basic object-oriented layer (chapter 7), which in turn interfaces with additional object-oriented layers on up to the user interface (chapter 8). The purpose of Part IV of the dissertation (chapter 9) is to present the conclusions of the research.
Part II A Conceptual Approach
Chapter 2 Literature Review
The purpose of this chapter is to review some of the literature in the area of the dissertation research. In section 2.1, the idea of complex problem domain is examined. In section 2.2, criteria for conceptual data models are critically reviewed and a set of criteria are selected. 2.1 Complex Problem Domains The purpose of this dissertation is to contribute to our ability to understand and describe complex problem domains. Before we do that, we should dissect, classify, and describe the characteristics of complex problem domains which make them so difficult to comprehend. In that way we can better assess whether the tools which we use in helping ourselves to understand are effective, because we can begin to understand why they are effective or ineffective. Complex problem domains have three important characteristics which make them so complex. First, they are made up of complex objects rather than only simple objects. Second, these complex objects exist at multiple levels of granularity. Third, these complex objects partially overlap. Each of these characteristics is described in turn in the following three sections (2.1.1-2.1.3). Their combination will be considered in section 2.1.4. Examples of complex problem domains will be considered in section 2.1.5. 2.1.1 Complex Objects A complex object is one which is composed of other things. This is the more or less well-known concept or abstraction of aggregation. However, the aggregation concept, as it is commonly used, is insufficient to describe the kinds of complex objects which are 13
present in complex problem domains. We can distinguish three kinds of aggregation, the first two of which are insufficient. These will be discussed more thoroughly in chapter 4; only an overview is given here. First there is aggregation of attributes into an entity, like the well-known record structure or the attributes of a tuple in a relation (e.g., as in [Smith and Smith 1977a and 1977b]). This is certainly useful, but doesn't handle everything. In fact, objects defined using only this kind of aggregation are known as simple objects. Second, there is the aggregation of component objects within a higher level, composite object. This is also well-known (e.g., as in [Kim et al. 1988 and 1989]). To distinguish it from the first kind, this kind of aggregation has been called covering aggregation [Alagic], while the first kind has been called cartesian aggregation [Alagic].
The third kind of aggregation is less well-known. It is the aggregation of component objects and the relationships between them within a higher level, composite object. For example, a stack is not composed just of the items which are in it. It also is composed of the relationships between the objects, i.e., that one item is on top of another (except the bottom one). To distinguish it from the other two kinds of aggregation, I will call this complex covering aggregation. A tool which helps us understand complex problem domains should provide ways to help conceptualize of this kind of complex objects (those made by complex covering aggregation). 2.1.2 Multiple Levels of Granularity Besides only being composed of some number of complex objects, complex problem domains have such complex objects at multiple levels of granularity. Some of the complex objects may be small and composed only of simple (non-complex) objects and relationships. On the other hand, complex objects may be composed in turn of various 14
other complex objects. This composition from complex components may proceed to arbitrary depth (i.e., a complex object may be composed of complex objects which are composed of complex objects which are ...). For a simple example, a tree is composed of subtrees, which can in turn be composed of subtrees, ... etc. As a less homogeneous example, you could have a tree in which every leaf is a list and each list item is an array, ... and so on. The implications for this problem are particularly troublesome when it comes to implementation. We want (and need) to be able to handle objects at different levels of granularity at different times, but efficient handling of objects is best done at the highest possible level of granularity. This is particularly true when we want to store such objects in secondary storage or retrieve such objects back again. At any rate, we need to understand what the different levels of granularity are with which we may want to deal with the various parts of the particular complex problem domain. A tool which helps us to understand complex problem domains should also provide ways to help conceptualize of the multiple levels of granularity of complex objects. 2.1.3 Partially Overlapping Components In addition to the multiple levels of granularity of the complex objects, complex problem domains have one other characteristic that makes them so complex - the objects which are components may be components of more than one complex object. This means both that one type of object may have different instances which are components of objects of different types and also that an object instance may be a component of more than one instance of a complex object at the same time (and these objects may even be of different types). In particular, this last part is typical of objects which are logical in nature [Kim et al. 1989]. The above can be stated differently from the point of view of the complex object instead of from the point of view of the component object. Multiple complex objects 15
(which may even be of different types) may share one or more (but not necessarily all) component objects. A tool which helps us to understand complex problem domains should also provide ways to help us conceptualize the partial sharing of components among complex objects. Which objects (object types) can be shared and which cannot? Which other objects (object types) can they be shared among?
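To make the three characteristics just described concrete, the following sketch shows one minimal way they could be recorded in code. It is an illustration only, not part of the CoCoA model or any existing system; the class and object names (Entity, Relationship, ComplexObject, the list items, the "work package") are invented for this example.

```python
# A minimal, illustrative sketch: complex objects whose components include
# both other objects and the relationships between them.

class Entity:
    def __init__(self, name):
        self.name = name

class Relationship:
    """A named association between two or more entities."""
    def __init__(self, name, *participants):
        self.name = name
        self.participants = participants

class ComplexObject:
    """A complex covering aggregate: components may be entities,
    relationships, or other complex objects, and any component may be
    shared with other complex objects."""
    def __init__(self, name, components=()):
        self.name = name
        self.components = list(components)

# The list example used earlier: the list aggregates its items *and* the
# "follows" relationships that order them (complex covering aggregation).
a, b, c = Entity("item a"), Entity("item b"), Entity("item c")
follows_ab = Relationship("follows", b, a)
follows_bc = Relationship("follows", c, b)
todo_list = ComplexObject("to-do list", [a, b, c, follows_ab, follows_bc])

# Multiple granularity levels and partial overlap: a larger complex object
# contains the whole list as one component and also shares item b directly.
work_package = ComplexObject("work package", [todo_list, b])
```

Nothing in this sketch is specific to any particular conceptual data model; it merely illustrates the kind of structure that a model for complex problem domains must be able to describe.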
2.1.4 Combined Characteristics Even worse than each of these three characteristics by itself, complex problem domains are difficult to understand because these three characteristics may be combined in various ways.

Considering relationships as components vs. different granularity levels: A relationship may exist between objects at different levels of granularity.

Considering relationships as components vs. partially overlapping components: A relationship may also be a component of several different complex objects (which may also be at different levels of granularity!). On the other hand, a relationship might not be a component of a complex object, even though one of the objects that it relates to others is a component of that same complex object.

Considering different levels of granularity vs. partially overlapping components: Two or more components which are shared between two complex objects may be at different levels of granularity. A component may also be shared by different complex objects that are themselves at different levels of granularity. Stated otherwise, a complex object may share both small and large (or simple and complex) components with other objects, and a component may be shared by both large and small complex objects.

The interactions of the characteristics of complex problem domains are what make them particularly difficult to understand well. In order to better understand them, we
need techniques which make clear these characteristics and their interactions so that we can reason about them and understand them in all their complexity. 2.1.5 Example Domains There are a number of examples of problem domains which exhibit the characteristics described above. Examples include Computer-Aided Design and Manufacturing (CAD/CAM), Computer Integrated Manufacturing (CIM), Computer-Aided Software Engineering (CASE), Office Information Systems (OIS), and Geographical Information Systems. Each of these fields is working to get an understanding of the complexity of its problem domain. In each field, there are attempts to build complex information systems which integrate a variety of systems covering different parts of the respective problem domain. For example, in CAD/CAM, there is an attempt to integrate the logical design, physical design, the design for manufacturability, raw material requirements, and other resource requirements for products. With CIM, this is further integrated with production planning and control information, information about production resources such as equipment and employees, schedules, demand forecasts, and the like. We want to have systems for logical design of products which share information with systems for physical design, which in turn share information with systems for manufacturing design, and so on. We can also relate these to the characteristics which make problem domains particularly difficult to understand. For example, a production plan is a complex object which is composed of schedules, resource descriptions, and so forth. There are relationships between the various objects which are the components. For example, resources are assigned to the production of various products. Products and resources are scheduled at various times in the production schedule.
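As a rough, hypothetical illustration (the object and relationship names below are invented for this example and are not drawn from any particular CIM system), such a production plan can be seen as a complex covering aggregate whose components include both the participating objects and the assignment and scheduling relationships between them:

```python
# Hypothetical CIM fragment: a production plan aggregating objects and the
# relationships between them (plain dictionaries, for illustration only).
press = {"kind": "Resource", "name": "stamping press"}
widget = {"kind": "Product", "name": "widget"}
schedule = {"kind": "Schedule", "name": "week 12 production schedule"}

assigned_to = {"kind": "Relationship", "name": "assigned to",
               "participants": (press, widget)}
scheduled_at = {"kind": "Relationship", "name": "scheduled at",
                "participants": (widget, schedule)}

production_plan = {"kind": "ComplexObject", "name": "widget production plan",
                   "components": [press, widget, schedule,
                                  assigned_to, scheduled_at]}
```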
There are multiple levels of granularity. The components above are themselves complex objects. Pieces of equipment have component parts. Personnel have certain skills and training records. A production plan, composed of schedules, which are composed of resources etc., for a single product, may in turn be part of a higher-level production plan. Furthermore, the components may be shared components of other complex entities also. Equipment and personnel are part of a particular production facility or plant. They may be used in several different production plans for different products. A particular product may be in several different production plans, either under consideration or decided as the plan to be executed. As another example of a complex problem domain, in CASE, there is a need to integrate information and tools in all areas of system description and project and configuration management. For example, various portions of requirements specifications, such as data flow diagrams, entity relationship diagrams, etc., must be integrated with each other, then with design documents, such as hierarchy charts, object-oriented designs, pseudocode, and so on. These in turn should be integrated with implementation information, program code, etc. In Office Information Systems, there is a strong need to integrate all kinds of information - formal documents, databases of customer data and the like, forms, graphic images, presentations, informal and working documents, spreadsheets, and so forth. In fact, if taken to the extreme, you could consider CASE and CAD to be examples of support for kinds of office work. Geographical Information Systems also have a need to integrate a wide variety of data in complex ways. Many different kinds of information may be related to various locations. Examples include such information as demographic information on population - race, religion, income level, family structure, education, etc. - physical information such 18
as location of facilities - schools, roads, hospitals, fire departments, etc. - and other physical characteristics - topography, soil composition, water availability and quality, etc. - even political characteristics - voting and school districts, political party affiliation, city and county boundaries, etc. - all of which can be associated with geographic locations or regions. These pieces of information can be related to each other and aggregated together along with the geographic areas. Relating all these things can be very complex and difficult. Various pieces of information can be shared across different aggregates - for example, they might belong to a particular political district and also to a particular watershed. 2.2 Criteria for conceptual data models In order to evaluate the suitability of a conceptual data model, we will need to have criteria against which to evaluate it. Of course, we can say immediately that for a given conceptual data modelling technique to be useful in our situation, it must support modelling situations with the characteristics of complex problem domains. That is, it must support modelling complex objects with multiple levels of granularity and partially overlapping components. Beyond these functional criteria, however, we can determine other criteria which evaluate the usability of the conceptual data model from other points of view, such as ease of learning, comprehension, or use. This has been discussed in the IS literature. The remainder of this section will review, compare, evaluate, and enhance criteria developed in the literature, then establish a synthesized set of criteria for use later in the dissertation. Section 2.2.1 presents and discusses criteria from the literature. Section 2.2.2 presents some additional criteria. Section 2.2.3 synthesizes a set of criteria which will be used to evaluate conceptual data models later in this dissertation. 2.2.1 Some Principles of Data Model Design As [Shneiderman] (p. 161) notes, [McGee] gives criteria for evaluating data models
based on human factors considerations. Presumably this means that a model which meets these criteria should help the human user to understand the situation (problem domain) being modelled. The criteria include:

Simplicity - "... should have the smallest possible number of structure types, composition rules, and attributes."

Elegance - "... should be as simple as possible for a given direct modelling capability."

Picturability - "... should be displayable in pictorial form."

Modelling directness - "... should not provide equivalent direct modelling techniques."

Overlap with co-resident models - "... should mesh smoothly with other co-resident models."

Partitionability - "... should have structures which facilitate the administrative partitioning of data."

Non-conflicting terminology - "... use terminology which does not conflict with established terminology."

This description by Shneiderman is now rather dated, since he used it to compare the network, hierarchical, and relational models, alluding only briefly to "models which can represent more semantic information." Still, while there seems to be some overlap among them (e.g., simplicity and elegance) and they are vague ("ill-defined" according to Shneiderman), many of the above criteria are useful here.

More recently, [Batini et al. 1992] (pp. 29-30) identify four qualities of conceptual data models. They also identify two properties of graphic representations, noting that "The success of the model is often highly correlated with the success of its graphic representation, ...". Clearly they have adopted the picturability criterion from McGee as a must. These six things can also be interpreted as criteria. They are:

Expressiveness - "... models that are rich in concepts are also very expressive."

Simplicity - "... so that a schema built using that model is easily understandable to the designers and users ..."

Minimality - "... every concept present in the model has a distinct meaning from every other concept ..."

Formality - "Formality requires that all concepts have a unique, precise, and well-defined interpretation."

Graphic completeness - "A model is graphically complete if all its concepts have a graphic representation."

Ease of reading - "A model is easy to read if each concept is represented by a graphic symbol that is clearly distinguishable from all other graphic symbols."

Note that simplicity as defined by Batini et al. is not the same as defined by McGee. Note also that minimality as defined by Batini et al. is not like McGee's definition of simplicity, as we might expect, but more like McGee's definition of modelling directness.

2.2.2 Other Criteria

In addition to the above criteria from the literature, we can establish other criteria. Those additional criteria include:

Direct correspondence - There should be a direct correspondence of the model constructs to the features of the problem domain. This makes modelling simpler.

Correspondence to conceptual structures - The modelling constructs should correspond to the natural conceptual structures of the model user.

Natural graphic constructs - The physical nature of the graphic constructs should correspond to the expectations of the user as to their meaning. This will make the model easier to learn and more suitable to use.

Clarity - Models created using the technique should be clear. This has two aspects. First, models should be unambiguous. Second, they should be precise in their description and meaning.

Orthogonality of constructs - The different constructs provided in the modelling technique should be orthogonal to each other in two ways, semantically and graphically/visually. Semantically, this is the same as the idea that there should be one and only one way (syntax) to express an idea (semantic concept). Graphically/visually, this means that the physical constructs should be easily distinguishable from each other, so that a model constructed using the technique is more easily interpreted (by someone who knows the meaning of the syntax).
Note that this last criterion corresponds in its first sense to McGee's modelling directness and to the definition of minimality in [Batini et al. 1992]. The second sense (orthogonality of graphic constructs) is similar to ease of reading in [Batini et al. 1992].

2.2.3 Synthesis of Criteria
Drawing on all of the above criteria, we can merge them into a list of criteria which we may use to evaluate conceptual data models. Following from the division in [Batini et al.], we divide them here into three groups, criteria about (1) semantic concepts, (2) syntactic (graphic) constructs, and (3) the conceptual data model's relationship to other areas.

Criteria for semantic concepts:

1. Richness - Does the conceptual data model have sufficient semantic concepts to describe all the relevant aspects of the problem domain, so that no relevant ambiguities remain? In the case of this dissertation, among other things, can the model handle complex objects, at multiple levels of granularity, and which may have partially overlapping components?

2. Minimality - Does the conceptual data model exclude unneeded, irrelevant features?

3. Semantic orthogonality - Does the conceptual data model provide non-redundant, orthogonal semantic concepts?

4. Problem domain correspondence - Do the constructs of the conceptual data model directly correspond to aspects of the problem domain?

5. User conceptualization correspondence - Do the constructs of the conceptual model correspond directly to the user's means of abstracting from the problem domain?

6. Precision - Are the semantic concepts of the conceptual data model defined formally enough that their meanings are unambiguous and well understood?

Criteria for syntactic constructs:

7. Picturability - Is the conceptual data model represented in a graphical form?

8. Syntactic-semantic correspondence - Is there a one-to-one correspondence between each semantic concept in the conceptual data model and a single syntactic construct to represent it?

9. Visual expectation correspondence - Does the graphic/visual form of each syntactic construct naturally suggest the meaning of its corresponding semantic concept?

10. Graphic completeness - Is there a graphic syntactic construct to represent each and every semantic concept in the conceptual data model?

11. Graphic orthogonality - Is each graphic construct easily visually distinguishable from other graphic constructs, rather than having different constructs which interfere with understanding other ones?

Criteria for relationships to other areas:

12. Partitionability - Is it easy to divide the conceptual data model into smaller parts so that one can deal with one part of the problem domain at a time?

13. Non-conflicting terminology - Does the terminology of the conceptual data model follow accepted and/or commonly used terminology? Does the conceptual data model avoid using new terms (when not needed) or old terms in different ways?
2.2.4 Comparison of Criteria Table 2-1 below shows how the synthesized criteria relate to the criteria of [McGee] and [Batini et al.]. Criteria shown in the same row of the table are related, but don't necessarily mean the exact same thing. Note that some more general criteria, such as simplicity and elegance in [McGee], are expressed in the synthesized criteria through several more specific (and precise) criteria. The more precise definitions can be better operationalized for objectively evaluating conceptual data models. Chapters 3 and 4 use these criteria to evaluate both existing conceptual data models and the one proposed in this dissertation.
Table 2-1: Comparison of Criteria for Evaluating Conceptual Data Models
Synthesized Criteria (Venable)           McGee                                Batini et al. 1992
------------------------------------------------------------------------------------------------
Criteria for semantic concepts
  Richness                               --                                   Expressiveness
  Minimality                             Simplicity, Elegance                 Simplicity
  Semantic orthogonality                 Modelling directness, Simplicity     Minimality
  Problem domain correspondence          --                                   --
  User conceptualization correspondence  --                                   --
  Precision                              --                                   Formality

Criteria for syntactic constructs
  Picturability                          Picturability                        --
  Syntactic-semantic correspondence      --                                   --
  Visual expectation correspondence      --                                   --
  Graphic completeness                   --                                   Graphic completeness
  Graphic orthogonality                  Simplicity, Elegance                 Ease of reading

Criteria for relationships to other areas
  Partitionability                       Partitionability, Overlap with       --
                                         co-resident models
  Non-conflicting terminology            Non-conflicting terminology          --
Chapter 3 Existing Conceptual Data Modelling Approaches
The purpose of chapter 3 is to describe and critically evaluate existing conceptual data modelling approaches. The evaluation will utilize the criteria developed in chapter 2. This chapter will describe and evaluate five conceptual data modelling techniques. They have been selected according to their fame and their contribution to conceptual data modelling. First, the basic entity relationship (ER) model is covered in section 3.1. Then, the extended entity relationship (EER) model is described in section 3.2. Third, the Nijssen Information Analysis Method (NIAM) data model is analyzed in section 3.3. We evaluate the entity-category-relationship (ECR) model fourth in section 3.4. A fifth model, the Object Modelling Technique (OMT) model, is described in section 3.5. Finally, a summary of the evaluations is presented in section 3.6. 3.1 Entity Relationship Model The graphical entity-relationship (ER) model was published by [Chen] and has been the basis for many other models. This section reviews it. 3.1.1 Description The basic ER model is a fundamental view of data. The ER model was one of the first conceptual data models in a graphical form. It made a very significant contribution by proposing a fundamental abstraction mechanism which divides the description of problem domains into the various entities (things or objects, whether real or abstract) and the relationships (associations) between them. The problem domain that is modelled is restricted to those entities and relationships that we want to keep information (data) about. In the ER model, entities are shown in rectangles and relationships are shown in
diamonds, with lines connecting entities to relationships and vice versa. Names given to the entities and relationships are shown within their respective rectangles and diamonds. The ER model has a very strong intuitive appeal and is very widely used. An additional feature provided in the ER model is cardinality constraints. Cardinality constraints show limitations on the extent to which an entity may or must be associated with entities at the other end of a relationship. A common kind of cardinality constraint is the maximum number of entities with which an entity may be associated. This is usually shown as either one (1) or many (n), meaning more than one. This means that an entity (instance) may be associated with either "at most 1 instance" or "up to n instances" of entities of the type at the other end of the relationship. Usually this is shown by placing the 1 or n near the entity rectangle at the other end of the relationship. Some versions show this with a crow's foot notation on the line for many and a normal line for one. An entity can sometimes be characterized as being existence dependent on another entity. Such a dependent entity is called a weak entity [Chen]. The idea of a weak entity is concerned with information relevance rather than physical existence; i.e., when we stop keeping information about the entity on which a weak entity is existence dependent, we will no longer be interested in keeping information about the weak entity itself. Therefore it is a characteristic of our artificial information system world rather than of the real world that it models. A weak entity is expressed in the ER model with a double rectangle. Existence dependence is also expressed through the relationship via which the weak entity is associated with the entity on which it is dependent. Existence dependency via a relationship is expressed on the ER diagram with an arrow from the relationship diamond toward the existence dependent entity. Note that this also expresses the idea of mandatory participation in the relationship on the part of the existence dependent entity.
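To summarize the constructs introduced so far, the following sketch records a small ER schema as plain data. This encoding is illustrative only, not a notation defined by Chen, and the entity, relationship, and role names are invented for the example.

```python
# Illustrative encoding of a small ER schema.
from dataclasses import dataclass, field

@dataclass
class EntityType:
    name: str
    weak: bool = False              # drawn as a double rectangle in the ER notation

@dataclass
class RelationshipType:
    name: str
    # Each participant is (entity type, role, max), where max is the maximum
    # number of instances of the *other* participant with which one instance
    # may be associated ("1" or "n").
    participants: list = field(default_factory=list)
    existence_dependent: str = ""   # name of the weak entity, if any

employee = EntityType("Employee")
department = EntityType("Department")
dependent = EntityType("Dependent", weak=True)

works_in = RelationshipType("works in", participants=[
    (employee, "member", "1"),      # an employee works in at most 1 department
    (department, "employer", "n"),  # a department may have many employees
])

has = RelationshipType("has", participants=[
    (employee, "supports", "n"),       # an employee may have many dependents
    (dependent, "supported by", "1"),  # a dependent belongs to exactly 1 employee
], existence_dependent="Dependent")   # arrow toward the existence dependent entity
```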
Finally, we should say something about roles in relationships. An entity's role in a relationship is "the function that it performs in the relationship" [Chen]. In ER diagrams, the role name is often omitted if it is otherwise implied in the context, but the role name may be explicitly added along the line connecting the entity to the relationship in which it plays that role. The structure of an ER model is often strongly reflected in the structure of a database or set of files in a system design or implementation. The attributes are reflected in the data elements in data structures and the fields in reports and forms. 3.1.2 Evaluation For all of its strengths, the ER model suffers from some important disadvantages. In this section we will evaluate the basic ER model against the criteria developed in chapter 2. The most serious deficiency of the basic ER model is in its assessment against the first group of criteria - those for semantic concepts. Most importantly, the basic ER model doesn't have sufficient richness of constructs to adequately support complex problem domains. It doesn't directly support aggregation, although the relationship construct is sometimes used to aggregate entities by giving the relationship a name like "component of" or "part of" which implies aggregation. Even so, it does not support aggregation of relationships. Clearly, if it doesn't provide support for aggregation, then it doesn't support overlapping aggregates or multiple levels of granularity either. As for the other semantic criteria, the basic ER model performs fairly well. In particular, the model is very minimal in semantic content, which makes it easy to learn and use for many other kinds of problem domains. It also has very good semantic orthogonality. As for the syntactic criteria - the second group of criteria - the basic ER model also does fairly well. It has good one-to-one correspondence between its syntactic constructs and its semantics. The use of
lines to link entities and the use of relationship diamonds along the lines, which visually suggests the continuation of the relationship, is good. It is graphically complete and the constructs are visually orthogonal. The two main syntactic difficulties are understanding the cardinality constraints as they are expressed (by placing the number at the opposite end of the relationship from the entity that is constrained) and understanding which way the relationship name is to be read (which can be clarified by including role names). The former of these is noted as a problem addressed by NIAM [Nijssen and Halpin] (p. 71). In the third group of criteria, the model's relationship to other areas, while the model is partitionable, there is little guidance for doing so and no natural groupings. This has been an area of extensive work since the original ER paper was published. The model also used existing terminology well, particularly as it broke new ground. 3.2 Extended entity relationship models The entity relationship model is often extended to provide additional semantic constructs which improve its ability to describe complex situations. One of the best-known, the Extended Entity Relationship (EER) Model of the Logical Relational Design Methodology (LRDM), was presented in [Teorey et al.] and is described here. 3.2.1 Description [Teorey et al.] extended the basic ER model notation with attributes (following the work of others). [Chen] had described attributes, but didn't incorporate them into ER diagrams. Attributes are items of information (data) which describe or identify entities. The attributes in an EER model show us what kinds of information we are interested in supporting about the various entities and relationships supported by an information system. In the EER model, the attributes are shown as horizontal ovals, connected by a line to the entity or relationship which they describe or identify. Identifier attributes are distinguished from descriptor attributes by underlining the name of the identifier.
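As a rough illustration (mine, not from [Teorey et al.]), the identifier/descriptor distinction might be recorded in code as follows, using a hypothetical Employee entity type:

    from dataclasses import dataclass

    @dataclass
    class Attribute:
        name: str
        identifier: bool = False         # identifiers are underlined in the EER notation

    employee_attributes = [
        Attribute("employee_number", identifier=True),   # identifier (key)
        Attribute("name"),                                # descriptors
        Attribute("date_of_birth"),
    ]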
Teorey et al. also extended the ER model by characterizing the degree of the relationships. They distinguished between unary, binary, and ternary relationships according to the number of entity types participating in the relationship (1, 2, or 3). Chen also allows relationships of degree > 2, but didn't distinguish between them. Teorey et al. use a triangle, rather than a diamond, to represent a ternary relationship, connecting each of the three entity types participating in it to different corners. Entity types in unary or binary relationships are connected to opposite corners of the diamond representing the relationship. Like the ER model, Teorey et al. provide a notation for denoting the maximum cardinality. Teorey et al. use the term connectivity to denote a maximum cardinality constraint. A maximum cardinality of many (n) is shown by shading the portion of the relationship symbol to which the entity is connected. A maximum cardinality of 1 is shown by not shading the same portion. Teorey et al. further extended the ER model by providing additional cardinality constraint information on the relationships. Sometimes the maximum cardinality is augmented by (or combined with) a minimum number of associations, usually a choice between zero and one. Another way of saying the same thing is that the relationship is optional (minimum = 0) or mandatory (minimum = 1). They use the term membership class to denote the minimum cardinality. A minimum cardinality of 0 (optional) is shown with a small circle drawn through the line connecting the entity to the relationship. A minimum cardinality of 1 (mandatory) simply uses a plain line without a circle through it. In other conceptual data models, various other notations exist for minimum cardinality and sometimes it is combined with maximum cardinality in a single notation. So far, the extensions discussed have only been modifications to the ideas already present in the ER model. The EER model also added two new ideas, subset hierarchies and generalization hierarchies. Both subset and generalization hierarchies express the
"IS-A" or generalization relationship. The difference is that a subset hierarchy allows overlapping subsets while a generalization hierarchy does not. Thus, we could say that the EER model's generalization hierarchy divides an object type into disjoint or exclusive specializations, while a subset hierarchy divides it into non-disjoint or non-exclusive specializations. Teorey et al. also note that the specialization in a generalization hierarchy is done based on the value of a single-valued partitioning attribute. In the EER model, a subset hierarchy is represented by a fat arrow from the specialized entity type to the more general entity type. A generalization hierarchy is represented by fat arrows to an elongated hexagon with the name of the partitioning attribute inside it. The hexagon in turn has a fat arrow connecting it to the more general entity type. To summarize, the EER model of Teorey et al. adds notation for descriptor and identifier attributes, adds a different notation for ternary relationships (unary and binary relationships use Chen's diamond notation), uses a different notation for maximum cardinality, adds a notation for minimum cardinality, and adds two new (but similar to each other) notations for non-exclusive (subset hierarchies) and exclusive (generalization hierarchies) generalization relationships, the latter including information about the partitioning attribute. 3.2.2 Evaluation From the semantic richness point of view, the major contribution of the EER model over the basic ER model was the addition of support for generalization through the subset and generalization hierarchies. It additionally supports the useful idea of mandatory or optional membership in a relationship, which the basic ER model did not. It also provided a way to represent attributes and to distinguish identifier (key) attributes from descriptor attributes. However, the EER model does not support aggregation of objects into complex objects and therefore is insufficiently rich for complex problem domains. To be fair, the
authors do mention "aggregation among entities" and describe it as a special case of a binary relationship, which can be treated like a binary relationship. However, this again does not support aggregation of relationships, the same limitation noted for such use of the basic ER model. Furthermore, in our opinion, the introduction of unary relationships (degree = 1) was a vacuous distinction. How could there be only one party in a relationship? What would a ternary relationship between three entities of the same entity class be called? A better term would be a homogeneous relationship, as in NIAM. Finally, relationships of degree > 3 were unsupported. As for the syntactic criteria, first, the EER model is picturable graphically. A large problem is that the correspondence of syntax to semantics is rather uneven. The semantic problem with unary relationships is exacerbated because both binary and unary relationships are graphically represented with a diamond. On the other hand, a triangle is used for a ternary relationship. This could be confusing, but at least makes the degree of the relationship clear (in this case). The EER model is not graphically complete in that representation of relationships of degree > 3 is neither shown nor discussed. The graphical constructs used are easily distinguishable from each other. Subset and generalization hierarchies are easily distinguishable from ordinary relationships. Their notations are similar, which is good since their concepts are related, but still easily distinguishable. It is hard to say what visual form would naturally correspond to the idea of generalization, but the wide arrows used seem reasonable and, in conjunction with placement of subtypes below supertypes, give good results. The direction of the arrow toward the supertype seems to be a good convention too. Finally, we look at the third group of criteria. Like the ER model, no guidance is given for partitionability. The EER model also introduced some conflicts with
established terminology, particularly with its introduction of the "unary relationship" and the terms "connectivity" and "membership class" in place of cardinality constraints. 3.3 NIAM data model The Nijssen Information Analysis Method (NIAM) conceptual data modelling technique has been developed and refined over the years [Verhijen and Van Bekkum] (1982), [Nijssen and Halpin] (1989), [Wijers, ter Hofstede, and van Oosterom] (1991). In the first of the above publications, the notation for the model is called Information Structure Diagrams (ISDs). In the 1989 publication, the notation is called Conceptual Schema (CS) Diagrams. The latest of the three publications above applied the technique to information modelling for the CASE domain. Unless noted otherwise, this section and the page references below refer to [Nijssen and Halpin]. 3.3.1 Description Like the basic ER model and the EER model, the fundamental concepts of the conceptual data model embodied in NIAM are entities and relationships. A significant difference between NIAM and other ER-based conceptual data models is that things which are considered to be attributes in the above data models are entities in the NIAM conceptual data model. NIAM also permits objectification of relationships and has a very well-defined set of constraint constructs. NIAM divides its entities into entity types (called non-lexical object types or NOLOT in the 1982 publication) and label types (called a lexical object type or LOT in the 1982 publication). This distinction was not pursued in the most recent literature applying NIAM to CASE. Both labels and entities are referred to as objects. Entity types are represented in the NIAM notation by ellipses (or circles) with the name of the entity type inside (or beside) it. Label types are represented with broken (dashed) ellipses with the name inside (or beside) it. In the later literature applying NIAM to CASE, the authors use the term concept instead of entity type (or object). 36
Relationships are divided into facts (called idea types in 1982), which link entity types, and references (called bridge types in 1982), which link entity types to label types. Idea types and bridge types are also referred to as associations, especially in the latest literature applying NIAM to CASE.
Ternary and higher degree relationships are
permitted in NIAM. In the NIAM notation, associations or relationships are represented by a rectangle which is divided into sections, one section for each role; a binary relationship has two sections, a ternary relationship has three sections, and so on. The name of each role is written inside (or beside) its role box. Alternatively, a name is created for the relationship with a "slot" in it for each role's entity type (shown by ellipsis marks), e.g., "... for ... scores ..." When read, entity type names are substituted for the slots, e.g., "student for subject scores rating" (with entity type names italicized). The slots in the name should be in the same order as the role boxes. In the NIAM data model, like the data models described above, a role can only be played by a single entity type (pp. 57 and 320). The newer (1989) version of the NIAM notation permits label types which identify an entity type (keys) to be placed within parentheses inside the entity type ellipse to which they relate. The reference relationship is then not shown (but is implied). This makes the notation much more concise. Concatenated identification labels are permitted. They are shown in the same way, but with a "+" between the parts (p. 164). In the 1989 publication, NIAM permits, but rarely uses, something called unary fact types (pp. 51-52). These are fact types with only a single role! Any unary fact type may be (and usually is) transformed into a binary fact type. This notion and the resulting binary relationships correspond to attributes or properties in the above data models. An entity type constraint limits the values that a particular entity of an entity type may assume (pp. 111-113). It is used like an attribute data type definition. Entity type
constraints are represented in the NIAM notation by an annotation beside the entity type. Possible entity type constraints include enumerated types, character strings with a maximum length, integer ranges, and alphanumeric combinations. NIAM also permits nested fact types, which are also called objectified relationships (p. 54). This is when a relationship is made into an object, allowing the relationship itself to participate in other relationships. Nested fact types must have a spanning uniqueness constraint (p. 87), discussed below. NIAM allows fact types to be derived, which the notation represents by marking them with an asterisk (pp. 58-59). The derivation method must be annotated elsewhere. One of the strengths of NIAM is that it supports a wide variety of constraints. It permits three kinds of constraints which may be applied within a single relationship. When so applied, these are generically called intra-fact-type constraints. Constraints between different relationships are called inter-fact-type constraints. There are two kinds of constraints which may be applied as either intra-fact-type constraints or inter-fact-type constraints. They are uniqueness constraints and mandatory role constraints. They will be discussed first as intra-fact-type constraints. Occurrence frequency constraints apply only to a role or combination of roles within a single relationship. A uniqueness constraint (called an identifier constraint in 1982) limits the population of a single role or a combination of roles. A uniqueness constraint is represented in the NIAM notation by placing a double-ended arrow along or above the role (or roles) being constrained.
First we consider uniqueness constraints applied to binary relationships. A
single role which is under a uniqueness constraint may be related to only one of the objects in the other role of a binary relationship. If only one of the roles is constrained, a single double-ended arrow is placed alongside that role. This corresponds to a 1:N (one to many) relationship. If both roles are each constrained individually, two double-ended arrows are used, one for each role. This corresponds to a 1:1 relationship. If neither role is
constrained, a single double-ended arrow is used above or along both roles. This denotes that, while the roles are not constrained individually, a particular combination of the two roles may still only occur once, i.e., their combination is constrained to a single occurrence. This corresponds to an N:M (many to many) relationship. Uniqueness constraints also apply to ternary and higher degree relationships. They are indicated in the NIAM notation in a way similar to that for binary relationships, with double-ended arrows paralleling the role boxes to which they apply. There may be gaps in the arrows over roles that happen to be between those in the combination. The authors note that a uniqueness constraint must always apply to a combination of at least n-1 roles for an n-ary relationship, otherwise, the relationship could be divided into smaller ones and the same information derived from them. The second intra-fact-type constraint, a mandatory role constraint (called a total role constraint in the 1982 publication) constrains the population of a single role. An entity is compelled by a mandatory role constraint to participate in that role in a relationship. A mandatory role is indicated in a NIAM diagram by a dot at the point where the line connecting the mandatory role to the entity playing that role joins the entity. The third intra-fact-type constraint is an occurrence frequency constraint, which specifies an exact number of times that an entity must play a role or combination of roles within a fact type. The number may be a single value, a range of values, a list of values, a list of value ranges, or some other specification. This kind of constraint is used when uniqueness and/or mandatory role constraints do not suffice by themselves. A major feature of NIAM is its use of inter-fact-type constraints between different relationships. Uniqueness (identifier) and mandatory (total) role constraints have so far been applied only to single relationships, but may be applied to more than one. NIAM also employs three other kinds of constraints which always apply to more than one relationship. These are equality, subset, and exclusion (disjoint) constraints. 39
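Before turning to the inter-fact-type constraints, the intra-fact-type constraints just described can be paraphrased operationally. The following sketch is my own illustration (it is not NIAM's formal definition) and assumes that the population of a binary fact type is simply a list of pairs; the OWNS fact type and its population are hypothetical:

    def unique_in_role(facts, role):
        """Uniqueness constraint: each value may fill the given role only once."""
        values = [fact[role] for fact in facts]
        return len(values) == len(set(values))

    def mandatory_role(facts, population, role):
        """Mandatory role constraint: every entity in the population plays the role."""
        return population <= {fact[role] for fact in facts}

    # hypothetical population of the fact type "person OWNS car"
    owns = [("ann", "car1"), ("ann", "car2"), ("bob", "car3")]
    unique_in_role(owns, role=1)                           # True: each car has one owner (1:N)
    mandatory_role(owns, {"ann", "bob", "cal"}, role=0)    # False: cal owns no car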
As noted above, a uniqueness constraint is usually an intra-fact-type constraint. However, we have already seen that a uniqueness constraint may be applied across more than one role in a single relationship. Similarly, a uniqueness constraint can also be applied across two or more roles in different relationships. However, it can only be applied to roles which are not otherwise uniqueness (identifier) constrained (intra-facttype constraint, as discussed above). When applied to two roles, one in each of two binary relationships, the combination of the two constrained roles identifies an object which is connected to both of the other roles in the two relationships. It also implies that the other two roles in the two relationships are identity constrained and that they are additionally linked by an equality constraint (discussed below). When constraining roles in different relationships, a uniqueness constraint is represented in the NIAM notation by a letter "u" inside a small circle which is connected via dashed lines to the two roles. Their combination uniquely specifies a single object at the other end of both of the two relationships. A single mandatory role constraint can also be applied to more than one role, to show a "role disjunction." This means that an entity must participate in at least one (but not necessarily all) of the roles. A role disjunction is represented in the NIAM notation by joining the lines from the roles to the single dot on the entity border which represents the particular mandatory role constraint (role disjunction). Equality, exclusion, and subset constraints constrain either two roles or two combinations of roles in two different relationships. Single role pair constraints will be discussed first. An equality constraint is a constraint on the population of two roles played by the same entity type in two different relationships. Its meaning is that an entity which plays one of the two constrained roles must also play the other role, i.e., it must serve (at least once) in both roles or neither. It is represented in the NIAM notation by a dashed line 40
connecting the two constrained roles, with optional arrows on the ends. (In 1982, the notation was an equal sign inside a small circle which was connected by lines to the two roles, whose populations are mutually constrained.) Note that both equality constrained roles must be optional, otherwise they would both be mandatory and the mandatory role constraint would suffice. In an exclusion constraint, two roles that an entity type could potentially play are constrained to be mutually exclusive, i.e., an entity playing one of the two constrained roles may not play the other constrained role. It is the opposite of an equality constraint. Unlike an equality constraint, an exclusion constraint can be used even if both roles are connected to the same mandatory role constraint (a role disjunction). An exclusion constraint is represented in the NIAM notation with a letter "X" on a dotted line connecting the two constrained roles. A subset constraint constrains the population of one role played by an entity type to be a subset of the population of another role of the same entity type. A subset constraint is implied if one or more of the roles is mandatory, and should therefore be omitted from a NIAM diagram unless both roles are optional. It is represented in the NIAM notation by a dashed arrow pointing from the subset toward the superset. In the 1982 notation, a letter "s" inside a small circle was superimposed on the arrow. As mentioned above, equality, exclusion, and subset constraints may also be applied between the populations of combinations (called sequences) of roles in two relationships (instead of constraining one single role against another single role). In the NIAM notation, an equality, exclusion, or subset constraint on combinations of roles is shown the same as for single roles, except that instead of joining the dashed lines to individual roles, the lines are connected to either the common border between the two roles (in tworole combinations where the roles are adjacent) or alternatively to supplemental lines which join the various members of the combination. 41
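In the same informal style (again my own paraphrase under the same assumptions, not the NIAM definition), the equality, exclusion, and subset constraints simply compare the populations of two roles played by the same entity type:

    def role_population(facts, role):
        return {fact[role] for fact in facts}

    def equality(facts_a, role_a, facts_b, role_b):
        """Equality constraint: the same entities must play both roles."""
        return role_population(facts_a, role_a) == role_population(facts_b, role_b)

    def exclusion(facts_a, role_a, facts_b, role_b):
        """Exclusion constraint: no entity may play both roles."""
        return not (role_population(facts_a, role_a) & role_population(facts_b, role_b))

    def subset(facts_a, role_a, facts_b, role_b):
        """Subset constraint: every entity playing role a must also play role b."""
        return role_population(facts_a, role_a) <= role_population(facts_b, role_b)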
Like the EER model, NIAM supports generalization. In the 1982 and 1989 publications it is called subtyping. In the 1991 paper it is called specialization. A subtyping relationship is indicated by an arrow drawn from the subtype (specialized entity type) to the supertype (generalized entity type). In NIAM, multiple direct supertypes are allowed for a particular subtype (i.e., NIAM allows multiple inheritance). Annotations on a NIAM Conceptual Schema diagram specify the rules for subtyping, which must be in terms of a fact type. Mutual exclusion of subtypes, exhaustive coverage of a supertype by its subtypes, or their combination (partitioning) are no longer (see below) shown in the NIAM notation, but are shown (or implied) with the annotations. The 1982 publication developed the disjoint constraint, which is applied to two subtype relationships to show when an entity may not be of both subtypes, only one. Thus, it indicates exclusive specialization. It was represented in the NIAM notation by an inequality symbol inside a small circle, which is connected by dashed lines to the two specialization relationships. This constraint type was not pursued in the 1989 publication, relying instead on inference from supplementary specifications of the subtypes. 3.3.2 Evaluation From a semantic viewpoint, the NIAM data model is very rich, particularly in its support for the specification of constraints. However, its primary semantic weakness in this research is that it is insufficiently rich because it provides no direct support for aggregating either entities or relationships. The very rich nature of NIAM works against it from a minimality point of view. First, the need for so many constraint types is certainly debatable, but if a constraint isn't needed, it doesn't have to be used. NIAM has several problems with semantic orthogonality. First, NIAM treats entities and attributes as the same, when, in my opinion at least, they should be treated differently, 42
as in the ER and EER models. The authors point out that they deliberately use a single concept, the fact type, instead of both attributes and relationships, because knowing "when to encode a fact as a relationship and when to encode a fact as an attribute of an entity" is "a difficult task to determine" (p. 313). However, treating attributes as entity types also introduces extra (unneeded) instances of both entity types and fact types; one attribute is traded for both an entity type and a fact type which relates it to another entity type, cluttering the diagram. Attributes can be represented more compactly in, or omitted from, ER or EER diagrams. The resulting fact types also have vague names such as "has" or "is of". Also, the concept of what is an entity becomes too vague. To be fair, having attributes conflicts with minimizing the number of concepts, but I think not having them is a bigger problem. It is my opinion that the choice results from NIAM's underlying emphasis on stored descriptions (which should be implementation concerns) rather than the problem domain itself. A second related semantic orthogonality problem is that it is (sometimes) difficult to distinguish whether something should be an entity type or a label type. Of those entity types which could be considered as attributes in other data models, some could instead be considered to be label types in NIAM. The difference is somewhat vague, based mostly on intended usage. Third, there is some overlap between the various constraints, but this is unavoidable because some constraints are weakened versions of others and therefore implied when a stronger constraint is enforced. Finally, the objectification of relationships confuses the entity type and relationship constructs somewhat. The model seems to correspond well to the problem domain except for the lack of aggregation support and the treatment of attributes as entities. The detailed constraints and treatment of attributes are probably difficult concepts for users to comprehend and didn't correspond to at least this user's conceptualization. However, this is an unfair criticism because the NIAM notation and data model are not
intended for interacting with the user. The authors of the 1989 publication go to great lengths to show a workable scheme for interacting with the user using examples of data, then show how the developer maps this onto the data model for his/her own use. The NIAM data model is very precisely defined. It is the only conceptual data model known to me with a complete and thorough treatment of the handling of higher degree (n-ary with n > 2) relationships, and the constraints are exceptionally well defined. Shifting to the syntactic point of view, the model is also generally very good. It is pictured graphically and most (but not all) concepts are graphically represented. Those left over are represented with textual annotations. The model is so semantically rich that it is difficult to imagine how a graphic construct would be possible for every concept. It seems that concepts which would be difficult to express graphically anyway (with dubious visual expectation correspondence) are those which are left to annotations. One possible exception is the differentiation between disjoint and overlapping subtypes, as well as the partitioning attribute, which might be shown on the diagram, instead of having to derive the information from several (at least two) annotations. Generally there is good correspondence from syntax to semantics, with one syntactic construct for each semantic concept. Avoiding redundant textual and graphic representation is a partial argument against graphically representing mutual exclusion of subtypes. The visual expectation correspondence is very good too. The semantics of entity types playing roles in a relationship is very well supported, and the easy differentiation of higher degree relationships is very nice. However, the role and relationship naming conventions used by the authors in the 1989 publication are often difficult to follow. The use of nouns for role names (which the authors occasionally do) would be better. Using verbs in role names still leaves the problem of which way to read the sentence (which entity type is the subject and which is the object in the sentence?). Sometimes this
problem is a result of treating attributes as entities, which makes naming relationships difficult. The graphic orthogonality is also very good. While some of the inter-fact-type constraints might resemble each other (e.g., equality and subset constraints), at least it is clear they are constraints. The use of dashed lines/arrows for constraints is very helpful in this regard, otherwise, for example, subtyping arrows might potentially be confused with subset constraint arrows. Finally, we evaluate the criteria relating NIAM to other areas. As for partitionability, the authors note (p. 114) that there may be "whole schema diagrams" and "subschema diagrams" and that the subschema diagrams are often developed first and later merged. They provide no advice, however, on how they might be kept separate to provide simpler views. They do, however, discuss how having subschema diagrams relates to the use of implied constraints. In principle, the model should be partitionable, but the authors give no explicit help in showing how. A rather large problem exists in the use of terminology which conflicts with other data models. In particular, the use of entities to mean both attributes (undistinguished as such) as well as the more commonly accepted meaning of entities, is potentially extremely confusing. To be fair, the NIAM model has a long history dating back to when such terminology was less fixed and accepted than now. Also, having a different kind of semantics means some conflict is necessary. However, it would be much better if the model were related to the more generally accepted meanings of entities and attributes. 3.4 Entity-Category-Relationship data model The Entity-Category-Relationship (ECR) data model was published by [Elmasri et al.] in 1985. 3.4.1 Description The ECR model extended the basic ER model with two constructs, the subset
category, which shows specialization, and the grouping-of-entities category, which allows multiple entity types to participate in a role in a relationship. Additionally, it extended the basic ER model with the notion of classifications of entity participation in roles, related to cardinality constraints. Another addition is that attributes may be aggregated and/or multi-valued. The subset category is used to divide an entity type into more specialized entity types, called categories. Subset categories may be specified as being disjoint or not. The notation used places each subset category in a hexagonal box (to distinguish it from entity types and relationship types). The subset category is connected to the entity it is a subset of by a line. A subset symbol pointing in the appropriate direction is superimposed on the line. If two or more subset categories are disjoint, they are connected by their lines to an intervening small circle with a letter "d" in it instead of directly to the entity of which they are a subset. The small circle is then connected by a line to the superset entity. A grouping-of-entities category is used to allow more than one entity type to participate in a particular, single role in a relationship. As they point out, the restriction that only a single entity type is allowed to participate in a role in a relationship is too restrictive. For example, both individuals and corporations (two entity types), might own vehicles. These could then be grouped into an owner category, which is the union of both other types. In the ECR notation, a grouping-of-entities category is shown by creating a category in a hexagon, which is connected by a line to a small circle with the letter "u" inside, to represent a union of entity types. The subset symbol is again used between the category and the circle, showing that the category is a subset of the union. The small circle is connected by other lines to each entity type which is grouped into the category. A subset category may also be a grouping-of-entities category as shown by the links to other items. Similarly, an entity in the ECR model may also be a category. Therefore, the authors introduce the notation that shows that an entity type can also be a category on 46
its own - the symbol for which is an entity rectangle superimposed on a category hexagon. The standard ER notation of a diamond is used to represent a relationship. Both entities and categories may participate in relationships. The ECR model also includes notations to represent 4 different types of constraints on the participation of a category in a role in a relationship. The authors noted that a pair of numbers (i1, i2), where each entity that is a member of a category must participate at least i1 and at most i2 times in the particular role, may be used to specify most participation constraints that apply. The default (no constraint) is i1 = 0 and i2 = infinity. They define partial participation as that where i1 = 0. They represent it with a single line between the relationship and the category. They define total participation as that where i1 >= 1. They represent it with a double line between the relationship and the category. They define functional participation as that where i2 = 1. They represent it with an arrow from the relationship to the category. They define specific participation as total participation in which a relationship, once established, cannot be changed. This last constraint, that the relationship cannot be changed, cannot be represented by the (i1, i2) pair. They represent it with a triple line between the relationship and the category. Aggregate (composite) attributes are shown simply in the ECR notation by connecting the attributes' ovals with lines. The direction of aggregation is toward the entity/category to which the attribute hierarchy must ultimately be connected. Multi-valued attributes are indicated with the initials "M.V." next to the attribute oval. 3.4.2 Evaluation The ECR model, like the others above, is insufficiently rich because it provides no constructs to deal with aggregation of entities and especially relationships. As for minimality, I doubt the usefulness of the specific participation constraint. The use of both categories and entities, both of which can participate in relationships, be linked to other categories, and have attributes, complicates the model's use. The links 47
between the subset category and its superset or between the grouping-of-entities category and the entities which it groups is sufficient to represent the idea. When an entity is also a category, this is particularly dubious. The biggest problem with the ECR model is the confusion presented by two different, but similarly named, constructs, i.e., generalization (superclass) categories and ISA (subclass) categories. This is extremely confusing. The same term is used for opposite abstractions. The introduction of a concept for grouping entity types into unions so that different entity types may play the same role (type) in a relationship seems useful enough, but is hopelessly confused in this model with existing subset and specialization semantics. The model seems to correspond to the problem domain and the users' conceptualization model, except for the confusions noted above and in the next paragraph. The ECR model is fairly precisely defined also, with one exception. This is when choosing whether something should be an entity or a category, as mentioned above. A relationship in ECR is formally defined as between categories. In a number of figures in the paper, categories are used instead of entities for no other discernible (to this reader) reason. However, in the two figures (14 and 17), an entity which is not also a category participates in a relationship. If both categories and entities can participate in relationships, why should something be both? From the syntactic point of view, first, the model is represented graphically. There is generally good one-to-one correspondence between the semantic concepts and a syntactic representation. It is also graphically complete. There are only a few small problems with visual expectation correspondence. The use of subset symbols to represent the direction of a subset relationship is useful, if the symbol is known to the user. The increasing emphasis on relationship lines (one, two or
three lines) with increasing constraints (partial, total, specific) is somewhat meaningful, especially since it is supplemented with the i1 and i2 figures. The use of an arrow for functional participation is also meaningful. Enclosing entity types grouped into a generalization category within a dashed hexagon (as on p. 79) would be meaningful, but this is not the notation used. The biggest problem with the ECR model's syntax is in graphic orthogonality. The same symbol is used to mean a subclass in one abstraction and a superclass in another abstraction. The only difference in the basic notation is the direction of the subset symbol on the line. Furthermore, entity types which are also categories look a lot like regular categories. Finally, with respect to the ECR model's relationship to other areas, it provides no direct support for partitioning. It introduces some alternative terminology, such as the classes of participation, which gives a different semantics to constraints. The use of the term "ISA category," instead of previous terms like subset or specialization, helps lead to the biggest conflicts in terminology, which are within the model itself. 3.5 Object Modelling Technique (OMT) The Object Modeling Technique (OMT) has been described in [Loomis et al.], [Blaha et al.], and [Rumbaugh et al.]. It is representative of the models which have grown out of conceptual data modelling techniques to support object-oriented systems analysis. Other examples include [Shlaer and Mellor] and [Coad and Yourdon]. OMT is the most mature in this reader's opinion and will be treated here. Some of the concepts are not strictly related to conceptual data modelling - in particular the concept of an operation, which is a processing concern, not a static data concern. 3.5.1 Description The primary differences between OMT and the models described above are that OMT adds the concept of aggregation of objects into complex objects and also adds the concept
of operations (methods) to the objects themselves. Instead of the horizontal oval notation for attributes used by the EER model described above, the OMT model uses a more compact notation by placing the attribute names within the entity rectangle. The entity rectangle is divided into sections with the entity type name in the top section and the attributes in a second section below. OMT makes no distinction between identifier and descriptor attributes. Attributes may also have a type associated with them. Additionally, each entity type can have methods associated with it. Methods are actions which can be performed that relate to entities of that type. The names of the methods are listed also within the entity rectangle, in a third section below the attribute names. The authors note that the attribute names and types and method names may be omitted during early development (when building a conceptual model). Unlike the models above, OMT does not use a separate symbol for binary relationships (called associations), only a line connecting them with the relationship name along side it. For ternary or higher relationships, a symbol (a horizontal diamond) is used, like in the basic ER model, except it is small and the relationship name is written alongside it. Role names may optionally be written along the lines connecting the entities to each other or to the relationship diamond. Like all of the models described above, OMT supports multiplicity (cardinality) constraints on relationships in its notation. An exactly one relationship (mandatory participation) is represented with just a regular connecting line. Zero or more (many) is shown with a filled in circle on the line, zero or one (optional) with an open circle, one or more with "1+" written beside the line, and other various possibilities (e.g., exactly 2) are represented with text alongside the line. Like the EER and NIAM models, OMT supports generalization. Unlike NIAM, OMT 50
differentiates graphically between disjoint and nondisjoint subclasses. Unlike the EER model, OMT uses a similar notation for both subset (nondisjoint) and generalization (disjoint) hierarchies. Both are represented by connecting the subclasses with lines to a triangle, which is in turn connected to the superclass. The only difference is that in disjoint subclass hierarchies, the triangle is open, while in nondisjoint subclass hierarchies, the triangle is filled in. The OMT notation also supports representation of a discriminator (partitioning attribute) by recording it next to the open triangle. OMT also supports multiple inheritance, therefore it is not a hierarchy of generalization, but a (directed, acyclic) network. Unlike all the models described above, OMT explicitly supports aggregation of objects into assemblies. (As noted in section 3.1.2, ordinary relationships could be used in the models above by giving the relationship a name like "component of" or "part of" which implies aggregation.) Component objects are connected to the assembly objects with a vertical diamond symbol superimposed on the line immediately adjacent to the assembly object. Multiplicity (cardinality) constraints can be placed on the lines of the component objects. OMT is a very rich model which also supports a multitude of other aspects. Its notation supports representation of derived attributes, classes, and associations, constraints between associations like in NIAM, objectification of relationships (making a relationship into an entity), display of instances and their instantiation, and qualification of relationships. Qualification of a relationship means showing an attribute of an object which has special significance to its relationship with another object. It may be done to clarify the relationship or reduce its cardinality. 3.5.2 Evaluation Looking first from the point of view of the semantic criteria, the first criteria is whether the model is sufficiently rich. OMT is certainly an extremely rich model, but we 51
have to consider whether it supports the characteristics of complex problem domains. It is the first of the models considered here which supports aggregation of objects into complex objects. Also, while it does not directly support aggregation of relationships into objects, it does support objectification of relationships (modelling an association as a class) which could then presumably be aggregated also into a complex object (although [Rumbaugh et al.] give no examples of this in their book). There are no limitations in the model on how aggregate objects can be related or aggregated to even higher levels, so presumably different levels of granularity could be supported. While it is not specifically mentioned, overlapping of components is presumably allowed. Therefore, OMT seems sufficiently rich to support the modelling of complex problem domains, albeit somewhat indirectly. The immense richness of OMT of course competes against minimality. Many concepts are unnecessary for most situations. However, they may later be useful, and using only a subset of the features is only a minor disadvantage. However, the use of qualified associations introduces name spaces and referencing to the conceptual modelling at a stage when it isn't really needed. If needed, it should be brought up later in the process. The semantic concepts of OMT are for the most part clearly orthogonal to each other, in spite of there being so many of them. Only a few problems bear mentioning. First, the use of qualified associations places an attribute of one object with a different object. This is confusing. Also, the objectification of associations, as was mentioned also for NIAM, confuses somewhat the object and association concepts. The semantics generally correspond directly to the problem domain except for the following problem. The need to use objectification of an association in order to include it in an aggregation or other relationship is too indirect. Why not simply allow an association to also play a component role in an aggregation association? The
objectification is an unnecessary extra step. This problem also carries over into a problem of user conceptualization. Having to objectify an association doesn't correspond to a conceptualization in which associations between components are also components of complex aggregate objects. In spite of these limitations, the semantics of the model are precisely specified and understandable. From the syntax point of view, the OMT model is also picturable graphically and is pretty much graphically complete. It has a few problems though. The syntax does not always correspond on a one-for-one basis to the semantics in a few ways. Associations are represented in two ways, depending on whether they are binary or not. Also, attributes are attached to objects, except when they qualify associations. Then they are attached to a different object! Finally, minimum and maximum multiplicity constraints are combined and represented in several different ways. This latter problem carries over to visual expectation correspondence and is discussed below. From the point of view of visual expectation correspondence, there are also several problems. The objectification of an association is not very clear from the notation. The diamond symbol used for aggregation is not intuitive either. Two of the symbols for multiplicity constraints, the circle and dot, resemble each other visually, but don't both correspond to the same maximum or minimum multiplicity constraint. If a dot were instead used to mean exactly one, then both symbols would represent a maximum of one; only the minimums would be different. Furthermore, this choice would free a plain line to mean many (zero or more) instead, which would better correspond to the semantics for no constraint. The visual expectation for no constraint is better supported by no annotation to the line. From the point of view of graphic orthogonality, there are also a number of problems.
The three different kinds of relationships (association, generalization, and aggregation) are all represented by lines differentiated by small symbols on them. These are too similar visually, even if different symbols are used on the lines. This is particularly the case for associations and generalization, which both use diamonds. Also generalization and aggregation both have tree-like structures, which are visually similar. The problems with orthogonality will also complicate the representation of the needed semantics of overlapping components and multiple levels of granularity, which are not supported directly by the graphic symbology. It will be harder to see these semantics represented when the aggregation relationships can be confused with generalizations and associations. Finally, we consider OMT's relationship to other areas. The OMT model has constructs to support partitioning of the model into parts. An OMT object model may be broken into modules, which each represent a perspective. Each module may in turn be broken into sheets (a single printed page). The criteria for the division are, however, weak. OMT also has no major conflicts with other existing terminology, using commonly accepted terminology for concepts which existed before and introducing new terminology which doesn't conflict with old terms for its new concepts. 3.6 Summary Now we will summarize our findings on conceptual data modelling support for complex problem domains. The biggest problem was that of semantic support for complex objects with multiple levels of granularity and overlapping components. Of the five conceptual data models discussed, only OMT directly supports aggregation of objects. It does not directly support aggregating relationships between components as components themselves; it only indirectly supports this through objectification of relationships, which could then be aggregated. Multiple levels of granularity and overlapping components are not directly 54
addressed by the OMT model either, but could presumably be expressed with it. They would, however, be more difficult to see because of various problems with the graphic representation as discussed above. Of the five conceptual data models discussed above, the clearest treatment, both semantically and syntactically, for relationships (associations) is given by NIAM. The nature and degree of a relationship is made clear by the use of role boxes. The naming method for the roles could be more clear, however. Also, the representation for cardinality constraints could be more meaningful. NIAM's use of attributes as entity and relationship combinations is a serious problem, however. Either the method of treatment of attributes within OMT or that with EER is to be preferred. OMT's treatment of generalizations seems to be the cleanest and easiest to read graphically. The ECR model's concept of a grouping category, allowing more than one object/entity type (a union of types) to play a role in a relationship, seems useful. It is not found in the other models except indirectly through generalization. However, the ECR model's means of doing this seems very complicated and its requirement that only categories be allowed to participate in relationships seems overly restrictive. Therefore, it seems that no one of the above models is optimal for modelling complex problem domains. Only OMT seems to support the needed semantics, and then only indirectly. OMT also has some problems with the visual clarity of its syntax and the directness of the representation of its semantics, particularly in areas most needed for modelling complex problem domains. Instead, a new model can be developed which borrows some of the best features of the models described above, and adds better and more direct notations for the specialized aggregation semantics that are needed. Such a model will be described and evaluated in the following chapter. 55
Chapter 4 The CoCoA Conceptual Data Model
The purpose of this chapter is to present and evaluate the features and rationale of the CoCoA conceptual data model which is proposed in this dissertation. The criteria developed in chapter 2 are used for the evaluation. As pointed out in chapter 1, this dissertation proposes and demonstrates a new conceptual data modelling technique which is an extension of existing techniques (as discussed in chapter 3). The name chosen for this data model, CoCoA, is derived from its support for Complex Covering Aggregation, which is not supported in most other conceptual data modelling techniques. Complex Covering Aggregation is crucial to supporting modelling of complex problem domains. It is described in section 4.5.1.2 below. This chapter is organized to enumerate five different conceptual data modelling abstraction mechanisms supported within CoCoA, including: (1) entities (section 4.1), (2) named relationships (section 4.2), (3) specialization-generalization relationships (section 4.3), (4) categorization (section 4.4), and
(5) complex covering aggregation (section 4.5). Each section describes its construct's (1) meaning (semantics) and (2) notation (syntax), in the form of a diagramming technique. Finally, section 4.6 will evaluate CoCoA using the same criteria used to evaluate other conceptual data models in chapter 3. It will further summarize CoCoA's contribution in terms of its ability to support the conceptualization and understanding of complex, multi-granularity, partially overlapping problem domains. 4.1 Entities The term entity, as used in this dissertation and within CoCoA, refers to a concept for describing the real world or the problem domain, which may or may not be supported by an information system. An entity is an element of the problem domain (and its human description), not any (computer-based) implementation or description of them. Thus, it is used with its ordinary, everyday meaning. Similarly, the terms type and class are used in this chapter in their common-sense, everyday meaning, not as a programming language type or the like. An important issue is the distinction between the terms instance and type. The intent of the CoCoA model's entity construct is to describe both - in the following sense. People describe types of instances in order to make sense of a problem domain. The diagrams used here should then be read to say "instances of this type of entity", meaning "things of this kind". Again, the instances we refer to in this data model are in the problem domain rather than its (computer-based) implementation. 4.1.1 Semantics 64
An Entity may be conceptualized as a simple aggregation of its attributes (also called fields or properties). The term simple aggregation (also cf. "cartesian aggregation" in [Alagic]) is used to distinguish it from covering aggregation, discussed in section 4.5. Simple aggregation is a relationship between an entity and its attributes. It is a type definition which describes a kind of entity in terms of the attributes that are aggregated to form it. An attribute may in turn be defined as an aggregate of other attributes, e.g. an address. However, each aggregate attribute must decompose, at some low level, into atomic, or simple attributes. A simple attribute is non-decomposable and is instead described by some fundamental data type, e.g., integer or string (although the atomicity of even these types is debatable at some level). When implemented, simple aggregation is commonly defined through some method such as a record definition. An important question when applying the model is whether something in the problem domain should be conceptualized as an attribute or as an entity. Attributes should relate directly to an entity by describing some quality of the entity, identifying the entity, or both. Attributes should generally not be things which stand alone as entities themselves. However, this rule may be relaxed in the case where the attribute is some thing which is directly related to the entity and we are unconcerned with any attributes of that thing. For example, an address or the city of an address could potentially be considered to be entities. If we were concerned with attributes of the address or city outside of the context of a particular person, such as the grid coordinates or type of building at an address or the population and geographical area of a city, then we should consider them as entities instead of attributes. 65
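A loose programming analogy (my own; CoCoA itself is a notation, not a programming construct) is a record definition in which an aggregate attribute such as an address eventually decomposes into simple, atomic attributes. The Person and Address types below are hypothetical examples:

    from dataclasses import dataclass

    @dataclass
    class Address:              # an aggregate attribute
        street: str             # simple (atomic) attributes
        city: str
        postal_code: str

    @dataclass
    class Person:               # an entity as a simple aggregation of its attributes
        name: str
        date_of_birth: str
        address: Address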
Attributes of an entity are identifiers, descriptors, or possibly both. While the distinction is not critical within the CoCoA model, a brief discussion is worthwhile to help define the attribute concept. Identifiers serve as a reference to a particular entity. Obvious examples are names, social security numbers, part numbers, etc. Identifiers can be present in natural language or may be contrived when adequate identifiers do not exist, particularly when existing identifiers are not guaranteed to be unique. Descriptors describe or express some quality of an entity. Some examples are color, weight, date of birth, credit limit, etc. 4.1.2 Notation As shown in figure 4-1, a rounded rectangle is used in CoCoA to represent an entity. A name which describes the kind of entity (rather than identifying a particular entity of that kind) is located within the rounded rectangle. The name is usually an indefinite noun or noun phrase. An oval is used to represent an attribute. A name which describes the attribute type is located within each oval. Each oval is connected by a line to the entity which the attribute describes and/or identifies.
Figure 4-1: Entity with attributes hidden and shown
As noted above,
CoCoA does not distinguish between descriptors and identifiers. The attributes themselves may be displayed, or not, depending on the desired level of detail and the portions of the data model to be emphasized at any particular time. Whether the attributes are shown or not, they are implied by the presence of the corresponding entity, which aggregates them. Figure 4-1 shows an example entity with and without its attributes displayed. If available, visual clarity is improved by drawing the entity type's rounded rectangle with a slightly wider line width, as shown in figure 4-1 and subsequent figures. 4.2 Named Relationships A named relationship is the same concept as that originally proposed in the well-known ER model and used in the EER, ECR, and OMT models discussed in chapter 3. It has been called various other names including relationship [Chen], association [Rumbaugh et al.], association relationship, and instance connection [Coad and Yourdon]. 4.2.1 Semantics An instance of a relationship establishes a link between instances of entities. In CoCoA, we define a kind of such relationship by giving it a name which describes the meaning of a number of such links. For example, we might want to be able to express the relationship that a person entity OWNS some car entity. Some authors leave relationships unnamed to reduce diagram clutter [Coad and Yourdon], but names are included in the CoCoA model to improve the clarity of the model. Named relationships are defined between various entities. Each entity participating in a
named relationship plays a particular role in that relationship. In CoCoA, roles are named. In the above example of the OWNS relationship, a person plays the OWNER role and a car plays the OWNED role in the relationship. An entity type may be defined using CoCoA to play roles in more than one kind of relationship and may even play more than one role in a single relationship type. Many conceptual data models leave roles unnamed. Sometimes, however, the naming of a relationship leaves it ambiguous as to what is really meant by the relationship. Suppose, for example, that a company entity type is connected to a project entity type by a relationship type named ORGANIZES. It might be ambiguous whether a project ORGANIZES a company or a company ORGANIZES a project. This might not be solved by looking at the context of the relationship. It is this problem which naming the roles addresses by making it clearer how the entity participates in the relationship. Role names should be carefully chosen to clearly indicate which role is the object and which is the subject of the relationship. For example, the company might play the ORGANIZER role and the project the ORGANIZED role (or vice versa, depending on the actual situation). Giving (good) names to the roles makes it clear what each entity type's function is in the relationship.
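As an informal sketch of how named roles remove this ambiguity (the representation below is hypothetical and is not part of the CoCoA notation), each role can be recorded together with the entity type that plays it:

    from dataclasses import dataclass

    @dataclass
    class Role:
        name: str
        entity_type: str

    @dataclass
    class NamedRelationship:
        name: str
        roles: list

    # the company ORGANIZES the project, not the other way around
    organizes = NamedRelationship("ORGANIZES", [
        Role("ORGANIZER", "Company"),
        Role("ORGANIZED", "Project"),
    ])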
In CoCoA, named relationships also express constraints on which kinds of entities, and how many of them, must or may be linked by playing particular roles. The CoCoA conceptual data model explicitly allows n-ary named relationships, i.e., relationships may be of arbitrary degree n >= 2, where n is the number of roles. CoCoA
makes no special distinction between binary and higher degree relationships. A special case of a relationship is one in which entities of the same type play more than one role. Such relationships are treated just like any other relationship. Kinds of named relationships are also described by their cardinality and optionality, which are together called multiplicity. Cardinality describes whether or not a particular entity instance can fill a role more than once in one type of relationship. The alternatives are either 1 or N for the maximum number of times an entity can participate in a role. Combining the maximums for the two roles in a binary relationship, we get 3 alternatives: 1:1, 1:N (the same as N:1), and N:M. Optionality describes whether or not participation in the relationship is mandatory for a particular entity type. This can also be thought of as the minimum number of times an entity can participate in the relationship. Optionality and cardinality can be expressed as an ordered pair of minimum and maximum participation associated with each entity's role. There are four common combinations: (0,1), (1,1), (0,n), or (1,n). CoCoA will generally use these combinations. Other numbers than 0, 1, or n are also possible for maximum and minimum and may be used in CoCoA. Finally, for those uncommon situations where a maximum and minimum are insufficient, a more complex, general purpose multiplicity constraint may be used within the CoCoA model. For example, we might say that any entity must participate an even number of times in the role (including 0). This would be incompletely expressed with a (0,n) min-max pair. Multiplicity constraints are more difficult to express for relationships with degree > 2 (i.e., not binary). While individual roles may be constrained, participation constraints
may also be applied to combinations of roles (as in NIAM). The semantics of the multiplicity constraints in CoCoA apply only to participation in a single role of the relationship without concern for the other roles, either individually or in combination. In most conceptual data models, each kind of role may be played by entities of only one type. In CoCoA, more than one entity type may be linked to a particular role, i.e., the particular entity which plays that role may optionally be of any one of the entity types linked to the role. This concept (categories) is discussed further in section 4.4.
4.2.2 Notation
The CoCoA model uses a notation for relationships inspired by, but extended from, NIAM. A relationship between two (or more) entities is represented as a box which is divided into parts, one for each role in the relationship. Binary relationships have two parts for their two roles, ternary relationships have three parts for their three roles and so forth. An example is shown in figure 4-2. Each role box has a role name inside it. Maximum and minimum cardinality constraints are also shown in each role box. Figure 4-3 shows an example of a ternary named relationship.
Figure 4-2: Named relationship
Figure 4-3: Ternary Relationship
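To make the structure just described concrete, the following sketch (my own illustration, not part of the CoCoA definition) shows one way the OWNS example could be recorded: each role has a name, a (minimum, maximum) multiplicity, and a list of entity types allowed to play it (anticipating the categories of section 4.4). The class and attribute names are illustrative assumptions only.

```python
# Minimal sketch of a CoCoA named relationship with explicit roles and
# (min, max) multiplicity attached to each role. Names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str                     # e.g. "OWNER"
    entity_types: list            # entity types allowed to play this role
    min_card: int = 0             # minimum participation (optionality)
    max_card: object = "n"        # maximum participation (1 or "n")


@dataclass
class NamedRelationship:
    name: str
    roles: list = field(default_factory=list)

    @property
    def degree(self):
        return len(self.roles)    # n >= 2 in CoCoA


# The OWNS example: either a person or (anticipating section 4.4) a
# corporation may play the OWNER role; a car plays the OWNED role.
owns = NamedRelationship("OWNS", [
    Role("OWNER", ["PERSON", "CORPORATION"], min_card=0, max_card="n"),
    Role("OWNED", ["CAR"], min_card=1, max_card=1),
])
assert owns.degree == 2
```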
4.3 Generalization and Specialization
Generalization (cf. [Smith and Smith 1977b]) is a relationship between entity types only (not instances). It is the relationship between (super)classes and subclasses or between (super)types and subtypes. Specialization is the same relationship, only in the opposite direction. I.e., if entity type A is a generalization of entity type B, then entity type B must be a specialization of entity type A. Other names for this relationship are ISA ("is a") relationships, AKO ("a kind of") relationships, categorization (e.g., in the ECR model), and classification [Coad and Yourdon].
4.3.1 Semantics
A generalized entity type shares certain qualities with other entity types that are specializations of it. Hence it is "generic." For example, cars, airplanes, and boats can all be generalized into "vehicles." Specialization is the generalization relationship only in the opposite direction. A generic entity is subdivided (specialized) into more specific types (or categories). This abstraction mechanism can be used to define new, more specialized, entity types. For example, the generic entity type "vehicle" can be specialized into cars, airplanes, and boats. Each specialized entity type may in turn be divided into further specializations. For example, boats could be specialized into sailboats, powerboats, and rowboats. A boat "is a" vehicle, hence the name ISA for this relationship. Similarly, a boat is also "a kind of" (AKO) or "a class of" or "a category of" vehicle. All these terms correspond to specialization.
A specialized entity type possesses all of the attributes of the generic entity type from which it is specialized. Taking the attributes of the generic entity type is called inheritance. For example, the generic entity type vehicle might have an attribute weight, which is inherited by the car, airplane, and boat entity types. A specialized entity type also inherits the ability to participate in all the named relationship roles in which entities of the generic entity type can participate. For example, a vehicle plays the owned role in the owns relationship; therefore, a car, airplane, or boat may also play the owned role. Finally, a specialized entity type also inherits any covering aggregation relationships from the generic entity type. This will be discussed in section 4.5. In addition to those things inherited, a specialized entity type can then have other attributes, participate in other roles in named relationships, or participate in other covering aggregation relationships, which don't apply to the generic entity type. For example, airplanes might have the attributes "takeoff speed" and "maximum cruise altitude". Neither of these makes sense for cars or boats. Specialization often is done based on the value of an attribute, called a partitioning attribute because it partitions the entity type into subtypes. For example, the generic entity type "vehicle" might have the attribute "travel medium," which might contain the value "land", "air", or "water." This attribute specializes the vehicle entity type into car, airplane, or boat. In this example, the names "land vehicle," "aircraft," and "water vehicle" might be appropriate for the immediate specializations of vehicle, since trucks, helicopters, and ships or canoes might possibly be included as further specializations in addition to cars, airplanes, and boats.
An entity type can be specialized in more than one dimension. For example, the vehicle entity type might also have the attribute "powered by," which might have the value "human", "horse", or "engine," giving specializations of "horse-powered vehicles," "human-powered vehicles," and "engine-powered vehicles." If a partitioning attribute of a particular entity is allowed to have only one value, then the subtypes are mutually exclusive (disjoint). If the attribute can be multi-valued, then the subtypes may overlap, i.e., instances may belong to more than one subtype. For example, some cars have been built which also function as boats, and would therefore have both "land" and "water" as values of the travel medium attribute. Actually, the attribute would be more accurately named "travel media," since it can be multi-valued.
Each specialized entity type
no longer needs the partitioning attribute from the parent generic entity, because it is implicit in the entity type. E.g., the value of "travel medium," while needed in the entity type vehicle, is implicit in an entity of type "water vehicle." Therefore, partitioning attributes are not inherited. In CoCoA, an entity type may be a specialization of more than one entity type. This is called multiple inheritance. For example, we might have rowboats as a specialization of the entity types "water vehicle" and "human-powered vehicle" (which partially overlap). With multiple inheritance, a specialized entity type inherits from all of its (more than one) parent generic entity types. For example, the rowboat entity type might inherit a "displacement" attribute from the water vehicle entity type and a "person-power" attribute from the human-powered vehicle entity type. Some models of generalization do not allow multiple inheritance. Instead they enforce
a strict hierarchy of specializations, not allowing a network (multiple "parent" generalizations). This is usually done for purposes of semantic clarity, as will be discussed below. Sometimes it is done in implementation models for efficiency or reduced complexity in an implementation of computer-based support for the model. CoCoA expressly allows a network of generalization and specialization, which is both directed (i.e., in each link, one entity type is generic and the other is specialized) and acyclic (i.e., no entity type may directly or indirectly generalize itself). One reason for limiting a model to single inheritance is the problem of ambiguity in the inheritance of attributes which have the same name from different generic types. In CoCoA, attributes with the same name inherited multiply must have the same meaning and will be present only once in the specialized entity type. The same thing goes for two generic entity types which play the same role in a relationship. The specialized entity type plays that role only once.
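As an illustration of the inheritance rules just described, the sketch below (my own, with invented entity and attribute names) computes the attributes of a specialized entity type over a directed acyclic generalization network with multiple inheritance; attributes inherited under the same name from more than one generic type appear only once.

```python
# Sketch only: entity types with attribute inheritance over a generalization
# network (multiple inheritance). Partitioning attributes are deliberately
# omitted from the example, since CoCoA does not inherit them.
class EntityType:
    def __init__(self, name, attributes=(), generalizations=()):
        self.name = name
        self.own_attributes = list(attributes)
        self.generalizations = list(generalizations)   # parent EntityTypes

    def all_attributes(self):
        seen = []
        for parent in self.generalizations:
            for a in parent.all_attributes():
                if a not in seen:                      # same-named attributes merge
                    seen.append(a)
        return seen + [a for a in self.own_attributes if a not in seen]


vehicle = EntityType("VEHICLE", ["weight"])
water_vehicle = EntityType("WATER_VEHICLE", ["displacement"], [vehicle])
human_powered = EntityType("HUMAN_POWERED_VEHICLE", ["person-power"], [vehicle])
rowboat = EntityType("ROWBOAT", [], [water_vehicle, human_powered])

# "weight" reaches ROWBOAT along two paths but appears only once.
print(rowboat.all_attributes())
# ['weight', 'displacement', 'person-power']
```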
4.3.2 Notation
The CoCoA model generally adopts the notation from OMT [Rumbaugh et al.]. An example is shown in figure 4-4.
Figure 4-4: Specialization and generalization relationship
Specialization and generalization relationships are represented by a line connecting two entities, with a triangle pointing toward the more general entity. If available, using a broader line for generalization and specialization than for named relationships helps visually distinguish the two, as was also done to make entity types more distinctive. If the specialized entity is made so according to the value of a partitioning attribute, the partitioning attribute's name will be shown beside the triangle. Multiple specializations on the same dimension (partitioning attribute) may be connected via the same triangle. Non-mutually-exclusive partitioning into specializations may be shown with a blackened triangle. Multiple specialization and multiple generalization relationships (multiple inheritance) are allowed for any particular entity in CoCoA.
4.4 Categorization
The category concept used in this data model is similar to the grouping-of-entities
category used in ECR [Elmasri et al.]. A category is a grouping (union) of entity types, which allows several types of entities to play the same role in a relationship.
4.4.1 Semantics
Any entity which is of any one of the types in a particular category may play a role in a named relationship that is connected with that category. For example, both corporations and individuals could be grouped into a category "owner" so that entities of either type may assume the role of owner of vehicles. Categorization is closely related to generalization. An important question is whether one should choose categorization or generalization to model a particular situation in the problem domain. A category should be used when two or more entity types need to play the same role in a relationship, but a generalized entity type (with attributes and possibly with other roles in relationships) is not otherwise needed. Using the above example, suppose in some problem domain, people and corporations were specializations of an entity type called "legal agent," which can enter into contracts and bring legal suits. Since this entity type exists even without playing the owner role in a relationship to a vehicle entity, the generic entity type (legal agent) would suffice; we would just let it play the owner role in the owns relationship. But if a generic entity type would only be needed to allow both people entities and corporation entities to play the owner role, we should use an owner category instead. Forming a category to play a role in a named relationship is subject to the following limitation. In order for a category to be used, all members of the category must be subject to the same multiplicity constraints for the roles played. For a somewhat arbitrary and
unrealistic example, suppose corporations were allowed to own an unlimited number of cars and persons (people) were only allowed to own, let's say, 2 cars. These are different maximum multiplicity constraints and a category could not be used. Instead, the situation should be modelled with two kinds of OWNS relationships, e.g., a PERSONALLY_OWNS relationship for people with its owner role constraint, and a CORPORATELY_OWNS relationship for corporations with a different owner role constraint. If we reflect on it, this constraint makes sense because it applies also to any generic entity type or ECR grouping-of-entities category, since each may have only one multiplicity constraint on the way it plays a role.
4.4.2 Notation
Unlike the way the ECR model treats grouping-of-entity categories, the CoCoA model treats categories implicitly. In CoCoA, categories do not have their own symbol and are not given explicit names. The goal of a category is to allow more than one type of entity to be able to participate in a particular role in a relationship. Therefore, each entity type in the category is simply connected directly to the role that it plays. This helps keep CoCoA's notation concise. Any of the entity types connected to a role may play that role. Figure 4-5 gives an example.
Figure 4-5: Owner category implied
CoCoA simply relaxes the restriction of many conceptual data models, that a role may
be played by only one entity type, as was noted in section 4.2.1. This is possible because of the representation chosen in CoCoA for named relationships. Multiple connections to one role are ambiguous in notations without explicit roles. CoCoA makes the role played by entities explicit by the portion of the relationship box to which each is connected. Note that, while categories are unnamed, a name for a category is implied by the name of a role to which multiple entity types are connected. For example, connecting two entity types, person and corporation, to the owner role implies an owner category.
4.5 Covering Aggregation
Covering aggregation is the abstraction mechanism by which the CoCoA model supports the complex, partly overlapping, multi-granularity level entities which characterize complex problem domains. The name is drawn from [Alagic] (pp. 13-14) who also distinguishes this kind of aggregation from the simple aggregation of attributes (which he calls cartesian aggregation) used to define simple entity types. Covering aggregation instead allows us to define complex entity types, i.e., those composed at least of other entities (and named relationships).
4.5.1 Semantics
Covering aggregation expresses "a part of" relationships. Its specialized, predefined meaning distinguishes covering aggregation from named relationships. [Mathiessen et al.] have used the term "defining relationship" to express the idea that the entity type (or object type) is at least partly defined by the components that it has. Covering aggregation is also different from simple aggregation, which was discussed in section 4.1. In simple aggregation, attributes are aggregated into an entity according to a
fixed type definition. They must be present in the entity (but could be non-valued) and do not stand alone by themselves. In covering aggregation, an entity aggregates components which are entities and relationships instead of attributes. Additionally, the user controls which items are part of the aggregate (covering) entity; the covering entity contains a set of other entities and the user can arbitrarily add or delete entities (of the allowed types) to or from the set. We also use a covering aggregate entity's name as a shorthand to refer to the entity together with all of its components [Iivari]. This collective reference is used for various purposes. It is a conceptual unit used in memory and reasoning in the same spirit as the concept of "chunking" from cognitive psychology. It is also used in communication with others. Although irrelevant to understanding complex problem domains, a covering aggregate entity, when implemented, can also serve as a unit for computer-based implementation of storage, update, and retrieval. In that context, saving and retrieving a covering entity implies saving and retrieving all of its components (e.g., see chapter 7). [Kim et al. 1989] also discuss using composite entities as units for versions, authorization, and locking.
4.5.1.1 Simple Covering Aggregation
Simple covering aggregation may be distinguished from complex covering aggregation, which will be described further below. Simple covering is the aggregation of a set of entities into a covering entity. It is the same concept as the aggregation relationship of several papers and books, such as [Teorey et al.], [Elmasri et al.], [Blaha et al.], [Loomis et al.], [Coad and Yourdon], and [Rumbaugh et al.]. It is also similar to the "composite objects" of [Kim et al. 1989] and "complex objects" of [Kim et al. 1988]. A number of questions arise with respect to the semantics that we are trying to express between the covering and covered items. One question is whether the constituent (covered) entities are existence dependent on the parent (covering) entity. [Kim et al. 1989] discuss this in detail. [Iivari] also includes this second dimension in his classification of complex objects. If the covering entity is deleted, should its components also be deleted? In CoCoA, component entities are not viewed as existence dependent. They must be explicitly deleted. It should be noted, however, that it may be desirable to be able to give a command to delete an entity AND all of its components, which constitutes an explicit deletion. A second question is what the allowed exclusivity of the covering relationship should be (again, cf. [Kim et al. 1989]). [Iivari] notes this as the fourth dimension of his classification. Should it be possible to have an entity which is a component of more than one covering entity? In the case of modelling physical entities such as automobiles, this doesn't seem necessary. But in the case of abstract or logical entities, such as the information systems descriptions stored in a CASE environment, we need to allow this possibility, e.g., for different versions of the covering entity. Therefore, in the CoCoA model, entities may be covered by more than one other entity. If an entity is a component of more than one covering entity and is also existence dependent, it is not deleted when an entity which covers it is deleted, unless that is the only entity which covers it. This is consistent with [Kim et al. 1989]. A third issue is whether a component can be made mandatory or optional (third in the [Iivari] classification). In general, CoCoA will assume optional components. Mandatory components may also be specified, however. A mandatory component is one where the covering aggregation relationship has a minimum cardinality of at least one. In fact, other minimums and even maximums can also be specified, similar to the way it is done for named relationships (described above). A fourth question is recursion (cf. [Iivari], [Kim et al. 1988], [Batory and Buchman], and [Rumbaugh et al.]). Recursion of covering aggregation means that an entity can cover other entities of the same type. If allowed, recursion could be either direct or indirect. CoCoA explicitly allows and supports recursive aggregation, whether directly or indirectly. A final semantic issue is whether the model should support dependencies of the values of various attributes from components to covering entities and/or vice versa. As an example of the former, should the color of a component be the same as that of the parent entity (e.g., should the color of a car door or roof be constrained to be the same as that of the car as a whole)? As an example of the latter, should the weight of a car depend on the sum of the weights of its components? The desirability of these characteristics obviously depends on the particular situation. In the case of the CoCoA model, these last two concerns are not of interest and remain unaddressed. However, they may be of interest to others and may be added to the model by those who are interested.
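The deletion behaviour discussed above can be made concrete with a small sketch. This is my reading of the rule rather than a specification from the dissertation, and the repository structure and names are assumed for the example: a "delete with components" command removes a component only when no other covering entity still covers it.

```python
# Hypothetical sketch of covering-aggregate deletion with shared components.
class Repository:
    def __init__(self):
        self.covers = {}          # covering entity -> set of component entities

    def add_cover(self, whole, part):
        self.covers.setdefault(whole, set()).add(part)

    def covered_by(self, part):
        return {w for w, parts in self.covers.items() if part in parts}

    def delete_with_components(self, whole):
        parts = self.covers.pop(whole, set())
        deleted = {whole}
        for part in parts:
            if not self.covered_by(part):   # still covered elsewhere? then keep it
                deleted.add(part)
        return deleted


repo = Repository()
repo.add_cover("DFD_1", "PROCESS_A")
repo.add_cover("DFD_2", "PROCESS_A")        # PROCESS_A is a shared component
repo.add_cover("DFD_1", "DATA_STORE_X")

print(repo.delete_with_components("DFD_1"))
# {'DFD_1', 'DATA_STORE_X'} -- PROCESS_A survives because DFD_2 still covers it
```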
4.5.1.2 Complex Covering Aggregation
CoCoA supports a kind of covering aggregation called complex covering aggregation. It is this abstraction mechanism from which CoCoA takes its name. Complex covering aggregation is an extension of simple covering aggregation. Simple covering aggregation allows only aggregation of entities (and their attributes). Complex covering aggregation allows covering of both entities and relationships. As an example of the need, consider the following. Simple covering aggregation is like the aggregation of a set of component entities. A set is the least structured of the possible user-controllable aggregation mechanisms in that no relationships are expressed between the entities within the set. Set membership is a pure aggregation relationship between a set and its members. There is no order or precedence implied in the set. However, we might wish to define an entity type which covers something other than just a set of entities. For example, we might want to define a list or a stack of entities as a user modifiable aggregation mechanism. In this case, unlike with a set, there are relationships between the entities (e.g., which item follows which in a list, or which item is on top of which in a stack). We will be concerned with maintaining these relationships between the entities. Furthermore, taking an object-oriented view, we will have different operations related to aggregation. With a set, we merely add or delete members, check membership, or possibly list all members. Lists and stacks have their own operations, which are well known. Therefore, with complex covering aggregation, we may aggregate not only sets of entities (as in simple covering aggregation), but whole data structures. I.e., we may cover not only entities, but also (some of) the relationships between the entities covered. Not all of the relationships between the covered entities necessarily have to be covered. The covering aggregation of both entities and relationships results in a complex entity, hence
the name given here.
4.5.2 Notation
Figure 4-6: Covering aggregation
In CoCoA, we represent covering aggregation by enclosing the covered entities and relationships (components) within a large box. This notation is adapted from [Iivari], who cites [Batory and Buchman]. It is in the spirit of [Harel], who uses Euler Circles (similar to Venn Diagrams) in his statechart notation. This box is connected to the covering entity (assembly) by a wide arrow directed toward the covering entity. Shading the box, if available, makes it more distinct and easy to see. Figure 4-6 gives an example using linked lists. In the example in figure 4-6, the aggregate entity, LINKED_LIST, is connected by two named relationships, BEGINS_WITH and ENDS_WITH, to a component entity, LIST_ITEM.
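A brief sketch may help make the distinction concrete: a complex covering aggregate holds both entity instances and relationship instances as components. The Python representation and the FOLLOWS relationship name are assumptions for illustration; only LINKED_LIST, LIST_ITEM, BEGINS_WITH, and ENDS_WITH come from figure 4-6.

```python
# Hypothetical sketch of a complex covering aggregate whose components are
# both entity instances and relationship instances (figure 4-6 style).
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    type_name: str
    key: str                                    # identifier attribute


@dataclass(frozen=True)
class Link:
    rel_name: str                               # e.g. "FOLLOWS" (assumed name)
    players: tuple                              # (role, Entity) pairs


@dataclass
class CoveringAggregate:
    entity: Entity                              # the covering (complex) entity
    components: set = field(default_factory=set)    # Entities AND Links


a = Entity("LIST_ITEM", "a")
b = Entity("LIST_ITEM", "b")
follows = Link("FOLLOWS", (("PREDECESSOR", a), ("SUCCESSOR", b)))

linked_list = CoveringAggregate(Entity("LINKED_LIST", "L1"))
linked_list.components |= {a, b, follows}       # entities and a relationship covered

print(len(linked_list.components))              # 3
```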
This in no way implies that a named relationship between covering and covered entities must be included in all CoCoA models. Named relationships between aggregate and component entities are strictly optional in CoCoA; they are entirely dependent on the characteristics of the problem domain being modelled. Note also that in the example in figure 4-6, the relationships between the aggregate and covered entities are shown inside the covering aggregation area and are thus themselves components. Whether they are placed inside or outside the covering box is also strictly dependent on the problem domain. In this case, one instance of the relationship is present for each and every list (unless it is an empty list!) and can be considered to be part of that list as it is part of the definition of a list. On the other hand, the relationship should be placed outside the covering aggregation area if it is not considered to be a component. See figure 5-3 in the next chapter for an example. If simple covering aggregation of a single component entity type is needed, the box may be omitted and the arrow drawn directly from one entity type to another. This will simplify the diagram without losing clarity. Covering aggregation boxes may overlap or be nested to support modelling complex problem domains. See section 4.6.2 for examples. Overlapping boxes are used to express shared components. If multiple boxes overlap, dashed lines and different shading may be used to improve the clarity. Nested boxes show multiple levels of granularity, i.e., components of components. Different shading for different aggregation boxes helps with visualization of nesting also. Representation of both sharing and nesting in the diagrams may also be distributed to more than one diagram. This is done by using the
same name for entities and/or relationships on different diagrams. The same name may also be repeated on a single diagram. This is useful for representing recursive aggregation (where entities of the same type as the aggregate can be components). For example, we might have a linked list which can be composed of linked lists. This could proceed to an arbitrary number of levels. Eventually, the progression would have to terminate with simple list items (not lists). This example is modelled in figure 4-7.
Figure 4-7: Recursion in covering aggregation
4.6 Evaluation of the CoCoA Model
In order to evaluate the CoCoA model, section 4.6.1 applies the criteria which were developed in chapter 2 and then applied to the ER, EER, NIAM, ECR, and OMT conceptual data models in chapter 3. In addition, section 4.6.2 addresses the richness of CoCoA in more detail by examining how it supports the characteristics of complex problem domains.
This kind of evaluation according to criteria is rather subjective and therefore problematic. Therefore, chapters 5 and 6 supplement this evaluation by applying CoCoA to an example complex problem domain (IS modelling for CASE support).
4.6.1 Evaluation Against Criteria
The problems with the subjective evaluation of such criteria aside, there will likely be some things on which the reader and I will agree, and diligently applying the criteria provides some degree of objectivity.
4.6.1.1 Criteria for Semantic Concepts
First, we will look at the criteria for the semantic content of the conceptual data model. (1) Richness - The most critical of the semantic criteria is whether the model is sufficiently rich to support modelling of the particular problem domain. As discussed in chapter 1, a conceptual data model for complex problem domains needs to support modelling of complex, partially overlapping entities at multiple levels of granularity. It should not be surprising, since the CoCoA model was developed for that purpose, that it supports all these aspects. CoCoA provides direct support for aggregating both entities and relationships into more complex entities, for sharing components between partially overlapping covering aggregate entities, and for multiple levels of granularity. Of the conceptual data models previously evaluated, only OMT supported the concepts, and then only indirectly via objectification. This has been only a cursory evaluation of the CoCoA model's richness. CoCoA's support for modelling complex problem domains will be examined more closely in section 4.6.2.
(2) Minimality - The CoCoA model is also rather minimal for the variety of concepts supported. Simple aggregation of attributes to define an entity type, relationships with explicit roles and cardinality constraints attached directly to the roles, generalization/specialization of entity types, and a straightforward form for complex covering aggregation are the only concepts supported directly. The use of explicit roles and naming may not be considered minimal by some, but it is my view that it ensures that the meaning of the relationships defined is unambiguous, unlike models which leave roles out. Objectification of relationships is not needed to accomplish the development of complex covering aggregation, as was needed by OMT. The implicit use of categories, rather than having a formal concept and construct for it, also reduces the size of the model. Attributes are directly linked to entities instead of through relationships (fact types) as in NIAM. Finally, the wide variety of constraints, which are only very rarely needed, are not part of the model. (3) Semantic orthogonality - The biggest potential problem CoCoA has in this regard is the use of implicit categories instead of explicit generalizations. The use of a generalization is not wrong per se; it only introduces more visual complexity into the model. There also is the problem shared by all ER-based models of distinguishing between entities, attributes, and relationships. Of the models evaluated, only NIAM does not suffer from this problem (but its solution causes other problems instead). (4) Problem domain correspondence - The constructs of the CoCoA model were expressly chosen to support the salient aspects of complex problem domains, so CoCoA
rates very well. (5) User conceptualization correspondence - For people used to using ER-based conceptual data models, there should be very good correspondence to the user's conceptualization. The few differences are ones that are easy to adopt. The use of implicit categories is not a long stretch from using a normal relationship. The only change is that the constraint that only a single type of entity may play a role in a relationship is relaxed to allow multiple types. The concept of covering aggregation is straightforward and in accord with an abstraction mechanism generally used by humans. The extension of covering to include covering of relationships is also straightforward and should not clash with the users' conceptualizations. The explicit representation and naming of roles has been done in other models, so it should present no problems. The naming conventions chosen should greatly increase the understandability of relationships. (6) Precision - The CoCoA model has been defined generally by adopting definitions from other works and textual explanation. Perhaps the model could also benefit from a mathematical explanation, but these are in fact dependent on thorough textual explanations also, so they are not developed here. In my view, the CoCoA model's constructs have been explained thoroughly and at length. However, all the conceptual data models, including CoCoA, will be more precisely defined in chapter 5.
4.6.1.2 Criteria for Syntactic Constructs
Next, we will evaluate the CoCoA model against the criteria for syntactic constructs. (7) Picturability - CoCoA was developed to graphically represent the concepts of the model in a way in which it is easy to see the relationships between the parts.
(8) Syntactic-semantic correspondence - There is a one-to-one correspondence between the semantics and a syntactic means to express it, with one exception. If a covering aggregation covers only a single entity type, then the box enclosing the component may be omitted; the component need only be connected via a fat arrow to the covering aggregate entity. The box is used to visually represent the grouping of a number of entity and/or relationship components. If there is only one component type, the grouping notation would be superfluous. Omitting it reduces the complexity and clutter of the diagrams. (9) Visual expectation correspondence - It is the evaluation of the author that the CoCoA model succeeds very well in this criterion. This was a major goal in the design of CoCoA's visual representation. Attaching attributes directly to an entity type (without intervening relationships as in NIAM) is what a user would expect. The division of the relationship symbol into role boxes helps make the idea clear. This is especially true for higher degree relationships. Connecting an entity type (or more than one) to a role box makes it visually clear which type(s) can play a role. Placing the cardinality constraints right with the role makes it very clear which role they apply to. Generalization is a concept which is rather abstract and visual correspondence is difficult to evaluate. The use of a triangle with the point toward the supertype for a generalization relationship seems as good as or better than any other notation. The use of a tree structure for partitioning into subtypes along one dimension gives visual reinforcement of the concept - especially when annotated with the name of the
partitioning attribute. Having a different tree for each partitioning attribute also reinforces the concept. Also, showing multiple links to subtypes visually shows multiple inheritance. The use of filled or open triangles to represent disjoint (exclusive) and nondisjoint generalization is somewhat arbitrary. A strong advantage of the CoCoA model is the strong visual expectation correspondence of the notation chosen for representing complex covering aggregation. Enclosing the items aggregated, both entity types and relationships, makes it very clear that they are grouped together. The fat arrow to the aggregate entity is visually clear as to the link. The ability to overlap and nest boxes makes it visually clear when component entities are shared and where multiple levels of granularity exist. (10) Graphic completeness - CoCoA is graphically complete for all major constructs. Maximum and minimum cardinality constraints are not shown graphically, because the visual correspondence of such notations is generally not clear. Instead, numbers are used and placed within the same box as the role name, so that it is clear to which role the numbers apply. (11) Graphic orthogonality - The CoCoA model overcomes (at least partially) the problems of OMT and other conceptual data models which use similar notations for named relationships (associations), generalization relationships, and aggregation relationships. The use of role boxes in relationships (rather than a diamond symbol as in the ER model or no symbol as in OMT) together with names for the roles distinguishes named relationships very clearly from the other two. Aggregation is also clearly distinguished from the other two by the use of the large box (shaded if available) to
enclose the entities being aggregated. It is also clearly distinguishable because of the large arrow toward the aggregate type. Unlike OMT, only the generalization relationship uses a line with a small symbol on it. If available, using broader lines for generalization than for relationships will further visually distinguish the two. Entities and relationships are clearly distinguishable also, because relationship boxes are partitioned and square cornered while entity boxes are not partitioned and have rounded corners. Attributes are also clearly distinguishable because they have a differently shaped graphic entity (an oval) with no straight edges.
4.6.1.3 Criteria for Relationship to Other Areas
Finally, we evaluate the CoCoA model against the criteria relating to other areas. (12) Partitionability - A CoCoA model is partitionable by dividing it into separate diagrams. Entities and relationships with the same name represent the same items redundantly across an arbitrary number of diagrams. There are several general rules for partitioning. It is a good idea to put a whole generalization structure on one page, when possible. The same goes for aggregation relationships. With aggregation, it is also a good idea to put overlapping aggregates (with one or more shared components) on the same diagram. Such diagrams can be complex, but can be made less so if attributes are omitted from the diagram. Another diagram can show the attributes. (13) Non-conflicting terminology - In developing the CoCoA model, I have been sensitive to developing names for the concepts by borrowing from existing terminology where the concepts are the same and trying to invent new terms for new concepts. I have also tried to clarify terminology where it has been used ambiguously in the field. It is for
this reason that I use the terms simple aggregation, covering aggregation, and complex covering aggregation. In some cases, it is not possible to use generally accepted terms, because there are different terms in use. An example is in the area of cardinality constraints, with terms like optional relationships, mandatory role constraints, multiplicity, etc. being used. Also, some of the terms combine various dimensions. It is for this reason that I have chosen maximum and minimum cardinality - using the commonly accepted cardinality term while trying to make it more precise. At this point, I will look more closely at the issue of richness and how CoCoA supports modelling of complex problem domains.
4.6.2 Support for Complex Problem Domains
The conceptual data model described in sections 4.1 through 4.5 above will be used for describing and gaining an understanding of problem domains which have (1) complex entities, (2) multiple levels of granularity, and (3) partially shared components. In section 4.6.1, we gave a cursory evaluation of the most critical of the semantic criteria -- whether the model is sufficiently rich to support modelling of the important aspects of complex problem domains. This section will expand and summarize the proposed conceptual data model's ability to support each of the three aspects above.
4.6.2.1 Support for Complex Entities
CoCoA provides direct support for aggregating both entities and relationships into more complex entities. The covering aggregation concept included in the CoCoA model provides the support. Both entity types and relationships may be enclosed within the border of a complex covering aggregation area and linked to the entity which contains
them. This effectively communicates the concept graphically by showing the entities and relationships enclosed within the complex covering aggregation area.
4.6.2.2 Support for Multiple Granularity Levels
CoCoA also supports multiple levels of granularity. Aggregate entities may be nested within other aggregate entities; i.e., they may be components of other entities. Nesting of entities is easy to see in the graphic notation chosen. A complex covering aggregation area is easy to see within the borders of another complex covering aggregation area. Figure 4-8 on the next page shows an example of a nested aggregate component. In this case, we have a binary tree, each node of which is a linked list.
Figure 4-8: Multiple levels of granularity - nested aggregation
4.6.2.3 Support for Shared Entities
CoCoA directly supports shared component entities and relationships, i.e., partially overlapping covering aggregate entities. The notation provided by the CoCoA model visually supports understanding of that sharing. The inclusion of entities and/or relationships within the overlap (intersection) of two (or more) complex covering aggregation areas expressly represents the sharing in a way which is easy to see. Figure 4-9 gives an example. In this case, we have a binary tree with a linked list data structure superimposed on it. The items/nodes can at the same time be items in the list and also nodes of the tree. Note the use of the dotted line to make the intersection area more clear.
Figure 4-9: Example of overlapping aggregates with shared components
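The sharing and nesting shown in figures 4-8 and 4-9 can also be illustrated with a minimal sketch, assuming covering aggregates are recorded simply as sets of component names (an implementation convenience for the example, not part of the CoCoA notation); overlapping sets model shared components, and an aggregate appearing as a component of another models nesting.

```python
# Hypothetical sketch: aggregates as sets of component names, assumed acyclic.
covers = {
    "BINARY_TREE_T": {"LIST_1", "LIST_2", "LIST_3"},   # each tree node is a list
    "LIST_1": {"ITEM_A", "ITEM_B"},
    "LIST_2": {"ITEM_B", "ITEM_C"},                    # ITEM_B is shared
    "LIST_3": {"ITEM_D"},
}

def transitive_components(aggregate):
    """All components of an aggregate, descending through nested aggregates."""
    result = set()
    for part in covers.get(aggregate, set()):
        result.add(part)
        result |= transitive_components(part)
    return result

print(sorted(covers["LIST_1"] & covers["LIST_2"]))     # ['ITEM_B'] -- shared
print(sorted(transitive_components("BINARY_TREE_T")))
# ['ITEM_A', 'ITEM_B', 'ITEM_C', 'ITEM_D', 'LIST_1', 'LIST_2', 'LIST_3']
```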
Chapter 5 Comparison and Integration of Data Flow Models: An Example Use of the CoCoA Model
The purpose of both this and the subsequent chapter is to illustrate the use and capability of the Complex Covering Aggregation (CoCoA) conceptual data modelling technique for modelling, understanding, and integrating complex problem domains. As noted in chapter 1, there are many different complex problem domains which we may want to support with information systems. Any of them could potentially serve as an acceptable problem domain for an example application of the CoCoA model. The complex problem domain to which we choose to apply the CoCoA model in this and the subsequent chapter is Information Systems (IS) modelling. IS modelling is a complex problem domain which is itself supported by various information systems, e.g., CASE tools and environments. The purpose of these chapters is not to show a complete classification and detailed modelling of IS modelling. Such a task is completely infeasible within the space available here. Instead, I will concentrate on a subset of the various IS modelling techniques which is adequate for demonstrating and evaluating the approach. I will develop CoCoA models of this subset, and use them to develop an integrated understanding of a subset of the IS modelling problem domain. These two chapters (5 and 6) will also present the resulting CoCoA models.
5.1 IS Modelling Languages and Perspectives
Information systems developers make use of Information Systems Modelling Languages (ISMLs) to describe, investigate, reason about, and communicate about
information systems. Each ISML is used to describe an information system from a particular point of view or perspective, which is a way of conceptualizing an information system and deciding what is important about it and what is irrelevant (at least for the purpose of that perspective). The idea of a perspective or viewpoint is common to complex problem domains other than IS modelling. The term IS modelling perspective will be used to refer to a perspective on information systems (and the modelling of them). The terms IS perspective or perspective by itself can be used where the context is clearly IS modelling (e.g., in the remainder of this and in the next chapter). Various authors have noted that we can classify approaches to modelling information systems according to the primary IS modelling perspective that they use, e.g., [Charette], [Olle]. There are a large number of different ISMLs. There are a smaller number of IS modelling perspectives, because some ISMLs have the same perspective as others. Some IS modelling perspectives are supported by a large number of ISMLs. Other perspectives have support from only a few ISMLs. Some, more general ISMLs even span more than one perspective. There are many perspectives in IS Modelling. Among them are 1) the control flow or behavioral perspective, 2) the data flow or functional perspective, 3) the conceptual data modelling perspective, 4) the data history perspective, 5) the state-transition model perspective, 6) the enterprise/business function and structure perspective, 7) the user interface perspective, 8) the object-oriented perspective, 9) the real-time perspective, 10) the quality of working life perspective, and 11) the security perspective. Some of these perspectives are well-known and well-defined (cf. [Olle]). Other IS modelling perspectives are still emergent, in flux, and evolving. Also, the above is not a complete list. Still other perspectives are possible, unknown, or undeveloped, and may be added by researchers and practitioners over time.
Each IS modelling perspective is applicable at a particular stage (or stages) of the IS development process. Some perspectives are applicable in only one stage, while others are used in a number of stages during the ISD process, as will be shown below. Since we cannot discuss all of the IS modelling perspectives listed above, which of the extant perspectives should we model within this chapter in order to demonstrate and evaluate CoCoA? To answer this, we need some way of deciding which of the various IS modelling perspectives to include or not, as well as which of the ISMLs that are representative of a particular perspective to include or not. The perspectives and ISMLs chosen must adequately demonstrate the method and architecture. (1) They should not be trivial, nor should they be so complex as to obscure the method. (2) They should be reasonably well known to the reader, or at least well documented, so that understanding the ISML should not be an obstacle to understanding how the integration method works. (3) They should be relatively stable, i.e., their semantics should not be controversial or in a state of flux. For example, new approaches, such as object-oriented analysis techniques, are still quite dynamic with additions and changes coming frequently, and are therefore inappropriate to this work at this point in time. (4) Each perspective chosen should have a number of different ISMLs which model an IS from that perspective. This will allow demonstration of the integration within a single perspective. (5) There should be some overlap or linkage between the various perspectives chosen, in order to allow demonstration of the method for integration between or across multiple perspectives. One IS modelling perspective which is not trivial, reasonably well known, relatively stable, and has several ISMLs which use it is the conceptual data modelling perspective. However, it is somewhat more difficult to understand and integrate because it has a large number of concepts and some large differences within its perspective. Therefore, it is not suitable for an introductory example using CoCoA. However, it will be modelled in chapter 6.
Another IS modelling perspective, which partially overlaps with conceptual data modelling, is the data flow (or functional) perspective. While it is simpler than conceptual data modelling, it is still not trivial; it is stable, well-known, and has a number of representative ISMLs. Thus, it is suitable for an introductory illustration and this chapter will address that perspective. Of the ISMLs with this perspective, we choose to model and integrate Data Flow Diagrams (DFDs) [DeMarco], [Gane and Sarson], SADT activity diagrams [Ross], and the ISAC methodology's A-Graphs [Lundeberg]. But first, we must discuss the process of how CoCoA will be used to understand, model, and integrate these 3 ISMLs within the data flow perspective.
5.2 Modelling and Integrating an Individual Perspective
This section describes a process for how we apply CoCoA to the task of understanding and integrating a single perspective. The result of using such a process will be an integrated CoCoA model of that perspective and those views which have been considered and at least partially support that perspective. In the case of IS modelling, the perspective will be an IS modelling perspective and the views will be (at least some of) the IS modelling languages that embody that perspective.
5.2.1 Introduction
Each ISML includes various concepts. Some or all of these concepts may be shared by other ISMLs which have the same IS modelling perspective. The concepts and the relationships between them are modelled in CoCoA as entity types and named relationship types, possibly with other CoCoA constructs. The ISMLs' concepts are aggregated into the ISMLs by covering aggregation. Because some of the ISMLs' concepts overlap, this will illustrate how the CoCoA model supports partially-overlapping components of complex covering aggregate entities - and therefore modelling of complex problem domains. Throughout this and the next chapter, the diagrams and descriptions emphasize the
concepts of ISMLs, not their graphic notation. In fact, notations may differ from use to use. Notation is not important to understanding an ISML, and may even interfere with understanding by complicating the CoCoA models that we build of them. Each different CoCoA model of an ISML represents one view of its particular perspective, as well as of the larger complex problem domain of information system modelling. Integrating these views is commonly called view integration. The following section will review view integration and the rationale for the approach taken in this chapter. 5.2.2 View Integration View Integration is an important topic in information systems and database research. This section will very briefly discuss concerns in performing view integration. There are several ways to perform view integration (cf. [Batini et al. 1986]). Generally the idea is to identify synonyms, combine any redundant entities found, and merge the rest. One strategy is to combine views pairwise (see figure 5-1 (a)), working up a binary
tree of pairwise integrations. A second approach (figure 5-1 (b)) is to first combine two views, then add a third view to the result, then the fourth, and so on in a series. A third approach (figure 5-1 (c)) is to synthesize more than two views at one time.
Figure 5-1: View Integration Methods
All of the approaches are potentially valid and are based on a bottom-up approach in which the views exist before they are integrated. The choice of the order in which the various views are integrated is not important, in theory. Synonyms can be resolved either early or late in the process (again, cf. [Batini et al. 1986]). In the particular case of CoCoA models of ISMLs, the synonyms identified among the concepts of different ISMLs represent shared components. The view integration of the various ISMLs of the data flow perspective, discussed
in the next section, will be accomplished by integrating the 3 ISMLs directly as shown in figure 5-2.
Figure 5-2: Modelling and Integration Process
Because we are considering only 3 ISMLs and there are no serious differences between them, it is a fairly simple example and this is feasible. However, in the case of complex problem domains with many levels of granularity, the problem is just a little more complex than discussed in [Batini et al. 1986] because of two things. First, some of the views will themselves become entities within the conceptual data model. For example, in CASE, the ISMLs are views and also entities. In the data flow modelling example which we will be working with next, the various ISMLs are views, but also are entities which could potentially be stored and manipulated. Secondly, new entities are determined along the way by generalizing and specializing entities being merged, in order to accommodate the merged views. Finally, as a practical matter, the larger the number of views to be merged, the larger the interim CoCoA models are, and the more the different views are interrelated, the more difficult it is for a human being to keep track of all the various entities, relationships, etc., and thereby to recall and recognize that two concepts are the same. Therefore, it is a good approach to try and group the views that seem to share the same kind of concepts so that
they are merged early. Then, another group which seems to cluster around some other concepts should be merged, then another, and so on. This argues for some combination of the above approaches. It also suggests that it is probably a good idea to go back and review for missed synonyms at some point, perhaps even periodically in the process, particularly in very large schemas. The conceptual data modelling perspective, to be modelled in the next chapter, is larger and more complex in this way, so a different method will be needed. This process will be discussed in the next chapter.
5.3 ISMLs in the Data Flow (Functional) Perspective
Now we will turn to the actual application of CoCoA to the data flow perspective. The data flow perspective shows how information flows from one process (or function or activity) to another. It shows which processes need certain information and what information they produce. It also shows how those processes are linked together (with data flows). A number of ISMLs share the data flow perspective. The ones to be modelled here are Data Flow Diagrams (DFDs) [DeMarco], [Gane and Sarson], SADT activity diagrams [Ross], and the ISAC methodology's A-Graphs [Lundeberg]. There are many others. Each of these ISMLs uses different terminology for the parts of the diagram, but the semantics of what they express is mostly the same. Therefore, each of them is a different way of expressing the data flow perspective. There are also some subtle differences in the constructs available in the different ISMLs. In order to resolve these differences, the first step is to model each of the three ISMLs listed above individually. This activity is represented by the upper (white) arrows in figure 5-2. This will be done in sections 5.3.1 through 5.3.3. The second step is to merge the resulting CoCoA models of these ISMLs together, as represented by the lower (black) arrows in figure 5-2. This task will be done later, in section 5.4.
As noted above, the
emphasis in the CoCoA models will be on the ISMLs' concepts, not the notation for representing them.
5.3.1 Data Flow Diagrams
Data flow diagrams (DFDs) are a component of the group of methodologies known as Structured Analysis, e.g., [DeMarco], [Gane and Sarson], and [Yourdon]. DFDs are modelled using CoCoA in figure 5-3. In the following description of figure 5-3 (and in descriptions of other CoCoA models), ALL UPPERCASE LETTERS are used for the names in the CoCoA figures.
Figure 5-3: CoCoA model of Data Flow Diagrams
DFDs are represented in figure 5-3 with the DATA_FLOW_DIAGRAM entity type at the top of the figure. DATA_FLOW_DIAGRAM is a complex entity type, which is a covering aggregation of its concepts. The main concept of DFDs is the data flow, represented by the DATA_FLOW entity type. The TO and FROM named relationship
types link DATA_FLOWs to the other kinds of entity types, namely PROCESSes, DATA_STOREs, and EXTERNALs. Note that PROCESS, DATA_STORE, and EXTERNAL are all linked to the SOURCE role in the FROM named relationship type, forming an implicit category. This is needed because a DATA_FLOW can flow FROM any of these three SOURCE entity types. The same is true for the DESTINATION role of the TO named relationship type, because a DATA_FLOW may flow TO any of these three entity types as its DESTINATION. PROCESSes can be hierarchically decomposed (onto other diagrams) through the SPECIFIES relationship to another DATA_FLOW_DIAGRAM. Each of the different entity types in data flow diagrams has a NAME attribute.
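As a rough illustration (not the dissertation's own formalization), the CoCoA model of DFDs just described might be recorded as follows; the helper function, the FLOW role name for the data flow's own participation, and the sample diagram contents are assumptions for the example.

```python
# Hypothetical sketch of the DFD metamodel in CoCoA terms: the SOURCE and
# DESTINATION roles accept any of the three node types (an implicit category).
NODE_TYPES = {"PROCESS", "DATA_STORE", "EXTERNAL"}

def make_flow(name, source, destination):
    """Build a DATA_FLOW together with its FROM and TO relationship instances."""
    assert source[0] in NODE_TYPES and destination[0] in NODE_TYPES
    return {
        "DATA_FLOW": name,
        "FROM": {"SOURCE": source, "FLOW": name},
        "TO": {"DESTINATION": destination, "FLOW": name},
    }

order_dfd = {
    "PROCESS": {"Check order"},
    "DATA_STORE": {"Orders"},
    "EXTERNAL": {"Customer"},
    "FLOWS": [
        make_flow("order", ("EXTERNAL", "Customer"), ("PROCESS", "Check order")),
        make_flow("accepted order", ("PROCESS", "Check order"), ("DATA_STORE", "Orders")),
    ],
}
print(len(order_dfd["FLOWS"]))   # 2
```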
5.3.2 SADT Activity Diagrams
The SADT methodology [Ross] uses Activity Diagrams, which also use the data flow perspective. A number of other methods, such as IDEF, have evolved from SADT. SADT_ACTIVITY_DIAGRAMs, modelled using CoCoA in figure 5-4, include ACTIVITY boxes, INPUTs, OUTPUTs, CONTROLs, and MECHANISMs. SADT's activity box is here named just ACTIVITY, since "box" implies syntax.
Figure 5-4: CoCoA model of SADT Activity Diagram
Because an OUTPUT in an SADT_ACTIVITY_DIAGRAM can become either an INPUT, a CONTROL, or both at the other end of the arrow, they are actually the same entity type. In figure 5-4, the FLOW entity type is generalized from the INPUT,
OUTPUT, and CONTROL entity types. Each FLOW may play a role in INPUT_TO, OUTPUT_FROM, and/or CONTROL_TO relationships. What makes a FLOW an INPUT, OUTPUT, or CONTROL is the way it is related to a particular ACTIVITY. The INPUT, OUTPUT, and CONTROL entity types (if needed) can be derived from FLOWs' participation in one or more of the above 3 relationships in the INPUT_TO, OUTPUT_FROM, and CONTROL_TO roles, respectively. Note that the 3 specializations of FLOW are not mutually exclusive (shown by the darkened triangle) as a particular flow could be any combination of the 3 subtypes. In SADT_ACTIVITY_DIAGRAMs, FLOWs can be split into SUBFLOWs with different destinations and/or be joined from different sources to form SUPERFLOWs. The results of these actions (a SUPERFLOW or SUBFLOWs) are themselves FLOWs. These are represented by playing a SUBFLOW and/or SUPERFLOW role in a JOIN_INTO and/or SPLITS_INTO relationship. Having two different JOIN_INTO and SPLITS_INTO relationships in the CoCoA model allows splitting at either or both ends of a FLOW.
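A small sketch of the derivation just described, with invented activity and flow names, might look as follows; the point is simply that a FLOW's (non-exclusive) subtypes follow from the relationships in which it participates.

```python
# Hypothetical sketch: derive INPUT / OUTPUT / CONTROL subtypes of a FLOW
# from its participation in INPUT_TO, OUTPUT_FROM, and CONTROL_TO relationships.
def derived_subtypes(flow, relationships):
    """Return the non-exclusive subtypes implied by a flow's participation."""
    mapping = {"INPUT_TO": "INPUT", "OUTPUT_FROM": "OUTPUT", "CONTROL_TO": "CONTROL"}
    return {mapping[rel] for rel, f, _activity in relationships if f == flow}

rels = [
    ("OUTPUT_FROM", "part list", "Design product"),
    ("INPUT_TO",    "part list", "Plan production"),
    ("CONTROL_TO",  "part list", "Inspect parts"),
]
print(sorted(derived_subtypes("part list", rels)))
# ['CONTROL', 'INPUT', 'OUTPUT'] -- one flow, all three subtypes at once
```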
A MECHANISM PERFORMS one or more ACTIVITIes. MECHANISMs only rarely appear in SADT activity diagrams. The SPECIFIES relationship is present between ACTIVITIes and other SADT_ACTIVITY_DIAGRAMs. 5.3.3 ISAC A-Graphs The ISAC methodology [Lundeberg] includes within it a diagram called an ActivityGraph, or A-Graph for short, which supports the data flow perspective. The main concepts in ISAC_A-GRAPHs (see figure 5-5)
110
Figure 5-5: CoCoA model of ISAC A-Graphs
are the ACTIVITY, SET, and FLOW entity types. Figure 5-5 generalizes SETs and FLOWs into GENERALIZED_FLOWs. The only difference between SETs and FLOWs is the location of the SOURCE or DESTINATION of the GENERALIZED_FLOW. SETs have either an ORIGIN or a DESTINATION that is outside the A-Graph, whether that is another ACTIVITY or an EXTERNAL. Conversely, FLOWs have both SOURCE and DESTINATION within the A-Graph; i.e., they are both components of the ISAC_A-GRAPH. The ISAC subtype of a GENERALIZED_FLOW can be derived from whether its SOURCE and DESTINATION are both components (a FLOW) or not (a SET). While ISAC_A-GRAPHs don't include explicit EXTERNALs, if the ORIGIN or DESTINATION of a SET is an EXTERNAL, its (the EXTERNAL's) name is shown as part of the representation of the SET; it is embedded in the text of the SET's NAME. The EXTERNAL embedded in the text can also be considered a derived attribute, to be derived from the relationship.
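The SET/FLOW distinction described above is likewise derivable from component membership. Below is a small, hypothetical Python sketch of that derivation; the names are invented for illustration only:

```python
from dataclasses import dataclass

@dataclass
class GeneralizedFlow:
    name: str
    source_is_component: bool       # SOURCE is an ACTIVITY within this A-Graph
    destination_is_component: bool  # DESTINATION is within this A-Graph

    @property
    def isac_subtype(self) -> str:
        """Derive the ISAC subtype: FLOW if both ends are components, otherwise SET."""
        if self.source_is_component and self.destination_is_component:
            return "FLOW"
        return "SET"

print(GeneralizedFlow("order data", True, True).isac_subtype)      # FLOW
print(GeneralizedFlow("customer order", False, True).isac_subtype)  # SET
```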
GENERALIZED_FLOWs are specialized in a second dimension also. Both SETs and FLOWs are divided into 3 kinds - MESSAGE_FLOWs, REAL_FLOWs, and COMPOSITE_FLOWs. MESSAGE_FLOWs contain information only. REAL_FLOWs contain physical objects. COMPOSITE_FLOWs contain both. COMPOSITE_FLOWs (both flows and sets) can be treated as a combined MESSAGE_FLOW and REAL_FLOW (either flow or set). Note that since MESSAGE_FLOWs may also be REAL_FLOWs, but don't have to be, the specializations are not exclusive. An ACTIVITY may also have a SPECIFIES relationship with another ISAC_A-GRAPH. ISAC_A-GRAPHs permit GENERALIZED_FLOWs to be split or joined, allowing a FLOW or SET to come from multiple sources and/or go to multiple destinations. Again, the JOIN_INTO relationship supports SUBFLOWs from multiple SOURCEs and the SPLITS_INTO relationship supports SUBFLOWs to multiple DESTINATIONs.

5.4 Integrating the Data Flow Perspective

The terminology used in Data Flow Diagrams, SADT Activity Diagrams, and ISAC A-Graphs is inconsistent. There are also some inconsistencies in what concepts are included and how they relate to other concepts. In order to integrate the views of these ISMLs into a single, shared CoCoA model, the terminology for the entity types and relationship types in the individual CoCoA models must be resolved into a single integrated terminology. The existence of synonyms must be identified. A simple approach to doing so is to use a table to list the entity types and named relationship types of each ISML, placing them in rows according to the synonyms identified, and then to choose a unified terminology. The result of using this approach to the identification of synonyms and the choice of an integrated terminology for the entities and relationships of these ISMLs is shown in tables 5-1 (a) and (b). The first three columns show the three ISMLs' entity and relationship names. The last column shows the names chosen for the integrated entities and relationships in the integrated perspective. For each of the three ISMLs, the synonyms are listed on the same rows. In tables 5-1 (a&b), an asterisk notes the items which can be derived, and square brackets enclose those concepts which are found in some versions of data flow diagramming, but not in others.
Table 5-1 (a): Entities in the Data Flow Perspective

-------------------------------------------------------------------------------------------
DFD            SADT                           ISAC                            Integrated
-------------------------------------------------------------------------------------------
Process        Activity (Box)                 Activity                        Process
Data Flow      Input *, Output *, Control *   Message Set *, Message Flow *   Data Flow
Data Store     N/A                            N/A                             Data Store
External       N/A                            (unnamed, embedded in text)     External
[Mechanism]    Mechanism                      N/A                             Mechanism
N/A            N/A                            Real Set *, Real Flow *         Real Flow
-------------------------------------------------------------------------------------------
Table 5-1 (b): Relationships in the Data Flow Perspective

--------------------------------------------------------------------
DFD              SADT             ISAC             Integrated
--------------------------------------------------------------------
To               Input To         Flows To         Flows To
From             Output From      Flows From       Flows From
N/A              Control To       N/A              Control To
Explodes to      Sub-parts are    Sub-parts are    Specifies
[Join into]      Join into        Join into        Join into
[Splits into]    Splits into      Splits into      Splits into
[Performs]       Performs         N/A              Performs
--------------------------------------------------------------------
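One way to make synonym tables such as these operational, for example within a CASE repository, is to record them as a lookup from each ISML's term to the integrated term. The sketch below is a hypothetical Python rendering of part of table 5-1 (a); it illustrates the idea only and is not part of the dissertation's method:

```python
# Partial synonym map for entities in the data flow perspective (table 5-1 (a)).
# None marks concepts an ISML does not support; derived concepts are noted in comments.
ENTITY_SYNONYMS = {
    "Process":    {"DFD": "Process",    "SADT": "Activity",  "ISAC": "Activity"},
    "Data Flow":  {"DFD": "Data Flow",  "SADT": "Input",     "ISAC": "Message Flow"},  # SADT/ISAC terms derived
    "Data Store": {"DFD": "Data Store", "SADT": None,        "ISAC": None},
    "External":   {"DFD": "External",   "SADT": None,        "ISAC": None},  # ISAC: embedded in SET text
    "Mechanism":  {"DFD": "Mechanism",  "SADT": "Mechanism", "ISAC": None},  # optional in some DFD variants
}

def integrated_term(isml: str, term: str):
    """Map an ISML-specific term to the integrated terminology, if a synonym is recorded."""
    for unified, synonyms in ENTITY_SYNONYMS.items():
        if synonyms.get(isml) == term:
            return unified
    return None

print(integrated_term("SADT", "Activity"))  # Process
```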
Sometimes identifying synonyms and merging is very easy. For example, all three of these ISMLs have a construct that transforms data. In a DFD it is called a PROCESS, while in SADT and ISAC it is called an ACTIVITY. Here (see table 5-1(a)), we use the term PROCESS for the integrated terminology. Similarly, all three ISMLs use another diagram to specify a PROCESS. We give this relationship the name SPECIFIES in the unified terminology (see table 5-1(b)). In other cases, finding the synonyms and merging them is more difficult. For example, INPUTs, OUTPUTs, and CONTROLs in SADT and MESSAGEs and FLOWs in ISAC are similar semantically to DATA_FLOWs in a DFD; they all show movement of information. However, DFDs do not differentiate between inputs, outputs, and controls. The difference is in the relationships between the DATA_FLOWs and the PROCESSes. DFDs also do not differentiate between MESSAGEs and FLOWs. The difference here is whether the SOURCE or DESTINATION of the DATA_FLOW is an EXTERNAL. Therefore, the unified term DATA_FLOW can be used to represent all these concepts, and the specifics for the ISAC or SADT diagrams derived from the appropriate relationship. Sometimes, entity types or named relationship types which are synonyms are missed in one view or ISML, and only become apparent upon reflection when trying to merge the CoCoA models. An example for entity types is that SADT's MECHANISM entity type did not appear in the CoCoA model of DFDs. However, some versions of DFDs allow indications that a PROCESS or group of PROCESSes are to be performed by a certain department, person, system, hardware, and/or software (e.g., by enclosing several processes within some boundary, perhaps with dashed lines, and labelling it with the mechanism's name). So the MECHANISM entity type and the PERFORMS relationship may be useful in CoCoA models of some varieties of DFDs also. Similarly, an example for named relationships is that, although they weren't shown on
the CoCoA Model of DFDs, DATA_FLOWs can also be split or joined in some versions of DFDs, especially when carrying them over to lower levels from higher level diagrams. Therefore, JOIN_INTO and SPLITS_INTO relationships of SADT and ISAC are implied in DFDs too.
After performing a view integration of the three conceptual models using the identified synonyms, figure 5-6 shows the resulting CoCoA model defining the integrated data flow perspective. The names for the entity types and relationship types use the unified terminology from tables 5-1 (a&b).

Figure 5-6: Integrated Data Flow Models
We can see in figure 5-6 that there is a core of concepts which are shared by all the ISMLs that we have investigated in the data flow perspective. These concepts could be readily shared between CASE tools supporting any of these 3 ISMLs. Some concepts are used by only some of the ISMLs. The DATA_STORE is only used in a DATA_FLOW_DIAGRAM; it does appear in other ISAC diagram types, but not in ISAC_A-GRAPHs. In figure 5-6, we also generalize the 3 ISMLs into a DATA_FLOW_MODEL entity type. DATA_FLOW_MODEL has a NAME attribute and participates in the SPECIFIES relationship, both of which are inherited by the 3 ISML entity types. Note that this means that a process in one ISML could be specified in a different ISML. Note that other, specific versions of DATA_FLOW_DIAGRAM could be defined, for example, that do not support the JOIN_INTO and SPLITS_INTO relationships, or that do support REAL_FLOWs. Note also the use of different levels of shading and dashed lines to improve the clarity of the user presentation of complex covering aggregation within CoCoA. This completes our discussion of the data flow perspective. In the next chapter, we use CoCoA to model a more complex perspective, the conceptual data modelling perspective.
Chapter 6 Comparison and Integration of Conceptual Data Models: Expanding the Example Use of the CoCoA Model
The purpose of this chapter is to further demonstrate the use of CoCoA, this time on a more complex problem domain. The previous chapter demonstrated the use of CoCoA by modelling and integrating three of the ISMLs which represent the data flow perspective. This chapter will apply CoCoA to another part of the complex problem domain of IS modelling -- the conceptual data modelling perspective, which is represented by many ISMLs including CoCoA itself. A byproduct will be a more formal description of CoCoA. Conceptual data modelling is an important perspective used in describing information systems. Conceptual data models are ISMLs supporting the conceptual data modelling perspective. They are used to describe, reason about, or document the logical structure and meaning of data and the concepts they represent. They are not concerned with what the data is used for or how it is used. They are also unconcerned with how data is represented within a computer-based information system. This chapter applies the CoCoA model to the modelling and integration of various different conceptual data models. The activity of modelling models is called metamodelling, so we are using the CoCoA model here as a metamodel for conceptual data models. Section 6.1 will describe an enhanced procedure for applying CoCoA to the task of understanding and integrating an IS modelling perspective. A simpler version of the procedure was introduced in chapter 5. Section 6.2 will apply the CoCoA model to formally describe several commonly-used conceptual data models and also the CoCoA model itself. Section 6.3 will apply the CoCoA model to individual conceptual data modelling concepts (as prescribed in section 6.1). Finally, section 6.4 will integrate the resulting CoCoA models of conceptual data models, including CoCoA itself, into a model of the entire conceptual data modelling perspective.

6.1 An Enhanced View Integration Method

Each different conceptual data model is an ISML, just like the different ISMLs within the data flow perspective (see chapter 5). Each conceptual data model represents only one view within the conceptual data modelling perspective. Before we proceed to applying the CoCoA model to the conceptual data modelling perspective, this section will describe the procedure for view integration used in this chapter. As noted in chapter 5, integrating these views is called view integration. Section 5.2 of chapter 5 presented an elementary method for view integration. This section proposes enhancements to that procedure. An enhanced procedure is necessary because the conceptual data modelling perspective is much more complex than the data flow perspective. It has many more, and more varied, concepts, and more and more varied ISMLs which encompass them. Moreover, some concepts are partially contradictory to others. The objective and result of using the enhanced procedure presented here is the same as in chapter 5 -- an integrated CoCoA model of the perspective and the ISMLs which support that perspective. In the case of understanding and integrating the conceptual data modelling perspective, simply trying to integrate the CoCoA models of the individual ISMLs directly would present too large a conceptual problem. The philosophy of the enhanced view integration method is to further divide and conquer the problem domain, below the level of the individual ISMLs. The enhanced procedure is illustrated in figure 6-1. The first step is the same as was
done in chapter 5; the procedure begins with modelling the individual ISMLs. But, instead of then integrating them directly, the second step of the enhanced procedure is to reconsider and formally model smaller groups or clusters of the individual concepts within the perspective. The enhanced procedure extracts these groups from the CoCoA models of the ISMLs (conceptual data models). Once the smaller groups of concepts have all been modelled individually using CoCoA, the third step is to integrate them. After completing the procedure, the CoCoA models of the various ISMLs can be mapped onto the integrated model - and the integrated model expanded if necessary. In the case of IS modelling, these groups or clusters can be called abstraction mechanisms. In the case of the conceptual data modelling perspective, they are called data abstraction mechanisms. Data abstraction mechanisms are the building blocks of conceptual data models.
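The three steps can be summarized as a simple pipeline. The following outline is only a schematic, hypothetical Python rendering of the procedure in figure 6-1; the helper functions are placeholder stubs invented for this sketch, not a prescribed implementation:

```python
from typing import Iterable

def model_with_cocoa(view: str) -> str:
    # Placeholder: stands in for constructing a CoCoA model of one view.
    return f"CoCoA({view})"

def extract_abstraction_mechanisms(models: Iterable[str]) -> list:
    # Placeholder: stands in for identifying clusters of concepts across the models.
    return [f"cluster drawn from {m}" for m in models]

def integrate(models: Iterable[str]) -> str:
    # Placeholder: stands in for merging models under an integrated terminology.
    return " + ".join(models)

def integrate_perspective(isml_descriptions: list) -> str:
    """Enhanced view integration: (1) model ISMLs, (2) model clusters, (3) integrate."""
    isml_models = [model_with_cocoa(isml) for isml in isml_descriptions]       # step 1
    cluster_models = [model_with_cocoa(c)
                      for c in extract_abstraction_mechanisms(isml_models)]    # step 2
    return integrate(cluster_models)                                           # step 3

print(integrate_perspective(["ER", "NIAM", "OMT"]))
```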
Figure 6-1: Enhanced View Integration Method
Like an entire ISML, an individual abstraction mechanism may also be considered to be a subview of an IS modelling perspective (e.g., conceptual data modelling), although a somewhat smaller view than an entire ISML. Modelling these smaller subviews (e.g., individual data abstraction mechanisms) will yield a richer understanding of each of them, how they and the ISMLs which incorporate them relate to each other, and consequently the entire IS modelling perspective (in this case, conceptual data modelling). This is partly because we consider each data abstraction mechanism in more detail and partly because we ignore how each data abstraction mechanism relates to other abstraction mechanisms within a particular ISML. The three steps in the enhanced procedure will now each be illustrated in sections 6.2 through 6.4 below. Several data abstraction mechanisms were already discussed in chapter 4. That list of abstraction mechanisms will be significantly extended in section 6.3.

6.2 CoCoA Modelling of Conceptual Data Models

This section illustrates the first step of the enhanced procedure, as shown by the white arrows in figure 6-1. It develops CoCoA models of various conceptual data models. Each CoCoA diagram is supplemented with a brief natural language explanation. As in chapter 5, the diagrams and descriptions in this chapter emphasize the concepts of conceptual data models, not their notation. The conceptual data models to be modelled in CoCoA include the Entity Relationship (ER) model, the Extended Entity Relationship (EER) model, the Nijssen Information Analysis Method (NIAM) data model, the Entity-Category-Relationship (ECR) model, the Object Modelling Technique (OMT) model, and the Complex Covering Aggregation (CoCoA) model. The first five of these conceptual data models were previously described and evaluated in chapter 3. The CoCoA model was also previously described and evaluated, but in chapter 4. Using the CoCoA model, this section describes these conceptual data models again more formally. The ER and CoCoA models are modelled using CoCoA in sections 6.2.1 and 6.2.2 respectively. The detailed description of the other conceptual data models can be found in appendix A.

6.2.1 Entity Relationship Model

The entity relationship model [Chen] is a fundamental view of data. It was reviewed in detail in section 3.1. It describes the various things (entities, whether real or abstract) and the relationships between them that we want to keep information (data) about. It has a very strong intuitive appeal and is very widely used. When extended with attributes, it also tells us what information we are interested in keeping or representing in an information system for the various entities and relationships. The structure of an entity relationship model is often strongly reflected in the structure of a database or set of files in a system design and/or implementation. The attributes are reflected in the data elements in data structures and in the fields in reports and forms.
Figure 6-2: Basics of the Entity Relationship (ER) Model

A CoCoA model of the basics of the ER model is shown in figure 6-2. As in chapter 5, the names given to the various constructs in the CoCoA models in the figures are shown in uppercase in this text. In figure 6-2, the BASIC_ER_MODEL is primarily composed of (aggregates) the ENTITY and RELATIONSHIP entities. Each ENTITY in turn aggregates any number of ATTRIBUTEs. Each RELATIONSHIP aggregates 2 or more ROLEs. (A binary relationship would be composed of 2 roles, a ternary relationship of 3 roles, and so on.) In the BASIC_ER_MODEL, these ROLEs may optionally be given names. A MAX cardinality constraint is an attribute of each ROLE. There is also a PLAYS relationship between ENTITY and the ROLE component of a relationship, i.e., an ENTITY PLAYS a ROLE in a RELATIONSHIP. Restated another way using the role name in the relationship, an ENTITY may be the PLAYER of any number of ROLEs. It is possible
for an ENTITY not to play any ROLE (even if it makes for an uninteresting diagram). In the ER model, a ROLE must be PLAYED by exactly one ENTITY. The above paragraph and figure 6-2 describe the basic characteristics of the ER model, which is used by many ER diagramming tools. However, we have omitted a pair of concepts from the original [Chen] paper -- weak entities and existence dependency relationships. Figure 6-3 expands on figure 6-2 to add these concepts.
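Before turning to the complete model, the basic constructs of figure 6-2 can be made concrete with a handful of classes. This is a hypothetical Python rendering of the figure's constructs, offered only as an illustration, not as a notation defined by the ER model itself:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Attribute:
    name: str

@dataclass
class Entity:
    name: str
    attributes: List[Attribute] = field(default_factory=list)  # an ENTITY aggregates ATTRIBUTEs

@dataclass
class Role:
    name: Optional[str]               # role names are optional in the BASIC_ER_MODEL
    max_cardinality: str              # the MAX cardinality constraint, e.g. "1" or "n"
    player: Optional[Entity] = None   # in the ER model a ROLE is PLAYED by exactly one ENTITY

@dataclass
class Relationship:
    name: str
    roles: List[Role] = field(default_factory=list)  # a RELATIONSHIP aggregates 2 or more ROLEs

# A binary WORKS_FOR relationship between EMPLOYEE and DEPARTMENT.
employee = Entity("EMPLOYEE", [Attribute("NAME")])
department = Entity("DEPARTMENT", [Attribute("NAME")])
works_for = Relationship("WORKS_FOR", [
    Role("worker", "n", employee),
    Role("employer", "1", department),
])
```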
Figure 6-3: The complete ER Model

In figure 6-3, ENTITY is specialized into STRONG_ENTITY and WEAK_ENTITY. This is necessary because the participation cardinalities of STRONG_ENTITY and WEAK_ENTITY in their respective _PLAYS relationships are different -- otherwise, the STRENGTH attribute of ENTITY would have sufficed. A WEAK_ENTITY is existence dependent on exactly one ENTITY. I have also added a TYPE attribute to ROLE to indicate that a ROLE may be either a DEPENDENT ROLE or a NORMAL (independent)
ROLE. The asterisk indicates that the attribute is derived. Its value is determined by whether the ENTITY connected to the ROLE is a WEAK_ENTITY or a STRONG_ENTITY. Making it a derived attribute is needed because the information is otherwise redundant.

6.2.2 Complex Covering Aggregation (CoCoA) Conceptual Data Model

This section uses CoCoA to describe itself. See figure 6-4 below. CoCoA was introduced in chapter 4. CoCoA is primarily an entity-relationship based conceptual data model, as is obvious from its ENTITY and NAMED_RELATIONSHIP entities. In CoCoA, ENTITIes aggregate ATTRIBUTEs and NAMED_RELATIONSHIPs aggregate ROLEs. Like the EER and OMT models, CoCoA supports GENERALIZATION_RELATIONSHIPs.

Figure 6-4: The Complex Covering Aggregation (CoCoA) Model

Like OMT, CoCoA supports simple covering aggregation with AGGREGATION_RELATIONSHIPs. The primary enhancement of CoCoA over other conceptual data models is its support for the complex covering aggregation conceptual abstraction, from which CoCoA takes
its name. Complex covering aggregation allows a relationship to be a direct component of an aggregate object, in addition to the more ordinary object components. The AGGREGATION_RELATIONSHIP in OMT (see appendix A) could only AGGREGATE ENTITIes. Note that, in CoCoA, both ENTITIes and NAMED_RELATIONSHIPs may play COMPONENT roles in AGGREGATES relationships.
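The distinguishing feature just described, that an aggregate's components may be relationships as well as entities, can be sketched directly. The following is a hypothetical Python illustration of complex covering aggregation; the class names and the example entities are inventions for this sketch, not part of the CoCoA definition:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Entity:
    name: str

@dataclass
class NamedRelationship:
    name: str
    participants: List[Entity] = field(default_factory=list)

# In complex covering aggregation, a component may be an entity OR a relationship.
Component = Union[Entity, NamedRelationship]

@dataclass
class ComplexEntity:
    """A complex (covering aggregate) entity whose components may be shared with others."""
    name: str
    components: List[Component] = field(default_factory=list)

process = Entity("PROCESS")
data_flow = Entity("DATA_FLOW")
flows_to = NamedRelationship("FLOWS_TO", [data_flow, process])

# The diagram aggregates the entities AND the relationship between them;
# the same components could also be aggregated by another complex entity (sharing).
dfd = ComplexEntity("DATA_FLOW_DIAGRAM", [process, data_flow, flows_to])
```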
A second difference from other conceptual data models is CoCoA's means of supporting grouping-of-entities categories (see the section in appendix B on the ECR model) by simply relaxing the usual constraint that only one entity type can play a particular role type in a relationship. An example of this relaxation is found in the previous paragraph, where both NAMED_RELATIONSHIP and ENTITY are connected to the COMPONENT role of the AGGREGATES relationship. The relaxation of this constraint is directly modelled in figure 6-4 via the maximum cardinality constraint value of the PLAYED role in the PLAYS relationship. The n value there indicates that the ROLE can be PLAYED by more than one ENTITY type. In other conceptual data models, the value of this constraint is 1 instead of n (e.g., see figure 6-2). Returning to the example above drawn from the same figure, the actual cardinality is 2; both ENTITY and NAMED_RELATIONSHIP play the COMPONENT role in the AGGREGATES relationship. This example instance would violate the maximum cardinality constraint of the PLAYED role in other conceptual data models.

6.3 Modelling of Data Abstraction Mechanisms

This section illustrates the second step of the enhanced procedure for CoCoA modelling and integration of a perspective, as represented by the arrows from the second to the third levels in figure 6-1. The second step draws from the CoCoA models of the individual views (e.g., ISMLs within an IS modelling perspective, as modelled in section 6.2). It extracts a smaller cluster of concepts (an abstraction mechanism for an ISML), then models it using CoCoA. It integrates all the various views' (e.g., ISMLs') use of that cluster (e.g., abstraction mechanism). This task is then repeated for each smaller cluster or group that can be found. In the case of the conceptual data modelling perspective, the concepts can be clustered around data modelling abstraction mechanisms. Conceptual data modelling abstractions supported by CoCoA were introduced in chapter 4. This section expands on the list from that chapter, drawing from the CoCoA models of the existing conceptual data models (see section 6.2), as shown in figure 6-1. The resulting list includes 1) simple entities, 2) named relationships or associations, 3) generalization and specialization, 4) categorization, 5) covering aggregation (complex entities), 6) objectification, 7) object types, 8) derivation and derived concepts, 9) fact types and reference types, and 10) constraints. Each conceptual data modelling abstraction will be formally modelled using CoCoA. Of the data abstractions shown above, only simple entities, named relationships, and fact types are shown in this section, in sections 6.3.1-6.3.3 respectively. The details of the description of the remaining abstraction mechanisms can be found in appendix B. These CoCoA models will then be integrated in section 6.4.

6.3.1 Simple Entities

The composition of entities from attributes was described in section 4.1 and is more formally modelled using CoCoA in figure 6-5. An ENTITY_TYPE is a covering aggregation of ATTRIBUTE_TYPEs. We use the suffix _TYPE here and throughout section 6.3 to emphasize that this is a metamodel, i.e., that the things being modelled here are sets of types whose instances are themselves types that can have their own instances. For example, the entity type ENTITY_TYPE might have an instance CUSTOMER, which is itself an entity type that can have instances such as Mr. Smith or Ms. Jones. Similarly, the entity type ATTRIBUTE_TYPE might have an instance CREDIT_LIMIT, which might have instances such as $1,000 or $10,000.
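The two levels of instantiation implied by the _TYPE suffix can be shown with a small sketch. This is a hypothetical Python illustration of the meta-level relationship, using the CUSTOMER and CREDIT_LIMIT example above; it is not part of the CoCoA definition:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AttributeType:
    name: str  # e.g. CREDIT_LIMIT; its instances are values such as $1,000

@dataclass
class EntityType:
    """An instance of the metamodel entity type ENTITY_TYPE, e.g. CUSTOMER."""
    name: str
    attribute_types: List[AttributeType] = field(default_factory=list)

# Metamodel level: ENTITY_TYPE has the instance CUSTOMER ...
customer = EntityType("CUSTOMER", [AttributeType("CREDIT_LIMIT")])

# ... and CUSTOMER, itself a type, has instances such as Mr. Smith with a $1,000 credit limit.
mr_smith = {"entity_type": customer.name, "CREDIT_LIMIT": 1000}
```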
Figure 6-5: Entity Types and Attribute Types

Of the conceptual data models modelled in section 6.2 and appendix A, the most complete model of the attributes for the ATTRIBUTE_TYPE entity was in our CoCoA model of the OMT model (see appendix A). In the CoCoA model of OMT, each ATTRIBUTE_TYPE has a NAME, a DATA_TYPE, an (optional) INITIAL_VALUE, a DERIVED boolean (true if the value is derived), and a CLASS boolean (true if the attribute is an attribute of the class itself instead of instances of the class). This is the
starting point for our model in figure 6-5. To the above-mentioned attributes from OMT, we add three boolean attributes. From the ECR model, we add a MULTI-VALUED attribute, which is true if an instance of the ATTRIBUTE_TYPE may be multi-valued. A NULL_ALLOWED attribute is also added, which is true if a null is allowed for its respective ATTRIBUTE. Finally, from the EER model, we add the IDENTIFIER/DESCRIPTOR attribute. This is a partitioning attribute for ATTRIBUTE_TYPE. If an ATTRIBUTE_TYPE is not an IDENTIFIER_TYPE (IDENTIFIER is false), then it is a DESCRIPTOR_TYPE. ATTRIBUTE_TYPEs play roles in different relationships with ENTITY_TYPEs, depending on whether they are IDENTIFIER_TYPEs or DESCRIPTOR_TYPEs. They
participate in either IDENTIFIES or DESCRIBES relationships respectively. Note that the DERIVED attributes (of both ATTRIBUTE_TYPEs and ENTITY_TYPEs) are themselves derived. The DERIVED attribute's value is derived from the presence of a DERIVES relationship with the ATTRIBUTE_TYPE playing the DERIVED role. This relationship is adapted from the CoCoA model of the NIAM model (see appendix A). It is examined in more detail in the section on the derivation data abstraction in appendix B. The IDENTIFIES and DESCRIBES named relationships are also derived. They are derived from the fact that they are linked to either an IDENTIFIER_TYPE or a DESCRIPTOR_TYPE respectively and that each of these is a component of an ENTITY_TYPE which it either identifies or describes.

6.3.2 Named Relationships or Associations

Named relationships were discussed in section 4.2 and are modelled using CoCoA in figures 6-6 and 6-7.

Figure 6-6: Basics of the Named Relationship Data Abstraction

As figure 6-6 shows, a NAMED_RELATIONSHIP_TYPE is primarily an aggregation of NAMED_RELATIONSHIP_ROLE_TYPEs. The longer name, NAMED_RELATIONSHIP_ROLE_TYPE, is used here instead of simply ROLE_TYPE because other kinds of ROLE_TYPEs will be identified relating to fact types (see section 6.3.3). Naturally enough, a NAMED_RELATIONSHIP_TYPE has a NAME attribute. It also has a DEGREE attribute (2 = binary, 3 = ternary, 4 = quaternary, etc.), which is derived from a count of the number of NAMED_RELATIONSHIP_ROLE_TYPEs aggregated by the relationship. As we have seen in the various ER-based models, an ENTITY_TYPE PLAYS a
NAMED_RELATIONSHIP_ROLE_TYPE in a NAMED_RELATIONSHIP_TYPE. Each NAMED_RELATIONSHIP_ROLE_TYPE has a NAME, MULTIPLICITY, MINIMUM, and MAXIMUM attributes. The MULTIPLICITY attribute specifies (or constrains) how many entities may participate in the NAMED_RELATIONSHIP_ROLE_TYPE. Commonly, it is a set of (minimum, maximum) cardinality constraint interval pairs, but may have more general specifications. It accounts for the most general case in the OMT model (see appendix A). The MINIMUM and MAXIMUM attributes can be derived from the MULTIPLICITY attribute, with MINIMUM being the minimum of the minimums in MULTIPLICITY and MAXIMUM being the maximum of the maximums. In some (but not all) conceptual data models, relationships are permitted to have attributes. The boolean attribute ATTRIBUTED partitions NAMED_RELATIONSHIP_TYPE into either ATTRIBUTED_NAMED_RELATIONSHIP_TYPE or UNATTRIBUTED_NAMED_RELATIONSHIP_TYPE. The difference is that an ATTRIBUTED_NAMED_RELATIONSHIP_TYPE may have ATTRIBUTE_TYPE components which identify or describe it. Note that we have not included the derived relationships IDENTIFIES and DESCRIBES as we did for ENTITY_TYPE, but they are still implied. Figure 6-6 glosses over a couple of issues dealing with multiple entity type participation and the structure of constraints on ENTITY_TYPE participation in NAMED_RELATIONSHIP_ROLE_TYPEs. The multiple entity type participation issue is dealt with in figure 6-7.
Figure 6-7: Multiple Participation in Named Relationship Roles

In the CoCoA model, the common restriction that only one entity type can participate in a particular role in a relationship is relaxed to allow multiple entity types to participate. This is how CoCoA permits the implied use of grouping categories (but without a union of attributes as in ECR) without having to create a more general entity type. Figure 6-7 specializes NAMED_RELATIONSHIP_ROLE_TYPE into SINGLE_PARTICIPATION_NAMED_RELATIONSHIP_ROLE_TYPE and
MULTIPLE_PARTICIPATION_NAMED_RELATIONSHIP_ROLE_TYPE according to the MAX_TYPE_PARTICIPATION partitioning attribute. The former is the more common version and the latter is the version in CoCoA. Figure 6-6 assumes single participation. To account for multiple participation in figure 6-7, the simple PLAYS relationship in figure 6-6 is modified to the SINGLY_PLAYED_BY and MULTIPLY_PLAYED_BY relationships of figure 6-7. Note that the PLAYS relationship in figure 6-6 may still be derived from the union of the two _PLAYED_BY relationships. Note that there is still a possibility for multiplicity constraints on participation in NAMED_RELATIONSHIP_ROLE_TYPEs that remains unaddressed in figure 6-7. With single entity type participation, there is only one participation constraint possible per role. However, with multiple entity type participation, it is theoretically possible to have a different constraint on each different entity type which can play the role. For example, there might be a constraint that a CORPORATION could OWN an
unlimited number of VEHICLEs while a PERSON could only OWN 2 VEHICLEs (or even only one, as a hypothetical example). Thus, the maximum multiplicity would be different for the two entity types' respective participation in the same role type. This possibility is unsupported in CoCoA or any of the other conceptual data models discussed so far. Modelling this situation in CoCoA requires 2 different relationships, since the roles have different constraints (e.g., CORPORATELY_OWNS and PRIVATELY_OWNS). Furthermore, the author knows of no conceptual data model which does support it. Therefore, this possibility for conceptual data modelling remains unaddressed in figure 6-7. The second issue unaddressed in figure 6-6 is that the MULTIPLICITY attribute has an embedded structure. Normally, any structure embedded in an attribute should be modelled in more detail, e.g., as in figure 6-8.

Figure 6-8: Making multiplicity structure explicit

In figure 6-8, MULTIPLICITY has been broken down into MULTIPLICITY_INTERVAL entities, each with its own MIN and MAX attributes. A single MULTIPLICITY_INTERVAL, e.g., (0,n) or (1,1), will usually suffice, but two or more pairs are needed to express more complex multiplicity constraints. For example, (0,0) and (2,4) together mean a MULTIPLICITY attribute of "zero, 2, 3, or 4". The MULTIPLICITY attribute of a NAMED_RELATIONSHIP_ROLE_TYPE can still be derived by sorting an associated set of MULTIPLICITY_INTERVALs, condensing (where MIN=MAX), and concatenating the intervals, giving a result similar to the quotation in the previous sentence. The MINIMUM and MAXIMUM attributes (not shown) of a NAMED_RELATIONSHIP_ROLE_TYPE can similarly be derived from the MULTIPLICITY_INTERVALs' MIN and MAX attributes.
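The derivations just described, of the combined MULTIPLICITY specification and of MINIMUM and MAXIMUM from a set of intervals, are mechanical. A hypothetical Python sketch follows; the condensing and formatting choices are assumptions of this illustration, and unbounded maxima are not handled:

```python
def derive_multiplicity(intervals):
    """Derive the MULTIPLICITY string and the MINIMUM/MAXIMUM attributes
    from a set of (MIN, MAX) MULTIPLICITY_INTERVAL pairs."""
    ordered = sorted(intervals)
    parts = []
    for lo, hi in ordered:
        # Condense degenerate intervals where MIN = MAX to a single number.
        parts.append(str(lo) if lo == hi else f"{lo}..{hi}")
    minimum = ordered[0][0]
    maximum = max(hi for _, hi in ordered)
    return ", ".join(parts), minimum, maximum

# (0,0) and (2,4) together mean "zero, 2, 3, or 4"
print(derive_multiplicity([(2, 4), (0, 0)]))  # ('0, 2..4', 0, 4)
```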
So, we must choose between embedding the structure in an attribute, as in figure 6-6, or making it explicit (at a lower level of granularity), as in figure 6-8. In modelling complex problem domains we should usually choose to make such structures explicit. Doing so will enable other relationships to be made to the smaller-grain components within the structure. In this case, however, the better choice may be embedding the structure, because MULTIPLICITY_INTERVAL entities are not otherwise useful. We will almost always want the larger-grained, combined structure of the MULTIPLICITY attribute. Additionally, the MINIMUM and MAXIMUM attributes can be derived just as easily from MULTIPLICITY. Note also that embedding the structure is more flexible, because it allows unusual specifications, like "even numbers", although care must be taken so that the MINIMUM and MAXIMUM cardinalities for the ROLE may still be derived. So, the choice made here is to embed the structure in the MULTIPLICITY attribute, as it is in figures 6-6 and 6-7, and not to use the structure as in figure 6-8. The complete model of the named relationships abstraction is then the combination of figures 6-6 and 6-7. If, however, the explicit, small-grained structure were otherwise needed (e.g., for automatic enforcement of constraints or for reference from other parts of the description), then the explicit structure should be included as in figure 6-8.

6.3.3 Fact Types and Reference Types

Fact types, reference types, label types, and entity types are the principal constructs of the NIAM method, as discussed in section 3.3 and appendix A. Fact types were briefly mentioned in the previous section and reference types have not been discussed thus far. NIAM fact types and reference types are similar to the NAMED_RELATIONSHIP_TYPEs which were formally modelled in section 6.3.2 and figures 6-6 and 6-7. However, only some NIAM fact types are actually the same as NAMED_RELATIONSHIP_TYPEs. Other NIAM fact types correspond to
ATTRIBUTE_TYPEs of ENTITY_TYPEs, as in appendix B. ENTITY_TYPEs are formally modelled in appendix B, but NIAM uses the term differently. In NIAM, an "entity type" corresponds to either an ENTITY_TYPE or an ATTRIBUTE_TYPE as we have used the terms. Because NIAM uses the same term "entity type" with different semantics than the previous use of the word, the NIAM concept of an entity type is renamed GENERAL_ENTITY_TYPE. As you can see at the bottom right of figure 6-9, a NIAM GENERAL_ENTITY_TYPE is specialized into either ENTITY_TYPE (as the term has been previously used) or DESCRIPTOR_TYPE (which is also a subtype of ATTRIBUTE_TYPE, see appendix B). The CoCoA model in figure 6-9 below integrates the NIAM FACT_TYPE, REFERENCE_TYPE, IDENTIFIER_TYPE, and GENERAL_ENTITY_TYPE concepts with the ATTRIBUTE_TYPE, ENTITY_TYPE, and NAMED_RELATIONSHIP_TYPE data modelling abstractions previously modelled in appendix B and section 6.3.2.
Figure 6-9: Fact Type Data Abstraction

Figure 6-9 is organized in 4 layers from top to bottom for (1) kinds of relationship types including fact types, (2) role component types of relationship types and fact types, (3) playing of role types in relationship types, and (4) attribute types and entity types which play the role types. Various new entity types and relationship types have been
introduced as well. The top layer shows the different kinds (specializations) of RELATIONSHIP_TYPE. It includes an inheritance structure by which NIAM's REFERENCE_TYPE, FACT_TYPE, and UNARY_FACT_TYPE concepts are related to each other and to new and previously introduced concepts. The ATTRIBUTED_NAMED_RELATIONSHIP_TYPE and UNATTRIBUTED_NAMED_RELATIONSHIP_TYPE entity types were discussed in section 6.3.2. The two new entity types are called GENERALIZED_RELATIONSHIP and DESCRIPTION_TYPE. In the top layer, RELATIONSHIP_TYPE is specialized into the NIAM concepts of REFERENCE_TYPE and FACT_TYPE. A REFERENCE_TYPE defines what ENTITY_TYPE an IDENTIFIER_TYPE (called a LABEL_TYPE in NIAM) identifies. NIAM FACT_TYPEs can be partitioned into four different kinds. The first, a UNARY_FACT_TYPE, defines a dimensionless kind of fact about an ENTITY_TYPE. An example might be the fact that people (an ENTITY_TYPE) can
be smokers. The second kind of FACT_TYPE, a DESCRIPTION_TYPE, defines how an ATTRIBUTE_TYPE describes an ENTITY_TYPE. The remaining two kinds of FACT_TYPEs were described in section 6.3.2 and figures 6-6 and 6-7. The third is an UNATTRIBUTED_NAMED_RELATIONSHIP_TYPE, which is a pure NAMED_RELATIONSHIP_TYPE. Its only function is to define how two or more ENTITY_TYPES are related to each other. The fourth kind, an ATTRIBUTED_NAMED_RELATIONSHIP_TYPE also defines how two or more ENTITY_TYPEs are related, but additionally defines how attributes describe the named relationship itself. The second layer shows the various specializations of ROLE_TYPE. Each kind of RELATIONSHIP_TYPE in the top layer is composed of a different combination of different ROLE_TYPEs. A REFERENCE_TYPE is composed of two ROLE_TYPEs, the IDENTIFIER_ROLE_TYPE and the REFERENCED_ROLE_TYPE. A UNARY_FACT_TYPE is composed of only one ROLE_TYPE, the DESCRIBED_ROLE_TYPE, which identifies the ENTITY_TYPE which has the particular characteristic. A DESCRIPTION_TYPE has two ROLE_TYPEs, the DESCRIBED_ROLE_TYPE, which identifies the ENTITY_TYPE that is described, and the DESCRIPTOR_ROLE_TYPE, which identifies the DESCRIPTOR_TYPE (an ATTRIBUTE_TYPE) that describes the ENTITY_TYPE. An UNATTRIBUTED_NAMED_RELATIONSHIP_TYPE is comprised only of two or more NAMED_RELATIONSHIP_ROLE_TYPEs (which were defined in section 6.3.2) which are played by the ENTITY_TYPEs which participate in the relationship. An ATTRIBUTED_NAMED_RELATIONSHIP_TYPE is similar, except that it additionally has one or more DESCRIPTOR_ROLE_TYPEs, which identify DESCRIPTOR_TYPEs which describe the named relationship. The third level shows the various PLAYS_... relationships. These show which types 144
of entities or attributes in the fourth level can PLAY the various roles in the second level. Note that each ROLE_TYPE plays a PLAYED role in only one PLAYS_... relationship. Also, each PLAYER role is played by only one ENTITY_TYPE, DESCRIPTOR_TYPE, or IDENTIFIER_TYPE. IDENTIFIER_TYPEs identify, DESCRIPTOR_TYPEs describe, and ENTITY_TYPEs are referenced (identified), are described, and are related to other ENTITY_TYPEs. ATTRIBUTED_NAMED_RELATIONSHIP_TYPEs are also described, but don't play a DESCRIBED_ROLE_TYPE because the DESCRIPTOR_ROLE_TYPE(s) are embedded right within the relationship they describe. The fourth (bottom) layer shows the various ENTITY_TYPE or ATTRIBUTE_TYPE entities which can PLAY a ROLE_TYPE. Note that ATTRIBUTE_TYPE is specialized into either DESCRIPTOR_TYPE or IDENTIFIER_TYPE (the NIAM term is LABEL_TYPE). Note also that DESCRIPTOR_TYPE is a subtype of both ATTRIBUTE_TYPE and GENERAL_ENTITY_TYPE. This results from NIAM treating data one way and other data models another way, with DESCRIPTOR_TYPEs being classified on one side (as GENERAL_ENTITY_TYPEs) by NIAM and on the other side (as ATTRIBUTE_TYPEs) by the ER, EER, ECR, OMT, and CoCoA data models. In fact, an alternative name for GENERAL_ENTITY_TYPE might be NIAM_ENTITY_TYPE, because NIAM is the only data model (known to the author) which uses the ENTITY_TYPE term in that way. I have chosen to use GENERAL_ENTITY_TYPE instead to be less method-specific and to allow for other conceptual data models which might have the same concept. Many of the NIAM concepts may be derived from the other concepts, as indicated by the asterisks in the figures modelling NIAM in Appendix A. Derivations are discussed in Appendix B.

6.4 Integration of Conceptual Data Models

This section illustrates the third step of the enhanced procedure shown in figure 6-1. It describes an integrated model of the conceptual data modelling perspective. Doing so will further illustrate how the CoCoA model supports partially-overlapping components of complex covering aggregate entities - and therefore modelling of complex problem domains. A single CoCoA diagram of the entire conceptual data modelling perspective is unnecessary, difficult to construct, and not particularly useful. The CoCoA models of the data abstraction mechanisms developed in section 6.3 and in appendix B are smaller, more basic subviews of the conceptual data modelling perspective than individual ISMLs. Taken together, they constitute a larger CoCoA model. The names for entity types and relationship types which appear on more than one diagram provide the links between the diagrams. Therefore, it is not really necessary to unite them all on a single figure. Given the large number of entity types and relationship types describing the data abstractions, it is difficult to integrate them all into a single CoCoA diagram. It would also be especially difficult to overlay 6 different ISML views of the same perspective upon that figure. Finally, a single, large, complex figure would be too complex to read with clarity, particularly because some of the clusters of concepts would need to be distributed in order to physically arrange them on a diagram. For these reasons, this section will not develop a single-figure CoCoA model of the conceptual data modelling perspective. However, we can still use CoCoA to more fully integrate and clarify the integrated model. First, we must resolve the terminology. Section 6.4.1 will do so using tables, as was done in chapter 5. Second, section 6.4.2 will review the CoCoA models of the conceptual data modelling ISMLs developed in section 6.2 in light of the resolved terminology and the CoCoA models of the data abstraction mechanisms, resulting in CoCoA models of the conceptual data models which are consistent with those descriptions. Finally, section 6.4.3 will review the overlap of some
of the ISMLs' coverage of some of the data abstractions, in order to clarify their relationships and to classify and generalize somewhat about the conceptual data modelling ISMLs.

6.4.1 Integration of Data Abstraction Mechanisms and Terminology

The first step given in the previous paragraph is to document all the synonyms and develop an integrated terminology, as was done in chapter 5 for the data flow perspective. The synonyms and integrated terminology for the entity and named relationship types are shown in tables 6-1 and 6-2 on the next few pages. The tables group the concepts according to the data abstraction mechanism (in parts a to j of each table) and list them in the order in which the abstraction mechanisms were discussed in section 6.3 and appendix B. Some entity or named relationship type names are repeated in the tables, because some are common to different data abstraction mechanisms. These concepts are shown in italics. Entity types shown within parentheses are not perfect synonyms, but are generalized to or from the integrated term on the row, and therefore provide limited support for the concept. Concepts with asterisks are those in the integrated model which can be derived.
Table 6-1 (a): Entities in models supporting the simple entity data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
Entity
Entity (Entity Type) Entity Type
OMT
CoCoA
Integrated
Entity
(Class)
Entity
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Class
Object Type
Entity Type
General Entity
Type Strong Entity
Strong Entity
Strong Entity Type
Weak Entity
Weak Entity
Weak Entity Type
Attribute
Attribute (Entity Type) Attribute Type
Attribute
Attribute
Attribute
Identifier Label Type
Identifier Type
Descriptor
Descriptor Type Entity Type
(Data Type attribute)
Constraint * Table 6-1 (b): Entities in models supporting the named relationship data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Relationship Relationship Association Named
(Fact type) Relationship Named Relationship Relationship
Type Attributed Named Relationship Type Unattributed Named Relationship Type Role Role Relationship
(Role Type) Role
Role
Role
Named Role Type Single
Participation Named Relationship Role Type Multiple Participation Named Relationship Role Type
Table 6-1 (c): Entities in models supporting the fact type data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Relationship Type (Relationship) (Relationship) (Relationship) (Named
Fact Type (Relationship) Fact Type Relationship)
Reference Type *
Reference Type *
Unary Fact Type *
Unary Fact Type * Description Type *
Relationship Relationship Association Named
(Fact type) Relationship Named Relationship Relationship
Type Role Type
Role Type Referenced Role
Type * Identifier Role Type * Described Role Type * Descriptor Role Type * Role Role Relationship
(Role Type) Role
Role
Role
Named Role Type
Entity Type
General Entity
Type Table 6-1 (d): Entities in models supporting the generalization-specialization data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Generalization Generalization Relationship,
Classification GeneralizationRelationship 150
Generalization Relationship
Relationship Subset
Specialization Relationship
Relationship Single Inheritance Entity Type Multiple Inheritance Entity Type ISA Category *
Specialized Entity Type*
Table 6-1 (e): Entities in models supporting the categorization data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Category
Category Type
Grouping of Grouping of Entities Entities Category
Category Type
Union Relationship
Union
ISA Category *
Specialized Entity Type*
Relationship
Combined Entity Category * Table 6-1 (f): Entities in models supporting the covering aggregation data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Covering Covering Aggregation Aggregation Relationship Type Relationship Aggregation
(Covering
Relationship
Aggregation
Simple Covering Aggregation Relationship) Relationship Type (Covering Complex Covering Aggregation Aggregation Relationship) Relationship Type Table 6-1 (g): Entities in models supporting the objectification data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Nested Fact Objectified Fact Type
Association
Type
as Class
(Relationship) (Relationship) (Association) (Named
Fact Type (Relationship) Fact Type Relationship)
(Entity)
(Class)
(Entity) Entity Type (Entity) General Entity Type
(Entity Type)
Table 6-1 (h): Entities in models supporting the object data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
Entity
Entity (Entity Type) Entity Type
OMT
CoCoA
Integrated
Entity
(Class)
Entity
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Class Attribute
Attribute (Entity Type) Attribute Type
Attribute
Object Type Attribute
Attribute
Operation
Operation Type
Operation Propagation
Operation
Propagation Type Formal Argument Type Formal Argument Type Function Type
Function Type
Procedure Type
Procedure Type
Table 6-1 (i): Entities in models supporting the derivation data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Derivation Rule
Constraint Derivation Derivation Rule
Table 6-1 (j): Entities in models supporting the constraint data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Entity Type
Data Type
(attribute) Constraint * Inter-Fact-Type Participation Participation Constraint
Inter-Fact-Type Constraint Reflexive
Constraint Equality Constraint
Equality Constraint
Exclusion Constraint
Exclusion
Constraint Non-Reflexive Constraint Subset Constraint
Subset Subset Constraint Constraint
Ordered Role Ordered Role Type Combination
Type Combination
Intra-Fact-Type Participation Participation Constraint
Intra-Fact-Type Constraint (*)
Uniqueness Constraint * Occurrence Frequency Constraint * Mandatory Role Constraint * Unordered Role Type Combination
Unordered Role Type Combination
Table 6-2 (a): Relationships in models supporting the simple entity data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Identifies * Describes * Defines * Table 6-2 (b): Relationships in models supporting the named relationship data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
Plays
Plays named relationship role type
(Plays)
Singly plays
(Plays)
Multiply plays
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Plays *
Plays *
(Plays *)
Plays
Plays
Plays * Strongly plays
Strongly plays
Strongly plays
Weakly plays
Weakly plays
Weakly plays Qualifies
Qualifies
Table 6-2 (c): Relationships in models supporting the fact type data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Refers to * Uses to refer * Plays *
Plays role type * Plays referenced role type * Plays identifier role type * Plays described role type * Plays descriptor role type *
(Plays)
(Plays)
(Plays *)
(Plays)
(Plays)
Plays
Plays named relationship role type
Table 6-2 (d): Relationships in models supporting the generalization-specialization data abstraction ))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Generalizes
Generalizes
Generalizes * Generalizes, (Generalizes) Subsets into
Generalizes Singly generalizes
(Generalizes)
(Generalizes)
(Generalizes)
Specializes
Specializes
Multiply generalizes Specializes, Specializes Subsets
Specializes Partitions Partitions Partitions
Table 6-2 (e): Relationships in models supporting the categorization data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Groups
Groups
Groups into Groups into union
Plays *
Plays *
(Plays *)
union
Plays
Plays
Plays *
Plays
Plays named relationship role type
(Plays)
Singly plays
(Plays)
Multiply plays
Table 6-2 (f): Relationships in models supporting the covering aggregation data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Aggregates
(Cover
Simple cover aggregates) aggregates Cover Complex cover aggregates aggregates Aggregates
Cover
Aggregates into
aggregates into into Can share Can share component
component with
with
Table 6-2 (g): Relationships in models supporting the objectification data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
(No named relationships for this data abstraction - specialization only)
Table 6-2 (h): Relationships in models supporting the object data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Propagates
Propagates
From
Propagates from
To
Propagates to
Table 6-2 (i): Relationships in models supporting the derivation data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
Derives
Derives
Derives
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Derives Derives from Derives from *
Table 6-2 (j): Relationships in models supporting the constraint data abstraction
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
ER
EER
NIAM
ECR
OMT
CoCoA
Integrated
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Defines * Equality constrains
Equality constrains
Disjoint constrains
Disjoint constrains
Subset constrains
Subset constrains
Subset constrains
Uses to constrain
Uses to constrain
Uses to constrain
Follows in combination combination
Follows in
Multiplicity constrains Uniqueness constrains * Constrains frequency * Makes mandatory *
6.4.2 Resolving Individual CoCoA Models with the Integrated CoCoA Model

Each of the conceptual data models considered in this chapter (ER, CoCoA, EER, NIAM, ECR, and OMT) has been previously modelled using CoCoA in either section 6.2 or appendix A. This section reviews those initial CoCoA models to see what changes need to be made to resolve them with the integrated terminology for conceptual data modelling developed earlier in this section, as well as with the CoCoA models of the data abstraction mechanisms developed in section 6.3. For example, some concepts will now be derived, rather than stored. Some of the concepts will be renamed to be consistent with the integrated terms. The old names may need to be noted as aliases. This section briefly reviews the models and finalizes their description consistent with the integrated CoCoA model of the conceptual data modelling perspective. Sections 6.4.2.1 and 6.4.2.2 review and revise the CoCoA diagrams of the ER and CoCoA models. The EER, NIAM, ECR, and OMT models are reviewed in Appendix C.

6.4.2.1 The ER Model

All entity types, relationship types, attribute types, and role types in the initial ER model have a direct equivalent in the integrated conceptual data model. A CoCoA model of the ER model which is integrated with the integrated CoCoA model of conceptual data modelling differs from the initial CoCoA model of the ER model as shown in figure 6-3 only in its use of the integrated, consistent terminology. This terminology was developed mainly in figures 6-5 and 6-6 and summarized in tables 6-1 (a and b) and 6-2 (a and b). The changes to figure 6-3 are shown in figure 6-10.
Figure 6-10: Integrated CoCoA Model of the ER Model

The changes from figure 6-3 are that the names are given the _TYPE suffix so that the meta-level of the CoCoA model is clear. For example, ENTITY is renamed ENTITY_TYPE. RELATIONSHIP is called NAMED_RELATIONSHIP_TYPE and ROLE is called NAMED_RELATIONSHIP_ROLE_TYPE to differentiate them from different kinds of relationships and roles.

6.4.2.2 The CoCoA Model

Figure 6-11 shows the CoCoA model of CoCoA after it is made consistent with the integrated CoCoA model of the conceptual data modelling perspective. All of the entity
types, named relationship types, attribute types, and role types in the initial CoCoA model of CoCoA (see figure 6-4), like the ER model, have direct equivalents in the integrated conceptual data model. Also like the ER model, their names must be changed to be consistent with the integrated terminology. E.g., GENERALIZATION_RELATIONSHIP
and AGGREGATION_RELATIONSHIP from figure 6-4 are renamed respectively as GENERALIZATION-SPECIALIZATION_RELATIONSHIP_TYPE and COVERING_AGGREGATION_RELATIONSHIP_TYPE. Additionally, the DERIVED attributes of ENTITY_TYPE, NAMED_RELATIONSHIP_TYPE, and ATTRIBUTE_TYPE are made to be derived (as shown by the asterisks), as are the MIN and MAX attributes of the NAMED_RELATIONSHIP_ROLE_TYPE entity.

Figure 6-11: Integrated CoCoA Model of the CoCoA Model

6.4.3 Higher Levels of Conceptual Data Modelling

Finally, we can take a look at the various models themselves and how they can fit into a generalization hierarchy of conceptual data modelling. This is done with the CoCoA model in figure 6-12. CONCEPTUAL_DATA_MODELs are here divided into three subtypes according to what concepts they are based on. The different conceptual data models discussed are shown in the type hierarchy according to the group to which each belongs.
Figure 6-12: Generalizations of Conceptual Data Models

Figure 6-12 also includes the generalization hierarchy from figure 6-9 in order to show how the different model entity types' components overlap. OBJECT-BASED_MODELs and ER-BASED_MODELs are very similar in components, except that the former has OBJECT_TYPEs as components and the latter has ENTITY_TYPEs. Note that FACT-BASED_MODELs, however, take a significantly different view. They do not distinguish between DESCRIPTOR_TYPEs and ENTITY_TYPEs; instead NIAM lumps them both together as a GENERAL_ENTITY_TYPE (actually called just "entity type" in NIAM). FACT-BASED_MODELs also do not distinguish ATTRIBUTE_TYPEs. In contrast, the difference between ATTRIBUTE_TYPEs and either ENTITY_TYPEs or OBJECT_TYPEs is a primary concern in ER-BASED_MODELs and OBJECT-BASED_MODELs. As a minor note, not all OBJECT-BASED_MODELs or ER-BASED_MODELs differentiate between IDENTIFIER_TYPEs and DESCRIPTOR_TYPEs.

6.5 Summary

In this chapter we have applied the CoCoA model to conceptualizing, understanding, and integrating conceptual data modelling itself. This has provided us with a more formal definition of the CoCoA model. It has also provided an extensive example of the use of CoCoA and its power in helping to model complex problem domains.
Part III A Possible Implementation Approach
Chapter 7 A Possible Internal Implementation: Objects and Relations
The purpose of chapters 7 and 8 is to show a possible high-level software design (architecture) for information systems (specifically, a CASE tool and environment) which directly supports the semantics of the CoCoA conceptual data model and thereby systems which support work in complex problem domains. The 4-level software architecture proposed in this dissertation (see section 7.1 and figure 7-1) progresses from the software architecture for CASE tools and environments proposed in [Klein and Venable]. Chapter 7 discusses the lower two levels and mappings to them from the CoCoA model, as well as their interface to each other. The overall architecture is introduced in section 7.1. Section 7.2 discusses implementation of CoCoA models in the bottom level, using a relational database. Section 7.3 discusses implementation of CoCoA in basic objects, the second layer. Section 7.4 discusses the interface between the lower two levels and section 7.5 provides an example. Chapter 8 will discuss the upper two levels.
7.1 Overview of the Architecture The proposed architecture is designed to support sharing of information between individual information systems (IS) which support individual views within a greater environment that supports work in a complex problem domain. The vision is of a federation of interacting IS, each supporting different aspects or views of the complex problem domain. The individual IS communicate (at least partially) via shared data about the concepts of the complex problem domain which is stored on secondary storage.
This general approach is widely advocated and not particularly novel. However, the architecture provides full support for CoCoA and therefore its ability to handle complex problem domains. It is somewhat novel in that individual IS do not necessarily share the same presentation of the shared concepts to their respective users. Additionally, the architecture provides some support for reusability at various levels of the resulting IS. The proposed IS architecture encompasses internal memory structures and operations on objects within the individual information systems, as well as the structure of secondary storage which is shared. The architecture has four levels (see figure 7-1). From lowest to highest, the levels are 1) secondary storage (a repository shared among all tools), 2) objects modelling the concepts of the complex problem domains (in-memory) with primitive/fundamental operations, 3) objects providing the users' cognitive-level actions on the concepts of the problem domain, and 4) the depiction of the concepts to the IS user and the dialogue mode (e.g., command, pop-up menus, etc.) of the IS. Each of these levels is now discussed from the bottom up.
Figure 7-1: Proposed IS software architecture

 +----------------------------------------------------------------+
 | Main Level: Level 4
 |   - Implements user representation of concepts
 |   - Implements the interaction mode
 |   - Manages the user dialogue
 +----------------------------------------------------------------+
                        |
                        |  Calls
                        v
 +----------------------------------------------------------------+
 | Cognitive Level: Level 3
 |   - Implements user actions on the concepts
 |   - Independent of user representation
 |   - Independent of physical implementation of concepts
 +----------------------------------------------------------------+
                        |
                        |  Calls
                        v
 +----------------------------------------------------------------+
 | Low Level Objects: Level 2
 |   - Implements primitive concepts
 |   - Implements in-memory data structure
 |   - Implements rudimentary operations on concepts
 +----------------------------------------------------------------+
                        |
                        |  Uses
                        v
 +----------------------------------------------------------------+
 | Secondary Storage Level: Level 1
 |   - Persistent storage of concepts, e.g., IS model concepts
 |   - Integrated schema will allow data sharing with different tools
 +----------------------------------------------------------------+
The bottom level (level 1 in figure 7-1) provides long-term storage (or persistence) of
the objects in the higher levels of the architecture. Data is shared between individual IS or IS components through this level. Therefore, level 1 is not just a level in an individual IS, but also part of what groups individual IS into a larger federation of systems within the larger context of the complex problem domain. Each individual IS supports a subschema from the overall integrated schema for the complex problem domain. Typically, an IS would support one of the entities at a higher level of granularity, e.g., an IS model in a CASE system. This dissertation recommends that this level be implemented using a relational database (RDB, see section 7.2), because an RDB effectively supports normalized storage of data. This is critical to ensuring that various anomalies do not occur in supporting complex problem domains. First, databases for complex problem domains must support access to objects at various levels of granularity. At arbitrary times, it may be desirable to view entities or objects from a very detailed point of view, a very aggregated point of view, or at some level in between. For example, in a CASE environment, we may wish to see a detailed attribute definition, no attributes at all - only entities and relationships, or which attributes are associated with which entities. Second, the sharing of some finer-grained objects by several different covering aggregations (i.e., with overlapping complex objects) complicates the requirements. For example, the attributes of an entity may also be used in database DDL definitions, as variables within programs, flow charts, minispecs, etc., in definitions of data flows, or in other places. A relational database can be used to ensure that shared objects are not deleted if they are components of more than one other object -- or to ensure deletion when they are not. In systems in which complex objects are stored monolithically, or otherwise in a non-normalized fashion, this is not possible. This problem is not addressed by the relational model itself, but relations can be included in the schema to explicitly model the needed constructs (see discussion in section 7.2).
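To make this check concrete, the following is a minimal sketch (in Python with the standard sqlite3 module, used here purely for illustration) of how a normalized covering relation allows a tool to test whether a fine-grained object is still covered elsewhere before deleting it. The table and column names (Covers, Component, Surrogate_Key, Covering_Key, Covered_Key) are hypothetical; the covering relations actually recommended for CoCoA are defined in section 7.2.5.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Component (Surrogate_Key INTEGER PRIMARY KEY, Name TEXT)")
con.execute("CREATE TABLE Covers (Covering_Key INTEGER, Covered_Key INTEGER)")

def delete_component_if_unshared(con, covered_key):
    # Count the complex objects that still cover this component.
    (n,) = con.execute(
        "SELECT COUNT(*) FROM Covers WHERE Covered_Key = ?", (covered_key,)
    ).fetchone()
    if n == 0:
        # No remaining covering aggregation refers to it, so the
        # component's own tuple may safely be deleted.
        con.execute("DELETE FROM Component WHERE Surrogate_Key = ?", (covered_key,))

A monolithically stored complex object offers no such shared vantage point; the count above is only meaningful because every covering aggregation is recorded as a separate, normalized tuple.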
The middle two levels (levels 2 and 3 in figure 7-1) together include the in-memory data structure for the internal representation of the concepts of a single view within a complex problem domain and a set of capabilities for manipulating it. Level 2 includes the in-memory data structure and a minimum set of primitive capabilities (operations) for manipulating the data structure at a detailed level. Example primitive operations on a data flow diagram in a CASE tool might be creating a data flow, giving it a name, or linking it to a process. Level 3 overlays more complex capabilities onto level 2, called cognitive-level operations. Cognitive-level operations are at the level at which the user would think about manipulating the concepts and which the user would invoke directly. Example cognitive-level operations on a data flow diagram might be to move a data flow connection from one process to another, to define a process in terms of subprocesses (explosion), or to group several processes into a single process (and possibly to push them to a lower level data flow diagram). It is a hypothesis that cognitive-level operations on a model can be built independently of the form of the user representation of the concepts and therefore reused. The capabilities of both levels 2 and 3 could be implemented as software "objects" using object-oriented concepts, or as abstract data types (ADTs) with other programming concepts, depending on the implementation language to be chosen. See section 7.3. An "object" here represents a concept abstracted from a complex problem domain, such as the IS models described using CoCoA in chapters 5 and 6. The uppermost level (level 4 in figure 7-1) provides the user interface. It depicts the concepts in a user representation and implements the dialogue mode. This level could make use of services other than those shown here, such as a windowing system, a graphics package, a user interface management system (UIMS), or some other objects. Level 4 is the only level which is specific to a particular IS; only this level makes the
concepts visible to the user and provides the means to use information about them. This clearly separates the implementation of the concepts of the complex problem domain themselves (in levels 3 and below) from their depiction and the user interface (in level 4) within the architecture. Clearly separating the concepts of the problem domain being supported from the depiction and the rest of the user interface offers advantages. First, the lower three levels are fully reusable for any other IS which share the same viewpoint (i.e., that express the same semantics, even if they are depicted differently). E.g., two CASE tools supporting the same concepts (the same perspective) but depicting them differently could be one for data flow diagrams and one for SADT activity diagrams. Effective and easy reuse lowers development costs. Second, by building the three lowest levels of the IS tool once and reusing them, we can free the tool developer to concentrate on level 4 in order to more effectively depict the concepts in the viewpoint to the tool user and design intuitive, easy-to-use means for tool users to invoke the operations (provided by levels 2 and 3). 7.2 Implementation of CoCoA in the Relational DB Layer A relational database (RDB) was recommended above for implementing level 1 of the proposed architecture (figure 7-1). This section briefly summarizes the recommended means by which CoCoA models may be implemented within an RDB. [Venable 1993a] discusses the possibilities in detail. This section will use the term "relational attribute" to mean an attribute of a relation or tuple within the relational model, in order to distinguish one from an attribute of an entity in the CoCoA model (or in another conceptual data model). 7.2.1 Entity Types As discussed in [Venable 1993a], an entity type will have one relation corresponding to it. Each tuple corresponds to a single entity of that type. Such a relation can be called an "entity relation."
In the basic relational model, the domain of each relational attribute must be simple or atomic. Multi-valued relational attributes are not allowed. Therefore, any multi-valued attributes must be implemented in a separate relation. 7.2.2 Named Relationship Types In the relational model, named relationship types are expressed through relational attributes which have domains that can be considered to be keys or identifiers to entity relations [Venable 1993a]. Two different implementation methods are possible depending on a named relationship type's multiplicity, either a dedicated relation or embedded foreign keys [Venable 1993a]. This dissertation recommends using dedicated relations and they will be used in all examples. A dedicated relation is called a "relationship relation." Each tuple in a relationship relation is called an "intersection record" and corresponds to an instance of the named relationship type. There is at least one relational attribute for each role in the named relationship; binary named relationships will have at least two relational attributes in their corresponding relationship relation, and higher-degree relationships will have correspondingly more. Each relational attribute in an intersection record contains the key or identifier of the entity which plays the role, i.e., the key or identifier of the corresponding tuple in the entity relation. If a concatenated key is used, then a relational attribute is needed for each part of the key. Relationship relations are recommended for implementation for several reasons. First, named relationship types of any multiplicity can be expressed with a dedicated relation. N:M named relationship types must be expressed this way. Second, it will be easier to express the complex covering aggregation of relationships into complex entities if they are so represented. Third, the semantic gap between the conceptual model and its relational implementation is minimized. There is a natural correspondence between the relationship type and a relationship relation.
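As an illustration of entity relations and relationship relations, the following sketch (Python with sqlite3, purely illustrative) sets up a hypothetical EMPLOYEE entity relation and a relationship relation for the binary employee MANAGES employee relationship used as an example later in section 7.3.1.2. One relational attribute is provided per role (MANAGER and MANAGED); the surrogate keys anticipate section 7.2.6, and all table and column names are assumptions made for the example.

import sqlite3

con = sqlite3.connect(":memory:")

# Entity relation: one tuple per EMPLOYEE entity.
con.execute("""CREATE TABLE Employee (
    Surrogate_Key INTEGER PRIMARY KEY,   -- see section 7.2.6
    Name          TEXT)""")

# Relationship relation: one tuple (intersection record) per instance of
# the MANAGES named relationship, with one relational attribute per role.
con.execute("""CREATE TABLE Manages (
    Surrogate_Key INTEGER PRIMARY KEY,
    Manager_Key   INTEGER REFERENCES Employee(Surrogate_Key),
    Managed_Key   INTEGER REFERENCES Employee(Surrogate_Key))""")

con.execute("INSERT INTO Employee VALUES (1, 'Pat'), (2, 'Lee')")
con.execute("INSERT INTO Manages VALUES (10, 1, 2)")  # intersection record: Pat MANAGES Lee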
An important issue for relationally implementing CoCoA named relationship types is how to deal with roles that may be played by more than one entity type. This situation occurs because CoCoA treats categories implicitly. Therefore, it is discussed in section 7.2.4 below on categories. 7.2.3 Generalization-Specialization Relationships In the relational model, each tuple in a relation must have the same attributes as all other tuples. Different generalized and specialized entity types typically have different attributes. Therefore, different relations are needed to implement entity types with different attributes - including specialized and generalized entity types. [Venable 1993a] discusses in detail different options for implementation and placement of attributes in these tables. This dissertation and its examples use an implementation with attributes placed in the most generic entity relations. In this implementation, each entity relation implementing a generic entity type must include at least one relational attribute which indicates a specialization of the entity. This will often be a partitioning attribute from the problem domain. For example, a "powered by" attribute in a vehicle entity relation indicates which specialization a particular vehicle belongs to, e.g., "motor", "human", or "horse". A separate entity relation will exist for each possible value of an attribute which indicates a specialization, i.e., there will be an entity relation for each specialized entity type. E.g., there should be horse-powered vehicle, motor-powered vehicle, and human-powered vehicle entity relations. Links are made between the specialized entity relation(s) and the generic entity relation because both contain the same key or identifier relational attribute. Its value in a particular specialized entity's tuple is the same as for a corresponding tuple in the generic entity relation. The only other relational attributes in the specialized entity relation will be those which are particular (specific) to the specialized entity type. For example, the motor-powered vehicle relation might have the attributes: vehicle number (the key),
horsepower, and fuel. To obtain all the attributes of any specialized entity, joins are necessary to combine the attributes of the specialized and generic entities (a relational sketch of this vehicle example appears at the end of section 7.2.6). 7.2.4 Categories CoCoA treats categories implicitly by allowing roles in named relationships to be played by more than one entity type. There are two problems with this when it comes to relational implementation. The first is that it is not obvious which entity relation will contain the entity with the key shown in the relational attribute for the role. Some means must exist for finding out the entity type of the entity playing the role. The second problem is that the domains of the keys of the different entity types which participate in the role may be different. The relational model requires that a relational attribute have a single domain. The first of these two problems is not that serious. The second may seriously complicate implementation. [Venable 1993a] discusses four possible implementation methods that address the first problem. The alternative used here is to create a separate "category relation." This is like a generic entity relation, except it has only two relational attributes, the key relational attribute for the category member and an attribute which indicates the specialization-like category member entity type. The latter relational attribute can be used to look up the entity type of the category member. Solutions to the second problem are also discussed in [Venable 1993a]. The choice taken here is to use the same surrogate, system-generated keys to implement the identifier links between the relationship relation, the category relation, and the category members' entity relations. This sidesteps the problem, since all keys are then drawn from a single surrogate domain. 7.2.5 Covering Aggregation This section will briefly discuss the means used in this dissertation for relational implementation of covering aggregation relationships. [Venable 1993a] discusses the possibilities in detail. Section 7.2.5.1 will discuss implementation of the covering of
entities. Section 7.2.5.2 will discuss implementation of the covering of named relationships. 7.2.5.1 Covering Entities In the relational model, covering aggregation of entities (simple covering) can be expressed either as an explicit "covering relation" or with embedded foreign key attributes. [Venable 1993a] discusses these possibilities. Like explicit relationship relations, covering relations will be used here because the degree of the covering relationship will usually preclude embedding foreign keys in the covering object, i.e., there will usually be the possibility of more than one component of a covered entity type. Each covering relation has two relational attributes, one for the key of the covering entity and one for the key of the covered entity. We use one covering relation per entity type covered. Multiple covered entity types require a covering relation each. 7.2.5.2 Covering Relationships As established in [Venable 1993a], we must explicitly indicate in the relational data base when an instance of a relationship type is covered by a particular entity. [Venable 1993a] also discusses the implementation options. As for named relationship types and simple covering, the embedded foreign key alternative will not be used here. Instead, we implement covering aggregation of named relationships (complex covering) by combining the implementation of named relationships (as relationship relations) with the implementation method for covering of entities (covering relations).
First, we add a surrogate, system-generated key relational attribute to the relationship relation which implements the covered named relationship type. This is used to identify the tuple representing the named relationship instance which is covered. Then, we use an explicit covering relation to identify the relation tuple(s) (intersection records) which represent the named relationship instances which are covered. 7.2.6 Surrogate Keys
System-generated surrogate keys may be useful in simplifying the implementation of the relational storage. This dissertation proposes using a surrogate key relational attribute as the key to each tuple in both entity and named relationship relations in the relational database implementation (level 1 of the architecture). Doing so puts all relational attribute keys into the same, guaranteed-unique domain. This solves the problem of different key domains for different entity types in a category.
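The following sketch pulls sections 7.2.3 through 7.2.6 together for the vehicle example, again in Python with sqlite3 and again with illustrative names only: the Owner category is borrowed from section 7.3.1.4, Fleet is an invented complex entity type used solely to show the shape of a covering relation, and the column names are assumptions.

import sqlite3

con = sqlite3.connect(":memory:")

# Generic entity relation with a partitioning attribute (section 7.2.3).
con.execute("""CREATE TABLE Vehicle (
    Surrogate_Key INTEGER PRIMARY KEY,
    Powered_By    TEXT  -- 'motor', 'human', or 'horse'
    )""")

# Specialized entity relation: the same surrogate key plus only the
# attributes specific to the specialization.
con.execute("""CREATE TABLE Motor_Powered_Vehicle (
    Surrogate_Key INTEGER PRIMARY KEY REFERENCES Vehicle(Surrogate_Key),
    Horsepower    INTEGER,
    Fuel          TEXT)""")

# Category relation (section 7.2.4): the member's surrogate key plus an
# attribute naming the member's entity type.
con.execute("""CREATE TABLE Owner (
    Surrogate_Key      INTEGER PRIMARY KEY,
    Member_Entity_Type TEXT  -- e.g. 'Person' or 'Corporation'
    )""")

# Covering relation (section 7.2.5): one tuple per covering/covered pair.
con.execute("""CREATE TABLE Fleet_Covers_Vehicle (
    Covering_Key INTEGER,
    Covered_Key  INTEGER REFERENCES Vehicle(Surrogate_Key))""")

# All attributes of a specialized entity are obtained by joining the
# specialized and generic entity relations on the shared surrogate key.
rows = con.execute("""SELECT v.Surrogate_Key, v.Powered_By, m.Horsepower, m.Fuel
                        FROM Vehicle v
                        JOIN Motor_Powered_Vehicle m
                          ON v.Surrogate_Key = m.Surrogate_Key""").fetchall()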
7.3 Implementation of CoCoA in the Object-Oriented Layer Level 2 of the architecture shown in figure 7-1 may be implemented as objects using an object-oriented language system. An object-oriented implementation includes both data representation in instance variables and also operations. This section illustrates how both the data representation and the operations of objects in level 2 in the proposed architecture can be derived directly from a CoCoA model. These features (both instance variables and operations) serve as the interface to the low-level objects in level 2, which are used by the user-conceptual level objects in level 3 of the proposed architecture. Section 7.3.1 briefly describes the object-oriented data representation of CoCoA models. Section 7.3.2 introduces the operations needed on such objects. Level 3 of the architecture will not be discussed until chapter 8. 7.3.1 Data Representation Each of the 5 data abstractions supported by CoCoA ((1) entity types, (2) named relationship types, (3) generalization-specialization relationships, (4) categorization, and (5) covering aggregation) must be represented within level 2 of the architecture of figure 7-1 by instance variables encapsulated within objects. [Venable 1993b] discusses this in detail. This section is organized to briefly summarize a recommended object-oriented data representation of each of the above 5 data abstractions in sections 7.3.1.1 through 7.3.1.5 respectively. Section 7.3.1.6 addresses use of surrogate keys in the data 173
representation and section 7.3.1.7 summarizes this entire section. 7.3.1.1 Entity Types In the object-oriented model, simple (cartesian) aggregation of attributes is expressly supported. Object types are defined by specifying instance variables, similar to a record structure. Cartesian aggregation (of instance variables) is therefore a primary method for defining an object type and each object is a cartesian aggregation of attributes (along with operations). Therefore, each entity type will have a corresponding class, which will contain an instance variable for each attribute type. 7.3.1.2 Named Relationship Types Unlike entity types, the object-oriented model does not support named relationships with a dedicated construct. Named relationships are instead expressed with instance variables. The instance variables can be either pointers to other objects or object identifiers (leaving the object-oriented system to locate the object), depending on the object-oriented language system used. In the discussion below, wherever the term pointer is used, an object identifier could be used instead. As discussed in [Venable 1993b], there are two alternatives for using these instance variables, direct references and special named relationship objects. Direct reference implementation uses pointer(s) within one participating object to refer directly to the other participating object(s). With named relationship objects, the instance variable within the participating object(s) refer to a special named relationship object instead of directly to the other objects participating in the relationship. This named relationship object contains pointers to all the participating objects. The named relationship object implementation is chosen here. A named relationship object resembles an intersection record in the relational implementation. It contains one instance variable for each role in the relationship. Each instance variable refers to the object that plays that particular role. 174
The instance variables within the participating objects refer to this named relationship object instead of directly to the other objects participating in the same relationship. Each participating object only has to refer to the named relationship object no matter what the degree; i.e., objects participating in higher degree relationships use single instance variables instead of pairs, triples, and so on. If the maximum participation of the entity type is greater than one, then the pointer instance variable in the object class implementing the entity type will need to be implemented as a set or collection of pointers instead. Separate pointers within the participating objects are needed for each role played by the entity type. For example, consider a 1:N binary relationship, employee MANAGES employee, where an employee can play either the MANAGER or MANAGED roles. To an employee object, this is as if it can participate in two relationships. In this example, a set of pointers is needed for the MANAGER role and a single pointer is needed for the MANAGED role. There are several advantages to using named relationship objects. First, they can address potential problems of referential integrity in the implementation of the named relationship. For example, a problem may arise with direct reference pointers if one of the two objects is deleted and the other's pointer to it isn't changed. Similarly, if direct reference pointers are used reflexively, such a relationship implementation needs to be maintained, i.e., participating objects should always point to each other [Rumbaugh]. If one object's pointer is changed, but the other isn't, an inconsistency arises. A dedicated named relationship object centralizes responsibility for and can ensure referential integrity. Second, the named relationship object implementation reduces the semantic gap between the CoCoA model and its object-oriented representation. Third, it will simplify the data representation (and processing) of covering aggregation of named relationships, i.e., of complex covering aggregation. 175
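The following is a minimal sketch of the named relationship object alternative for the employee MANAGES employee example, written in Python purely for illustration; the class and attribute names are assumptions, not part of the CoCoA definition. It corresponds to the data representation just described and foreshadows the Create and Delete relationship operations of section 7.3.2.2.

class Employee:
    """Object class implementing the EMPLOYEE entity type."""
    def __init__(self, name):
        self.name = name
        # Role instance variables refer to Manages relationship objects,
        # never directly to the other Employee.
        self.manages = set()      # MANAGER role: maximum cardinality N, so a set
        self.managed_by = None    # MANAGED role: maximum cardinality 1, single reference

class Manages:
    """Named relationship object: one instance variable per role."""
    def __init__(self, manager, managed):
        self.manager = manager
        self.managed = managed
        # The relationship object updates both participants, which is what
        # lets it centralize responsibility for referential integrity.
        manager.manages.add(self)
        managed.managed_by = self

    def delete(self):
        # Removing the relationship updates both participants so that
        # neither is left with a dangling reference.
        self.manager.manages.discard(self)
        if self.managed.managed_by is self:
            self.managed.managed_by = None

# Usage: one instance of the named relationship between two employees.
alice, bob = Employee("Alice"), Employee("Bob")
rel = Manages(manager=alice, managed=bob)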
7.3.1.3 Generalization-Specialization In the object-oriented model, specialization is supported directly, primarily to allow "inheritance" of type definitions when creating new object types. This makes definitions of new types quite easy. It also follows the use of specialization in CoCoA directly. Implementation should follow the generalization-specialization structure directly, if possible. A class should be created for each entity type on the diagram. Object types should inherit from those directly above them in the generalization-specialization structure and should add the attributes where they are shown in the conceptual data model. However, a potential problem is that many object-oriented systems do not allow multiple inheritance. If only single inheritance is permitted, mapping a CoCoA conceptual model with a multiple-parent generalization-specialization structure will be much more difficult. It is therefore recommended here that an object-oriented system which supports multiple inheritance be used. However, since inheritance is mostly used to make it easier to implement data structures and operations, other means can be used to give the same effect. The same placement and structuring of attributes and methods within object classes can be achieved using explicit declaration to supplement whatever inheritance can be used. 7.3.1.4 Categories Object-oriented implementation of categories is discussed in [Venable 1993b]. In the object-oriented model, categorization is not supported directly, hence there will be some semantic gap if we need to use this abstraction. Note again that categories are not needed if generalization and specialization will suffice. If the system permits multiple inheritance, implementation is easy. In this case, a category can be implemented by just converting it to a generalization and implementing a class for the category utilizing the object-oriented mechanisms discussed in section
7.3.1.3 above. This obviates the need for a separate category construct. The implementation of the relationship in which the category plays a role is done at the level of the generic (category) object type. The relationship pointer instance variable is then inherited by each class which implements a member of the category. Any operations relevant to the named relationship (discussed in section 7.3.2 below) should also be implemented at the parent "category object" level and inherited by the category members. Consider the example of the owner category which has person and corporation as members. An object class "owner" should be created with instance variables which implement its role in the "owns" relationship. Both "person" and "corporation" should then inherit from "owner" to have the attributes and methods to participate in the relationship. If the object-oriented programming system does not support multiple inheritance, some other mechanism must be constructed. One possibility is to construct a category object type. The category must then be used as an instance variable by each object type which participates in a relationship in the role implemented by the category. 7.3.1.5 Covering Aggregation [Venable 1993b] reviews methods for object-oriented implementation of covering aggregation. As with the relational implementation, implementation of covering aggregation in the object-oriented model follows in the same manner as that of named relationships. Implementation of covering aggregation is simpler than that of some named relationships, though, because covering aggregation is always a binary relationship. This section addresses simple covering aggregation (of entity types) first, then complex covering aggregation (of named relationship types). As discussed in [Venable 1993b], there are three alternatives for object-oriented implementation of simple covering aggregation: (1) instance variables that are direct references (either pointers or object identifiers), (2) a special covering aggregation object,
or (3) using the objects themselves. Because covering aggregation is a transitive relationship, implementation using a special covering aggregation object type is not an attractive implementation alternative. Using direct references or the component objects themselves is better. In this dissertation we will use direct references. To use direct references for implementing simple covering aggregation, if the maximum cardinality of the covering relationship is 1 (only one entity can be covered), we can use a single instance variable within the covering entity's object to point to the object implementing the entity that is covered. However, usually the maximum cardinality is n for the covering entity type's role. Therefore, we must usually use a set (or collection) of pointer instance variables. One instance variable (or set) is required per type of object covered. Like named relationships, covering aggregation relationships can be implemented reflexively, with pointers in both the covering and covered objects. Like simple covering aggregation, complex covering aggregation can be implemented by embedding pointers (or sets of pointers if more than one instance of a named relationship type can be covered by an entity type) to the covered named relationships within the covering objects. In this case, though, the embedded pointer points to the special named relationship object which implements the named relationship type. This is then identical to the mechanism for implementing simple covering as discussed above. A named relationship object can also contain its own reflexive instance variables for pointing to the object(s) which cover it. 7.3.1.6 Surrogate Keys In addition to the surrogate keys for the relational implementation, this dissertation further proposes using the same surrogate key as an instance variable within each object in the object-oriented system (levels 2 and 3 of the architecture, see figure 7-1). Having instance variables in the objects representing entities and named relationships which correspond to the key relational attributes in each entity and named relationship relation
in the relational database (level 1 of the architecture) will ease the problems of moving data between the object-oriented and relational levels of the architecture. When an object is created, a new surrogate key could also be created. This surrogate key should only be changeable when retrieving an existing object from a relational database (see operations below) or making a deep_copy operation. If this surrogate key can also be used as an object identifier (not just a pointer) in the object-oriented system, then it can also be used to implement relationships to other objects in the object-oriented levels of the architecture. When the objects are saved into the relational level of the architecture, as described below, the same object identifier or surrogate key can be the relational key attribute used to implement the relationships in the relational model. 7.3.1.7 Summary Standard means for implementing the conceptual data modelling abstractions as data structures from the object-oriented point of view have been presented. Simple entity types are mapped to objects using the standard object-oriented class with its instance variables. Named relationships should be implemented as dedicated named relationship objects, especially if the named relationship is to be covered. Generalization and specialization are implemented using the inheritance provided by object-oriented systems, preferably with multiple inheritance. Categories can be implemented just like generalized entity types, with category members inheriting the relationship from the class implementing the category. Covering aggregation relationships are implemented with embedded covering pointers. These will be treated specially by the operations discussed in section 7.3.2. The use of surrogate keys to improve the interface of levels 1 and 2 was also discussed. 7.3.2 Fundamental Operations An object-oriented system combines both data structures and the allowable 179
operations on them. The objects we are discussing here fit within level 2 of the proposed architecture (see figure 7-1). Most of these objects' classes' data structures are derived directly from the CoCoA model of the problem domain being considered, as described in section 7.3.1. All that remains is to specify the needed operations. This section briefly introduces the kinds of elementary operations that are needed to support the implementation of the CoCoA model's abstraction mechanisms. The idea of fundamental operations and some of the content of the following discussion follow from [Coad and Yourdon] and [Smolander]. See [Venable 1993b] for a more detailed comparison and review. Appendix D discusses the needed operations in more detail, relating them to the data implementations discussed in sections 7.2 and 7.3.1. [Venable 1993b and 1993c] provide more complete and detailed discussion on implementation means. This section does not show all of the operations needed. Operations which are tied more closely to the problem domain are not discussed here. These will have to be determined using the techniques described in various object-oriented design methods. The operations which are described here may be considered services which may in turn be used by more problem-domain-oriented operations. This section also discusses additional operations which are needed to interface the object-oriented levels to the relational level in the proposed tool architecture. The need for some of the operations identified here will depend on the object-oriented language system used for implementation. Some of these operations are implicitly handled in some language systems and must be constructed explicitly in other language systems. This section is organized to discuss operations according to the CoCoA data abstraction they most directly support. Operations are not directly needed to support generalization-specialization relationships or for categorization (although these must be considered in how the other operations work). Operations are needed to support the
object-oriented implementation of entity types, named relationship types, and covering aggregation. These operations are introduced in that order in sections 7.3.2.1 through 7.3.2.3. Additionally, operations are needed on the objects in level 2 to provide the interface to level 1. These are introduced in section 7.3.2.4. 7.3.2.1 Operations to Support Entities and Attributes There are four basic operations needed to support entities and their attributes. The actual style of parameters used depends on the object-oriented language style. In many object-oriented systems, the object receiving the message is the first argument of the method; in others, a dot notation is used. This chapter assumes the latter. 1) Create_Object (Object_Type, Attribute_Value_List) This is the normal object-oriented instantiation operation. It may optionally initialize the instance variables (attributes) with appropriate values. Some object-oriented languages provide a Create_Object operation implicitly while others require one to be written. In some systems this is invoked by sending a message to the class, in others by sending it to the object itself. The same operation is used to create either simple or complex objects. Complex objects are initially created without any components. 2) Value_of_Attribute_xxx This is the normal object-oriented function operation to read the value of an instance variable of an object. For each attribute of each object, this operation is established, replacing Attribute_xxx with the attribute name. Depending on the object-oriented language system used, this operation will be implicit and a reference to the attribute name may suffice. 3) Modify_Attribute_xxx (New_Value) This is the normal object-oriented operation to assign a new value to an object's instance variable. For each attribute of each object, this operation is established, replacing Attribute_xxx with the attribute name. In those object-oriented
languages/systems which allow direct access to attributes (instance variables), this operation also will be implicit. 4) Delete_Object This is the normal object-oriented operation to throw away an unneeded object and free up the space it used. In object-oriented language systems with garbage collection, this operation may be optional, but not required. When an object is deleted, any relationships that the object participates in will also have to be deleted (see [Coad and Yourdon], [Smolander], and [Venable 1993b]) to preserve referential integrity. Deleting complex objects implies that any non-shared, existence dependent component object(s) must also be deleted. Finally, if an object being deleted is a component of another object, the operation must ensure that the covering aggregation relationship between it and the complex object is also deleted. 7.3.2.2 Operations to Support Named Relationships There are three fundamental operations needed to support named relationships. For each named relationship type, there will need to be create, delete, and traverse operations. Additionally, a fourth "is_related" function can be provided to examine the existence of a named relationship. The specific interface to each operation here can come in one of several different forms depending on whether 1) the entity type could play more than one role in the relationship, 2) the degree of the relationship, and/or 3) the maximum cardinality of the entity's participation. These forms are discussed in more detail in Appendix D and [Venable 1993b]. Whatever the form, the names of these operations need to be further elaborated according to the name of the relationship and/or role played, i.e., the Relationship_Role_xxx shown below must be replaced with the relationship name and/or role name from the CoCoA model. 5) Create_Relationship_Role_xxx(Other_Participating_Object_List)
This operation creates an object-oriented representation of a new instance of a named
relationship; it causes the appropriate data structures to implement the named relationship to be created and appropriate values assigned to them. Before the operation is invoked, it should be established that multiplicity constraints on the named relationship will not be violated. The role name need not be included in the operation name if the entity type which the object implements plays only one role in the named relationship. If more than one role is implemented in the object, then separate operations (with different names) are needed for each. 6) Delete_Relationship_Role_xxx (Other_Participating_Object_ID_List)
This operation eliminates an existing relationship between two (or more) objects. As for the Create_Relationship operation, the role need not be included in the operation name if the object only implements one role in the relationship. The operation should fail if attempting to reduce the cardinality of the relationship below the minimum. The named relationship must also exist before it can be deleted; otherwise, it is an error to delete it. If the relationship being deleted is a component of any complex object(s), any covering aggregation reference(s) to or from the complex object(s) must also be removed. If the object can only participate once in the given role in the named relationship (maximum cardinality = 1), then it is obvious which instance of the relationship to delete and the parameter isn't needed. If the object could participate more than once (maximum cardinality > 1) in the given role of the relationship, then the parameter is used to locate which relationship to delete. If the relationship to be deleted is binary, there will only be one other participating object in Other_Participating_Object_ID_List. If the relationship is ternary or higher, there will be two or more objects in the list which are used to locate the correct relationship to delete. 7) Traverse_Relationship_Role_xxx_to_Role_xxx (Search_Role_ID_List) returns a pointer, Object_ID, or set of pointers or Object_IDs
This operation (a function) enables us to see what other object(s) participate in the
particular named relationship type in which the object on which we invoke it participates. If the maximum cardinality of the object in the role is 1, then only one pointer will be returned. If the maximum cardinality is greater than one, then a list with any number of pointers can be returned. If the relationship is binary, there is only one other role and the second Role_xxx in the name is unnecessary. If the relationship is ternary or higher, the second Role_xxx is needed to determine which of the other roles to which we want to traverse the relationship. The interface to the operation could instead use Role_xxx as an additional parameter. The Search_Role_ID_List parameter is optional. It is only used where we additionally want to specify something about ternary or higher degree relationships to be traversed. The Search_Role_ID_List parameter holds values to search for in other roles in the relationship, i.e., other than the role played by the object on which the operation was invoked and also different than that played by the object(s) we are seeking. Only object(s) which participate in the Role_Type_to with the object on which the operation is invoked and additionally have objects which are in the Search_Role_ID_List are returned. The addition of such a parameter gives the operation some of the character of a SELECT operation on a relational database. Including no parameters (the equivalent of the second and third forms) would return all objects playing the other role. 8) Is_Related_by_Relationship_Role_xxx (Other_Role_ID_List) returns boolean
This is a function which returns TRUE if the relationship with the objects in Other_Role_ID_List exists as stated, FALSE otherwise. Like the operations above, the Role may not be needed as part of the name. 7.3.2.3 Operations to Support Covering Aggregation The above 8 operations follow from [Coad and Yourdon] and [Smolander] (see [Venable 1993b]). This dissertation further defines several new operations which support
the additional semantics of the complex covering aggregation data abstraction supported in CoCoA. This section presents them. Each complex covering aggregate object needs operations to support the semantics of its complex covering aggregation relationships. For covering aggregation relationships, there need to be operations to include objects and named relationships as components and to disinclude covered objects or named relationships. Also needed are operations to identify components and/or see whether particular objects or named relationships are covered components or not. Additionally, each object which is a covering aggregate object may need operations to account for its complex nature when it is assigned or compared. These 10 operations are presented below. Like operation types 1-8 above, some of these operations' names need to be further specified for each aggregation relationship defined in the conceptual data model. In the first eight operations below, the _Object_xxx or _Relationship_xxx needs to be replaced by the actual name of the object type or named relationship type that is being aggregated into the covering object. The following operations are all to be made part of a covering object, which implements a complex entity; a brief sketch of several of them in object-oriented form follows the descriptions below.
9) Include_Object_xxx (Included_Object_ID) This operation causes the object-oriented system representation to be updated to reflect that the complex object on which this operation is invoked contains the object in the parameter Included_Object_ID as a component. A precondition for this operation is that the object to be covered must not already be a component of the same object. Otherwise, an error should be returned. Additionally, a component which cannot be shared should not be included if it is already a component of another complex object. 10) Disinclude_Object_xxx (Disincluded_Object_ID) This operation causes the object-oriented system representation to be updated to
reflect that the complex object on which this operation is invoked no longer contains the object in the parameter Disincluded_Object_ID as a component. The object to be disincluded must already be covered by the complex object on which the operation is invoked; otherwise it is an error. Additionally, if the object being disincluded is both existence dependent and not shared with another covering object, it should also be deleted. 11) Include_Relationship_xxx (Named_Relationship_Object_ID) This operation is the same as Include_Object_xxx, except it is for a named relationship. 12) Disinclude_Relationship_xxx (Named_Relationship_Object_ID)
This operation is the same as Disinclude_Object_xxx, except it is for a named relationship. We may also wish to explicitly examine the state of the covering aggregation relationships. The following four operations provide that capability. They are similar to, and parallel, the Traverse_Relationship and Is_Related_by operations in section 7.3.2.2. Therefore, they are only listed briefly here. 13) Retrieve_Component_Object_xxx This function returns a set of Object_IDs or pointers to objects of type Object_xxx. 14) Retrieve_Component_Relationship_xxx This function returns a set of Named_Relationship_Object_IDs or pointers to named relationship objects of type relationship_xxx. 15) Is_Component_Object_xxx (Object_ID) This function returns TRUE if Object_ID is in the set of covered objects of type object_xxx. 16) Is_Component_Relationship_xxx (Named_Relationship_Object_ID)
This function returns TRUE if Named_Relationship_Object_ID is in the set of
covered named relationships of type relationship_xxx. The covering aggregation relationships established, deleted, and/or viewed by the above operations are utilized implicitly for a number of other operations according to their semantics in the CoCoA model. When an action is performed on a whole complex object, it is sometimes implied that the same should happen to its components. In particular, this is true for deep copying and deep comparing (discussed immediately below) and for storage and retrieval of the complex object (discussed in section 7.3.2.4 below).
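As a sketch of how the inclusion and component-inspection operations above (operations 9, 10, 13, and 15) might appear for one covered entity type, the following Python fragment is offered for illustration only; the class names are hypothetical, and the sharability and existence-dependency preconditions are only noted in comments rather than implemented.

class A_Graph_Object:
    """Hypothetical complex (covering) object with operations for one
    covered entity type, Process, following the xxx-substitution convention."""
    def __init__(self, name):
        self.name = name
        self.component_processes = set()    # covering instance variable

    def include_process(self, proc):
        # Operation 9: the object must not already be a component here;
        # an unsharable object already covered elsewhere should also be refused.
        if proc in self.component_processes:
            raise ValueError("object is already a component of this complex object")
        self.component_processes.add(proc)

    def disinclude_process(self, proc):
        # Operation 10: the object must currently be covered here.
        if proc not in self.component_processes:
            raise ValueError("object is not a component of this complex object")
        self.component_processes.remove(proc)
        # If proc were existence dependent and unshared it would also be
        # deleted at this point; that check is omitted from the sketch.

    def retrieve_component_processes(self):
        # Operation 13: return the set of covered Process objects.
        return set(self.component_processes)

    def is_component_process(self, proc):
        # Operation 15: membership test on the covered set.
        return proc in self.component_processes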
17) Deep_copy This operation (sometimes called deep clone) produces a copy of a covering object, including copies of all of its components. The alternative operation, a simple copy, copies just the object and its (cartesian aggregate) attributes. 18) Deep_equal (Other_Covering_Object_ID) A simple equality operation on a covering object will only compare the cartesian aggregate attributes of the complex object and its references to its component objects and named relationships. This operation instead allows us to determine whether two covering objects refer to equivalent component objects and named relationships, even if those components are not identical (i.e., do not have the same identity). It returns a TRUE value if all of the attributes of the two objects are equal and they have component objects which are also deep_equal. 7.3.2.4 Operations to Interface the Object-Oriented and Relational Levels This section briefly introduces the operations needed to interface the relational implementation in level 1 with the object-oriented implementation in level 2 in the architecture in figure 7-1. These operations are discussed in more detail in appendix D and in [Venable 1993b and 1993c]. In particular, [Venable 1993c] discusses the generic implementation of these operations making use of a meta schema based on the CoCoA
models used as a basis for the relational and object-oriented implementations. The implementation recommended for level 1 is a relational data base (RDB). Level 2 is to be implemented in objects using an object-oriented language system. We need to be able to interface the two levels in order to make use of the objects within memory (level 2), yet share them through secondary storage (level 1). To do this, we will add additional operations beyond the 18 listed in sections 7.3.2.1-7.3.2.3. For both the entities and the named relationships, the operations needed are to 1) store the relational equivalent of the object-oriented representation into the RDB, 2) retrieve the equivalent object-oriented representation from the RDB, and 3) delete the relational representation from the RDB. These operations are related to some of the operations on composite objects in [Kim et al. 1988], but expand the discussion to include covering of relationships and relate the operations more specifically to storage in an RDB. Together these comprise 6 additional operations on objects in level 2. This section introduces the operations in turn, first for CoCoA entities and then for named relationships. It takes into account the relational and object-oriented implementations of CoCoA previously discussed. Each of the names of the new operations begins with RDB, which stands for Relational Data Base. 19) RDB_Store This operation causes an object to store itself into a relational database. Put more formally, it changes the state of the representation of a CoCoA entity within the persistent storage of the relational system (level 1) to be consistent with the state of the representation of the same CoCoA entity as an object held in-memory by the object-oriented system (level 2). The previous state of the relational representation of the entity is discarded. A simple object with no relationships exists as a single tuple in a relational database. Its instance variables correspond to the relational attributes. If there is no corresponding
tuple in the appropriate relation, this operation causes a new tuple representing the object to be added to the relation. If the corresponding tuple already exists, this operation replaces the tuple with one representing the object. Therefore, RDB_Store corresponds to the "add" and "change" transactions in a traditional transaction processing system. The object-oriented implementation of an entity may contain instance variables which implement part of named relationships, as discussed in section 7.3.1.2. Those instance variables should not be saved by this operation, only by the use of the RDB_Store_Relationship operation to be discussed below. If the object is a complex object, this operation should invoke the store operations for those things that it covers. Entity objects covered should be stored first, then named relationship objects. This can be done by invoking the RDB_Store operation and/or RDB_Store_Relationship operation (discussed below) on each of the components. This operation must also ensure that the object's instance variables get stored in the appropriate relational attributes of the appropriate specialized and/or generalized entity relations. 20) RDB_Retrieve This operation retrieves an object from a relational database. Put more formally, this operation changes the state of the representation of an entity as an object held in-memory by an object-oriented system (level 2) to be consistent with the state of the representation of the same entity within the persistent storage of a relational system. The previous state of the object in-memory is discarded. If the object to be retrieved is a complex object, i.e., if there are representation(s) in the relational database that it covers other objects and/or named relationships, then this operation should invoke create and retrieve operations for those things that it covers. Entity objects covered should be created and retrieved first, then named relationship objects. 189
As with the RDB_Store operation, the instance variables for the object being retrieved must get their values from the appropriate relational attributes according to the generalization-specialization structure. 21) RDB_Delete This operation deletes an existing object and any relationships it participates in from a relational database. Put more formally, this operation discards the representation of an entity within the relational system (level 1) which corresponds to the entity identified by the object held in-memory by the object-oriented system (level 2). The operation must also discard the relational representations of any named relationships in which the entity participates, to ensure preservation of referential integrity within the relational realm. For a simple object, the single tuple representing it is deleted from the relation corresponding to the object's type. For a complex object, the covered objects and relationships may need to be deleted if they are both existence dependent and not shared with other objects. Those objects which are not existence dependent will stand on their own and should not be deleted. The relational representation must be consulted to determine whether a component is shared with other entities, since the entities that the components are shared with might not be held in-memory by the object-oriented system at a particular point in time. Whether the components are to be deleted or not, the relational implementation of the covering relationships must be deleted also. Having discussed the operations for entities, we will now discuss them for named relationships. Similar operations are needed. The operations should be built into the named relationship objects instead of the entity objects. 22) RDB_Store_Relationship This operation maps the object-oriented implementation (pointer instance variables) of a named relationship onto its relational storage (surrogate key attributes). Named
relationship objects are mapped onto tuples in relationship relations. If there is an old tuple, it is replaced. If there is no old tuple, a new tuple is inserted. The RDB_Store_Relationship operation must also account for categories, if they are used. A tuple must be present in the category relation and the relational attribute depicting which entity type of several in the category is participating in the relationship must be set according to the entity type represented by the object. 23) RDB_Retrieve_Relationship The opposite process to the RDB_Store_Relationship operation must happen when an object is being retrieved. The RDB_Retrieve_Relationship operation, like the RDB_Store_Relationship operation, must account for categories. 24) RDB_Delete_Relationship This operation is similar to the RDB_Delete operation on entity objects. A precondition to invoking this operation is that if there is a mandatory role in the named relationship (minimum cardinality > 0), the entity tuple for the entity playing that role should have been deleted first, or the operation not be allowed to proceed. Like RDB_Retrieve_Relationship operation and RDB_Store_Relationship, the RDB_Delete_Relationship operation must account for categories. 7.4 An Example This section presents a few short extracts of an example of the relational and objectoriented implementations of a CoCoA model. It makes use of the ISAC solution [Lundeberg] to the well known IFIP case of the organization of a working conference [Olle et al.]. The details of the example are presented in Appendix E. Only a few short extracts of the example are presented here. Section 7.4.1 shows extracts of the relational implementation. Section 7.4.2 shows extracts of the object-oriented implementation. 7.4.1 Relational Implementation of the ISAC IFIP CASE Solution
We have already created a CoCoA Model of ISAC A-Graphs (see figures 5-5 and 5-
Figure 7-2: CoCoA Model of Integrated Data Flow Perspective
6). The CoCoA Model of the integrated data flow perspective is repeated here in figure 7-2. The ISAC Overview A-Graph of the solution to the IFIP Case as shown in [Lundeberg] is repeated in figure 7-3. The first part of the relational implementation is deriving the schema from the CoCoA model in figure 7-2. The relational tables required to implement the entity types, named relationship types, implied category types, and covering aggregation relationship types are shown below.
a) Tables implementing entity types Data_Flow_Model ISAC_A-Graph Process External Generalized_Flow
b) Tables implementing named relationship types Specifies Flows_to Flows_from Join_into Splits_into
c) Tables implementing implied category types Destination Source
d) Tables implementing covering aggregation relationships ISAC_A-Graph_Covers_Process ISAC_A-Graph_Covers_External ISAC_A-Graph_Covers_Generalized_Flow ISAC_A-Graph_Covers_Flows_to ISAC_A-Graph_Covers_Flows_from ISAC_A-Graph_Covers_Join_into ISAC_A-Graph_Covers_Splits_into
Having specified the schema for relational storage of ISAC A-Graphs, now we can turn our attention to populating the schema with data to represent the overview A-Graph
Figure 7-3: Overview A-Graph of ISAC Solution to IFIP Case
from the IFIP Case example given in figure 7-3. In this section, we will only give some small examples of the populated schema. For the complete example, consult appendix E.
The surrogate key attribute in this example is a simple integer. Other representations may be more useful for surrogate keys, but are not considered here. The surrogate key for each tuple, whether in an entity relation or a named relationship relation, must be unique to that tuple. In figure 7-3, there is exactly one ISAC A-Graph, which is shown by the population of the Data_Flow_Model and ISAC_A-Graph tables shown below.
Data_Flow_Model
ISAC_A-Graph
The A-Graph in figure 7-3 contains several processes (called activities in ISAC), externals (which are implied by the source or destination of a set in an ISAC A-Graph), and generalized flows (called sets and flows in ISAC). Examples of the data to represent one of each of these from figure 7-3 are shown below.
Process
External
Generalized_Flow
The A-Graph in figure 7-3 also contains several named relationships. Examples of the relational representation of some of them which relate the above entities are shown below.
Flows_to
Flows_from
The Flows_to and Flows_from named relationships in figure 7-2 both have roles which can be played by more than one entity type, implying two categories. Examples of the relational implementation of the participation of the entities in those categories are shown below.
Destination
Source
The final part of our example of the population of the relational schema with data representing the ISAC A-Graph in figure 7-3 is for the covering aggregation relationships. The entities and named relationships listed in the example above are all components of the A-Graph. Below are shown example tuples from two of the covering aggregation relationships, one for a covered entity type (Process) and one for a covered named relationship (Flows_to).
ISAC_A-Graph_Covers_Process
ISAC_A-Graph_Covers_Flows_to
This completes the relational portion of our example of implementation of the ISAC solution to the IFIP Case. The next section considers the object-oriented implementation.
7.4.2 Object-Oriented Implementation of the ISAC IFIP CASE Solution
The object-oriented implementation of the ISAC A-Graph requires the definition of
classes for the entity types and named relationship types of the CoCoA model of ISAC A-Graphs, as shown above in figure 7-2. The standard mappings for the data structure part of the class definitions were shown in section 7.3.1. The standard or fundamental operations needed were introduced in section 7.3.2 (and are explained in more detail in Appendix D). The following is an extract of the class specifications developed for the full example, which is given in Appendix E. As described in section 7.3.1, a class definition for a class which represents an entity type from a CoCoA model must include instance variables to represent 1) the attributes of the entity type, 2) the playing of a role in a named relationship, and 3) the covering of other entity types and named relationship types. A subset of the needed instance variables which represents at least one of each of these is shown in the following partial example of a class to implement the Data_Flow_Model entity type from figure 7-2. Only parts of this class example are shown here. See appendix E for the full class specification. Additionally, there are operations needed to support the various aspects of the object and the CoCoA data abstractions which it implements. Following the instance variables, a representative selection of these operations is shown. This class inherits from a class RDB_Storable, which is explained further below.
Class Data_Flow_Model inherits RDB_Storable
-- class explained below
-- instance variable for attribute
Name: text
-- instance variable for participation in named relationship
Specifies: --> Specifies  -- points to named relationship object
-- instance variable for covering of an entity type
Covers_processes: set of --> Process  -- points to entity object
-- instance variable for covering of named relationship type
Covers_flows_to: set of --> Flows_to  -- points to relationship
-- operations for named relationships
Procedure Create_Specifies (proc: --> Process)
Procedure Delete_Specifies
Function Traverse_Specifies returns --> Process
Function Is_Specifier_of (proc: --> Process) returns boolean
-- operations for covering entity
Procedure Include_Process (proc: --> Process)
Procedure Disinclude_Process (proc: --> Process)
Function Retrieve_Component_Process returns set of --> Process
Function Is_Component_Process (proc: --> Process) returns boolean
-- operations for covering named relationship
Procedure Include_Flows_to (FT: --> Flows_to)
Procedure Disinclude_Flows_to (FT: --> Flows_to)
Function Retrieve_Component_Flows_to returns set of --> Flows_to
Function Is_Component_Flows_to (FT: --> Flows_to) returns boolean
-- operations for complex object
Function Deep_Copy returns --> Data_Flow_Model
Function Deep_Equal (dfm: --> Data_Flow_Model) returns boolean
-- operations for storing named relationships
Procedure RDB_Store_Specifies
Procedure RDB_Retrieve_Specifies
Procedure RDB_Delete_Specifies
end Class Data_Flow_Model.
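As a supplementary illustration (not part of the dissertation's specification notation), the following minimal Python sketch shows how the data-structure part of this class and a few of its covering-aggregation operations might be realized. The class and attribute names mirror the specification above; everything else (constructors, the use of Python sets, optional references) is an assumption of this sketch.

```python
from typing import Optional, Set


class RDBStorable:
    """Base class supplying the surrogate key used by the relational level (level 1)."""

    def __init__(self, surrogate_key: Optional[int] = None) -> None:
        self.surrogate_key = surrogate_key


class Process(RDBStorable):
    def __init__(self, name: str = "") -> None:
        super().__init__()
        self.name = name


class DataFlowModel(RDBStorable):
    """Level-2 object for the Data_Flow_Model entity type of figure 7-2."""

    def __init__(self, name: str = "") -> None:
        super().__init__()
        self.name = name                              # attribute
        self.specifies = None                         # participation in the Specifies relationship
        self.covers_processes: Set[Process] = set()   # covering of an entity type
        self.covers_flows_to: set = set()             # covering of a named relationship type

    # operations for the covering aggregation of Process components
    def include_process(self, proc: Process) -> None:
        self.covers_processes.add(proc)

    def disinclude_process(self, proc: Process) -> None:
        self.covers_processes.discard(proc)

    def is_component_process(self, proc: Process) -> bool:
        return proc in self.covers_processes
```

The Include_/Disinclude_/Is_Component_ operations of the specification map naturally onto set insertion, removal, and membership tests in such a sketch.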
As noted above, the Data_Flow_Model class inherits from a class called RDB_Storable. This class would implement the generic operations to make classes storable into a
relational database. Other classes only need to inherit from this class to get those operations. The specification for this class is shown below.
Class RDB_Storable
-- instance variables
Surrogate_Key: RDB_Surrogate_Key
-- operations
Procedure RDB_Store
Procedure RDB_Retrieve
Procedure RDB_Delete
end Class RDB_Storable.
Named relationships are stored in named relationship objects in the object-oriented system. Just as entity objects inherit from RDB_Storable, named relationship objects inherit from RDB_Storable_Relationship_Object, which contains generic operations used for storing and retrieving relationship objects. Its specification is shown below.
Class RDB_Storable_Relationship_Object
-- instance variables
Surrogate_Key: RDB_Surrogate_Key
-- operations
Procedure RDB_Store_Relationship
Procedure RDB_Retrieve_Relationship
Procedure RDB_Delete_Relationship
end Class RDB_Storable_Relationship_Object.
An example of a class for a named relationship object is shown below. Note that there is one instance variable for each role in the named relationship, which in this case is a binary relationship.
Class Specifies inherits RDB_Storable_Relationship_Object
-- instance variables
-- pointers to entity objects
specifier: --> Data_Flow_Model
specified: --> Process
end Class Specifies.
This completes the extracts from the example implementation of the ISAC solution to the IFIP Case. Further details are shown in Appendix E.
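To connect this object-oriented extract back to the relational tables of section 7.4.1, the following hedged Python fragment sketches how an RDB_Store_Relationship operation for the Specifies class might write its tuple to the relational level. The table and column names follow the earlier illustrative schema sketch and are assumptions, not the dissertation's Appendix E code; category handling is omitted.

```python
import sqlite3


class Specifies:
    """Named relationship object: one field per role in the binary relationship."""

    def __init__(self, surrogate_key, specifier_key, specified_key):
        self.surrogate_key = surrogate_key   # unique key of this relationship tuple
        self.specifier_key = specifier_key   # surrogate key of the Data_Flow_Model object
        self.specified_key = specified_key   # surrogate key of the Process object

    def rdb_store_relationship(self, conn: sqlite3.Connection) -> None:
        # Replace any old tuple with the same surrogate key, otherwise insert a new
        # one, mirroring the store-or-replace behaviour described for
        # RDB_Store_Relationship. Assumes the Specifies table from the earlier
        # illustrative schema sketch exists.
        conn.execute(
            "INSERT OR REPLACE INTO Specifies (surrogate_key, specifier, specified) "
            "VALUES (?, ?, ?)",
            (self.surrogate_key, self.specifier_key, self.specified_key),
        )
        conn.commit()
```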
7.5 Summary
CoCoA provides constructs sufficient to describe complex problem domains. Such complex problem domains should be well understood and integrated in order to be better supported by information systems. A conceptual model like CoCoA will assist in the analysis of a complex problem domain so that it may be better integrated. Once conceptualized and integrated with CoCoA, information about the complex problem domain must be represented so that it can be stored and manipulated within an information system. This chapter has introduced an architecture for such an information system and the implementation of CoCoA models within it. It has also shown how CoCoA can be directly mapped onto information structures in both the object-oriented levels (levels 2 and 3) and the relational level (level 1) of the architecture. The latter of these supports the integration of the complex problem domain. Finally, this chapter has also identified a set of 24 different operations that may be needed in the objects which implement a CoCoA model, divided into 4 different groups by the abstraction mechanism they support and their support for the interface between levels 1 and 2 of the architecture.
Chapter 8 A Possible External Implementation: The User Interface
The previous chapter introduced one possible architecture for the building of information systems to support complex problem domains. That chapter concentrated on the lower two levels of the architecture. Those two levels emphasize the detailed, internal representation of the concepts modelled in CoCoA, and especially the ability to support the characteristics of complex problem domains. They are directly derivable from CoCoA models of the problem domain. The purpose of chapter 8 is to present the upper two levels in the architecture. These two levels concentrate more on the user interface of an information system. They support the user presentation of the complex, multi-granularity, partially-overlapping objects modelled with CoCoA. They support the implementation of a user's cognitive-level actions for manipulating and viewing those objects, such as editing. Finally, they support the dialogue management and style for invocation of those actions. Section 8.1 re-introduces the architecture, emphasizing the place of levels 3 and 4 within it. Sections 8.2 and 8.3 cover levels 3 and 4, respectively, in more detail. Section 8.4 provides an example illustrating how these levels might work to support an example complex problem domain.
8.1 Review of the Architecture
As discussed in chapter 7 (see figure 8-1, which repeats figure 7-1), the tool architecture proposed here has four levels. The levels (from bottom to top) are for 1) secondary storage facilities (a repository shared among all tools), 2) objects modelling the
concepts of the complex problem domain (in memory) with primitive/fundamental operations, 3) objects providing the users' cognitive-level actions on the concepts of the problem domain, and 4) the presentation or depiction of the concepts to the IS user and the dialogue mode (e.g., command, pop-up menus, etc.). In this chapter, we are primarily interested in levels 3 and 4.
Figure 8-1: Proposed IS software architecture
  Main Level (Level 4): implements the user representation of concepts; implements the interaction mode; manages the user dialogue.
    (calls)
  Cognitive Level (Level 3): implements user actions on the concepts; independent of user representation; independent of physical implementation of concepts.
    (calls)
  Low Level Objects (Level 2): implements primitive concepts; implements the in-memory data structure; implements rudimentary operations on concepts.
    (uses)
  Secondary Storage Level (Level 1): persistent storage of concepts, e.g., IS model concepts; integrated schema will allow data sharing with different tools.
Level 2, which was described in detail in chapter 7, includes both an in-memory data structure representing a view of the concepts within the problem domain and a minimum set of capabilities (operations) for manipulating the data structure at a detailed level. As we saw in chapter 7, these minimum capabilities include such things as the creation of individual objects and establishing named relationships or covering aggregation relationships between them. It also includes operations for interfacing to Level 1. Level 3 overlays a set of cognitive-level operations on level 2. Cognitive-level operations implement more complex capabilities than those in level 2. They are at the level at which a user would think about manipulating the concepts and which the user
would invoke directly. Of course, the user may also wish to use the lower-level operations of level 2, so they may also be made available. Level 3 simply adds these higher-level operations. Level 3 and cognitive-level operations are discussed in more detail in section 8.2. Only the uppermost level (level 4) is specific to a particular IS or tool. It uses the lower three levels, and adds the user presentation, the dialogue style, and the management of the user dialogue. This clearly separates the internal representation of the concepts of the IS's view of the problem domain from the user interface. Level 4 is discussed in detail in section 8.3. Together, levels 3 and 4 can be considered to be the user interface. Therefore, the user interface as a whole is clearly separated from levels 1 and 2 in the architecture. Cognitive-level operations are given a separate level from the rest of the user interface so that they may be reused by different IS which share the same view of (their part of) the
problem domain (i.e., they express the same semantics), but have different user presentations and/or dialogues. By reusing the lower three levels of the architecture in another information system, IS developers can concentrate on the user interface in level 4. This frees them to more effectively depict the concepts in the view to the IS user (through a particular user representation) and to design more intuitive and easy-to-use means for the IS users to invoke the operations (provided by levels 2 and 3). To use (and reuse) the architecture, all three of the lower levels will need to be implemented for each view within the problem domain (except for the parts which overlap at the bottom level, which are shared). Once that is done, they can be reused by multiple IS, which can add different user interfaces in level 4.
8.2 Level 3: Cognitive-Level Operations
Level 2 of an IS using the architecture will embody both a data structure to represent a logical model and the fundamental operations needed to manipulate that data structure. Level 3 implements additional, higher-level operations on the model. These are here called cognitive-level operations, and level 3 is also called the cognitive level. Cognitive-level operations are operations at the level at which the user thinks about manipulating the model (or its user representation). They are at a higher level and are more complex. They have to do with higher-level semantics within the problem domain. Typically, they are built up from combinations of the lower-level operations in level 2. For example, some cognitive-level operations on a data flow diagram might be 1) to move a dataflow connection from one process to another, 2) to define a process in terms of subprocesses (explosion), or 3) to group several processes into a single process. Example cognitive-level operations on a CoCoA model might be to enclose a group of entity types and relationship types as a covering aggregation relationship with an entity type, or to merge two entity types (and their attributes) into a single entity type. It is a hypothesis proposed by this work that the needed cognitive-level operations on a
view will be independent of the user representation, as long as the semantics are the same. It is also hypothesized that a comprehensive set of needed operations can be determined for each view within the complex problem domain. However, it is beyond the scope of this dissertation to investigate these hypotheses. See chapter 9 on future research. As mentioned in chapter 7, the capabilities of both levels 2 and 3 could be implemented as software objects using object-oriented concepts, or as abstract data types (ADTs) using other programming concepts, depending on the implementation language chosen. In this dissertation, we concentrate on object-oriented implementations which make use of inheritance to ease the implementation (and subsequent maintenance). The most straightforward implementation will be simply to inherit from an object class which implements an entity at the level of a view in level 2. This will bring in the data structure and the low-level operations. Then the additional cognitive-level operations can be added and defined, using the low-level operations in level 2 as local services. If ADT implementations are used instead, for programming languages without inheritance (e.g., Ada), then an ADT should be implemented for the low-level data structure and operations first. Another implementation can then be made for the cognitive level, which uses the services of the first. Of course, new interfaces would also need to be added to the cognitive-level implementation for those low-level operations which are also desired. Implementation of those interfaces would be very simple, as they would just be calls to the first implementation.
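As a purely illustrative sketch of this layering (the names and the chosen operation are assumptions, not the dissertation's), a level-3 class in Python might inherit a level-2 class and compose its rudimentary operations into one cognitive-level operation, here the grouping of several processes into a single process mentioned above:

```python
from typing import Iterable, Set


# Minimal stand-ins for the level-2 objects; all names are illustrative assumptions.
class Process:
    def __init__(self, name: str) -> None:
        self.name = name


class DataFlowModelLevel2:
    def __init__(self) -> None:
        self.covers_processes: Set[Process] = set()

    # low-level (level-2) operations
    def include_process(self, proc: Process) -> None:
        self.covers_processes.add(proc)

    def disinclude_process(self, proc: Process) -> None:
        self.covers_processes.discard(proc)


class DataFlowModelCognitive(DataFlowModelLevel2):
    """Level 3: adds a cognitive-level operation built from level-2 operations."""

    def group_processes(self, components: Iterable[Process], new_name: str) -> Process:
        # One user-level action ("group several processes into a single process")
        # expressed as a combination of the rudimentary level-2 operations.
        grouped = Process(new_name)
        for proc in list(components):
            self.disinclude_process(proc)
        self.include_process(grouped)
        return grouped
```

Because the cognitive-level class only calls the level-2 operations, it stays independent of how those operations are implemented internally, which is the point of the separation.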
8.3 Level 4: User Presentation and Dialogue
Level 4 of the architecture makes use of the other three levels, but particularly level 3 (since it is the interface to the two lower levels). Level 4 encompasses three main things. The first is the user representation or depiction of the concepts of the problem domain. The second is the user interface style, such as whether the interface is accomplished with direct
manipulation or pull-down menus. The third is the actual structure of the dialogue within the selected style.
8.3.1 User Presentation
The instance variables within level 2, which are inherited by level 3, represent the concepts of the problem domain within the computer system. Level 4 must represent or depict those concepts to the user of the information system. The depiction must be in a way that is understandable, and suitable to the work which the user performs using them. There may potentially be a number of different ways to depict the same concepts. Possible alternatives include text, lists and tables, graphic diagrams, and so on. Within each of these alternatives, there are a multitude of other choices. It is a design decision to choose the best user representation. As noted in the introduction to level 4 in section 8.1, it is possible to have more than one information system supporting the same view of its part of the problem domain, and different user presentations could be chosen for each information system. This might be done, for example, because the concepts emphasized in one task are different from those emphasized in another. Or it might be that they are more easily conceptualized in one representation than in another for a particular task. This, however, introduces the problem of how to go from one user representation to another. Each user representation has a particular syntax for the semantics of the concepts that it represents. To move from one user representation to another requires extracting the semantics of the concepts from the syntax of the user representation. This is expressly supported by the 2nd and 3rd levels because they represent the concepts as they were modelled in CoCoA. In doing this modelling, we should be very careful not to include the syntax. The process then is one of removing the syntax from one user representation and then adding a different syntax back in for the other user representation. The first part of this is called "parsing." The second part is called "unparsing."
If there is only one user representation used for the view, then information about the particular syntax used by the user to express a particular set of concepts to the system can be stored and used for unparsing, i.e., the unparsed user representation is also stored and simply recalled the next time it is needed. For example, the type and placement of symbols in a graphical user representation can be stored, along with a representation of the semantics of those symbols, as modelled using CoCoA. However, if there is more than one user representation, and a movement is made from one of them to another, then the unparser in the information system supporting the second user representation must be able to generate the syntax for the user representation of the concepts to the user, not just use a stored description of it. This is much like derivation in the CoCoA model, only of the user presentation's concepts. The architecture supports this because the IS with the first user presentation can parse and save the concepts to the relational database, and a second IS can then retrieve the concepts and unparse them.
8.3.2 User Interface Style
The style of the user interface is the choice of whether a command language, form-filling, menu, direct manipulation, or other interface style is to be used as the dominant approach by which the user may invoke the conceptual operations and give them operands from the problem domain. Of course, some combination is possible. It is generally important that the style be consistently applied, rather than switching between different styles in different parts of the user interface.
8.3.3 Dialogue Structure
Within the overall style, lower-level choices must be made about the details of the dialogue structure. Particular menus or commands have to be created. These have to be linked to the conceptual operations. It must also be determined how the operands or parameters for the conceptual operations will be indicated by the user, possibly by defaults. This must, of course, also be consistent with the user representation of the
concepts. It must also be decided how the results of any operations which are invoked are to be presented to the user within the context of the user representation. Of course, consistency in the design within the style is also important to the user interface design. 8.3.4 Implementation The user interface, whatever the style of the interaction or the depiction of the problem domain concepts to the user, can be implemented in a number of ways, the same as for level 3. It could be implemented using object-oriented constructs and inheritance. It could instead be implemented using the services provided by implementations of ADTs in level 3. Note that level 4 could make use of other services than those shown here, such as a windowing system, graphics package, a user interface management system (UIMS), or some other objects. 8.4 An Example In this section, we will expand on the example begun in section 7.4 of chapter 7 by showing how it could be implemented in the upper two levels of the architecture. The example shown in chapter 7 (figure 7-2) was of the IFIP case used by the CRIS conferences, as modelled using ISAC A-Graphs in [Lundeberg] (p. 183). First we will discuss the cognitive-level operations in level 3 of the architecture, then the user depiction of the A-Graphs and dialogue mode in level 4 of the architecture. 8.4.1 Cognitive-Level Operations Cognitive-level operations are operations at the level at which the user thinks about actions on the problem domain. They are generally at a higher level than the basic operations which are built into level 2 of the architecture (although those operations may also be needed and made available directly to the user. This section, as an example, lists and discusses some possible cognitive-level operations. 1) Create_Block 214
This operation is used to group a number of the constructs of an ISAC A-Graph together. Once created, the block/group can be used for some of the other cognitive-level operations. One thing that will need to be considered when implementing the operation is a data structure for the block. Since a block is composed of the same objects as an ISAC A-Graph itself, an ISAC A-Graph object could be used. This is an implementation issue and is not discussed further here.
2) Copy_Block
This operation takes an existing block and inserts it into an existing ISAC A-Graph. The existing ISAC A-Graph could be the same one that the block was copied from, or a different one.
3) Move_Block
This operation removes an existing block and inserts it into an existing ISAC A-Graph.
4) Delete_Block
This operation removes an existing block from an ISAC A-Graph.
5) Push_Block_to_Lower_Diagram
This operation takes an existing block from one ISAC A-Graph and creates a new ISAC A-Graph which contains only that block. An important part of the semantics of the operation is that it must preserve the SPECIFIES relationships between the processes removed with the block and any other ISAC A-Graphs that specify them. Finally, in the place of the block removed from the first ISAC A-Graph, a new PROCESS should be inserted and take up all the old connections from the block. While it is a user interface design concern, the user should probably be prompted to give names to this new process as well as to the new ISAC A-Graph containing the block. This cognitive-level operation will be illustrated in section 8.4.3.
8.4.2 User Interface Level
Level 4 of the architecture encompasses the user depiction of the concepts of the area
of the problem domain that it supports and also the user dialogue mode. The user depiction of ISAC A-Graphs would be best if it used graphics that mimic the paper-based method. One can imagine that an empty box would be available first and that the user could then pick and choose the graphic symbols, place them in the diagram, annotate them with text, and make connections. Of course, there are also various low-level decisions about the specific depiction: shapes, line widths, fonts, and so forth. This is like any other CASE tool and doesn't warrant further discussion. Other depictions are possible, such as pure text, but they would distance the implementation from the paper-based diagrams, so are not discussed further. Once the user depiction is established, we also need to consider the user dialogue mode, as well as other details for how the user will perform his or her work on the computer platform. For an example of the latter, a choice could be made whether or not the information system will be based on a windowing system and, if so, what kinds of windows there should be, how many windows could be shown at the same time, etc. The choice of dialogue mode considers how the users would use the information system to perform the actions available on the user depiction of the problem domain. For example, the choice might be static or pop-up menus, command language, or even question and answer. In any of these dialogue modes, the objective is to determine a means for invoking the cognitive-level operations, including both selection of the operation and indicating the operands for the operation.
8.4.3 An Example User Interface
In this section, an example user interface is provided for a hypothetical ISAC A-Graph editor tool. This example is used to illustrate how such an editor can fit into the proposed architecture. The user interface is only sketched here to show how it could work within the architecture; the objective is not to present a "polished" user interface. We'll apply
the editor in this example to the creation of the ISAC A-Graph given in figure 7-2. The user interface for a hypothetical ISAC A-Graph editor might begin with an editor window with a blank rectangle in which to place the flows, as shown in figure 8-2. ISAC sets (flows which arise or terminate outside of the system) can be added above or below the rectangle. The menus at the top can be invoked to find the various commands. Note that the editor in this design assumes the user will edit the Overview A-Graph to begin with.
Figure 8-2: Initial ISAC A-Graph Editor Window
Using the commands to create the various symbols, place them in the diagram, and connect them together, a hypothetical user might have arrived at the partially
constructed A-Graph as shown in figure 8-3. This figure differs somewhat from that shown in figure 7-2. The problem facing this hypothetical user is that he or she has begun to work at too low a level for the overview figure that he or she is trying to create. The processes (activities) named "Call for papers planning," "Planning and administration of referee work," and "Program design" are too detailed. (In the ISAC solution of the IFIP case, these were in the next level down.) Recognizing this, our hypothetical user might decide to divide the currently drafted diagram into two diagrams. The three more detailed activities should be on another lower level diagram.
Figure 8-3: Overly Detailed Overview A-Graph
Figure 8-4: Block to Push to Second A-Graph
To achieve this, our hypothetical user can make use of higher, cognitive-level operations to create a block, then push the block to a lower level diagram. Figure 8-4 shows pushing of the block (shaded area). Note that the block could have been marked in several ways, e.g., by marking each item in the block individually, by using a rubber-banding rectangle to mark all items within the rectangle, or by enclosing the items in the block by tracing a freeform outline around them. Once the command is given to push the block to a new A-Graph, figure 8-5 shows the new lower-level A-Graph (also only partially drafted now), shown in a newly opened window.
Figure 8-5: Block after Being Pushed to New A-Graph
Figure 8-6: Original A-Graph after Block is Removed
After finishing with the new lower-level A-Graph (or during or before, depending on the user-interface design), our hypothetical user can return to the original A-Graph and finish the details of the push operation, as shown in figure 8-6. Note that the
processes that were removed are instead replaced by a single process. This process must be assigned a name, and a good user-interface design will prompt our hypothetical user for that (perhaps even before opening the window for the lower level A-Graph). All other references in the diagram are kept consistent. Once the block is removed from the overview A-Graph and pushed to a second A-Graph, the user can then edit the original A-Graph to come up with the desired overview, as shown in figure 8-7.
Figure 8-7: Completed Overview A-Graph
To recap: figure 8-3 is the before figure. Figure 8-4 is during the operation to invoke the push-to-lower-A-Graph operation. Figures 8-5 and 8-6 are after the operation, showing its result. Figure 8-7 shows the final result after editing the Overview A-Graph. This completes our example of the use of the upper two levels of the architecture to implement an ISAC A-Graph editor, with an example application to the IFIP Case.
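As a closing, hedged illustration of how such a cognitive-level operation might be composed from rudimentary level-2 operations, the following Python sketch outlines a push-block operation. The class and method names are assumptions, and the SPECIFIES bookkeeping and connection re-routing described in section 8.4.1 are reduced to a comment.

```python
from typing import Iterable, List


class Process:
    def __init__(self, name: str) -> None:
        self.name = name


class ISACAGraph:
    """Very small stand-in for a level-2 ISAC A-Graph object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.processes: List[Process] = []

    # rudimentary level-2 operations
    def include_process(self, proc: Process) -> None:
        self.processes.append(proc)

    def disinclude_process(self, proc: Process) -> None:
        self.processes.remove(proc)

    # cognitive-level (level-3) operation
    def push_block_to_lower_diagram(self, block: Iterable[Process],
                                    new_graph_name: str,
                                    replacement_name: str) -> "ISACAGraph":
        lower = ISACAGraph(new_graph_name)
        for proc in list(block):
            self.disinclude_process(proc)   # remove the block from this A-Graph
            lower.include_process(proc)     # and place it in the new, lower-level one
        # A real implementation would also re-route the block's old connections to
        # the replacement process and preserve any SPECIFIES relationships, as
        # described in section 8.4.1; that bookkeeping is omitted here.
        self.include_process(Process(replacement_name))
        return lower
```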
Part IV Conclusions
Chapter 9 Conclusions
This dissertation has presented a new conceptual data model, named CoCoA, which is intended to support the conceptualization and data modelling of complex problem domains. It has evaluated the CoCoA model in two ways. First it compared the CoCoA model to a set of criteria for evaluating conceptual data models. These included some criteria which were derived from an analysis of the characteristics of complex problem domains. Second, the CoCoA model was applied to examples of complex problem domains in order to assess its ability to deal with them. The dissertation also presented a possible architecture for integrated information systems which would support work in complex problem domains and showed how CoCoA models can be mapped onto that architecture. 9.1 Contributions of this Research The primary contribution of this dissertation is the CoCoA model itself. The CoCoA model includes several new conceptual data modelling features and combines them with existing features adapted from other conceptual data models. The new conceptual data modelling features of the CoCoA model include: 1) the concept of complex covering aggregation of entities and named relationships (from which CoCoA derives its name), 2) the concept of allowing multiple participation in named relationship roles (implicit categories), 3) the user representation of named relationships and their roles as divided boxes, and 4) the user representation of covering aggregation with enclosing boxes. The new concepts better support modelling of complex problem domains. The user representation better supports clear, precise, unambiguous, and graphically distinct 227
depiction of the problem domain using the concepts of the model. Another contribution of this dissertation has come from the example applications of CoCoA to areas within the complex problem domain of IS modelling. These areas were termed IS modelling perspectives. CoCoA was used to show a possible integration of the data flow perspective, including Data Flow Diagrams, SADT Activity Diagrams, and ISAC A-Graphs. The dissertation also applied CoCoA to perform a detailed review, analysis, and integration of the conceptual data modelling perspective. Using CoCoA enabled the effective comparison and integration of the conceptual data models reviewed and their concepts. It also showed how the conceptual data modelling perspective is related to the object-oriented systems analysis perspective. Third, some enhancements to existing techniques for view integration within complex problem domains have been proposed. In particular this includes modelling smaller groups of concepts (subviews) within particular views of the complex problem domain, then searching for similar subviews within other views of the problem domain. Doing so enables the conceptual data modeler to deal with a smaller part of the problem domain at any one time and to develop a richer, more complete understanding of that subview. This makes identification of synonyms and linking of related concepts easier. Fourth, this dissertation has also developed and presented standard mappings from CoCoA to relational and object-oriented implementations. These mappings can be used regardless of whether the entire proposed architecture is used. They can also be used for other conceptual data models which support the same concepts. Finally, this dissertation has presented an architecture for information systems to provide integrated support for work in complex problem domains. Arguments have been made for the use of a relational database as the means to integrate the concepts of the complex problem domain, as well as for the division into the various higher levels in the architecture. Further validation of the architecture is of course needed (see section 9.3 228
below). 9.2 Limitations of the Approach The CoCoA model was designed to support development of a rich and complete understanding of a complex problem domain, i.e., its conceptualization. Concerns for the implementation of the resulting model have been deliberately kept secondary. In spite of this, we have seen that CoCoA models are readily mappable onto relational and objectoriented implementations. There are, however, three remaining limitations pertaining to the ease of implementing CoCoA models and their suitability for documenting requirements. First, CoCoA has deliberately kept the specifications for derivations informal. This is useful for conceptualization, but informal derivations are sometimes insufficient to support implementation of the derivations. Note that this limitation can also be said to apply to other methods that treat derivations informally, such as the OMT model. In order to be more suitable as a basis for implementation, CoCoA should be augmented with some more formal means of specifying derivations (see section 9.3 below). This, however, remains outside the scope of this dissertation work.
Second, the
identification and development of operations for object-oriented implementations of CoCoA models has relied on general classes of operations and rules. Little guidance has been provided for the identification and development of operations which are more closely linked to the problem domain. However, this dissertation has provided guidance for the identification of low-level operations related to the logical structure of the concepts of the problem domain (as would be implemented in level 2 of the architecture) and has also provided some limited guidance for the identification of user-level operations (as would be implemented in level 3 of the architecture). Furthermore, the identification of operations can be seen as an implementation concern which diverts attention from the main goal of aiding in the conceptualization of the concepts of the 229
complex problem domain. Finally, CoCoA does not provide for identification and specification of operations as a part of the CoCoA model, unlike the object-oriented analysis models, such as OMT. Identification of operations is not a concern of conceptual data modelling, rather it is a concern of object-oriented analysis approaches. CoCoA takes this approach partly because it seems unnecessary at the conceptualization stage and partly because specification of low-level operations would unnecessarily clutter the CoCoA diagrams. However, CoCoA could still be augmented with additional constructs to identify and/or specify higher level operations needed on CoCoA entity types (which would then be "object types"). This is also outside the scope of this dissertation. However, see section 9.3 below. 9.3 Topics for Further Research As pointed out in the previous section, informal treatment of derivations may not be sufficient for the resulting specification to serve as a basis for implementation of some derived concepts. Therefore, an important topic for further research is more formal specification of derivations. Furthermore, it may be fruitful to investigate classifications of different kinds of derivations, e.g., mandatory, optional, situation dependent, etc. which may be helpful in characterizing and specifying the circumstances in which concepts should be derived. As the specification of derivations is supplemental to a CoCoA Diagram, whether it is informal, formal, or both, the means for delivering the ability to make and view such specifications to the user also needs to be investigated. It would be particularly useful to have a standard means. One possibility is the use of hypertext systems, e.g., as discussed in [Kerola and Oinas-Kukkonen]. For example, different links could exist from a derived concept to both informal and formal specifications. Further research is also needed to investigate tool support to enable the use of 230
multiple views during conceptual modelling. Computer-based tools offer many flexible and dynamic possibilities beyond those of paper-based tools. This dissertation has hinted at some ideas (e.g., a conceptual data modelling tool could support suppression or inclusion of details, such as attributes and/or named relationship roles and constraints). However, other capabilities should be investigated. For example, a tool to support CoCoA modelling might have options for 1) displaying only generalization-specialization structures, 2) highlighting all relationships linked to a particular entity, or 3) presenting all diagrams which include an entity type and highlighting where it is used. Hypertext systems (cf. [Kerola and Oinas-Kukkonen]) could possibly used for this facility also. In performing the CoCoA modelling in the example complex problem domains for this dissertation, I have occasionally encountered situations in which a number of named relationships are quite similar in nature, embodying similar semantics and differing mainly in the constraints on the relationships. It may be fruitful to investigate the application of the generalization-specialization relationship to named relationships. Thus far, this has only been applied to entity types (and object types). Creation of generalized named relationship types may make various kinds of queries easier. Alternatively, it may be possible to achieve similar effects by relaxing the constraint in CoCoA that multiple participation in the same role in a relationship must be subject to the same cardinality constraints. A notation which supports this would have to be developed. To the author's knowledge, neither of these possibilities has been investigated in the literature. The evaluation of CoCoA in this dissertation has been made on the basis of comparison to an objective set of criteria as well as test applications to realistic example complex problem domains. This allows individual readers to evaluate the CoCoA model's utility for themselves. However, further empirical testing of the CoCoA model on samples of conceptual data model users could be very useful. For example, potential users could be asked to compare CoCoA to other existing conceptual data models in 231
terms of ease of use, ease of learning, or ease of understandability of the resulting models. Of course, such tests would require a rough equivalence of the conceptual data models compared, i.e., it would be inappropriate to evaluate CoCoA against other conceptual data models which do not adequately model complex problem domains. Tests could also be performed to evaluate users' abilities to develop "correct" data models using CoCoA. Such further empirical tests might allow us to generalize on the potential user population or determine to what extent and what kind of training is necessary to make effective use of CoCoA. This dissertation has applied CoCoA to only a couple of areas within the IS modelling problem domain. Further applications of CoCoA are warranted to investigate and integrate other IS modelling perspectives. Furthermore, CoCoA should also be applied to other complex problem domains, such as CAD/CAM and CIM. In fact, such uses of CoCoA are the entire reason for its creation. Finally, this dissertation has proposed a possible architecture for information systems to support work in complex problem domains. The evidence supporting this architecture has only been in the form of argumentation based on the abstract characteristics of complex problem domains and the parts of the architecture. Given the emphasis in this dissertation on conceptualization of the complex problem domain, further evaluation of the architecture is outside the research scope. However, further research (e.g., development of prototypes) to investigate the utility of and to refine the proposed architecture is certainly warranted.
Appendices
Appendix A CoCoA Modelling of Additional Conceptual Data Models
This appendix uses CoCoA to model additional conceptual data models than those included in section 6.2 in the dissertation. Section 6.2 models the ER model and the CoCoA model. In addition, this appendix models the EER, NIAM, ECR, and OMT conceptual data models, each in their own section. 1. Extended Entity Relationship Model The Entity Relationship (ER) model is often extended to provide additional semantic constructs which improve its ability to describe complex situations. The extended version modelled in figure A-1 is by [Teorey et al.]. It was briefly described in section 3.2.
Figure A-1: The Extended Entity Relationship (EER) Model The primary two additions of the EXTENDED_ER_MODEL to the ER model (described in section 6.2.1) are a SUBSET_RELATIONSHIP entity and a GENERALIZATION_RELATIONSHIP entity. Each has two relationships with the ENTITY entity type, the SUBSETS, SUBSETS_INTO, SPECIALIZES, and GENERALIZES relationships. There are also some changes to the attributes of the various entities. The ATTRIBUTE entity also gets the DATA_TYPE and FUNCTION attributes. ROLEs in the EER model are not named, so that attribute is not present. CONNECTIVITY and MEMBERSHIP_CLASS are the terms for maximum and minimum cardinality in the EER model. 2. NIAM Data Model The NIAM data model takes a somewhat different view of data from the ER and EER models (see figure A-2 below). What are called ATTRIBUTEs and ENTITIes in the ER 238
and EER models are both considered to be something called ENTITY_TYPEs. NIAM's ENTITY_TYPEs are linked via FACT_TYPEs (similar to relationships in the ER and EER data models). The version of NIAM described here is that published in [Nijssen and Halpin] in 1989. NIAM is described here in two figures because of the size and complexity of the model - in particular the large number of constraints. Figure A-2 represents the core concepts of the NIAM data model, while figure A-3 shows the various constraints on and between individual roles, on sets of roles, and between ordered combinations of roles (in which roles are matched pairwise in order). In figure A-2, we can see that a NIAM_MODEL (the conceptual data modelling part) is made up of a number of NIAM_DIAGRAMs and separate DERIVATION annotations. Each DERIVATION DERIVES a FACT_TYPE. A DERIVATION may be DERIVED_FROM a number of other FACT_TYPEs.
Figure A-2: The Nijssen Information Analysis Method (NIAM) Model A NIAM_DIAGRAM is composed fundamentally of ENTITY_TYPEs, FACT_TYPEs, LABEL_TYPEs and REFERENCE_TYPEs. A FACT_TYPE links 2 or more ENTITY_TYPEs via ROLEs, similarly to the relationship construct of the ER and 240
EER models. But a FACT_TYPE may also be linked to a single ENTITY_TYPE (when it is a unary fact type). Therefore, a FACT_TYPE is composed of one or more ROLEs. A ROLE must be PLAYed by a single ENTITY_TYPE. A FACT_TYPE may optionally be objectified as a NESTED_FACT_TYPE, thereby becoming both a FACT_TYPE and an ENTITY_TYPE. An ENTITY_TYPE may also be DEFINEd by an ENTITY_TYPE_CONSTRAINT, which is like a data type definition. A REFERENCE_TYPE is like a FACT_TYPE except that it links a single ENTITY_TYPE to one or more LABEL_TYPEs. This is like an identifier in other data models; the LABEL_TYPEs identify the ENTITY_TYPE. When multiple
LABEL_TYPEs are USED_TO_REFER to an ENTITY_TYPE through a single REFERENCE_TYPE, a concatenated key is implied. Figure A-3 shows the various kinds of role constraints in NIAM. These could not be shown in figure A-2 due to space limitations. The CoCoA model allows partitioning into multiple diagrams according to whatever criteria make sense to the model user. In this case, all role constraints were grouped on one diagram, which left a small enough number of concepts remaining to be shown on only one more diagram, figure A-2.
Figure A-3: NIAM Constraints In the top half of figure A-3, three of the NIAM role constraints, EQUALITY_CONSTRAINT, EXCLUSION_CONSTRAINT, and SUBSET_CONSTRAINT are between either a pair of individual ROLEs or a pair of 243
ORDERED_ROLE_COMBINATIONs. The case of an individual ROLE is accounted for in this diagram because an ORDERED_ROLE_COMBINATION aggregates one or more ROLEs. An ORDERED_ROLE_COMBINATION is needed because the ROLEs in each combination are matched against the equivalent ROLEs in the other combination involved in the constraint. The order of the ROLEs is represented by the FOLLOWS_IN_COMBINATION relationship, which is aggregated along with the ROLEs into the ORDERED_ROLE_COMBINATION entity. Because EQUALITY_CONSTRAINTs and EXCLUSION_CONSTRAINTs are both reflexive, there is only one kind of role for an ORDERED_ROLE_COMBINATION to play with respect to them as each is MUTUALLY_CONSTRAINED. On the other hand, the SUBSET_CONSTRAINT is not reflexive, so the two
ORDERED_ROLE_COMBINATIONs linked by a particular SUBSET_CONSTRAINT must each play different roles, SUBSET and SUPERSET. The constraints in the lower half of figure A-3, UNIQUENESS_CONSTRAINT, OCCURRENCE_FREQUENCY_CONSTRAINT, and MANDATORY_ROLE_CONSTRAINT, apply either to an individual ROLE or to an UNORDERED_ROLE_COMBINATION. As with the ORDERED_ROLE_COMBINATION above, a single ROLE is accounted for by the (1,n) cardinality in the aggregation of ROLE into UNORDERED_ROLE_COMBINATION. The order of the ROLEs is unimportant because the constraint requires no matching of ROLEs to ROLEs. Therefore, the FOLLOWS_IN_COMBINATION relationship is not necessary in this case, and is not a component of UNORDERED_ROLE_COMBINATION.
3. ECR Data Model
The Entity-Category-Relationship data model [Elmasri et al.] also extends the Entity-Relationship (ER) model. The ECR model introduced the category concept (actually, more than one category concept), partly to handle the restriction of the ER model that only one entity type could participate in a role in a relationship. It was briefly described in section 3.3 and is described more formally by figure A-4.
Figure A-4: The Entity Category Relationship (ECR) Model Like the ER and EER models, the ECR model has an ENTITY entity type, which aggregates ATTRIBUTEs, and a RELATIONSHIP entity, which aggregates ROLEs. The attributes of the ROLE entity are somewhat different, but similar to the ER and EER 246
models. The ATTRIBUTE entity has two additional boolean attributes, MULTI-VALUED and TOTAL (meaning at least one value is mandatory). Additionally, an ATTRIBUTE may aggregate other ATTRIBUTEs hierarchically. For example, an address could aggregate street_address, city, etc.
The primary addition of the ECR model to the ER
and EER models is the CATEGORY entity. A CATEGORY is just like an ENTITY in that it groups entities and can participate in roles in relationships, but it is defined by a relationship to an ENTITY of another CATEGORY. There are two specializations of CATEGORY, depending on which kind of relationship defines the CATEGORY. The ISA_CATEGORY is defined by a CLASSIFICATION_RELATIONSHIP, which specializes an ENTITY or another CATEGORY. The GROUPING_OF_ENTITIES_CATEGORY is defined by a UNION_RELATIONSHIP, which groups ENTITIes or CATEGORIES into a UNION, including all the attributes of the ENTITIes or CATEGORIes grouped. This is different from a generalization relationship, because a generalization relationship puts only the intersection of the attributes (those common to all ENTITIes) into the supertype. An interesting issue here is whether there is a substantive difference between the ENTITY, ISA_CATEGORY, and GROUPING_OF_ENTITIES_CATEGORY entities. First, both an ENTITY and either kind of CATEGORY can play the PLAYER role in the PLAYS relationship (with the same cardinality). Second, both ENTITY and CATEGORY have NAME attributes. Third, both can aggregate their own ATTRIBUTEs. Furthermore, an ENTITY can also be combined with a CATEGORY (of either subtype) into a COMBINED_ENTITY_CATEGORY. Therefore, if an ENTITY is combined with an ISA_CATEGORY, it can additionally play the SUBTYPE role. Similarly, if an ENTITY is combined with a GROUPING_OF_ENTITIES_CATEGORY, it can additionally play the UNION role. Either type of CATEGORY can also play the SUPERTYPE and GROUPED roles, thus allowing them to be used to define other CATEGORIes. Finally, an ISA_CATEGORY may also be a 248
GENERALIZATION_CATEGORY and play its roles (and vice versa)! The above facts indicate that ENTITY and CATEGORY are actually synonyms and that differentiating the terms is not really useful. An ENTITY is only called a CATEGORY if it plays a SUBTYPE role relating it to a CLASSIFICATION_RELATIONSHIP or a UNION role relating it to a UNION_RELATIONSHIP. In fact, the figures in [Elmasri et al.] don't consistently distinguish between them. Nevertheless, the terms are differentiated in the ECR model and therefore are modelled as such in this section.
4. Object Modelling Technique
As noted in chapter 3, the Object Modelling Technique (OMT) of [Rumbaugh et al.] expands on conceptual data modelling to support object-oriented analysis. OMT contains conceptual data modelling concepts which are useful enough to bear discussing here. Object-oriented analysis uses an object type concept, which is very similar to the entity type. The differences are detailed in appendix B, but primarily, the object type has operations as a component, while the entity type does not. Since OMT uses objects instead of entities, we will use the term object here, but most features that apply to objects apply to entities as well. One particularly useful feature of OMT is that it directly supports simple covering aggregation of objects into higher-level, complex objects. While it does not directly support complex covering aggregation, it supports it indirectly by allowing objectification of relationships, as in NIAM, which can then be aggregated. The concepts of the OMT Object Model notation are described in figure A-5. OMT also includes a Dynamic Model notation and a Functional Model notation which are irrelevant to conceptual data modelling and are not described here.
Figure A-5: The Object Modelling Technique (OMT) In OMT, an object type is called a CLASS. A relationship is known as an ASSOCIATION. A CLASS may be linked to one or more other CLASSes by playing a 250
ROLE which is a component of the ASSOCIATION. In OMT, both CLASSes and ASSOCIATIONs may have ATTRIBUTEs as components. In support of object-oriented analysis, a CLASS may also have OPERATIONs as components. OMT supports generalization through GENERALIZATION_RELATIONSHIPs together with the GENERALIZES and SPECIALIZES relationships. A GENERALIZATION_RELATIONSHIP may optionally be associated with an ATTRIBUTE which plays a DISCRIMINATOR role (often called a partitioning attribute). A DISCRIMINATOR PARTITIONS a GENERALIZATION_RELATIONSHIP, i.e., it divides a SUPERTYPE CLASS into SUBTYPEs according to the DISCRIMINATOR's value. OMT supports simple covering aggregation with the AGGREGATION_RELATIONSHIP. Note that only CLASS may be a COMPONENT, so only simple covering aggregation
is allowed directly. Note also however that a CLASS may be specialized by objectifying a relationship into an ASSOCIATION_AS_CLASS which is discussed below. This may be used to indirectly support complex covering aggregation. OMT also supports objectification. Objectification is the act of treating a relationship (association in OMT) as an object (class in OMT). This is similar to the NESTED_FACT_TYPE in NIAM (see figure A-2). The result is an ASSOCIATION_AS_CLASS, which inherits from both ASSOCIATION and CLASS. In addition to having OPERATIONS as components of CLASSes, OMT provides further support for object-oriented analysis with OPERATION_PROPAGATION of an OPERATION FROM one CLASS TO another. Finally, OMT supports qualification of relationships. A qualified relationship is one in which a CLASS playing a particular ROLE is divided into disjoint groups by the value of an ATTRIBUTE (called a QUALIFIER) of another CLASS which plays another ROLE in the same relationship. The benefit of this is a smaller name space and a simple (rather than concatenated) key for the CLASS playing the QUALIFIED ROLE. However, I don't think that this aids in conceptualization of the problem domain.
Appendix B CoCoA Modelling of Additional Conceptual Data Modelling Abstractions
This appendix uses CoCoA to model additional conceptual data modelling abstractions than those included in section 6.3 in the dissertation. Section 6.3 models the Simple Entity, Named Relationship, and Fact Type conceptual data modelling abstractions. In addition, this appendix models the Generalization and Specialization, Categorization, Covering Aggregation, Objectification, Object Types, Derivation and Derived Concepts, and Constraint conceptual data modelling abstractions, each in their own section. 1. Generalization and Specialization This dissertation introduced generalization and specialization in section 4.3. It is used in several conceptual data models, and we have only a little more to add to its description in this section.
Figure B-1: Generalization-Specialization Data Abstraction
Generalization and specialization are complementary abstractions. The generalization and specialization abstraction is described more formally via figure B-1. A GENERALIZATION-SPECIALIZATION_RELATIONSHIP is used to link
ENTITY_TYPEs, which can play the role of SUPERTYPE or of SUBTYPE (or both roles, but not in the same relationship!). Through its two specializations, an ENTITY_TYPE may play the SUBTYPE role in one of the two _GENERALIZES relationships, so named because a SUBTYPE is the thing that gets GENERALIZEd. An ENTITY_TYPE may play the SUPERTYPE role in a SPECIALIZES relationship, since a SUPERTYPE is the thing that gets SPECIALIZEd. SUBCLASS and SPECIALIZATION are alternative names (aliases) for SUBTYPE. SUPERCLASS and GENERALIZATION are alternative names for SUPERTYPE. SPECIALIZES_TO is an alternative name for GENERALIZES and GENERALIZES_TO is an alternative name for SPECIALIZES. One question about generalization-specialization is whether multiple inheritance is allowed or not. Figure B-1 shows this with specializations of ENTITY_TYPE. The specializations are determined by the boolean partitioning attribute MULTIPLE_INHERITANCE into two subtypes, called MULTIPLE_INHERITANCE_ENTITY_TYPE and SINGLE_INHERITANCE_ENTITY_TYPE. As noted above, both subtypes play the SUBTYPE role in either a MULTIPLY_GENERALIZES or a SINGLY_GENERALIZES relationship. The difference between the two is that the maximum cardinality constraint of the SUBTYPE role is different in each: one (1) for single inheritance, many (n) for multiple inheritance. The SPECIALIZES relationship is the same for either single- or multiple-inheritance, so it is connected to their supertype (ENTITY_TYPE). The GENERALIZATION-SPECIALIZATION_RELATIONSHIP entity has three boolean attributes, DISJOINT, EXHAUSTIVE, and PARTITIONED. If DISJOINT is true, the various SUBTYPEs of the GENERALIZATION-SPECIALIZATION_RELATIONSHIP must not have overlapping instances, i.e., no instance can be of more than one SUBTYPE of the GENERALIZATION-SPECIALIZATION_RELATIONSHIP. If EXHAUSTIVE is true, every instance which
is a member of the SUPERTYPE must be a member of at least one of the SUBTYPEs of the GENERALIZATION-SPECIALIZATION_RELATIONSHIP. Note that PARTITIONED is derived. It is true if (and only if) both DISJOINT and EXHAUSTIVE are true.
2. Categorization
Categories were initially discussed in section 4.4. We have seen in our CoCoA modelling of the ECR model (appendix A, figure A-4) that the ECR model has two types of categories. The ISA_CATEGORY corresponds directly to specialization, which was addressed in the previous section. The GROUPING-OF-ENTITIES_CATEGORY_TYPE is what interests us here and is modelled using CoCoA in figure B-2.
Figure B-2: Grouping-of-Entities Category Data Abstraction
As discussed in section 4.4, the GROUPING-OF-ENTITIES_CATEGORY_TYPE is similar to generalization in that the relationship played by the category is "inherited" by all the ENTITY_TYPEs which it GROUPS, just like subtypes of a supertype. The difference is that a GROUPING-OF-ENTITIES_CATEGORY holds the union of all the
attributes of the GROUPED ENTITY_TYPEs. In generalization, only the intersection of the attributes of the generalized (subtype) entities is found in the supertype. The grouping of ENTITY_TYPEs into a GROUPING_OF_ENTITIES_CATEGORY_TYPE is accomplished through the UNION_RELATIONSHIP instead of a GENERALIZATION-SPECIALIZATION_RELATIONSHIP. Both ENTITY_TYPEs and GROUPING_OF_ENTITIES_CATEGORY_TYPEs may be PLAYERs in PLAYS named relationships. As discussed in Appendix A, the two concepts are differentiated only by the fact that the GROUPING_OF_ENTITIES_CATEGORY_TYPE plays the UNION role relating to a UNION_RELATIONSHIP. Therefore, it is a subtype of the ENTITY_TYPE and may be derived from the ENTITY_TYPE and its participation in the UNION role.
3. Covering Aggregation
Section 4.5 introduced both simple and complex covering aggregation. Both kinds of covering aggregation are modelled using CoCoA in figure B-3. The COVERING_AGGREGATION_RELATIONSHIP_TYPE entity is specialized into two subtypes, SIMPLE_COVERING_AGGREGATION_RELATIONSHIP_TYPE and COMPLEX_COVERING_AGGREGATION_RELATIONSHIP_TYPE. The only difference between the two is that a COMPLEX_COVERING_AGGREGATION_RELATIONSHIP_TYPE allows either a NAMED_RELATIONSHIP_TYPE or an ENTITY_TYPE to be a COMPONENT in a COMPLEX_COVER_AGGREGATES relationship with it. Each kind of COVERING_AGGREGATION_RELATIONSHIP_TYPE AGGREGATES_INTO an ENTITY_TYPE.
Figure B-3: Covering Aggregation Data Abstraction
The attributes of a COVERING_AGGREGATION_RELATIONSHIP_TYPE entity can be related to the classification in [Iivari]. Iivari developed a five-dimensional classification of covering aggregation. The first is whether the component object can be
connected (i.e., whether the covering is simple or complex). This is taken care of by the COMPLEX boolean attribute, which determines the specialization into one of the two subtypes. The second dimension is whether a COMPONENT ENTITY_TYPE is dependent or not, which is handled by the boolean attribute DEPENDENT_COMPONENT. If true, this implies deletion of the COMPONENT ENTITY_TYPE if the ASSEMBLY ENTITY_TYPE is deleted (unless the
COMPONENT ENTITY_TYPE is shared, see below). The third dimension is whether the aggregate entity (the ENTITY_TYPE which plays the ASSEMBLY role) is required to have a particular kind of component. This is represented by the boolean attribute MANDATORY_COMPONENT. The fourth dimension in [Iivari] is whether a component may be shared or not with other assemblies. The classification needs to be extended here because there are two cases of sharing, i.e., sharing of components with assemblies of the same type and sharing of components with assemblies of other types. Each case is handled by a different boolean attribute, SHARES_WITHIN_TYPE and SHARES_ACROSS_TYPES respectively. Each, if true, means "component may be shared." The fifth dimension is whether aggregation is recursive or not. Of course, recursion to the identical object instance cannot be allowed, but recursion to other objects of the same type is very common and will always be allowed in CoCoA. It is represented with the boolean attribute named RECURSIVE_COMPONENT. Note that several of the above attributes, namely MANDATORY_COMPONENT, SHARES_WITHIN_TYPE, SHARES_ACROSS_TYPES, and RECURSIVE_COMPONENT, are derived boolean attributes. The MANDATORY_COMPONENT and SHARES_WITHIN_TYPE attributes may be derived from the ASSEMBLY_MULTIPLICITY attribute, which is similar to the MULTIPLICITY attribute of the NAMED_RELATIONSHIP_ROLE_TYPE (see sections 4.3.2.2 and 6.3.2 and figure 6-6). In this case, the implied role is the ASSEMBLY role. If the minimum ASSEMBLY_MULTIPLICITY is zero, the COMPONENT is not mandatory (MANDATORY_COMPONENT is false). If the maximum ASSEMBLY_MULTIPLICITY is more than one, then the COMPONENT may be shared with other ASSEMBLIes of the same ENTITY_TYPE (SHARES_WITHIN_TYPE is true). The SHARES_ACROSS_TYPES attribute is derived as "true" if the COVERING_AGGREGATION_RELATIONSHIP_TYPE plays either role in a CAN_SHARE_COMPONENT_WITH relationship. The RECURSIVE_COMPONENT attribute is true if the COMPONENT ENTITY_TYPE (direct recursion) or any of its COMPONENTs (indirect recursion) are of the same entity type as the ASSEMBLY. Note that, while they were not included, we could have included DIRECT_RECURSIVE_COMPONENT and INDIRECT_RECURSIVE_COMPONENT attributes, which would be derived in the above way. The RECURSIVE_COMPONENT attribute would then be true if either (or both) of these attributes were true. Recursion is allowed in a CoCoA model. However, it might not be allowed in other data models.
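The derived attributes described above can be summarized in a small sketch. The Python fragment below is illustrative only; it assumes the ASSEMBLY_MULTIPLICITY is held as a (minimum, maximum) pair, with None standing for an unbounded maximum (n), and it follows the attribute names of figure B-3.

    class CoveringAggregationRelationshipType:
        def __init__(self, assembly_multiplicity, can_share_with=(), complex_covering=False):
            # ASSEMBLY_MULTIPLICITY as (minimum, maximum); None means "n" (many).
            self.assembly_multiplicity = assembly_multiplicity
            # Relationship types related through CAN_SHARE_COMPONENT_WITH.
            self.can_share_component_with = list(can_share_with)
            self.complex = complex_covering   # determines the specialization

        @property
        def mandatory_component(self):
            # Derived: not mandatory when the minimum ASSEMBLY_MULTIPLICITY is zero.
            minimum, _ = self.assembly_multiplicity
            return minimum > 0

        @property
        def shares_within_type(self):
            # Derived: sharing within the type when the maximum is more than one.
            _, maximum = self.assembly_multiplicity
            return maximum is None or maximum > 1

        @property
        def shares_across_types(self):
            # Derived: true if either role in a CAN_SHARE_COMPONENT_WITH
            # relationship is played.
            return len(self.can_share_component_with) > 0

    cover = CoveringAggregationRelationshipType((0, None))
    assert not cover.mandatory_component and cover.shares_within_type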
This completes the CoCoA modelling of the abstraction mechanisms which were reviewed in chapter 4. Now we will review other abstraction mechanisms which were introduced in our CoCoA modelling of conceptual data models other than CoCoA.
4. Objectification
While not utilized in CoCoA, we have seen objectification in both the NIAM and OMT models. Objectification is the abstraction by which a relationship is transformed into an entity (or object, but objects are distinguished from entities below). This is done so that a relationship may have the characteristics of an entity, for example, the ability to participate in relationships. The objectification abstraction is modelled in figure B-4. In the NIAM model (see appendix A), the result of objectification is called a NESTED_FACT_TYPE. OMT (see appendix A) calls it an ASSOCIATION_AS_CLASS.
Figure B-4: Objectification Data Abstraction
The result of objectification is modelled most straightforwardly by simply showing the result, called an OBJECTIFIED_RELATIONSHIP_TYPE, as (multiply) inheriting from both ENTITY_TYPE and NAMED_RELATIONSHIP_TYPE.
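Because the abstraction amounts to multiple inheritance, it can be rendered directly in any language that supports it. The Python sketch below is illustrative only; the class names follow figure B-4, and the example objectified relationship at the end is invented.

    class EntityType:
        """Can carry attribute types and play roles in named relationships."""
        def __init__(self, name):
            self.name = name
            self.attribute_types = []

    class NamedRelationshipType:
        """Relates the entity types that play its role types."""
        def __init__(self, name, role_types=()):
            self.name = name
            self.role_types = list(role_types)

    class ObjectifiedRelationshipType(EntityType, NamedRelationshipType):
        """NIAM's NESTED_FACT_TYPE / OMT's ASSOCIATION_AS_CLASS: a named
        relationship that also has all the characteristics of an entity."""
        def __init__(self, name, role_types=()):
            EntityType.__init__(self, name)
            NamedRelationshipType.__init__(self, name, role_types)

    # A hypothetical objectified WORKS_ON relationship can now carry its own
    # attribute types and participate in further relationships, like any entity type.
    works_on = ObjectifiedRelationshipType("WORKS_ON", role_types=["employee", "project"])
    works_on.attribute_types.append("hours_per_week")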
5. Object Types
Object types are used in the OMT model, which calls them classes (see appendix A). We use the term object type here to be consistent with the other terminology. Object types are modelled in figure B-5. Object types are used principally for object-oriented analysis (OOA). OOA is closely related to conceptual data modelling; therefore, object types are included here.
Figure B-5: Object Data Abstraction
Objects and entities are similar but, as noted in the previous section, somewhat different, as discussed below. The principal difference between OBJECT_TYPEs and ENTITY_TYPEs is that OBJECT_TYPEs may also have OPERATION_TYPEs as components. In a CoCoA
model, OBJECT_TYPE only needs to be made a specialization of ENTITY_TYPE in order to inherit ENTITY_TYPE's characteristics. Figure B-5 shows ATTRIBUTE_TYPEs as components of OBJECT_TYPE redundantly for clarity, even though they are inherited as components from ENTITY_TYPE anyway. OMT also includes propagation of operations, which is modelled here with the OPERATION_PROPAGATION_TYPE entity and the PROPAGATES, FROM, and TO relationships. The OPERATION_TYPE has the same attributes as described for OMT in appendix A, namely NAME, ABSTRACT (boolean, true if an abstract operation), ARGUMENTS, and CLASS (boolean, true if the operation applies to the class instead of instances). To these attributes, we add a FUNCTION boolean attribute, which is true if the OPERATION_TYPE is a function. FUNCTION partitions OPERATION_TYPE into two specializations, FUNCTION_OPERATION_TYPE (with a RETURN_DATA_TYPE attribute) and PROCEDURE_OPERATION_TYPE. FORMAL_ARGUMENT_TYPEs are components of OPERATION_TYPE and have NAME, MODE (input, output, or both), and DATA_TYPE attributes. The ARGUMENTS attribute of OPERATION_TYPE is derived from the OPERATION_TYPE's FORMAL_ARGUMENT_TYPE components. Derived ATTRIBUTE_TYPEs are often treated as FUNCTION_OPERATION_TYPEs (without FORMAL_ARGUMENT_TYPE components), so the ATTRIBUTE_TYPE and FUNCTION_OPERATION_TYPE concepts overlap somewhat in this regard. This is another reason for including OBJECT_TYPE in our analysis of conceptual data modelling abstraction mechanisms.
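As an informal sketch of how OBJECT_TYPE extends ENTITY_TYPE with operations, the Python fragment below follows the attribute names just listed; it is illustrative only, and the example operation at the end is invented. ARGUMENTS is shown as a property derived from the FORMAL_ARGUMENT_TYPE components, and the FUNCTION partition is rendered as two subclasses.

    class FormalArgumentType:
        def __init__(self, name, mode, data_type):
            self.name = name              # NAME
            self.mode = mode              # MODE: "input", "output", or "both"
            self.data_type = data_type    # DATA_TYPE

    class OperationType:
        def __init__(self, name, abstract=False, class_level=False, formal_arguments=()):
            self.name = name
            self.abstract = abstract              # ABSTRACT boolean attribute
            self.class_level = class_level        # CLASS boolean attribute
            self.formal_arguments = list(formal_arguments)

        @property
        def arguments(self):
            # ARGUMENTS is derived from the FORMAL_ARGUMENT_TYPE components.
            return [a.name for a in self.formal_arguments]

    class FunctionOperationType(OperationType):
        # FUNCTION = true: carries a RETURN_DATA_TYPE.
        def __init__(self, name, return_data_type, **kwargs):
            super().__init__(name, **kwargs)
            self.return_data_type = return_data_type

    class ProcedureOperationType(OperationType):
        # FUNCTION = false.
        pass

    age = FunctionOperationType("age", "integer",
                                formal_arguments=[FormalArgumentType("as_of", "input", "date")])
    assert age.arguments == ["as_of"]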
6. Derivation and Derived Concepts
Both NIAM and OMT (see appendix A) introduced derivation of their various concepts. The general idea is not to store information which is in some way redundant. Information is redundant if it can be derived from other existing information. Figure B-6 models derived concepts and their derivation.
Figure B-6: Derivation Data Abstraction
In NIAM, the main information-bearing concept is a FACT_TYPE (see section 6.3.3). A FACT_TYPE can be modelled as either DERIVED (meaning derivable) or not. As noted earlier in this appendix for ENTITY_TYPEs and ATTRIBUTE_TYPEs, and implied for NAMED_RELATIONSHIP_TYPEs (see section 6.3.1), the DERIVED attribute of an ENTITY_TYPE, ATTRIBUTE_TYPE, or NAMED_RELATIONSHIP_TYPE is itself derived from the presence of a DERIVES named relationship with a DERIVATION_RULE. Whether a FACT_TYPE is DERIVED or not is also determined by whether it is associated with a DERIVATION_RULE. In OMT, there are three kinds of concepts which may be either DERIVED or not (see appendix A), including ATTRIBUTE, ASSOCIATION (NAMED_RELATIONSHIPS),
and OBJECT_TYPE. Because OBJECT_TYPE is a subtype of ENTITY_TYPE and ENTITY_TYPEs could also be derived, ENTITY_TYPE is used instead of OBJECT_TYPE in figure B-6, and OBJECT_TYPE then inherits it. A DERIVATION_RULE has a SPECIFICATION attribute which explains how one DERIVES a FACT_TYPE (or other concept). A DERIVATION_RULE is also linked to the SOURCE of the information which it uses with a DERIVES_FROM named relationship. There is an inconsistency between the facts that (1) the DERIVED attribute of an ATTRIBUTE, NAMED_RELATIONSHIP, ENTITY_TYPE, or FACT_TYPE is derived from a DERIVES relationship with a DERIVATION_RULE and (2) a DERIVATION_RULE is optional in OMT. To address this inconsistency, we will allow the existence of a DERIVATION_RULE and a DERIVES relationship to the DERIVED concept, but the DERIVATION_RULE itself may remain unspecified (the SPECIFICATION attribute remains unvalued). This will allow derivation of the DERIVED attribute while not requiring SPECIFICATION of the DERIVATION_RULE, thus meeting the spirit of the optionality in OMT. As noted in figure B-6, the DERIVES_FROM named relationship is derived. The derivation is from the use of a SOURCE within the DERIVATION_RULE specification. Supporting this, the DERIVES_FROM relationship is optional (minimum participation = 0) for a DERIVATION_RULE. A COVERING_AGGREGATION_RELATIONSHIP_TYPE may also be a SOURCE for a DERIVATION_RULE, therefore it is shown as such in figure B-6. Details about DERIVATION_RULEs' SPECIFICATIONs, i.e., how they are formulated and specified and their various attributes, spill over from the conceptual data modelling area into the procedural area, and are therefore not considered further here. Derivations are informally described in CoCoA with plain language text, although the
text includes references to various SOURCE concepts. For example, section 6.3.1 informally described the DERIVATION_RULE of the DERIVED boolean attribute of ATTRIBUTE_TYPE as being derived from the presence (or absence) of the ATTRIBUTE_TYPE's participation in the DERIVED role of a DERIVES relationship with a DERIVATION_RULE entity type.
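A short sketch can make this convention concrete. The Python fragment below is illustrative only (the example fact types are invented): DERIVED is computed from the presence of a DERIVES relationship with a DERIVATION_RULE, whose SPECIFICATION may remain unvalued, as allowed above for OMT.

    class DerivationRule:
        def __init__(self, specification=None, sources=()):
            self.specification = specification   # may remain unvalued (None)
            # DERIVES_FROM is itself derived from the SOURCEs the rule uses.
            self.sources = list(sources)

    class FactType:
        def __init__(self, name):
            self.name = name
            self.derivation_rule = None          # set when a DERIVES relationship exists

        @property
        def derived(self):
            # DERIVED is derived from the presence of the DERIVES relationship,
            # whether or not the rule's SPECIFICATION is valued.
            return self.derivation_rule is not None

    age = FactType("person has age")
    age.derivation_rule = DerivationRule("current date minus birth date",
                                         sources=["person has birth date"])
    assert age.derived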
7. Constraints
There are many different kinds of constraints. Various cardinality or multiplicity constraints on entities' (and categories') participation in roles have been introduced already (see section 6.3.2 and figures 6-6 through 6-8). We have also introduced several kinds of intra-fact-type and inter-fact-type constraints in our review of NIAM in appendix A (see figure A-3). OMT also has a constraint type between relationships, which is a subset constraint, and OMT allows that there may be other kinds of constraints that may be useful; therefore, we should allow room for expansion, addition, and combination. NIAM's entity type constraint is the same as a data type definition from the other conceptual data models' points of view. It applies only to DESCRIPTOR_TYPEs and IDENTIFIER_TYPEs. Therefore, it was already accounted for in section 6.3.1 and figure 6-5. An entity type constraint in NIAM is a synonym for DATA_TYPE. The views on constraints from the various conceptual data models are integrated in figure B-7. NIAM identifies three kinds of intra-fact-type constraints, i.e., constraints which apply to a single role or combination of roles within one fact type. They are the UNIQUENESS_CONSTRAINT, the OCCURRENCE_FREQUENCY_CONSTRAINT, and the MANDATORY_ROLE_CONSTRAINT. A UNIQUENESS_CONSTRAINT is the same as having a MAXIMUM participation of 1. A MANDATORY_ROLE_CONSTRAINT is the same as a MINIMUM participation of 1 or more. An OCCURRENCE_FREQUENCY_CONSTRAINT is the same as a MULTIPLICITY constraint. Therefore, for constraints on single roles of named relationships, these constraints are already accounted for in figure 6-6 by the MULTIPLICITY, MAXIMUM, and MINIMUM attributes of a NAMED_RELATIONSHIP_ROLE_TYPE.
Figure B-7: Constraint Data Abstraction
However, there are two differences. First, NIAM applies these constraints to fact types, which are more general than named relationships, as we saw in section 6.3.3 (figure 6-9). Second, other data models apply multiplicity constraints to only a single role, but NIAM also applies them to combinations of roles. These differences are accounted for in the lower part of figure B-7. The INTRA-FACT-TYPE_PARTICIPATION_CONSTRAINT has the same MULTIPLICITY, MAX, and MIN attributes as NAMED_RELATIONSHIP_ROLE_TYPE in figure 6-6. It plays the CONSTRAINER role in a CONSTRAINS relationship with a CONSTRAINED UNORDERED_ROLE_TYPE_COMBINATION. While not shown in figure B-7, from an INTRA-FACT-TYPE_PARTICIPATION_CONSTRAINT's MULTIPLICITY, MAX, and MIN attributes respectively, one can derive NIAM's OCCURRENCE_FREQUENCY_CONSTRAINT, UNIQUENESS_CONSTRAINT, and MANDATORY_ROLE_CONSTRAINT. NIAM also has three kinds of INTER-FACT-TYPE_PARTICIPATION_CONSTRAINTs -- EQUALITY_CONSTRAINTs, EXCLUSION_CONSTRAINTs, and SUBSET_CONSTRAINTs. These were modelled in figure A-3 of appendix A and are included in figure B-7. OMT also uses a SUBSET_CONSTRAINT on named relationships. Because NAMED_RELATIONSHIP_TYPE is a subtype of FACT_TYPE, the OMT SUBSET_CONSTRAINT is accounted for here. The upper portion of figure B-7 shows a generalization hierarchy for INTER-FACT-TYPE_PARTICIPATION_CONSTRAINTs. INTER-FACT-TYPE_PARTICIPATION_CONSTRAINTs are constraints between two (or more) FACT_TYPEs. They are applied to ORDERED_ROLE_TYPE_COMBINATIONs within those FACT_TYPEs. They can be divided into REFLEXIVE_CONSTRAINTs and NON-REFLEXIVE_CONSTRAINTs. REFLEXIVE_CONSTRAINTs are ones in
which the ORDERED_ROLE_TYPE_COMBINATIONs are MUTUALLY_CONSTRAINED by each other. NON-REFLEXIVE_CONSTRAINTs are constraints in which one ORDERED_ROLE_TYPE_COMBINATION is a SOURCE which the constraint USES_TO_CONSTRAIN another ORDERED_ROLE_TYPE_COMBINATION. Both REFLEXIVE_ and NON-REFLEXIVE_ CONSTRAINTs have a TYPE attribute, which partitions the entity type into the particular type of constraint. Note that other specializations of REFLEXIVE_ and NON-REFLEXIVE_ CONSTRAINTs are possible, allowing for expansion.
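The correspondence between the intra-fact-type constraints and the MULTIPLICITY, MAX, and MIN attributes can also be summarized in a short sketch. The Python fragment below is illustrative only and covers just the intra-fact-type case; it assumes the constraint holds the (MIN, MAX) pair for a constrained role combination, from which the three NIAM constraint kinds are derived as described above.

    class IntraFactTypeParticipationConstraint:
        def __init__(self, constrained_role_combination, minimum, maximum):
            self.constrained = constrained_role_combination  # UNORDERED_ROLE_TYPE_COMBINATION
            self.minimum = minimum                            # MIN
            self.maximum = maximum                            # MAX; None means "n" (many)

        @property
        def uniqueness_constraint(self):
            # A UNIQUENESS_CONSTRAINT is the same as a MAXIMUM participation of 1.
            return self.maximum == 1

        @property
        def mandatory_role_constraint(self):
            # A MANDATORY_ROLE_CONSTRAINT is the same as a MINIMUM of 1 or more.
            return self.minimum >= 1

        @property
        def occurrence_frequency_constraint(self):
            # An OCCURRENCE_FREQUENCY_CONSTRAINT corresponds to the MULTIPLICITY pair.
            return (self.minimum, self.maximum)

    c = IntraFactTypeParticipationConstraint(("employee",), 1, 1)
    assert c.uniqueness_constraint and c.mandatory_role_constraint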
Appendix C Reviewed CoCoA Models of Additional Data Models
This appendix reviews the initial CoCoA models of the additional conceptual data models (EER, NIAM, ECR, and OMT), which were developed in appendix A. It extends the discussion in section 6.4.2 to include these four conceptual data models. For each of these four models, this appendix changes the old terminology for the concepts to the integrated terminology for the conceptual data modelling perspective developed in section 6.4.1, includes aliases for local terminology, and notes where concepts are derived. Each of these four models is now discussed in turn.
1. The EER Model
The terminology in the initial CoCoA model of the EER Model (given in figure A-1 of appendix A) must be changed to reflect that of the integrated CoCoA model of conceptual data modelling. As the EER model is based on the ER Model, all of the changes made for the ER model (see section 6.4.2.1) will be incorporated. Figure C-1 below shows the CoCoA model of the EER model which resolves these differences. In this and subsequent figures, the old name for a concept is shown as an alias (in quotations) along with the new name, if the two are significantly different.
Figure C-1: Revised CoCoA Model of the EER Model
The old SUBSET_RELATIONSHIP of figure A-1 is the same as a non-DISJOINT, non-EXHAUSTIVE (i.e., non-PARTITIONing) GENERALIZATION-SPECIALIZATION_RELATIONSHIP, as shown in figure C-1, while the old GENERALIZATION_RELATIONSHIP of figure A-1 is one which PARTITIONS. Therefore, both of these relationships generalize to the GENERALIZATION-SPECIALIZATION_RELATIONSHIP in figure C-1, with its DISJOINT and EXHAUSTIVE attributes, and may be derived from it. The SUBSET and SUPERSET roles correspond to SUBTYPE and SUPERTYPE respectively. The terms CONNECTIVITY and MEMBERSHIP_CLASS instead become MINIMUM and MAXIMUM respectively, as in figures 6-6 and 6-7.
2. The NIAM Model
The NIAM Model presents the most interesting challenge to integration with the other (ER-based) conceptual data models, because of its use of FACT_TYPEs and REFERENCE_TYPEs instead of ENTITY_TYPEs, ATTRIBUTE_TYPEs, and NAMED_RELATIONSHIP_TYPEs. Both FACT_TYPEs and REFERENCE_TYPEs are derivable from the combination of the other three. Please consult figure 6-9, which relates these five concepts. Figures C-2 and C-3 show the NIAM model with resolved terminology.
Figure C-2: Revised CoCoA Model of the NIAM Model
In figure C-2, a REFERENCE_TYPE may be derived from an ATTRIBUTE_TYPE which is an IDENTIFIER (LABEL_TYPE) together with the ENTITY_TYPE which it identifies. The ATTRIBUTE_TYPE becomes the LABEL_TYPE which plays the
IDENTIFIER_ROLE_TYPE in the REFERENCE_TYPE.
FACT_TYPEs (or their various specializations) may be derived in three main ways, each from a different source. The first way derives FACT_TYPEs from UNATTRIBUTED_NAMED_RELATIONSHIP_TYPEs. An UNATTRIBUTED_NAMED_RELATIONSHIP_TYPE is a subtype of FACT_TYPE, and NAMED_RELATIONSHIP_ROLE_TYPE is a subtype of ROLE_TYPE, so they simply map directly. The second way derives FACT_TYPEs from ATTRIBUTED_NAMED_RELATIONSHIP_TYPEs. The relationship part of an ATTRIBUTED_NAMED_RELATIONSHIP_TYPE maps directly in the same way as an UNATTRIBUTED_NAMED_RELATIONSHIP_TYPE because it is also a subtype of FACT_TYPE. However, in order to accommodate the ATTRIBUTES of the relationship, there are two alternatives. One (but only one) of the ATTRIBUTEs may optionally be included within the FACT_TYPE by playing a DESCRIPTOR_ROLE_TYPE. If there is more than one ATTRIBUTE (or simply as an option), the FACT_TYPE must be objectified into an OBJECTIFIED_RELATIONSHIP_TYPE (called a NESTED_FACT_TYPE in NIAM), which is a subtype of FACT_TYPE. Each ATTRIBUTE then becomes either another FACT_TYPE (if it is a DESCRIPTOR_TYPE) or a REFERENCE_TYPE (if it is an IDENTIFIER_TYPE) which is linked to the parent NESTED_FACT_TYPE (which is also a subtype of ENTITY_TYPE). Derivation of REFERENCE_TYPEs was described above. Derivation of FACT_TYPEs from DESCRIPTOR_TYPEs is described in the next paragraph. The third way derives a FACT_TYPE from an ATTRIBUTE_TYPE which is a DESCRIPTOR_TYPE together with the ENTITY_TYPE which it describes. Note that in figure 6-9, both ENTITY_TYPE and DESCRIPTOR_TYPE are subtypes of GENERAL_ENTITY_TYPE. The term ENTITY_TYPE in figure A-2 in appendix A,
which models NIAM, is changed to GENERAL_ENTITY_TYPE to accommodate this terminology resolution. The FACT_TYPE is derived by assigning roles to both the DESCRIPTOR_TYPE and the ENTITY_TYPE that it describes, as indicated in a covering aggregation relationship between them. The only thing that remains to do is to give a name to the FACT_TYPE and its component ROLE_TYPEs. As noted in the section on NIAM in appendix A, the names given such FACT_TYPEs are rather dubious anyway; they are only rarely helpful for understanding. Therefore, we can simply supply general names, such as "describes" or "has" for the FACT_TYPE and "descriptor" and "described" for the DESCRIPTOR_ROLE_TYPE and ENTITY_ROLE_TYPE respectively. Finally, we can also derive the ENTITY_TYPE_CONSTRAINT entity type and the DEFINES named relationship type. ENTITY_TYPE_CONSTRAINT is a synonym for DATA_TYPE and DESCRIPTOR_TYPE is a subtype of GENERAL_ENTITY_TYPE. The DATA_TYPE attribute of a DESCRIPTOR_TYPE is then an ENTITY_TYPE_CONSTRAINT which DEFINES its corresponding GENERAL_ENTITY_TYPE -- another straightforward mapping. Derivation/mapping in the opposite direction, from NIAM to ER-based conceptual data models, is not always so straightforward. However, derivation of the covering aggregation relationship between an IDENTIFIER_TYPE and its ENTITY_TYPE is easy from a REFERENCE_TYPE. Derivation of the DATA_TYPE from an ENTITY_TYPE_CONSTRAINT is similarly easy. One problem with derivation of ER-based model constructs from FACT_TYPEs is that there are four kinds of subtypes they could be, which we cannot always tell apart. Also, FACT_TYPEs may be derived from several different ER-based model constructs. The central problem is determining which of the GENERAL_ENTITY_TYPEs are DESCRIPTOR_TYPEs and which are ENTITY_TYPEs in the ER-based models. Human intervention may be required to
perform the mapping. However, there are some heuristics which we can apply to begin. First, if the FACT_TYPE is a UNARY_FACT_TYPE, then the GENERAL_ENTITY_TYPE involved is an ENTITY_TYPE, which will need an ATTRIBUTE_TYPE derived for it. This can be a simple boolean ATTRIBUTE_TYPE. Multiple, complementary UNARY_FACT_TYPEs linked to the same ENTITY_TYPE could possibly be combined into a single ATTRIBUTE_TYPE (typically enumerated). However, this is an optimization; at least we have identified an ENTITY_TYPE, which will remain so. Second, GENERAL_ENTITY_TYPEs which are referenced by a LABEL_TYPE are also ENTITY_TYPEs. Third, GENERAL_ENTITY_TYPEs which are DEFINEd by an ENTITY_TYPE_CONSTRAINT are DESCRIPTOR_ATTRIBUTES. There are two subsidiary cases of this. If a DESCRIPTOR_ATTRIBUTE participates in a binary FACT_TYPE, then the GENERAL_ENTITY_TYPE at the other end is an ENTITY_TYPE which it describes. If a DESCRIPTOR_ATTRIBUTE is linked to a FACT_TYPE of higher degree (ternary, quaternary, etc.), then there are two possibilities. Either the DESCRIPTOR_ATTRIBUTE is participating in an ATTRIBUTED_NAMED_RELATIONSHIP and describes that relationship, or it describes an ENTITY_TYPE in combination with another DESCRIPTOR_ATTRIBUTE.
Figure C-3: Revised CoCoA Model of NIAM Constraints
The latter is much less common. If it is the former, at least two of the roles in the higher-degree FACT_TYPE will be played by ENTITY_TYPEs. Using these heuristics, we may be able to fully map the NIAM_MODEL onto one of the ER-based models. However, we may not be able to map everything automatically, and some human decision making as to
whether a GENERAL_ENTITY_TYPE is an ENTITY_TYPE or a DESCRIPTOR_TYPE may be needed. Figure C-3 shows the changes made to figure A-3, which was the original CoCoA model of NIAM constraints. The only changes from figure A-3 are that the various intra-fact-type constraints are now derived from the MULTIPLICITY attribute. In summary then, the principal changes to figures A-2 and A-3 are to rename ENTITY_TYPE as GENERAL_ENTITY_TYPE and LABEL_TYPE as IDENTIFIER_TYPE and to mark the REFERENCE_TYPE, FACT_TYPE, ROLE_TYPE, ENTITY_TYPE_CONSTRAINT, UNIQUENESS_CONSTRAINT, OCCURRENCE_FREQUENCY_CONSTRAINT, and MANDATORY_ROLE_CONSTRAINT entity types and the USES_TO_REFER, REFERS_TO, PLAYS, and DEFINES relationships with asterisks to indicate they can be derived.
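The mapping heuristics described above can be gathered into a small classification sketch. The Python fragment below is illustrative only and deliberately simplified: it assumes each GENERAL_ENTITY_TYPE is presented as an object recording the fact types in which it plays roles, whether a LABEL_TYPE refers to it, and whether an ENTITY_TYPE_CONSTRAINT DEFINES it; anything the heuristics cannot settle is returned as undecided for human intervention.

    from types import SimpleNamespace

    def classify_general_entity_type(get):
        """Heuristically classify a GENERAL_ENTITY_TYPE for mapping to an ER-based model."""
        # First heuristic: participation in a unary fact type marks an entity type.
        if any(len(ft.roles) == 1 for ft in get.fact_types):
            return "ENTITY_TYPE"
        # Second heuristic: anything referenced by a LABEL_TYPE is an entity type.
        if get.referenced_by_label:
            return "ENTITY_TYPE"
        # Third heuristic: anything DEFINEd by an entity type constraint (a data type)
        # becomes a descriptor attribute.
        if get.defined_by_entity_type_constraint:
            return "DESCRIPTOR_ATTRIBUTE"
        # Otherwise a human decision is needed.
        return "undecided"

    person = SimpleNamespace(fact_types=[SimpleNamespace(roles=["smokes"])],
                             referenced_by_label=True,
                             defined_by_entity_type_constraint=False)
    assert classify_general_entity_type(person) == "ENTITY_TYPE"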
3. The ECR Model
Because the ECR Model is ER-based, most of its constructs also map easily onto the integrated CoCoA Model of conceptual data modelling. Figure C-4 shows the CoCoA model of the ECR model using the integrated terminology.
Figure C-4: Revised CoCoA Model of the ECR Model
ENTITY, ATTRIBUTE, ROLE, and RELATIONSHIP from the original CoCoA model of figure A-4 map directly to ENTITY_TYPE, ATTRIBUTE_TYPE, NAMED_RELATIONSHIP_ROLE_TYPE, and NAMED_RELATIONSHIP_TYPE in
the integrated terminology. The ECR model's CLASSIFICATION_RELATIONSHIP is the same as the integrated terminology's GENERALIZATION-SPECIALIZATION_RELATIONSHIP, so this maps directly. The ECR Model entity type which plays the SUBTYPE role in the GENERALIZES relationship with the CLASSIFICATION_RELATIONSHIP is called an ISA_CATEGORY. An ISA_CATEGORY is a synonym for a SPECIALIZED_ENTITY_TYPE in the integrated model. So an ENTITY_TYPE in the integrated model which plays a SUBTYPE role must be mapped onto an ISA_CATEGORY in the ECR model, and vice versa. If the ENTITY_TYPE additionally plays a SUPERTYPE role, then it must map onto a COMBINED_ENTITY_CATEGORY, which is both a CATEGORY and an ENTITY in the ECR model. Strictly speaking, the GROUPING_OF_ENTITIES_CATEGORY and UNION_RELATIONSHIP do not map directly onto any of the other models considered, and therefore the concepts are included separately in the integrated CoCoA model of conceptual data modelling. This is because of the semantics of the grouping
of all of the attributes of all of the entities (union) instead of the intersection, as in generalization. This concept is difficult to describe and is not utilized elsewhere. It is also unsupported by the relational model, violating first normal form, so it is difficult to implement. However, we can consider the grouping of entities without bringing the attributes along, which allows multiple ENTITY_TYPEs to play a single NAMED_RELATIONSHIP_ROLE_TYPE in a NAMED_RELATIONSHIP_TYPE. This maps directly to and from the CoCoA Model's support for simply allowing multiple participation in a role. When mapping from the integrated CoCoA model to ECR, the GROUPING_OF_ENTITIES_CATEGORY would simply have the same name as the NAMED_RELATIONSHIP_ROLE_TYPE that it plays. When mapping from ECR to the integrated CoCoA model, the GROUPING_OF_ENTITIES_CATEGORY would simply be dropped and its constituent ENTITY_TYPEs connected directly to the NAMED_RELATIONSHIP_ROLE_TYPE that the GROUPING_OF_ENTITIES_CATEGORY plays.
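The two mapping directions can be sketched as follows. The Python fragment is illustrative only; it represents the ECR category and the CoCoA role as plain dictionaries (invented for the example) and shows only the dropping of the category in one direction and its reintroduction, named after the role, in the other.

    def ecr_to_cocoa(category, role):
        # Drop the GROUPING_OF_ENTITIES_CATEGORY and connect its constituent
        # ENTITY_TYPEs directly to the NAMED_RELATIONSHIP_ROLE_TYPE it plays.
        role["players"] = list(category["members"])
        return role

    def cocoa_to_ecr(role):
        # Reintroduce a grouping-of-entities category with the same name as the
        # role it plays, grouping that role's multiple players.
        return {"name": role["name"], "members": list(role["players"])}

    owner_role = {"name": "owner", "players": []}
    legal_party = {"name": "LEGAL_PARTY", "members": ["PERSON", "COMPANY"]}
    ecr_to_cocoa(legal_party, owner_role)
    assert owner_role["players"] == ["PERSON", "COMPANY"]
    assert cocoa_to_ecr(owner_role)["members"] == ["PERSON", "COMPANY"]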
4. The OMT Model
While the OMT Model is not strictly ER-based, it has many common characteristics and mapping is generally straightforward between its view and the integrated CoCoA model of conceptual data modelling. See the original CoCoA model of OMT in figure A-5 and the new version in figure C-5.
Figure C-5: Revised CoCoA Model of the OMT Model
The CLASS, ATTRIBUTE, ROLE, ASSOCIATION, ASSOCIATION_AS_CLASS, GENERALIZATION_RELATIONSHIP, and
AGGREGATION_RELATIONSHIP entity types in figure A-5 correspond directly to the OBJECT_TYPE, ATTRIBUTE_TYPE, NAMED_RELATIONSHIP_ROLE_TYPE, NAMED_RELATIONSHIP_TYPE, OBJECTIFIED_NAMED_RELATIONSHIP_TYPE, GENERALIZATION-SPECIALIZATION_RELATIONSHIP, and SIMPLE_COVERING_AGGREGATION_RELATIONSHIP_TYPE in the integrated CoCoA model of conceptual data modelling. Therefore, these are renamed in figure C-5, which also shows aliases where the names aren't obvious. While operations and operation propagation or even objects themselves are not strictly within the purview of conceptual data modelling, they were considered and modelled in section 5 of appendix B. OPERATION and OPERATION_PROPAGATION correspond directly to the integrated CoCoA model entity types OPERATION_TYPE and OPERATION_PROPAGATION_TYPE, so these are also renamed in figure C-5. The relationship types and attribute types in figure A-5 also correspond directly, so mapping is straightforward in either direction. The only additional note is that OBJECT_TYPE is a subtype of ENTITY_TYPE. Therefore, any OBJECT_TYPE is also immediately an ENTITY_TYPE when mapping from OMT to the integrated model. When mapping from the integrated model to OMT, any ENTITY_TYPE stored using another conceptual data model can be further specialized as an OBJECT_TYPE to convert it to the corresponding OMT construct. To summarize, most of the concepts of OMT map directly onto the integrated model, only with the name changes noted. However, OPERATION_TYPEs and OPERATION_PROPAGATION_TYPEs would not be carried over to the other conceptual data models we have considered, as none of them have the same abstraction. When mapping from the integrated model to OMT, the mapping is almost as direct, with the exception that ENTITY_TYPEs must be made into OBJECT_TYPEs (merely by declaring them to be so) to give that mapping.
Appendix D Fundamental Object-Oriented Operations
This appendix provides a more detailed discussion of the fundamental object-oriented operations introduced in section 7.3.2. In particular, for each operation identified in section 7.3.2, this appendix provides more details on 1) the alternative forms of the interfaces to the operations and when to choose each, 2) the implementation of the operation, and 3) preconditions to the operations. The discussion on implementation includes both how the operations act on the standard object-oriented and relational data representations presented in sections 7.3.1 and 7.2, and also how they use the other available operations. Even though the discussions are more thorough here than in section 7.3.2, they are still just an introduction. See [Venable 1993b and 1993c] for more details.
Section 1 describes operations to support entities and attributes. Section 2 describes operations to support named relationships. Section 3 describes operations to support covering aggregation. Section 4 describes operations to store and retrieve objects to and from a relational database. Section 5 gives a short summary. As noted in section 7.3.2, the operations presented here are not all of the operations needed. These are only fundamental operations. Operations which are tied more closely to the problem domain are not discussed here.
1. Operations to Support Entities and Attributes
There are four basic operations needed to support entities and their attributes: Create_Object, Value_of_Attribute_xxx, Modify_Attribute_xxx, and Delete_Object. As mentioned in section 7.3.2, the part of the name in italics should be replaced with the
name of the attribute to which it applies. Each of these operations is now discussed in turn.
1) Create_Object (Object_Type, Attribute_Value_List)
This is a typical instantiation operation. It optionally initializes the instance variables with appropriate values. Some object-oriented languages provide a Create_Object operation implicitly while others require one to be written. In some systems this is invoked by sending a message to the class, in others by sending it to the object itself. When creating new instances of complex objects, the object starts out without any components. Once a complex object is created, objects and their relationships may be made components of it through the Include_object_xxx and Include_named_relationship_xxx operations to be discussed below.
2) Value_of_Attribute_xxx
This function is used to read the value of an instance variable of an object. For each attribute of each object, this operation is established, replacing Attribute_xxx with the attribute name. Depending on the object-oriented language system used, this operation may be implicit and a reference to the attribute name will suffice.
3) Modify_Attribute_xxx (New_Value)
This operation is used to assign a new value to an instance variable of an object. For each attribute of each object, this operation is established, replacing Attribute_xxx with the attribute name. In those object-oriented languages/systems which allow direct access to attributes (instance variables), this operation also will be implicit.
4) Delete_Object
This operation is used to discard an unneeded object and free up the space it used. Some object-oriented systems require explicit delete operations to free up space used by unneeded objects, but in object-oriented language systems with garbage collection, this operation may not be necessary. Even with garbage collection, most object-oriented
language systems optionally allow such an operation. When an object is deleted, any named relationships that the object participates in will also have to be deleted (as discussed in [Coad and Yourdon], [Smolander], and [Venable 1993b]) to ensure preservation of referential integrity within the object-oriented realm. The implementation of the Delete_Object operation can accomplish this by invoking the Delete_Relationship_xxx operation for all named relationships in which the object participates, as discussed below. Deleting complex objects (those which implement entity types which cover aggregate entity and/or named relationship types) is similar to deleting simple objects as described above, except that the component objects and covering relationships must also be dealt with. [Kim et al. 1989] discuss this for simple covering aggregation, but we expand the discussion here to complex covering aggregation. If a component object (no matter whether it implements an entity or a named relationship) is existence dependent on the aggregate object being deleted, then that component object must also be deleted. However, if a component object either is not existence dependent, or if it is existence dependent and also a component of another complex object, then it should not be deleted. Whether the component object is deleted or not, the pointers implementing the covering aggregation relationship must also be deleted (if in a set) or set to null values (if a simple instance variable). Finally, if an object being deleted is a component of another object (the entity type it implements is cover aggregated by another entity type), the operation must ensure that the covering aggregation relationship between it and the complex object is also deleted.
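For concreteness, the four basic operations are sketched below for a single hypothetical Employee object type with one attribute. The Python fragment is illustrative only; in a language with direct attribute access and garbage collection, operations 2 through 4 are largely implicit, so the explicit forms are included only to mirror the operation names above.

    class Employee:
        """Hypothetical object type used only to illustrate operations 1-4."""

        # 1) Create_Object: optionally initialize the instance variables.
        def __init__(self, name=None):
            self.name = name
            self.relationships = []     # pointers to named relationship objects

        # 2) Value_of_Attribute_Name
        def value_of_attribute_name(self):
            return self.name

        # 3) Modify_Attribute_Name (New_Value)
        def modify_attribute_name(self, new_value):
            self.name = new_value

        # 4) Delete_Object: delete participating named relationships first, to
        #    preserve referential integrity, before the object itself is discarded.
        def delete_object(self):
            for relationship in list(self.relationships):
                relationship.delete()   # assumed to unhook all participants

    e = Employee("Smith")               # Create_Object
    e.modify_attribute_name("Jones")    # Modify_Attribute_Name
    assert e.value_of_attribute_name() == "Jones"
    e.delete_object()                   # no relationships yet, so nothing to unhook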
2. Operations to Support Named Relationships
As introduced in section 7.3.2, there are four fundamental operations used to support named relationships. For each named relationship type, there will need to be create, delete, and traverse operations. Optionally there may be an "is_related" operation. Like the above operations on objects and attributes, the names of these operations need to be further elaborated according to the CoCoA models, i.e., the Relationship_role_xxx here must be replaced with the named relationship and/or role name from the relationships in the CoCoA model. As briefly introduced in section 7.3.2, there are different possible interface forms for the operations supporting named relationships, depending on the maximum cardinality of the entity that the object represents, the degree of the relationship, and the number of different role types in the named relationship that the entity represented by the object could possibly play. This section describes each different interface form and when to use it for each of the operations.
5) Create_Relationship_Role_xxx
This operation causes the data structures within the object-oriented system to be changed to represent the existence of a new named relationship. If the object which has this operation could participate in more than one role, the name of this operation must be made specific to which role is to be played by the object, and there must be a separate operation for each possible role. If the object could only play one of the roles, then only one operation is needed and the name of the role may be implicit, rather than explicit, in the name of the operation. The standard data structures for implementation of named relationship types were discussed in section 7.3.1.2. First, this operation causes a named relationship object to be created and the pointer instance variables within it made to point to the objects playing the roles. Second, the appropriate pointer within the object on which this operation is invoked is made to point to the new named relationship object. Finally, this operation needs to additionally ensure that the pointers within the other participating object(s) also refer to the new named relationship object. There are several possible interface forms to this operation.
a) Create_Relationship_Role_xxx (Other_Participating_Object)
This form of the interface to this operation is used if the named relationship is binary.
b) Create_Relationship_Role_xxx (Other_Participating_Object_List)
The second form is needed if the named relationship is of ternary or higher degree. If the relationship is ternary, there will be two items in Other_Participating_Object_List. If the relationship is quaternary, there will be three items, and so on. A precondition to the operation should be that its invocation should not violate the multiplicity constraints on the named relationship. If the maximum cardinality of the object's participation in the relationship is 1, then the instance variable within the objects whose participation is so constrained must be null before they are changed. This can be accomplished with the Delete_Relationship operation discussed below, but doing so is outside the scope of the Create_Relationship operation. If the maximum cardinality is n, then a pointer instance variable is simply added to the set or collection which implements the instances of the relationship type. [Venable 1993b] further discusses implementation of this operation, especially additional standard operations (including on the named relationship object itself) used in the implementation of Create_Relationship.
6) Delete_Relationship_Role_xxx
This operation causes the object-oriented system to be updated to reflect deletion of an existing relationship between two (or more) objects. It does this in three steps. First, the operation must take care of the pointer within the object on which the operation is invoked. If the object's maximum participation in the role is 1, this operation sets the object's simple pointer to the named relationship object to a null value. If the object's maximum participation in the role is greater than 1, this operation must search the object's set of pointers to named relationship objects to find the one which corresponds to the named relationship to be deleted, then delete it from the set.
Optionally, the pointer or object identifier may only be marked as deleted so that it is inactive, to facilitate ensuring deletion from the database (discussed in [Venable 1993b]). Second, the operation must also delete the named relationship object. Finally, the operation should ensure the consistency of the implementation of the relationship in the other participating objects by ensuring that the pointer(s) in the other object(s) which refer to the deleted named relationship object are either set to null, deleted from the set of pointers, or marked as deleted. There are three different interface forms to this operation.
a) Delete_Relationship_Role_xxx
This form (without parameters) is used when an object can only participate once in the given role in the named relationship (maximum cardinality = 1). If so, then it is obvious which instance of the relationship to delete. This interface form is sufficient for any relationship with maximum cardinality = 1, no matter what the degree (binary, ternary, etc.). The second and third forms below are for objects which implement an entity type which can participate more than once (maximum cardinality > 1) in a relationship. If so, then there may be multiple instances of the relationship and the one to be deleted must be located. This is why a parameter is supplied. The supplied parameter is used to search the named relationship objects for the one which implements the particular named relationship instance to be deleted.
b) Delete_Relationship_Role_xxx (Other_Participating_Object_ID)
This form is for binary relationships. In binary relationships, there will be only one other participating object. c) Delete_Relationship_Role_xxx (Other_Participating_Object_ID_List)
This form is used to delete a ternary or higher degree named relationship. If the named relationship is ternary, there will be two objects in the list which are matched 292
against the ordered pairs used to implement the relationship (see section 7.2.2). If the relationship is quaternary, there are three objects in Other_Participating_Object_ID_List, and so on. The third form is the most generic. It works for an object which can participate any number of times in a named relationship, whatever the degree of the named relationship. This form could be used in all situations, but is more cumbersome to construct and utilize than the ones above. Like the Create_Relationship operation, there are pre-conditions to the Delete_Relationship operation. It may be useful to enforce minimum cardinality in such an operation. If so, the operation should fail if attempting to reduce the cardinality of the relationship below the minimum. The named relationship must also exist before it can be deleted. Otherwise, it is an error to delete it. Like when deleting a component object above, if a relationship being deleted is a component of any complex object(s), any covering aggregation reference(s) to or from the complex object(s) must also be removed. Additionally, the Delete_Object operation must be extended to ensure deletion of named relationships and ensure referential integrity. This may include invoking the Delete_Relationship operation for each relationship the deleted object participates in.
7) Traverse_Relationship_Role_xxx
This operation (a function) enables us to see what other objects participate in the particular named relationship type in which the object on which we invoke it participates. It returns a pointer or Object_ID, or a set of pointers or Object_IDs, of the object(s) at (one of) the other end(s) of the named relationship. The thing returned from the operation depends on the maximum cardinality of the role played by the object on which the operation is invoked. If the maximum cardinality is 1, it returns a single pointer or Object_ID. If the maximum cardinality is n (many), it returns a set of pointers or Object_IDs.
Because named relationships are implemented here as named relationship objects, the implementation of the Traverse_Relationship operation will need to invoke operations on those objects. Such an implementation is discussed further in [Venable 1993b]. However, the interface to the Traverse_Relationship operation in the role-playing objects is the same as discussed below. There are four different forms to the interface to this operation. As with Create_Relationship and Delete_Relationship, the role that the object we are invoking this operation on plays in the relationship may not be obvious and may need to be specified. However, if it is obvious, then the operation name need not be made specific to the role played. Therefore, different forms are not given depending on this characteristic. However, the form of the interface to this operation still depends on the characteristics of the named relationship, the entity type's participation in it, and what information we might have to narrow the search.
a) Traverse_Relationship_Role_xxx
The first interface form is for a binary relationship, where it is obvious what role the other participating object(s) play, i.e., there is only one other role. Either the second, third, or fourth form is necessary if the degree of the named relationship type is greater than 2 (non-binary). Each allows the determination of which of the other named relationship role types we want to traverse the named relationship to.
b) Traverse_Relationship_Role_xxx (Role_Type_to)
In the second form, the Role_Type_to argument identifies which of the possible opposite ends of the relationship is desired. How to implement this argument (e.g., by name or maybe role number) is a design decision. Both the third and fourth forms instead include the Role_Type_to within the name of the operation rather than use a parameter.
c) Traverse_Relationship_Role_xxx_to_Role_xxx
The third form includes the Role_Type_to parameter within the name of the operation (to_Role_xxx) rather than using a parameter, but is otherwise equivalent to the second form.
d) Traverse_Relationship_Role_xxx_to_Role_xxx (Search_Role_ID_List)
The fourth interface form is used for ternary or higher degree relationships, where we additionally want to specify a subset of the relationships to be traversed. In this case, we may want to narrow the search for objects in the _to_Role_xxx by the objects that play the third (and/or fourth) role(s). The Search_Role_ID_List parameter is used to hold values to search for in the other role(s) in the relationship, i.e., other than the role played by the object on which the operation was invoked and also different than that played by the object(s) we are seeking. Only object(s) which participate in the to_Role_xxx with the object on which the operation is invoked, in relationships which additionally have objects from the Search_Role_ID_List playing the other role(s), are returned. Note that this parameter could be added to the second form to get the same effect. For example, if we had a named relationship X with roles A, B, and C, we could invoke the operation Traverse_X_Role_A_to_C on an object Y which plays role A with the Search_Role_ID_List parameter of a pointer to object Z. This would return all objects which play role C and also have both object Y playing role A and object Z playing role B in the same relationship. This might be a substantially smaller number of instances than all objects playing role C in relationship(s) with object Y playing role A. The addition of such a parameter gives the operation some of the character of a SELECT operation on a relational database. Including no parameters (the equivalent of the second and third forms) would return all objects playing the to_Role.
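The ternary example just given can be sketched in code. The Python fragment below is illustrative only: each instance of the hypothetical relationship X is an object holding one participant per role, and Traverse_X_Role_A_to_C filters the relationships in which the invoking object plays role A, optionally narrowing by the objects playing role B.

    class XRelationship:
        """A named relationship X with roles A, B, and C (one player each)."""
        def __init__(self, a, b, c):
            self.a, self.b, self.c = a, b, c
            a.x_relationships.append(self)     # reflexive pointer in the role-A player

    class Thing:
        def __init__(self, name):
            self.name = name
            self.x_relationships = []          # X relationships in which this object plays role A

        def traverse_x_role_a_to_c(self, search_role_id_list=None):
            # Return all objects playing role C in X relationships where this
            # object plays role A; if a search list is given, keep only the
            # relationships whose role-B player appears in it.
            return [rel.c for rel in self.x_relationships
                    if search_role_id_list is None or rel.b in search_role_id_list]

    y, z, w = Thing("Y"), Thing("Z"), Thing("W")
    c1, c2 = Thing("C1"), Thing("C2")
    XRelationship(y, z, c1)
    XRelationship(y, w, c2)
    assert y.traverse_x_role_a_to_c() == [c1, c2]     # all role-C players
    assert y.traverse_x_role_a_to_c([z]) == [c1]      # narrowed by role-B player Z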
This completes the description of the elementary operations required to support object-oriented implementation of named relationships. However, the following operation may also be useful.
8) Is_Related_by_Relationship_Role_xxx (Other_Role_ID_List)
This function returns TRUE if a relationship with the objects in Other_Role_ID_List exists as stated, FALSE otherwise. As above, the Role may not be needed as part of the name. This operation has only one interface form.
3. Operations to Support Covering Aggregation
The above eight operations follow from [Coad and Yourdon] and [Smolander]. See also the discussion in [Venable 1993b]. This dissertation further defines several new operations which support the additional semantics of the complex covering aggregation data abstraction supported in CoCoA. This section presents them. As discussed in section 7.3.2, each complex covering aggregate object needs operations to support the semantics of its complex covering aggregation relationships. Operations are needed to add and remove components, examine the state of the covering aggregation relationships, and to handle the special semantics of complex objects. These operations are presented below. Like operation types 1-8 above, many of these operations' names need to be further specified for each aggregation relationship defined in the conceptual data model. In the first four operations below, the _Object_xxx or _Relationship_xxx needs to be replaced by the actual name of the object type or named relationship type that is being aggregated into the covering object. The following operations are all to be made part of a covering object, which implements a complex entity.
9) Include_Object_xxx (Included_Object_ID)
This operation causes the object-oriented system representation to be updated to reflect that the complex object on which this operation is invoked contains the object in the parameter Included_Object_ID as a component. As discussed in section 7.3.1.5, a covering aggregation relationship may be
represented by a direct reference pointer to a covered object, stored within a set of the same within the covering object. This operation causes the representation of an instance of a covering aggregation relationship to be added to that set or collection within the covering object. Additionally, if the covering relationship is implemented reflexively (as recommended here), the instance variable within the included object which refers to the covering object is set to the correct value. If the object could be a component of several different aggregate objects, this instance variable will be a set or collection to which the reflexive pointer or object identifier must be added. A precondition for this operation is that the object to be covered must not already be a component. Otherwise, an error should be returned.
10) Disinclude_Object_xxx (Disincluded_Object_ID)
This operation causes the object-oriented system representation to be updated to reflect that the complex object on which this operation is invoked no longer contains the object in the parameter Disincluded_Object_ID as a component. This operation is similar to the Delete_Relationship operation (number 6 above). The pointer to the covered object to be disincluded is deleted from the set of pointers which represents the aggregation. Alternatively it may be marked as deleted and made inactive. If the covered object is existence dependent on the covering object and it is not shared with any other covering object, it should also be deleted (or moved instead). Any reflexive covering pointer in the formerly covered object should also either be deleted or set to a null value. A precondition for this operation is that the object to be disincluded must already be covered by the complex object on which the operation is invoked. Otherwise it is an error.
11) Include_Relationship_xxx (Named_Relationship_Object_ID)
This operation causes the object-oriented system representation to be updated to
reflect that the complex object on which this operation is invoked contains the named relationship in the parameter Named_Relationship_Object_ID as a component. This operation adds the pointer representation of the covering aggregation relationship between the particular named relationship and the covering object to the set of covered relationships represented in the covering object. The only parameter needed is the pointer to or object ID of the named relationship object. This is then added to the set within the covering object which represents covering of that particular named relationship type. If reflexive pointers are used in the covered named relationship object, an operation needs to be invoked on it which sets the reflexive pointer or object identifier to refer back to the covering object. A precondition for this operation is that the named relationship to be covered must not already be a component. Otherwise, an error should be returned.
12) Disinclude_Relationship_xxx (Named_Relationship_Object_ID)
This operation causes the object-oriented system representation to be updated to reflect that the complex object on which this operation is invoked no longer contains the named relationship in the parameter Named_Relationship_Object_ID as a component. This operation deletes the representation of the covering of the particular named relationship from the set of covered relationships represented in the covering object. If a reflexive reference back to the covering object is used in the named relationship object, that reference must also be deleted (or set to null). Additionally, if the named relationship being disincluded is both existence dependent and not shared with another covering object, it must also be deleted. This can be implemented by invoking the Delete_Relationship operation (see above) on one of the role-playing component objects. A precondition for this operation is that the named relationship to be disincluded must already be covered by the complex object on which the operation is invoked. Otherwise it is an error.
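Operations 9 and 10 can be sketched as follows; operations 11 and 12 are analogous but act on the set of covered named relationship objects. The Python fragment is illustrative only, uses reflexive back-pointers as recommended above, and leaves existence-dependency and sharing checks out.

    class ComponentObject:
        def __init__(self):
            self.covered_by = []        # reflexive pointers to covering objects (may be shared)

    class ComplexObject:
        """A covering object implementing a complex entity (sketch only)."""
        def __init__(self):
            self.components = []            # pointers to covered objects
            self.covered_relationships = [] # pointers to covered named relationship objects

        # 9) Include_Object_xxx (Included_Object_ID)
        def include_object(self, obj):
            if obj in self.components:                      # precondition
                raise ValueError("object is already a component")
            self.components.append(obj)
            obj.covered_by.append(self)                     # reflexive back-pointer

        # 10) Disinclude_Object_xxx (Disincluded_Object_ID)
        def disinclude_object(self, obj):
            if obj not in self.components:                  # precondition
                raise ValueError("object is not a component")
            self.components.remove(obj)
            obj.covered_by.remove(self)
            # An existence dependent, unshared component would also be deleted here.

    assembly, part = ComplexObject(), ComponentObject()
    assembly.include_object(part)
    assembly.disinclude_object(part)
    assert part.covered_by == []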
We may also wish to explicitly examine the state of the covering aggregation relationships. The following four operations provide that capability. For further discussion, see [Venable 1993b].
13) Retrieve_Component_Object_xxx
This operation allows its user to determine which object(s) (of the type Object_xxx) are components of the object on which the operation is invoked. It is similar to the Traverse_Relationship operation discussed in section 2. It is a function which returns a set of Object_IDs. As the covering relationship is a binary relationship, none of the other interface forms are needed as they were for the Traverse_Relationship operation.
14) Retrieve_Component_Relationship_xxx
The Retrieve_Component_Relationship operation is the same as the Retrieve_Component_Object operation, except that it returns a set of component Named_Relationship_Object_IDs for named relationships of type Relationship_xxx.
15) Is_Component_Object_xxx (Object_ID)
This operation allows its user to determine whether the object with Object_ID is a component object (of type Object_xxx) of the object on which the operation is invoked. It is similar to the Is_Related_by operation discussed in section 2. It is a function which returns TRUE if Object_ID is in the set of covered objects of type Object_xxx.
16) Is_Component_Relationship_xxx (Named_Relationship_Object_ID)
This operation is nearly identical to the Is_Component_Object operation. It returns TRUE if Named_Relationship_Object_ID is in the set of covered named relationships of type Named_Relationship_xxx. The covering aggregation relationships established or deleted by the above operations are utilized implicitly for a number of other operations according to their semantics in the CoCoA model. When an action is performed on a whole complex object, it is sometimes implied that the same should happen to its components. In particular, this is true for
storage and retrieval of the complex object (discussed in section 4 below) and also for copying and comparing complex objects (as discussed next).
17) Deep_copy
This operation (sometimes called deep clone) produces a copy of a covering object, including copies of all of its components. Named relationships with other objects are not copied. The alternative operation, a simple copy, copies just the object and its (cartesian aggregate) attributes. A simple copy includes references to the same components as the object being copied, if the components are allowed to be shared. Those components which cannot be shared may not be components of the new copy of a covering object. Sometimes, it is instead preferable that all the component objects should also be copied. This operation must first use the Create_Object operation to create the (empty) new object. It must either supply (copy) the instance variable values at creation or set them afterwards with the Modify_Attribute_xxx operations. Then it must create copies of each of its component objects (using Deep_copy on them) and create copies of its covered named relationships, using the Create_Relationship_xxx operation. Finally, it must make sure that the covering aggregation relationships are made between the Deep_copied object and its newly created component objects and relationships.
18) Deep_equal (Other_Covering_Object_ID)
A simple equality operation on a covering object will only compare the cartesian aggregate attribute instance variables of the complex object and its references to its component objects. Instead, this operation allows us to determine whether a complex object contains equivalent component objects and named relationships, but which are not necessarily identical (i.e., not with the same identity). It returns a TRUE value if all of the attributes of the two objects are equal and they have equal sets of component objects and named relationships, the objects in which are also Deep_equal.
To do this, the operation must first determine that the instance variables representing the attributes of the entity are equal, then go through the sets of component objects one by one. For each component object in the object on which the operation is invoked, an equivalent (Deep_equal) component object must be found in the Other_Covering_Object. Then, for each component named relationship, an equivalent component named relationship must also be found in the Other_Covering_Object. To be equivalent, two named relationships must have Deep_equal objects playing each pair of corresponding roles.
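A simplified sketch of the two operations follows. The Python fragment is illustrative only: attributes are held in a dictionary, covered named relationships are omitted, and components are copied and compared pairwise in order (a fuller implementation would instead search for an equivalent component, as described above, and would also recreate covered named relationships).

    class CoveringObject:
        def __init__(self, attributes=None, components=None):
            self.attributes = dict(attributes or {})   # cartesian aggregate attributes
            self.components = list(components or [])   # covered component objects

        # 17) Deep_copy: copy the object, its attributes, and all of its components.
        def deep_copy(self):
            return CoveringObject(self.attributes,
                                  [c.deep_copy() for c in self.components])

        # 18) Deep_equal: equal attributes and equivalent (Deep_equal) components.
        def deep_equal(self, other):
            return (self.attributes == other.attributes
                    and len(self.components) == len(other.components)
                    and all(a.deep_equal(b)
                            for a, b in zip(self.components, other.components)))

    wing = CoveringObject({"span": 12})
    plane = CoveringObject({"model": "X-1"}, [wing])
    copy = plane.deep_copy()
    assert copy.deep_equal(plane) and copy is not plane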
a meta schema based on the CoCoA models used as a basis for the relational and object-oriented implementations. This section discusses the six operations in turn, first the three operations for CoCoA entities and then the three for named relationships. It takes into account the relational and object-oriented implementations of CoCoA discussed in sections 7.2 and 7.3.1. Each of the names of the new operations begins with RDB, which stands for Relational Data Base. The numbering of the operations continues from the previous 18 operations. The first three of these operations apply to objects representing the entities and should be built into them.
19) RDB_Store
This operation stores an object into a relational database. Put more formally, it changes the state of the representation of a CoCoA entity within the persistent storage of the relational system (level 1) to be consistent with the state of the representation of the same CoCoA entity as an object held in-memory by the object-oriented system (level 2). The previous state of the relational representation of the entity is discarded. The only change to the in-memory storage of the object is that if it does not yet have a surrogate key, the key is determined when the object is placed into relational storage and its value is additionally placed in the object's surrogate key instance variable. A simple object with no relationships exists as a single tuple in a relational database. Its instance variables correspond to the relational attributes. If there is no corresponding tuple in the appropriate relation, this operation causes a new tuple representing the object to be added to the relation. If the corresponding tuple already exists, this operation replaces the tuple with one representing the object. Therefore, RDB_Store corresponds to the "add" and "change" transactions in a traditional transaction processing system. The object-oriented implementation of an entity may contain instance variables which implement part of named relationships, as discussed in section 7.3.1.2. Those instance
variables should not be saved by this operation, but only by the use of the RDB_Store_Relationship operation to be discussed below. If the object covers other objects and/or named relationships, then this operation should invoke the store operations for those things that it covers. Covered objects should be stored first; this can be done by invoking the RDB_Store operation on each of the component objects. Covered relationships can then be stored by invoking the RDB_Store_Relationship operation discussed in section 7.3.2.2 below. An important remaining issue is how to handle component entities and/or named relationships which have been previously deleted from a complex covering entity object. If an entity is deleted from an object representing a complex entity, and the aggregate object is later stored into the relational database, then any previously deleted objects which are both existence dependent and not currently shared with other complex objects should be deleted from the database. One means to do this is for the aggregate object to keep track of its components that have been deleted, so that they can also be deleted from the database if the complex object is later stored (or deleted, see below). Another means is to have the RDB_Store operation first delete all non-shared, existence dependent components from the database and then store them again from the in-memory objects. This could be made more efficient by deleting only those which are no longer components of the complex object being stored. Note that in either case, the relational storage must be consulted to see if a component is actually shared, as the other covering object(s) may not currently be retrieved into memory. The relational implementation of the entity follows the generalization-specialization structure in the CoCoA model. This operation must ensure that the object's instance variables get stored in the appropriate attributes of the appropriate specialized and/or generalized entity relations.
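The following is a minimal sketch of how RDB_Store might look in Python over sqlite3. It is an illustration only, not the dissertation's implementation: the one-relation-per-class layout with Surrogate_Key and Name columns, the Surrogate_Keys sequence table, and the Covers relation used to test sharing are all assumptions made for this sketch, and the storing of covered named relationships (operation 22) is omitted.

import sqlite3

class StorableObject:
    # Illustrative level-2 object: one relation per class, with Surrogate_Key
    # and Name columns (an assumption for this sketch only).
    def __init__(self, relation, name, existence_dependent=False):
        self.relation = relation
        self.name = name
        self.existence_dependent = existence_dependent
        self.surrogate_key = None
        self.components = []            # covered component objects
        self.deleted_components = []    # components removed since the last store

    def rdb_store(self, conn):
        cur = conn.cursor()
        if self.surrogate_key is None:
            cur.execute("INSERT INTO Surrogate_Keys DEFAULT VALUES")
            self.surrogate_key = cur.lastrowid
        # Add or replace the tuple representing the entity itself.
        cur.execute("INSERT OR REPLACE INTO %s (Surrogate_Key, Name) VALUES (?, ?)"
                    % self.relation, (self.surrogate_key, self.name))
        # Purge previously deleted components that are existence dependent and,
        # according to the relational storage, not shared with another cover.
        for gone in self.deleted_components:
            if gone.existence_dependent and not gone.is_shared(conn):
                gone.rdb_delete(conn)
        self.deleted_components = []
        # Covered component objects are stored first; covered named relationships
        # would then be stored with RDB_Store_Relationship (not shown here).
        for component in self.components:
            component.rdb_store(conn)
        conn.commit()

    def is_shared(self, conn):
        # Consult a (hypothetical) Covers relation to see whether more than one
        # covering object currently refers to this component.
        row = conn.execute("SELECT COUNT(*) FROM Covers WHERE Component_Key = ?",
                           (self.surrogate_key,)).fetchone()
        return row[0] > 1

    def rdb_delete(self, conn):
        conn.execute("DELETE FROM %s WHERE Surrogate_Key = ?" % self.relation,
                     (self.surrogate_key,))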
20) RDB_Retrieve
This operation retrieves an object from a relational database. Put more formally, this operation changes the state of the representation of an entity as an object held in-memory by an object-oriented system (level 2) to be consistent with the state of the representation of the same entity within the persistent storage of a relational system. The previous state of the object in-memory is discarded. If the object to be retrieved is a complex object, i.e., the relational database contains representation(s) indicating that it covers other objects and/or named relationships, then this operation should invoke create and retrieve operations for those things that it covers. Entity objects covered should be created and retrieved first, then named relationship objects. The RDB_Retrieve operation must also retrieve its instance variables according to the distribution of the attributes of the entity across the various specialized and generalized entity relations. The instance variables for the object being retrieved must get their values from the appropriate relational attributes, whatever relation they are located in.
21) RDB_Delete
This operation deletes an existing object from the relational database along with any named relationships in which it participates. Put more formally, this operation discards the representation of an entity within the relational system (level 1) which corresponds to the entity identified by the object held in-memory by the object-oriented system (level 2). The operation must also discard the relational representations of any named relationships in which the entity participates, to ensure preservation of referential integrity within the relational realm. For a simple object, the single tuple representing it is deleted from the relation corresponding to the object's type. The RDB_Delete operation must also account for generalization and specialization; therefore, the appropriate tuples must be deleted from all the specialized and generalized entity relations that implement the particular entity.
For a complex object, in addition to the requirements for a simple object above, the component objects and relationships may need to be deleted if they are both existence dependent and not shared with other objects. Those components which are not existence dependent, or which are shared with other objects, stand on their own and should not be deleted. The relational representation must be consulted to determine whether a component is shared with other entities, since the entities that the components are shared with might not be held in-memory by the object-oriented system at a particular point in time. If needed, deletion of components can be accomplished by invoking the RDB_Delete operation on the components to be deleted from the RDB. Whether the components are to be deleted or not, the relational implementation of the covering relationships must be deleted also. Having discussed the operations for entities, we will now discuss them for named relationships. Similar operations are needed. The following three operations should be built into the named relationship objects instead of the entity objects. However, for each of these three operations, alternative (or supplemental) operations are noted below which can be placed in the entity objects instead. These allow named relationships to be stored, retrieved, or deleted when their surrogate keys are not known.
22) RDB_Store_Relationship
This operation parallels the RDB_Store operation, except it is for a named relationship and it is to be placed in the named relationship objects. It alters the state of the relational storage of the named relationship to be consistent with the object-oriented implementation of the relationship(s) being stored. The operation maps the object-oriented implementation (pointer instance variables) of a named relationship onto its relational storage (surrogate key attributes). Named relationship objects are mapped onto tuples in relationship relations. To do so, the operation must first determine the surrogate keys of the participating entity objects. Since, in the implementation recommended in this dissertation, the
surrogate keys are held in instance variables in the entity objects, the operation must trace the pointers from the named relationship object's instance variables to retrieve the values of the surrogate keys from the entity objects' instance variables. Next, the operation must store the surrogate keys in the appropriate locations in the relationship relation. If there is an old tuple, it is replaced. If there is no old tuple, a new tuple is inserted. Finally, if the named relationship object does not yet have a surrogate key, the surrogate key is generated and, in addition to being stored in the surrogate key attribute of the intersection record, it is assigned to the named relationship object's surrogate key instance variable. The RDB_Store_Relationship operation must also account for any categories, if they are used. Upon completion, a tuple must be present in the category relation, and the relational attribute depicting which entity type of several in the category is participating in the relationship must be set according to the entity type represented by the participating object. A pre-condition to the use of this operation is that the participating objects must have already been associated with the relational database, or at least that their surrogate keys must have been created; otherwise they will not be available for the operation to determine. This operation can then be invoked easily as long as a pointer to the named relationship object is available. This could, for example, be done by the RDB_Store operation of a complex covering aggregate object by simply tracing the complex covering aggregation relationship(s) and invoking the operation on each named relationship object covered. It could also be done from a participating object which has a pointer instance variable in it which points to the named relationship object. The identity of the named relationship object might not be known except by an entity object's participation in it. If so, a participating entity object could have a supplemental operation to store its participation in the relationship, as follows:
22a) RDB_Store_Relationship_Role_xxx
This operation, which again is placed in the participating entity object, causes all instance(s) of the object's participation in Relationship_Role_xxx to be stored into the relational database. If the object's maximum participation in the relationship is one, it will be implemented in the object with a simple pointer to the named relationship object, and that pointer need only be followed to invoke the RDB_Store_Relationship operation on the named relationship object. If the maximum participation of the entity represented by the object is greater than one, then the relationship will be implemented in the object as a set of instance variables instead. Invoking the RDB_Store_Relationship_Role_xxx operation on the participating entity object then involves iterating across each of the pointer instance variables in the set, invoking the RDB_Store_Relationship operation on the named relationship pointed at by each pointer.
23) RDB_Retrieve_Relationship
The opposite process to the RDB_Store_Relationship operation must happen when a named relationship object is being retrieved. This operation parallels the RDB_Retrieve operation, except that it is for named relationships and it is to be placed in the named relationship objects. It alters the state of the object-oriented implementation to be consistent with the relational storage of the named relationship(s) being retrieved. For the RDB_Retrieve_Relationship operation to be located in the named relationship objects, relationship relations must be implemented with surrogate keys to identify the tuples, as recommended in this dissertation. Otherwise, there is no way to refer to the relationship within the relational model without already having created the named relationship object and already knowing the details of the participation in the relationship. The mappings discussed above are done in reverse. The surrogate key relational attribute from the relational implementation must be converted to the pointer instance
variables of the named relationship and entity objects. Before the operation can be invoked, the named relationship object must have been created and its surrogate key instance variable's value assigned. Invoking the operation first causes the relationship relation containing the intersection records to be searched to find the intersection record with the same surrogate key. Second, the object-oriented system must be consulted to determine the pointers to the objects with the surrogate keys found in the intersection record. Third, these pointers must then be assigned to the instance variables implementing the named relationship within the named relationship object. Finally, the objects referred to by the pointers must be made to point back to the named relationship object. The RDB_Retrieve_Relationship operation, like the RDB_Store_Relationship operation, must account for categories. The correct relation(s) need to be located and consulted depending on the type of the entity which participates in the relationship. A pre-condition to the use of this operation is that the participating objects must have already been retrieved or must otherwise exist in the object-oriented system; otherwise the pointers in the named relationship object will have nothing to refer to. This operation would be most useful for retrieving covered named relationships, where the surrogate keys to the intersection records are known from the covering relationships, but where it is not yet known which entities are related by the named relationship. However, as with the RDB_Store_Relationship operation, there may also be circumstances in which the surrogate key for the named relationship is not known, but it is known that one of the objects participates in the relationship. In this case, the relationship(s) that the object participates in can be retrieved by invoking a similar operation on a participating object, e.g., as follows:
23a) RDB_Retrieve_Relationship_Role_xxx
In this latter form of the operation, which is placed in the participating entity object,
the relationship relation must be searched for the surrogate key of the object on which the operation is invoked, to find the intersection record(s), if any, in which the object plays the particular role in the relationship. If any are found, each intersection record is retrieved and a corresponding named relationship object is created and filled with the appropriate instance variable values. If no intersection records are found, the existing simple pointer instance variable should be assigned a null value (or the set left empty). In either case, a set of pointers (for a maximum cardinality greater than one) should be emptied before instances are retrieved.
24) RDB_Delete_Relationship
This operation is similar to the RDB_Delete operation on entity objects, except that it is for named relationships and it is to be placed in the named relationship objects. It alters the state of the relational storage to reflect the deletion of a named relationship. The intersection record which implements the named relationship is deleted from the corresponding relationship relation. A precondition to invoking this operation is that if there is a mandatory role in the named relationship (minimum cardinality > 0), the entity tuple for the entity playing that role should have been deleted first, or the operation should not be allowed to proceed. Because this operation can be called to implement part of the RDB_Delete operation above, it is still needed. If desired, the RDB_Store_Relationship operation could instead be used to change the named relationship to link to a different entity first, rather than deleting it. Like RDB_Store_Relationship and RDB_Retrieve_Relationship, this operation must account for categories by deleting the appropriate tuple(s) from any relevant category relation(s). The above form of this operation is useful if you know the surrogate key. However, you may want to delete all the occurrences of a named relationship in which a particular object participates. If so, you could use the following form instead.
24a) RDB_Delete_Relationship_Role_xxx
This operation is an alternative form which should be placed in the participating entity object. It can be used to delete all relational references to the entity object via a particular role in a named relationship type when the entity object is deleted from the RDB, even if the entity type it represents can play the role more than once. The operation can be implemented by simply invoking the above RDB_Delete_Relationship operation on all named relationships in which the entity plays the role of interest. This operation could be used as part of the implementation of the RDB_Delete operation on an object, to ensure referential integrity in the relational storage.
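A minimal sketch of this alternative form, again in Python over sqlite3 and again with illustrative names only (for example, a Flows_to relationship relation with a Destination_Key role column), might look as follows:

def rdb_delete_relationship_role(conn, entity, rel_relation, role_column):
    # Find every intersection record in which the entity plays the given role ...
    cur = conn.cursor()
    cur.execute("SELECT Surrogate_Key FROM %s WHERE %s = ?" % (rel_relation, role_column),
                (entity.surrogate_key,))
    keys = [row[0] for row in cur.fetchall()]
    # ... and delete each one by its surrogate key, as RDB_Delete_Relationship would.
    for rel_key in keys:
        cur.execute("DELETE FROM %s WHERE Surrogate_Key = ?" % rel_relation, (rel_key,))
    conn.commit()

# Hypothetical usage: remove every Flows_to tuple in which a deleted External
# plays the destination role.
# rdb_delete_relationship_role(conn, some_external, "Flows_to", "Destination_Key")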
5. Summary
This completes the detailed discussion of the form, implementation, and preconditions for the fundamental types of operations on objects representing entities and named relationships derived from CoCoA Models. The discussion has pre-supposed the data implementations recommended in this dissertation. Further details of implementation, including alternative implementations, are available in [Venable 1993b]. Discussion of generalized RDB operations (section 4 above) is found in [Venable 1993c].
Appendix E Example Implementation of the ISAC A-Graph Solution of the IFIP Case
This appendix provides a full and detailed illustration of the relational and object-oriented implementations of an example, including the population of the example with data. In this example, we will use the ISAC solution to the IFIP case study (of a system to support the organization of an IFIP working conference), as described in [Lundeberg]. This well-known example was developed as a standard case against which a number of methodologies could be applied so that they could be compared [Olle et al.].
A CoCoA model of the ISAC method was developed and then integrated with other data flow models in chapter 5 (see figure 5-6). The overview ISAC A-graph from the ISAC solution in [Lundeberg] (p. 183) was shown in figure 7-2 (repeated in figure E-1).
Figure E-1: ISAC A-Graph of IFIP Case
Section 1 will show the relational implementation (a relational schema) of the CoCoA model of ISAC A-Graphs shown in figure 5-6 (repeated in figure E-2) and will also show its population with data to represent the diagram in figure E-1. Section 2 will show the object-oriented implementation (class specifications) of the CoCoA
model of ISAC A-Graphs and the instances of those classes (and the data they contain) which would represent the same A-graph.
1. Relational Implementation of the IFIP Case Example
This section illustrates the relational implementation of the CoCoA model of ISAC A-Graphs (see figure E-2). It shows what relational schema would result from following the procedure shown in section 7.2. It then shows how the data to represent the ISAC A-Graph in figure E-1 would be stored in the schema.
Figure E-2: CoCoA Model of Integrated Data Flow Perspective
The relational schema needs to represent the various abstractions in the CoCoA
model in figure E-2. We will consider the entity types, named relationship types, generalization-specialization relationships, categories, and covering aggregation relationship types, in that order. There should be a relation for each (non-derived) entity type. Each of these relations will have a relational attribute for each attribute of the entity type, as well as a surrogate key attribute. The entity relations for the entity types in figure E-2 are shown below.
Data_Flow_Model
ISAC_A-Graph
Process
External
Generalized_Flow
The relational attribute named Flow_Type will have one of 3 values: "Data flow", "Real flow", or "Composite flow". Next we look at the relationship relations to store the named relationships relevant to ISAC A-graphs shown in figure E-2. They are shown below.
Specifies
Flows_to
Flows_from
Join_into
Splits_into
Next we look at the generalization-specialization relationships. As noted in section 7.2, these relationships are not stored in the schema, but are in the meta-schema, if any. Placement of the attributes also is reflected in the structure of the schema, but only represented in the meta-schema (if any). Links between instances of supertypes and subtypes are through the same values of the surrogate key attributes. Thus, there are no additional relations at the schema level to account for this data abstraction. Next we look at categories. Section 7.2 recommended using explicit category relations. Figure E-2 has two categories, for entities playing the destination and source roles in the Flows_to and Flows_from named relationships, respectively. The necessary relations are shown below.
Destination
Source
Finally, we look at relational support for the covering aggregation relationships in figure E-2. Most of the various entity types and named relationship types discussed above are covered by the ISAC_A-Graph entity type. Relations to implement these covering relationships are shown below.
ISAC_A-Graph_Covers_Process
ISAC_A-Graph_Covers_External
ISAC_A-Graph_Covers_Generalized_Flow
ISAC_A-Graph_Covers_Flows_to
ISAC_A-Graph_Covers_Flows_from
ISAC_A-Graph_Covers_Join_into
ISAC_A-Graph_Covers_Splits_into
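To make the shape of such a schema concrete, the sketch below renders a representative subset of these relations as SQL table definitions, executed here through Python's sqlite3. Only the surrogate keys, the Name attributes, and the Flow_Type attribute are taken from the discussion above; the remaining column names (Flow_Key, Destination_Key, Graph_Key, and so on), the Surrogate_Keys sequence table, and the category relation layout are assumptions made for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A sequence table from which surrogate key values can be drawn (an assumption).
CREATE TABLE Surrogate_Keys (Key INTEGER PRIMARY KEY AUTOINCREMENT);

-- Entity relations: one attribute per entity attribute plus a surrogate key.
CREATE TABLE Process          (Surrogate_Key INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE External         (Surrogate_Key INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Generalized_Flow (Surrogate_Key INTEGER PRIMARY KEY, Name TEXT,
    Flow_Type TEXT CHECK (Flow_Type IN ('Data flow', 'Real flow', 'Composite flow')));

-- A relationship relation holding the surrogate keys of the participants.
CREATE TABLE Flows_to (Surrogate_Key INTEGER PRIMARY KEY,
                       Flow_Key INTEGER,
                       Destination_Key INTEGER);

-- An explicit category relation recording which entity type plays the role.
CREATE TABLE Destination (Surrogate_Key INTEGER PRIMARY KEY, Entity_Type TEXT);

-- A covering aggregation relation for the ISAC_A-Graph entity type.
CREATE TABLE "ISAC_A-Graph_Covers_Process" (Graph_Key INTEGER, Process_Key INTEGER);
""")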
Having specified the schema for relational storage of ISAC A-Graphs, we can now turn our attention to populating the schema with data to represent the overview A-Graph from the IFIP Case example given in figure E-1. Note that in the schema, we are unconcerned with the storage of the various numbers for the processes (or "activities"). These numbers can be generated. Below, we show the population of each of the relations specified in the relational schema above. In this example, we will use simple integers for the surrogate keys. Alternative implementations may be more useful.
Data_Flow_Model
ISAC_A-Graph
Process
External
Generalized_Flow
Specifies
Flows_to (no Flows_to relationship for "Reports on Working Conference")
Flows_from (no Flows_from relationship for "Proposals for Working Conference")
Join_into (no instances of this relationship)
Splits_into
Destination
Source
ISAC_A-Graph_Covers_Process
ISAC_A-Graph_Covers_External
ISAC_A-Graph_Covers_Generalized_Flow
ISAC_A-Graph_Covers_Flows_to
ISAC_A-Graph_Covers_Flows_from
ISAC_A-Graph_Covers_Join_into (no instances of this relationship)
ISAC_A-Graph_Covers_Splits_into
2. Object-Oriented Implementation of the IFIP Case Example
Having looked at the relational implementation of the example, we can now turn to its object-oriented implementation. Again, this comes in two parts, the objects' instance variables and also their operations. Both are included in the class specifications given
below. In general, a class will be created for each entity type shown in figure E-2. This section assumes that Create_Object and Delete_Object operations are either intrinsic in the object-oriented language system or can be inherited by all classes defined here, and that Value_of_Attribute and Modify_Attribute operations are similarly unnecessary. Furthermore, it is assumed that a class named RDB_Storable has been created, which has a Surrogate_Key attribute and RDB_Store, RDB_Retrieve, and RDB_Delete operations. This class is specified first. In the following specifications, "-->" refers to a pointer type.
Class RDB_Storable
-- instance variables
Surrogate_Key: RDB_Surrogate_Key
-- operations
Procedure RDB_Store
Procedure RDB_Retrieve
Procedure RDB_Delete
end Class RDB_Storable.
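For comparison only, a minimal Python rendering of such a base class is sketched below. It presumes the simple one-relation-per-class layout (Surrogate_Key and Name columns) assumed in the sketches of Appendix D and is not part of the specification; conn stands for an open sqlite3 connection, and RDB_Store is omitted here because it was sketched there.

class RDBStorable:
    # Sketch of an RDB_Storable-style base class; relation is set by each subclass.
    relation = None

    def __init__(self, name=None):
        self.surrogate_key = None
        self.name = name

    def rdb_retrieve(self, conn):
        # Discard the in-memory state and reload it from the relational level.
        row = conn.execute(
            "SELECT Name FROM %s WHERE Surrogate_Key = ?" % self.relation,
            (self.surrogate_key,)).fetchone()
        if row is not None:
            self.name = row[0]

    def rdb_delete(self, conn):
        # Discard the relational representation of the entity this object identifies.
        conn.execute("DELETE FROM %s WHERE Surrogate_Key = ?" % self.relation,
                     (self.surrogate_key,))
        conn.commit()

class Process(RDBStorable):
    relation = "Process"

class External(RDBStorable):
    relation = "External"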
Class Data_Flow_Model inherits RDB_Storable
-- instance variables
Name: text
specifier: --> Specifies
cover_processes: set of --> Process
cover_externals: set of --> External
cover_data_store: set of --> Data_Store
cover_generalized_flows: set of --> Generalized_Flow
cover_flows_to: set of --> Flows_to
cover_flows_from: set of --> Flows_from
cover_join_into: set of --> Join_into
cover_splits_into: set of --> Splits_into
-- operations for named relationships
Procedure Create_Specifies (proc: --> Process)
Procedure Delete_Specifies
Function Traverse_Specifies returns --> Process
Function Is_Specifier_of (proc: --> Process) returns boolean
-- operations for covering aggregation
Procedure Include_Process (proc: --> Process)
Procedure Include_External (ext: --> External)
Procedure Include_Data_Store (DS: --> Data_Store)
Procedure Include_Generalized_Flow (GF: --> Generalized_Flow)
Procedure Include_Flows_to (FT: --> Flows_to)
Procedure Include_Flows_from (FF: --> Flows_from)
Procedure Include_Splits_into (SI: --> Splits_into)
Procedure Include_Join_into (JI: --> Join_into)
Procedure Disinclude_Process (proc: --> Process)
Procedure Disinclude_External (ext: --> External)
Procedure Disinclude_Data_Store (DS: --> Data_Store)
Procedure Disinclude_Generalized_Flow (GF: --> Generalized_Flow)
Procedure Disinclude_Flows_to (FT: --> Flows_to)
Procedure Disinclude_Flows_from (FF: --> Flows_from)
Procedure Disinclude_Splits_into (SI: --> Splits_into)
Procedure Disinclude_Join_into (JI: --> Join_into)
Function Retrieve_Component_Process returns set of --> Process
Function Retrieve_Component_External returns set of --> External
Function Retrieve_Component_Data_Store returns set of --> Data_Store
Function Retrieve_Component_Generalized_Flow returns set of --> Generalized_Flow
Function Retrieve_Component_Flows_to returns set of --> Flows_to
Function Retrieve_Component_Flows_from returns set of --> Flows_from
Function Retrieve_Component_Splits_into returns set of --> Splits_into
Function Retrieve_Component_Join_into returns set of --> Join_into
Function Is_Component_Process (proc: --> Process) returns boolean
Function Is_Component_External (ext: --> External) returns boolean
Function Is_Component_Data_Store (DS: --> Data_Store) returns boolean
Function Is_Component_Generalized_Flow (GF: --> Generalized_Flow) returns boolean
Function Is_Component_Flows_to (FT: --> Flows_to) returns boolean
Function Is_Component_Flows_from (FF: --> Flows_from) returns boolean
Function Is_Component_Splits_into (SI: --> Splits_into) returns boolean
Function Is_Component_Join_into (JI: --> Join_into) returns boolean
Function Deep_Copy returns --> Data_Flow_Model
Function Deep_Equal (dfm: --> Data_Flow_Model) returns boolean
-- operations for storing named relationships
Procedure RDB_Store_Specifies
Procedure RDB_Retrieve_Specifies
Procedure RDB_Delete_Specifies
end Class Data_Flow_Model.
Class ISAC_A-Graph inherits Data_Flow_Model
-- specific items would be added for user depiction
-- in level 4 of architecture
end Class ISAC_A-Graph.
Class Process inherits RDB_Storable
-- instance variables
Name: text
specified: --> Specifies
source: --> Flows_from
destination: --> Flows_to
-- operations for named relationships
Procedure Create_Specifies (dfm: --> Data_Flow_Model)
Procedure Delete_Specifies
Function Traverse_Specifies returns --> Data_Flow_Model
Function Is_Specified_by (dfm: --> Data_Flow_Model) returns boolean
Procedure Create_Flows_from (flow: --> Generalized_Flow)
Procedure Delete_Flows_from (flow: --> Generalized_Flow)
Function Traverse_Flows_from returns set of --> Generalized_Flow
Function Is_Source_of (flow: --> Generalized_Flow) returns boolean
Procedure Create_Flows_to (flow: --> Generalized_Flow)
Procedure Delete_Flows_to (flow: --> Generalized_Flow)
Function Traverse_Flows_to returns set of --> Generalized_Flow
Function Is_Destination_of (flow: --> Generalized_Flow) returns boolean
-- operations for storing named relationships
Procedure RDB_Store_Specifies
Procedure RDB_Retrieve_Specifies
Procedure RDB_Delete_Specifies
Procedure RDB_Store_Flows_from_source
Procedure RDB_Retrieve_Flows_from_source
Procedure RDB_Delete_Flows_from_source
Procedure RDB_Store_Flows_to_destination
Procedure RDB_Retrieve_Flows_to_destination
Procedure RDB_Delete_Flows_to_destination
end Class Process.
Class External inherits RDB_Storable
-- instance variables
Name: text
source: --> Flows_from
destination: --> Flows_to
-- operations for named relationships
Procedure Create_Flows_from (flow: --> Generalized_Flow)
Procedure Delete_Flows_from (flow: --> Generalized_Flow)
Function Traverse_Flows_from returns set of --> Generalized_Flow
Function Is_Source_of (flow: --> Generalized_Flow) returns boolean
Procedure Create_Flows_to (flow: --> Generalized_Flow)
Procedure Delete_Flows_to (flow: --> Generalized_Flow)
Function Traverse_Flows_to returns set of --> Generalized_Flow
Function Is_Destination_of (flow: --> Generalized_Flow) returns boolean
-- operations for storing named relationships
Procedure RDB_Store_Flows_from_source
Procedure RDB_Retrieve_Flows_from_source
Procedure RDB_Delete_Flows_from_source
Procedure RDB_Store_Flows_to_destination
Procedure RDB_Retrieve_Flows_to_destination
Procedure RDB_Delete_Flows_to_destination
end Class External.
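As an illustration of how the named relationship operations of such a class might maintain the pointer instance variables on both sides, the following Python sketch shows Create_Flows_to, Traverse_Flows_to, and Is_Destination_of. The class and variable names follow figure E-2, but the code itself is an assumption made for illustration; in particular, it stores the External's participation as a set of relationship objects, matching the set returned by Traverse_Flows_to above.

class GeneralizedFlow:
    def __init__(self, name, flow_type):
        self.name = name
        self.flow_type = flow_type     # "Data flow", "Real flow" or "Composite flow"
        self.flows_to = None           # pointer to its Flows_to relationship object

class FlowsTo:
    # Named relationship object linking a flow to its destination.
    def __init__(self, destination, flow):
        self.destination = destination
        self.flow = flow

class External:
    def __init__(self, name):
        self.name = name
        self.destination = set()       # set of Flows_to relationship objects

    def create_flows_to(self, flow):
        # Build the named relationship object and set the pointers on both sides.
        rel = FlowsTo(destination=self, flow=flow)
        self.destination.add(rel)
        flow.flows_to = rel
        return rel

    def traverse_flows_to(self):
        # Follow the relationship objects to the related Generalized_Flow objects.
        return {rel.flow for rel in self.destination}

    def is_destination_of(self, flow):
        return any(rel.flow is flow for rel in self.destination)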
Class Generalized_Flow inherits RDB_Storable
-- instance variables
Name: text
Flow_Type: {"Data_flow", "Real_flow", "Composite_Flow"}
flows_to: --> Flows_to
flows_from: --> Flows_from
split_superflow: --> Splits_into
split_subflow: set of --> Splits_into
join_superflow: --> Join_into
join_subflow: set of --> Join_into
-- operations for named relationships
Procedure Create_Flows_from (source: --> Process, External, or Data_Store)
Procedure Delete_Flows_from
Function Traverse_Flows_from returns --> Process, External, or Data_Store
Function Is_Flow_from (source: --> Process, External, or Data_Store) returns boolean
Procedure Create_Flows_to (destination: --> Process, External, or Data_Store)
Procedure Delete_Flows_to
Function Traverse_Flows_to returns --> Process, External, or Data_Store
Function Is_Flow_to (destination: --> Process, External, or Data_Store) returns boolean
Procedure Create_Superflow_Splits_into (subflow: --> Generalized_Flow)
Procedure Delete_Superflow_Splits_into (subflow: --> Generalized_Flow)
Function Traverse_Superflow_Splits_into returns set of --> Generalized_Flow
Procedure Create_Subflow_Splits_into (subflow: --> Generalized_Flow)
Procedure Delete_Subflow_Splits_into
Function Traverse_Subflow_Splits_into returns --> Generalized_Flow
Procedure Create_Superflow_Join_into (subflow: --> Generalized_Flow)
Procedure Delete_Superflow_Join_into (subflow: --> Generalized_Flow)
Function Traverse_Superflow_Join_into returns set of --> Generalized_Flow
Procedure Create_Subflow_Join_into (subflow: --> Generalized_Flow)
Procedure Delete_Subflow_Join_into
Function Traverse_Subflow_Join_into returns --> Generalized_Flow
-- operations for storing named relationships
Procedure RDB_Store_Flows_from_flow
Procedure RDB_Retrieve_Flows_from_flow
Procedure RDB_Delete_Flows_from_flow
Procedure RDB_Store_Flows_to_flow
Procedure RDB_Retrieve_Flows_to_flow
Procedure RDB_Delete_Flows_to_flow
Procedure RDB_Store_Splits_into_subflow
Procedure RDB_Retrieve_Splits_into_subflow
Procedure RDB_Delete_Splits_into_subflow
Procedure RDB_Store_Splits_into_superflow
Procedure RDB_Retrieve_Splits_into_superflow
Procedure RDB_Delete_Splits_into_superflow
Procedure RDB_Store_Join_into_subflow
Procedure RDB_Retrieve_Join_into_subflow
Procedure RDB_Delete_Join_into_subflow
Procedure RDB_Store_Join_into_superflow
Procedure RDB_Retrieve_Join_into_superflow
Procedure RDB_Delete_Join_into_superflow
end Class Generalized_Flow.
Now that we have specified the classes which implement the entity types, we can turn to the classes which implement the named relationship objects. Just as the classes which implement the entity objects all inherit from the RDB_Storable class, all classes which implement named relationship objects should inherit from a similar class, called RDB_Storable_Relationship_Object. The specification for this class is shown first below, then the individual classes for the 5 named relationship types in figure E-2.
Class RDB_Storable_Relationship_Object
-- instance variables
Surrogate_Key: RDB_Surrogate_Key
-- operations
Procedure RDB_Store_Relationship
Procedure RDB_Retrieve_Relationship
Procedure RDB_Delete_Relationship
end Class RDB_Storable_Relationship_Object.
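Before listing the individual relationship classes, the following sketch illustrates how RDB_Store_Relationship (operation 22) might map the pointer instance variables of a Flows_to object onto its relationship relation, using Python and sqlite3 with the column names assumed in the schema sketch of section 1. It is illustrative only and presumes that the participating objects already have surrogate keys.

class FlowsToRelationship:
    # Sketch of a named relationship object with an RDB_Store_Relationship operation.
    def __init__(self, destination, flow):
        self.surrogate_key = None
        self.destination = destination   # External, Process, or Data_Store object
        self.flow = flow                  # Generalized_Flow object

    def rdb_store_relationship(self, conn):
        cur = conn.cursor()
        if self.surrogate_key is None:
            # Draw a surrogate key from the (assumed) sequence table.
            cur.execute("INSERT INTO Surrogate_Keys DEFAULT VALUES")
            self.surrogate_key = cur.lastrowid
        # Trace the pointers to the participants' surrogate keys and write
        # (or replace) the intersection record.
        cur.execute(
            "INSERT OR REPLACE INTO Flows_to (Surrogate_Key, Flow_Key, Destination_Key) "
            "VALUES (?, ?, ?)",
            (self.surrogate_key, self.flow.surrogate_key, self.destination.surrogate_key))
        # Record in the category relation which entity type plays the destination role.
        cur.execute(
            "INSERT OR REPLACE INTO Destination (Surrogate_Key, Entity_Type) VALUES (?, ?)",
            (self.destination.surrogate_key, type(self.destination).__name__))
        conn.commit()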
Class Specifies inherits RDB_Storable_Relationship_Object
-- instance variables
specifier: --> Data_Flow_Model
specified: --> Process
end Class Specifies.
Class Flows_to inherits RDB_Storable_Relationship_Object
-- instance variables
destination: --> External, Process, or Data_Store
flow: --> Generalized_Flow
end Class Flows_to.
Class Flows_from inherits RDB_Storable_Relationship_Object
-- instance variables
source: --> External, Process, or Data_Store
flow: --> Generalized_Flow
end Class Flows_from.
Class Join_into inherits RDB_Storable_Relationship_Object
-- instance variables
superflow: --> Generalized_Flow
subflow: set of --> Generalized_Flow
end Class Join_into.
Class Splits_into inherits RDB_Storable_Relationship_Object
-- instance variables
superflow: --> Generalized_Flow
subflow: set of --> Generalized_Flow
end Class Splits_into.
This concludes the example specification of the classes which implement the entity types and relationship types used in the ISAC A-Graph.
BIBLIOGRAPHY
[Alagic] Alagic, Suad, Object-Oriented Database Programming. Springer-Verlag, New York, NY, 1988.
[Batani et al., 1986] Batini, Carlo, M. Lenzerini, and Shamkant B. Navathe, "A Comparative Analysis of Methodologies for Database Schema Integration," ACM Computing Surveys, Vol. 18, No. 4, December 1986, pp. 323-364.
[Batani et al., 1992] Batini, Carlo, Stefano Ceri, and Shamkant B. Navathe, Conceptual Database Design: An Entity-Relationship Approach. Benjamin/Cummings Publishing Company, Redwood City, CA, 1992.
[Batory and Buchman] Batory, D. S. and A. P. Buchman, "Molecular Objects, Abstract Data Types, and Data Models: A Framework," Proceedings of the Tenth International Conference on Very Large Data Bases, Singapore, August, 1984 (U. Dayal, F. Schlageter, and L. H. Song, editors), pp. 172-184.
[Blaha et al.] Blaha, Michael R., William J. Premerlani, and James E. Rumbaugh, "Relational Database Design Using an Object-Oriented Methodology," Communications of the ACM, Vol. 31, No. 4, April 1988, pp. 414-427.
[CASE Studies 1987] CASE Studies 1987. Proceedings of the Eighth Annual Conference on Applications of Computer-Aided Systems Engineering Tools, Ann Arbor, MI, USA, May 18-22, 1987, Meta Systems, Ann Arbor, MI, 1987.
[CASE Studies 1988] CASE Studies 1988. Proceedings of the Ninth Annual Conference on Applications of Computer-Aided Systems Engineering Tools, Ann Arbor, MI, USA, May 23-27, 1988, Meta Systems, Ann Arbor, MI, 1988.
[Charette] Charette, Robert N., Software Engineering Environments: Concepts and Technology. Intertext Publications, New York, NY, 1986.
[Chen] Chen, Peter Pin-Shan, "The Entity-Relationship Model - Toward a Unified View of Data," ACM Transactions on Database Systems, Vol. 1, No. 1, March 1976, pp. 9-36.
[Coad and Yourdon] Coad, Peter and Edward Yourdon, Object-Oriented Analysis. 2nd ed., Yourdon Press, Englewood Cliffs, NJ, 1991.
[DeMarco] DeMarco, Tom, Structured Analysis and System Specification. Prentice-Hall, Englewood Cliffs, NJ, 1979.
[Elmasri et al.] Elmasri, R., J. Weeldreyer, and A. Hevner, "The Category Concept: An Extension to the Entity-Relationship Model," Data and Knowledge Engineering, Vol. 1 (1985), pp. 75-116.
[Gane and Sarson] Gane, Christopher P. and Trish Sarson, Structured Systems Analysis: Tools and Techniques. Prentice-Hall, Englewood Cliffs, NJ, 1977.
[Harel] Harel, David, "On Visual Formalisms," Communications of the ACM, Vol. 31, No. 5, May 1988, pp. 514-530.
[Heym and Österle] Heym, M. and H. Österle, "A Reference Model for Information Systems Development," in [Kendall et al.], pp. 215-239.
[Iivari] Iivari, Juhani, "Relationships, Aggregations, and Complex Objects," Proceedings of the European-Japanese Seminar on Information Modelling and Knowledge Bases. (S. Ohsuga, K. Hori, H. Kangassalo, and H. Jaakkola, editors) IOS Press, Amsterdam, The Netherlands, 1992.
[Kendall et al.] Kendall, Kenneth E., Kalle Lyytinen, and Janice I. DeGross (editors), The Impact of Computer Supported Technologies on Information Systems Development. Proceedings of the IFIP WG 8.2 Conference on the Impact of Computer Supported Technologies on Information Systems Development, Minneapolis, MN, USA, 14-17 June 1992, North Holland, New York, NY, 1992.
[Kerola and Oinas-Kukkonen] Kerola, Pentti and Harri Oinas-Kukkonen, "Hypertext Systems as an Intermediary Agent in CASE Environments," in [Kendall et al.], pp. 289-313.
[Kim et al., 1988] Kim, Won, Hong-Tai Chou, and Jay Banerjee, "Operations and Implementation of Complex Objects," IEEE Transactions on Software Engineering, Vol. 14, No. 7, July 1988, pp. 985-996.
[Kim et al., 1989] Kim, Won, Elisa Bertino, and Jorge F. Garza, "Composite Objects Revisited," Proceedings of the 1989 ACM SIGMOD International Conference on the Management of Data, Portland, Oregon, (J. Clifford, E. Lindsay, and D. Maier, editors) SIGMOD Record, Vol. 18, No. 2, 1989, pp. 337-347.
[Klein and Venable] Klein, Heinz K. and John R. Venable, "An Object-Oriented Tool Architecture to Support Systems Development as Negotiation," in [CASE Studies 1987], paper number C8730.
[Loomis et al.] Loomis, Mary E. S., A. V. S. Shah, and James E. Rumbaugh, "An Object Modelling Technique for Conceptual Design," Proceedings of the European Conference on Object-Oriented Programming. Paris, France, 15-17 June 1987, Lecture Notes in Computer Science, 276, Springer-Verlag, New York, NY, 1987.
[Lundeberg] Lundeberg, Mats, "The ISAC Approach to Specification of Information Systems and its Application to the Organization of an IFIP Working Conference," in [Olle et al.], pp. 173-234.
[Lyytinen and Tahvanainen] Lyytinen, Kalle and Veli-Pekka Tahvanainen (editors), Next Generation of CASE Tools. Proceedings of the Second Workshop, Trondheim, Norway, May 11-12, 1991, IOS Press, Amsterdam, Netherlands, 1992.
[Mathiessen et al.] Mathiassen, Lars, Andreas Munk-Madsen, Peter Axel Nielsen, and Jan Stage, Objektorienteret Analyse. (Object-Oriented Analysis -- in Danish), Marko, Aalborg, Denmark, 1993.
[McGee] McGee, W. C., "On User Criteria for Data Model Evaluation," ACM Transactions on Database Systems, Vol. 1, No. 4, December 1976, pp. 370-387.
[Nijssen and Halpin] Nijssen, G. M. and T. A. Halpin, Conceptual Schema and Relational Database Design: A Fact Oriented Approach. Prentice-Hall, Englewood Cliffs, NJ, 1989.
[Olle] Olle, T. William, Information Systems Methodologies: A Framework for Understanding. 2nd edition, Addison-Wesley, Reading, MA, 1991.
[Olle et al.] Olle, T. William, Henk G. Sol, and Alex A. Verrijn Stuart (editors), Information Systems Design Methodologies: A Comparative Review. Proceedings of the IFIP WG 8.2 Working Conference on Comparative Review of Information Systems Design Methodologies, Noordwijkerhout, Netherlands, 10-14 May, 1982, North-Holland, New York, NY, 1982.
[Ross] Ross, Douglas T., "Structured Analysis (SA): A Language for Communicating Ideas," IEEE Transactions on Software Engineering, Vol. SE-3, No. 1, January 1977, pp. 16-34.
[Rumbaugh] Rumbaugh, James E., "Relations as Semantic Constructs in an Object-Oriented Language," OOPSLA '87 Proceedings. ACM Press, 1987, pp. 466-481.
[Rumbaugh et al.] Rumbaugh, James E., Michael Blaha, William Premerlani, Frederick Eddy, and William Lorensen, Object-Oriented Modeling and Design. Prentice-Hall, Englewood Cliffs, NJ, 1991.
[Shlaer and Mellor] Shlaer, Sally and Stephen J. Mellor, Object-Oriented Systems Analysis: Modeling the World in Data. Yourdon Press, Englewood Cliffs, NJ, 1988.
[Shneiderman] Shneiderman, Ben, Software Psychology. Little, Brown and Company, Boston, MA, 1980.
[Smith and Smith, 1977a] Smith, John Miles and Diane C. P. Smith, "Database Abstractions: Aggregation," Communications of the ACM. Vol 20, No. 6, June 1977, pp. 405-413.
[Smith and Smith, 1977b] Smith, John Miles and Diane C. P. Smith, "Database Abstractions: Aggregation and Generalization," ACM Transactions on Database Systems, Vol. 2, No. 2, June 1977, pp. 105-133.
[Smolander] Smolander, Kari, "OPRR -- A Model for Modelling Systems Development Methods," in [Lyytinen and Tahvanainen], pp. 224-239.
[Teorey et al.] Teorey, Toby J., Dongqing Yang, and James P. Fry, "A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model," Computing Surveys, Vol. 18, No. 2, June 1986, pp. 197-222.
[Venable 1993a] Venable, John R., "Implementing CoCoA Models in the Relational Model," Working paper.
[Venable 1993b] Venable, John R., "Implementing CoCoA Models in the Object-Oriented Model," Working paper.
[Venable 1993c] Venable, John R., "Mapping Between the Object-Oriented and Relational Models," Working paper.
[Venable and Truex] Venable, John R. and Duane P. Truex III, "An Approach for Tool Integration in a CASE Environment," in [CASE Studies 1988], paper number C8812.
[Verhijen and van Bekkum] Verheijen, G.M.A. and J. van Bekkum, "NIAM: An Information Analysis Method," in [Olle et al.], pp. 537-589.
[Wijers et al.] Wijers, G. M., A.H.M. ter Hofstede, and N.E. van Oosterom, "Representation of Information Modelling Knowledge," in [Lyytinen and Tahvanainen], pp. 167-223.
[Yourdon] Yourdon, Edward, Modern Structured Analysis. Yourdon Press, Englewood Cliffs, NJ, 1989.