Views for Interoperability in a Heterogeneous Object-Oriented Multidatabase System

Rehab M. Duwairi

Department of Computer Science, University of Wales College of Cardiff

April, 1997

A dissertation submitted in partial fulfilment of the requirement for the degree of Doctor of Philosophy.

DECLARATION

This work has not previously been accepted in substance for any degree and is not being concurrently submitted in candidature for any degree.

Signed .................... (candidate) Date ....................

STATEMENT 1

This thesis is the result of my own investigations, except where otherwise stated. Other sources are acknowledged by giving explicit references. A bibliography is appended.

Signed .................... (candidate) Date ....................

Signed .................... (supervisor) Date ....................

Signed .................... (supervisor) Date ....................

STATEMENT 2

I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter-library loan, and for the title and summary to be made available to outside organisations.

Signed .................... (candidate) Date ....................


Abstract

The research reported in this thesis is concerned with supporting interoperability among sets of pre-existing and heterogeneous object-oriented databases without forcing them to conform to a common data model, query language and DBMS, i.e. these databases preserve their local autonomy. Special attention has been paid to logically heterogeneous object-oriented databases - where heterogeneity arises through independent design. We have focussed on supporting multiple integration views over the participating databases - as different users have different reasons for integration and the same user may need to integrate the same set of local integration units in various ways to suit different roles or tasks. Thus this research avoids the rigidity of a one-to-one correspondence between merging rules and local integration units, and allows local conflicts to be reconciled in various ways according to user requirements and preferences.

To this end, we have designed a schema integration language called MVDL (Multiple View Definition Language). It consists of a small set of operators that integrate local classes which typically originate from several autonomous databases. This means that they are very likely to be logically heterogeneous. Therefore this heterogeneity should be detected and consequently reconciled during the integration process. MVDL supports multiple semantic interpretations of local data - where it allows the same local classes to be integrated in different ways.

We also have designed and implemented a prototype that supports the generation of multiple integration views as a semantic layer over participating databases. This prototype is called MVDS (Multiple View Definition System). It has been implemented using meta-programming technology because of its proven power and suitability to this type of research.

Originally, integration views are generated, using MVDS, from scratch through the application of MVDL operators to local integration units. Alternatively, an integration view can be generated by evolution - where a set of predefined global modifications is applied to an existing integration view to derive a new one. This latter method of view generation is facilitated by the knowledge reuse aspect of our research - here meta-knowledge that accrues from the integration process is stored in an MVDS knowledge base and subsequently utilised by the integrator. Rules that specify how to exploit this knowledge, in the context of generating multiple integration views, have been defined and associated with the MVDS knowledge base. Knowledge reuse substantially reduces the complexity of the schema integration process.

To summarise, the major achievements from this research are a schema integration language (MVDL), a prototype system (i.e. MVDS) that supports multidatabase interoperability and a view modification module that supports changes in user requirements at the global level.

Acknowledgements

I would like to start by praising God Almighty for providing me with faith, patience and commitment to complete this research. Many people have made this thesis possible. I would like to especially thank the following:

• My supervisors, Professor W. A. Gray and Dr. N. J. Fiddian for their valuable advice throughout this research. I appreciate their unlimited support and patience with their students. I am grateful for their meticulous reading of and constructive comments on this thesis and our papers. I am honoured to be one of their students.

• My family who always believed in me. I would like to thank them very much for their help and support despite all the obstacles. I cannot express my gratitude to them.

• Mr. Ali Imran Iqbal for his continuous and priceless encouragement, support and advice.

• Many anonymous referees - who have scrutinised our papers and given valuable feedback which made our system what it is today.

• Mr. Wernher Behrendt for many valuable discussions on interoperability issues.

• Mr. Ajith Madurapperuma for his valuable advice on software-related problems.

• Margaret Evans (the Departmental secretary) for helping me with travel-related and administrative issues.

• The OKS group at Cardiff for creating a pleasant environment to work with.

• Last but not least, Jordan University of Science and Technology, the British Council and ORS for financially supporting this research.

Contents

Abstract
Acknowledgements
List of Figures
List of Tables
Acronyms

1 Introduction
  1.1 Introduction
  1.2 Historical Background
    1.2.1 Users' Data Processing Requirements
    1.2.2 Multidatabase Systems: a Plausible Solution
    1.2.3 Schema Integration in Multidatabase Systems
    1.2.4 Multidatabase Interoperability and the OKS Group at Cardiff
  1.3 Research Motivations
  1.4 Overall Achievements of the Research
  1.5 Organisation of this Thesis

2 Multidatabase Systems: Background
  2.1 Introduction
  2.2 Data Sharing Alternatives
  2.3 What is a Multidatabase System?
  2.4 Heterogeneity in Multidatabase Systems
    2.4.1 Heterogeneity Aims for MVDS
  2.5 Local Autonomy in Multidatabase Systems
  2.6 Multidatabase System Architectures
    2.6.1 Loosely-Coupled Federated Multidatabase Systems
    2.6.2 Tightly-Coupled Federated Multidatabase Systems
  2.7 Multidatabase System Architectures Comparison
  2.8 Multidatabase System Reference Architecture
  2.9 Multidatabase System Examples
    2.9.1 Pegasus
    2.9.2 Myriad
    2.9.3 Multibase

3 Object-Oriented Modelling and Schema Integration
  3.1 Introduction
  3.2 The Object-Oriented Data Model
    3.2.1 Basic Object-Oriented Concepts
    3.2.2 Why the Object-Oriented Model?
    3.2.3 Structural versus Behavioural Object-Orientation
  3.3 Object-Orientation and Multidatabase Systems
    3.3.1 Object-Based Multidatabase Systems
    3.3.2 A Comparison of Object-Oriented and Multidatabase Canonical Data Models
  3.4 Schemata Integration
    3.4.1 Sources of Heterogeneity
    3.4.2 Classification of Semantic Heterogeneity in the OO Model
    3.4.3 What is Schema Integration?
      3.4.3.1 Schema Integration Strategies
      3.4.3.2 Schema Integration Phases
    3.4.4 The Impact of Preserving Local Autonomy on Schema Integration

4 Multidatabase Interoperability Via Multiple Global Views
  4.1 Introduction
  4.2 Schema Integration Approaches
  4.3 A Sample of Schema Integration Projects
    4.3.1 Integration Operator Based Approaches
    4.3.2 Knowledge Based Approaches
    4.3.3 View Definition Facility Based Approaches
  4.4 Logical Foundations for MVDL and MVDS
  4.5 MVDL and MVDS Design Principles

5 Integration Operators for Generating Global Views
  5.1 Introduction
  5.2 Defining Global Views Using MVDL
  5.3 MVDL Operators
    5.3.1 Importation Operators
    5.3.2 Generalisation Operators
    5.3.3 Specialisation Operators
    5.3.4 Merger Operators
    5.3.5 Aggregation Operators
    5.3.6 Other Operators
  5.4 Global View Materialisation Rules
  5.5 Summary

6 MVDS Design Principles
  6.1 Introduction
  6.2 The Need for a Canonical Data Model
    6.2.1 The ODMG-93 Standard
      6.2.1.1 ODMG-93 Data Model
      6.2.1.2 The Object Definition Language
      6.2.1.3 Object Query Language
      6.2.1.4 Language Bindings
    6.2.2 MVDS Intermediate Model
  6.3 Schema Integration Phases in MVDS
  6.4 Inter-Schema Correspondence Establishment
    6.4.1 Analysing the Local Classes
    6.4.2 Semantic Relationship Types
  6.5 Inter-Object Equivalence
    6.5.1 Object Identification Techniques in a Single Database
    6.5.2 Object Identification Techniques in Multidatabase Systems
    6.5.3 Inter-Object Equivalence Detection in MVDS
  6.6 Summary

7 MVDS Architecture and Operation
  7.1 Introduction
  7.2 MVDS External Architecture
    7.2.1 Source Schema Specification Interface
    7.2.2 Target Schema Specification Interface
    7.2.3 Global View Definition Interface
      7.2.3.1 Global View Definition Algorithm
  7.3 View Consistency
    7.3.1 View Non-redundancy
    7.3.2 View Closure
    7.3.3 View is-a Relationships
  7.4 A Sample Interaction Session

8 Schema Integration Meta-Knowledge Classification and Reuse in MVDS
  8.1 Introduction
  8.2 Inter-schema Correspondence Reuse
  8.3 Virtual Class Reuse
    8.3.1 Allowed Modifications
    8.3.2 Global View Generation by Evolution
    8.3.3 An Example of Global View Evolution
  8.4 Meta-Knowledge Management in MVDS

9 Evaluation of the Research
  9.1 Introduction
  9.2 MVDS in the Multidatabase Context
  9.3 MVDS in the ITSE Environment
  9.4 Architectural Feasibility
  9.5 MVDL and Schema Integration
  9.6 Implementation Language
  9.7 Other Applications of MVDS and MVDL
  9.8 MVDS Limitations
    9.8.1 Establishing Inter-Schema Correspondences - Revisited
    9.8.2 Binary Integration vs. N-ary Integration
  9.9 Original Goals - Revisited

10 Conclusions and Future Work
  10.1 Conclusions
  10.2 Future Work

A A Summary of MVDL Operators
B ODL Grammar
C View Creation Using MVDL
D View Creation by Evolution
E A Comparison of UniSQL/X and ODMG Object Models
F UniSQL/X Grammar
G Prolog Predicates for Expressing MVDS Views in ODMG ODL
H Prolog Predicates for Expressing MVDS Views in UNISQL/X
I Integrating n Classes Using MVDL Operators

Bibliography

List of Figures

2.1 A Taxonomy of Multidatabase System Architectures [SHE90]
2.2 Schema-Importation-Based Multidatabase System
2.3 Language-Based Multidatabase System
2.4 A Single-Federation Multidatabase System
2.5 Multiple Federations Multidatabase System
2.6 MVDS's Reference Architecture for Multidatabase Systems
3.1 A Multidatabase System based on Distributed Object Architecture
3.2 Phases of the Database Design Process (Based on [ELM94a]) and possible Sources of Heterogeneity
3.3 Semantic Conflicts Classification
3.4 An Example of Property Conflicts between Semantically Equivalent Classes
3.5 An Example of a Reconciliation Table between Student's Marks and Grades
3.6 An Example of Inconsistencies in the Specialisation Criteria
3.7 An Example of Inconsistencies in the Specialisation Degree and Characteristics
3.8 Stock Market in NY, Barcelona and Melbourne (based on [KRI91, SAL93])
3.9 Schema Integration Strategies (based on [OZS91])
5.1 Virtual Class Interfaces
5.2 An Example of Local Class Integration Using the Union Operator
5.3 An Example of Local Class Integration Using the Union* Operator
5.4 An Example of Local Class Integration Using the Intersect Operator
5.5 An Example of Local Class Integration Using the Intersect1 Operator
5.6 Using the MVDL Combine Operator to Merge Two Equivalent Classes
5.7 Integrating Local Class Hierarchies Using the Combine* Operator
5.8 An Example of Local Class Integration Using the Aggregate Operator
5.9 An Example of Local Class Integration Using the Difference Operator
5.10 An Example of Local Class Integration Using the Connect Operator
6.1 Schema Integration Phases in MVDS
6.2 Establishing Inter-Schema Correspondences in MVDS
7.1 The External Architecture of MVDS
7.2 Internal Architecture of MVDS
7.3 View Closure Algorithm
7.4 An Algorithm for Is-a Relationships Inference
7.5 MVDS Main Window
7.6 Two Local Schemas to be Integrated Using MVDS
7.7 The Combine Operator Window
7.8 Inter-Class Correspondence Window
7.9 S1-ENGINEER and S2-ENGINEER Inter-Class Correspondences
7.10 MVDS Output for Global Engineer View (V1) in ODL
7.11 Materialisation Rules for the Class V1-ENGINEER
7.12 Materialisation Rules for the Class V1-ADDRESS
7.13 Materialisation Rules for the Class V1-PROJECT
8.1 An Example of Semantic Conflicts between Equivalent Classes
8.2 Meta Knowledge Accruing from Comparing Classes Schema1-EMPLOYEE and Schema2-EMPLOYEE
8.3 Heuristics for Semantic Relationship Types Reuse
8.4 An Example of Meta-Knowledge Reuse at the Semantic Relationship Level
8.5 Partial Deduction in MVDS
8.6 Meta-Knowledge Reuse along the Generalisation Hierarchy
8.7 The Set of Allowed Global Modifications
8.8 Global View Modification Module
8.9 An Example of a Global View Created From Scratch
8.10 The Internal Representation of the Virtual Class PERSON
8.11 Person Names and Spouses View
8.12 Person Names and Salaries View
8.13 An Example of the Browse Class Option
8.14 The Class S1-Engineer in its DDL format (ODL in this case)

List of Tables

2.1 Data Sharing Alternatives
2.2 Multidatabase Systems Terminology Comparison
2.3 Multidatabase System Architectures Comparison of Selected Features
5.1 MVDL Operator Argument Types
6.1 Heuristics for Establishing Real World Semantic Relationships

Acronyms

AI        Artificial Intelligence.
CAD/CAM   Computer Aided Design and Manufacturing.
CDM       Canonical Data Model.
CDMs      Canonical Data Models.
CORBA     Common Object Request Broker Architecture.
DB        Database.
DAG       Directed Acyclic Graph.
DBA       Database Administrator.
DBAs      Database Administrators.
DBMS      Database Management System.
DBMSs     Database Management Systems.
DCG       Definite Clause Grammar.
DD        Data Dictionary.
DDBS      Distributed Database System.
DDBSs     Distributed Database Systems.
DDL       Data Definition Language.
DDLs      Data Definition Languages.
DML       Data Manipulation Language.
DOA       Distributed Object Architecture.
E-R       Entity - Relationship.
FQA       Federated Query Agent.
FQM       Federated Query Manager.
FTA       Federated Transaction Agent.
FTM       Federated Transaction Manager.
GAMM      Global Attribute Manipulation Module.
GCL       Global Context Language.
GUIs      Graphical User Interfaces.
HDDB      Heterogeneous and Distributed Database.
HOSQL     Heterogeneous Object SQL.
IDL       Interface Definition Language.
ISs       Interoperable Systems.
KB        Knowledge Base.
KODIM     Knowledge-Oriented Distributed Information Management.
MDB       Multidatabase.
MDBs      Multidatabases.
MDBS      Multidatabase System.
MDBSs     Multidatabase Systems.
MVDL      Multiple View Definition Language.
MVDS      Multiple View Definition System.
ODBMS     Object Database Management System.
ODL       Object Definition Language.
ODMG      Object Database Management Group.
ODP       Open Distributed Process.
OID       Object Identity or Identifier.
OIDs      Object Identifiers.
OIS       Office Information Systems.
OKS       Object and Knowledge-based Systems.
OMG       Object Management Group.
OMT       Object Modelling Technique.
OO        Object Oriented.
OODB      Object Oriented Database.
OODBMS    Object Oriented Database Management System.
OODBMSs   Object Oriented Database Management Systems.
OODBTG    Object Oriented Database Task Group.
OOMDB     Object Oriented Multidatabase.
OQL       Object Query Language.
ORB       Object-Request Broker.
SHDM      Semantic Heterogeneity Detection Module.
SIM       Schema Identification Module.
QMTS      Query Meta-Translation System.
SMIS      Schema Meta-Integration System.
SMTS      Schema Meta-Translation System.
SQL       Structured Query Language.
TSSM      Target Schema Specification Module.
VCC       View Consistency Checker.
VDM       View Definition Module.
VMM       View Modification Module.

Chapter 1

Introduction

1.1 Introduction

Users and application programs are increasingly requiring access to and manipulation of data distributed over the many sites of a computer network. Different users have different reasons for integrating data, and even the same user might need to integrate the same local information in a variety of ways to suit different roles or tasks in an organisation. Therefore, multiple integration views are needed to support this diversity. The research reported in this thesis was prompted by the need to design a tool that supports the flexible integration of pre-existing object-oriented databases so that multiple integration views can be created. The creation of these views does not require the participating local databases to conform to a single data model, DBMS or query language, so these databases maintain their heterogeneity and diversity.

Heterogeneity and diversity, when managed properly, are valuable assets. Their presence is important as it guarantees a more realistic and effective solution to the creation of an interoperable environment. This is justifiable because pre-existing database systems are valuable resources and time and money have already been invested in their development. Thus heterogeneity reflects the variety in people's thinking and freedom of choice between different systems and design approaches. Its preservation protects the prior investment in user education and experience at the local level. Also it avoids the imposition of a single restrictive choice that may result in inferior and compromise solutions at the local level.

We conjecture that interoperability between sets of heterogeneous databases is best achieved by building several tailored global views as a semantic layer above the participating components. Thus it avoids the rigidity of a one-to-one correspondence between local classes and integration rules, and allows local conflicts to be reconciled in various ways to fit different requirements. This has several advantages: firstly, the end-user is presented with a smaller, more focussed information description, which lightens the burden of searching large schemas to locate the relevant information. Secondly, it reduces the complexity of the schema integration process by reducing the size of the schemas to be integrated. Thirdly, it allows the same local information to be integrated differently by applying different sets of integration rules and therefore it supports different viewpoints over the same information (i.e. multiple semantics are supported at the global level). Fourthly, it allows a large number of databases to join the multidatabase; this is a consequence of point one above. Fifthly, the flexibility gained by this approach makes it easy for the multidatabase to be extended and evolved by the addition and removal of local databases.

The input to our system is a set of object-oriented schemas in their native textual DDL definitions, while the output is one or more global views expressed in a user-specified DDL together with the set of rules necessary to populate them. Each global view contains a homogeneous and consistent description of information relevant to a particular user group. This information, before integration, might have been distributed, heterogeneous and/or inconsistent across the local databases involved.

Schema integration is accomplished in a flexible way - in that the integrator is able to select the information to be integrated and choose the subset of integration operators that best suits his/her needs for the integration. This was achieved by designing an integration language that is based on a rich and flexible set of schema integration operators. This part of our system is called the view definition subsystem, as a global view is generated from scratch by using these operators. Alternatively, a global view may be created by modifying an existing view. This is achieved by using the view-modification subsystem, which generates a global view by applying selected predefined global modifications to an existing global view. Further, knowledge accruing from the schema integration process is classified and stored in a knowledge base so that it may be reused in subsequent interactions. This last part of our system works in parallel with the other two subsystems, and can substantially reduce the complexity of the schema integration process.

The rest of this chapter briefly outlines the overall background, motivations and achievements of our research.

1.2 Historical Background

1.2.1 Users' Data Processing Requirements

One of the reasons for the original development of database systems in the 1970s was to overcome the difficulties of supporting shared access to several files created by different application programs [HSI89, LIT88, LIT90]. These problems ranged from data inconsistency to data duplication. To alleviate these difficulties, the autonomous files were replaced by a centrally defined collection of data (called a database) managed by a database management system (DBMS). A DBMS is a computerised record keeping system whose overall purpose is to maintain data and make it available on demand [DAT90, ROB93]. This had several advantages over an autonomous files approach, namely: it could be used to reduce data redundancy, avoid data inconsistency, allow sharing of data, increase security and maintain integrity of the data. It was successful and sufficient to meet user requirements for several years.

However, today's user data processing requirements and capabilities have changed dramatically due to the recent progress in network and database technology, which has produced distributed rather than centralised information systems [BRE90, HUR96]. Thus new applications often involve accessing and maintaining data from several pre-existing databases and software packages [BRE90, HUR96, MAD95, SEL93, VAS95], which are typically located on autonomous software and hardware platforms distributed over the many sites of a large computer network. This usually means that there are heterogeneity and legacy problems to be solved [DOG95, GAR96, KIM91a, KIM93, OZS91].

One approach to the above situation is to physically integrate all data needed by an application into one database, thus providing location, replication and heterogeneity transparency through the integration process. In many cases, this is regarded as an expensive and thus infeasible solution [BRE90, PAB96], as the resulting integrated database must be able to serve all anticipated requirements. Furthermore, most local database owners are not willing to sacrifice their local autonomy to a central authority, which is a consequence of this approach.

The alternative solution is logical integration of all the data needed by an application or group of users. Here, the users are given the illusion of a globally integrated database and shielded from most types of heterogeneity without the data migrating to a new (composite) database and without requiring the users to know either the location or the characteristics of the different local databases and their corresponding DBMSs. When following this latter approach for interoperability, the constituent databases form a multidatabase (MDB).

1.2.2 Multidatabase Systems: a Plausible Solution

A MDB system is a system which allows its users to access and/or update data from multiple, pre-existing, autonomous and probably heterogeneous databases [HUR96, FAH94, LIT88]. Users of the MDB are given the illusion of a logically integrated database where distribution, heterogeneity and the effects of local autonomy are transparent [BUK96a, OZS91]. Ideally, the local database managers retain full control over their local data and operations; this protects an organisation's existing investment in hardware, software and user training. MDBs permit data sharing among users and application programs which leads to easier application program development. Further, their construction does not disturb the operation of existing application programs (some of which might be mission-critical applications).

MDB systems are classified into loosely-coupled and tightly-coupled systems, based on who creates and administers the federated schemas [SHE90]. In a loosely-coupled MDB system, it is the user's responsibility to create and administer the federated schemas. This type of MDB system is further subdivided into language-based and schema importation-based systems.

A language-based MDB system [CHO94, KRI91, LIT86, LIT90, LIT93, WAT93] utilises a powerful query language to access data stored in autonomous and heterogeneous data repositories. This query language is capable of performing inter-database joins, updates spanning several databases and data interchange between multiple databases. No inter-schema conflicts are resolved in advance and therefore users must be aware of the existence of multiple heterogeneous and autonomous databases, as it is their responsibility to locate relevant data and apply the right language construct to overcome each type of discrepancy. Normally, the integration solutions created by these queries are not shared with other users, i.e. it is very unlikely that one user will benefit from a MDB query written by another user, as it is difficult to share and modify these queries successfully.

A schema importation-based MDB system uses negotiation and schema importation as a means of supporting data sharing [HEI85, LI91]. This is the classical federated database system originally defined by Heimbigner and McLeod [HEI85]. Here, every component database maintains three types of schema, namely: private, export and import schemas. The private schema represents the private data of a node (database) which is not available for sharing. The export schema, by contrast, contains the subset of a node's data which it is willing to share with other nodes. Lastly, the import schema refers to the data at other nodes which users of a node are interested in. One disadvantage of this approach is the overheads entailed in defining and maintaining such schemas at every local node [COL91]. Moreover, integration solutions are developed on a single-site basis and are sharable only by users at that site. This means that these solutions cannot be shared easily with other sites, nor can they be customised easily for users at the same site who have different needs.
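As an illustration only, the private, export and import schemas of a node might be pictured as follows. This Python sketch is not taken from [HEI85] and is not part of MVDS; the class `Node`, its fields, the method `import_from` and the example class names are hypothetical, and a schema is reduced to a set of class names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A component database in a schema importation-based federation (hypothetical model)."""
    name: str
    private: set = field(default_factory=set)    # classes never offered for sharing
    export: set = field(default_factory=set)     # classes the node is willing to share
    imports: dict = field(default_factory=dict)  # remote node name -> classes of interest

    def import_from(self, other, wanted):
        """Record interest in classes of `other`; only exported classes are granted."""
        granted = set(wanted) & other.export
        if granted:
            self.imports.setdefault(other.name, set()).update(granted)
        return granted


node_a = Node("node_a", private={"Payroll"}, export={"Student", "Course"})
node_b = Node("node_b", export={"Staff"})

# node_b asks for one exported and one private class of node_a.
granted = node_b.import_from(node_a, {"Student", "Payroll"})
print(sorted(granted))
```

In this sketch the import schema lives at the importing node, which mirrors the single-site nature of the integration solutions noted above: nothing recorded at `node_b` is visible to any other node.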
A tightly-coupled MDB system uses schema integration [BAT86, OZS91] as a means for data sharing. Here, all the participating local schemas are analysed in advance to detect inter-schema correspondences and conflicts. After that, a homogeneous global schema is created for end-users by integrating all the local concepts into composite global ones. This integration process includes reconciling the conflicts discovered during the analysis phase [COL91, DAY84, DOG95, HAY90, MOT83, MOT87, QUT92, QUT93, NAV96a]. Alternatively in this approach, several tailored global schemas may be defined instead of one global schema [BER91, CLE93, FAH94, LIM94a, ZHA91]. This has the advantages of supporting customisability, potentially reducing the size of the global schema as seen by users, minimising the integrator's efforts, permitting a large number of databases to join and leave the MDB, and allowing different sets of integration rules to be applied to the same set of local objects. The integration solutions of this third category (i.e. tight-coupling) in either variant are sharable by all the MDB users. This thesis is concerned only with this last category of MDB systems, and in particular with the case where several global schemas are created and maintained.

1.2.3 Schema Integration in Multidatabase Systems

In MDBs where a global schema (or several global schemas) is (are) maintained, schema integration has to be performed. Schema integration is the process where information from several local schemas is merged (integrated) into a single global schema [BAT86, OZS91]. It provides global users with distribution, local autonomy and heterogeneity transparency in their interoperable access to the component databases. During schema integration, it is essential to identify local objects that are related to each other in some respect and to define their corresponding global objects. This process involves resolving conflicting integration views when there are several ways in which local objects can be combined to create a global object.

Because the component local databases were designed independently, they may be based on different data models, different query languages, different DBMSs, different versions of the same DBMS, or different operating systems and network protocols [BAT86, DOG95, KIM91a, KIM93]. Some might have been designed before the database era or based on pre-relational data models like the hierarchical and network models or on early relational DBMS versions with limited capabilities (usually these are referred to as legacy systems [BRO93]). Even if the local databases are homogeneous with respect to all the above factors, they may differ on how the same real world objects were originally perceived and modelled in the corresponding databases [GAR96, SHE93]. This last form of heterogeneity is called logical or semantic heterogeneity and is the main concern of our research. All such discrepancies have to be detected and reconciled during schema integration. This is what makes it difficult to automate the process completely, and means that it will nearly always require human intervention at some stage to resolve ambiguity and related issues.

Generally, methods of schema integration may be classified, based on their integration approach, into three groups. Firstly, knowledge-based approaches [COL91, GOT92, HAY90, QUT92, QUT93, SU96, WOE96] utilise inter-schema correspondences (assertions) and integration rules which are specified declaratively. Based on the type of an inter-schema assertion, an integration rule will be fired in these methods. For example, an equivalence assertion between a set of local objects will cause them to be merged into one global object. In the second group, a view definition facility is provided for integrating local objects [BER91, FAH94, KAU90, LI91, SCH94]. The main difference between this and the MDB language approach to sharing data (see Section 1.2.2) is that integration solutions are sharable by different users and local conflicts are resolved in advance in this approach. Thirdly, the integration operators-based group covers the approaches [MOT83, MOT87, NAV96a] where a set of integration operators is provided for integrating local concepts.
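The knowledge-based approach described above, in which the type of an inter-schema assertion determines which integration rule fires, can be sketched as follows. This is a minimal illustration in Python, not the mechanism of any cited system: the assertion kinds, the rule table and the merge behaviour (a union of attribute names) are assumptions chosen for brevity.

```python
from typing import Callable, Dict, List, Set

# A local class is modelled simply as a dict with 'db', 'name' and 'attrs' keys.
LocalClass = Dict[str, object]


def merge_rule(classes: List[LocalClass]) -> LocalClass:
    """Rule for 'equivalence': merge the local classes into one global class."""
    attrs: Set[str] = set()
    for c in classes:
        attrs |= set(c["attrs"])
    return {"name": classes[0]["name"], "attrs": attrs,
            "sources": [c["db"] for c in classes]}


# Declarative rule table: assertion type -> integration rule to fire.
RULES: Dict[str, Callable[[List[LocalClass]], LocalClass]] = {
    "equivalence": merge_rule,
}


def integrate(assertion_type: str, classes: List[LocalClass]) -> LocalClass:
    """Fire the integration rule selected by the type of the assertion."""
    return RULES[assertion_type](classes)


emp_a = {"db": "db1", "name": "Employee", "attrs": {"name", "salary"}}
emp_b = {"db": "db2", "name": "Staff", "attrs": {"name", "grade"}}
global_emp = integrate("equivalence", [emp_a, emp_b])
```

Keeping the rules in a table keyed by assertion type mirrors the declarative flavour of these methods: supporting another kind of assertion would mean adding an entry, not changing `integrate`.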

1.2.4 Multidatabase Interoperability and the OKS Group at Cardiff

Research in the Object and Knowledge-based Systems (OKS) group at Cardiff, since the early 1980s, has targeted a number of issues in the Heterogeneous and Distributed Database (HDDB) area. Howells [HOW86, HOW87, HOW88a, HOW88b] developed a Query Meta-Translation System (QMTS) that automatically produces translators between pairs of database query languages. These translators are then used to perform query inter-translations. This work contributed to the reconciliation of query language heterogeneity in a HDDB system. Next, Ramfos [RAM89, RAM91a, RAM91b] developed a Schema Meta-Translation System (SMTS) that automatically produces translators from non-relational schemas (OO in particular) to relational schemas. These translators allowed schemas of databases to be transformed so that remote users could view them in their local DDL representation. This work contributed to data-model heterogeneity reconciliation in a HDDB context.

Qutaishat [QUT92, QUT93], by contrast, developed a Schema Meta-Integration System (SMIS) that integrates a set of logically heterogeneous object-oriented schemas by generating a single global schema. SMIS performed the integration of the local schemas in a rather inflexible way - issues of selectivity, customisability, global schema evolution and integration meta-knowledge reuse were not investigated in its development (see the next section for clarification of these issues). This tool was extended by Idris [IDR94] to support global security for a heterogeneous MDB system through integration of local security, and by Alzahrani [ALZ96] to provide support for MDB integrity constraint integration and related query optimisation based on semantic information.

A common factor in the above projects is that they were all based on meta-programming and utilised a logic programming language (namely Prolog) to implement their prototypes. Each prototype, developed as a meta-program, relies on an intermediate model with associated translators to translate from and to this model. This makes the prototype both DBMS and DDL independent, in that it is generally easy to adapt that prototype to a new DBMS or DDL, since only the associated translators have to be changed.

Logic programming has also been shown to be a convenient vehicle to investigate many other database issues [GAL84, CER90], such as query processing and optimisation [ALZ96, BEL89, HAM80], integrity constraint modelling and maintenance [FON93, IBR96] and schema integration [COL91, QUT92, QUT93, WHA91, WOE96]. Schema integration tools based on logic programming exploit its declarative nature to capture inter-schema correspondences, user-advisory assertions and integration rules. Further, the inference capabilities of this paradigm greatly simplify the schema integration process by providing means to store and consequently reuse the information resulting from that process. Finally, and more importantly, it allows the formal definition of the schema integration process.

1.3 Research Motivations

The need to access a collection of independently designed databases is inevitable in most organisations nowadays due to the increasing use of diverse database management systems and the rapid rate of change in technology [BRE90, OZS91, PAB96]. In such an environment, the user wants to access information located in other databases without learning a new query language, data model or DBMS. Ideally, he/she would like to use his/her own local query language, data model and DBMS specifications to access other databases distributed over a network. Further, he/she would like to see a consistent view of the corresponding distributed and possibly overlapping information. This, coupled with the recent emergence of data intensive applications such as artificial intelligence (AI), computer-aided design and manufacturing (CAD/CAM) and office information systems (OIS), has encouraged the use of the OO data model as a preferred alternative to traditional data models such as the hierarchical and relational [ATK92, DEU92, FIS87, ROB93, RUM91, STO91, WIL90], which are unable to cope with such applications. Thus, OO technology has the potential to meet our requirements due to its many desirable features such as classification, class hierarchies and inheritance, encapsulation, overriding and late binding and its use in OODBMSs. Moreover, the combination of OO technology with MDB technology results in a much more powerful system than those relying on a single technology. Hence for this thesis we investigated interoperability between OOMDB systems.

It is our contention that building one global schema for a MDB system above the component local databases has a number of serious disadvantages.
For example, it can limit the number of participating local databases, especially if these databases are large in schema terms; the size of the generated global schema can grow very large, which increases the burden on a user trying to locate information of interest; evolution of the global schema is not supported; and time is spent on integrating and merging local concepts that may never be used. On the other hand, the MDB language alternative approach, which leaves the schema integration burden to the end-user, is also impractical, because it requires the end-user to have a high level of expertise and to have a working knowledge of the MDB contents. Thus a better and more flexible approach to achieving interoperability between the local databases in a MDB system would appear to result from the approach of creating several integrated schemas or views of these local databases. It is this latter approach which is the main concern of this thesis, which investigates a number of aims that, if achieved, will contribute to successful progress in this context. These aims are that:

• The contents of a given global view should be decided on the basis of the nature of the application programs that are going to use it and the needs of its users. This means that while creating a global view, only the relevant information should be considered for integration (we call this objective selectivity).

• Different global users may want to perceive the same local information in different ways, so our schema integration tool should be flexible enough to allow different global schemas to co-exist over the same set of local objects, each tailored towards a particular user group (this is called customisability or multiple semantics). Data described in the different views need not be consistent. Supporting multiple semantics at the global level is very rarely addressed in the schema integration context. In most cases (e.g. [GOT92, HAY90, QUT92, QUT93, SU96]) user requirements are likely to be compromised as a result of the imposition of a fixed set of integration rules which create a single global schema for all users. Different users have different reasons for integration; even the same user may need different integration views to suit different roles or tasks within their organisation. So, it is essential to support multiple integration views that are tailored to meet various user requirements.

• Because schema integration is a complex process that will nearly always depend on extra information supplied by interaction with a domain-expert, it should be possible to store this additional valuable knowledge so that it can be reused in other integrations. Also, it should be possible to infer (learn) new knowledge from this stored information (this objective is called reuse). Despite the many research projects that have investigated the schema integration process, very little work has been done on meta-knowledge reuse and associated algorithms in the schema integration context [ROS94]. We believe that this is an important aspect in schema integration as user requirements are likely to change over time. Consequently, the global schema becomes obsolete, and its evolution is easier if meta-knowledge reuse can be exploited to create an updated global schema.

• It should be possible to create a tool with a high-level user interface, where the integrator is presented with only the required actions at every stage of the integration process. This has the advantage of simplifying schema integration from the integrator's point of view and thereby minimising process errors.

• The tool should be DBMS and DDL independent, i.e. it should be relatively easy to adapt to a new DBMS and DDL. Many of the available research projects (e.g. [BER96a, BER96b, FAH94, KLA96]) are DBMS and DDL dependent, therefore including a new DBMS in their integration processes will require substantial adaptation and re-programming work, costing time and money. Even though most of the above projects (for example [KLA96]) use a canonical data model, this model is focussed on a particular type of DBMS. This means that including a new type of DBMS may require the canonical data model to be altered - either by the addition of new data structures or the modification of existing ones. Alteration is necessary to model information represented in the new type of DBMS which cannot be supported by the current canonical data model. Also, existing application programs may then have to be modified in turn to suit the new version of the canonical data model.
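The selectivity and customisability aims in the list above can be illustrated with a toy example. The sketch below is written in Python and is not MVDL: the two view functions, the class contents and the attribute names are hypothetical. It shows only the underlying idea that the same two local classes may be integrated in different ways for different user groups, and that the resulting views need not agree with each other.

```python
# Two local classes from independently designed databases (hypothetical data).
db1_students = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
db2_employees = [{"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]


def view_all_persons(students, employees):
    """View A: a registry user wants every person known to either database."""
    merged = {p["id"]: p for p in students}
    for p in employees:
        merged.setdefault(p["id"], p)  # keep the first record seen for each id
    return sorted(merged.values(), key=lambda p: p["id"])


def view_working_students(students, employees):
    """View B: a payroll user wants only persons present in both databases."""
    employee_ids = {p["id"] for p in employees}
    return [p for p in students if p["id"] in employee_ids]


persons = view_all_persons(db1_students, db2_employees)
working = view_working_students(db1_students, db2_employees)
```

Both views are defined over the same local data, yet `persons` holds three objects and `working` holds one; neither definition has to change for the other to exist.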

1.4 Overall Achievements of the Research

In our research, a schema integration process was defined and investigated bearing in mind the goals that have been described above, namely: selectivity, customisability, evolution and reuse. The prototype used in this investigation was implemented as a meta-program using a logic-based language, Prolog [BRA90, CLO87]. The main contributions of this research into MDB interoperability are:

1. The design of a schema integration language (called MVDL, for Multiple View Definition Language) was successfully undertaken. It is based on a small number of semantically rich operators that are capable of generating homogeneous and consistent global classes which incorporate distributed, heterogeneous local ones. MVDL supports selectivity by allowing the integrator to choose the information of interest, and customisability as the same local information can be integrated differently in alternative schemas to support multiple semantics [DUW96b, DUW96d] (see Chapter 5).

2. Global view evolution is supported, since a global view can be modified to meet emerging requirements without re-doing the integration process from scratch. To the writer's best knowledge, this work is the first to address schema evolution in MDB systems [DUW96c] (see Chapter 8).

3. The knowledge accruing from the schema integration process (called integration meta-knowledge) was classified and the set of rules for reusing it to create other global views was determined [DUW96a]. This subsequently reduced the complexity and increased the automation of the schema integration process. To the writer's best knowledge, this work is the first to provide a complete classification of such knowledge and to develop a schema integration tool that exploits it. While Rosenthal and Seligman [ROS94] discuss the benefits of reuse in interoperable systems, they provide only initial thoughts about the importance of such a concept in the interoperability context. In our research, this principle was carried far beyond that stage - in that we provided a classification of the types of knowledge that typically accrue during the schema integration process, and we utilised this knowledge during the establishment of inter-schema correspondences and the creation of global views [DUW96a, DUW96c] (see Chapter 8).

4. A software tool was built which enabled multiple views to be created either using MVDL or by evolution. It is called MVDS, for Multiple View Definition System. It is based on a graphical user interface which creates a user-friendly environment for schema integration (see Chapters 6 and 7).
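As a rough illustration of what reusing integration meta-knowledge could look like, the Python sketch below stores equivalence assertions collected in earlier integrations and derives pairs that were never stated explicitly. It is an assumption-laden toy: treating equivalence as transitive, the function name and the class names are all choices made for this example, and the actual classification and reuse rules are those given in Chapter 8, not these.

```python
from itertools import combinations


def infer_equivalences(stored):
    """Group classes linked by stored equivalence assertions and return the
    pairs that follow by transitivity but were never stated explicitly."""
    groups = []  # each group is a set of mutually equivalent classes
    for a, b in stored:
        touching = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    stated = {frozenset(pair) for pair in stored}
    inferred = set()
    for g in groups:
        for pair in combinations(sorted(g), 2):
            if frozenset(pair) not in stated:
                inferred.add(pair)
    return inferred


# Assertions recorded during two earlier integrations (hypothetical class names).
stored = [("db1.Employee", "db2.Staff"), ("db2.Staff", "db3.Worker")]
new_pairs = infer_equivalences(stored)
```

Here the integrator was never asked whether `db1.Employee` and `db3.Worker` correspond; the pair is proposed from what was already stored, which is the kind of saving that reuse is meant to deliver.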

1.5 Organisation of this Thesis

This section presents an overview of the thesis' organisation. This first chapter has presented an introduction to the research undertaken, covering background, motivations and highlighting the research goals and original achievements.

CHAPTER 2: Multidatabase Systems: Background
This chapter gives necessary background information about MDB systems.

CHAPTER 3: Object-Oriented Modelling and Schema Integration
This chapter provides a brief description of the OO data model. It also discusses the impact of OO technology on interoperability in MDB systems. In addition, it provides a taxonomy of possible semantic conflicts typically found when comparing independently designed OO schemas. Finally, it describes the phases of schema integration.

CHAPTER 4: Multidatabase Interoperability via Multiple Global Views
This chapter reviews schema integration approaches and highlights their strengths and weaknesses. This review leads to the identification of logical foundations for the design of MVDL and MVDS.

CHAPTER 5: Integration Operators for Generating Global Views
This chapter evaluates the pros and cons of the multiple global views approach for MDB interoperability. It also describes formally the operators of the MVDL language.

CHAPTER 6: MVDS Design Principles
This chapter describes the logical foundations of MVDS. In particular, it critically appraises the usefulness of a canonical data model in interoperable architectures and justifies the reasons for choosing the Object Model of the ODMG-93 standard [CAT94, CAT96]. It also describes the heuristics followed in MVDS to establish inter-schema correspondences.

CHAPTER 7: MVDS Architecture and Operation
This chapter describes the architecture and operation of the MVDS system.

CHAPTER 8: Schema Integration Meta-Knowledge Classification and Reuse in MVDS
This chapter provides a classification of the knowledge that accrues from the schema integration process. It also presents rules for exploiting this knowledge within the schema integration context.

CHAPTER 9: Evaluation of the Research
This chapter focuses on the evaluation of MVDS. It critically appraises the advantages and contributions of our research to MDB interoperability. It also describes the limitations of MVDS and points out how to overcome them.

CHAPTER 10: Conclusions and Future Work
This chapter draws conclusions and identifies future work that could be carried out based on the achievements of this research.

Chapter 2

Multidatabase Systems: Background

2.1 Introduction

In Chapter 1, we suggested that a MDB system is an affordable and plausible vehicle for supporting interoperability between sets of autonomous and heterogeneous databases. In this chapter, MDB systems are compared with other data sharing alternatives, namely: distributed databases and interoperable systems (Section 2.2). After that, necessary background information about MDB systems is presented. This covers types of heterogeneity, the preservation of local autonomy and a comparison of alternative system architectures, which leads in conclusion to our reference architecture for MDB systems.

2.2 Data Sharing Alternatives

It is evident from the continuing progress in communication and database technologies that user and application programs will increasingly need to access data located in different, possibly heterogeneous databases. These databases are likely to be managed by independent organisations using different management systems. Several data sharing alternatives are available in this situation. These vary in their cost, degree of transparency-support, effects on local nodes and number of participating local nodes. Different problems require different solutions, and some require combinations of these solutions. In this section, DDBSs (Distributed Database Systems), MDBSs (Multidatabase Systems) and Interoperable Systems (ISs), as three major data sharing alternatives, are compared. These approaches all use a computer system that includes a global component to access globally shared information and have multiple local components that manage only the information located at their respective local sites [BRI92, BUK96a].

A distributed database system (DDBS) is a collection of cooperating homogeneous local DBMSs that provides a uniform global interface¹. These local DBMSs and their corresponding databases reside either on a single computer system or on multiple computer systems that may differ in their hardware, system software and communication support [BUK96a, OZS91]. In this class of DDBSs a global homogeneous schema is provided, and all the participating local sites must adhere to it.

¹ These DBMSs can be heterogeneous as well. However, they all have to conform to the global DBMS formalisms if they participate in a DDBS. If these DBMSs maintain their heterogeneity and local autonomy, they form a MDB system from the viewpoint of this thesis.

By contrast, a multidatabase system (MDBS) supports operations on multiple component databases where each component database is managed by an independent DBMS. These DBMSs are likely to be heterogeneous. The component DBs in a MDBS can be centralised or distributed, and may reside on the same computer or on multiple computers connected by a communication subsystem. MDBSs may have a full global schema, partial global schemas, or dynamically created partial global schemas using a global query language [BUK96a, FAH94].

Finally, interoperable systems (ISs) aim to support interoperability between a wide range of data repositories, not solely databases managed by DBMSs [BRI92, BOU91, BOU93a, BOU93b, BRO93, COL95, MIL95]. The number of participating local nodes in an IS is usually very large, therefore locating and identifying information of interest is a non-trivial and time consuming process. Most proposals for future IS architectures employ a global object space or name space hierarchy [BRI94, BRO93] with associated browsers to locate and identify the required information, and formats and protocols for shipping the information once it is located.

Table 2.1 compares DDBSs, MDBSs and ISs. The first row describes the data model aspect of each of these systems. The second row describes local access languages that are typically used by their respective components. The third row compares these systems on the basis of their global and local function control management policies. The fourth row states the approach that is followed in designing these types of system. The fifth row compares the expense entailed in constructing them. Lastly, the sixth row compares the abstraction level of these systems from an end-user point of view.

| | Distributed Database Systems (DDBSs) | Multidatabase Systems (MDBSs) | Interoperable Systems (ISs) |
|---|---|---|---|
| Data model of the local components | Same. | Usually different. | Not applicable. It accommodates different types of software packages, DBMSs, etc. |
| Local access (query) language | Same. | Usually different. | Not applicable. |
| Control including execution-coupling | Local databases are very tightly-coupled; the global DBMS has control over local data and processing. It accesses the local databases through internal function calls. | MDBS software interacts with the local databases via their user interfaces. The global software has less control over the local ones, and therefore MDBSs can be less efficient when compared with DDBSs in terms of global processing costs. | No concept of global processing, just data exchange and local processing. Participating nodes typically interact via a software layer(s) that manages the exchange protocol and data transformations. |
| Design | Top-down. | Bottom-up. | Bottom-up. |
| System construction expenses and effects on local nodes | Local nodes have to conform to the global DBMS requirements. This is a large expense in addition to the disruption of ongoing operations during installation and evolution. | When joining a MDB, local nodes continue to preserve their well functioning environment. This reduces the expense of constructing a MDBS and has minimal impact on local nodes. | Represents more of an added capability to the local nodes rather than a constructed architecture, therefore, it has low expense and minimal impact on local nodes. |
| Abstraction level | Users see a single centralised DBMS. | Users generally, but not necessarily, see a single DBMS. | Users are aware of multiple remote data sources and have to perform various actions to integrate data from these sources. |

Table 2.1: Data Sharing Alternatives

As this table shows, DDBSs provide a very tight execution-coupling², where a global DBMS interacts with local DBMSs of the same type and with closely related schemas via internal function calls. This makes it easier to implement global optimisation algorithms, but, unfortunately, it compromises the component databases' local autonomy. The participating local node homogeneity assumption and the sacrifice of local autonomy have limited the practical successes of this type of DDBS, which often has a further constraint that all schemas at the local level should represent the same universe of discourse.

² Coupling in MDBSs may describe how the global software interacts with the local nodes to process a query or transaction (execution coupling), or it may describe the links between global and local schemas and who is responsible for creating and maintaining these links, i.e. whether it is the DBA or the end user (administrative coupling).

ISs, on the other hand, support a very loose execution-coupling between constituent nodes and therefore offer minimal opportunity for global optimisation but provide an affordable and more practical solution for sharing information when the number of local nodes is very large. MDBSs fall between DDBSs and ISs and provide a compromise solution between taking complete control over local functions (as in DDBSs) and no global control (as in ISs): they interact with the local components at their user interface level. Further, constructing a DDBS is expensive in addition to the costly disruption of ongoing operations, while both MDBSs and ISs have low construction expenses and low impact on ongoing operations. The abstraction level of a DDBS is very high as its users see a single centralised database. Some types of MDBS support a high level of abstraction similar to that of a DDBS. By contrast, the abstraction level of an IS is low as users are aware of the existence of multiple information sources.

To summarise, MDBSs fall between DDBSs and ISs with respect to the above factors and provide a practical alternative for data and operation sharing between a set of autonomous databases. The dividing lines between these types of system are not always distinct and it is sometimes difficult to categorise actual products as one or the other. Only MDBSs are considered in this research. However, research into MDBSs and DDBSs should lead to improvements in ISs as well.

2.3 What is a Multidatabase System?

In this section, we define "MDB system" as the term is used in this thesis. A MDB system is one where it is possible to access and update data held in multiple, pre-existing, autonomous and possibly heterogeneous databases [HUR96, FAH94, LIT88]. Its users can transparently access one or more of its constituent databases with a single query. The MDB software automatically performs query decomposition, together with data model and access language transformations, to convert a global query into an appropriate set of sub-queries for the local databases. If the query produces a response, the sub-results coming from the individual databases are then transformed and composed into the format of the MDB system response language. Instead of providing transformations between every combination of local data models and access languages, the MDB system defines a global data model and access language for which there is a transformer to/from every local data model and access language. The remoteness, multiplicity and heterogeneity of local databases are transparent to the global user³. Further, the local databases are autonomous (i.e. they retain full control⁴ over their local data processing).

By preserving local autonomy and allowing pre-existing DBMSs to join and leave the global system, a MDB system protects an organisation's existing investment in hardware, software and user training. It also allows data sharing among users and applications, which leads to easier application development. It presents a relatively simple user interface by making the heterogeneity and distribution of information sources largely transparent. In addition, its construction does not disturb running applications (some of which might be mission-critical applications).

Many MDB projects have been reported in the literature, for example Pegasus [AHM91, ALB93, RAF91], Multibase [LAN82, SMI81] and Myriad⁵ [CLE93, LIM94a]. For a comprehensive list refer to [BUK96a]. Pegasus [AHM91, ALB93, PIT96, RAF91] is a heterogeneous MDB system that supports interoperability between a wide range of information sources. Its canonical data model is based on the Iris OO data model [FIS87]. Myriad [CLE93, LIM94a], by contrast, is a federated database prototype that was developed at the University of Minnesota. It employs the relational model as its canonical data model and SQL as its query language. Multibase [LAN82, SMI81] is a prototype that provides integrated access to pre-existing and heterogeneous databases. It provides users with a single integrated schema and a single query language. Its data model is based on the functional data model and its query language is based on the functional language DAPLEX.

³ This depends on the MDB system type. See Section 2.6.
⁴ In practice, local autonomy will be violated to varying degrees depending on the type of the MDB system.
⁵ For a brief description of these systems see Section 2.9.
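The processing steps described at the start of this section (decomposition of a global query, translation into local terms, and composition of the sub-results) can be sketched in miniature. The Python below is a toy under stated assumptions: the mapping table, the local class and attribute names and the in-memory stand-ins for databases are invented for illustration and do not reflect any of the systems cited.

```python
# Mapping from a global class to its local counterparts (all names hypothetical):
# database name -> (local class name, {global attribute: local attribute}).
MAPPING = {
    "Person": {
        "db1": ("student", {"name": "sname"}),
        "db2": ("employee", {"name": "ename"}),
    }
}

# In-memory stand-ins for two autonomous local databases.
LOCAL_DATA = {
    "db1": {"student": [{"sname": "Ann"}, {"sname": "Bob"}]},
    "db2": {"employee": [{"ename": "Cy"}]},
}


def decompose(global_class, attr):
    """Split a global query into one sub-query per local database."""
    return [(db, cls, names[attr])
            for db, (cls, names) in MAPPING[global_class].items()]


def run_local(db, cls, local_attr):
    """Execute a sub-query using the local database's own names."""
    return [row[local_attr] for row in LOCAL_DATA[db][cls]]


def global_query(global_class, attr):
    """Decompose, run each sub-query locally, and compose the sub-results."""
    results = []
    for db, cls, local_attr in decompose(global_class, attr):
        results.extend(run_local(db, cls, local_attr))
    return results


names = global_query("Person", "name")
```

The caller names only the global class and attribute; the local names `sname` and `ename` never appear in the global query, which is the transparency the section describes.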

2.4 Heterogeneity in Multidatabase Systems The component databases in a MDB are usually heterogeneous as a consequence of their independent design. This heterogeneity may exist at three basic levels [DOG95]. Firstly, at the platform level - the database systems reside on di erent hardware, use di erent operating systems, and communicate with other systems using di erent communication protocols. Secondly, at the database management system (DBMS) level - the data can be managed by a variety of database systems based on di erent data models and query languages. Thirdly, at the semantic (logical) level - semantic heterogeneity may arise [BAT86, GAR96, KIM91a, KIM93] because di erent designers have di erent viewpoints in modelling the same real world objects in the application domain or because of known equivalence among constructs of the data model which means that a concept can be modelled in several ways using di erent combinations of these constructs. It may also arise because of incompatible design speci cations which lead to di erent naming, types and integrity constraints. Platform and DBMS heterogeneities are well understood [GAR96, KIM93] and can be reconciled by using gateways, for example. Data model heterogeneity, by contrast, is resolved by using inter-model translators. Translating the component schemas, in a MDB system, into a common data model (CDM) representation reduces the number of translators needed compared with an approach based on translating directly between all the component system representations. The chosen CDM is best if it is a canonical data model and expressive enough to capture the meaning of all local data models [CAS94, HUL87, SAL91]. The OO data model is a suitable CDM and has been used for this purpose in several projects (e.g. [BER94, FAH94, PIT95]). Query language heterogeneity can be resolved by providing transparent query translators. 
This means that a user can use his/her local query language to formulate a query, and this query will subsequently be translated (and divided into sub-queries for different databases, if necessary) by the query processor [HOW86, HOW87, HOW88a, HOW88b].

Logical heterogeneity is difficult to detect and reconcile, because it requires a good understanding of the meaning of data (i.e. how data is interpreted by database administrators, application programs and end-users) - and it also evolves with time. Unfortunately, it is not possible to fully capture real world semantics by using available data modelling techniques [SAL91]. Therefore, nearly all tools that deal with detecting and reconciling semantic heterogeneity depend on user interaction to complement and validate their results [GAR96, GOT92, KIM91a, KIM93, SPA91, SAV91].
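The claim made earlier in this section, that a common data model reduces the number of inter-model translators, can be illustrated with a small count. The sketch below is not taken from the thesis: it assumes that translators are directional (one per ordered pair of models), that the CDM is not itself one of the local models, and the function names are purely illustrative.

```python
def pairwise_translators(n_models: int) -> int:
    """Directional translators needed when every local data model is
    translated directly into every other local data model (assumption)."""
    return n_models * (n_models - 1)


def cdm_translators(n_models: int) -> int:
    """Directional translators needed when each local data model is mapped
    to and from a single common data model (assumption)."""
    return 2 * n_models


# Compare the two approaches for a growing number of local data models.
for n in (2, 3, 5, 10):
    print(n, pairwise_translators(n), cdm_translators(n))
```

Under these assumptions the two approaches break even at three local data models, and the CDM approach needs fewer translators from four models onwards.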

2.4.1 Heterogeneity Aims for MVDS

Heterogeneity is a natural consequence of progress and competition over time. The choice of a data management system depends on the application's requirements and available technology. As these evolve over time, an organisation can end up having several possibly heterogeneous data management systems. Sometimes, the choice of a data management system is dictated by performance considerations, and not all data sources may belong to the organisation using them [BER89]. Therefore, several types of heterogeneity are likely even within a single organisation.

Because of the complex nature of the problems addressed in this research, it should not be carried out in an unfocused way, but directed towards the stated primary goals. So we decided to investigate only logical heterogeneity in the schema integration process - as this type of heterogeneity is likely to occur even if the databases concerned are using the same hardware, operating system, data model, DBMS and query language. We did not target hardware and operating system heterogeneity as these should be masked by the communication subsystem, not the schema integration tool of a MDB system.

Even though data model heterogeneity is not addressed in depth in our research, one of MVDS's design goals is that it should be able to cope with database data model heterogeneity. Therefore, MVDS employs a canonical data model to which all the participating local schema data models must relate. This means that the participating schemas should be translated into this model before their integration process commences. DDL-heterogeneous schemas are supported with a minimum impact on MVDS - as it has been implemented using a meta-programming approach (see Section 7.2).

2.5 Local Autonomy in Multidatabase Systems

The component databases in a MDB system are expected to be under separate and independent control. Often, their owners are willing to let them participate in a MDB system only if they retain full control over their data and operations. This aspect is referred to as local autonomy or component database autonomy [SHE90, BUK96a, COL95, HEI85]. Local autonomy can be broadly divided into four levels, namely: design, communication, execution and association autonomy.

Design autonomy refers to the ability of a component database administrator to choose its:

• data (i.e. the universe of discourse),
• data model and query language,
• interpretation of data (which contributes significantly to the problem of semantic heterogeneity),
• set of integrity constraints,
• operations (i.e. functionalities), and
• solutions to implementation issues such as record and file structures and concurrency control algorithms.

Communication autonomy, by contrast, refers to the ability of a component database to decide whether to communicate with other component databases. A component database with communication autonomy decides when and how to respond to requests from other components.

Execution autonomy means that a component database is able to execute its local operations without interference from external operations submitted by other component database systems. Thus the MDB system software cannot impose an order of execution of its commands on a component database system that has execution autonomy. This greatly complicates MDB system transaction processing and global query optimisation techniques.

Association autonomy implies that a component database system has the ability to decide whether and to what extent it will share its operations and data. This includes the ability to associate or disassociate itself with/from the MDB system.

Thus, the need to preserve local autonomy of component databases and the need to share data often conflict. In practical situations, local autonomy may (will) be sacrificed to varying degrees to improve global functionality such as query processing and reconciliation of heterogeneity [COL95]. However, the ability to preserve local autonomy gives advantages to a MDB system. For example, it circumvents the costly and highly cumbersome process of modifying local database systems; it allows the component databases to maintain their independence and preserve their "well-functioning" environment; and, finally, it permits the easy addition or removal of a component database to/from the MDB system.

2.6 Multidatabase System Architectures

MDB systems can be classified into different types of architecture according to various criteria. For example, they can be classified as dynamic or static according to the frequency with which their federated schemas are created and destroyed. If that process is very frequent the MDB is called dynamic, otherwise it is called static. Another possible classification is based on a heterogeneity criterion, where systems are classified as homogeneous or heterogeneous. In a homogeneous system, all the component databases use the same data model, query language, DBMS and interpretation of data. If one or more of these factors is different, the MDB is called heterogeneous. MDB systems can be further classified depending on the type of heterogeneity supported, for example data model or query language heterogeneities.

Our research classifies MDB systems according to the following criteria:

• local autonomy support, i.e. whether component (local) databases are autonomous or not,
• the existence or absence of federated schemas, and
• who administers such schemas, either the DBA or end-users.

Figure 2.1 shows a MDB system architectural taxonomy which is based on the one presented in [SHE90]. According to this taxonomy, MDB systems are first classified into non-federated and federated systems. The component database systems in a non-federated MDB system are not autonomous, and there is no distinction between the global and local users. Therefore it appears to its users as a DDBS in our terminology. Our research is not concerned with this type of MDB system although the algorithms and techniques described here are applicable to the design of a global schema for a DDBS.

Federated MDB systems preserve the local autonomy of their component database systems, which are in charge of their own data and operations. There is no centralised control in a federated MDB system, and the component databases cooperate to support different degrees of integration. These systems are further divided into loosely-coupled and tightly-coupled federated MDB systems, based on who manages the federation and how the component databases are integrated.

Loose- and tight-coupling have been used in different contexts in the literature, namely:

1. to describe the relationship between global and local functions during global user operation execution (execution coupling). A loosely-coupled MDB system, in this sense, interacts with its component databases at their user interface level, and it cannot control the execution of a global request within a component database system. In contrast, in a tightly-coupled MDB system, local operations are inferior to global ones in their execution priority.

Multidatabase Systems
  Non-Federated (e.g. UNIBASE)
  Federated
    Loosely-Coupled (e.g. MRDSM)
    Tightly-Coupled
      Single-Federation (e.g. DDTS, SIRUS-DELTA)
      Multiple-Federations (e.g. MERMAID, MULTIBASE)

Loosely-coupled: absence of integrated schemas, and end-users are responsible for performing the integration.
Tightly-coupled: presence of integrated schemas generated by the MDBS's DBA(s).

Figure 2.1: A Taxonomy of Multidatabase System Architectures [SHE90]

2. to describe links between global and local data descriptions and who is responsible for creating and maintaining these links (administrative coupling). A tightly-coupled MDB system, in this sense, maintains federated (global) schema(s) and these schemas are usually created and maintained by the system DBA. A loosely-coupled MDB system, on the other hand, provides either importation/negotiation techniques or a powerful query language (see Section 2.6.1). In the former case it is the responsibility of a site's DBA or end-user to perform the importation/negotiation process, while in the latter it is the end-user's responsibility to create valid MDB queries.

Because the schema integration process in MDB systems is being investigated in this thesis, we will refer to tight and loose coupling as defined by point 2 above, i.e. the classification of MDB systems presented in this chapter is based on administrative coupling.
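The administrative-coupling taxonomy described above can be read as a small decision procedure over the three classification criteria. The following is a minimal sketch under that reading; the type and field names (MDBProfile, schema_admin, federated_schemas) are hypothetical and are not part of MVDS or of [SHE90].

```python
from dataclasses import dataclass
from enum import Enum


class Administrator(Enum):
    DBA = "DBA"
    END_USER = "end-user"


@dataclass(frozen=True)
class MDBProfile:
    local_autonomy: bool          # are the component databases autonomous?
    schema_admin: Administrator   # who creates and maintains federated schemas
    federated_schemas: int        # number of DBA-maintained federated schemas


def classify(profile: MDBProfile) -> str:
    """Place a system in the taxonomy using administrative coupling."""
    if not profile.local_autonomy:
        return "non-federated"
    if profile.schema_admin is Administrator.END_USER:
        return "loosely-coupled federated"
    if profile.federated_schemas < 1:
        raise ValueError("a tightly-coupled system maintains at least one federated schema")
    if profile.federated_schemas == 1:
        return "tightly-coupled federated (single-federation)"
    return "tightly-coupled federated (multiple-federations)"


print(classify(MDBProfile(True, Administrator.DBA, 3)))
```

The sketch deliberately ignores execution coupling, since the chapter bases its classification on administrative coupling only.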

2.6.1 Loosely-Coupled Federated Multidatabase Systems

In loosely-coupled federated MDB systems, it is the users' responsibility to create and maintain the federated schemas. There is no control enforced by the federation and its administrators. Systems of this type are sometimes referred to as interoperable or multidatabase systems [LIT86, LIT90]. The component databases are autonomous, in that every component database system controls its own data and operations; it can join and withdraw from the federation at any time. This is an ideal situation; in practice, however, a component database system has, at least, to inform the federation of its existence or absence from the federation. This class of MDB systems nearly always has several federated schemas. Such schemas are usually created either by a schema importation process [HEI85, LI91] (see figure 2.2) or through the use of a MDB language [CHO94, KRI91, LIT86, LIT90, LIT93, WAT93] (see figure 2.3).

[Figure 2.2 appears here. Labels recoverable from the original: Private Schema, Export Schema and Import Schema (one set for each of three component databases), Federal-Dictionary, Interaction, Negotiation.]

Figure 2.2: Schema-Importation-Based Multidatabase System

[Figure 2.3 appears here. Labels recoverable from the original: Global User Using Multidatabase Language (two users), and for each local site a Local Schema, Local User Interface, Local DBMS and Local Data.]

Figure 2.3: Language-Based Multidatabase System

Schema importation is the process whereby a component database negotiates with other component databases to gain access rights to certain information. This information is usually kept in the component database import schema. Once negotiation is complete, the component database may restructure the information in its import schema, usually by applying a set of schema manipulation operators to it (this operation more or less is a schema integration). One disadvantage seen with this approach is the overhead entailed in creating and maintaining such schemas for every component database system [COL91]. Also, the users have to have a certain degree of expertise to perform the negotiation and schema restructuring processes.

The second approach for performing integration in a loosely-coupled federated MDB system is to provide its users with a MDB query language that is capable of accessing several autonomous databases. This language is typically capable of performing retrievals or updates on data located in autonomous and heterogeneous databases. This may require inter-database joins, data interchange and reconciliation of heterogeneous local representations. Note here that it is still the user's responsibility to form a valid global query. Consequently, the process of detecting and reconciling heterogeneity between local databases is left to the user rather than a specialist integrator. Thus the users must have a working knowledge of the MDB contents and must learn the MDB query language before they can exploit it fully and effectively.

Loosely-coupled MDB systems are suitable in cases where the number of local components is very large, retrieval operations only are required, or end-users are experts and therefore it is not necessary to shield them from heterogeneity, distribution and replication of data.
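To give a feel for what the language-based approach asks of the user, the sketch below uses Python's standard sqlite3 module as a stand-in for a MDB query language. It is only an analogy: SQLite's ATTACH is not a multidatabase language, and the table names, column names and the cents-to-dollars convention are hypothetical. The point is that the user, not the system, writes the inter-database join and reconciles the differing key types and salary units.

```python
import sqlite3

# "main" stands in for local database 1; db2 is a second, separate database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS db2")

# The two databases were designed independently: different names,
# different key types, and different units for the salary.
conn.execute("CREATE TABLE main.employee (emp_no INTEGER, name TEXT, salary_usd REAL)")
conn.execute("CREATE TABLE db2.staff (id TEXT, full_name TEXT, salary_cents INTEGER)")
conn.executemany("INSERT INTO main.employee VALUES (?, ?, ?)",
                 [(1, "Ann", 1000.0), (2, "Bob", 1200.0)])
conn.executemany("INSERT INTO db2.staff VALUES (?, ?, ?)",
                 [("1", "Ann", 25000), ("3", "Cy", 90000)])

# The user must know both schemas and reconcile them inside the query:
# cast the textual key and convert cents to dollars.
query = """
SELECT e.name, e.salary_usd + s.salary_cents / 100.0 AS total_usd
FROM main.employee AS e
JOIN db2.staff AS s ON e.emp_no = CAST(s.id AS INTEGER)
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Every user who needs this combined information has to repeat the same reconciliation, which is the overhead that a pre-built integrated schema is meant to remove.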

2.6.2 Tightly-Coupled Federated Multidatabase Systems

Here, the federation and its administrators are responsible for creating and maintaining the federated schemas and controlling access to the local databases. Local autonomy is also supported although access to a component database is usually controlled by its DBA. This approach is suitable for traditional business or corporate databases, where users are not highly skilled and would find it difficult to perform negotiation and integration themselves, or when location, distribution and replication transparencies are desirable. One of its disadvantages is that it is not suitable for a very large number of databases or for very large schemas. Also, incremental growth is not easily supported, i.e. adding a new component database to the MDB system requires substantial effort to incorporate its effects into existing global schemas - where schema contents that are affected by this addition have to be extended or modified to cater for new types of information and heterogeneity that are supported by the new component. The payoff with this type of MDB system is that users are protected from the effects of heterogeneity by the provision of a consistent and non-redundant global representation(s).

Tightly-coupled federations can be further subdivided into single-federation and multiple federations. A single-federation MDB system [COL91, DAY84, DOG95, HAY90, MOT83, MOT87, QUT92, QUT93, SPA92] (see figure 2.4) allows the creation and management of only one federated schema or global schema. This federated schema is a consistent, non-redundant and complete representation of all the local information [BAT86]. This helps in maintaining uniformity in semantic interpretation of the integrated data and therefore it is generally easier to enforce global integrity constraints and to optimise queries. Unfortunately, the global schema can grow very large, especially when the component databases are numerous or large in schema terms. Also, all structural and semantic conflicts have to be identified at the time of integration even though some of them might never occur in practice. Thus, the creation of such a schema is a complex process and this approach is not well-suited to applications that require frequent modifications to their local schemas and consequently to their global schema.

[Figure 2.4 appears here. Labels recoverable from the original: Global Users, External Views, Global Conceptual Schema, and for each local site a Local Schema Represented in CDM, Local User Interface, Local DBMS, Local Schema and Local Data.]

Figure 2.4: A Single-Federation Multidatabase System

A multiple-federations MDB system, by contrast, maintains several federated schemas [BER91, CLE93, FAH94, LIM94a, ZHA91] (see figure 2.5). This permits global schema tailoring and customisation so that users are presented with information relevant to their requirements, represented in their preferred format. It supports multiple interpretations of data, i.e. the same piece of information can be viewed differently by different users. However, these multiple interpretations of local information may lead to update inconsistencies. For example, assume there are two groups of users, each having its own view of a MDB system that contains information about students like their names, courses and teachers. Assume further that the first view has an integrity constraint which states that a student must take three or more courses per term, while a student must take four or more courses per term in the second view. Now, if a user from the first view inserts a new student with 3 courses only, then this new student violates the integrity constraint of the second view (see footnote 6). This architecture overcomes the single-federated schema large size problem by maintaining several tailored federated schemas and integrating only the required information in each schema.

[Figure 2.5 appears here. Labels recoverable from the original: Global Users, two partial Global Conceptual Schemas, and for each local site a Local Schema Represented in CDM, Local User Interface, Local DBMS, Local Schema and Local Data.]

Figure 2.5: Multiple Federations Multidatabase System

Footnote 6: We assume here that the two views operate on the same data set. Of course this conflict can be overcome by filtering out the instances that violate particular view constraints when the data is being accessed by this view's users. However, this example gives a feeling of the overhead entailed when multiple interpretations of data are permitted.
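The student example and the filtering remedy mentioned in footnote 6 can be sketched as follows. This is an illustration only, assuming that both views share one data set; the View class, its min_courses field and the sample student are hypothetical names, not part of MVDS.

```python
from dataclasses import dataclass


@dataclass
class View:
    name: str
    min_courses: int  # integrity constraint: minimum courses per term

    def satisfies(self, student: dict) -> bool:
        return len(student["courses"]) >= self.min_courses

    def visible(self, students: list) -> list:
        """Filter out instances that violate this view's constraint."""
        return [s for s in students if self.satisfies(s)]


def insert_via(view: View, student: dict, store: list) -> None:
    """Insert through one view; only that view's constraint is checked."""
    if not view.satisfies(student):
        raise ValueError(f"{student['name']} violates the constraint of {view.name}")
    store.append(student)


shared_students = []            # both views operate on the same data set
view1 = View("view 1", min_courses=3)
view2 = View("view 2", min_courses=4)

new_student = {"name": "Sam", "courses": ["DB", "OS", "AI"]}
insert_via(view1, new_student, shared_students)   # valid for view 1

# The same instance is invalid from the perspective of view 2 ...
violating = [s["name"] for s in shared_students if not view2.satisfies(s)]
print(violating)
# ... unless view 2 filters it out when the data is accessed.
print(view2.visible(shared_students))
```

The filtering step has to run on every access through the second view, which is one concrete form of the overhead the footnote refers to.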

Note that Sheth and Larson [SHE90] differentiate between tightly-coupled and loosely-coupled MDB systems according to who creates and administers the federated schemas. If such schemas are created by users, the system is classified as loosely-coupled. But, if the MDB administrator is the one who creates such schemas, the system is called tightly-coupled. According to this classification, tightly-coupled federated MDB systems are autonomous. Other authors use different terminologies, for example Litwin [LIT86, LIT90] uses the term "multidatabase" to refer to a query-language-based, loosely-coupled federated MDB system in Sheth and Larson's [SHE90] terminology. Heimbigner and McLeod [HEI85] use the term "federated" to refer to schema-importation-based, loosely-coupled federated MDB systems according to Sheth and Larson's classification [SHE90]. Table 2.2 shows some of the common terminologies in this context, with Sheth and Larson's definitions in the first row and equivalent terms by other authors in subsequent rows.

Sheth and Larson [SHE90]

Loosely-Coupled Federated MDB Systems (Schema Importation)

Litwin [LIT86]

Federated

Heimbigner and McLeod [HEI85]

Federated

Fahl [FAH94]

Federated

Bright, Hurson and Pakzad [BRI92] Ours

Loosely-Coupled Federated MDB Systems (Query Language-based) Multidatabase Systems -

Multidatabase Language Federated Multidatabase Language Schema Multidatabase Importation Based Language Systems Approach

Tightly-Coupled Federated MDB Systems (Single Federation)

Tightly-Coupled Federated MDB Systems (Multiple Federations)

Distributed Database Systems Composite Database or Heterogeneous Database Global Schema

-

Global Schema

Multiple Integrated Schemas -

Single Global Multiple Global Schema (View) Schemas (Views)

Table 2.2: Multidatabase Systems Terminology Comparison

2.7 Multidatabase System Architectures Comparison

The boundaries between the different architectures of MDB systems are not always distinct. For example, the single-federation architecture can be regarded as a special case of the multiple-federations architecture if the latter maintains only one global schema. On the other hand, if the language-based architecture uses persistent views, the multiple-federations architecture becomes a special case of it. Table 2.3 shows a comparison of the features supported by the different architectures as in Sheth and Larson [SHE90]. Only aspects relevant to our research are highlighted in this table, for instance, whether local conflicts are resolved in advance or not. From our point of view, the multiple-federations architecture is a practical one, as it supports local DB autonomy, accommodates a large number of local databases, supports global schema customisability and evolution, provides a high degree of integration solution sharing, and shields MDB users from the effects of heterogeneity and local autonomy by constructing the global schemas via a schema integration process.

Feature | Schema-Importation-Based MDB Systems | Language-Based MDB Systems | Single-Federation MDB Systems | Multiple-Federations MDB Systems
Local autonomy | Supported | Supported | Supported | Supported
Method of supporting data sharing | Schema importation (negotiation) | Query language | Local schema integration | Local schema integration
Level of sharing of integration solutions | Single-site users | Usually a single user | All users | All users
Possible for large MDB systems | Yes | Yes | Difficult | Yes
Customisability | Supported | Supported | Limited support via external schemas | Supported
Local conflicts are resolved in advance | No | No | Yes | Yes
Support for multiple interpretations of data (multiple semantics) | Yes | Yes | No and Yes (via external schemas) | Yes

Table 2.3: Multidatabase System Architectures Comparison of Selected Features

2.8 Multidatabase System Reference Architecture

Figure 2.6 shows our reference architecture. The component databases are autonomous, i.e. they control their own data and their existing application programs are unaffected by joining the MDB system. This architecture relies on multiple global views to support controlled data and operation sharing between local databases. These views are built via a schema integration process where local conflicts are resolved in advance and, therefore, end-users are provided with homogeneous views of the otherwise heterogeneous and distributed information. Schema integration is performed in a very flexible way (see Chapters 4 and 5), therefore it is possible to support multiple semantics at the global level over the same local information (i.e. it is possible to apply different integration operators to the same set of local objects).

All local schemas are translated into the MVDS canonical data model (CDM) representation before schema integration is performed. Even though data model heterogeneity is beyond the scope of this project, it is possible to accommodate data-model heterogeneous databases in this reference architecture by first translating their schemas into the CDM representation. The global views are initially expressed in the CDM representation and then translated into a user-specified DDL later when a user wishes to interact with them. Once the definition of a global view is complete, users from any site can query through it.

The local databases may protect their private information by providing an export schema rather than their local schema itself to the MDB system. It is also possible for a node to provide more than one export schema, thereby limiting external access to selected groups of users and controlling their views of the database. The reference MDB system maintains a knowledge base that contains the local schemas represented in the CDM and known inter-schema correspondences. Also each global view has its own knowledge base that contains its intermediate representation. The advantage of these latter knowledge bases is that they provide a clear separation between information that is relevant to global views, local databases and local schema integration rules, respectively. Any reference to MDB systems in subsequent chapters of this thesis refers to this architecture.
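The role of export schemas in the reference architecture can be pictured as a projection over a local schema, with several projections possible for the same node. The sketch below assumes a schema is simply a mapping from class names to attribute lists; this representation, the function name make_export_schema and the sample classes are hypothetical and do not reflect the MVDS CDM.

```python
def make_export_schema(local_schema: dict, exposed: dict) -> dict:
    """Return a copy of the local schema restricted to the classes and
    attributes named in `exposed`; everything else stays private."""
    export = {}
    for cls, attrs in exposed.items():
        if cls not in local_schema:
            raise KeyError(f"unknown class: {cls}")
        missing = set(attrs) - set(local_schema[cls])
        if missing:
            raise ValueError(f"unknown attributes for {cls}: {sorted(missing)}")
        # Preserve the attribute order of the local schema.
        export[cls] = [a for a in local_schema[cls] if a in attrs]
    return export


local_schema = {
    "EMPLOYEE": ["emp_no", "name", "salary", "medical_record"],
    "PROJECT": ["proj_no", "title", "budget"],
}

# One node may offer different export schemas to different user groups.
export_for_payroll = make_export_schema(
    local_schema, {"EMPLOYEE": {"emp_no", "name", "salary"}})
export_for_public = make_export_schema(
    local_schema, {"EMPLOYEE": {"name"}, "PROJECT": {"title"}})

print(export_for_payroll)
print(export_for_public)
```

In this picture the schema integration process would only ever see the export schemas, so private attributes such as the hypothetical medical_record never reach a global view.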

2.9 Multidatabase System Examples

Here we provide a sample of three existing MDB systems described in the literature, namely: Pegasus, Myriad and Multibase. Pegasus is a full OO MDB system, while Myriad is a relational federated system. Multibase is a functional MDB system. Thus they are representatives of MDB systems that use three different major data models as their canonical data models, namely: the OO model (Pegasus), the functional model (Multibase) and the relational model (Myriad).

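[Figure 2.6 appears here. Labels recoverable from the original: at the Global Level, global schemas GS1 and GS2 with their knowledge bases GS1-KB and GS2-KB; MDB-KB; component schemas CS1, CS2, ..., CSn; export schema ES1; at the Local Level, local schemas LS1 and LS2 and databases DB1, DB2, ..., DBn. Legend: GS: Global Schema; DB: Database; CS: Component Schema in CDM; ES: Export Schema; KB: Knowledge Base; LS: Local Schema.]

Figure 2.6: MVDS's Reference Architecture for Multidatabase Systems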

2.9.1 Pegasus

Pegasus [AHM91, ALB93, DU96, PIT96, RAF91] is a heterogeneous MDB system that supports interoperability between a wide range of information sources such as relational and OO databases. Its canonical data model is based on the Iris OO data model [FIS87]. Pegasus uses the same language, called HOSQL (Heterogeneous Object SQL), as both its data definition and data manipulation language. Pegasus provides three functional layers:

• The intelligent information access layer: this is responsible for providing services such as information mining, browsers, schema exploration and natural language interfaces.
• The cooperative information management layer: this deals with schema integration, global query processing, local query translation and transaction management.
• The local data access layer: this manages schema and command translation, local system invocation, network communication and data conversion and routing.

Schema integration in Pegasus is accomplished by first registering the local schemas, then importing them into Pegasus, and finally integrating the imported schemas. Schema registration provides Pegasus with the necessary information to identify and locate the participating local information sources. Schema importation, by contrast, is responsible for mapping local to global objects. This is a simple one-to-one mapping between the local and global objects. During schema integration, the imported schemas are integrated using integration statements that are specified in SQL3+ (see footnote 7). Therefore from our point of view, Pegasus uses view-definition-facility-based schema integration.

Footnote 7: Early papers published by Hewlett-Packard Laboratories with regard to Pegasus (e.g. [AHM91, ALB93, RAF91]) state that HOSQL is the language used for schema importation and integration. However, recent papers (e.g. [DU96, PIT96]) state that SQL3+ is the language used to perform these activities.

2.9.2 Myriad

Myriad [CLE93, LIM94a] is a federated database prototype that was developed at the University of Minnesota. Its goal is to provide a testbed for investigating alternative architectures and algorithms for database integration, query processing and optimisation, and concurrency control and recovery. It employs the relational data model and SQL as its canonical data model and query language, respectively. The Myriad architecture consists of the following four layers:

• Application tools: these include a suite of tools for end-users and DBAs to access Myriad. For example, a user-friendly query formulator that provides an interactive query interface to federated databases, schema integrators that are used to merge potentially conflicting local databases and to define global schemas, and schema browsers for users or DBAs to familiarise themselves with the system. Schema integration is achieved by performing outerjoin and generalised attribute derivation operations, in addition to the usual set of relational operations such as select, on a set of local relations. This results in an integrated relation. It is possible to expose only some of an integrated relation's attributes or to define derived attributes based on its source attributes to suit different users' perspectives. Myriad supports both loose and tight administrative coupling where users and/or DBAs may create integrated relations.

• Query processing subsystem: this has two types of query processor, namely: the Federated Query Manager (FQM) and the Federated Query Agent (FQA). FQM manages global query processing, which involves mapping from the user's view of the integrated database to the federated database system's view of the local databases. FQA, by contrast, is responsible for query processing at a local site involved in a global query. Query processing in Myriad is carried out in a distributed fashion with FQA processes sending intermediate results to each other before, finally, one of them returns a single result to FQM.

• Transaction Management subsystem: this is responsible for transaction processing in Myriad. It consists of two subsystems, namely the Federated Transaction Manager (FTM) and the Federated Transaction Agent (FTA). These are analogous to FQM and FQA described above - where FTM manages global transactions and FTA coordinates a global transaction's access to a particular local database.

• Communication subsystem: this is responsible for routing messages and intermediate query results between logical and physical sites. It is the only component in Myriad that knows about physical sites.

Myriad's designers are planning to extend it to integrate legacy applications in addition to database systems. Also, they are planning to transform its canonical data model to the OO model, where local components are treated as objects. By this means, Myriad in its future versions will be classified as a Distributed Object Architecture (DOA) system (see Section 3.3.1).
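The outerjoin-plus-derivation style of integration attributed to Myriad's schema integrators above can be sketched in a few lines. This is not Myriad's implementation: relations are modelled as lists of dictionaries, the derivation rule (prefer the first source's salary, fall back to the second) is an arbitrary assumption, and all relation and attribute names are hypothetical.

```python
def outerjoin(left: list, right: list, key: str) -> list:
    """Full outer join of two relations (lists of dict rows) on `key`.
    Attributes missing on one side are filled with None."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    all_attrs = {a for row in left + right for a in row}
    joined = []
    for k in sorted(left_by_key.keys() | right_by_key.keys()):
        row = dict.fromkeys(all_attrs)        # every attribute starts as None
        row.update(left_by_key.get(k, {}))
        row.update(right_by_key.get(k, {}))   # shared attributes assumed to agree
        joined.append(row)
    return joined


def derive_salary(row: dict):
    """Derived attribute: prefer the db1 value, else use the db2 value."""
    return row["salary_db1"] if row["salary_db1"] is not None else row["salary_db2"]


staff_db1 = [{"id": 1, "name": "Ann", "salary_db1": 1000},
             {"id": 2, "name": "Bob", "salary_db1": 1200}]
staff_db2 = [{"id": 2, "name": "Bob", "salary_db2": 1300},
             {"id": 3, "name": "Cy", "salary_db2": 900}]

# Integrated relation: expose only id, name and the derived salary.
integrated = [{"id": r["id"], "name": r["name"], "salary": derive_salary(r)}
              for r in outerjoin(staff_db1, staff_db2, "id")]
print(integrated)
```

A different user group could expose other attributes or use another derivation rule over the same outerjoin, which corresponds to the tailoring of integrated relations described above.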

2.9.3 Multibase

Multibase [LAN82, SMI81] is a limited prototype that integrates pre-existing heterogeneous and distributed databases. It provides end-users with a unified global schema and a single high-level query language. Multibase uses the functional data model as its canonical data model and DAPLEX as its query language. The architecture of Multibase consists of two components:

• Schema design aid: this provides the integrated database designer with the necessary tools to design a global schema and to define mappings from the local databases to the global schema.

• Query processor: this is responsible for translating global queries into local queries, and ensuring their correct and efficient execution. It utilises the mappings defined by the schema design aid component to perform its functions.

A global schema or view in Multibase consists of view definition and view derivation statements. The former define global entities and functions of a global view from an end-user's point of view. The latter, by contrast, define mappings between this global view and the local schemas. These mappings are expressed as DAPLEX queries and are defined by the DBA.

Our system, MVDS, is not a complete MDB system, rather it is a schema integration tool. Therefore, only schema integration issues have been highlighted in the previous subsections. MVDS improves on Myriad by providing several integration operators to merge local classes, not only the outerjoin operation (see footnote 8). We believe that MVDS also improves on Pegasus and Multibase by supporting operator-based, not language-based, schema integration. This reduces the complexity of schema integration from the end-users' point of view, especially since MVDS provides them with a graphical user interface to interact with its integration operators.

Footnote 8: Myriad uses the relational data model as its CDM.

Chapter 3

Object-Oriented Modelling and Schema Integration

3.1 Introduction

This chapter provides a description of the OO model's basic concepts, to define the terms used in this thesis. It describes how object orientation has shaped current MDB system technology in terms of architectures and CDMs. It also discusses issues related to the schema integration process, namely: sources of heterogeneity in the database design process and a taxonomy of OO schematic conflicts; and defines the integration process strategies and phases. Finally, the impact of preserving local autonomy on schema integration is discussed.

3.2 The Object-Oriented Data Model 3.2.1 Basic Object-Oriented Concepts Object-oriented modelling is based on a set of powerful techniques which initially stem from OO programming and semantic modelling principles. A brief description is given here of selected OO concepts necessary to appreciate our reported research. 38

For more details of the OO paradigm, see for example [HUG91, KIM90, ROB93]. Object-orientation perceives real world entities as sets of objects, where an object is an abstract representation of a real-world entity that has a unique identity, embedded properties and the ability to interact with other objects and with itself via messages [ROB93]. Thus objects can describe entities ranging from the very simple, such as an integer, to the most complex, such as a car-factory. Objects have an existence which is independent of their value and location. Two objects with the same value can be either identical, that is, they are the same object, or equal, i.e. they are possibly di erent objects which have the same value. Objects which have a similar state (instance variables) and behaviour (methods) are grouped into a class. A class serves two purposes, namely, it factors out the similarity between a set of objects and it acts as a container for those objects. The term class is used in this thesis to refer to the schematic de nition of a collection of objects (i.e. the object type) rather than referring to the collection itself. Object-orientation supports encapsulation, where a class is accessed only via its declared interface. Class methods are embedded in the object. This helps to separate the application of a class (i.e. the class as seen by its clients) from its implementation. Inheritance is another important aspect of object-orientation. It allows a class to inherit properties from another class. The inheriting class is called a subclass while the other is called a superclass. Inheritance is divided into two categories; single inheritance, where a class is allowed to inherit properties from only one superclass, and multiple inheritance, where a class is allowed to inherit properties from more than one superclass. Multiple inheritance is a very powerful technique which facilititates the natural representation of relationships between real world entities. 
Classes with single inheritance are arranged into a class hierarchy whereas classes with multiple inheritance are arranged into a class lattice, where often the latter term is used inclusively to refer to both cases. In a class lattice, two types of conflict may arise in the names of class properties. Firstly, conflicts may arise between a class and its superclasses when the class re-defines an inherited property. Such a conflict is reconciled by allowing the class definition to override the superclass definition. The second type of conflict may be introduced by multiple inheritance where some of a class's superclasses' properties have the same name¹. This conflict can be reconciled by giving priorities to the superclasses, as in ORION [BAN87], which ensures that the conflicting properties are inherited from the superclass with higher priority. Alternatively, declaring the origin of an inherited property explicitly, as in the ODMG-93 standard [CAT94], resolves this problem.

Aggregation [SMI77] occurs in the OO model when a relationship amongst objects is represented by a higher level aggregate object. For example, a UNIVERSITY is an aggregation of a set of DEPARTMENTs. Aggregation is a method of representing complex objects. A complex object is one that has other objects as its subparts. For example, a CAR object might have BODY and ENGINE as its sub-objects.
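The three ways of reconciling property name conflicts described above can be illustrated with a minimal Python sketch. It is an analogy only: the class names are hypothetical, and Python's method resolution order is assumed here to stand in for a priority scheme in the spirit of ORION, not to reproduce it.

```python
# Hypothetical classes; none of these names come from ORION or ODMG-93.
class Employee:
    def describe(self):
        return "employee"


class Student:
    def describe(self):
        return "student"


# Priority-based resolution: Python gives the leftmost superclass the
# higher priority, so describe() is inherited from Employee.
class TeachingAssistant(Employee, Student):
    pass


# Explicit-origin resolution: the subclass names the superclass whose
# definition it wants, regardless of the declaration order.
class ResearchAssistant(Employee, Student):
    describe = Student.describe


# Overriding: a re-definition in the class itself wins over any
# inherited definition.
class Tutor(Employee, Student):
    def describe(self):
        return "tutor"
```

Swapping the order of the superclasses in TeachingAssistant would change which describe() is inherited, which is why a priority scheme must be stated explicitly in the schema.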

3.2.2 Why the Object-Oriented Model?

The ever-increasing complexity of database applications over the years, together with the cost of implementing, maintaining and extending these applications, has highlighted deficiencies in traditional data models, such as the network and relational data models, and encouraged the emergence of a new generation of data models such as the semantic and OO data models. The OO model naturally meets the needs of complex applications such as artificial intelligence (AI), computer-aided design and manufacturing (CAD/CAM), and office information systems (OIS) [ATK92, BAN87, DEU92, FIS87, HUG91, HUR93, KIM90, RUM91, STO91, WIL90]. The following are some of the perceived advantages of the OO model²:

• It promotes modularity by modelling all conceptual entities as objects.

• It represents and manipulates complex objects, which permits the successive refinement of complex entities.

• It allows the definition and manipulation of arbitrary data types.

• It is capable of representing and managing changes in a database over time, including the notions of time, time interval and versions of objects and schemas.

• It arranges schema classes in a class generalisation hierarchy that facilitates the top-down design of the database and helps the factoring out of shared specifications and implementations in application programs via property inheritance through the class generalisation hierarchy.

• Its encapsulation principle supports a separation between the specification (interface) and the implementation of objects, and therefore allows the implementation to be changed, if required, without invalidating existing application programs.

• Its notion of object identity supports the realisation of location and value independence.

• It alleviates the impedance-mismatch problem by providing computationally complete query languages. Impedance mismatch occurs when a query language (e.g. SQL) is embedded in a programming language and the database objects manipulated by the query language statements are not subject to the type-checking constraints of the programming language [HUG91]. Thus in the OO model, there is a much stronger affinity between the programming language constructs and the objects in the database, and each object has a well-defined manipulation protocol.

¹ These properties may have been inherited from a common superclass.
² This list is intended to be illustrative rather than exhaustive.

3.2.3 Structural versus Behavioural Object-Orientation

Object-oriented data models are classified into two broad categories, namely: structural object-orientation and behavioural object-orientation [ATK92, GOT92, QUT93]. Structural object-orientation emphasises the arrangement of objects into clusters or complex objects with established relationships between them. It also has a special query language to operate on the structural component (data structures) of an OO schema definition. Behavioural object-orientation, on the other hand, focuses on objects as computational agents. Each agent has a set of instance variables and an interface. An agent's interface is specified by methods (procedures) which operate on its instance variables; only the method signatures are available to application programs. Instance variables of an agent are hidden from user application programs and can only be manipulated by its declared methods. In such models much of the application semantics is hidden in the instance variables and the method code of objects. However, both categories classify classes into generalisation hierarchies and provide inheritance mechanisms which exploit these hierarchies.

3.3 Object-Orientation and Multidatabase Systems

This section shows how object-orientation has shaped current MDB systems in terms of their architectures and CDMs.

3.3.1 Object-Based Multidatabase Systems

The application of the OO paradigm in MDB architectures is advocated by many authors [NIC93, PIT95] as it provides a natural model for heterogeneous, autonomous and distributed systems. This model is called the Distributed Object Architecture (DOA). The constituent components of the MDB are treated as large-grained objects, where each of them is presented to the MDB system as an object with a declared interface and hidden implementation. The data part of such an object represents the resources it possesses; for example, when the component is a database, its resources are the information modelled in its schema. By contrast, the behavioural part of such an object represents the services it provides. Again, if it is a database then its services are the operations that implement the efficient retrieval and update of its information contents. Thus a MDB system is seen as a set of interacting objects whose interaction is achieved through message passing.

Because the constituent objects interact with each other at their declared interfaces, the DOA supports heterogeneity and autonomy naturally. Heterogeneity is supported because details are hidden by encapsulating an object's implementation. Autonomy is supported because the constituent objects are allowed to change their implementations freely as long as they keep their interfaces unchanged. Finally, the DOA offers a means of integrating applications across different technology domains including Graphical User Interfaces (GUIs), file systems, database systems and programming languages. Figure 3.1 shows a MDB system based on the DOA.

[Figure: Object 1, Object 2 and Object 3, each consisting of an interface and a hidden implementation, interact with one another through message passing.]

Figure 3.1: A Multidatabase System based on Distributed Object Architecture

There have been a number of standardisation efforts. For example, the Object Management Group (OMG) [OMG91] is developing standards that address the integration of distributed applications through object technology. The architecture proposed by OMG is called the Object Management Architecture (OMA) and, according to it, every piece of software is modelled as an object. These objects communicate with each other via an Object Request Broker (ORB) which is the key communication element. Also, the Object-Oriented Database Task Group (OODBTG) has drafted a reference model of Object Database Management Systems and a recommendation report on standardisation issues for these systems [ANS91].
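The idea of components as large-grained objects with a declared interface and a hidden implementation can be sketched as follows. This is a minimal illustration under our own assumptions: the class names, the single "retrieve" message and the sample data are hypothetical and are not part of the DOA, OMA or any ORB product.

```python
from abc import ABC, abstractmethod


class Component(ABC):
    """Declared interface shared by every constituent object (hypothetical)."""

    @abstractmethod
    def handle(self, message):
        """Answer a message with a list of {"name", "city"} records."""


class RelationalComponent(Component):
    def __init__(self):
        # Hidden implementation: rows stored as tuples.
        self._rows = [("Smith", "Cardiff")]

    def handle(self, message):
        if message == "retrieve":
            return [{"name": n, "city": c} for n, c in self._rows]
        raise ValueError("unsupported message: " + message)


class ObjectComponent(Component):
    def __init__(self):
        # Hidden implementation: objects stored in a dictionary.
        self._objects = {"Jones": {"city": "Swansea"}}

    def handle(self, message):
        if message == "retrieve":
            return [{"name": k, "city": v["city"]}
                    for k, v in self._objects.items()]
        raise ValueError("unsupported message: " + message)


def broadcast(components, message):
    """Send one message to every component and collect the answers."""
    results = []
    for component in components:
        results.extend(component.handle(message))
    return results
```

Either component could replace its internal storage without affecting broadcast, as long as handle keeps answering the same message in the same form; this is the sense in which encapsulation supports both heterogeneity and autonomy.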

3.3.2 A Comparison of Object-Oriented and Multidatabase Canonical Data Models

As was explained in Section 2.4, the constituent components of a MDB system are most likely heterogeneous, which means that its architecture must be able to cope with heterogeneity-related problems. Data model heterogeneity in particular is reconciled by translating the participating schemas into a CDM. Employing a CDM in MDB systems is extremely valuable [BUK96b, BER94, OZS91] because it reduces the number of required model inter-translators compared with not using a CDM. The use of an expressive CDM allows semantic enrichment of local database schemas [SAL91], a process of semantic upgrading which facilitates the schema integration process. Therefore the chosen CDM must be expressive enough to capture all the information represented in the local schemas, in addition to their inter-schema correspondences [CAS94, HUL87, SAL91]. The OO model is a good candidate for the following reasons [BER94, FAH94, PIT95, RED93]³:

• The OO model is semantically rich: it provides a variety of type and abstraction mechanisms that are not supported by traditional data models. One example is its ability to represent complex objects via the part-of relationship.

• The OO model permits the behaviour of objects to be modelled using methods. Methods support arbitrary combination of information stored in databases.

• The OO model makes it possible to integrate a wide range of database types, programming languages, file systems and GUIs.

• The meta-class mechanism adds flexibility to the model since it allows arbitrary refinements of the model itself.

• The clear separation between an object's interface and its implementation leads to natural support for heterogeneity and autonomy.
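The earlier remark that a CDM reduces the number of model inter-translators can be made concrete with a small counting sketch. It rests on two assumptions of ours, not stated in the text: translators are directional, and without a CDM every data model must be translated directly into every other one.

```python
def translators_without_cdm(n):
    """One directional translator per ordered pair of distinct data models."""
    return n * (n - 1)


def translators_with_cdm(n):
    """One translator into the CDM and one back out for each data model."""
    return 2 * n


def cdm_saving(n):
    """Translators saved by a CDM; negative when the CDM costs more."""
    return translators_without_cdm(n) - translators_with_cdm(n)
```

Under these assumptions the CDM only pays off once more than three data models participate: with five models it needs 10 translators instead of 20, whereas with two models the direct pair of translators is cheaper.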

³ These are some of the references that use the OO model for interoperability purposes. For a comprehensive list see [BUK96a].

3.4 Schemata Integration

3.4.1 Sources of Heterogeneity

Figure 3.2 (column 1) shows the database design process [ELM94a]. It can be divided into four phases, namely: user requirements collection and analysis, conceptual design, logical design and physical design. These phases do not have to proceed strictly in sequence; there are feedback loops that are not shown in the figure for reasons of simplicity and conciseness.

Database Design Phases (column 1; each phase is followed by its output):
  User Requirements Collection and Analysis
    -> Description of the database requirements
  Conceptual Design
    -> The database conceptual schema in a high-level data model (e.g. E-R)
  Logical Design (Data Model Mapping)
    -> The conceptual schema in a DBMS-specific data model
  Physical Design
    -> Internal schema

Sources of Heterogeneity (column 2):
  • Different user requirements.
  • Different interpretations of the same requirements.
  • The tool(s) used to gather the user requirements may impose certain restrictions that consequently lead to heterogeneities.
  • Different skills of different designers.
  . . . etc.

  • Two designers may choose different data models (e.g. E-R vs. OMT).
  • Different designers may classify the same real world concepts into different classes, attributes or relationships.
  • Choosing different names for the same classes.
  • Choosing the same names for different classes.
  • Choosing different properties for the same classes.
  • Choosing different domains for the same attributes.
  . . . etc.

  • Different DBMSs (e.g. Ingres vs. Oracle).
  • Different versions of the same DBMS.
  • Different query languages (SQL vs. QUEL).
  • Different storage structures.
  . . . etc.

Figure 3.2: Phases of the Database Design Process (Based on [ELM94a]) and possible Sources of Heterogeneity

During the user requirements collection and analysis phase, database designers interview potential users and try to understand and specify their needs of the system or expectations from the prospective database. The output of this phase is a concise definition of user requirements. This should be as accurate and complete as possible. The designers, during the conceptual database design phase, create a concise description of user requirements using a high-level data model such as an E-R [ELM94a] or OMT [RUM91] model. The output of this phase is the conceptual schema of the database. These first two phases are DBMS-independent, i.e. the designers do not have to limit themselves to the capabilities of a particular DBMS. The next two phases, by contrast, are DBMS-dependent. During the logical design phase the conceptual schema is transformed from a high-level data model to a DBMS-specific model. In the last phase, i.e. physical design, the internal storage structures and file organisations for the database are specified.

This design process offers many possibilities for free choice which contribute to the heterogeneity problem. For example, during the first phase two different designers may interpret user requirements differently: users might have different needs, and communication skills vary from user to user and from designer to designer. During the conceptual schema design process, designers are free to choose from several data models, so it is possible that two different designers will choose different models. Even if they choose the same data model, the chosen model might have several equivalent constructs to represent a given concept and therefore it is probable that designers will choose different constructs to represent the same concept (e.g. a real world concept may be represented as a class in one database and as a property in another). Figure 3.2 (column 2) shows examples of possible sources of heterogeneity in the different phases of the database design process.
In a MDB system, the designers are presented with a set of pre-existing and independently-owned databases. The component database designers are not necessarily the MDB designers. Moving from the real world to the represented world, i.e. database world, some real world semantics are lost. In fact, the semantic contents of a given database are governed by the expressiveness of the data model supported by its host DBMS and that DBMS itself. These factors make the design of a MDB system a very challenging process. The real world semantics of the component databases have to be clearly understood to detect and consequently reconcile heterogeneities between these databases in the integration process. Given the fact that component database schemas are all that might be available to the MDB designers, the detection and reconciliation of semantic heterogeneity in particular is very difficult. Still, such heterogeneities have to be detected and reconciled before the benefits of a MDB can be fully achieved.

3.4.2 Classification of Semantic Heterogeneity in the OO Model

Irrespective of the approach taken for database interoperability, designers are faced with the problem of comparing the information content of the various databases concerned [SPA91]. It is important to know to what extent these databases share related information and it is equally important to instruct the MDB system about such commonalities so that it can manage the global schema(s) for the underlying composite database. Unfortunately, a database schema alone is not enough to correctly interpret the real world meaning of the data it describes. Some additional source of semantic knowledge (e.g. a knowledge base, database extension or human knowledge) has to be used to help detect semantic equivalence and overcome heterogeneity between the component schemas in a MDB system.

A classification of the set of possible conflicts that may exist between a set of independently designed OO schemas is presented here. The aim is not to present a new or exhaustive taxonomy but to give the reader an idea of these types of conflict, as this will help in understanding the function of our MVDL operators. There have been a large number of articles on this topic, some of which provide an excellent treatment (see for example [GAR96, KIM91a, KIM93]). Our main contribution to this topic is the reuse of semantic information that accrues from the detection and reconciliation of semantic heterogeneity (see [DUW96a]). We reuse this information in MVDS when we create subsequent global schemas (see Chapter 8).

Figure 3.3 shows a taxonomy of semantic conflicts that may exist between OO schemas. It is based on the taxonomies of [GAR96, KIM91a, KIM93]. We believe that Kim's taxonomy [KIM91a, KIM93] is more structured and easier to understand, while Garcia-Solaco's taxonomy [GAR96] is more comprehensive. So we merged these two to produce a structured and comprehensive taxonomy. Semantic conflicts are divided into three categories: schema (meta-data) level conflicts, instance (data) level conflicts and schema-vs-instance conflicts. We next describe the major types of these conflicts and provide examples when necessary for illustrative purposes. Note that in this discussion, the figure is not followed in a top-down fashion (i.e. moving from headings to sub-headings), but rather a cross section is taken and the conflict types are interpreted in their broader contexts.

I. Schema (Meta-Data) Level Conflicts

These occur between a set of OO schemas at the data definition level, even when the schemas use the same data model and DDL. Note that only domain related schemas are dealt with here, i.e. we assume that they describe the same universe of discourse.

• Naming Conflicts: Different designers use their own terminologies and nomenclatures to describe real world concepts. This may lead to synonym and homonym problems. The former occurs when two different names are used by different designers to describe the same real world concept. For example one designer may represent a set of employees as class EMPLOYEE in one database (say DB1), while another designer may represent the same set as class WORKER in another database (say DB2). A homonym occurs when the same name is used, by different designers, to represent unrelated real world concepts. For instance, the class COURSE in DB1 may denote a set of courses taken by a student, whereas the class COURSE in DB2 may refer to the dishes available in a restaurant where that student eats. These problems may occur at the class, property and/or method levels.

• Property Conflicts: Two designers may attach different properties to semantically equivalent classes. In the example of figure 3.4, five properties (namely: name, course, address, tutor and mark) were chosen for the class STUDENT in database DB1, while only four properties (namely: st-name, course, address and grade) were selected for the class STUDENT in database DB2. Further, these designers may choose different domains for the same property. In figure 3.4, STUDENT.mark takes integer as its domain in DB1, while STUDENT.grade takes string as its domain in DB2. This type of conflict may be a scale conflict, which is usually reconciled by means of a mapping table as shown in figure 3.5. Even two properties drawn from the same domain can still be semantically heterogeneous. For instance, EMPLOYEE.salary in two databases is represented as float in both, but their values are in UK pounds in one database and US dollars in the other. Even if the two salaries are represented in the same currency (say pounds), semantic heterogeneity can still arise. For example, one salary may represent the gross salary while the other represents the net salary (i.e. after deducting taxes). This type of conflict is called a unit conflict and is usually resolved by means of mathematical formulae.

1. Schema (Meta-Data) Level Conflicts
   1.1 Class Level Inconsistencies
       1.1.1 Class vs. Class
             • Name
               --> different names for the same real world concept (synonyms)
               --> same name for different real world concepts (homonyms)
             • Properties
               ==> 1-to-1 Property
                   • name
                   • data type
                     • numerical vs. non-numerical domains (e.g. student grades as strings vs. student grades as integers)
                     • numerical domains (different dimensions, units or scales)
                     • non-numerical domains
                       • differences in string lengths
                       • differences in format (e.g. date as DD/MM/YY vs. date as MM/DD/YY)
                     • complex domains (see class aggregation hierarchy conflicts)
                   • present/absent properties
                   • missing but implicit properties
                   • temporal differences
                   • property constraints
                     • single valued vs. multi-valued properties
                     • null value conflicts (e.g. no null, null as not known vs. null as not applicable)
                     • uniqueness
                     • default value differences
               ==> M-to-N Properties
             • Class Constraints
       1.1.2 M-classes vs. N-classes
   1.2 Class Hierarchy Inconsistencies
       1.2.1 Inconsistencies along the generalisation/specialisation hierarchy
             • inconsistencies in the specialisation criteria
             • inconsistencies in the specialisation degrees and characterisations
             • inconsistencies in the specialisation kinds
             • inconsistencies in the specialisation constraints (e.g. delete effect)
       1.2.2 Inconsistencies along the aggregation/decomposition hierarchy
             • inconsistencies in the type of the aggregation (e.g. simple aggregation vs. composite aggregation)
             • inconsistencies of the characteristics of the aggregation (i.e. the same type of aggregation but different representations)
             • inconsistencies of the aggregation constraints
2. Instance (Data) Level Conflicts
   • presence/absence instances
   • value mismatch conflicts
   • for multi-valued properties: differences in the number of values a property may take
   • entity (object) identification problems
3. Schema (Meta-Data) vs. Instance (Data) Level Conflicts
   • the same concept is represented as a value in one database and as a property name in another database
   • the same concept is represented as a value in one database and as a class name in another database

Figure 3.3: Semantic Conflicts Classification

Database 1 (DB1)               Database 2 (DB2)
DB1-STUDENT                    DB2-STUDENT
  name: string                   st-name: string
  course: COURSE                 course: string
  address: ADDRESS               address: ADDRESS
  tutor: string                  grade: string
  mark: integer

Figure 3.4: An Example of Property Conflicts between Semantically Equivalent Classes

[Figure: table mapping ranges of DB1-STUDENT.mark: integer [0..100] (e.g. 70 =< mark) ===> grades A, B, C and F.]

Figure 3.5: An Example of a Reconciliation Table between Student's Marks and Grades
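The two reconciliation techniques mentioned for property conflicts, a mapping table for scale conflicts and a formula for unit conflicts, can be sketched as below. The sketch is illustrative only: the grade boundaries and the exchange rate are assumptions chosen for the example and are not taken from figure 3.5 or from any real data.

```python
# Assumed lower bounds for each grade, checked from highest to lowest.
# These boundaries are hypothetical.
GRADE_TABLE = [(70, "A"), (60, "B"), (50, "C"), (0, "F")]

# Hypothetical exchange rate used only for illustration.
USD_PER_GBP = 1.6


def mark_to_grade(mark):
    """Reconcile a scale conflict: integer mark in [0, 100] -> letter grade."""
    if not 0 <= mark <= 100:
        raise ValueError("mark must lie in [0, 100]")
    for lower_bound, grade in GRADE_TABLE:
        if mark >= lower_bound:
            return grade


def salary_in_pounds(amount, currency):
    """Reconcile a unit conflict: express a salary in UK pounds."""
    if currency == "GBP":
        return amount
    if currency == "USD":
        return amount / USD_PER_GBP
    raise ValueError("unknown currency: " + currency)
```

Note that the mapping table is lossy in one direction: a grade can be derived from a mark, but a mark cannot be recovered from a grade, so queries posed against marks can only be answered approximately from a database that stores grades.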

• Inconsistencies in the Generalisation/Specialisation Hierarchy: In the OO model a class may have a set of superclasses and/or subclasses. Therefore, an OO schema forms a Directed Acyclic Graph (DAG). The specialisation of a class into subclasses is assumed to be done according to some criteria. For example, class EMPLOYEE may be specialised into MALE-EMPLOYEE and FEMALE-EMPLOYEE according to sex. Heterogeneity arises when the same class is specialised into different subclasses based on different criteria. Figure 3.6 illustrates this for the class EMPLOYEE. The number of subclasses a given class has is called its specialisation degree. Conflicts occur if two designers choose a different number of subclasses for the same class. For instance, the same class may have N subclasses and M subclasses in different databases where M and N are different. Even if the two classes have the same number of subclasses, they may disagree on the specialisation characterisation. In figure 3.7, a person is classified as an adult in the first database if over 20 years old and in the second if over 18 years old. Four kinds of specialisation hierarchy can be identified: disjoint (if each object of the superclass is a member of at most one of its subclasses), complementary (if each object of the superclass is a member of at least one of its subclasses), alternative (if each object of the superclass is a member of exactly one of its subclasses) and general (if none of these constraints apply to the specialisation). Heterogeneities occur in this dimension if two semantically equivalent classes have different specialisation kinds; for example, the specialisation may be disjoint in one database while complementary in another. Finally there may be constraints attached to the specialisation hierarchy; heterogeneity arises here if there are different or conflicting constraints attached to the specialisation hierarchy for the same class in different schemas.

Database 1                     Database 2
EMPLOYEE                       EMPLOYEE
  FEMALE-EMPLOYEE                MANAGER
  MALE-EMPLOYEE                  SALESPERSON
                                 SECRETARY

Figure 3.6: An Example of Inconsistencies in the Specialisation Criteria

• Inconsistencies in the Aggregation/Decomposition Hierarchy: Aggregation is divided broadly into three kinds [GAR96]: simple, composite and collective aggregations. In simple aggregation, a class is an aggregation of its properties. Some of these properties might have complex classes as their domains (data types). The aggregated class does not own the classes it refers to, it simply refers to them. Composite aggregation, by comparison, is used to represent the part-of relationship among objects. A complex object is created by aggregating other objects as its sub-parts. Its existence is dependent on the other objects. For example, a class ENGINE is part-of class CAR. A car cannot exist without its engine. Lastly, in collective aggregation an aggregated object is created by collecting a number of objects of just one class. For instance, a CLUB is a collection of PERSONs. This type of aggregation is further subdivided into disjoint (where the component object can belong only to one object of the aggregated class), covering (where each object of the component class has to belong to at least one collection), partitioning (where each object of the component class must be a component of exactly one collection) and general (where no restrictions apply). Semantic heterogeneity occurs in the aggregation hierarchy when related classes in different databases are created by different kinds of aggregation. For example, a class CAR may refer to class ENGINE via simple aggregation in one database, while in another database, class CAR treats ENGINE as its sub-part through composite aggregation (i.e. a car cannot exist without its engine). Alternatively, semantic heterogeneity may arise even if the related classes are aggregated using the same aggregation kind. For instance, class STUDENT created by simple aggregation in two databases may still be semantically heterogeneous (see figure 3.4). Finally, inconsistencies in the aggregation constraints may lead to semantic heterogeneity.
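The four specialisation kinds defined above differ only in how many subclasses each superclass object may belong to, which the following minimal sketch makes explicit. It is an assumption-laden illustration: it inspects one snapshot of the class extents (sets of hypothetical object identifiers), whereas in a schema the kind is a declared constraint that must hold for every database state.

```python
def specialisation_kind(superclass_extent, subclass_extents):
    """Classify a specialisation from one snapshot of its extents.

    superclass_extent: set of object identifiers of the superclass.
    subclass_extents: list of sets, one per subclass.
    """
    # Number of subclasses each superclass object belongs to.
    counts = [sum(oid in extent for extent in subclass_extents)
              for oid in superclass_extent]
    at_most_one = all(count <= 1 for count in counts)
    at_least_one = all(count >= 1 for count in counts)
    if at_most_one and at_least_one:
        return "alternative"      # exactly one subclass per object
    if at_most_one:
        return "disjoint"         # at most one subclass per object
    if at_least_one:
        return "complementary"    # at least one subclass per object
    return "general"              # no constraint holds
```

Two semantically equivalent classes whose extents classify differently under such a check would exhibit the specialisation-kind heterogeneity described above. The same counting argument applies to the disjoint, covering, partitioning and general sub-kinds of collective aggregation.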

[Figure 3.7: class PERSON specialised into ADULT and TEENAGER by age in two databases with different age boundaries (e.g. ADULT is age > 20 in Database 1).]

...
    writes;
    attribute string dob;
    attribute string nationality;
    attribute string author_name;
};

interface v20_paper : object ( ) {
    attribute string title;
    attribute string journal;
    attribute short vol_no;
    attribute short issue_number;
    attribute short year;
    attribute v20_virus concerns;
};

interface db3_virus : object ( ) {
    attribute string virus_name;
    attribute string description;
};

define rule v20_author_rule1
    on retrieve to v20_author.writes, v20_author.dob, v20_author.nationality, v20_author.author_name
    do instead retrieve db3_author.writes, db3_author.dob, db3_author.nationality, db3_author.author_name
    where db3_author =g db4_author;

define rule v20_author_rule2
    on retrieve to v20_author.writes, v20_author.dob, v20_author.nationality, v20_author.author_name
    do instead retrieve db3_author.writes, db3_author.dob, db3_author.nationality, db3_author.author_name
    where Current.OID = db3_author.OID & not(db3_author =g db4_author);

define rule v20_author_rule3
    on retrieve to v20_author.writes, v20_author.dob, v20_author.nationality, v20_author.author_name
    do instead retrieve db4_author.writes, db4_author.date_of_birth, db4_author.nationality, db4_author.name
    where Current.OID = db4_author.OID & not(db3_author =g db4_author);

define rule v20_paper_rule1
    on retrieve to v20_paper.title, v20_paper.journal, v20_paper.vol_no, v20_paper.issue_number, v20_paper.year, v20_paper.concerns
    do instead retrieve db3_paper.title, db3_paper.journal, db3_paper.vol_no, db3_paper.issue_number, db3_paper.year, db3_paper.concerns
    where db3_paper =g db4_paper;

define rule v20_paper_rule2
    on retrieve to v20_paper.title, v20_paper.journal, v20_paper.vol_no, v20_paper.issue_number, v20_paper.year, v20_paper.concerns
    do instead retrieve db3_paper.title, db3_paper.journal, db3_paper.vol_no, db3_paper.issue_number, db3_paper.year, db3_paper.concerns
    where Current.OID = db3_paper.OID & not(db3_paper =g db4_paper);

define rule v20_paper_rule3
    on retrieve to v20_paper.title, v20_paper.journal, v20_paper.vol_no, v20_paper.issue_number, v20_paper.year, v20_paper.concerns
    do instead retrieve db4_paper.title, db4_paper.journal, db4_paper.vol_number, db4_paper.issue_number, db4_paper.year, db4_paper.concerns
    where Current.OID = db4_paper.OID & not(db3_paper =g db4_paper);

define rule db3_virus_rule1
    on retrieve to db3_virus.virus_name, db3_virus.description
    do instead retrieve db3_virus.virus_name, db3_virus.description
    where db3_virus =g db4_virus;

define rule db3_virus_rule2
    on retrieve to db3_virus.virus_name, db3_virus.description
    do instead retrieve db3_virus.virus_name, db3_virus.description
    where Current.OID = db3_virus.OID & not(db3_virus =g db4_virus);

define rule db3_virus_rule3
    on retrieve to db3_virus.virus_name, db3_virus.description
    do instead retrieve db4_virus.virus_name, db4_virus.virus_description
    where Current.OID = db4_virus.OID & not(db3_virus =g db4_virus);

v21: Published Papers View

Here, assume another group of users needs only information that describes papers published by all local databases. A view to meet these requirements, called v21, can be created by importing the class v11_paper to the view v21. Our view closure algorithm is then used to load the class v21_virus to type close the new view. The following depicts MVDS output (in ODL) for the view v21:

interface v21_paper : object ( ) {
    attribute string title;
    attribute string journal;
    attribute short vol_no;
    attribute short issue_number;
    attribute short year;
    attribute v21_virus concerns;
};

interface v21_virus : object ( ) {
    attribute string virus_name;
    attribute string description;
};

define rule v21_paper_rule1
    on retrieve to v21_paper.title, v21_paper.journal, v21_paper.vol_no, v21_paper.issue_number, v21_paper.year, v21_paper.concerns
    do instead retrieve db3_paper.title, db3_paper.journal, db3_paper.vol_no, db3_paper.issue_number, db3_paper.year, db3_paper.concerns
    where db3_paper =g db4_paper;

define rule v21_paper_rule2
    on retrieve to v21_paper.title, v21_paper.journal, v21_paper.vol_no, v21_paper.issue_number, v21_paper.year, v21_paper.concerns
    do instead retrieve db3_paper.title, db3_paper.journal, db3_paper.vol_no, db3_paper.issue_number, db3_paper.year, db3_paper.concerns
    where Current.OID = db3_paper.OID & not(db3_paper =g db4_paper);

define rule v21_paper_rule3
    on retrieve to v21_paper.title, v21_paper.journal, v21_paper.vol_no, v21_paper.issue_number, v21_paper.year, v21_paper.concerns
    do instead retrieve db4_paper.title, db4_paper.journal, db4_paper.vol_number, db4_paper.issue_number, db4_paper.year, db4_paper.concerns
    where Current.OID = db4_paper.OID & not(db3_paper =g db4_paper);

define rule v21_virus_rule1
    on retrieve to v21_virus.virus_name, v21_virus.description
    do instead retrieve db3_virus.virus_name, db3_virus.description
    where db3_virus =g db4_virus;

define rule v21_virus_rule2
    on retrieve to v21_virus.virus_name, v21_virus.description
    do instead retrieve db3_virus.virus_name, db3_virus.description
    where Current.OID = db3_virus.OID & not(db3_virus =g db4_virus);

define rule v21_virus_rule3
    on retrieve to v21_virus.virus_name, v21_virus.description
    do instead retrieve db4_virus.virus_name, db4_virus.virus_description
    where Current.OID = db4_virus.OID & not(db3_virus =g db4_virus);
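The three rules generated for each view class follow one pattern, which can be paraphrased in Python as a rough sketch. The reading adopted here is our assumption: `=g` is taken to hold when a db3 object and a db4 object denote the same real-world entity, rule 1 and rule 2 both answer from db3, and rule 3 answers from db4 with renamed attributes. The function and parameter names are hypothetical, records are plain dictionaries, and the writes attribute is omitted for brevity.

```python
def merge_authors(db3_authors, db4_authors, same_entity):
    """Build the extent of a merged author class from two local extents.

    same_entity(a3, a4) is a caller-supplied stand-in for the =g test.
    """
    view = []
    matched_db4 = set()
    for a3 in db3_authors:
        # Remember which db4 objects already have a db3 counterpart.
        for index, a4 in enumerate(db4_authors):
            if same_entity(a3, a4):
                matched_db4.add(index)
        # Rules 1 and 2: matched or not, db3 supplies the values.
        view.append({"author_name": a3["author_name"],
                     "dob": a3["dob"],
                     "nationality": a3["nationality"]})
    for index, a4 in enumerate(db4_authors):
        if index not in matched_db4:
            # Rule 3: db4-only objects, with name and date_of_birth renamed.
            view.append({"author_name": a4["name"],
                         "dob": a4["date_of_birth"],
                         "nationality": a4["nationality"]})
    return view
```

Under this reading, an entity known to both databases appears once in the view, with its db3 representation taking precedence.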

Appendix E A Comparison of UniSQL/X and ODMG Object Models This appendix compares the ODMG-93 and UniSQL/X object models. It does not attempt to conform UniSQL/X to the ODMG-93 standard, rather it provides a summary of issues that we had to consider when we tested MVDS with UniSQL/X schemas. The following table summarises the major similarities and di erences between the above two models:

200

Feature | The ODMG-93 Object Model | The UniSQL/X Object Model
--------+--------------------------+--------------------------
Types | Supported. | Supported.
Methods | Supported. | Supported.
Attributes | Supported. | Supported.
Multiple Inheritance | Supported. | Supported.
Keys | Users may define key(s) for types. | Supported through the use of UNIQUE NOT NULL.
Extents | Users decide whether the ODBMS should automatically maintain extents for types. Extent maintenance includes inserting newly created instances in the extent set and removing instances from it as they are deleted. | Automatically maintains an extent for each defined class (type).
Relationships | Supports binary relationships. | Not supported.
Collections (e.g. sets) | Homogeneous collections only. | Homogeneous and heterogeneous collections.
Class Attributes (i.e. classes as first class objects) | Not supported. | Supported.
Object Identifiers | Supported. | Supported.
Object Names | Supported and used as direct entry names. | Not supported.
Attributes as first class objects | Not supported. | Not supported.


Appendix F
UniSQL/X Grammar

The following BNF specification is taken from the UniSQL/X User's Manual, Volume 2. Any phrase ending with "comma list" implies that the items repeated in the list are separated by commas. A phrase that ends with just "list" implies that commas are not used to separate the list items.

create_statement ::
    CREATE {CLASS | TABLE} class_name
    [subclass_definition]
    [CLASS ATTRIBUTE (attribute_definition_comma_list)]
    [(attribute_definition_comma_list)]
    [(attr_definition | class_constraint [{, attr_definition | class_constraint_definition} ... ] ) ]
    [METHOD method_definition_comma_list]
    [FILE method_file_comma_list]
    [INHERIT resolution_comma_list]

subclass_definition ::
    {UNDER | AS SUBCLASS OF} class_name_comma_list

attribute_definition ::
    general_attribute_data_name data_type [shared_or_default] [attribute_constraint_list]

class_constraint_definition ::
    [CONSTRAINT constraint_name] UNIQUE (attribute_name [{, attribute_name} ...])


method_definition ::
    general_method_name [( [argument_type_comma_list] )] [result_type] [FUNCTION function_name]

default_or_shared ::
    SHARED [value_specification]
  | DEFAULT value_specification

attribute_constraint ::
    NOT NULL
  | UNIQUE

resolution ::
    {attribute_name | method_name} OF class_name [AS {attribute_name | method_name}]

class_constraint_definition ::
    [CONSTRAINT constraint_name] UNIQUE (attribute_name [{, attribute_name} ... ])


Appendix G
Prolog Predicates for Expressing MVDS Views in ODMG ODL

% A: a list of properties.
% D: a list of data-types.
% R: a list of established conflict reconciliation rules.
% Super: a list of superclasses.
% Ext: Extent.
% St: Stream name.
write_interface(St, Name, Ext, Keys, Super, A, D, R):-
    write(St, 'interface '), write(St, Name),
    write_optional_inheritance(St, Super), nl(St),
    write(St, '('), write_extent(St, Ext), nl(St),
    write_keys(St, Keys),
    write(St, ')'), nl(St),
    write(St, '{'), nl(St),
    write_properties(St, A, D, R), nl(St),
    write(St, '};').

write_extent(_, []).
write_extent(St, [Ext]):-
    write(St, 'extent '), write(St, Ext), nl(St).
write_extent(St, Ext):-
    write(St, 'extent '), write(St, Ext), nl(St).


write_optional_inheritance(_, []).
write_optional_inheritance(St, [H|T]):-
    write(St, ' : '), write(St, H),
    write_optional_inheritance(St, T).

write_keys(_St, []).
write_keys(St, List):-
    atom(List),                        % the text widget puts a '' around keys
    name(List, L),
    delete(L, 91, L1),                 % 91 is the character code of '['
    delete(L1, 93, L2),                % 93 is the character code of ']'
    name(Msg, L2),                     % length of Msg = 1 in this case
    wr_keys(St, 1, [Msg]).
write_keys(St, List):-
    length(List, L),
    wr_keys(St, L, List).

wr_keys(_, 0, _).
wr_keys(St, 1, [composite(L)]):-
    write(St, ' key ('), wr_list(St, L), write(St, ')').
wr_keys(St, 1, [H]):-
    write(St, ' key '), write(St, H).
wr_keys(St, _, Keys):-
    write(St, ' keys '),
    write_ls(St, Keys).

write_ls(_St, []).
write_ls(St, [composite(L)]):-
    write(St, '('), wr_list(St, L), write(St, ')').
write_ls(St, [H]):-
    write(St, H).
write_ls(St, [composite(L)|T]):-
    write(St, '('), wr_list(St, L), write(St, '), '),
    write_ls(St, T).
write_ls(St, [H|T]):-
    write(St, H), write(St, ', '),
    write_ls(St, T).

wr_list(_, []).
wr_list(St, [H]):-
    write(St, H).
wr_list(St, [H|T]):-
    write(St, H), write(St, ', '),
    wr_list(St, T).

write_properties(St, A, D, R):-
    structs(A, D, R, Sa, Sd),
    difference(A, Sa, Rem1), difference(D, Sd, Rem2),
    rels(Rem1, Rem2, Ra, Rd),
    difference(Rem1, Ra, Att), difference(Rem2, Rd, Dom),
    write_attributes(St, Att, Dom),
    write_structs(St, Sa, Sd),
    write_rel(St, Ra, Rd).

% c_vs_c conflicts are always reconciled by
% creating a virtual class, these arise when
% the pair of attributes concerned have complex
% data-types.
% literal_class is a frame that represents structs.
structs([], [], [], [], []).
structs([_H1|T1], [_D1|T2], [RuleType|T3], T4, T5):-
    (RuleType = c_vs_c1(_); RuleType = c_vs_c2(_)), !,
    structs(T1, T2, T3, T4, T5).
structs([H1|T1], [D1|T2], [_RuleType|T3], [H1|T4], [D1|T5]):-
    get_dom_name(D1, D_name),          % to avoid collections
    literal_class(_, parent_att(H1), name(D_name), _, _), !,
    structs(T1, T2, T3, T4, T5).
structs([_H1|T1], [_D1|T2], [_RuleType|T3], T4, T5):-
    structs(T1, T2, T3, T4, T5).

% to extract and represent relationships
rels([], [], [], []).
rels([H|T1], [D|T2], [H|T3], [D|T4]):-
    get_dom_name(D, D_name),
    rel(name(H), _, to(D_name), _, _, _), !,
    rels(T1, T2, T3, T4).
rels([_H|T1], [_D|T2], T3, T4):-
    rels(T1, T2, T3, T4).

write_attributes(_, [], []).
write_attributes(St, [H|T1], [D|T2]):-
    get_dom_name(D, D), !,             % single-value attribute
    write(St, 'attribute '),
    write_single_dom(St, D), write(St, ' '),
    write(St, H), write(St, '; '), nl(St),
    write_attributes(St, T1, T2).
write_attributes(St, [H|T1], [D|T2]):-
    get_dom_name(D, D_name),           % multi-value attribute
    get_dom_card(D, Card),
    write(St, 'attribute '),
    write(St, Card), write(St, ' '), write(St, H), write(St, ';'),
    nl(St),
    write_attributes(St, T1, T2).

write_single_dom(St, D):-
    complex_dom1(D), !,
    name(D, D1),
    separate_class_schema(D1, Class_list, _),
    name(Class, Class_list),
    prefix_view_class_name(Class, Nclass),
    write(St, Nclass).
write_single_dom(St, D):-
    write(St, D).

% This predicate takes care of derived attributes as well
complex_dom1(D):-
    collect_all_class_names(All),
    member(D, All).

write_structs(_, [], []).
write_structs(St, [H|T1], [D|T2]):-
    get_dom_name(D, D), !,             % single value
    name(D, D1),
    separate_class_schema(D1, Dom, _),
    name(DD, Dom),
    literal_class(_, parent_att(H), name(D), properties(Prop), domains(DOMS)),
    write(St, 'attribute struct '), write(St, DD),
    write(St, ' {'),
    write_struct_properties(St, Prop, DOMS),
    write(St, '} '), write(St, H), write(St, '; '),
    nl(St),
    write_structs(St, T1, T2).
write_structs(St, [H|T1], [D|T2]):-
    get_dom_name(D, D1),
    get_dom_card(D, Card),
    name(D1, D1_list),
    separate_class_schema(D1_list, D_list, _),
    name(D_name, D_list),
    literal_class(_, parent_att(H), name(D1), properties(Prop), domains(DOMS)),
    write(St, 'attribute '),
    write(St, ...),                    % argument missing in the source text
    write(St, ' '), write(St, H), write(St, '; '),
    nl(St),
    write_structs(St, T1, T2).

write_struct_properties(_St, [], []).
write_struct_properties(St, [H], [H1]):-
    write(St, H1), write(St, ' '), write(St, H).
write_struct_properties(St, [H1|T1], [H2|T2]):-
    write(St, H2), write(St, ' '), write(St, H1),
    write(St, ', '),
    write_struct_properties(St, T1, T2).

write_rel(_, [], []).
write_rel(St, [H|T1], [D|T2]):-
    get_dom_name(D, D),
    name(D, D1),
    separate_class_schema(D1, D_list, _),
    name(D_name, D_list),
    prefix_view_class_name(D_name, D_out),
    rel(name(H), _, to(D), inverse(Inv), _, order_by(Order)),
    write(St, 'relationship '),
    write(St, D_out), write(St, ' '), write(St, H),
    write_inverse(St, Inv, D_out), write_order(St, Order), write(St, '; '), nl(St),
    write_rel(St, T1, T2).
write_rel(St, [H|T1], [D|T2]):-
    get_dom_name(D, D1),
    name(D1, D1_list),
    separate_class_schema(D1_list, D_list, _),
    name(D_name, D_list),
    prefix_view_class_name(D_name, D_out),
    rel(name(H), _, to(D1), inverse(Inv), multiplicity(Card), order_by(Order)),
    write(St, 'relationship '), write(St, Card),
    write(St, ' < '),
    write(St, D_out), write(St, '> '), write(St, H),
    write_inverse(St, Inv, D_out), write_order(St, Order), write(St, '; '), nl(St),
    write_rel(St, T1, T2).

write_inverse(_St, [], _).
write_inverse(St, [Inv], D_out):-
    nl(St), write(St, '    '),
    write(St, ' inverse '), write(St, D_out),
    write(St, '::'), write(St, Inv).

write_order(_, []).
write_order(St, [ClassName, Att]):-
    nl(St), write(St, '    '),
    write(St, '{ '), write(St, 'order_by '),
    write(St, ClassName),
    write(St, '::'), write(St, Att), write(St, '}').
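The key-writing predicates above (write_keys, wr_keys, write_ls and wr_list) can be hard to follow because the output is assembled from many small write/2 calls. The Python sketch below mirrors what those predicates appear to emit, assuming a composite key is modelled as a tuple of attribute names where the Prolog code uses composite(L). The function names are hypothetical and the sketch is an illustration only, not part of MVDS.

```python
def render_key(key):
    """Render a single key; a tuple stands in for Prolog's composite(L)."""
    if isinstance(key, tuple):
        return "(" + ", ".join(key) + ")"
    return key


def render_keys(keys):
    """Return the ODL key clause for a list of simple or composite keys."""
    if not keys:
        return ""                      # no keys: nothing is written
    if len(keys) == 1:
        # one key: ' key name' or ' key (a, b)' for a composite key
        return " key " + render_key(keys[0])
    # several keys: ' keys k1, (a, b), k3'
    return " keys " + ", ".join(render_key(k) for k in keys)
```

For example, `render_keys(["isbn", ("title", "year")])` yields ` keys isbn, (title, year)`; the attribute names here are invented for the example.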

Appendix H
Prolog Predicates for Expressing MVDS Views in UniSQL/X

% A: a list of properties.
% D: a list of data-types.
% Super: a list of superclasses.
% Ext: Extent (not applicable to UNISQL).
% St: Stream name.
write_interface(St, Name, _Ext, _Keys, Super, A, D, _):-
    write(St, 'create class '), write(St, Name), nl(St),
    optional_unisql_subclass_def(St, Super), nl(St),
    optional_unisql_attribute_definition(St, A, D), nl(St).

% no superclasses
optional_unisql_subclass_def(_St, []).
optional_unisql_subclass_def(_St, [object]).
optional_unisql_subclass_def(St, Superclass_list):-
    write(St, 'as subclass of '),
    remove_dups(Superclass_list, NewList),
    delete(NewList, object, Result),
    write_unisql_subclass_def(St, Result).


write_unisql_subclass_def(_St, []).
write_unisql_subclass_def(St, [H]):-
    write(St, H), nl(St).
write_unisql_subclass_def(St, [H|T]):-
    write(St, H), write(St, ', '),
    write_unisql_subclass_def(St, T).

% no attributes
optional_unisql_attribute_definition(_St, [], []).
optional_unisql_attribute_definition(St, A, D):-
    write(St, '('), nl(St),
    write_unisql_attribute_definition(St, A, D), nl(St),
    write(St, ')'), nl(St).

write_unisql_attribute_definition(_, [], []).
write_unisql_attribute_definition(St, [A], [D]):-
    write(St, '    '), write(St, A), write(St, '    '),
    write_unisql_data_type(St, D), nl(St).
write_unisql_attribute_definition(St, [A1|At], [D1|Dt]):-
    write(St, '    '), write(St, A1), write(St, '    '),
    write_unisql_data_type(St, D1), write(St, ', '), nl(St),
    write_unisql_attribute_definition(St, At, Dt).

write_unisql_data_type(St, D):-
    get_dom_name(D, D), !,             % e.g. string, s1_address
    write(St, D).
write_unisql_data_type(St, D):-        % collection
    write_unisql_collection(St, D).


write_unisql_collection(St, D):-
    get_dom_name(D, Dname),            % collection data type
    functor(D, Data_type, _),
    write(St, Data_type), write(St, '('), write(St, Dname), write(St, ')').
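To make the shape of the generated text easier to see, the following Python sketch approximates what the predicates of this appendix write for one class. It is a simplified illustration under stated assumptions rather than a translation of the Prolog code: the function names are hypothetical, a collection type is modelled as a (kind, element) pair, whitespace is normalised, and the implicit superclass `object` is dropped before the subclass clause is written.

```python
def render_type(data_type):
    """Render a simple type name or a (kind, element) collection pair."""
    if isinstance(data_type, tuple):
        kind, element = data_type
        return f"{kind}({element})"    # e.g. ("set", "string") -> set(string)
    return data_type


def unisql_create_class(name, superclasses, attributes):
    """Return a 'create class' statement for one view class.

    `attributes` is a list of (attribute_name, data_type) pairs.
    """
    lines = [f"create class {name}"]
    # Remove duplicates (keeping order) and the implicit root class.
    supers = [s for s in dict.fromkeys(superclasses) if s != "object"]
    if supers:
        lines.append("as subclass of " + ", ".join(supers))
    if attributes:
        lines.append("(")
        lines.append(",\n".join(f"    {a} {render_type(t)}" for a, t in attributes))
        lines.append(")")
    return "\n".join(lines)
```

Calling `unisql_create_class("v21_virus", ["object"], [("virus_name", "string"), ("description", "string")])` produces a statement of the form permitted by the create_statement grammar in Appendix F.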


Appendix I
Integrating n Classes Using MVDL Operators

The following paragraphs discuss how n classes can be integrated using the current version of MVDS. $C_1, \ldots, C_n$ are the local classes to be integrated. $G_1, G_2, \ldots, G_m$ are the global classes generated by applying MVDL operators to this set of local classes. $\mathrm{prop}(C)$ is a function that returns the properties of the class $C$, and $\mathrm{ext}(C)$ is a function that returns the set of instances that belong to the class $C$.

• Include, Include1, Include*, Include1*: these operators are independent of the integration strategy followed. Recall that these operators only import classes from the local schemas to a global view (see Section 5.3).

• n-ary Union:
The syntax of this operator is: $G_m = \text{Union}(C_1, C_2, \ldots, C_n)$, where $\mathrm{prop}(G_m) = \mathrm{prop}(C_1) \cap \mathrm{prop}(C_2) \cap \ldots \cap \mathrm{prop}(C_{n-1}) \cap \mathrm{prop}(C_n)$ and $\mathrm{ext}(G_m) = \mathrm{ext}(C_1) \cup \mathrm{ext}(C_2) \cup \ldots \cup \mathrm{ext}(C_{n-1}) \cup \mathrm{ext}(C_n)$. This can be achieved by using $G_1 = \text{Union}(C_1, C_2)$, followed by $G_2 = \text{Union*}(G_1, C_3)$, ..., $G_m = \text{Union*}(G_{m-1}, C_n)$. The intermediate classes $G_1, G_2, \ldots, G_{m-1}$ are deleted from the global view afterwards (see footnote 1).

Footnote 1: These classes are deleted from the user's view. Their internal representation is kept for query processing.

• n-ary Union1:
The syntax of this operator is: $G_m = \text{Union1}(C_1, C_2, \ldots, C_n)$, where $\mathrm{prop}(G_m) = \mathrm{prop}(C_1) \cap \mathrm{prop}(C_2) \cap \ldots \cap \mathrm{prop}(C_{n-1}) \cap \mathrm{prop}(C_n)$ and $\mathrm{ext}(G_m) = \mathrm{ext}(C_1) \cup \mathrm{ext}(C_2) \cup \ldots \cup \mathrm{ext}(C_{n-1}) \cup \mathrm{ext}(C_n)$. This can be achieved by using $G_1 = \text{Union1}(C_1, C_2)$, followed by $G_2 = \text{Union1}(G_1, C_3)$, ..., $G_m = \text{Union1}(G_{m-1}, C_n)$. Strictly speaking, n-ary Union1 is achieved by repeating the operator Union1 $n - 1$ times.

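• n-ary Union*:
The syntax of this operator is: $G_m = \text{Union*}(C_1, C_2, \ldots, C_n)$, where $\mathrm{prop}(G_m) = \mathrm{prop}(C_1) \cap \mathrm{prop}(C_2) \cap \ldots \cap \mathrm{prop}(C_{n-1}) \cap \mathrm{prop}(C_n)$ and $\mathrm{ext}(G_m) = \mathrm{ext}(C_1) \cup \mathrm{ext}(C_2) \cup \ldots \cup \mathrm{ext}(C_{n-1}) \cup \mathrm{ext}(C_n)$. This can be achieved by using $G_1 = \text{Union*}(C_1, C_2)$, followed by $G_2 = \text{Union*}(G_1, C_3)$, ..., $G_m = \text{Union*}(G_{m-1}, C_n)$. The intermediate classes $G_1, G_2, \ldots, G_{m-1}$ are deleted from the global view afterwards.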

• n-ary Intersect:
The syntax of this operator is: $G_m = \text{Intersect}(C_1, C_2, \ldots, C_n)$, where $\mathrm{prop}(G_m) = \mathrm{prop}(C_1) \cup \mathrm{prop}(C_2) \cup \ldots \cup \mathrm{prop}(C_{n-1}) \cup \mathrm{prop}(C_n)$ and $\mathrm{ext}(G_m) = \mathrm{ext}(C_1) \cap \mathrm{ext}(C_2) \cap \ldots \cap \mathrm{ext}(C_{n-1}) \cap \mathrm{ext}(C_n)$. This can be achieved by using $G_1 = \text{Intersect}(C_1, C_2)$, followed by $G_2 = \text{Intersect}(G_1, C_3)$, ..., $G_m = \text{Intersect}(G_{m-1}, C_n)$. The intermediate classes $G_1, G_2, \ldots, G_{m-1}$ are deleted from the global view afterwards.

• n-ary Intersect1:
The syntax of this operator is: $G_m = \text{Intersect1}(C_1, C_2, \ldots, C_n)$, where $\mathrm{prop}(G_m) = \mathrm{prop}(C_1) \cup \mathrm{prop}(C_2) \cup \ldots \cup \mathrm{prop}(C_{n-1}) \cup \mathrm{prop}(C_n)$ and $\mathrm{ext}(G_m) = \mathrm{ext}(C_1) \cap \mathrm{ext}(C_2) \cap \ldots \cap \mathrm{ext}(C_{n-1}) \cap \mathrm{ext}(C_n)$. This can be achieved by using $G_1 = \text{Intersect1}(C_1, C_2)$, followed by $G_2 = \text{Intersect1}(G_1, C_3)$, ..., $G_m = \text{Intersect1}(G_{m-1}, C_n)$. Strictly speaking, the effects of n-ary Intersect1 are achieved by repeating the operator Intersect1 $n - 1$ times.

• n-ary Combine:
The syntax of this operator is: $G_m = \text{Combine}(C_1, C_2, \ldots, C_n)$, where $\mathrm{prop}(G_m) = \mathrm{prop}(C_1) \cup \mathrm{prop}(C_2) \cup \ldots \cup \mathrm{prop}(C_{n-1}) \cup \mathrm{prop}(C_n)$ and $\mathrm{ext}(G_m) = \mathrm{ext}(C_1) \cup \mathrm{ext}(C_2) \cup \ldots \cup \mathrm{ext}(C_{n-1}) \cup \mathrm{ext}(C_n)$. This can be achieved by $G_1 = \text{Combine}(C_1, C_2)$, $G_2 = \text{Combine}(G_1, C_3)$, ..., $G_m = \text{Combine}(G_{m-1}, C_n)$. Strictly speaking, the effects of n-ary Combine are achieved by repeating the operator Combine $n - 1$ times.

• n-ary Combine*:
The syntax of this operator is: $G_m = \text{Combine*}(C_1, C_2, \ldots, C_n)$, where $\mathrm{prop}(G_m) = \mathrm{prop}(C_1) \cup \mathrm{prop}(C_2) \cup \ldots \cup \mathrm{prop}(C_{n-1}) \cup \mathrm{prop}(C_n)$ and $\mathrm{ext}(G_m) = \mathrm{ext}(C_1) \cup \mathrm{ext}(C_2) \cup \ldots \cup \mathrm{ext}(C_{n-1}) \cup \mathrm{ext}(C_n)$. The same is applied to the subclasses of $C_1, C_2, \ldots, C_n$. This can be achieved by $G_1 = \text{Combine*}(C_1, C_2)$, $G_2 = \text{Combine*}(G_1, C_3)$, ..., $G_m = \text{Combine*}(G_{m-1}, C_n)$. Strictly speaking, the effects of n-ary Combine* are achieved by repeating the operator Combine* $n - 1$ times.

• Aggregate:
This operator is independent of the integration strategy.

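• Difference:
The syntax of this operator is: $G_m = \text{Difference}(C_1, C_2, \ldots, C_n)$, where $\mathrm{prop}(G_m) = \mathrm{prop}(C_1)$ and $\mathrm{ext}(G_m) = \mathrm{ext}(C_1) - (\mathrm{ext}(C_2) \cup \mathrm{ext}(C_3) \cup \ldots \cup \mathrm{ext}(C_n))$. This can be achieved by $G_1 = \text{Difference}(C_1, C_2)$, $G_2 = \text{Difference}(G_1, C_3)$, ..., $G_m = \text{Difference}(G_{m-1}, C_n)$.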

• n-ary Connect:
The syntax of this operator is: $\text{Connect}(C_1, C_2, \ldots, C_n)$, where $C_1$ is-a $C_n$, $C_2$ is-a $C_n$, ..., $C_{n-1}$ is-a $C_n$ (i.e. $C_n$ becomes a superclass of $C_1, C_2, \ldots, C_{n-1}$). The semantics of n-ary Connect can be achieved by $\text{Connect}(C_1, C_n)$, $\text{Connect}(C_2, C_n)$, ..., $\text{Connect}(C_{n-1}, C_n)$.

We agree that it is sometimes tedious to integrate n classes, but this is not an inherent drawback of MVDS or MVDL. This complexity exists because the participating databases are most probably heterogeneous. Therefore, for a given pair of schemas it is necessary to obtain user decisions regarding the most suitable global representation. However, this integration is performed once, and the end-user sees a normal global view in his/her preferred representation.

One might criticise the complexity of the materialisation rules in the case of n-ary integration. The materialisation rule for a global class that was generated by integrating n local classes has a nested structure (i.e. the rule at a higher level refers to another rule at a lower level and so forth until a base class is reached). This dramatically affects the performance of a query processor. One way of avoiding this is to flatten the nested rules, i.e. transform them into rules that are based on base classes only. This may be complemented by fetching the query results in advance, thereby avoiding execution of the set of materialisation rules each time a global query is issued.
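The flattening idea mentioned above can be illustrated with a small Python sketch. It is only a sketch under stated assumptions, not the MVDS rule representation: each derived class is assumed to be recorded as a hypothetical triple (operator, left operand, right operand), any name without such a record is treated as a base class, and nested applications are merged only for the operators assumed here to be associative on sets.

```python
# Operators whose nested applications can be merged safely in this sketch.
ASSOCIATIVE = {"Union", "Intersect", "Combine"}


def flatten(name, definitions):
    """Expand `name` until only base classes remain.

    `definitions` maps a derived class name to (operator, left, right).
    The result is either a base class name or a tuple
    (operator, operand, operand, ...) over base classes and sub-expressions.
    """
    if name not in definitions:
        return name                     # base class: nothing to expand
    op, left, right = definitions[name]
    operands = []
    for side in (flatten(left, definitions), flatten(right, definitions)):
        if op in ASSOCIATIVE and isinstance(side, tuple) and side[0] == op:
            operands.extend(side[1:])   # merge a nested use of the same operator
        else:
            operands.append(side)
    return (op, *operands)
```

With `definitions = {"G1": ("Union", "C1", "C2"), "G2": ("Union", "G1", "C3")}`, `flatten("G2", definitions)` refers to `C1`, `C2` and `C3` directly, so a query on `G2` would no longer pass through the rule for `G1`.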

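The binary decompositions listed in this appendix all follow the same pattern: apply the two-class operator to the first pair, then fold each remaining class into the result. A minimal Python sketch of that pattern is given below. It models a class only by its set of property names and its set of instances, ignores conflict reconciliation, object identity and subclass handling, and uses hypothetical names (`Cls`, `n_ary`) that do not come from MVDS.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Cls:
    props: frozenset   # property names of the class
    ext: frozenset     # instances of the class


def union(a, b):
    """Common properties, all instances."""
    return Cls(a.props & b.props, a.ext | b.ext)


def intersect(a, b):
    """All properties, common instances."""
    return Cls(a.props | b.props, a.ext & b.ext)


def combine(a, b):
    """All properties, all instances."""
    return Cls(a.props | b.props, a.ext | b.ext)


def difference(a, b):
    """Properties of the first class, instances not in the second."""
    return Cls(a.props, a.ext - b.ext)


def n_ary(op, classes):
    """Apply a binary operator n - 1 times: op(...op(op(C1, C2), C3)..., Cn)."""
    return reduce(op, classes)
```

For instance, `n_ary(difference, [c1, c2, c3])` removes from `c1` every instance that also occurs in `c2` or `c3`, which matches the definition of the n-ary Difference extent given above.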

Bibliography

[ADI80] M. E. Adiba and B. G. Lindsay. Database Snapshots. In Proc. 6th VLDB Conference, pages 86-91, 1980.
[AHM91] R. Ahmed et al. The Pegasus Heterogeneous Multidatabase System. IEEE Computer, 24(12), pages 19-27, 1991.
[ALB93] J. Albert et al. Automatic Importation of Relational Schemas in Pegasus. In Research Issues in Data Engineering: Interoperability in Multidatabase Systems (RIDE-IMS'93), pages 105-113, Vienna, Austria, 1993.
[ALZ96] R. M. Alzahrani. Semantic Object-Oriented Multidatabase Access. PhD thesis, University of Wales College of Cardiff, 1996.
[AND91] J. Andany, M. Leonard and C. Palisser. Management of Schema Evolution in Databases. In Proc. 17th VLDB Conference, pages 161-170, 1991.
[AND93] M. Andersson et al. The FEMUS Approach in Building a Federated Multilingual Database System. In Research Issues in Data Engineering: Interoperability in Multidatabase Systems (RIDE-IMS'93), pages 65-68, 1993.
[ANS91] ANSI-SPARC-X3-DBSSG-OODBTG. OODBTG Final Report, 1991.
[ATK92] M. Atkinson et al. The Object-Oriented Database System Manifesto. In F. Bancilhon, C. Delobel and P. Kanellakis, editors, Building an Object-Oriented Database System: the Story of O2, pages 3-20. Morgan Kaufmann, 1992.
[BAN87] J. Banerjee et al. Data Model Issues for Object-Oriented Applications. ACM Transactions on Office Information Systems, 5(1), pages 3-26, 1987.

[BAT86] C. Batini, M. Lenzerini and S. B. Navathe. A Comparative Analysis of Methodologies for Database Schema Integration. ACM Computing Surveys, 18(4), pages 323-364, 1986.
[BEH97a] W. Behrendt, N. J. Fiddian, W. A. Gray and A. P. Madurapperuma. The Architecture of the ITSE Interoperation Toolkit for Heterogeneous Database Environments. Submitted for publication, 1997.
[BEH97b] W. Behrendt, R. M. Duwairi, A. P. Madurapperuma, N. J. Fiddian and W. A. Gray. An Architecture and Enabling Tools for Data Warehousing Applications. Submitted for publication, 1997.
[BEL89] D. A. Bell, J. B. Grimson and D. H. O. Ling. Implementation of an Integrated Multidatabase-PROLOG System. Information and Software Technology, 31(1), pages 29-38, 1989.
[BER89] E. Bertino et al. An Object-Oriented Approach to the Interconnection of Heterogeneous Databases. In Workshop on Heterogeneous Databases, Northwestern University, USA, pages 84-90, 1989.
[BER91] E. Bertino. Integration of Heterogeneous Data Repositories by Using Object-Oriented Views. In Proc. 1st International Workshop on Interoperability in Multidatabase Systems, pages 22-29, Kyoto, Japan, 1991.
[BER94] E. Bertino et al. Applications of Object-Oriented Technology to the Integration of Heterogeneous Database Systems. Distributed and Parallel Databases, 2, pages 343-370, 1994.
[BER96a] E. Bertino and A. Illarramendi. The Integration of Heterogeneous Data Management Systems: Approaches Based on Object-Oriented Paradigm. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 251-269. Prentice Hall, 1996.
[BER96b] E. Bertino, M. Negri and L. Sbattella. An Overview of the Commandos Integration System. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 379-422. Prentice Hall, 1996.

[BOU91] A. Bouguettaya, R. King and K. Zhao. FINDIT: A Server Based Approach to Finding Information in Large Scale Heterogeneous Databases. In Proc. 1st International Workshop on Interoperability in Multidatabase Systems, pages 191-194, Kyoto, Japan, 1991.
[BOU93a] A. Bouguettaya and R. King. Large Multidatabases: Issues and Directions. In D. K. Hsiao, E. J. Neuhold and R. Sacks-Davis, editors, Interoperable Database Systems (DS-5) (A-25), pages 55-68, North Holland, 1993. Elsevier Science Publishers B. V.
[BOU93b] A. Bouguettaya et al. Implementation of Interoperability in Large Multidatabases. In Research Issues in Data Engineering: Interoperability in Multidatabase Systems (RIDE-IMS'93), pages 55-60, 1993.
[BRA90] I. Bratko. Prolog Programming for Artificial Intelligence. Addison-Wesley, 2nd edition, 1990.
[BRE90] Y. Breitbart. Multidatabase Interoperability. SIGMOD RECORD, 19(3), pages 53-60, 1990.
[BRI92] M. W. Bright, A. R. Hurson and S. H. Pakzad. A Taxonomy and Current Issues in Multidatabase Systems. IEEE Computer, 25(3), pages 50-59, 1992.
[BRI94] M. W. Bright, A. R. Hurson and S. H. Pakzad. Automated Resolution of Semantic Heterogeneity in Multidatabases. ACM Transactions on Database Systems, 19(2), pages 212-253, 1994.
[BRO93] M. L. Brodie. The Promise of Distributed Computing and the Challenges of Legacy Information Systems. In D. K. Hsiao, E. J. Neuhold and R. Sacks-Davis, editors, Interoperable Database Systems (DS-5) (A-25), pages 1-29, North Holland, 1993. Elsevier Science Publishers B. V.
[BUK96a] O. A. Bukhres and A. K. Elmagarmid. Object-Oriented Multidatabase Systems: A Solution for Advanced Applications. Prentice Hall, 1996.
[BUK96b] O. A. Bukhres et al. The Integration of Database Systems. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 37-56. Prentice Hall, 1996.

[BUS94] R. Busse, P. Fankhauser and E. J. Neuhold. Federated Schemata in ODMG. In Proc. 2nd International EAST/WEST Database Workshop, Klagenfurt, Austria, 1994.
[CAS94] M. Castellanos, F. Saltor and M. Garcia-Solaco. A Canonical Data Model for Interoperability Among Object-Oriented and Relational Databases. In M. T. Ozsu, U. Dayal and P. Valduriez, editors, Distributed Object Management. Morgan Kaufmann Publishers, 1994.
[CAT94] R. G. G. Cattell. Object Database Standard: ODMG-93, release 1.1. Morgan Kaufmann Publishers, 1994.
[CAT96] R. G. G. Cattell. Object Database Standard: ODMG-93, release 1.2. Morgan Kaufmann Publishers, 1996.
[CER90] S. Ceri, G. Gottlob and L. Tanca. Logic Programming and Databases. Springer-Verlag, 1990.
[CHO94] J. Chomicki and W. Litwin. Declarative Definition of Object-Oriented Multidatabase Mappings. In M. T. Ozsu, U. Dayal and P. Valduriez, editors, Distributed Object Management. Morgan Kaufmann Publishers, 1994.
[CLE93] D. Clements et al. Myriad: Design and Implementation of a Federated Database Prototype. Technical Report TR93-76, University of Minnesota, USA, 1993.
[CLO87] W. Clocksin and C. Mellish. Programming in Prolog. Springer-Verlag, 3rd edition, 1987.
[COL91] C. Collet, M. N. Huhns and W. M. Shen. Resource Integration Using a Large Knowledge Base in Carnot. IEEE Computer, 24(12), pages 55-62, 1991.
[COL95] R. M. Colomb and M. E. Orlowska. Interoperability in Information Systems. Information Systems Journal, 5(1), pages 37-50, 1995.
[CZE92] B. Czejdo and M. C. Taylor. Integration of Information Systems Using Object-Oriented Approach. Computer Journal, 35(5), pages 501-513, 1992.
[DAT90] C. J. Date. An Introduction to Database Systems, volume 1. Addison-Wesley Publishing Company, 1990.

[DAY83] U. Dayal. Processing Queries Over Generalisation Hierarchies in a Multidatabase System. In Proc. 9th VLDB Conference, pages 342-353, 1983.
[DAY84] U. Dayal and H. Hwang. View Definition and Generalisation for Database Integration in a Multidatabase System. IEEE Transactions on Software Engineering, SE-10(6), pages 628-645, 1984.
[DEU92] O. Deux et al. The Story of O2. In F. Bancilhon, C. Delobel and P. Kanellakis, editors, Building an Object-Oriented Database System: the Story of O2, pages 21-57. Morgan Kaufmann, 1992.
[DOG95] A. Dogac et al. METU Interoperable Database System. SIGMOD RECORD, 24(3), pages 56-61, 1995.
[DU96] W. Du and M. Shan. Query Processing in Pegasus. In O. Bukhres and A. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, chapter 13, pages 449-471. Prentice Hall, 1996.
[DUW96a] R. M. Duwairi, N. J. Fiddian and W. A. Gray. Schema Integration Meta-Knowledge Classification and Reuse. In Proc. 14th British National Conference on Databases (BNCOD14), pages 1-17, UK, 1996.
[DUW96b] R. M. Duwairi, N. J. Fiddian and W. A. Gray. A Multiple View Definition System for Supporting Interoperability among Heterogeneous and Autonomous Databases. In Proc. 10th ERCIM Workshop on Heterogeneous Information Management, Prague, Czech Republic, 1996.
[DUW96c] R. M. Duwairi, N. J. Fiddian and W. A. Gray. A Flexible Integration Framework for Supporting User Requirement Changes in a Multidatabase Environment. In Proc. International Symposium of Cooperative Database Systems for Advanced Applications (CODAS), Kyoto, Japan, 1996.
[DUW96d] R. M. Duwairi, N. J. Fiddian and W. A. Gray. Views for Heterogeneous Object-Oriented Database Integration. Submitted for publication, 1996.
[ELI91] F. Eliassen and R. Karlsen. Interoperability and Object Identity. SIGMOD RECORD, 20(4), pages 25-29, 1991.

[ELI95] F. Eliassen. Managing Identity in Global Object Views. In O. Bukhres, M. T. Ozsu and M. C. Shan, editors, Research Issues in Data Engineering Distributed Object Management (RIDE-DOM), pages 70-77, 1995.
[ELM94a] R. Elmasri and S. B. Navathe. Fundamentals of Database Systems. Benjamin/Cummings Publishing Company, Houston, Texas, USA, second edition, 1994.
[ELM94b] R. Elmasri and S. B. Navathe. Object Integration in Database Design. In International Conference on Data Engineering, 1994.
[FAH94] G. Fahl. Object Views of Relational Data in Multidatabase Systems. Master's thesis, Linkoping Studies in Science and Technology, LiU-Tek-Lic, Sweden, 1994.
[FID92] N. J. Fiddian, W. A. Gray, A. Ramfos and A. Cooke. Database Meta-Translation Technology: Integration, Status and Application. Database Technology, 4(4), pages 259-263, 1992.
[FIS87] D. H. Fishman et al. Iris: An Object-Oriented Database Management System. ACM Transactions on Office Information Systems, 5(1), pages 48-69, 1987.
[FON93] M. M. Fonkam. Knowledge Location in Relational Multidatabase Systems. PhD thesis, University of Wales College of Cardiff, 1993.
[GAL84] H. Gallaire, J. Minker and J. M. Nicolas. Logic and Databases: A Deductive Approach. ACM Computing Surveys, 16(2), pages 153-185, 1984.
[GAR96] M. Garcia-Solaco, F. Saltor and M. Castellanos. Semantic Heterogeneity in Multidatabase Systems. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 129-202. Prentice Hall, 1996.
[GOH94] C. H. Goh, S. E. Madnick and M. D. Siegel. Context Interchange: Overcoming the Challenges of Large Scale Interoperable Database Systems in a Dynamic Environment. In Proc. 3rd International Conference on Information and Knowledge Management (CIKM-94), pages 337-346, Gaithersburg, Maryland, 1994.

[GOT92] W. Gotthard, P. Lockemann and A. Neufeld. System-Guided View Integration for Object-Oriented Databases. IEEE Transactions on Knowledge and Data Engineering, 4(1), pages 3-22, Feb. 1992.
[GUH90] R. V. Guha and D. B. Lenat. Cyc: A Midterm Report. In Proc. 8th National Conference on Artificial Intelligence (AAAI), pages 33-59, 1990.
[HAM80] M. M. Hammer and S. B. Zdonik. Knowledge-Based Query Processing. In Proc. 6th VLDB Conference, pages 137-147, 1980.
[HAR93] M. Hartig and K. Dittrich. An Object-Oriented Integration Framework for Building Heterogeneous Database Systems. In D. K. Hsiao and E. J. Neuhold, editors, Interoperable Database Systems (DS-5) (A-25), pages 33-53, 1993.
[HAY90] S. Hayne and S. Ram. Multi-User View Integration System (MUVIS): An Expert System for View Integration. In Proc. 6th International Conference on Data Engineering, pages 402-409, 1990.
[HEI85] D. Heimbigner and D. McLeod. A Federated Architecture for Information Management. ACM Transactions on Office Information Systems, 3(3), pages 253-278, 1985.
[HEI90] S. Heiler and S. B. Zdonik. Object Views: Extending the Vision. In IEEE Data Engineering Conference, pages 86-93, 1990.
[HOL96] R. D. Holowczak and W. S. Li. A Survey on Attribute Correspondence and Heterogeneity Metadata Representation. In Proc. MetaData Conference, http:/www.llnl.gov/liv com/metadata/md97.html, 1996.
[HOW86] D. I. Howells, N. J. Fiddian and W. A. Gray. A Comparison of Old and New Technologies for Translating between Relational Query Languages. In Proc. 3rd International Workshop on Statistical and Scientific Database Management, pages 179-183, Luxembourg, 1986.
[HOW87] D. I. Howells, N. J. Fiddian and W. A. Gray. A Source to Source Meta-Translation System for Relational Query Languages. In Proc. 13th VLDB Conference, pages 227-234, Brighton, UK, 1987.

[HOW88a] D. I. Howells, N. J. Fiddian and W. A. Gray. A Source-to-Source Meta-Translation System for Database Query Languages - Implementation in Prolog. In P. M. D. Gray and R. J. Lucas, editors, Prolog and Databases - Implementations and New Directions, pages 22-38. Ellis Horwood Limited, 1988.
[HOW88b] D. I. Howells. A Source-to-Source Meta-Translation System for Database Query Languages. PhD thesis, Department of Computer Science, University of Wales College of Cardiff, UK, 1988.
[HSI89] D. K. Hsiao and M. N. Kamel. Heterogeneous Databases: Proliferations, Issues and Solutions. IEEE Transactions on Knowledge and Data Engineering, 1(1), pages 45-62, 1989.
[HUG91] J. G. Hughes. Object-Oriented Databases. Prentice Hall, 1991.
[HUL87] R. Hull and R. King. Semantic Database Modelling: Survey, Applications, and Research Issues. ACM Computing Surveys, 19(3), pages 201-260, 1987.
[HUR93] A. R. Hurson, S. H. Pakzad and J. Cheng. Object-Oriented Database Management Systems: Evolution and Performance Issues. IEEE Computer, 26(2), pages 48-60, 1993.
[HUR96] A. R. Hurson and M. W. Bright. Object-Oriented Multidatabase Systems. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 1-36. Prentice Hall, 1996.
[IBR96] H. Ibrahim, W. A. Gray and N. J. Fiddian. The Development of a Semantic Integrity Constraint Subsystem for a Distributed Database (SICSDD). In Proc. 14th British National Conference on Databases (BNCOD14), UK, 1996.
[IDR94] N. B. Idris, W. A. Gray and R. F. Churchhouse. Providing Dynamic Security in A Federated Database. In Proc. 20th VLDB Conference, 1994.
[KAU90] M. Kaul, K. Drosten and E. J. Neuhold. ViewSystem: Integrating Heterogeneous Information Bases by Object-Oriented Views. In IEEE International Conference on Data Engineering, pages 2-10, 1990.

[KEN91] W. Kent. The Breakdown of the Information Model in Multidatabase Systems. SIGMOD RECORD, 20(4), pages 10-15, 1991.
[KEN93] W. Kent et al. Object Identification in Multidatabase Systems. In D. K. Hsiao, E. J. Neuhold and R. Sacks-Davis, editors, Interoperable Database Systems (DS-5) (A-25), pages 313-330, North Holland, 1993. Elsevier Science Publishers B. V.
[KHO90] S. N. Khoshafian and G. Copeland. Object Identity. In S. Zdonik and D. Maier, editors, Readings in Object-Oriented Database Systems, pages 406-416. Morgan Kaufmann Publishers, 1990.
[KIM89] W. Kim. A Model of Queries in Object-Oriented Databases. In Proc. 15th VLDB Conference, pages 423-432, 1989.
[KIM90] W. Kim. Introduction to Object-Oriented Databases. MIT Press, 1990.
[KIM91a] W. Kim and J. Seo. Classifying Schematic and Data Heterogeneity in Multidatabase Systems. IEEE Computer, 24(12), pages 12-18, 1991.
[KIM93] W. Kim et al. On Resolving Semantic Heterogeneity in Multidatabase Systems. Distributed and Parallel Databases, 1(3), pages 251-279, 1993.
[KLA95] W. Klas and M. Schrefl. Meta Classes and Their Application - Data Model Tailoring and Database Integration. Springer, 1995.
[KLA96] W. Klas et al. Database Integration Using the Open Object-Oriented Database System: VODAK. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 251-269. Prentice Hall, 1996.
[KRI91] R. Krishnamurthy, W. Litwin and W. Kent. Heterogeneous Databases with Semantic Discrepancies. In Proc. 1st International Workshop on Interoperability in Multidatabase Systems, pages 144-151, Kyoto, Japan, 1991.
[LAN82] T. Landers and R. L. Rosenberg. An Overview of Multibase. In H. J. Schneider, editor, Distributed Databases, pages 153-184. North Holland Publishing Company, 1982.

[LI91] Q. Li and D. McLeod. An Object-Oriented Approach to Federated Databases. In Proc. 1st International Conference on Interoperability in Multidatabase Systems, pages 64-70, Kyoto, Japan, 1991.

[LIM93] E. P. Lim and S. Prabhakar. Entity Identification in Database Integration. In Proc. 9th International Conference on Data Engineering, pages 294-301, Vienna, Austria, 1993.

[LIM94a] E. P. Lim et al. Myriad: Design and Implementation of a Federated Database Prototype. Technical Report TR94-14, University of Minnesota, USA, 1994.

[LIM94b] E. P. Lim, J. Srivastava and S. Shekhar. Resolving Attribute Incompatibility in Database Integration: An Evidential Reasoning Approach. In Proc. 10th IEEE International Conference on Data Engineering, pages 154-163, Houston, Texas, USA, 1994.

[LIM96] E. P. Lim, J. Srivastava and S. Shekhar. An Evidential Reasoning Approach to Attribute Value Conflict Resolution in Database Integration. IEEE Transactions on Knowledge and Data Engineering, 8(5), pages 707-723, Oct. 1996.

[LIT86] W. Litwin and A. Abdellatif. Multidatabase Interoperability. IEEE Computer, 10(12), pages 10-18, 1986.

[LIT88] W. Litwin. From Database Systems to Multidatabase Systems: Why and How. In Proc. 6th British National Conference on Databases (BNCOD6), pages 161-188, 1988.

[LIT90] W. Litwin, L. Mark and N. Roussopoulos. Interoperability of Multiple Autonomous Databases. ACM Computing Surveys, 22(3), pages 267-293, 1990.

[LIT93] W. Litwin. O*SQL: A Language for Object Oriented Multidatabase Interoperability. In D. K. Hsiao, E. J. Neuhold and R. Sacks-Davis, editors, Interoperable Database Systems (DS-5) (A-25), pages 119-137, North Holland, 1993. Elsevier Science Publishers B. V.


[MAD95] S. E. Madnick. From VLDB to VMLDB (Very MANY Large Databases): Dealing with Large-Scale Semantic Heterogeneity. In Proc. 21st VLDB Conference, pages 11-16, Zurich, Switzerland, 1995.

[MCC82] J. McCarthy. Metadata Management for Large Statistical Databases. In Proc. 8th VLDB Conference, pages 234-243, Mexico City, Mexico, 1982.

[MEN96] F. Mena et al. Managing Multiple Information Resources through Ontologies: Relationship between Vocabulary Heterogeneity and Loss of Information. In Knowledge Representation Meets Databases (KRDB'96), ECAI'96 Conference, pages 50-52, Budapest, Hungary, 1996.

[MIL95] S. Milliner, A. Bouguettaya and M. Papazoglou. A Scalable Architecture for Autonomous Heterogeneous Database Interactions. In Proc. 21st VLDB Conference, pages 515-526, Zurich, Switzerland, 1995.

[MOT81] A. Motro and P. Buneman. Constructing Superviews. In ACM-SIGMOD International Conference on Management of Data, pages 56-64, 1981.

[MOT83] A. Motro. Interrogating Superviews. In Proc. 2nd International Conference on Databases (ICOD-2), pages 107-126, Cambridge, England, 1983.

[MOT87] A. Motro. Superviews: Virtual Integration of Multiple Databases. IEEE Transactions on Software Engineering, SE-13(7), pages 785-798, 1987.

[NAV86] S. Navathe, R. Elmasri and J. Larson. Integrating User Views in Database Design. IEEE Computer, 19(1), pages 50-62, 1986.

[NAV96a] S. Navathe and A. Savasere. A Schema Integration Facility Using Object-Oriented Data Model. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 105-128. Prentice Hall, 1996.

[NAV96b] S. B. Navathe, S. Mahajan and E. Omiecinski. Rule Based Database Integration in HIPED: Heterogeneous Intelligent Processing in Engineering Design. In International Symposium on Cooperative Database Systems for Advanced Applications, pages 89-96, Kyoto, Japan, 1996.

[NIC93] J. R. Nicol, C. T. Wilkes and F. A. Manola. Object-Orientation in Heterogeneous Distributed Computing Systems. IEEE Computer, 26(7), pages 57-67, 1993.

[OMG91] Object Management Group. The Common Object Request Broker: Architecture and Specification. Technical Report 91.12.1, OMG, 1991.

[OZS91] M. T. Ozsu and P. Valduriez. Principles of Distributed Database Systems. Prentice Hall, 1991.

[PAB96] M. P. Papazoglou, Z. Tari and N. Russell. Object-Oriented Technology for Inter-Schema and Language Mappings. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 203-250. Prentice Hall, 1996.

[PER80] F. C. N. Pereira and D. H. D. Warren. Definite Clause Grammars for Language Analysis - A Survey of the Formalism and a Comparison with Augmented Transition Networks. Artificial Intelligence, 13, pages 231-278, 1980.

[PIT95] E. Pitoura, O. Bukhres and A. Elmagarmid. Object-Orientation in Multidatabase Systems. ACM Computing Surveys, 27(2), pages 141-195, 1995.

[PIT96] E. Pitoura, O. Bukhres and A. Elmagarmid. Object-Oriented Multidatabase Systems. In O. Bukhres and A. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 347-378. Prentice Hall, 1996.

[QUI91] Quintus Prolog, Release 3, 1991.

[QUT92] M. A. Qutaishat, N. J. Fiddian and W. A. Gray. A Schema Meta-Integration System for a Heterogeneous Object-Oriented Database Environment - Objectives and Overview. In NordDATA'92 Conference, pages 74-92, Tampere, Finland, 1992.

[QUT93] M. A. Qutaishat. A Schema Meta-Integration System for a Logically Heterogeneous Distributed Object-Oriented Database Environment. PhD thesis, Department of Computing Mathematics, University of Wales College of Cardiff, UK, 1993.

[RAF91] A. Rafii et al. Multidatabase Management in Pegasus. In Proc. 1st International Workshop on Interoperability in Multidatabase Systems, pages 166-173, Kyoto, Japan, 1991.

[RAM89] A. Ramfos, N. J. Fiddian and W. A. Gray. Object-Oriented to Relational Inter-Schema Meta-Translation. In Workshop on Heterogeneous Databases, Northwestern University, USA, 1989.

[RAM91a] A. Ramfos, N. J. Fiddian and W. A. Gray. A Meta-Translation System for Object-Oriented to Relational Schema Translations. In Proc. 9th British National Conference on Databases (BNCOD9), pages 245-268, 1991.

[RAM91b] A. Ramfos. A Meta-Translation System for Object-Oriented to Relational Schema Translations. PhD thesis, Department of Computer Science, University of Wales College of Cardiff, UK, 1991.

[RED93] M. P. Reddy, M. Siegel and A. Gupta. Towards an Active Schema Integration Architecture for Heterogeneous Database Systems. In Research Issues in Data Engineering: Interoperability in Multidatabase Systems (RIDE-IMS'93), pages 178-183, Vienna, Austria, 1993.

[ROB93] P. Rob and C. Coronel. Database Systems: Design, Implementation and Management. Wadsworth Publishing Company, 1993.

[ROS94] A. Rosenthal and L. Seligman. Data Integration in the Large: The Challenge of Reuse. In Proc. 20th VLDB Conference, pages 669-675, Santiago, Chile, 1994.

[RUM91] J. Rumbaugh et al. Object-Oriented Modelling and Design. Prentice Hall International Edition, 1991.

[RUN92a] E. A. Rundensteiner. A Class Integration Algorithm and Its Application for Supporting Consistent Object Views. Technical Report TR92-50, University of California, Irvine, USA, 1992.

[RUN92b] E. A. Rundensteiner. MultiView: A Methodology for Supporting Multiple View Schemata in Object-Oriented Databases. Technical Report TR92-07, University of California, Irvine, USA, 1992.

[RUN92c] E. A. Rundensteiner. MultiView: A Methodology for Supporting Multiple Views in Object-Oriented Databases. In Proc. 18th VLDB Conference, pages 187-198, 1992.

[SAL91] F. Saltor, M. Castellanos and M. Garcia-Solaco. Suitability of Data Models as Canonical Models for Federated Databases. SIGMOD RECORD, 20(4), pages 44-48, 1991.

[SAV91] A. Savasere, A. Sheth and S. Gala. On Applying Classification to Schema Integration. In Proc. 1st International Workshop on Interoperability in Multidatabase Systems, pages 258-261, Kyoto, Japan, 1991.

[SCH94] M. H. Scholl and H. J. Schek. Object Algebra and Views for Multi-Objectbases. In M. T. Ozsu, U. Dayal and P. Valduriez, editors, Distributed Object Management. Morgan Kaufmann Publishers, 1994.

[SEL93] P. G. Selinger. Predictions and Challenges for Database Systems in the Year 2000. In Proc. 19th VLDB Conference, pages 667-675, Dublin, Ireland, 1993.

[SEL96] L. Seligman and A. Rosenthal. A Metadata Resource to Promote Data Integration. In Proc. Metadata Conference, http://www.llnl.gov/liv comp/metadata/md97.html, 1996.

[SHE90] A. Sheth and J. Larson. Federated Database Systems for Managing Distributed, Heterogeneous and Autonomous Databases. ACM Computing Surveys, 22(3), pages 183-236, 1990.

[SHE91] A. P. Sheth. Semantic Issues in Multidatabase Systems. SIGMOD RECORD, 20(4), pages 5-9, 1991.

[SHE93] A. Sheth and V. Kashyap. So Far (Schematically) Yet So Near (Semantically). In D. K. Hsiao, E. J. Neuhold and R. Sacks-Davis, editors, Interoperable Database Systems (DS-5) (A-25), pages 283-312, North Holland, 1993. Elsevier Science Publishers B. V.

[SMI77] J. M. Smith and D. C. Smith. Database Abstractions: Aggregation and Generalisation. ACM Transactions on Database Systems, 2(2), 1977.

[SMI81] J. M. Smith et al. Multibase - Integrating Heterogeneous Distributed Database Systems. In Proc. National Computer Conference, pages 487-499, 1981.

[SPA91] S. Spaccapietra and C. Parent. Conflicts and Correspondence Assertions in Interoperable Databases. SIGMOD RECORD, 20(4), pages 49-54, 1991.

[SPA92] S. Spaccapietra, C. Parent and Y. Dupont. Model Independent Assertions for Integration of Heterogeneous Schemas. VLDB Journal, Volume 1, pages 81-126, 1992.

[STO90] M. Stonebraker, L. A. Rowe and M. Hirohama. The Implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering, 2(1), pages 125-142, 1990.

[STO91] M. Stonebraker et al. Third-Generation Database System Manifesto. In R. A. Meersman, W. Kent and S. Khosla, editors, Object-Oriented Databases: Analysis, Design and Construction, pages 495-511, North Holland, 1991. Elsevier Science Publishers B. V.

[SU96] S. Su, A. Doshi and L. Su. HKBMS: An Integrated Heterogeneous Knowledge Base Management System. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 589-620. Prentice Hall, 1996.

[UniSQL96] UniSQL Corporation. UniSQL/X User's Manual, Volume 1 and Volume 2, 1996.

[VAS95] D. Vaskevitch. Very Large Databases How Large? How Different. In Proc. 21st VLDB Conference, pages 677-685, Zurich, Switzerland, 1995.

[WAT93] A. Watters. Incremental Data Integration of Federated Databases. In Research Issues in Data Engineering: Interoperability in Multidatabase Systems (RIDE-IMS'93), pages 78-85, Vienna, Austria, 1993.

[WHA91] W. K. Whang, S. B. Navathe and S. Chakravarthy. Logic-Based Approach for Realizing a Federated Information System. In Proc. 1st International Conference on Interoperability in Multidatabase Systems, pages 92-100, Kyoto, Japan, 1991.

[WID95] J. Widom. Research Problems in Data Warehousing. In Proc. 4th International Conference on Information and Knowledge Management (CIKM), 1995.

[WIE96] J. L. Wiener et al. A System Prototype for Data Warehouse View Maintenance. In http://www-db.stanford.edu/warehousing/warehouse.html, 1996.

[WIL90] K. Wilkinson, P. Lyngbaek and W. Hasan. The Iris Architecture and Implementation. IEEE Transactions on Knowledge and Data Engineering, 2(1), pages 63-75, 1990.

[WOE96] D. Woelk et al. Carnot Prototype. In O. A. Bukhres and A. K. Elmagarmid, editors, Object-Oriented Multidatabase Systems: A Solution for Advanced Applications, pages 621-651. Prentice Hall, 1996.

[ZHA91] K. Zhao, R. King and A. Bouguettaya. Incremental Specification of Views Across Databases. In Proc. 1st International Workshop on Interoperability in Multidatabase Systems, pages 187-190, Kyoto, Japan, 1991.
