Managing Global Information in the CORDS Multidatabase System (Experience)
Michael Bauer
Dept. of Computer Science University of Western Ontario
Neil Coburn
Software Development Centre Antares Alliance Group Canada Ltd.
Per-Ake Larson
Dept. of Computer Science University of Waterloo
Patrick Martin
Dept. of Computing and Information Science Queen's University at Kingston
Abstract
Multidatabase systems provide integrated access to autonomous, distributed, and heterogeneous database systems. As such they provide a core technology for Cooperative Information Systems. One of the main issues in developing multidatabase systems is that of managing global information such as properties of component sites and databases, component and application schemas, mappings between the schema levels, multidatabase-level access privileges, and performance statistics for the optimization of global queries. We outline the CORDS multidatabase prototype system and discuss our approach to managing global information by using an information repository based on an X.500 directory system.
This research project is supported by IBM Canada Ltd. and by the Natural Sciences and Engineering Research Council of Canada. Neil Coburn carried out this work while at the Dept. of Computer Science, University of Waterloo.
Introduction
In virtually every organization, information necessary to the operation of the organization is stored in a variety of information systems (ISs). These ISs are typically independently developed and managed, are based on diverse software technologies, such as file systems, database management systems (DBMSs) and spreadsheets, and run on different hardware platforms. The evolving information requirements of organizations will demand integrated access to multiple ISs both inside and outside the organization, and users will want to present more sophisticated requests to these ISs. Cooperative Information Systems (CISs), that is, collections of independent ISs that cooperate to execute joint tasks, are a new paradigm capable of meeting the new requirements [3].
Each component IS within a CIS has its own interface and protocols for retrieving and updating its information. Interoperability, which is the property that two components can interact and understand each other, is therefore a key requirement for CISs. Multidatabase systems (MDBSs) are intended to support interoperability among distributed, heterogeneous, and autonomous data sources, and will be one of the core technologies for the development of CISs.
The difficulties which arise when constructing a MDBS stem from three fundamental properties of the component data sources: autonomy, distribution, and heterogeneity. The meaning of autonomy and distribution seems clear: MDBS components can be located at arbitrary sites within a communications network and each component is able to choose how and when it will participate in MDBS-related activities. By heterogeneity we mean that the data sources may support different data models (including no data model), may be different implementations of the same data model, may have different capabilities, may have different interfaces, may have different data formats, and may run on different platforms.
A MDBS, like any conventional DBMS, must maintain a system catalog, or data dictionary, of all the information necessary to operate the MDBS. This global information includes schema descriptions, optimization information, and execution information. Brodie and Ceri have pointed out that deciding upon the content and location of global information or functionality is one of the most challenging architectural issues in a CIS because of the potential integrity and consistency problems and performance degradation [3].
In this paper, we describe the CORDS MDBS prototype and focus on the problem of managing the global information required by the MDBS. We describe the global information requirements of the MDBS and list a set of criteria with which to evaluate methods of managing the global information. We outline an approach to the problem which uses an information repository based on an implementation of an X.500 Directory system and evaluate the approach according to the stated set of criteria.
The CORDS MDBS
CORDS is a collaborative research project involving the IBM Centre for Advanced Studies, IBM Research and a number of universities in Canada and the United States. The focus of CORDS is on the development, operation, and management of distributed applications. It concentrates on issues within five core areas: application management services, data management services, visualization and user interfaces, development languages and tools, and midware and high speed networks [10].
MDBS Architecture
The MDBS [1] is one of the data services offered by the CORDS Service Environment (CSE) [2]. However, independent of the CSE, the CORDS MDBS can be regarded as a virtual DBMS and, as such, it has been designed to offer the full functionality of a commercial DBMS. The architecture of the CORDS MDBS is shown in Figure 1.
One of the objectives of the project is to use existing standards and system components wherever possible. The underlying software platform is the Distributed Computing Environment (DCE) from the Open Software Foundation (OSF) [5]. DCE provides a variety of services which the MDBS can use; for example, a remote procedure call (RPC) package, a lightweight threads package, and an authentication service. We also use the Encina distributed transaction management system from Transarc [11], and the TDM directory server, a fast X.500 directory service from the University of British Columbia [8]. (OSF and DCE are trademarks of the Open Software Foundation.)
The MDBS Catalog is a central repository for metadata needed by the multidatabase system. Three classes of metadata are required: schemas, mappings, and descriptions of component data sources (CDSs). The common data model used in the CORDS MDBS is the relational model, so schemas define a collection of data in terms of relational tables and their columns, and any applicable constraints. Three types of schemas are stored: export schemas, MDBS schemas, and application schemas. An export schema defines the data made available to the MDBS from a CDS, MDBS schemas define collections of data at the MDBS level which are drawn from the exported data, and application schemas define application-specific views of the MDBS-level data. Mappings are needed to transform export schema objects into MDBS schema objects, and MDBS schema objects into application schema objects. We also need to store descriptions of CDSs, which include parameters such as processing capabilities, relative processing speeds, available resources, and communication links and their speeds.
The multidatabase system supports incremental integration; any number of MDBS schemas can be defined. Creation of a MDBS schema is done using a tool called the MDBS View Builder [7]. As shown in Figure 1, this tool is viewed as an application by the MDBS. The main purpose of this tool is to create global schemas and mappings, and to store this information in the MDBS Catalog.
Application programs interact with the MDBS through an MDBS Client, a library of functions that are linked in with the application program. The application program interface is based on Microsoft's Open Database Connectivity interface (ODBC) [9].
Clients communicate with the MDBS using the Distributed Relational Database Architecture (DRDA) Application Support protocol [6], which is an IBM standard. Requests from a client are first received by a module called the MDBS Request Coordinator. Its main function is to send requests to the appropriate module of the MDBS Server or to forward requests to other MDBS Servers.
(Encina is a trademark of Transarc Corporation. ODBC is a trademark of Microsoft Corporation. DRDA and IBM are trademarks of International Business Machines Corporation.)
Figure 1: MDBS Architecture. (Components shown: MDBS Application, MDBS View Builder, Client, Request Coordinator, Parser, Request Decomposition, Security Manager, Optimizer, Plan Manager, Execution Coordinator, Transaction Manager, Security Services, Transaction Service, Catalog, Information Repository Service, Agent, Component Data Source.)
A query or update request is first passed through a Translator component which parses the request, performs semantic checking, and translates the request into one or more requests against tables specified in the export schemas. This process relies heavily on data from the Catalog. The request is then passed on to the Optimizer whose task is to find an efficient strategy for executing the request and to convert it to a detailed, self-contained plan ready for execution. The access plan is then passed to the Plan Manager whose main functions are to store and retrieve execution plans and to invalidate plans whose component schemas have changed. If an application requests execution of an invalidated plan, the Plan Manager automatically asks for the creation of a new plan.
A request to execute a stored plan is passed directly from the Request Coordinator to an Execution Coordinator. The Execution Coordinator retrieves the plan by contacting the Plan Manager and then initiates its execution. The Execution Coordinator, as its name implies, does not perform the processing on data streams produced by CDSs but instead coordinates that work on computational servers that are spread over the network. A CDS may provide computational services. A computational server might also be a (stripped-down) CDS that does not manage any persistent data. The Execution Coordinator simply sends appropriate instructions to CDSs and to computational servers. It also routes data streams to the correct servers.
The MDBS interacts with a CDS through an MDBS Agent. All agents present the same interface to the MDBS regardless of characteristics of the corresponding CDS. This reduces the degree of heterogeneity handled by the multidatabase system. Agents accept SQL requests expressed in terms of the export schemas and then interact with the CDS through its normal application program interface. An agent has two main tasks: request translation and response translation. A request must be translated so that it is expressed in terms of the target CDS schema and in terms of the operations understood by the CDS. Response translation involves converting both data and error messages from their local representations to standard representations used by the multidatabase system. Each type of CDS requires a custom-built MDBS agent. The complexity of the agent depends largely on the data model and capabilities of the CDS. If the CDS is a relational system, request and response translation may be trivial. On the other hand, an agent for a hierarchical DBMS may be very complex. CDSs currently supported by the prototype include the Empress, Oracle, and DB2/6000 relational systems, the IMS hierarchical database system, and the VAX DBMS network database system.
The current implementation of the MDBS interacts with a number of system services that are provided by the CSE: authentication, (distributed) transaction management, and information repository services. Authentication is provided by DCE's Kerberos. CSE transaction management is supported by Encina. The MDBS Catalog is stored in the CSE Information Repository service which is based on TDM.
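The Plan Manager behaviour described in this section (store plans, invalidate those whose component schemas change, and ask for a new plan when an invalidated one is requested) can be illustrated with a minimal Python sketch. It is not the CORDS implementation: the class names, the `compile_plan` hook standing in for the Optimizer, and the in-memory dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StoredPlan:
    """An access plan plus the export schemas it was compiled against."""
    plan_id: str
    text: str
    schemas: frozenset
    valid: bool = True


@dataclass
class PlanManager:
    """Hypothetical in-memory stand-in for the Plan Manager."""
    compile_plan: Callable  # assumed optimizer hook: plan_id -> (text, schemas)
    plans: dict = field(default_factory=dict)

    def store(self, plan_id, text, schemas):
        self.plans[plan_id] = StoredPlan(plan_id, text, frozenset(schemas))

    def schema_changed(self, schema_name):
        """Invalidate every stored plan that depends on the changed schema."""
        for plan in self.plans.values():
            if schema_name in plan.schemas:
                plan.valid = False

    def retrieve(self, plan_id):
        """Return a valid plan, asking for a new one if it was invalidated."""
        plan = self.plans[plan_id]
        if not plan.valid:
            text, schemas = self.compile_plan(plan_id)
            self.store(plan_id, text, schemas)
            plan = self.plans[plan_id]
        return plan


compiled = []  # records which plans had to be recompiled


def recompile(plan_id):
    compiled.append(plan_id)
    return ("plan v2 for " + plan_id, {"UWBooks", "QueensPubs"})


pm = PlanManager(compile_plan=recompile)
pm.store("q1", "plan v1 for q1", {"UWBooks", "QueensPubs"})
pm.schema_changed("UWBooks")   # an export schema changes
print(pm.retrieve("q1").text)  # the invalidated plan is rebuilt on demand
```

Recompiling lazily, at retrieval time, matches the statement that a new plan is requested only when an application asks to execute an invalidated one.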
MDBS Views
MDBS Views are the main mechanism by which interoperability among CDSs is provided. They are views that span multiple heterogeneous databases. They are like relational views in that they are not physically materialized but rather are stored as mappings which are invoked whenever an MDBS View is accessed. The syntax of our current implementation of MDBS Views extends the common SQL view definition capability with support for user-defined functions, and places a much stronger emphasis on the use of set operations, especially union. The user-defined functions are used primarily to resolve several types of schema conflicts. The emphasis on set operations comes from the need in MDBSs to combine data from several sources. The query defining an MDBS View can be complex. We currently support unions, joins within a single CDS or across CDSs, and subqueries. MDBS Views may be defined on top of export tables or other MDBS Views.
To illustrate how MDBS Views are used to provide interoperability, consider an example application which accesses data about publications from several independent sources. One data source represents books at the University of Waterloo and is described by the export schema UWBooks:
    Export Schema: UWBooks
        Publisher (PubNo, Name, Address)
        Book (ISBN, Title, PubNo)
        Author (ISBN, AuthorName, Affiliation)
A second data source contains data about publications at Queen's University and is described by the export schema QueensPubs:
    Export Schema: QueensPubs
        Items (ItemNo, Title, Publisher, PubAddress, Type)
        Authors (ItemNo, Surname, FirstName, Affiliation)
The attributes which form the primary key of each relation are underlined. To simplify the example, we assume that the values in corresponding columns in the export tables are drawn from the same domain so conflict resolution is not required. An application at the University of Western Ontario accesses the two data sources through the MDBS schema WesternOntarioBooks:
    MDBS Schema: WesternOntarioBooks
        CREATE MDBS-VIEW Books (ISBN, Title, Publisher) AS
            SELECT ISBN, Title, Name
            FROM UWBooks.Book, UWBooks.Publisher
            WHERE UWBooks.Book.PubNo = UWBooks.Publisher.PubNo
          UNION
            SELECT ItemNo, Title, Publisher
            FROM QueensPubs.Items
            WHERE QueensPubs.Items.Type = "Book"
The MDBS schema contains one MDBS View called Books which combines data about books from the two export schemas into a single table. Processing the MDBS View statement results in information about the MDBS View, and the mappings between Books and the exported tables, being stored in the MDBS Catalog. The MDBS uses this information to translate and decompose the application's SQL requests, which are posed against the MDBS table Books, into appropriate queries on the CDSs.
Empress is a trademark of Empress Software Corporation. Oracle is a trademark of Oracle Corporation. DB2/6000 and IMS are trademarks of International Business Machines Corporation. VAX DBMS is a trademark of Digital Equipment Corporation.
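To make the effect of the Books view concrete, the following Python sketch reproduces its join and union in a single in-memory SQLite database. This is only an approximation under stated assumptions: a real MDBS decomposes the request across separate CDSs, whereas here both export schemas are simulated as prefixed tables in one engine, `CREATE MDBS-VIEW` is replaced by an ordinary `CREATE VIEW`, and all sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for the two export schemas (the table prefixes are an assumption).
    CREATE TABLE UWBooks_Publisher (PubNo INTEGER, Name TEXT, Address TEXT);
    CREATE TABLE UWBooks_Book (ISBN TEXT, Title TEXT, PubNo INTEGER);
    CREATE TABLE QueensPubs_Items (ItemNo TEXT, Title TEXT, Publisher TEXT,
                                   PubAddress TEXT, Type TEXT);

    -- Invented sample rows.
    INSERT INTO UWBooks_Publisher VALUES (1, 'Publisher A', 'Waterloo');
    INSERT INTO UWBooks_Book VALUES ('isbn-1', 'Title One', 1);
    INSERT INTO QueensPubs_Items
        VALUES ('isbn-2', 'Title Two', 'Publisher B', 'Kingston', 'Book');
    INSERT INTO QueensPubs_Items
        VALUES ('item-3', 'Title Three', 'Publisher C', 'Kingston', 'Journal');

    -- Ordinary SQL view standing in for CREATE MDBS-VIEW Books.
    CREATE VIEW Books AS
        SELECT b.ISBN AS ISBN, b.Title AS Title, p.Name AS Publisher
        FROM UWBooks_Book AS b, UWBooks_Publisher AS p
        WHERE b.PubNo = p.PubNo
        UNION
        SELECT ItemNo, Title, Publisher
        FROM QueensPubs_Items
        WHERE Type = 'Book';
""")

# The application queries the single table Books, unaware of the two sources.
rows = conn.execute(
    "SELECT ISBN, Title, Publisher FROM Books ORDER BY ISBN"
).fetchall()
for row in rows:
    print(row)
```

The journal row from the simulated Queen's source is filtered out by the `Type = 'Book'` predicate, so only the two book rows appear in the result.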
Requirements for Managing Global Information
We now examine the global information requirements of the MDBS with respect to the types of information and the types of accesses to that information. We then summarize the requirements by presenting a list of properties we feel are essential in a global information repository.
Information Requirements
As mentioned earlier, the MDBS maintains a system catalog of all the information necessary to operate the MDBS. This global information includes the following:
    schema descriptions: exported data objects; global, or MDBS-level, data objects; mappings between exported and global data objects.
    optimization information: availability of access methods and access plans; statistical information from the component data sources.
    execution information: application and user access rights; descriptions of sites, network links and component data sources.
Figure 2 portrays some of the major entities and relationships in the global information used by the MDBS. There are a number of MDBS servers in a distributed system. Each server provides service to applications and accesses a number of sites. Each site holds one or more CDSs. An application's access to the data is through an MDBS schema which specifies table, column and constraint definitions. Requests against a MDBS schema may be compiled and stored in the form of access plans for subsequent execution. Any errors detected by a CDS during the processing of a request must be reported in terms of the error set of the MDBS. The data available from a CDS is defined in terms of one or more export schemas. Each export schema contains definitions of the available tables, columns, constraints, and indexes. The mappings between corresponding MDBS schema objects and export schema objects are provided by the database integration process.
The global information repository must be able to manage both static and dynamic data. While the definitions of schema objects are relatively static, the mappings associated with MDBS schema objects are likely to change more frequently as contributing CDSs are brought into, and removed from, the system. The operational properties of schema objects, and objects like sites, CDSs and network links, are dynamic and are also likely to undergo relatively frequent changes as the system loads and data statistics change.
The global information repository must also be able to provide the MDBS servers, and the applications, with support for name resolution. Objects in the system, from columns in a table to sites in the network, will have to be uniquely identified and will likely be known by several names. For example, a column in an exported table will have its local name, a globally unique name, and a machine-level identifier for efficient processing.
Access Requirements
The global information is accessed in a number of ways depending upon the system component performing the access. The database integration tool supports users during MDBS View creation. Users explore the global information by querying and by browsing, and issue updates to the MDBS Catalog during the integration process. System run-time components, such as the parser, optimizer, and plan manager, issue queries for specific objects, typically using a system identifier, and perform updates on run-time information. Short response times for accesses by the run-time components are important for acceptable system performance.
We expect that, in order to provide scalability and reasonable performance, the MDBS will consist of multiple MDBS servers distributed throughout the network. This will in turn require that the global information, while logically centralized, is physically distributed and, at least partially, replicated. Thus, some form of distributed transaction support is also required for updates.
Essential Properties
We can translate the set of requirements we have outlined into the following set of properties an "ideal" global information repository should possess:
1. A data model rich enough to model the entities and relationships present in the global information.
2. The ability to store and manage a number of different types of data including text, large data structures (e.g. the parse tree of an MDBS View) and functions.
3. The ability to efficiently support both static and dynamic data.
4. Support for name resolution.
5. Access language and methods to support both querying and browsing.
6. Support for distributed and replicated data and distributed transactions.
Figure 2: MDBS Global Information. (Entities shown: MDBS Server, MDBS Error, Site, Application, CDS, CDS Error, Access Plan, MDBS Schema, Export Schema, MDBS Table, Export Table, MDBS Column, Export Column, MDBS Constraint, Export Constraint, Index. Relationships shown: Reports, HasAccessTo, Services, TranslatesTo, Contains, Accesses, Uses, AccessedBy, MappedTo, IndexedBy, ConstrainedBy.)
CORDS Information Repository
Discussions with other research groups in CORDS, in particular those groups working in the area of systems management, led to the observation that there was a common need to store and manage global information. The approach we adopted to handle this common need was to develop an information repository service within the CORDS Service Environment. We chose to implement the repository service based on an X.500 directory system. We already had experience with X.500 directory servers in developing a name service for the CORDS environment, and further study and experimentation suggested that they could also function as a repository for various kinds of information.
Overview of X.500
The X.500 Directory Standard [4] specifies a Directory Service that provides and manages information about communication entities. These entities are represented by entries in a hierarchical name space, the X.500 Directory Information Tree (DIT). Entries are placed in the DIT according to the organizational relationships between the real-world entities that they represent. The set of entities and information held by the directory is called the Directory Information Base (DIB). It is assumed that updates to the DIB will be much less frequent than queries, and that service will be provided regardless of network partitioning resulting from events such as non-local site failures. The assumption of query type and frequency is based upon the applications anticipated to use the Directory service. The query type and frequency anticipated is a major difference in the requirements of a Directory service and a distributed database system.
A DIT entry contains information about the entity that it represents. This entity information is in the form of attributes whose types and structures are governed by a set of rules, or integrity
constraints, called schemas. Each entry in the DIT is labeled by a Relative Distinguished Name (RDN) composed from an attribute or a set of attributes of that entity and which is unique among the other entries that are children of the same parent entry. In this way, each entry has a globally unique name that is composed of the concatenated sequence of RDNs in the path from the root of the DIT to the entry. This globally unique name is the entry's Distinguished Name (DN).
The X.500 Abstract Service Definition defines abstract ports and operations that provide the user functionality for retrieving, searching, and modifying directory information. These operations form the Directory Access Protocol (DAP). The current X.500 standard supports the Read, Compare, List, Search, and Abandon interrogation functions and the basic manipulation functions: Add, Remove, Modify Entry, and Modify RDN. However, the Directory Service's modification functionality is limited in that it does not support the arbitrary creation and deletion of nonleaf entries.
The Directory Service (including the DIT) is distributed over physically separated entities (such as computer nodes) called Directory System Agents (DSAs). The distribution is transparent to the user. Each user or user-process is represented by a Directory User Agent (DUA) that is responsible for querying or interactively interrogating the directory. The DUA acts as a client to the directory. The DUAs are provided access to the DIB through DSAs; each DUA accesses only a few of the DSAs, typically one. The DSAs can communicate with each other. They are organized into Directory Management Domains, which are either Administrative (such as postal authorities) or Private (such as enterprises). The DAP defines the requests and responses between a DUA and a DSA, while the Directory System Protocol (DSP) defines the requests and responses between communicating DSAs.
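The way a Distinguished Name is formed by concatenating RDNs along the path from the root can be sketched in a few lines of Python. The slash-separated notation follows the example entries in Figure 5 rather than any standard string encoding of names, the helper names are hypothetical, and multi-attribute RDNs and character escaping are deliberately ignored.

```python
def build_dn(rdns):
    """Concatenate (attribute, value) RDNs, ordered from the root down, into a DN."""
    return "".join(f"/{attr}={value}" for attr, value in rdns)


def parse_dn(dn):
    """Split a DN string back into its ordered list of (attribute, value) RDNs."""
    parts = [p for p in dn.split("/") if p]
    return [tuple(p.split("=", 1)) for p in parts]


def parent_dn(dn):
    """Return the DN of the parent entry ("" for an entry directly under the root)."""
    return build_dn(parse_dn(dn)[:-1])


dn = build_dn([("C", "Canada"), ("O", "UW"), ("site", "bluebox"), ("cD", "CDS1")])
print(dn)             # the entry's globally unique name
print(parent_dn(dn))  # the name of the site entry that contains it
```

Because each RDN only has to be unique among its siblings, uniqueness of the full DN follows from the path, which is what `parent_dn` relies on when it drops the last component.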
Capturing Global Information in the Directory
To store the MDBS global information in the information repository we first map it to the X.500 information model, that is, we define the corresponding classes of DIT entries. Class definitions within X.500 specify the information that directory entries belonging to that class must or may contain. They also specify the relationships among the entries in the sense that class relationships (super-class, sub-class) impose rules on which entries of certain classes may precede or follow an entry in the DIT itself.
We assume that our directory follows the selected class definitions and the suggested class structure as defined in the standard. In this case, it makes sense to assume that a MDBS server, and the individual sites and CDSs are each controlled by a single (and possibly different) ORGANIZATION. This class specifies that any organization entry in the DIT must have a name and may have other information including telephone number, facsimile number, address, etc. If such information is provided for an entry, then it would not have to be explicitly provided for the MDBS entities that are part of the organization.
Figure 3 summarizes the structure relationships for the X.500 schema corresponding to the global information portrayed in Figure 2. In addition, the attributes for each class must be defined. The specification of some attributes is straightforward, such as applicationName, since definitions for these, that is, the definition of a name, exist within the directory standard itself. For attributes not found in the standard further work is required to specify their characteristics and to model them appropriately within the framework of the directory standard. A discussion of this effort is beyond the scope of the paper.
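The "must or may contain" rule for class definitions can be illustrated with a small Python sketch. The `EntryClass` type and its `check` method are hypothetical and are not part of X.500 or of the CORDS catalog software; the attribute names are taken from the CDS1 example entry in Figure 5, and attribute syntaxes and structure rules are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryClass:
    """A simplified class definition: attributes an entry must or may contain."""
    name: str
    mandatory: frozenset
    optional: frozenset = frozenset()

    def check(self, attributes):
        """Return a list of problems; an empty list means the entry conforms."""
        present = set(attributes)
        problems = [f"missing mandatory attribute: {a}"
                    for a in sorted(self.mandatory - present)]
        problems += [f"attribute not allowed by class: {a}"
                     for a in sorted(present - self.mandatory - self.optional)]
        return problems


# Attribute names follow the CDS1 example entry; the split is as shown there.
CDS = EntryClass(
    name="Cds",
    mandatory=frozenset({"cdsName", "class"}),
    optional=frozenset({"siteName", "exportSchemaName", "cdsType", "cdsProduct",
                        "remoteJoinFlag", "semiJoinFlag", "description"}),
)

ok = CDS.check({"cdsName": "/C=Canada/O=UW/site=bluebox/cD=CDS1",
                "class": "Cds", "semiJoinFlag": "y"})
bad = CDS.check({"class": "Cds", "colour": "red"})
print(ok)
print(bad)
```

Optional attributes that carry no value are simply left out of the dictionary, mirroring the remark that such attributes are absent from the directory information base rather than stored as empty strings.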
Examples Revisited
To illustrate how the definitions are used and how the global information is kept within the repository, we use the shared library example specified earlier in the paper. A DIT for this example might be structured as in Figure 4. Here, we let mA denote an MDBS application (MdbsAppl), cD denote a component data source (Cds), mS denote an MDBS schema (MdbsSchema), eS denote an export schema (ExpSchema), mT denote an MDBS table (MdbsTable), mC denote an MDBS column (MdbsColumn), eT denote an export table (ExpTable), and eC denote an export column (ExpColumn). The entries defining the University of Waterloo CDS, the export schema, the table Book and the column ISBN might be as shown in Figure 5.
The entry definition of the component data source, CDS1, specifies the host on which it resides, the export schemas from that source and other optional information, such as specific product databases. In this example, this information is empty. In the actual directory information base these attributes would be absent altogether; the directory schema definition specifies all possible attributes and the specific entries may or may not have values for the optional attributes. Mandatory attributes must be present at the time an entry is created.
Figure 3: X.500 Class Structure. (Classes shown: Organization, Mdbs, Site, MdbsError, MdbsAppl, MdbsSchema, Cds, CdsError, ExpSchema, AccessPlan, MdbsTable, ExpTable, MdbsConstraint, ExpConstraint, MdbsColumn, ExpColumn, ExpIndex.)
The definition of the export schema UWBooks specifies its export tables. Although these are presented as elements of a set, each is defined as a separate attribute-value pair within the directory information base. Thus a request to the directory for the tables associated with UWBooks would result in all export table attribute-value pairs being retrieved. The export columns defined for the entry Book would behave similarly. Note that in each case, the name of the entry is the complete global name for that entry within the DIT, which specifies the object uniquely within the global naming structure.
Information about the site on which the CDS resides is also required. The information can be found by accessing the appropriate DIB entry and could include the machine type, operating system, memory, etc. While much of this information is really of use in systems and network management, some of it can be of use to MDBS components, particularly the query optimizer. This interplay between information which might be routinely collected as part of systems monitoring, and that which may be of use to the multidatabase system, has been one of the interesting aspects of the CORDS project.
The entry representing the multidatabase application, and its schema, might be defined as in Figure 6. When SharedLibrary is executed, the MDBS client part of the application connects to the catalog to retrieve information about the application. The entry information for such an application may appear as shown in Figure 6. From this entry information, the MDBS client can retrieve the application's schema, namely WesternOntarioBooks. It could then retrieve the associated table and column definitions (not illustrated). The definition of the CDSs and the necessary export schema, table and column information can be found from their entries, as illustrated for CDS1. Actual network addresses of the sites on which the CDSs reside can then be retrieved as discussed.
Prototype Implementation
The prototype information repository must service very different types of requests, as indicated
C = Canada
O = UWO
O = UW
O = Queens
mdbs = Prototype1
site = bluebox
site = abbott
mA = SharedLibrary
cD = CDS1
cD = CDS2
mS = UWOBooks
eS = UWBooks
eS = QueensPubs
Figure 4: ExampleeT DIT. = Item
eT = Author
mT = Books
in our previous discussion on access requirements.

The Catalog API, which is the interface between the components of the MDBS and the repository, is a set of functions to retrieve and update particular entities and properties in the catalog. Each of the functions is mapped onto calls provided by a DUA.

The run-time components of the MDBS access the catalog through the functions in the Catalog API. One incompatibility that had to be resolved between the directory system and the MDBS was how schema objects are named. The MDBS, for efficiency reasons, assigns unique object identifiers (OIDs) to schema objects when they are first created and then uses the OIDs in subsequent accesses. Retrievals from a directory, on the other hand, are most efficient when based on the distinguished name of an object. We resolved the problem by creating a translation table for OIDs and distinguished names, which maps an OID to its corresponding distinguished name before a request to the directory is issued. The translation table is stored in the MDBS Catalog and loaded into memory when an application begins a session with the MDBS.

A tool called the MDBS View Builder supports the user during schema integration by providing a number of facilities to define the MDBS schemas and the mappings between schemas at different levels [7]. The main interface to the MDBS Catalog is a graphical browser which allows the user to explore the contents of the catalog. The browser currently interfaces with the information repository through the Catalog API.

Entry CDS1 {
  /* Mandatory attributes */
  cdsName="/C=Canada/O=UW/site=bluebox/cD=CDS1",
  class=Cds,
  /* Optional attributes */
  siteName="/C=Canada/O=UW/site=bluebox",
  exportSchemaName="/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks",
  cdsType="",
  cdsProduct="",
  remoteJoinFlag="n",
  semiJoinFlag="y",
  description="University of Waterloo library database."
}

Entry UWBooks {
  /* Mandatory attributes */
  exportSchemaName="/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks",
  class=ExpSchema,
  /* Optional attributes */
  tables={"/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks/eT=Book",
          "/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks/eT=Publisher",
          "/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks/eT=Author"},
  timestamp="",
  description="Library books export schema"
}

Entry Book {
  /* Mandatory attributes */
  tableName="/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks/eT=Book",
  class=ExpTable,
  column={"/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks/eT=Book/eC=ISBN",
          "/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks/eT=Book/eC=Title",
          "/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks/eT=Book/eC=PubNo"},
  /* Optional attributes */
  noOfColumns=3,
  noOfIndexes=2,
  description="Author table in UWBooks"
}

Entry ISBN {
  /* Mandatory attributes */
  columnName="/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks/eT=Book/eC=ISBN",
  class=ExpColumn,
  /* Optional attributes */
  domain="isbn-syntax-string",
  nullValuesFlag="n",
  primaryKeyFlag="y",
  foreignKeyFlag="y"
}

Figure 5: Example X.500 Definitions for Export Schema Objects.

Entry SharedLibrary {
  /* Mandatory attributes */
  mdbsApplicationName="/C=Canada/O=UWO/mdbs=Prototype1/mA=SharedLibrary",
  class=MdbsAppl,
  /* Optional attributes */
  MdbsSchema="/C=Canada/O=UWO/mdbs=Prototype1/mA=SharedLibrary/mS=WesternOntarioBooks",
  hostSite="/C=Canada/O=UWO",
  description="External selection of books"
}

Entry WesternOntarioBooks {
  /* Mandatory attributes */
  MdbsSchemaName="/C=Canada/O=UWO/mdbs=Prototype1/mA=SharedLibrary/mS=WesternOntarioBooks",
  /* Optional attributes */
  componentDataSource={"/C=Canada/O=UW/site=bluebox/cD=CDBS1",
                       "/C=Canada/O=Queens/site=abbott/cD=CDBS2"},
  tables="/C=Canada/O=UWO/mdbs=Prototype1/mA=SharedLibrary/mS=WesternOntarioBooks/mT=Books",
  timestamp="",
  description="Schema defining external library books"
}

Figure 6: Example X.500 Definitions for Application Schema Objects.

Prototype versus Ideal Properties

We return to the set of properties we proposed earlier for an ideal global information repository:

1. Rich data model:
   The directory system uses an object-based model. Objects in the DIB are created as instances of a class. Inheritance and aggregation relationships among the classes are supported.

2. Different types of data:
   The directory system supports a variety of attribute types, and users are able to define new attribute types when required.

3. Efficient support of static and dynamic data:
   We are able to handle both static and dynamic data in the directory system; however, it is not yet clear if we can achieve the efficiency necessary for a full-scale implementation of the MDBS. One way efficiency will be gained is by distributing and replicating data across multiple DSAs. Another CORDS project is also considering a number of modifications to the TDM directory system [8] to increase efficiency, including data caching at the DUA, simplified and less costly protocols, and less complex operations within DUAs. Further experimentation is required to adequately evaluate the efficiency of our approach.

4. Support for name resolution:
   The directory system provides support for resolving high-level names, and we provide support for OIDs as part of the catalog software.

5. Support for both querying and browsing:
   The directory system provides functions for basic types of queries. The most efficient queries involve searches based on the distinguished name. We have implemented a graphical browser and we are investigating the best approach to providing a sophisticated query interface to the information repository.

6. Support for distributed, replicated data and distributed transactions:
   The X.500 directory standard provides support for distributed and replicated data. The TDM directory system has been extended with a transaction API to support distributed transactions [8].
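As an illustration of the Catalog API layer and the OID translation table described in this section, the following is a minimal Python sketch, not the CORDS implementation. The class names `DirectoryUserAgent` and `CatalogAPI`, their method names, and the OID value 1001 are all hypothetical; the DUA is replaced by an in-memory stand-in, and only the distinguished name and attribute values are taken from the Book entry in Figure 5.

```python
class DirectoryUserAgent:
    """Hypothetical in-memory stand-in for a DUA (not a real X.500 API)."""

    def __init__(self):
        # Toy DIB: distinguished name -> dictionary of attributes.
        self._dib = {}

    def add_entry(self, dn, attributes):
        self._dib[dn] = dict(attributes)

    def read(self, dn):
        # Lookup by distinguished name; returns a copy of the entry.
        return dict(self._dib[dn])

    def modify(self, dn, changes):
        self._dib[dn].update(changes)


class CatalogAPI:
    """Sketch of a catalog layer whose functions map onto DUA calls."""

    def __init__(self, dua, translation_table):
        self._dua = dua
        # Translation table held in memory for the session: OID -> DN.
        self._oid_to_dn = dict(translation_table)

    def resolve(self, oid):
        # Translate the OID before any request to the directory is issued.
        return self._oid_to_dn[oid]

    def get_properties(self, oid):
        return self._dua.read(self.resolve(oid))

    def set_property(self, oid, name, value):
        self._dua.modify(self.resolve(oid), {name: value})


# Distinguished name and attribute values from the Book entry in Figure 5.
BOOK_DN = "/C=Canada/O=UW/site=bluebox/cD=CDS1/eS=UWBooks/eT=Book"
BOOK_OID = 1001  # hypothetical OID assigned by the MDBS

dua = DirectoryUserAgent()
dua.add_entry(BOOK_DN, {"class": "ExpTable", "noOfColumns": 3, "noOfIndexes": 2})

catalog = CatalogAPI(dua, {BOOK_OID: BOOK_DN})
props = catalog.get_properties(BOOK_OID)        # snapshot before the update
catalog.set_property(BOOK_OID, "noOfIndexes", 3)
```

In this sketch the run-time components would only ever handle OIDs, while every directory request uses the distinguished name, which the text identifies as the most efficient access path.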
Conclusions

The developers of cooperative information systems face a number of interesting challenges. One of these challenges is how to support convenient integrated access to distributed and heterogeneous legacy information systems. Multidatabase systems are the technology necessary to meet the challenge. There are, however, a number of problems that remain to be solved before MDBS technology is viable. We have examined one of these problems, namely storing and managing global information in a multidatabase system, and presented the solution adopted in the CORDS MDBS project. We discussed the demands an MDBS places on a global information repository, in terms of both the properties of the information and the types of accesses, and outlined an implementation of a repository based upon an X.500 directory system.

Using an information repository service to hold the MDBS catalog is different from traditional solutions, which store the database catalog either as an internal structure or as system tables in the database. Our approach to providing an efficient MDBS service is to have multiple MDBS servers distributed around the system. The MDBS catalog must be shared by the servers and will provide the glue to hold the system together. A global information repository implemented with an X.500 directory system is logically centralized but physically distributed, and so matches our view of the MDBS catalog.

The other reason for using a global information repository was a common need for such a service among several of the CORDS projects. Our list of essential properties of an ideal global information repository has relevance to other sources of global information in a CIS besides the MDBS. A global information repository with these properties can be a powerful tool for supporting cooperation among components of the CIS.

The global information repository can also be used to support users of a CIS. Large distributed systems are complex and users cannot, and should not have to, know about everything in the system. A facility to support information discovery in the CIS is therefore required. The repository, together with an interface to support querying and browsing of the information, can provide the necessary functionality.
References

[1] Attaluri, G., Bradshaw, D., Coburn, N., Larson, P., Martin, P., Silberschatz, A., Slonim, J. and Zhu, Q. "CORDS Multidatabase Project", submitted to the IBM Systems Journal (January 1994).
[2] Bauer, M., Coburn, N., Erickson, D., Finnigan, P., Hong, J., Larson, P. and Slonim, J. "An Integrated Architecture for Distributed Applications", Proc. 1993 CAS Conference, Toronto (October 1993), 8-26.
[3] Brodie, M. and Ceri, S. "On Intelligent and Cooperative Information Systems: A Workshop Summary", International Journal of Intelligent and Cooperative Information Systems 1(2), (June 1992), 249-290.
[4] CCITT, X.500 Directory Services 1992, CCITT, 1992.
[5] DCE User's Guide and Reference, Open Software Foundation, 1992.
[6] International Business Machines Corporation, Distributed Relational Database Architecture Reference, SC26-4651-0 edition, 1990.
[7] Martin, P. and Powley, W. "Database Integration using Multidatabase Views", Proc. 1993 CAS Conference, Toronto (October 1993), 779-788.
[8] Neufeld, G. and Brachman, B. "A Transactional API for the EAN X.500 Directory Service", Proc. 1992 CAS Conference, Toronto (November 1992), 81-92.
[9] Microsoft Inc., Open Database Connectivity System Development Kit, version 1, 1993.
[10] Slonim, J., Bauer, M. and Larson, P. "CORDS: Status and Directions", Proc. 1992 CAS Conference, Toronto (November 1992), 1-21.
[11] Transarc Corp., Encina Product Overview, Document Number TP-00-M235, Transarc Corp., 1991.