USING ENVIRONMENTAL INFORMATION EFFICIENTLY: SHARING DATA AND KNOWLEDGE FROM HETEROGENEOUS SOURCES

Ubbo Visser, Heiner Stuckenschmidt, Holger Wache, Thomas Vögele
TZI, Center for Computing Technologies, University of Bremen, D-28334 Bremen, Germany
E-mail: {visser|heiner|wache|vogele}@informatik.uni-bremen.de

1 INTRODUCTION
Environmental information systems have gained importance in both public administration and industry since the beginning of the 1990s. In public administration, for example, every state in the Federal Republic of Germany has developed some type of environmental information system. National and European legislation demanding far-reaching transparency about the state of the environment encouraged this development. In industry, on the other hand, environmental information systems are used for cost- and product-specific recording of waste flows, which in turn is used to point out weak points within a company's processes. Both application areas share the need to store and process large amounts of diverse data, which is often geographically distributed. Most environmental information systems use specific data models and databases for this purpose. This implies that making new data available to the system requires that the data be transferred into the system's specific data format, a process which is very time consuming and tedious. Data acquisition, whether automatic or semi-automatic, often makes large-scale investment in technical infrastructure and/or manpower inevitable. These obstacles are some of the reasons behind the concept of information sharing: existing information can be accessed by remote systems in order to supplement their own data basis. The advantages of successful information sharing are thus obvious for many reasons:
• Quality improvement of data due to the availability of larger and more complete data.
• Improvement of existing analyses and application of new analyses.
• Cost reduction resulting from multiple use of existing information sources.
• Avoidance of redundant data and of conflicts that can arise from redundancy.
However, in order to establish efficient information sharing, difficulties arising from organizational and competence questions and many other technical problems have to be solved. Firstly, a suitable information source must be located which contains the data needed for a given task. Once the information source has been found, access to the data therein has to be provided, both on a technical and on an informational level. In short, information sharing not only needs to provide full accessibility to the data, it also requires that the accessed data can be interpreted by the remote system. While the problem of providing access to information has been largely solved by the invention of large-scale computer networks, the problem of processing and interpreting retrieved information remains an important research topic. This paper will address three of the problems mentioned above:
• finding suitable information sources,
• enabling a remote system to process accessed data, and
• helping the remote system to interpret accessed data as intended by its source.
In addressing these questions we will explore: meta information systems, data exchange standards, converter and mediator systems and techniques in the area of intelligent information integration, always bearing in mind the special needs of environmental information.
2 META INFORMATION SYSTEMS
Meta information systems are tools to organize and document the large and ever growing resources of environmental and geospatial data compiled by government agencies, academia, non-government organizations, and industry. Their main purpose is to provide an overview of the available information sources and to support manual and semi-automatic searches for environmental information. Good examples of national and international meta information systems are the German/Austrian Umweltdatenkatalog (UDK) and the European Catalogue of Data Sources (CDS).

The UDK was created in the nineties to improve access to environmental and geospatial data sources held mainly by government agencies in Germany and Austria (Günther, 1998). (Legat et al., 1999) describe their five years of experience with the Austrian UDK (the most advanced UDK) and conclude that, with the help of the system, the process of finding and administering data has improved drastically.

The CDS (ETC/CDS, 2000) is a locator system for environmental data in Europe. It collects and maintains environmental meta information from all 18 countries participating in the European Environment Agency (EEA) and, in the future, the 13 PHARE countries (PHARE is a European Commission programme, the main channel for the European Union's financial and technical cooperation with the countries of central and eastern Europe). The CDS is maintained by the Topic Centre on Catalogue of Data Sources (ETC/CDS), based in Hannover, Germany. The CDS supplies answers to questions such as:
• "WHO is responsible for WHAT information and data?"
• "In WHICH form and WHERE do the data exist?"
• "HOW can the data be accessed?"
The CDS data model (EEA, 1996) consists of a three-level architecture: the core level is recommended for use by all national implementations, while levels 1 and 2 allow the implementation of additional attributes based on specific needs of the referenced data sources or national authorities. The CDS can be accessed over the Internet through a special interface, the WebCDS. Meta data suppliers may use a Windows application, WinCDS, to register data sources or alter meta data entries. Both search and maintenance are supported by GEMET, a multilingual environmental thesaurus (see below and Table 2).

Meta information systems like the UDK or the CDS do not actually hold environmental and/or geospatial data sets, but provide and manage information, i.e. meta data, about sources of environmental and/or geospatial data. To be useful, such meta information has to describe a number of important attributes of the listed data sources:
• Availability (Which data exist for specific geographic locations?),
• Fitness of use (Do these data meet the requirements for specific, user-defined applications?),
• Access (How can the data be accessed by the user?), and
• Transfer (How can the data be transferred to the user?) (Tschangho, 1999).
Meta information systems give the user the advantage of shortening the search for data, information or knowledge. However, the user must still transform the required data into his or her own format, and the integration process is not always supported by the user's system. Therefore, tools are required to achieve interoperability of the data sources listed in a meta information system. Problems that might arise due to the heterogeneity of the data are already well known within the distributed database systems community (e.g. (Kim & Seo, 1991), (Kashyap & Sheth, 1995) and (Saltor & Rodriguez, 1997)). In general, heterogeneity problems can be divided into three categories: (a) syntax (e.g. data format heterogeneity), (b) structure (e.g. homonyms, synonyms or different attributes in database tables), and (c) semantics (e.g. the intended meaning of terms in a special context or application). Attempts have been made to improve catalog interoperability; they can be classified into several groups: (a) standardization, (b) development of interchange protocols, and (c) mediation. These groups will be described below.
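To make the three categories of heterogeneity more concrete, the following sketch contrasts two hypothetical representations of the same air-quality measurement. All element and field names are invented for illustration and are not taken from any of the systems discussed in this chapter; the sketch only shows why a purely structural mapping is not enough.

    import xml.etree.ElementTree as ET

    # Source A: an XML record (hypothetical element names) -- one syntax.
    record_a = ET.fromstring(
        "<measurement><station>Bremen-Nord</station>"
        "<pollutant>NO2</pollutant><value unit='ug/m3'>42.5</value></measurement>"
    )

    # Source B: the same information as a Python dict with different field
    # names (structural heterogeneity) and a different unit convention
    # (a semantic difference that a pure format converter misses).
    record_b = {"site": "Bremen-Nord", "substance": "nitrogen dioxide", "conc_mg_m3": 0.0425}

    # A hand-written structural mapping: synonym field names are aligned.
    FIELD_MAP = {"site": "station", "substance": "pollutant", "conc_mg_m3": "value"}

    def to_common_form(record: dict) -> dict:
        """Rename source-B fields to the source-A vocabulary (structure only)."""
        return {FIELD_MAP[k]: v for k, v in record.items()}

    print(to_common_form(record_b))
    # Note: the unit mismatch (mg/m3 vs. ug/m3) and the synonym pair
    # "NO2" vs. "nitrogen dioxide" remain unresolved -- these are the
    # semantic problems addressed in section 4.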
2.1 Standardization
Standardization efforts can be subdivided into efforts to define standardized models for both data and meta data, i.e. syntactic standardization, and attempts to provide a common language for actually implementing meta data, i.e. structural and, to a lesser extent, semantic standardization.

2.1.1 Syntactic Standardization
To address syntactic standardization, a number of meta data standards have been developed world-wide during the last decade (Hatton, 1997). Some of them originated in other fields and were adopted by the environmental community, others are the result of specific efforts to create standards for environmental and/or geospatial meta data. The table below shows a selection of the most important national and international standards.

Table 1: Selection of important meta data standards

Name | Organization | Country | Reference
Content Standards for Digital Geospatial Metadata (CSDGM) | Federal Geographic Data Committee (FGDC) | USA | (FDGC, 1994)
Directory Interchange Format (DIF) | Canada Centre for Remote Sensing (CCRS) | Canada, USA | (NASA, 1999)
Dublin Core | OCLC/NCSA International Metadata Workshop | International | (Dublin Core Metadata Initiative, 2000)
ANZLIC | ANZLIC Spatial Information Council | Australia, New Zealand | http://www.anzlic.org.au
CEN TC287 | CEN | EU | http://www.cenorm.be
Transfer Standard for Digital Hydrographic Data | International Hydrographic Organization (IHO) | International | (IHO, 1994)
Digital Geographic Information Exchange Standard (DIGEST) | DGIWG (NATO) | International | (DMA, 1991)
ISO/TC211 (ISO 19100) | ISO/TC 211 | International | http://www.statkart.no/isotc211/
OGC Abstract Specification – Topic 11 | Open GIS Consortium (OGC) | International | (OGC, 1999)
Recommendations on Metadata (V 2.0) | Centre for Earth Observation Programme (CEO) | EU | –
The Dublin Core, for example, was developed by an international working group of librarians to organize electronic libraries (Weibel, Godby, Miller, & Daniel, 1995), (Weibel, 1999). Since 1996, the Dublin Core has provided a set of 15 core meta data elements that can be extended to fulfill the needs of specific user communities. Its intended coarseness offers a compromise between thorough but expensive manual annotation and error-prone but cheap automatic indexing, which has led to its widespread acceptance and to the use of the standard in a number of formal resource description communities such as museums, libraries, government agencies, commercial organizations, and the environmental community.

One example of meta data standards that originated in the environmental community and were specifically designed for environmental and geospatial data is the Content Standards for Digital Geospatial Metadata (CSDGM), developed by the Federal Geographic Data Committee (FGDC) (FDGC, 1994). The Content Standard data model defines seven feature groups, of which only the first (identification information) and the last (metadata reference information) are mandatory. The other five (data quality information, spatial data organization information, spatial reference information, entity and attribute information, and distribution information) are optional. The comprehensive standards describe nearly 300 separate elements and provide a solid basis for both geographic and environmental meta data systems (Günther, 1998). In the US, all government agencies have been required to use the FGDC Content Standards for documenting geospatial data collected or produced as of April 1995.

More recent developments are the recommendations for meta data standards issued by the Open GIS Consortium (OGC, 1999) and the work of the International Organization for Standardization (ISO) Technical Committee 211 (Geographic information/Geomatics). ISO/TC 211 has a close working relationship with the OGC and is the Technical Committee tasked by ISO to prepare a family of geographic information standards in cooperation with other ISO technical committees preparing related standards (e.g. information technology standards) (webref: http://www.statkart.no/isotc211/). ISO has a wide global commitment, and each national representation covers many different major national players (governments, authorities, industry and organizations), including most data producers and many users and developers of applications. ISO/TC 211 is presently working on an integrated set of 20 different standards and now has 25 active member nations across the world and 16 observing member nations. In addition, there are 19 liaison organizations, of which OGC was one of the first. Two other liaison organizations that will base their future revisions on ISO/TC 211 standards are the North Atlantic Treaty Organization (NATO), through its Digital Geographic Information Working Group (DGIWG), and the maritime community, through the International Hydrographic Organization (IHO).

The underlying concept behind most efforts to develop syntactic standards for meta data is the idea of creating tools generalized enough to satisfy as many heterogeneous user communities as possible. With respect to a limited set of core elements, this seems to be at least partially achievable: among eight of the most commonly used international meta data standards, six hold comparable information in the following areas (a minimal record grouped by these areas is sketched after the list):
• Identification (title, author, area coverage, themes, age, restrictions),
• Data quality (accuracy, completeness, logical consistency, origin of data),
• Spatial reference (projection, datum),
• Distribution (formats, media), and
• Metadata reference (age of metadata) (Tschangho, 1999).
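To make these shared core areas concrete, the following sketch shows a minimal meta data record as a plain Python data structure. All field names and values are invented and the grouping merely mirrors the five areas listed above, not the layout of any particular standard.

    # A hypothetical, standard-neutral meta data record grouped by the
    # five core areas shared by most meta data standards.
    metadata_record = {
        "identification": {
            "title": "Air quality measurements, city of Bremen",
            "author": "Hypothetical Environmental Agency",
            "area_coverage": "Bremen, Germany",
            "themes": ["air quality", "NO2"],
            "age": "1999-2000",
            "restrictions": "none",
        },
        "data_quality": {
            "accuracy": "+/- 5 %",
            "completeness": "98 % of hourly values",
            "lineage": "automatic monitoring network",
        },
        "spatial_reference": {"projection": "Gauss-Krueger", "datum": "DHDN"},
        "distribution": {"formats": ["CSV", "XML"], "media": "online"},
        "metadata_reference": {"metadata_date": "2000-06-01"},
    }

    # A catalogue search could then filter on any of these attributes, e.g.:
    matches_theme = "NO2" in metadata_record["identification"]["themes"]
    print(matches_theme)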
2.1.2 Structural and Semantic Standardization
The second aspect of standardization is structural and semantic integration. This is achieved through the development of a standardized terminology which is used to fill the meta data models with information. Here, the development of thesauri (for attribute data) and gazetteers (for geographic data) plays an important role. An example of a large-scale, multilingual thesaurus for environmental terms is GEMET, the General Multilingual Environmental Thesaurus (Nax & Lethen, 1999). GEMET is the result of a cooperation between a number of international institutions on a European level; the German Umweltbundesamt and the Italian Consiglio Nazionale delle Ricerche coordinated the effort. The thesaurus is a combination of several national (Italy, Germany, Spain, Netherlands) and international (UNEP, European Parliament) thesauri and contains more than 5,400 terms and definitions in 12 European languages. GEMET has a (limited) polyhierarchical structure and is organized into groups, themes, terms (narrower, broader) and, where available, synonyms or alternate terms. Among others, GEMET is the basis for the meta data implementation in the CDS (see above). Table 2 lists a number of large, mostly multilingual environmental thesauri currently available.

Table 2: List of environmental thesauri

Name | Organization | Languages | Reference
UDK Thesaurus | UBA (Austria) | German | http://www.cedar.univie.ac.at/wgr_home/
GEMET | CDS/ETC | English, French, German, Spanish, Italian, Norwegian, Danish, Dutch, Finnish, Greek, Swedish, Portuguese | (Nax & Lethen, 1999)
Multilingual Thesaurus for Environment and Development | UN | English, French, Spanish, Russian, Arabic, Chinese | (UN, 1992)
Multilingual Thesaurus in Geosciences | IUGS | English, French, German, Russian, Spanish, Italian | http://www.iugs.org/
Dutch Milieu Thesaurus | Netherlands Agency for Research Information (NBOI) | English, French, German, Spanish, Italian, Norwegian, Danish, Dutch | http://www.hgur.se/envir/miljo21/littips/0074.html
INFOTERRA Thesaurus | UNEP | English | http://www.csa.com/routenet/read/infoterrathesauraus.html
2.1.3 Meta Data Interoperability
Although progress has been made with respect to syntactic, structural and semantic standardization, the large number of different meta data standards and environmental thesauri illustrates a key problem of all standardization efforts: the user communities in the environmental field tend to be too heterogeneous to allow the creation of a single data model or a single terminology that satisfies all users. As a result, meta data standards and thesauri are used to support information exchange within specific user communities and/or, if there is enough thematic overlap, within small clusters of communities. A general information exchange between different user communities, however, remains an unsolved problem. One approach to this problem is the use of several meta data standards in parallel. The Warwick Framework (Dempsey & Weibel, 1996), for example, acknowledges the co-existence of multiple meta data standards and consequently defines an architecture for the aggregation and simultaneous use of multiple packages of meta data. Another, highly popular approach is the creation of data clearinghouses. These online systems use the Internet and the World Wide Web to access multiple hosts of environmental and/or geospatial data and meta data. They allow users to conduct simultaneous searches, retrieve meta data and, in some cases, access the actual data sets. The basis for such systems are special data interchange protocols and mediators.
2.2 Data Interchange Protocols and Mediation
Data interchange protocols are filter functions that allow the connection of two or more meta information systems or data sources. They can exchange (meta) data between the systems and supply support functions such as simultaneous browsing access and data retrieval. The objective of these protocols is to enable users to search many physically distributed data catalogs without having to interrogate each one separately and manually correlate the different sets of search results, thus effectively allowing all the data archives to appear as one database. Examples of meta data interchange protocols are the Catalogue Interoperability Protocol (CIP) (CEOS, 1996), (CEOS, 1997) and the Z39.50 protocol (Lynch, 1997). Recently, emerging W3C standards like the Extensible Markup Language (XML) (W3C, 1999b) and the Resource Description Framework (RDF) (W3C, 1999c) have gained considerable momentum as data interchange formats (see also section 3.1).

CIP was developed as part of an international collaborative effort led by the European Space Agency (ESA), the US National Aeronautics and Space Administration (NASA), and the Deutsches Zentrum für Luft- und Raumfahrt (DLR), with contributions from the other members of the Committee on Earth Observation Satellites (CEOS). The CIP-C specification was ratified in 1997 and defines the methods by which data retrieval (location, requesting and delivery) is to be performed and how data requests, queries, and search results are to be interpreted. CIP is used by the European Environmental Information Services (EEIS) to interconnect the EEA's CDS with INFEO, the INFormation on Earth Observation catalog supported by the Centre for Earth Observation (CEO).

The need of early digital libraries to share bibliographic information led to the development of the Z39.50 protocol. Today, Z39.50 is an internationally recognized standard (ISO 23950) and widely used as an information search and retrieval protocol which defines the interactions between client and server computers in integrated online libraries. Z39.50 provides field-level and full-text queries of attributes and can conduct stateful sessions. In the environmental domain, Z39.50 and CIP have been used in a number of integrated systems, like the Australian Spatial Data Directory (ASDD), the G7 Global Environmental Information Locator Service (GELOS), and the NASA Global Change Master Directory. Other examples of distributed environmental meta information systems are the Australia New Zealand Land Information Council (ANZLIC), the Australian Spatial Data Directory (ASDD), and the FGDC's Geospatial Data Clearinghouse. The Data Clearinghouse allows a thematic and geographic search for geospatial data provided by more than 180 online databases, most of which are located in the US, Canada and Australia.
Table 3 lists some of the most important national and international online data clearinghouses for environmental and geospatial data.

Table 3: Online data clearinghouses for environmental and geospatial data

Name | Organization | Scope | Reference
German Environmental Information Network (GEIN2000) | German Environmental Agency (UBA) | Environmental and geospatial data, Germany | http://www.gein.de/
US National Geospatial Data Clearinghouse | FGDC | Geospatial data, globally | http://www.fgdc.gov/clearinghouse/
Government Information Locator (GILS) | U.S. Government Printing Office (GPO) | Public administration data, USA | http://www.access.gpo.gov/su_docs/gils/gils.html
Australian Spatial Data Directory (ASDD) | Australian Spatial Data Infrastructure (ASDI) | Geospatial data, Australia | http://www.environment.gov.au/net/asdd/
European Environmental Information Services (EEIS) | Project funded by the Telematics Application Programme (DGXIII) | Environment and earth observation data, Europe | http://eeis.ceo.sai.jrc.it/
INFormation on Earth Observation (INFEO) | Centre for Earth Observation (CEO) | Earth observation data, globally | http://www.infeo.org/
Global Environmental Information Locator Service (GELOS) | Centre for Earth Observation (CEO) | Environment and natural resources data, globally | http://ceo.gelos.org/
Global Change Master Directory | NASA | Earth science data, globally | http://gcmd.gsfc.nasa.gov/
The German Environmental Information Network (GEIN2000) is currently being developed by the German Environmental Protection Agency (UBA) as an online information broker for distributed sources of environmental information in Germany (Wassem-Gutensohn, 2000). As part of the German "Umwelt 2000" initiative, GEIN2000 is designed as a public resource of environmental information. It includes a geographic thesaurus (currently under development by the SIMA Group and GISU) with more than 50,000 geographic names that will enable users of GEIN2000 to conduct true geographic searches. In addition, GEIN2000 is one of the first systems to make extensive use of the new WWW-based data-exchange standards XML (W3C, 1999b) and RDF (W3C, 1999c).
3 SYNTACTIC APPROACHES
Syntactic approaches can help to overcome the problems of syntactical and structural integration described in section 2. In principle, syntactic approaches are based on two different ideas, namely standardization and the use of data converters.
3.1 Standardized Languages for Data Exchange
Due to the increasing importance of the Internet and the World Wide Web (WWW) for the exchange of data, standards for new data-interchange languages are often based on the work of the World Wide Web Consortium (W3C). The Hypertext Markup Language (HTML) (W3C, 1999a) is by now well established as the leading standard for data exchange in the WWW. Recently, a number of new standards have emerged that are intended to overcome some of the limitations of HTML. In this paper, our main focus is on the Extensible Markup Language (XML) (W3C, 1999b) and its use for the exchange of environmental and geospatial data, as well as on the Resource Description Framework (RDF) (W3C, 1999c) (see section 4.1.3).

3.1.1 XML: Proposed Standard for Web-based Information Sharing
In order to overcome the purely visualization-oriented annotation provided by HTML, XML was proposed as an extensible language that separates data storage from data visualization. In XML, the user is allowed to define his or her own syntax rules ("tags") and data structures. The main benefit of XML is therefore its capability to exchange data in a structured way. Recently, the introduction of XML schemas ((W3C, 1999d), (W3C, 1999e)) as a definition language for data structures has emphasized this idea.
In the following section we sketch the underlying principles of XML and describe XML schema definitions and their potential use for data exchange.

The General Idea: A data object is said to be an XML document if it follows the respective W3C specification (W3C, 1999b). The specification provides a formal grammar for so-called "well-formed" XML documents. This grammar specifies that XML data sets consist of trees of nested elements. The elements are written in an HTML-like syntax that allows users to define arbitrarily named tags. However, unlike in HTML, each element has to be delimited by a start and an end tag. In addition to this general grammar, the user can impose further grammatical constraints on the structure of a document using a document type definition (DTD). An XML document is called "valid" if it has an associated type definition and complies with the grammatical constraints of that definition. A DTD specifies the elements that can be used within an XML document. Furthermore, each element may have a set of attribute specifications consisting of a name and a value. The additional constraints in a document type definition refer to the logical structure of the document; this specifically includes which nesting of tags inside the information body is allowed and/or required. Further restrictions that can be expressed in a document type definition concern the types of attributes and the default values to be used when no attribute value is provided. At this point, we ignore the original way DTDs are defined, because XML schemas, which are described next, provide a much more comprehensible way of defining the structure of an XML document.

Schema Definitions and Mappings: An XML schema is itself an XML document defining the valid structure of XML documents in the spirit of a DTD. The elements used in a schema definition are of the type 'element' and have attributes that define the restrictions already mentioned. The information within such an element is simply a list of further element definitions that have to be nested inside the defined element. Additionally, XML schema has other features that are very useful for defining data structures (a small example follows the list below):
• support for basic data types,
• constraints on attributes (e.g. occurrence constraints),
• sophisticated structures (e.g. definitions derived by extending or restricting), and
• a name-space mechanism allowing the combination of different schemas.
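The following sketch illustrates the mechanism with a deliberately small, hypothetical schema for an environmental measurement record. The element names are invented, and the validation step assumes the third-party lxml package is installed; it is a sketch of the general idea, not part of any of the standards discussed here.

    from lxml import etree  # third-party package, assumed to be installed

    # A minimal XML schema: one 'measurement' element with typed children.
    XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="measurement">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="station" type="xs:string"/>
            <xs:element name="parameter" type="xs:string"/>
            <xs:element name="value" type="xs:decimal"/>
            <xs:element name="unit" type="xs:string" minOccurs="0"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>"""

    # An instance document that commits to this schema.
    DOC = b"""<measurement>
      <station>Bremen-Nord</station>
      <parameter>NO2</parameter>
      <value>42.5</value>
      <unit>ug/m3</unit>
    </measurement>"""

    schema = etree.XMLSchema(etree.fromstring(XSD))
    doc = etree.fromstring(DOC)
    print(schema.validate(doc))  # True: the document conforms to the schema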
We will not discuss these features in detail here. However, it should be mentioned that the additional features make it possible to encode rather complex data structures. This enables us to map the data model of an application, whose information we wish to share with others, onto an XML schema. Once mapped, we can encode our information in terms of an XML document and make it (combined with the XML schema document) available over the Internet. This method has great potential for the actual exchange of data. However, the user must commit to our data model in order to make use of the information. It has to be pointed out that
• XML is purely syntactic/structural in nature, and that
• XML describes data on the object level.
Consequently, if we want to describe information on the meta level and include semantic information, we have to look for other approaches that extend XML. The RDF standard was developed to fill this gap and has been proposed as a data model for representing meta-data about web pages and their content using XML syntax. We will discuss the RDF standard in section 4.1.3 of this chapter.

3.1.2 XML Applications in the Environmental Field
Many experts in the environmental field have realized the potential of XML as an exchange format for domain-specific data. A number of researchers, organizations, and private firms are currently developing XML-based standards for different environmental and geospatial applications. The Open GIS Consortium (OGC) is working on the development of the Geography Markup Language (GML) (OGC, 2000), an XML-based common terminology for geographic data. The GML specification defines features and syntax used to encode geographic information in XML. The scope of
GML is to allow the transport and storage of geographic information, including both the properties and the geometry of geographic features. In Germany, a consortium of scientific and application-oriented working groups is currently developing an Environmental Markup Language (EML). EML is a set of recommendations concerning the national and international usage of XML in the communication of environmental information (Arndt, Christ, & Günther, 2000). Some private firms have already realized the business opportunities that come with the development of XML-based data exchange standards for environmental data. An example is GeoPraxic Inc., a small firm based in Sonoma, California, that recently released a "Green Building XML (gbXML)" schema. This XML-based standard was designed to be a data-exchange solution for architects, building designers, CAD developers, and product manufacturers who want to incorporate environmental principles in their designs, tools, and products (Colon, 2000).

Geospatial data sets typically describe 3-dimensional, real-world features with respect to a specific 2-dimensional (map) representation. To do this, they use a combination of attribute data, describing the feature's properties, and graphical data, describing the feature's 2-d map representation in terms of shape, color, size etc. The XML-based standards described above are well suited to facilitate the exchange of attribute data. However, to fully support the exchange of complete geospatial data sets, standards that deal specifically with graphical data are needed as well. The W3C recently released a first draft of Scalable Vector Graphics (SVG) (W3C, 1999f), an XML-based standard that extends the older Vector Markup Language (VML) (W3C, 1998). SVG is a language for describing two-dimensional graphics in XML. SVG allows for three types of graphic objects: vector graphic shapes (e.g., paths consisting of straight lines and curves), images and text. Graphical objects can be grouped, styled, transformed and composited into previously rendered objects. The feature set includes nested transformations, clipping paths, alpha masks, filter effects, template objects and extensibility. Although SVG's main purpose is to allow the creation of high-quality graphics and animation sequences for the WWW, it offers considerable potential as an exchange language for geospatial vector graphics, such as maps (a minimal sketch is given below).

Although the belief that XML will be the future standard of the WWW is widely shared in the IT community, it is still very unclear whether individual domain-specific standards like the ones presented above will survive in the long run. This depends very much on the acceptance these standards will be able to achieve, both on a national and an international level.
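To give a flavour of how a simple map feature could be exchanged as an SVG vector graphic, the following sketch generates a minimal SVG document with Python's standard library. The polygon coordinates are invented and no real map projection is applied; the example only illustrates the kind of graphical payload SVG can carry.

    import xml.etree.ElementTree as ET

    SVG_NS = "http://www.w3.org/2000/svg"
    ET.register_namespace("", SVG_NS)

    # A minimal SVG document containing one map-like polygon (e.g. a lake outline).
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "200", "height": "200"})
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}polygon",
        {
            "points": "20,30 120,40 160,120 60,150",  # invented vertex coordinates
            "fill": "lightblue",
            "stroke": "black",
        },
    )

    print(ET.tostring(svg, encoding="unicode"))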
3.2 Applications for Information Sharing
The main problem of standardization is that it is almost impossible to reach a consensus on a single standard. As a result, there are many small sets of competing standards (for example, the OII page lists more than 30 standards for the exchange of geographic information (webref.: OII: http://www.echo.lu/oii/)). However, these small sets of standards can serve as a basis for the exchange of information and for information sharing. In principle, applications for information sharing can be divided into two main streams: converters and middleware components (e.g. mediators).

3.2.1 Converters
The small sets of competing standards are a good example of how standardization can serve as a basis for the development of converters that translate one data format into another. Here we can distinguish specific translators that are tailored to fixed translation procedures; import routines in existing applications are typical examples of such hard-coded translators. A more promising approach is to develop generic translation programs that can be adapted to a specific translation task (a simplified sketch of this idea follows below). The Feature Manipulation Engine (FME) (FME, 1999), originally developed for the Canadian Government, is an excellent example. FME is emerging as a de facto standard in the industry for sharing geospatial data between diverse applications (Cosentino webref.). Underlying the engine is a rich data model, which is internally consistent and inherently extensible. FME has proven its benefits as the core of a number of applications, such as the Geo-Task Server (Huber, 1998).
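The following sketch illustrates the difference between a hard-coded and a generic, mapping-driven converter. It is a deliberately simplified Python illustration; the format names and field mappings are invented and have nothing to do with FME's actual data model or configuration language.

    import csv
    import io
    import json

    # A declarative translation specification: source field -> target field.
    # Adapting the converter to a new pair of formats only requires a new
    # spec, not new code -- this is the idea behind generic translators.
    SPEC = {"site": "station", "substance": "parameter", "conc": "value"}

    def convert_csv_to_json(csv_text: str, spec: dict) -> str:
        """Translate a CSV export into a JSON document using a field mapping."""
        rows = []
        for row in csv.DictReader(io.StringIO(csv_text)):
            rows.append({spec[field]: value for field, value in row.items() if field in spec})
        return json.dumps(rows, indent=2)

    csv_export = "site,substance,conc\nBremen-Nord,NO2,42.5\n"
    print(convert_csv_to_json(csv_export, SPEC))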
The main problem with converters, however, is that they only convert one format into another. They cannot combine and integrate information that is available in different sources, and therefore they normally cannot complete incomplete information. For such requirements a middleware architecture is the more appropriate approach.

3.2.2 Middleware Components
Currently used software system architectures are based on a client/server model. Middleware breaks up the classical client-server architecture by introducing an additional layer between the server and the client. Although a precise definition of middleware is still missing, from a functional point of view middleware removes functionality from the client and/or server in order to make that functionality or information available to several other components. An example of a first step towards middleware components are application servers. In this case the server is split into a pure database server and a set of equivalent application servers providing the application functionality to the clients. Because all information is stored in one database and the application servers normally do not store any data, several application servers can be invoked. This architecture alleviates the server bottleneck of the classical client-server architecture.

Another prominent example of a middleware component is CORBA (OMG, 1992). At its core, CORBA allows object-centered access to distributed servers. The servers provide the objects which the clients use. The object request broker (ORB) connects the client with the requested server object. CORBA is platform independent because the object interfaces are described in its own language (the Interface Definition Language, IDL), which is mapped to several object-oriented programming languages (e.g. C++, Java, etc.). Furthermore, CORBA is application independent and can serve as a basis for information sharing, e.g. in the environmental domain. In addition, CORBA provides a number of standardized services and modules, e.g. the object transaction manager, which introduces transaction services at the object communication level. In practice, CORBA is an excellent platform on which to develop middleware components, but because it only provides a basis for object communication, many implementation tasks remain for an information sharing application, e.g. in the environmental domain. For example, the completion of incomplete information has to be hand-coded in a middleware component that uses CORBA for easy, object-centered access to distributed information sources. In general, such a middleware architecture is an excellent basis for information sharing: the middleware collects the information stored in different information systems, combines it, and provides the completed information to several clients. One such middleware component is the mediator, which is presented in the next section.

3.2.3 Mediators
In contrast to application servers, mediators do not provide application functionality but (read) access to shared information stored in different heterogeneous information sources. A mediator "combines, integrates and abstracts the information" (Wiederhold, 1992). One important advantage of mediators is that the knowledge of how to access distributed information is removed from the clients. Suppose that several different clients use one mediator. If a new information source becomes available, disappears, or simply changes its data structure, normally only the mediator, not the clients, has to be adapted to this modification. From the client's point of view a mediator acts as a "virtual information source": it provides an integrated, global view of several information sources. Clients formulate queries against this view. The mediator then divides a received query into a set of subqueries for the underlying information sources in order to collect the information needed to assemble the requested answer.

A mediator is comparable to a federated database system (e.g. (Sheth & Larson, 1990), (Ahmed et al., 1991), (Litwin, Mark, & Roussopoulos, 1990)), where several autonomous databases have to be integrated. The problems in the development of a mediator are therefore known from federated database systems (Kim & Seo, 1991). Mediators, however, relax the requirement that the underlying information sources be database systems: the sources can also provide semi-structured information, like a collection of Web pages, or unstructured documents, like information retrieval systems. This relaxation is possible because of the introduction of wrappers. Normally a mediator must access different information sources with different interfaces. In order to free the mediator from the specifics of the different interfaces, the information sources are encapsulated by wrappers. These wrappers provide a uniform, normally object-centered interface together with an appropriate query language. The task of a wrapper is to map the object-centered data structure to the data structure of the information source and to answer the received queries with the help of the information system. Because mediators can provide the same interface as wrappers, the resulting architecture can be cascaded, as illustrated in Figure 1 (a simplified code sketch of the wrapper/mediator interplay follows the figure).
Figure 1: Wrappers. (The figure shows a cascaded architecture: mediators sit on top of other mediators and wrappers, and the wrappers encapsulate heterogeneous sources such as a database, HTML pages, and pictures.)
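As a rough illustration of the wrapper/mediator interplay described above, the following Python sketch defines a common wrapper interface, two wrappers over hypothetical sources, and a mediator that splits a query and combines the partial answers. All class and field names are invented; real systems such as TSIMMIS use far richer query languages and data models.

    from abc import ABC, abstractmethod

    class Wrapper(ABC):
        """Uniform, object-centered interface that hides a source's specifics."""
        @abstractmethod
        def query(self, station: str) -> dict:
            ...

    class MeasurementDBWrapper(Wrapper):
        """Wraps a (hypothetical) measurement database."""
        def query(self, station: str) -> dict:
            return {"station": station, "parameter": "NO2", "value": 42.5}

    class StationWebWrapper(Wrapper):
        """Wraps (hypothetical) semi-structured web pages about stations."""
        def query(self, station: str) -> dict:
            return {"station": station, "operator": "City of Bremen", "lat": 53.18, "lon": 8.73}

    class Mediator(Wrapper):
        """Offers one global view; splits a query into subqueries and merges results."""
        def __init__(self, sources: list):
            self.sources = sources

        def query(self, station: str) -> dict:
            answer = {}
            for source in self.sources:   # one subquery per underlying source
                answer.update(source.query(station))
            return answer

    # Because the mediator implements the same interface as a wrapper,
    # mediators can be cascaded as shown in Figure 1.
    mediator = Mediator([MeasurementDBWrapper(), StationWebWrapper()])
    print(mediator.query("Bremen-Nord"))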
Specification and Generation of Mediators: One important benefit of mediators and wrappers is that these components need not be hard-coded but can be generated automatically. Such generators use a declarative specification as input and produce the desired component. A mediator generator can be configured with the help of a high-level transformation language. For example, the TSIMMIS mediator is configured by declarative rules (Papakonstantinou, Garcia-Molina, & Ullman, 1996). The rules specify how the global view provided by the mediator is mapped to the information sources, i.e. how an object of the global view can be combined from the objects of some information sources. In TSIMMIS, the rules also reflect the relaxation to semi-structured and incomplete information. In general these rules simplify the integration task and allow the user to concentrate on the real problems of an integration task: the reconciliation of syntactic and structural heterogeneity problems. As a result, the integration costs are reduced dramatically compared to a hard-coded mediator. Adaptations of the mediator (e.g. introducing a new information source or removing one that has disappeared) can also easily be carried out by adapting the appropriate rules. A simplified sketch of such a rule-driven mediator follows below.

Current open problems with mediators: Although the technical problem of dividing the global-view query into several subqueries for the underlying information sources is well understood, some open problems remain. One is the optimization of the subqueries, i.e. finding an optimal ordering of the subqueries. It is obvious that an imperfect ordering can heavily influence the efficiency of the mediator, but an optimal ordering requires meta information about the underlying information sources which is often not available. Another open problem, which is very important for geographic information systems, appears if the mediator/wrapper technology is viewed from the perspective of heterogeneity problems. Mediators and wrappers are able to solve syntactic and structural problems but not semantic problems, yet these kinds of problems occur very often in the integration of GIS. Therefore mediators for the sharing of geographic information have to be extended.
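To illustrate the idea of generating a mediator from a declarative specification, the following sketch builds a (very reduced) mediator from a set of mapping rules rather than from hand-written code. The rule format and source names are invented; TSIMMIS uses a considerably more expressive rule language for semi-structured data.

    # Declarative rules: each attribute of the global view is mapped to
    # (source name, attribute name in that source). Adding or removing a
    # source means editing the rules, not the mediator code.
    RULES = {
        "station":  ("measurements_db", "stat_id"),
        "value":    ("measurements_db", "no2_value"),
        "operator": ("station_web",     "run_by"),
    }

    def generate_mediator(rules: dict):
        """Return a query function (the 'mediator') derived from the rules."""
        def query(sources: dict, station: str) -> dict:
            answer = {}
            for global_attr, (source_name, local_attr) in rules.items():
                record = sources[source_name](station)  # subquery to one source
                if local_attr in record:                 # tolerate incomplete sources
                    answer[global_attr] = record[local_attr]
            return answer
        return query

    # Hypothetical source access functions standing in for wrappers.
    sources = {
        "measurements_db": lambda s: {"stat_id": s, "no2_value": 42.5},
        "station_web":     lambda s: {"run_by": "City of Bremen"},
    }

    mediator_query = generate_mediator(RULES)
    print(mediator_query(sources, "Bremen-Nord"))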
4 SEMANTIC APPROACHES
Unfortunately, syntactic approaches do not support the reconciliation of the semantic heterogeneity problems appropriately. As environmental science has an interdisciplinary character, environmental
information often faces semantic problems. They arise from the use of different terminologies established for certain purposes. This type of semantic heterogeneity is already a problem for human experts communicating with each other; it becomes even more challenging when attempting to integrate these terminologies automatically. Furthermore, semantic approaches must address two main problems:
• attaching semantics to information sources and entities, and
• drawing conclusions from the semantic annotations available.
Early approaches to semantic integration were mainly based on the use of thesauri to translate between specific vocabularies (see section 2.1.2). Fulton (Fulton, 1996) defined the term semantic plug and play as an architecture in which the relationships among data are managed through models that define that data and the operations performed on it. Fulton argues that semantic plug and play should allow different applications to exchange objects without excessive dependency on the user's knowledge of how that specific data is being used in other applications. He further suggests an architecture that will require some significant improvements to existing technology, namely a cooperative interaction between tool users, tool vendors, and industry standards. He describes the following types of technologies and standards needed to achieve the above-mentioned semantic plug and play environment:

1. A formal modeling language for the explicit modeling of data, processes, objects, and knowledge services. Fulton mentions CDIF (CASE Data Interchange Format) from the EIA (Electronic Industries Association), EXPRESS from STEP (Standard for the Exchange of Product model data), and IDL from IDEF (Integrated Definition Language) as possible modeling languages, but concludes that these standards are not mature enough to fulfill the above-mentioned requirements. In addition, they are not fully implemented and available on the market. Furthermore, these standards are overlapping and non-interchangeable.

2. Standards for model mapping. The semantic mapping of services (data, objects, knowledge) from a component in one application to a component in another application is a crucial step. According to Fulton, the objective of model mapping is to create rules by which instances of data, processes, objects, and knowledge defined in one system or view can be linked to instances defined in another system or view. This definition is very close to the mediator idea of Wiederhold (Wiederhold, 1992), which was described in the previous section (3.2.3). In addition, some modeling languages such as EXPRESS and CDIF have capabilities that could be used for this purpose.

3. An infrastructure by which models are implemented (compilers, model language tools, etc.). The process of mapping models into other, new models must result in the software tool's interoperation with the other software tools in that environment. Fulton mentions approaches that are able to make explicit the data that are implicit in models:
• The SQL standard (ISO/IEC, 1999) is implemented in many modeling tools in a way that enables the generation of relational schemas (in SQL) from data models. The databases produced from these schemas can then be accessed using the data manipulation features of SQL. The ODBC and JDBC standards have freed tool builders from dependence on the implementation of SQL in particular environments.
• STEP Part 21 defines the format of a data exchange file from a model defined in EXPRESS. For a given EXPRESS model there is one standard format for that exchange, and tool vendors have been quick to build compilers that generate pre- and post-processors for Part 21 files from the EXPRESS models.

4. Standards for application domains. Software tools almost invariably provide a proprietary exchange mechanism that enables some level of data sharing among different installations. When different users select different tools, however, data sharing depends on the agreement of the vendors to adopt some sharing mechanism as a standard. STEP application protocols provide a working mechanism for domain-specific data sharing.

When describing objects of the world with formal modeling languages, it is crucial to note that a 'shared vocabulary' supports the process. Otherwise, problems will arise during the transformation or integration of data, objects and processes from one system into another. What do we mean by 'shared vocabulary'? There is a common basis of understanding in terms of the language that we use to communicate with each other. Terms from natural language can therefore be assumed to be part of a shared vocabulary relying on a common understanding of certain concepts with little variety. This common understanding relies on the idea of "how the world is organized". We often call this idea a 'conceptualization' of the world. Such conceptualizations provide a terminology that can be used for communication. The example of natural language demonstrates that a conceptualization is never universally valid, but rather holds for a limited number of persons committing to that conceptualization. This fact is reflected in the existence of languages, which differ more (English and Japanese) or less (German and Dutch). The arising problems are even more complicated when dealing with terminology developed for special scientific or economic areas. In these cases, we often find situations where the same term refers to different phenomena. The use of the term 'ontology' in philosophy and its use in computer science may serve as an example. The consequence is a separation into different groups that share a terminology and its conceptualization. The main problem with the use of a shared terminology according to a specific conceptualization of the world is that much information remains implicit. Ontologies set out to overcome the problem of implicit and hidden knowledge by making the conceptualization of a domain explicit. In the next section we describe how important ontologies are for the semantic integration process.
4.1 Ontologies for semantic integration
Recently, the use of formal ontologies to support information systems has been discussed (Guarino, 1998), (Bishr & Kuhn, 1999). The term 'ontology' has been used in many ways and across different communities (Guarino & Giaretta, 1995). When we promote and motivate the use of ontologies for environmental information processing, we need a clear picture of what we are referring to with "ontologies". We therefore follow the description given by Uschold and Grüninger (Uschold & Grüninger, 1996). Ontologies set out to overcome the problem of implicit and hidden knowledge by making the conceptualization of a domain explicit. This corresponds to a popular definition of the term ontology in computer science (Gruber, 1993): "An ontology is an explicit specification of a conceptualization." An ontology is used to make assumptions about the meaning of a specific term. It can also be seen as an explication of the context in which a term is normally used. Lenat (Lenat, 1998), for example, describes context in terms of twelve independent dimensions that have to be known in order to completely understand a piece of knowledge, and demonstrates how these dimensions can be explicated using the 'Cyc' ontology.

There are many different ways in which an ontology may explicate a conceptualization and the corresponding context knowledge. The possibilities range from a purely informal natural language description of a term, corresponding to a glossary, to strictly formal approaches with the expressive power of full first-order predicate logic or even beyond (e.g. Ontolingua (Gruber, 1991)). Jasper and Uschold (Jasper & Uschold, 1999) distinguish two dimensions along which the mechanisms for the specification of context knowledge by an ontology can be compared:

Level of Formality: The specification of a conceptualization and its implicit context knowledge can be done at different levels of formality. As already mentioned above, a glossary of terms can be seen as an ontology despite its purely informal character. A first step towards more formality is to prescribe a structure to be used for the description. A good example of this approach is the web annotation language XML (Bray, Paoli, & Sperberg-McQueen, 1998). However, the rather informal character of XML invites misuse: while the hierarchy of an XML specification was designed to describe layout, it can also be exploited to represent sub-type hierarchies (Harmelen & Fensel, 1999), which may lead to confusion. This problem can be solved by assigning formal semantics to the structures used for the description of the ontology. An example is the conceptual modeling language CML (Schreibener, Wielinga, Akkermans, Velde, & Anjewierden, 1994). CML offers primitives to describe a domain that can be given a formal semantics in terms of first-order logic (Aben, 1993). However, this formalization is only available for the structural part of a specification; assertions about terms and the description of dynamic knowledge are not formalized, leaving total freedom for the description. At the other extreme are specification languages that are completely formal. A prominent example is the Knowledge Interchange Format (KIF) (Genesereth & Fikes, 1992), which was designed to enable different knowledge-based systems to exchange knowledge. KIF has been used as a basis for the Ontolingua language (see above), thus giving a formal semantics to that language.
Extent of Explication: The other comparison criterion is the extent of explication that is reached by the ontology. This criterion is strongly connected with the expressive power of the specification language used. We already mentioned DTDs, which are mainly a simple hierarchy of terms. We can generalize this by saying that the least expressive specification of an ontology consists of an organization of terms in a network using two-placed relations. This idea goes back to the use of semantic networks in the seventies. Many extensions of the basic idea have been proposed. One of the most influential was the use of roles that could be filled by entities of a certain type (Brachman, 1977). This kind of value restriction can still be found in recent approaches; RDF schema descriptions (Brickley, Guha, & Layman, 1998), which may become a new standard for the description of web pages, are an example. However, default values and value range descriptions are not expressive enough to cover all possible conceptualizations. Allowing classes to be specified by logical formulas can provide more expressive power. These formulas can be restricted to a decidable subset of first-order logic; this is the approach of description logics (Borgida & Patel-Schneider, 1994). Nevertheless, there are also approaches allowing for more expressive descriptions. In Ontolingua, for example, classes can be defined by arbitrary KIF expressions. Beyond the expressiveness of full first-order predicate logic there are also special-purpose languages that have an extended expressiveness to cover specific needs of their application area. Examples are specification languages for knowledge-based systems, often including variants of dynamic logic to describe system dynamics (compare (Fensel & Harmelen, 1994)). In the last few years a lot of work has been done in the above-enumerated areas. We will now describe the most promising approaches.

4.1.1 Modeling languages and model mapping
Addressing Fulton's first two points (formal modeling language, standards for model mapping), the Ontology Interchange Layer (OIL) (Horrocks et al., 2000) could become such a standard. OIL is based on existing proposals such as OKBC (Chaudhri, Farquhar, Fikes, Karp, & Rice, 1998), XOL (Karp, Chaudhri, & Thomere, 1999), and RDF, and enriches them with the features necessary for expressing ontologies. With OIL, a proposal has been made that opens a discussion which may lead to a useful and well-defined consensus of a large community making use of such an approach. OIL combines frame-based modeling features, reasoning facilities from description logic, and an integration with RDF and XML schemas. An OIL specification consists of three layers:
• the object level, where concrete instances of the ontology are described;
• the first meta level, where the actual ontological definitions are provided and the object-level terminology is defined; and
• the second meta level, which describes features of the ontology such as author, name, subject, etc. For representing meta data about ontologies, the Dublin Core Metadata Element Set (Version 1.1) standard is used (see section 2.1.1).
The authors of OIL conclude that, from a pragmatic point of view, OIL covers the consensual modeling primitives of frame systems and description logic. From a theoretical point of view, they claim that it is quite natural to limit the expressiveness of this version and to keep subsumption decidable; this defines a well-understood subfragment of first-order logic. However, OIL is new, and it remains to be proven that these claims are valid. At the moment, OIL is being evaluated in two projects, On-to-knowledge and IBROW (Benjamins, Wielinga, Wielemakers, & Fensel, 1999).

4.1.2 Model language tools
The number of tools dealing with the construction of ontologies has increased over the last few years. (Duineveld, Stoter, Weiden, Kenepa, & Benjamins, 1999) describe six ontology-engineering tools that are web- or Windows-based:
• Ontolingua, a web-based tool for making ontologies
• WebOnto, also a web-based tool, but completely graphical
• ProtégéWin, a Windows-based tool, also graphical
• OntoSaurus, a web-based tool that looks much like Ontolingua, but uses the Loom language
• ODE, a Windows-based tool combining a graphical and a text-based approach
• KADS22, a graphical and text-based tool for building ontologies and reasoning strategies
They concluded that WebOnto (Domingue, 1998), ProtégéWin (Eriksson, Fergerson, Shahar, & Musen, 1999) and ODE (Fernández, Gómez-Pérez, Pazos, & Pazos, 1999) were best suited for the conceptualization and formalization phases of ontology development. These tools also seem better suited for less experienced users. Ontolingua (Farquhar, Fikes, & Rice, 1997) and OntoSaurus (ISX Corporation, 1991) appear to be better suited for later phases, where only relatively small revisions and modifications are required. KADS22 is an interactive environment for the CommonKADS modelling language CML2 (see (Schreiber et al., 1999) for CommonKADS). Because the different tools all have their pros and cons, it is impossible to select the ideal tool at the moment. Furthermore, Duineveld et al. concluded that WebOnto and ProtégéWin are better suited for less experienced users, because they require little knowledge of the underlying knowledge representation language and are easy to learn; these two dimensions are important for beginning ontological engineers. More experienced users, however, would be better off using Ontolingua and OntoSaurus, which may provide better support in creating complex ontologies. Their final conclusion is that current ontology tools are not yet ready for direct use by domain experts, but they also do not require knowledge representation experts.

4.1.3 RDF: A Standard Format for Semantic Modeling
The basic model underlying RDF (W3C, 1999c) is very simple. Every piece of information about a resource, which may be a web page or an XML element, is expressed in terms of a triple (resource, property, value). The property is a two-placed relation that connects a resource to a certain value of that property. A value can be a simple data type or a resource. Additionally, the value can be replaced by a variable standing for a resource that is further described by nested triples making assertions about the properties of the resource represented by the variable. Further, RDF allows multiple values for single properties. For this purpose, the model contains three built-in data types called collections, namely unordered lists (bag), ordered lists (seq) and sets of alternatives (alt), providing a kind of aggregation mechanism. A further problem arising from the nature of the web is the need to avoid name clashes that might occur when referring to different web sites that might use different RDF models to annotate meta data. For this purpose, RDF defines name spaces. A name space is defined once by referring to a URL that provides the names and connects them to a source id that is then used to prefix each name in an RDF specification, defining the origin of that particular name. A standard syntax has been defined for writing down RDF statements, making it possible to identify the statements as meta data; RDF thereby provides a low-level language for expressing the intended meaning of information in a machine-processable way.

RDF/S: A Basic Vocabulary
This very simple model underlying ordinary RDF descriptions allows complete freedom in writing down meta data. However, sharing information requires an agreement on a standard core vocabulary in terms of modeling primitives that should be used to describe meta data. RDF schemas (RDF/S) are an attempt to provide such a standard vocabulary. This vocabulary, which is shown in Figure 2, is defined using the RDF data model.
Figure 2: Modeling Components of RDF/S
A closer look at the modeling components reveals that RDF/S borrows heavily from the frame systems well known in the area of knowledge representation. RDF/S provides a notion of concepts (class), slots (property), inheritance (SubclassOf, SubslotOf) and range restrictions (ConstraintProperty). Unfortunately, there is currently no well-defined semantics for these modeling primitives. Some parts, such as the reification mechanism, are not well defined even on an informal level. Further, there is no reasoning support available, not even for property inheritance.

XML, RDF and Semantic Modeling

Having briefly introduced the W3C standards for information exchange and meta-data annotation, we now have to consider their usefulness for information sharing. With respect to the different layers of integration mentioned in the introduction, XML is only concerned with syntactic integration. XML can also define structures; however, there is no sophisticated mechanism for mapping between different structures, so in this case it is more appropriate to refer to a predefined structure. RDF, on the other hand, was designed to provide some description on the semantic level by enabling us to include meta-information in the description of a web page. As we have discussed in the last section, RDF in its current state fails to provide real semantic descriptions. Rather, it provides a common syntax and a basic vocabulary to be used when describing meta-data. The designers of RDF are aware that there is a strong need for an additional 'logical level' that defines a clear semantics for RDF expressions and provides a basis for integration mechanisms. Taking all of these points into consideration, our conclusion about current web standards is that the use of XML, and specifically XML schemata, is a suitable way of exchanging data with a well-defined syntax and structure. Furthermore, plain RDF provides a uniform syntax for exchanging meta-information in a machine-readable format. However, currently neither XML nor RDF provides sufficient support for the integration of heterogeneous structures or of different meanings of terms. Therefore there is a need for semantic modeling and for reasoning about structure and meaning. Promising candidates for semantic modeling approaches can be found in the area of knowledge representation as well as in the distributed database community. We will discuss some of these approaches in the following section.
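Before moving on, a minimal sketch may help make the triple model and the namespace mechanism described above more concrete. The sketch below represents RDF-style statements as plain Python tuples; all namespace URLs, resources and properties except rdfs:subClassOf are hypothetical examples and do not stem from any actual RDF vocabulary.

```python
# A minimal sketch of the (resource, property, value) triple model; all names
# and URLs except the RDF/S namespace are hypothetical examples.

NS = {
    "env": "http://example.org/environment#",         # hypothetical source namespace
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",  # RDF/S vocabulary namespace
}

def qname(prefix, name):
    """Expand a prefixed name into a full URI to avoid name clashes."""
    return NS[prefix] + name

# Meta-data about a hypothetical measurement resource, expressed as triples.
triples = [
    (qname("env", "well42"), qname("env", "measuredParameter"), "Sulfate"),
    (qname("env", "well42"), qname("env", "value"), 185.0),
    # An RDF/S-style statement: 'Well' is modelled as a subclass of 'GeoObject'.
    (qname("env", "Well"), qname("rdfs", "subClassOf"), qname("env", "GeoObject")),
]

for subject, prop, value in triples:
    print(subject, prop, value)
```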
4.2
Semantic Interoperability Approaches
In the following sections we describe current approaches and research programs in the area of semantic interoperability and/or translation. Since this is a fairly new research area, we also consider advanced concepts.

4.2.1 Semantic components in the OGC

Semantic components in the OGC (OGC, 1999) approach are defined by an attribution model and the collection of properties and their parameters associated with a feature. The Open GIS approach is to establish mechanisms for semantic definition based on dictionaries, thesauri, and catalogs, each with their own well-defined interfaces. Semantic translator services can then ascertain critical aspects of the information content of a feature and, to some extent, translate it into an alternate schema. This concept is currently work in progress (Gardels, 1996). The OGC addresses semantic interoperability issues, i.e. the ability of users to share meaningful information about the earth. It is recognized that different groups of users may have profoundly different world views, each with its own vocabulary and taxonomy, abstractions of earth features and phenomena, and accepted mechanisms for manipulation, communication, and analysis. The semantics associated with an information community include: metadata, feature class definitions, attribute definitions, valid feature and attribute relationships, data capture guidelines for features and attributes, symbol sets for feature representation, rule sets for feature portrayal or display, relationships and dependencies between features, attributes and metadata, as well as methods and behaviors (Gardels, 2000). Furthermore, each geodata collection is owned by one information community and is created according to the semantic rules associated with that community. An information community is also defined by the availability of one or more catalogs comprising metadata which describe each of its feature collections. Such catalogs may be organized as a hierarchy of increasingly generalized descriptions of sets of individual catalogs for specific feature collections. Gardels argues that if an information community's catalog is sufficiently robust, a user in another community should be able to use the catalog to determine the precise meaning of the collection's components and thereby utilize that information for their own purposes. The only requirement is that the source catalog be exposed to the world through the Internet or other easily accessible means.
Recently, the OGC founded a Semantic SIG and is about to develop approaches to fulfill the requirements of fully interoperable GIS.

4.2.2 Semantic-based Information Retrieval

Möller et al. (Möller, Haarslev, & Neumann, 1998) investigated the use of conceptual descriptions based on description logic for content-based information retrieval and presented an idea of how description logic can be extended with tools dealing with spatial concepts. They defined 15 topological relations that are organized in a subsumption hierarchy. In order to support spatial inferences, they extended CLASSIC (an implementation of description logic (Borgida, Brachman, McGuiness, & Resnick, 1989)) with new concept constructors based on the spatial relations. Their semantics assumes that each domain object is associated with its spatial representation (i.e. a polygon) via a predefined attribute has-area. Concepts for spatial objects are denoted with a predicate, a relation and a name for a polygon constant. They have also contributed to extending description logic theory by increasing its expressive power concerning reasoning about space (see also (Haarslev & Möller, 1997)). The Least Common Subsumer (LCS) operation (Cohen, Borgida, & Hirsh, 1992) has been extended in order to deal adequately with the spatial representation requirements of a TV-Assistant application, showing that their theory works in practice. However, the costs of modeling the required domain knowledge must not be neglected.

4.2.3 DARPA Agent Markup Language (DAML)

The Defense Advanced Research Projects Agency (DARPA) is currently investigating and developing the new language DAML. DAML is supposed to be a step toward what Tim Berners-Lee, the creator of the World Wide Web, calls a "semantic Web", where agents, search engines and other programs can read DAML markup to decipher meaning, rather than just content, on a web site. A semantic Web also lets software agents utilize the data on all web pages, allowing them to gain knowledge from one site and apply it to logical mappings on other sites. Enhanced searching of this type would require a lot of groundwork: the authors claim that once DAML is available, authors at individual sites would have to add DAML to their pages to describe their content (Rapoza, 2000). Although DAML is still at an early stage, Hendler has begun working with Berners-Lee and the World Wide Web Consortium to make sure that DAML fits with the W3C's plans for a semantic Web. It would be based primarily on RDF (Resource Description Framework), the W3C's metadata technology for adding machine-readable data to the Web. A working draft of DAML is expected to be available by the summer. The developers describe DAML as a semantic language that ties the information on a page to machine-readable semantics (ontology). The language must allow communities to extend simple ontologies for their own use, allowing the bottom-up design of meaning while still permitting the sharing of higher-level concepts. In addition, the language is supposed to provide mechanisms for the explicit representation of services, processes and business models, so as to allow non-explicit information (such as that encapsulated in programs or sensors) to be recognized. DAML will provide a number of advantages over current markup approaches. It will allow semantic interoperability at the level at which we currently have syntactic interoperability in XML.
Objects on the web can be marked up (manually or automatically) to include descriptions of the information they encode, descriptions of the functions they provide, and/or descriptions of the data they can produce. This will allow web pages, databases, programs, models, and sensors to be linked together by DAML-aware agents that can recognize the concepts they are seeking. If successful, information fusion from diverse sources will become a reality (Hendler, 2000).
5
BUSTER – THE BREMEN UNIVERSITY SEMANTIC TRANSLATOR FOR ENHANCED RETRIEVAL
In this chapter, we propose the BUSTER approach (Bremen University Semantic Translator for Enhanced Retrieval), which provides a comprehensive solution for reconciling all of the heterogeneity problems discussed above. The system offers an integrated solution to the problem of information integration, taking into account all three levels of integration and combining several technologies, including standard markup languages, mediator systems, ontologies and knowledge-based classifiers.
5.1
Conceptual Architecture
Figure 3 gives an overview of the BUSTER architecture. In general, the architecture can be divided into two distinct phases: an acquisition phase and a query phase.
During the acquisition phase, all information needed to provide a network of integrated information sources is acquired. This includes the acquisition of a Comprehensive Source Description (CSD) for each source, together with the Integration Knowledge (IK), which describes how information can be transformed from one source to another. In the query phase, a user or an application (e.g. a GIS) formulates a query against an integrated view of the sources. Several specialized components in the query phase use the acquired information, i.e. the CSDs and the IK, to select the desired data from several information sources and to transform it to the structure and the context of the query.
Figure 3: The BUSTER architecture. The diagram arranges the components of the acquisition and query phases (wrappers, mediator, rule generator, CTR generator, CTR engine, mapper and classifier) on the syntactic, structural and semantic levels.
All software components in both phases are associated with one of three levels: the syntactic, the structural and the semantic level. The components on each level deal with the corresponding heterogeneity problems (see section 2). The components in the query phase are responsible for solving these heterogeneity problems, whereas the components in the acquisition phase use the CSDs of the sources to provide the specific knowledge for the corresponding component in the query phase. A mediator, for example, which is associated with the structural level, is responsible for the reconciliation of structural heterogeneity problems. The mediator is configured by a set of rules that describe the structural transformation of data from one source to another. These rules are acquired in the acquisition phase with the help of the rule generator. An important characteristic of the BUSTER architecture is the semantic level, where two different types of tools exist for solving the semantic heterogeneity problems. This reflects the focus of the BUSTER system on providing a solution for this type of problem. Furthermore, the need for two types of tools shows that the reconciliation of semantic problems is very difficult and must be supported by a hybrid architecture in which different components are combined. In the following sections we describe the two phases and their components in detail.

5.1.1 Query Phase

In the query phase a user submits a query request to one or more data sources in the network of integrated data sources. Several components on different levels interact in this phase. On the syntactic level, wrappers are used to establish a communication channel to the data source(s) that is independent of specific file formats and system implementations. Each generic wrapper covers a
specific file or data format. For example, generic wrappers may exist for ODBC data sources, XML data files, or specific GIS formats. Still, these generic wrappers have to be configured for the specific requirements of a data source. The mediator on the structural level takes the information obtained from the wrappers and "combines, integrates and abstracts" it (Wiederhold, 1992). In the BUSTER approach, we use generic mediators which are configured by transformation rules (query definition rules, QDRs). These rules describe, in a declarative style, how the data from several sources can be integrated and transformed into the structure required by the query. On the semantic level, we use two different tools specialized for solving the semantic heterogeneity problems. Both tools are responsible for context transformation, i.e. transforming data from a source context into a goal context. There are several ways in which context transformation can be applied. In BUSTER we consider functional context transformation and context transformation by re-classification (Stuckenschmidt & Wache, 2000). In functional context transformation, data is converted by applying predefined functions. A function is represented declaratively in Context Transformation Rules (CTRs), which describe from which source context to which goal context data can be transformed by the application of which function. The context transformation rules are invoked by the CTR engine. Functional context transformation can be used, for example, for transforming area measures in hectares to area measures in acres, or for transforming one coordinate system into another. All such context transformations can be described with the help of mathematical functions. In addition to functional context transformation, BUSTER also allows the classification of data into another context (Mapper in Figure 3). This is used to automatically map the concepts of one data source onto concepts of another data source. To be more precise, the context description (i.e. the ontological description of the data) is re-classified. The source context description, with which the data is annotated, is obtained from the CSD, completed with the data information, and related to goal context descriptions. After the context re-classification, the data is simply replaced by the data annotated with the related goal context. Context re-classification together with data replacement is useful for the transformation of catalog terms, e.g. replacing a term from the source catalog by a term from the goal catalog.

5.1.2 Acquisition Phase

Before the first query can be submitted, the necessary knowledge, namely the Comprehensive Source Description (CSD) and the Integration Knowledge (IK), has to be acquired. The first step of the data acquisition phase consists of gathering information about the data source that is to be integrated. This information is stored in a source-specific database, the Comprehensive Source Description (CSD). A CSD has to be created for each data source that participates in a network of integrated data sources.

The Comprehensive Source Description

Each CSD consists of meta-data that describe technical and administrative details of the data source as well as its structural and syntactic schema and annotations. In addition, the CSD comprises a source ontology, i.e. a detailed and computer-readable description of the concepts stored in the data source. The CSD is attached to the respective data source.
It should be available in a highly interchangeable format (e.g. XML) that allows easy data exchange over computer networks. Setting up a CSD is the task of the domain specialist responsible for the creation and maintenance of the specific data source. This tedious task can be supported by specialized tools that use repositories of pre-existing general ontologies and terminologies. These tools examine existing CSDs of similar sources and generate hypotheses for corresponding parts of the new CSD. The domain specialist has to verify the hypotheses, modify them if necessary, and add them to the CSD of the new source. With such acquisition tools the creation of new CSDs can be simplified considerably (Wache, Scholz, Stieghahn, & König-Ries, 1999).
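To illustrate the kind of information a CSD might hold, the following minimal sketch collects administrative and technical meta-data, a structural annotation and a pointer to a source ontology in a single Python structure. All field names, values and URLs are hypothetical and do not reflect BUSTER's actual CSD format; in practice such a description would be serialized in an interchangeable format such as XML.

```python
# A minimal sketch (not BUSTER's actual data model) of the kind of content a
# Comprehensive Source Description might hold; every field name, value and URL
# below is a hypothetical example.
csd_example = {
    "administrative": {
        "owner": "Hypothetical Environmental Agency",
        "contact": "gis-admin@example.org",
        "last_update": "2000-05-31",
    },
    "technical": {
        "access": "ODBC",          # how a generic wrapper could reach the source
        "format": "relational",
    },
    "schema": {                    # structural and syntactic annotation
        "wells": ["well_id", "fecal_coliforms", "intestinal_helminth",
                  "sodium", "sulfate"],
    },
    # Reference to a detailed, computer-readable description of the concepts.
    "source_ontology": "http://example.org/ontologies/water-quality#",
}
```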
The Integration Knowledge

In a second step of the data acquisition phase, the data source is added to the network of integrated data sources. In order for the new data source to be able to exchange data with the other data sources in the network, Integration Knowledge (IK) must be acquired. The IK is stored in a centralized database that is part of the network of integrated data sources. The IK consists of several separate parts, each of which provides specific knowledge for a component in the query phase. For example, the rule generator examines several CSDs and creates rules for the mediator (Wache et al., 1999). The wrapper configurator uses the information about the sources in order to adapt generic wrappers to the heterogeneous sources. Creating the IK is the task of the person responsible for operating and maintaining the network of integrated data sources. Due to the complexity of the IK needed for the integration of multiple heterogeneous data sources and the unavoidable semantic ambiguities, it may not be possible to accomplish this task automatically. However, the acquisition of the IK can be supported by semi-automatic tools. In general, such acquisition tools use the information stored in the CSDs to pre-define parts of the IK and propose them to the human operator, who makes the final decision about whether to accept, edit, or reject them.
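As an illustration of the functional context transformation introduced for the query phase (Section 5.1.1), the following minimal sketch represents context transformation rules as triples of source context, goal context and conversion function, using the hectare-to-acre example mentioned above. The rule representation and all identifiers are hypothetical and are not BUSTER's actual CTR syntax.

```python
# A minimal sketch (not BUSTER's actual CTR syntax) of functional context
# transformation: each rule names a source context, a goal context, and the
# conversion function to apply. All identifiers are hypothetical.

HECTARE_TO_ACRE = 2.4710538  # 1 ha is approximately 2.471 acres

context_transformation_rules = [
    # (source context, goal context, conversion function)
    ("area:hectare", "area:acre", lambda v: v * HECTARE_TO_ACRE),
    ("area:acre", "area:hectare", lambda v: v / HECTARE_TO_ACRE),
]

def transform(value, source_ctx, goal_ctx):
    """Apply the first rule that converts source_ctx into goal_ctx."""
    for src, goal, fn in context_transformation_rules:
        if (src, goal) == (source_ctx, goal_ctx):
            return fn(value)
    raise LookupError(f"no context transformation from {source_ctx} to {goal_ctx}")

print(transform(12.5, "area:hectare", "area:acre"))  # approx. 30.9 acres
```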
5.2
Example: Semantic Transformation in the Environmental Domain
We will now describe a possible application of the semantic transformation support provided by the BUSTER system. The example is simplified due to space constraints, but it tries to give a general idea of situations where semantic integration is necessary and what it could look like. We assume that we have a database of measured toxin values for wells in a certain area. The database may contain various parameters. For the sake of simplicity we restrict our investigation to two categories, each containing two parameters:
• Bacteria: Fecal Coliforms, Intestinal Helminth
• Salts: Sodium, Sulfate
Our scenario is concerned with the use of this information source for different purposes in environmental information systems. We consider two applications involving an assessment of the environmental impact. Both applications demand a semantics-preserving transformation of the underlying data in order to obtain a correct assessment. While the first can be solved by a simple mapping, the second transformation problem requires the full power of the classification-based transformation described in the previous section, underlining the necessity of knowledge-based methods for semantic information integration.

5.2.1 Simple Transformation: From Values to Classes

A common feature of an environmental information system is the generation of colored geographic maps summarizing the state of the environment. High toxin values are normally indicated by red, low toxin values by green. If we want to generate such maps for the toxin categories 'Bacteria' and 'Salts' using the toxin database, we have to perform a transformation on the data in order to move from sets of numerical values to discrete classes, in our case the classes 'red' and 'green'. If we neglect the problem of aggregating values from multiple measurements at the same well (this problem is addressed in (Keinitz, 1999)), this classification problem boils down to the application of a function that maps combinations of values onto one of the discrete classes. The corresponding functions have to be defined by a domain expert and could be represented by the tables shown below:
Table 4: Bacteria database

Fecal Coliforms | Intestinal Helminth | Bacteria Assessment
< 0.1           | No                  | Green
< 0.1           | Yes                 | Red
< 1.0           | No                  | Green
< 1.0           | Yes                 | Red
< 10.0          | No                  | Green
< 10.0          | Yes                 | Red
< 100.0         | No                  | Red
< 100.0         | Yes                 | Red
≥ 100.0         | No                  | Red
≥ 100.0         | Yes                 | Red

Table 5: Salt database

Sodium   | Sulfate  | Salts Assessment
< 0.1    | < 1.0    | Green
< 0.1    | < 200.0  | Green
< 0.1    | < 300.0  | Green
< 0.1    | ≥ 300.0  | Red
< 20.0   | < 1.0    | Green
< 20.0   | < 200.0  | Green
< 20.0   | < 300.0  | Green
< 20.0   | ≥ 300.0  | Red
< 200.0  | < 1.0    | Green
< 200.0  | < 200.0  | Green
< 200.0  | < 300.0  | Red
< 200.0  | ≥ 300.0  | Red
≥ 200.0  | < 1.0    | Red
≥ 200.0  | < 200.0  | Red
≥ 200.0  | < 300.0  | Red
≥ 200.0  | ≥ 300.0  | Red
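As a minimal illustration of how such a mapping function could be realized, the sketch below implements the value-to-class mapping encoded in Table 4. The thresholds are taken from the table; the function name and its use here are illustrative and not BUSTER's actual code.

```python
# A minimal sketch of the value-to-class mapping encoded in Table 4; the
# function name is hypothetical, the thresholds are taken from the table.

def bacteria_assessment(fecal_coliforms, intestinal_helminth_present):
    """Map measured bacteria values onto the discrete classes 'Green'/'Red'."""
    if intestinal_helminth_present:
        return "Red"
    # Without Intestinal Helminth, only high Fecal Coliform levels are critical.
    return "Green" if fecal_coliforms < 10.0 else "Red"

print(bacteria_assessment(0.05, False))   # Green
print(bacteria_assessment(42.0, False))   # Red
print(bacteria_assessment(0.05, True))    # Red
```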
In the BUSTER system, this kind of semantic transformation is performed by so-called context transformation rules (Wache, 1999). These rules connect one or more fields in the source database with a field in the target database (in our case the database of the environmental information system) and define a function that calculates the value to be inserted in the target field from the values in the source database.

5.2.2 Complex Transformation: Re-Classification

It has been argued that simple rule-based transformations are not always sufficient for complex transformation tasks (Stuckenschmidt & Wache, 2000). The need for such complex transformations becomes clear when we try to use the previously generated information to decide whether a well may be used for different purposes. We can think of three intended uses, each with its own requirements on the pollution level, which are assumed to be specified as follows:
• bathing: absence of Intestinal Helminth and a Fecal Coliform pollution below 12.0 mg/l
• drinking: absence of Intestinal Helminth, a Fecal Coliform pollution below 20.0 mg/l, less than 135.0 mg/l of Sodium and less than 180.0 mg/l of Sulfate
• irrigation: absence of Intestinal Helminth, a Fecal Coliform pollution below 30.0 mg/l, between 125.0 and 175.0 mg/l of Sodium and between 185.0 and 275.0 mg/l of Sulfate
These decisions are easy if we have access to the original database with its exact numerical values for the different parameters. The situation becomes difficult if we only have access to the discretized assessment values used for the generation of the colored map. In this case we cannot rely on a simple mapping from a combination of colors for the different toxin categories to possible uses, because the intended meaning of the colors, which is needed for the decision, is not accessible. However, if we manage to make the intended meaning of the colors explicit, we have a good chance of using the condensed information for decision making. In principle, the meaning of a color is encoded in the mapping tables shown above. To enable the BUSTER system to make use of this additional information, we have to provide comprehensive definitions of the concepts represented by the different colors. Using a logic-based representation, these definitions could look as follows:

BacteriaGreen(W) ⇔ IntestinalHelminth(W) = No ∧ FecalColiforms(W) < 10.0
BacteriaRed(W) ⇔ IntestinalHelminth(W) = Yes ∨ FecalColiforms(W) ≥ 10.0
SaltsGreen(W) ⇔ Sodium(W) < 200.0 ∧ Sulfate(W) < 300.0
SaltsRed(W) ⇔ Sodium(W) ≥ 200.0 ∨ Sulfate(W) ≥ 300.0
The above formulas define four classes a well W can belong to. These class definitions can serve as input for the re-classification module of the BUSTER system. The module uses the definitions to decide whether a well belongs to the class of wells that is suitable for one of the intended uses, which have to be defined in the same way. Translating the informal requirements for the different kinds of use into class definitions, we get:

Bathing(W) ⇔ IntestinalHelminth(W) = No ∧ FecalColiforms(W) < 12.0

Drinking(W) ⇔ IntestinalHelminth(W) = No ∧ FecalColiforms(W) < 20.0 ∧ Sodium(W) < 135.0 ∧ Sulfate(W) < 180.0

Irrigation(W) ⇔ IntestinalHelminth(W) = No ∧ FecalColiforms(W) < 30.0 ∧ Sodium(W) > 165.0 ∧ Sodium(W) < 200.0 ∧ Sulfate(W) > 245.0 ∧ Sulfate(W) < 300.0
Using these definitions, the classifier is able to conclude that a well may be used for bathing if the assessment value concerning the bacteria is 'green', because this means that Intestinal Helminth is absent and the level of Fecal Coliforms is below 10.0 and therefore also below 12.0. Concerning the use for drinking, it can be concluded that drinking is not allowed if one of the assessments is 'red'. However, there is no definite result for the positive case, because if both assessment values are 'green' we only know that Sodium is below 200.0 and Sulfate below 300.0, while we demand them to be below 135.0 and 180.0, respectively. In practice, we would opt for a pessimistic classification strategy and conclude that drinking is not allowed, because of the risk of physical harm in the case of a misclassification. The situation is similar for the irrigation case: we can decide that irrigation is not allowed if one of the assessment values is 'red'. Again, no definite result can be derived for the positive case. Here it is likely that one would tend towards an optimistic classification strategy, because the consequences of a misclassification are not as serious as they are in the drinking case.
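The reasoning just described can be sketched in a few lines of code: a 'green' assessment only guarantees upper bounds on the parameters, so a use is definitely allowed only if those bounds entail its requirements; otherwise the chosen strategy decides. The sketch below covers the bathing and drinking cases; it is not BUSTER's re-classification module, all names are illustrative, and it simplifies the example by treating any 'red' assessment as blocking and by leaving the Intestinal Helminth condition implicit in the 'green' bacteria assessment.

```python
# A minimal sketch of pessimistic vs. optimistic classification over the
# discretized assessment values; names and structure are illustrative only.

# Upper bounds guaranteed by a 'Green' assessment (from the color definitions above).
GREEN_BOUNDS = {"FecalColiforms": 10.0, "Sodium": 200.0, "Sulfate": 300.0}

# Upper bounds required by the intended uses (from the class definitions above).
USE_LIMITS = {
    "bathing": {"FecalColiforms": 12.0},
    "drinking": {"FecalColiforms": 20.0, "Sodium": 135.0, "Sulfate": 180.0},
}

def classify(use, bacteria_color, salts_color, strategy="pessimistic"):
    if "Red" in (bacteria_color, salts_color):
        return "not allowed"        # simplification: any red assessment rules the use out
    entailed = all(GREEN_BOUNDS[p] <= limit for p, limit in USE_LIMITS[use].items())
    if entailed:
        return "allowed"            # the green bounds already imply the required limits
    return "not allowed" if strategy == "pessimistic" else "allowed"

print(classify("bathing", "Green", "Green"))                          # allowed
print(classify("drinking", "Green", "Green"))                         # not allowed (pessimistic)
print(classify("drinking", "Green", "Green", strategy="optimistic"))  # allowed
```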
6
CONCLUSION AND FUTURE WORK
We have provided an overview of current worldwide research results and ongoing work regarding the efficient and intelligent access to and use of environmental information. We believe that information sharing is only possible with the help of syntactic, structural and semantic approaches. We have discussed the pros and cons of meta information systems and believe they can assist and act as the information providers needed for the semantic translation process. We have also concluded that some heterogeneity problems cannot be solved with mediator approaches alone. Expressive modeling languages for ontologies (Ontolingua, OIL, etc.) and software tools (WebOnto, ProtégéWin, etc.) that support the modeling process are already available or about to emerge. These tools can be used by domain experts, possibly without the help of a knowledge engineer, and we argue that ontologies can help to overcome the context dependency problem of concepts. This, together with transformation rules within a mediator and a number of wrappers able to access the original data sources, would improve information sharing significantly. Clearly, future work must be done to combine the currently available tools into a complete semantic translation process. Furthermore, not only must available systems be combined, but new developments are needed in the area of uncertainty and vagueness of concept definitions. Finally, more work must be done within the application domain communities, because a shared terminology seems indispensable.
REFERENCES

Aben, M. (1993). Formally Specifying Re-Usable Knowledge Model Components. Knowledge Acquisition Journal, 5, 119-141. Ahmed, R., Smedt, P. D., Du, W., Kent, W., Ketabchi, M. A., Litwin, W. A., Rafii, A., & Shan, M.-C. (1991). The Pegasus Heterogeneous Multidatabase System. IEEE Computer, 24(12), 19-27. Arndt, H.-K., Christ, M., & Günther, O. (2000, 3/2000). Environmental Markup Language (EML) als XML-basierte Metasprache für das Metainformationssystem EcoExplorer. Paper presented at the Hypermedia im Umweltschutz, Ulm, Germany. Benjamins, V. R., Wielinga, B., Wielemakers, J., & Fensel, D. (1999). Towards Brokering Problem-Solving Knowledge on the Internet. In D. Fensel & R. Studer (Eds.), Knowledge Acquisition, Modeling and Management (Vol. 1621): Springer.
Bishr, Y., & Kuhn, W. (1999). The Role of Ontology in Modelling Geospatial Features ( Vol. 5). Münster: Institut für Geoinformatik, Universität Münster. Borgida, A., Brachman, R. J., McGuiness, D. L., & Resnick, L. A. (1989). CLASSIC: A Structural Data Model for Objects. Paper presented at the ACM SIGMOID International Conference on Management of Data, Portland, Oregon, USA. Borgida, A., & Patel-Schneider, P. F. (1994). A Semantics and Complete Algorithm for Subsumption in the CLASSIC Description Logic. JAIR, 1, 277--308. Brachman, R. J. (1977). What's in a Concept: Structural Foundations for Semantic Nets. International Journal of Man-Machine Studies, 9, 127--152. Bray, T., Paoli, J., & Sperberg-McQueen, C. M. (1998). Extensible Markup Language (XML) 1.0 ( REC-rdf): W3C. Brickley, D., Guha, R., & Layman, A. (1998). Ressource Description Framework Schema Specification ( PR-rdfschema): W3C. CEOS. (1996). Catalogue Interoperability Protocol (CIP) Specification - Release A.: CEOS Working Group on Information Systems and Services. CEOS. (1997). Catalogue Interoperability Protocol (CIP) Specification - Release B. Issue 2.2.: CEOS Working Group on Information Systems and Services. Chaudhri, V. K., Farquhar, A., Fikes, R., Karp, P. D., & Rice, J. P. (1998). Open Knowledge Base Connectivity (OKBC) Specification document 2.0.3. Stanford: SRI International and Stanford University (KSL). Cohen, W. W., Borgida, A., & Hirsh, H. (1992). Computing Least Common Subsumers in Description Logics. Paper presented at the AAAI - 92. Colon, T. (2000). Green Building gbXML Schema. Sonoma, CA: GeoPraxis Inc. Dempsey, L., & Weibel, S. L. (1996). The Warwick Metadata Workshop: A Framework for theDeployment of Resource Description. D-Lib Magazine, July/August 1996. DMA. (1991). Digital Geographic Information Exchange Standard (DIGEST). Fairfax, VA: Defence Mapping Agency. Domingue, J. (1998). Tadzebao and WebOnto: Discussing, Browsing, and Editing Ontologies on the Web. Paper presented at the Eleventh Workshop on Knowledge Acquisition, Modeling and Management, Banff, Alberta, Canada. Dublin Core Metadata Initiative. (2000, 28.03.2000). The Dublin Core: A Simple Content Description Model for Electronic Resources, [Internet Webpage]. Available: http://purl.org/dc. Duineveld, A. J., Stoter, R., Weiden, M. R., Kenepa, B., & Benjamins, V. R. (1999). Wondertools? A comparative study of ontological engineering tools, Proceedings of the 12th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop.: University of Calgary/Stanford University. EEA. (1996). Catalogue of Data Sources and Thesaurus - Implementation Reference Document ( Draft Version 0.3). Copenhagen: European Environmental Agency. Eriksson, H., Ferserson, R., Shahar, Y., & Musen, M. A. (1999). Automatic Generation of Ontology Editors. Paper presented at the Twelfth Workshop on Knowledge Acquisition, Modeling and Management, Banff, Alberta, Canada. ETC/CDS. (2000, 17.03.2000). CDS. Available: http://www.mu.niedersachsen.de/system/cds/. Farquhar, A., Fikes, R., & Rice, J. (1997). The Ontolingua Server: Tool for Collaborative Ontology Construction. IJHCS, 46(6), 707 -- 728. FDGC. (1994). Content Standards for Digital Geospatial Metadata. Federal Geographic Data Committee. Available: http://www.fgdc.gov/metadata. Fensel, D., & Harmelen, F. v. (1994). A comparison of languages which operationalise and formalise KADS models of expertise. The Knowledge Engineering Review, 9, 105--146. Fernández, M., Gómez-Pérez, A., Pazos, J., & Pazos, A. (1999). 
Building a Chemical Ontology Using Methontology and the Ontology Design Environment. IEEE Intelligent Systems, Jan. 99, 37 -- 46. FME. (1999). Semantic translation. Fulton, J. (1996, September 1996). Semantic Plug and Play. Paper presented at the Joint Workshop on Standards for the Use of Models that Define the Data and Processes of Information Systems, Seattle, WA. Gardels, K. (1996). A comprehensive data model for distributed, heterogeneous geographic information (Technical USA-CERL, July). Gardels, K. (2000). The Open GIS Approach to Distributed Geodata and Geoprocessing. Available: http://www.regis.berkeley.edu/gardels/envmodel.html [2000. Genesereth, M. R., & Fikes, R. E. (1992). Knowledge Interchange Format Version 3.0 Reference Manual (Report of the Knowledge Systems Laboratory KSL 91-1): Stanford University. Gruber, T. (1991). Ontolingua: A Mechanim to Support Portable Ontologies (KSL Report KSL-91-66): Stanford University. Gruber, T. R. (1993). A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2). Guarino, N. (1998). Formal Ontology and Information Systems. Paper presented at the FOIS 98, Trento, Italy. Guarino, N., & Giaretta, P. (1995). Ontologies and Knowledge Bases: Towards a Terminological Clarification. In N. Mars (Ed.), Towards Very Large Knowledge Bases: Knowledge Building and Knowledge Sharing (pp. 25--32). Amsterdam. Günther, O. (1998). Environmental Information Systems. Berlin: Springer.
Haarslev, V., & Möller, R. (1997). Spatioterminological Reasoning: Subsumption Based On Geometric Reasoning. Paper presented at the 11th International Workshop on Description Logics DL'97, Gif sur Yvette, France. Harmelen, F. v., & Fensel, D. (1999). Practical Knowledge Representation for the Web. In D. Fensel (Ed.), Proceedings of the IJCAI'99 Workshop on Intelligent Information Integration. Hatton, B. (1997). Distributed Data Sharing. Paper presented at the AURISA 97, Christchurch, New Zealand. Hendler, J. (2000). DARPA Agent Mark Up Language (DAML). Available: http://www.darpa.mil/iso/DAML/DAML_ISO_World.ppt [2000. Horrocks, I., Fensel, D., Goble, C., Harmelen, F. v., Boekstra, J., & Klein, M. (2000). The Ontology Interchange Language OIL: The grease between ontologies. Paper presented at the 12th International Conference on Knowledge Engineering and Knowledge Management EKAW 2000, Juan-les-Pins, France. Huber, M. (1998). High-Tech-Entscheidungstrends bei Geodaten Servern. GeoBit(3), 18-20. IHO. (1994). IHO Transfer standard for digital hydrographic data. Monaco: International Hydrographic Bureau. ISO/IEC. (1999). Information technology -- Database languages -- SQL, ISO/IEC 9075:1992/Cor 3:1999.: International Organization for Standardization (ISO). ISX Corporation. (1991). LOOM Users Guide Version 1.4. Jasper, R., & Uschold, M. (1999). A Framework for Understanding and Classifying Ontoogy Applications, Proceedings of the 12th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop.: University of Calgary/Stanford University. Karp, P. D., Chaudhri, V. K., & Thomere, J. (1999). XOL: XML based ontology exchange language, Specification, Version 0.3.: Pangea Systems, SRI International. Kashyap, V., & Sheth, A. (1995, 1995). Schematic and Semantic Semilarities between Database Objects: A Context-based Approach, LSDIS Lab, Univ. of GA. Keinitz, S. (1999). Integration heterogener Informationsquellen unter dem Aspekt der Abstraktion. Unpublished Masters Thesis, Bremen, Bremen. Kim, W., & Seo, J. (1991). Classifying schematic and data heterogeinity in multidatabase systems. IEEE Computer, 24(12), 12-18. Legat, R., Hashemi-Kepp, Kruse, Nicolai, Nyhuis, Pultz, Stallbaumer, Swoboda, & Zirn. (1999, 21.6.99). Der Umweltdatenkatalog UDK in Österreich, 5 Jahre Erfahrungen. Paper presented at the Umweltdatenbanken im Web, Karlsruhe. Lenat, D. B. (1998). The Dimensions of Context Space. Litwin, W., Mark, L., & Roussopoulos, N. (1990). Interoperability of Multiple Autonomous Databases. ACM Computing Surveys, 22(3), 267-293. Lynch, C. A. (1997). The Z39.50 Information Retrieval Standard. D-Lib Magazine, April 1997. Möller, R., Haarslev, V., & Neumann, B. (1998, 31. August- 4. September). Semantics-Based Information Retrieval. Paper presented at the IT&KNOWS-98: International Conference on Information Technology and Knowledge Systems, Vienna, Budapest. NASA. (1999). Directory Interchange Format (DIF) Writer's Guide, Version 7 (Global Change Master Directory). National Aeronautics and Space Administration (NASA). Available: http://gcmd.nasa.gov/difguide [1999. Nax, K., & Lethen, J. (1999). ETC/CDS GEneral Multilingual Environmental Thesaurus (GEMET) - General Information on GEMET.: ETC/CDS. OGC. (1999). The OpenGIS Abstract Specification ( 99-100r1.doc): Open GIS Consortium. OGC. (2000). OpenGIS Geography Markup Language Specification ( OGC RFC11 13-Dec-1999, based upon OpenGIS Project Document Number 99-082r2): Open GIS Consortium. OMG. (1992). 
The Common Object Request Broker: Architecture and Specification (OMG Document 91.12.1): The Object Management Group. Papakonstantinou, Y., Garcia-Molina, H., & Ullman, J. (1996, February 1996). MedMaker: A Mediation System Based on Declarative Specifications. Paper presented at the International Conference on Data Engineering, New Orleans. Rapoza, J. (2000). DAML could take search to a new level. PCWeek. Saltor, F., & Rodriguez, E. (1997, August 1997). On Intelligent Access to Heterogeneous Information. Paper presented at the Proceedings of the 4th Workshop Knowledge Representation Meets Databases (KRDB '97), Athens, Greece. Schreibener, A. T., Wielinga, B., Akkermans, H., Velde, W. v. d., & Anjewierden, A. (1994). CML The CommonKADS Conceptual Modeling Language. In S. e. al. (Ed.), A Future of Knowledge Acquisition, Proc. 8th European Knowledge Acquisition Workshop (EKAW 94).: Springer. Schreiber, G., Akkermans, H., Anjewierden, A., de Hoog, R., Shadbolt, N., van de Felde, W., & Wielinga, B. (1999). Knowledge Engineering and Mangement: The CommonKADS Methodology.: MIT Press. Sheth, A. P., & Larson, J. A. (1990). Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys, 22(3), 183-236. Stuckenschmidt, H., & Wache, H. (2000). Context Modeling and Transformation for Semantic Interoperability. Paper presented at the Knowledge Representation meets Databases KRDB 2000. Tschangho. (1999). Metadata for geo-spatial data sharing: A comparative analysis. Annals of Regional Science, Ann Reg Sci (1999)(33), 171 -- 181. UN. (1992). Terminology Bulletin No. 344. Volume II (indexes) and Volume I ( ST/SC/SER. F/344). New York: United Nations.
Uschold, M., & Grüninger, M. (1996). ONTOLOGIES: Principles, Methods and Applications. Knowledge Engineering Review, 11(2). W3C. (1998). Extensible Markup Language (XML) 1.0. Wache, H. (1999, May 1999). Towards Rule-Based Context Transformation in Mediators. Paper presented at the International Workshop on Engineering Federated Information Systems (EFIS 99), Kühlungsborn, Germany. Wache, H., Scholz, T., Stieghahn, H., & König-Ries, B. (1999). An Integration Method for the Specification of Rule-Oriented Mediators. In Y. Kambayashi & H. Takakura (Eds.), Proceedings of the International Symposium on Database Applications in Non-Traditional Environments (DANTE'99) (pp. 109-112). Kyoto, Japan. Wassem-Gutensohn, J. (2000). Umweltinformationen auf XML-Basis. Software AG. Weibel, S. (1999). The State of the Dublin Core Metadata Initiative April 1999. D-Lib Magazine, 5(4). Weibel, S., Godby, J., Miller, E., & Daniel, R. (1995). OCLC/NCSA Metadata Workshop Report. Office of Research, OCLC Online Computer Library Center, Inc., and Advanced Computing Lab, Los Alamos National Laboratory. Wiederhold, G. (1992). Mediators in the Architecture of Future Information Systems. IEEE Computer, 25(3), 38-49.
Web References:
ANZLIC: http://www.anzlic.org.au
ASDD: http://www.environment.gov.au/net/asdd/
Cosentino, M.: http://www.safe.com
EEA: http://www.mu.niedersachsen.de/cds
FGDC: http://www.fgdc.gov
OII: http://www.echo.lu/oii/
OGC: http://www.opengis.org
W3C (1998): Vector Markup Language (VML), World Wide Web Consortium Note 13-May-1998, Submission to the World Wide Web Consortium, NOTE-VML-19980513 (http://www.w3.org/TR/NOTE-VML)
W3C (1999a): HTML 4.01 Specification. W3C Recommendation 24 December 1999 (http://www.w3.org/TR/html401)
W3C (1999b): Extensible Markup Language (XML) 1.0; W3C Recommendation 10 February 1999 (http://www.w3.org/TR/REC-xml)
W3C (1999c): Resource Description Framework (RDF) Schema Specification; W3C Proposed Recommendation 3 March 1999 (http://www.w3.org/TR/PR-rdf-schema)
W3C (1999d): XML Schema Part 1 - Structures. World Wide Web Consortium Working Draft, 6 May 1999 (http://www.w3.org/TR/xmlschema-1)
W3C (1999e): XML Schema Part 2 - Datatypes. World Wide Web Consortium Working Draft, 6 May 1999 (http://www.w3.org/TR/xmlschema-2)
W3C (1999f): Scalable Vector Graphics (SVG) Specification, W3C Working Draft 11-February-1999, WD-SVG-19990211 (http://www.w3.org/TR/WD-SVG)