Using Ontologies as Artifacts to Enable Databases Interoperability
Daniela F. Brauner
Marco A. Casanova
Carlos J. P. de Lucena
LES – DI/PUC-Rio Rua Marques de São Vicente, 225 22453-900, Rio de Janeiro, RJ, Brazil +55 21 2540 6915 #137
{dani, casanova, lucena}@inf.puc-rio.br
ABSTRACT This paper presents a hybrid approach to database interoperability that combines ontologies with a catalog giving central access to the objects of federated databases. The catalog provides centralized access to, and the means to search, the services, data and metadata offered by the federation. It stores the reference ontology, which represents the global conceptual schema, together with the instances of reference. It also stores, for each data source, an ontology describing its conceptual schema and a data set with basic attributes. The catalog relies on mappings between the data source ontologies and the reference ontology. This design decision was taken to reduce the heterogeneity of the federated databases at both the structural and the semantic levels, making interoperability possible at both the data and the metadata levels.
Keywords Ontology, Catalog, Framework, Database Interoperability.
1. INTRODUCTION As technology evolves, the amount of available data has been growing rapidly. This data is stored in Database Management Systems (DBMSs) and usually comes from different sources, which are frequently related to the same application domain, for instance geoprocessing. However, because applications have different requirements and database modeling lacks standardization, each information source adopts its own data model. This heterogeneity characterizes the problem of interoperability between computational systems. In this context, the structural and semantic heterogeneity of data arise as challenges faced by the distributed databases community (Özsu, M. T.; Valduriez, P., 1999). Structural heterogeneity occurs when data is organized using different conceptual schemas. Semantic heterogeneity, in contrast, concerns different interpretations of the data content. To enable interoperability, remote systems must be able to locate and access data sources and also to interpret and process the data.
The solution proposed by the database community aims to identify and treat the relationships between schemas (schema mappings), which requires generating mappings between pairs of conceptual schemas. However, this solution becomes impracticable with a large number of databases because of the number of distinct mappings required. An alternative solution uses a global schema containing unification elements that correspond to each original database conceptual schema. Another approach to the problem of semantic interoperability uses ontologies to expose the implicit knowledge of applications (Wache, H. et al., 2001 and Uschold, M.; Grüninger, M., 1996). In our understanding, an ontology is an engineering artifact, constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary words (Guarino, N., 1998).
The goal of this paper is to propose a hybrid approach to the interoperability problem through the creation of database federations. In this approach, each federation has an object catalog that serves as a mediator to the federated databases. The catalog provides services to access and search for federation data and metadata. To this end, the catalog stores an ontology for each data source, containing the description of its conceptual schema and a data set with basic attributes. It also stores instances of reference and a reference ontology that represents the global conceptual schema of the federation. Furthermore, each database schema ontology has mappings, at both the structural and the semantic levels, to the elements of the catalog reference ontology. Thus, interoperability of the data sources is enabled at both the data and the metadata levels.
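The remark in the introduction that pairwise schema mappings become impracticable for many databases can be made concrete with a small calculation. The sketch below assumes one mapping per unordered pair of schemas in the pairwise approach and one mapping per schema in the global-schema approach; if mappings are directional, the pairwise count doubles. The function names are illustrative and do not come from the paper.

```python
def pairwise_mappings(n: int) -> int:
    """Mappings needed when every pair of n schemas is mapped directly."""
    return n * (n - 1) // 2


def global_schema_mappings(n: int) -> int:
    """Mappings needed when each of n schemas is mapped once to a global schema."""
    return n


if __name__ == "__main__":
    # Compare the two approaches for federations of increasing size.
    for n in (5, 20, 100):
        print(n, pairwise_mappings(n), global_schema_mappings(n))
```

The pairwise count grows quadratically with the number of databases, while the global-schema count grows linearly, which is the motivation for mapping every source to a single reference ontology.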
2. A FRAMEWORK FOR ONTOLOGY-BASED OBJECTS CATALOGS In what follows, we describe a framework (Fayad, M.E.; Schmidt D.C., 1999) for instantiating Ontology-based Objects Catalogs (OnOCs). The framework can be specialized to generate customized OnOC applications based on an ontology that the catalog application uses at execution time. The ontologies stored in the catalogs may be formalized in OWL (Web Ontology Language), which characterizes one frozen spot in the implementation architecture of the framework. The hot-spots of the framework are the ontology repository and the application domain of the reference ontology. The operations for data and metadata access and management are provided in the interface of every catalog. The operations of an OnOC will be based on a generalization of the service interfaces specified by the OpenGIS Consortium (OGC): OGC Gazetteer for data access and OGC Catalog for metadata access (Nebert, D., 2002). OGC specifies the Gazetteer Service to retrieve the geometry of one or more features (Atkinson, R.; Fitzke, J., 2002). A query is answered using a feature identifier, which can be any word or term that describes the feature. Usually, a gazetteer is a spatial dictionary of objects with geographic attributes (features). An OGC Gazetteer provides operations to:
get the service description and feature types (getCapabilities);
get the schema description of its feature types (describeFeatureType);
retrieve a set of feature instances (getFeature).
3. ONTOLOGY-BASED OBJECTS CATALOGS
application domain. It is also possible to use the thesaurus relationships introduced by Mena, E. et al. (2001) to define relationships between concepts from different ontologies. An OnOC mediates access to the data sources, acting as the pivot of a database federation. To join the federation, each database must follow the participation protocol: register in the OnOC, informing its conceptual schema (in OWL); map its database schema ontology to the reference ontology of the federation catalog; and, finally, provide the set of objects that it wants to share, with their basic attributes.
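The three steps of the participation protocol can be sketched as a small in-memory catalog. This is only an illustration under stated assumptions: the class and method names (`Catalog`, `register`, `add_mapping`, `share_objects`) are hypothetical, and the rule that objects can be shared only after at least one mapping exists is our reading of the order of the steps, not something the protocol states explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRegistration:
    """State kept by the catalog for one federated database (hypothetical)."""
    schema_ontology_uri: str
    mappings: list = field(default_factory=list)  # (source term, relation, reference term)
    objects: list = field(default_factory=list)   # shared objects with basic attributes


class Catalog:
    def __init__(self):
        self.sources = {}

    def register(self, source_id, schema_ontology_uri):
        # Step 1: the source informs its conceptual schema (an OWL document).
        self.sources[source_id] = SourceRegistration(schema_ontology_uri)

    def add_mapping(self, source_id, source_term, relation, reference_term):
        # Step 2: terms of the schema ontology are mapped to the reference ontology.
        self._require(source_id).mappings.append((source_term, relation, reference_term))

    def share_objects(self, source_id, objects):
        # Step 3: the source provides the objects it wants to share.
        registration = self._require(source_id)
        if not registration.mappings:
            # Assumption: sharing objects requires a prior mapping step.
            raise ValueError("map the schema to the reference ontology first")
        registration.objects.extend(objects)

    def _require(self, source_id):
        if source_id not in self.sources:
            raise KeyError(f"source {source_id!r} is not registered")
        return self.sources[source_id]
```

A source would call `register`, then `add_mapping` for each mapped term, and finally `share_objects` with its objects and their basic attributes.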
3.2 OnOCs GENERIC ARCHITECTURE The generic architecture of OnOCs, illustrated in Figure 2, is composed of the following modules:
Interface: module to access the OnOC operations. Defined as a framework hot-spot;
Metadata Manipulation Module: implements the OnOC operations to manipulate the database schema ontologies. The operations will be defined as a generalization of the OGC Catalog Service interface specification:
3.1 OVERVIEW An OnOC maintains a reference ontology and the database schema ontologies, each of which describes the conceptual schema of one federated database. The reference ontology keeps a data set (objects) with basic attributes to be shared with other federated databases. Each instance from an original database is mapped to a concept in the global ontology, preserving its original identification. In this way, the catalog provides ways to define equivalent classes and objects across different databases. Figure 1 shows the use of the reference ontology (RO) and the instances of reference (IR) to map, respectively, the schemas and the data of the data sources.
Discovery services: provide methods to locate ontologies on the OnOC repository;
Access services: provide methods to request services on data;
Management services: define methods to update ontologies on the repository;
Instances Manipulation Module: implements the OnOC operations to manipulate the instances (data). The operations will be defined as a generalization of the OGC Gazetteer interface specification:
GetCapabilities: returns the description of the service provided by the OnOC, with a list of supported object types and their respective supported operations.
DescribeObjectType: returns the definition of the database schema ontology of any object type.
GetObject: enables querying for a set of object instances using filters based on the properties of the object types.
Inference Module: module containing an inference engine that enables inference over the ontology repository to derive additional information about the federation data;
Figure 1: Objects and ontologies mappings
The mapping between the database schema ontologies and the reference ontology will be based, at least, on the generic relationships available to the reference ontology (rdfs:subClassOf, owl:sameAs, owl:equivalentClass, owl:equivalentProperty, ...) and on the defined relationships that depend on the catalog
Reference Ontology: local reference ontology that describes, in the application domain of the federation, the classes, basic properties and equivalence relationships that give an integrated view of the federation database ontologies;
Ontology Repository: stores and manages the ontologies that describe the federation databases schemas and the OnOC reference ontology;
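The three operations of the Instances Manipulation Module listed above can be sketched over in-memory data. This is a minimal illustration, not the OGC interface or the authors' implementation: the class name `InstancesModule`, the dictionary layout of schemas and objects, and the equality-only filters are all assumptions made for the example.

```python
class InstancesModule:
    """In-memory stand-in for the Instances Manipulation Module (hypothetical)."""

    OPERATIONS = ("GetCapabilities", "DescribeObjectType", "GetObject")

    def __init__(self, schemas, objects):
        self.schemas = schemas  # object type -> {property name: datatype}
        self.objects = objects  # list of dicts, each with a "type" key

    def get_capabilities(self):
        # Service description: supported object types and operations.
        return {"object_types": sorted(self.schemas),
                "operations": list(self.OPERATIONS)}

    def describe_object_type(self, object_type):
        # Definition of one object type, taken from its schema description.
        return dict(self.schemas[object_type])

    def get_object(self, object_type, **filters):
        # Instances of the type whose properties equal every filter value.
        return [
            obj for obj in self.objects
            if obj["type"] == object_type
            and all(obj.get(name) == value for name, value in filters.items())
        ]
```

For example, `get_object("City", name="Rio de Janeiro")` would return the stored objects of type `City` with that name, regardless of which source contributed them.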
repository to discover where to find objects of type “city” (objects that are instances of the city class). The answer of the DescribeObjectType operation, as explained above, contains metadata about the “city” class, including mappings to equivalent classes in other registered sources. It enables the user to locate and access all object sources registered in the catalog that contain information about cities, through the URI of the respective database schema ontology.
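The second scenario can be sketched as a filter over the triples of the reference ontology. The sketch below returns Python tuples rather than the XML representation mentioned in the text, and the class names (`ro:City`, `src01:Ciudad`, `src02:Cidade`, `src02:Estado`) are hypothetical, since the paper shows the actual classes only in its figures.

```python
OWL = "http://www.w3.org/2002/07/owl#"

# Reference-ontology statements as (subject, property, object) triples.
# All class names here are hypothetical placeholders.
REFERENCE_TRIPLES = [
    ("ro:City", OWL + "equivalentClass", "src01:Ciudad"),
    ("ro:City", OWL + "equivalentClass", "src02:Cidade"),
    ("ro:State", OWL + "equivalentClass", "src02:Estado"),
]


def describe_object_type(triples, g):
    """Return every triple (G, p, R) whose subject is the class G."""
    return [t for t in triples if t[0] == g]


def sources_for(triples, g):
    """Prefixes of the source classes declared equivalent to G."""
    return sorted({
        r.split(":", 1)[0]
        for _, p, r in describe_object_type(triples, g)
        if p == OWL + "equivalentClass"
    })
```

With these sample triples, asking for `ro:City` yields two triples, one per source, from which the user can tell that both src01 and src02 hold information about cities.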
4. PRACTICAL EXPERIENCES
To illustrate the usage scenarios of a Geographic Ontology Catalog, two simple object source ontologies were created. They are identified, respectively, by the following namespaces:
src01: http://www.inf.puc-rio.br/~dani/sw/ontologies/2004/09/ciudades.owl#
src02: http://www.inf.puc-rio.br/~dani/sw/ontologies/2004/09/cidade.owl#
The object source ontologies listed above were generated from relational database schemas using the following transformations:
Tables from relational databases were described as classes in OWL;
Attributes from tables were mapped into Datatype Properties in OWL;
Foreign keys in the relational representation were mapped into Object Properties in OWL, linking the classes that represent the entities involved in the relationship;
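The transformations above can be sketched as a function that turns one table description into OWL-style triples. This is an illustration only: the function name, the prefixed-name strings, and the use of rdfs:domain and rdfs:range to attach properties to classes are assumptions, and the table and column names in the usage note are hypothetical.

```python
RDF_TYPE = "rdf:type"


def table_to_triples(ns, table, columns, foreign_keys=None):
    """Describe one relational table as OWL-style triples.

    columns: {column name: XSD datatype}
    foreign_keys: {column name: referenced table}
    """
    foreign_keys = foreign_keys or {}
    cls = ns + table
    triples = [(cls, RDF_TYPE, "owl:Class")]  # table -> class
    for column, datatype in columns.items():
        prop = ns + column
        if column in foreign_keys:
            # Foreign key -> object property linking the two classes.
            triples += [(prop, RDF_TYPE, "owl:ObjectProperty"),
                        (prop, "rdfs:domain", cls),
                        (prop, "rdfs:range", ns + foreign_keys[column])]
        else:
            # Plain attribute -> datatype property.
            triples += [(prop, RDF_TYPE, "owl:DatatypeProperty"),
                        (prop, "rdfs:domain", cls),
                        (prop, "rdfs:range", datatype)]
    return triples
```

For a hypothetical table `Cidade` with a string column `nome` and a foreign key `estado` to a table `Estado`, the call `table_to_triples("src02:", "Cidade", {"nome": "xsd:string", "estado": "xsd:integer"}, {"estado": "Estado"})` produces one class, one datatype property and one object property. Columns with the same name in different tables would collide in this simplified naming scheme.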
Figure 2: OnOC generic architecture
3.3 OnOCs USAGE SCENARIOS In this section we describe three usage scenarios of OnOCs. Scenario 1: In this scenario, we describe the use of a catalog instance in the geographic application domain as a mediator for federated geo-object sources. Consider a user who wants to know facts about the city of “Rio de Janeiro”. The catalog is invoked through the GetObject operation, with the place name passed as a parameter that serves as the geo-object identifier in the user query. The catalog executes the query against its ontology repository, locating all objects of type “city” that have “Rio de Janeiro” as their place name. The answer may include objects from distinct sources registered in the catalog. This is possible because the catalog stores mappings linking equivalent objects from distinct sources, even if they belong to different classes.
Figure 3 and Figure 4 show the src01 and src02 object source ontologies, respectively. These visualizations were generated using the ezOWL Widget plugin in Protégé 3.0 build 54, an ontology editor (http://protege.stanford.edu/). Figure 5 shows the geographic reference ontology containing the mapping relationships between src01 and src02. It is identified by GeoMapRO and can be found at http://www.inf.puc-rio.br/~dani/sw/ontologies/2004/09/GeoMapRO.owl#
Scenario 2: This scenario covers the case of a user who needs to discover object sources with classes that map to a specific object type (class) G of the reference ontology. In this scenario, the catalog is invoked through the DescribeObjectType operation, with G as one of its parameters. The expected answer is an XML representation of a set of triples of the form (G, p, R), where p is a property that maps G into range R, defined in the reference ontology. In particular, p may represent a relationship between G and some class H defined in the ontology of one of the object sources. For instance, using the same catalog instance as in the first scenario, consider a user who needs information about “cities”. The catalog will execute the query against the ontology
Figure 3 - src01 object source ontology
CAPES/Brazil and CNPq/Brazil grant 552068/2002-0 and 552040/2002-9.
REFERENCES
Atkinson, R.; Fitzke, J.: Gazetteer Service Profile of the Web Feature Service Implementation Specification, Version 0.9, OGC 02-076r3, OpenGIS© Discussion Paper, OpenGIS Consortium, 2002. (http://www.opengis.org/docs/02076r3.pdf, accessed on 10/07/2004).
Figure 4 - src02 object source ontology
Berners-Lee, T.; Hendler, J.; Lassila, O.: The Semantic Web. Scientific American, May, 2001.
Chu, W. W.; Mao, W.: CoSent: A Cooperative Sentinel for Intelligent Information Systems. In: Proceedings of 2000 Int’l Conference on Artificial Intelligence (ICAI), Las Vegas, NV, 2000.
Fayad, M.E.; Schmidt D.C.: Building Application Frameworks: Object-Oriented Foundations of Framework Design, Wiley Computer Publishing, 1999.
Florescu, D.; Levy, A.; Mendelzon, A.: Database Techniques for the World-Wide Web: A Survey. SIGMOD Record, 27(3): September, 1998, pp. 59-74.
Guarino, N.: Formal Ontology and Information Systems. In: Proceedings of FOIS’98, Trento, Italy, 6-8 June 1998, Amsterdam, IOS Press, pp. 3-15.
Figure 5 - Geographic Reference Ontology
Gupta, A.; Marciano, R.; Zaslavsky, I.; Baru, C.: Integrating GIS and Imagery through XML-Based Information Mediation, NSF International Workshop on Integrated Spatial Databases: Digital Images and GIS, June, 1999. (http://www.npaci.edu/DICE/Pubs/isd99.pdf, accessed on 11/07/2004).
Equivalent classes and properties were linked using owl:equivalentClass and owl:equivalentProperty, respectively. Individuals (instances of reference) that refer to the same object are linked using the owl:sameAs property.
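Since owl:sameAs is symmetric and transitive, individuals linked through a common instance of reference end up in the same group, which is what allows a query such as the one in Scenario 1 to return objects from distinct sources. The sketch below groups individuals with a small union-find structure; the individual identifiers are hypothetical, and this is one possible way to compute the groups, not the mechanism the paper prescribes.

```python
def same_as_groups(individuals, same_as_links):
    """Group individuals connected by owl:sameAs links.

    The links are treated as symmetric and transitive.
    """
    parent = {i: i for i in individuals}

    def find(x):
        # Follow parent pointers to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_links:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for i in individuals:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(g) for g in groups.values())
```

For instance, if `src01:c1` and `src02:c7` are both declared owl:sameAs the reference instance `ir:RioDeJaneiro`, the three identifiers fall into one group, while an unlinked individual remains in a group of its own.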
Hakimpour, F.; Geppert, A.: Resolving semantic heterogeneity in schema integration. In: Proceedings of the International Conference on Formal Ontology in Information Systems, Ogunquit, Maine, USA, October, 2001, ACM Press, Volume 2001, pp. 297-308.
5. CONCLUSION
Hemerly, A. S.; Furtado, A. L.; Casanova, M. A.: Towards Cooperativeness in Geographic Databases. In: Proc. 4th Int’l. Conf. on Databases and Expert Systems Applications (DEXA 93), Prague, Sept. 1993. (Also in Lecture Notes in Computer Science, Vol. 720, Springer Verlag).
In this work, we introduced the concept of an Ontology-based Object Catalog (OnOC) as the pivot of a federation of independent object sources. The paper presented a generic architecture and a framework that facilitates generating customized catalogs for a specific application domain. An OnOC stores a reference ontology and ontologies representing the conceptual schemas of the sources. The reference ontology may include a set of reference instances, which are sometimes essential to guarantee interoperability. At present, we are at the framework specification stage, and we intend to test the framework architecture proposed here by instantiating the Geo-Object Catalog (GeoOC) with the following technologies implemented as framework hot-spots:
Domain of application: geoprocessing
Ontology repository: Jena API
ACKNOWLEDGMENTS We gratefully acknowledge the helpful suggestions received from Nicolau Meisel. This work is partially supported by
Hernandez, M. A.; Miller, R. J.; Haas, L. M.: Clio: A Semi-Automatic Tool For Schema Mapping. In: Proceedings ACM SIGMOD Conference 2001, Santa Barbara, CA, USA, 2001. (http://www.almaden.ibm.com/software/km/clio/sigmod02demo.pdf, accessed on 11/07/2004).
Linthicum, D.: Semantic Mapping, Ontologies, and XML Standards. www.XML-JOURNAL.com, May, 2004.
Nebert, D.: Catalog Services Specification, Version 1.1.1, OpenGIS© Implementation Specification, OpenGIS Consortium, 2002. (http://www.opengis.org/docs/02087r3.pdf, accessed on 10/07/2004).
OpenGIS Consortium Inc. (http://www.opengis.org/, accessed on 10/07/2004).
Özsu, M. T.; Valduriez, P.: Principles of distributed database systems. 2nd Edition, Prentice-Hall, Inc., 1999.
RDQL - RDF Data Query Language Tutorial (http://jena.sourceforge.net/tutorial/RDQL/, accessed on 11/07/2004).
Uschold, M.; Grüninger, M.: Ontologies: Principles, methods and applications. Knowledge Engineering Review, 11(2): 1996, pp. 93-155.
Vretanos, P. A.: Web Feature Service Implementation Specification, Version 1.0.0, OpenGIS© Implementation Specification, OpenGIS Consortium, 2002. (http://www.opengis.org/docs/02-058.pdf, accessed on 10/07/2004).
Wache H.; Vögele, T.; Visser, U.; Stuckenschmidt, H.; Schuster, G.; Neumann, H.; Hübner, S.: Ontology-Based Integration of Information – A Survey of Existing Approaches. In: Proceedings of IJCAI-01 Workshop: Ontologies and Information Sharing, Seattle, WA, 2001, pp. 108-117.