Towards a Context Ontology for Geospatial Data Integration

Damires Souza¹,², Ana Carolina Salgado¹, and Patricia Tedesco¹

¹ Centro de Informática, Universidade Federal de Pernambuco, Brazil, C.P. 7851, Recife, PE, Brasil – CEP 50732-970
{dysf, acs, pcart}@cin.ufpe.br
² Centro Federal de Educação Tecnológica da Paraíba – CEFET/PB, Brazil
Abstract. Recently, Geospatial data and Geographic Information Systems (GIS) have been increasingly used. As a result, the integration of geospatial data has become a crucial task for decision makers. Since GIS and geospatial databases are designed by different organizations using different representation models and there are diverse levels of detail for the spatial features, it is much more complex to achieve data integration in geospatial databases. To help matters, context information may be employed to improve two fundamental aspects in Geospatial Data Integration: (1) schema mapping generation and (2) query answering. However, a relevant issue when using context is how to better represent context information. Ontologies are an interesting approach to represent context, since they enable sharing and reusability and help reasoning. In this paper, we propose a context ontology to formally represent context in geospatial data integration. We also present an example where this context ontology is used to improve query processing.

Keywords: Geospatial Data Integration, Context, Ontologies.
R. Meersman, Z. Tari, P. Herrero et al. (Eds.): OTM Workshops 2006, LNCS 4278, pp. 1576 – 1585, 2006. © Springer-Verlag Berlin Heidelberg 2006

1 Introduction

As the Geographic Information System (GIS) community grows and geospatial data increases in importance, many public and private organizations need to disseminate and access the latest data as fast as possible and at a minimum cost [1]. Most GIS data is usually stored in geographic databases, although some is still stored in proprietary archive systems. To allow data dissemination, GIS and their geographic database schemas have to be, at least, interoperable. However, over time, GIS have been developed independently to meet specific requirements. As a result, interoperability is hampered by the difficulty of reconciling and integrating geospatial data from the many heterogeneous GIS and their data sets. Amongst all the issues in interoperability, semantic heterogeneity is the hardest to reconcile and still remains open.

To overcome this difficulty, recent works have considered the use of ontologies [2, 3] as a way of providing a domain reference to improve geospatial schema and data integration. In this work, we use context information, i.e. the circumstantial information that makes a situation unique and comprehensible [4,5], as well as domain ontologies, as a way to enrich the geospatial data integration process. In this light, context information is used to ease schema mapping discovery, helping to determine the correct meaning of an entity. It is also used to improve query processing capabilities, providing users with “meaningful”, i.e., more relevant results. Thus, context information (explicitly or implicitly gathered) and domain ontology are used to handle heterogeneity and, consequently, provide users with more complete answers according to their current context of work.

To illustrate the importance of context usage, consider two geospatial data sources which store the entity “River”. In data source A, “River” is called “StreamofWater” and is represented as a single line. In source B, “River” is represented as a compound line. When a user poses a query like “select r.name, r.length from river r;”, s/he is looking for all the rivers’ names and their corresponding lengths. To answer that query, all context information around its formulation should be considered. First, a mapping such as A.StreamofWater ⇔ B.River (which has already been generated) is observed. This means that both entities are semantically equivalent. As a result, query execution retrieves all the rivers found in both sources. Furthermore, to produce the final result, the system takes into account the user’s application scale, which representation is best to depict (line or compound line), and the user’s preferences. All this information is contextual and depends on the moment of query formulation.

An important issue in reasoning about context is how to represent its information [4,6,7]. A challenge to be faced is that there is still no standard model for representing it. Context ontologies have been considered an interesting approach because they enable sharing and reusability and may be used by different reasoning mechanisms [6,7]. In this paper, we present an ontology to represent contextual information in geospatial data integration.
The idea is to identify which information pertinent to the geospatial realm can be classified as context and which kinds of context should be considered to improve data integration. Through this model, it is possible to compose inference rules that enable the discovery of high-level implicit context from low-level explicit context. To clarify matters, we present a scenario illustrating how this context ontology can be used to improve query processing in geospatial data integration. This paper is organized as follows: Section 2 introduces some concepts related to context and context representation; Section 3 presents context in the light of geospatial data integration; Section 4 describes our proposal and an example of its use. Finally, Section 5 draws some conclusions and points out some future work.
2 Context and Context Representation

Almost every statement we make is imprecise and hence meaningful only if understood with reference to an underlying context which embodies a number of hidden assumptions [8]. In this sense, context is any information that can be used to characterize the situation of an entity [5]. This entity may be an application user or a computational object, such as a device, a data source or a relation in a relational database schema. An application that gathers and uses context information in order to adapt its behavior accordingly is called a context-aware system.

As an illustration, imagine a geospatial database which stores the following: Street(‘Epitacio Pessoa’, 3000, ‘Good’). How can these values be interpreted? When trying to figure out the meaning of this tuple, some possibilities may be considered: (1) Street(name, length, conservation level) or (2) Street(name, width, cleanliness level). The point is that each database maintains its own assumptions about the data it stores in an autonomous and independent way. Thus, context information is used to make explicit the assumptions made in database design, so that the underlying semantics can be understood. In the database domain, context may be used in several ways to capture the relevant semantics related to an object and its relationships to other objects [9]. Metadata, for instance, is an example of information that can be treated as context.

To allow context usage, it is crucial to define how context information will be represented. Some issues should be considered when evaluating techniques to represent context: (1) the model must be portable, (2) it should have validation tools for editing, type checking and conversion between formats, (3) formality is welcome since it eases definition, reasoning and reusability and (4) it must support reasoning. Current research has worked with a considerable number of context representation techniques, such as Contextual Graphs [4], Topic Maps [10] and Ontologies [6,7]. A contextual graph is an acyclic directed graph that allows a context-based representation of operational processes by taking into account the working environment [4]. Topic maps are an attempt to connect pieces of data into a graph which represents the relationships between them while providing a lightweight way of navigating the information [10]. Ontologies are commonly referred to as the shared understanding of some domain, often conceived as a set of entities, relations, functions, axioms and instances [6]. Thus, shared ontologies are fundamental for reusing knowledge, serving as a means for integrating problem-solving, domain representation and knowledge acquisition modules [11].
According to the issues pointed out above, ontologies seem to be one of the best options for context representation. In other words, there are several advantages for developing ontology-based context models [6,7], namely: to provide knowledge sharing (services are supposed to deal with the same set of concepts), to enable information reuse, to define semantics independently from data representation, and finally to enable the use of existing inference engines.
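The Street example above can be made concrete with a minimal Python sketch. It is only an illustration of the idea that the same stored tuple yields different readings under different design assumptions; the `SchemaContext` class, the `interpret` function and the source names are hypothetical and are not part of the proposed ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaContext:
    """Design assumptions a source makes about the Street relation (hypothetical)."""
    source: str
    attributes: tuple  # attribute names, in column order


def interpret(row, context):
    """Attach the attribute names assumed by `context` to a raw tuple."""
    if len(row) != len(context.attributes):
        raise ValueError("row and context disagree on the number of columns")
    return dict(zip(context.attributes, row))


# The stored tuple from the text: Street('Epitacio Pessoa', 3000, 'Good')
row = ("Epitacio Pessoa", 3000, "Good")

# Two possible sets of design assumptions, as listed in the text
ctx1 = SchemaContext("source_1", ("name", "length", "conservation_level"))
ctx2 = SchemaContext("source_2", ("name", "width", "cleanliness_level"))

reading_1 = interpret(row, ctx1)  # 3000 is read as a length
reading_2 = interpret(row, ctx2)  # 3000 is read as a width
```

The values are identical in both readings; only the context attached to them changes what they mean.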
3 Context in Geospatial Data Integration

Geospatial data integration solutions attempt to provide users with a uniform interface to access and retrieve information from distributed data sources (e.g. GIS or geographic databases) that are usually heterogeneous, autonomous and dynamic. The most important advantage of these systems is that they enable users to specify what they want without thinking about how to obtain the answers [12]. One of the main problems in geospatial data integration systems that adopt a virtual approach [13] is query reformulation. When a user poses a query, it is decomposed into sub-queries to be evaluated and executed on the remote data sources. To this end, the system requires a complete understanding of the semantics of these sources. Thus, the description of a data source must include its metadata (schema), contents, completeness and query capabilities. The latter matter especially in the geospatial realm, since not all data sources will be able to execute the required spatial operations. Besides, the complexity of query reformulation depends on the mappings defined between the related schemas.
Due to heterogeneity, schematic and semantic conflicts appear either at the schema level or the instance level. As geospatial data are often described according to multiple perceptions, using different terms with different levels of detail, heterogeneity becomes more accentuated. In this sense, geospatial data conflicts include all the general data conflicts, such as domain incompatibility (different data types, precision and measure units), incompatibility among entities (names, keys), generalization and aggregation, and so forth. Specific geospatial data conflicts are: different scales, different coordinate systems, different geometric data types and multi-representation, vector/raster storage models and specific geographical composition (e.g. a street may be represented as a single line in one source and as a composition of various line segments in another).

To ease conflict resolution and improve query reformulation, context information can be used. Some of the metadata used to describe the contents of the data sources becomes contextual information (e.g. available spatial operators). Other contextual information is perceived or inferred dynamically during query processing (e.g. the application scale in use). Applying context reasoning in query processing enriches the complete process and provides what has been called context-aware queries: those whose results depend on the context at the time of their submission [14]. In the geospatial realm, specific data conflicts arise mostly when sub-query answers are assembled to produce a final result. Context information such as user preferences, scale and coordinate system in use, multi-representation factors, intended level of detail and spatial relationships should be taken into account to determine the best scale, format and data representations to be presented to the user.
In summary, reasoning over context information can help us to improve geospatial data integration in various aspects, namely: (1) context may ease inter-schema mapping generation, since it helps to determine the meaning of the terms, in addition to domain ontology that is used as a semantic reference; (2) query answering becomes more relevant; and (3) specific geospatial conflicts are better solved according to query formulation context, necessary spatial operations and intended level of detail. In this paper, we present a geospatial context ontology and its usage. We focus on using context to better resolve data conflicts to produce more relevant query answers.
4 A Context Ontology for GeoSpatial Data Integration

In this section, we present our first steps towards the construction of an ontology for representing context according to geospatial data integration issues. The ontology has been developed using the Protégé 3.2 tool¹. In order to motivate its construction, we first introduce a geospatial data integration example. Then, we explain the main concepts defined in the ontology and provide more examples to demonstrate its usage.

¹ http://protege.stanford.edu/

4.1 An Integration Example

Our motivating example is related to the integration of two geospatial data sources, A and B, which store data about the Brazilian Hydrographic System. Source A is at a scale of 1:1’000’000, while source B is more detailed and is at a scale of 1:250’000. Figure 1 shows a UML diagram² for both data sources.

Source “A”
  GeographicArea (GID : Integer, Name : String)
    Lake (Geometry : Point)
    StreamofWater (Geometry : Line)

Source “B”
  BasicClass (GID : String, Name : String)
    Lake (Shape : Polygon, Capacity : Real)
    River (Shape : Line, Status : String)

Fig. 1. “A” Data Source Schema and “B” Data Source Schema
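As a rough illustration, the two schemas of Figure 1 can be written down as plain Python dictionaries and compared along a set of correspondences to list schema-level conflicts. The sketch below is not the paper's mapping generation process: the `CORRESPONDENCES` list is assumed to be given, and `find_conflicts` is a hypothetical helper name.

```python
# Schemas transcribed from Figure 1: entity -> {attribute: data type}
SOURCE_A = {
    "GeographicArea": {"GID": "Integer", "Name": "String"},
    "Lake": {"Geometry": "Point"},
    "StreamofWater": {"Geometry": "Line"},
}
SOURCE_B = {
    "BasicClass": {"GID": "String", "Name": "String"},
    "Lake": {"Shape": "Polygon", "Capacity": "Real"},
    "River": {"Shape": "Line", "Status": "String"},
}

# Assumed correspondences: (entity, attribute) in A -> (entity, attribute) in B
CORRESPONDENCES = [
    (("GeographicArea", "GID"), ("BasicClass", "GID")),
    (("GeographicArea", "Name"), ("BasicClass", "Name")),
    (("Lake", "Geometry"), ("Lake", "Shape")),
    (("StreamofWater", "Geometry"), ("River", "Shape")),
]


def find_conflicts(schema_a, schema_b, correspondences):
    """List naming and data type conflicts between corresponding elements."""
    conflicts = []

    def add(conflict):
        # Avoid reporting the same conflict twice
        if conflict not in conflicts:
            conflicts.append(conflict)

    for (ent_a, att_a), (ent_b, att_b) in correspondences:
        if ent_a != ent_b:
            add(("entity name", ent_a, ent_b))
        if att_a != att_b:
            add(("attribute name", f"{ent_a}.{att_a}", f"{ent_b}.{att_b}"))
        type_a = schema_a[ent_a][att_a]
        type_b = schema_b[ent_b][att_b]
        if type_a != type_b:
            add(("data type", type_a, type_b))
    return conflicts


conflicts = find_conflicts(SOURCE_A, SOURCE_B, CORRESPONDENCES)
for kind, left, right in conflicts:
    print(f"{kind}: {left} vs. {right}")
```

Instance-level conflicts such as scale differences are not visible at this level, since they depend on properties of the data rather than on the schemas.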
Data Source “A” contains two classes, Lake and StreamofWater, which inherit some characteristics from their superclass, GeographicArea. Both classes have a geometry attribute. Data source “B” contains Lake and River, which are subclasses of BasicClass. Both have a shape attribute. In this example the semantic conflicts at the schema level are: (1) different entity names: GeographicArea vs. BasicClass and StreamofWater vs. River; (2) different attribute names: geometry vs. shape; and (3) different data types: integer vs. string (GID) and point vs. polygon (lake). Other relevant conflicts are the instance-level ones. Here we have different scales, 1:1’000’000 (source A) vs. 1:250’000 (source B), and the multi-representation problem, since lake is represented by a point in source A and by a polygon in source B. Finally, both data sources are considered to be vector, but, in fact, real data sets may be vector or raster, which raises complexity and may entail format conversions.

4.2 The Context Ontology

The context ontology for GeoSpatial Data Integration is depicted in Figure 2. For the sake of space, we have converted it to the UML notation. Firstly, an upper ontology has been defined (in gray with borders in bold) with meta-concepts that can be used in a broad range of domains. Then, concepts from data integration have been specified (in gray) in a middle ontology that can be used in data integration solutions. Finally, concepts from the geospatial realm have been added to produce the specific ontology for geospatial data integration. The context ontology concepts are explained below.

Context Information: this is the ontology root. It is divided into four subconcepts: UserContext, DataContext, AssociationContext and ProcedureContext.

UserContext: contains information about the user, his/her profile, identification (UserID) and location (UserLocation). The user may define his/her preferences about the way a query result should be presented.
DataContext: refers to all context information related to data. Geospatial Entity constitutes the main concept in geospatial data integration. It has several slots: dataID, entityLocation, scale, data source, geometric representation (e.g. point, line or polygon), coordinate system and meaning (discovered when identifying its corresponding concept in the domain ontology). In Figure 3, we present a partial view of Geospatial Entity and Data Source with instances and relationships in the light of our motivating example. The example is in OntoViz (Protégé plug-in) notation³, so instances are associated with their concepts through the io (instance of) relationship. Subtypes are associated with their supertypes through the isa relationship.

² http://www.uml.org/

[Figure 2: UML rendering of the context ontology. Concepts and labels recoverable from the figure: Context Information, Information Type, Explicit, Implicit, Perceived, Inferred, UserContext, Profile, UserID, UserLocation, ProcedureContext, Schema Mapping Generation, Spatial Query, DataContext, Geospatial Entity, Data Source, Coordinate System, Meaning, QueryResult, AssociationContext, Semantic Association, Synonym, Antonym, Homonym, hasSemanticAssociation, Spatial Relationship, Topology-based, Metric-based, Direction-based, Schema Mapping. Legend: Generalization, Part-Of, Specialization; layers: Upper, Data Integration (DI), GeoSpatial DI.]

Fig. 2. Context Ontology for Geospatial Data Integration

AssociationContext: is concerned with the relationships that may occur in geospatial data integration, such as semantic associations, spatial relationships and schema mappings. These are examples of inferred context information since they are derived according to a set of rules and conditions. Semantic associations represent relationships that actually hold in the real world and they are used to determine the degree of similarity between one entity (or attribute) and another. Spatial relationships may be derived through the analysis of object locations and they are classified into topological, directional or metrical. Schema mappings are the result of the schema mapping generation process and they are used to improve query reformulation.

ProcedureContext: a procedure is an ordered collection of actions [4]. The idea is to provide the contextualization of the steps that are executed in order to solve a problem. Each step is executed in a surrounding set of circumstances that compose the context of the execution and allow the system to react accordingly. Therefore a procedure may be the complete mapping generation process or a particular spatial query.

All context information also has an information type, which may be explicit or implicit. Explicit context information is obtained from static sources, such as a profile or an archive. Implicit context information is perceived in the surrounding dynamic environment or derived through some reasoning process.
For example, a spatial relationship is inferred through the analysis of the locations of two objects. Likewise, a user’s working scale may be identified through his/her application parameters.

³ http://protege.cim3.net/cgi-bin/wiki.pl?OntoViz#nid6CS
Fig. 3. A Partial View of GeoSpatial Entity, Data Source and Some Instances
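To give a feel for how part of this ontology could be carried into an application, the sketch below encodes the information type and the Geospatial Entity slots as Python dataclasses and instantiates the two Lake entities of the motivating example. It is a simplified, assumed encoding rather than the Protégé model itself: the class layout, the `dataID` values and the `same_meaning` helper are hypothetical, and slots whose values the example does not give (entityLocation, coordinate system) are left empty.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InformationType(Enum):
    EXPLICIT = "explicit"  # obtained from static sources (profile, archive)
    IMPLICIT = "implicit"  # perceived or inferred


@dataclass
class ContextInformation:
    """Root concept: every piece of context has an information type."""
    information_type: InformationType


@dataclass
class DataSource(ContextInformation):
    name: str


@dataclass
class GeospatialEntity(ContextInformation):
    """Slots follow the DataContext description; scale is stored as a denominator."""
    data_id: str                       # hypothetical identifiers
    data_source: DataSource
    scale_denominator: int             # 1:1'000'000 -> 1_000_000
    geometric_representation: str      # point, line or polygon
    meaning: str                       # concept in the domain ontology
    coordinate_system: Optional[str] = None  # not given in the example
    entity_location: Optional[str] = None    # not given in the example


def same_meaning(e1, e2):
    """Two entities are treated as equivalent when they share a domain concept."""
    return e1.meaning == e2.meaning


source_a = DataSource(InformationType.EXPLICIT, "A")
source_b = DataSource(InformationType.EXPLICIT, "B")

lake_a = GeospatialEntity(InformationType.EXPLICIT, "A.Lake", source_a,
                          1_000_000, "point", "Lake")
lake_b = GeospatialEntity(InformationType.EXPLICIT, "B.Lake", source_b,
                          250_000, "polygon", "Lake")
```

With these instances, `same_meaning(lake_a, lake_b)` holds even though the two entities differ in scale and geometric representation, which is the multi-representation situation described in Section 4.1.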
4.3 Context Reasoning

The context ontology is used to represent and provide ways to maintain the contextual information used for the semantic interpretation of data source elements as well as for the actual geospatial data integration. One of the advantages of using an ontology mechanism is inferring new complex information from existing basic context [7]. Hence, context may be used to ease mapping generation as well as to improve query processing. In our example, we assume that the inter-schema mappings have already been generated, so we are able to focus on query reformulation. We intend to use context to improve query processing and provide users with meaningful and more complete answers. In this sense, we consider geospatial context-aware queries as queries whose processing depends on the context at the time of their submission. This means that not only user preferences, scale and intended level of detail are to be considered, but also the existing mappings between the sources that will be able to answer the query. Thus, all the surrounding query context must be used for reasoning.

Suppose that a user poses the following spatial query SQ1: “SELECT Lake.name FROM Lake, Country WHERE INSIDE (Lake, Country) and Country.name like ‘Brazil’;”. At query formulation time, some context information has already been gathered, while other context information is perceived or inferred. For example, existing mappings between the entities must be observed. As Lake is one of the necessary entities to answer the query, we present some context values for its instances in Figure 4, using the OntoViz notation. The system infers that both entities are equivalent, but are represented differently and are stored in different scales. In fact, the spatial query has its own context values, as we can see in Figure 5, also in the OntoViz notation.
Since the query is about the “INSIDE” operation, the system can decide which data sources are able to execute it. Thus, SQ1 is decomposed taking into consideration such information.
Fig. 4. Context Values for Lake’s Instances
Fig. 5. Context Values for Spatial Query SQ1
When the sub-query results are assembled to produce the final answer, we have to consider other context information such as multi-representation and scale differences. Since the formulating scale is about 1:300’000, the user is working with a more detailed view of the themes. Thus the graphical result will be taken from data source B, whose scale of origin is closer and whose object representation (polygon) is more adequate to that level of detail. Sometimes, the final result may be produced from several sources if they return complementary information, for example, when some attributes are present in one source but are absent in another.

Representing context information using an ontology brings various benefits. It provides concept subsumption, concept consistency and instance checking (including object properties checking). A context ontology also allows defining constraints and reasoning rules that may be used to derive other implicit context information. For example, in Table 1, we present some properties that may be used to infer spatial relationships. Thus, knowing that Brazil is part of South America, we can provide users with the extra information that Brazil is also part of America. Also, if a user poses a query that needs the operation “INSIDE” but there is no available data source which supports it, the system can search for one that executes “CONTAINS”, since each can be derived from the other.

Table 1. Some Property Rule Examples

Property: Part-of
  Rule: If A isPartOf B and B isPartOf C Then A isPartOf C;
  Instantiation: If “Brazil” isPartOf “SouthAmerica” and “SouthAmerica” isPartOf “America” Then “Brazil” isPartOf “America”;

Property: Contains-Inside
  Rule: If A contains B Then B isInside A;
  Instantiation: If “Brazil” contains “São Paulo” Then “São Paulo” isInside “Brazil”;
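The two rules of Table 1 can be sketched in a few lines of Python. This is a hand-written illustration under simple assumptions, not the reasoning engine used with the ontology: relationships are kept as sets of pairs, and the function names (`transitive_closure`, `derive_inside`, `plan_operator`) and the `INVERSE_OPERATOR` table are hypothetical.

```python
def transitive_closure(pairs):
    """Part-of rule: if A isPartOf B and B isPartOf C then A isPartOf C."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def derive_inside(contains_pairs):
    """Contains-Inside rule: if A contains B then B isInside A."""
    return {(b, a) for a, b in contains_pairs}


# Operators that can be derived from one another by swapping the arguments
INVERSE_OPERATOR = {"INSIDE": "CONTAINS", "CONTAINS": "INSIDE"}


def plan_operator(requested, available):
    """Pick an operator a source can run; the flag says whether to swap arguments."""
    if requested in available:
        return requested, False
    alternative = INVERSE_OPERATOR.get(requested)
    if alternative in available:
        return alternative, True
    return None


part_of = {("Brazil", "SouthAmerica"), ("SouthAmerica", "America")}
inferred_part_of = transitive_closure(part_of)

inside = derive_inside({("Brazil", "São Paulo")})

# A source that only offers CONTAINS can still serve an INSIDE request
plan = plan_operator("INSIDE", {"CONTAINS"})
```

In this sketch `inferred_part_of` contains the derived pair ("Brazil", "America"), and `plan` asks for CONTAINS with the two arguments swapped, which mirrors the fallback described in the text.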
These are brief examples of how the use of a context ontology can help to improve geospatial data integration, and, more specifically, query processing and query reformulation. In fact, all information from the geospatial integration world that is to be reasoned over may be dealt with as context information. Consequently, from explicit context information, gathered from the sources, from the mappings and from the query formulation, the system can infer and derive other implicit context information. Moreover, the system is able to adapt and react in accordance to relevant contextual factors.
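The assembly step discussed earlier in this section, where a formulating scale of about 1:300’000 leads the system to take the graphical result from source B, can be illustrated as follows. The sketch assumes that "closer" means the smallest absolute difference between scale denominators; the text does not fix a particular measure, and the `closest_scale_source` name and the dictionary layout are hypothetical.

```python
def closest_scale_source(formulating_scale, sources):
    """Return the source whose scale denominator is nearest to the query's scale.

    Closeness is measured as the absolute difference of denominators, which is
    an assumption of this sketch rather than a rule stated in the paper.
    """
    return min(sources, key=lambda s: abs(s["scale"] - formulating_scale))


# Scales and Lake representations taken from the motivating example
SOURCES = [
    {"name": "A", "scale": 1_000_000, "lake_representation": "point"},
    {"name": "B", "scale": 250_000, "lake_representation": "polygon"},
]

chosen = closest_scale_source(300_000, SOURCES)
print(chosen["name"], chosen["lake_representation"])
```

A fuller implementation would presumably combine this with user preferences and the intended level of detail, as the text suggests, instead of relying on scale alone.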
5 Conclusions and Further Work

Context has been widely studied in different application domains. To the best of our knowledge, this work is the first attempt to use context to improve geospatial data integration. To this end, we have first developed an ontology for context representation. Our ontology is a combination of three ontologies: an upper, a middle and a specific one. To illustrate the usage of our ontology, we have presented a few examples according to a given geospatial data integration problem. Moreover, we have pointed out the importance of context when trying to provide users with more meaningful answers. This is extremely relevant in geospatial data integration systems.

We expect that this ontology will be used by developers of geospatial data integration solutions to identify, model and represent context information in their applications. In fact, the ontology can also be used by intelligent GIS agents to manage and infer all kinds of information that can be reasoned over. We are currently developing additional scenarios which may allow us to work with other instances, constraints, queries and rules as well as with larger datasets. We intend to address the problem of query reformulation in a more complete sense, using the ontology to provide context reasoning over all the necessary steps.
References

1. Essid M., Boucelma O., Colonna F., Lassoued Y.: Query Processing in a Geographic Mediation System. Proceedings of the 12th annual ACM international workshop on Geographic Information Systems. ACM Press New York, 2004, pp. 101-108.
2. Wache H., Voegele T., Visser U., Stuckenschmidt H.: Ontology-based Integration of Information – A Survey of Existing Approaches. IJCAI-01 Workshop: Ontologies and Information Sharing, 2001, pp. 108-117.
3. Fonseca F., Davis C., Câmara G.: Bridging Ontologies and Conceptual Schemas in Geographic Applications Development. GeoInformatica: 7(4), 2003, pp. 355-378.
4. Brézillon P.: Context Dynamic and Explanation in Contextual Graphs. Proceedings of the 4th International and Interdisciplinary Conference, CONTEXT (2003), USA, pp. 94-106.
5. Dey A.: Understanding and Using Context. Personal and Ubiquitous Computing Journal, Volume 5, 2001, pp. 4-7.
6. Wang X., Zhang D., Gu T., Pung H.: Ontology Based Context Modeling and Reasoning using OWL. Second IEEE Annual Conference on Pervasive Computing and Communications Workshops, 2004, pp. 18-22.
7. Vieira V., Salgado A., Tedesco P.: Towards an Ontology for Context Representation in Groupware. Proceedings of the 11th International Workshop, CRIWG 2005, Brazil, pp. 367-375.
8. Goh C.: Representing and Reasoning about Semantic Conflicts in Heterogeneous Information Systems. Ph.D. Thesis, MIT Sloan School of Management, 1996.
9. Kashyap V., Sheth A.: Semantic and Schematic Similarities Between Database Objects: A Context-Based Approach. VLDB Journal 5, no. 4, 1996, pp. 276-304.
10. Power R.: Topic Maps for Context Management. In International Symposium on Information and Communication Technologies (ISICT 2003), pp. 199-204.
11. Brézillon P.: Context in Problem Solving: A Survey. The Knowledge Engineering Review, 14(1), 1999, pp. 1-34.
12. Levy A.: Combining Artificial Intelligence and Databases for Data Integration. Artificial Intelligence Today, 1999, pp. 249-268.
13. Wiederhold G.: Mediators in the Architecture of Future Information Systems. IEEE Computer, 1992, pp. 38-49.
14. Stefanidis K., Pitoura E., Vassiliadis P.: On Supporting Context-Aware Preferences in Relational Database Systems. In Proc. of the first International Workshop on Managing Context Information in Mobile and Pervasive Environments (MCMP’2005), in conjunction with MDM 2005, Cyprus.