
Paper presented at the 4th AGILE Conference on Geographic Information Science in Brno, April 19-21, 2001.

Using Ontologies for Resolution of Semantic Heterogeneity in GIS

Farshad Hakimpour¹, Sabine Timpf

Geographic Information Analysis Division Department of Geography, University of Zurich Winterthurerstr. 190, CH-8057, Switzerland {farshad,timpf}@geo.unizh.ch

Abstract. The problem of semantic conflicts in interoperable information systems and in database integration has recently become a subject of research. In this paper we clarify the problem and show how ontologies can help in detecting, and possibly resolving, semantic conflicts. We also show how a formalism such as Description Logic can help in this domain.

1. Introduction

An increasing number of geodata producers and users have expressed the need for interoperable GISs and for the integration of geodata. Interoperable systems have the potential to respond promptly to the need for data sets in geographic information analysis. Interoperability not only has to overcome the complexity of sharing and integrating data between systems with different data structures and models; it also has to deal with semantic heterogeneity. This has become more important because spatial data modeling has been the focus of many research projects and different spatial data models are on the market.

By semantics we refer to the meaning of the data, in contrast to syntax, which refers to the structure of the schema. As the meaning of words and the understanding of concepts may differ from one community to another, people may also interpret data differently. For example, the concept “main street” can be defined as a street wider than 30 m in one community, while in another community it refers to a street with more than 3000 cars passing through per day. The latter community may have a concept “wide street”, defined by a width greater than 30 m, which is comparable with the concept “main street” in the former community.

Different interpretations of data cause semantic heterogeneity. Relying on implicit interpretation of data is the main cause of semantic heterogeneity. People from different communities can talk about “main street” without realising that they are referring to different concepts. Many standards in the domain of GIS have been established to overcome this problem. Semantics of geodata should be explicitly represented in the metadata by means of a formalism. An explicit and formalized representation of semantics can be used during integration to overcome semantic heterogeneity in systems using spatial data from different sources.

Database schemas are the definitions of logical structures (or patterns) that convey the data and are the result of the database design phase. Schemas are expressed in a language known as a Data Definition Language (DDL). Part of the semantics is based on the interpretation of the DDL syntax - i.e., keywords, operators and their order. That is, when encountering such keywords or operators, a computer program takes a standard action or a human has a standard interpretation. Another part of the semantics is related to the names (or terms) one uses for identifiers in the DDL - we refer to it as terminological semantics. Items in schema definitions such as attributes, classes, methods, data types and relations are declared by specified names and possibly some descriptions as metadata. Such verbal descriptions used to be the way to specify the semantics of identifiers in schemas. The latter part (terminological semantics) is the focus of this paper, while the former is still a subject of research (e.g., heterogeneity of OODB schemas and RDB schemas) but will not be discussed here.

Ontologies, as “explicit and formalized specifications of conceptualizations” (Gruber [3]), play an important role in extracting and formalizing semantics. An ontology consists of logical axioms that convey the meaning of terms within a community. The logical axioms represent hierarchies of concepts and the relations among concepts. An ontology is specific to a community and should be agreed upon by the members of the community [1]. Ontologies have attracted attention in the integration of information systems and databases. By means of ontologies we represent an approximation of conceptualizations [4], which is a basis for the interpretation of data.

The structure of this paper is as follows. Section 2 gives an introductory definition of ontology applicable to integration problems. Section 3 discusses the role of ontologies in the integration task. Section 4 focuses on Description Logic as an approach to representing and reasoning with ontologies, including a simple example. Section 5 presents the conclusions and directions for future research.

1. The work of Farshad Hakimpour is funded by the Swiss National Science Foundation (Project Number: 2100-053995).

2. Ontologies

Explicit and formal definition of the semantics of terms has led researchers to apply formal ontologies as a potential solution to semantic heterogeneity (e.g., the works presented in [5]). An ontology consists of logical axioms that convey the meaning of terms for a particular community. Logical axioms are the means to introduce concepts and their relations, and also to express constraints on both concepts and relations. An ontology exists under a consensus by members of a community [1] - e.g., users of one information system or people in one discipline. The logical axioms mentioned above define an explicit specification of a conceptualization.

[Figure 1: possible worlds over the objects A, B and C, each mapped by the intensional relations to a world structure consisting of the extensional relations Rstreet and Rwider.]
FIGURE 1. Role of intensional relations in a conceptualization.

A conceptualization is defined by a domain (D), a set of states of the world (W), and a set of intensional (or conceptual) relations (ℜ). The set of intensional relations, introduced by Nicola Guarino, is the key element of his definition [4]. The following example illustrates the definition. Consider the states of the world illustrated in Figure 1, described by the following conceptualization:

    D = {A, B, C}            (domain of our world)
    ℜ = {ρstreet, ρwider}    (intensional or conceptual relations)

Intensional relations map every state of the world to the following extensional relations:

    R = {Rstreet, Rwider}    (extensional relations)

By ρwider, we define a mapping from each state of the world to one particular subset of all possible tuples in the relation Rwider. With such a definition of conceptualization, we are able to overcome the inadequacy of extensional relations alone for representing semantics. A set of extensional relations is a representation of one state of the world - called a world structure (Figure 1). One may consider the following axioms as a definition of the intensional relations mentioned above:

    wider(x,y): width(x) > width(y)
    street(x): transportation_network_element(x) and (runs_vehicle(x,y) => car(y)) and (∃y (sidewalk_of(x,y)) ...

Of course, all the new terms introduced in these definitions have to be defined as well. After defining the intensional relation, one can assign the terms “Straße”, “strada” or “rue” to this concept. In contrast to a thesaurus, which applies a limited number of known relations among terms, an ontology describes the relations among the concepts referred to by terms. An ontology thus plays its role at the conceptual level, by defining concepts (Figure 2).
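The conceptualization of Section 2 can be sketched in a few lines of Python (the paper's own examples use PowerLoom; Python is used here only to keep the sketch runnable). It is an illustration under stated assumptions, not part of the original formalization: the two worlds `w1` and `w2`, the widths, and the reduction of the `street` axiom to a stored flag are all invented for the example. Each intensional relation is a function from a state of the world to an extensional relation, and a world structure is the set of extensional relations obtained for one state.

```python
from typing import Dict, FrozenSet, Tuple

# Domain D of the example (object names as in Figure 1).
DOMAIN = ("A", "B", "C")

# Hypothetical encoding of a state of the world:
# object name -> (is it a street?, width in metres). All values are invented.
World = Dict[str, Tuple[bool, float]]

WORLDS: Dict[str, World] = {
    "w1": {"A": (True, 30.0), "B": (False, 4.0), "C": (True, 12.0)},
    "w2": {"A": (True, 10.0), "B": (True, 25.0), "C": (False, 3.0)},
}

Relation = FrozenSet[Tuple[str, ...]]


def rho_street(world: World) -> Relation:
    """Intensional relation 'street': maps a world to the extension R_street.

    The full axiom for street(x) is replaced by a stored flag (an assumption).
    """
    return frozenset((x,) for x in DOMAIN if world[x][0])


def rho_wider(world: World) -> Relation:
    """Intensional relation 'wider': wider(x, y) holds iff width(x) > width(y)."""
    return frozenset(
        (x, y) for x in DOMAIN for y in DOMAIN if world[x][1] > world[y][1]
    )


def world_structure(world: World) -> Dict[str, Relation]:
    """Extensional relations R = {R_street, R_wider} for one state of the world."""
    return {"R_street": rho_street(world), "R_wider": rho_wider(world)}


if __name__ == "__main__":
    for name, world in WORLDS.items():
        print(name, world_structure(world))
```

The same two functions yield different extensions in `w1` and `w2`, which is the point of the definition: the meaning of “wider” lies in the mapping, not in any single set of tuples.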

3. Ontologies and Semantic-based Integration

A community cannot be forced to adopt an ontology. The idea of using ontologies is to give the communities the freedom to define their ontologies based on their conceptualizations. On the other hand, it helps where two communities are willing to communicate based on common understanding. All members of a community should agree on the definitions in an ontology. That is, no member can alter or override the definitions in the ontology according to his or her preferences [9]; otherwise, homonyms arise in the community. Yet, one can add a new ontological definition to an ontology.

[Figure 2: the reality level (states of the world), the conceptual level (concepts) and the linguistic level (terms).]

Models of the world are based on a conceptualization. They can be considered projections of the world structures. The modeling process performs this projection and is done according to our interest (Figure 3).

[Figure 3: possible worlds (W), intensional relations, world structures, and the modeling step that projects world structures into models.]
FIGURE 3. Models are projections of world structures, built under some constraints according to the requirements.

The essence of the intensional relations is not known to us and we use axioms in our ontologies to express the relations among them. These axioms can help us to assure consistency of conceptualizations between communities. It is important to note the following two cases in integration and interoperability:

1. Models are based on the same ontology. In this case the problem is limited to schema integration, or to finding synonyms and homonyms.
2. Models are based on two different ontologies. This happens when two communities using different ontologies try to interchange or integrate data. In this case the problem is finding a shared ontology [6]. It can be done by finding the similarities between concepts defined in the two ontologies.

A shared ontology is produced by taking the shared part of the underlying ontologies and finding the similarities and relations between them. A shared ontology is a basis for sharing parts of the conceptualizations and, in turn, it helps to arrive at the same interpretation of information. Considering that forcing an ontology onto existing systems is not feasible, we need another solution: moving towards a shared ontology based on the ontologies of the different systems or communities.

An approach to applying ontologies for integration is illustrated in Figure 4. A reasoning system has to find the similarities between concepts in two ontologies, and the mediator maps the corresponding items in the two schemas. The query should commit to the domain ontology and should introduce its domain ontology to the system. The mediator uses a reasoning system to find the mapping between schemas. A drawback of this approach is the high processing cost, because for every query the mediator has to process the respective ontologies and derive the required mappings. On the other hand, it is a suitable approach where schemas are dynamic - such as DTDs in XML data - and where the number of data producers changes frequently.
Since this is an on-the-fly approach, human supervision of the search for similarities may not be possible; the approach is therefore error-prone as well.

[Figure 4: Community P with Ontology p, Schema p1 (DBp1) and Schema p2 (DBp2); Community Q with Ontology q and Schema q1 (DBq1); a Mediator and a Reasoning System between the two communities.]
FIGURE 4. An architecture for on-the-fly integration - with local queries committing to a domain ontology and no global schema or global query.

4. Description Logic

The DL formalism can help to express ontologies; consequently, DL reasoning systems can be used to define concepts and the relations between them, as well as to check the consistency of the definitions. Providing ontological definitions along with data helps in an automatic (or semiautomatic) integration of data and reduces the possibility of semantic conflicts. Both the reasoning ability of existing knowledge base systems and the expressiveness of the formalism are important factors for this approach.

4.1 A formalism to represent ontologies

Description Logic (DL) is one of the means of formalizing and applying ontologies¹. A proper formalism for the representation of ontologies should be able to express a set of axioms and to reason with them. The required features include concept, relation and instance definitions.

Concept definition is one means to define intensional relations. We need the is-a relation, which expresses hyponymy or specialization, to establish a hierarchical taxonomy of concepts.

Relation definition is another means to define intensional relations. Relations are not merely defined by typed attributes carrying referential keys. Relations in ontology definitions must be defined independently of concepts. For instance, in the definition of a brotherhood relation we can state that the relation is established only with persons whose gender is male and that those persons should have the same parents. (This makes it possible to build a hierarchical taxonomy of relations, like the taxonomy of concepts, and to deduce new relations between concepts or instances of concepts that are not explicitly stated.)

Instances (also called individuals) represent members of the domain by a collection of facts. For example, an instance of the concept person can be defined by its relation to an instance of social security, a name and/or its brother. Instances can be classified under concept definitions (by constraints on their relations). In the following, we show how Description Logic can represent ontologies by such features.

4.2 Terminology and Assertion Definitions

Conventional DL systems consist of two modules: the Terminological Box (TBOX) and the Assertion Box (ABOX). The TBOX is the part where concepts and relations are defined and reasoning on them is performed; the ABOX is where the instances are defined and reasoning on instances takes place.

DL definitions (of concepts or relations) are of two types: primitive and defined (or non-primitive). By a primitive definition, one expresses necessary constraints to be satisfied by the instances in its extension. Non-primitive definitions are described by necessary and sufficient conditions. They can be used when one can give a thorough, clear definition of a concept. A DL reasoning system is able to recognize instances under such concepts implicitly. On the other hand, a DL system cannot recognize an instance under a primitive concept unless it is declared explicitly, because the definition is partial.

Concepts are defined by their superconcepts and by restrictions on their relations with other concepts. An example of a primitive concept definition is as follows:

    (defconcept Road_Element (?r Spatial_Element)
      :=> (and (exists (?j) (and (Junction ?j) (= (starts_at ?r) ?j)))
               (= (dimension ?r) LINEAR)))

1. The PowerLoom [7] language is used in the examples in this paper.

It defines a primitive concept “Road_Element”, which is a subconcept of “Spatial_Element”. It also states that every “Road_Element” has at least one “Junction” with which it has a “starts_at” relation, and that its “dimension” is LINEAR. The non-primitive concept “Narrow_Road” is defined as follows:

    (defconcept Narrow_Road (?r Road)
      :<=> (< (width ?r) 20))

The concept “Narrow_Road” is a subconcept of “Road” and its “width” is filled by a value less than 20. Therefore, every instance of “Road” whose “width” relation is filled by a value less than 20 will be classified as “Narrow_Road”, and vice versa. Here are examples of primitive relation definitions:

    (defrelation dimension ((?se Spatial_Element) (?sd Spatial_Dimension)))
    (defrelation bounds ((?x Spatial_Element) (?y Spatial_Element))
      :=> (and (dimension ?x POINT) (dimension ?y LINEAR)))
    (defrelation starts_at ((?x Spatial_Element) (?y Spatial_Element))
      :=> (bounds ?y ?x))
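The difference between primitive and defined concepts described above can be sketched in Python. This is a minimal illustration, not how PowerLoom or any DL reasoner is implemented: the `Instance` class, the `DEFINED` table and the `recognize` function are hypothetical names, and the sketch assumes that the instance is explicitly asserted to be a “Road”, the superconcept named in the “Narrow_Road” definition. Membership in a primitive concept is taken only from explicit assertions, while membership in a defined concept is inferred whenever the necessary and sufficient condition holds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass
class Instance:
    """An ABOX individual: asserted concepts plus known relation fillers."""
    name: str
    asserted: Set[str] = field(default_factory=set)
    fillers: Dict[str, float] = field(default_factory=dict)


def width_below_20(inst: Instance) -> bool:
    """Condition of Narrow_Road; an unknown width never satisfies it."""
    return "width" in inst.fillers and inst.fillers["width"] < 20


# Defined (non-primitive) concepts: name -> (superconcept, condition).
# Primitive concepts such as Road have no entry: they are never inferred.
DEFINED: Dict[str, Tuple[str, Callable[[Instance], bool]]] = {
    "Narrow_Road": ("Road", width_below_20),
}


def recognize(inst: Instance) -> Set[str]:
    """Return asserted concepts plus all defined concepts the instance satisfies."""
    found = set(inst.asserted)
    changed = True
    while changed:  # repeat so that chains of defined concepts are also found
        changed = False
        for concept, (parent, condition) in DEFINED.items():
            if concept not in found and parent in found and condition(inst):
                found.add(concept)
                changed = True
    return found


if __name__ == "__main__":
    r1 = Instance("R1", {"Road"}, {"width": 15})
    r2 = Instance("R2", set(), {"width": 15})  # never asserted to be a Road
    print(r1.name, sorted(recognize(r1)))
    print(r2.name, sorted(recognize(r2)))
```

In this sketch `R2` is not recognized under any concept although its width is 15, because “Road” is primitive and was never asserted for it.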

DL reasoning systems focus on inferring subsumption relations between concepts in the TBOX (i.e., they can determine where a concept is located in a specialization hierarchy) as well as on recognizing instances defined in the ABOX under concept definitions. These capabilities make DL suitable for reasoning with ontological definitions for the interchange of data. As an example, consider the following assertion:

    (assert (Road_Element R1))

From it the system concludes that R1 is related to the Spatial_Dimension LINEAR and has a Junction filling the starts_at relation. By adding:

    (assert (= (width R1) 15))

R1 will be classified as an instance of “Narrow_Road”.

4.3 Coherence Evaluation

An important issue in defining ontologies is the coherence of definitions. Consider the following example:

    (assert (= (starts_at R2) R1))

This assertion means that individual R2 “starts_at” individual R1. It follows that R1 “bounds” R2 and, consequently, that R1 has its “dimension” filled by POINT. However, we also said that R1 is a “Road_Element”, so its “dimension” is filled by LINEAR. This is incoherent according to our understanding of the above definitions. Nevertheless, the latter assertion can cause a reasoning system to conclude that the instances POINT and LINEAR are equal.

Consistency checking is indispensable for checking the validity of ontological definitions (in the TBOX). Besides, there should be tools to evaluate whether the assertions comply with the ontological definitions. Theoretically, DL systems have the ability to find incoherence, but in practice it depends on the system implementation. As an example, Loom [2], with a very expressive language, offers functions to detect incoherence in concept and instance definitions, but it does not detect many examples of incoherence. Neoclassic [8], in contrast, can detect incoherence in both concept and instance definitions. However, the Neoclassic implementation does not support a language as expressive as that of Loom. For instance, a relation definition in Neoclassic is tied to concepts, and one cannot define an independent relation with its own constraint. Relation (or role) definition in Neoclassic offers only type checking and does not offer the possibility of defining a specialization hierarchy of relations or any other type of constraint. PowerLoom [7], with an expressive language, is able to detect incoherence; however, it only reacts with a warning message and, unlike its ancestor Loom, does not provide any special function for this purpose.

4.4 An Example

Parts of the conceptualizations of two communities are illustrated in Figure 5. The respective ontological definitions in DL are shown in the appendix. Community one classifies streets based on traffic (the number of cars passing through a street per day) and on width; community two classifies them based on width only. Given that the concept “Street” and the relation “width” are equal in the two communities, the reasoning system can merge the ontologies. With the merged ontological definitions, a reasoning system can derive the mapping between the two communities, as shown in Table 1.

[Figure 5: in community one, Street has the subconcepts Main Street, Secondary Street, Wide Street, Narrow Street and Very Narrow Street; in community two, Street has the subconcepts Main Street and Secondary Street.]
FIGURE 5. Different classifications of the concept “Street” in different communities.

TABLE 1. Mapping by a DL reasoning system.

    Community one                       Community two                    Mapping based on
    Wide Street                         Main Street                      width
    Narrow Street, Very Narrow Street   Secondary Street                 width
    Main Street, Secondary Street       Main Street, Secondary Street    car_per_day (towards community one), width (towards community two)

The reasoning system can map instances of main street from community two to wide street from community one. This mapping is deduced from the definitions of the concepts and does not depend on the state of an instance. In contrast, the mapping from secondary street in community two to narrow or very narrow street in community one depends on the width of the street. If the value of width is not known, the mapping is imprecise, because community one uses a finer granularity in its classification. The reverse mapping from community one to community two does not depend on the value of width. If the value of width is not known, the mapping causes information loss (which is of no importance to community two).

The terms main street and secondary street are examples of homonyms here. Mapping instances of main or secondary street from community two to main or secondary street in community one depends on the traffic criterion (the value of car_per_day). Since the relation car_per_day is not known to community two, such a mapping cannot take place - the reasoning system can deduce this, though. Unlike car_per_day, width is a relation known in both communities. The inverse mapping can take place depending on the value of width in each instance, if that value is known.
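The mappings of Table 1 can be reproduced with a small Python sketch that replaces DL subsumption by interval containment. This is a simplification introduced for illustration only, not the reasoning procedure of PowerLoom: every concept of the appendix is reduced to one relation constrained to a half-open interval, and the names `Concept`, `subsumes`, `map_concept` and `map_instance` are hypothetical. A definition-level mapping exists when a target concept subsumes the source concept; otherwise the mapping depends on the value stored in the instance.

```python
import math
from typing import Dict, List, NamedTuple, Optional


class Concept(NamedTuple):
    relation: str  # the function constrained by the definition
    low: float     # inclusive lower bound
    high: float    # exclusive upper bound


# Definitions taken from the appendix, rewritten as intervals.
COMM1: Dict[str, Concept] = {
    "Main_Street": Concept("car_per_day", 3000, math.inf),
    "Secondary_Street": Concept("car_per_day", -math.inf, 3000),
    "Wide_Street": Concept("width", 30, math.inf),
    "Narrow_Street": Concept("width", 15, 30),
    "Very_Narrow_Street": Concept("width", -math.inf, 15),
}
COMM2: Dict[str, Concept] = {
    "Main_Street": Concept("width", 30, math.inf),
    "Secondary_Street": Concept("width", -math.inf, 30),
}


def subsumes(general: Concept, specific: Concept) -> bool:
    """True if every instance of `specific` is necessarily in `general`."""
    return (general.relation == specific.relation
            and general.low <= specific.low
            and specific.high <= general.high)


def map_concept(source: Concept, targets: Dict[str, Concept]) -> List[str]:
    """Targets that hold for all instances of `source`, whatever their values."""
    return [name for name, t in targets.items() if subsumes(t, source)]


def map_instance(source: Concept, targets: Dict[str, Concept],
                 value: Optional[float]) -> List[str]:
    """Candidate targets for one instance; `value` fills source.relation."""
    overlapping = [
        (name, t) for name, t in targets.items()
        if t.relation == source.relation
        and max(t.low, source.low) < min(t.high, source.high)
    ]
    if value is None:  # unknown value: every overlapping target stays possible
        return [name for name, _ in overlapping]
    return [name for name, t in overlapping if t.low <= value < t.high]


if __name__ == "__main__":
    print(map_concept(COMM2["Main_Street"], COMM1))
    print(map_instance(COMM2["Secondary_Street"], COMM1, None))
    print(map_instance(COMM2["Secondary_Street"], COMM1, 12))
    print(map_concept(COMM1["Main_Street"], COMM2))
```

The empty result for community one's `Main_Street` reflects the homonym case: its definition rests on `car_per_day`, which no concept of community two constrains.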

5. Conclusion and Further Research

We showed in this paper that ontologies can help applications to become independent of the implicit background knowledge of a community, or at least reduce such dependency. A reasoning system can help in mediating between systems from different communities. With this approach, one has to state explicitly what his or her intension is when referring to a concept. The main concern of such intensional definitions is to find the discrepancies in the extensions of the concepts. This reduces the chance of misinterpretation and semantic conflicts in communication among the communities and their respective systems. Another advantage of this approach is the extensibility of interoperating systems once the ontologies have been built.

Open questions remain before ontology-based approaches can be declared a success:

• During the experiment to build ontologies from existing standards, many problems due to implicit assumptions can be detected. Because changes to the standards are very expensive, and considering the poor taxonomies underlying the standards, using ontologies can help to interchange or integrate data based on different standards.
• The commitment of a schema definition to the ontology should be investigated. The relations between schema definitions and ontology definitions should be formalized.
• Low-cost approaches to extracting ontologies from different sources such as UML diagrams, RDF, XML DTDs, etc. are needed to make ontologies applicable in practice. While considering the problem of representing semantics, we should also consider the problem of extracting semantics from the context.
• A better architecture for applications with static schemas can also reduce the drawbacks of the architecture introduced here.
• Representation formalisms and the respective reasoning systems impose their weaknesses on the efficiency of using ontologies.

References

[1] Y. A. Bishr, H. Pundt, W. Kuhn, and M. Radwan, 1999. Probing the concept of information communities - a first step toward semantic interoperability. In M. Goodchild, Max Egenhofer, R. Fegeas, and C. Kottman, editors, Interoperating Geographic Information Systems, pages 55–69. Kluwer Academic.
[2] David Brill, 1993. Loom Reference Manual. University of Southern California, http://www.isi.edu/isd/LOOM/documentation/reference2.0-twoside.pdf.
[3] T. R. Gruber, 1993. Toward principles for the design of ontologies used for knowledge sharing. In N. Guarino and R. Poli, editors, Formal Ontology in Conceptual Analysis and Knowledge Representation, International Workshop on Ontology. Kluwer Academic.
[4] Nicola Guarino, 1998. Formal ontology and information systems. In Nicola Guarino, editor, Formal Ontology in Information Systems, Proceedings of FOIS’98, pages 3–17, Trento, Italy, June 1998. IOS Press, Amsterdam.
[5] Nicola Guarino, editor, 1998. Formal Ontology in Information Systems. IOS Press, Amsterdam.
[6] Dean Jones, 1998. Developing shared ontologies in multi-agent systems. In ECAI’98 Workshop on Intelligent Information Integration, Brighton, U.K., August.
[7] Robert M. MacGregor, Hans Chalupsky, and Eric R. Melz, 1997. PowerLoom Manual. University of Southern California, http://www.isi.edu/isd/LOOM/PowerLoom/documentation/manual.pdf.
[8] Peter F. Patel-Schneider, Merryll Abrahams, Lori Alperin Resnick, Deborah L. McGuinness, and Alex Borgida, 1996. NeoClassic Reference Manual. AT&T Labs Research, http://www.research.att.com/sw/tools/classic/papers/NeoRef.html.
[9] Pepijn R. S. Visser and Zhan Cui, 1998. Heterogeneous ontology structures for distributed architectures. In ECAI-98 Workshop on Applications of Ontologies and Problem-solving Methods, pages 112–119.

Appendix

Community 1

    (defconcept Street (Thing))
    (deffunction width ((?s Street)) :-> (?i INTEGER))
    (deffunction car_per_day ((?s Street)) :-> (?i INTEGER))
    (defconcept Main_Street ((?ms Street))
      :<=> (and (Street ?ms) (>= (car_per_day ?ms) 3000)))
    (defconcept Secondary_Street ((?ss Street))
      :<=> (and (Street ?ss) (< (car_per_day ?ss) 3000)))
    (defconcept Wide_Street ((?ws Street))
      :<=> (and (Street ?ws) (>= (width ?ws) 30)))
    (defconcept Narrow_Street ((?ns Street))
      :<=> (and (Street ?ns) (< (width ?ns) 30) (>= (width ?ns) 15)))
    (defconcept Very_Narrow_Street ((?vns Street))
      :<=> (and (Street ?vns) (< (width ?vns) 15)))

Community 2

    (defconcept Street (Thing))
    (deffunction width ((?s Street)) :-> (?i INTEGER))
    (defconcept Main_Street ((?ms Street))
      :<=> (and (Street ?ms) (>= (width ?ms) 30)))
    (defconcept Secondary_Street ((?ss Street))
      :<=> (and (Street ?ss) (< (width ?ss) 30)))

The result of merging the ontological definitions:

    (defconcept Street_shared (Thing))
    (deffunction width_shared ((?s Street_shared)) :-> (?i INTEGER))
    (deffunction car_per_day_comm1 ((?s Street_shared)) :-> (?i INTEGER))
    (defconcept Main_Street_comm1 ((?ms Street_shared))
      :<=> (and (Street_shared ?ms) (>= (car_per_day_comm1 ?ms) 3000)))
    (defconcept Secondary_Street_comm1 ((?ss Street_shared))
      :<=> (and (Street_shared ?ss) (< (car_per_day_comm1 ?ss) 3000)))
    (defconcept Wide_Street_comm1 ((?ws Street_shared))
      :<=> (and (Street_shared ?ws) (>= (width_shared ?ws) 30)))
    (defconcept Narrow_Street_comm1 ((?ns Street_shared))
      :<=> (and (Street_shared ?ns) (< (width_shared ?ns) 30) (>= (width_shared ?ns) 15)))
    (defconcept Very_Narrow_Street_comm1 ((?vns Street_shared))
      :<=> (and (Street_shared ?vns) (< (width_shared ?vns) 15)))
    (defconcept Main_Street_comm2 ((?ms Street_shared))
      :<=> (and (Street_shared ?ms) (>= (width_shared ?ms) 30)))
    (defconcept Secondary_Street_comm2 ((?ss Street_shared))
      :<=> (and (Street_shared ?ss) (< (width_shared ?ss) 30)))