Geoinformatics
Ontology-based Discovery and Composition of Geographic Information Services
Inaugural dissertation for the attainment of the academic degree of Doctor of Natural Sciences, submitted to the Department of Geosciences of the Westfälische Wilhelms-Universität Münster
submitted by Michael Lutz from Dublin, Ireland, in October 2005
Dean:
Prof. Dr. Hans Kerp
First referee:
Prof. Dr. Werner Kuhn
Second referee:
Priv.-Doz. Dr. Ubbo Visser
Date of the oral examination:
............................................ (to be entered by hand after the examination)
Date of the conferral of the doctorate:
............................................ (to be entered by hand after the examination)
Preface

This thesis consists of eleven papers that address the question of how to enhance the discovery of geographic data and geographic information services in spatial data infrastructures by means of ontologies. The collected papers have been published or accepted for publication in peer-reviewed journals or conference proceedings.

Chapter I provides a synopsis of the thesis, which puts the papers into perspective and links the topics addressed in the individual papers to each other. Wherever possible, the synopsis gives references to specific sections of the selected papers, which are reprinted in the subsequent chapters. For example, the reference "section IV.3.2" points to section 3.2 in chapter IV. When referring to other sections within the same chapter, the chapter number is not given (e.g. "section 3.2").

The selected papers are included in this thesis by permission of the publishers. They were written as part of the meanInGs project [1] and the MUSIL research group [2], and some were co-authored by colleagues from these groups. Their text appears unchanged [3], but in cases where subsequent improvements have been made, these are pointed out in the synopsis.

The first two papers describe the general setting and the problems addressed in the thesis:

• Chapter II: Bernard, L., Einspanier, U., Lutz, M. & Portele, C. (2003): Interoperability in GI Service Chains – The Way Forward, in: Gould, M., Laurini, R. & Coulondre, S. (Eds.): 6th AGILE Conference on Geographic Information Science: 179–187.
• Chapter III: Lutz, M., Riedemann, C. & Probst, F. (2003): A Classification Framework for Approaches to Achieving Semantic Interoperability Between GI Web Services, in: Kuhn, W., M. F. Worboys & S. Timpf (Eds.): Conference on Spatial Information Theory: Foundations of Geographic Information Science (COSIT 2003). LNCS 2825: 200–217.

In chapters IV to VIII, methods for dealing with these problems are developed.

• Chapter IV: Klien, E., Lutz, M. & Kuhn, W. (2005): Ontology-based Discovery of Geographic Information Services – An Application in Disaster Management. Computers, Environment and Urban Systems (CEUS) (in press).
• Chapter V: Klien, E., Einspanier, U., Lutz, M. & Hübner, S. (2004): An Architecture for Ontology-based Discovery and Retrieval of Geographic Information, in: Toppen, F. & Prastacos, P. (Eds.): 7th Conference on Geographic Information Science (AGILE 2004): 179–188.
• Chapter VI: Lutz, M. & Klien, E. (2006): Ontology-based Retrieval of Geographic Information, International Journal of Geographical Information Science (in press).
• Chapter VII: Lutz, M. (2004): Non-taxonomic Relations in Semantic Service Discovery and Composition, in: Maurer, F. & Ruhe, G. (Eds.): Proceedings of the First "Ontology in Action" Workshop, in conjunction with the Sixteenth International Conference on Software Engineering & Knowledge Engineering (SEKE'2004): 482–485.
• Chapter VIII: Lutz, M. (2005): Ontology-based Descriptions for Semantic Discovery and Composition of Geoprocessing Services. Geoinformatica (accepted for publication).

Architecture and implementation issues are described in chapter IX (and partly also in chapters V to VII).

• Chapter IX: Lutz, M. (2005): Ontology-based Service Discovery in Spatial Data Infrastructures, in: Jones, C. & Purves, R. (Eds.): GIR'05. Proceedings of the ACM Workshop on Geographic Information Retrieval, Bremen, Germany: 45–54.

The last three papers (chapters X to XII) focus on issues that are related, but not central, to the main topics of the thesis.

• Chapter X: Schade, S., Sahlmann, A., Lutz, M., Probst, F. & Kuhn, W. (2004): Comparing Approaches for Semantic Service Description and Matchmaking, in: Meersman, R. & Tari, Z. (Eds.): On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE, OTM Confederated International Conferences, Part II: 1062–1079.
• Chapter XI: Probst, F. & Lutz, M. (2004): Giving Meaning to GI Web Service Descriptions, in: Bevinakoppa, S. & Hu, J. (Eds.): Web Services: Modeling, Architecture and Infrastructure – Proceedings of the Second International Workshop on Web Services: Modeling, Architecture and Infrastructure, WSMAI 2004, in conjunction with ICEIS 2004: 23–35.
• Chapter XII: Klien, E. & Lutz, M. (2005): The Role of Spatial Relations for Automating the Semantic Annotation of Geodata, in: Cohn, A.G. & Mark, D.M. (Eds.): Spatial Information Theory. International Conference (COSIT 2005): 133–148.

[1] see http://www.meanings.de/
[2] see http://musil.uni-muenster.de/
[3] In the case of the two papers that have not yet been published (chapters VI and VIII), we reprint the text of the revised version of the manuscript.
Acknowledgements

This thesis is the product of about three years of hard, often exciting, but also sometimes gruelling work. Many people have helped me in one way or another to survive this time. I am deeply grateful to all of them and there are some whom I want to thank in particular.

First of all, I would like to thank Werner Kuhn for his advice and comments during the past three years. Werner, our discussions have greatly helped me shape the thesis and keep my focus. Also, I am grateful for being employed in the meanInGs project (which was funded by the German Ministry of Education and Research (BMBF) through the Geotechnologien programme), which made this dissertation possible in the first place. I also want to thank Ubbo Visser for the good cooperation in meanInGs and for being my second referee.

Special thanks also go to a number of people at ifgi, in particular Eva Klien, Udo Einspanier, Nicole Ostländer and Florian Probst. Eva, thanks for being a great colleague, co-author, office mate and friend. I greatly appreciate all the big and small pieces of advice and that you have always been honest with me. Udo, I am glad that you supported the meanInGs team from the beginning with your technical expertise and dry humour. Thanks for being a tough reviewer of my publications and the dissertation. Florian and Nicole, you were my favourite "outside experts" and always helped me not to get lost in the depths of my own writing. I also really enjoyed our "Mini Diss" meetings at the beginning of our PhD work and the time we spent together outside the office and away from our dissertations. I particularly enjoyed the retreats in Civezza (also with Eva) – I hope we can keep this "habit" up.

I am also grateful to my co-authors and colleagues from the meanInGs project, especially Ingrid Christ and Sören Haubrock from Delphi IMM in Potsdam and Sebastian Hübner and Jörn Witte from TZI in Bremen. The project provided me with a lot of inspiration and opportunity for discussion. I would like to thank Sebastian in particular for introducing me to the world of description logics at the beginning of the project.

I also want to thank my co-authors and colleagues from MUSIL (the Münster Semantic Interoperability Lab) for their comments, questions and inspiration. Trying to figure out who in the emerging group would be working on which topic (remember the multi-colour diagrams?) greatly helped me to sharpen my research focus.

Special thanks also go to the members of Prof. Ulrich Streit's "Diss-AG". The discussions in this group always helped me to take a different perspective and at the same time kept me grounded.

The time I spent at the Digital Enterprise Research Institute (DERI) in Innsbruck in September/October 2004 was very valuable for the dissertation – and very enjoyable for me. I especially would like to thank Holger Lausen for giving me the opportunity to visit DERI and all DERI members for the interesting insights into their research on Semantic Web Services.

Last but not least, I want to thank my family and friends, and in particular my girlfriend Rebecca, for their support. Rebecca, thanks for putting up with me and my moods for the last three years, for the "thesis-free" time we spent together, but also for always being curious about my work and for making me explain it in plain and understandable words.
Contents

I Synopsis
  1 Introduction
    1.1 Problem Statement
    1.2 Hypothesis and Research Questions
    1.3 Contributions
    1.4 Overview
  2 GI Service Discovery and Composition in Spatial Data Infrastructures
  3 A Running Example
  4 Matchmaking Scenarios for GI Service Discovery
    4.1 The Classification Framework
    4.2 Matchmaking Scenarios
  5 Methods for Ontology-based GI Service Discovery
    5.1 Ontologies
    5.2 Shared Vocabularies
    5.3 Ontology Languages and Reasoning
    5.4 Registration Mappings
    5.5 Function Subtyping
  6 Ontology-based Discovery and Retrieval of Geographic Data
    6.1 Ontology-based Descriptions of Feature Types
    6.2 Matchmaking between Query and Application Concepts
    6.3 Ontology-based Retrieval of Geographic Data
  7 Ontology-based Discovery of Geoprocessing Services
    7.1 Ontology-based Descriptions of Geoprocessing Operations
    7.2 Matchmaking between Semantic Advertisements and Queries
  8 Implementing Ontology-based Service Discovery in SDIs
    8.1 Representations for Ontology-based Descriptions
    8.2 Components for Ontology-based Discovery
    8.3 Information Flow for Ontology-based Discovery
  9 Related Work
  10 Conclusions and Future Work
II Interoperability in GI Service Chains – The Way Forward
III A Classification Framework for Approaches to Achieving Semantic Interoperability between GI Web Services
IV Ontology-based Discovery of Geographic Information Services – An Application in Disaster Management
V An Architecture for Ontology-based GI Discovery and Retrieval of Geographic Information
VI Ontology-based Retrieval of Geographic Information
VII Non-taxonomic Relations in Semantic Service Discovery and Composition
VIII Ontology-based Descriptions for Semantic Discovery and Composition of Geoprocessing Services
IX Ontology-based Service Discovery in Spatial Data Infrastructures
X Comparing Approaches for Semantic Service Description and Matchmaking
XI Giving Meaning to GI Web Service Descriptions
XII The Role of Spatial Relations for Automating the Semantic Annotation of Geodata
List of Figures

1 Service discovery during service composition. The properties of services that have already been discovered impose constraints on the queries for further services.
2 Possible combinations of data access and geoprocessing services for answering Susan's question.
3 Which is the closest pub? Pub 1 if using straight line distance (dotted lines), pub 2 if using network distance.
4 Manual matchmaking with standardized metadata, e.g. ISO 19115/19119 (level D).
5 Automatic matchmaking with formal metadata (level E).
6 An example registration mapping (bottom), which maps the Restaurant feature type (top) to the restaurant2 concept of the associated application ontology. The application ontology is described in more detail in section 6.1.
7 Simplified graphic representation of the business(es), restaurant and measurements domain ontologies used for the running example. XML schema datatypes and geometry types derived from ISO 19107 are shown in grey.
8 Taxonomic relationships between the application concepts defined by John (dark grey), the query concepts defined by Susan (light grey) and the domain concept (white) in the running example.
9 Ontology-based operation descriptions and their relationships to DL and FOL ontologies at the domain and application level.
10 The two-step matchmaking procedure: In the first step, all advertisements are filtered using DL subsumption reasoning; in the second step, the remaining services' pre- and postconditions are compared to those specified in the query.
11 Taxonomic relationships between the application concepts used by John (dark grey) and Susan (light grey) and the domain concept gm point (white).
12 The relationships between conventional metadata, catalogue requests, feature type schemas and application ontologies and semantic advertisements.
13 The relationships between conventional metadata/catalogue requests, semantic advertisements/queries and application ontologies.
14 Registering a data access service with the Semantic Catalogue.
15 Discovering a data access service using the Semantic Catalogue.
16 Retrieving data from the WFS.
17 Registering a service with the Semantic Catalogue.
18 Discovering a service using the Semantic Catalogue.
List of Tables

1 TBox language features used in this thesis.
2 Concept constructors used in this thesis.
3 Defining the ranges and domains of roles.
Abbreviations and Acronyms

CS-W      Web Catalog Service
DL        Description Logics
EPSG      European Petroleum Survey Group
GI        Geographic Information
GIS       Geographic Information System
GSW IE    Geospatial Semantic Web Interoperability Experiment
ISO       International Organization for Standardization
GPS       Global Positioning System
FOL       First-Order Logic
KRSS      Knowledge Representation System Specification
LARKS     Language for Advertisement and Request for Knowledge Sharing
OGC       Open Geospatial Consortium
OWL       Web Ontology Language
OWL-S     OWL-based Web Service Ontology
RACER     Renamed ABox and Concept Expression Reasoner
SDI       Spatial Data Infrastructure
UDDI      Universal Description, Discovery and Integration
UML       Unified Modeling Language
W3C       World Wide Web Consortium
WCS       Web Coverage Service
WFS       Web Feature Service
WGS84     World Geodetic System 1984
WMS       Web Map Service
WPS       Web Processing Service
WSDL      Web Services Description Language
WSMO      Web Service Modeling Ontology
XML       eXtensible Markup Language
Chapter I
Synopsis

This chapter provides an overview of the thesis. It puts the selected papers (chapters II to XII) into perspective and links the topics addressed in the individual papers to each other. Wherever possible, references to specific sections of the selected papers are given.

Abstract. Spatial data infrastructures will greatly benefit from the ability to compose geographic information (GI) services to solve complex problems. Discovering suitable services for data access and geoprocessing is a major challenge in this endeavour. Current (keyword-based) approaches to service discovery are inherently restricted by the ambiguities of natural language, which can lead to low precision and/or recall. To alleviate these problems, we propose two ontology-based approaches for enhanced discovery of GI services. The approach for ontology-based discovery of data access services is based on semantic matchmaking between Description Logic (DL) concepts representing geographic feature types and the requester's query. DL subsumption reasoning is used to find matches between queries and service descriptions. The approach for ontology-based discovery of geoprocessing services rests on two ideas. Ontologies describing geospatial operations are used to create descriptions of user requirements and service capabilities. Matches between these descriptions are identified based on function subtyping. In both approaches, service descriptions are based on a shared vocabulary that contains the basic terms of a domain and for which a shared understanding between the actors in the domain is assumed. We use a running example from the geospatial domain to analyse which problems can occur in existing keyword- and ontology-based approaches and how the discovery of GI services differs from other service discovery tasks. The example is also used to illustrate the prototypical implementation of the proposed approach.
1 Introduction
The efficient use of distributed geographic information is a key factor in planning and decision-making in a variety of domains. To facilitate access to geographic information, spatial data infrastructures (SDIs) (Groot and McLaughlin, 2000; Masser, 2005) are currently being set up within regions, countries and across national borders (e.g. Bernard, 2002). Their main components are geographic information (GI) services providing access to geospatial data and geoprocessing capabilities. SDIs support users in discovering and accessing these services through catalogue services and syntactic interoperability standards.

In recent years, the number of GI services available on the Web has been increasing rapidly and continually. While these services are at present generally isolated applications, their composability is often perceived as their greatest value, as it enables more complex processing tasks (Einspanier et al., 2003). First attempts at composing GI services have been made. However, these rely on manually and statically coupling existing services (Bernard et al., 2003), whereas service composition in future applications is envisioned to be automated (Martin et al., 2004).

Discovering services that are appropriate for solving a specific problem from among a large number of available services is a central task within the larger task of service composition. Service discovery is essentially about finding a match between descriptions of service capabilities (i.e. of what the service provides) and user requirements (i.e. what is needed to solve a given problem) (Trastour et al., 2001). This thesis proposes approaches for overcoming semantic problems that can occur during service discovery as part of service composition.
1.1 Problem Statement
In SDIs, the available datasets and GI services are typically registered in catalogue services (OGC, 2004). Users can formulate queries using keywords and/or spatial filters to find appropriate data and services for a specific task. The metadata fields that can be included in the query depend on the metadata schema used and on the query functionality of the service used for accessing the metadata.

Even though natural language processing techniques can increase the semantic relevance of search results with respect to the search request (e.g. Richardson and Smeaton, 1995), keyword-based techniques are inherently restricted by the ambiguities of natural language. As a result, keyword-based search can have low recall if different terminology is used, and/or low precision if terms are homonymous or because of limited possibilities to express complex queries (Bernstein and Klein, 2002).

These problems are particularly critical for two types of GI services: services for accessing data, which are usually the starting point for any complex GI service, and geoprocessing services, which perform some kind of computation or analysis on the provided data. The functionality of data access services is relatively limited. Also, service interfaces for this service type are standardised and already in widespread use, e.g. the Web Feature Service (WFS) interface (OGC, 2002) for providing access to vector data, which is the focus of this thesis. However, the semantics of the data provided by such a service can only be described using its database schema and keywords in the metadata. Geoprocessing services form a very diverse category, which is only just emerging. Here, it is the functionality of the services that is of interest and whose semantics is under-specified in current service descriptions.
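The effect of terminology mismatches and homonyms on keyword search can be illustrated with a small sketch. The catalogue entries, service identifiers and keywords below are invented for illustration and do not stem from any real catalogue; precision and recall are computed in the usual way (hits divided by retrieved records, and hits divided by relevant records, respectively).

```python
# Toy catalogue: record id -> free-text keywords (all entries are invented).
CATALOGUE = {
    "wfs-1": {"pub", "restaurant"},
    "wfs-2": {"tavern", "inn"},             # pubs described with other terms
    "wfs-3": {"bank", "river", "shore"},    # "bank" as a landform
    "wfs-4": {"bank", "finance", "branch"}  # "bank" as a business
}


def keyword_search(catalogue, term):
    """Return the ids of all records whose keyword set contains `term`."""
    return {rid for rid, keywords in catalogue.items() if term in keywords}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Different terminology: the requester wants pubs, but wfs-2 says "tavern".
pub_result = precision_recall(keyword_search(CATALOGUE, "pub"), {"wfs-1", "wfs-2"})

# Homonymy: the requester wants financial banks, but wfs-3 also matches.
bank_result = precision_recall(keyword_search(CATALOGUE, "bank"), {"wfs-4"})

print(pub_result)   # (1.0, 0.5)
print(bank_result)  # (0.5, 1.0)
```

In the first query, half of the relevant records are missed because their provider chose different words; in the second, half of the returned records are irrelevant because the search term has two meanings.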
1.2 Hypothesis and Research Questions
To alleviate and overcome the semantic heterogeneity problems presented in the previous section, we propose to base the descriptions of GI services on ontologies. An ontology is an explicit formal specification of a shared conceptualization (Gruber, 1995; Studer et al., 1998), where a conceptualization can be defined as a way of thinking about some domain (Uschold, 1998). By using ontologies to enrich the descriptions of GI services, the semantics of data content or service functionality become machine-interpretable, and users are enabled to pose concise and expressive queries. Furthermore, logical reasoning can be used for matchmaking between user queries and service advertisements in catalogues.

It is the overall hypothesis of this thesis that a methodology providing (1) ontology-based descriptions and (2) matchmaking mechanisms based on these descriptions leads to improved recall and precision during the discovery of services for accessing and processing geographic data. This hypothesis leads to the following research questions:

• Which problems during GI service discovery in conventional catalogues are caused by semantic heterogeneities?
• How does service composition affect GI service discovery?
• Which aspects of geographic data sources and geoprocessing services should be described in the proposed methodology?
• Which methods should be used for matchmaking between these descriptions in the proposed methodology?
• What kinds of ontologies (and ontology languages) are required for these descriptions and matchmaking methods?
• How can the proposed methodology be encapsulated into software components and how can these be integrated into existing SDI architectures and workflows?
1.3 Contributions
The main contributions of this thesis are two approaches for the enhanced discovery of (1) data access and (2) geoprocessing services, which are presented in the following sections. Ontology-based Discovery of Data Access Services. The approach for ontology-based discovery of data access services is based on semantic matchmaking between Description Logic (DL) concepts representing geographic feature types (i.e. classes of geographic objects with common characteristics) on the one hand and the user’s query on the other hand. Feature types are described by specific application concepts that are built using roles and concepts from a shared vocabulary. This shared vocabulary contains basic terms (the primitives) of a domain which are combined in the application ontologies in order to describe more complex semantics. It is assumed that all actors within a domain share a common understanding of the concepts contained in the shared vocabulary. In the proposed methodology, the requester is supported by a query language and GIS-like graphical user interface, which allow her to intuitively formulate a query using terms from the shared vocabulary. From this query, a DL query concept is automatically generated and DL subsumption reasoning is used to determine whether existing application concepts (describing feature types) are a match for this query concept. When an appropriate feature type is discovered, the user’s query can also be used to automatically generate a request to retrieve the data from its WFS. Ontology-based Discovery of Geoprocessing Services. The approach for ontology-based discovery of geoprocessing services uses (1) descriptions of requirements and service capabilities based on ontologies describing the operations used in GI services and (2) a mechanism based on function subtyping for matching between them. 
In the presented approach, providers and requesters describe the operations they provide or search for by semantic advertisements and semantic queries, respectively. Both of these operation descriptions consist of a so-called semantic signature, which contains DL concepts (instead of datatypes) to represent inputs and outputs, and a specification of pre- and postconditions in First-Order Logic (FOL). Again, it is crucial that semantic advertisements and queries are based on a shared domain vocabulary to ensure greater recall during service discovery. This vocabulary also includes domain-level operation descriptions that can serve as templates for creating semantic advertisements and queries.

Service discovery is based on a two-step matchmaking process between these descriptions. First, the semantic signature is used for efficiently filtering the potentially large number of services. This filtering is done using DL subsumption reasoning on the concepts representing inputs and outputs and should result in a relatively small number of candidate services. In the second step, the pre- and postconditions of the remaining services are compared to those specified in the query using a FOL theorem prover. In this matchmaking step, several degrees of match (all of which are based on the idea of function subtyping) can be tested. This allows a fine-grained distinction and ranking between the services identified in the first step.
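The two-step procedure can be sketched in a few lines. This is a simplified illustration under several assumptions, not the implementation described in the thesis: the taxonomy, the concept names and the operations are invented; DL subsumption is reduced to walking a child-to-parent table; the FOL theorem prover is replaced by set inclusion over atomic conditions (a conjunction implies another if it contains all of its atoms); and only one degree of match is tested, using the common function-subtyping rule that an advertised operation may accept more general inputs, return a more specific output, require no more than the query's preconditions and guarantee at least the query's postconditions.

```python
from dataclasses import dataclass

# Toy taxonomy (child -> parent); all concept names are invented.
PARENT = {
    "Point": "Geometry",
    "NetworkDistance": "Distance",
}


def subsumes(general, specific):
    """True if `specific` equals `general` or is one of its descendants."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


@dataclass(frozen=True)
class Operation:
    name: str
    inputs: tuple                  # concepts of the input parameters
    output: str                    # concept of the result
    pre: frozenset = frozenset()   # conjunction of atomic preconditions
    post: frozenset = frozenset()  # conjunction of atomic postconditions


def signature_matches(advert, query):
    """Step 1: advert accepts the queried inputs and returns a fitting output."""
    return (
        len(advert.inputs) == len(query.inputs)
        and all(subsumes(a, q) for a, q in zip(advert.inputs, query.inputs))
        and subsumes(query.output, advert.output)
    )


def conditions_match(advert, query):
    """Step 2: weaker-or-equal precondition, stronger-or-equal postcondition."""
    return advert.pre <= query.pre and query.post <= advert.post


def discover(adverts, query):
    """Filter by signature first, then compare pre- and postconditions."""
    candidates = [a for a in adverts if signature_matches(a, query)]
    return [a.name for a in candidates if conditions_match(a, query)]


ADVERTS = [
    Operation("straight_line_distance", ("Geometry", "Geometry"), "Distance",
              post=frozenset({"in_metres"})),
    Operation("network_distance", ("Point", "Point"), "NetworkDistance",
              pre=frozenset({"same_crs", "on_road_network"}),
              post=frozenset({"in_metres"})),
    Operation("buffer", ("Geometry",), "Geometry"),
]

QUERY = Operation("query", ("Point", "Point"), "Distance",
                  pre=frozenset({"same_crs"}), post=frozenset({"in_metres"}))

print(discover(ADVERTS, QUERY))  # ['straight_line_distance']
```

In this toy run, the buffer operation is already removed by the signature filter, while the network distance operation passes the filter but is rejected in the second step because it requires a precondition that the query does not guarantee.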
1.4 Overview
The remainder of this synopsis is structured as follows. In section 2 we introduce the notion of spatial data infrastructures, describe different types of GI services and present existing approaches to GI service discovery and composition in current SDIs. We then introduce a running example that we use throughout the synopsis for illustration (section 3) and a framework for classifying different discovery scenarios (section 4). Section 5 introduces the building blocks that provide the basis for the proposed methodologies for ontology-based discovery. It explains the notions of ontologies and shared domain vocabularies and presents the logical formalisms used in this thesis (FOL and DL). It also introduces registration mappings, which describe the mapping between the ontology and the schema describing a certain feature type, and function subtypes, which are the basis for the matchmaking algorithms used in the proposed methodologies. Based on these building blocks, methodologies for semantic discovery of geographic data and geoprocessing services are developed in sections 6 and 7. These sections also give step-by-step instructions for the running example and describe the development of the ontologies used. Section 8 presents a prototypical implementation of both methodologies and illustrates how they can be integrated into existing SDI architectures. The synopsis concludes with a discussion of related work in section 9 and with conclusions and an outlook on future work in section 10.
2 GI Service Discovery and Composition in Spatial Data Infrastructures
SDIs (Groot and McLaughlin, 2000; Masser, 2005) provide the setting for the issues addressed in this thesis. In this section, we therefore introduce the notion of SDIs and illustrate how they support the discovery of GI services and their composition into service chains. Chapter II presents a state-of-the-art example of service chaining and discusses the interoperability issues that have to be overcome in order to achieve the vision of ad-hoc service composition.

SDIs. A main motivation for setting up SDIs is to overcome problems that prevent an efficient reuse or sharing of geographic data when using conventional GIS technology (McKee, 2000; Nebert, 2001), in particular if data are distributed among heterogeneous user groups. Through catalogue services and interoperability standards for data formats, metadata and service interfaces, SDIs support users in discovering, accessing and composing distributed GI services. In this thesis, we focus on the technological aspects of SDIs. Following the definition in Groot and McLaughlin (2000), the notion of an SDI also includes the institutional, organisational and economic resources that support the development and maintenance of the infrastructure and the information it contains. These topics are outside the scope of this thesis.
GI Services. SDIs are based on the assumption that users are usually not interested in data, but in a piece of information that can be generated using that data. Therefore, SDIs are based on GI services as main components [4]. GI services provide functions that have traditionally been offered by monolithic GIS over a distributed computing platform, typically the Web. These functions include the capture, modelling, storage, retrieval, sharing, manipulation, analysis and presentation of geospatial (geographically referenced) data (Worboys and Duckham, 2004). In this thesis, we are particularly concerned with two kinds of GI services:

1. Services that provide geographic data through standardised interfaces in standardised formats. In the service classification proposed in ISO (2005), these services are listed (together with many other types) under (geographic) model/information management services. To emphasise that their main task is to provide access to data, we refer to them in this thesis as (geographic) data access services.
2. Services that analyse (and manipulate) geospatial data. Examples include the following operations: buffering a geometry to create a new geometry, measuring the distance between two geometries or intersecting two (sets of) geometries. According to ISO (2005), these services can be classified as (geographic) processing services, and we will subsequently refer to them as such (or as geoprocessing services for short).

Interoperability of GI Services. At present, GI services are generally isolated components. Nevertheless, several services may be composed to create a new service that answers a particular question, even though they were not specifically designed for that purpose. It is this composability (and the more complex processing tasks it enables) that is often perceived as the greatest value of the web service paradigm (Einspanier et al., 2003).

GI services can only be sensibly composed if they are interoperable. Interoperability is defined as "the capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units" (ISO, 2005). In service-centred SDIs, enabling interoperability between services is therefore a key requirement. One approach for achieving interoperability is the standardisation of service interfaces, which allows the classification of services into well-known service types that exhibit a specific behaviour. Thus, standardisation makes it possible for a requester to connect to a service and invoke its operations. Also, it is possible to compose arbitrary service instances as long as they are of a well-known service type.

In the geospatial domain, there are two main standardisation efforts to enable interoperability in distributed systems: the ISO Technical Committee (TC) 211, which develops the 19100 series of standards, and the Open Geospatial Consortium (OGC) [5]. For data access services, service interfaces have already been standardised and are in widespread use. In this thesis the focus is on Web Feature Services (OGC, 2002), which provide vector data, rather than Web Coverage Services (WCS) (OGC, 2003), which provide coverage data. Recently, a service interface has also been specified for geoprocessing services (Web Processing Service, WPS) (OGC, 2005).

Standardised interfaces enable syntactic interoperability, i.e. they ensure "the data can be transferred between systems" (ISO, 2005), and are thus an important step towards full interoperability. However, they fail to enable semantic interoperability, i.e. to ensure that "the content is understood in the same way in both systems, including by those humans interacting with the systems in a given context" (ISO, 2005).
Particularly, the semantics of the data provided by data access services and the semantics of the operations offered by geoprocessing services often remain unclear. However, both are important aspects to consider during the discovery and composition of GI services.

4 Thus, SDIs would better be termed spatial information or service infrastructures. However, we will use the term spatial data infrastructure in this thesis as it seems to be well established.
5 formerly called OpenGIS Consortium

GI Service Discovery and Composition. In ISO terminology (ISO, 2005), a sequence of services where, for each adjacent pair of services, the execution of the first service is necessary for the execution of the second service, is called a service chain. Three types of service chains are distinguished: user-defined (transparent), workflow-managed (translucent) and aggregate (opaque) service chains. However, apart from user-defined chaining, the description of these types focuses on the execution of the chain rather than its composition, which is the focus of this thesis. A service chain consists of several services, each of which contributes one part to the overall functionality. During service composition, a number of services have to be discovered which together provide the required functionality. As GI services are usually arbitrarily distributed in SDIs, a central component is required that provides metadata in order to help users discover services and assess whether they might be useful for their tasks. Such central SDI components are called catalogues. While, traditionally, catalogues mainly stored metadata about data sources, they now often also include metadata about GI services. For both kinds of metadata, standardised schemas have been developed, most notably those defined in ISO 19115 (ISO, 2003b) for geographic data and ISO 19119 (ISO, 2005) for GI services. Strictly speaking, both data access and geoprocessing services should be registered as GI services in catalogues. However, in contrast to geoprocessing services, the operations of data access services are standardised and well known. For these services, the provided data content is of particular interest for a requester. Therefore, we assume in our work that data access services are registered in catalogues as data sources.
The data access service itself can be described as the online access method of that data source, e.g. in the optional Online resource item of the core ISO 19115 metadata schema (ISO, 2003b). Providers can advertise data sources or GI services in catalogues by registering their metadata (push), or the catalogue may collect metadata from known service and data providers (pull). A requester can then use the catalogue’s “librarian functions” (discovery, browsing, querying) to find previously unknown GI services or data sources that fit her needs. While service discovery is an important part of service composition, service composition also imposes some constraints on service discovery (figure 1). The component services of the service chain that have already been discovered have to be taken into account during the subsequent discovery steps:
• The outputs of preceding and the inputs of succeeding services have to be considered in order to ensure that the data exchanged between services in a chain are interpreted correctly.
• The operation (or functionality) provided by each of the services already discovered6 has to be considered in order to derive which part of the required overall functionality is still missing.
The query resulting from these constraints can then be compared to service descriptions registered in some catalogue service.
3 A Running Example
We will use a running example throughout this synopsis to illustrate the approaches for ontology-based discovery of data access and geoprocessing services. Susan is visiting an unfamiliar city. She wants to use a mobile device (e.g. a mobile phone or PDA) to find the nearest place to eat that is still open. Her mobile device is equipped with some location sensor (e.g. GPS), so that Susan knows her current location. She now wants to find services that provide information about “places to eat” (including their location and opening hours) and a service for computing the distance from her current location. Both services could be combined into a service chain using a workflow service. We assume that Susan is connected to an SDI and can use its catalogue service to perform her search.

6 For simplicity, we assume in this paper that each service only provides one operation.

Figure 1: Service discovery during service composition. The properties of services that have already been discovered impose constraints on the queries for further services.

The SDI offers three WFS (whose provider is called John) that might be appropriate for Susan’s request:
WFS 1 provides information on Pub features, including their location, opening hours and the meals they serve. The location is given as a point with a geographic coordinate reference system: WGS84 (EPSG:4326).
WFS 2 provides information on Restaurants, again including their location, opening hours and the meals they serve. The location is given as a point with a projected coordinate reference system: Gauß-Krüger, zone 2 (EPSG:31466).
WFS 3 provides information on Pub features, including their location (again given in Gauß-Krüger, zone 2) and the meals and drinks they serve.
Obviously, both WFS 1 and 2 contain the information (places to eat and their location and opening hours) required for answering Susan’s question. WFS 3, however, while providing “places to eat”, fails to provide information on the opening hours and is therefore unsuitable.

The SDI also offers three services (which are also provided by John) for calculating the distances between point geometries7:
Distance 1 returns the great circle or geodesic distance between two points on a sphere expressed in geographic coordinates8, which equals the length of the great circle section on the spherical surface that is defined by the two points.
Distance 2 returns the 2D Euclidean distance between two points in a plane expressed in Cartesian coordinates, which equals the length of a straight line between them.
Distance 3 returns the distance between two points in a specific road network. The network’s geometry is represented in a specific coordinate reference system, e.g. in WGS84 (Distance 3a) or Gauß-Krüger, zone 2 (Distance 3b). One can imagine such a network to be located on the associated sphere or plane. Formally, the network can be represented as a weighted graph, whose nodes represent the network’s intersections and whose edge weights are equal to the length of the curve connecting two intersections (measured in the space on which the network is located). The distance between two points in the network can then be computed based on the shortest path between the corresponding nodes in the graph, e.g. using Dijkstra’s algorithm (Cormen et al., 1990).

7 Strictly speaking, the WFSs discovered in the first step provide features instead of just point geometries. This means that the distance services either would have to be able to compute the distance between a feature and a point or another service would have to be included in the chain that extracts the point geometry from point features. For simplicity, we do not consider this in our example.
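The three kinds of distance described above can be illustrated with a minimal Python sketch. It is not the implementation of John's services: the haversine formula is one common way to compute a great circle distance, the Earth radius constant and the small road graph (loosely modelled on figure 3) are assumptions introduced here, and all function names are hypothetical.

```python
import heapq
import math

# Assumption: mean Earth radius in metres (a sphere instead of an ellipsoid).
EARTH_RADIUS_M = 6_371_000.0


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Distance 1: great circle distance between two points given in degrees.

    Uses the haversine formula on a sphere of the given radius.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def euclidean_distance(x1, y1, x2, y2):
    """Distance 2: length of the straight line between two points in a plane."""
    return math.hypot(x2 - x1, y2 - y1)


def network_distance(graph, source, target):
    """Distance 3: shortest-path length in a weighted graph (Dijkstra's algorithm).

    graph maps each node to a list of (neighbour, edge_length) pairs.
    Returns math.inf if the target cannot be reached.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, length in graph.get(node, []):
            candidate = dist + length
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return math.inf


# Hypothetical road network: pub 1 can only be reached via a bridge.
ROADS = {
    "here": [("pub2", 400.0), ("bridge", 900.0)],
    "pub2": [("here", 400.0)],
    "bridge": [("here", 900.0), ("pub1", 800.0)],
    "pub1": [("bridge", 800.0)],
}
```

With this graph, `network_distance(ROADS, "here", "pub1")` is longer than `network_distance(ROADS, "here", "pub2")`, even if pub 1 were closer in a straight line, which is the situation the running example is concerned with.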
Figure 2: Possible combinations of data access and geoprocessing services for answering Susan’s question.

Which of John’s services is appropriate for answering Susan’s question depends on (i) the WFS she has chosen in the first step and (ii) the kind of distance she wants to compute (figure 2). We consider the following discovery scenarios:
Scenario 1: Susan has chosen WFS 1 (whose location is given in WGS84 coordinates). (a) If she is interested in the straight line distance (on the sphere defined for WGS84) she can use Distance 1. (b) If she is interested in the distance within some road network on this sphere she can use Distance 3a.

8 For simplicity, we assume in this thesis that geographic coordinate reference systems are based on spheres rather than the actual ellipsoids.
Scenario 2: Susan has chosen WFS 2 (whose location is given in Gauß-Krüger, zone 2 coordinates). (a) If she is interested in the straight line distance (on the plane defined for Gauß-Krüger, zone 2) she can use Distance 2. (b) If she is interested in the distance within some road network on this plane she can use Distance 3b.

Distinguishing between the different kinds of distance calculations can be crucial, as is illustrated in figure 3. If Susan only considers the straight line distance, pub 1 would seem closer than pub 2. However, it is across the river and not directly reachable by the road network. When the network is taken into account, pub 2 is closer than pub 1.
Figure 3: Which is the closest pub? Pub 1 if using straight line distance (dotted lines), pub 2 if using network distance.
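The four discovery scenarios (1a, 1b, 2a, 2b) amount to selecting a distance service by two criteria: the coordinate reference system of the chosen WFS and the kind of distance wanted. The following sketch encodes this as a plain lookup. It is only an illustration of the constraints, not the ontology-based matchmaking developed in this thesis; the class, the field names and the string labels are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional

# Space on which each CRS is defined (spheres assumed for geographic CRSs).
CRS_SPACE = {"EPSG:4326": "sphere", "EPSG:31466": "plane"}


@dataclass(frozen=True)
class DistanceService:
    name: str
    kind: str                  # "straight_line" or "network"
    space: str                 # "sphere" or "plane"
    crs: Optional[str] = None  # CRS of the network geometry, if the service has one


SERVICES = [
    DistanceService("Distance 1", "straight_line", "sphere"),
    DistanceService("Distance 2", "straight_line", "plane"),
    DistanceService("Distance 3a", "network", "sphere", "EPSG:4326"),
    DistanceService("Distance 3b", "network", "plane", "EPSG:31466"),
]


def find_distance_services(wfs_crs, kind, services=SERVICES):
    """Return the names of services usable with data in wfs_crs for this kind of distance."""
    space = CRS_SPACE[wfs_crs]
    return [
        s.name
        for s in services
        if s.kind == kind and s.space == space and s.crs in (None, wfs_crs)
    ]
```

For example, `find_distance_services("EPSG:4326", "network")` corresponds to Scenario 1b and `find_distance_services("EPSG:31466", "straight_line")` to Scenario 2a.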
4 Matchmaking Scenarios for GI Service Discovery
Matchmaking is defined as mediating among requesters and providers of services for some mutually beneficial cooperation (Sycara et al., 2002). A crucial part of finding appropriate GI services for a certain task is assessing whether (or how well) their advertisements fit the requirements of the requester. This matchmaking step is a fundamental part of GI service discovery in SDIs. In chapter III, a framework for classifying different approaches to matchmaking in SDIs is presented. In the following, we briefly present the different dimensions of the framework and illustrate two scenarios that are particularly relevant to the work presented in this thesis.
4.1 The Classification Framework
We distinguish different roles in the framework. The requester and provider roles are (ultimately) always adopted by a human user. In contrast, the matchmaker role can be played by either a human user (one of the above or an independent broker) or a matchmaking service. Accordingly, two kinds of matchmaking can be distinguished, which represent two endpoints of a continuum. While manual matchmaking is done by a human actor and occurs in the mind of the matchmaker, automatic matchmaking is always done by a service. It therefore requires formal descriptions of requirements and service capabilities, which are matched automatically using a specific matchmaking algorithm (e.g. Sycara et al., 2002). The quality of the descriptions of requirements and service capabilities available to the matchmaker is crucial for the matchmaking task. It can vary along three dimensions: the explicitness of the descriptions, their structuring and the formality of their content. As these dimensions are not independent of each other, five levels of explicitness, structuring and formality are presented in section III.3.2. Of these, the last two are of particular interest to this thesis as they correspond to the approaches currently used in SDIs (level D) and the approach proposed in this thesis (level E):
• Level D: Explicit, structured, informal semantics. Advertisements in the catalogue are in a structured form, e.g. referring to metadata standards such as ISO 19115 (ISO, 2003b) that – usually informally – specify metadata fields and their semantics. However, with the exception of value lists being specified for some fields, the content of the metadata fields is to be given in the form of free natural language text.
• Level E: Explicit, formal semantics. Advertisements in the catalogue refer to explicit and formal specifications in ontologies.
4.2 Matchmaking Scenarios
The two categories presented above are illustrated in the following scenarios for the running example. More examples and further details on the framework are given in chapter III.

Scenario 1 – Manual Matchmaking with Standardized Metadata. In this scenario (figure 4) John’s conceptualisations are made explicit and are recorded in metadata documents whose structure is well known and which are made available through a catalogue. Susan can search the catalogue using keywords for all of the fields provided by its query interface. She can then use the returned metadata documents to assess whether or not the advertisements fit her requirements. She might need to access other documents that the metadata documents refer to, e.g. a feature catalogue providing definitions for feature classes or ISO standards defining units of measurement.
Figure 4: Manual matchmaking with standardized metadata, e.g. ISO 19115/19119 (level D).
Even though natural language processing techniques can increase the semantic relevance of search results with respect to Susan’s request (e.g. Richardson and Smeaton, 1995), keyword-based techniques are inherently restricted by the ambiguities of natural language. If Susan and John use different terminology in their query and advertisement, keyword-based search can have low recall, i.e. not all relevant advertisements are discovered. If they use homonymous terms or because the possibilities to express complex queries in keyword-based search are limited, precision can also be low, i.e. some of the discovered advertisements are not relevant (Bernstein and Klein, 2002). For example, if Susan, who is interested in places that serve meals (and their location and opening hours), uses “restaurant” as a keyword she may fail to find the existing data sources that are offering this information (low recall), because their metadata descriptions use different terminology, e.g. “take away” or “catering”. Furthermore, she might also discover data sources that are annotated with this keyword but not appropriate for answering her question (low precision), e.g. a data source with restaurant features that does not include information on the restaurants’ locations or opening hours. Ambiguity can be considerably reduced by providing a controlled vocabulary (e.g. a list of keywords), and by referring to other standardised or widely known and agreed-upon documents (e.g. feature catalogues). However, these are usually also based on natural language and therefore prone to the same ambiguities as described above.
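The effect of terminology mismatches on recall and precision can be made concrete with a small sketch. The advertisement texts below are invented for illustration (they paraphrase the running example, and the "Reviews" entry is a hypothetical data source with restaurant features but no location or opening hours), and the substring match is a deliberately naive stand-in for keyword search.

```python
def keyword_search(advertisements, keyword):
    """Return the names of all advertisements whose text contains the keyword."""
    return {name for name, text in advertisements.items() if keyword.lower() in text.lower()}


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical free-text advertisements registered in a catalogue.
ADVERTISEMENTS = {
    "WFS 1": "pub features with location, opening hours and meals",
    "WFS 2": "restaurant features with location, opening hours and meals",
    "WFS 3": "pub features with location, meals and drinks",
    "Reviews": "restaurant features with customer ratings",
}

# Data sources that actually answer Susan's question in the running example.
RELEVANT = {"WFS 1", "WFS 2"}

retrieved = keyword_search(ADVERTISEMENTS, "restaurant")
precision, recall = precision_recall(retrieved, RELEVANT)
```

Here the keyword "restaurant" misses WFS 1 (different terminology, hence reduced recall) and returns the unsuitable "Reviews" source (same keyword, missing properties, hence reduced precision).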
Figure 5: Automatic matchmaking with formal metadata (level E).
Scenario 2 – Automatic Matchmaking with Formal Metadata. In the second scenario (figure 5) the conceptualisations of Susan and John are not only explicit but also formalised. They use ontology concepts to formulate their queries or advertisements, respectively. Both the queries and advertisements are based on the same domain ontologies (section 5.2) to make them comparable. A service automatically compares Susan’s query with the advertisements stored in its registry using a matchmaking algorithm. It is assumed that by using formal descriptions of semantics and automatic matchmaking algorithms problems such as those described in the previous scenarios can be avoided (Guarino, 1997; Paolucci et al., 2002). How this scenario is realised for the discovery of data access and geoprocessing services is illustrated in the following sections.
5 Methods for Ontology-based GI Service Discovery
In this section, we introduce the theoretical background and methodological building blocks for the approaches to GI service discovery proposed in this thesis. We present the notion of ontologies (section 5.1), how they can be used as shared vocabularies (section 5.2) and different languages for creating ontologies (section 5.3). Registration mappings, which are required in the proposed approach for the retrieval of geographic data, are introduced in section 5.4, and function types and subtypes, which are used in the proposed approach for the discovery of geoprocessing services, are presented in section 5.5.
5.1 Ontologies
To enhance the discovery of data access and geoprocessing services in SDIs, we propose to use descriptions of feature types and geoprocessing operations that are based on ontologies. An ontology is an explicit formal specification of a shared conceptualization (Gruber, 1995; Studer et al., 1998), where a conceptualization can be defined as a way of thinking about some domain (Uschold, 1998). Ontologies consist of formal axioms that describe concepts, individuals and the relationships between them. By using ontologies to enrich the descriptions of GI services, the semantics of data content or service functionality become machine-interpretable, and users are enabled to pose concise and expressive queries. Furthermore, logical reasoning can be used to discover implicit relationships between search terms and service descriptions as well as to flexibly construct taxonomies for classifying advertisements in catalogues. The role of ontologies for enhancing descriptions of feature types is further discussed in sections IV.4.1, V.4 and VI.3.1. Their role for enhancing descriptions of geoprocessing operations is detailed in sections VIII.3.1 and IX.2.2.1.
5.2 Shared Vocabularies
The ontologies used for making the semantics of feature types and geoprocessing operations explicit can be organised in different ways. In our work, we have adopted a classification of different ontology architectures for data integration introduced in Wache et al. (2001). In multiple ontology approaches, each service or query is described by its own local (or application) ontology. In principle, each of these local ontologies can be a combination of several other ontologies. However, it cannot be assumed that several local ontologies share the same vocabulary. This lack of a common vocabulary makes it difficult to compare different local ontologies. In contrast, single ontology approaches use one global ontology, which provides a shared vocabulary for specifying the semantics of all services and queries. Such approaches can be applied to problems where all service descriptions available in a catalogue have been created with a very similar view on a domain, which also has to be shared by all requesters. Hybrid approaches also use a global shared vocabulary, which contains basic terms (the primitives) of a domain. These can be combined to describe the (more complex) semantics of each service or query in separate application ontologies. In contrast to multiple ontology approaches, the concepts in these application ontologies remain comparable, because they are based on the primitives from the shared vocabulary. In both the single ontology and the hybrid approach, it is assumed that the semantics of the primitives is understood (and this understanding is shared) by all requesters and providers in the domain. Therefore, the primitives require no further formal definitions. Nevertheless, it can sometimes be useful to represent a shared vocabulary as an ontology in order to impose a structure on it. Such an ontology is called a domain ontology. In the heterogeneous and distributed environment prevailing in SDIs it has to be assumed that many diverse descriptions of feature types and geoprocessing operations are produced for the registration and discovery of GI services in catalogues. To keep these descriptions comparable it is crucial that they be based on a shared vocabulary.
5.3 Ontology Languages and Reasoning
The ontologies used in the discovery approaches proposed in this thesis are created using two different logical formalisms. First-Order Logic (FOL) is used during the discovery of geoprocessing services for describing and matching pre- and postconditions of operations. Description Logics (DL) is used during the discovery of data access services for describing feature types and during the discovery of geoprocessing services for describing the inputs and outputs of operations. FOL is described in more detail in section VIII.3, DL is discussed in sections IV.4.4 and VI.3.1. First-Order Logic FOL9 (Russell and Norvig, 2003) is a branch of logic that is based on individuals and the relations (predicates) between them. It permits the formulation of quantified statements about some or all the individuals in the universe of discourse. Predicates in FOL take only individuals as arguments and quantifiers only bind individual variables. The primitive symbols of FOL are (1) parentheses, (2) variables, constants, functions, and predicate symbols, (3) logical connectors, ¬ (not), ∧ (and), ∨ (or), ⇒ (implication), ⇔ (equivalence), and (4) quantifiers, ∀ (for all) and ∃ (there exists) (Gallaire et al., 1984). The goal of logic inference in FOL is to check whether a given knowledge base KB (a collection of sentences) entails a sentence a (KB |= a), i.e. whether a follows logically from KB. This is also called a proof obligation. Entailment in FOL is semidecidable, i.e. every entailed sentence can be found, but for non-entailed sentences, it is not always possible to decide whether they are entailed or not. Despite these theoretical limits, automated theorem provers can solve many hard problems in FOL. Inference procedures often employed include resolution and term rewriting (Russell and Norvig, 2003). Description Logics DL (Baader and Nutt, 2003) is a family of knowledge representation languages that are subsets of FOL10 . 
They also provide the basis for the Web Ontology Language (OWL), the proposed standard language for the Semantic Web (Antoniou and Van Harmelen, 2003). The basic syntactic building blocks of a DL are atomic concepts (unary predicates), atomic roles (binary predicates), and individuals (constants). The expressive power of DL languages is restricted to a small set of constructors for building complex concepts and roles. Implicit knowledge about concepts and individuals can be inferred automatically with the help of inference procedures (Baader and Nutt, 2003). A DL knowledge base consists of a TBox containing intensional knowledge (declarations that describe general properties of concepts) and an ABox containing extensional knowledge that is specific to the individuals of the domain. In our work, we only use the TBox language features listed in table 1. As, in the subsequent chapters, we use two different syntaxes, the generic DL syntax and the KRSS (Knowledge Representation System Specification) syntax (Patel-Schneider and Swartout, 1993), this section introduces the expressions in both syntaxes.

9 FOL is also known as First-Order Predicate Logic or First-Order Predicate Calculus.
10 For a mapping from DL to FOL, see e.g. Sattler et al. (2003).
Language Feature      DL Syntax    KRSS Syntax
concept definition    C ≡ D        (define-concept C D)
concept inclusion     C ⊑ D        (implies C D)
role definition       S ⊑ P        (define-primitive-role S :parent P)

Table 1: TBox language features used in this thesis.
The different dialects of the DL family mainly differ in the language features for defining concepts and roles. In our work, we only use the features listed in table 2 (concepts) and table 3 (roles). These features are subsets of two popular DL languages that we have used in our papers and implementations: (1) SHIQ and (2) the DL variant of OWL (which is based on SHOIN(D)). The main difference between SHIQ and OWL relevant for the work in this thesis is that OWL natively supports datatypes (e.g., integer and string) and values but is restricted to unqualified number restrictions (Horrocks et al., 2003).

Constructor                            DL Syntax       KRSS Syntax
intersection                           D → E ⊓ F       D → (and E F)
union                                      E ⊔ F           (or E F)
value restriction                          ∀R.C            (all R C)
limited existential quantification         ∃R.⊤            (some R *top*)
full existential quantification            ∃R.C            (some R C)
number restrictions                        = n R           (exactly n R)
                                           ≤ n R           (at-most n R)
                                           ≥ n R           (at-least n R)

Table 2: Concept constructors used in this thesis.
Role Property    DL Notation      KRSS Syntax
range            ⊤ ⊑ (∀S.R)       (define-primitive-role S :range R)
domain           (∃S.⊤) ⊑ D       (define-primitive-role S :domain D)

Table 3: Defining the ranges and domains of roles.

One major advantage of (simple) DLs (like the one employed in this thesis) over FOL is that their inference procedures are decidable (Sattler et al., 2003). Of the available inference procedures, the possibility to compute subsumption relationships between concepts is of special importance for our work. Popular DL reasoners include e.g. RACER (Renamed ABox and Concept Expression Reasoner) (Haarslev and Möller, 2003) and Pellet11. For a more detailed introduction to DL languages including different subsumption algorithms see Baader and Nutt (2003).
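To give an intuition for what computing subsumption relationships means, the following sketch derives implied subsumptions from a set of told concept inclusions (axioms of the form C ⊑ D). It is a toy illustration only: it handles atomic concept names and computes a reflexive-transitive closure, whereas DL reasoners such as RACER or Pellet also reason over the concept constructors and role axioms. The concept names and the function names are hypothetical.

```python
def subsumers(concept, told_inclusions):
    """Return all concepts that subsume `concept`, including itself.

    told_inclusions maps a concept name to the names of its directly
    stated super-concepts (one entry per axiom C ⊑ D).
    """
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in told_inclusions.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed_by(concept, candidate, told_inclusions):
    """True if `concept` ⊑ `candidate` follows from the told inclusions."""
    return candidate in subsumers(concept, told_inclusions)


# Hypothetical told inclusions for the running example.
TOLD = {
    "pub": ["place_to_eat"],
    "restaurant": ["place_to_eat"],
    "place_to_eat": ["building"],
}
```

A query for "place_to_eat" would then match advertisements annotated with "pub" or "restaurant", because `is_subsumed_by("pub", "place_to_eat", TOLD)` holds even though the query and the advertisement use different concept names.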
5.4 Registration Mappings
For ontology-based retrieval of geographic data (section 6.3), specific information on a feature type’s structure is required. To describe the relationships between feature type structure and application ontology, we have adopted the notion of registration mappings suggested in Bowers and Ludäscher (2004). They can be used to derive data transformations or, in our case, to specify a query filter for a WFS GetFeature request to retrieve data. An example registration mapping between a Restaurant feature type and the restaurant2 concept of the associated application ontology is shown in figure 6.

11 see http://www.mindswap.org/2003/pellet/

Restaurant feature instance (property values): 3404529.22,5759522.42; Kuhlmanns; Herr Kuhlmann; Pizza Margherita; Pizza Funghi; (...); 10:00:00; 01:00:00

Structural Path                   ↔   Conceptual Path
/Restaurant                       ↔   restaurant2
/Restaurant/gml:pointProperty     ↔   restaurant2.hasLocation
/Restaurant/name                  ↔   restaurant2.hasName
/Restaurant/oname                 ↔   restaurant2.hasOwner.hasName
/Restaurant/meal                  ↔   restaurant2.serves
/Restaurant/open_from             ↔   restaurant2.openFrom
/Restaurant/closes_at             ↔   restaurant2.closesAt

Figure 6: An example registration mapping (bottom), which maps the Restaurant feature type (top) to the restaurant2 concept of the associated application ontology. The application ontology is described in more detail in section 6.1.
The main idea of registration mappings is to have separate descriptions of the application concept C (called semantic type in Bowers and Ludäscher (2004)) and of the structural details of the feature type it describes (the structural type). The main advantage of registration mappings lies in the fact that the semantics of the feature type can be specified more accurately in application concepts if the specification does not try to mirror the feature type’s structure. This is especially true for feature types that have a “flat” structure that does not well reflect the conceptual model of the domain. For example, the property oname in figure 6 represents the name of the owner of the restaurant rather than the name of the restaurant itself. A registration mapping consists of a set of rules that define associations between a feature type’s structural and semantic types. The structural type is defined for XML using a subset of XPath (W3C, 1999). The semantic type is specified as a so-called contextual path, which denotes a concept, possibly within the context of other concepts. It takes the form C.r1.r2.(...).rn, where r1 to rn are roles in the ontology connecting the described concept with the concept C. For example, restaurant.serves.hasPrice is a contextual path, where the concept selected by the path is price (the range of the hasPrice role) within the context of the meals or drinks served in a specific restaurant. For further details on how to create registration mappings, see section VI.3.4.
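The rules of the registration mapping in figure 6 can be held in a simple lookup structure. The sketch below stores them as a Python dictionary and shows two hypothetical helper functions: one resolves a contextual path to the structural (XPath) path of the feature type, the other splits a contextual path of the form C.r1.(...).rn into its concept and roles. The function names are assumptions made for illustration and do not come from Bowers and Ludäscher (2004) or from the implementation described in this thesis.

```python
# Rules of the example registration mapping (structural path -> contextual path).
REGISTRATION_MAPPING = {
    "/Restaurant": "restaurant2",
    "/Restaurant/gml:pointProperty": "restaurant2.hasLocation",
    "/Restaurant/name": "restaurant2.hasName",
    "/Restaurant/oname": "restaurant2.hasOwner.hasName",
    "/Restaurant/meal": "restaurant2.serves",
    "/Restaurant/open_from": "restaurant2.openFrom",
    "/Restaurant/closes_at": "restaurant2.closesAt",
}


def structural_path_for(contextual_path, mapping=REGISTRATION_MAPPING):
    """Return the structural path registered for a contextual path."""
    for structural, contextual in mapping.items():
        if contextual == contextual_path:
            return structural
    raise KeyError(contextual_path)


def split_contextual_path(contextual_path):
    """Split 'C.r1.r2' into the concept 'C' and the list of roles ['r1', 'r2']."""
    concept, *roles = contextual_path.split(".")
    return concept, roles
```

For instance, a query that constrains the owner's name in the ontology (`restaurant2.hasOwner.hasName`) resolves to the flat property `/Restaurant/oname`, which is the element a query filter would have to address.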
5.5 Function Subtyping
The notion of function subtypes has been used in component-based software development to judge whether a component can be substituted for another one. We employ function subtypes (and several types of match derived from this notion) during the discovery of geoprocessing services to judge whether an operation description matches the requirements specified by the requester. A function f(x) = y has the function type D → C if x is of type D and y is of type C (Simons, 2002b). D is called the domain12, C the codomain of the function (equation 1). This rule also applies to multi-argument functions because x can be of a product type N × M.

\[ x : D, \quad y : C \;\Rightarrow\; f(x) = y : D \rightarrow C \qquad (1) \]
A function type D1 → C1 is a subtype (...)
Chapter III
A Classification Framework for Approaches to Achieving Semantic Interoperability between GI Web Services
Lutz, M., Riedemann, C. & Probst, F. (2003): A Classification Framework for Approaches to Achieving Semantic Interoperability between GI Web Services, in: Kuhn, W., M. F. Worboys & S. Timpf (Eds.): Conference on Spatial Information Theory: Foundations of Geographic Information Science (COSIT 2003). LNCS 2825: 200–217.²³
Abstract. The discovery of services that are appropriate for answering a given question is a crucial task in the open and distributed environment of web services for geographic information. In order to find these services the concepts underlying their implementation have to be matched against the requirements resulting from the question. It is in this matchmaking process where semantic heterogeneity has to be tackled. Whether semantic interoperability can be achieved depends on the quality of the information available to the matchmaker on the semantics of requirements and resources. The explicitness, structuring and formality of this information can differ considerably, leading to different types of matchmaking. In this paper a framework is presented for classifying the approaches that are currently employed or proposed for achieving semantic interoperability according to these criteria. The application of the framework is illustrated by analysing possible solutions to three examples of semantic interoperability problems.
²³ by © Springer. Reprint permission granted 7 September, 2005. See http://springerlink.metapress.com/openurl.asp?genre=article&issn=0302-9743&volume=2825&spage=186.
Chapter III. A Classification Framework for Semantic Interoperability
A Classification Framework for Approaches to Achieving Semantic Interoperability Between GI Web Services
Michael Lutz, Catharina Riedemann, and Florian Probst
Institute for Geoinformatics, University of Münster, Robert-Koch-Str. 26-28, 48149 Münster, Germany
{lutzm, riedemann, probst}@ifgi.uni-muenster.de
1 Introduction
Geographic information science is currently characterized by a paradigm shift – from providing theories for monolithic systems to theories for open and distributed GIS and their use processes. With this comes a move from standardized data formats to specifications of geographic information (GI) service interfaces [1, 2]. In practice, the number of GI services available on the web is rapidly and continually increasing. Semantic interoperability is a core problem in such an open and distributed environment [3]. In the description of the OpenGIS service architecture [4] the syntactic and semantic aspects of interoperability are defined as follows: “Syntactical interoperability assures that there is a technical connection, i.e., that the data can be transferred between systems. Semantic interoperability assures that the content is understood in the same way in both systems, including by those humans interacting with the systems in a given context.”
In the open and distributed environment of GI web services, the components that are to interoperate are not known in advance. The starting point is a requester’s specific question rather than a given system. Discovering the services¹ that are appropriate for answering this question from among a large number of available services is a central task within the GI web services domain [5]. Service discovery will therefore be the focus of this paper. In order to find an appropriate service the requirements resulting from the requester’s question have to be matched against descriptions of the available service implementations. It is in this matchmaking process that semantic interoperability is ensured, making it a crucial part of service discovery (Fig. 1).
[Figure 1 labels: a Requester with a Question and Requirements; “matches?” links between the requirements and Component X and Component Y, offered by Provider X and Provider Y; semantic interoperability.]
Fig. 1. Semantic interoperability in a GI web services scenario
A large number of languages and technologies have been proposed for service discovery, e.g. for web services in general [6-8], for services and data in the geospatial domain [5, 9, 10], or for software agents and the Semantic Web [11-13]². Whether semantic interoperability during service discovery can be achieved in any of these approaches depends on the quality of the information available to the matchmaker on the semantics of requirements and resources. The explicitness, structuring and formality of this information can differ considerably, leading to different forms of matchmaking. We are not aware of any framework for classifying the plethora of existing approaches for service discovery with respect to achieving semantic interoperability. Therefore, we propose such a framework in this paper.
The remainder of the paper is structured as follows. In the next section we present several examples of practical problems caused by semantic heterogeneity. The framework for classifying approaches for overcoming semantic heterogeneity is developed in section 3 and applied to the practical problems in section 4. We conclude the paper by discussing how the results can be applied in the more complex task of (automatic) service composition and by pointing out the next steps for semantic interoperability research along these lines.
¹ The notion of service in this paper includes both services that can be used to operate on multiple, unspecified datasets (loosely-coupled services) and services that are associated with a specific dataset (tightly-coupled services) [27].
² We assume that the reader is familiar with these approaches. Their strengths and weaknesses are outside the scope of this paper and will therefore not be discussed.
2 Examples of Semantic Interoperability Problems
This section presents three examples of problems caused by semantic heterogeneity that we have encountered in our research. They occur in monolithic, at most partially component-based, GIS environments. Nevertheless, they are equally valid for a web service environment.
2.1 Classification of Semantic Heterogeneity
Semantic heterogeneity, the source of semantic interoperability problems, is defined in [14] as the consequence of different conceptualizations and database representations of a real world fact. Two types can be distinguished. Cognitive heterogeneity arises when two disciplines have different conceptualizations of real world facts. This becomes a semantic problem when the same names are used for different concepts in both disciplines. Such word pairs are referred to as homonyms. Naming heterogeneity refers to different names for identical concepts of real world facts, also called synonyms. The examples described below are classified according to this distinction in order to make sure that both types of heterogeneity are covered.
2.2 Using Topographic Data for Noise Abatement Planning
Situation. To determine which roads could have a considerable noise effect on residential areas, those roads touching or crossing residential areas must be identified. German topographic data (Amtliches Topographisch-Kartographisches Informationssystem, ATKIS) contain residential areas and roads as feature classes [15]. Some roads are modeled as lines as shown in Fig. 2 (b).
[Figure 2: two panels, (a) and (b), each showing features labeled “road” and “residential area”.]
Fig. 2. Different models of roads crossing residential areas
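The difference between the two models in Fig. 2 can be made concrete with a small sketch. It is a minimal illustration and not part of the original study: the geometry is reduced to axis-aligned rectangles and a vertical line, all coordinates are invented, and the names `Rect`, `VerticalLine`, `areas_overlap`, `line_crosses_interior` and `line_touches_boundary` are hypothetical helpers. It also assumes that in model (b) the road line runs along the shared boundary of two residential areas, which is one reading of the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for an areal feature."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


@dataclass(frozen=True)
class VerticalLine:
    """Vertical line segment standing in for a road modeled as a line."""
    x: float
    ymin: float
    ymax: float


def areas_overlap(a: Rect, b: Rect) -> bool:
    """True if the interiors of two rectangles share some area."""
    return a.xmin < b.xmax and b.xmin < a.xmax and a.ymin < b.ymax and b.ymin < a.ymax


def line_crosses_interior(line: VerticalLine, area: Rect) -> bool:
    """True if part of the line lies strictly inside the rectangle."""
    return area.xmin < line.x < area.xmax and line.ymin < area.ymax and area.ymin < line.ymax


def line_touches_boundary(line: VerticalLine, area: Rect) -> bool:
    """True if the line runs along the left or right edge of the rectangle."""
    on_edge = line.x in (area.xmin, area.xmax)
    return on_edge and line.ymin < area.ymax and area.ymin < line.ymax


# Assumed mental model (a): the road is a strip that overlaps the residential area.
residential_a = Rect(0, 0, 10, 10)
road_strip = Rect(4, -2, 6, 12)

# Assumed system model (b): the road is a line separating two residential areas.
residential_west = Rect(0, 0, 5, 10)
residential_east = Rect(5, 0, 10, 10)
road_line = VerticalLine(5, -2, 12)

print(areas_overlap(residential_a, road_strip))              # overlap test on model (a)
print(line_crosses_interior(road_line, residential_west))    # overlap test on model (b)
print(line_touches_boundary(road_line, residential_west))    # touch test on model (b)
```

Under these assumptions an overlap-style test succeeds on model (a) but fails on model (b), where only a boundary test finds the road. The feature names alone give no hint of which test is appropriate.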
Problem. A user might have the mental concepts of roads and residential areas as depicted in Fig. 2 (a). The system model instead uses representations of roads and residential areas as depicted in Fig. 2 (b). If the user is not aware of the system model (which the terms “residential area” and “road” do not reveal), he might assume that roads overlap residential areas as indicated in Fig. 2 (a) and consequently use the dataset as input for an intersect operation in order to find roads crossing residential areas. However, based on a system model as depicted in Fig. 2 (b) he will not find any roads by doing so, which is correct for the data model of the dataset, but does not meet the user’s expectations.
Heterogeneity Type. This example depicts cognitive heterogeneity concerning residential areas; the concepts of user and system regarding the geometric representation are different. The difference is hidden by the homonym “residential area”.
2.3 Calculating the Area of Greenland in a Mercator Projection
Situation. In the Mercator map projection features on the reference ellipsoid are projected onto a cylinder touching the equator. This leads to increasing distortion towards the poles and does not preserve areas (see Fig. 3 (b)). For all tasks requiring real world area values, it is not appropriate to calculate the area of features in polar regions, like Greenland, directly from the Mercator projection cylinder (see Fig. 3 (a)).
Fig. 3. Greenland and Africa in (a) equal-area Mollweide projection and (b) non-equal-area Mercator projection (images taken from [16])
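The size of the distortion can be estimated with a short calculation. The sketch below is only an illustration under simplifying assumptions: it uses a spherical Earth of radius 6371 km instead of the reference ellipsoid, and the function names are hypothetical. It compares the area of a latitude/longitude cell on the sphere with the area of the same cell measured on the Mercator plane.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, not the reference ellipsoid


def true_cell_area(lat_south, lat_north, lon_west, lon_east, radius=EARTH_RADIUS_KM):
    """Area of a latitude/longitude cell on the sphere (degrees in, km^2 out)."""
    d_lon = math.radians(lon_east - lon_west)
    return radius ** 2 * d_lon * (
        math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    )


def mercator_y(lat, radius=EARTH_RADIUS_KM):
    """Northing of a latitude on the Mercator cylinder (spherical formula)."""
    return radius * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))


def mercator_cell_area(lat_south, lat_north, lon_west, lon_east, radius=EARTH_RADIUS_KM):
    """Area of the same cell measured on the Mercator plane."""
    width = radius * math.radians(lon_east - lon_west)
    height = mercator_y(lat_north, radius) - mercator_y(lat_south, radius)
    return width * height


def area_inflation(lat_south, lat_north):
    """Ratio of projected area to real-world area for a cell one degree wide."""
    projected = mercator_cell_area(lat_south, lat_north, 0.0, 1.0)
    return projected / true_cell_area(lat_south, lat_north, 0.0, 1.0)


# Inflation for a thin cell at the equator, at 60 degrees north and for a polar band.
for band in [(-0.01, 0.01), (59.99, 60.01), (70.0, 80.0)]:
    print(band, round(area_inflation(*band), 2))
```

For a thin cell the ratio approaches the squared secant of the latitude, so an area read directly from the Mercator plane at 60° N is about four times the real-world area, and the factor keeps growing towards the pole.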
Problem. Most GIS do not inform users during execution how areas of features are calculated and whether the results reflect the real world area of that feature. The user may expect an area calculation to return the real world area. Such an area calculation would be based on the feature’s geometry on the reference ellipsoid. However, if the system’s concept of area calculation is based on the feature’s geometry on the projection cylinder, the operation will return a completely different result. If the user is not aware of the different concepts of area calculation he will misinterpret the results.
Heterogeneity Type. This example depicts cognitive heterogeneity within the concept of area calculation.
2.4 Topological Operators in GeoMedia and Oracle
This example consists of two parts. First, it describes two operators with the same name and different behavior. Then it describes two operators with different names and equivalent behavior.
Situation. GeoMedia³ provides a set of topological operators. In addition, it integrates topological operators of the Oracle⁴ database system. We look at two GeoMedia operators, called “touch” and “meet”, and compare them to an Oracle operator called “touch”.
Fig. 4. Regions found (each marked with a thick line) by (a) GeoMedia “touch” operator and by (b) Oracle “touch” operator
Fig. 5. Region found (marked with a thick line) by GeoMedia “meet” operator as well as by Oracle “touch” operator
Problem. We have identified two problems in this example:
1. Although the names are identical, the two “touch” operators of GeoMedia and Oracle return different results (Fig. 4).
2. The GeoMedia “meet” operator and the Oracle “touch” operator, however, find the same regions (Fig. 5 (2)) although they are named differently.
Thus, the names are confusing and misleading, and consequently not useful to the user for deciding if an operation does what he expects.
Heterogeneity Type. The first problem is caused by cognitive heterogeneity with the homonym “touch”. The second problem demonstrates naming heterogeneity with the synonyms “meet” and “touch”.
³ GeoMedia Professional (Intergraph Corp.) V5.0
⁴ Oracle 9i Release 2 Spatial (Oracle Corp.)
3 Analysis Framework
There are many approaches to ensuring semantic interoperability in the examples presented in the previous section, e.g. [13, 17, 18]. In this section we present a framework for classifying and analyzing such approaches. We define the term matchmaking and present different types of matchmaking. We proceed to differentiate several levels of explicitness, structuring and formality for the information required by the matchmaker.
3.1 Matchmaking
In the literature on agent systems matchmaking is defined as mediating among requesters and providers of services for some mutually beneficial cooperation [13]. The process of finding an appropriate service for a certain task can be regarded as matchmaking, too. During matchmaking it is assessed whether (or how well) an available service fits the requirements of the requester.
We distinguish different roles that are played by human actors or system components that exist in the domain of GI web services: the requester role, which is (ultimately) always played by a human (end user or web service provider), the provider role, which is also played by a human (web service provider), and the matchmaker role, which can be played by either a human (one of the above or an independent broker) or a matchmaking service. Note that the same person can take different roles. For example, the person in the requester or provider role can also be responsible for the matchmaking process.
As either human or computer can do the matchmaking, two kinds of matchmaking can be distinguished, which represent two endpoints of a continuum:
Purely manual matchmaking. Manual matchmaking is done by a human actor and occurs in the mind of the matchmaker. The matchmaker decides whether or not some service fits the requester’s requirements based on information that is available to him about the service. Manual matchmaking is prone to misunderstandings caused by synonyms and homonyms (section 2.1). In order to mitigate this problem, additional information is collected to reduce ambiguity.
Fully automatic matchmaking. In contrast to manual matchmaking fully automatic matchmaking is always done by a service. This requires formal descriptions of requirements and service capabilities. These are matched automatically using an algorithm such as described in [13, 19]. In cases where some of the required formal descriptions are missing, the existing informal descriptions have to be formalized for automatic matchmaking to be applied. Alternatively, the formal descriptions can be made informal and manual matchmaking can be applied. Informalization becomes necessary because formal descriptions are usually difficult to read for non-experts.
It should be noted that automatic matchmaking, too, could lead to results unexpected by the requester. This can either be due to explication or formalization errors (i.e. inappropriate capabilities or requirements descriptions) or inappropriate parameterization of the matchmaking algorithms.
3.2 Levels of Explicitness, Structuring and Formality
The quality of the information (metadata) on requirements and service capabilities that is available to the matchmaker is crucial for the matchmaking task. Which information on requirements and on the service has to be made explicit to the matchmaker depends on who does the matchmaking:
• If the requester does the matchmaking the requirements are already available in the matchmaker’s mind. Therefore, they do not have to be formalized or even made explicit. However, making the requirements explicit and thus reducing ambiguity can help to avoid misinterpretation.
• If the provider does the matchmaking the service capabilities are already available in the matchmaker’s mind. Therefore, they do not have to be formalized or even made explicit. However, making the capabilities explicit can help to clarify them and discover inconsistencies.
• If an independent broker does the matchmaking both requirements and service capabilities have to be made explicit to the matchmaker. A (possibly standardized) structure and formalization might help the broker to do the matchmaking.
The quality of the metadata can vary along three dimensions:
Explicitness of information. The information can be implicit, i.e. only in someone’s mind, or explicit, i.e. written down in some language. It is also important to note how complete the available information is, i.e. whether all the information that is required by the matchmaker is available.
Structuring of information. The structure of the information can be implicit and thus unobservable or explicit or even standardized. We refer to the former as unstructured and to the latter as structured information. There are, of course, different levels of structuring [20].
Formality of semantics. The semantics of the concepts used to describe the service can be expressed in ontologies, which are defined as explicit specifications of conceptualizations [21]. A conceptualization is a set of concepts, their definitions and interrelationships [22]. Ontologies can be expressed both informally and formally, i.e. using natural or formal languages. There are also intermediate levels of formality [20].
The classification framework could simply consist of these dimensions. However, they are not independent of each other, e.g. the structuring or formality dimensions do not matter if this information is not explicit. Therefore, we propose five levels of explicitness, structuring and formality.
A. Completely implicit semantics. The information exists only in the mind of the provider, requester or matchmaker.
B. Implicit semantics. Only names (e.g. “forest data”, “web mapping service”) but no metadata are made explicit to refer to services or requirements.
C. Explicit, unstructured, informal semantics. Metadata are made explicitly available, but in an unstructured form using natural language text.
D. Explicit, structured, informal semantics. Metadata are made explicitly available in a structured form, e.g. referring to metadata standards such as ISO 19115 [23] that – usually informally – specify metadata fields and their semantics. However, with the exception of value lists being specified for some fields, the content of the metadata fields is to be given in the form of free natural language text.
E. Explicit, formal semantics. The information is made explicitly available referring to formal ontologies.
These categories are somewhat arbitrary as all three dimensions are continuous. However, we think they represent typical examples of approaches to achieving semantic interoperability. This is illustrated by the scenarios presented in the following section.
3.3 Matchmaking Scenarios
In order to illustrate the levels of explicitness, structuring and formality described in the previous section, three scenarios are depicted. In all of them the requester wants to know the location of forest parcels in the German federal state of North Rhine-Westphalia (NRW). The scenarios represent typical approaches to achieving semantic interoperability at three stages of development. The first scenario shows what is possible and widely practiced by users of the World Wide Web today (levels B and C⁵). The second scenario describes the research and industry attempts made in the GI community (level D), most notably in the OpenGIS Consortium (http://www.opengis.org) and the ISO Technical Committee 211 (http://www.isotc211.org). The last scenario presents ideas that are currently discussed in the Semantic Web and agent systems communities (level E).
Note that the role labeled requester in the following figures could either represent an end user who wants an appropriate service to answer his question, or a service provider who wants to find appropriate services to build a complex service that performs a specific task. The actor or component responsible for the matchmaking is highlighted in gray.
Scenario 1 – Manual Matchmaking Based on Names or Unstructured and Informal Information. In the first scenario (Fig. 6) the capabilities of the services are not made explicit by their providers. The only clues for the requester to what the services are doing or which data they provide are their names.
One means of finding appropriate services by their name is a keyword search in an Internet search engine like Google. In such a search the requester can encounter the following problems:
No match. Services that fit the requester’s requirements are not found at all, because their names do not match the keywords included in the requester’s query. The simplest reasons for this are spelling differences or mistakes. Leaving these aside, the problem can be classified as a case of naming heterogeneity (section 2.1): The conceptualizations of requester and provider are sufficiently similar for the task at hand but concepts are given different names (synonyms). This can have several reasons. Either the name of the service or the keywords used in the query are not appropriate, i.e. they do not reflect the service capabilities or the requester requirements well, respectively. Or both keywords and names are appropriate (within their respective domain), but requester and providers belong to different information communities.
Unsuitable match. Services that are found because their name matches the keywords included in the requester’s query do not fit the requester’s requirements. The conceptualizations of requester and provider are different but are given the same names (homonyms). This case can be classified as cognitive heterogeneity leading to naming conflicts (section 2.1). The possible reasons for this can again be inappropriate names or differing information communities.
⁵ Level A is not considered because service discovery becomes extremely difficult or even impossible when the semantics of requirements or capabilities are completely implicit.
[Figure 6 labels: the Requester asks “Where are forests in NRW?”, sends the query “map forests NRW” to Google, which finds the web sites for “Map()” and “forest”, and then decides “Do these resources match my requirements?”; the Service Provider (“My service can display maps”, “My service is called ‘Map()’”) and the Data Provider (“My data represent forests”, “My dataset is called ‘forest’”) each enter a description on a web site.]
Fig. 6. Manual matchmaking based on names (level B) or unstructured and informal information (level C)
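The two failure modes of name-based keyword search in scenario 1 can be reproduced with a toy example. The sketch below is purely illustrative: the three advertisements, the `concept` field (which stands for the provider's intended meaning and is not visible to the search) and the functions `keyword_search` and `evaluate` are invented for this purpose and do not describe any real search engine.

```python
from typing import NamedTuple


class Service(NamedTuple):
    name: str     # the only information published at level B
    concept: str  # the provider's intended meaning, invisible to the search


# Hypothetical advertisements, invented for illustration only.
SERVICES = [
    Service("forest", "forest parcels"),
    Service("woodland parcels", "forest parcels"),            # synonym of the requested concept
    Service("random forest", "machine-learning classifier"),  # homonym of "forest"
]


def keyword_search(keywords, services):
    """Return services whose name shares at least one token with the query keywords."""
    wanted = {keyword.lower() for keyword in keywords}
    return [s for s in services if wanted & set(s.name.lower().split())]


def evaluate(keywords, wanted_concept, services):
    """Split the outcome into suitable matches, unsuitable matches and missed services."""
    found = keyword_search(keywords, services)
    suitable = [s for s in found if s.concept == wanted_concept]
    unsuitable = [s for s in found if s.concept != wanted_concept]
    missed = [s for s in services if s.concept == wanted_concept and s not in found]
    return suitable, unsuitable, missed


suitable, unsuitable, missed = evaluate(["forest", "NRW"], "forest parcels", SERVICES)
print("suitable:  ", [s.name for s in suitable])
print("unsuitable:", [s.name for s in unsuitable])
print("missed:    ", [s.name for s in missed])
```

In this toy registry the synonym “woodland parcels” is missed (no match), while the homonym “random forest” is returned although it does not fit the requirements (unsuitable match).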
An explicit requirements specification can help the requester to do the matchmaking, because the process of explication often helps to clarify and disambiguate ideas on requirements. Also, explicitly describing the service’s capabilities rather than only giving a name can improve the matchmaking by reducing guesswork. This is the case of explicit, but unstructured and informal description of semantics. However, misinterpretation is still possible if the descriptions are ambiguous or incomplete. These two cases are currently the most common ones as service descriptions are either informal or missing completely.
Scenario 2 – Manual Matchmaking with Standardized Metadata. In the second scenario (Fig. 7) the providers’ conceptualizations are made explicit and are recorded in metadata documents whose structure is well known and which are made available through one (or several) registries. The requester can search a registry using keywords for all of the fields provided by its query interface. He can then use the returned metadata documents to assess whether or not the services fit his requirements. He might need to access other documents that the metadata documents refer to, e.g. a feature type catalog providing definitions for feature classes or ISO standards defining units of measurement.
[Figure 7 labels: the Requester asks “Where are forests in NRW?”, sends the query “map forests”, BoundingBoxNRW (keywords for some/all metadata fields) to the Query Interface of a Registry holding Forest Data Metadata and MapService Metadata, consults external information (e.g. ISO 19111 or ATKIS OK), and then decides “Do these resources match my requirements?”; the Service Provider (“My service can display maps”) and the Data Provider (“My data represent forests”) each enter a description into a Metadata Template, e.g. according to ISO 19115/19119.]
Fig. 7. Manual matchmaking with standardized metadata, e.g. ISO 19115/19119 (level D)
As the matchmaking in this scenario is still based on keywords, the problems described for the first scenario can still occur. There can be ambiguity in either the metadata entries themselves or in the referenced documents (e.g. the feature catalogue). However, this can be considerably reduced by using standardized documents, by providing a controlled vocabulary (e.g. lists of keywords), and by referring to other standardized or at least widely known and agreed-upon documents.
Scenario 3 – Automatic Matchmaking with Formal Metadata. In the last scenario (Fig. 8) the conceptualizations of requester and providers are not only explicit but also formalized. They use concepts from existing domain ontologies [24] to formulate their requirements or advertisements, respectively. A service automatically matches the requester’s requirements against advertisements stored in its registry using a matchmaking algorithm such as described in [13]. It is assumed that by using formal descriptions of semantics and automatic matchmaking algorithms problems such as those described in the previous scenarios can be avoided [12, 25]. However, in this scenario, too, problems similar to those identified in the previous scenarios can occur, albeit for different reasons.
No match. Services that fit the requester’s requirements are not found at all because the matchmaking algorithm is too rigorous. In [13] a threshold value has to be specified by the requester indicating which degree of similarity between advertisements and requirements is still acceptable.
Unsuitable match. Services that are found do not fit the requester’s requirements. This, too, can be caused by the calibration of the matchmaking algorithm. Here, the matchmaking algorithm is too tolerant because the threshold value is too low. Another possible reason is that the requirements document does not correctly reflect the requester’s requirements or the capabilities documents do not correctly reflect the providers’ conceptualization of the service. We refer to these kinds of errors as explication or formalization errors, respectively.
[Figure 8 labels: the Requester asks “Where are forests in NRW?” and fills in a Requirements Template (e.g. with pre- and postconditions, BBox etc.) using concepts from Domain Ontologies; the Matchmaking Service accesses an Advertisements Registry and returns access points for appropriate resources; the Service Provider (“My service can display maps”) and the Data Provider (“My data represent forests”) each enter a description into a Capabilities Template (e.g. with pre- and postconditions etc.), using concepts from Domain Ontologies.]
Fig. 8. Automatic matchmaking with formal metadata (level E)
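The effect of the threshold value on automatic matchmaking can be illustrated with a toy matchmaker. The sketch below is an assumption-laden illustration and not the algorithm of [13]: the small taxonomy `PARENT`, the path-based `similarity` measure, the advertisement names and the function `matchmake` are all hypothetical.

```python
# Hypothetical domain ontology as child -> parent links; not taken from the paper.
PARENT = {
    "forest": "vegetation area",
    "orchard": "vegetation area",
    "vegetation area": "land cover",
    "lake": "water body",
    "water body": "land cover",
    "land cover": None,
}

# Hypothetical advertisements: service name -> advertised concept.
ADVERTISEMENTS = {
    "ForestParcels": "forest",
    "VegetationMap": "vegetation area",
    "LakeAtlas": "lake",
}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def similarity(a, b):
    """Toy similarity in (0, 1]: 1 / (1 + number of taxonomy edges between a and b)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return 1.0 / (1 + steps_a + chain_b.index(node))
    return 0.0


def matchmake(required, advertisements, threshold):
    """Return (service, score) pairs scoring at least `threshold`, best match first."""
    scored = [(name, similarity(required, concept)) for name, concept in advertisements.items()]
    accepted = [(name, score) for name, score in scored if score >= threshold]
    return sorted(accepted, key=lambda pair: -pair[1])


for threshold in (0.9, 0.5, 0.2):
    print(threshold, matchmake("forest", ADVERTISEMENTS, threshold))
print(0.9, matchmake("orchard", ADVERTISEMENTS, 0.9))
```

With a strict threshold a request for “orchard” returns nothing although a vegetation map might serve the purpose (no match), whereas a very low threshold also returns the lake service for a forest request (unsuitable match). The choice of threshold is therefore part of what the requester has to get right.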
3.4 Likelihood of Misunderstanding
Misunderstandings can occur in all matchmaking scenarios described in the previous sections. Summarizing the arguments from the previous section, Table 1 gives an estimate of the likelihood of misunderstandings for all possible combinations of explicitness, structuring and formality levels described above. It is assumed that the requester does the matchmaking.
Table 1. Likelihood of misunderstandings for different levels of requirements and capabilities descriptions if the requester does the matchmaking. Shading: white – manual matchmaking possible, light gray – manual matchmaking possible but difficult for non-experts, dark gray – automatic matchmaking possible. The scenarios described in the previous section are framed.
[Table 1 axes: requirements and service capabilities, ranging over the levels completely implicit; implicit; explicit, unstructured, informal semantics; explicit, structured, informal semantics; explicit, formal semantics. Cell values: matchmaking impossible, highly likely, likely, possible, possible (automatic matchmaking limited), unlikely. The framed cells are marked I, II and III.]
4 Analysis of Examples
After having presented the framework for classifying matchmaking approaches we show in this section how it can be applied to the examples presented in section 2. In the following tables the first row lists the information required by the matchmaker in order to find resources appropriate for answering the requester’s question. The names of the concepts appear in italics. The remaining rows contain an analysis of the availability, quality and source of the information in each of the three scenarios presented in section 3.3.
4.1 Using Topographic Data for Noise Abatement Planning
This example depicts the requester’s attempt to intersect residential areas with roads. This involves a matchmaking process for which information about the requester’s conceptualization of road, residential area and the operators touch and cross as well as information about the ATKIS geometry model are needed (Table 2).
Table 2. Application of the classification framework to example 1 – Using topographic data for noise abatement planning. (The table is split into two for enhanced readability)
Part 1. Information required by the matchmaker: (1) requester conceptualization of road and residential area; (2) requester conceptualization of touch and cross
Scenario 1
  available: (1) ✓; (2) ✓
  level: (1) implicit; (2) implicit
  source: (1) requester’s mind; (2) requester’s mind
Scenario 2
  available: (1) ✓; (2) ✓
  level: (1) implicit; (2) implicit
  source: (1) requester’s mind; (2) requester’s mind
Scenario 3
  available: (1) ✓; (2) ✓
  level: (1) explicit, formal; (2) explicit, formal
  source: (1) domain ontology chosen by the requester to describe his task; (2) domain ontology chosen by the requester to describe his task
Part 2. Information required by the matchmaker: (1) ATKIS geometry model for road and residential area; (2) process model for geoprocessing operations, e.g. intersect or buffer
Scenario 1
  available: (1) ✓; (2) ✓
  level: (1) implicit; (2) implicit
  source: (1) requester’s mind (if he is an ATKIS expert) or dataset (accessible via visualization of data, requires GIS expertise); (2) requester’s mind (if he is an expert of the specific GIS) or trial and error (requires GIS expertise)
  where not available (–): level n.a.; source n.a.
Scenario 2
  available: (1) ✓; (2) ✓
  level: (1) explicit, structured, informal; (2) explicit, structured, informal
  source: (1) The ISO metadata standard supports references to external feature type catalogs like that of ATKIS as well as graphic overviews [23]. (2) The ISO services standard provides a template for describing services [4]. Alternatives are UDDI [7], WSDL [8], Capabilities XML [26]. They focus on operation signatures; descriptions are available only on service level and appear as free text. ISO in addition provides free text descriptions on the operation level. The ISO spatial schema standard provides information for filling such a template [27]. They consist of free text descriptions and formalized operation signatures.
Scenario 3
  available: (1) ✓; (2) ✓
  level: (1) explicit, formal; (2) explicit, formal
  source: (1) domain ontology based on ATKIS feature type catalog [15]; (2) (geo)processing domain ontology, e.g. based on ISO spatial schema standard [27]
Chapter III. A Classification Framework for Semantic Interoperability
In scenario 1 the intersection attempt will only be successful if the requester is an expert who is aware of how the ATKIS geometry model fits his requirements. In scenario 2 the intersection attempt will be successful if the requester is willing to spend the time to access and understand the available metadata and perform the matchmaking manually. In scenario 3 the intersection attempt will be valid even if the requester is not an ATKIS expert, because the information needed for the matchmaking is available in formal and explicit form, making automatic matchmaking possible. The result of the matchmaking process may be that the intersection is not possible because the mapping from system to requester concepts would require additional services that are not available. Nevertheless, even in this case the requester is saved from misinterpreting the results of the intersection.

4.2 Calculating the Area of Greenland in a Mercator Projection

This example depicts the requester's attempt to calculate the real world area of Greenland displayed with a GIS using the Mercator projection. This involves a matchmaking process for which information about the requester's conceptualization of area calculation, the system model of area calculation and, indirectly, information about the attributes of Mercator projections is needed (Table 3).

In scenario 1 the area calculation attempt is likely to lead to misinterpretation if the requester is not a GI expert. In scenario 2 the area calculation attempt is likely to be canceled: if the requester is willing to spend the time to access and understand the available metadata, he becomes aware that the calculated area will not meet his requirement of representing the real world area. However, in scenario 2 no further solution is offered. In contrast, in scenario 3 the area calculation attempt may be successful, because all information needed for the matchmaking process is available in formal and explicit form. The requester is made aware that his requirements differ from the system's abilities. It might be possible to search for a service that is able to calculate the area according to the requester's requirements. In this scenario the requester does not need any knowledge about projections and area calculation operations.
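The size of the possible misinterpretation in this example can be made concrete with a short calculation. The following is a minimal sketch, assuming a spherical Earth and the standard property that the Mercator projection inflates small areas by a factor of sec²(latitude). The function names are hypothetical and do not belong to any GIS discussed in this paper.

```python
import math


def mercator_area_inflation(latitude_deg: float) -> float:
    """Local area inflation of the Mercator projection on a sphere: sec^2(latitude)."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


def real_world_area(projected_area: float, latitude_deg: float) -> float:
    """Approximate true area of a small region from its area measured in the projection."""
    return projected_area / mercator_area_inflation(latitude_deg)


# Print the inflation factor for a few latitudes (assumed sample values).
for lat in (0, 45, 60, 72):
    factor = mercator_area_inflation(lat)
    print(f"{lat:>2} deg: areas appear {factor:.1f} times too large")
```

At high latitudes such as those of Greenland the factor is roughly ten, which is why an area computed naively in projected coordinates does not represent the real world area that the requester has in mind.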
Table 3. Application of the classification framework to example 2 – Calculating the area of Greenland in a Mercator projection

Information required by the matchmaker: requester conceptualization of area calculation
  scenario 1: available: ✓; level: implicit; source: requester's mind
  scenario 2: available: ✓; level: implicit; source: requester's mind
  scenario 3: available: ✓; level: explicit, formal; source: domain ontology chosen by the requester to describe his task

Information required by the matchmaker: system model of area calculation (possibly including attributes of the projection, see next row)
  scenario 1: available: ✓ or –; level: implicit or n.a.; source: requester's mind (if he is a GI expert) or n.a.
  scenario 2: available: ✓; level: explicit, structured, informal; source: The operation signatures can be described in the same way as for the intersect and buffer operations in Table 2. The ISO metadata standard [23] provides attributes for operations which can be applied to the dataset. However, the requester has to judge whether the results (e.g. area calculation) fit his expectations. (see same column next row).
  scenario 3: available: ✓; level: explicit, formal; source: (geo)processing domain ontology, e.g. based on ISO spatial schema standard [27]

Information required by the matchmaker: attributes of Mercator projection
  scenario 1: available: ✓ or –; level: implicit or n.a.; source: requester's mind (if he is a GI expert) or n.a.
  scenario 2: available: ✓; level: explicit, structured, informal; source: The ISO standard for spatial referencing by coordinates provides a free text description indicating for which application a coordinate reference system is valid [28].
  scenario 3: available: ✓; level: explicit, formal; source: domain ontology for projections, e.g. based on ISO standard for spatial referencing by coordinates [28]
4.3 Topological Operators in GeoMedia and Oracle

This example depicts the requester's attempt to find operations that return geometry features whose boundaries intersect but whose interiors do not. To find the appropriate operations the requester's requirements have to be matched with the systems' capabilities. For the matchmaking process information about the requester's conceptualization of touch is needed as well as the process models of the available operations of the systems, in this case GeoMedia and Oracle (Table 4).
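The requester's conceptualization of touch stated above (boundaries intersect, interiors do not) can be written down as an executable predicate. The following is a minimal sketch restricted to axis-aligned rectangles with positive area, which is an assumption made only to keep the example self-contained. It does not reproduce the operators of GeoMedia or Oracle, and the names `Rect` and `touches` are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle with positive width and height."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def closures_intersect(a: Rect, b: Rect) -> bool:
    """True if the rectangles, including their boundaries, share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def interiors_intersect(a: Rect, b: Rect) -> bool:
    """True if the open interiors of the rectangles share at least one point."""
    return (a.xmin < b.xmax and b.xmin < a.xmax
            and a.ymin < b.ymax and b.ymin < a.ymax)


def touches(a: Rect, b: Rect) -> bool:
    """Requester's notion of touch: common points exist, but only on the boundaries."""
    return closures_intersect(a, b) and not interiors_intersect(a, b)
```

A formal statement of this kind is what scenario 3 assumes on the requester side. The matchmaker would then have to compare it with equally formal process models of the operations that the systems offer.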
Table 4. Application of the classification framework to example 3 – Topological operators in GeoMedia and Oracle

Information required by the matchmaker: requester conceptualization of touch
  scenario 1: available: ✓; level: implicit; source: requester's mind
  scenario 2: available: ✓; level: implicit; source: requester's mind
  scenario 3: available: ✓; level: explicit, formal; source: domain ontology chosen by the requester to describe his task

Information required by the matchmaker: process models for touch operations (GeoMedia and Oracle) and meet operation (GeoMedia)
  scenario 1: available: ✓ or –; level: implicit or n.a.; source: requester's mind (if he is an expert of the specific GIS) or trial and error (requires GIS expertise), or n.a.
  scenario 2: available: ✓; level: explicit, structured, informal; source: The ISO services standard provides a template for describing services [4]. The ISO spatial schema standard provides information for filling such a template [27]. See also intersect and buffer operations in Table 2.
  scenario 3: available: ✓; level: explicit, formal; source: (geo)processing domain ontology, e.g. based on ISO spatial schema standard [27]
In scenario 1 the attempt to find the appropriate operation is likely to lead to misinterpretations if the requester is not a system expert. In scenario 2 the attempt may be successful if the requester is willing to spend the time to access and understand the available metadata. He will then learn about the meaning of the different operations and will be able to perform the matchmaking manually. In scenario 3 the matchmaking will be successful. Using terms from a domain ontology, the requester can specify his requirements formally and explicitly. Based on this formal and explicit specification, the appropriate operations can be chosen automatically from among the available operations.
5 Conclusions and Future Work
We have presented a framework for classifying approaches to achieving semantic interoperability in the domain of GI web services. The framework focuses on the process of matchmaking as this is where semantic interoperability is ensured. Therefore approaches to achieving semantic interoperability are classified according to the quality of the information that is available to the matchmaker. The application of the framework has been illustrated by analyzing existing approaches to solving examples of semantic interoperability problems. In scenario 1 misinterpretations are likely to occur unless the requester is an expert in the components employed. In scenario 2 misinterpretations are less likely if the requester is willing to spend the time to access and understand the available metadata. In scenario 3 misinterpretations are unlikely, even for non-experts, as automatic matchmaking is applied. However, there is still the possibility that the services required for the requester's query are not available.

The analysis of practical problems presents only a first application of the framework. We believe the framework to be valuable to the GI research community for structuring the domain of semantic interoperability research, because it supports the following tasks:
• The information required for the matchmaking process can be identified.
• The required information can be classified according to the qualities explicitness, structuring and formality.
• It can be assessed which quality level of the required information is appropriate for the task at hand.
• The different levels of explicitness, structuring and formality can easily be associated with predefined scenarios that indicate possible implementation methods.
Taken together, these points allow researchers to classify their approach and to judge whether the applied methods are appropriate for the task at hand.

Future work must look at the role that service discovery plays within the larger task of service composition. It will also be examined whether other sub-tasks play a role in ensuring semantic interoperability in (especially ad-hoc) service composition. For this an abstract model of service composition should be developed. Such a model could be valuable for the standardization efforts in OGC and ISO TC 211, where the task of service composition has not yet been thoroughly explored. It also remains an open question whether examples like those presented in this paper represent a specific (i.e. spatial) kind of semantic heterogeneity or whether they can be treated in the same way as other (non-spatial) semantic problems. If the latter turns out to be possible the framework should be adjusted accordingly.
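The association between quality levels and scenarios can be sketched in a few lines. This is only an illustration under stated assumptions: the three combinations are the ones that appear in Tables 2 to 4 (implicit; explicit, structured, informal; explicit, formal), and the class and function names are hypothetical. Combinations that the tables do not cover are deliberately left unclassified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InformationQuality:
    """Quality of one piece of information that the matchmaker requires."""
    explicit: bool
    structured: bool
    formal: bool


def scenario_for(quality: InformationQuality) -> Optional[int]:
    """Map a quality level to scenario 1, 2 or 3, or None if the tables do not cover it."""
    if not quality.explicit:
        return 1  # implicit, e.g. only in the requester's mind
    if quality.formal:
        return 3  # explicit and formal, e.g. a domain ontology
    if quality.structured:
        return 2  # explicit, structured, informal, e.g. standardized metadata
    return None  # explicit but unstructured and informal: not covered by the examples


# Example: standardized metadata with free text descriptions.
iso_metadata = InformationQuality(explicit=True, structured=True, formal=False)
print(scenario_for(iso_metadata))
```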
6 Acknowledgements

Comments from Werner Kuhn on earlier drafts of this paper helped clarify the ideas. The work presented in this paper has been partially supported by the German Ministry for Education and Science as part of the GEOTECHNOLOGIEN program (grant number 03F0369A) and can be referenced as publication no. GEOTECH-23. Furthermore, support from the European Commission through the ACE-GIS (grant number IST-2002-37724) and BRIDGE-IT (grant number IST-2001-34386) projects is gratefully acknowledged.
7 References

1. Abel, D. J., Gaede, V. J., Taylor, K. L., Zhou, X.: SMART: Towards Spatial Internet Marketplaces. Geoinformatica 3 (1999) 141-164
2. Groot, R., McLaughlin, J.: Geospatial data infrastructure – Concepts, cases, and good practice. Oxford University Press (2000)
3. OGC: OpenGIS Web Services Architecture. OpenGIS Consortium, OpenGIS Discussion Paper OGC 03-025 (2003)
4. ISO/TC-211, OGC: Geographic information – Services (ISO/DIS 19119) v4.3. International Organization for Standardization & OpenGIS Consortium (2002)
5. Egenhofer, M.: Toward the Semantic Geospatial Web. In: Proc. The 10th ACM International Symposium on Advances in Geographic Information Systems (ACM-GIS) (2002)
6. OASIS: OASIS/ebXML Registry Services Specification v2.5. OASIS/ebXML Registry Technical Committee (2003)
7. Bellwood, T., Clément, L., Ehnebuske, D., Hately, A., Hondo, M., Husband, Y. L., Januszewski, K., Lee, S., McKee, B., Munter, J., von Riegen, C.: UDDI v 3.0. (2002)
8. Chinnici, R., Gudgin, M., Moreau, J.-J., Weerawarana, S.: Web Services Description Language (WSDL) v1.2. (2002)
9. Reed, C., Nebert, D.: The Importance of Catalogs to the Spatial Web. An OGC White Paper. (2002)
10. OGC: OWS1.2 UDDI Experiment. OpenGIS Consortium, OGC 03-028 (2003)
11. Constantinescu, I., Faltings, B.: Efficient Matchmaking and Directory Services. Swiss Federal Institute of Technology, Techn. Report IC/2002/77 Lausanne, Switzerland (2002)
12. Paolucci, M., Kawamura, T., Payne, T. R., Sycara, K.: Semantic Matching of Web Service Capabilities. In: Proc. 1st International Semantic Web Conference (ISWC2002) (2002) 333-347
13. Sycara, K., Widoff, S., Klusch, M., Lu, J.: Larks: Dynamic Matchmaking Among Heterogeneous Software Agents in Cyberspace. In: Proc. First International Joint Conference on Autonomous Agents and Multi-Agent Systems (2002) 173-203
14. Bishr, Y.: Overcoming the semantic and other barriers to GIS interoperability. International Journal of Geographical Information Science 12 (1998) 299-314
15. AdV-Arbeitsgruppe ATKIS: ATKIS-Objektartenkatalog Basis-DLM. (2002)
16. Furuti, C. A.: Useful Map Properties. [Online]. Available: http://www.progonos.com/furuti/MapProj/Normal/CartProp/
17. Visser, U., Stuckenschmidt, H.: Interoperability in GIS - Enabling Technologies. In: Proc. 5th AGILE Conference on Geographic Information Science (2002) 291-297
18. Kuhn, W., Raubal, M.: Implementing Semantic Reference Systems. In: Proc. 6th AGILE Conference on Geographic Information Science (2003)
19. Sycara, K., Klusch, M., Widoff, S., Lu, J.: Dynamic Service Matchmaking Among Agents in Open Information Environments. ACM SIGMOD Record 28 (1999) 47-53
20. Uschold, M.: Knowledge level modelling: concepts and terminology. The Knowledge Engineering Review 13 (1998) 5-29
21. Gruber, T. R.: Toward Principles for the Design of Ontologies Used for Knowledge Sharing. International Journal of Human-Computer Studies 43 (1995) 907-928
22. Uschold, M., Gruninger, M.: Ontologies: Principles, Methods and Applications. The Knowledge Engineering Review 11 (1996) 93-136
23. ISO/TC-211: Geographic information – Metadata (ISO/FDIS 19115). International Organization for Standardization (2003)
24. Guarino, N.: Formal Ontology and Information Systems. In: Proc. Formal Ontology in Information Systems (FOIS'98) (1998) 3-15
25. Guarino, N.: Semantic Matching: Formal Ontological Distinctions for Information Organization, Extraction, and Integration. In: Proc. Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology, International Summer School (SCIE97) (1997) 139-170
26. de La Beaujardière, J.: Web Map Service Implementation Specification. Open GIS Consortium (2002) 82
27. ISO/TC-211: Geographic information – Spatial Schema (ISO/DIS 19107). International Organization for Standardization (2002)
28. ISO/TC-211: Text for FDIS 19111 Geographic information - Spatial Referencing by Coordinates. Final Draft Version. International Organization for Standardization (2002)
Chapter IV
Ontology-based Discovery of Geographic Information Services – An Application in Disaster Management

Klien, E., Lutz, M., & Kuhn, W. (2005): Ontology-based Discovery of Geographic Information Services – An Application in Disaster Management. Computers, Environment and Urban Systems (CEUS) (in press).24

Abstract. Finding suitable information in the open and distributed environment of current geographic information web services is a crucial task. Service brokers (or catalogue services) provide searchable repositories of service descriptions but the mechanisms to support the task of service discovery are still insufficient. One of the main challenges is to overcome semantic heterogeneity caused by synonyms and homonyms during keyword-based search in catalogues. This paper presents a practical case study of the extent to which ontology-based service discovery can solve these semantic heterogeneity problems. To this end, we apply the Bremen University Semantic Translator for Enhanced Retrieval as a service broker. The approach combines ontology-based metadata with an ontology-based search. Based on a scenario of finding geographic information services for estimating potential storm damage in forests, it is shown that through terminological reasoning the request finds an appropriate match in a service on storm hazard classes. However, the approach reveals some limitations in the context of geographic web service discovery, which are discussed at the end.
24 © by Elsevier. Reprint permission granted 19 October, 2005. See http://www.sciencedirect.com/science/journal/01989715.
Chapter IV. Ontology-based Discovery of GI Services
ARTICLE IN PRESS
Computers, Environment and Urban Systems xxx (2005) xxx–xxx www.elsevier.com/locate/compenvurbsys
Ontology-based discovery of geographic information services – An application in disaster management
E. Klien *, M. Lutz, W. Kuhn
Institute for Geoinformatics, University of Muenster, Robert-Koch-Strasse 26-28, Muenster 48149, Germany
Received 12 March 2004; accepted in revised form 17 April 2005
© 2005 Elsevier Ltd. All rights reserved.
Keywords: GI service discovery; Semantic heterogeneity; Ontologies; Semantic matchmaking
* Corresponding author. Tel.: +49 251 8333724; fax: +49 251 8339763. E-mail address: [email protected] (E. Klien).
0198-9715/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.compenvurbsys.2005.04.002
E. Klien et al. / Comput., Environ. and Urban Systems xxx (2005) xxx–xxx
1. Introduction

Geographic information is the key to effective planning and decision-making in a variety of application domains. So-called intelligent web services permit easy access and effective exploitation of distributed geographic information (GI) for all citizens, professionals, and decision-makers (Bishr & Radwan, 2000; Brox, Bishr, Kuhn, Senkler, & Zens, 2002).

This paper focuses on the task of service discovery, which is a crucial task in the open and distributed environment of GI web services. Often, effective service discovery requires an extensive search for appropriate services across multiple application domains. Catalogues support discovery, organisation, and access of geographic information and thus help the user to find information that exists (OGC, 2004). However, the use of different vocabulary in the different application domains might lead to semantic heterogeneity problems when only simple keyword-based search is employed to find relevant information in a catalogue. Such problems arise when terms are unknown, the meaning of elements is not intuitively clear, or the understanding of the information provider differs from that of the requestor (Schuster & Stuckenschmidt, 2001).

Explication of knowledge by means of ontologies is a possible approach to overcome the problem of semantic heterogeneity, as ontologies can be used for the identification and association of semantically corresponding concepts (Wache et al., 2001). The task of ontology-based information brokering has been addressed by the Bremen University Semantic Translator for Enhanced Retrieval (BUSTER) (http://www.semantic-translation.de/). Among other things, this system provides an ontology-based approach with logical reasoning on metadata for retrieving information sources (Neumann, Schuster, Stuckenschmidt, Visser, & Vögele, 2001).

The ideas presented in this paper are well known and applied in the context of the Semantic Web (http://semanticweb.org/). To increase precision and recall during service discovery in current service registries (e.g. OGC catalogue services or UDDI registries), several approaches that are based on reasoning with semantic service descriptions that refer to ontologies have been proposed (Kawamura, De Blasio, Hasegawa, Paolucci, & Sycara, 2003; Paolucci, Kawamura, Payne, & Sycara, 2002; Sirin, Hendler, & Parsia, 2003). Several XML-based mark-up languages are available for the description of ontologies including RDF Schema (W3C, 2004) and OWL (Antoniou & Van Harmelen, 2003).

We have decided to focus our work on the mechanisms of semantic matchmaking by means of terminological reasoning. In order to remain independent from current web implementations, we use a basic description logic as a representation language. The presented example can easily be reconstructed using a terminological reasoner like RACER. We examine to what extent this approach can contribute to solving semantic heterogeneity problems that can occur during GI service discovery. To this end, we apply BUSTER as service broker in the GI service discovery scenario of this work. Our long-term goal is to integrate the mechanisms presented in this work into geographic services architectures (Klien, Einspanier, Lutz, & Hübner, 2004).
The remainder of this paper is structured as follows. In Section 2 we describe a motivating example for our research. Section 3 gives an introduction to GI web service discovery and points out semantic heterogeneity problems that may arise. The approach that is applied for ontology-based GI web service discovery is presented in Section 4, followed by the method descriptions in Section 5. In Section 6 the capabilities of the approach are examined by applying the presented approach for service discovery to the motivating example. The paper concludes with a discussion of the problems encountered and an outlook on the remaining and emerging research questions.
2. Motivating example: discovering services for estimating storm damage in forests

The motivating example is set in the area of disaster management and mitigation. Heavy storms, such as the winter storm “Lothar” over Central Europe in December 1999, may cause severe road blockage by windfall timber. We use the motivating example throughout the paper to illustrate problems caused by semantic heterogeneity and to apply the presented approach for service discovery. However, our work is designed to be domain independent and is not restricted to this example.

Susan is the official on duty in the regional authority responsible for ensuring road safety. After a heavy storm, she coordinates the assignment of the Governmental Disaster Relief Organisation (Technisches Hilfswerk, THW) and of the Federal Armed Forces (Bundeswehr) in the affected areas on a multi-regional scale. Susan has to keep track of where and to what extent the local authorities need help in order to clear the road blockages as quickly as possible. In order to coordinate the clearing operations effectively, Susan requires an overview of which roads are most likely to be affected by fallen timber. She can obtain this overview by overlaying road data with information on potential storm damage in the forests of the region. In order to do that she first has to acquire information on the susceptibility of forests to storm damage. Finding suitable information sources is the focus of this scenario.

The availability of geographic information with adequate quality (currency, completeness) is crucial for sound decisions at a local, regional and global level. GI web services are a key technology in compiling and providing the necessary information for decision makers in an ad hoc fashion (Bernard, Einspanier, Haubrock, et al., 2003). Since each situation will require different information to solve the problem at hand, service discovery, i.e. finding suitable services for answering a given question, becomes a crucial task.
3. GI web service discovery

GI web services offer information products rather than raw datasets that only geographic information system (GIS) experts can interpret. The open and distributed GI web service environment opens a wide range of new possibilities for acquiring, processing and analysing geographic information without the need for GIS expert knowledge (Greve, 2002). The World Wide Web supplies the basic infrastructure for system interoperability, i.e. distributed use and multiple exploitation of data and systems. Furthermore, the Open Geospatial Consortium (OGC) and the International Organisation for Standardisation (ISO) have developed geoinformation technology standards providing the essential basis for syntactic interoperability and cataloguing of GI web services (Bernard, Einspanier, Haubrock, et al., 2003). In an environment where services are previously unknown, a service that is appropriate for answering a given question first has to be discovered from among a large number of available services. Service discovery, thus, is a crucial task that will become even more important with the emerging Semantic Geospatial Web (Egenhofer, 2002).

3.1. General (GI) web service architecture

Fig. 1 depicts the general “publish-find-bind” pattern of web service architectures, which has been adopted by OGC and ISO (ISO/TC-211 & OGC, 2002). Three roles can be identified. Service providers offer data or applications as services. A service is published with a service broker by advertising a declarative metadata description of the service's properties, e.g. its input, output or performance. Service requestors search for services that provide the information needed to solve the problem at hand (Nebert & Reed, 2002). The find-operation thus involves searching for an appropriate service by querying the service broker for relevant matches for a given question. A service broker (or catalogue service) is an intermediary service whose responsibility is to bring a service requester and a service provider together (OGC, 1996), and thus may be considered the core of any GI web service environment. After a service offered by a provider has been identified as matching the service requestor's requirements, it is bound to the service requestor. The service is then executed, passing data and instructions across common interfaces (Nebert & Reed, 2002).
Fig. 1. The “publish, find and bind” pattern of web service architectures (Zhang & Hutchison, 2002, modified).
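The three roles of the pattern can be illustrated with a small in-memory sketch. It is not an implementation of an OGC catalogue service: the classes `ServiceDescription` and `ServiceBroker`, the keyword-based `find` operation and the sample service are assumptions made for illustration, and a local function stands in for the remote service interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceDescription:
    """Declarative metadata that a provider advertises (hypothetical structure)."""
    name: str
    keywords: List[str]
    endpoint: Callable[[str], str]  # stands in for a remote service interface


@dataclass
class ServiceBroker:
    """Intermediary that stores descriptions and answers find requests."""
    registry: Dict[str, ServiceDescription] = field(default_factory=dict)

    def publish(self, description: ServiceDescription) -> None:
        self.registry[description.name] = description

    def find(self, keyword: str) -> List[ServiceDescription]:
        wanted = keyword.strip().lower()
        return [d for d in self.registry.values()
                if wanted in (k.lower() for k in d.keywords)]


def bind_and_execute(description: ServiceDescription, request: str) -> str:
    """Bind to the discovered service and pass the request across its interface."""
    return description.endpoint(request)


# Provider publishes, requestor finds and binds.
broker = ServiceBroker()
broker.publish(ServiceDescription(
    name="road-data",
    keywords=["roads", "topography"],
    endpoint=lambda area: f"roads in {area}",
))
matches = broker.find("roads")
result = bind_and_execute(matches[0], "region A")
print(result)
```

In this sketch the find operation compares keywords literally, which is the kind of search whose limits are discussed in Section 3.2.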
3.2. Problems caused by semantic heterogeneity during service discovery

Although standards from bodies like the OGC provide the basis for syntactic interoperability, information that is created in one context is often of limited use in another context (Bernard, Einspanier, Haubrock, et al., 2003), because of insufficient means for meaningful interpretation. This problem is referred to as the need for “semantic interoperability among autonomous and heterogeneous systems” (Goh, Bressan, Madnick, & Siegel, 1999). Problems caused by semantically heterogeneous descriptions play a crucial role during the task of finding relevant information within a GI web service environment (Lutz, Riedemann, & Probst, 2003). Heterogeneity is an inherent problem in the geoscientific area because of the wide variety of potential applications. In Bishr (1998), semantic heterogeneity is defined as the consequence of different conceptualisations and database representations of a real world fact. Two types of semantic heterogeneity can be distinguished:
• Cognitive heterogeneity: Because of different perspectives on the same real world facts there may not be a common base of definitions of the underlying facts between two disciplines (domains). Problems can occur if these cognitive differences are concealed because the same term is used for different concepts.
• Naming heterogeneity: The same real world facts are understood in the same way but are named differently.

The problems that semantic heterogeneity can cause during GI web service discovery can be illustrated by extending our motivating example introduced in Section 2. John is a forest ecologist who has developed a model for calculating storm hazard classes for forest stands on the basis of five influencing factors. In order to make his results open to the public, John publishes his model as a GI web service. This service returns the forest stands (polygons) of a specified area classified in storm hazard classes. John has developed his model independently from Susan's question, but nevertheless, the resulting information may be used for estimating damage after storms (as well as for a variety of other applications, e.g. sustainable forest management). It is not necessarily deducible for a non-expert in forest ecology that the service calculating storm hazard classes could also be used for estimating storm damage. Both types of semantic heterogeneity introduced in Bishr (1998) can lead to problems if Susan performs a simple keyword-based search, e.g. using the terms “estimate storm damage”:
1. Naming heterogeneity: If John has described his model in its specific context with the terms “storm hazard model”, Susan will fail to find the storm hazard service although it offers relevant answers for her question (Fig. 2).
2. Cognitive heterogeneity: Susan's keyword search could also result in finding services that are not appropriate for answering her question, thus indicating the occurrence of cognitive heterogeneity. This would be the case if, for example, a service for depicting storm damages in forests over the last three decades was annotated with keywords like “storm damage” and “forest”.
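Both failure cases can be reproduced with a toy catalogue. The sketch below assumes a very simple matching rule (a service is returned if one of its keyword phrases occurs literally in the query), which is an assumption for illustration and not the behaviour of any particular catalogue product. The service names are hypothetical, while the keyword phrases are the ones used in the example above.

```python
from typing import Dict, Set

# Keyword annotations of two catalogue entries (service names are hypothetical).
catalogue: Dict[str, Set[str]] = {
    "john_storm_hazard_service": {"storm hazard model"},
    "historic_storm_damage_service": {"storm damage", "forest"},
}

# Only John's service is actually suitable for Susan's task.
relevant: Set[str] = {"john_storm_hazard_service"}


def keyword_search(query: str) -> Set[str]:
    """Return all services with at least one keyword phrase contained in the query."""
    text = query.lower()
    return {name for name, keywords in catalogue.items()
            if any(keyword in text for keyword in keywords)}


hits = keyword_search("estimate storm damage")
missed = relevant - hits        # naming heterogeneity: relevant service not found
false_hits = hits - relevant    # cognitive heterogeneity: unsuitable service found

print("missed:", sorted(missed))
print("false hits:", sorted(false_hits))
```

With these annotations the search misses John's service and returns the historic damage service instead, so both recall and precision suffer for the same query.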
Fig. 2. Semantic heterogeneity problems caused by different domain perspectives and terminologies.
These examples show that keywords used in free-text entries have to be considered a poor way to capture the semantics of a query or item (Bernstein & Klein, 2002). Consequently, in order to solve these heterogeneity problems an approach is needed that exceeds the capabilities of current keyword-based search facilities in catalogues. So far, the problem has been acknowledged (Bernard, Einspanier, Lutz, & Portele, 2003; Bishr & Radwan, 2000; Egenhofer, 2002; Lutz et al., 2003) but still no standardised technological solution exists within the GI web service community. Accepting the diversity of geographic application domains, such an approach would need to enable navigating differences in meaning (Harvey, Kuhn, Pundt, Bishr, & Riedemann, 1999). In Stuckenschmidt (2002), it is suggested that explicit context models be used to re-interpret information in the context of a new application. Ontologies have become popular in information science as they can be used to explicate contextual information. In the following we present an ontology-based approach for overcoming semantic heterogeneity problems during service discovery, which has been realized in the BUSTER system.
4. Ontology-based approach to service discovery

BUSTER is a scientific prototype developed at the Centre for Computing Technologies (TZI) in Bremen for supporting the specific tasks of information retrieval and information integration in distributed and heterogeneous environments. The BUSTER system offers functionalities in order to solve heterogeneity problems on three different levels (syntactic, schematic and semantic) by combining several technologies including standard mark-up languages, mediator systems, ontologies and knowledge-based classifiers (Visser, Stuckenschmidt, Wache, & Vögele, 2000). In the context of GI web service discovery BUSTER becomes interesting as one of the functionalities it provides is an ontology-based search with terminological reasoning on metadata for finding information sources (Neumann et al., 2001).
In this section we present the general approach employed in BUSTER for ontology-based search for information sources. How we apply this approach to GI web services is described in Section 5.

4.1. Ontologies

The term “ontology” has been used in information sciences with several meanings. Gruber (1993) introduced the term “ontology” to mean an “explicit specification of a conceptualisation”. This initial definition was slightly modified in Borst (1997), where ontologies are defined as “a formal specification of a shared conceptualisation”. Merging both definitions, we define an ontology as “an explicit formal specification of a shared conceptualisation” (Studer, Benjamins, & Fensel, 1998). According to Uschold (1998) a conceptualisation is the way of thinking about some domain. A conceptualisation may be implicit, e.g. existing only in one's head, or embodied in a piece of software. To make it explicit means to define both the type of concepts used and the constraints on their use (Benjamins, Fensel, & Gomez-Perez, 1998). Finally, in order to make it machine-readable, it has to be formalised in some representation language. All this adds up to making the ontology a perfect candidate for communicating a shared and common understanding of some domain across people and computers (Studer et al., 1998).

4.2. Hybrid ontology approach

The ontology approach of BUSTER is based on the idea of having a source-independent shared vocabulary for one domain (Fig. 3). It is assumed that the members of a domain share a common understanding of certain concepts, i.e. no further explication is needed. These concepts form the basic terms contained in a shared vocabulary. The shared vocabulary is usually built by an independent domain expert, who is familiar with typical tasks and problems in a domain.
Fig. 3. BUSTER approach to the use of ontologies (Visser & Stuckenschmidt, 2002, modified).
Once a shared vocabulary exists, the terms can be used to make explicit the contextual information of the information sources that are to be integrated (Visser & Stuckenschmidt, 2002), e.g. to build an application ontology for an information source. Thus the vocabulary has to be general enough to be used across all information sources that are to be annotated within the domain, but specific enough to make meaningful definitions possible (Schuster & Stuckenschmidt, 2001). The task of constructing an application ontology lies in the responsibility of the provider of the information source.
4.3. Ontology-based metadata
In order to register an information source with BUSTER, a metadata description in the form of a Comprehensive Source Description (CSD) is needed. Each CSD consists of metadata that describe technical and administrative details of the data source as well as its structural and syntactic schema and annotations (Visser et al., 2000). The CSD is based on the metadata standard Dublin Core and formalised in XML/RDF. Visser and Stuckenschmidt (2002) state that using XML (Extensible Markup Language) is a suitable way of exchanging data with a well-defined syntax and structure and that simple RDF (Resource Description Framework) provides a uniform syntax for exchanging meta-information in a machine-readable format. The application ontology is referenced in the CSD and thus adds capabilities for reasoning about meaning. Setting up a CSD is the task of the provider of the information source.
4.4. Knowledge representation language
A knowledge representation language is used to formally describe the concepts of an ontology. In order to perform the tasks of searching, of discovering relationships among concepts, and of looking for inconsistencies in the ontology, the meaning of concepts in the ontology must be represented in a way that can be manipulated by machines.
There are a variety of languages that can be used for the representation with varying characteristics in terms of their expressiveness, ease of use and computational complexity (Stevens, Goble, & Bechhofer, 2000). Description Logics (DL) is a family of languages that describe knowledge in terms of concepts and restrictions on roles. The main idea behind DL is to provide means to describe structured knowledge in a way that can be accessed and reasoned with (Nebel, 1996). DL theory is divided into a terminological part (TBox) and an assertional part (ABox). TBox deals with the definition of concepts, while ABox asserts facts about individuals (single objects). With respect to the applied reasoner in the BUSTER system (see next paragraph), the DL language used for representing the concepts of the shared vocabulary and the application ontology is SHIQ. The ‘‘S’’ stands for the basic DL that SHIQ is based on, i.e. ALCR+. This basic DL is extended with role hierarchies (‘‘H’’), inverse roles (‘‘I’’), and qualifying number restrictions (‘‘Q’’) (Horrocks, Sattler, & Tobies, 2000).
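As an illustration of the TBox/ABox distinction, the following LaTeX fragment writes one terminological axiom and two assertions. The concept and role names are those of the broadleafForest example used in Section 4.5; the individuals stand1 and beech1 are hypothetical and serve only to show what an ABox assertion looks like. This is a sketch of the notation, not an excerpt from the BUSTER ontologies.

```latex
% TBox: a concept definition (terminological knowledge)
\[
\mathit{broadleafForest} \equiv \mathit{landuseCategory} \sqcap
  \forall \mathit{mainVegetationType}.\mathit{broadleafTree}
\]
% ABox: assertions about individuals (stand1 and beech1 are hypothetical)
\[
\mathit{broadleafForest}(\mathit{stand1}), \qquad
\mathit{mainVegetationType}(\mathit{stand1}, \mathit{beech1})
\]
```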
Generally, SHIQ enables algorithms for TBox reasoning as well as for ABox reasoning. However, in the current prototype of the BUSTER system only TBox reasoning is considered, i.e. the shared vocabulary and the application ontologies of a domain are implemented in the same TBox. For more information on SHIQ see Horrocks et al. (2000).
4.5. Ontology-based search
In BUSTER, the task of automatically mapping between concepts of different application ontologies within the same domain is performed by the DL system RACER (Renamed ABox and Concept Expression Reasoner). The process is called semantic translation through re-classification (Visser et al., 2000), i.e. BUSTER allows the classification of data into another context through subsumption reasoning. Determining whether one description subsumes another one, that is, whether the first is more general than the second, is one important reasoning task of DL systems. Formally, subsumption can be defined as follows: in a terminology T containing concepts C and D, C is subsumed by D if in every model of T the set denoted by C is a subset of the set denoted by D (Donini, 2003). With subsumption tests, one can organise the concepts of a terminology into a hierarchy according to their generality. A concept description can also be conceived as a query, describing a set of objects one is interested in (Donini, 2003). Thus, all concepts that are subsumed by the query concept can be considered to also satisfy the query. A number of algorithms exist to compute subsumption. In order to present the general idea, we introduce a structural subsumption algorithm for a simple DL (FL0) taken from Donini (2003), where subsumption algorithms for more expressive DLs are also described:
Let $A_1 \sqcap \dots \sqcap A_m \sqcap \forall R_1.C_1 \sqcap \dots \sqcap \forall R_n.C_n$ be the normal form of the FL0-concept description C, and $B_1 \sqcap \dots \sqcap B_k \sqcap \forall S_1.D_1 \sqcap \dots \sqcap \forall S_l.D_l$ be the normal form of the FL0-concept description D. Then C is subsumed by D ($C \sqsubseteq D$) iff the following two conditions hold:
(i) For all i, $1 \le i \le k$, there exists j, $1 \le j \le m$, such that $B_i = A_j$.
(ii) For all i, $1 \le i \le l$, there exists j, $1 \le j \le n$, such that $S_i = R_j$ and $C_j$ is subsumed by $D_i$ ($C_j \sqsubseteq D_i$).
This proposition can be illustrated in the following (simplified) example: the concept broadleafForest is defined as a subconcept of landuseCategory whose mainVegetationType role is restricted to broadleafTree. Now it is possible to define a queryConcept as a subconcept of landuseCategory whose mainVegetationType role is restricted to Tree (of course, this concept could also be called forest). Here broadleafForest plays the role of C and queryConcept the role of D. As both broadleafForest and queryConcept have the same superconcept, and Tree subsumes broadleafTree, it can be deduced that queryConcept subsumes broadleafForest, and that hence this concept represents an answer to the query represented by the queryConcept.
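The structural subsumption test for FL0 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm implemented in RACER: the `Concept` class and the function `is_subsumed_by` are hypothetical names, and because plain FL0 has no concept hierarchy, broadleafTree is unfolded here into the conjunction of `Tree` and an invented atom `Broadleaved`. The test checks that every conjunct of the more general description reappears in the more specific one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """FL0 concept in normal form: atomic names plus one value restriction per role."""
    atoms: frozenset = frozenset()
    restrictions: tuple = ()  # pairs (role name, filler Concept)


def is_subsumed_by(c, d):
    """Return True if c is subsumed by d, i.e. d is at least as general as c."""
    # Condition (i): every atomic conjunct of d also occurs in c.
    if not d.atoms <= c.atoms:
        return False
    # Condition (ii): every value restriction of d has a counterpart in c
    # on the same role whose filler is itself subsumed by d's filler.
    c_fillers = dict(c.restrictions)
    for role, d_filler in d.restrictions:
        c_filler = c_fillers.get(role)
        if c_filler is None or not is_subsumed_by(c_filler, d_filler):
            return False
    return True


# Assumption: broadleafTree is unfolded into Tree plus a hypothetical atom "Broadleaved".
tree = Concept(atoms=frozenset({"Tree"}))
broadleaf_tree = Concept(atoms=frozenset({"Tree", "Broadleaved"}))

broadleaf_forest = Concept(
    atoms=frozenset({"landuseCategory"}),
    restrictions=(("mainVegetationType", broadleaf_tree),),
)
query_concept = Concept(
    atoms=frozenset({"landuseCategory"}),
    restrictions=(("mainVegetationType", tree),),
)

answers_query = is_subsumed_by(broadleaf_forest, query_concept)
```

In this sketch `answers_query` is true, while the reverse test fails, which mirrors the example: queryConcept subsumes broadleafForest but not the other way round.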
5. Applying the BUSTER approach to GI service discovery
In the following, the methods for applying the BUSTER approach to the GI web service discovery scenario are presented.
5.1. BUSTER as service broker
In general, ontology-based search with BUSTER can be divided into two phases: a data acquisition phase and a query phase. The first phase includes all necessary server-side preparations for registering an information source with BUSTER. The second phase refers to the application of the BUSTER client by a human user in order to find a specific information source. When applying the BUSTER approach to the "publish-find-bind" pattern of web service architectures, the acquisition phase corresponds to the publish-operation (Fig. 4). John, the service provider in our motivating example, can publish his storm hazard service by registering the BUSTER-specific CSD for the service. The registration of a CSD is essential to make the service detectable via the BUSTER client application. The service requestor (i.e. Susan) searches for appropriate services by defining a query for a service for estimating storm damage in forests during the query phase of BUSTER (which corresponds to the find-operation). As described in the previous section, the following tasks have to be accomplished in order to prepare and to conduct the publish-operation for the storm hazard service within the BUSTER framework:
Fig. 4. The service discovery scenario transferred to the ‘‘publish-find-bind’’ pattern.
1. A shared vocabulary for (at least parts of) the domain of forest ecology has to be defined.
2. An application ontology for the service's output (i.e. storm hazard classes) based on the shared vocabulary has to be built.
3. Finally, a CSD for the service on storm hazard has to be written.
5.2. Defining a shared vocabulary
For building a shared vocabulary in the domain of forest ecology, the method introduced in Schuster and Stuckenschmidt (2001) is adopted. The authors suggest an iterative approach through five steps:
1. Finding bridge concepts: Bridge concepts are query templates that contain admissible combinations of properties and property values, which can be seen as "points of entry" into the shared vocabulary of a domain. After choosing a query template, users can use the provided properties and property values to specify value constraints for their query. This procedure significantly reduces the number of terms that are presented to the user and prevents inexperienced users from defining queries that do not make sense.
2. Defining properties: The builder of the shared vocabulary has to define properties that describe the chosen bridge concept.
3. Finding property values: The property values are the "fillers" of the defined properties. They are the main part of the shared vocabulary since they are used to define the concepts in the application ontology, which is built later on.
4. Adapting the shared vocabulary: During the first development cycles, the shared vocabulary will probably not be expressive enough. Schuster and Stuckenschmidt (2001) suggest adapting the shared vocabulary by building a special "support ontology" in order to remedy the problem.
5. Refining the definitions: Following the "evolving" life cycle, the engineer can step back at any time to modify, add and remove ontology definitions.
For identifying bridge concepts, properties and property values, several sources of information have been chosen. Two standard works on forest ecology are examined: Barnes, Zak, Denton, and Spurr (1998) and Otto (1994). Furthermore the "General Multilingual Environmental Thesaurus" (GEMET) is taken into account.
GEMET provides a core terminology of generalized environmental terms and definitions, covering more than 5,400 terms (Nax & Lethen, 1999). Within the scope of this work, the examination focuses on terms dealing with disturbances and the susceptibility in an ecosystem. Furthermore, the authors decided to extend this approach by taking into account the specific requirements of describing a service rather than an information item. It is an unrealistic expectation to find a complete set of characteristic attributes for a real world object. Describing objects based on their use, i.e. based on their function in the cognitive world, however, has the potential to overcome this problem (Bishr, 1998).
Correspondingly, Kuhn (2001) suggests that in order to make geographic information more useful and usable, ontologies should be designed with a focus on human activities in geographic space. This approach seems reasonable for defining a shared vocabulary that is to be integrated into a broker for web services. GI web services are needed in order to answer specific questions, i.e. a service requestor recognises objects based on their use. Therefore, the shared vocabulary should provide respective concepts that indicate functions in terms of "is usable for". For that reason, in addition to the concepts identified by the method introduced above, the actions or tasks (i.e. functions) related to a bridge concept are included as property values.
5.3. Building an application ontology
The application ontology is the formalised, explicit description of the GI service for modelling storm hazard. In the context of this work, only the information output of this service is relevant. For describing the respective concept a clear understanding of the underlying model is needed. John's model of storm hazard calculates the degree of the susceptibility of a forest stand to storm. The result is expressed in a storm hazard class. The impact of a storm depends on the susceptibility of a forest stand at that particular moment. Therefore each storm hazard class (indicating the susceptibility of a forest stand) is assigned a critical wind speed (Table 1). The model can be useful in a variety of applications. It can be applied for sustainable forest management, predictions on occurrences of damage, and advanced warnings for all kinds of stakeholders related to forests. With the additional knowledge about the maximum wind velocity over a forest stand, a rough estimation of damage after a storm is possible as well. John has to describe the output of his service (i.e. the concept of a storm hazard class) by using only terms offered in the shared vocabulary of the forest ecology domain.
On this basis, he tries to make the semantics of a storm hazard class as explicit as possible.

Table 1
The underlying model for the service on storm hazard
Storm hazard class | Critical wind speed
A very high | >60 km/h
B high | >80 km/h
C intermediate/average | >120 km/h
D inferior | >200 km/h

5.4. Writing a CSD
Writing a CSD does not require a specific method. The structure is specified by the requirements of BUSTER (Section 4.3) and can be adopted for any new service to be published. While it is possible to include references to multiple concepts of the application ontology in each CSD, it cannot be explicitly stated which of the service properties (e.g. input, output, functions) the concept refers to. In order to annotate all properties of a service, a modification of the CSD or of the reasoning process would be required. This is further discussed in Section 7. In our example, Susan is interested in finding information for estimating storm damage in forests, i.e. the information output of a service. Therefore, publishing a CSD that includes a reference only to the concept of the service's output is sufficient for making the find-operation feasible.
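The rough damage estimation mentioned for the storm hazard model (Table 1) can be sketched as a simple threshold lookup. The snippet below is only an illustration: the function names are hypothetical, and it assumes that damage is to be expected as soon as the maximum wind velocity over a stand exceeds the critical wind speed of its storm hazard class, which is one plausible reading of Table 1 rather than a statement of John's actual model.

```python
# Critical wind speeds in km/h per storm hazard class, transcribed from Table 1.
CRITICAL_WIND_SPEED_KMH = {"A": 60, "B": 80, "C": 120, "D": 200}


def damage_expected(hazard_class, max_wind_speed_kmh):
    """Assumed rule: damage is expected once the wind exceeds the critical speed."""
    return max_wind_speed_kmh > CRITICAL_WIND_SPEED_KMH[hazard_class]


def classes_at_risk(max_wind_speed_kmh):
    """List the hazard classes for which damage would be expected at this wind speed."""
    return [
        hazard_class
        for hazard_class in sorted(CRITICAL_WIND_SPEED_KMH)
        if damage_expected(hazard_class, max_wind_speed_kmh)
    ]


at_risk = classes_at_risk(100)
```

Under these assumptions, a storm with a maximum wind velocity of 100 km/h would affect stands in classes A and B only.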
6. Publishing and discovering a GI service with the BUSTER approach
This section describes how the presented approach for ontology-based service discovery can be applied to our motivating example. We first present the shared vocabulary, the application ontology and the CSD required for registering the service in BUSTER (Sections 6.1–6.3). The ontologies and CSD used for this application are stored and accessible on the following website: http://ifgi.uni-muenster.de/~klien/sources/. Secondly, we apply these components in order to realise Susan's query for services for estimating storm damage in forests (Section 6.4).
6.1. The shared vocabulary
There are a variety of bridge concepts that could be defined for the domain of forest ecology. In the context of our scenario the bridge concept disturbance is relevant. The properties and property values in Table 2 are only a small extract from the totality of possible values, but they are sufficient for the purpose of this work. As introduced in Section 5.2, the notion of functions (is_usable_for) is also considered. The shared vocabulary is registered with BUSTER.
6.2. The application ontology
Fig. 5 contains an extract of the application ontology, describing the general concept of a storm hazard class (Section 5.3) by using the shared vocabulary of the forest ecology domain. In the following, the single expressions used for describing the concepts are explained. A storm hazard class describes a disturbance, which is caused by a storm (a), affects vegetation (b) and occurs in forests or pastures (c). It is produced by a simulation (e), can be used for depicting or regenerating a disturbance or for estimating the damage caused by it (f), and has a specific critical wind speed (g). The application ontology comprises more concepts, e.g. model of storm hazard, which is defined via the input influencing factors and the output concept storm hazard class. However, as for our example only the semantics of the service's information output are of interest, these other concepts are not further described here.
Table 2
Extract of the shared vocabulary for the domain of forest ecology
Bridge concept | Properties | Range | Fillers
Disturbance_measure | _disturbance_caused_by | Cause | Storm, Fire, Acidification, Pest_Infestation
 | _disturbance_affects | Affected_landuse | Vegetation, Fauna, Soil
 | _disturbance_occurs_in | Affected_entity | Forest, Pasture, Wetland
 | _is_produced_by | Procedure | Measurement, Simulation
 | _is_usable_for | Function | Depiction, Prevention, Regeneration, Estimating_Damage
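To make the role of a bridge concept as a query template more concrete, the extract in Table 2 can be written down as a nested mapping and used to check that a selection contains only admissible terms. This is a minimal sketch: the data structure and the function `check_selection` are hypothetical and do not reflect how BUSTER stores or validates its vocabularies.

```python
# Table 2 transcribed as: bridge concept -> property -> admissible fillers.
SHARED_VOCABULARY = {
    "Disturbance_measure": {
        "_disturbance_caused_by": {"Storm", "Fire", "Acidification", "Pest_Infestation"},
        "_disturbance_affects": {"Vegetation", "Fauna", "Soil"},
        "_disturbance_occurs_in": {"Forest", "Pasture", "Wetland"},
        "_is_produced_by": {"Measurement", "Simulation"},
        "_is_usable_for": {"Depiction", "Prevention", "Regeneration", "Estimating_Damage"},
    }
}


def check_selection(bridge_concept, selection):
    """Return a list of problems; an empty list means every term is admissible."""
    properties = SHARED_VOCABULARY.get(bridge_concept)
    if properties is None:
        return [f"unknown bridge concept: {bridge_concept}"]
    problems = []
    for prop, fillers in selection.items():
        if prop not in properties:
            problems.append(f"unknown property: {prop}")
            continue
        # Report every selected filler that the vocabulary does not offer for this property.
        for filler in sorted(set(fillers) - properties[prop]):
            problems.append(f"{filler} is not a filler of {prop}")
    return problems


valid = check_selection(
    "Disturbance_measure",
    {"_disturbance_caused_by": ["Storm"], "_is_usable_for": ["Estimating_Damage"]},
)
invalid = check_selection("Disturbance_measure", {"_disturbance_caused_by": ["Flood"]})
```

Restricting selections in this way is what keeps users within the well-known terms of the domain; `Flood` is rejected simply because it is not part of the extract.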
Fig. 5. Notation for the concept of a storm hazard class.
The application ontology can be put on any server for access. The web address is referenced in the CSD of the GI web service.
6.3. The comprehensive source description
The CSD of the service for modelling storm hazard comprises a reference to the concept storm hazard class in the tag. The respective application ontology is referenced in the tag (Fig. 6). The other tags contain administrative and technical details that become interesting once a service has been identified as being suitable for matching a specific query. The CSD is stored on any server and registered via its web address with BUSTER.
Fig. 6. CSD of the service for modelling storm hazard.
[Figure 7 (diagram). Recoverable labels: the service requestor (disaster management) requires information sources for estimating storm damage in region xy, queries the domain of forest ecology, and defines his requirements on the basis of the shared vocabulary; the service broker (BUSTER) offers the shared vocabulary via "defined concept search", finds matching concepts and returns information on the service. Elements in the forest ecology domain: the shared vocabulary (forest ecology), the application ontology (model on storm hazard classes) and the Comprehensive Source Description (service description for the service on storm hazard), linked by "has reference" and "based on" relations.]
Fig. 7. Procedure of the user's defined concept query for services for estimating storm damage in forests.
6.4. Query phase
Once the service for modelling storm hazard has been annotated with all the information needed, it can be found via the client application of BUSTER. The system offers two different types of queries: the simple query, which uses existing concepts from application ontologies, is not further discussed in this paper. The other type is the defined concept query, which allows the user to define a concept based on a given shared vocabulary, which fits his understanding of a concrete concept (Visser & Stuckenschmidt, 2002). In the following, this user defined concept is referred to as query concept. Fig. 7 shows the procedure of the defined concept query in the context of our scenario, i.e. the query for services for estimating storm damage in forests.
Fig. 8. Query-template Disturbance_Measure and its properties and fillers. The selected fillers are to represent Susan's understanding of a concept for storm damage estimation. This screenshot depicts the current version of the client application of the BUSTER Prototype. (You can try this example at http://geoshare.tzi.de/ConceptDefinitionClient/. The application is under constant development and therefore subject to change. This may affect the appearance or feasibility of this example.)
Fig. 9. Query concept and concept of storm hazard class loaded and automatically re-classified in RACER.
In our motivating example, after having chosen the domain of forest ecology, Susan selects a query-template provided by the BUSTER server, which is based on the domain's shared vocabulary. The templates correspond to the bridge concepts of the shared vocabulary and contain their properties (slots) and values (fillers). This ensures that Susan only makes use of well-known terms when defining a query concept. Fig. 8 shows the selection options offered by the query-template Disturbance_Measure. On this basis Susan is asked to select reasonable values for the given properties. Susan chooses the filler STORM for the property DISTURBANCE_CAUSED_BY and the filler ESTIMATING_DAMAGE for the property IS_USABLE_FOR. She is not interested in services that deal with the other options. The filled query-template is then automatically translated into a logical term. Susan is not concerned with the task of formalising the explicit description of the query concept. During the query process all CSDs related to the current domain of forest ecology are parsed for the tag. The tag points to the respective application ontology (see Fig. 6). These ontologies are downloaded and transferred into the RACER inference machine. After re-classification (Section 4.5), all sub-concepts of the query concept are presented as results (Visser & Stuckenschmidt, 2002). Susan's query results in the discovery of the service on storm hazard, as the concept of storm hazard class is identified as being a subconcept of the user's query concept. Fig. 9 depicts the results in RACER (visualised with the graphical user interface RICE). The relevant concepts of the shared vocabulary are shown on the left. The result of the re-classification of the concept of storm hazard class in the new context of the query concept is shown on the right. Apart from the scenario query, there are a variety of other defined concept queries that would succeed in discovering the service on storm hazard, e.g. searching for information in order to depict disturbances that affect vegetation.
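The query phase described above can be approximated by a small matching routine. The sketch below is a strong simplification and not RACER's reasoning procedure: it assumes that each registered output concept is a conjunction of (property, filler) pairs, so that a query concept subsumes a registered concept whenever all of the query's pairs occur in it. The registry, the service names and the second service are hypothetical.

```python
# Hypothetical registry: service name -> (property, filler) pairs of its output concept.
REGISTRY = {
    "storm_hazard_service": frozenset({
        ("caused_by", "Storm"),
        ("affects", "Vegetation"),
        ("occurs_in", "Forest"),
        ("occurs_in", "Pasture"),
        ("produced_by", "Simulation"),
        ("usable_for", "Depiction"),
        ("usable_for", "Regeneration"),
        ("usable_for", "Estimating_Damage"),
    }),
    # Invented second entry, included only so that some queries have non-matches.
    "fire_monitoring_service": frozenset({
        ("caused_by", "Fire"),
        ("affects", "Vegetation"),
        ("produced_by", "Measurement"),
        ("usable_for", "Depiction"),
    }),
}


def discover(query, registry=REGISTRY):
    """Return the services whose concept is subsumed by the query concept."""
    # Assumed semantics: the query is more general if its pairs are a subset.
    return sorted(name for name, pairs in registry.items() if query <= pairs)


susans_query = {("caused_by", "Storm"), ("usable_for", "Estimating_Damage")}
depiction_query = {("usable_for", "Depiction"), ("affects", "Vegetation")}

susans_result = discover(susans_query)
depiction_result = discover(depiction_query)
```

With these assumptions Susan's query returns only the storm hazard service, while the alternative query for depicting disturbances that affect vegetation returns both registered services.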
7. Conclusions and future work
The BUSTER approach uses ontology-based metadata in combination with terminological reasoning to ensure semantic interoperability during information retrieval. The approach has been developed for well-defined domains in which a small number of people or institutions commit to a common shared vocabulary. In this paper, we apply the approach in the broader context of GI web service discovery. The defined concept query offered by BUSTER has proven capable of solving semantic heterogeneity problems caused by naming heterogeneity and cognitive heterogeneity (Section 3.2):
• The definition of the query concept is based on the shared vocabularies that are registered for different domains, i.e. the use of valid terms is ensured. In this manner, the problems that may arise due to naming heterogeneity during simple keyword-based search are avoided.
• The terminological reasoning facility of BUSTER enables concept (rather than only syntactic) matching and ensures that problems caused by cognitive heterogeneity are factored out as well.
While these results indicate that the approach is promising in the chosen application context, a number of challenges remain.
7.1. Publishing a service
BUSTER has been designed to deal with single information items. In such cases, the reference to one concept description is sufficient during publishing. Publishing GI web services, however, is more complex, as a service is characterised through several properties, e.g. input, output, functionality and performance (Bernstein & Klein, 2002). The method for constructing an application ontology for web services should consider these requirements, i.e. the concept definitions should reflect the service properties that will be searched for later. It is to be examined to what extent the shared vocabulary, the application ontology, and the CSD can meet these additional requirements for publishing GI web services. While in the presented study, only a concept for the service's output information is referenced in the CSD, the CSD could in principle comprise references to more than just one concept of an application ontology.
Thus, it fulfils an important requirement for annotating web services, i.e. each service property can be annotated with the corresponding concept in the application ontology.
7.2. Creating shared vocabularies
A shared vocabulary provides a common basis for the interpretation of the application ontologies built using the vocabulary's terms. Thus, a shared vocabulary ensures semantic interoperability between all organisations committing to it. However, this central role also comes at a price: as a shared vocabulary claims to comprise the basic terms of a common conceptualisation of a domain, great care must be taken to define the terminology on an appropriate level of expressiveness. The terms have to be general enough to allow the annotation of all information sources, but specific enough to make meaningful definitions possible. In consequence, a shared vocabulary will only be useful if it is defined within a certain context and for a well-known user community. The definition of sound shared vocabularies becomes even more complicated in the case of GI web service environments, which comprise a variety of application domains and whose boundaries are hard to define. For example, consider the domain of forest ecology in relation to soil ecology, vegetation ecology, landscape ecology, and similar disciplines. Furthermore, the restriction of shared vocabularies to a few bridge concepts, their properties and allowed property values leads to a severely limited expressiveness. For example, the shared vocabulary in our example would be more intuitively modelled using two concepts representing the disturbance and the disturbance measure, which are connected by the property DESCRIBES (Table 3). Unfortunately, with the currently available prototype it is not possible to express an application ontology or query based on such a shared vocabulary. A first approach to make shared vocabularies more expressive and flexible is presented in Klien et al. (2004).

Table 3
Alternative shared vocabulary for the domain of forest ecology
Bridge concept | Properties | Property values
DISTURBANCE | CAUSED_BY | See Table 2
 | AFFECTS | See Table 2
 | OCCURS_IN | See Table 2
DISTURBANCE_MEASURE | DESCRIBES | DISTURBANCE
 | IS_PRODUCED_BY | See Table 2
 | IS_USABLE_FOR | See Table 2

7.3. Discovering a service
The presented approach for ontology-based search rests on the assumption that all users have a common understanding of the terms provided in the shared vocabulary. This presents a major difficulty if, as in the scenario presented in this work, service requestor and provider are from different domains. In such cases, there has to be at least some overlap between the two actors' conceptualisations (Fig. 10). However, this cannot always be assumed, especially if a user is not familiar with the domain he requests information from. This inherent problem of using shared vocabularies illustrates the need for grounding the used terms in conceptualisations. Approaches like the theory of semantic reference systems introduced by Kuhn (2003) show promising steps in the right direction and are subject of future research.
[Figure 10 (diagram). Recoverable labels: two overlapping sets of conceptualisations, the Disaster Management Domain and the Forest Ecology Domain; terms shown include action force, crises squad, contingency plan, storm, disturbance, damage, susceptibility, initial stages and protective stand.]
Fig. 10. Overlap of the conceptualisations of two different domains.
7.4. Description logic reasoning
Another drawback of the approach could be seen in the matchmaking by DL reasoning, which is generally related to high computational complexity (Donini, 2003). In BUSTER, the reasoning task has only been tested within small domains comprising only a few application ontologies. Using BUSTER in an open GI web services environment with complex shared vocabularies and more application ontologies to reason on could make the task of matching concepts during service discovery extremely slow. However, Li and Horrocks (2003) have shown that while the average time (per registered data set) for classifying a TBox indeed increases rapidly with the size of the registry, matchmaking in an already classified TBox is extremely fast. The authors therefore propose to classify the TBox offline before the matchmaking process starts, and to use the classified TBox to reason about requests.
7.5. Integration into spatial data infrastructures
Spatial data infrastructures are currently set up on local, regional, national and global level to facilitate the discovery of and access to geographic data and GI services. In order to make the presented approach available to a wider community, it should be integrated with existing standardised SDI components such as Catalogue services (OGC, 2004). A first step in this direction is presented in Klien et al. (2004). Future work will address the discussed problems and alternatives in order to further improve the BUSTER approach for GI service discovery. In addition, we will compare the overall approach to other algorithms for discovering web services, e.g. Sycara, Klusch, Widoff, and Lu (1999).
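The idea of classifying offline and matching against the classified result, discussed in Section 7.4, can be illustrated with a toy example. The code below is a sketch under simplifying assumptions and not the framework of Li and Horrocks (2003): concepts are again conjunctions of (property, filler) pairs, subsumption is a subset test, and all concept names are hypothetical. The offline step records for every concept which other concepts it subsumes; the online step then skips the subsumption test for any concept already known to lie below a match.

```python
# Hypothetical registry of concepts as conjunctions of (property, filler) pairs.
CONCEPTS = {
    "storm_measure": frozenset({("caused_by", "Storm")}),
    "storm_hazard_class": frozenset({("caused_by", "Storm"), ("usable_for", "Estimating_Damage")}),
    "storm_damage_map": frozenset({("caused_by", "Storm"), ("usable_for", "Depiction")}),
    "fire_measure": frozenset({("caused_by", "Fire")}),
}


def classify(concepts):
    """Offline step: for every concept, record all concepts it subsumes."""
    return {
        name: {
            other
            for other, other_pairs in concepts.items()
            if other != name and pairs <= other_pairs
        }
        for name, pairs in concepts.items()
    }


def match(query, concepts, subsumees):
    """Online step: return the matches and the number of subsumption tests performed."""
    matched, tests = set(), 0
    # Visit general concepts (few conjuncts) first so that their subsumees are skipped.
    for name in sorted(concepts, key=lambda n: len(concepts[n])):
        if name in matched:
            continue
        tests += 1
        if query <= concepts[name]:
            matched.add(name)
            matched |= subsumees[name]
    return matched, tests


SUBSUMEES = classify(CONCEPTS)  # computed once, before any request arrives
result, tests_needed = match({("caused_by", "Storm")}, CONCEPTS, SUBSUMEES)
```

In this toy registry the query for storm-related concepts needs two subsumption tests instead of four, because the two specialised storm concepts are returned directly from the precomputed hierarchy.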
Acknowledgements
The work presented in this paper has been supported by the German Federal Ministry for Education and Research as part of the GEOTECHNOLOGIEN program (grant number 03F0369A). It can be referenced as publication no. GEOTECH-141. We are grateful to Sebastian Hübner for his input at various stages of this work, especially for providing support on the BUSTER system and for enabling the implementation of the motivating example in the BUSTER prototype. The comments from four anonymous referees provided useful suggestions to improve the content of the paper.
References
Antoniou, G., & Van Harmelen, F. (2003). Web ontology language: OWL. In S. Staab & R. Studer (Eds.), Handbook on ontologies (pp. 67–92). Berlin: Springer.
Barnes, B. V., Zak, D. R., Denton, S. R., & Spurr, S. H. (1998). Forest ecology (4th ed.). New York: John Wiley and Sons, Inc.
Benjamins, V. R., Fensel, D., & Gomez-Perez, A. (1998). Knowledge management through ontologies. In Second international conference on practical aspects of knowledge management, Basel, Switzerland.
Bernard, L., Einspanier, U., Haubrock, S., Hübner, S., Kuhn, W., Lessing, R., et al. (2003). Ontologies for intelligent search and semantic translation in spatial data infrastructures. Photogrammetrie-Fernerkundung-Geoinformation, 2003(6), 451–462.
Bernard, L., Einspanier, U., Lutz, M., & Portele, C. (2003). Interoperability in GI service chains – The way forward. In 6th AGILE conference on geographic information science, Lyon, France.
Bernstein, A., & Klein, M. (2002). Towards high-precision service retrieval. In International semantic web conference, Sardinia, Italy.
Bishr, Y. (1998). Overcoming the semantic and other barriers to GIS interoperability. International Journal of Geographical Information Science, 12(4), 299–314.
Bishr, Y., & Radwan, M. (2000). GDI architectures. In R. Groot & J. McLaughlin (Eds.), Geospatial data infrastructures. Concepts, cases, and good practice (pp. 135–150). Oxford: Oxford University Press.
Borst, P. (1997). Constructing of engineering ontologies for knowledge sharing and reuse. Unpublished PhD Thesis, University of Twente, Enschede, The Netherlands.
Brox, C., Bishr, Y., Kuhn, W., Senkler, K., & Zens, K. (2002). Toward a geospatial data infrastructure for North Rhine-Westphalia. Computers, Environment and Urban Systems, 26, 19–37.
Donini, F. M. (2003). Complexity of reasoning. In F. Baader, D. Calvanese, D. McGuinness, D. Nardi, & P. Patel-Schneider (Eds.), The description logic handbook. Theory, implementation and applications (pp. 96–136). Cambridge: Cambridge University Press.
Egenhofer, M. (2002). Toward the semantic geospatial web. In 10th ACM international symposium on advances in geographic information systems (ACM-GIS), McLean, VA.
Goh, C. H., Bressan, S., Madnick, S., & Siegel, M. (1999). Context interchange. New features and formalisms for the intelligent integration of information. ACM Transactions on Information Systems, 17(3), 270–293.
Greve, K. (2002). Vom GIS zur Geodateninfrastruktur. STANDORT – Zeitschrift für Angewandte Geographie, 3, 121–125.
Gruber, T. (1993). A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2), 199–220.
Harvey, F., Kuhn, W., Pundt, H., Bishr, Y., & Riedemann, C. (1999). Semantic interoperability: a central issue for sharing geographic information. The Annals of Regional Science, 33, 213–232.
Horrocks, I., Sattler, U., & Tobies, S. (2000). Reasoning with individuals for the description logic SHIQ. In 17th international conference on automated deduction (CADE-17), Pittsburgh, PA, USA.
ISO/TC-211 & OGC. (2002). Geographic information services draft ISO/DIS 19119. OpenGIS service architecture Vs. 4.3. Draft Version. International Organization for Standardization and OpenGIS Consortium.
Kawamura, T., De Blasio, J.-A., Hasegawa, T., Paolucci, M., & Sycara, K. (2003). Preliminary report of public experiment of semantic service matchmaker with UDDI business registry. In First international conference on service-oriented computing (ICSOC 2003), Trento, Italy.
Klien, E., Einspanier, U., Lutz, M., & Hübner, S. (2004). An architecture for ontology-based discovery and retrieval of geographic information. In 7th conference on geographic information science (AGILE 2004), Heraklion, Greece.
Kuhn, W. (2001). Ontologies in support of activities in geographic space. International Journal of Geographical Information Science, 15(1), 613–631.
Kuhn, W. (2003). Semantic reference systems. International Journal of Geographical Information Science, 17(5), 405–409.
Li, L., & Horrocks, I. (2003). A software framework for matchmaking based on semantic web technology. In Twelfth international World Wide Web conference, Budapest, Hungary.
Lutz, M., Riedemann, C., & Probst, F. (2003). A classification framework for approaches to achieving semantic interoperability. In COSIT 2003, Ittingen, Switzerland.
Nax, K., & Lethen, J. (1999). ETC/CDS general multilingual environmental thesaurus (GEMET) - General information on GEMET. Publications of the European Topic Centre on Catalogue of Data Sources: ETC/CDS.
Nebel, B. (1996). Artificial intelligence: A computational perspective. In G. Brewka (Ed.), Principles of knowledge representation. Studies in logic, language and information. Stanford: Cambridge University Press.
Nebert, D., & Reed, C. (2002). The importance of catalogs to the spatial web. OGC White Paper.
Neumann, H., Schuster, G., Stuckenschmidt, H., Visser, U., & Vögele, T. (2001). Intelligent brokering of environmental information with the BUSTER system. In International symposium informatics for environmental protection, Zuerich, Switzerland.
OGC. (1996). The OpenGIS guide. The Open GIS Consortium.
OGC. (2004). Catalogue services specification, Version 2.0 (OGC implementation specification 04-021r2). Open Geospatial Consortium.
Otto, H.-J. (1994). Waldökologie. Stuttgart: UTB.
Paolucci, M., Kawamura, T., Payne, T. R., & Sycara, K. (2002). Semantic matching of web service capabilities. In 1st international semantic web conference (ISWC2002), Sardinia, Italy.
Schuster, G., & Stuckenschmidt, H. (2001). Building shared ontologies for terminology integration. In KI01 Workshop on ontologies, Vienna, Austria.
Sirin, E., Hendler, J., & Parsia, B. (2003). Semi-automatic composition of web services using semantic descriptions. In Workshop on web services: modeling, architecture and infrastructure (in conjunction with ICEIS2003), Angers, France.
Stevens, R. D., Goble, C. A., & Bechhofer, S. (2000). Ontology-based knowledge representation for bioinformatics. Briefings in Bioinformatics, 1(4), 398–416.
Stuckenschmidt, H. (2002). Ontology-based information sharing in weakly structured environments. Unpublished PhD Thesis, Vrije Universiteit Amsterdam, Amsterdam.
Studer, R., Benjamins, V. R., & Fensel, D. (1998). Knowledge engineering: principles and methods. Data and Knowledge Engineering, 25(1–2), 161–197.
Sycara, K., Klusch, M., Widoff, S., & Lu, J. (1999). Dynamic service matchmaking among agents in open information environments. SIGMOD Record, 28(1), 47–53.
Uschold, M. (1998). Knowledge level modelling: concepts and terminology. The Knowledge Engineering Review, 13(1), 5–29.
Visser, U., & Stuckenschmidt, H. (2002). Interoperability in GIS - enabling technologies. In 5th AGILE conference on geographic information science, Palma de Mallorca, Spain.
Visser, U., Stuckenschmidt, H., Wache, H., & Vögele, T. (2000). Using environmental information efficiently: sharing data and knowledge from heterogeneous sources. In C. Rautenstrauch & S. Patig (Eds.), Environmental information systems in industry and public administration (pp. 41–73). Hershey, PA: Idea Group Publishing.
W3C. (2004). Web services architecture. Available from http://www.w3.org/TR/ws-arch/.
Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., et al. (2001). Ontology-based integration of information - a survey of existing approaches. In IJCAI-01 workshop: ontologies and information sharing, Seattle, WA.
Zhang, Y., & Hutchison, G. (2002). IBM video central for e-business - Summary of a sample DB2 web services implementation (Part 2). Accessed 03-08-03.
Chapter V
An Architecture for Ontology-based GI Discovery and Retrieval of Geographic Information

Klien, E., Einspanier, U., Lutz, M., & Hübner, S. (2004): An Architecture for Ontology-based Discovery and Retrieval of Geographic Information, in: Toppen, F. & Prastacos, P. (Eds.): 7th Conference on Geographic Information Science (AGILE 2004): 179-188. [25]

Abstract. Finding and accessing suitable information in the open and distributed environments of current Spatial Data Infrastructures (SDIs) is a crucial task. Catalogues provide searchable repositories of information descriptions, but the mechanisms to support the tasks of discovery and retrieval are still insufficient. Problems of semantic heterogeneity caused by synonyms and homonyms can arise during free-text search in catalogues. Moreover, once a suitable Web Feature Service (WFS) is found and accessed, the property names of a feature are often difficult to interpret. This paper introduces an architecture for ontology-based discovery and retrieval of geographic information that solves semantic heterogeneity problems of current query capabilities. Based on a (real-world) scenario from the area of flood management, the application of our approach shows that the information requestor can be efficiently supported.

[25] © Crete University Press. Reprint permission granted 13 September 2005.
Chapter V. An Architecture for GI Discovery and Retrieval
An Architecture for Ontology-Based Discovery and Retrieval of Geographic Information

Eva Klien [1], Udo Einspanier [1], Michael Lutz [1] and Sebastian Hübner [2]
[1] Institute for Geoinformatics (IfGI), University of Münster, Germany, {klien|spanier|m.lutz}@uni-muenster.de
[2] Intelligent Systems Group, University of Bremen, Germany
SUMMARY
Finding and accessing suitable information in the open and distributed environments of current Spatial Data Infrastructures (SDIs) is a crucial task. Catalogues provide searchable repositories of information descriptions, but the mechanisms to support the tasks of discovery and retrieval are still insufficient. Problems of semantic heterogeneity caused by synonyms and homonyms can arise during free-text search in catalogues. Moreover, once a suitable Web Feature Service (WFS) is found and accessed, the property names of a feature are often difficult to interpret. This paper introduces an architecture for ontology-based discovery and retrieval of geographic information that solves semantic heterogeneity problems of current query capabilities. Based on a (real-world) scenario from the area of flood management, the application of our approach shows that the information requestor can be efficiently supported.

KEYWORDS: semantic heterogeneity, ontologies, GI discovery, GI retrieval

1 INTRODUCTION
Geographic information (GI) is the key to effective planning and decision-making in a variety of application domains. So-called intelligent web services permit easy access and effective exploitation of distributed geographic information for all citizens, professionals, and decision-makers (Bishr, 2000; Brox, 2002). This paper focuses on the discovery and retrieval of geographic information. The specifications provided by the OpenGIS Consortium (OGC) enable syntactic interoperability and cataloguing of geographic information. However, while OGC-compliant catalogues support the discovery and organisation of, and access to, geographic information, they do not yet provide methods for overcoming problems of semantic heterogeneity. These problems still present challenges for GI discovery and retrieval in the open and distributed environments of Spatial Data Infrastructures (SDIs).

One possible approach to overcoming the problem of semantic heterogeneity is the explication of knowledge by means of ontologies, which can be used for the identification and association of semantically corresponding concepts (Wache, 2001). In this paper we introduce an architecture for ontology-based discovery and retrieval of geographic information. To this end, we extend the query capabilities currently offered by OGC-compliant catalogues with terminological reasoning on metadata, provided by an ontology-based reasoning component. We show how this approach can contribute to solving semantic heterogeneity problems during free-text search in catalogues and how it can support intuitive information access once an appropriate resource has been found.
2 MOTIVATING EXAMPLE: DISCOVERING INFORMATION ON WATER LEVELS IN THE ELBE RIVER
Throughout this paper we use a motivating example to illustrate the semantic heterogeneity problems that can occur with state-of-the-art GI query facilities, and our approach to solving them. However, our work is designed to be independent of any particular GI domain and is not restricted to this example.

John is a hydrologist who is interested in water levels of the Elbe River. As an expert in the field he knows the existing control points in the river. He wants to know the measurement of the water level at a specific control point at a specified time. Since John does not know of an existing Web Feature Service (WFS) offering this kind of information, he uses an OGC-compliant catalogue to find appropriate information for answering his question: “What is the water level at control point X at time Y in the Elbe River?”

There are several data providers that offer information about water levels in the Elbe River via a standardized WFS interface [1]:
a) The Federal Agency for Hydrology (Bundesanstalt für Gewässerkunde, BafG)
b) The Electronic Information System for Waterways (Elektronisches Wasserstraßen-Informationssystem, ELWIS)
c) The Czech Hydrometeorological Institute (CHMI)

Table 1 lists the names of the GML features returned by these WFS and their property names.

Table 1: Names of the GML features returned by three WFS and their properties

                                         BafG               ELWIS               CHMI
WFS Feature                              Pegelmessung       WasserstandMessung  StavVody
Name of the control point                name               pegel               stanice
Unique ID of the control point           -                  id                  -
Internet address of the control point    url                quelle              url
Water level measured in cm               wasserstand_cm     hoehe               stav
Date and time of the measurement         zeitpunkt          -                   datum
Date of the measurement                  -                  datum               -
Time of the measurement                  -                  uhrzeit             -
Geometry as Point                        gml:pointProperty  standort            gml:position
Name of the river                        -                  -                   tok
Discharge in cubic meters per second     -                  -                   prutok
3 SEMANTIC HETEROGENEITY PROBLEMS DURING STATE-OF-THE-ART DISCOVERY AND RETRIEVAL OF GEOGRAPHIC INFORMATION
This section describes problems that semantic heterogeneity between user requests and application schemata or metadata descriptions can cause when users and providers of geographic information belong to different information communities (OGC, 1999).
[1] These agencies do not yet provide their data on water levels through WFS but through normal HTML pages. For implementing the example, the information is parsed from these pages and provided through WFS interfaces. The access points to these services are provided at http://www.meanings.de/
3.1 Discovery
In current standards-based catalogues (e.g. GDI-NRW (2002)) users can formulate queries using keywords and/or spatial filters. The metadata fields that can be included in the query depend on the metadata schema used (e.g. ISO 19115) and on the query functionality of the service that is used for accessing the metadata. Two types of semantic heterogeneity can lead to problems if John performs a simple keyword-based search, e.g. using the terms “water level” and/or “Elbe”. These types are classified by Bishr (1998) as:
1. Naming heterogeneity (synonyms): John may fail to find the existing WFS that offer the information, because their metadata descriptions contain slightly different terminology, e.g. “depth” or “Labe” (the Czech name for “Elbe”).
2. Cognitive heterogeneity (homonyms): John’s request could also return services that are not appropriate for answering his question. This would be the case, e.g., if John’s free-text search for “water level” discovered a service that depicts the network of water level control points without providing the actual current water levels, or a service that provides groundwater rather than surface water levels.
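Both failure modes can be reproduced with a toy keyword matcher. The following is a minimal sketch and not part of the prototype: the record identifiers and keyword sets are hypothetical and were chosen only to mirror the synonym and homonym cases described above.

```python
def keyword_search(records, *terms):
    """Return the ids of records whose keywords contain every search term."""
    wanted = {t.lower() for t in terms}
    return sorted(
        rid for rid, keywords in records.items()
        if wanted <= {k.lower() for k in keywords}
    )


# Hypothetical catalogue entries (assumption, for illustration only).
records = {
    "surface-water-levels-cz": {"depth", "Labe"},      # relevant, but uses synonyms
    "groundwater-levels": {"water level", "Elbe"},     # homonym: groundwater
    "control-point-network": {"water level", "Elbe"},  # no measurements at all
}

hits = keyword_search(records, "water level", "Elbe")
print(hits)
```

The relevant service is missed because of naming heterogeneity, while both returned services are false positives caused by cognitive heterogeneity.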
These examples show that keywords used in free-text entries are a poor way to capture the semantics of a query or item (Bernstein, 2002).

3.2 Retrieval
Another major difficulty arises if John wants to access geographic information via one of the existing WFS. The DescribeFeatureType request (OGC, 2002) returns the application schema for the feature type, which is essential for formulating a filter for the query. John now runs into trouble if the property names are not intuitively interpretable (cf. Table 1). For example, he can only guess that the property names “hoehe” (German) or “stav” (Czech) refer to the measurement of the water level. The goal of the architecture presented in this paper is to support the user in interpreting property names adequately, since this is a precondition for formulating an appropriate query.

4 ONTOLOGY-BASED APPROACH
To solve the semantic heterogeneity and interpretation problems presented in the previous section, an approach is needed that exceeds the capabilities of current free-text search facilities in catalogues and supports an intuitive interpretation of property names. Given the diversity of geographic application domains, such an approach needs to enable navigating differences in meaning (Harvey, 1999). Stuckenschmidt (2002) suggests using explicit context models to re-interpret information in the context of a new application. Ontologies have become popular in information science because they can be used to explicate contextual information. We adopt a modified version of Gruber’s often-quoted definition of the term “ontology” (Gruber, 1993) by Studer (1998), who defines it as “an explicit formal specification of a shared conceptualization” (a conceptualization being a way of thinking about some domain (Uschold, 1998)). This makes an ontology a well-suited means of communicating a shared and common understanding of some domain across people and computers (Studer, 1998).
4.1 Hybrid Ontology Approach
The hybrid ontology approach used in our architecture for enhancing information discovery and retrieval has been adopted from Visser (2002). It is based on the idea of having a source-independent shared vocabulary for each domain (Figure 1).
[Diagram: a DOMAIN containing one shared vocabulary, three application ontologies, and three information sources.]
Figure 1: The hybrid ontology approach (Visser, 2002), modified.

It is assumed that the members of a domain share a common understanding of certain concepts. These concepts require no further explication and therefore form the basic terms of a shared vocabulary. Once a shared vocabulary exists, its terms can be used to make the contextual information of an information source explicit, i.e. to build an application ontology for it (Visser, 2002). Thus, the vocabulary has to be general enough to be used across all information sources that are to be annotated within the domain, but specific enough to make meaningful definitions possible (Schuster, 2001). Constructing an application ontology is the responsibility of the provider of the information source.

For the ontology-based annotation of information sources we have made two modifications to this approach. First, the information sources are not annotated directly. Instead, we describe the feature type provided by a service, which is defined through its application schema. To be more precise, the shared vocabulary is used to describe in detail the properties included in the schema. There is therefore an additional level of semantic annotation. Together with the syntax, which can be requested via the normal DescribeFeatureType() operation, the annotation of an information source is complete. Second, we use not only domain-specific ontologies (e.g. measurements, hydrology) but also domain-independent ontologies (e.g. SI units).

In our example, John searches for the water level of control point X at time Y. In our approach, this means that he uses these properties (i.e. “location”, “water level”, and “date and time of measurement”) to describe a feature type. The precise formulation of the query, its execution, and its result are presented in the next section.

4.2 Ontology-Based Search
We distinguish two closely related types of query. In a simple query the user chooses a concept from an existing application ontology. In a defined concept query the user defines a concept, based on a given shared vocabulary, that fits his understanding of a concrete concept (Visser, 2002). In the following steps, existing application ontology concepts and user-defined concepts are treated the same and are referred to as query concepts. The actual search is performed by automatically mapping between the query concept and the concepts of different application ontologies within the same domain. This is possible by applying a terminological reasoner, e.g. RACER (Reasoner for A-Boxes and Concept Expressions Renamed) (Haarslev, 2001), which can work with concepts described in the Description Logic SHIQ (Horrocks, 2000). A reasoner like RACER allows the classification of data into another context by equality and subsumption. Subsumption means that if concept B satisfies the requirements for being a case of concept A, then B can automatically be classified below A (Beck, 2002). This procedure enables query processing and searching in a way that is not possible with keyword-based search.
MEASUREMENTS

(define-concept ResultOfAMeasurement
  (and (some _quantity Quantity)
       (some _unit Unit)
       (some _factor Factor)))

(define-concept Measurement
  (and (some _has ResultOfAMeasurement)
       (some _has Location)
       (some _has Datestamp)
       (some _has Timestamp)))

HYDROLOGY

(define-concept WaterLevel
  (and ResultOfAMeasurement
       (some _quantity Depth)
       (some _inspected River)))

Figure 2: Extract of some concept definitions in the measurements and the hydrology domains.

QUERY CONCEPT

(define-concept Query_for_Measurement
  (and Measurement
       (some _has WaterLevel)))

Figure 3: Definition of the query concept for a feature type representing a water level measurement.
In our example, John can use existing domain concepts like Measurement and WaterLevel (Figure 2) to formulate his query concept Query_for_Measurement (Figure 3). By re-classifying this concept in RACER, it can be deduced that all three web services match the user query because they all provide measurements (having a location, a date and a time stamp) whose result is restricted to water levels. The subsumption hierarchy computed by RACER is shown in Figure 4.
Figure 4: Subsumption hierarchy including the query concept Query_for_Measurement. The query concept is classified as a super-concept to all feature types provided by the three WFS.
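The classification result can be checked by hand with a much simpler procedure than the one RACER applies. The sketch below is an illustration only: it implements structural subsumption for conjunctions and existential restrictions, not a tableau reasoner for SHIQ. The concept definitions are transcribed from Figures 2, 3 and 5 (ELWIS feature type); the primitive hierarchy in `PARENTS` (Depth below Quantity, Meter below Unit, Centi below Factor) is an assumption about what the shared vocabularies state, and all function names are hypothetical.

```python
# Assumed primitive hierarchy (child -> parent); not given explicitly in the figures.
PARENTS = {"Depth": "Quantity", "Meter": "Unit", "Centi": "Factor"}

# ("and", [...]) is a conjunction, ("some", role, filler) an existential
# restriction, and a plain string names a defined or primitive concept.
DEFS = {
    "ResultOfAMeasurement": ("and", [("some", "_quantity", "Quantity"),
                                     ("some", "_unit", "Unit"),
                                     ("some", "_factor", "Factor")]),
    "Measurement": ("and", [("some", "_has", "ResultOfAMeasurement"),
                            ("some", "_has", "Location"),
                            ("some", "_has", "Datestamp"),
                            ("some", "_has", "Timestamp")]),
    "WaterLevel": ("and", ["ResultOfAMeasurement",
                           ("some", "_quantity", "Depth"),
                           ("some", "_inspected", "River")]),
    "Query_for_Measurement": ("and", ["Measurement",
                                      ("some", "_has", "WaterLevel")]),
    # ELWIS application ontology
    "standort": "Location",
    "hoehe": ("and", [("some", "_quantity", "Depth"),
                      ("some", "_unit", "Meter"),
                      ("some", "_factor", "Centi"),
                      ("some", "_inspected", "River")]),
    "datum": "Datestamp",
    "uhrzeit": "Timestamp",
    "WasserstandMessung": ("and", [("some", "_has", "standort"),
                                   ("some", "_has", "hoehe"),
                                   ("some", "_has", "datum"),
                                   ("some", "_has", "uhrzeit")]),
}


def normalize(concept):
    """Unfold definitions into (primitive names, [(role, normalized filler)])."""
    if isinstance(concept, str):
        return normalize(DEFS[concept]) if concept in DEFS else ({concept}, [])
    if concept[0] == "some":
        return (set(), [(concept[1], normalize(concept[2]))])
    names, restrictions = set(), []
    for part in concept[1]:  # conjunction: merge all conjuncts
        part_names, part_restrictions = normalize(part)
        names |= part_names
        restrictions += part_restrictions
    return (names, restrictions)


def ancestors(name):
    """The primitive concept itself plus everything above it in PARENTS."""
    found = {name}
    while name in PARENTS:
        name = PARENTS[name]
        found.add(name)
    return found


def subsumes(general, specific):
    """True if every requirement of `general` is met by `specific`."""
    g_names, g_restrictions = general
    s_names, s_restrictions = specific
    implied = set().union(*(ancestors(n) for n in s_names))
    if not g_names <= implied:
        return False
    return all(
        any(role == s_role and subsumes(g_filler, s_filler)
            for s_role, s_filler in s_restrictions)
        for role, g_filler in g_restrictions
    )


def is_subsumed_by(specific_name, general_name):
    return subsumes(normalize(general_name), normalize(specific_name))


print(is_subsumed_by("WasserstandMessung", "Query_for_Measurement"))
print(is_subsumed_by("Query_for_Measurement", "WasserstandMessung"))
```

Under these assumptions the ELWIS feature type is classified below the query concept but not the other way round, because the query concept does not fix the unit or the factor of the measurement result.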
5 ARCHITECTURE FOR ONTOLOGY-BASED DISCOVERY AND RETRIEVAL
The architecture we propose in this paper offers two functionalities that significantly enhance the usability of existing geographic information: defined concept queries, which overcome semantic heterogeneity problems during information discovery, and interpretation support for feature types and properties during information retrieval. To support the advanced query capabilities described above, some new service interfaces and information items are needed in addition to the well-known components of current SDIs, such as catalogues and Web Feature Services. We first describe these information items and interfaces and then sketch the information flow by means of our motivating scenario.

5.1 Components to Enable Ontology-Based Discovery and Retrieval
First, we have to provide the ontologies. For each application schema there is one application ontology that is described with the shared vocabulary of the corresponding domain. These ontologies provide the formal description of the application schema of a data source. Therefore, they are referenced from the feature catalogue description metadata section of the corresponding ISO 19115 documents for that data source (Figure 5). This metadata section describes content information of that data source, e.g. a list of the available feature type names.
[Diagram with four levels, using the ELWIS feature type WasserstandMessung as the example. The labels DISCOVERY and RETRIEVAL mark the two phases. Relation labels in the diagram: serves, is based on, references, makes content explicit, describes.

Ontology level: the shared vocabularies Measurements (... Measurement, ResultOfAMeasurement, Datestamp, Timestamp ...) and Hydrosphere (... Water, Surface_Water, River, Lake ...), the Ontology-based Reasoner, and the application ontology:

(define-concept WasserstandMessung
  (and (some _has standort)
       (some _has hoehe)
       (some _has datum)
       (some _has uhrzeit)
       ...))

(define-concept standort Location)
(define-concept hoehe
  (and (some _quantity Depth)
       (some _unit Meter)
       (some _factor Centi)
       (some _inspected River)))
(define-concept datum Datestamp)
(define-concept uhrzeit Timestamp)
...

Metadata level: the Catalogue Service and the ISO 19115 metadata, with the visible values "false", "elwis:WasserstandMessung", "Applikationsontologie für ELWIS WasserstandMessung" (application ontology for ELWIS WasserstandMessung) and "2003-11-11".

Schema level: the WFS and the GML schema, with the visible annotation "Die Wasserstandshöhe in cm." (the water level height in cm).

Data level: data source ELWIS "Wasserstandsmessung":

id  pegel       quelle                   hoehe  datum       uhrzeit
1   Deggendorf  http://www.elwis.de/...  201    12.11.2003  05:00
2   Oberndorf   http://www.elwis.de/...  158    12.11.2003  05:00]

Figure 5: Services and information items required for ontology-based discovery and retrieval of geographic information.

To provide access to the ontologies, two new interfaces are defined (Figure 6): The Concept Definition Service interface allows access to the concepts of the shared vocabulary and application ontologies. The
Concept Query Service interface supports reasoning about possible matches for simple and defined concept queries. In our prototype, both interfaces are implemented by a reasoning component that makes use of ontologies expressed in SHIQ.

The second component is a cascading catalogue service that is “aware” of the application ontologies. It provides access through the standard OGC Stateless Catalogue Service interface, thus implementing the decorator design pattern (Gamma, 1995). It extends the functionality of the conventional catalogue service by analysing and manipulating the filters of metadata queries. If a filter constrains a query to return only metadata results with a specific feature type in the feature catalogue description section, the advanced matchmaking capabilities of the Concept Query Service are used. The returned list of concepts is also added to the filter. This allows the decoration of any conventional standard catalogue service, because the expanded filter requires only the usual exact word match. The decorating catalogue service would also enable enhanced matchmaking on other metadata elements by plugging in additional services, e.g. a gazetteer of hierarchically ordered place names.

The last component that deserves special attention is a user interface (UI) that utilizes the ontologies. It makes use of the Concept Definition Service to allow a user to formulate enhanced queries for metadata and geodata. Metadata queries for data sources with specific application schema information are supported by allowing the construction of a query concept. The concepts from the application ontologies support the formulation of WFS queries for unknown application schemas and the interpretation of the results.

[Diagram: the UI; the Web Feature Service (data access) with its geodata; the Enhanced Cascading Catalogue, offering the Catalogue Service interface; the conventional Catalogue, offering the Catalogue Service interface, with its ISO 19115 metadata; the Ontology-based Reasoner, offering the Concept Definition Service and Concept Query Service interfaces, with its ontologies.]

Figure 6: Components and interfaces required for ontology-based discovery and retrieval
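The decorator idea behind the enhanced cascading catalogue can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the prototype's implementation: the class and method names are hypothetical, a filter is reduced to a set of feature type names, and the reasoner is replaced by a stub that returns the matches reported in section 4.2. The record identifiers and the unrelated `roads-wfs` entry are invented for the example.

```python
class ConventionalCatalogue:
    """Stand-in for a standard catalogue: exact match on feature type names."""

    def __init__(self, records):
        self.records = records  # record id -> feature type name in the metadata

    def get_records(self, feature_types):
        return sorted(rid for rid, ft in self.records.items()
                      if ft in feature_types)


class ConceptQueryService:
    """Stub for the reasoner; real matches would come from classification."""

    def __init__(self, matches):
        self.matches = matches  # query concept -> names of matching concepts

    def matching_concepts(self, concept):
        return self.matches.get(concept, [])


class EnhancedCascadingCatalogue:
    """Decorator: same interface as the catalogue, but expands the filter first."""

    def __init__(self, inner, reasoner):
        self.inner = inner
        self.reasoner = reasoner

    def get_records(self, feature_types):
        expanded = set(feature_types)
        for concept in feature_types:
            expanded.update(self.reasoner.matching_concepts(concept))
        # The wrapped catalogue still performs only an exact word match.
        return self.inner.get_records(expanded)


plain = ConventionalCatalogue({
    "bafg-wfs": "Pegelmessung",
    "elwis-wfs": "WasserstandMessung",
    "chmi-wfs": "StavVody",
    "roads-wfs": "Strasse",  # unrelated, hypothetical record
})
reasoner = ConceptQueryService({
    "Query_for_Measurement": ["Pegelmessung", "WasserstandMessung", "StavVody"],
})
enhanced = EnhancedCascadingCatalogue(plain, reasoner)

print(plain.get_records({"Query_for_Measurement"}))
print(enhanced.get_records({"Query_for_Measurement"}))
```

Because the decorator exposes the same `get_records` method as the catalogue it wraps, a client written against the conventional catalogue can use the enhanced one without changes, which is the property the text attributes to the decorator pattern.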
5.2 Interaction and Information Flow in the Motivating Example
We come back to our motivating example to illustrate the interaction and information flow within the architecture (Figure 7).
John wants to construct a defined concept query in the UI using the domain’s shared vocabulary. For this the UI component first retrieves the concepts of the shared vocabulary from the Ontology-based Reasoner. The user defines his query concept and a spatial query constraint that covers the Elbe catchment. The UI component then constructs a filter with a conjunction of the spatial constraint and a featureType constraint. For building the latter the metadata element in the application schema information section is constrained by the query concept. The filter is the input of the GetRecord request to the Enhanced Cascading Catalogue. The catalogue discovers that the filter contains a constraint on the content of the data source. It uses the query concept to get a list of all matching concepts from the Ontology-based Reasoner. It replaces the original query concept in the filter by a disjunction of all matching concepts. This filter is forwarded to the conventional catalogue, which performs an exact word-based match. The results of the GetRecord request are finally returned to the UI component. In the second step, the user wants to analyse the geodata he found with his query. The returned metadata documents contain a reference to a WFS to access the data. The data is encoded in GML. To get a description of the schema of the feature type, the UI invokes a DescribeFeatureType request and presents the GML Schema to the user. To help the user to interpret the schema, the UI also invokes the Ontology-based Reasoner to get the description of the concept defining this feature type. Because the concept is described with terms from the domain’s shared vocabulary, the user can select the correct properties he is interested in, i.e. the water level and the date/time of the measurement.
[Sequence diagram. Participants: UI, Ontology-based Reasoner, Enhanced Cascading Catalogue, Catalogue Service, Web Feature Service. Messages: DescribeSharedVocabulary(), GetRecord(ogcFilter), DefinedConceptQuery(), EnhanceFilter(ogcFilter), GetRecord(enhancedOgcFilter), DescribeFeatureType(), DescribeConcept(), CreateFilter(), GetFeature(ogcFilter). Note in the diagram: "The filter now contains all matching concepts from each application ontology."]

Figure 7: Information flow within the architecture for the motivating scenario.
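The retrieval step, in which the concept description helps the user pick the right properties, can be illustrated as follows. This is a rough sketch under several assumptions: the annotations of the ELWIS properties from Figure 5 are flattened into sets of shared-vocabulary terms, the helper names are hypothetical, and the filter is a bare OGC `PropertyIsEqualTo` element rather than a complete GetFeature request. The literal date is taken from the sample data in Figure 5.

```python
import xml.etree.ElementTree as ET

OGC_NS = "http://www.opengis.net/ogc"

# Shared-vocabulary terms used in the definition of each ELWIS property
# (flattened from Figure 5; a simplification of the concept definitions).
ELWIS_ANNOTATIONS = {
    "standort": {"Location"},
    "hoehe": {"Depth", "Meter", "Centi", "River"},
    "datum": {"Datestamp"},
    "uhrzeit": {"Timestamp"},
}


def properties_for(annotations, *terms):
    """Names of the properties whose annotation mentions all given terms."""
    wanted = set(terms)
    return [name for name, vocab in annotations.items() if wanted <= vocab]


def equals_filter(property_name, literal):
    """Build an <ogc:Filter> with a single PropertyIsEqualTo comparison."""
    root = ET.Element(f"{{{OGC_NS}}}Filter")
    comparison = ET.SubElement(root, f"{{{OGC_NS}}}PropertyIsEqualTo")
    ET.SubElement(comparison, f"{{{OGC_NS}}}PropertyName").text = property_name
    ET.SubElement(comparison, f"{{{OGC_NS}}}Literal").text = literal
    return root


water_level = properties_for(ELWIS_ANNOTATIONS, "Depth", "River")
date_props = properties_for(ELWIS_ANNOTATIONS, "Datestamp")
ogc_filter = equals_filter(date_props[0], "12.11.2003")

print(water_level, date_props)
print(ET.tostring(ogc_filter, encoding="unicode"))
```

The user never has to guess that `hoehe` carries the water level: the property is found through the vocabulary terms it was annotated with.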
6 CONCLUSION AND FUTURE WORK
We have presented an approach and architecture for ontology-based discovery and retrieval of geographic information that can contribute to solving existing problems of semantic heterogeneity. The tested scenario comprises information items with simple structures. Future tests of the architecture will include more complex application schemas and examples from other domains.

The presented architecture is component-based, i.e. it is extendable in various directions. So far, the Enhanced Cascading Catalogue Service and the Reasoner component are tightly coupled in the architecture. However, the standardized interfaces allow the architecture to be extended with multiple and exchangeable components. It is also planned to extend the architecture with modules for spatial and temporal reasoning (Vögele, 2003) as well as gazetteer services.

Also, in a future version of the architecture, the tasks of discovery and retrieval will be combined in one query. The user will then be able to formulate his actual question straight away (i.e. without having to perform a query on the metadata first) using terms from the familiar shared vocabularies. The discovery and the filter formulation for retrieval will then be automated within the system. This “intelligent” query capability will enhance the usability of existing geographic information even further.

ACKNOWLEDGEMENTS
We want to thank Sören Haubrock from Delphi IMM for providing the implementation of the WFS interfaces for the use case. The work presented in this paper has been supported by the German Federal Ministry for Education and Research as part of the GEOTECHNOLOGIEN program (grant number 03F0369A). It can be referenced as publication no. GEOTECH-50.

REFERENCES
Beck, H. & H. S. Pinto (2002): Overview of Approach, Methodologies, Standards and Tools for Ontologies.
Bernstein, A. & M. Klein (2002): Towards High-Precision Service Retrieval. In Horrocks, I. & J. Hendler (eds.): The International Semantic Web Conference (Lecture Notes in Computer Science): 84-101.
Bishr, Y. (1998): Overcoming the semantic and other barriers to GIS interoperability. International Journal of Geographical Information Science 12 (4): 299-314.
Bishr, Y. & M. Radwan (2000): GDI Architectures. In Groot, R. & J. McLaughlin (eds.): Geospatial Data Infrastructures. Concepts, Cases, and Good Practice. Oxford, Oxford University Press: 135-150.
Brox, C., Y. Bishr, W. Kuhn, K. Senkler & K. Zens (2002): Toward a Geospatial Data Infrastructure for Northrhine-Westphalia. Computer, Environment and Urban Systems 26: 19-37.
Gamma, E., R. Helm, R. Johnson & J. Vlissides (1995): Design Patterns: elements of reusable object-oriented software. Boston, MA, USA, Addison-Wesley.
GDI-NRW (2002): Catalog Services für GeoDaten und GeoServices, Version 1.0. International Organization for Standardization & OpenGIS Consortium.
Gruber, T. R. [ed.] (1993): Toward Principles for the Design of Ontologies Used for Knowledge Sharing. Kluwer Academic Publishers (Formal Ontology in Conceptual Analysis and Knowledge Representation).
Haarslev, V. & R. Möller (2001): Description of the RACER System and its Applications. International Workshop on Description Logics (DL-2001).
Harvey, F., W. Kuhn, H. Pundt, Y. Bishr & C. Riedemann (1999): Semantic Interoperability: A central issue for sharing geographic information. The Annals of Regional Science 33: 213-232.
Horrocks, I., U. Sattler & S. Tobies (2000): Reasoning with Individuals for the Description Logic SHIQ. In: McAllester, D. (ed.): 17th International Conference on Automated Deduction (CADE-17) (Lecture Notes in Computer Science): 482-496.
OGC (1999): Topic 14: Semantics and Information Communities (Version 4), Open GIS Consortium.
OGC (2002): Web Feature Service Implementation Specification (OGC 02-058), Open GIS Consortium.
Schuster, G. & H. Stuckenschmidt (2001): Building shared ontologies for terminology integration. In: Stumme, G., A. Maedche & S. Staab (eds.): KI-01 Workshop on Ontologies.
Stuckenschmidt, H. (2002): Ontology-Based Information Sharing in Weakly Structured Environments. PhD Thesis, Vrije Universiteit Amsterdam: Amsterdam.
Studer, R., V. R. Benjamins & D. Fensel (1998): Knowledge Engineering: Principles and Methods. Data and Knowledge Engineering 25 (1-2): 161-197.
Uschold, M. (1998): Knowledge level modelling: concepts and terminology. The Knowledge Engineering Review 13 (1): 5-29.
Visser, U. & H. Stuckenschmidt (2002): Interoperability in GIS - Enabling Technologies. In: Ruiz, M., M. Gould & J. Ramon (eds.): 5th AGILE Conference on Geographic Information Science: 291-297.
Vögele, T., S. Hübner & G. Schuster (2003): BUSTER - An Information Broker for the Semantic Web. Künstliche Intelligenz 3 ("Semantic Web"): 31-34.
Wache, H., T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann & S. Hübner (2001): Ontology-Based Integration of Information - A Survey of Existing Approaches. IJCAI-01 Workshop: Ontologies and Information Sharing: 108-117.
Chapter VI
Ontology-based Retrieval of Geographic Information

Lutz, M. & Klien, E. (2006): Ontology-based Retrieval of Geographic Information, International Journal of Geographical Information Science (in press).26

Abstract. Discovering and accessing suitable geographic information (GI) in the open and distributed environments of current Spatial Data Infrastructures (SDIs) is a crucial task. Catalogues provide searchable repositories of information descriptions, but the mechanisms to support the tasks of GI retrieval are still insufficient. Problems of semantic heterogeneity caused by synonyms and homonyms can arise during free-text search in catalogues. Moreover, once a suitable data source is discovered and accessed, the property names of a feature are often difficult to interpret. In this paper we present an approach to ontology-based GI retrieval that contributes to solving existing problems of semantic heterogeneity and hides most of the complexity of the required procedure from the requester. A query language and graphical user interface allow a requester to intuitively formulate a query using a well-known domain vocabulary. The approach is implemented through several components that can be used as an extension to standard SDIs.
26 © by Taylor & Francis. Reprint permission granted 27 October, 2005. See http://www.tandf.co.uk/journals/titles/13658816.asp.
Chapter VI. Ontology-based Retrieval of Geographic Information
Ontology-Based Retrieval of Geographic Information1

Michael Lutz and Eva Klien
Institut für Geoinformatik (IfGI)
Westfälische Wilhelms-Universität Münster
Robert-Koch-Str. 26-28, 48149 Münster, Germany
{m.lutz|klien}@uni-muenster.de
Abstract: Discovering and accessing suitable geographic information (GI) in the open and distributed environments of current Spatial Data Infrastructures (SDIs) is a crucial task. Catalogues provide searchable repositories of information descriptions, but the mechanisms to support GI retrieval are still insufficient. Problems of semantic heterogeneity caused by the ambiguity of natural language can arise during keyword-based search in catalogues and when formulating a query to access the discovered data. In this paper we present an approach to ontology-based GI retrieval that contributes to solving existing problems of semantic heterogeneity and hides most of the complexity of the required procedure from the requester. A query language and graphical user interface allow a requester to intuitively formulate a query using a well-known domain vocabulary. From this query, an ontology concept is derived, which is then used to search a catalogue for a data source that provides all the information required to answer the requester's query. If a suitable data source is discovered, the relevant data is accessed through a standardised interface. The approach is implemented through several components that can be used as an extension to standard SDIs.
1 Introduction

Efficient retrieval of distributed geographic information (GI) is a key factor in planning and decision-making in a variety of domains. The specifications provided by the Open Geospatial Consortium (OGC) enable syntactic interoperability and cataloguing of GI. However, a number of problems caused by semantic heterogeneity still present challenges for GI retrieval in the open and distributed environments of Spatial Data Infrastructures (SDIs). One possible approach to overcome these problems is the explication of knowledge by means of ontologies (Wache et al. 2001), i.e. explicit formal specifications of shared conceptualizations (Gruber 1995, Studer et al. 1998).

In (Klien et al. 2004) we have presented an approach and architecture for ontology-based discovery and retrieval of geographic information that overcomes some problems caused by semantic heterogeneity. It supports a requester in formulating queries for (1) finding one GI source in a catalogue service that provides all the information required to solve the requester's problem, and for (2) retrieving the discovered information through a Web Feature Service (WFS). However, the support for retrieval in this approach was limited. The requester was provided with an ontological description of the selected feature type and its properties. Based on this description, it was then up to the requester to interpret the meaning of the feature type's property names and to select the appropriate ones for formulating a WFS query.

In this paper we present an extension of the original approach that makes the overall task of GI retrieval more user-friendly by hiding the processes of discovering an appropriate WFS and of formulating the actual WFS query. Thus, the requester's only task is to formulate one query statement using terms from existing ontologies, from which both the catalogue query and the WFS query can automatically be derived.

The remainder of the paper is structured as follows. Section 2 introduces a motivating example for our work and gives a detailed description of the types of semantic heterogeneity problems that may occur during the process of GI retrieval. The theoretical basis for the presented approach is described in section 3, which introduces guidelines for ontology development, and section 4, which introduces our method for ontology-based discovery of geographic information. In section 5, the approach to ontology-based retrieval is presented, including a description of the workflow, the query languages, the user interface and an example walk-through illustrating a prototypical implementation. The work is compared to related work in section 6; a conclusion and pointers to future work are given in section 7.

1 This is the text of the revised version of the manuscript, which has been accepted for publication in the International Journal of Geographical Information Science. The article will appear in early 2006.
2 Semantic Heterogeneity Problems in GI Discovery and Retrieval

Emerging SDIs facilitate the discovery and retrieval of distributed geographic information. However, some key problems caused by semantic heterogeneity remain to be solved. In this section, we present mechanisms and components for GI discovery and retrieval that already exist in current SDIs (section 2.1). We then introduce a running example that we use throughout the paper (section 2.2) and point out problems caused by semantic heterogeneity in this example (section 2.3).

2.1 GI Discovery and Retrieval in Current Spatial Data Infrastructures

A main motivation for setting up SDIs is to make the work with geodata more efficient (McKee 2000, Nebert 2001) by addressing problems that occur with conventional GIS technology and geographic data sets. For the work presented here, the most crucial of these problems are that data sets exist in a plethora of different data formats, are stored in a variety of different systems and that both are often not sufficiently or not at all documented. There are two main standardisation efforts in the geospatial domain, whose goal is to overcome these problems: the ISO Technical Committee (TC) 211, which develops the 19100 series of standards, and the Open Geospatial Consortium (OGC).

GI Discovery in SDIs. In an SDI context, clients and data sources are usually arbitrarily distributed in large networks and unaware of each other. In such a scenario, missing or insufficient documentation makes it difficult or even impossible for users to discover data sets and to assess whether a given data set is useful for their tasks. Catalogues are used to solve these problems, which makes them a fundamental part of SDIs. They allow a client (service consumer) to find spatial resources (data and services) that fit the client's needs and are available on servers (service providers) unknown to the client.
Service providers offer particular data access and geoprocessing (data manipulation) services. Both types of spatial resources are described by metadata. The catalogue itself consists of the metadata and the operations on these metadata. In general each service provider has to register (publish) his offerings by means of metadata to
a catalogue to enable accessibility. A catalogue may also collect metadata from known service providers (pull). In addition to these registration functions a catalogue provides “library functions” (discovery, browsing, querying) for service consumers. GI Retrieval in SDIs. The heterogeneity of formats and systems makes it difficult to pose queries and to use data sets in one’s own system and usually requires some form of conversion. In SDIs, this problem is addressed by standardising service interfaces and exchange formats. The standardisation of interfaces allows the classification of services in well-known service types that provide the behaviour specified by the interface. Thus, it becomes possible to connect arbitrary service instances as long as they are of a well-known service type. The specification of the Web Feature Service (WFS) (OGC 2002b) proposes interfaces for describing data manipulation operations on geographic features. Among other things, the WFS interface provides the ability to retrieve features based on spatial and non-spatial constraints (through the GetFeature operation) and to generate a schema description of the feature types provided by a WFS implementation (through the DescribeFeatureType operation). The definition of the Geography Markup Language (GML) (OGC 2003) as an open, vendor-neutral XML encoding for the definition of geospatial application schemas and objects increases the ability of organisations to share geographic information. GML provides a variety of kinds of objects for describing geography including features. A feature is an abstraction of a real world phenomenon; it is a geographic feature if it is associated with a location relative to the Earth. The number of properties a feature may have, together with their names and types, are determined by its type definition. 
2.2 A Running Example

Throughout this paper we use a running example to point out semantic heterogeneity problems that can occur when using state-of-the-art GI retrieval mechanisms and to illustrate how these problems can be solved using our proposed approach. Our work, however, is not restricted to this example but is designed to be independent of a particular GI domain.

Susan is a hydrologist who is interested in water levels of the Elbe River. As an expert in the field she knows the existing control points in the river. She wants to know the measurement of the water level at a specific control point at a specified time. Since Susan does not know about an existing WFS offering this kind of information, she makes use of an OGC-compliant catalogue in order to find appropriate information for answering her question: "What is the water level at control point X at time Y in the Elbe River?" Note that in this paper, we always assume that a requester searches for only one source that provides all the required information.

There are several agencies that offer information about water levels in the Elbe River:
• The Federal Agency for Hydrology (Bundesanstalt für Gewässerkunde, BafG)
• The Electronic Information System for Waterways (Elektronisches Wasserstraßen-Informationssystem, ELWIS)
• The Czech Hydrometeorological Institute (CHMI)

While these agencies currently only provide their data as HTML pages, the information can easily be parsed and provided through standardized WFS interfaces. The access points to these services are provided at http://www.meanings.de/. Table 1 lists the names of the GML features returned by these WFS and their property names.
Table 1. Names of the GML features returned by three WFS and their properties

WFS | BafG | ELWIS | CHMI
Feature | Pegelmessung | WasserstandMessung | StavVody
Name of the control point | name | pegel | stanice
Water level in m (BafG) or cm (ELWIS and CHMI) | wasserstand_m | hoehe | stav
Date and time of the measurement | zeitpunkt | - | datum
Date of the measurement | - | datum | -
Time of the measurement | - | uhrzeit | -
Geometry as Point | gml:pointProperty | standort | gml:position
Name of the river | - | - | tok
Discharge in cubic meters per second | - | - | prutok
In order to obtain the required information Susan has to perform two tasks: first she has to find a WFS that provides suitable information to answer this question (GI discovery) and second she has to formulate a WFS query filter for accessing the data she needs (GI retrieval).

2.3 Problems Caused by Semantic Heterogeneity

In current standards-based catalogues (e.g. GDI-NRW 2002), users can formulate queries using keywords and/or spatial filters. The metadata fields that can be included in the query depend on the metadata schema used (e.g. ISO 19115) and on the query functionality of the service that is used for accessing the metadata. Even though natural language processing techniques can increase the semantic relevance of search results with respect to the search request (e.g. Richardson and Smeaton 1995), keyword-based techniques are inherently restricted by the ambiguities of natural language. If providers and requesters use different terminology, keyword-based search can have low recall, i.e. not all relevant information sources are discovered. Precision can also be low, i.e. some of the discovered services are not relevant, either because terms are homonymous or because the ability to express complex queries in keyword-based search is limited (Bernstein and Klein 2002).

For example, if Susan, who is interested in the measurement of the water level at a specific control point of the river Elbe, uses "water level" as a keyword, she may fail to find the existing WFS that offer this information (low recall), because their metadata descriptions use slightly different terminology, e.g. "depth" or "watermark" (table 2). Furthermore, she might also discover GI that is annotated with this keyword but not appropriate for answering her question (low precision), e.g. a service providing groundwater rather than surface water levels.
Table 2. Keywords in the metadata used to describe the three WFS

WFS | Keywords in the Metadata
BafG | water level, measurement, Elbe
ELWIS | control point, tide scale, river, depth
CHMI | watermark, measurement gauge, Elbe
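The recall problem described for Susan's keyword search can be made concrete with a minimal sketch. It assumes exact keyword matching against the entries of table 2 and treats all three WFS as relevant to her question; the function names are illustrative and not part of any catalogue interface.

```python
# Keywords from Table 2. All three services are assumed to be relevant
# to Susan's question about water levels in the Elbe River.
METADATA_KEYWORDS = {
    "BafG": {"water level", "measurement", "Elbe"},
    "ELWIS": {"control point", "tide scale", "river", "depth"},
    "CHMI": {"watermark", "measurement gauge", "Elbe"},
}
RELEVANT = {"BafG", "ELWIS", "CHMI"}


def keyword_search(term):
    """Return the services whose metadata contain exactly this keyword."""
    return {name for name, keywords in METADATA_KEYWORDS.items() if term in keywords}


def recall(found, relevant):
    """Fraction of the relevant services that were found."""
    return len(found & relevant) / len(relevant)


def precision(found, relevant):
    """Fraction of the found services that are relevant."""
    return len(found & relevant) / len(found) if found else 0.0


hits = keyword_search("water level")
print(sorted(hits), recall(hits, RELEVANT), precision(hits, RELEVANT))
```

Under these assumptions only the BafG service is found, so two of the three relevant services are missed even though each of them provides water levels.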
Another major difficulty can arise during the second task when Susan wants to access GI via one of the discovered WFS. The DescribeFeatureType request (OGC 2002b) returns the application schema for the feature type, which is essential for formulating a query filter. Susan now runs into trouble if the property names are not intuitively interpretable. For example, she can only guess that the property names “hoehe” (ELWIS) or “wasserstand_m” (BafG) (see table 1) both refer to the measurement of the water level in a river. Also, it is not obvious that the first measurements are given in centimeters while the second measurements are given in meters. In this scenario it might be sufficient to offer Susan a natural language description for each property. However, our work is aiming at automating the process of discovery and retrieval and this makes a machine-interpretable description of the properties indispensable.
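To illustrate why the property names matter, the following sketch shows roughly what Susan would have to assemble by hand for the ELWIS service: a WFS 1.0.0 GetFeature request in key-value encoding with an OGC filter on the property "pegel". The endpoint URL and the control point name are hypothetical placeholders; only the feature type and property names are taken from table 1.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Hypothetical endpoint: the actual service address is not given in the text.
ELWIS_WFS_URL = "http://example.org/elwis/wfs"


def build_getfeature_url(base_url, type_name, property_name, literal):
    """Build a WFS 1.0.0 GetFeature URL (KVP encoding) with an equality filter."""
    filter_xml = (
        '<ogc:Filter xmlns:ogc="http://www.opengis.net/ogc">'
        "<ogc:PropertyIsEqualTo>"
        f"<ogc:PropertyName>{escape(property_name)}</ogc:PropertyName>"
        f"<ogc:Literal>{escape(literal)}</ogc:Literal>"
        "</ogc:PropertyIsEqualTo>"
        "</ogc:Filter>"
    )
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.0.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "FILTER": filter_xml,
    }
    return base_url + "?" + urlencode(params)


# "Dresden" is an assumed control point name used only for illustration.
url = build_getfeature_url(ELWIS_WFS_URL, "WasserstandMessung", "pegel", "Dresden")
print(url)
```

Writing this request presupposes that Susan already knows that "pegel" holds the control point name and that "hoehe" holds the water level in centimeters, which is exactly the knowledge the schema alone does not convey.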
3 Semantic Descriptions of Geographic Information

To overcome the problems described above, we propose to use ontological descriptions of information sources. An ontology is an explicit formal specification of a shared conceptualization (Gruber 1995, Studer et al. 1998), where a conceptualization can be defined as a way of thinking about some domain (Uschold 1998). By using ontologies to enrich the description of information sources, the semantics of their content become machine-interpretable, and users are enabled to pose concise and expressive queries. Furthermore, logical reasoning can be used to discover implicit relationships between search terms and information descriptions as well as to flexibly construct taxonomies for classifying information sources.

In this section, we first introduce the overall approach to building ontologies in section 3.1. We then introduce guidelines for building domain and application ontologies (sections 3.2 and 3.3). We conclude the section by introducing the link between the schema of a feature type and its application concept through registration mappings in section 3.4.

3.1 Ontology Approach

The ontologies used for making the semantics of information sources explicit can be organized in different ways. In (Wache et al. 2001), a classification of different ontology architectures for data integration is introduced. In the following, we translate the different approaches into the domain of GI discovery.

In multiple ontology approaches, each information source or query is described by its own local (or application) ontology. In principle, each of these local ontologies can be a combination of several other ontologies. However, it cannot be assumed that several local ontologies share the same vocabulary. This lack of a common vocabulary makes it difficult to compare different application ontologies.
In contrast, single ontology approaches use one global ontology, which provides a shared vocabulary for specifying the semantics of all information sources and queries. Such approaches can be applied to problems where all semantic descriptions available in a catalogue have been created with a very similar view on a domain, which also has to be shared by all requesters.

Hybrid approaches also use a global shared vocabulary, which contains basic terms (the primitives) of a domain. These can be combined to describe the (more complex) semantics of each information source or query in separate application ontologies. In contrast to multiple ontology approaches, the concepts in these application ontologies remain comparable, because they are based on the primitives from the shared vocabulary.

In both the single ontology and the hybrid approach, it is assumed that the semantics of the primitives is understood (and this understanding is shared) by all requesters and providers in the domain. Therefore, the primitives require no further formal definitions. Nevertheless, it can sometimes be useful to represent a shared vocabulary as an ontology in order to impose a structure on it. Such an ontology is called a domain ontology.

In our work we adopt the hybrid ontology approach. This allows providers and requesters to flexibly build application ontologies. At the same time, the application ontologies remain comparable, which is a crucial prerequisite for matching queries to advertisements. We consider the shared vocabulary to consist of several domain ontologies, each of which describes concepts and relations in a particular domain of interest. Not all of these domain ontologies have to be on the same level of abstraction. Also, it is possible for ontologies on a more specific level to include concepts from ontologies on a more abstract level.
In the running example, two domain ontologies describe the more abstract domain of measurements and the more specific domain of hydrology. The specialized application ontologies, which use concepts and relations from one or several domain ontologies, are used for annotating specific information sources such as the ELWIS, BafG and CHMI WFS (figure 1).

[Figure 1: the shared vocabulary (domain ontologies "Measurements" and "Hydrology") provides basic concepts and relations for specifying the application ontologies "ELWIS", "BafG" and "CHMI", which are used for semantic annotations of the ELWIS, BafG and CHMI WFS.]

Figure 1. The hybrid ontology approach, modified from (Wache et al. 2001)
The ontologies shown in this paper are expressed using the Description Logic (DL) (Baader and Nutt 2003) notation of the RACER system (Haarslev and Möller 2004). DLs are a family of knowledge representation languages that are subsets of first-order logic (for a mapping from DL to FOL, see e.g. (Sattler et al. 2003)). They provide the basis for the Web Ontology Language (OWL), the proposed standard language for the Semantic Web (Antoniou and Van Harmelen 2003).
The basic syntactic building blocks of a DL are atomic concepts (unary predicates), atomic roles (binary predicates), and individuals (constants). The expressive power of DL languages is restricted to a small set of constructors for building complex concepts and roles. Implicit knowledge about concepts and individuals can be inferred automatically with the help of inference procedures (Baader and Nutt 2003). A DL knowledge base consists of a TBox containing intensional knowledge (declarations that describe general properties of concepts) and an ABox containing extensional knowledge that is specific to the individuals of the domain. In our work, we only use TBox language features, namely
• concept definition: (define-concept C D),
• concept inclusion: (implies C D), and
• role definition: (define-primitive-role R :parent P :domain C :range D).

The domain of a role is a concept describing the set of all things from which this role can originate. This notion of the term should not be confused with the notion "domain of interest" (as in domain ontology). The range of a role is a concept describing the set of all things the role can lead to. Concepts can be defined using the following constructors:

D → (and E F)                        (intersection)
    (or E F)                         (union)
    (all R C)                        (value restriction)
    (some R C)                       (existential quantification)
    (at-least|at-most|exactly n R)   (number restrictions)
One major advantage of (simple) DLs (like the one employed in this paper) over FOL is that their inference procedures are decidable (Sattler et al. 2003). Of the available inference procedures, the possibility to compute subsumption relationships between concepts is of special importance for our work. Popular DL reasoners include RACER (Haarslev and Möller 2004) and Pellet (http://www.mindswap.org/2003/pellet/). For a more detailed introduction to DL languages including different subsumption algorithms see (Baader and Nutt 2003).

3.2 Creating Domain Ontologies

It has been suggested that relationships or roles play a central part in ontology engineering (Hart et al. 2004). Rather than building simple concept hierarchies, as is common practice in many existing approaches to semantic information discovery (e.g. Paolucci et al. 2002, Stuckenschmidt et al. 2004, Vögele and Spittel 2004), we suggest that roles be used wherever possible for defining concepts. Using this approach, ontology engineers can build richer ontologies, which contain not only taxonomic but also non-taxonomic relationships. Thus, a concept does not have to be given a fixed position in a static hierarchy. Rather, its position in the hierarchy can be dynamically inferred based on existing concept and role definitions using subsumption reasoning.

Based on these assumptions, we suggest a few guidelines that are meant to facilitate and (to some degree) standardise the development of domain ontologies. They will be illustrated using examples from the domain ontologies shown in figure 2. For readers who are unfamiliar with the DL notation used, a schematic representation of the ontologies is given in figure 3.
MEASUREMENTS

Concept Definitions
(define-concept Measurement
  (and (at-least 1 quantityResult)
       (exactly 1 location)
       (exactly 1 timeStamp)))
(define-concept Quantity
  (and (exactly 1 observable)
       (exactly 1 value)
       (exactly 1 unitOfMeasure)))
(implies Depth Phenomenon)
(implies Centimeter Unit)

Ranges and Domains of Roles
(define-primitive-role quantityResult :domain Measurement :range Quantity)
(define-primitive-role location :range gml_Point)
(define-primitive-role timeStamp :range (or xsd_Date xsd_DateTime))
(define-primitive-role value :domain Quantity :range xsd_Decimal)
(define-primitive-role unitOfMeasure :domain Quantity :range Unit)
(define-primitive-role observable :domain Quantity :range Phenomenon)

HYDROLOGY

Concept Definitions
(define-concept HydrologicalQuantity
  (and Quantity
       (all observable HydrologicalPhenomenon)
       (exactly 1 observedWaterBody)))
(implies HydrologicalPhenomenon Phenomenon)
(implies WaterLevel (and Depth HydrologicalPhenomenon))
(implies Discharge HydrologicalPhenomenon)
(implies Lake WaterBody)
(implies River WaterBody)

Ranges and Domains of Roles
(define-primitive-role observedWaterBody :domain HydrologicalQuantity :range WaterBody)
Figure 2. Examples for role and concept definitions from the domains of Measurements and Hydrology
Figure 3. Schematic representation of the DL definitions given in Figure 2
• In order to link roles and concepts, the ranges and, if possible, domains of roles should be defined. For example, the domain of the role observable is restricted to the concept Quantity and its range to the concept Phenomenon. If a role's range has been defined, it is sufficient to state in a concept definition that the concept has this role. The range does not have to be specified again (unless it is to be restricted within the scope of that concept). If a role's domain has been defined and the role is used in some concept definition, it can be inferred that the concept is a subconcept of the role's domain. For this reason, domains should not be specified if a role might also be used with other concepts. For example, the domain of the role location might include other concepts than Measurement (i.e. all concepts that have a location) and therefore is left undefined.
• As the relationships between concepts and roles have already been established by defining the ranges and domains of roles, concept definitions can be kept relatively simple. Peripheral concepts can be derived as subconcepts from other concepts to form simple hierarchies, e.g. to state that a Depth is a kind of Phenomenon. However, this should not be done for terms central to the domain, such as Measurement or Quantity in the domain of measurements. Central concepts should be defined using (a) value and (b) cardinality restrictions on existing roles. These restrictions must at least be sufficient to distinguish the concept from all other concepts in the domain. Ideally, they should also restrict possible interpretations of the defined concept to the intended interpretation.
  (a) Value restrictions are required to further constrain the range of a role for instances of the given concept. For example, instances of HydrologicalQuantity are only allowed to have instances of HydrologicalPhenomenon (rather than of any Phenomenon) as an observable. The value restrictions have to be consistent with the overall range definitions, i.e. HydrologicalPhenomenon has to be subsumed by Phenomenon.
  (b) Cardinality restrictions limit the number of occurrences of the restricted role in the given concept. For example, instances of Measurement must have at least one quantityResult and exactly one location and timeStamp.
• Where possible, the ranges of roles should be mapped to XML schema datatypes (W3C 2001) such as string or decimal or simple GML geometry types (OGC 2003) such as point or polygon (table 3). This ensures that value comparisons can be used and evaluated in the user's query statements. For example, the range of timeStamp is defined as being either an xsd_Date or an xsd_DateTime. We assume that the semantics (and syntax) of these datatypes are well-known and agreed-upon. Therefore, for each XML schema datatype one equivalent concept is introduced (in specific XML datatypes and GML geometry types domain ontologies) without any further definitions.
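The consistency requirement in guideline (a) can be sketched in a few lines. The example below is not a DL reasoner: it only follows the told (implies ...) axioms between atomic concepts from figure 2 and checks that the filler of a value restriction lies below the declared range of the role. The dictionary layout and function names are assumptions made for illustration.

```python
# Told subsumptions, copied from the (implies ...) axioms in figure 2.
IMPLIES = {
    "Depth": {"Phenomenon"},
    "Centimeter": {"Unit"},
    "HydrologicalPhenomenon": {"Phenomenon"},
    "WaterLevel": {"Depth", "HydrologicalPhenomenon"},
    "Discharge": {"HydrologicalPhenomenon"},
    "Lake": {"WaterBody"},
    "River": {"WaterBody"},
}

# Declared ranges of roles, copied from the role definitions in figure 2.
ROLE_RANGE = {
    "observable": "Phenomenon",
    "unitOfMeasure": "Unit",
    "observedWaterBody": "WaterBody",
}


def ancestors(concept):
    """All atomic concepts reachable from `concept` via told subsumptions."""
    seen, stack = set(), [concept]
    while stack:
        for parent in IMPLIES.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subsumed_by(sub, sup):
    """True if `sub` is `sup` or a told descendant of it."""
    return sub == sup or sup in ancestors(sub)


def restriction_is_consistent(role, filler):
    """Check that (all role filler) stays within the declared range of the role."""
    return is_subsumed_by(filler, ROLE_RANGE[role])


print(restriction_is_consistent("observable", "HydrologicalPhenomenon"))
print(restriction_is_consistent("observable", "River"))
```

A full reasoner such as RACER would also take defined concepts and role restrictions into account; the sketch only covers the atomic hierarchy.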
Table 3. XML schema datatypes and GML geometry types

XML schema datatypes (primitive) | string, boolean, decimal, float, double, duration, dateTime, time, date, …
XML schema datatypes (derived) | normalizedString, token, language, Name, NCName, ID, …, integer, long, int, short, byte, nonNegativeInteger, …
GML geometry types | PointType, LineStringType, PolygonType, CurveType, ArcType, CircleType
3.3 Creating Application Ontologies

The focus of application ontologies is on describing a concept that represents a geographic feature type. This feature type concept is defined by referring to and further restricting existing concepts and roles from the domain ontology. In our example, one concept is defined for each of the feature types provided by the three WFSs (table 1). The creation of application ontologies should follow the same guidelines as those described above for domain ontologies. Additionally, the following guidelines apply:
• The purpose of the application ontology is to represent the feature type's semantics rather than to capture its application schema. Therefore, application concepts should be derived (as subconcepts) from existing domain concepts. This strategy ensures that not only explicit information (i.e. what is represented in the schema) but also implicit information (such as the kind of observable or the unit of measurement in our example) is included in the concept definition.
• When application concepts are derived from a domain superconcept, their definitions only have to include (a) axioms that further restrict the superconcept's definition or (b) additional roles.
  (a) The axioms can be (all-quantified) value restrictions, which further constrain the range of a role, or cardinality restrictions, which further constrain a role's cardinality.
  (b) Typically, application ontologies will contain few definitions of new roles.
• Additional roles should be introduced to express application-specific relationships that do not exist in the domain ontology. These roles, however, cannot be used by requesters in their queries and therefore are not useful in GI discovery. Of course, it is always possible for a provider to include the new role(s) in the definition of a new domain ontology if he considers them to be relevant for more than just the application at hand.
• Subroles of existing roles from the domain ontology should be introduced if the domain role would otherwise occur several times with different ranges in a concept definition. If R1 is a super-role of R2, then for all pairs of individuals between which R2 holds, R1 must hold too (Haarslev and Möller 2004). This means that the range and/or the domain of sub-roles are more restricted than
those of the super-roles. The introduction of subroles with different ranges introduces an additional layer of granularity that makes it possible to distinguish each occurrence of a role in a concept definition. This is a requirement of the registration mapping approach (section 3.4).

To illustrate these guidelines, let us consider the concept elwis_Measurement, which represents the ELWIS feature type (figure 4). It is defined as a specific kind of Measurement (the domain superconcept) with exactly 1 quantityResult (cardinality restriction) measuring a WaterLevel in a River in Centimeter (value restrictions). Its timeStamp is given as an xsd_Date (value restriction) and it includes the additional roles elwis_timeOfDay and name. The definition of the concept representing the BafG feature type is very similar. In contrast, the concept representing the CHMI feature type differs considerably from the other two. As the CHMI feature type contains two measurement values, one for a water level and one for the discharge, the corresponding concept definition would have to include two quantityResult roles with different ranges. In order to distinguish between both roles, two subroles, chmi_qRWaterLevel and chmi_qRDischarge, are introduced. Both roles are derived from the parent role quantityResult and have different range restrictions.

ELWIS

Concept Definitions
(define-concept elwis_Measurement
  (and Measurement
       (exactly 1 quantityResult)
       (all quantityResult
            (and (all unitOfMeasure Centimeter)
                 (all observable WaterLevel)
                 (all observedWaterBody River)))
       (all timeStamp xsd_Date)
       (exactly 1 elwis_timeOfDay)
       (exactly 1 name)))

Role Definitions
(define-primitive-role elwis_timeOfDay :range xsd_Time)

BafG

Concept Definitions
(define-concept bafg_Measurement
  (and Measurement
       (exactly 1 quantityResult)
       (all quantityResult
            (and (all unitOfMeasure Meter)
                 (all observable WaterLevel)
                 (all observedWaterBody River)))
       (all timeStamp xsd_DateTime)
       (exactly 1 name)))

CHMI

Concept Definitions
(define-concept chmi_Measurement
  (and Measurement
       (exactly 1 chmi_qRWaterLevel)
       (exactly 1 chmi_qRDischarge)
       (all timeStamp xsd_DateTime)
       (exactly 1 name)))

Role Definitions
(define-primitive-role chmi_qRWaterLevel :parent quantityResult
  :range (and (all unitOfMeasure Centimeter)
              (all observable WaterLevel)
              (all observedWaterBody (and River (exactly 1 name)))))
(define-primitive-role chmi_qRDischarge :parent quantityResult
  :range (and (all unitOfMeasure CubicMeter)
              (all observable Discharge)
              (all observedWaterBody (and River (exactly 1 name)))))
Figure 4. Examples for defining application concepts and roles
3.4 Registration Mappings
For ontology-based GI discovery it is sufficient to describe and reason about application concepts that represent feature types. For ontology-based GI retrieval, however, more specific information on the feature type's structure is required. To describe the relationships between feature type structure and application ontology, we have adopted the notion of registration mappings suggested in (Bowers and Ludäscher 2004). An example registration mapping for the CHMI feature type is shown in figure 5.

<StavVody>
  <gml:position>
    <gml:Point>
      <gml:coordinates>859015.7375721685,5624676.8195826</gml:coordinates>
    </gml:Point>
  </gml:position>
  <tok>LABE</tok>
  <stanice>USTI N.L.</stanice>
  <stav>151</stav>
  <prutok>98.9</prutok>
  <datum>2003-11-02T07:00:00</datum>
</StavVody>

Structural Path                        Conceptual Path
/StavVody                         ↔    chmi_Measurement
/StavVody/gml:position/gml:Point  ↔    chmi_Measurement.location
/StavVody/tok/text()              ↔    chmi_Measurement.quantityResult.observedWaterBody.name
/StavVody/stanice/text()          ↔    chmi_Measurement.name
/StavVody/stav                    ↔    chmi_Measurement.chmi_qRWaterLevel
/StavVody/stav/text()             ↔    chmi_Measurement.chmi_qRWaterLevel.value
/StavVody/prutok                  ↔    chmi_Measurement.chmi_qRDischarge
/StavVody/prutok/text()           ↔    chmi_Measurement.chmi_qRDischarge.value
/StavVody/datum/text()            ↔    chmi_Measurement.timeStamp

Figure 5. Sample GML file (top) and registration mapping (bottom) for the CHMI feature type
The main idea of registration mappings is to have separate descriptions of the application concept C (called semantic type in (Bowers and Ludäscher 2004)) and of the structural details of the feature type it describes (called structural type). This has the following advantages:
• The application concept can be defined or updated after the service is deployed, without requiring changes to the structural type.
• The semantics of the feature type can be specified more accurately in application concepts if the specification does not try to mirror the feature type's structure. This is especially true for feature types that have a "flat" structure that does not reflect the conceptual model of the domain well (e.g. the property tok in figure 5, which represents the name of the river where the water level measurement was taken).

A registration mapping consists of a set of rules that define associations between a feature type's structural and semantic types. They can be used to derive data transformations or, in our case, to specify a query filter for a WFS query. The rules have the form q ↔ p, where q is a query expression that selects instances of the structural type to register to a concept denoted by the contextual path p. The structural-type query is defined for XML using a subset of XPath (W3C 1999). A query q is expressed using the syntax shown in figure 6, where n represents an element tag
name and v represents a text value. The expression /text(), which selects the PCDATA content of an element, usually maps to a contextual path whose range is an XML schema datatype. Taking an example from the CHMI feature type (table 1), /StavVody/stav/text() selects the content of the feature type's stav property (i.e. the result Susan is interested in).

A contextual path denotes a concept, possibly within the context of other concepts. It takes the form C.r1.r2. … .rn for n ≥ 0, where r1 to rn are valid properties defined for the semantic type of P. Taking an example from the domain ontology shown in figure 2, Measurement.quantityResult.unit is a contextual path, where the concept selected by the path is Unit within the context of a Measurement's result.

q := /p
p := n | p/p | p/text() | p[c]
c := p | p=v

Figure 6. Syntax of structural-type queries as defined in (Bowers and Ludäscher 2004)
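The way such rules associate parts of a feature with contextual paths can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the prototype: the sample document, the GML namespace URI and the helper names select and register are assumptions introduced here, and only the n, p/p and p/text() forms of the grammar are handled (predicates p[c] are left out).

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI and exact element layout of the sample feature.
NS = {"gml": "http://www.opengis.net/gml"}

SAMPLE = """
<StavVody xmlns:gml="http://www.opengis.net/gml">
  <gml:position><gml:Point>
    <gml:coordinates>859015.7375721685,5624676.8195826</gml:coordinates>
  </gml:Point></gml:position>
  <tok>LABE</tok>
  <stanice>USTI N.L.</stanice>
  <stav>151</stav>
  <prutok>98.9</prutok>
  <datum>2003-11-02T07:00:00</datum>
</StavVody>
"""

# Rules q <-> p taken from the registration mapping (text-valued rules only).
RULES = [
    ("/StavVody/tok/text()", "chmi_Measurement.quantityResult.observedWaterBody.name"),
    ("/StavVody/stanice/text()", "chmi_Measurement.name"),
    ("/StavVody/stav/text()", "chmi_Measurement.chmi_qRWaterLevel.value"),
    ("/StavVody/prutok/text()", "chmi_Measurement.chmi_qRDischarge.value"),
    ("/StavVody/datum/text()", "chmi_Measurement.timeStamp"),
]


def select(root, structural_path):
    """Evaluate a structural path (hypothetical helper, no predicates)."""
    want_text = structural_path.endswith("/text()")
    if want_text:
        structural_path = structural_path[: -len("/text()")]
    steps = structural_path.strip("/").split("/")
    if steps[0] != root.tag:
        return None
    # Remaining steps are resolved relative to the root element.
    node = root.find("/".join(steps[1:]), NS) if len(steps) > 1 else root
    if node is None:
        return None
    return node.text if want_text else node


def register(root, rules):
    """Map each contextual path to the value selected by its structural path."""
    return {conceptual: select(root, structural) for structural, conceptual in rules}


registered = register(ET.fromstring(SAMPLE.strip()), RULES)
for path, value in registered.items():
    print(path, "=", value)
```

Keeping the rules as plain data, separate from the code that evaluates them, mirrors the separation of structural and semantic type that the mapping is meant to provide.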
The registration mapping shown in figure 5 illustrates the need to introduce new roles in the application ontology in order to prevent multiple occurrences of the same role in a registration mapping (cf. section 3.3). The CHMI feature type contains measurement results for two observables (water level and discharge). Both results could be described using the domain role quantityResult. Then, however, both the stav and the prutok elements in the XML file would be mapped to the same conceptual path chmi_Measurement.quantityResult. By introducing two subroles of quantityResult (chmi_qRWaterLevel and chmi_qRDischarge), the resulting ambiguity can be prevented.

In some cases, the representation of the feature's geometry can also present a problem. In our example, we assume that Susan is familiar with simple GML geometry types such as the gml:Point element used in the CHMI feature type. Therefore, we map the whole element to the contextual path chmi_Measurement.location and leave the interpretation of that element to Susan. Specifying a more detailed mapping that also describes the coordinates of the point can be problematic in some cases, e.g. with the CHMI feature type. Here, both X and Y coordinates are provided as comma-separated values in a single coordinates element. To extract each coordinate value separately, more sophisticated XPath functions, e.g. substring-before() and substring-after(), are required.

In order to be able to use registration mappings in our approach, a machine-interpretable representation is also required. We have defined a simple XML schema for this purpose. Thus, the registration mapping can be stored in an XML file and put on a web server. The semantic annotation of a feature type is then simply implemented as a reference to this XML file in its ISO 19115 metadata document.
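The coordinate problem mentioned above can be made concrete with a small Python sketch. It mimics what the XPath functions substring-before() and substring-after() would do on the content of the coordinates element; the variable names are assumptions chosen for illustration, and the value is the one from the sample feature.

```python
# Content of the single coordinates element of the sample CHMI feature.
coordinates = "859015.7375721685,5624676.8195826"

# Equivalent of substring-before(., ',') and substring-after(., ','):
# split once at the first comma.
x_text, _, y_text = coordinates.partition(",")

x = float(x_text)
y = float(y_text)
print(x, y)
```

A registration mapping that exposed X and Y separately would need one such expression per coordinate, which is why the whole gml:Point element is mapped to a single contextual path instead.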
4 Ontology-Based GI Discovery
Our approach for ontology-based discovery of GI is based on semantic matchmaking between DL concepts representing geographic feature types (i.e. classes of geographic objects with common characteristics) on the one hand and the user's query on the other hand. Feature types are described by specific application concepts that are built using roles and concepts from a shared vocabulary as described in section 3.3. The user's query concept can either be a concept from an existing application ontology, or it can be defined based on terms from the shared vocabulary.
A terminological reasoning engine, in our case RACER (Haarslev and Möller 2004), is used to find out which of the application concepts are equal to or subsumed by (i.e. are more specific than) the query concept. All concepts for which this is the case are considered to be a match for the query. In this section, we show how ontology concepts can be used to enhance GI discovery in our motivating example (section 4.1) and we introduce the architecture that supports ontology-based discovery (section 4.2).

4.1 Ontology-based GI Discovery in the Running Example
In our example, for each of the feature types provided by a WFS a concept is defined (figure 4). All concepts are derived from the domain concept Measurement and introduce name as an additional property. However, the concepts differ in the restrictions they place on the range of the water level's unitOfMeasure (Meter for bafg_Measurement, Centimeter for the other two concepts). Also, some provide further application-specific properties (e.g. chmi_riverName for chmi_Measurement). Through their reference to the domain concept, all definitions also imply that the feature types provide at least one quantityResult and exactly one location and one timeStamp.

To illustrate our approach, we show three query concepts representing possible queries defined by Susan. In all queries, Susan is interested in feature types that have WaterLevel as an observable and provide a location, a timeStamp and a name (figure 7). While in the first query Susan does not require anything else, in her second query she wants the unit of measure to be Centimeter. In her third query, in addition to the water level in centimeters, she wants the measurement to also contain the discharge in cubic meters.

Query_1
  (define-concept Query_1
    (and (some quantityResult (all observable WaterLevel))
         (some location *top*)
         (some timeStamp *top*)
         (some name *top*)))

Query_2
  (define-concept Query_2
    (and (some quantityResult
               (and (all unitOfMeasure Centimeter)
                    (all observable WaterLevel)))
         (some location *top*)
         (some timeStamp *top*)
         (some name *top*)))

Query_3
  (define-concept Query_3
    (and (some quantityResult
               (and (all unitOfMeasure Centimeter)
                    (all observable WaterLevel)))
         (some quantityResult
               (and (all unitOfMeasure CubicMeter)
                    (all observable Discharge)))
         (some timeStamp *top*)
         (some name *top*)))

Figure 7. Examples for defining query concepts
A classification of the application and query concepts in RACER (figure 8) shows that all three feature type concepts are subsumed by the first query concept. Thus, in contrast to the keyword-based search, all services are correctly discovered. The second query concept only subsumes elwis_Measurement and chmi_Measurement, while the third query concept only subsumes chmi_Measurement. Again, this is the desired result as both feature types provide water level measurements in centimeters, but only chmi_Measurement also provides discharge measurements. This illustrates that compared to keyword queries the ontology-based approach can increase recall and precision.
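The matching outcome described above can be reproduced with a deliberately simplified Python sketch. It is not a DL reasoner and does not stand in for RACER: each feature type is reduced to the set of (observable, unit) pairs it provides, each query to the pairs it requires, and a feature type counts as a match when every requirement is met. The data structures and function names are assumptions made for this illustration.

```python
# Simplified stand-ins for the application concepts of figure 4:
# the (observable, unitOfMeasure) pairs each feature type provides.
FEATURE_TYPES = {
    "elwis_Measurement": {("WaterLevel", "Centimeter")},
    "bafg_Measurement": {("WaterLevel", "Meter")},
    "chmi_Measurement": {("WaterLevel", "Centimeter"), ("Discharge", "CubicMeter")},
}

# Simplified stand-ins for the query concepts of figure 7.
# A unit of None means that any unit is acceptable.
QUERIES = {
    "Query_1": [("WaterLevel", None)],
    "Query_2": [("WaterLevel", "Centimeter")],
    "Query_3": [("WaterLevel", "Centimeter"), ("Discharge", "CubicMeter")],
}


def satisfies(provided, requirement):
    """True if some provided result meets the required observable and unit."""
    observable, unit = requirement
    return any(o == observable and (unit is None or u == unit) for o, u in provided)


def matches(query_name):
    """Names of all feature type concepts that meet every requirement of the query."""
    requirements = QUERIES[query_name]
    return sorted(
        name
        for name, provided in FEATURE_TYPES.items()
        if all(satisfies(provided, r) for r in requirements)
    )


for query_name in QUERIES:
    print(query_name, "->", matches(query_name))
```

A keyword search has no notion of units or of several results per measurement, which is what this toy comparison captures; real subsumption reasoning additionally accounts for the role hierarchy and the domain ontology.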
Figure 8. Subsumption hierarchy including three query concepts and the application concepts for the three feature types introduced in section 2.2
4.2 Architecture for GI Discovery
In order to support the advanced query capabilities described above, some new service interfaces and information items are needed in addition to the well-known components of current SDIs, such as catalogue services. First, we have to provide the application ontologies. For each application schema offered via a WFS there is one application ontology described with the shared vocabulary of the corresponding domain ontologies (as described in section 3.3). These ontologies provide the formal description of the application schema of a data source. The application ontology can be accessed through a reference in the ISO 19115 metadata documents for that data source (a detailed description of this reference can be found in section 3.4).

To provide access to the ontologies, two new components are defined. The Ontology-based Reasoner is a central component responsible for storing, managing and reasoning on the ontologies in a domain. It provides two interfaces, one for accessing the shared vocabulary and application ontologies and one for reasoning about possible matches with simple and defined concept search (using RACER). A concept is considered a match if it is equal to or subsumed by the query concept.

The Cascading Catalogue Service extends the functionality of the conventional catalogue service by analysing and manipulating the filters of metadata queries that are enriched with DL query concepts. It provides access through the standard OGC Stateless Catalogue Service interface. If a catalogue query contains a DL query concept, RACER is accessed to compute a list of subconcepts from existing application ontologies. These concepts are added to the query, which can then be sent to any conventional standard catalogue service because the expanded query requires only a match based on string comparison.

Finally, a client supports the user in formulating catalogue queries that contain a DL query concept for the required feature type.
In the current implementation, the user interface is based on so-called Query Templates, which contain allowed combinations of roles and concepts. On the one hand, these templates prevent inexperienced users from defining queries that do not make sense and reduce the number of terms that are presented to the user. On the other hand, they also seriously limit the expressiveness of possible user queries. One of the goals of the research presented in this paper was to increase the expressiveness of user queries while at the same time keeping the user interface easy to use. The information flow within this architecture is described in more detail in (Klien et al. 2004). A prototypical implementation can be accessed from http://www.meanings.de/.
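The query expansion performed by the Cascading Catalogue Service can be sketched as follows. This is a minimal illustration and not the service's actual interface: the reasoner is replaced by a stub that returns the matches for Query_2 reported in section 4.1, and the record layout, field names and function names are assumptions.

```python
def stub_reasoner(query_concept):
    """Stand-in for the reasoner call; returns names of subsumed application concepts."""
    known = {"Query_2": ["elwis_Measurement", "chmi_Measurement"]}
    return known.get(query_concept, [])


def expand_catalogue_query(query_concept, find_subconcepts):
    """Replace the DL query concept by a plain list of concept names."""
    return sorted(find_subconcepts(query_concept))


def search(records, concept_names):
    """Conventional catalogue behaviour: match records by string comparison only."""
    wanted = set(concept_names)
    return [r["service"] for r in records if r["applicationConcept"] in wanted]


# Hypothetical metadata records, one per feature type of the running example.
RECORDS = [
    {"service": "ELWIS WFS", "applicationConcept": "elwis_Measurement"},
    {"service": "BAFG WFS", "applicationConcept": "bafg_Measurement"},
    {"service": "CHMI WFS", "applicationConcept": "chmi_Measurement"},
]

expanded = expand_catalogue_query("Query_2", stub_reasoner)
hits = search(RECORDS, expanded)
print(expanded, hits)
```

The point of the design is visible in search: once the concept names have been added to the query, the catalogue itself needs no reasoning capability.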
5 Ontology-Based GI Retrieval
After having set the stage in the previous sections, we now describe the ontology-based retrieval of geographic information in more detail. Section 5.1 gives an overview of the steps that are required for ontology-based GI retrieval. In section 5.2, we introduce a simple syntax for an ontology-based GI query language. This query language provides the basis for the implementation of a user interface that helps the requester to define a semantic query (section 5.3). How DL query concepts are derived from this query is illustrated in section 5.4. Finally, section 5.5 describes how to build the query and filter for the selected WFS.
5.1 Ontology-Based GI Retrieval in the Running Example
Our goal is to make the overall task of GI retrieval more user-friendly by hiding the processes of discovering appropriate WFSs and of formulating the actual WFS query. Thus, Susan's only task should be to formulate a query for GI retrieval using terms from existing domain ontologies. This section gives an overview of how this is to be achieved by giving a detailed account of what happens "behind the scenes". The individual steps are illustrated in two UML sequence diagrams in figure 9 (for GI discovery) and figure 10 (for GI retrieval).
[Sequence diagram. Participants: Susan:Requester, :Query Client, :OBR, :CS-W. Messages: 1. getVocabulary; 2. provideQueryUI; 3. submit query; 4. deriveFeatureTypeQueryConcepts; 5. getRelatedConcepts; 6. buildCSQuery; 7. query]
Figure 9. UML sequence diagram illustrating steps 1-7 in the proposed approach to ontology-based retrieval of geographic information
Susan's entry point for the ontology-based retrieval of geographic information is a component which provides an intuitive query language and user interface (subsequently called query client). By "intuitive" we mean that Susan should be familiar with the elements of the language and the user interface. The research described in this paper is restricted to a simple query language (section 5.2) and user interface (section 5.3); the implementation of a more sophisticated user interface offering a well-designed working environment is left to future research. The query client queries the ontology-based reasoner for terms from existing domain ontologies (step 1) and provides Susan with a user interface for formulating her query (step 2). After Susan has submitted her query (step 3), it is translated into one or several DL concepts (step 4). These concepts are used as query concepts for discovering WFSs that provide semantically appropriate feature types. This discovery process follows the approach described in section 4.1 (steps 5 to 7).
[Sequence diagram. Participants: Susan:Requester, :Query Client, :WFS. Messages: 8. displayResults; 9. choose WFS; 10. derivePropertyNames; 11. buildWFSQuery; 12. GetFeature; 13. derivePropertyNames; 14. displayResults]
Figure 10. UML sequence diagram illustrating steps 8-14 in the proposed approach to ontology-based retrieval of geographic information
If the ontology-based search yields no or more than one result, Susan gets notified (step 8). In the former case, she modifies her query; in the latter case, she selects one of the discovered feature types (step 9). In order to access the chosen source through its WFS interface, a GetFeature query including spatial and/or non-spatial constraints has to be constructed using the property names of the feature type's application schema. These names can be obtained from the registration mapping of the selected feature type (steps 10 and 11). Finally, the WFS query is executed (step 12) and its results are translated into terms from domain ontologies. Again, this translation can be obtained from the registration mapping (steps 13 and 14).

5.2 Query Language
Requesters cannot be expected to formulate complex DL query concepts such as those presented in section 3. Rather, they should be provided with an intuitive query language as well as a graphical user interface. In this section, we propose a simple syntax for semantic queries, which closely resembles an SQL select statement. This syntax provides the basis for the user interface, which is described in the following section.

With the proposed language, users should be able to select properties of specific feature types, possibly using one or several constraints. Properties correspond to roles in the shared vocabulary, while feature types correspond to concepts. The connection between roles and concepts is expressed using type variables and the "." connector. Constraints are expressed
using a where clause and can be combined via conjunction (logical and) or disjunction (logical or). A constraint can either be a type restriction or a comparison with a value specified by the requester. Value constraints can only be defined for roles whose range is an XSD datatype or a GML geometry type. In addition to common string and number comparators (such as >= or startsWith), spatial comparators such as withinBoundingBox, intersects or within-distance-of can be used. The comparators supported in WFS queries are given in (OGC 2001), their semantics is defined in (ISO TC 211 2002). An example query statement for finding water level measurements for the Elbe river provided in centimeters for a given date (2004-04-22) and location is given in figure 11.

SELECT x.quantityResult.value
FROM Measurement x
WHERE (x.quantityResult.observable hasType WaterLevel)
  AND (x.quantityResult.unitOfMeasure hasType Centimeter)
  AND (x.quantityResult.observedWaterBody.name = "Elbe")
  AND (x.dateStamp = 2004-04-22)
  AND (x.location isWithinBoundingBox (11,52,13,54))

Figure 11. Example for a semantic query statement. The keywords of the proposed syntax are shown in capitals, the comparators in italics
5.3 User Interface
As a first step towards intuitive and user-friendly semantic GI retrieval, a desktop client application providing a simple graphical user interface (figure 12) has been prototypically implemented. Transferring the implementation to a web platform will be part of our future work. The design of the user interface is directly derived from the model of the query language. It provides users with a number of concepts corresponding to feature types that are defined in the domain ontology. After selecting one of these concepts, users can build a select statement by choosing one or several roles that are associated with this concept and that represent the feature type's properties (figure 13).
Figure 12. Simple user interface for defining semantic queries
Figure 13. Dialog for selecting roles that represent properties that are to be selected or used in constraints
Now the constraints can be defined. Again, the user has to select one or several roles that are to be used for these constraints. For each of the selected roles, a constraint input panel and a constraint object are constructed. The types of the input panel and the constraint object depend on the range of the selected role: a value constraint (input panel) if the range is an XML datatype, a type constraint (input panel) otherwise. Also, the logical connector (AND or OR) can be chosen. In the lower part of the screen, the constructed query and the derived DL query concept are displayed. Both are dynamically adapted as the user chooses properties and defines constraints. The detailed procedure for deriving the query concept from the query is described in the following section.

5.4 Deriving DL Query Concepts for GI Discovery
In order to discover WFSs that provide suitable feature types for answering the query, the query statement has to be translated into one or several DL query concepts (step 4 in the overall approach described in section 5.1). These concepts are used for the ontology-based discovery as described in section 4.1. Step-by-step instructions for this translation are given below using the example query shown in figure 11. In the illustrating figure (figure 14), the changes from the previous step are shown in bold.
• Select statement. The specification of the query concept is based on the concept specified after the FROM keyword, i.e. Measurement in our example (figure 14.1).
• Where clause. The where clause is represented as a disjunction or conjunction of the DL equivalents of its constraints (depending on the logical connectors defined in the query statement). The where clause is then added as a conjunction to the query concept (figure 14.2).
• Type constraints. Type constraints are expressed through (all-quantified) value restrictions in DL (figure 14.2). If a property is represented by several roles (e.g. x.quantityResult.unitOfMeasure), only the range of the last role (unitOfMeasure) can be restricted. The other roles are represented using existential quantification. For example, the expression (some quantityResult (all unitOfMeasure Centimeter)) specifies a concept that has at least one quantityResult whose unitOfMeasure is restricted to Centimeter.
• Value constraints. We assume that the ranges of the roles provided in the shared vocabulary have already been defined. Therefore, value constraints are simply represented using existential quantification on the specified roles (figure 14.3). All roles whose domain contains the concept that represents the selected feature type (timeStamp and location in our example) do not have to be explicitly included in the query concept.
(1) User Query:
      SELECT x.quantityResult.value
      FROM Measurement x
    DL Query Concept:
      (define-concept query
        Measurement)

(2) User Query:
      SELECT x.quantityResult.value
      FROM Measurement x
      WHERE (x.quantityResult.observable hasType WaterLevel)
        AND (x.quantityResult.unitOfMeasure hasType Centimeter)
    DL Query Concept:
      (define-concept query
        (and Measurement
             (some quantityResult
                   (and (all observable WaterLevel)
                        (all unitOfMeasure Centimeter)))))

(3) User Query:
      SELECT x.quantityResult.value
      FROM Measurement x
      WHERE (x.quantityResult.observable hasType WaterLevel)
        AND (x.quantityResult.unitOfMeasure hasType Centimeter)
        AND (x.quantityResult.observedWaterBody.name = "Elbe")
        AND (x.dateStamp = 2004-04-22)
        AND (x.location isWithinBoundingBox (11,52,13,54))
    DL Query Concept:
      (define-concept query
        (and Measurement
             (some quantityResult
                   (and (all observable WaterLevel)
                        (all unitOfMeasure Centimeter)
                        (some observedWaterBody (some name *top*))))))

Figure 14. Mapping user queries to DL query concepts
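The translation rules of this section can be sketched as a small Python routine. It is an illustration under simplifying assumptions rather than the prototype's code: only conjunctions are handled, constraints that share a role prefix are merged under one existential restriction, an explicit and is emitted whenever a restriction has more than one conjunct, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Restrictions collected for one position along a role path."""
    fillers: list = field(default_factory=list)    # DL concept expressions
    children: dict = field(default_factory=dict)   # role name -> Node


def add_type_constraint(root, path, concept):
    """x.r1...rn hasType C: existential on r1..rn-1, value restriction on rn."""
    *prefix, last = path
    node = root
    for role in prefix:
        node = node.children.setdefault(role, Node())
    node.fillers.append(f"(all {last} {concept})")


def add_value_constraint(root, path):
    """x.r1...rn = v: existential quantification on every role of the path."""
    node = root
    for role in path:
        node = node.children.setdefault(role, Node())


def render(node):
    parts = list(node.fillers)
    parts += [f"(some {role} {render(child)})" for role, child in node.children.items()]
    if not parts:
        return "*top*"
    return parts[0] if len(parts) == 1 else "(and " + " ".join(parts) + ")"


# Example query of figure 11; the constraints on the date and the location
# are omitted because their roles are already implied by Measurement.
root = Node(fillers=["Measurement"])
add_type_constraint(root, ["quantityResult", "observable"], "WaterLevel")
add_type_constraint(root, ["quantityResult", "unitOfMeasure"], "Centimeter")
add_value_constraint(root, ["quantityResult", "observedWaterBody", "name"])

query_concept = f"(define-concept query {render(root)})"
print(query_concept)
```

Building a tree first and rendering it afterwards keeps all restrictions on the same quantityResult together, so that they describe one measurement result rather than several unrelated ones.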
5.5 Deriving a WFS Query Filter for GI Retrieval
After having discovered a WFS providing an appropriate feature type, the actual WFS query filter has to be built (step 11 in the overall approach described in section 5.1). For this, the property names used in the chosen WFS that are equivalent to the domain ontology terms used in the query statement have to be derived. Also, the structure of the WFS's feature type has to be known. All the required information can be accessed from the feature type's registration mapping. Again, we refer to the CHMI feature type for illustration (see the registration mapping in figure 5). For this feature type, the example query shown in figure 11 can be translated into the WFS GetFeature request (OGC 2002b) and filter expression (OGC 2001) shown in figure 15.
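How property names for the filter can be looked up in a registration mapping is sketched below in Python. The sketch covers equality constraints only and is an assumption-laden illustration, not the prototype's implementation: the helper names are hypothetical, the constraints are assumed to be already expressed as contextual paths of the CHMI application concept, and the element names follow the OGC Filter Encoding vocabulary (Filter, And, PropertyIsEqualTo, PropertyName, Literal).

```python
import xml.etree.ElementTree as ET

OGC = "http://www.opengis.net/ogc"

# Excerpt of the CHMI registration mapping: contextual path -> structural path.
MAPPING = {
    "chmi_Measurement.quantityResult.observedWaterBody.name": "/StavVody/tok/text()",
    "chmi_Measurement.chmi_qRWaterLevel.value": "/StavVody/stav/text()",
    "chmi_Measurement.timeStamp": "/StavVody/datum/text()",
}


def property_name(conceptual_path, mapping):
    """Derive a property name from the structural path (hypothetical helper)."""
    structural = mapping[conceptual_path]
    if structural.endswith("/text()"):
        structural = structural[: -len("/text()")]
    return structural.lstrip("/")


def build_filter(equalities, mapping):
    """Build a filter element from (contextual path, literal) pairs."""
    flt = ET.Element(f"{{{OGC}}}Filter")
    parent = ET.SubElement(flt, f"{{{OGC}}}And") if len(equalities) > 1 else flt
    for conceptual_path, literal in equalities:
        comparison = ET.SubElement(parent, f"{{{OGC}}}PropertyIsEqualTo")
        name = ET.SubElement(comparison, f"{{{OGC}}}PropertyName")
        name.text = property_name(conceptual_path, mapping)
        ET.SubElement(comparison, f"{{{OGC}}}Literal").text = literal
    return flt


wfs_filter = build_filter(
    [
        ("chmi_Measurement.quantityResult.observedWaterBody.name", "Elbe"),
        ("chmi_Measurement.timeStamp", "2004-04-22"),
    ],
    MAPPING,
)
print(ET.tostring(wfs_filter, encoding="unicode"))
```

The spatial constraint of the example query would need a bounding-box operator in addition, and the same mapping can be read in the opposite direction to translate the returned property names back into domain terms.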
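[Figure 15: WFS GetFeature request and filter expression for the CHMI feature type. Property name: stav; filter constraints on StavVody/tok (Elbe), StavVody/datum (2004-04-22) and gml:position]

4. DISCUSSION AND RELATED WORK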
Chapter IX. Ontology-based Service Discovery in SDIs
4.1 Approaches to Semantic Service Discovery
4.2 Semantic Service Discovery in SDIs