Supporting Semantics-based Metadata Discovery with MetaSys

Kean Huat SOON†*, Douglas MILLER§♣, Brian BILLS§♠, Jennifer WILLIAMS§♦

† Tropical Marine Science Institute, National University of Singapore, 18 Kent Ridge Road, Singapore 119227. {[email protected]}
§ Earth and Environmental Systems Institute, The Pennsylvania State University, EES Building, University Park, PA 16802, United States. {♣[email protected]; ♠[email protected]; ♦[email protected]}

Abstract

Existing metadata discovery tools that rely on exact string matching as the search option have proven insufficient to search for relevant metadata. Since different, yet semantically related terms can be used to describe and to search for the same dataset, a metadata tool that deals with semantics is essential. In this paper we demonstrate a semantics-based metadata discovery tool called MetaSys to support the Critical Zone Exploration Network (CZEN) community (www.czen.org) for creating, updating, and more importantly searching metadata and ranking metadata results based on the CZEN ontology.

Keywords: Semantics-based search, metadata discovery, ontology, Critical Zone.

1. INTRODUCTION

Metadata-related research has been actively conducted to realize the notion of spatial data infrastructure (van Oosterom and Zlatanova, 2008). A number of metadata tools have been developed in an attempt to manage and search metadata more effectively. A notable one is that of Tsou (2002), who presented a metadata framework that emphasized the operational aspect of metadata implementation by including geodata objects, software components and web map services. This framework appears to combine more freely with other metadata frameworks; however, as the author outlined, the approach is limited to text-based keyword search (i.e. exact string search). To overcome this shortcoming, Tsou (2002) proposed a semantic search mechanism that explicitly describes the relationships between geographic names, such as “San Diego is part of California State.” Although this proposal could solve the problem related to geographic names, most search terms from our users relate to the domain of the datasets. For example, when looking for a climatology dataset, users typically search with terms like “precipitation” and “temperature” rather than with a geographic name.

In this paper we demonstrate a semantics-based metadata discovery tool called MetaSys. MetaSys was developed to support the Critical Zone Exploration Network (CZEN) community (www.czen.org) in creating, updating, and, more importantly, searching and ranking metadata semantically based on the CZEN ontology (Frank, 1997). This knowledge structure describes concepts that are related to the domain of the Critical Zone, the near-surface portion of the Earth, which includes soil, vegetation, water bodies, etc. Unlike most available search engines (e.g. Google), which often return search results based on exact string matching, MetaSys supports the discovery of metadata based on semantics. In other words, MetaSys returns and ranks not only metadata that contains the exact search term, but also metadata that contains words semantically related to the search term.

In what follows, Section 2 describes the implementation framework of MetaSys and Section 3 discusses the CZEN ontology. Section 4 shows the search results and Section 5 concludes the paper with future work.

* The work described here was conducted when the first author was affiliated with the Pennsylvania State University in 2009.

2. THE FRAMEWORK OF METASYS

Figure 1 depicts the implementation framework of MetaSys, a prototype system developed mainly in the Java programming language (http://java.sun.com/). As shown in the figure, MetaSys provides three major functions that allow users to search, populate, and update metadata. Within this framework, all metadata is stored in the metadata database using the metadata population function. Once the metadata is stored, users can retrieve a record that they previously entered from the metadata database and edit it using the metadata update function. In addition to the metadata population and update functions, MetaSys also enables users to search the stored metadata semantically by utilizing the CZEN ontology. The search results are then ranked and returned to the web front end of MetaSys.

Figure 2 illustrates the welcome page of MetaSys, which is the first page where users can choose to populate, search or update metadata. As MetaSys is currently open to everyone within the local network at EESI, no restriction is applied to populating and updating metadata as long as the user has access to the network.

Figure 1: The implementation framework of MetaSys. The core architecture of MetaSys includes the functions to search, populate and update metadata stored in the metadata database. [Diagram components: User Interface; MetaSys (Search, Populate, Update); CZEN Ontology; Metadata Database]

Figure 2: The welcome page of MetaSys

2.1 Metadata Population

Figure 3 depicts the metadata population page of MetaSys. This page allows metadata custodians to create metadata. The CZEN metadata was modeled after the Federal Geographic Data Committee (FGDC) (http://www.fgdc.gov/) standard.

Figure 3: Metadata population page

To facilitate semantics-based search, MetaSys provides a keyword panel that allows metadata custodians to tag each metadata record with concepts from the ontology. In the example in Figure 3, when the custodian clicks on a particular concept in the left panel, the properties associated with the selected concept are displayed in the right panel for the custodian to select. Custodians can select both concepts and properties of the ontology to tag their metadata (concepts and properties are discussed further in Section 3 on the CZEN ontology). On the population page, the custodian can also enter the online linkage, such as a URL to a file on a remote FTP server. This online linkage then becomes the pointer through which interested parties can obtain the relevant dataset. Whether to provide open or login-restricted access to the file is the decision of the metadata custodian who owns the dataset.

2.2 Metadata Update

Once metadata custodians have successfully entered their metadata, they can click on the metadata edit button at the top of the page to update it. As shown in Figure 4, clicking the metadata edit button displays a table listing the metadata records that have been entered previously. From the list, the custodian can choose to delete or edit a particular record. When the edit button is clicked, a metadata population page similar to Figure 3 is shown, filled in with the metadata information previously entered. The custodian can then make modifications on that page and click the save button to save the changes. If the delete button is clicked instead, the metadata record is deleted from the metadata database.

Figure 4: Metadata update page
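The populate, update and delete operations described in Sections 2.1 and 2.2 can be sketched in a few lines. This is only an illustrative sketch in Python, not the MetaSys implementation (which is Java-based and backed by a database): the class names `MetadataRecord` and `MetadataStore`, the field names, and the in-memory dictionary are all assumptions made for the example, and the fields are not the FGDC schema.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MetadataRecord:
    """One metadata record. Field names are hypothetical, not the FGDC schema."""
    title: str
    abstract: str
    keywords: set = field(default_factory=set)  # ontology concepts/properties used as tags
    online_linkage: str = ""                    # e.g. a URL to the dataset


class MetadataStore:
    """In-memory stand-in for the metadata database (an assumption of this sketch)."""

    def __init__(self):
        self._records = {}
        self._ids = count(1)

    def populate(self, record):
        """Store a new record and return its identifier."""
        record_id = next(self._ids)
        self._records[record_id] = record
        return record_id

    def update(self, record_id, **changes):
        """Overwrite selected fields of a previously entered record."""
        record = self._records[record_id]  # raises KeyError for unknown ids
        for name, value in changes.items():
            if not hasattr(record, name):
                raise AttributeError(f"unknown metadata field: {name}")
            setattr(record, name, value)
        return record

    def delete(self, record_id):
        """Remove a record from the store."""
        del self._records[record_id]

    def list_records(self):
        """Return a copy of the id -> record mapping, as shown in the update table."""
        return dict(self._records)


store = MetadataStore()
rid = store.populate(
    MetadataRecord(
        title="Daily rainfall",
        abstract="Rain gauge observations",
        keywords={"precipitation"},
        online_linkage="ftp://example.org/rainfall.csv",  # placeholder URL
    )
)
store.update(rid, keywords={"precipitation", "air temperature"})
```

Keeping the ontology tags as a set on each record is a design choice of the sketch: it makes the later semantic lookup a simple membership test.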

2.3 Metadata Search

The metadata search page of MetaSys is shown in Figure 5. At the top left of the page, two text search boxes are provided so that users can enter more than one term to search for metadata. The term in the first box, however, is required and is given more weight than the term in the second box. This means that if a term is entered in each box, metadata that contains only the first term is ranked higher than metadata that contains only the second term. If a metadata record contains both terms, it is ranked at the top, above all records that contain only one of the search terms.

The text boxes also implement search-while-you-type functionality. To help users search more effectively, MetaSys lists the concepts and properties that match the entered text while the user types in the box. As shown in Figure 5, when the user types “pre”, the concepts and properties that match “pre” are listed. To differentiate concepts from properties, each property is listed with its associated concept in parentheses.

Figure 5: The metadata search page: a list of terms that match the search term is displayed while the user types the term in the search box.
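The two behaviours described above, the weighting of the two search boxes and the search-while-you-type list, can be illustrated with a small sketch. It is not the MetaSys code: the function names `suggest` and `rank_by_terms` are hypothetical, the weights 2 and 1 are assumed values chosen only so that "both terms" outranks "first term only", which outranks "second term only", and matching by prefix is an assumption about how the entered text is matched. The property names in the example ontology are invented for illustration.

```python
def suggest(prefix, ontology):
    """List concepts and properties that start with the typed text.

    `ontology` maps each concept name to a list of its property names.
    Properties are shown with their concept in parentheses.
    """
    needle = prefix.strip().lower()
    if not needle:
        return []
    hits = []
    for concept, properties in ontology.items():
        if concept.lower().startswith(needle):
            hits.append(concept)
        for prop in properties:
            if prop.lower().startswith(needle):
                hits.append(f"{prop} ({concept})")
    return sorted(hits)


def rank_by_terms(records, first, second=None, w_first=2, w_second=1):
    """Rank record ids by which of the two search terms their text contains.

    `records` maps record id to searchable text. The weights are assumptions:
    any values with w_first > w_second > 0 give the ordering described above.
    """
    if not first:
        raise ValueError("the first search term is required")
    scored = []
    for record_id, text in records.items():
        lowered = text.lower()
        score = 0
        if first.lower() in lowered:
            score += w_first
        if second and second.lower() in lowered:
            score += w_second
        if score:
            scored.append((score, record_id))
    # Highest score first; ties are broken by record id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _, record_id in scored]


# Hypothetical ontology fragment and records, for illustration only.
example_ontology = {
    "precipitation": ["rate"],
    "pressure": ["unit"],
    "air temperature": ["precision"],
}
example_records = {
    "a": "soil moisture",
    "b": "precipitation and soil moisture",
    "c": "precipitation totals",
    "d": "vegetation cover",
}
suggestions = suggest("pre", example_ontology)
ranking = rank_by_terms(example_records, "precipitation", "soil")
```

This sketch matches on exact substrings only; the semantic expansion through the ontology is a separate step.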

On the search page, users can also limit the search to the keywords, title and/or abstract sections of the metadata by checking the respective boxes at the center of the page. In addition, users can narrow down the search by the time period and/or the region of the dataset. To help users understand the CZEN ontology, the page also provides links that show the ontology as graphics and as formatted text (an indented list). In the next section, we discuss the CZEN ontology in more detail.
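The section, time period and region restrictions can be pictured as a filter applied to each record. The following is a minimal sketch under stated assumptions: the `Record` fields, the representation of a region as a plain name, and the representation of a time period as a pair of years are all hypothetical, since the paper does not describe how MetaSys stores them.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record layout used only for this sketch."""
    title: str
    abstract: str
    keywords: tuple
    region: str
    start_year: int
    end_year: int


def matches(record, term, sections=("keywords", "title", "abstract"),
            region=None, period=None):
    """Return True if `term` occurs in the chosen sections and the record
    satisfies the optional region and time-period restrictions."""
    needle = term.lower()
    haystacks = []
    if "keywords" in sections:
        haystacks.extend(keyword.lower() for keyword in record.keywords)
    if "title" in sections:
        haystacks.append(record.title.lower())
    if "abstract" in sections:
        haystacks.append(record.abstract.lower())
    if not any(needle in text for text in haystacks):
        return False
    if region is not None and record.region.lower() != region.lower():
        return False
    if period is not None:
        start, end = period
        # Keep the record only if its coverage overlaps the requested period.
        if record.end_year < start or record.start_year > end:
            return False
    return True


example = Record(
    title="Hourly air temperature",
    abstract="Air temperature readings from a weather station",
    keywords=("climate",),
    region="Region A",
    start_year=2007,
    end_year=2009,
)
```

For example, `matches(example, "temperature", sections=("keywords",))` is false because the term appears only in the title and abstract, while the default call without restrictions is true.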

3. THE CZEN ONTOLOGY

In addition to the MetaSys framework, it is important to understand the CZEN ontology, the core component that enables MetaSys to search and rank metadata semantically. The ontology contains around 45 concepts and more than 100 properties associated with the concepts. Figure 6 shows a portion of the ontology displayed in Protégé (http://protege.stanford.edu/), an ontology editor from Stanford University. The concepts are displayed in the left panel, while the associated properties are in the middle of the right panel. When a user clicks on a concept in the left panel, the concept’s corresponding properties are displayed in the right panel. As an example, Figure 6 shows the corresponding properties for the concept “air temperature.” Protégé allows the ontology to be created in the Web Ontology Language (OWL) format (http://www.w3.org/TR/owl-guide/) (McGuinness and van Harmelen, 2004) for machine processing. OWL is a World Wide Web Consortium (W3C) standard for ontologies.

Figure 6: The CZEN ontology displayed in Protégé. The concepts and properties of the ontology are shown in the left and right panels, respectively.

The concepts in the ontology are connected in a graph structure. As an example, Figure 7 illustrates a fragment of the ontology that describes the term “humidity.” On the right of the figure, the textual form of the term is presented in OWL format, and on the left is the term’s corresponding graphical representation. The terms connected to “humidity” (i.e. “atmosphere” and “evapotranspiration”) are themselves linked to other terms (e.g. “atmosphere” is related to “precipitation” and “pressure”).

The relationships that connect the terms in the CZEN ontology are of two kinds: “subClassOf” and “seeAlso.” The “subClassOf” relationship expresses generalization, that is, an “is a” relation between terms. For instance, “humidity” can generally be considered as “atmosphere”, just as a “penguin” is a “bird.” The “seeAlso” relationship means that one term is related to another: “humidity” “seeAlso” “evapotranspiration” and “penguin” “seeAlso” “Antarctica” indicate that humidity and penguin are related to evapotranspiration and Antarctica, respectively.

Figure 7: A fragment of the CZEN ontology that describes the term “humidity” [Graph: “humidity” is linked to “atmosphere” by “subClassOf” and to “evapotranspiration” by “seeAlso”]
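The graph structure of the “humidity” fragment can be sketched as a small labelled graph. The actual ontology is stored in OWL; the `Ontology` class below is a hypothetical in-memory stand-in written in Python for illustration, and it holds only the two relationships that the text states explicitly for “humidity”.

```python
from collections import defaultdict


class Ontology:
    """Terms connected by labelled, directed relationships (illustrative only)."""

    RELATIONS = ("subClassOf", "seeAlso")

    def __init__(self):
        # term -> set of (relation, other term) pairs
        self._edges = defaultdict(set)

    def relate(self, term, relation, other):
        """Record that `term` is linked to `other` by `relation`."""
        if relation not in self.RELATIONS:
            raise ValueError(f"unsupported relation: {relation}")
        self._edges[term].add((relation, other))

    def related(self, term, relation=None):
        """Return the terms directly linked from `term`, optionally by one relation."""
        return sorted(
            other
            for rel, other in self._edges.get(term, ())
            if relation is None or rel == relation
        )


czen_fragment = Ontology()
czen_fragment.relate("humidity", "subClassOf", "atmosphere")
czen_fragment.relate("humidity", "seeAlso", "evapotranspiration")
```

Here `czen_fragment.related("humidity", "seeAlso")` yields only “evapotranspiration”, while omitting the relation yields both neighbours.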

Of the two relationships, MetaSys ranks terms related by “seeAlso” higher than those related by “subClassOf.” The reason is that relating terms with “seeAlso” requires specific domain knowledge, whereas “subClassOf” relationships are defined from general knowledge. As the target audience for MetaSys is professionals in the Critical Zone community (as opposed to a general audience), we expect users to be more interested in metadata that is related through domain knowledge. Following the last example, “evapotranspiration” is therefore semantically more important than “atmosphere” in the context of “humidity.” This difference is important in ranking the metadata records found in the database. The next section demonstrates how the ontology is used to rank metadata records that semantically match the search term.

4. RESULTS

Figure 8 shows a screenshot of the search result for the term “humidity.” Although the metadata records stored in the database do not contain any keywords that exactly match “humidity”, MetaSys still retrieved the metadata records whose keywords are semantically related to “humidity.” Because MetaSys treats terms related by “seeAlso” as more important, the metadata record that contains “evapotranspiration” is ranked higher than the metadata record that contains “atmosphere”, which has a “subClassOf” relationship with “humidity.” To obtain the full metadata record, users can click on the Detail hyperlink. A sample of metadata detail is shown in Figure 9.

Figure 8: Metadata records that contain words (e.g. “evapotranspiration”) that are semantically related to “humidity” are returned

Figure 9: A sample of metadata detail
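The ranking behaviour reported for the “humidity” search can be reproduced with a short sketch. It is a sketch under assumptions rather than the MetaSys algorithm: the numeric weights are invented and only encode the ordering exact match, then “seeAlso”, then “subClassOf”; the expansion follows direct neighbours only, since the paper does not state how far MetaSys traverses the ontology; and the record identifiers are hypothetical.

```python
# Assumed weights: only their relative order is taken from the paper.
RELATION_WEIGHT = {"exact": 3, "seeAlso": 2, "subClassOf": 1}


def expand(term, ontology):
    """Map the search term and its direct neighbours to a ranking weight.

    `ontology` maps a term to a list of (relation, other term) pairs.
    """
    weights = {term: RELATION_WEIGHT["exact"]}
    for relation, other in ontology.get(term, []):
        weights[other] = max(weights.get(other, 0), RELATION_WEIGHT[relation])
    return weights


def semantic_search(term, records, ontology):
    """Return ids of records whose keywords match the term exactly or
    semantically, best match first.

    `records` maps record id to a set of keyword tags.
    """
    weights = expand(term, ontology)
    scored = []
    for record_id, keywords in records.items():
        score = max((weights.get(keyword, 0) for keyword in keywords), default=0)
        if score > 0:
            scored.append((score, record_id))
    # Highest weight first; ties are broken by record id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _, record_id in scored]


humidity_fragment = {
    "humidity": [("subClassOf", "atmosphere"), ("seeAlso", "evapotranspiration")],
}
stored_records = {
    "rec-atmosphere": {"atmosphere"},
    "rec-evapotranspiration": {"evapotranspiration"},
    "rec-soil": {"soil"},
}
results = semantic_search("humidity", stored_records, humidity_fragment)
```

No stored record is tagged with “humidity” itself, yet two records are returned, with the “seeAlso” match ahead of the “subClassOf” match and the unrelated “soil” record left out.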

5. CONCLUSIONS AND FUTURE WORK

This paper demonstrated a metadata discovery tool called MetaSys. MetaSys not only allows metadata custodians to populate and update metadata, but also supports semantics-based search and ranking of results based on the CZEN ontology. MetaSys, however, is purely a text-based system, meaning that no map visualization is involved to help users search for metadata. At the Tropical Marine Science Institute, a Geographic Information System (GIS) tool is presently being developed using ESRI ArcGIS Server. This tool will combine semantics-based search with GIS functions to facilitate metadata discovery and eventually to enable users to obtain the actual datasets on climate change.

6. ACKNOWLEDGEMENT

Support for this work was provided by the U.S. National Science Foundation. The first author thanks Dr Durairaju Kumaran Raju at Tropical Marine Science Institute, National University of Singapore, for his generous guidance.

REFERENCES

Frank, A. (1997). “Spatial ontology: A geographical information point of view”, in Stock, O. (Ed.). Spatial and Temporal Reasoning. Springer, pp. 135-153.

McGuinness, D.L. and van Harmelen, F. (2004). OWL Web Ontology Language: Overview, at http://www.w3.org/TR/owl-features/, [accessed 22 June 2010].

Tsou, M. (2002). “An Operational Metadata Framework for Searching, Indexing, and Retrieving Geographic Information Services on the Internet”, GIScience 2002, Lecture Notes in Computer Science 2478. Springer Verlag Berlin Heidelberg, pp. 313-332.

Van Oosterom, P. and Zlatanova, S. (2008). Creating Spatial Information Infrastructures: Towards the Spatial Semantic Web, United States: CRC Press.