Geoinformatica (2006) 10: 159–176 DOI: 10.1007/s10707-006-7577-2
Metadata Community Profiles for the Semantic Web
Luis Bermudez & Michael Piasecki
Received: 29 July 2004 / Revised: 8 April 2005 / Accepted: 27 September 2005
© Springer Science + Business Media, LLC 2006
Abstract
Metadata is needed to facilitate data sharing among geospatial information communities. Geographic metadata standards are available, but they tend to be general and complex, and they are not well suited to overcoming semantic heterogeneities across the vocabularies of different domains and user communities. Current formalizations of metadata standards are not flexible enough to allow reuse and extension of metadata specifications, in particular for Web-based information systems. To address this problem, we propose a methodology to create community-specific metadata profiles for the Semantic Web by reusing metadata specifications and domain vocabularies encoded as resources for the Web. This ensures that these community profiles are semantically compatible, so they can be used in Web-based information systems. The ISO-19115:2003 geographic metadata standard is the most general standard available and is used in conjunction with the Web Ontology Language as the expression medium to test the methodology for each of the possible extensions documented in ISO-19115:2003. It is shown that it is possible to extend and reuse metadata specifications and vocabularies distributed on the Web using the Web Ontology Language, by utilizing the language's flexibility to create restrictions on inherited properties and to make inferences on Web-distributed resources. Examples from the area of hydrology are provided to demonstrate the technical details of the approach.

Keywords Metadata · Hydrology · Ontology · Semantic interoperability
L. Bermudez · M. Piasecki (*), Department of Civil, Architectural & Environmental Engineering, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA, e-mail: [email protected]

1. Introduction and Motivation

The use of metadata to describe the contents of data sets with regard to their semantic and syntactic specification has become one of the focus areas for many geoscience communities (e.g., CHRONOS [5], CLEANER [6], CUASHI [9], IRIS [23], NOKIS [29]), software-related consortiums (e.g., OASIS [30], OMG [33]), and governmental agencies (e.g., EPA [14], NOAA [28], USGS [37]). All of these communities have realized that only a formalized procedure to describe data will give them an environment in which it is possible to interchange data and minimize duplication. As a result, formal metadata specifications are needed to improve search, retrieval, and analysis procedures on an exponentially growing volume of data [7].

Several metadata standards have been developed over the past decade and are currently in use worldwide. The most widely used ones are the International Organization for Standardization (ISO) 19115:2003 metadata standard [25], the Dublin Core Metadata Initiative (DCMI) [10], and the Federal Geographic Data Committee (FGDC) [15] metadata sets. While all three are being used to some degree, the FGDC standard is predominantly used in the US, while the ISO standard is an international standard whose use is promoted worldwide. These two standards were developed for the geospatial communities, whereas the DCMI standard has its origin in library science. The latter is attractive because it contains a relatively small set of core elements; however, because of its library science origin, it does not provide many of the elements that are necessary for geospatial data descriptions. Because the FGDC standard is not presented in a conceptualized form (e.g., Unified Modeling Language, UML [32]) and the DCMI lacks geospatial representations, we will use the ISO 19115:2003 standard, a decision that is also supported by the current trend to use the ISO standard more ubiquitously around the globe.

Communities typically tend to create their own metadata specifications that fit their specific needs, but they are not always aware of the specifications of other communities. This is also the case for the area of hydrology, where standard descriptions for gage stations, watersheds, well pumping observations, and other hydrologic data are not explicitly available.
As a consequence, communities prefer to create their own specification either by reusing elements of other specifications, such as the Ecological Metadata Language, EML [11], or by writing a completely new specification themselves, e.g., the Hydrologic Markup Language, HydroML [36]. This lack of interoperability has plagued many interdisciplinary collaboration efforts to date [4], [18], [19], [34], and it is one of the main obstacles that need to be overcome when connecting various research communities, the government, and the public for a more seamless data and information realm.

More specifically, the description of similar data sets using different vocabularies, or the use of similar vocabularies for different data sets, results in what is called "semantic heterogeneity" [13], [20], [35]. There are two types of semantic heterogeneity: 1) different representations of a unique world reality, due to different conceptualizations or to misspelled strings. For example, water level could be referred to as "Stage" or "Gage height" (but also as "Gaug" or "Hiegth" in case it is misspelled); 2) different world realities having the same representation, which is the current problem of search engines like "Google" (e.g., "stage" is a measure of water surface elevation, but it also refers to a place for the performing arts).

Metadata specifications can be presented as a text document, as a conceptual model in the Unified Modeling Language, UML [32], or as an application schema such as an eXtensible Markup Language (XML) schema [38]. However, these representations do not allow metadata specifications to be adequately expressed as Web resources, and they do not allow extensions of metadata elements in these specifications. In addition, Uniform Resource Identifiers (URIs) are not part of the XML schema model [21], so retrieving elements from an XML schema may cause difficulties because complex types and global and local elements could have the same name (the value of the attribute "name"). Text is adequate for human readability but cannot be processed by computer programs. UML models cannot be used as Web resources because it is not possible to restrict properties, as doing so affects the membership of objects in a class. Also, in UML it is not possible to extend distributed resources on the Web because this would break the principle of modularization [1].

Seeking a strategy to better deal with the complexity of metadata specifications and with the need for community-specific semantics, we propose a methodology that allows the extension of metadata models and domain vocabularies to create community profiles by making use of Semantic Web technologies. The proposed methodology has the following advantages: 1) the created community profile is fully interoperable with the extended metadata standard, since the extension is expressed in a machine-readable format; 2) it allows inherited properties from other specifications to be restricted to fit community needs; and 3) the profile uses controlled vocabularies expressed in ontologies that can be distributed as Web resources. To demonstrate the proposed concept, we selected the Geographic Information Metadata Standard ISO-19115:2003 and formalized each of its possible extensions using the Web Ontology Language [39].
2. Semantic Web: Linking Distributed Resources

The Semantic Web [3] is a universe of metadata and ontologies expressed in a machine-readable format, along with software tools that allow the understanding of semantic relations among heterogeneous and distributed resources on the Web [12]. It is based on technologies recommended by the World Wide Web Consortium, such as the eXtensible Markup Language (XML), the Resource Description Framework (RDF), the Web Ontology Language (OWL), and the Uniform Resource Identifier (URI). RDF is based on statements that resemble simple language expressions. Statements are composed of a resource (subject) with a property (predicate) and a value (object). An example of a statement is: "http://waterdata.usgs.gov/nwis/uv?dd_cd=01&site_no=0208758850 was created by USGS." While in the above statement only the subject is a URI, the other parts of the statement could also be represented as URIs. Figure 1 shows an excerpt in XML where the property http://purl.org/dc/elements/1.1/creator is also a URI. This resource, abbreviated as dc:creator, is an element provided by the Dublin Core Metadata Initiative [10] to describe resources on the Web.
Fig. 1 RDF Triple in XML
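Since the figure itself is not reproduced here, the statement above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' figure; the RDF/XML layout is an assumption based on the sentence quoted in the text, and the helper name `triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed RDF/XML encoding of "<station URL> was created by USGS".
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://waterdata.usgs.gov/nwis/uv?dd_cd=01&amp;site_no=0208758850">
    <dc:creator>USGS</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

def triples(rdf_xml):
    """Yield (subject, predicate, object) for literal-valued properties."""
    root = ET.fromstring(rdf_xml)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores tags as "{namespace}local"; joining the
            # two parts gives back the predicate URI.
            namespace, local = prop.tag[1:].split("}")
            yield subject, namespace + local, prop.text

result = list(triples(RDF_XML))
for subject, predicate, obj in result:
    print(subject, predicate, obj)
```

The subject and the predicate are both URIs, while the object is the plain literal "USGS"; replacing the literal with a URI for the agency would make all three parts Web resources.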
In a similar fashion we can describe geospatial data by using elements that have an assigned URI and that are available as a web resource. A community can then refer to one or more metadata standards and reuse one or more vocabularies to fit their needs. Since the resources have a URI as a unique identifier, it will help to solve semantic heterogeneities among information communities and it will facilitate Web information systems to make inferences in the Semantic Web.
3. Metadata Specification

A metadata specification is a set of statements that helps domain experts to formally express the rules of usage for metadata elements. The metadata specification can be presented in either an informal or a formal manner. An informal approach presents the metadata specification in plain ASCII text using an arbitrary format (e.g., FGDC-STD-001-1998), while a formal approach expresses a metadata specification as a conceptual model in terms of classes and properties, or as an application schema. The two most common representations of conceptual models are the Unified Modeling Language, UML (e.g., ISO-19115:2003), and the XML Metadata Interchange format, XMI (e.g., ISO 19139), while an application schema could be an XML schema. An example of the latter approach is the Ecological Metadata Language, EML [11]. The Web Ontology Language, OWL (a recent technology recommended by the W3C), however, is capable of i) representing sophisticated conceptual models and ii) serving as an application schema at the same time. Hence, OWL promises to be a very powerful alternative for encoding metadata specifications. Differences and similarities between XML schemas, RDF, and ontology languages are well documented in the literature [16], [21], [22], while differences between UML and DAML+OIL, which is very similar to OWL, can be found in [1].
We will not elaborate on those, but the main advantages of OWL over specifications in plain ASCII text, XML schemas, and UML can be summarized as follows: 1) OWL is able to represent conceptual models with classes, properties, and their relationships, similar to UML, which is not possible in XML schemas; 2) OWL is expressed in a machine-readable format using RDF/XML, which plain ASCII text and UML are not; 3) OWL is built on top of the Resource Description Framework (RDF) model, where resources are expressed as Uniform Resource Identifiers (URIs), which are not part of the XML schema model or of UML; and 4) OWL allows inherited properties to be restricted, a feature that is not present in UML.

Figure 2 shows the different implementations (specifications) for metadata organized by level of conceptualization, machine readability, and extensibility (by extensibility we mean the possibility of extending distributed Web resources). UML, XMI, and XML as distributed resources could be extended, but only with additional technologies like the XML Linking Language [40] and the XML Path Language [41]. Figure 2 also presents XMI (XML Metadata Interchange), which is a special XML schema that allows the expression of UML models.

Fig. 2 Metadata specifications organized by level of conceptualization, machine readability and extensibility

The advantage of UML over OWL is that UML is a mature language and has become the standard tool to create conceptual models. However, applications to create OWL ontologies have recently become available, such as Protégé (http://protege.stanford.edu/) and others listed at http://Web.daml.org/tools/. Also, some research efforts are under way that focus on developing UML tools to create ontologies [1]. We favor the use of Protégé to create the community profiles, in combination with the plug-in tool ezOWL (http://iweb.etri.re.kr/ezowl/index.html), which allows the visualization of OWL ontologies in a manner similar to that of UML.
4. Controlled Vocabulary, Ontologies and Conceptual Models

A controlled vocabulary is a set of unambiguous terms explicitly stated to be used in a specific domain (e.g., glossaries, taxonomies, or thesauri). Controlled vocabularies differ from each other in the type of relation among the terms. If there is any explicit conceptual relation among the terms, we consider them conceptual models; if not, they are terminological tools. An explicit conceptual relation occurs when there is at least one explicit class or entity representing a concept and relating all terms involved. The conceptualization is the product of a mental abstraction, which could be classification, aggregation, or generalization [2]. For example, a list of terms such as "US," "Germany," and "Colombia" does not present any explicit conceptual relation until an explicit class "Country" is abstracted that classifies these real-world objects. Explicit means that somewhere in the presentation of the terms it is clear that Country is stated as a class or entity.

An ontology is also a conceptual model; however, it differs from traditional conceptual models expressed in UML because it has a higher degree of expressiveness. For example, [17] defines an ontology as a formal specification of a conceptualization, and McGuinness [27] states that an ontology should have at least a "finite controlled set of vocabulary, an unambiguous interpretation of classes and term relationships, and a strict hierarchical subclass relationship between classes." Also, in ontologies it is permissible to use logical expressions, such as stating that a property is transitive or symmetric or that a class is the complement of another class.

A small ontology example is presented in Fig. 3, where the classes BodyOfWater, River, and Lake are shown explicitly as boxes with the name of the class in bold in the first row. Properties are presented in the second and third rows. The property connectsTo applies to all the classes that inherit from BodyOfWater, while length and area apply only to the local classes River and Lake, respectively. Figure 3 contains only one of the many possible representations of an ontology. Because a given domain ontology should be understandable to a variety of communities, a formal approach is necessary (in this case OWL), which is serialized using RDF/XML and shown on the left side of the image panel.

Fig. 3 Small ontology example

Another example, the conversion of a terminological tool to an ontology, is presented in Fig. 4, where categories are encoded as classes and the terms are encoded as individuals (or instances) of these classes.

Fig. 4 Hydrologic units ontology

Besides domain ontologies like the one presented in the above figure, other higher-level conceptualizations can be found for the hydrologic domain, such as ArcHydro [26]. ArcHydro is an example of a data model for hydrology and hydrography, presented in UML. Like domain vocabularies, however, conceptual models should be available as ontologies in OWL or a similar ontology language. UML expressed in XMI could be transformed into OWL either by using Extensible Stylesheet Language Transformations, XSLT [8], or by using a common metadata model, e.g., the Meta Object Facility (MOF) proposed by the Object Management Group [31]. In both cases the idea is to map UML classes to OWL classes, and UML attributes and associations to OWL datatype properties and object properties, respectively. There are a number of other mapping approaches, which are not presented here; the interested reader is referred to [1], which discusses additional mapping concepts.

Table 1 Terminology used in OWL, UML, and ISO

UML                        ISO        OWL
Class                      Entity     Class
Attribute or Association   Element    Property

Fig. 5 Ontology extension in XML
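The small ontology of Fig. 3 can be sketched in RDF/XML as follows. This is an illustrative reconstruction, not the authors' figure: the class and property names come from the text, while the numeric ranges (xsd:double) and the helper `applicable_properties` are assumptions. The helper follows the paper's informal reading that a property "applies" to a class and to all classes that inherit from it.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"

ONTOLOGY = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="BodyOfWater"/>
  <owl:Class rdf:ID="River">
    <rdfs:subClassOf rdf:resource="#BodyOfWater"/>
  </owl:Class>
  <owl:Class rdf:ID="Lake">
    <rdfs:subClassOf rdf:resource="#BodyOfWater"/>
  </owl:Class>
  <owl:ObjectProperty rdf:ID="connectsTo">
    <rdfs:domain rdf:resource="#BodyOfWater"/>
    <rdfs:range rdf:resource="#BodyOfWater"/>
  </owl:ObjectProperty>
  <owl:DatatypeProperty rdf:ID="length">
    <rdfs:domain rdf:resource="#River"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#double"/>
  </owl:DatatypeProperty>
  <owl:DatatypeProperty rdf:ID="area">
    <rdfs:domain rdf:resource="#Lake"/>
    <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#double"/>
  </owl:DatatypeProperty>
</rdf:RDF>
"""

root = ET.fromstring(ONTOLOGY)

# Map each class to its declared parent (None for a top-level class).
parents = {}
for cls in root.findall(OWL + "Class"):
    sup = cls.find(RDFS + "subClassOf")
    parents[cls.get(RDF + "ID")] = (
        sup.get(RDF + "resource").lstrip("#") if sup is not None else None
    )

# Map each property to its declared domain class.
domains = {}
for kind in ("ObjectProperty", "DatatypeProperty"):
    for prop in root.findall(OWL + kind):
        domain = prop.find(RDFS + "domain").get(RDF + "resource")
        domains[prop.get(RDF + "ID")] = domain.lstrip("#")

def applicable_properties(cls):
    """Properties whose domain is cls or one of its ancestors."""
    lineage = []
    while cls is not None:
        lineage.append(cls)
        cls = parents[cls]
    return sorted(p for p, d in domains.items() if d in lineage)

print(applicable_properties("River"))
print(applicable_properties("Lake"))
```

River picks up connectsTo from BodyOfWater in addition to its own length, which mirrors the mapping of UML associations to object properties and UML attributes to datatype properties described above.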
5. Requirements to Create Community Profiles in OWL

The requirement for extending a metadata specification using OWL is that both the metadata specification and the controlled vocabulary that is going to be used in the specification be written in OWL. Unfortunately, at the time of writing this paper there was no endorsement by ISO (or any other publisher of standards) of a metadata specification formalized in OWL. While this may be due in part to the fact that OWL is in its infancy (newly recommended by the W3C on 10 February 2004), we were able to use a mapping of ISO 19115 to OWL [24], which we will use for our purpose. In fact, communications with members of ISO Technical Committee 211 (TC211) at the time of writing this manuscript suggest that efforts are under way to consider an endorsement of ISO 19115:2003 in OWL in the near future.

It is desirable to reuse resources on the Web when developing ontologies and controlled vocabularies in order to i) reduce the repetitive effort of development, and ii) readily include agreed-upon specifications. While it will take a while for more vocabularies to become available on the Web, two very good ontologies that contain an expansive collection of vocabulary are Wordnet (http://taurus.unine.ch/GroupHome/knowler/) and SWEET (http://sweet.jpl.nasa.gov/sweet/). The usability of either one, however, is limited because the first is too general in its word collection and the second is not complete enough to be readily used for hydrologic purposes. Yet both of them are written in OWL and as such provide a good starting point that can be utilized and built upon through reuse and extension.

The first step for any extension is to import the ontologies that contain the specifications and the vocabularies, as shown in Fig. 5. The tag owl:imports encloses the resource to be imported. Software tools built for the Semantic Web understand the rdf:resource attribute and will try to load the model located at the URI. Once the model is loaded, all the resources contained at the URI are available to be extended. The tool Protégé does this automatically and displays all the classes and properties of all the imported ontologies.

OWL syntax conforms to XML namespace rules, where prefixes and namespaces are declared in the header of the XML file. That is why it is possible to show the abbreviated tag owl:imports. Also, it is common to create embedded DTDs in the XML file, so instead of presenting the resource http://loki.cae.drexel.edu/~wbs/ontology/2004/02/iso-metadata.owl#CI_Citation, we are able to present the resource as &iso;CI_Citation when it is used as an attribute value. We will follow this abbreviated convention in the subsequent examples to facilitate readability.

Fig. 6 Creation of a class and a sub-class in OWL

Fig. 7 Neuse-Station ontology
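The import step and the entity abbreviation can be sketched as below. This is an assumed header, not the content of Fig. 5: the ontology URL is taken from the resource URI quoted in the text (with the fragment removed), and the class CI_Citation_ext is a hypothetical name introduced only to show an &iso; reference in an attribute value. The sketch parses the header locally and does not fetch anything from the Web.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"

# The embedded DTD declares the entity "iso", so "&iso;CI_Citation"
# expands to the full URI when the file is parsed.
PROFILE_HEADER = """<!DOCTYPE rdf:RDF [
  <!ENTITY iso "http://loki.cae.drexel.edu/~wbs/ontology/2004/02/iso-metadata.owl#">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://loki.cae.drexel.edu/~wbs/ontology/2004/02/iso-metadata.owl"/>
  </owl:Ontology>
  <owl:Class rdf:ID="CI_Citation_ext">
    <rdfs:subClassOf rdf:resource="&iso;CI_Citation"/>
  </owl:Class>
</rdf:RDF>
"""

root = ET.fromstring(PROFILE_HEADER)

# URIs of the ontologies that a Semantic Web tool would try to load.
imports = [imp.get(RDF + "resource") for imp in root.iter(OWL + "imports")]

# The abbreviated attribute value, after entity expansion by the parser.
parent = root.find(OWL + "Class").find(RDFS + "subClassOf").get(RDF + "resource")

print(imports)
print(parent)
```

A tool such as Protégé would dereference each imported URI and merge the loaded model; here only the declaration is inspected.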
6. Community Profiles and Metadata Extension

Community profiles are extensions of metadata specifications, created to fit the specific needs of a geospatial information community. Typically, metadata specifications provide guidelines for extending their content and scope, as is the case for ISO 19115 (see Annexes C, D, and E of the standard), for which we will show the possibilities of extension using OWL. The same rationale could be used to create community profiles for any other metadata specification.

ISO 19115 is presented diagrammatically using the Unified Modeling Language (UML). The ISO 19115 metadata set is composed of UML packages, each of which is composed of entities (UML classes). An entity contains elements (UML attributes), which identify the discrete units of the metadata. For example, title, alternateTitle, and date are elements of the class CI_Citation. To clarify the usage of terms, Table 1 presents the different terminology used by UML, ISO, and OWL, i.e., Class is the same as Entity, and Attribute, Element, and Property all refer to the same concept.

In ISO 19115:2003 a community profile should consist of the core metadata set, some optional elements, and newly defined elements. To create a community profile the following extensions are allowed: 1) adding a new metadata section; 2) creating a new metadata code list to replace the domain of an existing metadata element that has "free text" listed as its domain value; 3) creating or expanding a code list; 4) adding a new metadata element to an already existing entity; 5) adding a new metadata entity; 6) imposing a more stringent obligation on an existing metadata element; 7) imposing a more restrictive domain on an existing metadata element. The mapping rules for the above extensions are discussed in detail in the following sections.

Fig. 8 Usage of a Code-list as a range of a property in OWL

Fig. 9 ISO Code-list

6.1. Adding a New Metadata Section or Metadata Entity

A metadata section is a set of classes that are related to each other. OWL, however, provides no means of creating packages or sections. One way to separate the original classes from a new set of classes is to create a new ontology that could resemble a package. In OWL, classes are created using the owl:Class tag. The rdf:ID is the identifier of the resource, as shown in the left panel of Fig. 6.
The newly created class or classes need to be linked with the original model. This is done by creating a new property or by creating a new parent-child relationship that links a class from the original specification to the newly created class. Creation of a new property is explained in Section 6.4. Creating a new parent-child relationship is done by specifying that the new class is a subclass of the original one, as shown in the right panel of Fig. 6. In that example, the class MD_Keywords_ext is a new class that is a subclass of MD_Keywords, which is an original ISO class.

6.2. Creating a New Metadata Code List

Code lists and enumerations in ISO are defined as datatype classes whose instances form a list of named literals that contain a set of values. In OWL this can be represented as a class whose instances are the list of possible values. Figure 7 shows an excerpt of a list of gauge stations for the Neuse River Basin, NC, encoded in XML, where a new class Station and instances of that class are created with their respective IDs and labels in English (language tag "en"). The Station class in this ontology excerpt could be used as the range of a property in another ontology. In Fig. 8, the range of the property site refers to all the stations located in the Neuse-Station ontology of Fig. 7, or simply the instances of &neuse;Station.

6.3. Expanding a Code-List

Code lists, as previously mentioned, are formalized in OWL as a class. To add a new member of the class, one needs to create a new instance of that class. An ISO code list will look similar to Fig. 9. After importing this ontology, an instance can be created for one of the imported classes. Figure 10 shows a new instance of the class TopicCatCd with ID="_020".

Fig. 10 Extending an ISO's Code-list

6.4. Adding a New Metadata Element to an Existing Class

A metadata element is equivalent to a property in OWL. In OWL there are two types of properties: datatype properties (owl:DatatypeProperty) and object properties (owl:ObjectProperty). A datatype property has as its range an XML datatype (e.g., string, integer, date, and others given by the XML schema specification), while an object property has as its range an owl:Class. Figure 11 shows the creation of a new property, named site, whose domain is the ISO MD_DataIdentification class and whose range is the instances of the class Station, as previously defined in the Neuse-Station ontology.

Fig. 11 Creation of an object property in OWL
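The class, subclass, and code-list mappings of Sections 6.1 to 6.3 can be sketched together as follows. This is an assumption-laden illustration rather than a copy of Figs. 6, 7, and 10: the namespace http://example.org/profile# is hypothetical, the station labels are placeholders, only the station number 0208758850 appears in the paper, and for brevity the subclass and the code list are placed in one ontology even though the text keeps them in separate ones.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
EXT = "{http://example.org/profile#}"  # hypothetical profile namespace

PROFILE = """
<rdf:RDF xmlns="http://example.org/profile#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- Section 6.1: a new class linked to the ISO model as a subclass -->
  <owl:Class rdf:ID="MD_Keywords_ext">
    <rdfs:subClassOf rdf:resource="http://loki.cae.drexel.edu/~wbs/ontology/2004/02/iso-metadata.owl#MD_Keywords"/>
  </owl:Class>
  <!-- Section 6.2: a code list is a class; its instances are the values -->
  <owl:Class rdf:ID="Station"/>
  <Station rdf:ID="_0208758850">
    <rdfs:label xml:lang="en">Station 0208758850 (placeholder label)</rdfs:label>
  </Station>
  <Station rdf:ID="_0000000001">
    <rdfs:label xml:lang="en">Hypothetical station 1</rdfs:label>
  </Station>
</rdf:RDF>
"""

def code_list(root, class_tag):
    """Return {rdf:ID: label} for every instance of the code-list class."""
    return {
        el.get(RDF + "ID"): el.findtext(RDFS + "label")
        for el in root.findall(class_tag)
    }

root = ET.fromstring(PROFILE)
before = code_list(root, EXT + "Station")

# Section 6.3: expanding the code list means adding one more instance.
new_station = ET.SubElement(root, EXT + "Station", {RDF + "ID": "_0000000002"})
ET.SubElement(new_station, RDFS + "label").text = "Hypothetical station 2"
after = code_list(root, EXT + "Station")

print(sorted(before))
print(sorted(after))
```

The IDs start with an underscore because an rdf:ID must be a valid XML name and therefore cannot begin with a digit, which is consistent with the ID "_020" used for the new TopicCatCd instance.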
If we would like to create a property whose range is a simple data type, such as an integer, then an owl:DatatypeProperty should be declared, as shown in Fig. 12:
Fig. 12 Creation of a datatype property in OWL
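The two kinds of property declaration can be sketched with a small helper. This is a minimal sketch, not the content of Figs. 11 and 12: the helper `make_property`, the Neuse namespace URI, and the property name numberOfStations are hypothetical, and choosing the property kind from the range namespace is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"
XSD = "http://www.w3.org/2001/XMLSchema#"

ISO = "http://loki.cae.drexel.edu/~wbs/ontology/2004/02/iso-metadata.owl#"
NEUSE = "http://example.org/neuse-station.owl#"  # hypothetical namespace

def make_property(name, domain_uri, range_uri):
    """Build an OWL property element; the kind follows from the range.

    Assumption: a range in the XML Schema namespace means a datatype
    property, and any other range is treated as an OWL class.
    """
    kind = "DatatypeProperty" if range_uri.startswith(XSD) else "ObjectProperty"
    prop = ET.Element(OWL + kind, {RDF + "ID": name})
    ET.SubElement(prop, RDFS + "domain", {RDF + "resource": domain_uri})
    ET.SubElement(prop, RDFS + "range", {RDF + "resource": range_uri})
    return prop

# Object property: its range is the Station class of the Neuse ontology.
site = make_property("site", ISO + "MD_DataIdentification", NEUSE + "Station")

# Datatype property (hypothetical element): its range is an XML datatype.
count = make_property(
    "numberOfStations", ISO + "MD_DataIdentification", XSD + "integer"
)

site_xml = ET.tostring(site, encoding="unicode")
print(site_xml)
print(ET.tostring(count, encoding="unicode"))
```

Both elements share the same domain, so they would both appear as new metadata elements of MD_DataIdentification in the profile.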
Fig. 13 ISO element datasetURI
6.5. Imposing a More Stringent Obligation on an Existing Metadata Element

Restricting a metadata element can be interpreted as imposing restrictions on a property in OWL. The restrictions that are available in OWL are hasValue, allValuesFrom, someValuesFrom, minCardinality, maxCardinality, and cardinality. All of these restrictions can be applied to extend properties, shaping the extended ontology to fit the specific needs of geospatial information communities. A property has the following characteristics: cardinality, type, and range. We propose that in order to change the characteristics of a property the following must be done: first, create a subclass of the class that holds the property to be extended, and second, create a local restriction on the extended property in the subclass.

Imposing a more stringent obligation on an existing metadata element requires a "change" in the cardinality of the element. In OWL this is done by creating a local restriction on a property. A more stringent obligation means that the minimum cardinality was zero before and should now be one. This can be done by stating that owl:minCardinality or owl:cardinality is equal to one. Figure 13 shows the XML expression for the ISO element dataSetURI. If this property is to become mandatory, the following steps must be executed: first, create a new subclass of &iso;MD_Metadata, namely &ext;MD_Metadata_ext, and second, create an owl:minCardinality restriction on the property &iso;dataSetURI. An example is shown in Fig. 14.

6.6. Imposing a More Restrictive Domain on an Existing Metadata Element

Imposing a more restrictive domain on an existing metadata element is interpreted in OWL as changing the range of a property. It is similar to the previous case because it implies the creation of a local restriction on a property.
Fig. 14 Cardinality restriction in OWL
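The two steps of Section 6.5 (a subclass, then a local owl:minCardinality restriction) can be sketched as below. The RDF/XML is an assumed encoding, not the content of Fig. 14, and the helpers `min_cardinalities` and `missing_elements` are hypothetical. Note that the check is a simple closed-world test that a validating application might apply to a metadata record; it is not OWL reasoning.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"
ISO = "http://loki.cae.drexel.edu/~wbs/ontology/2004/02/iso-metadata.owl#"

PROFILE = f"""
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="MD_Metadata_ext">
    <rdfs:subClassOf rdf:resource="{ISO}MD_Metadata"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="{ISO}dataSetURI"/>
        <owl:minCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:minCardinality>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""

def min_cardinalities(root):
    """Return {(class_id, property_uri): n} for minCardinality restrictions."""
    rules = {}
    for cls in root.findall(OWL + "Class"):
        for sub in cls.findall(RDFS + "subClassOf"):
            restriction = sub.find(OWL + "Restriction")
            if restriction is None:
                continue  # a plain subclass link, not a restriction
            minimum = restriction.find(OWL + "minCardinality")
            if minimum is not None:
                prop = restriction.find(OWL + "onProperty").get(RDF + "resource")
                rules[(cls.get(RDF + "ID"), prop)] = int(minimum.text)
    return rules

def missing_elements(record, class_id, rules):
    """List the properties of a record that fall short of their minimum."""
    return [
        prop
        for (cls, prop), n in rules.items()
        if cls == class_id and len(record.get(prop, [])) < n
    ]

rules = min_cardinalities(ET.fromstring(PROFILE))
empty_record = {}
complete_record = {ISO + "dataSetURI": ["http://example.org/data/1"]}
print(missing_elements(empty_record, "MD_Metadata_ext", rules))
print(missing_elements(complete_record, "MD_Metadata_ext", rules))
```

Because the restriction lives on the subclass, records typed with the original iso:MD_Metadata class are unaffected and dataSetURI stays optional for them.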
Fig. 15 Restriction iso:keyword
Figure 15 shows an extension to the property iso:keyword on the class MD_Keywords, and Fig. 16 shows the extension in XML. The extension implies that the range (or domain in ISO) of the property iso:keyword is no longer CharacterString but now contains all values from gcmd:Surface_Water. In OWL, such restrictions are made indirectly by stating that the class that is restricting the property is a subclass of a class called owl:Restriction. The owl:Restriction contains the property that is being restricted, iso:keyword, and the type of restriction, owl:allValuesFrom. The owl:allValuesFrom tag is a built-in OWL property that links the restriction to a class description or a data range. A class description is a defined class whose instances are all the values that the restriction refers to. In Fig. 15 this class is gcmd:Surface_Water, which contains instances like discharge and stage height. It is important to note that the logical reading of the created statement is "all individuals that have values for the property iso:keyword of type gcmd:Surface_Water are of type gic:MD_Keywords_EXT."
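The range restriction can be sketched as follows. The RDF/XML is an assumed encoding rather than the content of Fig. 16; the namespace http://example.org/gcmd# is hypothetical, the instance names discharge and stage_height follow the examples given in the text, and salinity is an invented value used only to show a rejected keyword.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
OWL = "{http://www.w3.org/2002/07/owl#}"
ISO = "http://loki.cae.drexel.edu/~wbs/ontology/2004/02/iso-metadata.owl#"
GCMD = "http://example.org/gcmd#"  # hypothetical vocabulary namespace

ONTOLOGY = f"""
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:gcmd="{GCMD}">
  <owl:Class rdf:about="{GCMD}Surface_Water"/>
  <gcmd:Surface_Water rdf:about="{GCMD}discharge"/>
  <gcmd:Surface_Water rdf:about="{GCMD}stage_height"/>
  <owl:Class rdf:ID="MD_Keywords_EXT">
    <rdfs:subClassOf rdf:resource="{ISO}MD_Keywords"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="{ISO}keyword"/>
        <owl:allValuesFrom rdf:resource="{GCMD}Surface_Water"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""

def tag_uri(element):
    """Turn ElementTree's "{namespace}local" tag back into a URI."""
    namespace, local = element.tag[1:].split("}")
    return namespace + local

root = ET.fromstring(ONTOLOGY)
restriction = root.find(f".//{OWL}Restriction")
on_property = restriction.find(OWL + "onProperty").get(RDF + "resource")
filler = restriction.find(OWL + "allValuesFrom").get(RDF + "resource")

# Instances of the filler class are the only permitted keyword values.
allowed = {el.get(RDF + "about") for el in root if tag_uri(el) == filler}

def rejected(keywords):
    """Keywords that are not instances of the restricting class."""
    return [k for k in keywords if k not in allowed]

print(on_property, filler)
print(rejected([GCMD + "discharge", GCMD + "salinity"]))
```

With the restriction written as a superclass of MD_Keywords_EXT, every keyword value of an instance of that class is expected to come from gcmd:Surface_Water, which is what the membership check above approximates.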
7. Extension Problems

7.1. Restricting Datatype Properties

OWL can be used in three different versions: OWL-Lite, OWL-DL, and OWL-Full. While the OWL-Full version is the most expressive, the other two guarantee computational completeness and decidability. For these reasons, we attempted to create all of the ontologies in OWL-DL only; however, there are some expressions where OWL-Full is needed, as in the case presented previously in Fig. 15. The figure shows a restriction that uses allValuesFrom. If the property that the restriction is applied to is a datatype property, and the restriction on that property is not a datatype class, an rdfs:Literal, or a oneOf, it becomes an OWL-Full expression. Since gcmd:Surface_Water in Fig. 15 is an owl:Class and not a datatype class, an rdfs:Literal, or a oneOf, the statement is in OWL-Full.

There are two possible ways to solve this problem: 1) in the original metadata specification, change the datatype property to an object property. This is not feasible, however, since we require that the metadata specification be a Web-accessible resource published by an entity (e.g., ISO) that is different from the community creating the extension. The community cannot be permitted to change the original metadata specification directly, because this specification will be used by other communities; 2) treat the datatype property as an object property in the extension. In this case the extension will be in OWL-Full. Because of the concern with regard to computational completeness and decidability when using an ontology in OWL-Full, we tested our extension using a Java API developed by HP Labs (http://www.hpl.hp.com/semweb/) and with Protégé (http://protege.stanford.edu/) from Stanford University. We found that it was possible to create such expressions and did not encounter any problems.

An instance can be presented in two ways, as an object property or as a datatype property, as presented in Fig. 17. Both have the same meaning; however, a computer application that reads these instances should be able to accommodate instances in which some literal values might be a URI (e.g., "http://foo#b1"), as shown in the right panel of Fig. 17. A concrete example of the datatype property restriction problem is depicted in Fig. 16, where the datatype property is treated as an object property.

Fig. 16 Extension of iso:keywords in XML

Fig. 17 Value of a property as object property and as datatype property
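The condition described in this section can be written as a small check that a profile editor might run before publishing an extension. This is a simplified sketch and not a species validator: the function name `leaves_owl_dl` and its arguments are hypothetical, the gcmd namespace is a placeholder, and anonymous oneOf data ranges are not modelled because they have no URI.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
RDFS_LITERAL = "http://www.w3.org/2000/01/rdf-schema#Literal"

def leaves_owl_dl(property_kind, filler_uri, declared_classes):
    """True when an allValuesFrom restriction on a datatype property
    names an owl:Class instead of a data range.

    property_kind    -- "DatatypeProperty" or "ObjectProperty", as declared
                        in the imported metadata specification
    filler_uri       -- the resource referenced by owl:allValuesFrom
    declared_classes -- URIs known to be declared as owl:Class
    """
    if property_kind != "DatatypeProperty":
        return False  # object properties may be restricted to classes
    is_data_range = filler_uri.startswith(XSD) or filler_uri == RDFS_LITERAL
    return not is_data_range and filler_uri in declared_classes

GCMD = "http://example.org/gcmd#"  # hypothetical vocabulary namespace
classes = {GCMD + "Surface_Water"}

# iso:keyword is a datatype property restricted to an owl:Class.
keyword_case = leaves_owl_dl("DatatypeProperty", GCMD + "Surface_Water", classes)

# Restricting a datatype property to an XML datatype stays within OWL-DL.
string_case = leaves_owl_dl("DatatypeProperty", XSD + "string", classes)

# The same class filler on an object property raises no concern.
object_case = leaves_owl_dl("ObjectProperty", GCMD + "Surface_Water", classes)

print(keyword_case, string_case, object_case)
```

A flagged restriction corresponds to the second option discussed above, in which the extension treats the datatype property as an object property and the resulting ontology is in OWL-Full.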
Fig. 18 MD_Identifier as range of two properties
Fig. 19 Extending a property of a multi-range-class
7.2. Restricting Inherited Properties on Classes that are the Range of More than One Property

On occasion the need arises to restrict a property in a class where the class is the range of more than one property (hereafter, a multi-range-class). If a multi-range-class is restricted, the restriction will apply every time the class is used (e.g., as the range of a property). All the classes that were mapped from an ISO class with the stereotype datatype are potentially exposed to this problem, since they are ranges of more than one property. Figure 18 shows a multi-range-class, MD_Identifier, which is the range of two properties: geographicIdentifier, from EX_GeographicDescription, and identifier, from CI_Citation. If the MD_Identifier of geographicIdentifier is restricted to have only one possible value for its authority property (e.g., the citation referring to GETTY, which provides a thesaurus for geographic names), then identifier in CI_Citation will also be subjected to the same restriction. The workaround for this problem is to also create an extension of the domain of the property that uses the multi-range-class and to apply the restriction to that class. Figure 19 depicts the schematic, where the class ext:EX_GeographicDescription_EXT, which is a subclass of iso:EX_GeographicDescription, is subjected to a restriction with all values from ext:MD_Identifier_EXT. Using this procedure, a computer program could be coded to prefer the extended classes over the original ones and to validate instances, display a GUI, or query metadata instances that conform to a particular community profile.
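Finding the classes that are exposed to this problem can be sketched as a scan over the property declarations. The excerpt below is an assumed fragment built from the classes and properties named in the text (the range of authority is taken to be CI_Citation, following the remark that its value is a citation), and the function `multi_range_classes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"
OWL = "{http://www.w3.org/2002/07/owl#}"

ISO_EXCERPT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="MD_Identifier"/>
  <owl:Class rdf:ID="EX_GeographicDescription"/>
  <owl:Class rdf:ID="CI_Citation"/>
  <owl:ObjectProperty rdf:ID="geographicIdentifier">
    <rdfs:domain rdf:resource="#EX_GeographicDescription"/>
    <rdfs:range rdf:resource="#MD_Identifier"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="identifier">
    <rdfs:domain rdf:resource="#CI_Citation"/>
    <rdfs:range rdf:resource="#MD_Identifier"/>
  </owl:ObjectProperty>
  <owl:ObjectProperty rdf:ID="authority">
    <rdfs:domain rdf:resource="#MD_Identifier"/>
    <rdfs:range rdf:resource="#CI_Citation"/>
  </owl:ObjectProperty>
</rdf:RDF>
"""

def multi_range_classes(root):
    """Return {class: [properties]} for classes used as range more than once."""
    users = defaultdict(list)
    for prop in root.findall(OWL + "ObjectProperty"):
        range_class = prop.find(RDFS + "range").get(RDF + "resource").lstrip("#")
        users[range_class].append(prop.get(RDF + "ID"))
    return {cls: sorted(props) for cls, props in users.items() if len(props) > 1}

exposed = multi_range_classes(ET.fromstring(ISO_EXCERPT))
print(exposed)
```

For each class reported, the workaround of this section applies: subclass the domain of the one property that should be constrained (here EX_GeographicDescription) and place an allValuesFrom restriction there that points to the extended range class, so that identifier in CI_Citation keeps the unrestricted MD_Identifier.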
8. Summary

The proposed extension methodology will overcome many of the difficulties that the geoscience communities currently encounter when attempting to address the complexity of metadata specifications. Using OWL, it was shown that it is possible to extend metadata specifications using distributed resources with a much greater degree of flexibility than with UML or XML schemas. To this end, we outlined a concept of how generic metadata catalogues (in UML) and ontologies (in OWL) can be merged to create community specific metadata profiles. The formalized approach included the need to create a controlled vocabulary to overcome semantic heterogeneities. This has been addressed through the use of domain specific ontologies that permit harvesting of permissible entries from distributed resources on the Web, i.e., the inclusion of already existing as well as disparate resources. It was shown that the presented methodology also permits a simplified extension of community metadata profiles, which is an important feature because metadata profiles need to be altered as they develop, until they reach a high degree of maturity. This concept also includes a set of mapping rules that needed to be established in order to bridge the differences between conceptualizations in UML and OWL.

Table 2 Summary of features

| Name | Definition | Example |
|---|---|---|
| Range | Type of values a property can take. | iso:dateStamp has range xsd:date. |
| Domain | Relation between a property and the class that contains this property. If a property can be used by a class, the class is defined as the domain of the property. | iso:dateStamp has domain MD_Metadata class. |
| Datatype property | An OWL property that has an XML datatype as its range. Similar to attributes in UML. | iso:dateStamp is a datatype property because its range is xsd:date. |
| Object property | An OWL property that has an OWL class as its range. Similar to associations in UML. | iso:contact is an object property because its range is the class CI_ResponsibleParty. |
| Domain restriction | When a property is restricted to a finite set of values of a domain. The property needs to be restricted to include only those values that are the instances of the class. | See Fig. 15. The property to be restricted is iso:keyword, and the values are instances of the class gcmd:Surface_Water. |
| Multi-range-class | A class that is the range of more than one property. | See Fig. 18. iso:MD_Identifier is the range of more than one property: iso:geographicIdentifier and iso:identifier. |
Table 3 Summary of extension problems

| Problem description | Checks | Occurrence | Solution |
|---|---|---|---|
| OWL-DL does not allow an owl:allValuesFrom restriction to an OWL class on a datatype property. | Is the property a datatype property? | The property to be extended is a datatype property, and the restriction is to have all values from an OWL class. | Treat the datatype property as an object property. This is OWL-Full. |
| Undesirable propagation of restrictions. | Is the range of the property declared as the range of other properties? | Properties with multi-range-classes as a range. | Create a subclass of the domain of the property and apply the restriction to this subclass. |
Tables 2 and 3 summarize the main features covered in this paper and the extension problems, respectively. In these two tables we have used the following namespaces: xsd refers to http://www.w3.org/2001/XMLSchema, iso refers to http://loki.cae.drexel.edu/~wbs/ontology/2004/08/iso-19115#, and gcmd refers to http://loki.cae.drexel.edu/~how/2004/07/14/gcmd.owl.

The utilization of Semantic Web tools, along with already existing thesauri and standardized generic metadata models in OWL, was shown to permit the creation of a framework that ties together both already existing metadata descriptions and metadata sets yet to be developed, forming one comprehensive metadata realm that allows the discovery of data sets from vastly different sources. We are also aware that the Semantic Web needs more tools, especially to interact with geospatial metadata, to help in the creation of metadata instances, and for the querying, retrieval, and seamless interchange of real time, historic, and numerical model data. Tools and domain knowledge expressed in ontologies are necessary future work that will complement the metadata models to achieve semantic interoperability among geospatial information communities.

We have developed a tool, Pangloss, that uses the methodology explained in this paper. It is available as a Java Web Start program at http://loki.cae.drexel.edu:8080/~how/pangloss/. We have also published a first version of the community profile for CUAHSI, which can be viewed at http://loki.cae.drexel.edu:8080/web/how/me/metadatacuahsi.html.
Acknowledgments This research has been in part supported by a grant from the National Ocean Partnership Program, NOPP, under grant-# NAG 13-0040, and is also currently supported by the NSF-GEO Directorate under grant-# 0412838 to create a Hydrologic Metadata set for the Consortium of Universities for the Advancement of the Hydrologic Sciences, Inc., CUAHSI, prototype Hydrologic Information System (HIS), in the Neuse River Basin, North Carolina. Special thanks go to Ken Lanfear and Jeff Christman of USGS, Reston, for their help in organizing a miniworkshop to discuss IT and metadata concepts and access to their NWIS server logs.
References

1. K. Baclawski, M. Kokar, P. Kogut, L. Hart, J. Smith, J. Letkowski, and P. Emery. "Extending the unified modeling language for ontology development," Software System Model, 1:1–15, 2002.
2. C. Batini, S. Ceri, and S.B. Navathe. "Conceptual database design," The Benjamin/Cummings Publishing Company, Inc., Redwood City, CA, 1992.
3. T. Berners-Lee, J. Hendler, and O. Lassila. "The semantic web," Scientific American, Vol. 184(5):34–43, 2001.
4. Y. Bishr. "Overcoming the semantic and other barriers to GIS interoperability," Geographic Information Science, Vol. 12(4):299–314, 1998.
5. CHRONOS. "An information system for chronostratigraphy," in http://www.chronos.org/index.html, 2004.
6. CLEANER. "Collaborative Large-Scale Engineering Analysis Network for Environmental Research," in http://cleaner.ce.berkeley.edu/intro.php, 2004.
7. Commission on Geosciences Environment and Resource CGER. "A data foundation for the national spatial data infrastructure," National Academy Press, Washington, D.C., 1995.
8. S. Cranefield. "UML and the semantic web," in Semantic Web Working Symposium, California, USA, in http://www.semanticweb.org/SWWS/program/full/paper1.pdf, 2001.
9. CUAHSI. Consortium for the Advancement of the Hydrologic Sciences, Inc., in http://www.cuahsi.org/, 2004.
10. DCMI. Dublin Core Metadata Initiative, in http://dublincore.org/, 2004.
11. Ecoinformatics. "EML - Ecological Markup Language," in http://www.ecoinformatics.org/tools.html, 2003.
12. M.J. Egenhofer. "Toward the semantic geospatial web," in Tenth ACM International Symposium on Advances in Geographic Information Systems, ACM Press: McLean, VA, USA, 2002.
13. A. Elmargarmid and C. Pu. "Guest editors’ introduction to the special issue on heterogenous databases," ACM Computing Surveys, 22:175–178, 1990.
14. EPA, US Environmental Protection Agency, in http://www.epa.gov/, 2004.
15. FGDC. "Content standard for digital geospatial metadata," Washington, D.C., 1998.
16. Y. Gil and V. Ratnakar. "TRELLIS: An interactive tool for capturing information analysis and decision making," in A. Gómez-Pérez and V. Richard Benjamins (Eds.), Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web: 13th International Conference, EKAW 2002, Lecture Notes in Computer Science, vol. 2473, pp. 37–42, Springer-Verlag: Heidelberg, Siguenza, Spain, 2002.
17. T. Gruber. "A translation approach to portable ontology specification," Knowledge Acquisition 5(2):199–220, 1993.
18. T. Hadzilakos, G. Halaris, M. Kavouras, M. Kokla, G. Panapoulos, I. Paraschakis, T. Sellis, L. Tsoulos, and M. Zervakis. "Interoperability and definition of a national standard for geospatial data: the case of the hellenic cadastre," International Journal of Applied Earth Observations and Geoinformation, Vol. 2(2):120–128, 2000.
19. F. Harvey, W. Kuhn, H. Pundt, and Y. Bishr. "Semantic interoperability: A central issue for sharing geographic information," The Annals of Regional Science, Vol. 33(2):213–232, 1999.
20. J. Helly, A.A.P. Koppers, and H. Staudigel. "Scalable models of data sharing in earth sciences," Geochem. Geophys. Geosyst, Vol. 4(1):1010, doi:10.1029/2002GC000318, 2003.
21. J. Hendler. "XML and the semantic web," XML Journal, October, 2002.
22. J. Hunter and C. Lagoze. "Combining RDF and XML schemas to enhance interoperability between metadata application profiles," in The Tenth International World Wide Web Conference, pp. 457–466, ACM Press: Hong Kong, 2001.
23. IRIS. Incorporated Research Institutions for Seismology, in http://www.iris.washington.edu/, 2004.
24. A.K.M.S. Islam, L.E. Bermudez, and M. Piasecki. "Ontology for geographic information - Metadata (ISO 19115)," in http://loki.cae.drexel.edu/~wbs/ontology/, 2004.
25. ISO. "Geographic information - Metadata," 2003.
26. D.R. Maidment. Arc Hydro: GIS for water resources. ESRI: California, 2002.
27. D.L. McGuinness. "Ontologies come of age." Spinning the semantic Web. D. Fensel, J. Hendler, H. Lieberman and W. Wahlster. The MIT Press: London, England, 2003.
28. NOAA. National Oceanic and Atmospheric Administration, in http://www.noaa.gov/, 2004.
29. NOKIS. North and Baltic Sea Coastal Information System, 2004.
30. OASIS. http://www.oasis-open.org, 2004.
31. OMG. "Meta-Object Facility MOF™," version 1.4, in http://www.omg.org/technology/documents/formal/mof.htm, 2002.
32. OMG. "Unified modeling language specification," in http://www.omg.org/technology/documents/formal/uml.htm, 2003.
33. OMG. Object Management Group, in http://www.omg.org/, 2004.
34. A.P. Sheth. "Changing focus on interoperability in information systems: From system, syntax, structures to semantics," in M.F Goodchild, M.J. Egenhofer, R. Fegeas, and C. Cottman (Eds.), Interoperating Geographic Information Systems, 5–29, Boston, Kluwer Academic Publishers, 1999.
35. K. Stocks and J. Quinn. "Data technologies: Geospatial data integration," in W. Michener and P. Tooby (Eds.), Scalable Information Networks for the Environment (SINE). Report of an NSF-Sponsored Workshop, pp. 23–29, San Diego Supercomputer Center, 2002.
36. USGS. "Hydrologic Markup Language (HYDROML)," in http://water.usgs.gov/nwis_activities/XML/nwis_hml.htm, 2004.
37. USGS. US Geological Survey, in http://www.usgs.gov/, 2004.
38. W3C. "Extensible Markup Language (XML)," in http://www.w3.org/XML/, 2003.
39. W3C. "Ontology Web Language (OWL)," in http://www.w3.org/2001/sw/WebOnt/, 2004.
40. W3C. "XML Linking Language (XLink)," in http://www.w3.org/TR/xlink/, 2004.
41. W3C. "XML Path Language (XPath)," in http://www.w3.org/TR/xpath, 2004.
Dr. Luis Bermudez holds a degree in Industrial Engineering from the Andes University in Bogota, Colombia (1994) and a Masters and Ph.D. in Civil Engineering from Drexel University (2004) with a focus on Hydro-informatics. His primary research is on knowledge representation and semantic mediation of geoscience information systems. He is currently the technical leader of the Marine Metadata Interoperability project. He has been involved in the creation of metadata profiles for the hydrologic community. He has worked as a strategic consultant and has been involved in the development of stand-alone tools and web applications for industry as well as for environmental systems.
Dr. Piasecki holds degrees in Civil Engineering from the University of Hannover, Germany (Diplom, 1991) and the University of Michigan (Ph.D., 1994) with a focus on Water Resources Engineering. He currently holds the rank of Associate Professor at Drexel University in the Department of Civil, Architectural, & Environmental Engineering, Philadelphia. Dr. Piasecki’s research interests center on the area of HydroInformatics and focus on the development of metadata profiles for the hydrologic community as well as the creation and representation of hydrologic processes and vocabularies using ontologies. Of special interest is the problem of semantic heterogeneity in data description and the utilization of ontologies to overcome these heterogeneities.