Semantics-based Automatic Metadata Tracking and Service Insertion in Geospatial Web Service Composition

Peng Yue1,2, Liping Di1, Wenli Yang1, Genong Yu1, Peisheng Zhao1

1 Center for Spatial Information Science and Systems (CSISS), George Mason University, Suite 620, 6301 Ivy Lane, Greenbelt, MD 20770
2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 129 Luoyu Road, Wuhan, China, 430079
{pyue, ldi, wyang1, gyu, pzhao}@gmu.edu

Abstract. Geospatial semantic Web research at CSISS grew out of many years of work on interoperability standards, standards-based geospatial Web and grid services, and intelligent geospatial knowledge systems. Our current focus is to explore semantic Web technologies for building an intelligent Web services-based geospatial knowledge system (IWS-GKS). A prototype IWS-GKS has been built that automates the data preparation process for NASA EOS (Earth Observing System) data through the automated construction and execution of service chains. This paper introduces the key technologies and implementation of the service chaining process. A landslide susceptibility assessment use case illustrates how the system is used to solve a real-world problem. Further demands on semantic Web technologies, drawn from our experience, are also discussed.

Keywords: Metadata Tracking; Geospatial; Web Service; Service Chain; Semantic Web Service

1. Introduction

Advances in sensor and platform technologies have significantly increased the capability to collect geospatial data in recent years. In the U.S., both military and civilian agencies have used remote sensing to collect a considerable amount of geospatial data. While such data are potentially valuable for the nation’s security and economic growth, they must be converted to geospatial information and knowledge before they can be useful. Normally, the process of geospatial knowledge discovery involves three consecutive steps: 1) Geoquery, locating and obtaining data from data repositories; 2) Geoassembly, assembling the data and information needed for geocomputation from data centers; and 3) Geocomputation, analyzing and modeling complex geospatial phenomena using the data and information from the geoquery [1].

Because of the multidisciplinary nature of geospatial knowledge, the data obtained from the data centers are diverse. Often, the temporal and spatial coverage, resolution, origin, format, and map projections are incompatible. As a result, even when the analysis itself is very simple, considerable time is required to obtain and assemble the data and information into a form ready for analysis. If the analyst requests datasets not readily available at data centers, the data and information systems at the data centers cannot provide the datasets on demand, even if the process to make them is very simple. Therefore, analysts have to spend a considerable amount of time ordering and processing the raw data to produce the data they need for the analysis.

The objective of this research is to develop the key technologies for an intelligent Web services-based geospatial knowledge system that will 1) fully automate the first and second steps of geospatial knowledge discovery in the distributed Web service environment, allowing analysts to focus on the creative process of hypothesis generation and knowledge synthesis rather than spending huge amounts of time on data preparation; 2) fully automate a range of knowledge discovery processes in a limited geospatial domain, using the automated construction and execution of service chains; and 3) facilitate the construction of complex geocomputation services and models.

The goal of the emerging semantic Web services is to provide mechanisms to organize information and services, allowing human queries to be correctly structured for the available application services (the model components and data), thus “automatically” determining the correct relationships between available data and services and building workflows for specific problems. From this point of view, the approach described here shares the same goal as semantic Web services, except that it deals with geospatial problems in particular. Creating the ontologies and constraint programs may appear to be harder than a human brute-force approach. Whether the complexity and diversity of geospatial data and information and the richness and complexity of all of the operations of interest can, in fact, be represented and manipulated using semantic Web technologies remains to be determined. However, these technologies show considerable promise and, if successful, will permit analysts to focus on the interpretation of the data rather than, as is often now the case, on the techniques of manipulating that data.

This paper describes the effort to automate geospatial service composition in the data preparation process. The semantics of geospatial Web services are represented using OWL-S, a widely used semantic Web service technology. A mediated RDF structure is developed to enable metadata tracking in the composition process. A composer is developed that can automatically introduce data reduction and transformation services into the service chain through ECA (Event-Condition-Action) rules. A use case of landslide susceptibility assessment is described to illustrate how this method can be applied to a real-world problem.

2. Challenges

2.1 Data Trustability/Metadata

Geospatial data and services are described by their metadata. In most cases, users rely on the metadata to determine the usability and trustability of the data and services. Therefore, providing accurate metadata for geospatial products generated by service chains is very important. In addition, geospatial services are highly data-dependent. ISO 19115:2003 [2] specifies metadata for geospatial data, which includes identification, constraints, data quality, spatial/temporal representation, and content. This metadata information should be tracked along the service chain, because different categories of geospatial Web services might have different metadata item constraints, and the processing may change related metadata items of the source data in different ways.

2.2 Data Reduction Service and Data Transformation Service

Data reduction and transformation services include data subsetting, subsampling, reformatting, reprojection, geometric correction, and radiometric correction. Such services do not change the thematic meaning of the data to which they are applied. These services are common to most geospatial analysis, data mining, and feature extraction. The rules for chaining those services to derive user products are simple and universally accepted in the geospatial domain. For example, if the available data’s spatial projection is different from the requested data’s spatial projection, a Web Coordinate Transformation Service (WCTS) can be introduced to perform the reprojection.

2.3 OWL-S

When introducing OWL-S as the vehicle for service representation, syntactical, structural, and semantic heterogeneity in the description of Web services needs to be resolved if two Web services are to be chained. Syntactical interoperability of Web services can be achieved using two common Web service standards: WSDL and SOAP. Dealing with structural and semantic heterogeneity requires a domain knowledge representation of geospatial Web services that should resolve the following issues:

1) How to map between the message schema structures of the chainable services?
2) How to make semantic input/output parameter types for OWL-S defined in one application interoperable with those defined in other applications?
3) How to ground the match between semantic types (called “DataType” in the geospatial domain) in services’ inputs/outputs to the actual mapping between message elements of chainable services?

3. Mediated RDF Structure

Different semantic data ontologies, such as the SWEET (Semantic Web for Earth and Environmental Terminology) and GCMD (Global Change Master Directory) ontologies, provide geospatial concept frameworks with their semantic type definitions. These ontologies usually represent high-level semantics and include only general structures. When implementing a real application using these ontologies, detailed XML schema structure information needs to be specified to facilitate the message schema mapping between services. Thus, we have designed mediated data structures to decouple the RDF structure from the semantic data types. The RDF structure is necessary for the service grounding description in OWL-S; it serves as the relay structure from the output XML structure of one service to the input XML structure of the next service. Message mappings between services are indirectly embedded in the mapping of each service’s message schema structure to a mediated RDF structure in the service grounding of OWL-S. Adapters can establish mappings to general ontology structures such as SWEET so that the existing OWL-S descriptions do not need to be changed. This design has two advantages: a) extensibility: the existing OWL-S descriptions can be adapted to different high-level general geospatial ontology frameworks; and b) independence: only basic concepts are defined in the application with a specific mediated structure. Thus, the adapters help to incorporate the application knowledge base into the upper-level concept world.

Fig.1. The “DataType” ontology design

The mediated RDF structure is defined by following the ISO 19115 hierarchies, which provide a well-defined metadata model. This structure can be used to chain services developed by others who use this standard model in their message schema and thus achieve a higher degree of interoperability. A light-weight RDF structure for all “DataType” entity classes, which acts as a relay structure to convey the element values of WSDL messages, is illustrated in Figure 1. For example, Figure 1 shows that the data URL and file format are identified by “linkage” and “name_MD_Format” respectively. Although we currently include only the metadata items involved in our test scenario (i.e., the landslide model), we can (and plan to) include more ISO 19115 metadata when needed with this implemented infrastructure.
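As a rough illustration of the relay role described above, the following Python sketch lifts one service's output message into a mediated structure and lowers it into the next service's input message. Only the keys "linkage" and "name_MD_Format" come from Figure 1; the message element names (coverageURL, format, sourceURL, inputFormat), the example URL, and the helper functions are hypothetical and are not taken from the prototype.

```python
# Hypothetical mappings between service message elements and mediated keys.
# Only "linkage" and "name_MD_Format" are named in the paper (Figure 1).
OUTPUT_TO_MEDIATED = {"coverageURL": "linkage", "format": "name_MD_Format"}
MEDIATED_TO_INPUT = {"linkage": "sourceURL", "name_MD_Format": "inputFormat"}


def lift(message, mapping):
    """Translate a service output message into the mediated structure."""
    return {mapping[k]: v for k, v in message.items() if k in mapping}


def lower(mediated, mapping):
    """Translate the mediated structure into the next service's input message."""
    return {mapping[k]: v for k, v in mediated.items() if k in mapping}


def relay(output_message):
    """Pass values from one service to the next via the mediated structure."""
    mediated = lift(output_message, OUTPUT_TO_MEDIATED)
    return lower(mediated, MEDIATED_TO_INPUT)


upstream_output = {"coverageURL": "http://example.org/dem.hdf", "format": "HDF"}
downstream_input = relay(upstream_output)
```

Because each service maps only to the mediated keys, neither service needs to know the other's element names, which is the decoupling the mediated structure is meant to provide.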

4. ECA Rule

Briefly, an ECA (Event-Condition-Action), or active, rule [3] operates as follows: when an event occurs, an action is executed if the corresponding condition is true. ECA rules have been applied to service composition before [4][5]. Given the common usage of data reduction and transformation services, they can be incorporated into the service chain through ECA rules. These rules do not belong to individual services; they are commonly accepted rules in geospatial service chaining. For example, consider a WCTS service. An ECA rule can be specified as follows:

Rule WCTSBinding
  Event: Checking the OWL-S precondition of a service or the metadata of the requested data
  Condition: Projection unsatisfied
  Action: Finding other input data that satisfies the precondition or chaining a WCTS
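The WCTSBinding rule can be sketched in Python as an event name, a condition function, and an action function. This is a minimal sketch under stated assumptions, not the prototype's rule engine: the EcaRule class, the fire function, the event name "precondition_check", and the projection codes are all hypothetical, and only the "chaining a WCTS" branch of the action is modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, str]


@dataclass
class EcaRule:
    """Hypothetical container for one Event-Condition-Action rule."""
    name: str
    event: str
    condition: Callable[[Metadata, Metadata], bool]
    action: Callable[[List[str]], List[str]]


def projection_unsatisfied(available: Metadata, required: Metadata) -> bool:
    # Condition: the available projection differs from the required one.
    return available.get("projection") != required.get("projection")


def chain_wcts(chain: List[str]) -> List[str]:
    # Action: append a WCTS step (the alternative of searching for other
    # input data is not modeled in this sketch).
    return chain + ["WCTS"]


WCTS_BINDING = EcaRule("WCTSBinding", "precondition_check",
                       projection_unsatisfied, chain_wcts)


def fire(rules, event, available, required, chain):
    """Run the action of every rule whose event matches and condition holds."""
    for rule in rules:
        if rule.event == event and rule.condition(available, required):
            chain = rule.action(chain)
    return chain


chain = fire([WCTS_BINDING], "precondition_check",
             {"projection": "EPSG:4326"}, {"projection": "EPSG:32610"},
             ["DEMService"])
```

When the two projections already agree, the condition is false and the chain is returned unchanged.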

5. Service Composition Process

Service composition consists of three steps:

(1) “DataType”-driven service composition: The requested data may either be readily obtainable from some data provider or need to be generated at run-time through a service chain. A geospatial catalogue (e.g., Catalogue Service for the Web, or CSW) provides information on data availability. In addition to the “DataType” constraint, further metadata filtering requirements, such as spatial and temporal extents and data format, are added to the catalogue query. If the requested data cannot be found, a service that can produce the requested data is selected. The input “DataType” of the selected service then becomes the next data query. The process continues until all input data are available for the service chain. The composition process is based on a match, either between two services such that the output of the first service provides the input of the second service, or between data and services such that the data provides the input of the service. The match is based mainly on the hierarchical relationships in the ontology and is graded as EXACT, SUBSUME, RELAXED, or FAILED.

(2) Metadata tracking: A metadata tracking component based on the ISO 19115 specification has been developed to check the metadata information. The OWL-S description for the services might not contain enough metadata information for the processed data. Only those data queried from CSW are guaranteed to have detailed metadata information; thus, it is assumed that when a service processes archived data, explicitly described metadata items can be changed while other metadata items are transferred unchanged to the data output by the service (or virtual data product). This assumption is reasonable in real geospatial Web service applications. For example, a slope computation service changes only the thematic meaning of the data, and a WCTS changes only the projection information of the data. Thus a virtual value-added metadata structure is relayed along the service chain created in the first step. Each service in the chain checks this metadata information. When preconditions are not satisfied, available ECA rules are used to determine whether additional data reduction and transformation services are needed.

(3) Global optimization: When multiple preconditions must be checked for one service, the execution efficiency depends on the order of the services (e.g., the spatial coordinate transformation service and the file reformatting service) required to satisfy the different preconditions. The spatial coordinate transformation service might also have a precondition on the file format. Thus, there can be an optimum strategy as to which service is better chained first. This strategy is affected by the real sources available and may vary. After the final executable service chain has been constructed, a cleaning process eliminates redundant services that have been included as a result of precondition checking. For example, two file reformatting service instances may be sequentially included in the chain to convert the file format from NITF to HDF to GeoTiff. Such a pair is replaced with a single NITF-to-GeoTiff reformatting service if multiple file reformatting services with different capabilities exist.
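The metadata tracking assumption in step (2) and the cleaning process in step (3) can be sketched as follows. This is an illustrative sketch only: the dictionary-based metadata, the step records, the function names track and clean, and the direct_converters set are assumptions made for the example, not the data structures of the prototype.

```python
def track(metadata, changes):
    """Return the output metadata of a service: items the service explicitly
    describes are changed, all other items are carried over unchanged."""
    tracked = dict(metadata)
    tracked.update(changes)
    return tracked


def clean(chain, direct_converters):
    """Collapse two adjacent reformatting steps into one when a service
    offering the direct conversion is known to exist."""
    cleaned = []
    for step in chain:
        prev = cleaned[-1] if cleaned else None
        mergeable = (
            prev is not None
            and prev["type"] == "reformat"
            and step["type"] == "reformat"
            and prev["to"] == step["from"]
            and (prev["from"], step["to"]) in direct_converters
        )
        if mergeable:
            cleaned[-1] = {"type": "reformat",
                           "from": prev["from"], "to": step["to"]}
        else:
            cleaned.append(step)
    return cleaned


# A WCTS changes only the projection; format and theme are carried over.
# The projection labels "A" and "B" are placeholders.
source = {"format": "NITF", "projection": "A", "theme": "DEM"}
after_wcts = track(source, {"projection": "B"})

# Two sequential reformatting steps collapse into one direct conversion.
chain = [
    {"type": "reformat", "from": "NITF", "to": "HDF"},
    {"type": "reformat", "from": "HDF", "to": "GeoTiff"},
    {"type": "slope"},
]
cleaned_chain = clean(chain, direct_converters={("NITF", "GeoTiff")})
```

If no direct NITF-to-GeoTiff converter is registered, clean leaves both reformatting steps in place, which mirrors the condition stated in step (3).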

6. Disaster Management Sample

6.1 Implementation

A prototype system has been implemented. This system operates on NASA EOS (Earth Observing System) data and uses a number of OGC (Open Geospatial Consortium) standards-compliant services, with OWL/OWL-S ontology descriptions. A grid-enabled CSW [6] is used for data query. The OWL-S API (http://www.mindswap.org/2004/owl-s/api/) is used for OWL-S parsing and grounding execution. The Jena Transitive and OWL-Micro reasoners (http://jena.sourceforge.net/inference/index.html) are selected for reasoning (the first is effective for subsumption reasoning and the second for the taxonomy classification process in the OWL-S precondition check). OWLSManager, a component for OWL-S file management that can deploy and undeploy OWL-S files into the knowledge base, has been developed. We have applied our implementation to the landslide risk use case (Figure 2) to test the effectiveness of this approach. Multiple OWL-S files are registered in the OWLSManager. The result shows that both OGC-compliant and other Web services are involved in the final service chain created by the automatic service composition process. In this case, an EXACT match cannot automatically produce landslide susceptibility data because the output of the ETM (Landsat Enhanced Thematic Mapper) NDVI (Normalized Difference Vegetation Index) service, ETM NDVI, is not exactly the same as the NDVI input required by the landslide susceptibility service. A SUBSUME match is used to achieve this goal.

We have experimented with using SWRL (Semantic Web Rule Language) support in the OWL-S preconditions. This was done through a slopeService using data having a different spatial coordinate reference system from that of the target output (so a WCTS was required) and a different file format from that of the target output (so a file reformatting service was required). The WCTS service involved also has a precondition on the file format. When the precondition is not satisfied, the built-in implementation of the ECA rule is activated. A graphical view of the final service chain created by the metadata tracking process is illustrated in Figure 3.
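The difference between an EXACT and a SUBSUME match in the NDVI example can be sketched with a toy taxonomy. This is a simplified stand-in for the ontology reasoning performed by the Jena reasoners, not their API: the PARENT table, the concept names ETM_NDVI and VegetationIndex, and the reading of SUBSUME as "the required type is an ancestor of the provided type" are assumptions made for illustration. RELAXED is left out because the text does not define it.

```python
from enum import Enum


class Match(Enum):
    EXACT = "EXACT"
    SUBSUME = "SUBSUME"
    FAILED = "FAILED"


# Toy taxonomy (child -> parent). Only the relation between ETM NDVI and
# NDVI is suggested by the use case; "VegetationIndex" is hypothetical.
PARENT = {"ETM_NDVI": "NDVI", "NDVI": "VegetationIndex"}


def ancestors(concept):
    """Yield every ancestor of a concept, nearest first."""
    while concept in PARENT:
        concept = PARENT[concept]
        yield concept


def match(provided, required):
    """Grade how well a provided "DataType" satisfies a required one."""
    if provided == required:
        return Match.EXACT
    if required in ancestors(provided):
        # Assumed reading: the required type subsumes the provided type.
        return Match.SUBSUME
    return Match.FAILED


ndvi_result = match("ETM_NDVI", "NDVI")
```

Under this reading, the ETM NDVI output satisfies the NDVI input only at the SUBSUME level, which is why an EXACT-only composition cannot complete the chain.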

Fig.2. Landslide risk use case

An online demonstration of the whole implementation is available at http://www.laits.gmu.edu/geo/nga/demo_instruction.html.

6.2 Result Analysis and Conclusion

The use case of deriving the landslide risk data product shows the applicability of the “DataType”-driven service composition for automatic geospatial knowledge discovery, and shows that the metadata tracking process with ECA rules can automate much of the data preparation process. The ontology design makes our application knowledge open to other applications located under different high-level general geospatial ontology frameworks.

The experiment on the landslide risk use case shows that a service composition based on the “DataType” match is simple and efficient. When precondition descriptions are introduced in OWL-S, the slope OWL-S use case shows that even a single precondition, supportedProjection, requires considerable time for built-in rule execution, largely because it involves precondition checking (reasoning), parsing and information extraction from the SWRL precondition, and the transfer of the spatialProjection extracted from the slope precondition to the target spatialProjection parameter of the WCTS. When multiple preconditions are involved or different metadata constraints are interwoven in one precondition, the process can be highly complex. We will pay close attention to the development of semantic Web services.

The chaining of data reduction and transformation services can be defined as either a customized rule or a built-in rule in our prototype system. Since this rule is related to a customized action, e.g., inserting a WCTS between the DEMService and the slopeService, it has to relate to a user-implemented action. To support such rules, there are two requirements: a) an interoperable rule representation mechanism; and b) action support in the framework for the specified predicate in the rule, i.e., implementation of the required action when the rule is triggered. We are in the process of testing an implementation of a plug-in framework for such rule support and are paying close attention to rule development in the semantic Web.

Fig.3. Metadata tracking proof

Acknowledgements

This work is supported by grants from the National Geospatial-Intelligence Agency (NGA)’s NURI program (HM1582-04-1-2021, PI: Prof. Liping Di), the NASA REASoN program (NNG04GE61A, PI: Prof. Liping Di), and the NASA Advanced Information System and Technology program (NAG-13409, PI: Dr. Liping Di).

References

1. Di, L., and K. McDonald, 1999. "Next Generation Data and Information Systems for Earth Sciences Research", in Proceedings of the First International Symposium on Digital Earth, Volume I. Science Press, Beijing, China. pp. 92-101.
2. ISO/TC 211, ISO 19115:2003, Geographic Information - Metadata.
3. Collet, C., Coupaye, T. and Svensen, T., 1994. NAOS: Efficient and Modular Reactive Capabilities in an Object-Oriented Database System. In Proceedings of the International Conference on Very Large Databases, pp. 132-143, Santiago, Chile, September 1994.
4. Medjahed, B., Benatallah, B., Bouguettaya, A., Elmagarmid, A. K., 2004. WebBIS: An Infrastructure For Agile Integration Of Web Services. Int. J. Cooperative Inf. Syst. 13(2): 121-158.
5. Medjahed, B., 2004. “Semantic Web Enabled Composition of Web Services”. Ph.D. Dissertation, Virginia Polytechnic Institute and State University, Falls Church, Virginia, USA, 278 pp.
6. Wei, Y., Di, L., Zhao, B., Liao, G., Chen, A., Bai, Y., Liu, Y., 2005. The Design and Implementation of a Grid-enabled Catalogue Service. 25th Anniversary IGARSS 2005, July 25-29, COEX, Seoul, Korea. pp. 4224-4227.