An integrated framework for geospatial data discovering ... - IEEE Xplore

An integrated framework for geospatial data discovering and standardized processing

Yuanzheng Shao¹, Liping Di¹, Lingjun Kang¹, Yuqi Bai²

¹Center for Spatial Information Science and Systems, George Mason University, Fairfax, Virginia, USA
²Ministry of Education Key Laboratory for Earth System Modeling, Center for Earth System Science, Tsinghua University, China
Email: {yshao3, ldi, lkang3}@gmu.edu, [email protected]

Abstract—This paper proposes a solution for discovering geospatial data from multiple distributed data centers in the Web environment. Two levels of data discovery are defined: dataset discovery and data granule discovery. To find datasets of interest, multiple dataset clearinghouses were integrated under a dataset catalog federation. Based on the returned dataset information, an integrated catalog was implemented to find data granules from remote data providers. The OGC Catalog Service for Web (CSW) specification was adopted as the interface of the catalog federation, and a mediator-wrapper architecture was proposed to handle query language translation and metadata conversion. For the returned data granules, a standardized data processing service, the OGC Web Coverage Service (WCS), was designed and implemented to process heterogeneous data through a universal interface. The proposed solution was demonstrated to be effective for finding datasets and data granules of interest from multiple sources, and for processing the acquired heterogeneous data through a standardized interface.

Keywords—geospatial data discovering; standardized processing; catalog federation

I. INTRODUCTION

With the growing number of Earth observation satellites, large amounts of geospatial data are archived in multiple data centers. Geospatial data can be used in many scientific fields, such as agriculture, land use, and climate change. Multidisciplinary research often needs geospatial data of interest held in several data centers. To discover and use such data, scientists first need to find which datasets meet the research requirements. They then search for the data granules of the dataset of interest under spatial-temporal constraints, and finally process those granules to extract the information they contain. The distributed nature of the data and the heterogeneity of data granules make it hard to use geospatial data effectively in related research. To find datasets of interest, scientists have to deal with multiple websites that serve as dataset directories. To get the data granules of a dataset from different data centers, scientists may have to register multiple user accounts and handle different query interfaces and information models (Bai et al., 2007). To access the retrieved data granules, they also have to handle heterogeneous data formats and projections (Shao et al., 2011). To address these challenges, this article proposes an integrated framework that facilitates geospatial data discovery and standardized processing. The framework contains three modules: an integrated catalog service for dataset discovery, an integrated catalog service for data granule discovery, and an integrated service for standardized processing of data granules.

The catalog services for dataset and data granule discovery adopt the mediator-wrapper architecture to handle queries and dispatch them to the corresponding connectors. The OGC Catalog Service for Web (CSW) with the ISO 19115 profile is implemented to provide a global query interface and metadata model (Voges et al., 2007). The dataset discovery catalog integrates two major dataset discovery sources: the GEOSS Component and Service Registry (CSR) and the International Directory Network (IDN). The data granule discovery catalog integrates several major data centers worldwide, such as the NASA EOS ClearingHOuse (ECHO) (ECHO, 2013) and the NOAA Comprehensive Large Array-data Stewardship System (CLASS). For data granule processing, the OGC Web Coverage Service (WCS) is employed to process the coverages of data granules, and the ISO 19115 metadata standard is implemented to describe the WCS GetCoverage response. A prototype system has been implemented on this framework. It gives clients a single point of access for discovering datasets and data granules, and for processing data granule coverages with functions such as reformatting, reprojection, and subsetting. The integrated catalog proposed in this article proved to be an effective tool for discovering geospatial data. It can also be used to establish federated catalogue systems for different communities (Shao et al., 2013). The integrated service for standardized processing of data granules also proved to be an effective tool for processing heterogeneous geospatial data.

II. CATALOG FEDERATION

Two catalog federations are defined in this paper: the dataset catalog federation and the data granule catalog federation. A dataset is a collection of data, and a data granule is the smallest aggregation of data that can be independently managed. The dataset catalog federation is used to find datasets of interest from multiple dataset sources. The data granule catalog federation is used to find data granules of interest from multiple data providers.
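The two-level discovery can be illustrated with a minimal in-memory sketch that assumes simplified record types. Datasets are first matched by keyword, and granules are then filtered by the chosen dataset identifiers and a spatial-temporal window. The `Dataset` and `Granule` classes and all function names are hypothetical. A real federation would query remote catalogs rather than Python lists.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    """A collection of data (hypothetical, simplified record)."""
    dataset_id: str
    title: str
    keywords: set[str]


@dataclass
class Granule:
    """The smallest independently managed unit of a dataset (simplified)."""
    granule_id: str
    dataset_id: str
    bbox: tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
    start: date
    end: date


def bbox_intersects(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover_datasets(datasets, keyword):
    """Level 1: find datasets whose keywords contain the search term (case-insensitive)."""
    term = keyword.lower()
    return [d for d in datasets if term in {k.lower() for k in d.keywords}]


def discover_granules(granules, dataset_ids, bbox, start, end):
    """Level 2: find granules of the chosen datasets inside a space/time window."""
    return [
        g for g in granules
        if g.dataset_id in dataset_ids
        and bbox_intersects(g.bbox, bbox)
        and g.start <= end and start <= g.end
    ]
```

Level 1 narrows the search to a set of dataset identifiers. Level 2 combines those identifiers with spatial-temporal constraints, which is the order in which a scientist queries the two federations.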

A. Mediator-Wrapper Architecture
The federated catalogue is designed with the mediator-wrapper architecture (Stoimenov et al., 2000), shown in Figure 1. The data sources archive data and distribute them in the Web environment. The wrapper on top of each data source gives the outside world a standard-compliant, universal query interface. It does this by hiding the source's heterogeneous metadata information model, query language, and access model. The mediator, at the top level, accepts queries from data users and dispatches each request to the related wrappers. It then assembles the query results from the individual data centers and returns them to the data users.

Figure 1. The mediator-wrapper architecture

The interactions among the users, mediator, and wrappers are summarized as follows: (1) the data client sends a data discovery query to the mediator through the specified query interface; (2) the mediator receives and parses the request and dispatches it to the related wrappers; (3) each wrapper receives the request, translates the global query language into its local query language, obtains the response, converts the response to the global metadata model, and returns the converted results to the mediator; (4) the mediator sends the assembled response to the data clients.

B. OGC Catalogue Service for Web
OGC CSW specifies the interfaces between clients and catalogue services by presenting abstract and implementation-specific models. Catalogue services support the ability to publish and search collections of descriptive information (metadata) for data, services, and related information objects. Metadata in catalogues represent resource characteristics that both humans and software can query, evaluate, and process further. Catalogue services are required to support the discovery of, and binding to, registered information resources within an information community (Nebert et al., 2007). The federated catalogue in this paper implements the OGC Filter specification. For the protocol binding, HTTP is supported, and both the HTTP GET and HTTP POST methods are implemented. The Harvest operation is not implemented. The following operations defined in the OGC CSW specification have been implemented: GetCapabilities, DescribeRecord, GetDomain, GetRecords, and GetRecordById.

C. Dataset Catalog Federation
In this paper, two dataset sources were investigated and integrated: the GEOSS Component and Service Registry (CSR) and the International Directory Network (IDN). CSR is similar to a library catalogue. Resource providers from governments and organizations can register their resources in CSR by providing essential details about the name, contents, Earth observation vocabulary, standards, and special arrangements of their contribution (Bai et al., 2012). The CEOS IDN is a gateway to the world of Earth science data and services. It is an international effort developed to assist researchers in locating information on available datasets and services (International Directory Network, 2013). Both IDN and CSR implement a catalog service and use the OGC CSW standard to distribute the dataset information in their archives. The dataset catalog federation also adopts the CSW standard and implements the ISO profile. Fig. 2 shows the flowchart of the dataset catalog federation. Requests sent from clients are dispatched to the remote catalogs, and the responses are integrated in the mediator and returned to the clients.
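The sketch below illustrates the HTTP GET binding. It assembles a CSW 2.0.2 GetRecords request in key-value-pair form, with a CQL_TEXT constraint on the `AnyText` and `BoundingBox` queryables and the ISO 19139 (`gmd`) output schema. The endpoint URL and the function name are hypothetical. The queryables and output schemas that a particular member catalogue accepts should be checked against its GetCapabilities response.

```python
from urllib.parse import urlencode


def build_getrecords_url(endpoint, any_text, bbox=None, max_records=10):
    """Build a CSW 2.0.2 GetRecords KVP (HTTP GET) URL.

    bbox is (min_lon, min_lat, max_lon, max_lat). A single quote inside a
    CQL string literal is escaped by doubling it.
    """
    text = any_text.replace("'", "''")
    clauses = [f"AnyText LIKE '%{text}%'"]
    if bbox is not None:
        min_lon, min_lat, max_lon, max_lat = bbox
        clauses.append(
            f"BBOX(BoundingBox, {min_lon}, {min_lat}, {max_lon}, {max_lat})"
        )
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "gmd:MD_Metadata",
        "namespace": "xmlns(gmd=http://www.isotc211.org/2005/gmd)",
        "outputSchema": "http://www.isotc211.org/2005/gmd",  # ISO profile records
        "elementSetName": "summary",
        "resultType": "results",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": " AND ".join(clauses),
    }
    # urlencode percent-encodes quotes, parentheses and '%', and turns spaces into '+'.
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
url = build_getrecords_url(
    "https://catalog.example.org/csw", "land cover", bbox=(-80.0, 35.0, -75.0, 40.0)
)
```

The same request could also be sent with HTTP POST as an XML GetRecords document. The KVP form is shown here only because it fits on one line.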

Figure 2. The flowchart of dataset catalog federation
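The mediator-wrapper interaction, applied to the dataset catalog federation of Fig. 2, can be sketched in a few lines. This is a minimal sketch that makes several assumptions:

- Each wrapper is modeled as a plain Python function that takes the global query and returns records already converted to the global metadata model.
- Records are deduplicated by a hypothetical `identifier` key.
- A failing member catalog is reported instead of failing the whole federated query.

Remote calls, query translation, and the CSW response encoding are out of scope.

```python
from concurrent.futures import ThreadPoolExecutor


def mediate(query, wrappers):
    """Dispatch a global query to every wrapper in parallel and merge the results.

    wrappers maps a member-catalog name to a callable(query) -> list of records
    (dicts in the global metadata model, each with an "identifier" key).
    Returns (merged_records, errors_by_catalog).
    """
    merged, seen, errors = [], set(), {}
    with ThreadPoolExecutor(max_workers=max(1, len(wrappers))) as pool:
        # Step (2): the mediator dispatches the request to each related wrapper.
        futures = {name: pool.submit(wrap, query) for name, wrap in wrappers.items()}
        # Step (3) runs inside each wrapper; the mediator collects results in submission order.
        for name, future in futures.items():
            try:
                records = future.result()
            except Exception as exc:  # one unavailable catalog should not sink the query
                errors[name] = repr(exc)
                continue
            for record in records:
                if record["identifier"] not in seen:
                    seen.add(record["identifier"])
                    merged.append({**record, "source": name})
    # Step (4): the assembled result set is returned to the client.
    return merged, errors
```

Deduplication matters when two member catalogs, such as CSR and IDN, may describe the same dataset. Here the first copy collected wins. A production federation might choose a different merge policy.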

D. Data Granule Federation
Several major data centers worldwide archive massive amounts of geospatial data. Examples include:

- the NASA EOS ClearingHOuse (ECHO)
- the NOAA Comprehensive Large Array-data Stewardship System (CLASS)
- the U.S. Geological Survey (USGS) Landsat Catalog System
- the Brazilian National Institute for Space Research (INPE) catalogue
- the China Academy of Opto-Electronics (AOE) catalogue

All of these data centers provide portals or services that let data users discover data of interest. An integrated catalogue needs a global query interface. In this paper, the OGC Catalog Service for Web, version 2.0.2, is selected for that role. The CSW specification defines an OGC common catalogue query language that all compliant OpenGIS catalogue services must support. This query language supports nested Boolean queries, text comparison operations, and temporal and spatial operators. The federated catalogue in this paper implements the OGC Filter specification. For the protocol binding, HTTP is supported, and both the HTTP GET and HTTP POST methods are implemented.

Geographic information – Metadata (ISO 19115) and its extension for imagery and gridded data (ISO 19115-2) are selected as the global metadata model for the federated catalogue (ISO 19115, 2003). To support query interface translation, a mapping table is established between the OGC CSW common query language and each native query language. At the wrapper level, every query dispatched from the mediator is translated into the native query interface. For example, a query about an ECHO dataset is translated into the IIMSAQL language (ECHO, 2013). IIMSAQL defines elements for specifying the collection identifier, spatial extent, temporal range, and data center identifier. It also covers many other features of the data center and collection, such as percentage of cloud cover. Metadata conversion works the same way: mapping tables are also created between ISO 19115 and the metadata models used in the legacy catalogues. The main issue to solve during the conversion is the asymmetry between different metadata models, whose elements do not map one-to-one onto each other.
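The mapping-table approach at the wrapper level can be sketched as two dictionary lookups. One maps global CSW queryables to native query fields. The other maps native response fields to ISO 19115 elements. The native field names below are invented stand-ins and do not reproduce IIMSAQL or any real legacy schema, and the ISO element paths are simplified. Unmatched keys are returned instead of being silently dropped. This is one way to surface the metadata asymmetry that conversion has to deal with.

```python
# Global CSW queryable -> native field of a hypothetical legacy catalogue.
QUERY_MAP = {
    "Identifier": "collection_id",
    "BoundingBox": "spatial_extent",
    "TempExtent_begin": "start_time",
    "TempExtent_end": "stop_time",
}

# Native response field -> simplified ISO 19115 / 19115-2 element path.
METADATA_MAP = {
    "collection_id": "fileIdentifier",
    "title": "identificationInfo/citation/title",
    "cloud_cover": "contentInfo/MD_ImageDescription/cloudCoverPercentage",
}


def translate_query(global_query):
    """Rename global query terms to native ones; report terms with no mapping."""
    native, unsupported = {}, []
    for term, value in global_query.items():
        if term in QUERY_MAP:
            native[QUERY_MAP[term]] = value
        else:
            unsupported.append(term)
    return native, unsupported


def convert_record(native_record):
    """Map a native record to ISO element paths; keep unmapped fields aside."""
    iso, unmapped = {}, {}
    for field, value in native_record.items():
        if field in METADATA_MAP:
            iso[METADATA_MAP[field]] = value
        else:
            unmapped[field] = value
    return iso, unmapped
```

A wrapper might reject a query that contains unsupported terms or evaluate them locally after retrieval. The sketch only reports them and leaves that policy open.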

Figure 3. The structure of data granule catalog federation

Figure 3 illustrates the framework of the data granule catalog federation. The universal query is dispatched to a connector. The connector translates the query into the native query language, executes it, and converts the native response to the global metadata model.

III. STANDARDIZED PROCESSING

Different data granules generally have different properties, such as projection and format, which causes heterogeneity at the data access level. The data granule catalog federation provides access only to the original archived data and offers no customized data-related services. To improve the interoperability of geospatial raster data access, the existing Geospatial Data Abstraction Library (GDAL) model is extended to support metadata and lineage information. A framework for geospatial data customization is built, and OGC WCS is implemented under this framework.

A. Web Coverage Service
The OGC Web Coverage Service (WCS) supports electronic retrieval of geospatial data as "coverages", that is, digital geospatial information representing space/time-varying phenomena (Baumann, 2010). The WCS 2.0 specification was released in October 2010. It defines three operations: GetCapabilities, DescribeCoverage, and GetCoverage.

- A GetCapabilities operation allows a WCS client to retrieve service metadata and a summary of the coverages offered by a WCS server.
- A DescribeCoverage request submits a list of coverage identifiers and returns, for each identifier, a description of the coverage.
- A GetCoverage request prompts a WCS server to process a particular coverage selected from the service's offering and return a derived coverage.

With WCS implemented, the client can perform reformatting, reprojection, and subsetting through standardized parameters in the request. The heterogeneous data granules are processed in the back end with GDAL and, where necessary, other native libraries. Figure 4 illustrates the process flowchart of the GetCoverage operation.

Figure 4. Process flowchart of WCS GetCoverage operation

B. Metadata of WCS Response
Different data granules may use different metadata depending on their format. For example, GeoTIFF may use TIFF tags (key-value pairs) to convey information about the imagery, and the NetCDF format may apply the Climate and Forecast (CF) metadata conventions. Data provenance, also called data lineage, records the derivation history of a data product (Di et al., 2013). In the Earth science domain, geospatial data provenance is important because it plays a significant role in data quality and usability evaluation, data trail auditing, and workflow replication. Geospatial provenance metadata is usually generated as part of executing a geoprocessing workflow. To describe WCS results with a universal metadata model and keep the data lineage information, this paper adopted the ISO 19115 metadata model.
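A single GetCoverage request can combine the three customizations named in the WCS description: subsetting, reformatting, and reprojection. In the WCS 2.0 key-value-pair encoding, such a request can be built roughly as follows. The endpoint, coverage identifier, and axis labels are hypothetical; real axis labels come from a coverage's DescribeCoverage response. The `outputCrs` parameter belongs to the WCS 2.0 CRS extension, so the server must advertise that extension for reprojection to work. Version 2.0.1 is the corrigendum of WCS 2.0.

```python
from urllib.parse import urlencode


def build_getcoverage_url(endpoint, coverage_id, subsets=None,
                          fmt="image/tiff", output_crs=None):
    """Build a WCS 2.0.1 GetCoverage KVP URL.

    subsets maps an axis label to a (low, high) trim interval; each entry
    becomes a repeated `subset` parameter such as subset=Lat(35,40).
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    for axis, (low, high) in (subsets or {}).items():
        params.append(("subset", f"{axis}({low},{high})"))  # subsetting (trim)
    params.append(("format", fmt))                          # reformatting
    if output_crs is not None:
        params.append(("outputCrs", output_crs))            # reprojection (CRS extension)
    # A list of pairs lets urlencode emit the repeated `subset` key.
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and coverage; EPSG:32617 is WGS 84 / UTM zone 17N.
url = build_getcoverage_url(
    "https://wcs.example.org/wcs", "granule_001",
    subsets={"Lat": (35, 40), "Long": (-80, -75)},
    fmt="image/tiff",
    output_crs="http://www.opengis.net/def/crs/EPSG/0/32617",
)
```

Each customization is a separate, standardized parameter. That is why one client call pattern can reach granules whose native formats and projections differ, leaving the server-side processing library to reconcile them.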

IV. CONCLUSION

This paper proposes an integrated framework for discovering geospatial data from distributed geospatial catalogs and for accessing data from heterogeneous geospatial data granules. The proposed solution and its implementation demonstrate that:

(1) the mediator-wrapper architecture is suitable for creating a federated catalog system that unifies multiple heterogeneous member catalog systems;
(2) OGC CSW can be used as the global query interface in a catalogue federation;
(3) ISO 19115 and ISO 19115-2 can be used effectively as the standard metadata information model for such a federated geospatial catalog system; and
(4) OGC WCS can be used to access data through a flexible, customizable interface.

ACKNOWLEDGMENT
The integrated catalog part is supported by a grant from the National Oceanic and Atmospheric Administration (NOAA). The WCS service part is co-sponsored by a grant from the National Aeronautics and Space Administration (NASA) in the GeoBrain project and a grant from the Open Geospatial Consortium (OGC) in the OWS-9 phase.

REFERENCES

[1] Y. Bai, L. Di, A. Chen, Y. Liu, and Y. Wei, "Towards a Geospatial Catalogue Federation Service," Photogrammetric Engineering & Remote Sensing, 2007, pp. 699-708.
[2] Y. Shao, L. Di, J. Gong, and P. Zhao, "GIS in the Cloud: Implementing a Web Coverage Service on Amazon Cloud Computing Platform," Electrical Engineering and Control, 2011, pp. 289-295.
[3] Y. Bai, L. Di, D. Nebert, A. Chen, Y. Wei, X. Cheng, Y. Shao, D. Shen, R. Shrestha, and H. Wang, "GEOSS Component and Service Registry: Design, Implementation and Lessons Learned," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2012, pp. 1678-1686.
[4] D. Nebert, A. Whiteside, and P. Vretanos, OpenGIS Catalogue Service Implementation Specification (OGC 07-006r1), 2007, URL: http://www.opengeospatial.org/standards/cat (last accessed: 10 August 2013).
[5] U. Voges and K. Senkler, OpenGIS Catalogue Services Specification 2.0.2 - ISO Metadata Application Profile (OGC 07-045), 2007, URL: http://www.opengeospatial.org/standards/cat (last accessed: 10 August 2013).
[6] International Organization for Standardization, ISO 19115:2003: Geographic Information – Metadata, URL: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=26020 (last accessed: 10 August 2013).
[7] CEOS International Directory Network, 2013, URL: http://idn.ceos.org/ (last accessed: 10 August 2013).
[8] ECHO, "EOS Metadata Clearinghouse (ECHO)," 2013, URL: https://earthdata.nasa.gov/echo (last accessed: 10 August 2013).
[9] Y. Shao, L. Di, Y. Bai, L. Kang, H. Wang, and C. Yang, "Federated Catalogue for Discovering Earth Observation Data," Photogrammetrie-Fernerkundung-Geoinformation, 2013, pp. 43-52.
[10] L. Di, G. Yu, Y. Shao, Y. Bai, M. Deng, and K. McDonald, "Persistent WCS and CSW services of GOES data for GEOSS," 2010 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2010, pp. 1699-1702.
[11] L. Di, Y. Shao, and L. Kang, "Implementation of Geospatial Data Provenance in a Web Service Workflow Environment With ISO 19115 and ISO 19115-2 Lineage Model," IEEE Transactions on Geoscience and Remote Sensing, 2013, pp. 1-8.
[12] L. Stoimenov, S. Djordjevic-Kajan, and D. Stojanovic, "Integration of GIS data sources over the Internet using mediator and wrapper technology," 10th Mediterranean Electrotechnical Conference (MELECON 2000), 2000, pp. 334-336.
[13] P. Baumann, OGC® WCS 2.0 Interface Standard - Core (OGC 09-110r3), URL: http://www.opengeospatial.org/standards/wcs (last accessed: 10 August 2013).
