Grid-enabled Standard-compliant Open Computing Environment for Earth Science Exploration and Applications

Aijun Chen, Liping Di, Yaxing Wei, Yuqi Bai, Yang Liu
Center for Spatial Information Science and Systems (CSISS)
George Mason University
6301 Ivy Ln. Ste. 620, Greenbelt, MD 20770 USA

Abstract -- Seamless access to large quantities of distributed, heterogeneous geospatial data and information remains a critical issue for Earth science exploration and applications. Using Grid computing and geospatial standards technologies, we have devised an open, reusable, optimized, secure, and service-oriented computing environment for semi-automatically archiving, sharing, and serving geospatial data/information in the form users require. The environment uses a Grid service-oriented architecture based on the Globus Toolkit. Its two most important components are the Data/Information Retrieval Mediator (DIRM) and the Data/Information Serving Mediator (DISM). DIRM is based on our refined geospatial metadata for describing geospatial data/information and services, and on the Globus replica manager service for managing replicas of massive data. DISM consists of Grid-enabled geospatial standard services that provide on-the-fly, user-customizable data. This environment offers an open, secure, ubiquitous, and standard-compliant geospatial computing platform that gives Earth science communities access to massive geospatial data for high-performance, data- and computing-intensive scientific research and applications.

I. INTRODUCTION

On the one hand, many academic institutes, government sectors, and commercial corporations now engage in collaborative, complex, and secure application analysis and professional problem solving that require large quantities of distributed, heterogeneous geospatial data, services, computing capabilities, and other facilities around the world.
On the other hand, NASA's EOS platforms alone are expected to generate at least a terabyte of data a day, and probably another terabyte of products. Other agencies, such as NOAA and USGS, also produce large data volumes. That is a great deal of data to archive and serve routinely for global scientific research, and data exchanged among many different organizations usually must be kept secure. This paper addresses how to use existing idle computing resources to securely archive and serve very large quantities of geospatial data to Earth science communities through current Grid and geospatial standards technologies.

Grid computing, an important new e-science information technology, provides an approach to these formidable challenges through the complete integration of heterogeneous computing systems and data resources, with the ultimate aim of providing a global computing space with global resources. Through the Grid, geographically and organizationally dispersed resources, such as CPUs, storage systems, data and related services, and human collaborators, are brought together to securely provide an advanced distributed, data-intensive, and high-performance computing environment to users in one or more Virtual Organizations (VOs), which rely on Certificate Authority (CA) and authentication security policies [1]. The Globus Toolkit is currently the representative implementation and de facto standard of Grid technology.

In the geoscience field, ISO/TC211, FGDC, NASA, and others have proposed geospatial metadata standards that standardize the description of geospatial data to facilitate data sharing [2-4]. The Open Geospatial Consortium (OGC) is likewise devoted to sharing geospatial resources, such as data and services, across different geospatial information systems. It initiated the OGC Web Services (OWS) to enable seamless integration of a variety of online geoprocessing and location services, such as the Web Coverage Service (WCS), Web Map Service (WMS), and Catalog Service for Web (CS/W). We have previously implemented WCS, WMS, and CS/W [5][6].

In the following sections, we first introduce our Grid-service-oriented architecture and detail DIRM and DISM to explain how we manage, retrieve, and serve geospatial data/information. Then, building on this architecture, we describe two kinds of user-friendly interfaces designed and implemented to give users easy access to geospatial data/information and computing resources through the environment. Finally, we describe the successful application of the environment to a NASA-sponsored research project.

II. GRID-SERVICE-ORIENTED SYSTEM ARCHITECTURE

The environment aims to achieve Earth science resource sharing and collaborative geospatial problem solving in dynamic, multi-institutional Virtual Organizations (VOs), based on Grid technologies and Open Geospatial Consortium (OGC) standard interfaces. Its key components are DIRM and DISM, both of which consist of Grid services.
U.S. Government work not protected by U.S. copyright

Any user with a Grid environment can deploy and run the DIRM and DISM services. Any authorized, qualified user can access Earth science resources across the authorized Virtual Organization (VO). Members of the general public, such as scientists, government officials, and policy makers, can then use these services transparently through OGC standard-compliant interfaces.

A. Data/Information Retrieval Mediator (DIRM)

DIRM consists of the Grid-enabled Catalogue Service Federation (GCSF) and a series of Grid-enabled Real Catalogue Services (GRCSes). GCSF and all GRCSes are Grid services, and all queries for data and services go to GCSF. Any organization can join its catalog services to our GCSF to provide Grid-enabled catalog serving for its specific data. User requests received by GCSF query the logical or physical datasets and/or service instances from which the real data can be produced. Queries against logical data involve the Globus Replica Location Service (RLS) and Index Service (MDS), which are used to select the physical data and services that offer the best performance [7][8]. Any user in a VO, such as an individual, government agency, university, or institute, can transparently obtain from DIRM information about what data are served and where and how to get them by submitting general or specific requirements. Users can then transparently obtain on-demand data/information from the Data/Information Serving Mediator (DISM), which receives their further requirements based on the information obtained from DIRM. Users can also construct new capabilities transparently and dynamically. DIRM and DISM are each suites of Grid services within which collaboration and interaction occur (see Figure 1).
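The two lookups DIRM performs, a federated catalog query followed by resolution of a logical dataset to a physical replica, can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the GCSF implementation and not the Globus RLS or MDS APIs: the class `GRCS`, the functions `gcsf_query` and `resolve`, and the plain dictionaries standing in for RLS replica mappings and Index Service load figures are all hypothetical. It also assumes that "best" means least loaded, and that a replica already on a serving host is preferred over staging data elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GRCS:
    """Hypothetical stand-in for one member catalog (a GRCS) of the federation."""
    name: str
    # logical dataset name -> metadata keywords describing it
    records: dict[str, set[str]] = field(default_factory=dict)

    def query(self, keyword: str) -> set[str]:
        return {ds for ds, kws in self.records.items() if keyword in kws}


def gcsf_query(members: list[GRCS], keyword: str) -> set[str]:
    """GCSF role: send one query to every member catalog and merge the hits."""
    hits: set[str] = set()
    for catalog in members:
        hits |= catalog.query(keyword)
    return hits


def _load(host_load: dict[str, float], host: str) -> float:
    # Hosts with no reported load are treated as the least attractive choice.
    return host_load.get(host, float("inf"))


def resolve(logical: str,
            replicas: dict[str, list[str]],
            host_load: dict[str, float],
            serving_hosts: set[str]) -> tuple[str, str | None]:
    """Return (host that serves the data, host to copy it from, or None).

    replicas      -- RLS-like map (assumed): logical name -> hosts holding a copy
    host_load     -- Index-Service-like map (assumed): host -> current load
    serving_hosts -- hosts that run a data-serving service such as a GWCS
    """
    holders = replicas.get(logical, [])
    if not holders:
        raise LookupError(f"no replica registered for {logical!r}")
    # Prefer a replica that already sits on a serving host: no transfer needed.
    local = [h for h in holders if h in serving_hosts]
    if local:
        return min(local, key=lambda h: _load(host_load, h)), None
    if not serving_hosts:
        raise LookupError("no serving host available")
    # Otherwise stage the least-loaded replica onto the least-loaded serving
    # host, the job the text assigns to the GridFTP-based transfer service.
    target = min(serving_hosts, key=lambda h: _load(host_load, h))
    source = min(holders, key=lambda h: _load(host_load, h))
    return target, source


if __name__ == "__main__":
    catalogs = [GRCS("laits", {"modis_lst_2005": {"modis", "temperature"}}),
                GRCS("arc", {"landsat_tm_2004": {"landsat"}})]
    print(gcsf_query(catalogs, "modis"))  # {'modis_lst_2005'}
    replicas = {"modis_lst_2005": ["store1", "node2"]}
    load = {"store1": 0.1, "node2": 0.7, "node3": 0.2}
    print(resolve("modis_lst_2005", replicas, load, {"node2", "node3"}))  # ('node2', None)
    print(resolve("modis_lst_2005", replicas, load, {"node3"}))  # ('node3', 'store1')
```

Preferring a co-located replica avoids a transfer even when a less-loaded serving host exists (node3 in the first call); a real deployment might instead weigh transfer cost against host load.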
[Figure 1. Grid-service-oriented system architecture. Components shown: users (individual, government, university, institute, etc.); Data/Information Retrieval Mediator; Data/Information Serving Mediator; Grid-based and other catalogue services; Grid-based and other data/information services; geospatial Web services; Grid computing infrastructure (Globus Toolkit, Condor, etc.); supercomputing centers, clusters, servers, PCs, and massive storage systems/networks.]

B. Data/Information Serving Mediator (DISM)

DISM consists of the Control Grid Service (CGS) and a series of Grid services, such as the Grid-enabled WCS (GWCS) and Grid-enabled WMS (GWMS), that serve geospatial data/information in different ways and formats. CGS receives data requests from user interfaces; each request is formed from the user's DIRM query results and interaction with the user interfaces. When CGS receives a request for real (physical) data, it forwards the request directly to the data-serving services. When it instead receives query results that refer to a logical dataset, it calls the Replica Location Service (RLS), which reports where replicas of the dataset exist and how many are available. The Globus Index Service provides on-the-fly information about idle computing resources, which is used to select the best service to serve the selected data. The GridFTP-based Data Transfer Service (DTS) transfers data from one site to another to facilitate processing or serving, because some data sites only store data and host no serving service, and some sites cannot serve data directly to general public users.

III. USER ACCESS INTERFACES

A. OGC standard interfaces
This interface is based mainly on the OGC Web Data Service specifications, including the Web Coverage Service (WCS), Web Map Service (WMS), and Catalogue Service for Web (CS/W) [9-11]. These specifications allow seamless access to geospatial data in a distributed environment. We have developed WCS, WMS, and CS/W with standard interfaces for registering, querying, and serving geospatial data. Some more specialized geospatial services, such as the Web Image Classification Service (WICS) and Web Coordinate Transformation Service (WCTS), are under development and testing. The OGC Web Services standards define request encodings in Key-Value Pair (KVP) and XML document formats. Based on these formats, we support web-based GET, POST, and SOAP methods for sending user requests and returning results: KVP is used with GET, and XML with POST and SOAP. Query results from the services are returned in XML format [9][11]. Most importantly, we have implemented Grid-enabled OGC Web services. Any user can therefore access these services through the standard OGC Web service interfaces and take advantage of Grid computing technologies without knowing anything about the Grid underlying the environment. These interfaces are open to any user who follows the OGC Web service standards, and such users can
access those services by first retrieving their WSDL descriptions.

B. Standard Grid service interfaces

The Globus Toolkit has become the de facto software standard for Grid technology, and we use it as our main Grid software to establish the Grid environment. We have designed, developed, and deployed Open Grid Services Architecture (OGSA) and Web Services Resource Framework (WSRF) compliant OGC Web services in the Grid environment. These services provide the standard Grid service interface and can securely invoke, or be invoked by, any other Grid service from any Grid system that belongs to the same Virtual Organization as ours or that is authenticated to or by our Grid system. At the Grid service level, we have also developed other Grid services for mediating user requests, optimizing service performance, and transferring geospatial data across the VO. All Grid services are secured through Grid authorization and authentication mechanisms but can be accessed transparently by any authorized and authenticated user. The following geospatial Grid services have been designed and implemented in our system in accordance with the OGSA specification and OGC Web Services standards.

The Grid-enabled Catalog Service for Web (GCS/W), a GRCS, securely provides Grid-based archiving, publishing, managing, and querying of geospatial data and services, and furnishes transparent access to replica data and related services in the Grid environment. The information model of GCS/W is based on that of OGC CS/W [10]. Potentially rich, detailed sets of geospatial data/information can be registered with GCS/W and served by GWCS/GWMS in the Grid environment.

The Grid-enabled Web Coverage Service (GWCS), a Grid service, provides the OGC standard interfaces getCapabilities, getCoverage, and describeCoverage. Any other Grid service or OGC standards-compliant client can access and interact with GWCS.
GWCS provides access to all geospatial data/information that is registered with GCS/W and managed by the Grid, in forms useful for client-side rendering, value-added processing, input to scientific models, and other geospatial processing systems [11].

The Grid-enabled Web Map Service (GWMS) responds to user requests by dynamically producing static maps from spatially referenced geographic information, filtering and portraying geospatial data in the Grid environment. It provides Grid-enabled versions of the getCapabilities and getMap standard interfaces, and it can securely access any geospatial data to produce a map in the format the user requires. Through these two kinds of interfaces, therefore, not only Grid users with a Grid environment but also OGC users can access Grid-enabled geospatial data/information and computing resources.
IV. ENVIRONMENT IMPLEMENTATION AND ITS APPLICATION

The service-oriented architecture is used to construct the environment in two ways: as OGC Web services and as Grid services. We implemented the environment with the Globus Toolkit over our VO, which consists of three machines running three different operating systems: Linux, Sun Solaris, and Apple Mac OS. CS/W, WCS, and WMS are implemented and deployed on each machine for registering, retrieving, and serving data using the data and computing resources located there; these services provide the OGC Web services interfaces. GCS/W, GWCS, and GWMS are implemented and deployed on the three machines with the Globus Toolkit as the Grid services infrastructure [12].

We have applied our research results to the NASA-sponsored research program “Integration of OGC and Grid technologies for Earth Science Modeling and Applications”, which uses Grid technology to strengthen OGC technology for serving NASA data to the geoscience community. Because the project has three partners, we established a project-wide "big" Virtual Organization (VO) that now consists of the GMU LAITS VO, the NASA Ames Research Center (ARC) VO, and the LLNL ESG VO. Each of these has its own Certificate Authority (CA), which issues host, user, and service certificates to qualified users across the big VO. We set up authorization and authentication between every pair of VOs and every pair of machines in the big VO, so any Grid service in the big VO can invoke or be invoked by other Grid services of this VO or by any authenticated Grid service from another authorized VO. We implemented all OGC Web services interfaces to support standard-compliant geospatial operations for archiving, sharing, and serving geospatial data and services, and made these operations work as Grid services in the Grid environment, achieving Grid-enabled OGC services for the Earth science community.
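The pairwise trust set up across the three VOs comes down to an authorization decision at each site: accept a caller only if its certificate was issued by a trusted CA and its subject is mapped to a local account. A minimal sketch of that decision follows, assuming the certificate chain has already been cryptographically verified (which Globus GSI does and this code does not). The `Credential` type, the `authorize` function, and the CA names are hypothetical; `parse_gridmap` reads a simplified form of the Globus grid-mapfile, in which each line holds a quoted certificate subject followed by comma-separated local account names.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    subject_dn: str  # certificate subject, e.g. "/O=Grid/OU=ARC/CN=John Roe"
    issuer_ca: str   # CA that signed it; chain verification assumed done already


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Parse simplified grid-mapfile lines: "<subject DN>" account[,account...]."""
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = shlex.split(line)  # honours the quotes around the DN
        if len(tokens) < 2:
            raise ValueError(f"malformed grid-mapfile line: {raw!r}")
        mapping[tokens[0]] = tokens[1].split(",")
    return mapping


def authorize(cred: Credential, trusted_cas: set[str],
              gridmap: dict[str, list[str]]) -> str | None:
    """Return the local account to run as, or None if the caller is rejected."""
    if cred.issuer_ca not in trusted_cas:
        return None  # certificate from a CA outside the federation's trust set
    accounts = gridmap.get(cred.subject_dn)
    # Unmapped subjects are rejected; otherwise the first listed account is
    # taken here as the default.
    return accounts[0] if accounts else None


if __name__ == "__main__":
    # Hypothetical trust set: one CA per member VO of the project-wide VO.
    trusted = {"LAITS CA", "ARC CA", "ESG CA"}
    gridmap = parse_gridmap('"/O=Grid/OU=ARC/CN=John Roe" jroe,guest\n')
    print(authorize(Credential("/O=Grid/OU=ARC/CN=John Roe", "ARC CA"),
                    trusted, gridmap))  # jroe
```

Keeping the CA check separate from the subject mapping mirrors the two steps described above: the CAs decide who is authenticated within the big VO, while each machine's mapping decides who is authorized to run there.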
The environment is now running and serves almost 20 TB of NASA HDF-EOS data, Landsat GeoTIFF data from LAITS/GMU, and some simulated climate data from LLNL ESG, which can potentially provide more than 40 TB of climate simulation data.

V. CONCLUSION AND FUTURE WORK

An open, secure, SOA-based geospatial computing environment has been built on Grid computing technology. The Data/Information Retrieval Mediator and Data/Information Serving Mediator act as intelligent mediators that coordinate the Grid services in the environment for retrieving and serving data/information and make the best use of computing resources across the Virtual Organization. GCSF, GRCS, CGS, GWCS, GWMS, RLS, MDS, and DTS are developed or used to support DIRM and DISM. The environment provides two kinds of standard interfaces: OGC standard-compliant Web service interfaces and the standard Grid service interface. Both make the best use of the data/information and computing resources across the VO based on Grid computing technology. We have applied this environment to our
NASA-sponsored project, where it has contributed successfully to our research.

We have completed part of the research on virtually building application abstract models and producing geospatial data products based on geospatial experts' knowledge and workflow technologies. The information model of GCS/W and of other information and directory services must be extended so that they provide enough information to construct the abstract workflow, the concrete workflow, and the related parameters that application models need when an abstract workflow is executed. A Grid-enabled workflow engine for accepting and processing Grid services in this environment is under research.

ACKNOWLEDGEMENT

The project “Integration of OGC and Grid technologies for Earth Science Modeling and Applications” is supported by the NASA Earth Science Technology Office (ESTO) (PI: Liping Di).

REFERENCES

[1] The Globus Alliance. The Globus Toolkit. http://www.globus.org/. Accessed Apr. 15, 2006.
[2] ISO. Geographic Information – Metadata, ISO 19115:2003. http://www.isotc211.org/protdoc/FDIS/ISO_FDIS_19115_(E).pdf, Dec. 1, 2002.
[3] ISO draft. Geographic Information – Metadata – Part 2: Extensions for imagery and gridded data, ISO/WD 191152.2. http://www.isotc211.org/protdoc/211n1337/211n1337.pdf, Oct. 13, 2003.
[4] FGDC. Content Standard for Digital Geospatial Metadata: Extensions for Remote Sensing Metadata. http://www.fgdc.gov/standards/status/csdgm_rs_ex.html, Dec. 2002.
[5] Chen, A., Di, L., Wei, Y., Liu, Y., Bai, Y., Hu, C., and Mehrotra, P. Grid Enabled Geospatial Catalogue Web Services. American Society for Photogrammetry and Remote Sensing 2005, Baltimore, MD, USA, Mar. 7-11, 2005.
[6] Di, L., Yang, W., Chen, A., Liu, Y., and Wei, Y. The Development of a Geospatial Grid by Integrating OGC Web Services with Globus-based Grid Technology. GridWorld™/GGF15, Boston, MA, USA, Oct. 3-6.
[7] The Globus Alliance. RLS Documentation. http://www-unix.globus.org/toolkit/docs/3.2/rls/index.html, May 5, 2004.
[8] The Globus Alliance. MDS 2.2 User Guide. USC/ISI, Mar. 10, 2003.
[9] Open Geospatial Consortium (OGC). OpenGIS Catalogue Services – Web application profile (CSW). Open Geospatial Consortium Inc., Feb. 5, 2004.
[10] Open Geospatial Consortium (OGC). OpenGIS Catalogue Service Specifications 2.0 – ISO19115/ISO19119 Application Profile for CSW 2.0. Open Geospatial Consortium Inc., Jun. 15, 2004.
[11] Open Geospatial Consortium (OGC). Web Coverage Service (WCS), Version 1.0.0. Open Geospatial Consortium Inc., Aug. 27, 2003.
[12] Di, L., Chen, A., Yang, W., and Zhao, P. The integration of Grid technology with OGC Web Services (OWS) in NWGISS for NASA EOS Data. HPDC12 & GGF8, Seattle, USA, Jun. 24-27, 2003.