Spatial Grid Services for adaptive spatial query optimization

Bingbo Gao∗ (a, b), Chuanjie Xie (a), Wentao Sheng (a)
(a) Key Laboratory of Resources and Environment Information System, Institute of Geographical Sciences & Natural Resources Research, CAS, Beijing; (b) College of Resource and Environment, Graduate University of Chinese Academy of Sciences, Beijing

ABSTRACT

Spatial information sharing and integration has become an important issue in Geographical Information Science (GIS). Web Service technologies provide an easy and standard way to share spatial resources over the network, and grid technologies, which aim at sharing resources such as data, storage and computational power, can deepen that sharing. However, the dynamic nature of the grid complicates spatial query optimization, a problem that is especially acute in the GIS domain because spatial operations are both CPU intensive and data intensive. To address this problem, a new grid framework is employed to provide standard spatial services that also manage their state information and report it to the coordinator, which is responsible for distributed spatial query optimization.

Keywords: spatial query optimization, spatial data grid service, spatial computing grid service, WSRF, WFS, WPS

1. INTRODUCTION

Spatial information is widely used in many areas of social activity, so integrating distributed spatial information resources is becoming increasingly important for GIS. Building on Web Service technologies, the ISO Technical Committee on Geographic Information/Geomatics (ISO/TC211) and the Open Geospatial Consortium (OGC), both of which work to promote spatial information sharing and interoperation, have developed a series of standards for sharing and interoperating spatial information over the network, namely the OGC Web Services (OWS). Meanwhile the grid, which "supports the sharing and coordinated use of diverse resources in dynamic, distributed 'virtual organizations' (VOs)" (Ref. 10), is now being harnessed to facilitate spatial information sharing and integration. Several grid systems for spatial information sharing and application have emerged, for instance the Earth System Grid of the U.S. Department of Energy (DOE) and SpaceGrid of the European Space Agency (ESA). The Open Grid Service Architecture (OGSA) makes it feasible to integrate all kinds of OWS into the grid to provide abundant dynamic and stateful spatial services, and to combine these spatial services into single, simple but powerful spatial services. For example, the NASA Earth Science Technology Office (ESTO) has funded an investigation that successfully integrated the Web Map Service (WMS) and the Web Coverage Service (WCS) into the grid to provide an on-demand geospatial data service for Earth Science Modeling and Applications (Ref. 11).

In the spatial information grid (SIG), spatial data query, which is used to access distributed spatial data, plays an important role. Because spatial operations are both CPU intensive and data intensive, optimizing the execution of spatial data queries is of significant importance. But the grid is such a dynamic environment that the states of nodes vary frequently, and if a query execution plan is generated from fixed values of the cost parameters, the probability of failure or low efficiency is high. It is difficult to apply the traditional optimization mechanisms of distributed database systems to spatial data query optimization in SIG, and as far as we know, research on adaptive spatial query optimization is only at its very beginning.

∗ [email protected]; phone 010-64889049-4
Geoinformatics 2008 and Joint Conference on GIS and Built Environment: Geo-Simulation and Virtual GIS Environments, Lin Liu, Xia Li, Kai Liu, Xinchang Zhang, Aijun Chen, Eds., Proc. of SPIE Vol. 7143 71430P · © 2008 SPIE · CCC code: 0277-786X/08/$18 · doi: 10.1117/12.812548


In this paper, we put forward a grid software architecture for adaptive spatial query optimization that integrates computing resources with spatial data resources, and we concentrate mainly on the spatial grid services that support adaptive spatial query optimization. To achieve this, the Web Service Resource Framework (WSRF) is used to integrate the Web Feature Service (WFS) and the Web Processing Service (WPS) into the grid, providing standard access to spatial feature data and providing computing power respectively. At the same time, a state management function is added to these grid services, enabling them to manage themselves and to publish their own fresh state information.

This article is organized as follows: Section 2 introduces the grid software architecture for adaptive spatial query optimization; Section 3 details the spatial grid services for adaptive spatial query optimization; Section 4 describes the prototype implementation and presents a simple result; the final section gives conclusions and future work.

2. GRID SOFTWARE ARCHITECTURE FOR ADAPTIVE SPATIAL QUERY OPTIMIZATION

The grid software architecture for adaptive spatial query optimization is composed of three kinds of services, namely the distributed spatial data query grid service (DSDQGS), the spatial data grid service (SDGS) and the spatial computing grid service (SCGS) (see Fig. 1). The DSDQGS holds the spatial data schema and state information of the SDGSs, and also the state information of the SCGSs. It accepts users' queries, compiles them, generates optimal query execution plans according to the information it holds, schedules the selected SDGSs and SCGSs to carry out the plan, and finally merges the results returned by the SDGSs and SCGSs into a single result and returns it to the users.

Fig. 1. Grid software architecture for distributed data query


The SDGSs are responsible for spatial data access: they wrap spatial databases or spatial data files and provide a standard data access interface. They accept spatial queries from the DSDQGS or other applications and return the requested data. To support spatial data query optimization in the grid, they also have a state management function that is responsible for registering with the DSDQGS, managing the states of services and resources, and reporting the fresh states to the DSDQGS. Similarly, the SCGSs are responsible for processing spatial operations: they wrap the computing resources of the grid and provide a standard computing interface, and they carry a state management function that manages the states of services and resources and reports the fresh states to the DSDQGS. The SCGSs may need to retrieve spatial data from SDGSs to perform spatial operations.
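The register-and-report relationship described above can be illustrated with a minimal sketch. It is an in-memory model only, not the WSRF-based implementation: the class names, the single `available_memory_mb` field, the 60-second freshness window and the "most available memory" rule are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServiceState:
    """Latest state reported by one grid service (field names are illustrative)."""
    service_id: str
    kind: str                      # "SDGS" or "SCGS"
    available_memory_mb: float
    reported_at: float = field(default_factory=time.time)


class Coordinator:
    """Hypothetical stand-in for the state registry kept by the DSDQGS."""

    def __init__(self, max_age_s: float = 60.0) -> None:
        self.max_age_s = max_age_s
        self.states: Dict[str, ServiceState] = {}

    def report(self, state: ServiceState) -> None:
        # Registration and every later state report overwrite the entry.
        self.states[state.service_id] = state

    def pick(self, kind: str, now: Optional[float] = None) -> Optional[ServiceState]:
        # Ignore services whose last report is older than max_age_s, then
        # prefer the one with the most available memory (an assumed rule).
        now = time.time() if now is None else now
        fresh = [s for s in self.states.values()
                 if s.kind == kind and now - s.reported_at <= self.max_age_s]
        return max(fresh, key=lambda s: s.available_memory_mb, default=None)
```

A plan generated from stale values is exactly the failure mode the architecture tries to avoid, which is why the sketch filters on report age before ranking.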

3. SPATIAL GRID SERVICES FOR ADAPTIVE SPATIAL QUERY OPTIMIZATION

As mentioned in Section 1, the grid is a dynamic environment, so the spatial grid services (SDGSs and SCGSs) need to report their fresh state information to the DSDQGS so that it can generate a valid optimal query execution plan. To design such services we follow the grid framework WSRF, which is "a generic and open framework for modeling and accessing stateful resources using Web services" (Ref. 12) and supports the separation of state and service. The SDGS is realized by integrating the OGC WFS to provide access to spatial feature data, and by adding a management function for the state information of spatial data and services that might be needed by spatial query optimization. Similarly, the SCGS is realized by integrating WPS into WSRF to provide spatial processing and operation services, and by adding a management function for the state information of the computing resources and services that might be needed by spatial query optimization.

3.1 SDGS

Each SDGS has two functions: a spatial data service, which provides access to spatial data, and a state management function, which maintains the state information of resources and services and publishes it.

The spatial data service is implemented in conformance with WFS. That is, WFS is integrated into WSRF to wrap spatial data resources and provide the GetCapabilities, DescribeFeatureType, GetFeature, Transaction and LockFeature service interfaces. GetCapabilities requests a capabilities document that describes the feature types the service offers and the operations supported on each feature type, together with some metadata. DescribeFeatureType returns the structure information of one or more feature types. GetFeature retrieves spatial data satisfying certain conditions. Transaction modifies features, and LockFeature tells the service to lock some features so that they can be used exclusively.

The spatial data service acts only as a wrapper: it accepts sub-queries from the DSDQGS or other clients, compiles them into local commands and invokes the local system to do the actual work. The autonomy of the spatial data sites is thus preserved, and the spatial data service does not need to concern itself with local spatial query optimization or with management of the spatial data resource. In practice, however, spatial data at many sites are stored in separate spatial files without any spatial data management system. To handle this situation, the spatial data service is extended so that it can manage spatial data files and carry out spatial operations itself. In addition, the implementation of the GetFeature operation is enhanced to return the bounding boxes of features, in order to support the spatial semi-join method (introduced in Section 3.2).

The state management function manages the state information of spatial data services and resources that might be required when optimal spatial query execution plans are generated. In the SDGS, state information is designed to contain three categories: statistical information of the data set (SI), quality information of the SDGS (QI) and performance information of the data site (PI).

SI is designed by extending the statistical model of non-spatial datasets to include statistical information on the shape and the spatial distribution of geometries. They are classified into four categories:
(1) Collectivity Statistical Information (CSI): the number of features, the size of the largest feature, the size of the smallest feature and the average size of features.
(2) Geometry Statistical Information (GSI): the maximum, minimum and average size of geometries, the spatial range of all geometries, the area of the largest minimum bounding rectangle (MBR), the area of the smallest MBR and the average area of all MBRs, whether the geometries are indexed, and the spatial index type.
(3) Attribute Statistical Information (ASI): the size of a record, the indexed columns and the number of unique values in each column.
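As an illustration of the kind of scan that produces CSI and part of GSI, the following sketch computes a subset of the statistics listed above from an in-memory feature list. The `Feature` record, the dictionary keys and the use of byte sizes are assumptions for illustration; the paper does not specify how the prototype represents these values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Feature:
    oid: int
    size_bytes: int   # assumed: serialized size of the whole feature
    mbr: MBR


def mbr_area(m: MBR) -> float:
    return (m[2] - m[0]) * (m[3] - m[1])


def collect_si(features: List[Feature]) -> Dict[str, object]:
    """Compute part of CSI and GSI for a non-empty list of features."""
    sizes = [f.size_bytes for f in features]
    areas = [mbr_area(f.mbr) for f in features]
    return {
        # CSI
        "feature_count": len(features),
        "max_feature_size": max(sizes),
        "min_feature_size": min(sizes),
        "avg_feature_size": sum(sizes) / len(sizes),
        # GSI (subset): overall spatial range and MBR area statistics
        "spatial_range": (
            min(f.mbr[0] for f in features), min(f.mbr[1] for f in features),
            max(f.mbr[2] for f in features), max(f.mbr[3] for f in features),
        ),
        "max_mbr_area": max(areas),
        "min_mbr_area": min(areas),
        "avg_mbr_area": sum(areas) / len(areas),
    }
```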

QI includes General Quality of Service (GQoS) and Thematic Quality of Service (TQoS). GQoS covers usability, stability and reliability, while TQoS indicates the efficiency of the spatial algorithms adopted by the spatial data service. PI contains the number of CPUs, the frequency of each CPU, total memory, available memory, total storage and available storage.

The state management function is responsible for the storage, access, update and publication of this information. To provide valid state information, the values must be updated in time. For SI, the state management function scans the dataset periodically, collecting and computing the statistical information of the spatial dataset; it also offers a command that triggers the scan and interfaces for manual update by spatial dataset managers. QI is stable, so only manual update interfaces are available. Some PI values vary all the time, and updating them whenever they change would be costly. We solve this by setting a time interval for periodic update. The interval is defined as the expectation of the stable periods (stable meaning that the change of value does not exceed a threshold) observed over a long time, for example several months.

Clients can access this state information through standard operations provided by WSRF, for instance GetResourceProperty, GetMultipleResourceProperties, QueryResourceProperties, PutResourcePropertyDocument, SetResourceProperties and so on. GetResourceProperty gets the value of one state property, GetMultipleResourceProperties gets the values of more than one state property, QueryResourceProperties queries the values of state properties, and SetResourceProperties modifies the values of state properties using Update, Insert and Delete as parameters. The state management function also uses WS-Notification to support subscription to state information. Each state property is published as a topic, and these topics are organized into topic trees according to their categories (Fig. 2 shows the three upper layers of the topic tree). Clients can then subscribe to the topics they are interested in, and when the value of a topic changes, the subscribers are notified. If a topic is subscribed to, its descendants are considered to be subscribed to as well.
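The periodic update interval for PI, defined above as the expectation of the stable periods, can be estimated from a history of sampled values. The sketch below is one possible reading of that definition, under stated assumptions: a stable period is measured from the first sample of the period until the first sample that deviates from it by more than the threshold, the last (unfinished) period is ignored, and the fallback `default` interval is hypothetical.

```python
from typing import List, Tuple


def stable_periods(samples: List[Tuple[float, float]], threshold: float) -> List[float]:
    """Lengths of completed periods during which the value stays within
    `threshold` of the value observed at the start of the period.

    `samples` is a time-ordered list of (timestamp, value) pairs.
    """
    if not samples:
        return []
    periods = []
    start_t, start_v = samples[0]
    for t, v in samples[1:]:
        if abs(v - start_v) > threshold:
            periods.append(t - start_t)   # the period ended at this sample
            start_t, start_v = t, v       # a new period starts here
    return periods                        # the unfinished last period is dropped


def update_interval(samples: List[Tuple[float, float]], threshold: float,
                    default: float = 60.0) -> float:
    """Mean stable-period length, or `default` if no period has completed."""
    periods = stable_periods(samples, threshold)
    return sum(periods) / len(periods) if periods else default
```

For example, with samples taken at times 0, 10, 20, 30, 60 and 70 and values 100, 102, 150, 151, 90 and 95, a threshold of 10 yields completed stable periods of 20 and 40 time units, so the update interval is 30.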


Fig. 2. Three upper layers of the topic tree of SDGS
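The rule that subscribing to a topic also covers its descendants can be modelled with a small in-memory topic tree. This is a plain Python sketch, not the WS-Notification API; the slash-separated topic paths such as `PI/AvailableMemory` are hypothetical names chosen to mirror the SI, QI and PI categories.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, object], None]


class TopicTree:
    """Topics are slash-separated paths; a subscription to a topic also
    receives notifications published on any of its descendants."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subs[topic.strip("/")].append(callback)

    def publish(self, topic: str, value: object) -> int:
        """Notify subscribers of `topic` and of every ancestor topic.
        Returns the number of notifications delivered."""
        parts = topic.strip("/").split("/")
        delivered = 0
        for depth in range(1, len(parts) + 1):
            prefix = "/".join(parts[:depth])
            for callback in self._subs.get(prefix, []):
                callback(topic, value)
                delivered += 1
        return delivered
```

A coordinator interested in all performance information would subscribe once to `PI` and then be notified whenever any property below it changes.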

3.2 SCGS

Like the SDGS, the SCGS has two kinds of functions: a spatial computing service, which offers spatial operation and analysis services by wrapping the computing resources of the grid, and a state management function, which maintains the state information of resources and services and publishes it in time.

The spatial computing service is implemented in conformance with WPS, providing the GetCapabilities, DescribeProcess and Execute interfaces. GetCapabilities requests a capabilities document that describes the abilities of the specific server implementation, including the names and general descriptions of each of the processes offered by the WPS instance. DescribeProcess requests detailed information on one or more processes that can be run on the server, including the required inputs, their allowable formats and the output formats.

Table 1. Spatial operations supported by SCGS

Operation         Definition
SpatialReference  Returns the Spatial Reference System of the geometry
Envelope          Returns the bounding box of this Geometry
IsEmpty           Returns whether or not the set of points in this Geometry is empty
IsSimple          Returns true if this Geometry is simple
Boundary          Returns the boundary of a geometry
Equal             Returns true if this geometry is equal to the specified geometry
Disjoin           Returns true if this geometry is disjoint from the specified geometry
Intersect         Returns true if this geometry intersects the specified geometry
Touch             Returns true if this geometry touches the specified geometry
Cross             Returns true if this geometry crosses the specified geometry
Within            Returns true if this geometry is within the specified geometry
Contain           Returns true if this geometry contains the specified geometry
Overlap           Returns true if this geometry overlaps the specified geometry
Distance          Returns the minimum distance between this Geometry and the specified Geometry
Buffer            Computes a buffer area around this geometry having the given width
ConvexHull        Computes the convex hull of a Geometry
Intersection      Computes a Geometry representing the points shared by this Geometry and other
Union             Computes a Geometry representing all the points in this Geometry and other
Difference        Computes a Geometry representing the points making up this Geometry that do not make up other
SymmDiff          Returns a set combining the points in this Geometry not in other, and the points in other not in this Geometry

Execute runs a specified process implemented by the WPS, using the provided input parameter values, and returns the outputs produced (Ref. 13). The service currently provides most of the operations defined by OGIS/SQL (listed in Table 1).

The computing service accepts processing requests from clients, retrieves data from the SDGSs named in the request, performs the spatial analysis and operations, and finally merges the results and sends them to the clients. As mentioned above, spatial operations are both CPU intensive and data intensive. Moreover, the data on which SCGSs operate are transferred over the network from SDGSs, so if the datasets are very large and the network has little idle bandwidth, the SCGS will take a long time to respond to clients. We solve this problem by providing two implementations of every operation that works on two or more datasets. One is the direct join, which retrieves the whole datasets and then performs the spatial operation. The other is the spatial semi-join, which is in fact an extension of the two-step filter-refinement operation to a distributed environment.

For example, when an SCGS receives a process request asking it to perform a join operation on datasets R and S, which can be extracted from SDGS1 and SDGS2 respectively, the spatial semi-join accomplishes the operation in the following two steps. First, it retrieves a sequence of tuples containing the OID (the unique identifier of a feature) and the MBR of each feature of R, and a sequence of such tuples from S; it then uses a plane-sweep technique to obtain a set of OID pairs. In each pair one OID comes from the tuples of R and the other from the tuples of S, and their corresponding MBRs satisfy the overlap condition. Second, it retrieves from SDGS1 and SDGS2 the features whose OIDs appear in the OID pairs of the first step, and then performs the required operation on the actual features to get exact results.
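The two steps of the spatial semi-join can be sketched as follows. This is a simplified illustration rather than the prototype's code: the sweep is a basic sorted-by-min-x variant, `fetch_r` and `fetch_s` are hypothetical stand-ins for the requests that retrieve full features by OID from SDGS1 and SDGS2, and `predicate` stands for whichever exact spatial operation was requested.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Entry = Tuple[int, MBR]                  # (OID, MBR)


def sweep_candidates(r: Iterable[Entry], s: Iterable[Entry]) -> Set[Tuple[int, int]]:
    """Filter step: sweep along x and return (OID_R, OID_S) pairs whose
    MBRs overlap (MBRs that merely touch count as overlapping)."""
    events = sorted([(m[0], 0, oid, m) for oid, m in r] +
                    [(m[0], 1, oid, m) for oid, m in s])
    active: List[List[Entry]] = [[], []]   # open MBRs of R and of S
    pairs: Set[Tuple[int, int]] = set()
    for x, side, oid, m in events:
        other = 1 - side
        # Drop MBRs of the other dataset that end before the sweep line.
        active[other] = [(o, a) for o, a in active[other] if a[2] >= x]
        for o, a in active[other]:
            if a[1] <= m[3] and m[1] <= a[3]:          # y-intervals overlap
                pairs.add((oid, o) if side == 0 else (o, oid))
        active[side].append((oid, m))
    return pairs


def semi_join(r_mbrs: Iterable[Entry], s_mbrs: Iterable[Entry],
              fetch_r: Callable[[Set[int]], Dict[int, object]],
              fetch_s: Callable[[Set[int]], Dict[int, object]],
              predicate: Callable[[object, object], bool]) -> List[Tuple[int, int]]:
    """Refinement step: fetch only the candidate features, then apply the
    exact predicate to each candidate pair."""
    pairs = sweep_candidates(r_mbrs, s_mbrs)
    r_feats = fetch_r({a for a, _ in pairs})
    s_feats = fetch_s({b for _, b in pairs})
    return sorted((a, b) for a, b in pairs if predicate(r_feats[a], s_feats[b]))
```

Only features that survive the MBR filter are transferred in full, which is where the saving over the direct join comes from when few MBRs overlap.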
Clients can choose either the direct join implementation or the semi-join implementation of a given operation. The DSDQGS chooses the preferable implementation according to the statistical information of the data, the algorithms of the SDGSs and the network conditions. The details of network monitoring are not covered in this article.

The state management function of the SCGS is similar to that of the SDGS, but it maintains only the QI and PI of the SCGS, updating them in time and providing access and subscription to this state information.
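The paper does not give the cost model used to choose between the two implementations, so the following is only a rough sketch of how such a choice could be made from the reported statistics. Every element is an assumption: the transfer-time formula, the 40-byte (OID, MBR) tuple size, the estimated fractions of features that survive the filter step, and the count of two round trips for a direct join versus four for a semi-join.

```python
from typing import Tuple


def choose_join(n_r: int, avg_r: float, n_s: int, avg_s: float,
                frac_r: float, frac_s: float,
                bandwidth_bps: float, latency_s: float,
                tuple_bytes: int = 40) -> Tuple[str, float]:
    """Return the cheaper implementation and its estimated transfer time.

    n_*, avg_*  : feature count and average feature size in bytes (from CSI)
    frac_*      : assumed fraction of features that pass the MBR filter
    tuple_bytes : assumed size of one (OID, MBR) tuple, 8 + 4 * 8 bytes
    """
    direct_bytes = n_r * avg_r + n_s * avg_s
    semi_bytes = ((n_r + n_s) * tuple_bytes
                  + n_r * frac_r * avg_r + n_s * frac_s * avg_s)
    # Direct join: one request per SDGS. Semi-join: two requests per SDGS.
    direct_t = 2 * latency_s + direct_bytes * 8 / bandwidth_bps
    semi_t = 4 * latency_s + semi_bytes * 8 / bandwidth_bps
    if semi_t < direct_t:
        return "semi-join", semi_t
    return "direct join", direct_t
```

Under these assumptions the semi-join wins when the filter is selective and loses slightly when almost every feature has to be fetched anyway, because it pays for the extra tuples and round trips.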

4. PROTOTYPE REALIZATION AND RESULTS

The implementation of the SCGS and SDGS prototypes has been funded by the Frontier Item of Innovative Subject of the Chinese Academy of Sciences (CAS) (Item Name: Spatial Data Query Optimization in grid) and the Science and China High-Tech Plan (863 Plan) Foundation (No.2007AA12z203).

The SDGS and SCGS are developed in Java under Eclipse. The spatial data service function of the SDGS is realized by extracting the core WFS code from GeoServer, an open source software package that provides WFS, WCS and WMS conforming to the OGC specifications, and wrapping this code into a GAR for the Java WS Core of Globus to provide stateless services. In addition, the data access ability of the GeoServer WFS core is extended to support the Oracle database, in which most of our spatial data are stored. The OGC WPS framework is provided by the Geo-Processing software of 52north, and its spatial operations are realized using GeoTools and the JTS Topology Suite, both of which are open source software written in Java. The state information is organized in XML format, and Apache Xindice is used to store, retrieve, query and manage these XML-based documents. The state management function is written in Java and works with Apache Xindice to update the state information automatically, following the update strategies described in Section 3; access and subscription to the state information are accomplished through the standard operations provided by the Java WS Core.

The experimental spatial data include resource and ecology data of west China, which are stored in an Oracle database, and some global socio-economic data downloaded from the Internet in different formats, for example shapefile. Fig. 3 shows the result of a call to one SCGS, telling it to retrieve global country and region data from one SDGS and time zone data from another SDGS, then to perform the intersection operation on them and return the results in a specified format; in this example the results are returned in GML. The rendering of the results is done by the client. From the figure, users can obtain the number of time zones of any country or region.


Fig. 3. Result of Intersection Returned By SCGS
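As a small illustration of the XML-based state documents mentioned in this section, the sketch below builds and reads a PI document using Python's standard library. The element names and units are hypothetical; the paper does not publish the schema stored in Apache Xindice, and the prototype itself is written in Java.

```python
import xml.etree.ElementTree as ET


def build_pi_document(service_id: str, cpus: int, cpu_mhz: float,
                      total_mem_mb: float, avail_mem_mb: float,
                      total_storage_gb: float, avail_storage_gb: float) -> str:
    """Serialize the PI fields listed in Section 3.1 as an XML string.
    All element and attribute names are assumptions for illustration."""
    root = ET.Element("ServiceState", {"id": service_id})
    pi = ET.SubElement(root, "PI")
    for tag, value in [
        ("CpuCount", cpus),
        ("CpuFrequency", cpu_mhz),
        ("TotalMemory", total_mem_mb),
        ("AvailableMemory", avail_mem_mb),
        ("TotalStorage", total_storage_gb),
        ("AvailableStorage", avail_storage_gb),
    ]:
        ET.SubElement(pi, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def available_memory(xml_text: str) -> float:
    """Read one state property back from the document."""
    return float(ET.fromstring(xml_text).findtext("PI/AvailableMemory"))
```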


5. CONCLUSIONS AND FUTURE WORKS

The OGC Web Service specifications provide standards for spatial information sharing and interoperation over the network, and the grid is developed to integrate resources (or services, because in OGSA everything is a service), so OWSs on the network can be integrated into the grid and work together to provide more powerful composite services. The computing resources of the grid can be integrated with spatial data resources to address the problems caused by the CPU intensive and data intensive nature of spatial operations. However, the dynamic nature of the grid makes spatial query optimization difficult. To solve this problem, we first put forward a grid software architecture for adaptive spatial data query optimization in the grid, and then focus on the design and implementation of the spatial data grid service and the spatial computing grid service that support adaptive optimization of grid spatial queries. Using WSRF, we add state management and publication functions to these two kinds of grid services, enabling them to report their fresh state information to the coordinator so that it can perform valid query optimization, while integrating WFS into the grid to provide standard access to spatial feature data and WPS to provide computing power. In the implementation of WPS, the semi-join method is supported to improve the performance of the services.

The performance of the spatial grid services needs to be evaluated and compared with existing distributed spatial data systems. The software prototypes need bug fixes and improvement. The state information of the SDGS and SCGS needs to be extended or modified to satisfy the requirements of different coordinators, by investigating the cost models and optimization strategies used by most coordinators of distributed spatial database systems. Enriching the spatial operation and analysis capabilities of the SCGS and integrating the OGC WCS into our grid architecture are also planned as future work.

ACKNOWLEDGEMENTS

Our research work is supported by the Science China High-Tech Plan (863 Plan) Foundation (No.2007AA12z203) and the Frontier Item of Innovative Subject of the Chinese Academy of Sciences (CAS) (Item Name: Spatial Data Query Optimization in grid).

REFERENCES

1 Anirban Mondal and Masaru Kitsuregawa, Load-Balancing Remote Spatial Join Queries in a Spatial GRID[C], ER 2004, LNCS 3288, pp. 450–463 (2004).
2 Ann Chervenak, Ewa Deelman, Carl Kesselman et al., "High-performance remote access to climate simulation data: a challenge problem for data grid technologies", Parallel Computing 29, 1335–1356 (2003).
3 Arjun Sen, WSRF: EXPLORING INTEROPERABILITY, COMPLIANCE AND EXPOSING SERVICES OVER THE WEB, Master Degree Paper of The University of Manchester, 2006.
4 CHAI Sheng, ZHOU Yun-xuan, WANG Sheng-sheng, "Research of distributed spatial data share platform based on jabber" (in Chinese), Application Research of Computers, 24(8), 287-288 (2007).
5 Chen Luo, Research on the Integration Techniques of Distributed Geo-Spatial Data Services (in Chinese), Doctor Degree Paper of Graduate School of National University of Defense Technology, Hunan, P.R. China, 2005.
6 Chuanjie Xie, Gaohuan Liu, Bingbo Gao et al. The Optimization of Remote Spatial Join Queries on a Spatial Information Grid[A]. In: 7th International Workshop on Geographical Information System, Beijing, China, September, 2007: 180-185.
7 DAVID BERNHOLDT, SHISHIR BHARATHI, DAVID BROWN, KASIDIT CHANCHIO, et al. The Earth System Grid: Supporting the Next Generation of Climate Modeling Research[C], PROCEEDINGS OF THE IEEE, 93(3), 2005.
8 David T. Liu, Michael J. Franklin, GridDB: A Data-Centric Overlay for Scientific Grids[C], Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004.
9 I. Foster, C. Kesselman, S. Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations, International J. Supercomputer Applications, 2001.
10 I. Foster, C. Kesselman, J. Nick, S. Tuecke, The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Open Grid Service Infrastructure WG, Global Grid Forum, 2002.
11 Kai-Dee Chu, Liping Di, and Peter Thornton, Introduction of Grid Computing Application Projects at the NASA Earth Science Technology Office[C], GPC 2006, LNCS 3947, pp. 289–298 (2006).
12 OASIS Web Services Resource Framework (WSRF) TC [online] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsrf
13 OGC. Web Feature Service Implementation Specification[Z]. 2005.
14 OGC. Web Processing Service Implementation Specification[Z]. 2007.
15 Omar Boucelma, Jean-Yves Garinet, Zoe Lacroix, The VirGIS WFS-Based Spatial Mediation System[C], CIKM'03, November 3–8, 2003, New Orleans, Louisiana, USA.
16 Patel J M, DeWitt D J. Partition Based Spatial Merge join[A]. In: Proceedings of the 1996 Association for Computing Machinery Special Interest Group International Conference on Management of Data[C]. Montreal, Canada, 1996: 259-270.
17 Richard ONCHAGA. Modelling for Quality of Services in Distributed Geoprocessing. In: XXth ISPRS Congress, Istanbul, Turkey, July, 2004: 212-218.
18 Roth, M.T., Ozcan, F., Haas, L.: Cost Models DO Matter: Providing Cost Information for Diverse Data Sources in a Federated System. In: VLDB (1999).
19 Shashi Shekhar, Sanjay Chawla. Spatial Data Base, China Machine Press, Beijing, 2004.
20 Shao Peiying. Distributed Database System and Application (in Chinese), Science Press, Beijing, 2005.
21 TANG Yu, CHEN Luo, HE Kai-tao et al., "A Study on System Framework and Key Issues of Spatial Information Grid" (in Chinese), Journal of Remote Sensing, 8(5), 425-433 (2004).
22 TULADHAR A., RADWAN M., KADER et al. Federated data model to improve accessibility of distributed cadastral databases in land administration, Proceedings of 8th Global Spatial Data Infrastructure Conference (GSDI-8), 16-21 April 2005, Cairo, Egypt.
23 ZHANG Jin-quan, TANG Jian-yu, NI Li-na et al., "Measure and Monitor of the QoS Metrics of Grid Resources Based on Java" (in Chinese), JOURNAL OF TAISHAN MEDICAL COLLEGE, 26(4), 269-272 (2005).
