Loose architecture of multi-level massive geospatial data based on ...

Science in China Series E: Technological Sciences © 2008

SCIENCE IN CHINA PRESS

www.scichina.com tech.scichina.com www.springerlink.com

Springer

Loose architecture of multi-level massive geospatial data based on virtual quadtree

YANG ChongJun†, WU Sheng, REN YingChao, FU Li, ZHANG FuQing, WANG Gang, TAN Jian, LIU DongLin, MA ChaoJi & LIANG Li

State Key Laboratory of Remote Sensing Science, Jointly Sponsored by the Institute of Remote Sensing Applications of Chinese Academy of Sciences and Beijing Normal University, Beijing 100101, China

This paper proposes a loose architecture for multi-level massive geospatial data based on a virtual quadtree (VQT). The architecture integrates massive geospatial data dispersed across departments at different hierarchical levels of the same sector into a unified GIS (Geographic Information System) platform. By virtualizing the nodes of the quadtree, the VQT separates the structure of data organization from data storage and hides the difference between data stored on the local computer and data stored on remote computers in a network environment. Through mounting, the VQT integrates data from remote computers into the local VQT, achieving seamless integration of distributed multi-level massive geospatial data. Based on this model, we built an application system with over 1200 GB of geospatial data distributed across 12 servers deployed in 12 cities. The experiment showed that all data can be roamed seamlessly and rapidly, and that zooming in and out is smooth.

Keywords: massive geospatial data, virtual quadtree, loose architecture

1 Introduction

Many sectors have vertical structures made up of departments at different administrative levels scattered across the country. For example, the land department is made up of agencies at the state, provincial and municipal levels. Many of these departments have built their own spatial databases according to their own business needs and standards. These databases have four characteristics.

1) Multi-level: Departments at different administrative levels require geospatial data with different resolutions or scales.
2) Massive: The geospatial data volume is massive. The volume of high-resolution images may exceed 1000 GB.
3) Data format: Both image data and geographic data must be supported.
4) Distributed: Each department maintains its own geospatial data, which are dispersed in different places.

Quadtrees are usually adopted in these platforms to build pyramid models for multi-level massive geospatial data. Traditional quadtrees tightly bind the structure of data storage to the structure of data organization. This requires all geospatial data to be stored on the same computer and rules out distributed geospatial data. Such quadtrees are inflexible and hard to extend, so they cannot readily support the integration of distributed geospatial data, which brings two problems.

1) High cost: Integrating massive distributed geospatial data by copying is expensive, and the copies are hard to maintain and update.
2) Massive data volume: The storage of one computer is limited and cannot hold an unbounded volume of geospatial data.

The use of geospatial data is also geographically distributed. Departments in a given area use the geospatial data of their own area more frequently than data of other areas. A sound approach is therefore to link the distributed data together to form a loosely coupled architecture of multi-level massive geospatial data.

For leading departments, it is thus an important and urgent problem to efficiently organize the massive geospatial data distributed across departments at different administrative levels in the same sector and to build a unified visualized GIS platform. This paper proposes a loose architecture of multi-level massive geospatial data based on virtual quadtrees, which provides a promising solution.

Received December 2, 2007; accepted January 5, 2008
doi: 10.1007/s11431-008-5020-7
† Corresponding author (email: [email protected])
Supported by the National Natural Science Foundation of China (Grant No. 2006AA12Z208) and the CAS Innovation Program (Grant No. KZCX2-YW-304-02)

Sci China Ser E-Tech Sci | Apr. 2008 | vol. 51 | Supp. I | 114-123

2 Related work

Many countries are recognising the importance of spatial data infrastructures (SDIs), consisting of policies, standards and procedures for sharing spatial information at municipal, state/provincial and national levels[1]. Over the last decade, a number of countries and states have successfully established complete SDIs incorporating core digital map bases such as the cadastre or land parcel layer, topography, hydrology, road networks and administrative boundaries. However, in most cases the relationship between the local government and the national or state systems is at best poor[2]. Many data sharing architectures for distributed massive geospatial data have been built around the world to share data among departments at different hierarchical levels in the same sector or across sectors. They provide services for searching, dispatching and migrating massive geospatial data among departments across sectors and geographic districts.

Many papers have studied the models and applications of the quadtree in depth. The quadtree structure has been used to represent polygonal maps[3]. Quadtrees may be efficiently stored as a forest of quadtrees and as a new structure called a compact quadtree[4]. McArthur et al.[5] described an approach for generating a hierarchical, multi-resolution polygonal database from raw elevation data using wavelet transforms. Fekete[6] described sphere quadtrees, a spatial data structure applicable to global representations of the Earth. Ren et al.[7] used a hierarchical structure to simplify vector data. A large body of other previous work has addressed issues in applying the quadtree, a fundamental data structure, to image processing and spatial data for 2D/3D visualization[8–11].

Many papers have focused on the centralized quadtree model of the pyramid structure of massive geospatial data. The centralized quadtree binds the structure of data storage to that of data organization and requires that all data be gathered in one place, which makes centralized quadtrees inflexible and hard to extend. This paper studies how to efficiently organize the massive geospatial data distributed across departments at different hierarchical levels in the same sector into a logically unified loose architecture of multi-level massive geospatial data, so as to build a platform and sharing architecture for distributed visualized geographic information.

3 Virtual quadtree model

3.1 The shortcomings of the centralized quadtree

The traditional quadtree tightly couples the physical storage and the logical structure of the geospatial data, and describes the structure of the quadtree through the directory structure of the data storage (Figure 1). This kind of quadtree has a simple structure and is easy to implement, but the same simplicity leads to its inherent shortcomings: it is inflexible and hard to extend.

Figure 1 Directory structure of the centralized quadtree.

Firstly, all geospatial data must be stored in one file system, so the geospatial data volume is limited by the capacity of the hard disks. Although disks can be added or enlarged to store more geospatial data, it is still hard to accommodate an unbounded volume of geospatial data. Meanwhile, limited computer resources may make access to massive geospatial data very inefficient, for example because of I/O bottlenecks.

Secondly, the centralized quadtree directly copies multi-level geospatial data into the directories of the quadtree. This makes it hard to maintain and update both the geospatial data and the quadtree structure itself, since those operations mean migrating data.

1) Single-direction importing. Once geospatial data are imported into the quadtree structure, it is hard to remove them again when subtrees exist under that node, because removing them may damage the whole structure of the quadtree.
2) Inefficiency. As Figure 2 shows, when a high-resolution quadtree II is used to expand node A in quadtree I, the data in quadtree II have to be copied into quadtree I. When the data volume of quadtree II is vast, and in particular when quadtree II is stored on a remote computer, node expansion is very inefficient.
3) Invasive merging. As Figure 3 shows, when subtree B in quadtree II is merged into node A in quadtree I, the data in node A are replaced by those in node B, and the original data in node A cannot be recovered. The inherent structure and the integrity of the quadtree are therefore damaged.
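The coupling described above can be illustrated with a minimal sketch. It assumes a hypothetical layout in which each quadrant is a subdirectory named "0" to "3"; the actual layout in Figure 1 may differ, and the names `node_dir` and `child_dirs` are illustrative only. The point is that a node's position in the tree and its storage path are the same thing, so restructuring the tree means moving files.

```python
from pathlib import PurePosixPath

# Hypothetical layout (an assumption, not taken from the paper): each quadrant
# is a subdirectory named "0".."3", so the logical position of a node in the
# quadtree is identical to its physical storage path.
QUADRANTS = ("0", "1", "2", "3")


def node_dir(root: str, quad_path: str) -> PurePosixPath:
    """Map a quadrant path such as "203" to the directory holding that node."""
    if any(q not in QUADRANTS for q in quad_path):
        raise ValueError(f"invalid quadrant path: {quad_path!r}")
    return PurePosixPath(root).joinpath(*quad_path)


def child_dirs(root: str, quad_path: str) -> list[PurePosixPath]:
    """Directories of the four children of a node."""
    return [node_dir(root, quad_path + q) for q in QUADRANTS]
```

Under this layout, expanding or merging a node can only be done by copying or overwriting the directories returned by `child_dirs`, which is the source of the three shortcomings listed above.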


Figure 2 Expand subtree.

Figure 3 Merge two quadtrees.

3.2 Virtual quadtree

The shortcomings of the traditional quadtree lie in the tight coupling of the logical structure of the quadtree and the physical data storage, so that any adjustment of the quadtree structure is inevitably accompanied by migration of data. The virtual quadtree (VQT) virtualizes each real node of the traditional quadtree into a reference to the location of the real geospatial data. No real data are stored in the VQT, and the VQT does not care how or where the geospatial data are stored. The VQT thus detaches the structure of data organization from the data storage of the quadtree. It builds only the framework of the quadtree, and geospatial data are integrated into this framework by hooking them onto the VQT to generate a hierarchical structure (Figure 4).

Figure 4 Virtual quadtree.

There are two kinds of nodes in the VQT.

Iqnode (information node): Iqnodes virtualize the real nodes of the traditional quadtree. Instead of real geospatial data, an iqnode holds only a reference pointing to the location of the real data. The iqnodes therefore build the trunk of the VQT, which does not contain any geospatial data.

Dqnode (data node): A dqnode stores the real geospatial data. Through the registration mechanism, a dqnode is mounted on an iqnode so as to be integrated into the VQT.

The iqnode provides a series of attributes and methods to build the framework of the VQT and to mount geospatial data. The main attributes and methods are as follows (Figure 5).

Scale: It describes the map scale or resolution of the geospatial data mounted on this iqnode.

DataLocation: It describes the storage location of the geospatial data mounted on this iqnode.

Mount: It implements how geospatial data are mounted on the iqnode. When geospatial data are mounted, their storage location, scale and other metadata are registered in the iqnode. When a client visits the geospatial data on the iqnode, the iqnode calls the "Visit" method to access the geospatial data at the path given by dataLocation.

Unmount: It implements how geospatial data are removed from the VQT. When the geospatial data are no longer needed, the dataLocation of the iqnode is simply set to null, and the original geospatial data can no longer be visited through the iqnode. In this procedure, the original storage structures of the geospatial data are not changed.

Figure 5 Structure of iqnode.

A dqnode may be a geospatial data node or the root of a quadtree. In order to be seamlessly integrated into the VQT, a dqnode needs to provide a unified data format and unified data access methods, such as WMS and WFS.

The registration mechanism of the VQT simplifies the algorithms for mounting subtrees and merging trees: no data are copied, and no data are overwritten, so the original quadtree structure is not damaged.

Algorithm I: Mount subtree. Mount VQT-II, with root r, onto node A in VQT-I (Figure 6).

    MountSubtree() {
        A->Mount(r)
    }

Algorithm II: Merge subtree. Merge the subtree on node B in VQT-II, with root td, into VQT-I to replace the subtree on node A, with root sd (Figure 7).

    MergeSubtree() {
        A->Unmount(sd)
        A->Mount(td)
    }
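The iqnode and the two algorithms can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class name `IqNode`, the field names, and the choice to pass the location and scale as strings are assumptions made for the example. Here `unmount` takes no argument and returns the previous reference, which is one possible reading of A->Unmount(sd).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IqNode:
    """Information node: holds a reference to data, never the data itself."""
    name: str
    scale: Optional[str] = None          # map scale or resolution of mounted data
    data_location: Optional[str] = None  # where the mounted data actually live
    children: list["IqNode"] = field(default_factory=list)

    def mount(self, location: str, scale: str) -> None:
        """Register the location and scale of a dqnode; nothing is copied."""
        self.data_location = location
        self.scale = scale

    def unmount(self) -> Optional[str]:
        """Clear the reference and return it; the stored data are untouched."""
        previous = self.data_location
        self.data_location = None
        self.scale = None
        return previous


def mount_subtree(a: IqNode, root_location: str, scale: str) -> None:
    """Algorithm I: mount the root r of VQT-II onto node A of VQT-I."""
    a.mount(root_location, scale)


def merge_subtree(a: IqNode, td_location: str, scale: str) -> Optional[str]:
    """Algorithm II: replace the subtree mounted on A (sd) with td."""
    sd_location = a.unmount()
    a.mount(td_location, scale)
    # sd was never overwritten, so the caller can mount it again later.
    return sd_location
```

Because both operations only rewrite a reference, their cost does not depend on the data volume of the mounted subtree, in contrast to the copy-based expansion and merging of Section 3.1.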

Figure 6 Mount subtree.

Figure 7 Merge subtree.

The VQT has the following characteristics.

1) Separation of data organization from data storage, which makes the VQT more flexible and easier to extend.
2) Support for organizing multi-level massive geospatial data in a distributed environment. Through the registration mechanism, the VQT hides the storage location of the geospatial data, so that visiting geospatial data on the local machine is essentially no different from visiting data on remote machines.
3) Loose coupling. Through the registration mechanism, geospatial data can be easily integrated into the VQT framework without copying data between computers, and geospatial data can be conveniently moved from one mount point to another.
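How a client might treat local and remote dqnodes uniformly can be sketched as below. The sketch assumes that a local dqnode is identified by a file path and a remote one by the base URL of a WMS endpoint; the paper names WMS and WFS as access methods but does not specify this dispatch rule. The function names `wms_getmap_url` and `resolve`, the layer name, and the fixed SRS and image format are illustrative choices. The query parameters are the standard WMS 1.1.1 GetMap parameters.

```python
from urllib.parse import urlencode, urlsplit


def wms_getmap_url(base_url, layer, bbox, width, height):
    """Build a WMS 1.1.1 GetMap request URL for one tile."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",  # assumed coordinate system for the example
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


def resolve(data_location, layer, bbox, size=256):
    """Return ("wms", url) for a remote location or ("file", path) for a local one."""
    scheme = urlsplit(data_location).scheme
    if scheme in ("http", "https"):
        return "wms", wms_getmap_url(data_location, layer, bbox, size, size)
    return "file", data_location
```

The caller only ever passes the dataLocation registered in the iqnode, so whether the data are local or remote is decided in one place and stays invisible to the rest of the tree traversal.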

4 Loose architecture based on VQT

4.1 Architecture

Based on VQT, we propose a loose architecture of multi-level massive geospatial data under distributed environment (Figure 8).

Figure 8 Loose architecture of multi-level massive geospatial data based on VQT.

The Architecture is made up of two kinds of servers.

Geospatial Data Server (GeoServer): stores local geospatial data and implements quadtree-based pyramid models in the local environment.

Resources Registering Server (RegServer): provides resource registration services for GeoServers. A GeoServer registers the metadata of the geospatial data on its local computer, such as spatial extent, resolution or scale, levels and level id, into the RegServer so that the data can be discovered through the RegServer.

RegServers are organized hierarchically. A RegServer with high resolution and small spatial extent is located at a lower level of the hierarchy. Each RegServer registers the metadata of the geospatial data it maintains into its parent, the RegServer at the next upper level. For example, the RegServers at municipal level register their metadata into the RegServer at provincial level, and the RegServers at provincial level register their metadata into the RegServer at state level. Thus a logical quadtree is built. Figure 8 shows the hierarchical architecture. In this logical quadtree, the RegServers form the trunk of the Architecture. The GeoServers are mounted on the Architecture by registering into the related RegServers and form the leaves of the quadtree. The spatial extents of the GeoServers do not overlap with each other.

The architecture has the following characteristics.

1) Support for distributed data. Each department stores and maintains its geospatial data in its own district, without gathering all data together.
2) Loose architecture. Geospatial data are organized through the registration mechanism, so it is easy to adjust nodes in the architecture without affecting its overall structure.
3) Data sharing. Geospatial data can be easily shared throughout the architecture, no matter where the data are located or to whom they belong.

4.2 Iteration algorithm for resources discovering and accessing (IARDA)

IARDA (Figure 9) helps clients to find geospatial data in the Architecture described in section 4.1. Following the hierarchy of the Architecture, IARDA searches the Architecture by iteration. The algorithm can be described as follows.

Figure 9 Iteration algorithm for resources discovering and accessing.

The client C requests geospatial data at scale S and spatial extent E.

    IARDA()
        Find the RegServer R nearest to C
        Ask R whether it has geospatial data satisfying C's request
        if True
            return to C the location of the GeoServer directly mounted on R
        else
            Ask the root RegServer RR
            RR searches itself for the geospatial data that C requests
            if True
                return to C the location of the GeoServer directly mounted on RR
            else
                Ask every child of RR
                Iterate the procedure until the target is found and the location of the related GeoServer is returned to C, or C is told that the target data cannot be found
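The registration hierarchy of section 4.1 and the IARDA search can be sketched together in Python. This is an interpretation under stated assumptions, not the authors' code: the class names `GeoServer` and `RegServer` follow the paper, but the matching rule (exact scale match and the registered extent covering the requested extent), the helper `covers`, and the level-by-level traversal from the root are choices made for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

Extent = tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def covers(outer: Extent, inner: Extent) -> bool:
    """True if the extent `outer` fully contains the extent `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class GeoServer:
    location: str
    scale: str
    extent: Extent


@dataclass
class RegServer:
    name: str
    geoservers: list[GeoServer] = field(default_factory=list)
    children: list["RegServer"] = field(default_factory=list)

    def register(self, server: GeoServer) -> None:
        """Record a GeoServer's metadata; the data stay on the GeoServer."""
        self.geoservers.append(server)

    def find_local(self, scale: str, extent: Extent) -> Optional[str]:
        """Search only the GeoServers mounted directly on this RegServer."""
        for g in self.geoservers:
            if g.scale == scale and covers(g.extent, extent):
                return g.location
        return None


def iarda(nearest: RegServer, root: RegServer,
          scale: str, extent: Extent) -> Optional[str]:
    """Ask the nearest RegServer first, then search the hierarchy from the root."""
    hit = nearest.find_local(scale, extent)
    if hit is not None:
        return hit
    queue = deque([root])
    while queue:
        reg = queue.popleft()
        hit = reg.find_local(scale, extent)
        if hit is not None:
            return hit
        queue.extend(reg.children)  # ask every child in the next round
    return None  # the target data cannot be found
```

A client in a city would pass its municipal RegServer as `nearest` and the state-level RegServer as `root`; a request that the municipal level cannot satisfy is then resolved higher up or in a sibling branch without the client knowing where the data live.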

5 Application

Based on the loose architecture in section 4.1, we built a geographic information platform with three levels: state, province and city. The platform supports both images and geographic data. The geospatial data include:

1) Image data with a resolution of 500 m and geographic data at a scale of 1:1000000 covering the whole world;
2) Image data with a resolution of 15 m and geographic data at a scale of 1:250000 covering the whole of China;
3) Image data with a resolution of 1 m and geographic data at a scale of 1:2000 covering scores of cities.

Geographic data are rendered into high-quality images by the Map Render Engine. The total data volume is over 1200 GB. All data are distributed across 12 servers in 12 cities connected by high-speed fibre. Each place maintains the geospatial data of its own administrative district. For example, Guangdong Province maintains the geospatial data of the whole Guangdong area, and Shenzhen City maintains only the geospatial data of the Shenzhen area (Figure 10).

Figure 10 Geospatial data from the nodes in loose architecture on VQT.

Geospatial data servers register themselves in the resource registering server so as to form a hierarchical structure. When users request geospatial data, the system first searches the local data server for the target data. If the target cannot be found there, the system asks the resource registering server for the location of the target data and retrieves the data from the remote data server. When users in Shenzhen access the map of the Shenzhen area, they visit only the data server in Shenzhen. When they want to access the map of the whole of Guangdong Province, they have to request the data server of Guangdong Province deployed in Guangzhou. The remote data are then integrated with the data on the local data server.

Data update is easy. When the geospatial data of Shenzhen are to be updated, only the data on the server in Shenzhen need to be updated, and the data in Guangdong are not affected. If the data server in Shenzhen is changed, only its registration in Guangdong Province needs to be updated, and access is then routed to the new data server in Shenzhen.

Experiments show that, with ordinary bandwidth, the Architecture supports seamless and rapid roaming, zooming in and zooming out on these multi-level massive geospatial data. Figure 11 and Table 1 show the response time of the system under concurrent access by multiple users. Zooming in and zooming out operations are performed during the test, so the geospatial data may come from both local and remote servers. With 80 concurrent users, the average response time is 3 s, and the system still performs well.

Figure 11 Average transaction response time.

Table 1 Average response time of single transaction

Concurrency    Test time (min)    Average response time of single transaction (s)
1              5                  0.07
10             5                  0.3
50             5                  2
80             5                  3
110            5                  5.5
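As a rough reading aid for Table 1, throughput can be estimated from concurrency and response time with Little's law, X = N / R. This is a derived estimate, not a measurement reported in the paper, and it assumes that every simulated user issues the next request immediately after the previous one completes (zero think time). The function name `estimated_throughput` is illustrative.

```python
# Table 1 values from the text: concurrency -> average response time (s).
TABLE_1 = {1: 0.07, 10: 0.3, 50: 2.0, 80: 3.0, 110: 5.5}


def estimated_throughput(concurrency: int, response_time_s: float) -> float:
    """Little's law with zero think time: X = N / R (transactions per second)."""
    return concurrency / response_time_s


for users, seconds in TABLE_1.items():
    rate = estimated_throughput(users, seconds)
    print(f"{users:>3} users: {rate:5.1f} transactions/s")
```

Under that assumption the estimate stays between roughly 20 and 33 transactions per second from 10 to 110 users, which suggests that the growth in response time reflects queueing at a roughly constant service capacity. If the test tool inserted think time between requests, the real throughput would be lower than these figures.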

6 Conclusion

Much research has been done on the management of massive multi-level geospatial data, and some distributed systems have been built. Most of these distributed geospatial systems, however, pay more attention to the storage of massive geospatial data in distributed environments than to the hierarchical organization of the data. In this paper, we proposed a VQT-based loose distributed architecture in which the geospatial data are not only distributed horizontally across different districts but also organized vertically into a unified quadtree-based multi-level hierarchical structure, so as to realize seamless access to massive multi-level geospatial data. The VQT detaches the structure of data storage from that of data organization, and hides the difference between data stored on the local computer and data stored on remote computers in a network environment. Through mounting, the VQT integrates data from remote computers into the local VQT so as to achieve seamless integration of distributed multi-level massive geospatial data. This provides an efficient way to maintain distributed multi-level massive geospatial data for departments with strong vertical administration.

This paper mainly discussed solutions for efficiently organizing massive geospatial data across departments at different hierarchical levels in the same sector. The solution can also be applied to building a uniform visualized GIS platform across sectors.

1 Rajabifard A, Williamson I. From local to global SDI initiatives: A pyramid to building blocks. In: 4th Global Spatial Data Infrastructure Conference, South Africa, 2000
2 Jacoby S, Smith J, Ting L, et al. Developing a common spatial data infrastructure between State and Local Government: an Australian case study. Int J Geogr Inf Sci, 2002, 16(4): 305–322
3 Hanan S, Robert E. Webber. Storing a collection of polygons using quadtrees. ACM T Graphic, 1985, 4(3): 182–222
4 Jones L P, Iyengar S S. Space and time efficient virtual quadtrees. IEEE T Pattern Anal, 1984, 2(6): 244–247
5 McArthur D E, Fuentes R, Devarajan V. Generation of hierarchical multiresolution terrain databases using wavelet filtering. Photogramm Eng Rem S, 2000, 66(3): 287–295
6 Fekete G. Rendering and managing spherical data with sphere quadtrees. In: Proceedings of Visualization, 1990
7 Ren Y C, Yang C J, Yu Z F, et al. A way to speed up buffer generalization by Douglas-Peucker algorithm. In: International Geoscience & Remote Sensing Symposium, 2004. 2916–2919
8 Renato P, Marc A, Roberto L. Quadtin: Quadtree based triangulated irregular networks. In: Proceedings IEEE Visualization. USA: IEEE Computer Society Press, 2002. 395–402

9 Pajarola R. Overview of quadtree based terrain triangulation and visualization. Technical Report UCI-ICS TR 02-01. Irvine: Computer Science University of California, 2002
10 Zhang L Q. Effective solutions to a global 3D visual system in networking environments. Sci China Ser D-Earth Sci, 2005, 48(4): 511–518
11 Hanan S. The quadtree and related hierarchical data structures. CSUR, 1984, 16(2): 187–260
