ISBN 978-952-5726-07-7 (Print), 978-952-5726-08-4 (CD-ROM)
Proceedings of the Second International Symposium on Computer Science and Computational Technology (ISCSCT ’09)
Huangshan, P. R. China, 26-28 Dec. 2009, pp. 205-208
Research on Distributed Geo-Computing Oriented Self-organized P2P Network
Xicheng Tan 1 and Fang Huang 2
1 International School of Software, Wuhan University, Wuhan, China
Email: [email protected]
2 Institute of Geo-Spatial Information Technology, College of Automation, University of Electronic Science and Technology of China, Chengdu, P.R. China
Email: [email protected]
Abstract—As spatial information systems extend into distributed network environments, they face several challenges: the massive volume of spatial data, the limited bandwidth of current networks, the excessively centralized management of spatial information and geographic computing resources, and higher requirements on spatial information service capability. To overcome these challenges, this paper puts forward a Geo-Computing oriented self-organized P2P network model and designs the structure of the P2P network. To support spatial analysis tasks, the paper also analyzes spatial data management on the self-organized P2P network. Finally, a test system that simulates slope analysis on the self-organized P2P network is presented. Compared with single-server spatial analysis systems, P2P-based analysis tasks execute more efficiently and better support large numbers of user requests.
Index Terms—Spatial Analysis, Distributed Computing, P2P Computing, High Performance Computing (HPC), Task Dispatch
© 2009 ACADEMY PUBLISHER AP-PROC-CS-09CN005

I. INTRODUCTION
Geo-Computing is the art and science of solving complex spatial problems with computers. Computational science uses computers to study scientific problems, and High Performance Geo-Computing (HPGC) is an application field of High Performance Computing. With high-performance resources, HPGC can play a major role in solving problems in the geosciences. Application fields of HPGC include spatial data analysis, dynamic modeling, simulation, temporal-spatial science, visualization, 3D GIS and VR. However, most current HPGC research focuses on parallel computing algorithms on computer clusters [1,2,3,4]. As Geo-Computing extends to distributed environments, parallel computing in Grid environments has become a new research direction [5,6]. The development of Peer-to-Peer (P2P) networks has given us a more powerful technique for addressing HPGC. P2P computing has become an important model of distributed computing: it performs collaborative computing using the spare computing resources of the Internet. Moreover, compared with Grid computing, P2P is a thin-core computing model, which makes it well suited to using distributed computing resources [8, 9]. There is much research based on P2P computing, such as SETI@home [10], XtremWeb and Chord. The idea of using spare computing resources has long been addressed by traditional distributed computing systems. The Beowulf project from NASA was a major milestone that showed that high performance can be obtained from a number of standard machines.

In this paper we analyze the properties of distributed spatial data storage. Based on these properties, we put forward a P2P overlay network structure based on R-Tree indexed spatial data and present the topological structure of the P2P network. Moreover, the logical model and task scheduling of self-organized P2P Geo-Computing are analyzed. Finally, we present a test system that simulates slope analysis based on the P2P computing model.

II. SELF-ORGANIZED P2P OVERLAY NETWORK STRUCTURE
To distribute and manage distributed spatial data conveniently and effectively, it is crucial to first select a suitable type of P2P network. Spatial data contain countless spatial relations, which play an important role in processing the data on a specific P2P network, mainly in locating and indexing the data. For these reasons, methods for storing spatial data differ from those for storing non-spatial data. If spatial data are stored in the same way as non-spatial data, the efficiency of locating and transferring spatial data drops markedly, and the performance of the whole system degrades as a result. Therefore, the method of spatial data storage, especially the locating and indexing of spatial data, deserves particular attention in a P2P network. Unfortunately, existing P2P applications give little consideration to how spatial data are located and indexed. In managing spatial data in a P2P network, the most important task is indexing the distributed spatial data. Because spatial data contain abundant spatial relations, locating and managing resources in the P2P network becomes easier if these relations are fully exploited. We adopt an R-Tree index of the spatial data, shown in Fig. 1(a), to improve the efficiency and correctness of resource location, and we construct a P2P overlay network structure based on this R-Tree index, shown in Fig. 1(b).
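The benefit of an R-Tree index for resource location is that a query only descends into subtrees whose MBR overlaps the requested extent. The following minimal Python sketch illustrates this idea; it is not the authors' implementation. The `MBR` and `Node` classes and the `locate` function are hypothetical names, MBRs are assumed to be axis-aligned rectangles, and the peer IDs are borrowed from Fig. 1 only as labels.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MBR:
    """Axis-aligned minimum bounding rectangle (xmin, ymin, xmax, ymax)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: MBR) -> bool:
        # Two rectangles overlap unless one lies entirely to one side of the other.
        return not (other.xmin > self.xmax or other.xmax < self.xmin
                    or other.ymin > self.ymax or other.ymax < self.ymin)


@dataclass
class Node:
    """Hypothetical R-Tree node: leaves stand for data-holding peers,
    non-leaf nodes for peers whose MBR encloses their children's MBRs."""
    peer_id: str
    mbr: MBR
    children: list[Node] = field(default_factory=list)


def locate(node: Node, query: MBR) -> list[str]:
    """Return the IDs of leaf peers whose MBR intersects the query extent."""
    if not node.mbr.intersects(query):
        return []  # prune: no descendant can overlap the query
    if not node.children:
        return [node.peer_id]
    hits: list[str] = []
    for child in node.children:
        hits.extend(locate(child, query))
    return hits


# Usage: a non-leaf peer p10 indexing three leaf peers.
p10 = Node("p10", MBR(0, 0, 6, 6), [
    Node("p16", MBR(0, 0, 1, 1)),
    Node("p28", MBR(1, 0, 2, 1)),
    Node("p31", MBR(5, 5, 6, 6)),
])
print(locate(p10, MBR(0.5, 0.2, 1.5, 0.8)))  # ['p16', 'p28']
```

Here p31 is never visited beyond a single rectangle test, which is the pruning that makes spatially aware location cheaper than flooding the network.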
Figure 1. P2P overlay network structure based on R-Tree indexed spatial data. (a) R-Tree indexed spatial data; (b) spatial data R-Tree based P2P network structure, with HA peers of levels 1 to n above the common peers.

The R-Tree is a balanced tree with two kinds of nodes, leaf nodes and non-leaf nodes, each holding a number of index entries. In a leaf node, an index entry points to the Minimum Bounding Rectangle (MBR) of the spatial data kept by that node; in a non-leaf node, an entry points to an MBR that contains the MBRs of the leaf nodes below it. As shown in Fig. 1(a), p16, p28, p31, p19, p18 and p20 are leaf nodes, while p10 is a non-leaf node whose entry contains the MBRs of p16 and the other leaf nodes under it. Fig. 1(b) shows the self-organized P2P structure based on the spatial data R-Tree. An important component of this structure is the Hierarchical Agent (HA): all HA peers are non-leaf nodes of the R-Tree index, so HA peers at every level can manage a set of common peers through the MBRs of the spatial data those peers keep. An HA peer of level i (HAi) is the core of the peer cluster (or group) to which HAi-1 peers or common peers belong. If the MBR of an HA1 peer contains the MBRs of some common peers, these common peers join the same group, connect to that HA1 peer and are managed by it. Likewise, if the MBR of an HA2 peer contains the MBRs of some HA1 peers, these HA1 peers join the same group, connect to that HA2 peer and are managed by it. In this way, all peers are managed by HA peers at different levels.

To improve the routing efficiency of the P2P network, peers in the same cluster form a ring, as shown in Fig. 2. The service state of the peers is ensured by regular detection along the ring: each peer regularly sends a detecting message to its previous peer to confirm that the peer is online and to obtain its network condition. This regular detection reduces the efficiency lost to routing maintenance.

Figure 2. Topological structure of the P2P network: a common peer ring and the HA1, HA2, HA3, ..., HAn peer rings.
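The containment-based grouping and the ring detection described above can be sketched in Python as follows. This is an illustrative sketch, not the paper's implementation: MBRs are assumed to be (xmin, ymin, xmax, ymax) tuples, a peer that falls inside several HA MBRs is attached to the first one found, and `is_online` is a hypothetical callback standing in for the detecting message sent to the previous peer. The same `group_by_containment` call can be reused one level up, with HA2 MBRs as parents and HA1 MBRs as children.

```python
from typing import Callable

# Hypothetical MBR layout: (xmin, ymin, xmax, ymax).
Rect = tuple[float, float, float, float]


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `outer` fully encloses `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def group_by_containment(parents: dict[str, Rect],
                         children: dict[str, Rect]) -> dict[str, list[str]]:
    """Attach each child peer to the first parent HA whose MBR contains it."""
    groups: dict[str, list[str]] = {pid: [] for pid in parents}
    for cid, cmbr in children.items():
        for pid, pmbr in parents.items():
            if contains(pmbr, cmbr):
                groups[pid].append(cid)
                break  # assumption: a peer is managed by one HA only
    return groups


def build_ring(members: list[str]) -> dict[str, tuple[str, str]]:
    """Map each peer to the (previous, next) peer IDs of a ring."""
    n = len(members)
    return {m: (members[(i - 1) % n], members[(i + 1) % n])
            for i, m in enumerate(members)}


def detect_round(ring: dict[str, tuple[str, str]],
                 is_online: Callable[[str], bool]) -> list[str]:
    """Every peer probes its previous peer; return the peers found offline."""
    return sorted({prev for prev, _next in ring.values() if not is_online(prev)})


# Usage: two HA1 peers and four common peers.
ha1 = {"HA1-a": (0, 0, 10, 10), "HA1-b": (10, 0, 20, 10)}
commons = {"p16": (1, 1, 2, 2), "p28": (3, 3, 4, 4),
           "p19": (5, 5, 6, 6), "p31": (12, 1, 13, 2)}
groups = group_by_containment(ha1, commons)
print(groups)  # {'HA1-a': ['p16', 'p28', 'p19'], 'HA1-b': ['p31']}
ring = build_ring(groups["HA1-a"])
print(detect_round(ring, lambda pid: pid != "p28"))  # ['p28']
```

Because each peer only stores its previous and next neighbor, one detection round costs one message per ring member regardless of how large the whole network is.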
III. THE DISTRIBUTED SPATIAL DATA MANAGEMENT ON THE P2P NETWORK
All HA peers and common peers are both spatial data storage peers and client peers. HA peers require higher equipment performance, greater Internet bandwidth and stable online service. Both HA peers and common peers contribute part of their storage space, and the contributed storage space is managed and checked via the routing function of the P2P network. Compared with a mixed-structure P2P network, this network has no server peer, which increases the stability of the system while keeping its efficiency. HA peers have stable service time, a fixed public IP address, relatively large bandwidth and higher performance. If a spatial data peer meets certain conditions, it can also join the network as an HA peer.

The ring structure formed by common peers serves to secure the large number of grouped common peers. Every HA peer maintains a ring of common peers, and the common peers in this ring are closely related spatially, i.e., they receive spatial data services for the same or neighboring areas. While receiving services, many common peers also contribute spatial information. Common peers in the same ring can set up links with one another and transfer the spatial data they need.

Another issue is deciding which information each peer keeps. Because the P2P network is constructed from the R-Tree of the spatial data kept by the peers, every peer should keep both its spatial data and the MBR of those data. Each peer also manages the metadata of the spatial data it keeps; the metadata describe information such as data classes and precision. For regular detection along the ring, each peer keeps the IDs of its previous and subsequent peers in the ring. Moreover, an HA peer keeps the IDs of the common peers it manages and of its subordinate HA peers.

IV. PERFORMANCE EVALUATION
To evaluate the performance of task assignment and of geo-computing applications in the P2P network, two evaluation scenarios can be used: the first evaluates the performance of geo-computing task assignment, and the second evaluates the performance of geo-computing applications. The scenario reported here evaluates the performance of geo-computing applications: we compare the performance of average slope computation in the P2P-based application and in the C/S-based application. The P2P network contains 10 HA peers and 39 common peers. All HA peers and common peers are PCs with a 2.7 GHz CPU, 512 MB RAM and a 160 GB disk. A single HA peer ring with two levels of HA peers is constructed, so all 10 HA peers are in the ring, and each HA peer manages 3 to 5 common peers. Three DEM (Digital Elevation Model) data files with resolutions of 20 m, 10 m and 5 m, respectively, are stored in the P2P network and distributed according to the R-Tree index.

The C/S-based geo-computing application runs in the following environment: the server machine has a 3 GHz CPU and 2 GB RAM, all client machines have the same configuration as the client peers in the P2P environment, and all three DEM data files are stored on the server machine.

The test has three conditions, all of which can be set by the client application: (1) task extent; (2) number of concurrent tasks; (3) resolution of the DEM data file. We set two task extents, 6560 km² and 47580 km², and define the 6560 km² task as process A and the 47580 km² task as process B. Both processes A and B are run at all three resolutions, and the number of concurrent tasks varies from 1 to 30. The performance of the average slope computation is shown in Fig. 3. As the DEM resolution increases, the task extent grows and the number of concurrent requests rises, the execution time of C/S-based computing increases more sharply than that of P2P-based computing, even though the data are stored in a distributed manner in the P2P network. This is because more HA peers and common peers act as workers during task execution, and the communication load on the server in C/S-based computing is larger than that in the P2P network.
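To illustrate why the slope job parallelizes well, the following Python sketch splits an average-slope computation into row bands whose partial sums are merged at the end. It is a hedged illustration, not the test system used in the paper: the DEM layout (a list of rows with square cells), the central-difference slope operator (one common choice; the paper does not state which operator it uses) and the thread pool standing in for HA and common peers are all assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor

Grid = list[list[float]]  # DEM as rows of elevations (assumed layout)


def slope_deg(z: Grid, i: int, j: int, cell: float) -> float:
    """Slope in degrees at interior cell (i, j) using central differences."""
    dzdx = (z[i][j + 1] - z[i][j - 1]) / (2 * cell)
    dzdy = (z[i + 1][j] - z[i - 1][j]) / (2 * cell)
    return math.degrees(math.atan(math.hypot(dzdx, dzdy)))


def band_sum(z: Grid, rows: list[int], cell: float) -> tuple[float, int]:
    """One worker's share: sum and count of slopes over a band of rows."""
    cols = range(1, len(z[0]) - 1)
    total = sum(slope_deg(z, i, j, cell) for i in rows for j in cols)
    return total, len(rows) * len(cols)


def average_slope(z: Grid, cell: float, workers: int) -> float:
    """Split interior rows into bands, evaluate them concurrently, merge."""
    interior = list(range(1, len(z) - 1))
    size = math.ceil(len(interior) / workers)
    bands = [interior[k:k + size] for k in range(0, len(interior), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda rows: band_sum(z, rows, cell), bands))
    total = sum(t for t, _ in parts)
    count = sum(c for _, c in parts)
    return total / count


# Usage: a tilted plane rising 1 m per 1 m cell in x.
dem = [[float(j) for j in range(5)] for _ in range(6)]
print(average_slope(dem, cell=1.0, workers=3))
```

Because each band returns only a (sum, count) pair, the merged mean equals a single-machine result up to floating-point rounding; for the tilted plane above it is approximately 45 degrees. In a real P2P deployment each peer would also need one halo row from its neighbors to evaluate the central differences at band edges. For scale, a 6560 km² extent (process A) at 5 m resolution contains 6560 × 10⁶ m² / 25 m² ≈ 2.6 × 10⁸ cells, and the 47580 km² extent (process B) about 1.9 × 10⁹ cells.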
Figure 3. The comparison of the performance of the slope analysis based on different computing models. (a) Execution time of analysis on DEM with 20 m resolution; (b) execution time of analysis on DEM with 10 m resolution; (c) execution time of analysis on DEM with 5 m resolution.

V. CONCLUSION
This paper first presented a P2P structure based on the R-Tree index of spatial data and designed the mechanism of spatial data management. Based on the logical view of P2P geo-computing, a geo-computing oriented task scheduling scheme for P2P geo-computing was designed. Simulation results show that P2P geo-computing job assignment achieves considerable performance. The comparison of geo-computing application performance shows that P2P-based geo-computing outperforms C/S-based geo-computing in the tested scenarios.

ACKNOWLEDGMENT
The research work is supported by the Geographic Spatial Information Engineering Laboratory of China State Bureau of Surveying and Mapping (No. 200806).

REFERENCES
[1] M. J. Mineter. A software framework to create vector-topology in parallel GIS operations. International Journal of Geographical Information Science, 17(3): 203-222, 2003.
[2] F. J. Wang. A parallel intersection algorithm for vector polygon overlay. Computer Graphics and Applications, 13(2): 74-81, 1993.
[3] D. Xiong, D. F. Marble. Strategies for real-time spatial analysis using massively parallel SIMD computers: an application to urban traffic flow analysis. International Journal of Geographical Information Systems, 10(6): 769-789, 1996.
[4] M. P. Armstrong, P. J. Densham. Domain decomposition for parallel processing of spatial problems. Computers, Environment, and Urban Systems, 16: 497-513, 1992.
[5] F. Huang. Implementation and QoS for high-performance GIServices in spatial information grid. In L. Z. Wang, J. J. Chen, et al. (eds.), Quantitative Quality of Service for Grid Computing: Applications for Heterogeneity, Large-Scale Distribution and Dynamic Environments, pp. 181-203. New York: IGI Global, 2009.
L. Garcés-Erice, E. W. Biersack, P. A. Felber, et al. Hierarchical peer-to-peer systems. Parallel Processing Letters, 13(4): 643-657, 2003.
J. B. Weissman, A. S. Grimshaw. Network partitioning of data parallel computations. Proc. of the Third IEEE International Symposium on High Performance Distributed Computing, 1994.
T.-H. Kim, J. M. Purtilo. Load balancing for parallel loops in workstation clusters. Technical Report, Department of Computer Science, University of Maryland, 1996.
D. P. Anderson, J. Cobb, E. Korpela, et al. SETI@home: an experiment in public-resource computing. http://setiathome.ssl.berkeley.edu/cacm/cacm.html, 2002.
X. Tan, L. Yu, F. Bian. Large-scale P2P network based distributed virtual geographic environment (DVGE). Proc. of Geospatial Information Technology and Applications, 6754(2): 675427-675430, 2007.
H. Jin, F. Luo, X. F. Liao, Q. Zhang, H. Zhang. Constructing a P2P-based high performance computing platform. Computational Science - ICCS 2006, Part 4, Proceedings, Lecture Notes in Computer Science, vol. 3994, pp. 380-387, 2006.
J. Z., R. D. McLeod. Application layer routing options for efficient data transfer over the Internet. Proc. of 2002 IEEE Canadian Conf. on Electrical & Computer Engineering, Los Alamitos, 2002.
K. Czajkowski, I. Foster, N. Karonis, et al. A resource management architecture for metacomputing systems. http://www.globus.org, 2003.
H. El-Rewini, T. G. Lewis. Scheduling parallel program tasks onto arbitrary target machines. Journal of Parallel and Distributed Computing, 9(1): 138-153, 1990.
[18] T. Yang, A. Gerasoulis. Scheduling parallel tasks on an unbounded number of processors. IEEE Transactions on Parallel and Distributed Systems, 5(9): 951-967, 1994.
[19] Y. Song, X. Bai, S. Ju, X. Han. Building dynamic GIS services based on peer-to-peer. Proc. of Semantics, Knowledge and Grid (SKG '05), p. 68, 2005.
[20] N. Preguiça, M. Shapiro, C. Matheson. Semantics-based reconciliation for collaborative and mobile environments. CoopIS Conf., 2003.
[21] S. Ratnasamy et al. A scalable content-addressable network. Proc. of SIGCOMM, 2001.
[22] SETI@home. http://www.setiathome.ssl.berkeley.edu/.
[23] M. Shapiro. A simple framework for understanding consistency with partial replication. Technical Report, Microsoft Research, 2004.
[24] Stoica et al. Chord: a scalable peer-to-peer lookup service for Internet applications. Proc. of SIGCOMM, 2001.
[25] Tanaka, P. Valduriez. The Ecobase environmental information system: applications, architecture and open issues. ACM SIGMOD Record, 3(5-6), 2000.
[26] Tatarinov et al. The Piazza peer data management project. SIGMOD Record, 32(3), 2003.
[27] Tomasic, L. Raschid, P. Valduriez. Scaling access to heterogeneous data sources with DISCO. IEEE Trans. on Knowledge and Data Engineering, 10(5), 1998.
[28] P. Valduriez. Parallel database systems: open problems and new issues. Int. Journal on Distributed and Parallel Databases, 1(2), 1993.
[29] Yang, H. Garcia-Molina. Designing a super-peer network. Int. Conf. on Data Engineering, 2003.
[30] W. Nejdl, W. Siberski, M. Sintek. Design issues and challenges for RDF- and schema-based peer-to-peer systems. SIGMOD Record, 32(3), 2003.
[31] B. Ooi, Y. Shu, K-L. Tan. Relational data sharing in peer-based data management systems. SIGMOD Record, 32(3), 2003.
[32] T. Özsu, P. Valduriez. Principles of Distributed Database Systems. 2nd Edition, Prentice Hall, 1999.