2015 IEEE International Conference on Smart City/SocialCom/SustainCom together with DataCom 2015 and SC2 2015
High Performance Computation Based on Semantic P2P Network
Lican Huang 1,2
1 School of Informatics, Zhejiang Sci-Tech University
2 Hangzhou Domain Zones Technology Co., Ltd
Hangzhou, China
Abstract—High performance computations are widely used in scientific research and industry. As supercomputers are expensive, how to use cheap computers such as clusters and personal computers is an important issue. In this paper, we describe a novel distributed computation strategy based on a semantic P2P network. In this network, the peers are grouped virtually into hierarchical classified domains, and problems are partitioned into sub-problems that are scheduled to those peers. In theory, this strategy scales effectively to millions of computers. We have implemented a distributed solution to the knapsack problem on our semantic P2P network platform.
Keywords—Distributed Computation; P2P; Semantic P2P Network; knapsack problem
I. INTRODUCTION
High performance computations are widely used in scientific research and industry. Some tasks take several months or years on cheap computers such as clusters or personal computers, and some cannot be completed on them at all. Therefore, researchers are studying distributed computation over large numbers of cheap computers [11]. Traditionally, distributed computations use master/slave node technologies. However, this infrastructure has several problems, such as limited scalability and single points of failure. P2P technologies are a potential solution to these problems [1,2,3]. P2P technologies are traditionally classified into two kinds. Unstructured P2P technology, such as Freenet [4], searches by flooding, which causes heavy traffic and does not guarantee that a search succeeds. Structured P2P technology uses distributed hash tables (DHTs), such as Chord [5], which lose semantic meaning. We have proposed a new kind of P2P technology, the Semantic P2P Network [6,7,8]. Unlike unstructured and DHT-based structured P2P networks, the Semantic P2P Network keeps the semantic meaning of the nodes' roles in their communities. Nodes are identified by domain names, which are classified by the semantic meaning of their roles in organizations. The nodes construct the Semantic P2P Network according to their domains, which are classified as a hierarchical tree. In our strategy, the problem to be solved is partitioned into sub-problems. These sub-problems are scheduled to the nodes that form the Semantic P2P Network, and those nodes compute them in parallel. The remainder of this paper is organized as follows: Section II gives the strategy of distributed computation based on the Semantic P2P Network; Section III gives an example of distributed computation on the semantic P2P platform; finally, we give conclusions.
978-1-5090-1893-2/15 $31.00 © 2015 IEEE DOI 10.1109/SmartCity.2015.228
II. STRATEGY OF DISTRIBUTED COMPUTATION BASED ON SEMANTIC P2P NETWORK
The Semantic P2P Network is a domain-related hierarchical structure, derived from VIRGO [9], that hybridizes unstructured and structured P2P technologies. It consists of a virtual hierarchical tree of virtual groups, with one root layer, several middle layers, and many leaf virtual groups. Fig. 1 shows the architecture of semantic P2P networks with a two-tuple virtual hierarchical tree topology (nodes in different layers connected by dashed lines are actually the same node). In Fig. 1, three virtual overlay groups are organized from the real network. Two nodes from each of these virtual groups are chosen to form the upper-layer virtual group. Each node implements the VIRGO protocols and holds its own routing table. The protocol of distributed computation based on the Semantic P2P Network is as follows:
Step 1: The problem p is divided into n sub-problems $p_{ijk}$ ($i = 1$; $j = 0, \dots, n-1$; $k = 1$). Each sub-problem is scheduled to the member node $n_{ijk}$ (with the same indices i, j, k as $p_{ijk}$) of the top virtual group of this computation domain. Here, i is the layer number in the semantic P2P network, k is the group number in this layer, and j is the node number in group k of this layer.
Step 2: If $p_{ijk}$ can be computed by the scheduled node $n_{ijk}$, node $n_{ijk}$ computes $p_{ijk}$, sends back the result, and goes to Step 5; otherwise, go to Step 3.
Step 3: $p_{ijk}$ is divided into m sub-problems $p_{lqr}$ (here, $l = i+1$; $q = 0, \dots, m-1$; $r \in G_l$). Each of these is scheduled to the member node $n_{lqr}$ of the layer-$(i+1)$ virtual group of this computation domain. Here, l is the layer number in the semantic P2P network, r is the group number in this layer, q is the node number in group r of this layer, and $G_l$ is the group set of layer l.
Step 4: If $p_{lqr}$ can be computed by the scheduled node $n_{lqr}$, that node computes $p_{lqr}$, sends back the result, and goes to Step 5; otherwise, go to Step 3 with $p_{lqr}$ in place of $p_{ijk}$.
Step 5: Integrate all results into the solution.
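The protocol in Steps 1 to 5 can be sketched in Python as a minimal, sequential simulation. The names `Group`, `build_tree`, `split_evenly`, and `solve` are hypothetical. The sketch rests on three assumptions. First, a member of a non-leaf group passes an oversized sub-problem to the lower-layer group it also belongs to, as node A does with (A, B, C, D) in Section III. Second, each sub-problem is split into as many parts as the receiving group has members. Third, "can be computed" is a caller-supplied predicate. Parts are processed one after another here rather than in parallel.

```python
from dataclasses import dataclass
from string import ascii_uppercase
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class Group:
    """Hypothetical virtual group: member node names plus, for non-leaf
    groups, the lower-layer group that each member also belongs to."""
    members: List[str]
    children: Optional[List["Group"]] = None


def build_tree(n: int, m: int, names: Sequence[str]) -> Group:
    """Build an m-layer tree of n-member virtual groups over n**m nodes.

    Leaf groups take consecutive names. Member r of an upper-layer group is
    the same physical node as member r of its r-th child group (an assumed
    selection rule that yields the top group (A, F, K, P) for n = 4, m = 2).
    """
    if len(names) != n ** m:
        raise ValueError("need exactly n**m node names")
    layer = [Group(list(names[i:i + n])) for i in range(0, n ** m, n)]
    while len(layer) > 1:
        chunks = [layer[i:i + n] for i in range(0, len(layer), n)]
        layer = [Group([c.members[r] for r, c in enumerate(chunk)], chunk)
                 for chunk in chunks]
    return layer[0]


def split_evenly(items: Sequence, k: int) -> List[Sequence]:
    """Partition items into k contiguous parts whose sizes differ by <= 1."""
    q, r = divmod(len(items), k)
    parts, start = [], 0
    for i in range(k):
        end = start + q + (1 if i < r else 0)
        parts.append(items[start:end])
        start = end
    return parts


def solve(group: Group, problem: Sequence,
          split: Callable[[Sequence, int], List[Sequence]],
          fits: Callable[[Sequence], bool],
          compute: Callable[[str, Sequence], Any],
          combine: Callable[[List[Any]], Any]) -> Any:
    """Steps 1/3: split the problem over the group's members.
    Steps 2/4: a member computes its part if it fits, else passes it down.
    Step 5: combine the partial results."""
    results = []
    for j, part in enumerate(split(problem, len(group.members))):
        node = group.members[j]
        if fits(part):
            results.append(compute(node, part))
        elif group.children is not None:
            results.append(solve(group.children[j], part, split, fits,
                                 compute, combine))
        else:
            raise ValueError(f"part too large for leaf node {node}")
    return combine(results)


# Usage: sum of squares of 0..99 on 16 nodes (4 groups of 4, two layers),
# assuming a node can handle at most 10 numbers.
tree = build_tree(4, 2, ascii_uppercase[:16])
worked_on: List[str] = []


def square_sum(node: str, part: Sequence[int]) -> int:
    worked_on.append(node)  # record which node computed this part
    return sum(x * x for x in part)


total = solve(tree, list(range(100)), split_evenly,
              lambda part: len(part) <= 10, square_sum, sum)
print(tree.members, total)  # ['A', 'F', 'K', 'P'] 328350
```

In this run, each top-layer part of 25 numbers exceeds the assumed capacity and is passed one layer down. All 16 nodes therefore end up computing one part each.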
Fig. 1. Architecture of semantic P2P networks
The computation enhancement is determined by the number of nodes involved in the computation. Suppose each group has n members and there are m layers in total. The top layer has one group, the second layer has n groups, the third layer has $n^2$ groups, and so on. Every upper-layer node is also a node of a lower layer, so the m-th (leaf) layer, with $n^{m-1}$ groups of n nodes each, contains all the nodes. The total number of nodes in the m layers is therefore $n^m$.
Because all nodes compute in parallel, the enhancement ratio (ER) is as follows:
$ER = n^m \theta \quad (0 < \theta < 1)$
If the computation time for a sub-problem that one node can compute is T, then the time complexity for $n^m$ nodes is $T_{\mathrm{complexity}} = T/\theta$. For one computer, the time complexity will be $T \times n^m$. For example, if T is 1 hour, n = 100, and m = 5, then one computer needs $100^5 = 10^{10}$ hours. However, by using our strategy, the time will be at most 2 hours in theory ($T/\theta \le 1/0.5 = 2$ hours for $\theta \ge 0.5$).
III. DISTRIBUTED KNAPSACK PROBLEM SOLUTION
The knapsack problem is studied in fields such as combinatorics, computer science, complexity theory, cryptography, and applied mathematics. Given a set of items, each with a weight and a value, the problem is to determine which items to include in a collection. The total weight must be less than or equal to a given limit, and the total value must be as large as possible. The knapsack problem (in its decision form) is NP-complete, so no polynomial-time algorithm for it is known. For illustration, we give a simple example as described in [12]. Suppose we want to find the combinations of the 10 items (2, 5, 6, 3, 7, 11, 13, 22, 17, 23) whose values sum to 77. Suppose there are 16 computers grouped into 4 virtual groups, namely (A,B,C,D), (E,F,G,H), (I,J,K,L), and (M,N,O,P). One node from each group is selected to form the upper-layer group, for example (A,F,K,P). Suppose each computer can solve a sub-problem with at most 6 items. The solving steps are as follows. First, the problem is divided into 4 sub-problems with 8 items each, by deciding whether the first two items (2 and 5) are included. These sub-problems are sent to the nodes in the top layer:
(6,3,7,11,13,22,17,23) (77) → A
(6,3,7,11,13,22,17,23) (75) → F
(6,3,7,11,13,22,17,23) (72) → K
(6,3,7,11,13,22,17,23) (70) → P
Then, each of the above sub-problems is divided into 4 sub-sub-problems, which are sent to the nodes of the corresponding virtual groups (A,B,C,D), (E,F,G,H), (I,J,K,L), and (M,N,O,P). For node A, the sub-problem (6,3,7,11,13,22,17,23) (77) is divided into:
(7,11,13,22,17,23) (77) → A
(7,11,13,22,17,23) (71) → B
(7,11,13,22,17,23) (74) → C
(7,11,13,22,17,23) (68) → D
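At each layer, the decomposition above fixes whether the next two items belong to the combination: neither, only the first, only the second, or both. This yields the reduced targets 77, 75, 72, 70 and then 77, 71, 74, 68. Below is a minimal Python sketch of this branching. It assumes the subset-sum reading of the example (combinations whose values sum exactly to the target), two items fixed per layer, and a brute-force solver on each leaf node. The function names are hypothetical, and the 16 leaf sub-problems are solved sequentially rather than on 16 computers.

```python
from itertools import combinations
from typing import List, Tuple

Items = Tuple[int, ...]


def branch(items: Items, target: int, k: int = 2) -> List[Tuple[Items, int, Items]]:
    """Fix include/exclude decisions for the first k items.

    Returns 2**k sub-problems as (remaining items, reduced target, chosen
    items), ordered for k = 2 as: none, first only, second only, both.
    """
    head, rest = items[:k], items[k:]
    subproblems = []
    for mask in range(2 ** k):
        chosen = tuple(x for b, x in enumerate(head) if (mask >> b) & 1)
        subproblems.append((rest, target - sum(chosen), chosen))
    return subproblems


def solve_leaf(items: Items, target: int) -> List[Items]:
    """Brute force on one node: every subset of items summing to target."""
    return [c for r in range(len(items) + 1)
            for c in combinations(items, r) if sum(c) == target]


def solve(items: Items, target: int, capacity: int = 6, k: int = 2) -> List[Items]:
    """Branch until a sub-problem has at most `capacity` items, then solve it
    at a leaf. With 10 items, capacity 6, and k = 2 this creates 4 x 4 = 16
    leaf sub-problems, one per computer in the example."""
    if len(items) <= capacity:
        return solve_leaf(items, target)
    found: List[Items] = []
    for rest, sub_target, chosen in branch(items, target, k):
        found.extend(chosen + combo
                     for combo in solve(rest, sub_target, capacity, k))
    return found


ITEMS: Items = (2, 5, 6, 3, 7, 11, 13, 22, 17, 23)
print([t for _, t, _ in branch(ITEMS, 77)])      # [77, 75, 72, 70]
print([t for _, t, _ in branch(ITEMS[2:], 77)])  # [77, 71, 74, 68]
solutions = solve(ITEMS, 77)
print(len(solutions), (2, 13, 22, 17, 23) in solutions)
```

A leaf with 6 items needs at most $2^6 = 64$ subset checks. This is consistent with the example's choice to stop splitting once 6 items remain.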