Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing
P2P Trust Model: The Resource Chain Model
Sinjae Lee, Shaojian Zhu, and Yanggon Kim, Department of Computer and Information Sciences, Towson University, Maryland, USA
Reputation, on the other hand, concerns the same capabilities, honesty and reliability, but is based on recommendations received from other peers [9].
Abstract Computer security has received increasing attention, and many mechanisms, such as encryption, sandboxing, reputation, and firewalls, have been developed to improve P2P security. Among these technologies, the reputation mechanism is an active method that is especially useful: it automatically records, analyzes, and adjusts peers' reputation, trust, and transaction histories, and it suits the anonymous and dynamic P2P environment. A reputation-based trust mechanism is therefore a suitable basis for securing P2P systems. In this paper, we present a new reputation-based trust management model, the resource chain model, to prevent the spread of malicious content in an open community. The main idea of the resource chain model is to use a routing table to record the credibility of nodes and of their recommending nodes. This model helps us find the best and safest resource location efficiently and decreases the number of malicious transactions.
1.2 P2P Algorithms There are three main algorithms in P2P systems [6]. The first is the centralized directory model, in which all peers publish information about the content they offer for sharing to a central index. The most popular example is Napster. A peer sends a search to the central server, which looks up a node that has the desired file and returns its location to the peer. The transfer then proceeds without further server intervention. The second algorithm is the flooded request model, a pure P2P model that broadcasts requests. Each peer is connected to some number of other nodes, and file requests are sent to all known nodes. If a node has the file, it returns it; otherwise, it forwards the request to all the nodes it knows. This continues until the file is found or the request times out. The last algorithm is document routing [3], which is very efficient for large global communities and is the most popular P2P model. It has two problems: it is difficult to implement, and it suffers from islanding. When a document is published, the algorithm generates a hash based on the document's name and content, then moves the document to the node whose ID is closest to the hash. Examples of this algorithm are Chord, Tapestry, CAN, and Pastry.
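The document-routing placement step can be sketched in Python as follows. This is a minimal illustration, not the code of Chord, Tapestry, CAN, or Pastry: it assumes SHA-1 over name plus content, a small hypothetical 32-bit circular ID space, and "closest" measured as distance around that circle (Chord, for instance, assigns a key to its successor instead).

```python
import hashlib

# Hypothetical identifier width for illustration only; deployed DHTs use
# much larger ID spaces.
ID_BITS = 32
ID_SPACE = 2 ** ID_BITS


def document_key(name: str, content: bytes) -> int:
    """Hash a document's name and content into the node-ID space."""
    digest = hashlib.sha1(name.encode("utf-8") + content).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def ring_distance(a: int, b: int) -> int:
    """Distance between two IDs on the circular ID space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def responsible_node(node_ids: list[int], key: int) -> int:
    """Return the ID of the node closest to the document key."""
    return min(node_ids, key=lambda node: ring_distance(node, key))


if __name__ == "__main__":
    nodes = [10, 1_000_000, 3_000_000_000]
    key = document_key("song.mp3", b"example bytes")
    print(f"key {key} is stored on node {responsible_node(nodes, key)}")
```

Measuring distance around the circle means an ID just below the top of the space is treated as close to small IDs, which keeps the placement rule symmetric.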
1. Introduction Peer-to-peer (P2P) networks differ from client-server networks: simply put, P2P networks have no dedicated clients and servers. The advantages of P2P networks include valuable externalities, lower cost of ownership, and anonymity/privacy. However, because P2P networks are open and anonymous by nature, P2P communities also contain many malicious peers and attackers [7] [8].
1.3 Existing Reputation-based Trust Models EigenTrust is a reputation system that decreases the number of downloads of inauthentic files and is based on the concept of transitive trust [4] [5]. Its solution is to give each peer a trust value based on its previous behavior. The model has three steps. First, compute the local trust values. Then normalize them, so that malicious peers cannot assign arbitrarily high trust values to one another. Finally, aggregate the local trust values.
1.1 Trust and Reputation Trust is a peer's belief about the capabilities of another peer, namely its reliability and honesty, based on its own direct experiences.
0-7695-2909-7/07 $25.00 © 2007 IEEE DOI 10.1109/SNPD.2007.521
However, this model leaves some problems unsolved. First, it does not consider other factors that may affect a trust decision, such as environmental control factors. Second, it is difficult to choose a threshold credibility for judging whether a peer is trustworthy. More complex models include additional factors to make the reputation system more powerful. Besides the feedback S(u,i) that peer u obtains in its i-th transaction and the credibility Cr(p(u,i)) of the peer p(u,i) that gave this feedback, such a model includes transaction factors, such as the transaction context factor TF(u,i), the number of transactions I(u), and the transaction scale, to increase transaction control. Community context factors CF(u) are also included so that the model adapts easily to different situations. The general metric of this model [11] is:
A new local trust value is obtained using a matrix C, which contains the normalized local trust vectors of all the peers in the system (row i holds peer i's opinions of the other peers). To let peer i incorporate the opinions of the neighbors j it asks, the transpose of C is multiplied by the current trust vector, i.e., t(new) = C^T t(old) [1]; repeating this step converges to global trust values. PeerTrust is a dynamic P2P trust model with five important factors [10]: the feedback a peer obtains from other peers; the feedback scope, such as the total number of transactions that a peer has with other peers; the credibility factor for the feedback source; the transaction context factor, which discriminates mission-critical transactions from less critical or non-critical ones; and the community context factor, which addresses community-related characteristics and vulnerabilities.
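The EigenTrust normalization and aggregation steps can be sketched in Python as below. It is a minimal sketch, not the EigenTrust reference implementation: it uses plain lists, a fixed number of iterations instead of a convergence test, and a uniform row for peers with no positive opinions (EigenTrust itself falls back to a distribution over pre-trusted peers).

```python
def normalize_local_trust(s: list[list[float]]) -> list[list[float]]:
    """Turn raw local trust values s[i][j] into the row-normalized matrix C.

    c_ij = max(s_ij, 0) / sum_j max(s_ij, 0). A peer with no positive
    opinions gets a uniform row (a simplification of EigenTrust's
    pre-trusted-peer fallback).
    """
    n = len(s)
    c = []
    for row in s:
        positive = [max(v, 0.0) for v in row]
        total = sum(positive)
        c.append([v / total for v in positive] if total > 0 else [1.0 / n] * n)
    return c


def aggregate(c: list[list[float]], t: list[float], iterations: int = 100) -> list[float]:
    """Repeatedly apply t(new) = C^T t(old), i.e. t_i = sum_j c_ji * t_j."""
    n = len(c)
    for _ in range(iterations):
        t = [sum(c[j][i] * t[j] for j in range(n)) for i in range(n)]
    return t


if __name__ == "__main__":
    C = normalize_local_trust([[0, 4, 1], [2, 0, 2], [3, 1, 0]])
    print(aggregate(C, [1 / 3] * 3))
```

Because every row of C sums to 1, each step preserves the total trust, so starting from a uniform vector keeps the result a probability distribution.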
$$T(u) = a \sum_{i=1}^{I(u)} S(u,i)\, Cr(p(u,i))\, TF(u,i) + b\, CF(u)$$

1.4 Review of Previous Work
Current P2P applications can be classified into three categories: file sharing, distributed processing, and instant messaging [2]. Since file sharing is the most common P2P application, this paper focuses on P2P applications for file exchange. Many researchers have invested considerable effort in reputation-based systems, and some of them have produced good theoretical models. The trust query process [3] is based on trust feedback and a credibility factor: a peer calculates a trust rating and a distrust rating from the responses of the responders and uses them to decide on its next action. Transaction histories are stored in trust vectors, and an integer is associated with each trust vector to indicate how many of its bits are significant. After each transaction, the history bits move one position to the right and the most significant bit is replaced by the latest result. The trust rating and the distrust rating are calculated from the vector's bits. To perform a trust query, a threshold number is first given: the number of peers to be considered, i.e., the number of most trusted responses. No more than this threshold number of peers is queried. The process is as follows:
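The trust-vector bookkeeping described above can be sketched in Python. The shift-and-insert update follows the text; the rating formulas are our assumption, since the text only says the ratings come from the vector's bits: here the trust rating is the binary value of the significant bits divided by its maximum, so recent results weigh more, and the distrust rating uses the complemented bits. `TrustVector` is a hypothetical name.

```python
class TrustVector:
    """Hypothetical fixed-width transaction history (1 = good, 0 = bad)."""

    def __init__(self, width: int = 8):
        self.width = width
        self.bits = 0          # bit (width - 1) is the most recent result
        self.significant = 0   # how many bits hold real history

    def record(self, success: bool) -> None:
        # Shift the history one place right (the oldest result falls off)
        # and store the latest result in the most significant bit.
        self.bits = (self.bits >> 1) | ((1 if success else 0) << (self.width - 1))
        self.significant = min(self.significant + 1, self.width)

    def _history(self) -> tuple[int, int]:
        """Return (value of the significant bits, their maximum value)."""
        top = self.bits >> (self.width - self.significant)
        return top, 2 ** self.significant - 1

    def trust_rating(self) -> float:
        if self.significant == 0:
            return 0.0
        value, maximum = self._history()
        return value / maximum

    def distrust_rating(self) -> float:
        if self.significant == 0:
            return 0.0
        value, maximum = self._history()
        return (maximum ^ value) / maximum  # complemented bits


v = TrustVector()
v.record(True)
v.record(False)
print(v.trust_rating(), v.distrust_rating())
```

A recent failure therefore lowers the trust rating more than an old one, because it sits in a higher-valued bit.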
The first part of this general metric collects all the transaction information, and the second part adjusts for the community's effect on the final reputation result. This model successfully collects all the useful information in the P2P environment and seems very reasonable. In its implementation, the authors use caching to reduce the number of trust-request calculations by keeping a trust cache in each peer's local vector. However, so many transaction factors are hard to collect, and this complex computation will certainly slow down processing. A bigger problem is that such models concentrate on the relationship between a node and its direct neighbors. Usually, when we look for a resource, we obtain a resource path through many peers and their neighbors, and such a resource path carries most of the information useful for future resource searches. We try to make good use of this resource chain, strengthening the whole chain if the end of the chain provides good service and weakening the whole chain if a download fails. That is what our model concentrates on.
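The general PeerTrust metric can be evaluated directly, as in the Python sketch below. It is a minimal sketch under stated assumptions: the weights a = 0.8 and b = 0.2 and the sample feedback values are hypothetical, and `Feedback` is an illustrative container, not part of PeerTrust.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    satisfaction: float        # S(u, i)
    rater_credibility: float   # Cr(p(u, i))
    transaction_factor: float  # TF(u, i)


def peertrust(feedbacks: list[Feedback], community_factor: float,
              a: float = 0.8, b: float = 0.2) -> float:
    """T(u) = a * sum_i S(u,i) * Cr(p(u,i)) * TF(u,i) + b * CF(u)."""
    collected = sum(
        f.satisfaction * f.rater_credibility * f.transaction_factor
        for f in feedbacks
    )
    return a * collected + b * community_factor


history = [Feedback(1.0, 0.5, 1.0), Feedback(0.0, 0.9, 1.0)]
print(peertrust(history, community_factor=0.5))
```

The first term grows with the number I(u) of transactions summed, which is one reason every transaction factor must be collected before the metric can be computed.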
2. Our Approach Even though the previous model [2] successfully collects all the information, there is a problem with how transactions contribute to the final reputation result: the model concentrates only on the relationship between a node and its direct neighbors. When we want to find a resource, we obtain a resource path through many peers, and the goal is to find, through the peers' neighbors, a good resource chain that leads to a successful download.
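$$\text{trust number} = \sum_{i=1}^{k} c_i\, t_i$$

where k is the number of responses considered, and c_i and t_i are the credibility and trust values associated with the i-th response.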
By comparing the trust number with a standard number, the querying peer decides whether to form a positive or negative opinion of the destination peer.
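A trust query with a threshold can be sketched in Python as follows, assuming, as one plausible reading of the process, that the trust number is the credibility-weighted sum of the reported trust values of the k most credible responders, and that a trust number at or above the standard number means a positive opinion. The function name and the `(credibility, trust)` pair layout are hypothetical.

```python
def trust_query(responses: list[tuple[float, float]],
                threshold_peers: int,
                standard_number: float) -> tuple[float, bool]:
    """Return (trust number, positive opinion?) for one destination peer.

    responses: (credibility of responder, trust value it reports) pairs.
    """
    # Keep only the `threshold_peers` most credible responses.
    most_trusted = sorted(responses, key=lambda r: r[0], reverse=True)[:threshold_peers]
    trust_number = sum(c * t for c, t in most_trusted)
    return trust_number, trust_number >= standard_number


number, positive = trust_query([(0.9, 1.0), (0.1, 0.0), (0.5, 1.0)], 2, 1.0)
print(number, positive)
```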
We try to make good use of this resource chain by strengthening the whole chain if the end of the chain provides good service, or weakening the whole chain if a download is unsuccessful. That is what our model concentrates on. When we enter a new P2P community, we obtain some network neighbors. We can then search our neighbors and our neighbors' neighbors to find the resource we want. Since these search paths are sometimes long chains, we use our knowledge to pick the best chain among the candidate chains and finally connect to the end of that chain to obtain the resource. We do not require the chosen chain's credibility to be high in absolute terms; we only make sure we choose the most reliable one among all candidates. The trustworthiness of each P2P node is expressed by its credibility. When we enter a community, we have our own identity (for example, an IP address on the Internet or a MAC address). On entry, we broadcast or multicast a request to join the system and wait for connection replies from other peers. After collecting enough replies, we choose neighbors from the peers that replied. This choice can be based on IP address similarity: we try to find neighbors with quite different IP addresses, so that we obtain neighbors from different groups. Since this choice is only a starting point, some malicious users are likely to be among them. We can still update our neighbor table, adding new neighbors and deleting bad ones according to later transactions, as discussed in the credibility update part. After choosing the neighbors, we add them to our neighbor table and give each neighbor a starting credibility of 0.5. When we begin asking neighbors about resources and later begin transactions, we update this neighbor list. One problem is how many neighbors a node should have.
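The neighbor-selection step can be sketched in Python as below. The starting credibility of 0.5 comes from the text; treating peers that share a /16 IPv4 prefix as the same "group", and filling leftover slots with the remaining repliers, are our assumptions, and `choose_neighbors` is a hypothetical helper.

```python
import ipaddress

INITIAL_CREDIBILITY = 0.5  # starting credibility stated in the text
PREFIX_BITS = 16           # hypothetical grouping granularity


def ip_group(ip: str) -> ipaddress.IPv4Network:
    """Treat peers sharing a /16 prefix as members of the same group."""
    return ipaddress.ip_network(f"{ip}/{PREFIX_BITS}", strict=False)


def choose_neighbors(replies: list[str], max_neighbors: int) -> dict[str, float]:
    """Pick neighbors from repliers, preferring different IP groups."""
    chosen: list[str] = []
    seen_groups = set()
    # First pass: prefer repliers from groups we have not picked yet.
    for ip in replies:
        if len(chosen) >= max_neighbors:
            break
        group = ip_group(ip)
        if group not in seen_groups:
            chosen.append(ip)
            seen_groups.add(group)
    # Second pass: fill any remaining slots with the other repliers.
    for ip in replies:
        if len(chosen) >= max_neighbors:
            break
        if ip not in chosen:
            chosen.append(ip)
    # Every new neighbor starts with the same credibility.
    return {ip: INITIAL_CREDIBILITY for ip in chosen}


print(choose_neighbors(["10.1.0.5", "10.1.9.9", "192.168.1.1", "172.16.0.1"], 3))
```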
Another problem is the search depth. When we do not know where to download a file, we ask our neighbors, which are at depth 1. If our neighbors do not know either, they forward our request to their own neighbors, which are at depth 2, and so on up to depth 3, 4, …, L. The number of neighbors should be reasonable, so that a node can consult its limited set of neighbors and collect their opinions without much computing effort. At the same time, the resource chain should not be too long, to keep the search efficient, yet the search route length should let us cover almost all the nodes in the P2P community.
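Depth-limited request forwarding can be sketched in Python as a breadth-first search. This is an illustration under simplifying assumptions: the overlay is a hypothetical adjacency dictionary, `can_answer(node)` stands for "this node knows where to download the file", a node that can answer replies without forwarding further, and each node handles a request only once.

```python
from collections import deque
from typing import Callable


def search(graph: dict[int, list[int]], origin: int,
           can_answer: Callable[[int], bool], max_depth: int) -> list[list[int]]:
    """Forward a request up to depth max_depth; return the resource chains found."""
    chains = []
    visited = {origin}
    queue = deque([(origin, [origin])])
    while queue:
        node, path = queue.popleft()
        depth = len(path) - 1
        if node != origin and can_answer(node):
            chains.append(path)  # reply with this chain, stop forwarding
            continue
        if depth == max_depth:
            continue             # depth limit reached
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return chains


overlay = {0: [1, 2], 1: [3], 2: [4], 3: [5], 4: []}
print(search(overlay, 0, lambda n: n in {3, 5}, max_depth=2))
```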
These two numbers are related to the scale of the community as follows. N: number of neighbors of each node. L: length of a resource chain. SCALE: scale of the P2P community. Ideally, if each node has N neighbors and the resource chain has length L, the total number of nodes is

$$1 + N + N^2 + N^3 + \cdots + N^{L-1} = \frac{N^L - 1}{N - 1} = \mathit{T\_Scale}$$
So SCALE should be no less than T_Scale, but it cannot be much larger than T_Scale either; T_Scale < SCALE < N x T_Scale is one possible choice. To solve this problem, we first estimate the scale of the community using statistics or other methods, then choose a compromise between N and L using their relationship to the community scale, and adjust this choice of N and L according to the running results.
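For a fixed number of neighbors N and an estimated SCALE, a chain length L satisfying T_Scale < SCALE < N x T_Scale can be found by a simple search, as in this Python sketch (N >= 2 is assumed; the helper names are hypothetical).

```python
from typing import Optional


def t_scale(n: int, length: int) -> int:
    """1 + N + ... + N^(L-1) = (N^L - 1) / (N - 1), for N >= 2."""
    return (n ** length - 1) // (n - 1)


def choose_length(n: int, scale: int) -> Optional[int]:
    """Smallest L with T_Scale(N, L) < SCALE < N * T_Scale(N, L), if any."""
    length = 1
    while t_scale(n, length) < scale:
        if scale < n * t_scale(n, length):
            return length
        length += 1
    return None


print(t_scale(8, 4), choose_length(8, 1000))
```

With N = 8, T_Scale is 585 for L = 4, so a community of about 1,000 nodes gives L = 4. Since N x T_Scale(N, L) = T_Scale(N, L + 1) - 1, a few community sizes fall exactly on a boundary and admit no L; in that case N itself would be adjusted.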
3. Our Model 3.1. Assumptions To simplify the explanation of our model, we make the following assumptions. In Figure 1(a), we have 8 neighbors, and the length of our search is 8. We use a neighbor list for each node to record its neighbor nodes. We assume that the nodes in the list are ordered by credibility: the higher a node's credibility, the higher its position. We represent the relationship between nodes as follows: for 0 < i < 9, Ni = Neighbor(N0, 1), meaning that the distance from Ni to N0 is 1. We assume that N0 initially has the neighbor list shown in Figure 1(a). For each request, we also use a temp table to record all the temporary transaction results until the request is met.
If Ni knows more than one node from which node N can download File X, Ni tells node N only the most reliable one. In addition, each node that takes part in the transaction keeps a temp table that records temporary information: the credibility of the nodes and of their recommending nodes, the nodes it recommends, and the source from which it obtained the information.
3.2.3 Forward requests. Nodes that do not know the location forward the request to their own neighbors. Here, we assume that N5, N6, N7, and N8 do not know where N can download File X, so they forward the request to their neighbors. We introduce these new neighbors of the four nodes as follows:
Figure 1. (a) Neighbor list of node N0; (b) N0 sends its request to its neighbors
3.2. How our model works
For 12 < i < 21, Ni = Neighbor(N5, 1); for 20 < i < 29, Ni = Neighbor(N6, 1); for 28 < i < 37, Ni = Neighbor(N7, 1); for 36 < i < 45, Ni = Neighbor(N8, 1). (The figure is too small to label all the node indexes.)
3.2.1 Request File. As Figure 1(b) shows, if node N0 wants to know where it can download File X in the network, it sends the request to all its neighbors, since it knows only a limited number of neighbors. The request asks: who can tell me where to download File X? N1 to N8 are the neighbors of N0, which N0 can directly ask for the resource location.
3.2.4 Collect all replies after timeout. There are four situations when collecting replies after the timeout. (1) After the timeout, the node that sent out the request can receive many replies. (2) Node N thus obtains several paths of information for downloading the file. (3) From its temp table, node N chooses the path with the highest Neighbor_Credibility x Recommend_Credibility as the most reliable one. (4) Some nodes may not provide any useful information for the request; for example, N7 here cannot obtain any useful information, so it does not reply.
3.2.2 Get partial replies. Nodes that directly know where to download File X reply to N without forwarding the request further. Here, we assume that N1, N2, N3, and N4 know where N can download File X, and that the file is held by new nodes previously unknown to node N0:
N9 = Neighbor(N1, 1); N10 = Neighbor(N2, 1); N11 = Neighbor(N3, 1); N12 = Neighbor(N4, 1).
Afterwards, the temp table of each node is updated. 3.2.5 Download the file. Node N chooses the node with the highest credibility to download the file from. If the download succeeds, the process ends; otherwise, N tries the next best path.
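Selecting a path from the temp table and falling back to the next best one can be sketched in Python. The ranking key Neighbor_Credibility x Recommend_Credibility comes from Section 3.2.4; the table layout (`PathEntry`) and the `try_download` callback are hypothetical stand-ins for the real transfer.

```python
from typing import Callable, NamedTuple, Optional


class PathEntry(NamedTuple):
    """One row of a hypothetical temp table kept by node N."""
    neighbor: str                 # neighbor that replied, e.g. "N3"
    provider: str                 # node it recommends, e.g. "N11"
    neighbor_credibility: float   # Neighbor_Credibility
    recommend_credibility: float  # Recommend_Credibility


def rank_paths(table: list[PathEntry]) -> list[PathEntry]:
    """Order candidate paths by Neighbor_Credibility x Recommend_Credibility."""
    return sorted(table,
                  key=lambda e: e.neighbor_credibility * e.recommend_credibility,
                  reverse=True)


def download_with_fallback(table: list[PathEntry],
                           try_download: Callable[[str], bool]) -> Optional[PathEntry]:
    """Try the most reliable path first, then the next best, and so on."""
    for entry in rank_paths(table):
        if try_download(entry.provider):
            return entry
    return None  # every candidate path failed


table = [PathEntry("N1", "N9", 0.6, 0.5),
         PathEntry("N3", "N11", 0.9, 0.8),
         PathEntry("N2", "N10", 0.7, 0.7)]
print(download_with_fallback(table, lambda provider: provider != "N11"))
```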
3.2.5.1 The download is successful. Here, we assume that N11, recommended by N3, is the best candidate for node N to download File X, and that N successfully downloads the file from N11. Node N then tells N3 that its recommendation was good and raises N3's credibility as thanks. When N3 receives this success information, it likewise raises the credibility of its neighbor N11, which provided the resource to node N. These updates of neighbor credibility adjust the credibility of the whole resource chain: the neighbor credibility of each node that took part in the recommendation and download is updated after the transaction.
Figure 2. Node N gets some replies
In Figure 2, Reply_N_i denotes the reply that node Ni sends to tell node N where to download File X, if Ni knows the location.
A neighborhood chain is thus strengthened after a successful transaction, and the order of the nodes in each neighbor list is adjusted according to their updated credibility.
We then introduce another factor, Change_Factor = Trans_Factor / (1 + Trans_Factor) (range: 0-1), and stabilize it further as Factor = log(Change_Factor + 1) (range: 0-1). For a credibility increase, new_credibility = old_credibility x (1 + Factor); for a credibility decrease, new_credibility = old_credibility x (1 – Trans_Factor).
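This unbounded variant can be sketched in Python, assuming natural logarithms (the text does not fix the base) and file sizes of at least 1. As stated, the decrease rule multiplies by (1 – Trans_Factor), and Trans_Factor = log(file_size) exceeds 1 for any file larger than e size units, so the sketch clamps the result at 0; that clamp is our assumption, not part of the model.

```python
import math


def stabilized_factor(file_size: float) -> tuple[float, float]:
    """Return (Trans_Factor, Factor) for the variant without a maximum file size."""
    trans = math.log(file_size)           # Trans_Factor, unbounded above
    change = trans / (1 + trans)          # Change_Factor, in [0, 1)
    return trans, math.log(change + 1)    # Factor, in [0, ln 2)


def update_credibility_unbounded(old: float, success: bool, file_size: float) -> float:
    trans, factor = stabilized_factor(file_size)
    if success:
        return old * (1 + factor)
    # Assumption: clamp at 0 so a large file cannot make credibility negative.
    return max(0.0, old * (1 - trans))


print(update_credibility_unbounded(0.5, True, 1_000_000))
```

The increase uses the damped Factor, so even very large files raise credibility by less than about 70% per transaction.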
3.2.5.2 The download is unsuccessful. The neighbor credibility of each node that took part in the recommendation and download is updated after the transaction, which adjusts the credibility of the resource chain. The neighborhood chain is weakened after an unsuccessful transaction. In short, the whole resource chain is strengthened after a successful transaction and weakened after an unsuccessful one.
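The chain update of Sections 3.2.5.1 and 3.2.5.2 can be sketched in Python: each node in the chain adjusts the credibility of the next node it relied on, then re-sorts its neighbor list. The multiplicative (1 + factor) / (1 - factor) rule is borrowed from the credibility updates of Section 4; the `neighbor_tables` layout and the factor value are hypothetical.

```python
def update_chain(neighbor_tables: dict[str, dict[str, float]],
                 chain: list[str], success: bool, factor: float) -> None:
    """Strengthen (success) or weaken (failure) every link of a resource chain.

    chain lists the nodes from the requester to the provider,
    e.g. ["N", "N3", "N11"]; each node updates its entry for the next node.
    """
    for recommender, recommended in zip(chain, chain[1:]):
        table = neighbor_tables[recommender]
        old = table[recommended]
        table[recommended] = old * (1 + factor) if success else old * (1 - factor)
        # Keep the neighbor list ordered by credibility, highest first.
        neighbor_tables[recommender] = dict(
            sorted(table.items(), key=lambda kv: kv[1], reverse=True))


tables = {"N": {"N3": 0.5, "N4": 0.6}, "N3": {"N11": 0.5}}
update_chain(tables, ["N", "N3", "N11"], success=True, factor=0.5)
print(tables)
```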
5. Tree Problem about our model One problem with the chain model is that we consider only one best path for each neighbor. Usually, each neighbor can offer more than one resource path, and such paths can be good candidates for our use. Since the binary tree is the most commonly used and simplest tree, we will employ binary trees in our model.
4. Transaction factors and update of credibility after transactions
6. Conclusion
The most important factor to consider is the size of the downloaded file. The worst case is a very large file for which a malicious user offers a download that delivers only 99% of the file, wasting a great deal of bandwidth and time. The usual ways to avoid this case are the following:
It can be concluded from the above that our model addresses the P2P reputation problem efficiently and in a distributed manner. Our model identifies a bad resource chain after each transaction and weakens the whole chain to prevent the problem from recurring. It also raises the participants' transaction reputations when a transaction succeeds. In this way, it encourages everyone in the P2P network to be honest and to benefit from the whole P2P community. This basic idea serves P2P file-sharing systems well.
First, we should add file authentication information: RSA or other authentication methods can improve the credibility of downloaded files. Second, we can divide a big file into parts and download each part separately, using a number FILE_SIZE to indicate the size of each part. This size information gives us the transaction factor, and after the transaction the credibility of the neighbors is updated accordingly.
The main advantages of this model are as follows.
Using this file-size information, we obtain the transaction factor Trans_Factor = log(file_size) / log(max_file_size). This normalization keeps the factor stable, between 0 and 1 for file sizes between 1 and max_file_size. If the transaction is successful, new_credibility = old_credibility x (1 + Trans_Factor); if it is unsuccessful, new_credibility = old_credibility x (1 – Trans_Factor). Every node in the resource chain updates its credibility using the same Trans_Factor.
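The normalized update rule can be sketched in Python as follows, assuming 1 <= file_size <= max_file_size (so Trans_Factor stays in [0, 1]) and a hypothetical dictionary `credibilities` mapping node names to their credibility; the helper names are illustrative.

```python
import math


def trans_factor(file_size: float, max_file_size: float) -> float:
    """Trans_Factor = log(file_size) / log(max_file_size)."""
    return math.log(file_size) / math.log(max_file_size)


def update_credibility(old: float, success: bool, factor: float) -> float:
    """Multiply by (1 + factor) after a success, by (1 - factor) after a failure."""
    return old * (1 + factor) if success else old * (1 - factor)


def update_resource_chain(credibilities: dict[str, float], chain: list[str],
                          success: bool, factor: float) -> None:
    """Apply the same Trans_Factor to every node in the resource chain."""
    for node in chain:
        credibilities[node] = update_credibility(credibilities[node], success, factor)


tf = trans_factor(2 ** 10, 2 ** 20)          # a 1 KiB part against a 1 MiB maximum
credibilities = {"N3": 0.5, "N11": 0.8}
update_resource_chain(credibilities, ["N3", "N11"], success=False, factor=tf)
print(tf, credibilities)
```

Taking the logarithm on both sides means a file a thousand times larger changes credibility only a few times more, which keeps a single large transfer from dominating a node's history.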
Alternatively, log(file_size) can be used as the transaction factor in place of the file size itself, with no maximum file size in the calculation: Trans_Factor = log(file_size) (range: 1-N).
First, searching for the best resource candidate is a heavy task; in our model, this task is shared among node N, its neighbors, its neighbors' neighbors, and so on. This load distribution improves network scalability and processing speed. Second, our model concentrates on the chain effects of each transaction: we strengthen the resource chain of a successful transaction and weaken an unreliable resource chain. Third, we maintain a good neighbor list and dynamically adjust the neighbors' credibility according to transaction results, just as in real society we continually revise our opinions of others according to their deeds.
The feasibility of this model:
Our model can be applied to P2P file-sharing systems to enhance their security and automatic adjustment. Because it collects information automatically and updates reputations according to different system environments, it is well suited to such communities as a technical means of making the P2P world more trustworthy.
[9] Y. Wang and J. Vassileva, “Trust and Reputation Model in Peer-to-Peer Networks”, University of Saskatchewan, Canada, 2004.
Future work:
[10] L. Xiong and L. Liu, “PeerTrust: Supporting Reputation-Based Trust in Peer-to-Peer Communities”, IEEE Transactions on Knowledge and Data Engineering (TKDE), Special Issue on Peer-to-Peer Based Data Management, 16(7), July 2004.
[11] L. Xiong and L. Liu, “A Reputation-Based Trust Model for Peer-to-Peer eCommerce Communities”, In IEEE Conference on Electronic Commerce (CEC'03), Newport Beach, June 2003.
However, some future work remains. First, if a node directly knows the resource location, it does not forward the request any further, which may lead to poor results. Second, different resources carry different credibility. Although it is useful to assign each node a credibility factor, the same node may be credible for some of its resources and not for others. For example, a mathematics professor can supply many useful mathematical materials, while the chemistry or biology materials he provides are less credible. We need to consider these problems in future studies.
7. References
[1] V. Bhat, “Reputation Management in Peer-to-Peer Systems”, University of Texas at Austin, 2004.
[2] E. Damiani, D. C. Vimercati, and S. Paraboschi, “A Reputation-Based Approach for Choosing Reliable Resources in Peer-to-Peer Networks”, CCS’02, Washington DC, USA, 2002.
[3] M. Gupta, P. Judge, and M. Ammar, “A Reputation System for Peer-to-Peer Networks”, NOSSDAV’03, Monterey, California, USA, June 1-3, 2003.
[4] S. Kamvar, M. Schlosser, and H. Garcia-Molina, “The EigenTrust Algorithm for Reputation Management in P2P Networks”, In Proceedings of the Twelfth International World Wide Web Conference, May 2003.
[5] S. Kamvar, M. Schlosser, and H. Garcia-Molina, “EigenRep: Reputation Management in P2P Networks”, WWW2003, Hungary, May 20-24, 2003.
[6] D. Milojicic, “Peer-to-Peer Computing”, HP Laboratories, Palo Alto, 2003.
[7] A. Oram, Peer-to-Peer: Harnessing the Power of Disruptive Technologies, O’Reilly, 2001.
[8] A. A. Selcuk, E. Uzun, and M. R. Pariente, “A Reputation-Based Trust Management System for P2P Networks”, 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), Chicago, USA, 2004.