Computer Networks 50 (2006) 485–500 www.elsevier.com/locate/comnet
P2P reputation management: Probabilistic estimation vs. social networks

Zoran Despotovic *, Karl Aberer

Ecole Polytechnique Fédérale de Lausanne (EPFL), School of Computer and Communication Sciences, CH-1015 Lausanne, Switzerland

Available online 15 August 2005

* Corresponding author. Tel.: +41 21 693 5260; fax: +41 21 693 8115. E-mail address: zoran.despotovic@epfl.ch (Z. Despotovic).
Abstract

Managing trust is a key issue for a wide acceptance of P2P computing, particularly in critical areas such as e-commerce. Reputation-based trust management has been identified in the literature as a viable solution to the problem. The current work in the field can be roughly divided into two groups: social networks, which rely on aggregating the entire available feedback in the network in the hope of achieving as much robustness against possible misbehavior as possible, and probabilistic models, which rely on well-known probabilistic estimation techniques but use only a limited fraction of the available feedback. In this paper we first provide an overview of these techniques and then a comprehensive comparison of the two classes of approaches. We test their performance against various classes of collusive peer behavior and analyze their properties with respect to the implementation costs they incur and the trust semantics they offer to the decision makers. © 2005 Elsevier B.V. All rights reserved.

Keywords: P2P systems; Trust; Reputation; Social networks; Probabilistic estimation
1. Introduction

The availability of ubiquitous communication through the Internet is driving the migration of commerce and business from direct interactions between people to electronically mediated interactions. It is also enabling a transition to peer-to-peer commerce without intermediaries and central institutions, e.g., through P2P networks.
However, to have widely accepted e-commerce in P2P networks one has to eliminate or at least minimize the accompanying risks and threats. They originate primarily in the following requirements:

• The environment must be open, meaning that the users can join and leave it whenever they want. This gives the users a strong feeling of autonomy and independence and can result in various kinds of misbehavior.
The effect is also amplified by many other causes, such as the inherent technological decentralization or the lack of personal contact ("contextual cues" [6]).

• The environment must be decentralized, without central points of failure. In particular, it must be free of trusted third parties that would oversee the transactions and punish or rule out any misbehavior.

• The environment must be global, implying that well-established assurance mechanisms such as litigation are ineffective due to the large transaction costs incurred when crossing jurisdictional borders.

One does not have to go this far in order to identify the need for trust management in P2P networks. Even low-level technological issues, such as behaving according to the underlying P2P protocol, e.g., forwarding queries, leave room for much misbehavior [22].

Reputation systems [20] offer a viable solution to encouraging trustworthy behavior in P2P networks. Their key presumptions are that the participants of an online community engage in repeated interactions and that the information about their past doings is indicative of their future performance and, as such, will influence it. Thus, collecting, processing, and disseminating the feedback about the participants' past behavior is expected to boost their trustworthiness. Recent empirical studies of eBay's reputation mechanism (Feedback Forum) confirm this expectation. Namely, [19] shows that "reputation profiles are predictive of future performance", while [11] and [15] come to the conclusion that Feedback Forum fulfills its promises: positive feedback increases the sellers' worth, while negative feedback reduces it.

There has been a lot of research recently on online trust and reputation management. A considerable fraction of this work targets P2P networks specifically. Due to the expectation of achieving high robustness against a broad range of misbehavior, including various peer collusion patterns, most of the existing approaches aggregate the entire feedback available in the network in order to assess the trustworthiness of a single node. We term this class of work social networks in the rest of the
paper and describe it in Section 4. Besides social networks, probabilistic estimation methods have recently been proposed as a possible solution. Their key properties are an analytic characterization of the underlying peer behavior in terms of a probability distribution and the use of well-known estimation techniques as feedback aggregation strategies. These are discussed in Section 3. Unlike social networks, probabilistic methods normally use only a small portion of the globally available feedback to assess the trustworthiness of any specific peer. Thus one would intuitively expect that they exhibit higher implementation efficiency but also less robustness with respect to their ability to detect misbehavior.

Our main goal in this paper is to compare the main properties of the two classes of approaches in a comprehensive manner. The comparison comprises analyses of (1) their performance with respect to misbehavior detection, (2) their associated implementation overhead and (3) the trust semantics they offer to the decision makers. These analyses are given in Sections 5–7, respectively. It is important to have such a comparison because it enables us to pick the most effective trust management solution upon identifying the main properties of the target environment: the expected peer behavior, the size of the environment, the considered interaction type (e.g., file exchanges), etc. We find that probabilistic estimation techniques incur somewhat smaller implementation costs than social networks. As well, their output values can be interpreted as probability distributions over peers' possible behavior, and thus offer a clear decision making procedure.

We stress that the analysis of the ability to detect misbehavior involves an investigation of the performance of the two classes in the presence of various forms of collusion among groups of peers. To the best of our knowledge, careful evaluations of trust models against collusive peer misbehavior are currently lacking in the literature. With respect to this we offer an analytic view on peer behavior that makes it easy to identify and implement collusive patterns among peers. Our main finding here is that probabilistic estimation generally performs better for smaller sizes of the misbehaving population, while social networks are better when
cheaters comprise around half of the peer population. A collusion pattern in which misbehaving peers split into two groups, one of which never cheats but always misreports on the performance of peers from the other group, has been found to be the most effective against social networks. On the other hand, the simple collusion in which the colluding peers form a single monolithic group is particularly effective against probabilistic estimation techniques.
2. P2P reputation systems

Consider a P2P network in which the peers engage in bilateral interactions in a specific context. Assume that in each interaction a service provider and a service consumer can be identified and that the service consumer rates the provider's trustworthiness in the interaction. The feedback data structure resulting from this process can be naturally represented as a weighted directed multigraph, which we call the trust multigraph in the sequel. Its node set coincides with the set of peers and its set of edges with the set of interactions between them. The service consumer is assumed to be the source of the edge corresponding to any given interaction. The weight assigned to an edge represents the service consumer's feedback on the service provider's trustworthiness in the corresponding interaction. Examples of the feedback set, denote it W, include: the interval [0, 1], the two-element set {0, 1} or any discrete grading such as the four-element set {very good, good, bad, very bad}. The set W =
{r, d} × [0, 1] is another example, in which the contexts of the interactions are separated. Such a feedback set might be used if one wants to deal differently with recommendations of other peers and direct service provisions. In any case, the feedback set W, as well as the semantics associated with its individual elements, is assumed to be universally known and agreed upon. In particular, we are assuming that there is a binary partial ordering relation (call it "greater than") defined on W, with the interpretation that "greater" elements mean better feedback.

Fig. 1 presents a trust graph example. We see from the figure that node b provided the service in question three times to node a. Node a's contentment with peer b's trustworthiness in these interactions was evaluated as 0.8, 0.9 and 1, respectively. Now the central problem that P2P reputation systems address can be defined as follows: how can a given peer use the feedback that is available in the trust multigraph to evaluate the trustworthiness of any other peer? Note that we are not assuming here that every peer knows the entire trust graph. Every peer knows only its own outgoing edges; all other edges must be retrieved from their sources, which can misreport. A possible solution to the problem might be: to assess the trustworthiness of a node (say, peer j from Fig. 1), compute the weighted average of the experiences of the nodes which interacted with that node (peers u and v), where the weights are the trustworthiness values of the feedback originators. This can be expressed by the formula:
Fig. 1. A P2P trust multigraph.
$$t_j = \sum_{e \in \mathrm{incoming}(j)} w_e \, \frac{t_{\mathrm{source}(e)}}{\sum_{f \in \mathrm{incoming}(j)} t_{\mathrm{source}(f)}} \qquad (2.1)$$
where incoming(j) is the set of all edges ending at node j, w_e is the feedback belonging to the edge e and t_source(e) is the trustworthiness of the originator of this feedback. This is exactly what has been proposed in [25]. More generally, to assess peers' trustworthiness we need an algorithm, denote it A, that operates on the formed multigraph, aggregates the feedback available in it, and, for any peer given as its input, outputs a value t ∈ T denoting the estimate of that peer's trustworthiness. Just as with the feedback set W, the trustworthiness levels, represented by the elements of the set T, have globally agreed upon semantics. Note that the feedback set W and the trustworthiness levels set T may coincide, but we do not impose this constraint. All this reasoning leads us to define a P2P reputation system as follows:

Definition 2.1 (P2P reputation system). A P2P reputation system is a quadruple (G, W, A, T), where G is a directed weighted multigraph (P, V), with P being the set of peers and V the set of edges, which are assigned weights drawn from the set W. A is an algorithm that operates on the graph and outputs a specific value t ∈ T for any peer given as its input.

The problem of trust management based on the peers' reputations can now be stated simply as follows: define the type of feedback to be taken from interacting peers about their partners' trustworthiness (set W) and define a strategy to aggregate the available feedback (algorithm A) and output an estimate of the trustworthiness of any given peer (set T) so that trustworthy behavior of the peers is encouraged.
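To make this concrete, the following is a minimal sketch of the weighted-average aggregation of Eq. (2.1); the data structures, the default trust value for unknown originators, and all names are our illustrative assumptions, not part of the model of [25].

```python
# Sketch of Eq. (2.1): trustworthiness of peer j as the average of the
# feedback on its incoming edges, weighted by the originators' trust.
from dataclasses import dataclass

@dataclass
class Edge:
    source: str    # service consumer (feedback originator)
    target: str    # service provider being rated
    weight: float  # feedback w_e drawn from W = [0, 1]

def trustworthiness(j: str, edges: list[Edge], trust: dict[str, float]) -> float:
    incoming = [e for e in edges if e.target == j]
    # Denominator of (2.1): total trust mass of the feedback originators
    # (0.5 is an assumed default for originators with no trust value yet).
    norm = sum(trust.get(e.source, 0.5) for e in incoming)
    if norm == 0.0:
        return 0.0  # no usable feedback about j
    return sum(e.weight * trust.get(e.source, 0.5) for e in incoming) / norm

# Peers u and v rated peer j; u is trusted more, so its report dominates.
edges = [Edge("u", "j", 0.9), Edge("v", "j", 0.4)]
print(trustworthiness("j", edges, {"u": 0.8, "v": 0.2}))  # 0.8
```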
2.1. P2P systems perspective

P2P computing can be seen as the sharing of computer resources (disk storage, processing power, exchange of information, etc.) by direct communication between the participating computing systems, avoiding central control. Thus, it leverages the power of already existing computing resources, enabling a more effective usage of their collective power. We now give a view on existing P2P solutions, with a particular emphasis on reputation data management.

Reputation data can be regarded as a simple database consisting of a binary relation storing (key, value) pairs, where key is the identifier of a peer and value holds the associated reputation information. In a P2P architecture this data has to be distributed among a dynamically evolving set of peers so that basic data access operations, such as search and update, are efficient and the storage space required at each peer is small in comparison with the size of the database. In addition, no central control is used and the system should tolerate dynamically leaving and joining peers. In order to achieve these tasks peers organize themselves in so-called P2P overlay networks. There exist two fundamental approaches to constructing P2P overlay networks: (1) unstructured P2P networks, e.g., [10], and (2) structured P2P networks, e.g., [2,18,23]. In unstructured P2P networks peers are connected in a randomized fashion with a small number of neighbors. (key, value) pairs are randomly associated with peers and broadcasting mechanisms are used for searching. In structured P2P networks peers are associated with keys from the key space and consequently become responsible for storing the (key, value) pairs that correspond to their chosen key, typically close-by keys. They maintain routing tables with references to neighboring peers, constructed such that search requests to a responsible peer can be routed with a low number of hops. Normally, unstructured P2P networks exhibit high lookup costs of O(E) generated messages, E being the number of edges, while in most structured networks this cost is logarithmic in the number of nodes. On the other hand, structured P2P networks incur higher maintenance costs, be it for data insertion and update, or in the presence of node joins and failures.

Having clarified this, we see that the trust graph can actually be stored in an underlying P2P system. In the case of an unstructured P2P network every peer can store its outgoing edges from the trust graph (the identifier of the destination node and possibly a time stamp may act as the key), while
in the case of a structured P2P network the triples (destination, source, time stamp) may act as the keys for the trust graph edges and be stored at peers just as dictated by the P2P network [1], not necessarily at the peers that gave the corresponding feedback. This introduces a new problem for structured P2P networks: the peers storing the feedback may find it profitable to misreport. To this end we assume that the underlying structured overlay network is configured in such a way that the feedback is replicated (the same edge from the trust graph is stored at multiple peers) and that an appropriate voting scheme is employed to eliminate possible misreports by the peers storing the feedback [1]. In the cases of both structured and unstructured P2P networks the weights of the edges may act as the values. Thus, exploring the trust graph actually reduces to searching the underlying P2P network. More specifically, retrieving feedback about any specific peer reduces to searching for the data items whose keys start with that peer's identifier.

It is also possible to use the trust graph directly as a new overlay network on top of the existing P2P overlay network to retrieve the necessary reputation data. However, we do not believe that this is a good idea. In the case of an underlying structured P2P network the reputation data about any specific peer can be retrieved with O(log N) overhead (O(N) to retrieve the entire trust network). The case of underlying unstructured networks is somewhat different but the conclusions are the same. Namely, the trust network should normally have far more edges than the P2P overlay network. Assuming that both networks are explored in a flooding or breadth-first-search-like fashion, in which the number of edges matters, we reach the conclusion that the P2P overlay should be used.
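As an illustration of the keying scheme just described, the following sketch stores trust-graph edges under keys that start with the rated peer's identifier; a plain dictionary stands in for the structured overlay, and all names are ours.

```python
# Illustrative (destination, source, time stamp) keying for trust edges.
# A dict replaces the real overlay (Chord, CAN, P-Grid); in a deployment
# the overlay would also replicate each entry across several storers.
import time

store: dict[str, float] = {}  # stands in for the DHT

def edge_key(destination: str, source: str, ts: float) -> str:
    # Keys start with the rated peer's identifier, so all feedback about
    # a peer is found by a prefix search on that identifier.
    return f"{destination}/{source}/{ts:.6f}"

def publish_feedback(destination: str, source: str, weight: float) -> None:
    store[edge_key(destination, source, time.time())] = weight

def fetch_feedback(destination: str) -> list[float]:
    prefix = destination + "/"
    return [w for k, w in store.items() if k.startswith(prefix)]

publish_feedback("peer_j", "peer_u", 0.9)
publish_feedback("peer_j", "peer_v", 0.4)
print(fetch_feedback("peer_j"))  # [0.9, 0.4]
```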
3. Probabilistic estimation

There are many ways to aggregate the feedback available in the trust multigraph and come up with an estimate of the trustworthiness of a given peer. A distinction among them can be made based on what fraction of the multigraph they use. When only the direct experiences of the peers who interacted with the peer whose trustworthiness is being assessed are considered, it becomes fairly easy to construct probabilistic models, i.e., models whose outputs can be interpreted as probability distributions over the possible behavior of the target peer. In Section 7 we will say more on why this may be important.

A typical probabilistic model would do the following. First, it would explicitly introduce the assumptions about the probabilistic behavior of the peers. For instance, such an assumption might be that any peer is trustworthy with a certain, but unknown, probability. Or, when reporting its own experiences with others, each peer may lie with some, again unknown, probability. Second, it would use well-known probabilistic estimation techniques to estimate all unknown parameters, whereby the feedback is seen as a set of samples. In the following we describe these two points in detail.

3.1. Assumed peer behavior

We assume that a joint probability distribution is associated with the set of all peers, describing their innate characteristics and determining how they behave in the context of trust and when reporting on others' performance. We term this type of peer behavior probabilistic. Thus any event of the form

(…, p_k performs w_k to p_j, …, p_l reports w_l when p_m performs w_m, …),

where w_k, w_l, w_m ∈ W, should have an assigned probability. For instance, a given distribution may specify

P[p_1 misreports on p_3 | p_2 misreports on p_3] ≠ P[p_1 misreports on p_3],

meaning that peers p_1 and p_2 misreport on peer p_3's performance in a coordinated manner, thus forming a collusive group. Needless to say, the exact distribution is not known in advance. The goal of a trust management solution is to devise a method for any trust computation source peer p_s to assess the marginal probability distribution of
any trust computation target peer p_t performing in specific ways: P[p_t performs w_m to p_s], w_m ∈ W. Any given method doing this will exhibit different errors in the estimated marginal for different underlying true joint distributions. This dependency constitutes, in our opinion, an important measure of the quality of a method. We use it in Section 5, in which we present our simulation results. Ideally, we would like to have a feedback aggregation method that exhibits small errors for any joint distribution. However, we will see that neither social networks nor probabilistic estimation techniques behave this way.

We stress that the probabilistic type of peer behavior is normally assumed in the existing works on P2P trust management, social networks in particular. Our contribution here is simply to have written it down in an analytic way, making it explicit that the mentioned marginal distribution is what is to be assessed. Worth mentioning here is that a new body of work is emerging that deals with so-called rational behavior. Rational behavior normally implies that there is an underlying economic model in which utilities are associated with the various choices of the peers and that the peers act so as to maximize their utilities. Game theory appears to be the right tool to apply in such settings. However, game-theoretic reputation models are out of the scope of this paper. We point the reader to [13] for an introduction to the game-theoretic framework for modeling reputation, to [6] for an overview of current work in the area and to [7] for a concrete example modeling an eBay-like auction setting.

3.2. Maximum likelihood estimation

Assume that we have a random variable and that we know the form of its probability distribution but do not know the exact values of the involved parameters. For instance, we might know that a given variable is Bernoulli distributed but we may lack the exact value of the distribution's parameter. Assume further that we are concerned with estimating the unknown values. Knowing the distribution type we can compute the likelihood of any sample set generated from the distribution for general values of the unknown parameters.
Now, having a set of realizations of the considered random variable, we can simply fit the values of the unknown parameters that maximize the computed likelihood. This is the gist of maximum likelihood estimation. The estimates of the parameter values obtained in this way are called maximum likelihood estimates.

Let us describe how [9] applies this method. The authors consider a P2P network consisting of peers having associated innate probabilities of performing honestly in their interactions with others. Let θ_j denote the probability of peer j. Assume that peer j interacted with peers p_1, …, p_n and that its performances in these interactions were x_1, …, x_n, where x_i ∈ {0, 1} (1 denoting honest performance and 0 dishonest). These performances constitute the feedback, thus W = {0, 1}. When asked to report on peer j's performance, witnesses p_1, p_2, …, p_n may lie and misreport. Assuming that they lie with specific probabilities, say l_k for peer p_k, the probability of observing report y_k from peer p_k can be calculated as:

$$P[Y_k = y_k] = \begin{cases} l_k (1 - \theta_j) + (1 - l_k)\,\theta_j & \text{if } y_k = 1,\\ l_k\,\theta_j + (1 - l_k)(1 - \theta_j) & \text{if } y_k = 0. \end{cases} \qquad (3.1)$$

Now, given a random sample of independent reports y_1, y_2, …, y_n, the likelihood function of this sample becomes

$$L(\theta_j) = P[Y_1 = y_1]\,P[Y_2 = y_2]\cdots P[Y_n = y_n]. \qquad (3.2)$$
The maximum likelihood estimation procedure now implies simply finding the θ_j that maximizes this expression. This number is the maximum likelihood estimate of the unknown probability. Note that the unknown parameters l_k are estimated at the level of the whole network, not at the level of specific peers, by periodic checks of reports about performance. Thus they can be interpreted as probabilities of getting a misreport in the network as a whole, rather than probabilities of specific peers lying. Note that the authors introduce an important constraint on peer behavior: peers act independently, they do not collude in any way. Thus one might expect that a good performance of the method will be observed only for a subset of possible behaviors for which the introduced assumption holds.
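The following sketch spells out this procedure for the model of Eqs. (3.1) and (3.2); a simple grid search stands in for a proper numerical optimizer, and the network-wide lying rate l is assumed to have been estimated already.

```python
# Maximum likelihood estimation of theta_j from possibly false reports.

def report_prob(y: int, theta: float, l: float) -> float:
    """Eq. (3.1): probability of report y about a peer that performs
    honestly with probability theta, via a witness lying with rate l."""
    if y == 1:
        return l * (1.0 - theta) + (1.0 - l) * theta
    return l * theta + (1.0 - l) * (1.0 - theta)

def mle_theta(reports: list[int], l: float, grid: int = 1000) -> float:
    """Theta in [0, 1] maximizing the likelihood (3.2) of the reports."""
    best_theta, best_lik = 0.0, -1.0
    for i in range(grid + 1):
        theta = i / grid
        lik = 1.0
        for y in reports:
            lik *= report_prob(y, theta, l)  # product of Eq. (3.1) terms
        if lik > best_lik:
            best_theta, best_lik = theta, lik
    return best_theta

# Seven positive and three negative reports under a 20% misreporting rate:
print(mle_theta([1] * 7 + [0] * 3, l=0.2))  # about 0.83
```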
Quite a good performance has been reported for the case when the assumption is valid [9]. In Section 5 we check the method's performance in the case when it is not valid. Note also that the model is consistent with the definition of P2P reputation systems from Section 2. The output of the method is just a number t ∈ [0, 1], but it can also be viewed as a probability distribution over W = {0, 1}.

3.3. Bayesian estimation

Bayesian estimation offers another possibility for making a probabilistic model. In comparison with maximum likelihood estimation, the main difference lies in assigning a prior probability distribution to the unknown parameter(s) and, given an observed set of samples, calculating its posterior according to Bayes' rule. For example, let us assume that we are estimating the unknown parameter θ of a Bernoulli distributed random variable representing the trustworthiness of a specific peer. Let us assign the prior distribution Beta(a, b) to the parameter θ.¹ If we now observe n realizations of the peer's behavior, k of which were trustworthy, then the posterior distribution of θ becomes Beta(a + k, b + n − k). The Bayes estimator of θ is the expected value of Beta(a + k, b + n − k). It can be shown that this value equals (a + k)/(a + b + n) and that the estimator is asymptotically unbiased and consistent. All this applies directly to the case of allowing only one's own experiences. [16] presents an example in which this approach is used. Apart from this, the authors also provide a lower bound on the number of encounters m one has to have with another peer in order to keep the probability of a specific estimation error within given bounds. It is given by the following inequality:

$$m \geq \frac{1}{2\epsilon^2}\,\ln\frac{2}{\delta},$$
¹ A random variable is distributed Beta(a, b) if its probability density function is f(u) = u^{a−1}(1−u)^{b−1} / B(a, b), 0 < u < 1, a > 0, b > 0, where B(a, b) = ∫₀¹ u^{a−1}(1−u)^{b−1} du.
where ε and δ are the estimation error and confidence level, respectively. However, it was left unspecified how to apply the method to integrate reports from direct witnesses when no meaningful decision can be made based on one's own experiences alone. [4] and [5] make a step towards this. Even though the authors discuss a number of possibilities for dealing with "second hand" opinions, they use an intuitive approach in which all second-level information sources are given ad hoc weights. To the best of our knowledge, there is no P2P reputation model extending the Bayesian estimation technique consistently to take these or higher-level beliefs into account as well. In comparison with the "one source" Bayesian models, the only difference would be that the samples do not come from the same distribution but from different ones, as the original samples now pass through the second-level sources, who may misreport. But the probability distribution of these misreports can also be estimated and updated with new experiences, so that calculating the posterior distribution of the unknown parameters is not harder at all.
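A minimal sketch of the "one source" Beta update just described, together with the sample-size bound quoted above; the uniform prior Beta(1, 1) is our choice for the example.

```python
# Bayesian estimation of theta with a conjugate Beta prior.
import math

def beta_update(a: float, b: float, k: int, n: int) -> tuple[float, float]:
    """Posterior Beta(a + k, b + n - k) after k trustworthy outcomes in n."""
    return a + k, b + (n - k)

def bayes_estimate(a: float, b: float) -> float:
    """Bayes estimator of theta: the posterior mean a / (a + b)."""
    return a / (a + b)

a, b = beta_update(1.0, 1.0, k=8, n=10)  # uniform prior, 8 of 10 honest
print(bayes_estimate(a, b))              # (1 + 8) / (2 + 10) = 0.75

# Encounters needed so that the estimation error stays within eps with
# confidence 1 - delta, per the bound of [16] as reconstructed above:
eps, delta = 0.1, 0.05
print(math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2)))  # 185
```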
4. Social networks

Eq. (2.1), introduced in [25], presents an alternative to the probabilistic estimation techniques. It computes the trustworthiness of a peer as a weighted average of the reports of the direct witnesses of its performance. The weights are the trustworthiness values of the witnesses themselves. Thus, writing down similar equations for all the peers in the network results in a system of N equations with N unknown trustworthiness values, N being the size of the network. A possible way to solve the system is to assign default values to the trustworthiness unknowns and compute Eq. (2.1) recursively until convergence is observed. In any case, no matter how the system is solved, it is clear that one needs to explore the entire trust multigraph in order to assess the trustworthiness of a single peer. This is the key difference between the probabilistic models we just presented and the class of approaches that we consider in this section, which we call "social networks".
Another P2P reputation system that belongs to the class of social networks is proposed in [12]. The method first requires transforming the trust multigraph into a graph: every peer is expected to aggregate its feedback on each of the peers it interacted with in the past and come up with a single value per peer. In addition, a normalization is required so that for each peer the feedback on all outgoing edges sums up to 1. Let us denote the (stochastic) matrix corresponding to the graph by M. Now the gist of the approach is the following: (1) enumerate all the paths from the computation source node to the target node, (2) merge the feedback along the paths by multiplying the weights of the individual edges, and (3) sum up the merged feedback of the paths. Interestingly, it turns out that this operation is equivalent to computing the left principal eigenvector of the matrix M, or to finding its convergent power M^n. This computation can be done locally (but synchronously!), with O(N³) complexity per peer, thus greatly reducing the possibly exponential enumeration of all the paths between two nodes. (See [21] for a precise characterization of the conditions under which this equivalence holds.)

Further approaches based on social-network exploration include works such as [3,14,26] or [17], which was proposed as a solution for ranking Web sites but can be used in this context as well. It is worthwhile to mention that [3] presents one of the rare works separating the contexts of recommendation and direct trust. However, it turns out that this leads to an exponential feedback aggregation algorithm.
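Returning to the eigenvector computation of [12] described above, a compact sketch follows: repeated multiplication by the row-normalized feedback matrix M until the trust vector converges. The matrix values and iteration count are illustrative.

```python
# Power iteration on the stochastic feedback matrix M (left eigenvector).

def normalize_rows(M: list[list[float]]) -> list[list[float]]:
    """Each peer's outgoing feedback is normalized to sum to 1."""
    n = len(M)
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else [1.0 / n] * n)
    return out

def global_trust(M: list[list[float]], iters: int = 100) -> list[float]:
    n = len(M)
    M = normalize_rows(M)
    t = [1.0 / n] * n  # uniform starting vector
    for _ in range(iters):
        t = [sum(t[i] * M[i][j] for i in range(n)) for j in range(n)]  # t <- tM
    return t  # approximates the left principal eigenvector of M

# Three peers; peer 2 receives the strongest feedback from the other two.
M = [[0.0, 0.2, 0.8],
     [0.1, 0.0, 0.9],
     [0.5, 0.5, 0.0]]
print(global_trust(M))  # values sum to 1 and are independent of the source
```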
5. Probabilistic estimation vs. social networks

One of our primary tasks in this paper is to compare the two identified classes of approaches to reputation-based trust management in a comprehensive manner. In the discussion of the similarities and differences between them we pay attention to the following three dimensions: (1) the performance of the methods with respect to how well they can assess the trustworthiness of the peers, (2) the implementation costs, and (3) the trust semantics that they offer to the decision makers. We deal with the latter two in Sections 6 and 7, while this section is devoted to the first one.

So we are interested in how well probabilistic estimation techniques and social networks can predict the trustworthiness of any given peer. Let us define precisely what we mean by peers' trustworthiness. Recall the discussion from Section 3.1. We assume here that a joint probability distribution on the set of all peers characterizes the underlying peer behavior, just as explained there. However, the exact probabilities of the distribution are not known. To have a setting in which both social networks and probabilistic estimation can operate (we will say more on this issue in Section 7), we assume that W = {0, 1}, with the interpretation that 0 means untrustworthy and 1 trustworthy, as well as that T = [0, 1], meaning that the output value t ∈ T for any peer p_i is the marginal P[p_i performs 1 to p_k], where p_k is the peer doing the assessment. Thus the trustworthiness of a peer is precisely this marginal. It is this quantity that is being estimated.

We carried out a set of simulations to see how well the two methods perform this task. As the estimation quality measure we chose the absolute difference between the estimated probabilities of the peers performing trustworthily and their actual values, averaged across all peers. We term it below "the mean absolute error". Precisely, if there are n_p peers p_1, …, p_{n_p} in the network and the probability of performing trustworthily of peer p_i is θ_i, while the estimated value of this probability is θ′_i, then the mean absolute error becomes

$$\mathit{err} = \frac{1}{n_p} \sum_{i=1}^{n_p} |\theta_i - \theta_i'|.$$

To be precise, this quantity was averaged across multiple simulation runs.
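For completeness, a one-function sketch of this error measure; the numbers are made up for the example.

```python
# Mean absolute error between true and estimated trustworthiness values.
def mean_abs_error(theta: list[float], theta_hat: list[float]) -> float:
    return sum(abs(t - e) for t, e in zip(theta, theta_hat)) / len(theta)

print(mean_abs_error([0.9, 0.1, 0.5], [0.8, 0.3, 0.5]))  # ~0.1
```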
5.1. Simulation setting

Let us start with the description of the simulation setting. A complete test of the performance of the two methods requires deriving the dependency of the estimation quality on the behavioral probability distribution for various sizes of the peer population and the trust multigraph, as well as for various topologies of the trust multigraph. Regarding the topology of the trust multigraph, we present here the results that hold for a
power-law-like topology, which we expect to be the most probable in this setting. For instance, in a file downloading scenario a small number of nodes should have large numbers of files and, consequently, interactions, while most of the peers are expected to have just a few of them. Or, such a topology could develop from preferential attachment among the nodes (e.g., through recommendations). We say "power-law-like" because we constrain the number of incoming edges (provided services) of each node to be higher than 10. The reason is rather obvious: we cannot say anything about a peer if we have no feedback on its behavior. We also experimented with uniformly distributed random trust multigraphs, in which the nodes have approximately equal indegrees. The results for this case will be briefly mentioned. Unless specified otherwise, the results below hold for the power-law multigraph topology. The parameter of the distribution was kept constant at 1.1.

As for the behavioral probability distributions, we selected a number of collusion patterns and for each of them we check the performance of the methods against a number of probability distributions that model the collusive behavior in question. We believe that this is a better strategy than testing the performance extensively by generating arbitrary distributions from scratch, without being able to provide intuitive interpretations. All the collusion patterns we considered have the following in common. As said before, the peers' performances of a service in the context of trust are treated as Bernoulli (binary) random variables with the outcomes 0 and 1, meaning untrustworthy and trustworthy. We call θ_i the probability of performing trustworthily for peer i. These probabilities are assigned randomly to the peers at the startup phase and all performance values for the service are generated independently from the service provider's distribution. The independence holds both for any specific peer and across the peers. With respect to reporting on other peers' performance we identify the following classes of collusive behavior; a sketch of how reports are generated under one of these patterns follows the list.

(a) Independent misreporting. The peer population consists of two groups: liars and honest peers. The liars always misreport the
performance of the other peers, while the honest peers always report honestly. The liars do not coordinate their misreporting in any way (e.g., in the sense that they misreport only on the performance of honest peers). Note that even though there is no collusion of the misreporting peers, we will call this scenario "collusive" as well.

(b) Simple collusion. The peers are divided into two groups, liars and honest peers. The honest peers always report honestly. The liars now invert the reports on the performance of the honest peers, while they always report 1 about the performance of the other liars.

(c) Collusive chain. As before, there are liars and honest peers. The liars now form a chain p_{c_1} p_{c_2} ⋯ p_{c_n}. For any 1 ≤ i ≤ n, peer p_{c_i} always reports 1 on peer p_{c_{i−1}}'s performance and always misreports on the performance of all other peers. (We assume here that p_{c_0} ≡ p_{c_n} so that the chain is cyclic.) The honest peers always report honestly. This scenario is motivated by the expectation that social networks are normally susceptible to trust accumulating within loops ("rank sinks" [17]).

(d) Two collusive groups. The peers are divided into two groups, liars and honest peers. The liars further split into two equally sized subgroups; call them collusive groups A and B. Peers from group A always perform honestly. So if peer i belongs to group A then θ_i = 1; note that this is the only case in which we do not generate the probabilities θ_i at random. Further, they always report 1 on the performance of the peers from group B. Likewise, the peers from group B always report 1 on the performance of the peers from group A. In all other cases the peers from both groups A and B behave as non-collusive peers. The main idea here is that the group A peers gain high trust values and get promoted as credible recommenders of the group B peers. The honest peers always report honestly.
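As an illustration, here is how a single report can be generated under pattern (b), "simple collusion"; the function names and the toy parameters are ours.

```python
# Report generation under the "simple collusion" pattern (b).
import random

def performance(theta: float) -> int:
    """Bernoulli service outcome: 1 (trustworthy) with probability theta."""
    return 1 if random.random() < theta else 0

def report(witness_is_liar: bool, provider_is_liar: bool, outcome: int) -> int:
    if not witness_is_liar:
        return outcome      # honest peers always report truthfully
    if provider_is_liar:
        return 1            # liars always praise fellow colluders
    return 1 - outcome      # ...and invert reports on honest peers

random.seed(1)
outcome = performance(theta=0.9)  # an honest provider's true performance
print(outcome, report(witness_is_liar=True, provider_is_liar=False, outcome=outcome))
```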
Note that we also experimented with collusion patterns in which the collusive peers not only collaborate when misreporting on the other peers' performance but also coordinate the qualities of the service they provide to the colluders and to the rest of the population. However, we did not observe a substantial difference between the performance of the simulated methods in this type of setting and in the one described above.

The peer population size was varied in increments of 50 from 150 to 550. The size of the trust multigraph was measured in the average number of incoming edges, or equivalently the number of provided services, per peer, with the constraint we specified earlier. The considered sizes are 20, 30 and 40 incoming edges per peer. The results were averaged over 20 runs.
Fig. 3. Social networks performance as a function of the collusive population size for various collusive behaviors. (Plot omitted; legend: independent misreporting, simple collusion, collusive cycle, two groups collusion; x-axis: fraction of collusive peers; y-axis: mean abs error.)
5.2. Simulation results

Fig. 2 shows how the maximum likelihood estimation method performs in the described collusive settings for various sizes of the collusive groups. Fig. 3 gives the same dependency for the social networks. Error bars are included in the figures. The number of peers and the size of the trust multigraph were kept constant in these experiments, the former at 200, the latter at 30. The fraction of collusive peers was varied in increments of 0.1 from 0.1 to 0.9.
Fig. 2. Maximum likelihood estimation performance as a function of the collusive population size for various collusive behaviors. (Plot omitted; legend: independent misreporting, simple collusion, collusive cycle, two groups collusion; x-axis: fraction of collusive peers; y-axis: mean abs error.)
We can see from Fig. 2 that the maximum likelihood estimation follows a general trend of performing well for sufficiently small and sufficiently high fractions of colluders and performing badly when the colluders become around half of the population. This is simply an effect of the introduced probabilistic assumptions and the method's reliance on the "rate of lying" (l_k in (3.1); see also the paragraph after the equation). If this rate is high then the method simply takes the opposite of the majority of the reports as being true. However, this trend does not hold for the "simple collusion" behavioral pattern; in this case the method performs badly even if colluders make up a majority of the population. We can explain this behavior of the method as follows. As we just said, the quality of the estimation is strongly influenced by the rate of lying in the network. The best quality is achieved when this rate is close to 0 or 1, while the worst is when the rate is around 0.5. In the simple collusion scenario the latter happens as the colluders come to comprise the majority of the population; hence the worst performance for large collusive groups. On the other hand, in the other cases, e.g., the "two (equally sized) collusive groups" scenario, the rate of lying falls far below 0.5 as the collusive fraction approaches 1 and the method can find its way to good predictions.

At the same time, the social networks, Fig. 3, show an almost linear dependency of their performance on the collusive population size. The "two collusive groups" setting obviously results in the worst performance. This is expected for the following reason. The peers from group A always perform honestly and acquire high trust values.
Fig. 4. Maximum likelihood estimation performance as a function of the peer population size for various collusive population sizes. (Plot omitted; legend: 0.1%, 0.3% and 0.5% collusive peers; x-axis: number of peers, 150-550; y-axis: mean abs error.)
Thus they operate as credible recommenders of the group B peers, enabling their misclassification. This suggests that the trustworthiness of the peers should not be used as the measure of their credibility to recommend others. In addition, note that the rate of increase of the estimation error is the highest for this setting.

Let us draw some conclusions from these results. If we compare the qualities of the two methods we can see that the maximum likelihood estimation performs slightly better when the collusive population makes up to around 30% of the overall population. This is also the case for larger fractions of colluders, when they make up approximately 60% or more. So only when the peer population is split into collusive and non-collusive groups of approximately equal sizes do the social networks perform considerably better. However, to select the right method in given circumstances, one has to take the implementation costs into account as well. In Section 6 we will say more on this and give estimates of the implementation overheads of the two approaches.

One might be concerned that in the previous experiment the average number of interactions per peer is too high as compared to the peer population size (30 interactions on average against only 200 peers) and that the methods would require the peers to have an extremely large number of interactions in order to achieve a similar performance in a large-scale network. The second experiment that we ran checks whether this expectation is correct or not. Figs. 4 and 5 show how the methods perform when the size of the population grows while the average size of the trust multigraph is kept constant (30 in this experiment; only the "simple collusion" behavioral pattern was considered here). We can see from the figures that the performance of both methods is almost insensitive to changes in the peer population size and we conclude that the mentioned expectation is not correct. However, there is something worth mentioning here. To be exact, we did observe a small increase of the error for larger population sizes. However, we believe that this is a consequence of the power-law distribution of the interactions.
Fig. 5. Social networks performance as a function of the peer population size for various collusive population sizes. (Plot omitted; legend: 0.1%, 0.3% and 0.5% collusive peers; x-axis: number of peers, 150-550; y-axis: mean abs error.)
Namely, for larger networks we have more nodes with the minimal number of provided interactions. If we assume that the absolute number of such nodes is the most relevant factor for the error, we can expect the mentioned increase. The setting with a uniform distribution of the interactions, which we also tested, offers some help here. In this setting we observed no increase of the error. This is why we believe that the error increase in the "power-law" case is due only to the distribution itself and why we finally draw the conclusion that it is the absolute amount of feedback that determines the quality of the methods, not its size relative to the peer population.

To find the feedback size that is needed for good predictions, we ran another experiment in which we varied the number of interactions per peer while keeping the population size and the underlying behavioral pattern constant (150 and "simple collusion", respectively).
Fig. 6. Maximum likelihood estimation performance as a function of the trust multigraph size for various collusive population sizes. (Plot omitted; legend: 0.1%, 0.3% and 0.5% collusive peers; x-axis: number of interactions per peer, 20-40; y-axis: mean abs error.)
Fig. 7. Social networks performance as a function of the trust multigraph size for various collusive population sizes. (Plot omitted; legend: 0.1%, 0.3% and 0.5% collusive peers; x-axis: number of interactions per peer, 20-40; y-axis: mean abs error.)
The results of this experiment are shown in Figs. 6 and 7. We can see a slight increase of the estimation quality for larger amounts of feedback. In absolute terms, 20–30 interactions per peer suffice to make good predictions. We emphasize that in the previous two experiments we also tested the performance of the methods against the other collusion patterns discussed previously. We did not observe any important difference that might cast doubt on the results presented.
6. Implementation costs

Because P2P networks normally involve millions of nodes, particular attention should be paid to cutting down the total implementation overhead introduced by the employed reputation management solution. It consists mainly of the communication costs associated with the process of retrieving and aggregating the necessary feedback. (The involved storage costs are normally acceptably low, so we will not pay attention to them. The same holds for the computation overhead related to the feedback aggregation, unless there is no way around the problem of all-paths exploration, as outlined above.) The communication costs are mainly determined by the amount of reputation data that the implemented feedback aggregation algorithm needs. Social networks operate on the entire trust (multi)graph. They require a trust-computing peer either to retrieve the graph prior to the local computation or to do the computation synchronously with the other peers. In both cases the accompanying communication costs are high.² On the other hand, probabilistic estimation needs only the feedback about the peer in question. Even better, as we see from the simulations, it needs only a constant number of reports about the peer. So its total implementation costs should be small. But is this true irrespective of the underlying P2P overlay network being used to store the reputation data? Let us answer this question and be more precise about these rough estimates.

Clearly, the implementation costs, which we measure in the total number of messages exchanged among the peers, must depend on the characteristics of the underlying P2P overlay network. If the P2P overlay is structured and has, say, logarithmic search cost, then the total cost of retrieving a constant number of data items will be logarithmic as well. So O(log N) is the total implementation cost of the probabilistic method.
² A caching scheme was proposed in [25] as a fix to this problem. Here, the trust values t_source(e) on the right-hand side of (2.1) are taken from a cache (default values are used in the case of a cache miss). After each computation the computed trust value replaces the corresponding value existing in the cache.
Table 1
Implementation overhead of maximum likelihood estimation and social networks

                        Search cost   MLE        SN
Structured networks     O(ln N)       O(ln N)    O(N)
Unstructured networks   O(E)          O(E)       O(E)
On the other hand, social networks need to explore the entire overlay. Assuming that the overlay has the form of a tree, with N − 1 edges for N nodes, we see that the overhead is O(N) in this case [25]. The situation becomes somewhat different if the overlay is unstructured, or Gnutella-like [10]. Now the unit search cost is O(E), where E is the number of edges in the overlay. As before, this is the cost associated with retrieving the reports in the case of probabilistic estimation. But the very same overhead is required to explore the whole network in a flooding-like fashion. So retrieving all available data items, as needed in the case of social networks, will have exactly this overhead. This reasoning is summarized in Table 1. The main message to be learned from it is: in a structured P2P overlay probabilistic estimation incurs considerably lower implementation costs than social networks; in an unstructured P2P overlay the difference between them is negligible.
7. From reputation to trust

So far we have not paid much attention to the trust semantics that the probabilistic method and the social networks offer to prospective decision makers. Let us start the discussion of this issue by presenting our view on trust and trust management. Following [24], we view trust as inseparable from the vulnerability and opportunism associated with the interacting parties. Consequently, we say that peer A (trustor) trusts peer B (trustee) if the interaction generates a gain to be shared with and by peer B and exposes peer A to a risk of loss if peer B takes too large a portion of the joint gain. As further pointed out in [24], "a general issue of trust management is to assess vulnerability and risks in interactions assuming that the interactants are self-interested."
These assessments must be made in such a way that they enable: (1) reducing the opportunism of the trustee, (2) reducing the vulnerability of the trustor and, after these two issues have been properly addressed, (3) deciding if and when to enter an interaction. The main goal of any trust and reputation management mechanism is, if possible, to completely eliminate the opportunism and/or vulnerabilities of the interacting parties. However, if this is not possible, then the mechanism should assess the risks and enable the participants to decide unambiguously whether to interact or not. The best way to enable risk assessment is to output probability distributions over the set of possible behaviors.

Let us see how probabilistic estimation and social networks respond to this. First, as far as the setting of our simulations is concerned, both can be viewed as outputting probability distributions over the set of possible behaviors W. But this is the case only because of the very special form of the feedback set: it is the two-element set W = {0, 1}. To be exact, the outputs of the social networks can be interpreted as probability distributions only in this particular case. As soon as one tries to have a more fine-grained feedback set, containing more than two elements, it is no longer possible to interpret the social networks' outputs as distributions. At best, they can now be understood as the most likely outcome (provided the output is rounded to the closest value from W). On the other hand, the maximum likelihood estimation method can always be adjusted to give exactly an estimate of the probability distribution on W. (To see this, just think of the necessary changes to be made to Eq. (3.1) if the feedback set W is a three-element set; a sketch follows. See [8] for another example of such a probabilistic model.)
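For instance, a three-element feedback set W = {0, 1, 2} could be handled by replacing Eq. (3.1) with a report model over three outcomes; the uniform-mistake assumption below is our illustration, not the exact model of [8].

```python
# Eq. (3.1) generalized to W = {0, 1, 2}: a witness reports the true
# outcome with probability 1 - l and otherwise one of the two wrong
# values uniformly at random (an assumption made for this sketch).

def report_prob_3(y: int, theta: list[float], l: float) -> float:
    """Probability of report y; theta is a distribution over W = {0, 1, 2}."""
    wrong = sum(theta[w] for w in (0, 1, 2) if w != y)
    return (1.0 - l) * theta[y] + l * wrong / 2.0

# The estimated object is now the whole distribution theta over W:
theta = [0.1, 0.3, 0.6]
print(sum(report_prob_3(y, theta, l=0.2) for y in (0, 1, 2)))  # 1.0
```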
Worth noticing is that in the case of the mentioned two-element feedback set there remains the problem of how plausible the social network's outputs are. Even though the intuition behind Eq. (2.1) is quite clear, it is still ad hoc, not based on a theoretically founded estimation method. A similar problem is present in [12], now even more strongly emphasized. Besides lacking a plausible interpretation for a decision maker, the output values exhibit another property worth mentioning: the trust values of all peers sum up to 1 and they are independent of the computation source. Therefore, they have no plausible interpretation on an absolute scale. Consequently, the scenarios in which they can be used must involve rankings of the values of different peers.
8. Discussion and future work

Our primary goal in this paper was to provide a comprehensive comparison of probabilistic estimation techniques and social networks in the context of predicting the trustworthiness of peers based on the available feedback on their past doings. The comparison was performed under the same parameter settings, involving various patterns of collusive behavior among peers. We found that social networks cannot be applied in as diverse settings as probabilistic estimation techniques and that they normally exhibit the problem of an unclear interpretation of the output values. With respect to implementation efficiency, probabilistic estimation outperforms social networks if the P2P overlay used for storing the reputation data is structured, while the two incur similar costs if the overlay is unstructured. Regarding the ability of the methods to predict likely future behavior based on past evidence, we found that neither of the methods performs perfectly under all possible behavioral patterns that might characterize the underlying P2P community. The "simple collusion" scenario was found to be the most effective against probabilistic estimation, while the "two collusive groups" pattern caused the worst performance of the social networks. Normally, probabilistic estimation performs better than social networks for smaller fractions of collusive peers; the social networks give better predictions when collusive peers comprise around a half of the peer population.

There are, however, a number of issues that have to be addressed in the future. First, we are not aware of any empirical or theoretical confirmation that probabilistic behavior as we defined it characterizes typical online communities and P2P networks in particular. This is in our opinion
a question that has to be investigated. More generally, we believe that a good reputation system design must start with identifying and understanding the underlying behavior, including the derivation of its analytic characterization. Only having done so is one able to devise well-performing feedback aggregation strategies that are as efficient as possible. Second, we did not pay attention in this paper to certain problems that can arise in real-world P2P systems. For instance, we assumed away the fact that normally only a fraction of the peers is online and that, consequently, only a fraction of the needed reputation data can be obtained. Further, if the P2P overlay network is structured then the feedback is not necessarily stored at its originator but at different peers, as dictated by the overlay. The peers storing the feedback may misbehave as well and this gives rise to another dimension of peer behavior that has to be taken into consideration. Though it is obvious that these practical issues will lead to quantitatively different results, we believe that the results should remain qualitatively unchanged. However, this claim has yet to be confirmed.
Acknowledgement The work presented in this paper was partly carried out in the framework of the EPFL Center for Global Computing and supported by the Swiss National Funding Agency OFES as part of the European project Evergrow, No. 001935, and by the Swiss NSF as part of the project Computational Reputation Mechanisms for Enabling Peer-To-Peer Commerce in Decentralized Networks, No. 205121-105287/1.
References

[1] K. Aberer, Z. Despotovic, Managing trust in a Peer-2-Peer information system, in: Proceedings of the IX International Conference on Information and Knowledge Management, Atlanta, Georgia, November 2001.
[2] K. Aberer, P-Grid: A self-organizing access structure for P2P information systems, in: Proceedings of the Sixth International Conference on Cooperative Information Systems (CoopIS 2001), Trento, Italy, September 5–7, 2001.
[3] T. Beth, M. Borcherding, B. Klein, Valuation of trust in open networks, in: Proceedings of the European Symposium on Research in Computer Security (ESORICS), Springer-Verlag, Brighton, UK, 1994, pp. 3–18.
[4] S. Buchegger, J.Y. Le Boudec, The effect of rumor spreading in reputation systems for mobile ad-hoc networks, in: Proceedings of WiOpt '03: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, Sophia-Antipolis, France, 2003.
[5] S. Buchegger, J.Y. Le Boudec, A robust reputation system for P2P and mobile ad-hoc networks, in: Second Workshop on the Economics of Peer-to-Peer Systems, Cambridge, MA, USA, 2004.
[6] C. Dellarocas, The digitization of word-of-mouth: Promise and challenges of online feedback mechanisms, Working paper 4296-03, MIT Sloan School of Management, 2003.
[7] C. Dellarocas, Efficiency and robustness of binary feedback mechanisms in trading environments with moral hazard, Working paper 4297-03, MIT, 2003.
[8] Z. Despotovic, K. Aberer, A probabilistic approach to predict peers' performance in P2P networks, in: Eighth International Workshop on Cooperative Information Agents, CIA 2004, Erfurt, Germany, 2004.
[9] Z. Despotovic, K. Aberer, Maximum likelihood estimation of peers' performances in P2P networks, in: Second Workshop on the Economics of Peer-to-Peer Systems, Cambridge, MA, USA, 2004.
[10] Gnutella. The Gnutella protocol specification v0.4 (document revision 1.2). Available from: http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf, June 2001.
[11] D. Houser, J. Wooders, Reputation in auctions: Theory and evidence from eBay, Working paper, University of Arizona, 2001.
[12] S.D. Kamvar, M.T. Schlosser, H. Garcia-Molina, EigenRep: Reputation management in P2P networks, in: Proceedings of the World Wide Web Conference, Budapest, Hungary, 2003.
[13] D. Kreps, R. Wilson, Reputation and imperfect information, Journal of Economic Theory 27 (1982) 253–279.
[14] S. Lee, R. Sherwood, B. Bhattacharjee, Cooperative peer groups in NICE, in: IEEE Infocom, San Francisco, CA, USA, 2003.
[15] M.I. Melnik, J. Alm, Does a seller's ecommerce reputation matter? Evidence from eBay auctions, Journal of Industrial Economics 50 (3) (2002) 337–349.
[16] L. Mui, M. Mohtashemi, A. Halberstadt, A computational model of trust and reputation, in: Proceedings of the 35th Hawaii International Conference on System Science (HICSS), Hawaii, USA, 2002.
[17] L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical report, Stanford University, Stanford, CA, 1998.
[18] S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, A scalable content-addressable network, in: Proceedings of ACM SIGCOMM '01, 2001, pp. 161–172.
[19] P. Resnick, R. Zeckhauser, Trust among strangers in internet transactions: Empirical analysis of eBay's reputation system, in: Michael R. Baye (Ed.), The Economics of the Internet and E-Commerce, Advances in Applied Microeconomics, volume 11, Elsevier Science, Amsterdam, 2002.
[20] P. Resnick, R. Zeckhauser, E. Friedman, K. Kuwabara, Reputation systems, Communications of the ACM 43 (12) (2000) 45–48.
[21] M. Richardson, R. Agrawal, P. Domingos, Trust management for the semantic web, in: Proceedings of the Second International Semantic Web Conference, Sanibel Island, FL, 2003, pp. 351–368.
[22] S. Saroiu, G. Krishna, S.D. Gribble, A measurement study of peer-to-peer file sharing systems, in: Proceedings of Multimedia Computing and Networking (MMCN) 2002, San Jose, CA, January 2002.
[23] I. Stoica, R. Morris, D. Karger, F. Kaashoek, H. Balakrishnan, Chord: A scalable peer-to-peer lookup service for internet applications, in: Proceedings of the 2001 ACM SIGCOMM Conference, 2001, pp. 149–160.
[24] J.-C. Usunier, Trust management in computer information systems, Working paper, IUMI, HEC, University of Lausanne, Switzerland, 2001.
[25] L. Xiong, L. Liu, PeerTrust: Supporting reputation-based trust in peer-to-peer communities, IEEE Transactions on Knowledge and Data Engineering (TKDE), Special Issue on Peer-to-Peer Based Data Management 16 (7) (2004) 843–857.
[26] B. Yu, M.P. Singh, A social mechanism of reputation management in electronic communities, in: Proceedings of the 4th International Workshop on Cooperative Information Agents (CIA), Boston, USA, 2000, pp. 154–165.
Zoran Despotovic is a research assistant in the Distributed Information Systems Laboratory at the Swiss Federal Institute of Technology in Lausanne (EPFL). He holds a B.Sc. in Computer Science from the Faculty of Electrical Engineering, Belgrade University, Serbia. Prior to joining EPFL, he worked at the Yugoslav Chamber of Commerce and Industry as a member of the IT staff. Zoran is currently pursuing a Ph.D. under the supervision of Prof. Karl Aberer. His research interests include the economics of peer-to-peer systems, with a particular emphasis on reputation-based trust management in peer-to-peer networks, and trading and electronic exchange mechanisms suitable for decentralized systems.
Karl Aberer has been a full professor at EPFL since September 2000. There he heads the Distributed Information Systems Laboratory of the School of Computer and Communication Sciences. His main research interests are distributed information management, P2P computing, the semantic web and the self-organization of information systems. He received his Ph.D. in mathematics in 1991 from the ETH Zurich. From 1991 to 1992 he was a postdoctoral fellow at the
International Computer Science Institute (ICSI) at the University of California, Berkeley. In 1992 he joined the Integrated Publication and Information Systems institute (IPSI) of GMD in Germany, where he became the manager of the research division Open Adaptive Information Management Systems in 1996. He has published more than 100 papers on data management on the WWW, database interoperability, query processing, workflow systems and P2P data management. Recently he was PC chair of ICDE 2005, DBISP2P 2003, DS-9, and ODBASE 2002. He is a member of the editorial board of the VLDB Journal and Web Intelligence and Agent Systems.