Finding reliable users and social networks in a social internetworking system

Pasquale De Meo, Antonino Nocera, Giovanni Quattrone, Domenico Rosaci, Domenico Ursino
DIMET, Università Mediterranea di Reggio Calabria, Via Graziella, Località Feo di Vito, 89122 Reggio Calabria, Italy
ABSTRACT
Social internetworking systems are a rapidly emerging reality: they group together a set of social networks and allow their users to share resources, acquire opinions and, more generally, interact, even if these users belong to different social networks and, therefore, did not previously know each other. In this context the notions of trust and reputation play a very relevant role. These notions have been widely studied in several other contexts, whereas they have been largely neglected in social internetworking research; since this application field presents several peculiarities, the results found in other application contexts are not automatically valid here. This paper introduces a model to represent and handle trust and reputation in a social internetworking system and proposes an approach that exploits these parameters to provide users with suggestions about the most reliable persons they can contact or social networks they can register to.

Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and Software—User profiles and alert services; H.3.4 [Information Storage and Retrieval]: Systems and Software—Distributed Systems; H.3.5 [Information Storage and Retrieval]: Online Information Services; H.4.m [Information Systems Applications]: Miscellaneous

General Terms
Algorithms, Design, Experimentation
Keywords
Social Networks, Social Internetworking, Trust and reputation computation
1. INTRODUCTION
In virtual communities the term trust is generally used to indicate the reliance that a community member associates with another one [6, 12, 19, 22]. In the past the issue of trust management in virtual communities has been investigated in depth and several models and approaches to address it have been proposed (see [20] for an exhaustive survey). In the analysis of trust in virtual communities three main aspects should be considered [19]:
• Trust is a multidimensional concept, since there are many aspects that a trustor should evaluate to determine the reliance of a trustee [5, 11]. For instance, the trust of a buyer in a seller should consider the seller's honesty, experience, precision, efficiency, cooperativeness, and so on. A trustee could have a high value along some of these dimensions and a low value along other ones.
• Trust depends on reference contexts [17]. In fact, a trustor could associate a high reliance with a trustee in one sector and a low reliance in another one. For instance, the trust of a student in his computer science professor could be very high when the latter discusses databases and very low when he discusses disco music.
• When a trustor operates in a virtual community, besides his personal evaluation of the trustee (which expresses his subjective point of view), he should consider the opinion of the whole community about the trustee [1, 12, 13]. This opinion represents a more objective point of view; in the literature it is known as the reputation of the trustee in the community.
The three aspects introduced above should be addressed differently depending on the kind of virtual community under analysis. In this paper we aim to investigate them in the context of social internetworking systems, a rapidly emerging reality [2, 3]. A social internetworking system groups several social networks together in order to allow some forms of interoperability and cooperation among them and their members. A social network is an ideal environment for a user to contact other users in a scenario characterized by a high degree of social interaction; users of a social network can interact with each other to share resources, to exchange opinions, to cooperate towards a goal, and so on. Users of a social internetworking system can be registered to several social networks of the system [12]; a user registered to a social network can interact with all of that network's users; however, users belonging to different social networks cannot directly interact with each other. Often, the interaction among users regards the exchange of opinions about available resources: a user can formulate an opinion; other users can evaluate it, and still others can exploit both the opinion and its evaluations to determine new users to contact or new resources to access [4, 16].
In this scenario three main problems should be addressed. First, a user of a social internetworking system should be able to determine which users are reliable. For this purpose, according to the third aspect mentioned above, he should consider both the subjective and the objective points of view. Moreover, according to the first two aspects reported above, he should perform his evaluations along several reference dimensions and several reference contexts. Some possible dimensions to consider are the honesty, the experience, the precision, the effectiveness and the cooperativeness of the trustee. Some possible contexts are sport, university, music, literature, economy and technology; in this scenario, a context should be regarded as a symbolic name that identifies a set of logically related topics.
As a second problem, even if, in a social internetworking system, users of different social networks cannot directly interact with each other, a certain form of cooperation and interoperability among them should be handled. For instance, the general opinion of the users of a social network about a certain resource shared in the social internetworking system should be made available also to users of different social networks.
As a third problem, social internetworking systems should promote the cooperation and the interaction among users possibly belonging to different social networks and, therefore, unknown to each other. For this purpose a mechanism to support a user in determining the most suitable users with whom he can cooperate appears necessary. Clearly, this support should deeply rely on the notions of trust and reputation, computed along the dimensions and the contexts of interest.
This paper aims to provide a contribution in this setting by addressing all three problems mentioned above. For this purpose it introduces a new model of social internetworking system in which the concepts of trust and reputation play a key role.
In order to address the first problem introduced above, trust and reputation are not defined as single values but as matrices whose rows are associated with dimensions and whose columns are associated with contexts. In order to address the second problem described previously, our model introduces the notion of representative user; this is a fictitious user, associated with a social network, who represents, in a synthetic fashion, the general opinions of the users of that network. The representative user of a social network is automatically registered to all the social networks of the system; as a consequence, representative users are "the bridges" exploited by the system to allow the interaction among users of different social networks. In order to address the third problem introduced above, our model exploits the notions of trust and reputation to determine, for each user of the system, the most promising users he can interact with. For this purpose, given a pair of users u and v, our approach computes the reliability of u as perceived by v by exploiting the trust in u of both v and his closest acquaintances, as well as the reputation of u in the social networks he is registered to. After the reliabilities of all users, as perceived by a user v, have been determined, our approach can select the users with the highest reliabilities and can suggest them to v for a possible cooperation. Some of these users are real whereas others are representative. In the former case v can directly contact them to activate an interaction; in the latter case our approach has found new social networks particularly interesting for v and proposes that he register to them. As a consequence, this mechanism allows a dynamic evolution of the system's social networks, which is a further positive contribution of our approach.
To the best of our knowledge, our model and the corresponding approach to handling user cooperation represent the first attempt to define a social internetworking system capable of facing all the problems mentioned above.
This paper is organized as follows: Section 2 illustrates the proposed model to represent a social internetworking system. The computation of trust and reputation, as well as their exploitation to favour user cooperation, are described in Section 3. Some experiments devoted to measuring the performance of our approach are illustrated in Section 4. In Section 5 we compare our model, and the associated approach, with other related ones already proposed in the literature. Finally, in Section 6, we draw our conclusions.
2. THE SOCIAL INTERNETWORKING MODEL
Our model of a social internetworking system SIS consists of three main components, namely a set U of users, a set S of social networks and a set O of user opinions about resources of different formats (i.e., texts, images, movies, etc.), belonging to different contexts and present in the social networks of S. Some possible contexts are sport, school, art, etc. In the following we indicate by C the set of all possible contexts.
Each social network s ∈ S consists of a set of members, each being a user of U. A user u ∈ U can be registered to several social networks. All the members of a social network can interact with each other; by contrast, no direct interaction is handled among members of different social networks. A fictitious user is associated with each social network; he is
called the representative user and acts as a sort of "delegate" of the whole social network. Each representative user can be contacted by any other user of SIS; therefore, representative users are considered members of all the social networks of S.
In our model an opinion o_u^CS ∈ O is expressed by exactly one user u and refers to a set CS ⊆ C of contexts for which it is valid. o_u^CS can be evaluated by any user v belonging to a social network which u is registered to. This evaluation can be performed along one or more dimensions and one or more contexts. Some dimensions usually considered are effectiveness, expertise, honesty and efficiency. In the following we indicate by D the set of all possible dimensions. Clearly, in performing the evaluation of o_u^CS, v considers the opinion's content as well as his trust in u.
A |D| × |C| matrix E_{v,o} is exploited to represent the evaluation of o_u^CS performed by v. The generic element E_{v,o}[i,j] of E_{v,o} is either NULL or belongs to the real interval [0,1]; it denotes the evaluation of o_u^CS performed by v along the i-th dimension and the j-th context. If v is a real user, then E_{v,o}[i,j] is directly provided by him; clearly, E_{v,o}[i,j] is NULL if j ∉ CS or if v does not want to evaluate o_u^CS along the i-th dimension and the j-th context. If v is a representative user, then E_{v,o}[i,j] is computed by averaging the evaluations of o_u^CS, along the i-th dimension and the j-th context, performed by the users of the corresponding social network; clearly, NULL values are not considered in the average computation and, if all the involved values are NULL, E_{v,o}[i,j] is also set to NULL.
Our model also registers the trust of a user v in a user u. This trust is computed along the same dimensions and contexts considered for opinion evaluations. It is represented by a |D| × |C| matrix T_{v,u}. The generic element T_{v,u}[i,j] of T_{v,u} is either NULL or a real value ranging in the interval [0,1]; it denotes the trust of v in u along the i-th dimension and the j-th context. T_{v,u}[i,j] is computed by averaging the evaluations, along the i-th dimension and the j-th context, of all the opinions of u performed by v. Again, NULL values are not considered in the average computation and, if all the involved values are NULL, T_{v,u}[i,j] is also set to NULL.
Besides trust, which is a subjective measure, our model also considers reputation, which is an objective measure. Reputation is computed along the same dimensions and contexts considered for opinion evaluations and trust computations. The reputation of a user u in SIS is represented by a |D| × |C| matrix R_u. The generic element R_u[i,j] is either NULL or a real value ranging in the interval (0,1]; it denotes the reputation of u along the i-th dimension and the j-th context. R_u[i,j] is computed by applying a suitable methodology, based on the PageRank algorithm [8], to the trusts in u, along the i-th dimension and the j-th context, of all the users who interacted with him in the past.
All the technical details about the computation of E_{v,o}, T_{v,u} and R_u are illustrated in Section 3.1.
We are now able to formalize our model of a social internetworking system. Specifically, a social internetworking system SIS can be represented by means of two data structures. The former, called Component Set and denoted by CSet_SIS, represents the components of SIS (i.e., its users, opinions and social networks). The latter, called Relationship Set and denoted by RSet_SIS, represents the relationships among the components of SIS.
Definition 2.1. A social internetworking system is a pair SIS = ⟨CSet_SIS, RSet_SIS⟩, where CSet_SIS is its Component Set and RSet_SIS is its Relationship Set. □

Definition 2.2. The Component Set of SIS is a tuple CSet_SIS = ⟨U, S, O⟩, where U is the set of its users, S is the set of its social networks and O is the set of its opinions. □

Definition 2.3. The Relationship Set RSet_SIS of SIS is a tuple RSet_SIS = ⟨mm(·), DR, AE⟩ where:
• mm(·) : U → 2^S is a function that maps a user u ∈ U onto the set of the social networks which he is registered to.
• DR = ⟨NS_DR, AS_DR⟩ is a directed labelled graph. NS_DR is the set of its nodes; each node n_u ∈ NS_DR corresponds to a user u of SIS and is labelled by a pair ⟨n_u, R_u⟩, where R_u denotes the reputation of u in SIS. AS_DR is the set of the arcs of DR; each arc of AS_DR is a triplet ⟨n_v, n_u, T_{v,u}⟩, where n_v is the source node, n_u is the target node and T_{v,u} represents the trust of the user associated with n_v in the user associated with n_u.
• AE = ⟨NS_AE, ES_AE⟩ is a labelled and directed bipartite graph. NS_AE = NS'_AE ∪ NS''_AE is the set of its nodes; there is a node n_v ∈ NS'_AE for each user of U and a node n_o ∈ NS''_AE for each opinion of O. ES_AE is the set of the edges of AE; each edge of ES_AE is a triplet ⟨n_v, n_o, E_{v,o}⟩, where n_v is the source node, n_o is the target node and E_{v,o} is the evaluation of the opinion associated with n_o performed by the user associated with n_v. □
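For concreteness, the following minimal sketch shows one possible in-memory representation of the structures of Definitions 2.1–2.3; all class and field names are illustrative choices, not prescribed by the model, and NULL entries are encoded as None.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

Matrix = List[List[Optional[float]]]  # |D| x |C| matrix; None encodes NULL

@dataclass
class ComponentSet:                        # CSet_SIS = <U, S, O>
    users: Set[str]                        # U: user identifiers
    social_networks: Set[str]              # S: social network identifiers
    opinions: Set[str]                     # O: opinion identifiers

@dataclass
class RelationshipSet:                     # RSet_SIS = <mm(.), DR, AE>
    mm: Dict[str, Set[str]]                # mm: user -> social networks he is registered to
    dr_labels: Dict[str, Matrix]           # DR node labels: user -> reputation matrix R_u
    dr_arcs: Dict[Tuple[str, str], Matrix]     # DR arcs: (v, u) -> trust matrix T_{v,u}
    ae_edges: Dict[Tuple[str, str], Matrix]    # AE edges: (v, o) -> evaluation matrix E_{v,o}

@dataclass
class SIS:                                 # SIS = <CSet_SIS, RSet_SIS>
    components: ComponentSet
    relationships: RelationshipSet
```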
3. IDENTIFICATION OF RELIABLE USERS AND SOCIAL NETWORKS

3.1 Computation of opinion evaluations, trusts and reputations
As specified above, given an opinion o_u^CS and a user v, our approach handles a matrix E_{v,o} denoting the evaluation of o_u^CS performed by v. In particular, the generic element E_{v,o}[i,j] specifies the evaluation of o_u^CS performed by v along the i-th dimension and the j-th context. Its value is determined according to the following rules:
• If v is a real user, j ∈ CS and v wants to evaluate o_u^CS along the i-th dimension and the j-th context, then E_{v,o}[i,j] is set to a value belonging to the real interval [0,1] and directly provided by v.
• If v is a real user and either j ∉ CS or v does not want to evaluate o_u^CS along the i-th dimension and the j-th context, then E_{v,o}[i,j] is set to NULL.
• If v is the representative user of a social network s ∈ S, E_{v,o}[i,j] is set to a weighted mean of all the non-NULL evaluations of o_u^CS along the i-th dimension and the j-th context; if all the involved values are NULL, then E_{v,o}[i,j] is set to NULL.
Specifically, let RUS_{s,o}^{i,j} be the subset of real users of s who provided an evaluation of o_u^CS along the i-th dimension and the j-th context; then E_{v,o}[i,j] is computed as follows:
\[
E_{v,o}[i,j] =
\begin{cases}
\dfrac{\sum_{w \in RUS_{s,o}^{i,j}} R_w[i,j] \cdot E_{w,o}[i,j]}{\sum_{w \in RUS_{s,o}^{i,j}} R_w[i,j]} & \text{if } RUS_{s,o}^{i,j} \neq \emptyset \\[3mm]
\text{NULL} & \text{otherwise}
\end{cases}
\tag{1}
\]
This formula shows that, when v is a representative user, E_{v,o}[i,j] is computed by averaging all the evaluations of o_u^CS performed by the users belonging to the social network which v refers to. The evaluations of the users are weighted according to the values of their reputations along the same dimension and context under consideration; specifically, the higher the reputation of a user is, the more his evaluation is taken into account.
Given two users v and u, our approach handles a |D| × |C| trust matrix T_{v,u} which represents the trust of v in u. The generic element T_{v,u}[i,j] of this matrix represents the trust of v in u along the i-th dimension and the j-th context; it is computed by averaging the evaluations of v about the opinions of u along the i-th dimension and the j-th context. Specifically, let ES_{v,u}^{i,j} be the set of evaluations of v about the opinions of u along the i-th dimension and the j-th context; then T_{v,u}[i,j] is computed as follows:
\[
T_{v,u}[i,j] =
\begin{cases}
\dfrac{\sum_{e \in ES_{v,u}^{i,j}} e}{|ES_{v,u}^{i,j}|} & \text{if } ES_{v,u}^{i,j} \neq \emptyset \\[3mm]
\text{NULL} & \text{otherwise}
\end{cases}
\tag{2}
\]
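As an illustration of Equations 1 and 2, the following minimal sketch computes, for a single (dimension, context) pair, the representative user's weighted evaluation and a trust value; function names and data layouts are hypothetical.

```python
from typing import List, Optional

def representative_evaluation(evaluations: List[Optional[float]],
                              reputations: List[float]) -> Optional[float]:
    """Equation 1: weighted mean of the non-NULL evaluations E_{w,o}[i,j] of the
    real users w of the social network, weighted by their reputations R_w[i,j];
    returns None (NULL) if no evaluation is available."""
    pairs = [(e, r) for e, r in zip(evaluations, reputations) if e is not None]
    if not pairs:
        return None
    num = sum(e * r for e, r in pairs)
    den = sum(r for _, r in pairs)
    # reputations lie in (0,1], so den > 0 whenever pairs is non-empty
    return num / den

def trust(evaluations_of_u_by_v: List[float]) -> Optional[float]:
    """Equation 2: T_{v,u}[i,j] as the plain average of the evaluations, along the
    i-th dimension and j-th context, that v gave to the opinions of u."""
    es = evaluations_of_u_by_v
    return sum(es) / len(es) if es else None
```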
Finally, given a user u, our approach handles a |D| × |C| matrix R_u specifying the reputation of u. The generic element R_u[i,j] denotes the reputation of u along the i-th dimension and the j-th context. R_u[i,j] is computed by applying a suitable methodology based on the PageRank algorithm [8]. Specifically, let RUS_u^{i,j} be the subset of real users of U who trusted u along the i-th dimension and the j-th context, and let \widehat{RUS}_u^{i,j} be the subset of real users of U who have been trusted by u along the i-th dimension and the j-th context; then R_u[i,j] is computed as follows:
\[
R_u[i,j] = \frac{\hat{R}_u[i,j]}{\max_{w \in U}\{\hat{R}_w[i,j]\}}
\tag{3}
\]
where \hat{R}_u[i,j] and \hat{R}_w[i,j] denote the absolute reputations of u and w, respectively; they are computed by the following system of equations:
\[
\hat{R}_x[i,j] = \gamma + (1-\gamma) \cdot \left( \sum_{y \in RUS_x^{i,j}} \frac{\hat{R}_y[i,j] \cdot T_{y,x}[i,j]}{\sum_{z \in \widehat{RUS}_y^{i,j}} T_{y,z}[i,j]} \right)
\tag{4}
\]
Here \hat{R}_x[i,j] belongs to the real interval [γ, +∞), whereas R_u[i,j] belongs to the real interval (0,1]. γ is called the damping factor; it can be used to determine the minimum absolute reputation assigned to each user in SIS, as well as to tune the absolute reputation that is "transmitted" from one user to another. According to the formula for \hat{R}_x[i,j], the more a user is trusted by users having a high reputation, the higher his reputation will be. As specified in [8], it is possible to show that the system of Equations 4 admits one solution.
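A minimal sketch of how Equations 3 and 4 could be computed for a single (dimension, context) pair follows; it assumes a plain fixed-point iteration, and the function names, iteration count and value of γ are merely illustrative.

```python
from typing import Dict, Set, Tuple

def absolute_reputations(trust: Dict[Tuple[str, str], float], users: Set[str],
                         gamma: float = 0.15, iterations: int = 50) -> Dict[str, float]:
    """Solves the system of Equation 4 by fixed-point iteration for one
    (dimension, context) pair; trust[(y, x)] is T_{y,x}[i,j] (NULL entries omitted)."""
    # outgoing trust mass of each user y: the denominator of Equation 4
    out_sum = {u: 0.0 for u in users}
    for (y, _z), t in trust.items():
        out_sum[y] += t
    r_hat = {u: gamma for u in users}          # start at the minimum absolute reputation gamma
    for _ in range(iterations):
        incoming = {u: 0.0 for u in users}
        for (y, x), t in trust.items():
            if out_sum[y] > 0:
                incoming[x] += r_hat[y] * t / out_sum[y]
        r_hat = {u: gamma + (1 - gamma) * incoming[u] for u in users}
    return r_hat

def reputations(r_hat: Dict[str, float]) -> Dict[str, float]:
    """Equation 3: normalize absolute reputations by their maximum, so R_u[i,j] lies in (0,1]."""
    m = max(r_hat.values())
    return {u: v / m for u, v in r_hat.items()}
```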
3.2 Computation of the reliability of a user as perceived by another user
Given two users v and u, our approach handles a |D| × |C| matrix RL_{v,u} specifying the estimation of the reliability of u as perceived by v; its generic element RL_{v,u}[i,j] estimates the reliability, along the i-th dimension and the j-th context, that v would assign to u. RL_{v,u}[i,j] should include the following contributions:
(a) The trust T_{v,u}[i,j] of v in u along the i-th dimension and the j-th context; specifically, the higher T_{v,u}[i,j] is, the higher RL_{v,u}[i,j] should be.
(b) The trust T_{z,u}[i,j] in u, along the i-th dimension and the j-th context, of each user z who can be contacted by v; the longer the path (expressed as the number of hops) between v and z in DR is, the less important the contribution of T_{z,u}[i,j] in determining RL_{v,u}[i,j] should be.
(c) The reputation of u along the i-th dimension and the j-th context; specifically, the higher R_u[i,j] is, the higher RL_{v,u}[i,j] should be.
Starting from the previous considerations, a general formula for the computation of RL_{v,u}[i,j] is:
\[
RL_{v,u}[i,j] = \beta \cdot \frac{\sum_{k=0}^{Depth} \alpha^k \cdot \Omega^k_{v,u}[i,j]}{\sum_{k=0}^{Depth} \alpha^k} + (1-\beta) \cdot R_u[i,j]
\tag{5}
\]
In this formula, α ranges in the real interval (0,1] whereas β ranges in the real interval [0,1]; the function Ω^k_{v,u}[i,j] coincides with the trust T_{v,u}[i,j] if k = 0, otherwise it denotes the average trust in u as evaluated by all the users who are k hops away from v in DR (see below). In the computation of RL_{v,u}[i,j] we consider only users who are at most Depth hops away from v in DR; here Depth is a fixed number. Clearly, the farther a user is from v, the smaller his contribution to RL_{v,u}[i,j] should be. To encode this intuition we use a sequence {α^k} of exponentially decreasing weights to weigh the contribution of Ω^k_{v,u}[i,j].
The function Ω^k_{v,u}[i,j] is computed in a recursive fashion. The base case specifies the trust of v in u and corresponds to the contribution (a) mentioned above:
\[
\Omega^0_{v,u}[i,j] =
\begin{cases}
T_{v,u}[i,j] & \text{if } T_{v,u}[i,j] \neq \text{NULL} \\
0 & \text{otherwise}
\end{cases}
\tag{6}
\]
The function Ω^1_{v,u}[i,j] is computed as follows:
\[
\Omega^1_{v,u}[i,j] =
\begin{cases}
\dfrac{\sum_{z \in US_u^{i,j}} \Omega^0_{v,z}[i,j] \cdot \Omega^0_{z,u}[i,j]}{\sum_{z \in US_u^{i,j}} \Omega^0_{v,z}[i,j]} & \text{if } C_1 \text{ holds} \\[3mm]
0 & \text{otherwise}
\end{cases}
\tag{7}
\]
Here US_u^{i,j} represents the subset of users of U who trusted u along the i-th dimension and the j-th context. Condition C_1 specifies that there exists, in DR, at least one user z such that: (i) there exists one arc from v to z; (ii) there exists one arc from z to u; (iii) Ω^0_{v,z}[i,j] ≠ 0. The function Ω^1_{v,u}[i,j] computes the average trust in u of all the users z who are one hop away from v in DR. This average is weighted according to the values of Ω^0_{v,z}[i,j]; specifically, the higher this parameter is, the more Ω^0_{z,u}[i,j] is taken into account.
The general formula to compute Ω^k_{v,u}[i,j], corresponding to the contribution (b) mentioned above, is:
\[
\Omega^k_{v,u}[i,j] =
\begin{cases}
\dfrac{\sum_{z \in US_u^{i,j}} \Omega^{k-1}_{v,z}[i,j] \cdot \Omega^0_{z,u}[i,j]}{\sum_{z \in US_u^{i,j}} \Omega^{k-1}_{v,z}[i,j]} & \text{if } C_k \text{ holds} \\[3mm]
0 & \text{otherwise}
\end{cases}
\tag{8}
\]
Condition C_k specifies that there exists, in DR, at least one user z such that there exists one path of length k from v to z and one arc from z to u. Finally, R_u[i,j] in Equation 5 corresponds to the contribution (c) specified above.
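A minimal sketch of Equations 5–8 for a single (dimension, context) pair follows; the naive recursion is kept for clarity (a real implementation would memoize Ω), and all names are illustrative. The default Depth, α and β are the values found to work best in the experiments of Section 4.

```python
from typing import Dict, Tuple

def omega(k: int, v: str, u: str, trust: Dict[Tuple[str, str], float]) -> float:
    """Equations 6-8 for one (dimension, context) pair: Omega^k_{v,u}.
    trust[(a, b)] holds T_{a,b}[i,j]; absent entries stand for NULL."""
    if k == 0:
        return trust.get((v, u), 0.0)                       # Equation 6
    trusters_of_u = [z for (z, tgt) in trust if tgt == u]   # US^{i,j}_u
    num = den = 0.0
    for z in trusters_of_u:
        w = omega(k - 1, v, z, trust)                       # weight: Omega^{k-1}_{v,z}
        num += w * omega(0, z, u, trust)                    # times Omega^0_{z,u}
        den += w
    return num / den if den > 0 else 0.0                    # condition C_k fails -> 0

def reliability(v: str, u: str, trust: Dict[Tuple[str, str], float],
                reputation_u: float, depth: int = 2,
                alpha: float = 0.75, beta: float = 0.5) -> float:
    """Equation 5: RL_{v,u}[i,j] mixes decayed trust contributions with reputation."""
    weights = [alpha ** k for k in range(depth + 1)]
    trust_part = sum(w * omega(k, v, u, trust) for k, w in enumerate(weights))
    return beta * trust_part / sum(weights) + (1 - beta) * reputation_u
```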
3.3 Suggestion of reliable users and new interesting social networks
In order to suggest reliable users and new interesting social networks to a certain user v ∈ U, our approach behaves as follows. Preliminarily, v is requested to specify a |D| × |C| matrix H_v whose generic element H_v[i,j] denotes the importance that he gives to the i-th dimension and the j-th context. Clearly, v must specify H_v only once, when he registers to the system; whenever he desires, he can update one or more elements of H_v. Then, in order to perform its suggestions, for each user u ∈ U, our approach computes a parameter σ_{v,u}, representing a synthetic estimation of the reliability that v would assign to u. σ_{v,u} is computed as the weighted mean of the components of RL_{v,u}, using the elements of H_v as the corresponding weights; specifically:
\[
\sigma_{v,u} = \frac{\sum_{i=1}^{|D|} \sum_{j=1}^{|C|} H_v[i,j] \cdot RL_{v,u}[i,j]}{\sum_{i=1}^{|D|} \sum_{j=1}^{|C|} H_v[i,j]}
\tag{9}
\]
After this, the users characterized by a reliability greater than a suitable threshold are proposed to v. A suggested user could be real or representative; in the former case v can directly contact him to start an interaction; in the latter case v can register to the corresponding social network.
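A minimal sketch of Equation 9 and of the resulting suggestion step follows; names are illustrative, and the 0.7 threshold is only an example, borrowed from the experiments of Section 4.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # |D| x |C|, with 0.0 standing in for NULL entries

def sigma(H_v: Matrix, RL_vu: Matrix) -> float:
    """Equation 9: weighted mean of RL_{v,u} using the preference matrix H_v as weights."""
    num = sum(h * rl for h_row, rl_row in zip(H_v, RL_vu)
              for h, rl in zip(h_row, rl_row))
    den = sum(h for h_row in H_v for h in h_row)
    return num / den if den > 0 else 0.0

def suggestions(H_v: Matrix, RL_v: Dict[str, Matrix],
                threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Ranks every other user u by sigma_{v,u} and keeps those above the threshold;
    representative users in the result correspond to suggested social networks."""
    ranked = [(u, sigma(H_v, rl)) for u, rl in RL_v.items()]
    return sorted([p for p in ranked if p[1] > threshold],
                  key=lambda p: p[1], reverse=True)
```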
4. EVALUATION
In this section we present an experimental analysis devoted to measuring the performance of our approach. Our analysis has been conceived to answer the following questions: (i) Is our approach able to find users who are actually reliable for a given user v? (ii) Is our approach really able to find "novel" users, i.e., users who are really reliable for v but whom he never contacted in the past? (iii) Does the exploitation of dimensions and contexts improve the performance of our approach?
To carry out our tests we activated two social networks operating in the university domain. The former (hereafter s1) was devoted to BSc students in Computer Engineering, whereas the latter (hereafter s2) was conceived for BSc students in Economics. In each social network, topics of discussion referred to seven contexts, i.e., (i) "academic life", (ii) "travels", (iii) "sport", (iv) "career and job", (v) "night life", (vi) "accommodations" and (vii) "technology". In particular, each student was allowed to post his opinion about these topics and to evaluate the posts submitted by other students. Each post was evaluated along three dimensions, namely: (i) "readability", (ii) "expertise" and (iii) "honesty". Students (hereafter, users) of s1 (resp., s2) were allowed to join s2 (resp., s1). This join process arose spontaneously owing to the transversal nature of some topics (think, for instance, of an announcement about a scholarship open to both Computer Engineering and Economics students). Together, the social networks s1 and s2 formed a social internetworking system SIS.
Table 1: Some statistics about the usage of s1, s2 and the associated social internetworking system

Parameter                                   | s1       | s2       | SIS
Number of users                             | 140      | 180      | 264
Total number of posts                       | 1680     | 2340     | 3289
Min/Avg/Max number of contacts per user     | 1/8/38   | 1/9/46   | 1/10/51
Min/Avg/Max number of posts per user        | 0/12/105 | 0/13/145 | 0/12/125
Min/Avg/Max number of rated posts per user  | 0/7/84   | 0/8/71   | 0/7/81
Table 2: Average Correctness and Average Novelty obtained by our approach for some configurations of the Depth, α and β parameters

Configuration                   | Average Correctness | Average Novelty
Depth = 3, α = 0.95, β = 0.75   | 0.66                | 0.38
Depth = 3, α = 0.95, β = 0.5    | 0.65                | 0.42
Depth = 2, α = 0.95, β = 0.75   | 0.74                | 0.37
Depth = 2, α = 0.95, β = 0.5    | 0.73                | 0.41
Depth = 2, α = 0.75, β = 0.75   | 0.81                | 0.32
Depth = 2, α = 0.75, β = 0.5    | 0.80                | 0.38
Depth = 2, α = 0.75, β = 0.25   | 0.73                | 0.40
Depth = 2, α = 0.5, β = 0.5     | 0.82                | 0.31
Depth = 1, α = 0.5, β = 0.5     | 0.84                | 0.25
Depth = 1, α = 0.5, β = 0.25    | 0.81                | 0.28
We allowed users to access SIS on their own for two months. Some statistics about this usage are reported in Table 1.
In order to verify whether our approach is really capable of predicting users' reliabilities, we carried out the following experiment. For each test user v we ran our approach and constructed the set SysSet_v of users such that, for each user u ∈ SysSet_v, the value σ_{v,u} estimated by our approach was greater than a certain threshold, set to 0.7. After this, v manually selected, among the users of SysSet_v, those he considered really reliable; the selected users formed a set denoted as ManSet_v. Finally, among the users of ManSet_v, we selected those who never interacted with v in the past and called this set NovSet_v. We processed SysSet_v, ManSet_v and NovSet_v to obtain two evaluation measures called Correctness (denoted as Corr_v) and Novelty (denoted as Nov_v); they are defined as follows:
\[
Corr_v = \frac{|ManSet_v|}{|SysSet_v|} \qquad Nov_v = \frac{|NovSet_v|}{|ManSet_v|}
\]
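For clarity, a minimal sketch of the two measures just defined (the set names are illustrative):

```python
from typing import Set

def correctness(sys_set: Set[str], man_set: Set[str]) -> float:
    """Corr_v = |ManSet_v| / |SysSet_v|: share of suggested users judged reliable by v."""
    return len(man_set) / len(sys_set) if sys_set else 0.0

def novelty(man_set: Set[str], nov_set: Set[str]) -> float:
    """Nov_v = |NovSet_v| / |ManSet_v|: share of accepted users v never interacted with."""
    return len(nov_set) / len(man_set) if man_set else 0.0
```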
Correctness indicates the share of the users considered reliable by our approach who have been judged really reliable by v. Novelty indicates the share of the users considered reliable by both our approach and v but who never interacted with v in the past. Both Correctness and Novelty range in the real interval [0, 1]; the higher they are, the better our approach works. A first experiment aimed to find the values of Depth, α and β parameters (see Equation 5) capable of producing the highest values of Correctness and Novelty. For this purpose, we considered several configurations of these parameters and, for each configuration, we computed the Correctness and Novelty obtained for each user. After this we averaged Correctness and Novelty values across all available users. Due to space limitations, in Table 2 we report only some of these configurations. From the analysis of this table it is possible to observe that the best trade-off between Correctness and Novelty is obtained by setting Depth = 2, α = 0.75 and β = 0.5; in this case the Average Correctness and the Average Novelty were 0.80 and 0.38, respectively.
This result can be explained as follows. As for Depth, we observe that for low values of this parameter (e.g., Depth = 1) our approach achieves high values of Correctness but low values of Novelty. This behaviour is explained by considering that, in this case, our approach selects only users who directly interacted with v in the past; clearly, these users are presumably considered reliable by v, and this leads to high values of Correctness. However, this value of Depth penalizes Novelty, because users "quite far" from v are not taken into account. In the opposite case, high values of Depth (e.g., Depth ≥ 3) would strengthen Novelty to the detriment of Correctness. In fact, if Depth increases, our approach suggests to v also users "quite far" from him. Some of them are considered reliable by v, but many others are refused by him; this leads to a decrease of Correctness. However, the probability that the users accepted by v were previously unknown to him is quite high, because these users were "quite far"; this leads to an increase of Novelty. An intermediate value of Depth (e.g., Depth = 2) guarantees the best trade-off between Correctness and Novelty.
As far as α is concerned, we observe that low values of this parameter (e.g., α = 0.25) would give a great relevance to users close to v and an (almost) null relevance to users who are just a few hops away from him. As a consequence, users who did not directly interact with v would be excessively penalized, and this would lead to low values of Novelty. By contrast, high values of α (e.g., α = 0.95) would assign the same "importance" to all users, independently of their closeness to v. This would favour the presence of users "quite far" from v; most of these users could be judged not reliable by him, and this would lead to low values of Correctness. We found that the best trade-off between these two requirements is obtained when α = 0.75.
Finally, we found that the best value for β is 0.5; this implies that, in the computation of the reliability of u as perceived by v, the trust in u of all the other users and the reputation of u are equally important (see Equation 5). Such a result is apparently surprising; intuitively, we would be inclined to give a higher importance to trust than to reputation, since trusts are subjectively provided by users who directly experienced an interaction with u in the past, whereas reputation is a global measure which summarizes the common opinion about the reliability of u and, in our context, is directly connected with the eigenvector centrality of a node in a graph (in our case, the node associated with u in DR). However, in real social networks, the number of interactions carried out by a user is almost negligible in comparison with the network size and the number of interactions taking place therein. As a consequence, the trust information that v has at his disposal to evaluate u is in any case limited and can be usefully complemented by the knowledge of the reputation of u. In this scenario reputation plays the role of the "wisdom of the crowd", i.e., it represents the opinion about the reliability of a user not at the user level (i.e., as perceived by another user) but at the group level (i.e., as collectively perceived by the whole community).
Due to space limitations, in the following, we discuss only five cases: • Case 1. A unique dimension (taking “readability”, “ex-
178
pertise” and “honesty” into account) and a unique context (consisting of the union of the seven contexts specified above) are considered. In this case RLv,u is a scalar. • Case 2. Two dimensions (the former taking “readability” and “expertise” into account and the latter considering “honesty”1 ) and two contexts (the former consisting of the union of the contexts “academic life”, “career and job” and “accommodations” and the latter consisting of the union of the other four contexts) are considered. This is equivalent to handle matrices RLv,u consisting of two rows and two columns. • Case 3. The two dimensions above and four contexts (the first consisting of the union of the contexts “academic life” and “career and job”, the second consisting of the union of the contexts “travels”, “sport” and “night life”, the third being the context “accommodations” and the fourth being the context “technology”) are considered. • Case 4. All the three dimensions into consideration in this experiment and the four contexts specified for Case 3 have been considered. • Case 5. All the three dimensions into consideration in this experiment and all the seven contexts are considered. In Figure 1 we report the values of Correctness and Novelty averaged across all test users for Cases 1–5. From the analysis of this figure it is possible to conclude that the increase of the number of dimensions and contexts has a beneficial impact in both Correctness and Novelty. This behaviour can be explained as follows. Consider Case 1; it is characterized by only one dimension taking “readability”, “expertise” and “honesty” into account. Assume a user u posts opinions characterized by a high “readability” and “expertise” but low “honesty”. The unique considered dimension would be quite high and, consequently, our approach would suggest u to users, even to those who would sacrifice “expertise” and/or “readability” to “honesty”. The adoption of multiple dimensions prevents from these failures because it allows our approach to accurately handle the case in which a user is reliable in some dimensions and not reliable in other ones in such a way as to suggest him in the former case and to not suggest him in the latter one. A symmetrical reasoning shows that the adoption of multiple contexts, instead of only one, has positive implications on Correctness and Novelty.
[Figure 1: Average Correctness and Average Novelty achieved by our approach for Cases 1–5]

5. RELATED WORK
Trust and reputation parameters have been widely investigated in the computer science literature and many approaches have been proposed in various research areas (see [6] for a detailed survey). However, to the best of our knowledge, the problem of computing trust, reputation and reliability in a social internetworking system has not been addressed yet. In this section we compare our approach with some approaches conceived to compute trust and reputation in a
single social network. These approaches can be classified into three categories, namely: graph-based, link-based and expert finding. These categories are examined in detail in the next subsections.
5.1 Graph-based approaches
Graph-based approaches model a social network as a graph G whose nodes represent users [1, 12, 13, 22]. An edge linking two nodes v and u indicates that the user v explicitly trusts the user u. G is usually sparse because a user typically evaluates only a handful of other users; as a consequence, various techniques have been proposed to infer implicit trust relationships. In detail, the approach of [1] applies a maximum network flow algorithm on G to compute the trust between any pair of users. In [12] the authors apply a modified version of the Breadth First Search algorithm on G to infer multiple values of reputation for each user; these values are then aggregated by applying a voting algorithm to produce a final (and unique) value of reputation for each user. The approach of [13] considers paths up to a length k in G and propagates the explicit trust values along them to obtain the implicit ones. In [22] trust values are computed by applying a spreading activation algorithm.
Our approach is closely related to the graph-based ones. Specifically, like [13], it aggregates the opinions of many users to avoid biases (which may correspond to malicious users); moreover, differently from [13], trust relationships can evolve over time and are recursive.
Our approach presents some relevant novelties w.r.t. graph-based ones. Specifically, in our approach, trust values are merged with reputation values to obtain a global value representing the reliability of a user. Moreover, the graph-based approaches described above do not allow opinions expressed by a user to be reviewed by other users; this supplementary feature, typical of our approach, allows more precise and objective trust computations. In addition, our approach considers a multidimensional model for trust and reputation; by contrast, all the approaches described above model trust as a binary value or as a real value ranging in the interval [0, 1].
5.2 Link-based approaches
Link-based approaches use ranking algorithms such as PageRank [8] or HITS [15], which have been successfully applied in the context of Web search, to find trust values. For instance, [14] proposes Eigentrust, an approach based on PageRank to measure peer reputation in a peer-to-peer network. The approach of [18] defines a probabilistic model of trust which strongly resembles that described in [14]; however, differently from the latter, the approach of [18] computes and handles trust values rather than reputation values. In [10] the authors present an algorithm which computes global reputation values in a peer-to-peer network; the proposed algorithm uses a personalized version of PageRank along with information about the past experiences of peers. Experimental tests have indicated that link-based methods can obtain precise results and are often attack-resistant, i.e., they can resist attempts to manipulate reputation scores.
There are some similarities between our approach and the link-based ones. Specifically, also in our approach user reputation is computed by applying a modified version of the PageRank algorithm, and this information is adopted to compute user reliability. In addition, as in the approach of [10], information about past user interactions is exploited in the computation of user reliability.
As for the main differences between our approach and the link-based ones, we can observe that in some approaches trust is conceived as a measure of performance; for instance, in Eigentrust, the trust of a peer depends on the success of downloading a file from it and, therefore, trust depends on parameters like the number of corrupted files stored in the peer or the number of connections with the peer that have been lost. By contrast, in our approach, trust quantitatively encodes the confidence of a user in the opinions formulated by other ones. As a further difference, our approach has been designed to operate in social sites in which users can evaluate resources and, possibly, can rate the evaluations of other users. This information could usefully complement the rankings produced by applying link-based approaches.
5.3 Expert finding approaches
Some approaches have focused on the problem of finding highly expert users in online communities; this problem is known as expert finding. For instance, [21] considers an online forum in which users can make posts of various kinds (e.g., they can submit a question or make an announcement) and can answer posts made by other users. User relationships are modelled by means of a directed graph; each node of this graph corresponds to a user; an arc is created between the node corresponding to the user who made a post and the node corresponding to the user who replied to it. Finally, ranking algorithms, like PageRank or HITS, are applied on the graph to rank each user. In [9] the authors consider a social network consisting of users who have exchanged e-mail messages. This network is represented by means of a graph in which each node corresponds to a user and each arc links two users who exchanged at least one message. Users are ranked by applying the HITS algorithm on this graph. In [7] the authors propose a technique to find experts within a firm. This technique defines a probabilistic generative model to represent experts' skills; given a query q, it ranks candidates according to their probability of being experts in the topics appearing in q.
Our approach is clearly related to expert finding ones because both aim to search a social network and to identify chains of users who can get in touch. However, there is a subtle (but important) distinction between them. In fact, expert finding approaches aim to find the best expert for a given user request, i.e., the expert with the highest expertise value on a given topic. Our approach's perspective is different because we think that, in real social networks, there is a plurality of forms by which a user can interact with other ones, and these interactions can bring him different kinds of added value. For instance, think of a social network formed by Web developers. In some cases a user could be more interested in organizing a debate involving the maximum possible number of reliable community members rather than in finding the most skilled experts on a certain topic. Clearly, an expert finding approach would fail in supporting him because it would suggest him only one expert (even if the best one). Our approach can support this goal much better in that it is capable of determining the reliability of the involved users. Owing to the multidimensional nature of our approach, this reliability considers not only the expertise but also the honesty, the precision, the efficiency, the cooperativeness, etc. of the involved users. Once user reliabilities have been found, our approach suggests a pool of users or, even, new social networks to contact and, for each of them, specifies the estimated reliability degree.
6. CONCLUSIONS
In this paper we have proposed a new model and a related approach to represent and handle trust and reputation in social internetworking systems. We have seen that a social internetworking system consists of a set of social networks; users of the system can be registered to several social networks; all users of a social network can interact with each other, whereas no direct interaction is possible among users of different social networks. However, the necessity arises to promote a certain interaction among users of different social networks who share interests and did not previously interact only because they did not know each other. To favour this interaction we have proposed an approach that exploits our trust and reputation model to provide a user with suggestions about the most reliable users he can contact or social networks he can register to. We have also presented some tests that we have carried out to evaluate the performance of our approach. Finally, we have compared our approach with some related ones already proposed in the literature.
As for future developments of our approach, we argue that it could be extended to handle, in a more sophisticated fashion, the dynamic evolution of social networks in a social internetworking system; for instance, it could promote the merging of two excessively similar social networks or the splitting of a scarcely cohesive social network. In addition, we plan to integrate our approach with a recommender system in such a way as to propose to a user of a social internetworking system new resources produced or accessed by other users who appear reliable to him, possibly belonging to social networks he did not know.
7. REFERENCES
[1] Advogato’s trust metric. http://www.advogato.org/trust-metric.html, 2000. [2] FriendFeed. http://friendfeed.com/, 2009. [3] Google Open Social. http://code.google.com/intl/it-IT/apis/ opensocial/, 2009.
[4] E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne. Finding high-quality content in social media. In Proc. of the International Conference on Web Search and Web Data Mining (WSDM ’08), pages 183–194, Palo Alto, CA, USA, 2008. ACM Press. [5] J. Ahn, X. Sui, D. DeAngelis, and K.S. Barber. Identifying beneficial teammates using multi-dimensional trust. In Proc. of the International Joint conference on Autonomous agents and Multiagent systems (AAMAS ’08), pages 1469–1472. International Foundation for Autonomous Agents and Multiagent Systems, 2008. [6] D. Artz and Y. Gil. A survey of trust in computer science and the Semantic Web. Web Semantics: Science, Services and Agents on the World Wide Web, 5(2):58–71, 2007. [7] K. Balog, L. Azzopardi, and M. de Rijke. Formal models for expert finding in enterprise corpora. In Proc. of the International ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR 2006), pages 43–50, Seattle, WA, USA, 2006. ACM Press. [8] S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks, 30(1-7):107–117, 1998. [9] C.S. Campbell, P.P. Maglio, A. Cozzi, and B. Dom. Expertise identification using email communications. In Proc. of the International Conference on Information and Knowledge Management (CIKM ’03), pages 528–531, New Orleans, LA, USA, 2003. ACM Press. [10] P. Chirita, W. Nejdl, M. T. Schlosser, and O. Scurtu. Personalized Reputation Management in P2P Networks. In Proc. of the International Workshop on Trust, Security, and Reputation on the Semantic Web, CEUR Workshop Proceedings, Hiroshima, Japan, 2004. CEUR-WS.org. [11] D. Gefen. Reflections on the dimensions of trust and trustworthiness among online consumers. SIGMIS Database, 33(3):38–53, 2002. [12] J. Golbeck and J.A. Hendler. Inferring binary trust relationships in Web-based social networks. ACM Transactions on Internet Technology, 6(4):497–529, 2006. [13] R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust. In Proc. of the International Conference on World Wide Web (WWW ’04), pages 403–412, New York, NY, USA, 2004. ACM Press. [14] S.D. Kamvar, M.T. Schlosser, and H. Garcia-Molina. The Eigentrust algorithm for reputation management in P2P networks. In Proc. of the International Conference on World Wide Web (WWW 2003), pages 640–651, Budapest, Hungary, 2003. ACM Press. [15] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999. [16] H. Liu, E. Lim, H.W. Lauw, M. Le, A. Sun, J. Srivastava, and Y.A. Kim. Predicting trusts among users of online communities: an epinions case study. In Proc. of the ACM Conference on Electronic Commerce
(EC ’08), pages 310–319, Chicago, IL, USA, 2008. ACM Press.
[17] M. Rehak and M. Pechoucek. Trust modeling with context representation and generalized identities. In Proc. of the International Workshop on Cooperative Information Agents (CIA ’07), pages 298–312, Delft, The Netherlands, 2007. Springer.
[18] M. Richardson, R. Agrawal, and P. Domingos. Trust Management for the Semantic Web. In Proc. of the International Conference on Semantic Web (ISWC 2003), pages 351–368, Sanibel Island, FL, USA, 2003. Lecture Notes in Computer Science, Springer.
[19] J. Sabater and C. Sierra. REGRET: reputation in gregarious societies. In Proc. of the International Conference on Autonomous Agents (Agents 2001), pages 194–195, Montreal, Quebec, Canada, 2001. ACM Press.
[20] J. Sabater and C. Sierra. Review on computational trust and reputation models. Artificial Intelligence Review, 24(1):33–60, 2005.
[21] J. Zhang, M.S. Ackerman, and L.A. Adamic. Expertise networks in online communities: structure and algorithms. In Proc. of the International Conference on World Wide Web (WWW 2007), pages 221–230, Banff, Alberta, Canada, 2007. ACM Press.
[22] C. Ziegler and G. Lausen. Propagation models for trust and distrust in social networks. Information Systems Frontiers, 7(4-5):337–358, 2005.