eTrust: Understanding Trust Evolution in an Online World Jiliang Tang, Huiji Gao and Huan Liu
Atish Das Sarma
Computer Science & Engineering Arizona State University Tempe, AZ 85281 {Jiliang.Tang, Huiji.Gao, Huan.Liu}@asu.edu
eBay Research Labs eBay Inc. San Jose, CA, USA
ABSTRACT Most existing research about online trust assumes static trust relations between users. As we are informed by social sciences, trust evolves as humans interact. Little work exists studying trust evolution in an online world. Researching online trust evolution faces unique challenges because, more often than not, available data is from passive observation. In this paper, we leverage social science theories to develop a methodology that enables the study of online trust evolution. In particular, we propose a framework of trust evolution, eTrust, which exploits the dynamics of user preferences in the context of online product review. We present technical details about modeling trust evolution, and perform experiments to show how the exploitation of trust evolution can help improve the performance of online applications such as rating and trust prediction.
Categories and Subject Descriptors H3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Information filtering
General Terms Algorithms, Design, Experimentation
Keywords Multi-faceted Trust, Trust Evolution, User Preference
{atish.dassarma}@gmail.com
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD’12, August 12–16, 2012, Beijing, China. Copyright 2012 ACM 978-1-4503-1462-6 /12/08 ...$10.00.
1. INTRODUCTION The notion of online trust has attracted increasing attention from the computer science community in recent years [8, 22, 15]. Trust plays a central role in helping users overcome perceptions of risk and insecurity [6], especially for online users who seek advice from trusted sources to make decisions. The trustworthiness of the users is often tantamount to the reliability of the information they provide. Trust is widely exploited to help online users collect reliable information in applications such as high-quality review detection [19] and product recommendation [10].
Most existing work on online trust assumes static trust relationships between users [27, 8, 7]. However, findings from the social sciences show that trust evolves as humans interact [11, 30, 12]. Sociologists investigate the evolution of trust in a physical world [30, 29]. Recent years have witnessed many trust-related online applications such as intelligent recommender systems [21, 7], collaborative filtering [10, 2], review quality prediction [19] and viral marketing [26], but little work addresses online trust evolution because of some unique challenges. For example, in an online world, most of the information that a sociologist collects for trust evolution may not be available. In a physical world, sociologists first invite a group of participants, usually a small group, and then record their sociometric information on interpersonal trust and the conditions or situations for the change [30]. In an online world, users are distributed all over the world. Even if one could find a study group, it would be difficult to gather sociometrics on trust. In other words, passive observation is the modus operandi for studying trust evolution. Since it is important to study trust evolution, we ask whether we can study online trust evolution and how we can gather online information to model trust evolution.
An important characteristic of product review sites such as Epinions¹ is that there exist trust networks among users. Such sites provide a sensible platform to study trust in an online world [8, 24, 18, 20]. Figure 1 shows a simple online rating system from Epinions at two different time points, denoted as T1 and T2. There are two types of objects: users (u1 to u5) and items (I1 to I5), and two types of actions: establishing trust relations among users and creating ratings from users to items.
The rating system evolves over time, as highlighted in Figure 1(b), when (1) new users (e.g., u5) and new items (e.g., I5) are added; and (2) new trust relations (e.g., u2 → u1) and new ratings (e.g., u5 → I4) are created. The dynamic online rating system on product review sites serves as an observable environment to investigate online trust evolution. First, trust networks with temporal information from product review sites are available, including the time points when users entered and when trust relations were established. Second, sources reflecting the changing of user preferences are available from product review sites. The dynamics of user preferences can be captured by their rating information, widely exploited for collaborative filtering [4, 14, 16]. The evolution of people’s trust can be witnessed via the changes of their preferences [11]. For example, when people are interested in “Electronics” at time t, they trust experts in “Electronics”; when they shift their preferences to “Sports” at time t+1, they trust experts in that facet. Ziegler et al. [32] pointed out that there is a strong and significant correlation between trust and user preference similarity. The more similar two persons are, the greater the trust between them. In other words, trust relationships will evolve with the drifting of user preferences.
¹ http://www.epinions.com
[Figure 1: A Simple Example of Rating System at Time T1 and T2. Panels: (a) Rating System at T1; (b) Rating System at T2. Each panel shows the users and the trust network, the items, and the ratings that link users to items.]
In this paper, we make a first attempt to study online trust evolution by exploiting the dynamics of user preferences in the context of online product review. Main contributions of this work include:
• Providing a methodology to study the evolution of trust in an online world;
• Proposing a framework, eTrust, to understand trust evolution by exploiting the dynamics of user preferences of online product review;
• Presenting findings from this eTrust study about the change of user preferences w.r.t. users or items, and trust relations; and
• Evaluating eTrust via online applications such as rating and trust prediction using real-world data.
The rest of the paper is organized as follows. The problem of trust evolution is formally stated in Section 2. Section 3 introduces the details about our proposed framework, eTrust. Section 4 discusses applications of eTrust. Section 5 presents experimental results and findings about eTrust. Section 6 briefly reviews related work. Finally, Section 7 concludes this study with future work.
2. PROBLEM STATEMENT
A rating system of product review sites consists of two types of objects with two different actions, as shown in Figure 1(a). Let $\mathcal{U}_t = \{u_1, u_2, \ldots, u_{n_t}\}$ be the set of users at time $t$ (e.g., $\mathcal{U}_{T_1} = \{u_1, u_2, u_3, u_4\}$), where $n_t$ is the number of users at time $t$. $\mathcal{I}_t = \{I_1, I_2, \ldots, I_{m_t}\}$ is the set of items at time $t$ (e.g., $\mathcal{I}_{T_1} = \{I_1, I_2, I_3, I_4\}$), where $m_t$ is the number of items at time $t$. Note that new users or new items may be added², thus $\mathcal{U}_t \subseteq \mathcal{U}_{t+1}$ and $\mathcal{I}_t \subseteq \mathcal{I}_{t+1}$. Users can perform two types of actions in the rating system: establishing new trust relations and creating new ratings. Let $\mathbf{X}_t \in \mathbb{R}^{n_t \times n_t}$ denote the trust network at time $t$, where $\mathbf{X}_t(i, j) = 1$ if $u_i$ is trusted by $u_j$ at time $t$ and zero otherwise. Let $\mathbf{R}_t \in \mathbb{R}^{n_t \times m_t}$ represent the ratings at time $t$, where $r_{ij}^t$ denotes the entry at the $i$-th row and $j$-th column, i.e., the rating of $I_j$ given by $u_i$ at time $t$.
² We do not consider the deletion of existing users or items in this work.
Users may have quite different preferences over items from different facets. We assume that there are $K$ latent facets for items and that a user has similar preferences over items of the same latent facet. Let $\mathbf{p}_i^t \in \mathbb{R}_+^K$ denote the preference vector of $u_i$ at time $t$, where each element $p_i^t(k)$ represents the preference of $u_i$ over the $k$-th facet at time $t$. Let $\mathbf{q}_j \in \mathbb{R}_+^K$ be the characteristic vector of the item $I_j$, where $q_j(k)$ denotes the characteristic of $I_j$ in the $k$-th latent facet. Let $\mathbf{W}_t \in \mathbb{R}^{n_t \times n_t \times K}$ denote the multi-faceted trust relations at time $t$, where $w_{ivk}^t$ denotes the strength of the trust relation from $u_i$ to $u_v$ in the $k$-th facet at time $t$.
With the above notations, the trust evolution problem with the dynamics of user preferences can be stated as follows. Given $T$ time slices, $\{\mathcal{U}_t\}_{t=1}^T$, $\{\mathcal{I}_t\}_{t=1}^T$, $\{\mathbf{X}_t\}_{t=1}^T$ and $\{\mathbf{R}_t\}_{t=1}^T$, understand the evolution of trust (eTrust), i.e., $\{\mathbf{W}_t\}_{t=1}^T$, by exploiting the changing of user preferences, $\{\{\mathbf{p}_i^t\}_{u_i \in \mathcal{U}_T}\}_{t=1}^T$, which is formally defined as,
$$\text{eTrust}: \{\mathcal{U}_t, \mathcal{I}_t, \mathbf{X}_t, \mathbf{R}_t\}_{t=1}^{T} \xrightarrow{\ \{\{\mathbf{p}_i^t\}_{u_i \in \mathcal{U}_T}\}_{t=1}^{T}\ } \{\mathbf{W}_t\}_{t=1}^{T} \tag{1}$$
3. A TRUST EVOLUTION FRAMEWORK
One challenge to studying online trust evolution is lack of conventional sociological data, and passive observation is a common approach to data collection. The dynamic rating system on product review sites provides a new means to the study of online trust evolution. In the next subsections, we introduce a framework of trust evolution, eTrust, with details on modeling and parameter estimation.
3.1 eTrust - Modeling Trust Evolution
It is observed [32] that trust has a strong correlation with user preference similarity in rating systems, reflected in their rating information. Thus most existing work about trust is studied in the context of rating prediction [2, 27]. In this work, we explore the dynamics of user preferences to model trust evolution for rating prediction. The user-item pairs $(i, j)$ whose ratings $r_{ij}^t$ are known at time $t$ are stored in $\mathcal{O}_t = \{(i, j) \mid r_{ij}^t \text{ is known}\}$. We use $\hat{r}_{ij}^t$ to distinguish the predicted rating from the known rating $r_{ij}^t$. A baseline model for rating prediction is the latent factor model [13] based on user preferences $\mathbf{p}_i^t$ and item characteristics $\mathbf{q}_j$,
$$\hat{r}_{ij}^t = \mathbf{q}_j^\top \mathbf{p}_i^t = \sum_{k=1}^{K} q_j(k)\, p_i^t(k). \tag{2}$$
Another approach for rating prediction is based on the trust network, a variant of the nearest-neighborhood model [13] that considers multi-faceted trust relationships between users,
$$\hat{r}_{ij}^t = \frac{\sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} w_{ivk}^t\, q_j(k)\, r_{vj}}{\sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} w_{ivk}^t\, q_j(k)} \tag{3}$$
where $\mathcal{N}_i^t$ represents the set of people trusted by $u_i$ at time $t$. Assume that $r_{vj}^{t_{vj}}$ ($t_{vj} < t$) is the rating given by $u_v$ to $I_j$ at time $t_{vj}$. The influence of $u_v$ on $u_i$ for the rating given to $I_j$ at time $t$, i.e., the impact of $r_{vj}^{t_{vj}}$ on $r_{ij}^t$, is related to the distance between $t_{vj}$ and $t$. Earlier ratings reflect users’ previous preferences and should have less influence on the current ratings [4]. The closer $t_{vj}$ is to $t$, the more strongly $r_{vj}^{t_{vj}}$ influences $r_{ij}^t$. Thus we choose an exponential time function to allow the influence of past ratings to decay gradually, which is widely adopted in such applications [4, 14]. Thus $r_{vj}$ in Eq. (3) can be stated as,
$$r_{vj} = e^{-\eta_i (t - t_{vj})}\, r_{vj}^{t_{vj}}, \tag{4}$$
where $\eta_i \ge 0$ controls the user-specific decay rate for $u_i$ and should be learnt from the data. $w_{ivk}^t$ in Eq. (3) is the trust strength between $u_i$ and $u_v$ in the $k$-th facet at time $t$. There is a strong and significant correlation between trust and user preference similarity [32]. Thus we define $\mathbf{s}_{ivk}^t \in \mathbb{R}^L$ as the preference similarity vector between $u_i$ and $u_v$ in the $k$-th facet at time $t$, based on their preferences in the $k$-th facet at time $t$, i.e., $p_i^t(k)$ and $p_v^t(k)$, where $L$ is the number of metrics used to measure their preference similarity. We further assume a linear relation between the trust strength $w_{ivk}^t$ and $\mathbf{s}_{ivk}^t$ [31], formally stated as,
$$w_{ivk}^t = f(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i), \tag{5}$$
where $f: \mathbb{R} \to [0, 1]$ is an activation function that controls the trust strength so that $0 \le w_{ivk}^t \le 1$, and $b_i$ is a user-specific bias of $u_i$.
The latent factor model does not incorporate the influence from the trust network, while the neighborhood model does not consider the user preferences and item characteristics. Earlier work [13, 2, 27] indicated that a proper combination of these two models would help rating prediction. Therefore, we estimate the rating of $u_i$ on $I_j$ at time $t$ as,
$$\hat{r}_{ij}^t = \alpha \sum_{k=1}^{K} q_j(k)\, p_i^t(k) + (1 - \alpha) \frac{\sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} w_{ivk}^t\, q_j(k)\, r_{vj}}{\sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} w_{ivk}^t\, q_j(k)}. \tag{6}$$
In this formulation, the rating of $u_i$ on $I_j$ at time $t$ is determined by two factors. The first captures the preferences of $u_i$ at time $t$ and the characteristics of $I_j$. The second models the influence of the people trusted by $u_i$: $w_{ivk}^t$ indicates their trust strength in the $k$-th facet, and the more strongly $u_i$ trusts $u_v$, the more similar $\hat{r}_{ij}^t$ is to $r_{vj}$. The parameter $\alpha \in [0, 1]$ adjusts the contributions of these two parts. Then the trust evolution problem, embedded in rating prediction, can be formulated as the following minimization problem,
$$\begin{aligned}
\min_{\mathbf{p}_i^t \ge 0,\ \mathbf{q}_j \ge 0,\ \eta_i \ge 0}\ & \sum_{t=1}^{T} \sum_{(i,j) \in \mathcal{O}_t} \Bigg( r_{ij}^t - \alpha \sum_{k=1}^{K} q_j(k)\, p_i^t(k) - (1 - \alpha) \frac{\sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i)\, q_j(k)\, r_{vj}}{\sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i)\, q_j(k)} \Bigg)^2 \\
& + \beta \Bigg( \|\mathbf{w}\|_2^2 + \sum_{I_j \in \mathcal{I}_T} \|\mathbf{q}_j\|_2^2 + \sum_{u_i \in \mathcal{U}_T} \|\eta_i\|_2^2 + \sum_{u_i \in \mathcal{U}_T} \|b_i\|_2^2 + \sum_{u_i \in \mathcal{U}_T} \sum_{t = t_{u_i}}^{T} \|\mathbf{p}_i^t\|_2^2 \Bigg) \\
& + \lambda \sum_{u_i \in \mathcal{U}_T} \sum_{t = t_{u_i} + 1}^{T} \sum_{k=1}^{K} c\big(p_i^t(k) - p_i^{t-1}(k)\big)
\end{aligned} \tag{7}$$
where the parameter $\beta$ controls the quadratic regularization terms that avoid overfitting, $c(\cdot)$ is a function that models how user preferences change, and $\lambda$ controls the speed of change. When $\lambda \to 0$, we do not consider any relations between user preferences at different time points. When $\lambda \to +\infty$, the user preference vectors at different time points are restricted to be the same.
3.2 Estimating Parameters for eTrust
eTrust requires the following parameters: $\mathbf{w}$, $b_i$, $\eta_i$, the characteristic vector $\mathbf{q}_j$ for each item $I_j$, and the user preference vector $\mathbf{p}_i^t$ for each user $u_i$ at each time slice $t$. Different users enter the rating system at different time points, $t_{u_i}$ for user $u_i$. Thus, for each $u_i$, we need to estimate his preferences from $t_{u_i}$ to $T$, $\{\mathbf{p}_i^t\}_{t = t_{u_i}}^{T}$. In our implementation, we use a projected gradient method to solve Eq. (7), and the parameters can be updated as,
$$\begin{aligned}
& \mathbf{w} \leftarrow \mathbf{w} - \gamma_{\mathbf{w}} \nabla_{\mathbf{w}}, \\
\forall u_i \in \mathcal{U}_T: \quad & b_i \leftarrow b_i - \gamma_{b_i} \nabla_{b_i}, \\
\forall u_i \in \mathcal{U}_T: \quad & \eta_i \leftarrow \max(0,\ \eta_i - \gamma_{\eta_i} \nabla_{\eta_i}), \\
\forall I_j \in \mathcal{I}_T,\ k \in [1, K]: \quad & q_j(k) \leftarrow \max(0,\ q_j(k) - \gamma_{q_j(k)} \nabla_{q_j(k)}), \\
\forall u_i \in \mathcal{U}_T,\ t \in [t_{u_i}, T],\ k \in [1, K]: \quad & p_i^t(k) \leftarrow \max(0,\ p_i^t(k) - \gamma_{p_i^t(k)} \nabla_{p_i^t(k)}),
\end{aligned}$$
where $\gamma_{\mathbf{w}}$, $\gamma_{b_i}$, $\gamma_{\eta_i}$, $\gamma_{q_j(k)}$ and $\gamma_{p_i^t(k)}$ are learning step sizes, which are chosen to satisfy the Goldstein conditions [25]. $\nabla_{\mathbf{w}}$, $\nabla_{b_i}$, $\nabla_{\eta_i}$, $\nabla_{q_j(k)}$ and $\nabla_{p_i^t(k)}$ are the partial derivatives of the objective function in Eq. (7) with respect to $\mathbf{w}$, $b_i$, $\eta_i$, $q_j(k)$ and $p_i^t(k)$, respectively,
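$$\begin{aligned}
\nabla_{\mathbf{w}} ={}& 2\beta \mathbf{w} - 2(1 - \alpha) \sum_{t=1}^{T} \sum_{(i,j) \in \mathcal{O}_t} \frac{E_{ij}^t}{(Q_{ij}^t)^2} \Bigg( Q_{ij}^t \sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f'_{\mathbf{w}}\, q_j(k)\, r_{vj} - P_{ij}^t \sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f'_{\mathbf{w}}\, q_j(k) \Bigg), \\
\nabla_{b_i} ={}& 2\beta b_i - 2(1 - \alpha) \sum_{t=1}^{T} \sum_{(i,j) \in \mathcal{O}_t} \frac{E_{ij}^t}{(Q_{ij}^t)^2} \Bigg( Q_{ij}^t \sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f'_{b_i}\, q_j(k)\, r_{vj} - P_{ij}^t \sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f'_{b_i}\, q_j(k) \Bigg), \\
\nabla_{\eta_i} ={}& 2\beta \eta_i + 2(1 - \alpha) \sum_{t=1}^{T} \sum_{(i,j) \in \mathcal{O}_t} \frac{E_{ij}^t}{Q_{ij}^t} \sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f\, q_j(k)\, r_{vj}\, (t - t_{vj}), \\
\nabla_{q_j(k)} ={}& 2\beta q_j(k) - 2 \sum_{t=1}^{T} \sum_{(i,j) \in \mathcal{O}_t} E_{ij}^t \Bigg( \alpha p_i^t(k) + \frac{1 - \alpha}{(Q_{ij}^t)^2} \Big( Q_{ij}^t \sum_{v \in \mathcal{N}_i^t} f\, r_{vj} - P_{ij}^t \sum_{v \in \mathcal{N}_i^t} f \Big) \Bigg), \\
\nabla_{p_i^t(k)} ={}& 2\beta p_i^t(k) - 2 \sum_{(i,j) \in \mathcal{O}_t} E_{ij}^t \Bigg( \alpha q_j(k) + \frac{1 - \alpha}{(Q_{ij}^t)^2} \Big( Q_{ij}^t \sum_{v \in \mathcal{N}_i^t} f'_{p_i^t(k)}\, q_j(k)\, r_{vj} - P_{ij}^t \sum_{v \in \mathcal{N}_i^t} f'_{p_i^t(k)}\, q_j(k) \Big) \Bigg) \\
& - 2(1 - \alpha) \sum_{z \in \mathcal{T}_i^t} \sum_{(z,j) \in \mathcal{O}_t} \frac{E_{zj}^t}{(Q_{zj}^t)^2} \Big( Q_{zj}^t\, f'_{p_i^t(k)}\, q_j(k)\, r_{ij} - P_{zj}^t\, f'_{p_i^t(k)}\, q_j(k) \Big) + H_i^t,
\end{aligned} \tag{8}$$
where $f$ abbreviates $f(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i)$, and $f'_{\mathbf{w}}$, $f'_{b_i}$ and $f'_{p_i^t(k)}$ are the partial derivatives of $f$ with respect to $\mathbf{w}$, $b_i$ and $p_i^t(k)$, respectively. $\mathcal{T}_i^t$ denotes the users who trust $u_i$ at time $t$. $E_{ij}^t$, $P_{ij}^t$, $Q_{ij}^t$ and $H_i^t$ are defined as,
$$\begin{aligned}
E_{ij}^t &= r_{ij}^t - \alpha \sum_{k=1}^{K} q_j(k)\, p_i^t(k) - (1 - \alpha) \frac{\sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i)\, q_j(k)\, r_{vj}}{\sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i)\, q_j(k)}, \\
P_{ij}^t &= \sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i)\, q_j(k)\, r_{vj}, \\
Q_{ij}^t &= \sum_{v \in \mathcal{N}_i^t} \sum_{k=1}^{K} f(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i)\, q_j(k), \\
H_i^t &= \begin{cases}
-\lambda\, c'_{p_i^t k}\big(p_i^{t+1}(k) - p_i^t(k)\big), & t = t_{u_i} \\
\lambda\, c'_{p_i^t k}\big(p_i^t(k) - p_i^{t-1}(k)\big), & t = T \\
\lambda \Big( c'_{p_i^t k}\big(p_i^t(k) - p_i^{t-1}(k)\big) - c'_{p_i^t k}\big(p_i^{t+1}(k) - p_i^t(k)\big) \Big), & t_{u_i} < t < T
\end{cases}
\end{aligned}$$
Next we discuss our choice of $f(\cdot)$ and $c(\cdot)$ for eTrust. $f(\cdot)$ is an activation function that keeps the estimated trust strength in $[0, 1]$. It should be a real-valued, differentiable function whose range is limited to $[0, 1]$. Beyond these requirements, we lack a detailed description of this function. In this case, a sigmoid function is often used [1]. It exhibits a progression from small beginnings that accelerates and approaches a climax over the domain, formulated as,
$$f(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i) = \frac{1}{1 + \exp\big(-(\mathbf{w}^\top \mathbf{s}_{ivk}^t + b_i)\big)}. \tag{9}$$
Then $f'_{\mathbf{w}}$, $f'_{b_i}$ and $f'_{p_i^t(k)}$ are formally stated as,
$$\begin{aligned}
f'_{\mathbf{w}} &= f (1 - f)\, \mathbf{s}_{ivk}^t, \\
f'_{b_i} &= f (1 - f), \\
f'_{p_i^t(k)} &= f (1 - f)\, \mathbf{w}^\top (\mathbf{s}_{ivk}^t)'_{p_i^t(k)}.
\end{aligned} \tag{10}$$
$c(\cdot)$ is used to capture how user preferences change over time. User preferences are usually assumed to change smoothly over time [16]. Under this assumption, $c(\cdot)$, $c'_{p_i^{t-1} k}$ and $c'_{p_i^t k}$ are defined as follows:
$$\begin{aligned}
c\big(p_i^t(k) - p_i^{t-1}(k)\big) &= \|p_i^t(k) - p_i^{t-1}(k)\|_2^2, \\
c'_{p_i^{t-1} k} &= -2 \big(p_i^t(k) - p_i^{t-1}(k)\big), \\
c'_{p_i^t k} &= 2 \big(p_i^t(k) - p_i^{t-1}(k)\big).
\end{aligned} \tag{11}$$
4. APPLICATIONS OF eTrust
We discuss the role of eTrust in product review applications to help improve the performance of rating and trust prediction.
4.1 Rating Prediction
Rating prediction is the task of predicting a given user’s rating for a given item based on past ratings or other information, and is regarded as one of the most important tasks for recommender systems. People tend to seek advice from their trusted friends, thus trust networks are widely exploited to improve rating prediction. Most existing trust-network-based methods assume single and static trust relations. However, rating systems in product review sites are highly dynamic and user preferences drift over time, indicating the evolution of trust relations. Thus eTrust, which models trust evolution, can be used to further improve the performance of prediction.
Given the previous ratings $\{\mathcal{O}_t\}_{t=1}^{T}$, the task of rating prediction aims to predict the ratings at time $T + 1$, $\mathcal{O}_{T+1}$. The rating from $u_i$ to $I_j$ at time $T + 1$ can be predicted as,
$$\hat{r}_{ij}^{T+1} = \alpha\, \mathbf{q}_j^\top \mathbf{p}_i^{T+1} + (1 - \alpha) \frac{\sum_{v \in \mathcal{N}_i^{T+1}} \sum_{k=1}^{K} f(\mathbf{w}^\top \mathbf{s}_{ivk}^{T+1} + b_i)\, q_j(k)\, r_{vj}}{\sum_{v \in \mathcal{N}_i^{T+1}} \sum_{k=1}^{K} f(\mathbf{w}^\top \mathbf{s}_{ivk}^{T+1} + b_i)\, q_j(k)}, \tag{12}$$
where $\mathbf{w}$ is learnt from the training data. For an existing item $I_j$, its characteristic vector $\mathbf{q}_j$ does not evolve. For an existing user $u_i$, $b_i$ and $\eta_i$ are independent of time slices and are directly applied to $T + 1$. We define the change speed of $u_i$’s preference in the $k$-th facet, $Z_{ik}$, as,
$$Z_{ik} = \frac{1}{T - t_{u_i} - 1} \sum_{t = t_{u_i} + 1}^{T} |p_i^t(k) - p_i^{t-1}(k)|, \tag{13}$$
then the user preference at $T + 1$, i.e., $p_i^{T+1}(k)$, can be estimated from that at $T$ by considering the change speed of the user preference as,
$$p_i^{T+1}(k) = p_i^T(k) + Z_{ik}. \tag{14}$$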
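The prediction step of Eqs. (12) to (14) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `eps` guard against division by zero in the ratio metrics, the fallback to the latent-factor term when no trusted user has rated the item, and the handling of histories too short for the denominator of Eq. (13) are all choices made here for the example. The similarity vector uses the nine metrics listed in Section 5.1, the trust strength follows Eq. (5) with the sigmoid of Section 3.2, and the decay follows Eq. (4).

```python
import math


def sigmoid(x):
    # Numerically stable logistic function used as f(.) in Eq. (5).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def similarity_vector(p_i, p_v, eps=1e-12):
    # Nine preference-similarity metrics of Section 5.1 for one facet.
    # eps is an assumption of this sketch to avoid division by zero.
    return [p_i, p_v, p_i + p_v, p_i - p_v, p_v - p_i,
            (p_i - p_v) ** 2, p_v * p_i,
            p_i / (p_v + eps), p_v / (p_i + eps)]


def trust_strength(w, b_i, p_i_k, p_v_k):
    # Eq. (5): w_ivk = f(w^T s_ivk + b_i).
    s = similarity_vector(p_i_k, p_v_k)
    return sigmoid(sum(w_l * s_l for w_l, s_l in zip(w, s)) + b_i)


def decayed_rating(r_past, t_now, t_past, eta_i):
    # Eq. (4): exponential decay of a trusted user's past rating.
    return math.exp(-eta_i * (t_now - t_past)) * r_past


def extrapolate_preference(history):
    # history holds p_i^t(k) for t = t_ui, ..., T (one facet).
    # Eqs. (13)-(14), with the denominator T - t_ui - 1 as printed.
    denom = (len(history) - 1) - 1
    if denom <= 0:
        return history[-1]  # assumption: too little history, keep p_i^T(k)
    z = sum(abs(b - a) for a, b in zip(history, history[1:])) / denom
    return history[-1] + z


def predict_rating(alpha, q_j, p_i, neighbours, w, b_i, eta_i, t_now):
    # neighbours: list of (p_v, r_past, t_past) for trusted users who rated I_j.
    latent = sum(q * p for q, p in zip(q_j, p_i))
    num = den = 0.0
    for p_v, r_past, t_past in neighbours:
        r_vj = decayed_rating(r_past, t_now, t_past, eta_i)
        for k, q in enumerate(q_j):
            f = trust_strength(w, b_i, p_i[k], p_v[k])
            num += f * q * r_vj
            den += f * q
    if den == 0.0:
        return latent  # assumption: no usable neighbour, use latent term only
    return alpha * latent + (1.0 - alpha) * num / den
```

With all weights set to zero every trust strength equals 0.5, so the neighbourhood term reduces to a plain weighted average of the decayed ratings, which makes the sketch easy to check by hand.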
At time T + 1, new users and new items might be introduced; and a challenge for eTrust is how to address the cold-start problem where no historical ratings on items or users are available. Homophily [23] is employed to deal with the cold-start problem: similar users are more likely to trust each other. Therefore, for a new user, we first find his topℓ similar users based on their profiles to establish his trust network and then use the average user preferences from his trust network to estimate his preferences. A similar strategy is adopted for each new item: the characteristics of a new item are estimated from its top similar items.
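The cold-start strategy described above can be sketched as follows. The paper does not specify how profile similarity is computed, so the Jaccard similarity over sets of profile tokens used here is purely a hypothetical stand-in, as are the function names and the data layout (a list of `(profile, preference_vector)` pairs).

```python
def jaccard(a, b):
    # Hypothetical profile similarity: Jaccard overlap of two token sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cold_start_preferences(new_profile, existing, top_l, similarity=jaccard):
    # existing: list of (profile, preference_vector) for known users.
    # Returns the average preference vector of the top-l most similar users.
    if not existing or top_l < 1:
        raise ValueError("need at least one existing user and top_l >= 1")
    ranked = sorted(existing,
                    key=lambda entry: similarity(new_profile, entry[0]),
                    reverse=True)
    chosen = ranked[:top_l]
    n_facets = len(chosen[0][1])
    return [sum(p[k] for _, p in chosen) / len(chosen)
            for k in range(n_facets)]
```

The same routine applies to new items if item descriptions take the place of user profiles and characteristic vectors take the place of preference vectors.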
4.2 Trust Prediction
Trust between users is widely exploited by search and recommendation systems. Inferring unknown trust between users has attracted more and more attention recently [8, 18, 3]. The trust propagation model is a popular model to derive trust relationships based on known trust relationships [8]. However, users’ trust relationships usually follow a power-law distribution and 80% of users are in the long tail. Thus, in practice, there is not enough information to apply this technique. Furthermore, the propagation model is not directly applicable to new users who have little information. Previous work suggests that models combining rating similarity with trust networks can achieve better performance than the trust propagation model [18, 3].
eTrust can be applied to the task of trust prediction. Given trust networks and ratings before time $T$, we want to recommend trust relationships at time $T + 1$. eTrust learns its parameters by solving the problem in Eq. (7) based on the trust networks and rating information before $T$. Then the trust strength between $u_i$ and $u_v$ in the $k$-th facet can be calculated by Eq. (5). For a new user, we use the average user preferences of his top-ℓ similar users in terms of profiles to represent his preferences. After obtaining trust strengths for each facet, the overall strength can be computed as,
$$w_{iv}^{T+1} = \frac{1}{\|\mathbf{p}_i^{T+1}\|_2} \sum_{k=1}^{K} p_i^{T+1}(k)\, w_{ivk}^{T+1}. \tag{15}$$
5. EXPERIMENTS
In this section, we conduct experiments to evaluate the proposed framework, eTrust. We first introduce the details about the dataset used in the experiments, and then describe the findings and results of applying eTrust to online applications, i.e., rating prediction and trust prediction.
5.1 Dataset and Experiment Setting
To study the evolution of trust, we collected a dataset from a popular product review site, Epinions, in May 2011. In Epinions, people can rate various products and add members to their trust networks or “Circle of Trust”. We started with a set of most active users and then did breadth-first search until no new users could be found. For each user, we collected their profiles (user name, location, real name, registration time, self-description, and favorite websites), product rating entries and trust relations. For each product rating entry, we collected the time point of the entry, product name, category and the rating. For each trust relation, we collected the trustee, the trustor and the time point when the trust relation was established.
We compute the trustors and trustees for each user, and the distributions are shown in Figure 2(a) and Figure 2(b). Most people have few trustors and trustees, while a few users have an extremely high number of trustees or trustors, suggesting a power-law distribution that is typical in social networks. We check the distributions of new actions and new objects w.r.t. the current number of actions and objects. The results for new actions and new objects are shown in Figure 2(c) and Figure 2(d), respectively, also suggesting power-law-like distributions. Some other statistics of the dataset are shown in Table 1.
[Figure 2: Distributions in Epinions. Panels: (a) Trustees Distribution (x-axis: Trustees, y-axis: Number of Users); (b) Trustors Distribution (x-axis: Trustors, y-axis: Number of Users); (c) New Actions Distribution (x-axis: Timestamps 1 to 11, y-axis: Ratio of New Actions, series: Trust, Rating); (d) New Objects Distribution (x-axis: Timestamps 1 to 11, y-axis: Ratio of New Objects, series: Users, Items).]
Table 1: Statistics of the Dataset
# of Users                22,166
# of Items                296,277
# of Categories           27
# of Ratings              922,267
# of Links                355,813
First Rating on           Jul 05 1999
Last Rating on            May 08 2011
Trust Network Density     0.0014
Clustering Coefficient    0.1518
The earliest rating was published on Jul 05, 1999 and the latest one on May 08, 2011. However, the temporal information about the trust relations established before 11th January, 2001 is not available from Epinions. Therefore we split the whole dataset into 11 timestamps, i.e., $\mathcal{T} = \{T_1, \ldots, T_{11}\}$, where $T_1$ covers the data before 11th January, 2001, $T_{11}$ contains data after 11th January, 2010, and each of $T_2$ to $T_{10}$ contains one year of data. For example, $T_2$ contains data from 12th January, 2001 to 11th January, 2002. In the following experiments, we choose $T_1$ to $T_{10}$ as the training set to estimate the parameters of eTrust.
In Eq. (7), $\mathbf{s}_{ivk}^t$ is the preference similarity vector for $u_i$ and $u_v$ in the $k$-th facet at time $t$. In this work, we define the following 9 metrics of preference similarity,
$$\mathbf{s}_{ivk}^t = \Big[\, p_i^t(k),\ p_v^t(k),\ p_i^t(k) + p_v^t(k),\ p_i^t(k) - p_v^t(k),\ p_v^t(k) - p_i^t(k),\ \big(p_i^t(k) - p_v^t(k)\big)^2,\ p_v^t(k)\, p_i^t(k),\ \frac{p_i^t(k)}{p_v^t(k)},\ \frac{p_v^t(k)}{p_i^t(k)} \,\Big].$$
5.2 Findings about eTrust
Some interesting findings are summarized below.
• For each trust relation $x_{iv}$, the evolution speed of this relation in the $k$-th facet (facet speed), $f_{ivk}$, is defined as,
$$f_{ivk} = \frac{1}{T - t_{iv} - 1} \sum_{t = t_{iv} + 1}^{T} |w_{ivk}^t - w_{ivk}^{t-1}| \tag{16}$$
where $t_{iv}$ is the time point when $x_{iv}$ was established, and the overall evolution speed, $F_{iv}$, is defined as the sum of all $K$ facet speeds, $F_{iv} = \sum_{k=1}^{K} f_{ivk}$. We find that the evolution speed varies with facets, and that trust relations within an open triad, shown as the left subgraph in Figure 3, are more likely to evolve than those within a closed triad, shown as the right subgraph in Figure 3. On average, the evolution speed of trust relations within an open triad is 6.12 times that of those within a closed triad.
• Users with similar preferences are more likely to trust each other, demonstrating homophily [23] in the trust network. For example, the average preference similarity between trusted users is 3.44 times that of users without trust relations at $T_{10}$, and we have similar observations in other time slices.
• User preferences drift over time, which can be observed in Figure 4, depicting user preferences from 2001 to 2009, i.e., $\{\{\mathbf{p}_i^t\}_{u_i \in \mathcal{U}_T}\}_{t=1}^{9}$. The change of user preferences from $t$ to $t + 1$, $Y_t^{t+1}$, is defined as,
$$Y_t^{t+1} = \frac{1}{n_{t+1}} \sum_{u_i \in \mathcal{U}_{t+1}} \|\mathbf{p}_i^{t+1} - \mathbf{p}_i^t\|. \tag{17}$$
We find that the change of user preferences varies across years. For example, the changes of user preferences from 2003 to 2004 and from 2005 to 2006 are 2.96 and 1.79 times the average, respectively. This might be related to two big events at Epinions: Epinions was acquired by Shopping.com³ in 2003, which in turn was acquired by eBay⁴ in 2005⁵.
• The speed of change, defined in Eq. (13), varies with people and facets. Figure 5 shows the average speed of change from 2001 to 2009 w.r.t. different facets. Different people have different speeds of change, while user preferences change much faster for some facets than for others. For example, the average changing speed for the 13th latent facet is almost 60 times that for the 2nd latent facet.
• People have multiple and heterogeneous trust relations with others. For each facet, people only trust a part of their networks strongly. For example, on average, people trust only 15.8% of their trust networks for a specific facet when the number of latent facets is 20 at $T_{10}$. We also observe heterogeneous trust relations including heterogeneous pairs of reciprocal trust relations, transitive relations and co-citation relations. These observations are very consistent with the findings of our previous multi-faceted trust research [27].
[Figure 3: Open Triad vs Closed Triad. Left subgraph: open triad; right subgraph: closed triad.]
[Figure 4: User Preference Evolution. Note that x-axis, y-axis and z-axis denote the facet ID, user ID and user preference, respectively.]
[Figure 5: Speed of Changes w.r.t. Users and Facets]
³ http://www.shopping.com/
⁴ http://www.ebay.com/
⁵ http://en.wikipedia.org/wiki/Epinions
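The two statistics behind these findings, the facet speed of Eq. (16) and the preference drift of Eq. (17), can be computed with a short plain-Python sketch. The function names and data layouts are assumptions of this example. Two further choices are not settled by the text and are made here only for illustration: the norm in Eq. (17) is taken to be Euclidean, and users who are new at t+1 (and so have no vector at t) contribute zero to the sum while still counting in n_{t+1}.

```python
import math


def facet_speed(strengths):
    # strengths holds w_ivk^t for t = t_iv, ..., T (one relation, one facet).
    # Eq. (16), with the denominator T - t_iv - 1 as printed.
    denom = (len(strengths) - 1) - 1
    if denom <= 0:
        return 0.0  # assumption: too few time slices to measure a speed
    return sum(abs(b - a) for a, b in zip(strengths, strengths[1:])) / denom


def overall_speed(strengths_per_facet):
    # F_iv: sum of the K facet speeds of one trust relation.
    return sum(facet_speed(s) for s in strengths_per_facet)


def preference_drift(prev, curr):
    # prev, curr: dict mapping user id -> preference vector at t and t+1.
    # Eq. (17), averaged over all users present at t+1.
    if not curr:
        raise ValueError("curr must contain at least one user")
    total = 0.0
    for user, p_next in curr.items():
        p_prev = prev.get(user)
        if p_prev is None:
            continue  # assumption: new users add nothing to the sum
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p_next, p_prev)))
    return total / len(curr)
```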
5.3 Rating Prediction
eTrust can be naturally applied to the online application of rating prediction, as discussed in Section 4.1. $T_{11}$ is the testing set in this experiment and we further divide it into two parts: the ratings involving new items or new users (the cold-start problem), denoted as $\mathcal{N}$, and the remaining ratings from $T_{11}$, denoted as $\mathcal{K}$. We closely examine $T_{11}$ and find that $\mathcal{N}$ contains 10.06% of the ratings at $T_{11}$. We use Root Mean Squared Error (RMSE), a common metric, to evaluate prediction accuracy [13],
$$RMSE(\mathcal{M}) = \sqrt{\frac{\sum_{(i,j) \in \mathcal{M}} (r_{ij} - \hat{r}_{ij})^2}{|\mathcal{M}|}} \tag{18}$$
where $|\mathcal{M}|$ is the size of the testing set $\mathcal{M}$. Note that eTrust can be easily applied to other rating prediction methods; thus we aim to investigate whether considering the dynamics of trust relations and user preferences allows us to improve the prediction performance, instead of comparing different rating prediction methods.
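For reference, Eq. (18) amounts to the following few lines of plain Python. The function name and the input layout (an iterable of `(actual, predicted)` pairs for the testing set) are assumptions of this sketch.

```python
import math


def rmse(pairs):
    # pairs: iterable of (r_ij, r_hat_ij) over the testing set M; Eq. (18).
    pairs = list(pairs)
    if not pairs:
        raise ValueError("testing set must not be empty")
    squared_error = sum((r - r_hat) ** 2 for r, r_hat in pairs)
    return math.sqrt(squared_error / len(pairs))
```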
Table 2: RMSE of Rating Prediction in Epinions
Method    K         N         K+N
Mean      1.1054    1.1562    1.1106
NN        1.1092    1.1566    1.1148
MF        1.0804    1.1472    1.0872
MF+NN     1.0675    1.1392    1.0747
mTrust    1.0566    1.1375    1.0646
eTrust    1.0299    1.0783    1.0347
The prediction results are shown in Table 2, and the methods listed in the table are defined as follows:
• Mean: the rating of an item is predicted by the mean of the known ratings of the product;
• MF: this method is based on user preferences and item characteristics; however, it assumes that user preferences do not evolve over time [13];
• NN: this model is a simple nearest-neighbor algorithm, assuming that a user gives a rating similar to those of his trust network for the same item [13];
• MF+NN: this model is a combination of NN and MF; however, it considers heterogeneous strengths of trust relations [2];
• mTrust: this model considers the multi-faceted trust relations between users and assumes people place trust differently on different people [27];
• eTrust: our proposed method, considering the dynamics of both trust relations and user preferences.
Note that none of the baseline methods can tackle the cold-start problem, thus we adopt the same strategy as eTrust by considering their top-ℓ similar existing users (items) based on their profiles. For example, for Mean, the rating of a new item is predicted by the average rating of its top-ℓ similar items. The parameters of all methods are determined through cross-validation. For eTrust, the parameters are set as: K = 20, α = 0.3, β = 0.05 and γ = 0.1.
We notice that the RMSE values for all methods are very close to each other. However, a small improvement in RMSE terms can have a significant impact on the quality of the top few recommendations [13]. As reported in [13], when the performance improved from 0.9025 to 0.8870 w.r.t. RMSE, it gained more than 50% relative improvement in terms of the top few recommendations.
Mean obtains better performance than NN. There are two main reasons. First, most of the time, the majority of users’ actual ratings are close to the average; by closely examining the dataset, we find that more than 70% of users give a score of 4 or 5. Second, NN treats all trust relations equally; however, people may trust a part of their trust networks more strongly than others. When considering heterogeneous trust strengths, MF+NN outperforms both MF and NN, further demonstrating that people trust their friends differently and give similar ratings only with their strongly trusted friends. NN and MF+NN assume single and homogeneous trust relations; however, people with multi-faceted interests and experts of different types suggest multiple and heterogeneous trust relations [27]. mTrust considers multi-faceted trust relations and obtains better performance than MF+NN. eTrust consistently outperforms all other methods, and we believe that this improvement is contributed by exploiting the dynamics of both trust relations and user preferences.
The performance of all methods degrades in N. However, eTrust obtains much better performance than other methods, gaining 0.0592 absolute improvement in terms of RMSE. These results demonstrate that eTrust is more robust to the cold-start problem. We also investigate how the performance of eTrust varies with the number of selected top-similar users or items, ℓ, and the results are shown in Figure 6. The performance with ℓ = 1 is always worst, even worse than that with all users or items. This suggests that users (items) with the most similar profiles do not necessarily have the most similar user preferences (item characteristics).
[Figure 6: RMSE w.r.t. the Number of Selected Top Similar Users. x-axis: # of Selected Top-Similar Objects (1, 5, 10, 50, 100, 500, 1000, All); y-axis: RMSE; series: User, Item.]
5.4 Trust Prediction
Trust plays an important role in helping users collect reliable information in online communities. If trust relations can be predicted accurately, users can use these relations to judge the reliability of information. For eTrust, trust relations can be inferred from user preferences with the estimated w and the user-specific biases {b_i} via Eq. (5). Data from T11 are treated as the testing set, which is separated into two parts: E, trust relations established among existing users, and N, trust relations involving new users, covering 23.51% of all trust relations at T11. We follow the metric used in [17] to evaluate the performance of trust prediction. Let A be the set of pairs of users having no relations at time T10, and let B denote the set of pairs of users establishing trust relations at time T11. Each trust predictor ranks the pairs in A in decreasing order of confidence, and we take the first |B| pairs as the set of predicted trust relations, denoted C. The prediction accuracy (PA) is then calculated as
\[ PA = \frac{|B \cap C|}{|B|} \tag{19} \]
where | · | denotes the size of a set. The PA results are shown in Table 3. The trust predictors listed in the table are defined as follows:
• Simi: it ranks the pairs according to their rating similarity (if available) and profile similarity, considering only rating or profile information;
• TP: trust relations are inferred through trust propagation, considering only the existing trust network [8];
• Simi+TP: a combination of Simi and TP, integrating both the existing trust network and rating or profile similarity [3];
• eTrust: trust relations are predicted by eTrust through Eq. (5); details are given in Section 4.2.

Table 3: Accuracy of Trust Prediction in Epinions
| Method | E (%) | N (%) | E + N (%) |
| Simi | 48.41 | 28.94 | 43.93 |
| TP | 45.47 | N.A. | 35.01 |
| Simi+TP | 50.19 | 28.94 | 45.31 |
| eTrust | 55.07 | 33.83 | 50.18 |

TP always obtains the worst performance. TP predicts a trust relation between two users when existing trust relations connect them indirectly. However, connectivity information is not always sufficient for correct trust prediction, especially when trust connectivity is too low for propagation techniques to apply. For example, TP cannot be applied to the dataset N. Simi obtains better performance than TP, demonstrating that users with similar ratings or profiles are likely to trust each other. When both existing trust networks and similarity are considered, Simi+TP outperforms both TP and Simi, indicating that similarity information and existing trust networks are complementary for trust prediction. eTrust consistently performs best. For example, eTrust gains 9.72% and 14.45% relative improvement in E and N, respectively. There are two main reasons: (1) eTrust performs trust prediction facet by facet, overcoming the problem of heterogeneous trust relations [27]; and (2) trust relations are inferred based on users' current preferences.
We observe that the performance of all predictors decreases in N. However, based on profile similarity alone, Simi still obtains a prediction accuracy of 28.94%, much higher than that of a random predictor, whose accuracy is less than 1%. This observation demonstrates the existence of homophily in trust networks: similar users are more likely to trust each other.
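The PA metric of Eq. (19) can be sketched in a few lines. The function name and the representation of candidate pairs as a dictionary from user pairs to confidence scores are assumptions made for illustration; any trust predictor that assigns a confidence to each pair in A could supply the scores.

```python
def prediction_accuracy(confidence, new_relations):
    """Prediction accuracy PA = |B ∩ C| / |B| as in Eq. (19).

    confidence:    dict mapping each candidate pair in A (pairs with no
                   relation at T10) to the predictor's confidence score
    new_relations: set B of pairs that establish trust relations at T11
    """
    if not new_relations:
        raise ValueError("B must be non-empty")
    # Rank the pairs in A by decreasing confidence.
    ranked = sorted(confidence, key=confidence.get, reverse=True)
    # C: the first |B| pairs in the ranking are the predicted trust relations.
    predicted = set(ranked[: len(new_relations)])
    return len(predicted & new_relations) / len(new_relations)


# Toy example with four candidate pairs and two true new relations.
toy_confidence = {
    ("u1", "u2"): 0.9,
    ("u1", "u3"): 0.2,
    ("u2", "u3"): 0.7,
    ("u3", "u1"): 0.5,
}
toy_new_relations = {("u1", "u2"), ("u3", "u1")}
toy_pa = prediction_accuracy(toy_confidence, toy_new_relations)
```

Here the two highest-ranked pairs are (u1, u2) and (u2, u3), of which only the first is in B, so half of the new relations are recovered. Because exactly |B| pairs are predicted, precision and recall coincide under this metric.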
6. RELATED WORK
Trust has attracted increasing attention from the computer science community [20, 5]. In recent years, many trust-related online applications have been proposed; we briefly review those most related to our work.
Rating Prediction. People are likely to seek advice from their trust networks, so trust networks are widely exploited in the task of rating prediction. In [20], several methods for incorporating trust networks are proposed to improve the performance of rating prediction. Matsuo and Yamamoto [22] studied and modeled the bidirectional effects between trust relations and product ratings. Koren [13] introduced innovations to both latent factor models and neighborhood models so that the two can be smoothly merged into a more accurate combined model, and reported that a proper combination of the two models can significantly improve prediction performance. None of these methods considers changes in users' purchase interests. However, customer preferences for products drift over time [14]. Ding and Li [4] presented an algorithm that computes time weights for different items by assigning decreasing weights to old data. Koren [14] proposed a methodology and specific techniques for modeling the dynamics of user preferences in recommender systems, and claimed that temporal dynamics in the data can have a more significant impact on accuracy than designing more complex learning algorithms.
Trust Prediction. Trust plays an important role in helping online users collect reliable information. If trust relations can be predicted accurately, users can use these relations to judge the reliability of information. Guha et al. [8] developed a formal framework of trust propagation schemes, which first separates the trust and distrust matrices and then performs operations on them to obtain the transitive trust between two nodes. The connection between trust and user similarity was studied in [32], where a strong and significant correlation between trust and similarity was found: the more similar two people are, the greater the trust between them. In [3], rating similarity is exploited to enrich traditional trust propagation methods. That work demonstrated that predicting trust is more successful for pairs of similar users when the topology of the trust network is combined with rating similarity.
Trust Strength Prediction is another direction of related research, which differs from trust prediction: it focuses on modeling the strength of existing links rather than link existence. In [31], a latent variable model is developed to estimate relationship strength from various interaction activities and user similarities. In this model, relationship strength is modeled as the hidden effect of user similarities, and it in turn impacts the nature and frequency of online interactions. Au Yeung and Iwata [2] show heterogeneous trust strengths of trust relations in product review sites and propose a modified matrix factorization technique to estimate the strengths of trust relations. Trust, as a social concept, naturally has multiple facets, indicating multiple and heterogeneous trust relations between users. Our previous work, mTrust, investigated multi-faceted trust relations between users. People place trust differently on different people, and mTrust demonstrates that trust strength can be inferred in the context of rating prediction [27].
7. CONCLUSIONS
In this paper, we study online trust evolution in the context of product review. By exploiting the correlation between user preferences and trust relations, we propose a framework, eTrust, to understand the evolution of trust in an online world, and we apply eTrust to online applications such as rating prediction and trust prediction. Our experiments on real-world data from Epinions yield interesting findings and show that eTrust can improve the performance of both rating prediction and trust prediction. In future work, we will continue our research on eTrust and investigate trust evolution in areas such as ranking evolution, useful message recommendation [9] and linked feature selection [28].
Acknowledgments
This work is, in part, supported by ARO (#025071) and NSF (#0812551).
8. REFERENCES
[1] J. Anderson, R. Michalski, J. Carbonell, and T. Mitchell. Machine learning: An artificial intelligence approach, volume 2. Morgan Kaufmann, 1986.
[2] C. Au Yeung and T. Iwata. Strength of social influence in trust networks in product review sites. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 495–504. ACM, 2011.
[3] P. Borzymek, M. Sydow, and A. Wierzbicki. Enriching trust prediction model in social network with user rating similarity. In International Conference on Computational Aspects of Social Networks, pages 40–47. IEEE, 2009.
[4] Y. Ding and X. Li. Time weight collaborative filtering. In Proceedings of the 14th ACM international conference on Information and knowledge management, pages 485–492. ACM, 2005.
[5] H. Gao, G. Barbier, and R. Goolsby. Harnessing the crowdsourcing power of social media for disaster relief. Intelligent Systems, IEEE, 2011.
[6] D. Gefen, E. Karahanna, and D. Straub. Trust and TAM in online shopping: An integrated model. MIS Quarterly, pages 51–90, 2003.
[7] J. Golbeck. Trust and nuanced profile similarity in online social networks. ACM Transactions on the Web, 3(4):1–33, 2009.
[8] R. Guha, R. Kumar, P. Raghavan, and A. Tomkins. Propagation of trust and distrust. In Proceedings of the 13th international conference on World Wide Web, pages 403–412. ACM, 2004.
[9] X. Hu, N. Sun, C. Zhang, and T. Chua. Exploiting internal and external semantics for the clustering of short texts using world knowledge. In CIKM, 2009.
[10] M. Jamali and M. Ester. Trustwalker: a random walk model for combining trust-based and item-based recommendation. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 397–406. ACM, 2009.
[11] G. Jones and J. George. The experience and evolution of trust: Implications for cooperation and teamwork. Academy of Management Review, pages 531–546, 1998.
[12] P. Karmakar and R. Roy. Evolution of trust and formation of preference clusters in distributed networked structure. International Journal of Virtual Communities and Social Networking, 3(2):17–50, 2011.
[13] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 426–434. ACM, 2008.
[14] Y. Koren. Collaborative filtering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 447–456. ACM, 2009.
[15] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In Proceedings of the 19th international conference on World wide web, 2010.
[16] R. Li, B. Li, C. Jin, X. Xue, and X. Zhu. Tracking user-preference varying speed in collaborative filtering. In Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
[17] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
[18] H. Liu, E. Lim, H. Lauw, M. Le, A. Sun, J. Srivastava, and Y. Kim. Predicting trusts among users of online communities: an epinions case study. In Proceedings of the 9th ACM Conference on Electronic Commerce, pages 310–319. ACM, 2008.
[19] Y. Lu, P. Tsaparas, A. Ntoulas, and L. Polanyi. Exploiting social context for review quality prediction. In Proceedings of the 19th international conference on World wide web, pages 691–700. ACM, 2010.
[20] H. Ma, H. Yang, M. Lyu, and I. King. Sorec: social recommendation using probabilistic matrix factorization. In Proceeding of the 17th ACM conference on Information and knowledge management, pages 931–940. ACM, 2008.
[21] P. Massa and P. Avesani. Trust-aware recommender systems. In Proceedings of the 2007 ACM conference on Recommender systems, pages 17–24. ACM, 2007.
[22] Y. Matsuo and H. Yamamoto. Community gravity: measuring bidirectional effects by trust and rating on online social networks. In Proceedings of the 18th international conference on World wide web, pages 751–760. ACM, 2009.
[23] M. McPherson, L. Smith-Lovin, and J. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, pages 415–444, 2001.
[24] A. Mishra and A. Bhattacharya. Finding the bias and prestige of nodes in networks based on trust scores. In Proceedings of the 20th international conference on World wide web, pages 567–576. ACM, 2011.
[25] J. Nocedal and S. Wright. Numerical optimization. Springer Verlag, 1999.
[26] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 61–70. ACM, 2002.
[27] J. Tang, H. Gao, and H. Liu. mTrust: Discerning multi-faceted trust in a connected world. In the 5th ACM International Conference on Web Search and Data Mining, 2012.
[28] J. Tang and H. Liu. Feature selection with linked data in social media. In SDM, 2012.
[29] S. Tang. The social evolutionary psychology of fear (and trust): Or why is international cooperation difficult? In annual meeting of the ISA's 49th Annual Convention, Bridging Multiple Divides, Hilton San Francisco, San Francisco, CA, USA, 2010.
[30] G. Van de Bunt, R. Wittek, and M. de Klepper. The evolution of intra-organizational trust networks. International Sociology, 20(3):339–369, 2005.
[31] R. Xiang, J. Neville, and M. Rogati. Modeling relationship strength in online social networks. In Proceedings of the 19th international conference on World wide web, 2010.
[32] C. Ziegler and J. Golbeck. Investigating interactions of trust and interest similarity. Decision Support Systems, 43(2):460–475, 2007.