Predicting Item Adoption Using Social Correlation

Freddy Chong Tat Chua∗, Hady W. Lauw†, Ee-Peng Lim∗

∗ Singapore Management University
† Institute for Infocomm Research

Abstract

Users face a dazzling array of choices on the Web when it comes to choosing which product to buy, which video to watch, etc. The trend of social information processing means users increasingly rely not only on their own preferences, but also on friends when making various adoption decisions. In this paper, we investigate the effects of social correlation on users’ adoption of items. Given a user-user social graph and an item-user adoption graph, we seek to answer the following questions: 1) whether the items adopted by a user correlate with items adopted by her friends, and 2) how to incorporate social correlation in order to improve prediction of unobserved item adoptions. We propose the Social Correlation model based on Latent Dirichlet Allocation (LDA) that decomposes the adoption graph into a set of latent factors reflecting user preferences, and a social correlation matrix reflecting the degree of correlation from one user to another. This matrix is learned (rather than pre-assigned), has a probabilistic interpretation, and preserves the underlying social network structure. We further devise a Hybrid model that combines a user’s own latent factors with her friends’ for adoption prediction. Experiments on Epinions and LiveJournal data sets show that our proposed models outperform the approach based on latent factors only (LDA).

1 Introduction

Unprecedented progress and innovation provide consumers a wide variety of choices. Consumer items such as books, cameras and movies come in various subjects, features and genres. Online shopping provides access to these items to anyone with an internet connection. Consequently, sellers anywhere can reach consumers anywhere, and consumers have access to an increasing number of products. The direct effect is that consumers have a harder time making purchasing decisions, while sellers do not know what to sell and whom to sell it to. Beyond commerce, users face a similar problem on the Web in general, when deciding which article to read, which group to join, etc.
To address this information overload, retailers attempt to assist consumers by putting in place decision-making aids such as bestseller lists and lists of items frequently bought together. However, given the limited space in bestseller lists or any recommendation list targeted at everyone, such aids favor the very popular items. Some merchants, such as Amazon and Netflix, have put in place more personalized recommender systems based on the individual user’s past transactions. However, such approaches frequently suffer from the cold start problem: no recommendation can be generated for users who have purchased very few items. Therefore, while an attractive retail opportunity lies in the long-tail products, it is difficult for such products to be matched to the relevant users.

In a trend known as social information processing, users increasingly rely on one another to organize the complex information on the Web. This is evident from the abundance of user-generated content, such as tags, ratings, and reviews, all of which collectively aim to allow items to be more easily discovered by other users. Social networks have also become a conduit for discovering relevant information. On platforms such as Twitter or Epinions, users can opt to receive only content generated by other users whom they follow or trust. A user’s choices are increasingly driven not only by personal preferences, but also by the preferences of others in their social networks. This gives rise to the phenomenon of social correlation, whereby users who are socially related tend to make similar choices.

In this paper, we therefore aim to address the item adoption prediction problem by studying how social correlation plays a role in user adoption of items. Here, item adoption could refer to various actions such as buying a product, writing a product review, joining a group, etc.
We model the adoption relationship between users and items as an undirected bipartite adoption graph $G_a(V, U, E)$, where $V$ represents a set of items, $U$ represents a set of users and $E$ represents the undirected adoption links between $V$ and $U$. We also assume as input a social graph $G_s(U, F)$, where $U$ represents the same set of users as in $G_a$ and $F$ represents the social links between users. An edge exists from $u_1$ to $u_2$ if $u_1$ befriends, trusts, or follows $u_2$. In both $G_a$ and $G_s$, we only require the binary expression of the links (present or absent), and do not use any other form of information such as ratings or review text, to keep our model simple and general. Given $G_a$ and $G_s$, we seek to address the following problems:

• Learning the extent to which a user relies on social correlation, as opposed to her personal preferences, in making adoption choices. For a given social link $(u_1, u_2) \in F$, we would like to learn a weight that reflects the extent to which $u_1$’s latent factors correlate with the latent factors of $u_2$.

• Predicting the items that a user is likely to adopt based on social correlation. For a given pair of user $u$ and item $v$, we would like to learn the probability that an adoption link $(u, v)$ would exist in $E$.

Factorization-based approaches can model a user’s personal preferences [14]. One such factorization is Latent Dirichlet Allocation (LDA) [4], which learns a set of latent factors by factorizing the adjacency matrix of the adoption graph into two matrices: one that reflects the importance of each latent factor to users, and another that does the same for items. However, this approach is inadequate because it assumes that all items adopted by a user can be explained by the user’s and items’ latent factors.
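As a minimal sketch of the two inputs defined above, the adoption and social graphs can be stored as sets of binary links. All identifiers, links, and helper names (`adoption_matrix`, `friends_of`) below are made up for illustration and are not taken from the paper's data.

```python
# Toy inputs: every identifier and link below is made up for illustration.
items = ["v1", "v2", "v3", "v4"]
users = ["u1", "u2", "u3"]

# Adoption links E: undirected (item, user) pairs, presence only.
E = {("v1", "u1"), ("v2", "u1"), ("v2", "u2"), ("v4", "u2"), ("v4", "u3")}

# Social links F: directed (u, u') pairs, meaning u befriends/trusts/follows u'.
F = {("u2", "u3"), ("u3", "u1")}


def adoption_matrix(items, users, links):
    """Return the binary M x N adjacency matrix (rows: items, columns: users)."""
    return [[1 if (v, u) in links else 0 for u in users] for v in items]


def friends_of(u, social_links):
    """Users u' such that a directed social link (u, u') exists."""
    return sorted(t for (s, t) in social_links if s == u)


matrix = adoption_matrix(items, users, E)
for v, row in zip(items, matrix):
    print(v, row)
print("friends of u2:", friends_of("u2", F))
```

Only link presence is stored, matching the binary setting described above; ratings or review text would need a different structure.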
[Figure 1: Example Scenario of Adoption (solid) and Social Links (dotted). The figure shows users $u_1$ to $u_4$ and items $v_1$ to $v_6$, grouped into Item Cluster 1 and Item Cluster 2.]

Consider the example scenario in Figure 1. There are two clusters of items: $\{v_1, v_2, v_3\}$ and $\{v_4, v_5, v_6\}$. Suppose that each cluster groups together items with similar latent factors. Users $u_1$ and $u_2$ have similar preferences, adopting items in the first cluster. Users $u_3$ and $u_4$ adopt items in the second cluster. Given that items in a cluster share similar latent factors, these adoptions can largely be explained by the users having similar latent factors. However, $u_2$’s adoption of $v_4$ cannot be clearly explained by latent factors alone.
Taking into account $u_2$’s social links (dotted lines) to $u_3$ and $u_4$, we hypothesize that in the case of $v_4$, $u_2$ depends on the preferences of her friends $u_3$ and $u_4$.

We propose to model social correlation directly using factorization-based approaches. Some users may rely primarily on their own latent factors in making adoptions. We say that these users have high self-dependency values. However, most users rely on a mixture of self-dependency and social correlation. This is modeled by a user-user social correlation matrix $I$. Based on a generative model, our approach assumes that a user $u_1$ adopts an item based on her own preferences on the latent factors of the item with a probability proportional to $i_{u_1,u_1} \in I$, also known as Self-Dependency, and based on another user $u_2$’s latent factors with probability proportional to $i_{u_1,u_2} \in I$. Here, $\sum_{u} i_{u_1,u} = 1$. Hence, we seek to learn both a user’s latent factors and the social correlation matrix from the given adoption and social graphs.

We make the following contributions in this paper:

1. We propose two factorization models that we call the Social Correlation and Hybrid models. The Social Correlation model decomposes an adoption graph and a social graph into three components: users’ latent factors, items’ latent factors, and social correlation. The Hybrid model combines the merits of the Social Correlation model and LDA.

2. Our proposed models derive the social correlation weights from the factorization process, instead of relying on a social graph with pre-assigned link weights. In some cases, the weights are not known beforehand. Even if the social graph comes with some form of weights (e.g., friendship strength), they may not accurately reflect the dependency and correlation among users.

3. To evaluate our proposed models, we conduct comprehensive experiments on two real-life data sets from Epinions and LiveJournal. The results show that our proposed models outperform the approach relying on latent factors alone. We also show that the Hybrid model outperforms the Social Correlation model.

The rest of the paper is organized as follows. Section 2 discusses past research on modeling the relationship between items and users. We establish the existence of correlation between adoption and social links in Section 3 through hypothesis testing. In Section 4, we apply Latent Dirichlet Allocation (LDA) to model user adoption of items based on latent factors. In Section 5, we incorporate social correlation into the factorization model. We then evaluate our methods in Section 6. Finally, we conclude the paper in Section 7.
2 Related Work

2.1 Social Correlation

Here, we review several concepts related to social correlation, such as homophily, influence, and k-exposure. Notably, we go beyond just establishing or measuring social correlation, to also make use of it for adoption prediction.

Fond and Neville [11,20] established that social correlation is a result of two processes that alternate over a period of time: “homophily”, causing users with similar attributes to form social links, and “influence”, causing users with social links to become more similar in attributes. Homophily is a well-known phenomenon in sociology. McPherson et al. [19] surveyed articles establishing that homophily exists in various social contexts such as marriage, friendship, co-workers, and classmates, involving similarity factors such as socio-demographic attributes. Singla and Richardson [22] also established the correlation of search queries among instant messaging friends. In our work, we are concerned only with the existence of social correlation and its use for adoption prediction, and not with the underlying causes (homophily vs. influence), which are not always observable from the data.

Liu et al. sought to measure influence [16] based on clearly observable “following” behaviors. For instance, they looked at how Twitter users re-tweeted postings by others, or how authors published papers on the same topics as cited papers. They first obtain the topic distribution of every author based on the papers they wrote. Then, for each author a, they decide who influences a based on the latent factors of the authors whom a cited. Our work is different in the following ways. First, our focus is the adoption prediction problem, while their focus is on measuring influence and how it varies with the number of hops in the social graph. Second, our model assumes that any friend (and not just certain friends, e.g., authors cited) could be an influencer. For example, a user who buys an item does not explicitly state whom she bases her decision on. In such cases, the possible number of influencers can be very large and their method may not scale up. Therefore, we view our approach as a more general approach that can work in generic settings.

Also related is the notion of k-exposure: the likelihood that a user would adopt an item increases with the number k of her friends who have adopted it. Several works have studied k-exposure with respect to such adoptions as choosing which Wikipedia article to edit or which LiveJournal community to join [2, 6, 7]. The fundamental assumption here is that every user is correlated with their friends in the same way. All that matters is the number of friends who have adopted an item. In contrast, we do not make the same assumption. In our approach, a user may be correlated with each friend differently, and may have different self-dependency values.

Ma et al. extended the Bayesian Probabilistic Matrix Factorization (BPMF) models for rating prediction by adding social factors [17, 18]. They used the latent factors of users and items learned from BPMF, coupled with the weighted values of the social links, for item rating prediction. Importantly, they assume the existence of the weighted values that reflect the relationship strength between each pair of friends. In the absence of known weights, all users may be weighted equally. In this work, we do not make the same assumption, and show that it is possible to learn these weighted values through an optimization process.

Some prior work focused on how influence propagates across a network. Assuming a propagation framework in which an adoption by a user would probabilistically trigger a similar adoption by her friends, an influential user is one whose initial adoption would eventually result in the largest total number of adoptions by all users [13]. The problem of influence maximization is orthogonal to our problem, in that influence maximization is more concerned with the total number of adoptions triggered, while we are concerned more with predicting individual adoption cases. Influence is also addressed by Yang and Leskovec as a form of information diffusion [23] with temporal dynamics. However, their notion of influence requires the explicit adoption of an item, while we consider influence in terms of latent factors.

2.2 Factorization

Bayesian Probabilistic Matrix Factorization (BPMF) is a popular method for low-rank matrix approximation by Salakhutdinov [21]. The model avoids the overfitting of other methods such as SVD by adding Gaussian noise to the sparse data. The Gaussian noise acts as a regularizer to avoid overfitting the factorized matrices to the sparse data.
Salakhutdinov then showed that the model can be approximated using a Gibbs Sampling method. The BPMF method was subsequently applied by Koren to rating prediction in the Netflix Prize Competition [14]. Koren combined the generalization properties of latent factor models with neighborhood methods in collaborative filtering. Koren also extended the factor models to modeling temporal dynamics [15].

When modeling ratings, it is appropriate to use BPMF because rating scores can be approximated as following a Gaussian distribution. When we want to model simpler discrete relationships, Latent Dirichlet Allocation (LDA) is more suitable [4]. Instead of Gaussian noise as a regularizer, LDA uses Dirichlet distributions as smoothing priors, which essentially behave in the same way as regularizers.

There are existing works that use Dirichlet distributions to model item-user and user-user relationships. Balasubramanya and Cohen proposed BlockLDA for modeling protein interactions [3]. BlockLDA tries to unite the Mixed Membership Stochastic Blockmodels [1] and LDA to jointly model the relationships. However, their approach and assumptions are currently restricted to protein interactions only.

3 Correlation of Social & Adoption Links
We justify our research motivation by first establishing that a correlation exists between social and adoption links, i.e., that users with social links also tend to share common adoptions. We do this by performing hypothesis tests on a real-world data set obtained from Epinions, a product review site. The social graph in Epinions consists of trust links formed when a user indicates her trust in another user. These trust links are directional and not necessarily reciprocal. An adoption link exists between a user and an item (product) if the user has written a review for the item.

We collected the data set by crawling the Epinions site, focusing only on the Videos & DVDs category. The size of the data set is given in Table 1. In total, there are close to 40K users and 7K items. There are also more than 300K social links and 80K adoption links. Both social and adoption links are binary (0 or 1). Although the adoption links are binary in the Epinions data set, we can also handle weighted adoption links that represent adoption of the same item multiple times.

Table 1: Epinions: Data Size

                                 Count
no. of users |U|                39,946
no. of items |V|                 6,949
no. of adoption links |E|       83,763
no. of social links |F|        331,509
We perform hypothesis testing using the Fisher Exact Test [10]. Our null hypothesis $H_0$ states that the probability of two users having a common adoption is independent of whether the two users have a trust link between them. Rejecting the null hypothesis implies accepting the alternative hypothesis $H_1$, which states that the probability of common adoption is dependent on having a social link. We perform the Fisher Exact Test on the contingency table in Table 2. Each value in the table represents the number of user pairs for a combination of social link and common item adoption scenarios. The numbers in parentheses are the expected values if the social graph is independent of the adoption graph. As shown in the table, the observed number of pairs with both a common adoption and a social link, 24,197, is far greater than the expected 2,594.

Table 2: Epinions: Contingency Table for Pairs of Users with Social and Adoption Links

                      No Social Link               Has Social Link      Total
No Common Adoption    791,271,379 (791,249,776)    307,312 (328,915)    791,578,691
Has Common Adoption   6,218,597 (6,240,200)        24,197 (2,594)       6,242,794
Total                 797,489,976                  331,509              797,821,485
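The parenthesized expected values in Table 2 follow from the row and column totals under independence (row total × column total / grand total). The short sketch below reproduces only that arithmetic from the observed counts; it is not the Fisher Exact Test itself, and the dictionary keys and the function name `expected_counts` are illustrative choices.

```python
# Observed counts of user pairs, copied from Table 2.
observed = {
    ("no_common", "no_social"): 791_271_379,
    ("no_common", "has_social"): 307_312,
    ("has_common", "no_social"): 6_218_597,
    ("has_common", "has_social"): 24_197,
}


def expected_counts(obs):
    """Expected cell counts if adoption and social links were independent."""
    total = sum(obs.values())
    row_totals, col_totals = {}, {}
    for (row, col), n in obs.items():
        row_totals[row] = row_totals.get(row, 0) + n
        col_totals[col] = col_totals.get(col, 0) + n
    return {
        (row, col): row_totals[row] * col_totals[col] / total
        for (row, col) in obs
    }


expected = expected_counts(observed)
for cell, value in expected.items():
    print(cell, "observed:", observed[cell], "expected:", round(value))

# How many times more often than expected a socially linked pair shares an adoption.
both = ("has_common", "has_social")
ratio = observed[both] / expected[both]
print(f"observed / expected = {ratio:.1f}")
```

The rounded expected counts agree with the values in parentheses in Table 2.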
Using the Fisher Exact Test, we obtain a p-value $< 2.2 \times 10^{-16}$, which indicates that we can reject $H_0$ and conclude that the presence of social links is correlated with the presence of adoption links. We also established similar conclusions on a second data set obtained from LiveJournal, but do not reproduce them here due to space considerations.

4 Factorization based on LDA

Our proposed approach is to first factorize the observed adoption graph $E$ into user and item latent factors based on Latent Dirichlet Allocation (LDA), before learning the social correlation matrix $I$. In this section, we describe how we apply LDA to the item adoption prediction problem.

LDA was originally conceived as a way of modeling unigram words in a document corpus [4]. Each document is seen as a collection of words, and the words are generated as a result of the topics each document contains. Using documents and words as an analogy, we view users in the adoption graph as documents, the items they adopt as words, and the latent factors of the items as topics.

We now express a statistical formulation of LDA, and give an alternative linear algebraic formulation later in this section. The latent factor distribution $\theta_u$ of user $u$ follows a symmetric Dirichlet distribution with hyper-parameters $\nu$, as follows:

$$\theta_u \sim \mathrm{Dirichlet}(\nu)$$

The latent factor $z_{v,u} \in \{1, \ldots, T\}$ of each item $v$ that the user $u$ adopts is generated by the multinomial distribution with parameters $\theta_u$, as follows:

$$z_{v,u} \sim \mathrm{Multinomial}(\theta_u)$$

The item $v$ that the user $u$ will adopt is generated by the latent factor $z_{v,u}$ and the latent factor-item distribution $\beta$, as follows:

$$e_{v,u} \sim \beta \mid z_{v,u}$$

The latent factor-item distribution $\beta$ follows a symmetric Dirichlet distribution with hyper-parameters $\phi$, as follows:

$$\beta \mid z_{v,u} \sim \mathrm{Dirichlet}(\phi)$$

In the alternative linear algebraic formulation, LDA is a factorization algorithm that takes as input an $M \times N$ matrix $E$ and outputs an $M \times T$ matrix $\beta$ and a $T \times N$ matrix $\theta$. Here, $T$ is the number of latent factors, $M$ is the number of items, and $N$ is the number of users. Intuitively, $\beta$ represents the latent factors of items, and $\theta$ the latent factors of users.

Suppose our matrix $E$ is as follows,

$$E = \begin{pmatrix} e_{v_1,u_1} & \cdots & e_{v_1,u_N} \\ \vdots & \ddots & \vdots \\ e_{v_M,u_1} & \cdots & e_{v_M,u_N} \end{pmatrix} \tag{4.1}$$

The LDA algorithm takes $E$ as input and outputs $\beta$ and $\theta$.

$$\mathrm{LDA}(E) = \begin{pmatrix} \beta_{v_1|1} & \cdots & \beta_{v_1|T} \\ \vdots & \ddots & \vdots \\ \beta_{v_M|1} & \cdots & \beta_{v_M|T} \end{pmatrix} \begin{pmatrix} \theta_{u_1,1} & \cdots & \theta_{u_N,1} \\ \vdots & \ddots & \vdots \\ \theta_{u_1,T} & \cdots & \theta_{u_N,T} \end{pmatrix} \tag{4.2}$$

where each column in $\beta$ and $\theta$ sums to 1. Solving for these two matrices is fundamentally a likelihood optimization problem subject to the probability constraints. Blei showed that the matrices can be learned using variational expectation maximization [4]. Griffiths and Steyvers subsequently showed that LDA can be learned easily using Gibbs Sampling [12]. When we multiply the matrices $\beta$ and $\theta$, we obtain the dense matrix $E'$, which gives us the probability that each link exists in the original sparse matrix $E$. As shown in Equation 4.3, $E'$ is an approximation of the original $E$, only denser because it also produces probability values for the unobserved links in $E$.

$$E \approx (E' = \beta\theta) \tag{4.3}$$

As the number of latent factors $T$ grows larger, the product of the factorized matrices $E'$ becomes more and more similar to $E$. However, this is not desirable because we lose the generalization properties of factorization algorithms and the solution increasingly over-fits $E$.

5 Factorization with Social Correlation

Factorization by LDA alone is not sufficient to model user adoption of items, as it does not account for the social correlation effect.

Social Correlation Matrix. We propose an $N \times N$ social correlation matrix $I$ to tell us how likely it is that a user will adopt an item based on the latent factors of other users. Each element $i_{u,u'}$ reflects the likelihood that the user $u$ will be correlated to $u'$, in the sense of making adoption decisions based on the latent factors of $u'$. $i_{u,u}$ is the self-dependency of user $u$, or the likelihood that $u$ relies on her own latent factors. The social correlation matrix is derived as:

$$E \approx E' I^T \tag{5.4}$$

To properly reflect the notion of correlation, $I$ cannot just be any $N \times N$ matrix. We require that $I$ must have the following properties:

• It is probabilistic. Each element $i_{u,u'}$ is in the range of $[0, 1]$. For each user $u$, we also have $\sum_{u'} i_{u,u'} = 1$.

• It preserves the social network structure. Since social correlation is based on the underlying social network structure, $i_{u,u'}$ should have a non-zero value only if there is a social link from $u$ to $u'$, i.e., $i_{u,u'} > 0 \Rightarrow (u, u') \in F$. In addition, we also learn the self-dependency values $i_{u,u}$ for each user $u$.

$I$ can be obtained in several ways. The naive way is to calculate $I$ by multiplying $E$ with the inverse of $E'$, i.e., $I^T = (E')^{-1} E$. This naive way will not work for several reasons. First, $I$ may over-fit, leading to poor results in link prediction. The obtained $E' I^T$ will be as sparse as $E$, and thus the factorization does not help in link prediction. Second, $I$ may have values outside the range of $[0, 1]$. In fact, they may range from negative infinity to positive infinity. Such values do not have clear semantics and are hard to interpret. Third, $I$ may have non-zero values even if the users are not connected by social links.

Hence, instead of obtaining an exact $I$, we will obtain an approximated $I$ such that we minimize the error $|E - E' I^T|$, subject to the above-mentioned constraints (probabilistic, social network structure). To learn $I$ with clear semantics, we formulate a statistical learning problem where the goal is to learn the $I$ which maximizes the likelihood of observing the values in $E$. Maximizing the likelihood is the dual equivalent problem of minimizing error.

Since the graphs are sparse, algorithms that scale with the number of observed links would run faster. In the following, we formulate such an algorithm, and show that the complexity is indeed polynomial in the number of observed links.

Models. Once the social correlation matrix $I$ has been learned, we can instantiate two adoption prediction models as follows.
• Social Correlation represents the approach of relying only on social correlation for item adoption. We compute $E' I^T$ (see Equation 5.4) based on the learned $I$, taking into account only the non-diagonal values of $I$, i.e., setting $i_{u,u} = 0, \forall u \in U$.

• Hybrid represents the approach of combining Social Correlation and LDA, by computing $E' I^T$ with the original learned $I$ (with diagonal values retained).

We will experimentally establish the merits of these models with respect to LDA in Section 6.

Special Case. Our proposed formulation subsumes the underlying latent factors model. In the case where $I$ is the identity matrix, with 1’s as diagonal values and 0’s otherwise, $E' I^T$ degenerates to $E'$, which is the outcome of LDA factorization.

5.1 Solution Formulation

We would like to illustrate the formulation of our algorithm using probabilistic explanations. Given a user $u$, we would like to know the probability that she will adopt the item $v$, given the user latent factors $\theta_u$ and the topic latent factors $\beta$. Suppose now that we have the edges of the social graph $F$ and the latent factors of all other users $U$ including herself. We hypothesize that the user $u$ adopts items based on the latent factor preferences of her friends and the user herself. We may restate the equation as follows,

$$P(e_{v,u} \mid \theta, \beta, F) = \sum_{u' \in U} P(e_{v,u'}, f_{u,u'} \mid \theta, \beta, F) \tag{5.5}$$

$$= \sum_{u' \in U} P(e_{v,u'} \mid \theta, \beta)\, P(f_{u,u'} \mid F) \tag{5.6}$$

where $f_{u,u'}$ represents that $u$ has a directed social link to $u'$. Also note that $e_{v,u}$ has become $e_{v,u'}$ on the right hand side of the equations. $P(f_{u,u'} \mid F)$ is either 0 or 1 since we do not model the probability of social links. Equation 5.6, however, is not a valid probability equation because it does not sum to 1. In fact, the values will exceed 1 due to the outer summation over $u'$. The reason is that, besides knowing the probability that $u$ indicates $u'$ as a friend in the social graph, $P(f_{u,u'} \mid F)$, and the probability that $u'$ adopts item $v$ in the adoption graph, $P(e_{v,u'} \mid \theta, \beta)$, we also need an additional component that tells us the probability that $u$ depends on $u'$ in the adoption graph, $P(x_{v,u} = u' \mid I)$ (to be defined shortly). This additional component is the social correlation that we want to determine.

Hence, our proposed factorization model is to introduce the latent variable $x_{v,u}$, which tells us which $u'$ the user $u$ depends on, and the social correlation $I$, where its elements $i_{u,u'}$ give us the probability that $u$ follows the latent factors of $u'$. The special case is $u' = u$, which tells us the self-dependency of $u$. The higher $i_{u,u}$ is, the less the user $u$ depends on social correlation.

Putting the above intuition formally, the probability that $u$ adopts an item $v$ based on the social correlation $I$ is given by:

$$P(e_{v,u} \mid \theta, \beta, F, I) = \sum_{u'} P(e_{v,u'}, x_{v,u} = u', f_{u,u'} \mid \theta, \beta, F, I) \tag{5.7}$$

$$= \sum_{u'} P(e_{v,u'} \mid \theta, \beta)\, P(f_{u,u'} \mid F)\, P(x_{v,u} = u' \mid I) \tag{5.8}$$

For simplicity in the following derivations, we will take,

$$P(e_{v,u'} \mid \theta, \beta) = e'_{v,u'} \tag{5.9}$$

$$P(f_{u,u'} \mid F) = f_{u,u'} \tag{5.10}$$

$$P(x_{v,u} = u' \mid I) = i_{u,u'} \tag{5.11}$$

$$P(e_{v,u} \mid \theta, \beta, F, I) = \sum_{u'} e'_{v,u'}\, f_{u,u'}\, i_{u,u'} \tag{5.12}$$

To learn the social correlation values, we maximize the log likelihood of $e_{v,u}$, $\forall v \in V, \forall u \in U$, using the Expectation Maximization (EM) algorithm [9],

$$P(E \mid \theta, \beta, F, I) = \prod_{v,u} P(e_{v,u} \mid \theta, \beta, F, I) \tag{5.13}$$

$$\log P(E \mid \theta, \beta, F, I) = \sum_{v,u} \log P(e_{v,u} \mid \theta, \beta, F, I) \tag{5.14}$$

$$= \sum_{v,u} \log \sum_{u'} e'_{v,u'}\, f_{u,u'}\, i_{u,u'} \tag{5.15}$$
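To make the matrix form concrete, the following sketch computes $E' = \beta\theta$ (Equation 4.3) and then $E' I^T$ (Equation 5.4) for the Hybrid model, the Social Correlation model, and the identity special case. All numeric values and the helper names (`matmul`, `transpose`, `predict`) are made up for illustration, and the social-link indicator $f_{u,u'}$ is assumed to be already folded into $I$ as zero entries; this is not the authors' implementation.

```python
def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def predict(e_prime, corr, keep_self):
    """Return E' I^T; with keep_self=False the diagonal i_{u,u} is set to 0."""
    n = len(corr)
    used = [[corr[u][w] if (keep_self or u != w) else 0.0 for w in range(n)]
            for u in range(n)]
    return matmul(e_prime, transpose(used))


# Toy values, made up for illustration: M = 2 items, T = 2 factors, N = 3 users.
beta = [[0.9, 0.2],
        [0.1, 0.8]]            # M x T, each column sums to 1
theta = [[1.0, 0.5, 0.0],
         [0.0, 0.5, 1.0]]      # T x N, each column sums to 1
e_prime = matmul(beta, theta)  # E' = beta * theta

# Row u holds i_{u,u'}; each row sums to 1, and i_{u,u'} > 0 only for u' = u
# or for an (assumed) social link from u to u'.
corr = [[1.0, 0.0, 0.0],
        [0.0, 0.4, 0.6],
        [0.0, 0.0, 1.0]]
identity = [[1.0 if u == w else 0.0 for w in range(3)] for u in range(3)]

hybrid = predict(e_prime, corr, keep_self=True)        # diagonal retained
social_only = predict(e_prime, corr, keep_self=False)  # diagonal zeroed
lda_only = predict(e_prime, identity, keep_self=True)  # degenerates to E'

print("hybrid:", hybrid)
print("social correlation:", social_only)
print("identity special case:", lda_only)
```

Entry $(v, u)$ of `hybrid` equals $\sum_{u'} e'_{v,u'} i_{u,u'}$, which is the sum in Equation 5.12 once links absent from $F$ carry zero weight.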
5.2 Expectation Maximization Algorithm

We first show the E Step. The E Step of the EM algorithm infers the latent variables using initial values of $I$,

$$P(x_{v,u} = u' \mid e_{v,u}, f_{u,u'}, \theta, \beta, F, I) = \frac{P(x_{v,u} = u', e_{v,u'}, f_{u,u'} \mid \theta, \beta, F, I)}{\sum_{u''} P(x_{v,u} = u'', e_{v,u''}, f_{u,u''} \mid \theta, \beta, F, I)}$$

$$= \frac{P(e_{v,u'} \mid \theta, \beta)\, P(f_{u,u'} \mid F)\, P(x_{v,u} = u' \mid I)}{\sum_{u''} P(e_{v,u''} \mid \theta, \beta)\, P(f_{u,u''} \mid F)\, P(x_{v,u} = u'' \mid I)}$$

$$= \frac{e'_{v,u'}\, f_{u,u'}\, i_{u,u'}}{\sum_{u''} e'_{v,u''}\, f_{u,u''}\, i_{u,u''}} \tag{5.16}$$

$$= h(u, u', v) \tag{5.17}$$
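The E Step above amounts to normalizing the products $e'_{v,u'} f_{u,u'} i_{u,u'}$ over the candidate users $u'$. The sketch below computes $h(u, u', v)$ for a single user-item pair; the function name `e_step`, the toy values, and the convention that $f_{u,u} = 1$ (so that self-dependency can take part) are assumptions made for illustration. The M Step is not shown.

```python
def e_step(e_prime, f, corr, v, u):
    """Return {u': h(u, u', v)} following Equation 5.16 for one item v and user u."""
    n = len(corr)
    weights = [e_prime[v][w] * f[u][w] * corr[u][w] for w in range(n)]
    total = sum(weights)
    if total == 0.0:
        return {}  # no candidate u' can explain this adoption
    return {w: weights[w] / total for w in range(n) if weights[w] > 0.0}


# Toy values, made up for illustration: 2 items (rows) and 3 users (columns).
e_prime = [[0.9, 0.55, 0.2],
           [0.1, 0.45, 0.8]]

# f[u][w] = 1 if u has a directed social link to w; the diagonal is set to 1
# here as an assumption so that u can also depend on herself.
f = [[1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]

# Initial guess for I: uniform over the entries allowed by f.
corr = [[1.0, 0.0, 0.0],
        [0.0, 0.5, 0.5],
        [0.0, 0.0, 1.0]]

h = e_step(e_prime, f, corr, v=0, u=1)
for w, value in h.items():
    print(f"h(u=1, u'={w}, v=0) = {value:.4f}")
```

The returned values sum to 1 for each pair $(v, u)$ with at least one non-zero product, which is what makes $h$ a valid posterior over $x_{v,u}$.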
Since we have introduced $i_{u,u'}$ as a probabilistic weight, it must sum to one,

$$\forall u \in U, \quad \sum_{u'} i_{u,u'} = 1,$$

5.3 Complexity Analysis

In Section 3, we show that the social and adoption graphs are sparse. That is, the number of edges in the graph is significantly smaller than the total number of possible edges, $|F|$