A Scalable Framework for Modeling Competitive Diffusion in Social Networks

Matthias Broecheler, Paulo Shakarian, and V.S. Subrahmanian
University of Maryland, College Park, MD
Email: {matthias,pshak,vs}@cs.umd.edu

Abstract—Multiple phenomena often diffuse through a social network, sometimes in competition with one another. Product adoption and political elections are two examples where network diffusion is inherently competitive in nature. For example, individuals may choose only one product from a set of competing products (i.e., most people need only one cell-phone provider) or can vote for only one person in a slate of political candidates (in most electoral systems). We introduce the weighted generalized annotated program (wGAP) framework for expressing competitive diffusion models. Applications are interested in the eventual results of multiple competing diffusion processes (e.g., what is the likely number of sales of a given product, or how many people will support a particular candidate). We define the "most probable interpretation" (MPI) problem, which formalizes this need. We develop algorithms to efficiently solve MPI and show experimentally that our algorithms work on graphs with millions of vertices.

I. INTRODUCTION

There are numerous applications where multiple competing phenomena diffuse through a social network (SN). For instance, companies might be interested in tracking product diffusion in a social network. Can we learn how adoption of the iPhone vs. the Blackberry diffuses through a SN like Facebook? Can we learn how support for a political candidate like Gordon Brown diffuses through a network in competition with support for the opposing candidate, David Cameron?

Most past work on social network diffusion has focused on the spread of one phenomenon at a time. Examples include diffusion models for the spread of diseases [1], viral marketing [2], spread of a mutant gene [3], spread of information [4], and the spread of cooperation [5]. There are many applications where we need to reason about the interplay between competing phenomena. We want to know which political candidate is most likely to win, or how many iPhone adoptions will occur within a given SN.

This paper makes the following contributions. In Section II, we introduce Weighted Generalized Annotated Programs (wGAPs). wGAPs extend the annotated logic paradigm [6] with the weighting scheme of [7] and provide a declarative mechanism to express a wide range of diffusion models for competing phenomena in a coherent framework. The weights in wGAPs can be automatically learned from historical data using standard algorithms such as the gradient-descent-based perceptron algorithm proposed in [8]. An "interpretation" is an assignment of truth values (e.g., vertex v1 will vote for the Labour party with 75% certainty, while vertex v2 will vote for

the Tories with 85% certainty). Section III introduces the "most probable interpretation" (MPI) problem and explains why the MPI problem helps answer the above questions. In Section IV, we show that the MPI problem can be solved in three ways: (i) by representing it as a numeric optimization problem which, for many diffusion models, is polynomially solvable but expensive, (ii) via a fixpoint computation process which is also expensive, and (iii) via a novel approach based on graph partitioning. We call this last algorithm the "competing diffusion engine" (CODE) algorithm and show, in Section VI, that CODE efficiently and accurately solves diffusion problems in SNs with over 8 million edges and 2 million vertices.

II. WEIGHTED GAP DIFFUSION MODELS

A weighted generalized annotated program (wGAP) consists of two parts — a generalization of annotated rules [6] together with a set of integrity constraints (ICs). The rules describe the certainty that a given property spreads from one vertex to another, given some information about the vertex itself (e.g., male/female, age group, etc.) and the nature of the links between vertices (e.g., spouse vs. penpal). The ICs constrain the relationships between properties: for instance, if a particular vertex will vote for Brown with 75% certainty and for Cameron with 50% certainty, then this violates a constraint, since each person has only one vote.

A. Syntax

Throughout this paper, we assume the existence of two arbitrary but fixed disjoint sets VP, EP of vertex and edge predicate symbols respectively. Each vertex predicate symbol has arity 1 and each edge predicate symbol has arity 2.

Definition 1: A social network (SN) is a 5-tuple (V, E, ℓvert, ℓedge, w) where:
1) V is a set whose elements are called vertices.
2) E ⊆ V × V is a multi-set of edges.
3) ℓvert : V → 2^VP is a vertex labeling function.
4) ℓedge : E → EP is an edge labeling function.¹
5) w : E × EP → [0, 1] is an edge weight function.

¹ Each edge e ∈ E is labeled by exactly one predicate symbol from EP, but there can be multiple edges with different labels between a pair of vertices.

We now present a brief example of a social network (SN).

Example 1 (Election Example): Let VP = {voteLabour, voteTory, likeBrown, likeCam, suptBrown, suptCam,


Fig. 1. Example social network. Square vertices are Brown supporters, diamond vertices are Cameron supporters, solid edges are knows relationships, dashed edges are idol relationships, and dotted edges are olderRel relationships.

student, employee, young} denote properties of vertices (which party they vote for, whom they like or actively support, and demographics). Let EP = {knows, mentor, olderRelative, idol} denote relationships between vertices. Consider the social network in Figure 1. For each square vertex V, ℓvert(V) = {suptBrown}, and for each diamond vertex V, ℓvert(V) = {suptCam}. For edges, if edge E is a solid line, then ℓedge(E) = knows; if a dashed line, ℓedge(E) = idol; and if a dotted line, ℓedge(E) = olderRelative. All edge weights are 1.

Note that our definition of social networks is much broader than those in [1], [2], [9], [10], which often do not consider either ℓedge or ℓvert. Demographics, party affiliations, and other properties are all key indicators of how someone might vote and should not be ignored. Likewise, relationship types are crucial in determining the possible level of influence between individuals.

We now recall the definition of annotated terms from [6] to develop a general logical language for diffusion models. We assume the existence of a set AVar of variable symbols ranging over the unit real interval [0, 1] and a set F of function symbols, each of which has an associated arity. We start by defining annotations.

Definition 2 (annotation term): (i) Any member of [0, 1] ∪ AVar is an annotation. (ii) If f is an n-ary function symbol over [0, 1] and t1, ..., tn are annotations, then so is f(t1, ..., tn).²

We define a separate language whose constants are members of V and whose predicate symbols consist of VP ∪ EP. We assume the existence of a set V of variable symbols ranging over the constants (vertices). No function symbols are present. A term is any constant (vertex) or variable. If A = p(t1, ..., tn) is an atom and p ∈ VP (resp. p ∈ EP), then A is called a vertex (resp. edge) atom. We now define weighted rules based on the model proposed in [7].

Definition 3 (annotated atom/weighted rule): (i) If A is an atom and µ is an annotation, then A : µ is an annotated atom. (ii) If A0 : f(µ1, ..., µn), A1 : µ1, ..., An : µn are annotated atoms and wt is a real number in [0, 1], then

A0 : f(µ1, ..., µn) ←[wt] A1 : µ1 ∧ ... ∧ An : µn

is called a weighted rule, where wt (written above the arrow in the paper and in brackets here) is the weight of this rule. When n = 0, the above rule is called a fact.

For instance, we might know that an older relative has more influence on an individual than a mere acquaintance. Hence,

² In the following, we assume f to be a conic function, but more general formulations are possible.

we would weigh the diffusion of political opinion across olderRelative edges higher than across knows edges. The use of weights provides great flexibility: for example, [11] proposes "big seed" marketing that combines both viral and mass-marketing techniques. Our framework is sufficiently general to allow a user to model both processes simultaneously, letting users fine-tune strategies that take maximal advantage of both techniques.³ However, as mentioned earlier, a set of weighted rules might allow us to infer that a particular vertex will vote for both Brown and Cameron, which of course is impossible. Integrity constraints address this problem.

³ Where do rule weights come from? One possibility is that the user could arbitrarily select rule weights to determine the outcome of the competitive diffusion processes under different circumstances. A similar situation occurs in other, less general diffusion models such as that of [3], where the user must assign a "fitness" to each of the competitors in the diffusion process. When that model is applied to game theory, as in [12], the authors relate "fitness" to the payoff associated with a game. A similar intuition can apply to rule weights in the more general framework presented in this paper. Additionally, in Section V, we discuss how to learn weights from real-world data. To our knowledge, there has been no similar scheme to learn the "fitness" in the context of the competitive diffusion model of [3].

Definition 4 (integrity constraint/wGAP): Given a set of annotated atoms (not necessarily ground) {A1 : µ1, ..., An : µn}, a function f, an inequality/equality symbol op ∈ {=, ≠, <, >, ≤, ≥}, and a real number c, then

{A1 : µ1, ..., An : µn} : f(µ1, ..., µn) op c

is an integrity constraint. A weighted GAP (wGAP) is a pair (Π, IC) where Π is a finite set of rules and IC is a finite set of integrity constraints.

Every social network SN = (V, E, ℓvert, ℓedge, w) can be represented by a wGAP (ΠSN, ∅) where ΠSN = {q(v) : 1 ←[1] | v ∈ V ∧ q ∈ ℓvert(v)} ∪ {ep(v1, v2) : w((v1, v2), ep) ←[1] | (v1, v2) ∈ E ∧ ℓedge((v1, v2)) = ep}.

Definition 5 (embedded social network): A social network SN is said to be embedded in a wGAP (Π, IC) iff ΠSN ⊆ Π.

We see from the definition of ΠSN that all social networks can be represented as wGAPs. When we augment ΠSN with other rules, such as rules describing how certain properties diffuse through the social network, we get a program Π ⊇ ΠSN that captures both the structure of the SN and the diffusion principles. Here is a small example.

Example 2 (elections): The wGAP (Πelect, IC) might consist of ΠSN using the social network of Figure 1 plus the rules:

1) voteLabour(A) : X ←[0.7] suptBrown(A) : X
2) voteTory(A) : X ←[0.5] likeCam(A) : X
3) voteLabour(B) : X ←[0.1] voteLabour(A) : X ∧ knows(B, A) : 1
4) voteLabour(B) : X ←[0.25] voteLabour(A) : X ∧ mentor(B, A) : 1 ∧ student(B) : 1
5) voteTory(B) : X ←[0.15] voteTory(A) : X ∧ mentor(B, A) : 1 ∧ employee(B) : 1
6) voteLabour(B) : X ←[0.7] voteLabour(A) : X ∧ olderRel(B, A) : 1
7) voteTory(B) : X ←[0.8] voteTory(A) : X ∧ idol(B, A) : 1 ∧ young(B) : 1

Rule 1 says that if A supports Brown, then A will vote for Labour; this inference carries rule weight 0.7. Rule 2 is similar for Cameron followers. Other rules, such as rule 3, depend on the edge

relationships in the graph. Rule 3 states that if vertex B has an outgoing knows edge to a neighbor A who votes for Labour, then vertex B votes for Labour with weight 0.1. The ICs for this wGAP are:

1) {voteLabour(V) : X1, voteTory(V) : X2} : X1 + X2 ≤ 1

Constraint 1 says that the total degree of belief in a vertex voting for Labour and for Tory is at most 1, since a person has only one vote.
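For concreteness, rules and ICs of this kind can be encoded as plain data. The following Python sketch uses illustrative class and field names of our own choosing; it is not the authors' implementation.

from dataclasses import dataclass

@dataclass
class Rule:
    weight: float        # wt in [0, 1]
    head: tuple          # e.g. ('voteLabour', 'A')
    body: list           # list of annotated body atoms

@dataclass
class IntegrityConstraint:
    atoms: list          # e.g. [('voteLabour', 'V'), ('voteTory', 'V')]
    op: str              # one of =, !=, <, >, <=, >=
    bound: float

# Rules 1 and 3 of Example 2 and the one-vote constraint:
rules = [
    Rule(0.7, ('voteLabour', 'A'), [('suptBrown', 'A')]),
    Rule(0.1, ('voteLabour', 'B'), [('voteLabour', 'A'), ('knows', 'B', 'A')]),
]
one_vote = IntegrityConstraint([('voteLabour', 'V'), ('voteTory', 'V')], '<=', 1.0)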

III. THE MOST PROBABLE INTERPRETATION PROBLEM

Our goal is to compute the most likely result of competitive diffusion described by a wGAP within a given social network. An interpretation is one way of assigning certainties to vertex atoms. Interpretations of course must satisfy the ICs. We will show how to assign a probability to each interpretation. The "most probable interpretation" then is the interpretation that has the highest probability of being the correct interpretation and hence of being the most likely outcome of the competitive diffusion process. In our election example, different interpretations might specify the certainty with which different voters might vote for Brown vs. Cameron. The most probable interpretation then is the one that reflects the most likely outcome.

Definition 6 (Interpretation): Given the set of ground atoms, atoms, an interpretation I : atoms → [0, 1] is a mapping of ground atoms to real numbers in [0, 1].

We now define a distance from satisfaction of a rule by an interpretation: if this number is 0, the rule is fully satisfied by the interpretation; as the distance increases, the rule is less and less satisfied by the interpretation.

Definition 7 (Distance from Satisfaction): Given an interpretation I and weighted rule R = A0 : f(µ1, ..., µn) ←[wt] A1 : µ1 ∧ ... ∧ An : µn, the distance from satisfaction of rule R with respect to interpretation I, d(R, I), is wt · max(0, f(I(A1), ..., I(An)) − I(A0)).

Example 3: Consider the following ground instance of rule 1 from Example 2: voteLabour(V11) : X ←[0.7] suptBrown(V11) : X. We shall refer to this ground rule as R1. Suppose we have an interpretation I where I(suptBrown(V11)) = 1.0 and I(voteLabour(V11)) = 0.5. Then the distance from satisfaction is d(R1, I) = 0.7 × max(0, 1 − 0.5) = 0.35.

The idea of probability of satisfaction of logical formulas was introduced in 1964 in [13] and later studied in seminal papers such as [14] and many subsequent papers over the last 45 years. Our notion of distance from satisfaction is a variant of such efforts. We now define satisfaction of integrity constraints. Note that here we apply a more traditional definition of satisfaction.

Definition 8 (Satisfaction of Integrity Constraints): Given an interpretation I and integrity constraint C = {A1 : µ1, ..., An : µn} : f(µ1, ..., µn) op c (op ∈ {=, ≠, <, >, ≤, ≥}), the distance from satisfaction of the constraint C w.r.t. interpretation I, denoted d(C, I), is 0 if f(I(A1), ..., I(An)) op c holds and ∞ otherwise. Hence, an interpretation either satisfies a constraint (distance 0) or does not (distance ∞).
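Both distance notions are directly computable. The following minimal Python sketch (with helper names that are our own) reproduces the arithmetic of Example 3 and of Example 4 just below.

import math

def rule_distance(wt, f_body, head_value):
    """d(R, I) = wt * max(0, f(I(A1),...,I(An)) - I(A0)), per Definition 7."""
    return wt * max(0.0, f_body - head_value)

def constraint_distance(f_value, op, c):
    """d(C, I) = 0 if f(I(A1),...,I(An)) op c holds, infinity otherwise."""
    holds = {'=': f_value == c, '!=': f_value != c, '<': f_value < c,
             '>': f_value > c, '<=': f_value <= c, '>=': f_value >= c}[op]
    return 0.0 if holds else math.inf

# Example 3: I(suptBrown(v11)) = 1.0, I(voteLabour(v11)) = 0.5
print(rule_distance(0.7, 1.0, 0.5))               # 0.35
# Example 4: I(voteLabour(v6)) = 0.9, I(voteTory(v6)) = 0.2; 1.1 > 1
print(constraint_distance(0.9 + 0.2, '<=', 1.0))  # inf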

Example 4: Suppose interpretation I assigns ground atom voteLabour(V6) the value 0.9 and voteTory(V6) the value 0.2. As these two values sum to 1.1, the distance from satisfaction of I w.r.t. constraint 1 from Example 2 is ∞.

Following [7], we now extend distance from satisfaction to a wGAP. This definition uses an arbitrary distance function δ.⁴

Definition 9: Given a wGAP (Π, IC) and interpretation I, the distance from satisfaction of (Π, IC) w.r.t. interpretation I, denoted dδ(Π ∪ IC, I), is defined as:

dδ(Π ∪ IC, I) = δ([d(R1, I), ..., d(Rn, I), d(C1, I), ..., d(Cm, I)]^T, 0̃)

where δ is an arbitrary distance function, {R1, ..., Rn} is the set of all ground rules for rules in Π, and {C1, ..., Cm} is the set of all ground constraints in IC. Thus, the distances from satisfaction of all ground rules and constraints are entered into a single vector, and we measure its distance from the zero vector 0̃ w.r.t. an arbitrary user-defined distance function.

We can now use this notion of "distance from satisfaction" to define a probability distribution over the space of all interpretations.

Definition 10: Given a wGAP (Π, IC) and interpretation I, the probability of I given Π and IC is defined as:

P(I | Π, IC) = (1/Z) exp(−dδ(Π ∪ IC, I))

where Z = ∫_{I′} exp(−dδ(Π ∪ IC, I′)) is the familiar normalizing constant that integrates over all possible interpretations. The higher the distance from satisfaction, the lower the probability of an interpretation. Note that an interpretation which violates any of the constraints in IC has probability 0.

Our key intuition is that the truth values assigned to ground atoms by a most probable interpretation accurately resemble the result of the competing diffusion processes. We formalize this intuition in the Most Probable Interpretation (MPI) problem: given a program Π and integrity constraints IC as input, compute an interpretation I such that there is no I′ with P(I′ | Π, IC) > P(I | Π, IC).
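Since Z is a constant that does not depend on I, ranking interpretations only requires the unnormalized quantity exp(−dδ(Π ∪ IC, I)). A small sketch, assuming Euclidean δ:

import math

def unnormalized_probability(rule_distances, constraint_distances):
    # Euclidean norm of the vector of all ground rule/constraint distances
    d = math.sqrt(sum(x * x for x in rule_distances + constraint_distances))
    return math.exp(-d)  # evaluates to 0.0 whenever a constraint distance is inf

print(unnormalized_probability([0.35], [0.0]))      # exp(-0.35) ~= 0.705
print(unnormalized_probability([0.35], [math.inf]))  # 0.0: a violated IC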

IV. ALGORITHMS

In this section, we present two solutions to MPI. The first algorithm uses non-ground fixpoint computations and numeric optimization to repeatedly apply the rules in the wGAP until convergence. This standard algorithm guarantees that we always find an exact solution to MPI. In contrast, the CODE algorithm is highly scalable and partitions a dependency graph determined by (Π, IC) to come up with a fast approximation to the correct solution to MPI.

⁴ A distance function δ on a set X satisfies δ(x, x) = 0, δ(x, y) = δ(y, x), and δ(x, z) ≤ δ(x, y) + δ(y, z). In the following we assume δ to be the Euclidean or Manhattan distance.

A. Social Network Fixpoint (SNF) Algorithm

The SNF algorithm repeatedly applies the rules in (Π, IC) until a fixpoint is reached, i.e., the diffusion process converges. We first define non-ground interpretations, which allow the algorithm to consider only those atoms relevant to the diffusion process and therefore save memory and time.

Definition 11: A non-ground interpretation is a partial mapping NG : Atoms → [0, 1]. NG represents an interpretation grd(NG) defined as follows: grd(NG)(A) = max{NG(A′) | A is a ground instance of A′}, and grd(NG)(A) = 0 when there is no atom A′ which has A as a ground instance and for which NG(A′) is defined.

Thus, non-ground interpretations are compressed representations of interpretations such that the number of atoms NG keeps track of is always less than or equal to that of a ground interpretation. Before defining our fixpoint operator, we define an optimization problem which computes the MPI for a fixed set of ground rules and can be solved by a standard conic optimization solver [15].

Definition 12 (Diffusion Optimization Problem): Given a wGAP (Π, IC), we define the optimization problem DOP(Π, IC). For each ground atom Ai, we have a variable Xi; let X be the set of all such Xi. For each ground instance Rθ (where θ is a substitution) of the form A0 : f(µ1, ..., µn) ←[wt] A1 : µ1 ∧ ... ∧ An : µn of a rule R in Π, let dr(Rθ) be defined as wt × max(0, f(X1, ..., Xn) − X0), where Xi is the variable associated with Ai. The Diffusion Optimization Problem assumes that all ground instances of rules in Π are enumerated in some arbitrary but fixed order R1, ..., Rk. It is defined as follows:

Minimize δ([dr(R1), ..., dr(Rk)]^T, 0̃) subject to:
1) For each constraint Cj = {A1 : µ1, ..., As : µs} : fj(µ1, ..., µs) op cj, the constraint fj(X1, ..., Xs) op cj.
2) For each Xi ∈ X, 0 ≤ Xi ≤ 1.

Example 5: Consider a program Πsm consisting of the embedding of Figure 1 and rule R1 from Example 3. Let ICsm consist of the integrity constraint from Example 2. We can create DOP(Πsm, ICsm) as follows. Let variable X1 be associated with atom voteLabour(V11), X2 with voteTory(V11), and X3 with suptBrown(V11). Then the objective function for DOP is δ([0.7 · max(0, X3 − X1)]^T, 0̃), which is minimized subject to X1 + X2 ≤ 1.
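The DOP of Example 5 is a small convex program. The sketch below encodes it with the cvxpy modeling library (our choice for illustration; the paper's implementation uses MOSEK, which can also serve as a cvxpy backend).

import cvxpy as cp

x1 = cp.Variable()  # truth value of voteLabour(v11)
x2 = cp.Variable()  # truth value of voteTory(v11)
x3 = 1.0            # suptBrown(v11) is a fact, fixed to 1

# dr(R1) = 0.7 * max(0, X3 - X1): the rule's distance from satisfaction
dr_r1 = 0.7 * cp.pos(x3 - x1)

# Euclidean delta: norm of the vector of rule distances
objective = cp.Minimize(cp.norm(cp.hstack([dr_r1]), 2))
constraints = [x1 + x2 <= 1, x1 >= 0, x1 <= 1, x2 >= 0, x2 <= 1]

cp.Problem(objective, constraints).solve()
print(x1.value, x2.value)  # x1 is driven to 1, making the distance 0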

We now define the operator ΓΠ,IC that maps non-ground interpretations to non-ground interpretations by expanding them as the diffusion spreads across the network.

Definition 13: Given a wGAP (Π, IC) and non-ground interpretation NG, we define ΓΠ,IC(NG)(A) = DOP(Π′, IC′)(A) where Π′ = {R | R is a ground instance of a rule A0 : f(µ1, ..., µm) ←[wt] A1 : µ1 ∧ ... ∧ Am : µm ∈ Π where ∀i ∈ {1, ..., m}, NG(Ai) ≠ 0} and IC′ = {C | C is a ground instance of an integrity constraint {A1 : µ1, ..., Ak : µk} : f(µ1, ..., µk) op c ∈ IC where ∀i ∈ {1, ..., k}, NG(Ai) ≠ 0}.

Example 6: Suppose there is a vertex V in the social network of Figure 1 such that student(V) is annotated with a non-zero number. As this predicate does not appear in any rule head (in the program from Example 2), the annotation will never change. Hence, using the operator Γ, we never consider these ground atoms, and we have reduced the number of variables in the DOP constraints by the number of nodes in the network (not counting other, similar vertex atoms).

Let NG∅ be the non-ground interpretation that assigns 0 to all atoms. We define multiple iterations of ΓΠ,IC as follows:
• ΓΠ,IC ↑ 0 = ΓΠ,IC(NG∅) (initialization)
• ΓΠ,IC ↑ (i + 1) = ΓΠ,IC(ΓΠ,IC ↑ i) (iteration)

The convergence state of the diffusion process, ΓΠ,IC ↑ ω, is reached in a finite number of steps: equal to the number of ground atoms in the worst case, but typically far fewer in practice. Hence, the operator achieves the effect of minimally grounding out (Π, IC). As the non-ground interpretation returned by the operator only assigns values to ground atoms, correctness follows immediately.

Proposition 1 (Correctness of the SNF Algorithm):
• If there is a solution to DOP, then the non-ground interpretation NGsol defined by NGsol(Ai) = Xi for each atom Ai is the most probable non-ground interpretation, and grd(NGsol) is the most probable interpretation.
• If there is no solution to DOP, then the distance from satisfaction of the most probable non-ground interpretation NGsol and of the most probable interpretation grd(NGsol) is ∞, and there is no interpretation for which the distance from satisfaction of all integrity constraints is finite.
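Operationally, the SNF computation is a simple loop around the DOP solver. The sketch below is schematic; the grounding and solving helpers are assumptions, not the paper's code.

def snf(program, ics, ground_relevant, solve_dop):
    # non-ground interpretation: maps atoms to [0,1]; absent atoms are 0
    ng = {}
    while True:
        # Gamma operator: ground only rules/ICs whose body atoms are non-zero
        rules, constraints = ground_relevant(program, ics, ng)
        updated = solve_dop(rules, constraints)  # the DOP of Definition 12
        if updated == ng:                        # fixpoint Gamma^omega reached
            return updated
        ng = updated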

B. Scalable Algorithm

Though the algorithm presented above is much more efficient than a naive version that grounds out all rules, it still grows (approximately) linearly with the size of the social network (|V| + |E|) for most diffusion models. Hence, the number of variables and constraints involved is expected to be of the order O(|V| + |E|). Taking the run times of standard optimization algorithms into account, we can expect the (polynomial) running time to be O((|V| + |E|)^3.5), which is infeasible for larger social networks. To improve the scalability of the MPI algorithm, we split the optimization problem into smaller ones by partitioning the dependency graph of the wGAP and then iteratively solving those smaller problems until convergence. By splitting the dependency graph into relatively small isolated components, we hope to find a good approximate solution by optimizing each component independently.

Let (Π, IC) be a ground wGAP. The (rule-atom) dependency graph for (Π, IC) is a weighted bipartite graph GΠ,IC where each element of Π ∪ IC as well as each ground atom occurring therein is a vertex; there is an edge from atom vertex a to a rule or constraint vertex r iff a occurs in r. When the destination of the edge is a rule, the weight of the edge is the weight of the rule; otherwise it is a fixed constant real number φ, e.g., a multiple of the largest rule weight.

Algorithm CODE
Input: program Π, integrity constraints IC, initial interpretation I
Output: approximate most probable interpretation I
  Π ← Π ∪ ⋃_{R∈Π} {r | r is a grounding of R ∧ d(r, I) > 0}
  A ← {a | a is a ground atom ∧ ∃r ∈ Π : a ∈ r}
  IC ← IC ∪ ⋃_{C∈IC} {c | c is a grounding of C ∧ c ⊂ A ∪ domain(I)}
  finalRounds ← 0; expansion ← 0
  repeat
      GΠ,IC ← (V, E) where V = A ∪ Π ∪ IC and E = {(a, r) | r ∈ Π ∪ IC ∧ a ∈ A ∩ r}
      wE(a, r) ← w(r) if r ∈ Π, φ if r ∈ IC, for all (a, r) ∈ E
      P ← cluster(GΠ,IC, wE, wV = 1, B)
      numGrounded ← 0; atomChange ← 0
      for each P ∈ P
          res ← solveDOP(Π ∩ {r | ∃a ∈ P : a ∈ r}, IC ∩ {c | ∃a ∈ P : a ∈ c})
          for each a ∈ P where res(a) > β
              numGrounded ← numGrounded + 1
              A ← A ∪ {a}
              Π ← Π ∪ ⋃_{R∈Π} {r | ∃θ : r = Rθ ∧ a ∈ r ∧ r ⊂ A ∪ domain(I)}
              IC ← IC ∪ ⋃_{C∈IC} {c | ∃θ : c = Cθ ∧ a ∈ c ∧ c ⊂ A ∪ domain(I)}
              atomChange ← atomChange + |res(a) − I(a)|
              I(a) ← res(a)
      expansion ← α × expansion + numGrounded
      if expansion < θA × |A| then finalRounds ← finalRounds + 1
  until (expansion < θA × |A|) ∧ (atomChange < θB × |A| ∨ finalRounds > θC)
  return I

Fig. 2. Scalable CODE algorithm to approximate the MPI

Algorithm Dependency Graph Clustering
Input: dependency graph GΠ,IC, edge weights wE, vertex weights wV, cluster vertex weight bound B
Output: partition P of GΠ,IC
  c(v) ← {v} for all v ∈ V
  ΔM(u, t) ← [2W({u}, c(t)) − 2W({u}, c(u) − u)] / (2|E|) − 2 degw(u)[degW(c(t)) − degW(c(u) − u)] / (2|E|)²
  size(u) ← Σ_{x ∈ c⁻¹(c(u))} wV(x)
  repeat
      l ← 0
      for all u ∈ V
          x ← argmax_{t ∈ ngh(u), size(t)+wV(u) ≤ B} ΔM(u, t)
          if ΔM(u, x) > 0
              c(x) ← c(x) ∪ {u}; c(u) ← c(u) − {u}; c(u) ← c(x); l ← l + 1
  until l < δ × |V|
  GC ← (V′ = C, E′) where E′ = {(x, y) | ∃u ∈ c⁻¹(x), v ∈ c⁻¹(y) : (u, v) ∈ E}
  wE′((x, y)) ← Σ_{u ∈ c⁻¹(x)} Σ_{v ∈ c⁻¹(y)} w((u, v))
  wV′(x) ← Σ_{u ∈ c⁻¹(x)} wV(u)
  if |C| / |V| > γ then
      P ← {Px}_{x ∈ C} where Px = {u | c(u) = x}
  else
      P′ ← cluster(GC, wE′, wV′, B)
      P ← {PX}_{X ∈ P′} where PX = {u | c(u) ∈ X}
  return P

Fig. 3. Dependency graph clustering algorithm
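A condensed Python rendering of the Figure 2 control flow is given below; the helpers initial_grounding, dependency_graph, cluster, restrict, solve_dop, and expand are placeholders for the steps described in the following paragraphs, not real library calls.

def code_mpi(program, ics, I, B, alpha, beta, theta_A, theta_B, theta_C):
    rules, ics_g, atoms = initial_grounding(program, ics, I)
    expansion, final_rounds = 0.0, 0
    while True:
        graph = dependency_graph(rules, ics_g, atoms)  # weighted bipartite graph
        blocks = cluster(graph, bound=B)               # Figure 3
        num_grounded, atom_change = 0, 0.0
        for block in blocks:
            # solve the DOP restricted to this block's atoms
            res = solve_dop(restrict(rules, block), restrict(ics_g, block))
            for a in block:
                if res[a] > beta:                      # atom becomes explicit
                    num_grounded += 1
                    atoms, rules, ics_g = expand(a, atoms, rules, ics_g, I)
                    atom_change += abs(res[a] - I.get(a, 0.0))
                    I[a] = res[a]
        expansion = alpha * expansion + num_grounded   # discounted expansion
        if expansion < theta_A * len(atoms):
            final_rounds += 1
            if atom_change < theta_B * len(atoms) or final_rounds > theta_C:
                return I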

The CODE algorithm shown in Figure 2 extends the SNF algorithm presented above as follows. First, the sets of ground rules Π, constraints IC, and atoms A are initialized by considering only those ground atoms that are relevant to the non-ground interpretation. At each iteration of CODE, we construct the dependency graph GΠ,IC for the current wGAP and compute its weight function. We then partition the graph GΠ,IC into a set of smaller, disjoint subgraphs P of size at most B using the dependency graph clustering algorithm shown in Figure 3 and described below. For each subgraph P ∈ P we construct a numeric optimization model, DOP, as before, but only include those ground atoms which are vertices in P and fix all other atoms a to their truth value I(a) (i.e., they become constants in the numeric representation), where I is the current non-ground interpretation. This results

in |P| smaller optimization problems which are solved individually to update the interpretation I. If the non-ground interpretation needs to be expanded, i.e., another atom gets explicitly grounded because its inferred truth value exceeds the threshold β, then we also update the set of ground rules and atoms. The process of graph partitioning, solving the individual DOPs, and updating the interpretation is repeated until a convergence criterion is satisfied, at which point I is returned as the (approximate) most probable interpretation. To test for convergence, we track the number of groundings in the current and previous iterations through the "expansion" variable, discounted by a factor α at each update. In addition, we compute the change in the interpretation after solving the DOPs for all ground atoms and accumulate the difference in the "atomChange" variable. To converge, both metrics must fall below their thresholds (θA, θB) or the maximum number of iterations θC must be exceeded. By adjusting these thresholds we can trade off efficiency against accuracy of the approximation. We revisit the convergence criterion and study the trade-off between fast convergence and solution quality in the experimental section.

As we are using non-ground interpretations for efficiency, the number of ground rules and constraints can increase during the execution of the algorithm, as explained above. In the scalable algorithm, this not only affects the numeric representation but also entails changes to the dependency graph. As new ground atoms, rules, and constraints get added, we therefore have to repartition the graph.

We now address the issue of partitioning the dependency graph. Given an undirected graph G = (V, E, wE, wV) where E ⊆ V × V is the set of edges, wE : E → R is an edge weight function, and wV : V → R is a vertex weight function, graph partitioning is typically defined as the problem of partitioning the set of vertices V into k disjoint subsets P1, ..., Pk, with ⋃i Pi = V and Pi ∩ Pj = ∅ for all i ≠ j, of roughly equal size, such that the total weight of edges between partitions, defined as the edge cut Σ_{v ∈ Pi, u ∈ Pj, i ≠ j} w(u, v), is minimized. In graph partitioning, it is usually assumed that the parameter k is given. However, what would k be in our case? On the one hand, we want each partition subgraph to be small so that the corresponding numeric optimization problem can be solved quickly. This suggests choosing a large k. On the other hand, we need to ensure that the partition subgraphs are relatively self-contained, otherwise we might get poor approximations or require many iterations to converge. This suggests keeping k small. To avoid having to choose the number of partition blocks k a priori, we resort to community finding algorithms, which try to determine the densely connected and relatively isolated subgraphs that comprise a graph. Community finding algorithms are typically studied in the context of determining the groups or communities within a SN. They do not require a size parameter and do not guarantee balanced partitions, but aim to find a "natural" partition of the graph based on its topology.

The most commonly used quality measure to identify such "natural" subgraphs is modularity.

Definition 14 (Modularity): The modularity of a partition {P1, ..., Pk} of an undirected graph G = (V, E, wE) with weight function w : E → R is defined as

mod({P1, ..., Pk}) = Σ_{Pi} [ W(Pi, Pi) / (2|E|) − degW(Pi)² / (2|E|)² ]

where degw(v) = Σ_{x∈V} w((v, x)) is the weighted degree of vertex v, W(X, Y) = Σ_{x∈X, y∈Y} w((x, y)) is the sum of edge weights connecting two sets of vertices X, Y ⊆ V, and degW(X) = Σ_{x∈X} degw(x) is the weighted degree of a set of vertices X ⊆ V. Intuitively, modularity measures the difference between the actual and the expected intra-block edge weight.

Our dependency graph clustering algorithm, displayed in Figure 3, is based on greedy modularity optimization. Our approach leverages the intuition of the algorithm proposed by Blondel et al. [16]: it constructs a hierarchy of partitioned graphs, building successively larger subgraphs by greedily moving vertices into the "best" partition block according to the modularity measure at each level. Initially, each vertex is assigned to a unique partition block. The algorithm then repeatedly iterates over all vertices u in the graph and determines whether moving u into any neighboring block increases modularity. If so, u is moved into the block which yields the largest increase and the block assignments are updated. Once the number of vertex moves falls below a certain threshold δ, we construct a contracted version of the graph by collapsing all vertices assigned to the same block into one, and repeat the process until progress falls below the threshold γ. Our algorithm differs from [16] in that we include a hard constraint preventing partition blocks from growing beyond a certain size bound B and relax the convergence criterion to improve efficiency. By enforcing an upper bound on the size of the clusters we ensure that the resulting DOP optimization problems are small enough to be solved efficiently.
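For reference, Definition 14 can be computed in a few lines. The sketch below (illustrative, not the authors' code) takes an undirected edge list in which each edge appears once, so within-block weights are doubled to match the ordered-pair sums in the definition.

from collections import defaultdict

def modularity(edges, block):
    """edges: list of (u, v, w); block: dict mapping vertex -> block id."""
    m = len(edges)                 # |E| as in Definition 14
    within = defaultdict(float)    # W(Pi, Pi) per block
    degree = defaultdict(float)    # deg_W(Pi) per block
    for u, v, w in edges:
        degree[block[u]] += w
        degree[block[v]] += w
        if block[u] == block[v]:
            within[block[u]] += 2 * w  # ordered pairs (u,v) and (v,u)
    return sum(within[p] / (2 * m) - (degree[p] / (2 * m)) ** 2
               for p in degree)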

V. DIFFUSION MODEL FITTING

Suppose we are given a program Π with unknown weights and a set of constraints IC. Given an observed diffusion process that we would like to model with the program Π plus constraints IC, we can fit the weights to the observation by following the standard procedure of maximizing the likelihood of the observation according to the defined probability measure. Observing a diffusion process on a given social network means that we are given the "true" interpretation IT, where the values IT(A) are known for all ground atoms A. Hence, we want to maximize the probability P(IT | Π, IC), which is equivalent to maximizing its logarithm:

log P(IT | Π, IC) = −dδ(Π ∪ IC, IT) − log Z

We optimize the above function using the gradient-descent-based perceptron algorithm proposed in [8]. Computing the gradient of the distance function with respect to a single weight is simple, as it only requires differentiation of the distance and rule annotation functions. For the normalizing constant (called the "partition function"), we apply the frequently used approximation of the gradient of the log partition function by the MAP state of the probability distribution, since computing the expectation is intractable in general. Computing the MAP state, however, only requires computing the most likely interpretation under the current set of weights. While we make no contribution to the actual weight learning algorithm, we note that the CODE algorithm can be used to efficiently approximate the MAP state and thereby greatly speed up weight learning on large social networks.
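Assuming Manhattan distance for δ, dδ is linear in each rule weight, and the update takes the following schematic form; the helper unweighted_dist and the learning rate are our assumptions, not details given in the paper.

def perceptron_step(weights, unweighted_dist, observed_I, map_I, lr=0.1):
    """weights: rule -> weight; unweighted_dist(rule, I): sum over the rule's
    ground instances of max(0, f(...) - I(head))."""
    for rule in weights:
        d_obs = unweighted_dist(rule, observed_I)  # distance under I_T
        d_map = unweighted_dist(rule, map_I)       # distance under MAP state
        # gradient ascent on log P(I_T): d/dw = -d_obs + E[d] ~ -d_obs + d_map
        weights[rule] += lr * (d_map - d_obs)
        weights[rule] = min(1.0, max(0.0, weights[rule]))  # clamp to [0, 1]
    return weights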


Fig. 4. Time comparison of approximate versus exact algorithm

VI. IMPLEMENTATION AND EXPERIMENTS

We implemented the non-ground exact and approximate MPI algorithms and compared their performance on our running voting diffusion model applied to synthetic social networks of varying size. The algorithms are implemented in Java, extending the PSL inference framework [7] and using the DOGMA graph database library [17] as well as the MOSEK optimization toolbox (http://www.mosek.com). We first describe our synthetic network generation method and experimental setup before we evaluate the experimental results.

A. Synthetic Multi-relational Network Generation

Our experiments require multi-relational networks, that is, networks with multiple types of edge labels to distinguish between the different types of relationships which are relevant to the diffusion process. Our method for generating synthetic multi-relational networks relies on well-established characteristics of social networks, such as power-law degree distributions, and proceeds as follows. The user specifies a list of edge types, declares each to be of either power-law or random degree distribution, and gives the (approximate) number of vertices N of the synthetic network to be generated. For each power-law distributed edge type t, the user furthermore specifies parameters γ, α, and for each of the N nodes we sample the in- and out-degree for edges of type t from the distribution D(k) = α × k^(−γ). We then randomly connect incoming with outgoing edges of the same type until no further matches are possible. For each randomly distributed edge type t, the user specifies the expected degree d; we randomly sample (d/2) × N pairs (u, v) from the list of vertices [1, ..., N] and create an edge of type t between u and v.

Note that we adjust the user-specified N by the expected number of vertices with degree 0 and remove all disconnected vertices at the end. Hence, the generated network contains only approximately N vertices.
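The following sketch illustrates the stub-matching step for a single power-law edge type; parameter names mirror the description above, while the degree cutoff k_max is our addition.

import random

def power_law_degree(alpha, gamma, k_max=100):
    # sample k proportional to alpha * k^(-gamma); alpha cancels under
    # normalization but is kept to mirror D(k) in the text
    weights = [alpha * k ** (-gamma) for k in range(1, k_max + 1)]
    return random.choices(range(1, k_max + 1), weights=weights)[0]

def generate_edges(n, alpha, gamma, edge_type):
    out_stubs, in_stubs = [], []
    for v in range(n):
        out_stubs += [v] * power_law_degree(alpha, gamma)
        in_stubs += [v] * power_law_degree(alpha, gamma)
    random.shuffle(out_stubs)
    random.shuffle(in_stubs)
    # match out-stubs with in-stubs pairwise; leftover stubs are discarded
    return [(u, v, edge_type) for u, v in zip(out_stubs, in_stubs)]

edges = generate_edges(1000, alpha=0.5, gamma=2.5, edge_type="knows")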


TABLE I
PARAMETER SETTINGS FOR THE CODE ALGORITHM

ID | θA     | θB     | θC
A  | 0.0002 | 0.0004 | 20
B  | 0.0002 | 0.0004 | 10
C  | 0.001  | 0.002  | 5
D  | 0.005  | 0.01   | 4
E  | 0.02   | 0.04   | 3

B. Experimental Setup

We generated synthetic networks of increasing size with 6 different edge types similar to the ones used in our example: knows, knows-well, mentor, boss, olderRelative, idol. The first 5 types were defined to be power-law distributed with parameter γ between 2 and 3 and parameter α between 0 and 1. The idol relationship was defined as random with parameter d = 1.8. The diffusion model we use throughout the experiments is similar to Example 2 using the relationships defined above, resulting in 7 rules with weights varying between 0 and 1. We do not claim that our simple model accurately describes the diffusion of voter opinion or that the generated networks closely resemble real social networks. Our experiments focus on verifying our hypothesis that we can efficiently compute competitive diffusion processes on large social networks. Hence, it suffices that our model is non-trivial, i.e., contains multiple rules with varying weights, and that these rules "cover" the entire network, i.e., all edge types occur in some rule.

In the experiments we compare versions of the CODE algorithm under five different parameter settings, summarized in Table I. Parameter setting A is the most conservative and E the most relaxed. For all versions of CODE we set α = 0.2, β = 0.1, γ = 0.9, δ = 0.05, and B = 50000. All medium-size experiments were executed on identical hardware with 8-core 2.33 GHz Intel processors and 8 GB of RAM. For the large-scale experiments, we used a machine with 256 GB of main memory and a 24-core Intel CPU. All runtimes were averaged across three independent runs. The differences in approximation error or runtime observed for networks with more than 30000 edges are statistically significant at p = 0.001.

Fig. 5. Time comparison of approximate algorithms on medium-size networks

Fig. 6. Error of approximation on medium-size networks

C. Experimental Results

In Section IV we argued that the runtime complexity of the exact, non-ground MPI algorithm is expected to be approximately cubic in the size of the network, which makes it tractable but impractical for larger social networks. Figure 4 shows the runtime of the exact algorithm compared to the CODE algorithm with parameter setting A on networks with 10K to 80K edges. Even in the most conservative setting, CODE greatly outperforms the

exact algorithm, which quickly becomes intractable on medium-sized networks. Figure 5 compares the running times of the different versions of the CODE algorithm on the same set of medium-sized networks. Figure 6 shows the relative error of approximation for all five versions of CODE, measured as the normalized difference in distance from satisfaction between the interpretation I computed by CODE and the most probable interpretation I* computed by the exact algorithm, i.e.,

(dδ(Π ∪ IC, I) − dδ(Π ∪ IC, I*)) / dδ(Π ∪ IC, I*).

As expected, more conservative parameter settings yield better approximations but also require more time, whereas relaxed parameters allow CODE to terminate much faster at a greater approximation error.

To verify that CODE can scale to networks of interesting size, we ran the CODE algorithm on networks with 400K to 8 million edges (up to 2 million vertices). The running times for all five versions are reported in Figure 7, which is drawn in log-log scale to accommodate the wide range in network size. We observe that CODE scales approximately linearly in the size of the network. As running the exact algorithm is clearly impractical for networks of this magnitude, we measured the relative error of approximation compared to the most conservative parameter setting A. The results are shown in Figure 8, with the x-axis drawn in log scale. We observe that the error in approximation seems to remain constant. As before, the different parameter settings highlight the trade-off between computational efficiency and approximation error.

Fig. 7. Time comparison of approximate algorithms on large networks

Fig. 8. Error of approximation on large networks

VII. RELATED WORK

To our knowledge, this work presents the first generalized framework for competitive diffusion which scales to large

social networks. Below, we place our work in the literature. In [2], a framework for diffusion is also proposed, although less general than the one presented here. The authors study a different problem than that presented in this paper, namely the "most influential nodes" problem. Additionally, the authors of [2] address neither competitive diffusion nor scaling to large social networks. [2] is extended to a competitive scenario in [18]. However, the authors only allow one competitor to actively diffuse while all others must remain static. In our work, all competitors are active at the same time. In [19], the authors provide a theoretical treatment of a problem similar to [18] w.r.t. rumor spread, but do not include an implementation. In biology, [3] presents a competitive diffusion model in which "mutant" and "resident" genes attempt to spread in a population. They view the propagation of the mutants and residents as a stochastic process and are primarily concerned with the "fixation probability" that a lone mutant overtakes a population. We allow more than two competitors, different edge labels, and more generalized models, and we implemented scalable algorithms that avoid the costly Monte-Carlo-style simulations typically used for such models. The framework of this paper extends our previous work [20] on diffusion in social networks. The approach in [20] does not consider competitive diffusion models, utilizes a less expressive semantics, and does not provide an implementation. Lastly, the probabilistic model developed in this paper builds on a large body of related work on energy models from statistical physics [21]. Such models have been integrated into a logical framework within machine learning, such as PSL [7], which is most similar to this work. We extended the core PSL framework with a more expressive semantics tailored to social networks and presented a novel, scalable inference algorithm.

VIII. CONCLUSIONS

We presented a general framework that allows for the modeling of competing diffusion processes. We considered the non-trivial case where the spread of a phenomenon to a vertex in the network precludes the spread of another to that same vertex. We devised and implemented a scalable algorithm for modeling these processes on social networks with 8 million edges and 2 million nodes. There are many avenues we are considering for future work, such as answering complex aggregate queries over competitive diffusion processes where we attempt to find vertices that maximize some objective function, comparing different diffusion processes on real-world data, and learning competitive diffusion models from data.

REFERENCES

[1] R. M. Anderson and R. M. May, "Population biology of infectious diseases: Part I," Nature, vol. 280, no. 5721, p. 361, 1979.
[2] D. Kempe, J. Kleinberg, and E. Tardos, "Maximizing the spread of influence through a social network," in Proceedings of KDD '03. New York, NY, USA: ACM, 2003, pp. 137–146.
[3] E. Lieberman, C. Hauert, and M. A. Nowak, "Evolutionary dynamics on graphs," Nature, vol. 433, no. 7023, pp. 312–316, 2005.
[4] R. Cowan and N. Jonard, "Network structure and the diffusion of knowledge," Journal of Economic Dynamics and Control, vol. 28, no. 8, pp. 1557–1575, 2004.
[5] F. C. Santos, J. M. Pacheco, and T. Lenaerts, "Evolutionary dynamics of social dilemmas in structured heterogeneous populations," PNAS, vol. 103, no. 9, pp. 3490–3494, February 2006.
[6] M. Kifer and V. Subrahmanian, "Theory of generalized annotated logic programming and its applications," J. Log. Program., vol. 12, no. 3&4, pp. 335–367, 1992.
[7] M. Broecheler, L. Mihalkova, and L. Getoor, "Probabilistic similarity logic," UAI (to appear), 2010.
[8] M. Collins, "Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms," in Proceedings of EMNLP-02, 2002.
[9] F. C. Coelho, C. Codeço, and H. Cruz, "Epigrass: A tool to study disease spread in complex networks," Source Code for Biology and Medicine, vol. 3, no. 3, 2008.
[10] M. Jackson and L. Yariv, "Diffusion on social networks," in Economie Publique, vol. 16, no. 1, 2005, pp. 69–82.
[11] D. Watts and J. Peretti, "Viral marketing for the real world," Harvard Business Review, May 2007.
[12] H. Ohtsuki and M. A. Nowak, "The replicator equation on graphs," Journal of Theoretical Biology, vol. 243, no. 7, pp. 86–97, Nov. 2006.
[13] H. Gaifman, "Concerning measures in first order calculi," Israel Journal of Mathematics, vol. 2, no. 1, 1964.
[14] D. Scott and P. Krauss, "Assigning probabilities to logical formulas," Studies in Logic and the Foundations of Mathematics, vol. 43, 1966.
[15] F. Alizadeh and D. Goldfarb, "Second-order cone programming," Mathematical Programming, vol. 95, no. 1, pp. 3–51, 2003.
[16] V. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, p. P10008, 2008.
[17] M. Bröcheler, A. Pugliese, and V. S. Subrahmanian, "DOGMA: A disk-oriented graph matching algorithm for RDF databases," in ISWC, 2009, pp. 97–113.
[18] T. Carnes, C. Nagarajan, S. M. Wild, and A. van Zuylen, "Maximizing influence in a competitive social network: a follower's perspective," in ICEC '07. New York, NY, USA: ACM, 2007, pp. 351–360.
[19] J. Kostka, Y. A. Oswald, and R. Wattenhofer, "Word of mouth: Rumor dissemination in social networks," Lecture Notes in Computer Science, vol. 5058, pp. 185–196, 2008.
[20] P. Shakarian, V. Subrahmanian, and M. L. Sapino, "Using generalized annotated programs to solve social network optimization problems," ICLP (to appear), 2010.
[21] R. Kindermann and J. L. Snell, Markov Random Fields and Their Applications. American Mathematical Society, Providence, RI, 1980.
