Contextual Prediction of Communication Flow in Social Networks
Munmun De Choudhury, Hari Sundaram, Ajita John, Dorée Duncan Seligmann
@IEEE Web Intelligence 2007
Arts, Media & Engineering, Arizona State University, Tempe
Collaborative Applications Research, Avaya Labs, New Jersey
November 5, 2008
Introduction
A context-based framework to predict communication flow in large-scale social networks.
[Figure: Communication Flow between Alice and Bob]
Why is the problem important?
• Determine information propagation and the roles of people in the process.
• Targeted advertising, spread of fashions and fads, innovations, consumer interests, etc.
• Determine community evolution.
[Figure: Spread of innovations]

Our Approach
Computation of intent to communicate and delay between two individuals on a particular topic.
• Communication context: Neighborhood, Topic and Recipient Context.
• A set of features capturing communication semantics.
• An SVM Regression method for prediction.
Experimental results on MySpace dataset with effective prediction (error ~15-20%).
[Figure: Error in Prediction of Intent to communicate, Baseline vs. Our Approach, showing the improvement in predicted error]

Related Work
• Work on information diffusion [Gruhl, Tomkins ’04].
• Early adoption based flow model for recommendation systems [Song ’06].
• Analysis of emails of software developers [Bird ’06].
But in web-based analysis, information flow is estimated from indirect evidence, not from evidence of communication.
• e.g. a topic appears on a blog several days after it appeared on another blog.
Context has not been considered.
[Figure: Temporal Pattern of Blog Posts [Gruhl et al. 2004]]

Outline
Introduction / Related work
Problem Statement
• Two sub-problems: Intent to communicate, Communication Delay
• A Physics Metaphor
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions
[Figure: Intent and Delay]

What is Intent to Communicate?
The probability that a person will engage in some communication with another person, given a particular topic and a certain point in time.
• It is contingent upon several factors or features defined by the communication context.
[Figure: Ann, Alice and Bob with example per-topic intent values (Movie: 40%, Sports: 40%; Movie: 80%, Dinner: 20%)]

What is Delay in Propagation?
The amount of time that passes between a person's reception of a message (on a certain topic) and that person's corresponding response.
[Figure: Ann, Alice and Bob with example per-topic delays (Movie: 4 hours, Sports: 25 mins; Movie: 2 days, Dinner: 15 hours)]

Wavefront Metaphor
Thomas Young’s experiments on the wave theory of light. Three concepts:
• Ann and Alice’s messages: primary wavefronts.
• When Bob receives and responds: secondary wavefronts.
• Some of the secondary wavefronts travel back to Ann and Alice: backscatter.
[Figures: Young’s double slit experiment; Wavefront Metaphor with Alice, Bob and Ann]

Outline
Introduction / Related work
Problem Statement
Communication Context
• What is communication context?
• Role of context
• Neighborhood context
• Topic context
• Recipient context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions
[Figure: Neighborhood, Recipient, Topic]

Communication Context
Communication context [Mani and Sundaram ’07] is the set of attributes that affect communication between two individuals. Contextual attributes are dynamic [Dourish ’02].
• relationship between messages
• past communication behavior of a person
• response patterns from others
[Figure: Mani and Sundaram ’07]

Neighborhood Context: Susceptibility
The susceptibility due to a contact v to her entire social network in time slice t_i is given by,
\[ \theta_{v|u}(\Lambda, t_i) = \sum_{w} \sum_{j=1}^{n_{v \to w|u}} \phi(\Lambda, t_j, t_i), \]
where,
t_j: time-stamp of the jth message on topic Λ from v to w
φ(Λ, t_j, t_i): an indicator function: 1 if t_j lies in time slice t_i and 0 otherwise
[Figure: Susceptibility, with Alice, Bob, Charlie, Donny and Emily]

Neighborhood Context: Backscatter
The backscatter of u due to a contact v in time slice t_i is given by,
\[ \theta_{v \to u|u}(\Lambda, t_i) = \sum_{j=1}^{n_{v \to u|u}} \phi(\Lambda, t_j, t_i) \]
where,
t_j: time-stamp of the jth message on topic Λ from v to u
φ(Λ, t_j, t_i): an indicator function: 1 if t_j lies in time slice t_i and 0 otherwise
[Figure: Backscatter, with Alice, Bob, Charlie and Emily]

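The two neighborhood-context counts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Message` record, the half-open slice boundaries, and the choice to let w range over all of v's recipients (including u) are not specified on the slides.

```python
from collections import namedtuple

# Hypothetical message record; field names are assumptions for illustration.
Message = namedtuple("Message", ["sender", "recipient", "topic", "timestamp"])


def in_slice(timestamp, slice_start, slice_end):
    """Indicator phi: 1 if the time-stamp lies in [slice_start, slice_end), else 0."""
    return 1 if slice_start <= timestamp < slice_end else 0


def backscatter(messages, u, v, topic, slice_start, slice_end):
    """Count messages on `topic` sent from v to u inside the time slice."""
    return sum(
        in_slice(m.timestamp, slice_start, slice_end)
        for m in messages
        if m.sender == v and m.recipient == u and m.topic == topic
    )


def susceptibility(messages, v, topic, slice_start, slice_end):
    """Count messages on `topic` sent from v to any recipient w inside the time slice.

    Assumption: w ranges over every recipient of v, including u.
    """
    return sum(
        in_slice(m.timestamp, slice_start, slice_end)
        for m in messages
        if m.sender == v and m.topic == topic
    )
```

Both functions reduce to summing the indicator over the relevant messages, so the only difference between them is which recipients are included.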
Topic Context: Message Coherence
ConceptNet is used to compute distances between messages.
Why ConceptNet?
• Expands on pure lexical terms to compound terms, e.g. “buy food”.
• Contains practical knowledge: we can infer that a student is near a library.
The distance between a message m and a topic Λ is given as:
\[ d(m, \Lambda) = \max_{q} \min_{k} d_c(w_q, w_k) \]
where,
w_q: a word in message m
w_k: a word corresponding to topic Λ
[Figure: Message Coherence]

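The max-min distance above can be sketched as follows. The word-level distance d_c is passed in as a function; the lookup table used in the usage example is made up for illustration and is not ConceptNet output.

```python
def message_topic_distance(message_words, topic_words, concept_distance):
    """d(m, topic): for each message word take its closest topic word,
    then return the largest of those closest distances."""
    return max(
        min(concept_distance(wq, wk) for wk in topic_words)
        for wq in message_words
    )


# Hypothetical word-pair distances, standing in for a ConceptNet-based d_c.
TOY_DISTANCES = {
    ("film", "movie"): 0.1,
    ("film", "cinema"): 0.2,
    ("popcorn", "movie"): 0.6,
    ("popcorn", "cinema"): 0.5,
}


def toy_distance(wq, wk):
    """Look up the toy distance; unknown pairs get the maximum distance 1.0."""
    return TOY_DISTANCES.get((wq, wk), 1.0)
```

With the toy table, `message_topic_distance(["film", "popcorn"], ["movie", "cinema"], toy_distance)` takes the closest topic word for each message word (0.1 and 0.5) and keeps the worse of the two, so a single off-topic word dominates the distance.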
Topic Context: Temporal Coherence
Determined by the mean and variance of the difference in the time stamps of messages. The mean μ_j is,
\[ \mu_j(\Lambda, t_j, t_i) = \frac{1}{n(\Lambda, t_j)} \sum_{m \in t_j} \bigl( T(m, \Lambda, t_j) - t_i \bigr) \]
where,
m: the index of a message of topic Λ in the time slice t_j
n(Λ, t_j): the number of messages on topic Λ in the time slice t_j
[Figure: Temporal Coherence]

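A small sketch of the temporal coherence statistics, assuming T(m, Λ, t_j) is the numeric time-stamp of message m and t_i is a numeric reference time. The slides give only the mean; using the population variance of the same offsets is an assumption made here.

```python
from statistics import mean, pvariance


def temporal_coherence(timestamps, reference_time):
    """Return (mean, variance) of the offsets T(m) - t_i for the messages
    on one topic in one time slice.

    `timestamps` holds the time-stamps of those messages; the variance is
    the population variance (an assumption, not stated in the source).
    """
    offsets = [t - reference_time for t in timestamps]
    return mean(offsets), pvariance(offsets)
```

For example, time-stamps 12, 14 and 16 measured against a reference time of 10 give offsets 2, 4 and 6.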
Recipient Context
• Reciprocity reflects the symmetry in communication.
• Communication correlation reflects the topical alignment of two individuals.
• Communication significance reflects the importance of communication activity with a particular person with respect to the whole social network.
[Figures: Communication Correlation; Reciprocity; Communication Significance]

Outline
Introduction / Related work
Problem Statement
Communication Context
SVR Based prediction
• Sequential SVR approach
MySpace dataset
Experimental Results
Conclusions

The Prediction Algorithm
[Figure: sequential prediction over time slices t, t+1, t+2. Rows: feature vectors x_i; predicted intent y_i; actual communication y_i’; error in prediction E.]

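The sequential train-then-predict loop suggested by the figure can be sketched as a walk-forward evaluation with a pluggable regressor. Everything here is an assumption for illustration: the function names, the absolute-error measure, and in particular the stand-in regressor, which simply predicts the mean of past targets and is not SVR. An SVR implementation would be passed in through the same `fit` and `predict` arguments.

```python
def walk_forward(features, actual, fit, predict, min_history=1):
    """Walk forward over time slices.

    features[t] is the feature vector for slice t and actual[t] the observed
    value. For each slice t >= min_history, train on slices before t, predict
    slice t, and record the absolute error against the observed value.
    """
    predictions, errors = [], []
    for t in range(min_history, len(features)):
        model = fit(features[:t], actual[:t])   # train on the past only
        y_hat = predict(model, features[t])     # predict the current slice
        predictions.append(y_hat)
        errors.append(abs(y_hat - actual[t]))
    return predictions, errors


# Stand-in regressor (NOT SVR): predicts the mean of the past targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model
```

Keeping the regressor behind two callables means the loop itself never sees future slices, whichever model is plugged in.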
Outline
Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
• Crawling Details
• Topology of the crawled network
Experimental Results
Conclusions

Crawling Statistics
MySpace: the world’s largest social networking site, with over 108 million users.
Crawling using a DFS (Depth First Strategy).
Some statistics of crawled data:
Users       20,000
Messages    1,425,010
Time-span   Sept 2005 - Apr 2007
[Figures: A snapshot of MySpace (Tom); Crawling]

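A depth-first crawl with a user cap can be sketched as below. The `get_contacts` callable is a hypothetical stand-in for whatever fetches a user's contact list, and the seed and cap are illustrative; the slides do not describe the crawler's actual interface or starting point.

```python
def dfs_crawl(get_contacts, seed, max_users):
    """Visit users depth-first from `seed`, stopping after `max_users` users.

    `get_contacts(user)` must return the list of that user's contacts.
    Returns the users in the order they were visited.
    """
    visited = []
    seen = set()
    stack = [seed]
    while stack and len(visited) < max_users:
        user = stack.pop()
        if user in seen:
            continue
        seen.add(user)
        visited.append(user)
        # Push in reverse so the first listed contact is explored first.
        for contact in reversed(get_contacts(user)):
            if contact not in seen:
                stack.append(contact)
    return visited
```

An explicit stack avoids recursion limits on deep contact chains, which matters when a crawl covers thousands of users.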
Topology Characteristics
Topology Statistic             Measure
Average Shortest Path Length   5.952
Average Degree per node        215.27 (γ = 2.01)
Mean Clustering Coefficient    0.79
[Figures: Topic Histogram; Average Path Length Distribution for MySpace crawled data]

Outline
Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
• Baseline heuristics for validation
• Prediction of intent and delay
• Feature evaluation
• Network Scalability
Conclusions

Baseline Techniques
For intent to communicate:
• The ratio of the number of messages n sent by u to v on topic Λ to the total number of messages sent by u to v in the past on all topics.
For estimate of delay:
• The mean delay between two contacts u and v on topic Λ is the mean delay between all pairs of corresponding messages on the same topic.
• ConceptNet is used to compute message correspondence.

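Both baselines are simple enough to sketch directly. The input shapes are assumptions: a dictionary of per-topic message counts from u to v, and a list of (received, replied) time pairs for messages already matched as corresponding (the slides use ConceptNet for that matching, which is not reproduced here).

```python
def baseline_intent(topic_counts, topic):
    """Fraction of u's past messages to v that were on `topic`.

    `topic_counts` maps each topic to the number of messages u sent to v on it.
    Returns 0.0 when there is no message history.
    """
    total = sum(topic_counts.values())
    return topic_counts.get(topic, 0) / total if total else 0.0


def baseline_delay(message_pairs):
    """Mean reply delay over (received_at, replied_at) pairs on one topic.

    Returns None when there are no corresponding message pairs.
    """
    delays = [replied - received for received, replied in message_pairs]
    return sum(delays) / len(delays) if delays else None
```

For instance, 8 movie messages out of 10 in total give a baseline intent of 0.8 for the movie topic.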
Experimental Setup
A user u randomly sampled from the set of contacts of Tom (the super-user).
The top eight contacts (v) of u, determined by high message density.
Recipient variability:
• Prediction of communication flow averaged over five weeks for each contact.
Temporal variability:
• Prediction of communication flow averaged over all eight contacts for each of the five weeks.

Predicted Intent
The communication intent depends on a wide variety of contextual factors (neighborhood, topic, and recipient), not just on the prior probability of communication.

Predicted Estimate of Delay
Delay may be strongly influenced by factors other than the social network interaction (e.g. it may be habitual).

Evaluation of Features
A person’s neighboring social network does affect whether she will engage in a particular communication quickly.
[Figure: Errors in the leave-one-out (L-O-O) procedure. Y-axis: Error (%), 0 to 35. Series: Intent, Delay. X-axis: No Susceptibility, No Backscatter, No Message Coherence, No Temporal Coherence, No Topic Quantity, No Topic Relevance, No Reciprocity, No Communication Correlation, No Communication Significance.]

Scaling Experiment Details
An exponential function f(n) = exp(n/k), where k = 4.6 and n = 1, 2, 3, 4, …, 35, is used to choose networks with node out-degree values f(n).
Select the top three users corresponding to each f(n) based on high message density.

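The target out-degree values follow directly from the stated function. A short sketch, with k and the range of n taken from the slide and the function name chosen here for illustration:

```python
import math


def out_degree_targets(k=4.6, n_max=35):
    """Return f(n) = exp(n / k) for n = 1, ..., n_max."""
    return [math.exp(n / k) for n in range(1, n_max + 1)]
```

With k = 4.6 the 35 targets grow from about 1.24 at n = 1 to roughly 2,000 at n = 35, so the sampled networks span about three orders of magnitude in out-degree.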
Scalability of Intent
With an increase in network size, the user is in regular correspondence with only a small fraction of the network.
[Figures: Topic A; Topic B]

Scalability of Delay
Delay is influenced by the majority of contacts with whom the user is not in active communication.
Delay may be driven by intrinsic factors (e.g. habit) and be less affected by contextual factors.
[Figures: Topic A; Topic B]

Outline
Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions
• Summary
• Contributions and Future Work

Summary
Predict communication flow in large-scale social networks based on communication context.
• Identified three aspects: neighborhood, topic and recipient context.
Intent to communicate and delay predicted using SVR.
Excellent results on a real-world dataset, MySpace.com
• for a single user
• networks of different sizes.
[Figure: Neighborhood, Recipient, Topic]

Conclusions
Consequences:
• Intent to communicate is strongly affected by contextual factors.
• Delay is less affected.
Modeling communication context is essential.
Future work:
• Comparison against a standardized flow model, e.g. an epidemic disease propagation model.
• Prediction, given a pair of users who are separated by n different people in the social network.

Thanks!