Contextual Prediction of Communication Flow in Social Networks


Munmun De Choudhury, Hari Sundaram, Ajita John, Dorée Duncan Seligmann

@IEEE Web Intelligence 2007

Arts, Media & Engineering, Arizona State University, Tempe
Collaborative Applications Research, Avaya Labs, New Jersey

November 5, 2008


Introduction

A context-based framework to predict communication flow in large-scale social networks.

[Figure: Communication flow between Alice and Bob]

▪ Why is the problem important?
  • Determine information propagation and the roles of people in the process.
  • Targeted advertising, spread of fashions and fads, innovations, consumer interests, etc.
  • Determine community evolution.

[Figure: Spread of innovations]

Our Approach

▪ Computation of intent to communicate and delay between two individuals on a particular topic.
  • Communication context: neighborhood, topic and recipient context.
  • A set of features capturing communication semantics.
  • An SVM regression method for prediction.
▪ Experimental results on a MySpace dataset with effective prediction (error ~15-20%).

[Figure: Error in prediction of intent to communicate, baseline vs. our approach, showing the improvement in predicted error]

Related Work

▪ Work on information diffusion [Gruhl, Tomkins ’04].
▪ Early-adoption-based flow model for recommendation systems [Song ’06].
▪ Analysis of emails of software developers [Bird ’06].
▪ But in web-based analysis, information flow is estimated from indirect evidence,
  • e.g. a topic appears on a blog several days after it appeared on another blog, not from evidence of communication.
▪ Context has not been considered.

[Figure: Temporal pattern of blog posts [Gruhl et al. 2004]]

Outline

Introduction / Related work
Problem Statement
  • Two sub-problems: intent to communicate and communication delay
  • A physics metaphor
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions

What is Intent to Communicate?

▪ The probability that a person will engage in some communication with another person, given a particular topic and a certain point in time.
  • It is contingent upon several factors or features defined by the communication context.

[Figure: Example intents among Bob, Ann and Alice. One link is labeled Movie: 40%, Sports: 40%; the other is labeled Movie: 80%, Dinner: 20%.]

What is Delay in Propagation?

▪ The amount of time that passes between a person's reception of a message (on a certain topic) and that person's corresponding response.

[Figure: Example delays among Alice, Bob and Ann. One link is labeled Movie: 4 hours, Sports: 25 mins; the other is labeled Movie: 2 days, Dinner: 15 hours.]

Wavefront Metaphor

▪ Thomas Young’s experiments on the wave theory of light.
▪ Three concepts:
  • Ann and Alice’s messages: primary wavefronts.
  • When Bob receives and responds: secondary wavefronts.
  • Some of the secondary wavefronts travel back to Ann and Alice: backscatter.

[Figures: Young’s double-slit experiment; wavefront metaphor with Alice, Bob and Ann]

Outline

Introduction / Related work
Problem Statement
Communication Context
  • What is communication context?
  • Role of context
  • Neighborhood context
  • Topic context
  • Recipient context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions

[Figure: Neighborhood, Recipient and Topic context]

Communication Context

▪ Communication context [Mani and Sundaram ’07] is the set of attributes that affect communication between two individuals.
▪ Contextual attributes are dynamic [Dourish ’02].
  • relationship between messages
  • past communication behavior of a person
  • response patterns from others

[Figure: Mani and Sundaram ’07]

Neighborhood Context: Susceptibility

▪ The susceptibility due to a contact v, taken over her entire social network, in time slice t_i is given by

$$\theta_{v|u}(\Lambda, t_i) = \sum_{w} \sum_{j=1}^{n_{v \to w|u}} \phi(\Lambda, t_j, t_i),$$

where
  t_j: time-stamp of the j-th message on topic Λ from v to a contact w
  φ(Λ, t_j, t_i): an indicator function, 1 if t_j lies in time slice t_i and 0 otherwise

[Figure: Susceptibility, with Alice, Bob, Charlie, Donny and Emily]
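As a rough illustration of the susceptibility count, the sketch below sums the indicator φ over the messages that a contact v sends on one topic within a time slice. The message tuple layout, the half-open slice boundaries and the function names are assumptions made for this example; the slides do not specify them, nor whether the sum over w includes u herself (this sketch counts every recipient).

```python
from typing import Iterable, Tuple

# Assumed message layout: (sender, receiver, topic, timestamp).
Message = Tuple[str, str, str, float]


def phi(t_j: float, slice_start: float, slice_end: float) -> int:
    """Indicator: 1 if t_j lies in the slice [slice_start, slice_end), else 0."""
    return 1 if slice_start <= t_j < slice_end else 0


def susceptibility(messages: Iterable[Message], v: str, topic: str,
                   slice_start: float, slice_end: float) -> int:
    """Sum phi over every message on `topic` that v sent to any contact w."""
    return sum(
        phi(ts, slice_start, slice_end)
        for sender, _receiver, msg_topic, ts in messages
        if sender == v and msg_topic == topic
    )


# Toy data: Bob writes to four contacts; three messages are about movies.
messages = [
    ("bob", "alice", "movie", 1.0),
    ("bob", "charlie", "movie", 2.0),
    ("bob", "emily", "sports", 2.5),
    ("bob", "donny", "movie", 9.0),
]
count = susceptibility(messages, "bob", "movie", 0.0, 5.0)
print(count)  # 2: two movie messages fall inside the slice [0, 5)
```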

Neighborhood Context: Backscatter

▪ The backscatter of u due to a contact v in time slice t_i is given by

$$\theta_{v \to u|u}(\Lambda, t_i) = \sum_{j=1}^{n_{v \to u|u}} \phi(\Lambda, t_j, t_i),$$

where
  t_j: time-stamp of the j-th message on topic Λ from v to u
  φ(Λ, t_j, t_i): an indicator function, 1 if t_j lies in time slice t_i and 0 otherwise

[Figure: Backscatter, with Bob, Emily, Alice and Charlie]
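Backscatter restricts the same kind of count to messages that v sends directly to u. A minimal sketch follows, again assuming a (sender, receiver, topic, timestamp) tuple layout and half-open time slices, neither of which comes from the slides.

```python
from typing import Iterable, Tuple

# Assumed message layout: (sender, receiver, topic, timestamp).
Message = Tuple[str, str, str, float]


def backscatter(messages: Iterable[Message], v: str, u: str, topic: str,
                slice_start: float, slice_end: float) -> int:
    """Count messages on `topic` from v to u whose timestamp is in the slice."""
    return sum(
        1
        for sender, receiver, msg_topic, ts in messages
        if sender == v and receiver == u and msg_topic == topic
        and slice_start <= ts < slice_end
    )


# Toy data: Bob replies to Alice twice about movies inside the slice.
replies = [
    ("bob", "alice", "movie", 1.0),
    ("bob", "alice", "movie", 3.0),
    ("bob", "charlie", "movie", 2.0),
    ("bob", "alice", "dinner", 4.0),
]
back = backscatter(replies, "bob", "alice", "movie", 0.0, 5.0)
print(back)  # 2: only the two movie messages addressed to Alice count
```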

Topic Context: Message Coherence

▪ ConceptNet is used to compute distances between messages.
▪ Why ConceptNet?
  • Expands on pure lexical terms to compound terms, e.g. “buy food”.
  • Contains practical knowledge, e.g. we can infer that a student is near a library.
▪ The distance between a message m and a topic Λ is given as:

$$d(m, \Lambda) = \max_{q} \min_{k} d_c(w_q, w_k)$$

where
  w_q: a word in message m
  w_k: a word corresponding to topic Λ

[Figure: Message coherence]
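The max-min form of d(m, Λ) can be sketched independently of ConceptNet. In the example below the word-level distance is a hypothetical lookup table (`toy_distance`) standing in for d_c; it is not the ConceptNet API and the numbers are invented for illustration.

```python
from typing import Callable, Sequence


def message_topic_distance(message_words: Sequence[str],
                           topic_words: Sequence[str],
                           word_distance: Callable[[str, str], float]) -> float:
    """For each message word take its closest topic word, then keep the worst case."""
    return max(
        min(word_distance(w_q, w_k) for w_k in topic_words)
        for w_q in message_words
    )


# Hypothetical stand-in for the word-level distance d_c.
_TOY_TABLE = {("film", "movie"): 0.1, ("popcorn", "movie"): 0.4}


def toy_distance(a: str, b: str) -> float:
    """Symmetric lookup: 0 for identical words, 1.0 for unknown pairs."""
    if a == b:
        return 0.0
    return _TOY_TABLE.get((a, b), _TOY_TABLE.get((b, a), 1.0))


d = message_topic_distance(["film", "popcorn"], ["movie", "cinema"], toy_distance)
print(d)  # 0.4: "popcorn" is the message word farthest from the topic
```

Under this definition a single off-topic word is enough to push the whole message far from the topic, because the outer maximum keeps the worst-matching word.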

Topic Context: Temporal Coherence

▪ Determined by the mean and variance of the difference in the time stamps of messages.
▪ The mean μ_j is

$$\mu_j(\Lambda, t_j, t_i) = \sum_{m \in t_j} \frac{T(m, \Lambda, t_j) - t_i}{n(\Lambda, t_j)}$$

where
  m: the index of a message on topic Λ in the time slice t_j
  n(Λ, t_j): the number of messages on topic Λ in the time slice t_j

[Figure: Temporal coherence]
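A small sketch of the temporal coherence statistics follows. It assumes that T(m, Λ, t_j) is the time stamp of message m, that t_i is a scalar reference time for the current slice, and that the variance is the population variance; the slides do not state any of these, so treat them as labeled assumptions.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def temporal_coherence(timestamps: Sequence[float], t_i: float) -> Tuple[float, float]:
    """Return (mean, variance) of the offsets T(m) - t_i for the given messages.

    `timestamps` must be non-empty: it holds the time stamps of the
    messages on one topic that fall in the earlier slice t_j.
    """
    offsets = [t - t_i for t in timestamps]
    return fmean(offsets), pvariance(offsets)


mean_offset, var_offset = temporal_coherence([12.0, 14.0, 16.0], t_i=10.0)
print(mean_offset)  # 4.0: offsets are 2, 4 and 6
```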

Recipient Context

▪ Reciprocity reflects the symmetry in communication.
▪ Communication correlation reflects the topical alignment of two individuals.
▪ Communication significance reflects the importance of communication activity with a particular person with respect to the whole social network.

[Figures: Communication correlation; reciprocity; communication significance]

Outline

Introduction / Related work
Problem Statement
Communication Context
SVR Based prediction
  • Sequential SVR approach
MySpace dataset
Experimental Results
Conclusions

The Prediction Algorithm

[Figure: Sequential prediction over time slices t, t+1, t+2. Rows: feature vectors x_i; predicted intent y_i; actual communication y_i’; error in prediction E.]
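The figure suggests a sequential scheme: at each time slice a model trained on earlier slices predicts the intent from the current feature vector, and the prediction is compared with the actual communication. The sketch below shows one way such a walk-forward loop might look. It is not the authors' implementation: `MeanRegressor` is a trivial stand-in used so the example runs with the standard library only, and the retrain-at-every-slice policy and absolute-error measure are assumptions. Any object with the same `fit`/`predict` convention, such as scikit-learn's `SVR`, could be passed as `make_model` instead.

```python
from typing import Callable, List, Sequence


class MeanRegressor:
    """Stand-in model: always predicts the mean of its training targets."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "MeanRegressor":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self.mean_ for _ in X]


def sequential_predict(make_model: Callable[[], MeanRegressor],
                       X: Sequence[Sequence[float]],
                       y_actual: Sequence[float],
                       warmup: int = 1) -> List[float]:
    """Walk forward in time: train on slices [0, t), predict slice t, record |error|."""
    errors = []
    for t in range(warmup, len(X)):
        model = make_model().fit(X[:t], y_actual[:t])
        y_pred = model.predict([X[t]])[0]
        errors.append(abs(y_pred - y_actual[t]))
    return errors


# Toy series: one feature per time slice and a steadily rising intent.
features = [[0.1], [0.2], [0.3], [0.4]]
actual_intent = [0.2, 0.4, 0.6, 0.8]
errors = sequential_predict(MeanRegressor, features, actual_intent)
print(len(errors))  # 3: one error per predicted slice after the warm-up
```

Training only on past slices keeps the evaluation honest, since the model never sees the slice it is asked to predict.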

Outline

Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
  • Crawling details
  • Topology of the crawled network
Experimental Results
Conclusions

Crawling Statistics

▪ MySpace is the world’s largest social networking site, with over 108 million users.
▪ Crawling uses a depth-first strategy (DFS).

[Figures: A snapshot of MySpace; crawling (node labeled Tom)]

Some statistics of crawled data:

Users        20,000
Messages     1,425,010
Time-span    Sept 2005 - Apr 2007
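A depth-first crawl with a cap on the number of users can be sketched as below. The `get_contacts` callable is hypothetical: in a real crawler it would fetch a profile's contact list, whereas here it reads a small in-memory graph. The seed user and the cap are illustrative choices, not details taken from the slides.

```python
from typing import Callable, Dict, Iterable, List


def dfs_crawl(start: str,
              get_contacts: Callable[[str], Iterable[str]],
              max_users: int) -> List[str]:
    """Visit users depth-first from `start`, stopping after `max_users` visits."""
    visited: List[str] = []
    seen = {start}          # users already pushed, so nobody is queued twice
    stack = [start]
    while stack and len(visited) < max_users:
        user = stack.pop()  # last-in, first-out gives depth-first order
        visited.append(user)
        for contact in get_contacts(user):
            if contact not in seen:
                seen.add(contact)
                stack.append(contact)
    return visited


# Toy contact graph used in place of live profile pages.
toy_graph: Dict[str, List[str]] = {
    "tom": ["ann", "bob"],
    "ann": ["alice"],
    "bob": [],
    "alice": ["tom"],
}
order = dfs_crawl("tom", lambda user: toy_graph.get(user, []), max_users=3)
print(order)  # ['tom', 'bob', 'ann']
```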

Topology Characteristics

[Figures: Topic histogram; average path length distribution for MySpace crawled data]

Topology Statistic              Measure
Average Shortest Path Length    5.952
Average Degree per node         215.27 (γ = 2.01)
Mean Clustering Coefficient     0.79
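For readers unfamiliar with these measures, the sketch below computes the mean clustering coefficient and the average shortest path length on a tiny undirected graph. It assumes an adjacency-set representation, treats nodes with fewer than two neighbors as having clustering 0, and averages path lengths over reachable ordered pairs. How the reported values were actually computed (for example, directed vs. undirected edges) is not stated in the slides.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def mean_clustering(adj: Graph) -> float:
    """Average, over nodes, of the fraction of neighbor pairs that are linked."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # convention assumed for low-degree nodes
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


def average_shortest_path(adj: Graph) -> float:
    """Mean BFS distance over all ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy graph: a triangle (a, b, c) with one extra node d attached to c.
toy_adj: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
clustering = mean_clustering(toy_adj)
path_length = average_shortest_path(toy_adj)
```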

Outline

Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
  • Baseline heuristics for validation
  • Prediction of intent and delay
  • Feature evaluation
  • Network scalability
Conclusions

Baseline Techniques

▪ For intent to communicate:
  • The ratio of the number of messages sent by u to v on topic Λ to the total number of messages sent by u to v in the past on all topics.
▪ For the estimate of delay:
  • The mean delay between two contacts u and v on topic Λ is the mean delay between all pairs of corresponding messages on the same topic.
  • ConceptNet is used to compute message correspondence.
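Both baselines reduce to simple counting and averaging. The sketch below assumes a (sender, receiver, topic, timestamp) message layout and assumes that the corresponding message pairs have already been matched (in the slides this matching uses ConceptNet, which is not reproduced here). Returning 0.0 when u has never written to v is also an assumption.

```python
from typing import Iterable, Sequence, Tuple

# Assumed message layout: (sender, receiver, topic, timestamp).
Message = Tuple[str, str, str, float]


def baseline_intent(messages: Iterable[Message], u: str, v: str, topic: str) -> float:
    """Share of u's past messages to v that were on `topic`."""
    sent = [m for m in messages if m[0] == u and m[1] == v]
    if not sent:
        return 0.0  # assumed fallback when there is no history
    return sum(1 for m in sent if m[2] == topic) / len(sent)


def baseline_delay(pairs: Sequence[Tuple[float, float]]) -> float:
    """Mean of (response time - reception time) over matched message pairs."""
    return sum(responded - received for received, responded in pairs) / len(pairs)


history = [
    ("ann", "bob", "movie", 1.0),
    ("ann", "bob", "movie", 2.0),
    ("ann", "bob", "sports", 3.0),
    ("ann", "bob", "dinner", 4.0),
    ("ann", "alice", "movie", 5.0),
]
intent = baseline_intent(history, "ann", "bob", "movie")
delay = baseline_delay([(0.0, 4.0), (10.0, 12.0)])
print(intent, delay)  # 0.5 3.0
```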

Experimental Setup

▪ A randomly sampled user u from the set of contacts of Tom (the super-user).
▪ A set of top eight contacts (v) of u (determined by high message density).
▪ Recipient variability:
  • Prediction of communication flow averaged over five weeks for each contact.
▪ Temporal variability:
  • Prediction of communication flow averaged over all eight contacts for each of the five weeks.

Predicted Intent

▪ The communication intent depends on a wide variety of contextual factors (neighborhood, topic, and recipient), not just on the prior probability of communication.

Predicted Estimate of Delay

▪ Delay may be strongly influenced by factors other than the social network interaction (e.g. it may be habitual).

Evaluation of Features

▪ A person’s neighboring social network indeed affects whether or not she will engage in a particular communication quickly.

[Figure: Errors in L-O-O (leave-one-out) procedure. Y-axis: Error (%), 0 to 35. Series: Intent, Delay. X-axis, one group per removed feature: No Susceptibility, No Backscatter, No Message Coherence, No Temporal Coherence, No Topic Quantity, No Topic Relevance, No Communication Correlation, No Reciprocity, No Communication Significance.]

Scaling Experiment Details

▪ An exponential function f(n) = exp(n/k), where k = 4.6 and n = 1, 2, 3, 4, …, 35, is used to choose networks with node out-degree values f(n).
▪ Select the top three users corresponding to each f(n) based on high message density.
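To make the range of network sizes concrete, the sketch below evaluates f(n) = exp(n/k) for n = 1, …, 35 with k = 4.6. Rounding f(n) to the nearest integer to obtain an out-degree is an assumption; the slides do not say how non-integer values are mapped to degrees.

```python
import math

K = 4.6  # scale constant from the slide


def target_out_degree(n: int, k: float = K) -> float:
    """Exponential schedule f(n) = exp(n / k)."""
    return math.exp(n / k)


# Rounded targets for n = 1..35 (rounding is an assumption of this sketch).
targets = [round(target_out_degree(n)) for n in range(1, 36)]
print(targets[0], targets[-1])  # smallest and largest target out-degree
```

With these constants the schedule spans roughly three orders of magnitude, from an out-degree of about 1 up to about 2,000, and several of the smallest values of n round to the same integer.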

Scalability of Intent

▪ With an increase in network size, the user is in regular correspondence with only a small fraction of the network.

[Figure: Panels for Topic A and Topic B]

Scalability of Delay

▪ Delay is influenced by the majority of contacts, with whom the user is not in active communication.
▪ Delay may be driven by intrinsic factors (e.g. habit) and is less affected by contextual factors.

[Figure: Panels for Topic A and Topic B]

Outline

Introduction / Related work
Problem Statement
Communication Context
SVM Based prediction
MySpace dataset
Experimental Results
Conclusions
  • Summary
  • Contributions and future work

Summary

▪ Predict communication flow in large-scale social networks based on communication context.
  • Identified three aspects: neighborhood, topic and recipient context.
▪ Intent to communicate and delay predicted using SVR.
▪ Excellent results on a real-world dataset, MySpace.com,
  • for a single user
  • for networks of different sizes.

[Figure: Neighborhood, Recipient and Topic context]

Conclusions

▪ Consequences:
  • Intent to communicate is strongly affected by contextual factors.
  • Delay is less affected.
▪ Modeling communication context is essential.
▪ Future work:
  • Comparison against a standardized flow model, e.g. an epidemic disease propagation model.
  • Prediction for a pair of users who are separated by n different people in the social network.

Thanks!
