Discovering Community Structure in Dynamic Social Networks using ...

4 downloads 14023 Views 2MB Size Report
Keywords: Dynamic social network; Organizational structure;. Community discovery; Evolution analysis; Web ranking; Crawling;. Correlation Density Rank. 1.
2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

Discovering Community Structure in Dynamic Social Networks using the Correlation Density Rank Zeynab Bahrami Bidoni 1, Roy George2 Department of Computer and Information Systems Clark Atlanta University, Atlanta, GA

[email protected] 1; [email protected] 2

changing organizational structure, such as who is becoming more powerful or the shifting of the power structure.

Abstract Recent research has produced advances in the understanding of communities within a dynamic social network. A “community” in this context is defined as a subgraph with a higher internal density and a lower crossing density with respect to other subgraphs. In this paper, we describe a novel and efficient distance based ranking algorithm, called the “Correlation Density Rank” (CDR), which is utilized to derive the community tree from the social network and to develop a tree learning algorithm that is employed to construct an evolving community tree. Also, we present an evolution graph of the organizational structure, through which new insights into the dynamic network may be obtained. The experiments, conducted on a datasets, both synthetic and real, demonstrate the feasibility and applicability of the framework. Keywords: Dynamic social network; Organizational structure; Community discovery; Evolution analysis; Web ranking; Crawling; Correlation Density Rank.

1. Introduction Community detection is an important research issue in social network analysis (SNA), where the objective is to recognize related sets of members such that intra-community associations are denser than inter-communities associations [110]. Researchers have presented various methods to extract communities from a Social Network (SN) data. . In particular, discovering the organizational structure of communities in an SN has been identified as an interesting but challenging problem [11,12] Examples of important applications include characterizing potential candidates for viral marketing, finding members of criminal groups, discovering affinity groups, etc. [12] While there has been research on finding key members in an SN [11-15] the results have limited power to supply a complete view of the organizational structure. In the real world, social networks are constantly changing and evolving. New members may join the network, existing members may quit from the network, and associations among members constantly change over time. Therefore, the approach should be capable of supporting the exploration of organizational structure dynamically. Some earlier research has provided approaches to detecting communities from a dynamic social network [16-19] These approaches discover changes of communities in an SN, but do not answer questions related to

©ASE 2014

In the workflow area, organization mining has also been the focus of past research [20-22]. The workflow area research has emphasized the exploration of organizational structure in the unit from event logs of information systems. Research efforts [23-25] have also addressed determining the hierarchy in an SN, where hierarchy has a similarity to an organizational structure. In this paper we use the notion of a community tree data structure to represent organizational structure and its evolution. This approach is similar to that presented by Qui and Lin [26] where the ranking of nodes is based on PageRank algorithm with iteration complexity of the order of logN [27]. However, the technique presented here has no iteration complexity and is not susceptible to network anomalies [28] such as spider-traps, dead-ends, etc. and the “rich-get-richer” problem [29]. Consequently, the approach in this paper is algorithmically efficient and produces communities and organizational structures with better accuracy. We propose a novel density measure, the Correlation Density Rank (CDR), as the basis of community and organizational structure detection. We apply a density based method to describe the relationship between nodes linked by edges. In comparison with other earlier Density based algorithms [30-32], this approach offers several advantages (1) the CDR is a generalized formulation, which permits the weighting of different correlation types, . (2) a more realistic solution where with important neighbors have more priority than lesser ones is employed, and (3) optimization of the CDR is easier, since the number of parameters needed for tuning is smaller. The contributions of this paper are as follows: (1) Developing an algorithm to derive the community tree from the SN. (2) Developing a tree learning algorithm to generate the evolving community trees. (3) Proposing an approach for representing the evolution of the organizational structure based on the evolving community trees. The rest of the paper is organized as follows. Section 2 introduces the concept of community tree and proposes an approach for deriving the community tree from a static social network. Section 3 presents the methods for analyzing the evolution of organizational structure in a dynamic SN. Section 4 provides the experimental results. Section 5 offers concluding remarks.

ISBN: 978-1-62561-000-3

1

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

2. Discovering the organizational structure in a static social network 2.1 Organization structure of a network Song and Van [33] developed a definition of organizational model, where the organizational model consists of organizational units (e.g. functional units), roles (e.g. duty), originators, and relationships (e.g., hierarchy). In this paper, we assume that the organizational structure of an SN is a hierarchy that represents communities (or units) and subordinations of members in the SN. However, the derived hierarchical organizational structure of an SN is not necessarily reflected in a real-world organization [26]. For instance, there exists real organizations that are cannot be characterized as an academic network, nor in a blogosphere. In this case for a network formed by e-mail communications, we may not be able to exactly determine to what extent the relationships can be mapped to the real-world organizational structure. Therefore, in deriving the organizational relationships, subordination describes the relationship between two members where the leader is the most likely and more important destination of information flow starting from the subordinate. i.e., the subordinate has a closest interaction with the leader in comparison with others while the leader has a higher possibility of attracting interaction with the nodes across the entire network in comparison to the subordinate nodes. The importance of members in the SN, is depicted through a score, the m-Score, which is equivalent to CDR value of a member. The m-Score value of a member would be its certainty on attracting interaction with all nodes through whole network. The higher the score is, the more important the member is. We use a data structure, called the community tree, to represent the SN organizational structure. Definition 1. (Community Tree): Let N  n1,..., n k  be a collection of members in an SN, CT is a tree, and NULL is the root of the tree, and every member in N is referred to as a node in the tree. Each member ni in CT it has a unique parent node

n j where m-Score ( n j ) > m-Score ( ni ). If the parent node of

ni is the root node NULL, ni is called a core of the tree. A core and its descendants compose a community. To derive the organizational structure, we calculate the mScore for every member and then attempt to find the immediate leader of every member in a network. Further, we construct the community tree using m-Score and subordinations. After having constructed the SN community tree, we can discover communities and obtain the SN organizational structure. An example of a community tree is illustrated in Fig. 1 where node 1 and node 5 are cores of community 1 and community 2, respectively.

This research is funded in part by the Army Research Laboratory under Grant No: W911NF-12-2-0067 and Army Research Office under Grant Number W911NF-11-1-0168. Any opinions, findings, conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the sponsor.

©ASE 2014

Figure 1. An example of a community tree [26].

SN may be regarded as a graph indicating information flows among members. The relationship of SN members can be obtained by analyzing information flows [34]. The process of determining SN information flow is similar to a random walk on a graph [35,36]. Given a graph and a starting point, the starting point's neighbor is selected at random, and the next start point is moved to this neighbor then a neighbor of this new start point is selected again at random and so on. The random sequence of points generated in this process is a random walk on the graph. The expected lengths of random walks on the graph, can be used to derive randomized shortest paths (RSP) dissimilarity [37,38]. The RSP dissimilarity, which has its foundation in statistical physics, may be used to compute the shortest path distance for all pairs of nodes of a graph in closed form. Recently [39] has generalized distance from the RSP framework based on the Helmholtz free energy between two states of a thermodynamic system with a distance measure, the free energy distance. We employ the RSP measurement method in [39] as the distance between nodes, but with one major difference: we consider customized initial cost for edges such that, along with finding shortest path between nodes. The random walker intelligently selects the most important neighbor resulting in lower cost and smaller distance. Combining RSP with m-Score of every node, we can find an immediate leader for every member in an SN (see section 2.3). The SN community tree is derived in this way. Our framework includes the following steps to derive the community tree: a) Employ the novel “Correlation Density Rank” method to ranking nodes as the m-Score for every node in an SN; and, b) Combine RSP with m-Score of every node to derive a community tree. 2.2 Calculating m-Score To calculate the m-Score for every node, we need to investigate each node's importance in a network. Those nodes that link many important nodes are also themselves important. Such a process is very similar to PageRank based algorithms [40]. PageRank is a link analysis algorithm that produces a global “importance” ranking for every web page by analyzing

ISBN: 978-1-62561-000-3

2

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

links among web pages. A fast and efficient page ranking mechanism for web crawling and retrieval remains a challenging issue. Recently, several link based ranking algorithms like PageRank, HITS, OPIC and etc. [27] have been proposed. We propose a novel method, Correlation Density Rank, based on finding more frequent and influential RSP. The CDR considers the entropy of distance between nodes as punishment and is used to compute ranks of nodes. Hence, there will be a larger traffic amongst shortest path of nodes, if the distance becomes smaller. As proved in [41], if the distance between i and j was less than the distance between i and k, then, i’s rank effect on j is more than on k, in other words, the probability that a random surfer reach j from i is more than the probability to reach k. Therefore, the objective is to minimize punishment so that a node with less distance entropy to have a higher rank. The Shannon entropy is a measurement of system uncertainty [42]. The larger the Shannon entropy is, the more uncertainty the system will be. If the CDR value of a node in complex network is the smallest, then the uncertainty of its distance distribution from other nodes is the greatest. On the contrary, while the CDR value of a node in complex network is very high, then the uncertainty of its distance distribution from other nodes is the smallest. Moreover, the more popular nodes are, the more linkages other nodes tend to have to them or are linked to by them. The proposed algorithm is analogous to the weighted PageRank algorithm [43], assigning larger rank values to more important (popular) nodes instead of dividing the rank value of a node evenly among its out-link nodes. We assign each out-link node a value proportional to its popularity (its number of in-links and out-links). The popularity from the number of in-links and out-links is recorded as W ijin and W ijout , respectively.

On j

W ijout 



Op

p R ( ni )

Where O n j and O p represent the number of out-links of node n j and node p, respectively. R( ni ) denotes the reference node list of node ni . These equations has two exceptions, first, if node n j is a dead-end (which may be easily determined from the frequency matrix), we let W ijout   that  is a very small number less than 1. Second, W ijout , W ijin  1 , that means R( ni )={ n j } we add 

to sum of the reference nodes’

frequency out/in-link. An algorithm for calculating the mScore of members in a social network is described as follows. Algorithm 1. Calculating m-Score for members: Correlation Density Rank (CDR) Input: social network G Out: vector of m-Score for all members R 1.

Initialize cost distance matrix C

C [ i , j ]  lo g

(1  e x p (   f ij ))

1  w

in out ij w ij



(The logarithm of (1  exp( f ij )) based on (1  w ijinw ijout ) ) 2.

W ijin is the weight of link between node ni and n j

Finding the matrix of RSP dissimilarities by employ the algorithm of [43]: { W ← P ref  exp(  C ) 1

calculated based on the number of in-links of node n j and the

Z ←  I W  (Note that  I W 1  I

number of in-links of all reference nodes of node ni .

S ← Z C W  Z  (Z   )



I nj

W ijin 



Where I n j and I p represent the number of frequency inlinks of node n j and node p, respectively. R( ni ) denotes the reference node list of node ni .

2

W

3

 ...)



C ← S  ed sT T  R S P ← C  (1   )C

Ip

p R ( ni )

0   1

}

3.

M ← Normalize matrix  R S P on rows

4.

For each node ni ( 1  i  k ) compute the entropy of related row from matrix M: 1 k Ei ←   M ij Ln (M ij ) Lnk j 1

W ijout is the weight of link between node ni and n j calculated based on the number of frequency out-links of node n j and the number of out-links of all reference nodes of node

di ← 1 Ei Ri ← d i

ni . 5.

©ASE 2014

W  W



k i 1

di

Return R

ISBN: 978-1-62561-000-3

3

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

Where f ij is the number of frequency from node ni to node

n j . if f ij  0 , let C [i , j ]   or a very big number. P ref is the transition probability matrix that Pijref is equal to the rate of f ij divided by sum of frequency between node ni and all its references nodes. k is the number of members in social network (or nodes on G). The parameters  ,  and  are input values determined by user.  controls the effect of frequency on the cost function which restrict cost ratio with respect to our defined infinite constant. β is the influence of the cost on the walker’s selection of a path, and is equal to inverse of temperature at Helmholtz free energy in thermodynamical system [39].  is the weight factor by which the leadership status of members will be distinguished by the amount of contact density comes in or goes out. Also, in step 2, d s  diag (S ) is the vector of diagonal elements of S, and e is the identity matrix. Note that A ◦ B and A ÷ B are elementwise product and division, respectively. For calculating step 2, we use the easier way of computing the matrix Z [38]. The values of Ri ( 1  i  k ) indicate the final m-Scores of members in the social network. 2.3 Deriving the Community Tree The m-Score of every member is combined with the normalized RSP matrix (M) on the graph to derive the community tree from an SN. The RSP matrix helps us to find the most likely and closest interaction for each node, and mScores determine whether there is the leadership relation between two nodes with closest interaction. Algorithm 2. Deriving Community Tree: CT_Deriving Input: Social network G Output: Community Tree CT 1. CT← [null,…,null] 2. R ← Correlation Density Rank (G); 3. For each member ni {

node of node i. After all the operations end, we obtain a community tree CT represented in an array where CT[i] indicates the parent node of node i. A null value of CT[i] indicates that there is no the immediate leader of node i. Hence node i is the core of community. If a node does not have both immediate leader and belongingness, is called private node. Node i and its descendent compose a community. The mScore of each member shows the member's importance in the community. The parent of each node is its immediate leader. 3. Exploring evolution of the organizational structure in a dynamic social network In Section 2, we develop the algorithms to derive a static community tree from a static SN. However, the static community tree does not do a good job in presenting the evolution of the organizational structure in the dynamic SN, because it does not consider intra time-step evolutions. To present the evolution of organizational structure in a dynamic SN, we aggregate the change of an SN during different time periods, and then derive community trees. We further construct the evolving community tree from the two closest community trees. Hence, the evolving community tree can accurately present evolution of organizational structure over time. The Best-first search algorithm explores a graph by expanding the most promising node chosen according to a specified rule [44]. Motivated by the idea of the Best-first search, [26] uses tree edit distance [45] (defined as the least cost of edit operation to change a tree to another tree) as a measure of distance (similarity) between the two trees. We propose a new algorithm to derive the evolving community tree with more efficiency and less complexity than previous ways. Because constructing community tree in our idea is based on RSP, we only need to compute linear combination of RSP of two static community trees and use the new RSP to drive evolving CT using algorithm (2) without any iteration. However, [26] generated a collection of candidate evolving community trees and then chose the candidate having the minimum ES score (by means of scoring function, which measures distance errors among evolving CT, previous period CT and current period CT) as a solution of each iteration. 3.1 Tree learning algorithm We propose a tree learning algorithm to derive an evolving community tree from two static community trees. The constructing process is as follows:

k← arg j min (M ij )

(1) Obtain a collection of members in the evolving community tree N ce  N pre  N cs where N pre and N cs are

if R[k] > R[i] CT[i] ← k } 4. Return CT

collections of members in the previous time period static community tree and current time periods one respectively;

Initiating a random walks at node i, we can find ending node j with the most likely correlation density, from the normalized RSP matrix M, by starting. If the m-Score of node j is much greater than that of node i, we regard j as the parent

(2) Compute the RSP for all pair members in the evolving community tree, where α is a smoothing factor, RSP ceRSP  (1   )   RSP . For those members that pre    cs

©ASE 2014

ISBN: 978-1-62561-000-3

4

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

appear in the evolving community tree but not in the current community tree, if their m-scores are less than a threshold , we regard them as retired members and remove them from the evolving community tree; (3) According to the definition of algorithm 1, having  matrix, we can compute Rce , M ce matrices. Then, by applying them into Algorithm 2, we construct the evolving community tree. RSP ce

Algorithm 3. Learning evolving ECT_Learning RSP Input: RSP matrices  RpreSP , cs

community

tree:

c i 1 is the evolving entity of ci . We say c1 ,...,c n  is a lifeline of the community c1 . Community c k C is called the entity of the life-line. A life-line depicts an evolving process of one community in the dynamic SN. Definition

3.

(supporter):

Given

a

life-line

CL  c1,c 2 ,...,c n  , we call the members as supporters of CL if they appear in CL not less than δ times with   n . In Definition 3,  is a parameter set by the user. If there is a life-line CL  c1,c 2 ,c3 ,c 4  and   3 , that means only members appearing in the life-line not less than 3 times are supporters of CL . Exploring life-lines in the evolution graph and their supporters helps to better understand dynamic social networks. For example, we can discover the backbone of criminal group or detect loyal members in a forum over time.

Output: Evolving community tree CT e 1.

N ce  N pre  N cs

2.

RSP  ceRSP  (1   )   RSP pre     cs

3.

N ce  N ce  n i  N ce  N cs m  score (n i )   

4. 5. 6.

Do step 3,4 of algorithm 1 to find Rce , M ce Employ algorithm2 to find CT ce Return CT ce

3.2 Exploring dynamic social network After deriving a series of evolving community trees, we exploit these trees to discover the evolution path of the organizational structure and to study the properties of the dynamic SN. We can consider four types of relationships among communities to generate an evolution graph that represent evolution of the organizational structure. Fig. 2 provides examples to illustrate the relationships among communities [26].

Definition 4. (Activeness of community): Let node p be the core of community c , we use the m -Score of p to indicate the activeness of community c . Activeness of community is a metric for the extent to which associations occur among members of the community over one period of time. We can use Activeness of community to reveal hot communities, which may reflect on-going hot topics in the forum or new activities in a criminal group. 4. Experiments In order to implement our approach, first, we consider a small dynamic, synthetic network (Fig. 3) during five time intervals.

Figure 2. Relationships between communities.

Combing evolution graph and dynamic SN properties, we can obtain insights into the dynamic SN. Definitions 2–4 define key properties of the dynamic SN [26]. Definition 2. (life-line): Let C  c1 ,...,c n  be a collection of communities. For each community c i C with i  n ,

©ASE 2014

Figure 3. A small dynamic network with frequency contacts between nodes

ISBN: 978-1-62561-000-3

5

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

After employing algorithms 1 and 2, we find static community trees for each periods of time separately that shown in Fig. 4 The arrows indicate leadership, and The parameters let   0.1,   0.001,   0.5,   0.01,   0.7 .

Now, in this stage, we have information about leadership during each periods of time separately that is not enough for the complete analysis. So, with the purpose of achieve a good understanding of the network’s changing trends during intra time-step evolutions, we employ algorithms 3 to drive the evolving community trees shown in Fig. 6.

Figure 6. The evolving community trees for the synthetic network

The analysis of outputs clarifies some events that were happened during time:

Figure 4. Static community trees for each periods of time separately.

Also, the trend of member’s activity in Fig. 5 May be seen.

0.35 0.3 Activity

0.25

A

0.2

B

0.15

C

0.1

D

0.05

E

0 0

2 4 periods of time

6 Figure 7. Evolution of communities for the synthetic network.

Figure 5. Evolving member’s activity for the synthetic network

©ASE 2014

a) From T1 to T2, the leader of member E changes from C to B. b) From T2 to T3, leader of members A and E change from B to D. c) From T3 to T4, 1) splitting on B, C. 2) evolving between C, D. 3) emerging the core D related to members B, E. d) From T4 to T5, splitting on member E.

Fig. 7 Shows the evolution map of communities in which Community that its core is D has most stability (number of

ISBN: 978-1-62561-000-3

6

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

supporters divided by the size, named stability. Size is the average number of members in each community in life-lines.) among all communities and its supporters are members B and C. While the straightforward application of this method is in social networks, this technique is appropriate for all type of complex networks, and the type of network does not influence the results. We have used the real frequency data from a computer network of 288 nodes to evaluate this approach. Data for a period of 100 seconds, divided into 10 equal periods of 10 seconds each, is used to construct evolving community trees and draw evolution map of communities they are. After employing algorithms 1, 2 and 3, we have reached evolving community trees as shown in Appendix. The arrows indicate hierarchical leadership between nodes, and the parameters let   0.1,   0.001,   0.5,   0.01,   0.7 .

and a real world datasets, and produce good results. The experiments show that the framework can well present the organizational structure of a social network. We obtained experimentally the following insights: (1) those communities with long life-line and great stability likely correspond to a real organization; and (2) the cores in an organizational structure, in general, are either the leaders of the organization or the agents of these leaders. Although it is possible that the organizational structure discovered from a social network is not perfectly in line with the real world organization, the approach described here helps reach new understandings of the organization based on the power of attracting information flow and the interaction closeness.

References [1]

[2]

[3]

[4]

[5] [6] [7]

Figure 8. Evolution of communities for the real data set.

Fig. 8 Shows the evolution map of communities, in which the community that its core is node 82 has 185 size, longer lifeline (30 second) and the largest stability with amount of 0.7 among all communities. As expected, the algorithm identifies routers and hubs as the cores of communities.

[8]

[9]

[10]

5. Conclusions Exploring organizational structure in a dynamic social network has a broad range of applications, such as monitoring gang activities, fraud detection, and improving performance of viral marketing. In this paper, we present our research effort in extracting organizational structure from such data to obtain a better understanding of the social network. We formalize a community tree data structure for the purpose of representing the social network organizational structure, and propose a framework to explore the dynamic behavior of the participants of the community. The framework is composed of three main parts: (1) defining the “Correlation Density Rank”, to rank the nodes to acquire a community tree from the static social network; and, (2) a tree learning algorithm, which employs the tree edit distance as a scoring function, to generate the evolving community tree; These algorithms were applied to a synthetic

©ASE 2014

[11]

[12]

[13] [14]

[15]

[16]

Chekuri, C.S., et al. Experimental study of minimum cut algorithms. in Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms. 1997. Society for Industrial and Applied Mathematics. Ding, C.H., et al. A min-max cut algorithm for graph partitioning and data clustering. in Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on. 2001. IEEE. Hagen, L. and A.B. Kahng, New spectral methods for ratio cut partitioning and clustering. Computer-aided design of integrated circuits and systems, ieee transactions on, 1992. 11(9): p. 1074-1085. Long, B., et al. Community learning by graph approximation. in Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on. 2007. IEEE. Newman, M.E., Fast algorithm for detecting community structure in networks. Physical review E, 2004. 69(6): p. 066133. Newman, M.E. and M. Girvan, Finding and evaluating community structure in networks. Physical review E, 2004. 69(2): p. 026113. Shi, J. and J. Malik, Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2000. 22(8): p. 888-905. Wu, A.Y., M. Garland, and J. Han. Mining scale-free networks using geodesic clustering. in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining. 2004. ACM. Xu, X., et al. SCAN: a structural clustering algorithm for networks. in Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. 2007. ACM. Zhou, D., et al. Probabilistic models for discovering e-communities. in Proceedings of the 15th international conference on World Wide Web. 2006. ACM. Goyal, A., F. Bonchi, and L.V. Lakshmanan. Discovering leaders from community actions. in Proceedings of the 17th ACM conference on Information and knowledge management. 2008. ACM. Xu, J.J. and H. Chen, CrimeNet explorer: a framework for criminal network knowledge discovery. ACM Transactions on Information Systems (TOIS), 2005. 23(2): p. 201-226. Carley, K.M., et al., Toward an interoperable dynamic network analysis toolkit. Decision Support Systems, 2007. 43(4): p. 1324-1347. [14]. Ma, H., et al. Mining social networks using heat diffusion processes for marketing candidates selection. in Proceedings of the 17th ACM conference on Information and knowledge management. 2008. ACM. Xu, J.J. and H. Chen, Fighting organized crimes: using shortest-path algorithms to identify associations in criminal networks. Decision Support Systems, 2004. 38(3): p. 473-487. Kumar, R., J. Novak, and A. Tomkins, Structure and evolution of online social networks, in Link mining: models, algorithms, and applications. 2010, Springer. p. 337-357.

ISBN: 978-1-62561-000-3

7

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014 [17] Tang, L., et al. Community evolution in dynamic multi-mode networks. in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008. ACM. [18] Tantipathananandh, C., T. Berger-Wolf, and D. Kempe. A framework for community identification in dynamic social networks. in Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. 2007. ACM. [19] [19]. Zhou, D., et al. Discovering temporal communities from social network documents. in Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on. 2007. IEEE. [20] Bertino, E., E. Ferrari, and V. Atluri, The specification and enforcement of authorization constraints in workflow management systems. ACM Transactions on Information and System Security (TISSEC), 1999. 2(1): p. 65-104. [21] Song, M. and W.M. van der Aalst, Towards comprehensive support for organizational mining. Decision Support Systems, 2008. 46(1): p. 300317. [22] Zur Muehlen, M., Organizational management in workflow applications–issues and perspectives. Information Technology and Management, 2004. 5(3-4): p. 271-291. [23] Clauset, A., C. Moore, and M.E. Newman, Hierarchical structure and the prediction of missing links in networks. Nature, 2008. 453(7191): p. 98-101. [24] Grobelnik, M., D. Mladenic, and B. Fortuna, Semantic technology for capturing communication inside an organization. Internet Computing, IEEE, 2009. 13(4): p. 59-67. [25] Li, H., et al. Scalable community discovery on textual data with relations. in Proceedings of the 17th ACM conference on Information and knowledge management. 2008. ACM. [26] Qiu, J. and Z. Lin, A framework for exploring organizational structure in dynamic social networks. Decision Support Systems, 2011. 51(4): p. 760-771. [27] Singh, A.K. and P. Ravi Kumar, A comparative study of page ranking algorithms for information retrieval. International journal of electrical and computer engineering, 2009. 4(7): p. 469-480. [28] Rajaraman, A. and J.D. Ullman, Mining of massive datasets. 2012: Cambridge University Press. [29] Cho, J., S. Roy, and R.E. Adams. Page quality: In search of an unbiased web ranking. in Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005. ACM. [30] Jin, Hong, Shuliang Wang, and Chenyang Li. "Community detection in complex networks by density-based clustering." Physica A: Statistical Mechanics and its Applications 392.19 (2013): 4606-4618. [31] Qi, X., et al., Optimal local community detection in social networks based on density drop of subgraphs. Pattern Recognition Letters, 2014. 36: p. 46-53. [32] Gong, M., et al., Novel heuristic density-based method for community detection in networks. Physica A: Statistical Mechanics and its Applications, 2014. 403: p. 71-84. [33] Song, M. and W.M. van der Aalst, Towards comprehensive support for organizational mining. Decision Support Systems, 2008. 46(1): p. 300317. [34] Gruhl, D., et al. Information diffusion through blogspace. in Proceedings of the 13th international conference on World Wide Web. 2004. ACM. [35] Craswell, N. and M. Szummer. Random walks on the click graph. in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. 2007. ACM. [36] Lovász, L., Random walks on graphs: A survey. Combinatorics, Paul erdos is eighty, 1993. 2(1): p. 1-46. [37] Yen, L., et al. A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008. ACM. [38] Saerens, M., et al., Randomized shortest-path problems: Two related models. Neural Computation, 2009. 21(8): p. 2363-2404. [39] Kivimäki, I., M. Shimbo, and M. Saerens, Developments in the theory of randomized shortest paths with a comparison of graph node distances.

©ASE 2014

[40]

[41]

[42]

[43]

[44] [45]

Physica A: Statistical Mechanics and its Applications, 2014. 393: p. 600-616. Brin, S. and L. Page, The anatomy of a large-scale hypertextual Web search engine. Computer networks and ISDN systems, 1998. 30(1): p. 107-117. Zareh Bidoki, A.M. and N. Yazdani, DistanceRank: An intelligent ranking algorithm for web pages. Information Processing & Management, 2008. 44(2): p. 877-892. Anand, K. and G. Bianconi, Entropy measures for networks: Toward an information theory of complex topologies. Physical Review E, 2009. 80(4): p. 045102. Xing, W. and A. Ghorbani. Weighted pagerank algorithm. in Communication Networks and Services Research, 2004. Proceedings. Second Annual Conference on. 2004. IEEE. Pearl, J., Heuristics: intelligent search strategies for computer problem solving. 1984. Bille, P., A survey on tree edit distance and related problems. Theoretical computer science, 2005. 337(1): p. 217-239.

ISBN: 978-1-62561-000-3

8

2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY Conference, Stanford University, May 27-31, 2014

Appendix

35 08 2

99

55 3 1

00 2

159

214

44

79

154

266

55

225

7 1

194

215

28

12

224

1

72

178

36

75

14

66

252

81

237

96 1

24

260

112

13 69

162

101

32

7 4

243

57

272 83 108

55

2 72

232

48 1

11 4

24

129

203

260

111 20

84

267

77 2

129

23

137

162

14

140

89

68 222

61 1 219

175

138

81

150

50

255

25

213

207

61

101

144

24

18 83

39

15 2

21 4 34

63 1

1 2

119 67

44

20 5 93 19 2

17

14 3

71

25 0

164

32

82

35

42 1

78

24 8

6 5

75

1 48

275

2 42

26 5 1 25

178

40 51

68

4 5 22

27 5

40 1

26

12

169

9 7

22 9

59

49 2

52

228

32 1

55 2 18 5

15 6

273

159

27 2 5 4

158

204

35

114 03 1

71 2

2 15

11 5

28 1

36 2

87

268

18 6

24 1

175

72

41

23 4

21 2

23 7

67

2 22

6

15 9

2 56

136

140

269

153

116 35

16

97 1

18

103

78 164

38

24 3

4 2 59 114

99 127

107

16 1

19

67

37

3

33

63 1

41

38 206

230 53

244

14

217

256

232

129

124

98 1

263

120

18 4

243

200 54

85

2 1

166

187

17 5

115

18

14 8

183

41 2

158

63

238

188 4 8

246

8

119

192 37 2

40 52 1

27 0

211

94

266 132

268

10

0 8

236

30

146 176

274

21 0

72

144

142

57 1

6 2

229

79

61

26 3

257

46

142

174

141

2 9

5 9

234 245

6 3

4 5

45

207

102

24 4 1

12 0

194

6 8

0 5

65

30

8

56

55

39 47 1

77 253

2 61

80

67

242 276

16 48

133

2 6

222

63

105

62 205

173

25 2

9

22

249

187

195

NU LL 76

95

164

26 7 258

73

3

81

252

185

251

224

57

55 2 13 9

220 87

11 5

78

19 3

124

11

26

93 165

75 2

247

209 178

17 4

2 04

53

200

10 9

30 2

143

19 6

38

83

64 2

153

15 4

19

79

227

203

87 93

242

213 276

15 6

23 16

130 27 1

115

23 0 1 66

57

216

14 2

242

3 89

11 29

1 65

84 22 0

66

25 4 13 3

86 71

83

248

15

90 179

228 215

138

128

3 1

34 1

169

104 262

135 150

77

202

99

5 9

233 221

229

110

58

11 7 211

1 27

73 87

1 52

25 249 238 17

12 7 21 184

20 5

22 3 14

272

31

6 2

66 91

99

2 21 18 8

1 32 31

1 86

108

44 177

196

51

21

62

219

262

125

97

37 270

25 7

20 7 11 7

76 2 74 11 1

1 58

2 53

1 36 20 2

2 13

58

20

2 69

98 1

75 201

47

59 2

25 6 240

146

1 63

69

27 135 02 1

239

16 0

228 14 1

206

23 2

212

119 8 6

260

271 41 131

104

29

155

1 38 88

25 8 16 1

68

1 08

1 40

79 215 155

191

181 253

7

26 5

63

227

32

105 20 38

2 25 13 1

13 9

24 0

67

262

01 2

2 35

24 7

61

11 9

1 78

2 50 10 5

12 9

186 18 2

204 173

82 248 74 144

74

14 1

19 1

24 9 7

17 3

10 1 14 3

25 9

26 7

82

2 31

91

10 4

17 3

64 2

24 5

2 48

23 3

55 257 270

NUL L

22 7 18

2 29 10 2

2 32

11 7

47 1

22

2 73

94 19 4

42 1 23 1

154

30

18 0

67 1

6 35 2

195

34

60 2 38

11 0

31 2

20

66 1

1 8

37 1

231

145

59 1 26 1

160

82 76

254

34 1

2 14

1

118

70

225

8

224

139 218 200

185

2 62

21 8

97

9 8

29

20 19 165

70

NUL L 8 4 9 1

209

15 7

12

95 32

19 2

17 2

54 2 71

52

13

22 3 179

16 2

25 0

53 1

112

186

147 87

188

28

14 2 99

1 83

76 89 1

61

91 101

88 136 11

191

151

15

42 65

19 8 2 77

1 79

2 11 10

2 09

70

65 2 16

114

17

1 23 14 5

47

1 12

27 6 15 6

1 28

93

1 69

265 22 6

167

46 1

1 95

10 2

21 3

52 2

26 6 1 55

75 2

10

234 25 30 1 22 1

233

245 40

06 1 236

54 1 30 13

23

45 2

60 172

189

141

86

1 16 1 37

1 24

2 39

74

2 52

2 43

14 7

5 6 134

20 1

2 60

9

4 9

27 4

72

1 75 24

21 5

180

88 1

07 1

50

36

240 20 1

157

1 13

18

19 9

80

1 81

18 9

85

1 93

1 87

26 4

51

26 3

1 64

48

25 20 3

97

1 62

28

2 68

2 08

56 1 00 5 1

76

2 46

22 3

1 35

39

243

172

155

52

11

238

40 77

20 5 8

16 69

9 8

11 109

221 13

91

1 50

1 09 63

92 15

62

136

41 1

15 73

209 50

245

2 9

54

1 20 193

90

26

2 10

20 7

8 4 41 2

84 1

10 6

19 7

12 5

60

199 160

223

138

53

NUL L 17

30 27

15 4

263

176 65 1

80

11

74

254

25 1

247

0 3

29

8 2

203 27 6

23 8

9 1

124

18 9 19 8

212 206

6

37 53 2

112

7

194 247

8

267

17 4

20 0

43 137

39 58 2

25 1

17 9

91 1

16 2 1 4

130

269

237

225

3 2

0 7

47 31 1

32 2 88

217

190 81 1

44 2

23 1 266

161

05 1

7

15 3

10 51 1

233

239 23 5

120 18 2

42 2 18

16 2

183

13

146

77

46 2

10 2 22

269 19 3 26 1 20

110 9

17 4

48 1

12

108 72

189

135

21

18 7

45 1

42

270

244

204 23 0

10 4

95

167

247

41

259 16

71

236

152

22 1 75 204 222

75

38

101

63

69

94

50

239

259

133

25 6

50 1

116 6

21

10 3

81

94

120

196

16

15

186

68

100

31

8 7

62 2

224

229

265 210 264 107

46 110

241

231

95 1

3 7

185

140

163

39

32

254 104

139

77

129

166

266

218

45

23 9

11

180

253

9

191 219

105 195

240 217 19 4

155

198

21 91

71 128

138

6 4

127

84

114

261

136

69

22

117 220

237

8

137 112

70 125

208 225

181

89

73 52

133

25

273

83

179 40

113 159 274

275

257 29

9

6

1

183 154

139 130

169 84

233

20 2

251 28

82

158

143

33

46

176

7

58 33

272 10

207

36

21 0 31

197

118 109 177

44

246

85 208

213 78 09 1

86

212 35

102

100

206

18 22 8

19 3

90

147

235

92 66 199

6 5

203 261

18 7

57

23 113

156

177

111

18 4

227 18 1

110

0 6

148

169 175

45

259

192

46

161 19 2

250

65

163 16

65

178

56

51

143

85

223

145

3

202

197

277

176

60

131

183

209

74

258

160 22 0

132

08 1

256

13

17 255 37

44 2

64 1

20 2

103

1 76 1 57

19 214 184

180

1 67 1 85 2 40

1 80

31

47

53 196

15 2

2 55

27 0

92 205

258

74 2 11 2

22

85 135

48

58

52

264

58

107

0 8

197

34 2

173 125 117

10 6 2 71

97 260

10

156 158

167 273

51

10 3

1 46 100

21 2

231

257

1 34

192 NUL L

90

128 41 2

271

2 36

22 7 26 7

33

248

160

22 4

57 90

53

230

1 77 34 2

86

144

88

68 2

11 8

269

131 1 6

37

14

249

2 51

1 07 33

172

15 3 23 1

165

123

172

2 06

268

145

263

97

20 8

119 62

124 33 1

276 6

199

36

174 250

93

150 246

Time Period 1

Evolving between Time Periods 1 and 2

235

Evolving between Time Periods 2 and 3

Evolving between Time Periods 3 and 4

Evolving between Time Periods 4 and 5

1 43

198

172 6

97

213 203

209

205

11 6

181 173 3 8

2 41

200

202 89

196 8

77

165

243

1 40

146

11 7

3 9

11

59

1 29

25 275

16

56

86 228

156

100

189

1 47 18

237 15 194

272

99

80

29

165

165

23

27 7

87

198

56

63 8

146

26 4

1 48

10 7

43

23 9

232 1 12

144

143

54

200

118

1 83

35

89

36 135

23 9

24 3

145 86

127

NU LL

88

174

36

1 45

2 9

158

103

3

63

225

257 93

19 4

10

108

2 31

150

111

40

02 2

176

191

160 174 2 33

8 6

53

138

1 4 2 11

30

74

48

274

230

17

172

131

160

50

66

2 46

24 8

19 7

13 3 2 06

16

2 14

17 7

42

141

1 65

44 91

21

13 0

178 12

1 20

13 9

15 2

55

2 73

57

10 5

1 36

34

22 4

195 26 8 209

74

14 6

274

229

270

14 3

210 22 6

203

212

104

21 7

42

169

86

164 103

135

190 11 5

1 32

60

166

106 72

56

15 1

79

240

1 85

236

39

75

21 6

25 134

116

2 75

38

3

40

184

227

115

17

2 54 13 1

19

43

1 81 262

22 8

9

110

94

274 1 11

25 2

15

128 1 57

60

2 29

200

19 7

99

2 49

145

142

225

138 124

154 2 21

238 202

20 1

242

29

37 6

2 31

1 53 193

2 43 1 80

10

1 94

1 66

128

271

13

183

76 80 33

12 9

158

11 2

31

78

24 2

40

144

253 1 50

155

2 45

58

20 5

23 6

8

41

71

1 86 141

95

25 6

132

234

55

218

187

268

88

137

25 7

1 92

198 11 8

67 251

54

196 24 4

79 2 32

157

1 59

238

184

224 210

18 8

40

53

91 231

117

63

1 93

243

185 17 4

208

160

46

152 85

87

99

230

10 6

65

21 6 23

109 24 0

1 24 200

2 33 1 23 28

94

267

3

22 7

100 161

10

163

244

87

179

2 20

25

276 22 7

140

48

159

276 181

275 26 6

1 27 92

1 65

66

21 1 108 103

12

123 173

8

2 62

93

172 50

191

70 1

239

20

161

69 101

15 6

207

26 264 82

30

1 31

209

255

258

61

1 73 100

11 3

267 2 61 163

213 72

1 72

25

24 1

90 125

13

62

73

270

10 7 26 5

2 66 52

19 9

2 50

83

81 84

47

27 7

79 7

2 54

13 35

57

195 18 7

1 69

1 62

17 6

147

32

56 51

1 77 1 40

1 73

34

99

15 3

27 2

14 1 14 2 75 2 30

10 2

11 1

93

164 259 68

3

211

260

53

21 0

2 69

154

74

18

18 3

184 43

97

61 1 75

1 50 28

191 46

2 51

2 48

50 97

21 8

224

14 8

13 3 12 7

1 01 1 39

75

45 260

54 21

191

232

123

16 2

110

1 29 53

12

31

2 09

12 4

8 4

21 5

2 35

1 78

19

1 20 14 1

13 0

2 01 68

237

18 0 18

26 2 1 42

60

1 60

21 8

8 5 20

0 28

47 74

220

24 6

55

148

8

17 5

27 5

17

20

101 192

22

222

47 109

92 44

10

11 9

39

114

1 46

2 19

174

26 3

2 03 24 5

20 6

37

61

62

38

219

14

18 9

175 77

33

150

20 5

208 2 11

218

89

NU LL

119

37

1 38

255 27 6

6

15 8

186

200

78 04 2

113

75

19 8

248

156

45

230

19 2

2 63

2 44 83

76 48

26 6

116

214

18 1

1 79 133

229

86

141

52

89 250

155

114

21 5 2 61

84

1 18 15 6

81

212 128

10

242

203

124 211 69

193

76 5 1 7

72

119

44 255

3

41 3 8

73 197

95 4 3

161

197 7 5 41

178

6 17

59 2 10 7

177

183

46 234 1 89

147

258

125

261 254

2 37

129

2 08 219

222

85 66 13

1 23

227

1 25 1

65 1 80 15 3

1 07

22

190

65

151

153 207

26

73

12 11 1 36 2

65 240

162 18 7

87

38 77

72 137

254

154 209

238

2 23

60 164

73 1 216

86 1 57

71

1 55

135 195

40 1

185 264

7

79

10

23 3

2 04

134

163 241 184

266 26 5

233

1 5

244

31

41

2 53

40 166

13

177

53 2

08 1 101

11 9

80 1 11 2

21

20

19 6

262

120

37

2 3

260

130 243

27 0

25

268

0 5

45 2 67

4 5

80

8 2

5 8

127

9

0 9

162 148

4 1

256

102

8 7

269 117

9 124

37

112

252

201

178 33

154

110

97

251

109

1 9

221

159

207 1 86 204 118

133

66

38

199

19 260

179 79

85

12 9

137 8 4 102

9 0 267

179

44

83 70

22 1

21 71

39

273

2 24 7 8 69

245 73 68 238

60

244

188

214

15 5

26 2

253

130 225 259 92

77

23

33

129

90

52 94

193

113 229

146

1 43

1 04

13 7

221

26 5

1 72

120 220

23 2 246

207 80

19

67

2 34 261

81

21

102

2 57

67 69

27 1

32

152 81 57

19 3

82

259

135

8

212

1 15

245

2 22

26 7

35

11 5 266

167

1 05

218

1 04

53

23 7

143

103

185

54

223

76

80

2 48 252

257

2 47

13 5

187

128 1 66

1 44

31 12 5

247

113

241 2 50

36

215 86

263 250

164

105

272

93

9 5 277

114 236

16 7

2 35

21 4

264

13 1

26 4

46

116

32 159

32 219

235

270

55

1 99

21 5

23 203

2 06

223

NULL

2 50

242

206

1 37

142 78

28

27 2

13 9

242 13 0

1 33

177 17 6

191

11

1 04

153 36

2 49 51

258

154 85

90

92

204

70

N ULL

247 142

110

22 8 2 04

221

65

194

35

74

263

239

240

12

14

62

24 7

22

13 2

186 89 30

91 1 96

26

251 95

230

70

222

84 214

14 8

192

18 0

17

29

71

2 02

26 0

63

39

1 95

9 47

238 71

48

119 7

105

118

225

2 63 256

252

108 58

82

2 39

2 34

42

27 2

6 69

2 31

11 1

1 79 162

13 4

10 3 21 2

160

131

254 101

52

22 18 9

1 56

255

14 7

253

51 24 9

2 59

25 2 208

2 41 2 25

144

50 1 12 5

16

83

23 1 97

11 1 99 35

19

105

93

94 145

66

19 4 50

277

33

210 16 9

36

195 223

31

267

205

115

67

6 2

15 2 231

14 7

0 7 246 N ULL

232

68

213

81

1 25

102

112

2 35

233

44 1

228

26 8 27 4

18 7

206

26

76

18 215

261

265 276

20

1 98 55 2 97

139 62

234

72

47

63

18 9

16 3

145 2 8

159 104 223

183

2 46

18 178

248

10 9

15

57

2 02

2 36 142

11 6

24 7

1 276

220

249

274

41

55

22 4 1 85

24 77

2 28

227

88

1 64

12 8

74 1

46

192

85

1 61 7 8 16 167

95

113

16

181

249

123 235

20 73

117

2 77

1 96

22 2

91 2 70

56

199

28

210

258

75

265 27 1

9

69 2

2 8

257

88

17 6

20 7

92

11

26

107

2 37 1 08

240

1 09

30

21 6 94

22 9

58

5 2 15

17

1 66

20 5

227 30 256 99

144

15

21 3

100 139

12 0

16 7

269

155

2 56

163

1

61

212

1 00

15 2 161

Evolving between Time Periods 5 and 6

©ASE 2014

140

84

176

167

26 9 61

Evolving between Time Periods 6 and 7

Evolving between Time Periods 7 and 8 The evolving community trees

ISBN: 978-1-62561-000-3

2 19

Evolving between Time Periods 8 and 9

1 14

Evolving between Time Periods 9 and 10

9

Suggest Documents