Traversing a graph for identifying communities

Mahadevan Vasudevan and Narsingh Deo
Dept. of EECS, University of Central Florida, Orlando, FL
[email protected], [email protected]

Abstract

Locally dense subgraphs of significant size in a large, globally sparse graph are referred to as communities. In other words, communities are subsets of nodes that are closely knit in a relatively sparse neighborhood. The existence of communities is a property inherent to complex networks (e.g., the World Wide Web (WWW), the Internet, and protein-interaction networks). Classical Erdős-Rényi random graphs do not exhibit this phenomenon. The emergence of complex networks in various disciplines (sociology, biology, computer science, linguistics and mathematics, to name a few) has spurred interest in community discovery. Extracting communities in a large graph has been shown to be NP-complete, and a number of approximate algorithms have been proposed in recent years. At least two different questions may be posed on the community structure of large networks: (i) given a network, detect or extract all communities (i.e., the sets of nodes that constitute them); and (ii) given a node or a small subset of seed nodes in the network, identify the best community that includes these nodes, if one exists. Several algorithms have been proposed to solve the former problem, known as Community Discovery. The latter problem, known as Community Identification, has also been studied, but to a much smaller extent. In this paper, we discuss and compare the existing community identification algorithms in a unified framework of traversing the graph from the seed node. We also propose an improved definition of a community and a novel community identification algorithm based on that definition.

Keywords: Community, graph traversal, complex networks.

I. Introduction

Real-world complex systems have been modeled and studied as networks. Each node of the network corresponds to an individual object of the system, and each edge represents an interaction between two such entities [44]. Examples of real-world complex networks include the Internet, the World Wide Web (WWW), protein-interaction networks, human metabolic networks, ecological networks and railroad networks [6, 29]. The term complex network refers to any large, dynamic, random graph that corresponds to a complex system. These networks exhibit randomness significantly different from that of the classical Erdős-Rényi random graphs [14]. Complex networks appear prominently across disciplines such as sociology, biology, computer science, linguistics and mathematics. Their ubiquity is the main reason behind the surge of interest in the study of their properties [32]. The properties of complex networks have been studied at two extreme levels: (i) microscopic, i.e., properties at the node level (e.g., degree distribution [47] and clustering coefficient [7]), and (ii) macroscopic, i.e., global properties such as the network distance (small-world effect [3]). Despite the large size of complex networks, the values of each of these properties can be easily obtained. The degree, average distance and clustering coefficient have been empirically determined for more than 700 real-world complex networks [32].

Figure 1: Communities in a graph. The subgraphs marked by dashed circles correspond to communities.

However, recent research has focused on the study of communities in complex networks. A community is defined as a subset of vertices that are closely knit in a relatively sparse neighborhood. Modules, motifs, and clusters are other terms for such dense subgraphs. Figure 1 shows an example graph with three communities. Detecting and defining communities in complex networks is tricky because they lie between the microscopic and macroscopic properties. Hence they are referred to as a mesoscopic property. Techniques to identify, extract and detect communities in networks are classified as community detection algorithms. Detecting communities in a large graph is NP-complete [18].

Algorithms dealing with community detection in complex networks address one of the following two questions: (i) given a network, can we explore and extract subsets of nodes that form a community? (ii) given a network and a seed node, can we identify the best community that the given seed belongs to, if one exists? The former problem, known as Community Discovery, has been studied extensively in the literature and a number of approximate algorithms have been proposed [11, 16, 20, 30, 38]. The latter, known as Community Identification, has also been studied in the literature, but to a smaller extent. Techniques to discover communities in complex networks are broadly discussed by some authors [18, 35, 41, 45], but none of them elucidates community identification algorithms in detail. The taxonomy we proposed on community detection [45] gives a comprehensive classification of the existing algorithms.

Global community detection is tedious, especially in the case of complex networks, because it necessitates the exploration of the entire graph. In contrast, identifying a community from a given seed node using only its local neighborhood information is relatively easier and has a wide range of real-world applications [42, 43]. Nevertheless, identifying an induced connected subgraph with a specific property in a given graph is NP-complete [19]. In this paper we give a comprehensive summary of the approximate algorithms proposed to identify communities. We also propose an improved definition of a community and an algorithm to identify communities in complex networks based on that definition.
The paper is organized as follows. Section II presents several abstract and qualitative definitions of a community. A generic approach to community identification in complex networks is discussed in Section III, followed by a brief description of the existing community identification algorithms in Section IV. In Section V we describe the proposed community identification algorithm in detail, followed by the results (Section VI) and the conclusion (Section VII).

II. Defining a Community

Communities are locally-dense subgraphs in large, globally-sparse graphs. A generally accepted mathematical definition of a community has yet to appear in the literature [18]. Existing definitions are conceptual and depend mainly on the structure of the underlying network. Some of the definitions are also constructive, i.e., the result of algorithmic steps [39].

An early notion of a community focused on its equivalence to a clique [24]. But expecting every member to be connected to every other member within a subgraph of a sparse random network is too stringent. Therefore, a more relaxed definition based on the maximum distance between two nodes of the subset was proposed. Instead of requiring every neighbor to be at distance one (clique), the subgraph diameter [12] would have an upper limit of d. d-clique [2], d-clan and d-club [28] are some of the terms associated with dense subgraphs based on distance. But a subset of nodes whose pairwise distances in the graph are at most d need not induce a connected subgraph, since the shortest paths may pass through nodes that are not in the subset.

A more formal approach to defining communities was attempted by Radicchi et al. An induced subgraph is a community if the sum of the internal degrees of the nodes within the subgraph is greater than the sum of their external degrees [37]. Later, Hu et al. [21] suggested a tighter version of this definition in which only external edges connecting to other communities are taken into account. Subgraphs satisfying these definitions are referred to as weak communities. In order to define a community in the strong sense, comparing the sum of the degrees of the nodes of the subgraph against its neighborhood is not sufficient. Instead, a stronger association or alliance between every node and its neighbors is expected within the subset. Flake et al. [15] suggested that the internal degree of every vertex within a community should be greater than or equal to its external degree. The definition was widely accepted, especially in the context of a web graph. Radicchi et al. and Hu et al.
defined a strong community as one in which the internal degree of every node is greater than its external degree. These definitions coincide with that of alliances in graph theory. A powerful alliance is a subset of vertices such that each vertex has more neighbors within the set, to defend against and defeat any attack from the neighbors outside the set [22]. Alliances in graphs can be of two types: (i) Given a graph $G(V, E)$, a non-empty set $S \subseteq V$ is a defensive alliance if every vertex v in S has at most one more neighbor in $V - S$ than it has in S. (ii) S is an offensive alliance if every $v \in V - S$ that has a neighbor in S has more neighbors in S than in $V - S$. A powerful alliance is both defensive and offensive.

The definition we propose is a direct consequence of the intuitive definition of a community: a locally-dense subgraph in a large, globally-sparse graph. The denseness of the subgraph is expressed by the ratio of the total number of edges in the subgraph to the number of edges possible with the given subset of nodes. In order to measure the density of the subgraph relative to its neighbors, we use the ratio of the number of edges within and outside the induced subgraph. The former is referred to as the internal density and the latter as the relative density. The proposed definition is similar to Schaeffer's definition [40], but differs in terms of the relative density. In a given graph $G(V, E)$, let $G'(V', E')$ be any subgraph with $V' \subseteq V$ and $E' \subseteq E$; the subset with the maximum f (the product of the internal density and the relative density) forms a community $C(V_C, E_C)$. By the description above, the internal density of $G'$ is

$$\frac{|E'|}{\binom{|V'|}{2}} = \frac{2\,|E'|}{|V'|\,(|V'| - 1)}.$$

A complete graph (Kn) has a value of 1 for f. The higher the value of f, the better the community. The proposed algorithm is based on maximizing the value of f from a given seed vertex.

III. Community identification preliminaries

Community identification can be formally described as follows: Given a graph $G(V, E)$ (a large sparse graph) and a seed vertex $u \in V$, does there exist a community that u belongs to? If yes, return the induced subgraph. In a large sparse graph, a node need not always belong to a community. There are three possible scenarios based on the location of a seed vertex (Figure 2), and any community identification algorithm should address these scenarios.

(i) The seed vertex belongs to the core vertex set of the community. This is the primary scenario and the algorithm should easily identify the corresponding community.

(ii) The seed vertex belongs to the community, but not to the core set. In other words, the seed vertex is one of the boundary vertices [11] and thus can belong to more than one community. Such overlapping communities (cover) are dealt with in a separate class of algorithms [45]. But for community identification, when the seed vertex belongs to the overlapping zone, the algorithm should be able to identify the cover.

(iii) The seed vertex does not belong to any community and so the algorithm should not return any community associated with such a node.

Figure 2: Community identification scenarios. Ci denotes a community.

In general, any community identification algorithm adheres to the following steps:

1) Begin with the seed vertex u.
2) Locally traverse the graph from u (BFS [12], simulated annealing [40], etc.).
3) Choose nodes to be included in the community.
4) Traverse further from the newly added nodes until no new nodes can be added.

Initially, the traversal begins with the seed vertex's adjacency information. Then the neighbors are locally explored to reach the next set of unvisited nodes, typically in a breadth-first manner. New nodes are subsequently added to the community during the traversal. The traversal techniques and the criteria for adding nodes to a community are discussed in detail in the following section.

IV. Existing techniques

Community identification algorithms are scarce in the literature and are mainly greedy heuristics [4, 5, 8, 10, 26, 27, 33, 40]. Each of these algorithms iteratively accumulates one or more nodes that increase the value of a predefined metric. This metric measures the denseness of the identified community. Usually the community mining begins with the assumption that the seed vertex belongs to a community (scenario (i)). As mentioned above, the traversal explores the seed node's neighbors, then the neighbors of neighbors, and so on in a breadth-first manner. At each stage of the algorithm, one has to decide on the addition or removal of nodes to or from the community. In this section we discuss the various metrics used in the existing algorithms.

Given a graph $G(V, E)$, a community $C(V_C, E_C)$ with $V_C \subseteq V$ and $E_C \subseteq E$ consists of two different types of vertices. The nuclei vertices of the community are those members that have no neighbors outside the community. The boundary vertices are those members that have adjacent vertices that do not belong to the community. Also, every vertex of the community is either a nucleus or a boundary vertex, so the two sets partition $V_C$.
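The split into nuclei and boundary vertices can be illustrated with a minimal Python sketch. The adjacency-dictionary representation and the name `split_nuclei_boundary` are assumptions made for illustration; they do not come from any of the cited algorithms.

```python
def split_nuclei_boundary(adj, community):
    """Partition `community` into (nuclei, boundary) vertex sets.

    adj       -- dict mapping each vertex to the set of its neighbours
                 (undirected graph; hypothetical representation)
    community -- iterable of vertices forming the candidate community
    """
    members = set(community)
    nuclei, boundary = set(), set()
    for v in members:
        # A boundary vertex has at least one neighbour outside the community.
        if adj[v] - members:
            boundary.add(v)
        else:
            nuclei.add(v)
    return nuclei, boundary


# Toy graph: triangle 0-1-2 with a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
nuclei, boundary = split_nuclei_boundary(adj, {0, 1, 2})
print(nuclei, boundary)  # vertices 0 and 1 are nuclei, vertex 2 is on the boundary
```

Each metric discussed in this section can be computed from these two sets together with the counts of internal and external edges.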

i) Connection density ratio

Chen et al. [8] proposed an algorithm to identify communities based on maximizing the connection density ratio. The connection density ratio δ is defined in terms of the internal and external density of the community C.

Using this average density measure, the algorithm extracts the community around a given node in two phases. In the first phase, the adjacent nodes that increase the value of δ are chosen and added to the community. This continues until no more nodes can be added to the community. The second phase examines subsets of the nodes of the current community and chooses the one with the maximum δ.
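As a rough illustration, the ratio can be sketched in Python. The exact formulas are assumptions here: the sketch takes the internal density to be the average internal degree over all community vertices and the external density to be the average external degree over the boundary vertices, which is one reading of the "average density measure" above. The function name and graph representation are likewise hypothetical.

```python
def connection_density_ratio(adj, community):
    """Assumed form: (average internal degree) / (average external degree).

    adj       -- dict: vertex -> set of neighbours (undirected graph)
    community -- non-empty iterable of vertices
    """
    members = set(community)
    internal_total = 0   # sum of internal degrees over all members
    external_total = 0   # sum of external degrees over boundary vertices
    boundary_size = 0    # number of members with an outside neighbour
    for v in members:
        outside = len(adj[v] - members)
        internal_total += len(adj[v] & members)
        if outside:
            external_total += outside
            boundary_size += 1
    if boundary_size == 0:
        return float("inf")  # no edge leaves the community
    internal_density = internal_total / len(members)
    external_density = external_total / boundary_size
    return internal_density / external_density


# Two triangles, 0-1-2 and 3-4-5, joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(connection_density_ratio(adj, {0, 1, 2}))      # 2.0
print(connection_density_ratio(adj, {0, 1, 2, 3}))   # 1.0
```

Under this assumed form, adding vertex 3 to the first triangle lowers the ratio, so a first-phase greedy step would reject it.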

ii) Local modularity

Clauset [10] proposed a community definition based on the density of the boundary edges. He defined the boundary adjacency matrix B as follows:

$$B_{ij} = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ are connected and either vertex is a boundary vertex} \\ 0 & \text{otherwise} \end{cases}$$

He also proposed the local modularity Q and defined it as the ratio of the number of edges the boundary vertices share with the nuclei to the total number of boundary edges. The algorithm iteratively maximizes the value of Q until a given number of nodes is obtained in the community [10]. The local modularity is given by

$$Q = \frac{\sum_{i,j} B_{ij}\,\delta(i,j)}{\sum_{i,j} B_{ij}}$$

Here $\delta(i,j)$ is 1 when either vertex i is a boundary vertex and vertex j belongs to the community C or vice versa, and is 0 otherwise. Note that the global definition of modularity Q was given by Newman and Girvan [31].

iii) Subgraph modularity

Luo and Wang [25, 26] define modularity Q for a community as the ratio of the number of edges within the induced subgraph to the number of edges leaving it.

They refer to this term as subgraph modularity. Their algorithm begins with the seed node and, after each iteration, adds the node that increases the value of Q. The new nodes to be added are chosen from the neighbor list (which is updated after each addition). The list is also sorted in non-decreasing order of degree.

iv) l-shell spreading using emerging edges

Bagrow and Bollt proposed an algorithm based on the edges emerging from the induced subgraph [5]. The number of emerging edges of a node is the out-degree of the node. When the nodes of the graph are explored breadth-first with the seed as the root, the total emerging degree ($K_j$) is the sum of the out-degrees of the nodes at level j.

Their algorithm iteratively computes the change in $K_j$ from the seed node, and the stopping criterion is given by an input value α. The value of α is predetermined based on the degree distribution of the input graph and varies from one network to another. The change in $K_j$ at a level j is given by

$$\Delta K_j = \frac{K_j}{K_{j-1}}$$

and

$$K_0 = k_u$$

where u is the seed node and $k_u$ is its degree.
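The level-by-level traversal can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: it takes the stopping rule to be "stop as soon as the ratio of consecutive total emerging degrees drops below α", and the name `l_shell_community` is hypothetical.

```python
def l_shell_community(adj, seed, alpha):
    """Grow shells breadth-first from `seed`; stop when K_j / K_(j-1) < alpha.

    adj   -- dict: vertex -> set of neighbours (undirected graph)
    alpha -- threshold on the change in total emerging degree (assumed rule)
    Returns the set of vertices covered by the shells visited so far.
    """
    visited = {seed}
    shell = {seed}
    prev_k = len(adj[seed])  # K_0: every edge of the seed is an emerging edge
    while prev_k > 0:
        # The next shell holds all unvisited neighbours of the current shell.
        next_shell = set()
        for v in shell:
            next_shell |= adj[v] - visited
        visited |= next_shell
        # Emerging edges of the new shell lead to vertices not yet visited.
        k = sum(len(adj[v] - visited) for v in next_shell)
        if k / prev_k < alpha:
            return visited
        shell, prev_k = next_shell, k
    return visited


# A 4-clique {0, 1, 2, 3} with a tail 3-4-5 attached to it.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
print(l_shell_community(adj, 0, alpha=0.5))  # stops at the clique
```

With α = 0.5 the first shell already reduces the emerging degree from 3 to 1, so the traversal stops at the clique; a much smaller α lets it run on along the tail, which shows how sensitive the result is to the choice of α.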

v) Relative edge density metric

Schaeffer [40] proposed a metric that considers not only the density of edges within the community but also the proportion of the edges within and outside it. The internal density and relative density values given by Schaeffer are as follows. Let $E_{ext}$ denote the set of edges that connect the community nodes ($V_C$) with the external nodes. The internal density of the induced subgraph is given by

$$\frac{|E_C|}{\binom{|V_C|}{2}}$$

The relative density of the induced subgraph is given by

$$\frac{|E_C|}{|E_C| + |E_{ext}|}$$

The algorithm performs a local search using simulated annealing, and the subgraph with the maximum value of f (the product of the two densities) is the final community. Note that some edges may fall outside both of these edge sets. The proposed definition of a community (Section II) takes these edges into consideration, since their presence affects the locally-dense characteristic of a community.

V. Proposed Community identification algorithm

We propose a greedy algorithm based on maximizing the value of the metric f defined in Section II. From the definition it is evident that the induced subgraph with the maximal values of internal and relative density would form the ideal community. Therefore, we traverse locally from the seed vertex in a breadth-first manner, accumulating nodes that increase the value of f. One of the pitfalls of the existing methods of identifying a community is the inclusion of outlier nodes. We address this issue and discuss our algorithm in detail below.

The seed vertex and its adjacent neighbors form the initial community. The nodes adjacent to this initial community form the neighbor-list (queue). This list is updated whenever the community changes. The core of the algorithm is the alternation between two steps: (i) an addition phase and (ii) a deletion phase. In the addition phase, each node from the queue is added one at a time and the change in the value of f is checked. We add a node to the current community only if it increases the value of f. The new set of community nodes (the community may be unchanged after the addition phase, if no new node increases the value of f) is the input to the deletion phase. In the deletion phase, we remove one node at a time and compute the change in the value of f. If there is an increase, the node clearly does not belong to the community. If the removal of the input seed node increases the density value, then the seed does not belong to the community, and the output in that case is the empty set. The neighbor-list is updated at this point to include nodes adjacent to the newly added nodes. The alternating addition and deletion continues until no new vertices can be added to increase the value of the density. The proposed technique is expressed in Algorithm 1.

Luo et al. [26] used alternating addition and removal of nodes in their algorithm. But the metric they used measures the ratio of the internal and external edges and disregards the number of nodes. Since we are interested in dense subgraphs of significant size, it is essential to consider the number of nodes added to the community. The addition phase ensures the inclusion of all the nodes that increase the density of the community. The deletion phase eliminates outlier nodes that are of little importance to the community. Suppose the final output consists of k community nodes. The complexity of the proposed algorithm then depends on k and on the average degree of the subgraph, since each iteration needs to check the adjacent nodes.


Algorithm 1: Community identification algorithm

Input:   G (V, E)   Input graph with vertex set V and edge set E
         u          Seed vertex
Output:  C          Set of vertices that form the community

Procedure Initialize ( ) {
    C ← u and its adjacent nodes        // Initialize with u and its adjacent nodes
    Q ← nodes adjacent to C             // Queue to store adjacent vertices to be visited
    f ← f(C)                            // Compute the value of f for the current community
}

Procedure find_Community ( ) {
    do {
        size ← |C|                      // Store the current size of the community
        additionPhase ( )
        deletionPhase ( )
    } while (size ≠ |C|)                // Repeat the steps till there is no change in size
}

Procedure additionPhase ( ) {           // Add new node and compare change in f value
    foreach v in Q {                    // Obtain the first element of the Q
        if ( f(C ∪ {v}) ≤ f ) then
            skip v                      // Density value does not increase
        else
            C ← C ∪ {v};  f ← f(C)      // Retain the new value
    }
}

Procedure deletionPhase ( ) {           // Delete existing node and compare change in value
    foreach v in C {
        if ( f(C − {v}) ≤ f ) then
            keep v                      // Density value does not increase
        else
            C ← C − {v};  f ← f(C)      // Retain the new value
    }
}
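A runnable counterpart to Algorithm 1 is sketched below in Python (the implementation reported in the next section was written in Java; this is not that code). Two points are assumptions: the relative density is taken in the Schaeffer-style form, internal edges divided by internal plus boundary-crossing edges, as a stand-in for the variant proposed in Section II, and the names `density_f` and `find_community` are hypothetical.

```python
def density_f(adj, members):
    """f = internal density * relative density (relative density is an assumed form)."""
    n = len(members)
    if n < 2:
        return 0.0
    internal = sum(len(adj[v] & members) for v in members) // 2  # edges inside
    external = sum(len(adj[v] - members) for v in members)       # edges leaving
    if internal == 0:
        return 0.0
    internal_density = internal / (n * (n - 1) / 2)
    relative_density = internal / (internal + external)  # stand-in definition
    return internal_density * relative_density


def find_community(adj, seed):
    """Alternate addition and deletion phases until the size stops changing."""
    community = {seed} | adj[seed]          # seed and its adjacent nodes
    f = density_f(adj, community)
    while True:
        size_before = len(community)
        # Addition phase: try every node adjacent to the current community.
        queue = set().union(*(adj[v] for v in community)) - community
        for v in sorted(queue):
            new_f = density_f(adj, community | {v})
            if new_f > f:
                community.add(v)
                f = new_f
        # Deletion phase: drop any node whose removal increases f.
        for v in sorted(community):
            new_f = density_f(adj, community - {v})
            if new_f > f:
                if v == seed:
                    return set()            # the seed belongs to no community
                community.remove(v)
                f = new_f
        if len(community) == size_before:
            return community


def clique(nodes):
    """Adjacency sets of a complete graph on `nodes` (helper for the example)."""
    return {v: set(nodes) - {v} for v in nodes}


# Two 4-cliques, {0..3} and {4..7}, joined by the single edge 3-4.
adj = {**clique(range(4)), **clique(range(4, 8))}
adj[3].add(4)
adj[4].add(3)
print(find_community(adj, 0))   # the first clique
print(find_community(adj, 3))   # vertex 4 is added initially, then deleted
```

Seeding at vertex 3 shows the deletion phase at work: its neighbor 4 enters the initial community and is removed as an outlier because dropping it raises f.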


VI. Results

We implemented our algorithm in Java and tested it on various regular and random graphs. Regular graphs do not possess any dense subgraphs because their edges are uniformly distributed, and hence they have no communities. Classical random graphs also do not exhibit this property, since there is no preferential attachment among nodes to form denser subgraphs [7]. Several random graph generators take into account the dynamic addition of nodes and edges and produce graphs similar to real-world complex networks [9, 13, 17, 29, 34, 46]. These graphs are the benchmark graphs for testing community detection algorithms. The execution results of our algorithm on one such synthetic graph and on a real-world graph are discussed below. We used GraphViz [1] to visualize our results.

Synthetic (GN graph): Girvan and Newman [31] proposed a simple graph model with communities (GN graphs). These graphs are used as benchmarks for testing community detection algorithms [10, 11, 23]. We generated one such graph with 128 nodes divided equally among four communities. The density of the edges within and between the communities is specified as input. The synthetic graph is shown in Figure 3. One node from each of the communities was used as the seed, and the algorithm identified the communities accurately.

Real-world (Football network): The American football network depicts the schedule of football games between Division IA colleges during the regular season of Fall 2000, as compiled by M. Girvan and M. Newman [20]. This real-world complex network is considered a good benchmark for testing community detection algorithms since the communities are well defined [36]. Each node represents a college team in the division and an edge denotes a scheduled game between two teams. Each team plays against the other teams in its conference at least once and occasionally against teams from other conferences. Therefore, the communities in this network correspond to the eleven conferences of the league. The algorithm distinctly identified the communities when one node from each of the different conferences was given as the input seed. The results are shown in Figure 4.
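For readers who want to reproduce a graph of the kind used in the synthetic experiment above, the following Python sketch builds a planted-partition graph with 128 nodes in four equal groups. It is an illustration only: the generator actually used for the experiments is not shown in the paper, and the function name and the default probabilities `p_in` and `p_out` are hypothetical values standing in for the input-specific densities.

```python
import random


def planted_partition(n_groups=4, group_size=32, p_in=0.4, p_out=0.02, seed=0):
    """Random graph with denser connections inside groups than between them.

    Vertex v belongs to group v // group_size. Each pair of vertices is joined
    with probability p_in (same group) or p_out (different groups).
    Returns a dict: vertex -> set of neighbours.
    """
    rng = random.Random(seed)           # fixed seed for repeatable graphs
    n = n_groups * group_size
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            same_group = (u // group_size) == (v // group_size)
            if rng.random() < (p_in if same_group else p_out):
                adj[u].add(v)
                adj[v].add(u)
    return adj


graph = planted_partition()
print(len(graph))  # 128 vertices
```

Raising `p_out` towards `p_in` blurs the group structure, which gives a simple way to test how robust an identification algorithm is to noisy community boundaries.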


Figure 3: Synthetic graph with 128 nodes and 4 communities (each consisting of 32 nodes) generated using Girvan-Newman algorithm. The four communities were identified by our algorithm from the corresponding seed vertices and are differentiated by different node shapes in the figure.


Figure 4: American College football network grouped by their conferences. Ten conferences were identified distinctly as communities as shown (remaining nodes are shown by grey circles)

VII. Conclusion

Detecting clusters in large networks has a wide range of applications, including webpage ranking based on hyperlink analysis and data mining in social networks. Community identification algorithms are significant in the context of complex networks because they require only local information in large graphs. In this paper, we have addressed two main problems related to identifying communities: (i) community definition and (ii) a community identification algorithm. A generally accepted mathematical definition of a community has yet to appear in the literature. The definition we have proposed expresses a locally-dense subgraph more effectively than the existing definitions. This is mainly because we take into account the relative density of an induced subgraph in our definition. The greedy algorithm discussed in this paper identifies communities more efficiently than the existing techniques, mainly because of the underlying definition. Outlier nodes are clearly identified and excluded from the final community. This is a significant improvement over the existing techniques. Given a large web-like graph, the local community detection can be run from different input nodes to achieve global community discovery.

Acknowledgements

We would like to thank Mark Newman for the data on the American College Football network and Santo Fortunato for the code to generate the synthetic GN network.

List of Symbols

Symbol      Description
            Boundary vertices of the community
            Total emerging degree (out-degree)
            Nuclei vertices of the community
            Degree of the node u
            Average degree of the subgraph
            External density of an induced subgraph
            Internal density of an induced subgraph
            Relative density of an induced subgraph
B           Boundary adjacency matrix
d           Diameter of a graph
f           Product of densities
G (V, E)    Graph with vertex set V and edge set E
k           Number of nodes in the community
Kn          Complete graph of n vertices
Q           Local modularity
S           Subset of vertices in a graph
u, v        Vertices of a graph
δ           Connection density ratio
            Community in a graph G with vertex set VC and edge set EC
            Edge between vertices i and j
            External edges of an induced subgraph
            Subgraph of a graph G with vertex set V' and edge set E'


References

[1] "Graphviz - Graph Visualization Software," http://www.graphviz.org/.
[2] R. D. Alba, “A graph-theoretic definition of a sociometric clique,” The Journal of Mathematical Sociology, vol. 3, no. 1, pp. 113-126, 1973.
[3] R. Albert, H. Jeong, and A. Barabasi, “Diameter of the World-Wide Web,” Nature, vol. 401, no. 6749, pp. 130-131, 1999.
[4] J. P. Bagrow, “Evaluating local community methods in networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 05, pp. P05001, 2008.
[5] J. P. Bagrow, and E. M. Bollt, “Local method for detecting communities,” Phys. Rev. E, vol. 72, no. 4, pp. 046108, 2005.
[6] S. Boccaletti, V. Latora, Y. Moreno et al., “Complex networks: Structure and dynamics,” Physics Reports, vol. 424, no. 4-5, pp. 175-308, 2006.
[7] A. Cami, and N. Deo, “Techniques for analyzing dynamic random graph models of web-like networks: An overview,” Networks, vol. 51, no. 4, pp. 211-255, 2007.
[8] J. Chen, O. Zaiane, and R. Goebel, “Local community identification in social networks,” Social Network Analysis and Mining, IEEE Computer Society, pp. 237-242, 2009.
[9] F. Chung, L. Lu, T. G. Dewey et al., “Duplication models for biological networks,” Journal of Computational Biology, vol. 10, no. 5, pp. 677-687, 2003.
[10] A. Clauset, “Finding local community structures in networks,” Phys. Rev. E, vol. 72, no. 026132, 2005.
[11] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Phys. Rev. E, vol. 70, pp. 066111, 2004.
[12] N. Deo, Graph Theory with Applications to Engineering and Computer Science, Upper Saddle River, NJ, USA: Prentice-Hall, Inc, 1974.
[13] N. Deo, and A. Cami, “Preferential deletion in dynamic models of web-like networks,” Info. Processing Lett., vol. 102, no. 4, pp. 156-162, 2007.
[14] P. Erdős, and A. Rényi, “On random graphs,” Publ. Math. Debrecen, vol. 6, pp. 290-297, 1959.
[15] G. W. Flake, S. Lawrence, and C. L. Giles, “Efficient identification of Web communities,” in Proc. of 6th ACM SIGKDD Intl. conf. on Knowledge discovery and data mining, Boston, Massachusetts, United States, 2000, pp. 150-160.
[16] G. W. Flake, S. Lawrence, C. Lee Giles et al., “Self-organization and identification of Web communities,” Computer, vol. 35, no. 3, pp. 66-71, 2002.
[17] A. D. Flaxman, A. M. Frieze, and J. Vera, “A geometric preferential attachment model of networks,” Internet Math, vol. 3, no. 2, pp. 187-205, 2006.
[18] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75-174, Feb 2010.
[19] M. R. Garey, and D. S. Johnson, Computers and Intractability: A guide to the theory of NP-Completeness: Freeman, New York, USA, 1979.
[20] M. Girvan, and M. E. J. Newman, “Community structure in social and biological networks,” Proc. of Natl. Acad. Sci. USA, vol. 99, no. 12, pp. 7821-7826, June, 2002.
[21] Y. Hu, H. Chen, P. Zhang et al., “Comparative definition of community and corresponding identifying algorithm,” Phys. Rev. E, vol. 78, no. 2, pp. 026121, 2008.
[22] P. Kristiansen, S. M. Hedetniemi, and S. T. Hedetniemi, “Alliances in Graphs,” J. of Combinatorial Math. and Combinatorial Comp., vol. 48, pp. 157-178, 2004.
[23] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Phys. Rev. E, vol. 78, pp. 046110, 2008.
[24] R. Luce, and A. Perry, “A method of matrix analysis of group structure,” Psychometrika, vol. 14, no. 2, pp. 95-116, 1949.
[25] F. Luo, J. Z. Wang, and E. Promislow, "Exploring local community structures in large networks." pp. 233-239.
[26] F. Luo, J. Z. Wang, and E. Promislow, “Exploring local community structures in large networks,” Web Intelligence & Agent Systems, vol. 6, no. 4, pp. 387-400, 2008.
[27] F. Luo, Y. Yang, C.-F. Chen et al., “Modular organization of protein interaction networks,” Bioinformatics, vol. 23, no. 2, pp. 207-214, January 15, 2007.
[28] R. J. Mokken, “Cliques, clubs and clans,” Quality & Quantity, vol. 13, no. 2, pp. 161-173, 1979.
[29] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167-256, 2003.
[30] M. E. J. Newman, “Detecting community structure in networks,” Eur. Phys. J. B, vol. 38, pp. 321-330, 2004.
[31] M. E. J. Newman, and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 026113, 2004.
[32] J.-P. Onnela, D. Fenn, S. Reid et al., “A Taxonomy of Networks,” 2010.
[33] S. Papadopoulos, A. Skusa, A. Vakali et al., "Bridge Bounding: A Local Approach for Efficient Community Discovery in Complex Networks," 2009.
[34] P. Pollner, G. Palla, and T. Vicsek, “Preferential attachment of communities: The same principle, but a higher level,” Eur. Phy. Lett., vol. 73, no. 3, pp. 478, 2006.
[35] M. A. Porter, J. Onnela, and P. J. Mucha, “Communities in networks,” Notices of the AMS, vol. 56, no. 9, pp. 1082-1097, 1164-1166, Oct 2009.
[36] A. A. Rad, A. Khadivi, and M. Hasler, “Information Processing in Complex Networks,” Circuits and Systems Magazine, IEEE, vol. 10, no. 3, pp. 26-37, 2010.
[37] F. Radicchi, C. Castellano, F. Cecconi et al., “Defining and identifying communities in networks,” Proc. of Natl. Acad. Sci. USA, vol. 101, pp. 2658-2663, 2004.
[38] A. W. Rives, and T. Galitski, “Modular organization of cellular networks,” Proc. of Natl. Acad. Sci. USA, vol. 100, no. 3, pp. 1128-1133, February 4, 2003.
[39] J. M. Scanlon, and N. Deo, “Network Communities Based on Maximizing Average Degree,” Congressus Numerantium, vol. 190, pp. 183-192, 2008.
[40] S. E. Schaeffer, “Stochastic local clustering for massive graphs,” Proc. of 9th Pacific-Asia Conf. on Knowledge Discovery and Data Mining, LNCS, vol. 3518, pp. 354-360, 2005.
[41] S. E. Schaeffer, “Graph Clustering,” Computer Science Review, vol. 1, no. 1, pp. 27-64, August 2007.
[42] B. Shneiderman, “Network Visualization by Semantic Substrates,” IEEE Transactions on Visualization and Computer Graphics, vol. 12, pp. 733-740, 2006.
[43] Y. Song, Z. Zhuang, H. Li et al., “Real-time automatic tag recommendation,” in Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, Singapore, Singapore, 2008.
[44] S. H. Strogatz, “Exploring complex networks,” Nature, vol. 410, no. 6825, pp. 268-276, 2001.
[45] M. Vasudevan, H. Balakrishnan, and N. Deo, “Community discovery algorithms: An overview,” Congressus Numerantium, vol. 196, pp. 127-142, 2009.
[46] A. Vázquez, “Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations,” Phys. Rev. E, vol. 67, no. 5, pp. 056104, 2003.
[47] D. J. Watts, and S. H. Strogatz, “Collective dynamics of small-world networks,” Nature, vol. 393, no. 6684, pp. 440-442, 1998.
