2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Semantically Meaningful Group Detection within Sub-Communities of Twitter Blogosphere: A Topic Oriented Multi-Objective Clustering Approach

Dionisios N. Sotiropoulos

Chris D. Kounavis

George M. Giaglis

Department of Management Science and Technology Athens University of Economics and Business Evelpidon 47a & Lefkados St. 11361, Athens, Greece Email: [email protected]

Department of Management Science and Technology Athens University of Economics and Business Evelpidon 47a & Lefkados St. 11361, Athens, Greece Email: [email protected]

Department of Management Science and Technology Athens University of Economics and Business Evelpidon 47a & Lefkados St. 11361, Athens, Greece Email: [email protected]

Abstract—This paper addresses the problem of semantically meaningful group detection within a sub-community of Twitter micro-bloggers by utilizing a topic modeling, multi-objective clustering approach. The proposed group detection method is anchored on the Latent Dirichlet Allocation (LDA) topic modeling technique, aiming at identifying clusters of Twitter users that are optimal in terms of both spatial and topical compactness. Specifically, the group detection problem is formulated as a multi-objective optimization problem taking into consideration two complementary cluster formation directives. The first objective, related to spatial compactness, is achieved by minimizing the overall deviation from the corresponding cluster centers. The second, related to topical compactness, is achieved by minimizing the portion of probability mass assigned to low probability topics for the corresponding cluster centroids. In our approach, optimization is performed by employing a multi-objective genetic algorithm, which results in a variety of cluster structures that are significantly more interpretable than cluster assignments obtained with traditional single-objective clustering algorithms.

I. INTRODUCTION

The recent outburst of online social networks (such as Twitter, Facebook, Flickr, LiveJournal, MySpace, Digg, YouTube and the DBLP collaboration network) has led to the emergence of a significantly promising research area originating from the traditional field of Social Network Analysis (SNA) [1], [2]. In this context, the detection of inherent community structures [3] or clusters within online social networks is an essential task, since it can reveal interesting properties shared amongst community members. Community discovery modules can be incorporated by a wide spectrum of different applications [4], ranging from collaborative recommendation [5] and information spreading [6] to knowledge sharing [7]. A common application of community detection within the context of the World Wide Web is that related to proxy caches [8], where the grouping of web clients according to similar interests and nearby geographic location may enable them to be served by a dedicated proxy server. Examples originating from the same domain can be found in [9], [10], where the authors aim at discovering communities within the hyper-linked structure of the web towards the detection of link farms. The same logic may be encountered in the e-commerce domain, where grouping together customers that exhibit interrelated buying profiles enables the development of more personalized recommendation engines [11]. In a completely different setting, community detection in mobile ad-hoc networks enables efficient message routing and posting [12] by distinguishing members within the core of the community from members on the border, corresponding to edge routers.

ASONAM'13, August 25-29, 2013, Niagara, Ontario, CAN. Copyright 2013 ACM 978-1-4503-2240-9/13/08 ...$15.00

It is evident that detection of communities may provide remarkable insight towards a deeper understanding of the inherent social network structure by identifying individuals of similar behavior such as political biases, voting patterns, viewpoints, preferences, motivations or interests. Therefore, it is of critical importance to be in a position to identify highly interpretable and semantically meaningful communities that simultaneously maximize intra-cluster coherence.

Contemporary studies on community detection mainly focus on learning the underlying topological structure [13], [14], [15], [16] for the purposes of community prediction and clustering. The obvious drawback of such approaches lies in the dynamic nature of social media, which gives users the freedom to arbitrarily join and leave communities, resulting in link structures that are not stable over time. Moreover, there exist many approaches for which ties have to be implicitly inferred, giving rise to groupings that are not consistent with the actual underlying linkage. The most significant deficiency of the aforementioned approaches is that communities are formed by maximizing the intra-cluster spatial coherency, neglecting the semantic content of the identified groups. Therefore, such approaches confuse the meaning of the extracted communities, since their underlying clustering algorithms do not take into consideration the semantic homogeneity of the groups that are to be formed.

An alternative research avenue, particularly applicable within the context of online social networks, emphasizes the content of the social objects [17], [18]. Communities detected by utilizing methodologies of this kind tend to concentrate on a single topic but do not conform to the requirement of spatial coherency. Therefore, groups are formed by merging
weakly connected individuals without reflecting the strength of the social underpinning.

This paper presents a unifying framework that embraces both spatial and semantic coherency directives. Specifically, we address the problem of semantically meaningful group detection within a sub-community of Twitter micro-bloggers by utilizing a topic modeling, multi-objective clustering approach. The proposed group detection method is anchored on the Latent Dirichlet Allocation (LDA) topic modeling technique, aiming at identifying clusters of Twitter users that are optimal in terms of both spatial and topical compactness. Specifically, the group detection problem is formulated as a multi-objective optimization problem taking into consideration two complementary cluster formation directives. The first objective, related to spatial compactness, is achieved by minimizing the overall deviation from the corresponding cluster centers. The second, related to topical compactness, is achieved by minimizing the portion of probability mass assigned to low probability topics for the corresponding cluster centroids. In our approach, optimization is performed by employing a multi-objective genetic algorithm, which results in a variety of cluster structures that are significantly more interpretable than cluster assignments obtained with traditional single-objective clustering algorithms.

II. PRELIMINARIES

A. Data Description

We collected and analyzed a set of over 4000 tweets during the period between 25/10/2012 and 8/11/2012 by utilizing the Streaming API of Twitter. The data collection process was focused on gathering tweets that explicitly referred to the arrest of the Greek editor and journalist Kostas Vaxevanis. Mr. Vaxevanis was arrested and later exonerated for publishing a list containing the names of Greeks who held Swiss bank accounts and were suspected of tax evasion. The list is commonly referred to in the press as the “Lagarde list”, after Mrs Christine Lagarde, Head of the IMF and former finance minister of France, who allegedly first handed the list to the then Greek finance minister. The arrest of the journalist received significant international attention and stimulated people of diverse backgrounds, nationalities, and age ranges to post about it on Twitter.

The data collection process was accomplished by filtering the Streaming API of Twitter on the most popular hashtags associated with the case, such as “Vaxevanis”, “FreeV”, “FreeVaxevanis”, “KostasVaxevanis”, “Opgreece”, and “LagardeList”. Data preparation involved the elimination of all non-English tweets and the construction of our corpus as a collection of distinct author documents, where each document contained the text from a single tweet. The final version of our corpus was formed after applying a series of tokenization, stop-word removal, and stemming operations on the original tweet text. Moreover, we deleted all words whose length was less than 2 characters. Finally, each document was represented as a 10-dimensional vector in a topic probability space through the utilization of the LDA topic modeling technique, which is described in Section II-B.

B. Probabilistic Topic Modeling

Probabilistic topic modeling approaches [19] share the fundamental assumption that documents within a corpus can be formulated as mixtures of topics, where each topic is modeled as a probability distribution over words. A topic model may be interpreted as a generative model for documents, since it specifies a simple probabilistic procedure according to which new documents emerge. In this context, our dataset is a collection D of n documents such that D = {d_1, d_2, ..., d_n}, where each document d ∈ D is a collection of words. The LDA topic modeling technique was applied on the corpus of n = 4090 purified tweets, setting the number of topics to be extracted to T = 10, which resulted in the contents of Table I.

Besides unraveling the latent topic structure, LDA lays the foundations for the corpus vectorization process, according to which each document d ∈ D in the corpus can be treated as a point in a T-dimensional probability vector space P. Formally, LDA defines a mapping φ : D → P, such that each document d ∈ D is mapped to a point p = φ(d) on the (T − 1)-dimensional simplex, such that p = [p^(1), p^(2), ..., p^(T)] and

\[ \sum_{t=1}^{T} p^{(t)} = 1 \tag{1} \]

which provides an alternative representation for our corpus, so that

\[ \tilde{D} = \{\varphi(d) \in P : d \in D\} = \{p_1, p_2, \ldots, p_n\} \tag{2} \]
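The preparation and vectorization steps of Section II can be illustrated with a minimal sketch. Several choices below are assumptions rather than details taken from the paper: the toy tweets and stop-word list are made up, stemming is omitted, and LDA inference is carried out with a plain collapsed Gibbs sampler whose hyperparameters (`alpha`, `beta`, `iters`) are arbitrary. The paper does not state which LDA implementation or settings were used.

```python
import random
import re

# Hypothetical, deliberately tiny stop-word list (not the one used in the paper).
STOP_WORDS = {"the", "of", "and", "in", "to", "for", "is", "over", "on"}

def preprocess(text):
    """Lowercase, tokenize, drop stop words and tokens shorter than 2 characters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [w for w in tokens if len(w) >= 2 and w not in STOP_WORDS]

def lda_vectorize(docs, T, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Map each tokenized document to a point on the (T-1)-simplex (Eqs. 1-2).

    Assumption: collapsed Gibbs sampling with symmetric priors alpha and beta.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    n_dt = [[0] * T for _ in docs]       # tokens of document d assigned to topic t
    n_tw = [[0] * V for _ in range(T)]   # occurrences of word w assigned to topic t
    n_t = [0] * T                        # tokens assigned to topic t overall
    z = [[rng.randrange(T) for _ in d] for d in docs]  # random initial assignment
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            n_dt[d][t] += 1
            n_tw[t][vocab[w]] += 1
            n_t[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = vocab[w], z[d][i]
                # Remove the token, then resample its topic from the conditional.
                n_dt[d][t] -= 1
                n_tw[t][v] -= 1
                n_t[t] -= 1
                weights = [(n_dt[d][k] + alpha) * (n_tw[k][v] + beta)
                           / (n_t[k] + V * beta) for k in range(T)]
                t = rng.choices(range(T), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][v] += 1
                n_t[t] += 1
    # Smoothed document-topic estimate; every row sums to 1, as Eq. 1 requires.
    return [[(n_dt[d][k] + alpha) / (len(doc) + T * alpha) for k in range(T)]
            for d, doc in enumerate(docs)]

# Made-up example tweets, for illustration only.
tweets = [
    "Greek journalist arrested over the Lagarde list",
    "Police arrest the editor for publishing the list",
    "Petition for freedom of the press in Greece",
]
docs = [preprocess(t) for t in tweets]
points = lda_vectorize(docs, T=3)
```

Each entry of `points` plays the role of a vector p = φ(d) in Eq. 2. For a corpus of realistic size, an established LDA implementation would be the more practical choice than this didactic sampler.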

TABLE I. TOPICS DETECTED IN OUR EXPERIMENTAL DATA

LDA-BASED TOPICS
TOPIC 1:   tax, journalist, greek, vaxevanis, freedom, published, list, press, exposing
TOPIC 2:   vaxevanis, greek, tax, kostas, journalist, editorial, vindictive, nyt, down
TOPIC 3:   vaxevanis, greece, says, corrupt, governed, editor, list, lagarde, clique
TOPIC 4:   arrest, vaxevanis, police, warrant, journalist, lagard, issues, reporter, list
TOPIC 5:   vaxevanis, greek, list, bank, acquitted, kostas, news, editor, swiss
TOPIC 6:   vaxevanis, journalist, greek, list, acquitted, breach, privacy, editor, bank
TOPIC 7:   vaxevanis, greece, now, democracy, elite, birth, gave, powerful, via
TOPIC 8:   greek, journalist, vaxevanis, reports, cnn, cleared, bank, charges, list
TOPIC 9:   ministry, finance, greek, leaked, petition, hacks, secret, sign, documents
TOPIC 10:  vaxevanis, arrested, got, kostas, greece, like, controversial, list, leaked

III. METHODOLOGY

A. Problem Formulation

Informally, the community detection problem involves the division of a given dataset into homogeneous subgroups, such that data items within one cluster are similar to each other, while those within different clusters are dissimilar. Cluster homogeneity is usually expressed through spatial compactness measures, which aim at keeping intra-cluster variation small. These approaches, however, emphasize the underlying linkage structure of the dataset, neglecting the semantic interpretation of the clusters that are to be formed. On the other hand, there exist alternative methodologies which focus on the semantic coherence of the identified clusters, ignoring the connectedness requirement of clustering, which demands that neighboring data items be assigned to the same subgroup.

Our approach unifies the spatial and topical compactness directives of clustering within the general framework of multi-objective optimization. Given the vectorized version of the corpus, according to Eq. 2, the very essence of our approach lies in finding a K-partition C = {C_1, C_2, ..., C_K} of D̃ which simultaneously minimizes the optimization criteria J_spatial and J_topical, given by the following functions:

\[ J_{spatial}(C, \delta(\cdot,\cdot)) = \sum_{C_k \in C} \sum_{p \in C_k} \delta(p, q_k) \tag{3} \]

\[ J_{topical}(C, \zeta(\cdot)) = \sum_{C_k \in C} \sum_{p \in C_k} \zeta(p) \tag{4} \]

such that

\[ \bigcup_{k \in [K]} C_k = \tilde{D} \tag{5} \]

and

\[ C_k \cap C_l = \emptyset, \quad \forall k \neq l \in [K] \tag{6} \]

where q_k, k ∈ [K], represent the corresponding cluster centroids. The distance function δ(·, ·) quantifies the notion of spatial adjacency, which can be realized by utilizing the generalized Euclidean distance in a multi-dimensional vector space according to the following equation:

\[ \delta(p, q) = \lVert p - q \rVert \tag{7} \]

Therefore, by minimizing Eq. 7 we enforce the spatial compactness objective. Function ζ(·), on the contrary, may be interpreted as a topic focus measure that evaluates the degree to which a given topic probability distribution is spread over a wider range of topics. Entropy is an ideal candidate for realizing such a measure, given by the following equation:

\[ \zeta(p) = -\sum_{t=1}^{T} p^{(t)} \log_2 p^{(t)} \tag{8} \]

The ζ(·) function is in practice a probability distance measure which quantifies the deviation of a given probability distribution from the corresponding uniform distribution. In the context of our work, the possible values of ζ(p) for a given T-dimensional topic probability distribution vector p range within the [0, log2 T] interval. The minimum entropy value corresponds to a probability distribution vector which concentrates the entire probability mass on a single topic, while the maximum entropy value emerges from a probability distribution vector that assigns equal probability to each topic. Therefore, by minimizing Eq. 8 we impose the topic focusing objective. Otherwise stated, the utilization of an entropy-based measure favors the formation of clusters for which the resulting centroids, {q_k, k ∈ [K]}, correspond to uneven topic probability distributions for which the majority of the probability mass is distributed over a restricted number of topics. More importantly, it is not required to set the number of focusing topics in advance.

The general (unconstrained) multi-objective optimization problem (MOP) addressed in this paper can be defined as:

\[ \min_{x} \; z = f(x) = [f_{spatial}(x), f_{topical}(x)] \tag{9a} \]

\[ \text{with } x = [x_1, x_2, \ldots, x_n] \in X = \{1, \ldots, K\}^n \tag{9b} \]

where x is an n-dimensional decision vector or solution, and X is the decision space, that is, the set of all expressible solutions.
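The two criteria can be evaluated for a candidate label vector, one cluster index in {1, ..., K} per data point, with a few lines of code. The sketch below is an illustration under stated assumptions, not the authors' implementation. In particular, Eq. 4 as printed sums ζ(p) over cluster members, a quantity that is the same for every partition. The sketch instead follows the surrounding prose, which speaks of the entropy of the cluster centroids, and evaluates ζ at each centroid q_k. Taking q_k as the arithmetic mean of the cluster members is likewise an assumption, and the sample points are made up.

```python
import math

def zeta(p):
    """Entropy of a topic distribution in bits (Eq. 8); 0 * log 0 is taken as 0."""
    return -sum(v * math.log2(v) for v in p if v > 0)

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def objectives(points, x, K):
    """Return (f_spatial, f_topical) for a label vector x with entries in 1..K."""
    clusters = [[p for p, lab in zip(points, x) if lab == k]
                for k in range(1, K + 1)]
    f_spatial = 0.0
    f_topical = 0.0
    for members in clusters:
        if not members:       # empty clusters contribute nothing in this sketch
            continue
        q = centroid(members)
        # Eq. 3 with the Euclidean distance of Eq. 7.
        f_spatial += sum(math.dist(p, q) for p in members)
        # Assumption: topic focus is measured at the centroid, not at each member.
        f_topical += zeta(q)
    return f_spatial, f_topical

# Made-up points on the 2-simplex (T = 3), for illustration only.
points = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.05, 0.9, 0.05]]
focused = objectives(points, [1, 1, 2, 2], K=2)   # groups similar distributions
mixed = objectives(points, [1, 2, 1, 2], K=2)     # mixes dissimilar distributions
```

In this toy case the `focused` assignment is lower than `mixed` on both criteria. On real data the two criteria generally pull in different directions, which is why a single minimizer usually does not exist.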


The objective function f(x) maps X into R², so that the sub-functions f_spatial(·) and f_topical(·) directly correspond to the optimization criteria given by Eqs. 3 and 4. The vector z = f(x) is an objective vector or point. The image of X in objective space is the set of all attainable points, Z. This minimization problem, however, cannot be tackled through straightforward optimization techniques because, in general, there does not exist a single solution that is minimal on both objectives. Instead, there is a partial ordering of points in objective space:

\[ \forall y, z \in Z, \quad y \leq z \Leftrightarrow \forall i \in [2],\; y_i \leq z_i \;\wedge\; \exists j \in [2],\; y_j < z_j \tag{10} \]

In the above, y is said to dominate z. Thus, there is usually a set of optimal solutions X* ⊂ X, known as the Pareto optimal set:

\[ X^{*} = \{x^{*} \in X \mid \nexists\, x \in X,\; f(x) \leq f(x^{*})\} \tag{11} \]

The points in the objective space corresponding to the Pareto optima are termed non-dominated and, when plotted, form the Pareto front.

B. Genetic Algorithm-based Multi-Objective Clustering

The immense number of possible clustering solutions that can be encoded by the decision space X, together with the partial ordering imposed on them by a pair of complementary optimization objectives, leads us to embrace the evolutionary framework of Pareto optimization. Specifically, we employ a multi-objective evolutionary algorithm (MOEA) [20] in order to obtain a set of trade-off solutions representing group organizations of the initial dataset that are Pareto optimal with respect to both spatial and topical compactness.
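The dominance relation of Eq. 10 and the extraction of non-dominated points can be sketched in a few lines. This is only the filtering step that any Pareto-based method relies on; the evolutionary search of the MOEA itself is not shown, and the objective vectors below are made-up numbers rather than results from the paper.

```python
def dominates(y, z):
    """True if y dominates z under minimization (Eq. 10)."""
    no_worse = all(a <= b for a, b in zip(y, z))
    strictly_better = any(a < b for a, b in zip(y, z))
    return no_worse and strictly_better

def pareto_front(points):
    """Return the non-dominated objective vectors, in input order."""
    return [z for z in points if not any(dominates(y, z) for y in points)]

# Hypothetical (f_spatial, f_topical) pairs for five candidate clusterings.
candidates = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (4.0, 5.0)]
front = pareto_front(candidates)
```

Here `(3.0, 4.0)` is dominated by `(2.0, 3.0)` and `(4.0, 5.0)` is dominated by `(1.0, 5.0)`, so `front` keeps the first three candidates. The quadratic scan is adequate for a sketch; evolutionary algorithms typically use more efficient non-dominated sorting.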

IV. EXPERIMENTAL RESULTS

In order to assess the clustering efficiency of our approach, in terms of both spatial and semantic coherency, we conducted a series of experiments by varying the number of clusters that were to be identified. The first scenario involved clustering the vectorized version of our corpus into 3 groups, while the second and third experimentation sessions attempted to organize the set of 4090 tweets into 5 and 10 clusters respectively. The multi-objective optimization results for each of the experimentation scenarios are presented in Figs. 1, 2 and 3. Attention should be focused on the obtained Pareto fronts, which represent the Pareto optimal solutions with respect to both optimization objectives.

We are particularly interested in investigating whether the final cluster assignments conform with the explicit optimization objectives adopted by our approach. Firstly, this entails verifying the degree to which data points within a cluster are distributed in a confined region around the corresponding centroid, providing evidence of the spatial compactness of the identified groups. Secondly, it is required to verify whether the identified cluster centroids concentrate their probability mass on a restricted number of topics, thus giving rise to highly interpretable groups. The obtained Pareto fronts for each clustering scenario indicate that both requirements were sufficiently met.

Fig. 1. Multi-objective GA Optimization Pareto Front for 3 Clusters. [Plot not reproduced. Horizontal axis: Topic Deviation Objective, ticks from 1523.5 to 1528. Vertical axis: Topic Focus Objective, ticks from 6.705 to 6.72.]

Fig. 2. Multi-objective GA Optimization Pareto Front for 5 Clusters. [Plot not reproduced. Horizontal axis: Topic Deviation Objective, ticks from 1411 to 1411.8. Vertical axis: Topic Focus Objective, ticks from 10.685 to 10.715.]

Fig. 3. Multi-objective GA Optimization Pareto Front for 10 Clusters. [Plot not reproduced. Horizontal axis: Topic Deviation Objective, ticks from 961.1 to 961.9. Vertical axis: Topic Focus Objective, ticks from 20.4574 to 20.4592.]

The most important findings of our approach relate to the semantic interpretation of the identified clusters. Tables II, III and IV present the topic probability distribution vectors corresponding to each cluster centroid for the 3 different experimentation scenarios. According to Table II, the first and the third clusters are formed by groups of Twitter users focusing on the 5th and the 6th topic respectively. The second cluster, however, fails to focus on a particular topic. Table III, on the other hand, shows that the first and third clusters focus on the 5th topic, whereas the fourth cluster focuses on the 6th topic. Moreover, the second cluster does not focus on a particular topic, while the members of the fifth cluster waver between the 2nd and the 10th topic. By applying the same kind of analysis to Table IV, it is easy to deduce that clusters {1, 2, ..., 10} focus on the topics {5, 4, 5, 3, 10, 1, 6, 2, 8, 5}.

TABLE II. TOPICAL CENTERS FOR 3 CLUSTERS

CENTROIDS FOR 3 CLUSTERS
TOPICS     C1      C2      C3
Topic1     0.027   0.096   0.033
Topic2     0.029   0.129   0.031
Topic3     0.028   0.111   0.034
Topic4     0.026   0.129   0.041
Topic5     0.753   0.073   0.036
Topic6     0.026   0.045   0.692
Topic7     0.027   0.065   0.029
Topic8     0.027   0.111   0.033
Topic9     0.026   0.121   0.030
Topic10    0.026   0.114   0.035

TABLE III. TOPICAL CENTERS FOR 5 CLUSTERS

CENTROIDS FOR 5 CLUSTERS
TOPICS     C1      C2      C3      C4      C5
Topic1     0.029   0.123   0.023   0.029   0.033
Topic2     0.033   0.036   0.023   0.030   0.356
Topic3     0.031   0.139   0.024   0.036   0.043
Topic4     0.030   0.170   0.023   0.041   0.032
Topic5     0.727   0.059   0.788   0.035   0.094
Topic6     0.029   0.049   0.023   0.696   0.042
Topic7     0.029   0.077   0.023   0.030   0.036
Topic8     0.030   0.142   0.023   0.032   0.036
Topic9     0.028   0.158   0.023   0.030   0.035
Topic10    0.029   0.041   0.023   0.035   0.289

TABLE IV. TOPICAL CENTERS FOR 10 CLUSTERS

CENTROIDS FOR 10 CLUSTERS
TOPICS     C1      C2      C3      C4      C5      C6      C7      C8      C9      C10
Topic1     0.026   0.031   0.025   0.032   0.038   0.455   0.029   0.027   0.033   0.028
Topic2     0.029   0.032   0.027   0.035   0.072   0.045   0.029   0.715   0.042   0.030
Topic3     0.027   0.036   0.026   0.546   0.050   0.042   0.031   0.033   0.047   0.030
Topic4     0.026   0.649   0.025   0.043   0.066   0.066   0.034   0.028   0.043   0.027
Topic5     0.760   0.042   0.765   0.048   0.112   0.068   0.031   0.048   0.073   0.745
Topic6     0.025   0.035   0.026   0.040   0.061   0.131   0.726   0.030   0.051   0.028
Topic7     0.026   0.031   0.025   0.031   0.115   0.045   0.028   0.026   0.033   0.027
Topic8     0.026   0.031   0.025   0.039   0.042   0.057   0.029   0.028   0.598   0.027
Topic9     0.025   0.075   0.025   0.142   0.207   0.038   0.029   0.028   0.033   0.027
Topic10    0.025   0.033   0.026   0.037   0.232   0.048   0.030   0.032   0.041   0.027

V. CONCLUSIONS & FUTURE WORK

This paper unifies the complementary spatial and topical compactness directives of clustering within the general framework of multi-objective optimization. The first objective, related to spatial compactness, is achieved by minimizing the overall deviation from the corresponding cluster centers. The second, related to topical compactness, is achieved by minimizing the portion of probability mass assigned to low probability topics for the corresponding cluster centroids. The proposed group detection method is anchored on the Latent Dirichlet Allocation (LDA) topic modeling technique, aiming at identifying semantically meaningful sub-communities of Twitter users that simultaneously maximize intra-cluster coherence. Our research provides significant insight towards a deeper understanding of the inherent social network structure, determining the special characteristics of social behavior that are shared among individuals within a group.

Our findings indicate that the final cluster assignments obtained are in accordance with the explicit optimization objectives adopted by our formulation. Firstly, we verify that data points in each cluster lie within a confined region around the corresponding centroid, providing evidence of the spatial compactness of the identified groups. Secondly, we corroborate that the identified cluster centroids concentrate their probability mass on a restricted number of topics, thus giving rise to highly interpretable groups. Future research will focus on incorporating the temporal parameter within the clustering process in order to develop a group monitoring mechanism.

ACKNOWLEDGMENT

This research has been co-financed by the European Union (European Social Fund ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) Research Funding Program: Aristeia, SocioMine.

REFERENCES

[1] J. Scott, Social Network Analysis: A Handbook. SAGE Publications, January 2000.
[2] P. R. Monge and N. S. Contractor, Theories of communication networks. Oxford University Press, USA, 2003.
[3] S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos, “Community detection in Social Media,” Data Mining and Knowledge Discovery, vol. 24, no. 3, pp. 515–554, May 2012. [Online]. Available: http://dx.doi.org/10.1007/s10618-011-0224-z
[4] L. A. Adamic and N. Glance, “The political blogosphere and the 2004 U.S. election: divided they blog,” in Proceedings of the 3rd international workshop on Link discovery, ser. LinkKDD ’05. New York, NY, USA: ACM, 2005, pp. 36–43. [Online]. Available: http://doi.acm.org/10.1145/1134271.1134277
[5] W. Yuan, D. Guan, Y.-K. Lee, S. Lee, and S. J. Hur, “Improved trust-aware recommender system using small-worldness of trust networks,” Know.-Based Syst., vol. 23, no. 3, pp. 232–238, Apr. 2010. [Online]. Available: http://dx.doi.org/10.1016/j.knosys.2009.12.004
[6] F. Wu, B. A. Huberman, L. A. Adamic, and J. R. Tyler, “Information flow in social groups,” Physica A: Statistical and Theoretical Physics, vol. 337, no. 1-2, pp. 327–335, June 2004. [Online]. Available: http://dx.doi.org/10.1016/j.physa.2004.01.030
[7] P. Liu, B. Raahemi, and M. Benyoucef, “Knowledge sharing in dynamic virtual enterprises: A socio-technological perspective,” Know.-Based Syst., vol. 24, no. 3, pp. 427–443, Apr. 2011. [Online]. Available: http://dx.doi.org/10.1016/j.knosys.2010.12.004
[8] J. Srivastava and R. Cooley, “Web usage mining: Discovery and applications of usage patterns from web data,” SIGKDD Explorations, vol. 1, pp. 12–23, 2000.
[9] G. Buehrer and K. Chellapilla, “A scalable pattern mining approach to web graph compression with communities,” in Proceedings of the 2008 International Conference on Web Search and Data Mining, ser. WSDM ’08. New York, NY, USA: ACM, 2008, pp. 95–106. [Online]. Available: http://doi.acm.org/10.1145/1341531.1341547
[10] D. Gibson, R. Kumar, and A. Tomkins, “Discovering large dense subgraphs in massive graphs,” in Proceedings of the 31st international conference on Very large data bases, ser. VLDB ’05. VLDB Endowment, 2005, pp. 721–732. [Online]. Available: http://dl.acm.org/citation.cfm?id=1083592.1083676
[11] P. K. Reddy, M. Kitsuregawa, P. Sreekanth, and S. S. Rao, “A graph based approach to extract a neighborhood customer community for collaborative filtering,” in Proceedings of the Second International Workshop on Databases in Networked Information Systems, ser. DNIS ’02. London, UK: Springer-Verlag, 2002, pp. 188–200. [Online]. Available: http://dl.acm.org/citation.cfm?id=645321.649772
[12] R. Krishnan and D. Starobinski, “Efficient clustering algorithms for self-organizing wireless sensor networks,” Ad Hoc Netw., vol. 4, no. 1, pp. 36–59, Jan. 2006. [Online]. Available: http://dx.doi.org/10.1016/j.adhoc.2004.04.002
[13] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan, “Group formation in large social networks: membership, growth, and evolution,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’06. New York, NY, USA: ACM, 2006, pp. 44–54. [Online]. Available: http://doi.acm.org/10.1145/1150402.1150412
[14] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821–7826, Jun. 2002. [Online]. Available: http://dx.doi.org/10.1073/pnas.122653799
[15] R. Kumar, J. Novak, and A. Tomkins, “Structure and evolution of online social networks,” in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ser. KDD ’06. New York, NY, USA: ACM, 2006, pp. 611–617. [Online]. Available: http://doi.acm.org/10.1145/1150402.1150476
[16] J. Abello, M. G. C. Resende, and S. Sudarsky, “Massive quasi-clique detection,” in Proceedings of the 5th Latin American Symposium on Theoretical Informatics, ser. LATIN ’02. London, UK: Springer-Verlag, 2002, pp. 598–612. [Online]. Available: http://dl.acm.org/citation.cfm?id=646389.690506
[17] Z. Zeng, J. Wang, L. Zhou, and G. Karypis, “Out-of-core coherent closed quasi-clique mining from large dense graph databases,” ACM Trans. Database Syst., vol. 32, no. 2, Jun. 2007. [Online]. Available: http://doi.acm.org/10.1145/1242524.1242530
[18] A. McCallum, X. Wang, and A. Corrada-Emmanuel, “Topic and role discovery in social networks with experiments on Enron and academic email,” J. Artif. Int. Res., vol. 30, no. 1, pp. 249–272, Oct. 2007. [Online]. Available: http://dl.acm.org/citation.cfm?id=1622637.1622644
[19] M. Steyvers and T. Griffiths, “Probabilistic topic models,” Handbook of latent semantic analysis, vol. 427, no. 7, pp. 424–440, 2007.
[20] J. Handl and J. D. Knowles, “An evolutionary approach to multiobjective clustering,” IEEE Trans. Evolutionary Computation, vol. 11, no. 1, pp. 56–76, 2007.
