JOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007


Community Detection Based on Lion Optimization Algorithm

Ramadan Babers, Neveen I. Ghali, Aboul Ella Hassanien and Naglaa M. Madbouly

Abstract—One of the important issues in Online Social Networks (OSNs) in recent years is community detection. Social networks depict the interactions between individuals or entities and are represented by a graph of interconnected nodes. The vast amount of data creates the need to analyze such networks. Community detection can be formulated as an optimization problem in which the objective is to divide the network into groups of nodes such that the connectivity between nodes in the same group is stronger than the connectivity with other nodes. In this research, Ant Lion Optimization (ALO) is used as an effective optimization method to detect the number of communities in a network automatically. The results show that ALO succeeds in finding an optimized community structure based on the quality function used.

Keywords—Network community detection, social networks, ant lion optimization.

I. INTRODUCTION

Community structure is a network characteristic describing the tendency of groups of nodes to form denser links among themselves than with the other nodes of the network [1]. Community structure is used in the analysis of networks for many applications, such as collaboration networks, which contain collections of individuals and other entities needed to achieve a specific target [2]. Areas affected by communities include sociology, bioinformatics, geography, information science and marketing. Visualizing social networks turns the data into meaningful information, which must be easy to read in order to help scientists see and understand their data. Girvan and Newman (GN) showed that community boundaries can be identified by counting, for each edge, the shortest paths between nodes that pass through that edge [3]. Modularity, introduced by Girvan and Newman, is a function that measures the quality of a partition of the network [13]. High modularity indicates a strong community structure with dense edges and interactions between the individuals of each community. To solve the community detection problem, many algorithms and techniques have been investigated, such as artificial fish swarm [4], artificial bee colony [5] and genetic algorithm [20].

The rest of this paper is organized as follows: Section II presents a literature review of the community detection problem and related works. Section III presents the ant lion optimization technique. Section IV presents the proposed algorithm. Section V presents the experimental results. Section VI gives the general conclusion and future work.

R. Babers, Faculty of Science, Helwan University, Egypt, e-mail: ramadanf [email protected]
Neveen I. Ghali, Faculty of Science, Al-Azhar University, Egypt.
A. E. Hassanien, Faculty of Computers and Information, Cairo University, Egypt. Scientific Research Group in Egypt, see http://www.egyptscience.net.
Naglaa M. Madbouly, Faculty of Science, Helwan University, Egypt.
Manuscript received April 19, 2005; revised January 11, 2007.

II. LITERATURE REVIEW

A. Social Networks: an Overview.

The simplest definition of a network is a set of vertices (nodes) linked by a set of edges. Vertices can represent anything, such as an individual, a product or a protein. An edge is a relation between two nodes; edges have different types, such as directed and undirected, and can carry properties such as an edge weight. Social networks are defined as information networks in which people can share common interests and interact with each other [6]. In recent years social networking sites have become an important part of internet users' lives. These sites aim at connecting people and creating groups, for example Twitter, MySpace and Facebook. In the last few years social networks have become more specific and have clear targets, for example Last.FM (music) and LinkedIn (business). Most social networks share the same properties: user profiles, likes and dislikes, contact lists and the possibility to upload images and videos. A social network consists of individuals that are connected by some type of interaction. Networks are defined formally using graph structures, $G = (V, E)$, where $V$ is the set of individuals and $E$ is the set of edges. The number of individuals is denoted by $n = |V|$ and the number of edges by $m = |E|$. Given a set of data points $\{x_1, x_2, \ldots, x_n\}$, the goal of clustering is to divide the data points into groups such that points in the same group are similar and points in different groups are dissimilar to each other. Each vertex $v_i$ in this graph represents a data point $x_i$.
Two vertices are connected if the similarity $s_{ij}$ between the points $x_i$ and $x_j$ is positive or larger than a certain threshold. A community is defined as a group of individuals that are more connected to each other than to individuals outside the group; the individuals inside a community share some common features that depend on the type of the network data. Community detection in complex networks has been widely studied in recent years in sociology, ecology, biology, protein-protein interaction and natural resources networks [7].
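The thresholded-similarity construction of the graph can be illustrated with a short Python sketch. The text does not specify a similarity function, so the Gaussian similarity, the `sigma` parameter and the function name `similarity_graph` are assumptions made only for this example.

```python
import math

def similarity_graph(points, threshold, sigma=1.0):
    """Build an undirected adjacency list: i and j are linked when s_ij > threshold."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            dist2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            # Gaussian similarity (an assumption; any similarity s_ij could be used)
            s_ij = math.exp(-dist2 / (2.0 * sigma ** 2))
            if s_ij > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

# Two tight pairs of points that are far from each other
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
graph = similarity_graph(points, threshold=0.5)
```

With this threshold, only the two nearby pairs end up connected, which is the kind of structure a community detection method is expected to recover.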


B. Measures for Social Networks Analysis.

There are several useful statistics and measures for analyzing social networks, such as Modularity, Association Index, Strength, Eigenvector Centrality, Reach, Affinity and Node Degree [8], [9].

Modularity is one measure of the structure of networks or graphs; it measures the strength of the division of a network into communities. It is defined as the difference between the fraction of edges that fall within communities and the expected value of the same quantity if the edges fell at random on the given communities [23], [24]. Let $c_i$ be the community to which vertex $i$ is assigned. Then the fraction of the edges in the graph that fall within communities, i.e., that connect vertices that both lie in the same community, is

\[ \frac{\sum_{ij} A_{ij}\,\delta(c_i, c_j)}{\sum_{ij} A_{ij}} = \frac{1}{2m}\sum_{ij} A_{ij}\,\delta(c_i, c_j) \tag{1} \]

where $A$ is the adjacency matrix of the network with $n$ vertices, such that

\[ A_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are connected,} \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

The $\delta$-function $\delta(u, v)$ is 1 if $u = v$ and 0 otherwise, and $m = \frac{1}{2}\sum_{ij} A_{ij}$ is the number of edges in the graph. If we preserve the degrees of the vertices in the network but otherwise connect vertices together at random, then the probability of an edge existing between vertices $i$ and $j$ is $k_i k_j / 2m$, where $k_i$ is the degree of vertex $i$. Thus the modularity $Q$, as defined above, is given by

\[ Q = \frac{1}{2m}\sum_{ij}\left[ A_{ij} - \frac{k_i k_j}{2m} \right]\delta(c_i, c_j) \tag{3} \]
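As a worked illustration of Equation (3), the following Python sketch evaluates $Q$ directly from an adjacency list. The function name `modularity`, the dictionary-based graph representation and the toy graph (two triangles joined by one edge) are assumptions chosen for the example, not part of the paper's implementation.

```python
def modularity(adj, community):
    """Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j), as in Eq. (3)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    two_m = sum(degree.values())          # sum of degrees equals 2m
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue                  # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - degree[i] * degree[j] / two_m
    return q / two_m

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the edge 2-3 (m = 7)
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in adj}

q_split = modularity(adj, two_groups)     # 12/14 - 98/196 = 5/14
q_single = modularity(adj, one_group)     # a single community gives Q = 0
```

For the split into the two triangles, the within-community term is $12/14$ and the expected term is $2 \cdot 49/196 = 1/2$, so $Q = 5/14 \approx 0.357$; placing all vertices in one community gives $Q = 0$.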


C. Previous Work.

Community detection in complex networks is an important topic that has been discussed by many researchers, who proposed different methods to solve it. The techniques used to analyse social networks are developing rapidly. F. Malliaros in [10] presented a comparative review of the methods for clustering directed networks, together with methods and metrics for evaluating graph-clustering results. The reviewed methods include the naive graph transformation approach, transformations maintaining directionality, extending clustering objective functions and methodologies to directed networks, and alternative approaches. N. Azizifard in [14] proposed an algorithm that finds communities in a way that increases the modularity factor; for this goal, random walks and a random local search agent were used. This algorithm gives better modularity than other algorithms such as those of M. Girvan and M. Newman [3], M. Newman [13] and J. Duch and A. Arenas [12]. G. Pan et al. in [11] developed an online community detection algorithm with linear time complexity for large complex networks; the algorithm optimizes expected modularity and runs in less time than the commonly used Louvain algorithm [15].

III. THE ANT LION OPTIMIZATION

The Ant Lion Optimizer (ALO), presented by S. Mirjalili, is a novel nature-inspired algorithm. The ALO algorithm simulates the hunting mechanism of antlions and has been used in many applications [16]. ALO is a metaheuristic algorithm, which is a class of stochastic optimization. Metaheuristic algorithms may generate a different solution to the problem in each run. The antlion is an insect that digs a cone-shaped pit in sand to catch its prey (ants). The size of the pit depends on the level of hunger and the shape of the moon: the diameter and depth of the pit increase as the antlion becomes hungrier and/or when the moon is full [16]. When the ALO algorithm is applied to the community detection problem, an antlion represents an individual in the network and ants are its food, which move over the search space. Ants move randomly when searching for food. In this research we use random walks to model the ants' movement. The positions of ants and antlions are saved in the following matrices:

\[ M_{Ant} = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & \cdots & A_{1,d} \\ A_{2,1} & A_{2,2} & \cdots & \cdots & A_{2,d} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ A_{n,1} & A_{n,2} & \cdots & \cdots & A_{n,d} \end{bmatrix} \tag{4} \]

where $M_{Ant}$ is the matrix for saving the positions of ants, $A_{i,j}$ is the value of the $j$-th dimension of the $i$-th ant, $n$ is the number of ants and $d$ is the number of variables.

\[ M_{Antlion} = \begin{bmatrix} AL_{1,1} & AL_{1,2} & \cdots & \cdots & AL_{1,d} \\ AL_{2,1} & AL_{2,2} & \cdots & \cdots & AL_{2,d} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ AL_{n,1} & AL_{n,2} & \cdots & \cdots & AL_{n,d} \end{bmatrix} \tag{5} \]

where $M_{Antlion}$ is the matrix for saving the positions of antlions, $AL_{i,j}$ is the value of the $j$-th dimension of the $i$-th antlion and $d$ is the number of variables. The fitness values of both ants and antlions are saved in the following matrices:

\[ M_{OA} = \begin{bmatrix} f([A_{1,1}, A_{1,2}, \cdots, A_{1,d}]) \\ f([A_{2,1}, A_{2,2}, \cdots, A_{2,d}]) \\ \vdots \\ \vdots \\ f([A_{n,1}, A_{n,2}, \cdots, A_{n,d}]) \end{bmatrix} \tag{6} \]

where $M_{OA}$ is the matrix for saving the fitness of ants, $A_{i,j}$ is the value of the $j$-th dimension of the $i$-th ant, $n$ is the number of ants and $f$ is the objective function.

\[ M_{OAL} = \begin{bmatrix} f([AL_{1,1}, AL_{1,2}, \cdots, AL_{1,d}]) \\ f([AL_{2,1}, AL_{2,2}, \cdots, AL_{2,d}]) \\ \vdots \\ \vdots \\ f([AL_{n,1}, AL_{n,2}, \cdots, AL_{n,d}]) \end{bmatrix} \tag{7} \]

where $M_{OAL}$ is the matrix for saving the fitness of antlions, $AL_{i,j}$ is the value of the $j$-th dimension of the $i$-th antlion, $n$ is the number of antlions and $f$ is the objective function.

A. Assumptions.

In this research the ALO optimizer is used to improve community detection in social networks by analyzing the relationships between individuals and discovering hidden relationships. The following assumptions about the behavior of individuals are used to formulate community identification:

• Ants move in the search space using different random walks, which are applied to all dimensions of the ants and are affected by the pits of the antlions.
• Each ant can be caught by an antlion in each iteration.
• Antlions build traps proportional to their fitness.
• After catching prey, an antlion changes its position and builds a new trap to improve its chance of catching new prey.

B. ALO Steps Description.

The behavior of the antlions is simulated as follows:

1) Random Walks of Ants. Ants update their positions with a random walk at every iteration. The search space has a boundary, and Equation (8) normalizes the random walks so that they stay inside it:

\[ X_i^t = \frac{(X_i^t - a_i) \times (d_i^t - c_i^t)}{b_i - a_i} + c_i^t \tag{8} \]

where $a_i$ is the minimum of the random walk of the $i$-th variable, $b_i$ is the maximum of the random walk of the $i$-th variable, $c_i^t$ is the minimum of the $i$-th variable at the $t$-th iteration and $d_i^t$ is the maximum of the $i$-th variable at the $t$-th iteration.

2) Trapping in Antlion's Pits. The ants walk randomly in a hypersphere around the antlion, defined by the vectors $c$ and $d$ as follows:

\[ c_i^t = \mathrm{Antlion}_j^t + c^t \tag{9} \]

\[ d_i^t = \mathrm{Antlion}_j^t + d^t \tag{10} \]

where $c^t$ is the minimum of all variables at the $t$-th iteration, $d^t$ is the vector including the maximum of all variables at the $t$-th iteration, $c_i^t$ is the minimum of all variables for the $i$-th ant, $d_i^t$ is the maximum of all variables for the $i$-th ant and $\mathrm{Antlion}_j^t$ is the position of the selected $j$-th antlion at the $t$-th iteration.

3) Building Trap. A random selection function based on fitness is used for selecting antlions, which gives the fitter antlions a higher chance of catching ants.

4) Sliding Ants Towards Antlion. As discussed before, the traps are proportional to the antlions' fitness. The antlions throw sand outwards from the pit so that the ants slide down.

5) Catching Prey and Re-building the Pit. After an antlion catches an ant, it updates its position to improve its chance of catching a new ant.

6) Elitism. At every iteration, the best solution is saved and considered as the elite.

For more information, see [16].

IV. THE PROPOSED ALGORITHM

The ALO algorithm needs some modification before it can be applied to the community detection problem. The proposed algorithm is given in Algorithm 1.

Algorithm 1 Ant Lion Optimization Algorithm

Input: A network G = (V, E).
Output: Community membership for input nodes.
procedure
    Initialize the parameters: Max Iteration, number of trials, distance and step.
    Randomly initialize the positions of ants $\vec{x}_a$.
    Calculate the fitness of ants and antlions.
    Find the best antlion and assume it as the elite (determined optimum).
    repeat
        for each ant do
            Select an antlion randomly.
            Update ant's position.
        end for
        Calculate the fitness of all ants.
        Replace an antlion with its corresponding ant if it becomes fitter.
        Update elite.
        t ← t + 1.
    until t > Max Iteration.
    return Elite.
end procedure
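As a rough illustration of the loop in Algorithm 1, the following Python sketch runs a simplified continuous ALO on a toy objective. It is not the authors' community-detection implementation: the sphere objective, the parameter names (`n_agents`, `walk_len`, `shrink`), the inverse-fitness roulette (which assumes a non-negative objective to be minimized) and the way the trap radius shrinks are all assumptions made for the example. Averaging the walk around the selected antlion with the walk around the elite follows the original ALO description in [16].

```python
import random

def sphere(x):
    """Toy objective (an assumption of this sketch): sum of squares, minimum 0."""
    return sum(v * v for v in x)

def roulette(fitness, rng):
    """Select an index; lower (better) non-negative fitness gets a higher chance."""
    weights = [1.0 / (1.0 + f) for f in fitness]
    return rng.choices(range(len(fitness)), weights=weights, k=1)[0]

def walk_around(center, lo, hi, shrink, walk_len, rng):
    """Random walk around `center`, rescaled per dimension as in Eq. (8)."""
    position = []
    radius = (hi - lo) / (2.0 * shrink)          # trap radius (assumed form)
    for c in center:
        c_t = max(lo, c - radius)                # lower bound around the antlion
        d_t = min(hi, c + radius)                # upper bound around the antlion
        walk, total = [0.0], 0.0
        for _ in range(walk_len):                # cumulative sum of +1/-1 steps
            total += 1.0 if rng.random() > 0.5 else -1.0
            walk.append(total)
        a, b = min(walk), max(walk)              # extremes of the walk (a_i, b_i)
        position.append((walk[-1] - a) * (d_t - c_t) / (b - a) + c_t)
    return position

def alo(objective, dim, lo, hi, n_agents=20, max_iter=100, walk_len=20, seed=0):
    rng = random.Random(seed)
    antlions = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    antlion_fit = [objective(x) for x in antlions]
    best = min(range(n_agents), key=lambda k: antlion_fit[k])
    elite, elite_fit = antlions[best][:], antlion_fit[best]
    for t in range(1, max_iter + 1):
        shrink = 1.0 + 10.0 * t / max_iter       # traps tighten over time
        ants = []
        for _ in range(n_agents):
            j = roulette(antlion_fit, rng)       # fitter antlions are chosen more often
            near_antlion = walk_around(antlions[j], lo, hi, shrink, walk_len, rng)
            near_elite = walk_around(elite, lo, hi, shrink, walk_len, rng)
            ants.append([(p + q) / 2.0 for p, q in zip(near_antlion, near_elite)])
        for i, ant in enumerate(ants):
            fit = objective(ant)
            if fit < antlion_fit[i]:             # antlion i takes the place of a fitter ant
                antlions[i], antlion_fit[i] = ant, fit
            if fit < elite_fit:                  # elitism: keep the best solution so far
                elite, elite_fit = ant[:], fit
    return elite, elite_fit

best_position, best_value = alo(sphere, dim=2, lo=-5.0, hi=5.0, max_iter=50)
```

Because the elite is only ever replaced by a strictly better ant, the returned value can never be worse than the best antlion of the initial population. A community-detection variant would replace the continuous positions with an encoding of node memberships and the toy objective with a quality function such as modularity.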

To represent individuals in the network we chose the locus-based adjacency encoding scheme used in genetic algorithms [18], [19]. The antlion position $x$ consists of $n$ elements $(x_1, x_2, \cdots, x_n)$. Each element $x_i$ is assigned a value $j$ in the range $[1 \cdots n]$. Assigning the value $j$ to the $i$-th element represents a link between individual $i$ and individual $j$, so the two individuals are in the same community. The position of an antlion represents a solution in the search space, and the fitness value of each antlion represents the number of ants that the antlion can prey on. The number of ants found at the position of an antlion is expressed as $y_i = f(x_i)$, where $y_i$ is the objective function value assigned to $x_i$. The step parameter of the algorithm is the number of individuals whose membership is modified to move a solution $x_i$ toward a solution $x_j$, using the crossover operator of genetic algorithms [20]. The step value lies in $[1 \cdots n]$. To measure the distance between antlions and ants we use a distance measure $dis(x_i, x_j)$ based on Normalized Mutual Information [17], which measures how similar two solutions are.

V. EXPERIMENTAL RESULTS

In this section the proposed algorithm is tested on data sets of real-life social networks whose community partitions are known. Normalized Mutual Information (NMI) [17] is used to assess the accuracy of the resulting community structure by measuring the similarity between the known community structures and the detected ones. Modularity is used to measure community quality.
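The NMI comparison can be sketched in Python as follows. The example first decodes a locus-based adjacency solution into community labels and then compares the labels with a known partition. The function names `decode_locus` and `nmi`, the union-find decoding and the 0-based node indices (the paper uses the range $[1 \cdots n]$) are assumptions of this sketch; the NMI is computed in the form $2I(A;B)/(H(A)+H(B))$, which corresponds to the measure of [17].

```python
import math
from collections import Counter

def decode_locus(genes):
    """Gene i holds a node j linked to i; linked nodes end up in one community."""
    parent = list(range(len(genes)))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]     # path compression
            v = parent[v]
        return v

    for i, j in enumerate(genes):
        parent[find(i)] = find(j)             # merge the components of i and j
    return [find(v) for v in range(len(genes))]

def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0.0:
        return 1.0                            # both partitions are a single community
    return 2.0 * mutual / (h_a + h_b)

# Hypothetical 6-node solution: 0->1, 1->2, 2->0 and 3->4, 4->5, 5->3
detected = decode_locus([1, 2, 0, 4, 5, 3])
known = [0, 0, 0, 1, 1, 1]
score_same = nmi(detected, known)             # identical partitions
score_unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # independent partitions
```

NMI depends only on how nodes are grouped, not on the label values, so the decoded labels need not match the numbering used by the known partition.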

Fig. 1: NMI Values of Different Networks of The Result Community Structure.

The proposed algorithm was applied to each network with each objective 10 times; for the best solution of every run the NMI and Modularity were calculated, and the averages over the 10 runs were recorded. The ALO algorithm was applied with the following parameter values: Max Iterations = 100, number of trials = 10, distance = 0.4 and step = 0.05n. The ALO algorithm was applied to three social networks:

1) The Zachary Karate Club [21]: it represents the relationships among the members of a karate club that split between the club president and the karate instructor; it contains 34 nodes and 78 edges.
2) The Bottlenose Dolphin Network [22]: it represents seven years of observation of the behavior of bottlenose dolphins; it contains 62 nodes and 159 edges (318 when each edge is counted in both directions).
3) American College Football Network [3]: it represents the football games between American colleges during the regular season in fall 2000; it contains 115 nodes and 613 edges (1226 when each edge is counted in both directions).

Figs. 1 and 2 show the average NMI and Modularity obtained for each network when ALO is applied with the different objectives. Fig. 1 shows that the modularity objective achieves high NMI values, and Fig. 2 shows that the resulting modularity is higher than the modularity of the ground-truth division, which means that the detected structure is more modular than the original one. After applying the ALO algorithm, each network is divided into smaller groups of nodes. Fig. 3 shows the result for the Zachary Karate network after applying ALO. The dashed line shows the original division of the network and the rectangles show the communities produced by ALO. The result matches the original division in that the network is divided into two communities, each of which is further divided into two communities.

Fig. 2: Modularity Values of Different Networks of The Result Community Structure.

Figs. 4 and 5 show the results for the Dolphin network. Node 40 is misclassified with the modularity objective and returns to its community with the fitness objective. The fitness objective also detects that the large community on the left can be divided into two communities, and that the small community (4, 9, 21, 37, 60) can be merged into other communities on the same side without any effect on the original division. Fig. 6 shows the original division of the American College Football Network (12 communities), while Figs. 7 and 8 show fewer than 12 communities, 9 and 10 communities respectively. This happens because smaller communities are assigned to larger communities, leading to a more modular structure.

VI. CONCLUSION AND FUTURE WORK

This paper presented Ant Lion Optimization as a technique that can be used effectively for the community detection problem in networks. ALO was applied with two different quality functions that capture the intuition of communities: community fitness and modularity. The results show that the performance of the proposed algorithm is promising in terms of accuracy and that it successfully finds an optimized community structure. In future work, we will investigate criteria for increasing the accuracy and scalability of community detection.

Fig. 3: Zachary Karate Network After Applying ALO Algorithm (Modularity and Fitness Objectives).

Fig. 4: Dolphin Network After Applying ALO Algorithm (Modularity Objective).

Fig. 5: Dolphin Network After Applying ALO Algorithm (Fitness Objective).

Fig. 6: Original Division For The American College Football Network.

Fig. 7: The American College Football Network After Applying ALO Algorithm (Modularity Objective).

Fig. 8: The American College Football Network After Applying ALO Algorithm (Fitness Objective).

REFERENCES

[1] Sh. Bansal, S. Bhowmick and P. Paymal, "Fast Community Detection For Dynamic Complex Networks", Communications in Computer and Information Science, Springer, Vol. 116, pp. 196-207, 2011.
[2] A. Barabási, H. Jeong, Z. Néda et al., "Evolution Of The Social Network Of Scientific Collaborations", Physica A, Vol. 311(3), pp. 590-614, 2002.
[3] M. Girvan and M. E. J. Newman, "Community Structure In Social And Biological Networks", Proceedings of the National Academy of Sciences, Vol. 99, pp. 7821-7826, 2002.
[4] Eslam A.H., Ahmed I.H., A. Hassanien and Aly A.F., "Community Detection Algorithm Based On Artificial Fish Swarm Optimization", Intelligent Systems' 2014, Springer, pp. 509-521, 2015.
[5] Ahmed I.H., Hossam M.Z., A. Hassanien and Aly A.F., "Networks Community Detection Using Artificial Bee Colony Swarm Optimization", Proceedings of the Fifth International Conference on Innovations in Bio-Inspired Computing and Applications IBICA 2014, Springer, pp. 229-239, 2014.
[6] D. Ma, "Visualization Of Social Media Data: Mapping Changing Social Networks", Master of Science Thesis, Faculty of Geo-Information Science and Earth Observation, University of Twente, Netherlands, 2012.
[7] J. Dahlin and P. Svenson, "Ensemble Approaches For Improving Community Detection Methods", arXiv preprint arXiv:1309.0242, 2013.
[8] H. Whitehead, "Analyzing Animal Societies: Quantitative Methods For Vertebrate Social Analysis", The University of Chicago Press, Chicago and London, pp. 118-145, 2008.
[9] M. Salama, M. Panda, Y. Elbarawy, A. Hassanien and A. Abraham, "Computational Social Networks: Security And Privacy", Springer Verlag, Vol. 2, pp. 3-21, 2012.
[10] F. Malliaros and M. Vazirgiannis, "Clustering And Community Detection In Directed Networks: A Survey", Physics Reports, Vol. 533, pp. 95-142, 2013.
[11] G. Pan et al., "Online Community Detection For Large Complex Networks", PLoS ONE, Vol. 9(7), 2014.
[12] J. Duch and A. Arenas, "Community Detection In Complex Networks Using Extremal Optimization", Physical Review E, American Physical Society, Vol. 72(2), 2005.
[13] M. Newman, "Fast Algorithm For Detecting Community Structure In Networks", Physical Review E, Vol. 69(6), 2004.
[14] N. Azizifard, "Social Network Clustering", Information Technology And Computer Science, Vol. 1, pp. 76-81, 2014.
[15] V. Blondel, J. Guillaume et al., "Fast Unfolding Of Communities In Large Networks", Journal of Statistical Mechanics: Theory and Experiment, Vol. 10, 2008.
[16] S. Mirjalili, "The Ant Lion Optimizer", Advances in Engineering Software, Elsevier, Vol. 83, pp. 80-98, 2015.
[17] L. Danon, A. Diaz-Guilera et al., "Comparing Community Structure Identification", Journal of Statistical Mechanics: Theory and Experiment, IOP Publishing, Vol. 2005(09), pp. 09008, 2005.
[18] C. Shi et al., "A New Genetic Algorithm For Community Detection", Complex Sciences, Springer, pp. 1298-1309, 2009.
[19] C. Pizzuti, "Community Detection In Social Networks With Genetic Algorithms", Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, ACM, pp. 1137-1138, 2008.
[20] C. Pizzuti, "GA-Net: A Genetic Algorithm For Community Detection In Social Networks", Parallel Problem Solving from Nature - PPSN X, Springer, pp. 1081-1090, 2008.
[21] W. W. Zachary, "An Information Flow Model For Conflict And Fission In Small Groups", Journal of Anthropological Research, JSTOR, pp. 452-473, 1977.
[22] D. Lusseau, "The Emergent Properties Of A Dolphin Social Network", Proceedings of the Royal Society of London B: Biological Sciences, The Royal Society, Vol. 270(Suppl 2), pp. S186-S188, 2003.
[23] M. Newman, "Analysis of Weighted Networks", Physical Review E, American Physical Society, Vol. 70(5), 2004.
[24] R. van Dobben de Bruyn, "The Modularity Theorem", Bachelor's Thesis, Mathematisch Instituut, Universiteit Leiden, 2011.
