A Discrete Bat algorithm for the community detection problem Eslam Ali Hassan1,3 , Ahmed Ibrahem Hafez2,3 , Aboul Ella Hassanien1,3 , Aly A. Fahmy1 1 Faculty of Computers and Information, Cairo University, Egypt
[email protected],
[email protected],
[email protected] 2 Faculty of Computer and Information, Minia University, Egypt
[email protected] 3 Scientific Research Group in Egypt (SRGE), http://www.egyptscience.net
Abstract. Community detection in networks has raised an important research topic in recent years. The problem of detecting communities can be modeled as an optimization problem where a quality objective function that captures the intuition of a community as a set of nodes with better internal connectivity than external connectivity is selected to be optimized. In this work the Bat algorithm was used as an optimization algorithm to solve the community detection problem. Bat algorithm is a new Nature-inspired metaheuristic algorithm that proved its good performance in a variety of applications. However, the algorithm performance is influenced directly by the quality function used in the optimization process. Experiments on real life networks show the ability of the Bat algorithm to successfully discover an optimized community structure based on the quality function used and also demonstrate the limitations of the BA when applied to the community detection problem. Keywords: Community detection, Community Structure, Social Networks, Bat algorithm, Nature-inspired algorithms
1
Introduction
A social network can be modeled as a graph composed of nodes that are connected by one or more specific types of relationships, such as friendship, values, work. The purpose of community detection in networks is to recognize the communities by only using the information embedded in the network topology. Many methods have been developed for the community detection problem. These methods use tools and techniques from different disciplines like physics, chemistry, applied mathematics, and computer and social sciences [1]. One of the main interests in social network analysis is discovering community structure. A Community is a group of nodes that are tightly connected to each other and loosely connected with other nodes. Community detection is the process of network clustering into similar groups or clusters. Community detection has many applications including visualization, detecting communities of special interest,
2
Authors Suppressed Due to Excessive Length
realization of the network structure [2], etc [3]. One of the recent techniques in community detection is Girvan-Newman (GN) algorithm [4]. Girvan-Newman is a divisive method that uses the edge betweenness as a measure to discover the boundaries of communities. This measure detects the edges between communities through counting the number of shortest paths between two specific nodes that passes through a special edge or node. Later on Girvan and Newman introduced a new technique called Modularity [5]. Modularity measures the community strength of a partition of the network, where high Modularity means strong community structure that has dense inter-connections between the community’s nodes, Thus the problem of community detection can be viewed as a Modularity Maximization problem. Finding the optimal Modularity is an NPComplete problem, many heuristic search algorithms have been investigated to solve this problem such as genetic algorithm (GA), simulated annealing, artificial bee colony optimization (ABC) [1]. The remainder of this paper is organized as follows. In Section 2 we formulate the community detection problem and show the objective function used in the research. In Section 3 we illustrate The Basic Bat algorithm. In Section 4 we illustrate our proposed algorithm. Section 5 shows our experimental result on real life social networks. Finally we provide conclusions in section 6.
2
The community detection Problem
A social network can be viewed as a graph G = (V , E ), where V is a set of nodes, and E is a set of edges that connect two nodes of V . A community structure S in a network is a solution to the problem which is a set of communities of nodes that have a bigger density of edges among the nodes and a smaller density of edges between different sub-groups. The problem of detecting m communities in a network, where the number m is unknown can be formalized as finding a clustering of the nodes in m subsets that can best satisfy a given quality measure of communities F(S ). The problem can be formalized as an optimization problem where one usually wants to optimize the given fitness measure F(S ) [6]. The objective function has a significant role in the optimization process; it’s the “steering wheel” in the process that leads to good solutions. A lot of objective functions have been introduced to capture the intuition of communities, and there does not exist a direct method to compare those objective functions based on their definitions [7–9]. Network Modularity [5] is one of the most used quality measure of communities in the literature. Modularity measures the number of within-community edges relative to a null model of a random graph with the same degree distribution. We use community Modularity as our objective function in the Bat algorithm.
3
Bat Algorithm
The Bat Algorithm (BA), that was presented by Xin-She Yang [10], is a new meta-heuristic optimization algorithm derived by simulating the echolocation
A Discrete Bat algorithm for the community detection problem
3
system of bats. It is becoming a promising method because of its good performances in solving many problems [11]. Biological inspiration In nature, the echolocation behavior of bats used for hunting and navigation, where a bat emits ultrasound pulses to the surrounding environment, then listens back to the echoes in order to locate and identify preys and obstacles as shown in Figure 1. Each bat in the swarm can find the more nutritious area by individual search or moving forward towards a more nutritious location in the swarm. The basic idea of the BA is to imitate the echolocation behavior of bats with local search of bat individual for achieving the global optimum. At the beginning of the search, each individual bat emits pulses with low frequency but with greater loudness in order to cover bigger area in the search space, later on when a bat approaches from its prey, it increases its pulse emission rate and decreases the pulse loudness, this frequency/loudness adjustment process can control the balance between the exploration and the exploitation operations of the algorithm. 3.1
Bat Movements Description
Applying the algorithm to the optimization problem, generally a ‘bat’ represents an individual in a population. The environment in which the artificial bat lives is mainly the solution space and the states of other artificial bats. Its next movement depends on its current state, velocity and its environmental state (including the quality and state of the best bats in the swarm). Let the state vector of an artificial bat x composed of n variables such that x = (x1 , x2 , . . . , xn ) , and the velocity vector of artificial bat v composed of n variables such that v = (v1 , v2 , . . . , vn ) , fi is the current frequency that moves between fmin and fmax , vit is a new velocity selected according to equation 2 and xti a new state selected as in equation 3. Where β ∈ [0, 1] is a random vector drawn from a uniform distribution, x∗ is the current best global solution that is determined after selecting the best solutions between all the n bats in the current population, Because the product λi fi is the increase in velocity, then we can either use fi (or λi ) to modify the velocity while fixing the other parameter according to the problem type. fi = fmin + (fmax − fmin ) ∗ β
(1)
vit = vit−1 + (xti − x∗ ) ∗ fi
(2)
xti = xit−1 + vit
(3)
We chose fmin = 0 and fmax = 1, and draw initial frequency of each virtual bat from a uniform distribution f0 ∈ [fmin , fmax ]. Performing a local search, one of the best solutions in the current population is selected randomly, then a new solution is produced using 4, Where ∈ [0, 1] is
4
Authors Suppressed Due to Excessive Length
Fig. 1: Shows the echolocation behavior of an Artificial Bat approaching its prey
random number, and At = hAti i is the average loudness of all bats in the current population. xnew = xold + At
3.2
(4)
Pulse emission rate and Loudness description
Moreover, the pulse emission rate r and the loudness A of each virtual bat must be updated when a new population generated, As soon as the bat found its prey, its loudness decreases and pulse emission rate increases according to equations 5 and 6 At+1 = αAt i
(5)
rit+1 = ri0 [1 − exp(−γt)]
(6)
Where α and γ are constants, after performing some experiments we chose α = 0.99, γ = 0.02, ri0 = 0.01 and A0i = 0.99 , Also we can notice that Ati ← 0, rit ← ri0 as the virtual bat gets closer to its prey t ← ∞, which is rational since a bat has just reached its prey and stops emitting any sound. This update process will only occur when a new better solution generated, which imply that this bat moves towards an optimal solution.
4
Proposed Algorithm
It is not possible to directly apply the Bat algorithm to the problem of community detection. First the algorithm should be redesigned. In the current section we illustrate the modified Bat algorithm so it can be applied to the community detection problem. The algorithm is outlined in listing 1.
A Discrete Bat algorithm for the community detection problem
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
5
Input: A Network G = (V , E ) Output: Community membership assignments for network’s nodes Initialize the parameters: pulse rates ri , loudness Ai , Swarm size np, the maximum number of iterations M ax Iterations Randomly initialize each artificial bat in the swarm with a random possible solution as its current state, the velocity of each bat, and calculate its fitness repeat Generate new solutions by adjusting frequency, and updating velocities and locations/solutions [equations 2 to 4 and the modified equations 7 and 8] if rand > ri then Select a solution among the best solutions Generate a local solution around the selected best solution end if rand < Ai and f(xi ) < f(x∗ )) then Accept the new solutions Increase ri and decrease Ai end t←t+1 until t > M ax Iterations; return the best solution achieved
Algorithm 1: Bat algorithm
4.1
Parameters redesign
The first step of modifying a BA for solving the problem of community detection, To provide an appropriate scheme of representation of an individual artificial bat in a population. The locus-based adjacency encoding scheme [12, 13] is selected to represent the solution. In that representation, each artificial bat state x composed of n elements (x1 , x2 , . . . , xn ) and each element can take a value j in the range [1 .. n]. A value j set to the ith element is translated as an association between node i and node j. Which means that, in the community structure detected, nodes i and j will exist in the same community. Normally the bat swarm will contain np artificial bats (AB). The bat current state represents a solution in the search space and the fitness value of the solution represents the amount of food resource at that location. The food condensation in the location of an artificial bat is formulated as yi = f (xi ) , Where yi is the fitness function value associated with xi calculated using Modularity quality measure. 4.2
Bat Movement
Bat movement are described in equations 1, 2 and 3. In a discrete problem representation such equations can not be applied directly. First the difference operator between to bat position will be calculated using equation 7; where g(xi ) is the group assignment of node i in the solution represented by bat position x. 1 if g(xi ) 6= g(x∗i ) di = (xi − x∗i ) = (7) −1 if g(xi ) = g(x∗i )
6
Authors Suppressed Due to Excessive Length
Fig. 2: Shows how a new solution is generated using and the old solution X when the average loudness At = 0.6. Now equation 3 will be considered as uniform crossover between (x, x∗ ) using velocity vector v as the mixing ratio controller, so the new position value will be updated using equation 8. ∗ xi if vi ≥ 1 (8) xnew = i xi otherwise 4.3
Local Search
In order to create a new solution using equation 4 , First should be converted into a vector of size n of uniformly distributed random numbers between 0 and 1, then perform the generation process by selecting a new element from one of the neighbors of the ith node when i > At , otherwise keep the old neighbor as illustrated in Figure 2. 4.4
Current Global Best
The Basic Bat algorithm uses x∗ the current global best solution in updating all VB in the swarm, which will lead to moving all bats to the same location. In optimization problems this will might lead to trap the algorithm in a local optima. To overcome this problem instead of using one global best, we select η top bats according to their fitness value, then for each VB update a randomly selected VB from the current η top best is used. η is set 10% of the population size np by trail and error. Since the algorithm employs stochastic process to find optimal solution, it may converge to different solutions (non-deterministic). It is therefore not uncommon to run the algorithm multiple T times i.e. number of restarts, starting with initial different population in each iteration (chosen randomly) and then returning the best solution found across all runs according to the objective function used in the optimization process.
5
Experimental results
In the section we tested our algorithm on a real life social networks for which a ground truth communities partitions is known. To compare the accuracy of the
A Discrete Bat algorithm for the community detection problem
7
resulting community structures; we used Normalized Mutual Information (NMI) [14] to measure the similarity between the true community structures and the detected ones. Since Modularity is a popular community quality measure used extensively in community detection, we used it as a quality measure for the result community structure of all other objectives. We applied our algorithm on the following social networks datasets :– The Zachary Karate Club: which was first analyzed in [15], contains the community structure of a karate club. The network consists of 34 nodes. Due to a conflict between the club president and the karate instructor, the network is divided into two approximately equal groups. The network consists of 34 nodes and 78 edges. – The Bottlenose Dolphin network: was compiled by Lusseau [16] and is based on observations over a period of seven years of the behaviour of 62 bottlenose dolphins living in Doubtful Sound, New Zealand. The network split naturally into two large groups. – American College football network: [4] represent football games between American colleges during a regular season in Fall 2000, nodes in the graph represent teams and edges represent regular-season games between the two teams they connect. Games are more frequent between members of the same conference than between members of different conferences, the network is divided into 12 conferences. – Facebook Dataset: Leskovec [17] collects some data for the Facebook website -10 ego networks-. The data was collected from survey participants using a Facebook application [7]. The ago network consist of a user’s –the ego node– friends and their connections to each other. The 10 Facebook ego networks from [17] are combined into one big network. The result network is undirected network which contain 3959 nodes and 84243 edges. Despite there is no clear community structure for the network, a ground truth structure was suggested in [18]. For each dataset; we applied the Bat algorithm 10 restarts and calculated the NMI and Modularity value of the best solution selected. This process was repeated 10 times and average NMI and average Modularity is reported. The Bat algorithm was applied with the following parameters values; number of VB in the population np = 100 and the maximum number of iterations M ax Iterations = 100. Table 1 summarizes the average NMI and Modularity for the result obtained using the Bat algorithm. We observed that the result for the each network is better that its ground truth in term of Modularity. The detected community structures for each network is visualized in Fig.3. Figure 3a shows a visualization of the result for the Zachary network. The original division of the network is indicated by the vertical line and the detected structure is indicated by the nodes’ colors. As we can observe that the detected community structure contains 4 communities with a high Modularity value = 0.4696 in which the top level is similar to the original division of the network, however in the detected structure each group is farther divided into two groups.
8
Authors Suppressed Due to Excessive Length
Table 1: Modularity Result Zachary Karate Bottlenose Dolphin College Football Facebook Modularity 0.4696 0.5498 0.612 0.753 Ground Truth 0.4213 0.395 0.563 0.7234 NMI
0.6873
0.5867
0.84786
0.67374
Thus the NMI of comparing the detected structure with the ground truth of Zachary network is 0.6873 i.e. 68% of similarity between the two structures. Despite that the NMI value is somehow low, it is not a major judgmental criteria of result for two reasons. First the major quality criteria is Modularity value, since the algorithm is optimizing Modularity objective. Second the know ground truth of the network from the original study [15] might not be the optimal one and it is possible that there is a more modular structure that the ground truth. Figure 3b visualizes the result for the Bottlenose Dolphin network. As before the original division of the network is indicated by the curved line and the detected structure is indicated by the nodes’ colors. The original division of the Bottlenose Dolphin network is divided into 2 groups with Modularity value of 0.395. The detected community structure by the Bat algorithm is divided into 5 groups and has a higher Modularity value of 0.5498. Regarding the College football network; the original division of the network into conferences is highlighted in Fig.3c; only edges between nodes from the same group are shown and nodes’ color refer to which groups they belong to. From Fig.3c we can observe that some nodes never played any match with other nodes from their group. The detected community structure for this network is shown in Fig.3d which contains 10 communities with a more modular structure than the one shown in Fig.3c. Figure 3e shows a visualization of the largest 12 communities from the detected community structure of the Facebook dataset. As we can observe that each group show a dense connection between nodes from the same group and spare or low interactions between nodes from different groups except for a few nodes that has a large edge degree cross different groups such as nodes {1750, 476 and 294} which make group membership assignment a hard process for the algorithm. 5.1
Comparison Analysis
Now we compare the result obtained by Bat algorithm with other methods in the literature which are AFSA community detecton [19], Infomap [20], Fast greedy [21], Label propagation [22], Multi-level or Louvain [23], Walktrap [24] and leading Eignvector [25]. Each method is run 10 times for each dataset and the average NMI and Modularity of the result community structure is reported. Figure.4 summarize the NMI and Modularity values for all methods. As we can observe that in term of NMI; Bat algorithm produces a good result compared
A Discrete Bat algorithm for the community detection problem
(a) Result for Zachary Karate network. 80 31 56 30 20 36
2
34
110
26 38 90 46
106 104
86
100 72 55
13
19 35
44 39
83
43
51
29
74
89
66
24 94 17 105 10 5
50 84
4
108
99
114 97 96
88 63
6 85 82 53 41
86
100 72 55
13
19 35 37
44 39
45 93
52 109 8 112 79 9 78 22 69 23
57
28 18 21
1
42
51
89
94
71 77
105
108
99
4
64 98
28 18 88 63 21
40 3 33 7 107 14 101 16 65 61 48
73 75
6 85 82 53 41
(c) Original division of the College Football
57 59
114
17 12 10
96
11 103
60
66
5
97
50 84
67 92
91
25
24 70 29
76
58 49 113 87
68 47 54
115 111 74
40 3 33 7 107 14 101 16 65 61 48
73 75
11 103
71 77
62 15 27
43
32
95 81 80 102 31 56 30 83 20 36
106 104
98
12
70
115 59
26 38 90 46
76
60
64
25
68 47 54
110
67 92 49 58 87 113111
45 93
1
42
2
34
81 91 37
52 109 8 112 79 9 78 22 69 23
(b) Result for Bottlenose Dolphin network.
62 15 27
32
95 102
9
(d) Result for the College Football
908 949910 920 868
897947901 967 1048946915944895 891 913
881
476
930926 942 918 890960 924 1089 957 945
1095
1122
1417
922
927
1086 1096 1123 898 956 925905865 943 939 909 899 1077
1159148018921655189514861475 1174
1075
1581
1617 1639 1173
965953
966 954
883 959921 951863 1074 1044 1091 893 1084
859875
896
929
1434
1783
434
916
851
1444
1118 1015 977 1424 998 1012 1119
1389
818
1120816
1496
1513
849 1390
3272
1585
1753 1915 1920
1977 1989 1927
1983 1987 1928 1981
1974 1978 1589 1590 1921 1979
1019
1972 1980 1975
1991
1990
523
1858
3177
3226
1140
1064
792
1263
1259 1446 12581535 9831007 1580 979 12621260 1431
1527 1269
999 1056
1664
10601267 822 1515 1261 12641398 1516 1533 1545 1518 1534 971 1508995
993 980
1004
1511
1510
1519 1532
1512 1268 1517
1720
1666
1057
443
1650
1681 1782
1266
1514 1509
3312 3387
1414
1062
380
1537
789
987
81 86 76 124 84 8378 80 77 119 85
810
125
29224 207
215
1441
1400
121
122
155
790
1826 1824 1825
226
209
8 55
69
40 195 197 113
132
444406
152
801
6
97103
68 37
94 63 35
11836 111 44
1
21
20 15 6034
53 109
73 52
43
64
14
24 104
93
115
4
42
98
271
1243
3274 3279 3353
3407
3388
3175
2782 2546
2299
2664
2747
2666
27742696
2489
2668
2667
2695 26772594
2548
764
2663
2693
2777
2814
2599
2694
2802
2813
640613667604
2537
2492
2595
2737
932
2799 2801
2614 2800 2618 2798
1070
737803
736
2874
2609
27682769
2772
2659
2883
2824
2387
2238
2443
2386
2687
2781
2587
2681
2589 1821
2701
1780 2394
2761
24132716 2407 2732 2811
2592
2408
486
2645
2422 2678
2699
1318
1401 13271372 1375
1320 1328 13161333 2385
3229
3269 3386
440
3163
1752
935
1498
29602946 2910 2913 2997 3029 2982 29343043 2957 29353063 2987 2893 3005 2955 3102 2992 29422976 30223068 3164 28963031 30423058 3109 2963 30983014 3018 1709 2916 2938 3010 3056 3006 3067 3000 30382951 2936 2900 3032 2967 2970 3181 2927 2928 30283015 2943 2969 2962 2984 2915 3012 2898 2925 30642953 3105 3008 2965 3001 2939 2972 2961 30922901 2890 3096 2903 2981 3026 29912980 3034 3055 2959 3065 3013 2973 3049 3027 28943019 3004 3089 2904 2920 3039 29072971 2999 3023 3093 2986 2918 3110 2899 3104 2929 3048 2919 3007 2930 3035 29953040 3030 2917 2994 3046 3103 3062 2998 2906 3057 2958 3069 2924 3053 2948 2947 2891 3020 30912905 2945 3060 2993 29142956 3021 2944 29262941 2912 2931 3172
3228
1635
1437
3672 3744
3182 3095
3674
3675
3468 3712 35963426 37703777
3449 3448 3590 3636 36343425 384834443771
3612
3438
3702
3437 3594 35653775 3471
3759 35593728 3757
36243628
3566378234473844 3622 3810 3588 3589 3558 3776 3554 3895 3783 3615 3597 3595 344638413760 3632 1355 3555 362336103433 3723 37223727 3710 3818 2204 3765 3427 3568 3553 1904 3721 3704 3703 3635 3781 3724 3434 3616 3840 3846 37333849 3585 3847 359335081361 3732 36173619 3613 3621 3814 3625 3633 3845 3809 1352 34693820 3719 3439 3764 3725 3720 3570 22023626 3627 3561 3908 3900 36113620 3819 3631 36303614 3466 1350 3430 3436 3706 3746 3774 3815 3586 3443 3668 3812 3591 3629 3904 3766 3707 3705 3711 3640 3580 3572 1353 3609 3578 3512
3837
1689
3811 3667 1344 1337 3860 3708 3587 3717 1363 1349 1360 3709 3644 3858 3582 34761347 13563569 3737 1346 1334 3574 1341 3859 3730 3718 3618 13573739 3567 3505 1340 1343 1359 3857 3772 3576 3891
3564 1362 3875
3581 1348
3753 3752 3738 3740 3741 3747
3536
3889
3743
3573 3465 3523
3642 36063550
3715
3504
3603
3520
3511
35283541 3641 3600 3513 3549 3607
3507 3545
3531 3524 3517 3599 3529 3532 3530 3604 3575 3645
3693
3533
3515 3543 3605
3510 3521 35183602 3544 3601
3716
3789 3793
3787 38013798
3790
3785
3797
3748
3749 3742
3516
3745
3869 3886
3880
3890
3868
3813
3577
3514
3643
3887
3639 3726
3583 1336 1351 1364 3473 1345 3571 35383534 3666 1338 3608 3474 13353467 3750 3638 3525 3546 1342 3598 3548 3579
3470 3526 3527 3537 35473522 13583472 3539 3503
3795 38003799
3349 3170
3133 3171 3100
3768
1308
2229
2209 2449 2236 2275
2278
2230 2237
3231
3173
1633
3788 37863794 3796 3792
3317
3249
1652
1502 1311 1315 1499
2733
3791 3802
3268
3270 3108
1253 1246 1252 1249
1248 1507
13211374 1506
3248 3400 3230 3251
3135
3180
1786
1255
2584 2208 2465 2211 2226 2274 2260 17812460 2452 2679 24002223 2251 2244 2462 1692 2248 2258 2454 2267 2257 2242 2447 2246 2231 2273 2217 2227 2253 2451 2390 2466 2444 22622206 2219 2222 22762256
2213
3385
547
1676
1254
1523
2680 2712
2564
3054 2911 3059 30502897 2974 2975 29503099 3045 3009 31062990 2988 29682909 2983 3036 3061 2985 2989 2932 3090 2952 2949 3097 3136 2977 3051 3024 2902 1700 3017 3134 3101 3052 3044 2895 29083002 2892 3003 2923 3066 2933 3107 3025 2940 3047 3037 2978 2937 3094
747
1604
1606 799 1677
1326
1371 1373 1325 1402 1319 1452 133213101313 808 1454 1314 1377
1317 1331 1323
1449
1828
2403
2397 2562
1384
3285
527
812
1787
1788
1183
1050
2821
2428 2418 239625722438 2700 1829 2575 2461 2569 1876 2734 2439 23802435 2634 2581 2578 2583 1823
2567
760
6751418 533
1185
1368
1385
1613
1550 1199 1685 1403 1764 1547 1309
1322
1453
1330 1051 1312 1378
1553
1439
1628
600 603 512556
1181 150116261455
811 1195
492
3155
2921 1802
794
1182
1609
1180
1369
564 1611
1053
2810 2812 2808 2809
2780
2426 2266 2632 1774 2463 2239 2580 2457 2269 2558 1836 1770 2404 237524021765 2405 2207 2234 2393 2252 2456 1779 1777 2265 2277 1579 2255 2264 2240 2379 2424 2247 15742423 2382 2270 2392 2212 2235 17712399 2464 24551562 2453 2221 22152727 2205 2218 1873 1566
2467 2585 2272 2216 2259 2254 22682450 2224 2249 2245 2243 2566 1878 15582389 2643
1827 1554
1599
1594 1598 1600 1595 1552 1784 14041751
2725 2806
2804
2648
1543
1376
491
3162 3399
3158 3152 3148
3147 3159 3151
3296
659
1433 687
734
934813 795 1184
690 804722 1068 707521 797
1419 721 68815491257 937
1450
1503
490
493
2684
2577 1573 2434 1556 23812469 2391 1571 2702 1776 2225 2401 2263 2214 2433 1766 1820 25912582
2647
809
1379
1831
2878
2442
2371 2743 2660 18742778 2746 2430 1578 2615 24592445 2674 2662 2417 2406 2412 2458 1772 2432 2427 23772671 1884 1778 1567 2233 1560 2395
2271
677 724
1380
2807
2232
2796
587 805 796
683
1524
271527142771
2565 237226201886
2767 260826032766 2751 26022384 2619 2571 2881 2738 2613 2626 2623 2383 2573 1775 27412640 2378 2628 2698 2605 2606 2776 1684 1565 2431 2673 2637 2415 2750 2374 2731 2672 2646 2752 2629 2588 2612 2429 2860 24202670 2376 2724 2411 2756 27082568 2758 25571832 2631 16952617 2723 1887 1563 2610 2762 2757 2669 2579 2370 424 2440 2763 2369 2555 188527832706 1769 2561 26012448 2649 26412744 183315721877 27492704 18222657 2437 2729 1575 2425 2650 26522421 2748 27601570 2398 2709 1875 2713 259027392785 2625 1568 2616 2661 1879 27302656 2436 225022412441 2707 2765 2388 2726 26112685 2604 2563 2468 2745 2638 2718 2446 2764 2586 2719 2228 2373 2416 2682 2409 2607 2742 2717 1682 1688 2822 2728 2419 18301561 25742721 1773 2740 1768 2560 2220 2635 2576
2683
3153 3160
3149 3115 3150 3154 3161 31573146
3250
569
553
1367
2784
3156
1721
787
793
2773
2644
294
731
1542
700 732
2496
1705
1115
814
1366
1432
2642
275
3380
786
1880
798
2261
1217
3266
349
347
1704
1546
1442
753 674
25702720
1814
3402
3295
3293
1798 1713
1250
1116
779
2410
1228
1717 1714 1223
3415 3414
1251
1399
425
2770
3241
3138
761
419
422
2703
252026002779
3247
304
1722 310 1229 12061699 1797 3168
3243 3131
684 776702 709 520 537726 766 499 615597 595655630 628704743 1067 577 551 581686 669 662 757 632579 571 528 582 591 696 651634 658717509 755762596 560 572689543501 534 720 506 576 739570 629 517 568 626 788660668 647730781701 627 524 590 751 697 513 633 725 676741 678672 652710 679 775 500 748 502 522593784616 745 738 525 518694 740 7 28 636 711 503 768 727 559716 657530601 705693586 544558555 546763718 715735 746535 496548 532511498 723519714 703 542 708 539 562 495554 692 729 585526 680 515778573 605541 557654 589 538 574 549 598 550602567 713682754765580552 695584
2472
2330
3166
214
2288 2797
2542
3165
3126
750
1188
2705
2845 2843 2841 2847
3183
1804
639645770 771 782 578933 497 625681650 649670 666 643638742 609566 8001117 620 712 637 646691780706 565 508656607 536 563510 802516
2333 24701767 2525 2554 2337
2516
3142
3176
1812
1069
2805
2551
3116 3128
3129
3316 3348 3346
599772531635529
936663
2736
2348 2875
617
611 774561 505777642 631588 661 606621 767769 1114719 614756749641733 592545 575 665685 653610 622 758 618 664 673 612 752 644 619 608 624759 1329
2597
2545 2366 2473 2527 2598 2311 25052534 2363 2368 2335 2351 2501 2298 2310 2352 25362521 2517 2315 2281 2512 2483 23602313 2307 2478 2541 2494 2327 2515 2303 2319 2329 2285 2486 2317 2480 2293 2332 2539 2535 2297 2553 23672308 252925142354 2530 2471 2323 2526 2487 23262476 2338 2550 2283 2314 2318 2479 2507 2347 2552 2350
26902491 2334 2364 2688 2488 2543 2676 2481 2342 2522 2320 2528 2538 2502 2655 2503 2593 2513 2509 2511 2531 2518 2341 2735 2331 2484 2485 2477 2559 2305 2654 2339 2653 2499 2510 23562324 2549 2834 2280 2547 2474 2346 24902722247525002497 2504 2292 2296 23552312 2349 2290 2343 2544
3179
3347
1818
3318
32943178
504 507540 698648
2665 2691 2692
2755
2753
2697 2533 2689
373
783
1799
1702 357
3123
3119
671
1807 1813
3167
277 3244 1210 3239
3207
31303280 3252
623
367
372 1231345 342 333325 328 283 1240 1222 299 293316 323 3306 34317111214 1809 1213 3235352 375 356 1718 317 3113 3169 3232 17011232 3233 3120 1227 336 3114 353 3259 1220 1237 285281 1805 3127 1816 1230 1202 3111 3260 324 3267 3265 1815 295 3245 3137 1201 3238 3264 354 18081716 319 1242 1696 3121 3144 3236 331 3139
3118 3304 3143 3122
3308
3174 3140
699
284
3278
3282
3281 3352
514
12351204 1211 330
318 12261800 309 1207 329 1706 1708 308 1221 376 1203 337 1197 362 351 306 300 314 12253263 3283 1233 344 1811 3124 3261 1212 12161205 350 365 3125 374 291
110
156 149
126159 228 229 232
227 230
146
1187
3254
3284 3188
3237
1698
3112
457
2294 2493 2325
3242
3309
66
62 130
133 158 127 134 223112114 148
154
181 139137 136 128 151
381
261
1817
1810 1803 312
1796
58 6717 47 3 48 105 56 38
71100 219
102 135 192 212180 147
161162
397
3256
3276 3132 3288
25 1970 95 140
1712 3551801287
3079340 335 276270 1710 368 326360278 322 1244 1236 296 377 741707 1697 364 338302366 3591703 307282 1215 1208 1806 1715 348 1239 1238 1245 3341723 332 3240 3234 32621218
459
150
807
3277 3311
16 18 267 13 65 45 96 1602 1011101 30
57 32
141 142 193 211 153 144 49
407 201
221210 145222
420
33 23 9 31 22 27 50 28 5 131
117
59
194 54
220143 216
268
92
12087 185 79 225106157
1724
303 1200 313 3117 3193320 36312241234 1219 272 3053076 3145 31411241 361 301 30822693713077 3081 1209
138
91
89 88 90
123
82
744
1686 1871
2344 2306 2495 2279 23002357 2506 2316 2322 2282 2519 2291 23212284 2286 2345 2358 2340 2302 2361 2289 23532304 2498 2523 2362 2287 2295 2540 2336 2508 2482 2309 2596 2365 2328 2359 2301 2532
3330
3199
3319 3202
3360
1587 1923 1911
19261982
1410
1020
3218
1582
1397
1631
843
1395
1059
1063
1763
8429701001
981 974 1528 8261445 840 1544
1602
583
1588 1988
1037 1065 1127
1016
1674
1128
1421
1426 1427 1627
3404 3289
33843275
1591
1058
1607
848
1265
994
1536 1370 99613861420834986973 969 821 11131009 975 1003 9921000 1121968978 997 976 1055 1388
1014 1392 1010 982 1391 837853 972 1416 1011 902
3379
1848 1870
1022 1458
1005
3324
3217
3305
3273 32123291 3331 3203 3219
3356 3204
1851
1869
1794
32253336
3298 3210 3214 3303 3339 3208 3314 3224 3227
1854
1846
1837 1842 1843 18641852 18681865 1855 1867
3222
3299 33013302
1866
1651
1440
1054
1256
1460
32213329
3325
3361 3363 33823334 3307 33223344 3383 3310 3397 3337 3286 32903326
3209 3381 3313 3323 3287 3321 3297 3328 3200 329232063327 3220 3341 3338 3211 3300 3396 3271 32233201
3315 3358
1669
1840
1839
1793 1792
3320 3342 321333573401 3335
3362 3364 3405
1933 1932
1754 454
455
478479
481 480
1028
1436 1673
938
1530
1603
1476 452
1934
1850 1905
1898
1916
1464
1665
1680 1795
1679
1017
1522 15391129
984
1890 1845
1835
1470
1478 1168 10244831461456 484
1018 1034 1039
1406 1130 1526
9851006
1192
4531396 1678
3345 3333
1857
18441618
1158 14661487
1139
791
1415 1540 1538 1061 1125 1126 14301505
1608
815
1634
1026
1131 1521
1520
1393
1013
825
836
839 845 1394 1186
1541
1500
873
1100 1108
436 1405
1658
1917
1040 831 1382 1030 1138 8201529 1409 1429 1412 1428
3390
1726 1913 1728 1853
1659
1653
1167 1668 14951175 1151 1620 1492 1494 1165 1491 1525 1148 1630 1160 1170 1667 1143
482 1023
1134 1038 1132 841838 1021
830
1411
1049
1908
17391849 1729
1755
1745 1169 1156 1162 11571465 1142 1036 1149 1624 14681463
1029
854 8291137
817 835
941
1008
1615
1457
1456 475
1098 1101 1191 1073 11931102 879 1085 1071 866
1111
1931 1930 1912
17401894
1897
1748
1747
1079 1080 857 10421083 1094 1082885 8701097 1093 1047
867 11101106
1909
1484 1733 1727 17431907 1731 1730 1660 1906 1178 1746 1741
1742 1657 1489 1144 18561738
11721469 1725 1736 1152 1474 1164 1177 1163 1176 1477 1661 1663173418411847 1638 1481 1154 1790 1472 1145 1171 1819 1744 1147
1136 847 10321027 8528551025 1383 1033 846850 1035 1616 1002 1041 833 828827 8241443 8191133 1381 1459 1531
880 856962860877882 864 858 884 886 1099 8761107 872 8781092 903 1407 861 10781103 10431190 1112 1105 1109 1104
1789
1914
1483
1737
888
958
963 1076
1081 1088871 1072 1087 887
1640
1153 1656 1479 1161 1471 1838 1654
1625 1732 1488 1619 1179 1662 1785 1735 1791 1150 1482 1637 1896 1749 1146 1485 14731155 1462
948
911
1893
1636 1621 1622 1623 1467 1166 1490
1090
964 950892 919955862 923 952 894
904 914 1046 940 869874931912 917 961 900906 889 907 1045928
3497
3888
3671
3803 3806 3882 3862 3484
3804 3483
3903
3778 3836 3490 3489 3779
3486 3477 34943495 3480 3488 3498 3881
3808
3502 3485 3478 3479 3493 3492 3835 36373482
3832 3501 3807 3834 3499 3902 3500 3670 3669
348734813878
3491 3496 38053659 38333873 3879 3647 381738163661
3655 3649 3650 3654 3663 3648 3662 36643651
3656 3657
3646 3665 3658 3653 3652 3660
3016 2922 3041 2966 2964 2979 2996 3011 2954 3033
(e) Result for Facebook dataset
Fig. 3: Visualizations of the result obtained by Bat Algorithm on each social network dataset.
10
Authors Suppressed Due to Excessive Length
to other methods as shown in Fig.4a. In term of Modularity; Bat algorithm is very competitive with other methods as shown in Fig.4b. For the small size data set we can observe the Bat algorithm produce a community structure with a high Modularity value compared to all other methods. Regarding the Facebook datasets; Bat algorithm failed to produce a result with high Modularity value compared to the other methods. 5.2
Discussion
As observed from our experimental result that the Bat algorithm performance is promising for small size networks, however for a large networks Bat algorithms performance is degraded compared to other CD algorithms. From our initial analysis, we found there is no much diversity in the VB swarm over the search space and the Bat algorithm does not explore a large region in the search space for the following reasons:
1
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6 0.5
Modularity
NMI
– All the population is moving toward one position (Current global best x∗ ). over iterations this will lead to all VBs will move/evolve to similar solutions. Despite that we overcome this problem using η top global best, it decreased its impact but did not eliminate it. – There is no operator/behavior that allow the VB to escape a local optima or jump/explore new random regions in the search space. For example in Genetic algorithm there exist a mutation operation that allow such behavior even a simple mutation in the current solution could cause a large diversity in the current population. Despite that Bat algorithm performance in other application [11] that are continues in nature is very promising, however for the community detection problem (discrete case) Bat algorithm performance has some limitations. – The local search and Bat difference operator that we proposed for the community structure Section 4.2 and 4.3 are not optimal and it is not clear if they are efficient in exploring the search space. It is possible for another design to cause a significant improvement to the algorithm performance.
0.4 0.3 0.2 0.1
0.5 0.4
0.3 0.2 0.1
0 Zachary Karate Bat
AFSA
Bottlenose Dolphin
Infomap
Fastgreedy
American College Football
Label Propagation
(a) NMI values.
Multilevel
Facebook Walktrap
0 Zachary Karate Bat
AFSA
Infomap
Bottlenose Dolphin Fastgreedy
American College Facebook Football Multilevel Walktrap Ground Truth
Label Propagation
(b) Modularity values.
Fig. 4: Average NMI and Modularity values of the result community structure obtained by each algorithm.
A Discrete Bat algorithm for the community detection problem
11
– Accepting criteria for new solution has some limitation. The basic Bat algorithm accept new solution only if it is better than the current global best. This may constrain the number of moves that a bat can perform.
6
Conclusions and Future Work
A discrete Bat algorithm is introduced for finding community structure in social network. The locus-based adjacency encoding scheme is applied to represent a community structure. The locus-based adjacency encoding scheme has a major advantage that it enables the algorithm to deduce the number of communities k without past knowledge about it. The BA uses Modularity Quality measure as the fitness function in the optimization process. Experimental results demonstrate that the performance of the bat algorithm is quite promising in terms of accuracy and successfully finds an optimized community structure based on the Modularity quality function for small size networks, however the performance is degraded for large size networks. BA algorithm produce good result for the small size network compared to other CD methods, however the result for the larger networks does not compete with other methods. In future work we are going to conduct an investigation of the discrete BA limitation introduced in Sec.5.2 to propose a new enhanced BA for the community detection problem and investigate other popular bio-inspired optimization algorithms, and apply it for the community detection problem, then conduct an analytical study between those methods to compare their performance.
References 1. Santo Fortunato. Community detection in graphs. Physics Reports, 486:75–174, 2010. 2. Ahmed S Ali, Ashraf S Hussien, Mohamed F Tolba, and Ahmed H Youssef. Visualization of large time-varying vector data. In Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on, volume 4, pages 210–215. IEEE, 2010. 3. Z. Masdarolomoor, R. Azmi, S. Aliakbary, and N. Riahi. Finding community structure in complex networks using parallel approach. In Embedded and Ubiquitous Computing (EUC), 2011 IFIP 9th International Conference on, pages 474–479, Oct 2011. 4. M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99:7821–7826, 2002. 5. M.E.J Newman and M. Girvan. Finding and evaluating community structure in networks. Physics Rev. E, 69:026113, 2004. 6. Chuan Shi, Cha Zhong, Zhenyu Yan, Yanan Cai, and Bin Wu. A multi-objective approach for community detection in complex network. In IEEE Congress on Evolutionary Computation (CEC), pages 1–8. IEEE, 2010. 7. Jure Leskovec, Kevin J Lang, and Michael Mahoney. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th international conference on World wide web, pages 631–640. ACM, 2010.
12
Authors Suppressed Due to Excessive Length
8. Chuan Shi, Philip S Yu, Yanan Cai, Zhenyu Yan, and Bin Wu. On selection of objective functions in multi-objective community detection. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 2301–2304. ACM, 2011. 9. Ahmed Ibrahem Hafez, Eiman Tamah Al-Shammari, Aboul ella Hassanien, and Aly A Fahmy. Genetic algorithms for multi-objective community detection in complex networks. In Social Networks: A Framework of Computational Intelligence, pages 145–171. Springer, 2014. 10. Xin-She Yang. A new metaheuristic bat-inspired algorithm. In JuanR. Gonzlez, DavidAlejandro Pelta, Carlos Cruz, Germn Terrazas, and Natalio Krasnogor, editors, Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), volume 284 of Studies in Computational Intelligence, pages 65–74. Springer Berlin Heidelberg, 2010. 11. Xin-She Yang and Xingshi He. Bat algorithm: Literature review and applications. Int. J. Bio-Inspired Comput., 5(3):141–149, July 2013. 12. Chuan Shi, Cha Zhong, Zhenyu Yan, Yanan Cai, and Bin Wu. A new genetic algorithm for community detection. Complex Sciences, 5:1298–1309, 2009. 13. Clara Pizzuti. Community detection in social networks with genetic algorithms. pages 1137–1138, Atlanta, GA, USA, 2008. 14. L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 9:09008, 2005. 15. W.W Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452–473, 1977. 16. D. Lusseau. The emergent properties of dolphin social network. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270:S186–S188, 2003. 17. Julian J. McAuley and Jure Leskovec. Learning to discover social circles in ego networks. pages 548–556, 2012. 18. Ahmed Ibrahem Hafez, Aboul Ella Hassanien, and Aly A Fahmy. Testing community detection algorithms: A closer look at datasets. In Social Networking, pages 85–99. Springer, 2014. 19. EslamAli Hassan, AhmedIbrahem Hafez, AboulElla Hassanien, and AlyA. Fahmy. Community detection algorithm based on artificial fish swarm optimization. In Intelligent Systems’2014, volume 323 of Advances in Intelligent Systems and Computing, pages 509–521. Springer International Publishing, 2015. 20. M. Rosvall, D. Axelsson, and C. T. Bergstrom. The map equation. The European Physical Journal Special Topics, 178(1):13–23, 2009. 21. Aaron Clauset, Mark EJ Newman, and Cristopher Moore. Finding community structure in very large networks. Physical review E, 70(6):066111, 2004. 22. Usha Nandini Raghavan, R´eka Albert, and Soundar Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3):036106, 2007. 23. Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008. 24. P. Pons and M. Latapy. Computing communities in large networks using random walks (long version). ArXiv Physics e-prints, 12 2005. 25. Mark EJ Newman. Finding community structure in networks using the eigenvectors of matrices. Physical review E, 74(3):036104, 2006.