2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Cyclic Entropy of Complex Networks
Ibrahim Sorkhoh, Computer Engineering Department, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait
Khaled Mahdi, Chemical Engineering Department, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait
Maytham Safar, Computer Engineering Department, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait
Abstract: We calculate the cyclic entropy of a real virtual friendship network to gain insight into its degree of robustness. Counting the number of cycles of each size in the network yields a probability distribution function. The cyclic entropy of the actual friendship network is found to be bounded between those of the random and small-world network models; it has dual properties. Small-world networks indicate the existence of critical network sizes, 150 and 700, at which the cyclic entropy is minimum. Scale-free networks have the highest cyclic entropy among all complex network models, regardless of the size of the network.
978-0-7695-4799-2/12 $26.00 © 2012 IEEE DOI 10.1109/ASONAM.2012.182
I. INTRODUCTION AND RELATED WORK
In the search for the most robust and stable complex networks, we turn to the fundamental problem of statistical mechanics: the problem of system equilibrium, continuous and discrete alike. Equilibrium is the state at which the system's entropy is highest. The total number of states is also the number of configurations of the system. In the case of a complex network, it is the number of possible topologies that the network can have. The topology of a network can be defined in terms of its links and nodes; such a definition is simple, obvious and consistent with the direct relationships existing between nodes. Another definition is in terms of triads, that is, cycles of degree 3. Such a definition is a coarse-graining of the former: fewer degrees of freedom are needed to represent the system. It also introduces a property that the link representation ignores, namely feedback. In triads, feedback is accounted for within a subset of three nodes. The total number of possible configurations under the triad representation leads to the triadic entropy of the network. A more general definition of network topology considers cycles of all sizes. The motivation is the inclusion of all possible feedback in the network. The total number of possible topologies under the cycle representation leads to the introduction of cyclic entropy. Following a previous analysis, the robustness of a network, that is, its ability to adapt to changes, is correlated with its ability to deal with internal feedback. The more feedback loops (cycles) exist in the network, the more robust the network is. It follows that a fully connected network should be the most robust. Among sets of robust networks, the most stable is a network of size seven, because it has the highest cyclic entropy among all sizes.
To evaluate network robustness and the ability to tolerate changes [1, 2, 3], the authors of [4, 5] introduced the concept of network degree heterogeneity as a measure of a network's robustness. However, the results showed that a network's heterogeneity is not related to its ability to withstand external attacks. Increasing the network heterogeneity increases the vulnerability to intentional node attacks and at the same time decreases the vulnerability to random ones. Entropy in thermodynamics is used in practice to measure a system's efficiency, while conceptually it is related to the system's degree of chaos and its ability to possess new states. Boltzmann's generic definition of entropy allowed scientists from different fields to use this concept to study system robustness. While expanding information theory, Shannon was the first scientist to use the entropy principle to characterize communication systems; he used entropy to quantify how uncertain a communication system's next states are [6]. Another application of the entropy principle appeared in [3, 4]. By defining the system's state as the degree distribution of the network, the authors used entropy to measure the system's degree heterogeneity, and hence the network robustness as they define it. A problem with such a definition is that node degrees do not follow a universal distribution across complex network models: random networks have a Poisson degree distribution, small-world networks have a generalized binomial distribution, and scale-free networks have a power-law degree distribution. The objective of this work is to propose new methods to characterize complex networks by means of cyclic entropy. Section 2 discusses network entropy and the degree-of-cycles distribution. Section 3 proposes a new method to measure network entropy: we suggest a new approach to calculate the entropy of a dynamic complex network by defining the network state by the degree of cycles existing in the network.
Section 4 provides an experimental comparison between the cyclic and degree entropies. Section 5 concludes the work, and Section 6 provides future work directions.
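The degree-based entropy discussed in this introduction can be illustrated with a short Python sketch. It is not the authors' implementation: the adjacency-dictionary input format, the function name and the two toy graphs are assumptions made here for illustration only. The sketch builds the empirical degree distribution and applies the Shannon formula with the natural logarithm.

```python
import math
from collections import Counter

def degree_entropy(adjacency):
    """Shannon entropy (natural log) of the empirical degree distribution.

    `adjacency` maps each node to the set of its neighbours; this input
    format is an assumption of the sketch, not something the paper fixes.
    """
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    n = len(degrees)
    counts = Counter(degrees)  # degree value -> number of nodes with that degree
    # sum of p * ln(1/p) over all observed degree values
    return sum((c / n) * math.log(n / c) for c in counts.values())

# Toy graphs (hypothetical): a 6-node ring and a 5-node star.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

print(degree_entropy(ring))  # every node has degree 2, so the entropy is 0.0
print(degree_entropy(star))  # two distinct degree values, so the entropy is positive
```

The ring is perfectly homogeneous in degree, while the star mixes one hub with four leaves, which is the kind of heterogeneity this measure is meant to capture.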
II. CYCLES DISTRIBUTION AND CYCLIC ENTROPY
A. Cycles Distribution
The exploitation of cycles in characterizing dynamic systems was originally suggested in the field of dynamical systems. A dynamic system is a set of states and the transitions between those states. It has been proved mathematically that periodic orbits, or more precisely the non-wandering trajectories, are strongly related to system stability. Since they are defined as the trajectories that revisit the neighborhood of their roots, they represent the long-term trajectories that are preserved by the system along with the equilibrium points. All other trajectories are considered transient, since they reach their ends in a relatively short time. Translating these facts to the field of complex networks gives the interpretation of cycles explained before. In fact, a non-wandering trajectory mapped onto a network can be represented by a closed walk. However, since this kind of structure appears massively in any network (even a sparse one), in practice it is sufficient to consider only the closed paths (cycles) [7].
Fig. 1. Degree of cycles distribution of a friends' network.
Fig. 2. Degree of cycles distribution of different network models.
The cyclic property is the connectedness of the network and the return of information to its source, as represented by a cycle. Such uniformity of the cycles distribution allows a better characterization of complex networks. The cycles distribution is given by the probability function
$$p(l_i) = a \exp\left(-\frac{(l_i - b)^2}{c^2}\right)$$
where $l_i$ is the cycle length, and a, b and c are positive real numbers that have unique values for each network. Therefore, we define the state as the degree of loops or cycles existing in the complex network. Then we find the distribution function of such cycles. Philosophically, it answers a fundamental question: in an honest complex network, what is the probability that an actor will again receive the same information that he dispatched? Under this definition, the entropy of a non-cyclic (purely hierarchical) complex network is assigned the value infinity.
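As a minimal sketch of the Gaussian cycle-length distribution described in this subsection, the following Python snippet evaluates p(l) over a range of cycle lengths. It assumes that the prefactor a is fixed by requiring the probabilities to sum to one; the function name and the example values b = 12 and c = 3 are illustrative and are not taken from the paper.

```python
import math

def cycle_length_pmf(lengths, b, c):
    """Return {l: p(l)} with p(l) proportional to exp(-(l - b)^2 / c^2).

    The prefactor `a` of the paper's formula is not passed in; it is
    implied by normalising the weights (an assumption of this sketch).
    """
    weights = [math.exp(-((l - b) ** 2) / c ** 2) for l in lengths]
    total = sum(weights)
    return {l: w / total for l, w in zip(lengths, weights)}

# Hypothetical parameters: cycle lengths 3..21, centred on b = 12 with width c = 3.
pmf = cycle_length_pmf(range(3, 22), b=12.0, c=3.0)
print(max(pmf, key=pmf.get))  # 12, the most probable cycle length
```

In practice b and c would be fitted to the measured cycle counts of each network; that fitting step is outside the scope of this sketch.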
B. Cyclic Entropy
Let k be a discrete random variable taking the values {1, ..., k, ..., N} with probabilities p = {P(1), ..., P(k), ..., P(N)} respectively, such that $P(k) \ge 0$ and $\sum_{k=1}^{N} P(k) = 1$. There exists a measure of randomness, heterogeneity and uncertainty known as entropy, H, defined as:
$$H(p) = -\sum_{k=1}^{N} P(k) \ln P(k) \tag{1}$$
In a complex network context, P(k) may represent the degree distribution of links, the remaining degree (outward links), or the cycles of size k in the network. Degree distributions of links are the most common representation of P(k) in the literature. Different network models, such as random, small-world, scale-free, exponential, uniform and many others, are usually represented and constructed as degree distribution models of actual networks, such as software, social, biological and circuit networks. The simple degree distribution describes the connectedness of the network; hence the entropy is a measure of the heterogeneity, or uncertainty, of network connectedness. The authors of [8] were the first to suggest computing the entropy based on the remaining (outward) degree distribution. Using the remaining degree distribution allows the calculation of the degree-degree correlation and hence measures the network assortativeness. The entropy should therefore reflect the assortativeness uncertainty of the network. The work in [9] suggests calculating the entropy based on the degree of cycles instead of links. Such an approach emphasizes the feedback nature of complex networks and indirectly measures the network's ability to store information within its cycles. The cyclic entropy therefore measures the uncertainty associated with information feedback in the network. One interesting finding about the degree-of-cycles distribution is that all networks have a Gaussian (normal) distribution; networks differ only in the values of the mean and variance. In all cases, the minimum entropy of the network, $H_{min} = 0$, occurs when all nodes have the same degree, in links, remaining links or cycles. As for the maximum entropy, each network has a limit that is a function of the number of network nodes. Another picture of entropy is derived directly from Boltzmann's original definition of entropy. The work in [10] proposes computing the network entropy based on the topology configurations of the network. By varying the network parameters, several network configurations are generated. These configurations are constrained by the network model. For instance, in a random network model, the total number of configurations is limited by the wiring probability. The results in [10] assert that the approach can be extended to other models and can estimate the network's deviation from equilibrium. The entropy based on the number of configurations is defined as
$$S(W, p) = -\sum_{k=1}^{W} P(k) \ln P(k) \tag{2}$$
where the probability in this case is the probability of finding a given configuration.
III. COMPLEX NETWORKS MODELS
A. Network Representation
We represent the complex network by an unweighted, undirected graph, because we lack information about how the relations we extracted from a friendship website differ from one another. Moreover, the relations are bidirectional (duplex), owing to a constraint of the website that no friendship relation exists unless both sides agree. This representation has the following implications for the network. The relations are unified: any action taken from a node using a link must be taken on all the links connected to this node at the same time and in the same manner. All the nodes are trusted: they have no storage or camouflaging abilities. Although this simple representation is far from the actual complex systems, we believe that it forms a basis for any other complex network.
B. General Networks' Models
Generally, model complex networks are classified into three major types: random networks (RN), small-world networks (SW) and scale-free networks (SF). These three classes have different topological features, which causes the connectivity between nodes to differ among them. The simplest and most popular network to generate is the random network. In this type of network, the connections between each pair of nodes are generated randomly according to an input design parameter pc (the probability of connection). Small world is an interesting model that can be found in many real networks. The main property that distinguishes this model from any other is the small-world property: starting from any vertex, any other vertex can be reached in a small number of hops. Generating small-world networks is slightly more complicated than generating random networks. The network must initially be a lattice, that is, each vertex has k neighbors. After constructing this initial network, a rewiring probability is considered; it is used to rewire some of the existing edges randomly. The World Wide Web is the best-known network that follows the scale-free model. The WWW network has the property of having central hubs, which own most of the connections in the network. This property is the defining feature of the scale-free network model: a small number of vertices have a very high degree and most vertices have a very low degree. One interesting complex network that is classified as scale-free is the complex commerce network; the authors of [11] sought a mathematical explanation for why this network is of this type. Other examples of real scale-free networks are protein-protein interaction networks and the collaboration of actors in films [2]. Scale-free networks are built following a growth pattern. First, the network starts as a random network of size m0. Then the network grows by adding new nodes to the existing network and creating relations between each newly added node and some of the pre-existing nodes. Each time, m nodes are added to the existing network. The node-adding process ends when the required total number of nodes is reached. The probability of adding a new relation between a new node i and an existing node v is proportional to the degree of v: it is the degree of v divided by the sum of the degrees of all nodes in the network,
$$p_l(v) = \frac{d(v)}{\sum_{v} d(v)} \tag{3}$$
where the degree d(v) is the number of neighbors of node v.
C. Actual Network
In constructing the actual network, we use data from real complex networks such as Hi5 [12]. The network was constructed by extracting a number of records from the Hi5 online complex network using the Breadth First Search (BFS) algorithm. Hi5 is a social networking website that was founded in 2003 and claims to have over 60 million active members. To construct the friends' network, we first create nodes representing an individual and the friends in his known client list. Then we take one of his friends as the current client and extract that client's friends. We continue doing this with every node in the list until the required number of nodes is reached.
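The breadth-first construction of the friends' network described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `get_friends` is a hypothetical callable that returns the friend list of a user (the paper does not describe how Hi5 records were fetched), and the stopping rule used here simply ignores unseen users once the node budget is reached.

```python
from collections import deque

def bfs_sample(seed, get_friends, max_nodes):
    """Collect up to `max_nodes` users breadth-first from `seed`.

    Returns an undirected adjacency dict {user: set(friends)}.
    `get_friends(user)` is a hypothetical data-access function.
    """
    adjacency = {seed: set()}
    queue = deque([seed])
    while queue:
        user = queue.popleft()  # current client
        for friend in get_friends(user):
            if friend not in adjacency:
                if len(adjacency) >= max_nodes:
                    continue  # node budget exhausted: skip unseen users
                adjacency[friend] = set()
                queue.append(friend)
            # friendship is mutual, so store the link in both directions
            adjacency[user].add(friend)
            adjacency[friend].add(user)
    return adjacency

# Toy friendship lists standing in for the website data (hypothetical).
toy = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}
network = bfs_sample("a", toy.__getitem__, max_nodes=4)
print(sorted(network))  # ['a', 'b', 'c', 'd']
```

Storing each link in both directions mirrors the unweighted, undirected representation adopted in Section III.A.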
D. Counting cycles
Researchers usually solve a counting problem by means of its corresponding enumeration problem, when that can be solved efficiently. The proposed work depends on the cycles distribution of networks (see Figure 3). To obtain the cycles distribution, we need to count the cycles of each possible length in the network. An interesting algorithm based on statistical mechanics concepts exists, named the Belief Propagation (BP) algorithm. The algorithm counts the cycles distribution accurately without enumerating the cycles themselves. That makes it faster than any algorithm that depends on enumerating all the cycles. However, its output suffers from ambiguities and unpredictability, owing to its strong dependence on randomness and on the convergence criteria. Our work uses our enhanced BP algorithm [13] for cycle computation in complex networks. We developed a Java program based on the backtracking algorithm (itself based on a statistical mechanics [14] approach) designed by [15]. It uses the Belief Propagation equations and an approximation method to approximate the statistical mechanics model and find the cycles distribution. Two methods can be used as approximation algorithms: Monte Carlo simulation and the Bethe approximation. Bethe is used here because of the well-known correspondence between the Bethe approximation and Belief Propagation. However, due to the complexity of such computation (NP-hard), we developed a new approximation algorithm for counting cycles in a network [16] (for further details refer to [13]). The graphs in this algorithm are represented as adjacency matrices. The input to the algorithm is an undirected graph, and the output is the cycles distribution of the graph (the number of cycles as a function of their size). The algorithm starts by reducing the graph: all leaf nodes (nodes with degree 1 or 0) are removed. Each edge of the graph is initialized with a random positive value $y^{(0)}$. Each edge value is iterated from its initial value until it converges to a fixed value $y^*$. Convergence is determined according to some accuracy level. To guarantee the convergence of the algorithm, we require $y^{(T+1)} - y^{(T)}$ to be less than or equal to 0.001. The value y represents the probability that the edge is present in a cycle c. The y value is calculated using the following equation:
$$y_{i \to j}^{(T+1)} = \frac{u \sum_{m \in \beta_i - j} y_{m \to i}^{(T)}}{1 + 0.5\, u^2 \sum_{m,n \in \beta_i - j} y_{m \to i}^{(T)}\, y_{n \to i}^{(T)}} \tag{4}$$
where u is a positive real number. Then, from all the y values, two quantities are calculated, $C_L$ and L:
$$L = \sum_{(ij) \in E} \frac{u\, y_{i \to j}^{*}\, y_{j \to i}^{*}}{1 + u\, y_{i \to j}^{*}\, y_{j \to i}^{*}} \tag{5}$$
$$R = \frac{1}{N} \sum_{i \in V} \ln\left(1 + 0.5\, u^2 \sum_{m,n \in \beta_i,\ m \neq n} y_{m \to i}^{*}\, y_{n \to i}^{*}\right) - \frac{1}{N} \sum_{(ij) \in E} \ln\left(1 + u\, y_{i \to j}^{*}\, y_{j \to i}^{*}\right) - \frac{L}{N} \ln(u) \tag{6}$$
$$C_L = e^{RN} \tag{7}$$
where:
$\beta_i$ is the set of neighbors of node i,
$\beta_i - j$ is the set of neighbors of i except j,
N is the number of nodes in the graph,
$C_L$ is the number of cycles of length L.
The procedure explained above is repeated from an initial value u = u0 up to u = umax, where u0 and umax are greater than 0. At each iteration step, a new distribution point (L, $C_L$) is produced. The step for u is 0.0001 in the early stages of the algorithm. This value is not fixed: it is changed when $L_{new} - L_{old} < 0.001$ (i.e. the progress in L is slow). If this condition is satisfied, u is increased by 10%. As the equations above show, the output at each step (L, $C_L$) depends on u. At specific stages of the iteration (when u gets large), too many iterations would be wasted outputting nearly the same point. To avoid this, a jump in u is made. The algorithm yields a plot of (L, $C_L$) points. To extract the needed distribution points (3 to n), we use interpolation. Interpolation needs at least two points to find a third, which must lie between the other two. For example, given P1 (5.9876, 453) and P2 (6.0124, 490), finding the number of cycles for L = 6 by interpolation leads to a value of 471.5.
Input: undirected graph G.
Output: array A[1..n] of size n holding the cycle count for each length.
A[1] ← A[2] ← 0
Reduce G by removing all leaf nodes
u ← 0
Points ← empty set of points
while L < n
    y ← random number between 0 and 10 (for every edge)
    repeat
        calculate ynew using equation (4) for all edges, twice
        d ← largest |ynew − y| over all edges
        y ← ynew
    until d < 0.0001
    L ← equation (5)
    CL ← equation (7)
    add (L, CL) to Points
    u ← u + 0.0001
    if |Lnew − Lold| < 0.001 then u ← u * 1.1
end
for i = 3 to n
    find the two points that surround i (Lp < i < Lp+1)
    calculate Ci using interpolation
    A[i] ← Ci
end
Fig. 3. Counting Cycles Approximate Algorithm
IV. NETWORK MODELS COMPARISON BASED ON CYCLIC AND DEGREE ENTROPIES
The parameters used to construct the small-world (p and k) and scale-free (m) networks are those that give the least deviation from the actual network's known properties, such as average degree, clustering coefficient and others. Therefore, the comparison of the actual complex network with other complex network types is meaningful. As a cross-check, we experimented with the rewiring parameters, evaluated the cyclic entropy and compared it with that of the actual network to support the conclusions made in this work. Having evaluated the cyclic entropy of the actual, small-world, random and scale-free networks for different network sizes, summarized in Figure 4, we attempt to characterize the type of a real complex network by comparing its cyclic entropy with that of the other network types.
Fig. 4. Comparison between different complex network types using cyclic entropy.
Apparently, the cyclic entropy of the actual network equals that of the small-world network up to size 350, beyond which the small-world entropy diverges from that of the actual network. The actual network has a constant cyclic entropy, independent of the size of the complex network. Such constancy also exists in the random network, although at a higher cyclic entropy. According to the cyclic entropy approach, the actual network has the constancy of a random network but the value of a small-sized small-world network. One interesting finding is the minimum cyclic entropy observed in the small-world network when the network size equals 150. Such a property is known in sociological research ([17] and references therein). Among all the studied networks, only the small-world network showed such a property. Minimum entropy suggests the highest state of order in the system. Primitive societies tend to divide themselves naturally when their sizes approach 150; otherwise the community will not function optimally. To our knowledge, no other approach has demonstrated the possible existence of a critical small-world network size. The cyclic entropy of scale-free networks seems erratic, and no information can be deduced from such high fluctuations in its value over the studied sizes of scale-free networks. On the other hand, such non-monotonic behavior leads to an interesting view of the robustness of scale-free networks.
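The cyclic entropy values compared in this section follow from Equation (1) once the cycle counts of a network are known. The Python sketch below shows that last step only; it is an illustration, not the authors' Java program. The dictionary input format and the function name are assumptions, and the infinite value returned for a network without cycles follows the convention stated in Section II.A.

```python
import math

def cyclic_entropy(cycle_counts):
    """Entropy of the cycle-size distribution, following Equation (1).

    `cycle_counts` maps a cycle length k to the number of cycles of that
    length (hypothetical input format). P(k) is the fraction of all
    cycles that have length k.
    """
    total = sum(cycle_counts.values())
    if total == 0:
        # Section II.A assigns infinite entropy to a non-cyclic network.
        return math.inf
    return sum((c / total) * math.log(total / c)
               for c in cycle_counts.values() if c > 0)

# Complete graph on four nodes: 4 triangles and 3 cycles of length four.
k4_cycles = {3: 4, 4: 3}
print(cyclic_entropy(k4_cycles))
```

Repeating this calculation for networks of increasing size, with cycle counts obtained from the approximate algorithm of Figure 3, would give one curve per network type of the kind summarized in Figure 4.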
Although it has been proven [4] that scale-free networks are not as prone to attacks as others, the size of the network is essential and relevant to such a conclusion. Figure 4 shows sizes of scale-free networks that can be considered the most ordered yet unstable (such as n = 300 and 550) and sizes with the least order but stable (such as n = 400 and 800). Further investigations are required to confirm such findings for large scale-free networks. The implication of the existence of a critical size of scale-free networks should not be taken lightly, as the Internet is a scale-free network. The constant growth of the Internet through the addition of nodes emphasizes the significance of knowing the critical size that leads to an ordered configuration, i.e. the minimum cyclic entropy. In the next experiment we evaluated and compared the degree entropy of the actual, small-world, random and scale-free networks for different network sizes, summarized in Figure 5. We attempt to characterize the type of real complex networks by comparing their degree entropy with that of the other network types. Apparently, the degree entropy of the actual network is similar in trend to that of the scale-free networks. The actual network has an almost constant degree entropy, independent of the size of the complex network. Such constancy exists in the scale-free network for sizes greater than 200 nodes.
Fig. 5. Comparison between different complex network types using degree entropy.
V. CONCLUSION
Robustness of a complex network implicitly assumes that the network is resilient to random failures and attacks. This work analyzes friends' websites that exhibit good and robust complex networks. We measure the degree of robustness of such a network by calculating its entropy statistically, that is, by measuring how ordered the network is. We think that such a characterization is important and will have major effects on the field of complex networks. When the cyclic entropy of different networks is compared with respect to the corresponding network size, clear distinctions arise in the value of the cyclic entropy of each type.
The actual network, the friendship network in this study, shows dual behavior: it has the constancy of a random network but the cyclic entropy value of a "small-sized" small-world network. Such a property of actual networks has been noted in the literature in many studies using different characterizing parameters, such as in the analysis of the degree of diffusion [18]. One interesting finding is the sensitivity of the small-world (SW) network to the network size. The SW network shows a high degree of order, in other words a minimum cyclic entropy, when the network size is 150 or 700. An SW network of size 150 or 700 is thus a critically ordered system: communication, and perhaps the effectiveness of "living together", within the network becomes high, but any small perturbation to such a system will lead to disorder. Hence, it is recommended not to keep a small-world network configuration at such sizes. Another finding is the erratic behavior of the cyclic entropy of scale-free (SF) networks with respect to the network size. Interestingly, the SF network, compared with the rest of the networks, is the least ordered but the closest to thermodynamic equilibrium, a state with no driving force in the system towards change. Furthermore, when the network size is 400, 800, 1200 or another multiple of 400, SF networks are at their highest point of disorder and of resistance to change, and hence to failure. The value of the cyclic entropy at these maxima is approximately 12. We searched for a network in equilibrium, one that shows the most disorder and resistance to failure: a complex network with a large entropy value is in its most disordered state, the equilibrium state, and hence at its most stable and robust point. We found this equilibrium state by calculating the cyclic entropy statistically.
VI. FUTURE WORK
The calculation of entropy using the cyclic method will be compared in the future with other methods of calculation. Furthermore, the cyclic calculation of entropy is a novel concept that can be explored in further detail by considering several types of complex networks.
Considering network evolution models, nodal attribute models or exponential random graph models is essential because of their generality and their ability to represent dynamic networks. However, applying our methodology to characterize such networks requires an efficient algorithm to count the cycles in these dynamic models [29]. Our future work will concentrate on proposing a new, efficient counting algorithm, even if it yields a good-enough approximation rather than an exact solution [9, 19, 20].
REFERENCES
[1] B. Wang, H. Tang, C. Guo, and Z. Xiu, “Entropy optimization of scale-free networks’ robustness to random failures,” Physica A, vol. 363, no. 2, pp. 591–596, 2005.
[2] L. d. F. Costa, F. A. Rodrigues, G. Travieso, and P. R. Villas Boas, “Characterization of complex networks: A survey of measurements,” 2006.
[3] R. Albert and A.-L. Barabási, “Statistical mechanics of complex networks,” Reviews of Modern Physics, vol. 74, 2002.
[4] R. Albert, H. Jeong, and A.-L. Barabási, “Error and attack tolerance of complex networks,” Nature, vol. 406, no. 6794, pp. 378–382, 2000.
[5] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, pp. 509–512, 1999.
[6] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, C. E. Shannon and W. Weaver, Eds. University of Illinois Press, 1975.
[7] P. Cvitanović, R. Artuso, P. Dahlqvist, R. Mainieri, G. Tanner, G. Vattay, N. Whelan, and A. Wirzba, Chaos: Classical and Quantum. http://chaosbook.org/, 2004.
[8] Information Theory of Complex Networks: On Evolution and Architectural Constraints, vol. 650, 2004.
[9] K. Mahdi, M. Safar, and I. Sorkhoh, “Entropy of robust social networks,” in IADIS International Conference e-Society, Algarve, Portugal, 2008.
[10] Network Entropy Based on Topology Configuration and Its Computation to Random Networks, vol. 25, 2008.
[11] A. T. Stephen and O. Toubia, “Explaining the power-law degree distribution in a social commerce network,” Social Networks, vol. 31, no. 4, 2009.
[12] Hi5, “Hi5,” 2008. [Online]. Available: www.Hi5.com
[13] I. Sorkhoh, K. Mahdi, and M. Safar, “Computation nonintensive estimation algorithm for counting cycles,” in iiWAS, 2010.
[14] M. Kardar, Statistical Physics of Fields. Cambridge University Press, 2007.
[15] E. Marinari, R. Monasson, and G. Semerjian, “An algorithm for counting circuits: application to real-world and random graphs,” 2005.
[16] J. S. Yedidia, W. T. Freeman, and Y. Weiss, Understanding Belief Propagation and its Generalizations. Morgan Kaufmann Publishers Inc., 2003, pp. 239–269, ch. 8.
[17] M. Gladwell, The Tipping Point. Little, Brown, 2000.
[18] K. Mahdi, M. Safar, and S. Torabi, “A model of diffusion parameter characterizing social networks,” in IADIS International Conference e-Society, 2009.
[19] M. Safar, K. Mahdi, and I. Sorkhoh, “Maximum entropy of fully connected social network,” in IADIS International Conference Web Based Communities, Amsterdam, Holland, 2008.
[20] I. Sorkhoh, M. Safar, and K. Mahdi, “Classification of social networks,” in IADIS International Conference WWW/Internet, Freiburg, Germany, 2008.