PHYSICAL REVIEW E 89, 042809 (2014)

Overlapping-box-covering method for the fractal dimension of complex networks

Yuanyuan Sun and Yujie Zhao
College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
(Received 30 July 2013; published 17 April 2014)

The fractality and self-similarity of complex networks have been widely investigated by evaluating the fractal dimension, the crux of which is how to locate the optimal solution, that is, how to tile the network with the fewest boxes. The results yielded by the box-covering method with separated boxes exhibit considerable randomness or large errors. In this paper, we adopt overlapping boxes to tile the entire network, an approach we call the overlapping-box-covering method. To verify its validity, we propose an overlapping-box-covering algorithm and apply it first to three deterministic networks and then to four real-world fractal networks. For the former, it produces the optimal box counts or a more accurate fractal dimension; for the latter, the numbers of boxes finally obtained are smaller and more deterministic, with the proportion of redundant boxes reaching up to 33.3%. The experimental results show that the overlapping-box-covering method is feasible and that overlapping boxes outperform separated ones, yielding smaller errors. Moreover, we conclude that allowing boxes to overlap is an important factor in obtaining the fewest boxes for complex networks.

DOI: 10.1103/PhysRevE.89.042809

PACS number(s): 89.75.Hc, 89.20.Ff

I. INTRODUCTION

In recent decades, complex networks have been studied extensively, and notable progress has been made in many aspects, such as topological characteristics [1–23], network models [24–29], and network dynamics [30–37]. Studies of fractality and self-similarity in networks have also achieved outstanding results [38–49]. It has been suggested that fractal scaling originates from disassortative degree-degree correlations [41] or the repulsion between hubs [42], as well as from the underlying tree structure of networks, called the skeleton [43,45]. Fractal scaling is the power-law relationship between the minimum number of boxes, $N_B(l_B)$, needed to tile the entire network and the linear size of the boxes, $l_B$, i.e.,

$$N_B(l_B) \sim l_B^{-d_B}, \tag{1}$$

with a finite fractal dimension $d_B$ [50]. Fractal networks are always self-similar, but some self-similar networks are not fractal; in other words, fractality and self-similarity do not always imply each other [43], or are disparate notions in some networks [46].

The box-counting method is the most widely used way to measure the fractal dimension [50]. Its dominance stems from the ease with which it can be computed automatically. Using a generalized box-covering method based on the chemical distance, the intrinsic metric of networks, Song et al. discovered that many real-world scale-free networks are fractal [44]. In this definition, which is applicable to complex networks, a box is a set of nodes in which the maximum separation between any two nodes is less than $l_B$; this differs from the definition in Euclidean space given by Hausdorff [50–52]. Nevertheless, the crux of the matter is how to locate the optimal solution, i.e., how to tile the entire network with the fewest possible boxes of a given size, which is an NP-hard problem [53].
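In practice, $d_B$ in Eq. (1) is commonly estimated as the negative slope of a straight-line fit to $\log N_B(l_B)$ versus $\log l_B$. The following Python sketch illustrates that fit with ordinary least squares. It assumes the box counts are already available; the function name `estimate_fractal_dimension` and the synthetic counts are hypothetical and are not taken from the paper.

```python
import math

def estimate_fractal_dimension(box_counts):
    """Return the negative slope of log N_B versus log l_B.

    box_counts maps a box size l_B to a box count N_B(l_B)
    (hypothetical input format, at least two distinct sizes).
    """
    xs = [math.log(l_b) for l_b in box_counts]
    ys = [math.log(n_b) for n_b in box_counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope of y against x.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -s_xy / s_xx

# Synthetic counts that follow N_B = 1024 * l_B**(-2) exactly,
# so the estimate should be close to 2 by construction.
synthetic_counts = {l_b: 1024 * l_b ** -2 for l_b in (1, 2, 4, 8)}
d_b = estimate_fractal_dimension(synthetic_counts)
```

On measured box counts the fit is normally restricted to the range of $l_B$ where the log-log plot is roughly linear; choosing that range is left to the reader here.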


1539-3755/2014/89(4)/042809(7) ©2014 American Physical Society

On this basis, several box-covering algorithms have been proposed to count the boxes needed for complex networks. First, the random sequential (RS) box-covering algorithm was introduced in Refs. [45–47]. The size $r_B$ is regarded as the radius of a box, so the diameter of the box is $2r_B + 1$. That is, only odd box sizes are considered, and even ones are ignored. Strictly speaking, it does not help to find the optimal solution, although it correctly identifies the fractal property of the underlying network structure. In addition, it selects the position of the center of each box at random, which makes it easy to carry out but makes the results random. Second, Song et al. proposed three algorithms: the greedy coloring (GC) algorithm, the compact-box-burning (CBB) algorithm, and the maximum-excluded-mass-burning (MEMB) algorithm [48]. Both the GC and CBB algorithms construct boxes using the criterion of diameter $l_B$, and their results at a given $l_B$ follow a Gaussian distribution; thus they are similar. Moreover, both are simple to implement. The MEMB algorithm constructs boxes with radius $r_B = (l_B - 1)/2$, i.e., $l_B = 2r_B + 1$, neglecting even $l_B$. It seeks to cover networks with boxes of maximum “excluded mass,” where a node’s excluded mass is the number of uncovered nodes at a distance less than $r_B$ from it; ultimately, the results are nearly deterministic. Recently, a new box-covering algorithm proposed in Ref. [49] shares a common spirit with the MEMB algorithm: it constructs boxes with radius $r_B = (l_B - 1)/2$ around a central node and $r_B = l_B/2$ around a central edge, which weakens the criterion on the distance between any two nodes. It counts the boxes created by each node or edge and finds the boxes necessary to tile the whole network, with an improvement of up to 15% in the case of the World Wide Web (WWW) network, but it takes a long time.

Nevertheless, the radius criterion $r_B$ has pathological cases: loop structures with more than three connected nodes. Given a loop with four nodes (edges 1-2, 2-3, 3-4, and 4-1), at least two boxes are needed for $r_B = 1$, but only one for $l_B = 2r_B + 1 = 3$. Thus, to reach the optimal solution it is essential to construct boxes using the criterion of diameter $l_B$. However, all boxes finally obtained by the algorithms mentioned above are separated, or nonoverlapping, which causes the results to be random or the errors to be larger. What will the results be if the boxes constructed in the box-covering method overlap with each other? We call such a box an overlapping box. Just as with overlapping communities [16], from a practical perspective one cannot say that a node belongs to only one community or box. In the following, we demonstrate the plausibility of covering networks with overlapping boxes through a detailed analysis.

II. OVERLAPPING-BOX-COVERING METHOD
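The four-node loop mentioned at the end of the Introduction gives a quick, concrete check of the difference between the radius and diameter criteria. The Python sketch below is illustrative only. It assumes that a box of radius $r_B$ contains every node within distance $r_B$ of its center (distance less than or equal to $r_B$), and that a box of size $l_B$ requires every pairwise distance to be smaller than $l_B$. The helper names `bfs_distances`, `ball`, and `is_box` are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Chemical (shortest-path) distances from source by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def ball(adj, center, r_b):
    """Nodes within distance r_b of center (assumed radius convention)."""
    return {v for v, d in bfs_distances(adj, center).items() if d <= r_b}

def is_box(adj, nodes, l_b):
    """True if every pair of nodes is at distance smaller than l_b."""
    return all(bfs_distances(adj, u)[v] < l_b for u, v in combinations(nodes, 2))

# Loop with edges 1-2, 2-3, 3-4, and 4-1.
loop = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}

# No single ball of radius 1 reaches all four nodes, so two boxes are needed.
largest_ball = max(len(ball(loop, center, 1)) for center in loop)

# The whole loop satisfies the diameter criterion at l_B = 3.
whole_loop_is_one_box = is_box(loop, set(loop), 3)
```

Here `largest_ball` is 3 while the loop has four nodes, and `whole_loop_is_one_box` is true, which matches the counts quoted for the loop.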

For a given network $G$, a box of size $l_B$ is a set of nodes in which the distance between any two nodes is smaller than $l_B$ [44,48]. However, a node with many neighbors that belongs to one box can equally well belong to another. This means that two different boxes can overlap, which not only is consistent with the definition but also has practical significance. Figure 1 shows an example of box covering for a network $G$. A box of size $l_B$ is “compact” when it includes the maximum possible number of nodes whose pairwise distances are all smaller than $l_B$ [48]. As the upper panel of Fig. 1 shows, not all the separated boxes finally obtained by the CBB algorithm are “compact”; that is, any separated box may be “incompact,” and so the number of boxes varies. Even when a box [see the middle one in Fig. 1(b)] is “compact,” it may be redundant, i.e., unnecessary for the optimal solution of the whole network. This is one of the reasons why the results exhibit great randomness. In contrast, all boxes are “compact” when they are allowed to overlap, which suggests that the overlapping box could be a good choice. Here we define a box containing at least one node that belongs to no other box as an effective box (EB); otherwise, the box is a redundant box (RB). In other words, an EB contains nodes that are covered only once, whereas all nodes in an RB are covered more than once.

FIG. 1. (Color online) An example of covering a network $G$ with boxes at $l_B = 2$. The upper panel, (a) and (b), shows two possible solutions of the CBB algorithm, with six boxes (the minimum) and seven boxes, respectively. The lower panel, (a′) and (b′), shows the solutions corresponding to (a) and (b), respectively, when boxes are allowed to overlap. There are six boxes (red) in (a′), the same as in (a). There are also six boxes in (b′) once the RB, the middle one (green), is removed, whereas (b) has one more box.

As shown in the lower panel of Fig. 1, both the EBs and the RB are identified successfully when boxes are allowed to overlap. On the one hand, in Fig. 1(a′), the number of boxes finally obtained is the same as the optimum in Fig. 1(a); thus overlapping EBs achieve the same result. On the other hand, in Fig. 1(b′), the number is smaller than that in Fig. 1(b) but the same as that in Figs. 1(a) and 1(a′) once the RB (the middle one) is removed, so the result stays consistent. This means that, for the same execution order of nodes, the number of boxes finally obtained with overlapping is smaller than or at most equal to that obtained without overlapping. To some extent, the “compact” EBs finally obtained make it possible to come closer to the optimum or even to achieve it. Furthermore, not all EBs finally obtained are overlapping: if an EB contains nodes that are covered more than once, it is an overlapping box; otherwise, it is a nonoverlapping box.

Therefore, in the following, we adopt overlapping boxes to tile the entire network, which we call the overlapping-box-covering method. On this basis, we propose a new algorithm, the overlapping-box-covering algorithm (OBCA), implemented as follows:

(1) Repeat the following for each $l_B$ from 2 to $l_B^{\max} - 1$, where $l_B^{\max}$ is the diameter of the network plus 1.
(2) Set the covered frequency of each node to zero, indicating that all nodes are initially uncovered.
(3) Going from the nodes with small degrees to those with large degrees, repeat the following until all nodes are covered.
(a) If the node is covered, continue to the next node; otherwise, select it as the seed to construct a box. Then check whether all distances between any two nodes in the box constructed from the seed are smaller than $l_B$.
(b) Select one unchecked node $i$ in the box, preferring uncovered nodes. If the distance $l_{ij}$ from $i$ to another node $j$ in the box is not smaller than $l_B$, remove $j$ from the box.
(c) Repeat (b) until all distances between any two nodes in the box are smaller than $l_B$. This yields an EB.
(d) Increase the covered frequency of each node in the box by 1, and save the box temporarily.
(4) Using the covered frequency of each node, check whether each box finally obtained is an EB. Since some EBs may be divided by others and become RBs, if an RB is found, delete it and decrease the covered frequency of each node in it by 1.

Finally, we acquire the number of boxes for a given $l_B$. Obviously, $N_B(1)$ equals the order of the network, $N$, at $l_B = 1$, and $N_B(l_B^{\max}) = 1$ at $l_B = l_B^{\max}$. Because leaf nodes (the nodes with the smallest degree in the network) must belong to an EB, we construct boxes in this ascending order, from the nodes with small degrees to those with large degrees. With any other order, the procedure is a general OBCA. The construction of the “compact” box in OBCA is similar to that in the CBB algorithm; thus OBCA is also simple to implement. Nevertheless, OBCA takes slightly more time than the CBB algorithm, because one node may be compared many times and the RBs need to be identified in the final step. In the following, we elaborate on the validity of our method.
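The OBCA steps listed above can be sketched in Python for a single value of $l_B$. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a connected, undirected graph given as an adjacency dictionary with sortable node labels. The text does not spell out the initial contents of the box built from a seed, so the sketch assumes it is every node at distance smaller than $l_B$ from the seed. Ties in the ordering are broken by node label, and the names `all_pairs_distances` and `obca_boxes` are hypothetical.

```python
from collections import deque

def all_pairs_distances(adj):
    """Shortest-path distances between all node pairs (connected graph assumed)."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[source] = d
    return dist

def obca_boxes(adj, l_b):
    """Return the effective boxes found for one box size l_b."""
    dist = all_pairs_distances(adj)
    covered = {v: 0 for v in adj}          # step (2): covered frequency
    boxes = []
    # Step (3): visit nodes in ascending order of degree.
    for seed in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if covered[seed] > 0:              # step (3a): skip covered nodes
            continue
        # Assumption: the candidate box is every node closer than l_b to the seed.
        box = {v for v in adj if dist[seed][v] < l_b}
        # Step (3b): check nodes one by one, uncovered nodes first.
        for i in sorted(box, key=lambda v: (covered[v] > 0, v)):
            if i not in box:               # i was removed by an earlier check
                continue
            box -= {j for j in box if dist[i][j] >= l_b}
        # Steps (3c) and (3d): the box is now valid; record it.
        for v in box:
            covered[v] += 1
        boxes.append(box)
    # Step (4): delete redundant boxes, whose nodes are all covered more than once.
    effective = []
    for box in boxes:
        if all(covered[v] > 1 for v in box):
            for v in box:
                covered[v] -= 1
        else:
            effective.append(box)
    return effective

# Usage on two small hypothetical graphs: a four-node loop and a three-leaf star.
loop = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
loop_boxes = obca_boxes(loop, 2)
star_boxes = obca_boxes(star, 2)
```

In the star, each of the three boxes contains the hub, so the boxes overlap at that node while every leaf is covered exactly once. Computing all-pairs distances up front keeps the sketch short; for large networks a cheaper distance strategy would be needed.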

FIG. 2. (Color online) OBCA applied to $K_{m,t}$, $M_t$, and $\mathrm{MII}_t$. [(a)–(d)] The box counts $N_B(l_B)$ versus $l_B$ obtained by OBCA (red circles) in comparison with those of the CBB algorithm (green squares), for (a) $K_{1,7}$, (b) $K_{2,6}$, (c) $M_{11}$, and (d) $\mathrm{MII}_5$. The difference $\Delta N$ (blue triangles) is the number of RBs among the boxes yielded by the CBB algorithm, i.e., $\Delta N = N_{\mathrm{CBB}} - N_{\mathrm{OBCA}}$. Insets: The probability distributions of the difference, $P(\Delta N)$ (blue triangles), and of the overlapping boxes, $P(N_{\mathrm{olap}})$ (cyan diamonds), where $P(\Delta N) = \Delta N/N_{\mathrm{CBB}}$ and $P(N_{\mathrm{olap}}) = N_{\mathrm{olap}}/N_B(l_B)$, and $N_{\mathrm{olap}}$ is the number of overlapping boxes among those produced by OBCA at a given $l_B$.
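The quantities defined in the caption of Fig. 2 are simple ratios, and the short Python sketch below shows how they could be computed from box counts at one value of $l_B$. It assumes that $N_B(l_B)$ in $P(N_{\mathrm{olap}})$ is the OBCA box count. The function name `redundancy_stats` is hypothetical. The counts 7 and 6 are the box numbers of panels (b) and (b′) in the Fig. 1 example, while the overlapping-box count of 3 is an invented illustrative value, not a result from the paper.

```python
def redundancy_stats(n_cbb, n_obca, n_olap):
    """Return (delta_n, p_delta_n, p_n_olap) for one box size l_B.

    n_cbb  : number of boxes found by the CBB algorithm
    n_obca : number of boxes found by OBCA (assumed to be N_B(l_B))
    n_olap : number of OBCA boxes that overlap with another box
    """
    delta_n = n_cbb - n_obca        # redundant boxes among the CBB boxes
    p_delta_n = delta_n / n_cbb     # P(dN) = dN / N_CBB
    p_n_olap = n_olap / n_obca      # P(N_olap) = N_olap / N_B(l_B)
    return delta_n, p_delta_n, p_n_olap

# Seven CBB boxes versus six overlapping boxes, as in Fig. 1(b) and 1(b');
# the value n_olap=3 is hypothetical.
delta_n, p_delta_n, p_n_olap = redundancy_stats(n_cbb=7, n_obca=6, n_olap=3)
```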

III. EXPERIMENTAL RESULTS

First, we apply OBCA to three deterministic networks: the Koch network $K_{m,t}$ [27], the polygon network $M_t$ [29], and the minimal network $\mathrm{MII}_t$, all of which evolve with time $t$. The construction of $\mathrm{MII}_t$ is based on Mode II of the minimal model in Ref. [42]. Here we consider four networks, $K_{1,7}$, $K_{2,6}$, $M_{11}$, and $\mathrm{MII}_5$, where $m$ ($m = 1, 2$) is the generalized dimension. The CBB algorithm is also applied to them in the same execution sequence, and the experimental results are shown in Fig. 2. The numbers of boxes finally obtained by OBCA for $K_{m,t}$ and $M_t$ at each given size $l_B$ are equal to the empirical optima, derived by mathematical induction. The optimal number of boxes for $K_{m,t}$ is

$$N_B(l_B) = \begin{cases} 3m(3m+1)^{t-n}, & l_B = 2n, \\ 2(3m+1)^{t-n} + 1, & l_B = 2n + 1, \end{cases} \tag{2}$$

where $t > 0$, $2 \le l_B \le 2t + 1$, $n \in N^*$. Similarly, for $M_t$, it is

$$N_B(l_B) = \begin{cases} 3 \times 2^{t} - 2, & l_B = 2, \\ 3 \times 2^{t-n} + N_{l_B}(t - l_B + 1), & l_B = (2n + 1) \text{ or } (2n + 2), \end{cases} \tag{3}$$

where $t > 0$, $2 \le l_B \le 2(t + 1)$, $n \in N^*$. $N_{l_B}(x)$ is defined as ⎧ 0, x