Relationship between the in-degree and out-degree of ... - Google Sites

ARTICLE IN PRESS

Physica A 371 (2006) 861–869 www.elsevier.com/locate/physa

Relationship between the in-degree and out-degree of WWW Jianguo Liua,, Yanzhong Danga, Zhongtuo Wanga, Tao Zhoub a

Institute of System Engineering, Dalian University of Technology, 2 Ling Gong Road, Dalian, 116023 Liaoning, PR China b Department of Modern Physics, University of Science and Technology of China, Hefei Anhui 230026, PR China Received 16 December 2005; received in revised form 3 March 2006 Available online 2 May 2006

Abstract In this paper, the relationship between the in-degree and out-degree of World-Wide Web is studies. At each time step, a new node with out-degree kout is added, where kout obeys the power-law distribution and its mean value is m. The analytical and simulation results suggest that the exponent of in-degree distribution would be gi ¼ 2 þ 1=m, depending on the average out-degree. This finding is supported by the empirical data, which has not been emphasized by the previous studies on directed networks. r 2006 Elsevier B.V. All rights reserved. Keywords: Complex networks; Scale-free networks; Small-world networks; Disordered systems

1. Introduction The last few years have burst a tremendous activity devoted to the characterization and understanding of complex network [1–4]. Researchers described many real-world systems as complex networks with nodes representing individuals or organizations and edges mimicking the interaction among them. Commonly cited examples include technological networks, information networks, social networks and biological networks [4]. The results of many experiments and statistical analysis indicate that the networks in various fields have some common characteristics. They have small average distance like random graphs, large clustering coefficients like regular networks and scale power-law degree distribution. The above characters are called the small-world effect [5] and scale-free property [6]. Motivated by the empirical studies on various real-life networks, some novel network models were proposed recently. The first successful attempt to generate networks with high clustering coefficient and small path length is that of Watts and Strogatz (WS model) [5]. The WS model starts with a ring lattice with N nodes in which every node is connected to its first 2m neighbors. The small-world effect emerges by randomly rewiring each edge of the lattice with probability p such that self-connections and duplicate edges are excluded. The

Corresponding author.

E-mail addresses: [email protected] (J. Liu), [email protected] (Y. Dang), [email protected] (Z. Wang), [email protected] (T. Zhou). 0378-4371/$ - see front matter r 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.physa.2006.03.054

ARTICLE IN PRESS 862

J. Liu et al. / Physica A 371 (2006) 861–869

rewiring edges are called long-range edges which connect nodes that otherwise may be part of different neighborhoods. Recently, some authors have demonstrated that the small-world effect can also be produced by using deterministic methods [7–9]. Another significant model capturing the scale-free property is proposed by Baraba´si and Albert (BA network) [6,10]. Two special features, i.e., the growth and preferential attachment, are investigated in the BA networks for the free scaling of the Internet, WWW and scientific co-authorship networks, etc. This points to the fact that many real-world networks continuously grow by the way that new nodes added to the network, and would like to connect to the existing nodes with large number of neighbors. While the BA model captures the basic mechanism which is responsible for the power-law degree distribution, it is still a minimal model with several limitations: it only predicts a fixed exponent in a power-law degree distribution, and the clustering coefficients of BA networks is very small and decrease with the increasing of network size, following approximately Cln2 N=N [11]. To further understand various microscopic evolution mechanisms and overcome the BA model’s discrepancies, there have been several promising attempts. For example, the aging effect on nodes’ charms leads the studies on the aging models [11–14], the geometrical effect on the appearance probability of edges leads the studies on the networks in Euclidean space [15–17], and the self-similar effect on the existence of hierarchical structures leads the studies on the hierarchical models [18–22]. One of the extensively studied network is the World-Wide Web [23–28], which can be treated as a directed network having power-law distributions for both in-degree and out-degree. In addition, it is a small-world network. Since the knowledge of the evolution mechanism is very important for the better understanding of the dynamics built upon WWW, many theoretical models have been constructed previously [29–33]. However, these models have not considered the relationship between the in-degree distribution and the out-degree distribution. In this paper, we study the relationship between the in-degree and out-degree of World-Wide Web. Assuming that the out-degree of the new added nodes obeys the power-law degree distribution, we find that the exponent of the in-degree distribution is determined by the average out-degree. By the way, this model displays both scale-free and small-world properties. Comparisons among the empirical data, analytic and simulation results suggest the present model is a valid one. The rest of this paper is organized as follows: in Section 2, the present model is introduced. In Section 3, the analyzes and simulations on network properties are shown, including the degree distribution, the average distance, and the clustering coefficient. Finally, in Section 4, the main conclusion is drawn. 2. The model (1) Initial condition: The model starts with a fully connected graph of N 0 nodes and m0 edges. (2) Growth: At each time step i, a new node v with 2ei edges is added and would connect to 2ei existing nodes. Time i is identified as the number of time steps. (3) The first step: v is attached to the existing ei nodes, denoted by the set Qi , with the probability proportional to their in-degree. (4) The second step: For each node x 2 Qi , one of its neighbors is randomly selected to connect to v. The selfconnections and duplicate edges are excluded. The out-degree distribution is constructed by man-made node series, which out-degree distribution obeys the power-law with exponent gout ¼ 2:7. The average out-degree m can be obtained by adjusting the length of the series. It should be emphasized that, since the out-degree of the WWW network is not fixed but approximately obeying a power-law, the number of newly added edges during one time step, 2e, is not a constant but a random number also obeying a power-law. And the average out-degree m is fixed, which significantly affects the in-degree distribution exponent, average distance and clustering coefficient of the whole network.

ARTICLE IN PRESS J. Liu et al. / Physica A 371 (2006) 861–869

863

3. Exponent of the in-degree distribution The probability that a newly appearing node connects to a previous node is simply proportional to the indegree k of the old vertex. Because the in-degree of the new added node is zero, it cannot be selected by other nodes. We define the node initial attraction A, which is generated by itself, but not by its connections. Then the probability of attachment to the old vertices should be proportional to k þ A, where A is a constant and we set A ¼ 1 for simplicity [34]. The probability that a new edge attaches to any of the vertices with degree k is ðk þ 1Þpk ðk þ 1Þpk P . ¼ mþ1 k ðk þ 1Þpk

(1)

The mean out-degree of the newly added node is simply m, hence the mean number of new edges to nodes with current in-degree k is ðk þ 1Þpk m=ðm þ 1Þ. Denote pk;n the value of pk when the network size is n, then the change of npk is 8 m½kpk1;n ðk þ 1Þpk;n > > ; kX1; < ðn þ 1Þpk;nþ1 npk;n ¼ mþ1 (2) m > > ; k ¼ 0: : ðn þ 1Þp0;nþ1 np0;n ¼ 1 p0;n mþ1 The stationary condition pk;nþ1 ¼ pk;n ¼ pk yields ( ½kpk1 ðk þ 1Þpk m=ðm þ 1Þ; kX1; pk ¼ 1 p0 m=ðm þ 1Þ; k ¼ 0: Rearranging, one gets 8 k > < p ; k þ 2 þ 1=m k1 pk ¼ > : ðm þ 1Þ=ð2m þ 1Þ;

kX1;

(3)

(4)

k ¼ 0:

This yields kðk 1Þ 1 p ðk þ 2 þ 1=mÞ ð3 þ 1=mÞ 0 ¼ ð1 þ 1=mÞBðk þ 1; 2 þ 1=mÞ,

pk ¼

ð5Þ

where Bða; bÞ ¼ GðaÞGðbÞ=Gða þ bÞ is Legendre’s beta function [35], which goes asymptotically as ab for large a and fix b, hence pk kð2þ1=mÞ .

(6) gi

This leads to pk k with gi ¼ ð2 þ 1=mÞ for large N, where gi is the exponent of the in-degree distribution. In Fig. 1, the degree distributions for m ¼ 1; 2; 3; 4 are shown. The simulation results agree with the analytic ones very well and indicate that the exponents of the degree distribution have no relationship with the network size N. One of the significant empirical results on the in- and out-degree distributions is reported by Albert et al. [36]. In this paper, the crawl from Altavista was used. The appearance of the WWW from the point of view of Altavista is as follows [3]: In May 1999 the Web consisted of 203 106 nodes and 1466 106 hyperlinks. The average in- and outdegree were kin ¼ kout ¼ 7:22. In October 1999 there were already 271 106 nodes and 2130 106 hyperlinks. The average in- and outdegree were kin ¼ kout ¼ 7:85. The distributions were found to be of a power-law form with exponent gi ¼ 2:1 and go ¼ 2:7, where go is the exponent of the out-degree distribution.


864

(a)

(b)

(c)

(d)

Fig. 1. (Color online) Degree distribution of the network, with m ¼ 1; 2; 3; 4. In this figure, PðkÞ denotes the probability that a node with in-degree k in the network. (a) When m ¼ 1, the power-law exponent g of the density functions are g20 000 ¼ 2:95 0:06 and g80 000 ¼ 2:97 0:04. The inset shows the out-degree distribution. (b) When m ¼ 2, g20 000 ¼ 2:46 0:07 and g80 000 ¼ 2:47 0:03. (c) When m ¼ 3, g20 000 ¼ 2:29 0:08 and g80 000 ¼ 2:31 0:03. (d) When m ¼ 4, g20 000 ¼ 2:21 0:07 and g80 000 ¼ 2:23 0:03.

When kout ¼ 7:22 and 7:85, one can obtain from gi ¼ 2 þ 1=m that gi ¼ 2:138 and 2:127, respectively, which is very close to the real value 2.1 [3], thus give a strong support to the validity of the present model. 4. Other statistical characteristics 4.1. Average distance The average distance plays a significant role in measuring the transmission delay, thus is one of the most important parameters to measure the efficiency of communication network. Since the original conception of small-world effect is defined based on undirected networks, hereinafter we only consider the undirected version of our model, that is, the directed edge E ij from node i to j is considered to be an bidirectional edge between node i and j. When the node is added to the network, each node of the network according to the time is marked. Denote dði; jÞ the distance between nodes i and j, the average distance with network size N is defined as LðNÞ ¼

2sðNÞ , NðN 1Þ

(7)

where the total distance is X sðNÞ ¼ dði; jÞ.

(8)

1piojpN

Clearly, the distance between the existing nodes will not increase with the network size N, thus we have sðN þ 1ÞpsðNÞ þ

N X i¼1

dði; N þ 1Þ.

(9)


865

Denote y ¼ fy1 ; y2 ; . . . ; yl g as the node set that the ðN þ 1Þth node have connected. The distance dði; N þ 1Þ can be expressed as follows: dði; N þ 1Þ ¼ minfdði; yj Þg þ 1, (10) where j ¼ 1; 2; . . . ; l. Combining the results above, we have X Dði; yÞ, sðN þ 1ÞpsðNÞ þ ðN lÞ þ

(11)

L

where L ¼ f1; 2; . . . ; P Ng fy1 ; y2 ; . . . ; yl g is a node set with cardinality N l. Consider the set y as a single node, then the sum i¼L dði; yÞ can be treated as the distance from all the nodes in L to y, thus the sum P i¼L dði; yÞ can be expressed approximately in terms of LðN lÞ X dði; yÞ ðN lÞLðN lÞ. (12) i¼L

Because the average distance LðNÞ increases monotonously with N, this yields 2sðN lÞ 2sðNÞ o . ðN lÞLðN lÞ ¼ ðN lÞ ðN lÞðN l 1Þ N l 1

(13)

Then we can obtain the inequality 2sðNÞ . (14) N l1 Enlarge sðNÞ, then the upper bound of the increasing tendency of sðNÞ will be obtained by the following equation: dsðNÞ 2sðNÞ ¼N lþ . (15) dN N l1 This leads to the following solution: sðN þ 1ÞosðNÞ þ ðN lÞ þ

sðNÞ ¼ ðN l 1Þ2 logðN l 1Þ ðN l 1Þ þ C 1 ðN l 1Þ.

(16)

2

From Eq. (7), we have that sðNÞN LðNÞ, thus LðNÞ ln N. Since Eq. (14) is an inequality, the precise increasing tendency of the average distance LðNÞ may be a little slower than ln N. The simulation results are reported in Fig. 2. 4.2. The clustering coefficient The clustering coefficient is defined as C ¼ Ci ¼

2EðiÞ ki ðki 1Þ

PN

i¼1 C i =N,

where (17)

is the local clustering coefficient of node i, and EðiÞ is the number of edges among the neighboring node set of node i. Approximately, when the node i is added to the network, it is of degree 2ei and EðiÞ ei if the network is sparse enough. If a new node is added as i’s neighbor, EðiÞ will increase by 1 if the network is sparse enough. Therefore, in terms of ki the expression of EðiÞ can be written as follows: EðiÞ ¼ ei þ ðki 2ei Þ ¼ ki ei .

(18)

Hence, we have that Ci ¼

2ðki ei Þ . ki ðki 1Þ

(19)

This expression indicates that the local clustering scales as CðkÞk1 . It is interesting that a similar scaling has been observed in pseudofractal web [19] and several real-life networks [18]. Consequently, we have C¼

N N 2X k i ei 2X kin ¼ , N i¼1 ki ðki 1Þ N i¼1 ki ðki 1Þ

(20)


866 6.5 6 5.5

m=1 m=2 m=3 m=4 0.715*ln(N) 0.52*ln(N)

5

log(L)

4.5 4 3.5 3 2.5 2 1.5 102

103 Number of nodes

104

Fig. 2. (Color online) The average distance L vs. network size N of the undirected network. One can see that L increases very slowly as N increases. The main plot exhibits the curve where L is considered as a function of ln N, which is well fitted by a straight line. When m ¼ 1, the curve is above the fitting line when Np3000 and under the line when NX4000. When m ¼ 2; 3; 4, the curve is under the line when NX200, which indicates that the increasing tendency of L is approximately to ln N, and in fact a little slower than ln N. 101 N=50000

m=1 m=2 m=3 m=4 slope=-3

100

P(k)

10-1

10-2

10-3

10-4

10-5 0 10

101

102 k

103

104

Fig. 3. (Color online) Degree distribution of the undirected versions of the present model. At each time step, the new node selects m ¼ 1; 2; 3; 4 edges to connected, respectively. When m ¼ 1; 2; 3; 4, the power-law exponent g of the density functions are g1;80 000 ¼ 2:95 0:06, g2;80 000 ¼ 2:97 0:05, g3;80 000 ¼ 2:96 0:07 and g4;80 000 ¼ 2:96 0:06, respectively. The dash line have slope 3:0 for comparison.

where kin denotes the in-degree of node i. Because the average out-degree is m, one can replace the out-degree of each node with m to calculate its clustering coefficient. Because the average clustering coefficient C is defined on the undirected network, we give the undirected degree distribution firstly. From Fig. 3, one can get that the degree distribution of the undirected network is pðkÞk3 , where k ¼ kmin ; kmin þ 1; . . . ; kmax . As an example, the clustering coefficient C when m ¼ 1 can be rewritten as C¼

N 2X 1 . N i¼1 ki

(21)


867

0.9

Average Clustering Coefficient

0.8

0.7

0.6 m=1 m=2 m=3 m=4

0.5

0.4

0.3

0.2

0.1

0

500

1000

1500

2000 2500 3000 Number of nodes

3500

4000

4500

5000

Fig. 4. (Color online) The average clustering coefficient vs. the network size N to different m of the undirected versions of the present model. In this figure, when m ¼ 1; 2; 3; 4, one can find that the clustering coefficient of the network is almost a constant 0.74, 0.28, 0.18 and 0.14, respectively. This indicates that the average clustering coefficient is relevant to the average out-degree m.

Since the degree distribution is pðkÞ ¼ c1 k3 , where k ¼ 2; 3; . . . ; kmax . The clustering coefficient C can be rewritten as C¼

kmax kmax X X 2 NpðkÞ ¼ 2c1 k4 . N k k¼2 k¼2

(22)

For sufficient large N, kmax b2. The parameter c1 satisfies the normalization equation kmax X

pðkÞ dk ¼ 1.

(23)

k¼2

P max 4 It can be obtained that c1 ¼ 4:9491 and C ¼ 2 4:9491 kk¼2 k ¼ 0:8149. The demonstration exhibits that most real-life networks have large clustering coefficients no matter how many nodes they have. From Fig. 4, one can get that as the average out-degree increases, the clustering coefficient decreases dramatically, which indicates that the clustering coefficient C is relevant to the average out-degree m. This intrinsic hierarchy can be characterized in a quantitative manner using the recent findings of Dorogovtsev et al. [19] and Ravasz et al. [18]. The relationships between the clustering coefficient CðkÞ and the degree k indicates that the clustering coefficient and the degree k follow the reciprocal law, which is demonstrated in Fig. 5. 5. Conclusion and discussion In summary, we argue that the degree distribution of many real-life directed networks may be fitted appropriately by two power-law distributions, i.e., in- and out-degree power-law distributions, such as the WWW, citation network, Internet network and World-Wide Web. The present model demonstrated that if the out-degree of new added node obeys the power-law distribution, the exponent of in-degree distribution would be gi ¼ 2 þ 1=m, where m is the average out-degree. Numerical study indicates the exponent of the in-degree distribution of the presented networks can be well fitted by 2 þ 1=m, which has been observed in the empirical


868 101

m=1 m=2 m=3 m=4 Slope=-1

C(k)

100

10-1

10-2 0 10

101

102

103

k

Fig. 5. (Color online) Dependence between the clustering coefficient and the degree k when N ¼ 2000. One can see that the clustering coefficient and the degree k follow the reciprocal law.

data. Although this model is simple and rough, it offers a good starting point to explain the existing empirical data and the relationship between the in- and out-degree distribution exponents. Acknowledgments The authors are grateful to Dr. Qiang Guo for her valuable comments and suggestions, which have led to a better presentation of this paper. This work has been supported by the National Science Foundation of China under Grant Nos. 70431001 and 70271046. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23]

S.H. Strogatz, Nature 410 (2001) 268. R. Albert, A.-L. Baraba´si, Rev. Mod. Phys. 74 (2002) 47. S.N. Dorogovtsev, J.F.F. Mendes, Adv. Phys. 51 (2002) 1079. M.E.J. Newmann, SIAM Rev. 45 (2003) 167. D.J. Watts, S.H. Strogatz, Nature 393 (1998) 440. A.L. Baraba´si, R. Albert, Science 286 (1999) 509. F. Comellas, M. Sampels, Physica A 309 (2002) 231. F. Comellas, G. Fertin, Phys. Rev. E 69 (2004) 037104. T. Zhou, B.-H. Wang, P.-M. Hui, K.-P. Chan, arXiv: cond-mat/0405258. A.-L. Baraba´si, R. Albert, H. Jeong, Physica A 272 (1999) 173. K. Klemm, V.M. Eguı´ luz, Phys. Rev. E 65 (2002) 036123. L.A.N. Amaral, A. Scala, M. Barthe´le´my, H.E. Stanley, Proc. Natl. Acad. Sci. USA 97 (2000) 11149. S.N. Dorogovtsev, J.F.F. Mendes, Phys. Rev. E 62 (2000) 1842. P.-Q. Jiang, B.-H. Wang, T. Zhou, Y.-D. Jin, Z.-Q. Fu, P.-L. Zhou, X.-S. Luo, Chin. Phys. Lett. 22 (2005) 1285. S.H. Yook, H. Jeong, A.-L. Baraba´si, Proc. Natl. Acad. Sci. USA 99 (2002) 13382. S.S. Manna, P. Sen, Phys. Rev. E 66 (2002) 066114. S.S. Manna, G. Mukherjee, P. Sen, Phys. Rev. E 69 (2004) 017102. E. Ravasz, A.-L. Baraba´si, Phys. Rev. E 67 (2003) 026112. S.N. Dorogovtsev, A.D. Goltsev, J.F.F. Mendes, Phys, Rev. E 65 (2002) 066122. J.S. Andrade, J.H. Hermann, R.F.S. Andrade, L.R. da Silva, Phys. Rev. Lett. 94 (2005) 018702. T. Zhou, G. Yan, B.H. Wang, Phys. Rev. E 71 (2005) 046141. Z.-M. Gu, T. Zhou, B.-H. Wang, G. Yan, C.-P. Zhu, Z.-Q. Fu, arXiv: cond-mat/0505175. R. Albert, H. Jeong, A.-L. Baraba´si, Nature 401 (1999) 130.

ARTICLE IN PRESS J. Liu et al. / Physica A 371 (2006) 861–869 [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35]

869

L.A. Adamic, B.A. Huberman, Science 287 (2000) 2115. A.-L. Baraba´si, R. Albert, H. Jeong, Physica A 281 (2000) 69. B.A. Huberman, The Laws of the Web, MIT Press, Cambridge, MA, 2001. P.L. Krapivsky, G.J. Rodgers, S. Redner, Phys. Rev. Lett. 86 (2001) 5401. P.L. Krapivsky, S. Redner, Comput. Networks 39 (2002) 261. S. Bornholdt, H. Ebel, Phys. Rev. E 64 (2001) 035104. B. Tadic´, Physica A 293 (2001) 273. B. Tadic´, Physica A 314 (2002) 278. B. Kahng, Y. Park, H. Jeong, Phys. Rev. E 66 (2002) 046107. P. Holme, B.J. Kim, Phys. Rev. E 65 (2002) 026107. S.N. Dorogovtsev, J.F.F. Mendes, A.N. Samukhin, Phys. Rev. Lett. 85 (2000) 4633. M. Abramowitz, I.A. Stegun, Handbook of Mathematical Functions, Applied Mathematics Series, vol. 55, National Bureau of Standards, Washington (Reprinted 1968 by Dover Publications, New York), 1964 (Chapter 6). [36] R. Albert, H. Jeong, A.-L. Baraba´si, Nature 401 (1999) 130.