with attachment probability Ak. When Ak grows slower than linearly with k, the number of nodes ... appears that the rate equation method is better suited for.
Organization of Growing Random Networks P. L. Krapivsky and S. Redner
arXiv:cond-mat/0011094v2 [cond-mat.stat-mech] 4 Jun 2001
Center for BioDynamics, Center for Polymer Studies, and Department of Physics, Boston University, Boston, MA, 02215 The organizational development of growing random networks is investigated. These growing networks are built by adding nodes successively and linking each to an earlier node of degree k with attachment probability Ak . When Ak grows slower than linearly with k, the number of nodes with k links, Nk (t), decays faster than a power law in k, while for Ak growing faster than linearly in k, a single node emerges which connects to nearly all other nodes. When Ak is asymptotically linear, Nk (t) ∼ tk−ν , with ν dependent on details of the attachment probability, but in the range 2 < ν < ∞. The combined age and degree distribution of nodes shows that old nodes typically have a large degree. There is also a significant correlation in the degrees of neighboring nodes, so that nodes of similar degree are more likely to be connected. The size distributions of the in-components and out-components of the network with respect to a given node – namely, its “descendants” and “ancestors” – are also determined. The in-component exhibits a robust s−2 power-law tail, where s is the component size. The out component has a typical size of order ln t and it provides basic insights about the genealogy of the network. PACS numbers: 02.50.Cw, 05.40.-a, 05.50.+q, 87.18.Sn
citation and web graphs, insights developed in the field of bibliometrics [9] have been applied to help understand the structure of the web [22]. Because of the dynamic nature of the citation and web graphs, it is not surprising that their topologies at any fixed time are very different from classical random graphs. In distinction to the power-law degree distributions of the citation and web graphs, random graphs have a Poisson node degree distribution. Here node degree is defined as the number of links at a node. To overcome the shortcomings of random graphs in describing the dynamic natures of these systems, both “small-world” networks [23,24] and growing random network models [20,25–28] have been recently introduced. The former are aimed at understanding the relatively small diameter of large graphs of socially interacting units, while the latter seek to understand the growth dynamics. In this paper, we provide a comprehensive quantitative description of a simple growing network (GN) model. Our results are based on the analysis of the rate equations for the densities of nodes of a given degree. This approach bears many similarities to the rate equations for the kinetics of aggregation. The rate equations for the evolution of growing networks are relatively simple and the results that emerge are comprehensive. Thus it appears that the rate equation method is better suited for probing the structure of growing random networks compared to the classical approaches for analyzing random graphs, such as probabilistic [4] or generating function [5] techniques. The rate equation approach also has the advantage that it can be adapted to other evolving graph systems, including networks with addition and deletion of nodes and links, as well as networks with link re-wiring. We will specifically investigate two types of models: (a) the GN in which nodes are added one at a time and a link is established with a pre-existing node according to an attachment probability Ak which depends only on
I. INTRODUCTION
Networks of many interacting units play an important role in epidemiology, ecology, gene regulation, neural networks, and many other fields [1–3]. In many studies of these networks, the number of nodes is considered to be fixed and the presence of a link between two nodes is treated as a random event independent of the other links. These assumptions lead naturally to random graph models [4,5]. While these models have rich behavior and considerable utility, they are not necessarily appropriate for describing growing networks, where the addition of nodes and links may depend on the local features of the network where the growth event is taking place. Typical examples of such growing networks include transportation or electrical distribution systems, where growth occurs in response to population-driven demands. Two currently appealing examples are the distribution of scientific citations and the structure of the world-wide web. For both these examples there is now considerable data available, in spite of the very rapid growth of these systems. In the former case, one may consider papers to be the nodes of a graph and citations as the links. The structure of the resulting “citation graph” was originally studied by Lotka in 1926 [6], and then by many others [7–13]. The basic feature of this citation distribution is that it appears to have a relatively steep power-law tail; thus most papers are minimally cited while highly-cited papers are rare. Similarly, in the web graph, much structural data has recently been obtained [14–21] which suggest that the number of nodes with k links has a power-law tail, with an exponent that is somewhat larger than 2. This powerlaw tail again corresponds to the basic fact that most nodes of the web graph are unimportant, while a relatively small number of nodes garner a large fraction of “hits”. Due to the qualitative similarities between the 1
attachment probability. The latter distribution predicts a network “diameter” which grows as ln t and also provides basic insights about the genealogy of the network. We conclude in Section VII.
the degree of the target node (Fig. 1), and (b) the GN with re-direction (GNR), in which the newly-created link can be re-directed to the “ancestor” node of the original target node. An important feature of these models is that the links are directed and the resulting graphs have a simple tree-like topology. The motivation for the GNR model is that this re-direction process roughly mimics how we might (lazily) construct the references to this paper. In addition to papers that we peruse and cite directly, we are also likely to incorporate some of the references within these papers as part of our reference list. A related “copying” process also affects the organization of the web [15].
II. THE MODELS A. Growing Network (GN)
In the GN, we introduce a new node at each time step and link it to one of the earlier nodes in the network (Fig. 1). This leads to a network which has a topology of a (directed) tree graph. In terms of citations, we may interpret the nodes as publications, and the directed link from one paper to another as a citation to the earlier publication. In terms of the web graph, nodes are web pages and the directed links are hyperlinks. We will refer to the node to which the link is directed as the ancestor of the current node. As the network grows, a degree distribution Nk (t), defined as the average number of nodes with k links (k − 1 incoming and 1 outgoing) builds up. The initial node is unique as it does not have an outgoing link. The basic ingredient which determines the structure of the network is the attachment kernel Ak , defined as the probability that the newly-introduced node links to an existing node which already has k links. On general grounds, this attachment kernel should be a non-decreasing function of k, and natural scenarios are attachment kernels with a power law dependence on k. For the linear kernel, the GN reduces to the “scale free” model introduced by Barab´ asi and Albert [20] and further investigated in [25–27]. The general homogeneous model, Ak = k γ with γ ≥ 0, was investigated in [28] where it was found that the degree distribution Nk (t) crucially depends on the value of γ. For γ < 1, the linking probability grows weakly with node “popularity” and Nk (t) decreases as a stretched exponential in k for any t. The complementary case of γ > 1 leads to phenomenon akin to gelation [29] in which a single “gel” node links to nearly every other node. For γ > 2, this phenomenon is so extreme that the number of links between other nodes is finite in an infinite graph. We shall show that these results also apply for the more general situation where Ak ∼ k γ as k → ∞ in addition to the strictly homogeneous situation where Ak = k γ . The borderline case of an asymptotically linear attachment kernel, Ak ∼ k, is particularly intriguing as it leads to Nk ∼ k −ν , with the exponent ν tunable to any value larger than 2 depending on finer details of the attachment kernel. In particular, the strictly linear kernel, Ak = k, leads to ν = 3. However, by changing the value of a single attachment probability, for example A1 = α and Ak = k for k > 2, any value of ν > 2 is possible. This sensitivity of asymptotic behavior on microscopic details indicates that the case of attachment index γ = 1 is marginal. A
9 8
1
4
7 6
10
3
2
5
FIG. 1. Schematic illustration of the evolution of the growing random network. Nodes are added sequentially and a single link joins the new node to an earlier node. In this example, node 1 has degree 5, node 2 has degree 3, nodes 4 and 6 have degree 2, and all the remaining nodes have degree 1. Also note that node 1 is the “ancestor” of 6, while 10 is the descendant of 6.
One of our primary results is that for asymptotically linear attachment kernels, Ak ∼ k as k → ∞, the degree distribution of the GN has a power-law form Nk (t) ∼ tk −ν , with ν tunable in the range 2 < ν < ∞. By choosing the control parameters of our model in a plausible manner it is then easy to reproduce the quantitative observations about the node degree distribution of the web graph. In Sec. II, we define the GN and GNR models precisely and then determine their node degree distributions in Sec. III by the rate equation approach. Different distributions arise in the GN model which depend on the asymptotic behavior of the attachment probability as a function of node degree. In Sec. IV, we investigate the joint age-degree distribution and find (not surprisingly) that “old” nodes are typically more highly connected. In Sec. V, we study the correlations which develop between the degrees of connected nodes as the network grows. In Sec. VI, we study a more global measure of the network, namely, the size distributions of the “in-component” and “out-component”. With respect to a given node x, the incomponent is the set of nodes which can reach node x via a directed path of links. Conversely, the out-component is the set nodes which can be reached from node x via a directed path. The former exhibits a robust power-law size distribution which appears to be independent of the 2
the rate equations for the GN with a shifted linear attachment kernel.
related phenomenon occurs in constant-kernel aggregation, where the asymptotic kinetics is sensitively dependent on the actual values of the reaction rate [30,31].
III. THE DEGREE DISTRIBUTION B. Growing Network with Re-direction (GNR)
A. GN Model
We now study the evolution of the degree distribution of the GN model. The rate equations for Nk (t) are
The GN is built by simultaneous node and link addition and disregards other elemental processes which can occur in the development of large networks. In the context of the web, these include node and link deletion (for out-of-date websites), link re-wiring, the tendency of a new node to connect to nearby nodes, and the copying of links from existing nodes to new nodes. The GNR model incorporates a simple form of link re-wiring into the GN model. At each time step, a new node n is added and an earlier node x is selected uniformly as a possible “target” for attachment. With probability 1 − r, the link from n to x is created; in this case, the evolution is the same as in the GN. However, with probability r, the link is re-directed to the ancestor node y of node x (Fig. 2).
y
dNk = A−1 [Ak−1 Nk−1 − Ak Nk ] + δk1 . (1) dt The first term on the right-hand side of Eq. (1) accounts for the process in which a node with k − 1 links is connected to the new node, leading to a gain in the number of nodes with k links. PThis happens with probability Ak−1 /A, where A(t) = j≥1 Aj Nj (t) is the appropriate normalization factor. A corresponding role is played by the second (loss) term on the right-hand side of Eq. (1). Notice that the overall amplitude in Ak is irrelevant, since it appears in both the numerator and denominator of Eq. (1), and can be chosen arbitrarily. The last term on the right-hand side of Eq. (1) accounts for the continuous introduction of new nodes with no incoming links. We also set N0 ≡ 0, so that Eq. (1) applies for all k ≥ 1. It is worth noting that at a fundamental level, Eqs. (1) describe the symbolic reaction [k] → [k + 1]. Many other reactions, such as the Becker-D¨ oring theory of nucleation [35], additive polymerization [36], hydrolysis [37], catalysis and submonolayer epitaxial growth [38], fit into this scheme. However, there is one important difference in that we consider strictly a single connected cluster (the growing network), while in the context of aggregationlike processes, one generally deals with a collection of clusters. The effect of having more than one cluster in the framework of growing networks is currently under investigation [39]. We start by solving the equations for the low-order moments of the P degree distribution, which are defined by Mn (t) = j≥1 j n Nj (t). Summing Eqs. (1) over all k gives the rate equation for the total number of nodes, M˙ 0 = 1, whose solution is M0 (t) = M0 (0) + t. The first moment (the total number of link endpoints) obeys M˙ 1 = 2, which gives M1 (t) = M1 (0) + 2t. The first two moments are therefore independent of the attachment kernel Ak , while higher moments and the degree distribution itself do depend on the kernel Ak . To develop an appreciation for the types of behavior that can occur, consider the linear kernel Ak = k for which A(t) coincides with M1 (t). In this case, we can solve Eqs. (1) for an arbitrary initial condition. However, since the long-time behavior is most interesting we limit ourselves to the asymptotic regime (t → ∞) where the initial condition is irrelevant. Using therefore M1 = 2t, we solve the first few of Eqs. (1) and obtain N1 = 2t/3, N2 = t/6, etc., which implies that the Nk grow linearly with time. Accordingly, we substitute
x
n FIG. 2. Illustration of the basic processes in the GNR model. The new node (solid) selects a target node x. With probability 1 − r a link is established to this target node (dashed arrow), while with probability r the link is established with the ancestor of x (thick solid arrow).
A model of this spirit was recently mentioned in the context of the web development [15]. A related model was also proposed long ago by Simon [32,33] to describe the word frequencies of English text. The Simon model gives a power-law frequency distribution whose exponent is tunable in manner which closely mirrors the behavior in the GNR model. The Simon model was also recently applied to explain power-law distributions in the frequency of family names [34]. While at first sight the GNR model appears complicated, we shall see that its characteristics can be obtained in a simple fashion. Another very helpful and surprising property of the GNR with a uniform initial attachment probability is that it is equivalent to the GN with a shifted linear attachment kernel and no re-direction. We shall exploit this equivalence extensively in the following. Nevertheless, we consider the GNR separately, as in many cases the rate equations for the GNR with a uniform attachment kernel is simpler to appreciate than 3
h 1−γ 1−γ i 1 −2 k −γ exp −µ k 1−γ 2 < γ < 1, µ2 −1 h √ i nk ∼ k 2 exp −2µ k (5) γ = 12 , h i 1 1 k −γ exp −µ k1−γ + µ2 k1−2γ 1−γ 2 1−2γ 3 < γ < 2,
Nk (t) = t nk in Eqs. (1) to yield the simple recursion relation nk = nk−1 (k − 1)/(k + 2). Solving for nk gives nk =
4 . k(k + 1)(k + 2)
(2)
In the context of discrete functions defined on the positive integers, this distribution is algebraic over the entire range of k. Indeed, as explained in Ref. [40], the proper analog of the continuous power-law function f (x) = x−λ is the discrete function fk = Γ(k)/Γ(k + λ), where Γ is the Euler gamma function. Rewriting Eq. (2) as nk = 4Γ(k)/Γ(k + 3), we see that nk is indeed algebraic over the entire range k ≥ 1. Returning to more general attachment kernels, let us assume that the degree distribution and A(t) both grow linearly with time. We anticipate that this hypothesis will hold for attachment kernels which do not grow faster than linearly with k. By substituting Nk (t) = t nk and A(t) = µt into Eqs. (1) we obtain the recursion relation nk = nk−1 Ak−1 /(µ + Ak ) and n1 = µ/(µ + A1 ). Solving for nk , we obtain nk =
µ Ak
k Y
j=1
1+
µ Aj
−1
.
etc.. The pattern given in Eq. (5) continues ad infinitum: Whenever γ decreases below 1/m, with m a positive integer, an additional term in the exponential arises from the now relevant contribution of the next higher-order term in the expansion of the product in Eq. (3). 2 1.8
µ
1.6 1.4 1.2
(3)
1
To complete the solution, we need Pto find the amplitude µ. Combining the definition µ = j≥1 Aj nj and Eq. (3), we obtain the implicit relation −1 ∞ Y k X µ 1+ = 1. Aj j=1
0
0.2
0.4
γ
0.6
0.8
1
FIG. 3. The amplitude µ in Mγ (t) = µt versus γ.
To complete the solution, we require the amplitude µ. We have been unable to find an explicit expression for µ, even if the attachment kernel is strictly homogeneous, Ak = k γ , as it requires solving Eq. (4). However, this relation can be easily evaluated numerically and it shows that µ(γ) varies smoothly between 1 and 2 as γ increases from 0 to 1 (Fig. 3). These two limits correspond to the known limiting behaviors for M0 and M1 . More detailed results can be obtained for the limiting solvable cases of Ak = const. and Ak = k. In these limits, µ = 1 and µ = 2, respectively, and the corresponding degree distributions are given by nk = 2−k and by Eq. (2). The former can be easily obtained by following exactly the same steps as those used to solve the network with the linear kernel. We can then apply perturbation theory to find the respective limiting behaviors of µ(γ) for γ close to 0 or 1, µ = 1 + B0 γ + O γ 2 , µ = 2 − B1 (1 − γ) + O (1 − γ)2 ,
(4)
k=1
Thus the amplitude µ always depends on the entire attachment kernel. On the other hand, we shall show that the degree distribution exhibits a robust behavior which depends only on gross features of the attachment kernel, as long as Ak grows slower than linearly. The case where Ak is asymptotically linear is perhaps the most intriguing as the degree distribution has a power-law behavior whose exponent depends on microscopic details of the dependence of Ak on k. When Ak grows faster than linearly, drastically different gelation-like behavior arises. It is again worth noting that these three regimes of kinetic behavior also arise in the solutions to the rate equations for additive polymerization processes, with the different regimes arising when the attachment exponent γ is smaller than, larger than, or equal to one [41]. We now separately describe these three cases.
with
1. Sub-linear kernels
B0 =
Consider sub-linear kernels which are asymptotically homogeneous, that is, Ak ∼ k γ , with 0 < γ < 1. Substituting this asymptotics into Eq. (3), writing the product as the exponential of a sum, converting the sum to an integral, and performing this integral, we obtain
∞ X ln j
2j
j=1 ∞ X
B1 = 4
j=1
4
= 0.5078 . . . ,
ln j = 2.407 . . . . (j + 1)(j + 2)
In particular, the node degree for an undirected graph generalizes to the in-degree and out-degree for a directed graph. These are just the number of incoming and outgoing links at a node, respectively. Thus, the node degree k in a directed graph is the sum of the in-degree i and outdegree j. The most general linear attachment kernel for a directed graph is therefore of the form Aij = ai + bj. The GN corresponds to the case where the out-degree of any node equals one; thus j = 1 and k = i+1. Hence the general linear attachment kernel reduces to Ak = a(k−1)+b. Since, as mentioned above, the overall scale factor in the kernel is irrelevant, we can re-write Ak as the shifted linear kernel Ak = k + w, with w = −1 + b/a, so that it can vary over the range −1 < w < ∞. We can now easily determine the degree distribution for the shifted Plinear attachment kernel. First we note that A(t) = j Aj Nj = M1 (t) + wM0 (t). Then using the basic results A = µt, M0 = t and M1 = 2t, we have µ = 2 + w and thence ν = 3 + w, according to Eq. (6). Furthermore, from Eq. (3) we easily determine the entire degree distribution to be
2. Linear kernels
Consider now asymptotically linear attachment kernels, Ak ∼ k as k → ∞. As already mentioned, we can always choose the amplitude in the asymptotic relation equal to one, as attachment kernels which differ by a multiplicative factor give identical behavior. For the asymptotically linear kernel, expanding the product in Eq. (3) and following step-by-step the approach that led to Eq. (5) now gives the power-law asymptotic behavior nk ∼ k −ν ,
with
ν = 1 + µ.
(6)
An important feature of this result is that the exponent ν can be tuned to any value larger than 2. This lower bound P follows from the fact that the sum P immediately µ = j Aj nj ∼ j jnj must converge and this, in turn, requires that ν must be larger than 2. As an explicit example, consider the attachment kernel Ak = k for k ≥ 2, while A1 ≡ α is an arbitrary positive number. Now it is convenient to separately treat A1 and Ak for k ≥ 2 in Eq. (4) to recast it as −1 ∞ Y k X µ 1+ µ = A1 . Aj j=2
nk = (2 + w) (7)
The right-hand side of Eq. (7) can be simply expressed as the ratio of Euler gamma functions to yield ∞ X
Γ(2 + µ)
k=2
Γ(1 + k) . Γ(1 + µ + k)
(8)
This sum can be evaluated by employing the identity [40] ∞ X Γ(a + k)
k=2
Γ(b + k)
=
Γ(a + 2) , (b − a − 1)Γ(b + 1)
3. Super-linear kernels
For the super-linear homogeneous attachment kernels, Ak = k γ with γ > 1, we now show that a “winner take all” phenomenon arises, namely, there emerges a single dominant “gel” node which is linked to almost every other node. A particularly singular behavior occurs for γ > 2, where there is a non-zero probability that the initial node is connected to every other node of the network. Let us first determine the probability that the initial node connects to all other nodes. It is convenient to consider a discrete time version of the GN in which one node is introduced at each elemental step which always links to the initial node. After N steps, the probability that the new node will link to the initial node is N γ /(N + N γ ). This probability that this connectivity pattern continues indefinitely is
(9)
so that Eq. √ (8) reduces to µ(µ − 1) = 2α, with solution µ = (1 + 1 + 8α)/2. Thus the exponent ν = 1 + µ is √ 3 + 1 + 8α ν= . (10) 2 Furthermore, following the steps that lead to Eq. (3), the degree distribution for the GN with the attachment kernel A1 = α and Ak = k for k ≥ 2 is n1 =
µ , µ+α
nk =
µα Γ(2 + µ) Γ(k) . µ + α Γ(1 + µ + k)
(12)
In a similar vein, we can solve the GN with an arbitrary piecewise linear attachment kernel. In all these cases, the exponent ν can be tuned to any value larger than 2, and for sufficiently large degree nk can be expressed as the ratio of gamma functions, i. e., the degree distribution is a purely (discrete) algebraic function.
k=2
µ=α
Γ(k + w) Γ(3 + 2w) . Γ(1 + w) Γ(k + 3 + 2w)
(11)
Notice that for 0 < α < 1, the exponent lies in the range 2 < ν < 3; in particular, ν = 2 + 2α − 4α2 + . . . as α → 0. When α = 1, we recover the connectively distribution of Eq. √ (2). For α > 1, we have ν > 3; in particular, ν → 2α as α → ∞. The GN is also solvable when Ak = k + w. This shifted linear kernel can be motivated naturally by explicitly keeping track of the directionality of the links.
P=
∞ Y
N =1
1 . 1 + N 1−γ
(13)
Clearly, P = 0 when γ ≤ 2 but P > 0 when γ > 2. Thus for γ > 2 there is a non-zero probability that the initial node connects to all other nodes. 5
with three links grows as t3−2γ and the number with more m than three is finite. Generally for m+1 m < γ < m−1 , the number of nodes with more than m links is finite, while Nk ∼ tk−(k−1)γ for k ≤ m. Logarithmic corrections also arise at the transition points.
To determine the behavior for general γ > 1, we first need the asymptotic time dependence of Mγ . To this end, it is useful to consider the discretized version of the master equations Eq. (1), where the time t is limited to integer values. Then Nk (t) = 0 whenever k > t and the rate equation for Nk (k) immediately leads to Nk (k) =
(k − 1)γ Nk−1 (k − 1) Mγ (k − 1)
= N2 (2)
k−1 Y j=2
jγ . Mγ (j)
B. Relation to citation data
Let us now attempt to relate some of our predictions from the GN model to the distribution of citations in recent scientific publications [11,12]. The GN model represents an extreme idealization of the citation process in which each publication cites only a single paper and the probability of citing a paper depends only on its current number of citations, and not on its intrinsic quality or any other realistic features. Thus we anticipate that the connection between the model and the data will be, at best, tenuous. The data that we discuss is based on: (a) 783,339 papers with 6,716,198 citations (provided by the Institute of Scientific Information (ISI)), and (b) 24,296 papers with 351,872 citations from all issues of Physical Review D (PRD) from 1975–1994 (provided by the SPIRES database) [42]. A cursory visual inspection of this data suggests that the number of publications with k citations decays as a stretched exponential function of k (see e. g., Fig. 1 of Ref. [12]). However, an analysis based on presenting the data in a Zipf plot, in conjunction with scaling, is suggestive of a power-law form for the citation distribution, k −ν , with ν ≈ 3 (Fig. 2 of Ref. [12]). This ambiguity between a stretched exponential and powerlaw form for the citation distribution corresponds to the situation where the predictions of the GN itself are difficult to discern numerically. If we consider the GN with attachment kernel Ak ∼ k γ for γ < ∼ 1, then a plot of nk in Eq. (5) versus k, for 1 ≤ k ≤ 1000, changes relatively slowly as γ varies in the range (0.9, 1). If one attempts to fit this data to a power law, then an exponent value somewhat larger than 3 gives a reasonable fit to the data. It is only as γ → 1 from below, however, that the factors in the exponential of Eq. (5) conspire to give a pure power-law form for nk . Because of the relatively small change in nk as γ varies, the relatively incomplete data on the distribution of citations is insufficient to provide a clear test for the existence of a power law. Further, for the GN model with linear attachment kernel, the degree distribution depends on additional details of this kernel and can achieve any value greater than 2. In short, it is difficult to relate the GN model to citation data based on the form of the distribution alone. Another interesting aspect of the citation distribution which can be compared with the GN model is the nature of highly-cited publications. Within the GN model, the degree of the most popular node, kmax , P may be determined by the extreme statistics criterion k>kmax Nk =
(14)
From this, and the obvious fact that Nk (k) must be less than unity, it follows that Mγ (t) cannot grow more slowly than tγ . On the other hand, Mγ (t) cannot grow faster than tγ , as follows from the estimate Mγ (t) =
t X
k γ Nk (t)
k=1
≤ tγ−1
t X
kNk (t) = tγ−1 M1 (t)
(15)
k=1
Thus Mγ ∝ tγ . In fact, the amplitude of tγ is unity as we will derive self-consistently after solving for the Nk ’s. We now use Mγ ∼ tγ , with γ > 1, in the rate equations to solve recursively for each Nk . Starting with the equation N˙ 1 = 1 − N1 /Mγ , we see that the second term on the right-hand side is sub-dominant. Thus by neglecting this term we obtain N1 = t. Similarly, N˙ 2 = (N1 −2γ N2 )/Mγ ∼ N1 /Mγ gives N2 ∼ t2−γ /(2−γ). Continuing this same line of reasoning for each successive rate equation gives the leading behavior of Nk , Nk (t) = Jk tk−(k−1)γ
for
k ≥ 1,
(16)
Qk−1 γ with Jk = j=1 j /[1 + j(1 − γ)]. This pattern of behavior for Nk continues as long as its exponent k − (k − 1)γ remains positive, or k < γ/(γ − 1). The full temporal behavior of the Nk (t) may be determined straightforwardly by keeping the next correction terms in the rate equations. For example, N1 (t) = t − t2−γ /(2 − γ) + . . .. For k > γ/(γ − 1), each Nk has a finite limiting value in the long-time limit. Since the total number of connections equals 2t, and t of them are associated with N1 , the remaining t links must all connect to a single node which has t connections (up to corrections which grow no faster than sub-linearly with time). Consequently the amplitude of Mγ equals unity, as argued above. Therefore for super-linear kernels, the GN undergoes an infinite sequence of connectivity transitions as a function of γ. For γ > 2 all but a finite number of nodes are linked to the “gel” node which has the rest of the links of the network. This is the “winner take all” situation. For 3/2 < γ < 2, the number of nodes with two links grows as t2−γ , while the number of nodes with more than two links is again finite. For 4/3 < γ < 3/2, the number of nodes 6
1, which states that there is one node in the network whose degree lies in the range (kmax , ∞). This criterion gives (ln t)1/(1−γ) 0 ≤ γ < 1; (17) kmax ∼ t1/(ν−1) asymptotically linear; t γ > 1.
is a power law with exponent ν = 1 + 1/r, which can be tuned to any value larger than 2. This exponent value was first obtained in Simon’s original paper [32], but in a rather different context, by employing an approach which is similar to ours. IV. THE AGE DISTRIBUTION
We now compare this prediction with the data about the most-cited paper. To make a correspondence between citations and Eq. (17), we identify the total number of publications in each dataset with t. The most cited paper had 8,904 citations in the ISI data set and 2,026 citations in the PRD data set. These results are consistent with the first line of Eq. (17) when γ ≈ 0.86 and γ ≈ 0.7 respectively, and also with the second line for ν ≈ 2.5 and ν ≈ 2.3 respectively. Thus an analysis of the most-cited paper does not cleanly indicate whether the citation distribution is a power law or a stretched exponential. These ambiguities indicate some of the issues that should be be clarified to provide a clear description of citations in terms of a growing network model.
In addition to the distribution of degree, we study when connections occur in the GN. This provides a deeper understanding of the overall development of growing networks. Naively, we expect that older nodes will be better connected and this can be quantified by categorizing nodes both by their degree and their age. It should be emphasized, that the GN does not have explicit aging, in which the connection probability depends on the age of the target node; this feature is treated in Ref. [26]. Instead, we are merely extending the categorization of node to include their age as well as their degree. A. Linear connection kernel
C. GNR Model
Let ck (t, a) be the average number of nodes of age a which have k − 1 incoming links at time t. Here age a means that the node was introduced at time t − a. That is, we are now resolving each node both by its degree and its age. The resulting joint age-degree distribution is simply R t related to the degree distribution through Nk (t) = 0 da ck (t, a). The joint distribution evolves according to ∂ Ak−1 ck−1 − Ak ck ∂ ck = + + δk1 δ(a). (19) ∂t ∂a A(t)
We now solve the GNR model within the rate equation framework. According to the basic processes in the model (Fig. 2), the degree distribution Nk (t) evolves by the rate equations dNk 1−r [Nk−1 − Nk ] = δk1 + dt M0 r [(k − 2)Nk−1 − (k − 1)Nk ] . + M0
(18)
The second term on the left accounts for the aging of nodes and the probability of connecting to a given node again depends only on its degree and not on its age. We start by considering the linear attachment kernel, Ak = k, and focus on the long time asymptotic behavior. Then we can disregard the initial condition and write A(t) ≡ M1 (t) = 2t. This transforms Eqs. (19) into ∂ (k − 1)ck−1 − kck ∂ ck = + + δk1 δ(a). (20) ∂t ∂a 2t
For re-direction probability r = 0, the first three terms on the right-hand side of Eqs. (18) are the same as in the GN. The last two terms account for the change in Nk due to the re-direction process. To understand their origin, consider the gain term due to re-direction. Since the initial node is chosen uniformly, if re-direction does occur, the probability that a node with k − 1 pre-existing links receives the new “re-directed” link is proportional to k − 2, the number of pre-existing incoming links. A similar argument applies for the re-direction-driven loss term. Since N0 ≡ 0 is tacitly assumed, Eq. (18) applies for all k ≥ 1. By combining the terms in Eq. (18), the rate equation reduces to that of the original GN with Ak = (k − 1)r + 1 − r = r[k − 1 + (1 − r)/r]. By scaling out the factor r, we then reduce Ak to the shifted linear kernel k + w, with w = (1 − r)/r − 1 = 1r − 2. Thus we can merely transcribe our results about the GN with the shifted linear kernel to determine the degree distribution for the GNR model. Amusingly, for r = 1/2, the GN R model is identical to the GN with the purely linear kernel. In general, the degree distribution in the R model
The homogeneous form of this equation implies that solution should be self-similar. Thus we seek a solution as a function of the single variable a/t rather than two separate variables. Thus, we write ck (t, a) = fk (x)
with
a x=1− . t
(21)
This turns the partial differential equation (20) into the ordinary differential equation − 2x 7
dfk = (k − 1)fk−1 − kfk . dx
(22)
We now solve Eqs. (25), subject to the boundary condition (23), and with µ determined from Eq. (4). Let us first replace x by X = −µ−1 ln x, which reduces the dfk left-hand side of (25) to dX . Applying a Laplace transR∞ form, fˆk (s) = 0 dX e−sX fk (X), fˆk (s) obeys a simple algebraic recursion formula whose solution is
We have omitted the delta function term, since it merely provides the boundary condition ck (t, a = 0) = δk1 , or fk (1) = δk1 .
(23)
1 10
0.03
k=1
0.8
0.02
−1 k s 1 Y ˆ 1+ . fk (s) = Ak j=1 Aj
20
ck(t,a)
0.01 50
0.6
0
0.99
Apart from notation, this is identical to Eq. (3) and can be analyzed accordingly. In particular, we can determine fˆk (s) for various asymptotically linear attachment kernels. For example, for the shifted linear attachment kernel, Ak = k + w, we find
1
0.4 2
0.2
5 10
0
(26)
0
0.2
0.4
0.6
0.8
Γ(k + w) Γ(1 + w + s) . fˆk (s) = Γ(1 + w) Γ(k + 1 + w + s)
1
a/t
To invert this Laplace transform, it is useful to rewrite this expression as a sum of rational functions fˆk (s) = P F k (j + w + s)−1 . This then gives fk (X) = P1≤j≤k jk −(j+w)X , with 1≤j≤k Fj e
FIG. 4. Age-dependent degree distribution for the GN for the linear attachment kernel. Low-degree nodes tend to be relatively young while high-degree nodes are old. The inset shows detail for a/t ≥ 0.98.
Fjk =
The solution to this boundary-value problem may be simplified by assuming the exponential solution fk = Φϕk−1 ; this is consistent with the boundary condition, provided that Φ(1) = 1 and ϕ(1) = 0. The above ansatz reduces the infinite set of rate equations (22) into two elementary differential equations for ϕ(x) and √ Φ(x) whose √ solutions are ϕ(x) = 1 − x and Φ(x) = x. In terms of the original variables of a and t, the joint age-degree distribution is then r r k−1 a a ck (t, a) = 1 − 1− 1− . (24) t t
(−1)j−1 Γ(k + w) . Γ(j) Γ(k − j + 1) Γ(1 + w)
(28)
When then re-express this in terms of the original variable x = e−(2+w)X . Hence fk (x)Pcan be re-written as the k (j+w)/(2+w) . sum of k power-laws fk (x) = 1≤j≤k Fj x Substituting the explicit expressions (28) into this sum reduces the joint age-degree distribution to fk (x) =
h ik−1 1+w 1 Γ(k + w) x 2+w 1 − x 2+w . Γ(k) Γ(1 + w)
(29)
This expression shows that old nodes have a broad distribution of degrees up to a characteristic degree hki = (1 − a/t)−1/(2+w) . One can also verify that the average R t age ak of nodes of R 1 degree k, defined as ak = Nk−1 0 da ack (t, a) = tn−1 dx (1 − x)fk (x), is k 0
Thus the degree distribution for nodes of fixed age decays exponentially with degree, with a characteristic degree which diverges as hki ∼ (1 − a/t)−1/2 for a → t. As expected, young nodes (those with a/t → 0) typically have a small degree while old nodes have large degree (Fig. 4). It is the slow decay of the degree distribution for old nodes which ultimately leads to a power-law degree distribution when this joint age-degree distribution is integrated over all ages to give Nk (t).
ak Γ(5 + 3w) Γ(k + 3 + 2w) =1− t Γ(3 + 2w) Γ(k + 5 + 3w) const. ∼ 1 − 2+w . k
(30)
Thus nodes with very large degree necessarily have an age which approaches that of the entire network. Finally, the joint age-degree distribution simplifies in the limit k → ∞ and x → 0, with the scaling variable ξ = kx1/(2+w) kept finite. In this case, we can rewrite (29) in the scaling form
B. General connection kernels
Let us now consider the GN with a connection kernel which grows either linearly or more slowly with k. The ansatz (21) still is valid, so that the distribution fk evolves according to dfk = Ak−1 fk−1 − Ak fk . − µx dx
(27)
fk (x) = k −1 F (ξ),
(25) 8
F (ξ) =
ξ 1+w exp(−ξ). Γ(1 + w)
(31)
For example, in the network of Fig. 1, there are N1 = 6 nodes of degree 1, with N12 = N13 = N15 = 2. There are also N2 = 2 nodes of degree 2, with N25 = 2, and N3 = 1 node of degree 3, with N35 = 1. The correlation function is not defined for the initial node. Generally, Nkl is defined P for k ≥ 1 and l ≥ 2, and obeys the sum rule Nk = l Nkl . A gratifying feature of the rate equation approach is that the correlation function Nkl can be understood in a natural and simple fashion.
The scaling variable can also be written as ξ = k/hki, and thus Eq. (31) clearly shows that old nodes have a broad distribution of degrees: 1 ≤ k < ∼ hki. We can derive explicit age-degree distributions for other attachment kernels. For example, for the constant attachment kernel, Ak = 1, the joint age-degree distribution is the Poisson distribution, fk (X) =
X k−1 −X e , (k − 1)!
(32)
or in terms of the original variables a and t, a | ln(1 − a/t)|k−1 ck (t, a) = 1 − . t (k − 1)!
A. Linear connection kernel
(33)
For the GN with the linear attachment kernel Ak = k, the joint distribution Nkl (t) evolves according to
The characteristic degree now diverges relatively slowly, viz. hki ∼ − ln(1 − a/t) as a → t, than for asymptotically linear attachment kernels. On the other hand, the average age approaches the maximal age t at a much faster rate, as ak = t 1 − (2/3)k , as k approaches its maximal value. For cases where we have been unable to obtain an explicit solution, the Laplace transform method still allows us to extract the asymptotics. For example, for asymptotically homogeneous attachment kernels, Ak → k γ as ˆ k → ∞,Eq. (26) gives the large-k asymptotics fk (s) ∼ −γ 1−γ k exp −sk /(1 − γ) (see Eq. (5). (For concreteness, we consider here the range 1/2 < γ < 1.) Inverting this Laplace transform yields k 1−γ fk (X) ∼ k −γ δ X − . (34) 1−γ
M1
dNkl = [(k − 1)Nk−1,l − kNkl ] + dt [(l − 1)Nk,l−1 − lNkl ] + (l − 1)Nl−1 δk1 . (36)
The first two terms on the right-hand side account for the change in Nkl due to the addition of a link onto a node of degree k − 1 (gain) or k (loss), while the second set of terms gives the change in Nkl due to the addition of a link onto the ancestor node. Finally, the last term accounts for the gain in N1l due to the addition on the new node. Asymptotically, M1 → 2t and Nkl → tnkl , and we use these hypotheses to reduce Eqs. (36) to the timeindependent recursion relations (k + l + 2)nkl = (k − 1)nk−1,l + (l − 1)nk,l−1 + (l − 1)nl−1 δk1 . (37)
In particular, the age of nodes with k links is peaked about the value ak which satisfies ak k 1−γ . (35) ≃ 1 − exp −µ t 1−γ
This can be reduced to a constant-coefficient inhomogeneous recursion relation by the substitution
This again shows that old nodes are much better connected.
to yield
nkl =
Γ(k) Γ(l) mkl Γ(k + l + 3)
mkl = mk−1,l + mk,l−1 + 4(l + 2)δk1 .
(38)
(39)
By solving Eqs. (39) for the first few k, one can grasp the pattern of dependence on k and l and thereby infer the general solution
V. NODE DEGREE CORRELATIONS
We now demonstrate that correlations between the degrees of connected nodes spontaneously develop as the network grows. One motivation for focusing on these correlations is that recently random graph models with arbitrary degree distributions have been investigated [43–45]. While the degree distribution can be chosen arbitrarily in these models, the degrees of connected nodes are uncorrelated. This lack of correlation suggests that such random graphs may have limited applicability to growing network systems. For the GN, a useful characterization of node degree correlations is Nkl (t), the number of nodes of total degree k which attach to an ancestor node of total degree l.
mkl = 4
Γ(k + l) Γ(k + l − 1) + 12 . (40) Γ(k + 2) Γ(l − 1) Γ(k + 1) Γ(l − 1)
This solution can also be obtained in a more systematic manner by the generating function method (see below for the shifted linear kernel). Combining Eqs. (38) and (40) we finally obtain 4(l − 1) k(k + 1)(k + l)(k + l + 1)(k + l + 2) 12(l − 1) . + k(k + l − 1)(k + l)(k + l + 1)(k + l + 2)
nkl =
9
(41)
reduces Eqs. (45) to
The important feature of this result is that the joint distribution does not factorize, that is, nkl 6= nk nl . This confirms our earlier assertion that correlations between the degrees of connected nodes form spontaneously. This is arguably the most important distinction between classical random graphs – where node degrees are uncorrelated – and the GN. While the solution of Eq. (41) is unwieldy, it greatly simplifies in the scaling regime, k → ∞ and l → ∞ with y = l/k kept finite. The scaled form of the solution is nkl = k
−4
4y(y + 4) . (1 + y)4
mkl = mk−1,l + mk,l−1 + δk1 W
M(x, y) =
(42)
M(x, y) =
mkl xk y l
(48)
k=1 l=2
∞ W xy 2 X Γ(j + 5 + 3w) j y . 1 − x − y j=0 Γ(j + 4 + 2w)
(49)
Expanding M(x, y) we obtain mkl = W
(43)
l−2 X Γ(k + l − 2 − j) Γ(j + 5 + 3w) . Γ(k) Γ(l − 1 − j) Γ(j + 4 + 2w) j=0
(50)
Eqs. (46) and (50) constitute the exact solution for the correlation function of the GN with the shifted linear attachment kernel. When the parameter w is an integer, we can reduce nkl to a rational function. In the general case, the exact solution also simplifies in several extreme limits. When k ≫ l, the dominant contribution to nkl is provided by the first term in the sum in Eq. (50). Assuming additionally l ≫ 1 and repeatedly using the asymptotic relation Γ(N + n)/Γ(N ) → N n as N → ∞, we ultimately find
This last result demonstrates the correlations in the network most cleanly. If there were no correlations, then nk nl would be proportional to (k l)−3 .
nkl ≃ W
B. General connection kernels
Γ(5 + 3w) 1+w −5−2w l k Γ(4 + 2w)
k ≫ l ≫ 1. (51)
In the complementary case of l ≫ k ≫ 1, all the terms in the sum of Eq. (50) are important. However, we can simplify this sum by employing the above asymptotics for the ratio of gamma functions and then replacing the sum by an easily-computable integral. We find
In general, correlations between the degrees of neighboring connected nodes exist for any attachment kernel. The analysis of these correlations for an arbitrary kernel is tedious and we merely outline some of the primary results in the relatively simple cases of the shifted linear and constant attachment kernels. In the former case, we follow the same approach as the linear kernel to reduce the rate equation for the correlation function to recursion relations of a similar form to Eq. (37), viz.
nkl ≃ W Γ(2 + w) k −2 l−2−w .
(52)
When the attachment kernel is uniform, correlations between the degrees of a node and its ancestor still develop. To see how this comes about quantitatively, we again follow the same steps as those which led to Eq. (37) and find that the joint distribution nkl now satisfies the recursion relation
(k + l + 2 + 3w)nkl = (k + w − 1)nk−1,l + (45) (l + w − 1) [nk,l−1 + nl−1 δk1 ] . Here nl is determined from Eq. (12). In analogy with Eq. (38), the substitution Γ(k + w) Γ(l + w) mkl Γ(k + l + 3 + 3w)
∞ X ∞ X
is given by
Finally, when both k and l are large and also their ratio is very different from one, the limiting behaviors of nkl are 16 (l/k 5) when l ≪ k, nkl → (44) 4/(k 2 l2 ) when l ≫ k.
nkl =
(47)
where W = (2 + w) Γ(3 + 2w)/(Γ(1 + w))2 . We solve the recursion (47) by the generating function method [40]. Multiplying Eq. (47) by xk y l and summing over all k ≥ 1, l ≥ 2, we find that the generating function
For fixed large √ k, the distribution nkl has a single maximum at y∗ = ( 33 − 5)/2 ∼ = 0.372. Thus a node whose degree k is large is typically linked to another node whose degree is also large; the typical degree of the ancestor is 37% of the degree of the daughter node. In the complementary case of a fixed degree l for the ancestor node, the distribution nkl reaches maximum when k = 1, i. e., the daughter node is usually dangling. From Eq. (41), we find that this configuration occurs with probability 2(l − 1)(l + 6) . n1l = l(l + 1)(l + 2)(l + 3)
Γ(l + 3 + 3w) , Γ(l + 2 + 2w)
3nkl = nk−1,l + nk,l−1 + 2−(l−1) δk1 .
(53)
This recursion relation can again be solved by the generating function technique to give
(46)
10
nkl =
1 2l−1
−
k−1 1 X
3l−1
i=0
1 Γ(l − 1 + i) . Γ(l − 1) Γ(i + 1) 3i
The in-component to node x is the set of all nodes from which node x can be reached by following a path of directed links. Similarly the out-component of node x is the set of nodes which can be reached by following the path directed links which emanate from node x. For the GN model, the out-component is just a single path, while in more realistic networks both the in- and out-components will be branched. In the context of citations, the in-component is the set of all publications which refer to x, either directly or through intermediate reference lists until x is reached. The out-component is the set of cited publications generated by iteratively following the reference list(s) of x and its ancestors.
(54)
To appreciate the qualitative behavior of the joint distribution nkl it is again useful to fix one variable and vary the other. For fixed l, Eq. (54) shows that nkl has a maximum at k = 1. The magnitude of this maximum is n1l = 2−(l−1) − 3−(l−1) . To analyze the behavior when k is fixed, it is convenient to transform Eq. (54) into nkl =
∞ 1 X
3l−1
i=k
1 Γ(l − 1 + i) . Γ(l − 1) Γ(i + 1) 3i
(55)
Now a straightforward analysis shows that for large k, the maximum is attained at l = k/2. The form of the joint distribution nkl remains relatively complex even in the scaling regime, k, l → ∞, with the scaling variable y = l/k kept finite. We determine the scaled form of the solution (55) by applying Stirling’s formula and the identity Γ(x + λ)/Γ(x) → xλ as x → ∞. For y < 2, we find p 1 1 + y −1 −kY nkl ≃ √ e , (56) 2πk 2 − y
A. In-component size distribution
The size distribution of the in-component can be easily obtained by the rate equation formalism for the GN with a uniform attachment kernel and also for the GNR. Given the equivalence between the latter and the GN with a shifted linear kernel, the latter case is also soluble. We start by considering the GN with the uniform attachment kernel. In this case, the number Is (t) of incomponents with s nodes satisfies the rate equation
where Y = y ln y − (y + 1) ln[(y + 1)/3]. For y > 2, it is preferable to use the solution in the form of Eq. (54). After some algebra, we can verify that the dominant contribution equals 2−(l−1) , that is, independent of k [46]. Finally, the limiting behavior of the correlation function is kl−2 3−(k+l−2) (l−2)! when l ≪ k, −1 nkl → 2 × (57) 2−(l−2) when l ≫ k. Thus, correlations are strong even for the random attachment kernel and the qualitative behavior is similar to that of the linear attachment kernel.
(s − 1)Is−1 − sIs dIs = + δs1 . dt t
(58)
To understand this equation, consider first the loss term. For an in-component of size s there are s nodes in which the attachment of a new node causes this component to increase in size by one. This gives a loss rate for Is which is proportional to s. If there is more than one in-component of size s they must be disjoint, so that the total loss rate for Is is simply sIs . A similar argument applies for the gain term. Finally, the overall factor of t−1 converts these rates to normalized probabilities. Curiously, Eqs. (58) are almost identical to the rate equations for the degree distribution of the GN with linear attachment kernel, except that the prefactor equals t−1 rather than (2t)−1 . From Eqs. (58) we can determine all moments of the P n in-component size distribution, In (t) = s≥1 s Is (t). The zeroth moment obeys I˙ 0 = 1, whose solution is I0 (t) = I0 (0) + t. This is obvious since the total number of in-components equals the total number of nodes. The first moment obeys I˙ 1 = 1 + I1 /I0 , whose asymptotic solution is I1 (t) ∼ t ln t. We shall see that this logarithmic factor is an outcome of the asymptotic power law for Is with the tail decaying as s−2 . To solve for Is (t), we note that it again grows linearly in time. Thus we substitute the ansatz Is (t) = tis into Eqs. (58) to obtain i1 = 1/2 and is = is−1 (s − 1)/(s + 1), which immediately leads to
VI. LARGE-SCALE PROPERTIES
The degree of a node is an important but local network characteristic and we now seek to quantify more global features of the network. One such characteristic is the partitioning of the network into an in-component and an out-component with respect to any node (Fig. 5).
in-component
x out-component
FIG. 5. In-component and out-component of node x.
11
is =
1 . s(s + 1)
For this characterization, we begin by reorganizing the GN into a genealogical tree according to a procedure which is suggested by the growth process itself. Generation g = 0 contains the single “seed” node. The nodes which attach to the seed node form generation g = 1, and generally the nodes which attach to nodes in generation g form generation g + 1, independent of when the attachment actually occurs. Thus the position of a node in the genealogical tree depends only on the position of the ancestor node and not on when the node is introduced. In this respect, the GN genealogical tree differs from usual genealogies, where each new generation is born into a progressively later position in the genealogical tree. For example, the network of Fig. 1 has 5 nodes in the first generation and 4 nodes in the second generation leading to the genealogical tree of Fig. 6. The sizes of all generations grow continuously, except for generation g = 0 which always consists of the single node.
(59)
For this s−2 decay, the moments In diverge when n ≥ 1. However, the size of the largest in-component, smax = t, provides an upper threshold in P the computation of the moments. For example, I1 ∼ s≤t sIs (t) = t ln t. It is intriguing that the algebraic in-component distribution co-exists with an exponential in-degree distribution, nk = 2−k . Similarly, we can determine Is (t) for the GNR model. In this case, the number Is (t) of in-components with s nodes satisfies dIs (s − 2 + (1 − r))Is−1 − (s − 1 + (1 − r))Is = (60) dt t for s ≥ 2, and I˙1 = 1 − (1 − r)I1 /t. This rate equation can be understood in a similar manner as Eq. (58). Consider the loss term for an in-component of size s. There are two possibilities to consider: (i) If the apex of the incomponent is initially chosen, then the new node will attach to this apex with probability 1− r (i. e., attach with no re-direction); (ii) If any other of the s − 1 nodes of the in-component is chosen, the new node will surely attach to the in-component even if re-direction occurs. These two processes give a loss rate for Is which is proportional to (s − 1 + (1 − r))Is . Solving for the in-component distribution in this process now yields Is (t) = tis , with is =
1−r . (s − r)(s + 1 − r)
g=0
1
3
6
4
2
5
7
8
9
10
1 2
FIG. 6. Genealogy of the growing random network of Fig. 1. The indices indicate when a node is introduced, while the ancestor determines where a new node is positioned.
(61)
Once we understand the genealogical structure of the GN, we simultaneously establish the out-component distribution. Indeed, the number Os of out-components with s nodes equals Ls−1 , the number of nodes in generation s − 1 in the genealogical tree. We therefore compute Lg (t), the size of generation g at time t. We start with the simplest situation when the attachment rate is uniform. In this case, Lg (t) increases when a new node attaches to a node in the previous generation. This occurs with rate Lg−1 /M0 , where M0 (t) = 1 + t is the total number of nodes. Because of the simplicity of the corresponding rate equations, we use the exact expression for M0 rather than the asymptotic expression M0 ∼ t, as was done in solving for the in-component. Thus we write
Remarkably, the asymptotic power law Is ∝ s−2 holds for any r. It is striking that this apparently universal behavior has also recently been observed in measurements of the Internet [47] Since the GNR model is identical to the GN with the shifted linear attachment kernel Ak = k + ( r1 − 2), Eq. (61) also applies to the in-component distribution for the GN with shifted linear attachment kernels. For example, the in-component distribution for the linear kernel is is = 2/(4s2 − 1). Since the same Is ∝ s−2 decay holds for the GN with both constant and linear attachment kernels, we conjecture that the in-component distribution exhibits a universal s−2 decay for an arbitrary attachment kernel, as long as it does not grow faster than linearly with node degree.
dLg Lg−1 = . dt 1+t
(62)
Solving these equations gives B. Out-component size distribution
Lg (τ ) =
The out-component from each node reveals basic insights about the “genealogy” of the growing network in an extremely simple fashion. For example, it allows us to estimate the diameter of the network, an important characteristic which has been measured for the web graph [48,49] and for social networks [23].
τg g!
where τ = ln(1 + t).
(63)
We therefore conclude that for a fixed (large) time, the generation size grows with g when g < τ , reaches a maximum size which is equal to t Lmax ≃ √ 2π ln t 12
(64)
when g = τ , and then decreases and eventually becomes of order one when g = eτ . The distribution Lg quickly decays when g exceeds the cutoff value eτ . At time t, the genealogical tree therefore contains approximately eτ generations. Hence the diameter D of the network is approximately 2eτ , or D ≈ 2e ln N
The above results can be reformulated in terms of the out-component distribution. In particular, for the GN with uniform attachment kernel, the number Os of outcomponents with s nodes equals Os (τ ) =
(65)
where τ = ln(1 + t).
(68)
Similar results apply for the linear attachment kernel, suggesting that the out-component distribution is robust as long as the attachment kernel does not grow faster than linearly with node degree.
where N = 1 + t is the total number of nodes. Thus, the diameter of an evolving GN exhibits the same N dependence as a static random graph [4]. We can also find the generation size distribution for shifted linear attachment kernels. It is again simpler to derive the rate equations in the framework of the GNR model and then transcribe the results to the shifted linear kernel. For the GNR model, the rate equation for the generation size distribution is (1 − r)Lg−1 + rLg dLg = dt 1+t
τ s−1 (s − 1)!
VII. DISCUSSION AND CONCLUSIONS
In this paper, we have analyzed the structure of the growing network (GN) model and shown that many of its properties can be easily determined within a rate equation approach. We have found that the GN has a powerlaw node degree distribution, Nk (t) ∼ tk −ν , for asymptotically linear attachment kernels, with an exponent ν which is always larger than 2. By tuning parameters of the model in a reasonable way, it is easy to obtain a node degree distribution which is in quantitative agreement with available data for the web graph [14–16,19–21,49]. A remarkable feature of this network is the spontaneous development of correlations between connected nodes. These correlations provide a much more sensitive characterization of the structure of growing networks than the extensively studied degree distribution. These correlations are a crucial feature which distinguishes the GN from classical random graphs. Thus testing for the presence of correlations between node degrees in large evolving networks may provide crucial insights to help determine the underlying mechanism of their growth. We have also studied two specific large-scale properties of the network, namely, the size distributions of the inand out-components with respect to a given site. The in-component distribution exhibits a robust s−2 powerlaw behavior, where s is the component size, as long as the attachment probability does not grow faster than linearly with node degree. The out-component distribution reveals the basic genealogical feature that the number of “generations” in the network grows logarithmically with the total number of nodes, again for attachment kernels which do not grow faster than linearly in node degree. The qualitative agreement between the degree distributions of real evolving networks, such as the web graph, and the GN is reassuring given that the model ignores many important features of real networks. Nevertheless, a number of characteristics of real growing networks are difficult to treat in the framework of the GN model. One important such characteristic is the out-degree distribution. Within the GN model the out-degree of each node is one by construction. In contrast, for real growing networks the out-degree distribution has a power law form [49]. Additionally, the average in- and out-degrees at
Additionally, the average in- and out-degrees at each node are generally larger than one. For the web graph, for example, ⟨i⟩ = ⟨j⟩ ≈ 7.5 [49]. There are several natural ways to extend the GN model to generate an average out-degree which is greater than one. A simple construction is to link every new node to more than one earlier node, as already discussed in Ref. [20]. Let us consider a network which is built by attaching every new node to exactly p earlier nodes. For the linear attachment kernel, the degree distribution N_k(t), which is now defined only for k ≥ p, evolves according to

dN_k/dt = (p/M_1) [(k − 1)N_{k−1} − kN_k] + δ_{kp}.   (69)
Clearly, the average in-degree ⟨i⟩ and out-degree ⟨j⟩ of each node in this network is equal to p. By applying the basic approach of Sec. III to this rate equation, we find that the degree distribution again asymptotically approaches a stable distribution, N_k → t n_k, with

n_k = 2p(p + 1) / [k(k + 1)(k + 2)]   for k ≥ p.   (70)
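As a quick numerical check of Eq. (70), one can grow such a network directly and compare the empirical degree distribution with the predicted n_k. The sketch below is our illustration rather than part of the original analysis; it attaches each new node to p distinct earlier nodes sampled in proportion to their degree (via a repeated-node list), starts from a complete graph on p + 1 nodes, and all names and parameter values are illustrative.

import random
from collections import Counter

def grow_multilink_gn(n_nodes, p, seed=2):
    # Each new node links to p distinct earlier nodes, chosen with
    # probability proportional to their current degree (linear kernel).
    rng = random.Random(seed)
    degree = [p] * (p + 1)                     # complete-graph seed
    ends = [v for v in range(p + 1) for _ in range(p)]
    for new in range(p + 1, n_nodes):
        targets = set()
        while len(targets) < p:                # rejection-sample p distinct targets
            targets.add(rng.choice(ends))
        degree.append(p)                       # the new node carries p out-links
        for v in targets:
            degree[v] += 1
            ends.extend([new, v])
    return degree

if __name__ == "__main__":
    p, N = 3, 200000
    counts = Counter(grow_multilink_gn(N, p))
    for k in range(p, p + 8):
        predicted = 2 * p * (p + 1) / (k * (k + 1) * (k + 2))
        print(k, round(counts[k] / N, 4), round(predicted, 4))

For p = 1 the construction reduces to the original GN with linear kernel, and Eq. (70) reduces to n_k = 4/[k(k + 1)(k + 2)], the familiar ν = 3 result.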
Thus for the linear attachment kernel, the average node degree does not affect the exponent ν of the degree distribution. However, for other solvable examples, the new feature of attaching the new node to more than one pre-existing node leads to different degree distributions. For example, for the shifted linear kernel we find

n_k = const. × Γ(k + w) / Γ(k + 3 + w + w/p)   for k ≥ p,   (71)

n_p = [1 + p(p + w)/(2p + w)]^{−1}.   (72)

This gives the asymptotic behavior n_k ∼ k^{−(3+w/p)}. Thus the exponent of the degree distribution depends on the average node degree, with ν = 3 + w/p.
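To make these formulae concrete, the short sketch below (an illustrative evaluation added here, not part of the original text) computes the local slope of ln n_k versus ln k from Eq. (71), which should approach −(3 + w/p), and tabulates the in-degree-zero fraction n_p of Eq. (72) for p = 7, the value relevant to the web-graph comparison discussed next. The parameter choices are illustrative.

import math

def nk_shifted(k, w, p):
    # Unnormalized n_k from Eq. (71): Gamma(k + w) / Gamma(k + 3 + w + w/p).
    return math.exp(math.lgamma(k + w) - math.lgamma(k + 3 + w + w / p))

def np_fraction(w, p):
    # Fraction of nodes with in-degree zero, Eq. (72).
    return 1.0 / (1.0 + p * (p + w) / (2.0 * p + w))

if __name__ == "__main__":
    w, p = 1.5, 7
    k1, k2 = 1000, 2000
    slope = math.log(nk_shifted(k2, w, p)) - math.log(nk_shifted(k1, w, p))
    slope /= math.log(k2) - math.log(k1)
    print("local slope:", round(slope, 3), " expected:", -(3 + w / p))
    for w in (-0.9, 0.0, 1.0, 5.0, 50.0):
        print("w =", w, " n_p =", round(np_fraction(w, p), 3))   # stays above 1/8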
The multiple linking construction also reduces the number of nodes with in-degree zero. For example, for the GN with the shifted linear attachment kernel, the fraction of such nodes is n_1 = (2 + w)/(3 + 2w), which is always larger than 1/2. However, for the multiple linking construction, the fraction of nodes with in-degree zero is reduced to the value n_p given in Eq. (72). If we use p = 7 to reproduce the correct average node degree of the web graph, the fraction of nodes with in-degree zero always exceeds 1/8, which apparently disagrees with web data [49]. Thus while multiple attachment does reduce the number of poorly-connected nodes, this reduction is still insufficient to account for web-graph data. However, it is clear that the multiple linking construction has the potential to provide a better description of citation data.

Another shortcoming of the multiple attachment construction is that it cannot dynamically generate a non-trivial out-degree distribution. However, we can extend the GN model by allowing for creation of links between existing nodes [50]. This simple construction allows us to generate non-trivial out-degree distributions which closely match web graph data. An even more challenging direction is to describe the global topological structure of growing networks. The GN model leads to a single-component tree graph, while the web graph has numerous disconnected components. A deeper understanding of the web graph may provide valuable insights to help develop algorithms for web crawling, searching, and community discovery.

We are grateful to NSF grant DMR9978902 and ARO grant DAAD19-99-1-0173 for partial financial support of this research.
[1] See, e.g., G. Caldarelli, P. G. Higgs, and A. J. McKane, J. Theor. Biol. 193, 345 (1998), and references therein.
[2] S. A. Kauffman, The Origins of Order: Self-Organization and Selection in Evolution (Oxford University Press, London, 1993).
[3] Models of Neural Networks I, edited by E. Domany, J. L. van Hemmen, and K. Schulten, 2nd updated ed. (Springer-Verlag, New York, 1995).
[4] B. Bollobás, Random Graphs (Academic Press, London, 1985).
[5] S. Janson, T. Luczak, and A. Rucinski, Random Graphs (Wiley, New York, 2000).
[6] A. J. Lotka, J. Washington Acad. Sci. 16, 317 (1926).
[7] W. Shockley, Proc. IRE 45, 279 (1957).
[8] E. Garfield, Science 178, 471 (1972).
[9] L. Egghe and R. Rousseau, Introduction to Informetrics (Elsevier, 1990).
[10] N. Gilbert, Sociol. Res. 2, 2 (1997).
[11] J. Laherrère and D. Sornette, Eur. Phys. J. B 2, 525 (1998).
[12] S. Redner, Eur. Phys. J. B 4, 131 (1998).
[13] M. E. J. Newman, cond-mat/0007214.
[14] S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, in Proc. 8th WWW Conf. (1999).
[15] J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, in Lecture Notes in Computer Science, Vol. 1627 (Springer-Verlag, Berlin, 1999).
[16] S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, in Proc. 25th VLDB Conf. (1999).
[17] B. A. Huberman, P. L. T. Pirolli, J. E. Pitkow, and R. Lukose, Science 280, 95 (1998).
[18] B. A. Huberman and L. A. Adamic, Nature 401, 131 (1999).
[19] M. Faloutsos, P. Faloutsos, and C. Faloutsos, Comput. Commun. Rev. 29(4), 251 (1999).
[20] A. L. Barabási and R. Albert, Science 286, 509 (1999).
[21] A. Medina, I. Matta, and J. Byers, Comput. Commun. Rev. 30(2), 18 (2000).
[22] R. Larson, in Ann. Meeting Amer. Soc. Info. Sci. (1996).
[23] S. Milgram, Psychol. Today 2, 60 (1967).
[24] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).
[25] A. L. Barabási, R. Albert, and H. Jeong, Physica A 272, 173 (1999).
[26] S. N. Dorogovtsev and J. F. F. Mendes, Phys. Rev. E 62, 1842 (2000).
[27] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Phys. Rev. Lett. 85, 4633 (2000).
[28] P. L. Krapivsky, S. Redner, and F. Leyvraz, Phys. Rev. Lett. 85, 4629 (2000).
[29] M. H. Ernst, in Fundamental Problems in Statistical Physics VI, edited by E. G. D. Cohen (Elsevier, New York, 1985).
[30] F. Leyvraz and S. Redner, Phys. Rev. Lett. 57, 163 (1986); Phys. Rev. A 36, 4033 (1987).
[31] F. Calogero and F. Leyvraz, J. Phys. A 33, 5619 (2000).
[32] H. A. Simon, Biometrika 42, 425 (1955).
[33] H. A. Simon, Models of Man (Wiley, New York, 1957).
[34] D. H. Zanette and S. C. Manrubia, nlin.AO/0009046.
[35] E. Becker and W. Döring, Ann. Phys. (Leipzig) 24, 719 (1935); see also J. S. Langer, in Solids Far From Equilibrium, edited by C. Godrèche (Cambridge University Press, Cambridge, 1992).
[36] E. M. Hendriks and M. H. Ernst, J. Colloid Interface Sci. 97, 176 (1984).
[37] T. Matsoukas and E. Gulari, J. Colloid Interface Sci. 132, 13 (1989).
[38] A. Pimpinelli and J. Villain, Physics of Crystal Growth (Cambridge University Press, Cambridge, 1998).
[39] P. L. Krapivsky, G. J. Rodgers, and S. Redner, in preparation.
[40] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science (Addison-Wesley, Reading, MA, 1989).
[41] See, e.g., N. V. Brilliantov and P. L. Krapivsky, Sov. Phys. Solid State 31, 271 (1989); J. Phys. A 24, 4787 (1991); J. A. Blackman and A. Wielding, Europhys. Lett. 16, 115 (1991); Ph. Laurençot, Nonlinearity 12, 229 (1999); P. L. Krapivsky, J. F. F. Mendes, and S. Redner, Phys. Rev. B 59, 15950 (1999).
[42] All of the citation data discussed here can be obtained from http://physics.bu.edu/~redner.
[43] M. Molloy and B. Reed, Random Struct. Alg. 6, 161 (1995); Combin. Probab. Comput. 7, 295 (1998).
[44] W. Aiello, F. Chung, and L. Lu, in Proc. 32nd ACM Symposium on Theory of Computing (2000).
[45] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, cond-mat/0007235.
[46] The apparently singular behavior at y = 2 indicates that a scale finer than linear in k is required in the proximity of y = 2. The proper scale is √k and its use resolves the singularity and matches solutions for y < 2 and y > 2.
[47] G. Caldarelli, R. Marchetti, and L. Pietronero, Europhys. Lett. 52, 386 (2000).
[48] R. Albert, H. Jeong, and A. L. Barabási, Nature 401, 130 (1999).
[49] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, Computer Networks 33, 309 (2000).
[50] P. L. Krapivsky, G. J. Rodgers, and S. Redner, preprint.