Graph Operations and Zipfian Degree Distributions
Walter Kirchherr
Department of Mathematics and Computer Science
San Jose State University
San Jose, CA 95192-0249
Abstract: The probability distribution on a set $S = \{1, 2, \ldots, n\}$ defined by $\Pr(k) = 1/(H_n k)$, where $H_n$ is the $n$th harmonic number, is commonly called a Zipfian distribution. In this note we look at the degree distributions of graphs created from graphs with Zipfian degree distributions using some standard graph-theoretic operations.
Introduction: The probability distribution on a set $S = \{1, 2, \ldots, n\}$ defined by $\Pr(k) = 1/(H_n k)$, where $H_n$ is the $n$th harmonic number, is commonly called a Zipfian distribution, after George Zipf (1902-1950), who observed it in relation to word frequency in English. (See, for example, the discussion in [1].) Recently, it has been observed (see, e.g., [4] and [5]) that the World Wide Web, modeled as a directed graph, has a degree sequence that is vaguely "Zipf-like". (The web is modeled as a graph in the obvious way: a website is a node and links between sites are edges.) In an earlier paper (see [2]) the question of the existence of graphs whose degree distributions are exactly Zipfian was addressed. Here we look at the degree distributions that result when graphs are constructed from Zipfian-distribution graphs using two graph-theoretic operations. We consider undirected graphs with no multiple edges and no isolated vertices, and show that each resulting graph has either a Zipfian degree distribution or
one that is nearly so. We close with two theorems about degree distributions that follow naturally from our calculations. We use the notation introduced in [2].

Definition: $Z(n, m)$ is the set of all graphs on $n$ vertices with maximum degree $m$ whose degree sequence exhibits a Zipfian distribution; that is, exactly $n/(d H_m)$ vertices have degree $d$ for each $1 \le d \le m$.

In [2] it was shown that $Z(H_m \cdot m!, m)$ is not empty and in fact contains a connected graph if and only if $m$ is not equal to 3 or 4. It is also noted that any $G \in Z(n, m)$ contains $(mn)/(2H_m)$ edges.

Graphs Built From Graphs with Zipfian Degree Distributions

For a warm-up, we first look at multiple copies of a graph in $Z(n, m)$.

Definition: $kG$ is the graph which consists of $k$ copies of $G$.

Theorem: If $G \in Z(n, m)$ then $kG \in Z(kn, m)$.

Proof: $kG$ has $k|V(G)| = kn$ vertices, $k|E(G)| = (knm)/(2H_m)$ edges, and maximum degree $m$. Thus
\[
|\{ w \in kG \mid d(w) = d \}| = k\,|\{ v \in G \mid d(v) = d \}| = k \cdot \frac{n}{d H_m}.
\]
Hence,
\[
\Pr(d(w) = d \text{ in } kG) = \frac{kn/(d H_m)}{kn} = \frac{1}{d H_m}.
\]
////
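As a quick numerical check of the counting in this proof, the following sketch tabulates the degree counts $n/(dH_m)$ for $n = H_m \cdot m!$ and confirms that taking $k$ copies leaves the distribution unchanged. The helper names (`harmonic`, `zipf_degree_counts`, `distribution`) are ours and do not come from [2]; the code works only with degree counts and does not construct an actual graph.

```python
from fractions import Fraction
from math import factorial


def harmonic(m):
    """Return the m-th harmonic number H_m as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, m + 1)), Fraction(0))


def zipf_degree_counts(m):
    """Degree counts of a hypothetical graph in Z(n, m) with n = H_m * m!.

    Returns (n, counts) where counts[d] = n / (d * H_m) for d = 1..m.
    """
    h = harmonic(m)
    n = h * factorial(m)
    counts = {d: n / (d * h) for d in range(1, m + 1)}
    return n, counts


def distribution(counts):
    """Turn degree counts into a probability distribution."""
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


m, k = 5, 3
n, counts = zipf_degree_counts(m)
copies = {d: k * c for d, c in counts.items()}  # degree counts of kG

# Every count is a whole number, and the counts add up to n.
assert all(c.denominator == 1 for c in counts.values())
assert sum(counts.values()) == n

# Sum of degrees = 2|E| = mn / H_m.
assert sum(d * c for d, c in counts.items()) == m * n / harmonic(m)

# kG has the same (Zipfian) distribution as G.
assert distribution(copies) == distribution(counts)
assert distribution(copies)[2] == 1 / (2 * harmonic(m))
```

Exact rational arithmetic is used so that the equalities are tested exactly rather than up to floating-point error.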
We now look at the product of two graphs. (The operation we call product, as defined below, is also known variously in the literature as Cartesian product, sum, and β-product. See, e.g., [3].)

Definition: $G \times H$ is the graph on vertex set $V(G) \times V(H)$ in which $(x_1, y_1)$ is connected to $(x_2, y_2)$ if and only if either $(x_1, x_2) \in E(G)$ and $y_1 = y_2$, or $x_1 = x_2$ and $(y_1, y_2) \in E(H)$.

In the following we let $G$ be a graph in $Z(n, m)$ and $H$ be a graph in $Z(t, r)$. Further, we use $v$ to indicate a vertex in $G$, $w$ to indicate a vertex in
$H$, and $(v, w)$ to indicate a vertex in $G \times H$. Without loss of generality, we assume that $r \le m$. We want to know the degree distribution of $G \times H$. It is known that $G \times H$ has $nt$ vertices and $n|E(H)| + t|E(G)|$ edges. It is further known that $\deg(v, w) = \deg(v) + \deg(w)$. (See [3].) If both $G$ and $H$ have Zipfian degree distributions, this means that $G \times H$ has
\[
|E(G \times H)| = \frac{1}{2}\left(\frac{nrt}{H_r} + \frac{mnt}{H_m}\right)
\]
edges. Thus, recalling that isolated vertices are not allowed and that $r \le m$, which means that the only nonzero terms are those with $1 \le i \le m$ and $1 \le d - i \le r$, we get
\[
\begin{aligned}
|\{ (v, w) \mid \deg(v, w) = d \}|
&= \sum_{i=1}^{d-1} |\{ v \mid \deg(v) = i \}| \cdot |\{ w \mid \deg(w) = d - i \}| \\
&= \sum_{i=\max(1,\,d-r)}^{\min(m,\,d-1)} |\{ v \mid \deg(v) = i \}| \cdot |\{ w \mid \deg(w) = d - i \}| \\
&= \frac{nt}{H_m H_r} \sum_{i=\max(1,\,d-r)}^{\min(m,\,d-1)} \frac{1}{i} \cdot \frac{1}{d-i}.
\end{aligned}
\]
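To make the product and the facts quoted from [3] concrete, here is a small sketch that builds $G \times H$ from edge lists and checks the vertex count, the edge count, and $\deg(v, w) = \deg(v) + \deg(w)$. The two input graphs (a path and a triangle) are arbitrary illustrations rather than Zipfian graphs, and the function names are ours.

```python
from itertools import product


def degrees(vertices, edges):
    """Map each vertex to its degree in a simple undirected graph."""
    deg = {v: 0 for v in vertices}
    for x, y in edges:
        deg[x] += 1
        deg[y] += 1
    return deg


def cartesian_product(vg, eg, vh, eh):
    """Return (vertices, edges) of G x H as defined in the text."""
    vertices = list(product(vg, vh))
    edges = set()
    # (x1, x2) in E(G) and y1 == y2
    for (x1, x2) in eg:
        for y in vh:
            edges.add(frozenset([(x1, y), (x2, y)]))
    # x1 == x2 and (y1, y2) in E(H)
    for (y1, y2) in eh:
        for x in vg:
            edges.add(frozenset([(x, y1), (x, y2)]))
    return vertices, [tuple(e) for e in edges]


# G: path a-b-c (n = 3), H: triangle (t = 3)
vg, eg = ["a", "b", "c"], [("a", "b"), ("b", "c")]
vh, eh = [0, 1, 2], [(0, 1), (1, 2), (0, 2)]

vp, ep = cartesian_product(vg, eg, vh, eh)
dg, dh, dp = degrees(vg, eg), degrees(vh, eh), degrees(vp, ep)

assert len(vp) == len(vg) * len(vh)                       # nt vertices
assert len(ep) == len(vg) * len(eh) + len(vh) * len(eg)   # n|E(H)| + t|E(G)|
assert all(dp[(v, w)] == dg[v] + dh[w] for v, w in vp)    # degrees add
```

Storing each edge as a `frozenset` of its two endpoints keeps the graph undirected and free of multiple edges, matching the class of graphs considered here.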
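To evaluate this, we look at
\[
\begin{aligned}
\sum_{k=1}^{b} \frac{1}{k} \cdot \frac{1}{n-k}
&= \frac{1}{n}\left[\sum_{k=1}^{b} \frac{1}{k} + \sum_{k=1}^{b} \frac{1}{n-k}\right] && \text{by partial fractions} \\
&= \frac{1}{n}\left[\sum_{k=1}^{b} \frac{1}{k} + \sum_{k=n-b}^{n-1} \frac{1}{k}\right] && \text{by a change of index in the second sum} \\
&= \frac{1}{n}\left[H_b + H_{n-1} - H_{n-b-1}\right] && \text{for } n \ge b + 1.
\end{aligned}
\]
Thus,
\[
\begin{aligned}
\sum_{k=a}^{b} \frac{1}{k} \cdot \frac{1}{n-k}
&= \frac{1}{n}\left[(H_b + H_{n-1} - H_{n-b-1}) - (H_{a-1} + H_{n-1} - H_{n-a})\right] \\
&= \frac{1}{n}\left(H_b - H_{a-1} - H_{n-b-1} + H_{n-a}\right) \qquad \text{for } n \ge b + 1,\ b \ge a, \text{ and } a \ge 2.
\end{aligned}
\]
(In the above, we assume that $H_0 = 0$.)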
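The identity is easy to test by brute force. The sketch below compares the term-by-term sum with the closed form in exact rational arithmetic; the function names are ours, and the loop also includes $a = 1$, which relies on the convention $H_0 = 0$.

```python
from fractions import Fraction


def harmonic(m):
    """H_m as an exact fraction, with the convention H_0 = 0."""
    return sum((Fraction(1, k) for k in range(1, m + 1)), Fraction(0))


def direct_sum(n, a, b):
    """Sum of 1/(k (n - k)) for k = a..b, computed term by term."""
    return sum((Fraction(1, k * (n - k)) for k in range(a, b + 1)), Fraction(0))


def closed_form(n, a, b):
    """(1/n) (H_b - H_{a-1} - H_{n-b-1} + H_{n-a})."""
    return (harmonic(b) - harmonic(a - 1)
            - harmonic(n - b - 1) + harmonic(n - a)) / n


# Check every admissible triple with n <= 12 (n >= b + 1, b >= a >= 1).
for n in range(2, 13):
    for b in range(1, n):
        for a in range(1, b + 1):
            assert direct_sum(n, a, b) == closed_form(n, a, b)
```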
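Hence,
\[
|\{ (v, w) \mid \deg(v, w) = d \}| = \frac{nt}{H_m H_r} \sum_{i=\max(1,\,d-r)}^{\min(m,\,d-1)} \frac{1}{i} \cdot \frac{1}{d-i}
\]
yields three cases, obtained by applying the identity above with $n = d$, $a = \max(1, d - r)$, and $b = \min(m, d - 1)$, and dividing by the $nt$ vertices of $G \times H$:

Case 1: $d - r \le 1$ (thus, $\max(1, d - r) = 1$) and $d - 1 \le m$ (thus, $\min(d - 1, m) = d - 1$). (This is the case of vertices of "small" degree.)
\[
\Pr(\deg(v, w) = d) = \frac{1}{d} \cdot \frac{1}{H_m H_r} \left[ H_{d-1} - H_0 - H_{d-(d-1)-1} + H_{d-1} \right] = \frac{2 H_{d-1}}{d\, H_m H_r}.
\]
(For $d < 2$ the sum is empty and the probability is 0, which fact, of course, follows from the definition of product.)

Case 2: $d - r > 1$ and $d - 1 \le m$. (This is the case of vertices of "middle-sized" degree.)
\[
\begin{aligned}
\Pr(\deg(v, w) = d) &= \frac{1}{d} \cdot \frac{1}{H_m H_r} \left[ H_{d-1} - H_{d-r-1} - H_{d-(d-1)-1} + H_r \right] \\
&= \frac{1}{d} \cdot \frac{1}{H_m H_r} \left[ H_{d-1} - H_{d-r-1} + H_r \right].
\end{aligned}
\]

Case 3: $d - 1 > m$. (This is the case of vertices of "large" degree.)
\[
\Pr(\deg(v, w) = d) = \frac{1}{d} \cdot \frac{1}{H_m H_r} \left[ H_m - H_{d-r-1} - H_{d-m-1} + H_r \right].
\]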
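A hedged numerical check of these probabilities: the sketch below evaluates $\Pr(\deg(v, w) = d)$ directly from the sum over $i$ and compares it with the closed form given by the harmonic-number identity with $n = d$, $a = \max(1, d - r)$, and $b = \min(m, d - 1)$. The parameters $m = 6$, $r = 4$ are an arbitrary example, and the function names are ours.

```python
from fractions import Fraction


def harmonic(m):
    """H_m as an exact fraction, with the convention H_0 = 0."""
    return sum((Fraction(1, k) for k in range(1, m + 1)), Fraction(0))


def product_pr_direct(d, m, r):
    """Pr(deg(v, w) = d) by summing over all admissible splits d = i + (d - i)."""
    hm, hr = harmonic(m), harmonic(r)
    return sum(
        (Fraction(1, i) / hm * Fraction(1, d - i) / hr
         for i in range(max(1, d - r), min(m, d - 1) + 1)),
        Fraction(0),
    )


def product_pr_closed(d, m, r):
    """Same probability from the harmonic-number identity with n = d."""
    a, b = max(1, d - r), min(m, d - 1)
    bracket = (harmonic(b) - harmonic(a - 1)
               - harmonic(d - b - 1) + harmonic(d - a))
    return bracket / (d * harmonic(m) * harmonic(r))


m, r = 6, 4  # example parameters with r <= m
probs = {d: product_pr_direct(d, m, r) for d in range(2, m + r + 1)}

# The closed form agrees with the direct sum for every possible degree,
# and the probabilities over d = 2..m+r add up to 1.
assert all(p == product_pr_closed(d, m, r) for d, p in probs.items())
assert sum(probs.values()) == 1
```

Because the range of $d$ runs from 2 to $m + r$, the check exercises the small, middle-sized, and large degree regimes in one pass.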
(These are all the cases, since $r \le m$.) Notice that a Zipfian distribution in this case would have the above probability as $1/(d H_{m+r})$. It is not too hard to believe that $G \times H$ is "vaguely Zipfian", but to state so definitively would require a precise notion of that concept. We leave that for later work.

Two Theorems for Distributions

The above calculations lead naturally to two simple theorems concerning the degree distributions of graphs built using the operations described above. The second theorem generalizes easily. We leave the proofs, which are implicit in the calculations above, to the reader.
Theorem 1: Let $G$ be a graph on $n$ vertices with degree distribution $\mu_G(d) = \Pr(\deg(v) = d)$. Then, letting $k \cdot G$ denote $k$ copies of $G$, we have
\[
\mu_{k \cdot G}(d) = \mu_G(d).
\]

Theorem 2: Let $G$ and $H$ be graphs with degree probability distributions $\mu_G$ and $\mu_H$, respectively, and let $G \times H$ be defined as above. Then
\[
\mu_{G \times H}(d) = \sum_{i+j=d} \mu_G(i) \cdot \mu_H(j).
\]

A straightforward induction generalizes the above theorem as follows: Given $G_1, G_2, \ldots, G_k$ with degree probability distributions $\mu_1, \mu_2, \ldots, \mu_k$, respectively, let $H = G_1 \times G_2 \times \cdots \times G_k$. Then
\[
\mu_H(d) = \sum_{i_1 + i_2 + \cdots + i_k = d} \ \prod_{j=1}^{k} \mu_j(i_j).
\]
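The generalized formula is a $k$-fold convolution, which can be sketched as a repeated pairwise convolution. The distributions below (a path on three vertices and a triangle) are illustrative inputs of our own choosing, and `convolve` and `product_distribution` are hypothetical helper names.

```python
from fractions import Fraction
from functools import reduce


def convolve(mu_g, mu_h):
    """Degree distribution of G x H from those of G and H (Theorem 2)."""
    out = {}
    for i, p in mu_g.items():
        for j, q in mu_h.items():
            out[i + j] = out.get(i + j, Fraction(0)) + p * q
    return out


def product_distribution(mus):
    """Degree distribution of G1 x G2 x ... x Gk by repeated convolution."""
    return reduce(convolve, mus)


# Illustrative inputs: a path on 3 vertices (degrees 1, 2, 1) and a
# triangle (all degrees 2).
path = {1: Fraction(2, 3), 2: Fraction(1, 3)}
triangle = {2: Fraction(1)}

mu = product_distribution([path, triangle, path])
assert sum(mu.values()) == 1
assert mu == {4: Fraction(4, 9), 5: Fraction(4, 9), 6: Fraction(1, 9)}
```

Folding with `reduce` mirrors the induction: each step applies Theorem 2 to the product built so far and the next factor.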
References
[1] R. Graham, D. Knuth, and O. Patashnik, Concrete Mathematics, Addison-Wesley, 1989.
[2] W. Kirchherr, "On graphs with a Zipfian degree distribution", in preparation.
[3] W. Kirchherr, "NEPS operations on cordial graphs", Discrete Mathematics 115 (1993), pp. 201-209.
[4] J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, "The web as a graph: measurements, models, and methods", in manuscript.
[5] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, "Extracting large-scale knowledge bases from the web", Proceedings of the 25th VLDB Conference, 1999, Edinburgh, Scotland.