Supplemental Information for Dimensionality of social networks using ...

2 downloads 79 Views 320KB Size Report
The geometric-protean model (GEO-P) model is a model for online social networks ... geometric and ranking information into an evolving network structure.
Supplemental Information for Dimensionality of social networks using motifs and eigenvalues Anthony Bonato,1∗ David F. Gleich,2∗ Myunghwan Kim,3 Dieter Mitsche,4 Pawel Pralat,1 Amanda Tian,1 Stephen J. Young5 1 Department of Mathematics, Ryerson University Toronto, Ontario, Canada 2 Computer Science Department, Purdue University West Lafayette, Indiana, United States of America 3 Electrical Engineering Department, Stanford University Stanford, California, United States of America 4 Laboratoire J.A. Dieudonn´ e, Universit´ e de Nice Sophia-Antipolis Nice, France 5 Mathematics Department, University of Louisville Louisville, Kentucky, United States of America ∗ To whom correspondence should be addressed. E-mail: [email protected], [email protected].

Memoryless GEO-P Model Review of GEO-P The geometric-protean model (GEO-P) model is a model for online social networks which incorporates geometric and ranking information into an evolving network structure. More specifically, the GEO-P model, as defined by Bonato, Janssen, and Pralat [3], defines a sequence of graphs {Gt : t ≥ 0} on n nodes where Gt = (Vt , Et ), based on four parameters: the attachment strength α ∈ (0, 1), the density parameter β ∈ (0, 1 − α), the dimension m ∈ N, and the link probability p ∈ (0, 1]. Each node v ∈ Vt has a unique rank r(v, t) ∈ [n] where [n] = {1, 2, . . . , n}; we explicitly list r(v, t) to emphasize that the rank may change with time. In order to stay consistent with the standard usage, the highest rank is 1 and the lowest rank is n. Additionally, each node has a geometric location in [0, 1]m under the torus metric d(·, ·). m That is, for any two points x, y ∈ [0, 1]m , d(x, y) is defined to be min {kx − y − uk∞ : u ∈ {−1, 0, 1} } . We note that this implies that the geometric space is symmetric in any point as the metric “wraps” around. For any node v, we define its influence region at time t ≥ 0, written R(v, t), to be the ball of Euclidean volume r(v, t)−α n−β centered at v. Notice that, since the we are in the torus metric, this is a cube measuring r(v, t)−α/m n−β/m on a side. Note. All asymptotic results in this paper are with respect to n. We say that a statement holds with extremely high probability, if it holds with probability at least 1 − exp(−ω(n) log n) for some function ω(n) with ω(n) → ∞ as n → ∞. In particular, if there are a polynomial number of events, each of which holds with extremely high probability, then all of them hold with extremely high probability. Let G0 be any graph. In order to form Gt from Gt−1 , first choose a node w uniformly at random from Vt−1 and remove it. The remaining nodes are re-ranked, that is, all nodes with lower ranks than w decrease their ranks by 1. Then place a node v uniformly at random in [0, 1]m , generate uniformly at random a rank for v, and re-rank the remaining nodes again. Finally, for every node u which is such that v is in the influence region of u, add the edge {u, v} with probability p. It is clear that this process depends only on the current state of Gt , and so forms a ergodic Markov chain with a limiting distribution π. A random instance of GEO-P is then defined to be a sample from this limiting distribution. 1

It is clear that the distributions of edges of Gt are determined by the relative rank histories of all the nodes at the time the other nodes entered. More specifically, if we order the nodes of Gt according to their age with node 1 being the oldest, then for any i > j the probability of the edge {i, j} being present is determined by their respective geometric locations and the rank of node j when node i arrives. Thus, in order to sample from the limiting distribution π it suffices to sample from the distributions of node histories, then randomly assign locations to the nodes, and determine if the edges are present. We note that according to the distribution π the final permutation between ages and ranks is uniformly distributed over all permutations. Since there are n! permutations of nodes and at most n2 different permutations reachable from a given state, it takes at least logn2 (n!) = n2 (1 − o(1)) iterations to reach the stationary distribution. Standard results in the mixing rate of random graphs  suggest that in order to assure that a sample is close to the stationary distribution at least Ω log nn!2 = Ω(n log(n)) iterations are required. In fact, it is easy to see that the stationary distribution is reached at the time when the last node from the initial graph G0 is removed, which happens with probability 1 + o(1) after (1 + o(1))n log n steps, by the coupon collector problem.

Introducing MGEO-P For large n this number of iterations is a significant computational roadblock, so we introduce here a variant of the GEO-P model which we call a memoryless geometric-protean graph (MGEO-P). In essence this model is the GEO-P model where the each node has forgotten its history of ranks. More specifically, a permutation σ on [n] is chosen uniformly at random and σ(i) represents the rank of the ith oldest node. Thus, for each pair i > j the edge {i, j} is potentially present if and only if the node j is in the ball of volume σ(i)−α n−β centered around node i. It is worth noting that,√as shown in Bonato et 2 al. Lemma 5.2[3], if a node    in the GEO-P model receives an initial rank R ≥ n log n, then its rank

is R 1 + O log− /2 (n) = R(1 + o(1)) for its entire lifetime with extremely high probability. Thus, if we imagine coupling the MGEO-P model in the natural way to GEO-P, and assuming that ranks do not change much as mentioned above, we have that for all but a vanishing fraction of the edges, the  probability that a given edge is present in one model but not the other is O pn−α+2β/2 log(n)1−4α/2 . Hence, we would intuitively expect that the MGEO-P model would not differ too much from GEO-P model. In order to confirm this we prove that the parameters we are interested in do not differ by much from the proven parameters of the GEO-P model. Specifically, we look at the average degree, the degree distribution, and the diameter. 1

An equivalent description of the MGEO-P model We now describe a model that is equivalent to the MGEO-P model just introduced, but that we found useful for our analysis. It has a different interpretation. The key change is that we reverse the way links are formed: when a node i arrives in the network, then all existing nodes j form links to i if i is within the influence regions of j. Intuitively, this models how links may arise in a citation network – a new paper links to those that are topically related (that is, nearby in the metric space) or highly influential. In the language we used above, this process is: fix a permutation σ on [n] chosen uniformly at random and σ(i) represents the rank of the ith oldest node. Thus, for each pair i > j the edge {i, j} is potentially present if and only if the node i is in the ball of volume σ(j)−α n−β centered around node j. The two descriptions are equivalent as we can simply reverse the order of vertex arrivals. Thus, they induce the same distribution over graphs because the order is a uniform random choice.

The average degree In order to consider the degrees, we first need the following standard result on the tails of the hypergeometric distribution, see for instance Jansen et al.[4] 2

Lemma 1. Let X be the number of red balls in a set of t balls chosen at random from a set of n balls containing m red balls. Then, E[X] = tm n , and for any  > 0,   2 −  2 tm ≤ e 2+ 3 P X ≥ (1 + ) n

tm n

.

Further, for any  ∈ (0, 1),   2 tm tm P X ≤ (1 − ) = e− 2 n . n Theorem 1. Let α ∈ (0, 1), β ∈ (0, 1−α), n ∈ N, m ∈ N, p ∈ (0, 1]. Let v be a node of MGEO-P(n, m, α, β, p) with rank R and age i, then   s   2 i−1 p log (n)  , deg(v) = n1−α−β + (n − i)pR−α n−β 1 + O n−11−α n1−α−β with extremely high probability. Proof. Let deg+ (v) denote the number of older neighbors of v and let deg− (v) denote the younger neighbors of v. In order to determine deg+ (v) we consider connecting v to nodes of all ranks other than R and keeping i − 1 of those uniformly at random. The expected degree of v before the edge deletion is n X

pr

−α −β

n

− pR

−α −β

n

= pn

−β

Z

n

x−αdx + O(1) =

1

r=1

p n1−α−β + O(1) . 1−α

  i−1 p 1−α−β Thus, E deg+ (v) = n−1 + O(1). Furthermore, by Chernoff bounds the initial degree of v is 1−α n    log(n) p 1−α−β with extremely high probability, and thus, Lemma 1 gives that 1 + O √ 1−α−β 1−α n n



deg+ (v) ≤

s

i−1 p n1−α−β 1 + O n−11−α

log

2

(n)nα+β i

 

with extremely high probability as well. Additionally, if i ≥ log3 (n)nα+β , then equality holds. Since the edge probability between v and the younger nodes does not depend on the rank of the younger neighbors, deg− (v) can be expressed as a sum of independent random variables which has expectation (n − i)pR−α n−β . Hence, by Chernoff bounds it follows that with extremely high probability  s  2 α nβ log (n)R  . deg− (v) ≤ (n − i)pR−α n−β 1 + O n−i Now if n − i ≥ log3 (n)Rα nβ , then equality holds. Combining deg+ (v) and deg− (v) we have that with extremely high probability s  s 2 2 i−1 p log (n)i log (n)(n − i) . deg(v) ≤ n1−α−β + (n − i)pR−α n−β + O + n 1−α nα+β R α nβ

3

In order to express the error in a multiplicative faction, we note that log2 (n)i nα+β

2

 log2 (n) ∈ O 1−α−β n     2 log2 (n)i Rα nβ log2 (n) ∈ O nα+β n−i n1−α−β   2  log2 (n)(n − i) Rα nβ log2 (n) ∈ O R α nβ n−i n1−α−β     2 log2 (n)(n − i) nα nβ log2 (n) ∈ O 1−α−β Rα nβ i−1 n 

nα+β i−1



i≥

n , 2

i≤

n , 2 

i≤n−n  i≥n−n

R 2n



R 2n



, .

Thus, for the entire range of i both of the error terms are individually dominated by one of the primary terms and hence, we have that with extremely high probability  s    2 i−1 p log (n)  , deg(v) ≤ n1−α−β + (n − i)pR−α n−β 1 + O n−11−α n1−α−β and furthermore if log3 (n)nα+β ≤ i ≤ n − log3 (n)Rα nβ , then equality holds. Noting that s     α   3 2 E deg+ (v) n i−1 1 R log (n) log (n)   = ∈ O 1−α−β ⊂ o n−1n−i1−α n n n1−α−β E deg− (v) where i ≤ log3 (n)nα+β , and  s     2  n α E deg− (v) log3 (n) n−1n−i log (n) ,   = (1 − α) ∈ O 1−α−β ⊂ o n i−1 R n n1−α−β E deg+ (v) where n − i ≥ log3 (n)Rα nβ completes the proof. Theorem 2. Let α ∈ (0, 1), β ∈ (0, 1 − α), n ∈ N, m ∈ N, and p ∈ (0, 1], then with extremely high probability the average degree of node of MGEO-P(n, m, α, β, p) is s   2 log (n) p  . n1−α−β 1 + O d= 1−α n1−α−β Proof. From the proof of Theorem 1 we have that with extremely high probability for a node v with age i,  s  2 i − 1 p log (n)  , deg+ (v) ≤ n1−α−β 1 + O n−11−α n1−α−β with equality if i ≥ log3 (n)nα+β . Now since every edge is counted exactly once in deg+ (u) for some node

4

u, the average degree is with extremely high probability 2 |E| 2X = deg+ (v) n n v  s  n 2 2 X i−1 p log (n)  ≤ n1−α−β 1 + O n i=1 n − 1 1 − α n1−α−β  s  n 2 X log (n) 2pn−α−β     (i − 1) = 1+O n1−α−β (n − 1)(1 − α) i=1   s   2 2pn−α−β log (n) n     = 1+O n1−α−β (n − 1)(1 − α) 2  s  2 p log (n)  . = n1−α−β 1 + O 1−α n1−α−β

In a similar manner, we find that  s    3  2 2 |E|  p 2pn−α−β log (n)nα+β log (n) 1−α−β    ≥ 1+O n − n n1−α−β 1−α (n − 1)(1 − α) 2  s     log2 (n)  log6 (n) p = 1 + O  n1−α−β 1 + O 1−α−β 2−2α−2β n n 1−α   s 2 p log (n)  , = n1−α−β 1 + O 1−α n1−α−β completing the proof.

The degree distribution Let Nj be the number of nodes in MGEO-P(n, m, α, β, p) with degree precisely j and let N≥k = P ∞ j=k Nj be the number of nodes in degree MGEO-P(n, m, α, β, p) with degree at least k. We will 1 show that similarly to the geometric protean graphs, N≥k ∝ k − α for a significant range of k, and thus, MGEO-P(n, m, α, β, p) exhibits a power-law degree distribution over that range with power-law exponent 1 + α1 . Following prior work[3] we will characterize the pairs (i, R) of ages and ranks which will assure that the degree of a node is at least k and show that this value concentrates about its expectation using the following specialization of the Azuma-Hoeffding inequality. Theorem 3 (McDiarmid’s Inequality). If X1 , X2 , . . . , Xn are independent random variables and f (x1 , x2 , . . . , xn ) is a function such that for every i ∈ [n] |f (x1 , . . . , xi , . . . , xn ) − f (x1 , . . . , x ˆi , . . . , xn )| ≤ ci , then for any  > 0 P(|f (X1 , X2 , . . . , Xn ) − E[f (X1 , X2 , . . . , Xn )]| > ) < 2e 5

2 2 i ci

− P

.

We will use the notation f (n)  g(n) if f (n)/g(n) → ∞ as n → ∞. Similarly, f (n)  g(n) if g(n)/f (n) → ∞ as n → ∞. Moreover, it will be convenient not to worry about less significant factors, ˜ (n)) to denote any function which is at most f (n) logO(1) n. so we will use O(f Theorem 4. Let α ∈ (0,q1), β ∈ (0, 1 − α), n ∈ N, m ∈ N, p ∈ (0, 1], and let k and  be such that 1− α −β 1−β 1 1 log2 (n) n1−α−β n1−α−β  k  nlogα2(n) ,  , and  ≥ c log(n)k α n 2 − α for some c > 0, then with , k n1−α−β extremely high probability MGEO-P(n, m, α, β, p) satisfies that N≥k = (1 + O())

α 1/α (1−β)/α −1/α p n k . 1+α

Proof. We first note that if the age rank pair (i, R) for a node v satisfies that   α1 R 1−α−β n − i 1 ≤ (1 − ) pn , n n k then by Theorem 1 with extremely high probability   s  2 log (n) i−1 p  n1−α−β + (n − i)pR−α n−β 1 + O deg(v) = n−11−α n1−α−β s  !  −α 2 i−1 p n − i R log (n)  = n1−α−β + p n1−α−β 1 + O n−11−α n n n1−α−β  s    2 k i−1 p log (n) 1 + O  n1−α−β + ≥ n−11−α (1 − )α n1−α−β 

k (1 + o()) (1 − )α k + o(k) = (1 − )α = k + αk + o(k)

=

> k. Similarly, if   α1 R 1−α−β n − i 1 ≥ (1 + ) pn , n n k then with extremely high probability deg(v) < k. Let Xi be the event that the node with age i has rank R satisfying  1 n−i1 α R ≤ (1 − )n pn1−α−β , n k and let Yi be the event that the node with age i has rank R satisfying    α1  α1 1−α−β n − i 1 1−α−β n − i 1 ≤ R ≤ (1 + )n pn . (1 − )n pn n k n k

6

Letting X =

P

i

Xi and Y =

P

i

Yi we have that X ≤ N≥k ≤ X + Y . Thus, consider

E[X] =

X

E[Xi ]

i

=

X

 (1 − ) pn

1−α−β n

i

 = (1 − ) 

pn−α−β k −α−β

−i1 n k

 α1 X

 α1

1

(n − i) α

i

 α1 

 1+α α α = (1 − ) n + O(1) k 1+α  1−α−β  α1   1  α pn  α +o = (1 − ) n . 1+α k n pn

2 E[X]. We note as well that E[Y ] = 1− We recall that the age-rank pairs can be represented by a permutation σ chosen uniformly at random from the symmetric group, and thus, it can be generated by a sequence of transpositions (1, a1 )(2, a2 ) · (n, an ) where each ai is chosen independently and uniformly at random from {i, i + 1, . . . , n}. Thus, X (and Y ) may be viewed as a function of independent random variables and so Theorem 3 applies. Furthermore, the change of any particular variable impacts the value of X by at most 2. Hence, we note that,  1−α−β  α2  1 2 α2 pn 2 2 2 ∈ Ω log2 (n) ,  E[X] ≥  (1 − ) n 2 n (1 + α) k

and thus, with extremely high probability |X − E[X]| ≤ E[X] and |Y − E[Y ]| ≤ E[X]. Hence, we have that E[X] − E[X] ≤ N≥k ≤ E[X] + 3E[X] , with extremely high probability and the desired result follows. We note that by choosing  = log− /3 (n) we can easily obtain the same type of degree distribution result for MGEO-P that exists for the original GEO-P[3]. 1

Theorem 5. Let α ∈ (0, 1), β ∈ (0, 1 − α), n ∈ N, m ∈ N, p ∈ (0, 1], and n1−α−β log /2 (n) ≤ k ≤ n1− 1

α/2−β

log−2α−1 (n),

then with extremely high probability MGEO-P(n, m, α, β, p) satisfies    α 1 1 (1−β)/α − 1 N≥k = 1 + O log− /3 (n) p /α n k α, 1+α where N≥k is the number of nodes of degree at least k.

The diameter Theorem 6. Let α ∈ (0, 1), β ∈ (0, 1−α), n ∈ N, m ∈ N, p ∈ (0, 1]. The diameter of MGEO-P(n, m, α, β, p) 1 is nΘ( m ) with extremely high probability.

7

  β ˜ n (1−α)m ∈ nO( m1 ) . To this end, let Proof. We first show that the diameter is O t = 10



n

β (1−α)

ln

2α 1−α

(n)

1/m 

β < 1, by and divide [0, 1)m into tm uniform subcubes with side-lengths 1t in the natural way. Now, as (1−α)   β 1− ˜ n (1−α) nodes in each of the subcubes with extremely high probability. Chernoff bounds there are Θ   β ˜ n (1−α)m it suffices to show that for any two nodes u and v at Thus, in order to show diameter O

`∞ -distance at most 2/t the graph distance between the two nodes is at most some fixed constant. Now consider an arbitrary node v. By Chernoff bounds, with extremely high probability there are  α+β Ω n1−α−β nodes at `∞ distance at most 21 n− m from v and age rank at least n2 (that is, young α+β nodes). As the radius of influence of every node is at least 21 n− m , this implies that with extremely high  α+β probability every node has Ω n1−α−β neighbors at `∞ -distance at most 12 n− m with age rank at least n 2. In a similar manner, by combining Lemma 1 and Chernoff bounds, with extremely high probability every node v with age rank at least n2 has      Ω (1/t)m nβ/(1−α) ln2/(1−α) n = Ω ln−2α/(1−α)+2/(1−α) n = Ω ln2 n neighbors at `∞ -distance at most

1 2t ,

β

with rank at most n 1−α ln2/(1−α) n, and age rank at most n/2 (that β

is, old nodes). Note that each node with rank at most n 1−α ln2/(1−α) n has radius of influence at least 1/m 5 1  −αβ = (1 + o(1)) . n 1−α −β ln−2α/(1−α) n 2 t Combining these two observations we have that, with extremely high probability, every node v is  α+β 1 of a set of Θ ln2 (n) nodes, Xv , with rank at within graph-distance two and `∞ -distance n− m + 2t β

most n 1−α ln2/(1−α) n. Thus, if u and v are at `∞ -distance at most 2t , the distance between elements of α+β Xu and Xv is at most 2t + 2n− m + 1t ≤ 4t . On the other hand, as we already mentioned, the radius of influence of each node in Xu or Xv is at least 4/t. Thus, with extremely high probability, some member of Xu and Xv are adjacent and hence u and v are within graph distance 5, completing the proof of the upper bound. For the lower bound, let us take some node v and consider distances to other nodes. With probability 1 − 2−m some other node is at `∞ -distance at least 1/4. Hence, by Chernoff bounds, with extremely high probability there exist two nodes at `∞ -distance at least 14 . As the  diameter of every influence region is β

β

at most n− m , this gives that the diameter of the graph is Ω n m

∈ nΩ(1/m) .

Experimental design Given a graph G = (V, E), we employ the following methods to determine the dimension m of the MGEO-P models: Experiment 1 1. Set n to the number of nodes. Determine values of α and β independently of m (see the appendix of the original paper). 2. Simulate 50 samples of an MGEO-P network with m varying between 1 and 12. 8

3. Compute the graphlet counts for each sample of MGEO-P and train a SVM classifier to predict the dimension of the network given the samples. 4. Compute the graphlet counts for the graph G and use the output from the classifier as the dimension m of the network. Experiment 2 1 & 2. As in experiment 1. 3. Compute the spectral density for one sample of MGEO-P for each m between 1 and 12 (only one MGEO-P sample is used to get the density).1 4. Compute the spectral density of the graph G and find the value of m that minimizes the KL-divergence between the density from the graph and the MGEO-P samples. The first approach employs a complex statistical technique—the support vector machine classifier—to determine nonlinear predictive correlations among the graphlet counts and the dimension. This sophistication renders the method opaque and difficult to interpret the precise similarity mechanism. The second approach is simple and still illustrates the dimensional scaling, although the precise dimensions differ, which indicates that it is matching the network in a different way.

Estimating dimensions using graphlets and support vector machines The relationship between the dimension of a graph and its graphlets is highly nonlinear and so we used a multi-class support-vector machine (SVM) based classification tool from WEKA to predict this relationship. In this case, each dimension is a class, but as an SVM can only make a binary decision we train the SVM using a dimension-vs-dimension classification. That is, we build a classifer to predict dimension 5-vs-dimension 3, dimension 5-vs-dimension 4, etc. so there are 66 = “12-choose-2” SVMs trained. The dimension picked most often among these classifiers is the predicted class; this is the standard behavior of the sequential minimal optimization classifier (SMO) used in Weka. The dimension of a real-world network is then predicted by running this classifier on the graphlet counts of the networks. An alternative methodology (which has had some previous success) would be to to train the classifier using alternating decision trees; however this training methodology significantly restricts the behavior of the classifier and produces inconsistent results.

Comparing spectral densities Given the eigenvalues of the normalized Laplacian, we compute a spectral density by taking a 201-bin histogram of these eigenvalues. We then use the KL-divergence between these histograms as used in Banerjee and Jost (2009) as a measure of similarity. If P A and P B are the histograms of networks A and B normalized to probabilities, then for our 201-bin histograms we have that: KL(A, B) =

201 X

log(PiA /PiB )PiA .

i=1

We select the single best dimension based on the value of m that minimizes the KL divergence KL(S, Gm ) where S is the sample of either Facebook or LinkedIn and Gm is a sample of a MGEO-P network with dimension m. We add 1 to all of the eigenvalue counts in the histogram as a form of smoothing for the probabilities. We define a dimension interval by looking at the maximum interval such that the extreme points are within 105% of the true minimum. 1 We only use one sample of the MGEO-P network to estimate the eigenvalue distribution as these computations are time-consuming and our preliminary studies showed that the spectral density had only small variations between repeated samples.

9

Sensitivity studies In the following sections, we study how the predicted dimension changes due to large scale structural changes in the graph. We focus our efforts on studying the Facebook samples as the LinkedIn samples are highly correlated due to the temporal nature of their construction. Our results show that 1. Erd˝ os R´enyi random graphs have no apparent dimension. 2. The graphlet fitting methodology is influenced by the degree distribution in a way that generates high variance in the predicted dimension but where a logarithmic trend may still exist. This effect is not present in the spectral histograms. 3. The graphlet fitting methodology is robust to changing 10% of the edges of the network via a random percolation process.

Dimensions of Erd˝ os R´ enyi random graphs In our first experiment to verify the relevance of our dimensionality fits, we attempt to fit the dimension of an Erd˝ os R´enyi random graph with the same number of expected edges. That is, for each of the samples of the Facebook network, we run the SVM dimension classifier we constructed on the graphlet counts of 50 separate Erd˝ os R´enyi random graph samples where the probability is designed to yield the number of edges of the original network in expectation. In all but 3 of the 5000 examples (50 samples for each of the 100 graphs), the predicted dimension is the maximum 12. When the dimension was not the maximum in those three cases, it was 11. When we tried this with dimensions up to 10, then the Erd˝ os R´enyi random graphs fit to the dimension 10, thus, we expect these graphs to be predicted at the highest dimension of the training set. We see this as evidence that our graphlet methodology is sensitive to clearly erroneous graphs.

Dimensions of random graphs with the same degree distribution In our second experiment to verify the relevance of our dimension fits, we attempt to fit the dimension of a graph with the same degree distribution as one of the Facebook networks but with edges randomly drawn. To generate these graphs, we use the Bayati-Saberi-Kim procedure[1] as implemented in the bisquik library[2]. This method terminated for 92 of the 100 graphs. (The process did not terminate in the other 8 cases, which is a limitation of this particular sampling scheme.) The dimensional fits for these 92 resampled networks are shown in Figure S1. The eigenvalue fits show no logarithmic scaling in the dimension whereas the graphlet fits do. However, the variance in the predicted dimensions based on graphlets is substantially higher for these random samples compared to the original networks (see Figure 4 in the main text). The evidence from graphlets alone, is then, possibly biased due to the degree distribution. However, the results from the spectral histograms, the graphlets, and the prediction dimension from the model itself encourage us to be more optimistic.

Dimension variance with random percolation In our final experiment, we study random percolation of the predicted dimension of the Facebook networks. In a random percolation process, we randomly sample an edge from the network, delete it, replace it with an edge between two randomly drawn nodes, and continue until we have done this procedure k times. We study how the predicted dimension varies as we change 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50% of the total edges of a network. For each of the 100 Facebook networks, and each 10

12

10

10 Predicted dimension

Predicted dimension

12

8 6 4 2

SVM

8 6 4 2

Linear Fit R2=0.478606 0 3

4

10

Eigen Linear Fit R2=0.716962

0

10

3

4

10

Vertices

10 Vertices

Figure S1: Predicted dimensions of random graphs with the same degree distribution. At left, the predicted dimension using our SVM-Graphlet methodology and at right, the prediction dimension using our spectral histogram methodology percentage of total edges, we repeat the percolation process 10 times. This generates 110 total networks for each Facebook network. Figure S2 shows a box-plot of how the predicted dimension varies for each perturbation level over all 1100 total graphs. This plot suggests that the dimension is unchanged until more than 15% of the edges have been percolated. This figure further illustrates that the predicted dimension is a stable quantity for a network that is not overly sensitive to small perturbations.

Online codes Our experimental codes and data for the graphlet and spectral densities are available from https://www.cs.purdue.edu/homes/dgleich/codes/geop-dim/

11

Absolute deviation of perturbed prediction

7 6 5 4 3 2 1 0 −1 0.01 0.05

0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Fraction of perturbed edges

0.5

Figure S2: The change in the predicted dimension based on the graphlets methodology as we randomly percolate small or large fractions of the total edges in the network. Each box-plot represents the results over all 100 Facebook networks. The label 0.05 corresponds to randomly altering 5% of the total edges in a network. The line tracks the mean over all the samples.

12

Full statistics of Facebook data Table of properties of the Facebook networks. Name

Nodes

Edges

PL Exp

Eff. Diam

α

β

DimG

DimE

Caltech36 Reed98 Haverford76 Simmons81 Swarthmore42 Amherst41 Bowdoin47 Hamilton46 Trinity100 USFCA72 Williams40 Oberlin44 Smith60 Wellesley22 Vassar85 Middlebury45 Pepperdine86 Colgate88 Santa74 Wesleyan43 Mich67 Bucknell39 Brandeis99 Howard90 Rice31 Rochester38 Lehigh96 Johns-Hopkins55 Wake73 American75 MIT8 William77 UChicago30 Princeton12 Carnegie49 Tufts18 UC64 Vermont70 Emory27 Dartmouth6 Tulane29 WashU32 Villanova62 Vanderbilt48 Yale4 Brown11 UCSC68 Maine59 Georgetown15 Duke14 Bingham82 Mississippi66 Northwestern25

769 962 1446 1518 1659 2235 2252 2314 2613 2682 2790 2920 2970 2970 3068 3075 3445 3482 3578 3593 3748 3826 3898 4047 4087 4563 5075 5180 5372 6386 6440 6472 6591 6596 6637 6682 6833 7324 7460 7694 7752 7755 7772 8069 8578 8600 8991 9069 9414 9895 10004 10521 10567

16656 18812 59589 32988 61050 90954 84387 96394 111996 65252 112986 89912 97133 94899 119161 124610 152007 155043 151747 138035 81903 158864 137567 204850 184828 161404 198347 186586 279191 217662 251252 266378 208103 293320 249967 249728 155332 191221 330014 304076 283918 367541 314989 427832 405450 384526 224584 243247 425638 506442 362894 610911 488337

7.00 4.38 7.00 4.74 5.60 5.64 5.80 4.63 5.83 4.13 5.17 4.83 5.78 4.64 5.93 6.59 5.27 6.07 5.74 4.72 5.03 5.70 4.98 6.48 6.30 5.34 6.40 5.57 5.71 4.85 5.24 5.19 4.80 5.35 4.98 5.48 5.71 4.72 6.18 5.45 6.64 5.23 5.58 5.61 5.79 4.85 5.11 5.25 4.91 5.52 5.96 5.46 5.65

3.81 3.88 3.63 3.92 3.77 3.81 3.81 3.79 3.84 3.97 3.82 3.96 3.84 3.92 3.82 3.92 3.88 3.81 3.90 3.92 4.26 3.85 3.85 3.81 3.82 3.96 3.90 4.07 3.86 4.07 4.02 3.90 4.27 4.00 4.04 4.22 4.58 4.24 4.02 4.11 4.10 3.94 4.09 3.91 4.02 4.01 4.56 4.29 4.13 4.02 4.08 3.90 3.98

0.17 0.30 0.17 0.27 0.22 0.22 0.21 0.28 0.21 0.32 0.24 0.26 0.21 0.27 0.20 0.18 0.23 0.20 0.21 0.27 0.25 0.21 0.25 0.18 0.19 0.23 0.19 0.22 0.21 0.26 0.24 0.24 0.26 0.23 0.25 0.22 0.21 0.27 0.19 0.22 0.18 0.24 0.22 0.22 0.21 0.26 0.24 0.24 0.26 0.22 0.20 0.22 0.22

0.27 0.17 0.23 0.22 0.20 0.21 0.23 0.15 0.23 0.19 0.21 0.22 0.27 0.21 0.26 0.27 0.22 0.25 0.25 0.20 0.29 0.25 0.23 0.26 0.27 0.27 0.30 0.28 0.25 0.26 0.27 0.26 0.27 0.26 0.26 0.29 0.36 0.29 0.30 0.29 0.34 0.26 0.29 0.26 0.29 0.24 0.33 0.33 0.25 0.28 0.33 0.26 0.30

4 3 4 4 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 4 5 4 4 5 4 5 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 5 5 5 5 5

5 5 5 5 6 6 6 6 6 5 6 6 6 5 6 7 6 6 6 6 6 6 6 8 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 6 6 7 6 6

13

Name

Nodes

Edges

PL Exp

Eff. Diam

α

β

DimG

DimE

Cal65 BC17 Stanford3 Columbia2 Notre-Dame57 GWU54 Baylor93 USF51 Syracuse56 Temple83 UC61 Northeastern19 JMU79 UPenn7 UCSB37 UCF52 UCSD34 Harvard1 MU78 UMass92 UC33 Tennessee95 UVA16 UConn91 Oklahoma97 USC35 UNC28 Auburn71 Cornell5 BU10 UCLA26 Maryland58 Virginia63 NYU9 Berkeley13 Wisconsin87 UGA50 Rutgers89 FSU53 Indiana69 Michigan23 UIllinois20 Texas80 MSU24 UF21 Texas84 Penn94

11247 11509 11621 11770 12155 12193 12803 13377 13653 13686 13746 13882 14070 14916 14935 14940 14948 15126 15436 16516 16808 16979 17196 17212 17425 17444 18163 18448 18660 19700 20467 20871 21325 21679 22937 23842 24389 24580 27737 29747 30147 30809 31560 32375 35123 36371 41554

351358 486967 568330 444333 541339 469528 679817 321214 543982 360795 442174 381934 485564 686501 482224 428989 443221 824617 649449 519385 522147 770659 789321 604870 892528 801853 766800 973918 790777 637528 747613 744862 698178 715715 852444 835952 1174057 784602 1034802 1305765 1176516 1264428 1219650 1118774 1465660 1590655 1362229

7.00 5.41 5.76 4.29 4.83 4.76 5.81 4.93 5.93 4.35 5.33 4.39 5.20 5.89 5.54 5.51 5.01 5.69 6.21 4.54 4.79 4.83 5.45 5.15 5.26 6.31 4.45 4.99 5.10 5.32 5.67 5.41 4.54 4.15 4.32 4.74 5.77 5.33 5.79 5.40 5.12 5.56 5.65 5.11 4.92 4.79 4.16

4.33 4.13 4.31 4.37 4.01 4.14 3.90 4.65 4.08 4.52 4.47 4.54 3.98 4.15 4.49 4.28 4.46 4.41 4.08 4.15 4.46 4.05 4.00 4.09 3.97 4.09 3.99 3.92 4.34 4.49 4.51 4.19 4.14 4.42 4.32 4.30 3.99 4.60 4.45 4.22 4.53 4.35 4.44 4.38 4.17 4.08 4.56

0.17 0.23 0.21 0.30 0.26 0.27 0.21 0.25 0.20 0.30 0.23 0.29 0.24 0.20 0.22 0.22 0.25 0.21 0.19 0.28 0.26 0.26 0.22 0.24 0.23 0.19 0.29 0.25 0.24 0.23 0.21 0.23 0.28 0.32 0.30 0.27 0.21 0.23 0.21 0.23 0.24 0.22 0.22 0.24 0.26 0.26 0.32

0.39 0.30 0.30 0.24 0.26 0.27 0.30 0.34 0.34 0.29 0.33 0.28 0.32 0.32 0.35 0.36 0.33 0.30 0.35 0.29 0.31 0.28 0.31 0.32 0.29 0.35 0.26 0.28 0.31 0.35 0.35 0.34 0.30 0.26 0.27 0.31 0.34 0.36 0.37 0.34 0.33 0.35 0.37 0.35 0.32 0.31 0.29

5 5 5 6 6 6 5 5 5 6 5 6 6 5 5 6 6 5 6 6 6 6 6 6 5 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7

7 7 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 6 6 7 8 6

Table of log-graphlets counts of the Facebook networks Name

˜1 G

˜2 G

˜3 G

˜4 G

˜5 G

˜6 G

˜7 G

˜8 G

Caltech36 Reed98 Haverford76

13.718 13.902 15.514

11.731 11.560 13.307

17.388 17.884 19.375

17.766 18.200 20.127

16.802 17.004 18.868

13.808 14.272 16.273

14.806 14.729 16.775

13.148 12.498 14.728

14

Name

˜1 G

˜2 G

˜3 G

˜4 G

˜5 G

˜6 G

˜7 G

˜8 G

Simmons81 Swarthmore42 Amherst41 Bowdoin47 Hamilton46 Trinity100 USFCA72 Williams40 Oberlin44 Smith60 Wellesley22 Vassar85 Middlebury45 Pepperdine86 Colgate88 Santa74 Wesleyan43 Mich67 Bucknell39 Brandeis99 Howard90 Rice31 Rochester38 Lehigh96 Johns-Hopkins55 Wake73 American75 MIT8 William77 UChicago30 Princeton12 Carnegie49 Tufts18 UC64 Vermont70 Emory27 Dartmouth6 Tulane29 WashU32 Villanova62 Vanderbilt48 Yale4 Brown11 UCSC68 Maine59 Georgetown15 Duke14 Bingham82 Mississippi66 Northwestern25 Cal65 BC17 Stanford3 Columbia2 Notre-Dame57 GWU54 Baylor93 USF51

14.421 15.558 15.964 15.868 16.075 16.201 15.404 16.270 15.860 15.860 15.993 16.294 16.338 16.763 16.570 16.642 16.523 15.523 16.604 16.584 17.223 16.950 16.592 16.849 16.825 17.496 16.981 17.263 17.297 16.832 17.450 17.195 17.178 16.320 16.550 17.508 17.437 17.287 17.751 17.474 18.003 17.867 17.739 16.615 16.843 17.875 18.194 17.416 18.525 18.054 17.326 17.905 18.314 18.027 18.120 17.944 18.529 17.259

12.015 13.229 13.672 13.491 13.724 13.887 12.859 13.833 13.215 13.325 13.350 13.655 13.914 14.319 14.120 14.183 14.002 12.990 14.135 13.875 14.542 14.489 14.103 14.300 14.287 15.025 14.238 14.629 14.587 14.068 14.720 14.619 14.458 13.797 13.750 14.937 14.626 14.752 15.070 14.767 15.401 15.043 14.846 13.980 13.954 15.029 15.468 14.647 15.924 15.302 14.567 14.992 15.527 15.042 15.088 15.032 15.734 14.443

18.476 19.853 20.383 20.066 20.384 20.534 19.837 20.883 20.484 20.315 20.594 20.725 20.785 21.290 20.940 21.265 20.967 20.435 21.011 23.371 22.062 21.533 21.867 21.824 22.032 22.285 21.921 22.254 23.076 22.312 22.371 22.400 22.155 21.212 21.667 22.487 22.655 22.379 24.290 22.473 23.641 23.511 23.016 21.546 22.256 23.093 23.419 22.543 24.072 24.403 22.509 23.342 23.744 24.979 23.689 24.613 24.456 22.687

18.950 20.360 20.974 20.694 20.950 21.243 20.414 21.393 20.975 20.981 21.118 21.409 21.454 21.864 21.675 21.854 21.619 20.894 21.762 21.924 22.623 22.186 21.935 22.269 22.336 22.886 22.521 22.856 23.038 22.533 23.056 22.890 22.758 21.792 22.193 23.133 23.152 22.950 23.744 23.104 23.678 23.710 23.580 22.232 22.519 23.689 24.008 23.260 24.494 24.064 23.209 23.902 24.356 24.149 24.111 24.060 24.655 23.190

17.604 19.099 19.725 19.302 19.552 19.833 18.884 20.029 19.351 19.411 19.533 19.774 19.995 20.403 20.159 20.362 20.035 19.409 20.236 21.023 20.957 20.657 20.429 20.725 20.841 21.320 20.751 21.200 21.438 20.798 21.296 21.264 20.978 20.205 20.403 21.521 21.351 21.338 22.343 21.376 22.035 21.951 21.716 20.499 20.624 21.824 22.262 21.441 22.847 22.399 21.433 22.019 22.520 22.584 22.105 22.284 22.822 21.281

14.759 16.372 16.908 16.519 16.808 17.065 16.018 17.129 16.442 16.281 16.633 17.012 17.117 17.543 17.317 17.468 17.142 16.214 17.303 17.326 18.341 17.556 17.335 17.709 17.745 18.248 17.591 18.098 18.222 17.599 18.367 18.152 17.908 16.810 17.205 18.507 18.337 18.099 18.961 18.316 18.877 18.763 18.701 17.054 17.278 18.778 19.163 18.197 19.665 19.015 17.982 18.969 19.306 19.149 18.935 18.838 19.610 18.062

15.341 16.884 17.565 17.053 17.239 17.596 16.458 17.696 16.813 16.912 16.993 17.262 17.672 18.043 17.827 17.996 17.618 16.944 17.879 18.008 18.425 18.237 17.966 18.273 18.391 18.895 18.095 18.656 18.679 18.102 18.714 18.713 18.353 17.797 17.741 19.140 18.757 18.891 19.492 18.806 19.431 19.200 19.047 17.956 17.875 19.132 19.713 18.915 20.335 19.570 18.757 19.366 19.783 19.705 19.317 19.363 20.119 18.569

13.296 14.733 15.582 15.091 15.129 15.634 14.442 15.621 14.691 15.261 14.840 15.029 15.709 16.114 15.916 16.035 15.653 15.171 16.079 15.722 16.117 16.396 16.241 16.449 16.594 17.249 16.398 16.864 16.888 16.351 16.637 16.857 16.427 16.375 15.907 17.365 16.699 17.347 17.472 16.948 17.748 17.250 17.014 16.316 16.161 17.152 17.913 17.258 18.567 17.727 17.390 17.386 18.121 17.484 17.287 17.599 18.376 16.929

15

Name

˜1 G

˜2 G

˜3 G

˜4 G

˜5 G

˜6 G

˜7 G

˜8 G

Syracuse56 Temple83 UC61 Northeastern19 JMU79 UPenn7 UCSB37 UCF52 UCSD34 Harvard1 MU78 UMass92 UC33 Tennessee95 UVA16 UConn91 Oklahoma97 USC35 UNC28 Auburn71 Cornell5 BU10 UCLA26 Maryland58 Virginia63 NYU9 Berkeley13 Wisconsin87 UGA50 Rutgers89 FSU53 Indiana69 Michigan23 UIllinois20 Texas80 MSU24 UF21 Texas84 Penn94

17.948 17.470 17.692 17.324 17.826 18.403 17.768 17.936 17.596 18.857 18.186 17.800 17.831 18.822 18.681 18.012 18.956 18.681 18.724 19.105 18.598 18.035 18.331 18.376 18.448 18.307 18.637 18.472 19.010 18.244 18.729 18.989 18.861 18.957 18.905 18.829 19.324 19.574 19.106

15.270 14.446 15.061 14.343 14.817 15.520 14.983 15.121 14.777 15.913 15.370 14.756 14.971 15.821 15.691 15.073 16.166 15.760 15.544 16.140 15.614 14.925 15.435 15.374 15.322 15.117 15.458 15.374 16.124 15.303 15.938 16.028 15.914 16.040 16.092 15.676 16.295 16.232 15.797

23.999 23.621 23.205 22.931 25.055 23.941 23.076 25.126 23.748 24.389 23.543 24.903 23.489 25.122 24.568 23.583 24.691 24.776 24.894 25.604 25.378 23.718 23.907 25.593 26.598 24.927 26.726 25.177 24.970 24.316 24.706 24.958 25.187 25.801 24.841 25.554 27.899 26.785 26.853

24.101 23.581 23.766 23.363 23.911 24.550 23.754 23.785 23.679 25.025 24.261 23.986 23.945 24.846 24.786 24.141 25.130 24.863 24.532 25.360 24.834 24.231 24.597 24.728 24.835 24.656 25.366 24.862 25.418 24.585 25.089 25.537 25.433 25.446 25.387 25.234 26.069 26.199 25.886

22.329 21.669 22.100 21.360 22.157 22.617 21.915 22.209 21.847 23.095 22.387 22.100 22.109 22.964 22.854 22.113 23.322 22.972 22.325 23.539 22.939 22.137 22.657 22.783 23.067 22.562 23.590 22.817 23.515 22.615 23.220 23.496 23.418 23.498 23.520 23.194 24.224 24.047 23.738

18.987 18.532 18.442 18.038 18.468 19.369 18.352 18.435 18.195 20.079 18.844 18.601 18.467 19.566 19.422 18.638 19.898 19.538 18.877 19.947 19.449 18.678 19.068 19.165 19.158 19.064 19.668 19.153 19.869 18.987 19.551 19.971 19.984 19.865 19.807 19.711 20.261 20.229 19.985

19.663 18.860 19.376 18.618 19.027 19.867 19.210 19.460 19.061 20.333 19.641 19.198 19.340 20.153 20.060 19.347 20.686 20.228 19.196 20.678 20.112 19.259 19.866 19.773 19.911 19.574 20.406 19.957 20.751 19.843 20.543 20.731 20.730 20.728 20.845 20.328 21.154 20.911 20.750

18.297 16.869 17.989 16.892 17.504 18.070 17.784 18.270 17.590 18.188 18.263 17.446 17.816 18.553 18.381 17.725 19.128 18.667 17.272 19.123 18.464 17.708 18.416 18.264 18.418 17.772 18.426 18.362 19.238 18.299 19.192 19.257 19.046 19.296 19.340 18.660 19.737 19.327 18.978

16

Full statistics of LinkedIn data Table of properties of the LinkedIn networks. We only compute eigenvalue dimensional fits up to 72, 000 nodes.

Nodes

Edges

PL Exp

Eff. Diam

α

β

DimG

DimE

578 650 709 774 916 1012 1158 1312 1512 1800 2259 2643 2972 3389 3817 4158 4508 4898 5288 5728 6216 6719 7230 7892 8551 9572 10444 11222 12210 13043 14173 15246 16499 17816 19522 21125 22665 24530 26535 28760 31176 33834 37010 40032 43567 47027 51542 55984 61138 66622 71627

3140 3596 4032 4448 5300 5983 6955 7957 9287 11366 14643 17795 20465 23432 26656 29086 31557 34273 37173 40550 44091 47813 51600 56238 60985 68113 74215 79805 87129 93158 101279 109186 118284 127765 140191 151651 163457 177418 191947 207848 225897 246160 270718 295219 322710 350726 387570 423919 465192 510201 551151

3.01 3.02 2.93 2.77 2.74 2.82 2.77 2.75 2.80 2.73 2.66 2.74 2.61 2.76 2.82 2.81 2.80 2.81 2.85 2.86 2.85 2.81 2.84 2.83 2.87 2.88 2.90 2.91 2.93 2.95 3.22 2.94 3.22 3.23 3.28 2.79 2.79 3.38 2.87 3.78 2.84 2.84 3.39 3.34 2.78 2.79 2.78 3.39 3.44 3.52 3.48

4.70 4.75 4.68 4.68 4.71 4.69 4.77 4.79 4.83 4.93 5.07 5.06 5.10 5.20 5.28 5.23 5.30 5.37 5.30 5.36 5.40 5.47 5.50 5.47 5.56 5.54 5.55 5.58 5.57 5.59 5.59 5.59 5.62 5.65 5.67 5.68 5.66 5.69 5.72 5.72 5.77 5.79 5.78 5.80 5.81 5.83 5.81 5.82 5.83 5.85 5.85

0.50 0.50 0.52 0.56 0.57 0.55 0.56 0.57 0.56 0.58 0.60 0.57 0.62 0.57 0.55 0.55 0.56 0.55 0.54 0.54 0.54 0.55 0.54 0.55 0.53 0.53 0.53 0.52 0.52 0.51 0.45 0.52 0.45 0.45 0.44 0.56 0.56 0.42 0.53 0.36 0.54 0.54 0.42 0.43 0.56 0.56 0.56 0.42 0.41 0.40 0.40

0.14 0.13 0.12 0.07 0.07 0.10 0.08 0.08 0.11 0.09 0.08 0.10 0.06 0.12 0.14 0.14 0.13 0.15 0.15 0.16 0.16 0.15 0.16 0.16 0.17 0.18 0.19 0.19 0.20 0.21 0.27 0.21 0.28 0.28 0.29 0.18 0.18 0.32 0.21 0.38 0.20 0.20 0.33 0.32 0.19 0.20 0.19 0.33 0.34 0.36 0.35

4 4 3 3 3 4 4 4 4 4 4 4 4 4 5 4 4 5 5 5 4 5 5 5 5 5 5 5 5 5 6 5 6 5 6 6 5 7 5 7 6 5 6 6 5 5 5 6 6 6 5

3 4 3 3 3 4 4 3 4 3 3 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 4 5 5 5 4 4 5 4 5 4 4 5 5 4 4 4 5 5 5 5

17

Nodes

Edges

PL Exp

Eff. Diam

α

β

DimG

DimE

77093 82869 88870 95073 102902 111651 122320 132572 142230 152676 164333 177058 193345 209628 226563 245398 354977 397649 441553 489997

594813 640416 688431 738405 802009 874290 963418 1051233 1131895 1219866 1322238 1436964 1582671 1732864 1886060 2058727 3184846 3675170 4214245 4827461

3.51 3.48 3.46 3.44 3.42 3.33 3.34 3.30 3.28 3.11 3.11 3.13 3.11 3.10 3.11 3.11 3.06 3.05 3.00 2.96

5.85 5.84 5.84 5.85 5.86 5.86 5.85 5.83 5.87 5.85 5.86 5.85 5.85 5.83 5.85 5.85 5.78 5.77 5.75 5.71

0.40 0.40 0.41 0.41 0.41 0.43 0.43 0.43 0.44 0.47 0.47 0.47 0.47 0.48 0.47 0.47 0.49 0.49 0.50 0.51

0.36 0.36 0.36 0.35 0.35 0.34 0.34 0.34 0.33 0.30 0.30 0.30 0.30 0.30 0.30 0.30 0.29 0.29 0.27 0.27

7 7 7 7 7 6 6 6 5 6 6 6 6 6 6 6 6 6 5 5

– – – – – – – – – – – – – – – – – – – –

Table of log-graphlets counts of the LinkedIn networks Nodes

Edges

˜1 G

˜2 G

˜3 G

˜4 G

˜5 G

˜6 G

˜7 G

˜8 G

578 650 709 774 916 1012 1158 1312 1512 1800 2259 2643 2972 3389 3817 4158 4508 4898 5288 5728 6216 6719 7230 7892 8551 9572 10444 11222 12210 13043

3140 3596 4032 4448 5300 5983 6955 7957 9287 11366 14643 17795 20465 23432 26656 29086 31557 34273 37173 40550 44091 47813 51600 56238 60985 68113 74215 79805 87129 93158

10.678 11.307 10.984 11.127 11.545 11.679 11.975 12.037 11.981 12.312 12.829 12.950 13.370 13.400 13.637 13.756 13.595 13.938 13.934 14.019 14.151 14.166 14.211 14.425 14.603 14.601 14.714 14.846 14.859 14.902

8.080 8.431 8.191 8.343 8.548 8.627 8.907 8.987 8.910 9.220 9.611 9.818 10.055 10.158 10.429 10.457 10.275 10.669 10.588 10.704 10.861 10.793 10.757 11.036 11.195 11.177 11.250 11.323 11.328 11.391

14.270 14.360 14.744 14.544 15.146 15.359 15.813 16.088 16.181 16.612 17.055 17.430 17.753 18.100 18.340 18.462 18.466 18.724 18.907 18.863 19.020 19.205 19.294 19.416 19.590 19.731 19.884 20.062 20.262 20.347

14.010 14.339 14.447 14.567 15.140 15.352 15.704 15.968 16.124 16.586 16.942 17.355 17.616 17.857 18.168 18.271 18.351 18.549 18.709 18.782 18.906 19.043 19.138 19.290 19.467 19.616 19.776 19.938 20.071 20.150

12.623 12.850 12.969 12.975 13.552 13.729 14.067 14.128 14.272 14.704 14.908 15.410 15.654 15.849 16.176 16.198 16.277 16.476 16.554 16.633 16.768 16.876 16.985 17.054 17.200 17.340 17.488 17.660 17.785 17.842

9.442 9.630 9.860 9.870 10.558 10.604 11.013 11.117 11.245 11.626 11.793 12.197 12.515 12.669 12.856 12.879 13.087 13.110 13.301 13.360 13.440 13.490 13.604 13.675 13.877 14.060 14.081 14.307 14.374 14.340

10.268 10.333 10.536 10.482 10.973 11.147 11.402 11.306 11.480 11.908 12.065 12.603 12.816 12.935 13.198 13.258 13.277 13.424 13.486 13.633 13.950 13.894 14.056 14.063 14.183 14.324 14.538 14.618 14.815 14.886

8.009 8.593 8.466 8.281 8.324 9.139 8.895 8.937 9.044 9.624 9.777 10.141 10.289 10.441 10.994 10.979 10.947 11.171 11.050 11.138 11.909 11.573 11.973 11.819 12.030 12.466 12.177 12.336 12.742 12.793

18

Nodes

Edges

˜1 G

˜2 G

˜3 G

˜4 G

˜5 G

˜6 G

˜7 G

˜8 G

14173 15246 16499 17816 19522 21125 22665 24530 26535 28760 31176 33834 37010 40032 43567 47027 51542 55984 61138 66622 71627 77093 82869 88870 95073 102902 111651 122320 132572 142230 152676 164333 177058 193345 209628 226563 245398 354977 397649 441553 489997

101279 109186 118284 127765 140191 151651 163457 177418 191947 207848 225897 246160 270718 295219 322710 350726 387570 423919 465192 510201 551151 594813 640416 688431 738405 802009 874290 963418 1051233 1131895 1219866 1322238 1436964 1582671 1732864 1886060 2058727 3184846 3675170 4214245 4827461

15.075 15.192 15.287 15.348 15.407 15.574 15.545 15.695 15.856 15.901 16.062 16.121 16.109 16.308 16.311 16.447 16.643 16.672 16.785 16.909 17.021 17.326 17.171 17.360 17.335 17.494 17.561 17.725 17.969 18.002 18.035 18.129 18.531 18.459 18.681 18.917 18.849 19.350 19.705 20.009 20.415

11.475 11.585 11.706 11.718 11.786 11.892 11.894 12.010 12.175 12.218 12.383 12.431 12.442 12.580 12.661 12.772 12.907 13.042 13.091 13.252 13.327 13.518 13.494 13.533 13.648 13.791 13.771 13.903 14.055 14.160 14.195 14.247 14.481 14.531 14.664 14.762 14.867 15.411 15.624 16.003 16.301

20.485 20.530 20.780 20.896 20.997 21.147 21.324 21.394 21.575 21.773 21.861 21.976 22.189 22.390 22.513 22.910 23.111 23.136 23.437 23.495 23.814 24.178 24.223 24.481 24.491 24.777 24.827 25.348 25.645 25.824 26.271 26.387 26.716 26.974 27.251 27.405 27.594 28.372 28.791 29.337 29.879

20.336 20.391 20.561 20.700 20.886 21.001 21.098 21.229 21.375 21.496 21.642 21.751 21.905 22.087 22.241 22.449 22.622 22.703 22.882 23.050 23.221 23.393 23.509 23.671 23.778 24.000 24.137 24.403 24.607 24.723 24.898 25.058 25.250 25.471 25.644 25.809 25.941 26.869 27.215 27.588 28.002

17.975 17.970 18.211 18.365 18.490 18.584 18.692 18.813 18.919 19.061 19.204 19.328 19.480 19.681 19.883 20.115 20.307 20.462 20.627 20.757 20.953 21.177 21.307 21.411 21.467 21.704 21.795 22.081 22.280 22.382 22.631 22.757 22.942 23.162 23.265 23.464 23.604 24.554 24.904 25.406 25.887

14.590 14.588 14.738 14.808 15.042 15.123 15.193 15.341 15.421 15.557 15.681 15.833 15.996 16.157 16.388 16.625 16.800 16.986 17.095 17.263 17.420 17.532 17.627 17.763 17.780 17.954 18.075 18.252 18.363 18.438 18.627 18.709 18.814 19.039 19.148 19.330 19.391 20.246 20.512 20.909 21.298

14.979 14.910 15.147 15.328 15.476 15.520 15.657 15.775 15.855 16.028 16.232 16.457 16.560 16.758 17.021 17.254 17.468 17.686 17.831 17.995 18.241 18.447 18.560 18.696 18.650 18.901 18.979 19.307 19.539 19.612 19.866 19.881 20.034 20.179 20.219 20.427 20.522 21.400 21.733 22.201 22.579

12.737 12.745 12.942 13.113 13.284 13.240 13.372 13.531 13.797 13.798 14.142 14.196 14.376 14.438 14.667 14.909 15.314 15.354 15.562 15.727 15.881 16.081 16.126 16.325 16.217 16.598 16.630 16.934 17.232 17.297 17.313 17.430 17.458 17.742 17.881 18.021 18.135 19.064 19.452 19.979 20.384

19

References and Notes [1] M. Bayati, J.H. Kim, A. Saberi, Algorithmica (2010) 58: 860910. [2] Gleich, https://www.github.com/dgleich/bisquik [3] A. Bonato, J. Janssen, P. Pralat. Internet Mathematics 8 2012, 2–28. [4] S. Jansen, T. Luczak, A. Rucinski, Random graphs, Wiley-Intersci. Ser. Discrete Math. Optim. (2000).

20