Beyond clustering: Mean-field dynamics on networks with arbitrary subgraph composition arXiv:1405.6234v1 [math.DS] 23 May 2014
Martin Ritchie¹, Luc Berthouze²,³ & Istvan Z. Kiss¹,∗
May 27, 2014
¹ School of Mathematical and Physical Sciences, Department of Mathematics, University of Sussex, Falmer, Brighton BN1 9QH, UK
² Centre for Computational Neuroscience and Robotics, University of Sussex, Falmer, Brighton BN1 9QH, UK
³ Institute of Child Health, London, University College London, London WC1E 6BT, UK
Abstract
Clustering is the propensity of nodes that share a common neighbour to be connected. It is ubiquitous in many networks but poses many modelling challenges. Clustering typically manifests itself by a higher than expected frequency of triangles, and this has led to the principle of constructing networks from such building blocks. This principle has been generalised to networks being constructed from a set of more exotic subgraphs. As long as these are fully connected, it is then possible to derive mean-field models that approximate epidemic dynamics well. However, there are virtually no results for non-fully connected subgraphs. In this paper, we provide a general and automated approach to deriving a set of ordinary differential equations, or mean-field model, that describe, to a high degree of accuracy, the expected values of system-level quantities. Moreover, our approach offers a previously unattainable degree of control over the arrangement of subgraphs and network characteristics such as classical node degree, variance and clustering. The combination of these features makes it possible to generate families of networks with different subgraph compositions while keeping classical network indicators constant. Using our approach, we show that higher-order structure, realised either through the introduction of loops of different size or by generating clustered networks with or without overlapping triangles, leads to significant differences in epidemic dynamics despite controlling for the basic network metrics.
Hence, our methodology provides the vehicle for further systematic investigation of the impact of higher-order network structure on dynamics on networks.
Keywords: networks; subgraphs; epidemic; high-order structure; automated code generation
∗corresponding author email:
[email protected]
1 Introduction
Network models have revolutionised our way of thinking about complex phenomena such as the spreading of disease, the development of neuronal avalanches, and the formation and interaction of social groups. Mathematical epidemiology, in particular, has embraced and benefited greatly from the use of networks as a modelling paradigm, with models ranging from data-driven models [1–3] to theoretical models [4–6]. These have been used to study the impact of different network properties on how diseases break out and spread. Network models have led to greater clarity in understanding and quantifying the impact of contact heterogeneity, preferential mixing and community structure, including households [7,8]. Although clustering of contacts or transitivity, i.e., the propensity of nodes with a common neighbour to be connected, is pervasive in many real-world networks, it continues to pose many significant challenges to the community, both from the viewpoint of network generation and even more so from that of modelling, namely, deriving well-performing approximate models. To investigate the impact of network properties, one can either use empirical networks or ones that have been computationally generated from theoretical network models with tunable properties [4, 9, 10]. Many algorithms exist for generating clustered networks, but it is generally the case that focusing on achieving a particular clustering leads to changes above and beyond those controlled by the algorithm, and this can preclude correct analysis of the impact of clustering [9, 11–17]. When looking at the impact of higher-order structure, for example, it is important that the degree distribution remains the same between networks with different clustering. Some algorithms have been proposed [9, 18] and are based on the notion of subgraphs, where clustering is achieved by mixing fully-connected subgraph types such as fully-connected triples or quadruples, and non-fully connected subgraphs such as overlapping triangles.
Using such networks, Volz et al. [18] have developed a low-dimensional ODE model that approximates well the expected value of a number of system-level quantities, and Karrer & Newman [9] have provided final epidemic size results for a mix of different subgraphs. Further, House et al. [12, 19] generalised the pairwise approach to closure at the level of all possible quadruples or subgraphs. However, a number of outstanding issues remain. The Volz et al. model, which provides time evolution, can handle well only fully-connected subgraphs. Karrer & Newman’s approach, which combines a wider variety of subgraphs, can only characterise large-time limits. Finally, to our knowledge, House et al.’s approach has not been compared to stochastic simulations and it will perform poorly for heterogeneous networks [12]. In this paper, we provide a general and automated approach to deriving a set of ODEs that describe, to a high degree of accuracy, the expected values of system-level quantities, e.g., prevalence, for networks that are generated based on an arbitrary set of subgraphs. This is achieved by a rigorous separation of the role of nodes within the subgraphs and by using the probability generating function (PGF) formalism to correctly track (a) the distribution of subgraphs to which nodes belong and (b) the excess degree that is generalised from the classical notion of a stub of a single edge to different corner types given by the subgraphs. This is a significant step toward being able to (a) accurately model and analyse dynamical processes on networks with higher-order structure such as real-world networks, (b) map out the impact of clustering in the classic sense, and more importantly, its impact at a higher level involving four or more nodes [11], and (c) provide much needed insights into the role of small units/subgraphs or motifs in epidemiology and systems biology.
The paper is organised as follows. After introducing some necessary model components, we describe the PGF formalism for subgraph corners. A key component of the proposed model is to label and track the position of each and every node in all subgraphs in order to avoid any ambiguity as to the role of nodes in non-fully-connected subgraphs. This section includes a general step-by-step explanation of the model derivation with concrete examples for a particular network and a detailed presentation of the code-generating algorithm. We then compare our approach to state-of-the-art models that can, in principle and to some extent, capture the system’s expected behaviour. Where fair comparisons are possible, we show excellent agreement. In other scenarios, we show our model to either outperform existing models or to produce accurate results where other models fail.
2 Model derivation
This section presents a derivation of a general SIR epidemic model for a network built from an arbitrary number of subgraph types. Conceptually, this model uses the node labelling approach of Karrer & Newman [9] and generalises the PGF-type framework of Volz et al. [18,20]. By taking this approach it is possible to derive ODEs that accurately predict the epidemic prevalence on networks that exhibit a variety of exotic subgraphs, both fully- and non-fully connected. The first step is to choose the set of subgraphs to be included in the network. Let the arbitrary set of subgraphs be denoted by $\{G_1, G_2, \ldots, G_M\}$. For example, Fig. 1 shows M = 5 different subgraphs, which results in m = 17 distinct node positions, where m stands for the total number of nodes over all subgraphs. We call hyperstub the set of half-links connecting a node to a subgraph. This example highlights the key component of the model, namely to distinguish between all nodes of a subgraph, even those that are topologically equivalent. This distinction makes it possible to deal with the added complexity of having to account for labelled subgraphs. Each node/position of a subgraph is labelled. This is reflected in a PGF that accounts for each and every node in each and every subgraph. This gives rise to a PGF of the following form:
\[ \psi(\hat{\alpha}) = \sum_{\hat{y}=0}^{\infty} p_{\hat{y}} \prod_{i=1}^{m} \alpha_i^{y_i}, \tag{1} \]
where $\hat{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ is a placeholder and $\hat{y} = (y_1, y_2, \ldots, y_m)$ is such that $y_i$ is the expected number of times a node appears in position $x_i$, $i = 1, \ldots, m$. A subgraph’s state at time t is denoted by $G_{(\cdot)}(A, B, \ldots, C)$, where the subscript $(\cdot)$ denotes the subgraph, with the time variable omitted for clarity. Note that for this notation to be unambiguous a unique mapping between labels and nodes must be set. For a subgraph containing position $x_i$, the unique subgraph count is computed by $N\,\partial\psi(\hat{\alpha})/\partial\alpha_i$ evaluated at $\hat{\alpha} = 1$.
The summation form of the PGF is highly malleable and illustrative when computing various network properties. Define $\theta_i(t)$ as the probability that a node in position $x_i$ has not been infected by any node in the corresponding subgraph by time t. A node that appears k times in position $x_i$ will remain susceptible with probability $\theta_i^k$. Hence, to determine the probability that a randomly selected node remains susceptible, we average over all nodes and all possible hyperstubs:
\[ S(t) = \psi(\hat{\theta}) = \sum_{\hat{y}=0}^{\infty} p_{\hat{y}} \prod_{i=1}^{m} \theta_i^{y_i}. \tag{2} \]
This probability is equal to the fraction of susceptible nodes in the population at time t [20]. θ is referred to as a survivor function. It is dependent on time and may be computed from first principles using the definition of the Poisson process. However, in our formulation, it is computed from variables that denote the expected rate, $T_i$, at which infection is transmitted to a node in position $x_i$ through the corresponding subgraph. Each position label $x_i$ has a $T_i$ variable associated with it. The following examples show these rates for positions $x_1$, $x_2$ and $x_3$, see Fig. 1:
\begin{align}
T_1 &= \tau [G_0(SI)], \tag{3} \\
T_2 &= \tau [G_0(IS)], \tag{4} \\
T_3 &= \tau [G_{\triangle}(SSI) + G_{\triangle}(SIS) + 2G_{\triangle}(SII) + G_{\triangle}(SRI) + G_{\triangle}(SIR)]. \tag{5}
\end{align}
To generate the above identities, we consider a susceptible node in position $x_i$ and list all possible corresponding subgraph states that allow this node to be exposed to infection. $T = (T_1, T_2, \ldots, T_m)$ can now be used to determine the probability that a susceptible node has an infectious neighbour within a certain subgraph type. This is done by dividing $T_i \tau^{-1}$ by the sum over all possible states that involve a susceptible at position $x_i$:
\[ \frac{T_i}{\tau \sum_{A,B,C,D} G_{(\cdot)}(x_i = S, \ldots, A, B, C, D)}. \tag{6} \]
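The enumeration behind Eqs. (3)-(5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `infection_pressure`, the 0-based position index and the adjacency-list encoding are assumptions made for this example. For a chosen position it lists every subgraph state in which that position is susceptible and has at least one infectious neighbour, weighted by τ times the number of such neighbours.

```python
from itertools import product


def infection_pressure(adjacency, position, tau=1.0):
    """Return {state: coefficient} such that T = sum(coefficient * G(state)).

    adjacency: square 0/1 list of lists describing one subgraph (assumed encoding).
    position:  0-based index of the susceptible node of interest (assumed convention).
    """
    n = len(adjacency)
    terms = {}
    # Cycle through all 3^n states of the subgraph.
    for state in product("SIR", repeat=n):
        if state[position] != "S":
            continue
        # Count infectious neighbours of the chosen position within the subgraph.
        infectious_neighbours = sum(
            1 for j in range(n) if adjacency[position][j] and state[j] == "I"
        )
        if infectious_neighbours:
            terms["".join(state)] = tau * infectious_neighbours
    return terms


# Triangle subgraph: first corner (position x3 in Fig. 1, index 0 here).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
t3_terms = infection_pressure(triangle, position=0)

# Line subgraph: both ends (positions x1 and x2 in Fig. 1).
line = [[0, 1], [1, 0]]
t1_terms = infection_pressure(line, position=0)
t2_terms = infection_pressure(line, position=1)
```

With τ = 1 the triangle example recovers the five states of Eq. (5), with weight 2 on the SII state, and the line example recovers Eqs. (3) and (4).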
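The expected degree of a susceptible node at position $x_i$ is given by
\[ \langle k_i \rangle = \sum_{\hat{y}=0}^{\infty} y_i\, p_{\hat{y}} \prod_{j=1}^{m} \theta_j^{y_j} = \theta_i \left. \frac{\partial \psi}{\partial \alpha_i} \right|_{\hat{\alpha}=\hat{\theta}}, \tag{7} \]
where $\hat{\theta} = (\theta_1, \theta_2, \ldots, \theta_m)$. To compute the expected degree for every position of every subgraph, one can take the Jacobian of ψ evaluated at $\hat{\alpha} = \hat{\theta}$,
\[ J(\psi)\big|_{\hat{\alpha}=\hat{\theta}}. \tag{8} \]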
The ith entry of this vector evaluated at $\hat{\alpha} = \hat{\theta}$ shall be denoted $J_i$. A susceptible node in position $x_i$ will have remained susceptible up to time t with probability $\theta_i$, after which infection may be transmitted at rate $T_i/J_i$. This information may be used to form the following equation:
\[ \frac{d}{dt}(1 - \theta_i) = \theta_i \frac{T_i}{J_i} \;\Rightarrow\; \dot{\theta}_i = -\theta_i \frac{T_i}{J_i}. \tag{9} \]
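To make Eqs. (2), (8) and (9) concrete, the following Python sketch stores a PGF in its series form, as a dictionary mapping each vector ŷ to its probability, and differentiates it term by term. The toy distribution `p_toy`, the function names and the numerical values of `theta` and `T` are hypothetical choices for illustration only; the paper itself uses a symbolic software package for this step.

```python
from math import prod


def pgf(p, alpha):
    """Evaluate psi(alpha) = sum_y p[y] * prod_i alpha_i**y_i, as in Eq. (1)."""
    return sum(w * prod(a ** y for a, y in zip(alpha, ys)) for ys, w in p.items())


def jacobian(p, alpha):
    """Return [d psi / d alpha_i], differentiating the series term by term."""
    m = len(alpha)
    J = [0.0] * m
    for ys, w in p.items():
        for i in range(m):
            if ys[i] == 0:
                continue  # this term does not depend on alpha_i
            term = w * ys[i] * alpha[i] ** (ys[i] - 1)
            term *= prod(alpha[j] ** ys[j] for j in range(m) if j != i)
            J[i] += term
    return J


def theta_dot(p, theta, T):
    """Right-hand side of Eq. (9): d theta_i / dt = -theta_i * T_i / J_i."""
    J = jacobian(p, theta)
    return [-th * t / j for th, t, j in zip(theta, T, J)]


# Hypothetical hyperstub distribution over m = 2 positions.
p_toy = {(1, 1): 0.5, (2, 0): 0.25, (0, 2): 0.25}
theta = [0.8, 0.5]
T = [0.13, 0.26]

S = pgf(p_toy, theta)              # Eq. (2): fraction susceptible
J = jacobian(p_toy, theta)         # Eq. (8) evaluated at alpha = theta
dtheta = theta_dot(p_toy, theta, T)
```

For a PGF given in closed form rather than as a finite series, the same quantities would instead come from symbolic differentiation.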
$\theta_i$ decays at the rate at which a subgraph transmits infection to its node in position $x_i$, conditional on that node being susceptible. Once a node is newly infected it is important to determine what, if any, subgraph states are created or destroyed. To do this, we use the susceptible node’s excess degree prior to the infection. For the full derivation of the susceptibles’ excess degree we refer the reader to the SI text. In this derivation, the excess degree must be generalised to account for the degree of the different positions a node may be in, i.e. $\langle k_i \rangle$, $i = 1, 2, \ldots, m$. The expected excess degree for susceptible nodes is given by:
\[ \Delta_{i,j} = \theta_i \left. \frac{H_{i,j}(\psi)}{J_i(\psi)} \right|_{\hat{\alpha}=\hat{\theta}}, \tag{10} \]
where H(ψ) is the Hessian of the PGF. $\Delta_{i,j}$ denotes the expected number of $x_j$ positions associated with a node that has been selected at random, but proportionally to the number of $x_i$ positions associated with that node. It is now possible to formulate ODEs describing subgraph states. We denote the time derivative of a subgraph’s state by $\dot{G}_{(\cdot)}$. This quantity is dimensionless but not normalised. For example, the number of unique (SI) links in a network of size N is given by $[SI] = N G_0(SI)$. To form the ODE for the subgraph state $G_0(SI)$, we consider all possible ways in which this state may be created or destroyed, namely
\[ \dot{G}_0(SI) = -(\tau + \gamma) G_0(SI) - (T\Delta)_1 G_0(SI) + (T\Delta)_2 G_0(SS), \tag{11} \]
where $(T\Delta)_1$ denotes the first entry of the vector that is the product of the matrix Δ multiplied from the left by the vector T. Conceptually, $(T\Delta)_i$ denotes the expected number of nodes in position $x_i$ an infection will encounter upon infecting a susceptible node through any possible route, see Fig. 2. The first term on the RHS of Eq. (11) describes this state being destroyed by the I infecting the S or the I recovering. The second term stands for this state being destroyed by the S being infected by an outside source. Finally, the last term corresponds to this state being created by the second node of $G_0(SS)$ being infected by a source external to the subgraph. To further illustrate this, the equations for $G_0(SS)$ and $G_0(IS)$ are given by
\begin{align}
\dot{G}_0(SS) &= -[(T\Delta)_2 + (T\Delta)_1] G_0(SS), \tag{12} \\
\dot{G}_0(IS) &= -(\tau + \gamma) G_0(IS) - (T\Delta)_2 G_0(IS) + (T\Delta)_1 G_0(SS). \tag{13}
\end{align}
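The excess-degree matrix of Eq. (10) and the vector TΔ that appears in Eqs. (11)-(13) can be sketched numerically as follows. As before, the PGF is held in series form as a dictionary; the distribution `p_toy`, the function names and the values of `T` are hypothetical, and the θ-prefactor follows Eq. (10) exactly as stated in the text.

```python
from math import prod


def monomial(alpha, ys):
    """prod_i alpha_i**y_i for one exponent vector."""
    return prod(a ** y for a, y in zip(alpha, ys))


def jacobian_and_hessian(p, alpha):
    """First and second partial derivatives of the series-form PGF at alpha."""
    m = len(alpha)
    J = [0.0] * m
    H = [[0.0] * m for _ in range(m)]
    for ys, w in p.items():
        for i in range(m):
            if ys[i] == 0:
                continue
            yi = list(ys)
            yi[i] -= 1                      # exponents after d/d alpha_i
            J[i] += w * ys[i] * monomial(alpha, yi)
            for j in range(m):
                if yi[j] == 0:
                    continue
                yij = list(yi)
                yij[j] -= 1                 # exponents after d/d alpha_j
                H[i][j] += w * ys[i] * yi[j] * monomial(alpha, yij)
    return J, H


def excess_degree(p, theta):
    """Delta_{i,j} = theta_i * H_{i,j} / J_i, as written in Eq. (10)."""
    J, H = jacobian_and_hessian(p, theta)
    m = len(theta)
    return [[theta[i] * H[i][j] / J[i] for j in range(m)] for i in range(m)]


def t_delta(T, delta):
    """Row vector T multiplied from the left onto the matrix Delta."""
    m = len(T)
    return [sum(T[i] * delta[i][j] for i in range(m)) for j in range(m)]


p_toy = {(1, 1): 0.5, (2, 0): 0.25, (0, 2): 0.25}   # hypothetical distribution
delta0 = excess_degree(p_toy, [1.0, 1.0])            # at t = 0, theta_i = 1
td0 = t_delta([0.2, 0.4], delta0)
```

In this toy case every entry of Δ at t = 0 equals 0.5, so both entries of TΔ are 0.5 × (0.2 + 0.4) = 0.3.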
Equations for every state of every subgraph must be derived. In general, we first describe any infection and recovery events of nodes within a subgraph. Next we list all possibilities for susceptible nodes to be infected from sources external to that subgraph using the appropriate (TΔ) terms. To compute network-level prevalences, we recall that S(t) can be computed at any time by Eq. (2). $\dot{I}(t)$ is computed directly by differentiating S(t). Namely, since every node leaving the susceptible class becomes infected, so that the rate of new infections is $-\dot{S}(t)$, and infected nodes recover at rate γ, we have
\begin{align}
\dot{I}(t) &= -\sum_{i=1}^{m} \frac{\partial \psi(t)}{\partial \theta_i} \dot{\theta}_i(t) - \gamma I(t), \tag{14} \\
\dot{R}(t) &= \gamma I(t). \tag{15}
\end{align}
The total number of equations is given by $2 + m + \sum_{i=1}^{M} 3^{|G_i|}$, where $|\cdot|$ denotes the number of nodes in a subgraph. See the SI text where we show how our model is equivalent to previous approaches [18] for complete subgraphs.
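As a quick worked instance of this count, the short Python sketch below evaluates the formula for the subgraph set of Fig. 1. The function name is hypothetical, and the list of subgraph sizes (2, 3, 4, 4, 4) is inferred from the position labels x1 to x17 rather than stated explicitly in the text.

```python
def number_of_equations(subgraph_sizes):
    """2 (for I and R) + m survivor functions + 3^|G_i| states per subgraph."""
    m = sum(subgraph_sizes)
    return 2 + m + sum(3 ** size for size in subgraph_sizes)


# Assumed sizes for the five subgraphs of Fig. 1 (M = 5, m = 17).
fig1_sizes = [2, 3, 4, 4, 4]
total = number_of_equations(fig1_sizes)
```

Under this assumption the system has 2 + 17 + 9 + 27 + 3 × 81 = 298 equations, which shows how quickly the state space grows with the size of the largest subgraph.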
2.1 Initial conditions
Let ε be the fraction of initially infected nodes, ε = I0/N, where I0 is the initial number of infected nodes and N is the network size. Initial conditions for the I and R populations are given by:
\[ I(0) = \epsilon, \quad R(0) = 0. \tag{16} \]
At time t = 0 no hyperstub has transmitted infection, therefore $\theta_i(t = 0) = 1$. I0/N gives the probability of a node initially being infected. For each subgraph state that contains a single infected node, $G(t = 0) = \epsilon \langle k_i \rangle$, where $\langle k_i \rangle$ is the expected hyperstub degree. For the subgraph state with every node susceptible we set $G(t = 0) = (1 - \epsilon)\langle k_i \rangle$.
3 Automated code-generation of the mean-field model
We now present our methodology for computationally generating a complete system of equations for a network constructed from subgraphs following a configuration model. This procedure requires the PGF of a hyperstub degree distribution (HDD), the adjacency matrices of corresponding subgraphs, and epidemiological parameters as inputs. The algorithm will output the system of ODEs that will predict the network-level prevalence. Table 1 gives a brief summary of the variables that need to be generated, listed in the order they are generated in this section.
Let $\vec{G}$ denote the vector of states of a subgraph G, with $\vec{G}_i$ denoting a specific state of G. For the SIR model, $\vec{G}$ has $3^{|G|}$ elements. To generate $T_i$ from $\vec{G}$: (1) cycle through $\vec{G}$, (2) for each infectious contact to node i in state $\vec{G}_j$ make the addition $T_i = T_i + \vec{G}_j$. Using T, the survivor functions can be computed, see Eq. (9), which are then used to compute the fraction of the population that are susceptible, infected or recovered, see Eq. (14).
The ODEs corresponding to subgraphs need to be represented with a rate matrix, Z. This matrix encodes all information relating to the given subgraph, namely the excess degrees, rates of infection over subgraphs T, epidemiological parameters τ and γ, and implicitly encodes the subgraph’s adjacency matrix g. To compute Δ, we use Eq. (1) and a symbolic software package to calculate the Jacobian and Hessian of the PGF. For each subgraph, we initialise the matrix Z as a square matrix with all entries set to zero. The ith column and row of Z correspond to state $\vec{G}_i$. Once populated, the entry $Z_{i,j}$ contains the rate at which state i transitions to state j.
To illustrate how to generate Z, we consider the $G_0$ subgraph, see Fig. 1, with states $\vec{G}$ = (SS, SI, SR, IS, II, IR, RS, RI, RR). We associate the state $G_0(SS)$ with the first row and column of Z. Moving along the top row, when a column index is reached that corresponds to a state that $G_0(SS)$ may transition to, we update the entry with the appropriate rate. The first row of Z is all zero except for $Z_{1,2} = (T\Delta)_2$ and $Z_{1,4} = (T\Delta)_1$. The second row, corresponding to state $G_0(SI)$, has entries $Z_{2,3} = \gamma$ and $Z_{2,5} = \tau + (T\Delta)_1$, see Eq. (11). Every row of the matrix Z is filled in this way; refer to the SI text for the full matrix corresponding to $G_0$. The algorithm for this process is given for an arbitrary subgraph in the SI, algorithm S2, and the corresponding Matlab code will be made publicly available upon acceptance of the manuscript.
To form an ODE for the subgraph state $\vec{G}_i$ using the rate matrix, compute the following:
\[ \frac{d\vec{G}_i}{dt} = -\sum_{j=1}^{3^{|G|}} Z_{i,j} \vec{G}_i + \sum_{k=1}^{3^{|G|}} Z_{k,i} \vec{G}_k. \tag{17} \]
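The construction of Z and the right-hand side of Eq. (17) can be sketched in Python as follows. This is not the authors' Matlab code or algorithm S2: the function names, the 0-based indexing and the rule used for states not discussed in the text (every susceptible node is exposed to its external rate plus τ per infectious neighbour in the subgraph, and every infectious node recovers at rate γ) are assumptions, chosen so that the four entries quoted above for the line subgraph are reproduced. The values in `ext`, standing for (TΔ)_1 and (TΔ)_2, are hypothetical.

```python
from itertools import product


def rate_matrix(adjacency, ext, tau, gamma):
    """Build the state list and rate matrix Z for one subgraph.

    adjacency: 0/1 list of lists; ext[v]: external infection rate for position v.
    """
    n = len(adjacency)
    states = ["".join(s) for s in product("SIR", repeat=n)]
    index = {s: k for k, s in enumerate(states)}
    Z = [[0.0] * len(states) for _ in states]
    for s in states:
        for v in range(n):
            if s[v] == "S":
                infectious = sum(
                    1 for u in range(n) if adjacency[v][u] and s[u] == "I"
                )
                target = s[:v] + "I" + s[v + 1:]
                Z[index[s]][index[target]] += ext[v] + tau * infectious
            elif s[v] == "I":
                target = s[:v] + "R" + s[v + 1:]
                Z[index[s]][index[target]] += gamma
    return states, Z


def subgraph_rhs(Z, G):
    """Eq. (17): outflow from state i plus inflow from every other state k."""
    n = len(G)
    return [
        -sum(Z[i]) * G[i] + sum(Z[k][i] * G[k] for k in range(n))
        for i in range(n)
    ]


line = [[0, 1], [1, 0]]
ext = [0.3, 0.5]                      # hypothetical (T Delta)_1 and (T Delta)_2
states, Z = rate_matrix(line, ext, tau=1.0, gamma=1.0)
G = [0.9, 0.05, 0.0, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]
rhs = subgraph_rhs(Z, G)
```

Because each transition removes mass from one state and adds it to another, the entries of `rhs` sum to zero, which is a convenient sanity check on any generated Z.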
The final step in the generation of the full system is to set the initial conditions. Only the initial conditions for subgraph states need computing, as I(0), R(0) and θ(0) are fixed as per the previous section. This can be done by cycling through each element of $\vec{G}$. If (a) $\vec{G}_i$ is a purely susceptible state then we set $\vec{G}_i^0 = J_i (1 - \epsilon)$, and if (b) $\vec{G}_i$ contains a single infectious individual and is otherwise susceptible, we set $\vec{G}_i^0 = J_i \epsilon$. All other states are set to zero, as we assume that with a sufficiently small infectious seed, the probability of having two infectious individuals in a subgraph is zero.
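A minimal Python sketch of rules (a) and (b) is given below. The function name and the use of a single `hyperstub_degree` value for the whole subgraph (standing in for the relevant J_i evaluated at θ = 1) are assumptions made for illustration.

```python
from itertools import product


def initial_subgraph_states(n_nodes, hyperstub_degree, eps):
    """Initial value of every SIR state of a subgraph with n_nodes nodes."""
    init = {}
    for state in product("SIR", repeat=n_nodes):
        label = "".join(state)
        if label.count("S") == n_nodes:
            # (a) purely susceptible state
            init[label] = hyperstub_degree * (1.0 - eps)
        elif label.count("I") == 1 and label.count("S") == n_nodes - 1:
            # (b) a single infectious node, all others susceptible
            init[label] = hyperstub_degree * eps
        else:
            # states with two or more non-susceptible nodes are neglected
            init[label] = 0.0
    return init


init_line = initial_subgraph_states(n_nodes=2, hyperstub_degree=1.0, eps=0.01)
```

For the line subgraph this leaves only SS, SI and IS non-zero, in keeping with the assumption of a sufficiently small infectious seed.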
4 Results
To validate the proposed mean-field model and to assess the goodness of the approximation, we compare results from the ODEs to output from stochastic simulation. Networks were generated following the configuration algorithm; please refer to the SI, algorithm S1. Typically we generated 500 networks of size N = 15000 and computed a single realisation of the epidemic on each, according to the Gillespie algorithm with the per-link rate of infection τ = 1 and a recovery rate of γ = 1. Simulations which died out before an outbreak occurred were removed. The simulations were seeded with a single infectious individual and an outbreak was said to occur if 5% infectious incidence was achieved. To start, we test the performance of our model against existing or state-of-the-art models. To do this, in Fig. 3a, we show results for two degree distributions that are homogeneous in the classical sense. Their PGFs are given by
\begin{align}
\psi_1(\hat{\alpha}) &= \tfrac{1}{2}(\alpha_{14} + \alpha_{17})\, \tfrac{1}{2}(\alpha_{15} + \alpha_{16}), \tag{18} \\
\psi_2(\hat{\alpha}) &= \tfrac{1}{2}(\alpha_1 + \alpha_2)\, \tfrac{1}{4^2}(\alpha_{10} + \alpha_{11} + \alpha_{12} + \alpha_{13})^2, \tag{19}
\end{align}
where the variables $\alpha_i$ correspond to subgraphs given in Fig. 1. Figure 3a shows results from a pairwise model with closures at the level of quadruples [12, 19]. While the classical clustering is easy to compute, the order-four clustering/transitivity ratios were measured following a recently developed subgraph counting algorithm [11]. These are defined as the ratio of a given subgraph count to all open and closed paths of length four, both counted uniquely. Currently, this model operates using an average or homogeneous degree and stores no information about the degree distribution, but does assume random mixing of subgraphs. All models perform well in capturing the epidemic dynamics on networks generated using the PGF given by ψ1, see Fig. 3a with higher epidemic peak. However, when networks are created using the PGF given by ψ2, see lower peak in Fig. 3a, the pairwise model struggles to accurately capture the dynamics, both anticipating the epidemic and compressing its time scale, but preserving its peak infective incidence. The pairwise model does not encode any information relating to degree or subgraph distribution and hence a homogeneous random setup, as used here, would be the most appropriate choice. The key advantage of our model over existing ones is that it can handle non-fully connected subgraphs. To test this, in Fig. 3b, we consider networks with loops of increasing size from triangles to pentagons (i.e. GD). Each network was generated so that ⟨k⟩ ∼ Pois(4). Two of an average node’s four edges will go on to form a loop. In doing so, each node has, on average, a single loop in its local neighbourhood. A Poisson random network with ⟨k⟩ = 4 has been included for reference. In this case, the clustered network accelerates the speed of the outbreak and reduces the final size compared to the random network. Once a single node in a triangle has been infected, its neighbours may become infected not only by the original infected node but also via the third node. When closing an open triple of nodes, the extra edge ensures a more systematic spread of the infection. However, this also leads to competition for susceptibles. In contrast, if the edge closing the open loop connects to a 4th node, this will allow for a larger final epidemic size. In Fig. 3b, this experiment was repeated by replacing G△ loops with G□ (empty square) loops and then with empty pentagons, GD. Compared to the clustered network, both increase the initial speed of outbreaks, produce a greater peak infective incidence and reduce the final size of the epidemic. It is worth noting that these two networks have clustering comparable to that of a random network, and therefore, these effects are purely a result of the presence of higher-order loops. A single infected node in an otherwise susceptible loop gives rise to two propagating infectious fronts. Even if an infectious node in one of the fronts recovers, it is still possible for the other front to propagate around the loop. This ‘ripple around’ effect increases the chance of the epidemic reaching more nodes. Hence, as shown by our results, longer loops give rise to larger epidemics. However, in general, networks are not composed of single isolated loops but rather tend to involve overlapping subgraphs, and therefore, such effects are much more subtle. To highlight the flexibility of our model and its wide-ranging applicability to systematically investigating the impact of higher-order network structure, in Fig.
3c, we consider two networks with the same degree distribution and variance, and the same classical clustering value, but generated using different families of subgraphs. The first network was generated with G△ ∼ Pois(2), i.e. all triangles, while the second one was built using G0 ∼ Pois(2/3) and G⧄ ∼ Pois(4/3), where G⧄ denotes the square with one diagonal, i.e. two overlapping triangles. Both networks have degree distribution k ∼ Pois(4) and a clustering of φ = 0.2 and φ = 0.1905, respectively. It is possible to interpolate between the two extreme networks, i.e. non-overlapping and overlapping triangles, via a family of networks which can be constrained to exhibit the same degree distribution, degree variance and classical clustering. These constraints translate to
\[ \langle k \rangle = G_0 + 2G_{\triangle} + \tfrac{5}{2} G_{\boxslash}, \qquad \langle \triangle \rangle = G_{\triangle} + \tfrac{3}{2} G_{\boxslash}, \tag{20} \]
where ⟨k⟩ and ⟨△⟩ denote the expected number of lines and triangles per node, respectively. By treating ⟨k⟩ and ⟨△⟩ as given and $G_{\boxslash}$ as a free parameter, we can derive the following parametric equations
\[ G_0 = \langle k \rangle - 2\langle \triangle \rangle + \tfrac{1}{2} G_{\boxslash}, \qquad G_{\triangle} = \langle \triangle \rangle - \tfrac{3}{2} G_{\boxslash}, \tag{21} \]
with the following positivity constraints, obtained by requiring $G_0 \geq 0$ and $G_{\triangle} \geq 0$ in Eq. (21): $\max\{0, 2(2\langle \triangle \rangle - \langle k \rangle)\} \leq G_{\boxslash} \leq \tfrac{2}{3}\langle \triangle \rangle$. The networks in Fig. 3c correspond to $G_{\boxslash} = 0$ and $G_{\boxslash} = 4/3$ with ⟨k⟩ = 4 and ⟨△⟩ = 2. This approach is applicable to many combinations of different subgraphs and degree distributions, making it possible to systematically vary the composition of higher-order structures. Overlapping triangles lead to epidemics that grow faster and peak higher compared to networks with non-overlapping triangles. Partly, this effect is due to what was observed in Fig. 3b. Namely, by letting triangles overlap, additional loops of length 4 are formed with a denser connectivity compared to two separate triangles. As connections within subgraphs become denser, connections between subgraphs must decrease. As a result, two different networks with the same degree distribution, variance and classical clustering give rise to different prevalence profiles. Thus, it is important to consider how clustering in the classical sense arises since higher-order structure, such as the square with one diagonal, can significantly affect the epidemic dynamics.
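The interpolation defined by Eqs. (20) and (21) can be explored with a small Python helper. The function and argument names are hypothetical; `g_diag` stands for the mean number of squares with one diagonal (two overlapping triangles) per node, and feasibility is checked directly by requiring both returned means to be non-negative.

```python
def subgraph_means(k_mean, tri_mean, g_diag, tol=1e-12):
    """Return (mean lines, mean triangles) per node from Eq. (21)."""
    g_line = k_mean - 2.0 * tri_mean + 0.5 * g_diag
    g_tri = tri_mean - 1.5 * g_diag
    if g_diag < -tol or g_line < -tol or g_tri < -tol:
        raise ValueError("subgraph means must be non-negative")
    return g_line, g_tri


def check_constraints(k_mean, tri_mean, g_diag):
    """Recompute <k> and <triangle> from Eq. (20) for a chosen g_diag."""
    g_line, g_tri = subgraph_means(k_mean, tri_mean, g_diag)
    k_back = g_line + 2.0 * g_tri + 2.5 * g_diag
    tri_back = g_tri + 1.5 * g_diag
    return k_back, tri_back


all_triangles = subgraph_means(4.0, 2.0, 0.0)       # first network of Fig. 3c
overlapping = subgraph_means(4.0, 2.0, 4.0 / 3.0)   # second network of Fig. 3c
midpoint = check_constraints(4.0, 2.0, 2.0 / 3.0)   # an intermediate member
```

Any value of `g_diag` between the two extremes yields a network family member with the same ⟨k⟩ = 4 and ⟨△⟩ = 2, which is the property exploited in Fig. 3c.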
5 Discussion
Higher-order structures, captured for example as different subgraph compositions and arrangements in a network, have been identified as features of real networks. Examples include households, social interactions and biological networks. These building blocks of networks have been shown to play a key role in defining a network’s topology and can have significant impact on the functions of the network or on the dynamical processes unfolding on the network. Despite this, the modelling toolset in this direction is underdeveloped. Here, we provided an approach that considerably extends the scope of the current modelling framework by enabling us to consider arbitrary sets of exotic subgraphs as building blocks for the network. Our approach also offers control over the arrangements of subgraphs and, more importantly and uniquely, an automated way of generating a system of ODEs that accurately capture the prevalence profile for a wide range of subgraph sets, as shown in the results section. The previous section has shown how higher-order structures may be investigated using this model. Moreover, we provided the first example of generating classes of networks that can interpolate between different subgraph sets while keeping degree, variance and clustering, all in the classic sense, fixed. For example, we showed that epidemics on networks with no clustering, but exhibiting open loops, display features which are significantly different to those observed in classical random networks with effectively no clustering. Equally, we have shown that the arrangements of subgraphs, in our example, triangles, can create higher-order structure that may significantly affect the epidemic dynamics. Our work opens the possibility to carry out a wide-ranging and systematic investigation of the impact of subgraphs and higher-order structure on dynamics on networks. 
When presented with real-world network data whose structure can be explained by a set of subgraphs, all that will be needed in order to apply our framework is to extract the subgraphs and their distribution around nodes. A possible limitation to the widest applicability is the number of nodes in the largest subgraph. However, as shown by our results when going from squares to pentagons, it is likely that the effects of higher-order structures will decay, or be less marked, as their size increases. There are two key ways in which this work may be extended: (a) generalisation to SIS dynamics. Due to the definition of θ it is currently not possible to apply this model to SIS dynamics. However, all the framework relating to network structure is independent of this variable and may therefore still be appropriate; (b) the subgraph approach is highly suitable for adaptation to household models. Household models typically specify a distribution of household sizes overlaid on a contact network to produce a well-connected network [21, 22]. A successful incorporation of such networks in our framework could lead to a highly relevant set of household models.
Acknowledgements Martin Ritchie acknowledges funding for his PhD studies from EPSRC (Engineering and Physical Sciences Research Council) and the University of Sussex.
References
[1] Tildesley MJ, et al. (2010) Impact of spatial clustering on disease transmission and optimal control. Proceedings of the National Academy of Sciences 107:1041–1046.
[2] Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512.
[3] Kiss IZ, Green DM, Kao RR (2006) The network of sheep movements within Great Britain: network properties and their implications for infectious disease spread. Journal of the Royal Society Interface 3:669–677.
[4] Newman ME (2002) Spread of epidemic disease on networks. Physical Review E 66:016128.
[5] Pastor-Satorras R, Vespignani A (2001) Epidemic spreading in scale-free networks. Physical Review Letters 86:3200.
[6] Keeling MJ (1999) The effects of local spatial structure on epidemiological invasions. Proceedings of the Royal Society of London Series B: Biological Sciences 266:859–867.
[7] Ball F, Lyne OD (2001) Stochastic multi-type SIR epidemics among a population partitioned into households. Advances in Applied Probability 33:99–123.
[8] Ball F, Sirl D, Trapman P (2010) Analysis of a stochastic SIR epidemic on a random network incorporating household structure. Mathematical Biosciences 224:53–73.
[9] Karrer B, Newman MEJ (2010) Random graphs containing arbitrary distributions of subgraphs. Physical Review E 82:066118.
[10] Molloy M, Reed B (1995) A critical point for random graphs with a given degree sequence. Random Structures & Algorithms 6:161–180.
[11] Ritchie M, Berthouze L, House T, Kiss IZ (2014) Higher-order structure and epidemic dynamics in clustered networks. Journal of Theoretical Biology 348:21–32.
[12] House T, Davies G, Danon L, Keeling MJ (2009) A motif-based approach to network epidemics. Bulletin of Mathematical Biology 71:1693–1706.
[13] House T, Keeling MJ. The impact of contact tracing in clustered populations. PLoS Comput Biol 6:e1000721.
[14] Keeling MJ (1999) The effects of local spatial structure on epidemiological invasions. Proceedings of the Royal Society of London Series B: Biological Sciences 266:859–867.
[15] Green DM, Kiss IZ (2010) Large-scale properties of clustered networks: Implications for disease dynamics. Journal of Biological Dynamics 4:431–445.
[16] Milo R, et al. (2002) Network motifs: simple building blocks of complex networks. Science 298:824–827.
[17] Colomer-de Simón P, Serrano MA, Beiró MG, Alvarez-Hamelin JI, Boguñá M (2013) Deciphering the global organization of clustering in real complex networks. Scientific Reports 3, doi:10.1038/srep02517.
[18] Volz EM, Miller JC, Galvani A, Ancel Meyers L (2011) Effects of heterogeneous and clustered contact patterns on infectious disease dynamics. PLoS Computational Biology 7:e1002042.
[19] House T (2010) Generalised network clustering and its dynamical implications. Advances in Complex Systems 13(3):281–291.
[20] Volz E (2008) SIR dynamics in random networks with heterogeneous connectivity. Journal of Mathematical Biology 56:293–310.
[21] House T, Keeling MJ (2009) Household structure and infectious disease transmission. Epidemiology and Infection 137:654–661.
[22] Ball F, Sirl D, et al. (2012) An SIR epidemic model on a population with random network and household structure, and several types of individuals. Advances in Applied Probability 44:63–86.
6 Figures and tables
[Figure 1 image: the five subgraphs with their position labels: G0 (x1, x2), G△ (x3–x5), G⊠ (x6–x9), and two further four-node subgraphs (x10–x13 and x14–x17).]
Figure 1: Subgraph notation and position labelling. Subgraphs are denoted by G followed by a symbolic subscript for ease of reference.
Figure 2: Graphical representation of ∆. Infection flows from left to right. The test node (in red) may belong to a number of subgraphs shown to its left. Upon being infected, the test node will potentially transmit to any of the G△ shown to its right.
[Figure 3 image: three panels, (a), (b) and (c), each plotting Prevalence against Time.]
Figure 3: (a) The dashed and solid lines correspond to ODE solution and simulation average, respectively. Data with the lower peak are based on networks generated with each node allocated one of each corner type of a G with resulting clustering φ = 0.3. Data with higher peak correspond to networks generated with a single G0 and two G subgraphs. The pairwise model solutions are shown in dashed-green, and the ODE solutions in dashed-black. (b) The effect of empty loops for four different distributions: G0 ∼ Pois(4) (red), G0 ∼ Pois(2) and G△ ∼ Pois(1) (blue), G0 ∼ Pois(2) and G ∼ Pois(1) (green) and G0 ∼ Pois(2) and GD ∼ Pois(1) (pink). The corresponding ODE solutions are shown with dashed lines. (c) The solution corresponding to the network generated with G△ ∼ Pois(2), i.e., all triangles with φ = 0.2 is shown in blue. That generated with G0 ∼ Pois(2/3) and G ∼ Pois(4/3) with φ = 0.1905 is shown in red. Both networks have classical degree of k ∼ Pois(4).
Table 1: Summary of the key system variables and their generation.
Variable | Description | Generation
ψ | PGF of the HDD given as a function, not as a series. | A symbolic software package can be used to compute the Jacobian and Hessian.
θ | Survivor functions with their evolution equations given by ODEs. | These ODEs can be defined within a single for loop, see Eq. (9).
(S, I, R) | The prevalences of S, I and R, with the latter two given by numerical solutions of ODEs. | From Eq. (14), it follows that S = ψ(θ).
Ti | Total rate of infection experienced by an S in position xi. | For a subgraph with m nodes, Ti may be generated by m nested for loops cycling through the possible states that a subgraph can be in, see Eq. (3).
G(A, B, ..., C) | Expected prevalence of a subgraph in a given state. | The equation for this is computed based on the rate matrix, Z, see Eq. (17).
7
Supplementary information
In the following we give a more detailed explanation of the excess degree, we provide ODEs for an example network, we show how a previous model is equivalent to ours under certain conditions, we provide an example state transition matrix and finally we describe pseudocode for both a subgraph-based configuration model and the algorithm used to generate the state transition matrix.
7.1
Excess degree
Consider the probability generating function (PGF) of a network's hyperstub degree distribution with $m$ nodal positions:
\[
\psi(\hat{\alpha}) = \sum_{\hat{y}=0}^{\infty} p_{\hat{y}} \prod_{i=1}^{m} \alpha_i^{y_i}, \tag{22}
\]
where $\hat{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ is a placeholder, and $\hat{y} = (y_1, y_2, \ldots, y_m)$ is such that $y_i$ is the expected number of times a node appears in position $x_i$, $i = 1, 2, \ldots, m$. Select a node at random but proportionally to its number of $x_i$ hyperstubs. This means that, typically, a node will be a member of $y_i p_{\hat{y}}$ subgraphs at position $x_i$. Summing this value over all nodes gives the expected $x_i$ degree:
\[
\sum_{\hat{y}=0}^{\infty} y_i p_{\hat{y}} =: \langle x_i \rangle = \left. \frac{\partial \psi(\hat{\alpha})}{\partial \alpha_i} \right|_{\hat{\alpha}=1}. \tag{23}
\]
In performing the above sum, we considered all nodes from which $x_i$ stubs originate. In order to compute the expected $x_j$ degree of these nodes, we use the conditional expectation $E(x_j \mid x_i = y_i)$, which yields:
\[
\delta_{x_i,x_j} = \frac{\sum_{\hat{y}=0}^{\infty} y_i y_j p_{\hat{y}}}{\sum_{\hat{y}=0}^{\infty} y_i p_{\hat{y}}}, \tag{24--25}
\]
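where $\delta_{x_i,x_j}$ denotes the expected $x_j$ hyperstub degree observed from a node selected proportionally to its $x_i$ hyperstub degree. The denominator is given by Eq. (23), and the numerator is specified by:
\[
\sum_{\hat{y}^*=0}^{\infty} y_i y_j p_{\hat{y}^*} = \left. \frac{\partial^2 \psi}{\partial \alpha_i \partial \alpha_j} \right|_{\hat{\alpha}=1}. \tag{26}
\]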
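As a rough numerical illustration of Eqs. (23) to (26), the sketch below evaluates the expected hyperstub degree and the neighbour quantity $\delta_{x_i,x_j}$ directly from a finite distribution $p_{\hat{y}}$. The dictionary representation, the function names and the example distribution are assumptions made for illustration. For $i = j$ the sketch follows the second derivative of the PGF, which gives $y_i(y_i - 1)$ rather than $y_i^2$; reading this as the intended excess count is also an assumption.

```python
def expected_degree(p, i):
    """Mean number of position-i hyperstubs per node, as in Eq. (23)."""
    return sum(prob * y[i] for y, prob in p.items())


def neighbour_degree(p, i, j):
    """Expected x_j hyperstub degree of a node reached through an x_i hyperstub.

    For i == j the PGF second derivative yields y_i * (y_i - 1), so the
    hyperstub used to reach the node is not counted (assumed interpretation).
    """
    if i == j:
        numerator = sum(prob * y[i] * (y[i] - 1) for y, prob in p.items())
    else:
        numerator = sum(prob * y[i] * y[j] for y, prob in p.items())
    return numerator / expected_degree(p, i)


# Hypothetical two-position distribution: half of the nodes have
# hyperstub degree (2, 0) and the other half have (1, 1).
p = {(2, 0): 0.5, (1, 1): 0.5}
mean_x1 = expected_degree(p, 0)
delta_12 = neighbour_degree(p, 0, 1)
delta_11 = neighbour_degree(p, 0, 0)
```

Working from the sums rather than from symbolic derivatives keeps the example dependency-free; a symbolic package would be the natural choice when $\psi$ is available in closed form.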
7.2
ODEs for an example network
In the following, the values of $G$ are determined by integrating ODEs. This is a logical place to start when deriving ODEs by hand. The following equations specify the rates of infection experienced through hyperstubs, and effectively list the key states a subgraph can be in:
\begin{align}
T_1 &= \tau [G_0(SI)], \tag{27} \\
T_2 &= \tau [G_0(IS)], \tag{28} \\
T_3 &= \tau [G_\triangle(SSI) + G_\triangle(SIS) + 2G_\triangle(SII) + G_\triangle(SRI) + G_\triangle(SIR)], \tag{29} \\
T_4 &= \tau [G_\triangle(SSI) + G_\triangle(ISS) + 2G_\triangle(ISI) + G_\triangle(RSI) + G_\triangle(ISR)], \tag{30} \\
T_5 &= \tau [G_\triangle(ISS) + G_\triangle(SIS) + 2G_\triangle(IIS) + G_\triangle(IRS) + G_\triangle(RIS)]. \tag{31}
\end{align}
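The pattern in Eqs. (27) to (31) can be generated mechanically: for a susceptible node in a given position, each subgraph state contributes its prevalence weighted by the number of infected neighbours of that position. The following is a minimal sketch of that enumeration. The function name, the dictionary `G` keyed by state strings and the use of `itertools.product` in place of explicit nested loops are assumptions made for illustration.

```python
from itertools import product


def infection_rate(G, g, pos, tau):
    """Rate of infection on a susceptible node at index pos of one subgraph.

    G   : dict mapping state strings such as "SSI" to subgraph prevalences
    g   : adjacency matrix of the subgraph (list of lists of 0/1)
    pos : index of the position within the subgraph (0-based)
    tau : per-link infection rate
    Returns the rate and the list of states that contribute to it.
    """
    n = len(g)
    total = 0.0
    contributing = []
    for state in product("SIR", repeat=n):
        if state[pos] != "S":
            continue
        # Count infected neighbours of the susceptible node in this state.
        infected = sum(1 for l in range(n) if g[pos][l] == 1 and state[l] == "I")
        if infected:
            label = "".join(state)
            total += infected * G[label]
            contributing.append(label)
    return tau * total, contributing


# Hypothetical check on a triangle with every state given prevalence 1.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
G_tri = {"".join(s): 1.0 for s in product("SIR", repeat=3)}
T3, states_T3 = infection_rate(G_tri, triangle, 0, 0.5)
T4, states_T4 = infection_rate(G_tri, triangle, 1, 0.5)
```

With these inputs the contributing states for the first and second triangle positions coincide with the state lists in Eqs. (29) and (30).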
The full system corresponding to subgraphs with node cardinality $n$ is of order $O(3^n)$ equations. In the case of lines, $G_0$, the full system contains 9 equations, where the first few ODEs are:
\begin{align}
\dot{G}_0(SS) &= -[(T\Delta)_2 + (T\Delta)_1] G_0(SS), \tag{32} \\
\dot{G}_0(SI) &= -(\tau + \gamma) G_0(SI) - (T\Delta)_1 G_0(SI) + (T\Delta)_2 G_0(SS), \tag{33} \\
\dot{G}_0(IS) &= -(\tau + \gamma) G_0(IS) - (T\Delta)_2 G_0(IS) + (T\Delta)_1 G_0(SS), \tag{34}
\end{align}
with equations for
\[
\{\dot{G}_0(SR), \dot{G}_0(II), \dot{G}_0(IR), \dot{G}_0(RS), \dot{G}_0(RI), \dot{G}_0(RR)\}, \tag{35}
\]
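being omitted. Similarly, sample ODEs for the $G_\triangle$ subgraph, taken from a system of 27 ODEs, are:
\begin{align}
\dot{G}_\triangle(SSS) &= -[(T\Delta)_5 + (T\Delta)_4 + (T\Delta)_3] G_\triangle(SSS), \tag{36} \\
\dot{G}_\triangle(SSI) &= -[2\tau + \gamma + (T\Delta)_4 + (T\Delta)_3] G_\triangle(SSI) + (T\Delta)_5 G_\triangle(SSS), \tag{37} \\
\dot{G}_\triangle(SIS) &= -[2\tau + \gamma + (T\Delta)_5 + (T\Delta)_3] G_\triangle(SIS) + (T\Delta)_4 G_\triangle(SSS), \tag{38} \\
\dot{G}_\triangle(ISS) &= -[2\tau + \gamma + (T\Delta)_5 + (T\Delta)_4] G_\triangle(ISS) + (T\Delta)_3 G_\triangle(SSS), \tag{39}
\end{align}
with equations for
\begin{align}
\{&\dot{G}_\triangle(SSR), \dot{G}_\triangle(SII), \dot{G}_\triangle(SIR), \dot{G}_\triangle(SRS), \dot{G}_\triangle(SRI), \dot{G}_\triangle(SRR), \tag{40} \\
&\dot{G}_\triangle(ISI), \dot{G}_\triangle(ISR), \dot{G}_\triangle(IIS), \dot{G}_\triangle(III), \dot{G}_\triangle(IIR), \dot{G}_\triangle(IRS), \tag{41} \\
&\dot{G}_\triangle(IRI), \dot{G}_\triangle(IRR), \dot{G}_\triangle(RSS), \dot{G}_\triangle(RSI), \dot{G}_\triangle(RSR), \dot{G}_\triangle(RIS), \tag{42} \\
&\dot{G}_\triangle(RII), \dot{G}_\triangle(RIR), \dot{G}_\triangle(RRS), \dot{G}_\triangle(RRI), \dot{G}_\triangle(RRR)\}, \tag{43}
\end{align}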
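being omitted. Each hyperstub will have a survivor function and a corresponding ODE describing its evolution, as follows:
\begin{align}
\dot{\theta}_1 &= -\theta_1 \frac{T_1}{M_1}, \tag{44} \\
\dot{\theta}_2 &= -\theta_2 \frac{T_2}{M_2}, \tag{45} \\
\dot{\theta}_3 &= -\theta_3 \frac{T_3}{M_3}, \tag{46} \\
\dot{\theta}_4 &= -\theta_4 \frac{T_4}{M_4}, \tag{47} \\
\dot{\theta}_5 &= -\theta_5 \frac{T_5}{M_5}. \tag{48}
\end{align}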
The fraction of the population that is susceptible or infected is computed by integrating the ODEs. Recoveries occur independently and may be computed directly from the infected population, as specified below:
\begin{align}
\dot{S} &= \frac{d}{dt} \psi(\hat{\theta}), \tag{49} \\
\dot{I} &= -\frac{d}{dt} \psi(\hat{\theta}) - \gamma I, \tag{50} \\
\dot{R} &= \gamma I, \tag{51}
\end{align}
where $\psi$ is the probability generating function that generates the hyperstub degree distribution and $\hat{\theta} = (\theta_1, \theta_2, \theta_3, \theta_4, \theta_5)$ collects the probabilities that infection has not been transmitted through hyperstubs of types one to five. The total system size for this example network is
\[
3^2 + 3^3 + 5 + 2 = 43, \tag{52}
\]
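To make the time derivative of $\psi(\hat{\theta})$ in Eqs. (49) and (50) concrete, the sketch below applies the chain rule, $\frac{d}{dt}\psi(\hat{\theta}) = \sum_i \frac{\partial \psi}{\partial \theta_i} \dot{\theta}_i$, for one specific PGF. It assumes independent Poisson hyperstub degrees with hypothetical means `lam`, so that $\psi(\hat{\theta}) = \exp\left(\sum_i \lambda_i (\theta_i - 1)\right)$; this choice of PGF and all function names are illustrative assumptions, and the values of $\dot{\theta}_i$ are taken as given inputs.

```python
import math


def psi(theta, lam):
    """PGF of independent Poisson hyperstub degrees (assumed for illustration)."""
    return math.exp(sum(l * (t - 1.0) for l, t in zip(lam, theta)))


def s_dot(theta, theta_dot, lam):
    """Chain rule for d/dt psi(theta); here dpsi/dtheta_i = lam_i * psi."""
    return psi(theta, lam) * sum(l * td for l, td in zip(lam, theta_dot))


def i_dot(theta, theta_dot, lam, gamma, infected):
    """Eq. (50): new infections minus recoveries."""
    return -s_dot(theta, theta_dot, lam) - gamma * infected


# Hypothetical state: two hyperstub types with decreasing survivor functions.
theta = [0.9, 0.8]
theta_dot = [-0.1, -0.2]
lam = [2.0, 1.0]
dS = s_dot(theta, theta_dot, lam)
dI = i_dot(theta, theta_dot, lam, 1.0, 0.05)
```

For other hyperstub degree distributions only `psi` and its partial derivatives change; the structure of `s_dot` and `i_dot` stays the same.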
with each term in the above corresponding to $G_0$, $G_\triangle$, survivor functions and epidemic prevalence, respectively. In general, the total number of equations is given by:
\[
\sum_{i=1}^{m} \left( 3^{|G_i|} + |G_i| \right) + 2, \tag{53}
\]
where $G_i$ denotes a subgraph, $|G_i|$ is the number of nodes in that subgraph, and $m$ is the total number of distinct subgraphs.
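The count in Eq. (53) is easy to check numerically. The short sketch below, with a hypothetical function name, reads the formula as one term $3^{|G_i|} + |G_i|$ per subgraph plus the two prevalence equations, and reproduces the total of 43 for the example network made of lines and triangles.

```python
def total_equations(subgraph_sizes):
    """Number of ODEs for a network built from subgraphs of the given sizes.

    Each subgraph with n nodes contributes 3**n state equations and n
    survivor functions; the final 2 accounts for the prevalence equations.
    """
    return sum(3 ** n + n for n in subgraph_sizes) + 2


# Lines (2 nodes) and triangles (3 nodes), as in the example network.
example_total = total_equations([2, 3])
```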
7.3
Equivalence to previous model for complete subgraphs
The PGF formulation originally proposed by Volz et al. [18] is equivalent to our proposed model in the case of complete subgraphs. Consider an arbitrary complete subgraph composed of $l$ nodes and a network that is composed only of this subgraph. If positions within the subgraph are labelled distinctly, $\{x_1, x_2, \ldots, x_l\}$, as we have done in our approach, then the PGF of such a network is given by:
\[
\psi_p(\hat{\alpha}) = \sum_{\hat{y}=0}^{\infty} p_{\hat{y}} \prod_{i=1}^{l} \alpha_i^{y_i}, \tag{54}
\]
where $\hat{y} = (y_1, y_2, \ldots, y_l)$. Volz et al.'s framework treats all topologically equivalent positions as one single position. Thus, in this case, the subgraph has a single label, $x$, and the PGF takes the following form:
\[
\psi_v(\hat{\alpha}) = \sum_{y=0}^{\infty} p_y \alpha^{y}. \tag{55}
\]
We now show how one may obtain Eq. (55) from Eq. (54). Since both PGFs describe the same network, the rate at which our formulation allocates position $x_i$ must be $1/l$ the rate at which Volz et al.'s formulation allocates $x$. If we replace the unique position labels of Eq. (54) with a single position marker (such as in Volz et al.'s model), the following expression is obtained:
\[
\psi_p(\hat{\alpha}) = \sum_{\hat{y}=0}^{\infty} p_{\hat{y}} \prod_{i=1}^{l} \alpha^{y/l}, \tag{56}
\]
where the following substitutions, $y_i = y/l$ and $\alpha_i = \alpha$, were made so that $\alpha^{y}$ is the result of the above product. Now, every time an $x_i$ is allocated, we allocate an $x$ instead. Finally, since $p_{\hat{y}}$ is a joint distribution of $l$ identically distributed independent random variables, i.e., $\hat{y} = (y/l, y/l, \ldots, y/l)$, we get:
\[
\psi_p(\hat{\alpha} = \alpha) = \sum_{y=0}^{\infty} p_y \alpha^{y}. \tag{57}
\]
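The substitution leading from Eq. (54) to Eq. (57) can be checked numerically on a small example. The sketch below assumes a hypothetical distribution `p_y` over the single-label degree $y$, restricted to multiples of $l$ so that $y/l$ is an integer, and compares the product form of Eq. (56) with the single-label form of Eq. (55). Function names and values are illustrative only.

```python
import math


def psi_p(alphas, p_y):
    """Eq. (56): labelled PGF after substituting y_i = y / l for every position."""
    l = len(alphas)
    return sum(prob * math.prod(a ** (y / l) for a in alphas)
               for y, prob in p_y.items())


def psi_v(alpha, p_y):
    """Eq. (55): PGF with a single position label."""
    return sum(prob * alpha ** y for y, prob in p_y.items())


# Hypothetical distribution for a complete subgraph with l = 3 positions.
p_y = {0: 0.2, 3: 0.5, 6: 0.3}
alpha = 0.7
labelled = psi_p([alpha, alpha, alpha], p_y)
single = psi_v(alpha, p_y)
```

Setting every $\alpha_i$ to the same value collapses the product of $l$ factors $\alpha^{y/l}$ into $\alpha^{y}$, which is why the two evaluations agree up to floating-point error.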
It is also possible to translate between the two models at a later stage of the derivation. As an example, in our approach, infection over lines is given by $T_1$ and $T_2$, as per Eq. (3). By summing these values, the equivalent values used in Volz et al.'s formulation may be recovered. Following our derivation, first let $G_0(SI) \equiv G_0(IS)$ and:
\[
T_1 + T_2 = \tau G_0(SI) + \tau G_0(IS) = 2\tau G_0(SI). \tag{58}
\]
Since each $G_0$ is generated from a PGF that allocates positions at rate 1/2 that of Volz et al.'s PGF, the factor of 2 cancels, yielding $\tau G_0(SI)$. However, it is only necessary to show equivalence between the two PGFs, since all other variables follow from this.
7.4
State transition matrix
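The state transition matrix for $G_0$, lines, is given by:
\[
Z = \begin{array}{c|ccccccccc}
 & (SS) & (SI) & (SR) & (IS) & (II) & (IR) & (RS) & (RI) & (RR) \\ \hline
(SS) & 0 & (T\Delta)_2 & 0 & (T\Delta)_1 & 0 & 0 & 0 & 0 & 0 \\
(SI) & 0 & 0 & \gamma & 0 & \tau + (T\Delta)_1 & 0 & 0 & 0 & 0 \\
(SR) & 0 & 0 & 0 & 0 & 0 & (T\Delta)_1 & 0 & 0 & 0 \\
(IS) & 0 & 0 & 0 & 0 & \tau + (T\Delta)_2 & 0 & \gamma & 0 & 0 \\
(II) & 0 & 0 & 0 & 0 & 0 & \gamma & 0 & \gamma & 0 \\
(IR) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \gamma \\
(RS) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & (T\Delta)_2 & 0 \\
(RI) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \gamma \\
(RR) & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}.
\]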
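One way to see how the matrix $Z$ relates to the ODEs of Section 7.2 is to treat it as a rate matrix: each state gains probability from the states that flow into it and loses probability at its total outflow rate. The sketch below hard-codes the non-zero entries of the line matrix above with numerical values and builds the right-hand side in this way. The master-equation form, the function names and the parameter values are assumptions made for illustration; they are chosen to be consistent with Eqs. (32) to (34).

```python
def line_transition_matrix(tau, gamma, t1, t2):
    """Numerical Z for lines; t1 and t2 stand in for (T Delta)_1 and (T Delta)_2."""
    states = ["SS", "SI", "SR", "IS", "II", "IR", "RS", "RI", "RR"]
    idx = {s: k for k, s in enumerate(states)}
    entries = {
        ("SS", "SI"): t2, ("SS", "IS"): t1,
        ("SI", "SR"): gamma, ("SI", "II"): tau + t1,
        ("SR", "IR"): t1,
        ("IS", "II"): tau + t2, ("IS", "RS"): gamma,
        ("II", "IR"): gamma, ("II", "RI"): gamma,
        ("IR", "RR"): gamma,
        ("RS", "RI"): t2,
        ("RI", "RR"): gamma,
    }
    Z = [[0.0] * len(states) for _ in states]
    for (src, dst), rate in entries.items():
        Z[idx[src]][idx[dst]] = rate
    return states, Z


def rhs(Z, G):
    """dG_j/dt = inflow into state j minus outflow from state j."""
    n = len(G)
    return [sum(Z[i][j] * G[i] for i in range(n)) - G[j] * sum(Z[j])
            for j in range(n)]


tau, gamma, t1, t2 = 0.5, 1.0, 0.1, 0.2
states, Z = line_transition_matrix(tau, gamma, t1, t2)
G = [0.6, 0.1, 0.05, 0.1, 0.05, 0.03, 0.03, 0.02, 0.02]
dG = rhs(Z, G)
```

Because every transition removes from one state exactly what it adds to another, the entries of `dG` sum to zero, which is a convenient sanity check on any generated matrix.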
8
Algorithm S1 - Hyperstub CM algorithm
Algorithm 1: The hyperstub configuration model.
input : N, K
output: A

Variables / initialisation
  N: the network size,
  m: number of node positions,
  M: number of subgraphs,
  K: hyperstub degree sequence, a non-square matrix K ∈ N_0^(N×m),
  G: a subgraph with node cardinality |G|,
  g: adjacency matrix of a subgraph,
  A: adjacency matrix of the network, A ∈ {0, 1}^(N×N).

Procedure
  % work through all nodes filling the hyperstub bins
  for i = (1, 2, ..., N) do
    for j = (1, 2, ..., m) do
      append K_(i,j) copies of node(i) to the hyperstub bin h_j
    end
  end
  % sample one node per position of each subgraph and connect them according to g
  for i = (1, 2, ..., M) do
    for j = (1, 2, ..., |G_i|) do
      n_j = rand-sample(h_ij)
    end
    for k = (1, 2, ..., |G_i|) do
      for l = (1, 2, ..., |G_i|) do
        if g(k, l) = g(l, k) = 1 then
          A(n_k, n_l) = A(n_l, n_k) = 1
        end
      end
    end
  end
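A minimal Python sketch of the hyperstub configuration model described in Algorithm 1 is given below. Several details are assumptions that the pseudocode leaves open: hyperstubs are drawn without replacement by shuffling each bin and popping from it, the number of copies of each subgraph type is taken as the shortest of its position bins, and pairs that would create a self-loop are skipped. The function name and the `(positions, g)` input format are hypothetical.

```python
import random


def hyperstub_cm(K, subgraphs, seed=None):
    """Build an adjacency matrix from a hyperstub degree sequence.

    K         : K[i][j] is the number of position-j hyperstubs of node i
    subgraphs : list of (positions, g) pairs; positions lists the global
                position indices of one subgraph type, g is its adjacency matrix
    seed      : optional seed for reproducibility
    """
    rng = random.Random(seed)
    n_nodes = len(K)
    n_positions = len(K[0])
    # Fill one bin per position with K[i][j] copies of node i.
    bins = [[i for i in range(n_nodes) for _ in range(K[i][j])]
            for j in range(n_positions)]
    for b in bins:
        rng.shuffle(b)
    A = [[0] * n_nodes for _ in range(n_nodes)]
    for positions, g in subgraphs:
        # Assumption: form as many copies as the shortest bin allows.
        n_copies = min(len(bins[p]) for p in positions)
        for _ in range(n_copies):
            nodes = [bins[p].pop() for p in positions]
            for k in range(len(nodes)):
                for l in range(k + 1, len(nodes)):
                    # Assumption: skip pairs that would create a self-loop.
                    if g[k][l] == 1 and nodes[k] != nodes[l]:
                        A[nodes[k]][nodes[l]] = 1
                        A[nodes[l]][nodes[k]] = 1
    return A


# Hypothetical input: six nodes, each holding one hyperstub for every corner
# of a triangle (global positions 0, 1 and 2).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
K_example = [[1, 1, 1] for _ in range(6)]
A_example = hyperstub_cm(K_example, [([0, 1, 2], triangle)], seed=1)
```

Skipping self-loops and overwriting repeated edges means the realised degrees can fall slightly below the values implied by K on small networks; how such cases should be handled is not specified in the pseudocode.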
9
Algorithm S2 - Transition matrix algorithm
Algorithm 2: Generating the state transition matrix. G_i refers to the i-th state of G. The i-th column and i-th row of Z have entries corresponding to state G_i.
input : g, the adjacency matrix of a graphlet G.
output: Z, the state transition matrix corresponding to G.

Variables / initialisation
  n: node count of G,
  Z: initialise Z as a 3^n × 3^n matrix of zeros,
  G: initialise the list of states as a 3^n × n matrix of zeros,
  τ: per link infection rate,
  γ: recovery rate.

Procedure
  %% Generate the states G_i using n nested for loops that cycle through {1, 2, 3},
  %% corresponding to states {S, I, R} respectively.
  for i = 1 to 3^n do
    for j = 1, j ≠ i to 3^n do
      dG = G_i − G_j
      %% Check that exactly one node has changed state:
      %% dG must be zero everywhere except at a single entry k.
      if dG is non-zero only in entry k then
        switch on the change of node k between G_i and G_j
          case I → R
            Z_(i,j) = γ
          endsw
          case S → I
            for l = 1 to n do
              if g(k, l) = 1 and node l is in state I in G_i then
                Z_(i,j) = Z_(i,j) + τ
              end
            end
            %% add the external force of infection on the position occupied by node k
            Z_(i,j) = Z_(i,j) + (T∆)_k
          endsw
        endsw
      end
    end
  end
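The following Python sketch generates a numerical state transition matrix in the spirit of Algorithm 2. Instead of comparing all pairs of states, it enumerates the single-node changes available from each state, which is assumed to give the same entries. The function name, the tuple encoding of states and the list `ext`, which holds a numerical stand-in for the external term $(T\Delta)$ at each position, are assumptions made for illustration.

```python
from itertools import product


def transition_matrix(g, tau, gamma, ext):
    """Return the list of states and the rate matrix Z for one subgraph.

    g     : adjacency matrix of the subgraph (list of lists of 0/1)
    tau   : per-link infection rate
    gamma : recovery rate
    ext   : ext[k] is the external infection pressure on position k
    """
    n = len(g)
    states = list(product("SIR", repeat=n))
    index = {s: i for i, s in enumerate(states)}
    Z = [[0.0] * len(states) for _ in states]
    for s in states:
        for k in range(n):
            if s[k] == "I":
                # Recovery of node k.
                target = s[:k] + ("R",) + s[k + 1:]
                Z[index[s]][index[target]] = gamma
            elif s[k] == "S":
                # Infection of node k from infected neighbours inside the
                # subgraph plus the external pressure on its position.
                target = s[:k] + ("I",) + s[k + 1:]
                infected = sum(1 for l in range(n)
                               if g[k][l] == 1 and s[l] == "I")
                Z[index[s]][index[target]] = tau * infected + ext[k]
    return states, Z


# Hypothetical check against the line matrix of Section 7.4.
line = [[0, 1], [1, 0]]
line_states, Z_line = transition_matrix(line, 0.5, 1.0, [0.1, 0.2])
labels = ["".join(s) for s in line_states]
```

With `product("SIR", repeat=2)` the states come out in the order SS, SI, SR, IS, II, IR, RS, RI, RR, which matches the row and column order used for the line matrix in Section 7.4.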