Conditional Marginalization Simulation Example
Conditional Marginalization for Exponential Random Graph Models
Tom A.B. Snijders
University of Oxford
University of Groningen
April 2010
© Tom A.B. Snijders
Conditional Marginalization ERGMs
Exponential Random Graph Model (ERGM) defined by
$$P_\theta\{Y = y\} = \exp\bigl(\theta' u(y) - \psi(\theta)\bigr), \qquad y \in \mathcal{Y}(N),$$
where $\mathcal{Y}(N)$ is the set of all graphs on a node set $N$. (Frank & Strauss 1986; Pattison & Wasserman 1996; Snijders, Pattison, Robins & Wasserman 2006).
The induced subgraph on a subset of nodes $N_1 \subset N$ does not in general have an ERG distribution. (Frank & Strauss, 1986)
Is this serious?
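The failure to marginalize can be checked by brute force on a tiny graph. The sketch below is only an illustration under stated assumptions: four nodes, an ERGM with edge and two-star statistics, and parameter values picked arbitrarily (none of this comes from the slides). It sums out the fourth node and computes the triangle coefficient that a three-node ERGM would need in order to reproduce the marginal. The original model has no triangle term, so a nonzero value indicates that the induced subgraph does not follow an ERGM with the same two statistics.

```python
from itertools import combinations, product
from math import comb, exp, log

NODES = range(4)
DYADS = list(combinations(NODES, 2))   # the 6 dyads of the full graph
SUB_DYADS = [(0, 1), (0, 2), (1, 2)]   # dyads inside N1 = {0, 1, 2}
THETA_E, THETA_S = -2.0, 0.5           # illustrative values, not from the slides

def weight(edges):
    """Unnormalized ERGM weight exp(theta' u(y)) with u = (edges, two-stars)."""
    degree = [sum(v in e for e in edges) for v in NODES]
    two_stars = sum(comb(d, 2) for d in degree)
    return exp(THETA_E * len(edges) + THETA_S * two_stars)

# Marginal weight of each induced subgraph on N1, summing over dyads involving node 3.
marginal = {}
for bits in product([0, 1], repeat=len(DYADS)):
    edges = [d for d, b in zip(DYADS, bits) if b]
    sub = frozenset(d for d in edges if d in SUB_DYADS)
    marginal[sub] = marginal.get(sub, 0.0) + weight(edges)

empty = marginal[frozenset()]
one_edge = marginal[frozenset({(0, 1)})]
two_path = marginal[frozenset({(0, 1), (1, 2)})]
triangle = marginal[frozenset(SUB_DYADS)]

# Triangle coefficient a 3-node ERGM would need to match this marginal;
# the normalizing constant cancels because the signs 1, -3, 3, -1 sum to zero.
implied_theta_t = log(triangle) - 3 * log(two_path) + 3 * log(one_edge) - log(empty)
print(f"implied triangle coefficient: {implied_theta_t:.4f}")
```

With the two-star parameter set to zero the edges are independent and the implied coefficient vanishes, which corresponds to the trivial specifications that do marginalize.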
In ‘regular’ statistical models with i.i.d. observations, omitting a random subset of the data is only a loss of information, not an assault on the model’s validity: the smaller data set is still an i.i.d. sample from the same distribution, so the model marginalizes straightforwardly. Therefore, the same type of statistical analysis is still applicable. For ERGMs this is not the case (trivial specifications excepted), which may be regarded as a quirk of the model.
Taking nodes out of the graph is an amputation.
But what can we say about marginal distributions of subgraphs?
When does the ERGM marginalize?
When does the induced subgraph on a subset N1 ⊂ N still have an ERG distribution?
Under the condition that the graph on N1 is disconnected from the rest.
Theorem. If $N = N_1 \cup N_2$, then under the condition that there are no connections between $N_1$ and $N_2$, the induced graphs $Y_1|N_1$ and $Y_2|N_2$ are mutually independent, both having the ERG distribution
$$P_\theta\{Y_h|N_h = y_h\} = \exp\bigl(\theta' u(\tilde{y}_h) - \psi_h(\theta)\bigr),$$
where $\tilde{y}_h$ is the graph on $N$ obtained from $y$ by deleting all edges outside $N_h$.
This generalizes to multiple disconnected subgraphs, and to more restrictive conditions (implying disconnection).
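The factorization in the theorem can be verified numerically for a small case. The following is a minimal sketch, assuming six nodes split into N1 = {0, 1, 2} and N2 = {3, 4, 5}, a statistic made of edge, two-star and triangle counts, and arbitrary parameter values; all of these choices are illustrative and not taken from the slides. It enumerates every graph without edges between the two parts and compares the conditional joint probability with the product of the two component distributions.

```python
from itertools import combinations, product
from math import comb, exp, isclose

THETA = (-1.0, 0.3, 0.8)   # (edges, two-stars, triangles); illustrative values only
N1, N2 = (0, 1, 2), (3, 4, 5)

def u(edges, nodes=range(6)):
    """Sufficient statistic (edges, two-stars, triangles) of a graph on six nodes."""
    edge_set = set(edges)
    degree = [sum(v in e for e in edge_set) for v in nodes]
    triangles = sum(all(p in edge_set for p in combinations(t, 2))
                    for t in combinations(nodes, 3))
    return (len(edge_set), sum(comb(d, 2) for d in degree), triangles)

def weight(edges):
    """Unnormalized weight exp(theta' u(y))."""
    return exp(sum(t * s for t, s in zip(THETA, u(edges))))

def subgraphs(nodes):
    """All graphs on the given nodes, each as a tuple of dyads."""
    dyads = list(combinations(nodes, 2))
    for bits in product([0, 1], repeat=len(dyads)):
        yield tuple(d for d, b in zip(dyads, bits) if b)

# Conditional joint distribution, given no edges between N1 and N2.
joint = {(y1, y2): weight(y1 + y2) for y1 in subgraphs(N1) for y2 in subgraphs(N2)}
total = sum(joint.values())

# Candidate component distributions exp(theta' u(y~_h) - psi_h(theta)).
p1 = {y1: weight(y1) for y1 in subgraphs(N1)}
p2 = {y2: weight(y2) for y2 in subgraphs(N2)}
z1, z2 = sum(p1.values()), sum(p2.values())

factorizes = all(isclose(joint[y1, y2] / total, (p1[y1] / z1) * (p2[y2] / z2))
                 for y1, y2 in joint)
print("conditional joint factorizes:", factorizes)
```

The check succeeds because each of the three counts is additive over disconnected parts, so the weight of the whole graph is the product of the weights of the parts.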
But there is a condition: exclude action at a distance.
Definition. The function $u(y)$ is component separable if, for any partition $N = \cup_{h=1}^{H} N_h$ with $N_h \cap N_k = \emptyset$ for $h \neq k$, and any graph $y$ that has no edges between $N_h$ and $N_k$ for any $h \neq k$, $u(y)$ can be written as
$$u(y) = \sum_{h=1}^{H} u(\tilde{y}_h) + u_d,$$
where $\tilde{y}_h$ is the graph on $N$ obtained from $y$ by deleting all edges outside $N_h$.
For ERGMs, component separability is equivalent to disconnected induced subgraphs being independent.
Examples of ERGMs that are not component separable:
1. Nonlinear functions of subgraph counts, for example
$$u(y) = \sqrt{\sum_{ij} y_{ij}}.$$
2. The count of subgraphs on four points, composed of only two disconnected edges,
$$u(y) = \frac{1}{8} \sum_{i,j,h,k \,:\, \{i,j\} \cap \{h,k\} = \emptyset} y_{ij}\, y_{hk}.$$
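A quick numerical check makes the difference between separable and non-separable statistics concrete. The sketch below assumes that the extra term in the definition of component separability does not change with the edges of the graph; the slides do not spell this out, so treat that reading as an assumption. For two graphs on the same partition it computes the excess of u(y) over the sum of the component contributions. The function and variable names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def edge_count(edges):
    return len(edges)

def sqrt_edges(edges):
    """Nonlinear function of a subgraph count: square root of the edge count."""
    return sqrt(len(edges))

def disjoint_edge_pairs(edges):
    """Number of unordered pairs of edges that share no node."""
    return sum(1 for e, f in combinations(edges, 2) if not set(e) & set(f))

def excess(stat, parts):
    """u(y) minus the sum of u over components, for a graph given by per-component edge lists."""
    whole = [e for part in parts for e in part]
    return stat(whole) - sum(stat(part) for part in parts)

# Two graphs on N1 = {0, 1, 2}, N2 = {3, 4, 5} without edges between the parts.
graph_a = [[(0, 1)], [(3, 4)]]
graph_b = [[(0, 1), (1, 2)], [(3, 4), (4, 5)]]

for name, stat in [("edges", edge_count),
                   ("sqrt(edges)", sqrt_edges),
                   ("disjoint edge pairs", disjoint_edge_pairs)]:
    print(name, excess(stat, graph_a), excess(stat, graph_b))
```

For the edge count the excess is zero for both graphs. For the other two statistics the excess changes from one graph to the other, so what happens inside one component alters the contribution attached to the other: action at a distance.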
An aside ...
In specifications of ERGMs, use only statistics that are linear combinations of subgraph counts (with weights depending on covariates) unless you are willing to permit action at a distance.
Theorem. For a component separable ERGM with sufficient statistic $u(y)$, and $N_1, \ldots, N_H$ a partition of the node set, let $A_0$ be the event that the subsets $N_h$ are all mutually disconnected, and $A_h$ be an event referring only to the induced graph $Y|N_h$. Then conditional on $A_0 \cap A_1 \cap \ldots \cap A_H$, the subgraphs $Y|N_h$ for $h = 1, \ldots, H$ are independent, and
$$P_\theta\{Y|N_h = y_h\} = \begin{cases} \exp\bigl(\theta' u(\tilde{y}_h) - \psi_h(\theta)\bigr) & \text{if } y_h \text{ satisfies } A_h, \\ 0 & \text{otherwise.} \end{cases}$$
The theorem (in its extended form) has a number of corollaries. Suppose that Y follows an ERGM.
1. The graph on a subset of nodes, disconnected from the rest, again follows an ERGM with the "same" specification.
2. A connected graph Y1 on a subset of nodes, disconnected from the rest, again follows an ERGM with the "same" specification, under the additional condition of Y1 being connected.
3. Suppose we begin a snowball sample with node set B. The saturated snowball generated by B follows an ERGM with the same specification, under the additional assumption that all nodes are reachable from B. (Cf. Doreian and Woodard, 1994, for network delineation by a snowball sample.)
4. The graph without its isolates again follows an ERGM with the same specification, under the additional restriction that there are no isolates.
5. The small components:
(Add Health Study, Moody et al.)
The small connected components again follow the same ERGM distribution, under the condition of being small connected components, and here only the low-order subgraph counts play a role.
For example, consider the disconnected triads.
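Corollary. Let $N_0$ be the number of isolated 3-node connected subgraphs; these are isolated two-paths or isolated triangles. In the original ERGM denote the coefficient of the number of edges by $\theta_E$, the coefficient of the number of two-stars by $\theta_{S2}$, and the coefficient of the number of triangles by $\theta_T$. A triangle has one edge, two two-stars and one triangle more than a two-path, and a connected graph on three labelled nodes is either one of three two-paths or the triangle. Then, conditional on $N_0$, the number of isolated triangles has a binomial distribution with binomial denominator $N_0$ and probability parameter
$$\frac{\exp(\theta_E + 2\theta_{S2} + \theta_T)}{3 + \exp(\theta_E + 2\theta_{S2} + \theta_T)}.$$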
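The probability parameter for isolated triads can be obtained by listing the connected graphs on three labelled nodes: three two-paths (one per choice of centre node) and one triangle. The sketch below does this enumeration directly from the ERGM weights and compares the result with the closed form exp(η) / (3 + exp(η)), where η = θE + 2θS2 + θT. The function name and the parameter values are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product
from math import comb, exp, isclose

def triangle_probability(theta_e, theta_s2, theta_t):
    """P(triangle) for a connected graph on three labelled nodes under ERGM weights."""
    dyads = list(combinations(range(3), 2))
    weights = {}
    for bits in product([0, 1], repeat=3):
        edges = [d for d, b in zip(dyads, bits) if b]
        if len(edges) < 2:           # fewer than two edges on three nodes: not connected
            continue
        degree = [sum(v in e for e in edges) for v in range(3)]
        two_stars = sum(comb(d, 2) for d in degree)
        triangles = 1 if len(edges) == 3 else 0
        weights[bits] = exp(theta_e * len(edges)
                            + theta_s2 * two_stars
                            + theta_t * triangles)
    return weights[(1, 1, 1)] / sum(weights.values())

theta = (-1.0, 0.2, 0.5)             # illustrative values, not estimates from the slides
eta = theta[0] + 2 * theta[1] + theta[2]
p = triangle_probability(*theta)
closed_form = exp(eta) / (3 + exp(eta))
print(p, closed_form)
```

Given the N0 isolated triads, which are independent by the theorem, the number of triangles among them is then binomial with this success probability.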
How this works in practice was studied in a simulation study.
Simulation design: non-directed network with n = 100 nodes.
Parameters:
edges             –3.5
alt. k-stars       0.2
alt. 2-paths      –0.4
alt. k-triangles   1.5
This yields average degrees ∼ 2.1.
200 replications. For each replication, one network was generated; the parameters were estimated for this network and, under the condition of connectedness, for the largest connected component (‘saturated snowball’) if it had more than 50 nodes. Extra requirements for Metropolis-Hastings steps.
Expectations:
1. Parameter estimates approximately unbiased in either case.
2. Larger standard errors for saturated snowball.
3. Larger standard errors for smaller saturated snowballs.
4. Type-I error rates approximately correct.
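One way to picture the extra requirement for Metropolis-Hastings steps under the connectedness condition is a sampler that toggles one dyad at a time and rejects any proposal that would disconnect the graph. The following is a rough sketch, not the procedure used in the study: the statistic (edges and triangles), the parameter values, the graph size and all function names are assumptions made for illustration.

```python
import random
from itertools import combinations
from math import exp

def is_connected(n, edges):
    """Depth-first search check that the graph on nodes 0..n-1 is connected."""
    neighbours = {v: set() for v in range(n)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def log_weight(n, edges, theta_e, theta_t):
    """theta' u(y) for the hypothetical statistic u = (edges, triangles)."""
    triangles = sum(all(p in edges for p in combinations(t, 2))
                    for t in combinations(range(n), 3))
    return theta_e * len(edges) + theta_t * triangles

def constrained_mh(n, theta_e, theta_t, steps, seed=0):
    """Metropolis-Hastings dyad toggles that never leave the set of connected graphs."""
    rng = random.Random(seed)
    dyads = list(combinations(range(n), 2))
    edges = set(dyads)                       # start from the complete graph (connected)
    current = log_weight(n, edges, theta_e, theta_t)
    for _ in range(steps):
        proposal = edges ^ {rng.choice(dyads)}           # toggle one dyad
        if not is_connected(n, proposal):
            continue                                     # extra requirement: reject
        candidate = log_weight(n, proposal, theta_e, theta_t)
        if rng.random() < exp(min(0.0, candidate - current)):
            edges, current = proposal, candidate
    return edges

sample = constrained_mh(n=8, theta_e=-1.0, theta_t=0.5, steps=2000)
print(len(sample), is_connected(8, sample))
```

Because the toggle proposal is symmetric, rejecting disconnecting proposals leaves the usual acceptance ratio unchanged and restricts the chain to connected graphs, which is the conditional distribution the theorem describes.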
Of the 400 estimations, 28 had t-ratios for convergence > 0.15. These were discarded.
Figure: distribution of sizes of giant components.
Parameter estimates: degree; with fitted means.
Edge parameter = –3.5; downward bias.
Parameter estimates: alternating k-stars; with fitted means.
Alt. k-star parameter = 0.2.
Parameter estimates: alternating two-paths; with fitted means.
Alt. two-paths parameter = –0.4.
Parameter estimates: alternating k-triangles; with fitted means.
Alt. k-triangle parameter = 1.5; upward bias for snowballs.
Mean absolute errors: degree; with fitted means.
Mean absolute errors: alternating k-stars; with fitted means.
Mean absolute errors: alternating two-paths; with fitted means.
Mean absolute errors: alternating k-triangles; with fitted means.
Conclusions from simulations
Estimates close to unbiased for total and snowball data; slight biases for degree and alternating k-triangle parameters.
Standard errors slightly higher for snowballs, but mainly for degree and alternating k-triangle parameters.
Standard errors for snowballs not clearly dependent on size of largest component.
Empirical type-I error rates for tests of true hypotheses ranged from 0.02 to 0.04.
Therefore, this small simulation study is supportive of the theoretical results.
Conclusion
Support for theoretical consistency of the ERGM.
Marginalization & independence for mutually disconnected subgraphs is intuitive, once you think of it.
To avoid action at a distance: restrict statistics to subgraph counts (with attribute weights).
Network delineation by saturated snowball sample OK.
Extra conditions must be imposed in Metropolis-Hastings steps for drawing from ERGMs.
Possibilities for homogeneity testing in ERGMs; are parameters the same in different components?