Conditional Marginalization Simulation Example


Conditional Marginalization for Exponential Random Graph Models

Tom A.B. Snijders

University of Oxford, University of Groningen

April 2010

© Tom A.B. Snijders

Conditional Marginalization ERGMs

Exponential Random Graph Model (ERGM) defined by

$$P_\theta\{Y = y\} = \exp\bigl(\theta' u(y) - \psi(\theta)\bigr), \qquad y \in \mathcal{Y}(N),$$

where $\mathcal{Y}(N)$ is the set of all graphs on a node set $N$ (Frank & Strauss 1986; Pattison & Wasserman 1996; Snijders, Pattison, Robins & Wasserman 2006).

The induced subgraph on a subset of nodes $N_1 \subset N$ does not in general have an ERG distribution (Frank & Strauss, 1986).

... Is this serious ...?
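The failure to marginalize can be seen on a toy case. The sketch below is not from the slides: it assumes a hypothetical statistic u(y) = (edges, triangles), illustrative parameter values, and a 4-node graph small enough to enumerate. It computes the exact ERGM, sums out everything outside the nodes {0, 1, 2}, and compares the result with the ERGM on three nodes under the same parameters.

```python
import itertools
import math

def dyads(nodes):
    """All unordered node pairs (a, b) with a < b."""
    return list(itertools.combinations(sorted(nodes), 2))

def stats(edges):
    """u(y) = (number of edges, number of triangles); an assumed, illustrative choice."""
    edges = set(edges)
    nodes = sorted({v for e in edges for v in e})
    triangles = sum(
        1 for a, b, c in itertools.combinations(nodes, 3)
        if {(a, b), (a, c), (b, c)} <= edges
    )
    return (len(edges), triangles)

def ergm(nodes, theta):
    """Exact probabilities P_theta{Y = y}, enumerating every graph on `nodes`."""
    pool = dyads(nodes)
    weights = {}
    for r in range(len(pool) + 1):
        for edges in itertools.combinations(pool, r):
            u = stats(edges)
            weights[frozenset(edges)] = math.exp(sum(t * s for t, s in zip(theta, u)))
    z = sum(weights.values())  # normalizing constant exp(psi(theta))
    return {y: w / z for y, w in weights.items()}

theta = (-1.0, 0.8)  # illustrative (edge, triangle) parameters
full = ergm(range(4), theta)
sub_dyads = set(dyads((0, 1, 2)))

# Marginal distribution of the induced subgraph on {0, 1, 2}.
marginal = {}
for y, p in full.items():
    induced = frozenset(y & sub_dyads)
    marginal[induced] = marginal.get(induced, 0.0) + p

small = ergm((0, 1, 2), theta)
max_gap = max(abs(marginal[y] - small[y]) for y in small)
print(f"largest |marginal - ERGM on 3 nodes| = {max_gap:.4f}")
```

With a nonzero triangle parameter the gap is clearly nonzero; setting the triangle parameter to 0 (independent edges, a trivial specification) makes it vanish.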

In ‘regular’ statistical models with i.i.d. observations, omitting a random subset of the data is only a loss of information, not an assault on the model’s validity: the smaller data set is still an i.i.d. sample from the same distribution – the model marginalizes straightforwardly. Therefore, the same type of statistical analysis is still applicable.

For ERGMs this is not the case (trivial specifications excepted), which may be regarded as a quirk of the model.

Taking nodes out of the graph is an amputation.

But what can we say about marginal distributions of subgraphs?


When does the ERGM marginalize??

When does the induced subgraph on a subset N1 ⊂ N still have an ERG distribution?

Under the condition that the graph on N1 is disconnected from the rest.


Theorem. If $N = N_1 \cup N_2$, under the condition that there are no connections between $N_1$ and $N_2$, the induced graphs $Y_1|N_1$ and $Y_2|N_2$ are mutually independent, both having the ERG distribution

$$P_\theta\{Y_h|N_h = y_h\} = \exp\bigl(\theta' u(\tilde y_h) - \psi_h(\theta)\bigr),$$

where $\tilde y_h$ is the graph on $N$ obtained from $y$ by deleting all edges outside $N_h$.

This generalizes to multiple disconnected subgraphs, and to more restrictive conditions (implying disconnection).
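The theorem can be checked numerically on a small case. This is a sketch under assumptions that are not in the slides: five nodes split into N1 = {0, 1, 2} and N2 = {3, 4}, a subgraph-count statistic u(y) = (edges, two-stars, triangles), and arbitrary parameter values. Conditioning on "no edges between N1 and N2" is done by restricting the enumeration to within-block dyads and renormalizing.

```python
import itertools
import math

def dyads(nodes):
    """All unordered node pairs (a, b) with a < b."""
    return list(itertools.combinations(sorted(nodes), 2))

def stats(edges, nodes):
    """u(y) = (edges, two-stars, triangles) of an undirected graph on `nodes`."""
    edges = set(edges)
    degree = {v: 0 for v in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    two_stars = sum(d * (d - 1) // 2 for d in degree.values())
    triangles = sum(
        1 for a, b, c in itertools.combinations(sorted(nodes), 3)
        if {(a, b), (a, c), (b, c)} <= edges
    )
    return (len(edges), two_stars, triangles)

def ergm(nodes, theta, allowed_dyads=None):
    """Exact ERGM on `nodes`; if `allowed_dyads` is given, condition on all other dyads being absent."""
    pool = dyads(nodes) if allowed_dyads is None else sorted(allowed_dyads)
    weights = {}
    for r in range(len(pool) + 1):
        for edges in itertools.combinations(pool, r):
            u = stats(edges, nodes)
            weights[frozenset(edges)] = math.exp(sum(t * s for t, s in zip(theta, u)))
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}

theta = (-0.5, 0.1, 0.6)  # illustrative values
n1, n2 = (0, 1, 2), (3, 4)
d1, d2 = set(dyads(n1)), set(dyads(n2))

# Distribution of Y on all five nodes, given no edges between N1 and N2.
conditional = ergm(n1 + n2, theta, allowed_dyads=d1 | d2)
# ERGMs with the same theta on each block separately.
p1, p2 = ergm(n1, theta), ergm(n2, theta)

max_gap = max(
    abs(p - p1[frozenset(y & d1)] * p2[frozenset(y & d2)])
    for y, p in conditional.items()
)
print(f"largest deviation from the product form: {max_gap:.2e}")
```

The deviation is zero up to floating-point error, so the conditional distribution factorizes into the two block ERGMs, as the theorem states.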

But there is a condition: exclude action at a distance

Definition. The function $u(y)$ is component separable if, for any partition $N = \cup_{h=1}^{H} N_h$ with $N_h \cap N_k = \emptyset$ for $h \neq k$, and any graph $y$ that has no edges between $N_h$ and $N_k$ for any $h \neq k$, $u(y)$ can be written as

$$u(y) = \sum_{h=1}^{H} u(\tilde y_h) + u_d ,$$

where $\tilde y_h$ is the graph on $N$ obtained from $y$ by deleting all edges outside $N_h$.

For ERGMs, component separability is equivalent to disconnected induced subgraphs being independent.

Examples of ERGMs that are not component separable:

1. $u(y) = \sqrt{\sum_{ij} y_{ij}}$. Nonlinear functions of subgraph counts.

2. $u(y) = \frac{1}{8} \sum_{i,j,h,k \,:\, \{i,j\} \cap \{h,k\} = \emptyset} y_{ij}\, y_{hk}$, the count of subgraphs on four points, composed of only two disconnected edges.
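A small check makes the two counterexamples concrete. The sketch below computes the remainder u(y) minus the sum of u over the within-block graphs for two graphs whose blocks are mutually disconnected. The helper names and the example graphs are hypothetical. It also assumes that u_d in the definition is a term that does not vary with the edges inside the blocks, which the slide does not spell out.

```python
import itertools
import math

def edge_count(edges):
    return len(edges)

def sqrt_edge_count(edges):
    """Example 1: a nonlinear function of a subgraph count."""
    return math.sqrt(len(edges))

def disjoint_edge_pairs(edges):
    """Example 2: unordered pairs of edges sharing no node (the ordered sum divided by 8)."""
    return sum(
        1 for e, f in itertools.combinations(sorted(edges), 2)
        if not set(e) & set(f)
    )

def remainder(u, edges, blocks):
    """u(y) minus the sum of u over the graphs restricted to each block."""
    parts = [{e for e in edges if set(e) <= block} for block in blocks]
    return u(edges) - sum(u(part) for part in parts)

blocks = [{0, 1, 2}, {3, 4, 5}]
y_a = {(0, 1), (3, 4)}                  # one edge in each block
y_b = {(0, 1), (1, 2), (3, 4), (4, 5)}  # a two-path in each block

for u in (edge_count, sqrt_edge_count, disjoint_edge_pairs):
    print(u.__name__, remainder(u, y_a, blocks), remainder(u, y_b, blocks))
```

For the plain edge count the remainder is zero for both graphs. For the other two statistics it changes from one graph to the other, so what happens inside one block alters the contribution of the other block: action at a distance.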

An aside ...

In specifications of ERGMs, use only statistics that are linear combinations of subgraph counts (with weights depending on covariates) unless you are willing to permit action at a distance.


Theorem. For a component separable ERGM with sufficient statistic $u(y)$, and $N_1, \ldots, N_H$ a partition of the node set, let $A_0$ be the event that the subsets $N_h$ all are mutually disconnected, and $A_h$ be an event referring only to the induced graph $Y|N_h$. Then conditional on $A_0 \cap A_1 \cap \ldots \cap A_H$, the subgraphs $Y|N_h$ for $h = 1, \ldots, H$ are independent, and

$$P_\theta\{Y|N_h = y_h\} = \begin{cases} \exp\bigl(\theta' u(\tilde y_h) - \psi_h(\theta)\bigr) & \text{if } y_h \text{ satisfies } A_h \\ 0 & \text{otherwise.} \end{cases}$$

The theorem (in its extended form) has a number of corollaries. Suppose that Y follows an ERGM.

1. The graph on a subset of nodes, disconnected from the rest, follows again an ERGM with the "same" specification.

2. A connected graph Y1 on a subset of nodes, disconnected from the rest, follows again an ERGM with the "same" specification, under the additional condition of Y1 being connected.


3. Suppose we begin a snowball sample with node set B. The saturated snowball generated by B follows an ERGM with the same specification, under the additional assumption that all nodes are reachable from B. (Cf. Doreian and Woodard, 1994, for network delineation by a snowball sample.)

4. The graph without its isolates again follows an ERGM with the same specification, under the additional restriction that there are no isolates.

5. The small components:

(Add Health Study, Moody et al.)

The small connected components again follow the same ERGM distribution, under the condition of being small connected components, and here only the low-order subgraph counts play a role.

For example, consider the disconnected triads:


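Corollary. Let $N_0$ be the number of isolated 3-node connected subgraphs; these are isolated two-paths or isolated triangles. In the original ERGM denote the coefficient of the number of edges by $\theta_E$, the coefficient of the number of two-stars by $\theta_{S2}$, and the coefficient of the number of triangles by $\theta_T$. Then, conditional on $N_0$, the number of isolated triangles has a binomial distribution with binomial denominator $N_0$ and probability parameter

$$\frac{\exp(\theta_E + 2\theta_{S2} + \theta_T)}{3 + \exp(\theta_E + 2\theta_{S2} + \theta_T)} .$$

The 3 in the denominator counts the three labelled two-paths on a given node triple, against a single labelled triangle; a triangle has one edge, two two-stars and one triangle more than a two-path.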
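The exponent in the corollary can be traced back to the subgraph counts of the two possible shapes. The sketch below is an illustration only, with hypothetical helper names. It lists the connected graphs on three labelled nodes together with their (edges, two-stars, triangles) counts and takes the difference between the triangle and a two-path.

```python
import itertools

nodes = (0, 1, 2)
all_dyads = list(itertools.combinations(nodes, 2))

def stats(edges):
    """(edges, two-stars, triangles) for a graph on the three nodes."""
    degree = {v: 0 for v in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    two_stars = sum(d * (d - 1) // 2 for d in degree.values())
    triangles = 1 if len(edges) == 3 else 0
    return (len(edges), two_stars, triangles)

# A graph on three nodes is connected exactly when it has at least two edges.
connected = [
    edges for r in (2, 3) for edges in itertools.combinations(all_dyads, r)
]
for edges in connected:
    print(edges, stats(edges))

two_path = stats(connected[0])
triangle = stats(connected[-1])
difference = tuple(t - p for t, p in zip(triangle, two_path))
print("triangle minus two-path:", difference)
```

The difference in counts, weighted by the coefficients of edges, two-stars and triangles, gives the log-weight ratio between an isolated triangle and one isolated two-path.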

How this works in practice was studied in a simulation study.

Simulation design: non-directed network with n = 100 nodes.

Parameters:
edges               –3.5
alt. k-stars         0.2
alt. 2-paths        –0.4
alt. k-triangles     1.5

This yields average degrees ∼ 2.1.

200 replications. For each replication, one network was generated; the parameters were estimated for this network, and for the largest connected component (‘saturated snowball’), if larger than 50, under the condition of connectedness. Extra requirements for Metropolis-Hastings steps.

Expectations:

1. Parameter estimates approximately unbiased in either case.

2. Larger standard errors for saturated snowball.

3. Larger standard errors for smaller saturated snowballs.

4. Type-I error rates approximately correct.
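The extra requirement for Metropolis-Hastings steps under the connectedness condition can be sketched as follows. This is not the estimation software used in the study. It is a minimal toy sampler with assumed names (`sample_connected`, `log_weight`), a simplified statistic u(y) = (edges, triangles) in place of the alternating statistics, and a small n. Each step toggles one dyad and rejects any proposal that would disconnect the graph.

```python
import itertools
import math
import random

def is_connected(n, edges):
    """Depth-first search from node 0 over the undirected edge set."""
    neighbours = {v: set() for v in range(n)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

def log_weight(n, edges, theta):
    """theta' u(y) with the simplified, assumed u(y) = (edges, triangles)."""
    triangles = sum(
        1 for a, b, c in itertools.combinations(range(n), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )
    return theta[0] * len(edges) + theta[1] * triangles

def sample_connected(n, theta, steps, rng):
    """Metropolis-Hastings restricted to connected graphs; one dyad toggled per step."""
    all_dyads = list(itertools.combinations(range(n), 2))
    edges = set(all_dyads)  # start from the complete graph, which is connected
    current = log_weight(n, edges, theta)
    for _ in range(steps):
        proposal = edges ^ {rng.choice(all_dyads)}
        # Extra requirement: a disconnected proposal has conditional
        # probability zero, so it is always rejected.
        if not is_connected(n, proposal):
            continue
        candidate = log_weight(n, proposal, theta)
        if candidate >= current or rng.random() < math.exp(candidate - current):
            edges, current = proposal, candidate
    return edges

rng = random.Random(1)
y = sample_connected(8, (-1.0, 0.5), 2000, rng)
print(len(y), "edges; connected:", is_connected(8, y))
```

Because the toggle proposal is symmetric, rejecting disconnecting proposals is all that is needed for the chain to target the ERGM conditioned on connectedness.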


Of the 400 estimations, 28 had t-ratios for convergence > 0.15. These were discarded.

Distribution of sizes of giant components


Parameter estimates: degree; with fitted means.

Edge parameter = –3.5; downward bias


Parameter estimates: alternating k-stars; with fitted means.

Alt. k-star parameter = 0.2


Parameter estimates: alternating two-paths; with fitted means.

Alt. two-paths parameter = –0.4


Parameter estimates: alternating k-triangles; with fitted means.

Alt. k-triangles parameter = 1.5; upward bias for snowballs


Mean absolute errors: degree; with fitted means.


Mean absolute errors: alternating k-stars; with fitted means.


Mean absolute errors: alternating two-paths; with fitted means.


Mean absolute errors: alternating k-triangles; with fitted means.

Conclusions from simulations

Estimates close to unbiased for total and snowball data; slight biases for degree and alternating k-triangle parameters.

Standard errors slightly higher for snowballs, but mainly for degree and alternating k-triangle parameters.

Standard errors for snowballs not clearly dependent on size of largest component.

Empirical type-I error rates for tests of true hypotheses ranged from 0.02 to 0.04.

Therefore, this small simulation study is supportive of the theoretical results.

Conclusion

Support for theoretical consistency of the ERGM.

Marginalization & independence for mutually disconnected subgraphs is intuitive, once you think of it.

To avoid action at a distance: restrict statistics to subgraph counts (with attribute weights).

Network delineation by saturated snowball sample OK.

Extra conditions must be imposed in Metropolis-Hastings steps for drawing from ERGMs.

Possibilities for homogeneity testing in ERGMs; are parameters the same in different components?
