Jul 15, 2017 - Infinite-Population Random Graph Inference. M. Schweinberger. P. N. Krivitsky. C. T. Butts. Abstract. An important problem in the statistical ...
arXiv:1707.04800v1 [stat.ME] 15 Jul 2017
Foundations of Finite-, Super-, and Infinite-Population Random Graph Inference M. Schweinberger
P. N. Krivitsky
C. T. Butts
Abstract An important problem in the statistical analysis of network data is that network data are non-standard data and therefore the meaning of core statistical notions, such as sample and population, is not obvious. All too often, the meaning of such core notions has been left implicit, which has led to considerable confusion. Starting from first principles, we build a statistical framework encompassing a wide range of inference scenarios and distinguish the graph generating process from the observation process. We discuss inference for graphs of fixed size, including finite- and super-population inference, and inference for sequences of graphs of increasing size. We review invariance properties of sequences of graphs of increasing size, including invariance to the labeling of nodes, invariance of expected degrees of nodes, and projectivity, and discuss implications in terms of inference. We conclude with consistency and asymptotic normality results for estimators in finite-, super-, and infinite-population inference scenarios. Keywords: social networks; random graphs; exponential-family random graph models; ERGMs; projective families
Contents 1 Introduction 1.1 Topics not covered . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 5
2 Models 2.1 Reference measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Sufficient statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Parameterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6 7 7 8
1
3 Complete- and incomplete-data generating process 3.1 Complete-data generating process . . . . . . . . . . . . . . . . . . . 3.1.1 Finite graphs: finite-population inference . . . . . . . . . . . 3.1.2 Finite graphs: superpopulation inference . . . . . . . . . . . 3.1.3 Sequences of graphs: infinite-population inference . . . . . . 3.1.4 Sequences of graphs with size-dependent parameterizations . 3.1.5 Sequences of graphs with node-dependent parameterizations 3.1.6 Constructing sequences of graphs of increasing size . . . . . 3.2 Incomplete-data generating process . . . . . . . . . . . . . . . . . . 3.2.1 Sampling nodes: ego-centric sampling and link-tracing . . . 3.2.2 Sampling pairs of nodes: edge sampling . . . . . . . . . . . . 3.2.3 Sampling subgraphs . . . . . . . . . . . . . . . . . . . . . . 3.2.4 Missing data . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.5 Ignorable incomplete-data generating processes . . . . . . . 3.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3.1 Subgraph-to-graph inference problem . . . . . . . . . . . . . 3.3.2 The number of nodes “n” is not the sample size . . . . . . . 4 Invariance properties of sequences of random graphs 4.1 Invariance to labeling of nodes: exchangeable random graphs . 4.2 Invariance of expected degrees of nodes: sparse random graphs 4.3 Invariance in the form of projectivity . . . . . . . . . . . . . . 4.3.1 Implications of strong projectivity in terms of modeling 4.3.2 Implications of strong projectivity in terms of inference 5 Consistency and asymptotic normality of estimators 5.1 Finite-population inference . . . . . . . . . . . . . . . . . . . 5.2 Superpopulation inference: finite graph . . . . . . . . . . . . 5.3 Super- and infinite-population inference: sequences of graphs 5.3.1 Dyad-independence models . . . . . . . . . . . . . . . 5.3.2 Dyad-dependence models . . . . . . . . . . . . . . . . 6 Conclusion
1
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . . . . . . . . . . . . .
10 11 11 12 13 14 15 15 16 17 18 18 18 19 19 19 20
. . . . .
21 21 23 23 26 27
. . . . .
29 29 30 32 32 33 35
Introduction
The statistical analysis of network data is an emerging area of statistics (Kolaczyk, 2009), which has in common with the well-established field of spatial statistics that 2
data are dependent and structured (Ripley, 1988). In its simplest and most common form, network data can be represented by a graph consisting of a set of entities, called nodes, and a set of pairwise relationships, called edges. Examples of network data include contact networks arising in the study of infectious diseases (e.g., Ebola, HIV), protein-interaction and gene-interaction networks, insurgencies and terrorist networks, friendship networks, online social networks (e.g., Facebook, Twitter), and recommendation networks (e.g., Amazon). Stochastic models of networks have been known since the 1930s (Moreno & Jennings, 1938; Erd˝os & R´enyi, 1959; Gilbert, 1959), but random graph models of networks with complex structure did not emerge until the pioneering work of Holland & Leinhardt (1981) and Frank & Strauss (1986) in the 1980s. While important advances were made in the 1990s (e.g., Strauss & Ikeda, 1990; Wasserman & Pattison, 1996; Pattison & Wasserman, 1999), it was not until the advent of Markov chain Monte Carlo methods and other computational and statistical advances during the 2000s that these models became widely available (e.g., Snijders, 2002; Handcock et al., 2008; Wang et al., 2009; Snijders et al., 2006). Fueled by these statistical and computational innovations, the past decade has seen important advances in random graph models (Kolaczyk, 2009; Goldenberg et al., 2009; Fienberg, 2012; Hunter et al., 2012). The increasing application of random graph models, ranging from applications to studying the structure of the human brain (e.g., Simpson et al., 2012) to social networks (e.g., Lusher et al., 2013), has begun to establish a body of knowledge regarding the capabilities and pitfalls of the current generation of random graph models, and of the conceptual challenges that remain to be addressed. One such challenge is that network data are relational data—i.e., data beyond the attributes of individuals—and are hence non-standard data from a statistical point of view. As a result, the meaning of core statistical notions, such as “sample” and “population,” is not obvious. All too often, the meaning of such core notions has been left implicit in the seminal literature in the area (e.g., Holland & Leinhardt, 1981; Frank & Strauss, 1986; Snijders et al., 2006), which has led to considerable confusion. Some of the more recent literature has attempted to reduce this confusion, but has in some cases added to it. A case in point is the confusion surrounding the wellknown subgraph-to-graph inference problem discussed by Shalizi & Rinaldo (2013), which we henceforth abbreviate as SR. SR were interested in likelihood-based inference for a parameter θ of a population model PN,θ that generated a graph yN defined on a finite population of nodes N, given an observed subgraph yN′ of yN induced by a subset of nodes N′ ⊂ N. For example, yN may correspond to a network of collaborations among U.S. senators and yN′ may correspond to a subnetwork of collaborations 3
among a subset of U.S. senators N′ ⊂ N. Despite being interested in likelihood-based inference, SR considered statistical inference based on PN′ ,θ (YN′ = yN′ ), which is not proportional to the likelihood unless the population model and the sampling design satisfy additional conditions (Handcock & Gile, 2010; Koskinen et al., 2010). The misspecification of the likelihood may be rooted in the fact that SR neither specified the goal of statistical inference nor the complete- and incomplete-data generating process in the sense of Rubin (1976). Thus, owing to the misspecification of the likelihood, the results of SR are not applicable to likelihood-based superpopulation inference, i.e., the motivating example of SR. The confusion surrounding the subgraph-to-graph inference problem demonstrates the need for proper statistical language, specifying relevant assumptions regarding the graph generating and observation process and allowing a clear specification of the associated statistical problems. Here, starting from first principles, we introduce a statistical framework to describe such processes and accomplish the following aims: • We separate the complete-data generating process (i.e., the graph generating process) from the incomplete-data generating process (i.e., the observation process). • We distinguish statistical inference for graphs of fixed size, including finite- and super-population inference, and statistical inference for sequences of graphs of increasing size. • We discuss desirable forms of invariance of sequences of random graph models, including invariance to the labeling of nodes, invariance of expected degrees of nodes to network size, and projectivity, and discuss implications in terms of statistical inference. • We clarify one of the main sources of confusion in the literature, the role of projectivity in statistical modeling and inference. • We discuss consistency and asymptotic normality of estimators in finite-, super-, and infinite-population scenarios. In so doing, we show that when proper questions are asked, proper answers to substantive questions can be obtained. The paper is structured as follows. Section 2 introduces models. Section 3 discusses possible goals of statistical inference along with complete- and incomplete-data generating processes. Section 4 reviews desirable forms of invariance of sequences of random graph models. Section 5 discusses consistency and asymptotic normality of estimators.
4
1.1
Topics not covered
The statistical analysis of network data is too broad a field to cover all interesting topics in a single paper. Therefore, we focus here first and foremost on parametric random graph models without latent variables, i.e., we focus on Bernoulli random graphs (Erd˝os & R´enyi, 1960) and the more general exponential-family random graphs (Kolaczyk, 2009; Lusher et al., 2013; Harris, 2013). In other words, we do not discuss: • nonparametric models (e.g., Butts, 2007; Dekker et al., 2007). • latent variable models, such as stochastic block models (Nowicki & Snijders, 2001) and extensions (e.g., Airoldi et al., 2008), latent space models (Hoff et al., 2002) and extensions (e.g., Schweinberger & Snijders, 2003; Handcock et al., 2007; Sewell & Chen, 2015), and exponential-family random graphs with latent variables (Koskinen, 2009; Schweinberger & Handcock, 2015). • time-dependent random graphs (Snijders, 2001; Butts, 2008; Hanneke et al., 2010; Krivitsky & Handcock, 2014). Likewise, we do not discuss specification issues, except as needed to address core issues of the paper (e.g., invariance of sequences of random graph models). While the specification of random graph models raises important challenges, model specification and misspecification and the related topic of model degeneracy (Handcock, 2003; Schweinberger, 2011; Chatterjee & Diaconis, 2013) are complex issues requiring independent treatment. We do note the following: some models—but not all models—are known to be ill-posed: e.g., Strauss (1986) first observed that the edge and triangle model is near-degenerate in the sense that it places most probability mass on graphs with almost all edges and triangles when the triangle parameter is positive. Such illposed models have been studied by Jonasson (1999), Handcock (2003), Schweinberger (2011), Butts (2011), and Chatterjee & Diaconis (2013). Since statistical inference for ill-posed models is pointless (except in the small subset of the parameter space in which these models are well-behaved), we do not consider statistical inference for the edge and triangle model and other models that have been known to be ill-posed since the 1980s. There are many sensible alternatives, such as curved exponentialfamily random graphs with geometrically weighted model terms (Snijders et al., 2006; Hunter & Handcock, 2006; Hunter et al., 2008; Krivitsky, 2012). An introduction to such models can be found in, e.g., Goodreau et al. (2008) and Lusher et al. (2013). Other promising alternatives are exponential-family random graphs with local dependence (Schweinberger & Handcock, 2015), including exponential-family random graphs with multilevel structure, and nonparametric exponential-family random graphs 5
(Thiemichen & Kauermann, 2017).
2
Models
To introduce random graphs, let N be a set of nodes and DN ⊆ N × N be a set of pairs of nodes. The nodes i ∈ N may have one or more attributes xi ∈ Xi (e.g., race). We denote the attributes of nodes by xN ∈ XN . In addition to attributes of nodes, pairs of nodes (i, j) ∈ DN may be connected by edges, which are considered to be random variables Yi,j ∈ Yi,j and may take on values in a countable set Yi,j . We write YN = (Yi,j )(i,j)∈DN and YN = (i,j)∈DN Yi,j . The edges may be directed or undirected. A random graph is called undirected if Yi,j = Yj,i for all (i, j) ∈ DN with probability 1 and otherwise directed. Throughout, we consider undirected random graphs unless stated otherwise. It is convenient to represent random graph models—such as Bernoulli random graphs (Erd˝os & R´enyi, 1960) and more general random graph models—in exponentialfamily form (Barndorff-Nielsen, 1978; Brown, 1986). Throughout, we consider discrete exponential families of densities with respect to a reference measure ν with countable support YN , specified by a sufficient statistic s : XN × YN 7→ Rp and a map η : Θ × N 7→ Rp with Θ ⊆ {θ ∈ Rq : ψ(θ, N) < ∞}:
×
d PN,η(θ,N) (yN ) = exp{hη(θ, N), s(xN , yN )i − ψ(θ, N)}, yN ∈ YN , dν where hη(θ, N), s(xN , yN )i denotes the inner product of natural parameter η(θ, N) and sufficient statistic s(xN , yN ) and Z ψ(θ, N) = log exp{hη(θ, N), s(xN , yN′ )i} d ν(yN′ ). YN
We note that most of these quantities depend on the set of nodes N, but we suppress the dependence on N unless it is essential. One quantity whose dependence on N is essential is the natural parameter η(θ, N). The natural parameter η(θ, N) can depend on the number of nodes |N| (e.g., to capture sparsity of random graphs) or individual nodes (e.g., to capture heterogeneity across nodes), which we demonstrate in Section 2.3. Before doing so, we discuss the choice of reference measure (Section 2.1), sufficient statistic (Section 2.2), and parameterization (Section 2.3). Attributes: non-random or random. The attributes of nodes may either be exogenous, non-random (e.g., race) or endogenous, random (e.g., political preference), 6
governed by a joint probability model for both the random attributes and the random graph. Model extensions along these lines were considered by Fellows & Handcock (2012) in an exponential-family framework and have long been popular in the temporal network model literature (Snijders et al., 2007). Latent variable models of random attributes and random graphs were proposed by Fosdick & Hoff (2015). We do not consider them here, but the general framework described in Section 3 can be extended to cover both non-random and random attributes. Attributes: spatial or other contexts of nodes. In addition to attributes describing properties of nodes, attributes may describe contexts in which nodes are embedded. Spatial locations are an important example of such attributes: the space in question may be latent (Hoff et al., 2002), geographical (Butts & Acton, 2011), or constructed from social and demographic attributes (McPherson, 1983). More often than not, the probability of an edge is a decreasing function of the distance in the embedding space, the form of which has considerable impact on the structure of the random graph (Butts & Acton, 2011). An important feature of spatially embedded networks is that network growth is often associated with increased spatial dispersion. As a result, spatial models with reasonable assumptions induce sparsity (Butts et al., 2012), which is an important property of random graphs discussed in Section 2.3.
2.1
Reference measure
|N| For binary random graphs with YN = {0, 1}( 2 ) , a natural choice of the reference measure ν is counting measure on YN , though there are situations where one may choose other reference measures: e.g., to model sparse random graphs, one may specify reference measures that assign more weight to sparse graphs than dense graphs (Krivitsky et al., 2011; Krivitsky & Kolaczyk, 2015; Butts & Almquist, 2015). For non-binary random graphs, counting measure may not be the most natural reference measure. A discussion of reference measures for non-binary, network count data can be found in Krivitsky (2012).
2.2
Sufficient statistic
In classical statistics, it is common to specify models by first choosing an exponential family (e.g., Gaussians) and then deducing the sufficient statistics of the exponential family. In statistical network analysis (Wasserman & Faust, 1994; Lusher et al., 2013), it is more common to reverse these steps by first choosing sufficient statis7
tics that capture interesting features of graphs (e.g., the number of edges) and then base statistical inference on the exponential family generated by the chosen sufficient statistics. Model construction along these lines takes advantage of the maximum entropy property of exponential families (Barndorff-Nielsen, 1978; Geyer & Thompson, 1992; Handcock, 2003). In practice, the flexibility of specifying models by specifying sufficient statistics has made them popular among network scientists (Lusher et al., 2013). Examples of popular sufficient statistics are functions of attributes of nodes and edges; the degrees of nodes, i.e., the numbers of edges of nodes; k-stars to model brokerage in networks; k-cycles and k-triangles to model cyclical and transitive closure in networks; and countless others (e.g., Morris et al., 2008). Some examples can be found in Section 2.3.
2.3
Parameterization
There are many parameterizations of random graph models. Here, we give selected examples of parameterizations: • Dense Bernoulli(π) random graphs (Erd˝os & R´enyi, 1960) with the number of P edges i∈N < j∈N yi,j as sufficient statistic and natural parameter η(θ, N) = θ,
which capture the overall propensity to form edges in dense graphs, i.e., random graphs with expected number of edges of order |N|2 . Dense Bernoulli(π) random graphs assume that edges are independent Bernoulli(π) random variables, where π = logit−1 (θ) denotes the probability of an edge and the natural parameter η(θ, N) = θ = logit(π) is the log odds of π. • Sparse Bernoulli(π|N| ) random graphs (Erd˝os & R´enyi, 1960; Krivitsky et al., P 2011) with the number of edges i∈N < j∈N yi,j as sufficient statistic and natural parameter η(θ, N) = θ − log |N|, which capture the overall propensity to form edges in sparse graphs, i.e., random graphs with expected number of edges of order |N|. Here, π|N| = logit−1 (θ − log |N|) denotes the probability of an edge, which is a function of a size-invariant parameter θ and a size-dependent offset log |N|. P • β-models (Diaconis et al., 2011) with the degrees j∈N, j6=i yi,j of nodes i ∈ N as sufficient statistics and natural parameters ηi,j (θ, {i, j}) = θi + θj ,
i ∈ N,
j ∈ N,
which capture the propensities of nodes i and j to form edges. 8
• Curved exponential-family random graphs (Snijders et al., 2006; Hunter & Handcock, 2006; Hunter, 2007) with the numP ber of edges i∈N < j∈N yi,j and the numbers of pairs of nodes with k = 1, . . . , |N|− P P 2 edgewise shared partners i∈N < j∈N yi,j I( h∈N, h6=i,j yi,h yj,h = k) as sufficient statistics and natural parameters η1 (θ, N) = θ1 η1+k (θ, N) = θ2 exp(α) [1 − (1 − exp(−α))k ], k = 1, . . . , |N| − 2, P which capture transitive closure; here, I( h∈N, h6=i,j yi,h yj,h = k) is an indicator P function that is 1 if h∈N, h6=i,j yi,h yj,h = k and is 0 otherwise. Such models were proposed by Snijders et al. (2006), Hunter & Handcock (2006), and Hunter (2007) and are better suited to capturing transitive closure than models with triangle terms (e.g., Hunter et al., 2008). • Canonical exponential-family random graphs (Hunter et al., 2012) with the P number of edges and transitive edges i∈N < j∈N yi,j P P i∈N < j∈N yi,j I( h∈N, h6=i,j yi,h yj,h ≥ 1) as sufficient statistics and natural parameters η1 (θ, N) = θ1 , η2 (θ, N) = θ2 , which capture transitive closure. Such models are special cases of curved exponential-family random graphs with edge and geometrically weighted edgewise shared partner terms and α = 0. These models are likewise better behaved than models with edge and triangle terms (Hunter et al., 2012). All of the described models can be represented as exponential families of densities with |N| respect to counting measure on YN = {0, 1}( 2 ) . We note that in the case of sparse Bernoulli(π|N| ) random graphs, the offset log |N| can be absorbed into the reference measure, so that the resulting reference measure assigns more weight to sparse graphs than dense graphs (Krivitsky et al., 2011). More flexible reference measures for sparse random graphs are described by Butts & Almquist (2015). There are countless other models—indeed, the flexibility of the exponential-family random graph framework is one of its greatest advantages. While reviewing the full range of models is impossible, it is important to stress two observations: first, η(θ, N) may be a linear or non-linear function of a lower-dimensional parameter θ; and, second, η(θ, N) may depend on the set of nodes N. Indeed, there are good reasons why η(θ, N) should depend on N, as we explain in Section 3.1.4 (size-dependent parameterizations), Section 3.1.5 (node-dependent parameterizations), and Section 4.2 (invariance of expected degrees of nodes to network size). 9
3
Complete- and incomplete-data generating process
The subgraph-to-graph inference problem discussed in Section 1 demonstrates the confusion that can arise when the goal of statistical inference and the complete- and incomplete-data generating process are not specified. To reduce the confusion, we follow the principled approach of Rubin (1976) and distinguish the complete-data generating process (generating the population graph) from the incomplete-data generating process (determining which parts of the population graph are observed). A failure to take both of these processes into account can lead to unwarranted conclusions, as discussed by Rubin (1976), Dawid & Dickey (1977), Handcock & Gile (2010), and Koskinen et al. (2010). We discuss completeand incomplete-data generating processes in Sections 3.1 and 3.2, respectively. We demonstrate in Section 3.3 that the resulting statistical framework helps clarify statistical issues, including the confusion surrounding the subgraph-to-graph inference problem. The specification of the complete-data generating process serves at least two additional purposes. First, the parameters of the complete-data generating process constitute the natural target of statistical inference. Second, the population graph or superpopulation of population graphs generated by the complete-data generating process is the population or superpopulation to which statistical inferences generalize. In addition, the complete-data generating process is coupled with the goal of statistical inference. We distinguish three broad goals of statistical inference: finite-, super-, and infinite-population inference, which we define below. These notions are inspired by the corresponding notions in classical statistics (e.g., Hartley & Sielken, 1975). We adapt them here to the statistical analysis of network data. Definition. Finite-population inference is concerned with a finite population of nodes N and a population graph yN defined on N. It does not assume that the population graph was generated by a population model. The goal is to estimate functions of the population graph, such as the number of edges in the population graph. Definition. Superpopulation inference is concerned with a finite population of nodes N and a population graph yN defined on N. In contrast to finite-population inference, it is assumes that the population graph was generated by a population model. The goal is to estimate the parameters of the population model. Definition. Infinite-population inference is concerned with an infinite pop10
ulation of nodes N and a population graph yN′ defined on N′ ⊂ N. The goal is to estimate the parameters of the population model.
3.1
Complete-data generating process
The complete-data generating process is the process that generates the complete data, i.e., the population graph of interest. It is possible to make no assumptions about the complete-data generating process, leading to finite-population inference (Section 3.1.1). If the process that generates graphs is of substantive interest, one may specify a superpopulation. The specification of a superpopulation may assume that the size(s) of the graph(s) are either fixed or limited to a finite range of possible sizes, leading to superpopulation inference on models of graphs of the same size or similar sizes (Section 3.1.2). An alternative is to make assumptions about how the model behaves as the size and composition of the set of nodes N changes, leading to infinite-population inference on models of sequences of graphs of increasing size (Section 3.1.3). We discuss these cases in turn. 3.1.1
Finite graphs: finite-population inference
In some applications, it is neither necessary nor desirable to make assumptions about the complete-data generating process. An example is a network of sexual relationships between HIV-infected residents and non-infected residents of New York City, where the goal is to estimate the number of sexual contacts between HIV-infected and noninfected residents. The population of interest N consists of the residents of New York City and the population graph yN consists of sexual relationships between residents of New York City. If the whole population graph yN is observed, the population graph can be used to answer the question of interest by counting the number of sexual relationships between HIV-infected and non-infected residents of New York City. If it is not possible to observe the whole population graph yN but a sample of sexual relationships is generated as discussed in Section 3.2, then the sample can be used to construct an estimator of the number of sexual relationships between HIVinfected and non-infected residents of New York City. But, regardless of whether the whole population graph yN is observed, answering the question of interest does not require any assumption about the complete-data generating process. In such situations, finite-population inference is all that is needed to answer the question of interest.
11
Target of statistical inference. In finite-population inference, any function of the population graph yN is a legitimate target of statistical inference: e.g., in the sexual network example described above, researchers may be interested in estimating the number of sexual relationships between HIV-infected and non-infected residents of New York City. Other possible targets of statistical inference include the number of edges, the degree distribution, and the clustering coefficient of population graph yN . Here, model-based inference is neither necessary nor desirable and design-based inference is all that is needed (Kurant et al., 2012; Gjoka et al., 2014, 2015). A special case in which finite-population inference connects with parametric random graph models is treated by Krivitsky & Morris (2017). Let s(xN , yN ) be a function of attributes of population members xN and the population graph yN of interest. Suppose that it is desired to obtain graphs that are similar to the population graph. To do so, one can exploit properties of exponential families as follows. Define def
θ(xN , yN ) = arg max exp{hη(θ ′, N), s(xN , yN )i − ψ(θ ′ , N)} θ′ ∈Θ
and note that the maximizer θ(xN , yN ) exists as long as s(xN , yN ) falls into the interior of the convex hull of the set {s(xN , yN ) : yN ∈ YN } (Barndorff-Nielsen, 1978, p. 151). The function θ(xN , yN ) is a function of the attributes of population members xN and the population graph yN and is hence a legitimate target of finite-population inference. We note that the maximizer θ(xN , yN ) is equivalent to the maximum likelihood estimate, but θ(xN , yN ) is not random, because neither xN nor yN are random. In fact, if the whole population graph yN is observed, then the maximizer can in principle be computed without error, though in practice one may have to approximate the maximizer by using Monte Carlo maximum likelihood estimates as described by Krivitsky & Morris (2017). The function θ(xN , yN ) is of interest, because it can be used to simulate graphs that are similar to the population graph: by well-known exponential-family properties (Brown, 1986, Theorem 5.5, p. 148), the expected sufficient statistic s(xN , YN ) matches the sufficient statistic s(xN , yN ) of the population N under θ(xN , yN ). Thus, simulated graphs will be similar to the population graph yN . 3.1.2
Finite graphs: superpopulation inference
While in some applications, it may neither be necessary nor desirable to make assumptions about the complete-data generating process, in other applications the completedata generating process is of substantive interest. An example is given by scientists who are interested in the underlying process that generates populations of graphs: 12
e.g., sociologists studying friendship and bullying networks in schools may wish to gain insights about the social forces governing these networks, intended to be predictive of networks in the same or similar social settings. Here, the interest centers on a population model PN,η(θ,N) that generates finite graphs of the same size or similar sizes, without postulating a model for network growth. Target of statistical inference. In superpopulation inference, the target of statistical inference is the parameter θ of the population model PN,η(θ,N) that generated the population graph yN and governs the superpopulation consisting of all possible population graphs of the same size or a finite range of possible sizes. We note that even when the whole population graph is observed, uncertainty arises from the fact that the parameter θ is unknown. 3.1.3
Sequences of graphs: infinite-population inference
In both statistical practice and theory, it is sometimes convenient to consider sequences of graphs of increasing size. In many such situations, there is an explicit or implicit assumption that there exists a graph limit—i.e., an infinite graph defined on an infinite population—to which sequences of graphs converge (Lov´asz, 2012). We therefore refer to statistical inference based on sequences of graphs of increasing size as infinite-population inference, despite the fact that researchers in practice may be more interested in subsequences of graphs of finite sizes rather than the graph limit itself. In statistical practice, sequences of graphs of increasing size may be meaningful when, e.g., one observes two or more graphs of different sizes and wishes to formulate a model that is invariant in a well-defined sense, e.g., invariant in the sense that the expected degrees of nodes do not depend on network size (Krivitsky et al., 2011). An example are relationships constrained by geographical distance: e.g., consider exercise networks in New York City and Seattle, where an edge is said to exist if two residents meet at least twice a month to workout together. While New York City has more than 10 times as many residents as Seattle, it is not credible that the expected number of workout partners of New York City residents is more than 10 times larger than the expected number of workout partners of Seattle residents. In such situations, it is convenient to formulate a model of sequences of graphs of increasing size such that the expected degrees of nodes are invariant to network size and consider the two observed graphs—the large New York City exercise network and the small Seattle exercise network—as two observations taken from a sequence of graphs generated by
13
the model (Krivitsky et al., 2011). In statistical theory, it is convenient to embed observed data (e.g., an observed graph) into a sequence of data sets of increasing size (e.g., a sequence of graphs of increasing size), which is a classic approach in statistical theory: e.g., Lehmann (1999) suggested “...to embed the actual situation in a sequence of situations, the limit of which serves as the desired approximation” (Lehmann, 1999, p. 1). Sequences of graphs of increasing size can be constructed in many ways, e.g., graphs can grow by adding nodes or subsets of nodes along with edges. To cover a wide range of sequences of graphs of increasing size, including cumulative and noncumulative sequences, let A1 , A2 , . . . be a sequence of sets of nodes and N1 , N2 , . . . S be a sequence of sets of nodes satisfying Nk ⊆ kl=1 Al . Suppose that the sequence of random graphs YN1 , YN2 , . . . is generated by a sequence of models of the form PN1 ,η(θ,N1 ) , PN2 ,η(θ,N2 ) , . . . , where the natural parameter η(θ, Nk ) may depend on the set of nodes Nk and the dimension of parameter θ may grow with the size |Nk | of Nk . Then the generating processes can be described by a sequence of the form (N1 , xN1 , YN1 , PN1 ,η(θ,N1 ) ), (N2 , xN2 , YN2 , PN2 ,η(θ,N2 ) ), . . . Such sequences cover a wide range of generating processes. While an exhaustive discussion of all possible generating processes is impossible, we do wish to emphasize an important point: when modeling a sequence of random graphs of increasing size, the natural parameter η(θ, N) should not, in general, be constant, as assumed by SR and others, but may have to depend on the size and composition of N. We discuss sizedependent parameterizations in Section 3.1.4 and node-dependent parameterizations in Section 3.1.5. We conclude with some more detailed comments on how sequences of graphs of increasing size can be constructed (Section 3.1.6). Target of statistical inference. In infinite-population inference, the target of statistical inference is the parameter θ; note that θ may not be the natural parameter and that the dimension of θ may be infinite, because it may depend on the number of nodes, as it does in the case of the β-models described in Section 2.3. 3.1.4
Sequences of graphs with size-dependent parameterizations
Size-dependent parameterizations are important for at least two reasons. First, many real-world networks are sparse, because many real-world settings constrain the number of edges of nodes (Krivitsky et al., 2011). That suggests that 14
the expected number of edges is much smaller than the number of possible edges |N| . As an example, the Bernoulli(π) random graphs of Erd˝os & R´enyi (1960) as2 sume that edges Yi,j are independent and identically distributed Bernoulli(π) random variables, which implies that the expected number of edges is |N| π. Sparsity im2 |N| |N| plies that 2 π ≪ 2 , which in turn implies that π ≡ π|N| must depend on the size |N| of N and must satisfy π|N| → 0 as |N| → ∞. Hence the natural parameter η(θ, N) = logit(π|N| ) of sparse Bernoulli(π|N| ) random graphs and other sparse random graphs depends on |N|. Second, when considering a sequence of random graph models PN1 ,η(θ,N1 ) , PN2 ,(θ,N2 ) , . . . , it is natural to impose some form of invariance on the sequence of random graph models. One desirable form of invariance is invariance of the expected degrees of nodes to network size: e.g., in the New York City–Seattle exercise network described in Section 3.1.3, it is reasonable to assume that the expected degrees are invariant to network size and hence do not grow with network size. If the expected degrees of nodes are invariant to network size, then the probability of an edge depends on |N| (Krivitsky et al., 2011; Krivitsky & Kolaczyk, 2015; Butts & Almquist, 2015). We discuss invariance of the expected degrees of nodes in Section 4.2. 3.1.5
Sequences of graphs with node-dependent parameterizations
To capture heterogeneity in the propensities of nodes to form edges, the parameters of models may have to depend on nodes. One example are β-models and p1 -models (Holland & Leinhardt, 1981; Krivitsky & Kolaczyk, 2015; Yan et al., 2016a, 2015, 2016b). Both classes of models have node-dependent natural parameters η(θ, N): e.g., the β-models described in Section 2.3 have natural parameters ηi,j (θ, {i, j}) = θi + θj , where θi and θj can be interpreted as the propensities of nodes i ∈ N and j ∈ N to form edges. 3.1.6
Constructing sequences of graphs of increasing size
We provide here more details on how sequences of graphs of increasing size can be constructed. The construction of sequences of graphs is important, because in some applications it is more natural to consider sequences of graphs that increase by adding subsets of nodes rather than single nodes. An example are random graphs with multilevel structure as described in Section 5.2. There are many possible constructions, but the following examples cover some of the most interesting ones: • The subsets A1 , A2 , . . . may consist of single nodes, i.e., the sequence of sets of nodes N1 , N2 , . . . grows by adding nodes one by one. 15
• The subsets A1 , A2 , . . . may consist of more than one node and are of the same size, i.e., the sequence of sets of nodes N1 , N2 , . . . grows by adding subsets of nodes of the same size. • The subsets A1 , A2 , . . . may consist of more than one node and the sizes are of the same order of magnitude, i.e., the sequence of sets of nodes N1 , N2 , . . . grows by adding subsets of nodes of similar sizes. • The sizes of subsets of a finite subsequence of K subsets A1 , . . . , AK may grow as K grows, i.e., when more subsets of nodes are added, the existing subsets of nodes can grow along with the added subsets (e.g., schools facing surging demand may add more school classes and at the same time increase the sizes of all school classes). • The subsets A1 , A2 , . . . may or may not overlap. S • Cumulative processes: Nk = kl=1 Al corresponds to cumulative processes, i.e., all members of Nk−1 are members of Nk . S • Non-cumulative processes: Nk ⊂ kl=1 Al corresponds to non-cumulative processes, i.e., not all members of Nk−1 may be members of Nk , which means that some nodes leave the set of nodes when others are added. In Section 5, we discuss consistency results based on sequences of graphs of increasing size. In particular, Theorems 1 and 4 in Sections 5.2 and 5.3.2 are based on sequences of graphs that grow by adding subsets of nodes of the same size or similar sizes, whereas Theorems 2 and 3 in Section 5.3.1 are based on sequences of graphs that grow by adding nodes.
3.2
Incomplete-data generating process
The incomplete-data generating process is the process that, conditional on the population graph generated by the complete-data generating process, determines which parts of the population graph are observed. In the best-case scenario, the whole population graph is observed, but in more common scenarios, some of the edges in the population graph are unobserved. The two most common reasons for incomplete data are sampling and missing data. We discuss selected incomplete-data generating processes, with an emphasis on sampling designs (Sections 3.2.1, 3.2.2, and 3.2.3) and missing data (Section 3.2.4). We conclude with some comments on the fundamental concept of ignorability of incomplete-data generating processes for the purpose of likelihood-based super- and infinite-population inference (Section 3.2.5).
16
3.2.1
Sampling nodes: ego-centric sampling and link-tracing
If a population of nodes N is large, it may not be possible to observe the whole population graph yN . A popular solution is to sample edges by using ego-centric sampling (Krivitsky & Morris, 2017) or link-tracing (Thompson & Frank, 2000; Handcock & Gile, 2010). Both sample a subset of nodes N′ ⊆ N and record edges from nodes in N′ to nodes in N′ and from nodes in N′ to nodes in N \ N′ . An ego-centric sampling design generates a sample of nodes along with edges as follows (Krivitsky & Morris, 2017): 1. Generate a probability sample of nodes, called egos. 2. For each sampled ego, record edges to connected nodes, called alters. A probability sample of nodes can be generated by any sampling design for sampling from finite populations (e.g., Thompson, 2012). A number of variations of ego-centric sampling designs are possible. First, some ego-centric sampling designs identify alters, so that it is known whether two egos nominated the same alter. Second, other ego-centric sampling designs ask egos to report which pairs of alters have edges (Smith et al., 1972–2016). Third, an important extension of ego-centric sampling is link-tracing. Link-tracing exploits the observed edges of sampled nodes to include additional nodes into the sample provided the identities of the egos and alters of sampled nodes are known. One specific form of k-wave link-tracing, called a breadth-first search design, samples nodes and edges as follows (Thompson & Frank, 2000; Handcock & Gile, 2010): 1. Wave l = 0: Generate an egocentric sample. 2. Wave l = 1, . . . , k: (a) Add the nodes who are linked to the population members of wave l − 1 to the sample. (b) For each added node, record edges. Egocentric sampling can be considered to be a special case of k-wave link-tracing with k = 0. In general, breadth-first search designs and a related class of sampling designs, respondent-driven sampling (Gile & Handcock, 2010; Gile, 2011), do not generate probability samples, but often approximate probability samples when suitably adjusted (Kurant et al., 2011). Other link-tracing designs (e.g., random walk sampling, reweighted and stratified random walk sampling, multigraph sampling) do converge to probability samples and are preferred when available (Gjoka et al., 2011).
17
3.2.2
Sampling pairs of nodes: edge sampling
While ego-centric sampling and link-tracing sample edges indirectly by first sampling nodes and then recording edges of sampled nodes, one can sample edges directly. One example is a sampling design that samples spouses from a list of spouses, i.e., which samples pairs of nodes connected by an edge (here, marriage). A theoretical treatment of edge sampling can be found Crane & Dempsey (2015). 3.2.3
Sampling subgraphs
An alternative approach is based on sampling a subset of nodes N′ ⊆ N and collecting information about the whole subgraph yN′ of yN induced by N′ ⊆ N. Sampling subgraphs is distinct from ego-centric sampling and link-tracing, because subgraph sampling collects information about all edges among nodes in N′ but does not collect information about edges between nodes in N′ and nodes in N \ N′ , which ego-centric sampling and link-tracing do. The most widely used form of subgraph sampling is multilevel sampling (Snijders & Bosker, 1999; Lazega & Snijders, 2016). Consider a population of nodes N partitioned into subpopulations A1 , . . . , AK . Suppose that a subset of subpopulations S ⊆ {1, . . . , K} is sampled and that the subgraphs yAk induced by the sampled subpopulations Ak with k ∈ S are observed. A simple example of a multilevel sample is a sample of school classes from a population of school classes, generated by any sampling design for sampling from finite populations (e.g., Thompson, 2012). If all students in the sampled school classes are asked to report edges to other students in the same school class, the subgraphs induced by the sampled school classes are observed. 3.2.4
Missing data
In addition to design-based missingness due to sampling, there is out-of-design missingness due to, e.g., nonresponse of respondents in network surveys (Handcock & Gile, 2010; Koskinen et al., 2010). Out-of-design missingness is not under the control of researchers, but as long as the data are missing at random in the sense of Rubin (1976), Handcock & Gile (2010), and Koskinen et al. (2010), the missing-data mechanism may be ignorable for the purpose of likelihood-based super- and infinite-population inference, as explained below.
18
3.2.5
Ignorable incomplete-data generating processes
An important concept in likelihood-based super- and infinite-population inference given incomplete data is the notion of ignorability due to Rubin (1976). An incompletedata generating process is ignorable for the purpose of estimating the parameter θ of the population model PN,η(θ,N) if the probability of not observing data does not depend on the nature of the unobserved data and the parameters of the complete- and incomplete-data generating processes are distinct (Handcock & Gile, 2010; Koskinen et al., 2010). If an incomplete-data generating process is ignorable, the likelihood of the population parameters simplifies, which we demonstrate in Section 4.3.2. Examples of ignorable incomplete-data generating processes include ego-centric sampling, linktracing (e.g., breadth-first search designs, random walk sampling), edge sampling, subgraph sampling, and data missing at random, but exclude respondent-driven sampling (Lunagomez & Airoldi, 2014). We refer to Handcock & Gile (2010) and Koskinen et al. (2010) for likelihood-based inference with ignorable incomplete-data generating processes and Lunagomez & Airoldi (2014) for likelihood-based inference with non-ignorable incomplete-data generating processes.
3.3
Applications
Here, we present two applications to demonstrate how the distinction of completeand incomplete-data generating processes can help clarify statistical issues of interest that have been mired in confusion. 3.3.1
Subgraph-to-graph inference problem
We used the subgraph-to-graph inference problem of SR in Section 1 to demonstrate that lack of proper statistical language can give rise to considerable confusion. To reduce the confusion, we use the statistical framework introduced above, which shows that the subgraph-to-graph inference problem of SR can be understood as follows: 1. The goal is likelihood-based superpopulation inference. 2. The complete-data generating process assumes that population model PN,θ generated population graph yN . 3. The incomplete-data generating process assumes that subgraph yN′ of yN was generated by a sampling design that is ignorable for the purpose of likelihoodbased inference for parameter θ of population model PN,θ (Rubin, 1976; Handcock & Gile, 2010).
19
4. Since the population model PN,θ generated yN and the sampling design generating subgraph yN′ of yN is ignorable, the likelihood is proportional to marginalizations of PN,θ (YN = yN ) (Handcock & Gile, 2010). 5. In some cases, marginalizations of PN,θ (YN = yN ) reduce to PN′ ,θ (YN′ = yN′ ), but in many other cases, they do not, hence PN′ ,θ (YN′ = yN′ ) is a misspecified likelihood (Schweinberger et al., 2017). Thus, by neglecting to specify the goal of statistical inference and the completeand incomplete-data generating process, SR considered statistical inference based on the misspecified likelihood PN′ ,θ (YN′ = yN′ ) rather than the proper likelihood. The resulting confusion is evident in the writings of Fienberg (2012) and others, as we have detailed elsewhere (Schweinberger et al., 2017). 3.3.2
The number of nodes “n” is not the sample size
A common misinterpretation is to take the number of nodes to be the sample size. The misinterpretation is rooted in the unfortunate use of the symbol n to denote the number of nodes in many of the classic papers in the area (e.g., Frank & Strauss, 1986; Nowicki & Snijders, 2001; Hoff et al., 2002). While in classical statistics the symbol n typically denotes the sample size, which is under control of researchers and is determined by the sampling design, in many network studies the number of nodes is neither under the control of researchers nor determined by the sampling design. To reduce the confusion, it is again helpful to separate the complete-data generating process from the incomplete-data generating process. In many of the classic papers in the area, including Frank & Strauss (1986), Nowicki & Snijders (2001), and Hoff et al. (2002), there is an implicit assumption that there is a finite population of interest N, the whole population graph yN is observed, and the goal is likelihood-based superpopulation inference. In such situations, the number of nodes n = |N| refers to the number of nodes in the population N and hence pertains to the complete-data generating process. Therefore, the number of nodes n = |N| is neither under the control of researchers nor determined by the sampling design, but is instead determined by the substantive process of interest: e.g., the size of a corporate board is determined by the corporation rather than by the economist who wishes to study the corporate board. Thus, while some theoreticians (e.g., Shalizi & Rinaldo, 2013) have assumed that it is natural to allow the number of nodes n to grow without bound in order to study asymptotic properties of estimators, this may not be meaningful. Indeed, the size of many populations is bounded above by physical, geographical, financial, organizational, or other constraints. In addition, population graphs of different sizes are often believed to be governed by dif20
ferent substantive processes (Krivitsky et al., 2011; Schweinberger et al., 2017). As a result, finite- and superpopulation inference for finite populations may be preferable to infinite-population inference. In situations where samples from the population graph yN are generated, n = |N′ | may refer to the number of nodes sampled from the population of nodes N as discussed in Section 3.2. In such situations, n pertains to the incomplete-data generating process, and it is meaningful to ask what happens as the number of sampled nodes increases (e.g., Handcock & Gile, 2010; Koskinen et al., 2010; Krivitsky & Morris, 2017).
4
Invariance properties of sequences of random graphs
We pointed out in Section 3.1.3 that the natural parameters η(θ, N1 ), η(θ, N2 ), . . . of sequences of random graph models PN1 ,η(θ,N1 ) , PN2 ,η(θ,N2 ) , . . . may depend on the sets of nodes N1 , N2 , . . . . However, while the natural parameters η(θ, N1 ), η(θ, N2 ), . . . may depend on the sets of nodes N1 , N2 , . . . , it is natural to demand that a sequence of random graph models shares some common, invariant features, i.e., it is natural to impose some form of invariance on a sequence of random graph models. Invariance is desirable on both scientific and statistical grounds: e.g., if one wishes to use observed graphs to generate model-based predictions of graphs (which may not have the same size as the observed graphs), then the process that generated the observed graphs must be related to the process that generates model-based predictions of graphs. There are many invariance properties that could be imposed on sequences of random graph models. We discuss three invariance properties: invariance to the labeling of nodes and exchangeable random graphs (Section 4.1), invariance of expected degrees of nodes to network size and sparse random graphs (Section 4.2), and invariance in the form of projectivity (Section 4.3).
4.1
Invariance to labeling of nodes: exchangeable random graphs
A natural form of invariance is invariance of random graph models to the labeling of nodes, i.e., exchangeability (Diaconis & Janson, 2008; Lov´asz, 2012; Crane & Dempsey, 2015; Lauritzen et al., 2017). We follow here Lauritzen et al. (2017) and focus on finite exchangeability rather than infinite exchangeability (Diaconis & Janson, 2008).
21
A random graph defined on a finite set of nodes N is called finitely exchangeable if its probability mass function is invariant to the labeling of nodes. Lauritzen et al. (2017) studied the properties of finitely exchangeable random graphs, with a focus on the conditional independence properties as expressed by conditional independence graphs (Lauritzen, 1996). In contrast to random graphs, which use graphs to represent data structure (i.e., the structure of real-world networks, such as friendship networks), conditional independence graphs use graphs to represent model structure (i.e., the conditional independence structure of models). Conditional independence graphs can be used, and have been used since the pioneering work of Frank & Strauss (1986), to represent the conditional independence structure of random graphs. A conditional independence graph of a random graph contains the edge variables Yi,j as nodes and the absence of an edge between two edge variables Yi,j and Yk,l in the conditional independence graph indicates that the two edge variables Yi,j and Yk,l are conditionally independent given all other edge variables. Throughout, we follow the classic work of Frank & Strauss (1986) and refer to conditional independence graphs as dependence graphs, despite the fact that the term conditional independence graph would be more accurate. Lauritzen et al. (2017) showed that exchangeable random graph models can express four classes of conditional dependence structures: • the empty dependence graph and its complement; • the incidence dependence graph of Frank & Strauss (1986) and its complement. If the dependence graph is empty, then the edge variables Yi,j are independent (Frank & Strauss, 1986; Lauritzen, 1996). Examples are the Bernoulli random graphs and β-models described in Section 2.3. The incidence dependence graph of Frank & Strauss (1986) assumes that two edge variables Yi,j and Yk,l are dependent conditional on all other edge variables when the two pairs of nodes {i, j} and {k, l} are incident, i.e., {i, j} ∩ {k, l} = 6 {}. Since neither the empty dependence graph and its complement nor the complement of the incidence dependence graph of Frank & Strauss (1986) can represent the dependence structures of real-world networks, the Markov random graph models of Frank & Strauss (1986) emerge as the natural choice among exchangeable random graph models. Markov random graph models have edges, k-stars (k = 2, . . . , |N| − 1), and triangles as sufficient statistics (Frank & Strauss, 1986). We note that the homogeneous Markov random graph models of Frank & Strauss (1986) with equal k-star and triangle parameters are known to be misspecified models (Handcock, 2003; Schweinberger, 2011; Chatterjee & Diaconis, 2013), but curved exponential-family parameterizations of inhomogeneous Markov random graph models of Frank & Strauss (1986) are sensible al22
ternatives (Snijders et al., 2006; Hunter & Handcock, 2006; Hunter et al., 2008; Schweinberger, 2011). An example are random graph models with edge and geometrically weighted edgewise shared partner terms introduced in Section 2.3.
4.2
Invariance of expected degrees of nodes: sparse random graphs
A second form of invariance that can be imposed on sequences of random graph models is invariance of expected degrees of nodes to the size |N| of N. The invariance of expected degrees of nodes to network size |N| is motivated by the desire to impose some form of invariance on sequences of random graph models. The degrees of nodes are fundamental features of random graphs, hence it is natural to demand that the expected degrees of nodes are invariant to network size |N| (Krivitsky et al., 2011; Krivitsky & Kolaczyk, 2015; Butts & Almquist, 2015). To demonstrate, consider the Bernoulli(π) random graphs of Erd˝os & R´enyi (1960), which assume that edges Yi,j are independent and identically distributed Bernoulli(π) random variables. If the probability of an edge π is constant, the expected degrees of nodes are (|N| − 1) π < |N| π and hence increase with network size |N|. In many applications, it is not be plausible that the expected degrees of nodes increase with network size |N|: e.g., in the New York City–Seattle exercise network described in Section 3.1.3, it is reasonable to propose that the expected number of workout partners of New York City and Seattle residents are the same and do not depend on the size of the respective city. In other words, it is reasonable to suppose that the expected degrees |N| π = µ are equal to a finite constant µ regardless of network size |N|. To ensure that the expected degrees of nodes are size-invariant in the large-|N|limit, Krivitsky et al. (2011) proposed the parameterization π|N| = logit−1 (θ−log |N|). Krivitsky et al. (2011) showed that the degrees of nodes converge in distribution to Poisson(exp(θ)) as |N| → ∞. As a consequence, the expected degrees of nodes tend to exp(θ) as |N| → ∞ and hence are size-invariant in the large-|N|-limit. In addition, the expected degrees of nodes can be size-invariant due to common phenomena such as population-size-related spatial dispersion, as mentioned in Section 2.
4.3
Invariance in the form of projectivity
Last, but not least, a strong form of invariance of sequences of random graph models is projectivity (Snijders, 2010; Shalizi & Rinaldo, 2013; Schweinberger et al., 2017).
23
We discuss here weak and strong forms of projectivity and clarify the implications of projectivity in terms of statistical modeling and inference. We begin with the notion of strong projectivity in the sense of SR. We refer to projectivity in the sense of SR as strong projectivity to distinguish it from weaker forms of projectivity discussed below. Definition. Strong projectivity. A random graph model {PN,η(θ,N) , θ ∈ Θ} is strongly projective if η(θ, N′ ) = θ for all N′ ⊆ N and PN′ ,θ (YN′ = yN′ ) = PN,θ (YN′ = yN′ , YN\N′ ∈ YN\N′ ) for all θ ∈ Θ, where yN\N′ ∈ YN\N′ corresponds to yN ∈ YN excluding the subgraph yN′ ∈ YN′ induced by the subset of nodes N′ ⊂ N. In other words, the distribution of a subgraph induced by a subset of nodes N′ ⊂ N belongs to the same family of distributions with the same natural parameter η(θ, N′ ) = θ, regardless of the size |N′ | of N′ . It turns out that almost all classic and modern random graphs fail to satisfy strong projectivity. We give examples of models that do and do not satisfy strong projectivity. Throughout, we denote by Nk = {1, . . . , k} the set of nodes and by k the number of nodes. Example of strongly projective random graphs. One of the few examples of strongly projective random graph models are dense Bernoulli(π) random graphs with size-invariant natural parameter η(θ, N) = logit(π) = θ. To demonstrate, note that, e.g., 1 X
1 X
PN3 ,θ (YN3 = yN3 ) = π y1,2 (1 − π)1−y1,2 = PN2 ,θ (YN2 = yN2 ),
y1,3 =0 y2,3 =0
where we used the fact that η(θ, N2 ) = η(θ, N3 ) = θ is size-invariant. Example of non-projective random graphs. Consider the sparse Bernoulli(π|N| ) random graphs of Erd˝os & R´enyi (1960) with size-dependent edge probabilities π|N| , where the edge probabilities decrease as the number of nodes increases: π|N1 | > π|N2 | > . . . An example is the parameterization π|N| = logit−1 (θ − log |N|) of Krivitsky et al. (2011). It is straightforward to see that sparse Bernoulli(π|N| ) random graphs with π|N1 | > π|N2 | > . . . are not strongly projective: e.g., 1 X
1 X
y
PN3 ,η(θ,N3 ) (YN3 = yN3 ) = π3 1,2 (1 − π3 )1−y1,2
y1,3 =0 y2,3 =0 y
6= π2 1,2 (1 − π2 )1−y1,2 = PN2 ,η(θ,N2 ) (YN2 = yN2 ), 24
because π2 6= π3 . While SR expressed concern that meaningful statistical inference for models without strong projectivity may not be possible, it turns out that statistical inference may nonetheless be meaningful. For example, the size-invariant parameter θ of the sparse Bernoulli(π|N| ) random graphs of Krivitsky et al. (2011) with sizedependent edge probabilities π|N| = logit−1 (θ − log |N|) can be estimated by the maximum likelihood estimator θb|N| of θ, and the maximum likelihood estimator θb|N| is consistent and asymptotically normal, as shown by Theorem 3 in Section 5.3.1. Indeed, it turns out that strong projectivity entails strong assumptions: first and foremost, strong projectivity rules out almost all sparse random graphs and random graphs with dependent edges, as discussed in Section 4.3.1. Weaker forms of projectivity are therefore preferable to strong projectivity. One interesting form of weak projectivity is the following.
Definition. Weak projectivity. Assume that there exists a partition of the set of dyads DN ⊆ N × N into subsets D1 , . . . , DL and let yDl be the subset of edges corresponding to the subset of dyads Dl ⊆ DN , where l ∈ L = {1, . . . , L}. A random graph model {P∪l∈L Dl , η(θ, ∪l∈L Dl ) , θ ∈ Θ} is weakly projective if, for all K ⊂ L, P∪l∈K Dl , η(θ, ∪l∈K Dl ) (Y∪l∈K Dl = y∪l∈K Dl ) = P∪l∈L Dl , η(θ, ∪l∈L Dl ) (Y∪l∈K Dl = y∪l∈K Dl , Y∪l∈L\K Dl ∈ Y∪l∈L\K Dl ). In contrast to strong projectivity, which SR called consistency under sampling, weak consistency may be called consistency under block sampling, where blocks corresponds to subsets of dyads D1 , . . . , DL . In other words, if a sample of subsets of dyads is generated and ∪l∈K Dl with K ⊂ L denotes the collection of all sampled dyads, then the subpopulation model P∪l∈K Dl , η(θ, ∪l∈K Dl ) is consistent with the marginalization of the population model P∪l∈L Dl , η(θ, ∪l∈L Dl ) , despite the fact that the models need not be strongly projective within subsets of dyads. The appeal of weak projectivity is rooted in the fact that it can accomodate a wide range of dependencies within the subsets of dyads D1 , . . . , DL , whereas strong projectivity rules out almost all interesting dependencies, as pointed out above. Example of weakly projective random graphs. An example of weakly projective models are random graphs with local dependence (Schweinberger & Handcock, 2015), such as random graphs with multilevel structure (Schweinberger & Stewart, 2017). Suppose that there exists a partition of the population of nodes N into subpopulations A1 , . . . , AK , which induces a partition of the set of dyads: e.g., schools consist of school classes, which induce a partition of the set of friendships among students. A model induces local dependence if the 25
dependence is confined to subpopulation subgraphs, i.e., "K # Y PN, η(θ,N) (YN = yN ) = PAk , η(θ,Ak ) (YAk = yAk ) ×
"
k=1
K Y k−1 Y
Y
#
P{i,j},η(θ,{i,j}) (Yi,j = yi,j ) .
k=1 l=1 i∈Nk < j∈Al
Schweinberger & Handcock (2015, Theorem 1) showed that models with local dependence satisfy weak projectivity. The advantage of weak projectivity is that it is satisfied by all models having additional structure in the form of subpopulation structure, including a wide range of models with complex dependence within subpopulations (such as the complex dependence induced by transitive edge terms within subpopulations as described in Section 2.3). While models with local dependence are weakly rather than strongly projective, we show in Sections 5.2 and 5.3.2 that consistent super- and infinite-population inference for models with local dependence is possible. Other forms of projectivity. Snijders (2010) considered a form of conditional marginalizability or projectivity, conditional on the event that there are no edges between two or more non-overlapping subsets of nodes. It is a weak form of projectivity, because it is conditional on the observed graph and limited to models with counts of connected subgraphs as sufficient statistics (e.g., k-stars and triangles, see Frank & Strauss, 1986). In addition, the probability of the event that there are no edges between two or more non-overlapping subsets of nodes is close to 0 for all dense random graphs and all sparse random graphs above the so-called threshold for connectivity (Bollob´as, 1998)—e.g., in Bernoulli(π|N| ) random graphs, the threshold for connectivity corresponds to π|N| = (log |N|) / |N| (Bollob´as, 1998). Hence the notion of conditional projectivity of Snijders (2010) may not be useful unless the random graph is sparse. 4.3.1
Implications of strong projectivity in terms of modeling
While some probabilists and mathematical statisticians have argued that strong projectivity is a natural requirement for random graph models (e.g., Shalizi & Rinaldo, 2013; Crane & Dempsey, 2015; Lauritzen et al., 2017), strong projectivity is too restrictive for the purpose of modeling real-world networks. First, many real-world network processes depend on the size |N| of N, as discussed in Sections 3.1.4 and 4.2. While SR and others have assumed that it is desirable to 26
generalize a model for a random graph of one size to random graphs of arbitrary, possibly infinite size while holding all natural parameters constant—i.e., η(θ, N) = θ for all N′ ⊂ N, regardless of the size |N′ | of N′ —it is not credible to expect models of small random graphs to generalize to large random graphs without changes in natural parameters. Indeed, the substantive processes governing graphs of different sizes are believed to be different. The classic Bernoulli(π|N| ) random graphs of Erd˝os & R´enyi (1960) with size-dependent edge probabilities π|N| and natural parameters η(π|N| ) = logit(π|N| ) respect that, and so should more general random graphs. Second, SR have shown that strong projectivity rules out almost all interesting dependencies. But decades of research, starting with the pioneering work of Holland & Leinhardt (1970, 1972, 1976) in the 1970s, have shown that many realworld networks exhibit complex dependencies. As a consequence, conclusions and predictions based on models with strong projectivity may be misleading, because models with strong projectivity cannot capture interesting dependencies. In conclusion, strong projectivity fails to respect the nature of real-world networks and superimposing strong projectivity on random graphs amounts to an undesirable limitation of statistical network analysis. 4.3.2
Implications of strong projectivity in terms of inference
There has been much confusion about the role of strong projectivity in statistical inference (Schweinberger et al., 2017). We discuss here the implications of strong projectivity in terms of finite- and super-population inference. Finite-population inference. Finite-population inference does not assume that the population graph was generated by a probability model. Therefore, finite-population inference is not affected by projectivity, which is a property of probability models. Superpopulation inference. Superpopulation inference is based on probability models and can hence be affected by projectivity, but it turns out that proper likelihood-based superpopulation inference along the lines of Rubin (1976), Handcock & Gile (2010), and Koskinen et al. (2010) is not affected by lack projectivity. We review here the approach of SR mentioned in Section 1, which is based on a misspecified likelihood and may be affected by lack of projectivity, and the proper likelihood-based approach of Handcock & Gile (2010), which is based on the proper likelihood and is not affected by lack of projectivity.
27
The approach of SR is based on maximizing M(θ) = PN′ ,η(θ,N′ ) (YN′ = yN′ ) under the assumption that η(θ, N′ ) = θ for all N′ ⊆ N. Basing statistical inference on M(θ) is problematic for at least three reasons. First, M(θ) is not, in general, proportional to the likelihood and is hence a misspecified likelihood, as explained in Section 3.3.1. Second, the results of SR suggest that when the model is not strongly projective, then using M(θ) is problematic, because PN′ ,η(θ,N′ ) may not be relatable to PN,η(θ,N) when N′ ⊂ N. Third, the assumption that η(θ, N′ ) = θ for all N′ ⊆ N—i.e., the natural parameter is the same for all possible subgraphs N′ ⊆ N, regardless of the size |N′ | of N—is restrictive, because graphs of different sizes are governed by different processes, as discussed in Section 4.3.1. To describe the proper likelihood-based approach of Handcock & Gile (2010), denote by A the N × N-matrix with elements Ai,j ∈ {0, 1}, where Ai,j = 1 if the value yi,j of Yi,j is observed and Ai,j = 0 otherwise. Recall that DN ⊆ N × N is the set of pairs of nodes of interest and let DN′ = {(i, j) ∈ DN : Ai,j = 1} be the set of pairs of nodes for which observations are available. We denote by yDN′ = {yi,j : (i, j) ∈ DN′ } the observations. The matrix A can deal with all forms of incomplete observations of yN , whether data are unobserved due to node sampling, edge sampling, subgraph sampling, missing data, or any combination of the aforementioned incomplete-data mechanisms. The incomplete-data generating process may depend on a parameter α (e.g., sample inclusion probabilities). The incomplete-data generating process is called ignorable for the purpose of estimating the population parameter θ as long as Pα (A = a | YDN = yDN ) = Pα (A = a | YDN′ = yDN′ ), i.e., the probability of being included in the sample does not depend on the unobserved data yDN \DN′ . If the incomplete-data generating process is ignorable, the likelihood is given by X L(α, θ) = Pα (A = a | YDN′ = yDN′ ) PN,η(θ,N) (YDN = yDN ) = L(α) L(θ), yDN ∈YN (yD ′ ) N
where YN (yDN′ ) denotes the set of graphs yDN ∈ YN compatible with the observed data yDN′ . Here, L(α) = Pα (A = a | YDN′ = yDN′ ) is the likelihood of parameter α P and L(θ) = yD ∈ YN (y ′ ) PN,η(θ,N) (YDN = yDN ) is the likelihood of parameter θ. As N
D
N
a result, statistical inference concerning parameter θ can be based on L(θ). 28
An important observation is that the likelihood L(θ) is based on marginalizations of the population probability mass function PN,η(θ,N) . Thus, while misspecified likelihood-based superpopulation inference based on M(θ) may be affected by lack of projectivity, proper likelihood-based superpopulation inference based on L(θ) is not.
5
Consistency and asymptotic normality of estimators
Consistency and other properties of estimators are based on the notion of observing more data from the same source. The notion of observing more data from the same source depends on both the complete- and incomplete data generating process. We discuss here consistency and asymptotic normality of estimators in finite-, super, and infinite-population inference scenarios based on suitable notions of observing more data from the same source.
5.1
Finite-population inference
Finite-population inference focuses on functions of the population graph yN , such as the number of edges of yN , and does not assume that the population graph yN was generated by a population model. When the whole population graph yN is observed, there is no uncertainty. However, when a sample from the population graph yN is generated as described in Section 3.2, there is uncertainty due to the unobserved parts of the population graph yN . In such situations, two forms of consistency are available for estimators of population quantities based on sample quantities: Fisherconsistency (Fisher, 1922) and consistency and asymptotic normality under sampling (Gjoka et al., 2015; Krivitsky & Morris, 2017). First, many estimators of population quantities are Fisher-consistent (Fisher, 1922). In other words, when the whole population graph is observed, the estimator of the population quantity of interest is equal to the population quantity. An example is an estimator of the proportion of edges in the population graph based on the proportion of edges in a sample. Second, it is often possible to write functions of the population graph of interest in terms of weighted population totals. In such settings, one can construct classical Horvitz-Thompson estimators for the weighted population total of interest, whose properties follow from the sampling design and often include consistency and asymptotic normality under sampling (Gjoka et al., 2015).
29
Last, but not least, consider the function def
θ(xN , yN ) = arg max exp{hη(θ ′ , N), s(xN , yN )i − ψ(θ ′ , N)}, θ′ ∈Θ
which we introduced in Section 3.1.1. Krivitsky & Morris (2017) showed that estimators of θ(xN , yN ) based on ego-centric samples are consistent and asymptotically normal provided the sample size increases without bound and the sufficient statistic s(xN , yN ) can be reconstructed from egocentric observations of all members of the population N.
5.2
Superpopulation inference: finite graph
Superpopulation inference is concerned with a finite population N and a population graph yN defined on N generated by a population model PN,η(θ,N) . We consider here statistical inference for parameter θ of PN,η(θ,N) given a complete observation of population graph yN . Extensions to sampled data are possible, as we mention below. In general, finite-population concentration and consistency results based on a single observation of a finite population graph are challenging unless random graphs are endowed with additional structure, such as multilevel structure (Lazega & Snijders, 2016). A simple form of multilevel structure is a partition of a population N into K subpopulations A1 , . . . , AK . Examples are terrorist networks partitioned into terrorist cells and armed forces partitioned into units of armed forces. If multilevel structure in the form of a partition of population N into K subpopulations A1 , . . . , AK is available, it may be reasonable to assume that the dependence is local in the sense that the dependence is restricted to subpopulations A1 , . . . , AK as follows (Schweinberger & Handcock, 2015): # "K Y PAk , η(θ,Ak ) (YAk = yAk ) PN, η(θ,N) (YN = yN ) = ×
"
k=1 K Y
k Y
Y
k=1 l=1 i∈Ak < j∈Al
#
P{i,j},η(θ,{i,j}) (Yi,j = yi,j ) .
A simple but important special case is given by K independent graphs YA1 , . . . , YAK induced by sets of nodes A1 , . . . , AK , where edges between nodes in Ak and nodes in Al are absent with probability 1 provided k 6= l. An example would be an experiment with K groups, where edges between groups are impossible by design. We consider here random graph models with multilevel structure and local dependence induced by Ak -dependent edge and transitive edge terms as described in 30
Section 2.3 (k = 1, . . . , K). In other words, edges and transitive edges are counted within subpopulations Ak (k = 1, . . . , K). We assume here that the sizes of subpopulations A1 , . . . , AK are either the same or are similar in the sense that there exist A > 0 and B > 0 such that Ak ∈ [A − B, A + B] (k = 1, . . . , K). Since the sizes of subpopulations are the same or similar, it may be assumed that the natural parameters η(θ, Ak ) = θ corresponding to subpopulations Ak are constant across subpopulations (k = 1, . . . , K). The between-subpopulation edges can be assumed to be independent and identically distributed Bernoulli(π|N| ) random variables with size-dependent edge probability π|N| , such that π|N| decreases as a function of |N| and hence induces sparsity between subpopulations. We focus here on statistical inference concerning the within-subpopulation parameter θ and assume that within- and between-subpopulation parameters are distinct, so that statistical inference concerning within-subpopulation parameter θ can be based on within-subpopulation subgraphs. The following finite-population concentration result is taken from Corollary 3 of Schweinberger & Stewart (2017). The result assumes that the whole population graph is observed, but it is possible to extend the result to sampled within-subpopulation subgraphs as described in Section 3.2.3. S Theorem 1. Suppose that a population NK = K k=1 Ak consists of K known subpopulations A1 , . . . , AK , which are non-empty, disjoint, and have the same size or similar sizes in the sense that Ak ∈ [A − B, A + B] (k = 1, . . . , K). Assume that the population graph YNK is governed by population model PNK ,η(θ,NK ) with subpopulation-dependent edge and transitive edge terms, where Θ is a compact subset of R × R+ . Let θ ∈ Θ be the data-generating parameter and θbK be the maximum likelihood estimator based on YNK . Then, for all ǫ > 0, there exist δ(ǫ) > 0, C > 0, and K0 ≥ 1 such that, for all K > K0 , P(kθbK − θk2 < ǫ) ≥ 1 − 6 exp −δ(ǫ)2 C K ,
where kθbK − θk2 denotes the ℓ2 -distance between θbK and θ.
Theorem 1 shows that the probability of the event kθbK − θk2 < ǫ is close to 1 provided K > K0 . Theorem 1 is a finite-population result in the sense that it applies to all finite populations with K > K0 subpopulations. Its most important implication is that superpopulation inference for models with complex dependence, including transitive closure and other complex dependencies, is possible and meaningful when there is additional structure in the form of multilevel structure. Other consistency results for superpopulation inference based on sequences of graphs are reported in Section 5.3.2. 31
5.3
Super- and infinite-population inference: sequences of graphs
We turn to consistency and asymptotic normality results concerning super- and infinite-population inference based on sequences of graphs. The sequences of graphs may consist of graphs of the same size, graphs of similar sizes, or graphs of increasing size. It is convenient to divide the discussion of consistency and asymptotic normality results into results for models with dyad-independence (Section 5.3.1) and dyad-dependence (Section 5.3.2). 5.3.1
Dyad-independence models
Most existing consistency and asymptotic normality results have been obtained under the assumption of dyad-independence, i.e., either edges Yi,j are assumed to be independent (undirected random graphs) or pairs of edges (Yi,j , Yj,i) are assumed to be independent (directed random graphs). Examples are consistency and asymptotic normality results for β-models and p1 -models (Diaconis et al., 2011; Rinaldo et al., 2013; Krivitsky & Kolaczyk, 2015; Yan et al., 2015, 2016a,b). We present here two interesting examples, one with node-dependent parameters and one with size-dependent parameters. The first example concerns p1 -models for directed random graphs with nodedependent parameters (Yan et al., 2016a). Under p1 -models without reciprocity, the directed edges are independent Bernoulli(πi,j ) random variables with edge probabilities πi,j = logit−1 (αi + βj ) and natural parameters ηi,j (θ, {i, j}) = αi + βj , where θ = (α1 , . . . , α|N| , β1 , . . . , β|N| ). To make the model identifiable, we follow Yan et al. (2016a) and set β|N| = 0, so that θ ∈ R2 |N|−1 . The following result is taken from Yan et al. (2016a, Theorems 1 and 2). Theorem 2. Let N1 , N2 , . . . be a sequence of sets of nodes and YN1 , YN2 , . . . be a sequence of random graphs governed by a sequence of p1 -models without reciprocity PN1 ,η(θ,N1 ) , PN2 ,η(θ,N2 ) , . . . , where Nk = {1, . . . , k} (k = 1, 2, . . . ). Assume that kθk∞ ≤ τ log |N|, where 0 < τ < 1 / 44 and kθk∞ = max1≤i≤2 |N|−1 |θi |. Then • with a probability approaching 1, the maximum likelihood estimator θbN based p on YN exists, is unique, and kθbN − θk∞ → 0 as |N| → ∞. • for any fixed k ≥ 1, the vector consisting of the first k elements of (θbN − θ) is asymptotically multivariate normal with mean vector zero and variancecovariance matrix given by the corresponding k × k block of the inverse Fisher information matrix as |N| → ∞.
32
It may be suprising that consistent estimation of the 2 |N| − 1-dimensional parameter θ is possible. Note, however, that the number of independent observations from the p1 -model without reciprocity is |N| (|N| − 1), so the number of independent observations (which is quadratic in |N|) grows faster than the number of parameters (which is linear in |N|). The second example concerns Bernoulli(π|N| ) random graphs with size-dependent edge probabilities π|N| = logit−1 (θ − log |N|) and natural parameters η(θ, N) = θ − log |N| (Krivitsky et al., 2011). The following result is based on Theorem 3.1 of Krivitsky & Kolaczyk (2015). Theorem 3. Let N1 , N2 , . . . be a sequence of sets of nodes and YN1 , YN2 , . . . be a sequence of random graphs governed by a sequence of sparse Bernoulli random graph models PN1 ,η(θ,N1 ) , PN2 ,η(θ,N2 ) , . . . , where Nk = {1, . . . , k} (k = 1, 2, p. . . ). Then b the maximum likelihood estimator θ|N| based on YN is consistent and |N| (θb|N| − d
θ) −→ N(0, exp(−θ)) as |N| → ∞.
Other consistency and asymptotic normality results for models with dyad-independence can be found in Krivitsky & Kolaczyk (2015). These consistency and asymptotic normality results demonstrate that, when meaningful sequences of random graph models are specified and larger graphs contain more information than smaller graphs, then consistency and asymptotic normality results for size-invariant parameters are possible. In particular, the maximum likelihood estimator θb|N| is consistent and asymptotically normal despite the fact that Bernoulli(π|N| ) random graphs with size-dependent edge probabilities π|N| = logit−1 (θ − log |N|) are not strongly projective. 5.3.2
Dyad-dependence models
There is a common misconception that maximum likelihood estimators of random graph models with complex dependence may not be consistent, because many random graph models with complex dependence—such as transitivity—are not strongly projective (Fienberg, 2012; Yan et al., 2016a; Schweinberger et al., 2017). The following result shows that maximum likelihood estimators are consistent even when models induce transitivity, as long as the form of transitivity is a sensible one and some form of replication is possible. The result is taken from Corollary 3 of Schweinberger & Stewart (2017) and focuses on models with edge and transitive edge terms as described in Section 5.2. Theorem 4. Let A1 , A2 , . . . and N1 , N2 , . . . be a sequence of non-empty, disS joint sets of nodes, where NK = K k=1 Ak (K = 1, 2, . . . ). Suppose that the sizes 33
of subsets A1 , . . . , AK are of the same order of magnitude in the sense there exist A1 > 0 and A2 > 0 and a non-decreasing function h : N 7→ N such that A1 h(K) ≤ |Ak | ≤ A2 h(K) (k = 1, . . . , K, K = 1, 2, . . . ). Consider a sequence of random graphs YN1 , YN2 , . . . governed by a sequence of random graph models PN1 ,η(θ,N1 ) , PN2 ,η(θ,N2 ) , . . . with Ak -dependent edge and transitive edge terms (k = 1, 2, . . . ). Let θ ∈ Θ be the data-generating parameter and θbK be the maximum likelihood estimap tor based on YNK . Then kθbK − θk2 → 0 as K → ∞ provided kAk∞ = o(K 1/4 ), where kAk∞ = max1≤k≤K |Ak |.
Theorem 4 suggests that statistical inference is meaningful in each of the following scenarios, despite the fact that models with Ak -dependent edge and transitive edge terms induce dependence: I. A large number K of independent random graphs YA1 , . . . , YAK is observed, where the sets of nodes A1 , . . . , AK have the same size or similar sizes: e.g., K units of armed forces of the same size are observed or K corporate boards of similar sizes are observed. II. A large number K of independent random graphs YA1 , . . . , YAK is observed, where the sets of nodes A1 , . . . , AK grow at the same rate and kAk∞ = o(K 1/4 ): e.g., a large state with a surging population may increase the number of public schools K and at the same time all K public schools grow by admitting additional students. III. A single random graph YNK consisting of K subgraphs with local dependence is observed, where the subsets of nodes A1 , . . . , AK have the same size or similar sizes: e.g., a terrorist network consisting of K terrorist cells of similar sizes is observed; edges within terrorist cells are dependent whereas edges between terrorist cells are independent. IV. A single random graph YNK consisting of K subgraphs with local dependence is observed, where the subsets of nodes A1 , . . . , AK grow at the same rate and kAk∞ = o(K 1/4 ): e.g., a school facing surging demand may increase the number of school classes K and at the same time increase the sizes of all K school classes; edges within school classes are dependent whereas edges between school classes are independent. More general results on canonical and curved exponential-family random graphs with local dependence can be found in Schweinberger & Stewart (2017). The main conclusion is that replication—replication in the sense that there are K similar-sized graphs or a single graph consisting of K similar-sized subgraphs with local dependence—facilitates consistency results. Thus, statistical inference is meaningful for many random graph models, including models with sensible forms of tran34
sitivity and other complex dependencies. Here, sensible forms of transitivity refer to models with edge and transitive edge terms as described in Section 2.3. Models with edge and transitive edge terms are better behaved than the ill-posed models with edge and triangle terms mentioned in Section 1.1 (Hunter et al., 2012). These results have implications in terms of both statistical theory and practice. In terms of theory, the consistency results suggest that more attention should be paid to replication-based asymptotics and that strong projectivity is not necessary for consistency of maximum likelihood estimators in replication-based asymptotics. In terms of practice, the consistency results encourage replicative data collection designs, which have the benefit of providing an immediate and meaningful route to replication-based asymptotic results. Sequences of graphs of increasing size with size-dependent edge and transitive edge terms. A simulation study by Krivitsky & Kolaczyk (2015, Section 3.3) suggests that Theorem 3 concerning models with size-dependent edge terms can be extended to size-dependent edge and transitive edge terms, but a proof is elusive. It is worth noting, however, that an extension of Theorem 3 to edge and transitive edge terms would follow a route that is different from the route taken by Theorem 4: Theorem 4 relies on replication—i.e., it relies on either multiple graphs with edge and transitive edge terms or a single graph consisting of multiple subgraphs with local edge and transitive edge terms. In contrast, an extension of Theorem 3 would not rely on replication but on a sequence of graphs of increasing size, without dependence being restricted to (sub)graphs, and is hence more challenging.
6
Conclusion
We have demonstrated that a proper statistical framework—and the language to express it—is essential to ask well-posed questions regarding statistical inference for random graph models. Among other things, we believe that the important class of exponential-family random graphs—which has been used to study a wide range of topics, ranging from the structure of the human brain (e.g., Simpson et al., 2012) to social networks (e.g., Lusher et al., 2013)—is well-suited to likelihood-based superpopulation inference. The consistency and asymptotic normality results discussed in Sections 5.2 and 5.3.2 indicate that statistical inference for these models makes sense as long as these models are used to ask proper questions about random graphs. It goes without saying that the language of exponential-family random graphs can be abused to ask improper questions by specifying models with problematic assumptions. But 35
potential for abuse of a language does not invalidate its potential for eloquent and effective communication when properly employed. Given the statistical framework presented here, it is possible to ask a number of additional questions. One of the more interesting questions is which invariance properties sequences of random graph models should satisfy. We have sketched some invariance properties, but there are other interesting invariance properties. Another interesting question is which theoretical results can be obtained for the broader range of generating processes we have outlined. We have presented basic results for some of the most common scenarios, but it should be possible to obtain theoretical results for other scenarios as well. Last, but not least, we have set aside questions relating to latent variable and temporal random graph models, both of which are of obvious interest. Many of the results shown here have possible extensions to latent variable and temporal random graph models, including cases where the size and composition of the set of nodes changes over time (Almquist & Butts, 2014). By identifying the data generating processes involved in a statistical network analytic problem and specifying the associated inferential target, clarity can be brought to a wide range of challenging statistical problems.
References Airoldi, E., Blei, D., Fienberg, S. & Xing, E. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9, 1981–2014. Almquist, Z. W. & Butts, C. T. (2014). Logistic network regression for scalable analysis of networks with joint edge/vertex dynamics. Sociological Methodology 44, 273–321. Barndorff-Nielsen, O. E. (1978). Information and Exponential Families in Statistical Theory. New York: John Wiley & Sons. ´ s, B. (1998). Modern Graph Theory. New York: Springer-Verlag. Bolloba Brown, L. (1986). Fundamentals of Statistical Exponential Families: With Applications in Statistical Decision Theory. Hayworth, CA, USA: Institute of Mathematical Statistics. Butts, C. T. (2007). Permutation models for relational data. Sociological Methodology 37, 257–281.
36
Butts, C. T. (2008). A relational event framework for social action. Sociological Methodology 38, 155–200. Butts, C. T. (2011). Bernoulli graph bounds for general random graph models. Sociological Methodology 41, 299–345. Butts, C. T. & Acton, R. M. (2011). Spatial modeling of social networks. In The SAGE Handbook of GIS and Society Research, T. Nyerges, H. Couclelis & R. McMaster, eds., chap. 12. SAGE Publications, pp. 222–250. Butts, C. T., Acton, R. M., Hipp, J. R. & Nagle, N. N. (2012). Geographical variability and network structure. Social Networks 34, 82–100. Butts, C. T. & Almquist, Z. W. (2015). A flexible parameterization for baseline mean degree in multiple-network ERGMs. Journal of Mathematical Sociology 39, 163–167. Chatterjee, S. & Diaconis, P. (2013). Estimating and understanding exponential random graph models. The Annals of Statistics 41, 2428–2461. Crane, H. & Dempsey, W. (2015). A framework for statistical network modeling Arxiv.org/abs/1509.08185.v4. Dawid, A. P. & Dickey, J. M. (1977). Likelihood and Bayesian inference from selectively reported data. Journal of the American Statistical Association 72, 845– 850. Dekker, D., Krackhardt, D. & Snijders, T. A. B. (2007). Sensitivity of MRQAP tests to collinearity and autocorrelation conditions. Psychometrika 72, 563–581. Diaconis, P., Chatterjee, S. & Sly, A. (2011). Random graphs with a given degree sequence. The Annals of Applied Probability 21, 1400–1435. Diaconis, P. & Janson, S. (2008). Graph limits and exchangeable random graphs. Rendiconti di Matematica 28, 33–61. ˝ s, P. & R´ Erdo enyi, A. (1959). On random graphs. Publicationes Mathematicae 6, 290–297. ˝ s, P. & Re ´nyi, A. (1960). On the evolution of random graphs. Publications Erdo of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17–61. 37
Fellows, I. & Handcock, M. S. (2012). Exponential-family random network models Arxiv.org/abs/1208.0121. Fienberg, S. E. (2012). A brief history of statistical models for network analysis and open challenges. Journal of Computational and Graphical Statistics 21, 825–839. Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A 222, 309–368. Fosdick, B. K. & Hoff, P. D. (2015). Testing and modeling dependencies between a network and nodal attributes. Journal of the American Statistical Association 110, 1047–1056. Frank, O. & Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association 81, 832–842. Geyer, C. J. & Thompson, E. A. (1992). Constrained Monte Carlo maximum likelihood for dependent data. Journal of the Royal Statistical Society B 54, 657– 699. Gilbert, E. N. (1959). Random graphs. The Annals of Mathematical Statistics 30, 1141–1144. Gile, K. (2011). Improved inference for respondent-driven sampling data with application to HIV prevalence estimation. Journal of the American Statistical Association 106, 135–146. Gile, K. & Handcock, M. H. (2010). Respondent-driven sampling: An assessment of current methodology. Sociological Methodology 40, 285–327. Gjoka, M., Kurant, M., Butts, C. T. & Markopoulou, A. (2011). Practical recommendations on crawling online social networks. IEEE Journal on Selected Areas in Communications on Measurement of Internet Topologies . Gjoka, M., Smith, E. & Butts, C. T. (2015). Estimating subgraph frequencies with or without attributes from egocentrically sampled data Arxiv.org/abs/1510.08119. Gjoka, M., Smith, E. J. & Butts, C. T. (2014). Estimating clique composition and size distributions from sampled network data. Proceedings of the Sixth IEEE Workshop on Network Science for Communication Networks (NetSciCom 2014) .
38
Goldenberg, A., Zheng, A., Fienberg, S. & Airoldi, E. (2009). A survey R in Machine Learning 2, of statistical network models. Foundations and Trends 129–233. Goodreau, S. M., Handcock, M. S., Hunter, D. R., Butts, C. T. & Morris, M. (2008). A statnet tutorial. Journal of Statistical Software 24. Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks. Tech. rep., Center for Statistics and the Social Sciences, University of Washington. Www.csss.washington.edu/Papers. Handcock, M. S. & Gile, K. (2010). Modeling social networks from sampled data. The Annals of Applied Statistics 4, 5–25. Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M. & Morris, M. (2008). statnet: software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software 24, 1548–7660. Handcock, M. S., Raftery, A. E. & Tantrum, J. M. (2007). Model-based clustering for social networks. Journal of the Royal Statistical Society A (with discussion) 170, 301–354. Hanneke, S., Fu, W. & Xing, E. P. (2010). Discrete temporal models of social networks. Electronic Journal of Statistics 4, 585–605. Harris, J. K. (2013). An Introduction to Exponential Random Graph Modeling. Sage. Hartley, H. & Sielken, R. (1975). A “super-population viewpoint” for finite population sampling. Biometrics , 411–422. Hoff, P. D., Raftery, A. E. & Handcock, M. S. (2002). Latent space approaches to social network analysis. Journal of the American Statistical Association 97, 1090–1098. Holland, P. W. & Leinhardt, S. (1970). A method for detecting structure in sociometric data. American Journal of Sociology 76, 492–513. Holland, P. W. & Leinhardt, S. (1972). Some evidence on the transitivity of positive interpersonal sentiment. American Journal of Sociology 77, 1205–1209.
39
Holland, P. W. & Leinhardt, S. (1976). Local structure in social networks. Sociological Methodology , 1–45. Holland, P. W. & Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association 76, 33–65. Hunter, D. R. (2007). Curved exponential family models for social networks. Social Networks 29, 216–230. Hunter, D. R., Goodreau, S. M. & Handcock, M. S. (2008). Goodness of fit of social network models. Journal of the American Statistical Association 103, 248–258. Hunter, D. R. & Handcock, M. S. (2006). Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics 15, 565–583. Hunter, D. R., Krivitsky, P. N. & Schweinberger, M. (2012). Computational statistical methods for social network models. Journal of Computational and Graphical Statistics 21, 856–882. Jonasson, J. (1999). The random triangle model. Journal of Applied Probability 36, 852–876. Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models. New York: Springer-Verlag. Koskinen, J. H. (2009). Using latent variables to account for heterogeneity in exponential family random graph models. In Proceedings of the 6th St. Petersburg Workshop on Simulation, S. M. Ermakov, V. B. Melas & A. N. Pepelyshev, eds., vol. 2. St. Petersburg, Russia: St. Petersburg State University. Koskinen, J. H., Robins, G. L. & Pattison, P. E. (2010). Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation. Statistical Methodology 7, 366–384. Krivitsky, P. N. (2012). Exponential-family models for valued networks. Electronic Journal of Statistics 6, 1100–1128. Krivitsky, P. N. & Handcock, M. S. (2014). A separable model for dynamic networks. Journal of the Royal Statistical Society B 76, 29–46. 40
Krivitsky, P. N., Handcock, M. S. & Morris, M. (2011). Adjusting for network size and composition effects in exponential-family random graph models. Statistical Methodology 8, 319–339. Krivitsky, P. N. & Kolaczyk, E. D. (2015). On the question of effective sample size in network modeling: An asymptotic inquiry. Statistical Science 30, 184–198. Krivitsky, P. N. & Morris, M. (2017). Inference for social network models from egocentrically-sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US. Annals of Applied Statistics 11, 427–455. Kurant, M., Gjoka, M., Wang, Y., Almquist, Z. W., Butts, C. T. & Markopoulou, A. (2012). Coarse-grained topology estimation via graph sampling. In Proceedings of ACM SIGCOMM Workshop on Online Social Networks (WOSN) ’12. Helsinki, Finland. Kurant, M., Markopoulou, A. & Thiran, P. (2011). Towards unbiased BFS sampling. IEEE Journal on Selected Areas in Communications 29, 1799–1809. Lauritzen, S. (1996). Graphical Models. Oxford, UK: Oxford University Press. Lauritzen, S., Rinaldo, A. & Sadeghi, K. (2017). Random networks, graphical models, and exchangeability ArXiv:1701.08420v1. Lazega, E. & Snijders, T. A. B., eds. (2016). Multilevel Network Analysis for the Social Sciences. Switzerland: Springer. Lehmann, E. L. (1999). Elements of Large Sample Theory. New York: SpringerVerlag. ´ sz, L. (2012). Large Networks and Graph Limits. Providence: American MathLova ematical Society. Lunagomez, S. & Airoldi, E. (2014). Bayesian inference from non-ignorable network sampling designs. arXiv:1401.4718 . Lusher, D., Koskinen, J. & Robins, G. (2013). Exponential Random Graph Models for Social Networks. Cambridge, UK: Cambridge University Press. McPherson, J. M. (1983). An ecology of affiliation. American Sociological Review 48, 519–532.
41
Moreno, J. L. & Jennings, H. H. (1938). Statistics of social configurations. Sociometry 1, 342–374. Morris, M., Handcock, M. S. & Hunter, D. R. (2008). Specification of exponential-family random graph models: Terms and computational aspects. Journal of Statistical Software 24, 1–24. Nowicki, K. & Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association 96, 1077–1087. Pattison, P. & Wasserman, S. (1999). Logit models and logistic regressions for social networks: Ii. Multivariate relations. British Journal of Mathematical and Statistical Psychology 52, 169–193. Rinaldo, A., Petrovic, S. & Fienberg, S. E. (2013). Maximum likelihood estimation in network models. The Annals of Statistics 41, 1085–1110. Ripley, B. D. (1988). Statistical Inference for Spatial Processes. Cambridge: Cambridge University Press. Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581–592. Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association 106, 1361– 1370. Schweinberger, M. & Handcock, M. S. (2015). Local dependence in random graph models: characterization, properties and statistical inference. Journal of the Royal Statistical Society B 77, 647–676. Schweinberger, M., Krivitsky, P. N. & Butts, C. T. (2017). A note on the role of projectivity in likelihood-based inference for random graph models Arxiv.org/abs/1707.00211. Schweinberger, M. & Snijders, T. A. B. (2003). Settings in social networks: a measurement model. In Sociological Methodology, R. M. Stolzenberg, ed., vol. 33, chap. 10. Boston & Oxford: Basil Blackwell, pp. 307–341. Schweinberger, M. & Stewart, J. (2017). Consistent M-estimation of curved exponential-family random graph models with local dependence and growing neighborhoods arxiv.org/abs/1702.01812.
42
Sewell, D. K. & Chen, Y. (2015). Latent space models for dynamic networks. Journal of the American Statistical Association 110, 1646–1657. Shalizi, C. R. & Rinaldo, A. (2013). Consistency under sampling of exponential random graph models. The Annals of Statistics 41, 508–535. Simpson, S. L., Moussa, M. N. & Laurienti, P. J. (2012). An exponential random graph modeling approach to creating group-based representative wholebrain connectivity networks. Neuroimage 60, 1117–1126. Smith, T. W., Marsden, P., Hout, M. & Kim, J. (1972–2016). General social surveys. Tech. rep., NORC at the University of Chicago. Snijders, T. A. B. (2001). The statistical evaluation of social network dynamics. In Sociological Methodology, M. Sobel & M. Becker, eds. Boston and London: Basil Blackwell, pp. 361–395. Snijders, T. A. B. (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure 3, 1–40. Snijders, T. A. B. (2010). Conditional marginalization for exponential random graph models. The Journal of Mathematical Sociology 34, 239–252. Snijders, T. A. B. & Bosker, R. (1999). Multilevel Modeling: An Introduction to Basic and Advanced Multilevel Modeling. London: Sage. Snijders, T. A. B., Pattison, P. E., Robins, G. L. & Handcock, M. S. (2006). New specifications for exponential random graph models. Sociological Methodology 36, 99–153. Snijders, T. A. B., Steglich, C. E. G. & Schweinberger, M. (2007). Modeling the co-evolution of networks and behavior. In Longitudinal models in the behavioral and related sciences, K. van Montfort, H. Oud & A. Satorra, eds. Lawrence Erlbaum, pp. 41–71. Strauss, D. (1986). On a general class of models for interaction. SIAM Review 28, 513–527. Strauss, D. & Ikeda, M. (1990). Pseudolikelihood estimation for social networks. Journal of the American Statistical Association 85, 204–212.
43
Thiemichen, S. & Kauermann, G. (2017). Stable exponential random graph models with non-parametric components for large dense networks. Social Networks 49, 67–80. Thompson, S. & Frank, O. (2000). Model-based estimation with link-tracing sampling designs. Survey Methodology 26, 87–98. Thompson, S. K. (2012). Sampling. John Wiley & Sons, 3rd ed. Wang, P., Robbins, G. & Pattison, P. (2009). PNet. Program for the Simulation and Estimation of Exponential Random Graph (p*) Models. Wasserman, S. & Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press. Wasserman, S. & Pattison, P. (1996). Logit models and logistic regression for social networks: I. An introduction to Markov graphs and p∗ . Psychometrika 61, 401–425. Yan, T., Leng, C. & Zhu, J. (2016a). Asymptotics in directed exponential random graph models with an increasing bi-degree sequence. The Annals of Statistics 44, 31–57. Yan, T., Wang, H. & Qin, H. (2016b). Asymptotics in undirected random graph models parameterized by the strengths of vertices. Statistica Sinica 26, 273–293. Yan, T., Zhao, Y. & Qin, H. (2015). Asymptotic normality in the maximum entropy models on graphs with an increasing number of parameters. Journal of Multivariate Analysis 133, 61–76.
44