RANDOMLY WEIGHTED d−COMPLEXES: MINIMAL SPANNING ACYCLES AND PERSISTENCE DIAGRAMS

arXiv:1701.00239v1 [math.PR] 1 Jan 2017

PRIMOZ SKRABA, GUGAN THOPPE, AND D. YOGESHWARAN

Abstract. This paper has three key parts. The first part concerns spanning acycles, which are higher-dimensional analogues of spanning trees. We give a compendium of basic properties of spanning acycles, lending further credence to this analogy and providing a stepping stone for the study of minimal spanning acycles on infinite complexes. We also give simplicial versions of Kruskal's and the Jarník-Prim-Dijkstra algorithms for generating a minimal spanning acycle of a weighted simplicial complex. In the second part, using the simplicial Kruskal's algorithm, we prove that the set of face-weights of a minimal spanning acycle in a weighted simplicial complex is the same as the set of 'death times' in the associated persistence diagram, provided the weights induce a filtration. Further, we prove a stability result for the face-weights in a minimal spanning acycle, and hence for death and birth times, under the $L_p$ matching distance for any $p \in \{0, \ldots, \infty\}$. In the third part, we consider randomly weighted $d$-complexes on $n$ vertices. In the generic case, all faces up to dimension $d-1$ have weight $0$, while the $d$-face weights are perturbations of some i.i.d. distribution. Under minor assumptions, using the above stability result, we show that if the maximum perturbation converges in probability to $0$ sufficiently fast then, after suitable scaling, the extremal weights in the minimal $d$-spanning acycle and the extremal death times in the persistence diagram of the $(d-1)$-th homology of the associated simplicial process converge to the Poisson point process on $\mathbb{R}$ with intensity $e^{-x}\,dx$. Lastly, we show that the limiting results of Frieze [29] on the total edge-weight of a random minimal spanning tree and of Hiraoka and Shirai [33] on lifetime sums of persistence diagrams of randomly weighted complexes also hold for suitable noisy versions.

1991 Mathematics Subject Classification. Primary: 60C05, 05E45; Secondary: 60G70, 60B99, 05C80.
Key words and phrases. Random complexes, Persistent diagrams, Minimal spanning acycles, Point processes, Weak convergence, Stability.
Research supported by EU project Proasense FP7-ICT-2013-10(612329). Research supported by URSAT, ERC Grant 320422. Research supported by DST-INSPIRE faculty award.

1. Introduction

This paper investigates extremal properties of persistence diagrams and minimal spanning acycles of a 'mean-field' model of weighted random complexes. As we shall see, intimately related to these two notions is what we refer to as the "nearest face" distance. In the much more popular scenario of weighted graphs (which are a special case of weighted complexes), persistence diagrams correspond to the vanishing thresholds of the number of connected components, minimal spanning acycles correspond to minimal spanning trees, and the nearest face distance is nothing but the nearest neighbour distance. That connectivity and nearest

neighbour distance are intertwined was conspicuous even in the earliest work on connectivity thresholds for random graphs by Erdős and Rényi [27]. Coincidentally, three years earlier, the relation between minimal spanning trees and the vanishing of connected components in a graph was used in Kruskal's algorithm to construct a minimal spanning tree [44], and the relation was also used later in the seminal work of Frieze [29]. On the other hand, connections between largest nearest neighbour distances and longest edges of a minimal spanning tree on randomly weighted graphs were at the heart of the studies in [32, 60, 3, 54, 35]. More complete accounts of the theory of random graphs and, in particular, of the relation between the above three quantities can be found in [7, 63, 37, 55, 30].

Random complexes: In recent years, motivated by applications in topological data analysis, the study of random graphs has been extended to the study of random complexes. Though we now narrow our focus to a specific random complex, we alert the reader to the existence of a richer theory of random complexes and topological data analysis [10, 39, 6, 40, 20]. Before describing random complexes, we define (simplicial) complexes, which are higher-dimensional versions of a graph.

Definition 1.1. An (abstract) simplicial complex $K$ on a finite ground set $V$ is a collection of subsets of $V$ such that if $\sigma_1 \in K$ and $\emptyset \neq \sigma_2 \subset \sigma_1$, then $\sigma_2 \in K$ as well. The elements of $K$ are called simplices or faces, and the dimension of a simplex $\sigma$ is $|\sigma| - 1$, with $|\cdot|$ here denoting the cardinality. A $d$-face of $K$ is a face of $K$ with dimension $d$.

Given a complex $K$, we denote the set of $d$-faces of $K$ by $\mathcal{F}^d(K)$ and its $d$-skeleton by $K^d$ (i.e., the sub-complex of $K$ consisting of all faces of dimension at most $d$) for any $d \geq 0$. We use $\sigma, \tau$ to denote faces, and the dimension of a face shall not be explicitly mentioned unless required.
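Definition 1.1 can be checked mechanically on small examples. The following Python sketch is illustrative only: the function names (`closure`, `is_simplicial_complex`, `d_faces`, `skeleton`) are hypothetical helpers introduced here, and faces are represented as frozensets of vertex labels.

```python
from itertools import combinations

def closure(top_faces):
    """All non-empty subsets of the given faces (the complex they generate)."""
    K = set()
    for s in top_faces:
        for r in range(1, len(s) + 1):
            K.update(frozenset(sub) for sub in combinations(s, r))
    return K

def is_simplicial_complex(faces):
    """Definition 1.1: every non-empty proper subset of a face is again a face."""
    K = {frozenset(s) for s in faces}
    return all(
        frozenset(sub) in K
        for s in K
        for r in range(1, len(s))
        for sub in combinations(s, r)
    )

def d_faces(faces, d):
    """The d-faces: faces with d + 1 vertices."""
    return {frozenset(s) for s in faces if len(s) == d + 1}

def skeleton(faces, d):
    """The d-skeleton: all faces of dimension at most d."""
    return {frozenset(s) for s in faces if len(s) <= d + 1}
```

For instance, `closure([(0, 1, 2)])` is the full triangle with seven faces, and its 1-skeleton is the graph with three vertices and three edges.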
If a (simplicial) complex consists only of 0-faces and 1-faces, then it is a graph; in other words, the 1-skeleton of a complex is a graph. Associated to each simplicial complex is a collection of non-negative integers $\beta_0(K), \beta_1(K), \ldots$, called the Betti numbers (see footnote 1, and Section 2.1 for detailed definitions), which are a measure of connectivity of the simplicial complex. Informally, the $d$-th Betti number counts the number of $(d+1)$-dimensional holes in the complex or, equivalently, the number of independent non-trivial cycles formed by $d$-faces. Two points to note at the moment are: (i) $\beta_0(K)$ is one less than the number of connected components in the graph formed by 0-faces and 1-faces, and (ii) if the dimension of $K$ (the maximum of the dimensions of its faces) is $d$, then $\beta_j(K) = 0$ for all $j \geq d + 1$.

Footnote 1. We work with reduced Betti numbers defined using field coefficients throughout the article.

The probabilistic model of interest to us is the one introduced by Linial and Meshulam [45] and then extended by Meshulam and Wallach [51]. This model, called the random $d$-complex and denoted by $Y_{n,d}(p)$, consists of all faces on $n$ vertices (i.e., ground set $V = [n] := \{1, \ldots, n\}$) with dimension at most $d - 1$, and each $d$-face is included with probability $p$ independently. $Y_{n,1}(p)$ is the classical Erdős-Rényi graph on $n$ vertices with edge-connection probability $p$. Just as the Erdős-Rényi graph is a mean-field model of pairwise interactions, the

random $d$-complex can be considered as a model of higher-order interactions. Though a simple model, it has spawned a rich literature in recent years [45, 51, 14, 18, 19, 47]. The focus of many studies on random $d$-complexes has been the two non-trivial Betti numbers of the complex: $\beta_{d-1}(\cdot)$ and $\beta_d(\cdot)$. The starting point of our study is the following fine phase transition result for $\beta_{d-1}(Y_{n,d}(p))$.

Lemma 1.2. [61], [41, Theorem 1.10] Fix $d \geq 1$. Consider $Y_{n,d}(p_n)$ with
$$p_n = \frac{d \log n + c - \log(d!)}{n}$$
for some fixed $c \in \mathbb{R}$. Then, as $n \to \infty$, $\beta_{d-1}(Y_{n,d}(p_n)) \Rightarrow \mathrm{Poi}(e^{-c})$, where $\mathrm{Poi}(\lambda)$ stands for the Poisson random variable with mean $\lambda$ and $\Rightarrow$ denotes convergence in distribution.

Additionally, one has that $\beta_{d-1}(Y_{n,d}(p_n)) \to 0$ if $np_n - d \log n \to \infty$ and $\beta_{d-1}(Y_{n,d}(p_n)) \to \infty$ if $np_n - d \log n \to -\infty$. These were proven by Erdős and Rényi [27] in 1959 for $d = 1$, much later by Linial and Meshulam [45] in 2006 for $d = 2$, and shortly thereafter in 2009 for $d \geq 3$ by Meshulam and Wallach [51].

Consider the complex on $n$ vertices having all faces up to dimension $d$. A standard coupling of random graphs, which can be extended to random complexes, is to endow all the $d$-faces with i.i.d. uniform $[0, 1]$ weights and all other faces (those of dimension $d - 1$ or less) with weight $0$. We denote such a weighted random complex by $U_{n,d}$ and call it the uniformly weighted $d$-complex. Let $U_{n,d}(t)$ denote the subcomplex of $U_{n,d}$ consisting of only those faces with weight less than or equal to $t$. Then, clearly, for $t \in [0, 1]$, $U_{n,d}(t)$ has the same distribution as $Y_{n,d}(t)$. Topological considerations imply that $\beta_{d-1}(U_{n,d}(t))$ is a non-increasing step function in $t$, i.e., a.s. a non-increasing jump process. Let $D = \{D_i\}$ (called the death times of the $(d-1)$-persistent diagram) denote the set of jump times of this process. Since the weights are a.s. unique, the jump times have no multiplicity. Then clearly the set $D$ characterises the persistence diagram of $U_{n,d}$ (see Subsection 2.1.2). Defining the scaled point process
$$\mathcal{P}^D_{n,d} := \{nD_i - d \log n + \log(d!)\},$$
we can rephrase Lemma 1.2 as
$$\mathcal{P}^D_{n,d}(c, \infty) \Rightarrow \mathrm{Poi}(e^{-c}),$$
where, for $R \subseteq \mathbb{R}$, $\mathcal{P}^D_{n,d}(R) := |\{i : nD_i - d \log n + \log(d!) \in R\}|$. See Figure 1(a) for a simulation of death times of the $(d-1)$-persistent diagrams of $U_{30,d}(t)$ for $d = 1, \ldots, 4$. One of our main results is that the above distributional convergence can be extended to weak convergence of the corresponding point processes in the vague topology.

Theorem 1.3. As $n \to \infty$, $\mathcal{P}^D_{n,d}$ converges in distribution to $\mathcal{P}_{poi}$, where $\mathcal{P}_{poi}$ is the Poisson point process with intensity $e^{-x}\,dx$ on $\mathbb{R}$.
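For $d = 1$, the objects above can be simulated directly. The sketch below assumes, as recalled at the start of this introduction for weighted graphs, that the death times of $U_{n,1}$ are the edge weights of the minimal spanning tree of the complete graph with i.i.d. uniform $[0, 1]$ edge weights. It computes them with Kruskal's algorithm and applies the scaling $nD_i - \log n$, which is the case $d = 1$ of the scaling above. The function names are ours, and a single run only illustrates the scaling; it does not verify Theorem 1.3.

```python
import math
import random

def mst_edge_weights(n, weight):
    """Kruskal with union-find; returns MST edge weights in increasing order."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = []
    for (u, v), w in sorted(weight.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:          # the edge merges two components: a death time
            parent[ru] = rv
            chosen.append(w)
    return chosen

def scaled_death_times(n, rng):
    """Scaled death times n * D_i - log(n) for the uniformly weighted complete graph."""
    weight = {(u, v): rng.random() for u in range(n) for v in range(u + 1, n)}
    return [n * t - math.log(n) for t in mst_edge_weights(n, weight)]

def count_above(points, c):
    """Number of scaled points in the interval (c, infinity)."""
    return sum(1 for x in points if x > c)
```

Repeating `count_above(scaled_death_times(n, rng), c)` over many independent runs gives an empirical distribution that can be compared with $\mathrm{Poi}(e^{-c})$ for large $n$.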

Theorem 1.3 gives convergence of the extremal points of the persistence diagram of a uniformly weighted $d$-complex. We say extremal points because only the points near the maximum contribute to the scaled point process, and the scaling maps many points in the persistence diagram to $-\infty$. A weak convergence result as above, along with the continuous mapping theorem, yields the asymptotic distribution of various statistics of $\mathcal{P}^D_{n,d}$. For example, one can infer the asymptotic joint distribution of the largest $m$ points in the persistence diagram, whereas Lemma 1.2 gives only the one-dimensional marginal distributions. Further, our results might be useful in deriving asymptotics for other summary statistics of persistence diagrams such as persistence landscapes [8], homological scaffolds [56] or the accumulative persistence function [5].

[Figure 1. Panel titles: "Linial-Meshulam Death Times for 30 points" (axes: Time, Dimension) and "Erdős-Rényi Persistence Diagram for 50 points" (legend: Dimension = 0, ..., 5).]

Figure 1. (a) The death times for the uniformly weighted random $d$-complex built on 30 points in different dimensions. (b) The persistence diagram for an Erdős-Rényi clique complex for 50 points.

An important paradigm in topological data analysis is that extremal points of a persistence diagram encode meaningful topological information about the underlying structure. Viewed in this light, our result completely characterizes the extremal behaviour of the persistence diagram of the uniformly weighted $d$-complex. We would like to emphasize that, to the best of our knowledge, this is the first such complete characterization of the extremal persistence diagram of a model of random complexes in higher dimensions and, to nobody's surprise, it is for the simplest model of a random complex. For the case $d = 1$, which corresponds to $\beta_0$, Theorem 1.3 for a model of random geometric graphs can be deduced from the results of [54, 35]. As with many other scaling limit results, one would expect our result to hold for many other random complex models such as random clique complexes [39], random geometric complexes [6], etc. But this is beyond the scope of the current paper. As for our proof, it mainly involves an application of the factorial moment method to show convergence of the point process of largest nearest face distances, and using this to approximate the extremal persistence diagram of the uniformly weighted $d$-complex.
Though such an approach should not surprise someone familiar with the proof techniques of [45, 51, 41], the theorem significantly extends the connection between nearest neighbour distances and connectivity of random graphs to random complexes. We discuss important extensions of these results a little later. Having so far discussed persistence diagrams and nearest face distances, we now discuss the more novel component of the paper.

Minimal Spanning Acycle: A spanning tree of a graph on a vertex set $V$ is easily described in topological terms as a set of edges $S$ such that $\beta_0(V \cup S) = \beta_1(V \cup S) = 0$, i.e.,

$V \cup S$ is connected and has no cycles. As a higher-dimensional generalization, the following definition due to Kalai [42] is natural.

Definition 1.4 (Spanning acycle). Consider a complex $K$ of dimension at least $d$, $d \geq 1$. A subset $S$ of $d$-faces is called a spanning acycle if $\beta_{d-1}(K^{d-1} \cup S) = 0$ (spanning) and $\beta_d(K^{d-1} \cup S) = 0$ (acyclicity). A subset $S$ of $d$-faces is called a maximal acycle if it is an acycle and maximal w.r.t. inclusion of $d$-faces.

Though the definition of a spanning acycle appears natural and seems to merely replace the appropriate indices in the definition of a spanning tree, what is not obvious is that this is a good higher-dimensional generalization of a spanning tree and enjoys the many nice properties that spanning trees do. An algebraic description of a spanning tree is that it is a set of columns that form a basis for the column space of the incidence matrix or boundary matrix, i.e., the matrix $\partial_1$ whose rows are indexed by vertices and columns by edges such that the $(i, j)$-th entry is $1$ if the vertex $i$ belongs to the edge $j$ and $0$ otherwise (see footnote 2). Such a description (detailed in Section B) also holds for spanning acycles and underpins many of our proof ideas even though it is not mentioned explicitly. Further, once we assign weights to faces, that is, once we consider a weighted complex $K$, one can naturally define a minimal spanning acycle. Since we deal only with finite complexes, the existence of a minimal spanning acycle is guaranteed once a spanning acycle exists. Though Kalai's definition of a spanning acycle and his enumeration of the number of spanning acycles (a generalization of Cayley's formula for spanning trees) are more than three decades old, they have received increased attention in the last few years [4, 24, 36, 43, 33, 34, 48, 46, 50]. Our second significant contribution is to add to this burgeoning literature on spanning acycles and the nascent literature on random spanning acycles.
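The algebraic description above suggests a greedy procedure, sketched below over $\mathbb{Z}_2$. This is an assumption-laden illustration rather than the algorithm of Section 3.2: it assumes the complex contains every $(d-1)$-face on the vertex set $\{0, \ldots, n-1\}$, takes a hypothetical dictionary `weight` from $d$-faces (sorted vertex tuples) to weights, and keeps a $d$-face exactly when its boundary column is linearly independent of those already kept. The output is a maximal acycle of minimal weight, which is a spanning acycle whenever one exists.

```python
from itertools import combinations

def boundary_mask(face, index):
    """Z_2 boundary of a d-face, encoded as a bitmask over the (d-1)-faces."""
    mask = 0
    for sub in combinations(face, len(face) - 1):
        mask ^= 1 << index[sub]
    return mask

def minimal_spanning_acycle(n, d, weight):
    """Greedy sketch: scan d-faces by increasing weight, keep independent boundaries."""
    # Index all (d-1)-faces of the complete (d-1)-skeleton on n vertices.
    index = {s: i for i, s in enumerate(combinations(range(n), d))}
    pivots = {}   # highest set bit -> reduced boundary vector with that pivot
    chosen = []
    for face in sorted(weight, key=weight.get):
        v = boundary_mask(face, index)
        while v:
            p = v.bit_length() - 1
            if p not in pivots:      # independent of the kept columns: keep the face
                pivots[p] = v
                chosen.append(face)
                break
            v ^= pivots[p]           # eliminate the pivot and continue reducing
        # if v reduced to 0, the face would close a d-cycle and is skipped
    return chosen
```

For $d = 1$ this reduces to Kruskal's algorithm on a graph. On the complete $d$-complex the output has $\binom{n-1}{d}$ faces, the rank of the boundary matrix $\partial_d$.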
Firstly, we list a number of basic properties of (minimal) spanning acycles analogous to those known for (minimal) spanning trees. We expect this enumeration of properties (see Section 3.1) - existence, uniqueness, cut property, cycle property, exchange property - to be useful in further research on (minimal) spanning acycles. We also present the simplicial Kruskal's and Jarník-Prim-Dijkstra algorithms (see Section 3.2) to generate minimal spanning acycles. We believe that many of the properties we have enumerated - especially Prim's algorithm and Lemma 3.25 - are important steps towards the study of minimal spanning acycles on infinite complexes and, in particular, Euclidean minimal spanning acycles. For analogous results about minimal spanning trees, see [49, Chapter 11], and for the Euclidean case, refer to [1, 2]. Though many of the above properties can be proved independently using matroid theory ([64, 53]), and such a viewpoint has been beneficial to the study of spanning acycles (see [4, 34] and in particular [43, Section 4]), we provide self-contained proofs here using tools from combinatorial topology. It is possible that some of these results have been implicitly used in the literature, but we have not been able to find explicit mention of these properties elsewhere.

Footnote 2. For simplicity, we are assuming our underlying field $\mathbb{F} = \mathbb{Z}_2$ here, i.e., all vector spaces involved are $\mathbb{Z}_2$-vector spaces.

We now highlight an inclusion-exclusion identity which relies heavily on the Mayer-Vietoris exact sequence from algebraic topology. If $\gamma_d(K)$ were to denote the cardinality of

a spanning acycle and $K_1, K_2$ are two finite complexes such that $K_1 \cup K_2$ is of dimension at least $d$, then
$$\gamma_d(K_1) + \gamma_d(K_2) = \gamma_d(K_1 \cup K_2) + \gamma_d(K_1 \cap K_2) + \beta(\ker \nu_{d-1}),$$
where the homomorphism $\nu_{d-1}$ (or linear map in this case) is precisely defined in Theorem 3.7, $\ker$ stands for kernel and $\beta(\cdot)$ denotes the rank. The matroidal property of spanning acycles shall only yield an inequality. Higher-dimensional connectivity can be studied via hypergraphs as well, and we briefly discuss this relation with hypergraph connectivity in Section 3.2, in particular showing that if the $(d-1)$-faces of the complex $K$ are hypergraph connected, so is the spanning acycle. This also cannot be studied using matroid properties alone.

Persistent Diagrams and Minimal Spanning Acycles. Having introduced minimal spanning acycles and discussed their basic properties, we now preview their connection to persistence diagrams. Let $K$ be a weighted complex with real-valued injective (see footnote 3) weight function $w$ such that $K(t) := w^{-1}(-\infty, t]$ is a simplicial complex for all $t \in \mathbb{R}$. As $K(t) \subseteq K(t')$ whenever $t \leq t'$, $w$ induces the filtration $\{K(t) : t \in \overline{\mathbb{R}}\}$ on $K$ (see Subsection 2.1.2), where $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$. Let $d \geq 0$ be arbitrary and suppose that $\beta_d(K) = 0$. Let $\beta_d(t) = \beta_d(K(t))$. We remark that $\beta_d(t)$ is a jump function. The times of positive jumps are the birth times $B = \{B_i\}$ of the persistence diagram, and the times of negative jumps are the death times $D = \{D_i\}$ of the persistence diagram, as described earlier. A persistent diagram (or persistent homology) does more than merely keep track of death and birth times; in particular, it considers the (complex) pairings of birth times with their "corresponding" death times. (For example, see Figure 1(b) for a persistence diagram of Erdős-Rényi clique complexes.) In this article we shall focus only on their two projections - birth and death times.

An easy consequence of the above informal description, which can be justified via Fubini's theorem, is the following identity for the lifetime sum of persistence diagrams:
$$L_d := \sum_i (D_i - B_i) = \int_0^\infty \beta_d(t)\,dt. \tag{1.1}$$
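The Fubini step behind (1.1) can be spelled out as follows. This is a sketch under the assumptions that every birth time is non-negative, that each birth time $B_i$ is paired with a finite death time $D_i$ (which is where $\beta_d(K) = 0$ enters), and that $\beta_d(t)$ counts the pairs alive at time $t$. The indicator notation $\mathbf{1}[\cdot]$ is ours.

```latex
\begin{align*}
\beta_d(t) &= \sum_i \mathbf{1}[B_i \le t < D_i], \\
\int_0^\infty \beta_d(t)\,dt
  &= \int_0^\infty \sum_i \mathbf{1}[B_i \le t < D_i]\,dt
   = \sum_i \int_0^\infty \mathbf{1}[B_i \le t < D_i]\,dt
   = \sum_i (D_i - B_i).
\end{align*}
```

The interchange of sum and integral is where Fubini's theorem (or, for finite complexes, plain linearity of the integral) is used.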

Assume that $\beta_{d-1}(K) = \beta_{d-2}(K) = 0$. This assures us that $M_d$ and $M_{d-1}$, respectively the $d$-th and $(d-1)$-th minimal spanning acycles, exist and are unique (see Lemmas 3.6 and 2.6). Under the above conditions, we have one of the important results of the paper (Theorem 3.23):
$$D_d = \{w(\sigma) : \sigma \in M_d\}, \qquad B_{d-1} = \{w(\sigma) : \sigma \in K^{d-1} \setminus M_{d-1}\}, \tag{1.2}$$

where $B_{d-1}$ and $D_d$ are respectively the birth and death times of the $H_{d-1}(\cdot)$ persistence diagram of $K$. The simplicial version of Kruskal's algorithm and the incremental algorithm for persistence (which is also a greedy algorithm) are the crucial tools in the above proof.

Footnote 3. Our results hold for non-injective monotonic weight functions as well, but birth and death times need to be defined appropriately.

As a corollary of the above, we derive the following relation:
$$L_{d-1} = \int_0^\infty \beta_{d-1}(t)\,dt = \sum_{\sigma \in M_d} w(\sigma) - \sum_{\sigma \in K^{d-1} \setminus M_{d-1}} w(\sigma) = w(M_d) + w(M_{d-1}) - w(\mathcal{F}^{d-1}), \tag{1.3}$$

where $w(S) = \sum_{\sigma \in S} w(\sigma)$ for a subset $S$ of simplices. For $d = 1$ (assuming $K^0 \subset w^{-1}(0)$), the above relation can be derived from Kruskal's algorithm and, for $d \geq 2$, this relation was derived recently in [33, Theorem 1.1] using different techniques. This latter paper and, in particular, its derivation of (1.3) served as our stimulus to investigate minimal spanning acycles. Apart from the striking simplicity of the result connecting minimal spanning acycles to persistence diagrams, we believe the relation can be useful in studying either of them using the other. Since much of the complication in understanding persistent homology arises from the complex pairing of birth and death times, the above result is useful in understanding death or birth times individually and, in certain cases, this shall yield useful information (e.g. the lifetime sum) even without knowledge of the pairings. As a trivial corollary to Theorem 1.3, we can deduce point process convergence of extremal weights of a minimal spanning acycle on uniformly weighted $d$-complexes. To the best of our knowledge, such a result (though not surprising) is not known even for complete graphs with i.i.d. uniform $[0, 1]$ weights, which might be considered as a mean-field model for random metric spaces. Again for random geometric graphs, such a point process convergence result for extremal edge weights of the minimal spanning tree has been proven in [54, 35]. These results were important in understanding connectivity of random geometric graphs. However, here the scenario has been reversed. We have gone from results on connectivity (i.e., $H_k(\cdot)$ persistence diagrams) to those for minimal spanning acycles.

Stability Results. Stability results (e.g. [25, Section VIII.2], [11, 12, 15, 16]) are an important cog in the wheel of topological data analysis and provide a theoretical justification for the robustness of persistent homology. While $L_\infty$ stability (or bottleneck stability) is the most standard form of stability proven for persistence diagrams, $L_p$ stability for $p \geq 0$ requires restrictive assumptions and is not widely applicable. Using the simplicial version of Kruskal's algorithm and the correspondence (1.2), we prove the following stability result separately for birth and death times with minimal assumptions.

Theorem 1.5. Let $K$ be a finite complex with two weight functions $f, g$, both of which induce a filtration on $K$. Let $B_f = \{B_i\}$, $D_f = \{D_i\}$, $B_g = \{B_i'\}$, $D_g = \{D_i'\}$ be the respective birth and death times in the $H_d(\cdot)$ and $H_{d-1}(\cdot)$ persistence diagrams of $f, g$ respectively. Let $\Pi_D$ be the set of bijections from $D_f$ to $D_g$ and, similarly, $\Pi_B$ the set of bijections from $B_f$ to $B_g$. Then for any $p \in \{0, \ldots, \infty\}$,
$$\max\left\{ \inf_{\pi \in \Pi_D} \sum_i |D_i - \pi(D_i')|^p,\ \inf_{\pi \in \Pi_B} \sum_i |B_i - \pi(B_i')|^p \right\} \leq \sum_{\sigma \in \mathcal{F}^d} |f(\sigma) - g(\sigma)|^p.$$
For $p = \infty$ and a sequence $\{x_i\}_{i \geq 1}$, in the usual manner $\sum_i |x_i|^p$ should be read as $\sup_i |x_i|$.
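Theorem 1.5 can be illustrated numerically in the simplest case $d = 1$, for death times only. The sketch assumes, as in (1.2), that the $H_0$ death times of a complete graph with vertex weights $0$ are its minimal spanning tree edge weights, and that on the real line the sorted matching attains the infimum over bijections for $p = 1$ and $p = \infty$. The helper names and the form of the perturbation are ours.

```python
import random

def mst_weights(n, weight):
    """Sorted edge weights of a minimal spanning tree (Kruskal, union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for (u, v), w in sorted(weight.items(), key=lambda kv: kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            kept.append(w)
    return kept

def stability_gap(n, noise, seed=0):
    """Return (lhs_sup, rhs_sup, lhs_l1, rhs_l1) for weights f and a perturbation g."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    f = {e: rng.random() for e in edges}
    # Non-negative noise keeps g monotone with respect to the zero vertex weights.
    g = {e: f[e] + rng.uniform(0.0, noise) for e in edges}
    df, dg = mst_weights(n, f), mst_weights(n, g)
    lhs_sup = max(abs(a - b) for a, b in zip(df, dg))   # p = infinity, sorted matching
    rhs_sup = max(abs(f[e] - g[e]) for e in edges)
    lhs_l1 = sum(abs(a - b) for a, b in zip(df, dg))    # p = 1, sorted matching
    rhs_l1 = sum(abs(f[e] - g[e]) for e in edges)
    return lhs_sup, rhs_sup, lhs_l1, rhs_l1
```

In each returned pair, the left-hand quantity should not exceed the right-hand one, in line with the theorem.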

As part of the proof of the above stability result, we show that changing the weights of $m$ ($m \geq 1$) faces can change at most $m$ death times and $m$ birth times by the difference between the weights on the faces. One might suspect that the $L_\infty$ stability in the above theorem can be deduced from bottleneck stability of persistence diagrams by a projection argument. We would like to point out that this is not the case. This is mainly due to the fact that the diagonal plays a special role in the definition of bottleneck stability of persistence diagrams, whereas there is no such equivalent for the bottleneck distance between point processes on $\mathbb{R}$.

We explore consequences of the above stability result in the context of weighted random complexes. In particular, we consider the generically weighted $d$-complex $L'_{n,d}$ whose $d$-faces $\sigma$ have weights $\phi(\sigma) = w(\sigma) + \epsilon_n(\sigma)$, where the $w(\sigma)$ are i.i.d. with a strictly increasing and Lipschitz continuous distribution and the $\epsilon_n(\sigma)$ are an arbitrary collection of random variables. Regardless of any dependence properties of the $\epsilon_n(\sigma)$, we show that if $n \sup_\sigma |\epsilon_n(\sigma)| \to 0$ in probability, then the point process of extremal face-weights of the minimal spanning acycle (resp. death times) converges again to the Poisson point process with intensity $e^{-x}\,dx$ on $\mathbb{R}$. In other words, the conclusion of Theorem 1.3 holds for both death times and face-weights of the minimal spanning acycles under the perturbed set-up, provided the perturbations decay sufficiently fast. We also mention special cases where the above condition can be verified easily.

Reconsider the uniformly weighted $d$-complex $U_{n,d}$. In this case, we have that the lifetime sum $L_{n,1}$ (see (1.1)) is equal to the weight of the minimal spanning tree (see (1.3)). A celebrated result in the theory of minimal spanning trees by Frieze ([29]) shows that
$$L_{n,1} \xrightarrow{\text{a.s.}} \zeta(3) = \sum_{k=1}^{\infty} k^{-3} = 1.202\ldots, \tag{1.4}$$

where $\zeta(\cdot)$ is Riemann's zeta function. In higher dimensions, Hiraoka-Shirai [33, Theorem 1.2] showed recently that $E[L_{n,d}] = \Theta(n^{d-1})$. The existence of a limit and its value still remain open for $d \geq 2$. As a consequence of our stability result (Theorem 1.5), we trivially get that the above results of Frieze and Hiraoka-Shirai hold in expectation for the noisy random complex $L'_{n,d}$ provided $\sup_\sigma E[\epsilon_n(\sigma)] = o(n^{-2})$. While it is believed that introducing weak dependencies between the random variables will not affect the asymptotics, it is often not easy to prove such a statement rigorously. In general, our results shall help one to easily establish limit theorems for a "noisy" version of any random complex model once they have been established for the random complex model without noise. For example, in [33, Theorem 6.10], an upper and a lower bound for the expected lifetime were shown. It is possible to extend these bounds to a suitable noisy version of the random clique complexes.

Footnote 4. In our effort to make this article reasonably self-contained as well as accessible to the trio of probabilists, combinatorialists and topologists, we might have erred on the side of details rather than terseness.

Organisation of Paper: The next section - Section 2 - gives in detail the necessary topological (Section 2.1) and probabilistic preliminaries (Section 2.2) (see footnote 4). Section 3 is exclusively devoted to studying various properties of minimal spanning acycles (Section 3.1), algorithms

to generate them (Section 3.2), their connection to persistence diagrams (Section 3.3) and a crucial stability result (Section 3.4). In the next two sections, we study weighted simplicial complexes wherein the weights have been generated randomly and prove our point process convergence results. In Section 4, the weights are independent and identically distributed (i.i.d.) uniform $[0, 1]$ random variables and in Section 5, the weights are either i.i.d. with a more general distribution $F$ or a perturbation of the same. The appendix collects results regarding point process convergence in the vague topology, gives a matrix-theoretic perspective of spanning acycles and explains a cohomological bound needed for a self-contained proof of Theorem 1.3.

2. Preliminaries

We describe here the basic notions of simplicial homology and persistent homology. Later, in Section C, we also introduce simplicial cohomology in order to keep the paper self-contained and, in Section B, rephrase our topological notions in the language of matrices for ease of understanding. After the topological preliminaries are covered here, we review notions regarding point processes and their weak convergence.

2.1. Topological Notions.

We restrict ourselves to studying simplicial complexes and shall always choose our coefficients from a field $\mathbb{F}$. In this regard, $0$ stands for the additive identity, $1$ stands for the multiplicative identity and $-1$ for the additive inverse of $1$. An often convenient choice in computational topology is $\mathbb{F} = \mathbb{Z}_2$, in which case $1 = -1$.

2.1.1. Simplicial Homology.

This provides an algebraic tool to study the topology of a simplicial complex. For a good introduction to algebraic topology, see [31], and for an easy introduction to simplicial complexes, see [25]. A more detailed account of simplicial homology can be found in [52]. Let $K$ be a simplicial complex (see Definition 1.1). We assume throughout that all our simplicial complexes are defined over a finite set $V$. Further, we denote the cardinality of $\mathcal{F}^d(K)$ by $f_d(K)$. The 0-faces of $K$ are also called vertices. When obvious, we shall omit the reference to the underlying complex $K$ in the notation. A $d$-simplex $\sigma$ is often represented as $[v_0, \ldots, v_d]$ to explicitly indicate the subset of $V$ generating the simplex $\sigma$. An orientation of a $d$-simplex is given by an ordering of the vertices and denoted by $[v_0, \ldots, v_d]$. Two orderings induce the same orientation if and only if they differ by an even permutation of the vertices. In other words, for a permutation $\pi$ on $[d]$,
$$[v_0, \ldots, v_d] = (-1)^{\mathrm{sgn}(\pi)} [v_{\pi(0)}, \ldots, v_{\pi(d)}].$$
We assume that each simplex in our complex is assigned a specific orientation (i.e., ordering).

Let $\mathbb{F}$ be a field. A simplicial $d$-chain is a formal sum of oriented $d$-simplices $\sum_i c_i \sigma_i$ with the coefficients $c_i \in \mathbb{F}$. The free abelian group generated by the $d$-chains is denoted by $C_d(K)$, the $d$-th chain group, i.e.,
$$C_d(K) := \left\{ \sum_i c_i \sigma_i : c_i \in \mathbb{F},\ \sigma_i \in \mathcal{F}^d(K) \right\}.$$

Clearly $C_d(K)$ is an $\mathbb{F}$-vector space. We shall set $C_{-1} = \mathbb{F}$ and $C_d = 0$ for $d = -2, -3, \ldots$. For a vector space, we shall denote the rank by $\beta(\cdot)$. Thus, $\beta(C_d) = f_d$ for $d \geq 0$. For $d \geq 1$, we define the boundary operator $\partial_d : C_d \to C_{d-1}$ first on each $d$-simplex as below and then extend it linearly to $C_d$:
$$\partial_d([v_0, \ldots, v_d]) = \sum_{i=0}^{d} (-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_d],$$
where $\hat{v}_i$ denotes that $v_i$ is to be omitted. $\partial_0$ is defined by setting $\partial_0([v]) = 1$ for all $v \in \mathcal{F}^0$. It can be verified that $\partial_d$ is a linear map of vector spaces and, more importantly, that $\partial_{d-1} \circ \partial_d = 0$ for all $d \geq 1$, i.e., the boundary of a boundary is zero. When the context is clear, we will drop the dimension $d$ from the subscript of $\partial_d$.

Note that the free abelian group of $d$-chains is defined only using $\mathcal{F}^d(K)$. When we use a subset $S \subset \mathcal{F}^d(K)$ of $d$-faces rather than the entire collection of $d$-faces to generate the free abelian group, we shall call the corresponding free abelian (sub)group of $d$-chains $C_d(S)$. In other words, $C_d(S) = C_d(K^{d-1} \cup S)$. Also, for a $d$-chain $\sum_i a_i \sigma_i$, the support, denoted $\mathrm{supp}(\sum_i a_i \sigma_i)$, is $\{\sigma_i : a_i \neq 0\} \subset \mathcal{F}^d$. For $\mathbb{F} = \mathbb{Z}_2$, the support characterizes the $d$-chain.

The $d$-th boundary space, denoted by $B_d$, is $\mathrm{im}\, \partial_{d+1}$, and the $d$-th cycle space $Z_d$ is $\ker \partial_d$. Elements of $Z_d$ are called cycles or, to be more specific, $d$-cycles. The $d$-dimensional (reduced; see footnote 5) homology group is then defined as the quotient group
$$H_d = \frac{Z_d}{B_d}. \tag{2.1}$$

Again, since we are working with field coefficients, $B_d$, $Z_d$ and $H_d$ are all $\mathbb{F}$-vector spaces. The $d$-th Betti number of the complex, $\beta_d(K)$, is defined to be the rank of the vector space $H_d$. Let $b_d(K) := \beta(B_d)$ and $z_d(K) := \beta(Z_d)$ denote the ranks of the $d$-th boundary and $d$-th cycle spaces respectively. Thus we have that $\beta_d = z_d - b_d$. Note that we drop the adjective reduced henceforth, but all our homology groups and Betti numbers are indeed reduced ones. Some authors prefer to use $\tilde{H}_d$ and $\tilde{\beta}_d$ to denote reduced homology groups and Betti numbers respectively, but we refrain from doing so for notational convenience. However, under such a notation, we note that $\beta_d - \tilde{\beta}_d = \mathbf{1}[d = 0]$. This gives an easy way to translate results for reduced Betti numbers to Betti numbers and vice versa. We denote the Euler-Poincaré characteristic by $\chi$, and the Euler-Poincaré formula holds as follows:
$$\chi(K) = \sum_{j=0}^{\infty} (-1)^j f_j(K) = 1 + \sum_{j=0}^{\infty} (-1)^j \beta_j(K). \tag{2.2}$$
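The definitions of this subsection translate into a few lines of linear algebra over $\mathbb{F} = \mathbb{Z}_2$. The sketch below is illustrative: `rank_gf2`, `reduced_betti` and `euler_characteristic` are names introduced here, faces are given as vertex tuples, the input is assumed to be a valid simplicial complex, and boundary columns are stored as integer bitmasks. It computes $\beta_d = z_d - b_d$ with the reduced convention $\partial_0([v]) = 1$ and lets one check (2.2) on examples.

```python
from itertools import combinations

def rank_gf2(vectors):
    """Rank over Z_2 of vectors given as integer bitmasks (Gaussian elimination)."""
    pivots = {}
    for v in vectors:
        while v:
            p = v.bit_length() - 1
            if p not in pivots:
                pivots[p] = v
                break
            v ^= pivots[p]
    return len(pivots)

def reduced_betti(faces):
    """Reduced Betti numbers [beta_0, ..., beta_top] of a simplicial complex over Z_2."""
    K = {tuple(sorted(s)) for s in faces}
    top = max(len(s) for s in K) - 1
    by_dim = {d: sorted(s for s in K if len(s) == d + 1) for d in range(top + 1)}
    # Reduced convention: the map from vertices to C_{-1} = F has rank 1.
    ranks = {0: 1 if by_dim[0] else 0}
    for d in range(1, top + 1):
        index = {s: i for i, s in enumerate(by_dim[d - 1])}
        cols = []
        for s in by_dim[d]:
            m = 0
            for sub in combinations(s, d):   # the (d-1)-faces of the d-face s
                m ^= 1 << index[sub]
            cols.append(m)
        ranks[d] = rank_gf2(cols)
    ranks[top + 1] = 0
    # beta_d = z_d - b_d = (f_d - rank of boundary_d) - rank of boundary_{d+1}
    return [len(by_dim[d]) - ranks[d] - ranks[d + 1] for d in range(top + 1)]

def euler_characteristic(faces):
    """Alternating sum of the face counts f_j."""
    return sum((-1) ** (len(s) - 1) for s in {tuple(sorted(s)) for s in faces})
```

For the hollow triangle (three vertices, three edges) this gives reduced Betti numbers `[0, 1]` and Euler characteristic `0`, in agreement with (2.2).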

An important property of homology groups that is often of use is the following: If $K_1, K_2$ are two complexes such that the function $h : K_1^0 \to K_2^0$ is a simplicial map (i.e., $\sigma_1 = [v_0, \ldots, v_d] \in K_1$ implies that $h(\sigma_1) = [h(v_0), \ldots, h(v_d)] \in K_2$ for all $d \geq 0$), then there exists a homomorphism $h_* : H_d(K_1) \to H_d(K_2)$, called the induced homomorphism between the homology groups. One of the natural simplicial maps is the inclusion map from a complex $K_1$ to $K_2$ such that $K_1 \subset K_2$.

Footnote 5. Reduced is used to refer to the convention that $C_{-1} = \mathbb{F}$ instead of $C_{-1} = 0$.

One of the tools from algebraic topology that we use is the Mayer-Vietoris sequence. Below we state it explicitly, along with its implication for Betti numbers. A sequence of vector spaces $V_1, \ldots, V_l$ and linear maps $\nu_i : V_i \to V_{i+1}$, $i = 1, \ldots, l - 1$, is said to be exact if $\mathrm{im}\, \nu_i = \ker \nu_{i+1}$ for all $i = 1, \ldots, l - 2$.

Lemma 2.1. (Mayer-Vietoris Sequence [52, Theorem 25.1], [22, Corollary 2.2].) Let $K_1$ and $K_2$ be two finite simplicial complexes and $K_0 = K_1 \cap K_2$ (i.e., $K_0$ is the complex formed from all the simplices that are in both $K_1$ and $K_2$). Then the following are true:
(1) The following is an exact sequence and, furthermore, the homomorphisms $\nu_d$ are induced by the respective inclusions:
$$\cdots \to H_d(K_0) \xrightarrow{\nu_d} H_d(K_1) \oplus H_d(K_2) \to H_d(K_1 \cup K_2) \to H_{d-1}(K_0) \xrightarrow{\nu_{d-1}} H_{d-1}(K_1) \oplus H_{d-1}(K_2) \to \cdots$$
(2) Furthermore,
$$\beta_d(K_1 \cup K_2) = \beta_d(K_1) + \beta_d(K_2) + \beta(\ker \nu_d) + \beta(\ker \nu_{d-1}) - \beta_d(K_0). \tag{2.3}$$

2.1.2. Persistent Homology. A filtration of a simplicial complex $K$ is a sequence of subcomplexes $\{K(t) : t \in \mathbb{R}\}$ satisfying $\emptyset = K(-\infty) \subseteq K(t_1) \subseteq K(t_2) \subseteq K(\infty) = K$ for all $-\infty < t_1 \le t_2 < \infty$. Put differently, the filtration $\{K(t) : t \in \mathbb{R}\}$ describes how to build $K$ by adding collections of simplices at a time. Persistent homology provides a formal tool to understand how the topology of a filtration evolves as the filtration parameter changes. For a more complete introduction and survey of persistent homology, see [25, 9, 10].

We now describe the natural filtration associated with weighted simplicial complexes. Consider a simplicial complex $K$ weighted by $w : K \to \mathbb{R}$ satisfying $w(\sigma) \le w(\tau)$ whenever $\sigma, \tau \in K$ and $\sigma \subset \tau$. Functions having this property are called monotonic functions in [25, Chapter VIII]. As $w$ is monotone, $\{K(t) : t \in \mathbb{R}\}$ with $K(t) := w^{-1}(-\infty, t]$ forms a sublevel set filtration of $K$. Further note that $w$ induces a partial order on the faces of $K$. Assuming the axiom of choice, this partial order can always be extended to a total order [62] such that the filtration property is preserved. Let [...]

[...] $\epsilon > 0$, there exists $\delta > 0$ (depending on $m$, $\epsilon$) such that $d_B(m_1, m) \le \delta$ implies that $d_v(m_1, m) < \epsilon$. Set $\delta = d_B(m_1, m)$ and, without loss of generality, assume that $\delta < 1$. Further let $\gamma : \mathrm{supp}(m) \to \mathrm{supp}(m_1)$ be the bijection such that $\max_{x \in \mathrm{supp}(m)} |x - \gamma(x)| \le \delta$. Now, the following is enough to prove the above claim: For a given $\epsilon > 0$, there exist a constant $\lambda_\epsilon$ and a compact set $K_\epsilon$ (depending on $\epsilon$ alone) such that the following bound holds:

\[ d_v(m_1, m) \le 2 \lambda_\epsilon\, m(K_\epsilon)\, \delta + \frac{\epsilon}{2}. \tag{A.2} \]

To prove the above bound, first note that for a given $\epsilon > 0$, from (A.1) we can choose $k$ such that the following holds:

\[ d_v(m_1, m) \le \sum_{j=1}^{k} |m_1(h_j) - m(h_j)| + \frac{\epsilon}{2}. \]

Let us set $K_j$ to be the compact support of $h_j$ and $\lambda_j$ to be the Lipschitz constant. By the definition of Bottleneck distance, we have that for any compact set $K$

\[ m_1(K) \le m(K^{\delta}) \le m_1(K^{2\delta}), \tag{A.3} \]

where $K^{\delta} := \{x \in \mathbb{R} : \exists y \in K \text{ s.t. } |x - y| \le \delta\}$. Fix a $j \in \{1, \ldots, k\}$ and set $M = \mathrm{supp}(m)$, $M_1 = \mathrm{supp}(m_1)$. Now by definition of $m(h_j)$ and that $h_j$ is a $\lambda_j$-Lipschitz continuous function supported on $K_j$, we have that

\[ |m_1(h_j) - m(h_j)| = \Big| \sum_{x \in M} h_j(x) - \sum_{x \in M_1} h_j(x) \Big| = \Big| \sum_{x \in M} [h_j(x) - h_j(\gamma(x))] \Big| \le \lambda_j [m(K_j) + m_1(K_j)] \delta \le 2 \lambda_j\, m(K_j^{1})\, \delta, \]

where in the last inequality we have used (A.3) and the fact that $\delta < 1$. Now setting $\lambda_\epsilon = \sum_{j=1}^{k} \lambda_j$ and $K_\epsilon = \cup_{j=1}^{k} K_j^{1}$, we get the desired bound (A.2).

Appendix B. Matrix theoretic perspective

Since we are concerned only with field coefficients, all the groups in consideration are vector spaces as mentioned and the homomorphisms are all linear maps. This naturally calls for rephrasing the above notions in the language of matrices which shall perhaps make it more tractable for many readers. Such a viewpoint shall also bring in obvious advantages in understanding minimal spanning acycles. For simplicity, we shall stick to $F = \mathbb{Z}_2$ in this subsection. We denote the $f_{d-1} \times f_d$ dimensional matrix corresponding to the boundary operator $\partial_d$ by $I_d = (I_d(ij))_{1 \le i \le f_{d-1},\, 1 \le j \le f_d}$. The columns in this matrix correspond to the $d$-faces and rows to the $(d-1)$-faces. Then the matrix entry $I_d(ij) = 1$ if the $i$-th $(d-1)$-face is in the boundary of the $j$-th $d$-face, else $I_d(ij) = 0$. From the discussions in Section 2.1, we get

\[ \beta(Z_d) = \mathrm{nullity}(\partial_d), \quad \beta(B_d) = \mathrm{rank}(\partial_{d+1}), \quad \beta_d(K) = \mathrm{nullity}(\partial_d) - \mathrm{rank}(\partial_{d+1}). \]

With $d$-simplices considered as columns of the matrix $I_d$, the columns corresponding to the $d$-faces in a $d$-acycle $S$ (i.e., $\beta_d(K^{d-1} \cup S) = 0$) are linearly independent. Thus, when a $d$-spanning acycle $S$ exists, the columns corresponding to the $d$-faces in $S$ form a basis for the column space of $I_d$. In other words, a spanning acycle is nothing but the columns corresponding to a basis for the column space of $I_d$. Due to this discussion on acycles as linearly independent columns of the boundary matrix, it follows that the collection of acycles of a simplicial complex is a matroid (see [53, 64]). Since we shall not exploit this connection in this article, we do not want to elaborate further on this. However, this connection to matroids is of utmost importance in understanding spanning acycles.
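The column-basis description above suggests a greedy procedure: scan the $d$-faces in increasing order of weight and keep a face whenever its column of $I_d$ is linearly independent (over $\mathbb{Z}_2$) of the columns already kept. The following is a minimal sketch in the spirit of the simplicial Kruskal's algorithm, not a transcription of the paper's pseudocode. The function name `minimal_spanning_acycle`, the bitmask encoding of columns and the example weights are assumptions made here for illustration. The output is a minimum-weight column basis, which is a spanning acycle only when one exists.

```python
from itertools import combinations


def minimal_spanning_acycle(d_faces, lower_faces, weight):
    """Greedy (Kruskal-style) choice of d-faces with independent columns over Z_2.

    d_faces: list of sorted vertex tuples of size d + 1
    lower_faces: list of all (d-1)-faces, as sorted tuples of size d
    weight: dict mapping each d-face to a real weight
    """
    row = {tau: i for i, tau in enumerate(lower_faces)}
    basis = {}    # pivot row -> reduced column (bitmask over rows)
    chosen = []
    for sigma in sorted(d_faces, key=lambda s: weight[s]):
        col = 0
        for tau in combinations(sigma, len(sigma) - 1):
            col ^= 1 << row[tau]      # column of I_d for sigma
        while col:
            pivot = col.bit_length() - 1
            if pivot not in basis:    # independent of the columns kept so far
                basis[pivot] = col
                chosen.append(sigma)
                break
            col ^= basis[pivot]       # reduce by an earlier column
    return chosen


# Complete 2-complex on 4 vertices with hypothetical integer weights.
vertices = range(4)
edges = list(combinations(vertices, 2))
triangles = list(combinations(vertices, 3))
weight = {(0, 1, 2): 4, (0, 1, 3): 1, (0, 2, 3): 3, (1, 2, 3): 2}
acycle = minimal_spanning_acycle(triangles, edges, weight)
print(acycle)                          # [(0, 1, 3), (1, 2, 3), (0, 2, 3)]
print(sum(weight[s] for s in acycle))  # 6
```

Here the four triangle columns sum to zero over $\mathbb{Z}_2$, so the heaviest triangle is rejected. For $d = 1$, with vertices passed as 1-tuples, the same routine reduces to the usual Kruskal's algorithm for a minimal spanning tree.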

Appendix C. Betti numbers and Isolated faces in $Y_{n,d}(p)$

Lemma 4.3 is proved here. The cases $d = 1$ and $d \ge 2$ are dealt with separately, with the latter requiring cohomological arguments.

Proof of Lemma 4.3 for $d = 1$. Let $V_n$ be the vertex set of $Y_{n,1}(p_n)$. Since there can be at most one component of size bigger than $n/2$, for reduced $\beta_0$,

\[ |\beta_0(Y_{n,1}(p_n)) - N_0(Y_{n,1}(p_n))| \le \sum_{V \subset V_n,\, 2 \le |V| \le n/2} 1_V + 1[N_0(Y_{n,1}(p_n)) = n], \]

where $1_V = 1$ whenever $V$ forms a connected component in $Y_{n,1}(p_n)$ and there is no edge between a vertex in $V$ and a vertex in $V^c$. If $|V| = k$, then for all sufficiently large $n$,

\[ E[1_V] \le k^{k-2}\, p_n^{k-1}\, (1 - p_n)^{k(n-k)}. \]

This is because, when $|V| = k$, there are $k^{k-2}$ possible spanning trees in $V$, the probability of getting a particular spanning tree in $V$ is $p_n^{k-1}$, and the probability of having no edge between $V$ and $V^c$ is $(1 - p_n)^{k(n-k)}$. We say sufficiently large because $p_n$ may be negative for small $n$ if $c$ is negative. Hence, for all sufficiently large $n$,

\[ E|\beta_0(Y_{n,1}(p_n)) - N_0(Y_{n,1}(p_n))| \le \sum_{k=2}^{n/2} \binom{n}{k} k^{k-2}\, p_n^{k-1}\, (1 - p_n)^{k(n-k)} + (1 - p_n)^{\binom{n}{2}}. \]

Since $(1 - p_n) \le e^{-p_n}$ and $\binom{n}{k} \le e^k n^k / k^k$, for all sufficiently large $n$,

\[ E|\beta_0(Y_{n,1}(p_n)) - N_0(Y_{n,1}(p_n))| \le \sum_{k=2}^{n/2} e^k n^k k^{-2}\, p_n^{k-1}\, e^{-p_n k(n-k)} + e^{-p_n \binom{n}{2}}. \]

As $p_n = (\log n + c)/n$, the second term decays to 0 with $n$. With regards to the first term,

\[ \sum_{k=2}^{n/2} e^k n^k k^{-2}\, p_n^{k-1}\, e^{-p_n k(n-k)} = \sum_{k=2}^{n/2} e^k k^{-2}\, p_n^{k-1}\, e^{k^2 p_n - kc} \le \sum_{k=2}^{n/2} e^{T_k}, \]

where

\[ T_k = k - 2\log(k) + (k-1)\log(\log n + |c|) - (k-1)\log n + k|c| + k^2\, \frac{\log n + |c|}{n}. \]

Fix $n$. Treating $k$ as a continuous variable, observe that the second derivative of $T_k$ w.r.t. $k$ is strictly positive for $k \in (3, n/2)$. This shows that $T_k$ is convex in $(3, n/2)$ and hence $T_k \le \max\{T_3, T_{n/2}\}$ for $k \in \{3, \ldots, n/2\}$. But $T_3 > T_{n/2}$ for all sufficiently large $n$. Hence, for all sufficiently large $n$,

\[ \sum_{k=2}^{n/2} e^k n^k k^{-2}\, p_n^{k-1}\, e^{-p_n k(n-k)} \le e^{T_2} + \frac{n}{2}\, e^{T_3}. \]

But the RHS converges to 0 with $n$. The desired result now follows.
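The statement for $d = 1$ can also be explored numerically. The sketch below samples the random graph with $p_n = (\log n + c)/n$ and reports the reduced $\beta_0$ (number of components minus one) next to the number of isolated vertices. The function names, the seed, and the values $n = 400$, $c = 0$ are arbitrary choices made here for illustration. A handful of samples only illustrates the claim and proves nothing.

```python
import math
import random


def reduced_beta0_and_isolated(n, edges):
    """Return (reduced beta_0, number of isolated vertices) of a graph on range(n)."""
    parent = list(range(n))
    degree = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components - 1, sum(1 for deg in degree if deg == 0)


def sample_graph(n, c, rng):
    """Sample the edges of Y_{n,1}(p_n), with p_n = (log n + c) / n clipped to [0, 1]."""
    p = min(1.0, max(0.0, (math.log(n) + c) / n))
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


rng = random.Random(0)   # fixed seed, chosen only for reproducibility
n, c = 400, 0.0
for _ in range(5):
    beta0, isolated = reduced_beta0_and_isolated(n, sample_graph(n, c, rng))
    print(beta0, isolated)
```

In typical runs the two printed numbers agree or differ by very little, which is what the bound on $E|\beta_0 - N_0|$ suggests for large $n$.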



We now give a brief exposition of reduced cohomology (w.r.t. $\mathbb{Z}_2$ for simplicity), which is necessary for proving Lemma 4.3 for the case $d \ge 2$. In one line, cohomology is the dual theory of homology and can be derived by considering the dual of the boundary operator $\partial$. Consider a simplicial complex $K$. For $d \ge 0$, a $d$-cochain of $K$ is a map $g : F^d \to \mathbb{Z}_2$. Its support$^8$ is given by $\mathrm{supp}(g) := \{\sigma \in F^d : g(\sigma) = 1\}$. Let $C^d := \{g : F^d \to \mathbb{Z}_2\}$ denote the set of all $d$-cochains; it is a $\mathbb{Z}_2$-vector space under the natural addition and scalar multiplication operations. The $d$-th coboundary operator $\delta_d : C^d \to C^{d+1}$ is defined as follows:

\[ \delta_d(g)(\sigma) := \sum_{\tau \in F^d : \tau \subset \sigma} g(\tau), \quad g \in C^d,\ \sigma \in F^{d+1}. \tag{C.1} \]

For $d \ge 1$, let $B^d := \mathrm{im}(\delta_{d-1})$; and, for $d \ge 0$, let $Z^d := \ker(\delta_d)$. Let $B^0 := \{0, 1\}$, where 0 and 1 are respectively the 0-cochains that assign 0 and 1 to all vertices. The elements of $B^d$ are called coboundaries while those of $Z^d$ are called cocycles. As in homology, we have that $\delta_d \circ \delta_{d-1} = 0$ and hence we define the $d$-th cohomology group

\[ H^d := \frac{Z^d}{B^d}. \]

It is well known that the $d$-th homology group $H_d$ is isomorphic to the $d$-th cohomology group $H^d$ and so we have that $\beta_d(K) = \mathrm{rank}(H^d(K))$.

We first describe upper and lower bounds for Betti numbers which, to the best of our knowledge, have not been explicitly mentioned anywhere. But they follow from the proofs in [45, 51, 41]. We first discuss upper bounds for Betti numbers. Fix an arbitrary $d \ge 0$. For $g \in C^d$, let $[g] = g + B^d$ and

\[ w(g) = \min\{|\mathrm{supp}(g')| : g' \in [g]\}, \tag{C.2} \]

where $|\cdot|$ denotes cardinality. Then it follows that $H^d = \{[g] : g \in Z^d\} = \{[0]\} \cup \{[g] : g \in Z^d, |\mathrm{supp}(g)| \ge 1, w(g) = |\mathrm{supp}(g)|\}$ and hence

\[ \beta_d(K) = \mathrm{rank}(\{[g] : g \in Z^d, |\mathrm{supp}(g)| \ge 1, w(g) = |\mathrm{supp}(g)|\}). \tag{C.3} \]
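The coboundary operator (C.1) is easy to experiment with when a cochain over $\mathbb{Z}_2$ is stored as its support. The sketch below is an illustration under that encoding. The function name `coboundary` and the example complex (a filled triangle) are assumptions made here, not notation from the paper. It evaluates (C.1) modulo 2 and checks $\delta_1 \circ \delta_0 = 0$ on one cochain.

```python
from itertools import combinations


def coboundary(support, higher_faces):
    """Support of delta_d(g) over Z_2, where `support` is supp(g).

    A (d+1)-face sigma lies in the result when an odd number of its d-faces
    belong to `support`, which is formula (C.1) read modulo 2.
    """
    support = set(support)
    result = set()
    for sigma in higher_faces:
        hits = sum(1 for tau in combinations(sigma, len(sigma) - 1) if tau in support)
        if hits % 2 == 1:
            result.add(sigma)
    return result


# Filled triangle on vertices 0, 1, 2.
edges = [(0, 1), (0, 2), (1, 2)]
triangles = [(0, 1, 2)]

g = {(0,)}                              # indicator 0-cochain of vertex 0
dg = coboundary(g, edges)
print(sorted(dg))                       # [(0, 1), (0, 2)]
print(coboundary(dg, triangles))        # set()
print(coboundary({(0, 1)}, triangles))  # {(0, 1, 2)}
```

The last line shows that the indicator cochain of the edge $(0,1)$ is not a cocycle here, because that edge is contained in a 2-face.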

For $d = 0$, call every $A \subseteq F^0$ connected. For $d \ge 1$, call $A \subseteq F^d$ connected if, for every $\sigma_1, \sigma_2 \in A$, there exists a sequence $\tau_1, \ldots, \tau_i \in A$ with $\tau_1 = \sigma_1$ and $\tau_i = \sigma_2$ such that, for each $j$, $\tau_j$ and $\tau_{j+1}$ share a common $(d-1)$-face. Now fix $d \ge 1$ and consider $g \in Z^d$ such that $\mathrm{supp}(g)$ is not connected. Then clearly there exists $\{A_i : A_i \subseteq F^d\}$ such that each $A_i$ is non-empty and connected; $A_i \cap A_j = \emptyset$ for $i \ne j$; for all $\sigma_i \in A_i$ and $\sigma_j \in A_j$, $\sigma_i$ and $\sigma_j$ do not have a common $(d-1)$-face; and $\mathrm{supp}(g) = \cup_i A_i$. Let $g_{A_i} \in C^d$ be such that $\mathrm{supp}(g_{A_i}) = A_i$. It is then easy to see that

\[ g = \sum_i g_{A_i}. \]

(Footnote 8: This notion of support is different from that given for chains in Section 2.1.1.)

From the above relation and our assumption that $g \in Z^d$, it necessarily follows that each $g_{A_i} \in Z^d$. Suppose not. Then there exist $i$ and $\sigma \in F^{d+1}$ such that

\[ \delta_d(g_{A_i})(\sigma) = \sum_{\tau \in F^d,\, \tau \subset \sigma} g_{A_i}(\tau) = 1. \]

Since no $\sigma_i \in A_i$ shares a $(d-1)$-face with any $d$-face in $\cup_{j \ne i} A_j$, the above necessarily implies that $\delta_d(g)(\sigma) = 1$, which is a contradiction. From the above discussion and the fact that the rank only depends upon independent elements, we have, for each $d \ge 0$,

\[ \beta_d(K) = \mathrm{rank}(\{[g] : g \in Z^d, |\mathrm{supp}(g)| \ge 1, w(g) = |\mathrm{supp}(g)|, \mathrm{supp}(g) \text{ is connected}\}). \]

If we define $1_g = 1[g \in Z^d]$ and

\[ G_d(K) = \{g \in C^d : |\mathrm{supp}(g)| \ge 1, w(g) = |\mathrm{supp}(g)|, \mathrm{supp}(g) \text{ is connected}\}, \tag{C.4} \]

then the above discussion yields the upper bound

\[ \beta_d(K) \le \sum_{g \in G_d(K)} 1_g. \tag{C.5} \]

We now obtain a lower bound for the different Betti numbers. As usual, let $N_d(K)$ denote the number of isolated $d$-faces in the given simplicial complex $K$. When $d = 0$, we have $N_0(K) - 1[N_0(K) = |F^0|] \le \beta_0(K)$. This shows that the number of isolated vertices is a lower bound for $\beta_0(K)$ except in the one particular case when all vertices in $K$ are isolated. For $d \ge 1$, however, one can easily come up with several examples where the number of isolated $d$-faces exceeds $\beta_d(K)$. From this, it follows that $\beta_d(K)$ for $d \ge 1$ needs to be treated a little differently.

Fix $d \ge 1$. In contrast to the setup used for the upper bound, we will assume here that the $d$-skeleton $K^d$ of the given simplicial complex $K$ is complete. Consider $\tilde{N}_d(K)$, which we define to be the number of disjoint isolated $d$-faces in $K$. We call a $d$-face $\sigma$ disjoint isolated if it is isolated in $K$ and none of its neighbouring $d$-faces (i.e., $\sigma' \in F^d$ which share a $(d-1)$-face with $\sigma$) are isolated. Let $\sigma_1, \ldots, \sigma_{\tilde{N}_d}$ be all the disjoint isolated $d$-faces in $K$ and let $g_{\sigma_1}, \ldots, g_{\sigma_{\tilde{N}_d}}$ be their associated indicator $d$-cochains. We claim that

\[ \mathrm{rank}(\{[g_{\sigma_1}], \ldots, [g_{\sigma_{\tilde{N}_d}}]\}) = \tilde{N}_d. \tag{C.6} \]

If $\tilde{N}_d = 1$, then the above is obviously true. We need to verify it for $\tilde{N}_d > 1$. Suppose not. Then there exist $I \subseteq \{1, \ldots, \tilde{N}_d\}$ and an $f \in C^{d-1}$ such that

\[ \sum_{i \in I} g_{\sigma_i} = \delta_{d-1}(f), \quad \text{i.e.,} \quad \sum_{i \in I} g_{\sigma_i} \in [0]. \tag{C.7} \]

Fix an arbitrary $i \in I$ and consider $\sigma_i$. Suppose $\sigma_i = \{v_0, \ldots, v_d\}$. Let $v_{d+1} \in F^0$ be such that $v_{d+1} \notin \sigma_i$. For $j = 0$ to $d$, let $\tau_{ij} = \{v_0, \ldots, v_{d+1}\} \setminus \{v_j\}$. Note that each $\tau_{ij}$ is a $d$-face and it belongs to $F^d$ because of our assumption that $K^d$ is complete. Clearly none of the $\tau_{ij}$'s are $\sigma_k$'s for any $i, j, k$. Thus, from (C.7), we have that for $j \in \{0, \ldots, d\}$, $\delta_{d-1}(f)(\tau_{ij}) = 0$ and $\delta_{d-1}(f)(\sigma_i) = 1$. But by the property of the coboundary operator that $\delta_d \circ \delta_{d-1} = 0$, we derive the contradiction that

\[ \delta_d(\delta_{d-1}(f))[v_0, \ldots, v_{d+1}] = \sum_{j=0}^{d} \delta_{d-1}(f)(\tau_{ij}) + \delta_{d-1}(f)(\sigma_i) = 0, \]

whereas the values just obtained make this sum equal to 1.

Thus (C.6) holds even when $\tilde{N}_d > 1$. From (C.6) and (C.3), we now have the lower bound

\[ \tilde{N}_d(K) \le \beta_d(K). \tag{C.8} \]

Some preliminary results are required before we prove Lemma 4.3 for $d \ge 2$. The result below can be proved using the above arguments, but we give a slightly more general proof.

Lemma C.1. Fix $d \ge 0$. Let $K$ be a simplicial complex with complete $d$-skeleton such that $|F^0| \ge d + 2$. Let $\sigma \in F^d$ and $g_\sigma$ be the associated indicator $d$-cochain. Then $g_\sigma \in G_d(K)$. Also $\sigma$ is isolated in $K$ if and only if $g_\sigma \in Z^d$.

Proof. It is easy to see that if $g_\sigma \notin B^d$, then the desired result follows. For $d = 0$, since $|F^0| \ge 2$ and $B^0 = \{0, 1\}$, it is immediate that $g_\sigma \notin B^0$. Let $d \ge 1$ and suppose that $g_\sigma \in B^d(K)$, i.e., there exists $f \in C^{d-1}(K)$ such that $\delta_{d-1}(f) = g_\sigma$. Since $|F^0(K)| \ge d + 2$, it follows that there exists $v \notin \sigma$. Construct a new simplicial complex $K_*$ such that $K_*^d = K^d$ and $\{v\} \cup \sigma$ is the only $(d+1)$-face in $K_*$. Clearly $C^{d-1}(K_*) = C^{d-1}(K)$, $C^d(K_*) = C^d(K)$, and $B^d(K_*) = B^d(K)$, and hence $g_\sigma \in B^d(K_*)$ and $f \in C^{d-1}(K_*)$ with $g_\sigma = \delta_{d-1}(f)$. For $\delta_d : C^d(K_*) \to C^{d+1}(K_*)$, we have $\delta_d(g_\sigma)(\{v\} \cup \sigma) = 1$. But this is a contradiction, since by definition of the coboundary maps, $\delta_d(\delta_{d-1}(f))(\{v\} \cup \sigma) = 0$. Hence $g_\sigma \notin B^d(K_*)$, which implies that $g_\sigma \notin B^d(K)$. The desired result now follows. □

Denoting by $N_d(K)$ the number of isolated $d$-faces in $K$, we have the following consequence of the above result:

\[ N_d(K) + \sum_{g \in G_d(K),\, |\mathrm{supp}(g)| > 1} 1_g = \sum_{g \in G_d(K)} 1_g. \tag{C.9} \]

Lemma C.2. Fix $d \ge 1$. Let $K$ be a simplicial complex with complete $d$-skeleton such that $|F^0| \ge 2d + 4$. Let $\sigma, \sigma' \in F^d$ be such that $\sigma \ne \sigma'$ and $\sigma$ and $\sigma'$ share a common $(d-1)$-face. If $g_{\sigma,\sigma'}$ is that $g \in C^d$ which has $\mathrm{supp}(g) = \{\sigma, \sigma'\}$, then $g_{\sigma,\sigma'} \in G_d(K)$.

Proof. Since $|\mathrm{supp}(g_{\sigma,\sigma'})| = 2$ and $\mathrm{supp}(g_{\sigma,\sigma'})$ is connected, it suffices to show that $w(g_{\sigma,\sigma'}) = 2$, where $w(\cdot)$ is as in (C.2). By repeating the argument below (C.7), we can derive a contradiction to $[g_{\sigma,\sigma'}] = [0]$ and so $w(g_{\sigma,\sigma'}) \ne 0$. Suppose $w(g_{\sigma,\sigma'}) = 1$. Then there exists some indicator $d$-cochain $g_{\hat{\sigma}}$ associated with the $d$-face $\hat{\sigma}$ such that $g_{\sigma,\sigma'} + g_{\hat{\sigma}} = \delta_{d-1}(f)$. Now either $\hat{\sigma}$ belongs to $\{\sigma, \sigma'\}$ or not. In the former case, without loss of generality, assume that $\hat{\sigma} = \sigma'$. Then it follows that $g_\sigma = g_{\sigma,\sigma'} + g_{\hat{\sigma}} = \delta_{d-1}(f)$. Repeating the argument below (C.7) (with $\sigma_i$ there replaced by $\sigma$), we get a contradiction. Now suppose that $\hat{\sigma} \notin \{\sigma, \sigma'\}$. Since $|F^0| \ge 2d + 4$, there exists $v_{d+1} \in F^0$ such that $v_{d+1} \notin \sigma \cup \sigma' \cup \hat{\sigma}$. Again repeating the argument below (C.7), we get a contradiction and hence $w(g_{\sigma,\sigma'}) = 2$ as required. □

Theorem C.3. Fix $d \ge 1$. Let $K$ be a simplicial complex with complete $d$-skeleton such that $|F^0| \ge 2d + 4$. Let $N_d(K)$ denote the number of isolated $d$-faces in $K$. Then

\[ |\beta_d(K) - N_d(K)| \le 3 \sum_{g \in G_d(K),\, |\mathrm{supp}(g)| > 1} 1_g. \]

Proof. Let $\tilde{N}_d(K)$ denote the number of disjoint isolated $d$-faces in $K$. Then from (C.8), (C.5) and (C.9), we have

\[ \tilde{N}_d(K) \le \beta_d(K) \le \sum_{g \in G_d(K)} 1_g; \qquad \tilde{N}_d(K) \le N_d(K) \le \sum_{g \in G_d(K)} 1_g. \]

Combining the above two relations, we get

\[ |\beta_d(K) - N_d(K)| \le \sum_{g \in G_d(K)} 1_g - \tilde{N}_d(K) = \sum_{g \in G_d(K)} 1_g - N_d(K) + N_d(K) - \tilde{N}_d(K). \tag{C.10} \]

Let $1(\sigma) = 1[\sigma \text{ is an isolated } d\text{-face in } K]$ and $\hat{1}(\sigma) = 1[\sigma \text{ is a disjoint isolated } d\text{-face in } K]$. Then, we have that

\[ N_d(K) - \tilde{N}_d(K) = \sum_{\sigma \in F^d} [1(\sigma) - \hat{1}(\sigma)] \le \sum_{\sigma \in F^d} \sum_{\sigma'} 1(\sigma) 1(\sigma'), \]

where the second sum is over all neighbouring $d$-faces of $\sigma$. For a neighbouring $d$-face $\sigma'$, $1(\sigma) 1(\sigma') \le 1_{g_{\sigma,\sigma'}}$, where $g_{\sigma,\sigma'}$ is that $g \in C^d$ with $\mathrm{supp}(g) = \{\sigma, \sigma'\}$. From the above discussion, we have

\[ N_d(K) - \tilde{N}_d(K) \le 2 \sum_{g \in G_d(K),\, |\mathrm{supp}(g)| = 2} 1_g. \]

The factor 2 comes because $g_{\sigma,\sigma'} = g_{\sigma',\sigma}$. Now using (C.9) in (C.10), the proof is complete. □

Lemma C.4. [41, (3.5), (5.1)] Let $d \ge 2$. Consider the random $d$-complex $Y_{n,d}(p_n)$ with

\[ p_n = \frac{d \log n + c - \log(d!)}{n} \]

for some fixed $c \in \mathbb{R}$. Let $N_{d-1}(Y_{n,d}(p_n))$ be the number of isolated $(d-1)$-faces in $Y_{n,d}(p_n)$. Also let $G_{d-1}(Y_{n,d}(p_n))$ be as in (C.4). Then

\[ \lim_{n \to \infty} E\Big[ \sum_{g \in G_{d-1}(Y_{n,d}(p_n))} 1_g - N_{d-1}(Y_{n,d}(p_n)) \Big] = \lim_{n \to \infty} E\Big[ \sum_{g \in G_{d-1}(Y_{n,d}(p_n)),\, |\mathrm{supp}(g)| > 1} 1_g \Big] = 0. \]

In [41] (see in particular Section 5 there), each $g \in G_{d-1}(Y_{n,d}(p_n))$ is identified by an appropriate hypergraph $H$. $X(H)$ is the number of $d$-faces in $Y_{n,d}(p_n)$ that contain an odd number of faces of $\mathrm{supp}(g)$. Hence the result follows from the following inequality:

\[ E[1_g] = (1 - p_n)^{X(H)} \le e^{-p_n X(H)}. \]
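As a consistency check on the choice of $p_n$ (a standard first-moment computation, included here only as an illustration and not part of the argument in [41]), take $g = g_\sigma$ to be the indicator cochain of a single $(d-1)$-face $\sigma$. Then $X(H) = n - d$, since each of the remaining $n - d$ vertices spans exactly one potential $d$-face together with $\sigma$, and $1_g$ is the indicator that $\sigma$ is isolated. There are $\binom{n}{d}$ such faces, so

\[ E[N_{d-1}(Y_{n,d}(p_n))] = \binom{n}{d} (1 - p_n)^{n-d} = \frac{n^d}{d!}\,(1 + o(1)) \cdot e^{-n p_n + o(1)} = \frac{n^d}{d!} \cdot \frac{d!\, e^{-c}}{n^d}\,(1 + o(1)) \to e^{-c}, \]

where we used $e^{-n p_n} = e^{-d \log n - c + \log(d!)} = n^{-d}\, d!\, e^{-c}$. The expected number of isolated $(d-1)$-faces therefore stays bounded away from 0 and $\infty$ in this regime.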

Proof of Lemma 4.3 for d ≥ 2. This is easy to see from Theorem C.3 and Lemma C.4.



References

1. David Aldous and J. Michael Steele, Asymptotics for Euclidean minimal spanning trees on random points, Prob. Th. and Rel. Fields 92 (1992), no. 2, 247–258.
2. Kenneth S. Alexander, Percolation and minimal spanning forests in infinite graphs, Ann. Probab. 23 (1995), no. 1, 87–104.
3. M.J.B. Appel and R.P. Russo, The connectivity of a graph on uniform points on $[0, 1]^d$, Statistical Probability Letters 60 (2002), 351–357.
4. C. Bajo, B. Burdick, and S. Chmutov, On the Tutte–Krushkal–Renardy polynomial for cell complexes, Journal of Combinatorial Theory, Series A 123 (2014), no. 1, 186–201.
5. C. A. N. Biscio and J. Møller, The accumulated persistence function, a new useful functional summary statistic for topological data analysis, with a view to brain artery trees and spatial point process applications, 2016, arXiv:1611.00630.
6. O. Bobrowski and M. Kahle, Topology of random geometric complexes: A survey, arXiv:1409.4734, 2014.
7. B. Bollobás, Random graphs, Cambridge Studies in Advanced Mathematics, vol. 73, Cambridge University Press, Cambridge, 2001.
8. P. Bubenik, Statistical topological data analysis using persistence landscapes, Journal of Machine Learning Research 16 (2015), no. 1, 77–102.
9. G. Carlsson, Topology and data, Bulletin of the American Mathematical Society 46 (2009), no. 2, 255–308.
10. G. Carlsson, Topological pattern recognition for point cloud data, Acta Numerica 23 (2014), 289–368.
11. F. Chazal, D. Cohen-Steiner, M. Glisse, L. J. Guibas, and S. Y. Oudot, Proximity of persistence modules and their diagrams, Proceedings of the twenty-fifth annual Symposium on Computational geometry, ACM, 2009, pp. 237–246.
12. F. Chazal, V. De Silva, and S. Y. Oudot, Persistence stability for geometric complexes, Geometriae Dedicata 173 (2014), no. 1, 193–214.
13. Frédéric Chazal, Vin De Silva, Marc Glisse, and Steve Oudot, The structure and stability of persistence modules, arXiv preprint arXiv:1207.3674 (2012).
14. D. Cohen, A. Costa, M. Farber, and T. Kappeler, Topology of random 2-complexes, Discrete and Computational Geometry 47 (2012), no. 1, 117–149.
15. D. Cohen-Steiner, H. Edelsbrunner, and J. Harer, Stability of persistence diagrams, Discrete and Computational Geometry 37 (2007), no. 1, 103–120.
16. D. Cohen-Steiner, H. Edelsbrunner, J. Harer, and Y. Mileyko, Lipschitz functions have $L_p$-stable persistence, Foundations of Computational Mathematics 10 (2010), no. 2, 127–139.
17. T. H. Cormen, C.E. Leiserson, and R.L. Rivest, Introduction to algorithms, MIT Press and McGraw-Hill, 2009.
18. A. E. Costa and M. Farber, The asphericity of random 2-dimensional complexes, Random Structures and Algorithms 46 (2015), no. 2, 261–273.
19. A. E. Costa and M. Farber, Geometry and topology of random 2-complexes, Israel Journal of Mathematics 209 (2015), no. 2, 883–927.
20. A.E. Costa, M. Farber, and T. Kappeler, Topics of stochastic algebraic topology, Electronic Notes in Theoretical Computer Science 283 (2012), no. 0, 53–70, Proceedings of the workshop on Geometric and Topological Methods in Computer Science (GETCO).
21. William Crawley-Boevey, Decomposition of pointwise finite-dimensional persistence modules, Journal of Algebra and Its Applications 14 (2015), no. 05, 1550066.
22. C. J. A. Delfinado and H. Edelsbrunner, An incremental algorithm for Betti numbers of simplicial complexes, Proceedings of the ninth annual Symposium on Computational geometry, ACM, 1993, pp. 232–239.
23. E. W. Dijkstra, A note on two problems in connexion with graphs, Numerische Mathematik (1959), no. 1, 269–271.
24. A. M. Duval, C. J. Klivans, and J. L. Martin, Simplicial matrix-tree theorems, Transactions of the American Mathematical Society 361 (2009), no. 11, 6073–6114.
25. H. Edelsbrunner and J.L. Harer, Computational topology, an introduction, American Mathematical Society, Providence, RI, 2010.
26. H. Edelsbrunner, D. Letscher, and A. Zomorodian, Topological persistence and simplification, Discrete and Computational Geometry 28 (2002), no. 4, 511–533.
27. P. Erdős and A. Rényi, On random graphs, I, Publicationes Mathematicae Debrecen 6 (1959), 290–297.
28. S. N. Ethier and T. G. Kurtz, Markov processes: characterization and convergence, vol. 282, John Wiley & Sons, 2009.
29. A. Frieze, On the value of a random minimum spanning tree problem, Discrete Applied Mathematics 10 (1985), no. 1, 47–56.
30. A. Frieze and M. Karoński, Introduction to random graphs, Cambridge University Press, 2016.
31. A. Hatcher, Algebraic topology, Cambridge University Press, Cambridge, New York, 2002.
32. N. Henze, The limit distribution for maxima of "weighted" rth-nearest-neighbour distances, Journal of Applied Probability 19 (1982), no. 2, 344–354.
33. Y. Hiraoka and T. Shirai, Minimum spanning acycle and lifetime of persistent homology in the Linial-Meshulam process, 2015, arXiv:1503.05669.
34. Y. Hiraoka and T. Shirai, Tutte polynomials and random-cluster models in Bernoulli cell complexes, 2016, arXiv:1602.04561.
35. T. Hsing and H. Rootzen, Extremes on trees, Annals of Probability 33 (2005), no. 1, 413–444.
36. M. E.H. Ismail, E. Koelink, V. Reiner, A. M. Duval, C. J. Klivans, and J. L. Martin, Cellular spanning trees and Laplacians of cubical complexes, Advances in Applied Mathematics 46 (2011), no. 1, 247–274.
37. S. Janson, T. Luczak, and A. Rucinski, Random graphs, Wiley-Interscience Series in Discrete Mathematics and Optimization, Wiley, New York, 2000.
38. V. Jarník, O jistém problému minimálním [About a certain minimal problem], Práce Moravské Přírodovědecké Společnosti (in Czech) (1930), no. 6, 57–63.
39. M. Kahle, Topology of random simplicial complexes: a survey, AMS Contemporary Mathematics 620 (2014), 201–222.
40. M. Kahle, Random simplicial complexes, 2016, arXiv:1607.07069.
41. M. Kahle and B. Pittel, Inside the critical window for cohomology of random k-complexes, Random Structures and Algorithms (2014).
42. G. Kalai, Enumeration of Q-acyclic simplicial complexes, Israel Journal of Mathematics 45 (1983), no. 4, 337–351.
43. V. Krushkal and D. Renardy, A polynomial invariant and duality for triangulations, Electronic Journal of Combinatorics 21 (2014), no. 3.
44. J. B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem, Proceedings of the American Mathematical Society (1956), no. 7, 48–50.
45. N. Linial and R. Meshulam, Homological connectivity of random 2-complexes, Combinatorica 26 (2006), no. 4, 475–487.
46. N. Linial, I. Newman, Y. Peled, and Y. Rabinovich, Extremal problems on shadows and hypercuts in simplicial complexes, 2014.
47. N. Linial and Y. Peled, On the phase transition in random simplicial complexes, 2014.
48. R. Lyons, Random complexes and $\ell^2$-Betti numbers, Journal of Topology and Analysis 01 (2009), no. 02, 153–175.
49. R. Lyons and Y. Peres, Probability on trees and networks, Cambridge University Press, 2016, Available at http://pages.iu.edu/~rdlyons/.
50. R. Mathew, I. Newman, Y. Rabinovich, and D. Rajendraprasad, Boundaries of Hypertrees and Hamiltonian Cycles in Simplicial Complexes, 2015.
51. R. Meshulam and N. Wallach, Homological connectivity of random k-dimensional complexes, Random Structures and Algorithms 34 (2009), no. 3, 408–417.
52. J.R. Munkres, Elements of algebraic topology, Addison-Wesley, 1984.
53. J. Oxley, What is a matroid?, Cubo Matemática Educacional 5 (2003), no. 3, 179–218.
54. M. D. Penrose, The longest edge of the random minimal spanning tree, Annals of Applied Probability (1997), 340–361.
55. M. D. Penrose, Random geometric graphs, Oxford Studies in Probability, vol. 5, Oxford University Press, Oxford, 2003.
56. G. Petri, P. Expert, F. Turkheimer, R. Carhart-Harris, D. Nutt, P.J. Hellyer, and F. Vaccarino, Homological scaffolds of brain functional networks, Journal of The Royal Society Interface 11 (2014), no. 101.
57. R. C. Prim, Shortest connection networks and some generalizations, Bell System Technical Journal 36 (1957), no. 6, 1389–1401.
58. S. I. Resnick, Extreme values, regular variation and point processes, Springer, 2013.
59. Primoz Skraba and Mikael Vejdemo-Johansson, Persistence modules: algebra and algorithms, arXiv preprint arXiv:1302.2015 (2013).
60. J. M. Steele and L. Tierney, Boundary domination and the distribution of the largest nearest-neighbor link in higher dimensions, Journal of Applied Probability 23 (1986), no. 2, 524–528.
61. V. E. Stepanov, Combinatorial algebra and random graphs, Theory of Probability and Its Applications 14 (1969), no. 3, 373–399.
62. E. Szpilrajn, Sur l'extension de l'ordre partiel, Fundamenta Mathematicae 16 (1930), no. 1, 386–389.
63. Remco Van Der Hofstad, Random graphs and complex networks, 2016, Available at http://www.win.tue.nl/~rhofstad/NotesRGCN.html.
64. D. J. A. Welsh, Matroid theory, Academic Press [Harcourt Brace Jovanovich, Publishers], London-New York, 1976, L. M. S. Monographs, No. 8.
65. Wikipedia, Minimum spanning tree, Wikipedia, the free encyclopedia, 2016, [Online; accessed 10-February-2016].
66. A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete and Computational Geometry 33 (2005), no. 2, 249–274.

(PS) Jozef Stefan Institute, Ljubljana, Slovenia & FAMNIT, University of Primorska, Koper, Slovenia
(GT) Faculty of Electrical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
(DY) Statistics and Mathematics Unit, Indian Statistical Institute, Bengaluru, India
