Downloaded 01/08/18 to 154.16.64.195. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
Computing 2-Connected Components and Maximal 2-Connected Subgraphs in Directed Graphs: An Experimental Study

Loukas Georgiadis¹, Giuseppe F. Italiano², Aikaterini Karanasiou², Nikos Parotsidis², Nilakantha Paudel²

Abstract
Motivated by very recent work on 2-connectivity in directed graphs, we revisit the problem of computing the 2-edge- and 2-vertex-connected components, and the maximal 2-edge- and 2-vertex-connected subgraphs of a directed graph G. We explore the design space for efficient algorithms in practice, based on recently proposed techniques, and conduct a thorough empirical study to highlight the merits and weaknesses of each technique.

1 Introduction
Edge and vertex connectivity are fundamental concepts in graph theory with numerous practical applications (see, e.g., [2, 19]). Let G = (V, E) be a directed graph (digraph) with m edges and n vertices. A vertex (resp., an edge) of G is a strong articulation point (resp., a strong bridge) if its removal increases the number of strongly connected components of G. A digraph G is 2-vertex-connected if it has at least three vertices and no strong articulation points; G is 2-edge-connected if it has no strong bridges. Let C ⊆ V. The induced subgraph of C, denoted by G[C], is the subgraph of G with vertex set C and edge set E ∩ (C × C). If G[C] is 2-vertex-connected (resp., 2-edge-connected), and there is no set of vertices C′ with $C \subsetneq C' \subseteq V$ such that G[C′] is also 2-vertex-connected (resp., 2-edge-connected), then G[C] is a maximal 2-vertex-connected (resp., 2-edge-connected) subgraph of G. Hence, in the context of reliable communication, maximal 2-vertex- and 2-edge-connected subgraphs correspond, respectively, to parts of a network that are resilient to single vertex and edge failures. These concepts, however, do not capture the pairwise connectivity among the vertices. Indeed, two vertices may lie in different maximal 2-connected subgraphs but still be connected by several disjoint paths. This observation motivates the following natural 2-connectivity relations [9, 10, 14, 22].

Let v and w be two distinct vertices. We say that v and w are 2-vertex-connected (resp., 2-edge-connected), and we denote this relation by $v \leftrightarrow_{2v} w$ (resp., $v \leftrightarrow_{2e} w$), if there are two internally vertex-disjoint (resp., two edge-disjoint) directed paths from v to w and two internally vertex-disjoint (resp., two edge-disjoint) directed paths from w to v. Note that a path from v to w and a path from w to v need not be edge-disjoint or vertex-disjoint. We define a 2-vertex-connected (resp., 2-edge-connected) component of a digraph G = (V, E) as a maximal subset B ⊆ V such that $u \leftrightarrow_{2v} v$ (resp., $u \leftrightarrow_{2e} v$) for all u, v ∈ B. See Figure 1.

We remark that in digraphs 2-vertex and 2-edge connectivity have a much richer and more complicated structure than in undirected graphs. Specifically, the vertex-disjoint (resp., edge-disjoint) paths that make two vertices 2-vertex-connected (resp., 2-edge-connected) in a component might use vertices that are outside of that component, while in a maximal 2-vertex-connected (resp., 2-edge-connected) subgraph those paths must lie completely inside that subgraph. Hence, two vertices that are 2-vertex-connected (resp., 2-edge-connected) are in a common 2-vertex-connected (resp., 2-edge-connected) component, but not necessarily in a common maximal 2-vertex-connected (resp., 2-edge-connected) subgraph. See, e.g., vertices e and f in Figure 1. As a result, 2-connectivity problems on digraphs appear to be much harder than on undirected graphs. For undirected graphs it has been known for over 40 years how to compute the 2-vertex- and 2-edge-connected components in linear time [23]. In the case of digraphs, however, only O(mn) algorithms were known (see, e.g., [14, 15, 18, 20]). It was shown only recently how to compute the 2-vertex- and 2-edge-connected components in linear time [9, 10], and the best current bound for computing the maximal 2-vertex- and 2-edge-connected subgraphs is $O(\min\{m^{3/2}, n^2\})$ [5, 12].

¹University of Ioannina, Greece.
²University of Rome Tor Vergata, Italy.
Throughout, we refer to the problems of computing the 2-vertex- and 2-edge-connected components as 2VCC and 2ECC, respectively, and to the problems of computing the maximal 2-vertex- and 2-edge-connected subgraphs as Max2VCS and Max2ECS, respectively.
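To make these definitions concrete, the following deliberately brute-force Python sketch tests them literally on a small digraph, given (as an assumption of the sketch) by a vertex count n and a list of (u, v) edge pairs. It counts strongly connected components through pairwise reachability, reports strong bridges and strong articulation points as the edges and vertices whose deletion increases that count, and groups vertices into 2-edge-connected components. The pairwise test relies on Menger's theorem: there are two edge-disjoint paths from v to w exactly when no single edge deletion destroys all paths from v to w. All function names are hypothetical, the running time is polynomial but far from linear, and the vertex version of the relation is omitted because adjacent pairs need extra care; this is only a test oracle for tiny inputs, not one of the algorithms studied in the paper.

```python
def reach(n, edges, src, skip_edge=None, skip_vertex=None):
    """Vertices reachable from src after deleting edge index skip_edge and/or vertex skip_vertex."""
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        if i != skip_edge and skip_vertex not in (u, v):
            adj[u].append(v)
    seen, stack = {src}, [src]
    while stack:
        for y in adj[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def scc_count(n, edges, skip_edge=None, skip_vertex=None):
    """Number of strongly connected components: x and y share one iff each reaches the other."""
    alive = [x for x in range(n) if x != skip_vertex]
    r = {x: reach(n, edges, x, skip_edge, skip_vertex) for x in alive}
    return len({frozenset(y for y in r[x] if x in r[y]) for x in alive})


def strong_bridges(n, edges):
    """Edges whose deletion increases the number of strongly connected components."""
    base = scc_count(n, edges)
    return [e for i, e in enumerate(edges) if scc_count(n, edges, skip_edge=i) > base]


def strong_articulation_points(n, edges):
    """Vertices whose deletion increases the number of strongly connected components."""
    base = scc_count(n, edges)
    return [x for x in range(n) if scc_count(n, edges, skip_vertex=x) > base]


def two_edge_connected(n, edges, v, w):
    """v <->_2e w. By Menger's theorem there are two edge-disjoint v->w paths iff
    no single edge deletion cuts every v->w path; the same holds for w->v."""
    for skip in [None, *range(len(edges))]:
        if w not in reach(n, edges, v, skip) or v not in reach(n, edges, w, skip):
            return False
    return True


def two_edge_components(n, edges):
    """2-edge-connected components: the classes of the equivalence relation <->_2e."""
    classes = {frozenset(w for w in range(n) if w == v or two_edge_connected(n, edges, v, w))
               for v in range(n)}
    return sorted(sorted(c) for c in classes)


# Bidirected triangle on 0, 1, 2 plus vertex 3 joined to vertex 2 in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0), (2, 3), (3, 2)]
print(strong_bridges(4, edges))              # [(2, 3), (3, 2)]
print(strong_articulation_points(4, edges))  # [2]
print(two_edge_components(4, edges))         # [[0, 1, 2], [3]]
```

In this example the two edges between vertices 2 and 3 are the only strong bridges and vertex 2 is the only strong articulation point, so the bidirected triangle forms one 2-edge-connected component and vertex 3 forms a singleton.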
169
Copyright © 2018 by SIAM Unauthorized reproduction of this article is prohibited
[Figure 1 image: five panels over vertices a to j: (a) G, (b) Max2VCS(G), (c) 2VCC(G), (d) Max2ECS(G), (e) 2ECC(G).]
Figure 1: (a) A strongly connected digraph G, with strong articulation points and strong bridges shown in red (better viewed in color). (b) The maximal 2-vertex-connected subgraphs of G. (c) The 2-vertex-connected components of G. (d) The maximal 2-edge-connected subgraphs of G. (e) The 2-edge-connected components of G. Note that vertices e and f are in the same 2-vertex- (resp., 2-edge-) connected component of G since there are two internally vertex-disjoint (resp., edge-disjoint) paths from e to f and from f to e. However, e and f are not in the same maximal 2-vertex- (resp., 2-edge-) connected subgraph of G.

Previous work. To the best of our knowledge, the only previous experimental study on 2-connectivity in digraphs was carried out by Di Luigi et al. [7], who compared the linear-time algorithms for computing the 2-vertex-connected components of [10] and for computing the 2-edge-connected components of [9] with simple-minded algorithms. Their experimental results show that the more sophisticated linear-time algorithms are faster than the simple-minded ones. Di Luigi et al. also considered the computation of the maximal 2-vertex-connected subgraphs of a digraph, and proposed an O(mn)-time algorithm that refines the dominator tree division applied by an algorithm of Jaberi [15] with the same running time. Di Luigi et al. compared their new algorithm with the algorithm of Jaberi [15] and with the $O(m^2 n)$-time algorithm of Erusalimskii and Svetlov [8]. They found that their algorithm is faster than the algorithm of Jaberi by a factor of 2 on average, while the algorithm of Erusalimskii and Svetlov is not competitive even for graphs of moderate size.

Our results. In this paper we revisit the problem of computing the 2-vertex- and 2-edge-connected components and the maximal 2-vertex- and 2-edge-connected subgraphs of a directed graph G in practice, taking into account the bulk of recent theoretical advances in this area (see, e.g.,
[5, 11, 12]). In particular, we explore the design space for algorithms that perform well in practice by implementing and engineering the newly proposed algorithms. We compare the new implementations against the fastest implementations in [7] and carry out a thorough empirical analysis that highlights the merits and weaknesses of each technique. Specifically, we present an efficient implementation of a new linear-time algorithm for computing the 2-vertex- and 2-edge-connected components of G, based on loop nesting information [11], and compare it against the algorithms based on a two-level decomposition of G using auxiliary graphs [9, 10] implemented in [7]. Then, we consider the computation of the maximal 2-vertex- and 2-edge-connected subgraphs of G, and investigate how the recent $O(n^2)$-time algorithms by Henzinger et al. [12] and the recent $O(m^{3/2})$-time algorithms by Chechik et al. [5] perform in practice. Although the asymptotic running time of these algorithms is better than O(mn), straightforward implementations of them are not competitive with the simpler O(mn) algorithms [7]. We engineer the algorithms from [5, 12], providing implementations that run two to three orders of magnitude faster than their straightforward implementations.

Our main experimental findings can be summarized as follows. For the computation of the 2-vertex- and 2-edge-connected components, our experiments indicate that the new loop-nesting-based algorithms from [11] are not only substantially faster in practice than previous state-of-the-art implementations, but also much more efficient in terms of memory usage, especially for sparse graphs. This makes them particularly suitable for large-scale real-world graphs, which are known to be inherently sparse. Furthermore, the new loop-nesting-based methods are conceptually simpler to implement. Our experiments also highlight that the performance of the loop nesting computation tends to degrade as the graph density increases. We thus propose a variant of this computation that alleviates the problem, which may be of independent interest, since loop nesting information is useful in a variety of applications (see, e.g., [21, 25]). For the computation of the maximal 2-vertex- and 2-edge-connected subgraphs, our experimental results reveal that, although asymptotically faster, the new, more sophisticated algorithms [5, 12] are not competitive on real-world graphs with the previous algorithm based on dominator tree decomposition [7]. On the other hand, we show that our carefully engineered version of the algorithm by Chechik et al. [5] performs close to the previous algorithms from [7] on real-world graphs (within a factor of 3.7, on average), and on pathological worst-case graphs it is more than two orders of magnitude faster than previous methods. Due to lack of space, we omit from this extended abstract the tables containing the absolute values of our experiments, as well as some plots whose findings are discussed in the paper. They will appear in the full version of this paper.

2 Algorithms
In this section we give an overview of the algorithms that we implemented for our experimental study. First, we review some key concepts and techniques used by these algorithms.

Flow graphs, dominators, and bridges. A flow graph is a directed graph with a distinguished start vertex s such that every vertex is reachable from s. Let G = (V, E) be a strongly connected graph. The reverse digraph of G, denoted by $G^R = (V, E^R)$, is the digraph that results from G by reversing the direction of all edges. Throughout the paper we let s be a fixed but arbitrary start vertex of G. Since G is strongly connected, all vertices are reachable from s and reach s, so we can view both G and $G^R$ as flow graphs with start vertex s. We denote those flow graphs by $G_s$ and $G^R_s$, respectively. A vertex u is a dominator of a vertex v (u dominates v) if every path from s to v in $G_s$ contains u. This relation can be represented by a tree rooted at s, the dominator tree D of $G_s$, such that u dominates w if and only if u is an ancestor of w in D. For any v ≠ s, we let d(v) denote the parent of v in D. Similarly, we can define the dominator tree $D^R$ of the flow graph $G^R_s$, and let $d^R(v)$ be the parent of a vertex v ≠ s in $D^R$. We let C(v) (resp., $C^R(v)$) denote the set of children of a vertex v in D (resp., $D^R$). The dominator tree of a flow graph can be computed in linear time [1, 4]. A vertex v ≠ s is a strong articulation point of G if and only if it is a non-leaf vertex of D or of $D^R$ [13]. An edge (u, v) is a bridge of a flow graph $G_s$ if all paths from s to v include (u, v).¹ An edge is a strong bridge of G if and only if it is a bridge of $G_s$ or of $G^R_s$ [13]. Moreover, a bridge (u, v) of $G_s$ is an edge of D, i.e., d(v) = u; similarly, if the reverse of an edge (u, v) of G is a bridge of $G^R_s$, then (v, u) is an edge of $D^R$, i.e., $d^R(u) = v$. After deleting from the dominator trees D and $D^R$ the bridges of $G_s$ and $G^R_s$, respectively, we obtain the bridge decomposition of D and $D^R$ into forests $\mathcal{D}$ and $\mathcal{D}^R$. We denote by $D_u$ (resp., $D^R_u$) the tree in $\mathcal{D}$ (resp., $\mathcal{D}^R$) containing vertex u, and by $r_u$ (resp., $r^R_u$) the root of $D_u$ (resp., $D^R_u$).

Property 2.1. ([10]) Let v and w be any vertices of G. Then $v \leftrightarrow_{2e} w$ only if $r_v = r_w$ and $r^R_v = r^R_w$.

Property 2.2. ([9]) Let v and w be any vertices of G. Then $v \leftrightarrow_{2v} w$ only if v and w are siblings or one is the parent of the other in both D and $D^R$.

Note that Properties 2.1 and 2.2 also hold when v and w are in the same maximal 2-edge-connected and 2-vertex-connected subgraph, respectively. Furthermore, Property 2.2 implies that each maximal 2-vertex-connected subgraph is contained in a subgraph induced by the vertices of some set $(C(u) \cup \{u\}) \cap (C^R(v) \cup \{v\})$.

¹Throughout the paper, to avoid confusion, we consistently use the term bridge to refer to a bridge of a flow graph and the term strong bridge to refer to a strong bridge in the original graph.

Auxiliary graphs. Auxiliary graphs were defined in [10] and [9] to decompose the input digraph G into smaller digraphs (not necessarily subgraphs of G) that maintain, respectively, the original 2-edge- and 2-vertex-connected components of G. For 2-edge connectivity, we construct an auxiliary graph $G^{2e}_r$ for each tree $D_r \in \mathcal{D}$, as follows. Call a vertex w marked if (d(w), w) is a bridge of $G_s$. We say that $v \in D_r$ is a boundary vertex in $D_r$ if v is a parent of a marked vertex w. The vertex set of $G^{2e}_r$ consists of ordinary vertices, which are the vertices in $D_r$, and auxiliary vertices, which are d(r) and the marked
children of boundary vertices of $D_r$. To form $G^{2e}_r$, we contract all descendants in D of a marked child w of a boundary vertex into w. All vertices in $V \setminus D_r$ that are not descendants of r in D are contracted into d(r) (r ≠ s if any such vertex exists). In the case of 2-vertex connectivity, the auxiliary graphs (denoted by $G^{2v}_r$) are considerably more involved. Both types of auxiliary graphs have total size O(m + n) and can be constructed in linear time.

Loop nesting forests. Let $G_s = (V, E, s)$ be a flow graph. A loop nesting forest represents a hierarchy of strongly connected subgraphs of $G_s$ [25], and is defined with respect to a dfs tree T of $G_s$ as follows. For any vertex u, the loop of u, denoted by loop(u), is the set of all descendants x of u in T such that there is a path from x to u in G containing only descendants of u in T. Vertex u is the head of loop(u). Any two vertices in loop(u) reach each other. Therefore, loop(u) induces a strongly connected subgraph of $G_s$; it is the unique maximal set of descendants of u in T that does so. The loop(u) sets form a laminar family of subsets of V: for any two vertices u and v, loop(u) and loop(v) are either disjoint or nested (i.e., one contains the other). This property allows us to define the loop nesting forest H of $G_s$, with respect to T, as the forest in which the parent of any vertex v, denoted by h(v), is the nearest proper ancestor u of v in T such that v ∈ loop(u), if there is such a vertex u, and null otherwise. Then loop(u) is the set of all descendants of vertex u in H, which we also denote by H(u) (the subtree of H rooted at vertex u). The properties of dfs imply that every cycle of $G_s$ is contained in a loop. A loop nesting forest can be computed in linear time [4, 25]. Since here we deal with strongly connected digraphs, each vertex is contained in a loop, so H is a tree rooted at s. Therefore, we will refer to H as the loop nesting tree of $G_s$.

2-isolated sets. Henzinger et al.
[12] define a 2-isolated set of a digraph G = (V, E), where G is not necessarily strongly connected, to be a set of vertices S ⊆ V that (a) cannot be reached from the vertices of V \ S, or (b) can be reached from V \ S only through one edge e. Every maximal 2-edge-connected subgraph of G contains either only vertices of S or only vertices of V \ S. Hence, if such a set S is found, we can recursively compute the maximal 2-edge-connected subgraphs in the subgraphs induced by S and V \ S, respectively. Similarly, for 2-vertex connectivity, a 2-isolated set of G is defined as a set of vertices S ⊆ V that (a) cannot be reached from the vertices of V \ S, or (b) can be reached from V \ S only through one vertex v. The set of vertices of every maximal 2-vertex-connected subgraph of G is either a subset of S ∪ {v} or a subset
of V \ S. Hence, the computation recurses in the subgraphs induced by S ∪ {v} and V \ S. Although the definitions of a 2-isolated set differ for 2-edge and 2-vertex connectivity, they are used algorithmically in a similar way. Thus, we refer to both with the same term to retain a common high-level description.

2.1 Algorithms for 2-edge- and 2-vertex-connected components

We now describe the two linear-time algorithms for computing the 2-edge- and the 2-vertex-connected components.

Auxiliary graphs. The linear-time algorithm of [10] computes the 2-edge-connected components of G by applying Property 2.1 and two levels of auxiliary graphs. From Property 2.1 we get an initial partition of the vertices into components, which is given by the ordinary vertices in each auxiliary graph $J_r = G^{2e}_r$ of G. This process is then repeated in each reverse auxiliary graph $J^R_r$, producing a second level of auxiliary graphs. Finally, the 2-edge-connected components of G are formed by the strongly connected components of the second-level auxiliary graphs, after removing the strong bridge entering the root vertex of each second-level auxiliary graph. We refer to this algorithm as 2E-AUX. The analogous high-level idea, using Property 2.2 and the auxiliary graphs $G^{2v}_r$, gives a linear-time algorithm that computes the 2-vertex-connected components of G [9]. We refer to this algorithm as 2V-AUX. We note that 2V-AUX is more complicated than 2E-AUX because, unlike the 2-edge-connected components, the 2-vertex-connected components do not form a partition of V. Here we maintain the current components, which are refined during the execution of 2V-AUX, using an additional data structure.

Loop nesting forests. Very recently, [11] presented new linear-time algorithms to compute the 2-edge- and 2-vertex-connected components, based on loop nesting forests. We refer to these algorithms as 2E-LNF and 2V-LNF, respectively. Algorithm 2E-LNF assigns a label to each vertex v, where $label(v) = \langle r_v, h_v, r^R_v, h^R_v \rangle$.
In this label, $r_v$ and $r^R_v$ are, respectively, the roots of the trees that contain v in the bridge decompositions $\mathcal{D}$ and $\mathcal{D}^R$. Also, $h_v$ (resp., $h^R_v$) is the nearest ancestor w of v in H (resp., in the loop nesting tree $H^R$ of $G^R_s$) such that $h(w) \notin D_w$ (resp., $h^R(w) \notin D^R_w$). Each 2-edge-connected component of G is formed by the vertices that have identical labels. Algorithm 2V-LNF uses an analogous approach to compute the 2-vertex-connected components in linear time. The main difference is that, instead of computing vertex labels, we maintain the same data structure as in 2V-AUX and refine the maintained components in the same way. Here, the components are refined with
respect to the loops in $G_s$ and $G^R_s$.

In our experiments, we noticed that the performance of 2E-LNF and 2V-LNF degrades as the graph density increases, which is due to the data structures used to compute the loop nesting trees of $G_s$ and $G^R_s$. Specifically, as the algorithm finds the loops of a flow graph, it contracts all the vertices of a loop into a single vertex, using a disjoint set union data structure [24]. The vertex into which we contract a loop is called the head of the loop. As the graph undergoes vertex contractions, we need to maintain the dynamic list of edges entering the head of each contracted loop, which we use to search for the vertices that are in a specific loop. To improve the memory usage of this algorithm, we apply the following idea, which avoids unnecessary edge insertions into these dynamic lists. Since the vertices of a loop are contracted into a single vertex, we can detect when the algorithm tries to insert parallel edges into the dynamic lists, and avoid such insertions. We refer to the implementations of 2E-LNF and 2V-LNF that apply this method as 2E-LNFD and 2V-LNFD, respectively (for 2E-LNF and 2V-LNF for Dense graphs).

2.2 Algorithms for maximal 2-edge- and 2-vertex-connected subgraphs

We now describe the algorithms for computing the maximal 2-edge- and 2-vertex-connected subgraphs.

Basic algorithm for computing the maximal 2-edge-connected subgraphs. The basic algorithm for computing the maximal 2-edge-connected subgraphs can be seen as maintaining a partition of the vertices that is repeatedly refined by identifying parts that cannot be in the same 2-edge-connected subgraph. In the basic algorithm these parts are identified by computing strong bridges, removing them, and processing the resulting strongly connected components recursively. Since the recursion depth is O(n), this leads to an O(mn)-time algorithm. We refer to this algorithm as M2E-BASIC.
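Most of the work in M2E-BASIC goes into computing strong bridges. The Python sketch below (function names hypothetical) follows the characterization recalled in the paragraph on flow graphs, dominators, and bridges: an edge is a strong bridge if and only if it is a bridge of $G_s$ or of $G^R_s$. It also uses a standard consequence of the definitions: (d(v), v) is a bridge of $G_s$ exactly when it is the only edge entering v from a vertex that v does not dominate. For brevity, and unlike the implementations studied here, it computes dominators with the simple iterative algorithm of Cooper, Harvey and Kennedy and strongly connected components by forward and backward search, so it does not reach the linear-time bounds cited above. The digraph is assumed to be a dict mapping every vertex to a list of out-neighbours, and the M2E-BASIC loop reports only subgraphs with at least two vertices, which is our assumption about how trivial subgraphs are treated.

```python
from collections import defaultdict


def idom_tree(adj, s):
    """Immediate dominators {v: d(v)} of the flow graph (adj, s), with d(s) = s."""
    order, seen = [], {s}

    def dfs(x):  # recursive DFS recording a postorder; fine for small examples
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                dfs(y)
        order.append(x)

    dfs(s)
    po = {x: i for i, x in enumerate(order)}  # s finishes last: largest number
    preds = defaultdict(list)
    for x in order:
        for y in adj[x]:
            preds[y].append(x)
    idom = {s: s}

    def meet(a, b):  # nearest common ancestor in the current approximation
        while a != b:
            while po[a] < po[b]:
                a = idom[a]
            while po[b] < po[a]:
                b = idom[b]
        return a

    changed = True  # Cooper-Harvey-Kennedy: iterate until nothing changes
    while changed:
        changed = False
        for v in reversed(order[:-1]):  # reverse postorder, skipping s
            new = None
            for p in preds[v]:
                if p in idom:
                    new = p if new is None else meet(p, new)
            if idom.get(v) != new:
                idom[v], changed = new, True
    return idom


def flow_bridges(adj, s):
    """Bridges of the flow graph (adj, s): edges (u, v) on every path from s to v."""
    idom = idom_tree(adj, s)
    def dominates(v, w):  # is v an ancestor of w in the dominator tree?
        while w not in (v, s):
            w = idom[w]
        return w == v
    preds = defaultdict(list)
    for u in idom:
        for v in adj[u]:
            preds[v].append(u)
    # (d(v), v) is a bridge iff it is the only edge entering v from a vertex v does not dominate
    return {(idom[v], v) for v in idom
            if v != s and [w for w in preds[v] if not dominates(v, w)] == [idom[v]]}


def strong_bridges(adj):
    """Strong bridges of a strongly connected digraph: bridges of G_s plus the
    original edges (u, v) whose reverse (v, u) is a bridge of G^R_s."""
    s = next(iter(adj))
    radj = defaultdict(list)
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)
    return flow_bridges(adj, s) | {(u, v) for (v, u) in flow_bridges(radj, s)}


def sccs(adj, part):
    """SCCs of the subgraph induced by part, by forward/backward search (quadratic)."""
    def reach(x, nbrs):
        seen, stack = {x}, [x]
        while stack:
            for y in nbrs(stack.pop()):
                if y in part and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen
    left, comps = set(part), []
    while left:
        x = next(iter(left))
        comp = reach(x, lambda y: adj[y])
        comp &= reach(x, lambda y: [z for z in part if y in adj[z]])  # backward search
        comps.append(comp)
        left -= comp
    return comps


def m2e_basic(adj):
    """Sketch of M2E-BASIC: maximal 2-edge-connected subgraphs with >= 2 vertices."""
    result, work = [], sccs(adj, set(adj))
    while work:
        part = work.pop()
        sub = {u: [v for v in adj[u] if v in part] for u in part}
        found = strong_bridges(sub) if len(part) > 1 else set()
        if found:  # delete all strong bridges at once, then split into SCCs again
            rest = {u: [v for v in sub[u] if (u, v) not in found] for u in sub}
            work += sccs(rest, part)
        elif len(part) > 1:
            result.append(sorted(part))
    return sorted(result)
```

For example, on the bidirected triangle 0, 1, 2 with vertex 3 attached to vertex 2 in both directions, `strong_bridges` returns {(2, 3), (3, 2)} and `m2e_basic` returns [[0, 1, 2]]. Deleting every strong bridge of a part at once, rather than one at a time, always splits that part, since a strong bridge never has both endpoints in the same strongly connected component after its deletion.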
Basic algorithm for computing the maximal 2-vertex-connected subgraphs. As shown in [7], we can compute the maximal 2-vertex-connected subgraphs of G by applying Property 2.2 recursively. We maintain a collection of strongly connected subgraphs of G, which we refine via Property 2.2 until we are left with subgraphs that are 2-vertex-connected. Each such subgraph is then a maximal 2-vertex-connected subgraph of G. For every strongly connected subgraph J of G, the algorithm picks an arbitrary start vertex s ∈ J and computes the dominator trees $D_J$ and $D^R_J$ of the flow graphs $J_s$ and $J^R_s$, respectively. Then we compute the induced subgraphs of J that contain vertices v
and w that satisfy Property 2.2, and recurse on each such induced subgraph. We refer to this algorithm as M2V-BASIC.

Hierarchical sparsification. The algorithm of Henzinger et al. [12] is based on a fast computation of 2-isolated sets using subgraphs of the input digraph G. We refer to the corresponding algorithms for 2-edge and 2-vertex connectivity as M2E-HKL and M2V-HKL, respectively, from the authors' initials in [12]. As shown in [12], a 2-isolated set S of type (a), i.e., a set of vertices S that cannot be reached from the vertices of V \ S, can be found by computing strongly connected components; a 2-isolated set S of type (b), i.e., a set S that can be reached from V \ S only through one vertex v, can be found by computing dominators in a suitably defined flow graph. To find 2-isolated sets fast, Henzinger et al. apply a technique called hierarchical sparsification. They start the search for a 2-isolated set in a subgraph of G that includes all vertices but only the first incoming edge of each vertex. If no 2-isolated set is found, they repeatedly double the number of incoming edges per vertex in the subgraph until the search is successful. They show that in this way the search takes O(n) time per vertex in the 2-isolated set, which gives an $O(n^2)$ total time bound.

Engineered version of hierarchical sparsification. A simple observation that may help to speed up M2E-HKL and M2V-HKL in practice is that, when we compute strongly connected components in some subgraph constructed by the hierarchical sparsification, we may actually find several 2-isolated sets. Then, instead of recursing on the partition defined by only one of these sets, we can recurse on each subgraph induced by these sets. Moreover, when M2E-HKL (resp., M2V-HKL) searches for a 2-isolated set of type (b), we can use several strong bridges (resp., strong articulation points), rather than a single one, to further refine the strongly connected subgraphs induced by such 2-isolated sets.
We refer to these engineered algorithms as M2E-HKL∗ and M2V-HKL∗, for 2-edge and 2-vertex connectivity, respectively.

| Algorithm | Problem | Technique | Complexity | Reference | Implemented |
| 2V-AUX | 2VCC | Dominator-tree division and auxiliary graphs | O(m + n) | [9] | [7] |
| 2V-LNF | 2VCC | Dominator-tree division and loop nesting forests | O(m + n) | [11] | This paper |
| 2V-LNFD | 2VCC | Memory-efficient version of 2V-LNF | O(m + n) | This paper | This paper |
| 2E-AUX | 2ECC | Dominator-tree division and auxiliary graphs | O(m + n) | [10] | [7] |
| 2E-LNF | 2ECC | Dominator-tree division and loop nesting forests | O(m + n) | [11] | This paper |
| 2E-LNFD | 2ECC | Memory-efficient version of 2E-LNF | O(m + n) | This paper | This paper |
| M2V-BASIC | Max2VCS | Dominator-tree division and induced subgraphs | O(mn) | [7] | [7] |
| M2V-HKL | Max2VCS | Computing 2-isolated sets via hierarchical sparsification | $O(n^2)$ | [12] | This paper |
| M2V-HKL∗ | Max2VCS | Engineered version of M2V-HKL | $O(n^2)$ | This paper | This paper |
| M2V-CHILP | Max2VCS | Strong articulation point division and fast search of small 2-isolated sets | $O(m^{3/2})$ | [5] | This paper |
| M2V-CHILP∗ | Max2VCS | Engineered version of M2V-CHILP | $O(m^{3/2})$ | This paper | This paper |
| M2E-BASIC | Max2ECS | Removing all strong bridges | O(mn) | Folklore | [7] |
| M2E-HKL | Max2ECS | Computing 2-isolated sets via hierarchical sparsification | $O(n^2)$ | [12] | This paper |
| M2E-HKL∗ | Max2ECS | Engineered version of M2E-HKL | $O(n^2)$ | This paper | This paper |
| M2E-CHILP | Max2ECS | Removing a single strong bridge and fast search of small 2-isolated sets | $O(m^{3/2})$ | [5] | This paper |
| M2E-CHILP∗ | Max2ECS | Engineered version of M2E-CHILP | $O(m^{3/2})$ | This paper | This paper |

Table 1: An overview of the algorithms considered in our experimental study. The bounds refer to a digraph with n vertices and m edges.

Local 2-isolated sets search. The algorithm by Chechik et al. [5] extends the straightforward algorithms for computing the maximal 2-vertex- and 2-edge-connected subgraphs. The straightforward algorithm for computing the maximal 2-vertex-connected subgraphs iteratively computes a strong articulation point v, computes the strongly connected components $C_1, C_2, \ldots, C_l$ of $G \setminus \{v\}$, and processes the strongly connected components of $G[C_1 \cup \{v\}], G[C_2 \cup \{v\}], \ldots, G[C_l \cup \{v\}]$ recursively until it reaches a maximal 2-vertex-connected subgraph. The straightforward algorithm for computing the maximal 2-edge-connected subgraphs iteratively removes a strong bridge and recomputes the strongly connected components, until no strong bridges exist. Both algorithms have recursion depth O(n). Chechik et al. [5] showed how to limit the maximum recursion depth to $O(\sqrt{m})$. This is achieved by efficiently identifying all "small" 2-isolated sets between any two iterations of the basic algorithm. Identifying such sets allows the algorithm to treat their vertices separately, since no vertex of such a set can be 2-connected with a vertex outside the set. The authors of [5] introduce a subroutine that either identifies a 2-isolated set containing a specific vertex u whose induced subgraph has at most O(∆) edges, or concludes that any 2-isolated set containing u induces a subgraph with more than ∆ edges. Moreover, the search is executed in O(∆) time.

We first describe the algorithm for computing the maximal 2-edge-connected subgraphs of a graph. The algorithm is recursive. Between recursive executions of the algorithm, all 2-isolated sets whose induced subgraphs contain at most $O(\sqrt{m})$ edges are identified using the local search subroutine (i.e., we set $\Delta = \sqrt{m}$), and the edges connecting them to vertices outside the 2-isolated set are removed. Initially, the algorithm executes such a search from each vertex. Then, to identify all small 2-isolated sets (as we identify sets and remove edges), it suffices to execute a
local search starting from each vertex that had some incident edges deleted. After all small 2-isolated sets are identified, we compute the strongly connected components of the resulting graph and remove a single strong bridge from each of these components. As shown in [5], each strongly connected component either contains at least $\sqrt{m}$ fewer edges than the input graph of the current recursion, or is a maximal 2-edge-connected subgraph. We refer to this algorithm as M2E-CHILP, from the authors' initials in [5].

The algorithm for computing the maximal 2-vertex-connected subgraphs, which we call M2V-CHILP, follows the same high-level idea as M2E-CHILP. However, M2V-CHILP is more complicated in two ways. First, the local search for small 2-isolated sets is more involved. Second, whenever a small 2-isolated set S is found, we create a copy of the boundary vertex (i.e., either the only vertex that has outgoing edges from S or the only vertex that has incoming edges to S), since it may be part of maximal 2-connected subgraphs with vertices that do not belong to S.

Engineered version of 2-isolated sets search. We engineer the algorithm by Chechik et al. [5] as follows. First, when all small 2-isolated sets have been identified, we remove all strong bridges from each strongly connected component instead of
removing only a single one.

| Graph | Type | n | m | 2VCC Max | 2VCC Avg | 2VCC # | 2ECC Max | 2ECC Avg | 2ECC # |
| web-NotreDame [17] | WG | 54.0K | 296.2K | 6.2K | 3.0 | 22K | 16.5K | 35.6 | 760 |
| soc-Epinions1 [17] | SN | 32.2K | 443.5K | 17.6K | 3.4 | 12.6K | 18.1K | 261.0 | 70 |
| Amazon0302 [17] | PCP | 241.8K | 1.1M | 123.6K | 4.1 | 74.8K | 140.2K | 24.6 | 7.3K |
| WikiTalk [17] | SN | 111.9K | 1.5M | 50.2K | 2.9 | 53.6K | 50.3K | 8.4K | 6 |
| web-Stanford [17] | WG | 150.5K | 1.6M | 26.2K | 3.9 | 40.8K | 58.6K | 76.9 | 1.2K |
| Amazon0601 [17] | PCP | 395.2K | 3.3M | 287.6K | 6.0 | 78.1K | 305.9K | 87.7 | 3.7K |
| web-Google [17] | WG | 434.8K | 3.4M | 151.4K | 3.5 | 149.3K | 225.0K | 56.2 | 4.8K |
| web-BerkStan [17] | WG | 334.9K | 4.5M | 64K | 3.4 | 109.1K | 128.2K | 64.4 | 2.9K |
| SAP-4M | MP | 4.1M | 11.9M | 119.7K | 2.6 | 271.9K | 141.5K | 39.6 | 5.3K |
| Oracle-6M | MP | 6.4M | 15.9M | 283K | 2.5 | 1471.6K | 389.4K | 19.0 | 44.9K |
| flickr [16] | SN | 1.6M | 30.3M | 611.4K | 2.7 | 950.5K | 649.6K | 160.1 | 4.2K |
| SAP-11M | MP | 11.1M | 36.4M | 640.9K | 3.0 | 752.3K | 751.8K | 36.7 | 25.8K |
| USA-USA [6] | RN | 23.9M | 57.7M | 16M | 4.2 | 7.4K | 16.1M | 158.7 | 105.7K |
| LiveJournal [17] | SN | 3.8M | 65.3M | 2.9M | 5.4 | 862.5K | 2.9M | 747.7 | 39.5K |
| SAP-32M | MP | 32.3M | 81.8M | 197.4K | 3.0 | 478.4K | 264.1K | 22.3 | 27.0K |
| SAP-70M | MP | 69.8M | 214.9M | 947.5K | 2.3 | 6.9M | 1.3M | 8.7 | 44.8K |
| uk-2002 [3] | WG | 12.0M | 227.5M | 2.1M | 3.3 | 4.1M | 4.7M | 66.9 | 93.4K |

Table 2: The characteristics of the real-world graphs in our dataset; n and m refer to the number of vertices and the number of edges, respectively. The graph types are encoded as follows: road network (RN), peer to peer (P2P), web graph (WG), social network (SN), product co-purchase (PCP), memory profiling (MP). The graphs are sorted in increasing order of their number of edges. Additionally, we report statistics of their 2-vertex- and 2-edge-connected components (2VCC and 2ECC): the maximum size (Max), the average size (Avg), and the number of components (#), where the size of a component is its number of vertices.

For computing the maximal 2-vertex-connected subgraphs, we apply a single iteration of M2V-BASIC, that is, we refine the current subgraphs using Property 2.2. Furthermore, the original algorithm maintains a queue of vertices from which it has to search for a 2-isolated set with at most $O(\sqrt{m})$ edges. Instead, we maintain $\log\sqrt{m}$ levels of queues. Each vertex from which we wish to search for a 2-isolated set is initially placed into the queue at level 1. Iteratively, we extract a vertex v from the non-empty queue with minimum level j, $1 \le j \le \log\sqrt{m}$, and we execute a local search for a 2-isolated set containing v with at most $O(2^j)$ edges. If the search identifies such a set, then we have found a small 2-isolated set containing v, and therefore v is removed from the queues. Assume now that the search fails to identify a 2-isolated set containing v with at most $O(2^j)$ edges. If $j = \log\sqrt{m}$, v is removed from the queues, since there is no 2-isolated set containing it with at most $O(\sqrt{m})$ edges, as required by the original algorithm. If $j < \log\sqrt{m}$, v is placed in the queue at level j + 1. This modification allows the algorithm to identify very small 2-isolated sets without necessarily spending $O(\sqrt{m})$ time, but only time proportional to their size. In the worst case, our modified algorithm spends $2 + 4 + 8 + \cdots + \sqrt{m}/2 + \sqrt{m} = O(\sqrt{m})$ time for each local search, and therefore the asymptotic running time of the original algorithm is not affected. Finally,
during this process, after searching for all 2-isolated sets containing up to $\Delta_1 = m^{1/4}$ edges, and before proceeding to larger sizes (since we would search up to $\Delta = \sqrt{m}$), we test whether each resulting strongly connected component is smaller than the input graph of the current recursion by at least $\sqrt{m}$ edges. We do the same after searching for all sets containing up to $\Delta_2 = m^{1/3}$ edges. As a result, we are able to avoid searching for larger sets if sufficiently many edges are removed, and we also guarantee the desired $O(\sqrt{m})$ recursion depth of the algorithm. This last modification takes advantage of the fact that the algorithms M2E-BASIC and M2V-BASIC have very small recursion depth in practice [7]. We refer to our engineered versions of M2E-CHILP and M2V-CHILP as M2E-CHILP∗ and M2V-CHILP∗, respectively. The main algorithms considered in this paper are summarized in Table 1.
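The multilevel queue scheduling described above can be sketched as follows. This is a minimal Python illustration, not the authors' C++ code: the local search itself is abstracted as a hypothetical callback `local_search(v, budget)` that returns a 2-isolated set containing `v` with at most `budget` edges (constants in $O(2^j)$ are dropped, so the budget at level $j$ is exactly $2^j$), and the sketch deliberately ignores the graph updates that follow a successful search (deleting the set and recursing on strongly connected components), as well as the $\Delta_1$/$\Delta_2$ checkpoints.

```python
import math
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Set


def num_levels(m: int) -> int:
    """Number of queue levels: ceil(log2(sqrt(m))), and at least 1."""
    if m <= 1:
        return 1
    return max(1, math.ceil(math.log2(math.sqrt(m))))


def schedule_local_searches(
    vertices: Iterable[Hashable],
    m: int,
    local_search: Callable[[Hashable, int], Optional[Set[Hashable]]],
) -> Dict[Hashable, Optional[Set[Hashable]]]:
    """Schedule budgeted local searches with one FIFO queue per level.

    `local_search(v, budget)` is a hypothetical callback that returns a
    2-isolated set containing v with at most `budget` edges, or None.
    The result maps each vertex to the set found, or to None if the
    largest budget (2**levels >= sqrt(m)) also failed.
    """
    levels = num_levels(m)
    queues: List[deque] = [deque() for _ in range(levels + 1)]  # index 0 unused
    queues[1].extend(vertices)  # every vertex starts at level 1
    result: Dict[Hashable, Optional[Set[Hashable]]] = {}
    while True:
        # Extract from the non-empty queue with the minimum level j.
        j = next((lvl for lvl in range(1, levels + 1) if queues[lvl]), None)
        if j is None:
            return result
        v = queues[j].popleft()
        found = local_search(v, 2 ** j)  # edge budget 2^j at level j
        if found is not None:
            result[v] = found          # small 2-isolated set found: drop v
        elif j == levels:
            result[v] = None           # largest budget failed: drop v
        else:
            queues[j + 1].append(v)    # retry later with a doubled budget
```

Since a vertex is retried with budgets $2, 4, \ldots, 2^L$ with $L = \lceil \log_2 \sqrt{m} \rceil$, its total budget is $2^{L+1} - 2 < 4\sqrt{m}$, matching the $O(\sqrt{m})$ bound above. Scanning the levels for the first non-empty queue costs $O(\log m)$ per extraction, which is acceptable for a sketch.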
3 Experimental Analysis
Here we report the results of the experiments that we conducted, using the algorithms shown in Table 1. We implemented all our algorithms in C++ without the use of any external graph library. Specifically, for the computation of the 2-edge- and 2-vertex-connected components, we implemented the two loop-nesting-tree-based algorithms, 2E-LNF and 2V-LNF, together with their memory-efficient versions 2E-LNFD and 2V-LNFD. For the computation of the maximal 2-edge- and 2-vertex-connected subgraphs, we implemented the algorithms of Chechik et al. [5], M2E-CHILP and M2V-CHILP, and of Henzinger et al. [12], M2E-HKL and M2V-HKL, together with their engineered versions, M2E-CHILP∗ and M2V-CHILP∗, and M2E-HKL∗ and M2V-HKL∗, respectively. For the remaining four algorithms, 2E-AUX, 2V-AUX, M2E-BASIC, and M2V-BASIC, we used the implementations of [7]. (See Table 1.) We compiled our code with g++ v.4.8.4 with full optimization (flag -O3). The experiments were conducted on a 64-bit GNU/Linux machine running Ubuntu 14.04 LTS. The machine uses a 3696 MHz Intel i7-4790 octa-core processor, 20GB of RAM, 16MB of L3 cache, and each core has a 2MB private L2 cache. We measured CPU running time using the getrusage function, and memory consumption using Valgrind version 3.11 (http://valgrind.org/). All experiments were executed on a single core without any parallelization. For our experiments we considered several real-world graphs, mostly taken from network collections [3, 6, 16, 17], whose sizes range from ten thousand vertices and a hundred thousand edges to ten million vertices and a hundred million edges. The characteristics of the graphs we consider are summarized in Table 2. Additionally, we generated random graphs with specific properties in order to analyze the performance of some algorithms in more depth. We next describe our experimental findings.

Copyright © 2018 by SIAM Unauthorized reproduction of this article is prohibited

[Figure 2 plot: running time in µs/edge vs. number of edges (log scale) for each real-world graph; top panel: 2V-AUX, 2V-LNF, 2V-LNFD; bottom panel: 2E-AUX, 2E-LNF, 2E-LNFD.]
Figure 2: Comparison of the algorithms 2V-AUX, 2V-LNF, and 2V-LNFD (top) and of the algorithms 2E-AUX, 2E-LNF, and 2E-LNFD (bottom) for computing the 2-vertex- and 2-edge-connected components, respectively, of the real-world graphs. Running times are shown in µs/edge. (Better viewed in color).

3.1 Experimental results for computing the 2-connected components
In this section we evaluate the performance of the algorithms for the computation of 2-connected components.

Real-world graphs. We start by evaluating the performance of the algorithms for the computation of the 2-vertex-connected components, i.e., 2V-AUX, 2V-LNF, and 2V-LNFD, on our datasets; their running times are presented in Figure 2 (top). We observe that algorithm 2V-LNF is faster than 2V-AUX by a factor of 4.2 on average. Also, on average, 2V-LNF is slightly faster than 2V-LNFD. This can be explained by considering that for sparse input graphs few parallel edges are expected to be inserted into the dynamic lists used by the loop nesting tree computation: in this scenario, 2V-LNFD pays the overhead of maintaining the additional arrays needed to filter all the edges, without benefiting from this filtering phase. Furthermore, we notice that both 2V-LNF and 2V-LNFD are more robust than 2V-AUX, as their running times appear to be less sensitive to the structure of the input graph. The performance of 2V-AUX, on the other hand, depends more on the graph structure, which affects the number and size of the auxiliary graphs. More specifically, 2V-AUX is faster in graphs with few 2-vertex-connected components, because it creates fewer auxiliary graphs (even if they are large).

[Figure 3 plot: running time in µs/edge vs. number of edges (log scale) on the random graphs Rand-11D to Rand-536D; top panel: 2V-AUX, 2V-LNF, 2V-LNFD; middle panel: 2E-AUX, 2E-LNF, 2E-LNFD; bottom panel: LNF, LNFD.]
Figure 3: Comparison of the algorithms 2V-AUX, 2V-LNF, and 2V-LNFD (top) and of the algorithms 2E-AUX, 2E-LNF, and 2E-LNFD (middle) for computing the 2-vertex- and 2-edge-connected components, respectively, of the random graphs. Comparison of the algorithms LNF and LNFD for computing the loop nesting forest of the same random graphs (bottom). The number of vertices is fixed to 100K and the number of edges, selected uniformly at random, ranges from 1.1M to 53.6M. (Better viewed in color).

Now we evaluate the performance of the algorithms 2E-AUX, 2E-LNF, and 2E-LNFD for the computation of the 2-edge-connected components. Figure 2 (bottom) plots the corresponding running times. On average, algorithm 2E-LNF is 1.69 times faster than 2E-AUX and
10.38% faster than 2E-LNFD. Hence, as in the computation of the 2-vertex-connected components, the algorithm based on loop nesting trees, 2E-LNF, achieves the best overall performance. Notice, however, that the gap between 2E-LNF and 2E-AUX is smaller than the gap between 2V-LNF and 2V-AUX. This is because the auxiliary graphs created by 2E-AUX are less complicated than those created by 2V-AUX, and they also require less memory.

Random graphs. Since all the real-world graphs tend to be sparse, we conducted additional experiments to understand the behavior of the algorithms for the computation of the 2-connected components in dense
graphs. We considered random graphs with a fixed number of vertices n = 100K and density ranging from 11 to 536 (where each edge is selected uniformly at random). First, we evaluate the computation of the 2-vertex-connected components of the random graphs; the results are plotted in Figure 3 (top). We observe that 2V-LNF gradually loses its advantage over 2V-AUX, and it even becomes significantly slower for very dense graphs. The bottleneck in the case of dense graphs is the loop nesting tree computation, since it introduces many memory writes when it inserts edges into the dynamic lists of the loop heads. On the other hand, 2V-LNFD was designed to filter many unnecessary insertions into those dynamic lists, and hence it performs consistently faster than both 2V-AUX and 2V-LNF.

[Figure 4 plot: memory in bytes/edge (log scale) vs. number of edges (log scale) for each real-world graph; top panel: 2V-AUX, 2V-LNF, 2V-LNFD; bottom panel: 2E-AUX, 2E-LNF, 2E-LNFD.]
Figure 4: Memory consumption of the algorithms 2V-AUX, 2V-LNF, and 2V-LNFD (top) and of the algorithms 2E-AUX, 2E-LNF, and 2E-LNFD (bottom) for computing the 2-vertex-connected and the 2-edge-connected components, respectively, of the real-world graphs.

Next, we evaluate the performance of the algorithms 2E-AUX, 2E-LNF, and 2E-LNFD as the density of the input graph increases. The running times on these random graphs are plotted in Figure 3 (middle). As with 2V-LNFD compared to 2V-LNF, algorithm 2E-LNFD becomes significantly faster than 2E-LNF as the graph density increases. Moreover, the absolute running times between 2V-LNF and 2E-LNF, and
between 2V-LNFD and 2E-LNFD, are very similar. In Figure 3 (bottom) we present separately the comparison between the two versions of the algorithm that computes the loop nesting forest. This makes clear that the running-time advantage of the memory-efficient version grows considerably as the graphs get denser. The fact that the loop nesting tree computation is part of both 2E-LNF and 2V-LNF explains the similar behavior of 2V-LNF and 2E-LNF, and of 2V-LNFD and 2E-LNFD. Notice also that for the computation of 2-edge-connected components, algorithm 2E-AUX performs faster than both 2E-LNF and 2E-LNFD. As noted earlier, this is because the auxiliary graphs created by 2E-AUX are less complicated than those created by 2V-AUX, and they also require less memory. This also becomes clear from the difference in the absolute running times of 2E-AUX and 2V-AUX.

Effect of the number of 2-edge- and 2-vertex-connected components on 2V-AUX and 2E-AUX. We further investigate how the number of 2-vertex- and 2-edge-connected components affects the running times of 2V-AUX and 2E-AUX, respectively. To assess
this dependence, we executed 2V-AUX and 2E-AUX on the following types of artificial graphs. For a fixed number of vertices, n = 100K, we increase the density in small steps of 0.5, each time creating three types of graphs: (i) a graph that is 2-vertex-connected (resp., 2-edge-connected), (ii) a graph containing 42 2-vertex-connected (resp., 2-edge-connected) components with sizes ranging from 2 to n/5, and (iii) a graph containing 10K components of equal size. Our experiments suggest that the running time of the algorithms increases, by up to a factor of 3 compared to 2-vertex-connected (resp., 2-edge-connected) graphs, as the number of 2-vertex-connected (resp., 2-edge-connected) components increases.

Graph          | Max2VCS Max | Max2VCS Avg | Max2VCS # | Max2ECS Max | Max2ECS Avg | Max2ECS #
web-NotreDame  | 1.5K        | 20.2        | 893       | 6390        | 41.6        | 478
soc-Epinions1  | 17.1K       | 84.9        | 210       | 17516       | 300.4       | 59
Amazon0302     | 55.4K       | 7.8         | 19789     | 81423       | 12.0        | 12874
WikiTalk       | 49.4K       | 1768.5      | 28        | 49506       | 16504.0     | 3
web-Stanford   | 10.9K       | 16.4        | 2936      | 21771       | 29.1        | 1708
Amazon0601     | 276.0K      | 35.0        | 9341      | 296281      | 75.1        | 4301
web-Google     | 77.5K       | 12.3        | 15957     | 155041      | 32.2        | 6454
web-BerkStan   | 29.1K       | 15.7        | 8104      | 56166       | 27.8        | 4744
SAP-4M         | 2.5K        | 15.1        | 1883      | 2501        | 17.1        | 1900
Oracle-6M      | 3.6K        | 9.6         | 47430     | 3591        | 10.2        | 44127
flickr         | 605K        | 37.84       | 17735     | 641329      | 162.7       | 4044
SAP-11M        | 6.3K        | 20.1        | 1480      | 6340        | 20.1        | 1480
USA-USA        | 16.0M       | 148.8       | 112780    | 16051072    | 158.7       | 105704
LiveJournal    | 2.9M        | 153.7       | 19202     | 2914808     | 779.2       | 3771
SAP-32M        | 6.6K        | 10.2        | 7280      | 6616        | 10.5        | 7280
SAP-70M        | 7.0K        | 13.9        | 10630     | 7045        | 13.9        | 10629
uk-2002        | 802.5K      | 15.7        | 282831    | 2649794     | 48.5        | 101051

Table 3: Statistics regarding the maximal 2-vertex-connected (Max2VCS) and 2-edge-connected subgraphs (Max2ECS) of the real-world graphs of Table 2. Only non-trivial subgraphs (i.e., subgraphs with at least 3 vertices) are considered. The reported numbers refer to the maximum number of vertices in a subgraph (Max), the average number of vertices in each subgraph (Avg), and the total number of non-trivial subgraphs (#).

Memory consumption. We analyze the memory consumption of the algorithms for the computation of the 2-connected components on real-world and random graphs. Figure 4 (top) plots the memory usage of the algorithms 2V-AUX, 2V-LNF, and 2V-LNFD on the real-world graphs that we consider. We observe that 2V-LNF and 2V-LNFD require significantly less memory than 2V-AUX for all input graphs. On average, 2V-LNFD uses about 3.2 times less memory than 2V-AUX, and reduces the memory consumption of 2V-LNF by about 10.6%. Figure 4 (bottom) compares the memory consumption of the algorithms 2E-AUX, 2E-LNF, and 2E-LNFD for computing the 2-edge-connected components of a digraph. We notice that 2E-LNFD requires 30.26% less memory compared
to 2E-LNF, and 3.66 times less memory than 2E-AUX. Since all the real-world graphs that we consider are sparse, we conducted further experiments on the same dense random graphs that we considered in Figure 3. We observe that 2V-LNFD reduces the memory consumption by a factor of 6.20 compared to 2V-AUX and by 26% compared to 2V-LNF. Finally, our experiments show that 2E-LNFD consumes 25% less memory than 2E-LNF and 4.43 times less memory than 2E-AUX, on average.

3.2 Experimental results for computing the maximal 2-connected subgraphs
In this section we evaluate the performance of the algorithms for the computation of maximal 2-connected subgraphs.

Real-world graphs. First we evaluate the performance of the algorithms M2V-BASIC, M2V-CHILP, M2V-CHILP∗, M2V-HKL, and M2V-HKL∗, which compute the maximal 2-vertex-connected subgraphs, on the real-world graphs of Table 2. Table 3 summarizes the characteristics of the maximal 2-vertex- and 2-edge-connected subgraphs of these graphs. We begin by highlighting the speedup that our engineered versions M2V-CHILP∗ and M2V-HKL∗ achieve over straightforward implementations of M2V-CHILP and M2V-HKL, respectively. Figure 5 (top) plots the corresponding running times of M2V-CHILP and M2V-CHILP∗. We observe that our engineered version M2V-CHILP∗ runs on average approximately 1.4 orders of magnitude faster than the original version M2V-CHILP. Also, as shown in Figure 5 (bottom), our engineered version M2V-HKL∗ provides a substantial improvement over the original version M2V-HKL, as it runs approximately 2.1 orders of magnitude faster. We draw similar conclusions for the algorithms M2E-CHILP∗ and M2E-HKL∗ compared to M2E-CHILP and M2E-HKL, respectively.

[Figure 5 plot: running time in µs/edge (log scale) vs. number of edges (log scale) for each real-world graph; top panel: M2V-CHILP∗, M2V-CHILP; bottom panel: M2V-HKL∗, M2V-HKL.]
Figure 5: Comparison between the algorithms M2V-CHILP and M2V-CHILP∗ (top), and the algorithms M2V-HKL∗ and M2V-HKL (bottom), for computing the maximal 2-vertex-connected subgraphs of the real-world graphs in our dataset. Running times are shown in µs/edge (in log scale). Executions running longer than 24 hours were terminated (this affected M2V-HKL only). (Better viewed in color).

Next, we compare the running times of the engineered versions M2V-CHILP∗ and M2V-HKL∗, and of M2V-BASIC, on the real-world graphs. Figure 6 (top) plots the corresponding running times. Despite its asymptotically inferior running time, M2V-BASIC achieves superior practical performance in all instances considered. In particular, M2V-BASIC runs on average 3.8 times faster than M2V-CHILP∗ and approximately 1.9 orders of magnitude faster than M2V-HKL∗. In our experiments, M2V-CHILP∗ runs on average 1.4 orders of magnitude faster than M2V-HKL∗. Note that the differences in performance between algorithms tend to be greater for maximal 2-vertex-connected subgraphs (Max2VCS) than for 2-vertex-connected components (2VCC): this is explained by the fact that the fastest algorithms for 2VCC run in linear time, while the fastest algorithms for Max2VCS require super-linear time.

For the maximal 2-edge-connected subgraphs of the real-world graphs, the comparison of our engineered versions M2E-CHILP∗ and M2E-HKL∗ with M2E-CHILP and M2E-HKL, respectively, shows behavior similar to the case of the maximal 2-vertex-connected subgraphs. Algorithm M2E-CHILP∗ outperforms M2E-CHILP by 1.47 orders of magnitude, and M2E-HKL∗ runs 2.90 orders of magnitude faster than M2E-HKL. Next, we compare our engineered versions M2E-CHILP∗ and M2E-HKL∗ with algorithm M2E-BASIC. We plot in Figure 6 (bottom) the running times of these algorithms on the real-world graphs. Algorithm M2E-BASIC performs better than M2E-CHILP∗ and M2E-HKL∗ in almost all instances. More specifically, M2E-BASIC runs 33% faster than M2E-CHILP∗ and 1.18 orders of magnitude faster than M2E-HKL∗.

Pathological instances. We also tested the algorithms on pathological instances that cause algorithm M2V-BASIC to have linear recursion depth, i.e., Θ(n), where n is the number of vertices. We define a graph family BAD(k, d), where parameters k and d control,
respectively, the number of vertices and the number of edges: the graph has $n = 2k + 1$ vertices and $m \approx dn$ edges. To construct BAD(k, d) for $k > 1$ and $d \ge 3$, we begin with the base graph BAD(k, 3), depicted in Figure 7, and add edges until the graph density $m/n$ is approximately equal to $d$. Specifically, we add edges of the following types: (i) $(x_i, s)$ for $1 \le i < k$, (ii) $(x_i, x_j)$ for $k \ge i > j \ge 1$, (iii) $(y_i, y_j)$ for $1 \le i < j \le k$, and (iv) $(x_i, y_j)$ for $i \ne j$. Note that the addition of the above edges does not create any non-trivial 2-vertex- or 2-edge-connected subgraphs.

[Figure 6 plot: running time in µs/edge (log scale) vs. number of edges (log scale) for each real-world graph; panels: M2V-BASIC, M2V-CHILP∗, M2V-HKL∗ and M2E-BASIC, M2E-CHILP∗, M2E-HKL∗.]
Figure 6: Comparison of the algorithms M2V-BASIC, M2V-CHILP∗, and M2V-HKL∗ (top) and of the algorithms M2E-BASIC, M2E-CHILP∗, and M2E-HKL∗ (bottom) for computing the maximal 2-vertex- and 2-edge-connected subgraphs, respectively, of the real-world graphs. Running times are shown in µs/edge (in log scale). (Better viewed in color).

In our experiments, we fix the number of vertices n to be 10K and gradually increase the graph density, from 3 up to 5000, by adding edges of types (i)-(iv). The running times of algorithms M2V-BASIC, M2V-CHILP∗, and M2V-HKL∗ are plotted in Figure 8 (top). As expected, M2V-BASIC performs poorly on these instances. Our engineered algorithms M2V-CHILP∗ and M2V-HKL∗, on the other hand, run much faster. In particular, M2V-CHILP∗ achieves a speedup of approximately 2.4 and 1.7 orders of magnitude compared to M2V-BASIC and M2V-HKL∗, respectively. This behavior can be explained as follows. In every recursive call, M2V-BASIC identifies a single strong articulation point, whose removal disconnects only a single vertex from the rest of the graph. Notice that in
each recursive execution, the vertex that is separated from the rest of the graph and the strong articulation point together form a 2-isolated set of two vertices. On the other hand, M2V-HKL∗ is able to identify each 2-isolated set containing 2 vertices, which appears repeatedly, by spending only linear time in a sparse graph with n vertices and at most 2n edges. M2V-CHILP∗, in turn, repeatedly identifies all small 2-isolated sets efficiently (i.e., in time proportional to their size) without searching for a strong articulation point in the whole graph, and hence it quickly identifies all the small 2-isolated sets that appear recursively.

Finally, we compare the algorithms M2E-BASIC, M2E-CHILP∗, and M2E-HKL∗ on the same pathological instances. As in the case of the maximal 2-vertex-connected subgraphs, M2E-CHILP∗ performs significantly better than M2E-BASIC and M2E-HKL∗ when computing the maximal 2-edge-connected subgraphs. More specifically, on average, M2E-CHILP∗ outperforms M2E-BASIC and M2E-HKL∗ by 2.78 and 1.66 orders of magnitude, respectively. The results are shown in Figure 8 (bottom).
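The densification step that turns the base graph BAD(k, 3) into BAD(k, d) can be sketched as follows. This is a hypothetical Python helper, not the authors' generator: the edge set of the base graph is only given in Figure 7, so it is passed in as `base_edges`; the vertex names `'s'`, `('x', i)`, `('y', i)` and the order in which candidates are added (types (i) to (iv) in sequence) are assumptions made for illustration.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Set, Tuple, Union

# Hypothetical vertex naming: 's', ('x', i) for x_i, and ('y', i) for y_i.
Vertex = Union[str, Tuple[str, int]]
Edge = Tuple[Vertex, Vertex]


def x(i: int) -> Vertex:
    return ("x", i)


def y(i: int) -> Vertex:
    return ("y", i)


def extra_edges(k: int) -> Iterator[Edge]:
    """Candidate edges of types (i)-(iv), yielded one type after another."""
    t1 = ((x(i), "s") for i in range(1, k))                                    # (i)   1 <= i < k
    t2 = ((x(i), x(j)) for i in range(k, 0, -1) for j in range(i - 1, 0, -1))  # (ii)  k >= i > j >= 1
    t3 = ((y(i), y(j)) for i in range(1, k + 1) for j in range(i + 1, k + 1))  # (iii) 1 <= i < j <= k
    t4 = ((x(i), y(j)) for i in range(1, k + 1)
          for j in range(1, k + 1) if i != j)                                   # (iv)  i != j
    return chain(t1, t2, t3, t4)


def densify(k: int, base_edges: Iterable[Edge], d: float) -> List[Edge]:
    """Add candidate edges until m reaches about d * n (or candidates run out)."""
    n = 2 * k + 1
    edges = list(base_edges)
    present: Set[Edge] = set(edges)
    for e in extra_edges(k):
        if len(edges) >= d * n:
            break
        if e not in present:  # never duplicate an edge of the base graph
            present.add(e)
            edges.append(e)
    return edges
```

Counting the candidates gives $(k-1) + 2 \cdot \frac{k(k-1)}{2} + k(k-1) = (k-1)(2k+1) = (k-1)n$ extra edges, so with n = 10K (k about 5000) densities of about 5000 are reachable, which is consistent with the range used in the experiments above.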
4 Conclusions
In this paper we revisited the problem of computing the 2-edge- and 2-vertex-connected components, and the maximal 2-edge- and 2-vertex-connected subgraphs, of a directed graph in practice. We implemented and engineered newly proposed algorithms, and compared them experimentally against the fastest previously known algorithms. To that end, we conducted an extensive empirical analysis, using real-world and synthetic graphs, in order to assess the different aspects of each approach. Our main findings suggest that for the computation of the 2-vertex- and 2-edge-connected components, the new loop-nesting-based algorithms from [11] run substantially faster than previous approaches, and moreover use significantly less memory, especially in sparse graphs. This makes them suitable for large-scale real-world graphs, which are known to be inherently sparse. For dense graphs, we showed that our proposed variant for computing the loop nesting tree achieves a significant speedup compared to implementations that are oblivious to memory usage. In our experiments for computing the maximal 2-vertex- and 2-edge-connected subgraphs, the asymptotically inferior algorithms based on dominator tree decomposition [7] outperformed the state-of-the-art algorithms [5, 12] in real-world graphs. The algorithms of [7], however, are sensitive to pathological instances. Our carefully engineered versions of the algorithms by Chechik et al. [5], on the other hand, performed close to the algorithms of [7] in real-world graphs, and were faster by more than two orders of magnitude in pathological graphs.

[Figure 7 drawing: the base digraph for k = 5, with vertices s, $x_1, \ldots, x_5$ and $y_1, \ldots, y_5$.]
Figure 7: A family of digraphs that elicits the worst-case behavior of algorithms M2V-BASIC and M2E-BASIC. For a given parameter k, the digraph has $n = 2k + 1$ vertices and $m = 6k - 1 = 3n - 4$ edges. A double-headed arrow corresponds to two parallel but oppositely directed edges. Vertex $y_k$ (shown in red) is the only strong articulation point, which when deleted leaves two strongly connected components: an isolated vertex $x_k$ and the digraph of the family with parameter $k - 1$. Similarly, edge $(y_k, x_k)$ is the only strong bridge, which when deleted leaves vertex $x_k$ as a singleton strongly connected component. Therefore, this digraph has no non-trivial 2-vertex- or 2-edge-connected subgraphs.

Acknowledgements
We thank Bob Foster for providing the "Oracle" graphs, and Andreas Buchen and Krum Tsvetkov for providing the "SAP" graphs.

References
[1] S. Alstrup, D. Harel, P. W. Lauridsen, and M. Thorup. Dominators in linear time. SIAM Journal on Computing, 28(6):2117–2132, 1999.
[2] J. Bang-Jensen and G. Gutin. Digraphs: Theory, Algorithms and Applications (Springer Monographs in Mathematics). Springer, 1st ed. 2001. 3rd printing edition, 2002.
[3] P. Boldi and S. Vigna. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW 2004), pages 595–601, 2004.
[4] A. L. Buchsbaum, L. Georgiadis, H. Kaplan, A. Rogers, R. E. Tarjan, and J. R. Westbrook. Linear-time algorithms for dominators and other path-evaluation problems. SIAM Journal on Computing, 38(4):1533–1573, 2008.
[5] S. Chechik, T. Dueholm Hansen, G. F. Italiano, V. Loitzenbauer, and N. Parotsidis. Faster algorithms for computing maximal 2-connected subgraphs in sparse directed graphs. In Proc. 28th ACM-SIAM Symp. on Discrete Algorithms, pages 1900–1918, 2017.
[6] C. Demetrescu, A. V. Goldberg, and D. S. Johnson. 9th DIMACS Implementation Challenge: Shortest Paths. http://www.dis.uniroma1.it/~challenge9/, 2007.
[7] W. Di Luigi, L. Georgiadis, G. F. Italiano, L. Laura, and N. Parotsidis. 2-connectivity in directed graphs: An experimental study. In Proc. 17th Workshop on Algorithm Engineering and Experiments, pages 173–187, 2015.
[8] Ya. M. Erusalimskii and G. G. Svetlov. Bijoin points, bibridges, and biblocks of directed graphs. Cybernetics, 16(1):41–44, 1980.
[9] L. Georgiadis, G. F. Italiano, L. Laura, and N. Parotsidis. 2-vertex connectivity in directed graphs. In Proc. 42nd Int'l. Coll. on Automata, Languages, and Programming, pages 605–616, 2015.
[10] L. Georgiadis, G. F. Italiano, L. Laura, and N. Parotsidis. 2-edge connectivity in directed graphs. ACM Transactions on Algorithms, 13(1):9:1–9:24, October 2016. Announced at SODA 2015.
[11] L. Georgiadis, G. F. Italiano, and N. Parotsidis. Strong connectivity in directed graphs under failures, with applications. In Proc. 28th ACM-SIAM Symp. on Discrete Algorithms, pages 1880–1899, 2017.
[12] M. Henzinger, S. Krinninger, and V. Loitzenbauer. Finding 2-edge and 2-vertex strongly connected components in quadratic time. In Proc. 42nd Int'l. Coll. on Automata, Languages, and Programming, pages 713–724, 2015.
[13] G. F. Italiano, L. Laura, and F. Santaroni. Finding strong bridges and strong articulation points in linear time. Theoretical Computer Science, 447:74–84, 2012.
[14] R. Jaberi. Computing the 2-blocks of directed graphs. RAIRO-Theor. Inf. Appl., 49(2):93–119, 2015.
[15] R. Jaberi. On computing the 2-vertex-connected components of directed graphs. Discrete Applied Mathematics, 204:164–172, 2016.
[16] J. Kunegis. KONECT: the Koblenz network collection. In 22nd International World Wide Web Conference, WWW, pages 1343–1350, 2013.
[17] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
[18] S. Makino. An algorithm for finding all the k-components of a digraph. International Journal of Computer Mathematics, 24(3–4):213–221, 1988.
[19] H. Nagamochi and T. Ibaraki. Algorithmic Aspects of Graph Connectivity. Cambridge University Press, 2008. 1st edition.
[20] H. Nagamochi and T. Watanabe. Computing k-edge-connected components of a multigraph. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E76-A(4):513–517, 1993.
[21] G. Ramalingam. On loops, dominators, and dominance frontiers. ACM Transactions on Programming Languages and Systems, 24(5):455–490, 2002.
[22] J. H. Reif and P. G. Spirakis. Strong k-connectivity in digraphs and random digraphs. Technical Report TR-25-81, Harvard University, 1981.
[23] R. E. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146–160, 1972.
[24] R. E. Tarjan. Efficiency of a good but not linear set union algorithm. Journal of the ACM, 22(2):215–225, 1975.
[25] R. E. Tarjan. Edge-disjoint spanning trees and depth-first search. Acta Informatica, 6(2):171–185, 1976.

[Figure 8 plot: running time in µs/edge (log scale) vs. number of edges (log scale) on the instances BAD(3) to BAD(5000); top panel: M2V-BASIC, M2V-CHILP∗, M2V-HKL∗; bottom panel: M2E-BASIC, M2E-CHILP∗, M2E-HKL∗.]
Figure 8: Running times, shown in µs/edge (in log scale), for computing the maximal 2-vertex-connected (top) and 2-edge-connected (bottom) subgraphs of the pathological instances BAD.