An Empirical Study of Dynamic Graph Algorithms

David Alberts†

Institut für Informatik, Martin-Luther-Universität Halle-Wittenberg, Germany. [email protected]

Giuseppe Cattaneo

Dipartimento di Informatica ed Applicazioni, Università di Salerno, Italy. [email protected]

Giuseppe F. Italiano‡

Dipartimento di Matematica Applicata ed Informatica, Università "Ca' Foscari" di Venezia, Italy. [email protected]

Abstract

The contributions of this paper are both of theoretical and of experimental nature. From the experimental point of view, we conduct an empirical study of some recently developed dynamic connectivity algorithms. In particular, the following implementations were tested and compared with simple algorithms: simple sparsification by Eppstein et al. and the recent randomized algorithm by Henzinger and King. In our experiments, we considered both random and non-random inputs. Moreover, we present a simplified variant of the algorithm by Henzinger and King, which for random inputs was always faster than the original implementation. For non-random inputs, simple sparsification was the fastest algorithm for small sequences of updates; for medium and large sequences of updates, the original algorithm by Henzinger and King was faster. From the theoretical point of view, we analyze the average case running time of simple sparsification and prove that for dynamic random graphs its logarithmic overhead vanishes.

Work partly supported by the Commission of the European Communities under the ESPRIT-LTR Project no. 20244 (ALCOM-IT). A preliminary version of this work appears in [3].
† Part of this work was supported by the Deutsche Forschungsgemeinschaft, Grant We 1265/2-1, and Grant We 1265/5-1 (Leibniz-Preis).
‡ Work supported in part by a Research Grant from Università di Venezia "Ca' Foscari" and the Italian MURST Project "Efficienza di Algoritmi e Progetto di Strutture Informative". Part of this work was done while at the University of Salerno and while visiting ICSI, Berkeley, CA 94704.

1 Introduction

A dynamic graph algorithm maintains a certain property, such as connectivity, bipartiteness, or a minimum spanning tree, of a dynamically changing graph. Usually, this is realized by means of a data structure allowing three kinds of operations: inserting an edge, deleting an edge, and answering a query. For the dynamic connectivity problem, for example, a query takes two nodes u and v as its arguments and returns "True" if there is a path connecting u and v in the current graph. The field of dynamic graph algorithms has been a blossoming field of research in the last years [4, 9, 11, 13, 14, 15, 17, 19, 31, 33], motivated by theoretical and practical questions (see for instance [29]). However, despite this blend of theoretical and practical interests, we are aware of no implementations and experimental studies in this field. In this paper, we aim at bridging this gap by studying the practical properties of theoretically interesting dynamic graph algorithms that were developed recently.

The contributions of this paper are both of theoretical and of experimental nature. From the theoretical point of view, we analyze the average case running time of simple sparsification [11] in a fairly general random graph model, and prove that in this case the worst-case logarithmic overhead vanishes. More precisely, we prove that the average number of sparsification tree nodes affected by an update is a small constant, independent of the size of the graph: this is confirmed by our experiments on random inputs. In the model for random graph updates that we use, the current graph G is always a random subgraph of some prescribed supergraph Ĝ. Each possible subgraph, i.e., each subgraph of Ĝ with the same number of edges as G, is equally likely. The type of an update (insertion or deletion of an edge) is not random, but edges to be deleted are chosen uniformly at random from the current edge set, and edges to be inserted are chosen uniformly at random from the set E(Ĝ) \ E(G). An analysis with respect to this model holds for special update type patterns (like insertions only, or insertions and deletions regularly alternating), and it holds for random graphs with special properties (e.g., bipartiteness) which can be imposed by choosing a suitable supergraph Ĝ. Alberts and Henzinger [4] proposed this model for dynamic graph algorithms (similar models were used in computational geometry before, see, e.g., [25, 32]) and showed that a broad class of dynamic graph algorithms can be sped up for inputs with this type of restricted randomness.

Our second contribution is more of an experimental nature: we conducted an extensive empirical study of dynamic connectivity algorithms. In particular, we implemented simple sparsification and the recent randomized polylogarithmic dynamic connectivity algorithm of Henzinger and King [19]. Moreover, we propose a simplified variant of the algorithm of [19]. The worst case running time of this variant is O(log n) for insertions and queries, and O(m log n) for deletions, where n is the current number of vertices and m the current number of edges in the graph. For random inputs, however, we conjecture an expected running time of O(log² n) per deletion. This variant was indeed always faster than the original algorithm in our experiments with random inputs. We compared the performance of our implementations to other static and dynamic algorithms, and we performed extensive tests under several variations of graph and update parameters.
Our experiments were run both on randomly generated graphs, and on more structured graphs that could be taken as benchmarks for future experimental work on dynamic graph algorithms. Our variant of the Henzinger and King algorithm was among the fastest implementations for random inputs. For non-random inputs, simple sparsification yielded better times for small sequences of updates; for medium and large sequences of updates, the original algorithm of Henzinger and King took over.

We remark that our implementation of simple sparsification used a static O(n + m α(m, n)) algorithm rather than the O(√m) dynamic algorithm of Frederickson [14] (as no implementation of this algorithm was available). This produced a bound of O(n α(n, n) log(m/n))¹ per update rather than O(√n). We are currently implementing the algorithm by Frederickson to check whether sparsification on top of it will produce faster algorithms for structured graphs.

All our implementations are written in C++ and are based on LEDA, the library of efficient data types and algorithms developed by Mehlhorn and Näher [24]. The source codes are available via anonymous ftp: from ftp.inf.fu-berlin.de in the directory /pub/misc/dyn_con/, and from ftp.dia.unisa.it in pub/italiano/sparsification.

The remainder of the paper is organized as follows. Section 2 describes the sparsification technique and our implementation of it. Section 3 describes the randomized algorithm of Henzinger and King, and our simplified variant, while in Section 4 we list the other simple algorithms for dynamic connectivity we implemented. Section 5 contains the analysis of the expected running time of simple sparsification for sequences of random updates. Section 6 reports the detailed results of our experiments with the algorithm of Henzinger and King, simple sparsification, and the simpler algorithms, both on random and non-random inputs. Finally, Section 7 lists some concluding remarks and open problems.

2 Simple Sparsification

In this section we describe sparsification in a top-down fashion: we first start from a high-level perspective and then fill in the low-level details. The technique itself is very simple. Let G be a graph with m edges and n vertices. We partition the edges of G into a collection of O(m/n) sparse subgraphs, i.e., subgraphs with n vertices and O(n) edges. The information relevant for each subgraph is summarized in an even sparser subgraph, which is called a sparse certificate. We next merge certificates in pairs, producing larger subgraphs which are made sparse by again computing their certificate. This is applied recursively, yielding a balanced binary tree in which each node is represented by a sparse certificate. We call a tree produced by this recursive procedure a sparsification tree. The sparsification tree has O(m/n) leaves, and hence its height is O(log(m/n)). Each update involves examining the sparse certificates contained in a constant number of tree paths from leaves up to the tree root: this results in considering O(log(m/n)) small graphs with O(n) edges each, instead of one large graph with m edges. This explains how an f(n, m) time bound can be sped up to O(f(n, O(n)) log(m/n)). We call this version simple sparsification. The use of more sophisticated data structures allows sparsification to achieve an O(f(n, O(n))) bound, thus chopping the log factor from the previous bound. We call the latter improved sparsification, and refer the interested reader to reference [12] for the details of the method.

Simple sparsification comes in three flavors, depending on whether the certificates at the tree nodes are recomputed by a static, fully dynamic, or partially dynamic algorithm.

¹ Throughout the paper we let log x stand for max{1, log₂ x}, so log(m/n) is never smaller than 1 even when m < 2n.


The first variant uses a static algorithm to recompute a sparse certificate in each tree node affected by an edge update. This variant is very effective at producing dynamic graph data structures for a multitude of problems, in which the update time is O(n log^{O(1)} n) instead of the static time bounds of O(m + n log^{O(1)} n). In the second variant, the certificates are maintained using a dynamic data structure. For this to work, we need a stability property of our certificates, to ensure that a small change in the input graph does not lead to a large change in the certificates. This variant transforms time bounds of the form O(m^p) into O(n^p log(m/n)). In many instances, certificates can be maintained even more efficiently than the property they certify, and the O(log(m/n)) term vanishes. In the third variant, we perform deletions as in the first variant, and insertions using a partially dynamic algorithm. This leads to insertion times often matching those of known partially dynamic algorithms, together with deletion times similar to those of the first variant. We implemented the first and the third variant of simple sparsification, and experimented with them using static and partially dynamic algorithms for connectivity.

We start by describing an abstract version of simple sparsification. The technique is based on the concept of a certificate:

Definition 1. For any graph property P and graph G, a certificate for G is a graph G′ on the same vertex set such that, for any H, G ∪ H has property P if and only if G′ ∪ H has the property.

Note that, according to this definition, a certificate need not necessarily be a subgraph of G: it is only required that G′ has a structure closely related to that of G. However, most of the certificates that we know of are subgraphs of the original graph G. Therefore, all the certificates used throughout this paper will be subgraphs of the original graph G. Furthermore, in all our uses of Definition 1, the graphs G and H will have the same vertex set and disjoint edge sets. The following facts follow immediately from Definition 1.

Fact 1. Let G′ be a certificate of property P for graph G, and let G″ be a certificate for G′. Then G″ is a certificate for G.

Fact 2. Let G′ and H′ be certificates of P for G and H, respectively. Then G′ ∪ H′ is a certificate for G ∪ H.

Definition 2. A graph property P is said to have c-sparse certificates if there is some constant c > 0 such that for every graph G on an n-vertex set, we can find a certificate for G with at most cn edges.

We now describe the full details of simple sparsification from a general viewpoint. Let P be a property for which we can find c-sparse certificates in time g(n, m), and such that we can construct a data structure for testing property P in time f(n, m) which can answer queries in time q(n, m). We maintain a partition of the edges of the graph into ⌈m/n⌉ groups, all but one of which contain exactly n edges. The remaining group, which we call the small group, may contain between 1 and n edges. When we insert an edge, it is placed in the small group. When we delete an edge, we move another edge from the small group into the group from which we deleted the edge, to maintain the group size invariant. If we delete the small group's last edge, we remove the group, and if we insert an edge when there are n edges in the small group, we start a new small group.

We form a sparsification tree as a balanced binary tree with ⌈m/n⌉ leaves corresponding to the ⌈m/n⌉ groups. Each node in this sparsification tree corresponds to a subgraph formed by the edges in the groups at the leaves of the tree that are descendants of the given node. For each such subgraph, we maintain a sparse certificate of P. The certificate at a given node is found by forming the union of the certificates at the two child nodes, and running the sparse certificate algorithm on this union. By Facts 1 and 2 this gives a sparse certificate for the subgraph at the node. Each certificate can be computed in time g(n, 2cn). Each change in the number of groups will cause a single leaf of the tree to split, or two leaves to merge. Since we only allow edge insertions and deletions, the number of edges per group will never change. This sparsification tree has exactly ⌈m/n⌉ leaves. The number of internal nodes in this tree is at most (⌈m/n⌉ − 1), and thus the total number of tree nodes is at most (2⌈m/n⌉ − 1). The tree height is exactly ⌈log(m/n)⌉.

When an edge is inserted or deleted, we change O(1) groups. For each of these groups, and for their log(m/n) ancestors in the sparsification tree, we recompute a new sparse certificate. This results in a sparse certificate for the whole graph at the root of the tree. We update the data structure for property P, on the graph formed by the sparse certificate at the root of the tree, in time f(n, cn). The total time per update is thus O(g(n, 2cn) log(m/n) + f(n, cn)).

The time needed to set up the sparsification tree can be analyzed as follows. The time required to form the initial O(m/n) groups is clearly O(m). After that, we need to initialize each node of the sparsification tree. There are O(m/n) such nodes, and each node requires in the worst case merging two sparse certificates from the children nodes, and computing a new certificate in the resulting graph. Thus, the time spent at each tree node is g(n, 2cn) + O(cn). After all the nodes are initialized, we compute the data structure for property P at the tree root in time f(n, cn). Thus the total preprocessing time is O(m + (m/n)(g(n, 2cn) + O(cn)) + f(n, cn)).

The previous results can be summarized in the following theorem.

Theorem 1 (Eppstein et al. [11]). Let P be a property for which we can find c-sparse certificates in time g(n, m), and such that we can construct a data structure for testing property P in time f(n, m) which can answer queries in time q(n, m). Then there is a fully dynamic data structure for testing whether a graph has property P, for which edge insertions and deletions can be performed in time O(g(n, 2cn) log(m/n) + f(n, cn)), and for which the query time is q(n, cn). The preprocessing required by the data structure is O(m + (m/n)(g(n, 2cn) + O(cn)) + f(n, cn)).

We now make the analysis of Theorem 1 more precise by taking into account also the implied constants. As we have seen, simple sparsification consists of recomputing the certificates in at most two tree paths, and of recomputing the property to be maintained only at the tree root certificate. Thus, each update implies:

(1) at most 2⌈log(m/n)⌉ recomputations of sparse certificates on graphs with 2cn edges (the recomputations take place in at most two tree paths);

(2) at most 2⌈log(m/n)⌉ unions of sparse certificates;

(3) one recomputation of the property on a graph with n vertices (the tree root certificate).

If lower order terms are neglected, this yields a worst-case time bound of 2g(n, 2cn) log(m/n) + 4cn log(m/n) + f(n, cn) per update.

We now analyze the exact complexity of the preprocessing. Assume that the partition of the edges into O(m/n) groups can be done in time m. After this, the sparsification tree must be built. The most expensive tasks in building the tree are the certificate computations and merges. There will be one certificate computation for each tree node: thus, the total number of initial certificate computations will be at most (2⌈m/n⌉ − 1). The number of merges required equals the number of internal tree nodes, which is (⌈m/n⌉ − 1). At the end, we need to initialize the data structure for property P at the tree root. Assuming that merging two graphs with n₁ and n₂ edges can be done exactly in time (n₁ + n₂) gives a total preprocessing time of m + (2⌈m/n⌉ − 1)g(n, cn) + (⌈m/n⌉ − 1)2cn + f(n, cn). Hence, the following theorem follows.

Theorem 2. Let P be a property for which we can find c-sparse certificates in time g(n, m), and such that we can construct a data structure for testing property P in time f(n, m) which can answer queries in time q(n, m). If lower order terms are neglected, simple sparsification yields a dynamic data structure for which edge insertions and deletions can be performed in time 2g(n, 2cn) log(m/n) + 4cn log(m/n) + f(n, cn), and for which the query time is q(n, cn). The total preprocessing required by the data structure is m + (2⌈m/n⌉ − 1)g(n, cn) + (⌈m/n⌉ − 1)2cn + f(n, cn).

For the sake of argument assume that (when lower order terms are neglected) f(n, m) = Am, g(n, m) = Bm, and q(n, m) = Dm, where A, B and D are the constants hidden in the asymptotic notation. Then, the dynamic data structure built with simple sparsification has the following performance:

Update: each edge insertion and deletion can be supported in time (4(B + 1) log(m/n) + A)cn.

Query: each query can be supported in time Dcn.

Preprocessing: the total preprocessing time is m + (2⌈m/n⌉ − 1)Bcn + (⌈m/n⌉ − 1)2cn + Acn, which is roughly (2c(B + 1) + 1)m + Acn.

It should be clear at this point that the dynamic data structure built with simple sparsification (like any other dynamic data structure) performs updates faster at the price of a higher preprocessing cost. Furthermore, for an effective exploitation of sparsification it is crucial that the c-sparse certificates be small, and that the algorithm for computing these certificates be practical and with very small constants.

As an example, let P be graph connectivity. In order to check whether a given graph is connected, we can compute the connected components of G. Once the connected components of G have been computed, a query about connectivity can be answered in time q(n, m) = O(1). Let A be an algorithm that does this in time f(n, m) = Am. We can maintain information about the connectivity of G during insertions and deletions of edges by running simple sparsification on top of A, as follows. First of all, we need a certificate for connectivity: it is easily seen that a spanning forest of a graph G complies with Definition 1 when the property is connectivity. Thus, in this specific example, we can use the same algorithm to compute the property P and the certificates for the property P: f(n, m) = g(n, m) = Am. Since a spanning forest of G has at most (n − 1) edges, it is a 1-sparse certificate of connectivity. By Theorem 2, the dynamic data structure produced with simple sparsification has the following performance:

Update: each edge insertion and edge deletion can be supported in time (4(A + 1) log(m/n) + A)n, which is roughly 4An log(m/n).

Query: each query can be supported in time O(1).

Preprocessing: the preprocessing required by the data structure is less than 2(2A + 1)⌈m/n⌉n + An, which is roughly 4Am.

Then, in this example, simple sparsification is able to reduce the update time from Am to 4An log(m/n) at the expense of a higher preprocessing time.
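The certificate computation in this example is just a spanning forest computation. As an illustration (a sketch of ours, not the paper's LEDA-based code), such a function can be written with a union-find structure, so that g(n, m) stays close to linear with a very small constant; vertices are assumed to be numbered 0, ..., n−1:

#include <numeric>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// Union-find over vertices 0..n-1, with path halving.
struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];
        return x;
    }
    bool unite(int x, int y) {
        x = find(x); y = find(y);
        if (x == y) return false;
        parent[x] = y;
        return true;
    }
};

// Spanning forest certificate for connectivity: keep every edge that joins
// two previously separate components, so at most n-1 edges survive.
std::vector<Edge> spanning_forest_certificate(int n, const std::vector<Edge>& edges) {
    UnionFind uf(n);
    std::vector<Edge> certificate;
    for (const Edge& e : edges)
        if (uf.unite(e.first, e.second))
            certificate.push_back(e);
    return certificate;
}

The same routine serves both as the property computation and as the certificate computation, which is precisely the situation f(n, m) = g(n, m) = Am of the example above.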

2.1 The Implementation

The implementation follows closely the algorithm of Eppstein et al. [11], with very few exceptions to achieve better practical results. The sparsification tree we chose to maintain is the following. First, all the leaves are at the same level 0. Secondly, all (but possibly the rightmost) of the internal nodes at any level ℓ > 0 have exactly two children. The main differences with the algorithm of Eppstein et al. [11] are the following:

1. Each time an edge is deleted from a given group, we insert a pointer to the corresponding tree leaf in a queue. This queue will contain pointers to empty slots in edge groups: during an edge insertion, we pop the first item from the queue and insert the new edge in the corresponding leaf. If the queue is empty, there are no available slots from previous deletions, and the new edge is inserted as in the algorithm of Eppstein et al., either in the small group or in a new leaf if the small group is full. This modification was introduced to improve the running time of edge deletions: deleting a graph edge needs no swap with the small group, and consequently involves examining only one tree path rather than two. This resulted roughly in a 100% speed-up for edge deletions.

2. Because of our implementation of edge deletions, edge groups may become impoverished throughout the sequence of operations. Namely, update sequences having more deletions than insertions may result in edge groups with fewer than n edges. If the number of impoverished leaves is high, the height of the sparsification tree may be larger than ⌈log(m/n)⌉. We tried several methods that prevented the tree height from growing out of control, and found the following to be the most effective in practice: we rebuild the sparsification tree from scratch each time the difference between the actual height and the supposed height (i.e., ⌈log(m/n)⌉) is more than one. These recomputations turned out to be extremely rare in practice, and thus the computational overhead involved in tree rebuilds seems to be a very good price to pay for the deletions' speed-up.

3. During an update operation, we do not always percolate the update all the way up. Rather than recomputing all the certificates from a leaf to the tree root, we stop at the first tree node whose certificate is not changed by the graph update; all the ancestors of this node will not be affected by the update either. Note that the change can still percolate all the way up to the root, thus requiring us to examine (⌈log(m/n)⌉ + 1) tree nodes in the worst case. However, this method is expected to introduce substantial savings on average, as we will show in Section 5.

The implementation design took into account especially efficiency, modularity, readability, and portability. We chose C++ and LEDA, and heavily exploited their object-oriented features to achieve these goals. The module for simple sparsification can be viewed as a black box which takes in input two external C++ functions, one for testing property P and the other for computing sparse certificates. For many problems these two functions can be identical. Simple sparsification produces a faster program for maintaining dynamically the property P during edge updates. The interface of the sparsification module to the code for the external functions was designed to be both flexible and easy to use without sacrificing efficiency. Indeed, the code for these functions is represented by user-defined functions whose addresses are stored, for the sake of efficiency, in a high-level class of the module, and may be modified at any time to adapt to different algorithms. The user-defined functions are supposed to take a list of input edges and to produce in output the list of edges that are part of the solution. This is the only information needed to interface the user-defined functions and the sparsification module: neither of them needs to know more about the other. We use only two C++ classes to define the main data structures needed for simple sparsification: the class Sparse manages the sparsification tree, and the class SparseNode maintains information about the internal details of sparsification tree nodes.

2.1.1 The Class Sparse

The constructor Sparse::Sparse() is used to initialize a sparsification tree from a given graph. The method Sparse::Rebuild rebuilds a sparsification tree which grew out of balance because of edge deletions. We needed a special method for this task, since the tree to be rebuilt must keep the same address as the old one. The member functions of the Sparse class represent the front-end access to the sparsification tree. Sparse::InsertEdge updates the sparsification tree after the addition of a new edge, while Sparse::DeleteEdge removes an existing edge from the sparsification tree. There are also a few functions that are used to print and draw sparsification trees, such as Sparse::PrintResult, which outputs the solution stored at the tree root, and Sparse::Draw, which draws a sparsification tree. Finally, there is a destructor Sparse::Del_tree which recursively releases the memory used by each node of the sparsification tree.

2.1.2 The Class SparseNode

The class SparseNode deals with the low-level operations in the sparsification tree nodes which are needed by the class Sparse. The constructor SparseNode::SparseNode creates a new tree node, together with its input and certificate lists. The constructor behaves differently according to whether the tree node is a leaf, an internal node, or the tree root. The member functions of the class SparseNode are SparseNode::Merge, which computes the input list of a given tree node by merging the two certificate lists of the children nodes; SparseNode::UpdateIns, which propagates the changes caused by a given insertion along the tree path from a leaf to the tree root; and SparseNode::UpdateDel, which propagates the changes caused by a given deletion along the tree path from a leaf to the tree root. Furthermore, there is a function SparseNode::InitStruct, which initializes the sparsification subtree rooted at a given node in a bottom-up fashion. This function is useful in the initializations and rebuilds of the sparsification tree. Finally, there is a printing utility: the function SparseNode::DumpNode, which prints the input and certificate lists of a given node.
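Summarizing, the two classes can be pictured by the following skeleton. The member names follow the description above, but the signatures and private members are our guesses; the actual LEDA-based declarations are in the ftp distribution:

#include <list>
#include <utility>

using Edge = std::pair<int, int>;
using EdgeList = std::list<Edge>;

// User-supplied functions: both take a list of input edges and return the
// list of edges that are part of the solution (property or certificate).
using EdgeListFn = EdgeList (*)(const EdgeList&);

class SparseNode {
public:
    SparseNode();       // behaves differently for leaf, internal node, root
    void Merge();       // input list := union of the children's certificate lists
    void UpdateIns();   // propagate an insertion along a leaf-to-root path
    void UpdateDel();   // propagate a deletion along a leaf-to-root path
    void InitStruct();  // initialize the subtree rooted here, bottom-up
    void DumpNode();    // print the input and certificate lists
private:
    EdgeList input;        // edges reaching this node
    EdgeList certificate;  // sparse certificate computed on input
};

class Sparse {
public:
    Sparse();                 // build a sparsification tree from a given graph
    void InsertEdge(Edge e);  // update the tree after adding e
    void DeleteEdge(Edge e);  // remove an existing edge from the tree
    void Rebuild();           // rebuild a tree that grew out of balance
    void PrintResult();       // output the solution stored at the tree root
    void Draw();              // draw the sparsification tree
    ~Sparse();                // Del_tree: release the memory of every node
private:
    EdgeListFn test_property;        // addresses stored for efficiency; may be
    EdgeListFn compute_certificate;  // changed to adapt to different algorithms
    SparseNode* root;
};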

3 The Randomized Dynamic Connectivity Algorithm by Henzinger and King

In this section we describe the randomized dynamic connectivity algorithm by Henzinger and King, described in [19] and implemented in [2]. It achieves an expected amortized update time of O(log³ n) for a sequence of at least m₀ updates, where m₀ is the number of edges in the initial graph, and a query time of O(log n). It needs O(m + n log n) space. In Section 3.1, we present the main ideas of the algorithm, and in Section 3.2 we describe the new data structure for representing a spanning forest which is a main ingredient of the algorithm. In Section 3.3, we describe a simplified version of the algorithm for random inputs.

3.1 Main Ideas

Like all previous dynamic connectivity algorithms, the algorithm by Henzinger and King also maintains a spanning forest. All trees in the spanning forest are maintained in a data structure which allows one to obtain logarithmic updates and queries within the forest. All we have to do is to keep it spanning, so the crucial case is the deletion of a forest edge. Unlike the dynamic connectivity algorithms by Frederickson, and the algorithm by Eppstein et al. which is based on them, the algorithm by Henzinger and King is not based on minimum spanning forests. This has the advantage that in case of a forest edge deletion, we do not have to find a reconnecting edge of lowest weight: if there are reconnecting edges at all, then any of them is suitable.

One main idea is to use random sampling among the edges incident to the tree T containing the forest edge e to be deleted, in order to find a replacement edge quickly. A single random edge adjacent to T can be sampled and tested for whether it reconnects T in logarithmic time. The goal is an update time of O(log³ n), so the number of sampled edges is O(log² n). However, the set of possible edges reconnecting the two parts of T, which is called the candidate set of e in the following, might be only a small fraction of all non-tree edges adjacent to T. In this case it is unlikely that a replacement edge for e is found among the sampled edges. If there is no candidate among the sampled edges, the algorithm does the following. Let T1 and T2 be the components that T is split into by the deletion of e, and let T1 be the smaller such component. Then the algorithm checks all the edges incident to T1; otherwise it would not be guaranteed to provide correct answers to the queries. Since there might be a lot of edges adjacent to T, this could be an expensive operation, so it should be a low probability event. This is not yet true, since deleting all edges in a relatively small candidate set, reinserting them, deleting them again, and so on will almost surely produce many of those events.

The second main idea prevents this undesirable behavior. The algorithm maintains an edge decomposition of the current graph G into O(log n) edge-disjoint subgraphs Gi = (V, Ei). These subgraphs are hierarchically ordered: each i corresponds to a level. For each level i, there is a forest Fi of Gi such that the union ∪_{i≤k} Fi is a spanning forest of ∪_{i≤k} Gi; in particular, the union F of all Fi is a spanning forest of G. A spanning tree at level i is a tree in ∪_{j≤i} Fj. The weight w(T) of a spanning tree T at level i is the number of pairs (e′, v) such that e′ is a non-tree edge in Gi adjacent to the node v in T. If T1 and T2 are the two trees resulting from the deletion of e, we sample edges adjacent to the tree with the smaller weight. If sampling is unsuccessful due to a candidate set which is non-empty but relatively small, then the two pieces of the tree which was split are reconnected on the next higher level using one candidate, and all other candidate edges are copied to that level. The idea is to have sparse cuts on high levels and dense cuts on low levels. Non-tree edges always belong to the lowest level where their endpoints are connected, or to a higher level, and we always start sampling at the level of the deleted tree edge. After moving the candidates one level up, they are normally no longer a small fraction of all adjacent non-tree edges at the new level. If the candidate set on one level is empty, we try to sample on the next higher level. There is one more case to mention: if sampling was unsuccessful despite the fact that the candidate set was big enough, which means that we had bad luck, we do not move the candidates to the next level, since this event has a small probability and does not happen very frequently. The pseudocode for replace(u, v, i), which is called after the deletion of the forest edge e = (u, v) on level i, is illustrated in Figure 1. The "constants" depending only on n (i.e., log² n, 16 log² n, 8 log n) can be changed in the implementation.

If there are only deletions of edges, a bound of ⌈2 log n⌉ for the number of levels is guaranteed. If there are also insertions, there have to be periodical rebuilds of parts of the data structure to achieve the same bound. There is a function called move_edges taking the number, say i, of a level. It moves all edges at levels i and above to level i − 1. This is a rebuild of level i. An edge is added to level i if it is either newly inserted into level i or moved up from level i − 1 during a replace. At each level there is a bound, which is some constant for the highest level and twice the bound of level i + 1 for level i. Level i is rebuilt when the number of edges added to levels i and above reaches the bound for level i. The cost for moving edges up or down is also O(log³ n) per update in the amortized sense, if there are at least Ω(m₀) updates. In the implementation, the number of levels as well as the bound for rebuilds on the highest level can be set to values other than those proposed by [19] (possibly invalidating the analysis). By using different parameters, we realized a simplified variant of the algorithm which performs well for random inputs.

For convenience, we used an initialization procedure with a bound of O((m + n log n) log n): we simply insert the initial edges into the empty graph; since there are no deletions, we get the above bound. Using a more sophisticated procedure would give a bound of O(m + n log n) [18]. Very recently, Henzinger and Thorup improved the theoretical bound for updates to O(log² n) expected amortized time by a more sophisticated sampling scheme [20]. This is not yet implemented.

REPLACE(u, v, i)

1. Let Tu and Tv be the spanning trees at level i containing u and v, respectively. Let T be the tree with smaller weight among Tu and Tv. Ties are broken arbitrarily.

2. If w(T) > log² n then
   (a) Repeat sample_and_test(T) for at most 16 log² n times. Stop if a replacement edge e is found.
   (b) If a replacement edge e is found then do delete_non_tree(e), insert_tree(e, i), and return.

3. (a) Let S be the set of edges with exactly one endpoint in T.
   (b) If |S| ≥ w(T)/(8 log n) then select one e ∈ S, delete_non_tree(e), and insert_tree(e, i).
   (c) Else if 0 < |S| < w(T)/(8 log n) then delete one edge e from S, delete_non_tree(e), and insert_tree(e, i + 1). For all e′ ∈ S do delete_non_tree(e′) and insert_non_tree(e′, i + 1). Update added_edges[i + 1] and rebuild level i + 1 if necessary.
   (d) Else if i < l then REPLACE(u, v, i + 1).

Figure 1. The pseudocode for replace(u,v,i).


3.2 Euler Tour Trees

A tree T in the spanning forest at level i is represented implicitly by some encoding, the Euler tour sequence of T. This is a sequence ET(T) of the vertices of T in the order in which they are encountered during a traversal of T starting at an arbitrarily selected vertex r, the root of T. The traversal is an Euler tour of a modified T where each edge is doubled. Thus, each node v occurs exactly d(v) times, except for the root, which appears d(r) + 1 times. In total, the sequence has 2k − 1 occurrences, where k is the number of nodes in T. Each edge is represented by three or four occurrences of tree nodes corresponding to its two traversals. An edge is represented by only three occurrences if its two traversals in the Euler tour are consecutive. The tree T is subject to changes: it may be split by removing an edge or joined with another tree by inserting an edge. For the effect of these operations on the encoding of a tree see Figures 2 and 3. Each update is handled by a constant number of splitting or joining operations on the corresponding one or two Euler tour sequences.

[Figure 2 (diagram omitted): Inserting and Deleting a Tree Edge. The figure shows two trees T1 and T2 with their Euler tour sequences ET(T1) and ET(T2), and illustrates the effect of inserting a new tree edge between 2 and 4, producing a tree T with sequence ET(T). For convenience, T2 is already rooted at 4; otherwise, we would have to change the root first. Read from bottom to top, the figure also illustrates the effect of deleting the edge e in T.]
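As an illustration (our sketch, not the distributed code), ET(T) can be produced by a recursive traversal that records a vertex when it is first entered and again after each child's subtree has been traversed:

#include <vector>

// Append to seq the Euler tour sequence of the tree rooted at root.
// Every vertex is recorded on entering it and once more after each child,
// so a tree with k nodes yields exactly 2k - 1 occurrences.
void euler_tour(const std::vector<std::vector<int>>& adj,
                int v, int parent, std::vector<int>& seq) {
    seq.push_back(v);
    for (int w : adj[v]) {
        if (w == parent) continue;
        euler_tour(adj, w, v, seq);
        seq.push_back(v);  // back at v after traversing edge (v, w) twice
    }
}

Calling euler_tour(adj, r, -1, seq) on a tree stored as adjacency lists yields the sequence for root r.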

We maintain the Euler tour sequence ET(T) of a tree T implicitly by storing the occurrences in in-order at the nodes of a balanced binary tree which is called the ET-tree for T. ET-trees are derived from balanced binary trees with node weights. For each node v of T, one of the occurrences of v in ET(T) is arbitrarily selected to be the active occurrence of v. The active occurrence of v gets a node weight equal to the number of non-tree edges adjacent to v at level i. Occurrences which are not active get a weight of zero. At each node of an ET-tree, the weight of its subtree is maintained.

[Figure 3 (diagram omitted): Changing the Root. The figure illustrates the effect of changing the root of T from 1 to 3 on the Euler tour sequence of T, marking the occurrences that are removed, moved, and appended.]

In particular, we can determine the weight of a tree in the current spanning forest by looking at the subtree weight of the root of its corresponding ET-tree. This information is needed in replace. The modifications in the Euler tour sequence arising from an update are handled by splitting and merging ET-trees. Thus, inserting or deleting an edge can be handled in logarithmic time.

We define a unique key for every tree T in the spanning forest at level i: it is the root node in the representation of T as a balanced binary tree. We define the key of a vertex v at level i to be the key of the tree T at level i containing v. The key of v at level i can be found in logarithmic time by following the path up in the balanced binary tree starting at the active occurrence of v at level i. By comparing the keys of two vertices u and v at level i, we can deduce whether they are connected at level i. In particular, u and v are connected in G if they are connected at the highest level. Thus, we can answer a query in logarithmic time.

At each node v, we store a balanced binary tree containing the non-tree edges at level i which are adjacent to v. We also maintain subtree weights in this balanced binary tree. In order to sample a random non-tree edge adjacent to the tree T at level i after the deletion of a tree edge, we do the following. In a first step, we choose a random integer a in the range from 1 to w, where w is the weight of the ET-tree for T. Using the subtree weights in this ET-tree, we can retrieve the node v belonging to a in logarithmic time. In general, v represents more than one integer, because there is more than one non-tree edge adjacent to v. Thus, let b be the offset from the beginning of the interval of integers represented by v to a. In the second step, we select the edge b + 1 in the tree of non-tree edges adjacent to v at level i, which takes logarithmic time again.

This sampling procedure chooses a non-tree edge adjacent to T at level i almost uniformly, for the following reasons. We uniformly choose among the pairs (v, e) in the first step, where v is a vertex in T and e is a non-tree edge adjacent to v at level i. There are two different possibilities for a non-tree edge e adjacent to T: either both of its endpoints belong to T, or just one of them does. All edges in the first category are chosen with the same probability p. All edges in the second category are chosen with probability q, where p = 2q, since there are two pairs (v, e) which account for an edge e if it is in the first category, whereas there is just one such pair if e is in the second category.
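The first step of this procedure can be sketched as follows (the node type and its fields are our assumptions, not the implementation's): starting from the root of the ET-tree, we descend towards the active occurrence owning the randomly drawn integer a, guided by the subtree weights.

#include <cassert>

// A node of an ET-tree: weight is the number of non-tree edges adjacent to
// the represented occurrence (0 unless active); subtree_weight sums weight
// over the whole subtree.
struct WNode {
    WNode *left = nullptr, *right = nullptr;
    int weight = 0;
    int subtree_weight = 0;
};

// Find the occurrence owning the a-th unit of weight, 1 <= a <= the root's
// subtree_weight, and report in b the 0-based offset of a inside that
// occurrence's interval of integers; time is proportional to the tree height.
WNode* select_by_weight(WNode* v, int a, int& b) {
    assert(v != nullptr && 1 <= a && a <= v->subtree_weight);
    for (;;) {
        int left_w = v->left ? v->left->subtree_weight : 0;
        if (a <= left_w) { v = v->left; continue; }
        a -= left_w;
        if (a <= v->weight) { b = a - 1; return v; }
        a -= v->weight;
        v = v->right;
    }
}

Drawing a itself takes one call to std::uniform_int_distribution; the second step then selects edge b + 1 in the balanced tree of non-tree edges stored at the returned vertex, again in logarithmic time.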

3.3 A Simplified Version of Henzinger-King

Preliminary experiments with random inputs showed that moving edges to a higher level almost never occurred. Moreover, the average number of edges sampled during a successful sampling was less than 2. This led to a simplified variant of the Henzinger and King algorithm: there is just one level, so we do not move edges up or down, and we sample only at most c log n edges instead of 16 log² n. On random inputs this variant was always faster than the original algorithm and was more robust to input variations. It takes O(m + n) space and O(m log n + n) preprocessing time using the simple initialization scheme. Insertions, deletions of non-tree edges, and queries take logarithmic time, but currently there is no better bound for the running time of a tree edge deletion than O(m log n) worst-case time. We conjecture that the deletion of a tree edge takes O(log² n) expected time for random inputs. This is motivated by the following observation: consider a random subset S consisting of at most half of the vertices of a random graph. The ratio between the expected number of edges with exactly one endpoint in S and the expected number of edges with both endpoints in S is greater than 1, and it grows as S becomes smaller. Our experiments with random inputs indicate a good performance of this variant; see Section 6.
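In outline, the deletion of a tree edge in this variant behaves as in the following sketch, where Tree, sample_edge, reconnects, and exhaustive_search are hypothetical stand-ins for the ET-tree operations of Section 3.2:

#include <algorithm>
#include <cmath>
#include <optional>

struct Tree;                                  // the smaller of the two split trees
std::optional<int> sample_edge(Tree&);        // random adjacent non-tree edge
bool reconnects(int e, Tree&);                // exactly one endpoint inside?
std::optional<int> exhaustive_search(Tree&);  // scan all incident edges

// Single-level replacement search: at most c*log2(n) random samples, then an
// exhaustive fallback (O(m log n) worst case; conjectured O(log^2 n) expected
// for random inputs).
std::optional<int> find_replacement(Tree& smaller, int n, double c) {
    int tries = static_cast<int>(c * std::log2(std::max(n, 2)));
    for (int i = 0; i < tries; ++i)
        if (auto e = sample_edge(smaller); e && reconnects(*e, smaller))
            return e;
    return exhaustive_search(smaller);
}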

4 Simple Algorithms

Besides the sophisticated algorithms, we also considered some simple algorithms, which were very easy to implement in only a few lines of code. Usually, this means that their implementation constants are extremely good, and thus these algorithms are likely to be very fast in practice for reasonable inputs. For the dynamic connectivity problem, we considered two simple algorithms, called fast-update and fast-query. They are defined as follows. In case of an update, the fast-update algorithm does nothing, except for updating two adjacency lists. It then answers a query(u, v) by doing a breadth-first search starting from u. The fast-query algorithm maintains a spanning forest and component labels at the vertices. These can be computed by a breadth-first search for each connected component. In case of a deletion, the algorithm checks whether the edge to be deleted is in the current spanning forest (those edges are marked). In case of an insertion, the algorithm checks whether the inserted edge connects two distinct connected components of the former graph by using the component labels. In either of these cases, the spanning forest is forced to change, and the algorithm recomputes from scratch the trees and component labels for the affected components. If an update does not affect the current spanning forest, it is handled as in the fast-update algorithm. A query is answered using the component labels of the input nodes.

Clearly, the fast-update algorithm takes constant time for updates and O(n + m) for queries in the worst case, while the fast-query algorithm takes constant time for queries and non-tree updates, and O(n + m) for tree updates. Both algorithms take O(n + m) space and preprocessing time. For random inputs, the fast-query algorithm takes only O(n) expected time per update, independently of the number of edges (cf. [4]).

These two algorithms are extreme cases, as one is optimal for updates only and one is optimal for queries only. When we consider a particular application, we have to take this into account. If there are no queries at all, this is no longer a connectivity problem. If there are only queries, the problem becomes static, and the fast-query algorithm turns into an optimal static algorithm, which we certainly cannot beat with any dynamic algorithm.
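For concreteness, here is a minimal sketch (ours) of the fast-update query described above: updates merely maintain the adjacency lists, and a query is a plain breadth-first search from u.

#include <queue>
#include <vector>

// Returns true iff u and v are connected; O(n + m) worst case.
bool connected(const std::vector<std::vector<int>>& adj, int u, int v) {
    std::vector<bool> seen(adj.size(), false);
    std::queue<int> q;
    seen[u] = true;
    q.push(u);
    while (!q.empty()) {
        int x = q.front(); q.pop();
        if (x == v) return true;
        for (int y : adj[x])
            if (!seen[y]) { seen[y] = true; q.push(y); }
    }
    return false;
}

fast-query runs the same kind of search only when a tree edge is deleted or two components are joined, relabeling the affected components instead of answering a single query.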

5 Average-Case Analysis of Simple Sparsification

In Section 2, we saw that simple sparsification is able to turn an f(n, m) bound into a better O(f(n, cn) log(m/n)) bound. We recall here that the log(m/n) factor derives from the fact that an update may percolate from a leaf up to the sparsification tree root in the worst case. However, when the certificate at a certain level is not changed by the update, there is no need to propagate this update further up the sparsification tree. In this section we prove that by not propagating further than we need to, for random inputs the logarithmic factor disappears on average from the previous bound, and the expected number of tree nodes examined by each update is a very small constant. As a consequence, the exact constants hidden in the asymptotic notation are extremely small, which makes simple sparsification already better than improved sparsification in practice.

Before embarking on the average-case analysis of simple sparsification, we prove a few technical lemmas about sparsification trees that will be useful in the analysis. We recall here that the sparsification tree we chose in our implementation is a balanced binary tree, having all leaves at the same level; secondly, all (but possibly the rightmost) of the internal nodes at any level ℓ > 0 have exactly two children. Other choices of sparsification trees are possible, as long as the sparsification tree depth remains logarithmic in the number of leaves. Let T be a sparsification tree, and let v be any node of T. We denote by L(v) the set of leaves that are descendants of v in T, and use ℓ(v) = |L(v)| to denote the number of leaves that are descendants of v. If r is the sparsification tree root, then L(r) = L(T) denotes the set of leaves of T. Level 0 is the level containing the leaves. The following fact follows directly from the definition of the sparsification tree.

Fact 3. Let T be a sparsification tree with n leaves and height h, and let v_k be a sparsification tree node at level k, 0 ≤ k ≤ h. If v_k is not the rightmost node at level k, then ℓ(v_k) = 2^k. If v_k is the rightmost node, ℓ(v_k) = 2^k if n is a multiple of 2^k, and ℓ(v_k) = (n mod 2^k) otherwise.

Lemma 1. A sparsification tree with ℓ leaves has at most (20/9)ℓ nodes.

Proof: Let N(ℓ) be the total number of nodes in a sparsification tree with ℓ leaves. By definition of the sparsification tree, N(ℓ) can be computed by the following recurrence:

$$N(\ell) = \begin{cases} 1 & \text{if } \ell = 1 \\ \ell + N(\ell/2) & \text{if } \ell \text{ is even} \\ \ell + N((\ell+1)/2) & \text{if } \ell \text{ is odd, } \ell > 1 \end{cases}$$

whose solution satisfies N(ℓ) ≤ (20/9)ℓ. □

Note that the bound of Lemma 1 is tight, as N(9) = 20.
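The recurrence and the tightness claim are easy to check numerically; the following small program (our addition) verifies N(9) = 20 and the bound N(ℓ) ≤ (20/9)ℓ over a range of values:

#include <cassert>

// Number of nodes of a sparsification tree with l leaves (recurrence above).
long long N(long long l) {
    if (l == 1) return 1;
    return l + N(l % 2 == 0 ? l / 2 : (l + 1) / 2);
}

int main() {
    assert(N(9) == 20);              // the bound is attained at l = 9
    for (long long l = 1; l <= 1000000; ++l)
        assert(9 * N(l) <= 20 * l);  // N(l) <= (20/9) * l
    return 0;
}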

Definition 3. A canonical sparsification tree is a weighted sparsification tree T such that each node v of T has cost

$$cost(v) = \frac{1}{\ell(v)}$$

Definition 4. Given a constant c₀ > 0, a sparsification tree with threshold c₀ is a weighted sparsification tree T such that each node v of T has cost

$$cost(v) = \frac{1}{\ell(v)} \min\{c_0, \ell(v)\}$$

Note that a canonical sparsification tree is the special case of a sparsification tree with threshold equal to 1.

Definition 5. Given a weighted sparsification tree T and a node v ∈ T, let π(v) be the path from v to the root. The weighted depth of v is given by

$$d(v, T) = \sum_{x \in \pi(v)} cost(x)$$

while the average weighted depth of T is defined as

$$d(T) = \frac{1}{|L(T)|} \sum_{v \in L(T)} d(v, T)$$

The following two lemmas characterize the average weighted depth of sparsification trees.

Lemma 2. A canonical sparsification tree T has average weighted depth d(T) ≤ 20/9.

Proof: Let |T| be the total number of nodes in T. By Definition 5,

$$d(T) = \frac{1}{|L(T)|} \sum_{v \in L(T)} d(v, T) = \frac{1}{|L(T)|} \sum_{v \in L(T)} \sum_{x \in \pi(v)} \frac{1}{\ell(x)} = \frac{1}{|L(T)|}\,\sigma(T) \qquad (1)$$

where

$$\sigma(T) = \sum_{v \in L(T)} \sum_{x \in \pi(v)} \frac{1}{\ell(x)}$$

We can rewrite σ(T) as

$$\sigma(T) = \sum_{x \in T} \sigma(x)$$

where σ(x) is the contribution of each node x ∈ T to the sum σ(T). A node x ∈ T contributes an additive factor of 1/ℓ(x) to σ(T) each time it is contained in a different leaf-to-root path. Since x has ℓ(x) leaves, it is contained in exactly ℓ(x) different leaf-to-root paths: the total contribution of x to σ(T) will therefore be

$$\sigma(x) = \sum_{i=1}^{\ell(x)} \frac{1}{\ell(x)} = 1$$

Hence, σ(T) = |T|. By Lemma 1, |T| ≤ (20/9)|L(T)| and the lemma follows. □

We now generalize the previous lemma to sparsification trees with thresholds.

Lemma 3. A sparsification tree T with threshold c₀ > 0 has average weighted depth

$$d(T) \le \begin{cases} \dfrac{20}{9}\,c_0 & \text{if } c_0 \le 1 \\[4pt] \lceil \log_2 c_0 \rceil + \dfrac{40}{9} & \text{if } c_0 > 1 \end{cases}$$

Proof: Let x be any sparsification tree node. By Definition 4,

$$cost(x) = \frac{1}{\ell(x)} \min\{c_0, \ell(x)\} \qquad (2)$$

and the average weighted depth of T is given by

$$d(T) = \frac{1}{|L(T)|} \sum_{v \in L(T)} \sum_{x \in \pi(v)} cost(x)$$

As in Lemma 2, we can rewrite the sum in d(T) as follows:

$$d(T) = \frac{1}{|L(T)|} \sum_{x \in T} \sigma(x) \qquad (3)$$

where σ(x) = ℓ(x) cost(x) is the contribution of each node x to the sum. To compute d(T), we bound σ(x) for each node x ∈ T. Assume first that 0 < c₀ ≤ 1. Since ℓ(x) ≥ 1 for each x ∈ T, in this case (2) becomes cost(x) = c₀/ℓ(x), yielding σ(x) = c₀ for each x ∈ T. Thus (3) becomes

$$d(T) = \frac{1}{|L(T)|}\, c_0\, |T| \le \frac{20}{9}\, c_0$$

by Lemma 1. Assume now that c₀ > 1, and let k₀ = ⌈log₂ c₀⌉. Note that 2^{k₀−1} < c₀ ≤ 2^{k₀}. Let x_k be a sparsification tree node at level k: by Fact 3, ℓ(x_k) ≤ 2^k. We claim that:

$$\sigma(x_k) \le \begin{cases} \ell(x_k) & \text{for } 0 \le k \le k_0 - 1 \\ c_0 & \text{for } k \ge k_0 \end{cases} \qquad (4)$$

Indeed, if 0 ≤ k ≤ k₀ − 1, then ℓ(x_k) ≤ 2^{k₀−1} < c₀, and (2) becomes cost(x_k) = ℓ(x_k)/ℓ(x_k) = 1; thus σ(x_k) = ℓ(x_k). Otherwise, if k ≥ k₀, cost(x_k) ≤ c₀/ℓ(x_k), and hence σ(x_k) ≤ c₀. For k ≥ 0, let L_k(T) denote the set of nodes at level k in T (note that L₀(T) = L(T)). Define T′ as the sparsification tree obtained from T after removing all the nodes at levels k, 0 ≤ k ≤ k₀ − 1 (namely, T′ contains all and only the nodes of T at levels ≥ k₀). The number of leaves in T′ is exactly |L(T′)| = |L_{k₀}(T)|. Then (3) can be decomposed as follows:

$$d(T) = \frac{1}{|L(T)|} \sum_{x \in T} \sigma(x) \le \frac{1}{|L(T)|} \left( \sum_{k=0}^{k_0 - 1} \sum_{v \in L_k(T)} \ell(v) + \sum_{x \in T'} c_0 \right)$$

For 0 ≤ k ≤ k₀ − 1, Σ_{v∈L_k(T)} ℓ(v) = |L(T)|. Hence,

$$d(T) \le \frac{k_0 |L(T)| + c_0 |T'|}{|L(T)|} \le \lceil \log_2 c_0 \rceil + \frac{20}{9} \frac{|L(T')|}{|L(T)|}\, c_0$$

by Lemma 1. To complete the proof, it remains to show that c₀|L(T′)| ≤ 2|L(T)|. By definition of T′, the leaves of T′ are all the nodes at level k₀ in T: L(T′) = L_{k₀}(T). Let v be any leaf of T′; v is a node at level k₀ in T: let ℓ(v) be the total number of leaves descending from v in T. By Fact 3, ℓ(v) = 2^{k₀} unless v is the rightmost node and |L(T)| is not a multiple of 2^{k₀}; in this case, ℓ(v) = (|L(T)| mod 2^{k₀}). Consequently, |L(T)| = 2^{k₀}|L(T′)| if |L(T)| is a multiple of 2^{k₀}, and |L(T)| = 2^{k₀}(|L(T′)| − 1) + (|L(T)| mod 2^{k₀}) otherwise. In both cases, |L(T)| ≥ 2^{k₀−1}|L(T′)| ≥ (c₀/2)|L(T′)|. □

We now turn to the analysis of the expected running time of simple sparsification. We adopt Mulmuley's expected-case model of dynamic geometric computation [26], a model which was already used for fully dynamic graph problems by Eppstein [10] and by Alberts and Henzinger [4]. This model makes only very weak assumptions about the input distribution. Namely, the order in which edges are inserted and deleted is assumed to be random, but the set of edges to be inserted or deleted, and the times at which insertions and deletions occur, are assumed to be chosen by an adversary (i.e., worst case).

More formally, let G = (V, E) be a graph with n vertices and m edges, which is dynamically changing because of edge insertions and edge deletions. Let $\binom{V}{2}$ be the set of all edges defined on the vertex set V. A pattern for G is given by a set Ẽ, E ⊆ Ẽ ⊆ $\binom{V}{2}$, consisting of m̃ edges, together with a string S consisting of '+' and '−'. The set Ẽ is called the universe and determines the set of edges to be inserted and deleted in the random graph, while the string S determines the types of update operations applied to the random graph: each '+' represents an edge insertion, and each '−' an edge deletion. Each pattern determines a space of update sequences as follows. The string is examined from left to right: if the current symbol is a '+', then an edge is chosen uniformly at random among the edges in the universe Ẽ not yet in E, and it is added to the graph G. If the current symbol is a '−', then an edge is chosen uniformly at random among all the edges in E, and the graph G is updated by removing this edge. Given an initial graph G and a pattern for G, the expected running time of an algorithm in this model is given by the time averaged over all possible update sequences consistent with that pattern. Namely, given a graph G, an adversary fixes a pattern for G; but once the pattern is fixed, we can expect the actual update sequence to be chosen randomly among all of the possible sequences determined by the pattern. We remark that in this model the pattern is not necessarily known to the algorithm, but may be revealed only as the actual sequence of updates proceeds. This will be an important point in the average-case analysis of simple sparsification. Before embarking on the analysis, however, we need to establish some preliminary results in this model.
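For illustration, an update sequence can be drawn from a pattern as in the following sketch (ours, not part of the implementations). The first initial_m entries of universe are assumed to be the edges of the initial graph G, and pattern is assumed to be feasible (no '+' when the whole universe is already present, no '−' on an empty edge set):

#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// universe[0..current) always holds the edges of the current graph G.
// '+' inserts a uniform edge of the universe not in G, '-' deletes a uniform
// edge of G; swapping with the boundary keeps both choices O(1).
std::vector<std::pair<char, Edge>>
random_update_sequence(std::vector<Edge> universe, std::size_t initial_m,
                       const std::string& pattern, std::mt19937& rng) {
    std::size_t current = initial_m;
    std::vector<std::pair<char, Edge>> sequence;
    for (char op : pattern) {
        if (op == '+') {
            std::uniform_int_distribution<std::size_t> pick(current, universe.size() - 1);
            std::swap(universe[current], universe[pick(rng)]);
            sequence.emplace_back('+', universe[current++]);
        } else {
            std::uniform_int_distribution<std::size_t> pick(0, current - 1);
            std::swap(universe[pick(rng)], universe[current - 1]);
            sequence.emplace_back('-', universe[--current]);
        }
    }
    return sequence;
}

We start with the following definitions.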

Definition 6. Let G = (V, E) be a random graph with m edges, and P a graph property for which we can compute a certificate C in G. Let e be an edge to be deleted randomly from G, and let G⁻ be the graph obtained from G after the deletion of e. Let δ_del(e, E) be defined as follows: δ_del(e, E) = 1 if C is no longer a certificate for P in G⁻, and δ_del(e, E) = 0 otherwise. The deletion factor of G is defined as follows:

$$F_{del}(G) = \frac{1}{\binom{\tilde m}{m}} \sum_{\substack{E \subseteq \tilde E \\ |E| = m}} \left( \frac{1}{m} \sum_{e \in E} \delta_{del}(e, E) \right)$$

Definition 7. Let G = (V, E) be a random graph with m edges, and P a graph property for which we can compute a certificate C in G. Let e be an edge chosen at random from Ẽ \ E, and let G⁺ be the graph obtained after the insertion of e into G. Let δ_ins(e, E) be defined as follows: δ_ins(e, E) = 1 if C is no longer a certificate for P in G⁺, and δ_ins(e, E) = 0 otherwise. The insertion factor of G is defined as follows:

$$F_{ins}(G) = \frac{1}{\binom{\tilde m}{m}} \sum_{\substack{E \subseteq \tilde E \\ |E| = m}} \left( \frac{1}{\tilde m - m} \sum_{e \in \tilde E \setminus E} \delta_{ins}(e, E) \right)$$

Let G = (V, E) be a random graph, and let C be a certificate for property P in G. Since every pair (e, E) is equally likely in our model, intuitively F_ins(G) represents the probability that the insertion of a new edge will invalidate C (i.e., that C will no longer be a certificate for the new graph), and F_del(G) represents the probability that the deletion of an existing edge will invalidate C. We now need the notion of a good certificate:

Definition 8. Let a graph G be given, let G⁺ be the graph obtained from G after inserting an edge e⁺ into G, and let G⁻ be the graph obtained from G after deleting an edge e⁻ from G. Given a property P to be maintained for G, and a certificate C for P in G, C is said to be a good certificate if the following two conditions are met:

(1) If C is no longer a certificate for P in G⁺, then the edge e⁺ must be contained in all the certificates for G⁺.

(2) If C is no longer a certificate for P in G⁻, then the edge e⁻ must be contained in C.

We remark that most of the certificates we know of for unweighted graphs are subgraphs of the original graph G and good certificates: indeed, the certificates for connectivity and k-edge connectivity trivially satisfy both conditions of Definition 8. The following two lemmas establish upper bounds for F_ins(G) and F_del(G) in the case of good c-sparse certificates.

Lemma 4. Let G = (V, E) be an unweighted random graph with m edges, and let P be a property to be maintained on G, for which we can compute a good c-sparse certificate C. Let e be an edge to be deleted randomly from G: then

$$F_{del}(G) \le \frac{1}{m} \min\{cn, m\}.$$

Proof: By Definition 6,

$$F_{del}(G) = \frac{1}{\binom{\tilde m}{m}} \sum_{\substack{E \subseteq \tilde E \\ |E| = m}} \left( \frac{1}{m} \sum_{e \in E} \delta_{del}(e, E) \right)$$

Recall that C is a c-sparse certificate on a graph with m edges: any such certificate has at most min{cn, m} edges. Combining condition (2) of Definition 8 with Definition 6 yields that whenever δ_del(e, E) ≠ 0, we have that δ_del(e, E) = 1 and that e must belong to C. This implies that, for any possible choice of E, and thus independently of the way in which the current certificate C was built,

$$\sum_{e \in E} \delta_{del}(e, E) \le \min\{cn, m\} \qquad \forall E \subseteq \tilde E,\ |E| = m,$$

and hence

$$F_{del}(G) \le \frac{1}{\binom{\tilde m}{m}} \sum_{\substack{E \subseteq \tilde E \\ |E| = m}} \frac{\min\{cn, m\}}{m} = \frac{\min\{cn, m\}}{m} \qquad \square$$

Lemma 5. Let G be an unweighted graph with m edges, and let P be a property to be maintained on G, for which we can compute a good c-sparse certificate C. Let e be an edge chosen at random which is to be inserted in G: then

$$F_{ins}(G) \le \frac{1}{m+1} \min\{cn, m+1\}.$$

Proof: By Definition 7,

$$F_{ins}(G) = \frac{1}{\binom{\tilde m}{m}} \sum_{\substack{E \subseteq \tilde E \\ |E| = m}} \left( \frac{1}{\tilde m - m} \sum_{e \in \tilde E \setminus E} \delta_{ins}(e, E) \right)$$

Let E′ = E ∪ {e}. We rewrite F_ins(G) in terms of E′ by using properties of our model:

$$F_{ins}(G) = \frac{1}{\binom{\tilde m}{m} (\tilde m - m)} \sum_{\substack{E' \subseteq \tilde E \\ |E'| = m+1}} \sum_{e \in E'} \delta_{ins}(e, E' - e)$$

Since $\binom{\tilde m}{m}(\tilde m - m) = \binom{\tilde m}{m+1}(m+1)$, we have that

$$F_{ins}(G) = \frac{1}{\binom{\tilde m}{m+1}} \sum_{\substack{E' \subseteq \tilde E \\ |E'| = m+1}} \left( \frac{1}{m+1} \sum_{e \in E'} \delta_{ins}(e, E' - e) \right)$$

Because of condition (1) of Definition 8 and Definition 7, whenever δ_ins(e, E′ − e) ≠ 0, we have that δ_ins(e, E′ − e) = 1 and that e must be in all the possible certificates of the new graph obtained after inserting e. As these certificates are c-sparse, this can happen at most min{cn, m + 1} times for any possible choice of E′:

$$\sum_{e \in E'} \delta_{ins}(e, E' - e) \le \min\{cn, m+1\}$$

and thus

$$F_{ins}(G) \le \frac{1}{\binom{\tilde m}{m+1}} \sum_{\substack{E' \subseteq \tilde E \\ |E'| = m+1}} \frac{\min\{cn, m+1\}}{m+1} = \frac{\min\{cn, m+1\}}{m+1} \qquad \square$$

We are now ready to analyze the expected running time of simple sparsification. Let G = (V, E) be a graph, and let T be a sparsification tree for G. For some k, 0 ≤ k ≤ ⌈log(m/n)⌉, let ν_k be a sparsification tree node at height k, let C_k be the certificate stored at ν_k, and let G_k = (V, E_k) be the corresponding subgraph, i.e., the graph containing the edges in the groups at the leaves descending from ν_k. For the sake of simplicity we assume that the groups at the leaves have exactly n edges.

Theorem 3. Let P be a property to be maintained on an unweighted graph G for which we can compute c-sparse good certificates, and let e be an edge to be deleted uniformly at random from G. Then, the expected number of sparsification tree nodes affected by the deletion of e is at most (20/9)c if c ≤ 1, and at most (⌈log₂ c⌉ + 40/9) if c > 1.

Proof: Let T be the sparsi cation tree for G, and let k be a sparsi cation tree node at height k. The graph associated with k is a random graph Gk = (V; Ek) with n vertices and mk edges. Since the edge e to be deleted is chosen at random, e is contained with equal probability in any of the sparsi cation tree leaves. Thus, the expected number of sparsi cation tree nodes a ected by a random deletion is bounded above by

1 0 1 0 CC B 1 X X F (G ) 1 X @ 1 X  (e; E )AC 1 X X B B = del k ?  C B jL(T )j v2L T k 2 v @ mmfkk E Ee mk e2Ek jL(T )j v2L T k 2 v del k A k k ( )

By Lemma 4,

( )

( )

jEk j=mk

cn; mk g F del(Gk )  minfm k

( )

Gk = (V; Ek) is the graph associated with the sparsi cation tree node k . As node k has `(k ) leaves in T , Gk has exactly mk = n`(k ) edges, and thus cn; n`(k )g = minfc; `(k)g F del (Gk )  minfn` (k ) `(k ) 20

Hence, the average number of sparsi cation tree nodes examined by a random deletion is bounded above by 1 X X minfc; `(k)g jL(T )j v2L(T ) k 2(v) `(k) By de nition 5, the above quantity is exactly the average depth of the weighted sparsi cation tree T with threshold c. Thus, the theorem follows directly from Lemma 3. 2
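To read the bound concretely (a worked instantiation of ours, not an additional result): for connectivity a spanning forest is a good 1-sparse certificate, so c = 1 and the theorem bounds the expected number of affected nodes by

$$\frac{20}{9}\,c = \frac{20}{9} \approx 2.22,$$

independently of n and m. This is the constant against which the unweighted measurements of Section 6 (Table 4) can be compared.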

Theorem 4. Let P be a property to be maintained on an unweighted graph G for which we can compute c-sparse good certificates, and let e be an edge to be inserted uniformly at random from the set of edges not yet in G. Then, the expected number of sparsification tree nodes affected by the insertion of e is at most (20/9)c if c ≤ 1, and at most ⌈log₂ c⌉ + 20/9 if c > 1.

Proof: Let G_k = (V, E_k) be a random graph with n vertices and m_k edges. As in Theorem 3, the expected number of sparsification tree nodes affected by a random insertion is bounded above by

$$\frac{1}{|L(T)|} \sum_{v \in L(T)} \sum_{\nu_k \in \pi(v)} F_{ins}(G_k) = \frac{1}{|L(T)|} \sum_{v \in L(T)} \sum_{\nu_k \in \pi(v)} \binom{\bar m_k}{m_k}^{-1} \sum_{\substack{E_k \subseteq \bar E_k \\ |E_k| = m_k}} \left( \frac{1}{\bar m_k - m_k} \sum_{e \in \bar E_k \setminus E_k} \delta_{ins}(e, E_k) \right).$$

By Lemma 5,

$$F_{ins}(G_k) \le \frac{\min\{cn,\, m_k + 1\}}{m_k + 1}.$$

Proceeding as in Theorem 3 yields the desired bounds. □

We observe that Theorem 3 still holds if the graph is weighted (e.g., also for the dynamic minimum spanning tree problem). Indeed, the presence of edge costs does not interfere with the average-case analysis, as the edge to be deleted is chosen at random and independently of its cost. On the contrary, Theorem 4 does not extend easily to weighted graphs, since for the same analysis to hold we need the cost of every edge to be fixed (although arbitrary). If this is the case, then Theorem 4 still holds. However, we find this assumption unrealistic, as in a truly random model for dynamic graphs one would expect that the cost of an edge to be inserted can be chosen at random, and thus can be different in different experiments. We believe that this is not a limitation in Mulmuley's random model, as in geometric problems the costs are often related to geometric distances (which makes the assumption about fixed costs realistic in the geometric setting). Rather, this limitation seems inherent in the extension of Mulmuley's model to fully dynamic graph problems. As we will show in Section 6, the experimental data about c-sparse certificates for unweighted graphs confirm precisely the theoretical predictions of Theorems 3 and 4, thus proving that their bounds are extremely tight. This gives practical evidence that the theoretical model we adopted suits well the evolution of dynamic unweighted random graphs. Moreover, the experimental data on dynamic minimum spanning trees confirm our criticism of the extension of this model to weighted graphs. Indeed, in the case of minimum spanning tree problems, the average number of sparsification tree nodes examined for each deletion is still no more than 20/9, according to Theorem 3. However, the average number of tree nodes examined for each insertion does not comply at all with the bounds computed in Theorem 4.

We end this section by mentioning that other choices on the structure of the sparsification tree would yield a different constant in Theorems 3 and 4. The best constant is achieved by a complete binary tree whose leaves are in at most two adjacent levels: indeed, in this case Lemma 1 holds with the constant 2 in place of 20/9 (namely, a tree of this kind with ℓ leaves has at most 2ℓ nodes). The same analysis carried out for Theorems 3 and 4 shows that for these types of sparsification trees the average number of tree nodes examined by an update is no more than 2.

6 Experimental Results

6.1 Implemented Data Structures

We used implementations of the following data structures.

fast-update: This algorithm only changes two adjacency lists of the graph in case of an update. It answers query(u, v) simply by doing a breadth-first search for v starting at u.

fast-query: We maintain component labels at the vertices and a spanning forest by means of edge labels. Queries are answered by comparing vertex labels. Non-tree edge updates are handled as above in fast-update. If a tree edge is inserted, we just relabel one of the two components which get connected. If a tree edge is deleted, we start to label the vertices and mark the tree edges at one endpoint of the deleted edge using breadth-first search.

sparsification: This is an implementation of simple sparsification on top of a semi-dynamic spanning forest algorithm, cf. Section 2.

Henzinger-King: This is our implementation of the algorithm by Henzinger and King, see Section 3.

Henzinger-King variant: This is the variant of Henzinger and King's algorithm as described on page 13. It is implemented by a suitable choice of parameters in Henzinger-King, cf. Section 3.3.

The correctness of the implementations was checked by comparing the answers to the queries in sequences of random operations among the different implementations. Additionally, we performed tests with random updates, and every time a specified number of updates had been done, we checked whether all possible queries were answered according to the component labels of the vertices. These labels were computed by the static COMPONENTS algorithm which is part of LEDA.

Clearly, the fast-update algorithm takes constant time for updates and O(n + m) time for queries in the worst case. For random inputs, however, fast-query is asymptotically faster if there are ω(n) edges, because it takes only O(n) expected time per update independently of the number of edges. For random inputs with o(n) edges the expected running time of fast-query is even better, namely O(log n) per update, since the graph consists with high probability of many sparse connected components of size O(log n).
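To make the fast-query update handling concrete, here is a minimal C++ sketch of its tree-edge deletion step. This is our own illustration with hypothetical names, not the LEDA-based code used in the experiments; in particular, the real implementation also re-marks tree edges during the search, which the sketch omits.

    #include <numeric>
    #include <queue>
    #include <vector>

    // Sketch of fast-query's vertex labels (assumed representation).
    struct FastQuery {
        std::vector<std::vector<int>> adj;  // adjacency lists
        std::vector<int> label;             // component label of each vertex
        int next_label;                     // source of fresh labels

        explicit FastQuery(int n) : adj(n), label(n), next_label(n) {
            std::iota(label.begin(), label.end(), 0);  // empty graph: n components
        }

        bool query(int u, int v) const { return label[u] == label[v]; }

        // Called after a tree edge (u, v) has been removed from adj:
        // relabel everything reachable from u by breadth-first search.
        // If u and v are still connected, both sides simply receive the
        // same fresh label, so queries remain correct either way.
        void relabel_after_tree_deletion(int u) {
            int fresh = next_label++;
            std::queue<int> q;
            label[u] = fresh;
            q.push(u);
            while (!q.empty()) {
                int x = q.front();
                q.pop();
                for (int y : adj[x]) {
                    if (label[y] != fresh) {  // not yet visited in this BFS
                        label[y] = fresh;
                        q.push(y);
                    }
                }
            }
        }
    };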

Algorithm                 Query         Insert                 Delete
fast-query                O(1)          O(n + m)               O(n + m)
fast-update               O(n + m)      O(1)                   O(1)
sparsification            O(α(n, n))    O(α(n, n) log(m/n))    O(n α(n, n) log(m/n))
Henzinger-King            O(log n)      O(log^3 n)             O(log^3 n)
Henzinger-King variant    O(log n)      O(log n)               O(m log n)

Table 1. Bounds for Worst-Case Inputs.

The fast-query algorithm and the fast-update algorithm are extreme cases, as the latter is optimal for updates only and the former is optimal for queries only. When we consider a particular application, we have to take this into account. If there are no queries at all, this is no longer a connectivity problem. If there are only queries, the problem becomes static, and fast-query turns into an optimal static algorithm, which we certainly cannot beat with any dynamic algorithm. Moreover, it is a common phenomenon that a theoretically very fast algorithm is outperformed on all small input sizes by a simpler method, because the constants hidden in the O-notation can be really large.

Algorithm                 Query         Insert        Delete
fast-query                O(1)          O(n)          O(n)
fast-update               O(n + m)      O(1)          O(1)
sparsification            O(α(n, n))    O(α(n, n))    O(n α(n, n))
Henzinger-King            O(log n)      O(log^3 n)    O(log^3 n)
Henzinger-King variant    O(log n)      O(log n)      O(m log n) (O(log^2 n) conj.)

Table 2. Bounds for Random Inputs.

In Table 1 we summarize the bounds for the worst-case performance of the different algorithms. In Table 2 we summarize the average-case bounds. Table 3 shows preprocessing time and space usage. The bounds for preprocessing for Henzinger-King and Henzinger-King variant are not best possible; they refer to our implementation. By using a more sophisticated initialization procedure, we get a preprocessing time of O(m + n log n) for Henzinger-King and of O(n + m) for Henzinger-King variant [18].

Algorithm                 Preprocessing             Space Usage
fast-query                O(n + m)                  O(n + m)
fast-update               O(n + m)                  O(n + m)
sparsification            O(n + m α(n, m))          O(n + m)
Henzinger-King            O((m + n log n) log n)    O(m + n log n)
Henzinger-King variant    O(n + m log n)            O(n + m)

Table 3. Preprocessing and Space Usage.

6.2 Test Environment

All data structures were implemented in C++ using LEDA. We used the GNU g++ compiler version 2.7.0 with optimization option -O3. We disabled the low-level consistency checks which LEDA normally performs by specifying -DLEDA_CHECKING_OFF. Our experiments were run on two machines. One was a DEC APX 4000/720 machine, with two 190 MHz Alpha CPUs and 128 MB of RAM; it has a hierarchical memory architecture with a 4 MB cache. The other machine was a SUN SparcStation 10/51. This machine has two 51 MHz Sparc CPUs, 64 MB of RAM, and a cache of 1 MB. All tests on this machine were run on just one of its two CPUs. We measured separately the times that the algorithms spent in preprocessing, i.e., building the data structure for the initial graph, and in processing, i.e., performing the sequence of updates and/or queries. We did not measure the times required for loading the initial graphs. All our times were CPU times measured in seconds.

6.3 Sparsification

We first present our experiments with simple sparsification. We ran simple sparsification on top of three different algorithms for solving two different problems: maintaining a spanning forest of a dynamic graph (i.e., dynamic connectivity) and maintaining dynamically a minimum spanning tree. The first algorithm is the LEDA Spanning_Tree function, the second is a partially dynamic algorithm based on the LEDA Spanning_Tree function, and the third is the LEDA Min_Spanning_Tree function. The LEDA Spanning_Tree function computes the spanning forest of a graph with m edges and n vertices in O(n + m α(m, n)) worst-case time. This is the time needed if we recompute the spanning forest of a graph after each edge insertion or edge deletion. The partially dynamic algorithm is able to obtain faster times for this problem: indeed it supports an edge insertion in O(α(m, n)) time, and recomputes the solution from scratch after each deletion in O(n + m α(m, n)) time. The LEDA Min_Spanning_Tree function implements Kruskal's algorithm, thus computing the minimum spanning tree of a graph in a total of O(m log n) time.

In our implementation the underlying functions of simple sparsification can be easily changed. For the dynamic connectivity problem, we plan to make further experiments with algorithms that are more efficient than Spanning_Tree, such as fast-query for random inputs (this choice being motivated by practical experiments) and Frederickson's algorithm [14] for non-random inputs (this choice being motivated by theoretical arguments).

We compared the average running times of simple sparsification and the underlying algorithm. We performed these tests on random sequences of updates starting from random graphs of different densities. Since the height of the sparsification tree of a graph with m edges and n vertices is ⌈log(m/n)⌉, different graph parameters, and more precisely different m/n ratios, give rise to sparsification trees of different heights. For the sake of brevity, we present here only a small but significant set of data: for n = 10, 25, 50, 100, 250 and 500 we choose the number of edges in the initial graph to be m = n (sparse graphs), n ln n (moderately dense graphs), n^{3/2} (dense graphs) and n²/4 (very dense graphs), and the total number of updates to be 500. We measured other graph and update sizes and observed similar data.

By Theorems 3 and 4, the average number of sparsification tree nodes examined during an update is a small constant, independent of the tree height.

We remark that Theorem 3 still holds if the graph is weighted (e.g., also for the dynamic minimum spanning tree problem). In the case of insertions, however, Theorem 4 does not extend to weighted graphs, as for the same analysis to hold we need the cost of every edge to be fixed, an assumption which we believe unrealistic.
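For concreteness, the partially dynamic scheme described at the beginning of this section (fast insertions via union-find, recomputation from scratch after each deletion) can be sketched as follows. This is a minimal illustration of ours, not the LEDA-based code used in the experiments.

    #include <algorithm>
    #include <numeric>
    #include <utility>
    #include <vector>

    // Union-find with path compression (union by rank omitted for brevity).
    struct UnionFind {
        std::vector<int> parent;
        explicit UnionFind(int n) : parent(n) {
            std::iota(parent.begin(), parent.end(), 0);
        }
        int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
        bool unite(int a, int b) {          // true iff two components merge
            a = find(a); b = find(b);
            if (a == b) return false;
            parent[a] = b;
            return true;
        }
    };

    struct SemiDynamicForest {
        int n;
        std::vector<std::pair<int, int>> edges;  // current edge set
        UnionFind uf;
        explicit SemiDynamicForest(int n) : n(n), uf(n) {}

        bool insert(int u, int v) {              // fast: near-constant amortized
            edges.emplace_back(u, v);
            return uf.unite(u, v);               // true iff (u, v) is a tree edge
        }
        // Slow path: recompute everything; e is assumed to be stored with
        // the same orientation as when it was inserted.
        void erase(std::pair<int, int> e) {
            edges.erase(std::remove(edges.begin(), edges.end(), e), edges.end());
            uf = UnionFind(n);
            for (const auto& p : edges) uf.unite(p.first, p.second);
        }
        bool connected(int u, int v) { return uf.find(u) == uf.find(v); }
    };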

                   Spanning_Tree         Min_Spanning_Tree
 n0      m0        Up ins    Up del      Up ins    Up del
 10      10        1.004     1.214       1.476     1.108
 10      23        0.946     1.693       2.480     1.523
 10      25        1.036     1.592       2.124     1.660
 10      31        0.909     1.695       2.436     1.728
 25      25        0.971     1.252       1.476     1.148
 25      80        0.768     1.824       2.404     1.528
 25     124        0.709     1.849       2.724     1.856
 25     156        0.672     1.844       2.776     1.820
 50      50        0.904     1.032       1.512     1.056
 50     195        0.669     1.605       2.172     1.292
 50     353        0.565     1.753       2.692     1.716
 50     625        0.666     1.848       3.032     1.832
 100    100        0.936     1.086       1.328     0.852
 100    460        0.692     1.731       2.472     1.592
 100   1000        0.545     1.951       2.928     1.772
 100   2500        0.584     1.867       3.120     1.848
 250    250        0.734     0.889       1.372     0.976
 250   1380        0.571     1.906       2.572     1.676
 250   3952        0.466     1.798       2.804     1.940
 250  15625        0.538     1.875       3.260     1.716
 500    500        0.814     0.832       1.380     0.880
 500   3107        0.615     1.836       2.808     1.792
 500  11180        0.544     1.881       3.124     1.760
 500  62500        0.534     1.825       3.464     2.012

Table 4. Experiments with simple sparsification and c-sparse certificates, for c = 1. Sparsification is run both on top of LEDA Spanning_Tree and on top of LEDA Min_Spanning_Tree. Up ins [respectively Up del] denotes the average number of sparsification tree nodes affected by an edge insertion [respectively edge deletion]. Each data set is the average of ten samples.

Table 4 shows the results of some of our experiments with simple sparsification on top of the LEDA Spanning_Tree function (for unweighted graphs) and on top of the LEDA Min_Spanning_Tree function (for weighted graphs). In both experiments, we measured the average number of sparsification tree nodes affected by a random update. As shown in Table 4, the experimental data about certificates for unweighted graphs confirm the theoretical predictions of Theorems 3 and 4, thus showing that the computed bounds are extremely tight.

This gives practical evidence that the model we adopt is not only theoretically convenient, but more importantly it suits well the evolution of dynamic unweighted random graphs, as it yields significant estimates in practice. On the other side, the experimental data we collected on dynamic minimum spanning trees confirm our criticism in this case. Indeed, while the average number of sparsification tree nodes affected by a deletion is still no more than 20/9, the number of tree nodes affected by an insertion does not comply at all with the bounds computed in Theorem 4.

An important consequence of the property that a random update involves fewer than 20/9 tree nodes on average for c = 1 is that the running times of simple sparsification for random updates are fairly independent of the number of edges: if the number of vertices n is fixed, the running times of simple sparsification show very little variation as m increases. This was clearly confirmed by our experiments, as illustrated in Table 5. As predicted by the theory, the denser the graph (i.e., the larger the value m/n), the better was the performance of simple sparsification over the underlying algorithm. For n = 500, for instance, the speed-up given by simple sparsification ranges from around 30% for very small values of m/n (say m/n ≤ 3) up to incredible speed-ups for larger values of m/n: when m/n = 125, simple sparsification produces algorithms that are around 28 times faster, and when m/n = 249.5 (the maximum value of this ratio for n = 500), algorithms that are around 60 times faster! For the sake of completeness, Figure 4 illustrates a detailed test for Spanning_Tree and simple sparsification on top of Spanning_Tree on graphs with 500 vertices and different densities.

It was surprising for us to notice that simple sparsification outperformed the underlying algorithm even for very sparse graphs: indeed, in our experiments, simple sparsification is no worse than the underlying algorithm even in the range 2 < m/n ≤ 3, a case where theoretical arguments would suggest the opposite to be true. This has an explanation for random inputs, however. Indeed, in this case the sparsification tree has height equal to three, and the two non-root nodes in the rightmost path have exactly m − 2n ≤ n edges. An update involving this rightmost path is thus likely to be faster than the update performed by the underlying algorithm, which operates on the whole graph.
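To make the largest speed-up shown in Table 5 concrete, the factor of about 28 quoted above for n = 500 and m/n = 125 can be read directly off the processing times for m = 62500:

$$\frac{48.84\ \mathrm{s}}{1.77\ \mathrm{s}} \approx 27.6 .$$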

6.4 A Comparative Test

In this section, we present the results of comparative experiments with all our implementations, using different types and sizes of inputs. Since there are no former experimental studies in the field of dynamic connectivity algorithms, there are no libraries of benchmark inputs. Due to the dynamic situation, "typical" or "interesting" inputs are even less obvious than in the static case. Additionally, there are not yet as many applications as in the static case which could provide inputs generated by the solution of "real-world" problems. Our experiments were conducted on two types of inputs. First of all, we studied random inputs (i.e., random sequences of updates on random graphs), because we were interested in the average case performance of the data structures. Furthermore, we used more structured non-random inputs that tried to force bad update patterns on the algorithms.


[Figure 4: processing times in seconds plotted against m (from 0 to 5000), for Sparsification and Spanning_Tree.]

Figure 4. The running times of Spanning_Tree and of simple sparsification on top of Spanning_Tree on graphs with 500 vertices and different densities, and sequences of 500 updates. Each data set is the average of ten different samples.

6.4.1 Random Inputs

The random inputs consisted of random graphs with different densities, and sequences of updates containing as many insertions as deletions and as queries. The deletions were uniformly distributed in the set of current edges. The insertions were uniformly distributed in the set of current non-edges with respect to the set of all possible edges for the fixed node set. The queries were uniformly distributed among all pairs of vertices.

The random inputs were generated as follows. The input generating program gen_ops takes six parameters, graph, m, name, #ins, #del, and #queries, where

- graph is the name of a file containing a graph Ḡ in LEDA format which becomes the supergraph of all graphs G_i that evolve with the sequence of operations to be generated,
- m is the number of edges in the initial random subgraph G_0 of Ḡ,
- name is a prefix for the two files generated by this program; one will contain the initial graph G_0 and the other one the sequence of operations,
- #ins is the desired number of edge insertions in the sequence of operations,
- #del is the desired number of edge deletions in the sequence of operations,
- and #queries is the desired number of connectivity queries in the sequence of operations, respectively.

[Figure 5: comparative running times of all implementations on random inputs.]

Since the vertex set is fixed to V(Ḡ), each possible subgraph of Ḡ is uniquely determined by its edge set. Thus, the program maintains the current graph G_i in the sequence implicitly by means of two sets containing the current edges of G_i and the complement of the edges of G_i with respect to Ḡ, respectively. These two sets are called edges and non-edges, and they are stored in balanced binary trees, which allow fast uniform random sampling as well as inserting and deleting of edges. Initially, all edges of Ḡ are put into non-edges. Then, m times, a non-edge is chosen uniformly at random, removed from non-edges and inserted into edges. This is equivalent to choosing a subset of m edges of Ḡ uniformly at random; thus we get the initial subgraph G_0, where all subgraphs of Ḡ with m edges are equally likely.² Then the sequence of operations is constructed as follows. Every operation is determined randomly, where the chances for an edge insertion, say, are proportional to the fraction of remaining insertions over the sum of all remaining operations. The same holds for deletions and queries, analogously. If the next operation is chosen to be an insertion, but it is not possible to do so because no non-edges are left, then a deletion is done instead, and vice versa. An insertion is done by moving a random non-edge to the set of edges, and a deletion is done by moving a random edge to the set of non-edges. In our experiments the supergraph Ḡ was always the complete graph on the specified number of nodes.

² Choosing a node uniformly at random, and then one of its adjacent edges, does not produce a uniform distribution on the edges in general.
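The sampling step can be illustrated as follows. The generator itself uses balanced binary trees, as described above; the sketch below (ours, with hypothetical names) achieves the same uniform sampling with a plain array and swap-deletion, which is enough to convey the idea.

    #include <random>
    #include <utility>
    #include <vector>

    // One of the two pools (edges / non-edges) maintained by the generator.
    struct EdgePool {
        std::vector<std::pair<int, int>> items;
        std::mt19937 rng{12345};                 // fixed seed for reproducibility

        bool empty() const { return items.empty(); }
        void add(std::pair<int, int> e) { items.push_back(e); }

        // Remove and return an element chosen uniformly at random
        // (the pool is assumed to be non-empty).
        std::pair<int, int> sample_and_remove() {
            std::uniform_int_distribution<std::size_t> d(0, items.size() - 1);
            std::swap(items[d(rng)], items.back());
            std::pair<int, int> e = items.back();
            items.pop_back();
            return e;
        }
    };

    // An edge insertion moves a random element from non-edges to edges;
    // an edge deletion moves a random element from edges to non-edges.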


We conducted a series of tests for graphs on n = 100, 300, 500, 700 vertices and m = n/2, n, n ln n, n^{3/2}, n²/4 edges. The sequence of operations consisted of m insertions, m deletions, and m queries, respectively. As an exception to the previous rule, we had longer sequences for n = 100, since otherwise the overall running times were too small. Moreover, we discovered that for n = 300, 500, 700 there was too much variance in the data for m = n/2, n, so we redid these tests with 3000 insertions, deletions and queries. For every pair of values for n and m we did ten experiments and took averages. In every experiment, we used the same input for all implementations. These tests were run on the SUN workstation. The results are shown in Figures 5, 6, 7, and 8. We do not show the results for m = n/2 in our figures, because most algorithms were much faster for this type of input, rendering the figures less readable. The fast-query algorithm was the fastest algorithm for this density, followed by Henzinger-King variant. In Figure 9, we show the time needed for preprocessing in the experiments with n = 300.

The choices of m were motivated by the following considerations. Since we are dealing with random graphs, we know quite a lot about their basic structural properties from the theory of random graphs [6, 7]. For m < n, the graph consists with high probability of a lot of connected components of size O(log n). If m is about the same as n, then the whole graph is still disconnected, but there is one component of size in the order of n^{2/3}. If m is somewhat larger than n, there is a single so-called giant component of order n, and all other components of the graph are of size O(ln n). For m ≥ n ln n the graph is connected with high probability. We chose m = n^{3/2} because it is an intermediate step on the way to a dense graph with m = n²/4. The smaller values of m, i.e., n/2, n, n ln n, are interesting as points of fundamentally different structural properties of the graphs. The bigger values of m show the influence of the total number of edges on the running times. For some of the algorithms this is particularly interesting, because they maintain a spanning forest for the whole graph and behave differently for updates involving tree edges and non-tree edges. The denser the graph, the higher is the expected number of non-tree edge updates as compared with the tree edge updates.

The properties of random graphs and the uniform distribution of the edges involved in an update allow us to draw some conclusions on the update sequences. For m ≪ n, there are almost exclusively tree updates, because very few non-tree edges exist. For m < n most of the algorithms benefit from the small connected components. For m approximately equal to n, the algorithms have to deal with a situation with big components on the one hand, while the connectivity keeps changing on the other hand. This fluctuation of connectivity becomes smaller if we add more edges, and it stops if there are around n ln n edges. Tree updates now involve just one big component, the whole graph, but there are fewer of them. One can expect only one tree update among ln n updates. For bigger values of m, the graph becomes more and more crowded. There are even fewer tree updates, and there are more and more possible replacement edges for a deleted tree edge.

We now have a look at the distributions of the running times for a fixed number of vertices.
Since queries are faster than updates, these distributions are mostly determined by the distribution of the updates as described above. The only exception to this rule is fast-update. In order to get a more detailed picture of the distributions of running times, we did tests with more different values of m for n = 300 for every algorithm. Their results are shown in Figure 6.


[Figure 6: running times of all implementations for n = 300 and a more detailed range of values of m.]

fast-update: For the fast-update algorithm, query(u, v) is implemented by a BFS from u. It is remarkable that the running time for graphs with a fixed number of vertices is fairly independent of the number of edges.

fast-query: The running time per tree update for this algorithm is Θ(n′ + m′), where n′ is the number of vertices in the affected components and m′ is the corresponding number of edges. A non-tree update is O(1). If m increases from n to n(n − 1)/2, then the expected number of tree updates decreases, while each of them becomes more and more expensive. In theory, these two effects cancel out, and we get an expected running time of O(n) per update. In practice, the throughput is between 3600 and 5900 operations per second for m in the range of 300 to 20000. For m = 40000, the throughput drops to 2300 operations per second. We cannot explain this from the theoretical point of view.

sparsification: As predicted by theory, the running times for sparsification are fairly independent of m, at least for m = 1500 and above. We get a throughput of about 700-800 operations per second in that range. Below m = 1500, we get a slightly better throughput, e.g., for m = 150 it is some 2500 operations per second. Like fast-query, fast-update, and Henzinger-King variant, sparsification attains its lowest throughput for m = 40000.

Henzinger-King and its variant: For Henzinger-King and its variant, the distinction between cheap non-tree updates and more expensive tree updates applies, too. They also benefit from small connected components as they appear for m < n. Inputs with m equal to n or slightly larger are the most difficult for Henzinger-King, while for its variant very dense inputs are additionally hard. If the number of edges increases, then Henzinger-King gets better and better, because there are more cheap non-tree updates. Its variant first shows a similar behavior, but then the throughput decreases again. The difference between Henzinger-King and its variant might be explained as follows. By construction, the performance of both algorithms for the cheap non-tree updates is approximately the same, whereas tree updates are more expensive for Henzinger-King than for its variant. Since tree updates become fewer and fewer for larger values of m, one can expect that the performances of Henzinger-King and its variant converge. We still need to explain why the performance of Henzinger-King variant drops again for denser graphs. If we do not consider the influence of cache misses and page faults, there is one more fact which might be the reason for this behavior. The true bound for a non-tree update is logarithmic in m. This means that these operations for dense graphs are slower by a factor of about 2 than for sparse graphs.

[Figure 7: comparative running times on random inputs.]

Comparison: For n = 300 our variant of Henzinger-King was the fastest algorithm, except for m < n, where fast-query is a clear winner. For graphs with n = 500 or n = 700, the gap between our variant of Henzinger-King and the second best algorithm becomes bigger. For n = 100, fast-query is slightly faster than our variant of Henzinger-King, especially for very dense and for very sparse graphs. The only drawback in using our variant of Henzinger-King for graphs on 300 or more vertices is its higher preprocessing time, as well as a bigger space usage, especially for dense graphs. If the sequence of updates is relatively short, then the overall running time for fast-query is better. It is interesting to compare the performance for a fixed density and different values of n. The relative performance degradation for Henzinger-King and its variant is much lower than for the other algorithms. This behavior can be explained by the polylogarithmic bounds of the former and the Θ(n) bounds of the latter algorithms.

[Figure 8: comparative running times on random inputs.]

An exception to this rule are the results for fast-query for m = n/2. We would expect only a logarithmic decrease there, too, because fast-query is linear in the size of the affected components, which is O(log n) with high probability in this case. However, for bigger values of n the ratio of performance between fast-query and Henzinger-King variant drops.

[Figure 9: preprocessing times in the experiments with n = 300.]

6.4.2 Non-Random Inputs

We wanted to test our implementations also on structured, non-random inputs. We now describe the graphs and sequences of updates we used for these experiments. These graphs are characterized by two parameters: n, the number of vertices, and an integer s, s > 0. We define a family of graphs 𝒢(n, s) for each value of n and s. Each graph G(n, s) in 𝒢(n, s) is as follows. G(n, s) has exactly n vertices, grouped in s different groups V_1, V_2, ..., V_s: V_j, 1 ≤ j ≤ s − 1, has exactly ⌈n/s⌉ vertices, while V_s has the remaining n − (s − 1)⌈n/s⌉ vertices. For 1 ≤ j ≤ s, let G_j be the subgraph of G(n, s) induced by V_j: G_j is the complete graph. Furthermore, there are (s − 1) edges in G(n, s): edge e_i, 1 ≤ i ≤ s − 1, is a bridge connecting one vertex of V_i and one vertex of V_{i+1}. Note that when s is much smaller than n, G(n, s) has strong structural properties, while when s is of the same order as n, G(n, s) loses completely its structural properties and degenerates into a sparse graph (a generator sketch is given at the end of this subsection). The dynamic operations we perform on G(n, s) are deletions and insertions of bridges, which will force the algorithms to handle dense subgraphs while the connectivity of the whole graph keeps changing. With high probability this situation does not happen for random inputs. Thus, this type of input is really non-random.

We ran many experiments on these non-random graphs, measuring the time that each algorithm took in performing a sequence of updates, including the initial preprocessing. In our experiments, simple sparsification and the original Henzinger-King algorithm were faster than the other algorithms by more than one order of magnitude for small values of s (i.e., highly structured graphs). As s gets larger, the graph G(n, s) tends to lose its structural properties, and the gap between all the algorithms became smaller in our experiments. When s becomes of the same order as n, G(n, s) degenerates into a sparse graph: in this case our variant of Henzinger-King became again the fastest algorithm. For more structured graphs (i.e., small values of s), simple sparsification was faster than Henzinger-King up to a certain number of updates, and then for larger sequences of updates (that leave room for amortization) the original algorithm of Henzinger-King took over. For instance, we observed the following behavior for s = 2. On graphs with 200 vertices, we needed around 400 updates for Henzinger-King to outperform simple sparsification; on graphs with 400 vertices, this threshold moved to around 700 updates; on graphs with 600 vertices, around 1000 updates were needed; for 800 vertices, around 1750 updates; finally, for graphs with 1000 vertices, around 2500 updates were needed. For larger values of s, the graphs become less structured, and all these thresholds separating simple sparsification and Henzinger-King became smaller. The experiments for s = 2 and n = 400 and 1000 are illustrated in Figures 10 and 11, which use a logarithmic scale for the sake of representation.
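The construction of these inputs is simple enough to sketch directly. The function below (ours, with hypothetical names) builds the edge list of G(n, s); it assumes every group is non-empty, i.e., (s − 1)⌈n/s⌉ < n, which holds for the parameter ranges used here.

    #include <algorithm>
    #include <utility>
    #include <vector>

    // Build the edge list of G(n, s): s near-equal groups of vertices, each
    // inducing a complete graph, chained together by s - 1 bridges.
    std::vector<std::pair<int, int>> make_gns(int n, int s) {
        std::vector<std::pair<int, int>> edges;
        int g = (n + s - 1) / s;                      // group size ceil(n/s)
        for (int j = 0; j < s; ++j) {                 // clique on each group V_j
            int lo = j * g, hi = std::min(n, lo + g);
            for (int u = lo; u < hi; ++u)
                for (int v = u + 1; v < hi; ++v)
                    edges.emplace_back(u, v);
        }
        for (int j = 0; j + 1 < s; ++j)               // bridge from V_j to V_{j+1}
            edges.emplace_back(j * g, (j + 1) * g);
        return edges;
    }

    // The update sequence then repeatedly deletes and re-inserts these
    // bridges: the dense cliques stay intact while the connectivity of
    // the whole graph keeps changing.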

7 Conclusions and Future Work

We have described elegant, robust and efficient implementations of simple sparsification [11] and of the randomized dynamic connectivity algorithm of Henzinger and King [19], based on the LEDA C++ library of efficient data types and algorithms. We have tried these implementations on dynamic connectivity and have shown with experimental data that they produce fast algorithms in practice.

[Figure 10: throughput (# op / sec, logarithmic scale) plotted against the number of updates (10 to 10000), for n = 400.]

Figure 10. The throughput (number of operations per second) of Henzinger-King and sparsification on non-random graphs with 400 vertices for different lengths of the sequence of updates.

We plan to continue this experimental work in several directions. First, we would like to implement dynamic algorithms for other problems, such as edge- and vertex-connectivity. This will require developing static algorithms for higher connectivity that are not yet offered by existing libraries, and whose implementation may be of independent interest to these libraries. Second, we designed and implemented a version of simple sparsification that operates on dynamic algorithms (the version described in this paper only operates on static algorithms) [5]. We used this version of simple sparsification on top of existing dynamic algorithms, such as Frederickson's dynamic data structures [14, 15] for minimum spanning trees. The algorithms by Frederickson are sophisticated algorithms, for which no implementation was previously available, and which required a great deal of implementation effort. We refer the interested reader to reference [5] for details on how the asymptotically efficient methods obtained by Frederickson's algorithms, run with simple sparsification on top of them, can play an important role in practice as well. Further investigation in this direction, and better tuning of sparsification, is still under development. Third, we would like to employ simple sparsification on other partially dynamic data structures, such as the ones developed in [9, 16, 21, 22, 23, 33]. This requires again implementing all these algorithms efficiently, and may be very time consuming. However, on the positive side, we strongly believe that once all these implementations are available, a comprehensive empirical study of dynamic graph algorithms can shed further light on the quest for efficient dynamic graph algorithms and can greatly influence future research in this field.


[Figure 11: throughput (# op / sec, logarithmic scale) plotted against the number of updates (10 to 10000), for n = 1000.]

Figure 11. The throughput (number of operations per second) of Henzinger-King and sparsification on non-random graphs with 1000 vertices for different lengths of the sequence of updates.

Acknowledgments We are grateful to Monika R. Henzinger for her comments on a preliminary version of this paper, and to the anonymous referees for their constructive feedback.


References

[1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA, 1974.

[2] D. Alberts, "Implementation of the dynamic connectivity algorithm by Henzinger and King", TR B 95-10, Freie Universität Berlin, Inst. f. Informatik, 1995.

[3] D. Alberts, G. Cattaneo, and G. F. Italiano, "An empirical study of dynamic graph algorithms", Proc. 7th ACM-SIAM Annual Symp. on Discrete Algorithms (SODA 96), Atlanta, GA, USA, 1996.

[4] D. Alberts and M. Rauch Henzinger, "Average case analysis of dynamic graph algorithms", Proc. 6th Symp. on Discrete Algorithms (1995), 312–321.

[5] G. Amato, G. Cattaneo, and G. F. Italiano, "Experimental analysis of dynamic minimum spanning tree algorithms", Proc. 8th ACM-SIAM Symp. on Discrete Algorithms (SODA 97), New Orleans, USA, 5–7 January 1997, 314–323.

[6] N. Alon and J. H. Spencer, The Probabilistic Method, J. Wiley & Sons, New York, 1991.

[7] B. Bollobás, Random Graphs, Academic Press, London, 1985.

[8] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, McGraw-Hill / The MIT Press, NY, 1990.

[9] E. A. Dinitz, "Maintaining the 4-edge-connected components of a graph on-line", Proc. 2nd Israel Symp. on Theory of Computing and Systems (1993), 88–99.

[10] D. Eppstein, "Average case analysis of dynamic geometric optimization", Proc. 5th Symp. on Discrete Algorithms (1994), 77–86.

[11] D. Eppstein, Z. Galil, G. F. Italiano, and A. Nissenzweig, "Sparsification: a technique for speeding up dynamic graph algorithms", Proc. 33rd IEEE Symp. on Foundations of Computer Science (1992), 60–69.

[12] D. Eppstein, Z. Galil, G. F. Italiano, and A. Nissenzweig, "Sparsification: a technique for speeding up dynamic graph algorithms", full version, manuscript, revised 1995.

[13] D. Eppstein, Z. Galil, G. F. Italiano, and T. H. Spencer, "Separator based sparsification for dynamic planar graph algorithms", Proc. 25th Annual ACM Symp. on Theory of Computing (1993), 208–217.

[14] G. N. Frederickson, "Data structures for on-line updating of minimum spanning trees, with applications", SIAM J. Comput. 14 (1985), 781–798.

[15] G. N. Frederickson, "Ambivalent data structures for dynamic 2-edge-connectivity and k smallest spanning trees", Proc. 32nd IEEE Symp. on Foundations of Computer Science (1991), 632–641.

[16] Z. Galil and G. F. Italiano, "Maintaining the 3-edge-connected components of a graph on-line", SIAM J. Comput. 22 (1993), 11–28.

[17] M. R. Henzinger, "Fully dynamic cycle equivalence in graphs", Proc. 35th IEEE Symp. on Foundations of Computer Science (1994), 744–755.

[18] M. R. Henzinger, personal communication, 1995.

[19] M. R. Henzinger and V. King, "Randomized dynamic graph algorithms with polylogarithmic time per operation", Proc. 27th Symp. on Theory of Computing (1995), 519–527.

[20] M. R. Henzinger and M. Thorup, "Improved sampling with applications to dynamic graph algorithms", Proc. ICALP '96, 1996.

[21] A. Kanevsky, R. Tamassia, G. Di Battista, and J. Chen, "On-line maintenance of the four-connected components of a graph", Proc. 32nd Annual Symp. on Foundations of Computer Science (1991), 793–801.

[22] J. A. La Poutré, "Maintenance of triconnected components of graphs", Proc. 19th Int. Colloquium on Automata, Languages and Programming (1992), 354–365.

[23] J. A. La Poutré, J. van Leeuwen, and M. H. Overmars, "Maintenance of 2- and 3-connected components of graphs, Part I: 2- and 3-edge-connected components", Discrete Mathematics 114 (1993), 329–359.

[24] K. Mehlhorn and S. Näher, "LEDA, a platform for combinatorial and geometric computing", Comm. ACM (1995).

[25] K. Mulmuley, "Randomized multi-dimensional search trees: dynamic sampling", Proc. 7th Symp. on Computational Geometry (1991), 121–131.

[26] K. Mulmuley, Computational Geometry: An Introduction through Randomized Algorithms, Prentice Hall, 1993.

[27] H. Nagamochi and T. Ibaraki, "Linear time algorithms for finding a sparse k-connected spanning subgraph of a k-connected graph", Algorithmica 7 (1992), 583–596.

[28] S. Näher, LEDA User Manual, Version 3.0, Technical Report, Max-Planck-Institut für Informatik, 1994.

[29] G. Ramalingam, "Bounded incremental computation", Ph.D. Thesis, Department of Computer Science, University of Wisconsin at Madison, August 1993.

[30] M. Rauch, "Fully dynamic biconnectivity in graphs", Proc. 33rd IEEE Symp. on Foundations of Computer Science (1992), 50–59.

[31] M. Rauch, "Improved data structures for fully dynamic biconnectivity", Proc. 26th Symp. on Theory of Computing (1994), 686–695.

[32] O. Schwarzkopf, "Dynamic Maintenance of Convex Polytopes and Related Structures", Ph.D. Thesis, Freie Universität Berlin, 1992.

[33] J. Westbrook and R. E. Tarjan, "Maintaining bridge-connected and biconnected components on-line", Algorithmica 7 (1992), 433–464.


                 sparsification        Spanning_Tree
 n0      m0      prep.     proc.      prep.     proc.
 10       5      0.00      0.10       0.00      0.10
 10      10      0.00      0.10       0.00      0.11
 10      23      0.00      0.11       0.00      0.11
 10      31      0.00      0.11       0.00      0.11
 25      12      0.00      0.11       0.00      0.12
 25      25      0.00      0.13       0.00      0.14
 25      80      0.00      0.12       0.00      0.18
 25     124      0.01      0.14       0.00      0.21
 25     156      0.01      0.13       0.00      0.23
 50      25      0.00      0.14       0.00      0.16
 50      50      0.00      0.15       0.00      0.19
 50     195      0.01      0.19       0.00      0.30
 50     353      0.01      0.18       0.00      0.39
 50     625      0.02      0.18       0.00      0.57
 100     50      0.00      0.18       0.00      0.23
 100    100      0.00      0.23       0.01      0.31
 100    460      0.02      0.23       0.00      0.58
 100   1000      0.02      0.28       0.00      0.92
 100   2500      0.07      0.28       0.00      1.78
 250    125      0.01      0.34       0.00      0.48
 250    250      0.01      0.45       0.01      0.68
 250   1380      0.05      0.54       0.01      1.47
 250   3952      0.11      0.61       0.01      3.08
 250  15625      0.44      0.76       0.02      9.83
 500    250      0.04      0.48       0.03      0.82
 500    500      0.04      0.76       0.03      1.18
 500   3107      0.11      0.86       0.03      3.07
 500  11180      0.33      1.31       0.04      7.89
 500  62500      1.89      1.77       0.10     48.84

Table 5. Experiments with sparsification on top of Spanning_Tree, and Spanning_Tree alone, on different graphs and sequences of 500 updates. For each algorithm the left column is the preprocessing time and the right column is the processing time in seconds. Each data set is the average of ten different samples.
