Self-healing Deterministic Expanders
arXiv:submit/0489890 [cs.DC] 7 Jun 2012
Gopal Pandurangan∗   Peter Robinson†   Amitabh Trehan‡
We present fully-distributed algorithms that construct and maintain deterministic expander networks (i.e., the expansion properties are deterministically guaranteed) in the presence of an adaptive adversary. To the best of our knowledge, these algorithms are the first distributed implementations of deterministic expanders that work even under an all-powerful adaptive adversary (that has unlimited computational power and knowledge of the entire network state, can decide which nodes join and leave and at what time, and knows the past random choices made by the algorithm). Previous distributed expander constructions typically provide only probabilistic guarantees and these rapidly degrade over a series of network changes, so here we provide a much needed solution. Our algorithms are in a self-healing model where at each step, the omniscient adversary either inserts or deletes a node and the algorithm responds by locally adding or dropping edges (connections) in the network. Our algorithms provide fast healing with high probability in O(log n) rounds on average, where n is the number of nodes currently in the network, while deterministically maintaining a constant degree expander. Moreover, our algorithms guarantee that the number of added/dropped edges per round is constant on average. Our communication model follows the CONGEST model, allowing messages of size only O(log n). This ensures that our algorithms are highly scalable. We also present a lower bound of Θ(log n) rounds on average, for any distributed expander self-healing algorithm. This shows that our algorithms are asymptotically the best possible (up to constant factors) in this model. Our work also yields an improved version of the self-healing algorithm Xheal [PODC 2011], which previously relied on expander constructions with only probabilistic guarantees.
1. Introduction

This is the modern age of ubiquitous and self-driven networks. Modern networks (social, P2P, mobile, ad-hoc, the Internet, etc.) are dynamic and increasingly resemble self-governed living entities with largely distributed control and communication. In such a scenario, the network topology governs much of the functionality of the network. In what topology should such nodes
∗ Division of Mathematical Sciences, Nanyang Technological University, Singapore 637371 and Department of Computer Science, Brown University, Providence, RI 02912. E-mail: [email protected].
† Division of Mathematical Sciences, Nanyang Technological University, Singapore 637371. E-mail: [email protected].
‡ Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa, Israel 32000. E-mail: [email protected].
(having limited resources and bandwidth) connect so that the network has effective communication channels with low latency for all messages, is robust to a limited number of failures, and nodes can quickly sample a random node in the network (enabling many randomized protocols)? The well-known answer is that they should connect as a sparse, low-degree expander (see e.g., [2]). How should such a topology be constructed? In the centralized setting, there are now many known expanders and construction techniques (discussed below under Related Works). However, the problem is challenging in the distributed setting in a dynamic network (i.e., a network exhibiting churn, with nodes and edges entering and leaving the system). We are aware of only a few methods for the distributed construction of expander graphs, e.g., the Law-Siu construction [28], the improved Law-Siu construction of Gkantsidis, Mihail and Saberi (GMS) [15], Dolev and Tzachar's Spanders [9], and the expander-from-tree construction of Reiter, Samar and Wang (RSW) [39]. However, the probabilistic guarantees of the expansion properties of these constructions degrade rapidly over a series of network changes (insertions and/or deletions of nodes/edges), in the sense that expansion properties cannot be maintained ad infinitum, due to their probabilistic nature. Even if the network is guaranteed to be an expander with high probability (w.h.p.), i.e., with probability 1 − 1/n^c for some constant c, in every step (e.g., as in the protocols of [28] and [34]), the probability of violating the expansion bound may tend to 1 after some polynomial number of steps. Critically, the expansion properties can degrade even more rapidly under adversarial insertions and deletions (e.g., see [28]). Hence, in a dynamic setting, deterministic expander constructions are all the more needed compared to randomized constructions.
Furthermore, it is desirable that the network retain its expander properties (such as high conductance, robustness to failures, and efficient multi-path routing) even under dynamic network changes. This is useful for building good overlay and P2P network topologies with deterministic guarantees that do not degrade with time, unlike the above approaches, which have only probabilistic guarantees. In [15], Gkantsidis, Mihail and Saberi raised the following open questions: Q1) It is an open question how to design a strongly decentralized construction (of dynamic P2P topologies) with constant overhead. Q2) Both of these algorithms (their algorithm and the Law-Siu algorithm) handle deletions much less effectively than additions; handling deletions efficiently is an interesting open question. Our work answers both of these questions. We answer Q1 in the negative by showing that any fully distributed algorithm (in reasonable dynamic models where nodes with limited power can add limited repair edges) needs Ω(log n) overhead for expander construction (Section 5). Regarding Q2, our algorithms handle deletions as effectively as insertions in the average case, and only slightly worse in the worst case. Our algorithms work in a self-healing model (discussed below). Self-healing is a responsive approach to fault-tolerance, in the sense that it responds to an attack (or component failure) by changing the topology of the network. This approach works irrespective of the initial state of the network, and is thus orthogonal and complementary to traditional non-responsive techniques. Our approach requires the network to be reconfigurable, in the sense that the topology of the network can be changed. Many important networks are reconfigurable, e.g., peer-to-peer, wireless mesh and ad-hoc computer networks, and infrastructure networks, such as an airline's transportation network.
Most of these are also dynamic, since individual nodes may initiate new connections or drop existing ones. Our algorithms have immediate applications. They provide deterministic, efficient and scalable expander constructions which can be used as a subroutine by any algorithm that requires such a construction. One such algorithm is the self-healing algorithm Xheal [35]. This is a self-healing algorithm (in a model similar to ours) that efficiently maintains spectral properties of the network while allowing only a small stretch and a small degree increase. Xheal relied on the
Law-Siu randomized construction [28] for its distributed implementation, and required the LOCAL model (unlimited message sizes) [36]. Moreover, the spectral guarantees of the randomized expander construction degrade over some polynomial number of steps. Using our algorithm as a subroutine, Xheal can be implemented in a deterministic manner in the more scalable CONGEST model (O(log n)-size messages per step) [36]. There are many distributed P2P systems that could benefit [43, 25]. In particular, Chord [43] is an efficient distributed hash table (DHT) based P2P system, and Re-Chord [25] is a self-stabilizing version. These can be extended by putting another overlay layer based on our construction on top to add resiliency, giving efficient and resilient DHTs. Our expander constructions are also relevant for many distributed algorithmic applications in dynamic P2P networks where expansion properties need to be maintained (cf. [5] and references therein).

Our Contributions: In this paper, we present what are, to the best of our knowledge, the first distributed algorithms to construct and dynamically maintain a network (under both insertions and deletions) that is a deterministic expander¹ under an all-powerful adaptive adversary in the self-healing model. a) We present a general framework which can be used to obtain efficient distributed implementations of specific (centralized) deterministic expander constructions (e.g., p-cycles [20] or the zig-zag product [38]). b) We present almost-optimal distributed self-healing constructions for deterministic expanders that take O(log n) time and a polylogarithmic number of messages on average per healing step. Our algorithms need only local information and use only small-sized messages, and hence are scalable. We guarantee that the network always has a constant spectral gap (for some fixed absolute constant) and constant degree, and hence is a (sparse) expander.
In one of our constructions (based on Cayley graphs [26]), we can guarantee that the degree is small, which is relevant for building very sparse networks. We also show that our results can be extended to a model where multiple nodes can be inserted or deleted (in one step) under certain further assumptions (cf. Section 4). c) We prove a tight lower bound showing that any self-healing algorithm (under an adaptive adversary) needs at least Ω(log n) time and messages on average to maintain an expander.

The following theorems state our main results:

Theorem 1. Consider an adaptive adversary that observes the entire state of the network, including all past random choices, and inserts or removes a single node in every step. There is a distributed algorithm for deterministically maintaining a constant degree expander network that has a constant spectral gap. On average, the algorithm takes O(log n) rounds, where n is the current network size, uses a polylogarithmic number of messages, and requires a constant number of topology changes per healing step, with high probability. Moreover, the worst-case running time is bounded by O(log³ n) rounds, with high probability.

Theorem 2. Any distributed algorithm that deterministically maintains a (constant degree) expander in our self-healing model against an adaptive adversary needs at least Ω(log n) messages and rounds on average per node insertion.

Related Works: Expanders are a very important class of graphs that have applications in various areas of computer science: networks, cryptography, derandomization, complexity and coding theory, etc. (e.g., see [20] for a survey). For example, in distributed computing and
¹ By a deterministic expander, we mean that the expansion properties of the graph hold deterministically. Strictly speaking, this terminology is redundant, since an expander is by definition a graph whose expansion properties hold; we use it here to stress that the focus is on deterministic constructions of expanders.
networks, they have been used for censorship resistant networks [12, 11], fault tolerant networks [37], analyzing information spreading in networks [19], and efficient (Byzantine) agreement and leader election algorithms [10, 45, 24, 23, 5]. There are many well known centralized expander construction techniques [31, 13, 38, 1, 42, 22]. See [20] for details. Our construction and general framework rely on the notion of virtual graphs/virtualization. This is a useful notion, and is discussed in the self-healing context in [44] and used by the authors in [16, 18]. Another work which uses this notion and is close to our present work in its aims is [4]; they propose algorithms for expansion maintenance in overlay networks under dynamic node insertions (and not deletions). They propose an algorithm based on zig-zag constructions for nonadversarial (benevolent) insertions and a better analysis of the Law-Siu algorithm [28] for the adversarial case (this yields only a randomized construction). Unlike our constructions which assume adversarial insertions and deletions, their construction is mainly concerned with benevolent insertions, i.e., the algorithm will decide where the incoming node will be inserted and not the adversary. Under this assumption, they show that the algorithm needs only constant amount of work. (This will also be true for our case, if we assume nonadversarial insertions.) They also raise the open question of showing a non-trivial lower bound for adversarial insertions, which we settle in this paper. As stated earlier, there are a few other works addressing the problem of distributed expander construction; however all of these are randomized and the expansion properties hold with probabilistic guarantees only. Many of these are motivated by applications to P2P networks. Law and Siu [28] give a construction where an expander is constructed by composing an appropriate number of random Hamiltonian cycles. 
They start with a predefined graph of at least three nodes, and incoming nodes connect by sending a special request to an existing node. The probabilistic guarantees provided degrade rapidly, especially under adversarial deletions. [15] makes use of random walks to add new peers with only constant overhead; however, it is not a fully decentralized algorithm. Both these algorithms handle insertions much better than deletions. Spanders [9] is a self-stabilizing construction of an expander network that is a spanner of the graph. In [7], the authors show a way of constructing random regular graphs (which are good expanders, w.h.p.) by performing a series of random 'flip' operations on the graph's edges. [39] maintains an almost d-regular graph, i.e., with degrees varying around d, using uniform sampling to select, for each node, a set of expander-neighbors. In a dynamic setting where nodes get inserted and deleted in every step, there is the additional challenge of quantifying the work done by the algorithm to maintain the desired properties. In general, lower bounds on the work needed are not well studied. An exception is the lower bound result of [29], which shows that Ω(log n) work is also required to maintain connectivity in any dynamic network under a stochastic model (the same as in [34]). Note that this result does not apply to our self-healing model, which is a discrete insertion/deletion model. In a model similar to our self-healing model, [27] maintains a DHT in a setting where an adaptive adversary can add/remove O(log n) peers per step. Another paper that considers node joins/leaves is [21]. The self-stabilizing SKIP+ graph of [21] allows handling (single) node joins/leaves using polylogarithmic work. It was shown in [3] that skip graphs contain expanders as subgraphs w.h.p., which can be used as a randomized expander construction.
1.1. Preliminaries

We use the notation G = ⟨n, d, λ_G⟩ if a d-regular graph G has n vertices and the second largest eigenvalue of the normalized adjacency matrix of G is at most λ_G, for some λ_G < 1.
Definition 1 (Expanders). Let d be a constant and let G = (⟨n_0, d, λ_0⟩, ⟨n_1, d, λ_1⟩, . . .) be an infinite sequence of graphs. We say that G is an expander family of degree d if there is a constant λ < 1 such that λ_i ≤ λ and n_i < n_{i+1}, for all i ∈ N. Moreover, the individual graphs in G are called expanders with spectral gap 1 − λ.

We next state some known results on expander graphs that we use in our analysis. The following lemma relates the number of edges between any two sets to the expansion of the graph (given by the spectral gap).

Lemma 1 (Expander Mixing Lemma, adapted from Lemma 2.5 of [20]). Let G be a d-regular graph with n vertices and set λ = λ_G. Then, for all S, T ⊆ V,

| |E(S, T)| − d|S||T|/n | ≤ λ d √(|S||T|).

We say that H is a contraction of a graph G if H is formed by identifying two distinct nodes into a single node. We extensively make use of the fact that this operation leaves the spectral gap intact:

Lemma 2 (Contraction Lemma, cf. [6]). If H is formed by contractions from a graph G, then λ_H ≤ λ_G.
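As a purely illustrative aside (not part of the paper's algorithms), the Expander Mixing Lemma can be checked numerically. The sketch below builds the 3-regular p-cycle on Z_23 (the same construction that appears in Figure 4), estimates λ by power iteration on the normalized adjacency matrix, and then tests the lemma's bound on random vertex sets; all function names and tolerances are our own choices.

```python
import math
import random

def p_cycle(p):
    """3-regular 'p-cycle' on Z_p (cf. Figure 4): x is joined to x-1, x+1
    and its multiplicative inverse mod p (0, 1 and p-1 get self-loops)."""
    return {x: [(x - 1) % p, (x + 1) % p, pow(x, p - 2, p) if x else 0]
            for x in range(p)}

def nontrivial_eigenvalue(adj, d=3, iters=500):
    """Power-iteration estimate of the largest absolute eigenvalue of the
    normalized adjacency matrix on the space orthogonal to all-ones."""
    n = len(adj)
    rnd = random.Random(1)
    v = [rnd.random() for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]               # project off principal eigenvector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        v = [sum(v[y] for y in adj[x]) / d for x in range(n)]
        lam = math.sqrt(sum(x * x for x in v))  # growth rate -> |lambda|
    return lam

def ordered_edges(adj, S, T):
    """|E(S, T)| counted as ordered pairs (u, v) with u in S, v in T."""
    return sum(1 for u in S for v in adj[u] if v in T)
```

With `lam = nontrivial_eigenvalue(p_cycle(23))`, the deviation `|E(S,T)| − d|S||T|/n` stays below `lam · d · sqrt(|S||T|)` for every pair of sets S, T, as the lemma promises.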
1.2. The Self-Healing Model

The model we use is similar to the models used in [17, 35, 18, 40]; also see [44]. We now describe the details. Let G = G_0 be a small arbitrary graph² where nodes represent processors in a distributed network. In each step t ≥ 1, the adversary either deletes or inserts a single³ node, and the nodes in the network react to this change by updating the topology, yielding G_t, as described in Figure 1. In case of an insertion, we assume that the newly added node is initially connected to a constant number of other nodes. This is merely a simplification; nodes are not malicious but faithfully follow the algorithm, thus we could explicitly require our algorithm to immediately drop all but a constant number of edges. The adversary is adaptive and is aware of our algorithm and the complete state of the current network, including all past random choices. We assume that no other node is deleted or inserted until the recovery phase of the current change has concluded. The computation during the recovery phase is structured into synchronous rounds. As in the CONGEST model [36], nodes can communicate with their neighbors by sending messages of size O(log n), which are neither lost nor corrupted. The algorithm's goal is to fulfil the success metrics stated in Figure 1. Intuitively speaking, its goal is to maintain good expansion properties and a constant node degree. To be of use in real networks, it is essential that our algorithm provide fast healing, i.e., the recovery time should be polylogarithmic. At the same time, we want to minimize the resources (number of messages sent, number of added/dropped edges) that are spent during a recovery phase. These metrics capture the reconfigurable nature of many modern networks and overlay networks, and the constraints and distributed nature of the algorithms model the limited memory and bandwidth of the nodes and the network.
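To make the event loop of the model concrete, the following toy driver simulates the adversary's insert/delete steps and the bookkeeping of the success metrics; the class and method names are ours, and `recover` is only a stub standing in for an actual healing algorithm.

```python
import random

class SelfHealingSimulator:
    """Toy driver for the insert/delete self-healing model (cf. Figure 1).
    The adversary and the metrics accounting are illustrative only."""

    def __init__(self, g0):
        self.g = {u: set(nbrs) for u, nbrs in g0.items()}
        self.rounds = self.messages = 0

    def adversary_step(self, rnd):
        if len(self.g) > 3 and rnd.random() < 0.5:      # deletion
            u = rnd.choice(sorted(self.g))
            nbrs = self.g.pop(u)
            for v in nbrs:
                self.g[v].discard(u)
            self.recover(sorted(nbrs))                  # neighbors start recovery
        else:                                           # insertion
            u = max(self.g) + 1
            contacts = rnd.sample(sorted(self.g), 2)    # O(1) initial contacts
            self.g[u] = set(contacts)
            for v in contacts:
                self.g[v].add(u)
            self.recover([u])

    def recover(self, affected):
        # Stub recovery phase: a real implementation would add/drop edges
        # here and charge rounds, messages and edge changes accordingly.
        self.rounds += 1
        self.messages += len(affected)

    def max_degree(self):
        return max(len(n) for n in self.g.values())
```

Running a few dozen adversarial steps keeps the adjacency structure consistent (symmetric, loop-free) while the counters accumulate the per-phase cost that the success metrics measure.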
Pre-processing: Nodes can perform arbitrary computation on the initial network G_0 of constant size.
for each step t ≥ 1 do
    Adversary deletes or inserts a node v_t from/into G_{t−1}, forming U_t.
    Neighbors of v_t initiate the recovery phase: Nodes of U_t may communicate (synchronously, in parallel) with their immediate neighbors. During this phase, each node may insert edges joining it to any other nodes as desired. Nodes may also drop edges from previous rounds if necessary. At the end of this phase, we call the graph G_t.
Success Metrics:
1. Constant node degree bound ∆. Formally, ∃∆ ∈ O(1) ∀t ∀v ∈ G_t : degree(v, G_t) ≤ ∆.
2. Constant spectral gap λ. Formally, ∃λ ∈ Θ(1), λ < 1, ∀t : λ_{G_t} ≤ λ.
3. Recovery time: number of rounds necessary for a recovery phase.
4. Communication complexity: number of messages used in a recovery phase.
5. Work complexity: number of edges added or dropped.
Figure 1: The Node Insert, Delete and Network Repair Model – Distributed View.

Algorithm     | Deterministic expansion? | Adversary  | Recovery time (avg / worst) | Communication (avg / worst)   | Work (avg / worst)
L-S [28] ($)  | no                       | Oblivious  | O(log_d n) / O(log_d n)     | O(d log_d n) / O(d log_d n)   | O(d) / O(d)
Naive (!)     | yes                      | Adaptive   | Θ(log n) / Θ(log n)         | Θ(n) / Θ(n)                   | Θ(n) / Θ(n)
A-Y [4]       | yes                      | Benevolent | O(1) / O(log n)             | O(1) / O(n)                   | O(1) / O(n)
Our paper (†) | yes                      | Adaptive   | O(log n)† / O(log³ n)†      | O(log² n)† / O(n log² n)†     | O(1) / O(n)

$: d is a parameter of the algorithm, the number of Hamiltonian cycles in the 'healing' graph H.
!: Basic flooding algorithm: all nodes are informed of each event by flooding; they drop all edges and reconstruct.
†: With high probability.

Figure 2: Comparison of distributed expander constructions.
2. Self-Healing Algorithms

This section introduces our algorithms for the self-healing and maintenance of deterministic expander networks. Before discussing our algorithms, it is instructive to consider a simple flooding-based algorithm that also achieves deterministic expansion and node degree bounds, albeit at a much larger cost: whenever a node is inserted (or deleted), a neighboring node floods a notification throughout the entire network, and every node, having complete knowledge of the current network graph, locally recomputes the new expander topology. While this achieves a logarithmic runtime bound, it comes at the cost of using Θ(n) messages during every healing step and, in addition, might also result in O(n) topology changes, whereas our algorithms require only a polylogarithmic number of messages and a constant number of topology changes on average. Figure 2 compares our algorithm with some known distributed expander construction algorithms, including the above-mentioned flooding algorithm.
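The Θ(n) message cost of the flooding baseline can be counted directly. The sketch below (our own illustration, not the paper's algorithm) floods a single notification by BFS and counts one message per traversed edge endpoint; on a constant-degree graph the total is the sum of all degrees, i.e., Θ(n).

```python
from collections import deque

def flood_cost(adj, source):
    """Messages used when `source` floods one notification: every informed
    node forwards it once over each incident edge, so the total equals the
    sum of all degrees, i.e., Theta(n) on a constant-degree network."""
    informed = {source}
    q = deque([source])
    messages = 0
    while q:
        u = q.popleft()
        for v in adj[u]:
            messages += 1               # u forwards the notification to v
            if v not in informed:
                informed.add(v)
                q.append(v)
    return messages, len(informed)
```

On a ring of n nodes, for instance, one flood reaches all n nodes and costs exactly 2n messages.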
² In the usual self-healing model [17, 35, 18, 40], we can begin with any arbitrary graph, but here we restrict ourselves to small initial graphs. This allows us to construct an expander that we can then maintain at larger graph sizes.
³ See Section 4 for multiple insertions/deletions per step.
Figure 3: Virtual graphs framework: Z^i is the i-th largest (virtual) expander topology (of a series). On node insertion, if the incoming node cannot get spare virtual vertices to simulate, reconstruct to the larger topology Z^{i+1} to increase availability. On node deletion, if there are not enough takers for the deleted node's virtual vertices, reconstruct to the smaller topology Z^{i−1} to free capacity.

We now describe our framework, which is based upon the idea of virtual graphs. We give an informal description of the algorithm and the intuition behind it before moving on to a more precise description. Informally, if we construct a graph corresponding to the network, where nodes correspond to processors and edges to connections, then we call these nodes the real nodes. We can then construct another graph, which we call a virtual graph, in which the vertices do not directly correspond to the real network; instead, each (virtual) vertex in this graph is simulated by a real node. (Henceforth, we reserve the term "vertex" for vertices in a virtual graph and "node" for vertices in the real network.) A real node may simulate multiple virtual vertices in this network. We keep the number of virtual vertices simulated by a real node bounded by a constant at all times, to keep the degree bounded by a constant. In our algorithm(s), we maintain this virtual structure and show that preserving certain desired properties in the virtual graph leads to these properties being preserved in the real network. Our virtual expander graph (formally described in Section 2.1) is such a virtual graph; it leads the underlying network of real nodes to be an expander graph too. Figure 3 illustrates the high-level intuition behind the maintenance of the virtual expander graph: a particular deterministic expander graph construction can be viewed as a discrete series of expander graphs increasing in size with the number of nodes.
Our algorithm maintains the virtual graph as such an expander graph while maintaining a "balanced load mapping" between the virtual vertices and the real nodes (Section 2.1) as the number of nodes changes. The balanced load mapping keeps the number of virtual vertices simulated by any real node constant; this is crucial for maintaining the constant degree bound. Each insertion and deletion can result in a loss of expansion (i.e., the spectral gap can decrease), which needs to be repaired. However, the repair has to be done carefully so as not to affect the node degree. Each event triggers load balancing and sometimes switching between iterations of the expander graphs according to the load conditions. Section 2.2 gives a detailed overview.
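The dispatch logic sketched in Figure 3 can be condensed into a few lines. The following sketch tracks only per-node loads (the topology itself is elided), mirrors the sets Spare and Low formalized in Section 2.1, and decides between a simple step, an inflation and a deflation; the class name, the simplistic redistribution and the choice of the first donor are our own simplifications (the actual algorithm locates donors and takers via random walks).

```python
ZETA = 2                        # assumed maximum cloud size
THETA = 1.0 / (68 * ZETA + 1)   # reconstruction threshold, cf. Equation (3)

class LoadTracker:
    """Minimal stand-in for the framework's state: how many virtual
    vertices each real node simulates."""

    def __init__(self, loads):
        self.load = dict(loads)             # real node -> #virtual vertices

    def spare(self):                        # nodes that can donate a vertex
        return [u for u, l in self.load.items() if l > 1]

    def low(self):                          # nodes that can absorb more load
        return [u for u, l in self.load.items() if l <= 2 * ZETA]

    def handle(self, kind, node):
        n = len(self.load)
        if kind == "insert":
            donors = self.spare()
            if len(donors) < THETA * n:
                return "inflate"            # rebuild as larger topology Z^{i+1}
            self.load[donors[0]] -= 1       # reassign one spare vertex
            self.load[node] = 1
            return "simple"
        orphaned = self.load.pop(node)      # deletion
        takers = self.low()
        if len(takers) < THETA * n:
            return "deflate"                # rebuild as smaller topology Z^{i-1}
        for i in range(orphaned):           # spread the orphaned vertices
            self.load[takers[i % len(takers)]] += 1
        return "simple"
```

When spare vertices (resp. low-load nodes) are plentiful, an event is absorbed by a cheap simple step; only when the respective set drops below a θ fraction of the network does a full reconstruction fire.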
2.1. Virtual Graphs and Balanced Mappings

As mentioned earlier, our virtual graph consists of virtual vertices simulated by real nodes. Each real node simulates at least one virtual vertex and all its associated edges. See Figure 4 for an example. Formally, this defines a function that we call a virtual mapping:

Definition 2 (Virtual Mapping). Consider a surjective map Φ_{G_t} : V(Z^i) → V(G_t) that maps every virtual vertex of the virtual graph Z^i to some (real) node of the network graph G_t. If G_t is isomorphic to a contraction of Z^i, then we call Φ_{G_t} a virtual mapping.

We say that node u ∈ V(G_t) is a real node that simulates virtual vertices z_1, . . . , z_k if u = Φ(z_1) = · · · = Φ(z_k). We consider the vertices of Z^i to be partitioned into disjoint sets
Figure 4: A 4-balanced virtual mapping of a p-cycle expander to the network graph. On the left is a (virtual) 3-regular 23-cycle expander (cf. Definition 4) on Z_23, and on the right the corresponding network G_t with (real) nodes {A, . . . , G}.

of vertices that we call clouds, and we denote the cloud to which a vertex z belongs by cloud(z). Initially, we can think of a cloud as the set of virtual vertices simulated at some node in G_t. We only consider algorithms that produce virtual mappings where the maximum cloud size is bounded by some constant ζ. Note that requiring a constant cloud size is crucial for maintaining a constant degree. Since the network (and therefore Φ) might have changed from time t − 1 to t, we use the notation Φ_{G_t} to make clear to which function Φ we are referring.

Fact 1. Let dist(u, v) be the length of the shortest path between vertices u and v. Consider a virtual mapping Φ : Z^i → G_t. Then dist(z_1, z_2) ≥ dist(Φ(z_1), Φ(z_2)), for all z_1, z_2 ∈ Z^i.

The following lemma implies that if we are able to maintain the expansion in our virtual graph Z^i, then it is maintained in the real graph that it has a virtual mapping to:

Lemma 3. Let Φ : Z^i → G_t be a virtual mapping. Then it holds that λ_{G_t} ≤ λ_{Z^i}.

Proof. Observe that we obtain a graph that is isomorphic to G_t from Z^i by vertex contraction. That is, we contract vertices z_1 and z_2 if Φ(z_1) = Φ(z_2). The result follows readily from the Contraction Lemma (cf. Lemma 2).

Next we formalize the notion that our real nodes simulate at most a constant number of virtual vertices. Let Sim_{G_t}(u) = Φ⁻¹(u) and define the load of a node u in graph G_t as Load_{G_t}(u) = |Sim_{G_t}(u)|. Note that, due to locality, node u does not necessarily know the mapping of other nodes.
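Fact 1 can be verified mechanically for a concrete mapping. The sketch below builds the 3-regular 23-cycle of Figure 4, contracts it under a simple block mapping onto 6 real nodes (an illustrative stand-in, not the exact mapping drawn in the figure), and checks that distances never grow under Φ; all helper names are ours.

```python
from collections import deque

def p_cycle(p):
    """3-regular p-cycle on Z_p, as in Figure 4 (0, 1 and p-1 self-loop)."""
    return {x: [(x - 1) % p, (x + 1) % p, pow(x, p - 2, p) if x else 0]
            for x in range(p)}

def contract(adj, phi):
    """Network graph induced by a virtual mapping: virtual vertices with
    the same image under phi are identified; self-loops are dropped."""
    g = {u: set() for u in set(phi.values())}
    for z, nbrs in adj.items():
        for y in nbrs:
            if phi[y] != phi[z]:
                g[phi[z]].add(phi[y])
    return g

def bfs_dist(adj, s):
    """Hop distances from s by breadth-first search."""
    dist, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist
```

Since every edge of Z^i maps to an edge of G_t (or to a single node), any path in the virtual graph projects to a walk of at most the same length in the network, which is exactly the inequality of Fact 1.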
Definition 3. Consider a sequence of virtual mappings (Φ_{G_t})_{0≤t≤m}. If there exists a constant C such that in every step t it holds that ∀u ∈ G_t : Load_{G_t}(u) ≤ C, then we call every Φ_{G_t} a C-balanced virtual mapping, or simply say that G_t is C-balanced.

Figure 4 shows a 4-balanced mapping of a 3-regular virtual graph to a network of 7 nodes. We denote the fixed degree of the (regular) virtual graphs by d_Z. The degree of a node u ∈ G_t is exactly Load(u) · d_Z, thus our algorithm strives to maintain a constant bound on Load(u). Given any virtual mapping to some network graph G, we define the (not necessarily disjoint) sets

Low_G = {u ∈ G : Load_G(u) ≤ 2ζ};    (1)
Spare_G = {u ∈ G : Load_G(u) > 1}.    (2)
Intuitively speaking, LowG contains nodes that do not simulate too many virtual vertices, i.e., have low degree, whereas SpareG is the set of nodes that host at least 2 vertices. When
the adversary deletes some node u, we need to find a node in LowG that takes over the load of u. Upon inserting a node v, on the other hand, we need to find a node in SpareG that can spare a virtual vertex for v.
2.2. Description of the Framework and Performance Analysis

In this section we describe our healing framework and prove the performance claims of Theorem 1. For the sake of readability, the full pseudo-code can be found in Appendix A. Assume a specialized module/class, based on a specific deterministic expander construction (cf. Section 3), which provides subroutines to a querying node. In general, the expander constructions follow a discrete sequence of structures corresponding to the number of nodes in the system, i.e., there is a sequence of (virtual) expander topologies, say Z^0, Z^1, and so on (in increasing order of the number of nodes). We call this module Dexpander and assume it provides the subroutines Dexpander.init(G), Dexpander.larger(i), and Dexpander.smaller(i). We describe implementations of these subroutines for the p-cycle expander in Section 3.1 and for the zig-zag and replacement products in Section 3.2. We start with a small initial expander graph G_0 of some appropriate constant size. We then seek to maintain this graph as an expander for any number of nodes above this initial size. Dexpander.init(G_0) returns this initial graph as a virtual graph with the suitable data structures. Algorithm 2.1 shows the high-level pseudo-code of our framework. Informally, it suffices that the virtual graph is maintained as the appropriate deterministic expander (cf. Lemma 3).
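The Dexpander interface can be sketched as follows. Here the series Z^0, Z^1, . . . is instantiated with 3-regular p-cycles on successive primes purely for illustration; the paper's actual series may use differently spaced sizes, and the class and method bodies are our own assumptions.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small sizes."""
    return m > 1 and all(m % q for q in range(2, int(m ** 0.5) + 1))

class PCycleDexpander:
    """Schematic Dexpander module exposing the discrete series of virtual
    topologies Z^0, Z^1, ... (here: p-cycles on successive primes)."""

    def __init__(self, base=23):
        self.sizes = [base]                 # size (a prime p) of Z^0

    def size(self, i):
        """Number of virtual vertices of Z^i, computed lazily."""
        while len(self.sizes) <= i:
            p = self.sizes[-1] + 1
            while not is_prime(p):
                p += 1
            self.sizes.append(p)
        return self.sizes[i]

    def larger(self, i):
        """Size of the topology adopted by an inflation step."""
        return self.size(i + 1)

    def smaller(self, i):
        """Size of the topology adopted by a deflation step."""
        return self.size(max(i - 1, 0))
```

A querying node only ever needs the sizes (and adjacency rules) of the neighboring topologies in the series, which is what `larger` and `smaller` expose.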
Given: reconstruction threshold θ (cf. Equation (3)); graph G_0 of constant size. We assume that the number of nodes does not fall below this initial size.
Initialization:
1: G_1 ← Dexpander.init(G_0) /** Dexpander.init(G_0) can simply compute Z^0 in a centralized manner from G_0 and return **/
Maintenance:
2: while true do
3:     if a vertex u is inserted then
4:         Insertion(u, θ)
5:     else if a vertex u is deleted then
6:         Deletion(u, θ)

Algorithm 2.1: Virtual graphs expander reconstruction framework: high level.
As mentioned, a real node may be responsible for multiple virtual vertices; we always maintain the invariant that each real node simulates at least one and at most a constant number of virtual vertices (cf. Lemma 8). The adversary can either insert or delete a node in every step. In either case, our algorithm reacts by an appropriate redistribution of the virtual vertices to the real nodes, with the goal of maintaining a C-balanced mapping (cf. Definition 3). Depending on the operations employed by the algorithm, we classify a step as being either a simple step, an inflation step or a deflation step. Simple steps are very efficient, as they consist of making only a constant number of changes to the network by using a single random walk of O(log n) length, w.h.p. In an inflation (or deflation) step, on the other hand, the current virtual graph is replaced by a new virtual graph to ensure a C-balanced mapping (i.e., bounded degrees), which might require O(n) work and O(n log² n) message complexity. Nevertheless, there are at least Ω(n) simple steps between any two inflation or deflation steps (cf. Lemma 9), which allows us to amortize their cost and reach nearly optimal performance bounds on average. Figure 3 illustrates the possible responses of our algorithm.
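The amortization argument can be illustrated with a toy cost model (our own simplification: insertions only, a rebuild charged n whenever the network has doubled, every other step charged log₂ n):

```python
import math

def average_step_cost(n0, steps):
    """Toy amortization: a simple step costs ~log2(n) rounds; an inflation
    costs ~n work but fires only after the network grows by a constant
    factor, so the per-step average stays O(log n)."""
    n, total, next_inflation = n0, 0.0, 2 * n0
    for _ in range(steps):
        n += 1
        if n >= next_inflation:
            total += n                    # rebuild everything: O(n)
            next_inflation = 2 * n
        else:
            total += math.log2(n)         # one O(log n) random-walk step
    return total / steps
```

Since the rebuild costs at the doubling points form a geometric sum bounded by twice the final network size, they contribute only O(1) per step, and the O(log n) simple steps dominate the average.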
2.2.1. Simple Healing Steps

When a node u is inserted (cf. Algorithm A.1), one of its neighbors (node v) initiates a random walk by invoking sampleSpareVertices() (cf. Algorithm A.6) to find a "spare" virtual vertex, i.e., a virtual vertex z that is simulated by a node w ∈ Spare_{G_{t−1}}. This spare virtual vertex z is reassigned to the new node. On deletion, the nodes execute a similar protocol (cf. Algorithm A.2), by invoking sampleLowLoad() (cf. Algorithm A.6), except this time with the aim of redistributing the deleted node u's virtual vertices to the remaining real nodes in the system, i.e., we need to find a node w ∈ Low_{G_{t−1}}. Throughout this section, we assume that the parameter θ is a sufficiently small constant such that

θ ≤ 1/(68ζ + 1),    (3)

where ζ ≥ 2 is the maximum cloud size of the Dexpander module. Recall that U_t denotes the graph at the beginning of step t that we get after the adversary has modified G_{t−1} by inserting or deleting a node.

Fact 2. For any node u at any time t the following hold: (a) If u calls inflate(), then |Spare_{U_t}| < θn. (b) If u calls deflate(), then |Low_{U_t}| < θn.

The following Lemma 4 is instrumental in showing an O(log n) time bound for simple steps. In its proof we use the Chernoff bound for random walks on expander graphs [14] and the fact that we maintain a constant spectral gap.

Lemma 4. Suppose that Φ_{G_{t−1}} : Z^i → G_{t−1} is a 4ζ-balanced virtual map. The running time of the procedures sampleSpareVertices() and sampleLowLoad() is O(log n) rounds, and the following properties hold with high probability: (a) Suppose that |Spare_{G_{t−1}}| ≥ θn. If a node u invokes sampleSpareVertices(), it receives a sample of Spare_{G_{t−1}}. (b) Suppose that |Low_{G_{t−1}}| ≥ θn. If a node u invokes procedure sampleLowLoad(), it receives a sample of Low_{G_{t−1}}.

Proof. We will show the result for (a); the result for (b) is analogous. By assumption we have that |Spare| = an ≥ θn, for some constant 0 < a < 1.
We start a random walk of length ℓ log n for some appropriately chosen constant ℓ (cf. (6) below). We need to show that, with high probability, the random walk hits a node in Spare. According to the description of sampleSpareVertices(), we perform the random walk on the graph G′_t, where G′_t does not contain any node w′ that has just been inserted. Let λ = λ_{G′_t}; by Lemma 2, it follows that λ ≤ λ_{G_{t−1}}. Consider the normalized n × n adjacency matrix M of G′_t. It is well known (e.g., Theorem 7.13 in [32]) that the vector π corresponding to the stationary distribution of a random walk on G′_t has entries π(x) = d_x/(2|E(G′_t)|), where d_x is the degree of node x. By assumption, the network G_{t−1} is the image of a 4ζ-balanced virtual map. This means that the maximum degree ∆ of any node in the network satisfies ∆ ≤ 4ζ d_Z, where d_Z is the (regular) degree of the construction provided by the respective Dexpander module. If the adversary deletes some node in step t, the maximum degree of one of its neighbors can increase by at most ∆. Therefore, the maximum degree in U_t, and thus in G′_t, is bounded by 2∆, which gives us the bound

π(x) ≥ d_Z/(2∆n),   (4)
for any node x ∈ G′_t. Let ρ be the actual number of nodes in Spare that the random walk of length ℓ log n hits. We define q to be the n-dimensional vector that is 0 everywhere except at the index of u in M, where it is 1. Let E be the event that ℓ log n · π(Spare) − ρ ≥ γ, for a fixed γ > 0. That is, E occurs if the number of nodes in Spare visited by the random walk is far away (≥ γ) from its expectation. In the remainder of the proof, we show that E occurs with very small probability. Applying the concentration bound of [14] yields that

Pr[E] ≤ (1 + γ(1 − λ)/(10 ℓ log n)) · ‖q/√π‖₂ · e^{−γ²(1−λ)/(20 ℓ log n)},   (5)

where q/√π is the vector with entries (q/√π)(x) = q(x)/√(π(x)), for 1 ≤ x ≤ n. By (4), we know that π(Spare) ≥ a d_Z/(2∆). To guarantee that we find a sample w.h.p. even when π(Spare) is small, we set γ = (a d_Z ℓ/(2∆)) log n. Moreover, (4) also gives us the bound ‖q/√π‖₂ ≤ √(2∆n/d_Z). We define C = (1 + a d_Z/(20∆)) √(2∆/d_Z). Plugging the above bounds into (5) shows that

Pr[E] ≤ C √n · exp(−(a d_Z ℓ/(2∆))² (1 − λ) log n / (20ℓ)),

and therefore Pr[E] ≤ C · n^{1/2 − a² d_Z² ℓ (1−λ)/(80∆²)}. To ensure that event E happens with small probability, it is sufficient if the exponent of n is smaller than −C, which is true whenever

ℓ ≥ 4√(2∆)(d_Z + 20∆)/(θ² d_Z²(1 − λ)) + 120∆²/√(d_Z).   (6)

Since θ, ∆, d_Z, and the spectral gap 1 − λ are all constants, ℓ is O(1) and thus the running time of sampleSpareVertices() is O(log n).

We can show that our algorithm indeed preserves a surjective virtual mapping:

Lemma 5. Every node u ∈ G_t simulates at least one virtual vertex, at any time t.

Proof. We show the lemma by induction on t. For the base case, let t₀ be the step where the adversary inserted u and let v be the node to which the adversary attached u. If sampleSpareVertices() fails to find a sample and the subsequent call of computeSpare() reveals that |Spare| < θn, then u calls inflate() to construct Z^{i+1}, which leaves at least one virtual vertex at u. On the other hand, if computeSpare() shows that |Spare| ≥ θn, then u is guaranteed to find a node that can spare virtual vertices, since it will repeatedly call sampleSpareVertices() until it succeeds (cf. Lemma 4). For the induction step, assume that node u simulates at least one virtual vertex at time t − 1, i.e., Load_{t−1}(u) ≥ 1. We first look at the case where a random walk reaches u that was initiated by some node w due to an insertion. Note that u only agrees to give away a virtual vertex if u ∈ Spare, i.e., Load(u) > 1; thus u always continues to simulate at least one virtual vertex. Furthermore, if t is an inflation or deflation step, or if some node performs a random walk (due to a deletion) via sampleLowLoad(), it follows readily from the description of the algorithm that in either case at least one virtual vertex remains at u.

The following properties of simple steps follow from Lemmas 4 and 5:
Lemma 6. Suppose that t is a simple step and G_{t−1} is 4ζ-balanced. Then the following hold: (a) G_t is 4ζ-balanced; (b) with high probability, step t completes in O(log n) rounds; (c) with high probability, nodes send O(log n) messages in step t; and (d) the number of topology changes in t is constant.

Proof. For (a), we know by Lemma 5 that Φ_{G_t} is surjective, so what remains to be shown is that no node simulates more than 4ζ virtual vertices by the end of step t and that Φ_{G_t} maps its entire domain (i.e., Z^i). If some node u is deleted in t, its neighbor v invokes sampleLowLoad() to find alternative nodes v₁, . . . , v_k to simulate the k virtual vertices of u. If k such nodes are not found, procedure computeLow() is invoked. Since by assumption no node calls deflate() in step t, we have |Low| ≥ θn, which means that sampleLowLoad() will be called repeatedly and is guaranteed to eventually find k nodes in Low to take over u's load; thus no node will exceed a load of 4ζ, and clearly every virtual vertex is still simulated by some node. If t is an insertion step where inflate() is not invoked, it is immediate that the load of no node increases, since the size of the virtual graph does not increase, i.e., the load of every node is still bounded by 4ζ. On an insertion of u, procedure sampleSpareVertices() is called in order to find a node w from which a spare virtual vertex can be split off and transferred to u. Similarly as for deletions, we can assume that |Spare| ≥ θn, which tells us that sampleSpareVertices() must eventually succeed in finding such a node w; this shows that Φ_{G_t} is surjective. Since a virtual vertex is only transferred between different nodes, it follows that Φ still maps its entire domain at the end of t. For (b), suppose that the adversary inserts a node in t. By the description of the algorithm, this causes some node to call sampleSpareVertices(), which by Lemma 4 takes O(log n) rounds.
Moreover, if |Spare_{U_t}| ≥ θn, the random walk of procedure sampleSpareVertices() finds a sample with high probability. Note that the case where no sample is found and |Spare_{U_t}| < θn is ruled out by assumption (cf. Fact 2), since inflate() is not called in step t. For Property (c), it is sufficient to observe that we only perform a single random walk of length O(log n) by token forwarding, which succeeds in finding a spare vertex on the first attempt with high probability. Property (d) is immediate, since transferring exactly one virtual vertex to the newly inserted node affects only a constant number of edges. The case where t is a deletion step can be argued along the same lines.

2.2.2. Inflation/Deflation Steps

It is possible (with small probability) that the random walk initiated by sampleSpareVertices() does not succeed in finding a node with spare load, even if |Spare_{G_{t−1}}| ≥ θn. Node v tests this deterministically by computing the network size and |Spare| via computeSpare() (a simple aggregation algorithm) and initiates another random walk if indeed |Spare_{G_{t−1}}| ≥ θn. In case this test shows that |Spare_{G_{t−1}}| < θn, node v initiates a rebuilding of the expander to the next larger virtual graph by invoking inflate() (cf. Algorithm A.4), which in turn executes Dexpander.larger(i). That is, if the current virtual graph is Z^i, we rebuild to Z^{i+1}. This rebuilding request is forwarded throughout the entire network to ensure that, after this recovery phase, every node uses the new virtual graph Z^{i+1}. In the new expander, each real node simulates a greater number (by some constant factor) of virtual vertices, and node v now finds a spare virtual vertex on the first attempt with high probability, according to Lemma 4.(a). Similarly, for deletions, nodes invoke deflate() (cf. Algorithm A.5) if Low_{G_{t−1}} is small
(i.e., most nodes have a high load), causing the network to be rebuilt according to a smaller virtual expander and thereby reducing the average load in the network.

Lemma 7. Suppose that t is an inflation step and G_{t−1} is 4ζ-balanced. The following properties hold: (a) G_t is 4ζ-balanced. (b) With high probability, step t completes in O(T_larger + log³ n) rounds, where T_larger is the running time of Dexpander.larger(i). (c) With high probability, nodes send O(n log² n) messages in step t. (d) The number of topology changes is O(n). The analogous result holds when t is a deflation step, i.e., we get a running time bound (with high probability) of O(T_smaller + log³ n), where T_smaller is the running time of Dexpander.smaller(i).

Proof. We show the result for procedure inflate(); the proof for deflate() is analogous. For (a), we know by Lemma 5 that every node simulates at least one vertex, thus Φ_{G_t} is surjective. It remains to show that, by the end of step t, every node has a load ≤ 4ζ. Consider any node u that has Load(u) ∈ (2ζ, 4ζ) after invoking Dexpander.larger(i). To see that u's load does not exceed 4ζ, it is sufficient to observe that, according to Line 11 of Algorithm A.4, u marks all its vertices as full and henceforth does not accept any new vertices. Any node v that has a load > 4ζ after calling Dexpander.larger(i) marks itself as contending and initiates random walks for each of its vertices that needs to be redistributed. By Fact 2.(a), fewer than θn nodes have a load > 1 in U_t. Let Balls₀ be the set of vertices that need to be redistributed. Property 1.(a) tells us that every vertex of Z^i is replaced by (at most) ζ new vertices in Z^{i+1}, which means that |Balls₀| ≤ 4θ(ζ² − ζ)n, since every such high-load node continues to simulate 4ζ vertices by itself. To ensure that this redistribution can be done efficiently, we need to lower bound the total number of available places ("bins") for these virtual vertices ("balls").
According to Algorithm A.4, any node that does not mark its vertices as full will accept to simulate additional vertices until its load reaches 2ζ. By Fact 2.(a), we know that at least (1 − θ)n nodes have a load of at most ζ after invoking Dexpander.larger(i). We call the set of all virtual vertices not marked as full Bins; we have |Bins| ≥ (1 − θ)ζn.

We first show that, with high probability, a constant fraction of the random walks end up at vertices that are not marked as full. Since Z^{i+1} is a regular expander, the distribution of the random walk rapidly converges to the uniform distribution (e.g., [33]). In particular, within O(log σ) random steps, for σ = |Z^{i+1}| ∈ Θ(n), the distance (measured in the maximum norm) to the uniform distribution can be bounded by 1/(100σ). That is, the probability for a random walk token to end up at a specific vertex is within [99/(100σ), 101/(100σ)]. Recall that all nodes have computed the same graph Z^{i+1} and thus use the same value σ. Due to Fact 1 we can simulate the random walk on the virtual graph Z^{i+1} on the actual network with constant overhead.

We divide the random walks into phases, where a phase is an interval of rounds containing Ω(log n) random walks. Moreover, a phase is minimal w.r.t. the number of rounds, i.e., it does not contain any smaller phase. We denote the number of balls that still need to be redistributed at the beginning of phase i as Balls_i. While the number of available bins will decrease over time, we know from (3) that |Bins| − |Balls₀| ≥ (9/10)|Bins|; thus, in any phase, we can use the bound

|Bins| ≥ (9/10)(1 − θ)ζn.   (7)
Claim 1. Consider a constant c. If |Balls_i| ≥ c log n, then phase i takes O(log² n) rounds, w.h.p. On the other hand, if |Balls_j| < c log n, then phase j comprises O(log³ n) rounds, w.h.p.

Proof. We show that a phase lasts at most O(log³ n) rounds with high probability. First, suppose that |Balls_i| ≥ c log n. By Lemma 2.2 of [8], we know that even a linear number of parallel walks (each of length Θ(log n)) will complete within O(log² n) rounds w.h.p.; any token that has not completed its walk is considered as a walk ending at a full vertex. Therefore, phase i consists of O(log² n) rounds, since Ω(log n) random walks are performed in parallel. In the case where |Balls_j| < c log n, it is possible that a phase consists of random walks that are mostly performed sequentially by the same nodes (e.g., if |Balls_j| ∈ Θ(1)). Thus we need to add a log n factor to ensure that phase j consists of Ω(log n) walks; applying Lemma 2.2 of [8] gives the required O(log³ n) rounds.

We first argue that after O(log n) phases, we have |Balls_j| < c log n. Thus consider any phase i where |Balls_i| ≥ c log n. We bound the probability p_full that the indicator random variable Y_k is 1, where Y_k = 1 iff the walk associated with the k-th ball (i.e., virtual vertex) ends up at a full vertex. Considering (7), we have

p_full ≤ (101/(100σ))(σ − |Bins|) ≤ (101/100)(1 − 9(1 − θ)ζn/(10σ)) ≤ (101/100) · (3/20),

where the last inequality follows from σ ≤ ζ(1 − θ)n + 4ζ²θn and the fact that (3) implies

1 − 9(1 − θ)ζ/(10((1 − θ)ζ + 4ζ²θ)) < 3/20.

Let Y = Σ_{k ∈ Balls_i} Y_k. Since |Balls_i| = Ω(log n), we can use a Chernoff bound (e.g., [32]) to show that

Pr[Y ≥ (909/1000)|Balls_i|] ≤ 2^{−(909/1000)|Balls_i|},

which applies because (909/1000)|Balls_i| ≥ 6E[Y] and
since all Y_k are independent random variables. That is, with high probability (in n), a constant fraction of the random walks in phase i end up at non-full vertices. We call the balls associated with these walks good balls and denote this set by Good_i.

We now show that, with high probability, a constant fraction of the good balls do not clash, i.e., we are able to successfully redistribute the respective vertices. Let X_k be the indicator random variable that is 1 iff the k-th ball is eliminated. Recall from Algorithm A.4 that this happens if no other (good) random walk ends up at the same bin. We have

Pr[X_k = 1] ≥ (1 − 101/(100|Bins|))^{|Good_i|−1} ≥ e^{−Θ(1)}.

That is, at least a constant fraction of the balls in Good_i are eliminated in expectation. Observe that whether a ball is eliminated can affect the elimination of at most one other ball in the same phase. Thus we can apply the method of bounded differences; by the Azuma–Hoeffding inequality (cf. Theorem 12.6 in [32]), we get a
concentration bound, i.e., with high probability, a constant fraction of the balls are eliminated in every phase. We have therefore shown that after O(log n) phases, we are left with fewer than c log n vertices that need to be redistributed, w.h.p.; let j be the first phase where |Balls_j| < c log n. Note that phase j consists of Ω(log n) random walks. By the same argument as above, we can show that, with high probability, a constant fraction of these walks end up at non-full vertices without conflicting with another walk and are thus eliminated. Since we only need c log n walks to succeed, this ensures that the entire set Balls_j is redistributed w.h.p. in phase j, therefore completing the proof of (a). By Claim 1, each of the first O(log n) phases can last O(log² n) rounds, while only phase j takes O(log³ n) rounds. Altogether, this gives a running time bound of O(log³ n), as required for (b). For Property (c), note that the flooding of the inflation request to all nodes in the network requires O(n) messages. This, however, is dominated by the phases for redistributing the load, each of which requires O(n log n) messages. Since we are done w.h.p. in O(log n) phases, we get a total message complexity of O(n log² n). For (d), observe that the sizes of the virtual expanders Z^i and Z^{i+1} are both in O(n). Due to their constant degrees, at most O(n) edges are affected by replacing the edges of Z^i with those of Z^{i+1}, yielding a total work complexity of O(n) for inflate(). Note that for the expander constructions considered in this paper (cf. Section 3), the bound T_larger is O(log² n), giving an overall polylogarithmic running time for an inflation step. After the construction of the new expander Z^{i+1}, there might remain a small fraction of real nodes whose load exceeds the constant threshold.
In that case, we redistribute the additional load of these nodes among the (much larger) fraction of nodes with spare capacity by performing random walks simultaneously. By leveraging the rapid mixing of random walks in expander graphs, we can nevertheless maintain the polylogarithmic runtime bounds stated in Lemma 7. That is, we can view the redistribution of the additional load as a balls-into-bins scenario and obtain efficient redistribution with high probability by applying the Azuma–Hoeffding bound.

Lemma 6 and Lemma 7 imply the following:

Lemma 8. For all steps t, the network graph G_t is 4ζ-balanced.

Proof. The result follows by induction on t. For the base case, note that according to Algorithm 2.1, we initialize G₀ to be a one-to-one mapping of the expander Z⁰, which obviously guarantees that G₀ is 4ζ-balanced. For the induction step, we perform a case distinction depending on whether t is a simple or an inflation/deflation step and apply the respective result (i.e., Lemma 6 or 7).

Corollary 1. Consider any expander construction of Section 3 and let d_Z be the fixed degree of the corresponding virtual graph. There is a universal constant ∆ = 4ζd_Z such that every node in the network graph G_t has degree at most ∆ at all times t. Moreover, G_t has a spectral expansion of λ_{G_t} ≤ λ_{Z^i}.

Proof. Since G_t is 4ζ-balanced at all times t (cf. Lemma 8), we immediately have the required constant bound on ∆. In conjunction with Lemma 2, the mapping Φ_{G_t} gives us the sought bound on the spectral gap.

2.2.3. Amortized Performance Bounds

We will now show that the expensive inflation/deflation steps occur rather infrequently. This will allow us to amortize the cost of the worst-case bounds derived in Section 2.2.2.
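As a quick sanity check of the amortization argument, the following sketch averages the cost of a window of Ω(n) simple steps plus one rebuild; the constants (`delta`, `c`) and the function name are illustrative assumptions, not values from the paper. The per-step averages stay within constant factors of log n rounds and log² n messages.

```python
import math

def amortized_costs(n, delta=0.25, c=1.0):
    """Average per-step cost over a window of delta*n simple steps followed by
    one inflation/deflation step (illustrative constants only)."""
    window = int(delta * n)
    simple_rounds = c * math.log2(n)           # O(log n) per simple step
    rebuild_rounds = c * math.log2(n) ** 3     # O(log^3 n) for the rebuild
    simple_msgs = c * math.log2(n)             # O(log n) messages per simple step
    rebuild_msgs = c * n * math.log2(n) ** 2   # O(n log^2 n) for the rebuild
    avg_rounds = (window * simple_rounds + rebuild_rounds) / (window + 1)
    avg_msgs = (window * simple_msgs + rebuild_msgs) / (window + 1)
    return avg_rounds, avg_msgs

# The ratios to log n (rounds) and log^2 n (messages) stay bounded as n grows:
for n in (2 ** 10, 2 ** 14, 2 ** 18):
    r, m = amortized_costs(n)
    print(round(r / math.log2(n), 2), round(m / math.log2(n) ** 2, 2))
```

This mirrors Corollary 2: the expensive O(n log² n) rebuild is paid for by the Ω(n) cheap steps that Lemma 9 guarantees must separate any two rebuilds.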
Lemma 9. There exists a constant δ such that the following holds: if procedure inflate() or deflate() is invoked in steps t₁ and t₂, then t₁ and t₂ are separated by at least δ·|G_{t₁}| ∈ Ω(n) steps, where n is the size of G_{t₁}.

For the proof of Lemma 9 we require the following two technical results:

Claim 2. Suppose that t is an inflation step. Then |Low_{G_t}| ≥ (θ + 1/2)n.

Proof of Claim 2. First, consider the set of nodes S = U_t \ Spare_{U_t}, i.e., Load_{U_t}(u) = 1 for all u ∈ S. By Fact 2.(a), we have |S| > (1 − θ)n. Clearly, any such node u ∈ S simulates at most ζ virtual vertices after generating its own vertices for the new virtual graph; hence the only way for u to reach Load_{G_t}(u) > 2ζ is by taking over vertices generated by other nodes. By the description of procedure inflate(), only (a subset of) the nodes in Spare_{U_t} redistribute their load by calling sampleLowLoad(). By Lemma 8, we can assume that G_{t−1} is 4ζ-balanced. Since |Spare_{U_t}| < θn, we have a total of at most (4ζ − 4)θn clouds that need to be redistributed; observe that each such node continues to simulate 4 clouds (i.e., 4ζ virtual vertices) by itself. Since every node in S has at most ζ virtual vertices, we can bound the size of Low_{G_t} by subtracting the redistributed clouds from |S|. For the result to hold we need to show that

θ + 1/2 ≤ 1 − θ − (4ζ − 4)θ,

which immediately follows from Inequality (3).

Claim 3. Suppose that t is a deflation step. Then |Spare_{G_t}| ≥ (θ + 1/(4ζ))n.
Proof of Claim 3. Consider the set S = {u : Load_{U_t}(u) > 2ζ}. Since S = U_t \ Low_{U_t}, Fact 2.(b) tells us that |S| > (1 − θ)n, and therefore we have a total load of at least (1 − θ)(2ζ + 1)n + θn in U_t. By the description of procedure deflate(), every cloud of virtual vertices is contracted to a single virtual vertex. After deflating, we are left with

Load(G_t) ≥ ((1 − θ)(2 + 1/ζ) + θ/ζ)n.

To guarantee the sought bound on Spare_{G_t}, we need to show that Load(G_t) ≥ (1 + θ + 1/(4ζ))n. This is true, since by (3) we have θ ≤ 1/3 + 1/(4ζ). Therefore, by the pigeonhole principle, at least (θ + 1/(4ζ))n nodes have a load of at least 2.
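The two threshold inequalities invoked in Claims 2 and 3 can be verified mechanically. A minimal sketch in exact rational arithmetic, assuming constraint (3) reads θ ≤ 1/(68ζ + 1) (the function name is illustrative):

```python
from fractions import Fraction

def thresholds_hold(zeta):
    """Check, without any floating-point rounding, that the largest theta
    allowed by constraint (3), theta = 1/(68*zeta + 1), satisfies the two
    inequalities used in the proofs of Claims 2 and 3."""
    theta = Fraction(1, 68 * zeta + 1)
    # Claim 2: theta + 1/2 <= 1 - theta - (4*zeta - 4)*theta
    claim2 = theta + Fraction(1, 2) <= 1 - theta - (4 * zeta - 4) * theta
    # Claim 3: theta <= 1/3 + 1/(4*zeta)
    claim3 = theta <= Fraction(1, 3) + Fraction(1, 4 * zeta)
    return claim2 and claim3

# Both inequalities hold for every cloud size zeta >= 2:
assert all(thresholds_hold(z) for z in range(2, 100))
```

Since θ shrinks like 1/ζ while the slack in both inequalities does not, the check passing for small ζ already indicates the asymptotic behavior.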
Proof of Lemma 9. It is easy to see that the values computed by procedures computeSpare() and computeLow() cannot simultaneously satisfy the thresholds of Fact 2, i.e., inflate() and deflate() are never called in the same step. Let t₁, t₂, . . . be the set of steps where, for every i ≥ 1, a node calls either procedure inflate() or procedure deflate() in t_i. Fixing a constant δ ≤ 1/(4ζ), we need to show that t_{i+1} − t_i > δn. We distinguish several cases:

1. inflate() in t_i; inflate() in t_{i+1}: By Fact 2.(a) we know that Spare_{U_{t_i}} contains fewer than θn nodes. Since we inflate in t_i, every node generates a new cloud of virtual vertices, i.e., the load of every node in U_{t_i} is (temporarily) at least ζ (cf. Algorithm 3.1). Moreover, the only way that the load of a node u can be reduced in t_i is by transferring some virtual vertices from u to a newly inserted node w. However, by the description of inflate() and the assumption that ζ ≥ 2, we still have Load_{G_{t_i}}(u) > 1 (and Load_{G_{t_i}}(w) ≥ 1), and therefore Spare_{G_{t_i}} ⊇ V(G_{t_i}) \ {w}. Since the virtual graph (and hence the total load)
remains the same during the interval (t_i, t_{i+1}), it follows by Lemma 8 that Spare can shrink by at most the number of insertions during (t_i, t_{i+1}). Since |Spare_{U_{t_{i+1}}}| < θn, more than (1 − θ)n − 1 ≥ δn insertions are necessary.

2. deflate() in t_i; deflate() in t_{i+1}: We first give a lower bound on the size of Low_{G_{t_i}}. By Lemma 7, we know that the load at every node is at most 4ζ in U_{t_i}. Since every virtual cloud (of size ζ) is contracted to a single virtual vertex in the new virtual graph, the load at every node is reduced to at most 4. Clearly, the vertices that are redistributed do not increase the load of any node beyond 4; thus Low_{G_{t_i}} = V(G_{t_i}). Analogously to Case 1, the virtual graph is not changed until t_{i+1}, and Lemma 8 tells us that Low is only affected by deletions, i.e., (1 − θ)n ≥ δn steps are necessary before step t_{i+1}.

3. inflate() in t_i; deflate() in t_{i+1}: By Claim 2, we have |Low_{G_{t_i}}| ≥ (θ + 1/2)n, while Fact 2.(b) tells us that |Low_{G_{t_{i+1}}}| < θn. Again, Lemma 8 implies that the adversary must delete at least n/2 ≥ δn nodes during (t_i, t_{i+1}].

4. deflate() in t_i; inflate() in t_{i+1}: By Claim 3, we have |Spare_{G_{t_i}}| ≥ (θ + 1/(4ζ))n, and by Fact 2.(a), we know that |Spare_{G_{t_{i+1}}}| < θn. Applying Lemma 8 shows that we must have more than (1/(4ζ))n ≥ δn deletions before t_{i+1}.
Lemmas 6 and 7 show that, with high probability, simple healing steps have a worst-case running time of O(log n) rounds and use O(log n) messages, whereas inflation or deflation steps have a worst-case running time of O(log³ n) rounds and use O(n log² n) messages. By Lemma 9, we immediately get the following amortized complexity bounds:

Corollary 2. With high probability, the amortized running time of any healing step is O(log n) rounds and the amortized message complexity of any healing step is O(log² n), while the amortized work complexity is O(1).

Theorem 1. Consider an adaptive adversary that observes the entire state of the network, including all past random choices, and inserts or removes a single node in every step. There is a distributed algorithm for deterministically maintaining a constant degree expander network that has a constant spectral gap. On average, the algorithm takes O(log n) rounds, where n is the current network size, uses a polylogarithmic number of messages, and requires a constant number of topology changes per healing step, with high probability. Moreover, the worst-case running time is bounded by O(log³ n) rounds, with high probability.

Proof. Corollary 1 gives the constant degree bound. The worst-case bounds follow from Lemmas 6 and 7, whereas Corollary 2 shows the amortized complexity bounds.
3. Distributed Expander Constructions

Our framework works with any expander construction for which there are distributed implementations of procedures Dexpander.larger(i) and Dexpander.smaller(i). We show this for p-cycles (cf. Section 3.1), and for replacement and zig-zag products (Section 3.2). The following properties of the respective Dexpander.larger(i) and Dexpander.smaller(i) procedures need to be satisfied by any expander construction in order to be applicable to our framework.
Property 1 (Properties of Dexpander.larger(i)). If the network graph G_t is a D-balanced image of the current virtual graph Z^i, then procedure Dexpander.larger(i) needs to ensure that every node computes the same virtual graph Z^{i+1} in O(polylog(n)) rounds such that (a) |Z^{i+1}| ∈ (C₁|Z^i|, C₂|Z^i|) for some fixed constants C₁, C₂ > 4, and the resulting network graph is (Dζ)-balanced, where ζ ∈ O(1) is the maximum cloud size⁴; and (b) Z^{i+1} has the same spectral gap as Z^i.

Property 2. If the network graph G_t is a C-balanced map of Z^i, then procedure Dexpander.smaller(i) ensures that every node computes the same virtual graph Z^{i−1} in O(log n) rounds such that (a) |Z^{i−1}| ∈ (|Z^i|/C₂, |Z^i|/C₁), for constants C₁, C₂ > 4, and the resulting network graph is (Cζ)-balanced; and (b) Z^{i−1} maintains the spectral gap of Z^i.
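The contract that Properties 1 and 2 impose on a pluggable construction can be rendered as a small interface sketch. This is a hypothetical Python rendering; only the method names larger/smaller come from the text, and the docstrings paraphrase the two properties.

```python
from abc import ABC, abstractmethod

class Dexpander(ABC):
    """Hypothetical interface for a pluggable deterministic-expander module,
    mirroring the Dexpander.larger(i)/Dexpander.smaller(i) calls in the text."""

    @abstractmethod
    def larger(self, i):
        """Return the next larger virtual graph Z^{i+1}, with
        |Z^{i+1}| in (C1*|Z^i|, C2*|Z^i|) and the same spectral gap (Property 1)."""

    @abstractmethod
    def smaller(self, i):
        """Return the next smaller virtual graph Z^{i-1}, with
        |Z^{i-1}| in (|Z^i|/C2, |Z^i|/C1) and the same spectral gap (Property 2)."""
```

Any concrete construction (p-cycles, replacement/zig-zag products) would subclass this and supply distributed implementations of both methods; the framework itself never depends on which construction is plugged in.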
3.1. p-Cycle with Inverse Chords

Essentially, we can think of a p-cycle as a numbered cycle with some chord edges between numbers that are multiplicative inverses of each other. It was shown in [30] that this yields an infinite family of 3-regular expander graphs with a constant eigenvalue gap. Figure 4 shows a 23-cycle.

Definition 4 (cf. [20]). For every prime number p, we define the following graph G. The vertex set of G is Z_p and there is an edge between vertices x and y if and only if one of the following conditions holds: (1) y = (x + 1) mod p, (2) y = (x − 1) mod p, or (3) x, y > 0 and y = x^{−1} mod p. Moreover, vertex 0 has a self-loop.

We now give a brief overview of how we adapt this construction to our framework. For simplicity, we use x to denote both an integer x ∈ Z_p and the associated vertex in V(Z^i). Note that at any step, all nodes are in agreement on the current virtual graph Z^i, which is larger than the network graph G_t by at most a constant factor. As our analysis shows (cf. Lemma 2.(a)), nodes only invoke Dexpander.larger(i) (cf. Algorithm 3.1) if the size of G_t is close to |Z^i|. For the p-cycle, we choose a prime number p_{i+1} ∈ (4p_i, 8p_i), i.e., V(Z^{i+1}) = Z_{p_{i+1}}. Bertrand's postulate states that for every m > 1 there is a prime between m and 2m, which ensures that p_{i+1} exists. Every node u needs to determine the set of vertices in Z^{i+1} that it is going to simulate: let α = p_{i+1}/p_i ∈ O(1). For every currently simulated vertex x ∈ Sim_{G_{t−1}}(u), node u computes the constant value c(x) = ⌊α(x + 1)⌋ − ⌊αx⌋ − 1 and replaces x with the new virtual vertices y₀, . . . , y_{c(x)}, where y_j = (⌊αx⌋ + j) mod p_{i+1}, for 0 ≤ j ≤ c(x). This ensures that the new virtual vertex set matches exactly Z_{p_{i+1}}.

Next, we describe how we find the edges of Z^{i+1} in Dexpander.larger(i). First, we add new cycle edges (i.e.,
edges between x and (x + 1) mod p_{i+1}), which can be done in constant time by using the cycle edges of the old virtual graph Z^i. Then, for every x that u simulates, we need to add an edge to the node that simulates vertex x^{−1}. Since this needs to be done by the respective simulating node of every virtual vertex, this corresponds to solving a permutation routing instance. Corollary 7.7.3 of [41] states that for any bounded-degree expander with n nodes, n packets, one per processor, can be routed (even online) according to an arbitrary permutation in O(log n (log log n)²/ log log log n) rounds w.h.p. Note that every node in the network knows the exact topology of the current virtual graph, and can hence calculate all routing paths in this graph, which map to paths in the actual network (cf. Fact 1). Since every node simulates a
Note that rebalancing this additional load (i.e. achieving D-balancedness) is the task of our framework (cf. Procedure A.4).
constant number of vertices, we can find the routes to the respective inverses with a constant number of iterations.

1: Initiating node u floods a request to all other nodes to run this procedure simultaneously; this takes O(log n) time.
2: Since every node u knows the same virtual graph Z^i, all nodes locally compute the same prime p_{i+1} ∈ (4p_i, 8p_i) and therefore the same virtual expander Z^{i+1} with vertex set Z_{p_{i+1}}.
3: (Compute the new set of locally simulated virtual vertices.) Let α = p_{i+1}/p_i and define the function

c(x) = ⌊α(x + 1)⌋ − ⌊αx⌋ − 1.   (8)

Replace every x ∈ Sim(u) (i.e., x ∈ Z^i) with a cloud of virtual vertices y₀, . . . , y_{c(x)}, where y_k = (⌊αx⌋ + k) mod p_{i+1}, for 0 ≤ k ≤ c(x). That is, cloud(y₀) = · · · = cloud(y_{c(x)}) = {y₀, . . . , y_{c(x)}}.
4: (Compute the new set of edges.) For every x ∈ Sim(u) and every y_k (0 ≤ k ≤ c(x)):
Cycle edges: Add an edge between u and the nodes v and v′ that simulate y_k − 1 and y_k + 1, by using the cycle edges of Z^i in G_t.
Inverse edges: Add an edge between u and the node v that simulates y_k^{−1}; node v is found by solving a permutation routing instance.

Algorithm 3.1: Procedure Dexpander.larger(i): compute a larger virtual p-cycle

We now prove that the construction procedures Dexpander.larger(i) and Dexpander.smaller(i) satisfy Properties 1 and 2. The pseudo code of these procedures is given in Algorithms 3.1 and 3.2.

Lemma 10. Suppose that Z^i is a virtual expander graph according to Definition 4, i.e., V(Z^i) ≅ Z_{p_i}. If the network graph G_t is a C-balanced image of Z^i, then procedure Dexpander.larger(i) ensures that every node computes the same virtual graph in O(log n (log log n)²) rounds such that (a) p_{i+1} = |Z^{i+1}| ∈ (4p_i, 8p_i) and the resulting network graph is (Cζ)-balanced;⁵ (b) there is a one-to-one correspondence between Z_{p_{i+1}} and V(Z^{i+1}); (c) the edges of Z^{i+1} adhere to Definition 4.

Proof. Property (a) follows readily from the discussion in Section 3.1. For Property (b), we first show set equivalence. Consider any z ∈ Z_{p_{i+1}} and assume towards a contradiction that z ∉ V(Z^{i+1}). Let α = p_{i+1}/p_i and let x be the greatest integer such that z = ⌊αx⌋ + k, for some integer k ≥ 0. By maximality of x, we have that k < ⌈α⌉. Clearly x ≥ p_i, since z ∉ V(Z^{i+1}). This means that z = ⌊αx⌋ + k ≥ ⌊αp_i⌋ + k = p_{i+1} + k ≥ p_{i+1}, which contradicts z ∈ Z_{p_{i+1}}; thus we have shown Z_{p_{i+1}} ⊆ V(Z^{i+1}). The opposite relation, i.e., V(Z^{i+1}) ⊆ Z_{p_{i+1}}, is immediate, since the values associated with vertices of Z^{i+1} are computed modulo p_{i+1}.
To complete the proof of (b), we need to show that no two distinct vertices in V(Z^{i+1}) correspond to the same value in Z_{p_{i+1}}, i.e., V(Z^{i+1}) is not a multi-set. Suppose, for the
Note that balancing this additional load (i.e. achieving C-balancedness) is the task of Procedure inflate() (cf. Algorithm A.4).
sake of a contradiction, that there are y = (⌊αx⌋ + k) mod p_{i+1} and y′ = (⌊αx′⌋ + k′) mod p_{i+1} with y = y′. By the description of Dexpander.larger(i), we know that k ≤ c(x) < ⌈α⌉; thus it cannot be that y′ = ⌊αx⌋ + k + m·p_{i+1}, for some integer m ≥ 1. This means that x ≠ x′; w.l.o.g. assume that x > x′. Bounding k′ by the maximum value attainable by c(x) (cf. (8)) shows that y′ = ⌊αx′⌋ + k′ < ⌊α(x′ + 1)⌋ ≤ ⌊αx⌋ ≤ y, which is a contradiction to y = y′.

For Property (c), observe that all new cycle edges (i.e., of the form (x, x ± 1)) of Z^{i+1} are between nodes that were already simulating neighboring vertices of Z^i; thus every node u can add these edges in constant time. Finally, we argue that every node can efficiently find the inverse vertex for each of its newly simulated vertices: Corollary 7.7.3 of [41] states that for any bounded-degree expander with n nodes, n packets, one per processor, can be routed (online) according to an arbitrary permutation in T = O(log n (log log n)²/ log log log n) rounds w.h.p. Note that every node in the network knows the exact topology of the current virtual graph, and can hence calculate all routing paths, which map to paths in the actual network (cf. Fact 1). Since every node simulates a constant number of vertices, we can find the route to the respective inverse with a constant number of iterations, each of which takes T rounds.

We next look at Dexpander.smaller(i), which is used by procedure deflate() in our framework and effectively reduces the number of virtual vertices by a constant factor, by choosing a prime number from the range (p_i/8, p_i/4) as the size of the new virtual graph Z^s.⁶

Lemma 11. Suppose that Z^i is a virtual expander graph according to Definition 4, i.e., V(Z^i) ≅ Z_{p_i} for some prime number p_i.
If the network graph G_t is a balanced map of Z^i, then procedure Dexpander.smaller(i) ensures that every node computes the same virtual graph Z^s in O(log n (log log n)²) rounds such that (a) p_s = |Z^s| ∈ (p_i/8, p_i/4), where p_s is a prime number; (b) there is a one-to-one correspondence between Z_{p_s} and V(Z^s); (c) the edges of Z^s adhere to Definition 4.

Proof. Property (a) trivially holds. For (b), observe that by the description of Dexpander.smaller(i), we map x ∈ Z^i surjectively to y_x ∈ Z^s using the mapping y_x = ⌊x/α⌋, where α = p_i/p_s. Note that we only add y_x to V(Z^s) if there is no smaller x ∈ Z^i that yields the same value in Z_{p_s}, which guarantees that V(Z^s) is not a multiset. Suppose that there is some y ∈ Z_{p_s} that is not hit by our mapping, i.e., for all x ∈ Z_{p_i} we have y > ⌊x/α⌋. Let x′ be the smallest integer such that y = ⌊x′/α⌋. Clearly x′ exists, since we can always choose x′ such that x′ ≤ α(y + 1) ≤ x′ + α. By assumption we have x′ ≥ p_i, which yields

  ⌊p_i/α⌋ ≤ x′/α < p_s.

Since α = p_i/p_s, we get ⌊p_s⌋ = ⌊p_i/α⌋ < p_s, which is a contradiction since p_s is an integer. Therefore, we have shown that Z_{p_s} ⊆ V(Z^s). To see that V(Z^s) ⊆ Z_{p_s}, suppose that we add a vertex y ≥ p_s to V(Z^s). By the code of Dexpander.smaller(i), this means that there is an x ∈ V(Z^i), i.e., x ≤ p_i − 1, such that y = ⌊x/α⌋. Substituting for α yields a contradiction to y ≥ p_s, since

  y = ⌊x/α⌋ ≤ (p_i − 1)/α = p_s − p_s/p_i < p_s.

For property (c), note that any cycle edge (y, y ± 1) ∈ E(Z^s) is between nodes u and v that were at most α hops apart in G_t, since their distance can be at most α in Z^i. Thus any such edge can be added by exploring a constant-size neighborhood in O(1) rounds via the cycle edges of Z^i in G_t. To add an edge between y and its inverse y^{−1}, we proceed along the lines of the proof of Lemma 10, i.e., we solve permutation routing on Z^i, taking O(log n (log log n)² / log log log n) rounds.

⁶Note that Dexpander.smaller(i) does not guarantee a valid virtual mapping. Instead, we only construct the virtual graph and leave the redistribution of virtual vertices to the framework (i.e., procedure deflate()).

1: Initiating node u floods a request to all other nodes to run this procedure simultaneously; this takes O(log n) time.
2: Since every node u knows the same virtual graph Z^i of size p_i, all nodes locally compute the same prime p_s ∈ (p_i/8, p_i/4) and therefore the same virtual expander Z^s with vertex set Z_{p_s}.
3: (Compute the new set of locally simulated virtual vertices Sim′(u) ⊂ Z^s.) Let α = p_i/p_s. For every x ∈ Sim(u) (i.e., x ∈ Z^i) we compute y_x = ⌊x/α⌋. If there is no x′ < x such that y_{x′} = y_x, we add y_x to Sim′(u). This yields the (possibly empty) set Sim′(u) = {y_{x_1}, …, y_{x_k}}, where x_1, …, x_k ∈ Z^i are a subset of the previously simulated vertices at u. If Sim′(u) = ∅, we mark u as contending. For every vertex y_{x_j}, we set cloud(y_{x_j}) = {m : (m − 1)⌊α⌋ ≤ y_{x_j} < m⌊α⌋}.
4: (Compute the new set of edges.) For every y_{x_j} ∈ Sim′(u) (1 ≤ j ≤ k), do: Cycle edges: add an edge between u and the nodes v and v′ that simulate y_{x_j} − 1 and y_{x_j} + 1 by using the cycle edges of Z^i in G_t. Inverse edges: add an edge between u and the node v that simulates the inverse y_{x_j}^{−1}; node v is found by solving a permutation routing instance.
Algorithm 3.2: Procedure Dexpander.smaller(i): compute a smaller virtual p-cycle
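The mapping from Line 3 of Algorithm 3.2 can be sketched directly. This toy version (the helper name shrink_vertices is ours) keeps, for each image y_x = ⌊x/α⌋, only the smallest preimage x, exactly the rule that prevents V(Z^s) from becoming a multiset.

```python
from math import floor

def shrink_vertices(p_i, p_s):
    """Map each x in Z_{p_i} to y_x = floor(x / alpha) with
    alpha = p_i / p_s, keeping y_x only for the smallest x that
    produces it, as in Line 3 of Dexpander.smaller(i)."""
    alpha = p_i / p_s
    smallest_preimage = {}
    for x in range(p_i):
        y = floor(x / alpha)
        if y not in smallest_preimage:   # later duplicates are dropped
            smallest_preimage[y] = x
    return smallest_preimage

# 67 -> 13: the surviving labels are exactly Z_13 (Lemma 11(b)).
assert sorted(shrink_vertices(67, 13)) == list(range(13))
```

Since x/α grows by 1/α < 1 per step, consecutive floors differ by at most one, which is why the map is onto Z_{p_s}.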
3.2. Graph Products

In this section we show that our framework is not only applicable to algebraic constructions like the p-cycle, but can also be used on top of combinatorial expander constructions that have gained prominence in recent years (cf. [38]). More specifically, we describe how to apply our framework to two graph products, the replacement product and the zig-zag product, introduced by [38]. Intuitively speaking, the replacement product G ⓡ H on graphs G and H replaces every vertex of G with a cloud of H, while keeping the edges of both graphs. Note that the naming of these vertices as clouds is by no means a coincidence, as these sets of vertices naturally correspond to the notion of "cloud" in our framework (cf. Section 2). We follow the notation of [26]. Consider graphs G = ⟨n_G, d_G, λ_G⟩ and H = ⟨d_G, d_H, λ_H⟩. For a vertex g ∈ G, let E_g denote the edges in G that are incident to g. Since the degree of G matches the number of vertices in H, there is a bijection L_g : V(H) → E_g, which we call a labeling at g, and we call L = {L_g | g ∈ V(G)} a labeling from H to G.

Definition 5 (cf. [38]). Let G = ⟨n_G, d_G, λ_G⟩ and H = ⟨d_G, d_H, λ_H⟩, and consider a labeling L from H to G. We define the vertices of the replacement product R = G ⓡ H to be the Cartesian product of V(G) and V(H). For any vertex gh ∈ R, we add an edge to all vertices g′h′ ∈ R that satisfy one of the following conditions:
1. There is an edge (h, h′) ∈ E(H).
2. The label L_g(h) yields an edge e = (g, g′) ∈ E(G) where h′ = L_{g′}^{−1}(e).
We get G ⓡ H = ⟨n_G · d_G, d_H + 1, λⓡ⟩, where λⓡ is a function of λ_G and λ_H.

The zig-zag product G ⓩ H replaces each vertex of G by a copy of the vertices of H, a so-called H-cloud, and then adds an edge for every "walk" of length 3 that has the following form: starting from vertex z_1, we first make a move inside the H-cloud (according to E(H)), then jump across an edge that connects this cloud to another H-cloud, and finally again make a move inside the cloud, ending up at some vertex z_2. Note carefully that these "intermediate edges" along which we traveled from z_1 to z_2 are not part of the zig-zag graph.

Definition 6 (cf. [38] and [26]). Let G = ⟨n_G, d_G, λ_G⟩ and H = ⟨d_G, d_H, λ_H⟩, and consider a labeling L from H to G. The vertices of the zig-zag graph Z = G ⓩ H are given by V(G) × V(H). For every vertex g ∈ G, we get a set of vertices gh_1, …, gh_{d_G} in Z, which we call an H-cloud. Consider a vertex z_1 = g_1h_1 ∈ Z. We add an edge between z_1 and every vertex g_2h_2 that we can reach the following way:
1. (cloud-internal move) Starting at g_1h_1, choose a vertex g_1h′_1 where h′_1 is incident to h_1 in H.
2. (inter-cloud move) Evaluating the labeling L_{g_1}(h′_1) yields an edge e between g_1 and some vertex g_2 ∈ G. Consider vertex h′_2 = L_{g_2}^{−1}(e) and jump to vertex g_2h′_2.
3. (cloud-internal move) Now, jump to a vertex g_2h_2 where h_2 is incident to h′_2 in H.
That is, the zig-zag product takes expanders G and H and yields G ⓩ H = ⟨n_G · d_G, d_H², λⓩ⟩, where λⓩ is a function of λ_G and λ_H.

Theorem 3 (Degree and spectral gap bounds for the replacement and zig-zag products, cf. [38]). Let G = ⟨n_G, d_G, λ_G⟩ and H = ⟨d_G, d_H, λ_H⟩ be expanders. Then G ⓡ H = ⟨n_G · d_G, d_H + 1, λⓡ⟩ and G ⓩ H = ⟨n_G · d_G, d_H², λⓩ⟩, where λⓡ and λⓩ are both constants.

What makes both the replacement product and the zig-zag product suitable for an iterative construction is the fact that combining a somewhat large degree graph G with a small degree graph H yields a graph whose degree only depends on the degree of H. Note that both products somewhat worsen (by a constant) the spectral gap compared to the original graphs. Thus we cannot simply keep applying the replacement product (or zig-zag product), as this would cause the spectral gap to go to 0. To increase the spectral gap, we take the k-th power of the graph (for some even constant k ≥ 2) by adding edges for all walks of length up to k. For example, if G = ⟨n, d, λ⟩, then the square of G is an ⟨n, d², λ²⟩ graph (cf. Fact 1.2 in [38]). Therefore, taking the graph power allows us to compensate for the loss of expansion incurred by the graph product. (For the zig-zag product it is sufficient to simply square the graph before performing the product, while for the replacement product it is necessary to take a larger, but nevertheless constant, power.) Let H be a ⟨(d² + d + 1)², d, λ_H⟩ expander, for some λ_H ≤ 1/5. Starting out with an n_0-node expander network Z^0 = ⟨n_0, d² + d + 1, 1/5⟩, we can iteratively define a family of expanders Z^{i+1} = (Z^i · Z^i) ⓩ H. Using the spectral gap bounds of [38], it can easily be shown that this iteration produces an infinite expander family with λ ≤ 2/5. An analogous expander family construction can be done for the replacement product, but due to the weaker bound on λⓡ we get a significantly higher (constant) degree, owing to the powering of the graph.

3.2.1. Dexpander Procedures

We now discuss how to implement Dexpander.larger(i) and Dexpander.smaller(i) for the graph products; due to their local nature, these have a constant runtime bound.⁷ We first discuss the replacement product. Starting out with an expander Z^i, we take the k-th power of the graph, for some constant k, which can be done locally in constant time by exploring the k-neighborhood and adding the required edges; this guarantees Property 1.(b). Then we simply perform the replacement product with a constant-sized graph of appropriately chosen size, which causes every vertex of Z^i to be replaced by a cloud of vertices in Z^{i+1}; this gives us Property 1.(a), for a sufficiently large constant C. For procedure Dexpander.smaller(i), we keep track of the iteration in which a specific edge was added and whether it was added due to the replacement product being performed or due to graph powering. Assuming that the current virtual graph is Z^i, we simply drop all edges that were added by graph powering in the i-th iteration. Moreover, we simply contract all replacement edges that were added in iteration i, yielding exactly Z^{i−1}. The reason why Dexpander.smaller(i) and Dexpander.larger(i) are very straightforward for the replacement product is that the information about the edges and vertices in Z^i can be reconstructed from Z^{i+1}. This, however, is not the case for the zig-zag product, since edges from the (smaller) virtual graph Z^i cannot be reconstructed from Z^{i+1} by simple edge contraction. In order to use the zig-zag product in this setting, we simply add all edges that are yielded by the replacement product to the zig-zag product. Since the clouds generated by the zig-zag product and the replacement product are the same, the above-mentioned method of keeping track of when a specific edge was added allows us to reconstruct the previous zig-zag iteration Z^{i−1} out of the current graph Z^i. Note that this only adds a constant number of edges in addition to the edges of the zig-zag product. Since the spectral gap bound of [38] for the zig-zag product only holds when considering exclusively zig-zag edges (without the replacement product), we distinguish between the helper edges (i.e., the edges of the replacement product) and the actual communication edges of the zig-zag product. That is, we can still give a spectral gap bound on the overlay graph consisting only of communication edges.
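To make Definitions 5 and 6 concrete, the following sketch (names graph_products, K5, C4 are ours) builds both products for a toy pair G = K_5 (4-regular) and H = C_4 (so |V(H)| = d_G = 4). The labeling L_g(h) is taken as "the edge from g to its h-th neighbor in sorted order", which is one valid choice; the definitions only require some bijection. The sketch checks the degree claims of Theorem 3: degree d_H + 1 = 3 for the replacement product and d_H² = 4 length-3 zig-zag walks per vertex.

```python
def graph_products(G_adj, H_adj):
    """Replacement-product edges and zig-zag walk counts.
    G must be d_G-regular with d_G = |V(H)|; L_g(h) is assumed to be
    the edge from g to its h-th neighbour in sorted order."""
    dG = len(H_adj)
    L = {g: sorted(nbrs) for g, nbrs in G_adj.items()}
    Linv = {g: {v: h for h, v in enumerate(L[g])} for g in G_adj}

    repl_edges = set()     # edges of G (r) H  (Definition 5)
    zig_walks = {}         # length-3 walks per vertex (Definition 6)
    for g in G_adj:
        for h in range(dG):
            for h2 in H_adj[h]:                       # cloud-internal
                repl_edges.add(frozenset({(g, h), (g, h2)}))
            g2 = L[g][h]                              # inter-cloud
            repl_edges.add(frozenset({(g, h), (g2, Linv[g2][g])}))
            walks = 0                                 # zig-zag: H-step,
            for h1p in H_adj[h]:                      # cloud hop, H-step
                gz = L[g][h1p]
                walks += len(H_adj[Linv[gz][g]])
            zig_walks[(g, h)] = walks
    return repl_edges, zig_walks

K5 = {i: [j for j in range(5) if j != i] for i in range(5)}
C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
repl, zig = graph_products(K5, C4)
deg = {(g, h): 0 for g in K5 for h in range(4)}
for e in repl:
    for v in e:
        deg[v] += 1
assert all(d == 3 for d in deg.values())      # d_H + 1 (Theorem 3)
assert all(w == 4 for w in zig.values())      # d_H^2  (Theorem 3)
```

The 30 replacement edges decompose exactly as expected: five C_4 clouds (20 cloud-internal edges) plus one edge per edge of K_5 (10 inter-cloud edges).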
4. Extension: Multiple Nodes Dynamic Model

Our framework can be directly extended to a weaker model that allows the adversary to perform multiple insertions and deletions at each time step, under the following assumptions:

Insertions: The adversary can insert a set N of up to εn nodes in each step, for some small ε > 0. We restrict the adversary to attach only a constant number of nodes in N to any single node; dropping this restriction would allow the adversary to place the whole set N at the same node u, causing significant congestion due to u's constant degree.

Deletions: We only allow the adversary to delete nodes that leave the remainder graph connected, i.e., if the adversary removes a set of nodes N at time t, then G_{t−1} \ N is still connected. Moreover, each deleted node must have at least one neighbor in G_{t−1} \ N.
⁷Note that this bound is subsumed by the running time bounds of Procedures inflate() and deflate() (cf. Lemma 7).
From a performance perspective, the case of allowing O(n) insertions or deletions boils down to invoking the inflate() (resp. deflate()) procedure in every step, which of course gives a much larger bound on running time and message complexity.
5. Lower Bound

We present a lower bound on the number of messages and rounds needed for the maintenance of a constant degree expander. The basic proof strategy is as follows: beginning with an expander of large enough size n, the adversary executes a strategy over Ω(n) steps in which each node she inserts is at distance Ω(log n) from each of the earlier insertions. The Expander Mixing Lemma (cf. Lemma 1) then implies that, for an expander, enough edges must have been added among the nodes of this set to yield the bound.

Theorem 2. Any distributed algorithm that deterministically maintains a (constant degree) expander in our self-healing model against an adaptive adversary needs at least Θ(log n) messages and rounds on average per node insertion.

Proof. We show the lower bound over a sequence of O(n) insertions. Recall that a (constant degree) expander has a diameter of Θ(log n). Let G_t be a (constant degree) expander at timestep t, and let G_0 be an n-node expander. The following adversarial strategy forces any expander construction algorithm in our model to use at least Θ(log n) messages and time: over the next 2cn steps, the adversary adds, if possible, exactly one node with a single edge at each step, in the following fashion. Each inserted node v_t extends a set V_t such that v_t is at distance Ω(log n) from every node in V_{t−1}. To begin the sequence, let V_0 be the empty set and V_1 = {v_1} be the single node inserted to form graph G_1. Node v_2 is inserted at distance Ω(log n) from v_1 to form graph G_2 (note that the diameter of G_1 is Θ(log n), so this insertion is certainly possible); now V_2 = {v_1, v_2}, and so on. Consider the insertion of node v_i. If the adversary is able to add node v_i so that it is at distance Ω(log n) from every node in V_{i−1}, we say that v_i is correctly inserted; otherwise, we say it is incorrectly inserted.
Notice that it is not always possible to insert a node correctly; in particular, for a correct insertion, node v_i can only be connected to a node in G_{i−1} \ V_{i−1}, but over a sequence of O(n) insertions each of these nodes may acquire a short path (of length o(log n)) to a node in V_{i−1}. If node v_i is correctly inserted, we continue with node v_{i+1}. We distinguish the following cases:

1. i < 2cn and node v_i cannot be correctly inserted: Consider the largest subset C of V_i such that each node in C is at distance o(log n) from v_i. Two cases arise:
(a) C is small (of size o(|V_i|)): The adversary continues, trying to insert new nodes so that they are at distance Ω(log n) from V_i \ C.
(b) C is large (of size Θ(|V_i|)): Consider any two nodes in this set; at some point in our sequence of insertions they were Ω(log n) apart, but now they are only o(log n) apart. By assumption, every node only knows an O(1)-neighborhood before the sequence of insertions; therefore, the number of rounds and messages needed to close this distance is at least Θ(log n).

2. i = 2cn; the adversary is able to correctly insert nodes for 2cn steps: Let |V_t| be of size 2kn (k ≤ c). Consider the earlier added kn nodes as a set U and the later added nodes as a set V. Applying the Expander Mixing Lemma (Lemma 1, Section 1.1) with t = 2cn and |U| = |V| = kn, we get that the number of edges between U and V is at least kn(dk/(2c+1) − λ). Note that for λ a little less than d/2 (if k = d), this is a positive quantity and implies that Θ(n) edges exist between U and V. Consider any one such edge, between u (in U) and v (in V). When u and v were added, they were Θ(log n) apart (by the adversary's strategy), but now they are neighbors. Since nodes u and v initially only know an O(1)-neighborhood, Θ(log n) messages and rounds were needed by the self-healing algorithm to add this edge. Thus, over all Θ(n) such edges, the total time and message cost is Θ(n log n), i.e., Θ(log n) messages and rounds on average per insertion. A similar analysis applies to Case 1(b) if C is of size Θ(n); note that this is easily possible for a long enough sequence of insertions.

Since Theorem 2 also holds in the LOCAL model (cf. [36]), where messages can be of arbitrary size, the question arises how our framework performs in the absence of congestion. Replacing all random walks with a BFS-like flooding mechanism immediately yields an optimal worst-case runtime of O(log n) for all healing steps.
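The edge count in Case 2 follows from plugging |U| = |V| = kn and a network of N = (2c+1)n nodes into the Expander Mixing Lemma; a quick numeric sanity check (with illustrative constants of our choosing, not the paper's) looks as follows.

```python
def mixing_lower_bound(n, d, lam, c, k):
    """e(U, V) >= d|U||V|/N - lam*sqrt(|U||V|) with |U| = |V| = k*n
    inside the N = (2c+1)*n-node graph, which simplifies to
    k*n*(d*k/(2*c+1) - lam)."""
    U = V = k * n
    N = (2 * c + 1) * n
    return d * U * V / N - lam * (U * V) ** 0.5

# Illustrative constants: d = 16, lambda = 4, c = k = 1 give a bound
# of n*(16/3 - 4), i.e. Theta(n) edges between U and V.
assert mixing_lower_bound(10**6, 16, 4, 1, 1) > 10**6
```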
6. Conclusion and Future Work

We have presented a distributed framework for efficiently self-healing an expander network. While our running time bounds hold with high probability, our guarantees on the spectral gap and the degree bound hold at all times. We believe that the separation of the construction algorithms from the handling of adversarial changes makes our approach a useful framework for exploring other expander constructions. Many open questions remain: How can we deal with large (i.e., linear) amounts of churn efficiently? Are there self-healing algorithms for maintaining expanders using O(log n) rounds and messages in the worst case? What about self-stabilizing constructions? More generally, can our virtual framework handle other kinds of constructions and distributed computing problems? We believe it can.
References

[1] Noga Alon and Yuval Roichman. Random Cayley graphs and expanders. Random Structures & Algorithms, 5:271–284, 1997.
[2] Noga Alon and Joel Spencer. The Probabilistic Method. John Wiley, 1992.
[3] James Aspnes and Udi Wieder. The expansion and mixing time of skip graphs with applications. Distributed Computing, 21(6):385–393, 2009.
[4] James Aspnes and Yitong Yin. Distributed algorithms for maintaining dynamic expander graphs, 2008.
[5] John Augustine, Gopal Pandurangan, Peter Robinson, and Eli Upfal. Towards robust and efficient computation in dynamic peer-to-peer networks. In SODA, 2012.
[6] Fan Chung. Spectral Graph Theory. American Mathematical Society, 1997.
[7] Colin Cooper, Martin Dyer, and Andrew J. Handley. The flip Markov chain and a randomising P2P protocol. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing, PODC '09, pages 141–150, New York, NY, USA, 2009. ACM.
[8] Atish Das Sarma, Danupon Nanongkai, Gopal Pandurangan, and Prasad Tetali. Efficient distributed random walks with applications. In PODC, 2010.
[9] Shlomi Dolev and Nir Tzachar. Spanders: distributed spanning expanders. In SAC, pages 1309–1314, 2010.
[10] Cynthia Dwork, David Peleg, Nicholas Pippenger, and Eli Upfal. Fault tolerance in networks of bounded degree. SIAM J. Comput., 17(5):975–988, 1988.
[11] Amos Fiat and Jared Saia. Censorship resistant peer-to-peer content addressable networks. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '02, pages 94–103, Philadelphia, PA, USA, 2002. Society for Industrial and Applied Mathematics.
[12] Amos Fiat and Jared Saia. Censorship resistant peer-to-peer networks. Theory of Computing, 3(1):1–23, 2007.
[13] Ofer Gabber and Zvi Galil. Explicit constructions of linear-sized superconcentrators. J. Comput. Syst. Sci., 22(3):407–420, 1981.
[14] David Gillman. A Chernoff bound for random walks on expander graphs. SIAM J. Comput., 27(4):1203–1220, 1998.
[15] C. Gkantsidis, M. Mihail, and A. Saberi. Random walks in peer-to-peer networks: Algorithms and evaluation. Performance Evaluation, 63(3):241–263, 2006.
[16] Thomas P. Hayes, Jared Saia, and Amitabh Trehan. The forgiving graph: a distributed data structure for low stretch under adversarial attack. In PODC '09: Proceedings of the 28th ACM Symposium on Principles of Distributed Computing, pages 121–130, New York, NY, USA, 2009. ACM.
[17] Thomas P. Hayes, Jared Saia, and Amitabh Trehan. The forgiving graph: a distributed data structure for low stretch under adversarial attack. Distributed Computing, pages 1–18, 2012.
[18] Tom Hayes, Navin Rustagi, Jared Saia, and Amitabh Trehan. The forgiving tree: a self-healing distributed data structure. In PODC '08: Proceedings of the Twenty-Seventh ACM Symposium on Principles of Distributed Computing, pages 203–212, New York, NY, USA, 2008. ACM.
[19] Keren Censor-Hillel and Hadas Shachnai. Partial information spreading with application to distributed maximum coverage. In PODC '10: Proceedings of the 29th ACM Symposium on Principles of Distributed Computing, New York, NY, USA, 2010. ACM.
[20] Shlomo Hoory, Nathan Linial, and Avi Wigderson. Expander graphs and their applications. Bulletin of the American Mathematical Society, 43(4):439–562, August 2006.
[21] Riko Jacob, Andrea Richa, Christian Scheideler, Stefan Schmid, and Hanjo Täubig. A distributed polylogarithmic time algorithm for self-stabilizing skip graphs. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing, PODC '09, pages 131–140, New York, NY, USA, 2009. ACM.
[22] Jeong Han Kim and Van H. Vu. Generating random regular graphs. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC '03, pages 213–222, New York, NY, USA, 2003. ACM.
[23] Valerie King and Jared Saia. Breaking the O(n²) bit barrier: Scalable Byzantine agreement with an adaptive adversary. J. ACM, 58:18:1–18:24, July 2011.
[24] Valerie King, Jared Saia, Vishal Sanwalani, and Erik Vee. Towards secure and scalable computation in peer-to-peer networks. In FOCS, pages 87–98, 2006.
[25] Sebastian Kniesburges, Andreas Koutsopoulos, and Christian Scheideler. Re-Chord: a self-stabilizing Chord overlay network. In Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA '11, pages 235–244, New York, NY, USA, 2011. ACM.
[26] Mike Krebs and Anthony Shaheen. Expander Families and Cayley Graphs. Oxford University Press, 2011.
[27] Fabian Kuhn, Stefan Schmid, and Roger Wattenhofer. A self-repairing peer-to-peer system resilient to dynamic adversarial churn. In 4th International Workshop on Peer-to-Peer Systems (IPTPS), Cornell University, Ithaca, New York, USA, Springer LNCS 3640, February 2005.
[28] C. Law and K.-Y. Siu. Distributed construction of random expander networks. In INFOCOM 2003: Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies, volume 3, pages 2133–2143, March–April 2003.
[29] David Liben-Nowell, Hari Balakrishnan, and David R. Karger. Analysis of the evolution of peer-to-peer systems. In PODC, pages 233–242, 2002.
[30] Alexander Lubotzky. Discrete Groups, Expanding Graphs and Invariant Measures, volume 125 of Progress in Mathematics. Birkhäuser, 1994.
[31] G. A. Margulis. Explicit construction of concentrators. Problemy Peredachi Informatsii, 9(4):71–80, 1973. English translation in Problems of Information Transmission, Plenum, New York (1975).
[32] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
[33] Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[34] Gopal Pandurangan, Prabhakar Raghavan, and Eli Upfal. Building low-diameter P2P networks. In FOCS, pages 492–499, 2001.
[35] Gopal Pandurangan and Amitabh Trehan. Xheal: localized self-healing using expanders. In Proceedings of the 30th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, PODC '11, pages 301–310, New York, NY, USA, 2011. ACM.
[36] David Peleg. Distributed Computing: A Locality Sensitive Approach. SIAM, 2000.
[37] Nicholas Pippenger and Geng Lin. Fault-tolerant circuit-switching networks. SIAM J. Discrete Math., 7(1):108–118, 1994.
[38] Omer Reingold, Salil Vadhan, and Avi Wigderson. Entropy waves, the zig-zag graph product, and new constant-degree expanders and extractors. Annals of Mathematics, pages 157–187, 2000.
[39] M. K. Reiter, A. Samar, and C. Wang. Distributed construction of a fault-tolerant network from a tree. In 24th IEEE Symposium on Reliable Distributed Systems (SRDS 2005), pages 155–165, October 2005.
[40] Jared Saia and Amitabh Trehan. Picking up the pieces: Self-healing in reconfigurable networks. In IPDPS: 22nd IEEE International Symposium on Parallel and Distributed Processing, pages 1–12. IEEE, April 2008.
[41] Christian Scheideler. Universal Routing Strategies for Interconnection Networks, volume 1390 of Lecture Notes in Computer Science. Springer, 1998.
[42] Angelika Steger and Nicholas C. Wormald. Generating random regular graphs quickly. Combinatorics, Probability & Computing, 8(4):377–396, 1999.
[43] Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan. Chord: a scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw., 11(1):17–32, 2003.
[44] Amitabh Trehan. Self-healing using virtual structures. CoRR, abs/1202.2466, 2012.
[45] Eli Upfal. Tolerating a linear number of faults in networks of bounded degree. Inf. Comput., 115(2):312–320, 1994.
Appendix A. Pseudo Code of Section 2.2
Assumption: the adversary attaches inserted node u to node v.
1: Node u calls sampleSpareVertices() to find one node w ∈ Spare (cf. Equation (2)) that can spare a virtual vertex.
2: if sampleSpareVertices() does not find a node in Spare then
3:   Determine the current network size n and |Spare| via computeSpare().
4:   if |Spare| < θn then
5:     Invoke inflate(). (u informs its neighbors and initiates rebuilding of the graph according to a larger expander, i.e., if the current virtual graph is Z^i, we rebuild to Z^{i+1}. This rebuilding request is forwarded throughout the entire network.)
6:   else (Enough nodes with spare virtual vertices are present.) Repeat from Line 1.
7: else /** found node w ∈ Spare **/
8:   Transfer a virtual vertex and all its edges (according to the virtual graph) from w to u.
9:   Remove the edge between u and v unless it is required by Z^i.

Algorithm A.1: insertion(u, θ)
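The control flow of Algorithm A.1 condenses into a few lines; here net is a hypothetical stand-in object for the paper's subroutines (its interface is ours, chosen only to mirror the pseudocode).

```python
def insertion(u, v, net, theta):
    """Sketch of Algorithm A.1: sample for a spare vertex; if the
    sample fails, aggregate |Spare| and either inflate or retry."""
    while True:
        w = net.sample_spare_vertices()          # random-walk sample
        if w is not None:                        # found w in Spare
            net.transfer_virtual_vertex(w, u)    # vertex + edges to u
            net.drop_edge_unless_required(u, v)
            return "transferred"
        n, spare = net.compute_spare()           # network aggregation
        if spare < theta * n:
            net.inflate()                        # rebuild to Z^{i+1}
            return "inflated"
        # otherwise enough spare vertices exist somewhere: retry

class MockNet:
    """Hypothetical mock of the network interface (not in the paper)."""
    def __init__(self, samples):
        self.samples = list(samples)
        self.actions = []
    def sample_spare_vertices(self):
        return self.samples.pop(0)
    def transfer_virtual_vertex(self, w, u):
        self.actions.append(("transfer", w, u))
    def drop_edge_unless_required(self, u, v):
        self.actions.append(("drop", u, v))
    def compute_spare(self):
        return 100, 50           # pretend n = 100, |Spare| = 50
    def inflate(self):
        self.actions.append("inflate")

# First sample fails, |Spare| is still plentiful, second sample succeeds.
assert insertion("u", "v", MockNet([None, "w"]), theta=0.1) == "transferred"
```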
Assumption: the adversary deletes node u.
1: Given: u had k virtual vertices. (We prove that k ∈ O(1).)
2: One of node u's neighbors v attaches all edges of u to itself (node v offloads u's virtual vertices); v invokes sampleLowLoad() in parallel k times to find k nodes in Low (i.e., with spare capacity, cf. Equation (1)) and distributes u's virtual vertices (and edges) among them.
3: if the invocations of sampleLowLoad() do not find k nodes in Low then
4:   Determine the network size n and |Low| via computeLow().
5:   if |Low| < θn then
6:     Invoke deflate(). (u informs its neighbors and initiates rebuilding of the graph to a smaller expander. This rebuilding request is forwarded throughout the entire network.)
7:   else /** Enough nodes with low load present. **/
8:     Repeat from Line 2.
9: else /** Found nodes w_1, …, w_k ∈ Low. **/
10:   Distribute the virtual vertices of u and their respective edges (according to the virtual graph) from v to w_1, …, w_k.

Algorithm A.2: Procedure deletion(u, θ)
Given: diam is the diameter of Z^t (i.e., diam ∈ O(log n)).
1: Node u broadcasts an aggregation request to all its neighbors. In addition to the network size, this request indicates whether to compute |Low| or |Spare|. That is, the request of u traverses the network in a BFS-like manner and then returns the aggregated values to u.
2: If a node w receives this request from some neighbor, it computes the aggregated maximum value, according to whether w ∈ Spare for computeSpare() (resp. w ∈ Low for computeLow()).
3: If node w has received the request for the first time, w forwards it to all neighbors (except v).
4: Once the entire network has been explored this way, i.e., the request has been forwarded for diam rounds, the aggregated maximum values of the network size and |Low| (resp. |Spare|) are sent back to u, which receives them after ≤ 2·diam rounds.

Algorithm A.3: Procedures computeSpare() and computeLow()
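A centralized stand-in for Algorithm A.3 (the helper name compute_spare is ours): a BFS from the initiator plays the role of the broadcast phase, and running sums play the role of the values converge-cast back to the initiator, which in the distributed setting arrive after at most 2·diam rounds.

```python
from collections import deque

def compute_spare(adj, root, in_spare):
    """Centralised sketch of computeSpare(): BFS emulates the
    broadcast; the running sums are the aggregates returned to root."""
    seen, q = {root}, deque([root])
    n = spare = 0
    while q:
        w = q.popleft()
        n += 1                            # every node counts itself
        spare += 1 if in_spare(w) else 0  # ... and its Spare status
        for x in adj[w]:
            if x not in seen:
                seen.add(x)
                q.append(x)
    return n, spare

# A 6-cycle where the even-numbered nodes have spare virtual vertices.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
assert compute_spare(ring, 0, lambda w: w % 2 == 0) == (6, 3)
```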
Given: current network size n (as computed by computeSpare()). All virtual vertices and all nodes are unmarked.
1. Compute larger expander:
1: Inserted node u forwards an inflation request through the entire network.
2: Every node w locally computes the same virtual graph: Z^{i+1} ← Dexpander.larger(i). This replaces every vertex in Z^i with a cloud of vertices and updates the edges of U_t accordingly.
3: After the construction of Z^{i+1} is complete, we transfer a (newly generated) virtual vertex to the inserted node u from its neighbor v.
2. Perform load balancing:
4: if a node w has Load(w) > 2ζ (i.e., w ∉ Low) then
5:   Node w marks all vertices in Sim(w) as full.
6: if a node v has load k′ > 4ζ then
7:   (Distribute all except 4ζ vertices to other nodes.)
8:   for each of the k′ − 4ζ vertices that need redistributing do
9:     Node v marks itself as contending.
10:    while v is contending do
11:      Every contending node v performs a random walk of length T = Θ(log n) on the virtual graph Z^{i+1} by forwarding a token τ_v. This walk is simulated on the actual network U_t (with constant overhead). To account for congestion, we give this walk ρ = O(log² n) rounds to complete; once a token has taken T steps, it remains at its current node. If, after ρ rounds, τ_v has reached a virtual vertex z (simulated at some node w), no other token is currently at z, and z is not marked as full, then v marks itself as non-contending and transfers a virtual vertex to w. Moreover, if the new load of w is > 2ζ, we mark all vertices at w as full.

Algorithm A.4: Procedure inflate()
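The load-balancing loop of inflate() (Lines 6–11) can be simulated sequentially. This sketch (the name balance_by_walks is ours) fixes a Θ(log n) walk length and treats "full" as load > 2ζ, as in the pseudocode, but runs the contending walks one after another instead of in parallel and ignores token collisions.

```python
import random

def balance_by_walks(virtual_adj, load, zeta, rng):
    """Nodes with load > 4*zeta repeatedly forward a token along a
    random walk of Theta(log n) steps and hand one virtual vertex to
    the endpoint unless it is already full (load > 2*zeta)."""
    T = max(1, 2 * len(virtual_adj).bit_length())   # Theta(log n)
    for v in list(load):
        while load[v] > 4 * zeta:
            z = v
            for _ in range(T):                      # forward token tau_v
                z = rng.choice(virtual_adj[z])
            if z != v and load.get(z, 0) <= 2 * zeta:
                load[z] = load.get(z, 0) + 1        # transfer one vertex
                load[v] -= 1
            # else: endpoint full (or self); stay contending and retry
    return load

# 8 virtual vertices on a complete graph; node 0 starts overloaded.
K8 = {i: [j for j in range(8) if j != i] for i in range(8)}
load = {i: (10 if i == 0 else 1) for i in range(8)}
balance_by_walks(K8, load, zeta=1, rng=random.Random(42))
assert load[0] == 4 and sum(load.values()) == 17    # load conserved
```

Each successful walk moves exactly one vertex, so the total load is conserved and the overloaded node ends at exactly 4ζ vertices.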
Given: current network size n (as computed by computeLow()). All virtual vertices and all nodes are unmarked.
1. Compute smaller expander:
1: Node u forwards a deflation request through the entire network.
2: Every node v locally computes the same virtual graph: Z^s ← Dexpander.smaller(i).
2. Perform load balancing:
3: if Sim(v) = ∅ then
4:   Node v marks itself as contending.
5: else
6:   Node v reserves one vertex z ∈ Sim(v) for itself by marking z as taken.
7: while v is contending do
8:   Every contending node v performs a random walk of length T = Θ(log n) on the virtual graph Z^{i+1} by forwarding a token τ_v. This walk is simulated on the actual network U_t (with constant overhead). To account for congestion, we give this walk ρ = O(log² n) rounds to complete; once a token has taken T random steps, it remains at its current node.
9:   If, after ρ rounds, τ_v has reached a virtual vertex z (simulated at some node w), no other token is currently at z, and z is not marked as taken, then v marks itself as non-contending and requests z to be transferred from w to v, where it is marked as taken.

Algorithm A.5: Procedure deflate()
Given: Let ℓ be a constant such that Inequality (6) holds.
1: A neighbor v of the affected node u initiates a random walk of length ℓ (cf. Equation (6)) on the current network by generating a token τ. We exclude the newly inserted node u from the walk.
2: The token τ contains the id of the source node u and the traveled distance.
3: When node w receives the random walk token, it appends its own id to the token if w ∈ Spare in the case of procedure sampleSpareVertices() (resp. w ∈ Low for sampleLowLoad()), and forwards the token to a randomly chosen neighbor.
4: If the token has traveled a distance of ℓ hops without visiting any node in the sample set, then node v is notified that no sample has been found.

Algorithm A.6: Procedures sampleSpareVertices() and sampleLowLoad()
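The sampling walk of Algorithm A.6, reduced to its core (a simplification of ours: the token returns the first node satisfying the membership predicate rather than accumulating all matching ids as in the pseudocode):

```python
import random

def sample_via_walk(adj, start, length, predicate, rng):
    """Forward a token for up to `length` random hops; report the
    first visited node in the sample set (Spare or Low), else None."""
    w = start
    for _ in range(length):
        w = rng.choice(adj[w])
        if predicate(w):
            return w
    return None                  # no sample found after l hops

ring5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
rng = random.Random(0)
# With an always-true predicate the very first hop is returned,
# which must be a neighbour of the start node.
assert sample_via_walk(ring5, 0, 10, lambda w: True, rng) in (1, 4)
# An always-false predicate reports that no sample was found.
assert sample_via_walk(ring5, 0, 10, lambda w: False, rng) is None
```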