arXiv:1409.0034v1 [cs.NI] 29 Aug 2014
Exploring the Limits of Static Failover Routing

Marco Chiesa (HUJI), Ilya Nikolaevkiy (HIIT), Aurojit Panda (UC Berkeley), Andrei Gurtov (HIIT), Michael Schapira (HUJI), Scott Shenker (ICSI/UC Berkeley)
ABSTRACT

Fast Reroute (FRR) and other forms of immediate failover have long been used to recover from certain classes of failures without invoking the network control plane. While the set of such techniques is growing, the limits of even the simplest such approach, which we call static failover routing, have not been adequately explored. We present both positive and negative results on this natural form of routing resilience, and end with an open conjecture.

1. INTRODUCTION

The most naive form of routing involves defining a control-plane protocol to compute a set of destination-based routing tables (i.e., tables that map the destination address of a packet to an output port) that route packets along the shortest path to the intended destination. Whenever a link or node fails, these tables are recomputed by invoking the routing protocol to run again (or having it run periodically, independent of failures). This produces well-formed routing tables, but results in relatively long outages after failures (as long as hundreds of milliseconds) while the protocol recomputes routes. As critical applications began to rely on the Internet, such outages became unacceptable. As a result, “fast failover” techniques have long been employed in wide-area networks to recover immediately from failures.¹ The most well-known of these is Fast Reroute (FRR) in MPLS where, upon a link failure, packets are sent along a precomputed alternate path without waiting for the control plane to recompute routes. FRR and other similar forms of “link protection” thus enable rapid response to failures but are limited to the set of precomputed alternate paths, which is typically small.

We consider an even simpler form of failover routing: each incoming port on a router has a destination-based routing table that maps the destination address to an ordered list of output ports. We assume that the router can detect when links are down, and always uses the first functional port on this ordered list of possible output ports. We call this static failover routing (SFR) and the set of ordered lists the failover routing tables in the network. SFR is among the simplest forms of failure protection we can think of: it involves static, deterministic, precomputed per-port routing tables, no additional fields in the packet header, no tunnels, and no ability to detect the state of the network other than directly adjacent links, while still enabling fast response to failures. Per-port routing tables are necessary, otherwise robustness against even a single link failure cannot be guaranteed [17]. We only consider link failures, not router failures (which are not always detectable by neighboring routers, so such fast failover techniques may not apply).

The question is, how resilient can SFR be? That is, how many link failures can failover routing tables tolerate before connectivity is interrupted (i.e., packets are trapped in a forwarding loop, or hit a dead end)? The answer depends on the structural properties of the graph (since no failover mechanism helps in a disconnected graph). The main property we use to characterize the graph is the min-cut or connectivity of a graph (i.e., the minimum number of links that must be removed in order to physically disconnect any two routers). In what follows, we present the following positive and negative results, and end with an open conjecture:
Positive results

• Our first positive result gives some hope that failover tables can move beyond the ability to protect against single failures: For any three-connected graph, one can find failover routing tables that are robust to any two failures.

• Our second positive result shows that failover tables can be extremely effective in specialized graphs: For a variety of specialized k-connected graphs (including Clos, chordal, grid, hypercube), one can find failover routing tables that are robust to any k − 1 failures.

Negative results

• Our first negative result says that there are limits to what failover tables can accomplish: Even if a subset of the vertices are k-connected to a destination (i.e., for each of these vertices, there exist k link-disjoint paths to the destination), it is not always possible to find failover routing tables that retain this connectivity against k − 1 failures.

• Our second negative result indicates that simplified forms of failover tables are not sufficiently powerful: There are some k-connected graphs where simplified forms of failover routing tables adopted in previous work (e.g., “circular ordering routing”) are not robust against any k − 1 failures.

• Our third negative result says that failover routing tables cannot always be robust against failures that do not disconnect the graph: Given a two-connected graph, it is not always possible to find failover routing tables that are robust to any 2 failures that do not disconnect the graph.

Conjecture

Motivated by the possibility that one can protect against k − 1 failures in some k-connected graphs, we make the conjecture (which, despite much effort, we have not been able to prove or disprove) that:

• For any k-connected graph, one can find failover routing tables that are robust to any k − 1 failures.

The rest of this paper is devoted to the precise formulation of these results. However, before turning to this more detailed presentation, we first note that there are many other forms of resilient routing. One can use extra bits in the packet header, or tunnels, or dynamic routing tables (which are updated based on packet arrivals, not just control plane recomputations). And there are other, more exotic, approaches such as Failure-Carrying Packets [18] where a list of failures is kept in the packet header. Each of these has its strengths (typically the degree of resilience) and weaknesses (typically requiring additional state or hard-to-deploy changes), but our goal here is not to render a comparative judgement on these approaches. Instead, we are merely asking how resilient we can make our natural and extremely simple failover approach. This is such a basic question that we were surprised that it had not yet been fully addressed, but even our own investigation leaves the most fundamental conjecture unresolved.

¹ By “immediately”, we mean that there is no control plane delay, but the fast failover can only happen after (a) the failure is detected and (b) the router can update its routing table to use the backup route. These delays depend on the technology used for failure detection and table management, so the resulting delays can vary substantially, but typically are much less than the time it takes for the control plane to reconverge.

Figure 1: (a) No circular routing tables can guarantee 2-resiliency. (b) Edge transformation.

2. FORMAL MODEL

We model the network as a graph G, where the vertices V(G) represent routers and the edges E(G) represent links between routers. Since our focus is on per-destination routing tables, we assume that there exists a unique destination vertex d to which every other vertex wishes to send packets. Each vertex v routes data packets to d according to a routing table f^v that maps the incoming edge e ∈ E(G) ∪ {σ}, where σ denotes a packet that is originated at v, to a sequence of incident edges <e_1, ..., e_n> at v. A packet that is received through e is forwarded to the first active (non-failed) edge of f^v(e). As an example, consider Fig. 1(a). f^o(σ) = <y, z, x> means that packets originated at o are forwarded to y if (o, y) did not fail, to z if (o, y) failed and (o, z) did not, and to x otherwise.

Given a graph G, we say that a set of routing tables R is c-resilient if any packet forwarded according to R reaches its destination d as long as at most c edges fail without disrupting physical connectivity between a sender of a packet and d. A set of ∞-resilient routing tables delivers a packet as long as there is physical connectivity to d. Similarly, we say that a set of routing tables is vertex-connectivity-resilient if any packet originated at a vertex v that is k-connected to d is guaranteed to reach d as long as at most k − 1 edges fail without disrupting physical connectivity between v and d. The input of the RESILIENCY problem is a graph G, a destination vertex d, and a constant c > 0, and the goal is to compute routing tables that are c-resilient.
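The forwarding rule of the formal model (each vertex maps the incoming edge, or σ for locally originated packets, to an ordered list of incident edges and uses the first active one) can be illustrated with a minimal Python sketch. The three-vertex example graph, the table contents, the function name `forward`, and the choice to identify an edge by the neighbor at its other end are all assumptions made for illustration; they are not taken from the paper.

```python
def forward(tables, failed, src, dst):
    """Follow static failover tables from src; return the path, or None on a dead end or loop."""
    node, incoming = src, None          # None plays the role of sigma (locally originated)
    path, seen = [src], set()
    while node != dst:
        if (node, incoming) in seen:    # same vertex reached via the same incoming edge: loop
            return None
        seen.add((node, incoming))
        ports = tables[node].get(incoming, [])
        # Use the first port on the ordered list whose link has not failed.
        nxt = next((n for n in ports if frozenset((node, n)) not in failed), None)
        if nxt is None:                 # every listed port is down: dead end
            return None
        node, incoming = nxt, node
        path.append(node)
    return path


# Hypothetical tables for a triangle {a, b, d} with destination d.
TABLES = {
    "a": {None: ["d", "b"], "b": ["d"]},
    "b": {None: ["d", "a"], "a": ["d"]},
}

no_failure = forward(TABLES, set(), "a", "d")
one_failure = forward(TABLES, {frozenset(("a", "d"))}, "a", "d")
two_failures = forward(TABLES, {frozenset(("a", "d")), frozenset(("b", "d"))}, "a", "d")
```

In this toy instance the tables survive any single link failure; with both links into d down the graph is disconnected, so no table could deliver the packet.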
3. POSITIVE RESULTS

3.1 3-connected graphs are 2-resilient

In this section we show how to compute routing tables that are robust to any 2 edge failures. The same technique can be used to achieve 1-resiliency for arbitrary graphs, as proved in [10]. The main challenge that arises when multiple edges fail, but routers are only aware of the failure of incident edges, is that naive attempts to find an alternate path may easily lead to forwarding loops. For this reason, in [8] and [11], two additional bits are added to the header of each packet to record failure information. We show that this overhead is not necessary.

THEOREM 1. For any 3-connected graph G there exists a set of 2-resilient routing tables.

Our routing technique is based on computing a set of three arc-disjoint directed spanning trees, i.e., such that no pair of trees has a directed edge in common (see [8]). The existence of such trees in any 3-connected graph was proven in [5]. Intuitively, forwarding works as follows. A packet is greedily routed along one of these spanning trees until hitting a failed edge. Then, the packet is rerouted to a different spanning tree. However, since each edge is shared by two different spanning trees, if the next spanning tree is not carefully chosen, it is possible that after two edge failures the packet is rerouted to the initial spanning tree and enters a forwarding loop. We show how to make the right decision at each reroute step, thus constructing 2-resilient routing tables without relying on any additional bits in the packet header.

We describe two different techniques that leverage these three arc-disjoint spanning trees in order to guarantee the reachability of a destination d even if two edges fail. Each technique comes with different performance guarantees. We consider two different measures: (i) the length of the paths in the absence of failures and (ii) the worst-case number of times a packet can be rerouted before reaching d. Our first technique allows each vertex to arbitrarily pick the first spanning tree to be used. Our second technique, in contrast, restricts all vertices that originate a packet to use the same spanning tree in the absence of failures. Hence, with respect to measure (i), the first technique guarantees that, with no failed edges, the path from a vertex to d is no longer than the path used in the second technique. This advantage comes at the expense of measure (ii): with the second technique, a packet is guaranteed to be rerouted at most twice, whereas with the first technique, a packet may be rerouted four times before reaching d.

We introduce some useful terminology. We refer to the three arc-disjoint spanning trees as the Red, Blue, and Green trees. Consider a vertex x of the graph. We say that an edge (x, y) is an outgoing (incoming) edge for x with color c if the spanning tree with color c contains a directed edge from x to y (from y to x). We say that an incoming edge from y to x has no color if it is not contained in any spanning tree.

Technique 1. We first choose an arbitrary ordering <c_0, c_1, c_2> of the spanning trees (e.g., <Blue, Red, Green>). Each vertex v performs the following actions (each subscript is modulo 3).

1. If v originates p, it forwards it on an arbitrary spanning tree.
2. If v receives p along an edge colored c_i, with i = 0, 1, 2, and the outgoing edge of the same color is active, then it forwards p along that edge.
3. If v receives p along an edge colored c_i, with i = 0, 1, 2, and the outgoing edge of the same color is not active, then it forwards p along the outgoing edge with color c_{i+1}. If that edge is also not active, it forwards p to the outgoing edge with color c_{i+2}.

THEOREM 2. Technique 1 constructs 2-resilient routing tables for 3-connected graphs.

PROOF. We now show that this routing scheme is 2-resilient. W.l.o.g., assume that a packet p is first routed along the spanning tree colored c_0 = Blue (see Fig. 2). Either p reaches d or it hits, at a vertex x, a failed link (x, y). In the second case, p is rerouted along a path colored c_1 = Red. Observe that either p reaches d or it hits, at a vertex w, a failed link (w, z). In the latter case, observe that (w, z) ≠ (y, x), otherwise we have a loop colored Red that contains (y, x). Observe that possibly w = y. Hence, the only two failed links are (x, y) and (w, z). Vertex w reroutes p along a path colored Green. Now, either p reaches d or it hits, at a vertex u ∈ {x, y, w, z}, a failed link. In the latter case, observe that u ≠ x, since (x, y) is colored Blue from x to y, and if w ≠ y, then u ≠ w, since (w, z) is colored Red from w to z. Moreover, u ≠ z, otherwise we have a loop colored Green that contains (z, w). Hence, u = y, which implies y ≠ w, and u reroutes p along Blue. Now, observe that either p reaches d or it hits at z the failed link (w, z). In the latter case, z reroutes p on Red. Suppose, by contradiction, that p does not reach d. This means that it hits a failed link colored Red. However, the only outgoing link colored Red is (w, z) from w to z, which implies that there exists a loop colored Red that contains (w, z), a contradiction.

Figure 2: Proof of Technique 2 of Theorem 1. Dashed colored lines represent paths in the graph.
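Steps 2 and 3 of Technique 1 amount to a cyclic search over the color ordering, starting from the color of the incoming edge. The following Python sketch shows only that local decision; the function name `next_color`, the fixed ordering, and the representation of a vertex's state as the set of colors whose outgoing edge is active are assumptions for illustration, and step 1 (the arbitrary initial choice) is left out.

```python
# Hypothetical ordering <c_0, c_1, c_2>; any permutation of the three colors would do.
ORDER = ("Blue", "Red", "Green")


def next_color(incoming_color, active_colors):
    """Return the tree color on which to forward a packet received on incoming_color.

    active_colors is the set of colors whose outgoing edge at this vertex is active.
    Tries c_i first (step 2), then c_{i+1}, then c_{i+2} (step 3), indices modulo 3.
    """
    i = ORDER.index(incoming_color)
    for shift in range(3):
        candidate = ORDER[(i + shift) % 3]
        if candidate in active_colors:
            return candidate
    return None  # all three outgoing edges are down at this vertex


same_tree = next_color("Blue", {"Blue", "Red", "Green"})
wrap_around = next_color("Green", {"Blue", "Red"})
skip_one = next_color("Red", {"Blue"})
```

The wrap-around case shows why subscripts are taken modulo 3: a packet arriving on the last color of the ordering falls back to the first.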
Technique 2.

1. If v originates p, it forwards it along the outgoing edge with color Blue.
2. Same as in Technique 1, step 2.
3. If v receives p along an edge (u, v) colored c_i, with i = 0, 1, 2, and the outgoing edge of the same color is not active, then it forwards p along the outgoing edge that has the same color as the incoming edge from u to v, unless this color is Blue or it has no color. In both cases, v forwards p along the outgoing edge colored c_j, with j = 1, 2 and c_j ≠ c_i.

THEOREM 3. Technique 2 constructs 2-resilient routing tables for 3-connected graphs.

PROOF. We now show that this routing scheme is 2-resilient. A packet p is routed along the Blue tree until it hits a failure. It may reach d just using the Blue tree, or hit a failed link somewhere along the Blue branch of the tree. Let us denote the failed edge as e. W.l.o.g., assume that the reversed edge of e is colored Red. From now on, the packet will use the Red tree. It may reach the destination or hit a failed edge e′. Since a tree does not have a cycle, e′ is a different link from e. From now on, p is routed along the Green tree. Since a tree does not have a cycle, p cannot hit e′ anymore. Because link e is colored Blue and Red, p will not be forwarded on it while it is routed along the Green tree. Therefore, p is guaranteed to reach d.

3.2 Shared-link-failure-free routing tables

In this section we introduce a sufficient condition to achieve (k − 1)-resiliency. Observe that extending the previous two techniques to achieve robustness against k − 1 edge failures seems a very hard task, as k/2 edge failures may cause the disruption of at least one arc on each of the k arc-disjoint trees. However, the following property of arc-disjoint spanning trees, which we call shared-link-failure-free, guarantees (k − 1)-resiliency.

DEFINITION 1. A set of routing tables is k-shared-link-failure-free if there exist k arc-disjoint spanning trees T_1, ..., T_k such that if a packet p is routed along a spanning tree T_i, with i = 1, ..., k, and hits a failed edge, then p has hit at least i distinct failed edges. In addition, a packet either reaches d or hits a failed outgoing edge in the k'th spanning tree.

Observe that a set of k-shared-link-failure-free routing tables might sometimes reroute a packet to another spanning tree even if that packet did not hit a failed edge. This is crucial, for instance, in our construction of a set of (k − 1)-resilient routing tables for generalized hypercubes below.

THEOREM 4. A set of k-shared-link-failure-free routing tables R is (k − 1)-resilient.

PROOF. Suppose, by contradiction, that a packet p is trapped in a forwarding loop. Since R is k-shared-link-failure-free, p will eventually be routed on the k'th spanning tree (as every new failed edge it hit before led to its rerouting to a new tree). However, it will then hit another failed outgoing edge. This implies that k edges failed, a contradiction.

We now give a high-level description of how to construct k-shared-link-failure-free routing tables for data center and interconnection topologies (generalized hypercubes, which have inspired BCube datacenter topologies [12], Clos networks [1], and grids) and other well-known graphs (cliques, complete bipartite graphs, and chordal graphs [7]). Our high-level technique consists of two steps. First, the input graph is recursively decomposed into smaller substructures (possibly with less connectivity than the original graph) for which it is easier to compute k-shared-link-failure-free routing tables. Then, these substructures are interconnected in such a way that the resiliency is retained or even increased. The main challenge is to keep the set of routing tables k-shared-link-failure-free during the interconnection phase, which is not trivial.

3.3 Clique graphs

A clique of size k consists of k vertices all connected to each other. Since there exist k − 1 edge-disjoint paths between every pair of vertices, a clique of size k is (k − 1)-connected.

THEOREM 5. For any k-connected clique graph there exists a set of (k − 1)-resilient routing tables.

PROOF. Let v_1, ..., v_k, d be the set of k + 1 vertices, where d is the destination vertex. We construct a set of k-shared-link-failure-free routing tables based on k arc-disjoint spanning trees T_1, ..., T_k as follows. For each i = 1, ..., k, add (v_i, d), (v_1, v_i), ..., (v_{i−1}, v_i), (v_{i+1}, v_i), ..., (v_k, v_i) into T_i. Routing is as follows. A packet is first routed along T_1. A packet is routed along T_i, with i = 1, ..., k, as long as it does not hit a failed edge. In that case, p is rerouted along T_{i+1}.

Suppose, by contradiction, that this is not a set of k-shared-link-failure-free routing tables, i.e., either (i) a packet p is routed along a spanning tree T_i, with i = 1, ..., k, and hits a failed edge, but p hit only i − 1 distinct failed edges, or (ii) a packet does not reach d and does not hit a failed edge in the k'th spanning tree T_k. Case (ii) is not possible since a packet is rerouted on a different spanning tree every time a failed edge is hit. In case (i), let e = (v_i, v_j) be the first failed edge that is hit by p twice. Clearly, p cannot hit e twice in the same direction, otherwise it means that p has been rerouted k times without hitting a failed edge twice, a contradiction. Hence, p hits e in two opposite directions, i.e., from v_i to v_j and from v_j to v_i. W.l.o.g., let i < j. Before p reaches v_j we have that it hit j − 1 distinct failed edges. In addition, since v_j routes p to v_i along T_i, we have that all its edges to d, v_{j+1}, ..., v_k failed. All these (j − 1) + (k − j + 1) = k failed edges are distinct, otherwise e is not the first edge that p hits twice, hence the statement of the theorem.
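The tree construction in the proof of Theorem 5 is simple enough to check mechanically on small cliques. The sketch below builds T_1, ..., T_k as sets of arcs exactly as listed in the proof and verifies that they are pairwise arc-disjoint and that each one leads every vertex to d. The vertex labels `"v1"`, ..., `"vk"`, `"d"` and the helper names are illustrative assumptions; the failover routing itself is not simulated.

```python
def clique_trees(k):
    """T_i = {(v_i, d)} plus {(v_j, v_i) : j != i}, for i = 1..k, as arcs (tail, head)."""
    trees = []
    for i in range(1, k + 1):
        arcs = {(f"v{i}", "d")}
        arcs |= {(f"v{j}", f"v{i}") for j in range(1, k + 1) if j != i}
        trees.append(arcs)
    return trees


def reaches_destination(tree, k):
    """Check that every v_i has exactly one outgoing arc in the tree and that it leads to d."""
    out = {}
    for tail, head in tree:
        if tail in out:          # two outgoing arcs at one vertex: not a tree toward d
            return False
        out[tail] = head
    for i in range(1, k + 1):
        node, hops = f"v{i}", 0
        while node != "d":
            if node not in out or hops > k:
                return False
            node, hops = out[node], hops + 1
    return True


trees = clique_trees(4)
all_arcs = set().union(*trees)
arc_disjoint = len(all_arcs) == sum(len(t) for t in trees)
all_span = all(reaches_destination(t, 4) for t in trees)
```

Each T_i is a star centered at v_i with a single arc into d, which is why v_j with j ≠ i is a leaf of T_i, the property used in Lemma 6.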
We prove another property that will be used later in this section. Let n_i, with i = 1, ..., k, be the only neighbor of d such that (n_i, d) belongs to T_i.

LEMMA 6. For any k-connected clique graph there exists a set of (k − 1)-resilient routing tables such that if a packet is routed at a vertex n_i along T_i, then it does not traverse any vertex n_1, ..., n_{i−1} while it is routed through T_i, ..., T_k.

PROOF. Consider the same routing solution used in the proof of Theorem 5. Each vertex n_i is a leaf of each spanning tree T_j, with i ≠ j. Hence, a packet is never routed towards n_i, unless it is routed along T_i. Since a packet is rerouted only from a spanning tree T_i to a spanning tree T_{i+1}, we have the statement of the lemma.

3.4 Complete Bipartite Graphs

A complete bipartite graph G = (A, B, E) consists of |A| + |B| vertices a_1, ..., a_|A|, b_1, ..., b_|B|, and there exists an edge between every pair of vertices a_i and b_j, with i = 1, ..., |A| and j = 1, ..., |B|. It is easy to see that an (A, B, E) complete bipartite graph is k-connected, where k = min{|A|, |B|}. We prove the following theorem.

THEOREM 7. For any k-connected complete bipartite graph there exists a set of (k − 1)-resilient routing tables.

PROOF. We construct a set of k-shared-link-failure-free routing tables. W.l.o.g., assume that d is in A. Let k = min{|A|, |B|}. For each i = 1, ..., k, add edges (b_i, d), (a_1, b_i), ..., (a_|A|, b_i), (b_1, a_i), ..., (b_{i−1}, a_i), (b_{i+1}, a_i), ..., (b_|B|, a_i) into T_i. Routing is performed exactly as for cliques (refer to the proof of Theorem 5). We now prove that this is a set of k-shared-link-failure-free routing tables. Suppose, by contradiction, that this is not a set of k-shared-link-failure-free routing tables, i.e., either (i) a packet p is routed along a spanning tree T_i, with i = 1, ..., k, and hits a failed edge, but p hit only i − 1 distinct failed edges, or (ii) a packet does not reach d and does not hit a failed edge in the k'th spanning tree T_k. Clearly, case (ii) is not possible, as in the proof of Theorem 5. In case (i), let T_i be the spanning tree along which a packet p first hits a failed edge e twice. Clearly, p already hit i − 1 distinct failed edges. Moreover, it hits e in two opposite directions, otherwise p would be rerouted from a spanning tree T_j to a spanning tree T_i, with j < i, which means that at least k distinct edges failed, a contradiction. Hence, we have two cases: either (a) e = (b_i, d) failed or (b) e = (a_j, b_i) failed, with 1 ≤ i ≤ |B| and 1 ≤ j ≤ |A|. Since p hits e in two opposite directions, case (a) is not possible. In case (b), observe that a packet is routed to a_j (b_i) only if it is routed along T_j (T_i). Hence, since p cannot be routed from a_j to b_i (b_i to a_j) and b_i (a_j) is a leaf of every spanning tree T_l, with l ≠ i (l ≠ j), a packet will be rerouted to b_i (a_j) only after it is rerouted along the other spanning trees, which implies that there are at least k distinct failed edges, a contradiction.

3.5 Generalized hypercubes

A generalized hypercube is defined recursively as follows. A clique of size k + 1 is a (1, k)-generalized hypercube. An (i, k)-generalized hypercube, with i > 1, consists of k + 1 copies of an (i − 1, k)-generalized hypercube where all copies of the same vertex form a clique (of size k + 1). Observe that the connectivity of a generalized hypercube increases by k at each recursive step, since every vertex gains k new neighbors. Hence, an (i, k)-generalized hypercube is a ki-connected graph. The construction of a set of ki-shared-link-failure-free routing tables is done recursively. First, we construct a set of k-shared-link-failure-free routing tables for a clique of size k + 1. Then, in the recursive step, we interconnect all the smaller copies and populate the existing routing tables with additional entries that increase the resiliency of the graph by k while keeping the set of routing tables ki-shared-link-failure-free.

THEOREM 8. For any (i, k)-generalized hypercube graph there exists a set of (ki − 1)-resilient routing tables.

PROOF. We denote by H(i, k, l) a graph containing l copies of an (i, k)-generalized hypercube where all copies of the same vertex form a clique. Observe that H(i, k, k + 1) = H(i + 1, k). An H(i, k, l) graph is (ki + l − 1)-connected. We recall that we denote by n_j, with 1 ≤ j ≤ ki + l − 1, a neighbor of d such that (n_j, d) belongs to the j'th arc-disjoint spanning tree T_j. We prove that there exists a set of (ki + l − 1)-shared-link-failure-free routing tables for H(i, k, l) by induction on i and l. Moreover, we also prove that if a packet is routed at a vertex n_i along T_i, then it does not traverse any vertex n_1, ..., n_{i−1} while it is routed through T_i, ..., T_k.

In the base case, H(1, k, 1) is a clique of size k + 1. W.l.o.g., by symmetry of the hypercube construction, we assume that the destination vertex d is contained in this clique. By Theorem 5, there exists a set of k-shared-link-failure-free routing tables. Moreover, by Lemma 6, we have that if a packet is routed at a vertex n_i along T_i, then it does not traverse any vertex n_1, ..., n_{i−1} while it is routed through T_i, ..., T_k.

In the inductive step, we construct a set of (c + 1)-shared-link-failure-free routing tables for H^{l+1} = H(i, k, l + 1), where 1 ≤ l ≤ k and c = ki + l − 1. Graph H^{l+1} consists of a graph H^l = H(i, k, l) and a graph H^1 = H(i, k, 1), where each vertex of H^1 is connected to l vertices of H^l and each vertex of H^l is connected to exactly one vertex of H^1. Destination d is in V(H^l) and we denote by d^1 the only neighbor of d in V(H^1). By inductive hypothesis, there exists a set of c-shared-link-failure-free routing tables for H^l, which is c-connected, based on c arc-disjoint spanning trees T^l_1, ..., T^l_c. By inductive hypothesis, there exists a set of ki-shared-link-failure-free routing tables for H^1, which is ki-connected, based on ki arc-disjoint spanning trees T^1_1, ..., T^1_{ki}.

We now construct c + 1 arc-disjoint spanning trees T_1, ..., T_{c+1} for H^{l+1}. For each j = 1, ..., ki, let n^1_j be the unique neighbor of d^1 such that arc (n^1_j, d^1) belongs to T^1_j. Let N^1 be the set {n^1_1, ..., n^1_{ki}}. Let n_{j,1}, ..., n_{j,l−1} be the neighbors of n^1_j that belong to H^l and v_1, ..., v_{l−1} be the neighbors of a vertex v ∈ V(H^1) that belong to H^l. For each j = 1, ..., ki − 1, let T_j be the union of both T^1_j and T^l_j plus arc (n^1_j, n_{j,1}). Moreover, for each j = 1, ..., ki − 1, we reverse arc (n^1_j, d^1) in T_j. For each j = ki, ..., ki + l − 2, let T_j be the union of both T^l_j and, for each vertex v ∈ V(H^1), arc (v, v_{j−ki+2}), which connects H^1 to H^l. Tree T_{ki+l} is constructed from a copy of T^1_{ki} by adding all the edges from vertices in V(H^l) to vertices in V(H^1) and reversing arc (d, d^1). Tree T_{ki+l−1} is constructed from a copy of T^l_{ki+l} by adding all the arcs from vertices in V(H^1) to vertices in V(H^l) that do not belong to any of T_1, ..., T_{ki+l−2}, T_{ki+l}. Moreover, we add into T_{ki+l−2} arc (d^1, n^1_{ki}) and all arcs (n^1_j, d), with j = 1, ..., ki − 2, ki.

Routing is as follows. Routing at vertices of H^l is unchanged, i.e., if a vertex was routing from a spanning tree T^l_j towards a spanning tree T^l_{j+h}, now it routes a packet received through T_j along T_{j+h}. In addition, if a packet cannot be routed along T_{ki+l−1}, then it is rerouted through T_{ki+l}, and if a packet is received from H^1, it is rerouted through T_1, unless p is received from T_{ki+l}. Routing at vertices of H^1 is unchanged, i.e., if a vertex was routing from a spanning tree T^1_j, with j = 1, ..., ki − 1, towards a spanning tree T^1_{j+h}, now it routes a packet received through T_j along T_{j+h}, and if a packet cannot be routed along T_j, then it is rerouted through the next available spanning tree.
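The connectivity value used for generalized hypercubes, read here as the product k·i, can be sanity-checked on small instances. The sketch below assumes the usual coordinate description of an (i, k)-generalized hypercube (vertices are i-tuples over k + 1 symbols, adjacent when they differ in exactly one coordinate), which we take to match the recursive definition in this section; the function names and the unit-capacity max-flow routine are illustrative choices and play no role in the proof.

```python
from collections import deque
from itertools import product


def generalized_hypercube(i, k):
    """Adjacency sets: i-tuples over {0..k}, neighbors differ in exactly one coordinate."""
    adj = {v: set() for v in product(range(k + 1), repeat=i)}
    for v in adj:
        for pos in range(i):
            for val in range(k + 1):
                if val != v[pos]:
                    adj[v].add(v[:pos] + (val,) + v[pos + 1:])
    return adj


def edge_disjoint_paths(adj, s, t):
    """Maximum number of edge-disjoint s-t paths, via BFS augmenting paths on unit capacities."""
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:   # push one unit of flow back along the path found
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def edge_connectivity(adj):
    """Minimum over all targets t of the number of edge-disjoint paths from a fixed source."""
    s = next(iter(adj))
    return min(edge_disjoint_paths(adj, s, t) for t in adj if t != s)


conn_2_2 = edge_connectivity(generalized_hypercube(2, 2))
conn_3_1 = edge_connectivity(generalized_hypercube(3, 1))
```

For example, the (3, 1) instance is the ordinary 3-dimensional cube, matching the three arc-disjoint spanning trees shown for the (3, 1)-generalized hypercube in Figure 5.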
T1
d
Figure 3: A (1, 1)-generalized hypercube with one arc-disjoint spanning tree (solid black ).
n1 H l
d
T1 T2 n11 H 1
d1
Figure 4: A (2, 1)-generalized hypercube with two arc-disjoint spanning trees (solid black and dashed red).
Consider the example in Figure 3, where the base case for a (i, 1)-generalized hypercube is depicted together with its unique arc-disjoint spanning tree T1 . In order to construct a set of routing tables for the (2, 1)generalized hypercube depicted in Figure 4, we create a copy of a (1, 1)-generalized hypercube H l = H(1, 1, 1), denoted by H 1 , and construct T2 using T11 . We add all edges from H l to H 1 in T2 , but we reverse (d, d1 ). We then add (n11 , n1 ) and (d1 , n11 ) into T1 . When a packet is routed from H l to H 1 , it is rerouted along T1 , which is obvious for edge (n11 , n1 ), unless it is routed from d1 to d. We now construct a set of routing tables for the (3, 1)-generalized hypercube depicted in Figure 5. We create a copy of a (2, 1)-generalized hypercube H l = H(1, 1, 2) = H(2, 1, 1), denoted by H 1 . We construct T1 by interconnecting T1l and T11 with an edge (n11 , n1 ) and reversing (n11 , d1 ). We construct T3 from T21 . We add all edges from H l to H 1 in T3 , but we reverse (d, d1 ). We then construct T2 by interconnecting T2l with edges from H 1 to H l , except when the edge is incident to a vertex in {d1 , n11 }. We add edges (n11 , d1 ) and (d1 , n12 ). When a packet is routed from H l to H 1 , it is rerouted along T1 , unless it is routed from d1 to d. For instance when a packet is forwarded from n12 to n2 along T2 , it is then forwarded along T1 . We now construct a set of routing tables for the (4, 1)-generalized
d
d1
n1
T1
n11
T2 H n2
l
H
1
T3
n12
Figure 5: A (3.1)-generalized hypercube with three arc-disjoint spanning trees (solid black, dashed red, and dotted blue).
6
d
n1
of H l can detect when (k i + l − 1) edges failed. In that case, a packet is forwarded through the next spanning tree, i.e., Tki +l , and it is routed entirely within H 1 plus (d1 , d). Since a packet p routed through Tki +l is never rerouted to any other spanning tree, we have that either p reaches d1 and, in turn, d or an edge in H 1 or between H l and H 1 must have failed. This edge is different from any other of the (k i + l − 1) edges that failed in H l , which leads to a total of k i + l edge failures. The vertex that cannot forward through Tki +l detects that at least k i + l edges failed. Hence, in this case, routing tables are (k i + l)-shared-link-failure-free since a packet will never enter a loop without any vertex detecting that k i +l edges failed. Before considering a packet that is originated from H 1 , we prove that if a packet is routed at a vertex ni along Ti , then it does not traverse any vertex n1 , . . . , ni−1 while it is routed through Ti , . . . , Tk . Observe that vertices n1 , . . . , nki +l−1 are all contained in V (H l ) and a packet is routed to H 1 (and in turn to nki +l ), only along Tki +l . Hence, by induction hypothesis and since a packet routed along Tki +l is never rerouted to T1 , this property holds. We now consider a packet p that is originated from a vertex of H 1 . Observe that since the routing tables within H 1 have been partially modified, we first need to analyze these differences. These changes in the routing tables only involve d1 and its neighbors in H 1 and the fact that all vertices will route from Tki −1 through a set of routing trees Tki , . . . , Tki +l−2 that connects each vertex of H 1 with a direct edge to a vertex of H l . If a packet reaches H l , then it is rerouted along T1 and we already prove that it is guaranteed to reach d. Otherwise, it is forwarded through Tki +l−1 and, alternatively, through Tki +l . We first consider a packet p that is not originated at d1 . 
In this case, observe that a packet p is routed through T1 , . . . , Tki −2 exactly as it was routed through T11 , . . . , Tk1i −1 , with a small difference: If p reaches a vertex n1j , while it is routed along Tj , with j = 1, . . . , k i − 1, instead of being routed to d1 , it is routed through edge (n1j , nj,1 ). If that edge is failed, p is routed exactly as if edge (n1j , d1 ) failed in Tj1 . Hence, a packet either reaches a vertex of H l or it is routed according to T1i , . . . , Tk1i −1 . Now, if a packet hits a failed edge along Tki −1 , it is rerouted along the next l − 1 spanning trees Tki , . . . , Tki +l−2 , which directly connect each vertex of H 1 to a vertex in H l . In that case, a packet p either reaches a vertex in H l , for which we are guaranteed that it will reach d or a vertex detects that k i + l edges failed, or k i + l − 1 distinct edges failed and p is routed inside H 1 along Tki +l−1 . In the latter case, p is either routed to a vertex in H l , or it hits a failed edge e that, by construction of Tki +l−1 , either connects H 1 to H l or it is incident to d1 . Observe that e is a
Figure 6: A (4, 1)-generalized hypercube with four arc-disjoint spanning trees (solid black, dashed red, dotted blue, and dash-dotted green).
hypercube depicted in Figure 6. We create a copy of a (3, 1)-generalized hypercube H l , denoted by H 1 . We construct Ti , with i = 1, 2, by interconnecting Til and Ti1 with an edge (n1i , ni ) and reversing (n1i , d1 ). We construct T4 using T31 . We add all edges from H l to H 1 in T4 , but we reverse (d, d1 ). We then construct T3 by interconnecting T3l with edges from H 1 to H l , except when the edge is incident to a vertex in {d1 , n11 , n12 }. We add edges (n11 , d1 ), (n12 , d1 ), and (d1 , n13 ). When a packet is routed from H l to H 1 , it is rerouted along T1 , unless it is routed from d1 to d. Observe that a packet that is sent along T2 from x4,3 to n12 , where xi,j is the vertex in the i-th row and j-th column in Figure 6, is rerouted on T1 because vertex n2 reroutes packets received from x2,3 along T1l . On the contrary, since vertex d does not reroute a packet received from n3 along T1l , vertex d1 also does not reroute a packet received from n13 along T1 . Observe that, by construction, when a packet is routed through Tki +l in H(i, k, l + 1), it is never rerouted through any other spanning tree. In fact, rerouting onto a different spanning tree happens only when a packet is sent from H 1 to H l , unless the destination vertex is d. Since Tki +l does not include any edge from H 1 to H l , except (d1 , d), and it is built from Tkl i +l−1 , which, by induction hypothesis, does not reroute on T1l , the statement easily follows. We now prove that this set of routing tables for H l+1 is (k i + l)-shared-link-failure-free such that if a packet is originated at a vertex ni , then it is routed to a vertex nj , with j > i, only when it is routed along Tj . Consider a packet p that is originated in H l . Observe that all the arcs from H l to H 1 belong to Tki +l and, since routing in H l is (k i + l − 1)-shared-link-failure-free, a vertex
composed into a tree T such that: (i) each node of T represents a complete bipartite subgraph of C, (ii) there exists a directed arc from a node x of T to a node y of T if y contains a vertex of C that is closer to d than any vertex of C contained in x, and (iii) each vertex of C belongs to at least one node (and at most two nodes) of T. Let $n_1, \dots, n_l$ be the nodes of T. Let $n_1$ be a complete bipartite graph that contains d and, for each graph $n_i$ with $i = 2, \dots, l$, let $d_i$ be an arbitrary vertex of $n_i$ that is closer to d. By Theorem 7, we can construct within each $n_i$ a set of k-shared-link-failure-free routing tables towards $d_i$. When a packet reaches $d_i$, it is routed through the next complete bipartite graph $n_j$, with $j \neq i$, towards a destination $d_j$ that is closer to d than $d_i$. Since we are using a shortest-path metric, such a destination must exist. Hence, the statement of the theorem is proved.
distinct failed edges. In fact, all the edges failed along T1 , . . . , Tki +l−2 are not incident to d1 . If p hits e and it is rerouted along Tki +l , it cannot hit e in the opposite direction since the outgoing edge at d1 in Tki +l is towards d. Hence, by induction hypothesis and since Tki +l−1 only routes a packet either directly to a vertex of H l , to d, or to one of its neighbors, when p is rerouted along Tki +l , it is guaranteed to either reach d1 , and in turn d, or to hit the (k i + l)-th distinct failed edge, which proves the statement of the theorem in this case. We now finish our proof by studying how a packet p that is originated at d1 is routed in H 1 . If all edges incident to d1 failed, we have that k i + l distinct edges failed and d can detect it. Otherwise, if not all these edges failed, then p is routed to a vertex n1j , with j = 1, . . . , nki . After that, by inductive hypothesis, we have that packet p is guaranteed not to traverse any vertex n1h , with h < j. Hence, it is either routed to a vertex in H l through a spanning tree in Tj , . . . , Tki +l−1 or it is routed along Tki +l . In that case, p may or may not be routed along Tki +l−1 for at least one edge. In the first case, by construction of Tki +l−1 , p is at d or n1ki . In both cases, it can be routed to d along Tki +l and, if (d1 , d) is failed, a vertex can detect that k i + l distinct edges failed, which proves the statement of the theorem. In the second case, a packet does not change its location if an edge failed along Tki +l−1 . Hence, by inductive hypothesis, it is guaranteed to be routed to (d1 , d) or to hit a distinct failed edge. Hence, the statement of the theorem is proved in this case as well.
3.7 Two-dimensional grids (Hamiltonian-based routing)
A 2-dimensional $n \times m$ grid consists of $n + m$ cycles $c_1, \dots, c_n, c'_1, \dots, c'_m$, where $c_i = (v_{1,i}, \dots, v_{m,i})$ and $c'_i = (v_{i,1}, \dots, v_{i,n})$. We now introduce a useful technique based on Hamiltonian cycles that can be used to construct (k − 1)-resilient k-shared-link-failure-free routing tables. Consider a sequence S of 2k arc-disjoint spanning trees $S = \langle T_1^A, T_1^B, \dots, T_k^A, T_k^B \rangle$, where $T_i^A$ is a path $(v, w_1, \dots, w_n, u, d)$ and $T_i^B = (u, w_n, \dots, w_1, v, d)$ is the same path reversed. Now, if routing tables route packets according to this ordered sequence S, we obtain (k − 1)-resiliency. In fact, when a packet hits a failed edge on a path $T_i^A$, it is sent in the opposite direction, where it is guaranteed to either reach d or to hit a different failed edge. Since no spanning tree $T_i^A$ or $T_i^B$ overlaps with any other spanning tree $T_j^A$ or $T_j^B$, with $j \neq i$, this is a set of (2k − 1)-shared-link-failure-free routing tables. Observe that paths $T_i^A$ and $T_i^B$ form a Hamiltonian cycle, i.e., a cycle that visits all vertices exactly once. Hence, if a graph contains k edge-disjoint Hamiltonian cycles, then we can exploit these cycles to easily construct (2k − 1)-resilient routing tables. This allows us to exploit known results about the number of edge-disjoint Hamiltonian cycles in specific graphs in order to provide resiliency guarantees. For instance, it is well-known that a (2i, 1)-generalized hypercube (i.e., a “standard” hypercube) contains i edge-disjoint Hamiltonian cycles [4], which can be used to compute (2i − 1)-resilient routing tables. As for grids, we show in the full version of this paper how to compute 2 edge-disjoint Hamiltonian cycles inside a grid.
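The failover rule described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function `deliver`, the variable `mode`, and the choice of the complete graph on five vertices with the two cycles in `K5_CYCLES` are hypothetical and do not come from the paper. The variable `mode` stands in for what a router would infer from the incoming port (the cycles are edge-disjoint, so an incoming arc identifies both the cycle and the direction).

```python
from itertools import combinations

def edge(u, v):
    """Undirected edge as an order-independent key."""
    return frozenset((u, v))

def deliver(cycles, src, dst, failed):
    """Simulate failover over the sequence <T1A, T1B, ..., TkA, TkB>.

    cycles: list of edge-disjoint Hamiltonian cycles (vertex lists).
    failed: set of failed undirected edges (frozensets).
    Returns True if the packet reaches dst, False on a dead end.
    """
    cur = src
    mode = 0  # 2*i: cycle i forward (TiA); 2*i + 1: cycle i backward (TiB)
    while cur != dst:
        if mode == 2 * len(cycles):
            return False  # every tree in the sequence hit a failed edge
        cyc = cycles[mode // 2]
        step = 1 if mode % 2 == 0 else -1
        nxt = cyc[(cyc.index(cur) + step) % len(cyc)]
        if edge(cur, nxt) in failed:
            mode += 1  # local failover: switch to the next tree in S
        else:
            cur = nxt
    return True

# Hypothetical example: two edge-disjoint Hamiltonian cycles covering K5.
K5_CYCLES = [[0, 1, 2, 3, 4], [0, 2, 4, 1, 3]]
ALL_EDGES = [edge(u, v) for u, v in combinations(range(5), 2)]

def resilient(max_failures, dst=0):
    """Check delivery from every source under every failure set of bounded size."""
    return all(
        deliver(K5_CYCLES, s, dst, set(f))
        for r in range(max_failures + 1)
        for f in combinations(ALL_EDGES, r)
        for s in range(5)
    )
```

With k = 2 cycles, `resilient(3)` exhaustively checks every set of at most 2k − 1 = 3 failed edges, while failing all four edges incident to the destination produces a dead end.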
3.6 Clos networks
A k-Clos network [1] is a k-connected graph that consists of k partially overlapping multirooted trees organized in layers. Its high bisection bandwidth and symmetric structure make it an ideal choice for a datacenter network topology. Our shared-link-failure-free routing table construction decomposes a Clos network into a set of k-connected complete bipartite graphs, where each vertex belongs to at most two complete bipartite graphs. We first compute a set of k-shared-link-failure-free routing tables for each of the k-connected complete bipartite graphs. After that, we interconnect all these bipartite graphs in such a way that the resiliency of the resulting graph is also k − 1. This technique improves upon all previously known results about resiliency in Clos networks [23] in two ways: first, in our case all the vertices are (k − 1)-resilient and not only the leaves of the multirooted tree; second, our construction works for any number of layers of the Clos network.
THEOREM 9. For any k-connected Clos network there exists a set of (k − 1)-resilient routing tables.
THEOREM 10. For any grid graph there exists a set of 3-resilient routing tables.
PROOF. Our routing scheme relies on a grid graph decomposition into 2 edge-disjoint Hamiltonian cycles.
PROOF. A k-connected Clos network C can be de-
subset. It is easy to see that if a graph G does not contain an induced subgraph G′, then no induced subgraph of G contains an induced subgraph G′. Hence, any induced subgraph of a chordal graph is a chordal graph.
Figure 7: Odd-Odd case.
Figure 8: Even-Even case.
LEMMA 11. Let G be a k-vertex-connected chordal graph and S be a minimal vertex separator of G. Let $A_1, \dots, A_n$ be the n sets of vertices belonging to the n disconnected components of G after the removal of the vertices in S from G. Each $G[A_i \cup S]$, with $1 \le i \le n$, is a k-vertex-connected graph.
Figure 9: Even-Odd case
We prove that such a decomposition always exists. We provide patterns for the different parities of the grid dimensions; each pattern can be extended by adding two rows or two columns. Fig. 7, Fig. 8, and Fig. 9 show all possible parity cases. The two cycles are marked with different line types. Repeatable blocks are highlighted with curly brackets.
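A decomposition of this kind can be checked mechanically. The sketch below assumes that rows and columns wrap around (as the cycle-based definition of the grid suggests) and that both dimensions are at least 3. The helper names and the two cycles for the 3 × 3 case are hypothetical illustrations, not the patterns of Figs. 7 to 9.

```python
def cycle_edges(cycle):
    """Undirected edges of a closed vertex sequence."""
    return {frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))
            for i in range(len(cycle))}

def torus_edges(n, m):
    """Edges of an n x m grid whose rows and columns wrap around (n, m >= 3)."""
    edges = set()
    for r in range(n):
        for c in range(m):
            edges.add(frozenset(((r, c), (r, (c + 1) % m))))
            edges.add(frozenset(((r, c), ((r + 1) % n, c))))
    return edges

def is_hamiltonian_decomposition(n, m, cycles):
    """True if the cycles are edge-disjoint, Hamiltonian, and cover every grid edge."""
    vertices = {(r, c) for r in range(n) for c in range(m)}
    grid = torus_edges(n, m)
    used = set()
    for cyc in cycles:
        if len(cyc) != len(vertices) or set(cyc) != vertices:
            return False  # does not visit every vertex exactly once
        edges = cycle_edges(cyc)
        if len(edges) != len(cyc) or not edges <= grid or edges & used:
            return False  # uses a non-grid edge or shares an edge with another cycle
        used |= edges
    return used == grid

# Hypothetical 3 x 3 example (vertices are (row, column) pairs).
C1 = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 0), (1, 1), (2, 1), (2, 2), (2, 0)]
C2 = [(0, 0), (1, 0), (2, 0), (2, 1), (0, 1), (1, 1), (1, 2), (2, 2), (0, 2)]
```

Calling `is_hamiltonian_decomposition(3, 3, [C1, C2])` confirms that the two example cycles are edge-disjoint and together use all 18 edges of the 3 × 3 wrap-around grid.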
3.8
PROOF. We prove that the vertex-connectivity is retained. Consider any subgraph $G[A_i \cup S]$, with $1 \le i \le n$. It is well-known that a minimal vertex separator S of a k-vertex-connected chordal graph is a k-vertex-connected clique [14]. Consider two arbitrary vertices u and v in $A_i \cup S$. Let $p_1, \dots, p_k$ be the k vertex-disjoint paths that exist between u and v in G such that they are chordless, i.e., no two non-adjacent vertices in $p_i$, with $i = 1, \dots, k$, are connected by an edge. It is easy to see that all these paths are contained within $G[A_i \cup S]$: since S is a clique, any path that goes outside $G[A_i \cup S]$ can be shortened with a direct edge between two vertices in S.
Chordal graphs
A graph G is chordal if, for every cycle C in G of length more than three, at least two non-adjacent vertices in C are connected by an edge. We now consider vertex failures instead of edge failures, vertex-connectivity instead of edge-connectivity, and vertex-resiliency instead of edge-resiliency. Namely, a graph G is k-vertex-connected if disconnecting any two vertices of G requires removing at least k vertices. We say that a set of routing tables R is c-vertex-resilient if any packet forwarded according to R reaches its destination d as long as at most c vertices fail without disrupting physical connectivity between the sender of the packet and d. Also, we replace the word “edge” with the word “vertex” in the definition of k-shared-link-failure-free routing tables, i.e., a packet is guaranteed to hit i distinct failed vertices if it cannot be routed along $T_i$. We decompose a k-vertex-connected chordal graph into smaller chordal graphs with the same vertex-connectivity. In the base case we have a k-vertex-connected clique and we show that the routing tables constructed in the proof of Theorem 5 are a set of k-shared-link-failure-free routing tables also in the vertex-resiliency case. Then, we join these smaller chordal graphs while maintaining a set of k-shared-link-failure-free routing tables using the same technique adopted for Clos networks, i.e., routing towards intermediate destinations that are closer to d. We introduce some terminology. A vertex separator S of a graph G is a set of vertices such that, after their removal from G, G has at least two components. A vertex separator S is minimal if no proper subset of S separates G into two disconnected components. A subgraph of a graph G induced by a set of vertices $V' \subseteq V(G)$ is a subgraph $G[V']$ of G that consists of the vertices $V'$ together with any edges whose endpoints are both in this
THEOREM 12. For any k-connected chordal graph there exists a set of (k − 1)-vertex-resilient routing tables.
PROOF. Let G be a k-connected chordal graph. We prove by induction on the decomposition by vertex separators that there exists a set of routing tables that is k-shared-link-failure-free. In the base case, G is a k-vertex-connected clique. It is easy to see that the set of routing tables constructed in the proof of Theorem 5 is k-shared-link-failure-free also in the case of vertex failures. In the inductive step, consider a minimal vertex separator S of G. S has cardinality at least k + 1 since G is k-connected. After the removal of S from G, assume that G is divided into n disconnected components, with n ≥ 2. We denote by $A_1, \dots, A_n$ the sets of vertices in each of these disconnected components. Observe that, by Lemma 11, each graph $G[A_i \cup S]$ is a k-vertex-connected chordal graph. W.l.o.g., assume that d is contained in $A_1 \cup S$. By induction hypothesis, there exists a set of k-shared-link-failure-free routing tables such that each packet originated at a vertex in $A_1 \cup S$ can reach d under k − 1 vertex failures. For each vertex v in a component $A_i$, i > 1, by induction hypothesis, each packet p originated at v can reach a vertex s ∈ S under k − 1 vertex failures. When p reaches s, it can use the set of routing tables of $G[A_1 \cup S]$ in order to reach d, which proves the theorem.
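The decomposition step used in the induction (remove a separator S, then form the induced subgraphs on each component together with S) can be sketched as follows. The adjacency-dictionary representation, the function names, and the small example graph are assumptions made for illustration only. The code does not check that S is a minimal separator or that the graph is chordal.

```python
def induced(adj, keep):
    """Subgraph induced by the vertex set keep (adjacency dict of sets)."""
    return {v: adj[v] & keep for v in keep}

def components(adj):
    """Connected components of an adjacency dict {vertex: set(neighbors)}."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def split_by_separator(adj, sep):
    """Return the induced subgraphs on A_i ∪ S for each component A_i of G − S."""
    rest = induced(adj, set(adj) - sep)
    return [induced(adj, comp | sep) for comp in components(rest)]

# Hypothetical chordal graph: two triangles sharing the edge (b, c).
G = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
PARTS = split_by_separator(G, {"b", "c"})
```

Here S = {b, c} separates a from d, and each resulting part is a triangle, i.e., a smaller chordal graph that still contains the whole separator.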
4. NEGATIVE RESULTS
We show that simplified forms of failover tables are not sufficiently powerful. It is well-known that, without matching on the incoming edge, it is not always possible to construct (k − 1)-resilient static routing tables [17]. To overcome this, [36] suggests routing packets based on a circular ordering of the neighbors of each vertex. Namely, we say that a routing table at vertex v is a circular routing table if v routes a packet based on the input port and an ordered circular sequence $\langle e_1, \dots, e_l \rangle$ of its incident edges as follows. If a packet p is received from an edge $e_i$, then v forwards it along $e_{i+1}$. If the outgoing edge $e_{i+1}$ failed, v forwards p through $e_{i+2}$, and so on. We prove that these simplified routing tables cannot guarantee (k − 1)-resiliency even for three-connected graphs.
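As a minimal sketch of the circular forwarding rule, the function below picks the outgoing edge for a packet at a single vertex. The function name and the string edge labels are hypothetical. Two details are assumptions of this sketch rather than statements from the text: the incoming edge itself is tried last, and `None` is returned when every incident edge has failed.

```python
def circular_next_hop(order, in_edge, failed):
    """Return the outgoing edge chosen by a circular routing table.

    order:   circular sequence <e1, ..., el> of the vertex's incident edges.
    in_edge: edge the packet arrived on (must appear in order).
    failed:  set of incident edges that are currently down.
    """
    i = order.index(in_edge)
    for offset in range(1, len(order) + 1):
        candidate = order[(i + offset) % len(order)]
        if candidate not in failed:
            return candidate  # first working edge after in_edge in circular order
    return None  # assumption: all incident edges failed, the packet is dropped

# Hypothetical vertex with three incident edges, labeled by the neighbor they lead to.
ORDER = ["x", "z", "y"]
```

For example, `circular_next_hop(ORDER, "x", set())` selects `"z"`, and if the edge to z is down the same call with `{"z"}` selects `"y"`.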
THEOREM 13. There is a 3-connected graph G for which no 2-resilient set of circular routing tables exists.
PROOF. Consider the 3-connected graph shown in Fig. 1(a), where d is the destination. Suppose, by contradiction, that there exists a 2-resilient set of circular routing tables. Since the graph is symmetric, w.l.o.g., assume that o routes clockwise, i.e., a packet received from x is sent to z, from z to y, and from y to x. Also, w.l.o.g., o sends its originated packet p to y when none of its incident edges fail. We first claim that vertices y, a, and z route counterclockwise. Suppose, by contradiction, that (i) y routes clockwise, or (ii) a routes clockwise, or (iii) z routes clockwise. For each case, consider the following failure scenarios. In case (i), suppose both edges (a, d) and (z, b) fail. In case (ii), suppose both edges (y, c) and (z, b) fail. In case (iii), suppose both edges (y, c) and (a, d) fail. In each case packet p is routed along (y, a, z, o, y) and a forwarding loop arises, a contradiction. Observe now that, in the absence of failures, if c sends a packet p to x, then p reaches b: if x routes clockwise, it forwards p directly to b; otherwise, if x routes counterclockwise, p is forwarded through o, z, a, y, o, x and, also in this case, to b. Consider the scenario where both edges (c, d) and (b, d) fail. A packet p received by y from o is routed through c to x and, because of the previous observation, to b. After that, it is routed through (z, o, y) and a forwarding loop arises, a contradiction.
We now exploit this result to state the following surprising theorem, which shows that the size of the min-cut between two vertices does not match the resiliency guarantee for these two vertices. In other words, even if a vertex v is k-connected to the destination (but the entire graph is not), it is not possible to guarantee that a packet originated at v reaches d when k − 1 edges fail. Clearly, if we want to protect only one k-connected vertex, we can route along its k edge-disjoint paths one after the other until the packet reaches its destination. However, if there are more vertices to be protected, it may not be possible to protect all of them.
THEOREM 14. There exist a graph G and a destination d ∈ V(G) for which no set of vertex-connectivity-resilient routing tables exists.
PROOF. Consider the graph G used in the proof of Theorem 13 in Fig. 1(a). Now, replace each edge of G with a path consisting of three edges, as shown in Fig. 1(b). We call the newly added vertices intermediate vertices (depicted as small black circles) and the old ones original vertices. Each original vertex of G retains its 3-connectivity to d. It is easy to see that intermediate vertices must forward a packet received through one edge to the other one, if it did not fail. Otherwise, if an intermediate vertex v bounces a packet back to a vertex u, then if all edges incident at u fail, except (v, u), a forwarding loop arises. This implies that we only need to compute routing tables at original vertices. We now prove that the routing tables at the 8 original vertices, except d, must be circular. Once we prove this, the statement of the theorem easily follows from Theorem 13, where we proved that no circular routing tables can guarantee 2-resiliency on G. From now on, we will consider only failures between two intermediate vertices, thus a routing table at each original vertex consists of just four entries: where to send a packet received from each of its three neighbors $n_1$, $n_2$, and $n_3$, and where to send a locally originated packet. We can discard the last entry as it does not influence whether a routing table is circular. Hence, we simplify our routing table notation as follows. Let $f^v(n) = n'$ be a routing table at vertex v such that a packet received from a neighbor n is forwarded to a neighbor n'. We make the following observations. First, for each original vertex v, we have that $f^v(n) \neq n$, with $n \in \{n_1, n_2, n_3\}$, i.e., no vertex bounces a packet back to the edge where it received it, exactly as in the case of intermediate vertices. Second, all entries in the routing table are distinct. Otherwise, suppose by contradiction that, w.l.o.g., $f^v(n_1) = f^v(n_2) = n_3$ and $f^v(n_3) = n_1$. If both $n_1$ and $n_3$ have a dead-end ahead because of two edge failures, then a forwarding loop among $n_3$, v, and $n_1$ arises. Hence, routing tables at original vertices must be circular routing tables, a contradiction.
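The two observations in the proof of Theorem 14 (no entry bounces a packet back, and all entries are distinct) can be checked by brute force for a vertex with three neighbors. The sketch below is a hypothetical illustration with assumed function names. It enumerates every table f from neighbors to neighbors and keeps those that satisfy both observations.

```python
from itertools import product

def admissible_tables(neighbors):
    """Enumerate tables f with f(n) != n and pairwise distinct entries."""
    tables = []
    for outs in product(neighbors, repeat=len(neighbors)):
        no_bounce = all(o != n for n, o in zip(neighbors, outs))
        distinct = len(set(outs)) == len(outs)
        if no_bounce and distinct:
            tables.append(dict(zip(neighbors, outs)))
    return tables

def is_circular(table):
    """True if following f from one neighbor visits all neighbors in a single cycle."""
    start = next(iter(table))
    seen, cur = [], start
    while cur not in seen:
        seen.append(cur)
        cur = table[cur]
    return cur == start and len(seen) == len(table)

TABLES = admissible_tables(["n1", "n2", "n3"])
```

For three neighbors only two tables survive, the clockwise and the counterclockwise rotation, and `is_circular` accepts both, which matches the claim that the tables at original vertices must be circular.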
There is an easy corollary that follows from the previous proof. We can show that there exists a limit on the resiliency that can be attained in a k-connected graph. It was proved in [10] that ∞-resiliency cannot be guaranteed. We claim a stronger bound: it is not even possible to be robust against two failures that do not disconnect a sender from d.
THEOREM 15. There is a 2-connected graph for which no set of 2-resilient routing tables exists.
tables that maximize the number of vertices that are protected [2, 6, 17, 29]. The latter (i.e., per-incoming-port static deterministic routing tables) exploit the incoming port of a packet to infer which links have failed. Our work belongs to this category. Previous work proposed heuristics [3, 36], designed failover mechanisms that guarantee resilience against only a single link/node failure [9, 10, 26, 32, 37, 38], achieved $\lfloor k/2 - 1 \rfloor$-resiliency for k-connected graphs [8], or proved that ∞-resiliency cannot be guaranteed [10]. For specific Clos networks, [23] achieves (k − 1)-resiliency, but no general methodology is described. In contrast, we show how to compute 2-resilient routing tables for arbitrary 3-connected graphs, we define a sufficient condition for (k − 1)-resiliency and show how to exploit it in some recently proposed datacenter topologies, and we show that k-resiliency cannot be guaranteed for any k-connected graph with static routing tables.
PROOF. Consider the graph G used in the proof of Theorem 14. After having applied all the edge transformations, G becomes 2-connected and we proved that 2-resiliency cannot be achieved. This implies the statement of the theorem.
The previous theorem and the promising results shown in Section 3 lead to the following natural and elegant conjecture that relates the size of the min-cut k of a graph to the possibility of constructing routing tables that are robust to k − 1 edge failures.
CONJECTURE 16. For any k-connected graph, it is possible to compute a set of (k − 1)-resilient routing tables.
Unfortunately, despite significant effort, we have not yet resolved this conjecture, so we leave this paper on an unsatisfying, yet tantalizing, note.
5.2 Future Work
The most immediate focus of our future work will be on resolving the conjecture, which remains outstanding despite our best efforts. However, in addition to banging our collective heads against this so-far immovable object, we also plan to investigate a few other avenues. First, we want to consider node failures in addition to link failures. This appears to complicate the question significantly, and positive results seem few and far between for this model. Second, and more open-ended, is considering the impact of randomized routing. Many intractable problems in computer science yield to the power of randomization. The question is whether we can use small amounts of randomized routing (where a routing table isn’t deterministic, but can provide probabilities to route a packet over two or more ports) so that (a) the average delivery time of the packet does not grow fast with the size of the network, while (b) one can achieve resilience to a greater number of failures. We wrote this workshop paper in full recognition of its incompleteness. We have only investigated one particular model of failover, and have not even resolved all of the questions within that narrow context. However, we hope this partial attempt helps to bring together many of the disparate results in the field, all taking slightly different routing models, and inspires the theory community to think about how one might characterize the fundamental resiliency achievable by such models in a more systematic way.
5. RELATED AND FUTURE WORK
5.1 Related Work
There is a huge body of literature on related topics, and here we give only a high-level overview. We consider only approaches that do not involve control plane updates (as in [15, 19]) or congestion (as in [6, 20, 33]). We make several distinctions among the studies satisfying these requirements; the first is whether the routing algorithm can rewrite packet headers (inserting/modifying additional state). This category includes [8, 11, 13, 16, 18, 24, 25, 27, 28, 30, 34, 35] and the general thrust of these results (with some variation) is that adding one or a few additional bits (or tunnels) can achieve 1- or 2-resiliency, whereas one can achieve k − 1 resiliency with k bits. When one allows an unlimited list of failed node/links in the packet header, [18] and [31] deliver packets as long as the network remains connected. The next category involves solutions that do not modify the packet header, and here we can further distinguish between solutions that modify the forwarding tables based on packet arrivals, and those that have static tables. The dynamic approaches can deliver packets whenever the network remains connected [21, 22]. Among the static approaches, some depend only on the destination address, and some also depend on the incoming port. The former are guaranteed to deliver packets under any arbitrary non-disconnecting set of failures only if the routing tables are not deterministic, otherwise, for deterministic static routing tables, not only the problem of protecting against one single failure may not admit a solution, but it is even hard to compute routing
6. REFERENCES
[1] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In SIGCOMM, 2008.
[2] A. Atlas and A. Zinin. Basic Specification for IP Fast Reroute: Loop-Free Alternates. IETF, RFC 5286.
[3] A. Atlas and A. Zinin. U-turn Alternates for IP/LDP Fast-Reroute. IETF Internet draft version 03, February 2006.
[4] D. S. B. Alspach, J.-C. Bermond. Decomposition into cycles I: Hamilton decompositions. In G. H. et al. (editors) Cycles and rays, pages 351–362, 1990.
[5] A. Bhalgat, R. Hariharan, T. Kavitha, and D. Panigrahi. Fast Edge Splitting and Edmonds’ Arborescence Construction for Unweighted Graphs. In Proc. SODA, pages 455–464, 2008.
[6] M. Borokhovich and S. Schmid. How (Not) to Shoot in Your Foot with SDN Local Fast Failover - A Load-Connectivity Tradeoff. In OPODIS, pages 68–82, 2013.
[7] R. Diestel. Graph Theory. Springer, Berlin, second ed., electronic edition, February 2000.
[8] T. Elhourani, A. Gopalan, and S. Ramasubramanian. IP Fast Rerouting for Multi-Link Failures. In Proc. IEEE INFOCOM, pages 2148–2156, 2014.
[9] G. Enyedi, G. Rétvári, and T. Cinkler. A Novel Loop-free IP Fast Reroute Algorithm. In Proc. EUNICE, pages 111–119. Springer-Verlag, 2007.
[10] J. Feigenbaum, P. B. Godfrey, A. Panda, M. Schapira, S. Shenker, and A. Singla. On the resilience of routing tables. In Brief announcement PODC, July 2012.
[11] A. Gopalan and S. Ramasubramanian. Multipath Routing and Dual Link Failure Recovery in IP Networks Using Three Link-Independent Trees. In Proc. ANTS, pages 1–6, 2011.
[12] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu. BCube: A High Performance, Server-centric Network Architecture for Modular Data Centers. In Proc. of SIGCOMM, 2009.
[13] S. Kini, S. Ramasubramanian, A. Kvalbein, and A. F. Hansen. Fast Recovery From Dual-Link or Single-Node Failures in IP Networks Using Tunneling. IEEE/ACM Trans. Netw., 18(6):1988–1999, 2010.
[14] P. S. Kumar and C. E. V. Madhavan. Minimal vertex separators of chordal graphs. Discrete Applied Mathematics, 89(1-3):155–168, 1998.
[15] N. Kushman, S. Kandula, D. Katabi, and B. M. Maggs. R-BGP: Staying Connected in a Connected World. In Proc. of NSDI, 2007.
[16] A. Kvalbein, A. F. Hansen, T. Čičić, S. Gjessing, and O. Lysne. Multiple Routing Configurations for Fast IP Network Recovery. IEEE/ACM Trans. Networking, 17(2):473–486, April 2009.
[17] K.-W. Kwong, L. Gao, R. Guérin, and Z.-L. Zhang. On the Feasibility and Efficacy of Protection Routing in IP Networks. IEEE/ACM Trans. Networking, 19(5):1543–1556, October 2011.
[18] K. Lakshminarayanan, M. Caesar, M. Rangan, T. Anderson, S. Shenker, and I. Stoica. Achieving convergence-free routing using failure-carrying packets. In SIGCOMM, 2007.
[19] A. Li, X. Yang, and W. David. SafeGuard: Safe Forwarding During Route Changes. In Proc. of CoNEXT, pages 301–312. ACM, 2009.
[20] H. H. Liu, S. Kandula, R. Mahajan, M. Zhang, and D. Gelernter. Traffic Engineering with Forward Fault Correction. In Proc. of SIGCOMM, 2014.
[21] J. Liu, A. Panda, A. Singla, B. Godfrey, M. Schapira, and S. Shenker. Ensuring Connectivity via Data Plane Mechanisms. In Proc. of NSDI, pages 113–126, 2013.
[22] J. Liu, B. Yan, S. Shenker, and M. Schapira. Data-driven Network Connectivity. In Proc. of HotNets, pages 8:1–8:6, New York, NY, USA, 2011. ACM.
[23] V. Liu, D. Halperin, A. Krishnamurthy, and T. E. Anderson. F10: A Fault-Tolerant Engineered Network. In Proc. of NSDI, pages 399–412, 2013.
[24] S. Lor, R. Landa, and M. Rio. Packet re-cycling: eliminating packet losses due to network failures. In HotNets IX, page 2. ACM, 2010.
[25] M. Motiwala, M. Elmore, N. Feamster, and S. Vempala. Path Splicing. In Proc. of SIGCOMM, pages 27–38, 2008.
[26] S. Nelakuditi, S. Lee, Y. Yu, Z.-L. Zhang, and C.-N. Chuah. Fast Local Rerouting for Handling Transient Link Failures. IEEE/ACM Trans. Networking, 15(2):359–372, April 2007.
[27] G. T. Nguyen, R. Agarwal, J. Liu, M. Caesar, P. B. Godfrey, and S. Shenker. Slick Packets. SIGMETRICS Perform. Eval. Rev., 39(1):205–216, June 2011.
[28] S. Sae Lor, R. Landa, R. Ali, and M. Rio. Handling Transient Link Failures Using Alternate Next Hop Counters. In Proc. IFIP, NETWORKING, pages 186–197, Berlin, Heidelberg, 2010. Springer-Verlag.
[29] G. Schollmeier, J. Charzinski, A. Kirstadter, C. Reichert, K. J. Schrodi, Y. Glickman, and C. Winkler. Improving the Resilience in IP Networks. In Proc. HPSR, 2003.
[30] M. Shand, S. Bryant, and S. Previdi. IP Fast Reroute using Not-Via Addresses. IETF Internet draft version 05, March 2010.
[31] B. Stephens, A. L. Cox, and S. Rixner. Plinko: Building Provably Resilient Forwarding Tables. In Proc. of HotNets, pages 26:1–26:7. ACM, 2013.
[32] J. Wang and S. Nelakuditi. IP Fast Reroute with Failure Inferencing. In Proc. of SIGCOMM Workshop on Internet Network Management, INM, pages 268–273, New York, NY, USA, 2007. ACM.
[33] Y. Wang, H. Wang, A. Mahimkar, R. Alimi, Y. Zhang, L. Qiu, and Y. R. Yang. R3: Resilient Routing Reconfiguration. In Proc. of SIGCOMM, pages 291–302, 2010.
[34] K. Xi and H. J. Chao. IP Fast Reroute for Double-link Failure Recovery. In Proc. IEEE GLOBECOM, pages 1035–1042, Piscataway, NJ, USA, 2009. IEEE Press.
[35] M. Xu, Q. Li, L. Pan, and Q. Li. MPCT: Minimum Protection Cost Tree for IP Fast Reroute Using Tunnel. In Proc. IWQoS, pages 16:1–16:3, Piscataway, NJ, USA, 2011. IEEE Press.
[36] B. Yang, J. Liu, S. Shenker, J. Li, and K. Zheng. Keep Forwarding: Towards K-link Failure Resilient Routing. In Proc. IEEE INFOCOM, pages 1617–1625, 2014.
[37] B. Zhang, J. Wu, and J. Bi. RPFP: IP fast reroute with providing complete protection and without using tunnels. In IWQoS, pages 137–146, 2013.
[38] Z. Zhong, S. Nelakuditi, Y. Yu, S. Lee, J. Wang, and C.-N. Chuah. Failure Inferencing based Fast Rerouting for Handling Transient Link and Node Failures. In Proc. IEEE INFOCOM, 2005.