Local Detection and Recovery from Multi-Failure Patterns in MPLS-TE Networks

Marco Tacca, Kai Wu, Andrea Fumagalli
Optical Networking Advanced Research (OpNeAR) Laboratory, Erik Jonsson School of Engineering and Computer Science, The University of Texas at Dallas
Email: {mtacca, kxw016500, andreaf}@utdallas.edu

Jean-Philippe Vasseur
Cisco Systems, 300 Beaver Brook Road, Boxborough, MA 01719, USA
Email: [email protected]
Abstract— MPLS Fast Reroute advocates local protection mechanisms to rapidly reroute traffic onto pre-computed and signaled bypass tunnels. When the network is subject to multiple element failures, it becomes challenging to handle all the possible failure scenarios, for they are more disruptive and may require many more bypass tunnels to be dealt with. The objective of this paper is to adapt the MPLS local recovery schemes to deal with multi-failure scenarios, while retaining as much as possible the simplicity and the fast failure detection feature of current (single failure) local recovery mechanisms. This objective is achieved by optimally grouping failure patterns into clusters and minimizing the overall additional network resources — i.e., bypass tunnels and bidirectional forwarding detection sessions — while yielding full recovery from such failure patterns.
This work was supported in part by NSF grants # ANI-0082085 and # CNS-0435393, and by the Cisco University Research Program with the University of Texas at Dallas.

I. INTRODUCTION

As more real-time and mission-critical network applications rely on IP/MPLS networks, prompt recovery of data exchange at the IP/MPLS layer from network element failures is becoming increasingly important. Fast ReRoute (FRR) [1] is a scalable recovery scheme for MPLS networks with Traffic Engineering (TE). FRR is based on local recovery, whereby traffic rerouting is performed by the local node, i.e., the node immediately upstream of the failing network element. For each immediately downstream network element that may fail, one or more pre-computed bypass tunnels are available to the local node. Upon detection of a downstream element failure, the affected TE Label Switched Paths (LSP's) are swiftly rerouted by the local node via the available bypass tunnels. This technique guarantees prompt recovery by making use of local signaling for fault detection and pre-computed bypass tunnels. The time required to complete the traffic recovery depends solely on the time required to detect the (local) failure and to perform the (local) rerouting onto the respective bypass tunnel(s). Thus, the network size does not adversely affect recovery speed.

Besides single network element failures, today's networks are increasingly facing a variety of additional failure patterns [2], [3]. These patterns include multiple network element failures. Multiple outages may be either correlated by a common physical failure and occurring concurrently, or not correlated by a common physical failure but overlapping in time [2].

Extending the FRR solution to cope with these additional failure patterns is, however, a challenge. First, multi-failure patterns are expected to cause greater data traffic disruption in the network than single-failure ones. Second, pre-computation of the bypass tunnels must take into account many more failure patterns (both single and multiple). It may even be necessary to use multiple bypass tunnels at the local node to protect the same LSP('s), each tunnel being chosen to cope with a specific set of failure patterns. Third, while nodes can quickly detect the status of their adjacent links [1], the detection of other (non-adjacent) network element failures may take significant time¹. Accurate and quick detection of the latter failures makes it possible, first, to identify the occurring failure pattern(s) and, second, to choose the appropriate bypass tunnel(s) for rerouting the disrupted LSP's. An incorrect choice of the bypass tunnel(s) during this critical phase may lead to temporary loss of traffic.

The objective of this paper is to adapt the MPLS local detection and recovery schemes to deal with multi-failure scenarios, while retaining the advantages of current (single failure) local recovery mechanisms. For the detection of non-adjacent network element failures, Bidirectional Forwarding Detection (BFD) sessions are used at the local node [6]. To contain the number of bypass tunnels and BFD sessions at the local node, a probabilistic model is adopted, in which it is assumed that past statistics on network failures are known to the MPLS control plane.
This probabilistic assumption is supported by the fact that, in IP/MPLS networks, some network elements are more prone to failure than others, and some network element failures are more strongly coupled than others [2], [3]. Using these available statistics, a number of Probable Failure Patterns (PFP's) [7] are identified in the network. The PFP may be viewed as a generalization of the well-known Shared Risk Link Group (SRLG) [8]. The PFP has a more general interpretation than the SRLG in that two or more network elements may belong to the same PFP when their failure times overlap, even though the failures may not be caused by a common physical failure.

PFP's may be logically grouped to form PFP clusters. A single bypass tunnel is then assigned to each PFP cluster. The problem of choosing the PFP clusters so as to minimize the aggregate number of bypass tunnels and BFD sessions, while guaranteeing 100% recovery, is formulated using integer linear programming (ILP). Additionally, two approximate two-step approaches are described to help reduce the computational time required to solve the problem.

The advantage of the proposed solution based on PFP clusters is twofold. First, by controlling the way PFP clusters are chosen, it is possible to control the number of required bypass tunnels, and thus the complexity of the recovery scheme at the local node. Second, since the failure detection phase does not need to identify exactly which PFP in the PFP cluster has failed, the requirements on the detection scheme are relaxed. Fewer BFD sessions may be required than when detecting exactly which PFP is taking place, which makes the overall failure recovery procedure more effective and faster. As is already the case for single failure FRR schemes, the proposed approach scales with the network size thanks to the local nature of the recovery scheme, and with the number of failure scenarios that can be handled thanks to the PFP cluster concept.

¹ Routing protocols such as OSPF [4] and IS-IS [5] may take tens of seconds, making these solutions unsuitable for fast failure protection.

II. NETWORK DESCRIPTION

This section provides more details on the PFP, the PFP cluster, and the failure detection and recovery scheme. Table I reports a summary of the objects and functions used in the description.

TABLE I: OBJECTS AND FUNCTIONS

Objects
  G        graph
  e        link
  n        node
  p        PFP
  c        PFP cluster
  b        BP path
  k        BFD path

Functions
  E(·)     set of links in the object
  N(·)     set of nodes in the object
  E_A(n)   set of links adjacent to n
  N_A(e)   set of nodes adjacent to e
  P(e)     set of PFP's that contain e

[Figure 1(a): PFP links; nodes A, B, C, D, E, F]

Consider an MPLS network with arbitrary topology. Let the network be represented by an undirected graph G. N(G) is the set of network nodes. E(G) is the set of undirected network links. Link (n, n') ∈ E(G) is said to be adjacent to nodes n, n' ∈ N(G). E_A(n) is the set of links adjacent to node n, and N_A((n, n')) = {n, n'}.
Consider a failure pattern that may occur with non-negligible probability. The pattern may contain both link and node failures. Any link failure is assumed to disrupt traffic in both directions of the link. A failed node is considered equivalent to the failure of all its adjacent links, i.e., a failure of node n is equivalent to a pattern that contains all links in E_A(n), since both disconnect n from the rest of the graph. The failure pattern is represented by PFP p. Note that the set of links in p, i.e., E(p), contains all the failed links and all links adjacent to any failed node in p. It is assumed that all the PFP's created for the same graph are mutually exclusive events, i.e., when a PFP fails, all network elements not in the PFP are operational.

[Figure 1(b): PFP node; nodes A, B, C, D, E, F]
Fig. 1. Protection procedures: (a) PFP links, (b) PFP node.

The failure protection scheme is based on local recovery, whereby traffic rerouting is performed by the Point of Local Recovery (PLR), i.e., the node immediately upstream of the failed link. Since link failures are bidirectional, the failure of link (i, j) involves both PLR's i and j in the recovery procedure. More generally, the failure of multiple links that belong to the same PFP p may require up to 2|E(p)| PLR's to act at once, where |E(p)| is the number of links in p. Each PLR works independently of the others. Upon detection of a specific PFP, the affected outgoing traffic is swiftly rerouted locally by the PLR(s) onto bypass tunnels (BP's). Depending on the nature of the PFP, the PLR undertakes one of two distinct protection procedures.

If the PFP does not disconnect any node from the graph, the following steps are taken after the failed PFP is detected. For each failed link (i, j) ∈ E(p), PLR i (j) reroutes its outgoing traffic on (i, j) using one BP that terminates at j (i). Note that distinct PFP's may require the use of different BP's to deal with the same failed link. Figure 1(a) illustrates these steps. The links in PFP p are E(p) = {(A, B), (D, B), (B, C)}. Node A is the PLR responsible for rerouting traffic around link (A, B) and can choose between 2 BP's, b_1^(A): A−D−B and b_2^(A): A−E−B. Node B is the PLR responsible for rerouting traffic around links (B, C) and (B, A). Node B has a total of 3 BP's, i.e., b_1^(B): B−F−C, b_2^(B): B−D−A, and b_3^(B): B−E−A. After nodes A and B have both (independently) detected the failed PFP, they reroute their traffic on their respective BP's. Node A makes use of b_2^(A): A−E−B. Node B makes use of b_1^(B): B−F−C and b_3^(B): B−E−A. Note that an incorrect detection of the failed PFP might lead node A to reroute on b_1^(A): A−D−B, which is non-operational because link (D, B) belongs to the failed PFP.

If the PFP disconnects node n ∈ N(G), all traffic terminating at n cannot be recovered. However, the in-transit traffic at node n can and must be protected. Let N_n = N_A(E_A(n)) \ {n} be the set of PLR's that handle the rerouting of the in-transit traffic at n. Clearly, no n' ∈ N_n can reroute its outgoing traffic on (n', n) using a BP that terminates at n. Thus, additional BP's are necessary at every n' ∈ N_n to bypass node n. Specifically, |E_A(n)| − 1 additional BP's are necessary for every PLR n' ∈ N_n. Each BP starts at PLR n' and terminates at one of the nodes in set N_n \ {n'}. Upon detection of a failed PFP that contains node n, all the BP's defined above are used by their respective PLR's for traffic rerouting. Figure 1(b) illustrates the above procedure. The failed PFP disconnects node B from the graph. Nodes A, C, and D are PLR's. For simplicity we focus on PLR A. As soon as PLR A detects the failed PFP, it reroutes its outgoing traffic using |E_A(n)| − 1 = 2 BP's, i.e., b_1^(A): A−E−C and b_2^(A): A−F−D.

It is assumed that a PLR can detect the failure of its adjacent links as defined in the single link failure FRR. Detection of non-adjacent link failures is instead based on establishing BFD sessions between the PLR and some other nodes n ∈ N(G). BFD sessions may or may not follow the same paths as the BP's that originate at the PLR. A BFD session informs the PLR whether or not all the links on its path are operational. However, it does not provide specific information about which link has failed. A PLR may use multiple BFD sessions with different paths at once to logically infer the PFP cluster that is guaranteed to contain the failed PFP. As already mentioned in Section I, PFP's may be partitioned to form disjoint PFP clusters.
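The node-failure rules above (E(p) includes every link adjacent to a failed node, N_n = N_A(E_A(n)) \ {n}, and |E_A(n)| − 1 additional BP's per PLR) can be sketched in a few lines of Python. This is only an illustration: the helper names and the small topology are assumptions chosen to be consistent with the BP's quoted for Figure 1(b), not the actual topology of the figure.

```python
from typing import FrozenSet, Iterable, List, Set

Link = FrozenSet[str]  # an undirected link {n, n'}


def link(a: str, b: str) -> Link:
    """Build an undirected link between nodes a and b."""
    return frozenset((a, b))


def adjacent_links(links: Iterable[Link], n: str) -> Set[Link]:
    """E_A(n): the links adjacent to node n."""
    return {e for e in links if n in e}


def pfp_links(links: List[Link], failed_links: Iterable[Link],
              failed_nodes: Iterable[str]) -> Set[Link]:
    """E(p): failed links plus all links adjacent to any failed node."""
    result = set(failed_links)
    for n in failed_nodes:
        result |= adjacent_links(links, n)
    return result


def plrs_for_node_failure(links: List[Link], n: str) -> Set[str]:
    """N_n = N_A(E_A(n)) \\ {n}: the PLR's that must bypass failed node n."""
    neighbours: Set[str] = set()
    for e in adjacent_links(links, n):
        neighbours |= e
    return neighbours - {n}


# Hypothetical topology (an assumption, not taken from the figure itself).
TOPOLOGY = [
    link("A", "B"), link("B", "C"), link("B", "D"),
    link("A", "E"), link("E", "C"), link("A", "F"), link("F", "D"),
]

E_p = pfp_links(TOPOLOGY, failed_links=[], failed_nodes=["B"])
N_B = plrs_for_node_failure(TOPOLOGY, "B")
extra_bps_per_plr = len(adjacent_links(TOPOLOGY, "B")) - 1
print(sorted(N_B), extra_bps_per_plr)
```

In this assumed topology the failure of node B yields three PLR's, each needing two additional BP's, which matches the count |E_A(n)| − 1 = 2 stated for PLR A.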
The element set of a PFP cluster c contains all the network elements that belong to its PFP's, i.e., E(c) = ∪_{p∈c} E(p). Each c is associated with a BP b that must be operational under the occurrence of any PFP p ∈ c. In other terms, E(b) ∩ E(c) = ∅. Let P(e) be the set of all PFP's that contain link e, and Z_e be a partition of P(e). Z_e is a collection of disjoint PFP clusters. Note that link e belongs to all |Z_e| PFP clusters, i.e., e ∈ E(c), ∀c ∈ Z_e. Therefore, it requires up to |Z_e| bypass tunnels. The challenge of this solution is to find the optimal partitioning of set P(e) such that, under any failure scenario, at least one bypass tunnel is operational. Note that with high values of |Z_e|, failure detection is more accurate but more complex, and the number of required BP's may increase proportionally. With small values of |Z_e|, the number of PFP's in the same PFP cluster becomes large, making it difficult to find their respective BP's.

III. PROBLEM FORMULATION

For each PLR, the computation of the three sets (the PFP clusters, the BFD sessions, and the BP's) requires a combined solution, since each set may depend on the other two. The three sets are computed to minimize the total number of BFD sessions and BP's while guaranteeing full detection and recovery capability at each PLR. This objective is chosen to contain the complexity of the PLR, which must handle both BFD sessions and BP's.

The object1-object2 incidence binary matrix is used to record the relationship between objects. When object1 is a set of object2 elements, the (i, j) entry of the matrix is set to 1 if j is in i. When both object1 and object2 are sets of the same kind of object, the (i, j) entry is set to 1 if i and j are not disjoint.

Consider PLR n ∈ N(G)².
• E = E(G), set of links in G,
• E_A = E_A(n), set of adjacent links of n,
• P = ∪_{e∈E_A} P(e), set of PFP's that contain one of the adjacent links of n,
• C_e, set of candidate PFP clusters of PFP's in P(e), i.e., the power set of P(e),
• C = ∪_{e∈E_A} C_e, set of candidate PFP clusters of PFP's in P,
• B_e, set of BP paths to protect adjacent link e ∈ E_A,
• B = ∪_{e∈E_A} B_e, set of BP paths that originate at n,
• K, set of BFD paths,
• X ⊆ B, set of BP paths chosen to establish BP's,
• X_e = X ∩ B_e, set of the chosen BP's to protect e ∈ E_A,
• Y ⊆ K, set of BFD paths chosen to establish BFD sessions,
• Z = ∪_{e∈E_A} Z_e, set of PFP clusters chosen to form partitions of P(e), ∀e ∈ E_A,
• F = {f_p} = {f_{p,e}}_{|P|×|E|}, the PFP-link incidence matrix,
• G = {g_b} = {g_{b,e}}_{|B|×|E|}, the BP path-link incidence matrix,
• H = {h_k} = {h_{k,e}}_{|K|×|E|}, the BFD path-link incidence matrix,
• U = {u_b} = {u_{b,p}}_{|B|×|P|}, the BP path-PFP incidence matrix,
• V = {v_k} = {v_{k,p}}_{|K|×|P|}, the BFD path-PFP incidence matrix,
• R = {r_c} = {r_{c,b}}_{|C|×|B|}, the protectability matrix; r_{c,b} takes on 1 if BP path b can protect all PFP's in candidate PFP cluster c, 0 otherwise,
• S = {s_c} = {s_{c,k}}_{|C|×|K|}, the separability matrix; s_{c,k} takes on 1 if BFD path k can separate candidate PFP cluster c, 0 otherwise,
• X = {x_b}_{|B|×1}, binary vector representation of X, x_b = 1 if b ∈ X,
• Y = {y_k}_{|K|×1}, binary vector representation of Y, y_k = 1 if k ∈ Y,
• A = {a_c} = {a_{c,b}}_{|Z|×|X|}, the BP assignment matrix, a_{c,b} = r_{c,b}, ∀c ∈ Z, ∀b ∈ X,
• D = {d_p} = {d_{p,k}}_{|P|×|Y|}, the detection matrix, d_{p,k} = v_{k,p}, ∀p ∈ P, ∀k ∈ Y; d_p is called the detection vector for p.

² Where possible, the formalism introduced in Section II is simplified to improve readability, e.g., E = E(G).

The following definitions are also introduced:
• a PFP cluster c ∈ C_e is protectable by BP path b ∈ B_e iff b is disjoint from all the PFP's in c, i.e., ∀p ∈ c, g_b · f_p^T = 0; b is then said to protect c,
• an unordered pair of PFP's (p, q) is detectable by a BFD path k iff one of the PFP's (p) is disjoint from k and the other (q) is not, i.e., h_k · f_p^T = 0 ∧ h_k · f_q^T > 0,
• a PFP cluster c is separable by a BFD path k iff at least one pair of PFP's in c is detectable by k; k is then said to separate c.

The protectable relationship among BP paths and candidate PFP clusters is recorded in the protectability matrix R, while the separable relationship among BFD paths and candidate PFP clusters is recorded in the separability matrix S. Figure 2 illustrates the relationship among the defined parameters and variables. The solid arrows indicate the direction of derivation.

Fig. 2. Problem Formulation

Define the multiplication ⊙ between two binary matrices as the conventional matrix multiplication in which multiplications are replaced with logical AND and additions with logical OR, so that an entry of the product is 1 iff the two corresponding objects share at least one link. U and V can be obtained from F, G and H as in (1):

\[
U = G \odot F^{T}, \qquad V = H \odot F^{T} \tag{1}
\]

The protectability matrix R is obtained in (2) using boolean addition, where the overline denotes the boolean complement, so that r_{c,b} = 1 exactly when b is disjoint from every PFP in c:

\[
r_{c,b} =
\begin{cases}
\overline{\sum_{p \in c} u_{b,p}} & c \in C_e \wedge b \in B_e \\
0 & \text{otherwise}
\end{cases}
, \quad \forall e \in E_A \tag{2}
\]

The separability matrix S is obtained in (3) using boolean addition and boolean multiplication, so that s_{c,k} = 1 exactly when k shares a link with at least one PFP in c and is disjoint from at least one other:

\[
s_{c,k} = \sum_{p \in c} v_{k,p} \cdot \sum_{p \in c} \overline{v_{k,p}} \tag{3}
\]

Using the above notation and definitions, we formulate an integer linear programming (ILP) model in (4):

\[
\begin{aligned}
\text{OPT1} = {} & \min\ |X| + |Y| \\
\text{s.t.}\quad \text{(a)}\ & R \cdot X + S \cdot Y \ge \mathbf{1}_{|C| \times 1}
\end{aligned}
\tag{4}
\]
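As a rough illustration of how U, V, the protectability and separability entries, and constraint (4a) fit together, the following Python sketch evaluates them on a toy instance. It assumes a single adjacent link (so that C_e = C and B_e = B), treats the boolean product as AND/OR, and uses made-up matrices F, G, and H; all function names are hypothetical.

```python
from itertools import combinations


def bool_product(a, b):
    """Boolean product a (.) b^T: entry (i, j) is 1 iff rows a[i] and b[j]
    both have a 1 in some common column (i.e., share a link)."""
    return [[int(any(x and y for x, y in zip(row_a, row_b))) for row_b in b]
            for row_a in a]


def candidate_clusters(pfps):
    """All non-empty subsets of the given PFP indices (candidate clusters)."""
    return [frozenset(s) for r in range(1, len(pfps) + 1)
            for s in combinations(pfps, r)]


def protectable(u, b, cluster):
    """r_{c,b}: 1 iff BP path b shares no link with any PFP in the cluster."""
    return int(not any(u[b][p] for p in cluster))


def separable(v, k, cluster):
    """s_{c,k}: 1 iff BFD path k hits some PFP in the cluster and misses another."""
    hits = [v[k][p] for p in cluster]
    return int(any(hits) and not all(hits))


def satisfies_4a(u, v, clusters, chosen_bps, chosen_bfds):
    """Constraint (4a): every candidate cluster is protected or separated."""
    return all(
        any(protectable(u, b, c) for b in chosen_bps)
        or any(separable(v, k, c) for k in chosen_bfds)
        for c in clusters
    )


# Toy incidence matrices (assumed data). Columns are links e0, e1, e2.
F = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]  # PFP's p0, p1, p2 all contain e0
G = [[0, 1, 0], [0, 0, 1]]             # BP paths b0 (uses e1), b1 (uses e2)
H = [[0, 1, 0]]                        # BFD path k0 (uses e1)

U = bool_product(G, F)  # BP path-PFP incidence
V = bool_product(H, F)  # BFD path-PFP incidence
CLUSTERS = candidate_clusters([0, 1, 2])

print(satisfies_4a(U, V, CLUSTERS, chosen_bps=[0, 1], chosen_bfds=[0]))
print(satisfies_4a(U, V, CLUSTERS, chosen_bps=[0, 1], chosen_bfds=[]))
```

In this toy instance the cluster {p1, p2} is protected by neither BP, so the constraint holds only when BFD path k0 is chosen to separate it.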
Objective function (4) minimizes the total number of chosen BFD sessions and BP's. Constraint (4a) states that any candidate PFP cluster, i.e., each subset of the PFP's sharing one adjacent link of n, or ∀c ∈ C_e, ∀e ∈ E_A, is either separable by a chosen BFD path or protectable by a chosen BP path³. In fact, for any adjacent link e ∈ E_A, if a subset of P(e) is not protected by any chosen BP b ∈ B_e, it must be separated into two clusters by a chosen BFD path k ∈ K.

After the BFD paths are chosen to establish BFD sessions, the detection matrix D is determined from Y and V according to its definition. The PFP's are grouped into clusters by comparing the detection vector d_p in D for each PFP p. PFP's in P(e) sharing the same value of the detection vector are grouped into the same PFP cluster in Z_e. Z_e provides a partition of P(e). The values of the detection vectors represent the possible combinations of detection results of the BFD sessions in Y when a PFP fails. A value of 1 in the vector indicates that the corresponding BFD session k reports failure. A value of 0 indicates that it reports success. By comparing the detection results of all BFD sessions in Y with the detection vectors, the PLR can uniquely identify the failed PFP cluster for each failed adjacent link.

After Z = ∪_{e∈E_A} Z_e is determined, the BP assignment matrix A can be obtained from Z and R according to its definition. a_{c,b} = 1 indicates that BP b can be selected as the bypass tunnel to reroute affected LSP's around the failed adjacent link when PFP cluster c ∈ Z is the detected PFP cluster that contains the failed PFP. When there are two or more ones in a_c, only one of the eligible BP's is selected to protect c.

IV. TWO-STEP APPROACHES

It can be demonstrated that OPT1 is a set cover problem, which is NP-hard.
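The detection-vector grouping and BP assignment described at the end of Section III can be sketched as follows. The sketch assumes a single adjacent link, toy matrices U (BP path-PFP) and V (BFD path-PFP), and picks the first eligible BP for each cluster; the function names and data are illustrative assumptions, not part of the original formulation.

```python
from collections import defaultdict


def clusters_by_detection_vector(v, chosen_bfds, pfps):
    """Group PFP's whose detection vectors d_p (one entry per chosen BFD
    session, 1 = session reports failure) are identical."""
    groups = defaultdict(list)
    for p in pfps:
        d_p = tuple(v[k][p] for k in chosen_bfds)
        groups[d_p].append(p)
    return dict(groups)


def assign_bps(u, chosen_bps, clusters):
    """For each cluster, pick the first chosen BP that is disjoint from every
    PFP in the cluster; None means no chosen BP protects the whole cluster."""
    assignment = {}
    for d, cluster in clusters.items():
        eligible = [b for b in chosen_bps
                    if not any(u[b][p] for p in cluster)]
        assignment[d] = eligible[0] if eligible else None
    return assignment


# Toy data (assumed): three PFP's, two chosen BP's, one chosen BFD session.
U = [[0, 1, 0],   # BP b0 shares a link with p1 only
     [0, 0, 1]]   # BP b1 shares a link with p2 only
V = [[0, 1, 0]]   # BFD session k0 fails only when p1 occurs

Z_e = clusters_by_detection_vector(V, chosen_bfds=[0], pfps=[0, 1, 2])
A = assign_bps(U, chosen_bps=[0, 1], clusters=Z_e)
print(Z_e)
print(A)
```

Here the PLR only needs to know whether session k0 reports failure: if it does, the cluster {p1} is rerouted on b1; otherwise the cluster {p0, p2} is rerouted on b0.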
The set cover problem cannot be approximated efficiently to within a factor smaller than (1 − o(1)) ln η of the optimal solution [9], where η is the size of the set to cover, i.e., the number of constraints in (4a), |C|, which is O(2^{|P(e)|} · |E_A|). The size of the problem can be reduced significantly by decomposing the matrix in (4a). First, any PFP cluster that contains a single PFP cannot be separated by any BFD path and must therefore be protected. We can find the required BP's by solving the sub-problem of finding the minimum number of BP's satisfying only the protectability requirement of these inseparable PFP clusters:

\[
\begin{aligned}
\text{OPT2} = {} & \min\ |X| \\
\text{s.t.}\quad \text{(a)}\ & r_c \cdot X \ge \mathbf{1}_{|P| \times 1}, \quad \forall c \in \{\{p\} \mid p \in P\}
\end{aligned}
\tag{5}
\]

The solution of OPT2 determines X. The set of BFD sessions Y still needs to be chosen. Two options are provided.

³ There are actually cases in which a candidate PFP cluster is neither protectable nor separable, e.g., a candidate PFP cluster containing only one PFP that represents the failure of a neighboring node. In these cases, the corresponding inequality constraint is ignored, as no recovery is possible for the adjacent link for PFP's in such PFP clusters.
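For very small instances, the sub-problem in (5) can be explored by exhaustive search instead of an ILP solver. The following is a minimal sketch under that assumption, restricted to a single adjacent link; `min_bp_cover` and the toy matrix are hypothetical, and the search is exponential in the number of candidate BP paths.

```python
from itertools import combinations


def min_bp_cover(u, bps, pfps):
    """Return a smallest list of BP paths such that every single PFP is
    disjoint from (u[b][p] == 0) at least one chosen path, trying subsets
    in order of increasing size. Returns None if no such set exists."""
    for size in range(1, len(bps) + 1):
        for chosen in combinations(bps, size):
            if all(any(u[b][p] == 0 for b in chosen) for p in pfps):
                return list(chosen)
    return None


# Toy BP path-PFP incidence matrix (assumed data): rows b0..b2, columns p0..p2.
U = [[0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]

X = min_bp_cover(U, bps=[0, 1, 2], pfps=[0, 1, 2])
print(X)
```

No single path is disjoint from all three PFP's in this example, so two BP's are needed; a PFP that hits every candidate path (as in footnote 3) makes the function return None.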
TABLE II: DISTRIBUTION OF PFP'S
A. BFDusesBP Two-Step Approach

The first approach (BFDusesBP) simply reuses the BP paths in X as BFD paths, i.e., the chosen BFD sessions use the same paths as the chosen BP's. However, not all the BP paths need to be reused as BFD paths. In fact, for each adjacent link e ∈ E_A, |X_e| − 1 out of the |X_e| chosen BP paths are reused as chosen BFD paths, where X_e is the set of BP's chosen to protect e. Therefore, a total of Σ_{e∈E_A} (|X_e| − 1) = |X| − |E_A| BFD sessions are required for PLR n. The PFP clusters obtained by these BFD sessions are guaranteed to each have at least one protecting BP. In fact, a BFD session that reports a working path indicates that the BP sharing the same path would work if traffic were switched over to that BP. If all BFD sessions report failure, the BP whose path is not used by any BFD session is selected to reroute the affected LSP's.

The BFDusesBP approach, although simple, does not guarantee a minimal number of BFD sessions, for two reasons. First, the BP's are not shared among adjacent links. Hence, the BFD sessions that use the paths of the chosen BP's cannot be shared among adjacent links either. Second, as discussed earlier, the working status of the adjacent links is usually immediately available to the PLR. Such information partially overlaps with the information that the chosen BP paths would provide.

B. BFDafterBP Two-Step Approach

Learning from the above considerations, the second approach (BFDafterBP) chooses the BFD sessions independently of the paths of the chosen BP's. The result found for X is substituted into (4). Any candidate PFP cluster that is unprotectable requires a BFD session to separate it. The BFD sessions are chosen by solving the sub-problem of finding the minimum number of BFD sessions satisfying only the separability requirement of the unprotectable PFP clusters.
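The BFDafterBP sub-problem just described can be illustrated with a brute-force sketch: first list the candidate clusters that no chosen BP protects as a whole, then search for a smallest set of BFD paths that separates each of them. The sketch assumes a single adjacent link and toy matrices; the function names are hypothetical and the enumeration is exponential, so it only stands in for an ILP solver on tiny examples.

```python
from itertools import combinations


def unprotectable_clusters(u, chosen_bps, pfps):
    """Candidate clusters (two or more PFP's) for which no chosen BP is
    disjoint from every PFP in the cluster."""
    result = []
    for r in range(2, len(pfps) + 1):
        for c in combinations(pfps, r):
            if not any(all(u[b][p] == 0 for p in c) for b in chosen_bps):
                result.append(c)
    return result


def min_bfd_sessions(v, bfd_paths, clusters):
    """Smallest list of BFD paths such that each cluster is separated by at
    least one chosen path (the path hits some, but not all, of its PFP's).
    Returns None if the clusters cannot all be separated."""
    for size in range(0, len(bfd_paths) + 1):
        for chosen in combinations(bfd_paths, size):
            if all(any(0 < sum(v[k][p] for p in c) < len(c) for k in chosen)
                   for c in clusters):
                return list(chosen)
    return None


# Toy data (assumed): chosen BP's b0, b1 and candidate BFD paths k0, k1.
U = [[0, 1, 0],
     [0, 0, 1]]
V = [[1, 1, 1],   # k0 shares a link with every PFP, so it separates nothing
     [0, 1, 0]]   # k1 fails only under p1

T = unprotectable_clusters(U, chosen_bps=[0, 1], pfps=[0, 1, 2])
Y = min_bfd_sessions(V, bfd_paths=[0, 1], clusters=T)
print(T, Y)
```

Only clusters with at least two PFP's are enumerated, since the first step already guarantees that every single PFP is protected by some chosen BP.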
In addition, the number of candidate unprotectable PFP clusters in C_e can be reduced by ignoring those PFP's that can be protected by every BP in X_e. This procedure is based on two observations: if a candidate PFP cluster is separable, any superset of it is separable as well, and only those PFP's that cannot be protected by at least one BP in X_e are needed to form an unprotectable PFP cluster in C_e.

\[
\begin{aligned}
\text{OPT3} = {} & \min\ |Y| \\
\text{s.t.}\quad \text{(a)}\ & s_c \cdot Y \ge \mathbf{1}_{|T_e| \times 1}, \quad \forall e \in E_A,\ \forall c \in T_e
\end{aligned}
\tag{6}
\]

where T_e = {c | ∀p ∈ c, ∃b ∈ X_e, u_{b,p} = 1}.

The PFP clusters obtained by the BFD sessions found in the solution of OPT3 are guaranteed to each have at least one protecting BP in X, as all unprotectable PFP clusters are separated by satisfying (6a).

V. SIMULATION AND RESULTS

In this section, a case study is presented to evaluate the performance of the three solutions described in Sections III and IV. A 6×6 torus network is considered. PFP's are randomly generated in the network following the distribution of Table II. For each type of failure pattern, e.g., a single link failure, the table shows the percentage of all possible such patterns in the torus that are used as actual PFP's in the experiment.

TABLE II: DISTRIBUTION OF PFP'S

Fault type               Percentage
Single Link              100%
2 Adjacent Links         50%
2 1-hop-away Links       20%
2 2-hop-away Links       10%
Triple Adjacent Links    25%
Single Node              75%

The candidate BP paths are found by running the k-shortest paths algorithm [10]. Only paths of up to five hops are used, i.e., B = {shortest path b s.t. |E(b)| ≤ 5}. The candidate BFD paths are found using a breadth-first search algorithm. The maximum hop count for these paths is varied from three to seven during the experiment, i.e., K = {breadth-first path k s.t. |E(k)| ≤ #H}, where #H = 3, 5, 7.

During the experiment, one PLR is chosen at random. There is a total of 39 PFP's for the chosen PLR. All three solutions are reported in Table III, i.e., from left to right, the one-step solution (ILP OPT1) in (4), the two-step solution with BFDusesBP (ILP OPT2) in (5), and the two-step solution with BFDafterBP (ILP OPT3) in (6). The table reports: the hop limit of the BFD sessions #H, the number of BP's found |X|, the number of BFD sessions found |Y|, the number of PFP clusters chosen |Z|, and the number of constraints #C of each generated ILP formulation instance.

TABLE III: COMPARISON: THREE SCHEMES

        One Step            Two Steps: BP first (#C = 36)
        (#C = 10620)        BFDusesBP           BFDafterBP (#C = 11)
#H      |X|  |Y|  |Z|       |X|  |Y|  |Z|       |X|  |Y|  |Z|
3       6    2    15        6    2*   15        6    2    15
5       6    2    15        6    2    15        6    2    15
7       6    2    14        6    2    15        6    1    15

The results found by the two-step solutions are comparable with the results found by the one-step solution. However, with the two-step solutions, the number of constraints in the ILP formulation is significantly reduced. The BFDafterBP solution requires one BFD session fewer than the BFDusesBP solution when #H = 7. The asterisk in the table marks an entry for which the length of the BFD paths exceeds 3 hops. This is because the BFD sessions in this solution use the same paths as the BP's, which may be up to five hops long. Similar results are obtained for other topologies and for other distributions of PFP's.

VI. CONCLUSION

An MPLS-TE scheme for rapid local fault detection and recovery in networks that may be hit by multiple concurrent failures was presented.
In the scheme, the detection of a PFP (probable failure pattern) is achieved via BFD (Bidirectional Forwarding Detection) sessions originating at the PLR (Point of Local Recovery). Recovery is obtained via multiple alternate BP's (bypass tunnels), each designed to cope with a specific subset of the PFP's, i.e., a PFP cluster. The problem of finding the minimum number of BFD sessions and BP's while guaranteeing 100% recovery from any PFP was formulated as an integer linear program, which is equivalent to a set cover problem. It was shown how to reduce the size of the set cover problem by dividing the problem solution into two steps. First, find the candidate BP's to protect all PFP's. Then, find the set of BFD sessions required to detect the PFP's by their respective PFP cluster. One BP is then chosen from the candidate BP's to protect all the PFP's in the same PFP cluster. By simulation it was shown that the approximate two-step solution yields results that are close to those of the optimal single-step approach, while significantly reducing the problem size.

REFERENCES

[1] P. Pan et al., "Fast reroute extensions to RSVP-TE for LSP tunnels," IETF RFC 4090, May 2005.
[2] G. Iannaccone, C.-N. Chuah, R. Mortier, S. Bhattacharyya, and C. Diot, "Analysis of link failures over an IP backbone," in ACM SIGCOMM Internet Measurement Workshop, Marseilles, France, Nov. 2002.
[3] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, and C. Diot, "Characterization of failures in an IP backbone," in IEEE Infocom, Hong Kong, March 2004, Sprint ATL Research Report.
[4] J. Moy, "OSPF version 2," IETF RFC 2328, June 2003.
[5] R. White and A. Retana, IS-IS: Deployment in IP networks. Artech House, 1992.
[6] D. Katz and D. Ward, "Bidirectional forwarding detection," IETF, March 2005, draft-katz-ward-bfd-02.txt.
[7] A. Fumagalli, M. Tacca, K. Wu, and J. Vasseur, "Local recovery solutions from multi-link failures in MPLS-TE networks with probable failure patterns," in IEEE Globecom, Dallas, December 2004.
[8] J. Strand and A. Chiu, "Issues for routing in the optical layer," Communications Magazine, vol. 39, no. 2, pp. 81–87, Feb 2001.
[9] U. Feige, "A threshold of ln n for approximating set-cover," 1996.
[10] D. Eppstein, "Finding the k shortest paths," in IEEE Symposium on Foundations of Computer Science, 1994, pp. 154–165.