
Pre-Configuring IP-over-Optical Networks to Handle Router Failures and Unpredictable Traffic

Murali Kodialam, T. V. Lakshman, Sudipta Sengupta
Bell Laboratories, Lucent Technologies, Holmdel, NJ, USA
Massachusetts Institute of Technology, Cambridge, MA, USA

Abstract— We consider the realization of traffic-oblivious routing in IP-over-Optical networks where routers are interconnected over a switched optical backbone, also called IP-over-OTN (Optical Transport Network). The traffic-oblivious routing we consider is a scheme where incoming traffic is first distributed in a preset manner to a set of intermediate nodes. The traffic is then routed from the intermediate nodes to the final destination. This splitting of the routing into two-phases simplifies network configuration significantly [8], [17]. In implementing this scheme, the first and second phase paths are realized at the optical layer with router packet grooming at a single intermediate node only. Studies like [10] indicate that IP routers are 200 times more unreliable than traditional carrier-grade switches and average 1219 minutes of down time per year. Given this unreliability of routers, we consider how two-phase routing in IP-over-OTN can be made resilient against router node failures. We propose two different schemes for provisioning the optical layer to handle router node failures – one that is failure node independent and static, and the other that is failure node dependent and dynamic. We develop linear programming formulations for both schemes and a fast combinatorial algorithm for the second scheme so as to maximize network throughput. In each case, we determine (i) the optimal distribution of traffic to various intermediate routers for both normal (no-failure) and failure conditions, and (ii) provisioning of optical layer circuits to provide the needed inter-router links. We evaluate the performance of the two router failure protection schemes (in terms of throughput) and compare it with that of unprotected routing. For our experiments, we use actual ISP network topologies collected for the Rocketfuel project.

I. INTRODUCTION

With the increasing use of the Internet for carrying real-time traffic such as VoIP, it has become necessary for service providers to make their data networks highly reliable. Also, service providers have to engineer their networks to handle multiple traffic patterns while avoiding congestion. This requires constant traffic monitoring, traffic forecasting, and adaptation of the network routing to changing traffic conditions and to failures, which considerably increases operational complexity. Ideally, service providers would like to provision their networks so that network operation is robust to changes in traffic patterns (avoiding the need for frequent reconfiguration) while also accommodating failures in a fast and efficient manner.

The need to accommodate multiple traffic patterns has led to interest in the hose traffic model [5], where the only traffic assumptions needed are the total amount of traffic entering and leaving each network ingress or egress port. The actual traffic matrix itself need not be known. Several routing and capacity allocation schemes for the hose model have been proposed recently. An important scheme that allows the network to be statically configured so as to accommodate multiple traffic patterns is two-phase routing [8], [17]. Here traffic entering the network, instead of being sent directly to an egress router, is first sent to an intermediate node and from there to the final egress router. The first-phase distribution of traffic to the intermediate nodes is done in predetermined proportions that depend on the intermediate nodes. Throughout this paper, we will refer to this scheme as two-phase routing. The main contribution of this paper is in incorporating mechanisms for guaranteed QoS routing despite router failures while preserving the traffic-independence properties of two-phase routing.

In this paper, we focus on two-phase routing in IP-over-Optical networks where routers are interconnected over a switched optical backbone, also called IP-over-OTN (Optical Transport Network). In this architecture, the first and second phase paths are realized at the optical layer with router packet grooming at a single intermediate node only. Studies like [10] indicate that IP routers are 200 times more unreliable than traditional carrier-grade switches and average 1219 minutes of down time per year. Given this unreliability of routers, we consider how two-phase routing in IP-over-OTN can be made resilient against router node failures. We propose two different schemes for provisioning the optical layer to handle router node failures: one that is failure node independent and static (called failure independent provisioning), and the other that is failure node dependent and dynamic (called failure dependent provisioning). We develop linear programming formulations for both schemes and a fast combinatorial algorithm for the second scheme so as to maximize network throughput.
In each case, we determine (i) the optimal distribution of traffic to various intermediate routers for both normal (no-failure) and failure conditions, and (ii) provisioning of optical layer circuits to provide the needed inter-router links. We view this as important progress for two-phase routing in IP-over-OTN towards achieving carrier-class reliability so as to facilitate its future deployment in ISP networks.

The combinatorial algorithm developed for failure dependent provisioning is a Fully Polynomial Time Approximation Scheme (FPTAS). An FPTAS is an algorithm that finds a solution with objective function value within a $(1 + \epsilon)$-factor of the optimal solution and runs in time that is a polynomial function of the input parameters and $1/\epsilon$. The input parameters in our problem are the number of nodes $n$ and links $m$ in the network, and the size (number of bits) of the input numbers, e.g., link capacities and node ingress-egress capacities. The value of $\epsilon$ can be chosen to provide the desired degree of optimality for the solution.

We assume a single router failure model under which bandwidth can be shared across different router node failure scenarios. We focus on shared backup bandwidth allocation in this paper because of its reduced cost, the rarity of concurrent multiple router PoP (Point-of-Presence) failures in networks, and the increased complexity of the optimization problems that arises from sharing backup bandwidth. We focus on throughput (which is the reciprocal of maximum link utilization) as the performance metric because it is primarily related to link congestion. Also, it is the most common metric used in the literature.

The paper is structured as follows. In Section II, we discuss some aspects of the inherent difficulty in measuring traffic and introduce the traffic variation model. Related work is reviewed in Section III. In Section IV, we briefly discuss the two-phase routing scheme so as to provide context for this paper. The IP-over-OTN architecture and the application of two-phase routing to it are discussed in Section V. In Section VI, we propose two schemes for protecting against single router node failures in two-phase routing in IP-over-OTN. Algorithms for maximum throughput routing under these two schemes are developed in Sections VII and VIII respectively. We evaluate the performance of these router node failure protection schemes (in terms of throughput) and compare it with that of unprotected two-phase routing in Section IX. For our experiments, we use actual ISP network topologies collected for the Rocketfuel project [11]. We conclude and point to future work in Section X.
Proofs of theorems establishing the performance guarantees and running times of the combinatorial algorithm in Section VIII are presented in the Appendix (Section XI). We briefly describe some notation below before moving on to the next section.

A. Notation

We assume that we are given a network $G = (N, E)$ with node set $N$ and (directed) edge set $E$ where each node in the network can be a source or destination of traffic. Let $|N| = n$ and $|E| = m$. The sets of incoming and outgoing edges at node $i$ are denoted by $E^-(i)$ and $E^+(i)$ respectively. We let $(i, j)$ represent a directed link in the network from node $i$ to node $j$. To simplify the notation, we will also refer to a link by $e$ instead of $(i, j)$. The capacity of link $(i, j)$ will be denoted by $u_{ij}$. The utilization of a link is defined as the traffic (sum of working traffic and maximum restoration traffic due to any single router node failure) on the link divided by its capacity.

II. TRAFFIC MEASUREMENT AND VARIABILITY

In a utopian network deployment scenario where complete traffic information is known and does not change over time, we can optimize the routing for that single traffic matrix; a large volume of research has addressed this problem. The most important innovation of the two-phase routing scheme is the handling of traffic variability in a capacity efficient manner through static pre-configuration of the network and without requiring either (i) measurement of traffic in real-time or (ii) re-configuration of the network in response to changes in it. We address the difficulties associated with (i) in this section and then introduce the traffic variation model. The difficulties associated with (ii) for IP-over-Optical networks are addressed in Section V-B.

A. Difficulties in Measuring Traffic

Network traffic is not only hard to measure in real-time but even harder to predict based on past measurements. Direct measurement methods do not scale with network size as the number of entries in a traffic matrix is quadratic in the number of nodes. Moreover, such direct real-time monitoring methods lead to unacceptable degradation in router performance. In reality, only aggregate link traffic counts are available for traffic matrix estimation. SNMP provides these data via incoming and outgoing byte counts computed per link every 5 minutes. To estimate the traffic matrix from such link traffic measurements, the best techniques today give errors of 20% or more [12]. The emergence of new applications on the Internet, like P2P (peer-to-peer), VoIP (voice-over-IP), and video-on-demand, has reduced the time-scales at which traffic changes dynamically, making it impossible to extrapolate past traffic patterns to the future. Currently, ISPs handle such unpredictability in network traffic by gross over-provisioning of capacity. This has led to ISP network utilization being as low as 20% [12].

B. Traffic Variation Model

We consider a traffic variation model where the total amount of traffic that enters (leaves) an ingress (egress) node in the network is bounded by the total capacity of all external ingress (egress) links at that node. This is known as the hose model and was proposed by Duffield et al.
[5] as a method for specifying the bandwidth requirements of a Virtual Private Network (VPN). Note that the hose model naturally accommodates the network's ingress-egress capacity constraints. Moreover, conformance of network traffic to the model can be monitored in real-time using simple SNMP aggregate link traffic measurements and policed (or enforced) using Diffserv-type mechanisms [3]. We denote the upper bounds on the total amount of traffic entering and leaving at node $i$ by $R_i$ and $C_i$ respectively. The point-to-point matrix for the traffic in the network is thus constrained by these ingress-egress link capacity bounds. These constraints are the only known aspects of the traffic to be carried by the network, and knowing these is equivalent to knowing the row and column sum bounds on the traffic matrix. That is, any allowable traffic matrix $T = [t_{ij}]$ for the network must obey

$$\sum_{j:\, j \neq i} t_{ij} \le R_i, \qquad \sum_{j:\, j \neq i} t_{ji} \le C_i \qquad \forall\, i \in N$$

For given $R_i$ and $C_i$ values, denote the set of all such matrices that are partially specified by their row and column sums by $\mathcal{T}(R, C)$, that is

$$\mathcal{T}(R, C) = \Big\{ [t_{ij}] \;\Big|\; \sum_{j \neq i} t_{ij} \le R_i \ \text{ and } \ \sum_{j \neq i} t_{ji} \le C_i \quad \forall\, i \Big\}$$
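As a minimal sketch of the definition above, the following Python function tests whether a given traffic matrix lies in $\mathcal{T}(R, C)$. The function name `in_hose_set`, the tolerance parameter, and the 3-node numbers are illustrative assumptions and do not come from the paper.

```python
def in_hose_set(T, R, C, tol=1e-9):
    """Return True if T = [t_ij] obeys the hose bounds:
    off-diagonal row sums <= R[i] and off-diagonal column sums <= C[i]."""
    n = len(T)
    for i in range(n):
        row_sum = sum(T[i][j] for j in range(n) if j != i)  # traffic entering the network at i
        col_sum = sum(T[j][i] for j in range(n) if j != i)  # traffic leaving the network at i
        if row_sum > R[i] + tol or col_sum > C[i] + tol:
            return False
    return True


# Hypothetical 3-node instance (values chosen only for illustration).
R = [10.0, 5.0, 8.0]
C = [6.0, 9.0, 8.0]
T_ok = [[0, 4, 6],
        [2, 0, 2],
        [4, 4, 0]]
T_bad = [[0, 6, 6],   # row 0 sums to 12 > R[0]
         [2, 0, 2],
         [4, 4, 0]]

print(in_hose_set(T_ok, R, C), in_hose_set(T_bad, R, C))
```

Only aggregate row and column sums are inspected, which mirrors the observation that conformance to the hose model can be monitored from aggregate link counts.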

We will use $\lambda \cdot \mathcal{T}(R, C)$ to denote the set of all traffic matrices in $\mathcal{T}(R, C)$ with their entries multiplied by $\lambda$. Note that the traffic distribution $T$ could be any matrix in $\mathcal{T}(R, C)$ and could change over time. Two-phase routing provides a routing architecture that does not make any assumptions about $T$ apart from the fact that it is partially specified by row and column sum bounds, and it can provide QoS guarantees for routing all matrices in $\mathcal{T}(R, C)$ without requiring any detection of changes in traffic patterns or dynamic network reconfiguration in response to them. Quite surprisingly, the performance of two-phase routing has been shown in [9], through evaluation on actual ISP topologies, to be within 15% of that of the optimal scheme that can re-configure the network in response to traffic changes.

III. RELATED WORK

Direct routing from source to destination (instead of in two phases) along fixed paths for the hose traffic model has been considered by Duffield et al. [5] and Kumar et al. [7]. In related work, Applegate et al. [2] consider fixed path routing and provide relative guarantees for routing an arbitrary traffic matrix with respect to the best routing for that matrix. However, they do not provide absolute bandwidth guarantees for routing variable traffic under the hose model. Two aspects of direct source-destination path routing, namely, (i) the source needs to know the destination of a packet for routing it, and (ii) the bandwidth requirements of the (fixed) paths change with traffic variations, render these methods unsuitable for some network architectures and applications. Because of (i), these methods cannot be used to provide indirection in service overlay models like i3 where the destination of a packet is not known at the source.
Because of (ii), the adaptation of these methods for IP-over-Optical networks necessitates detection of changes in traffic patterns and dynamic reconfiguration of the provisioned optical layer circuits in response to them, a functionality that is not present in current IP-over-Optical network deployments.

Our current work is a sequel to [8] and differs from Zhang et al. [17] in the following ways:
• Zhang et al. consider only the IP layer (logical) topology, which is a fully-connected (complete) graph, while we work with the (sparse) physical WDM topology and consider the routing of Phase 1 and Phase 2 paths on the WDM topology. This increases the applicability of our work to the IP-over-OTN architecture where routers are interconnected over a switched optical backbone.
• We work with a generalized scheme proposed in [8] with possibly unequal traffic split ratios. In contrast, Zhang et al. consider equal traffic split ratios only in [17].
• We consider the problem of protecting against router node failures for two-phase routing in IP-over-OTN and propose two schemes. We develop algorithms for routing under the two schemes so as to maximize network

Fig. 1. Phase 1 and Phase 2 routing in the scheme (physical view and logical view).

throughput. Zhang et al. analyze the extra capacity required to protect against arbitrary IP router failures for the special case of routing with equal traffic split ratios in [17].

IV. OVERVIEW OF TWO-PHASE ROUTING

In this section, we give an overview of the two-phase routing scheme from [8]. As mentioned earlier, the scheme does not require the network to detect changes in the traffic distribution or re-configure the network in response to them. The only assumption about the traffic is the limits imposed by the ingress-egress constraints at each node, as outlined in Section II-B. As the name indicates, the routing strategy operates in two phases:
• Phase 1: A pre-determined fraction $\alpha_j$ of the traffic entering the network at any node is distributed to every node $j$ independent of the final destination of the traffic.
• Phase 2: As a result of the routing in Phase 1, each node receives traffic destined for different destinations that it routes to their respective destinations in this phase.

This is illustrated in Figure 1. A simple method of implementing this routing scheme in the network is to form fixed bandwidth tunnels between the nodes. In order to differentiate the tunnels carrying Phase 1 and Phase 2 traffic, we will refer to these tunnels as Phase 1 and Phase 2 paths respectively. The critical reason the two-phase routing strategy works is that the bandwidth required for these tunnels depends only on the ingress-egress capacities $R_i$, $C_i$ and not on the (unknown) individual entries in the traffic matrix. Note that the traffic split ratios $\alpha_1, \alpha_2, \ldots, \alpha_n$ in Phase 1 of the scheme are such that $\sum_{i=1}^{n} \alpha_i = 1$.

Let us elaborate on the routing procedure. Consider a node $i$ with maximum incoming traffic $R_i$. Node $i$ sends $\alpha_j R_i$ amount of this traffic to node $j$ during the first phase for each $j \in N$. Thus, the demand from node $i$ to node $j$ as a result of Phase 1 is $\alpha_j R_i$. At the end of Phase 1, node $i$ has received $\alpha_i R_k$ traffic from any other node $k$. Out of this, the traffic destined for node $j$ is $\alpha_i t_{kj}$ since all traffic is initially split without regard to the final destination. Thus, the traffic that needs to be routed from node $i$ to node $j$ during Phase 2 is $\sum_{k \in N} \alpha_i t_{kj}$, which is at most $\alpha_i C_j$ because $\sum_{k \in N} t_{kj} \le C_j$. Thus, the traffic demand from node $i$ to node $j$ during Phase 2 is $\alpha_i C_j$.
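The Phase 1 and Phase 2 demands derived above can be checked numerically. The sketch below computes the tunnel demands $\alpha_j R_i$ and $\alpha_i C_j$ and compares the Phase 2 demand with the traffic actually carried for one hose-feasible matrix. The function names and the 3-node numbers are assumptions made for illustration only.

```python
def two_phase_demand(alpha, R, C):
    """Tunnel demands for two-phase routing.
    phase1[i][j] = alpha[j] * R[i], phase2[i][j] = alpha[i] * C[j].
    Diagonal entries (i == j) are computed too but need no network capacity."""
    n = len(alpha)
    phase1 = [[alpha[j] * R[i] for j in range(n)] for i in range(n)]
    phase2 = [[alpha[i] * C[j] for j in range(n)] for i in range(n)]
    total = [[phase1[i][j] + phase2[i][j] for j in range(n)] for i in range(n)]
    return phase1, phase2, total


def phase2_load(alpha, T, i, j):
    """Actual Phase 2 traffic from intermediate node i to destination j
    when the (unknown to the network) traffic matrix is T."""
    return sum(alpha[i] * T[k][j] for k in range(len(T)))


# Hypothetical instance: split ratios sum to 1, T respects the hose bounds R, C.
alpha = [0.5, 0.25, 0.25]
R = [10.0, 5.0, 8.0]
C = [6.0, 9.0, 8.0]
T = [[0, 4, 6],
     [2, 0, 2],
     [4, 4, 0]]

phase1, phase2, total = two_phase_demand(alpha, R, C)
for i in range(3):
    for j in range(3):
        if i != j:
            print(i, j, "provisioned:", total[i][j],
                  "phase-2 load for this T:", phase2_load(alpha, T, i, j))
```

The provisioned values depend only on `alpha`, `R`, and `C`; replacing `T` by any other matrix that satisfies the hose bounds changes the carried load but never pushes it above the provisioned Phase 2 demand.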

Thus, the maximum demand from node $i$ to node $j$ as a result of routing in Phases 1 and 2 is $t'_{ij} = \alpha_j R_i + \alpha_i C_j$. Note that this does not depend on the traffic matrix $T \in \mathcal{T}(R, C)$. Thus, the scheme handles variability in the traffic matrix $T \in \mathcal{T}(R, C)$ by effectively routing a transformed matrix $T' = [t'_{ij}]$ that depends only on aggregate ingress-egress traffic constraints and the distribution ratios $\alpha_1, \alpha_2, \ldots, \alpha_n$, and not on the specific matrix $T \in \mathcal{T}(R, C)$. This is what makes the routing scheme oblivious to changes in the traffic distribution.

Deployment of the scheme requires specification of the traffic distribution ratios $\alpha_1, \alpha_2, \ldots, \alpha_n$ and routing of the Phase 1 and Phase 2 paths. In [9], linear programming formulations and fast combinatorial algorithms are developed for computing the above so as to maximize network throughput.

V. TWO-PHASE ROUTING IN IP-OVER-OPTICAL NETWORKS

In this section, we introduce the IP-over-Optical network architecture, discuss the unsuitability of related routing methodologies for IP-over-Optical networks, and then describe the realization of two-phase routing in such networks.

A. IP-over-Optical Networks

Core IP networks are often deployed by interconnecting routers over a switched optical backbone, also called IP-over-OTN (Optical Transport Network). This is illustrated in Figure 2. Because a router line card is typically 3-4 times more expensive than an optical switch card [15], an IP-over-OTN architecture reduces network cost by keeping traffic mostly in the optical layer. Moreover, the ability of router technology to scale to port counts consistent with multi-terabit capacities without compromising performance, reliability, restoration speed, and software stability is questionable [13]. By moving transit traffic from the routers to the optical switches, the requirement to upgrade router PoP configurations with increasing traffic is minimized (since optical switches are more scalable with increasing port count than routers). Also, since optical switches are known to be much more reliable than routers [10], this makes the architecture more robust and reliable. Routing in IP-over-OTN needs to make a compromise between keeping traffic at the optical layer (for the above reasons) and using intermediate routers for packet grooming in order to achieve efficient statistical multiplexing of data traffic [15].

Fig. 2. Routers interconnected over a switched optical backbone in IP-over-Optical Networks

B. Related Routing Methodologies: Unsuitability for IP-over-OTN

We return to related work on routing with traffic variability from Section III and point out why such existing methods cannot meet certain requirements in IP-over-OTN. Direct routing from source to destination (instead of in two phases) along fixed paths for the hose traffic model has been considered by Duffield et al. [5] and Kumar et al. [7]. In related work, Applegate et al. [2] consider fixed path routing and provide relative guarantees for routing an arbitrary traffic matrix with respect to the best routing for that matrix. However, they do not provide absolute bandwidth guarantees for routing variable traffic under the hose model.

In all of the above, direct source-destination paths are fixed a priori for routing the traffic between each source-destination pair. Note that even though the paths are fixed a priori and do not depend on the traffic matrix, their bandwidth requirements change with variations in the traffic matrix. The fixed path routing in the above models, when applied to IP-over-OTN, routes packets from source to destination along direct paths in the optical layer. This necessitates dynamic reconfiguration of the provisioned optical layer circuits (i.e., change in bandwidth) in response to traffic variations. As an illustration, consider the scenario in Figure 2, where router A is connected to router C using 3 OC-48 connections and to router D using 1 OC-12 connection, so as to meet the traffic demand from node A to nodes C and D of 7.5 Gbps and 600 Mbps respectively. Suppose that at a later time, traffic from A to C decreases to 5 Gbps, while traffic from A to D increases to 1200 Mbps. Then, the optical layer must be reconfigured so as to delete one OC-48 connection between A and C and create a new OC-12 connection between A and D. Not only is the (current) traffic matrix difficult to estimate, but changes in it may not be detectable in real time.
Moreover, dynamic changes in routing in the network may be difficult or prohibitively expensive from a network operations perspective. In spite of the continuing research on IP-Optical integration, network deployments are far from utilizing the optical control plane to provide bandwidth provisioning in real-time to the IP layer. The unavailability of network control plane mechanisms for reconfiguring the network in response to, and at the time-scales of, changing traffic amplifies the necessity of static provisioning at the optical layer in any scheme that handles traffic variability. Direct source-destination path routing does not meet this requirement.

C. Two-Phase Routing for IP-over-OTN

The two-phase routing scheme, as envisaged for IP-over-OTN, establishes the fixed bandwidth Phase 1 and Phase 2 tunnels at the optical layer. Thus, the optical layer is statically provisioned and does not need to be reconfigured in response to traffic changes. IP packets are routed end-to-end with IP layer processing at a single intermediate node only. While in transit at the optical layer inside either Phase 1 or Phase 2 tunnels, packets do not enter the router but appear as transit traffic at the Optical Cross-Connect (OXC) only (see Figure 3). The IP layer packet processing at an intermediate node works as follows. The optical layer circuit is dropped at the IP router at the node (through OXC-to-router links), from where the packets are multiplexed back to the OXC (through router-to-OXC links) to be routed through direct optical layer circuits to their final destinations. This architecture provides the desirable statistical multiplexing properties of packet switching for handling highly variable traffic without significantly increasing the IP layer transit traffic. Compare this with the high levels of IP layer transit traffic in an IP-over-WDM architecture where routers are directly connected to WDM systems and need to process packets at each hop.

Fig. 3. Intermediate node packet processing for proposed scheme in IP-over-Optical Networks

In summary, two-phase routing when applied to IP-over-OTN leads to an architecture with the following salient features:
• IP traffic is routed "mostly" at the optical layer from source to destination routers with packet grooming at one intermediate router only.
• The optical layer (circuits and their bandwidth) is statically provisioned a priori to provide bandwidth guarantees for end-to-end IP traffic. Routing at the IP layer is static: there is no need to detect changes in traffic or re-configure the routing in response to them.
• Bandwidth guarantees are provided for routing all traffic matrices within the network's natural ingress-egress capacity constraints.

VI. MAKING TWO-PHASE ROUTING RESILIENT TO ROUTER FAILURES IN IP-OVER-OTN

In this section, we consider extending the two-phase routing scheme for protecting against router node failures. In the term "router node failure", node refers to a PoP, hence it includes the failure of all routers in a PoP. When a router at any node $f$ fails, any other node $i$ cannot split any portion of its originating traffic to intermediate node $f$. Hence, it must redistribute the traffic split ratio $\alpha_f$ among other nodes $j \neq f$. Accordingly, let $\beta_{jf}$ denote the portion of $\alpha_f$ that is redistributed to node $j$ when node $f$ fails. Then, we must have

$$\sum_{j:\, j \neq f} \beta_{jf} = \alpha_f \qquad \forall\, f \in N$$

Note that since only the router (and not the OXC) at node $f$ fails, this node can continue to be on the Phase 1 and Phase 2 paths for optical layer switching. However, the total traffic that was supposed to originate at that node, i.e., $R_f$, no longer enters the network.

We propose two different schemes for provisioning the optical layer in IP-over-OTN in order to handle the redistribution of split ratios after router node failures. We discuss these next. Algorithms for maximizing the throughput under the two protection schemes are presented in Sections VII and VIII respectively.
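To illustrate the redistribution constraint on $\beta_{jf}$, the sketch below uses a simple proportional rule, $\beta_{jf} = \alpha_f \alpha_j / (1 - \alpha_f)$. This rule is only an assumption chosen for illustration; in the paper the $\beta_{jf}$ values are decision variables chosen by the optimization, not fixed by a formula. The function name is likewise hypothetical.

```python
def proportional_redistribution(alpha):
    """beta[j][f]: share of alpha[f] moved to node j when the router at f fails.
    Illustrative rule only: spread alpha[f] over survivors in proportion to alpha[j].
    Requires alpha[f] < 1 for every f."""
    n = len(alpha)
    beta = [[0.0] * n for _ in range(n)]
    for f in range(n):
        surviving = 1.0 - alpha[f]  # total split ratio held by nodes other than f
        for j in range(n):
            if j != f:
                beta[j][f] = alpha[f] * alpha[j] / surviving
    return beta


alpha = [0.5, 0.3, 0.2]  # hypothetical split ratios summing to 1
beta = proportional_redistribution(alpha)

for f in range(len(alpha)):
    redistributed = sum(beta[j][f] for j in range(len(alpha)) if j != f)
    after_failure = sum(alpha[j] + beta[j][f] for j in range(len(alpha)) if j != f)
    print("failed node", f, "redistributed:", redistributed,
          "post-failure split ratios sum to:", after_failure)
```

For each failed node, the redistributed amounts add up to $\alpha_f$, so the split ratios of the surviving nodes again sum to 1 (up to floating-point rounding).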

A. Failure Independent Provisioning

In the first scheme, called "Failure Independent Provisioning", for each given $i, j \in N$, the restoration demand from node $i$ to node $j$ at the optical layer is provisioned a priori so as to handle the "worst case" node failure scenario. Under this scheme, the additional traffic split ratio that node $j$ needs to handle is $\max_{f \in N} \beta_{jf}$. Thus, the modified split ratio $\alpha'_j$ associated with each node $j$ is $\alpha'_j = \alpha_j + \max_{f \in N} \beta_{jf}$. The demand that needs to be statically provisioned at the optical layer between nodes $i$ and $j$ so as to protect against any single router node failure is $\alpha'_j R_i + \alpha'_i C_j$. This demand does not depend on which router node fails, hence the name of the scheme. The node $f$ that achieves the maximum value of $\beta_{jf}$ for a given $j$ is the worst case scenario for the traffic split ratio for intermediate node $j$. Since for different $j$ the worst case could be achieved by different failed nodes $f$, this scheme may not achieve the most capacity efficient sharing of restoration bandwidth across different router node failure scenarios. However, the advantage of the scheme is that it preserves the static nature of the original two-phase routing scheme.

B. Failure Dependent Provisioning

In the second scheme, called "Failure Dependent Provisioning", the restoration demand from node $i$ to node $j$ at the optical layer depends on the node $f$ which failed and is provisioned in a reactive manner after the node $f$ fails, the value of the demand being $\beta_{jf} R_i + \beta_{if} C_j$, which could be different for different failed nodes $f$. By "reactive", we mean that cross-connects need to be set up for redistributing traffic to other intermediate nodes after failure; this is because backup bandwidth is shared at the link level in the optical layer. However, the scheme allows better sharing of restoration bandwidth across different router node failure scenarios. As we shall show in Section VIII-B, it also admits a fast combinatorial algorithm (FPTAS).

VII. MAXIMIZING THROUGHPUT FOR FAILURE INDEPENDENT PROVISIONING

Given a network with link capacities and constraints $R_i$, $C_i$ on the ingress-egress traffic, we consider the problem of routing with protection against router node failures through failure independent provisioning so as to minimize the maximum utilization of any link in the network. The problem is equivalent to finding the maximum multiplier $\lambda$ (throughput)

such that all matrices in $\lambda \cdot \mathcal{T}(R, C)$ can be feasibly routed with protection against router node failures under the scheme. We first give an alternative (but equivalent) definition of throughput and then present a linear programming formulation for maximizing it.

Because other intermediate nodes should be able to take up the traffic split ratios of the failed node $f$, we must have $\sum_{j \neq f} \alpha'_j \ge 1$ for each $f \in N$. It suffices to have the minimum of these $n$ sums exactly equal to 1. Thus, in the case that this is not so, the traffic split ratios can be normalized (divided) by

$$\lambda = \min_{f \in N} \sum_{j \neq f} \alpha'_j$$

to achieve the desired result, in which case all traffic matrices in $\lambda \cdot \mathcal{T}(R, C)$ can be feasibly routed. Thus, the appropriate measure of throughput in this case is $\lambda$ as defined above. Let $x^{ij}_e$ denote the flow variables for routing the demand of $\alpha'_j R_i + \alpha'_i C_j$ from node $i$ to node $j$. Then, the problem of two-phase routing with failure independent provisioning so as to maximize the network throughput can be formulated as the following linear program:

$$\text{maximize} \quad \lambda$$

subject to

$$\sum_{e \in E^+(k)} x^{ij}_e - \sum_{e \in E^-(k)} x^{ij}_e = \begin{cases} \alpha'_j R_i + \alpha'_i C_j & \text{if } k = i \\ -\alpha'_j R_i - \alpha'_i C_j & \text{if } k = j \\ 0 & \text{otherwise} \end{cases} \qquad \forall\, i, j, k \in N \tag{1}$$

$$\sum_{i, j \in N} x^{ij}_e \le u_e \qquad \forall\, e \in E \tag{2}$$

$$\sum_{j \in N,\, j \neq f} \alpha'_j \ge \lambda \qquad \forall\, f \in N \tag{3}$$

This linear program can be solved in polynomial time using a general linear programming algorithm [14].

VIII. MAXIMIZING THROUGHPUT FOR FAILURE DEPENDENT PROVISIONING

Given a network with link capacities and constraints $R_i$, $C_i$ on the ingress-egress traffic, we consider the problem of routing with protection against router node failures through failure dependent provisioning so as to minimize the maximum utilization of any link in the network. The problem is equivalent to finding the maximum multiplier $\lambda$ (throughput) such that all matrices in $\lambda \cdot \mathcal{T}(R, C)$ can be feasibly routed with protection against router node failures under the scheme.

Note that for failure dependent provisioning, we work explicitly with both normal (no-failure) traffic split ratios $\alpha_j$ and failure redistribution ratios $\beta_{jf}$. Suppose we relax the requirement that the traffic split ratios $\alpha_j$ sum to 1 in a feasible solution of the problem. Consider the sum

$$\lambda = \sum_{i \in N} \alpha_i$$

The traffic split ratios $\alpha_j$ can be normalized (divided) by $\lambda$ so that they sum to 1, in which case all matrices in $\lambda \cdot \mathcal{T}(R, C)$ can be feasibly routed. (The failure redistribution ratios $\beta_{jf}$ are scaled by the same amount.) Thus, the appropriate measure of throughput in this case is the quantity $\lambda$ above when the traffic split ratios $\alpha_j$ are not constrained to sum to 1.

We first present a path flow based linear programming formulation for this problem. This will be subsequently used to develop the fast combinatorial algorithm (FPTAS) in Section VIII-B.

A. Path Indexed Linear Programming Formulation

Let $x(P)$ denote the flow on path $P$ under normal (no-failure) conditions. Let $y_f(P)$ be the restoration flow that appears on path $P$ after failure of node $f$. Let $\mathcal{P}_{ij}$ denote the set of all paths from node $i$ to node $j$. Then, the problem of two-phase routing with failure dependent provisioning so as to maximize the network throughput can be formulated as the following path-indexed linear program:

$$\text{maximize} \quad \sum_{i \in N} \alpha_i$$

subject to

$$\sum_{P \in \mathcal{P}_{ij}} x(P) = \alpha_j R_i + \alpha_i C_j \qquad \forall\, i, j \in N \tag{4}$$

$$\sum_{P \in \mathcal{P}_{ij}} y_f(P) = \beta_{jf} R_i + \beta_{if} C_j \qquad \forall\, i, j, f \in N \tag{5}$$

$$\sum_{j \in N,\, j \neq f} \beta_{jf} = \alpha_f \qquad \forall\, f \in N \tag{6}$$

$$\sum_{i, j} \sum_{P \in \mathcal{P}_{ij},\, e \in P} x(P) + \sum_{i, j} \sum_{P \in \mathcal{P}_{ij},\, e \in P} y_f(P) \le u_e \qquad \forall\, e \in E,\ f \in N \tag{7}$$

$$x(P) \ge 0 \qquad \forall\, P \in \mathcal{P}_{ij},\ \forall\, i, j \tag{8}$$

$$y_f(P) \ge 0 \qquad \forall\, P \in \mathcal{P}_{ij},\ \forall\, i, j, f \in N \tag{9}$$

In general, a network can have an exponential number of paths (in the size of the network). Hence, this linear program can have an exponential number of variables and is not suitable for running on medium to large sized networks. The path-indexed formulation can be converted to a polynomial size link-indexed program, thus allowing it to be solved in polynomial time using a general linear programming algorithm [14]. We omit this for lack of space.

It is well known that general linear programming based algorithms for network problems do not scale well with network size beyond a few tens of nodes. In Section VIII-B, we state the dual of the above path-indexed linear program. The usefulness of the primal and dual formulation is in designing a fast combinatorial algorithm for the problem.

B. Combinatorial Algorithm

In this section, we develop a fast combinatorial algorithm (FPTAS) for failure dependent provisioning. We begin with the dual formulation of the linear program discussed above. The primal-dual approach we develop is adapted from the

technique applied to the maximum concurrent flow problem in [6], where flows are augmented in the primal solution and weights are updated multiplicatively in the dual solution in an iterative fashion.

The dual formulation of the linear program outlined in Section VIII-A associates a variable $w(e, f)$ with each link capacity constraint in (7), a variable $\pi_{ij}$ with each demand constraint in (4), a variable $\gamma_{ijf}$ with each demand constraint in (5), and a variable $\sigma_f$ with each split redistribution constraint in (6).

For each pair of nodes $i, j \in N$, denote by $SP(i, j)$ the cost of the shortest path from node $i$ to node $j$ under link costs $c(e) = \sum_{f \in N} w(e, f)\ \forall\, e \in E$. That is,

$$SP(i, j) = \min_{P \in P_{ij}} \sum_{e \in P} \sum_{f \in N} w(e, f)$$

Also, let $SP_f(i, j)$ denote the cost of the shortest path from node $i$ to node $j$ under link costs $c(e) = w(e, f)\ \forall\, e \in E$. That is,

$$SP_f(i, j) = \min_{P \in P_{ij}} \sum_{e \in P} w(e, f)$$

We define two more quantities before arriving at the dual linear program formulation. For any node $k \in N$, we define $V(k)$ as

$$V(k) = \sum_{i: i \ne k} R_i\, SP(i, k) + \sum_{j: j \ne k} C_j\, SP(k, j) \quad \forall\, k \in N$$

and, for any $k, f \in N$, $k \ne f$, we define $W(k, f)$ as

$$W(k, f) = \sum_{i: i \notin \{k, f\}} R_i\, SP_f(i, k) + \sum_{j: j \notin \{k, f\}} C_j\, SP_f(k, j) \quad \forall\, k, f \in N,\ k \ne f$$

After simplification and removal of the dual variables $\pi_{ij}$, $\gamma_{ijf}$, and $\sigma_f$, the dual linear program can be written as:

$$\text{minimize } \sum_{e \in E} \sum_{f \in N} u_e\, w(e, f)$$

subject to

$$V(f) + W(k, f) \ge 1 \quad \forall\, k, f \in N,\ k \ne f \qquad (10)$$

$$w(e, f) \ge 0 \quad \forall\, e \in E,\ f \in N \qquad (11)$$

Given any set of weights $w(e, f)$, note that the quantities $V(k)$ and $W(k, f)$ above can be computed in polynomial time by simple shortest path computations. Let $U(k, f)$ denote the left-hand side (LHS) of constraint (10) for any $k, f \in N$, $k \ne f$, that is

$$U(k, f) = V(f) + W(k, f) \quad \forall\, k, f \in N,\ k \ne f$$

Thus, a set of weights $w(e, f)$ is a feasible solution for the dual program if and only if

$$\min_{k, f \in N,\ k \ne f} U(k, f) \ge 1$$
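The dual feasibility test above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`dijkstra`, `all_pairs`, `min_U`) and the data layout (directed links as `(a, b)` tuples, weights as a dictionary keyed by `(link, f)`) are assumptions introduced here. It computes $SP$, $SP_f$, $V$, $W$ and returns the minimizing pair for $U(k, f)$.

```python
import heapq

def dijkstra(n, adj, src):
    """Shortest-path costs from src; adj[a] is a list of (b, cost) pairs."""
    dist = [float("inf")] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, a = heapq.heappop(heap)
        if d > dist[a]:
            continue  # stale heap entry
        for b, c in adj[a]:
            if d + c < dist[b]:
                dist[b] = d + c
                heapq.heappush(heap, (d + c, b))
    return dist

def all_pairs(n, cost):
    """cost maps a directed link (a, b) to a nonnegative cost; returns dist[i][j]."""
    adj = [[] for _ in range(n)]
    for (a, b), c in cost.items():
        adj[a].append((b, c))
    return [dijkstra(n, adj, s) for s in range(n)]

def min_U(n, links, w, R, C):
    """Return (min U(k, f), k, f) for dual weights w[(e, f)] (hypothetical layout)."""
    # SP uses link costs summed over all failure scenarios f.
    SP = all_pairs(n, {e: sum(w[e, f] for f in range(n)) for e in links})
    best = None
    for f in range(n):
        # SP_f uses only the weights of failure scenario f.
        SPf = all_pairs(n, {e: w[e, f] for e in links})
        V = sum(R[i] * SP[i][f] + C[i] * SP[f][i] for i in range(n) if i != f)
        for k in range(n):
            if k == f:
                continue
            W = sum(R[i] * SPf[i][k] + C[i] * SPf[k][i]
                    for i in range(n) if i not in (k, f))
            if best is None or V + W < best[0]:
                best = (V + W, k, f)
    return best
```

The weights are dual feasible exactly when the first component of the returned tuple is at least 1.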

The algorithm works as follows. Start with equal initial weights $w(e, f) = \delta\ \forall\, e \in E, f \in N$ (the quantity $\delta$ depends on $\epsilon$ and is derived later). Repeat the following until the dual feasibility constraints are satisfied:

1) Compute nodes $f = \bar{f}$ and $k = \bar{k}$ for which $U(k, f)$ is minimum. This also identifies (i) paths $P_i$ from node $i$ to node $\bar{f}$ for all $i \ne \bar{f}$, (ii) paths $Q_j$ from node $\bar{f}$ to node $j$ for all $j \ne \bar{f}$, (iii) paths $P'_i$ from node $i$ to node $\bar{k}$ for all $i \notin \{\bar{k}, \bar{f}\}$, and (iv) paths $Q'_j$ from node $\bar{k}$ to node $j$ for all $j \notin \{\bar{k}, \bar{f}\}$. This is illustrated in Figure 4.

2) For each $e \in E$, let $N_P(e)$ be the set of nodes $i$ for which $P_i$ contains link $e$ and $N_Q(e)$ be the set of nodes $j$ for which $Q_j$ contains link $e$. Let $N_{P'}(e)$ and $N_{Q'}(e)$ denote similar sets for paths $P'_i$ and $Q'_j$ respectively. Compute the quantity $\alpha$ as follows:

$$S(e) = \sum_{i \in N_P(e)} R_i + \sum_{j \in N_Q(e)} C_j \quad \forall\, e \in E$$

$$S'(e) = \sum_{i \in N_{P'}(e)} R_i + \sum_{j \in N_{Q'}(e)} C_j \quad \forall\, e \in E$$

$$\alpha = \min_{e \in E} \frac{u_e}{S(e) + S'(e)}$$

3) Send $\alpha R_i$ amount of flow on path $P_i$ for all $i \ne \bar{f}$ and $\alpha C_j$ amount of flow on path $Q_j$ for all $j \ne \bar{f}$. For each link $e$, compute the total working flow $\Delta(e)$.

4) Send $\alpha R_i$ amount of flow on path $P'_i$ for all $i \notin \{\bar{k}, \bar{f}\}$ and $\alpha C_j$ amount of flow on path $Q'_j$ for all $j \notin \{\bar{k}, \bar{f}\}$. For each link $e$, compute the total restoration flow $\Delta'(e, \bar{f})$ that appears on link $e$ after failure of router node $\bar{f}$.

5) For each $e \in E$, update weights $w(e, \bar{f})$ as

$$w(e, \bar{f}) \leftarrow w(e, \bar{f}) \left( 1 + \frac{\epsilon\, [\Delta(e) + \Delta'(e, \bar{f})]}{u_e} \right)$$

6) For each $e \in E$, $f \in N$, $f \ne \bar{f}$, update weights $w(e, f)$ as

$$w(e, f) \leftarrow w(e, f) \left( 1 + \frac{\epsilon\, \Delta(e)}{u_e} \right)$$

7) Increment by $\alpha$ both the split ratio $\alpha_{\bar{f}}$ associated with node $\bar{f}$ and the redistribution ratio $\beta_{\bar{k}\bar{f}}$ to node $\bar{k}$ after failure of router node $\bar{f}$.

[Fig. 4. One Step in the Primal-Dual Computation]

When the above procedure terminates, dual feasibility constraints will be satisfied. However, primal capacity constraints on each link will be violated, since we were working with the original (and not residual) link capacity at each stage. To

remedy this, we scale down the normal split ratios $\alpha_i$ and failure redistribution ratios $\beta_{jf}$ uniformly so that capacity constraints are obeyed.

Note that since the algorithm maintains primal and dual solutions at each step, the optimality gap can be estimated by computing the ratio of the primal and dual objective function values. The computation can be terminated immediately after the desired closeness to optimality is achieved.

Algorithm FDP:

    α_k ← 0 ∀ k ∈ N ;
    β_kf ← 0 ∀ k, f ∈ N, k ≠ f ;
    w(e, f) ← δ ∀ e ∈ E, f ∈ N ;
    work(e) ← 0 ∀ e ∈ E ;
    bkp(e, f) ← 0 ∀ e ∈ E, f ∈ N ;
    G ← 0 ;
    while G < 1 do
        For each i, j ∈ N, compute shortest path from i to j under link costs
            c(e) = ∑_{f∈N} w(e, f) and denote its cost by SP(i, j) ;
        For each i, j ∈ N, compute shortest path from i to j under link costs
            c(e) = w(e, f) for each f ∈ N and denote its cost by SP_f(i, j) ;
        V(k) ← ∑_{i≠k} R_i SP(i, k) + ∑_{j≠k} C_j SP(k, j) ;
        W(k, f) ← ∑_{i∉{k,f}} R_i SP_f(i, k) + ∑_{j∉{k,f}} C_j SP_f(k, j) ;
        G ← min_{k,f∈N, k≠f} [V(f) + W(k, f)] ;
        if G ≥ 1 break ;
        Let k̄ and f̄ be nodes for which above minimum occurs ; this identifies
            paths P_i, Q_j, P'_i, and Q'_j as defined earlier ;
        N_P(e) ← {i : P_i contains e} for all e ;
        N_Q(e) ← {j : Q_j contains e} for all e ;
        N_P'(e) ← {i : P'_i contains e} for all e ;
        N_Q'(e) ← {j : Q'_j contains e} for all e ;
        S(e) ← ∑_{i∈N_P(e)} R_i + ∑_{j∈N_Q(e)} C_j ;
        S'(e) ← ∑_{i∈N_P'(e)} R_i + ∑_{j∈N_Q'(e)} C_j ;
        α ← min_{e∈E} u_e / (S(e) + S'(e)) ;
        Send αR_i flow on paths P_i ∀ i ≠ f̄ and αC_j flow on paths Q_j ∀ j ≠ f̄
            and compute resulting working flow ∆(e) on link e for all e ;
        Send αR_i flow on paths P'_i ∀ i ∉ {k̄, f̄} and αC_j flow on paths Q'_j
            ∀ j ∉ {k̄, f̄} and compute resulting restoration flow ∆'(e, f̄) on
            link e for all e under failure of router node f̄ ;
        work(e) ← work(e) + ∆(e) ∀ e ;
        bkp(e, f̄) ← bkp(e, f̄) + ∆'(e, f̄) ∀ e ;
        w(e, f) ← w(e, f)(1 + ε∆(e)/u_e) ∀ e, ∀ f ≠ f̄ ;
        w(e, f̄) ← w(e, f̄)(1 + ε[∆(e) + ∆'(e, f̄)]/u_e) ∀ e ;
        α_f̄ ← α_f̄ + α ;
        β_k̄f̄ ← β_k̄f̄ + α ;
    end while
    bkp_max(e) ← max_{f∈N} bkp(e, f) ∀ e ;
    scale(e) ← (work(e) + bkp_max(e)) / u_e ∀ e ∈ E ;
    scale_max ← max_{e∈E} scale(e) ;
    α_k ← α_k / scale_max for all k ∈ N ;
    β_kf ← β_kf / scale_max for all k, f ∈ N ;
    Output α_k and β_kf as optimal traffic split and redistribution ratios respectively ;
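As a rough companion to the pseudo-code, the following Python sketch mirrors Algorithm FDP on small graphs. It is a sketch under stated assumptions rather than the authors' code: the names `all_pairs`, `path_links`, and `fdp` are hypothetical, Floyd-Warshall replaces the Dijkstra computations purely for brevity, and the graph is assumed strongly connected with directed links given as `(a, b)` keys of the capacity dictionary `u` (no self-loops or parallel links).

```python
def all_pairs(n, cost):
    """Floyd-Warshall on directed link costs; returns (dist, next_hop)."""
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for (a, b), c in cost.items():
        dist[a][b], nxt[a][b] = c, b
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def path_links(nxt, i, j):
    """Links on the shortest path from i to j (assumes j is reachable from i)."""
    links = []
    while i != j:
        links.append((i, nxt[i][j]))
        i = nxt[i][j]
    return links

def fdp(n, u, R, C, eps, delta):
    """Sketch of Algorithm FDP; returns scaled alpha, beta[k][f], and link loads."""
    nodes = range(n)
    alpha = [0.0] * n
    beta = [[0.0] * n for _ in nodes]
    w = {(e, f): delta for e in u for f in nodes}
    work = {e: 0.0 for e in u}
    bkp = {(e, f): 0.0 for e in u for f in nodes}
    while True:
        SP, nxt = all_pairs(n, {e: sum(w[e, f] for f in nodes) for e in u})
        per_f = [all_pairs(n, {e: w[e, f] for e in u}) for f in nodes]

        def U(k, f):
            d = per_f[f][0]
            v = sum(R[i] * SP[i][f] + C[i] * SP[f][i] for i in nodes if i != f)
            return v + sum(R[i] * d[i][k] + C[i] * d[k][i]
                           for i in nodes if i not in (k, f))

        G, kb, fb = min((U(k, f), k, f) for f in nodes for k in nodes if k != f)
        if G >= 1:
            break  # dual feasible
        S = {e: 0.0 for e in u}    # working load per unit of alpha, S(e)
        S2 = {e: 0.0 for e in u}   # restoration load per unit of alpha, S'(e)
        nf = per_f[fb][1]
        for i in nodes:
            if i != fb:
                for e in path_links(nxt, i, fb): S[e] += R[i]
                for e in path_links(nxt, fb, i): S[e] += C[i]
            if i not in (kb, fb):
                for e in path_links(nf, i, kb): S2[e] += R[i]
                for e in path_links(nf, kb, i): S2[e] += C[i]
        a = min(u[e] / (S[e] + S2[e]) for e in u if S[e] + S2[e] > 0)
        for e in u:
            work[e] += a * S[e]
            bkp[e, fb] += a * S2[e]
            for f in nodes:
                extra = a * S2[e] if f == fb else 0.0
                w[e, f] *= 1 + eps * (a * S[e] + extra) / u[e]
        alpha[fb] += a
        beta[kb][fb] += a
    # Scale down so that working plus worst-case restoration flow fits each link.
    scale = max((work[e] + max(bkp[e, f] for f in nodes)) / u[e] for e in u)
    load = {e: (work[e] + max(bkp[e, f] for f in nodes)) / scale for e in u}
    return [x / scale for x in alpha], [[b / scale for b in row] for row in beta], load
```

The throughput estimate is `sum(alpha)` of the returned split ratios; `load[e]` is the scaled working plus worst-case restoration flow on link `e`, which by construction does not exceed `u[e]`.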

The pseudo-code for the above procedure, called Algorithm FDP (for Failure Dependent Provisioning), is provided in the box above. Arrays work(e) and bkp(e, f) keep track respectively of the working traffic on link $e$ and the restoration traffic that appears on link $e$ due to failure of router node $f$. The variable $G$ is initialized to 0 and remains $< 1$ as long as the dual constraints remain unsatisfied. After the while loop terminates, the factor by which the capacity constraint on each link $e$ gets violated is computed into array scale(e). Finally, the $\alpha_i$ and $\beta_{jf}$ values are divided by the maximum capacity violation factor and the resulting values output as the optimum.

Let $L = (n - 1)(n + 1)\left(\sum_{i \in N} R_i + \sum_{j \in N} C_j\right)$ and let $L'$ denote the minimum non-zero value of the $R_i$'s and $C_j$'s. The values of $\epsilon$ and $\delta$ are related, in the following theorem, to the approximation factor guarantee of Algorithm FDP.

Theorem 1: For any given $\epsilon' > 0$, Algorithm FDP computes a solution with objective function value within $(1 + \epsilon')$-factor of the optimum for

$$\delta = \frac{1 + \epsilon}{L' \left[ (1 + \epsilon) \frac{L}{L'} \right]^{1/\epsilon}} \quad \text{and} \quad \epsilon = 1 - \frac{1}{\sqrt{1 + \epsilon'}}$$
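To make the parameter choice in Theorem 1 concrete, the sketch below evaluates $\epsilon$ and $\delta$ for a target $\epsilon'$. The function name `fdp_parameters` is hypothetical, and the sketch returns $\ln \delta$ rather than $\delta$ itself as a precaution, on the assumption that the factor $[(1+\epsilon)L/L']^{1/\epsilon}$ may be too large for floating point on bigger instances.

```python
import math

def fdp_parameters(eps_prime, R, C):
    """Return (eps, ln(delta), L, L') following the formulas of Theorem 1."""
    n = len(R)
    eps = 1.0 - 1.0 / math.sqrt(1.0 + eps_prime)
    L = (n - 1) * (n + 1) * (sum(R) + sum(C))
    # L' is the minimum non-zero ingress/egress capacity.
    L_min = min(v for v in list(R) + list(C) if v > 0)
    # ln(delta) = ln((1 + eps) / L') - (1 / eps) * ln((1 + eps) * L / L')
    log_delta = math.log((1 + eps) / L_min) - math.log((1 + eps) * L / L_min) / eps
    return eps, log_delta, L, L_min
```

For small instances, `math.exp(log_delta)` recovers $\delta$ directly.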

We end this section with a bound on the running time of Algorithm FDP.

Theorem 2: For any given $\epsilon > 0$ chosen to provide the desired approximation factor guarantee in accordance with Theorem 1, Algorithm FDP runs in time

$$O\left( \frac{n^3 m}{\epsilon} (m + n \log n) \log_{1+\epsilon} \frac{L}{L'} \right)$$

which is polynomial in the network size, the number of bits used to represent the $R_i$, $C_j$ values, and $\frac{1}{\epsilon}$.

IX. PERFORMANCE EVALUATION

In this section, we evaluate the throughput performance of the two schemes for protecting against router node failures in two-phase routing. To compute the throughput for the failure independent provisioning scheme, we use the linear programming formulation from Section VII and solve it in CPLEX [18]. To compute the throughput for the failure dependent provisioning scheme, we use the fast combinatorial algorithm from Section VIII-B. For the throughput of the unprotected scheme, we use the fast combinatorial algorithm from [9]. For the combinatorial algorithms, the running times range from tens of seconds to a few minutes on a Pentium III 1GHz 256MB machine.

A. Topologies and Link/Ingress-Egress Capacities

We use the six ISP maps from the Rocketfuel dataset which had accompanying (deduced) OSPF/IS-IS weights [11]. These topologies list multiple intra-PoP (Point of Presence) routers and/or multiple intra-city PoPs as individual nodes. We coalesced such nodes so that nodes correspond to cities and the topology represents geographical PoP to PoP ISP topologies. Some data about the original Rocketfuel topologies and their coalesced versions are listed in Table I.

The topologies provided by Rocketfuel did not include the capacities of the links, which were needed for our study. The Rocketfuel maps did include derived OSPF/IS-IS weights of

Topology                   Routers      Links            PoPs         Links
                           (original)   (inter-router)   (coalesced)  (inter-PoP)
Telstra (Australia) 1221   108          306              57           59
Sprintlink (US) 1239       315          1944             44           83
Ebone (Europe) 1755        87           322              23           38
Tiscali (Europe) 3257      161          656              50           88
Exodus (Europe) 3967       79           294              22           37
Abovenet (US) 6461         141          748              22           42

Table I. Rocketfuel topologies with AS number and name. The table lists the original number of routers and inter-router links, and the number of coalesced PoPs and inter-PoP links.

links, which were computed to match observed routes. In the absence of any other information on capacities, we need a way to deduce the link capacities from the weights. For this purpose, we assumed that the given link weights are the Cisco default setting for OSPF weights, i.e., inversely proportional to the link capacities [4]. The link capacities obtained in this manner turned out to be equal in both directions as expected.

There is also no available information on the ingress-egress traffic capacities at each node. Because ISPs commonly engineer their PoPs to keep the ratio of add/drop and transit traffic approximately fixed, we assumed that the ingress-egress capacity at a node is proportional to the total capacity of network links incident at that node. We also assume that $R_i = C_i$ for all nodes $i$, since network routers and switches have bidirectional ports (line cards) and hence the ingress and egress capacities are equal. Thus, we have $R_i\ (= C_i) \propto \sum_{e \in E^{+}(i)} u_e$.

B. Experiments and Results

We denote the throughput values for the different cases as follows: (i) $\lambda_{UNP}$ for unprotected, (ii) $\lambda_{FIP}$ for protecting router node failures with failure independent provisioning, and (iii) $\lambda_{FDP}$ for protecting router node failures with failure dependent provisioning. Clearly, $\lambda_{UNP} > \lambda_{FDP} \ge \lambda_{FIP}$ (the last inequality follows from the nature of failure dependent provisioning as explained in Section VI-B). We are also interested in the number of intermediate nodes $i$ with non-zero traffic split ratios, which we denote for the three cases by $N_{UNP}$, $N_{FIP}$, and $N_{FDP}$ respectively.

Topology                   λ_UNP    λ_FIP     λ_FDP
Telstra (Australia) 1221   2.0      1.9487    1.9650
Sprintlink (US) 1239       2.0      1.9365    1.9704
Ebone (Europe) 1755        2.0      1.9000    1.9454
Tiscali (Europe) 3257      2.0      1.9354    1.9701
Exodus (Europe) 3967       2.0      1.8750    1.9336
Abovenet (US) 6461         2.0      1.8738    1.9247

Table II. Throughput of two-phase routing for unprotected ($\lambda_{UNP}$), failure independent ($\lambda_{FIP}$), and failure dependent ($\lambda_{FDP}$) provisioning schemes for protecting router node failures.

1) Throughput: In Table II, we list the $\lambda$ values for the three cases for the six Rocketfuel topologies. When either the link capacities or ingress-egress capacities are scaled by a constant, the throughput values are scaled by the same constant. Hence, for comparison purposes, we have normalized the values so that the throughput for the unprotected case is $\lambda_{UNP} = 2.0$. In all cases, we have $\lambda_{FIP} < \lambda_{FDP}$ as expected.

We can obtain a rough theoretical estimate of the ratio $\lambda_{FIP}/\lambda_{UNP}$ (or, $\lambda_{FDP}/\lambda_{UNP}$) as follows. The average split ratio per node under normal (no-failure) conditions is $1/n$. When a router node fails, each of the other $n - 1$ nodes takes up on the average an additional split ratio of $1/(n(n - 1))$. Thus, the estimate for the ratio $\lambda_{FIP}/\lambda_{UNP}$ (or, $\lambda_{FDP}/\lambda_{UNP}$) is

$$\frac{1/n}{1/n + 1/(n(n - 1))} = \frac{n - 1}{n}$$

The ratios $\lambda_{FIP}/\lambda_{UNP}$ and $\lambda_{FDP}/\lambda_{UNP}$ computed from Table II agree quite closely with this theoretical estimate.

The overhead of protecting against router node failures can be measured by the percentage decrease in network throughput over that for the unprotected case. For failure independent provisioning, this is $O_{FIP} = (\lambda_{UNP} - \lambda_{FIP})/\lambda_{UNP}$. For failure dependent provisioning, this is $O_{FDP} = (\lambda_{UNP} - \lambda_{FDP})/\lambda_{UNP}$. These values are listed in Table III.

Topology                   O_FIP    O_FDP
Telstra (Australia) 1221   2.56%    1.75%
Sprintlink (US) 1239       3.17%    1.48%
Ebone (Europe) 1755        5.00%    2.73%
Tiscali (Europe) 3257      3.23%    1.49%
Exodus (Europe) 3967       6.25%    3.32%
Abovenet (US) 6461         6.31%    3.77%

Table III. Overhead of failure independent ($O_{FIP}$) and failure dependent ($O_{FDP}$) provisioning schemes for protecting router node failures compared to unprotected case for two-phase routing.
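The overhead figures and the $(n-1)/n$ estimate can be recomputed from the published tables with a short script. The sketch below copies the PoP counts from Table I and the throughput values from Table II; the names `DATA`, `overhead`, and `estimate` are introduced here for illustration only.

```python
# (coalesced PoPs n, lambda_FIP, lambda_FDP), taken from Tables I and II.
DATA = {
    "Telstra (Australia) 1221": (57, 1.9487, 1.9650),
    "Sprintlink (US) 1239": (44, 1.9365, 1.9704),
    "Ebone (Europe) 1755": (23, 1.9000, 1.9454),
    "Tiscali (Europe) 3257": (50, 1.9354, 1.9701),
    "Exodus (Europe) 3967": (22, 1.8750, 1.9336),
    "Abovenet (US) 6461": (22, 1.8738, 1.9247),
}
LAMBDA_UNP = 2.0  # normalized unprotected throughput

def overhead(lam):
    """Fractional throughput decrease relative to the unprotected case."""
    return (LAMBDA_UNP - lam) / LAMBDA_UNP

def estimate(n):
    """Rough theoretical estimate (n - 1) / n of the protected/unprotected ratio."""
    return (n - 1) / n

if __name__ == "__main__":
    for name, (n, fip, fdp) in DATA.items():
        print(f"{name}: O_FIP={100 * overhead(fip):.2f}% "
              f"O_FDP={100 * overhead(fdp):.2f}% "
              f"FIP ratio={fip / LAMBDA_UNP:.4f} FDP ratio={fdp / LAMBDA_UNP:.4f} "
              f"estimate={estimate(n):.4f}")
```

Printing the measured ratios next to the estimate makes it easy to see, per topology, how far each scheme sits from the back-of-the-envelope value.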

For failure independent provisioning, the overhead ranges from roughly 2.5% to 6.3% for the six topologies. For failure dependent provisioning, the overhead ranges from roughly 1.5% to 3.8% for the six topologies. Both overheads are quite low. Thus, it is relatively inexpensive to provide resiliency against router node failures in two-phase routing. Observe that the overhead of failure dependent provisioning is only marginally lower than that for failure independent provisioning. Hence, given the static optical layer provisioning property of failure independent provisioning in handling router node failures, it might be the preferred one among the two proposed schemes.

2) Number of Intermediate Nodes: In Table IV, we list the number of intermediate nodes $i$ with non-zero traffic split ratios for the three cases for the six Rocketfuel topologies. Interestingly, the number of intermediate nodes increases when we provide protection against router node failures (it is almost always the same for the failure independent and failure dependent schemes). The explanation is as follows. For the unprotected case, a small number of intermediate nodes is preferred: these nodes are presumably located at geographical center(s) of the network and provide the best opportunities for serving as intermediate nodes without increasing the length of end-to-end paths or decreasing throughput significantly. However, a small number of intermediate nodes is associated with a relatively large traffic split ratio for each such node. Thus, in the event of a router node failure at any of these intermediate nodes, a relatively large fraction of the traffic needs to be restored, thus increasing the resources reserved for restoration. Hence, in an effort to maximize the throughput, the algorithms proposed in this paper spread the traffic to many intermediate nodes so as to prevent any single split ratio from becoming too large.

Topology                   N_UNP    N_FIP    N_FDP
Telstra (Australia) 1221   1        39       39
Sprintlink (US) 1239       6        32       27
Ebone (Europe) 1755        6        20       20
Tiscali (Europe) 3257      5        31       31
Exodus (Europe) 3967       8        16       16
Abovenet (US) 6461         5        15       15

Table IV. Number of intermediate nodes in two-phase routing for unprotected ($N_{UNP}$), failure independent ($N_{FIP}$), and failure dependent ($N_{FDP}$) provisioning schemes for protecting router node failures.

X. CONCLUSION AND FUTURE WORK

Motivated by the emergence of new applications on the Internet that have created highly dynamic and changing traffic patterns and the need to route such traffic with QoS (Quality-of-Service) guarantees, the two-phase routing scheme was recently proposed that allows preconfiguration of the network such that all traffic patterns, permissible within the network's natural ingress-egress capacity constraints, can be handled in a capacity efficient manner without the necessity to detect any traffic changes in real-time. Quite surprisingly, the performance of two-phase routing has been shown, through evaluation on actual ISP topologies, to be within 1-5% of the optimal scheme that can possibly re-configure the network in response to traffic changes [9].

In this paper, we considered the realization of two-phase routing in IP-over-OTN networks where routers are interconnected over a switched optical backbone. Given the high unreliability of routers, we proposed two different schemes for making two-phase routing in IP-over-OTN resilient against router node failures: one that is failure node independent and static, and the other that is failure node dependent and requires limited optical layer reconfiguration after failure. We developed linear programming formulations for both schemes and a fast combinatorial algorithm for the second scheme so as to maximize network throughput. In each case, we determine (i) the optimal distribution of traffic to various intermediate routers for both normal (no-failure) and failure conditions, and (ii) provisioning of optical layer circuits to provide the needed inter-router links. We view this as important progress for two-phase routing in IP-over-OTN towards achieving carrier-class reliability so as to facilitate its future deployment in ISP networks.

One would ideally like the network to be quasi-static in its configuration and not require frequent adaptation to network events. ISPs typically use a combination of overprovisioning and dynamic network adaptation to avoid network congestion caused by unpredicted events. However, both overprovisioning and frequent adaptation lead to increased costs. In particular, frequent adaptation incurs high operational costs and risks further instability elsewhere in the network. Two-phase routing, with its extensions for router failure resiliency, can handle extreme traffic variability and router failures in a network with an almost static network configuration and without requiring high capacity overprovisioning. The ability to handle traffic variation with almost no routing adaptation will lead to more stable and robust Internet behavior.

We evaluated the throughput performance of two-phase routing with the two proposed schemes for protecting against router node failures on actual ISP topologies taken from the Rocketfuel project and compared it with the unprotected case. We conclude that it is relatively inexpensive to provide resiliency against router node failures in two-phase routing. Also, the overhead of failure dependent provisioning is only marginally lower than that for failure independent provisioning. 
Hence, given the static optical layer provisioning property of failure independent provisioning in handling router node failures, it might be the preferred one among the two proposed schemes. We also rationalized the observation that when maximizing throughput, the number of intermediate nodes is significantly higher for protecting against router node failures compared to the unprotected case. We are currently working on extending the scheme to other types of failures in IP-over-OTN, like fiber cuts and line card failures. Such failures are best restored at the optical layer through link restoration and path restoration mechanisms with backup bandwidth sharing for reducing restoration capacity overhead. In such scenarios, the first and second phase paths need to be protected at the optical layer. The handling of optical layer node failures in two-phase routing poses additional challenges. Failure of optical switches in non-intermediate nodes lying on Phase 1 or Phase 2 paths can be restored by using node detours in link restoration or node-disjoint primary/backup paths in path restoration at the optical layer. The failure of optical switches in intermediate nodes is naturally accommodated in two-phase routing by the approach in this paper – redistributing traffic split ratios to other intermediate nodes. Because an optical switch failure at a single node can lead to both of the above scenarios, the corresponding mechanisms need to be integrated. The above problems are the subject of our current ongoing research.

REFERENCES

[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice Hall, February 1993.
[2] D. Applegate and E. Cohen, "Making Intra-Domain Routing Robust to Changing and Uncertain Traffic Demands: Understanding Fundamental Tradeoffs", ACM SIGCOMM 2003, August 2003.
[3] S. Blake, et al., "An Architecture for Differentiated Services", RFC 2475, December 1998.
[4] Configuring OSPF, 1997. Cisco Systems, http://www.cisco.com/univerc/cc/td/doc/product/ software/ios113ed/113ed cr/np1 c/1cospf.htm.
[5] N. G. Duffield, P. Goyal, A. G. Greenberg, P. P. Mishra, K. K. Ramakrishnan, J. E. van der Merwe, "A flexible model for resource management in virtual private network", ACM SIGCOMM 1999, August 1999.
[6] N. Garg and J. Konemann, "Faster and Simpler Algorithms for Multicommodity Flow and other Fractional Packing Problems", 39th Annual Symposium on Foundations of Computer Science (FOCS), 1998.
[7] A. Kumar, R. Rastogi, A. Silberschatz, B. Yener, "Algorithms for provisioning VPNs in the hose model", ACM SIGCOMM 2001, August 2001.
[8] M. Kodialam, T. V. Lakshman, and S. Sengupta, "Efficient and Robust Routing of Highly Variable Traffic", Third Workshop on Hot Topics in Networks (HotNets-III), November 2004.
[9] M. Kodialam, T. V. Lakshman, and S. Sengupta, "A Versatile Scheme for Routing Highly Variable Traffic in Service Overlays and IP Backbones", Submitted for publication.
[10] C. Labovitz, A. Ahuja, and F. Jahanian, "Experimental Study of Internet Stability and Wide-Area Backbone Failures", University of Michigan Technical Report CSE-TR-382-98.
[11] R. Mahajan, N. Spring, D. Wetherall, and T. Anderson, "Inferring link weights using end-to-end measurements", 2nd ACM Internet Measurement Workshop, 2002.
[12] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, C. Diot, "Traffic Matrix Estimation: Existing Techniques and New Directions", ACM SIGCOMM 2002, August 2002.
[13] M. Reardon and S. Saunders, "Terabit Trouble", Data Communications, August 1999, pp. 11-16.
[14] A. Schrijver, Theory of Linear and Integer Programming, John Wiley & Sons, 1986.
[15] S. Sengupta, D. Saha, and V. P. Kumar, "Switched Optical Backbone for Cost-effective Scalable Core IP Networks", IEEE Communications Magazine, June 2003.
[16] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, S. Surana, "Internet Indirection Infrastructure", ACM SIGCOMM 2002, August 2002.
[17] R. Zhang-Shen and N. McKeown, "Designing a Predictable Internet Backbone Network", Third Workshop on Hot Topics in Networks (HotNets-III), November 2004.
[18] ILOG CPLEX, http://www.ilog.com.

XI. APPENDIX

In this section, we provide proofs for Theorems 1 and 2 from Section VIII for the approximation factor guarantee and running time of Algorithm FDP. We begin with some notation, then state some useful lemmas, and finally conclude with the proofs of the main theorems.

Given a set of dual weights $w(e, f)$, let $D(w)$ denote the dual objective function value and let $\Gamma(w)$ denote the minimum value of the LHS of dual program constraint (10) over all nodes $k, f \in N$, $k \ne f$. Then, solving the dual program is equivalent to finding a set of weights $w(e, f)$ such that $D(w)/\Gamma(w)$ is minimized. Denote the optimal objective function value of the latter by $\theta$, i.e., $\theta = \min_w D(w)/\Gamma(w)$.

We introduce some more notation before stating an important lemma. Let $w_{t-1}$ denote the weight function at the beginning of iteration $t$ of the while loop, and let $f_{t-1}$ be the value of $\sum_{j \in N} \alpha_j$ (primal objective function) up to the end of iteration $t - 1$. Recall that we defined $L = (n - 1)(n + 1)\left(\sum_{i \in N} R_i + \sum_{j \in N} C_j\right)$ and $L'$ as the minimum non-zero value of the $R_i$'s and $C_j$'s. Suppose the algorithm terminates after iteration $K$.

Lemma 1: At the end of every iteration $t$, $1 \le t \le K$, of Algorithm FDP, the following holds

$$\Gamma(w_t) \le \delta L \prod_{j=1}^{t} \left[ 1 + \frac{\epsilon}{\theta} (f_j - f_{j-1}) \right]$$

Proof: Let $\bar{f}, \bar{k} \in N$ be the nodes for which the LHS of dual constraint (10) is minimum and let $P_i$, $Q_j$, $P'_i$, $Q'_j$ be the corresponding paths (as defined earlier) along which flow is augmented during iteration $t$. Recall that the weights are updated as:

$$w_t(e, f) \leftarrow w_{t-1}(e, f) \left( 1 + \frac{\epsilon\, \Delta(e)}{u_e} \right) \quad \forall\, e,\ \forall\, f \ne \bar{f}$$

$$w_t(e, \bar{f}) \leftarrow w_{t-1}(e, \bar{f}) \left( 1 + \frac{\epsilon\, [\Delta(e) + \Delta'(e, \bar{f})]}{u_e} \right) \quad \forall\, e$$

where $\Delta(e)$ is the total working flow on link $e$, and $\Delta'(e, \bar{f})$ is the total restoration flow on link $e$ due to failure of router node $\bar{f}$ (both sent during iteration $t$). Using this, we have

$$\begin{aligned}
D(w_t) &= \sum_{e \in E, f \in N} u_e\, w_t(e, f) \\
&= \sum_{e \in E, f \in N} u_e\, w_{t-1}(e, f) + \epsilon \sum_{e \in E, f \ne \bar{f}} w_{t-1}(e, f)\, \Delta(e) + \epsilon \sum_{e \in E} w_{t-1}(e, \bar{f})\, [\Delta(e) + \Delta'(e, \bar{f})] \\
&= D(w_{t-1}) + \epsilon \sum_{e \in E, f \in N} w_{t-1}(e, f) \Big[ \sum_{i \in N_P(e)} \alpha R_i + \sum_{j \in N_Q(e)} \alpha C_j \Big] + \epsilon \sum_{e \in E} w_{t-1}(e, \bar{f}) \Big[ \sum_{i \in N_{P'}(e)} \alpha R_i + \sum_{j \in N_{Q'}(e)} \alpha C_j \Big]
\end{aligned}$$

Using the definition of the sets $N_P(e)$, $N_Q(e)$, $N_{P'}(e)$, $N_{Q'}(e)$, and that flow is augmented on paths $P_i$, $Q_j$, $P'_i$, $Q'_j$, we can rewrite the summations on the RHS of the above equation to obtain

$$\begin{aligned}
D(w_t) &= D(w_{t-1}) + \epsilon \alpha \Big[ \sum_i R_i \sum_{e \in P_i, f \in N} w_{t-1}(e, f) + \sum_j C_j \sum_{e \in Q_j, f \in N} w_{t-1}(e, f) + \sum_i R_i \sum_{e \in P'_i} w_{t-1}(e, \bar{f}) + \sum_j C_j \sum_{e \in Q'_j} w_{t-1}(e, \bar{f}) \Big] \\
&= D(w_{t-1}) + \epsilon \alpha \Big[ \sum_i R_i\, SP(i, \bar{f}) + \sum_j C_j\, SP(\bar{f}, j) + \sum_i R_i\, SP_{\bar{f}}(i, \bar{k}) + \sum_j C_j\, SP_{\bar{f}}(\bar{k}, j) \Big] \\
&= D(w_{t-1}) + \epsilon \alpha\, \Gamma(w_{t-1})
\end{aligned}$$

Using this for each iteration down to the first one, and noting that $\alpha = f_j - f_{j-1}$ in iteration $j$, we have

$$D(w_t) = D(w_0) + \epsilon \sum_{j=1}^{t} (f_j - f_{j-1})\, \Gamma(w_{j-1}) \qquad (12)$$

Now consider the weight function $w_t - w_0$. Clearly, $D(w_t - w_0) = D(w_t) - D(w_0)$. Because any path is at most $n - 1$ hops in length, it follows that the quantity $SP(i, k)$ (or $SP(k, j)$) is a sum of at most $n(n - 1)$ weights $w(e, f)$. Similarly, the quantity $SP_f(i, k)$ (or $SP_f(k, j)$) is a sum of at most $n - 1$ weights $w(e, f)$. Thus, $\Gamma(w_0) \le \sum_i (n - 1) n \delta R_i + \sum_j (n - 1) n \delta C_j + \sum_i (n - 1) \delta R_i + \sum_j (n - 1) \delta C_j = (n - 1)(n + 1)\, \delta \left(\sum_i R_i + \sum_j C_j\right) = \delta L$. Hence, we have $\Gamma(w_t - w_0) \ge \Gamma(w_t) - \delta L$. Since $\theta$ is the optimal dual objective function value, we have

$$\theta \le \frac{D(w_t - w_0)}{\Gamma(w_t - w_0)} \le \frac{D(w_t) - D(w_0)}{\Gamma(w_t) - \delta L}$$

whence,

$$D(w_t) - D(w_0) \ge \theta\, (\Gamma(w_t) - \delta L)$$

Using this in equation (12), we have

$$\Gamma(w_t) \le \delta L + \frac{\epsilon}{\theta} \sum_{j=1}^{t} (f_j - f_{j-1})\, \Gamma(w_{j-1}) \qquad (13)$$

The property claimed in the lemma can now be proved using inequality (13) and mathematical induction on the iteration number. We omit the details here, but point out that the induction basis case (iteration $t = 1$) holds since $w_0(e, f) = \delta\ \forall\, e \in E, f \in N$ and $\Gamma(w_0) \le \delta L$.

We now estimate the factor by which the objective function value $f_K$ in the primal solution when the algorithm terminates needs to be scaled to ensure that link capacity constraints are not violated.

Lemma 2: When Algorithm FDP terminates, the primal solution needs to be scaled by a factor of at most

$$\log_{1+\epsilon} \frac{1 + \epsilon}{\delta L'}$$

to ensure primal feasibility.

Proof: Consider any link $e$ and associated weight $w(e, f)$ for some node $f \in N$. We will show that the working flow on link $e$ plus the restoration flow on link $e$ due to failure of router node $f$ is at most $u_e$ when the primal solution is scaled by the above factor. The value of $w(e, f)$ is updated when flow is augmented on edge $e$ under either or both of the following circumstances:
• Link $e$ appears on any of the paths $P_i$, $Q_j$, in which case the flow is working traffic on this link, or
• Link $e$ appears on any of the paths $P'_i$, $Q'_j$, in which case the flow appears as restoration traffic on link $e$ under failure of router node $f$.

Let the sequence of flow augmentations (working plus restoration) on link $e$ that require update of weight $w(e, f)$ be $\Delta_1, \Delta_2, \ldots, \Delta_r$, where $r \le K$. Let $\sum_{t=1}^{r} \Delta_t = \kappa u_e$, i.e., the total flow (working traffic plus restoration traffic on failure of router node $f$) routed on link $e$ exceeds its capacity by a factor of $\kappa$.

Since the algorithm terminates when $\Gamma(w) \ge 1$, and since dual weights are updated by a factor of at most $1 + \epsilon$ after each iteration, we have $\Gamma(w_K) \le 1 + \epsilon$. Note that just before each augmentation mentioned above, the weight $w(e, f)$, with coefficient at least $L'$, is one of the summing components of $\Gamma(w)$. Hence, $L' w_K(e, f) \le 1 + \epsilon$. Also, the value of $w_K(e, f)$ is given by

$$w_K(e, f) = \delta \prod_{t=1}^{r} \left( 1 + \frac{\Delta_t}{u_e} \epsilon \right)$$

Using the fact that $(1 + \beta x) \ge (1 + x)^{\beta}\ \forall\, x \ge 0$ and any $0 \le \beta \le 1$, and setting $x = \epsilon$ and $\beta = \Delta_t / u_e \le 1$, we have

$$\frac{1 + \epsilon}{L'} \ge w_K(e, f) \ge \delta \prod_{t=1}^{r} (1 + \epsilon)^{\Delta_t / u_e} = \delta (1 + \epsilon)^{\sum_{t=1}^{r} \Delta_t / u_e} = \delta (1 + \epsilon)^{\kappa}$$

whence,

$$\kappa \le \log_{1+\epsilon} \frac{1 + \epsilon}{\delta L'}$$

Proof of Theorem 1: Using Lemma 1 and the fact that $1 + x \le e^x\ \forall\, x > 0$, we have

$$\Gamma(w_t) \le \delta L \prod_{j=1}^{t} e^{\frac{\epsilon}{\theta} (f_j - f_{j-1})} = \delta L\, e^{\epsilon f_t / \theta}$$

The simplification in the above step uses telescopic cancellation of the sum $(f_j - f_{j-1})$ over $j$. Since the algorithm terminates after iteration $K$, we must have $\Gamma(w_K) \ge 1$. Thus, $1 \le \Gamma(w_K) \le \delta L\, e^{\epsilon f_K / \theta}$, whence,

$$\frac{\theta}{f_K} \le \frac{\epsilon}{\ln(1/\delta L)} \qquad (14)$$

From Lemma 2, the objective function value of the feasible primal solution after scaling is at least

$$\frac{f_K}{\log_{1+\epsilon} \frac{1 + \epsilon}{\delta L'}}$$

The approximation factor for the primal solution is at most the (ratio) gap between the primal and dual solution. Using (14), this is given by

$$\frac{\theta}{f_K} \log_{1+\epsilon} \frac{1 + \epsilon}{\delta L'} \le \frac{\epsilon \log_{1+\epsilon} \frac{1 + \epsilon}{\delta L'}}{\ln(1/\delta L)} = \frac{\epsilon}{\ln(1 + \epsilon)} \cdot \frac{\ln \frac{1 + \epsilon}{\delta L'}}{\ln(1/\delta L)}$$

The quantity $\ln \frac{1 + \epsilon}{\delta L'} / \ln(1/\delta L)$ equals $1/(1 - \epsilon)$ for $\delta = \frac{1 + \epsilon}{L'} \big/ \left[ (1 + \epsilon) \frac{L}{L'} \right]^{1/\epsilon}$. Using this value of $\delta$, the approximation factor is upper bounded by

$$\frac{\epsilon}{(1 - \epsilon) \ln(1 + \epsilon)} \le \frac{\epsilon}{(\epsilon - \epsilon^2/2)(1 - \epsilon)} \le \frac{1}{(1 - \epsilon)^2}$$

Setting $1 + \epsilon' = 1/(1 - \epsilon)^2$ and solving for $\epsilon$, we get the value of $\epsilon$ stated in the theorem.

Proof of Theorem 2: We first consider the running time of each iteration of the algorithm during which nodes $\bar{f}$, $\bar{k}$ and associated paths $P_i$, $Q_j$, $P'_i$, $Q'_j$ are chosen to augment flow. Computation of shortest path costs $SP_f(i, j)\ \forall\, i \ne j,\ \forall\, f \in N$ involves $n$ all-pairs shortest path computations which can be implemented in $O(n^2 m + n^3 \log n)$ time using Dijkstra's shortest path algorithm with Fibonacci heaps [1]. All other operations within an iteration are absorbed by the time taken for these $n$ all-pairs shortest path computations, leading to a total of $O(n^2 (m + n \log n))$ time per iteration.

We next estimate the number of iterations before the algorithm terminates. Recall that in each iteration, flow is augmented along paths $P_i$, $Q_j$, $P'_i$, $Q'_j$, the value being such that the working flow $\Delta(e)$ plus the restoration flow $\Delta'(e, \bar{f})$ sent on link $e$ during that iteration is at most $u_e$. Thus, for at least one link $e$, the total flow sent equals $u_e$ and the weight $w(e, \bar{f})$ increases by a factor of $1 + \epsilon$. Accordingly, with each iteration, we can associate a weight $w(e, f)$ which increases by a factor of $1 + \epsilon$. Consider the weight $w(e, f)$ for fixed $e \in E$, $f \in N$. Since $w_0(e, f) = \delta$ and $w_K(e, f) \le (1 + \epsilon)/L'$, the maximum number of times that this weight can be associated with any iteration is

$$\log_{1+\epsilon} \frac{1 + \epsilon}{\delta L'} = \frac{1}{\epsilon} \left( 1 + \log_{1+\epsilon} \frac{L}{L'} \right) = O\left( \frac{1}{\epsilon} \log_{1+\epsilon} \frac{L}{L'} \right)$$

Since there are a total of $nm$ weights $w(e, f)$, the total number of iterations is upper bounded by $O\left( \frac{nm}{\epsilon} \log_{1+\epsilon} \frac{L}{L'} \right)$. Multiplying this by the running time per iteration, we obtain the overall algorithm running time as $O\left( \frac{n^3 m}{\epsilon} (m + n \log n) \log_{1+\epsilon} \frac{L}{L'} \right)$. Note that $\log \frac{L}{L'}$ is polynomial in $\log n$ and the number of bits used to represent the $R_i$ and $C_j$ values.