IEEE TRANSACTIONS ON RELIABILITY, VOL. 54, NO. 1, MARCH 2005


A Spare Bandwidth Sharing Scheme Based on Network Reliability

Kyu-Seek Sohn, Member, IEEE, Seung Yeob Nam, Student Member, IEEE, and Dan Keun Sung, Senior Member, IEEE

Abstract—Spare bandwidth is required for recovering network services from network faults. However, it degrades the efficiency of network utilization. Spare bandwidth demand can be reduced significantly by letting spare bandwidth be shared among several network services. Spare bandwidth reserved on a network element can be shared by a set of network services if they are not simultaneously affected by the same network fault. A new, more practical spare bandwidth sharing scheme, which is based on network reliability, is proposed in this paper. In the proposed scheme, multiple link failures are allowed with a given link failure rate, and a reasonable restoration level near 100%, while in the conventional scheme only a single link failure, and a 100% restoration level, are considered. To develop the spare bandwidth sharing scheme, we first investigate a framework for evaluating the reliability of path-based network services, and then we explain the proposed spare bandwidth sharing scheme with decision parameters such as the lifetime of the path, the restoration level, and the maximum number of working paths which can be protected by a backup link. Simulation results show that the proposed spare bandwidth sharing scheme requires a smaller amount of spare bandwidth than the conventional scheme.

jointness: the maximum number of primary paths that a backup link can protect with a given restoration level. In this paper, we use the terms working path, and primary path, interchangeably, relying on context for clarity.

NOTATION
F: set of failed links in the entire network
P_F: set of failed primary paths in the entire network
Q: set of primary paths protected by a backup link
Q_j^(n): j-th subset of Q consisting of n primary paths, where j indexes the possible subsets
Pr{N = n}: probability that the number of primary paths in Q which fail simultaneously at a given time is n
S_n: set of links, each of which is shared by n primary paths in Q, and not shared by any other primary paths in Q
S_≥n: set of links shared by n or more primary paths in Q
D_i: set of links dedicated for the i-th primary path in Q

Index Terms—Joint path failure, joint path failure probability, lifetime, restoration level, spare bandwidth sharing.

ACRONYMS1
MPLS: Multi-Protocol Label Switching
CR-LDP: Constraint-based Routed Label Distribution Protocol
LSP: Label Switched Path
D-LSP: Distributed Label Switched Path
C-LSP: Conventional Label Switched Path
SSBS: Statistical Spare Bandwidth Sharing
RAFT: Resource Aggregation for Fault Tolerance
FMT: Fault Management Table

DEFINITIONS
working path: the path established for traffic service
backup path: the path established for detouring traffic from a failed working path
primary path: a working path protected by a backup path is called the primary path of that backup path

Manuscript received June 18, 2003; revised March 19, 2004 and May 22, 2004. Associate Editor: M. Xie. K.-S. Sohn is with the Department of Information and Communications, Hanyang Cyber University, Seoul, Korea (e-mail: [email protected]). S. Y. Nam and D. K. Sung are with the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 305-701 Korea (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TR.2004.842532

1The singular and plural of an acronym are always spelled the same.

E_i: event that the i-th primary path in Q fails at a given time t
Ē_i: event that the i-th primary path in Q does not fail at a given time t
J_n: event that n primary paths in Q fail jointly at a given time t
F_l: event that link l fails at a given time t

I. INTRODUCTION

THE IMPORTANCE of survivable Internet networks is growing rapidly because the Internet has become a social infrastructure supporting a variety of applications, such as trading, banking, education, entertainment, governmental services, and defense, and it has become prone to network faults [1]. Network survivability is the ability of a network to maintain or restore an acceptable level of performance during network failures by applying various restoration techniques, and to mitigate or prevent service outages from network failures by applying preventive techniques [2]. Network fault protection is one preventive technique used to achieve network survivability by pre-establishing backup paths, and assigning spare bandwidth to the backup paths in the network. The amount of spare bandwidth of a backup path should be sufficient for restoring the network service by detouring the traffic from the failed links or paths to the backup path. Because the spare bandwidth is a redundancy, it degrades the bandwidth utilization of the network.

0018-9529/$20.00 © 2005 IEEE


Fig. 1. A network topology inappropriate for spare bandwidth sharing, known as a dumbbell-shaped topology.

The amount of spare bandwidth must be kept as small as possible. Many studies have considered the design of fault-tolerant networks with minimal spare bandwidth [3]–[12]. Methods for assessing network reliability have also been studied [13]. Spare bandwidth sharing schemes [3]–[5] were introduced to reduce the amount of spare bandwidth required in a network. Dovrolis et al. [5] proposed a spare bandwidth sharing scheme called the Resource Aggregation for Fault Tolerance (RAFT) scheme, which uses a fault management table (FMT) stored at each port of a link. The RAFT scheme protects the network against a single link failure. In a single link failure scenario, the condition for spare bandwidth sharing is that backup paths simultaneously passing a link can share the spare bandwidth reserved on the link if their primary paths have no common link. Many studies have adopted this spare capacity sharing scheme in spare bandwidth minimization problems [6]–[8]. Spare bandwidth minimization can be considered as a process of aggregating as many backup paths, whose working paths are link-disjoint, as possible onto the same link. Here, we define the term "link-disjoint": two paths are said to be link-disjoint if there is no link used by both of the two paths. However, there are several factors that impede spare bandwidth sharing in practical networks, such as the Internet. The first of these factors is a network topology called a "dumbbell-shaped topology," with a geographical bottleneck shared by paths originating from, and destined to, nodes residing on different sides of the bottleneck, as shown in Fig. 1. The backup paths in this situation cannot share spare bandwidth. Another factor which impedes spare bandwidth sharing is traffic engineering, such as Multiprotocol Label Switching (MPLS) with the Constraint-based Routed Label Distribution Protocol (CR-LDP) [14], [15].
Traffic engineering is essential to solve congestion problems, and improve the utilization of network resources, on the Internet. Traffic engineering techniques distribute traffic demand over multiple paths for load balancing [16], [17]. The use of multiple paths increases the possibility that paths meet each other on the same link. Thus, it is difficult to find disjoint working paths in a network where traffic engineering is used. We propose a spare bandwidth sharing scheme which improves bandwidth sharing by relaxing the condition for spare bandwidth sharing. To explain the main idea of the proposed spare bandwidth sharing scheme, we suppose that m backup paths pass a link in the network, each of which backs up one working path, with a 1:1 correspondence between backup & working paths. In other words, the link backs up m working paths in the network. If the probability that more than n of the working paths fail during a repair time is lower than the value


corresponding to a required restoration level, the backup link can protect the working paths, and guarantee the required restoration level, by reserving only the amount of spare bandwidth required for n working paths. As a result, the backup paths can protect all of the working paths by sharing the spare bandwidth of n working paths on the link. This is spare bandwidth sharing in a statistical sense, and is thus called "statistical spare bandwidth sharing."

The rest of this paper is organized as follows. A network path model is defined in Section II, and the failure probability of multiple paths (the joint path failure probability) is analyzed in terms of the link failure probability using the network path model. A new spare bandwidth sharing scheme is proposed in Section III, where the amount of spare bandwidth reserved on a backup link is determined in terms of the joint path failure probability, the lifetimes of paths, and the required restoration level. Section IV illustrates the performance of the proposed spare bandwidth sharing scheme through simulation, and conclusions are presented in Section V.

II. RELIABILITY OF PATHS

In this section, we investigate the reliability of the network. First, we define a path model, and then we formulate the probability (the joint path failure probability) that multiple paths fail simultaneously. Using the joint path failure probability, we develop the proposed spare bandwidth sharing scheme in the next section. We assume the following:

• The considered network is the Internet based on MPLS.
• Requirements for the lifetime, and the restoration level, of a path are not changed during operation of the path.
• Link failure is the only failure scenario considered in the network.
• A failed link is repaired within a given repair time.
• All links are initially maintained in a normal operation state.

MPLS has been introduced to support QoS guarantees, and traffic engineering, in the Internet.
To transfer traffic, an MPLS network sets up a path between the source, and destination, nodes. The path is called a Label Switched Path (LSP) [18]. A path consists of a series of links between the end nodes. A path fails if any of its links fails. The MPLS network establishes a backup path for each working path [19], [20].

A. Path Model

Let L denote the set of links in a network. Let B_l be the set of backup paths passing link l, and Q_l be the set of primary paths for the backup paths passing link l. The term "dedicated link" of a primary path means that the primary path does not share the link with any other primary path included in Q_l. The term "shared link" of a primary path means that two or more primary paths included in Q_l share the link. Fig. 2 shows an example of primary paths, and their backup paths. Only the end nodes of paths are shown, and the transit nodes are omitted for simplicity. b_1 and b_2 are backup paths passing link l; p_1 and p_2 are the primary paths protected by b_1 and b_2, respectively. B_l denotes the set of backup paths passing link l, and Q_l denotes the



Fig. 2. An example of primary paths (p_1, p_2), and backup paths (b_1, b_2).

Fig. 3. An occurrence model of path failures in a network.

set of their primary paths. The links used only by backup paths are called "backup links." The primary paths share the shared links, and each has its own dedicated links. A primary path, and its backup path, are link-disjoint.

B. Joint Path Failure Model

A link consists of transceiver (transmitter & receiver) boards located at the end nodes of the link, and a cable assembly. Upon failure of a link, repair for the failure involves two phases, referred to as the service restoration phase, and the link repair phase. In the service restoration phase, the source router of each path affected by a link failure restores the service by rerouting traffic to the backup path. The typical service restoration time ranges from tens of milliseconds to several seconds. In the link repair phase, the failed components of the link are replaced with spare parts, or repaired. The repair time ranges from several minutes for replacing failed transceiver boards, to several days for repairing or replacing failed sections of optical cable. Thus, the probability of the occurrence of a network failure during the repair time period cannot be ignored. After completing repair of the failed link, the rerouted primary paths return to their old routes, and the backup link can apply its spare bandwidth to the next failure. The backup link must use the spare bandwidth to carry the traffic rerouted from the failed primary paths for at least one repair time period of the failed link.

Fig. 3 shows an occurrence model of multiple path failures observed at a backup link. There are five occurrences of link failures. All repair times are assumed to be equal to the time period T_R. a_i represents the number of primary paths affected by the i-th link failure, and n(t) represents the number of primary paths simultaneously residing in the failed state (the specific values for this example are shown in Fig. 3).


Fig. 4. A joint path failure model in a discrete time domain.

The minimum amount of spare bandwidth that a backup link must reserve is determined by the maximum number of primary paths which can be simultaneously in the failed state. If n is the largest such number, the backup link must reserve an amount of spare bandwidth equal to the sum of the working bandwidths of those n primary paths. In the path failure occurrence model, if the repair times of two or more failed primary paths overlap with each other, the paths are said to "fail jointly." Such a failure is called a "joint path failure," or a "joint failure" of paths. To simplify this network reliability problem, we consider the problem in a discrete time domain. Because only the maximum number of primary paths residing in the failed state simultaneously is of interest, we can model this joint path failure in a discrete time domain without losing generality. The interval between time points is one repair time T_R, and path failures in this joint path failure model are assumed to occur at times t = kT_R, k = 0, 1, 2, ..., as shown in Fig. 4. We assume that the link failure probability of each link is fixed, i.e., it does not change over time, and that the failure event at one time for a specific link is independent of the previous failure events for the link. Then, the interval between link failures for a link follows a geometric distribution in the discrete time domain. The link failure events at one link are assumed to be independent of those at the other links.

C. Joint Path Failure Probability

The joint path failure probability is the probability that a joint path failure occurs at time t = kT_R, where k = 0, 1, 2, .... The joint path failure event is defined for the primary paths protected by a single backup link. We define the symbols used to express the joint path failure probability observed at a specific backup link, and thus the subscripts identifying the backup link are omitted. Because we consider the failure probability at a specific time t, we also drop the time index from the following expressions for simplicity.
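Before the probability is formulated, the shared/dedicated-link classification of Section II-A can be made concrete with a short sketch. This is our own illustration (the function and path names are hypothetical, not from the paper): for the primary paths protected by one backup link, a link used by exactly one path is dedicated, and a link used by two or more paths is shared.

```python
from collections import Counter

def classify_links(primary_paths):
    """primary_paths: dict mapping path id -> set of link ids (the set Q_l)."""
    # Count how many primary paths use each link.
    usage = Counter(l for links in primary_paths.values() for l in links)
    # Dedicated links of a path: used by that path only.
    dedicated = {pid: {l for l in links if usage[l] == 1}
                 for pid, links in primary_paths.items()}
    # Shared links: used by two or more primary paths.
    shared = {l for l, c in usage.items() if c >= 2}
    return dedicated, shared

# Two primary paths sharing link "s" (cf. Fig. 2):
ded, sh = classify_links({"p1": {"a", "s"}, "p2": {"b", "s"}})
```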
The joint path failure probability of any n primary paths protected by a given backup link can be expressed as

(1)


where


is given by

(2)

From (1), and (6), the joint failure probability of any n primary paths observed at a specific backup link is obtained as

The detailed derivation of the above equation is given in Appendix A. To investigate the term in (1), which is defined by (2), in more detail, we define the following:

(7)

We show an example of the joint path failure probability for the case of n = 2. Suppose a set of two paths Q = {p_1, p_2}, and specialize the general notation to this case. u_j is the j-th link shared by 3 or more primary paths. s_j is the j-th link shared by both paths p_1 & p_2. d_{1,j}, and d_{2,j}, are the j-th links dedicated for paths p_1, and p_2, respectively. q_u, q_s, q_{d1}, and q_{d2} are the failure probabilities of links u_j, s_j, d_{1,j}, and d_{2,j}, respectively. N_u, N_s, N_{d1}, and N_{d2} denote the number of links in the corresponding sets, respectively. m is the total number of primary paths in Q whose backup paths pass the backup link. From the above definitions, and (2), we can easily obtain that

(3)

Because

paths cannot fail if no link fails, we can obtain that

Pr{the n paths fail, all other paths are alive, and no failure occurs in the remaining links}

(4)

The following relation can be easily derived from the definitions of the link sets:

We define the auxiliary quantity as

(5)

Then, from (3), (4), and (5), we obtain

(8)

We derive the joint failure probability of three paths in Appendix B. The probabilities for the other cases are also given in Appendix B.
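The joint path failure probability derived above can be cross-checked numerically. The sketch below is our own illustration (assuming independent per-slot link failures with a common failure probability q, consistent with the assumptions of Section II-B); it estimates Pr{exactly n of the primary paths fail in a slot} by Monte Carlo.

```python
import random

def estimate_joint_failure(primary_paths, q, n, trials=100_000, seed=1):
    """Estimate Pr{exactly n of the primary paths are failed in one slot}.

    primary_paths: dict path id -> set of link ids; q: per-link, per-slot
    failure probability (assumed common here for simplicity).
    """
    rng = random.Random(seed)
    links = set().union(*primary_paths.values())
    hits = 0
    for _ in range(trials):
        # Draw an independent up/down state for every link.
        failed_links = {l for l in links if rng.random() < q}
        # A path fails if any of its links fails.
        failed_paths = sum(1 for p in primary_paths.values() if p & failed_links)
        hits += (failed_paths == n)
    return hits / trials
```

For two link-disjoint single-link paths with q = 0.5, the estimate converges to 0.25 for n = 2, matching the product of the individual path failure probabilities.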

III. STATISTICAL SPARE BANDWIDTH SHARING

(6)

We propose, in this section, a new spare bandwidth sharing scheme using the joint path failure probability defined & formulated in the previous section. First, we calculate the minimum sufficient amount of spare bandwidth of a backup link required for protecting all the primary paths corresponding to the backup


link while guaranteeing an acceptable level of restoration performance. Then, we investigate the parameters used for setting up the statistical spare bandwidth sharing (SSBS) scheme, such as the jointness of the primary paths corresponding to a backup link, the lifetime of the primary path, and the restoration level.

The unit scale of time is the repair time of a link failure, and the joint failure of more than n paths is assumed to occur at times t = kT_R, as described in Section II-B. There, we also assume that events occurring at different times are independent of each other. The probability that no joint failure of more than n paths occurs by time t is obtained as

A. The Amount of Spare Bandwidth for a Backup Link Suppose that primary paths are backed up by a backup link . Let vector denote a sorted vector of bandwidth demanded primary paths defined as , of the . We define that restoration level as the probability that the failed paths are restored successfully. or more primary paths fail simultaneously If with a joint path failure probability of , the backup link can protect the primary paths with a restoration level of by reserving an amount of spare bandwidth equal to the total bandwidth demand of the primary paths. The amount of spare bandwidth of backup link required for protecting primary paths with restoration level is determined as follows:

Pr{no joint failure of more than n paths by time t} = Pr{no joint failure of more than n paths at time 0} × · · · × Pr{no joint failure of more than n paths at time t}

(11)

where each factor represents the event that no joint failure of more than n paths occurs at the corresponding time. Thus, the probability that the first joint failure of more than n paths occurs at time t is

(9)

This means that all the backup paths passing the backup link can protect all of their primary paths with the given restoration level by sharing spare bandwidth of which the amount equals the total bandwidth demand of n primary paths. As shown in (9), for a smaller value of n, a smaller amount of spare bandwidth is required on the backup link. Here, n is referred to as the "jointness" of the primary paths for a backup link. The jointness for a backup link is the maximum number of primary paths that can be protected by the backup link with a given restoration level requirement, and a given amount of spare bandwidth. In other words, the jointness expresses the degree of "sharedness" of the spare bandwidth among the backup paths on a backup link. Statistical spare bandwidth sharing is the spare bandwidth sharing scheme which uses statistical parameters, such as the joint path failure probability, and the restoration level, to calculate the amount of spare bandwidth to be shared on a backup link, as described above. To evaluate (9), the jointness should be determined from the probability mass of the joint path failure, as in Section III-C.
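Under the sorted-demand reading of (9) above, the reserved amount for jointness n is the sum of the n largest bandwidth demands among the primary paths of the backup link. A minimal sketch (our own illustration, not the paper's code):

```python
def spare_bandwidth(demands, n):
    """Spare bandwidth for jointness n: sum of the n largest demands.

    demands: working-bandwidth demands of the primary paths protected
    by one backup link; n: jointness determined as in Section III-C.
    """
    return sum(sorted(demands, reverse=True)[:n])

# Four 10-Gbps paths protected with jointness 2 need only 20 Gbps spare.
reserved = spare_bandwidth([10, 10, 10, 10], 2)
```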


Pr{joint failure of more than n paths occurs at time t} × Pr{no joint failure of more than n paths at any time before t}

(12)
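Under the independence assumption of Section II, if p denotes the per-slot probability that a joint failure of more than n paths occurs, the time to the first such failure is geometric, which matches the product form of (11)-(12). The sketch below is our own illustration; p is a hypothetical input, computed elsewhere from the joint path failure probability.

```python
def pmf_first_joint_failure(p, t):
    """Pr{first joint failure of more than n paths occurs at slot t}, t >= 1."""
    return (1.0 - p) ** (t - 1) * p

def cdf_first_joint_failure(p, t):
    """Pr{a joint failure of more than n paths occurs by slot t}."""
    return 1.0 - (1.0 - p) ** t
```

The cumulative mass over slots 1..t equals the complement of the survival product in (11), so the two functions are consistent by construction.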

C. Determining the Jointness of Primary Paths for a Backup Link

The parameters which determine the jointness of the primary paths are the probability distribution of the time to the joint path failure, the minimum lifetime of the primary paths, and the restoration level. The lifetime of a path is defined as the time period during which the path is utilized by a customer, or by the network operator. The restoration level is determined, and managed, by network operators. For example, the restoration level may be higher than 99% for important traffic, or may not be required at all for best-effort traffic. To determine the jointness for a backup link, we introduce a threshold t_th, which is defined as

(13)
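The threshold of (13) can be computed directly from the geometric model: it is the largest time by which the probability of a joint failure of more than n paths stays within 1 − R. A sketch (our own illustration; p is again the hypothetical per-slot joint failure probability):

```python
def threshold(p, R):
    """Largest t such that Pr{joint failure of more than n paths by t} <= 1 - R.

    p: per-slot probability of a joint failure of more than n paths;
    R: required restoration level. Assumes 0 < p < 1 and 0 < R < 1.
    """
    assert 0.0 < p < 1.0 and 0.0 < R < 1.0
    t = 0
    # Grow t while the cumulative joint-failure probability stays within 1 - R.
    while 1.0 - (1.0 - p) ** (t + 1) <= 1.0 - R:
        t += 1
    return t
```

If the minimum lifetime among the primary paths does not exceed this threshold, the jointness n is acceptable, which is the comparison expressed by (15).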

B. Probability Mass of the Time to the Joint Path Failure

To determine the jointness at a backup link, we need to investigate the probability mass of the time to the joint failure of more than n primary paths. Let T_n be the time taken until the first joint failure of more than n paths occurs. The probability that T_n = t is defined as

where R is the restoration level. The restoration level is defined as the ratio of the number of restored path failures to the total number of failed paths in a given time period of network operation, as follows:

R = (number of path failures restored) / (total number of path failures)

If

Pr{T_n = t} = Pr{no joint failure of more than n paths by time t − 1, and a joint failure of more than n paths at time t} = Pr{joint failure of more than n paths occurs at time t | no joint failure of more than n paths occurs by time t − 1} × Pr{no joint failure of more than n paths occurs by time t − 1}

(14)

satisfies the following relation: (15)

(10)

where t_life is the given lifetime of the path, then a joint failure of more than n paths occurs within t_life from the present time with a probability lower than 1 − R. In other words, the probability that a joint failure of more than n paths does not occur within t_life from the present time is higher than R. Thus, the backup link can statistically guarantee



the restoration level R by reserving an amount of spare bandwidth equal to the sum of the working bandwidths of n primary paths.

D. Protection Period

The time period during which the backup link can guarantee protection of its primary paths is called the "protection period" for the joint failure of paths. Upon expiration of a protection period, a new protection period begins, and the amount of spare bandwidth is determined anew. If the protection period is too short, frequent estimation of the spare bandwidth may become a heavy burden on the network. If the protection period is too long, the network state may vary, and the backup link may have insufficient, or overestimated, spare bandwidth for protecting its primary paths. In a normal network, the network state varies whenever a path is released or established. It is assumed that a path is released when its lifetime expires, and the backup link does not know when a new path will be established. Thus, the most appropriate protection period at a backup link is the minimum among the remaining lifetimes of the primary paths for the backup link.

IV. PERFORMANCE EVALUATION

A. Performance Indices

Network bandwidth is the sum of the total working bandwidth, and the total spare bandwidth, required in the network. The network bandwidth reduction ratio between the compared networks is defined as

(16)

where W_l and S_l represent the amounts of working bandwidth, and spare bandwidth, required on link l in the network using the proposed SSBS scheme, respectively; and W'_l and S'_l represent the amounts of working bandwidth, and spare bandwidth, reserved on link l in the network using the conventional spare bandwidth sharing scheme, respectively. The amount of spare bandwidth on each link is determined to be sufficient to guarantee the desired restoration level in both networks.
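The reduction ratio of (16) amounts to comparing the total (working plus spare) bandwidth across all links of the two networks. A minimal sketch with hypothetical per-link tallies (our own illustration):

```python
def reduction_ratio(w_ssbs, s_ssbs, w_conv, s_conv):
    """Network bandwidth reduction ratio of the SSBS network vs. the
    conventional network; each argument maps link id -> bandwidth."""
    total_ssbs = sum(w_ssbs.values()) + sum(s_ssbs.values())
    total_conv = sum(w_conv.values()) + sum(s_conv.values())
    return 1.0 - total_ssbs / total_conv
```

With equal working bandwidth on both sides, the ratio reduces to the relative saving in spare bandwidth, which is why sharing more spare bandwidth directly improves it.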
In this performance evaluation, the RAFT scheme is used as the conventional spare bandwidth sharing scheme, and the spare bandwidth S'_l is determined as

(17)

where L is the set of links in the network, f is the failed link, P_f is the set of primary paths affected by the failure of link f, b_p is the backup path for primary path p, δ_{p,l} is a binary link indicator variable that equals 1 if the backup path for primary path p uses link l, and 0 otherwise, and w_p is the working bandwidth of primary path p.

B. Performance Evaluation Environment

To evaluate the performance of the proposed statistical spare capacity sharing scheme, we use the Toronto-MAN network as

Fig. 5. The Toronto metropolitan area network (MAN) test-bed network.

a test-bed network (Fig. 5). There are 600 possible node pairs in the test-bed network. We suppose that an MPLS network based on distributed label switched paths (D-LSP) [21], [22] is operated with the SSBS scheme on the test-bed network. A conventional LSP (C-LSP) mechanism using the RAFT spare bandwidth sharing scheme is compared with the D-LSP mechanism. In the C-LSP mechanism, a single LSP is established between the nodes of an end-node pair for demand traffic, while the D-LSP mechanism sets up multiple sub-LSPs between the nodes of an end-node pair, as mentioned in Section I. All demand traffic is assumed to have a bandwidth requirement of 10 Gbps. During the simulation time there are always 180 LSP connections between node pairs which are randomly selected from the 600 node pairs in the network. The lifetime of each LSP is determined by a value randomly selected from 15, 35, 45, and 60 days. After an LSP is released when its lifetime expires, a new LSP is immediately established between randomly selected node pairs.

C. Threshold for Guaranteeing the Restoration Level

Equation (12) describes the mass function of the probability for the time to the joint failure of more than n paths. The threshold time point t_th is obtained from (13). At t_th, the value of the cumulative mass function of the joint path failure probability equals 1 − R, where R is the required restoration level. Fig. 6 shows an example of the joint path failure probability mass function observed at a link in the test-bed network for n = 0, 1, and 2, for a given link failure rate. The repair time of a link failure is assumed to be one day (24 hours) long. The simulation time is 500 000 repair times. The threshold values obtained are 5, and 13 repair times, respectively. If the lifetime required by a path is smaller than the threshold of a backup link, the backup link guarantees the restoration of a joint path failure of paths with a probability of 99% (Fig. 6(a)). Fig. 6(a) is obtained from Fig. 6(b) by magnifying the range from 0 to 300 repair times. As n increases, the peak of the mass function is lowered, the distribution is broadened, and the threshold values satisfying (13) become larger. Thus, for a given lifetime, and link failure rate, we can find the value of n which satisfies (15) by increasing n from 1 up to a small number of primary paths on each backup link.



Fig. 7. Threshold for various link failure rates, and jointness values of primary paths.

TABLE II
NETWORK BANDWIDTH REQUIREMENT OF THE PROPOSED D-LSP MECHANISM WITH THE SSBS SCHEME

Fig. 6. Joint path failure probability distribution, and thresholds for the restoration level. (a) Thresholds for guaranteeing a restoration level of 99%; (b) joint path failure probability masses.

TABLE I THRESHOLD ( ) TO GUARANTEE A REQUIRED RESTORATION LEVEL ( )

spare bandwidth to be assigned to a link with moderate computational complexity, even when the jointness is significant [23]. Fig. 7 shows the relationship among the threshold, the jointness, the link failure rate, and the restoration level. The threshold value increases more rapidly as the link failure rate is lowered. As a result, the required restoration level can be achieved by using a value of n which is significantly smaller than the number of backup paths passing a backup link. Thus, more spare bandwidth sharing can be expected on each backup link.

D. Restoration Level, and Network Bandwidth

The threshold value is determined by the jointness of the primary paths (i.e., the value of n), the link failure rate, and the required restoration level R. Table I shows the threshold values for two jointness values, for given link failure rates, and restoration levels. Each row of the table is calculated on a randomly selected backup link with the same number of 11 backup paths. The ratio of the threshold values for different values of n indicates that the threshold value increases rapidly as the value of n increases when the link failure rate is relatively high. Thus, the amount of spare bandwidth sufficient for restoring all the primary paths is determined with a moderate value of n even with a high link failure rate. This means that the proposed scheme can calculate the amount of

The amount of network bandwidth for a target restoration level is evaluated for the D-LSP mechanism, and the C-LSP mechanism, in the simulation environment specified in Section IV-B. A larger amount of network bandwidth is required to maintain a higher restoration level for the SSBS scheme. Table II shows the network bandwidth of the D-LSP mechanism with SSBS for given link failure rates, and restoration levels R, where R̂ denotes the measured restoration level. The number of D-LSP connections is 300. The network bandwidth increases by 6% to 7% as the restoration level is increased from 95% to 99%. A lower link failure rate requires a smaller amount of network bandwidth, with moderately decreased ratios of 5% to 6%. The restoration level, and the network bandwidth demand, of the proposed D-LSP mechanism with SSBS, and of the C-LSP mechanism with RAFT, are compared in Table III. The network bandwidth is evaluated whenever a path is established or released. Three iterations are made in the simulation, and the average of the performance indices is shown in the table.



TABLE III NETWORK BANDWIDTH REQUIREMENTS OF THE D-LSP MECHANISM WITH SSBS, AND THE C-LSP MECHANISM WITH RAFT

We will investigate the probability term in the above equation in detail. Because the number of possible combinations of n paths which fail simultaneously among the m paths is given by the binomial coefficient, we can express the probability as

Compared with the C-LSP mechanism with RAFT, the D-LSP mechanism with SSBS reduces the total network bandwidth by 11.2%, and 17.7%, for 180, and 540 LSPs, respectively, at the cost of a degradation in the restoration level of less than 1%. The network bandwidth reduction ratio is greatly improved as the number of LSP connections increases. Thus, the SSBS scheme outperforms the conventional RAFT scheme in a highly utilized network.

V. CONCLUSIONS

(A.2)

We introduce a novel framework to analyze the reliability of a network providing connection-oriented packet transfer services, such as MPLS networks, and propose a new spare bandwidth sharing scheme based on this framework. The proposed spare bandwidth sharing scheme uses simple statistical characteristics of network reliability, including the link failure rate, the path lifetime, and the restoration level, to reduce the spare bandwidth demand compared with the conventional RAFT scheme, by relaxing the constraint for sharing spare bandwidth. In this scheme, the statistical jointness among primary paths is calculated using the link, and path, failure probability mass functions. If the threshold time to the joint failure of more than n paths is longer than the minimum of the lifetimes of the primary paths, the backup paths can share the spare bandwidth even though their primary paths are not physically link-disjoint. A comparison study shows that the D-LSP mechanism using the proposed statistical spare bandwidth sharing scheme requires less spare bandwidth than the C-LSP mechanism using RAFT, at the cost of a small degradation in the restoration level. Further study will include application of the statistical spare bandwidth sharing scheme to a network where each LSP connection requires a different level of restoration.

Because the former event implies the latter, we obtain that

(A.3) Because of of

, is independent . In addition, Because the set of links belonging to is exclusive of , is independent . Thus, combining (A.2), and (A.3) yields

(A.4) where

is defined as

Because no path in longing to fails, pressed as

APPENDIX

,

fails if and only if no link becan be ex-

A. Derivation of Joint Path Failure Probability of Equation (1) The joint path failure probability of any primary paths pro, can be expressed as tected by a given backup link, (A.5)

paths fail at time no link shared by or more paths fails at time (A.1)

where

.

SOHN et al.: A SPARE BANDWIDTH SHARING SCHEME BASED ON NETWORK RELIABILITY

Combining (A.1), (A.4), and (A.5) yields

131
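The joint path failure probability derived above can be checked numerically for small topologies by brute-force enumeration of link states. The sketch below assumes, as in the derivation, that link failures are statistically independent; the function name and data layout are illustrative, not the paper's notation.

```python
from itertools import product
from math import prod

def joint_path_failure_pmf(link_fail_prob, path_links):
    """Exact pmf of the number of simultaneously failed primary paths,
    assuming statistically independent link failures.

    link_fail_prob: dict mapping each link to its failure probability
        q^l at the given time
    path_links: list of sets; path_links[i] is the set of links used by
        the i-th primary path
    Returns a list P where P[j] = Pr{exactly j paths fail}.
    """
    links = sorted(link_fail_prob)
    M = len(path_links)
    P = [0.0] * (M + 1)
    # Enumerate every link-failure state (2^L states; fine for small cases).
    for state in product([False, True], repeat=len(links)):
        failed = {l for l, f in zip(links, state) if f}
        prob = prod(link_fail_prob[l] if f else 1.0 - link_fail_prob[l]
                    for l, f in zip(links, state))
        # A path fails iff at least one of its links fails.
        j = sum(1 for pl in path_links if pl & failed)
        P[j] += prob
    return P
```

As a usage example, three primary paths sharing one link `s` and each owning one dedicated link, all with failure probability 0.01, give P[0] = 0.99^4, matching (B.1)-style direct computation.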

B. Joint Path Failure Probability for $j = 0$, 1, and 3

Let $W$ be the set of primary paths protected by a given backup link, and let $M$ denote the number of primary paths protected by the backup link. Let $l_m^s$ be the $m$-th link shared by more than one primary path, and let $N_s$ denote the total number of such shared links. $l_{i,n}^d$ represents the $n$-th dedicated link of the $i$-th primary path in $W$. The superscript $(j)$, $j = 0, \ldots, M$, denotes the number of paths which experience joint failure for a given backup link. We use the superscripts $p$ and $l$ to represent the path and the link, respectively, and $q$ and $r$ to represent the failure probability and the reliability, respectively. For example, $P^{(1)}$ denotes the probability that one of the primary paths in $W$ fails at time $t$. We omit the time variables in the following expressions for notational simplicity.

1) $P^{(0)}$: Probability that no path fails at time $t$. In this case, the probability that all of the $M$ primary paths are normal at $t$ is written as

$$P^{(0)} = \prod_{\text{all dedicated and shared links } l} (1 - q^l), \quad \text{(B.1)}$$

where $q^l$ is the link failure probability at time $t$.

2) $P^{(1)}$: Probability that one of the $M$ paths fails at $t$. The probability that one of the $M$ paths fails at time $t$ is

$$P^{(1)} = \sum_{i=1}^{M} \Pr\{\text{the } i\text{-th path fails, all other paths are normal, no shared link fails}\}. \quad \text{(B.2)}$$

3) $P^{(3)}$: Probability that three paths fail jointly at $t$. We derive the joint path failure probability for the case of $M = 3$ from (7). $W_1^{(3)}$ is the set of three paths chosen arbitrarily from $W$, and $l^s$ denotes the link shared by the three paths. Then we obtain (B.3) and (B.4).
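For the three-path case above, the probabilities can also be written in closed form under the same independent-link-failure assumption. The sketch below assumes a topology in which the three primary paths share exactly one link and each has one dedicated link; the names `qs` and `qd` stand in for the shared-link and dedicated-link failure probabilities and are illustrative, not the paper's symbols.

```python
def pmf_three_paths_one_shared(qs, qd):
    """Closed-form joint failure pmf for M = 3 primary paths that share a
    single link (failure probability qs) and each have one dedicated link
    (failure probabilities qd[0..2]), with independent link failures.
    Mirrors the structure of cases (B.1), (B.2), and (B.3)-(B.4)."""
    r = [1.0 - q for q in qd]   # dedicated-link reliabilities
    rs = 1.0 - qs               # shared-link reliability
    # No path fails: every link survives, cf. (B.1).
    p0 = rs * r[0] * r[1] * r[2]
    # Exactly one path fails: its dedicated link fails while the shared
    # link and all other dedicated links survive, cf. (B.2).
    p1 = rs * (qd[0]*r[1]*r[2] + r[0]*qd[1]*r[2] + r[0]*r[1]*qd[2])
    # All three fail jointly: the shared link fails, or it survives and
    # all three dedicated links fail.
    p3 = qs + rs * qd[0] * qd[1] * qd[2]
    # Exactly two fail: remaining probability mass.
    p2 = 1.0 - p0 - p1 - p3
    return [p0, p1, p2, p3]
```

With uniform link failure probability 0.01, this gives the same pmf as brute-force enumeration over link states, which is a useful sanity check on the closed forms.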


Kyu-Seek Sohn received the B.S. and M.S. degrees in electronic engineering from Hanyang University, Seoul, Korea, in 1982 and 1984, respectively; and the Ph.D. degree in electrical engineering from the Korea Advanced Institute of Science & Technology (KAIST), Daejeon, Korea, in 2003. From December 1983 to March 2002, he was a Research Engineer at LG Cable Ltd., where he was engaged in various projects, including the development of an optical fiber outside-plant operation & management system. In 2004, he joined the faculty of Hanyang Cyber University, where he is a full-time lecturer in the Department of Information & Communications. His research interests include performance & reliability of communication systems & networks, high-speed networks, and QoS of the Internet.

Seung Yeob Nam received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 1997, 1999, and 2004, respectively. He is currently a postdoctoral research fellow at CyLab, Carnegie Mellon University, supported by both CyLab and the Postdoctoral Fellowship Program of the Korea Science & Engineering Foundation (KOSEF), where he is working in the areas of Internet security and survivable networks. His research interests include network performance monitoring, scalable provisioning of QoS in the Internet, Internet security, high-speed switching systems, and wireless networks. Dr. Nam received the Best Paper Award at the APCC 2000 Conference, and the Bronze Prize in the 2004 Samsung Humantech paper contest.

Dan Keun Sung received the B.S. degree in electronics engineering from Seoul National University in 1975; and the M.S. and Ph.D. degrees in electrical & computer engineering from the University of Texas at Austin, in 1982 and 1986, respectively. From May 1977 to July 1980, he was a Research Engineer at the Electronics & Telecommunications Research Institute (ETRI), where he was engaged in various projects, including the development of an electronic switching system. In 1986, he joined the faculty of KAIST, where he is currently a Professor in the Department of Electrical Engineering and Computer Science. He was Director of the Satellite Technology Research Center (SaTReC) of KAIST from 1996 to 1999. He is an Editor of IEEE Communications Magazine, and also an Editor of the Journal of Communications and Networks. His research interests include mobile communication systems & networks, high-speed networks, next-generation IP-based networks, traffic control in wireless & wireline networks, signaling networks, intelligent networks, performance & reliability of communication systems, and microsatellites. He has published more than 300 papers in journals & conferences, and has filed more than 100 patents or patents pending. He is a Senior Member of IEEE, and a Member of IEICE and KICS. He is also a Member of Phi Kappa Phi and Tau Beta Pi.