from proc. 1st Int’l Conf. on Design of Reliable Communication Networks DRCN’98, Brugge, Belgium, May 1998, paper O9

A Distributed Real Time Path Restoration Protocol with Performance close to Centralized Multi-commodity Max Flow

R.R. Iraschko¹, Member IEEE, W.D. Grover²,⁴, Senior Member IEEE, M.H. MacGregor³,⁴, Member IEEE

1. MCI, 2400 North Glenville Drive, Richardson, Texas, USA, 75082
2. TRLabs, #800 10611 - 98 Ave., Edmonton, Alberta, Canada T5K 2P7
3. Telus, Flr 20E, 10020 100 Street, Edmonton, Alberta, Canada, T5J 0N5
4. Dept. of Electrical Engineering, University of Alberta, Edmonton, Alberta

Abstract: Intense competition between transport network service providers and the widespread deployment of vulnerable high-capacity fiber optic transport facilities have created the need for networks with short restoration times. This paper presents an optimized distributed real time path restoration mechanism, named OPRA, capable of restoring network failures quickly with performance close to centralized multi-commodity max-flow. OPRA synthesizes restoration pathsets by autonomous, database-free, self-organizing interaction between nodes. Simulation results predict that OPRA will restore network failures in minimally redundant networks in less than two seconds.

I. Introduction

A. Background

The development of Digital Crossconnect Systems (DCS), and of networks with physically diverse routes, promotes the use of mesh restoration techniques. Mesh-based survivable network architectures exploit the intelligence of DCS-based transport networks to minimize the amount of spare capacity required to protect working demands, relative to self-healing rings or 1:1 protection switching [3, 4, 8, 9, 11, 12, 13, 14]. Mesh restoration algorithms should restore network failures within two seconds using a minimum amount of spare capacity. Of all restoration architectures [21], mesh restorable networks employing distributed path restoration have the greatest potential to satisfy this goal. The main contribution of the present research is a distributed dynamic path restoration algorithm, named OPRA (Optimized Path Restoration Algorithm), which largely satisfies this goal.

B. Prior Work

To date several distributed span restoration algorithms have been reported [4, 9, 11, 12, 17], many of which mention the ability to perform path restoration with the same process [9, 11, 17]. In general, any span restoration algorithm can be turned into a rudimentary path restoration scheme by iteratively applying the span restoration algorithm to all affected source-destination demand pairs. While this method can be used in path (and hence node) restoration, the recovery patterns obtained from uncoordinated concurrent (or arbitrary sequential) execution of a span restoration algorithm may not yield any form of pro-rated allocation of recovery levels amongst affected demand pairs. Furthermore, such a method is not guaranteed to synthesize the maximum number of restoration paths that is topologically feasible. A few distributed path restoration algorithms have also been reported [10, 18], but none of these attempts to find a pathset which restores the maximum amount of lost demand that is topologically feasible. Distributed path restoration algorithms developed to date have focused on achieving restoration within the two second call-dropping threshold, and have given capacity efficiency secondary consideration. The work here, on the other hand, attempts to configure the surviving spare links¹ of a path restorable network into a pathset that is equivalent to a multi-commodity max-flow pathset, and to do so within the two second call-dropping threshold.

II. Central Principles of OPRA

A. The Interference Heuristic for Coordinating Pathset Formation

A distributed path restoration algorithm must restore the maximum amount of lost demand that is topologically feasible regardless of a network's connectivity and its spare and working capacity placement. Considering the number of permutations and combinations of restoration paths possible between all source and destination node pairs affected by a failure, the use of a heuristic to find the pathset which maximizes network restorability² (Rn) is warranted. Most distributed algorithms use a 'first come, first served' approach, which is effectively a shortest path heuristic [6]. The proposed distributed path restoration algorithm uses a new interference heuristic, which is explained in this section.

1. In this work a link is only one unit of transport capacity on a span, e.g. a single DS3, STS-1, OC-12, etc.
2. Network restorability is defined as

$$R_n = \frac{\sum_{i=1}^{S} \min(w_i, k_i)}{\sum_{i=1}^{S} w_i},$$

where $S$ is the number of spans in the network, $w_i$ is the working capacity on span $i$, and $k_i$ is the number of restoration paths the restoration mechanism is able to synthesize after the failure of span $i$.
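The restorability definition in footnote 2 can be checked with a few lines of code. This is a minimal sketch; the function name and the per-span numbers in the usage note are hypothetical and serve only to show how the min(w_i, k_i) cap keeps a span from counting as more than fully restored.

```python
from typing import Sequence


def network_restorability(working: Sequence[int], restored: Sequence[int]) -> float:
    """Rn = sum_i min(w_i, k_i) / sum_i w_i over all S spans (footnote 2).

    working[i]  -> w_i, working links on span i
    restored[i] -> k_i, restoration paths synthesized after span i fails
    """
    if len(working) != len(restored):
        raise ValueError("need one (w_i, k_i) pair per span")
    total_working = sum(working)
    if total_working == 0:
        raise ValueError("no working capacity to restore")
    # Extra paths beyond w_i do not help, hence the min().
    return sum(min(w, k) for w, k in zip(working, restored)) / total_working
```

For three hypothetical spans with w = (4, 2, 6) and k = (4, 1, 8), the numerator is 4 + 1 + 6 = 11 and Rn = 11/12 ≈ 0.917; the surplus paths on the third span are not credited.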




OPRA is based on insights gained from extensive emulation and exploration of potential processes for path restoration. One of the main principles identified is that, to maximize restorability, one should defer the creation of restoration paths which render a large number of other restoration paths infeasible. OPRA has a built-in mechanism which prefers restoration paths that eliminate the fewest other potential restoration paths. This principle is referred to as the interference principle, and its distributed implementation in OPRA is one of the main contributions of this work.

The interference principle is illustrated in Figure 1. Two relations are affected by the failure of the span between nodes 8 and 9. Given a network with a single spare link per span, Figure 1 shows the k-shortest link disjoint replacement paths for each affected demand pair in isolation. When the two sets of restoration paths are overlaid, it becomes clear that some paths are more "costly" than others when evaluated in terms of the number of other restoration paths rendered infeasible by that path's existence. For example, restoration path B does not prevent the formation of any other path in either of the two pathsets, whereas implementation of restoration path C prevents formation of paths D, F, and G. In Figure 1 the overall restorability is maximized in the available sparing when the paths which eliminate the fewest other paths are chosen to restore the failure. For example, five units of demand can be restored using the restoration paths with the lowest interference numbers, i.e. paths B, D, E, F, and G, while at most 3 units of demand can be restored if the restoration paths with the largest interference numbers, i.e. paths A and C, are used. Mathematically, the interference principle can be expressed as

$$I_i^{r,p} = \sum_{\substack{x=1\\ x \neq r}}^{D_i} \; \sum_{y=1}^{P_i^{x}} \; \sum_{\substack{j=1\\ j \neq i}}^{S} \delta_{i,j}^{r,p}\,\delta_{i,j}^{x,y}, \qquad \forall\, i = 1, 2, \ldots, S;\;\; \forall\, r = 1, 2, \ldots, D_i;\;\; \forall\, p = 1, 2, \ldots, P_i^{r} \tag{1.0}$$

where:
$I_i^{r,p}$ is the interference number of the $p$th eligible restoration path from relation $r$ after the failure of span $i$,
$D_i$ is the total number of demand pairs to be restored after the failure of span $i$,
$P_i^{r}$ is the total number of eligible restoration paths for demand pair $r$ upon the failure of span $i$,
$S$ is the total number of spans in the network, and
$\delta_{i,j}^{r,p}$ takes the value of 1 if the $p$th restoration path for demand pair $r$ after the failure of span $i$ uses span $j$, and 0 otherwise.

Figure 1. Concept of Interference Numbers. (a) Network topology showing one spare link per span and the demand pairs to be recovered; (b) k-shortest link disjoint restoration paths between nodes 1 and 8 in isolation; (c) k-shortest link disjoint restoration paths between nodes 2 and 5 in isolation; (d) interference between individual restoration pathsets:
Path A: interference number 2 (contention on spans 2-3 and 2-8)
Path B: interference number 0
Path C: interference number 3 (contention on spans 1-5, 5-7, and 7-8)
Path D: interference number 1 (contention on span 1-5)
Path E: interference number 1 (contention on span 2-3)
Path F: interference number 1 (contention on span 5-7)
Path G: interference number 2 (contention on spans 2-8 and 7-8)

A distributed path restoration algorithm does not have access to a database of potential restoration paths, which makes it impossible to use the interference principle as defined in equation (1.0) in a distributed implementation, as this implies a global network view. By design, OPRA never has more than a node-local view. Shifting from a global to a node-local view requires associating interference numbers with the state-based signals used to establish the framework for interaction between nodes, known as statelets. The adaptation of the interference principle in equation (1.0) to a distributed implementation based on statelets defines the interference heuristic.

Distributed Restoration Algorithms (DRAs) like the SHN [5] and OPRA incorporate some form of controlled rebroadcasting. OPRA uses the interference heuristic to control the rebroadcasting process in a way that results in a complete set of coordinated restoration paths between many node pairs in a single pass. This pathset is consistent with the exact quantities of spare links available on each span in the network, and maximizes Rn. In OPRA, each concurrent broadcast mesh established at a tandem node forwards one restoration statelet on one link in each span terminated at this node, except the span on which the restoration statelet arrived. The basic target broadcast pattern at a source node aims to forward on each span a number of restoration statelets equal to the number of working paths lost by this node. Often the target broadcast pattern at a node cannot be satisfied fully for each restoration statelet, because a span can only support a number of restoration statelets equal to the number of spare links on that span. The interference heuristic mediates the competition between restoration statelets for rebroadcast in such cases.

The number of other restoration statelets that cannot obtain rebroadcast on a given span determines that span's interference number. The interference number of a span is calculated by counting the number of restoration statelets competing to be forwarded on the span and subtracting from this sum the number of statelets the span can support. For example, in Figure 2 (a) two restoration statelets on span 1, one restoration statelet on span 2, and one restoration statelet on span 3 require rebroadcast on the single spare link available on span 4, resulting in a span interference number of three for span 4. Likewise, restoration statelets originating at a node must compete with incoming signals for available spares. As shown in Figure 2 (b), the node wants to transmit two restoration statelets on each span it terminates and also satisfy the single restoration statelet received on span 2, resulting in an interference number of zero for span 1.

When there are fewer spares on a span than there are restoration statelets wanting to obtain rebroadcast through it, the span's interference number is a positive integer. When there are at least as many spares on a span as there are restoration statelets wanting to access it, the span's interference number is zero. A span is never allowed to have a negative interference number, as shown in Figure 2 (b). During restoration the interference number of a span changes as restoration statelets appear and disappear at a node. This calculation associates larger values with spans that are in high demand relative to their spare capacity, and smaller values with spans that are in low demand relative to their spare capacity. The interference number of each span a restoration statelet traverses is added to the value of that statelet's interference number field. A statelet's interference number field is initially zero and accumulates as the statelet is rebroadcast at tandem nodes. When the target broadcast patterns of all statelets competing for rebroadcast at a node are satisfied in order of their interference numbers, from lowest to highest, as shown in Figure 3, a distributed implementation of the interference principle is achieved.
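Equation (1.0) and the Figure 1 table can be replayed centrally to see why admitting low-interference paths first restores more demand. This is a sketch under stated assumptions, not OPRA's distributed mechanism: each path is encoded only by the contended spans listed in the Figure 1 table (the rest of each route is omitted because it cannot change the count), so the span sets are a reconstruction rather than the real routes. Because the k-shortest paths of one relation are link disjoint, summing shared spans over all other paths gives the same result as the x ≠ r restriction, and the failed span never appears in the sets, so j ≠ i holds trivially. The names `PATHS`, `interference_numbers`, and `greedy_pathset` are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical encoding of Figure 1 (one spare link per span): each path is
# reduced to the spans it contends for; uncontended spans are omitted.
PATHS: Dict[str, FrozenSet[str]] = {
    "A": frozenset({"2-3", "2-8"}),
    "B": frozenset(),
    "C": frozenset({"1-5", "5-7", "7-8"}),
    "D": frozenset({"1-5"}),
    "E": frozenset({"2-3"}),
    "F": frozenset({"5-7"}),
    "G": frozenset({"2-8", "7-8"}),
}


def interference_numbers(paths: Dict[str, FrozenSet[str]]) -> Dict[str, int]:
    """Eq. (1.0): count the spans each path shares with every other path."""
    return {
        p: sum(len(spans & other) for q, other in paths.items() if q != p)
        for p, spans in paths.items()
    }


def greedy_pathset(paths: Dict[str, FrozenSet[str]],
                   spare_per_span: int = 1,
                   lowest_first: bool = True) -> List[str]:
    """Admit paths in interference order while every span they use has a spare."""
    inum = interference_numbers(paths)
    # sorted() is stable, so ties keep the A..G insertion order.
    order = sorted(paths, key=lambda p: inum[p], reverse=not lowest_first)
    used: Dict[str, int] = {}
    chosen: List[str] = []
    for p in order:
        if all(used.get(s, 0) < spare_per_span for s in paths[p]):
            for s in paths[p]:
                used[s] = used.get(s, 0) + 1
            chosen.append(p)
    return chosen
```

`interference_numbers(PATHS)` reproduces the table (A 2, B 0, C 3, D 1, E 1, F 1, G 2). Admitting lowest-first yields B, D, E, F, G (five units), while highest-first yields C, A, B (three units), matching the counts stated in the text.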

Figure 2. Span Interference Numbers. (a) Tandem node: span interference numbers for spans 1 to 4 are 3-3=0, 4-2=2, 4-3=1, and 4-1=3. (b) Combined source/tandem node: span interference numbers for spans 1 to 4 are 3-3=0, 2-3=-1 (floored to 0), 3-3=0, and 3-1=2.

Figure 3. Examples of basic target broadcast patterns, with restoration statelets ranked by the value of their interference numbers: (a) tandem node; (b) combined source/tandem node.
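The span interference number and the lowest-first rebroadcast ranking can be sketched as a single snapshot at a tandem node. Assumptions, labeled loudly: OPRA recomputes these numbers continuously as statelets appear and disappear, whereas this sketch evaluates one instant; only the tandem-node target pattern (one link on every span except the arrival span) is modeled; and the names `Statelet` and `rebroadcast`, along with the statelets and spare counts in the usage note, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


def span_interference(competing: int, spares: int) -> int:
    """Statelets competing for a span minus the statelets it can carry, floored at zero."""
    return max(0, competing - spares)


@dataclass
class Statelet:
    name: str          # hypothetical label used only in this sketch
    arrival_span: int  # a tandem node never rebroadcasts onto the arrival span
    int_no: int        # accumulated interference number field


def rebroadcast(statelets: List[Statelet], spares: Dict[int, int]) -> Dict[str, Dict[int, int]]:
    """Grant spare links lowest interference number first.

    Returns, per statelet, the outgoing spans it obtained and the interference
    number each forwarded copy carries (sender's field + that span's number).
    """
    competing = {span: sum(s.arrival_span != span for s in statelets) for span in spares}
    span_int = {span: span_interference(competing[span], spares[span]) for span in spares}
    free = dict(spares)
    forwarded: Dict[str, Dict[int, int]] = {}
    for s in sorted(statelets, key=lambda st: st.int_no):
        copies: Dict[int, int] = {}
        for span in sorted(spares):
            if span != s.arrival_span and free[span] > 0:
                free[span] -= 1
                copies[span] = s.int_no + span_int[span]
        forwarded[s.name] = copies
    return forwarded
```

`span_interference` reproduces both panels of Figure 2, including the 2-3 case floored to zero. As a hypothetical usage, with spares {1: 1, 2: 1, 3: 2} and statelets a (arrived on span 1, field 0), b (span 2, field 2), and c (span 3, field 1), the call returns {"a": {2: 1, 3: 0}, "c": {1: 2}, "b": {3: 2}}: statelet a is served first, and b, which has the highest interference number, obtains only the last spare on span 3.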

B. The End-node Bottleneck Effect and Bidirectional Flooding

While the interference heuristic can be used by a DRA to defer using paths which traverse spans with relatively few spare links, it does not explicitly defer paths which tandem through the source or destination nodes of lost demands. It is nevertheless advantageous to defer the formation of such paths, because the spare links incident with the source and destination of a failed working path are often the limiting resource for restoring the demand originating at that node. If the spare links incident at a node are used to restore that node's lost demand, rather than to tandem a restoration path for another node pair, two spare links can restore two units of lost demand instead of one. Figure 4 illustrates the end-node bottleneck issue when multiple node pairs search for replacement paths simultaneously. Assume that node pair (1-4) has lost one unit of demand and (2-3) has lost three units of demand. When only one of the nodes of a relation affected by a failure searches for replacement paths, referred to as ordinary flooding in Figure 4 (a), some replacement paths may traverse the end-nodes of other affected relations. In Figure 4 (a) the black restoration path for relation (1-4) traverses node 3, which is the destination for relation (2-3). If (1-4) restores one unit of lost capacity along the path traced by the forward flooding black restoration statelet¹, i.e. restoration statelet B in Figure 4 (a), then relation (2-3) is limited to restoring one unit of demand instead of three. The problem, then, is that sub-optimal tandem-role path choices can starve out the potential restoration paths for other pairs right at their end-node bottlenecks. If bidirectional flooding is adopted, however, the spare links incident with both the source and the destination of a failed working path are occupied from the start, so that they may be used efficiently to restore demand originating at that node.
With bidirectional flooding, as in Figure 4 (b), restoration statelets are transmitted on each of the links terminated at nodes 1, 2, 3, and 4 at the start of the process. This tends to prevent early traversal of the source or destination of a relation by tandem-role paths for other relations, and allows the creation of four restoration paths, one for relation 1-4 and three for relation 2-3, as shown in Figure 4 (b). Like OPRA, two other distributed span restoration algorithms to date also employ bidirectional flooding [6, 7]. However, their main intent in doing so is to reduce the restoration time rather than to form optimized² restoration pathsets. Although bidirectional flooding may also decrease the restoration time of the distributed path restoration algorithm presented here, OPRA uses it primarily as part of the strategy to optimize the restoration pathset by avoiding the end-node bottleneck traversal problem: with bidirectional flooding, spans local to an end node of a demand pair are quickly seized into anchoring paths for that demand pair. In summary, OPRA is a distributed path restoration algorithm that uses the interference heuristic and bidirectional flooding to solve the restoration problem. These are the two key traits of OPRA. A full implementation of OPRA is explained in detail in the following section.
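The end-node bottleneck reduces to simple link accounting at an end node: a path restoring the node's own demand uses one incident spare link, while a path that only tandems through the node uses two. The sketch below is an upper bound under that assumption; the function name is hypothetical, and the usage figures assume node 3 in Figure 4 terminates three spans with one spare link each. That assumption is inferred from the statement that relation (2-3) drops from three units to one, not read from the figure.

```python
def own_demand_limit(lost_units: int, incident_spares: int, tandem_paths: int) -> int:
    """Upper bound on units of a node's own lost demand it can still restore.

    Each restoration path for the node's own demand consumes one incident
    spare link; each path tandeming through the node for another relation
    consumes two (one in, one out). Far-end constraints are ignored.
    """
    if min(lost_units, incident_spares, tandem_paths) < 0:
        raise ValueError("counts must be non-negative")
    return min(lost_units, max(0, incident_spares - 2 * tandem_paths))
```

Under the stated assumption, `own_demand_limit(3, 3, 0)` gives 3 (bidirectional flooding keeps node 3's spares for relation (2-3)), whereas `own_demand_limit(3, 3, 1)` gives 1 once path B tandems through node 3.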

Figure 4. Principle of End-Node Bottlenecking. (a) Ordinary flooding: network topology showing one spare link per span and the initiation of ordinary flooding; restoration statelet A has interference number 1 and restoration statelet B has interference number 0; steady state of the network until node 4 reverse links restoration statelet B. (b) Bidirectional flooding: network topology showing one spare link per span and the initiation of bidirectional flooding; recognition of matches between black and grey statelets at node 5 initiates reverse linking; recognition of matches between grey restoration statelets at nodes 1 and 4 initiates reverse linking; final state of the network with four bidirectional restoration paths.

1. The process of tracing back the forward flooding path of a restoration statelet is known as reverse linking.
2. The term optimal, when used in conjunction with restoration pathset, implies a pathset which maximizes Rn.

III. Details of the Optimized Distributed Path Restoration Algorithm (OPRA)

In the implementation described here, the nodes of a network in which OPRA is deployed interact solely through the links between them. Restoration occurs as a network-level by-product of the isolated actions of each node, and the processor at each node acts as a statelet-processing engine during execution of the OPRA task. After a failure, both end-nodes terminating a severed working path, named A and Z in Figure 5, begin transmitting statelets that form the base or root of a broadcast mesh of a single source-destination-index statelet family. All statelets initiated by a source node are assigned the same interference number. As a statelet traverses the network, its interference number increases as statelets on other index, source, and destination families arrive at tandem nodes. Though the interference number of a statelet may decrease when a statelet from another family disappears at a tandem node, it is never less than zero. Any increase or decrease in a statelet's interference number is propagated down each branch of that family's broadcast mesh.

Unless a restoration path is one hop long, the statelets initiated by A and Z as shown in Figure 5 will in general be rebroadcast through the network and meet at some tandem nodes. When a statelet initiated by node A collides with a statelet initiated by node Z, a potential restoration path is identified. This event is called a match in OPRA. After a match, the reverse linking indicator of the forward flooding statelet initiated by node A is set to the value of the index of the statelet initiated by node Z, and conversely for the reverse linking indicator of the statelet initiated by node Z.

Figure 5. Forward Flooding. (a) time = t: statelet families from A (source = A, destination = Z) and from Z (source = Z, destination = A), each with indices 1 to 3, flood through nodes 1, 2, and 3; at the match, the reverse linking indicator field is set to 2 for the grey statelet and 3 for the black statelet. (b) time = t + …: forward flooding statelets and reverse linking complemented statelets.

Once the reverse linking indicator is set to a non-null value, the statelet from node A follows a path paralleling the forward flooding path of the statelet initiated by node Z, as shown in Figure 5. A statelet with the reverse linking indicator set is referred to as a complemented statelet. The arrival of a complemented statelet paired with a forward flooding statelet at a tandem node causes the complement condition to be propagated toward the source of the forward flooding statelet, and the cancellation of all other statelets on that source-destination-index family. This reopens the local tandem node competition for new statelets on other families. Reverse linking is complete when a complemented statelet reaches the source of the forward flooding statelet with which it is matched.

In order to synthesize a restoration pathset which maximizes Rn, as tested and confirmed by the results presented later, OPRA allows the forward flooding statelet paired with a complemented statelet to disappear after reverse linking is continued locally at one node. This disappearance stops the reverse linking process. For example, consider Figure 6, in which the grey forward flooding statelet on link 5 is received at node 2 before the black reverse linking statelet is received on link 3. Since the interference number of the grey statelet received on link 5 is less than that of the black precursor¹ on link 2, the grey statelet on link 5 is allowed to supplant the black statelet on link 3. Consequently, when the black complemented statelet is received on link 3 at node 2, reverse linking is stopped. Had the reverse linking statelet arrived at node 2 before the grey statelet on link 5, node 2 would have relayed the reverse linking statelet along the path traced by the black precursor on link 2, and prevented the grey statelet on link 5 from transmitting any forward flooding statelets.

Figure 6. Reverse linking (forward flooding statelets and reverse linking complemented statelets, shown at times t and t1).

1. A precursor of a restoration statelet family is the statelet which is rebroadcast locally at a tandem node, forming the root of the rebroadcast pattern locally at this node for this family.
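The match and the complement handling at a tandem node can be sketched with a small statelet record. This is an illustration under assumptions, not the authors' implementation: the field names, the per-family `TandemEntry` table, and the link numbers and indices in the usage note are hypothetical, and `None` stands in for the null reverse linking indicator.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Family = Tuple[str, str, int]  # (source, destination, index)


@dataclass
class Statelet:
    source: str
    destination: str
    index: int
    int_no: int = 0
    reverse_link: Optional[int] = None  # None models the null indicator

    @property
    def family(self) -> Family:
        return (self.source, self.destination, self.index)


def try_match(a: Statelet, z: Statelet) -> bool:
    """Match two uncomplemented statelets flooding in opposite directions."""
    opposite = a.source == z.destination and a.destination == z.source
    if opposite and a.reverse_link is None and z.reverse_link is None:
        a.reverse_link = z.index  # each side records the other's index
        z.reverse_link = a.index
        return True
    return False


@dataclass
class TandemEntry:
    """Hypothetical node-local state for one forward flooding family."""
    precursor_link: int
    outgoing_links: Set[int] = field(default_factory=set)


def on_complement(table: Dict[Family, TandemEntry], family: Family,
                  arrival_link: int) -> Tuple[int, List[int]]:
    """A complemented statelet paired with `family` arrived on `arrival_link`.

    Returns the link on which to relay the complement (toward the source, via
    the precursor) and the links whose statelets of this family are cancelled.
    """
    entry = table[family]
    if arrival_link not in entry.outgoing_links:
        raise ValueError("complement must arrive on a branch of this family")
    cancelled = sorted(entry.outgoing_links - {arrival_link})
    entry.outgoing_links = {arrival_link}  # only the branch on the path survives
    return entry.precursor_link, cancelled
```

For instance, if an A-originated statelet with index 3 meets a Z-originated statelet with index 2, `try_match` sets their indicators to 2 and 3 respectively. At a node where family ("Z", "A", 2) arrived on link 1 and was rebroadcast on links 3, 4, and 5, a complement arriving on link 4 is relayed on link 1 and the branches on links 3 and 5 are cancelled, freeing those spares for other families.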

When the grey forward flooding statelet on link 3 is received at node 1, the black forward flooding statelet on link 4 is cancelled along with the black reverse linking statelet on link 3. Subsequently the broadcast patterns of the precursor on link 1 and the precursor on link 3 at node 1 are satisfied to the greatest extent possible, consistent with the overall rank of the statelets. By suspending the reverse linking process of statelets with higher interference numbers, OPRA resolves itself into the restoration paths with the lowest interference numbers. Unless the reverse linking process of a forward flooding statelet with a large interference number is fast, it may not succeed, as depicted in Figure 6. Given that all the branches of a broadcast mesh seek a match simultaneously, many reverse linking processes may be initiated at the same time for a given statelet family, each racing to collapse the broadcast mesh on itself. Only the reverse linking process which succeeds in cutting off all other branches in the broadcast mesh survives this race; all other reverse linking processes from that family die. This mechanism helps self-allocate the available spare links amongst the relations to be restored.

To determine whether the two complemented statelets initiated after a match reach their destinations, the end nodes of the demand pair to be restored (A and Z in Figure 7) perform a loop-back test. This test is initiated after the complemented statelet destined for the node with the smaller ID, node A in Figure 7, reaches its destination. It involves transmitting a statelet, referred to as a confirmation statelet, along the path traced by the complemented statelet. If this confirmation statelet returns to its source, node A in Figure 7, a valid restoration path exists.

The OPRA processes just described occur concurrently, at different stages and times, at all nodes involved in a restoration event. Forward flooding proceeds simultaneously for several statelet families while other statelet families may be reverse linking or performing a loop-back test. Each family of statelets competes against the others to expand in the forward flooding process. Successful statelet families collapse in the process of reverse linking. In the final state, only bidirectional complementary statelet pairs persist. Each pair runs the length of one continuous, non-branching, non-looping restoration path that condensed out of the mesh of one statelet family.

Figure 7. Performing a loop-back test (end nodes A and Z; forward flooding statelet, reverse linking complemented statelet, match, and confirmed statelet performing a loop-back test).

IV. Test Network

OPRA's ability to restore single and multiple span failures was tested in 18 different network models [1]. The results from one of these tests are presented in detail here. The topology of the representative test network is shown in Figure 8. The capacity placement in this network was designed using the Integer Program (IP) presented in [2] to minimize the total spare capacity of a path restorable network (i.e. case 2 as defined in [2]). Details of the network and capacity design are presented in Table 1.

Figure 8. Test Network Topology (15 nodes, numbered 0 to 14).

Table 1. Test Network Characteristics
No. of nodes: 15
No. of spans: 28
Avg. network degree: 3.73
Avg. span length: 10.3 km
No. of demand pairs: 67
Total demand between all node pairs: 824
Avg. no. of working links/span: 56.7
Avg. no. of spare links/span: 38.1
Total no. of links: 2 655
Physical network redundancy: 0.79

This test network has only the amount of spare capacity needed for a centralized multi-commodity max-flow program to effect 100% restoration. The capacity placement of this network was designed using the IP from [2]. Though OPRA does not have to duplicate the pathset found by the IP, it must make equally efficient use of the network's spare capacity to restore all span cuts. This network therefore serves as a stringent test of OPRA's ability to restore span cuts efficiently.

V. Simulation Environment

For the test results presented, it was assumed that restoration statelet information would be imprinted on a link using the path overhead of a SONET signal. Anticipating a statelet length of 80 bits and a 64 kbps transmission channel, a statelet's insertion time on a link was set at 1.25 msec. The time required to process any new statelets received at a node was set at 0.5 msec, and the time required to reevaluate the broadcast pattern at a node at 1 msec. OPRA was implemented using a polling mechanism to cycle sequentially through all ports at a DCS and process new statelets at discrete time intervals in an orderly fashion. The time between successive polls of all ports was set at 0.5 msec.
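The simulation timing parameters combine into a rough per-hop delay. The 1.25 msec insertion time follows directly from 80 bits at 64 kbps; the additive per-hop model below (insertion, then waiting for the next port poll, then processing, then broadcast reevaluation) is an assumption made for illustration and is not taken from the authors' simulator. The constant and function names are hypothetical.

```python
STATELET_BITS = 80        # statelet length assumed in the text
CHANNEL_BPS = 64_000      # 64 kbps overhead channel
PROCESS_MS = 0.5          # process newly received statelets
REEVALUATE_MS = 1.0       # reevaluate the node's broadcast pattern
POLL_INTERVAL_MS = 0.5    # time between successive polls of all ports


def insertion_ms(bits: int = STATELET_BITS, bps: int = CHANNEL_BPS) -> float:
    """Time to clock one statelet onto a link, in msec."""
    return bits * 1000 / bps


def per_hop_ms(poll_wait_ms: float) -> float:
    """Hypothetical additive per-hop delay for one statelet hop."""
    if not 0 <= poll_wait_ms <= POLL_INTERVAL_MS:
        raise ValueError("poll wait must lie within one polling interval")
    return insertion_ms() + poll_wait_ms + PROCESS_MS + REEVALUATE_MS
```

Under this model a hop costs between 2.75 msec (statelet arrives just as its port is polled) and 3.25 msec (it just misses the poll).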


VI. Test Results The following plot records OPRA’s performance in the network presented in section IV. Figure 9 plots the restoration trajectory of all individual span failures. Each trace plots percentage restoration (0% 100%) against time. The fault which initiates OPRA is at time zero. The number of traces in Figure 9 is equal to the number of spans in the network shown in Figure 8. The following specific traces are highlighted: • that trace which achieves 100% restorability first, • that trace which achieves 100% restorability last, • that trace with the lowest span restorability level of any span in the network (Rn,wc), • that trace which takes the longest to complete its restoration plan (tR). The display of all span restorability trajectories in a single plot conveys the performance envelope of OPRA in this network in terms of the dynamics of restoration time and recovery level.

Table 2. Restoration Path Characteristics Ideal Solution

Span Restorability

Standard Deviation from avg. restoration path length (IP)

43.43 km

97.00 km

17.99 km

Standard Standard Avg. Longest deviation deviation Average Avg. restoration restoration from avg. from statelet interference path length path restoration average volume no. (OPRA) (OPRA) path length interference per link (OPRA) no.

0.9 span failure 7-10, max. restorability (Rn,wc) = 93.6%

0.7

Longest Restoration Path (IP)

OPRA

1 0.8

Avg. restoration path length (IP)

33.59 km 86.00 km 13.53 km

67.11

68.61

7.84

0.6 0.5

span failure 9-10, max. restorability = 99.2% restoration time (tR) = 1 595 msec.

0.4 0.3

Figure 9. Restoration Trajectories. Horizontal axis: Restoration Time (msec), 0 to 2000 msec. Highlighted trajectories: span failure 0-2 (max. restorability = 100%, restoration time = 32.00 msec) and span failure 9-14 (max. restorability = 100%, restoration time = 695.0 msec). Annotated summary: network restorability (Rn) = 98.9%; time first restoration path found (tp1) = 32.00 msec; time last restoration path found (tR) = 1595 msec; average restoration time (tp,avg) = 330.3 msec; standard deviation of restoration time = 283.7 msec; time required to complete 95% of all paths (t95) = 1100 msec.

Table 2 compares the pathset synthesized by OPRA in detail with the ideal IP reference solution from [2]. For both OPRA and the IP, it lists the average length, the longest length, and the standard deviation of the lengths of the restoration paths, so that the two pathsets can be compared directly. The interference-number data in Table 2 are a diagnostic of the degree of competition among statelets attempting to restore a span cut. The last column gives the average number of distinct statelets that occupied a link in a single restoration event.

VII. Interpretation and Discussion of Results

The restoration trajectories in Figure 9 show that OPRA fully restores most span failures before the two-second call-dropping threshold. Network restorability is 98.9%, and the average restoration time over all span cuts is 330 msec. The spans that OPRA restores first are those with few working links. The trajectories for these small spans are often nearly vertical because, owing to the parallelism inherent in OPRA, all of the restoration paths needed to restore them are identified almost simultaneously. Although restorability falls slightly short of 100% here, it can reasonably be expected to reach 100% in practice, because real networks inevitably carry somewhat more spare capacity than the theoretical minimum in the reference network design used here. This excess arises from provisioning-interval considerations and from the modularity of transmission systems.

The path-length data in Table 2 show that OPRA tends to form slightly shorter restoration paths than the IP reference solution: both the average and the longest restoration path found by OPRA are shorter than those found by the IP. This is related to the slightly sub-optimal restorability, as follows. OPRA seeks to complete the restoration paths with the lowest interference numbers first. However, the competition between forward flooding statelets for spare capacity is moderated not only by interference numbers but also by the time required to complete each restoration path, so a forward flooding statelet with a larger interference number may still succeed in establishing a short restoration path. OPRA may therefore establish a short restoration path with a larger interference number instead of a longer path with a lower interference number that would use the spare capacity more efficiently, because the short path is the quickest route towards full restoration. The IP that produced the reference solution is concerned only with minimizing the network's capacity requirements, whereas OPRA's combined objective is to minimize both the restoration time and the spare capacity used. This leads OPRA to synthesize restoration paths that may be slightly shorter than those of the IP. In the tests presented here, this trait reduces OPRA's network restorability to 98.9%; in the other 17 test networks in [1], restorability ranged from 97.2% to 100%.

The last entry in Table 2, the average statelet volume per link, indicates the communication required between nodes during restoration. On average, 7.8 distinct statelets are transmitted on a single link during a restoration event, which implies that a statelet occupies a link for about 42 msec on average (footnote 1).
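The summary statistics annotated on Figure 9 can be derived from per-span-cut simulation records roughly as follows. This is a minimal Python sketch, not OPRA's simulator. The `SpanCutResult` record and the `trajectory_summary` function are hypothetical. The sketch also makes several assumptions: each restoration path restores exactly one working link; Rn is restored working links divided by total working links over all span cuts; tp,avg is the mean completion time over all paths found in all span cuts; and the standard deviation uses the population form, because the paper does not state which form was used.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpanCutResult:
    """Hypothetical record of one simulated span cut."""
    working_links: int  # working links lost in the cut
    path_times_ms: List[float] = field(default_factory=list)  # completion time of each path found


def trajectory_summary(results: List[SpanCutResult]) -> Dict[str, float]:
    """Summary statistics in the style of the Figure 9 annotations.

    Assumptions (not taken from the paper): each restoration path restores
    one working link; tp_avg is the mean over all paths of all span cuts;
    the population standard deviation is used.
    """
    for r in results:
        if len(r.path_times_ms) > r.working_links:
            raise ValueError("more paths than working links in a span cut")
    total_working = sum(r.working_links for r in results)
    restored = sum(len(r.path_times_ms) for r in results)
    # Pool every path completion time from every span cut, earliest first.
    times = sorted(t for r in results for t in r.path_times_ms)
    if total_working == 0 or not times:
        raise ValueError("need at least one working link and one restoration path")
    n = len(times)
    mean = sum(times) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in times) / n)
    # t95: time by which at least 95% of all paths were found, i.e. the
    # ceil(0.95 * n)-th completion time, computed with integer arithmetic.
    k = (95 * n + 99) // 100
    return {
        "Rn": restored / total_working,  # network restorability
        "tp1": times[0],                 # first restoration path found
        "tR": times[-1],                 # last restoration path found
        "tp_avg": mean,                  # average restoration time per path
        "std": std,                      # spread of path completion times
        "t95": times[k - 1],
    }


summary = trajectory_summary([
    SpanCutResult(2, [32.0, 40.0]),
    SpanCutResult(3, [100.0, 300.0]),
])
print(summary["Rn"], summary["tp_avg"], summary["t95"])  # prints 0.8 118.0 300.0
```

In this toy case, two span cuts with 2 and 3 working links are restored by four paths in total, so Rn = 4/5. Pooling the completion times of all span cuts lets a single tp1, tR, tp,avg and t95 summarize the whole set of trajectories, as the Figure 9 annotations do.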

VIII. Conclusion

OPRA can restore failures quickly and efficiently in networks whose spare capacity is only negligibly greater than the theoretical minimum for multicommodity max-flow restoration. The results presented here were obtained with conservative processing delays and a tightly designed network. In a real network, a complete restoration plan that fully restores any individual span failure would therefore likely be identified before the two-second call-dropping threshold.

OPRA is a distributed path restoration algorithm based on the interference heuristic and bidirectional flooding. The interference heuristic coordinates the formation of restoration paths so as to avoid synthesizing paths that render a large number of other potential restoration paths infeasible. Bidirectional flooding avoids the end-node bottleneck traversal problem. Together, these principles enable OPRA to self-organize the network's spare links into a near-optimal multicommodity max-flow restoration pathset in less than two seconds in high-capacity fiber optic networks.

References

[1] Iraschko, R.R., Path Restorable Networks, Ph.D. Dissertation, University of Alberta, Spring 1997.
[2] Iraschko, R.R., MacGregor, M.H., Grover, W.D., “Optimal Capacity Placement for Path Restoration in Mesh Survivable Networks”, Proc. IEEE ICC ’96, June 1996.
[3] Grover, W.D., “Distributed Restoration of the Transport Network”, Chapter 11, pp. 337-419 of Telecommunications Network Management into the 21st Century - Techniques, Standards, Technologies, and Applications, edited by S. Aidarous and T. Plevyak, New York, NY: IEEE Press, 1994.
[4] Grover, W.D., “The Selfhealing network: a fast distributed restoration technique for networks using digital cross-connect machines”, Proc. IEEE Globecom ’87, Dec. 1987, pp. 28.2.1-28.2.6.
[5] Grover, W.D., Selfhealing Networks - A Distributed Algorithm for k-shortest link-disjoint paths in a multi-graph with applications in realtime network restoration, Ph.D. Dissertation, University of Alberta, Fall 1989.
[6] Chow, C.E., Bicknell, J.D., McCaughey, S., “Performance analysis of fast distributed link restoration algorithms”, International Journal of Communication Systems, vol. 8, 1995, pp. 325-345.
[7] Fujii, H., Yoshikai, N., “Restoration message transfer mechanism and restoration characteristics of double-search self-healing ATM network”, IEEE J-SAC Special Issue: Integrity of Public Telecommunication Networks, vol. 12, no. 1, Jan. 1994, pp. 149-158.
[8] Chao, C.W., Dollard, P.M., Weythman, J.E., Nguyen, L.T., Eslambolchi, H., “FASTAR - a robust system for fast DS3 restoration”, Proc. IEEE GLOBECOM ’91, Dec. 1991, pp. 39.1.1-39.1.5.
[9] Chujo, T., Komine, H., Miyazaki, K., Ogura, T., Soejima, T., “Distributed self-healing network and its optimum spare capacity assignment algorithm”, Electronics and Communications in Japan, part 1, vol. 74, no. 7, 1991, pp. 1-8.
[10] Kawamura, R., Sato, K., Tokizawa, I., “Self-healing ATM networks based on virtual path concept”, IEEE J-SAC Special Issue: Integrity of Public Telecommunication Networks, vol. 12, no. 1, Jan. 1994, pp. 120-127.
[11] Sakauchi, H., Nishimura, Y., Hasegawa, S., “A self-healing network with an economical spare-channel assignment”, Proc. IEEE Globecom ’90, Dec. 1990, pp. 438-443.
[12] Yang, C.H., Hasegawa, S., “FITNESS: Failure immunization technology for network service survivability”, Proc. IEEE Globecom ’88, Dec. 1988, pp. 47.3.1-47.3.5.
[13] Baker, J.E., “A distributed link restoration algorithm with robust preplanning”, Proc. IEEE GLOBECOM ’91, Dec. 1991, pp. 306-311.
[14] Coan, B.A., et al., “Using distributed topology updates and preplanned configurations to achieve trunk network survivability”, IEEE Transactions on Reliability, vol. 40, no. 4, 1991, pp. 404-416.
[15] Coan, B.A., Vecchi, M.P., Wu, L.T., “A distributed protocol to improve the survivability of trunk networks”, Proc. 13th International Switching Symposium, May 1990, pp. 173-179.
[16] Saniee, I., “Optimal routing designs in self-healing communications networks”, Bellcore, MRE 2D-362, 445 South Street, Morristown, NJ 07960-6438, fourth draft, May 1994.
[17] Komine, H., Chujo, T., Ogura, T., Miyazaki, K., Soejima, T., “A distributed restoration algorithm for multiple-link and node failures of transport networks”, Proc. IEEE Globecom ’90, Dec. 1990, pp. 459-463.
[18] Struyve, K., Demeester, P., “Design of distributed restoration algorithms for ATM meshed networks”, Proc. 1995 IEEE 3rd Symposium on Communications and Vehicular Technology, 1995, pp. 128-135.
[19] Restoration of DCS mesh networks with distributed control: Equipment Framework Generic Criteria, FA-NWT-001353, Issue 1, Bellcore, Dec. 1992.
[20] The Role of Digital Crossconnect Systems in Transport Network Survivability, SR-NWT-002514, Issue 1, Bellcore Special Report, Jan. 1993.
[21] Wu, T., Fiber Network Service Survivability, Norwood, MA: Artech House Inc., 1992.
[22] Hu, T.C., Integer Programming and Network Flows, Reading, MA: Addison-Wesley, 1969.

1. The length of time a statelet occupies a link can be approximated by dividing the average restoration time (330 msec) by the average statelet volume per link (7.84).
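The approximation in footnote 1 amounts to the single division below. It can be read as spreading the average restoration time evenly over the statelets that use a link during one restoration event, but that reading is an interpretation. The symbols t_occ (statelet occupancy time) and v̄ (average statelet volume per link) are introduced here only for illustration.

```latex
t_{\mathrm{occ}} \approx \frac{t_{p,\mathrm{avg}}}{\bar{v}}
  = \frac{330\ \mathrm{msec}}{7.84}
  \approx 42\ \mathrm{msec}
```

Here t_{p,avg} is the average restoration time annotated on Figure 9, and v̄ is the average statelet volume per link from Table 2.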
