On Integrating Failure Localization with Network ... - Semantic Scholar

On Integrating Failure Localization with Network Survivable Design

Wei He∗, Pin-Han Ho† and Bin Wu†
∗ Cheriton School of Computer Science, University of Waterloo, Canada
† Dept. of Electrical and Computer Engineering, University of Waterloo, Canada
Email: {w8he,p4ho,b7wu}@uwaterloo.ca

János Tapolcai‡
‡ Dept. of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Hungary
Email: [email protected]

Abstract: Conventional all-optical restoration strategies such as p-cycle achieve very fast restoration, but at the cost of high spare capacity consumption. In contrast, failure dependent protection (FDP) can achieve near-optimal capacity efficiency at the cost of high signaling/control complexity (and hence a long restoration time). In this paper, we investigate a previously reported all-optical restoration framework that aims to yield a restoration speed similar to that of p-cycle while keeping resource consumption near optimal, as in FDP. In particular, we propose a simple yet efficient heuristic for the joint allocation of monitoring trails and protection lightpaths, which serves as the key to enabling all-optical restoration. The resultant all-optical restoration framework is further examined by extensive simulations with respect to network resource consumption, number of transmitters, monitoring requirement, and running time.

Index Terms: All-optical networks, monitoring trail (m-trail), protection and restoration.

I. INTRODUCTION

This paper investigates the all-optical restoration strategy reported in [1], [2], which aims to enable an all-optical and signaling-free failure restoration process for any shared protection scheme. Thanks to Network-wide Local Unambiguous Failure Localization (NW-LUFL) [1], [3], each node can localize any shared risk link group (SRLG) failure event instantly by inspecting the on-off status of the traversing monitoring trails (m-trails). This allows the restoration process of any shared protection scheme (e.g., shared path protection [4] and failure dependent protection [5]) to potentially be completed within tens of milliseconds, while largely relieving the upper layers of the fault management tasks defined in Generalized Multi-Protocol Label Switching (GMPLS) [6].

A typical restoration process under the GMPLS protocol for a working lightpath (W-LP) interrupted by a failure event is outlined as follows. Upon the occurrence of a failure (e.g., a fiber cut), one or more adjacent nodes that detect the failure send alarms to the decision nodes. This stage is referred to as failure detection, localization, and notification in GMPLS-based recovery, for which the Link Management Protocol (LMP) and/or a resource reservation protocol such as RSVP, as extended for optical networks, can be used. After obtaining the failure status, the decision node correlates the failure event based on the received alarms and then launches the recovery phase accordingly. Assuming that the protection lightpaths (P-LPs) are precomputed for each W-LP (the most commonly considered scenario under shared protection), an important task in the recovery phase is to activate the pre-planned P-LPs by way of device configuration. GMPLS supports this via a resource reservation protocol such as RSVP, where a path setup message is sent along the P-LP to configure the intermediate optical cross-connects (OXCs) one after the other.

With the above standard GMPLS-based recovery mechanism, there is a trade-off between failure restoration time and spare capacity consumption in the design of shared protection schemes. Pre-configured Cycle (p-Cycle) [7], [8] lies at one extreme of the design spectrum: it achieves a very short restoration time because it skips failure localization, notification, and device configuration, and thus requires no multi-hop signaling in the control plane. This simplicity comes at the expense of high spare capacity consumption. In contrast, failure dependent protection (FDP) lies at the other extreme, offering the best spare capacity efficiency at the cost of a longer, nondeterministic restoration time and complex control signaling.

It has been an open question in the past decades how to design an optical restoration strategy with a restoration speed comparable to p-Cycle while keeping network resource consumption close to the level of FDP. Our previous works [1], [2] tried to address the problem to some extent. With the joint consideration of monitoring and restoration capacity planning, a new design paradigm for spare capacity allocation emerged, thanks to the state-of-the-art all-optical failure localization technique via monitoring trails (m-trails) [9]. An m-trail is a monitoring structure whose supervisory lightpath can traverse any link in each direction at most once.
Each m-trail has a transmitter, a receiver, and a monitor that detects Loss of Light (LoL) on the whole lightpath. Specifically, [1] investigates the monitoring resource hidden problem by sharing capacity between the m-trails and the P-LPs. Since each node is required to localize all the SRLG failures whether or not this is needed, unnecessary monitoring resource consumption was identified and investigated in [2]. By using very short m-trails and in-band monitoring (i.e., taking the on-off status of the W-LPs into the monitoring plane) in [2], the roles of m-trails and W-LPs become interchangeable. Such flexibility and simplicity in transport plane management is gained at the expense of an excessive number of transponders.

The proposed joint design problem differs from previously reported studies in the following features:

1) Each node can obtain the on-off status of the traversing m-trails by tapping the optical signal of the lightpaths (also referred to as lambda monitoring).

2) Instead of localizing all the SRLG failures, each node is only required to distinguish the subset of SRLG pairs necessary for achieving all-optical restoration. This is a further improvement over [1] and [2]: the former forces all nodes to distinguish every pair of SRLGs, while the latter, although it defines a neighborhood for each node under single link failures, still requires nodes to distinguish more SRLG pairs than necessary and consumes an excessive number of transmitters (or transponders) because it uses shortest paths as m-trails.

3) Without loss of generality, FDP is considered as the survivable routing strategy. This means that every W-LP can be preplanned with multiple P-LPs, each of which protects one or multiple SRLGs that may affect the W-LP.

4) The wavelength channels (WLs) used by a P-LP can be reused by an m-trail during normal operation. This is reasonable since the P-LPs and m-trails are not launched with traffic at the same time.

To solve the formulated joint design problem, we propose a two-phase approach, given a set of W-LPs and SRLGs under consideration. The first phase allocates P-LPs by enumerating k-shortest paths, and the second phase applies a novel heuristic for m-trail allocation. By combining the candidate P-LP solutions and m-trail solutions, possible final solutions are inspected, and the one with the highest utility is selected according to a predefined cost function.
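The first phase relies on enumerating k-shortest paths as candidate P-LPs. As a minimal sketch of that step only, assuming hop count as the path cost and a small hypothetical topology (neither is specified in this section), the following Python snippet enumerates the k shortest loop-free paths between two nodes with a plain best-first search and then keeps those that avoid a failed link. It is not claimed to be the enumeration routine used by the authors; Yen's algorithm would be the usual choice for larger networks.

```python
import heapq
from itertools import count

def k_shortest_simple_paths(adj, src, dst, k):
    """Return up to k loop-free src->dst paths in non-decreasing hop count.

    Best-first search over partial simple paths: every extension adds one
    hop, so complete paths are popped from the heap in order of length.
    """
    tie = count()                      # unique tie-breaker; paths are never compared
    heap = [(0, next(tie), [src])]
    found = []
    while heap and len(found) < k:
        hops, _, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        for nxt in sorted(adj[node]):
            if nxt not in path:        # keep the path loop-free
                heapq.heappush(heap, (hops + 1, next(tie), path + [nxt]))
    return found

# Hypothetical 5-node topology (an assumption, not the network of Fig. 2).
ADJ = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1, 4},
    3: {1, 4},
    4: {2, 3},
}

def avoids(path, failed_link):
    """True if the path does not traverse the failed (undirected) link."""
    links = {frozenset(pair) for pair in zip(path, path[1:])}
    return frozenset(failed_link) not in links

# Candidate P-LPs for a hypothetical connection 0 -> 4, filtered for a failure of link (2,4).
candidates = k_shortest_simple_paths(ADJ, 0, 4, k=3)
backup_candidates = [p for p in candidates if avoids(p, (2, 4))]
```

In an FDP setting, a filter like `avoids` would be applied once per SRLG that can affect the W-LP, so that each failure gets its own candidate list.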
Simulations are conducted to examine the proposed approach in terms of the required transponders and the total cover length, and to compare it with [1] and [2].

The rest of the paper is organized as follows. Section II introduces some preliminaries of the study. Section III proposes a two-phase heuristic for solving the joint design problem. Simulations on random topologies are conducted and analyzed in Section IV. We conclude the paper in Section V.

II. PRELIMINARIES

This section first provides an overview of the all-optical restoration process reported in [1], [2], and then formulates the design issues related to optical restoration as the joint design problem. A formal definition of the monitoring requirement for joint design, as well as a simple method of identifying the monitoring requirement from the corresponding P-LP assignment, is explained afterwards.

A. Optical Restoration via M-trails

To rule out any possible multi-hop electronic signaling in the restoration process under shared protection, an all-optical restoration framework based on m-trails was reported in [1], [2]. Basically, the strategy employed by the two studies is to enable each node to obtain the network failure status by detecting the on-off status of a subset of m-trails passing through it, such that each node is aware of how to react to a failure event. Specifically, with all the W-LPs and P-LPs given, each node can instantly identify any failed SRLG through the deployed m-trails. The intermediate nodes along the pre-planned P-LP corresponding to the failure can then form the P-LP and complete the traffic switchover for the affected W-LP all-optically in an extremely short time.

Fig. 1. GMPLS-based restoration vs. optical restoration via m-trails

Fig. 1 exemplifies the conventional GMPLS-based restoration process and the all-optical restoration in [1], [2]. Let a W-LP w1 (C-A-B-E) be protected by P-LP p1 (C-F-D-E). When link (B,E) fails, the GMPLS-based restoration (see Fig. 1a) needs to go through a series of signaling steps to complete the restoration. First, node B, which is adjacent to the failed link (B,E), notifies the decision node C about the failure. After correlating all the received alarms, node C initiates the recovery by a device configuration process, in which the OXCs of C, F, D and E along p1 are notified by a wake-up message, mostly in a sequential manner. Thus, the GMPLS-based restoration process may take hundreds of milliseconds due to such multi-hop signaling and cross-layer operation.

In contrast, under the all-optical restoration approach in [1], [2], each node along the P-LP (i.e., nodes C, F, D and E) does not need to wait for a wake-up message as required in GMPLS-based restoration, and the decision node is able to switch over the affected working traffic without waiting for failure notification and correlation. As Fig. 1b shows, the nodes on the P-LP can simultaneously infer the network status by tapping the m-trails passing through them.
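To make the contrast with sequential signaling concrete, the following Python sketch models how each node along p1 could react locally once it observes the on-off status of the m-trails it taps. The two m-trails, the status vector assumed for the (B,E) failure, and the table contents are hypothetical placeholders chosen for illustration; they are not taken from Fig. 1.

```python
from typing import Dict, Optional, Tuple

# Status of the m-trails tapped at a node: True means light is lost (trail is "off").
Status = Tuple[bool, ...]

# Assumed status vector produced by the failure of link (B,E); purely illustrative.
LINK_BE_FAILED: Status = (True, False)

# Hypothetical pre-planned reaction tables for the nodes of the Fig. 1 example.
# Each node maps an observed status vector to a local OXC action.
REACTION: Dict[str, Dict[Status, str]] = {
    "C": {LINK_BE_FAILED: "switch w1 traffic onto p1 towards F"},
    "F": {LINK_BE_FAILED: "cross-connect C<->D for p1"},
    "D": {LINK_BE_FAILED: "cross-connect F<->E for p1"},
    "E": {LINK_BE_FAILED: "receive w1 traffic from D via p1"},
    "A": {},  # not on p1: no action planned for this failure
}

def react(node: str, observed: Status) -> Optional[str]:
    """Purely local decision: no message from any other node is needed."""
    return REACTION[node].get(observed)

# Each node looks up its own action independently of the others, which is why
# the configuration along p1 can proceed in parallel rather than hop by hop.
actions = {node: react(node, LINK_BE_FAILED) for node in REACTION}
```

For simplicity the sketch lets every node observe the same two m-trails; in the framework each node only taps the m-trails that actually traverse it.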
In this paper, we follow the same optical restoration process used by [1], [2]. Our goal is to develop a joint design approach that considers the routing of both P-LPs and m-trails, while further saving monitoring resources by defining the necessary monitoring requirements.

B. Joint Design Problem

The detailed design goals of the optical restoration strategy via m-trails can be formulated as the joint design problem.

Definition 1: Given a set of connection requests and a set of SRLGs, the Joint Design Problem asks to route Working Lightpaths (W-LPs), Protection Lightpaths (P-LPs) and m-trails for a network topology G(V, E) such that:

1) the W-LPs meet the connection requests;
2) all W-LPs are survivable under any single SRLG failure, with the help of P-LPs;
3) each node can react to any failure in parallel, with sufficient information reported by m-trails;
4) the resources consumed by W-LPs, P-LPs and m-trails are minimized.

In addition, we use the following cost function to evaluate a joint design solution:

$$\mathrm{Cost} = \gamma \cdot (\#\text{ of m-trails}) + \sum_{e \in E} c_e \cdot \max\{m_e, p_e\} \tag{1}$$

where γ is a cost ratio for balancing the importance of a monitor and a Wavelength Channel (WL), $c_e$ is the predefined cost of using a WL on link e, and $m_e$ and $p_e$ represent the number of WLs consumed on link e by m-trails and by P-LPs, respectively. Thus, $\max\{m_e, p_e\}$ reflects that we allow m-trails and P-LPs to share WLs.

As illustrated by Fig. 2, a joint design solution comprises three parts: the P-LP assignment, the m-trails, and the Restoration Tables (RTs).

Fig. 2. Solution to a Joint Design Problem ("RConf" and "ConfGrp" mean "restoration configuration" and "configuration group", respectively)

A "P-LP assignment" refers to a set of W-LPs that meet the connection requests, together with a set of P-LPs that will be activated upon a specific SRLG failure. According to a pre-defined P-LP assignment, a node should configure its OXC to support the stipulated P-LPs. A "restoration configuration" of a node is defined as a set of P-LPs that the node needs to configure upon the failure of one or multiple SRLGs. A "configuration group" (CG) of a node is a set of SRLGs that correspond to the same P-LP(s) passing through the node.

Fig. 2 shows an example. Given the 5-node 8-link network and 12 SRLGs in Fig. 2a, the P-LP assignment in Fig. 2b defines the P-LPs to be activated upon any SRLG failure. For instance, when SRLG φ7 or φ8 fails, one P-LP ρ11 (2-1-3) will be activated. To realize the P-LP assignment at each node, a restoration configuration is pre-defined in the node such that it is aware of how to configure its OXC upon any identified failure event. For example, node 0 has to configure its OXC to form P-LP ρ12 when φ1, φ6 or φ9 fails, or P-LP ρ22 when φ10 fails. Thus, for node 0, φ1, φ6 and φ9 are in one CG, φ10 is in another, and the remaining SRLGs are in a "Standby" CG (omitted in the figure), for which no action is taken.

Each node obtains an alarm code by observing the set of m-trails passing through it. As shown in Fig. 2c, four m-trails t0 to t3 (marked by dashed lines) are deployed to collect the failure status, and each node maintains an alarm code table (ACT) based on the traversing m-trails (all nodes share one ACT in Fig. 2c). Each row in an ACT corresponds to the alarm code for a specific SRLG failure, while each column corresponds to an m-trail. For instance, φ1's alarm code is 0110. Note that the 1 bits in each alarm code represent the set of m-trails that will be interrupted by that SRLG failure and send out alarms.

Each node should further maintain the mapping between each possible alarm code and the corresponding OXC configuration. This information can be kept in an RT, such that the device configuration of a node upon any SRLG failure can be performed simply via a table look-up (to achieve this, the monitoring requirement defined in Subsection II-C must be met). For instance, the RT in Fig. 2d records how the OXC of each node should be configured upon a specific alarm code.

Fig. 3. A restoration process based on a joint design solution

Fig. 3 illustrates a typical restoration process with the joint design solution given in Fig. 2. As Fig. 3a shows, all m-trails and W-LPs run normally when no failure occurs, and the normal alarm code 0000 is received by all nodes. Suppose that at some moment link (2,4) fails, which disrupts the W-LP ω1 (see Fig. 3b).
Two m-trails, t0 and t3, are interrupted as well and send out alarms. Thus, a new alarm code 1001 is reported. By looking up the alarm code 1001 in their restoration tables (see Fig. 3c), nodes 1, 2 and 3 make the proper configurations to help establish the P-LPs. The remaining nodes simply ignore the failure since the alarm code is not found in their tables. Finally, as Fig. 3d shows, the data traffic is successfully switched over to the preplanned P-LP ρ11.

C. Monitoring Requirements

To achieve the aforementioned all-optical restoration, it is essential to define the monitoring requirement at each node, such that the node can react to any SRLG failure in terms of its OXC configuration and/or the traffic switch-over of some W-LPs, with the help of the deployed m-trails.

TABLE I
GETTING MONITORING REQUIREMENTS

Definition 2: Necessary Monitoring Requirement (NMR): any alarm code used in one CG at a node should not be used by another CG of the node.

To see why, suppose two SRLGs in different CGs have the same alarm code at a node; the node then cannot tell which restoration configuration the alarm code calls for and cannot react to it properly. To quantify the monitoring requirement of a restoration strategy, we can compute the total number of SRLG pairs required to be distinguished by the nodes using (2):

$$\#(\text{SRLG pairs}) = \sum_{k \in V} \sum_{i \ldots} |g_i^k| \cdot |g_j^k| \tag{2}$$
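Definition 2 and the pair count can be checked mechanically. The Python sketch below uses node 0 of the Fig. 2 example, whose CGs are {φ1, φ6, φ9}, {φ10}, and the remaining eight SRLGs as the Standby group. Only φ1's alarm code (0110) comes from the text; all other alarm codes are made-up placeholders. The pair count additionally assumes that the groups in (2) are the CGs of a node and that the inner sum runs over unordered pairs of distinct CGs, which the equation as shown does not confirm.

```python
from itertools import combinations

# Configuration groups (CGs) of node 0 in the Fig. 2 example; SRLG phi_n is written n.
CGS_NODE0 = {
    "rho12": {1, 6, 9},
    "rho22": {10},
    "standby": {2, 3, 4, 5, 7, 8, 11, 12},
}

def satisfies_nmr(cgs, alarm_code):
    """NMR (Definition 2): no alarm code may appear in two different CGs of a node."""
    owner_of = {}  # alarm code -> name of the first CG seen using it
    for name, srlgs in cgs.items():
        for srlg in srlgs:
            owner = owner_of.setdefault(alarm_code[srlg], name)
            if owner != name:
                return False
    return True

def pairs_to_distinguish(cgs):
    """Assumed reading of (2) for one node: sum of |g_i| * |g_j| over pairs of distinct CGs."""
    return sum(len(a) * len(b) for a, b in combinations(cgs.values(), 2))

# Hypothetical alarm codes seen at node 0 (only phi_1 = "0110" is from the text).
CODES_OK = {1: "0110", 6: "0110", 9: "0101", 10: "0011",
            2: "1000", 3: "1000", 4: "0100", 5: "0010",
            7: "1001", 8: "1001", 11: "0001", 12: "1100"}

# Same codes, except phi_10 now collides with phi_1, which lies in a different CG.
CODES_BAD = {**CODES_OK, 10: "0110"}

nmr_ok = satisfies_nmr(CGS_NODE0, CODES_OK)
nmr_bad = satisfies_nmr(CGS_NODE0, CODES_BAD)
pairs_node0 = pairs_to_distinguish(CGS_NODE0)
```

Note that SRLGs inside the same CG may share an alarm code without violating the NMR, since the node reacts to all of them in the same way; this is what allows fewer SRLG pairs to be distinguished than under full failure localization.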
