Optimal Monitor Placement Scheme for Single Fault ...

Optimal Monitor Placement Scheme for Single Fault Detection in Optical Network

Puspendu Nayek1, Sayan Pal1, Buddhadev Choudhury1, Amitava Mukherjee2, Debashis Saha3, Mita Nasipuri1
1 Dept of Computer Sc & Engg, Jadavpur University, Calcutta 700 032, India
2 Royal Institute of Technology, School of Electrical Engineering, Stockholm 10044, Sweden
3 MIS and Comp Sc, Indian Institute of Management-Calcutta (IIM-C), Calcutta 700 104, India

ABSTRACT
This paper presents a monitor placement scheme for single node fault detection in optical networks. A single fault at a node may produce one or many alarms, which makes it very difficult to detect the exact origin of the failure. Our two-phased scheme minimizes the number of monitors placed to detect the origin of a fault, and it runs in polynomial time. We demonstrate the performance of our scheme on the 14-node NSFnet.

Keywords: Fault Detection, Monitor Placement and Approximation Algorithm

1. INTRODUCTION
A fast fault detection mechanism is a high priority for any optical network, because a failure can cause the loss of a large amount of information. The up-time requirement for the network is 99.999% of the time. This requirement corresponds to a connection downtime of roughly 5 minutes per year (0.001% of 525,600 minutes is about 5.26 minutes). Hence, detecting faults in the network and raising the appropriate alarms is a prime network management activity. If a certain parameter is being monitored and its value falls outside the preset range, the monitors raise alarms for any upcoming fault(s).

Existing network monitoring devices can be classified into two types: node-oriented and path-oriented tools. Node-oriented tools collect information from network devices using SNMP/RMON probes [5] or the Cisco NetFlow tool [6]. We consider these tools as monitors, which trigger an alarm when a particular parameter falls outside the predefined range. The fault detection mechanism in an optical network analyses the alarms triggered by monitors, so proper placement of monitors is necessary for fast and flawless fault detection.

The whole fault detection process may be divided into two steps, namely monitor placement and detection of the origin of the fault. The main objective of monitor placement is to maximize fault-detection coverage while reducing redundant alarms. This paper considers only single faults and static lightpaths for the optimal monitor placement. Related work on fault diagnosis and detection is discussed in [1]-[4].

This paper is structured as follows. Section 2 defines the problem. Section 3 provides the reference network. Section 4 describes the algorithm along with its application to the reference network. Section 5 describes the simulation results on NSFnet and reports the average deviation of the algorithm. The paper concludes with a short discussion.

2. PROBLEM DEFINITION
We consider an optical network in which Wavelength Division Multiplexing (WDM) is used for multiplexing. The network consists of nodes and fiber channels. Nodes are the different elements of the network, such as transmitters, receivers and add-drop multiplexers. Fiber channels are the different wavelengths, i.e. the lightpaths established on the network. We take into consideration the network architecture and the established static lightpaths. Dynamic addition and deletion of lightpaths on demand is not considered here.

We assume that initially no node has monitoring capacity. Further, we assume a single node failure at any particular point of time. When a node fails in the optical layer, all lightpaths passing through that node are affected, and as a result the monitors attached to the nodes on the affected lightpaths trigger alarms. Hence, a single fault generates multiple alarms. By placing monitors in the right way, we can minimize the number of alarms generated for a fault while keeping the fault-detection coverage at its maximum. This is called the optimal monitor placement problem in optical networks.

3. REFERENCE NETWORK
We consider the following network to explain the different steps of the algorithm. Initially, none of the nodes has a monitor; monitors are assigned later. We only consider cases of a single fault.

Figure 1: Reference NSFnet

We assume that the following five lightpaths are established on the NSFnet consisting of 14 nodes and 21 links.

LP1: 5-10-13-11
LP2: 10-13-14
LP3: 8-7-6-11
LP4: 10-5
LP5: 13-10-5-4

4. OPTIMAL MONITOR PLACEMENT SCHEME
The optimal monitor placement scheme is divided into two phases: the Maximum Fault Coverage Model and the Approximation Algorithm for Monitor Reduction.

4.1 Maximum Fault Coverage Model
This is a preprocessing phase, as mentioned in [2]. First we place the monitors so as to satisfy the maximum fault coverage criterion.

4.1.1 Initial Monitor Placement
First, we consider lightpath LP1. Monitor M1 is placed at the input port of node 10 that terminates the link between nodes 5 and 10. Similarly, M2 and M3 are placed at nodes 13 and 11. Note that once a monitor is placed at an input port of a node, it monitors all lightpaths using that port. For example, LP2 also uses the same input port of node 13, so M2 monitors it as well. Proceeding in this way, we place 10 monitors in the network, as shown in Figure 1.

4.1.2 Construction of Alarmmatrix
Here, we construct a fault-reporting alarmmatrix. It is a two-dimensional matrix whose number of rows equals the number of nodes in the network and whose number of columns equals the number of monitors placed. Each row is the vector of alarms raised when a particular node fails. The alarm vector of each node is determined from the network topology and the established lightpaths.

Initially, we place monitors at every used input port of every network node, so the resulting fault-reporting alarmmatrix gives an upper bound on fault reporting for the network. Hence, the maximum fault coverage criterion is achieved by such a placement of monitors. For nodes through which no lightpath passes, the corresponding rows in the alarmmatrix are all zero. We delete such rows, as these nodes need not be covered. For the reference network, the alarmmatrix is as follows.

Table 1: Alarmmatrix for reference network

        M1  M2  M3  M4  M5  M6  M7  M8  M9  M10
ND1      0   0   0   0   0   0   0   0   0   0
ND2      0   0   0   0   0   0   0   0   0   0
ND3      0   0   0   0   0   0   0   0   0   0
ND4      0   0   0   0   0   0   0   0   0   0
ND5      1   1   1   0   0   0   0   0   0   1
ND6      0   0   0   0   0   0   1   0   0   0
ND7      0   0   0   0   0   1   1   0   0   0
ND8      0   0   0   0   1   1   1   0   0   0
ND9      0   0   0   0   0   0   0   0   0   0
ND10     0   1   1   1   0   0   0   1   0   1
ND11     0   0   0   0   0   0   0   0   0   0
ND12     0   0   0   0   0   0   0   0   0   0
ND13     0   0   1   1   0   0   0   1   1   1
ND14     0   0   0   0   0   0   0   0   0   0
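The construction in Sections 4.1.1 and 4.1.2 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. It assumes one monitor per used directed hop (u, v), numbered in order of first use, and it assumes that a monitor raises an alarm when any node upstream of its port on a lightpath through that port fails. That alarm rule is an inference from the tables, and the function names are illustrative.

```python
# Lightpaths of the reference network (Section 3).
LIGHTPATHS = {
    "LP1": [5, 10, 13, 11],
    "LP2": [10, 13, 14],
    "LP3": [8, 7, 6, 11],
    "LP4": [10, 5],
    "LP5": [13, 10, 5, 4],
}
NUM_NODES = 14


def place_monitors(lightpaths):
    """Assign one monitor to every used input port, i.e. every directed hop."""
    ports = {}  # (u, v) -> monitor name, numbered in order of first use
    for path in lightpaths.values():
        for u, v in zip(path, path[1:]):
            if (u, v) not in ports:
                ports[(u, v)] = f"M{len(ports) + 1}"
    return ports


def build_alarm_matrix(lightpaths, num_nodes):
    """Return (monitor names, {node: alarm vector}) for single node failures."""
    ports = place_monitors(lightpaths)
    monitors = list(ports.values())
    alarms = {node: set() for node in range(1, num_nodes + 1)}
    for path in lightpaths.values():
        for i in range(1, len(path)):
            monitor = ports[(path[i - 1], path[i])]
            # Assumed rule: every node upstream of this port triggers the monitor.
            for upstream in path[:i]:
                alarms[upstream].add(monitor)
    matrix = {
        node: [1 if m in alarms[node] else 0 for m in monitors]
        for node in alarms
    }
    return monitors, matrix


def drop_zero_rows(matrix):
    """Delete rows of nodes whose failure raises no alarm."""
    return {node: row for node, row in matrix.items() if any(row)}


monitors, full_matrix = build_alarm_matrix(LIGHTPATHS, NUM_NODES)
reduced_matrix = drop_zero_rows(full_matrix)
for node, row in reduced_matrix.items():
    print(f"ND{node}", row)
```

Under these assumptions the sketch places 10 monitors and keeps the rows of nodes 5, 6, 7, 8, 10 and 13, with the alarm vectors listed in Table 2.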

Here, the rows corresponding to ND1, ND2, ND3, ND4, ND9, ND11, ND12 and ND14 are all-zero rows. This indicates that faults occurring in these nodes need not be detected, so the rows are deleted. After deleting all zero rows, we obtain the alarmmatrix given in Table 2.

The main objective of the next phase is to reduce the number of monitors, i.e. to delete columns from the alarmmatrix in such a fashion that no two rows become identical and no row becomes all zero. A particular binary pattern in the alarmmatrix then indicates a particular node failure. Optimal monitor reduction by the Exhaustive Method is an NP-hard problem. Here, we replace the Exhaustive Method by a polynomial-time approximation algorithm, which gives a solution very close to the optimal one and takes much less processing time, so that it can be used for monitor placement in a real-time network.

Table 2: Revised alarmmatrix for reference network

        M1  M2  M3  M4  M5  M6  M7  M8  M9  M10
ND5      1   1   1   0   0   0   0   0   0   1
ND6      0   0   0   0   0   0   1   0   0   0
ND7      0   0   0   0   0   1   1   0   0   0
ND8      0   0   0   0   1   1   1   0   0   0
ND10     0   1   1   1   0   0   0   1   0   1
ND13     0   0   1   1   0   0   0   1   1   1
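The reduction criterion (all rows distinct and none all zero) and the Exhaustive Method can be illustrated with a brute-force sketch on the Table 2 data. The names `ALARMS`, `is_valid` and `exhaustive_minimum` are illustrative and do not come from the paper. The search tries subsets in order of increasing size, so its cost grows exponentially with the number of monitors.

```python
from itertools import combinations

NODES = ["ND5", "ND6", "ND7", "ND8", "ND10", "ND13"]

# Table 2, stored per monitor: the set of nodes whose failure triggers it.
ALARMS = {
    "M1": {"ND5"},
    "M2": {"ND5", "ND10"},
    "M3": {"ND5", "ND10", "ND13"},
    "M4": {"ND10", "ND13"},
    "M5": {"ND8"},
    "M6": {"ND7", "ND8"},
    "M7": {"ND6", "ND7", "ND8"},
    "M8": {"ND10", "ND13"},
    "M9": {"ND13"},
    "M10": {"ND5", "ND10", "ND13"},
}


def row_patterns(selected):
    """Alarm pattern of every node, restricted to the selected monitors."""
    return [tuple(int(node in ALARMS[m]) for m in selected) for node in NODES]


def is_valid(selected):
    """True if no row is all zero and no two rows are identical."""
    rows = row_patterns(selected)
    return all(any(row) for row in rows) and len(set(rows)) == len(rows)


def exhaustive_minimum():
    """Try all monitor subsets in order of increasing size."""
    names = list(ALARMS)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if is_valid(subset):
                return list(subset)
    return None


best = exhaustive_minimum()
print(best)  # ['M2', 'M4', 'M5', 'M6', 'M7']
```

On this small example the search returns five monitors, which agrees with the best solution reported in Table 4.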

4.2 Approximation Algorithm
This greedy algorithm selects the monitors that are to be kept activated. The monitor reduction problem is equivalent to reducing the columns of the alarmmatrix while keeping all rows distinct as well as non-zero. Each column of the alarmmatrix is assigned a weight, called the hit value, which is used to find the local best solution. The column with the highest hit value is considered the local best and is selected. Subsequent column selections depend on the columns already selected. The hit value calculation is illustrated in Section 4.2.1. Each column selection distinguishes some rows from others or assigns the first 1 to the pattern of a row. The algorithm is as follows.

Initially no column is selected, and hence all rows are assumed to have all-zero patterns.

While (not All Distinct Rows) {
    For each monitor not yet selected {
        Calculate hit value of the column with respect to the patterns encountered up to the previous iteration
        If the hit value is greater than maximum hit value calculated for earlier columns (in this iteration)
            Set the column as the column with maximum hit value
    }
    Insert the column with maximum hit value to reflect the patterns encountered
}

Figure 2: Approximation Algorithm

4.2.1 Time Complexity $O(n^3)$
Assume the dimension of the alarmmatrix is (m x n).

Hit value calculation
The hit value of a column is the basis of column selection in the greedy algorithm. It is calculated as follows:
1. We cannot have all-zero rows. So a column is given a weight $R_1$ if it gives the first 1 to $R_1$ rows.
2. The rows that have the same pattern form a group. A column divides some of these groups into distinguishable subgroups. A column that divides more groups, and divides them into more nearly equal subgroups, is given more weight.

Example: for the hit value calculation of a column,
1. The number of rows that encounter the first 1 in their respective patterns is $R_1$.
2. At the time of selection of a column, some of the groups of rows having the same pattern up to this iteration may be divided into two subgroups.

Let p groups be divided, and let the i-th group, having $N_i$ elements, be divided into two subgroups having $N_{1i}$ and $N_{2i}$ elements, for i = 1, 2, ..., p. Then the hit value for the column is given by

$\mathrm{Hit} = R_1 + \sum_{i=1}^{p} \left( N_i - |N_{1i} - N_{2i}| \right)$

If $N_{1i} = N_{2i}$, the contribution of the i-th group is maximum. A hit value calculation requires Θ(m) operations.

Calculation of Time Complexity
In the worst case, all n columns are needed. At each iteration, the hit value is calculated for each column not yet selected. A hit calculation for each column requires Θ(m) operations (i.e. a×m operations for some positive constant a). At the end of the iteration, the selected column is inserted. This insertion requires Θ(m) operations (i.e. b×m operations for some positive constant b). For the insertion of a column at the (i+1)th iteration, the hit calculation of (n-i) columns is required, but the insertion of the selected column still requires Θ(m) operations. By keeping track of the column of maximum hit after each column is considered, the column with the highest hit value can be obtained. The number of operations for this is Θ(n), i.e. c×n for some positive constant c. Continuing in this way, let n columns be inserted in the worst case. Then,

$T(n) = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} (am) + cn + bm \right) \le \sum_{i=1}^{n} \sum_{j=1}^{n} (am) + c \sum_{i=1}^{n} n + b \sum_{i=1}^{n} m = \sum_{i=1}^{n} (am \times n) + c \times n \times n + b \times m \times n = a \times m \times n \times n + c \times n \times n + b \times m \times n$

Let the number of columns be greater than the number of rows of the alarmmatrix, i.e. n > m (which is generally the case). Then,

$T(n) \le an^3 + cn^2 + bn^2 = an^3 + n^2 (b + c)$

Thus, $T(n) = O(n^3)$.

For the reference network, applying the approximation algorithm gives the reduced alarmmatrix in Table 3. It is possible to reduce the number of monitors further. The best solution is given in Table 4.

Table 3: Reduced alarmmatrix

        M1  M2  M3  M5  M6  M7
ND5      1   1   1   0   0   0
ND6      0   0   0   0   0   1
ND7      0   0   0   0   1   1
ND8      0   0   0   1   1   1
ND10     0   1   1   0   0   0
ND13     0   0   1   0   0   0

Table 4: Best alarmmatrix for reference network

        M2  M4  M5  M6  M7
ND5      1   0   0   0   0
ND6      0   0   0   0   1
ND7      0   0   0   1   1
ND8      0   0   1   1   1
ND10     1   1   0   0   0
ND13     0   1   0   0   0
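The greedy selection of Section 4.2 and the hit value of Section 4.2.1 can be sketched as follows on the Table 2 data. This is one possible reading of the pseudocode, not the authors' code. It assumes that ties are broken in favour of the first column examined, and that the loop continues until every row is both distinct and non-zero, which is the stated reduction criterion. All identifiers are illustrative.

```python
from collections import defaultdict

NODES = ["ND5", "ND6", "ND7", "ND8", "ND10", "ND13"]

# Table 2, stored per monitor: the set of nodes whose failure triggers it.
ALARMS = {
    "M1": {"ND5"},
    "M2": {"ND5", "ND10"},
    "M3": {"ND5", "ND10", "ND13"},
    "M4": {"ND10", "ND13"},
    "M5": {"ND8"},
    "M6": {"ND7", "ND8"},
    "M7": {"ND6", "ND7", "ND8"},
    "M8": {"ND10", "ND13"},
    "M9": {"ND13"},
    "M10": {"ND5", "ND10", "ND13"},
}


def hit_value(column, patterns):
    """Hit = R1 + sum over divided groups of (N_i - |N_1i - N_2i|)."""
    bits = {node: int(node in ALARMS[column]) for node in NODES}
    # R1: rows that receive their first 1 from this column.
    r1 = sum(1 for node in NODES if bits[node] and not any(patterns[node]))
    # Rows sharing the same pattern so far form a group.
    groups = defaultdict(list)
    for node in NODES:
        groups[patterns[node]].append(node)
    split_score = 0
    for members in groups.values():
        ones = sum(bits[node] for node in members)
        zeros = len(members) - ones
        if ones and zeros:  # the column divides this group in two
            split_score += len(members) - abs(ones - zeros)
    return r1 + split_score


def rows_identified(patterns):
    """True when every row is non-zero and all rows are distinct."""
    rows = list(patterns.values())
    return all(any(row) for row in rows) and len(set(rows)) == len(rows)


def greedy_select():
    """Repeatedly insert the unselected column with the highest hit value."""
    selected = []
    patterns = {node: () for node in NODES}  # all-zero (empty) patterns
    while not rows_identified(patterns):
        best, best_hit = None, 0
        for column in ALARMS:
            if column in selected:
                continue
            hit = hit_value(column, patterns)
            if hit > best_hit:  # strict comparison: first column wins ties
                best, best_hit = column, hit
        if best is None:  # no remaining column makes progress
            break
        selected.append(best)
        for node in NODES:
            patterns[node] += (int(node in ALARMS[best]),)
    return selected


selection = greedy_select()
print(selection)  # ['M3', 'M6', 'M1', 'M2', 'M5', 'M7']
```

Under these assumptions the sketch keeps the six monitors of Table 3, one more than the five-monitor solution of Table 4. In the first iteration M3 scores a hit value of 9: it gives the first 1 to three rows and divides the single six-row group into two equal halves.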

5. EXPERIMENTAL RESULTS
We apply our monitor placement scheme to the NSFnet topology with 14 nodes and 21 links. After building the alarmmatrix as in Section 4.1.2, we run the approximation algorithm and find the number of monitors required for locating single failures in this network. We also run the exhaustive method to find the minimum number of monitors needed, and we present a comparative study of the monitors required by the exhaustive method and by the approximation method (Figure 3). We also plot the maximum number of locations for monitor placement. Dynamic addition and deletion of lightpaths in the network have not been considered in this work. The approximation algorithm produces a near-optimal solution when the network is in an almost steady state with a large number of established lightpaths. When the average node degree is low because only a small number of lightpaths are used in the network, the reduction in monitor placements is relatively small.

Figure 3: Monitor Comparison

Figure 4: Deviation (recoverable plot labels: x-axis "No of Lightpaths", y-axis "% confidence level", series "Average Deviation" and "Worst case Deviation")

We also present the average deviation of our scheme. The graph (Figure 4) shows the average deviation, i.e. the percentage by which the approximation algorithm deviates on average from the optimal solution. The average deviation lies between 5% and 15%. The graph also shows the worst-case deviation, which lies between 20% and 50%.

6. CONCLUSION
In this paper, we have presented a scheme for placing monitors to detect a single fault in an optical network. We have shown that the monitors can be placed in polynomial time. A simple example has been given to illustrate the scheme. Our scheme shows an average deviation of around 5% to 15% from the optimal solution. Simultaneous link failure is common in networks [4], so an extension of this work will include simultaneous multiple failure detection. We are also working on how to cope with dynamic traffic conditions and with false and missing alarms. All of the above aim to make the fault detection system more robust and cost effective.

References
1. Carmen Mas and Patrick Thiran, "An Efficient Algorithm for Locating Soft and Hard Failures in WDM Networks", IEEE Journal on Selected Areas in Communications, Vol. 18, No. 10, Oct 2000, pp. 1900-1911.
2. Sava Stanic, Suresh Subramanium, Hongsik Choi, Gokhan Sahin and Hyeong-Ah Choi, "On Monitoring Transparent Optical Networks", Proceedings of the International Conference on Parallel Processing Workshops (ICPPW), 2002.
3. R. H. Deng, A. A. Lazar, W. Wang, "A Probabilistic Approach to Fault Diagnosis in Linear Lightwave Networks", IEEE JSAC, Vol. 11, No. 9, pp. 1438-1448.
4. I. Katzela, G. Ellinas, W. S. Yoon, T. E. Stern, "Fault Diagnosis in Optical Networks", Journal of High Speed Networks, Vol. 10, No. 4, 2001, pp. 269-291.
5. William Stallings, "SNMP, SNMPv2, SNMPv3, and RMON 1 and 2", Third Edition, Addison-Wesley Longman Inc., 1999.
6. Cisco Systems, "NetFlow Services and Application", 1999.
