
Probe Station Selection Algorithms for Fault Management in Computer Networks

Deepak Jeswani, Nakul Korde, Dinesh Patil, Maitreya Natu

John Augustine

Tata Research Development and Design Centre, Pune, India. Email: {d.jeswani, nakul2.k, d.patil, maitreya.natu}@tcs.com

School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore. Email: [email protected]

Abstract—In this paper, we address the problem of probe station selection. Probe station nodes are the nodes that are instrumented with the functionality of sending probes and analyzing probe results. The placement of probe stations affects the diagnosis capability of the probes sent by the probe stations. The probe station placement also involves the overhead of instrumentation. Thus it is important to minimize the required number of probe stations without compromising on the required diagnosis capability of the probes. In this paper, we address the problem of selection of probe stations to detect failures in the network. We present an algorithm for probe station selection using a reduction of the probe station selection problem to the Minimum Hitting Set problem. We address several issues involved while selecting probe stations such as link failures and probe station failures. We present experimental evaluation to show the effectiveness of the proposed approach.

978-1-4244-5489-1/10/$26.00 © 2010 IEEE

I. INTRODUCTION

The demand for monitoring networks to detect and localize faults is becoming increasingly critical as networks grow in size and complexity. Various approaches to network monitoring have been proposed in the past. One promising approach is based on probing [4][5]. Probing involves sending probes as test transactions in the network. The success or failure of a probe depends on the success or failure of the network components used by the probe. Probes such as pings and traceroutes can be used to check network availability and latency. More sophisticated application-level probes can be used to test application performance. Probing-based techniques have various advantages over traditional passive-monitoring techniques [15], such as (1) less instrumentation, (2) the capability to compute end-to-end performance, and (3) quicker localization.

There are two main problems to address while developing probing-based monitoring solutions, namely probe station selection and probe selection. The probe station selection problem addresses the problem of selecting nodes in the network where the probe stations should be placed. The probe stations are the nodes that send probes into the network and analyze probe results. The probe station nodes should be selected such that the required diagnosis capability can be achieved through probes. Furthermore, as probe station selection involves an additional instrumentation cost, the number of probe stations needs to be minimized. Once the probe stations are identified, the probe selection problem addresses the task of selecting appropriate probes such that failures can be detected and localized. As the probes involve an additional traffic overhead, the number of probes should be minimized. At the same time, the probes should be selected such that the detection and localization time is minimal.

In this paper, we address the probe station selection problem. We present an algorithm for probe station selection for localizing node failures. Later we show how the proposed approach can be modified to address link failures. In order to simplify the problem, we limit probe station selection to localizing at most k simultaneous node failures. The probe station problem can be defined as:

Given a network, find the minimal number of nodes in the network where the probe stations should be placed, such that any k node failures can be localized.

The problem of probe station selection has been addressed in the past by various researchers in different domains such as topology discovery, distance estimation, traffic estimation, etc. Most of the probe station selection techniques suffer from the following limitations:

• In most cases, the presence of failures in the network is not considered. The problem of probe station selection becomes more challenging in the presence of failures in the network. The connectivity of nodes from the probe stations changes in the presence of failures, thus affecting the overall diagnosis capability of the probe stations.
• One way to address the probe station placement problem is to deploy a small number of probe stations, but use probes that pass through explicitly routed paths [3] [9]. This approach involves very little instrumentation but demands source routing, which is not feasible in many networks.
• Approaches have also been proposed that require placement of probe stations at all nodes, making each node monitor its neighborhood [14]. This approach does not demand explicit routing of probes. Also, the task of sending and analyzing probes gets divided among all the nodes, which reduces the per-probe-station overhead. However, the placement of probe stations involves an overhead of instrumentation. Instrumentation overhead refers to the deployment of the probing functionality on the nodes in the network that are chosen as the probe stations. For real-life solutions, it is important to minimize the number of probe stations.

We address the above limitations in this paper.

• We address the presence of failures in the network by placing probe stations such that all nodes can be monitored by a probe station even in the presence of failures. We do this by ensuring that each node is monitored by multiple probe stations through independent paths.
• We do not assume explicit routing of probe packets. Instead, we build the probe station selection solution using the routes used by the default underlying routing model. In this work, we assume that the routes do not change dynamically. Addressing dynamic routing is part of our future work.
• We intelligently place probe stations to reduce the number of probe stations that are required to achieve the desired diagnosis capability. We present a novel reduction of the Minimum Probe Station Selection problem to the Minimum Hitting Set problem. We then use the approximation algorithms for computing the Minimum Hitting Set to develop approximation algorithms for Minimum Probe Station Selection.

We have presented a detailed problem statement and initial ideas on reducing the probe station selection problem to the Minimum Hitting Set problem in an extended abstract in [8]. The main contributions of this paper are as follows:

(1) We use a reduction of the probe station selection problem to the Minimum Hitting Set problem to present probe station selection algorithms using the approximation algorithm for the Minimum Hitting Set problem.
(2) We address various aspects of the probe station selection problem such as the diagnosis of link failures and the diagnosis of probe station failures.
(3) We present a detailed experimental evaluation to compare the performance of the proposed algorithm in different network configurations and with different algorithms.

We present related work in Section II. We present the detailed problem description in Section III. We then show the reduction of the probe station selection problem to the Minimum Hitting Set problem and present algorithms using the existing approximation algorithms for the Minimum Hitting Set problem in Section IV. We show how the proposed approach can be extended to address link failures in Section V. Section VI presents the experimental evaluation of the proposed algorithms. We discuss open issues related to the problems of probing in general and probe station selection in particular in Section VII. Section VIII presents the conclusion and future work.

II. RELATED WORK

The problem of probe station selection has been addressed in a variety of ways in the past. Authors in [7] and [16]

propose to divide the network into clusters and strategically place probe stations (tracers) such that each cluster is near at least one tracer. A similar problem has been solved in [16] by using clustering, where the network is divided into clusters and a representative distance server is selected for each cluster. Horton et al. [6] propose to place probe stations at high-arity nodes and choose a routing policy to determine the direction of message forwarding. Authors in [9] propose to deploy a single network operations center and use explicitly routed probes along the network paths of interest. Authors in [1] manually choose probe stations segregated into four groups for different domains.

Most of the existing techniques (1) assume explicit routing of probes and (2) do not consider network failures. Explicit routing is not always viable in real systems. Also, the probe station selection problem becomes more challenging when network failures are taken into account.

In the past, some researchers have addressed the problem of probe station selection while solving the problem of diagnosing node and link failures [2], [12]. Authors in [2] propose to place probe stations (beacons) in the network such that the network links can be monitored even in the presence of dynamically changing routes. This is done by identifying the deterministically monitorable edge set, which is the set of edges that can be monitored under all routing configurations. We have also addressed the problem of probe station selection in the past [12]. In [12] we propose a probe station selection algorithm (the SNR algorithm) for localizing node failures in the network.

In this paper, we present a novel approach to address the Minimum Probe Station Selection problem by a systematic reduction of the Minimum Probe Station Selection problem to the Minimum Hitting Set problem. We show through experimental evaluation that the proposed algorithm outperforms the past algorithms.
Furthermore, the proposed reduction provides opportunities for improvement in the Minimum Probe Station Selection problem with better approximation algorithms for the Minimum Hitting Set problem.

III. DETAILED PROBLEM DESCRIPTION

We first define the terms we use in the following discussion. We represent the network as a graph G(V,E), where V represents the network nodes and E the network edges. The probe station set P, where P ⊆ V, consists of the set of nodes chosen as the probe stations. For clarity, we address only node failures. We later show how the proposed approach can be used to address link failures in Section V. For the ongoing discussion we assume static single-path routing. Thus packets between a particular source-destination pair always follow a single path that does not change with time. We refer to this routing model as the consistent IP routing model. A path from a probe station node to any other node is referred to as a probe path. Two paths to the same destination are said to be independent if there are no common nodes in the paths except the destination node. A node whose failure cannot be detected by the probe stations in a certain scenario of k node failures is termed a shadow node.

Fig. 1. Three independent paths to node 7 allow detection of failure at node 7 even in the presence of two other node failures (nodes 2 and 4).

Fig. 2. Example topology with nodes 1 and 3 as probe stations. Figure shows available paths from probe station nodes 1 and 3 to other nodes. With the given availability of paths, all nodes except node 5 have 2 independent paths from the probe stations, making node 5 the shadow node (assuming k = 2).

For clarity we assume that the probe station nodes are fault tolerant and hence do not fail. We later relax this assumption to use the proposed algorithms for detecting probe station failures as well. The algorithms presented in this paper use the following theorem:

Theorem 3.1: Assuming a consistent IP routing model with at most k failures in the network, a set of probe stations can localize any k non-probe-station node failures in the network if and only if there exist k independent probe paths to each non-probe-station node [12].

The example shown in Figure 1 explains the above theorem. Figure 1 shows paths to node 7 from the probe station nodes 1, 3, and 5. It can be seen that node 7 has 3 independent (node-disjoint) paths from the probe station nodes 1, 3, and 5. Thus failure of node 7 can be detected even if there are two more failures in the network. For instance, if nodes 2 and 4 fail, then probe station nodes 1 and 3 will not be able to detect the failure of node 7. With the assumption of at most 3 failures, the third independent path to node 7 from the probe station node 5 will not have any intermediate failures, allowing the probe station node 5 to detect the failure of node 7. From the above theorem, the shadow nodes can be redefined as the nodes that have fewer than k independent paths from the probe stations. The concept of k independent paths has also been used in the past [2], [12].

Figure 2 shows an example topology with nodes 1 and 3 as probe station nodes. Figure 2 also shows the paths from the probe station nodes to other nodes. It can be seen that nodes 2, 4, and 6 have 2 independent paths from the probe stations. However, node 5 does not have two independent paths from the probe station nodes. Note that we do not demand explicit path routing. With a customized

routing, node 5 can be provided 2 independent paths from the probe stations. However, with the given path availability, paths from both the probe station nodes to node 5 have a common node 4. Assuming k equals 2, in the topology presented in Figure 2, node 5 is the shadow node.

The objective of the probe station selection algorithm is to select probe stations such that there are no shadow nodes in the network. The probe station selection problem can then be defined as follows:

Given a network, find the minimal number of nodes in the network where the probe stations should be placed, such that there exist k independent paths from the probe station nodes to all non-probe-station nodes.

We next address two other issues related to the problem of probe station selection. We first discuss how the neighboring nodes of the probe station nodes can be treated. We then address the problem of probe station failures.

A. Treatment of the neighboring nodes of the probe-station nodes

The probe station neighbors can be directly probed by the probe stations. The probe path to a neighboring node does not suffer from the presence of k failures in the network. Thus while selecting probe stations, k independent paths need not be computed for the probe station neighbors. However, this assumption cannot be made in certain scenarios such as probe station failures or link failures. In such cases the probe station might not be able to probe its neighboring probe station, and thus the neighboring probe stations also require k independent probe paths. One approach to take advantage of the better connectivity of the probe station neighbors is to provide fewer (k′ < k) independent probe paths to these nodes.

B. Considering probe station failures

We have assumed that the probe stations are fault-tolerant. In cases where the fault tolerance of probe stations is uncertain, this assumption needs to be relaxed.
We can relax this assumption and address probe station failures with some simple modifications. In order to detect probe station failures and ensure probe station connectivity even in the presence of probe station failures, we propose to provide k independent probe paths to each probe station as well. Thus the problem of probe station selection is then redefined as:

Given a network, find the minimal number of nodes in the network where the probe stations should be placed, such that there exist k independent paths from the probe station nodes to all non-probe-station nodes as well as the probe station nodes.

Similar to the probe station neighbors, one approach to take advantage of the fault tolerance of the probe station nodes is to provide fewer (k″ < k) independent probe paths to these nodes. For clarity, we do not consider probe station failure in the following discussion.
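The notions of independent paths and shadow nodes used in this section can be made concrete with a small sketch. The code below is illustrative only and is not the authors' implementation: the function names, the `paths` dictionary (mapping a source-destination pair to its fixed route), and the toy routes are all hypothetical. The routes loosely mimic the situation described for Figure 2, where both routes to node 5 share node 4, but they are not the figure's actual topology. The check is a brute-force search over groups of k probe stations, and the special treatment of probe station neighbors is not modelled.

```python
from itertools import combinations


def independent(path_a, path_b):
    """True if two paths to the same destination share only that destination."""
    if path_a[-1] != path_b[-1]:
        raise ValueError("paths must end at the same destination")
    return set(path_a) & set(path_b) == {path_a[-1]}


def has_k_independent_paths(node, stations, paths, k):
    """Brute force: does some group of k stations reach `node` over
    pairwise independent (fixed) routes?"""
    for group in combinations(sorted(stations), k):
        routes = [paths[(s, node)] for s in group]
        if all(independent(a, b) for a, b in combinations(routes, 2)):
            return True
    return False


def shadow_nodes(nodes, stations, paths, k):
    """Non-probe-station nodes with fewer than k independent probe paths."""
    return {
        v for v in nodes
        if v not in stations
        and not has_k_independent_paths(v, stations, paths, k)
    }


# Hypothetical routes (an assumption, not the real Fig. 2 topology):
# both routes to node 5 pass through node 4.
example_paths = {
    (1, 2): [1, 2], (3, 2): [3, 2],
    (1, 4): [1, 4], (3, 4): [3, 4],
    (1, 5): [1, 4, 5], (3, 5): [3, 4, 5],
    (1, 6): [1, 6], (3, 6): [3, 6],
}
print(shadow_nodes({1, 2, 3, 4, 5, 6}, {1, 3}, example_paths, k=2))  # {5}
```

With probe stations 1 and 3 and k = 2, only node 5 is reported as a shadow node, because its two routes share the intermediate node 4.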

Fig. 4. Proposed approach using the Minimum Hitting Set reduction.

We next show that the probe station selection problem can be reduced to the Minimum Hitting Set problem. We present a novel reduction of the probe station selection problem to the Minimum Hitting Set problem. We then present algorithms for probe station selection using the approximation algorithms for the Minimum Hitting Set problem.

IV. PROPOSED APPROACH

The Minimum Probe Station Selection problem has already been proved to be NP-Complete [12] using a reduction from the Set Cover problem. It can be easily proved that the log(n) inapproximability result of the Minimum Set Cover problem carries over to the Minimum Probe Station Selection problem as well. In this section, we reduce the Minimum Probe Station Selection problem to a dual of the Set Cover problem, namely, the Minimum Hitting Set problem. Using this reduction we show that a solution to the Minimum Hitting Set problem can provide a solution to the Minimum Probe Station Selection problem. We then take the approximation algorithms for the Minimum Hitting Set problem and apply them to the Minimum Probe Station Selection problem. Note that, Minimum Set Cover being the dual of Minimum Hitting Set, the problem can be solved in a similar manner using the Minimum Set Cover problem.

A. Reduction to the Minimum Hitting Set problem

We model the network by an undirected graph G(V, E), where the graph nodes, V, represent the network nodes (routers, end hosts) and the edges, E, represent the communication links connecting the nodes. We use $P_{u,v}$ to denote the path traversed by an IP packet from a source node u to a destination node v. To simplify the reduction, we make the following assumptions about the underlying routing model: (1)

the nodes use shortest paths to reach other nodes, (2) packets for the same destination are always forwarded to the same next hop by a node, (3) paths are symmetric, and (4) the path between any two nodes is static and does not change with time. As a consequence, for a node s, the subgraph obtained by merging all the paths $P_{s,t}$ for every t ∈ V has a tree topology. We refer to this tree for node s as its routing tree $T_s$.

In what follows, we use the following definitions:

• Path nodes: We represent the nodes used on a path $P_{u,v}$ with a set $PN_{u,v}$.
• Independent paths to a node: We define paths $P_{u_1,v}$ and $P_{u_2,v}$ to the same destination node v as independent if none of the nodes on $P_{u_1,v}$ is present on $P_{u_2,v}$ except the destination node v. That is, $PN_{u_1,v} \cap PN_{u_2,v} = \{v\}$.

The probe station selection problem is to select the smallest set of nodes as probe stations such that for every node v that is not a probe station, there are k independent paths from the probe stations to v. To facilitate the reduction, we define both the Minimum Probe Station Selection problem and the Minimum Hitting Set problem precisely.

Minimum Probe Station Selection problem
Instance: Graph G(V, E), a routing tree $T_u$ for each node u ∈ V, an integer k.
Problem: Find the set Q ⊆ V of least cardinality such that every node u ∈ V − Q has k independent paths from the nodes in Q.

Minimum Hitting Set problem
Instance: Collection C of subsets of a finite set S.
Problem: Find the hitting set H ⊆ S of least cardinality for C. A set H ⊆ S is a hitting set of C if it contains at least one element from each subset in C.

We now provide a reduction of the Minimum Probe Station Selection problem to the Minimum Hitting Set problem such that a good solution for the Minimum Hitting Set problem (small hitting set size) will intuitively imply a good solution to the Minimum Probe Station Selection problem (small number of probe stations).
The instance of the Minimum Hitting Set problem, consisting of a finite set S and a collection C of subsets of S, is constructed as follows. Given a Minimum Probe Station Selection instance (graph G(V,E), routing tree $T_u$ for each node u ∈ V, integer k), create a Minimum Hitting Set instance as follows:

1) The elements of S are themselves sets. They are the $\binom{|V|}{k}$ distinct subsets $S_1, \ldots, S_{\binom{|V|}{k}}$ of the set V such that $|S_j| = k$.
2) Collection C of subsets of S: Each $C_i \in C$ (corresponding to vertex $v_i \in V$) is a subset of S and contains the elements $S_j \in S$ such that the set of nodes represented by $S_j$ provides k independent paths to the node $v_i$.

It can be seen that the above reduction can be performed in polynomial time for a fixed k. For a network size of n nodes and diagnosis of at most k failures, the reduction can be

Fig. 3. Construction of the Minimum Hitting Set problem instance from the Minimum Probe Station Selection problem instance.

performed in $O(n^k)$ operations. From the resulting hitting set H ⊆ S, the probe station set Q can be derived as follows:

$$Q = \bigcup_{H_i \in H} H_i \qquad (1)$$
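The construction and Equation (1) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the `paths` dictionary are hypothetical, and how a subset containing the destination node itself is treated is left to the caller, who may or may not supply a trivial route `paths[(v, v)] = [v]`. A subset with a missing route is simply not added to the corresponding set in C.

```python
from itertools import combinations


def pairwise_independent(routes, dest):
    """True if the given routes to `dest` share no node other than `dest`."""
    return all(set(a) & set(b) == {dest} for a, b in combinations(routes, 2))


def build_hitting_set_instance(nodes, paths, k):
    """S: all k-subsets of V. C[v]: the k-subsets whose routes to v are
    pairwise independent (subsets with a missing route are skipped)."""
    S = [frozenset(c) for c in combinations(sorted(nodes), k)]
    C = {}
    for v in nodes:
        C[v] = set()
        for subset in S:
            routes = [paths.get((u, v)) for u in sorted(subset)]
            if any(r is None for r in routes):
                continue
            if pairwise_independent(routes, v):
                C[v].add(subset)
    return S, C


def stations_from_hitting_set(H):
    """Equation (1): the probe station set is the union of the chosen subsets."""
    return set().union(*H)


# Hypothetical 3-node line topology 1 - 2 - 3 with trivial self-routes.
line_paths = {
    (1, 1): [1], (2, 1): [2, 1], (3, 1): [3, 2, 1],
    (1, 2): [1, 2], (2, 2): [2], (3, 2): [3, 2],
    (1, 3): [1, 2, 3], (2, 3): [2, 3], (3, 3): [3],
}
S, C = build_hitting_set_instance({1, 2, 3}, line_paths, k=2)
```

In this toy instance the single element {1, 3} appears in every set of C, so H = {{1, 3}} is a hitting set and Equation (1) yields the probe station set {1, 3}.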

The above reduction can be explained with the example shown in Figure 3. Figure 3a shows a set V of 5 nodes and a set S of the $\binom{5}{2} = 10$ combinations of the set V, where each combination $S_i \in S$ is of size 2. A node v ∈ V is connected to a node pair $S_i \in S$ if the set of nodes in $S_i$ provides two independent paths to v. For instance, node 1 is connected to pairs (1,2) and (1,3), as these pairs provide independent paths to node 1. For clarity, the actual network and the paths from each node to every other node are not shown.

The set S is mapped one-to-one to a set S′ in Figure 3b. For instance, the pairs (1,2) and (1,3) in S are mapped to the elements a and b in the set S′. Based on the mapping of each node v ∈ V to the node pairs in S, a collection C of subsets of S′ is built. The collection C contains |V| sets such that one node v ∈ V corresponds to one set $C_i \in C$. If a node v ∈ V is connected to m sets in S, then the set $C_i$ consists of the corresponding m elements in the set S′. For instance, as node 1 is connected to pairs (1,2) and (1,3), node 1 is mapped to the set {a,b} in C.

A Minimum Hitting Set solution for the instance in Figure 3b results in the solution (a,e). The sets a and e in S′ correspond to the sets (1,2) and (2,3) in S. Thus from the Minimum Hitting Set solution (a,e), the probe station set solution (1,2,3) can be built. Note that another Minimum Hitting Set solution (a,i) results in the probe station set (1,2,3,5). We discuss this issue later in the paper.
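The mapping from hitting sets back to probe station sets in this example can be replayed with a short sketch. It assumes that the elements of S′ are labelled a, b, c, ... in lexicographic order of the node pairs; this ordering is an assumption, chosen because it agrees with the stated mappings a = (1,2) and b = (1,3) and with the two probe station sets given in the text. The names `label` and `probe_stations` are hypothetical.

```python
from itertools import combinations
from string import ascii_lowercase

V = [1, 2, 3, 4, 5]

# All 2-subsets of V in lexicographic order: (1,2), (1,3), ..., (4,5).
pairs = list(combinations(V, 2))

# Assumed labelling of S': a -> (1,2), b -> (1,3), ..., j -> (4,5).
label = dict(zip(ascii_lowercase, pairs))


def probe_stations(hitting_set):
    """Union of the node pairs named by the hitting set (Equation (1))."""
    return set().union(*(label[x] for x in hitting_set))


print(len(pairs))            # 10
print(probe_stations("ae"))  # {1, 2, 3}
print(probe_stations("ai"))  # {1, 2, 3, 5}
```

Both hitting sets have size two, yet they expand to probe station sets of different sizes, which is the issue the weighted heuristic later addresses.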

B. Greedy algorithm for probe station selection

In this section, we present an algorithm for probe station selection using a greedy approximation algorithm for the Minimum Hitting Set problem. We implemented a greedy approximation algorithm for the Minimum Hitting Set problem and used the reduction explained in Section IV-A to compute probe stations. For a problem instance with n sets, the computational complexity of the greedy approximation algorithm for computing the Minimum Hitting Set is $O(n^2)$. Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial-time approximation algorithm for the Minimum Hitting Set problem under plausible complexity assumptions.

For a given hitting set instance (S, C), the greedy approximation algorithm incrementally selects an element $s_i \in S$ that is present in the largest number of uncovered sets in C. The elements are selected in this fashion until all the sets in C are covered. Note that we call a set uncovered if none of the elements in the set are present in the hitting set H. Thus a set $C_j \in C$ is uncovered if

$$\forall h_i \in H:\; h_i \notin C_j \qquad (2)$$

From the resulting hitting set the probe station set is built as per the reduction shown in Section IV-A. Note that the solution for the Minimum Hitting Set problem only attempts to minimize the size of the hitting set. Consider the example shown in Figure 3. The figure shows two solutions to the problem instance. The probe station set generated from the solution {a,e} is {1,2,3}. On the other hand, another solution {a,i}, though of the same size, generates a larger probe station set {1,2,3,5}.
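The greedy selection rule described above can be sketched as follows. This is a generic greedy hitting set routine written for illustration, not the authors' implementation; the function name, the dictionary representation of C, and the example instance are hypothetical (the text does not list the actual sets of Figure 3). Ties are broken alphabetically, which is an arbitrary choice.

```python
def greedy_hitting_set(C):
    """C maps a name to a non-empty set of elements. Repeatedly pick the
    element present in the largest number of still-uncovered sets."""
    uncovered = {name: set(s) for name, s in C.items()}
    if any(not s for s in uncovered.values()):
        raise ValueError("an empty set can never be hit")
    hitting_set = []
    while uncovered:
        # Count, for each element, the uncovered sets that contain it.
        counts = {}
        for s in uncovered.values():
            for element in s:
                counts[element] = counts.get(element, 0) + 1
        # sorted() makes tie-breaking deterministic (first maximum wins).
        best = max(sorted(counts), key=counts.get)
        hitting_set.append(best)
        # A set is covered once it contains a chosen element (Equation (2)).
        uncovered = {n: s for n, s in uncovered.items() if best not in s}
    return hitting_set


# Hypothetical instance: one set per network node, elements named as in S'.
example_C = {
    1: {"a", "b"},
    2: {"a", "e"},
    3: {"b", "e"},
    4: {"e", "i"},
    5: {"e", "i"},
}
print(greedy_hitting_set(example_C))  # ['e', 'a']
```

Element e is picked first because it appears in four sets; only the set for node 1 then remains, and a is chosen by the alphabetical tie-break.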

From the perspective of the Minimum Probe Station Selection problem, the hitting set solution that generates a smaller probe station set should be preferred. To address this requirement, we present a weighted greedy approach to find the hitting set. We propose to add a weight to each element $S_i$, where the weight is computed as follows. From the construction, we know that each element $S_i \in S$ represents a set of nodes that provide independent paths to some node v ∈ V. The weight of an element $S_i$ is equal to the number of nodes in the set represented by $S_i$ that are different from the nodes represented by the elements already selected in the hitting set computed so far. Thus the weight represents the number of additional nodes that get added to the resulting probe station set. The algorithm then uses the following heuristic: select the element $S_i \in S$ such that $S_i$ is present in the maximum number of uncovered sets in C and adds the minimum number of new nodes to the resulting probe station set. The algorithm uses a weighted heuristic assigning equal weights to both of the above criteria.
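One possible reading of the weighted heuristic is sketched below. The text states only that the two criteria receive equal weights; combining them as "number of uncovered sets hit minus number of new nodes added" is an assumption made here for illustration, and the authors' exact scoring may differ. The function name, the `members` mapping (element label to the nodes it represents), and the example instance are hypothetical.

```python
def weighted_greedy_hitting_set(C, members):
    """C maps a name to a non-empty set of element labels; members maps a
    label to the network nodes it stands for. Returns (chosen, stations)."""
    uncovered = {name: set(s) for name, s in C.items()}
    if any(not s for s in uncovered.values()):
        raise ValueError("an empty set can never be hit")
    chosen, stations = [], set()
    while uncovered:
        def score(element):
            # Assumed equal-weight combination of the two criteria.
            coverage = sum(1 for s in uncovered.values() if element in s)
            new_nodes = len(set(members[element]) - stations)
            return coverage - new_nodes

        candidates = sorted(set().union(*uncovered.values()))
        best = max(candidates, key=score)  # first maximum wins on ties
        chosen.append(best)
        stations |= set(members[best])
        uncovered = {n: s for n, s in uncovered.items() if best not in s}
    return chosen, stations


# Hypothetical instance in which e and i hit the same uncovered sets.
example_members = {"a": {1, 2}, "e": {2, 3}, "i": {3, 5}}
example_C = {1: {"a"}, 2: {"a"}, 3: {"e", "i"}, 4: {"e", "i"}}
print(weighted_greedy_hitting_set(example_C, example_members))
# (['a', 'e'], {1, 2, 3})
```

After a is chosen, e and i cover the same two remaining sets, but e adds only node 3 to the probe station set whereas i adds nodes 3 and 5, so the weighted score prefers e.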

Fig. 5. (a) Network topology. (b) Model of the network in (a) representing both nodes and links as graph nodes.

V. ADDRESSING LINK FAILURES

We considered only node failures in the approach presented in the previous sections. We next show how the proposed approach can be used to address link failures. Considering link failures brings in various challenges for the proposed solution. Links are different from nodes in various ways that affect the probing solutions. For instance:

• Unlike nodes, components representing links cannot be probe stations.
• In a connected network assuming the default IP routes, all nodes can be reached from any probe station. On the other hand, not all links might be reachable from a chosen probe station using the default routes. Thus the probe station selection component needs additional functionality such that the health of all nodes and links can be diagnosed even in the presence of failures of nodes and links.
• Probes cannot start or end at the link components. They always start and end at the nodes.
• Probes passing through a link always pass through the end-nodes of the link. In the case of a node, there can exist probes that pass through the node without passing through some of its incident links. The tight coupling of a link with its end-nodes demands additional functionality in the probe selection component for localization of node and link failures.

A. Modelling of links

In order to localize both node and link failures in the network, we present a model where both the physical links and the nodes of the network are modelled as graph nodes. We propose to represent each link component with a graph node. Thus a node-link-node component $n_1$ - $n_2$ will be represented as a node-node-node component $n_1$ - $e_{n_1,n_2}$ - $n_2$.
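The link model just described amounts to a simple graph transformation, sketched below under stated assumptions. The function names and the tuple `("e", n1, n2)` used to name the graph node for a link are hypothetical conventions, and, as in the text, both directions of a link are represented by a single component.

```python
def model_links_as_nodes(edges):
    """Return the adjacency of the transformed graph in which every
    physical link n1 - n2 becomes a graph node ("e", n1, n2) placed
    between n1 and n2."""
    adjacency = {}
    for n1, n2 in edges:
        a, b = sorted((n1, n2))  # one component for both directions
        link = ("e", a, b)
        for x, y in ((a, link), (link, b)):
            adjacency.setdefault(x, set()).add(y)
            adjacency.setdefault(y, set()).add(x)
    return adjacency


def expand_path(path):
    """Rewrite a node path so that it also lists the link components
    it traverses, e.g. [1, 2] -> [1, ("e", 1, 2), 2]."""
    expanded = [path[0]]
    for u, v in zip(path, path[1:]):
        a, b = sorted((u, v))
        expanded += [("e", a, b), v]
    return expanded


graph = model_links_as_nodes([(1, 2), (2, 3)])
print(expand_path([3, 2, 1]))  # [3, ('e', 2, 3), 2, ('e', 1, 2), 1]
```

Once paths are expanded this way, node and link components can be handled uniformly as graph nodes by the path-independence checks.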

Fig. 6. (a) Two independent paths to node 5. (b) Three independent paths to link 4-5.

With such a representation, all the network components of interest are represented as nodes. The links in the model do not represent any physical component. The links in the model simply represent the logical interconnections between the components. The links $n_1 \to n_2$ and $n_1 \leftarrow n_2$ can be represented as two different components. For simplicity we represent them with a single component $e_{n_1,n_2}$. Figure 5 shows an example of this model for an example network.

B. Independence of paths

Recall from Section IV that we call two paths to the same destination independent if there does not exist any common component on the two paths except the destination node. If the destination component is a network node, then the independent paths to the node do not have any component in common other than the destination node. Figure 6a shows two independent paths from probe station nodes 2 and 7 to the node 5. However, as the probes travel from a node to another node, a probe passing through a link will always pass through the end-nodes of the link. Thus, if the destination component is a link, then the independent paths to this link will have a common node-link-node component of the destination link. Figure 6b shows three independent paths to the link connecting nodes 4 and 5. Note that in this example, the links 4 → 5 and 5 → 4 are not

differentiated and the independent paths to both contribute to the independent paths to the link 4-5.

C. Neighboring components of a probe station node

In the absence of link failures, the nodes connected to the probe station by a direct link can be considered as the neighbors of the probe station and thus can always be diagnosed without requiring multiple independent paths. However, when considering link failures, the above assumption cannot be made. The failure of the link between the probe station and a directly connected node can prevent the probe station from diagnosing the health of its 1-hop neighboring nodes. Thus, when considering link failures, the neighboring components of a probe station node are only the links incident on the probe station nodes.

VI. EXPERIMENTAL EVALUATION

In this section, we present the experimental evaluation of the proposed algorithm. We apply the algorithms to build a probe station set that can localize at most 3 node failures. We assume that the probe stations are made fault tolerant and do not fail. As the optimal (combinatorial) algorithm provides the smallest probe station set, we use its result as a benchmark to compare the size of the probe station set computed using the proposed heuristic-based algorithms. Because of its combinatorial nature, we were not able to run the optimal algorithm for larger networks. For evaluation of the proposed algorithms on larger networks, we compared the proposed algorithms with (a) the Random selection algorithm, (b) the Max-Degree algorithm [6], and (c) the Shadow Node Reduction (SNR) algorithm [12].

In the Random selection algorithm, probe stations are selected at random locations in the network until the desired diagnostic power is obtained. The Random selection algorithm selects a node as a probe station randomly and updates the shadow node set. The algorithm keeps selecting probe stations in this fashion until it achieves a null shadow node set.
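The Random selection baseline can be sketched as a short loop. This is an illustrative reading of the description above, not the code used in the experiments: the function name is hypothetical, and the shadow node computation is passed in as a callback `shadow_of(stations)` so that the sketch does not commit to any particular routing model. The toy callback used in the example is purely artificial.

```python
import random


def random_selection(nodes, shadow_of, seed=None):
    """Add randomly chosen nodes as probe stations until the shadow node
    set reported by `shadow_of(stations)` is empty."""
    rng = random.Random(seed)
    order = sorted(nodes)
    rng.shuffle(order)
    stations = set()
    for candidate in order:
        if stations and not shadow_of(stations):
            break  # null shadow node set reached
        stations.add(candidate)
    return stations


# Artificial shadow-node rule (an assumption for the demo only): every
# non-station node stays a shadow node until two stations are placed.
all_nodes = {1, 2, 3, 4, 5}


def toy_shadow_of(stations):
    if len(stations) >= 2:
        return set()
    return all_nodes - stations


chosen = random_selection(all_nodes, toy_shadow_of, seed=0)
print(len(chosen))  # 2
```

Replacing the shuffled order with nodes sorted by decreasing degree would give a loop of the same shape for a degree-based baseline.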
The Max-Degree algorithm [6] selects the node with maximum degree as a probe station and updates the shadow node set. The algorithm keeps selecting the node with the highest degree amongst the remaining nodes until it achieves a null shadow node set. The Shadow Node Reduction algorithm selects a node that provides independent paths to the maximum number of nodes. The algorithm keeps selecting probe stations until each node can be probed by k independent paths.

We conducted experiments on network sizes ranging from 25 to 150 nodes with average node degrees ranging from 6 to 8. We used BRITE [10] to generate network topologies. Each point plotted on the graphs is an average of 20 runs. We have plotted the 95% confidence intervals.

A. Comparison with the Optimal algorithm

We were unable to run the optimal algorithm on larger networks because of its large execution time. Hence for evaluation

of the proposed algorithms with the optimal algorithm, we conducted experiments on network sizes from 10 to 50 nodes with average node degree varying from 4 to 6. Figure 7 compares the proposed algorithm with the Optimal, Random, Max-Degree, and SNR algorithms. The results show that the proposed algorithm (Greedy HS) provides results very close to the optimal. The proposed algorithm also performs better than the SNR algorithm [12] proposed in the past. The Max-Degree and the Random algorithms compute significantly larger probe station sets.

Figure 7 shows the results for two different values of average node degree. Figure 7a and Figure 7b show the results for networks with average degree 4 and 6 respectively. It can be seen that fewer probe stations are required with increasing average node degree. With higher average node degree, the nodes are more densely connected, providing more paths from the probe stations to the nodes. Thus with higher average node degree, fewer probe stations are able to provide more independent paths to the non-probe-station nodes.

B. Evaluation on larger network sizes

We next perform the comparison without the Optimal algorithm. We were unable to run the Optimal algorithm on larger network sizes because of its large execution time. In this section we compare the performance of the proposed algorithm with the Random, Max-Degree, and SNR algorithms on larger network sizes. We considered networks of size ranging from 25 to 150 nodes with an average node degree varying from 6 to 8. Figure 8 shows the number of probe stations computed by the different algorithms in this scenario. As part of future work, we plan to perform experiments on even larger network sizes and on real networks.

It can be seen in Figure 8 that the SNR and the proposed GreedyHS algorithms significantly outperform the Random and Max-Degree algorithms.
As observed in the previous section, the probe station set computed by the GreedyHS algorithm is smaller than that computed by the SNR algorithm. The difference in the sizes of the probe station sets computed by the two algorithms decreases with increasing network size. However, the proposed reduction presents opportunities for improving the solution to the Minimum Probe Station Selection problem: a more effective approximation algorithm for the Minimum Hitting Set problem can yield a more effective algorithm for computing the minimum probe station set. This investigation is part of our future work.

Figure 8 also shows the effect of the average node degree on the size of the computed probe station set. Figure 8a and Figure 8b present results for networks with average node degrees 6 and 8 respectively. As in the previous section, the number of probe stations decreases as the average node degree increases.

VII. DISCUSSION

There are various other issues that need to be addressed while developing a complete probing-based solution. In this

Fig. 7. Number of probe stations required for the optimal and other algorithms for networks with (a) degree = 4, (b) degree = 6.

Fig. 8. Number of probe stations required for the approximation algorithms for networks with (a) degree = 6, (b) degree = 8.

section we present various such problems related to probing in general and probe station selection in particular.

A. Probe selection

After the deployment of probe stations, appropriate probes need to be selected such that the required diagnosis capability is obtained. As probes introduce additional network traffic, it is important to minimize the number of probes needed to perform fault diagnosis. One approach to reducing the probe traffic without compromising the diagnosis accuracy and localization time is to adapt the probe set [11]: first send a minimal number of probes to detect the presence of a failure in the network and, once a failure is detected, send more probes in the observed area of interest. Probing decisions can also be used for probe station selection. Thus, an alternative approach to probe station selection is to first identify the required probes and then select as probe stations the nodes that can best provide those probes [13].

B. Non-deterministic environment

In many real-life scenarios, the assumption that deterministic information is available needs to be relaxed while developing probing-based solutions. One such scenario is the availability of only incomplete and inaccurate topology information. The information about the underlying topology, the request-component dependencies, the number of nodes, etc. might not be complete and accurate due to various reasons

such as limited access to the system information, dynamically changing routes, presence of load balancers, etc. In such cases, a non-deterministic model of the network needs to be built, and the selection of probe stations and probes would involve probabilistic reasoning.

C. Addressing other types of failures

The work presented in this paper primarily addresses node failures. We also showed that simple modifications to the proposed approach can address link failures and, similarly, probe station failures. However, we have only targeted fail-stop failures in this work. As part of our future work, we aim to address performance failures, where the probe stations observe a degradation in the end-to-end performance metrics. This involves addressing problems such as forecasting a potential failure or Service Level Objective (SLO) violation, correlating the observed performance degradation with the performance of individual components, root-cause analysis of the observed performance degradation, etc.

VIII. CONCLUSION AND FUTURE WORK

We addressed the problem of probe station selection in order to monitor the network for detecting and localizing node failures. We presented a systematic reduction of the Minimum Probe Station Selection problem to the Minimum Hitting Set problem and presented an algorithm for probe station

selection. We then demonstrated how the proposed approach can be modified to address link failures and probe station failures. We presented an experimental evaluation of the proposed algorithm for various network configurations to demonstrate the effectiveness of the proposed approach. As part of future work, we aim to perform a detailed theoretical analysis of the proposed approach and a more extensive experimental evaluation. In this paper we have addressed fail-stop failures; in future work we aim to address Quality of Service (QoS) failures. The algorithms in this paper ensure the availability of k independent probe paths but do not aim to optimize probe traffic or localization time. An interesting approach to pursue is to first identify suitable probes based on the criteria of minimizing probe traffic or localization time, and then attempt to minimize the number of probe stations. Path attributes such as latency, loss rate, bandwidth, etc. can also be considered in this approach.

REFERENCES

[1] Cooperative Association for Internet Data Analysis (CAIDA). The skitter project. Technical report, http://www.caida.org/tools/measurement/skitter/index.html, 2001.
[2] Y. Bejerano and R. Rastogi. Robust monitoring of link delays and faults in IP networks. In IEEE INFOCOM, San Francisco, CA, Mar. 2003.
[3] Y. Breitbart, C. Y. Chong, M. Garofalakis, R. Rastogi, and A. Silberschatz. Efficiently monitoring bandwidth and latency in IP networks. In IEEE INFOCOM, Tel Aviv, Israel, Mar. 2000.
[4] M. Brodie, I. Rish, and S. Ma. Optimizing probe selection for fault localization. In Distributed Systems Operations Management, pages 1147–1157, 2001.
[5] M. Brodie, I. Rish, S. Ma, G. Grabarnik, and N. Odintsova. Active probing. Technical report, IBM, 2002.
[6] J. D. Horton and A. Lopez-Ortiz. On the number of distributed measurement points for network tomography. In Internet Measurement Conference (IMC), 2003.
[7] S. Jamin, C. Jin, Y. Jin, Y. Raz, Y. Shavitt, and L. Zhang. On the placement of Internet instrumentation. In IEEE INFOCOM, Tel Aviv, Israel, Mar. 2000.
[8] D. Jeswani, N. Korde, D. Patil, M. Natu, and J. Augustine. Probe station selection algorithms for fault management in computer networks. Extended abstract, Student Research Symposium, HiPC 2009, Kochi, India, Dec. 2009.
[9] F. Li and M. Thottan. End-to-end service quality measurement using source-routed probes. In 25th Annual IEEE Conference on Computer Communications (INFOCOM), Barcelona, Spain, Apr. 2006.
[10] A. Medina, A. Lakhina, I. Matta, and J. Byers. BRITE: An approach to universal topology generation. In International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunications Systems (MASCOTS’01), Cincinnati, Ohio, Aug. 2001.
[11] M. Natu and A. S. Sethi. Application of adaptive probing for fault diagnosis in computer networks. In IEEE/IFIP Network Operations and Management Symposium, Salvador, Brazil, Apr. 2008.
[12] M. Natu, A. S. Sethi, and E. Lloyd. Efficient probe selection algorithms for fault diagnosis (invited paper). Telecommunication Systems Journal, 37:109–125, 2008.
[13] H. X. Nguyen and P. Thiran. Active measurement for multiple link failures diagnosis in IP networks. In Passive and Active Measurement Workshop (PAM), pages 185–194, 2004.
[14] A. Reddy, R. Govindan, and D. Estrin. Fault isolation in multicast trees. In ACM SIGCOMM, 2000.
[15] M. Stemm, R. Katz, and S. Seshan. A network measurement architecture for adaptive applications. In IEEE INFOCOM, Mar. 2000.
[16] W. Theilmann and K. Rothermel. Dynamic distance maps of the Internet. In IEEE INFOCOM’2000, Mar. 2000.
