A TRADE-OFF ANALYSIS MODEL: TOWARDS IMPROVING POLLING EFFICIENCY FOR NETWORK MONITORING

Feng Gao, Jairo Gutierrez*
Department of Computer Science, Department of Management Science and Information Systems*
University of Auckland, New Zealand
Email: [email protected], [email protected]

Abstract - One of the criticisms of network monitoring is the performance problems caused by the management traffic generated by the polling technique used to gather management information. This article presents a trade-off analysis model which enables the network management system to differentiate the polling frequencies systematically so as to collect data from the point of view of the entire network. The simulation results show that, for various measurement objectives, the selective trade-off analysis approach is at least more efficient than a random polling scheme, and performs well across a wide range of traffic loads.

Keywords - Trade-off, Network monitoring, Polling

Network management is essential for providing reliable services over networked systems. Network monitoring forms the basis of control and management systems. The collection of measurements to discover network conditions and alert to a real or potential problem is a promising approach for network monitoring. There are various ways to generate measurement-related management information and traffic. Among them, the polling technique using the request-and-response mechanism is the one most widely used.

One of the criticisms of network monitoring is the performance deterioration caused by all traffic being directed to and from the locations where management applications reside. In addition, the volume of data that is collected directly impacts the performance of the managed systems. Because polling has the serious weakness of generating excessive management traffic, asynchronous traps, generated only when alarm conditions occur, are also deployed alongside it. Nevertheless, reducing the amount of polling traffic contributes more to management efficiency, and that reduction is the main research motivation for this paper.

This study specifically addresses the selection of monitoring intervals. Although a smaller polling interval gives more detailed information, it comes at a cost: the smaller the polling interval, the larger the management traffic caused by traffic information reporting. Conversely, if the interval between two consecutive polling requests is too long, it is hard to understand the state of the network in real time. This paper proposes a new scheme for a dynamic polling strategy that considers, from a global perspective, the issue of minimizing the monitoring cost systematically. The scheme examines and differentiates polling frequencies individually rather than allocating them randomly. It adjusts polling intervals to make the frequency assignments more consistent globally, reducing management resource usage and management impact. In the following sections, without loss of generality, the concepts of management traffic, management cost, and management overhead are interchangeable.

The next section briefly reviews related work and highlights its drawbacks. Section II introduces the research motivation and philosophy. The proposed model is presented in section III. Section IV demonstrates the feasibility of the approach through simulation. Some issues are raised in section V. Finally, in section VI, a summary concludes this paper.

I. RELATED WORK

Much effort on network monitoring has been devoted to providing a unified monitoring framework, including management coverage and the allocation of management effort. (Jiao, 1999) defined the monitoring problem and provided a formal description: the monitoring problem is to construct correct, efficient monitoring algorithms for any given set of variables, integrity constraints and alarm conditions. As part of the solution, a common approach to reducing monitoring cost is to vary the polling frequencies based on the state and characteristics of the variables being monitored. After reviewing the existing polling schemes, (Yoshihara, 1999) proposed a dynamic polling scheme based on the time variation of network management information values, which considers both the overhead of polling traffic and the message rates required for management tasks. (Lin, 2002) described a scalable monitoring approach to reduce the amount of information exchanged. These approaches, however, perform no systematic differentiation between the polling frequencies used, which limits the overall reduction in management overhead.

On the other hand, to construct an efficient monitoring algorithm, several proposals have been presented that balance management achievement against management effort. For both link-bandwidth and path-latency measurements, (Breitbart, 2001) demonstrates various algorithms for collecting link-bandwidth utilization information while minimizing the required number of SNMP probes. Jiao (Jiao, 2000), in the paper titled “Toward efficient Monitoring”, introduces the concepts of REQUIRED and NON-REQUIRED.
The REQUIRED set includes those variables that must be measured to ensure the detection of potential alarm conditions, and the rest are non-required. To realize this classification, Jiao adapts probabilistic estimations to reduce the average case cost. Based on the literature review, and to the authors’ knowledge, the trade-off analysis model presented in this paper is the first attempt to consider the monitoring frequency variation as a key component of systematic management achievement.

II. RESEARCH PHILOSOPHY

This study tries to shift the focus of network monitoring from the random selection of individual polling frequencies to an efficient way of differentiating polling resources, so as to satisfy multiple-criteria measurement objectives. The motivation is that an efficient and accurate application of management cannot be achieved without considering the exact effect of each management element on the overall performance of the system. This study widens the interest of previous research by considering the interdependence between the different monitoring components and metrics, the relative importance of each of the performance and management factors, and how their impact should be determined.

The authors believe that critical network components and services contribute more towards overall performance and should be managed at higher management priority levels than less critical elements. For instance, a centrally located node has a greater effect on network performance than a remotely placed node. It is therefore essential to determine the order of importance of each factor and assign a suitable priority. Performance or management factors which have a high priority and a greater impact on overall system quality need more careful measurement. Factors which have a significant impact on the monitoring (polling) efficiency include critical network entities and performance metrics.

Generally, the analysis has to balance effect and effort to achieve higher network health and a lower network load. As a consequence, such a trade-off unavoidably introduces inaccuracy in the network state information available to the network management system (NMS). Thus, the matter of interest now is how to balance the trade-offs. In other words, how can the trade-offs be optimized so as to maintain the accuracy and efficiency of the system? The polling problem being considered is formally defined as how to select the monitoring intervals with minimum management cost and maximum management performance. The aim is to minimize the management cost while still detecting network health and alert conditions as soon as possible.

This effort transforms the monitoring frequency problem into a multi-objective optimization problem (MOP). For such multiple optimization problems, and following the analysis and comparisons found in (Zitzler, 1999), this study adopts a decision-making-before-search method. That is, the objectives of the MOP are aggregated into a single objective which implicitly includes preference information given by the decision maker (DM). From a mathematical point of view, this study considers a networked system formed by N interconnected network elements and services, each with multiple management objectives. For the sake of simplicity, the terms ‘element’ and ‘service’ are interchangeable within the following text. The following notation is used: for element i = 1, 2, ..., N, let y_i be its performance vector, with dimension n, and let f_i be the management effort vector of element i. Thus, the multiple objective monitoring problem corresponding to the whole system can be stated as follows:

\[
\text{Minimize } \sum_{i=1}^{N} f_i \, w_i
\quad \text{subject to} \quad \sum_{i=1}^{N} y_i \, p_i \le C,
\qquad \text{where } 0 \le y_i \le b .
\]

The values associated with f and y are called weights, and are denoted by w and p respectively. The capacity (overall performance) of the whole networked system is denoted by C. The problem is further generalized by assuming that for each j (j = 1, 2, ..., n), b items of y are available. The problem defined above is an NP-complete combinatorial optimization problem, and the search for the optimum can be reformulated as a 0-1 integer mathematical programming problem (Abdu, 2002). Thus, the above formulation is reformulated as a multiple bounded knapsack problem due to its similar structure:

\[
\text{Minimize } \sum_{i=1}^{N} f_i \, w_i
\quad \text{subject to} \quad \sum_{i=1}^{N} a_i \, f_i \, p_i \le C,
\qquad \text{where } 0 \le y_i \le b ,
\]

while a stands for the corresponding coefficient matrix, that is, the relationship between the input management effort f and the output performance indicator y. To keep the relevant qualitative and quantitative weights accurate, the AHP (Analytic Hierarchy Process) is introduced to determine the cumulative action imposed on the system, embedding both qualitative and quantitative objectives and measurements.

The AHP, developed by Thomas Saaty, allows decision makers to model a complex problem in a hierarchical structure showing the relationships of the goal, objectives (criteria), sub-objectives, and alternatives. Uncertainties and other influencing factors can also be included (Saaty, 1994). The result of the AHP application is a process of preference and allocations. The selection of optimization algorithms is based on branch and bound techniques (Martello, 1990). The main reason for making this choice was not only to accelerate the search for the optimal solution, but also to enable the reuse of the existing solution during on-line computation.
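As a rough illustration of branch and bound applied to a bounded knapsack problem, the sketch below solves the textbook maximization form (maximize total value subject to a capacity limit, with an upper bound on the copies of each item). It is not the authors' implementation: the function name, the item data, and the mapping of the paper's efforts, weights and capacity C onto `values`, `weights`, `bounds` and `capacity` are all assumptions made for this example, and positive integer weights are assumed.

```python
def bounded_knapsack_bb(values, weights, bounds, capacity):
    """Maximize sum(values[j] * x[j]) subject to
    sum(weights[j] * x[j]) <= capacity and 0 <= x[j] <= bounds[j] (integers).
    Assumes positive integer weights. Hypothetical helper, for illustration only."""
    n = len(values)
    # Visit items by decreasing value density so the fractional bound is valid.
    order = sorted(range(n), key=lambda j: values[j] / weights[j], reverse=True)
    best = {"value": 0, "x": [0] * n}

    def upper_bound(k, value, room):
        # Fractional (LP) relaxation over the undecided items order[k:].
        for j in order[k:]:
            take = min(bounds[j], room / weights[j])
            value += take * values[j]
            room -= take * weights[j]
            if room <= 0:
                break
        return value

    def search(k, value, room, x):
        if value > best["value"]:
            best["value"], best["x"] = value, x[:]
        # Prune when no completion of this branch can beat the incumbent.
        if k == n or upper_bound(k, value, room) <= best["value"]:
            return
        j = order[k]
        max_take = min(bounds[j], int(room // weights[j]))
        for take in range(max_take, -1, -1):
            x[j] = take
            search(k + 1, value + take * values[j], room - take * weights[j], x)
        x[j] = 0

    search(0, 0, capacity, [0] * n)
    return best["value"], best["x"]


# Hypothetical data: three entity types, each with a bounded number of polls.
values = [10, 7, 4]
weights = [4, 3, 2]
bounds = [2, 2, 3]
best_value, allocation = bounded_knapsack_bb(values, weights, bounds, capacity=10)
```

The incumbent solution kept in `best` is what makes pruning possible; in an on-line setting, a previously found allocation could plausibly be supplied as the starting incumbent, which is one way to read the reuse of existing solutions mentioned above.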

III. TRADE-OFF ANALYSIS MODEL FOR POLLING FREQUENCY

The trade-off analysis model contains the following steps:

o break down the monitoring problem into a hierarchy; According to the AHP, the overall goal is placed at the top, with the main objectives and sub-objectives on the levels below; the bottom level is where the targeted managed entities reside.

o assign relevant weights to the objectives and sub-objectives; The relative importance of each of those objective and sub-objective factors, and their effect on the selection of relevant polling frequencies, are determined by performing pairwise comparisons between them.

o decide the weights of the lowest entities by using the knowledge obtained from baselining and benchmarking; After investigating and observing the network behaviour and performance, the lowest level entities are compared against standards for each criterion rather than against each other. Each entity's weight is rated by observing how it performs against each criterion. The total score is its performance rating. The higher the score, the more important the role this entity plays in the monitoring process, and the more frequently it needs to be polled.

o assign the polling frequency with respect to the relevant weights; The totals are ratio-scale measures of the share of polling each element is expected to receive. The totals are normalized so that the maximum polling frequency is 1.0.

o synchronously evaluate and adjust the weights; After assigning the weights and polling frequencies, intermediate results are examined, and the AHP hierarchy and/or optimization formulation are modified (if needed).

The final step consists of finding an efficient allocation of polling frequencies using the branch and bound algorithm.

IV. SIMULATION EVALUATION

In order to evaluate the effectiveness of the proposed model for monitoring an IP network, an experimental study was conducted on the simulated network topology shown in Fig. 1, using the network simulator ns-2 (ns-2, 2001).

Fig. 1 Simulation Network Topology

The parameters, such as topology, network workloads and traffic patterns, and the management distribution model, were selected based on several studies and relevant research (Thompson, 1997; Rubinstein, 1999; Stathis, 2000; Dill, 2001; Pattinson, 2001).

Fig. 2 Hierarchy view

The procedure for assigning polling frequencies is conducted using the proposed trade-off analysis model. Since the lowest levels of the AHP-oriented structure are the same, only one group of the lowest metrics is illustrated in Fig. 2. The relevant polling frequency of the network devices and services is summed up and weighted.
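The weighting steps of the trade-off analysis model (pairwise comparison of objectives, ratings of the lowest entities against standards, and normalization of the polling frequencies) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: the criteria names, the pairwise judgements, the intensity scale and the entity names are all hypothetical, and the priority weights are approximated with the row geometric mean method rather than an exact eigenvector computation.

```python
import math


def priorities(pairwise):
    """Approximate AHP priority weights from a reciprocal pairwise comparison
    matrix using the row geometric mean method (an approximation of the
    principal eigenvector)."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Pairwise comparison of three hypothetical criteria on a 1-9 scale:
# entry [r][c] says how much more important criterion r is than criterion c.
criteria = ["centrality", "traffic load", "service criticality"]
criteria_weights = priorities([
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
])

# Ratings model: each entity is rated against intensity standards,
# not against the other entities. The scale values are assumptions.
intensity = {"high": 1.0, "medium": 0.5, "low": 0.2}
ratings = {
    "core-router": ["high", "high", "medium"],
    "edge-switch": ["medium", "low", "medium"],
    "remote-host": ["low", "low", "low"],
}
scores = {
    name: sum(w * intensity[r] for w, r in zip(criteria_weights, rated))
    for name, rated in ratings.items()
}

# Normalize so that the most important entity has polling frequency 1.0.
top_score = max(scores.values())
frequencies = {name: s / top_score for name, s in scores.items()}
```

Under these assumptions, a polling interval for each entity could be derived by dividing a chosen minimum interval by its entry in `frequencies`, so the highest-rated entity is polled at the minimum interval and the others proportionally less often.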

Due to a limitation of the AHP, it is not accurate to compare more than nine performance alternatives in pairs (Saaty, 1994). To convert the absolute performance measurement into relative performance impact, a ratings decision model is adopted so that alternatives are measured against intensities. Ratings decision models are referred to as absolute measurement models (ExpertChoice, 2000). To establish the priorities, each value is measured against the standards (intensities) that have been built rather than against the other values in pairs.

Two kinds of experimental studies are examined. The first scenario uses the random polling scheme, that is, the individual polling frequencies are assigned randomly. The frequency extends when response time increases. In contrast, the second one assigns the frequencies using the proposed trade-off scheme. For the sake of simplicity, the procedure of comparing and assigning weights is post-processed off-line using the trace files produced by the simulation runs. That is, step 5 of the trade-off scheme (synchronization) is not implemented in real time in the simulation.

Within the simulation, monitoring efficiency is evaluated by the total monitoring cost and the polling response time. To decide whether the scheme generates an appropriate amount of management traffic, the total monitoring cost is examined by aggregating the polling packets over each period of the whole simulation. As expected, figure 3 shows that applying the trade-off scheme (in blue) causes a significant reduction in the polling packets and therefore in the management traffic.

Fig. 3 Comparison of polling packets

To check the polling response time efficiency, that is, how efficiently and accurately the trade-off polling scheme reacts to failures in the system, the normal polling response time and the polling response time towards failure are collected and compared. The failed nodes and links are designated at random. Due to a limitation of ns-2, the processing time on the nodes is not included in the response time.

Fig. 4 Polling response time

The fact that the middle components within the network normally have higher weights assigned, while the remote ones have lower polling frequencies, is reflected in the total polling response time in figure 4, which shows that the scenario with the trade-off analysis scheme implemented (in blue) has a better (lower) response time to the failures.

V. DISCUSSION

The off-line simulation of the trade-off analysis scheme raises many issues. The simulation scenarios and experiments are small scale, with the sole aim of providing a corroborative demonstration of the proposed trade-off analysis model. The simulation uses an ISP-like topology with several key network components, such as routers and switches, a couple of types of traffic, a combined workload from those traffic types, and simplified user behaviours. Because the scenarios assume a pre-determined topology, the simulation is not applicable to the existing Internet, where resources span multiple domains and the activities involved are more dynamic. Also, topology discovery efforts and the impact of topology on network behaviour have been ignored. Furthermore, more parameters may be added, and different ranking grades may be adopted if more accurate evaluation parameters can be found. In summary, further studies are required in order to fine-tune the proposed trade-off analysis to take realistic network environments into consideration.

Although the AHP cannot provide instantaneous measures or burst identification, the affordable time and space costs make the AHP study of large networks efficient and appealing (Fahmy, 2001). Although the simulation in this paper has focused on a single point-of-control measurement architecture, the trade-off analysis algorithm is also applicable to a distributed monitoring setting by constructing a higher level set of measurement objectives. In addition, the central management model employed could raise a scalability issue: how to deal with manager-to-manager communication within large scale internet environments. Besides, information aggregation approaches could be integrated to reduce traffic loads.

VI. CONCLUSIONS

In this paper, the issue of polling frequencies in the network monitoring problem is addressed. A trade-off analysis model is proposed to provide efficient monitoring in IP networks. Unlike earlier approaches, this study aggregates multiple measurement objectives and perspectives to differentiate the polling frequencies to be used for the different managed network components. Simulation results validated the monitoring strategy presented. The proposed method uses a model that allows the efficiency of polling to be quantified by using weights, and develops new heuristics for efficient polling. How to effectively synchronize the weights in a real-time environment is still an open issue for further study.

REFERENCES

[1] Abdu, H., Lutfiyya, H., Bauer, M., (2002): "A Model for Efficient Configuration of Management Agents in Distributed Systems", retrieved April, 2002, from http://www.csd.uwo.ca/Research_Group_2/pub.htm.
[2] Breitbart, Y., Chan C., Garofalakis, M., Rastogi, R., Silberschatz, A. (2001): "Efficiently Monitoring Bandwidth and Latency in IP Networks", Proceedings of INFOCOM 2001, 933-942.
[3] Dill, S., Kumar, R., McCurley, K., Rajagopalan, S., Sivakumar, D., Tomkins, A. (2001): "Self-similarity in the Web", Proceedings of 27th Very Large Databases Conference, 69-78.
[4] ExpertChoice, (2000): "Art of Modeling", retrieved May, 2002, from http://msistweb.va.gwu.edu/ExpertChoice/Art of Modeling/Art of Modeling.pdf.
[5] Fahmy, H. M. A., (2001): "Reliability Evaluation in Distributed Computing Environments Using the AHP", Computer Networks, V. 36: 597-615.
[6] Jiao, J., Naqvi, S., Raz, D., Sugla, B. (1999): "Minimizing the Monitoring Cost in Network Management", Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management, Boston, USA.
[7] Jiao, J., Naqvi, S., Raz, D., Sugla, B., (2000): "Toward efficient Monitoring", IEEE Journal on Selected Areas in Communications (JSAC), special issue on recent advances in network management and operations, V. 18(5): 723-732.
[8] Lin, Y., Chan, M. C., (2002): "A scalable Monitoring Approach Based on Aggregation and Refinement", IEEE Journal on Selected Areas in Communications (JSAC), V. 20(4): 677-690.
[9] Martello, S., Toth, P., (1990): "Knapsack Problems: Algorithms and Computer Implementations", John Wiley & Sons.
[10] ns-2, (2001): "the Network Simulator", retrieved Sep., 2001, from www.isi.edu/nsnam/ns/.
[11] Pattinson, C. (2001): "A study of the Behaviour of the Simple Network Management Protocol", Proceedings of DSOM'2001.
[12] Rubinstein, M. G., Carlos, O., Duarte, M.B., (1999): "Evaluating Tradeoffs of Mobile Agents in Network Management", Networking and Information Systems, V. 2(2): 237-252.
[13] Saaty, T. L., (1994): "The Analytic Hierarchy Process", University of Pittsburgh.
[14] Stathis, C., Maglaris, B., (2000): "Modelling the Selfsimilar Behaviour of Network Traffic", Computer Communications, V. 34: 37-47.
[15] Thompson, K., Miller, G.J., Wilder, R., (1997): "Wide-Area Internet Traffic Patterns and Characteristics", IEEE/ACM Trans. on Networking, V.: 10-23.
[16] Yoshihara, K., Sugiyama, K., Horiuchi, H., Obana, S. (1999): "Dynamic Polling Scheme Based on Time Variation of Network Management Information Values", Proceedings of Integrated Network Management Symposium (IM'99), 141-154.
[17] Zitzler, E., (1999): "Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications." Ph.D. thesis, Department of Information Technology and Electrical Engineering, Swiss Federal Institute of Technology (ETH), Switzerland.
