
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Author's personal copy Journal of Network and Computer Applications 35 (2012) 1781–1792

Contents lists available at SciVerse ScienceDirect

Journal of Network and Computer Applications journal homepage: www.elsevier.com/locate/jnca

Inference of network anomaly propagation using spatio-temporal correlation

Alexandre Aguiar Amaral a, Bruno Bogaz Zarpelão a, Leonardo de Souza Mendes a, Joel José Puga Coelho Rodrigues b,*, Mario Lemes Proença Junior c

a Department of Communications (DECOM), School of Electrical and Computer Engineering, University of Campinas (UNICAMP), Campinas, Brazil
b Instituto de Telecomunicações, University of Beira Interior, Rua Marques D’Avila e Bolama, 6201-001 Covilhã, Portugal
c Computer Science Department, State University of Londrina (UEL), Londrina, Brazil

Article info

Article history: Received 6 December 2011; received in revised form 11 June 2012; accepted 3 July 2012; available online 16 July 2012

Keywords: Alarms; Anomaly propagation; Noisy alarm; DSNS

Abstract

Many solutions have been proposed for network alarm correlation. However, they have mainly focused on alarm reduction and root cause analysis. This paper presents an automated alarm correlation system composed of three layers, which takes raw alarms and presents to the network administrator a wide view of the scenario affected by a volume anomaly. In the preprocessing layer, alarms are compressed using their spatial and temporal attributes and reduced into a single alarm named Device Level Alarm (DLA). The correlation layer infers the anomaly propagation path, along with its origin and destination, using DLAs and network topology information. The presentation layer provides a visualization of the path and of the network elements affected by the anomaly propagation. Moreover, we present the Anomaly Propagation View (APV), a graphical tool developed to provide a wide visualization of the network status. To evaluate the effectiveness of the proposed solution, real traffic data from the State University of Londrina was used. © 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Anomalies such as failures and malicious activity are common in network environments (Mohamed and Basir, 2010). Such events may spread quickly due to intrinsic dependencies among network devices (Mohamed and Basir, 2010; Katzela and Schwartz, 1995). Therefore, efficient anomaly detection mechanisms are necessary to guarantee that the network continues operating efficiently (Qin et al., 2011; Hoang et al., 2009). Traditionally, anomaly detection systems generate alarms to notify network administrators of detected anomalies (Thottan and Ji, 2003; Liao et al., 2007). These notifications are fundamental to warn the administrator that there is an abnormality in the network and that a countermeasure must be applied. However, an excessive number of alarms may be triggered when many devices are affected by anomaly propagation. In middle-sized and large networks, manual inspection of these alarms becomes unmanageable for a human administrator (Perdisci et al., 2006). Alarm correlation mechanisms are necessary to automate the process of treating huge volumes of alarms. Alarm correlation is a key feature of Network Management Systems (NMS) (Bellec and

* Corresponding author. Tel.: +351 275319891. E-mail addresses: [email protected] (A.A. Amaral), [email protected] (B.B. Zarpelão), [email protected] (L.d.S. Mendes), [email protected] (J.J.P.C. Rodrigues), [email protected] (M.L. Proença Junior). 1084-8045/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jnca.2012.07.003

Kechadi, 2006). It is a process that combines the diverse alarms generated for network elements in abnormal situations, providing a high-level description of the detected problem. The alarm correlation process should offer understandable and intuitive information to network administrators, including the anomaly source and the network elements affected by anomaly propagation.

In real network scenarios, correlating a vast number of alarms and locating the problem source is a complex task. When there are false, lost or spurious alarms, this process may become even more complicated (Tang et al., 2008). These undesirable alarms may lead to false localization hypotheses, significantly hindering the correct tracing of the anomaly propagation path.

Although many approaches have been proposed for alarm correlation in recent years, they usually focus on two main points: (1) alarm reduction, which includes different techniques such as compression, filtering and clustering (Zhou et al., 2009; Monacelli and Reali, 2011; Perdisci et al., 2006; Chyssler et al., 2004), and (2) identification of the root cause of network failures. To address these issues, a wide diversity of solutions based on different knowledge areas has been proposed, e.g., neural and Bayesian networks, artificial intelligence, statistical methods and graph theory (Katzela and Schwartz, 1995; Bellec and Kechadi, 2006; Al-Kasassbeh and Adda, 2009; Steinder and Sethi, 2004; Varga and Moldovan, 2007; Li and Li, 2010; Tang et al., 2008). Unlike these solutions, our approach aims to trace the path traversed by the anomaly, locating the likely source and destination. The anomaly propagation path tracing provides an intuitive


way to locate problematic points in a network. It enables the network administrator to apply the correct solution to minimize the damage.

This paper proposes an alarm correlation system divided into three layers: preprocessing layer, correlation layer and presentation layer. The preprocessing layer compresses alarms using their spatial and temporal attributes, reducing the raw alarms into a single alarm named Device Level Alarm (DLA). The correlation layer infers the anomaly propagation path, along with its origin and destination, by combining DLAs and network topology information, which is represented by a network dependency graph (NDG) (Mohamed and Basir, 2010). The presentation layer presents a wide view of the network scenarios affected by the anomaly, providing a visualization of the path and of the network elements affected by the anomaly propagation through the network. We also present a measure based on the nonextensive entropy concept. This measure is applied to mitigate the uncertainty generated by noisy alarms in anomaly propagation path identification. We use the term noisy alarm to refer to false, lost and spurious alarms.

The remainder of this paper is organized as follows. In Section 2, we discuss related work. Section 3 presents the alarm system used in this work and the alarm classification. The alarm correlation system and its layers are presented in Section 4. Then, we show the evaluation and results in Section 5. Finally, Section 6 presents some final considerations.

2. Related work

Many approaches have been proposed in the field of network alarm correlation over the last three decades. One of the pioneering works on alarm correlation to find the anomaly root cause was proposed by Katzela and Schwartz (1995). In this work, a heuristic algorithm was proposed. The network elements and their dependencies are modeled by a dependency graph. A weight is given to the edges of the NDG. The weights on edges represent the probability that an incident on a particular network device propagates to its neighbors. Similarly, a conditional probability is associated with each node, representing the probability that the network element fails. The results demonstrated that the solution is able to find the anomaly root cause, but noisy alarms were not considered.

Tang et al. (2008) developed a technique named Active Integrated Fault Reasoning, or AIR, which uses active and passive measurements in order to identify the network fault. The approach proposed in Bellec and Kechadi (2006) uses clustering to reduce the volume of alarms related to network faults.

More recently, Mohamed and Basir (2010) proposed a distributed alarm correlation system that attempts to find the fault root cause. In this approach, each intelligent agent is dedicated to a managed network domain. The agent is responsible for collecting and correlating the alarms generated in its domain. This solution is based on Dempster–Shafer’s evidential reasoning. Each agent considers each received alarm as a piece of evidence and reports its alarms to a higher-level manager named agent manager (AM), which correlates the received alarms and identifies the problem root cause.

Chyssler et al. (2004) propose a solution to filter and aggregate alarms gathered from various Intrusion Detection Systems (IDSs). Alarm reduction is supported by naive Bayesian, k-nearest neighbors and neural network classification algorithms. With the same goal, Perdisci et al. (2006) propose an alarm clustering algorithm aiming to reduce the large number of alarms triggered by various IDSs. The algorithm produces meta-alarms, which offer a high-level description of the attacks against the network. The meta-alarms are produced from the correlated alarms generated by different IDSs.

A system using a neural network and a clustering algorithm was developed by Tjhai et al. (2010). The main objective of the system is to identify redundant and low-priority alarms, deciding whether they are true or false. The proposed solution is performed in two steps. The first step uses a Self Organizing Map (SOM) to classify and aggregate alarms that have the same timestamp and the same source and destination IP addresses, because this indicates that the alarms are related to the same anomalous activity. The first step results in the grouping of redundant alarms. In the second step, the K-means algorithm is applied to label alarms belonging to the clusters produced in the first step as true or false alarms. The authors emphasize that this step is important because the number of false alarms is minimized before they are presented to the network administrator.

The network alarm correlation literature is mainly focused on alarm reduction and root cause analysis. In contrast, our work uses alarm correlation to trace anomaly propagation while considering the existence of noisy alarms in the alarm dataset.

3. Alarm system architecture

We adopted an alarm system proposed in previous work (Zarpelão et al., 2009; Proença et al., 2005). Figure 1 presents the alarm system architecture. In the alarm system, real-time data collected from a given SNMP object is analyzed in order to detect unexpected changes in network behavior. As a first step, the alarm system characterizes the traffic by using the Baseline for Automatic Backbone Management (BLGBA) model proposed by Proença et al. (2005). The BLGBA model generates a profile of normal behavior named Digital Signature of Network Segment (DSNS), or baseline. The DSNS is the set of basic information that describes the traffic profile on a network segment through minimum and maximum thresholds on the volume of traffic, quantity of errors, and types of protocols and services that flow through this segment throughout the day. More information about BLGBA and DSNS is available in Zarpelão et al. (2009) and Proença et al. (2005). After characterizing the traffic, object level analysis is performed by comparing the DSNS to real-time data. When the alarm system identifies an unexpected deviation from normal behavior, it generates alarms to notify the network administrator. Each raised alarm contains various attributes. For the purpose of this

Fig. 1. Alarm system architecture.


Fig. 2. SNMP objects monitoring device input and output flows.

Fig. 4. Alarm correlation system layers.

Fig. 3. Classification of the SNMP objects used in our work.

work, we use the following alarm attributes: SNMP object name, device IP address, device port number and timestamp. The alarm system monitors a set of SNMP objects available in MIB-II, collecting different information about the monitored equipment. Among these monitored objects, we use the following ones in this work: (i) ifInOctets and ifOutOctets, which contain, respectively, the number of octets received and sent by each device interface; (ii) ipInReceives and ipOutRequests, which contain, respectively, the number of IP datagrams received and sent; and (iii) tcpInSegs and tcpOutSegs, which contain, respectively, the total number of TCP segments received and sent. By monitoring these objects, the alarm system is able to check the status of input and output device flows in three layers of the TCP/IP protocol stack: interface, network and transport. Figure 2 presents how the objects are related to these layers.

3.1. Alarm classification

In order to trace the path traversed by an anomaly, we divided the alarms triggered for different SNMP objects into two groups. Figure 3 shows the SNMP object classification. Group 1 (G1) contains the objects related to alarms classified as input alarms. Alarms for these objects indicate that the anomalous traffic ingressed into the network device. On the other hand, group 2 (G2) contains the objects related to alarms classified as output alarms. They indicate that the anomalous traffic has left the network device.
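The grouping described above can be sketched as a small lookup. This is only an illustration: it assumes that the three "In" objects form G1 and the three "Out" objects form G2, as the descriptions of input and output alarms suggest, and the name `classify_alarm` is hypothetical rather than part of the original alarm system.

```python
# Hypothetical helper: map each monitored MIB-II object to its alarm group.
# Assumption: "In" objects are input alarms (G1), "Out" objects are output alarms (G2).
G1_OBJECTS = {"ifInOctets", "ipInReceives", "tcpInSegs"}
G2_OBJECTS = {"ifOutOctets", "ipOutRequests", "tcpOutSegs"}


def classify_alarm(snmp_object: str) -> str:
    """Return 'G1' for input alarms and 'G2' for output alarms."""
    if snmp_object in G1_OBJECTS:
        return "G1"
    if snmp_object in G2_OBJECTS:
        return "G2"
    raise ValueError(f"unmonitored SNMP object: {snmp_object}")
```

For instance, an alarm raised for `ifInOctets` would be treated as an input alarm, while one raised for `tcpOutSegs` would be treated as an output alarm.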

4. Alarm correlation system

This section presents the alarm correlation system proposed in this work. Figure 4 presents the layers of the alarm correlation system. It is divided into three layers: preprocessing layer, correlation

Fig. 5. DLA representation.

layer and presentation layer. The following sections present each of the layers in more detail.

4.1. Preprocessing layer

This layer is responsible for compressing alarms using their spatial and temporal attributes. The first obstacle to alarm correlation is the high rate of alarms generated when an anomaly occurs. This volume of alarms is a consequence of two main factors. First, the effects of an anomaly are observed for as long as no countermeasure is taken to resolve it. Anomalies may last a long time until a solution is found, which makes the alarm system trigger several alarms for the same problem. These alarms are redundant data, because they do not bring new information to the correlation process. This means that an alarm correlation system must be able to filter relevant alarms. The second factor refers to anomaly propagation through the network. A single anomaly may change the behavior of every network device in its propagation path, resulting in a huge number of alarms.

In order to reduce the volume of alarms, the preprocessing layer explores two attributes of each alarm: where and when. The attribute where contains the location of the network device for which the alarm was triggered. Traditionally, this information is given by IP address and port number. The attribute when refers to the time at which the problem was detected in the network device. Thus, the key to our solution is the notion of similarity between alarms in terms of spatial and temporal information. Based on the attributes where and when, alarms triggered by the same device, named the alarmed device, for a particular anomalous event are compressed, being reduced to a single alarm named Device Level Alarm (DLA). Figure 5 shows a generic representation of DLA. The


first line of DLA specifies the classification of raw alarms: Group 1, Group 2 or both. The meaning of the other DLA lines depends on the first line:

• First line = @G1: raw alarms belong to Group 1. Therefore, anomalous traffic ingressed into the alarmed device, coming from an adjacent device. Device1 in the second line of DLA represents the adjacent device and Device2 represents the alarmed device.
• First line = @G2: raw alarms belong to Group 2. Therefore, anomalous traffic left the alarmed device, going to an adjacent device. Device1 in the second line of DLA represents the alarmed device and Device2 represents the adjacent device.
• First line = @G1 G2 or First line = @G2 G1: raw alarms belong to Group 1 and Group 2. Therefore, in a time interval Δt, the anomalous traffic ingressed into the alarmed device and left it. If first line = @G1 G2, Device1 in the second line of DLA represents the alarmed device and Device2 represents the adjacent device. If first line = @G2 G1, Device1 in the second line of DLA represents the adjacent device and Device2 represents the alarmed device.
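The compression idea can be sketched as grouping raw alarms by where (device IP and port) and when (a time bucket), then recording which groups were observed. This is a simplified illustration and not the authors' implementation: the fixed-size bucket standing in for Δt, the `window_seconds` parameter, the tuple layout of a raw alarm, and the output dictionary are all assumptions, and the ordering that distinguishes @G1 G2 from @G2 G1 is not modeled.

```python
from collections import defaultdict

# Assumption: "In" objects are input alarms (G1); everything else is treated as G2.
G1_OBJECTS = {"ifInOctets", "ipInReceives", "tcpInSegs"}


def compress_to_dlas(raw_alarms, window_seconds=300):
    """Reduce raw alarms to one record per (device, port, time bucket).

    raw_alarms: iterable of (snmp_object, device_ip, port, timestamp_seconds).
    """
    buckets = defaultdict(set)
    for snmp_object, device_ip, port, timestamp in raw_alarms:
        group = "G1" if snmp_object in G1_OBJECTS else "G2"
        # "where" = (device_ip, port); "when" = index of the time bucket.
        key = (device_ip, port, int(timestamp // window_seconds))
        buckets[key].add(group)

    dlas = []
    for (device_ip, port, window), groups in sorted(buckets.items()):
        dlas.append({
            "device": device_ip,
            "port": port,
            "window": window,
            "groups": sorted(groups),  # ["G1"], ["G2"] or ["G1", "G2"]
        })
    return dlas
```

With this sketch, repeated alarms for the same port inside one bucket collapse into a single record, which mirrors how redundant alarms stop adding information to the correlation process.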

Consider the hypothetical network scenario affected by anomaly propagation presented in Fig. 6. The values (i), (ii) and (iii) shown in Fig. 6 have the following meanings: (i) Alarms belonging to group 2 (output alarms) were generated for the firewall port that connects the firewall to the network core switch. This means that the anomalous traffic left the firewall and went to the switch. (ii) Alarms belonging to groups 1 (input alarms) and 2 (output alarms) were triggered for the switch port that connects the switch to the proxy server. This means that, in a time interval Δt, the anomalous traffic propagated in both directions between the switch and the proxy server. (iii) Alarms belonging to group 1 (input alarms) were triggered for a web server port. This means that anomalous traffic left the switch and went to the web server. The corresponding DLAs of this example are presented in Fig. 7.

Fig. 6. Network affected by anomalous event.

Fig. 7. DLA examples.

4.2. Correlation layer

This layer is responsible for correlating the DLAs generated by the preprocessing layer. By combining the DLAs and information about the network topology, the correlation layer aims to produce the most suitable explanation for the incident on the network. This section presents the following three points concerning the correlation layer:

• How the network topology information is modeled as a network dependency graph.
• A measure based on entropy to minimize the uncertainty about the anomalous traffic source and destination generated by noisy alarms.
• The proposed algorithm to identify the anomalous propagation path, combining DLAs, the network dependency graph and the entropy-based measure.

4.2.1. Network model

Telecommunication networks consist of elements that are dependent on each other (Katzela and Schwartz, 1995). An anomalous event occurring in a particular device $x_k$ can directly affect the devices $x_{k+1}, x_{k+2}, \ldots, x_n$, given their dependency relationships. We use a network dependency graph (NDG) to represent the dependencies between network devices, as it is a straightforward model for determining the source of an observed problem (Mohamed and Basir, 2010). We model the network as a directed graph $G = (v, e)$. The finite and non-empty set of vertices $v$ represents the network devices, while the set of edges $e$ represents the communication links between them. Each edge in the graph is a link between two nodes, represented by an ordered pair $(v_i, v_j)$, where $v_i$ and $v_j$ are distinct elements of $v$, i.e., $v_i \neq v_j$. A probability is assigned to each edge $(v_i, v_j)$. This weight indicates the probability that an anomaly observed in $v_i$ propagates to the dependent devices. The neighborhood of $v_i$ is denoted by $N(v_i)$. In practice, we consider that the probabilities are known a priori, since they can be estimated from previous anomalous events experienced by the network. Let $X = \{x_0, x_1, \ldots, x_n\}$, $X \subseteq v$, denote the devices involved in an anomalous event. The total number of devices affected by the anomaly is given by $n_x = \#X$. The set $l = \{l_1, l_2, \ldots, l_n\}$ is defined as a subset of $e$, where $l_k$, for $k = 1, \ldots, n$, represents a link that connects elements of $X$. In the anomaly propagation scenario, $l_k$ represents the link carrying anomalous traffic from $x_i$ to $x_j$.

4.2.2. Calculating the probability of the anomaly propagation in the links

The probability of anomalous traffic propagating through the links is estimated using historical anomalous events that occurred in the network. First, the raw alarms triggered for the device(s) affected by the anomaly are divided into $a_1, a_2, \ldots, a_n$ belonging to group 1 (G1) and $a_1, a_2, \ldots, a_m$ belonging to group 2 (G2). Subsequently, based on knowledge of the network dependencies, the probabilities can be calculated as follows. Consider a device $r$ connected to a device $w$ by a link. We estimate that the probability of the


anomalous traffic propagating through the link, denoted by $p_{r \to w}$, is given by (1), which can be simplified to (2).

$$p_{r \to w} = \frac{a_1, a_2, \ldots, a_m \in G_2}{a_1, a_2, \ldots, a_n \in G_1 + a_1, a_2, \ldots, a_m \in G_2} \tag{1}$$

$$p_{r \to w} = \frac{\#G_2}{\#G_1 + \#G_2} \tag{2}$$
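As a small worked illustration of (2), the estimate is the share of output alarms among all historical alarms counted for the link. The function name `link_probability` and the error raised when no alarms are available are assumptions made for this sketch.

```python
def link_probability(num_g1: int, num_g2: int) -> float:
    """Estimate p_{r->w} = #G2 / (#G1 + #G2) from historical alarm counts."""
    total = num_g1 + num_g2
    if total == 0:
        # Assumption: with no historical alarms there is nothing to estimate from.
        raise ValueError("no historical alarms available for this link")
    return num_g2 / total
```

For example, with 2 input alarms and 6 output alarms (made-up counts), the estimate is 6 / (2 + 6) = 0.75.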

4.2.3. Identifying the source and destination of the anomaly

From the network management perspective, quick identification of the anomaly source is crucial to minimize network and service outage and degradation. However, this is not a simple task due to several factors. The occurrence of a single anomalous event may generate a cascade effect affecting all network devices in its propagation path. In such circumstances, a huge volume of alarms is triggered, hindering the identification of the primary source of the problem (Katzela and Schwartz, 1995). In addition, information provided by the alarms may be incomplete and/or inaccurate. Such information may suggest ambiguous and inconsistent hypotheses about the origin of the problem (Mohamed and Basir, 2010). This process has been considered an NP-hard problem (Steinder and Sethi, 2004). We use a heuristic method designed to locate the source and destination of the anomaly. Our method is based on the following assumption: there is just a single source sending anomalous traffic to a unique destination. The structure of graph $G$ is represented by a dependency matrix, named matrix $D$. It contains the relationships of the devices $X$ affected by the anomaly, defined in

$$D = (d_{i,j})_{(n_x + 1) \times (n_x + 1)} \tag{3}$$

The entries of the matrix $D$ are defined as follows. Each $d_{i,j}$ can take the value of the 2-tuple $\langle w_{ij}, p(x_i, x_j) \rangle$, where $w_{ij}$ is defined in

$$w_{ij} = \begin{cases} m & (x_i, x_j) \in l, \\ 0 & \text{otherwise,} \end{cases} \tag{4}$$

and $p(x_i, x_j)$ is the probability assigned to the anomalous link $l_k$ that connects $x_i$ and $x_j$. The value $m$, with $m \neq 0$, represents an anomaly going from $x_i$ to $x_j$. The indegree and outdegree of the graph nodes are valuable information for identifying the anomaly source and destination. In order to store the indegree and outdegree, a line and a column are added to $D$. The outdegree of $x_k$ can be obtained by (5), which counts the entries in line $k$ of the matrix $D$ where $w_{kj} \neq 0$ ($0 \le j \le n_x$).

$$z^{\mathrm{out}}_{x} = \sum_{\forall\, 0 \le j \le n_x \,\mid\, w_{kj} \neq 0} 1 \tag{5}$$

In the same way, counting the entries in column $k$ of the matrix $D$ with $w_{ik} \neq 0$ ($0 \le i \le n_x$) gives the indegree of $x_k$, as given by

$$z^{\mathrm{in}}_{x} = \sum_{\forall\, 0 \le i \le n_x \,\mid\, w_{ik} \neq 0} 1 \tag{6}$$


A node $x_k$ is considered a source when its indegree is zero, i.e., $z^{\mathrm{in}}_{x} = 0$, and its outdegree is $\geq 1$. This indicates that $x_k$ has no predecessors. On the other hand, $x_k$ is considered a destination when its outdegree is zero, i.e., $z^{\mathrm{out}}_{x} = 0$, and its indegree is $\geq 1$. In (7), the line $n_x + 1$, named S, and the column $n_x + 1$, named F, represent the indegree and outdegree of each $x_k$, respectively.

ð7Þ
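The source and destination rule can be sketched without building the full matrix, by counting incoming and outgoing anomalous links per device. This is a minimal illustration under assumptions: the dictionary layout of `anomalous_links` and the function name are hypothetical, and the device labels in the usage note are made up.

```python
def find_sources_and_destinations(anomalous_links):
    """Return (sources, destinations) from the in/outdegree of each device.

    anomalous_links: dict mapping (x_i, x_j) -> propagation probability p(x_i, x_j).
    """
    nodes = sorted({node for link in anomalous_links for node in link})
    outdegree = {node: 0 for node in nodes}
    indegree = {node: 0 for node in nodes}
    for src, dst in anomalous_links:
        outdegree[src] += 1  # one more non-zero entry in the line of src
        indegree[dst] += 1   # one more non-zero entry in the column of dst

    # Source: no predecessors but at least one successor.
    sources = [n for n in nodes if indegree[n] == 0 and outdegree[n] >= 1]
    # Destination: no successors but at least one predecessor.
    destinations = [n for n in nodes if outdegree[n] == 0 and indegree[n] >= 1]
    return sources, destinations
```

With links A→C, B→C and C→D, the sketch reports A and B as source candidates and D as the only destination candidate, which is the kind of ambiguity that noisy alarms can introduce.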

4.2.4. Tracing anomaly propagation path in presence of noisy alarms

In a real network environment, noisy alarms are unavoidable (Tang et al., 2008). Therefore, the result of the alarm correlation process is prone to inaccuracy in anomaly localization. In this section, we present an approach to the problem of defining the anomaly source and destination when there are noisy alarms in the alarm set. Let $S = \{s_0, s_1, \ldots, s_m\}$, with $m \in \mathbb{N}$ and $m < n_x$, denote the anomaly source candidates and $F = \{f_0, f_1, \ldots, f_n\}$, with $n \in \mathbb{N}$ and $n < n_x$, and with $F \cap S = \emptyset$, denote the anomaly destination candidates. The elements of both $S$ and $F$ are given by the in/outdegree observed in (7). We define $\#S$ and $\#F$ as the number of elements belonging to the sets $S$ and $F$, respectively. Given the assumption that there is only one source emitting the anomaly to a single destination, the possible scenarios involving the anomalous devices belonging to $S$ and $F$ are illustrated in Fig. 8. The first case refers to situations in which $\#S > 1$ and $\#F = 1$, i.e., the noisy alarm does not affect the identification of the destination device, but the localization of the real anomaly source is affected. The second case is that in which $\#S = 1$ and $\#F > 1$. In this case, no noisy alarm affects the localization of the true source of the anomaly, but the identification of the correct destination is influenced. The last case refers to the worst scenario, in which $\#S > 1$ and $\#F > 1$. This scenario indicates that the noisy alarms are producing diverse hypotheses about the source and destination of the anomaly. Let $P_s$ denote the anomalous path between the elements of $S$ and $F$. Let $P$ denote the set of all anomalous paths, where $P = \bigcup_{\forall s} P_s$. Each $P_s$ is described using a 3-tuple $\langle s_q, f_t, g \rangle$, where $s_q \in S$ is the source candidate, $f_t \in F$ is the destination candidate and $g$ is a list that contains the probabilities assigned to the anomalous links $l_k$ belonging to the path that connects $s_q$ and $f_t$. Considering that there are $r$ anomalous links in a path $P_s$, $g$ would have the values $g_1, g_2, \ldots, g_r$. In order to get the most probable path traversed by an anomaly, the possible paths are verified from $s_q$ to $f_t$. Initially,

Fig. 8. Possible scenarios caused by noisy alarms.


we take $s_q = x_i$ and verify its neighborhood. If $N(x_i) > 1$, the successor node of $x_i$ will be the neighbor $x_{i+u}$ connected by the link $l_k$ with the highest weight. The next device to compose the anomalous path is chosen by the criterion $p_{\max}$, defined by

$$p_{\max} = \max\{p(x_{i+1} \mid x_i),\ p(x_{i+2} \mid x_i),\ \ldots,\ p(x_n \mid x_i)\} \tag{8}$$

The maximum probability $p_{\max}$ chosen is added to $g$. This process continues until $x_j = f_t$ is reached. For each $s_q$ directed to a given $f_t$, there will be only one possible path. Therefore, the total number of elements of $P$ is $m$, where $m = \#S$. When $m > 1$, there are $m$ possible anomalous paths. A quantitative measure is then necessary to support the decision about which is the most probable path traversed by the anomaly. We use a measure based on entropy. The entropy concept has been widely applied both in theory and in practice to solve problems involving uncertainty in different areas such as physics, information theory, data compression and economics (Cover and Thomas, 2006). In this work, we use the nonextensive entropy proposed by Tsallis (1988). This entropy is a generalization of Shannon entropy and is defined as follows. Let $X$ be a discrete random variable with probabilities $p_1, p_2, \ldots, p_N$, with $0 \le p_i \le 1$. Nonextensive entropy is defined by

$$H_q(X) = \frac{1}{q - 1}\left(1 - \sum_{i=1}^{N} p_i^{\,q}\right) \tag{9}$$

As can be seen in (9), the nonextensive entropy has a parameter $q$, named the entropic parameter. The parameter $q$ indicates which probabilities will have greater influence on the entropy value, unlike the Shannon entropy, in which high or low probabilities have little influence on the result (Ziviani et al., 2007). In the case of the nonextensive entropy, lower or higher probabilities may influence the final value of the entropy depending on the value of the parameter $q$. For $q > 1$, higher probabilities contribute more to the entropy value than lower ones, while for $q < 1$ we have the opposite (Tellenbach et al., 2009). When $q \to 1$, the nonextensive entropy is equivalent to Shannon entropy (Ziviani et al., 2007). The entropy concept is interpreted as the uncertainty associated with each $P_s \in P$ being the most probable anomalous path. In this context, the lowest entropy value calculated over all possible $P_s$ will indicate, in terms of probability, the path traversed by the anomaly. Considering the list $g$ containing the probabilities assigned to each link in $P_s$, we can write the uncertainty of this path as follows:

$$H_q(P_s) = \frac{1}{q - 1}\left(1 - \sum_{i=1}^{N} \left(\frac{g_i}{N}\right)^{q}\right) \tag{10}$$

The first step in the process consists of calculating the entropy $H(P_s)$ for each $P_s \in P$. Thereafter, the most probable path traversed by the anomaly is obtained using the criterion defined by

$$\mathrm{MPP} = \min\{H(P_0),\ H(P_1),\ \ldots,\ H(P_m)\} \tag{11}$$
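The entropy criterion can be illustrated with a short sketch. It reads (10) as $\frac{1}{q-1}\left(1 - \sum_{i=1}^{N} (g_i/N)^q\right)$, with $N$ taken as the number of link probabilities in the path; that reading, the function names, the tuple layout of a path and the choice not to handle $q = 1$ are assumptions of this example.

```python
def path_entropy(g, q):
    """Nonextensive entropy of a path: (1 - sum((g_i / N) ** q)) / (q - 1)."""
    if q == 1:
        # Assumption: the Shannon limit q -> 1 is outside the scope of this sketch.
        raise ValueError("q = 1 is not handled here")
    n = len(g)
    return (1.0 - sum((g_i / n) ** q for g_i in g)) / (q - 1.0)


def most_probable_path(paths, q):
    """Pick the path with the lowest entropy.

    paths: list of (source, destination, g), where g is a list of link probabilities.
    """
    return min(paths, key=lambda path: path_entropy(path[2], q))
```

With made-up values and $q = 2$, a path with link probabilities [0.9, 0.8] yields 1 − (0.45² + 0.40²) = 0.6375, while a path with [0.4, 0.3] yields 1 − (0.20² + 0.15²) = 0.9375, so the first path would be selected.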

Finally, the last step consists of obtaining the most probable source $s_q$ and the most probable destination $f_t$, with $s_q, f_t \in P_s$, from MPP, where $\mathrm{MPP} = H(P_s)$. The algorithm to infer the anomalous path source and destination is presented in Table 1. It is a general instance of the abductive inference problem, which uses the evidence (alarms) to identify the anomaly source.

4.3. Presentation layer

This layer is responsible for presenting the result of the alarm correlation. An alarm correlation system should facilitate the task

Table 1. Algorithm to trace the path traversed by anomaly.

Algorithm: Identifying the anomalous path propagation
Input: Graph G = (v, e)
Output: Anomaly propagation map

Notation:
    g                    List with weights of the links
    Ps = ⟨sq, ft, g⟩     Path between sq and ft
    P                    List of paths
    xi                   Device affected by anomaly
    xi+u                 Device successor in the path of propagation
    N                    List of successors of xi

for all s ∈ S do
    xi := s
    while (true) do
        N := getSuccessors(xi)
        if N.isEmpty() then
            break
        else if N.size() == 1 then
            g := N.get(0).getLinkWeight()
            xi+u := N.get(0).getNumberDevice()
            xi := xi+u
        else
            Choose xi+u based on lk = (xi, xi+u) using the criterion pmax
            pmax = max{p(xi, xi+1), p(xi, xi+2), ..., p(xi, xn)}
            g := pmax
            xi := xi+u
        end if
    end while
    Ps := ⟨s, xi+u, g⟩
    P.add(Ps)
end for
calculateEntropyForPaths(P)
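The path-tracing loop of Table 1 can be approximated in Python as follows. This is a sketch under assumptions rather than the authors' code: the `successors` dictionary layout and the function name are hypothetical, link probabilities are appended to `g` instead of overwriting it, and a `visited` set is added to guard against cycles, which the table does not mention.

```python
def trace_paths(successors, sources):
    """Greedily follow the most probable anomalous link from each source candidate.

    successors: dict mapping a device to a list of (next_device, link_probability).
    Returns a list of (source, last_device, g), g being the chosen link probabilities.
    """
    paths = []
    for s in sources:
        current, g, visited = s, [], {s}
        while True:
            # Successors not yet on the path (cycle guard is an added assumption).
            candidates = [(n, p) for n, p in successors.get(current, [])
                          if n not in visited]
            if not candidates:
                break
            # Criterion p_max: take the neighbor reached by the heaviest link.
            next_device, p_max = max(candidates, key=lambda item: item[1])
            g.append(p_max)
            visited.add(next_device)
            current = next_device
        paths.append((s, current, g))
    return paths
```

Each returned tuple could then be scored with the entropy measure to choose among the candidate paths.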

of interpreting the information generated by the correlation process. From the perspective of a human administrator, a graphical view is widely accepted as a plausible and intuitive strategy for interpreting the result of the correlation process. In this case, the administrator interacts directly with the graphical visualization, which shows the behavior of all network devices during the occurrence of an anomaly, instead of spending effort on inspecting the alarms in text format. For this purpose, the Anomaly Propagation View (APV) tool was developed.

4.3.1. Anomaly Propagation View features

APV is a multiplatform tool developed in the Java language using the Java Universal Network/Graph (JUNG) API (JUNG, 2011). The APV enables the reconstruction of anomalous scenarios using the DLAs. Figure 9 presents the APV tool. Item 1 points to the toolbar. It consists of useful buttons to import and open alarm datasets and to save the anomaly propagation map, among others. Item 2 points to the plotting area, which presents the final result of the alarm correlation process. The network is presented in a wide view, showing all devices and links affected by the anomalous event. As seen in item 2, the devices and links in red can be contrasted with the others, which are unaffected by the anomaly. The links in red correspond to the anomaly propagation path. Item 3 points to a table that contains detailed information about the results of the correlation process. This table shows the date on which the anomaly occurred, the first and last occurrence, information about the source and destination of the anomalous traffic and the number of devices and links affected by anomalous activity.


Fig. 9. Anomaly Propagation View tool. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. State University of Londrina monitored network.

5. Experiments and evaluation

In order to validate our proposal, we present two case studies. The first case study occurred on May 16, 2010. This anomaly occurrence was selected because of the noisy alarms generated by the alarm system. It therefore shows how the measure based on nonextensive entropy is used by the alarm correlation system to infer the origin and destination of the anomalous traffic. The second case study, conducted on May 04, 2011, shows an anomalous event generated by an artificial traffic injection. The objective of this case study is to compare the path inferred by the alarm correlation system with the actual path, which is known because the anomaly was generated by an artificial traffic injection. Figure 10 illustrates the monitored network of the State University of Londrina, where the experiment was conducted. Table 2 presents the monitored devices, as well as their description.

5.1. Case study 1

In this case study, we present an anomalous event detected on May 16, 2010. It started at 05:10:23 PM and ended at 05:36:20 PM. Anomalous traffic propagation affected the Firewall, the Proxy Server and Switches S1, S2 and S3. We present fragments of the alarms for each device affected by the anomalous event. On the S2 device, alarms were generated for port 4011 and SNMP object ifInOctets, as shown in Fig. 11. They indicate that the anomalous traffic was generated by S3. The anomaly propagation also affected the Proxy Server, generating alarms for objects ipOutRequests and tcpOutSegs, as illustrated in Fig. 12. Figure 13 presents the alarm for object ifInOctets on S2's port 3011. It confirms that the anomalous traffic left the Proxy Server device. In similar fashion, the alarm system detected the anomalous traffic leaving S2 and going to the Firewall device. This is indicated by an alarm triggered for the object ifOutOctets, as presented in Fig. 14. In the Firewall, alarms were emitted for the ipInReceives object, as observed in Fig. 15. As illustrated in Fig. 16, alarms were triggered for object ifInOctets. This demonstrates that anomalous traffic propagation affected the S1 device. The DLAs generated by the preprocessing layer are shown in Fig. 17. The alarm reduction rate for this case study was 82.6%. For this case study, we have the corresponding matrix D:

(12)

We labeled the S1 device as x0, the Firewall as x1, S2 as x2, the S3 device as x4 and the Proxy Server as x6, as described in Table 2. In this anomalous scenario, S = {x4, x6}, which is highlighted with a red


Table 2
Monitored devices.

Label  Device        Description
x0     Switch S1     Intermediary switch between the Internet link and firewall
x1     Firewall      Responsible for filtering all traffic directed from the Internet to the university network and vice versa
x2     Switch S2     Responsible for interconnecting the main elements of the network
x3     Switch ATI    Responsible for interconnecting the university Information Technology Office (ATI) subnet to the network core
x4     Switch S3     Responsible for interconnecting the Computer Science Department subnet to the network core
x5     Email server  Responsible for providing mail service for professors, staff and students
x6     Proxy server  Responsible for controlling access to Internet content
x7     Web server    Main page server of the university

Fig. 11. Alarm triggered to S2 device.

Fig. 12. Alarm triggered to Proxy Server device. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 13. Alarm triggered to S2 device.

Fig. 14. Alarm triggered to S2 device.

Fig. 15. Alarm triggered to Firewall device.

circle in (12). As indicated by a blue triangle, F = {x0}. Therefore, we have #S = 2 and #F = 1. #S = 2 is an indicator that there are noisy alarms in the alarm set. As a consequence, there are two probable anomaly sources, as presented in Fig. 18. The path P1, which starts at x4, consists of the links x4-x2-x1-x0, where the probabilities estimated using (2) for the anomalous links are P(x4, x2) = 0.78, P(x2, x1) = 0.58 and P(x1, x0) = 0.81, so g = {0.78, 0.58, 0.81}. The second possible path, P2, starts at x6 and contains the links x6-x2-x1-x0 with probabilities P(x6, x2) = 0.51, P(x2, x1) = 0.58 and P(x1, x0) = 0.81, where g = {0.51, 0.58, 0.81}. Calculating the entropy of each anomalous path using (10) with entropic parameter q = 1.1, we obtain H1.1(P1) ≈ 3.7 and H1.1(P2) ≈ 4.5, respectively. The final result indicates that the most probable path is MPP = H(P1), that is, P1, the candidate with the lower entropy. Consequently, the alarm correlation system infers that the most probable source is x4 (S3) and the most probable destination is x0 (S1). Figure 19 presents the network-wide view, generated by the APV tool, of the impact caused by the anomalous traffic propagation. The red points highlight the devices involved in the anomalous event. The dashed red lines illustrate the links affected by anomaly propagation in the network.
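The selection step at the end of case study 1 can be illustrated with a short Python sketch. It assumes that the most probable path is the candidate with the smallest entropy, a rule inferred here from the single example in the text, where P1 is chosen and has the lower value. The entropy values are copied from the text and are not recomputed, since Eq. (10) is defined outside this section. The names `most_probable_path`, `reported_entropy`, `path_source` and `path_destination` are hypothetical.

```python
# Entropy values reported in the text for q = 1.1 (copied, not recomputed).
reported_entropy = {"P1": 3.7, "P2": 4.5}

# End points of each candidate path, as listed in the text.
path_source = {"P1": "x4", "P2": "x6"}
path_destination = {"P1": "x0", "P2": "x0"}


def most_probable_path(entropy_by_path):
    """Return the label of the candidate path with the smallest entropy.

    Assumption: lower entropy means a more probable path.
    """
    return min(entropy_by_path, key=entropy_by_path.get)


mpp = most_probable_path(reported_entropy)
print(mpp, path_source[mpp], path_destination[mpp])
```

Under that assumption, the sketch selects P1, so the inferred source is x4 (S3) and the destination is x0 (S1), in agreement with the result stated above.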

5.2. Case study 2

Fig. 16. Alarm triggered to S1 device.

Fig. 17. DLAs of the case study 1.

Fig. 18. Scenario with two probable anomaly sources.

Fig. 19. Network wide view affected by anomalous event of the case study 1. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 20. Devices and path expected to be affected by injected traffic.

This case study presents an anomalous event in which anomalous traffic was artificially injected into the network. For this purpose, the packet crafting tool Ostinato (Ostinato, 2011) was used. This tool enables the generation of a large volume of data using various protocols, such as TCP, UDP and ICMP, which makes it possible to simulate anomalous traffic. The experiment was performed on May 16, 2011. Two hosts were used. The first host was located in the ATI sub-network, as illustrated in Fig. 20. The second one was on the Internet. The Ostinato tool was then configured to generate ICMP traffic at a rate of 10,000 packets per second, starting the transmission at 00:55:00 and ending at 01:06:00. For this case study, the alarm system triggered alarms on switch S2 for object ifInOctets, port 5001, as presented in Fig. 21. Similarly, Fig. 22 shows the alarms emitted for S2, port 3001, for object ifOutOctets, signaling that the anomalous traffic was directed to the Firewall. In the Firewall, the alarm system detected the anomalous traffic and triggered alarms for object ifInOctets, as presented in Fig. 23. Figure 24 shows the alarms emitted for S1, object ifInOctets. The DLAs generated by the preprocessing layer are presented in Fig. 25. The alarm reduction rate for this case study was 78.5%. The labels of the devices involved in the anomalous event are: S1 as x0, Firewall as x1, S2 as x2 and switch ATI as x3. The matrix D is presented as follows:

(13)

In this case study, S = {x3} and F = {x0}, as highlighted in (13). The alarm correlation system correctly identifies x3 (ATI) as the source and x0 (S1) as the destination of the anomalous traffic.
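One way to picture how the sets S and F arise is the following Python sketch. It assumes that the anomalous scenario can be written as a list of directed links, that S contains the devices with outgoing but no incoming anomalous links, and that F contains the devices with incoming but no outgoing ones. This reading, the function name `sources_and_destinations` and the link lists (read off the narrative of the two case studies) are assumptions made for illustration; the paper derives S and F from the matrix D, whose definition lies outside this section.

```python
def sources_and_destinations(links):
    """Split the devices on anomalous links into entry and end points.

    links: iterable of (from_device, to_device) pairs.
    """
    tails = {a for a, _ in links}   # devices that forward anomalous traffic
    heads = {b for _, b in links}   # devices that receive anomalous traffic
    sources = tails - heads         # forward traffic but receive none
    destinations = heads - tails    # receive traffic but forward none
    return sources, destinations


# Assumed link list for case study 2 (ATI -> S2 -> Firewall -> S1).
case_study_2 = [("x3", "x2"), ("x2", "x1"), ("x1", "x0")]
S, F = sources_and_destinations(case_study_2)
print(sorted(S), sorted(F))

# Assumed link list for case study 1, including the noisy link from x6.
case_study_1 = [("x4", "x2"), ("x6", "x2"), ("x2", "x1"), ("x1", "x0")]
S1, F1 = sources_and_destinations(case_study_1)
print(sorted(S1), sorted(F1))
```

With these link lists the sketch yields a single source for case study 2 and two candidate sources for case study 1, which is the situation that the entropy measure is used to resolve.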

Figure 26 presents a holistic view of the network affected by anomalous traffic propagation.

Fig. 21. Alarm triggered to S2 device.

Fig. 22. Alarm triggered to S2 device indicating that anomalous traffic propagated to Firewall.

Fig. 23. Alarm triggered to Firewall.

Fig. 24. Alarm triggered to Firewall.

Fig. 25. DLAs of the case study 2.

Fig. 26. Network global view affected by anomalous event for case study 2.

5.3. Discussion

The case studies demonstrated the process performed by the alarm correlation system, which provides a wide view of the network health from a set of raw alarms generated by the alarm system. It is important to highlight that the whole procedure described in the case studies is performed in a fully automated manner by the alarm correlation system, without the need for network administrator intervention.

The alarm reduction rate obtained by alarm compression in case study 1 was 82.6%. This means that only 17.4% of the total number of alarms were needed to infer the origin and destination of the anomalous traffic. This percentage refers to the volume of alarms (DLAs) that was delivered by the preprocessing layer to the correlation layer. For case study 2, the alarm reduction rate was 78.5%. These reduction rates are significant. For comparison, the approach proposed by Al-Mamory and Zhang (2009) had a reduction rate of 74%, while the system designed by Costa et al. (2009) had a compression rate of 70%.

Case study 1 presented a scenario with noisy alarms and showed how these unwanted alarms introduced uncertainty into the correlation process. It was shown, step by step, how the alarm correlation system inferred the source and the destination of the anomalous traffic using the nonextensive entropy. Case study 2 presented an anomalous scenario caused by anomalous traffic artificially injected into the network. Therefore, it was possible to compare the path inferred by the alarm correlation system with the actual path. The alarm correlation system correctly identified the anomaly propagation path, inferring its source and destination in an automated manner.

To the best of our knowledge, existing work in the network alarm correlation area does not address the problem of tracing volume anomaly propagation. For this reason, we did not make direct comparisons between our approach and others concerning the anomalous path inference.

The results showed that the main objectives of the work were achieved. However, some points may be improved. First, besides tracing the anomaly propagation path, it is important to classify the anomalies. Anomaly classification gives the network administrator more information, allowing more specific countermeasures to be applied to solve the problem. The proposed approach should also address more complex scenarios, such as anomalies originating from multiple sources. Adopting IP Flow Information Export (IPFIX) is a probable way to reach these two objectives, because IPFIX provides more detailed information about network operations than SNMP. It would therefore be possible to classify the anomalies (e.g., DDoS attack) and to have greater support for analyzing whether an anomaly originates from multiple sources. Finally, we intend to modify the alarm correlation system to work in a distributed architecture, aiming to improve system scalability. The benefits of distributed processing are essential in current network scenarios and for next generation networks, because network convergence and large scale aggregation of mobile devices will culminate in a much larger volume of devices and traffic.
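The relation between the reduction rates quoted in this discussion and the share of alarms forwarded to the correlation layer can be made explicit with a small Python sketch. It assumes that the reduction rate is one minus the ratio of DLAs to raw alarms, which is consistent with the 82.6% and 17.4% figures given for case study 1. The alarm counts used below are hypothetical, chosen only to reproduce that rate; the paper does not report the raw counts in this section, and the function name `reduction_rate` is likewise an assumption.

```python
def reduction_rate(raw_alarms: int, dlas: int) -> float:
    """Fraction of raw alarms removed by compression into DLAs."""
    if raw_alarms <= 0:
        raise ValueError("raw_alarms must be positive")
    return 1.0 - dlas / raw_alarms


# Hypothetical counts, not taken from the paper.
raw, dlas = 1000, 174
rate = reduction_rate(raw, dlas)
print(f"reduction: {rate:.1%}, forwarded to correlation layer: {1 - rate:.1%}")
```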

6. Conclusion

In this paper, an alarm correlation system divided into three layers is presented. In the preprocessing layer, alarm compression is performed using the spatial and temporal attributes of the alarms. Based on these attributes, alarms triggered by the same device during the occurrence of the anomaly are reduced to a unique alarm named Device Level Alarm (DLA). The correlation layer aims to infer the anomaly propagation path and its origin and destination using DLAs and network topology information. The presentation layer provides a graphical, network-wide view. Furthermore, we developed the Anomaly Propagation View (APV) tool. It is a multiplatform tool that enables visualization and fast understanding of the impact caused by anomaly propagation through the network. In order to validate our solution, tests were performed on real network traffic of the State University of Londrina. From the case studies presented, it is possible to verify that the alarm correlation system is able to infer the source and destination of the anomalous traffic even with noisy alarms.

We highlight three main contributions of this work: (i) an alarm correlation system that aims to infer the source and destination of the anomalous traffic, (ii) a measure based on nonextensive entropy, applied to infer the anomalous path in scenarios where there are several possible paths generated by noisy alarms, and (iii) a tool that shows the result of the correlation graphically, providing a holistic view of the impact of the anomaly on the network.

As future work, we intend to improve our solution to identify anomaly propagation in a more complex scenario with the following characteristics: noisy alarms and anomalies originating from multiple sources. Future work also includes the development of a distributed architecture for the alarm correlation system and the adoption of IPFIX to collect network operations data.

Acknowledgements

This work has been partially supported by Instituto de Telecomunicações, Next Generation Networks and Applications Group (NetGNA), Portugal, by National Funding from the FCT (Fundação para a Ciência e a Tecnologia) through the PEst-OE/EEI/LA0008/2011 project, and by SETI/Fundação Araucária and MCT/CNPq/Brazil through the Rigel Project.


References

Al-Kasassbeh M, Adda M. Network fault detection with Wiener filter-based agent. Journal of Network and Computer Applications 2009;32(4):824–33.
Al-Mamory SO, Zhang H. Intrusion detection alarms reduction using root cause analysis and clustering. Computer Communications 2009;32(2):419–30.
Bellec JH, Kechadi MT. Towards a formal model for the network alarm correlation problem. In: Proceedings of the 6th WSEAS international conference on simulation, modeling and optimization, 2006. p. 458–63.
Costa R, Cachulo N, Cortez P. An intelligent alarm management system for large-scale telecommunication companies. In: Proceedings of the EPIA, 2009. p. 386–99.
Chyssler T, Burschak S, Semling M, Lingvall T, Burbeck K. Alarm reduction and correlation in intrusion detection systems. In: Proceedings of the detection of intrusions and malware & vulnerability assessment, GI SIG SIDAR workshop, vol. 46 of LNI, 2004. p. 9–24.
Cover T, Thomas J. Elements of information theory. Second ed. John Wiley & Sons; 2006.
Hoang XD, Hu J, Bertok P. A program-based anomaly intrusion detection scheme using multiple detection engines and fuzzy inference. Journal of Network and Computer Applications 2009;32(6):1219–28.
JUNG. Java Universal Network/Graph Framework. Available in: http://jung.sourceforge.net/ (accessed 21.01.11).
Katzela I, Schwartz M. Schemes for fault identification in communication networks. IEEE/ACM Transactions on Networking 1995;3:753–64.
Li T, Li X. Novel alarm correlation analysis system based on association rules mining in telecommunication networks. Proceedings of Information Sciences 2010;108(16):2960–78.
Liao Y, Vemuri VR, Pasos A. Adaptive anomaly detection with evolving connectionist systems. Journal of Network and Computer Applications 2007;30(1):60–80.
Mohamed AA, Basir O. Fusion based approach for distributed alarm correlation. In: Proceedings of the 2010 second international conference on communication software and networks, 2010. p. 318–24.
Monacelli L, Reali G. Evolution of the codebook technique for automatic fault localization. IEEE Communications Letters 2011;15(4):464–6.
Ostinato. Packet/Traffic Generator and Analyzer. Available in: http://code.google.com/p/ostinato/ (accessed 02.02.11).
Proença ML Jr, Zarpelão BB, Mendes LS. Anomaly detection for network servers using digital signature of network segment. In: Advanced industrial conference on telecommunications, Lisboa. Advanced ICT 2005 proceedings, IEEE Computer Society, 2005. p. 290–5.
Perdisci R, Giacinto G, Roli F. Alarm clustering for intrusion detection systems in computer networks. Engineering Applications of Artificial Intelligence 2006;19(4):429–38.
Qin T, Guan X, Li W, Wang P, Huang Q. Monitoring abnormal network traffic based on blind source separation approach. Journal of Network and Computer Applications 2011;34(5):1732–42.
Steinder M, Sethi AA. A survey of fault localization techniques in computer networks. Science of Computer Programming 2004;53(2):165–94.
Tellenbach B, Burkhart M, Sornette D, Maillart T. Beyond Shannon: characterizing internet traffic with generalized entropy metrics. In: Proceedings of the 10th international conference on passive and active network measurement, 2009. p. 239–48.
Tsallis C. Possible generalization of Boltzmann–Gibbs statistics. Statistical Physics 1988;52(1–2):479–87.
Tjhai GC, Furnell S, Papadaki M, Clarke NL. A preliminary two-stage alarm correlation and filtering system using SOM neural network and K-means algorithm. Proceedings of Computers & Security 2010:712–23.
Thottan M, Ji C. Anomaly detection in IP networks. IEEE Transactions on Signal Processing 2003;51:2191–204.
Tang Y, Al-Shaer E, Boutaba R. Efficient fault diagnosis using incremental alarm correlation and active investigation for internet and overlay networks. IEEE Transactions on Network and Service Management 2008;5(1):36–49.
Varga P, Moldovan I. Integration of service-level monitoring with fault management for end-to-end multi-provider ethernet services. Proceedings of the IEEE Transactions on Network and Service Management 2007:28–38.
Ziviani A, Gomes ATA, Monsores ML, Rodrigues PSS, Gomes A. Network anomaly detection using nonextensive entropy. IEEE Communications Letters 2007;11(12):1034–6.
Zarpelão BB, Mendes LS, Proença ML Jr, Rodrigues JPC. Parameterized anomaly detection system with automatic configuration. In: IEEE global communications conference (IEEE GLOBECOM 2009), Communications Software and Services Symposium, Honolulu, Hawaii, USA, 2009.
Zhou CV, Leckie C, Karunasekera S. Decentralized multi-dimensional alert correlation for collaborative intrusion detection. Journal of Network and Computer Applications 2009;32(5):1106–23.
