Time Series Modeling for IDS Alert Management

Jouni Viinikka, Hervé Debar

Ludovic Mé, Renaud Séguier

France Telecom, BP 6243, 14066 Caen Cedex, France

Supélec, BP 81127, 35511 Cesson Sévigné Cedex, France

[email protected]

[email protected]

ABSTRACT

Intrusion detection systems create large amounts of alerts. A significant part of these alerts can be seen as the background noise of an operational information system, and its quantity typically overwhelms the user. In this paper we have three points to make. First, we present our findings regarding the causes of this noise. Second, we provide some reasoning why one would like to keep an eye on the noise despite the large number of alerts. Finally, one approach for monitoring the noise with a reasonable user load is proposed. The approach is based on modeling regularities in alert flows with classical time series methods. We present experiments and results obtained using real-world data.

Categories and Subject Descriptors

C.2.3 [Computer-Communication Networks]: Network Operations—Network monitoring; C.2.0 [Computer-Communication Networks]: General—Security and protection

General Terms Security, Experimentation

1. INTRODUCTION

Intrusion detection systems (IDS) often create excessive amounts of alerts, a fact acknowledged widely in both science and industry [1, 7, 12]. In this section we have a look at some of the causes of this alert overflow and position our work in the alert correlation domain. In this paper, by a sensor we mean a misuse- and network-based sensor. The diagnostic capabilities of current sensors are modest [5], and a large majority of the generated alerts can be of little or no use for the operator of the system [7]. This chaff is generated for diverse reasons, which can be roughly divided into four classes. The last class is the most relevant with respect to our work, and we will describe it in more detail. 1) The behavior model of an anomaly-based sensor


or the attack descriptions of a misuse-based sensor can be inaccurate, creating false positives. 2) The sensor may be unable to determine whether the observed attack is a real threat in the working environment of the sensor. 3) The sensor is often mono-event, i.e. unable to associate related attack steps, checking e.g. just one network packet at a time. 4) The sensor is alerting on packets that a) are often caused by normal system functioning, but b) can also be premises or consequences of attacks or problems. The difficulty is that it is impossible to make the difference using the information carried by the packet. This implies that the flow of these alerts contains both signs of wanted, or at least acceptable, traffic, and possibly signs of unwanted traffic. Typically, these alerts are not associated with any vulnerabilities, and consequently alert verification [9], using for example the knowledge of the configuration of the information system, is not possible.

Some of these issues could be, at least in theory and to some extent, addressed by improving the sensors themselves. However, in practice the data capture and analysis capabilities of low-level sensors are rather limited because of the large amount of data that needs to be monitored in (near) real-time. In addition, the visibility of one sensor can be too limited for diagnosing large-scale attacks or for using knowledge about the operating environment at the sensor level. Alert correlation has been proposed as one solution to the problem [6, 8, 14, 16]. Simplifying harshly, correlation is a higher-level function taking multiple alerts and additional information into account to 1) reduce the alert volume, 2) improve the alert content, and 3) track attacks spreading across multiple packets or consisting of multiple steps. The alerts can be issued by multiple, heterogeneous sensors, and the additional information is, for instance, the configuration and vulnerability scan results of the monitored system.

Our work focuses on processing high alert volumes produced by the misuse- and network-based sensor Snort. The alerts in question can mostly be associated with cause number four, the monitoring of common packets related to system functioning. In the rest of this paper, when speaking of alerts, we mean this particular type of alerts unless mentioned otherwise. In our operating environment we cannot eliminate this type of alerts by deactivating the signatures, so we need a methodology to process them. The proposed methodology builds on three basic ideas: 1) instead of individual alerts we process alert flows, 2) the component of a flow that can be attributed to normal system behavior is modeled using an autoregressive time series

model, and 3) only deviations from the normal flow behavior are reported to the operator as meta-alerts.

The remainder of the paper is organized as follows. We define the problem in section 2. In section 3 we present related work and position our work with respect to it. As our approach is driven by problems detected in the wild, we spend some space presenting the alert corpus used and the observations made from the data in section 4. These observations have given the inspiration for this work, and we consider them important. Section 5 explains the methodology we propose to process alerts generated by prolific signatures. This section also describes the time series analysis methods used. Practical results are discussed in section 6, and finally we offer our conclusions in section 7.

2. PROBLEM STATEMENT

In this section we define the problem we aim to solve, and justify its significance for the users of intrusion detection systems. The main objective of our work is to allow the user to focus on more relevant tasks by relieving him from the manual inspection of numerous benign alerts, while still providing him with the contextual information available only in the aggregates.

It has been reported [7] that as much as 99% of the alerts produced by intrusion detection systems can be false positives, alerts mistakenly triggered by benign packets. First of all, we see that the definition of a false positive is a bit problematic. We do think that, as such, a large majority of alerts is quite useless for the operator. However, we do not consider them all as false. Consider Snort signatures without an associated vulnerability. For example, Snort has signatures for ICMP Destination Unreachable messages, and when the sensor triggers the alert, it is not issued mistakenly, even though it is likely to be rather useless as such for the operator. We prefer to call these kinds of alerts the background noise of an operational information system, not false alerts.

It is impossible to distinguish normal activity from attacks/problems in the way they manifest themselves in this type of alerts by looking at the individual alerts. However, we believe that the distinction can be made by analyzing the aggregated alert flow, and especially its intensity, the number of alerts in a time unit with respect to recent history. Returning to ICMP Destination Unreachable messages, unusual changes in the alert intensity can indicate a problem that is possibly security related. Thus we prefer to monitor alert aggregates instead of deactivating the prolific signatures or filtering on an alert-by-alert basis. As mentioned before, alert verification techniques are not a feasible solution for this type of alerts. In addition, we have observed in our operating environment rather significant regularities in alert flows, most likely having non-malicious origins. Hence not just any change in alert intensity, but only changes in and deviations from the regular behavior are interesting for the operator, and would require further investigation. As these regularities account for a large majority of alerts, we aim to model the regularity in order to filter it out.

Ordinary network sensors, for example the information provided by Netflow,¹ could be used to monitor some of these packets at the network level, as has been done in e.g. [2]. However, we use Snort for the following reasons:

• Even though an ordinary network sensor could be used to monitor for example ICMP messages, and different types of messages could be separated with header information, this is not enough for our purposes. We need more fine-grained alert aggregation using pattern matching on the packet payload. For example, ICMP Echo messages can contain information on the purported origin of the ping packet in the payload, which is of interest to us.

• By its packet aggregation, Netflow creates an abstraction layer that is not present when we use individual alerts from Snort. One could say that we are ourselves performing a flow-level abstraction that suits our needs better than the one provided by Netflow.

• Last, but not least, Snort is the interface to the monitored network at our disposal. It is a fact we need to accept in our operating environment, and we try to make the best use of available tools.

¹ http://www.cisco.com/go/netflow/

3. RELATED WORK

In this section we present related work in the intrusion detection and alert correlation domains, and point out the differences between these approaches and our work.

3.1 Intrusion detection

In [18, 19] exponentially weighted moving average (EWMA) control charts are used to monitor Solaris BSM event intensity to detect denial-of-service attacks. The monitoring method is similar to our previous work [17], but it is applied at the sensor level and to a host-based data source. Mahadik et al. [11] use EWMA-based monitoring for Quality of Service (QoS) parameters, like jitter and packet drop rate, in a DiffServ network to detect attacks on QoS. Thus it can be seen as using a network-based data source. Our approach uses a different data source, namely alerts instead of QoS parameters, and the processing method is different. We do apply a similar detection method for anomalies, but we pre-process the alert series and model the normal behavior of the alert flow differently.

3.2 Alert correlation

Among the several proposed correlation methods and techniques, we consider the following to be related to our work. Qin and Lee [15] use the Granger Causality Test to find unknown relationships in alert data. The time series data is formed as the number of occurrences of meta-alerts in a time unit, and they search for causal relationships between the series. The meta-alerts are formed by aggregating alerts sharing attributes other than the time stamp, as defined in IDMEF; therefore they use more fine-grained aggregation criteria than we do. We filter out the normal component in one specific type of alert flow instead of trying to track attacks spreading across multiple steps (i.e. alert series). In other words, the objectives and the techniques used are different.

Julisch [7, 8] uses data mining to find root causes for false alerts, and proposes to filter those alerts when it is not possible to fix the root causes. The filters are based on alert attributes and work on an alert-by-alert basis. Firstly, changes in the alert intensity created by the root cause can be interesting, and this information is lost if such filters are in place. Secondly, for some alert flows, there is no finite set of root causes that could be easily filtered. Episode rules [13] could provide visibility over several alerts, but in [8] they were reported as an error-prone and laborious approach.

Table 1: Five most prolific signatures in the first data set

signature        SID    # of alerts
SNMP Request     1417   176 009
Whatsup          482    72 427
ICMP Dest Unr    485    57 420
LOCAL-POLICY            51 674
Speedera         480    32 961
sum                     390 491

4. ALERT CORPUS

In this section we describe the alert corpus used in the tests. We will also provide justification for the narrow scope of the approach, define different flow profiles, and analyze five high-volume alert flows in detail.

The data set consists of alerts generated by three Snort sensors deployed in an operational information system, one closer to the Internet, and two in more protected areas. The sensors store the alerts in a database on a remote server, and our processing component is situated on the server. As the monitored traffic was real, the absolute truth concerning the nature of the observed traffic remains unknown. On the other hand, alerts from a real system contain a wealth of background noise, which is challenging for correlation methods and might be difficult to simulate. The data is the same that was used in [17], and the main motivation for this work was the shortcomings of the model used in that paper with certain types of alert flows. These shortcomings are discussed in more detail in section 6.

The data was accumulated over 42 days, and contained over 500k alerts. The sensors had the default set of signatures activated, at that time approximately 2000 signatures, plus some additional custom signatures. 315 signatures had triggered alerts, and only five had generated 68% of the total number of alerts. Table 1 shows these five signatures with their Snort IDs (SID) and the number of alerts generated by them. All five react to packets that can be generated by normal system functioning. Therefore we consider it worthwhile to focus on processing this type of alerts. This reasoning may be specific to the information system in question. However, the reported examples, such as those in [8], suggest that this could apply also to a wider range of information systems.

4.1 Profiling alert flows

While investigating collected alerts, both in this corpus and more generally, we have identified four flow profiles for alerts generated by prolific signatures. The profiles are defined according to their regularity and our capability to explain the seen behavior, normal or anomalous.

A Known, constant: The alert generation is almost constant. The flow has relatively few anomalies, which we can explain and attribute to 1) a change in the interaction configuration or 2) a problem.

B Known, periodic: The alert flow contains clearly visible periodicity and possibly a constant component with a benign origin. The flow contains some anomalies, of which we can explain a majority.

C Unknown, periodic: The alert flow is less stable than in class B, it has more anomalies visible, and we do not know how to explain them.

D Unknown, random-like: The flow seems to be more or less random; only very little or no structure is visible to the plain eye. We have only limited explanations for the origins of these alerts.

The regularity and the explicability span two axes, and the placement of the classes with respect to them is depicted in Fig. 1. None of these four classes falls into the second quadrant, as those alerts are usually associated with a vulnerability, and are either true manifestations of attacks or false positives. Most correlation work focuses on processing this type of alerts via techniques like alert fusion, verification, and finding the prerequisites and the consequences among them. In the first quadrant we have two different classes, A and B. The reason for having two separate classes is the different type of regularity, constant and periodic, of the classes A and B, respectively. Packets causing constant alert flows typically have machine-related origins, given their clock-like behavior, whereas human activity can create periodic behavior. For example, if the network traffic triggering the alerts is created by the actions of a large number of persons, natural rhythms with a period of one day or week are likely to exist in the alert flows.

Figure 1: The alert classes with respect to the two axes, regularity and explicability. Classes A and B lie in the first quadrant (explicable, high regularity), class C in the fourth quadrant (non-explicable, high regularity), and class D in the third quadrant (non-explicable, low regularity); the second quadrant (explicable, low regularity) is empty.

4.2 Alert flow analysis

Now let us have a closer look at the five signatures and the alert flows they generated. For each, we give a short description, enumerate the identified interesting phenomena, and provide explanations for the phenomena when we have them. Actual signature documentation can be found on the Snort web site² with the Snort ID. Figure 2 shows the alert intensities as a function of time for the measurement period. Dotted lines show the division into estimation and validation data, as will be explained in section 6.

² http://www.snort.org

SNMP Request UDP reacts to Simple Network Management Protocol (SNMP) requests from external sources towards internal hosts. As the source address is external, the request messages are likely caused by a misconfiguration outside the operator's control. The alert flow is extremely regular, as can be seen in Fig. 2(a), with a few peaks and valleys plus some smaller anomalies. In this case, a few particular source addresses were responsible for the large bulk of the alerts, i.e. the root causes were identified, and alerts generated by those nodes could have been filtered out. However, in that case the operator would have also missed the sharp change in the alert intensity marked with p1. We identified five interesting phenomena in the flow. The first, p1, is a huge peak, likely caused by a change in the interaction configuration at the external source, since it also introduced an increase in the constant component of the flow. Then both p2 and p4 are smaller peaks, and p3 and p5 are drops, probably due to connectivity problems, in the otherwise extremely constant alert flow. There was also low-intensity activity that is almost invisible in the picture. This type of low-level activity would be easily lost among the bulk of alerts if the alerts were processed manually. In the case of completely shunning the signature this activity would have been simply impossible to detect. We place this alert flow in class A because of its constant and explicable nature.

ICMP PING WhatsupGold Windows is triggered by ICMP Echo messages with a payload claiming them to be created by a performance measurement tool³. The source address is by definition external, so the root cause for these alerts is likely to be out of the operator's sphere of influence. In this flow, visible in Fig. 2(b), we see a constant component, on top of which we have a periodic component creating peaks on the five workdays but no additional alerts during the weekends. Alerts generated by these two components are regular and considered legitimate, actually originating from the usage of the performance measurement tool. The tool includes a server polling the network nodes to find out their availability, and thus the normal flow behavior is independent of the general network traffic volume, in contrast to the speedera case. This automated origin also explains the high regularity of the daily and weekly variations and the existence of the constant component. The interesting phenomena are the shifts in the constant level. Phenomena p1, p4, p6 and p7 are drops, and p2, p3, p5 and p8 are increases in the constant component intensity. As with the previous flow, these are changes in the alert generation rate, and filtering based solely on alert attributes would prevent the operator from seeing these changes. We place this flow in class C because of its periodic nature, and since, even though we know the origins of the normal behavior, it is difficult to enumerate them for the anomalies.

ICMP Destination Unreachable Communication Administratively Prohibited reacts to particular ICMP messages normally generated when a network node, e.g. a router, drops a packet for administrative reasons. For this flow, shown in Fig. 2(c), there are several possible causes for the alerts, like backscatter from DDoS attacks,

network outages or routing problems. Given the erratic profile, it is likely a combination of all these and more. In other words, we could not define the root causes. These messages are like background radiation; almost any network is likely to have them in the inbound traffic as hosts within try to connect to invalid/non-accessible addresses. Consequently, filtering based on individual alerts would be difficult. The flow profile does not contain as much structure as the others. When smoothing the intensity with EWMA some structure became visible, indicating a rather weak week rhythm. A correlation between the network usage and incoming ICMP Destination Unreachable messages seems quite logical, and would explain the week rhythm. We place this flow in the class C, as we cannot associate the high variability nor the anomalies with a small set of root causes.

LOCAL-POLICY External connexion from HTTP server is a custom signature with a quite self-explicative name. The rationale for this signature is that, as all the back-end servers are internal, a web server should not be on the SYN side of a TCP handshake with external hosts, unless for example infected by a worm. Nonetheless they did, and up to this day we have not been able to explain this behavior. The alert intensity is depicted in Fig. 2(d). Even though the profile consists of impulses, some periodicity can be found. There are peaks of thousands of alerts following a weekly rhythm, and smaller peaks following daily and even shorter cycles. In addition there are anomalies: p1 is missing, and p3 is a change in the low-level activity. The phenomenon p2 is an extremely high peak with respect to past observations, and p4 and p5 are large peaks breaking the normal rhythm. We place this flow in class C as we lack explanations for both the normal and the anomalous behavior, and since the peaks have a periodic nature.

ICMP PING speedera triggers on ICMP Echo messages carrying a payload claiming the packet to originate from a server belonging to the content distribution company Speedera⁴, trying to determine the closest cache to the host requesting content. The number of these messages is proportional to the accesses to sites hosted by the company. Consequently the flow, shown in Fig. 2(e), has a strong periodic component with peaks during working hours and valleys during night time and weekends. We consider the periodic component as normal system behavior. Two anomalies were identified: p1 is a peak during high-activity hours, and p2 is a peak during a low-activity period. These are likely to be anomalies caused by an increase in the legitimate traffic, related to something like the release of Pirelli's calendar on a server hosted by Speedera. We place the flow in class B because it is highly periodic, and as we are able to explain both the normal behavior and the anomalies.

Next, some conclusions we have made from the analysis. In general, large amounts of alerts are not false positives in the strict sense, but background noise generated by prolific signatures reacting to the normal functioning of the system, as discussed earlier. Referring to the noise source classes of section 1, these alerts fall into category four. The interesting phenomena in these flows are often intermittent, and consist of 1) changes in the intensity generated by the normal system behavior or by so-called root causes, and/or 2) the appearance of abnormal sources, i.e. something else than the root causes, both representing typically intermittent activity. We need flow-level filtering, since alert-by-alert filtering generally loses case one, and without any filtering case two risks being lost in the noise. We also wish to point out that four out of five signatures trigger alerts in an almost never-ending stream. Without further automation of alert processing, the operator would need to be analyzing these alerts constantly. This would be a tedious task, requiring at least time and likely to strain analysts' mental resources. Based on our observations we suppose that the regularity and the stationary structure present in these alert flows have benign origins. By definition, stationary structure has existed in the past and has remained unchanged. Thus it should be normal also according to the underlying hypothesis of anomaly detection.

³ http://www.ipswitch.com/Products/WhatsUp/professional/
⁴ http://www.speedera.com

5. METHODOLOGY FOR ALERT FLOW MODELING AND FILTERING

In this section we propose a methodology to model and filter alert flows. To present our observations in a more exact manner, we will use some notation and concepts from the field of time series analysis, mostly following [3, 4]. It should be mentioned that the theory and methods on which we build are classical and simple; our intention is to map them to alert processing and to our problem, and then to propose a methodology to apply these methods. As the methods are simple, they are likely not the most suitable for all alert flows. However, this is our first trial with techniques from the time series analysis discipline, and we wanted to see first what we can achieve with these simple methods. In addition, the obtained results will serve as a yardstick to measure the improvements in future work. We begin with an overview of the proposed methodology, and then describe each step in more detail.

5.1 Overview

An alert flow is the stream of successive alerts meeting the aggregation criteria. We aggregate alerts according to the signature that generated the alert, and view the flow intensity measurements taken at fixed intervals as a time series St. The objective is to filter out from this flow, or time series, the components that correspond to normal activity of the information system. A discrete-time time series model for a set of observations {xt} is a representation of the joint distributions of a sequence of random variables {Xt} of which {xt} is postulated to be one realization. Each xt is recorded at time t, where the set T0 of observation times is a discrete set. Here we consider only observations made with fixed time intervals. In practice, the specification of the joint distributions is rarely available and infeasible to estimate, so only first- and second-order moments of the joint distributions are used. We use a model for the alert series St,

St = Xt + Et ,   (1)

where Xt represents the part of the alert flow caused by the normal activity, and Et the remainder, i.e. the abnormal part of the alert flow. As mentioned before, as the result of the alert flow analysis, we consider regularity and stationary phenomena in alert flows as part of normal system behavior, whereas the intermittent, non-stationary phenomena in alert series we regard as caused by abnormal system behavior. This abnormal behavior can be attacks or more general problems. The distinction between alerts related to normal and abnormal activity is invisible at the alert level, but can be seen at the flow level. According to the classical decomposition model [3], a time series Xt can be considered to contain three components, trend, periodic and random,

Xt = Tt + Pt + Rt ,   (2)

where Tt is a slowly changing trend, Pt is the periodic component with a known period d, and Rt is a random stationary component. Note that Rt is not necessarily completely random in the common sense of the word, but can contain structure that does not fit into Tt nor into Pt. In the literature Rt is also called the noise component, but to avoid the collision with our term background noise, we prefer to use random component for Rt. For example, Tt would correspond to the slow changes in the alert intensity caused by the overall changes of the monitored traffic over time. Pt would correspond to the periodicity in alert flows created by periodic system activity related to working hours and working days. As a more specific example, ICMP Ping speedera has a strong periodic component, see Fig. 2(e). Less trivial behavior that is still present and invariant over a longer time would correspond to the structured part of Rt.

To let the operator focus on alerts warranting further investigation, we filter out alerts conforming to the description of normal system behavior, i.e. the components of the alert flow that fit into the Xt of equation (2). The abnormal component of an alert flow is the residual series Et = St − Xt, and only the most significant phenomena of Et are signaled to the operator as meta-alerts. Figure 3 depicts the steps in this process. The transformation from data to the series S is in this case trivial, as we only count the number of alerts matching the aggregation criteria in a time unit, one hour. We acknowledge that the choice of the time unit affects especially the timeliness of the detection and the visibility of certain phenomena in alert flows. Given the noise-like nature of the monitored alerts, we consider the hourly measurements sufficient. We also consider only univariate series, even though other transforms creating a multivariate series from the alert data could be envisaged.

In 5.2 and 5.3 we describe how to remove Tt and Pt, respectively, from St to obtain first the detrended series S't and then S''t, from which the periodic component is removed. Now the series S''t contains the random, Rt, and the abnormal, Et, components. The structure in Rt is captured in a time series model, described in 5.4, and Et is obtained as the difference between the model output Ŝ''t and the observations S''t. Finally, the detection of the most significant anomalies is detailed in 5.5.
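To make the first step of Fig. 3 concrete, the following is a minimal sketch, not the authors' implementation, of turning raw alerts into the hourly, per-signature series St of equation (1). The (timestamp, signature) input format is an assumption made for illustration.

# Sketch: aggregate alerts into hourly counts per signature (values S_t).
from collections import defaultdict
from datetime import datetime, timedelta

def hourly_series(alerts):
    """alerts: iterable of (timestamp: datetime, signature: str) pairs.
    Returns {signature: [alerts per hour]} over the observed time span."""
    counts = defaultdict(lambda: defaultdict(int))
    t_min, t_max = None, None
    for ts, sig in alerts:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[sig][hour] += 1
        t_min = hour if t_min is None or hour < t_min else t_min
        t_max = hour if t_max is None or hour > t_max else t_max
    series = {}
    if t_min is None:
        return series
    n_hours = int((t_max - t_min) / timedelta(hours=1)) + 1
    for sig, per_hour in counts.items():
        # Hours with no alerts contribute a zero, so the series has no gaps.
        series[sig] = [per_hour.get(t_min + timedelta(hours=k), 0)
                       for k in range(n_hours)]
    return series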

5.2 Removing the Trend

First of all, we assume that the trend Tt possibly present in an alert flow is linear. The reasoning behind this is based on common sense and on our observations. If the trend is of higher degree, then there is a serious problem with the sensors, and the alert storage capacity is running out very soon. This cannot be the normal situation, and if it is happening, we do not wish to eliminate such information from the series.

There are several ways to remove the trend and the periodic components from a time series [4]. We have chosen to use the lag-d differencing operator, ∇d, for two reasons. First, it does not require the trend to remain constant over time, and second, it does not require the estimation of several parameters. An additional plus is that it can be applied to remove both the trend and the periodic components. It is defined as

∇d Xt = Xt − Xt−d .   (3)

In other words, the resulting series is the difference between two observations of the original series, d time units apart. With d = 1 this is the analogue of the differentiation operator for continuous functions. When applying ∇1 to a linear trend Tt of the form Tt = bt + c, we obtain ∇1 Tt = Tt − Tt−1 = bt + c − (b(t − 1) + c) = b, the slope of the trend function. If viewed as differentiation, this step leaves us with a series representing the rate of change in the alert flow. For example with SNMP Request UDP, this step removes the constant component, visible in Fig. 2(a), and makes the level shifts, and thus the interesting phenomena, more apparent. The transformed series, i.e. the one consisting of the values S't of Fig. 3, is shown in Fig. 4. We apply ∇1 to all series. This costs us a loss of information contained in the series, but for the anomaly detection the linear trend and the absolute value of the alert intensity are not necessary.

Figure 2: Hourly alert intensity for the five most prolific flows aggregated by signature: (a) SNMP Request, Class A; (b) Whatsup, Class C; (c) Dest. Unr., Class D; (d) LOCAL-POLICY, Class C; (e) Speedera, Class B. The horizontal axis is the time, and the vertical axis shows the number of alerts per hour in log-scale. The dashed line shows the division into estimation and validation data used for the experiments. The arrows point out the phenomena in which we are interested.

Figure 3: Diagram of the detection process: the alerts are transformed into a series s1 ... sn, the trend is removed with ∇1, the periodicity with ∇d, an AR model captures the remaining stationary structure, and the residual e is passed to the detection component, which produces the signal.

Figure 4: SNMP Request UDP after detrending. The constant component has been eliminated, but the changes of constant level remain visible.

5.3 Removing the Periodicity

As mentioned above, the ∇d operator can also be used to remove the periodic component from a time series. The application of ∇d to the model Xt of equation (2), without the trend component, which can be removed as shown above, and where Pt has the period d, results in

∇d Xt = Pt − Pt−d + Rt − Rt−d = Rt − Rt−d .   (4)

After this operation, we are left with a random component (Rt − Rt−d). The application of ∇d requires the knowledge of the period d. We will develop this point further in section 6; for the moment let us assume that it is known. We decide whether ∇d is applied to a series based on the sample autocorrelation function values of S'. It can be shown [3, p. 222] that for a series without any structure, about 95% of the sample autocorrelations should fall within the bounds ±1.96/√n, where n is the number of samples. If the sample autocorrelation value for the lag d is outside these limits, we apply ∇d to the series.
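As an illustration of equations (3) and (4), the following sketch applies ∇1 for the trend and ∇d for the periodicity, assuming the counts are available as a plain Python list; the toy values and the weekly period of 168 hours are only examples, not data from the paper.

# Sketch: lag-d differencing of equations (3) and (4).
def difference(series, d=1):
    """Return the lag-d differenced series x[t] - x[t-d]; the first d values are lost."""
    return [series[t] - series[t - d] for t in range(d, len(series))]

s = [52, 50, 51, 53, 50, 120, 51, 50]   # hypothetical hourly counts S_t with one spike
s_prime = difference(s, d=1)             # S'_t: rate of change, trend removed
# With enough history, a weekly rhythm would be removed with d = 168:
# s_second = difference(s_prime, d=168)  # S''_t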

5.4 Removing the Stationary Structure

To remove the remaining stationary structure Rt from the series, we build a model of this structure. We then use the series from which we have removed the trend and the periodicity, S''t, as input to the model, and remove the model's output Ŝ''t from S''t. This leaves us the abnormal component Et. The structure in Rt is captured in an AR(p) time series model, defined as

Rt = φ1 Rt−1 + . . . + φp Rt−p + Zt ,   (5)

where {Zt} ∼ WN(0, σ²), i.e. white noise. In English this means that the value of Rt at the current instant t is obtained as the sum of two terms, the weighted sum of the p previous observations of Rt, and the noise term.

The model of equation (5) is a parametric model, and thus we need to 1) choose the model degree p, and 2) estimate the parameters φk for k = 1, . . . , p before we can use the model. Generally, the choice of p is done by building different models and choosing the "best" according to some criterion. In our case this is a decent compromise between the interesting versus the non-interesting phenomena signaled by the model on the estimation data. The parameter estimation is done using an algorithm based on least squares. In brief, this means that the chosen parameters minimize the square of the difference between the observations and the prediction made by the model.⁵

If the model of equation (2) described the normal component of the alert series St sufficiently, then after the trend and the periodicity have been removed, the need for further modeling with AR models could be determined by testing the whiteness of the remaining series, for example with the Ljung-Box test [10]. If S'' resembles white noise, there is no more structure to be modeled, and this step is of no use. However, as the real-world alert data contains anomalies, and the model of equation (2) is not perfect, this approach is less feasible. For the moment we build the AR model for every series, which can cause some unnecessary computational overhead. This step differs from the previous ones since we use training data for the parameter estimation. Once the training has been done and the parameters have been estimated, we can use each new observation S''t as input to the model, and obtain Et as the difference between the observation and the model output Ŝ''t.
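The following is a minimal least-squares sketch of fitting the AR(p) model of equation (5) and computing the residual series Et = S''t − Ŝ''t. The paper's tool was implemented in Matlab, so this Python version is only an assumed stand-in for illustration, not the authors' code.

# Sketch: ordinary least-squares estimation of AR(p) coefficients and residuals.
import numpy as np

def fit_ar(x, p):
    """Estimate AR(p) coefficients phi_1..phi_p by least squares."""
    x = np.asarray(x, dtype=float)
    # Row t holds the p previous observations x[t-1], ..., x[t-p]; the target is x[t].
    X = np.column_stack([x[p - k: len(x) - k] for k in range(1, p + 1)])
    y = x[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi

def ar_residuals(x, phi):
    """Residuals e_t = x_t - sum_k phi_k * x_{t-k}, for t >= p."""
    x = np.asarray(x, dtype=float)
    p = len(phi)
    return np.array([x[t] - sum(phi[k] * x[t - 1 - k] for k in range(p))
                     for t in range(p, len(x))])

During detection the coefficients estimated on the estimation data are kept fixed, and only the residuals of new observations are computed, mirroring the training/detection split described above.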

Figure 5: The sample autocorrelation values for Whatsup after trend removal (sample ACF on the vertical axis, lag up to nine days on the horizontal axis).

5.5 Anomaly Detection

After these steps, we have isolated the abnormal component Et of the alert series. If the original series St does not contain any anomalies, and our model of normal behavior Xt is exact, the abnormal component is white noise, Et ∼ WN(0, σ²). However, in reality usually neither is the case. Anomalies in the alert series and model insufficiencies, both at the conceptual level and in the parameter estimation, mean that Et is not white noise. To avoid signaling artifacts caused by model deficits and random variations, we pick out only the most significant changes in Et. An anomaly is signaled if the current value et differs by more than n standard deviations from the average of the past values. The default value for n is three, but it can be adjusted to increase or reduce the number of anomalies signaled. The average of the past values and the standard deviation are estimated using exponentially weighted moving averages. The detection method is described in more detail in [17].

The knowledge concerning the normal activity could be interesting as such, as it can help the operator to better understand the monitored system. However, this kind of analysis would need some level of understanding of the underlying theory from the operator. To keep things simple, we present him only the most significant phenomena of the abnormal part of the alert flow.
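A minimal sketch of this detection step follows. The smoothing factor, the initialization of the running estimates, and the warm-up handling are assumptions made for illustration and not the exact settings of [17].

# Sketch: flag residuals deviating by more than n EWMA-estimated standard deviations.
def detect_anomalies(residuals, n=3.0, alpha=0.1):
    mean, var = 0.0, 1.0              # crude initial estimates (assumption)
    flagged = []
    for t, e in enumerate(residuals):
        if abs(e - mean) > n * var ** 0.5:
            flagged.append(t)          # one meta-alert for this time interval
        # Update the EWMA estimates of the mean and the variance.
        mean = (1 - alpha) * mean + alpha * e
        var = (1 - alpha) * var + alpha * (e - mean) ** 2
    return flagged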

Table 2: Strongest periods found by the algorithm

Flow            Lags (hours)
SNMP            0
Whatsup         168   24    23   15
Dest Unr        144   72    48   24
LOCAL-POLICY    168   167   24   24
Speedera        24    12    10   2

6. RESULTS

In this section we present the results obtained when applying this methodology to the alerts described in section 4. The first part of the alert corpus, 406 observations, was used as estimation data for the AR model parameters, and the latter part, 600 observations, was left for validation. The tool was implemented using Matlab.

⁵ For details, see http://www.mathworks.com/access/helpdesk/help/toolbox/ident/arx.html

6.1 The Periodicity in Alert Flows

To remove the periodic component Pt with a period d using ∇d, we need to know d. Visual inspection of the alert series and of the sample autocorrelation function (ACF) values, as well as intuition, suggested periods close to one day or one week. Figure 5 shows the sample ACF values for Whatsup up to a lag of nine days. The dashed lines show the 95% confidence interval for the sample ACF values of white noise. One can see that there is a rather strong positive correlation for one- and seven-day shifts, as the highest sample ACF values are at lags corresponding to one week and one day. We also used an algorithm that removed the periodic component corresponding to the lag with the largest absolute value of the sample ACF, and then worked down recursively towards shorter lags. Table 2 shows the first four lags used by the algorithm for the periodicity removal. The first, i.e. the strongest, periodic component removed had a period that is a multiple of 24 hours for all but SNMP Request UDP, which does not have any periodic component present, as can be seen in Fig. 2(a). If an alert series showed significant autocorrelation, it had strong weekly and daily components. This observation confirmed the intuition we had concerning the periodicity in the alert flows.

We chose to use ∇d only with d corresponding to a week, if at all, for the following reasons: 1) every application of ∇d suppresses information from the series, 2) it causes the loss of the first d observations, since there is not enough historic data to apply the operator, 3) applying ∇d with d corresponding to a day or a week removed the majority of the strong autocorrelations, and 4) the autocorrelations at the lag of 168 hours (a week) were among the strongest. As explained in section 5.3, the sample ACF value corresponding to a lag of one week is used to determine whether ∇d is applied to a flow.
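A minimal sketch of this check could look as follows, assuming the detrended counts are available as a sequence longer than the candidate lag; it is illustrative, not the Matlab code used in the paper.

# Sketch: sample ACF and the 95% white-noise bound used to decide on nabla_d.
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([1.0 if lag == 0 else np.dot(x[:-lag], x[lag:]) / denom
                     for lag in range(max_lag + 1)])

def needs_seasonal_differencing(x, d=168):
    """Apply nabla_d only if the lag-d autocorrelation exceeds the white-noise bound."""
    bound = 1.96 / np.sqrt(len(x))        # 95% bound for a structureless series
    return abs(sample_acf(x, d)[d]) > bound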

6.2 Defining the model degrees

We used several model degrees p, namely 4, 10, 16, and 26, with the estimation data to find the most suitable one for each flow. In addition to AR models, a range of ARMA(p, q) models was estimated. At least with the used model degrees and estimation methods, they did not give a significant improvement, if any, over the AR models, but their estimation is more resource-consuming. The degree p was chosen such that we could detect as many as possible of the interesting phenomena described in section 4.2, while having as few meta-alerts as possible issued on the non-interesting flow behavior. According to the anomalies signaled on the estimation data, we chose the model degrees as follows: SNMP AR(4), Whatsup AR(26), Dest Unr AR(26), LOCAL-POLICY AR(26), and speedera AR(26).
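Combining the earlier sketches, the degree selection can be pictured as a small search over candidate degrees on the estimation data; the synthetic series below is purely illustrative and does not reproduce the paper's figures.

# Sketch: compare candidate AR degrees by the number of anomalies each signals.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for an S'' estimation series (406 hourly values).
toy = np.sin(np.arange(406) * 2 * np.pi / 24) + rng.normal(0.0, 0.3, 406)

for p in (4, 10, 16, 26):
    phi = fit_ar(toy, p)                    # fit_ar, ar_residuals and
    e = ar_residuals(toy, phi)              # detect_anomalies are defined
    print(p, len(detect_anomalies(e)))      # in the sketches above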

6.3 Detected anomalies

After the model degrees were fixed and the parameters estimated from the estimation data, we applied the complete alert processing method to the validation data. Next we examine in detail the anomalies signaled for each flow by the chosen models. The overall statistics can be seen in Table 3. For each flow the number of signaled anomalies is in column An. The detection of the known and interesting phenomena pi is covered in columns K+ and K-, showing the signaled and missed phenomena, respectively. The pi were identified in section 4.2. Note that not all pi are in the validation data, and K- does not contribute to An. In addition, the tool also signaled new anomalies. These can be 1) new, interesting phenomena unidentified in the manual inspection, 2) uninteresting phenomena that at first glance could seem somewhat significant, but actually are part of the normal behavior, or 3) artifacts created by the transforms we have performed in the processing chain, like the use of the ∇week operator. The case one phenomena are useful to the operator, and their number of occurrences is recorded in column N+. The case two anomalies are rather harmless, and usually quite easily identified as such. However, they do waste the operator's time. The case three is the worst, as the tool signals artificial anomalies where they do not exist. Depending on the alert flow, their correct identification can be trivial or quite difficult. The number of occurrences for the two latter is recorded in column N-. For speedera we depict the N+ and N- in Fig. 6. We also used the EWMA model of [17] on the validation data for comparison.

SNMP: All four known phenomena in the validation data were signaled, and all the additional anomalies are real anomalies, small vibrations on top of the constant alert flow. In that sense they are not harmful at all; we just considered the eleven others less interesting than the largest ones. The new phenomena signaled can be manifestations of SNMP traffic injection, or harmless but intermittent behavior in the otherwise extremely constant flow. For this flow the EWMA approach provided the same meta-alerts, which is not surprising given the constant nature of the alert flow.

Table 3: Signaled and missed phenomena

Flow           An   K+              K-           N+   N-
SNMP           15   p2, p3, p4, p5               11   0
Whatsup        11   p2, p3, p6, p8  p4, p5, p7   2    5
Dest unr       12                                3    9
Local policy   12   p2, p4, p5      p3           4    5
Speedera       5    p1, p2                       1    2
Total          55   14              4            21   21

Whatsup: The known phenomena p2, p3, p6, and p8 were detected, but p4, p5, and p7 were missed by the tool. All the phenomena are shifts in the constant-level activity. The common factors for K- are the smallness of the shift and the fact that the shift is placed after other anomalies. The reason for this will be discussed in more detail in section 6.4. The two new interesting anomalies are actually the delayed detection of the missed p4 and p7, as the tool reacts to the changed height of a weekday peak after the constant level decreases. Of the five N- occurrences, two are extra anomalies or duplicates, signaled right after an interesting known or new phenomenon. The remaining three were issued due to the changed intensity of the weekday peaks compared to the observations of the previous week. We did not consider these interesting enough, and placed them into this category. As such, these phenomena highlighted by the tool are not harmful, or can even be useful for the operator. Compared to the EWMA approach, the signaled anomalies are almost the same. However, the model of normal behavior is more accurate, and even the missed known anomalies can be seen in the residual series Et, even though the detection method did not pick them out.

Destination Unreachable: This flow did not contain any known, interesting phenomena, only a rather weak weekly rhythm. The tool signaled in total twelve anomalies. We placed three into the interesting category, as they pointed out overall intensity changes in both high (weekday) and low (weekend) activity periods with respect to the situation a week before. Of the nine N- anomalies, two were clearly artifacts created by ∇week. They were relatively easy to identify as artificial phenomena. For the remaining seven we could not give an explanation. At first this might seem like a disappointing result. However, it is also interesting to note that the periodicity removal step explains quite many peaks of the original flow, which could have seemed suspicious under manual inspection. Even though we only signal three interesting phenomena, we are able to explain many more as part of the weekly rhythm. The EWMA approach signaled over 50% more anomalies, mostly peaks that are part of the weekly rhythm. It was unable to point out the level shifts we found using the proposed methodology.

LOCAL-POLICY: The known, interesting phenomena p2, p4, and p5 were signaled, and p3 was missed. Given that the missed p3 is a change in high-frequency peak intensity levels from 1 to 2, among peaks reaching up to 10000 alerts, we do not consider this a serious shortcoming. The tool pointed out four new, interesting phenomena. These were additional or missing peaks breaking the weekly rhythm, or changes in the intensity of the periodic peaks. Looking at the situation a posteriori, these seem quite evident, but before a more thorough analysis of the flow structure they were not so easy to pick out. The five N- contain two artifacts generated by ∇week, two anomalies signaled on extremely high (3000 and 8000 alerts) peaks that are however part of the weekly rhythm, and one anomaly that could be caused by variations in the low-level components. These variations are interesting, but as we cannot be sure that the anomaly is really signaled because of them, it was assigned to N-. As the two artificial anomalies were easy to identify, the N- category for this flow contains rather harmless meta-alerts. Even though the alert flow is quite challenging to process, consisting mainly of huge alert impulses, we are able to pick out the known interesting phenomena. In addition, the analysis performed with this methodology helped us to better understand the structure in the alert flow, by signaling new anomalies and by not reacting to certain peaks. The EWMA approach had difficulties particularly with this flow. In practice it signaled every peak, and was unable to leave out the peaks following the weekly rhythm.

Speedera: The flow is depicted with the signaled anomalies in Fig. 6. It contained only two known, interesting phenomena, p1 and p2, both of which were detected. The N+ contained only one alert, n2, pointing out a lower-than-normal intensity Friday. In the N-, two signaled anomalies were related to artifacts created by ∇week. Both known anomalies are echoed in the transformed series (s'' of Fig. 3) one week later, p1 causing n3, and p2 causing n4. Knowing the behavior of ∇week, these artifacts were easy to identify in this flow. The phenomenon n1 in the beginning of the series is an artifact of two adjacent anomalies created by two facts: 1) an AR(p) model starts to provide good predictions only from the (p + 1)th observation fed into equation (5), and 2) the detection component needs to receive more data for correct thresholding. This type of artifact is created both with this methodology and with the EWMA approach for every flow. We can ignore these anomalies systematically, and thus have excluded them from the statistics. We chose to visualize speedera's results as they contain good examples of the unwanted artifacts that can be created by ∇week. The results also show how well we are able to filter out the alerts that are part of the normal flow behavior. The EWMA approach signals an anomaly only in the beginning of every Monday, when the periodic intensity increase takes place. As such it is unable both to cope with the strong periodic component and to detect any non-drastic changes in the flow behavior.

Figure 6: The detected anomalies for speedera. The series is the original alert flow, corresponding to the values s1 of Fig. 3, and the signaled anomalies are indicated as peaks.

6.4 Discussion

Overall, we can model regular and periodic behavior in alert flows, and we detect sharp and impulse-like peaks and valleys (all five flows) outside the day and week rhythms of the alert flow. On the other hand, normal behavior of alert flows consisting only of impulse-like peaks, e.g. LOCAL-POLICY, is difficult to capture with the proposed model. We can also detect, up to a certain degree, shifts in the constant components (SNMP and Whatsup) and in the overall intensity (Dest. Unr.). However, the series detrending makes the detection of these shifts more difficult, as only the transition remains visible afterwards.

Preceding variations in the residual series (the abnormal component) can mask the current anomaly, because the detection threshold is based on the standard deviation. Therefore small level shifts, like the ones in Whatsup, are in danger of being lost when the alert flow is not very regular or constant. However, the shifts are present in the residuals, and could be picked out by other means. In other words, we could possibly catch more anomalies by developing the detection component of the processing chain (Fig. 3) or just by customizing the smoothing factor and the alerting thresholds. However, the current approach works sufficiently well while being rather generic, and generic tools are easier to deploy. For these reasons we regarded the extra step as not worth the trouble.

Now coming back to our main objective: allowing the operator to focus on more relevant tasks by relieving him from the manual inspection of numerous benign alerts. As we are processing alert flows and the meta-alerts point to a time interval, a direct comparison of alert numbers before and after is not the most suitable metric to use. If investigating the alerts manually, the operator is probably not monitoring this type of alerts constantly, but more likely checks from time to time whether the situation seems normal or not. Let us assume for the sake of the discussion that from time to time is once an hour. The operator would need to perform at least a minimal check every hour during which alerts are generated. Since in a normal situation alerts occur all the time, also zero-activity intervals could incite additional checks. Table 4 shows the number of meta-alerts issued by the tool in the second column, and the number of non-zero intervals requiring inspection without automated processing in the validation data for each flow in the third column. The fourth column shows the time gain as the proportion of these one-hour time slots freed from manual inspection when alert processing is automated. The proposed approach can relieve the operator from 90% or more of the status checks, and the gained time can be spent on more relevant tasks. In addition, the analysis done by applying the methodology can point out phenomena that would be difficult to see via manual inspection, even if looking at the alerts at the flow level.

Table 4: The gain in time slots with the methodology

Flow            Automated   Manual   Gain
SNMP            15          564      0.97
Whatsup         11          390      0.97
Dest Unr        12          556      0.98
LOCAL-POLICY    12          118      0.90
Speedera        5           518      0.99
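For the reader who wants to reproduce the last column, the Gain values of Table 4 are consistent with gain = 1 − Automated/Manual, i.e. the fraction of hourly status checks the operator no longer needs to perform; a quick check:

# Sketch: recompute the Gain column of Table 4 from the other two columns.
for flow, auto, manual in [("SNMP", 15, 564), ("Whatsup", 11, 390),
                           ("Dest Unr", 12, 556), ("LOCAL-POLICY", 12, 118),
                           ("Speedera", 5, 518)]:
    print(flow, round(1 - auto / manual, 2))   # 0.97, 0.97, 0.98, 0.90, 0.99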

On the negative side, the methodology signals a few artificial phenomena, but they seem to be usually rather easy to identify. The approach also misses some interesting phenomena, typically small changes close to larger disturbances. These costs need to be weighed against the gain. From the results we can see that the number of false positives is relatively small. However, we cannot provide comprehensive statistics on the quality of detection, as we lack analyzed and labeled real-world alert sets.

The proposed methodology has been tailored for our needs, for the alert types and volumes we encounter. Part of the reported flow behaviors are likely to be specific to our environment. This is not a general solution for all alert processing, but a component for a specific task in a larger alert management system. However, we believe that similar problems exist also in other information systems, and that this processing methodology can be useful also elsewhere. It should also be pointed out that, like all anomaly detection approaches, the methodology does not provide a diagnostic of the meta-alert's cause, and it is not even intended to help in root cause analysis.

7. CONCLUSIONS

In this paper we have presented an analysis of real-world alert data. The alerts consisted mainly of types that can be considered background noise of an operational information system. Among the noise there can be premises or consequences of attacks or more general problems, and therefore monitoring this kind of background noise can be interesting with suitable methods, despite the high alert volumes. The analysis revealed significant structure in the alert flows, both visually and with basic mathematical tools.

Based on the observations, we proposed a methodology to process these alerts in order to highlight the interesting phenomena contained in the alert flows. We assume that the stationary structure in these flows originates from the normal behavior of the monitored system. We model the normal behavior, and then signal deviations from the model. The process consists of two phases, the estimation and the detection. In both we perform the necessary steps to build an alert series, and to remove the trend and the periodicity from the series. Then, during the estimation phase, we estimate a time series model for the remaining structure in the signal. During the detection phase, the model output is compared to the observations to isolate the abnormal component of the alert flow for further analysis.

We presented results obtained with a tool using this methodology. They indicate that we can free a significant amount of the time spent on analyzing these alerts when compared to manual processing. As with all automated processing methods, there is a risk of filtering out interesting alerts. In addition, the proposed processing has the possible side effect of creating artificial anomalies in the alert data. However, these risks seem to be relatively small compared to the gains. As we use real-world data for model estimation, we are also incorporating existing anomalies, in addition to the normal behavior, into the model parameters. The results show that the anomalies in the estimation data did not have a significant adverse effect on the detection capabilities. One should however keep this fact in mind, and avoid fitting the model too well to the data.

We have used a one-hour sampling interval in this work. It can seem long, but given the nature of the processed alerts we consider it reasonable. As discussed previously, this type of alerts is unlikely to be treated in real time in any case. One might need to decrease the sampling interval for more timely detection, especially if trying to react to automated threats such as worms. Worm detection is not per se the goal of this work. In addition, changes in the sampling interval may affect the observed flow behaviors, and make certain characteristics appear or disappear. Currently we are performing this processing at the manager level of the intrusion detection framework. For certain alert flows, especially where no pattern matching is needed and the aggregation can be done using for example packet header information, it could be possible to push this kind of processing towards the sensors.

8. REFERENCES

[1] S. Axelsson. The Base-Rate Fallacy and Its Implications for the Difficulty of Intrusion Detection. In Proc. of the ACM CCS'99, Nov. 1999.
[2] P. Barford, J. Kline, D. Plonka, and A. Ron. A Signal Analysis of Network Traffic Anomalies. In Proc. of the ACM SIGCOMM Internet Measurement Workshop, Nov. 2002.
[3] P. J. Brockwell and R. A. Davis. Time Series: Theory and Methods. Springer Texts in Statistics, 1991.
[4] P. J. Brockwell and R. A. Davis. Introduction to Time Series and Forecasting. Springer Texts in Statistics, 2002.
[5] H. Debar and B. Morin. Evaluation of the Diagnostic Capabilities of Commercial Intrusion Detection Systems. In Proc. of the RAID'02. Springer-Verlag, 2002.
[6] H. Debar and A. Wespi. Aggregation and Correlation of Intrusion-Detection Alerts. In Proc. of the RAID'01. Springer-Verlag, 2001.
[7] K. Julisch. Mining Alarm Clusters to Improve Alarm Handling Efficiency. In Proc. of the ACSAC'01, Dec. 2001.
[8] K. Julisch and M. Dacier. Mining Intrusion Detection Alarms for Actionable Knowledge. In Proc. of the SIGKDD'02, 2002.
[9] C. Kruegel and W. Robertson. Alert Verification: Determining the Success of Intrusion Attempts. In Proc. of the DIMVA'04, Dortmund, Germany, July 2004.
[10] G. M. Ljung and G. E. P. Box. On a Measure of Lack of Fit in Time Series Models. Biometrika, 65(2):297–303, Aug. 1978.
[11] V. A. Mahadik, X. Wu, and D. S. Reeves. Detection of Denial of QoS Attacks Based on χ² Statistic and EWMA Control Chart. URL: http://arqos.csc.ncsu.edu/papers.htm, Feb. 2002.
[12] S. Manganaris, M. Christensen, D. Zerkle, and K. Hermiz. A Data Mining Analysis of RTID Alarms. RAID'99, 1999.
[13] H. Mannila, H. Toivonen, and A. I. Verkamo. Discovering Frequent Episodes in Sequences. In Proc. of the KDD'95, 1995.
[14] P. A. Porras, M. W. Fong, and A. Valdes. A Mission-Impact-Based Approach to INFOSEC Alarm Correlation. In Proc. of the RAID'02. Springer-Verlag, 2002.
[15] X. Qin and W. Lee. Statistical Causality Analysis of INFOSEC Alert Data. In Proc. of the RAID'03. Springer-Verlag, 2003.
[16] A. Valdes and K. Skinner. Probabilistic Alert Correlation. In Proc. of the RAID'01. Springer-Verlag, 2001.
[17] J. Viinikka and H. Debar. Monitoring IDS Background Noise Using EWMA Control Charts and Alert Information. In Proc. of the RAID'04. Springer-Verlag, 2004.
[18] N. Ye, C. Borror, and Y. Chang. EWMA Techniques for Computer Intrusion Detection Through Anomalous Changes In Event Intensity. Quality and Reliability Engineering International, 18:443–451, 2002.
[19] N. Ye, S. Vilbert, and Q. Chen. Computer Intrusion Detection Through EWMA for Autocorrelated and Uncorrelated Data. IEEE Transactions on Reliability, 52(1):75–82, Mar. 2003.
