
Proceedings of Joint Statistical Meetings Minneapolis, MN, 7-11 August, 2005

A Nonparametric Multichart CUSUM Test for Rapid Intrusion Detection

Alexander G. Tartakovsky, Department of Mathematics
Boris L. Rozovskii, Department of Mathematics
Khushboo Shah, Department of Electrical Engineering

University of Southern California, Los Angeles, CA 90089

Abstract: An efficient sequential nonparametric multichart (multichannel) CUSUM-type detection test for detecting changes in multichannel sensor systems is proposed. While there is a wide spectrum of applications where it is necessary to consider multichannel generalizations and general statistical models in change-point detection problems, the study in this paper is motivated by network security. Many kinds of intrusions in computer networks lead to abrupt changes in network traffic. These changes have to be detected as rapidly as possible while maintaining a false alarm rate at a low level. Computer intrusion detection encourages the development of a nonparametric multichannel change-point detection test that does not use exact legitimate (pre-change) and attack (post-change) traffic models. The proposed nonparametric detection procedure can be effectively applied to detect a wide variety of attacks such as external denial of service attacks, worm-based attacks, port scanning, and insider man-in-the-middle attacks. Operating characteristics of the proposed multichannel CUSUM test are evaluated for real denial of service attacks using traces recently collected by CAIDA. The results of a comparison with a conventional single-channel CUSUM algorithm show that the multichannel test has much better performance.

Keywords: Change Detection; Multichart CUSUM Tests; Computer Intrusion Detection; Denial of Service Attacks.

1. Introduction

During the past several years there have been multiple attempts to attack important corporate and governmental networks, server clusters, and other network resources. Examples are external distributed denial-of-service (DoS) attacks against several well-known Internet servers such as Yahoo, Amazon, eBay, E*Trade, etc. [10]. Other examples are Internet-wide worm attacks and stealthy attacks by intruders posing as regular users. Software allowing hackers to initiate many varieties of attacks is becoming more and more available and easy to use. As a result, the defense against these serious threats is rapidly gaining importance.

There are many kinds of DoS attacks. Typical ones are UDP flooding attacks [6] and TCP SYN flooding attacks [7]. In UDP flooding attacks, attackers send many UDP packets to exhaust the capacity of the victim’s network link. In SYN flooding attacks, attackers make connection requests aimed at the victim server with packets that have unreachable source addresses. The server is not able to complete the connection requests and, as a result, the victim wastes all of its network resources. A relatively small flood of bogus packets could tie up memory, CPU, and applications, resulting in shutting down a server. It has been shown that more than 90% of the DoS attacks are SYN flood attacks [23]. In this paper, we demonstrate the efficiency of the proposed detection algorithm for detecting UDP and SYN flooding attacks using real traces collected by CAIDA.

There is a wide variety of intrusion detection methods proposed in the literature. Several analyze different data streams, such as data mining for network traffic, statistical analysis for audit records, sequence analysis for operating system calls, information retrieval and inductive learning. See [8] for a detailed overview. Existing intrusion detection systems (IDSs) can be classified as either misuse detection systems or anomaly detection systems [8, 14]. Misuse detection techniques attempt to model attacks on a system as specific patterns, then systematically scan the system for occurrences of these patterns [26, 28]. By contrast, anomaly detection approaches attempt to detect intrusions by noting significant departures from normal behavior [11], [16], [18], [30]. The approach undertaken in this paper falls under the class of network-based anomaly detection systems.

Typically DoS flooding attacks occur at unknown points in time and lead to abrupt changes in the statistical properties of certain observables. It is therefore intuitively appealing to formulate the problem of detecting attacks as a quickest change-point detection problem: detect changes in statistical models as rapidly as possible (i.e., with minimal average delays) while maintaining the false alarm rate at a given level. See [1, 4, 20, 27, 35] for relevant results of change-point detection theory.

The idea of using the sequential change-point detection approach for detecting DoS attacks was introduced in [2]. This work demonstrated the potential usefulness of change-point detection techniques for detection of various flooding attacks. Since then there has been some work in the network security area on using advanced statistical methods and change-point detection methods in particular. In [41], the authors develop a SYN flooding detection mechanism based on the protocol behavior of TCP SYN-FIN (RST) pairs and an instance of a sequential change-point detection method. In [34], the authors investigate statistical anomaly detection for SYN flooding by comparing two algorithms: an adaptive threshold algorithm and a particular application of the CUSUM algorithm for change-point detection. In [29], we have compared the performance of a single-channel nonparametric CUSUM (NP-CUSUM) procedure with an adaptive threshold algorithm proposed in [40] for detecting a SYN flooding attack. The adaptive threshold algorithm is based on a simple scan statistic: it triggers an alarm at time n if a run of size k of threshold violations is observed. It turns out that the NP-CUSUM algorithm detects the abnormality much faster than the adaptive threshold algorithm. Finally, the nonparametric approach proposed in [2] has been further improved in [15, 29, 38] in several directions: development of adaptive CUSUM tests and multichart CUSUM tests.

In contrast to the sequential change-point problem suitable for “surveillance” applications where the homogeneity hypothesis is tested on-line in the process of data acquisition, the a posteriori change-point problem is considered on the fixed-time interval [1, 4]. In many applications, it is beneficial to combine both methods by grouping the data obtained in the fixed-size intervals and first performing an intra-processing of the data in these fixed-size intervals. Then, the results of this intra-processing are further processed sequentially.
The resulting procedure represents a multistage sequential procedure with batch processing within individual stages. The idea is similar to group sequential tests. The corresponding batch-sequential detection method was proposed in [29].

As it was conjectured and partially proved in [38], the use of multichart tests is extremely important for detecting UDP, ICMP and TCP SYN flooding attacks. The major goal and contribution of the present paper is to further verify this conjecture, i.e., to show that the proposed multichannel (multichart) detection procedure typically performs significantly better than single-channel counterparts. To support this claim we use several real data sets with real UDP and TCP SYN flooding attacks. Therefore, we may conclude that so far the need for multichart detection tests has been underestimated.

In [29], we argue that despite the fact that in many cases DoS attacks are obvious, there are very important scenarios when DoS attacks represent serious threats to the service provided by the network infrastructure and there is the need for early DoS attack detection. One of the scenarios where efficient detection of DoS attacks plays a critical role is when large Internet Service Providers (ISPs) with high-speed networks provide relatively low capacity links to many customers. This includes large corporate and governmental networks that provide services to many small departments and divisions. In such a case, a DoS attack that overwhelms the customer’s link is essentially invisible to the ISP (see, e.g., [12]). The attack is only obvious to the customer, who is defenseless. Therefore, the detection and prevention of the attack has to be performed at the level of the ISP, not at the customer’s site. Our detection method is able to detect low intensity attacks with small detection delays and a low false alarm rate and is, therefore, a promising tool for this purpose.

The paper is organized as follows. In Section 2, we first give a brief overview of change-point detection methods and then construct the multichannel detection algorithm that is being studied. In Section 3, we describe the data sets under study. In Section 4, we evaluate the proposed algorithm and provide a comparison with a conventional single-channel detection procedure in the detection of UDP and SYN flooding attacks. Finally, we conclude in Section 5.

2. The Anomaly Detection Algorithm

2.1. Overview of the Approach

There are a variety of applications where an information system undergoes an abrupt change at an unknown point in time, and the goal is to detect this change as soon as possible. Obviously, any detection mechanism will produce false alarms, which, ideally, would only rarely occur. Since the desire to detect changes quickly contradicts the desire to keep a low false alarm rate, there is a trade-off between two performance measures: the speed of detection (detection delay) and the frequency of false detections. The problem of achieving the smallest average detection delay (ADD) for a given false alarm rate (FAR) is a subject of Change-Point Detection Theory [1, 4, 35]. More specifically, in the sequential setting, the problem is formulated as a quickest detection problem: detect a change in the model as soon as possible after its occurrence. The design of the quickest change-point detection procedures involves optimizing the trade-off between the ADD and the FAR [20, 27], and a good detection procedure should have a low FAR and a small ADD.

Network security is one particularly interesting application area where change-point detection methods can be applied effectively. The potential usefulness of the change-point detection methodology for the design of network-based anomaly detection systems in general and for detecting DoS attacks in particular was recognized by the authors in 2000-01 during a DARPA-funded fault tolerant networks (FTN) project [2, 3]. In these conference publications as well as in more recent papers [15, 29, 38, 39], the authors have proposed an efficient nonparametric CUSUM test (generally multichart or multichannel) that allows for the rapid detection of intrusions with a low FAR.

In the conventional setting of a change-point detection problem, one assumes that there is a single population that produces a sequence of random variables $X(1), X(2), \dots$, which are i.i.d. (independent and identically distributed), until a change occurs at an unknown point in time $\lambda$. After the change occurs, the observations are again i.i.d. but with another distribution. In this paper, we illustrate a generalization of the standard change-point detection problem to a multichannel (or multipopulation) version that is important for network intrusion applications. In the multichannel setting, it is assumed that there are $N$ independent populations that produce the corresponding $N$-component observed stochastic process $X(j) = (X_1(j), X_2(j), \dots, X_N(j))$, $j = 1, 2, \dots$. The component $X_i(j)$, $j = 1, 2, \dots$, corresponds to observations obtained from the $i$th channel of an $N$-channel system, and all of the channels can be observed simultaneously. Write $\mathbf{X}_i^n = (X_i(1), \dots, X_i(n))$ and $\mathbf{X}^n = (\mathbf{X}_1^n, \dots, \mathbf{X}_N^n)$ for the concatenation of the first $n$ observations from the $i$th channel and from all $N$ channels, respectively. We suppose that the data in the channels, i.e., the vectors $\mathbf{X}_1^n, \dots, \mathbf{X}_N^n$, are mutually independent. However, in general, we do not assume that the data in a particular channel, say $X_i(1), \dots, X_i(n)$, are i.i.d. before and after the change. The goal is to construct a multichannel (multichart) detection procedure that detects the change as quickly as possible, subject to constraints on the FAR.
The results of our analysis in Section 4 show that the multichannel IDS substantially outperforms a traditional single-channel algorithm used in network security (see, e.g., [34, 41]).

To be more specific, assume that the random variables $X_i(1), X_i(2), \dots$ observed in the $i$th channel have a joint probability density $p_0^i(X_i(1), \dots, X_i(n))$ for $n < \lambda$ and another probability density $p_1^i(X_i(1), \dots, X_i(n))$ for $n \ge \lambda$, where $\lambda \in \{1, 2, \dots\}$ is an unknown point of change. In other words, conditioned on the point of change $\lambda = k$, the hypothesis $H_i$ that the change occurs in the $i$th channel, and the vector of $n-1$ observations $\mathbf{X}_i^{n-1} = (X_i(1), \dots, X_i(n-1))$ from the $i$th channel, the conditional density of the $n$th observation $X_i(n)$ is

$$
p(X_i(n) \mid \mathbf{X}_i^{n-1}, H_i, \lambda = k) =
\begin{cases}
p_0^i(X_i(n) \mid \mathbf{X}_i^{n-1}) & \text{if } k > n, \\
p_1^i(X_i(n) \mid \mathbf{X}_i^{n-1}) & \text{if } k \le n,
\end{cases}
$$

where $p_0^i$ and $p_1^i$ are pre-change and post-change probability densities (or probabilities), respectively.

A sequential change-point detection procedure is identified with a stopping time $\tau$ at which it is declared that a change has occurred. A good detection procedure should have a low FAR and small values of the expected detection delay, provided that there is no false alarm. To specify the speed of detection and the FAR, let $P_k^j$ and $E_k^j$ denote the probability and the expectation when the change occurs at time $\lambda = k$ in the $j$th channel. For the situation when there is no change (i.e., $\lambda = \infty$), we will use the notation $P_\infty$ and $E_\infty$. For $\lambda = k$, the speed of detection can be measured by the (conditional) average detection delay

$$
\mathrm{ADD}_k^j(\tau) = E_k^j(\tau - k \mid \tau \ge k), \quad k = 1, 2, \dots
$$

The FAR is usually expressed by the mean time to false alarm $E_\infty \tau$. If the monitoring process is immediately renewed when the false detection occurs, the value of $E_\infty \tau$ represents the mean time between false detections. Then the FAR can be measured by the average frequency of false alarms $\mathrm{FAR}(\tau) = 1/E_\infty \tau$.

Ideally, we wish to find a procedure that would minimize $\mathrm{ADD}_k^j(\tau)$ for all $k \ge 1$ and $j = 1, \dots, N$ when the FAR is kept at a given level $\mathrm{FAR}(\tau) = \gamma$, where $\gamma$ is relatively small. Unfortunately, such a uniformly optimal procedure does not exist. An alternative option is to try to find a minimax procedure that minimizes $\sup_{k \ge 1} \mathrm{ADD}_k^j(\tau)$ in the class of procedures for which $\mathrm{FAR}(\tau) \le \gamma$. When $\gamma$ is small, the asymptotic solution to this minimax problem under quite general conditions is given by a multichart extension of Page’s CUSUM test [35]-[39], which utilizes the log-likelihood ratio (LLR) for the hypotheses “$H_k^i : \lambda = k$” that the change occurred at the point $\lambda = k$ in the $i$th channel and “$H_\infty : \lambda = \infty$” that there is no change,

$$
Z_n^k(i) := \log \frac{p(\mathbf{X}^n \mid H_k^i)}{p(\mathbf{X}^n \mid H_\infty)}
= \sum_{j=k}^{n} \log \frac{p_1^i(X_i(j) \mid X_i(1), \dots, X_i(j-1))}{p_0^i(X_i(j) \mid X_i(1), \dots, X_i(j-1))}, \quad n \ge k.
$$

This test is defined as the first time when the maximum LLR statistic exceeds a threshold $h$:

$$
\tau(h) = \min\Big\{ n \ge 1 : \max_{1 \le i \le N} \max_{1 \le k \le n} Z_n^k(i) \ge h \Big\}.
$$

It can be shown that $\mathrm{FAR}(\tau) \le N e^{-h}$ [38, 39]. Therefore, $h = \log(N/\gamma)$ guarantees $\mathrm{FAR}(\tau) \le \gamma$.

The restrictive feature of this optimal LLR-based multichart CUSUM test is that it requires complete prior information regarding the pre-change and post-change distributions. For this reason, below we will introduce a nonparametric detection test that does not use precise information on probabilities $p_0^i$ and $p_1^i$. Note that the nonparametric sign-rank-based detection procedure proposed in

[13] is somewhat different from our detection algorithm. We believe that nonparametric sign-rank likelihood ratio detection methods are extremely efficient from a statistical standpoint, especially for the i.i.d. data models. However, they are not computationally feasible for our on-line applications, especially when applied to ultra-high speed networks with gigabit data rates.

Network intrusions such as DoS attacks lead to changes in statistical properties of certain observables. In network monitoring, observables can be derived from the packet header, e.g., packet size, source IP address, destination IP address, source port, destination port, types of protocols (e.g., ICMP, UDP, TCP), etc. In the case of UDP flooding attacks, the potential observables are packet sizes, source ports, destination ports, and destination prefix. In the case of TCP flooding attacks, conceivably, we could have multiple channels, each one recording counts of different flags (SYN, ACK, PUSH, RST, FIN, URG) from TCP headers. Another plausible observable is the number of half-open connections for the detection of SYN flooding attacks. We could also have channels that keep track of the discrepancy in TCP SYN-FIN pairs or TCP SYN-RST pairs, as done in [41].

Although the detection algorithm described in this paper is general, for the sake of simplicity we concentrate on detection of UDP flooding and TCP SYN flooding attacks at the backbone links. Experiment 1 illustrates the detection of a UDP flooding attack, while Experiment 2 deals with the detection of a TCP SYN flooding attack. (See Sections 4.1 and 4.2.) In the first experiment, we simultaneously observe $X_i(n)$, the total number of UDP packets with sizes in the $i$th bin that arrived in the $n$th sampling interval $T_n$, $i = 1, \dots, N$, $n \ge 1$. In the second experiment, we monitor $X_i(n)$, the total number of TCP SYN packets that arrived with a destination IP address within the $i$th prefix during the $n$th time interval $T_n$, $i = 1, \dots, N$, $n \ge 1$.
Note that in all cases the sensor represents a multichannel system. We used $N = 13$ for the UDP attack and $N = 256$ for the TCP attack. The detailed description of the experiments is given in Sections 4.1 and 4.2.
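To make the construction of the multichannel observations concrete, the following Python sketch shows one way the per-interval counts for the first experiment could be formed from (timestamp, packet size) pairs. It is only an illustration under stated assumptions: the bin edges are the thirteen size ranges listed in Section 4.1, while the function names, the input format, and the fixed sampling period are hypothetical and are not taken from the authors' implementation.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple

# Inclusive upper edges (bytes) of the 13 size bins listed in Section 4.1:
# 0-40, 41-60, 61-64, ..., 1301-1500.
UPPER_EDGES = [40, 60, 64, 70, 75, 80, 100, 300, 500, 700, 1000, 1300, 1500]


def size_channel(size_bytes: int) -> int:
    """Return the 0-based channel index of a packet size (hypothetical helper)."""
    if size_bytes < 0 or size_bytes > UPPER_EDGES[-1]:
        raise ValueError(f"packet size {size_bytes} is outside the binned range")
    # bisect_left finds the first bin whose upper edge is >= size_bytes.
    return bisect_left(UPPER_EDGES, size_bytes)


def count_vectors(
    packets: Iterable[Tuple[float, int]], period: float, n_intervals: int
) -> List[List[int]]:
    """Build X(n) = (X_1(n), ..., X_N(n)): packet counts per bin and interval.

    `packets` yields (timestamp_seconds, size_bytes); timestamps start at 0.
    """
    counts = [[0] * len(UPPER_EDGES) for _ in range(n_intervals)]
    for timestamp, size in packets:
        n = int(timestamp // period)
        if 0 <= n < n_intervals:
            counts[n][size_channel(size)] += 1
    return counts


# Made-up packets, for illustration only.
demo = count_vectors([(0.1, 40), (0.2, 90), (0.9, 90), (1.5, 1500)], 1.0, 2)
```

With 0-based indexing, the channel called $i = 7$ in the text (sizes 81-100 bytes) corresponds to index 6 here.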

2.2. The Detection Algorithm

If the pre-change and the post-change distributions are exactly known, then the likelihood-based multichannel procedures studied in [36, 39] are optimal for the low FAR. In [38], it is shown that this test has certain optimality properties under fairly general conditions that are not limited to a restrictive i.i.d. assumption. In computer security applications, however, these distributions are usually unknown. For this reason, a nonparametric approach that uses minimum a priori information is needed. We now turn to describing the detection algorithm that represents a multichart nonparametric version of the CUSUM method adapted to detect changes in multiple bins. This algorithm will be referred to as the Multichannel Nonparametric CUSUM (MNP-CUSUM) detection algorithm. The special case of this algorithm, the Single-channel Nonparametric CUSUM (SNP-CUSUM) detection algorithm, is also discussed throughout this paper.

Let $H_k^i$ denote the hypothesis that the attack occurs at time $\lambda = k$ in the $i$th bin and let $H_\infty$ denote the hypothesis that there is no attack. Let $\sum_{s=k}^{n} g_{i,s}(X_i(1), \dots, X_i(s))$ be an appropriate score function that measures “a likelihood” that $H_k^i$ is true. The detection statistics $S_n(i)$, $i = 1, \dots, N$, are defined as

$$
S_n(i) = \max_{1 \le k \le n} \Big[ \sum_{s=k}^{n} g_{i,s}(X_i(1), \dots, X_i(s)) \Big]^+,
$$

where $x^+ = \max(0, x)$. The time of alarm is defined as

$$
\tau(h_1, \dots, h_N) = \min_{1 \le i \le N} \tau_i(h_i), \quad \tau_i(h_i) = \min\{ n \ge 1 : S_n(i) \ge h_i \},
$$

where $h_i$, $i = 1, \dots, N$, are positive thresholds.

Recall that $E_\infty$ ($P_\infty$) and $E_k^i$ ($P_k^i$) stand for the expectations (probabilities) when there is no attack and when the attack starts at the time $\lambda = k$ in the $i$th bin, respectively. If the score functions $g_{i,n}$ have negative mean values $E_\infty g_{i,n} < 0$ before the attack (i.e., for $n < \lambda$) and positive mean values $E_k^i g_{i,n} > 0$ after the attack starts (i.e., for $n \ge \lambda$), then the statistic $S_n(i)$ tends to be close to zero in normal conditions, while under the attack it starts drifting upwards until it crosses a threshold $h_i$.

The score functions can be chosen in many ways. One possible solution is based on the observation that in many cases the attack leads to abrupt changes in mean values. Therefore, the score functions should be sensitive to changes in mean values. Let $\mu_i = E_\infty X_i(n)$ and $\theta_i = E_1^i X_i(n)$ denote the pre-change and post-change mean values of the number of packets with sizes in the $i$th channel. The value of $\mu_i$ can be estimated quite accurately in advance and, hence, is supposed to be known. However, it can be re-estimated once in a while. The value of $\theta_i$ is unknown and should be either estimated on-line or pre-set to a reasonable number. We suppose that the attack leads to a change in the mean value of the number of packets $X_i(n)$ for some channel $i = 1, \dots, N$. In the case of detecting changes in mean values, the reasonable score functions are

$$
g_{i,n}(X_i(n)) = X_i(n) - \mu_i - c_i, \quad i = 1, \dots, N,
$$

where $c_i$ is a positive tuning constant. The road map for choosing this constant is discussed in [29, 38]. It is easily

verified that the statistics $S_n(i) = \max_{1 \le k \le n} \big[ \sum_{s=k}^{n} g_{i,s} \big]^+$, $i = 1, \dots, N$, have the following efficient recursive representation

$$
S_n(i) = \{ S_{n-1}(i) + X_i(n) - \mu_i - c_i \}^+, \quad S_0(i) = 0.
$$

As performance metrics of the detection procedure, we compute the ADD and the FAR that are defined as follows:

$$
\mathrm{ADD}_k^i(\tau) = E_k^i(\tau - k \mid \tau \ge k), \quad \mathrm{FAR}(\tau) = 1/E_\infty \tau.
$$

Table 1: Trace Description

Set   Exp No   Date         BB   Dir      Start     Duration   Bytes    Packets   Flows   Utilization (%)
D1N   1        2002-08-14   1    Nbd(0)   10:00am   60m        165 G    294 M     67 M    14.73
D1S   2        2002-08-14   1    Sbd(1)   9:00am    8 hr       2140 G   3.3 G     750 M   25.64

Figure 1: Number of packets in a sample period vs. time (horizontal axis: sample number, ×10^5; vertical axis: number of packets per sample period; the start of the attack is marked). Observe that the attack is not visible to the naked eye.
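As an illustration of the recursion for $S_n(i)$ and the alarm rule $\tau = \min_i \tau_i(h_i)$, the following Python sketch implements a multichannel detector with the mean-shift score $X_i(n) - \mu_i - c_i$. It is a minimal sketch, not the authors' code: the class and function names are hypothetical, the toy stream is made up, and the values of $\mu_i$, $c_i$ and $h_i$ are arbitrary. A brute-force evaluation of the max-of-partial-sums definition is included only to check that the recursion agrees with it.

```python
from typing import Iterable, List, Optional, Sequence, Tuple


def cusum_bruteforce(x: Sequence[float], mu: float, c: float) -> float:
    """Direct evaluation of S_n = max(0, max_k sum_{s=k..n} (x_s - mu - c))."""
    scores = [xs - mu - c for xs in x]
    best = 0.0
    for k in range(len(scores)):
        best = max(best, sum(scores[k:]))
    return best


class MNPCusum:
    """Multichannel nonparametric CUSUM with mean-shift scores (illustrative)."""

    def __init__(self, mu: Sequence[float], c: Sequence[float], h: Sequence[float]):
        if not (len(mu) == len(c) == len(h)):
            raise ValueError("mu, c and h need one entry per channel")
        self.mu, self.c, self.h = list(mu), list(c), list(h)
        self.s = [0.0] * len(self.mu)  # S_0(i) = 0

    def update(self, x: Sequence[float]) -> List[int]:
        """Apply one recursion step; return channels with S_n(i) >= h_i."""
        if len(x) != len(self.s):
            raise ValueError("observation vector has the wrong length")
        alarms = []
        for i, xi in enumerate(x):
            # S_n(i) = max(0, S_{n-1}(i) + X_i(n) - mu_i - c_i)
            self.s[i] = max(0.0, self.s[i] + xi - self.mu[i] - self.c[i])
            if self.s[i] >= self.h[i]:
                alarms.append(i)
        return alarms


def first_alarm(
    detector: MNPCusum, stream: Iterable[Sequence[float]]
) -> Optional[Tuple[int, List[int]]]:
    """Return (n, alarming channels) at the first alarm, or None if none occurs."""
    for n, x in enumerate(stream, start=1):
        alarms = detector.update(x)
        if alarms:
            return n, alarms
    return None


# Toy data (made up): the mean of channel 1 jumps from about 10 to 20 at n = 4.
stream = [(10, 10), (11, 9), (9, 10), (10, 20), (10, 20), (10, 20)]
detector = MNPCusum(mu=[10, 10], c=[2, 2], h=[15, 15])
result = first_alarm(detector, stream)
```

On this toy stream the statistic of channel 1 stays at zero before the change and then grows by 8 per sample, so the threshold 15 is crossed at n = 5, one sample after the change; setting N = 1 gives the single-channel (SNP-CUSUM) special case.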

3. Data Description

We used two data sets with the packet traces captured on SONET OC-48 links by CAIDA monitors. A bidirectional link from San Jose, CA to Seattle, WA that belongs to a US backbone Internet Service Provider (ISP) was monitored. The traces were collected by a Linux-based monitor with Dag 4.11 network cards and packet capture software originally developed at the University of Waikato and currently produced by Endace [9]. Table 1 describes the traces used in our study that led to the observations discussed in Section 2.1. Two data sets collected from the same backbone were analyzed. The data set D1N is northbound, from San Jose, CA to Seattle, WA, while the data set D1S is southbound, from Seattle, WA to San Jose, CA. The packet traces contain 44 bytes of each packet, just enough to hold the TCP and IP headers. The 1-hour data set D1N is used to analyze a UDP flooding attack, while the 8-hour data set D1S is used to analyze a TCP SYN flooding attack.

4. Performance Evaluation

To evaluate and validate the developed algorithm, we conducted the study on backbone data with the packet traces described in the previous section. We present a comparison between the SNP-CUSUM and MNP-CUSUM algorithms in terms of their performance metrics: ADD and FAR. The first experiment illustrates the performance of these algorithms for the detection of a low intensity UDP flooding attack. The second experiment demonstrates the application of our algorithms for the detection of a TCP SYN flooding attack.

4.1. Experiment 1: UDP Flooding Attack

This experiment illustrates the feasibility of the proposed multichart detection algorithm for a low intensity UDP flooding attack (the data set D1N). Figure 1 shows the time series of the total number of UDP packets in a sample period of $T_n$ sec. In this case, $T_n$ is set to 0.015 sec. The figure shows that attack traffic is not distinguishable from normal traffic. Closer, offline examination revealed that this attack consists of sending a Trojan horse called trojan.dasda from one source on port 10100 to one destination on port 44097. Trojan.dasda is a Trojan horse that can download and execute remote files and open a backdoor on an infected computer.

We analyze both the SNP-CUSUM and the MNP-CUSUM algorithms. In the single-channel case, no distinction is made based on the size of the packets. The pre-change mean is $\mu = 87$ packets per sample period (pkts/sp) and the post-change mean is $\theta = 94$ pkts/sp. Thus, the parameter differentiating the normal traffic from an abnormal one changes from 87 to 94 pkts/sp. In the multichannel case, packet sizes are subdivided into 13 separate channels (bins), and $X_i(n)$, $i = 1, \dots, 13$, are observed simultaneously with the sampling period $T_n$. The following ranges of packet sizes define the thirteen channels: {0-40, 41-60, 61-64, 65-70, 71-75, 76-80, 81-100, 101-300, 301-500, 501-700, 701-1000, 1001-1300, 1301-1500} bytes. The flooding attack occurs in the channel $i = 7$ with $\mu_7 = 10.3$ pkts/sp and $\theta_7 = 20$ pkts/sp. Observe that during the attack the sender sends about twice as many packets of a particular size. Splitting into size bins and using multichannel detection helps us to localize the attack and, therefore, to detect it more rapidly.

Figures 2 and 3 illustrate the operating characteristics, the ADD versus −log(FAR), for the SNP-CUSUM and MNP-CUSUM tests for different values of $c$, the tuning

parameter. The algorithm performs the best when we have small ADD and small FAR. Note that the range of −log(FAR) from −4 to 7 is equivalent to the frequency of false alarms from every 0.018 sec to every 1096 sec (∼ 18 min). Note that in the left-most region of the plots in Figures 2 and 3, we get very small ADD. However, this results in very large FAR. Every single packet that arrives is declared to be an attack in this region. On the other hand, the right-most region of the plots is the region where we get the lowest FAR and hence bigger ADD. For example, in Figure 3, the ADD is 0.015 sec when a false alarm occurs every 0.018 sec (the left-most region) and the ADD is 2 sec when a false alarm occurs every 1096 sec (the right-most region). This increase in ADD is bigger for SNP-CUSUM (see Figure 2).

As has been shown in [29, 38], the value of $c$ can be varied in order to optimize the algorithm, that is, to obtain the lowest ADD for the same FAR. In both cases, the optimal value of $c$ is $c_{opt} = 6$. For this $c$, we get the best performance in the sense that for the same −log(FAR), we get the smallest ADD as compared to other values of $c$. Now compare the two curves for $c_{opt}$ in Figures 2 and 3. It can be seen that the ADD is significantly smaller in the multichannel case as compared to the single-channel case for the same FAR. For example, for −log(FAR) = 7, the ADD for the MNP-CUSUM is 2 sec while for the SNP-CUSUM the ADD is 26 sec, i.e., the ADD is thirteen times larger in the single-channel case. As the FAR gets lower, in the single-channel case, the ADD grows very fast and becomes so large that we are likely to miss an attack. On the other hand, the MNP-CUSUM test allows us to detect an attack with a reasonably low ADD.

Figure 2: ADD (sec) vs. −log(FAR) for the SNP-CUSUM algorithm (curves for c = 0, 2, 4, 6, 8).

Figure 3: ADD (sec) vs. −log(FAR) for the MNP-CUSUM algorithm (curves for c = 0, 2, 4, 6, 8).

Figure 4: ADD (sec) vs. threshold (SNP-CUSUM and MNP-CUSUM for c_opt = 6).

Figure 5: log(FAR) (sec) vs. threshold (SNP-CUSUM and MNP-CUSUM).

Figure 4 shows the relation between the threshold values and the ADD for $c_{opt}$ for the SNP-CUSUM (solid line) and the MNP-CUSUM (dotted line) detection tests. It can be seen that the ADD is a linear function of the threshold in both cases. Furthermore, as the threshold increases, the difference between ADDs for multichannel and single-channel systems becomes larger. For example, for h = 4990 packets/sp, the ADD is 26 sec for the multichannel case and 64 sec for the single-channel case, i.e., about 2.5 times larger. Figure 5 shows the relation between thresholds and the log(FAR) for $c_{opt}$ for the SNP-CUSUM (solid line) and the MNP-CUSUM (dotted line) tests. This relationship is also approximately linear for reasonably large thresholds. For the same threshold, we get a smaller FAR for the multichannel case as compared to the single-channel case. Therefore, the multichannel detector shows a substantial improvement in both performance metrics, the ADD and the FAR. The degree to which the MNP-CUSUM detector performs better than the SNP-CUSUM detector increases as the threshold increases (i.e., the FAR decreases).
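The correspondence between −log(FAR) on the horizontal axes and the mean time between false alarms quoted in this section can be checked with a few lines of Python. The sketch below assumes that log denotes the natural logarithm and that FAR is measured in alarms per second; both assumptions are inferred from the quoted values (0.018 sec and 1096 sec), not stated explicitly in the text. The function name is hypothetical.

```python
import math


def mean_time_between_false_alarms(neg_log_far: float) -> float:
    """E_inf[tau] = 1 / FAR = exp(-log FAR), in seconds (natural log assumed)."""
    return math.exp(neg_log_far)


# Axis end points quoted for Figures 2 and 3.
left = mean_time_between_false_alarms(-4)   # about 0.018 sec
right = mean_time_between_false_alarms(7)   # about 1096 sec, roughly 18 min

# ADD ratio at -log(FAR) = 7: 26 sec (SNP-CUSUM) vs. 2 sec (MNP-CUSUM).
add_ratio = 26 / 2

print(f"{left:.3f} sec, {right:.0f} sec ({right / 60:.1f} min), ratio {add_ratio:.0f}")
```

Under the same assumptions, −log(FAR) = 9 corresponds to about 8103 sec, i.e., roughly 2.25 hours between false alarms.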

Figure 6: ADD (sec) vs. −log(FAR) for the SNP-CUSUM algorithm (curves for c = 0, 0.05, 0.1, 0.15).

Figure 7: ADD (sec) vs. −log(FAR) for the MNP-CUSUM algorithm (curves for c = 1.4, 1.6, 1.8, 2.0).

the ADD dramatically increases as the FAR decreases, for the lower FAR the SNP-CUSUM algorithm may not be In this experiment, we use the 8-hour data set D2S able to detect short-lived attacks and, hence, attacks may in order to show that our algorithm works well for the be completely missed. detection of TCP SYN flooding attacks. In the case of SYN attack detection, one can observe the number of 5. Conclusion SYNs that would cause denial of service. In Linux this would amount to 5.68 SYNs/sec, in solaris the rate is 4.26 For detecting attacks that cause changes of traffic in disSYNs/sec, and in windows 2000 server the maximum rate tributed computer networks, we have proposed an easily it can handle is 5 SYNs/sec [24]. implementable sequential algorithm based on a “multiSince we are interested in the SYN arrival rate at a des- channel” nonparametric version of the CUSUM statistic. tination, we divide the channels based on their destination The method belongs to the class of anomaly detections IP addresses. There are many ways to divide the IP ad- systems, and it is developed based on change-point detecdress space. We take the following approach to setup the tion theory. multichannel detection problem. One specific group of IP The results of experimental study for several real DoS addresses that belong to the same 8-bit prefix (the first 8- attacks allow us to make the following conclusions. bits of the IP address are the same) is considered. This 1. The proposed multichannel nonparametric CUSUM group of IP addresses (/8) is further subdivided into 256 algorithm performs very well not only for detecting high channels, each channel containing all the IP addresses that contrast DoS attacks when the optimization is not imporhave the same first 16 bits (/16). In this way, we have tant, but also for small and subtle attacks, like those shown N = 256 channels. In each channel, we monitor the in Figure 1. number of SYN packets sent per second for the entire 8 2. 
The MNP-CUSUM algorithm performs significantly better than the single-channel nonparametric CUSUM, especially in the case of low-contrast attacks. This fact has a simple theoretical explanation [15, 38].

3. Based on the results of testing, we conclude that the developed detection technology enables the reliable detection of DoS attacks in their early stages, well before the hostile traffic reaches its full potential.

4. The important practical aspect of the multichannel approach is that the channels may be interpreted as different populations, i.e., different kinds of attacks. This allows for the development of an easily upgradeable IDS that can be set up to detect various attacks at the same time. For instance, we can set up separate channels for simultaneous detection of UDP flooding, Smurf and SYN flooding attacks. We can also include a channel for detection of port scanners. This approach still needs to be explored on real data, which is left for future work.

4.2. Experiment 2: TCP SYN Flooding Attack

hour duration. In the single-channel case, the time series is formed by monitoring the total number of SYNs per second for all the IP addresses that have the same 8-bit prefix. In the single-channel case, μ = 3 SYNs/sec and θ = 19 SYNs/sec. In the multichannel case, the attack occurs in the channel i = 113 with μ_113 = 0.0063 SYNs/sec and θ_113 = 15.3 SYNs/sec. It is therefore clear that localizing the attack with the multichannel system greatly enhances the detection capability.

Figures 6 and 7 illustrate the relation between the ADD and −log(FAR) for the SNP-CUSUM and MNP-CUSUM detection algorithms, respectively. It can be seen that c_opt = 0.1 for the single-channel case and c_opt = 1.8 for the multichannel case. At the extreme right of the plot, we achieve a false alarm rate of one per 8103 sec, i.e., one false alarm per 2.25 hours on average (−log(FAR) = 9). For this FAR, the ADD for the MNP-CUSUM is 3.5 sec, while the ADD for the SNP-CUSUM is 45 sec, i.e., approximately thirteen times higher.
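The figures quoted for Experiment 2 can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that the logarithm is natural and that FAR is measured in false alarms per second, and the variable names (including the label "contrast" for the ratio θ/μ) are ours, not the paper's.

```python
import math

# Figures quoted in the text for Experiment 2.
neg_log_far = 9.0                      # -log(FAR) at the extreme right of the plots
add_mnp_sec = 3.5                      # ADD of the MNP-CUSUM, in seconds
add_snp_sec = 45.0                     # ADD of the SNP-CUSUM, in seconds
mu_single, theta_single = 3.0, 19.0    # SYNs/sec, single-channel case
mu_113, theta_113 = 0.0063, 15.3       # SYNs/sec, channel i = 113

# Assumption: natural log and FAR in false alarms per second, so the
# mean time between false alarms is 1/FAR = exp(-log(FAR)).
mean_time_to_false_alarm_sec = math.exp(neg_log_far)
mean_time_to_false_alarm_hr = mean_time_to_false_alarm_sec / 3600.0

# How much longer the single-channel test takes to detect the attack.
add_ratio = add_snp_sec / add_mnp_sec

# Illustrative "contrast": ratio theta/mu of the two quoted rates
# (our label, not a quantity defined in the paper).
contrast_single = theta_single / mu_single
contrast_113 = theta_113 / mu_113

print(f"mean time between false alarms: {mean_time_to_false_alarm_sec:.0f} sec "
      f"= {mean_time_to_false_alarm_hr:.2f} h")
print(f"ADD ratio SNP/MNP: {add_ratio:.1f}")
print(f"theta/mu, single channel: {contrast_single:.1f}")
print(f"theta/mu, channel 113:    {contrast_113:.0f}")
```

Under these assumptions the script reproduces the 8103 sec (about 2.25 hours) between false alarms and an ADD ratio of about 12.9, consistent with "approximately thirteen times higher". The ratio θ/μ is roughly 6 for the aggregated stream and over 2000 for channel 113, which is one way to see why localizing the attack helps.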


6. Acknowledgements

The research was supported in part by the Office of Naval Research grant N00014-03-1-0027 at the University of Southern California. We would like to thank Andre Broido, Nevil Brownlee, Ken Keys, Dan Anderson, Colleen Shannon, David Moore, and K.C. Claffy, all of CAIDA, for collecting and providing data for this experiment. We would also like to thank Dr. Stephan Bohacek of the University of Delaware for useful discussions.

References

[1] M. Basseville and I. Nikiforov, Detection of Abrupt Changes: Theory and Applications. Prentice Hall: Englewood Cliffs, 1993.
[2] R. Blažek, H. Kim, B. Rozovskii, and A. Tartakovsky, “A novel approach to detection of denial-of-service attacks via adaptive sequential and batch sequential change-point detection methods,” IEEE Systems, Man and Cybernetics Information Assurance Workshop, West Point, NY, 2001.
[3] R. Blažek, H. Kim, B. Rozovskii, and A. Tartakovsky, “The quickest sequential detection of intrusions in computer networks,” Interface 2003, Salt Lake City, Utah, March 12-15, 2003.
[4] B. Brodsky and B. Darkhovsky, Nonparametric Methods in Change-Point Problems. Kluwer: Dordrecht, 1993.
[5] J.B.D. Cabrera, B. Ravichandran, and R.K. Mehra, “Statistical traffic modeling for network intrusion detection,” Proc. 8th Intern. Sympos. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp. 466–473, 2000.
[6] CERT advisory CA-1996-01: “UDP port denial-of-service attacks,” 1996. Available at http://www.cert.org/advisories/CA-1996-01.html.
[7] CERT advisory CA-1996-21: “TCP SYN flooding and IP spoofing attacks,” 1996. Available at http://www.cert.org/advisories/CA-1996-21.html.
[8] H. Debar, M. Dacier, and A. Wespi, “Toward a taxonomy of intrusion detection systems,” Computer Networks, 31, pp. 805–822, 1999.
[9] Endace, http://www.endace.com/
[10] L. Garber, “Denial-of-service attacks rip the Internet,” Computer, 33, No. 4, pp. 12–17, 2000.
[11] A. Ghosh, J. Wanken, and F. Charron, “Detecting anomalous and unknown intrusions against programs,” Annual Computer Security Applications Conference, pp. 259–267, December 1998.
[12] S. Gibson, “Distributed reflection denial of service: description and analysis of a potent, increasingly prevalent, and worrisome Internet attack,” Gibson Research Corporation, 2002. Available at http://www.grc.com/dos/drdos.htm.
[13] L. Gordon and M. Pollak, “An efficient sequential nonparametric scheme for detecting a change in distribution,” Ann. Statist., 22, pp. 763–804, 1994.
[14] S. Kent, “On the trail of intrusions into information systems,” IEEE Spectrum, 37, Issue 12, pp. 52–56, December 2000.
[15] H. Kim, B. Rozovskii, and A. Tartakovsky, “A nonparametric multichart CUSUM test for rapid detection of DOS attacks in computer networks,” International Journal of Computing and Information Sciences, 2005 (to appear).
[16] C. Ko, M. Ruschitzka, and K. Levitt, “Execution monitoring of security-critical programs in distributed systems: A specification-based approach,” IEEE Symposium on Security and Privacy, pp. 175–187, May 1997.
[17] T.L. Lai, “Information bounds and quick detection of parameter changes in stochastic systems,” IEEE Trans. Inform. Theory, 44, pp. 2917–2929, 1998.
[18] T. Lane and C.E. Brodley, “Temporal sequence learning and data reduction for anomaly detection,” The 5th ACM Conference on Computer and Communications Security, pp. 150–158, 1998.
[19] J. Lemon, “Resisting SYN flooding DOS attacks with a SYN cache,” Proc. USENIX BSDCon 2002, pp. 89–97, 2002.
[20] G. Lorden, “Procedures for reacting to a change in distribution,” Ann. Math. Statist., 42, pp. 1897–1908, 1971.
[21] C. Manikopoulos and S. Papavassiliou, “Network intrusion and fault detection: a statistical anomaly approach,” IEEE Communications Magazine, 40, Issue 10, pp. 76–82, 2002.
[22] J. Mirkovic, S. Dietrich, D. Dittrich, and P. Reiher, Internet Denial of Service: Attack and Defense Mechanisms. Prentice Hall PTR: NJ, 2005.
[23] D. Moore and S. Savage, “Inferring internet denial of service activity,” USENIX Security Symposium, 2001.
[24] Y. Ohsita, S. Ata, and M. Murata, “Detecting distributed denial-of-service attacks by analyzing TCP SYN packets statistically,” Proc. IEEE Global Telecommunications Conference (Globecom 2004), Dallas, TX, 2004.
[25] E.S. Page, “Continuous inspection schemes,” Biometrika, 41, pp. 100–115, 1954.
[26] V. Paxson, “Bro: A system for detecting network intruders in real-time,” IEEE Computer Networks, 31, Issue 23-24, pp. 2435–2463, 1999.
[27] M. Pollak, “Optimal detection of a change in distribution,” Ann. Statist., 13, pp. 206–227, 1985.
[28] M. Roesch, “Snort – lightweight intrusion detection for networks,” USENIX LISA Conference, November 1999.
[29] B.L. Rozovskii, A.G. Tartakovsky, R.B. Blažek, and H. Kim, “A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods,” IEEE Trans. on Signal Proc., 2006 (to appear).
[30] K. Shah, S. Bohacek, and E. Jonckheere, “The predictability of data network traffic,” Proc. American Control Conference (ACC2003), Denver, CO, June 4-6, pp. 1619–1624, 2003.
[31] A.N. Shiryaev, “On optimum methods in quickest detection problems,” Theory Probab. Appl., 8, pp. 22–46, 1963.
[32] A.N. Shiryaev, Optimal Stopping Rules. Springer-Verlag: New York, 1978.
[33] D. Siegmund, Sequential Analysis: Tests and Confidence Intervals. Springer-Verlag: New York, 1985.
[34] V.A. Siris and F. Papagalou, “Application of anomaly detection algorithms for detecting SYN flooding attacks,” Proc. IEEE Global Telecommunications Conference (Globecom 2004), Dallas, TX, 2004.
[35] A.G. Tartakovsky, Sequential Methods in the Theory of Information Systems. Radio & Communications: Moscow, 1991 (In Russian).
[36] A.G. Tartakovsky, “Asymptotically minimax multialternative sequential rule for disorder detection,” In: Statistics and Control of Random Processes: Proceedings of the Steklov Institute of Mathematics (A.A. Novikov and A.N. Shiryaev, Eds.), 202, Issue 4, pp. 229–236, American Mathematical Society: Providence, Rhode Island, 1994.
[37] A.G. Tartakovsky, “Asymptotic properties of CUSUM and Shiryaev’s procedures for detecting a change in a nonhomogeneous Gaussian process,” Mathematical Methods of Statistics, 4, No. 4, pp. 389–404, 1995.
[38] A.G. Tartakovsky, B.L. Rozovskii, R. Blažek, and H. Kim, “Detection of intrusions in information systems by sequential change-point methods,” Statistical Methodology, 2006 (to appear).
[39] A.G. Tartakovsky and V. Veeravalli, “Change-point detection in multichannel and distributed systems with applications,” In: Applications of Sequential Methodologies (N. Mukhopadhyay, S. Datta and S. Chattopadhyay, Eds.), Marcel Dekker, Inc., pp. 339–370, New York, 2004.
[40] A. Vasilios and F. Papagalou, “Application of anomaly detection algorithms for detecting SYN flooding attacks,” Proc. IEEE Global Telecommunications Conference (Globecom 2004), Dallas, TX, 2004.
[41] H. Wang, D. Zhang, and K.G. Shin, “Change-point monitoring for detection of DoS attacks,” IEEE Trans. Dependable and Secure Computing, 1, Issue 4, pp. 193–208, 2004.
