Detection of UDP-Flooding Attack with Hidden Markov Models

Junghun Park (a), Lei Huang (b) and C.-C. Jay Kuo (a)

(a) Integrated Media Systems Center and Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2564
(b) Department of Electrical Engineering and Computer Science, Loyola Marymount University, Los Angeles, CA 90045

ABSTRACT
A scheme that uses the hidden Markov model (HMM) is proposed in this work to detect unauthorized nuisance packets in IP networks, which waste network resources and may result in a denial of service (DoS) attack. The proposed HMM is designed to differentiate the attack traffic from the normal traffic systematically. The design of the basic HMM is first introduced, and the operations of the detector are then described in detail. Finally, we show that the HMM-based detector is not sensitive to the attack type and is able to detect the attack at an early stage.
1. INTRODUCTION

The DoS attack using IP spoofed packets can occur in IP-based wired and wireless networks, such as wireless LAN (WLAN) and 3G wireless networks. The target of the attack in a wired network is usually a server, since the effect of the attack can then be maximized. The TCP SYN flooding attack is an example. In contrast, mobile nodes in wireless networks are clients rather than servers. If the target of the attack is a node in a wireless network, the effect of the attack will be limited to the range of the access point or base station that covers the target. The resources of a wireless network, such as the bandwidth in a WLAN and the power in a 3G wireless network, are more limited, so it is possible to deny services in the targeted area with fewer attack packets. Thus, the attack can have a severe impact on a group of users sharing a common resource in wireless networks.

The denial of service (DoS) attack using IP spoofed packets has drawn a lot of attention in IP networks recently since it is one of the most difficult attacks to prevent. The traceback algorithm [1, 2] has been proposed to handle the IP-spoofed attack in wired networks. However, the performance of the traceback algorithm is degraded in wireless networks, since mobile nodes may move around and the routes tend to vary frequently with time, which implies that the number of packets available for robust traceback along a given route is reduced. Furthermore, due to the limited radio link bandwidth and the limited power available in wireless networks, the DoS attack tends to have a greater impact on wireless networks than on wired ones.

Early detection of the existence of such an attack is critical. Early detection usually demands a high computational overhead due to frequent computation and decision making, since it is not known in advance when the attack occurs.
On the other hand, a simplified detection process with a lower complexity may miss some of the attacks and/or cause false alarms, as a tradeoff between complexity and performance. Generally speaking, it is desirable to design a detector that offers the best tradeoff among the following three criteria: reliability (i.e. a low missed detection rate and a low false alarm rate), efficiency (i.e. a low computational complexity) and early detection (i.e. a short response time).

In this research, we focus on anomaly detection by identifying traffic that deviates from the constant bit rate (CBR) traffic for a persistent period of time. Adaptive sequential and batch-sequential change-point detection methods were proposed in [3] and [4], respectively, to detect such an attack. The sequential change-point detection algorithm estimates the number of incoming packets during a fixed interval and compares it with a predefined threshold, which is obtained from a target false alarm rate. If the number is higher than the threshold, an alarm message is generated. However, the estimation based on incoming packets is not sensitive to a smaller number of attack packets. To effectively detect the attack at an early stage, we propose a new detector using hidden Markov models (HMMs).

The rest of this paper is organized as follows. The scenario of the attack is introduced and two related issues are discussed in Section 2 to clarify the application of our current research. The design of the detector based on the hidden Markov model and its detection mechanism are described in detail in Section 3. The performance of the proposed detector is analyzed in Section 4. Finally, concluding remarks and future research directions are given in Section 5.

For further information about the work, please send correspondence to Junghun Park via e-mail: [email protected]

Digital Wireless Communications VI, edited by Raghuveer M. Rao, Sohail A. Dianat, Michael D. Zoltowski, Proceedings of SPIE Vol. 5440 (SPIE, Bellingham, WA, 2004) 0277-786X/04/$15 · doi: 10.1117/12.542358
2. BACKGROUND OF THE RESEARCH

Before describing the proposed HMM-based detector, we discuss the background of our research, which includes attack scenarios and two related issues, i.e. flow classification and detector deployment.
2.1. Attack Scenarios

A scenario of the DoS attack in a 3G wireless network is given below. By monitoring IP packets transmitted from a web server to a mobile node, an attacker can find the destination address of these packets. The attacker can then send a large number of nuisance packets carrying the destination IP address of the mobile node. These data packets flow into the base station that serves the mobile node. Afterwards, the non-authenticated packets are broadcast over the air within the coverage area of the base station. The radio power used by the base station to transmit non-authenticated packets results in a shortage of the power required to send out authenticated packets. Even though the downlink channels in CDMA systems are orthogonal due to the use of different spreading sequences, the radio power required for sending non-authenticated packets will lower the signal-to-noise ratio of other mobile nodes covered by the same base station. As a result, non-authenticated packets will degrade the quality of service of authenticated users.

There are several types of IP-based DoS attack, namely those using TCP SYN, ICMP echo and UDP flood packets. The inbound packet traffic to a user in a wireless network usually comes from a server. Only the UDP flood attack is possible against such a user, since TCP SYN packets are destined for a server and ICMP echo packets need not be used within a single-hop wireless network. By focusing on the protection of a base station (or an access point) of a wireless network against the UDP flood attack, the detector can be deployed at a location where the incoming packets for each user can be monitored.
2.2. Packet classification

To identify UDP flows, the transport header, in particular the port number, needs to be accessed. This identification is performed at leaf routers, which are usually the trusted entities for the clients in the same intranet [4]. It could be more difficult to determine the ports when IPSec is applied. To address this issue, the multi-layer IPSec protocol proposed in [5] allows trusted routers to access the transport layer. Thus, the network-level security of IPSec should not be an obstacle to attack detection at the leaf router.
2.3. Deployment of the detector

Taking into account the load that the detection task places on the network, it is desirable to deploy the detector in a distributed manner, on the router that is one hop from the destination. For instance, the Packet Data Serving Node (PDSN) in 3G systems is one hop from mobile users. The PDSN receives IP datagrams from the Packet Data Network (PDN), which is connected to the Internet, and establishes, maintains and terminates link-layer sessions to the mobile node. Acting as a leaf router, the PDSN operates in layer 3 and fragments packets so that the Radio Access Network (RAN) can send frames in layer 2 [6]. To convert packets into frames, the PDSN has to manage each mobile user connected to the RAN. Therefore, incoming packets can be monitored in the PDSN for each mobile user, and the proposed detector should be located in the PDSN as well. In wired networks, the detector can be deployed at the router of a subnet.
3. DETECTOR DESIGN USING HMM

An HMM is characterized by a triple, λ = (A, B, π), where A is the state-transition probability matrix, B is the observation symbol probability distribution, and π is the vector of the initial state probabilities [7]. What makes the HMM practically useful in a real environment are the Baum-Welch (BW) re-estimation algorithm and the Viterbi search algorithm. The former is used to optimize the HMM parameters given some observation sequences. The latter is used to analyze an observation sequence with a trained HMM, i.e. to find its most likely state sequence. We propose an HMM for differentiating the attack behavior from the normal traffic.
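As a minimal sketch of the triple λ = (A, B, π), the container below stores the three components and checks that every distribution is non-negative and sums to one. The class name `HMM`, the method `validate` and all numeric values are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HMM:
    """Discrete HMM, lambda = (A, B, pi). Names here are hypothetical."""
    A: List[List[float]]   # A[i][j] = P(state j at t+1 | state i at t)
    B: List[List[float]]   # B[i][k] = P(symbol k observed | state i)
    pi: List[float]        # pi[i]   = P(state i at the first time step)

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless every row is a probability distribution."""
        n = len(self.pi)
        if len(self.A) != n or len(self.B) != n:
            raise ValueError("A, B and pi must agree on the number of states")
        if any(len(row) != n for row in self.A):
            raise ValueError("A must be a square matrix")
        for row in [self.pi] + self.A + self.B:
            if any(p < 0.0 for p in row) or abs(sum(row) - 1.0) > tol:
                raise ValueError("each distribution must be non-negative and sum to 1")


# Illustrative numbers only: 3 states and 9 observation symbols.
example = HMM(
    A=[[0.5, 0.3, 0.2], [0.25, 0.5, 0.25], [0.2, 0.3, 0.5]],
    B=[[1.0 / 9.0] * 9 for _ in range(3)],
    pi=[0.25, 0.5, 0.25],
)
example.validate()  # passes silently for a well-formed model
```

Keeping the three components together makes it straightforward to hold one model for normal traffic and one for attack traffic side by side.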
3.1. Proposed hidden Markov model (HMM)

The first task in detecting the attack is to build an HMM that represents the behavior of legitimate users. Then, we examine the deviation from this HMM when an attack occurs.

Observations

It is assumed that the attack does not occur immediately when a legitimate user starts to receive data services. After an initial period of connection, attackers may find the destination address and begin to send nuisance data packets. The detector monitors the number of packets arriving per unit time to define two thresholds. We classify the time intervals into three groups according to the number of arriving packets: the lower, the middle and the higher one-third. The first threshold is obtained by averaging the number of arriving packets over the lower one-third, while the second threshold is obtained by averaging the number of arriving packets over the higher one-third. The detector continues to update the two thresholds until an attack occurs, so that the jitter of arriving packets is reflected in the model.

Once the two thresholds are decided, the traffic in every time interval can be labeled "low", "normal" or "heavy" by comparing the number of arriving packets with the two thresholds. For example, if the number of arriving packets is less than the lower threshold, the detector regards the observed traffic in the interval as "low". If the number is larger than the higher threshold, the observed traffic is labeled "heavy". Otherwise, the observed traffic is labeled "normal". It is possible to obtain more traffic levels by increasing the number of thresholds. The number of thresholds determines the size of the observation symbol set (M). Here, we consider three traffic levels with two thresholds for simplicity. Furthermore, we consider the observed traffic patterns in two consecutive intervals, i.e. the current interval and the previous interval, and associate them with the 9 observation symbols (M = 9) given in Table 1, where L, N and H mean the low, normal and heavy traffic volumes, respectively.

Table 1. The symbols for nine observation cases.
Observed symbol (v_i)    Traffic in previous interval    Traffic in current interval
v1                       L                               L
v2                       L                               N
v3                       L                               H
v4                       N                               L
v5                       N                               N
v6                       N                               H
v7                       H                               L
v8                       H                               N
v9                       H                               H
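The observation step can be sketched in Python as follows. This is a minimal illustration under two assumptions that the text does not spell out: the "one-thirds" are formed by sorting the per-interval packet counts, and symbol v_i is indexed as i = 3 * previous + current + 1 with L = 0, N = 1, H = 2, which reproduces the ordering of Table 1. The function names and the packet counts are hypothetical.

```python
from typing import List, Tuple


def thresholds(counts: List[int]) -> Tuple[float, float]:
    """Average of the lowest third and of the highest third of the counts."""
    ordered = sorted(counts)
    k = max(1, len(ordered) // 3)          # size of one third (at least 1)
    low = sum(ordered[:k]) / k
    high = sum(ordered[-k:]) / k
    return low, high


def level(count: int, low: float, high: float) -> int:
    """Traffic level of one interval: 0 = low, 1 = normal, 2 = heavy."""
    if count < low:
        return 0
    if count > high:
        return 2
    return 1


def symbols(counts: List[int], low: float, high: float) -> List[int]:
    """Map each (previous, current) interval pair to a symbol index 1..9."""
    lv = [level(c, low, high) for c in counts]
    return [3 * prev + cur + 1 for prev, cur in zip(lv, lv[1:])]


# Hypothetical packet counts per interval observed before any attack.
training = [98, 100, 102, 99, 101, 100, 97, 103, 100]
low, high = thresholds(training)
print(low, high)                      # 98.0 102.0

# Hypothetical counts to be labelled: N, L, N, H, H.
observed = [100, 96, 100, 110, 110]
print(symbols(observed, low, high))   # [4, 2, 6, 9], i.e. v4, v2, v6, v9
```

A sequence of n intervals yields n - 1 symbols, because each symbol describes a pair of consecutive intervals.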
States

The states in the proposed HMM represent the change of the traffic condition between the previous interval and the current interval. We consider three states: the traffic decreases (State 0), remains the same (State 1) and increases (State 2). When the system is in State 2, there are six possible transitions between two consecutive time intervals. If the traffic is low in the previous interval, the traffic in the current interval can be low, normal or heavy. If the traffic is normal in the previous interval, the traffic in the current interval can be normal or heavy. If the traffic is heavy in the previous interval, the traffic in the current interval can only be heavy. Thus, only the events v_i with i = 1, 2, 3, 5, 6, 9 are allowed. In other words, given State 2, the probabilities of v_i with i = 4, 7, 8 are all equal to zero. The non-zero observation probabilities for each state are shown in Fig. 1. The specific values of the observation probabilities can be obtained using a set of training sequences.
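[Figure 1: three panels, one per state, each indexed by the traffic level at time t-1 and at time t and marking the observation symbols with non-zero probability. State 0: v1, v4, v5, v7, v8, v9. State 1: v1, v5, v9. State 2: v1, v2, v3, v5, v6, v9. Legend: L: Low, N: Normal, H: Heavy; O_t: an observation at time t, O_t ∈ V = {v1, v2, ..., v9}.]

Figure 1. The non-zero observation probabilities in the proposed HMM under States 0, 1 and 2.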
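The symbol sets of Fig. 1 can be derived from the state definitions, as the following Python sketch shows. It assumes the same indexing as Table 1 (i = 3 * previous + current + 1 with L = 0, N = 1, H = 2) and, following Fig. 1, lets the "no change" pairs v1, v5 and v9 occur in every state. The helper names, and the choice of a uniform starting distribution over the allowed symbols, are illustrative assumptions rather than part of the paper.

```python
from typing import List


def allowed_symbols(state: int) -> List[int]:
    """Symbol indices (1..9) with non-zero observation probability.

    state 0: traffic decreases, state 1: stays the same, state 2: increases.
    """
    permitted = {
        0: lambda prev, cur: cur <= prev,
        1: lambda prev, cur: cur == prev,
        2: lambda prev, cur: cur >= prev,
    }[state]
    return [3 * prev + cur + 1
            for prev in range(3) for cur in range(3) if permitted(prev, cur)]


def initial_B() -> List[List[float]]:
    """A starting observation matrix: uniform over the allowed symbols."""
    rows = []
    for state in range(3):
        allowed = allowed_symbols(state)
        rows.append([1.0 / len(allowed) if k + 1 in allowed else 0.0
                     for k in range(9)])
    return rows


for state in range(3):
    print(state, allowed_symbols(state))
# 0 [1, 4, 5, 7, 8, 9]
# 1 [1, 5, 9]
# 2 [1, 2, 3, 5, 6, 9]
```

Starting the training from a matrix with these zeros is one way to enforce the structure, since Baum-Welch re-estimation leaves an observation probability at zero once it has been set to zero.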
3.2. Training Process

The training sequences were generated with the network simulator ns-2 according to the conditions specified in Table 2. The background traffic consists of 20% UDP traffic and 80% TCP traffic. The target user received the CBR data service as shown in Fig. 2. We emulated an attack node that generates nuisance packets randomly and sends them to the target user. The attack rate, which is defined as the ratio of the number of attack packets to the number of normal packets, increases from 4% to 15%. The capacity of the link is assumed to be large enough that congestion does not occur in this simple case. However, there exists delay jitter between packets due to the shared queue in the Internet. The generated packets are used for the training of the HMM.

To extract the parameters of the HMM, which include the state transition probabilities, the initial state probabilities and the observation probabilities for each state, we applied the Baum-Welch (BW) re-estimation algorithm to the training samples. The algorithm is robust in the sense that it always converges, although in general only to a local optimum of the likelihood. First, we assign the parameter values arbitrarily. By using the BW algorithm, we then obtain the optimized parameters for a given set of training sequences. In this work, we obtained the parameters of two HMMs, one with training samples generated under normal traffic and one with training samples containing attack packets.

Table 2. Parameters used in the simulation environment.
Traffic                     Scenario
TCP background (Mbps)       8
UDP background (Mbps)       1.05
Victim receiver (Mbps)      1.0
Attack rate (%)             Linear increase (0.27 to 4.3)
Attack rate (%)             Abrupt increase (8.4 to 20.5)
The convergence speed is highly influenced by the observation length: the longer the observation length, the slower the convergence. We ran simulations repeatedly with varying observation lengths to find the optimum detection length, at which the detector gives the best performance.

[Figure 2: network diagram. Nodes: attack node, UDP background source, TCP background source, CBR sender, intermediate nodes 0 and 1, UDP sink, TCP sink and CBR receiver (victim). Links: duplex links with 20 Mbps and drop-tail queues.]

Figure 2. The network topology used in computer simulation.
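To make the re-estimation step concrete, the following is a minimal pure-Python sketch of Baum-Welch for a single observation sequence, using the scaled forward-backward recursions of [7]. It is not the authors' implementation: the function names, the starting parameters and the sample traffic levels are assumptions, and symbols are 0-based here (index k stands for v_{k+1}). The sketch also assumes the sequence has non-zero probability under the starting model.

```python
import math
from typing import List

Matrix = List[List[float]]


def forward_backward(A: Matrix, B: Matrix, pi: List[float], obs: List[int]):
    """Scaled forward/backward variables; scale[t] = P(O_t | O_1..O_{t-1})."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    scale = [0.0] * T
    for t in range(T):
        for j in range(n):
            prior = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prior * B[j][obs[t]]
        scale[t] = sum(alpha[t])
        alpha[t] = [x / scale[t] for x in alpha[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    return alpha, beta, scale


def baum_welch_step(A: Matrix, B: Matrix, pi: List[float], obs: List[int]):
    """One re-estimation pass; also returns log P(O | old parameters)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scale = forward_backward(A, B, pi, obs)
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_A = [row[:] for row in A]
    new_B = [row[:] for row in B]
    for i in range(n):
        visits = sum(gamma[t][i] for t in range(T - 1))
        if visits > 0.0:   # keep the old row if the state is never visited
            for j in range(n):
                xi_sum = sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             / scale[t + 1] for t in range(T - 1))
                new_A[i][j] = xi_sum / visits
        total = visits + gamma[T - 1][i]
        if total > 0.0:
            for k in range(m):
                new_B[i][k] = sum(gamma[t][i] for t in range(T) if obs[t] == k) / total
    log_likelihood = sum(math.log(s) for s in scale)
    return new_A, new_B, gamma[0][:], log_likelihood


def train(A: Matrix, B: Matrix, pi: List[float], obs: List[int], iterations: int = 10):
    """Repeat the re-estimation and record the log-likelihood before each pass."""
    history = []
    for _ in range(iterations):
        A, B, pi, log_likelihood = baum_welch_step(A, B, pi, obs)
        history.append(log_likelihood)
    return A, B, pi, history


# Hypothetical traffic levels per interval (0 = L, 1 = N, 2 = H).
levels = [1, 1, 1, 2, 2, 1, 1, 1, 0, 1, 1, 1, 2, 2, 1, 1, 1]
obs = [3 * prev + cur for prev, cur in zip(levels, levels[1:])]

# Arbitrary starting point; zeros in B0 encode the symbols excluded in Fig. 1.
allowed = [{0, 3, 4, 6, 7, 8}, {0, 4, 8}, {0, 1, 2, 4, 5, 8}]
B0 = [[1.0 / len(s) if k in s else 0.0 for k in range(9)] for s in allowed]
A0 = [[0.5, 0.3, 0.2], [0.25, 0.5, 0.25], [0.2, 0.3, 0.5]]
pi0 = [0.3, 0.4, 0.3]

A_fit, B_fit, pi_fit, history = train(A0, B0, pi0, obs)
print(history[0], history[-1])   # the log-likelihood never decreases
```

With several training sequences, the numerators and denominators would be accumulated over all sequences before dividing; the single-sequence form is used here only to keep the sketch short.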
3.3. Detection mechanism

Let $\lambda_{normal}$ and $\lambda_{attack}$ be the HMMs trained using normal data samples and attack data samples, respectively. Intuitively speaking, if an attack occurs, the optimal state sequence for the test sequence should contain a larger number of occurrences of State 2, which means that the traffic increases locally. Furthermore, the likelihood of the test sequence under $\lambda_{attack}$ will be higher than that under $\lambda_{normal}$. To determine the optimal state sequence, $S = \{S_1, S_2, \ldots, S_T\}$, with respect to an observation sequence, $O = \{O_1, O_2, \ldots, O_T\}$, we use the Viterbi algorithm [7, 8]. The Viterbi algorithm starts by defining the highest probability along a single path that accounts for the first $t$ observations and ends in state $i$ at time $t$. This quantity, denoted by $\delta_t(i)$, can be written mathematically as

$$\delta_t(i) = \max_{S_1, S_2, \ldots, S_{t-1}} P(S_1, S_2, \ldots, S_{t-1}, S_t = i, O_1, O_2, \ldots, O_t \mid \lambda), \tag{1}$$

where $O_t \in V = \{v_1, v_2, \ldots, v_M\}$. By induction, the quantity at the next time step is

$$\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(O_{t+1}), \tag{2}$$
where $a_{ij}$ is the transition probability from state $i$ to state $j$ and $b_j(O_{t+1})$ is the probability that $O_{t+1}$ is observed in state $j$ at time $t+1$, i.e. $P[O_{t+1} = v_k \mid S_{t+1} = j]$. As the length $T$ of the observation sequence increases, numerical underflow tends to occur. Thus, it is common to take the logarithm of the probabilities in the model. Here, we first preprocess the model parameters obtained in Section 3.2, i.e. the initial state probabilities, the state transition probabilities and the symbol probability distribution in each state, and then carry out the recursion. Since the negative logarithm is used, each maximization over probabilities becomes a minimization. The procedure can be summarized as follows.

Step 0. Preprocessing

$$\tilde{\pi}_i = -\log(\pi_i), \quad \tilde{b}_i(O_t) = -\log[b_i(O_t)], \quad \tilde{a}_{ij} = -\log(a_{ij}), \qquad 1 \le i, j \le N, \quad 1 \le t \le T.$$

Step 1. Initialization

$$\tilde{\delta}_1(i) = -\log(\delta_1(i)) = \tilde{\pi}_i + \tilde{b}_i(O_1), \qquad 1 \le i \le N.$$

Step 2. Recursion

$$\tilde{\delta}_t(j) = -\log(\delta_t(j)) = \min_{1 \le i \le N}\left[\tilde{\delta}_{t-1}(i) + \tilde{a}_{ij}\right] + \tilde{b}_j(O_t), \qquad 1 \le j \le N, \quad 2 \le t \le T.$$

Step 3. Termination

$$\tilde{P} = \min_{1 \le i \le N} \tilde{\delta}_T(i).$$
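The underflow problem that motivates the logarithmic form can be illustrated with a few lines of Python. The per-step probability of 0.01 and the length of 200 are arbitrary illustrative values, not figures from the paper.

```python
import math

p = 0.01   # an illustrative per-step probability (assumed value)
T = 200    # an illustrative sequence length (assumed value)

product = 1.0        # direct product of probabilities
neg_log_sum = 0.0    # the same quantity accumulated as a sum of -log terms
for _ in range(T):
    product *= p
    neg_log_sum += -math.log(p)

print(product)       # 0.0: the product underflows in double precision
print(neg_log_sum)   # about 921.03, comfortably representable
```

The sum carries the same information as the product, so comparisons between models remain possible for long sequences.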
Finally, given an observation sequence, we apply the above procedure to each HMM and obtain the negative logarithm of the joint probability of the observation sequence and its optimal state sequence. The two results can be represented as

$$\tilde{P}_{normal} = -\log\left[\max_{S_1 S_2 \cdots S_T} P(S_1 S_2 \cdots S_T, O_1 O_2 \cdots O_T \mid \lambda_{normal})\right]$$

and

$$\tilde{P}_{attack} = -\log\left[\max_{S_1 S_2 \cdots S_T} P(S_1 S_2 \cdots S_T, O_1 O_2 \cdots O_T \mid \lambda_{attack})\right].$$

The decision can be made according to the following rule:

$$\text{Decision} = \begin{cases} \text{Attack}, & \tilde{P}_{attack} < \tilde{P}_{normal}, \\ \text{Normal}, & \tilde{P}_{normal} \le \tilde{P}_{attack}. \end{cases} \tag{3}$$
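Steps 0 to 3 and the decision rule (3) can be sketched in Python as follows. The two toy models are hypothetical and deliberately small (2 states, 3 symbols) so that the result is easy to check by hand; they are not the trained models of the paper, and the function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]
INF = float("inf")


def neg_log(p: float) -> float:
    """-log(p), with zero probability mapped to an infinite cost."""
    return -math.log(p) if p > 0.0 else INF


def viterbi_cost(A: Matrix, B: Matrix, pi: List[float], obs: List[int]) -> float:
    """Return -log of the joint probability of obs and its best state path."""
    n = len(pi)
    a = [[neg_log(x) for x in row] for row in A]               # Step 0
    b = [[neg_log(x) for x in row] for row in B]
    delta = [neg_log(pi[i]) + b[i][obs[0]] for i in range(n)]  # Step 1
    for o in obs[1:]:                                          # Step 2
        delta = [min(delta[i] + a[i][j] for i in range(n)) + b[j][o]
                 for j in range(n)]
    return min(delta)                                          # Step 3


def decide(cost_normal: float, cost_attack: float) -> str:
    """Rule (3): the model with the smaller cost (larger probability) wins."""
    return "Attack" if cost_attack < cost_normal else "Normal"


# Hypothetical toy models; symbols 0, 1, 2 stand for low, normal, heavy.
A_n, pi_n = [[0.8, 0.2], [0.3, 0.7]], [0.6, 0.4]
B_n = [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]]
A_a, pi_a = [[0.6, 0.4], [0.2, 0.8]], [0.5, 0.5]
B_a = [[0.1, 0.6, 0.3], [0.05, 0.25, 0.7]]

obs_attack = [1, 2, 2, 2, 1, 2, 2, 2]   # mostly heavy intervals
obs_normal = [1, 1, 1, 1, 1, 1, 1, 1]   # only normal intervals

for obs in (obs_attack, obs_normal):
    print(decide(viterbi_cost(A_n, B_n, pi_n, obs),
                 viterbi_cost(A_a, B_a, pi_a, obs)))
# Attack
# Normal
```

Only the running vector of costs is kept, since the decision rule needs the final cost and not the state path itself; recovering the path would require storing the minimizing index at each step.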
In Figs. 3 and 4, we show the averages of $\tilde{P}_{normal}$ and $\tilde{P}_{attack}$ as a function of the observation length. The normal HMM is trained with data samples of the normal traffic, and the attack HMM is trained with data samples with an attack rate of 10%. Now, consider an attack with an attack rate equal to 5%. We see that the probability of the optimal state sequence from the normal HMM is reduced when the attack occurs. Please note that, since the y-axis in the figures is the negative logarithm, a higher value implies a smaller probability. Similarly, when the attack occurs, the probability of the optimal state sequence is higher when the attack HMM is adopted. Thus, the probabilities of the optimal state sequence can be used in the detection mechanism. An attack is claimed to happen when the probability of the optimal state sequence from the attack HMM is higher than that from the normal HMM.

[Figure 3: line plot. x-axis: the number of observation sequences (= T), 0 to 16. y-axis: average of -log(max P(S,O/HMM)), 0 to 100. Curves: average of -log(max P(S,O/normal)) and average of -log(max P(S,O/attack)).]

Figure 3. The plot of $\tilde{P}_{normal}$ and $\tilde{P}_{attack}$ versus the observation length when the test sequence follows the normal traffic pattern.
4. SIMULATION RESULTS AND ANALYSIS

We conducted a simple computer simulation under the conditions given in Table 2 and with the same topology, except for the data type and rate of the attacker. There are three types of attack: the random packet attack, the burst data attack and the pulse train attack. They are shown in Fig. 5. The two HMMs trained in the previous section are used.
[Figure 4: line plot. x-axis: the number of observation sequences (= T), 0 to 16. y-axis: average of -log(max P(S,O/HMM)), 0 to 100. Curves: average of -log(max P(S,O/attack)) and average of -log(max P(S,O/normal)).]

Figure 4. The plot of $\tilde{P}_{normal}$ and $\tilde{P}_{attack}$ versus the observation length when the test sequence contains an attack.
[Figure 5: three panels showing attack packets over time t: (a) random attack packets, (b) burst attack packets, (c) pulse train attack packets.]

Figure 5. Various attack types used in the simulation.
Fig. 6 shows the detector performance at various attack rates for the three attack types. The y-axis of the figure is the misclassification rate (MCR), which is the sum of the false alarm rate and the missed detection rate, and the x-axis specifies the attack rate. The detector has almost the same performance against all three types of attack. All results were obtained with the detector deciding on the occurrence of an attack over 8 observation intervals. Thus, the HMM-based detector is not sensitive to the attack type.
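The MCR can be computed from a list of ground-truth labels and detector decisions, as in the following Python sketch. The text does not state how the two rates are normalized; this sketch assumes both are divided by the total number of decisions, so that their sum equals the fraction of misclassified test sequences. The function name and the sample labels are hypothetical.

```python
from typing import List, Tuple


def misclassification_rate(truth: List[str], decided: List[str]) -> Tuple[float, float, float]:
    """Return (false alarm rate, missed detection rate, MCR).

    Assumption: both rates are normalized by the total number of decisions.
    """
    if not truth or len(truth) != len(decided):
        raise ValueError("truth and decided must be non-empty and equally long")
    total = len(truth)
    false_alarms = sum(1 for t, d in zip(truth, decided)
                       if t == "Normal" and d == "Attack")
    misses = sum(1 for t, d in zip(truth, decided)
                 if t == "Attack" and d == "Normal")
    far = false_alarms / total
    mdr = misses / total
    return far, mdr, far + mdr


# Hypothetical outcome of ten test sequences.
truth = ["Normal"] * 6 + ["Attack"] * 4
decided = ["Normal", "Normal", "Attack", "Normal", "Normal", "Normal",
           "Attack", "Attack", "Normal", "Attack"]
far, mdr, mcr = misclassification_rate(truth, decided)
print(far, mdr, mcr)   # 0.1 0.1 0.2
```

If the rates were instead normalized per class (false alarms over normal sequences, misses over attack sequences), only the two divisions would change.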
[Figure 6: line plot. x-axis: attack rate (%), 8 to 24. y-axis: MCR, 0.02 to 0.18. Curves: random attack, burst attack, pulse train attack.]

Figure 6. The detection performance as a function of the attack rate for three attack types.

5. CONCLUSION AND FUTURE WORK

A detector based on hidden Markov models (HMMs) was proposed in this work to detect the IP spoofed DoS attack. We used the Baum-Welch re-estimation algorithm to train the HMMs with normal and attack data samples. It was shown by experimental results that, given unknown observation sequences, the proposed HMM can differentiate the attack from the normal traffic systematically. We considered three attack types with a range of attack rates and evaluated the detector performance in terms of the misclassification rate (MCR). It was observed that the detector is not sensitive to the attack type and performs better when the attack rate is higher. In the future, we will continue to investigate the performance of the HMM detector for different traffic types (e.g. variable bit rate traffic) and attack types. Furthermore, we will compare the performance of the HMM detector with that of other detectors, such as the one based on the sequential change-point detection algorithm.
REFERENCES

1. S. Savage, D. Wetherall, A. Karlin, and T. Anderson, "Practical network support for IP traceback," in Proc. ACM SIGCOMM, pp. 295–305, (San Diego, CA), August 2001.
2. K. Park and H. Lee, "On the effectiveness of probabilistic packet marking for IP traceback under denial of service attack," in Proc. IEEE INFOCOM, pp. 338–347, (Anchorage, Alaska), April 2001.
3. R. Blazek, H. Kim, B. Rozovskii, and A. Tartakovsky, "A novel approach to detection of 'denial of service' attacks via adaptive sequential and batch-sequential change-point detection methods," in Proc. IEEE Workshop on Information Assurance and Security, pp. 220–226, (West Point, NY), June 2001.
4. D. H. Wang and K. Shin, "Detecting SYN flooding attacks," in Proc. IEEE INFOCOM, pp. 1530–1539, (New York, USA), June 2002.
5. Y. Zhang and B. Singh, "A multi-layer IPsec protocol," in Proc. of 9th USENIX Security Symposium, August 2000.
6. 3GPP2, "3G wireless network management system high level." 3GPP2 S.R0017, Dec 1999.
7. L. Rabiner and B. Juang, "An introduction to hidden Markov models," IEEE ASSP Mag. 3, pp. 4–16, January 1986.
8. D. Bertsekas, Dynamic Programming and Optimal Control, vol. 1, Athena Scientific, second ed., 1996.