
2017 8th International Conference on Information Technology (ICIT)

SYN Flood Attack Detection in Cloud Environment Based on TCP/IP Header Statistical Features

Muna Sulieman AL-Hawawreh
Faculty of Information Technology, Mutah University, Karak, Jordan

Abstract— Virtualization is a key foundation of cloud computing that allows a single physical host, its resources, and its applications to be shared among multiple users. Virtual cloud infrastructure is vulnerable to distributed denial of service attacks, in particular the SYN flood attack, which exhausts server resources and makes the server unavailable to legitimate users. In clouds, the impact of this type of attack can extend to a large number of resources, which can double the losses. This paper discusses the SYN flood attack in a virtual cloud and detects it based on new features extracted from the TCP/IP header. Different machine learning methods, namely neural network, naïve Bayes, decision tree, and k-means, are used to observe the performance. The results show that the machine learning algorithms perform efficiently in classifying and detecting the SYN flood attack with the extracted features.

Keywords— Cloud Computing; Virtual Machine; SYN flood; TCP header; IP header; Detection.

I. INTRODUCTION

Cloud computing has undergone phenomenal growth in recent years and is expected to keep growing for the foreseeable future [1], because of the many benefits it offers to all sorts of businesses around the globe. It provides a pool of virtualized, managed, and measured computing, platform, storage, and services to consumers on demand via the Internet. Cloud computing is defined as a TCP/IP based high development and integration of computer technologies, for instance, fast microprocessors, huge memory, high-speed networks, and reliable system design [2].

Distributed Denial of Service (DDoS) attacks are one of the biggest challenges and hindrances to cloud computing because they affect the availability of services. In a DDoS attack, the attackers use a group of machines, often infected with a Trojan, to flood the resources or bandwidth of the victim server in an attempt to consume its resources, block communication, and deny service to legitimate users. The most prevalent DDoS attacks are TCP SYN flood attacks: as reported by Kaspersky Lab [3], in Q3 2016 more than 81% of DDoS attacks were TCP SYN floods. The attackers exploit the weakness of the TCP protocol

978-1-5090-6332-1/17/$31.00 ©2017 IEEE

(the three-way handshake) to send a sequence of SYN requests to the victim server in order to exhaust and block its resources. In clouds, DDoS attacks of all types are more harmful than in traditional networks because Infrastructure as a Service (IaaS) clouds run client services inside virtual machines. When one virtual machine is overloaded by a flooding attack, all the virtual machines in the same shared environment are affected [4]. These assaults can easily degrade the cloud's performance and cause financial losses and harmful effects to other servers in the same cloud infrastructure.

To address these types of attacks, in particular the TCP SYN flood attack, an Intrusion Detection System (IDS) is a good choice. It monitors various events in the network to determine whether an intrusion has occurred, and it should raise an alarm to the system or network administrator as soon as any attack is detected. An IDS can be signature based or anomaly based. A signature detector analyzes the network traffic to detect an event or a set of events that match a specific pattern, such as a sequence of bytes in network traffic or known malicious instruction sequences used by malware. An anomaly detector monitors the network traffic and, based on heuristics or rules, classifies the traffic as either normal or abnormal.

Most previous approaches to detecting the SYN flood attack are based on the SYN arrival rate, the difference between the SYN and SYN-ACK packets, or the difference between the SYN and FIN packets. However, these approaches have many drawbacks, such as high detection delay and high computational cost, and a detector that relies on the number of SYN packets has difficulty detecting attacks in large networks with dynamic resources, such as cloud computing.
Hence, because of the encrypted traffic and the huge amount of traffic in a cloud environment, detecting the SYN flood attack based on the TCP/IP header is preferable: it has low computational cost and high detection speed [5]. This work focuses on detecting the SYN flood attack in a virtual cloud environment with an anomaly detector that uses statistical features describing the characteristics of TCP/IP headers. Many features are extracted using the Tcptrace tool; after that, the most relevant features are selected with different filter algorithms; finally, the resulting data set is evaluated using different machine learning algorithms.

result, the service was stopped for nearly 24 hours. These occurrences of assaults cause substantial downtime, financial losses, and other detrimental effects on the business processes of victims in both the short and long term.

The rest of the paper is organized as follows: Section II presents the virtualization technology, Section III explains the TCP SYN flood attack, Section IV discusses related work, Section V presents the proposed work, and Section VI concludes the work.

II. VIRTUALIZATION TECHNIQUE

In data centers, a large share of the hardware resources is wasted; virtualization is therefore preferable for making the best use of data center hardware. Virtualization is a technique that allows multiple Operating Systems (OS) to run at the same time on the same physical host machine. Virtualization forms the base of the cloud computing paradigm, and it is one of the most effective factors that contributed to the evolution of clouds. This technique supports the cloud infrastructure and provides many benefits, including using fewer resources, bringing down costs, making server management easier, and consolidating servers [6].

The virtualization technology encapsulates an operating system and the desired application into a portable Virtual Machine (VM), as shown in Fig. 1. The virtual machine is a logical entity that executes and behaves like a real physical machine. Each VM may host various operating systems, and each OS installed in a VM is called a guest OS. The layers must be managed and the VMs created and controlled; this need is met by a Virtual Machine Monitor (VMM), or hypervisor. The hypervisor is a primary component that enables partitioning of the computing system (memory and CPU) and allows multiple operating systems to run concurrently.

Fig.2. Attack Scenario in Virtual Cloud environment

III. TCP SYN FLOOD ATTACKS

The SYN flood attack depends on the most popular protocol on the Internet, the Transmission Control Protocol (TCP). TCP is a connection-oriented, reliable, in-sequence transport protocol on which almost all traffic relies, such as the old insecure virtual terminal service (Telnet, port 23), mail (SMTP, port 25), File Transfer Protocol (FTP, port 21), Domain Name System (DNS, port 53), and Hypertext Transfer Protocol (HTTP, port 80), which carries the World Wide Web (WWW) service.

TCP uses the three-way handshake to build up the connection between client and server. In a normal situation, as shown in Fig.3, the client sends a SYN message to the server to start connecting, and the server responds with a SYN-ACK message. When the client receives the SYN-ACK message, it sends an ACK message to open the connection and start exchanging data.

The attackers exploit the weakness in this three-way handshake mechanism to launch their attacks and prevent the server from providing service to legitimate users. As depicted in Fig.4, when a client sends the SYN message, the server responds by sending the SYN-ACK message to the client. After that, the server keeps waiting to receive an ACK message from the client. However, it never receives the ACK, so the connection stays half-open and is stored in the backlog queue. This backlog has a finite limit that depends on the type of operating system. When the number of half-open connections exceeds that limit, the server rejects all incoming requests until some connections time out and are dropped. As a result, the service is blocked.
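The backlog behaviour described above can be illustrated with a small simulation. This is only a toy sketch, not a model of any real TCP stack: the class name `SynBacklog`, the capacity of 3 entries, and the 30-second timeout are assumptions chosen for illustration.

```python
from collections import deque


class SynBacklog:
    """Toy model of a server's queue of half-open TCP connections (hypothetical)."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity   # maximum number of half-open entries (assumed value)
        self.timeout = timeout     # seconds before a half-open entry is dropped (assumed)
        self.pending = deque()     # (client, arrival_time), oldest entry first

    def _expire(self, now):
        # Drop half-open entries whose timeout has elapsed.
        while self.pending and now - self.pending[0][1] >= self.timeout:
            self.pending.popleft()

    def on_syn(self, client, now):
        """Return True if the SYN is queued, False if the request is rejected."""
        self._expire(now)
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append((client, now))
        return True

    def on_ack(self, client):
        """Final ACK received: the handshake completes and the slot is freed."""
        for entry in list(self.pending):
            if entry[0] == client:
                self.pending.remove(entry)
                return True
        return False


backlog = SynBacklog(capacity=3, timeout=30.0)

# A legitimate client completes the handshake, so its slot is released.
normal_syn = backlog.on_syn("client", now=0.0)
normal_ack = backlog.on_ack("client")

# Spoofed sources never send the final ACK, so their entries pile up.
flood = [backlog.on_syn(f"spoofed-{i}", now=1.0) for i in range(3)]
legit_during_attack = backlog.on_syn("client", now=2.0)    # queue is full
legit_after_timeout = backlog.on_syn("client", now=31.0)   # old entries expired

print(flood, legit_during_attack, legit_after_timeout)
```

In this sketch the legitimate request is rejected while the queue is full of spoofed entries and is accepted again only after those entries time out, which mirrors the blocking effect described in the text.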

Fig.1. Virtual Machine Architecture

Although the virtualization technique provides quick elasticity and optimal management of resources, it also introduces certain risks into the cloud system, such as malicious activities launched by VMs, for example the VM-(D)DoS attack [7]. This sort of attack can happen when a cloud client owns at least one VM and uses it to exploit the vulnerable VMs on the same host machine in order to launch a DDoS flood assault against another victim VM on the same host machine or in the same cloud environment, as shown in Fig 2. Such an attack occurred in Amazon EC2 in 2009 [8], where an insider DoS attack was executed against another VM in the same environment. As a

Fig.3. Three Handshakes in Normal Situation

SYN per second in the large network is normal while this increase may be an attack in the entire network.

Fig.4. Three Handshakes in Attack Situation

A SYN flood attack can be launched either directly or by using a spoofed or random IP address. In the direct attack, the attacker sends many SYN requests to the victim server; however, the attacker's operating system must not respond to the SYN-ACK, because any ACK, RST, or ICMP message allows the listener to move the Transmission Control Block (TCB) out of the SYN-RECEIVED state [9]. When a random or spoofed IP address is used, the attacker sends the SYN message with this address to the victim server, so the victim responds with a SYN-ACK that never reaches the sender. Thus, the server keeps waiting for the ACK message, and this floods the victim server with half-open connections.

IV. RELATED WORK

In [10], the authors presented an adaptive mitigation method to defend against the SYN flood attack; they used advanced rules in Windows Firewall to control the TCP traffic. The attack is detected and blocked by the firewall when the difference between the counter of IP addresses that pass through the network more than once and the counter of IP addresses that have completed their three-way handshake exceeds a predefined threshold. The experimental results showed that the proposed method is able to detect and mitigate a SYN flood attack. However, detecting and preventing the SYN flood attack with a firewall is not effective, because the firewall itself can be overwhelmed quickly, even under the load of a trivial attack.

An anomaly detection method based on the TCP/IP header was proposed in [11]. That work detected the SYN flood attack by filtering the packets according to three main rules: the type of protocol, the TCP flags, and whether the IP address is spoofed. After that, every abnormal packet was analyzed using characteristics of the IP header, such as identification, time to live, and IP total length, and of the TCP header, such as source port and header length. The proposed mechanism worked well in detecting SYN flood attacks, but detection based on these features alone is not sufficient, especially since attackers today can send normal-looking packets like a legitimate user.

Detection of the SYN flood attack based on the number of TCP SYN packets was applied in [12, 13]. A number of SYN packets in a specific interval that exceeds the predefined threshold is considered an attack. The results showed that this method is able to detect the SYN attack with high accuracy. However, this process is difficult to implement in a large network; for example, an increase of SYN counts by 35

The authors in [14] used the difference between incoming SYN packets and outgoing SYN-ACK packets to detect a SYN flood attack. Their work relied on linear prediction analysis to estimate the SYN flood attack over time. The proposed mechanism achieved good results with a short detection delay.

A model to detect the SYN flood attack was proposed in [15]. The authors presented a framework that integrates a threshold based system and misuse detection systems; their implementation included an analysis of the CPU load before, during, and after a TCP SYN flood attack. The results showed that the CPU load increased during the attack and was reduced efficiently after the detection process.

A mechanism for detecting the SYN flood attack was presented in [16]. It depends on the variation of the arrival time of packets: the arriving packets are divided into five groups according to their flow flags, such as a group for traffic that completes the three-way handshake and a group for traffic that is terminated by the Reset flag. The results showed that this method can detect a high rate attack quickly, but it cannot detect a low rate attack in which the traffic follows a normal distribution like legitimate traffic.

Most of the previous work relied on a counter of SYN packets in a specific interval to detect the SYN flood attack, and these methods showed good performance. However, relying on them in a cloud environment with dynamic resources (i.e., virtual machines) and a massive amount of traffic is not satisfactory. This research therefore differs from the previous detection methods, which were applied in traditional networks and cannot be used in the cloud computing paradigm: here we use an intrusion detection system based on machine learning algorithms to detect the SYN flood attack. The detection process depends on new statistical features extracted from the TCP/IP header of each flow. The traffic from the virtual cloud environment was gathered to analyze and extract the features; after that, the optimal features were chosen using an intersection process between three popular filter methods. Finally, an anomaly detector based on different machine learning methods was used to evaluate the worth and efficiency of the extracted features.

V. PROPOSED WORK

The proposed work, as shown in Fig.5, consists of four main components: a testbed experiment, feature extraction, feature selection, and evaluation based on machine learning. First, the traffic exchanged between virtual machines on the same physical host is captured; second, the features are extracted from the collected traffic; third, the most important features are selected using different filter methods; and finally, the resulting data set is evaluated using an anomaly intrusion detection system based on different machine learning algorithms. The details of these components are discussed in the following subsections:

hping3 -S --flood -V -p 80 10.0.2.4

To confirm that the system was under a TCP SYN flood attack, the following netstat command was used in the terminal [19]:

netstat -tuna | grep ':80'

This command filters all the connections on port 80 (i.e., the HTTP port) and reveals a large number of half-open TCP connections, which are marked as SYN_RECV, as displayed in Fig.7.
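The manual check with netstat can also be automated. The following is a minimal sketch, assuming the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name `count_tcp_states` and the sample text are hypothetical, and every address other than the web server 10.0.2.4 is made up for illustration.

```python
from collections import Counter


def count_tcp_states(netstat_output, port):
    """Count TCP connection states for one local port in `netstat -tuna` text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        if local_address.rsplit(":", 1)[-1] == str(port):
            states[state] += 1
    return states


# Made-up sample output; only 10.0.2.4 (the web server) comes from the text.
SAMPLE = """\
tcp        0      0 10.0.2.4:80       10.0.2.5:1025     SYN_RECV
tcp        0      0 10.0.2.4:80       10.0.2.5:1026     SYN_RECV
tcp        0      0 10.0.2.4:80       10.0.2.6:2040     SYN_RECV
tcp        0      0 10.0.2.4:80       10.0.2.7:51514    ESTABLISHED
tcp        0      0 10.0.2.4:22       10.0.2.7:40022    ESTABLISHED
udp        0      0 0.0.0.0:68        0.0.0.0:*
"""

states = count_tcp_states(SAMPLE, port=80)
half_open = states["SYN_RECV"]
print(f"half-open: {half_open}, established: {states['ESTABLISHED']}")
```

A large and growing SYN_RECV count on the HTTP port, relative to established connections, is the symptom the text looks for in Fig.7.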

Fig.5. The Proposed work Architecture

A. Testbed Experiment

In this work, we used the testbed environment from our previous work [17]. The virtual testbed environment was built using the Oracle VirtualBox 5.1.0-108711-Win hypervisor (type 2). This hypervisor application was installed on a physical host running Windows 10. It allows creating virtual machines and sharing resources such as CPU, Network Interface Card (NIC), and memory. As shown in Fig.6, four virtual machines were created and configured with Linux Debian (32 bit) as the operating system. Here, VM1 acted as the web server (i.e., Apache 2) providing web services to the other virtual machines on the same physical host. In addition, VM1 and VM2 were connected to machines outside the physical host via a public IP; meanwhile, they were connected to VM3 and VM4 in the internal virtual network via private IPs. It is worth mentioning that our attention here is only on the traffic exchanged between virtual machines.

Using the traffic monitoring from the previous experiments and the system performance screen, the CPU utilization and the network history of the system were analyzed. Fig.8 shows the CPU utilization before a TCP SYN flood attack; the utilization is below approximately 50% on both CPU1 and CPU2. Fig.9 displays the CPU utilization during the TCP SYN flood attack: CPU1 shows a large increase in utilization, going up to 100% in the interval from 20 to 40 seconds, whereas CPU2 shows a large decrease. This is because CPU1 is the main CPU used in this experiment. In the attack situation, the computer needs more processing to handle the TCP SYN attack traffic, so CPU utilization is the resource most affected by the TCP SYN attack.

Fig.7. Half-Opened Connection of TCP-SYN Flood Attack

Fig.6. The Testbed Environment

The experiment consists of two scenarios: a normal scenario and an attack scenario. In the normal scenario, iMacros scripts were used as bots in VM2, VM3, and VM4 to browse and fill in the forms of the website served by the web server in VM1. At the same time, the traffic was captured for one hour with the tcpdump tool. In the attack scenario, VM2 and VM3 were used as zombie machines to launch a TCP SYN flood attack against VM1 using the hping3 tool [18], while VM4 created normal background traffic. As in the normal scenario, tcpdump captured the traffic. The following command was used to create a direct SYN flood attack:

Fig.8. CPU Utilization before TCP-SYN Attack

Fig. 9. CPU Utilization during TCP-SYN Attack

The network history shows a big difference before and during a SYN attack. Before the attack, as shown in Fig. 10, the data sending rate is less than 5.0 KB/s, while the data receiving rate is less than 2.5 KB/s. The situation during the attack, depicted in Fig. 11, is totally different from Fig. 10: the graph shows a drastic, simultaneous increase in received and sent data, which rises to more than approximately 1.0 MB/s.

Fig. 10. Network History before TCP-SYN Attack

Fig. 11. Network History during TCP-SYN Attack

B. Data Preprocessing and Feature Extraction

After the traffic exchanged between the virtual machines in the normal and attack scenarios was captured in pcap files, the Tcptrace tool was used to extract the statistical features of the TCP and IP headers. Tcptrace was written by Shawn Ostermann at Ohio University for analyzing TCP dump files. It can produce various types of output for each pcap file, such as the elapsed time and the number of packets seen, or for each connection, such as round trip time, throughput, segments and bytes sent and received, retransmissions, and window advertisements [20]. The features were extracted from the normal and attack pcap files by running the following command in a Linux Ubuntu terminal:

tcptrace --csv -l inputfile.pcap > outputfile.csv

Further, this command extracted only the features related to the TCP/IP protocol and removed all the unwanted packets, such as packets with the application layer and packets that come from other protocols like UDP and ICMP. In this work, only the extracted bidirectional features that are most relevant to the characteristics of header flags and bytes were kept; the others, such as source and destination IP addresses and ports, first packet arrival time, and last packet arrival time, were removed. Additionally, to create a final dataset for training and testing, each connection was labeled as normal or TCP SYN attack, and 10000 records were randomly chosen for each class (i.e., Normal and TCP SYN), as listed in Table I.

TABLE I. DISTRIBUTION OF RECORDS IN FINAL DATASET

Types of Records | Number of Records
Normal | 10000
TCP SYN Flood attack | 10000
Sum | 20000

C. Feature Selection

This module is one of the critical modules of the intrusion detection process, as it chooses the most relevant features, which can improve the efficiency of intrusion detection systems, minimize false alarms, and reduce complexity cost. Here, to select the most effective features, the top one-fourth (i.e., 25 features) of the ranked features produced by each of three popular filter methods was used:

• Relief F (RF): it estimates the quality of an attribute based on its ability to distinguish among the different classes. It works with the two nearest neighbors of an instance: the one from the same class, called the nearest hit, and the one from a different class, called the nearest miss. The method randomly chooses one instance (R) and finds its nearest hit (H) and nearest miss (M). It then decreases the quality estimate of an attribute if R and H have different values for that attribute, and increases the quality estimate if R and M have different values for it. The process is repeated several times [21]. The strength of this method is its ability to deal with noisy and incomplete data.

• Information Gain (IG): the information gain of a given feature A with respect to the class B is the reduction in the entropy of B after observing the values of A. The entropy of B is calculated by:

$$H(B) = -\sum_{i} P(B_i)\,\log_2 P(B_i) \qquad (1)$$

Here, P(B_i) is the prior probability of each value of B. The entropy of B after splitting on and observing the values of A is defined as:

$$H(B \mid A) = -\sum_{j} P(A_j) \sum_{i} P(B_i \mid A_j)\,\log_2 P(B_i \mid A_j) \qquad (2)$$

where P(B_i | A_j) is the posterior probability of B given the values of A. Thus the information gain is calculated by:

$$IG(B \mid A) = H(B) - H(B \mid A) \qquad (3)$$

• Gain Ratio (GR): it considers the probability of each feature value when selecting the set of features; it normalizes the information gain by the feature entropy (intrinsic value) to overcome the drawback of information gain, namely the bias of the selection process toward multi-valued features [22]. The gain ratio of the feature A and the class B can be calculated as in (4):

$$GR(A, B) = \frac{IG(B \mid A)}{\text{Intrinsic value}(A)} \qquad (4)$$

while the intrinsic value can be calculated by:

$$\text{Intrinsic value}(A) = -\sum_{i} \frac{|S_i|}{|S|} \times \log_2 \frac{|S_i|}{|S|} \qquad (5)$$

Here, |S| is the total number of records, while |S_i| is the number of records in which feature A takes its i-th value.

The feature selection methods were executed with the Weka tool based on 10-fold cross validation. Due to the large number of features, only the first 25 features of each method were selected. Table II lists, with descriptions, the features selected by the three methods. After that, the output feature sets were entered into a simple intersection process as in (6) to select the optimal feature set:

$$F_{selected} = \{F_{RF}\} \cap \{F_{IG}\} \cap \{F_{GR}\} \qquad (6)$$

where each set contains the 25 top-ranked features of one filter method. The features that meet the intersection criterion (i.e., 12 features) were selected and used as the final dataset for evaluation with the machine learning methods. Table III shows the selected features (numbered as in Table II) for each filter method and the selected features after the intersection process.
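The entropy-based measures described in this subsection can be checked on a toy example. The sketch below is not the Weka implementation used in the experiments; it is a minimal standard-library version, and the three toy features (`req_sack`, `record_id`, `parity`) are invented to show why the gain ratio penalizes multi-valued features.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def conditional_entropy(feature, labels):
    """H(B|A): entropy of the labels after splitting on the feature values."""
    total = len(labels)
    groups = {}
    for a, b in zip(feature, labels):
        groups.setdefault(a, []).append(b)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(feature, labels):
    """IG(B|A) = H(B) - H(B|A)."""
    return entropy(labels) - conditional_entropy(feature, labels)


def gain_ratio(feature, labels):
    """Information gain normalized by the feature's own entropy."""
    intrinsic = entropy(feature)   # intrinsic value of the feature
    if intrinsic == 0:
        return 0.0                 # a constant feature carries no information
    return information_gain(feature, labels) / intrinsic


# Toy data (hypothetical): 4 attack and 4 normal records.
labels = ["attack"] * 4 + ["normal"] * 4
req_sack = ["N"] * 4 + ["Y"] * 4    # two values, separates the classes
record_id = list(range(8))          # unique per record, many values
parity = ["a", "b"] * 4             # unrelated to the class

for name, feature in [("req_sack", req_sack),
                      ("record_id", record_id),
                      ("parity", parity)]:
    print(name,
          round(information_gain(feature, labels), 3),
          round(gain_ratio(feature, labels), 3))
```

In this toy data `req_sack` and `record_id` both reach an information gain of 1 bit, but the gain ratio of `record_id` drops to one third because its intrinsic value is 3 bits, which illustrates the correction that the gain ratio applies.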

TABLE III. THE FINAL SELECTED FEATURES

Filtered Method | The Selected Features
Relief F | 1, 2, 27, 26, 3, 29, 5, 4, 30, 28, 6, 7, 8, 32, 9, 10, 11, 12, 33, 34, 15, 31, 16, 17, 36
Information Gain | 15, 2, 13, 35, 33, 32, 34, 7, 14, 3, 27, 5, 39, 37, 1, 38, 24, 25, 22, 4, 18, 30, 28, 20, 19
Gain Ratio | 2, 3, 5, 4, 37, 30, 28, 18, 20, 40, 11, 17, 21, 19, 27, 10, 6, 7, 14, 34, 33, 32, 22, 23, 15
Intersection process (Proposed) | 2, 3, 5, 4, 30, 28, 27, 34, 7, 32, 15, 33
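The intersection step can be reproduced directly from the feature numbers listed in Table III. The snippet below is a small illustrative check using Python sets; the variable names are hypothetical, and the numbers are copied from the table.

```python
# Top-25 feature numbers per filter method, copied from Table III.
RELIEF_F = {1, 2, 27, 26, 3, 29, 5, 4, 30, 28, 6, 7, 8, 32, 9,
            10, 11, 12, 33, 34, 15, 31, 16, 17, 36}
INFO_GAIN = {15, 2, 13, 35, 33, 32, 34, 7, 14, 3, 27, 5, 39, 37, 1,
             38, 24, 25, 22, 4, 18, 30, 28, 20, 19}
GAIN_RATIO = {2, 3, 5, 4, 37, 30, 28, 18, 20, 40, 11, 17, 21, 19, 27,
              10, 6, 7, 14, 34, 33, 32, 22, 23, 15}

# Keep only the features ranked in the top 25 by all three methods.
selected = sorted(RELIEF_F & INFO_GAIN & GAIN_RATIO)
print(len(selected), selected)
# prints: 12 [2, 3, 4, 5, 7, 15, 27, 28, 30, 32, 33, 34]
```

The result contains the same 12 feature numbers as the last row of Table III, only in ascending order.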

D. Machine Learning Evaluation

To evaluate the efficiency of the IDS with the extracted features, different machine learning algorithms were executed: multilayer perceptron neural network (MLP-NN), decision tree (J48), Naive Bayes (NB), and K-means clustering. The performance of the algorithms in detecting and classifying the traffic as normal or SYN flood attack was measured using the following metrics:

• Confusion matrix: it is used to evaluate the performance of the IDS with four counts, as shown in Table IV. True Positive (TP) is the number of attack records that are correctly classified as attack, True Negative (TN) is the number of normal records that are correctly classified as normal, False Positive (FP) is the number of normal records that are misclassified as attack, and False Negative (FN) is the number of attack records that are misclassified as normal.

• Accuracy: the number of correctly classified records with respect to the total number of records.

• Detection Rate: the number of correctly classified attack records with respect to the total number of attack records.

• Error Rate: the percentage of records misclassified by the classifier algorithm (i.e., false alarms). If the classifier predicts the class of an instance correctly, it is counted as a success; if not, it is counted as an error.

TABLE IV. CONFUSION MATRIX

Actual class of records | Predicted Positive | Predicted Negative
Positive | TP | FN
Negative | FP | TN

The experiments were conducted using the Weka tool based on 10-fold cross validation for the classifier algorithms. The confusion matrices constructed from MLP-NN, J48, NB, and K-Means are shown in Tables V, VI, VII, and VIII. In addition, the results for accuracy, false alarms, and detection rate are displayed in Table IX. It can be observed from Table IX that the TCP SYN attack records are detected at a rate of 100% by MLP-NN and by K-Means clustering. In addition, the decision tree (J48) provides higher accuracy and a lower error rate than MLP-NN, NB, and K-Means: its accuracy is 99.995% and its error rate is 0.005%. From all experiments, it is evident that the extracted features contribute efficiently to classifying the cloud traffic and, in particular, to detecting the SYN flood attack.

TABLE V. CONFUSION MATRIX OF MLP-NN

Actual class of records | Predicted Positive | Predicted Negative
SYN-Flood | 10000 | 0
Normal | 4 | 9996

TABLE VI. CONFUSION MATRIX OF J48

Actual class of records | Predicted Positive | Predicted Negative
SYN-Flood | 9999 | 1
Normal | 0 | 10000


TABLE II. THE SELECTED FEATURES WITH DESCRIPTION

Number | Feature Name | Description
1 | SYN/FIN_Pkts_sent_a2b | The total number of packets with the SYN/FIN bits set in the TCP header, respectively, from source to destination
2 | Req_1323_ws/ts_a2b | If the endpoint from source to destination requested the window scaling/time stamp options as specified in RFC 1323 [23], a 'Y' is printed; if not, 'N' is printed
3 | Req_sack_a2b | In the connection from source to destination, if the endpoint sent SACK in the SYN packet opening the connection, 'Y' is printed; else 'N' is printed
4 | MSS_requested_a2b | The maximum segment size as a TCP option in the SYN packet opening the connection from source to destination
5 | Adv_wind_scale_a2b | This option determines whether window scaling was used in the connection from source to destination
6 | Initial_window_pkts_a2b | The total number of packets sent in the initial window from source to destination
7 | Min_win_adv_a2b | The minimum window advertisement sent from source to destination
8 | Missed_data_a2b | The missed data, calculated as the difference between the ttl stream length and the unique bytes sent from source to destination
9 | ttl_stream_length_a2b | Calculated as the difference between the sequence numbers of the SYN and FIN packets from source to destination
10 | Initial_window_bytes_a2b | The total number of bytes sent in the initial window from source to destination
11 | Min_seg_size_a2b | The minimum segment size seen during the lifetime of the connection from source to destination
12 | Resets_sent_a2b | The count of reset (RST) packets sent from source to destination
13 | Avg_win_adv_a2b | The average window advertisement sent from source to destination
14 | Max_win_adv_a2b | The maximum window advertisement sent from source to destination
15 | Idle time_max_a2b | The maximum time between consecutive packets sent from source to destination
16 | Out of order_pkts_a2b | The total number of packets that arrive out of order from source to destination
17 | Avg_seg_size_a2b | The average segment size seen during the lifetime of the connection from source to destination
18 | Pure_ack_sent_a2b | The number of ACK packets seen without payload and without any SYN/FIN/RST flag bits set in the connection from source to destination
19 | Actual_data_pkts_a2b | The total number of packets that contain at least one byte of TCP payload from source to destination
20 | Actual_data_bytes_a2b | The total number of bytes of data, including retransmissions, from source to destination
21 | Pushed_data_pkts_a2b | The total number of packets with the push flag bit set in the TCP header from source to destination
22 | ACK_Pkts_sent_a2b | The total number of ACK packets sent from source to destination
23 | Duplicate_ack_b2a | The total number of duplicate ACK packets sent from destination to source
24 | Total_packet_a2b | The total number of packets exchanged from source to destination
25 | Unique_bytes_sent_a2b | The total number of bytes of data sent, excluding retransmitted bytes, from source to destination
26 | SYN/FIN_Pkts_sent_b2a | The total number of packets with the SYN/FIN bits set in the TCP header, respectively, from destination to source
27 | Req_1323_ws/ts_b2a | If the endpoint from destination to source requested the window scaling/time stamp options as specified in RFC 1323 [23], a 'Y' is printed; if not, 'N' is printed
28 | Req_sack_b2a | In the connection from destination to source, if the endpoint sent SACK in the SYN packet opening the connection, 'Y' is printed; else 'N' is printed
29 | MSS_requested_b2a | The maximum segment size as a TCP option in the SYN packet opening the connection from destination to source
30 | Adv_wind_scale_b2a | This option determines whether window scaling was used in the connection from destination to source
31 | Initial_window_pkts_b2a | The total number of packets sent in the initial window from destination to source
32 | Min_win_adv_b2a | The minimum window advertisement sent from destination to source
33 | Avg_win_adv_b2a | The average window advertisement sent from destination to source
34 | Max_win_adv_b2a | The maximum window advertisement sent from destination to source
35 | Idle time_max_b2a | The maximum time between consecutive packets sent from destination to source
36 | Out of order_pkts_b2a | The total number of packets that arrive out of order from destination to source
37 | Pure_ack_sent_b2a | The number of ACK packets seen without payload and without any SYN/FIN/RST flag bits set in the connection from destination to source
38 | ACK_Pkts_sent_b2a | The total number of ACK packets sent from destination to source
39 | Total_packet_b2a | The total number of packets exchanged from destination to source
40 | Max_seg_size_a2b | The maximum segment size seen during the lifetime of the connection from source to destination

TABLE VII. CONFUSION MATRIX OF NB

              Predicted class of records
              Positive    Negative
SYN Flood     9832        168
Normal        0           10000

TABLE VIII. CONFUSION MATRIX OF K-MEANS

              Predicted class of records
              Positive    Negative
SYN-Flood     10000       0
Normal        359         9641

TABLE IX. PERFORMANCE OF THE CLASSIFIER ALGORITHMS

Classifier Algorithms    Accuracy (%)    Error rate (%)    Detection Rate (%)
MLP-NN                   99.980          0.020             100
J48                      99.995          0.005             99.99
NB                       99.160          0.840             98.32
K-Means                  98.205          1.795             100
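The NB and K-Means rows of Table IX can be recomputed from the confusion matrices in Tables VII and VIII. The sketch below assumes the usual definitions: accuracy is the share of correctly classified records, error rate is its complement, and detection rate is the share of SYN flood records classified as positive. The function name confusion_metrics is illustrative and does not come from the paper.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return (accuracy, error rate, detection rate) in percent.

    tp: attack records predicted positive
    fn: attack records predicted negative
    fp: normal records predicted positive
    tn: normal records predicted negative
    """
    total = tp + fn + fp + tn
    accuracy = 100.0 * (tp + tn) / total
    error_rate = 100.0 - accuracy
    detection_rate = 100.0 * tp / (tp + fn)
    return accuracy, error_rate, detection_rate


# Counts taken from Table VII (NB) and Table VIII (K-Means).
nb = confusion_metrics(tp=9832, fn=168, fp=0, tn=10000)
kmeans = confusion_metrics(tp=10000, fn=0, fp=359, tn=9641)

for name, (acc, err, det) in (("NB", nb), ("K-Means", kmeans)):
    print(f"{name}: accuracy={acc:.3f} error={err:.3f} detection={det:.2f}")
```

With these counts the computed values agree with the NB and K-Means rows of Table IX. The confusion matrices for MLP-NN and J48 are not shown in this part of the paper, so those rows are not recomputed here.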

C. Conclusion

In this work, the SYN flood attack in a virtual cloud environment was analyzed and detected based on statistical features of the TCP/IP header. The traffic exchanged between virtual machines was collected in a testbed environment. The collected pcap files were analyzed with the Tcptrace tool to extract statistical features from the TCP/IP headers. Because of the large number of extracted features, the most important ones were selected by a simple intersection of the results of three popular filter methods. To evaluate how effectively an intrusion detection system detects the SYN flood attack based on the extracted features, four machine learning algorithms were used, namely MLP-NN, NB, J48, and K-means. The results show that all the algorithms were highly effective in detecting the SYN flood attack, with accuracy ranging from 98.205% (K-means) to 99.995% (J48). In future work, the extracted features can be evaluated with other machine learning methods and applied in a real cloud environment, so that the modifications needed to improve performance can be identified.

REFERENCES

[1] Gartner, http://www.gartner.com/newsroom/id/3384720. Access time: January, 2017.
[2] C. Gong, J. Liu, Q. Zhang, H. Chen, and Z. Gong, "The characteristics of cloud computing," in Parallel Processing Workshops (ICPPW), 2010 39th International Conference on, IEEE, 2010, pp. 275-279.
[3] Kaspersky Lab, https://securelist.com/analysis/quarterly-malware-reports/74550/kaspersky-ddos-intelligence-report-for-q1-2016/. Access time: October, 2016.
[4] A. Bakshi and B. Yogesh, "Securing cloud from DDOS Attacks using Intrusion-Detection System," in Communication Software and Networks, 2010. ICCSN'10. Second International Conference on, IEEE, 2010, pp. 260-264.
[5] N. Moustafa and S. Jill, "Creating novel features to anomaly network detection using DARPA-2009 data set," in Proceedings of the 14th European Conference on Cyber Warfare and Security, Academic Conferences Limited, 2015, pp. 204-212.
[6] S. Manavi, S. Mohammadalian, N. Udzir, and A. Abdullah, "Hierarchical secure virtualization model for cloud," in Cyber Security, Cyber Warfare and Digital Forensic (CyberSec), 2012 International Conference on, IEEE, 2012, pp. 219-224.
[7] A. Duncan, S. Cruse, and M. Goldsmith, "Insider attacks in cloud computing," in 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, 2012, pp. 857-862.
[8] A. Hovav and J. D'Arcy, "The Impact of Denial of Service Attack Announcements on the Market Value of Firms," Risk Management and Insurance Review, 2003, pp. 97-121.
[9] M. Bogdanoski, T. Shuminoski, and A. Risteski, "Analysis of the SYN Flood DoS Attack," Computer Network and Information Security, 2013, pp. 1-10.
[10] K. Hussain, H. Syed Jawad, D. Veena, N. Muhammad, and A. Muhammad Awais, "An Adaptive SYN Flooding attack Mitigation in DDOS Environment," International Journal of Computer Science and Network Security (IJCSNS) 16, 2016, pp. 27-33.
[11] S. H. C. Haris, R. B. Ahmad, and M. A. H. A. Ghani, "Detecting TCP SYN flood attack based on anomaly detection," in Network Applications Protocols and Services (NETAPPS), 2010 Second International Conference on, IEEE, 2010, pp. 240-244.
[12] V. Siris and F. Papagalou, "Application of anomaly detection algorithms for detecting SYN flooding attacks," Computer Communications 29, no. 9, 2006, pp. 1433-1442.
[13] K. Pai, N. HR, and A. Bhat, "Detection and Performance Evaluation of DoS/DDoS Attacks using SYN Flooding Attacks," International Journal of Computer Applications, 2014, pp. 1-4.
[14] D. Divakaran, H. Murthy, and T. Gonsalves, "Detection of SYN flooding attacks using linear prediction analysis," in Networks, 2006. ICon'06. 14th IEEE International Conference on, vol. 1, IEEE, 2006, pp. 1-6.
[15] D. Kshirsagar, S. Sawant, A. Rathod, and S. Wathore, "CPU Load Analysis & Minimization for TCP SYN Flood Detection," Procedia Computer Science 85, 2016, pp. 626-633.
[16] Y. Ohsita, S. Ata, and M. Murata, "Detecting Distributed Denial-of-Service Attacks by Analyzing TCP SYN Packets Statistically," Proceedings of the IEEE Communications Society Globecom, 2004, pp. 2043-2049.
[17] A. Rawashdeh, M. Kassasbeh, and M. AL-Hawawreh, "An Anomaly based approach for DDoS attack detection in cloud environment," International Journal of Computer Applications in Technology (IJCAT), accepted, 2017.
[18] http://www.hping.org/hping3.html. Access time: September, 2016.
[19] M. Bogdanoski, S. Tomislav, and R. Aleksandar, "Analysis of the SYN flood DoS attack," International Journal of Computer Network and Information Security 5, 2013, pp. 1-11.
[20] http://www.tcptrace.org/. Access time: 26 January 2017.
[21] M. Robnik-Šikonja and K. Igor, "Theoretical and empirical analysis of ReliefF and RReliefF," Machine Learning 53, 2003, pp. 23-69.
[22] O. Osanaiye, H. Cai, K. Raymond Choo, A. Dehghantanha, Z. Xu, and M. Dlodlo, "Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing," EURASIP Journal on Wireless Communications and Networking, 2016, pp. 1-11.
[23] D. Borman, S. Richard, and V. Jacobson, "TCP extensions for high performance," Internet Engineering Task Force (IETF), 2014.